OA STM Corpus

abernard102@gmail.com 2015-05-06

Summary:

"Natural Language Processing (NLP) tools perform best if they are used on the same kind of content on which they were trained and tested. Unfortunately for those in the STM domains, our content has some big differences from the newswire text that is commonly used in the development of most NLP tools. There are some corpora of STM content, but the ones we know of are specific to one domain, such as biomedicine, and will typically consist of abstracts instead of full articles. This is less than optimum because math articles are very different from biomed articles, and articles are very different from abstracts ..."

Link:

http://elsevierlabs.github.io/OA-STM-Corpus/

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.tools oa.elsevier oa.formats

Date tagged:

05/06/2015, 13:41

Date published:

05/06/2015, 09:41