The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers

abernard102@gmail.com 2014-11-17

Summary:

Use the link to access the full-text article from D-Lib Magazine. "Successfully mining scholarly literature at scale is inhibited by technical and political barriers that have been only partially addressed by publishers' application programming interfaces (APIs). Many of those APIs have restrictions that inhibit data mining at scale, and while only some publishers actually provide APIs, almost all publishers make their content available on the web. Current web technologies should make it possible to harvest and mine the scholarly literature regardless of the source of publication, and without using specialised programmatic interfaces controlled by each publisher. Here we describe the tools developed to address this challenge as part of the ContentMine project."

Link:

http://www.dlib.org/dlib/november14/smith-unna/11smith-unna.html

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.mining oa.harvesting oa.tools oa.contentmine

Date tagged:

11/17/2014, 12:10

Date published:

11/17/2014, 07:10