How Long Does It Take to Text-Mine 55,000 Publisher Web Pages?: A Technical Update from the 3rd of September 2014 Jisc Monitor Webinar from Richard Jones | Jisc Monitor

abernard102@gmail.com 2014-09-05

Summary:

"Running up to the end of the first quarter, we have been working on detecting licences of open access articles from the Directory of Open Access Journals (DOAJ). There are 1.7 million articles in the DOAJ, and approximately 50% have DOIs (and so are processable), so the scale of the task is not insignificant. Even if we constrain the data set to only journals published in the UK, there are around 160,000 articles we may want to process. This poses a significant technical challenge: in order to detect the licence, we need to resolve each DOI, download the content, and mine the text for licence information, a process that is time-consuming and error-prone. We are using the Open Article Gauge (OAG) service to perform the heavy lifting in this process. It has been developed by Cottage Labs with funding from PLOS to solve the problem of determining what licence end-users of articles are actually provided when reading the full text (as opposed to what licence the publisher asserts in its terms and conditions an article will have). Detailed diagrams of its internal workflow and API are available from the website via the links attached ..."
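The three steps named in the summary (resolve each DOI, download the content, mine the text for a licence) can be illustrated with a minimal Python sketch. This is not OAG's implementation: the licence patterns, the function names (`doi_to_url`, `fetch_page`, `detect_licence`, `process`) and the User-Agent string are hypothetical, and a real detector needs publisher-specific rules. It only assumes that `https://doi.org/` redirects a DOI to the publisher's landing page and that open licences are usually signalled by a Creative Commons URL or statement in the page.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional

# Hypothetical licence patterns, checked in order; real services use richer,
# publisher-specific rules.
LICENCE_PATTERNS = [
    ("cc0", re.compile(r"creativecommons\.org/publicdomain/zero/", re.I)),
    ("cc-by-nc-nd", re.compile(r"creativecommons\.org/licenses/by-nc-nd/", re.I)),
    ("cc-by-nc-sa", re.compile(r"creativecommons\.org/licenses/by-nc-sa/", re.I)),
    ("cc-by-nc", re.compile(r"creativecommons\.org/licenses/by-nc/", re.I)),
    ("cc-by-nd", re.compile(r"creativecommons\.org/licenses/by-nd/", re.I)),
    ("cc-by-sa", re.compile(r"creativecommons\.org/licenses/by-sa/", re.I)),
    ("cc-by", re.compile(r"creativecommons\.org/licenses/by/", re.I)),
    ("cc-by", re.compile(r"Creative Commons Attribution Licen[cs]e", re.I)),
]


def doi_to_url(doi: str) -> str:
    """Step 1: build a doi.org resolver URL, which redirects to the publisher."""
    return "https://doi.org/" + doi.strip()


def fetch_page(doi: str, timeout: float = 30.0) -> str:
    """Step 2: download the landing page (urlopen follows redirects)."""
    req = urllib.request.Request(
        doi_to_url(doi), headers={"User-Agent": "licence-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def detect_licence(html: str) -> Optional[str]:
    """Step 3: return the first matching licence label, or None if none match."""
    for name, pattern in LICENCE_PATTERNS:
        if pattern.search(html):
            return name
    return None


def process(
    dois: Iterable[str], fetch: Callable[[str], str] = fetch_page
) -> Dict[str, Optional[str]]:
    """Run all three steps per DOI; network failures are recorded as 'error'."""
    results: Dict[str, Optional[str]] = {}
    for doi in dois:
        try:
            results[doi] = detect_licence(fetch(doi))
        except (urllib.error.URLError, TimeoutError):
            results[doi] = "error"
    return results
```

The `fetch` parameter lets the pipeline be exercised without network access, e.g. `process(["10.1/a"], fetch=lambda d: "creativecommons.org/licenses/by/4.0")`. At the scale described above (roughly 850,000 DOI-bearing articles, or about 160,000 for UK journals alone), a serial loop like this would be slow; a production system would add concurrency, rate limiting per publisher and retries.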

Link:

http://jiscmonitor.jiscinvolve.org/wp/2014/09/04/how-long-does-it-take-to-text-mine-55000-publisher-web-pages-a-technical-update-from-the-3rd-of-september-2014-jisc-monitor-webinar-from-richard-jones/

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.comment oa.tools oa.apis oa.plos oa.jisc oa.doaj oa.dois oa.gold oa.copyright oa.licensing oa.mining oa.libre oa.journals

Date tagged:

09/05/2014, 10:56

Date published:

09/05/2014, 06:56