Publication & License Harvesting – Development Update from Richard Jones | Jisc Monitor

abernard102@gmail.com 2014-08-08

Summary:

"We began this sprint by focussing on running the DOAJ article data through howopenisit.org, since this seemed likely to present the most challenges, and therefore require the most time in this section of the project. We made a partial clone of the Directory of Open Access Journals (DOAJ) data, containing 100,000 articles which have Digital Object Identifiers (DOIs); approximately 50% of the DOAJ articles do not have DOIs. This is a large enough subset of the 1.7 million articles in DOAJ that we can get a representative feel for the data; it also presents us with scalability challenges which we can work to overcome to ensure that the full dataset is ultimately processable. It was also necessary for us to create a new client library for howopenisit.org, which was geared specifically towards high volume, long running requests for data from the service. Since howopenisit.org downloads content and analyses that content, it can take a long time to process all the identifiers (in the order of many hours or even days), so the client library would have to be robust enough to handle this kind of operation. The current version can be found here: (https://github.com/JiscMonitor/doaj2oag/blob/master/doaj2oag/oag.py). In an initial run of 5000 of these identifiers, our results showed that the licences of 50% of the articles could not be detected, 30% were CC BY, 15% were Free to Read, and the remaining 5% were other CC licences. This is only a small subset of the data, and therefore may not be representative of the bigger picture, but it gives us an early indication, as well as pointing us in the direction of some issues that we will need to resolve with howopenisit.org ..."
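
As an illustration of what "robust enough" might mean for a high volume, long running lookup, here is a minimal Python sketch of batching identifiers and retrying failed batches with exponential backoff. It is not taken from the linked oag.py and does not reproduce the howopenisit.org API: the `submit` callable, the function names, the batch size and the retry settings are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` identifiers."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def lookup_all(
    dois: Iterable[str],
    submit: Callable[[List[str]], Dict[str, str]],  # hypothetical: sends one batch, returns {doi: licence}
    batch_size: int = 1000,   # assumed value, not from the source
    max_retries: int = 3,     # assumed value, not from the source
    backoff: float = 1.0,     # base delay in seconds, doubled on each retry
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Look up every identifier, retrying a failed batch before giving up on it."""
    results: Dict[str, Optional[str]] = {}
    for batch in batches(dois, batch_size):
        for attempt in range(max_retries + 1):
            try:
                results.update(submit(batch))
                break
            except Exception:  # a real client would catch narrower error types
                if attempt == max_retries:
                    # Record the batch as unresolved so the run can continue.
                    for doi in batch:
                        results.setdefault(doi, None)
                else:
                    sleep(backoff * 2 ** attempt)
    return results
```

Passing `submit` in as a parameter keeps the retry logic independent of any particular service, and recording unresolved identifiers as `None` lets a run lasting hours or days finish and be retried selectively rather than aborting on the first failure.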

Link:

http://jiscmonitor.jiscinvolve.org/wp/2014/08/06/publication-license-harvesting-development-update-from-richard-jones/

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com
Open Access Tracking Project (OATP) » pontika.nancy@gmail.com's bookmarks

Tags:

oa.gratis oa.cc oa.licensing oa.copyright oa.standards oa.tools oa.howopenisit oa.doaj oa.jisc oa.comment oa.new ru.sparc oa.libre

Date tagged:

08/08/2014, 15:13

Date published:

08/08/2014, 08:09