API to check if a publication is “Open Access”

abernard102@gmail.com 2012-08-20

Summary:

[Use the link above to access the blog post which includes screen shots, the embedded source code, and a video demonstration of the API developed by the blogger and discussed below. The API was developed to enhance the blogger’s use of Researchr. Researchr is “a web site for finding, collecting, sharing, and reviewing scientific publications, for researchers by researchers.”] ... “What new ways of interacting with publications can be enabled by Open Access, which are impossible with Toll Access journals, even those for which you have paid subscription fees... Of course, the extent to which innovative new workflows and analyses can be implemented also depends on the ‘degree’ of openness, given that the term Open Access covers a wide array of cases. The minimal (but also the most common) case is to expect the ability to download a PDF without logging in, or paying money. The ideal case would be a Creative Commons-licensed article, preferably in a semantic open text-based format (not PDF), with machine-readable metadata, linked datasets, semantic citations, etc... I have spent a lot of time pushing for policy change around OA, and I am in awe of what we have already accomplished, and I also know we have a lot left to do. However, I feel that the OA community has not sufficiently addressed this other technical part, and I believe increasing the use-value of OA journals can do a lot to convince academics to become more active with self-archiving etc. I mentioned a number of dimensions of openness above, but even with just the PDF file available, there are things we can do. When I see a citation in a journal article, on a social citation sharing website, or on somebody’s blog or wiki, which I find interesting, what is the first thing I want to do? Import the citation and the PDF into my citation manager. If you are really lucky, the location that posted the citation also offers machine-readable metadata, such as BibTeX.
However, there is very seldom any distinction made between OA and non-OA publications – in both cases, the PDFs are not hosted on the site, because of copyright issues (which is correct in most cases, since a very small percentage of OA articles are openly licensed). Sadly, it is very rare that the metadata contains a link to download the PDF, even though theoretically, that would actually be the most important part of the metadata. (After all, the reason we create such standardized formatting schemes is to make it easy for other people to locate the article we cited). And the few times the URL field in a BibTeX entry is filled in, it goes to an abstract page, from which we can download the PDF. The same is true for DOI, which always resolves to an HTML abstract page, and almost never to the actual publication – even when the publication is OA, and can be downloaded without login, etc. Initially, Researchr made no distinction. I simply captured the BibTeX offered by Google Scholar, or other sites. Of course, I always endeavored to find the PDF and download it for my own purposes, but I was not able to share these online, because of copyright. Thus, an article page on my wiki for an OA article would look identical to that of a TA article, and users would have to copy the title into Google Scholar to locate the PDF. I really wanted to change this, and thought about ways of automatically capturing the download URL when importing PDFs. It turns out there is a very elegant solution – OSX Finder stores the download URL as part of a file’s metadata, even when using Chrome. We can easily access this information through the command line. Thus I could easily add this URL to the citation’s metadata, tag it as Open Access, display this in various ways in the wiki, etc. However, was the file really Open Access?
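[Editor's sketch, not from the blog post: the excerpt does not name the command used to read the download URL. One plausible route on OS X is the `mdls` tool with the `kMDItemWhereFroms` Spotlight attribute; that choice, the function names, and the assumed output layout (a parenthesised list of quoted strings) are all assumptions made for illustration.]

```python
import re
import subprocess


def parse_where_froms(mdls_output):
    """Extract the quoted URLs from text printed for kMDItemWhereFroms.

    Assumes the values appear as double-quoted strings, one per line.
    """
    return re.findall(r'"([^"]+)"', mdls_output)


def download_urls(path):
    """Ask Spotlight where a file was downloaded from (macOS only)."""
    result = subprocess.run(
        ["mdls", "-name", "kMDItemWhereFroms", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_where_froms(result.stdout)
```

[On a Mac, `download_urls("paper.pdf")` (a hypothetical file name) would return whatever URLs the browser recorded at download time; the list is empty when the attribute is missing.]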
If I am sitting at the University of Toronto, I can download PDFs from all the big publishers, because they have ‘whitelisted’ the IP ranges belonging to the University of Toronto. However, these PDFs are not available outside of the university. How to distinguish? I wrote a tiny little API, which I uploaded to my public server, which sits outside of the University of Toronto, and does not have any special privileges. It accepts a URL as an argument, and attempts to download the header of that URL ... It checks whether the URL can be accessed at all, and whether it is of the kind ‘PDF’ ... If successful, it simply returns ‘true’, and the URL is added to the file’s metadata. So far, I have not done anything specific to display this fact, although the citation template automatically displays the contents of the URL field in the citation. Eventually, I want to make a nice big (green?) icon next to this metadata field saying ‘PDF Download’ or something similar. However, since the URL field is also present in the hidden machine-readable metadata field below, we are able to do some fun stuff. I traditionally used Ctrl+Alt+Cmd+B as a shortcut to grab a citation from Google Scholar. I enhanced this script to also look for the hidden BibTeX on a Researchr wiki page. If it finds it, it imports it into BibDesk. If it also f
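[Editor's sketch, not the blogger's code: the check described above (fetch only the headers from a machine with no institutional access, then confirm that the URL responds and reports a PDF) might look roughly like this. The function names, the timeout, and the reliance on the `Content-Type` header are assumptions; the original source code is at the link below.]

```python
import http.client
import urllib.request


def is_pdf_response(status, content_type):
    """Return True when the status and Content-Type indicate a reachable PDF."""
    if status != 200 or not content_type:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def check_open_access(url, timeout=10.0):
    """Send a HEAD request from this machine and report whether a PDF answers."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_pdf_response(
                response.status, response.headers.get("Content-Type")
            )
    except (OSError, ValueError, http.client.HTTPException):
        # Malformed URL, unreachable host, HTTP error status, or timeout.
        return False
```

[The result is only meaningful when the code runs on a host outside any whitelisted IP range, as the post stresses. Some servers reject HEAD requests or mislabel PDFs, so a `False` here is a hint rather than proof that the article is behind a paywall.]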

Link:

http://reganmian.net/blog/2012/04/17/api-to-check-if-a-publication-is-open-access/

Updated:

08/16/2012, 06:08

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.data oa.gold oa.business_models oa.publishers oa.licensing oa.comment oa.lod oa.video oa.green oa.advocacy oa.copyright oa.cc oa.metadata oa.social_media oa.hybrid oa.floss oa.citations oa.wikis oa.apis oa.bibdesk oa.researchr oa.bibtex oa.semantic_citations oa.repositories oa.libre oa.journals oa.dois oa.google_scholar

Authors:

abernard

Date tagged:

08/20/2012, 18:16

Date published:

04/18/2012, 14:13