Scholarly metadata in R

R-bloggers 2013-03-16

Summary:

Scholarly metadata (the meta-information surrounding articles) can be super useful. Although metadata does not contain the full content of articles, it does contain a lot of useful information, including title, authors, abstract, a URL to the article, etc.

One of the largest sources of metadata is the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH. Many publishers provide their metadata through their own endpoint, which implements the standard OAI-PMH methods (called "verbs" in the protocol): GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, and ListSets. Providers that use OAI-PMH include DataCite, Dryad, and PubMed.
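To make the protocol a bit more concrete: an OAI-PMH request is just an HTTP request to a provider's base URL carrying a `verb` parameter plus verb-specific arguments (for example `metadataPrefix=oai_dc`, the Dublin Core format every provider must support, for ListRecords). The helper below is a hypothetical sketch, not part of rmetadata, and `https://example.org/oai` is a placeholder base URL; it only builds the request URL, so it runs offline.

```r
# Hypothetical helper (not part of rmetadata): build an OAI-PMH request URL.
oai_verbs <- c("GetRecord", "Identify", "ListIdentifiers",
               "ListMetadataFormats", "ListRecords", "ListSets")

oai_request_url <- function(base_url, verb, ...) {
  # Reject anything that is not one of the six standard verbs
  verb <- match.arg(verb, oai_verbs)
  # Every request carries the verb; other arguments depend on the verb
  params <- c(verb = verb, unlist(list(...)))
  # Percent-encode values so identifiers containing ':' or '/' are safe
  values <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste0(names(params), "=", values, collapse = "&"))
}

oai_request_url("https://example.org/oai", "ListRecords", metadataPrefix = "oai_dc")
# returns "https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc"
```

Fetching such a URL (for example with `readLines()`) would return the provider's XML response, which a package like rmetadata then has to parse.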

Some data and article providers offer their metadata via their own APIs instead. For example, Nature Publishing Group provides its own metadata API in a non-OAI-PMH format; you can get PLoS metadata through their search API; and the BHL (see below) provides its own custom metadata service.

In addition, CrossRef provides metadata search services, including metadata search and OpenURL.

What about the other publishers? (please tell me if I'm wrong about these three)

  • Springer has a metadata API, but it is terrible, soooo...
  • Elsevier, are you kidding? Well, they do have some sort of API service, but it's a pain in the ass.
  • Wiley, no better than Elsevier.

Note that metadata can live in other places:

  • rpubmed, another package being developed by David Springate, can get PubMed metadata.
  • Our wrapper to the Mendeley API, RMendeley, gets article metadata via Mendeley's database.
  • Our wrapper to the Biodiversity Heritage Library (BHL) API gets their metadata.

No, you can't get metadata via Google Scholar: they don't allow scraping, and they don't expose their data via an API.

I have discussed this package, rmetadata, in a previous blog post, but have since worked on the code a bit, and thought it deserved a new post.

You can see a tutorial for this package here, and contribute to the code here.


Install rmetadata

```r
# library(devtools)  # install_github() comes from devtools
# install_github('rmetadata', 'ropensci')  # uncomment to install
library(rmetadata)
```

Count OAI-PMH identifiers for a data provider.

```r
# For DataCite
count_identifiers("datacite")
##   provider   count
## 1 datacite 1216193
```
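A provider-wide count like this runs into the millions, and OAI-PMH list responses are paged: each response may end with a resumptionToken that you send back to get the next batch, and an empty token marks the last page. It's not clear from the output above whether `count_identifiers()` walks every page or reads the optional `completeListSize` attribute on the token. The sketch below shows the paging approach only; `count_ids_paged()`, `fetch_page()` and the fake pages are hypothetical stand-ins so it runs without a network connection.

```r
# Hypothetical sketch: count identifiers by following OAI-PMH resumption tokens.
# `fetch_page(token)` is an assumed function returning a list with
#   ids   - character vector of identifiers on this page
#   token - the next resumptionToken, or NULL / "" when the list is complete
count_ids_paged <- function(fetch_page) {
  total <- 0
  token <- NULL
  repeat {
    page <- fetch_page(token)
    total <- total + length(page$ids)
    token <- page$token
    # A missing or empty resumptionToken marks the last page
    if (is.null(token) || identical(token, "")) break
  }
  total
}

# Offline stand-in for a provider: three pages of identifiers
fake_pages <- list(
  list(ids = c("oai:a:1", "oai:a:2"), token = "t1"),
  list(ids = c("oai:a:3", "oai:a:4"), token = "t2"),
  list(ids = "oai:a:5", token = "")
)
fake_fetch <- function(token) {
  # No token means the first request; otherwise the token picks the next page
  i <- if (is.null(token)) 1 else match(token, c("t1", "t2")) + 1
  fake_pages[[i]]
}

count_ids_paged(fake_fetch)  # returns 5
```

Swapping `fake_fetch` for a function that requests `verb=ListIdentifiers` (first with a `metadataPrefix`, then with `resumptionToken`) and parses the XML would give a real count, at the cost of one HTTP request per page.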

Look up article info via CrossRef by DOI and get a citation.

As BibTeX

```r
print(crossref_citation("10.3998/3336451.0009.101"), style = "Bibtex")
## @Article{,
##   title = {In Google We Trust?},
##   author = {Geoffrey Bilder},
##   journal = {The Journal of Electronic Publishing},
##   year = {2006},
##   month = {01},
##   volume = {9},
##   doi = {10.3998/3336451.0009.101},
## }
```

As regular text

```r
print(crossref_citation("10.3998/3336451.0009.101"), style = "text")
## Bilder G (2006). "In Google We Trust?" _The Journal of Electronic
## Publishing_, *
```
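The `style` values used above ("Bibtex", "text") are the same ones base R's `print()` method for `bibentry` objects accepts, so a reasonable guess (an assumption, not something the package docs confirm here) is that `crossref_citation()` returns a `bibentry`. The sketch below rebuilds the same record by hand with `utils::bibentry()`, using the fields from the BibTeX output, to show what the two styles do without touching CrossRef.

```r
# Sketch: rebuild the CrossRef record above as a base-R bibentry.
# Assumption: crossref_citation() returns an object of this same class.
library(utils)

rec <- bibentry(
  bibtype = "Article",
  title   = "In Google We Trust?",
  author  = person("Geoffrey", "Bilder"),
  journal = "The Journal of Electronic Publishing",
  year    = "2006",
  month   = "01",
  volume  = "9",
  doi     = "10.3998/3336451.0009.101"
)

# Same print styles as in the examples above
print(rec, style = "Bibtex")
print(rec, style = "text")
```

Because a `bibentry` is plain R data, `format(rec, style = "Bibtex")` gives the BibTeX as a character string you can write straight to a `.bib` file.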

Link:

http://feedproxy.google.com/~r/RBloggers/~3/7pAHA7WgIyM/

From feeds:

Statistics and Visualization » R-bloggers

Authors:

Recology - R

Date tagged:

03/16/2013, 22:00

Date published:

03/14/2013, 03:00