Open Citations Corpus Import Process | Open Citations and Related Work
abernard102@gmail.com 2013-03-07
Summary:
"As part of the Open Citations project, we have been asked to review and improve the process of importing data into the Open Citations Corpus, taking the scripts from the initial project as our starting point.
The current import procedure evolved from several disconnected processes and requires running multiple command line scripts and transforming the data into different intermediate formats. As a consequence, it is not very efficient and we will be looking to improve on the speed and reliability of the import procedure. Moreover, there are two distinct procedures depending on the source of the data (arXiv or PubMed Central); we are hoping to unify the common parts of these procedures into a single process which can be simplified and normalised to improve code re-use and comprehensibility ..."