British Library PhD placement scheme – project profile: Data Mining of Doctoral Theses
lterrat's bookmarks 2017-01-08
"The British Library manages the national database of UK doctoral theses, called EThOS (http://ethos.bl.uk). We work with UK universities to aggregate information about all theses produced by PhD students in the UK, and provide full access to as many as possible to support researchers everywhere.
The main purpose of EThOS is to help researchers search for, discover and access theses for use in their own research. Where possible, users can download the full thesis or order a scan of an older print thesis, and EThOS is one of the British Library’s most heavily used research resources.
But the almost complete aggregation of the metadata records of all UK PhD theses – some 450,000 – also has enormous value. For example, users can compare one university’s outputs against another, make connections between thesis authors and their supervisors, or analyse trends in funding for doctoral research. The value of this large dataset is increasingly recognised, and we now want to extend and improve the data held within it as much as we possibly can. To analyse funding trends, we need consistent, comprehensive funding body information; to make connections between thesis authors and their supervisors, we need supervisor names to be present in the metadata records; to analyse trends in the research itself, we need at least the thesis abstract so that people can mine and re-use the information.
This placement project focuses on three metadata elements – supervisor names, research funding organisations, and the thesis abstracts. Very often, these data elements are described within the ‘front matter’ pages of the full theses themselves but not (yet) in the re-usable metadata records. The aim of the project is to use content mining methods to extract the missing data from the full theses."
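As a rough illustration of what mining front matter for these three elements could look like, here is a minimal Python sketch. It is not the British Library's method: the regular expressions, the function name `extract_front_matter`, and the sample text (including the supervisor names) are all assumptions made for the example, and real theses vary far too much in layout for patterns this simple to be reliable.

```python
import re

# Hypothetical patterns; real front matter is much less regular than this.
SUPERVISOR_RE = re.compile(
    r"^[ \t]*(?:supervisors?[ \t]*[:\-]|supervised[ \t]+by)[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)
FUNDER_RE = re.compile(
    r"(?:funded|supported)\s+by\s+(?:the\s+)?(.+?\b(?:Council|Trust|Foundation)\b)",
    re.IGNORECASE,
)
ABSTRACT_RE = re.compile(
    r"^[ \t]*abstract[ \t]*\n(.+?)(?:\n[ \t]*\n|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def extract_front_matter(text):
    """Return supervisors, funder and abstract found in front-matter text.

    Missing elements come back as an empty list or None.
    """
    supervisors = []
    match = SUPERVISOR_RE.search(text)
    if match:
        # Split a line such as "A, B and C" into individual names.
        parts = re.split(r"\s*(?:,|;|\band\b)\s*", match.group(1))
        supervisors = [p.strip() for p in parts if p.strip()]

    match = FUNDER_RE.search(text)
    funder = match.group(1).strip() if match else None

    match = ABSTRACT_RE.search(text)
    # Collapse line breaks inside the abstract into single spaces.
    abstract = " ".join(match.group(1).split()) if match else None

    return {"supervisors": supervisors, "funder": funder, "abstract": abstract}


# Invented sample text, for illustration only.
SAMPLE = """\
A Study of Medieval Trade Routes

Supervisors: Dr Jane Smith and Prof. John Doe

Abstract
This thesis examines trade routes
across northern Europe.

Acknowledgements
This research was funded by the Arts and Humanities Research Council.
"""

if __name__ == "__main__":
    print(extract_front_matter(SAMPLE))
```

In practice the text would first have to be extracted from the PDF or scanned pages, and a rule-based pass like this would probably serve only as a baseline to compare against more robust content mining methods.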