ScienceBeam: Using open technology to extract knowledge from research PDFs | Labs | eLife

Summary:

"When we first embarked on the ScienceBeam project just over two years ago, we had one clear goal in mind: to liberate knowledge locked inside academic papers published in the print-era PDF format, and make it available to new, web-native tools and services that could improve the experience of publishing, discovering and consuming science. With the rapidly increasing popularity of preprints, those goals are even more valid today.

In order to make better use of the knowledge locked inside academic research PDFs, we need to extract information in a semantically structured way, that is to say in a way that lets us understand and record what it is that we are extracting. This goal is not an easy one to achieve, as the PDF format, with its primary focus on presentation, does not do much to help represent the semantic structure of a paper. PDF has no concept of an “Abstract” or a “Methods section”, much less which strings of text signify an author’s name, their affiliation, or a reference.

Authors pay for this by having to fill lengthy submission forms with information they already included in their submitted Word or PDF manuscript, because no submission system is smart enough to accurately extract that information on its own. Production staff working at journals pay for it in time and effort ensuring those forms match the contents of the paper. Scientific data miners and software developers pay for it by spending resources on data extraction that could be much better spent on data analysis. And a whole industry has developed around the painstaking manual conversion of Word and PDF submissions into more web-friendly formats that power the online academic publishing industry.

These are challenging problems, but we’re making great progress. Here we share an update on ScienceBeam’s latest results, launch a working prototype you can try out today, and talk about how you can contribute to the project to help move it forward...."

Link:

https://elifesciences.org/labs/743da0fc/sciencebeam-using-open-technology-to-extract-knowledge-from-research-pdfs

Updated:

01/19/2021, 05:51

From feeds:

Open Access Tracking Project (OATP) » peter.suber's bookmarks

Tags:

oa.new oa.tools oa.floss oa.sciencebeam oa.extraction oa.semantic oa.data oa.ai oa.pdf oa.xml oa.preprints oa.biorxiv oa.versions oa.formats

Date tagged:

01/19/2021, 10:49

Date published:

12/04/2020, 05:51