Open Access and Scientific Research: Are Digital Repositories A Solution? | Libraries | Colorado State University
peter.suber's bookmarks 2019-01-14
"Digital repository tools such as DSpace and Fedora are now being widely promoted for the capture and preservation of primary academic research output and many institutions and funders are starting to mandate such processes. In principle this effort can be extended to the data on which scientific research rests and has the potential of generating a huge resource for data-driven practice. In Cambridge, we have started to explore this, and I shall report - with interactive demonstrations - on what is currently possible.
However, there are many problems that do not apply to research articles (usually in PDF). These include:
- Ownership of data. Although in principle data cannot be copyrighted, in practice many publishers require scientists to hand over their data, which the publishers can then resell. Even without this, there are strong cultural issues about who owns the data - the funder, the institution, the research group, the scientist, etc.
- Syntactic problems. PDF is a disaster - it destroys data. The technical answer is simple - use XML, but the culture must change.
- Semantics and ontology. Digital data is only useful if we know what it means and how it can be used. This requires a community-wide effort, ideally led by learned societies.
- Metadata. All metadata must be populated by machines. This should be possible for rights, formats, and provenance. For discovery and indexing, the only realistic approach is automatic indexing of free text (neither scientists nor librarians have the time or knowledge to index by hand).
- Repositories. The relative success of depositing PDFs suggests that it should be straightforward to do the same for scientific data. It is anything but easy. We need schemas, use cases, and much software.
But the major problem is getting it to happen. We believe that the best place to start is with theses. ..."
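As a rough illustration of the abstract's point that format and provenance metadata should be populated by machines, the following Python sketch derives a few technical fields directly from a file and writes them out as XML. It is a minimal sketch under stated assumptions: the function names `describe_file` and `to_xml` and the XML element names are hypothetical and do not come from DSpace, Fedora, or any schema mentioned in the talk. The format is guessed from the file extension only, and rights information is not covered because it cannot be read from the file itself.

```python
import hashlib
import mimetypes
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Derive technical metadata from the file itself, with no human input."""
    p = Path(path)
    data = p.read_bytes()
    mime, _ = mimetypes.guess_type(p.name)  # guessed from the extension only
    modified = datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc)
    return {
        "name": p.name,
        "format": mime or "application/octet-stream",
        "size_bytes": str(len(data)),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity value for provenance checks
        "modified": modified.isoformat(),
    }


def to_xml(record):
    """Serialize the record as XML. Element names are hypothetical, not a standard schema."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Demonstrate on a throwaway file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "notes.txt"
        sample.write_text("melting point 42 C", encoding="utf-8")
        print(to_xml(describe_file(sample)))
```

A real repository would map these fields onto an agreed community schema rather than ad hoc element names, which is the kind of shared agreement the abstract says is still missing.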