Bioscientists Publish their Research Data

abernard102@gmail.com 2012-08-20

Summary:

Q: What is Dryad?
A: Dryad is a repository for the data that support the findings in the scientific literature. Its focus is on biology, biomedicine, and related fields [1]. While primary archives already exist for a few specialized kinds of biological data, such as DNA sequences, the long tail of research data requires a repository such as Dryad that can accept data files in varying formats. Dryad is also a membership organization that provides a platform for publishers, librarians, and other stakeholders to work together toward a common goal: having the evidence base of the scientific literature preserved and made openly available, not only for the validation of published findings but also to drive new research.

Q: Dryad’s contact with researchers is through the journals in which they publish rather than the institutions where they work. Why is that?
A: I believe the overriding challenge in making research data available for reuse is winning the involvement of the researchers who collected the data. Since researchers already select and organize the most valuable and reliable of their data in preparing their articles for publication, it takes relatively little extra effort for them to then release those data to the journal or repository as part of the publication process. There is always an article that describes why the data were collected, the methods used, the results obtained, and so on. And the study has been judged by peer reviewers to be of value to the scientific record. So there is some assurance that the data are both reusable and merit preservation...

Q: How are authors motivated to deposit data?
A: The majority of researchers favor research data being available for reuse, but they need to be assured that their colleagues will also release their data, that their journal, funder, or institution cares, and that the professional credit they receive will outweigh the risks (such as getting scooped on the next paper).

One critical way to achieve this is for research organizations to send a loud and clear message that public data archiving is expected as a matter of course, and to live up to that message by evaluating each researcher's data contributions alongside their publications. We also work to engineer the repository to maximize professional reward (e.g., through data citations and other means of impact tracking) and to minimize risk (e.g., by allowing limited-duration data embargoes). There are many subtle ways we can support professional reward. For instance, in a large collaborative project, someone who is only a middle author on an article will be glad to list on their CV a dataset for which they can claim first authorship.

Q: How does one repository handle the diversity of data types, formats, technical standards, and so on?
A: When a journal publishes an article, it expects us to host any and all data, so we must be flexible about what we accept. We do review the data files upon deposit to ensure they meet minimal standards, but we do not recode the data or reject legitimate content. In our view, the ultimate responsibility for reusability lies with the author, the reviewers, and the journal. This diversity of content does make it more challenging to take preservation actions such as migrating file formats, but we are learning how to do that. What we can ensure at the repository is discoverability and uniformity of presentation: through high-quality bibliographic metadata, reciprocal links between the data and the article, assignment of DataCite DOIs [3], getting datasets indexed by search engines, and so on. As a digital library, we see our role as providing a system for accessing books (aka data), not deciding what should be inside those books.

Q: How are the data licensed?
A: Researchers agree, upon deposit, that once the article is published (or, for about a third of the files, once the one-year post-publication embargo is over), the data will be released into the public domain using a Creative Commons Zero waiver [4]. This allows us to make the data openly available in the sense of the Panton Principles for Open Data in Science: “freely available on the public internet permitting any user to download, copy, analyze, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself” [5]. At the same time, we work on many fronts to ensure that authors are cited when their data are reused, and that those data citations are trackable. There are some hard cultural and technical problems to overcome in making trackable data citations a reality, but in our view adding a legal requirement is not helpful.

Q: What is the relationship of data archiving to green and gold Open Access for journal articles?
A: In order to make the data available to users without cost, while having a sustainable organization that can look after data long term, Dryad must have a business model in which curation and preservation costs are met upfront, at the time of data deposit. In tha

Link:

http://www.openaire.eu/en/component/content/article/76-highlights/345-interview-with-a-data-repository-dryad

Updated:

08/16/2012, 06:08

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.biology oa.new oa.data oa.gold oa.business_models oa.publishers oa.policies oa.licensing oa.green oa.deposits oa.panton oa.cc oa.open_science oa.impact oa.preservation oa.sustainability oa.librarians oa.funders oa.fees oa.embargoes oa.citations oa.biomedicine oa.studies oa.indexing oa.definitions oa.repositories.data oa.stem oa.metadata oa.interviews oa.libre oa.journals oa.economics_of oa.repositories oa.dois oa.people

Authors:

abernard

Date tagged:

08/20/2012, 18:30

Date published:

04/05/2012, 16:32