Scientific Data to complement and promote public data repositories : Scientific Data
abernard102@gmail.com 2013-08-08
Summary:
In parallel, we are working with Data Dryad and figshare, two 'generalist' scientific repositories, to ensure that all data types will have a home. Even where good data-type-specific repositories exist, these generalist repositories will be available as 'fallback repositories,' helping us put datasets through peer review if the existing repositories do not support confidential peer review or happen to be down for maintenance. Authors would then be expected to move their data to the community-standard repository before publication. Ultimately, we believe that journal-specific data repositories are not the answer to promoting open data sharing. Research journals already store a wide range of datasets in their supplementary material sections. This is much better than not releasing the data, but supplementary material is widely regarded as a terrible place to store primary datasets. Indeed, the Nature-titled journals already have strong policies that require data deposition to public repositories in fields where standards and repositories are well established. Journal-specific data repositories risk muddying these important policies. In line with this strategy, our main content type, the Data Descriptor, is designed to complement the information in both research journal articles and data repositories. Data Descriptors will provide detailed descriptions of the experiments and procedures involved in generating important datasets, including essential information needed for scientists to assess the technical quality of the data, reproduce key methods or analysis workflows, and ultimately reuse the data to address important research questions. In addition, every publication at Scientific Data will be supported by metadata describing key properties of the experiments and resulting data, which will be checked by an in-house curator and released in the ISA-tab format and, we hope, in other standard formats in the future.
These metadata will aid data mining, and will help scientists find and reuse high-quality datasets stored across multiple data repositories.