Balancing privacy with public benefit : Nature News & Comment
This week’s publication in Nature of a second HeLa cancer-cell genome (see page 207) and the announcement from the US National Institutes of Health on how it will control who gets to use the sequence information (see page 141) highlight a growing issue in modern science: access to biomedical and health-related research data. The amount of such data continues to grow at breakneck speed, generated by large epidemiological and cohort studies that track people’s health over many years (for example, the UK Biobank project) and by studies that sequence the DNA of many individuals, such as the 1000 Genomes project.
Researchers, funders and governments are becoming increasingly aware of the potential power of linking and co-analysing different data sets. Genomic data linked to large sets of patient records, for example, might reveal connections about disease that we would not otherwise discover, and data from the social sciences could add further value to these studies. Maximizing access to data resources should increase the chances that scientists will make discoveries with medical benefits. As a result, most major research funders require grant recipients to make any large data sets they create available to other researchers. It is an ethical imperative that we seek to maximize the value of research data generated from human participants, particularly when using public funds.
In response to open-access policies, a trend is emerging of allowing legitimate researchers access to research data before publication. In making unpublished data available, however, two sets of interests need to be safeguarded. Most research participants expect privacy protection and do not want their genomes or health records to be readily identifiable. Furthermore, researchers who spend time, effort and ingenuity to generate, process and manage large research data sets expect to get appropriate credit.
This also relates to emerging discussions about clinical trials: there is a need for more access to patient-level data (as highlighted by the AllTrials campaign), while respecting the terms of study participants’ consent.
To navigate these issues, many large genome and longitudinal studies have set up specific data-access procedures, often overseen by committees. This is what the National Institutes of Health has done for the HeLa sequence. As the number of these data-access committees grows along with the links between data sets, a question arises: is such a piecemeal approach appropriate? The scientific and medical potential of data will only be realized if researchers are not stymied by myriad data-access mechanisms and by inconsistent ways of recording and describing data variables. So, does biomedical science need to establish and enforce common principles of governance? ...
I chair the Expert Advisory Group on Data Access, a working group that has been set up to provide strategic advice on this issue to funders, and we need your help. We have already talked to those who produce and manage biomedical and social-science data. Now we want to hear from those who use the data, or who would like to use them in future ...
The remit of the working group is for UK-based funders, but our scope is international and we want to reach across both disciplinary and national borders. Still, so far we have found it extremely difficult to get an overview of the situation in the United Kingdom, let alone internationally. This is partly because of the proliferation of data in increasingly large, complex and heterogeneous data sets, but also because of the patchwork of regulations, standards and policies that govern the management of research data across the world. To help fill in the gaps, we are conducting an online survey of users of research data, and we would value the input of Nature readers.
If you use shared data in your research, wherever you are in the world, I urge you to participate (see go.nature.com/bmun1x) ...