NIMH · Open Data
"A couple of weeks ago, President Obama launched a new open data policy for the federal government. Declaring that, '…information is a valuable asset that is multiplied when it is shared,' the Administration’s new policy empowers federal agencies to promote an environment in which shareable data are maximally and responsibly accessible. The policy supports broad access to government data in order to promote entrepreneurship, innovation, and scientific discovery.

If the White House needed an example of the power of data sharing, it could point to the Psychiatric Genomics Consortium (PGC). The PGC began in 2007 and now boasts 123,000 samples from people with a diagnosis of schizophrenia, bipolar disorder, ADHD, or autism and 80,000 controls collected by over 300 scientists from 80 institutions in 20 countries. This consortium is the largest collaboration in the history of psychiatry.

More important than the size of this mega-consortium is its success. There are perhaps three million common variants in the human genome. Amidst so much variation, it takes a large sample to find a statistically significant genetic signal associated with disease. Showing a kind of 'selfish altruism,' scientists began to realize that by pooling data, combining computing efforts, and sharing ideas, they could detect the signals that had been obscured because of lack of statistical power. In 2011, with 9,000 cases, the PGC was able to identify 5 genetic variants associated with schizophrenia. In 2012, with 14,000 cases, they discovered 22 significant genetic variants. Today, with over 30,000 cases, over 100 genetic variants are significant. None of these alone are likely to be genetic causes for schizophrenia, but they define the architecture of risk and collectively could be useful for identifying the biological pathways that contribute to the illness.

We are seeing a similar culture change in neuroimaging.
The Human Connectome Project is scanning 1,200 healthy volunteers with state-of-the-art technology to define variation in the brain’s wiring. The imaging data, cognitive data, and de-identified demographic data on each volunteer are available, along with a workbench of web-based analytical tools, so that qualified researchers can obtain access and interrogate one of the largest imaging data sets anywhere. How exciting to think that a curious scientist with a good question can now explore a treasure trove of human brain imaging data, and possibly uncover an important aspect of brain organization, without ever doing a scan.

However, not all scientists are comfortable sharing data. Some point out that data collected under different conditions or with different assessment tools should not be combined. Some have expressed concern that data will be 'misinterpreted' if analyzed without the input of the researchers who collected the data. And others worry about the competitive disadvantage of sharing data before publication. In an academic culture that rewards the first to report a finding and for which publication is critical for promotion, sharing might seem unfair to early career scientists and unacceptable to more established investigators. Finally, privacy concerns may be a complex, though not insurmountable, barrier to sharing data, both for scientists and for research participants.

We must not minimize these concerns. But as an agency that is ultimately focused on improving the health of patients, NIMH must find a way to balance the concerns of the academic community with our public health mission ..."