"Just as software code can be open source rather than proprietary, so there are publicly funded genomic sequencing initiatives that make their results available to all. One of the largest projects, the UK Biobank(UKB), involves 500,000 participants. Any researcher, anywhere in the world, can download complete, anonymized data sets, provided they are approved by the UKB board. One important restriction is that they must not try to re-identify any participant—something that would be relatively easy to do given the extremely detailed clinical history that was gathered from volunteers along with blood and urine samples. Investigators asked all 500,000 participants about their habits, and examined them for more than 2,000 different traits, including data on their social lives, cognitive state, lifestyle and physical health.

Given the large number of genomes that need to be sequenced, the first open DNA data sets from UKB are only partial, although the plan is to sequence all genomes more fully in due course. These smaller data sets allow what is called "genotyping", which provides a rough map of a person's DNA and its specific properties. Even this partial sequencing provides valuable information, especially when it is available for large numbers of people. As an article in Science points out, it is not just the size and richness of the open data sets that make the UK Biobank unique, it is the thoroughgoing nature of the sharing that is required from researchers....

It's the classic "given enough eyeballs, all bugs are shallow". By open-sourcing the genomic code of 500,000 of its citizens, the UK is getting the top DNA hackers in the world to find the "bugs"—the variants that are associated with medical conditions—that will help our understanding of them and may well lead to the development of new treatments for them. The advantages are so obvious, it's a wonder people use anything else. A bit like open source...."


