1000 Genomes Project data available on Amazon Cloud, March 29, 2012 News Release - National Institutes of Health (NIH)

abernard102@gmail.com 2012-08-20

Summary:

“The world's largest set of data on human genetic variation — produced by the international 1000 Genomes Project — is now publicly available on the Amazon Web Services (AWS) cloud, the National Institutes of Health and AWS jointly announced today. The public-private collaboration demonstrates the kind of solutions that may emerge from the Big Data Research and Development Initiative announced today by the White House Office of Science and Technology Policy (OSTP) during an event at the American Association for the Advancement of Science ... The Big Data initiative will initially engage at least six federal science agencies — including the NIH, the National Science Foundation, and the Department of Defense and the Department of Energy — committing more than $200 million to a collaborative effort to develop core technologies and other resources needed by researchers to manage and analyze enormous data sets. Among the NIH components participating in the Big Data initiative are the National Human Genome Research Institute (NHGRI) and the NIH National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine. NHGRI played a lead role in organizing and funding the international 1000 Genomes Project. NCBI, along with the European Bioinformatics Institute, Hinxton, England, began making 1000 Genomes Project data freely available to researchers in 2008. Since the project's launch, the data set has grown enormously: At 200 terabytes — the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs — the current 1000 Genomes Project records are a prime example of big data that has become so massive that few researchers have the computing power to use them. To help solve that problem, AWS has just posted the 1000 Genomes Project data for free as a public data set, providing a centralized repository on the Amazon Simple Storage Service. 
The data can be seamlessly accessed through services such as Amazon Elastic Compute Cloud and Amazon Elastic MapReduce, which provide organizations with the highly scalable resources needed to power big data and high performance computing applications often needed in research. Researchers pay only for the additional AWS resources they need to further process or analyze the data. The public-private collaboration to store the data in the AWS cloud allows any researcher to access and analyze the data at a fraction of the cost it would take for their institution to acquire the needed internet bandwidth, data storage and analytical computing capacity ... Cloud access also enables users to analyze the data much more quickly, as it eliminates the time-consuming download of data and because users can run their analyses over many servers at once ... Initiated in 2008, the 1000 Genomes Project is an international public-private consortium that aims to build the most detailed map of human genetic variation available, ultimately with data from the genomes of more than 2,600 people from 26 populations around the world. The project began with three pilot studies that assessed strategies for producing a catalog of genetic variants that are present at 1 percent or greater in the populations studied. Data from the pilot studies were released on AWS in 2010. The data now being released in the cloud include results from sequencing the DNA of some 1,700 people; the remaining 900 samples will be sequenced in 2012 and that data will be released to researchers as soon as possible. The new results identify genetic variation occurring in less than 1 percent of the study populations and which may make important genetic contributions to common diseases, such as cancer or diabetes ...”

Link:

http://www.nih.gov/news/health/mar2012/nhgri-29.htm

Updated:

08/16/2012, 06:08

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.medicine oa.biology oa.new oa.data oa.usa oa.nih oa.events oa.funding oa.1000genomes oa.biomedicine oa.bioinformatics oa.cloud oa.ostp oa.doe oa.nsf oa.amazon oa.ncbi oa.dod oa.nhgri oa.announcements

Authors:

abernard

Date tagged:

08/20/2012, 18:37

Date published:

03/31/2012, 16:45