Open Source Genomics - Open Enterprise

abernard102@gmail.com 2013-12-24

Summary:

"There's a revolution underway. It's digital, but not in the computing sector. I'm referring to the world of genomics, which deals with the data that resides inside all living things: DNA. As most people know, DNA uses four chemical compounds - adenine, cytosine, guanine and thymine - to encode various structures, most notably proteins, which are represented by stretches of DNA called genes. Those four chemical 'letters' - A, C, G, T - actually form not a binary information system, but a quaternary one (it's trivial to convert between them). That makes genomics an inherently digital domain, and therefore one that is ideally suited to computers for storage and analysis.
That's been true for a while, but the costs of elucidating the DNA of an organism - 'sequencing' it - have meant that this has only taken place in research laboratories. But costs for sequencing are dropping so rapidly - much faster than Moore's Law - that it will soon be possible to sequence anybody's complete DNA for a few hundred pounds, then tens of pounds, and finally for a vanishingly small amount. Since our DNA contains all kinds of hints about our genetic make-up, and our predisposition to certain diseases, once sequencing costs fall to this level, there will be a huge move to make our genome the basic barcode of our lives, since it is not only incredibly informative, it is unique - even identical twins aren't identical at the genetic level.
However, each human genome has around three billion chemical letters, and the raw output of sequencers runs to many tens of gigabytes for each. This means that hospitals, say, will need to be able to manage petabytes and more of genomic data, as quickly and as cheaply as possible. This makes the new world of genomic medicine a natural for open source, which scales well, and is much more economical than alternatives. One new company hoping to exploit the natural strengths of open source in the field of medical genomics is Curoverse. Here's how it introduces itself and its approach:
'Next-generation sequencing is driving an explosion in big data that poses unique challenges for bioinformaticians, computational biologists, and the IT teams that support them. At Curoverse, we’re addressing those infrastructure challenges with a platform that makes storing, organizing, and processing these data faster, easier, and more affordable. (We’ll be launching in 2014.) Curoverse is entirely built with open source software. At its core, Curoverse uses a free and open source system called Arvados that was first developed for the Harvard Personal Genome Project. Arvados is designed to address the unique data management, computation, and sharing requirements driven by genomic and biomedical data. You can learn more about the project and join the community at arvados.org. During the last 10 years, web-scale businesses have produced an array of innovations in distributed computing, virtualization, file storage, and big-data processing. These innovations have only just begun to make their way into bioinformatics cores. At Curoverse, we’re dedicated to translating these technologies into products that address the unique requirements of the biomedical industry' ...
'Bioinformaticians' are just the people who use computers to analyse genomic data. The idea behind Arvados - and hence Curoverse - is to create a new and totally open platform on top of which bioinformaticians in hospitals and companies can develop and run genome-based applications. Curoverse will adopt a classic open source business model: the software will be free, but Curoverse will offer support and service contracts. That might mean running a hospital's genomic holdings on Curoverse's premises, allowing access over the Internet; running on a public cloud like AWS; or taking charge of the hospital's own systems on site.
Curoverse also hopes that extra layers of information will be added to systems running its code. For example, imaging data requires large storage capacities, as will sensor data once it is captured routinely and continuously. Curoverse says that the techniques applied to managing petabytes of genomic data can be applied to these other domains too.
Given the incredible advances in sequencing, and the corresponding fall in costs to sequence complete human genomes, it seems likely that digital DNA will soon form the foundation of future health systems (with plenty of tricky privacy and security issues that need to be resolved as a result). That means there will be a big market for the kind of system that Curoverse is offering ..."
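
Note on the quaternary-to-binary remark in the summary: with four letters, each base fits in two bits, so four bases fit in one byte. The sketch below is only an illustration of that conversion, not anything taken from the article, Arvados, or Curoverse. The bit assignment (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for the example, and it ignores ambiguity codes such as N and the quality scores that make raw sequencer output so much larger.

```python
# Illustrative two-bit encoding of DNA letters (assumed mapping, not a standard
# mandated by the article): four letters need exactly two bits each.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so unpacking always reads top bits first.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    # The final byte may contain padding, so trim to the known length.
    return "".join(bases[:length])


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at two bits per base."""
    return (n_bases + 3) // 4
```

Under these assumptions, `packed_size_bytes(3_000_000_000)` gives 750,000,000 bytes, so the roughly three billion letters of one genome would pack into about 750 MB. The "many tens of gigabytes" the summary mentions refers to raw sequencer output, which is a different and much bulkier thing than a finished sequence.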

Link:

http://blogs.computerworlduk.com/open-enterprise/2013/12/open-source-genomics/index.htm

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.data oa.business_models oa.comment oa.floss oa.bioinformatics oa.cloud oa.economics_of oa.genomics oa.arvados oa.curoverse

Date tagged:

12/24/2013, 07:54

Date published:

12/24/2013, 02:54