Exploiting the data mine | Chemistry World

abernard102@gmail.com 2015-08-15

Summary:

" ... Historically, the chemical community has lagged behind molecular biologists in developing both the policies and the software to enable its data to be widely used. This is not just because molecular biology was arguably ahead of chemistry in becoming a ‘big data’ discipline, but through deliberate decisions by leading biologists. The agreement in 1996 that all data obtained in the Human Genome Project should be made freely accessible within 24 hours is rightly regarded as a model for open access in all scientific disciplines. Chemistry, however, is now catching up. Anne Hersey, coordinator of the open access ChEMBL database group at the EBI, published an editorial in the journal Future Medicinal Chemistry in 2012 arguing that pharmaceutical chemists should learn from the biological community and make as much of their data as possible widely accessible, preferably in machine-readable formats. Drug discovery, she suggested, is becoming a much more diverse process, with academic and charitable labs (with no big budgets to spend on data access) beginning to play as important a role as their commercial counterparts. Three years later, evidence that this change has started in earnest is now apparent ... One organisation helping with the move towards open access databases is the Research Data Alliance (RDA). The RDA – set up with funding from the European Commission, the US National Science Foundation and the Australian government – supports and promotes open data and data sharing in all its forms across all scientific disciplines. It is free for anyone to join, and volunteer experts come together in working and interest groups to develop the technical and social infrastructure that is needed for open data sharing ... To date, medicinal chemists are probably the most active of all the chemistry sub-disciplines in terms of depositing, using and sharing open data.
ChEMBL is one of the most widely used open chemical databases, and it focuses entirely on the structures and properties of pharmacologically active molecules. This database currently holds about 1.5 million such compounds, with 13.5 million related bioactivity data points ... ChemSpider, another freely available and widely used database, was set up by Antony Williams in the US to automatically trawl the internet for chemical data and integrate it into a database. It was initially almost a hobby project, but since it was acquired by the Royal Society of Chemistry in 2009 it has now grown to 34 million compounds and covers a larger and more representative part of ‘chemical space’ than the pharmacologically-focused ChEMBL ... Extracting chemical (or any other) data from the literature and converting it into database entries requires a set of methods that are grouped together under the term ‘text mining’. This is now a recognised sub-discipline within informatics, and it has many applications in chemistry, the life sciences and medicine. The National Centre for Text Mining in Manchester (Nactem), UK, has close links with the University of Manchester’s Centre for Integrative Systems Biology ..."

Link:

http://www.rsc.org/chemistryworld/2015/08/data-mining-bioinformatics

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.comment oa.data oa.open_science oa.chemistry oa.mining oa.databases oa.chemspider oa.chembl

Date tagged:

08/15/2015, 07:37

Date published:

08/15/2015, 03:37