Textmining: Update, Wiley, Nature and Hargreaves. And Elsevier allows me unrestricted text-mining! Thanks!!!

“I shall continue to update on a daily basis.

[1] Hargreaves... We have formed a small group to coordinate our reply to Hargreaves and this will take place on the OKF open-science and open-access lists (and @ccess). Please let us know of useful experience in access to published material...

[2] Nature... Today Richard van Noorden (of Nature) posted a useful article on the current frustration within the research community about not being able to textmine when and where they want. It’s moderately well balanced. However it doesn’t say anything critical about Nature: ‘Nature Publishing Group in London, which publishes this journal, says that it does not charge existing subscribers to mine content to which they already have access, subject to contract.’ RvN didn’t say that NPG sent Max Haeussler a quote for 85,000 USD to mine Nature content. I talked with RvN, gave him a lot of material for his article and pointed him to MaxH. I said that I would expect RvN to be objective in his report and not favour Nature. He said he would, but had to get his copy agreed. In the end he decided not to use any of my material – that’s fine, journalists collect more material than they can use.

[3] Elsevier... I have had a useful set of email communications with Alicia Wise of Elsevier. Today she has agreed that I can go ahead with textmining as I wish!...

[Text of the letter from AW to PMR] ‘As I indicated to you when we met in Oxford, we (at Elsevier) have no problem in principle with you text mining for research purposes. There are some practical matters to resolve through discussion. With regret I have formed the view that you are not – at this time – really seeking practical solutions. If this changes please do let me know as we remain willing to work with you and other colleagues at Cambridge – and elsewhere – who need and want to text mine. While I am here, I would like to stress the real value of librarians in these discussions.
Your library colleagues at Cambridge have – both directly and through JISC Collections – relationships and existing agreements with a wide array of publishers. They are constructive partners for us all in facilitating text mining and scaling up as we move forward’ ...

I am actively seeking practical solutions. I’m going to start tomorrow! (I’ll let our library know in case there are teething glitches). Last week we (Daniel Lowe) mined 1,000,000 (1 million) chemical reactions from US patents... We are going to put the results up on DSpace and Figshare and our own Quixote so everyone else can do research on them as well. NOTE: I didn’t need any help from the USPTO or Cambridge Library. I simply want to do the same with papers in Elsevier journals... This is research because the science is done for a different purpose than invention and generally is aimed towards novelty rather than production. So we get a whole new set of chemistry. It’s also done on a different scale – much novel chemistry doesn’t scale directly into production...

I assume you will trust me as to what RESEARCH in chemical text-mining is – I’m a world expert, honoured by the ACS for this work. And I assume you will trust me not to publish copyright content – I haven’t done so in 10 years of semantic research. I shan’t publish the VoR PDF nor the author’s final manuscript. But I shall publish all the factual data on which the RESEARCH relies and all the bibliography metadata which is required to manage the output.

So here’s what I am going to do:

[a] Use our Pubcrawler software to systematically retrieve all publications from Elsevier journals...
[b] We shall determine which papers contain chemistry using our OSCAR4 software. This is the best Open Source software for chemical textmining...
[c] We shall filter the articles into those that have a significant proportion of chemistry and those that don’t and concentrate on the former...
[d] We shall then extract and analyse the chemical names and formulae.
Where possible we shall try to match redundant information.
[e] We shall extract the factual data (spectra) and check their validity against the chemical structure using our OSCAR2 software... We’ll show where papers contain errors...
[f] We shall use computational chemistry to compute the properties of the compounds and compare them with experiment. That’s really valuable RESEARCH. 15% of all supercomputer time is on compchem and there is a desperate need to calibrate its usefulness.
[g] We shall extract the chemical reactions. There is very little research done in academia on the phenomenology of published reactions – we did some of this last year at the Open Science Summit where we analysed chemical reactions for eco-friendliness (the “Green Chain Reaction”). We’ll be able now to show whether the chemistry in Elsevier journals is more eco-friendly than in patents...”
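
Steps [b] and [c] above amount to scoring each article for chemical content and keeping those above a cut-off. The following is only a minimal sketch of that idea, not the OSCAR4 pipeline itself: the regular expression, the suffix list, the function names (`looks_chemical`, `chemistry_fraction`, `partition`) and the 10% threshold are all hypothetical stand-ins chosen for illustration, and a real chemical entity recogniser would be far more accurate than this crude token test.

```python
import re

# Hypothetical stand-in for a real recogniser such as OSCAR4.
# Formula-like tokens: runs of element-style symbols with optional counts, e.g. "H2O".
FORMULA = re.compile(r"^(?:[A-Z][a-z]?\d*)+$")
# Illustrative name endings only; a real lexicon would be much larger.
NAME_SUFFIXES = ("chloride", "oxide", "amine", "anol", "benzene")


def looks_chemical(token: str) -> bool:
    """Crude guess at whether a token is a chemical name or formula."""
    # Require a digit so ordinary capitalised words are not counted as formulae.
    if FORMULA.match(token) and any(ch.isdigit() for ch in token):
        return True
    return token.lower().endswith(NAME_SUFFIXES)


def chemistry_fraction(text: str) -> float:
    """Fraction of alphanumeric tokens in the text that look chemical."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if looks_chemical(token))
    return hits / len(tokens)


def partition(articles: dict[str, str], threshold: float = 0.1):
    """Split article ids into (chemistry-rich, other) using an assumed threshold."""
    rich, other = [], []
    for article_id, text in articles.items():
        target = rich if chemistry_fraction(text) >= threshold else other
        target.append(article_id)
    return rich, other


if __name__ == "__main__":
    sample = {
        "a1": "Sodium chloride was dissolved in ethanol and H2O",
        "a2": "The journal published a report on access policy",
    }
    print(partition(sample))
```

The threshold is the main design choice here: set too low, general-science articles that mention a solvent in passing get through; set too high, short communications with dense chemistry in a few sentences are lost. In practice it would need tuning against a hand-labelled sample.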




