Information mining and Hargreaves: I set out the absolute rights for readers. Non-negotiable


“As I have already blogged, I have been asked by Ben Hawes at the UK Intellectual Property Office to respond to the Hargreaves report on ‘textmining’. I shall be getting help from my OKF colleagues. The issues are, in my mind, simple: [1] Legitimate human readers of the literature (‘subscribers’) have a right to extract factual information from the literature and have exercised this for 200 years. [2] We can now do this with machines, often better than with humans. It’s vastly faster and cheaper. It increases the value of the literature. [3] The publishers forbid us to do this and put in place legal and technical obstacles on top of normal copyright. [4] We are now demanding the removal of these obstacles. This is not a negotiation, it’s a statement of our absolute right... In essence we shall report to Hargreaves: Our position and the justification for it [and] Whether the publishers have agreed that these are our rights... We shall contact 9 publishers tomorrow through known contacts; this represents our best approach to non-repudiation. Since the publishers have no formal mechanism for readers to make formal enquiries (in itself institutionalised unhelpfulness to readers) it will be done through this blog and email. There is enough evidence to show that all publishers will be aware of this request and if they wish to be helpful they can... Background and clarification... Human abstracters have for centuries abstracted from and commented on the scholarly literature and made the results public without requiring permission from publishers and/or authors. Indeed science is based on being able to do this. In our present request I shall confine the request to “facts” in scientific papers, but the permission I am asserting extends to abstracting papers and to commenting and other activities practised in the paper era. None of this requires new permissions; it is explicitly and implicitly part of current practice.
If I read a paper I can write an abstract; I can also critique parts (e.g. reproduce paragraphs and comment in detail on what the authors said). Refusal to allow this is a direct attack on the integrity of science. I do not have to be the owner of a scientific article to do this. If I borrow a journal from the public library I can sit at home and write abstracts on every paper. I would strongly urge anyone interested in abstraction, commentary, parody, etc. to make representation to Hargreaves... Factual information is frequently contained in graphs, tables, images, speech and video. Therefore ‘text-mining’ is a subset of information-mining and I shall use that term. Indeed our software can understand simple human spoken discourse about chemical reactions and extract the facts... Alicia Wise from Elsevier wants to know what I want to do with the content. There is no reason why I should have to justify what I do to Elsevier, but here it is: I want to extract as many facts as I can from the scientific literature and publish them (as CC0) for me and others to do science with, to build new scientific tools and improve the quality of science. It is my right. There is absolutely no reason why anyone should need to involve the publisher in information-mining... Scientists should not have to ask permission, nor should they have to ‘use the publisher API’, and they should never have to pay... Legitimate publisher concerns about information-mining... There is only one valid reason for liaising with the publisher – the possibility of server overload. This is a negligible problem if done responsibly – for example if one allows a short pause between each download request... I suspect the following concerns: [1] Peter Murray-Rust will steal and publish ‘our’ content... [2] Peter Murray-Rust and his robots will find errors in our papers. I hope that no-one is afraid of this.
It is the purpose of science to find errors and our robots are better than humans at finding many types of error. The publishers’ refusal to allow us to validate the literature is damaging science, not enhancing it. [3] Peter Murray-Rust will create disruptive technology that will seriously disturb our cosy monopolies. I think this is the real crux... Elsevier run a chemical database where they abstract information from the literature (Reaxys), which probably has a revenue stream of several hundred million USD [ACS do the same ("Chemical Abstracts", CAS) and estimates are in the range 200-500 million USD]. So to preserve their monopoly they prevent me mining information. Is it a real threat? One chemist against Elsevier? Yes. Because I have many people who think the same way. Other walled gardens include bibliography and citations. It’s possible to extract both of these robotically and we have the technology to do this. But Scopus and Web of Science will be disrupted by this... The robots have no benefit to the subscriber and are deeply insulting. I am prepared to agree that we should be considerate in our crawling. I have been very considerate so far and have verbally agreed this with at least two publishers. It’s insulting to suggest that Universities are incapable of writing



