Text-mining the scholarly literature: towards a set of universal Principles; Update and strategy

abernard102@gmail.com 2012-08-20

Summary:

“For some years I have seen the primary literature as an enormous untapped resource of scholarly information. We humans are very good at some aspects of ‘reading the literature’ but there are many areas where machines are better and should be used. These include scale (hundreds of thousands of manuscripts), checking, validation, transformation (e.g. scientific units), deduction (many papers have implicit semantics), aggregation of knowledge, and much more. We are now reaching the time when the technology of ‘text-mining’ is mature enough to deploy and, for example, my group and I have developed among the best tools in the world for mining chemistry. In general the readers of the scholarly literature (who may include the #scholarlypoor) have been seriously frustrated by the restrictions imposed by publishers and universally agreed by librarians. Most subscriptions to most major journals have terms forbidding readers to mine/crawl/index/extract etc. This is not a consequence of copyright – it is an additional restriction imposed by publishers and apparently automatically assented to by academic purchasing systems (mainly libraries). This automatic assent has done scholarship a grave disservice, so I give the library community a chance to correct the historical record: Has any library ever publicly challenged the terms of use [on mining] set by publishers? I haven’t seen any. But I’d be grateful to know of public cases, and what happened. My current view is that publishers set conditions and that libraries accept them verbatim, which, unfortunately, means that they don’t have a track record of fighting for text-mining or other freedoms. Moving on, the UK Hargreaves report has recommended removing these restrictions (which are not legally required) and also modifying copyright law. My grapevine suggests there is a high probability that significant changes will be made and that ‘text-mining’ will become widely available without requiring explicit permission.
We should prepare for this, and any responsible publisher and library/purchaser should be preparing for this. A month ago I and colleagues in OKF submitted cases to the Hargreaves process. As part of that I asked 6 major publishers whether I could “text-mine” their journals. Naomi Lillie of OKF is summarising the results and I will keep you in suspense till then. It’s fair to say some were helpful, some were not and some were fuzzy (for whatever motivation). A number of publishers said we should discuss it with the library. There is no need for this. I and my group can text-mine material by ourselves – in one week Daniel Lowe extracted 500,000 chemical reactions from the US Patent Office without needing any help. Nick Day has built PubCrawler and extracted 200,000 crystal structures from supplemental information without any help. The only thing I need is: [1] An assurance I won’t be sued for behaving like a responsible scholar. [2] An assurance that my institution won’t get cut off for (my) responsible behaviour. In case anyone in the publishing or library communities doesn’t understand what ‘responsible’ means, it means: ‘I do not intend deliberately to re-publish the publisher’s manuscripts (“the PDF”) in bulk without valid scholarly reason.’ So this post asserts my absolute right as a subscriber to the scholarly literature to carry out textmining and to disseminate the results to anyone. I do not need any other permissions... At present, therefore, a group of us – under the aegis of the Open Knowledge Foundation – is drafting a set of principles for textmining. They include: [1] Heather Piwowar. Heather has written several blogposts (http://researchremix.wordpress.com/ ) about text-mining. They include negotiations with Elsevier (which include the need for Elsevier and librarians to give her permission) and more recently a manifesto (http://researchremix.wordpress.com/2012/04/20/new-fron/ ). [2] Maximilian Haussler.
See (http://blogs.ch.cam.ac.uk/pmr/2012/03/09/textmining-update-max-haussler%E2%80%99s-questions-to-publishers-they-have-a-duty-to-reply/ ). Max was quoted 85,000 USD by NPG to mine their content (I think this has been altered to 0?). He and colleagues have fought for the right and he has submitted a detailed case to the US government. [3] Diane Cabell and Jenny Molloy, OKF. Diane is a specialist in intellectual property law and has helped to craft the OKF open-science response to Hargreaves. [4] Ross Mounce. Panton fellow (http://about.me/rossmounce ). Ross has created a superb and damning summary of publishers’ distortion of the term ‘Open Access’ in paid hybrid journals. Ross and I are now working on the technology and strategy of textmining. We shall come up with a manifesto/set-of-principles. This will be a statement of our rights and our responsibilities. It is not a negotiation, any more than Tom Paine or the Founding Fathers negotiated in the construction of their declarations. Or, more recently, the BBB declarations of Open Access. Those declarations are priceless – it’s just a pity that there are not enough who believe in them enough to push for their universa

Link:

http://blogs.ch.cam.ac.uk/pmr/2012/04/25/text-mining-the-scholarly-literature-towards-a-set-of-universal-principles-update-and-strategy/

Updated:

08/16/2012, 06:08

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.npg oa.business_models oa.publishers oa.mining oa.comment oa.legislation oa.elsevier oa.copyright oa.libraries oa.panton oa.declarations oa.consultations oa.litigation oa.librarians oa.prices oa.hybrid oa.patents oa.fees oa.okfn oa.hargreaves oa.uspto

Authors:

abernard

Date tagged:

08/20/2012, 18:05

Date published:

04/25/2012, 17:10