Open and Shut?: A New Declaration of Rights: Open Content Mining
Use the link to access the full post, including a PDF outlining the history of and recent developments in the content mining debate and a transcript of the interview described in the following opening statement: “In a recent investment report, analyst Claudio Aspesi concluded that a new front had opened up in the Open Access (OA) debate. Writing in April, Aspesi noted that academics are ‘increasingly protesting the limitations to the usage of the information and data contained in the articles published through subscription models, and, in particular, to the practice of text mining articles.’ Aspesi is right, and a central figure in this battleground is University of Cambridge chemist Peter Murray-Rust. A long-time advocate for open data, Murray-Rust is now spearheading an initiative to draft a ‘Content Mining Declaration’.
What is the background to this? When I interviewed Peter Murray-Rust in 2008, he expressed considerable frustration at the difficulties he was experiencing in trying to extract and reuse the data published in scholarly journals, even where his university had paid an electronic licence to access the content. However, he was having huge problems achieving this, not because of any technical issue, but because of uncertainty over copyright and publishers’ insistence that a licence to read journals does not encompass the right to mine them with software... In pursuit of his dream, Murray-Rust became a formative voice in the creation of the open data movement. Open data, Murray-Rust explained to me in 2008, is data ‘free of any restraint on access and on reuse.’ Recently, however, governments have tended to lead the way in urging for open data, spawning a generation of data wranglers; open scientific information has often lagged behind, but is now beginning to be seen as a central issue. Four years later Murray-Rust is still frustrated.
He is not, however, a man to give up, and he continues his advocacy today under the rubric of ‘open content mining’. Essentially, this is text mining plus. As Murray-Rust explains today, he views the mining of scholarly journals as a hierarchical activity, with content mining encompassing not just the mining of text and data, but other types of content too, including images, tables, graphs, audio, and video. Simply using the term ‘text mining’, he adds, ‘might imply that anything other than text should be protected by the “content provider”. However, I and others can extract factual information from a wide range of material.’ The good news is that the research community is finally beginning to understand what Murray-Rust has been ‘banging on about’ for all these years, as are research funders and governments, and Murray-Rust believes the door to what he wants is finally beginning to open. However, he says, it is imperative that text mining advocates push hard at that open door if they want to achieve their objectives. To this end, Murray-Rust recently convened an ad hoc group of interested parties to draft what he calls a ‘Content Mining Declaration’ (disclosure: I am a member of the group).”