The Internet Archive Will Digitize & Preserve Millions of Academic Articles with Its New Database, "Internet Archive Scholar" | Open Culture
flavoursofopenscience's bookmarks 2020-09-23
Summary:
Open access publishing has, indeed, made academic research more accessible, but in “the move from physical academic journals to digitally-accessible papers,” Samantha Cole writes at Vice, it has also become “more precarious to preserve…. If an institution stops paying for web hosting or changes servers, the research within could disappear.” At least a couple hundred open access journals vanished in this way between 2000 and 2019, a new study published on arXiv found. Another 900 journals are in danger of meeting the same fate.
The journals in peril include scholarship in the humanities and sciences, though many publications may only be of interest to historians, given the speed at which scientific research tends to move. In any case, “there shouldn’t really be any decay or loss in scientific publications, particularly those that have been open on the web,” says study co-author Mikael Laakso, information scientist at the Hanken School of Economics in Helsinki. Yet, in digital publishing, there are no printed copies in university libraries, catalogued and maintained by librarians.
To fill the need, the Internet Archive has created its own scholarly search platform, a “fulltext search index” that includes “over 25 million research articles and other scholarly documents” preserved on its servers. These collections span digitized and original digital articles published from the 18th century to “the latest Open Access conference proceedings and pre-prints crawled from the World Wide Web.” Content in this search index comes in one of three forms:
- public web content in the Wayback Machine web archives (web.archive.org), either identified from historic collecting, crawled specifically to ensure long-term access to scholarly materials, or crawled at the direction of Archive-It partners
- digitized print material from paper and microform collections purchased and scanned by Internet Archive or its partners
- general materials on the archive.org collections, including content from partner organizations, uploads from the general public, and mirrors of other projects
The project is still in “alpha” and “has several bugs,” the site cautions, but it could, when it’s fully up and running, become part of a much-needed revolution in academic research, provided the major academic publishers don’t find some legal pretext to shut it down.
[...]