Lessons from the Library: Extreme Minimalist Scaling at Pirate Ebook Platforms
peter.suber's bookmarks 2022-06-27
Abstract: At 33TB of data in its main collection, the highly illegal Library Genesis project is one of the largest repositories of copyright-violating educational ebooks ever created. Established over a decade ago in 2008, the goal of Library Genesis is nothing short of a modern Library of Alexandria, albeit without anyone’s legal sanction. As one of its administrators wrote: “within decades, generations of people everywhere in the world will grow up with access to the best scientific texts of all time. [...] [T]he quality and accessibility of education to the poor will grow dramatically too. Frankly, I see this as the only way to naturally improve mankind: we need to make all the information available to them at any time” [Bodó 2018b]. Rooted in the communist principles of its Russian homeland, and particularly in the Soviet isolationist copyright policies of the twentieth century, Library Genesis is both a formidable resource and a threat to conventional academic publishers. The Library Genesis database had just short of 1.2m records (books) in 2014 [Bodó 2018a]. As of January 2020, that figure has more than doubled to 2.5m books. In this article, I examine the minimal computational design choices taken by this maximal-in-intent, illicit archive of epistemological dissent and how such decisions have shaped the scalability and growth of the platform. These include Library Genesis’s numerical subdivision of record identifiers into “buckets” to work around directory file limitations in the GNU/Linux operating system; its use of MD5 hashing of filenames within directories capped at 1,000 files to avoid future hashing collisions while allowing for on-disk integrity checking; and its use of the MySQL socket/network server rather than SQLite or a similar disk-based database.
Beyond these computational details, though, the theoretical tension that this article highlights is between the path dependencies that are set in (illegal) computational projects with goals of absolute abundance and maximalist capacity, and the minimalist design principles that such projects must adopt at the outset to ensure a degree of scalability. I also query the ways in which the project’s contested mission statements target an economic (geographic) audience demographic with only minimalist access to high-capacity computing resources. Finally, I examine the limits on the scalability of distributing Library Genesis through its torrent archive and other distributed networking technologies such as IPFS, which, despite their promise of peer-to-peer redundancy, fall down on an archive of this size.
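The bucketing and hashing scheme named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact on-disk layout, so everything here is an assumption made for illustration: the function names (bucket_for, stored_path, verify) are hypothetical, buckets are assumed to be directories named after the lowest record identifier they hold, and the filename is assumed to be the MD5 digest of the file's contents, which is what would make on-disk integrity checking possible.

```python
import hashlib
from pathlib import Path

# Cap of 1,000 files per directory, as stated in the abstract.
BUCKET_SIZE = 1000


def bucket_for(record_id: int, bucket_size: int = BUCKET_SIZE) -> int:
    """Return the lowest record ID of the bucket that holds record_id.

    Assumption: buckets are consecutive ID ranges of bucket_size records.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return (record_id // bucket_size) * bucket_size


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the file contents (assumed to serve as filename)."""
    return hashlib.md5(data).hexdigest()


def stored_path(root: Path, record_id: int, data: bytes) -> Path:
    """Hypothetical layout: <root>/<bucket>/<md5 of contents>."""
    return root / str(bucket_for(record_id)) / md5_hex(data)


def verify(path: Path) -> bool:
    """Integrity check: re-hash the file and compare with its own name."""
    return md5_hex(path.read_bytes()) == path.name
```

Under these assumptions, 2.5m records at 1,000 files per directory would occupy roughly 2,500 bucket directories, so neither the top-level directory nor any bucket grows large, and any file can be checked for corruption without consulting the database.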