Why Not Spare a Little Bandwidth for the Archive Team? – ProfHacker - Blogs - The Chronicle of Higher Education

abernard102@gmail.com 2014-02-03

Summary:

"Do you remember Seti@home? It was the first really widely loved example of a crowdsourced distributed computing project. You install a cool looking screensaver and in the background your computer crunches data on behalf of the noble cause of finding aliens in space. There are now many projects which take advantage of large networks of home computers to carry out tasks. The use of distributed computing for the “mining” in the virtual currency Bitcoin is another recent example from the news. The distributed computing project that is perhaps closest to my heart these days is the Archive Team Warrior project with Jason Scott as their spokesman, which helps archive the public content of large web services before they are buried in their digital graves. Their first great coup was in 2010, when they released a torrent file to download GeoCities, where a good chunk of the internet resided in the early days. I first found out about their activities while working on the Digital Archive of Japan’s 2011 Disasters, which included a large-scale web archiving element carried out in cooperation with the Internet Archive. It is hard to realize how much of the open web goes down in just a few months of time, and watching this process unfold in closeup after the 2011 disasters in Japan made me realize how monumental the challenge will be for historians in the future to capture some of the quieter corners of the net that constitute a particularly unique and rich heritage, especially when it comes to small scale projects and local communities in particular ... The way we can get involved is to install Virtual Box, mentioned in the first tutorial by William Turkel I introduced last week, and use it to install the 'Archive Team Warrior' (a custom installation of Debian Linux). This allows the Archive Team to distribute web scraping tasks to your computer while it is running.
To control its behavior, a simple local web interface is made available to you (accessible at http://localhost:8001/ when Archive Team Warrior is running), where you can choose the scraping project you want to be a part of and which shows you exactly when and how much it is downloading and processing. At the time of writing this posting, it seems the web services currently active in the Archive Team Warrior software have already been shut down, but there are plenty of services on the “Deathwatch” that the Archive Team is keeping their eye on in the coming years ..."
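
As a rough illustration of the local web interface mentioned in the summary, the Python sketch below checks whether anything answers HTTP at http://localhost:8001/. Only that address comes from the post; the function name warrior_interface_is_up, the timeout value, and the choice to treat any 2xx or 3xx response as "up" are assumptions made for this example and are not part of the Archive Team Warrior software.

```python
import http.client
import urllib.error
import urllib.request


def warrior_interface_is_up(url="http://localhost:8001/", timeout=3.0):
    """Return True if an HTTP server answers at `url` with a 2xx/3xx status.

    Hypothetical helper: the default URL is the address quoted in the post;
    everything else here is an illustrative assumption.
    """
    # Bypass any system-wide proxy settings, since the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a malformed reply.
        return False
```

Calling warrior_interface_is_up() with no arguments probes the default address. A False result only means that nothing answered in time, for example because the virtual machine is still starting up.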

Link:

http://chronicle.com/blogs/profhacker/why-not-spare-a-little-bandwidth-for-the-archive-team/55071

From feeds:

#edutech » ProfHacker
Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

scraping archiveteam archives software oa.archive_team oa.software

Authors:

Konrad M. Lawson

Date tagged:

02/03/2014, 17:40

Date published:

02/03/2014, 05:04