2001: The Internet Gets a Memory With the Wayback Machine

Cybercultural: Internet History 2025-10-21

Internet Archive website in November 2001, after the launch of the Wayback Machine in October 2001.

If the future is going to devolve into chaos, then the present ought to be preserved somehow, online. That was part of the thinking behind the Wayback Machine, a public archive of web pages that was launched on October 24, 2001, at a library at the University of California at Berkeley.

At the event, Internet Archive founder Brewster Kahle demonstrated the new time machine by pulling up a web page from the White House website from September 10, 1996, featuring President Clinton declaring the prevention of hijacking and terrorist attacks in the air a priority. Kahle then showed a special collection of archived websites about 9/11 (still, of course, fresh in the collective memory).
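For readers who want to try that kind of lookup themselves, here is a minimal sketch of how a Wayback Machine address is put together: the `web.archive.org/web/` prefix, a 14-digit timestamp, then the original URL. The helper name `wayback_url` is my own, and the White House address is an assumption for illustration (the exact page Kahle showed isn't recorded here). The Wayback Machine generally redirects to the capture nearest the requested timestamp, so the page you land on may carry a different date.

```python
from datetime import datetime

# Public replay prefix used by the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a replay URL asking for the capture closest to `when`.

    Hypothetical helper for illustration; it only formats a string
    and makes no network request.
    """
    # Wayback timestamps are written as YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Assumed URL: the White House homepage, on the date mentioned above.
    print(wayback_url("http://www.whitehouse.gov/", datetime(1996, 9, 10)))
```

Pasting the printed address into a browser should take you to whatever capture the archive holds closest to that date, if it holds one at all.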

September 11 archive in the Wayback Machine; screenshot from October 2001.

The Internet Archive had begun operating five years before the public launch of the Wayback Machine. That year, 1996, Kahle wrote an essay that was eventually published by Scientific American magazine. He wrote, “While the Internet’s World Wide Web is unprecedented in spreading the popular voice of millions that would never have been published before, no one recorded these documents and images from 1 year ago.”

So that’s what the Internet Archive set out to do. Indeed, one of the websites archived the month it launched, October 1996, was davidbowie.com — which I've written about extensively here on Cybercultural. This was the version before BowieNet, when it was still focused on the Outside album. However, what the Wayback Machine preserved — or failed to preserve — of Bowie’s website in October 1996 also illustrates a key problem of archiving digital content.

Bowie website in October 1996; via Wayback Machine.

On the homepage were four prominent links: “Enter site,” “site index,” “new single ‘telling lies’,” and “What’s new.” All four linked to a slice of www.davidbowie.com/maps/splash.map, a URL that indicated an animated “splash page” created with Macromedia Flash. It turns out that the Internet Archive could not preserve Flash animations at that time. So if you click any of those links in the Wayback Machine, you’ll get a “page does not exist” error. Even some of the HTML versions of pages from that time were not captured (such as sitemap.html), and many of the images are missing.

When the Internet Archive made the Wayback Machine available for public use in October 2001, it admitted that many sites weren’t fully preserved. “If you look at the collection of archived sites, you will find some broken pages, missing graphics, and some sites that aren't archived at all,” the organisation explained in an FAQ. “The Internet Archive has tried to create a complete archive, but has had difficulties with some sites, because the link structure was not straightforward to crawl.”

Wayback Machine soon after its public launch in October 2001.

So with Bowie’s website, the use of Flash for a menu meant that the Internet Archive could not easily read its structure back in 1996. Sites that required a login to browse could not be captured either, and this particular issue affected how much of BowieNet was preserved in the Wayback Machine. BowieNet was launched in September 1998, but there are no copies of it in the Wayback Machine until late November 1999. Indeed, right up until the end of 2001, there are very few copies of Bowie’s website, even of its public pages, in the Internet Archive.

It wasn’t just technical issues that prevented a complete archiving of the World Wide Web. Sometimes early websites, even prominent ones, were simply overlooked. For example, CNN.com was launched in 1995 but the Wayback Machine didn’t start saving it until mid-2000. Even the BBC’s website, which was first captured in December 1996, has large holes in its archives until early 2000.

Internet Archive founder Brewster Kahle in 1991; photo by Carl Malamud via Flickr.

When it comes to our cultural history, the Internet Archive organisation has achieved a remarkable feat in preserving a decent chunk of the web from the 1990s and early 2000s. Nevertheless, the gaps in its archive are notable. In an analysis published on Forbes in November 2015, the academic Kalev Leetaru wrote, “When archiving an infinite web with finite resources, countless decisions must be made as to which narrow slices of the web to preserve.”

“Websites are like shifting sands,” Brewster Kahle himself admitted in a 2002 interview. “The average life of a Web page is 100 days. After that either it's changed or it disappears. So our intellectual society is built on sand.”

Even the formats used to archive websites had morphed a few times by the time Kahle gave that interview. “We recorded 1996, 1997 and 1998 on tape,” he said. “By 1999 we were just using hard drives, and now we're using a new generation of hard drives.”

Salon article about Internet Archive, November 2001; preserved via Wayback Machine.

In a follow-up article for Forbes in January 2016, Leetaru took a deeper look at the Internet Archive. He found that it “operates far more like a traditional library archive than a modern commercial search engine.” He meant that the Internet Archive doesn’t necessarily crawl everything on the web, and that it relies on “an exquisitely complex assemblage of datasets and partners” to build its unique web repository.

“Rather than a single centralized and standardized continuous crawling farm, the Archive’s holdings are comprised of millions of files in thousands of collections from hundreds of partners, all woven together into a rich collage which the Archive preserves as custodian and curator,” wrote Leetaru. He added that the Wayback Machine is “merely a public interface to an unknown fraction of these holdings.”

The Wayback Machine is impressive enough, despite its quirks, but Kahle’s ambitions didn’t stop there. In a profile by The New York Times the week after the launch of the Wayback Machine, he made clear that he wanted to preserve all forms of culture online:

“He doesn't want to stop with Web pages. Mr. Kahle (pronounced ‘Kale’) is inviting copyright holders for books, movies, music and more to add their creations to the mix. Ultimately, he hopes to finally deliver the kind of library that the ancients tried to create in Alexandria.”

Brewster Kahle's seminal Scientific American article, February 1997.

The Times article also raised possible concerns over copyright. Stanford University law professor Lawrence Lessig, a supporter of the Internet Archive, warned that copyright holders would eventually drag Kahle into court.

In time, copyright did indeed become a problem for the Internet Archive, especially regarding books and music. Two large lawsuits have only recently been resolved: in 2023 the book publisher Hachette won a case against the IA over its lending of digital books during the COVID-19 pandemic (the IA lost its appeal in 2024), and a case brought by Universal Music Group was settled in September 2025.

Copyright issues aside, the biggest achievement of Brewster Kahle is that he legitimised web content as a valid form of culture worth preserving. That we can revisit previous versions of davidbowie.com and other old websites is a blessing, and it’s all thanks to the launch of the Wayback Machine in 2001. On a personal note, this very website — Cybercultural — would not exist without Kahle's time machine.

Brewster Kahle leading a tour of the Internet Archive in October 2023; photos by the author.

Postscript: Today, October 21, 2025, there is a rally in San Francisco in support of the Internet Archive. Tomorrow, there's a party to celebrate 1 trillion webpages archived. Long live the Internet Archive!