Public can now search UK government’s entire digital archive

beSpacific 2018-05-20

BusinessCloud: “The British government’s entire online presence comprising billions of web pages has been indexed and digitally archived to the cloud for the first time. Manchester tech firm MirrorWeb has devised an all-new indexing to create an accessible, searchable and user-friendly resource for the public. The National Archives’ gigantic 120TB web archive encompasses billions of web pages – from every government department website and social media account – from 1996 to the present. It took MirrorWeb – named among our 101 Rising Stars of the UK Start-up Scene last year – just two weeks to transfer the data from 72 hard drives at The National Archives to internal hard drives before transferring and digitally archiving more than two decades of government internet history to the cloud. As part of a four-year contract, MirrorWeb was tasked with both moving the data to the cloud using Amazon Web Services as well as indexing it. Indexing the data meant that MirrorWeb had to write a complete replacement for the UK Government Web Archives’ previous search functionality. As a result, 1.4bn documents were indexed and are now accessible and searchable to researchers, students and the members of the public who need to use them, enabling them to view websites and social media content in their original form as well as search for content on specific topics. John Sheridan, digital director of The National Archives, said: ‘We are preserving 1,000 years of British history and a big part of that is preserving the digital record of government today…’”