New 5 Billion Page Web Index with Page Rank Now Available for Free from Common Crawl Foundation

Connotea Imports 2012-07-31

Summary:

"A freely accessible index of 5 billion web pages, their page rank, their link graphs and other metadata, hosted on Amazon EC2, was announced today by the Common Crawl Foundation. "It is crucial [in] our information-based society that Web crawl data be open and accessible to anyone who desires to utilize it," writes Foundation director Lisa Green on the organization's blog. The Foundation is an organization dedicated to leveraging the falling costs of crawling and storage for the benefit of "individuals, academic groups, small start-ups, big companies, governments and nonprofits." It's lead by Gilad Elbaz, the forefather of Google AdSense and the CEO of data platform startup Factual. Joining Elbaz on the Foundation board is internet public domain champion Carl Malamud and semantic web serial entrepreneur Nova Spivack. Director Lisa Green came to the Foundation by way of Creative Commons...."

Link:

http://www.readwriteweb.com/archives/common_crawl_foundation_announces_5_billion_page_w.php

From feeds:

Open Access Tracking Project (OATP) ยป Connotea Imports

Tags:

oa.new oa.data ru.do ru.ps

Authors:

petersuber

Date tagged:

07/31/2012, 12:16

Date published:

11/07/2011, 20:44