Researchers Open Repository for ‘Dark Data’ – Wired Campus - Blogs - The Chronicle of Higher Education

Wired Campus 2015-07-22

Researchers at the University of North Carolina at Chapel Hill are leading an effort to create a one-stop shop for data sets that would otherwise be lost to the public after the papers they were produced for are published.

The goal of the project, called DataBridge, is to expand the life cycle of so-called dark data, said Arcot Rajasekar, the lead principal investigator on the project and a professor in the School of Information and Library Science at Chapel Hill. It will serve as an archive for data sets and metadata, and will group them into clusters of information to make relevant data easier to find.

“You can reuse it, repurpose it, and then maybe someone else will reuse it, and see how we can enable that to get more science,” Mr. Rajasekar said.

A key aspect of the project will be how it allows researchers to make connections, “so that a person who wants to use the data will be able to pull in other data of a similar nature,” he said.

The hope is that researchers from around the country will eventually submit their data after publishing their findings. Researchers at North Carolina A&T State University and Harvard University are also involved in the project, which the National Science Foundation funded three years ago.

The researchers are also interested in including another type of “dark data”: archives of social-media posts. For example, Mr. Rajasekar has imagined creating algorithms to sort through tweets posted during the Arab Spring, for researchers studying the role of social media in the movement.

The project could save researchers time, said Laura Mandell, director of the Initiative for Digital Humanities, Media, and Culture at Texas A&M University at College Station. “People spend a lot of time cleaning their data, and we don’t need to each be reinventing the wheel, performing the same tasks on the same data sets,” she said.

And in some cases, the project could serve as a model for libraries at research institutions that are looking to better track data in line with federal requirements, said Bruce Herbert, director of digital services and scholarly communications and a geology professor at Texas A&M. He said it could also extend researchers’ “trusted network” of colleagues with whom they share data.