Cloud Dataverse: A Data Repository Platform for the Cloud
peter.suber's bookmarks 2017-04-29
"Cloud Dataverse benefits from the repository infrastructure and rich set of features provided by Dataverse, as well as from cloud technologies that enable storing and computing on large datasets. Our first implementation of Cloud Dataverse is with the Massachusetts Open Cloud (MOC), a regional public cloud effort by Harvard, Boston University, MIT, Northeastern, and UMass along with a community of industry partners. How does Cloud Dataverse extend Dataverse? First, it integrates with the MOC’s OpenStack Swift object storage. Swift provides scalable storage optimized to handle large, unbounded datasets at a low cost. This integration lets Dataverse users deposit and access large data files directly from the Swift storage, without being limited by the Dataverse web interface and APIs, which can only handle datasets up to a few GB. Second, it integrates with the MOC’s OpenStack Keystone identity service. This allows data users to find a dataset in a Dataverse repository and seamlessly access the data in the cloud environment, using their Dataverse credentials. And third, it integrates with the MOC’s OpenStack Sahara service to manage access to computationally intensive data processing frameworks such as Hadoop or Spark. We are now starting to design how Cloud Dataverse can integrate with other Dataverse repositories to allow datasets from federated repositories to be automatically integrated into the Cloud.
With the convergence of two growing open-source projects, Cloud Dataverse can grow the set of features and be useful to both the scientific and industry communities. But more importantly, Cloud Dataverse represents the necessary next step to combine cloud computing with data sharing."