Opening up scientific data with CKAN and the DataHub | Open Knowledge Foundation Blog 2012-06-20


“The argument for open-access science has been won... As scientists’ work moves online, it is the old model we can no longer afford: the costs to humanity of restricting access are too high. A few scientists may have been saying this for years, but now, not only does open-access have the backing of such respected bodies as the Wellcome Trust, but the fact gets lead front-page coverage in the national press. A government-commissioned report published yesterday adds weight to the case. The Open Access tide, we may hope, is unstoppable... if you are a researcher, how can you get your research results out where people can read your conclusions – and even work with your data? At the Open Knowledge Foundation, we believe we have one answer. CKAN is a free, open-source data management system. It is used to get data out in the open by local and national governments as well as international bodies, but it was originally designed for the more community-oriented use of which the DataHub is an excellent example. On the DataHub, anyone can create a dataset in a couple of minutes. Data can be uploaded or linked to elsewhere on the web. Different data ‘resources’ (such as files of any kind) can be collected together in a dataset, and annotated with information about their author(s), provenance, availability for re-use, etc... CKAN is agnostic about what kind of data can be published. A scientific paper might be catalogued as one dataset. The resources could be, for example: different versions of the printed paper (say, the author’s TeX file, and a PDF); a link to the paper’s page on a journal website; spreadsheets of experimental results; the source code you wrote to process the results; and others, such as separate image files of your graphs and diagrams. Of course, how much is included will depend, among other things, on which rights you haven’t signed away to the publisher...
CKAN provides interactive visualisations of your data, as well as an API for querying the data directly across the web – allowing other scientists (or your future self!) to search and process your results without downloading large data files or writing their own interface. Visualisations can also be embedded in blog posts or other web pages... CKAN stores a rich set of metadata, with versioned history... Benefits to the researcher ... Collect all your output together ... Collect publications from other hubs: Conversely, perhaps you are an institution, looking to build a repository ... Access control: You can control who can see and edit your datasets, so for example joint papers can be edited by any of the authors... Alt metrics: Get a record of how many people have accessed or downloaded your data. If the appropriate CKAN extension is installed, your dataset can have share buttons (for Twitter, Facebook, etc.) and you can also get figures for how often it has been shared...”
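
To make the idea of querying a CKAN site over the web more concrete, here is a minimal Python sketch. It is not taken from the post: it assumes a CKAN instance that exposes the version 3 Action API and its `package_search` action, which may differ from the API available when this was written. The base URL `https://ckan.example.org` and the helper names `build_search_url`, `summarise_results` and `search_datasets` are hypothetical, chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, rows=5):
    """Build a package_search URL (assumes the CKAN Action API v3 layout)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarise_results(payload):
    """Return (dataset name, list of resource formats) for each search hit."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [
        (ds["name"], [res.get("format", "") for res in ds.get("resources", [])])
        for ds in payload["result"]["results"]
    ]


def search_datasets(base_url, query, rows=5):
    """Fetch and summarise matching datasets; this call needs network access."""
    with urlopen(build_search_url(base_url, query, rows)) as response:
        return summarise_results(json.load(response))
```

A call such as `search_datasets("https://ckan.example.org", "protein folding")` would, under these assumptions, list each matching dataset alongside the formats of its resources (a PDF of the paper, a CSV of results, and so on), without downloading the files themselves.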



