Untangling the open data debate: definitions and implications

abernard102@gmail.com 2012-08-20

Summary:

“I’m exploring creating a series of short notes based on my current PhD research into open data as tools to support wider dialogue around data policy and practice. Here’s a draft of the first one, trying to set out some clear categories for understanding debates over data. It’s also available as a two-page PDF here ... Data is a hot topic right now: from big data, to open data and linked data, entrepreneurs and policy makers are making big claims about ‘data revolutions’. But not all ‘data’ are the same, and good decision making about data involves knowing the differences...
[1] Big data... Definition: Data that requires ‘massive’ computing power to process (Crawford & Boyd, 2011). Massive computing power, originally only available on supercomputers, is increasingly available on desktop computers or via low cost cloud computing. Implications: Companies and researchers can ‘data mine’ vast data resources to identify trends and patterns. Big data is often generated by combining different datasets. Digital traces from individuals and companies are increasingly captured and stored for their potential value as ‘big data’...
[2] Raw data... Definition: Primary data, as collected or measured direct from the source; or data in a form that allows it to be easily manipulated, sorted, filtered and remixed... Implications: Access to raw data can allow journalists, researchers and citizens to ‘fact check’ official analysis. Programmers are interested in building innovative services with raw data.
[3] Real-time data... Definitions: Data measured and made accessible with minimal delay. Often accessed over the web as a stream of data through APIs (Application Programming Interfaces).... Implications: Real-time data supports rapid identification of trends. Data can support the development of ‘early warning systems’ (e.g. Google Flu Trends; Ushahidi). ‘Smart systems’ and ‘smart cities’ can be configured to respond to real-time data and adapt to changing circumstances.
[4] Open data... Definition: Datasets that are made accessible in non-proprietary formats under licenses that permit unrestricted re-use (OKF – Open Knowledge Foundation, 2006). Open government data involves governments providing many of their datasets online in this way. Implications: Third parties can innovate with open data, generating social and economic benefits. Citizens and advocacy groups can use open government data to hold state institutions to account. Data can be shared between institutions with less friction.
[5] Personal/private data... Definitions: Data about an individual that they have a right to control access to. Such data might be gathered by companies, governments or other third parties in order to provide a service to someone, or as part of regulatory and law-enforcement activities... Implications: Many big and raw datasets are based on aggregating personal data, and combining them with other data. Effective anonymisation of personal data is difficult, particularly when open data provides the pieces for ‘jigsaw identification’ of private facts about people (Ohm, 2009)...
[6] Linked data... Definitions: Datasets are published in the RDF format using URIs (web addresses) to identify the elements they contain, with links made between datasets (Berners-Lee, 2006; Shadbolt, Hall, & Berners-Lee, 2006)... Implications: A ‘web of linked data’ emerges, supporting ‘smart applications’ (Allemang & Hendler, 2008) that can follow the links between datasets. This provides the foundations for the Semantic Web.
[7] More dimensions of data: These are just a few different types of data commonly discussed in policy debates. There are many other data distinctions we could also draw. For example, we can look at whether data was crowd-sourced, statistically sampled, or collected through a census.
The content of a dataset also has an important influence on the implications that working with that data will have: an operational dataset of performance statistics is very different from a geographical dataset describing the road network, for example.
[8] Crossovers and conflicts: Almost all of the above types of data can be found in combination: you can have big linked raw data; real-time open data; raw personal data; and so on. There are some combinations that must be addressed with care. For example, ‘open data’ and ‘personal data’ are two categories that are generally kept apart for good reason: open data involves giving up control over access to a dataset, whilst personal data is the data an individual has the right to control access to. These can be found in combination on platforms like Twitter, when individuals choose to give wider access to personal information by sharing it in a public space, but this is different from the controller of a dataset of personal data making that whole dataset openly available.
[9] A nuanced debate: It’s not uncommon to see claims and anecdotes about the impacts of ‘big data’ use in companies like Amazon, Google or Twitter being used to justify publishing ‘open’ and ‘raw data’ from governments, drawing on aggregating ‘personal data’.”

Link:

http://www.practicalparticipation.co.uk/odi/2012/03/untangling-the-open-data-debate-definitions-and-implications/

Updated:

08/16/2012, 06:08

From feeds:

Open Access Tracking Project (OATP) » abernard102@gmail.com

Tags:

oa.new oa.psi oa.policies oa.licensing oa.mining oa.government oa.lod oa.google oa.crowd oa.social_media oa.twitter oa.privacy oa.apis oa.cloud oa.definitions oa.amazon oa.libre oa.data

Authors:

abernard

Date tagged:

08/20/2012, 18:45

Date published:

03/26/2012, 20:08