Frictionless Data Case Study: data.world – Open Knowledge International Blog

lterrat's bookmarks 2017-04-11

Summary:

"How do you use the Frictionless Data specs and what advantages did you find in using the Data Package approach?

We deal with a great diversity of data, both in terms of content and in terms of source format – most people working with data are emailing each other spreadsheets or CSVs, and not formally defining schema or semantics for what’s contained in these data files.

When data.world ingests tabular data, we 'virtualize' the tables away from their source format, and build layers of type and semantic information on top of the raw data. What this allows us to do is to produce a clean Tabular Data Package[^Package] for any dataset, whether the input is CSV files, Excel Spreadsheets, JSON data, SQLite Database files – any format that we know how to extract tabular information from – we can present it as cleaned-up CSV data with a datapackage.json that describes the schema and metadata of the contents.

[...]

What else would you like to see developed?

Graph data packages, or 'Universal Data Packages' that can encapsulate both tabular and graph data. It would be great to be able to present tabular and graph data in the same package and develop tools that know how to use these things together.

To elaborate on this, it makes a lot of sense to normalize tabular data down to clean, well-formed CSVs.or data that more graph-like, it would also make sense to normalize it to a standard format. RDF is a well-established and standardized format, with many serialized forms that could be used interchangeably (RDF XML, Turtle, N-Triples, or JSON-LD, for example). The metadata in the datapackage.json would be extremely minimal, since the schema for RDF data is encoded into the data file itself. It might be helpful to use the datapackage.json descriptor to catalog the standard taxonomies and ontologies that were in use, for example it would be useful to know if a file contained SKOS vocabularies, or OWL classes.

What are the next things you are going to be working on yourself?

We want to continue to enrich the metadata we include in Tabular Data Packages exported from data.world, and we’re looking into using datapackage.json as an import format as well as export.

How do the Frictionless Data specifications compare to existing proprietary and nonproprietary specifications for the kind of data you work with?

data.world works with lots of data across many domains – what’s great about the Frictionless Data specs is that it’s a lightweight content standard that can be a starting point for building domain-specific content standards – it really helps with the 'first mile' of standardising data and making it interoperable."

Link:

https://blog.okfn.org/2017/04/11/frictionless-data-case-study-data-world/

From feeds:

Open Access Tracking Project (OATP) » lterrat's bookmarks

Date tagged:

04/11/2017, 23:34

Date published:

04/11/2017, 19:34