Tim Davies: Joined Up Philanthropy data standards: seeking simplicity, and depth
Data & Society / saved 2014-01-09
[Summary: technical notes on work in progress for the Open Philanthropy data standard]
I’m currently working on sketching out an alpha version of a data standard for the Open Philanthropy project (soon to be 360giving). Based on work Pete Bass has done analysing the supply of data from trusts and foundations, a workshop on demand for the data, and a lot of time spent looking at existing standards at the content layer (eGrant/hGrant, IATI, Schema.org, GML etc.) and at deeper technical layers (CSV, SDF, XML, RDF, JSON, JSON-Schema and JSON-LD), I’m getting closer to having a draft proposal. But ahead of that, and spurred on by discussions at the Berkman Center this afternoon about the role of blogging in the idea-formation process, here’s a rough outline of where it might be heading. (What follows is ‘thinking aloud’ from my work in progress, and does not represent any set views of the Open Philanthropy project.)
Building Blocks: Core data plus
There are lots of things that different people might want to know about philanthropic giving, from where money is going, to detailed information on the location of grant beneficiaries, information on the grant-making process, and results information. However, few trusts and foundations have all this information to hand, and very few are likely to hold it in a single system, which means that creating a single open data file covering all these different areas of the funding process would rarely be an easy task. And if presented with a massive spreadsheet with hundreds of columns to fill in, many potential data publishers are liable to be put off by the complexity. We need a simple starting point for new publishers of data, and a way for those who want to say more about their giving to share deeper and more detailed information.
The approach should therefore be a modular, rather than monolithic, standard, built from common building blocks. Indeed, in line with the Joined Up Data efforts initiated by Development Initiatives, many of these building blocks may be common across different data standards.
In the Open Philanthropy case, we’ve sketched out seven broad building blocks, in addition to the core “who, what and how much” data that is needed for each of the ‘funding activities’ that are the heart of an open philanthropy standard. These are:
Organisations - names, addresses and other details of the organisations funding, receiving funds and partnering in a project
Process - information about the events which take place during the lifetime of a funding activity
Locations - information about the geography of a funded activity – including the location of the organisations involved, and the location of beneficiaries
Transactions - information about pledges and transfers of funding from one party to another
Results - information about the aims and targets of the activity, and whether they have been met
Classifications - categorisations of different kinds that are applied to the funded activity (e.g. the subject area), or to the organisations involved (e.g. audited accounts?)
Documents - links to associated documents, and more in-depth descriptions of the activity
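As a rough illustration of the modular idea, here is a minimal Python sketch in which every record must carry a small set of core fields, while the seven building blocks are optional extras that a publisher can add when they have the data. The field and block key names (identifier, amountAwarded, locations and so on) are hypothetical, chosen for illustration, and are not taken from any published 360giving schema.

```python
# Hypothetical field and block names, for illustration only; not part of any
# published 360giving / Open Philanthropy schema.
CORE_FIELDS = ("identifier", "fundingOrganisation", "recipientOrganisation",
               "title", "amountAwarded", "currency")

BUILDING_BLOCKS = ("organisations", "process", "locations", "transactions",
                   "results", "classifications", "documents")


def check_activity(activity: dict) -> tuple[list[str], list[str]]:
    """Return (missing core fields, optional building blocks present)."""
    missing = [f for f in CORE_FIELDS if f not in activity]
    blocks = [b for b in BUILDING_BLOCKS if b in activity]
    return missing, blocks


# A new publisher can start with just the core "who, what and how much"...
simple = {
    "identifier": "grant-001",
    "fundingOrganisation": "Example Trust",
    "recipientOrganisation": "Example Charity",
    "title": "Community garden project",
    "amountAwarded": 30000,
    "currency": "GBP",
}

# ...while a more advanced publisher adds optional blocks to the same record.
detailed = dict(simple,
                locations=[{"name": "Leeds", "role": "beneficiary"}],
                documents=[{"url": "https://example.org/grant-001.pdf"}])

print(check_activity(simple))    # ([], [])
print(check_activity(detailed))  # ([], ['locations', 'documents'])
```

Both records are equally valid at the core level; the second simply says more.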
Some of these may provide more in-depth information about a core field (e.g. ‘Total grant amount’ might be part of the core data, while individual yearly breakdowns could be expressed within the transactions building block), whilst others provide information that is not contained in the core data at all (results or documents, for example).
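To make the core-versus-block relationship concrete, the sketch below assumes a hypothetical record where the core data holds a single total (amountAwarded) and an optional transactions block holds a pledge plus yearly transfers, and it checks that the transfers add up to the total. The key names and the "pledge"/"transfer" type values are assumptions for illustration only; Decimal is used to avoid floating-point rounding in money sums.

```python
from decimal import Decimal

# Hypothetical record: key names and type values are illustrative only.
# The core data carries the total; the optional "transactions" block
# carries a pledge plus yearly transfers.
activity = {
    "identifier": "grant-002",
    "amountAwarded": Decimal("90000"),
    "transactions": [
        {"type": "pledge", "date": "2012-03-01", "value": Decimal("90000")},
        {"type": "transfer", "date": "2012-04-01", "value": Decimal("30000")},
        {"type": "transfer", "date": "2013-04-01", "value": Decimal("30000")},
        {"type": "transfer", "date": "2014-04-01", "value": Decimal("30000")},
    ],
}


def yearly_transfers(activity: dict) -> dict[str, Decimal]:
    """Sum transfer values by the year part of each ISO 8601 date.

    Pledges are skipped so that promised and paid amounts are not
    double-counted.
    """
    totals: dict[str, Decimal] = {}
    for t in activity.get("transactions", []):
        if t["type"] != "transfer":
            continue
        year = t["date"][:4]
        totals[year] = totals.get(year, Decimal("0")) + t["value"]
    return totals


def breakdown_matches_total(activity: dict) -> bool:
    """Check that yearly transfers add up to the core total.

    A core-only record (no transactions block) has nothing to reconcile.
    """
    if "transactions" not in activity:
        return True
    paid = sum(yearly_transfers(activity).values(), Decimal("0"))
    return paid == activity["amountAwarded"]


print(yearly_transfers(activity))
print(breakdown_matches_total(activity))  # True
```

A core-only record passes trivially, which reflects the modular design: the deeper block is optional, but when present it should agree with the core. A grant still part-way through payment would fail this strict equality, so a real validator might instead check that transfers do not exceed the total.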
An ontological approach: flat > structured > linked
One of the biggest challenges with sketching out a possible standard data format for open philanthropy is in balancing the technical needs of a number of different groups:
Publishers need sharing their information to be as simple as possible. Publishing open philanthropy data must require a minimum of technical skills and resources. In practice, that means flat, spreadsheet-like data structures.
Analysts like flat, spreadsheet-style data too, but often want to be able to cut it in different ways. Standards like IATI are based on richly structured XML data, nested a number of levels deep, which can make it very challenging to flatten the data into a form analysts can use.
Coders prefer structured data. In most cases, for web applications, that means JSON. Whilst some expressive path languages for JSON are emerging, ideally a JSON structure should let a coder simply drill down the tree to find what they want: being able to look up activity.organisations.fundingOrganisation[0] is better than having to iterate through all the activity.organisation nodes to find the one which has “type”: “fundingOrganisation”.
Data integrators want to read data into their own preferred database structures, from NoSQL stores to relational databases. Those wanting to integrate heterogeneous data sources from different ‘Joined Up Data’ standards might also benefit from Linked Data approaches, and from graph-based data using cross-mapped ontologies.
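The difference between the two JSON shapes described for coders can be sketched in Python with the standard-library json module. Both documents below are hypothetical examples; the key names echo the ones mentioned above but are not a proposed schema.

```python
import json

# Hypothetical documents; key names are illustrative only.
# Shape A: organisations keyed by role, so a coder can drill straight down.
keyed = json.loads("""
{"activity": {"organisations": {
    "fundingOrganisation": [{"name": "Example Trust"}],
    "recipientOrganisation": [{"name": "Example Charity"}]}}}
""")

# Shape B: a flat list of organisations, each tagged with a "type".
typed = json.loads("""
{"activity": {"organisation": [
    {"type": "recipientOrganisation", "name": "Example Charity"},
    {"type": "fundingOrganisation", "name": "Example Trust"}]}}
""")

# Shape A: a direct path lookup.
funder_a = keyed["activity"]["organisations"]["fundingOrganisation"][0]["name"]

# Shape B: the coder must scan every organisation node to find the funder.
funder_b = next(org["name"] for org in typed["activity"]["organisation"]
                if org["type"] == "fundingOrganisation")

print(funder_a, "|", funder_b)  # Example Trust | Example Trust
```

Both lookups return the same answer, but shape A reads as a single path, while shape B needs a loop (or a path-query library) every time.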
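Going the other way, from structured to flat, is the problem analysts face. The following is a minimal sketch, assuming a dotted-column naming convention (one of several possible conventions, not something the standard specifies), that flattens a nested record into a single CSV row using Python's csv module. The record and its key names are hypothetical.

```python
import csv
import io


def flatten(node, prefix=""):
    """Flatten nested dicts and lists into a single dict with dotted keys.

    Nested dicts become "parent.child"; list items are numbered, so
    {"transactions": [{"value": 1}]} becomes {"transactions.0.value": 1}.
    """
    flat = {}
    items = node.items() if isinstance(node, dict) else enumerate(node)
    for key, value in items:
        name = f"{prefix}{key}"
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical nested activity, in the shape a coder might prefer.
activity = {
    "identifier": "grant-003",
    "organisations": {"fundingOrganisation": {"name": "Example Trust"}},
    "transactions": [
        {"type": "transfer", "value": 5000},
        {"type": "transfer", "value": 7500},
    ],
}

row = flatten(activity)

# Write one spreadsheet-style row that an analyst could open directly.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

The number of columns depends on how many transactions a record has: a record with three transfers would need columns that this one lacks. Repeated, nested elements like this are a large part of what makes flattening richly structured data hard.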
It’s pretty hard to see how