Documenting Beautiful Data

metaLAB (at) Harvard 2016-05-18

With art museums making both their imagery and their collections data open and accessible, the question arises: what to do with it all? This was the question put to participants in Beautiful Data II, a summer workshop supported by the Getty Research Institute and hosted by metaLAB with the Harvard Art Museums and the Carpenter Center for the Visual Arts. For two weeks in July 2015, a group of art historians, curators, designers, and technologists developed the concepts and skills needed to use open collections for art-historical storytelling.

The workshop was extensively documented by participants and staff. metaLAB has now made that documentation public as a web site incorporating several modes of data visualization. As a provocation, we’ve treated this documentation as “data,” turning it into an array in JSON, an open data format widely used in web programming. This site visualizes the resulting data set in three ways: as styled content tiles; as “raw” metadata, with cross-referencing tags visibly linking records; and as a rotary timeline that expresses those connections as arcs of adjacency. Although each visualization expresses the same data, its styling and features privilege certain characteristics and connections; each has its emphases, and its omissions as well. These interdependent visualizations not only offer a set of mnemonics for our class of participants; they also offer a collective provocation on the multimodal nature of “data” as a concept and norm.

In the first visualization, our documentation data appear as media, in combinations of text and image. People are represented by scans of their ID badges, which, during the workshop, identified us to one another and helped us pass in and out of the museum. Books, articles, and other resources show up as citations, not unlike cards in a library catalogue or index file; presentations, projects, and other excursions are documented in images and descriptive tags. Vague, oblique, and fitfully incomplete, these documentary traces represent a “problem collection” in themselves.

An alternative visualization displays our documentation data in “raw” form as JSON, or JavaScript Object Notation, an open format that describes data objects as sets of attribute-value pairs. If that description is opaque, what matters here is that we can use HTML, JavaScript, and other languages to interact with these data on web sites, connecting server and browser directly. Typically, these data reside invisibly in files accessed by the web sites we use to search and browse collections; here, we’ve made them visible. Clicking a tag reveals further linkages, connecting the records that share it up and down the array. Clicking an ID number links every instance of that object cross-referenced in other fields throughout the data set.
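As a minimal sketch of the two kinds of linkage described above, the Python below parses a small JSON array and builds two indexes: one from each tag to the records that share it, and one from each ID to the records that cross-reference it. The field names `id`, `tags`, and `refs`, and the records themselves, are hypothetical; the site's actual schema is not described here.

```python
import json
from collections import defaultdict

# Hypothetical documentation records; field names are assumptions, not the site's schema.
RAW = """
[
  {"id": 1, "type": "person", "title": "Participant badge", "tags": ["people"], "refs": []},
  {"id": 2, "type": "presentation", "title": "Opening talk",
   "tags": ["open collections", "storytelling"], "refs": [1]},
  {"id": 3, "type": "resource", "title": "Reading on museum APIs",
   "tags": ["open collections", "APIs"], "refs": [1, 2]}
]
"""

def tag_index(records):
    """Map each tag to the IDs of every record carrying it (shared tags link records)."""
    index = defaultdict(list)
    for rec in records:
        for tag in rec["tags"]:
            index[tag].append(rec["id"])
    return dict(index)

def cross_references(records):
    """Map each referenced ID to the IDs of the records that cite it."""
    cited_by = defaultdict(list)
    for rec in records:
        for ref in rec["refs"]:
            cited_by[ref].append(rec["id"])
    return dict(cited_by)

records = json.loads(RAW)  # a list of dicts, i.e. sets of attribute-value pairs
tags = tag_index(records)
refs = cross_references(records)
```

Here `tags["open collections"]` is `[2, 3]`, so selecting that tag would connect records 2 and 3, and `refs[1]` is `[2, 3]`, the records that mention ID 1.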

It’s important to note that these “raw” data are thoroughly cooked: they’re constructs, painstakingly transcribed into JSON from manuscript notebooks and digital documents. This transcription necessarily involved ordering, structuring, excision, and revision. Increasingly, museums and other collecting institutions are making their metadata available in this form, so that outside developers can work with it through Application Programming Interfaces (APIs) that are open, accessible, and direct. JSON and APIs are not the only means of doing this. Some favor an approach called “linked open data”; others practice “structured data” approaches using versions of XML. Each option has its benefits and drawbacks, and its promises of utopia.
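To make the contrast between formats concrete, here is one hypothetical record expressed both as JSON and as a simple XML document, using only Python's standard `json` and `xml.etree.ElementTree` modules. The element and field names are illustrative assumptions rather than any institution's schema, and linked open data, which identifies things with URIs, is not shown.

```python
import json
import xml.etree.ElementTree as ET

# One hypothetical record; field names are assumptions for illustration only.
record = {"id": 2, "title": "Opening talk", "tags": ["open collections", "storytelling"]}

def to_json(rec):
    """Serialize the record as a JSON object of attribute-value pairs."""
    return json.dumps(rec)

def to_xml(rec):
    """Serialize the same record as nested XML elements."""
    root = ET.Element("record", id=str(rec["id"]))  # XML attributes must be strings
    ET.SubElement(root, "title").text = rec["title"]
    tags = ET.SubElement(root, "tags")
    for tag in rec["tags"]:
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")

json_text = to_json(record)
xml_text = to_xml(record)
```

The XML version stores `id` as the string `"2"`, while JSON keeps it a number: a small instance of how each format shapes the data it carries.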

The third view is a circular graph in which events, projects, presentations, and resources (“data objects” all) appear as spokes on a wheel, their relative lengths reflecting the amount of media associated with each. Arcs express adjacencies in the form of shared tags. The default display, in color, expresses the connections made in the JSON array on the previous screen. Hovering over an individual spoke reveals its connections across the circle to the other data objects it references, with which it tends to share resemblances. The arcs and spokes are evocative of connection and system, even if they are less “transparent” in this view than in the other visualizations. What are “beautiful data”? They’re appealing, they’re lovely; and perhaps they’re a bit distant and incomprehensible as well, as they withdraw from us into their beauty.
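The geometry behind such a wheel can be sketched in a few lines. This assumes, hypothetically, that objects are spaced evenly around the circle, that spoke length grows linearly with media count from a base radius (the site's actual scaling is not stated), and that an arc joins any two objects sharing at least one tag; the data and the parameters `inner` and `per_item` are illustrative.

```python
import math
from itertools import combinations

# Hypothetical data objects: id, number of associated media items, tags.
objects = [
    {"id": 1, "media": 1, "tags": {"people"}},
    {"id": 2, "media": 4, "tags": {"open collections", "storytelling"}},
    {"id": 3, "media": 2, "tags": {"open collections", "APIs"}},
    {"id": 4, "media": 0, "tags": {"storytelling"}},
]

def spokes(objs, inner=50.0, per_item=10.0):
    """Return {id: (angle in radians, spoke length)}, angles evenly spaced."""
    n = len(objs)
    layout = {}
    for i, obj in enumerate(objs):
        angle = 2 * math.pi * i / n
        # Base radius keeps objects with no media visible; length grows with media count.
        length = inner + per_item * obj["media"]
        layout[obj["id"]] = (angle, length)
    return layout

def arcs(objs):
    """Return pairs of object IDs that share at least one tag."""
    return [(a["id"], b["id"]) for a, b in combinations(objs, 2) if a["tags"] & b["tags"]]

def endpoint(angle, length):
    """Convert a spoke to the x, y coordinates of its outer tip."""
    return length * math.cos(angle), length * math.sin(angle)
```

Highlighting on hover then amounts to filtering the output of `arcs(objects)` for pairs containing the hovered ID.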