Scholars Collaborate to Make Sound Recordings More Accessible

Wired Campus 2014-03-26

A project based at the University of Texas at Austin is on track to increase the accessibility and use of sound recordings by repurposing a tool originally developed to classify bird calls. The goal is to teach the tool how to classify sounds in a wide variety of existing recordings, and even to give scholars ways to visualize the sounds.

“When dealing with sound, there’s only one way to access it or move around, and that is to press play and listen to it real time. Otherwise you can’t get—for the most part—any information” out of recordings, says Tanya E. Clement, an assistant professor at the University of Texas at Austin’s School of Information.

So Ms. Clement teamed up with David Tcheng and Loretta Auvil, of the Illinois Informatics Institute at the University of Illinois at Urbana-Champaign, to develop new tools. The project is called Hipstas—for High Performance Sound Technologies for Access and Scholarship—and it uses a machine-learning algorithm called Automatic Recognition with Layered Optimization, or ARLO, to visualize and classify sound.

The algorithm, created by Mr. Tcheng, was first used for ornithology: it could be trained to recognize patterns of bird calls and label the recordings. A researcher could then ask the machine to find similar or different patterns in other recordings without having to listen to them in their entirety.
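The article does not describe ARLO's internals, but the general workflow it alludes to can be sketched in a few lines. The following is a minimal, illustrative example only, not the Hipstas/ARLO code: it assumes hypothetical files named train_clip.wav and archive_clip.wav, labels one-second windows of a training recording, fits an off-the-shelf classifier on spectrogram features, and then flags windows of a second recording that resemble the labeled sound.

```python
# Minimal sketch of supervised audio-pattern spotting (NOT the ARLO code).
# Assumptions: hypothetical files "train_clip.wav" and "archive_clip.wav",
# and placeholder labels that a researcher would supply by listening.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

SR = 22050   # analysis sample rate
WIN = SR     # one-second analysis windows

def windows(path):
    """Yield a mel-spectrogram summary vector for each second of a recording."""
    y, _ = librosa.load(path, sr=SR)
    for start in range(0, len(y) - WIN + 1, WIN):
        seg = y[start:start + WIN]
        mel = librosa.feature.melspectrogram(y=seg, sr=SR, n_mels=32)
        yield librosa.power_to_db(mel).mean(axis=1)  # 32-dim feature vector

# Training: mark which seconds of the training clip contain the target sound
# (e.g., a particular bird call). These labels are placeholders.
X_train = np.array(list(windows("train_clip.wav")))
labels = np.zeros(len(X_train), dtype=int)
labels[:5] = 1  # pretend the first five seconds contain the target sound

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, labels)

# Scanning: flag the seconds of an unheard recording that resemble the target,
# so no one has to play the whole thing to find candidate matches.
X_new = np.array(list(windows("archive_clip.wav")))
hits = np.where(clf.predict(X_new) == 1)[0]
print("Candidate matches at seconds:", hits.tolist())
```

In practice the researcher supplies the labels by listening to a small training set; the classifier then does the tedious scanning across the rest of an archive and returns only the candidate passages worth hearing.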

The algorithm showed potential for finding patterns in other kinds of sound material, and for increasing access to and use of sound recordings in research and teaching.

“The Hipstas project came out of my observation that in digital humanities we are doing a lot of work with text and not so much work with other artifacts that are of interest to the humanities,” Ms. Clement says.

She received a grant from the National Endowment for the Humanities’ Institutes for Advanced Topics in the Digital Humanities in 2012. The money was used to put together a conference in which librarians, archivists, and scholars interested in sound gathered to develop ideas on how to use the technology.

Participants found different uses. For instance, Michael Nardone, a Ph.D. candidate at Concordia University, in Canada, tried comparing readings by two poets, William Carlos Williams and Allen Ginsberg, whose work differs sharply on the page in appearance and organization. He was surprised to find matches in tone and other affinities.

That makes sense “knowing that both WCW and Ginsberg were both from the same area of northern New Jersey,” Mr. Nardone wrote after the conference. “I wondered if I might be [able] to think about community poetics through sound, through a concept of sounded affinity.”

Ms. Clement plans to pursue those and other ideas to further develop the algorithm, which is now being tested on a supercomputer in Austin. In January she received a second grant, this one for $250,000 from the NEH’s Division of Preservation and Access. She hopes the tool will be completed and available by 2015, when the second grant ends.

Such a tool could help save archived recordings that are in danger of being discarded for lack of use. Many colleges’ sound collections go beyond music and entertainment to encompass a wide array of cultural and historical items, such as poetry readings, speeches, and storytelling.

But with an estimated 46 million recordings already stored in public institutions, libraries, and archives, it might not be practical or realistic to think that all recordings can be preserved and will be used. The National Recording Preservation Plan, released by the Library of Congress in 2012, recognized that “expanding access to audio recordings remains problematic.”

“The idea is that if people don’t use these, then institutions won’t know they are important,” Ms. Clement says.

Because institutions have limited storage space, archivists give priority to preserving what can and will be used. The problem now is that content is easier to find in books or journals than in digital sound recordings, and important material can’t be discovered if no one is listening.