Editors’ Choice: Coding with Unknowns
Digital Humanities Now 2019-12-12
The Shakespeare and Company Project is based on the Sylvia Beach papers at Princeton University Library. Logbooks and lending library cards trace members’ engagement with Beach’s famous lending library in Paris. Members included literary luminaries Gertrude Stein, James Joyce, Ernest Hemingway, and Simone de Beauvoir, as well as students, businessmen, and French girls with English governesses. A significant part of the project data consists of events: memberships, renewals, reimbursements, borrowed books, purchased books, etc. Yet, due to the fragmentary and handwritten nature of these sources, the dates aren’t always easy to manage with code. Working on the project required managing imprecise data with precise code.
Let’s walk through how we tackled one aspect of this problem. The event_date_ranges method shown above aggregates all events for a library member into a timeline of known activity. The resulting list of date ranges is the basis for visualizing a member’s engagement with the library. This method loops through all the events for a member, sorted by date, and collects them into groups of date ranges. If an event starts within or up to one day after the current date range, it is included and the range is extended, if needed; if not, that range is closed and a new range is started. For Simone de Beauvoir, who was active in 1937 and 1940, the results look roughly like this:
[[1937-04-07, 1937-05-03], [1940-07-25, 1940-12-31]]
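To make the grouping logic concrete, here is a minimal sketch of that loop. It is not the project's actual event_date_ranges method: the function name merge_date_ranges is hypothetical, events are assumed to be simple (start, end) pairs of dates, and the individual event dates below are invented for illustration, chosen only so that the merged ranges match the ones above.

```python
from datetime import date, timedelta


def merge_date_ranges(events):
    """Collapse (start, end) date pairs into ranges of continuous activity.

    Hypothetical helper for illustration; assumes every event already
    has both a start and an end date.
    """
    ranges = []
    for start, end in sorted(events):
        # Include the event in the current range if it starts inside it
        # or no more than one day after it ends; extend the end if needed.
        if ranges and start <= ranges[-1][1] + timedelta(days=1):
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            # Otherwise the current range is closed and a new one starts.
            ranges.append([start, end])
    return ranges


# Made-up events, not actual records from the Sylvia Beach papers.
events = [
    (date(1937, 4, 7), date(1937, 4, 20)),    # a borrowed book
    (date(1937, 4, 21), date(1937, 5, 3)),    # starts one day later: merged
    (date(1940, 7, 25), date(1940, 12, 31)),  # a membership
    (date(1940, 8, 1), date(1940, 8, 15)),    # falls inside the membership
]

for start, end in merge_date_ranges(events):
    print(start.isoformat(), end.isoformat())
# 1937-04-07 1937-05-03
# 1940-07-25 1940-12-31
```

Sorting first means each event only ever has to be compared against the most recent range, which keeps the loop to a single pass.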
You would probably expect a member’s borrowing activity to occur within the dates they were a member—but, due to missing logbooks and the oddities of human behavior, that’s not always the case.
The code has to handle one-day events, like buying a book or closing out an account, as well as longer-duration activities, like a membership or borrowing a book. In addition, because these are historical records that were kept by hand, and not all preserved, we have to handle a variety of unusual dates. There are date ranges with a start but no end, and in some cases, end dates with no start; the code here treats those as a single date.
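One way to picture the handling of incomplete ranges is a small normalization step that runs before any grouping. The sketch below is an assumption about how that could look, not the project's code: the function name normalize_event_dates is hypothetical, and skipping events with no dates at all (returning None) is a choice made here for illustration.

```python
from datetime import date
from typing import Optional, Tuple


def normalize_event_dates(
    start: Optional[date], end: Optional[date]
) -> Optional[Tuple[date, date]]:
    """Return a (start, end) pair, treating half-known ranges as one day.

    Hypothetical helper for illustration only.
    """
    if start is None and end is None:
        # Assumption: an event with no dates cannot be placed on a timeline.
        return None
    if end is None:
        # Start but no end: treat as a single date.
        return (start, start)
    if start is None:
        # End but no start: treat as a single date.
        return (end, end)
    return (start, end)


# A one-day purchase, a borrow with no recorded return, and a return
# with no recorded borrow date (all made-up examples).
print(normalize_event_dates(date(1937, 4, 7), date(1937, 4, 7)))
print(normalize_event_dates(date(1937, 4, 10), None))
print(normalize_event_dates(None, date(1937, 5, 3)))
```

Normalizing up front keeps the uncertainty out of the timeline logic: every event that reaches the grouping step has two concrete dates, even when the source only recorded one.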