A Python script to anonymize qualitative data for open criminology · CrimRxiv
Summary:
Qualitative researchers are expected, and sometimes required, to publish their data open access. This is for the sake of science, impact, and social justice. Yet, understandably, qualitative criminologists are worried about what this means for their workload and for their ability to protect subjects' confidentiality. To be solutions-oriented, we developed an open-source Python script for anonymizing qualitative data. It uses named-entity recognition and fuzzy rule-based merging to identify personally identifiable information (PII) and replace it with unique pseudonyms. This tool does not eliminate the need for manual work, but it reduces both the cost and the associated risk. In this article, we explain how our script works and how to use it. We conclude by discussing the implications for open (qualitative) criminology.
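The pipeline summarized above (recognize entities, merge variant mentions of the same entity, then substitute pseudonyms) can be sketched in plain Python. This is a minimal illustration, not the authors' script. It assumes the entity mentions and labels have already been produced by an NER model (for example, spaCy's `doc.ents`, whose spans expose `.text` and `.label_`). The merging rule here is hypothetical: two mentions with the same label are treated as one entity if one's tokens are a subset of the other's, or if their `difflib` similarity ratio is at least 0.85. The function names `similar`, `build_pseudonyms`, and `anonymize` and the `LABEL_n` pseudonym format are illustrative only.

```python
import re
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical fuzzy rule for deciding whether two mentions are one entity.

    Two mentions match if the tokens of one are a subset of the other's
    (e.g. 'Smith' vs 'John Smith') or their character-level similarity
    ratio reaches the threshold (e.g. the misspelling 'Jon Smith').
    """
    a_low, b_low = a.lower(), b.lower()
    ta, tb = set(a_low.split()), set(b_low.split())
    if ta <= tb or tb <= ta:
        return True
    return SequenceMatcher(None, a_low, b_low).ratio() >= threshold


def build_pseudonyms(entities: list[tuple[str, str]]) -> dict[str, str]:
    """Map each (mention, label) pair to a pseudonym such as 'PERSON_1'.

    A mention that fuzzily matches an earlier cluster with the same label
    reuses that cluster's pseudonym; otherwise it starts a new cluster.
    """
    clusters: list[tuple[str, str, str]] = []  # (label, first mention, pseudonym)
    counters: dict[str, int] = {}
    mapping: dict[str, str] = {}
    for mention, label in entities:
        for c_label, canonical, pseudonym in clusters:
            if c_label == label and similar(mention, canonical):
                mapping[mention] = pseudonym
                break
        else:
            counters[label] = counters.get(label, 0) + 1
            pseudonym = f"{label}_{counters[label]}"
            clusters.append((label, mention, pseudonym))
            mapping[mention] = pseudonym
    return mapping


def anonymize(text: str, entities: list[tuple[str, str]]) -> str:
    """Replace every listed mention in text with its pseudonym in one pass."""
    mapping = build_pseudonyms(entities)
    if not mapping:
        return text
    # Longest mentions first, so 'Maria Lopez' wins over 'Maria' at the same spot;
    # the lookarounds stop 'Al' from matching inside 'Alice'.
    alternatives = "|".join(re.escape(m) for m in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)")
    return pattern.sub(lambda match: mapping[match.group(0)], text)


if __name__ == "__main__":
    transcript = "John Smith met Maria at Oak Street. Later, Smith called Maria Lopez."
    # In practice these pairs would come from an NER model, e.g. spaCy's
    # [(ent.text, ent.label_) for ent in doc.ents]; here they are hard-coded.
    found = [
        ("John Smith", "PERSON"),
        ("Maria", "PERSON"),
        ("Oak Street", "LOC"),
        ("Smith", "PERSON"),
        ("Maria Lopez", "PERSON"),
    ]
    print(anonymize(transcript, found))
    # PERSON_1 met PERSON_2 at LOC_1. Later, PERSON_1 called PERSON_2.
```

Because merging happens before substitution, "John Smith" and a later bare "Smith" receive the same pseudonym. This preserves who said what to whom in a transcript while hiding identities. The threshold trades missed merges against false merges, so the output still needs the manual review the summary mentions.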