A Python library to check the level of anonymity of a dataset | Scientific Data

peter.suber's bookmarks 2022-12-26


Abstract: Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset through some of the most common anonymization techniques: k-anonymity, (α,k)-anonymity, ℓ-diversity, entropy ℓ-diversity, recursive (c,ℓ)-diversity, t-closeness, basic β-likeness, enhanced β-likeness and δ-disclosure privacy. For the case of more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is that it produces a full report of the parameters fulfilled for each of the techniques mentioned above, requiring only the set of quasi-identifiers and sensitive attributes as input. The methods implemented are presented together with the attacks they prevent, a description of the library, examples of how the different functions are used, and the impact and possible applications of the library. Finally, some possible aspects to be incorporated in future updates are proposed.
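To make the first two techniques named in the abstract concrete, here is a minimal sketch in plain Python of how the k of k-anonymity and the ℓ of ℓ-diversity can be measured once the quasi-identifiers and a sensitive attribute are given. It is not pyCANON's API: the function names (`equivalence_classes`, `k_anonymity`, `l_diversity`), the column names and the toy records are all assumptions made up for illustration.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same values on all quasi-identifiers."""
    classes = defaultdict(list)
    for row in records:
        key = tuple(row[qi] for qi in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(records, quasi_identifiers):
    """Largest k such that every equivalence class holds at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(len(rows) for rows in classes.values())


def l_diversity(records, quasi_identifiers, sensitive_attribute):
    """Largest l such that every class has at least l distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(
        len({row[sensitive_attribute] for row in rows})
        for rows in classes.values()
    )


# Hypothetical, already-generalized toy data (not taken from the paper).
records = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "asthma"},
]

k = k_anonymity(records, ["age", "zip"])
ell = l_diversity(records, ["age", "zip"], "disease")
print(f"k = {k}, l = {ell}")
```

Both measures are computed over the same equivalence classes, which is why only the quasi-identifiers and the sensitive attribute are needed as input. The other techniques listed in the abstract compare the distribution of sensitive values inside each class with the distribution in the whole dataset and are not covered by this sketch.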



From feeds:

[IOI] Open Infrastructure Tracking Project » Items tagged with oa.floss in Open Access Tracking Project (OATP)
Open Access Tracking Project (OATP) » peter.suber's bookmarks


oa.tools oa.privacy oa.new oa.floss oa.data oa.anonymity

Date tagged:

12/26/2022, 08:55

Date published:

12/26/2022, 03:55