NOT-OD-22-029: Request for Information on Proposed Updates and Long-Term Considerations for the NIH Genomic Data Sharing Policy
peter.suber's bookmarks 2022-01-27
"Respect for and protection of the interests of research participants are central tenets of the NIH GDS Policy and are fundamental to NIH’s stewardship of large-scale genomic data. Data derived from human research participants under the GDS Policy must be de-identified and provided with a random, unique code, the key to which is held by the submitting institution. NIH acknowledges that the concept of “identifiability” is a matter of ongoing deliberation within the scientific and bioethics communities. NIH relies on robust protections beyond de-identification, such as Institutional Review Board (IRB) consideration of risks associated with data submission, designating controlled access for certain data types, use of Data Access Committees to review requests, data use agreements to prohibit data disclosure and participant re-identification, and Certificates of Confidentiality[ii] to prohibit disclosure. As outlined in the NIH GDS Policy, the criteria for establishing de-identification are:
- Identities of research participants cannot be readily ascertained or otherwise associated with the data by the repository staff or secondary data users (45 CFR 46.102(e), Federal Policy for the Protection of Human Subjects); and
- 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) are removed.
The reliance on the 18 identifiers enumerated at 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) as the only acceptable method under the GDS Policy for de-identification has recently presented several challenges. Certain data elements considered potentially identifiable, such as date ranges shorter than a year, may have scientific utility, especially when studying disease progression (e.g., with COVID-19) or higher resolution location data than the regulatory standard (e.g., full ZIP codes or mobile location data), which may be valuable for studying the social determinants of health or environmental risk.
Challenges have also arisen recently around data linkage. It is difficult to know in advance which data sources may add scientific value when combined, so it is not always possible to tell participants about data linkage during their initial consent. Linking data refers to connecting two or more data sources (often multiple studies) to bring together information about a person, enabling researchers to learn more about a participant or small group of participants. For example, a participant might enroll in a study that uses their electronic health record as well as a separate study that uses a sample of their blood, and the data about them from those studies could later be linked in new research for more powerful analyses. This challenge in prospectively informing participants about data linkage raises questions about respecting individuals’ autonomy and what participants understand about how their data will be used. Furthermore, data from multiple sources may not have been obtained under the same consent and de-identification expectations as the GDS Policy...."