FPF - Visual Guide To Practical Data DeID

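This document is a visual guide to the categories of data identifiability, from explicitly personal data to anonymous data. It shows that data lies on a spectrum, ranging from data that contains direct identifiers to data from which both direct and indirect identifiers have been removed. In pseudonymous data, direct identifiers have been eliminated or transformed while indirect identifiers remain intact. In de-identified data, direct and known indirect identifiers have been removed or manipulated, and the protected variants of both categories add safeguards and controls. In anonymous data, direct and indirect identifiers have been removed or manipulated with mathematical and technical guarantees against re-identification, for example through differential privacy or a very high degree of aggregation.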

A VISUAL GUIDE TO PRACTICAL DATA DE-IDENTIFICATION

Produced by FPF (FPF.org), in collaboration with …
What do scientists, regulators and lawyers mean when they talk about de-identification? How does anonymous data differ from pseudonymous or de-identified information? Data identifiability is not binary. Data lies on a spectrum with multiple shades of identifiability.

This is a primer on how to distinguish different categories of data.

DEGREES OF IDENTIFIABILITY

Explicitly personal, potentially identifiable and not readily identifiable data: information containing direct and indirect identifiers.

PSEUDONYMOUS DATA (Key Coded, Pseudonymous, Protected Pseudonymous): information from which direct identifiers have been eliminated or transformed, but indirect identifiers remain intact.

DE-IDENTIFIED DATA (De-Identified, Protected De-Identified): direct and known indirect identifiers have been removed or manipulated to break the linkage to real-world identities.

ANONYMOUS DATA (Anonymous, Aggregated Anonymous): direct and indirect identifiers have been removed or manipulated together with mathematical and technical guarantees to prevent re-identification.

The spectrum runs through ten categories, from most to least identifiable: Explicitly Personal, Potentially Identifiable, Not Readily Identifiable, Key Coded, Pseudonymous, Protected Pseudonymous, De-Identified, Protected De-Identified, Anonymous and Aggregated Anonymous.

DIRECT IDENTIFIERS
Data that identifies a person without additional information or by linking to information in the public domain (e.g., name, SSN).
  Explicitly Personal: INTACT
  Potentially Identifiable: PARTIALLY MASKED
  Not Readily Identifiable: PARTIALLY MASKED
  Key Coded, Pseudonymous, Protected Pseudonymous, De-Identified, Protected De-Identified, Anonymous, Aggregated Anonymous: ELIMINATED or TRANSFORMED
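
The step from the first three categories to the pseudonymous ones comes down to what happens to direct identifiers. As a minimal sketch, assuming Python 3.9+ and a hypothetical secret key held by the data curator, the code below replaces selected direct-identifier fields with keyed HMAC-SHA256 pseudonyms and leaves indirect identifiers untouched. The names `CURATOR_KEY`, `pseudonymize` and `strip_direct_identifiers` and the sample DOB are hypothetical. HMAC is a swapped-in choice, not a method the guide prescribes: a plain unsalted hash of a value from a small space, such as a MAC address, can be reversed by hashing every candidate value, whereas a keyed hash cannot be recomputed without the key.

```python
import hashlib
import hmac
import secrets

# Hypothetical curator-held key. In practice it would be stored and accessed
# under the safeguards and controls the chart describes, not generated per run.
CURATOR_KEY = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes = CURATOR_KEY) -> str:
    """Return a deterministic keyed pseudonym (HMAC-SHA256, hex) for a value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_direct_identifiers(record: dict, direct_fields: set[str]) -> dict:
    """Copy `record`, replacing each direct-identifier field with its pseudonym.

    Indirect identifiers are left intact, so the output is pseudonymous,
    not de-identified.
    """
    return {
        field: pseudonymize(str(value)) if field in direct_fields else value
        for field, value in record.items()
    }


# Name and MAC address come from the chart's examples; the DOB is hypothetical.
record = {
    "name": "Jane Smith",
    "mac": "68:A8:6D:35:65:03",
    "dob": "1980-04-02",
    "gender": "female",
}
print(strip_direct_identifiers(record, {"name", "mac"}))
```

Because the pseudonym is deterministic under one key, records about the same person still link to each other, and the intact DOB and gender can still help single someone out. That is why this output sits in the pseudonymous part of the spectrum.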

INDIRECT IDENTIFIERS
Data that identifies an individual indirectly. Helps connect pieces of information until an individual can be singled out (e.g., DOB, gender).
  Explicitly Personal, Potentially Identifiable, Not Readily Identifiable, Key Coded, Pseudonymous, Protected Pseudonymous: INTACT
  De-Identified, Protected De-Identified, Anonymous, Aggregated Anonymous: ELIMINATED or TRANSFORMED
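
Eliminating or transforming indirect identifiers is what separates de-identified data from pseudonymous data. The sketch below is a minimal illustration, assuming Python 3.9+ and hypothetical records. It applies two generalizations in the spirit of the chart's GPA example (an exact GPA becomes a 0.5-wide band; a date of birth becomes a year) and then counts how many records share each combination of indirect identifiers. A combination held by only one record singles that person out. Requiring every group to have at least k members is the idea behind k-anonymity, a technique named here for orientation rather than taken from the guide. The helper names and the band width are assumptions.

```python
import math
from collections import Counter


def generalize_gpa(gpa: float, width: float = 0.5) -> str:
    """Replace an exact GPA with a band, e.g. 3.2 -> '3.0-3.5'."""
    low = math.floor(gpa / width) * width
    return f"{low:.1f}-{low + width:.1f}"


def generalize_dob(dob: str) -> str:
    """Keep only the year of an ISO-format date of birth."""
    return dob[:4]


def smallest_group(records: list[dict], indirect_fields: list[str]) -> int:
    """Size of the rarest combination of indirect-identifier values.

    A result of 1 means at least one record is unique on these fields,
    so that individual can be singled out.
    """
    combos = Counter(tuple(r[f] for f in indirect_fields) for r in records)
    return min(combos.values())


# Hypothetical records, already stripped of direct identifiers.
raw = [
    {"dob": "1980-04-02", "gender": "female", "gpa": 3.2},
    {"dob": "1980-11-19", "gender": "female", "gpa": 3.4},
    {"dob": "1985-07-30", "gender": "male", "gpa": 2.7},
    {"dob": "1985-01-12", "gender": "male", "gpa": 2.9},
]
fields = ["dob", "gender", "gpa"]

generalized = [
    {"dob": generalize_dob(r["dob"]), "gender": r["gender"], "gpa": generalize_gpa(r["gpa"])}
    for r in raw
]

print(smallest_group(raw, fields))          # 1: every raw record is unique
print(smallest_group(generalized, fields))  # 2: each combination is now shared
```

Generalization trades precision for protection: the coarser the bands, the larger the groups and the less useful the data for fine-grained analysis.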

SAFEGUARDS and CONTROLS
Technical, organizational and legal controls preventing employees, researchers or other third parties from re-identifying individuals.
  Explicitly Personal: NOT RELEVANT due to nature of data
  Potentially Identifiable: LIMITED or NONE IN PLACE
  Not Readily Identifiable: CONTROLS IN PLACE
  Key Coded: CONTROLS IN PLACE
  Pseudonymous: LIMITED or NONE IN PLACE
  Protected Pseudonymous: CONTROLS IN PLACE
  De-Identified: LIMITED or NONE IN PLACE
  Protected De-Identified: CONTROLS IN PLACE
  Anonymous: NOT RELEVANT due to nature of data
  Aggregated Anonymous: NOT RELEVANT due to high degree of data aggregation
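
Read together, the three rows give each category a profile. The sketch below transcribes them into a Python dictionary and returns every category matching a given profile; the enum, dictionary and function names are hypothetical, and the reason attached to NOT RELEVANT is dropped for simplicity. It also shows that the rows alone do not separate every category: Key Coded and Protected Pseudonymous share one profile, as do Anonymous and Aggregated Anonymous once that reason is ignored. The selected examples are what tell those pairs apart.

```python
from enum import Enum


class Identifiers(Enum):
    INTACT = "INTACT"
    PARTIALLY_MASKED = "PARTIALLY MASKED"
    ELIMINATED_OR_TRANSFORMED = "ELIMINATED or TRANSFORMED"


class Safeguards(Enum):
    NOT_RELEVANT = "NOT RELEVANT"
    LIMITED_OR_NONE = "LIMITED or NONE IN PLACE"
    CONTROLS = "CONTROLS IN PLACE"


# Short aliases keep the table below readable.
I, S = Identifiers, Safeguards

# (direct identifiers, indirect identifiers, safeguards) per category,
# transcribed from the chart, ordered from most to least identifiable.
CHART = {
    "Explicitly Personal": (I.INTACT, I.INTACT, S.NOT_RELEVANT),
    "Potentially Identifiable": (I.PARTIALLY_MASKED, I.INTACT, S.LIMITED_OR_NONE),
    "Not Readily Identifiable": (I.PARTIALLY_MASKED, I.INTACT, S.CONTROLS),
    "Key Coded": (I.ELIMINATED_OR_TRANSFORMED, I.INTACT, S.CONTROLS),
    "Pseudonymous": (I.ELIMINATED_OR_TRANSFORMED, I.INTACT, S.LIMITED_OR_NONE),
    "Protected Pseudonymous": (I.ELIMINATED_OR_TRANSFORMED, I.INTACT, S.CONTROLS),
    "De-Identified": (I.ELIMINATED_OR_TRANSFORMED, I.ELIMINATED_OR_TRANSFORMED, S.LIMITED_OR_NONE),
    "Protected De-Identified": (I.ELIMINATED_OR_TRANSFORMED, I.ELIMINATED_OR_TRANSFORMED, S.CONTROLS),
    "Anonymous": (I.ELIMINATED_OR_TRANSFORMED, I.ELIMINATED_OR_TRANSFORMED, S.NOT_RELEVANT),
    "Aggregated Anonymous": (I.ELIMINATED_OR_TRANSFORMED, I.ELIMINATED_OR_TRANSFORMED, S.NOT_RELEVANT),
}


def matching_categories(direct: Identifiers, indirect: Identifiers, safeguards: Safeguards) -> list[str]:
    """Return every category whose three row values match the given profile."""
    profile = (direct, indirect, safeguards)
    return [name for name, values in CHART.items() if values == profile]


print(matching_categories(I.ELIMINATED_OR_TRANSFORMED, I.INTACT, S.CONTROLS))
```

The call above returns both Key Coded and Protected Pseudonymous, which is the ambiguity described in the lead-in.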

SELECTED EXAMPLES
  Explicitly Personal: name, address, phone number, SSN, government-issued ID (e.g., Jane Smith, 123 Main Street, 555-555-5555)
  Potentially Identifiable: unique device ID, license plate, medical record number, cookie, IP address (e.g., MAC address 68:A8:6D:35:65:03)
  Not Readily Identifiable: same as Potentially Identifiable, except data are also protected by safeguards and controls (e.g., hashed MAC addresses & legal representations)
  Key Coded: clinical or research datasets where only the curator retains the key (e.g., Jane Smith, diabetes, HgB 15.1 g/dl = Csrk123)
  Pseudonymous: unique, artificial pseudonyms replace direct identifiers (e.g., HIPAA Limited Datasets, John Doe = 5L7T LX619Z, a unique sequence not used anywhere else)
  Protected Pseudonymous: same as Pseudonymous, except data are also protected by safeguards and controls
  De-Identified: data are suppressed, generalized, perturbed, swapped, etc. (e.g., GPA: 3.2 = 3.0-3.5; gender: female = gender: male)
  Protected De-Identified: same as De-Identified, except data are also protected by safeguards and controls
  Anonymous: for example, noise is calibrated to a data set to hide whether an individual is present or not (differential privacy)
  Aggregated Anonymous: very highly aggregated data (e.g., statistical data, census data, or population data showing that 52.6% of Washington, DC residents are women)
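
The Anonymous example describes differential privacy: noise calibrated to the data so that a published result looks much the same whether or not any one individual is included. What follows is a minimal sketch of the standard Laplace mechanism for a count, assuming only Python's standard `random` module and hypothetical data loosely modeled on the chart's Washington, DC figure; the function names are hypothetical. A count changes by at most 1 when one person is added or removed, so Laplace noise with scale 1/ε gives ε-differential privacy for that single query. This is a teaching sketch, not a hardened implementation: production libraries also handle floating-point and privacy-budget issues that it ignores.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Laplace-mechanism count of the records that satisfy `predicate`.

    Adding or removing one person changes the true count by at most 1
    (sensitivity 1), so noise with scale 1/epsilon masks whether any
    single individual is present.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 1,000 residents, 526 of them women.
residents = [{"gender": "female"}] * 526 + [{"gender": "male"}] * 474
rng = random.Random(2024)  # seeded only so the sketch is reproducible
noisy = dp_count(residents, lambda r: r["gender"] == "female", epsilon=0.5, rng=rng)
print(round(noisy))  # close to 526, but offset by random noise
```

A smaller ε means larger noise and stronger protection; each additional query answered from the same data consumes more of the overall privacy budget.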
