FPF - Visual Guide To Practical Data DeID
FPF.org
What do scientists, regulators
and lawyers mean when they
talk about de-identification?
How does anonymous data
differ from pseudonymous
or de-identified information?
Data identifiability is not
binary. Data lies on a
spectrum with multiple
shades of identifiability.
This is a primer on how to distinguish different categories of data.

DEGREES OF IDENTIFIABILITY
Information containing direct and indirect identifiers.

PSEUDONYMOUS DATA
Information from which direct identifiers have been eliminated or transformed, but indirect identifiers remain intact.

DE-IDENTIFIED DATA
Direct and known indirect identifiers have been removed or manipulated to break the linkage to real world identities.

ANONYMOUS DATA
Direct and indirect identifiers have been removed or manipulated together with mathematical and technical guarantees to prevent re-identification.

DIRECT IDENTIFIERS
Data that identifies a person without additional information or by linking to information in the public domain (e.g., name, SSN).
Status in each of the ten example columns, left to right:
Column 1: INTACT
Columns 2-3: PARTIALLY MASKED
Columns 4-10: ELIMINATED or TRANSFORMED

INDIRECT IDENTIFIERS
Data that identifies an individual indirectly. Helps connect pieces of information until an individual can be singled out (e.g., DOB, gender).
Status in each of the ten example columns, left to right:
Columns 1-6: INTACT
Columns 7-10: ELIMINATED or TRANSFORMED
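
The relationship between identifier status and the four broad categories can be sketched in a few lines of Python. This is only an illustrative reading of the definitions in this guide, not an official classification rule: the `Status` enum, the `classify` function, the `has_guarantees` flag (standing in for "mathematical and technical guarantees"), and the label "identifiable" for the first group are all hypothetical names introduced here.

```python
from enum import Enum


class Status(Enum):
    """Status of a class of identifiers in a dataset (hypothetical names)."""
    INTACT = "intact"
    PARTIALLY_MASKED = "partially masked"
    ELIMINATED_OR_TRANSFORMED = "eliminated or transformed"


def classify(direct: Status, indirect: Status, has_guarantees: bool = False) -> str:
    """Map identifier status to one of the guide's four broad categories.

    has_guarantees is an assumed flag meaning that mathematical and
    technical guarantees against re-identification are in place.
    """
    removed = Status.ELIMINATED_OR_TRANSFORMED
    if direct is not removed:
        # Direct identifiers are intact or only partially masked.
        return "identifiable"
    if indirect is not removed:
        # Direct identifiers are gone, indirect identifiers remain.
        return "pseudonymous"
    # Both kinds of identifiers have been eliminated or transformed.
    return "anonymous" if has_guarantees else "de-identified"


if __name__ == "__main__":
    print(classify(Status.PARTIALLY_MASKED, Status.INTACT))
    print(classify(Status.ELIMINATED_OR_TRANSFORMED, Status.INTACT))
    print(classify(Status.ELIMINATED_OR_TRANSFORMED,
                   Status.ELIMINATED_OR_TRANSFORMED, has_guarantees=True))
```

The sketch deliberately ignores safeguards and controls, which the guide treats as an additional layer of protection within a category rather than as a change of category.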

SELECTED EXAMPLES
Column 1: Name, address, phone number, SSN, government-issued ID (e.g., Jane Smith, 123 Main Street, 555-555-5555)
Column 2: Unique device ID, license plate, medical record number, cookie, IP address (e.g., MAC address 68:A8:6D:35:65:03)
Column 3: Same as Potentially Identifiable, except data are also protected by safeguards and controls (e.g., hashed MAC addresses & legal representations)
Column 4: Clinical or research datasets where only curator retains key (e.g., Jane Smith, diabetes, HgB 15.1 g/dl = Csrk123)
Column 5: Unique, artificial pseudonyms replace direct identifiers (e.g., HIPAA Limited Datasets, John Doe = 5L7T LX619Z) (unique sequence not used anywhere else)
Column 6: Same as Pseudonymous, except data are also protected by safeguards and controls
Column 7: Data are suppressed, generalized, perturbed, swapped, etc. (e.g., GPA: 3.2 = 3.0-3.5, gender: female = gender: male)
Column 8: Same as De-Identified, except data are also protected by safeguards and controls
Column 9: For example, noise is calibrated to a data set to hide whether an individual is present or not (differential privacy)
Column 10: Very highly aggregated data (e.g., statistical data, census data, or population data showing that 52.6% of Washington, DC residents are women)
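
Three of the techniques named in the examples above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not a production de-identification tool and not a method prescribed by this guide. The names `Pseudonymizer`, `generalize_gpa`, and `noisy_count` are hypothetical, the 0.5-wide GPA bins simply mirror the "3.2 = 3.0-3.5" example, and the differential privacy part assumes a simple counting query with sensitivity 1 and uses the standard Laplace mechanism.

```python
import math
import random
import secrets


class Pseudonymizer:
    """Replace direct identifiers with unique, artificial pseudonyms.

    The lookup table plays the role of the key that only the curator retains.
    """

    def __init__(self) -> None:
        self._key_table: dict[str, str] = {}
        self._used: set[str] = set()

    def pseudonym_for(self, identifier: str) -> str:
        if identifier not in self._key_table:
            token = secrets.token_hex(6).upper()
            while token in self._used:  # keep every pseudonym unique
                token = secrets.token_hex(6).upper()
            self._used.add(token)
            self._key_table[identifier] = token
        return self._key_table[identifier]


def generalize_gpa(gpa: float, width: float = 0.5) -> str:
    """Generalize an exact GPA to a range of the given width."""
    low = math.floor(gpa / width) * width
    return f"{low:.1f}-{low + width:.1f}"


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon to a count (sensitivity 1).

    The difference of two independent exponential draws with rate epsilon
    follows a Laplace distribution with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    p = Pseudonymizer()
    print(p.pseudonym_for("John Doe"))   # random token, differs on every run
    print(generalize_gpa(3.2))           # 3.0-3.5
    print(noisy_count(100, epsilon=1.0, rng=random.Random()))
```

Random tokens are used for pseudonyms rather than plain hashes because identifiers drawn from a small or structured space can often be recovered from an unkeyed hash by enumeration. A smaller `epsilon` adds more noise and hides an individual's presence more strongly, at the cost of accuracy.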