Lecture-43 program to detect double space
Lecture # 43
Objectives
• Introduction to Anonymity of Data.
Anonymity of Data
• Data anonymization is the process of protecting
private or sensitive information by erasing or
encrypting identifiers that connect an individual to
stored data.
Anonymity of Data (Cont.)
• For example, you can run Personally Identifiable
Information (PII) such as names, social security
numbers, and addresses through a data anonymization
process that retains the data but keeps the source
anonymous.
Anonymity of Data (Cont.)
• However, even after you clear data of identifiers,
attackers can use de-anonymization methods, such as
cross-referencing the data with other available
sources, to retrace the data anonymization process
and re-identify individuals.
Anonymity of Data (Cont.)
• The General Data Protection Regulation (GDPR)
outlines a specific set of rules that protect user data
and create transparency. Data that has been truly
anonymized falls outside its scope.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data masking
• Hiding data with altered values.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data masking
• For example, you can replace some or all characters
of a value with a symbol such as “*” or “x”.
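A minimal sketch of masking, assuming we want to keep only the last few characters of a value visible. The function name `mask_value` and its parameters are hypothetical and chosen only for illustration.

```python
def mask_value(value: str, visible: int = 4, symbol: str = "*") -> str:
    """Replace all but the last `visible` characters with `symbol`."""
    if visible <= 0:
        # Nothing should stay readable: mask every character.
        return symbol * len(value)
    hidden = max(len(value) - visible, 0)
    return symbol * hidden + value[hidden:]


print(mask_value("123-45-6789"))  # prints *******6789
```

Keeping a few trailing characters is a design choice: it lets staff confirm a record with the user without showing the full value.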
Anonymity of Data (Cont.)
Data Anonymization Techniques
Pseudonymization
• A data management and de-identification method that
replaces private identifiers with fake identifiers or
pseudonyms, for example replacing the identifier
“John Smith” with “Mark Spencer”.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Pseudonymization
• Pseudonymization preserves statistical accuracy and
data integrity, allowing the modified data to be used
for training, development, testing, and analytics while
protecting data privacy.
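The idea can be sketched with a simple lookup table, assuming pseudonyms are generated labels rather than realistic fake names. The class `Pseudonymizer` and the `Person-0001` label format are hypothetical. Because the same identifier always maps to the same pseudonym, counts and totals per person stay intact.

```python
class Pseudonymizer:
    """Map each real identifier to one stable pseudonym."""

    def __init__(self) -> None:
        # The table links real identifiers to pseudonyms, so it must be
        # stored separately from the pseudonymized data and kept secret.
        self._table: dict[str, str] = {}

    def pseudonym(self, identifier: str) -> str:
        if identifier not in self._table:
            self._table[identifier] = f"Person-{len(self._table) + 1:04d}"
        return self._table[identifier]


p = Pseudonymizer()
purchases = [("John Smith", 120), ("Jane Doe", 80), ("John Smith", 45)]
safe = [(p.pseudonym(name), amount) for name, amount in purchases]
print(safe)
# prints [('Person-0001', 120), ('Person-0002', 80), ('Person-0001', 45)]
```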
Anonymity of Data (Cont.)
Data Anonymization Techniques
Generalization
• Deliberately removes some of the data to make it less
identifiable.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Generalization
• For example, you can remove the house number from an
address while keeping the road name. This eliminates
some identifiers while retaining a measure of data
accuracy.
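A minimal sketch of the address example, assuming the house number is a leading run of digits with an optional letter suffix. The function name `generalize_address` is hypothetical, and real address formats vary more than this pattern covers.

```python
import re


def generalize_address(address: str) -> str:
    """Drop a leading house number but keep the road name."""
    return re.sub(r"^\s*\d+[A-Za-z]?\s+", "", address)


print(generalize_address("42 Main Road"))       # prints Main Road
print(generalize_address("221B Baker Street"))  # prints Baker Street
```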
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data swapping
• Also known as shuffling and permutation, a technique
used to rearrange the dataset attribute values so they
don’t correspond with the original records.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data swapping
• Swapping attributes (columns) that contain identifying
values, such as date of birth, may have more impact on
anonymization than swapping membership type values.
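Swapping can be sketched as shuffling one column across all records, assuming each record is a dictionary. The function name `swap_column` and the sample field names are hypothetical.

```python
import random


def swap_column(records, column, seed=None):
    """Return new records with the values of `column` shuffled across rows."""
    rng = random.Random(seed)
    values = [record[column] for record in records]
    rng.shuffle(values)
    # Build new dictionaries so the original records are left untouched.
    return [{**record, column: value} for record, value in zip(records, values)]


members = [
    {"id": 1, "date_of_birth": "1990-01-15", "membership": "gold"},
    {"id": 2, "date_of_birth": "1985-07-30", "membership": "silver"},
    {"id": 3, "date_of_birth": "2001-11-02", "membership": "gold"},
]
swapped = swap_column(members, "date_of_birth", seed=7)
```

The set of values in the column is unchanged, so column-level statistics survive, but a given date of birth no longer necessarily belongs to its original row.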
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data perturbation
• Modifies the original dataset slightly by applying
techniques that round numbers and add random noise.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data perturbation
• The base used for rounding or noise should be in
proportion to the values: a small base may lead to
weak anonymization, while a large base can reduce the
utility of the dataset.
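Both perturbation steps can be sketched as follows, assuming a rounding base of 5 and uniformly distributed noise. The function names `round_to_base` and `add_noise` and the default values are hypothetical.

```python
import random


def round_to_base(value: float, base: int = 5) -> int:
    """Round `value` to the nearest multiple of `base`."""
    return base * round(value / base)


def add_noise(value: float, scale: float = 1.0, rng=None) -> float:
    """Add uniform random noise in the range [-scale, +scale]."""
    rng = rng or random
    return value + rng.uniform(-scale, scale)


print(round_to_base(37))  # prints 35
print(round_to_base(38))  # prints 40
noisy_age = add_noise(37, scale=2.0, rng=random.Random(0))
```

A larger `base` or `scale` hides the original value better but makes the result less useful for analysis, which is the trade-off described above.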
Anonymity of Data (Cont.)
Data Anonymization Techniques
Synthetic data
• Algorithmically manufactured information that has
no connection to real events.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Synthetic data
• The process involves creating statistical models based
on patterns found in the original dataset.
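A minimal sketch of this process, assuming the only pattern we model is the mean and standard deviation of a single numeric column and that a normal distribution is a reasonable fit. Real synthetic data tools use far richer models. The function names and sample values are hypothetical.

```python
import random
import statistics


def fit_model(values):
    """Summarize the original column as (mean, standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


def generate_synthetic(values, n, seed=None):
    """Draw `n` synthetic values from a normal model fitted to `values`."""
    mu, sigma = fit_model(values)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


real_ages = [23, 35, 41, 29, 52, 38, 47, 31]
synthetic_ages = generate_synthetic(real_ages, n=5000, seed=1)
```

None of the generated values is copied from a real record, but the sample follows roughly the same distribution as the original column.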
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data aggregation
• Data aggregation, which combines data collected
from many different sources into a single view, is
used to gain insights for enhanced decision-making,
or analysis of trends and patterns.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data aggregation
• Data can be aggregated at different levels of
granularity, from simple summaries to complex
calculations, and can be done on categorical data,
numerical data, and text data.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Data aggregation
• Aggregated data can be presented in various forms,
and used for a variety of purposes, including analysis,
reporting, and visualization.
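A simple summary-level aggregation can be sketched as follows, assuming records are dictionaries and that only group-level counts and averages are published instead of individual rows. The function name `aggregate_mean` and the sample fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(records, group_key, value_key):
    """Group records by `group_key` and report count and mean per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: {"count": len(values), "mean": mean(values)}
        for group, values in groups.items()
    }


salaries = [
    {"region": "north", "salary": 50000},
    {"region": "north", "salary": 60000},
    {"region": "south", "salary": 45000},
]
summary = aggregate_mean(salaries, "region", "salary")
```

Groups with very few members can still reveal individuals (the "south" group above has only one record), so small groups are usually merged or suppressed before publishing.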
Anonymity of Data (Cont.)
Data Anonymization Techniques
Random data generation
• Random data generation, which randomly shuffles
data in order to obscure sensitive information, can be
applied to an entire dataset, or to specific fields or
columns in a database.
Anonymity of Data (Cont.)
Data Anonymization Techniques
Random data generation
• Often used together with data masking tools or data
tokenization tools, random data generation is ideal for
clinical trials, to ensure that the subjects are not only
randomly chosen, but also randomly assigned to
different treatment groups.
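The clinical trial case can be sketched as a random assignment of subjects to groups, assuming two groups of near-equal size. The function name `assign_groups`, the group labels, and the subject IDs are hypothetical.

```python
import random


def assign_groups(subjects, groups=("treatment", "control"), seed=None):
    """Randomly assign each subject to a group, keeping group sizes balanced."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # After shuffling, deal subjects out to the groups in turn.
    return {
        subject: groups[index % len(groups)]
        for index, subject in enumerate(shuffled)
    }


assignment = assign_groups(["S01", "S02", "S03", "S04", "S05"], seed=42)
```

Passing a `seed` makes the assignment reproducible for auditing. Omit it when the assignment must not be predictable.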
Anonymity of Data (Cont.)
Disadvantages of Data Anonymization
• The GDPR stipulates that websites must obtain
consent from users to collect personal information
such as IP addresses, device ID, and cookies.
Anonymity of Data (Cont.)
Disadvantages of Data Anonymization
• Collecting anonymous data and deleting identifiers
from the database limit your ability to derive value
and insight from your data.
Questions
Any questions?
Further Readings
• Chapter No. 1
Computer Security: Principles and Practice (3rd Edition)
By William Stallings and Lawrie Brown
Thanks