
Lecture-43 program to detect double space

The document discusses data anonymization, which is the process of protecting sensitive information by removing identifiers that link individuals to their data. It outlines various techniques for anonymization, such as data masking, pseudonymization, generalization, and synthetic data, while also highlighting the limitations imposed by regulations like GDPR. Additionally, it addresses the disadvantages of anonymization, including the potential loss of valuable insights from the data.

Uploaded by

MUHAMMAD AHMAD
Copyright
© All Rights Reserved

Information Security

Lecture # 43

Dr. Shafiq Hussain


Associate Professor & Chairperson
Department of Computer Science

1
Objectives
• Introduction to Anonymity of Data.

2
Anonymity of Data
• Data anonymization is the process of protecting
private or sensitive information by erasing or
encrypting identifiers that connect an individual to
stored data.

3
Anonymity of Data (Cont..)
• For example, you can run Personally Identifiable
Information (PII) such as names, social security
numbers, and addresses through a data anonymization
process that retains the data but keeps the source
anonymous.

4
Anonymity of Data (Cont..)
• However, even when you clear data of identifiers,
attackers can use de-anonymization methods to
retrace the data anonymization process.

• Since data usually passes through multiple sources,
some of them available to the public, de-anonymization
techniques can cross-reference the sources and reveal
personal information.
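
A minimal sketch of the cross-referencing idea, assuming two small hypothetical datasets: an "anonymized" one with names removed and a public one that still carries names. The field names, the records, and the `link` helper are illustrative assumptions, not part of the lecture.

```python
# Hypothetical "anonymized" records: names removed, other attributes kept.
anonymized = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public records that still carry names.
public = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02140", "birth_year": 1980, "sex": "M"},
]

# Attributes that are not identifiers alone but can identify in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(anon_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the attributes match exactly one person."""
    index = {}
    for person in public_rows:
        signature = tuple(person[k] for k in keys)
        index.setdefault(signature, []).append(person["name"])

    matches = []
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(anonymized, public)
```

Here only the first record is re-identified, because its combination of attributes matches exactly one named person in the public data.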

5
Anonymity of Data (Cont..)
• The General Data Protection Regulation (GDPR)
outlines a specific set of rules that protect user data
and create transparency.

• While the GDPR is strict, it permits companies to
collect anonymized data without consent, use it for
any purpose, and store it for an indefinite time, as
long as companies remove all identifiers from the
data.

6
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data masking
• Hiding data with altered values.

• You can create a mirror version of a database and
apply modification techniques such as character
shuffling, encryption, and word or character
substitution.

7
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data masking
• For example, you can replace a value character with a
symbol such as “*” or “x”.

• Data masking is intended to make reverse engineering
or detection of the original values impractical.
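
A minimal sketch of character substitution masking, assuming we keep the last few characters visible and replace the rest with a symbol. The `mask_value` helper and the sample values are hypothetical, chosen only for illustration.

```python
def mask_value(value: str, visible: int = 4, symbol: str = "*") -> str:
    """Replace all but the last `visible` characters of `value` with `symbol`."""
    if visible <= 0:
        return symbol * len(value)
    hidden = max(len(value) - visible, 0)
    return symbol * hidden + value[hidden:]


masked_card = mask_value("4111111111111111")                  # "************1111"
masked_name = mask_value("John Smith", visible=0, symbol="x")  # "xxxxxxxxxx"
```

Leaving a few characters visible keeps the value recognizable to its owner, while a fully masked value also preserves the original length, which may itself leak a little information.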

8
Anonymity of Data (Cont..)
Data Anonymization Techniques
Pseudonymization
• A data management and de-identification method that
replaces private identifiers with fake identifiers or
pseudonyms, for example replacing the identifier
“John Smith” with “Mark Spencer”.

9
Anonymity of Data (Cont..)
Data Anonymization Techniques
Pseudonymization
• Pseudonymization preserves statistical accuracy and
data integrity, allowing the modified data to be used
for training, development, testing, and analytics while
protecting data privacy.
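
A minimal sketch of pseudonymization, assuming a keyed hash (HMAC-SHA256) is used to derive the fake identifier instead of a hand-picked replacement name. The key, the `user_` prefix, and the sample records are hypothetical. The same input always maps to the same pseudonym, so counts and joins on the column still work.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it must be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


records = [
    {"name": "John Smith", "purchases": 3},
    {"name": "John Smith", "purchases": 5},
    {"name": "Jane Doe", "purchases": 1},
]

# Replace the identifier and keep the remaining attributes unchanged.
pseudonymized = [
    {"name": pseudonym(r["name"]), "purchases": r["purchases"]} for r in records
]
```

Anyone who holds the key can recompute the mapping, which is why pseudonymized data is generally still treated as personal data.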

10
Anonymity of Data (Cont..)
Data Anonymization Techniques
Generalization
• Deliberately removes some of the data to make it less
identifiable.

• Data can be modified into a set of ranges or a broad
area with appropriate boundaries.

11
Anonymity of Data (Cont..)
Data Anonymization Techniques
Generalization
• You can remove the house number in an address, but
make sure you don’t remove the road name.

• The purpose is to eliminate some of the identifiers
while retaining a measure of data accuracy.
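
A minimal sketch of generalization, assuming ages are replaced by fixed-width ranges and addresses lose a leading house number. The helper names, the range width of 10, and the address format are assumptions made for this example.

```python
import re


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the range that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_address(address: str) -> str:
    """Drop a leading house number but keep the road name."""
    return re.sub(r"^\s*\d+\w*\s+", "", address)


age_range = generalize_age(37)                    # "30-39"
road = generalize_address("221B Baker Street")   # "Baker Street"
```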

12
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data swapping
• Also known as shuffling and permutation, a technique
used to rearrange the dataset attribute values so they
don’t correspond with the original records.

13
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data swapping
• Swapping attributes (columns) that contain identifier
values, such as date of birth, may have more impact
on anonymization than swapping membership type
values.
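
A minimal sketch of data swapping, assuming the dataset is a list of dictionaries and one column is shuffled across rows. The `swap_column` helper and the sample rows are hypothetical.

```python
import random


def swap_column(rows, column, seed=None):
    """Return copies of `rows` with the values of `column` shuffled across rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Other columns stay with their original row; only `column` is reassigned.
    return [{**row, column: value} for row, value in zip(rows, values)]


rows = [
    {"id": 1, "date_of_birth": "1990-01-15", "membership": "gold"},
    {"id": 2, "date_of_birth": "1985-07-02", "membership": "silver"},
    {"id": 3, "date_of_birth": "1978-11-30", "membership": "gold"},
    {"id": 4, "date_of_birth": "2001-03-09", "membership": "bronze"},
]

swapped = swap_column(rows, "date_of_birth", seed=7)
```

The set of values in the column is unchanged, so column-level statistics survive. A plain shuffle may leave some values in their original row, so it does not guarantee that every record changes.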

14
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data perturbation
• Modifies the original dataset slightly by applying
techniques that round numbers and add random noise.

• The range of values needs to be in proportion to the
perturbation.

15
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data perturbation
• The base is the unit that values are rounded to. A
small base may lead to weak anonymization, while
a large base can reduce the utility of the dataset.

• For example, you can use a base of 5 for rounding
values like age or house number because it’s
proportional to the original value.
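
A minimal sketch of the two perturbation steps mentioned above, rounding to a base and adding random noise. The helper names and the uniform noise distribution are assumptions for illustration.

```python
import random


def round_to_base(value: float, base: int = 5) -> int:
    """Round `value` to the nearest multiple of `base`.

    Python's round() sends exact ties to the nearest even integer.
    """
    return base * round(value / base)


def add_noise(value: float, scale: float, rng: random.Random) -> float:
    """Add uniform random noise in the interval [-scale, +scale]."""
    return value + rng.uniform(-scale, scale)


rounded_age = round_to_base(37)           # 35
rounded_house_number = round_to_base(43)  # 45
noisy_salary = add_noise(50000, scale=500, rng=random.Random(0))
```

With a base of 5, an age moves by at most about 2.5 years. A base of 50 would hide far more but would make the age column close to useless.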

16
Anonymity of Data (Cont..)
Data Anonymization Techniques
Synthetic data
• Algorithmically manufactured information that has
no connection to real events.

• Synthetic data is used to create artificial datasets
instead of altering the original dataset or using it as is
and risking privacy and security.

17
Anonymity of Data (Cont..)
Data Anonymization Techniques
Synthetic data
• The process involves creating statistical models based
on patterns found in the original dataset.

• You can use standard deviations, medians, linear
regression or other statistical techniques to generate
the synthetic data.
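
A minimal sketch of synthetic data generation, assuming a single numeric column is modeled by its mean and standard deviation and new values are drawn from a normal distribution. The normal model, the `synthesize` helper, and the sample ages are assumptions; real generators usually model several columns and their relationships.

```python
import random
import statistics


def synthesize(values, n, seed=None):
    """Draw `n` synthetic values from a normal model fitted to `values`."""
    rng = random.Random(seed)
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


real_ages = [23, 35, 41, 29, 52, 38, 47, 31]
synthetic_ages = synthesize(real_ages, n=1000, seed=42)
```

None of the synthetic values belongs to a real person, yet the sample mean and spread stay close to those of the original column.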

18
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data aggregation
• Data aggregation, which combines data collected
from many different sources into a single view, is
used to gain insights for enhanced decision-making,
or analysis of trends and patterns.

19
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data aggregation
• Data can be aggregated at different levels of
granularity, from simple summaries to complex
calculations, and can be done on categorical data,
numerical data, and text data.

20
Anonymity of Data (Cont..)
Data Anonymization Techniques
Data aggregation
• Aggregated data can be presented in various forms,
and used for a variety of purposes, including analysis,
reporting, and visualization.
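
A minimal sketch of aggregation, assuming individual rows are summarized as a per-group average so that no single record appears in the output. The `aggregate_mean` helper, the city names, and the salary figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(rows, group_key, value_key):
    """Return the mean of `value_key` for each distinct value of `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


rows = [
    {"city": "Lahore", "salary": 50},
    {"city": "Lahore", "salary": 70},
    {"city": "Multan", "salary": 40},
]

average_salary = aggregate_mean(rows, "city", "salary")
```

A group with a single member, such as Multan here, still exposes that person's exact value, so very small groups weaken the anonymity that aggregation provides.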

21
Anonymity of Data (Cont..)
Data Anonymization Techniques
Random data generation
• Random data generation, which randomly shuffles
data in order to obscure sensitive information, can be
applied to an entire dataset, or to specific fields or
columns in a database.

22
Anonymity of Data (Cont..)
Data Anonymization Techniques
Random data generation
• Often used together with data masking tools or data
tokenization tools, random data generation is ideal for
clinical trials, to ensure that the subjects are not only
randomly chosen, but also randomly assigned to
different treatment groups.
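
A minimal sketch of random assignment to treatment groups, assuming subjects are shuffled and then dealt out to the groups in turn. The `assign_groups` helper, the group names, and the subject identifiers are hypothetical.

```python
import random


def assign_groups(subject_ids, groups=("treatment", "control"), seed=None):
    """Randomly assign each subject to a group, keeping group sizes balanced."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the groups in round-robin order.
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


assignment = assign_groups(["S1", "S2", "S3", "S4", "S5", "S6"], seed=1)
```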

23
Anonymity of Data (Cont..)
Disadvantages of Data Anonymization
• The GDPR stipulates that websites must obtain
consent from users to collect personal information
such as IP addresses, device ID, and cookies.

25
Anonymity of Data (Cont..)
Disadvantages of Data Anonymization
• Collecting anonymous data and deleting identifiers
from the database limit your ability to derive value
and insight from your data.

• For example, anonymized data cannot be used for
marketing efforts, or to personalize the user
experience.

26
Questions
Any questions, please?

You can contact me at: [email protected]

Your query will be answered within one working day.

27
Further Readings
• Chapter No. 1
Computer Security: Principles and Practice (3rd Edition)
By William Stallings and Lawrie Brown

28
Thanks

29
