Cs Unit-5
Privacy Issues
Data Privacy:
Data privacy, also called information privacy, is the part of data protection that deals with
the proper handling of data, with a focus on compliance with data protection regulations.
Data privacy is centered on how data should be collected, stored, managed, and
shared with third parties.
Data Privacy
• Data Privacy focuses on the rights of individuals, the purpose of data collection and
processing, privacy preferences, and the way organizations govern personal data of data
subjects.
• It focuses on how to collect, process, share, archive, and delete the data in accordance with
the law.
Data Security
• Data Security covers the standards, safeguards and measures that an organization
puts in place to prevent unauthorized third-party access to digital data, and any
intentional or unintentional alteration, deletion or disclosure of that data.
• It focuses on protecting data from malicious attacks and on preventing the exploitation of
stolen data (after a data breach or cyber-attack). It includes access control, encryption, network
security, etc.
Data Breach:
A data breach is a security violation in which sensitive, protected or confidential data is
copied, transmitted, viewed, stolen or used by an individual unauthorized to do so.
Ransomware:
Ransomware is a type of malware attack in which the attacker locks and encrypts the
victim’s data and important files, and then demands a payment to unlock and decrypt them.
Phishing:
• Phishing is the practice of sending fraudulent communications that appear to come from
a reputable source. It is usually done through email. The goal is to steal sensitive data such as credit
card and login information, or to install malware on the victim’s machine.
Data Linkage:
Data linkage (also called data linking) is the process of joining datasets together so that as much
use as possible can be made of the information that they hold.
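As an illustration of the joining step, here is a minimal sketch in Python. The two datasets, the `patient_id` key and the `link_records` helper are all hypothetical and chosen only for the example; real linkage work often has to deal with inexact keys, which this sketch does not attempt.

```python
# Two hypothetical datasets that share a "patient_id" key.
visits = [
    {"patient_id": 1, "visit_date": "2023-01-10"},
    {"patient_id": 2, "visit_date": "2023-02-03"},
    {"patient_id": 4, "visit_date": "2023-02-20"},
]
patients = [
    {"patient_id": 1, "city": "Pune"},
    {"patient_id": 2, "city": "Delhi"},
    {"patient_id": 3, "city": "Chennai"},
]

def link_records(left, right, key):
    """Join two lists of dicts on a shared key, keeping only matching rows."""
    # Index the right-hand dataset by key for quick lookup.
    right_index = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_index.get(row[key])
        if match is not None:
            # Merge the fields of both records into one combined record.
            linked.append({**row, **match})
    return linked

linked = link_records(visits, patients, "patient_id")
for row in linked:
    print(row)
```

Only patients 1 and 2 appear in both datasets, so only their records are linked. Each linked record now holds more information about one person than either source did alone.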
Data Profiling:
Data profiling helps you discover, understand and organize your data.
Data profiling techniques or processes used today fall into three major categories:
• Structure discovery
• Content discovery
• Relationship discovery.
• Structure discovery, also known as structure analysis, validates that the data that you have
is consistent and formatted correctly.
• Content discovery is the process of looking more closely into the individual elements of the
database to check data quality. This can help you find areas that contain null values or values
that are incorrect or ambiguous.
• Relationship discovery involves discovering what data is in use and trying to gain a better
understanding of the connections between the data sets.
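The first two categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the column names and the expected date format (`YYYY-MM-DD`) are hypothetical, and relationship discovery is not covered here.

```python
import re

# Hypothetical records; field names and the expected date format are assumptions.
records = [
    {"id": "1", "email": "a@example.com", "joined": "2023-01-15"},
    {"id": "2", "email": "", "joined": "15/01/2023"},
    {"id": "3", "email": None, "joined": "2023-03-02"},
]

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def badly_formatted(rows, column, pattern):
    """Structure discovery: non-empty values that do not match the expected format."""
    return [r[column] for r in rows if r[column] and not pattern.fullmatch(r[column])]

def missing_count(rows, column):
    """Content discovery: number of null or empty values in one column."""
    return sum(1 for r in rows if r[column] in (None, ""))

bad_dates = badly_formatted(records, "joined", DATE_PATTERN)
missing_emails = missing_count(records, "email")
print("Badly formatted dates:", bad_dates)
print("Missing emails:", missing_emails)
```

The format check flags the one date written as day/month/year, and the missing-value check counts both the empty string and the null email.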
There are four general methods by which data profiling tools help accomplish better data quality:
• Column profiling scans through a table and counts the number of times each value shows up
within each column. This method can be useful to find frequency distribution and patterns within
a column of data.
• Cross-column profiling is made up of two processes, key analysis and dependency analysis.
Key analysis examines collections of attribute values to look for a possible primary key.
Dependency analysis is a more complex process that determines whether there are
relationships or structures embedded in a data set. Both techniques help analyze
dependencies among data attributes within the same table.
• Cross-table profiling uses foreign key analysis, which is the identification of orphaned records
and determination of semantic and syntactic differences, to examine the relationships of column
sets in different tables. This can help cut down on redundancy and also identify data value sets
that could be mapped together.
• Finally, data rule validation uses data profiling in a proactive manner to verify that data instances
and data sets conform with predefined rules. This process helps find ways to improve data quality
and can be achieved either through batch validation or an ongoing validation service.
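The four methods above can be sketched roughly as follows. The tables, column names and the age rule are hypothetical, and the helper functions are illustrative rather than taken from any profiling tool. Dependency analysis is left out because it is more involved, and the rule check is shown as a single batch pass.

```python
from collections import Counter

# Hypothetical tables; all names and values are illustrative.
customers = [
    {"customer_id": 1, "country": "IN", "age": 34},
    {"customer_id": 2, "country": "IN", "age": 29},
    {"customer_id": 3, "country": "US", "age": -5},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 7},
]

def column_profile(rows, column):
    """Column profiling: how often each value appears in one column."""
    return Counter(row[column] for row in rows)

def is_candidate_key(rows, column):
    """Key analysis: a possible primary key has unique, non-null values."""
    values = [row[column] for row in rows]
    return None not in values and len(set(values)) == len(values)

def orphaned_rows(child, parent, key):
    """Cross-table profiling: child rows whose foreign key has no parent row."""
    parent_keys = {row[key] for row in parent}
    return [row for row in child if row[key] not in parent_keys]

def rule_violations(rows, rule):
    """Data rule validation: rows that fail a predefined rule."""
    return [row for row in rows if not rule(row)]

country_counts = column_profile(customers, "country")
id_is_key = is_candidate_key(customers, "customer_id")
country_is_key = is_candidate_key(customers, "country")
orphans = orphaned_rows(orders, customers, "customer_id")
invalid = rule_violations(customers, lambda row: 0 <= row["age"] <= 120)

print(country_counts, id_is_key, country_is_key)
print(orphans)
print(invalid)
```

In this sample, `customer_id` qualifies as a possible primary key while `country` does not, the order that refers to customer 7 is an orphaned record, and the customer with a negative age fails the rule.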