Unit-1_Notes

Uploaded by

Daksh Chapadiya

CHAPTER-1 : ETHICS IN DATA SCIENCE

1. How is Data evolving?


or
Compare and contrast data through various stages in the past and
present.
or
Explain the evolution of data ecosystem.
Ans:
Initial Stage:
 In the initial stages of data science, the data used for academic
purposes or business needs was small, structured, and static.
 This kind of data was easy to put in the form of rows and columns and
displayed via spreadsheets.
 This type of data was very helpful and easy to handle for statisticians.
 Traditional tools such as descriptive statistics, predictive modelling, and
classification were used to handle this type of data.
Second Stage:
 Data continued to evolve from small, structured, and static to large,
unstructured, and in motion.
 This change in data behaviour created the need for skills in handling
sensor-based and IoT data, along with machine learning techniques such as
support vector machines.
Third Stage:
 The kind of data that we see today is massive, integrated, and dynamic.
 In this stage, we deal with one system of data interacting with another
system of data.
 To handle such behaviour, one must learn concepts of machine learning
and deep networks.
2. What is Data?
 Data is a collection of raw, unorganised facts and details like text,
observations, figures, symbols and descriptions of things etc.
 Data does not carry any specific purpose and has no significance by itself.
 Data is measured in terms of bits and bytes – which are basic units of
information in the context of computer storage and processing.
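As a rough illustration of measuring data in bits and bytes, the sketch below counts the storage needed for a short piece of text. The function name and the choice of UTF-8 encoding are assumptions made for this example, not part of the notes.

```python
def size_in_bytes_and_bits(text: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (bytes, bits) needed to store `text` in the given encoding."""
    n_bytes = len(text.encode(encoding))  # one byte per encoded unit
    return n_bytes, n_bytes * 8           # 1 byte = 8 bits


n_bytes, n_bits = size_in_bytes_and_bits("Data")
print(f"'Data' occupies {n_bytes} bytes, i.e. {n_bits} bits")
```

Plain English letters take one byte each in UTF-8, while many other characters take two or more, so the same number of characters can need different amounts of storage.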
3. List some examples of data.
 The number of visitors to a website in one month.
 Inventory levels in a warehouse on a specific date.
 Individual satisfaction scores on a customer service survey.
 The price of a competitor’s product.
4. What is Information?
 Information is processed, organised and structured data.
 It provides context for data and enables decision making.
 For example, a single customer’s sale at a restaurant is data; it
becomes information when the business combines many such sales to
identify the most popular or least popular dish.
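The restaurant example can be sketched in a few lines of code. The list of sales and the function name below are hypothetical, chosen only to show how raw data (individual sales) turns into information (dish popularity).

```python
from collections import Counter

# Hypothetical raw data: one entry per individual sale.
sales = ["dosa", "idli", "dosa", "vada", "dosa", "idli"]


def summarise_sales(sales):
    """Turn individual sales (data) into most/least popular dish (information)."""
    if not sales:
        raise ValueError("no sales recorded")
    ranked = Counter(sales).most_common()  # dishes sorted by count, highest first
    most_popular = ranked[0][0]
    least_popular = ranked[-1][0]
    return most_popular, least_popular


most, least = summarise_sales(sales)
print(f"Most popular: {most}, least popular: {least}")
```

Each entry in the list says very little by itself; the counts across all entries are what support a decision, such as which dish to promote or drop.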
5. List some examples of Information.
 Analysing and listing the changes made to a website that have led to an
increase or decrease in monthly site visitors.
 Identifying supply chain issues based on trends in warehouse inventory
levels over time.
 Finding areas for improvement with customer service based on a
collection of survey responses.
 Determining if a competitor is charging more or less for a similar product.
6. What is unlabelled data?
 Any data that does not have any labels specifying its characteristics,
identity, classification, or properties can be considered unlabelled data.
 Example: photos, videos, or text that do not have any category or
classification assigned to them.
7. What is Labelled Data?
 Any data which has a characteristic, category, or attributes assigned to it
can be referred to as labelled data.
 Example: the height of a human, price of a product
8. What is data labelling?
 Data labelling is the process of identifying raw data (such as text, PDF
files, and images), classifying it, and adding one or more labels to it so
that machine learning models can learn from it.
 Labelling helps the machine learning model identify the attributes of the
data to analyse and make predictions.
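A minimal sketch of the labelling step is shown below. It assumes the labels come from a human annotator and are held in a simple lookup table; the sample texts, the labels, and the function name are all hypothetical.

```python
# Hypothetical unlabelled data: raw customer comments.
raw_items = ["great service", "food was cold", "will visit again", "ok"]

# Hypothetical labels supplied by a human annotator.
annotations = {
    "great service": "positive",
    "food was cold": "negative",
    "will visit again": "positive",
}


def label_data(raw_items, annotations):
    """Attach a label to each raw item that has one; keep the rest as unlabelled."""
    labelled, unlabelled = [], []
    for item in raw_items:
        if item in annotations:
            labelled.append({"text": item, "label": annotations[item]})
        else:
            unlabelled.append(item)
    return labelled, unlabelled


labelled, unlabelled = label_data(raw_items, annotations)
print(labelled)
print(unlabelled)
```

The labelled records are what a supervised machine learning model would be trained on; the items left over remain unlabelled data until someone annotates them.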
9. State the difference between structured and unstructured data.
 Structured data is data that fits neatly into data tables and includes
discrete data types such as numbers, short text, and dates.
o Example : Excel files
 Unstructured data doesn't fit neatly into a data table because of its size
or nature.
o Example : audio and video files
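The difference can be seen in how easily each kind of data is read by a program. In the sketch below, the column names, the sample values, and the free-text note are assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical structured data: every row has the same typed columns.
structured_csv = (
    "item,quantity,stock_date\n"
    "rice,120,2024-01-31\n"
    "sugar,45,2024-01-31\n"
)

# Hypothetical unstructured data: free text with no fixed columns.
unstructured_note = "Spoke to the warehouse today; rice looks fine but sugar is running low."


def load_inventory(csv_text):
    """Parse structured CSV text into rows with a number, a short text, and a date."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "item": row["item"],
            "quantity": int(row["quantity"]),
            "stock_date": date.fromisoformat(row["stock_date"]),
        })
    return rows


inventory = load_inventory(structured_csv)
print(sum(row["quantity"] for row in inventory))  # column totals are straightforward
print(len(unstructured_note))  # the note has a length but no columns to total
```

The structured rows can be filtered, sorted, and totalled directly, whereas the note would need extra processing before any such question could be asked of it.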
10. What is retrospective and prospective data?
 Retrospective data looks back into the past.
 In retrospective studies, individuals are sampled and information is
collected about their past.
 Prospective data looks forward into the future.
 In prospective studies, individuals are followed over time and data about
them is collected as their characteristics or circumstances change.
Example : In a retrospective cohort study, researchers compare the
historical data of a group of people who have a particular disease with
that of a group of people who do not have that disease. In a prospective
cohort study, on the other hand, no one in the sample has the disease
being measured when the study commences.
11. What is the importance of Ethics in Data Science?
or
Why do Data Scientists need to follow Ethics?
 Data scientists have access to a vast pool of data in the course of their
analysis, so it is essential for them to adhere to ethical guidelines for
handling data.
 The use of protective mechanisms and policies to discourage the
mishandling and unethical use of data should be made part of best
practices.
 Some of the negative scenarios that may arise if ethical guidelines are
ignored include:
a) A few people can do an immense amount of harm:
 Many organizations have become vulnerable to data
breaches.
 Hackers worldwide are on the lookout to break through a
reputed organization's firewalls and steal important data
from its servers. The stolen data is then sold for a
hefty sum.
E.g.: Yahoo suffered what is widely regarded as the largest data
breach in the history of the internet. In 2016, the company
disclosed that it had been the victim of multiple data breaches
over the years, starting in 2013. The breaches exposed the email
addresses, names, and dates of birth associated with around
three billion Yahoo accounts.
Another example of a data breach is Marriott (Starwood) hotel.
In 2018, Marriott confirmed that the Starwood guest reservation
database had been compromised, with unauthorized access dating
back to 2014, and later put the number of affected guest records
at around 383 million. The breach exposed the names, addresses,
contact numbers, and passport information of the affected guests.
b) Lack of consent:
 One of the leading social networking sites ran an experiment in
which, without consent, it deliberately filled users' news feeds
with extreme and particularly provocative content. The aim was
to elicit a reaction from the users and then see whether that
affected what they posted in return. The news feed was curated
purposely to see whether it ultimately changed the way a user
interacted with the rest of the network. Was consent taken from
the users before this experiment was performed? The answer is
“No”.
Refer more here : https://www.analyticsvidhya.com/blog/2022/02/ethics-in-data-science-and-proper-privacy-and-usage-of-data/
12. What is a data governance framework?
 A data governance framework can be defined as a collection of practices
and processes that ensure the authorized management of data in an
organization.
 The primary purpose behind implementing data governance by any
organization is to achieve better control over its data assets, including the
methods, technologies, and behaviours around the proper management
of data.
 It also deals with the security, privacy, integrity, and management of data
flowing in and out of the organization.
 Some of the benefits of implementing a data governance framework are:
o Procedures around regulation and compliance activities become
more precise.
o There is greater transparency within data-related activities.
o The value of the organization’s data increases.
o Better resolution of issues around current data.
