CSBS - AD3491 - FDSA - IA 1 - Answer Key

This document contains an answer key for an internal assessment test on fundamentals of data science and analytics. It includes multiple choice and short answer questions that assess students' understanding of key concepts like data quality, data science processes, applications of data science, outlier detection, frequency distributions, and measures of central tendency. The document provides the questions, learning outcomes, required skills, and model answers to aid instructors in evaluating students.

Uploaded by

R.Mohan Kumar


Saranathan College of Engineering

Tiruchirappalli - 620012

Internal Assessment Test – I – Answer Key
Date/Session: 21-09-2022    Marks: 50
Course code: AD3491    Course Title: FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
Batch No.:    Duration: 90 MINUTES    Academic Year: 2022-2023/ODD
Year: II    Semester: III    Department: B.Tech - CSBS
Part – A (20 Marks)
Answer all the Questions (10 x 2 = 20 Marks)
Q. No. Questions CO Skills
1 What are the characteristics of quality data? C206.1 R
• Validity - The degree to which the data conforms to defined business rules or constraints.
• Accuracy - The degree to which the data is close to the true values.
• Completeness - The degree to which all required data is known.
• Consistency - The degree to which the data is consistent within the same dataset and/or across multiple datasets.
• Uniformity - The degree to which the data is specified using the same unit of measure.
2 What do you mean by Data Science? C206.1 R
• Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find hidden patterns, derive meaningful information, and make business decisions.
• Data science can be described as the entire process of gathering actionable insights from raw data, involving concepts such as data pre-processing, data modelling, statistical analysis, data analysis, and machine learning algorithms.
• The main purpose of data science is to support better decision making.
3 List at least five applications of data science. C206.1 R
• Finance and Fraud & Risk Detection.
• Healthcare.
• Internet Search and Website Recommendations.
• Retail Marketing and Targeted Advertising.
• Advanced Image Recognition.
• Speech Recognition.
• Airline Route Planning.
4 Write a short note on outlier detection and state its real-time applications. C206.1 U
• In statistics, an outlier is a data point that differs significantly from other observations.
• An outlier detection technique (ODT) is used to detect anomalous observations/samples that do not fit the typical/normal statistical distribution of a dataset.
• Applications of outlier detection include spam detection, credit card fraud detection, and intrusion detection in cyber security.
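One common way to flag such points is the interquartile-range rule. The sketch below is illustrative only: the function name `iqr_outliers`, the multiplier `k = 1.5`, and the transaction amounts are assumptions, not part of the answer key.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical card transaction amounts; the last one is unusually large.
amounts = [25, 30, 28, 32, 27, 31, 29, 26, 950]
print(iqr_outliers(amounts))
```

A fraud-detection system would typically combine a rule like this with many other signals; the rule alone only marks values that are far from the bulk of the data.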
5 What contents should be included in a project charter? C206.1 U
A project charter requires teamwork, and your input covers at least the following:
i. A clear research goal
ii. The project mission and context
iii. How you’re going to perform your analysis
iv. What resources you expect to use
v. Proof that it’s an achievable project, or proof of concepts
vi. Deliverables and a measure of success
vii. A timeline
6 Define Data Cleansing. C206.1 R
• Data cleansing is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
• When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct.
• Data cleansing is also referred to as data cleaning or data scrubbing.
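The cleansing steps named above (fixing formatting, removing duplicates, removing incomplete records) can be sketched as follows. The record layout, the field names `name` and `age`, and the helper `clean_records` are hypothetical and chosen only for illustration.

```python
def clean_records(records, required=("name", "age")):
    """Normalise text fields, drop incomplete rows, and remove duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        # Fix formatting: strip stray whitespace and unify letter case.
        row = {k: v.strip().title() if isinstance(v, str) else v
               for k, v in rec.items()}
        # Remove rows where a required field is missing or empty.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # Remove rows identical to one that has already been kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

raw = [
    {"name": " asha ", "age": 21},
    {"name": "Asha", "age": 21},     # duplicate once formatting is fixed
    {"name": "Ravi", "age": None},   # incomplete record
    {"name": "meena", "age": 23},
]
print(clean_records(raw))
```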
7 Why is frequency distribution important in Data Science? C206.2 AZ
• A frequency distribution is an organized tabulation/graphical representation of the number of individuals in each category on the scale of measurement.
The reasons for constructing a frequency distribution are as follows:
• To organize the data in a meaningful, intelligible way.
• To determine the shape of the distribution.
• To facilitate computational procedures for measures of average and spread.
• To draw charts and graphs for the presentation of data.
• To enable the reader to make comparisons among different data sets.
8 State the “Guidelines for frequency distribution”. C206.2 U
1. Each observation should be included in one, and only one, class.
2. List all classes, even those with zero frequencies.
3. All classes should have equal intervals.
4. All classes should have both an upper boundary and a lower boundary.
5. Select the class interval from convenient numbers, particularly 5 and 10 or multiples of 5 and 10.
6. The lower boundary of each class should be a multiple of the class interval.
7. Aim for a total of approximately 10 classes.
9 Write a short note on the stem-and-leaf display. Represent the following data in a stem-and-leaf display: 67, 74, 63, 88, 82, 97, 65, 79 C206.2 C
• A stem-and-leaf display presents quantitative data in a graphical format, similar to a histogram, to help visualize the shape of a distribution.
• A stem-and-leaf plot displays data by splitting each value in a dataset into a “stem” and a “leaf.”

Raw Data: 67, 74, 63, 88, 82, 97, 65, 79

Stem  Leaf
6     7 3 5
7     4 9
8     8 2
9     7
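The same display can be generated programmatically. This is a minimal sketch that assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and units digit (leaf)."""
    table = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        table[stem].append(leaf)  # leaves stay in the order they occur
    return dict(sorted(table.items()))

data = [67, 74, 63, 88, 82, 97, 65, 79]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(map(str, leaves)))
```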
10 How can the skewness of a data distribution be identified? C206.2 U
Part – B
(Answer all the questions 2 x 10 = 20 marks)

Q. No. Questions CO Skills
11 Discuss in detail the step-by-step process in Data Science with a neat diagram. C206.1 R
Ans: Refer to Unit-I study material, Page No. 15-18
The data science process typically consists of six steps, as follows:
1. Setting the research goal - Defining the what, the why, and the how of your project in a project charter.
2. Retrieving data - Finding and getting access to the data needed in your project. This data is either found within the company or retrieved from a third party.
3. Data preparation - Checking and remediating data errors, enriching the data with data from other data sources, and transforming it into a suitable format for your models.
4. Data exploration - Diving deeper into your data using descriptive statistics and visual techniques.
5. Data modelling - Using machine learning and statistical techniques to achieve your project goal.
6. Presentation and automation - Presenting your results to the stakeholders and industrializing your analysis process for repetitive reuse and integration with other tools.

12 Discuss briefly about: C206.1 R

i. Life cycle of Data Science (5)
Ans: Refer to Unit-I Study material – Page 04 to 06
Formulating a Business Problem, Data Extraction, Transformation, Loading, Data Preprocessing, Data Modeling, Gathering Actionable Insights, Solutions for the Business Problem

ii. Machine Learning in Data Science (5)
Ans: Refer to Unit-I Study material – Page 09 to 11
Regression, Decision tree, Clustering, Classification, Outlier Analysis

13 The IQ scores for a group of 35 high school dropouts are as follows: C206.2 A

91 85 84 79 80 87 96 75 86
104 95 71 105 90 77 123 80 100
93 108 98 69 99 95 90 110 109
94 100 103 112 90 90 98 89

i. Construct a frequency distribution for grouped data (4)
ii. Relative Frequency distribution (3)
iii. Cumulative Frequency distribution (3)

Solution:
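A sketch of how the three tables can be computed from the scores above. The class width of 5 and the lowest class 65-69 are assumptions chosen to follow the guidelines in Question 8 (a convenient interval, lower boundaries that are multiples of the interval); the answer key itself may use different classes. The function name `grouped_frequency` is illustrative.

```python
def grouped_frequency(values, width, start):
    """Return rows of (class label, frequency, relative freq, cumulative freq)."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1  # each value falls in exactly one class
    rows, running = [], 0
    for i, f in enumerate(counts):         # zero-frequency classes are kept
        lower = start + i * width
        running += f
        rows.append((f"{lower}-{lower + width - 1}", f, f / len(values), running))
    return rows

iq_scores = [
    91, 85, 84, 79, 80, 87, 96, 75, 86,
    104, 95, 71, 105, 90, 77, 123, 80, 100,
    93, 108, 98, 69, 99, 95, 90, 110, 109,
    94, 100, 103, 112, 90, 90, 98, 89,
]

rows = grouped_frequency(iq_scores, width=5, start=65)
print(f"{'Class':<9}{'f':>4}{'Rel. f':>9}{'Cum. f':>8}")
for label, f, rel, cum in rows:
    print(f"{label:<9}{f:>4}{rel:>9.3f}{cum:>8}")
```

With these assumed classes the table has 12 rows, one of which (115-119) has zero frequency and is still listed, as guideline 2 requires.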
Or

14 i. Discuss in detail “Measures of Central Tendency” and calculate each measure for the following retirement ages data: (6) C206.2 A
60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63
ii. Is it possible to calculate the “Mean” for qualitative data? Justify your answer. (2) AZ
iii. Is the above data “Bimodal”? Justify your answer. (2) AZ
Answer:
ii) No. The mean (average) can be computed only for quantitative data, which is measurable in nature. Qualitative data consists of categories rather than measured numerical values, so a mean cannot be calculated.

iii) Bimodal - A set of data with two modes is known as bimodal. This means that two data values share the highest frequency.
60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63

Data Frequency
45 1
55 1
60 2
63 4
65 2
70 1

Since only one value (63) has the highest frequency (4), the data is NOT bimodal.
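The three measures for the retirement ages can be checked with Python's standard `statistics` module. This is a verification sketch rather than the model answer expected in the exam; the variable names are illustrative.

```python
import statistics

ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]

mean = statistics.mean(ages)        # sum of values divided by their count
median = statistics.median(ages)    # middle value of the sorted data
modes = statistics.multimode(ages)  # every value that has the highest frequency

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
# Two entries in the list of modes would indicate a bimodal data set.
print("bimodal" if len(modes) == 2 else "not bimodal")
```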

Part – C
(Answer all the questions 1 x 10 = 10 marks)

Q.No. Questions CO Skills

15 Discuss the following measures and calculate them for the given “residence changes” data. C206.2 A
1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4
i. Range (1 + 1)
ii. Variance (1 + 1)
iii. Standard Deviation (1 + 1)
iv. InterQuartile Range (IQR) (1 + 1)
v. Z-Score (1 + 1)

Answer:

• Definitions of all the above measures need to be written. Each definition carries 1 mark.
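The calculation part can be sketched as below. Two conventions are assumed because the answer key does not state them: the variance and standard deviation use the population form (dividing by N), and the quartiles are taken as the medians of the lower and upper halves of the sorted data. Other conventions give slightly different values.

```python
import statistics

changes = [1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4]

# Range: largest value minus smallest value.
data_range = max(changes) - min(changes)

# Variance and standard deviation (population form, an assumption).
mean = statistics.mean(changes)
variance = statistics.pvariance(changes)
std_dev = statistics.pstdev(changes)

# IQR: Q3 - Q1, with quartiles as medians of the lower and upper halves.
ordered = sorted(changes)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1

def z_score(x):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / std_dev

print(f"range = {data_range}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"z-score of 11 = {z_score(11):.2f}")
```

Replacing `pvariance` and `pstdev` with `statistics.variance` and `statistics.stdev` gives the sample form, which divides by N - 1.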
16 During their first swim through a water maze, 15 laboratory rats made the following number of errors (blind alleyway entrances): C206.2 A

2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3.

(a) Find the mode, median, and mean for these data. (6)
(b) Draw the shape of uniform distribution, positively skewed distribution, and negatively skewed distribution. (3)
(c) Without constructing a frequency distribution or graph, would you characterize the shape of the above data distribution as balanced, positively skewed, or negatively skewed? (1)

Answer:
(b)
[Figures: Normal Distribution, Positively Skewed Distribution, Negatively Skewed Distribution]

(c)
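For parts (a) and (c), the measures can be computed directly and the shape judged by comparing the mean with the median. The sketch below uses the common rule of thumb that a mean above the median suggests positive skew and a mean below it suggests negative skew; the helper `skew_direction` is illustrative and not taken from the answer key.

```python
import statistics

errors = [2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3]

mode = statistics.mode(errors)
median = statistics.median(errors)
mean = statistics.mean(errors)

def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    if mean > median:
        return "positively skewed"
    if mean < median:
        return "negatively skewed"
    return "balanced"

shape = skew_direction(mean, median)
print(f"mode = {mode}, median = {median}, mean = {mean}")
print(shape)
```

Here the few large error counts (17 and 28) pull the mean above the median, which is what the rule of thumb picks up.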
