CSBS - AD3491 - FDSA - IA 2 - Answer Key

This document contains an answer key for an internal assessment test on fundamentals of data science and analytics. It includes 10 multiple choice questions in Part A worth 2 marks each on topics like regression, correlation, standard error of estimate, populations and samples, hypothesis testing, and the central limit theorem. Part B includes 2 long answer questions worth 10 marks each on regression equations, standard error, hypothesis statements, and using correlation to predict values.

Uploaded by

R.Mohan Kumar

Saranathan College of Engineering

Tiruchirappalli - 620012

Internal Assessment Test – II – Answer Key

Date/Session: 19/10/2022    Marks: 50
Course code: AD3491    Course Title: FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
Batch No.:    Duration: 90 MINUTES    Academic Year: 2022-2023/ODD
Year: II    Semester: III    Department: B.Tech - CSBS
Part – A (20 Marks) Answer all the Questions (10 x 2 = 20 Marks)
Q. No. Questions CO Skills
1 Define Regression. (CO: C206.2, Skill: R)
Regression is a statistical method used to determine the strength and
character of the relationship between one dependent variable (usually
denoted by Y) and a series of other variables (known as independent
variables).
Linear regression is the most common form of this technique. Linear
regression establishes the linear relationship between two variables based on
a line of best fit.
2 Compare and contrast Correlation and Regression. (CO: C206.2, Skill: U)

Correlation: As the name says, it determines the interconnection or co-relationship between the variables.
Regression: Explains how an independent variable is numerically associated with the dependent variable.

Correlation: No distinction is made between the independent and dependent variables.
Regression: The dependent and independent variables are distinct.

Correlation: The primary objective is to find a quantitative/numerical value expressing the association between the values.
Regression: The primary intent is to estimate the values of a random variable based on the values of the fixed variable.

Correlation: Stipulates the degree to which both of the variables can move together.
Regression: Specifies the effect of a unit change in the known variable (p) on the evaluated variable (q).

Correlation: Helps to establish the connection between the two variables.
Regression: Helps in estimating a variable’s value based on another given value.

3 Write a short note on Standard Error of Estimate. (CO: C206.2, Skill: R)

The standard error of estimate, sy|x, is a rough measure of the average
amount of predictive error, that is, the average amount by which known Y
values deviate from their predicted Y values.

sy|x = √[ SSy (1 − r²) / (n − 2) ]

where SSy – Sum of Squares of ‘Y’ Scores
r – Correlation co-efficient
n – Sample size
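
The short sketch below illustrates the definition above on a small set of paired scores. The data are hypothetical and chosen only for illustration; the slope and intercept use the usual least squares formulas, which the answer key does not spell out. It checks that the value obtained from SSy, r, and n matches the one obtained directly from the deviations of known Y values from predicted Y values.

```python
import math


def standard_error_of_estimate(ss_y, r, n):
    """s_y|x computed from SSy, the correlation r, and the sample size n."""
    return math.sqrt(ss_y * (1 - r ** 2) / (n - 2))


# Hypothetical paired scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
r = sp_xy / math.sqrt(ss_x * ss_y)

# Least squares line Y' = a + bX, then the squared prediction errors.
b = sp_xy / ss_x
a = mean_y - b * mean_x
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

se_from_residuals = math.sqrt(ss_residual / (n - 2))
se_from_formula = standard_error_of_estimate(ss_y, r, n)

print(f"from residuals: {se_from_residuals:.4f}")
print(f"from formula:   {se_from_formula:.4f}")
```

Both routes give the same number because the sum of squared prediction errors equals SSy (1 − r²).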
4 Define Real Population, Hypothetical Population, and Sample. (CO: C206.2, Skill: U)
• Any complete set of observations or potential observations is called a
“Population”, and any subset of observations from a population is
known as a “Sample”.
• A real population is one in which all potential observations are
accessible at the time of sampling.
• A hypothetical population is one in which all potential observations
are not accessible at the time of sampling.
5 What do you mean by Hypothesis? Name at least 4 of its types. (CO: C206.3, Skill: R)
A hypothesis is a statement about the nature of a population. It is often
stated in terms of a population parameter. Hypothesis testing is a form of
statistical inference that uses data from a sample to draw conclusions about a
population parameter or a population probability distribution.
Some types of hypothesis statements are Directional Hypothesis,
Non-Directional Hypothesis, Null Hypothesis, Alternative Hypothesis, and
Associative Hypothesis.

6 State Addition Rule and Multiplication Rule. (CO: C206.3, Skill: R)

The addition rule states that the probability that any one of several
mutually exclusive events will occur is found by adding together the
separate probabilities of these events.

The multiplication rule states that the probability that several
independent events will occur together is found by multiplying together
the separate probabilities of these events.
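
As a small illustration of the two rules, the sketch below uses a fair die and a fair coin. Both examples are assumptions added here, not part of the answer key. Exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction

# Addition rule: rolling a 1 and rolling a 2 on one throw of a fair die
# are mutually exclusive, so their probabilities are added.
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two

# Multiplication rule: two tosses of a fair coin are independent,
# so the probabilities of heads on each toss are multiplied.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

print("P(1 or 2) =", p_one_or_two)               # 1/3
print("P(heads and heads) =", p_two_heads)       # 1/4
```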

7 Imagine a very simple population consisting of only four observations:
2, 4, 6, 8. List all possible samples of size two. (CO: C206.3, Skill: A)
For a sample size of 2, the possible samples that can be taken (with
replacement) from the above observations are:
(2,2) (2,4) (2,6) (2,8)
(4,2) (4,4) (4,6) (4,8)
(6,2) (6,4) (6,6) (6,8)
(8,2) (8,4) (8,6) (8,8)
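
The listing above can be reproduced mechanically. The sketch below assumes sampling with replacement where order matters, which is what the 16 listed pairs imply.

```python
from itertools import product

population = [2, 4, 6, 8]

# Every ordered pair drawn with replacement: 4 x 4 = 16 samples.
samples = list(product(population, repeat=2))

# Print one row per first observation, matching the layout above.
for first in population:
    row = [f"({a},{b})" for a, b in samples if a == first]
    print(" ".join(row))

print("number of samples:", len(samples))
```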
8 State Central Limit Theorem. (CO: C206.3, Skill: U)
The Central Limit Theorem states that, regardless of the population shape,
the shape of the sampling distribution of the mean approximates a normal
curve if the sample size is sufficiently large.
According to this theorem, it doesn’t matter whether the shape of the
parent population is normal, positively skewed, or negatively skewed, as long
as the sample size is sufficiently large.
If the shape of the parent population is normal, then any sample size
will be sufficiently large. Otherwise, depending on the degree of
non-normality in the parent population, a sample size between 25 and 100 is
sufficiently large.
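
A quick simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions, not part of the answer key: the parent population is an exponential distribution with mean 1 (strongly positively skewed), the sample size is 50, and 2000 samples are drawn with a fixed seed. If the sample means are roughly normal, about 68% of them should fall within one standard deviation of their own mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sample_size = 50     # assumed sample size, within the 25 to 100 range
num_samples = 2000   # assumed number of repeated samples


def sample_mean():
    """Mean of one random sample from a positively skewed population."""
    return statistics.fmean(
        random.expovariate(1.0) for _ in range(sample_size)
    )


sample_means = [sample_mean() for _ in range(num_samples)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means lying within one standard deviation of the centre.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"std. dev. of sample means: {sd_of_means:.3f}")
print(f"share within one std. dev.: {within_one_sd:.3f}")
```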
9 Indicate whether the following statements are True or False with proper
justification. The mean of all sample means, μx̅ , . . . (CO: C206.3, Skill: A)
(a) always equals the value of a particular sample mean.
(b) equals 100 if, in fact, the population mean equals 100.
(c) usually equals the value of a particular sample mean.
(d) is interchangeable with the population mean.
Answer:
(a) FALSE. The mean of all sample means does not necessarily equal the value
of any particular sample mean.
(b) TRUE. The mean of all sample means always equals the population mean.
(c) FALSE. Because of chance variability, a particular sample mean usually
differs from the mean of all sample means.
(d) TRUE. The mean of all sample means always equals the population mean, so
the two are interchangeable.
10 Indicate what’s wrong with each of the following statistical hypotheses: (CO: C206.3, Skill: U)
(a) H0: μ = 155
    H1: μ ≠ 160
(b) H0: X̅ = 241
    H1: X̅ ≠ 241
Answer:
(a) The null hypothesis and its alternative hypothesis cannot have
different anchor values. As written, the two statements are not
complementary: μ = 155 satisfies both, and μ = 160 satisfies neither.
(b) A hypothesis is a statement about a population parameter, but the
sample mean X̅ is used here. The population mean μ should appear instead.

Part – B
(Answer all the questions 2 x 10 = 20 marks)

Q. No. Questions CO Skills
11 Assume that an r of (–0.80) describes the strong negative relationship
between years of heavy smoking (X) and life expectancy (Y). Assume,
furthermore, that the distributions of heavy smoking and life expectancy
each have the following means and sums of squares: (CO: C206.2, Skill: A)
X̅ = 5, Y̅ = 60
SSx = 35, SSy = 70
(a) Determine the least squares regression equation for predicting life
expectancy from years of heavy smoking.
(b) Determine the standard error of estimate, Sy|x, assuming that the
correlation of (–0.80) was based on n = 50 pairs of observations.
(c) Predict the life expectancy for John, who has smoked for 8 years.
(d) Predict the life expectancy for Katie, who has never smoked.
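
The answer key gives no worked solution for this question. The sketch below is one way to work it, assuming the usual least squares formulas b = r·√(SSy/SSx) and a = Y̅ − b·X̅ together with the standard error of estimate formula based on SSy, r, and n. The variable names are illustrative.

```python
import math

# Values given in the question.
r = -0.80
mean_x, mean_y = 5, 60
ss_x, ss_y = 35, 70
n = 50

# (a) Least squares regression equation Y' = a + bX.
b = r * math.sqrt(ss_y / ss_x)
a = mean_y - b * mean_x

# (b) Standard error of estimate.
s_yx = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))


def predict_life_expectancy(years_smoked):
    """Predicted life expectancy for a given number of years of heavy smoking."""
    return a + b * years_smoked


print(f"(a) Y' = {b:.2f} X + {a:.2f}")
print(f"(b) s_y|x = {s_yx:.2f}")
print(f"(c) John, 8 years: {predict_life_expectancy(8):.2f}")
print(f"(d) Katie, 0 years: {predict_life_expectancy(0):.2f}")
```

Under these assumptions the equation is approximately Y' = −1.13X + 65.66, with a standard error of estimate of about 0.72.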
Or
12 Each of the following pairs represents the number of licensed drivers (X)
and the number of cars (Y) for seven houses in my neighborhood: (CO: C206.2, Skill: A)

DRIVERS (X)    CARS (Y)
5              4
5              3
2              2
2              2
3              2
1              1
2              2

(a) Determine the least squares equation for these data.
(b) Determine the standard error of estimate, Sy|x, given that n = 7.
(c) Predict the number of cars for each of two new families with two and
five drivers.
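
No worked solution is given for this question either. The sketch below works directly from the seven pairs, assuming the usual least squares formulas b = SPxy/SSx and a = Y̅ − b·X̅, where SPxy is the sum of products of deviations (a name introduced here for illustration).

```python
import math

drivers = [5, 5, 2, 2, 3, 1, 2]  # X
cars = [4, 3, 2, 2, 2, 1, 2]     # Y
n = len(drivers)

mean_x = sum(drivers) / n
mean_y = sum(cars) / n
ss_x = sum((x - mean_x) ** 2 for x in drivers)
ss_y = sum((y - mean_y) ** 2 for y in cars)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(drivers, cars))

# (a) Least squares equation Y' = a + bX.
b = sp_xy / ss_x
a = mean_y - b * mean_x

# (b) Standard error of estimate, using the correlation r.
r = sp_xy / math.sqrt(ss_x * ss_y)
s_yx = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))


def predict_cars(num_drivers):
    """Predicted number of cars for a household with num_drivers drivers."""
    return a + b * num_drivers


print(f"(a) Y' = {b:.2f} X + {a:.2f}")
print(f"(b) r = {r:.2f}, s_y|x = {s_yx:.2f}")
print(f"(c) two drivers: {predict_cars(2):.2f}, five drivers: {predict_cars(5):.2f}")
```

Under these assumptions the equation is approximately Y' = 0.56X + 0.69, which predicts about 1.81 cars for two drivers and about 3.48 cars for five drivers.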
13 Discuss the following with suitable examples: (5 x 2 = 10) (CO: C206.3, Skill: R)
i. Random Sampling vs Random Assignment
ii. Independent vs Dependent Events
iii. Independent vs Mutually Exclusive Events
iv. Conditional Probability
v. Sampling Distribution of the Mean

Answer:
i) Random sampling occurs if, at each stage of sampling, the selection process guarantees that all
potential observations in the population have an equal chance of being included in the sample.
It’s important to note that randomness describes the selection process, that is, the conditions
under which the sample is taken, and not the particular pattern of observations in the sample.
Random assignment is a procedure designed to ensure that each subject has an equal chance of
being assigned to any group in an experiment. Random sampling occurs in well-designed surveys,
and random assignment occurs in well-designed experiments.

ii) Dependent Events - The occurrence of one event affects the probability that the other
event will occur.
Independent Events - The occurrence of one event has no effect on the probability that the other
event will occur.

iii) Mutually Exclusive Events - Events that cannot occur together.
Independent Events - The occurrence of one event has no effect on the probability that the other
event will occur.

iv) Conditional Probability is the probability of one event, given the occurrence of another event.
Before multiplying to obtain the probability that two dependent events occur together, the
probability of the second event must be adjusted to reflect its dependency on the prior occurrence of
the first event. This new probability is the conditional probability of the second event, given the first
event.

v) The sampling distribution of the mean refers to the probability distribution of means for all possible
random samples of a given size from some population. The sampling distribution of the mean
allows us to determine whether, given the variability among all possible sample means, the one
observed sample mean can be viewed as a common outcome or as a rare outcome.

Refer to the class notebook or textbook for suitable examples of each term.
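
As one possible example for items ii) and iv), the sketch below compares drawing two aces from a standard 52-card deck with and without replacement. The card example is an assumption added here, not taken from the answer key. Without replacement the draws are dependent, so the probability of the second ace is the conditional probability given that the first card was an ace.

```python
from fractions import Fraction

aces, deck = 4, 52

# Independent events: the first card is replaced before the second draw.
p_ace = Fraction(aces, deck)
p_two_aces_with_replacement = p_ace * p_ace

# Dependent events: the first ace is not replaced, so the second
# probability is conditional on the first draw having been an ace.
p_second_ace_given_first = Fraction(aces - 1, deck - 1)
p_two_aces_without_replacement = p_ace * p_second_ace_given_first

print("with replacement:   ", p_two_aces_with_replacement)     # 1/169
print("P(ace | first ace): ", p_second_ace_given_first)        # 1/17
print("without replacement:", p_two_aces_without_replacement)  # 1/221
```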
Or
14 Imagine a very simple population consisting of only four observations:
2 3 4 5 (CO: C206.3, Skill: A)
(a) Explain the process of constructing a relative frequency table showing the
sampling distribution of the mean.
(b) Construct a relative frequency table showing the sampling distribution
of the mean for the above observations.
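
The question does not state the sample size. The sketch below assumes samples of size two drawn with replacement, as in question 7, and follows the usual process: list every possible sample, compute each sample mean, count how often each mean occurs, and divide by the total number of samples.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 3, 4, 5]
sample_size = 2  # assumption: the question does not give the sample size

# Step 1: list all possible samples (with replacement).
samples = list(product(population, repeat=sample_size))

# Step 2: compute the mean of each sample.
means = [Fraction(sum(s), sample_size) for s in samples]

# Step 3: count each distinct mean and convert to a relative frequency.
counts = Counter(means)
table = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

print("mean  frequency  relative frequency")
for m, rel in table.items():
    print(f"{float(m):4.1f}  {counts[m]:>2}/{len(samples)}      {float(rel):.4f}")

mean_of_means = sum(means) / len(means)
print("mean of all sample means:", float(mean_of_means))
```

Under this assumption the sample means run from 2.0 to 5.0 in steps of 0.5 with frequencies 1, 2, 3, 4, 3, 2, 1 out of 16, and the mean of all sample means equals the population mean of 3.5.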
Part – C
(Answer all the questions 1 x 10 = 10 marks)
Q.No. Questions CO Skills
15 Define Hypothesis. Discuss in detail at least 5 types of hypothesis
statement with suitable examples. (CO: C206.3, Skill: U)
A hypothesis is a statement about the nature of a population. It is often
stated in terms of a population parameter. Hypothesis testing is a form of
statistical inference that uses data from a sample to draw conclusions about a
population parameter or a population probability distribution.
Some types of hypothesis statements are Directional Hypothesis,
Non-Directional Hypothesis, Null Hypothesis, Alternative Hypothesis, and
Associative Hypothesis.
Refer to the class notebook or
https://www.analyticssteps.com/blogs/what-hypothesis-testing-types-and-methods
to know more about different types of hypothesis testing.
Or
16 Calculate the value of the z test for each of the following situations. Also,
given critical z scores of ±1.96, calculate the critical confidence level. (CO: C206.3, Skill: A)
(a) X̄ = 12; σ = 9; n = 25; μhyp = 15
(b) X̄ = 3600; σ = 4000; n = 100; μhyp = 3500
(c) X̄ = 0.25; σ = 0.10; n = 36; μhyp = 0.22

Critical z Score: A z score that separates common from rare outcomes and hence dictates whether H0 should
be retained or rejected.
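
The answer key gives no worked values for question 16. The sketch below computes them under stated assumptions: z = (X̄ − μhyp) / (σ / √n), σ in part (c) is read as 0.10 because the printed value is garbled, and the "critical confidence level" is interpreted as the proportion of the standard normal curve lying between the critical z scores of ±1.96.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, sigma, n, mu_hyp):
    """z = (sample mean - hypothesized mean) / standard error of the mean."""
    return (sample_mean - mu_hyp) / (sigma / math.sqrt(n))


# (sample mean, sigma, n, hypothesized mean) for each situation.
cases = {
    "a": (12, 9, 25, 15),
    "b": (3600, 4000, 100, 3500),
    "c": (0.25, 0.10, 36, 0.22),  # sigma assumed to be 0.10
}
critical_z = 1.96

results = {label: z_test(*values) for label, values in cases.items()}
for label, z in results.items():
    # Outcomes beyond the critical z scores are rare, so H0 is rejected.
    decision = "reject H0" if abs(z) > critical_z else "retain H0"
    print(f"({label}) z = {z:.2f} -> {decision}")

# Proportion of the standard normal curve between -1.96 and +1.96.
standard_normal = NormalDist()
confidence = standard_normal.cdf(critical_z) - standard_normal.cdf(-critical_z)
print(f"confidence level: {confidence:.2%}")
```

Under these assumptions the z values are about −1.67, 0.25, and 1.80. None falls beyond ±1.96, so H0 would be retained in each case, and the critical z scores correspond to a confidence level of about 95%.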
