
Python Assignment

The document consists of multiple assignments containing questions and answers related to data analytics, statistics, and hypothesis testing. Each assignment includes true/false questions, multiple-choice questions, and programming tasks, with an answer key provided for each section. The content covers various topics such as data measurement scales, probability, ANOVA, and regression analysis.

Uploaded by

shortanydv

Assignment 1

Q1 State True or false:


Statement: data can be generated by machines but not by humans.

a) True
b) False
Q2 Which one of the following is not a classification of Data Analytics?
a) Diagnostic analytics
b) Deceptive analytics
c) Predictive analytics
d) Prescriptive analytics
Q3 State True or false:
Statement: Nominal scale is the lowest level of measurement and ratio scale is the
highest level of measurement.

a) True
b) False
Q4 Consider the following statements-
Statement A : With iloc, we can pass in the negative value.
Statement B : With loc, we can pass in the negative value.
a. A and B are correct
b. Both are false
c. A is correct B is false
d. B is correct A is false

Q5 To get the 3rd, 4th and 6th rows of a DataFrame “df” in Python, we can write:
a. df.loc[[2,3,5]]
b. df.loc[[3,4,5]]
c. df.iloc[3,4,6]
d. None of the above

Q6 Which of the following is not a measure of dispersion?


a. Skewness
b. Kurtosis
c. Range
d. percentile

Q7 State the following true or false?


Statement: Bimodal data sets contain more than two modes.

a. True
b. False

Q8 Bar Charts are used for :


a. Continuous data
b. Categorical data
c. both (a) & (b)
d. None of the above

Q9 Median is not applicable to


a. Ordinal
b. Interval
c. Nominal
d. None of the above

Q10
def m(data):
    Diff = max(data) - min(data)
    return Diff

The function m(data) defined above, in Python, will calculate the:
a. Inter quartile range
b. Mode
c. Median
d. Range
Correct ans: d)
Assignment 2

Q1 A college plans to interview 8 students for possible offers of graduate
assistantships. The college has three assistantships available. How many groups
of three can the college select?
a) 126
b) 56
c) 136
d) 130
Q2 A student has to take 9 more courses before he can graduate. If none of the
courses are prerequisite to others, how many groups of four courses can he select
for the next semester?
a) 126
b) 56
c) 136
d) 130
Q3 Ten individuals are candidates for positions of president, vice president of an
organization. How many possibilities of selections exist?
a) 90
b) 100
c) 120
d) 130
Q4 A student has to take 7 more courses before she can graduate. If none of the
courses are prerequisites to others, how many groups of three courses can she
select for the next semester?
a. 30
b. 35
c. 40
d. 45
Q5 A company plans to interview 10 recent graduates for possible employment. The
company has three positions open. How many groups of three can the company
select?
a) 90
b) 100
c) 120
d) 130
Q6 Eight individuals are candidates for positions of president, vice president, and
treasurer of an organization. How many possibilities of selections exist?
a. 300
b. 330
c. 336
d. 339
Q7 From a group of three finalists for a privately endowed scholarship, two individuals
are to be selected for the first and second places. Determine the number of
possible selections
a. 3
b. 6
c. 9
d. 12
Q8 State true or false
Statement: All mutually exclusive events are independent events
a. True
b. False
Q9 A committee of 4 is to be selected from a group of 12 people. How many
possible committees can be selected?
a. 395
b. 425
c. 495
d. 525
Q10 Assume a businessman has 7 suits and 8 ties. He is planning to take 3
suits and 2 ties with him on his next business trip. How many possibilities of
selection does he have?
a. 140
b. 250
c. 480
d. 500
We have noted the typo in Option C. It should have been 980 instead of 480.

Answer Key

Ques 1 B
C(8, 3) = 8!/(5! x 3!) = (8 x 7 x 6!)/(5! x 3!) = 56
Ques 2 A
Ques 3 A
Ques 4 B
Ques 5 C
Ques 6 C
Ques 7 B
Ques 8 B
Ques 9 C
Ques 10 C
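
The counts in this answer key can be checked with a minimal Python sketch. It assumes the usual reading of the questions: "groups" and "committees" are unordered selections (combinations), while named offices such as president and vice president are ordered selections (permutations). The variable names are illustrative only.

```python
from math import comb, perm

# Unordered selections: C(n, k)
groups_q1 = comb(8, 3)      # 8 students, 3 assistantships
courses_q2 = comb(9, 4)     # 9 courses, pick 4
courses_q4 = comb(7, 3)     # 7 courses, pick 3
hires_q5 = comb(10, 3)      # 10 graduates, 3 positions
committee_q9 = comb(12, 4)  # committee of 4 from 12

# Ordered selections (distinct offices or ranks): P(n, k)
officers_q3 = perm(10, 2)   # president and vice president from 10
officers_q6 = perm(8, 3)    # president, vice president, treasurer from 8
places_q7 = perm(3, 2)      # first and second place from 3 finalists

# Independent choices multiply: 3 of 7 suits and 2 of 8 ties
wardrobe_q10 = comb(7, 3) * comb(8, 2)

print(groups_q1, courses_q2, officers_q3, courses_q4, hires_q5,
      officers_q6, places_q7, committee_q9, wardrobe_q10)
```

The last value agrees with the corrected Option C (980) noted under Q10.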
Assignment 3

Q1 The specific value of a random variable is called an estimator

a) True
b) False

Q2 If the true proportion of customers who are below 20 years is P = 0.35, what is the
probability that a sample of size 100 yields a sample proportion between 0.3 and
0.4?
a) 0.961
b) 0.827
c) 0.706
d) 0.53

Q3 Stratified random sampling is a method of selecting a sample in which


a. the sample is first divided into strata, and then random samples are taken from
each stratum

b. various strata are selected from the sample

c. the population is first divided into strata, and then random samples are drawn
from each stratum

d. None of these alternatives is correct.

Q4 The interval estimate provides more information about a population characteristic
than the point estimate
a) True
b) False

Q5 A question paper contains 90 multiple choice questions. There are 4 alternative
answers (A, B, C or D), out of which only one is correct. Mr X answers these
questions randomly (i.e. without preparation). What is the probability that X gets a
score of at least 10 marks?
a. 0.9997
b. 0.7894
c. 0
d. 0.001

Q6 On average, 5% of the items supplied by manufacturer X are defective. If a batch of
10 items is inspected, what is the probability that exactly 2 items are defective?
a. 0.065
b. 0.075
c. 0.085
d. 0.095
Q7 A car distributor in city Y experiences on average 2.5 car sales per day. Find the
probability that on a randomly selected day, they will sell 5 cars:
a.​ 0.0668
b.​ 0.544
c.​ 0.082
d.​ 0.205

Q8 In question 7, Find the probability that on a randomly selected day, they will sell no
cars:
a.​ 0.0668
b.​ 0.544
c.​ 0.082
d.​ 0.205

Q9 In question 7, Find the probability that on a randomly selected day, they will sell at
most 2 cars
a.​ 0.0668
b.​ 0.544
c.​ 0.082
d.​ 0.205

Q10 In question 7, Find the probability that on a randomly selected day, they will sell
exactly one car:
a.​ 0.0668
b.​ 0.544
c.​ 0.082
d.​ 0.205

Answer Key

A1 B
The specific value of a random variable is called an estimate.
A2 C
A3 C
A4 A

A5 A

A6 B

A7 A
A8 C
A9 B
A10 D
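
The numerical answers above can be reproduced with a short sketch using only the standard library. It makes several assumptions that the questions do not state outright: Q2 uses the normal approximation to the sampling distribution of a proportion, Q5 awards one mark per correct answer with no negative marking, Q6 is binomial, and Q7 to Q10 are Poisson with mean 2.5. The helper names are illustrative.

```python
from math import comb, erf, exp, factorial, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Q2: sample proportion between 0.30 and 0.40 when P = 0.35, n = 100
p, n = 0.35, 100
se = sqrt(p * (1 - p) / n)
prob_q2 = normal_cdf((0.40 - p) / se) - normal_cdf((0.30 - p) / se)

# Q5: at least 10 correct out of 90 random guesses with 4 options each
prob_q5 = 1 - sum(binom_pmf(k, 90, 0.25) for k in range(10))

# Q6: exactly 2 defectives in a batch of 10 at a 5% defect rate
prob_q6 = binom_pmf(2, 10, 0.05)

# Q7 to Q10: daily car sales with mean 2.5
lam = 2.5
prob_five = poisson_pmf(5, lam)
prob_none = poisson_pmf(0, lam)
prob_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))
prob_one = poisson_pmf(1, lam)

for name, value in [("Q2", prob_q2), ("Q5", prob_q5), ("Q6", prob_q6),
                    ("Q7", prob_five), ("Q8", prob_none),
                    ("Q9", prob_at_most_two), ("Q10", prob_one)]:
    print(name, round(value, 4))
```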
Assignment 4

Q1 If we have a sample size of 20 and the population standard deviation is known, we will
use:
a) t- test for hypothesis testing
b) z-test for hypothesis testing
c) both t and z test
d) F-test

Q2 Null hypothesis, H0: μ1 - μ2 = 0 is a


a) Upper tail test
b) Lower tail test
c) Two tail test
d) F Test
Q3 The quality-control manager at a Li-BATTERY factory needs to determine whether
the mean life of a large shipment of Li-Battery is equal to the specified value of 375
hours. The process standard deviation is known to be 100 hours. A random sample of
64 batteries indicates a sample mean life of 350 hours.
State the null and alternative hypotheses
a. Mu = 375
b. Mu ≤ 375
c. Mu = 350
d. Mu ≥ 350
Q4 In question 3, At the alpha = 0.05 level of significance is there any evidence that the
mean life is different from 375 hours?
a. Yes, there is
b. No, there is not
c. None of the above
Q5 For one-tailed test, the test statistic z is determined to be zero. The p-value for this
test is
a. zero

b. -0.5

c. +0.5

d. 1.00

Q6 The error of rejecting a true null hypothesis is


a. a Type I error

b. a Type II error
c. is the same as b

d. committed when not enough information is available

Q7 The mean cost of a hotel room in a city is said to be $168 per night. A random
sample of 25 hotels resulted in X-bar = $172.50 and sample standard deviation s =
15.40. Calculate the t statistic.
a. 2
b. -2
c. 1.46
d. -1.46
Q8 In hypothesis testing if the null hypothesis is rejected,
a. no conclusions can be drawn from the test

b. the alternative hypothesis is true

c. the data must have been accumulated incorrectly

d. the sample size has been too small

Q9 In the hypothesis testing procedure, α is

a. 1 - the level of significance

b. the critical value

c. the confidence level

d. level of significance
Q10 If a hypothesis is rejected at the 5% level of significance, it
a. will always be rejected at the 1% level

b. will always be accepted at the 1% level

c. will never be tested at the 1% level

d. may be rejected or not rejected at the 1% level

Answer Key

A1 B
A2 C
A3 A
A4 A
A5 C
A6 A
A7 C

A8 B
A9 D
A10 D
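
The test statistics behind A4, A5 and A7 can be sketched as follows. This is a minimal check, assuming a two-tailed z-test for the battery question (σ known) and a one-sample t statistic for the hotel question (σ unknown); `normal_cdf` is a helper written for this example.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Q3/Q4: H0: Mu = 375 against H1: Mu != 375, sigma known
mu0, sigma, n, x_bar = 375, 100, 64, 350
z = (x_bar - mu0) / (sigma / sqrt(n))
p_two_tailed = 2 * (1 - normal_cdf(abs(z)))
reject_at_5_percent = p_two_tailed < 0.05

# Q5: a one-tailed test with z = 0 leaves half the distribution in the tail
p_one_tailed_at_zero = 1 - normal_cdf(0.0)

# Q7: t statistic from the sample standard deviation
t = (172.50 - 168) / (15.40 / sqrt(25))

print(z, round(p_two_tailed, 4), reject_at_5_percent)
print(p_one_tailed_at_zero, round(t, 2))
```

Because |z| = 2 exceeds the two-tailed critical value 1.96, the null hypothesis in Q4 is rejected at α = 0.05.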
Assignment 5

Q1 In the analysis of variance procedure (ANOVA) the term "factor" refers to:
a. the dependent variable
b. the independent variable
c. different levels of a treatment
d. the critical value of F

Q2 In a problem of ANOVA, involving 3 treatments and 10 observations per treatment, SSE = 500.
The MSE for this situation is
a. 130.2
b. 48.8
c. 18.52
d. 30.0

Q3 The ‘F’ ratio in a completely randomized ANOVA is the ratio of


a. MST/MSE
b. MSTR/MSE
c. MSE/MSTR
d. MSE/MST

Q4. An ANOVA procedure is applied to data obtained from 7 samples where each sample contains
10 observations. The degrees of freedom for the critical value of F are
a. 7 numerator and 20 denominator degrees of freedom
b. 5 numerator and 20 denominator degrees of freedom
c. 6 numerator and 63 denominator degrees of freedom
d. 7 numerator and 63 denominator degrees of freedom

Q5. In an ANOVA problem if SST = 200 and SSTR = 80, then SSE is
a. 280
b. 120
c. 80
d. 120

Q6. The critical F value with 8 numerator and 29 denominator degrees of freedom at α = 0.01 is
a. 2.18
b. 3.20
c. 3.53
d. 3.94

Q7. Two Independent simple random samples are taken to test the difference between the means of
two populations. The standard deviations are not known, but are assumed to be equal. The
sample sizes are n1 = 15 and n2 = 35. The correct distribution to use is the
a. t distribution with 51 degrees of freedom
b. z distribution with 50 degrees of freedom
c. z distribution with 49 degrees of freedom
d. t distribution with 48 degrees of freedom
Q8. State true or false:

Statement: The sampling distribution of two populations is approximated by a normal
distribution
a. True
b. False

Q9. Mean marks obtained by male and female students of school ABCD in the first unit test are shown
below.
                           Male    Female
Sample Size                64      36
Sample Mean Marks          44      41
Population Variance (σ²)   128     72

The standard error for the difference between the two means is
a. 4
b. 7.46
c. 4.24
d. 2.0

Q10 If you are interested in testing whether or not the average marks of males is significantly
greater than that of females, the test statistic is
a. 2.0
b. 1.5
c. 1.96
d. 1.645

ANSWER KEY
A1 B
A2 C
MSE = SSE/DOF =500/(30-3) = 18.52
A3 B
A4 C
NUMERATOR DOF = C-1 =6
DENOMINATOR DOF =N-C = 70 - 7 = 63
A5 B
SSE = SST-SSTR = 200 – 80 = 120
A6 B (USE F TABLE)
A7 D
DOF for two sample t test = n1+n2 -2 = 15 +35 -2 = 48
A8 A
Only z test is possible in case of two proportions.
A9 D
A10 B
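
The arithmetic in this answer key can be verified with a small sketch. It assumes a one-way (completely randomized) ANOVA with k treatments and N total observations, and a z statistic for the difference of two means with known population variances; `anova_dof` is a hypothetical helper name.

```python
from math import sqrt

def anova_dof(k, n_total):
    """Return (numerator, denominator) degrees of freedom for one-way ANOVA."""
    return k - 1, n_total - k

# A2: 3 treatments x 10 observations, SSE = 500
_, df_error = anova_dof(3, 30)
mse = 500 / df_error

# A4: 7 samples x 10 observations
df_q4 = anova_dof(7, 70)

# A5: SSE = SST - SSTR
sse = 200 - 80

# A7: pooled two-sample t test
df_t = 15 + 35 - 2

# A9: standard error of the difference between two means
se_diff = sqrt(128 / 64 + 72 / 36)

# A10: test statistic for H1: mean(male) > mean(female)
z = (44 - 41) / se_diff

print(round(mse, 2), df_q4, sse, df_t, se_diff, z)
```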
Week 6: Two way ANOVA and Linear regression

Q1: The model developed from sample data having the form of ŷ = b0 + b1x is known as

a. regression equation

b. correlation equation

c. estimated regression equation

d. regression model

ANS:​ C

Q2: In regression analysis, which of the following is not a required assumption about the error term ε?

a. The expected value of the error term is one.

b. The variance of the error term is the same for all values of X.

c. The values of the error term are independent.

d. The error term is normally distributed.

ANS:​ A

Q3: A regression analysis between sales (Y in $1000) and advertising (X in dollars) resulted in the following
equation

Ŷ = 30,000 + 5X

The above equation implies that an

a. increase of $5 in advertising is associated with an increase of $5,000 in sales

b. increase of $1 in advertising is associated with an increase of $5 in sales

c. increase of $1 in advertising is associated with an increase of $35,000 in sales

d. increase of $1 in advertising is associated with an increase of $5,000 in sales

ANS:​ D
Q4: In a regression and correlation analysis if r² = 1, then

a. SSE = SST

b. SSE = 1

c. SSR = SSE

d. SSR = SST

ANS:​ D

Q5: SSE can never be

a. larger than SST

b. smaller than SST

c. equal to 1

d. equal to zero

ANS:​ A

Q6: For the given data, determine the R-squared value.

Data:
Miles travelled    Petrol consumption (litres)
20                 1
45                 3
56                 5
34                 2
28                 1.6
49                 3.7

a)​ 0.887
b)​ 0.956
c)​ 0.945
d)​ 0.932

ANS: B

Q7: In the question no. 6 we will:


a)​ Accept the null hypothesis
b)​ Reject the null hypothesis
c)​ Can’t state any conclusion
d)​ None of the above

ANS: B
Q8: In Question 6, determine a 95% confidence interval for b1 to test the hypotheses

a)​ (0.045, 0.138)


b)​ (0.055, 0.148)
c)​ (0.065, 0.158)
d)​ (0.075, 0.138)

ANS: D
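
The answers to Q6 to Q8 can be checked by fitting the least-squares line by hand. This is a minimal sketch, assuming petrol consumption is regressed on miles travelled and that the hypothesis in Q7 and Q8 is that the slope equals zero; the critical value 2.776 (t with 4 degrees of freedom at 0.025 in each tail) is hard-coded from a t-table rather than computed.

```python
from math import sqrt

miles = [20, 45, 56, 34, 28, 49]
litres = [1, 3, 5, 2, 1.6, 3.7]
n = len(miles)

x_bar = sum(miles) / n
y_bar = sum(litres) / n
sxx = sum((x - x_bar) ** 2 for x in miles)
syy = sum((y - y_bar) ** 2 for y in litres)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, litres))

b1 = sxy / sxx                    # slope
b0 = y_bar - b1 * x_bar           # intercept
r_squared = sxy ** 2 / (sxx * syy)

sse = syy - b1 * sxy              # unexplained sum of squares
s = sqrt(sse / (n - 2))           # standard error of the estimate
se_b1 = s / sqrt(sxx)             # standard error of the slope

T_CRIT = 2.776                    # assumed table value, t(0.025, df = 4)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
slope_is_significant = not (ci_b1[0] <= 0 <= ci_b1[1])

print(round(r_squared, 3), tuple(round(v, 3) for v in ci_b1), slope_is_significant)
```

Since the interval excludes zero, the hypothesis of a zero slope is rejected, which matches the answer to Q7.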

Q9: State TRUE or FALSE –

Statement: The variance of the error term is the same for all values of the independent variable

a)​ True

b)​ False

ANS: A

Q10: Which of the following is possible for the coefficient of determination:


a)​ It can be larger than 1
b)​ It is less than one
c)​ It can be less than -1
d)​ None of these alternatives is correct

ANS: B
Week 7 - Linear and Multiple Regression

Q1. The interval estimate of the mean value of y for a given value of x is defined as?
a.​ Prediction interval estimate
b.​ Confidence interval estimate
c.​ Average regression
d.​ X vs Y correlation interval

Ans: B

Q2. If the coefficient of determination is a positive value, then the coefficient of correlation
a. must also be positive
b. must be zero
c. can be either negative or positive
d. must be larger than 1
ANS: C

Q3. Which of the following is true about multiple regression model?


a.​ It has only one independent variable
b.​ It has more than one dependent variable
c.​ It has more than one independent variable
d.​ It has at least 2 dependent variable

Ans: C

Q4. In a multiple regression model, the error term ɛ is assumed to


a.​ Have a mean of 1
b.​ Have a variance of 0
c.​ Have a standard deviation of 1
d.​ Be normally distributed
Ans: D

Q5. Regression analysis is a statistical procedure for developing a mathematical equation that
describes how
a. one independent and one or more dependent variables are related
b. several independent and several dependent variables are related
c. one dependent and one or more independent variables are related
d. None of these alternatives is correct.
ANS: C

Q6. If the R.sq value is small for a model with a large number of independent variables, the
adjusted coefficient of determination _______________
a.​ Can be positive
b.​ Can be negative
c.​ Is zero
d.​ Can’t say
Ans: B

Q7. Which one of the statements is true regarding residuals in regression analysis?
a.​ Mean of residuals is always 0
b.​ Mean of residuals is always < 0
c.​ Mean of residuals is always > 0
d.​ There is no such rule for residuals
Ans: A

Q8. In a simple linear regression model (one independent variable), if we change the input
variable by 1 unit, how much will the output variable change?
a.​ By 1
b.​ No change
c.​ By its slope
d.​ None of these
Ans: C

Q9. If all the points of a scatter diagram lie on the least squares regression line, then the
coefficient of determination for these variables based on these data is
a. 0
b. 1
c. either 1 or -1, depending upon whether the relationship is positive or negative
d. could be any value between -1 and 1
ANS: B

Q10. In a regression analysis, the regression equation is given by y = 12 - 6x. If SSE = 510 and
SST = 1000, then the coefficient of correlation is
a. -0.7
b. +0.7
c. 0.49
d. -0.49
ANS: A
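
The answer to Q10 follows from r² = 1 - SSE/SST, with the sign of r taken from the slope of the regression line. A minimal sketch of that calculation:

```python
from math import copysign, sqrt

sse, sst = 510, 1000
slope = -6                      # from y = 12 - 6x

r_squared = 1 - sse / sst       # coefficient of determination
r = copysign(sqrt(r_squared), slope)  # correlation carries the slope's sign

print(round(r_squared, 2), round(r, 2))
```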
Week 8

Q1. For categorical data with ‘n’ categories, the number of dummy variables will be ________
a.​ n
b.​ n-1
c.​ n+1
d.​ 2n
Ans: b

Q2. In estimation of regression parameters

A. The likelihood function is a function of only 𝜎
B. The values of 𝛽0, …, 𝛽n and 𝜎 should be such that they maximize the likelihood function.
C. Both (A) and (B)
D. All of the above
Ans: B

Q3. In logistic regression, the null hypothesis tested is:


a.​ H0: β = 0
b.​ H0: β ≠ 0
c.​ H0: μ = 0
d.​ H0: μ ≠ 0
Ans: a

Q4. In logistic regression,


a.​ The graph doesn’t follow S shape curve
b.​ The dependent variable is categorical
c.​ The estimated value of dependent variable is not probability
d.​ None of the above.
Ans. b

Q5. State true or false: G statistic is used to check the individual significance of the independent
variables
a.​ True
b.​ False
Ans: B.

Q6. The maximum likelihood estimate for binomial distribution is p = ____


a.​ 0.1
b.​ 0.2
c.​ 0.3
d.​ None of the above
Ans: c.

Q7. State True or False: The Method of Least Squares can be applied to models with any
probability distribution.
a.​ True
b.​ False
Ans: b.

Q8. Suppose you have been given a fair coin and you want to find out the odds of getting
heads. Which of the following option is true for such a case?
a.​ Odds will be 0
b.​ Odds will be 0.5
c.​ Odds will be 1
d.​ None of these
Ans. C

Q9. Large values of the log-likelihood statistic indicate:


a.​ That there are a greater number of explained vs. unexplained observations.
b.​ That the statistical model fits the data well.
c.​ That as the predictor variable increases, the likelihood of the outcome occurring
decreases.
d.​ That the statistical model is a poor fit of the data.
Ans. b

Q10. The logit function(given as l(x)) is the log of odds function. What could be the range of logit
function in the domain x=[0,1]?
a.​ (– ∞ , ∞)
b.​ (0,1)
c.​ (0 , ∞)
d.​ (- ∞, 0 )
Ans. a.
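
The odds and logit answers in Q8 and Q10 can be illustrated with a few lines of Python. This is a sketch with illustrative function names; it takes a probability p strictly between 0 and 1, since the odds are undefined at p = 1 and the log is undefined at p = 0.

```python
from math import exp, log

def odds(p):
    """Odds in favour of an event with probability p."""
    return p / (1 - p)

def logit(p):
    """Log of the odds; maps (0, 1) onto the whole real line."""
    return log(odds(p))

def inv_logit(x):
    """Logistic function, the inverse of logit."""
    return 1 / (1 + exp(-x))

print(odds(0.5))                    # fair coin
print(logit(0.001), logit(0.999))   # large negative, large positive
print(inv_logit(logit(0.3)))        # round trip back to the probability
```

As p approaches 0 the logit decreases without bound, and as p approaches 1 it increases without bound, which is why the range is (-∞, ∞).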
Week 9

Q1. State true or false: Statement: There is no difference between E(y) = β0 + β1x and
y = β0 + β1x + ε; both are regression equations.
a. True
b. False
Ans: b.

Q2. Which of the following statements is correct:


A. Sensitivity in ROC analysis is called True Positive Rate (tpr)
B. Specificity in ROC analysis is not called True Negative Rate (tnr)
C. Specificity in ROC analysis is called True Positive Rate (tpr)
D. Sensitivity in ROC analysis is called True Negative Rate (tnr)
Ans: A

Q3. In ROC analysis when the Threshold value is Higher:


A.​ Specificity decreases
B.​ Sensitivity decreases
C.​ Both a. and b.
D.​ None of the above
Ans: b.

Q4. Sensitivity in ROC analysis is defined as (TP = True Positive, FP = False Positive, TN =
True Negative, FN = False Negative):
a.​ FP / (FP+TN)
b.​ FN/(TP+FN)
c.​ TN / (TN+FP)
d.​ TP / (TP+FN)
Ans. d.

Q5. In ROC analysis, a classifier is called ‘good’ if it has ______


a.​ Low TPR and Low FPR
b.​ Low TPR and High FPR
c.​ High TPR and Low FPR
d.​ High TPR and High FPR
Ans: c

Q6. For the given confusion matrix, compute the recall

                      True Positive    True Negative
Predicted Positive    8                3
Predicted Negative    2                7
a.​ 0.73
b.​ 0.7
c.​ 0.78
d.​ 0.8
Ans: d
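
The recall in Q6 can be computed directly from the four cells. The sketch below assumes the columns of the matrix are the actual classes and the rows are the predicted classes, so that the cells read TP = 8, FP = 3, FN = 2 and TN = 7.

```python
# Confusion-matrix cells under the assumed layout
tp, fp = 8, 3   # predicted positive row
fn, tn = 2, 7   # predicted negative row

recall = tp / (tp + fn)        # sensitivity, true positive rate
precision = tp / (tp + fp)
specificity = tn / (tn + fp)   # true negative rate
false_positive_rate = fp / (fp + tn)

print(recall, round(precision, 2), specificity, false_positive_rate)
```

The same formula for recall is the definition of sensitivity given in Q4, TP / (TP + FN).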

Q7. State true or False: Precision is inversely proportional to recall


a.​ True
b.​ False
Ans: b.

Q8. State True or False: Standardization of features is not required before training a Logistic
regression model
a.​ True
b.​ False
Ans: a.

Q9. Which of the following option is true?


A) Linear Regression errors values have to be normally distributed but in the case of Logistic
Regression it is not the case
B) Logistic Regression errors values have to be normally distributed but in the case of Linear
Regression it is not the case
C) Both Linear Regression and Logistic Regression error values have to be normally distributed
D) Neither Linear Regression nor Logistic Regression error values have to be normally
distributed
Ans: a

Q10. Which of the following is true regarding the logistic function for any value “x”?
1. Logistic(x): is a logistic function of any number “x”
2. Logit(x): is a logit function of any number “x”
3. Logit_inv(x): is an inverse logit function of any number “x”
A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these
Ans: b.
Week 10: Chi-square test & Clustering

Q1. Sampling distribution for a goodness of fit test is the


a. Poisson distribution
b. t distribution
c. normal distribution
d. chi-square distribution
Ans : d.

Q2. Goodness of fit test is always conducted as a


a. lower-tail test
b. upper-tail test
c. middle test
d. None of these alternatives is correct.
Ans. b.

Q3. State True or False: Statement: Null hypothesis for chi square test of independence
assumes that, all the proportions are equal.
a.​ True
b.​ False
Ans. a.

Q4. Statistical test conducted to determine whether to reject or not reject a hypothesized
probability distribution for a population is known as a ________
a. contingency test
b. probability test
c. goodness of fit test
d. None of these alternatives is correct.
Ans. c.

Q5. What is the minimum number of variables/features required to perform clustering?


a.​ 0
b.​ 1
c.​ 2
d.​ 3
Ans. b.

Q6. The degrees of freedom for a contingency table with 12 rows and 12 columns is
a. 144
b. 121
c. 12
d. 120
ANS: B
Q7. The table below gives beverage preferences for random samples of teens and adults.
              Teens    Adults    Total
Coffee        50       200       250
Tea           100      150       250
Soft Drink    200      200       400
Other         50       50        100
Total         400      600       1,000
We are asked to test for independence between age (i.e., adult and teen) and drink preferences.
With a .05 level of significance, the critical value for the test is _______
a. 1.645
b. 7.815
c. 14.067
d. 15.507
ANS: B
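
The degrees of freedom and the test statistic for Q7 can be worked out as below. This is a minimal sketch of the chi-square test of independence, where each expected count is (row total x column total) / grand total; the critical value 7.815 is taken from the answer options rather than computed.

```python
# Observed counts: rows are drinks, columns are (teens, adults)
observed = [
    [50, 200],    # Coffee
    [100, 150],   # Tea
    [200, 200],   # Soft Drink
    [50, 50],     # Other
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_VALUE = 7.815   # assumed table value for alpha = 0.05, dof = 3
reject_independence = chi_square > CRITICAL_VALUE

print(dof, chi_square, reject_independence)
```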

Q8. How can Clustering (Unsupervised Learning) be used to improve the accuracy of the
Linear Regression model (Supervised Learning):
1. Creating different models for different cluster groups.
2. Creating an input feature for cluster ids as an ordinal variable.
3. Creating an input feature for cluster centroids as a continuous variable.
4. Creating an input feature for cluster size as a continuous variable

a.​ 1. Only
b.​ 1 & 2
c.​ 1 & 4
d.​ 1,2,3 & 4
Ans. d.

Q9. Let x1 = (1,2) and x2 = (3,5) be the coordinates of two objects. The Euclidean and
Manhattan distances between these two objects are __________ respectively
a.​ 4.2 and 3
b.​ 3.15 and 2
c.​ 3.61 and 5
d.​ None of the above
Ans: c.
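
Both distances in Q9 can be checked with the standard library. A minimal sketch:

```python
from math import dist

x1, x2 = (1, 2), (3, 5)

euclidean = dist(x1, x2)                              # sqrt(2^2 + 3^2)
manhattan = sum(abs(a - b) for a, b in zip(x1, x2))   # |2| + |3|

print(round(euclidean, 2), manhattan)
```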

Q10. Last school year, the student body of a local university consisted of 30% freshmen, 24%
sophomores, 26% juniors, and 20% seniors. A sample of 300 students taken from this year's
student body showed the following number of students in each classification.
Freshmen      83
Sophomores    68
Juniors       85
Seniors       64
We are interested in determining whether or not there has been a significant change in the
classifications between the last school year and this school year. The expected number of
freshmen is ________
a. 83
b. 90
c. 30
d. 10
ANS: B
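
The expected count in Q10 is the hypothesized proportion times the sample size. The sketch below extends this to all four classes and, as an illustration that goes beyond what the question asks, also computes the goodness-of-fit statistic.

```python
last_year_share = [0.30, 0.24, 0.26, 0.20]   # freshmen, sophomores, juniors, seniors
observed = [83, 68, 85, 64]

n = sum(observed)
expected = [share * n for share in last_year_share]

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

print([round(e, 1) for e in expected], round(chi_square, 4), dof)
```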
Week 11 - Clustering Analysis, K-means, Hierarchical clustering

Q1. Which library is used for calculating distance measures in clustering using python?

A.​ distance_matrix
B.​ scipy.spatial
C.​ scipy_spatial
D.​ distance.matrix
Ans: B.
(Error in the portal will be rectified soon.)

Q2. Formula for dissimilarity computation between two objects for categorical variables is –
Here p is the total number of categorical variables and m denotes the number of matches.

A. D(i, j) = (p - m) / p
B. D(i, j) = (p - m) / m
C. D(i, j) = (m - p) / p
D. D(i, j) = (m - p) / m
Ans: A

Q3. Select the correct option for a data set with 7 objects and an interval-scaled variable ‘f’ we
have the following measurements:
f = (1, 2, 3, 4, 5, 8, 50)
containing one outlying value.

A. Std deviation (std_f) and mean absolute deviation (s_f) are affected equally by the
outlier.
B. Mean absolute deviation (s_f) is more affected by the outlier
C. Std deviation (std_f) is less affected by the outlier
D. Std deviation (std_f) is more affected by the outlier.
Ans. D
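
One way to see why the standard deviation is more sensitive is to compute both measures with and without the outlying value 50. This sketch assumes the population standard deviation (dividing by n); `mean_abs_dev` is a helper written for this example.

```python
from statistics import mean, pstdev

f = [1, 2, 3, 4, 5, 8, 50]
f_without_outlier = [1, 2, 3, 4, 5, 8]

def mean_abs_dev(values):
    """Mean absolute deviation around the mean."""
    m = mean(values)
    return mean(abs(v - m) for v in values)

# How much each measure is inflated by the single outlier
std_ratio = pstdev(f) / pstdev(f_without_outlier)
mad_ratio = mean_abs_dev(f) / mean_abs_dev(f_without_outlier)

print(round(pstdev(f), 2), round(mean_abs_dev(f), 2))
print(round(std_ratio, 2), round(mad_ratio, 2))
```

Squaring the deviations gives the outlier more weight, so the standard deviation grows by a larger factor than the mean absolute deviation.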

Q4. Select the correct statement about the standardization in the following options –

A. Standardizing the data always gives an inefficient result while making clusters
B. Standardizing the data is always beneficial during clustering analysis
C. Variables that have an absolute value may not be efficient after standardization during
clustering analysis
D. Outliers cannot be detected in standardized data
Ans. C

Q5. Which of the following can act as possible termination conditions in K-Means?

1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations, except for
cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.

A.​ 1,3, and 4


B.​ 1,2,3 and 4
C.​ 2 and 3
D.​ None of these
Ans. B
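
The four termination conditions can be seen side by side in a small pure-Python K-means loop. This is an illustrative sketch, not a reference implementation: the function name, its parameters and the sample points are all assumptions made for the example, and the initial centroids are passed in by hand rather than chosen at random.

```python
from math import dist

def kmeans(points, centroids, max_iter=100, rss_threshold=0.0):
    """Lloyd's algorithm; returns (labels, centroids, rss, stop_reason)."""
    centroids = [tuple(c) for c in centroids]
    labels, rss = None, float("inf")
    for _ in range(max_iter):                      # condition 1: fixed iterations
        # Assign every point to its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        # Residual sum of squares around the updated centroids
        rss = sum(dist(p, new_centroids[l]) ** 2
                  for p, l in zip(points, new_labels))

        stop = None
        if new_labels == labels:                   # condition 2: same assignments
            stop = "assignments unchanged"
        elif new_centroids == centroids:           # condition 3: same centroids
            stop = "centroids unchanged"
        elif rss < rss_threshold:                  # condition 4: RSS small enough
            stop = "RSS below threshold"
        labels, centroids = new_labels, new_centroids
        if stop:
            return labels, centroids, rss, stop
    return labels, centroids, rss, "iteration limit reached"

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids, rss, reason = kmeans(points, [(0, 0), (10, 10)])
print(labels, reason)
```

In practice the conditions are usually combined, with the iteration limit acting as a safety net in case the other checks never trigger.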

Q6. In the figure below, if you draw a horizontal line at y = 2, what will be the number
of clusters formed?

a.​ 1
b.​ 2
c.​ 3
d.​ 4
Ans: b.

Q7. Which of the following clustering requires merging approach?


a.​ Partitional
b.​ Naive Bayes
c.​ Hierarchical
d.​ None of the above
Ans: c

Q8. State True or False: Hierarchical clustering should primarily be used for exploration
a.​ True
b.​ False
Ans. a.
Q9. State True or False: For finding dissimilarity between two clusters in hierarchical clustering,
average-link is the only metric used
a.​ True
b.​ False
Ans. b.

Q10. Two variables, V1 and V2, are used for clustering. Which of the following are true for
K-means clustering with k = 3?

1. If V1 and V2 have a correlation of 1, the cluster centroids will lie on a straight line
2. If V1 and V2 have a correlation of 0, the cluster centroids will lie on a straight line

a.​ 1 only
b.​ 2 only
c.​ 1 and 2
d.​ None of the above
Ans: a.
Week 12 - CART 1 & 2

Q1. Which clustering algorithm works well when the shape of the clusters is hyper-spherical?
a.​ K means
b.​ Agglomerative Hierarchical clustering
c.​ Divisive Hierarchical clustering
d.​ All of the above
Ans: a.

Q2. In decision tree, an internal node represents –


a.​ A test on an attribute
b.​ An outcome of the test
c.​ Entire sample population
d.​ Holds a class label
Ans: a.

Q3. Choose the correct statement about the CART model –


a.​ CART is an unsupervised learning technique
b.​ CART is a supervised learning technique
c.​ CART adopts a greedy approach
d.​ Both b. & c.
Ans. d.

Q4. Which class is used to build the decision tree model?


a.​ Decision tree classifier
b.​ DecisionTreeClassifier
c.​ Decision_Tree_Classifier
d.​ Decision_tree_model
Ans. b.

Q5. State True or False: Gini Index enforces the resulting tree to have multiway splits
a.​ True
b.​ False
Ans. b

Q6. Chance nodes are represented by ___________


a.​ Disks
b.​ Squares
c.​ Circles
d.​ Triangles
Ans. c.

Q7. _______is the measure of uncertainty of a random variable, it characterizes the impurity of
an arbitrary collection of examples.
a.​ Information Gain
b.​ Gini Index
c.​ Entropy
d.​ None of the above
Ans: c
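
Entropy and the Gini index (Q5 and Q7) both measure the impurity of a collection of class labels. The following is a minimal sketch of the two textbook formulas; the function names and the sample labels are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class shares."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: 1 - sum(p^2) over the class shares."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

pure = ["yes"] * 10
balanced = ["yes"] * 5 + ["no"] * 5

print(entropy(pure), gini(pure))          # no impurity
print(entropy(balanced), gini(balanced))  # maximum impurity for two classes
```

Both measures are zero for a pure node and largest when the classes are evenly mixed.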

Q8. End Nodes are represented by ________


a.​ Disks
b.​ Squares
c.​ Circles
d.​ Triangles
Ans: d.

Q9. Decision tree learners may create biased trees if some classes dominate. What is the
solution?
a.​ Balance the dataset prior to fitting
b.​ Imbalance the dataset prior to fitting
c.​ Balance the dataset after fitting
d.​ None of the above
Ans: a.

Q10. Suppose your target variable is the price of a house and you are using a decision tree. What type of
tree do you need to predict the target variable?
a.​ Classification tree
b.​ Regression tree
c.​ Clustering tree
d.​ Dimensionality reduction tree
Ans. b.
