Python Assignment

The document consists of multiple assignments containing questions and answers related to data analytics, statistics, and hypothesis testing. Each assignment includes true/false questions, multiple-choice questions, and programming tasks, with an answer key provided for each section. The content covers various topics such as data measurement scales, probability, ANOVA, and regression analysis.

Uploaded by

shortanydv

Assignment 1

Q1 State True or false:

Statement: data can be generated by machines but not by humans.

a) True
b) False
Q2 Which one of the following is not a classification of Data Analytics?
a) Diagnostic analytics
b) Deceptive analytics
c) Predictive analytics
d) Prescriptive analytics
Q3 State True or false:
Statement: Nominal scale is the lowest level of measurement and ratio scale is the
highest level of measurement.

a) True
b) False
Q4 Consider the following statements-
Statement A : With iloc, we can pass in the negative value.
Statement B : With loc, we can pass in the negative value.
a. A and B are correct
b. Both are false
c. A is correct B is false
d. B is correct A is false

Q5 To get the 3rd, 4th & 6th rows of a data frame “df” in Python, we can write:
a. df.loc[[2,3,5]]
b. df.loc[[3,4,5]]
c. df.iloc[3,4,6]
d. None of the above
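The difference between label-based `loc` and position-based `iloc` in Q4 and Q5 can be checked with a small sketch. The frame below is hypothetical (six rows, default integer labels 0 to 5); with a different index the label-based results would differ.

```python
import pandas as pd

# Hypothetical frame with the default integer labels 0..5
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]})

# iloc is position-based, so a negative position counts from the end
last_by_position = df.iloc[-1]["value"]

# loc is label-based; with labels 0..5 there is no label -1
try:
    df.loc[-1]
    loc_accepts_negative = True
except KeyError:
    loc_accepts_negative = False

# Labels 2, 3 and 5 are the 3rd, 4th and 6th rows of this frame
picked = df.loc[[2, 3, 5]]["value"].tolist()
```

Under this assumption `df.loc[[2, 3, 5]]` selects the requested rows, while `df.iloc[3, 4, 6]` is not a valid call because `iloc` takes at most one indexer per axis.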

Q6 Which of the following is not a measure of dispersion?

a. Skewness
b. Kurtosis
c. Range
d. percentile

Q7 State the following true or false?

Statement: Bimodal data sets contain more than two modes.

a. True
b. False

Q8 Bar Charts are used for :

a. Continuous data
b. Categorical data
c. both (a) & (b)
d. None of the above

Q9 Median is not applicable to

a. Ordinal
b. Interval
c. Nominal
d. None of the above

Q10
def m(data):
    Diff = max(data) - min(data)
    return Diff

The function defined above, in Python, will calculate the:
a. Inter quartile range
b. Mode
c. Median
d. Range
Correct ans: d)
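As a rough illustration of why the answer is the range and not the inter quartile range, the sketch below computes both with the standard library. The sample values and the function names are made up for the example; `statistics.quantiles` uses its default "exclusive" method here.

```python
import statistics

def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)

def interquartile_range(data):
    """IQR: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

sample = [2, 4, 4, 5, 7, 9, 10]  # illustrative values, not from the assignment
sample_range = value_range(sample)
sample_iqr = interquartile_range(sample)
```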
Assignment 2

Q1 A college plans to interview 8 students for possible offers of graduate
assistantships. The college has three assistantships available. How many groups
of three can the college select?
a) 126
b) 56
c) 136
d) 130
Q2 A student has to take 9 more courses before he can graduate. If none of the
courses are prerequisite to others, how many groups of four courses can he select
for the next semester?
a) 126
b) 56
c) 136
d) 130
Q3 Ten individuals are candidates for the positions of president and vice president of an
organization. How many possibilities of selections exist?
a) 90
b) 100
c) 120
d) 130
Q4 A student has to take 7 more courses before she can graduate. If none of the
courses are prerequisites to others, how many groups of three courses can she
select for the next semester?
a. 30
b. 35
c. 40
d. 45
Q5 A company plans to interview 10 recent graduates for possible employment. The
company has three positions open. How many groups of three can the company
select?
a) 90
b) 100
c) 120
d) 130
Q6 Eight individuals are candidates for positions of president, vice president, and
treasurer of an organization. How many possibilities of selections exist?
a. 300
b. 330
c. 336
d. 339
Q7 From a group of three finalists for a privately endowed scholarship, two individuals
are to be selected for the first and second places. Determine the number of
possible selections
a. 3
b. 6
c. 9
d. 12
Q8 State true or false
Statement: All mutually exclusive events are independent events
a. True
b. False
Q9 A committee of 4 is to be selected from a group of 12 people. How many
possible committees can be selected?
a. 395
b. 425
c. 495
d. 525
Q10 Assume a businessman has 7 suits and 8 ties. He is planning to take 3
suits and 2 ties with him on his next business trip. How many possibilities of
selection does he have?
a. 140
b. 250
c. 480
d. 500
We have noted the typo in Option C. It should have been 980 instead of 480.

Answer Key

Ques 1 B
C(8, 3) = 8!/(5! × 3!) = 56
Ques 2 A
Ques 3 A
Ques 4 B
Ques 5 C
Ques 6 C
Ques 7 B
Ques 8 B
Ques 9 C
Ques 10 C
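The counting answers above can be reproduced with `math.comb` (order does not matter) and `math.perm` (order matters, as with named positions). This is only a sketch; the variable names are chosen for the example.

```python
import math

# Groups where order does not matter: combinations
groups_q1 = math.comb(8, 3)        # 8 students, 3 assistantships
courses_q2 = math.comb(9, 4)       # 9 courses, groups of 4
courses_q4 = math.comb(7, 3)       # 7 courses, groups of 3
hires_q5 = math.comb(10, 3)        # 10 graduates, 3 positions
committee_q9 = math.comb(12, 4)    # committee of 4 from 12

# Distinct positions (president, vice president, ...): permutations
officers_q3 = math.perm(10, 2)
officers_q6 = math.perm(8, 3)
places_q7 = math.perm(3, 2)

# Two independent choices multiply: 3 of 7 suits and 2 of 8 ties
wardrobe_q10 = math.comb(7, 3) * math.comb(8, 2)
```

The last value is 980, which matches the correction noted for option C of Q10.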
Assignment 3

Q1 The specific value of a random variable is called estimator

a) True
b) False

Q2 If the true proportion of customers who are below 20 years is P=0.35, what is the
probability that a sample of size 100 yields a sample proportion between 0.3 and
0.4?
a) 0.961
b) 0.827
c) 0.706
d) 0.53

Q3 Stratified random sampling is a method of selecting a sample in which

a. the sample is first divided into strata, and then random samples are taken from
each stratum

b. various strata are selected from the sample

c. the population is first divided into strata, and then random samples are drawn
from each stratum

d. None of these alternatives is correct.

Q4 The interval estimate provides more information about a population characteristic
than the point estimate
a) True
b) False

Q5 A question paper contains 90 multiple choice questions. There are 4 alternative
answers (A, B, C or D) out of which only one is correct. Mr X answers these
questions randomly (i.e. without preparation). What is the probability that X gets a
score of at least 10 marks?
a. 0.9997
b. 0.7894
c. 0
d. 0.001

Q6 On average, 5% of the items supplied by manufacturer X are defective. If a batch of
10 items is inspected, what is the probability that 2 items are defective?
a. 0.065
b. 0.075
c. 0.085
d. 0.095
Q7 A car distributor in city Y experiences on average 2.5 car sales per day. Find the
probability that on a randomly selected day, they will sell 5 cars:
a. 0.0668
b. 0.544
c. 0.082
d. 0.205

Q8 In question 7, Find the probability that on a randomly selected day, they will sell no
cars:
a. 0.0668
b. 0.544
c. 0.082
d. 0.205

Q9 In question 7, Find the probability that on a randomly selected day, they will sell at
most 2 cars
a. 0.0668
b. 0.544
c. 0.082
d. 0.205

Q10 In question 7, Find the probability that on a randomly selected day, they will sell
exactly one car:
a. 0.0668
b. 0.544
c. 0.082
d. 0.205

Answer Key

A1 B
The specific value of a random variable is called an estimate.
A2 C
A3 C
A4 A

A5 A

A6 B

A7 A
A8 C
A9 B
A10 D
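A short sketch of how the numerical answers to Q2 and Q6 to Q10 can be reproduced with the standard library. It assumes the usual models: a normal approximation for the sample proportion (Q2), a binomial count of defectives (Q6) and Poisson daily sales (Q7 to Q10). The helper names are made up for the example.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Q2: sample proportion is approximately Normal(P, sqrt(P(1-P)/n))
p, n = 0.35, 100
std_error = math.sqrt(p * (1 - p) / n)
sampling_dist = NormalDist(mu=p, sigma=std_error)
prob_q2 = sampling_dist.cdf(0.40) - sampling_dist.cdf(0.30)

# Q6: exactly 2 defectives among 10 items with a 5% defect rate
prob_q6 = binomial_pmf(2, 10, 0.05)

# Q7-Q10: Poisson with a mean of 2.5 sales per day
lam = 2.5
prob_q7 = poisson_pmf(5, lam)                          # exactly 5 cars
prob_q8 = poisson_pmf(0, lam)                          # no cars
prob_q9 = sum(poisson_pmf(k, lam) for k in range(3))   # at most 2 cars
prob_q10 = poisson_pmf(1, lam)                         # exactly 1 car
```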
Assignment 4

Q1 If we have a sample size of 20 and the population standard deviation is known, we will
use:
a) t- test for hypothesis testing
b) z-test for hypothesis testing
c) both t and z test
d) F-test

Q2 Null hypothesis, H0: μ1 − μ2 = 0 is a

a) Upper tail test
b) Lower tail test
c) Two tail test
d) F Test
Q3 The quality-control manager at a Li-BATTERY factory needs to determine whether
the mean life of a large shipment of Li-Battery is equal to the specified value of 375
hours. The process standard deviation is known to be 100 hours. A random sample of
64 batteries indicates a sample mean life of 350 hours.
State the null and alternative hypotheses
a. Mu = 375
b. Mu ≤ 375
c. Mu = 350
d. Mu ≥ 350
Q4 In question 3, At the alpha = 0.05 level of significance is there any evidence that the
mean life is different from 375 hours?
a. Yes, there is
b. No, there is not
c. None of the above
Q5 For one-tailed test, the test statistic z is determined to be zero. The p-value for this
test is
a. zero

b. -0.5

c. +0.5

d. 1.00

Q6 The error of rejecting a true null hypothesis is

a. a Type I error

b. a Type II error
c. is the same as b

d. committed when not enough information is available

Q7 The mean cost of a hotel room in a city is said to be $168 per night. A random
sample of 25 hotels resulted in X-bar = $172.50 and sample standard deviation s =
15.40. Calculate the t statistic.
a. 2
b. -2
c. 1.46
d. -1.46
Q8 In hypothesis testing if the null hypothesis is rejected,
a. no conclusions can be drawn from the test

b. the alternative hypothesis is true

c. the data must have been accumulated incorrectly

d. the sample size has been too small

Q9 In the hypothesis testing procedure, α is

a. 1 - the level of significance

b. the critical value

c. the confidence level

d. level of significance
Q10 If a hypothesis is rejected at the 5% level of significance, it
a. will always be rejected at the 1% level

b. will always be accepted at the 1% level

c. will never be tested at the 1% level

d. may be rejected or not rejected at the 1% level

Answer Key

A1 B
A2 C
A3 A
A4 A
A5 C
A6 A
A7 C

A8 B
A9 D
A10 D
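The test statistics behind A4, A5 and A7 can be worked out as follows. This is a minimal sketch using only the numbers given in the questions; the p-value uses the standard normal distribution from the `statistics` module.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist()

# Q3/Q4: z-test with known sigma, H0: mu = 375 against H1: mu != 375
z = (350 - 375) / (100 / math.sqrt(64))
p_two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
reject_h0 = p_two_tailed < 0.05

# Q5: a one-tailed test with z = 0 leaves half the distribution in the tail
p_one_tailed_at_zero = 1 - standard_normal.cdf(0.0)

# Q7: t statistic with the sample standard deviation, n = 25
t = (172.50 - 168) / (15.40 / math.sqrt(25))
```

Here z is -2, which lies outside ±1.96, so the null hypothesis in Q4 is rejected at α = 0.05.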
Assignment 5

Q1 In the analysis of variance procedure (ANOVA) the term "factor" refers to:
a. the dependent variable
b. the independent variable
c. different levels of a treatment
d. the critical value of F

Q2 In a problem of ANOVA, involving 3 treatments and 10 observations per treatment, SSE = 500.
The MSE for this situation is
a. 130.2
b. 48.8
c. 18.52
d. 30.0

Q3 The ‘F’ ratio in a completely randomized ANOVA is the ratio of

a. MST/MSE
b. MSTR/MSE
c. MSE/MSTR
d. MSE/MST

Q4. An ANOVA procedure is applied to data obtained from 7 samples where each sample contains
10 observations. The degrees of freedom for the critical value of F are
a. 7 numerator and 20 denominator degrees of freedom
b. 5 numerator and 20 denominator degrees of freedom
c. 6 numerator and 63 denominator degrees of freedom
d. 7 numerator and 63 denominator degrees of freedom

Q5. In an ANOVA problem if SST = 200 and SSTR = 80, then SSE is
a. 280
b. 120
c. 80
d. 120

Q6. The critical F value with 8 numerator and 29 denominator degrees of freedom at α = 0.01 is
a. 2.18
b. 3.20
c. 3.53
d. 3.94

Q7. Two Independent simple random samples are taken to test the difference between the means of
two populations. The standard deviations are not known, but are assumed to be equal. The
sample sizes are n1 = 15 and n2 = 35. The correct distribution to use is the
a. t distribution with 51 degrees of freedom
b. z distribution with 50 degrees of freedom
c. z distribution with 49 degrees of freedom
d. t distribution with 48 degrees of freedom
Q8. State true or false:

Statement: The sampling distribution of two populations is approximated by a normal
distribution
a. True
b. False

Q9. Mean marks obtained by male and female students of school ABCD in first unit test are shown
as below.
                            Male    Female
Sample Size                 64      36
Sample Mean Marks           44      41
Population Variance (σ²)    128     72

The standard error for the difference between the two means is
a. 4
b. 7.46
c. 4.24
d. 2.0

Q10 If you are interested in testing whether or not the average marks of males is significantly
greater than that of females, the test statistic is
a. 2.0
b. 1.5
c. 1.96
d. 1.645

ANSWER KEY
A1 B
A2 C
MSE = SSE/DOF =500/(30-3) = 18.52
A3 B
A4 C
NUMERATOR DOF = C-1 =6
DENOMINATOR DOF =N-C = 70 - 7 = 63
A5 B
SSE = SST-SSTR = 200 – 80 = 120
A6 B (USE F TABLE)
A7 D
DOF for two sample t test = n1+n2 -2 = 15 +35 -2 = 48
A8 A
Only z test is possible in case of two proportions.
A9 D
A10 B
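The arithmetic in this answer key can be checked with a few lines. The sketch covers only the answers that follow from formulas (A2, A4, A5, A9, A10); the critical F value in A6 still comes from an F table.

```python
import math

# A2: MSE = SSE / (N - k) with 3 treatments and 10 observations each
treatments, per_treatment, sse_q2 = 3, 10, 500
mse = sse_q2 / (treatments * per_treatment - treatments)

# A4: 7 samples of 10 observations each
samples, per_sample = 7, 10
df_numerator = samples - 1
df_denominator = samples * per_sample - samples

# A5: SSE = SST - SSTR
sse_q5 = 200 - 80

# A9: standard error of the difference between two means (variances known)
std_error = math.sqrt(128 / 64 + 72 / 36)

# A10: test statistic for male mean minus female mean
z = (44 - 41) / std_error
```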
Week 6: Two way ANOVA and Linear regression

Q1: The model developed from sample data having the form ŷ = b0 + b1x is known as

a. regression equation

b. correlation equation

c. estimated regression equation

d. regression model

ANS: C

Q2: In regression analysis, which of the following is not a required assumption about the error term ε?

a. The expected value of the error term is one.

b. The variance of the error term is the same for all values of X.

c. The values of the error term are independent.

d. The error term is normally distributed.

ANS: A

Q3: A regression analysis between sales (Y in $1000) and advertising (X in dollars) resulted in the following
equation

Ŷ = 30,000 + 5X

The above equation implies that an

a. increase of $5 in advertising is associated with an increase of $5,000 in sales

b. increase of $1 in advertising is associated with an increase of $5 in sales

c. increase of $1 in advertising is associated with an increase of $35,000 in sales

d. increase of $1 in advertising is associated with an increase of $5,000 in sales

ANS: D
Q4: In a regression and correlation analysis if r² = 1, then

a. SSE = SST

b. SSE = 1

c. SSR = SSE

d. SSR = SST

ANS: D

Q5: SSE can never be

a. larger than SST

b. smaller than SST

c. equal to 1

d. equal to zero

ANS: A

Q6:

For the given data determine the R-squared value

Data:
Miles travelled    Petrol consumption (litres)
20                 1
45                 3
56                 5
34                 2
28                 1.6
49                 3.7

a) 0.887
b) 0.956
c) 0.945
d) 0.932

ANS: B

Q7: In the question no. 6 we will:

a) Accept the null hypothesis
b) Reject the null hypothesis
c) Can’t state any conclusion
d) None of the above

ANS: B
Q8: In Question 6, determine a 95% confidence interval for b1 to test the hypotheses

a) (0.045, 0.138)

b) (0.055, 0.148)
c) (0.065, 0.158)
d) (0.075, 0.138)

ANS: D
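The answers to Q6 and Q8 can be reproduced from the data with ordinary least squares. The sketch below uses plain Python; the critical value 2.776 is the t-table value for 4 degrees of freedom at the 0.025 tail, entered by hand because the standard library has no t distribution.

```python
import math

miles = [20, 45, 56, 34, 28, 49]
litres = [1, 3, 5, 2, 1.6, 3.7]
n = len(miles)

x_bar = sum(miles) / n
y_bar = sum(litres) / n
sxx = sum((x - x_bar) ** 2 for x in miles)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(miles, litres))
sst = sum((y - y_bar) ** 2 for y in litres)

# Least squares estimates of slope b1 and intercept b0
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# R-squared = 1 - SSE/SST
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(miles, litres))
r_squared = 1 - sse / sst

# 95% confidence interval for b1 with n - 2 = 4 degrees of freedom
T_CRIT = 2.776  # from a t table, not computed here
se_b1 = math.sqrt(sse / (n - 2) / sxx)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
```

The interval does not contain zero, which is consistent with rejecting the null hypothesis of a zero slope in Q7.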

Q9: State TRUE or FALSE –

Statement: The variance of the error term is the same for all values of the independent variable

a) True

b) False

ANS: A

Q10: Which of the following is possible for the coefficient of determination:

a) It can be larger than 1
b) It is less than one
c) It can be less than -1
d) None of these alternatives is correct

ANS: B
Week 7 - Linear and Multiple Regression

Q1. The interval estimate of the mean value of y for a given value of x is defined as?
a. Prediction interval estimate
b. Confidence interval estimate
c. Average regression
d. X vs Y correlation interval

Ans: B

Q2. If the coefficient of determination is a positive value, then the coefficient of correlation
a. must also be positive
b. must be zero
c. can be either negative or positive
d. must be larger than 1
ANS: C

Q3. Which of the following is true about multiple regression model?

a. It has only one independent variable
b. It has more than one dependent variable
c. It has more than one independent variable
d. It has at least 2 dependent variable

Ans: C

Q4. In a multiple regression model, the error term ɛ is assumed to

a. Have a mean of 1
b. Have a variance of 0
c. Have a standard deviation of 1
d. Be normally distributed
Ans: D

Q5. Regression analysis is a statistical procedure for developing a mathematical equation that
describes how
a. one independent and one or more dependent variables are related
b. several independent and several dependent variables are related
c. one dependent and one or more independent variables are related
d. None of these alternatives is correct.
ANS: C

Q6. If the R.sq value is small for a model with a large number of independent variables, the
adjusted coefficient of determination _______________
a. Can be positive
b. Can be negative
c. Is zero
d. Can’t say
Ans: B

Q7. Which one of the statements is true regarding residuals in regression analysis?
a. Mean of residuals is always 0
b. Mean of residuals is always < 0
c. Mean of residuals is always > 0
d. There is no such rule for residuals
Ans: A

Q8. In a simple linear regression model (one independent variable), if we change the input
variable by 1 unit, how much will the output variable change?
a. By 1
b. No change
c. By its slope
d. None of these
Ans: C

Q9. If all the points of a scatter diagram lie on the least squares regression line, then the
coefficient of determination for these variables based on these data is
a. 0
b. 1
c. either 1 or -1, depending upon whether the relationship is positive or negative
d. could be any value between -1 and 1
ANS: B

Q10. In a regression analysis, the regression equation is given by y = 12 - 6x. If SSE = 510 and
SST = 1000, then the coefficient of correlation is
a. -0.7
b. +0.7
c. 0.49
d. -0.49
ANS: A
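For Q10, the coefficient of correlation takes its magnitude from r² = 1 − SSE/SST and its sign from the slope of the regression line. A small sketch:

```python
import math

sse, sst = 510, 1000
slope = -6  # from y = 12 - 6x

r_squared = 1 - sse / sst                       # coefficient of determination
r = math.copysign(math.sqrt(r_squared), slope)  # r carries the sign of the slope
```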
Week 8

Q1. For categorical data with ‘n’ categories, the number of dummy variables will be________
a. n
b. n-1
c. n+1
d. 2n
Ans: b

Q2. In estimation of regression parameters

A. The likelihood function is a function of only 𝜎

B. The values of 𝛽0,….,𝛽n and 𝜎 should be such that they maximize the likelihood function.
C. Both (a) and (b)
D. All of the above
Ans: B

Q3. In logistic regression, the null hypothesis tested is:

a. H0: β = 0
b. H0: β ≠ 0
c. H0: μ = 0
d. H0: μ ≠ 0
Ans: a

Q4. In logistic regression,

a. The graph doesn’t follow S shape curve
b. The dependent variable is categorical
c. The estimated value of dependent variable is not probability
d. None of the above.
Ans. b

Q5. State true or false: G statistic is used to check the individual significance of the independent
variables
a. True
b. False
Ans: B.

Q6. The maximum likelihood estimate for binomial distribution is p = ____

a. 0.1
b. 0.2
c. 0.3
d. None of the above
Ans: c.

Q7. State True or False: The Method of Least Squares can be applied to models with any
probability distribution.
a. True
b. False
Ans: b.

Q8. Suppose you have been given a fair coin and you want to find out the odds of getting
heads. Which of the following options is true for such a case?
a. Odds will be 0
b. Odds will be 0.5
c. Odds will be 1
d. None of these
Ans. C

Q9. Large values of the log-likelihood statistic indicate:

a. That there are a greater number of explained vs. unexplained observations.
b. That the statistical model fits the data well.
c. That as the predictor variable increases, the likelihood of the outcome occurring
decreases.
d. That the statistical model is a poor fit of the data.
Ans. b

Q10. The logit function(given as l(x)) is the log of odds function. What could be the range of logit
function in the domain x=[0,1]?
a. (– ∞ , ∞)
b. (0,1)
c. (0 , ∞)
d. (- ∞, 0 )
Ans. a.
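Q8 and Q10 both rest on the definition odds = p / (1 − p) and logit = log(odds). The sketch below evaluates them at a few probabilities; the function names are chosen for the example. Probabilities close to 0 and 1 give large negative and large positive logits, which is why the range is unbounded in both directions.

```python
import math

def odds(p):
    """Odds in favour of an event with probability p (0 < p < 1)."""
    return p / (1 - p)

def logit(p):
    """Log of the odds."""
    return math.log(odds(p))

fair_coin_odds = odds(0.5)
logit_values = [logit(p) for p in (0.001, 0.5, 0.999)]
```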
Week 9

Q1. State true or false: Statement: There is no difference between E(y) = β0 + β1x and y = β0 + β1x + ε; both are regression equations.
a. True
b. False
Ans: b.

Q2. Which of the following statements is correct:

a. Sensitivity in ROC analysis is called True Positive Rate (tpr)
b. Specificity in ROC analysis is not called True Negative Rate (tnr)
c. Specificity in ROC analysis is called True Positive Rate (tpr)
d. Sensitivity in ROC analysis is called True Negative Rate (tnr)
Ans: A

Q3. In ROC analysis when the Threshold value is Higher:

A. Specificity decreases
B. Sensitivity decreases
C. Both a. and b.
D. None of the above
Ans: b.

Q4. Sensitivity in ROC analysis is defined as (TP = True Positive, FP = False Positive, TN =
True Negative, FN = False Negative):
a. FP / (FP+TN)
b. FN/(TP+FN)
c. TN / (TN+FP)
d. TP / (TP+FN)
Ans. d.

Q5. In ROC analysis, a classifier is called ‘good’ if it has ______

a. Low TPR and Low FPR
b. Low TPR and High FPR
c. High TPR and Low FPR
d. High TPR and High FPR
Ans: c

Q6. For the given confusion matrix, compute the recall

                      Actual Positive    Actual Negative
Predicted Positive    8                  3
Predicted Negative    2                  7
a. 0.73
b. 0.7
c. 0.78
d. 0.8
Ans: d
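The metrics from Q4 to Q6 follow directly from the four cells of the confusion matrix. The sketch assumes the columns of the table are the actual classes and the rows are the predicted classes.

```python
# Cells of the confusion matrix in Q6
tp, fp = 8, 3   # predicted positive: actually positive / actually negative
fn, tn = 2, 7   # predicted negative: actually positive / actually negative

recall = tp / (tp + fn)               # sensitivity, true positive rate
precision = tp / (tp + fp)
specificity = tn / (tn + fp)          # true negative rate
false_positive_rate = fp / (fp + tn)  # 1 - specificity
```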

Q7. State true or False: Precision is inversely proportional to recall

a. True
b. False
Ans: b.

Q8. State True or False: Standardization of features is not required before training a Logistic
regression model
a. True
b. False
Ans: a.

Q9. Which of the following options is true?

A) Linear Regression errors values have to be normally distributed but in the case of Logistic
Regression it is not the case
B) Logistic Regression errors values have to be normally distributed but in the case of Linear
Regression it is not the case
C) Both Linear Regression and Logistic Regression error values have to be normally distributed
D) Both Linear Regression and Logistic Regression error values have not to be normally
distributed
Ans: a

Q10. Which of the following is true regarding the logistic function for any value “x”?
A. Logistic(x): is a logistic function of any number “x”
B. Logit(x): is a logit function of any number “x”
C. Logit_inv(x): is an inverse logit function of any number “x”
A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these
Ans: b.
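That the logistic function is the inverse of the logit can be checked numerically: applying `logit` and then `logistic` should return the starting probability. A minimal sketch with illustrative probabilities:

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def logistic(x):
    """Logistic (sigmoid) function of any real number x."""
    return 1 / (1 + math.exp(-x))

probabilities = (0.1, 0.5, 0.9)
round_trip = [logistic(logit(p)) for p in probabilities]
```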
Week 10: Chi-square test & Clustering

Q1. Sampling distribution for a goodness of fit test is the

a. Poisson distribution
b. t distribution
c. normal distribution
d. chi-square distribution
Ans : d.

Q2. Goodness of fit test is always conducted as a

a. lower-tail test
b. upper-tail test
c. middle test
d. None of these alternatives is correct.
Ans. b.

Q3. State True or False: Statement: Null hypothesis for chi square test of independence
assumes that, all the proportions are equal.
a. True
b. False
Ans. a.

Q4. Statistical test conducted to determine whether to reject or not reject a hypothesized
probability distribution for a population is known as a ________
a. contingency test
b. probability test
c. goodness of fit test
d. None of these alternatives is correct.
Ans. c.

Q5. What is the minimum number of variables/features required to perform clustering?

a. 0
b. 1
c. 2
d. 3
Ans. b.

Q6. The degrees of freedom for a contingency table with 12 rows and 12 columns is
a. 144
b. 121
c. 12
d. 120
ANS: B
Q7. The table below gives beverage preferences for random samples of teens and adults.
              Teens    Adults    Total
Coffee        50       200       250
Tea           100      150       250
Soft Drink    200      200       400
Other         50       50        100
Total         400      600       1,000
We are asked to test for independence between age (i.e., adult and teen) and drink preferences.
With a .05 level of significance, the critical value for the test is _______
a. 1.645
b. 7.815
c. 14.067
d. 15.507
ANS: B
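For Q7 the degrees of freedom are (rows − 1) × (columns − 1) = 3, which is why the critical value is read from the chi-square table at 3 degrees of freedom. The sketch below also computes the test statistic from the table, which the question does not ask for; the critical value 7.815 is taken from the answer options, not computed.

```python
observed = [
    [50, 200],    # Coffee: teens, adults
    [100, 150],   # Tea
    [200, 200],   # Soft Drink
    [50, 50],     # Other
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

CRITICAL_VALUE = 7.815  # chi-square table, 3 degrees of freedom, alpha = 0.05
reject_independence = chi_square > CRITICAL_VALUE
```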

Q8. How can Clustering (Unsupervised Learning) be used to improve the accuracy of the
Linear Regression model (Supervised Learning):
1. Creating different models for different cluster groups.
2. Creating an input feature for cluster ids as an ordinal variable.
3. Creating an input feature for cluster centroids as a continuous variable.
4. Creating an input feature for cluster size as a continuous variable

a. 1. Only
b. 1 & 2
c. 1 & 4
d. 1,2,3 & 4
Ans. d.

Q9. Let x1 = (1,2) and x2 = (3,5) be the coordinates of two objects. The Euclidean and
Manhattan distances between these two objects are __________ respectively
a. 4.2 and 3
b. 3.15 and 2
c. 3.61 and 5
d. None of the above
Ans: c.
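Both distances in Q9 can be computed with the standard library: `math.dist` gives the Euclidean distance and the Manhattan distance is the sum of absolute coordinate differences.

```python
import math

x1, x2 = (1, 2), (3, 5)

euclidean = math.dist(x1, x2)                        # sqrt(2**2 + 3**2)
manhattan = sum(abs(a - b) for a, b in zip(x1, x2))  # |1-3| + |2-5|
```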

Q10. Last school year, the student body of a local university consisted of 30% freshmen, 24%
sophomores, 26% juniors, and 20% seniors. A sample of 300 students taken from this year's
student body showed the following number of students in each classification.
Freshmen 83
Sophomores 68
Juniors 85
Seniors 64
We are interested in determining whether or not there has been a significant change in the
classifications between the last school year and this school year. The expected number of
freshmen is ________
a. 83
b. 90
c. 30
d. 10
ANS: B
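The expected counts in Q10 are last year's percentages applied to this year's sample of 300. The sketch below also computes the goodness of fit statistic, which goes one step beyond what the question asks.

```python
last_year_pct = {"Freshmen": 30, "Sophomores": 24, "Juniors": 26, "Seniors": 20}
observed = {"Freshmen": 83, "Sophomores": 68, "Juniors": 85, "Seniors": 64}

sample_size = sum(observed.values())

# Expected count = hypothesized proportion * sample size
expected = {group: pct * sample_size / 100 for group, pct in last_year_pct.items()}

chi_square = sum(
    (observed[group] - expected[group]) ** 2 / expected[group]
    for group in observed
)
```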
Week 11 - Clustering Analysis, K-means, Hierarchical clustering

Q1. Which library is used for calculating distance measures in clustering using python?

A. distance_matrix
B. scipy.spatial
C. scipy_spatial
D. distance.matrix
Ans: B.
(Error in the portal will be rectified soon.)

Q2. The formula for computing the dissimilarity between two objects described by categorical variables is –
Here p is the total number of categorical variables and m denotes the number of matches.

A. D(i, j) = (p − m) / p

B. D(i, j) = (p − m) / m
C. D(i, j) = (m − p) / p
D. D(i, j) = (m − p) / m
Ans: A
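A minimal sketch of the simple matching dissimilarity from Q2, taking p as the number of categorical variables and m as the number of variables on which the two objects agree. The two objects and the function name are hypothetical.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """Simple matching dissimilarity D(i, j) = (p - m) / p."""
    p = len(obj_i)                                 # number of variables
    m = sum(a == b for a, b in zip(obj_i, obj_j))  # number of matches
    return (p - m) / p

# Hypothetical objects described by three categorical variables
object_a = ("red", "small", "round")
object_b = ("red", "large", "round")
d_ab = categorical_dissimilarity(object_a, object_b)
```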

Q3. Select the correct option for a data set with 7 objects and an interval-scaled variable ‘f’ we
have the following measurements:
f = (1, 2, 3, 4, 5, 8, 50)
containing one outlying value.

A. Std deviation (std_f) and mean absolute deviation (s_f) are equally affected by the
outlier.
B. Mean absolute deviation (s_f) is more affected by the outlier
C. Std deviation (std_f) is less affected by the outlier
D. Std deviation(std_f) is more affected by the outlier.
Ans. D
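The effect of the outlier in Q3 can be seen by computing both measures with and without the value 50. This sketch uses the population standard deviation (`statistics.pstdev`); that choice is an assumption, since the question does not say which form is meant.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute distance from the mean."""
    mean = statistics.fmean(values)
    return sum(abs(v - mean) for v in values) / len(values)

f = [1, 2, 3, 4, 5, 8, 50]
f_without_outlier = f[:-1]

s_f = mean_absolute_deviation(f)
std_f = statistics.pstdev(f)

# How much each measure grows when the outlier is included
mad_inflation = s_f / mean_absolute_deviation(f_without_outlier)
std_inflation = std_f / statistics.pstdev(f_without_outlier)
```

Squaring the deviations gives the outlier more weight, so the standard deviation grows by a larger factor than the mean absolute deviation.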

Q4. Select the correct statement about the standardization in the following options –

A. Standardizing the data always gives an inefficient result while making clusters
B. Standardizing the data is always beneficial during clustering analysis
C. Variables having an absolute value may not be efficient after standardization during
clustering analysis
D. Outliers cannot be detected by standardized data
Ans. C

Q5. Which of the following can act as possible termination conditions in K-Means?

1. For a fixed number of iterations.

2. Assignment of observations to clusters does not change between iterations, except for
cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.

A. 1,3, and 4

B. 1,2,3 and 4
C. 2 and 3
D. None of these
Ans. B

Q6. In the figure below, if you draw a horizontal line on y-axis for y=2. What will be the number
of clusters formed?

a. 1
b. 2
c. 3
d. 4
Ans: b.

Q7. Which of the following clustering requires merging approach?

a. Partitional
b. Naive Bayes
c. Hierarchical
d. None of the above
Ans: c

Q8. State True or False: Hierarchical clustering should primarily be used for exploration
a. True
b. False
Ans. a.
Q9. State True or False: For finding dissimilarity between two clusters in hierarchical clustering,
average-link is the only metric used
a. True
b. False
Ans. b.

Q10. If two variables V1 and V2, are used for clustering. Which of the following are true for K
means clustering with k =3?

1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line
2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line

a. 1 only
b. 2 only
c. 1 and 2
d. None of the above
Ans: a.
Week 12 - CART 1 & 2

Q1. Which clustering algorithm works well when the shape of the clusters is hyper-spherical?
a. K means
b. Agglomerative Hierarchical clustering
c. Divisive Hierarchical clustering
d. All of the above
Ans: a.

Q2. In decision tree, an internal node represents –

a. A test on an attribute
b. An outcome of the test
c. Entire sample population
d. Holds a class label
Ans: a.

Q3. Choose the correct statement about the CART model –

a. CART is an unsupervised learning technique
b. CART is a supervised learning technique
c. CART adopts a greedy approach
d. Both b. & c.
Ans. d.

Q4. Which library is used to build the decision tree model-

a. Decision tree classifier
b. DecisionTreeClassifier
c. Decision_Tree_Classifier
d. Decision_tree_model
Ans. b.

Q5. State True or False: Gini Index enforces the resulting tree to have multiway splits
a. True
b. False
Ans. b

Q6. Chance nodes are represented by ___________

a. Disks
b. Squares
c. Circles
d. Triangles
Ans. c.

Q7. _______is the measure of uncertainty of a random variable, it characterizes the impurity of
an arbitrary collection of examples.
a. Information Gain
b. Gini Index
c. Entropy
d. None of the above
Ans: c
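Entropy and the Gini index (Q5 and Q7) are both impurity measures computed from class proportions. A small sketch with two illustrative nodes, one evenly split and one pure:

```python
import math

def entropy(proportions):
    """Shannon entropy in bits; zero proportions contribute nothing."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

def gini_index(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p * p for p in proportions)

balanced_node = [0.5, 0.5]  # maximum impurity for two classes
pure_node = [1.0, 0.0]      # a single class only

balanced_entropy = entropy(balanced_node)
pure_entropy = entropy(pure_node)
balanced_gini = gini_index(balanced_node)
pure_gini = gini_index(pure_node)
```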

Q8. End Nodes are represented by ________

a. Disks
b. Squares
c. Circles
d. Triangles
Ans: d.

Q9. Decision tree learners may create biased trees if some classes dominate. What’s the
solution of it?
a. Balance the dataset prior to fitting
b. Imbalance the dataset prior to fitting
c. Balance the dataset after fitting
d. None of the above
Ans: a.

Q10. Suppose your target variable is the price of a house and you are using a decision tree.
What type of tree do you need to predict the target variable?
a. Classification tree
b. Regression tree
c. Clustering tree
d. Dimensionality reduction tree
Ans. b.

The Pareto Principle: Rosie Dunford, Quanrong Su, Ekraj Tamang and Abigail Wintour
No ratings yet
The Pareto Principle: Rosie Dunford, Quanrong Su, Ekraj Tamang and Abigail Wintour
9 pages
Excerpt from "After the Music Stopped: The Financial Crisis, the Response, and the Work Ahead" by Alan Blinder. Copyright 2013 by Alan Blinder. Reprinted here by permission of Penguin Press. All rights reserved.
No ratings yet
Excerpt from "After the Music Stopped: The Financial Crisis, the Response, and the Work Ahead" by Alan Blinder. Copyright 2013 by Alan Blinder. Reprinted here by permission of Penguin Press. All rights reserved.
8 pages
Đáp Án Đề Thi Cuối Kỳ Band 7.5 Bị Fake, Readding
No ratings yet
Đáp Án Đề Thi Cuối Kỳ Band 7.5 Bị Fake, Readding
18 pages
00, Business Communication 7, Psychological and Cultural Dimensions of Business Communication
No ratings yet
00, Business Communication 7, Psychological and Cultural Dimensions of Business Communication
15 pages
LSM Exam 2017 A
No ratings yet
LSM Exam 2017 A
9 pages
Jurnal Pricing Strategy
No ratings yet
Jurnal Pricing Strategy
8 pages
Buck-Morss Susan The Dialectics of Seeing Walter Benjamin and The Arcades Project 1989 PDF
100% (1)
Buck-Morss Susan The Dialectics of Seeing Walter Benjamin and The Arcades Project 1989 PDF
500 pages
Caterpillar
No ratings yet
Caterpillar
16 pages
Les Adverbes
No ratings yet
Les Adverbes
9 pages
Vaccination Centers On 01.09.2021
No ratings yet
Vaccination Centers On 01.09.2021
13 pages
Assignment 6 1
No ratings yet
Assignment 6 1
4 pages
NN11
No ratings yet
NN11
23 pages
BrandMaker Focus Paper: Marketing Process Optimization
No ratings yet
BrandMaker Focus Paper: Marketing Process Optimization
8 pages
Karina X Winter AU
No ratings yet
Karina X Winter AU
4 pages
Cavite City: 0 - Page
No ratings yet
Cavite City: 0 - Page
89 pages
Chap13 Leverage and Capital Structure
No ratings yet
Chap13 Leverage and Capital Structure
101 pages
2nd Session
No ratings yet
2nd Session
3 pages

Assignment 1

Q1 State True or false:

Q6 Which of the following is not a measure of dispersion?

Q7 State whether the following is true or false:

Q8 Bar charts are used for:

Q9 Median is not applicable to

Q10 def m(data)

Q1 A college plans to interview 8 students for possible offers of graduate

Q1 The specific value of a random variable is called estimator

Q3 Stratified random sampling is a method of selecting a sample in which

b. various strata are selected from the sample

d. None of these alternatives is correct.

Q4 The interval estimate provides more information about a population characteristic

Q5 A question paper contains 90 multiple choice questions. There are 4 alternative

Q6 On average, 5% of the items supplied by manufacturer X are defective. If a batch of

Q1 If we have a sample size of 20 and population standard deviation is known, we will

Q2 Null hypothesis, H0: μ1 − μ2 = 0 is a

Q6 The error of rejecting a true null hypothesis is

d. committed when not enough information is available

b. the alternative hypothesis is true

c. the data must have been accumulated incorrectly

d. the sample size has been too small

Q9 In the hypothesis testing procedure, α is

a. 1 - the level of significance

b. the critical value

c. the confidence level

b. will always be accepted at the 1% level

c. will never be tested at the 1% level

d. may be rejected or not rejected at the 1% level

Q3 The ‘F’ ratio in a completely randomized ANOVA is the ratio of

Statement: The sampling distribution of two populations is approximated by a normal

c. estimated regression equation

a. The expected value of the error term is one.

c. The values of the error term are independent.

d. The error term is normally distributed.

The above equation implies that an

a. increase of $5 in advertising is associated with an increase of $5,000 in sales

b. increase of $1 in advertising is associated with an increase of $5 in sales

c. increase of $1 in advertising is associated with an increase of $35,000 in sales

d. increase of $1 in advertising is associated with an increase of $5,000 in sales

Q5: SSE can never be

a. larger than SST

b. smaller than SST

For the given data determine the R-squared value

Q7: In question no. 6, we will:

a) (0.045, 0.138)

Q9: State TRUE or FALSE –

Q10: Which of the following is possible for the coefficient of determination:

Q3. Which of the following is true about multiple regression model?

Q4. In a multiple regression model, the error term ɛ is assumed to

Q2. In estimation of regression parameters

A. The likelihood function is a function of only 𝜎

Q3. In logistic regression, the null hypothesis tested is:

Q4. In logistic regression,

Q6. The maximum likelihood estimate for binomial distribution is p = ____

Q9. Large values of the log-likelihood statistic indicate:

Q1. State true or false: Statement: There is no difference between E(y) = β0 + β1x and y = β0 + β1x

Q2. Which of the following statements is correct:

Q3. In ROC analysis when the Threshold value is Higher:

Q5. In ROC analysis, a classifier is called ‘good’ if it has ______

Q6. For the given confusion matrix, compute the recall

True Positive True Negative

Q7. State true or False: Precision is inversely proportional to recall

Q9. Which of the following options is true?

Q1. Sampling distribution for a goodness of fit test is the

Q2. Goodness of fit test is always conducted as a

Q5. What is the minimum number of variables/features required to perform clustering?

A. D(i, j) = p-m / p

1. For a fixed number of iterations.

A. 1,3, and 4

Q7. Which of the following clustering requires merging approach?

Q2. In decision tree, an internal node represents –

Q3. Choose the correct statement about the CART model –

Q4. Which library is used to build the decision tree model-

Q6. Chance nodes are represented by ___________

Q8. End Nodes are represented by ________
