Regression _ DPP 01
Machine Learning
DPP: 1
Regression
Q1 The parameters acquired through linear regression:
(A) can take any value in the real space
(B) are strictly integers
(C) always lie in the range [0,1]
(D) can take only non-zero values

Q2 Which of the statements is/are True?
(A) Ridge has a sparsity constraint, and it will drive coefficients with low values to 0.
(B) Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.
(C) Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
(D) If there are two or more highly collinear variables, Lasso will select one of them randomly.

Q3 The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning class was found to be:
Grade = 30.5 + 15.2 (h)
How will a student's grade be affected if she studies for four hours?
(A) It will go down by 30.4 points.
(B) It will go up by 30.4 points.
(C) It will go up by 60.8 points.
(D) The grade will remain unchanged.
(E) It cannot be determined from the information given.

Q4 Which of the following statements about principal components in Principal Component Regression (PCR) is true?
(A) Principal components are calculated based on the correlation matrix of the original predictors.
(B) The first principal component explains the largest proportion of the variation in the dependent variable.
(C) Principal components are linear combinations of the original predictors that are uncorrelated with each other.
(D) PCR selects the principal components with the highest p-values for inclusion in the regression model.
(E) PCR always results in a lower model complexity compared to ordinary least squares regression.

Q5 Which statement is true about outliers in Linear regression?
(A) Linear regression model is not sensitive to outliers
(B) Linear regression model is sensitive to outliers
(C) Can't say
(D) None of these

Q6 What does the slope coefficient in a linear regression model indicate?
(A) The point where the regression line intersects the y-axis
(B) The change in the dependent variable for every one-unit change in the independent variable
(C) The average value of the dependent variable
(D) The dispersion of the dependent variable

Q7 Find the mean of squared error for the given predictions:

Y     F(X)
1     2
2     3
4     5
8     9
16    15
32    31
Q9 Which of the following statements is true regarding Partial Least Squares (PLS) regression?
(A) PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.
(B) PLS is only applicable when there is no multicollinearity among the independent variables.
(C) PLS can handle situations where the number of predictors is larger than the number of observations.
(D) PLS estimates the regression coefficients by minimizing the residual sum of squares.
(E) PLS is based on the assumption of normally distributed residuals.
(F) All of the above.
(G) None of the above.

Q13 For a bivariate data set on (x, y), if the means, standard deviations and correlation coefficient are
x̄ = 1.0, ȳ = 2.0, sx = 3.0, sy = 9.0, r = 0.8
then the regression line of y on x is:
(A) y = 1 + 2.4(x − 1)
(B) y = 2 + 0.27(x − 1)
(C) y = 2 + 2.4(x − 1)
(D) y = 1 + 0.27(x − 2)

Q14 What is the purpose of regularization in linear regression?
(A) To make the model more complex
(B) To avoid underfitting
(C) To encourage overfitting
(D) To reduce the complexity of the model

Q15 A set of observations of independent variable (x) and the corresponding dependent variable (y) is given below:

x:  5   2   4   3
y:  16  10  13  12
(b) Plot the given points and the regression line in the same rectangular system of axes.

Q22 In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades. The last two columns show deviation scores, the difference between the student's score and the average score on each measurement. The last two rows show sums and mean scores. Find the regression equation.

Student   xi   yi   (xi − x̄)²   (yi − ȳ)²
1         95   85   289          64

…
(A) To determine the significance of individual coefficients.
(B) To test the overall significance of the regression model.
(C) To assess the presence of multicollinearity among independent variables.
(D) To evaluate the normality of residuals.

Q26 When performing linear regression, multicollinearity can be problematic. Which of the following statements about multicollinearity is true?
(A) Multicollinearity occurs when there is no correlation between independent variables.
(B) Multicollinearity makes it easier to interpret …
… When there is multicollinearity among the independent variables.
(D) When the dataset has a large number of observations.
Answer Key
Q1 (C) Q15 (1.9)
Q7 (A) Q21 0.
Q4 Text Solution:
The true statement about principal components in Principal Component Regression (PCR) is:
Principal components are linear combinations of the original predictors that are uncorrelated with each other.

1. Principal components are calculated based on the correlation matrix of the original predictors.
False. Principal components are calculated based on the covariance matrix (or equivalently, the correlation matrix after standardization) of the original predictors, not the correlation matrix directly.

2. The first principal component explains the largest proportion of the variation in the dependent variable.
False. The first principal component explains the largest proportion of the variation in the predictors, not the dependent variable. It captures the direction of maximum variance in the predictor space.

3. Principal components are linear combinations of the original predictors that are uncorrelated with each other.
True. Principal components are linear combinations of the original predictors that are constructed in such a way that they are uncorrelated with each other. Each principal component represents a unique orthogonal direction in the predictor space.

4. PCR selects the principal components with the highest p-values for inclusion in the regression model.
False. PCR is a dimensionality reduction technique that aims to reduce multicollinearity and model complexity by selecting a subset of the principal components that capture most of the variance in the predictors.

5. PCR always results in a lower model complexity compared to ordinary least squares regression.
False. PCR can result in a lower model complexity compared to ordinary least squares (OLS) regression when a small number of principal components are retained. However, if all principal components are used in PCR, the model complexity can be similar to the full OLS regression model.

Q5 Text Solution:
The slope of the regression line will change due to outliers in most of the cases.

Q6 Text Solution:
In a linear regression model, the slope coefficient represents the rate of change in the dependent variable (Y) for each one-unit change in the independent variable (X). Specifically, it indicates how much the predicted value of the dependent variable changes for every one-unit increase (or decrease) in the independent variable, holding all other variables constant.

Q7 Text Solution:
1. Calculate the squared error for each prediction, which is the square of the difference between each predicted value (F(x)) and the corresponding true value (y).
Given predictions: y = [1, 2, 4, 8, 16, 32], F(x) = [2, 3, 5, 9, 15, 31]
Squared error for each prediction:
Prediction 1: (2 − 1)² = 1
Prediction 2: (3 − 2)² = 1
Prediction 3: (5 − 4)² = 1
Prediction 4: (9 − 8)² = 1
Prediction 5: (15 − 16)² = 1
Prediction 6: (31 − 32)² = 1
2. Calculate the mean of squared error by taking the sum of squared errors and dividing by the number of predictions (samples).
Mean squared error (MSE) = (Squared error 1 + Squared error 2 + Squared error 3 + Squared error 4 + Squared error 5 + Squared error 6) / 6
Mean squared error (MSE) = (1 + 1 + 1 + 1 + 1 + 1) / 6 = 6 / 6 = 1.
So, the mean squared error for the given predictions is 1.
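(For reference, a minimal Python sketch of the same MSE calculation; it assumes NumPy is available and the array names are illustrative only.)

import numpy as np

# True values and model predictions from the Q7 solution above
y = np.array([1, 2, 4, 8, 16, 32])
f_x = np.array([2, 3, 5, 9, 15, 31])

# Mean squared error: average of the squared differences
mse = np.mean((f_x - y) ** 2)
print(mse)  # 1.0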
… dependent variable (outcome). This means that the change in the dependent variable is …

Q9 Text Solution:
(1) PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.
True. PLS aims to find a low-dimensional latent space that maximizes the covariance between the predictors (independent variables) and the dependent variable while considering their …
(2) PLS is only applicable when there is no multicollinearity among the independent variables.
False. Unlike traditional multiple linear regression, … among the independent variables. It deals with multicollinearity by creating latent variables (components) that are linear combinations of …
… minimizing the residual sum of squares as in ordinary least squares (OLS) regression. … method and makes fewer assumptions about the underlying data distribution compared to …

… estimates the range within which the true mean of the dependent variable is likely to fall.
(B) The standard deviation value of the dependent variable: This is incorrect. The confidence interval is not typically used to estimate the standard deviation of the dependent variable. Instead, it estimates the … other parameters of the dependent variable …
(C) The mean value of the independent …
(D) The standard deviation value of the independent variable: This is incorrect for the same reason as option B. The confidence …
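(As a side note to the PLS discussion above, a minimal scikit-learn sketch; it assumes scikit-learn and NumPy are installed, and the data are made up purely for illustration, with more predictors than observations.)

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Made-up data: 20 observations, 50 predictors (p > n)
X = rng.normal(size=(20, 50))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=20)

# PLS builds a few latent components that maximize covariance with y
pls = PLSRegression(n_components=2)
pls.fit(X, y)
print(pls.predict(X[:5]).ravel())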
… = 0
∴ R = (2x − 3)² + (4x − 1)²
dR/dx = 0
∴ dR/dx = 2 × 2(2x − 3) + 4 × 2(4x − 1) = 0
∴ x = 1/2 and R_min = (2 × 1/2 − 3)² + (4 × 1/2 − 1)² = 5
… squares of the errors in the two equations is 1/2.

Q13 Text Solution:
According to the question,
y − 2 = 0.8 × (9/3)(x − 1)
⇒ y − 2 = 2.4(x − 1)
y = 2 + 2.4(x − 1)

Q14 Text Solution:
The purpose of regularization in linear regression is to prevent overfitting by reducing the complexity of the model. Regularization achieves this by penalizing overly complex models and encouraging simpler models that generalize better to new data.

Q15 Text Solution:
Given data and calculation:

x    y    x²   xy
5    16   25   80
2    10   4    20
4    13   16   52
3    12   9    36

Σx = 14, Σy = 51, Σx² = 54, Σxy = 188
n = 4, so
51 = 4a + 14b
188 = 14a + 54b
Solving the above two equations gives a = 6.1 and b = 1.9.

… categories or groups), such as gender, ethnicity, or geographic region, in your dataset, you need to convert them into a numerical format to include them in a regression model. Dummy variables are created to represent different categories of the categorical variable.

Q17 Text Solution:
ε̂ = y − Xb, where b = (X'X)⁻¹X'y
  = (I − H)y, where H = X(X'X)⁻¹X'
  = (I − H)ε
E(ε̂) = 0
V(ε̂) = σ²(I − H)
Since E(ε̂) = 0, the ε̂ᵢ's have zero mean.
Since I − H is not generally a diagonal matrix, the ε̂ᵢ's do not necessarily have the same variance.
The off-diagonal elements of (I − H) are not zero, in general. So the ε̂ᵢ's are not independent.

Q18 Text Solution:
To find the parameter vector θ that minimizes the sum of squared errors and obtain the optimal linear regression model, we need to solve for θ. The equation that holds in this context is:
XᵀXθ = XᵀY
where
X is the N×(p+1) matrix of input values (augmented by 1's) with N data points and p attributes each,
Y is the N×1 vector of target values (the dependent variable),
θ (theta) is the (p+1)×1 vector of parameter values (a0, a1, a2, ..., ap).
To understand why this equation holds, let's briefly describe the steps of linear regression. The goal of linear regression is to find the parameter vector θ that minimizes the sum of squared errors (SSE). The SSE is given by:
SSE(θ) = (Y − Xθ)ᵀ(Y − Xθ)
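(To cross-check the Q15 arithmetic and the Q18 normal equation XᵀXθ = XᵀY numerically, here is a minimal NumPy sketch; the variable names are illustrative only.)

import numpy as np

# Data from the Q15 solution above
x = np.array([5, 2, 4, 3], dtype=float)
y = np.array([16, 10, 13, 12], dtype=float)

# Design matrix augmented with a column of 1's, as in the Q18 solution
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equation  X^T X theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [6.1, 1.9], i.e. a = 6.1, b = 1.9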
where (Y − Xθ) is the vector of residuals (the difference between the actual target values Y and the predicted values Xθ).
To find the optimal θ that minimizes SSE, we take the derivative of SSE with respect to θ and set it to zero. The solution for θ that satisfies this condition is:
θ = (XᵀX)⁻¹XᵀY
Substituting this value of θ back into the SSE equation, we get:
SSE(θ) = (Y − X(XᵀX)⁻¹XᵀY)ᵀ(Y − X(XᵀX)⁻¹XᵀY)

Q19 Text Solution:

Student   x     y     xy      x²
…         …     …     …       …
4         42    75    3150    1764
5         57    87    4959    3249
6         59    81    4779    3481
Σ         247   486   20485   11409

Find b0:
b0 = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]
b0 = [(486)(11409) − (247)(20485)] / [6(11409) − (247)²]
b0 = 484979 / 7445 = 65.14

Find b1:
b1 = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
b1 = [6(20485) − (247)(486)] / [6(11409) − (247)²]
b1 = 2868 / 7445 = 0.385225

Insert the values into the equation:
y' = b0 + b1 · x
y' = 65.14 + (0.385225 · x)
Prediction: the value of y for the given value of x = 55 is
y' = 65.14 + (0.385225 × 55)
y' = 86.327

Q20 Text Solution:
Lasso tends to produce sparse coefficient vectors, while Ridge does not. Lasso regularization includes an L1 penalty term that encourages some coefficients to be exactly …

… A = 23/38, b = 5/19
(b) Now graph the regression line given by y = ax + b and the given points.

Q22 Text Solution:

Student   xi   yi   (xi − x̄)²   (yi − ȳ)²
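(The b0, b1 and prediction values in the Q19 solution can be reproduced from the column sums alone; the short plain-Python sketch below assumes only the sums shown in the Q19 table above.)

# Column sums from the Q19 solution table (n = 6 students)
n, sum_x, sum_y, sum_xy, sum_x2 = 6, 247, 486, 20485, 11409

# Least-squares intercept and slope from the summary statistics
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x ** 2)
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

print(round(b0, 2), round(b1, 6))   # 65.14  0.385225
print(round(b0 + b1 * 55, 3))       # predicted y at x = 55, about 86.33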