Revision Exercise SDSC5001 Midterm


True/False Questions:

1. True/False: KNN always performs better than linear regression in a learning task.

2. True/False: Data objects can be represented in a multi-dimensional space if they have
the same set of fixed numeric variables.

3. True/False: Ordinal variables represent data that has no inherent ordering.

4. True/False: Noise in data refers to perturbations of original values, whereas outliers are
observations that are considerably different from others.

5. True/False: Data matrices can only represent numeric data, not categorical data.

6. True/False: The training error of a model is always a reliable measure of its
performance on unseen data.

7. True/False: In classification tasks, the probability of misclassification is used as a
measure of model performance.

8. True/False: Cross-validation is used to estimate how a model generalizes to an unseen
dataset.

9. True/False: In the bias-variance tradeoff, decreasing bias typically leads to an increase
in variance.

10. True/False: In regression analysis, multicollinearity refers to the situation where
predictor variables are highly correlated with each other.

11. True/False: The main goal of data exploration is to finalize a model for prediction tasks.

12. True/False: One of the possible issues in a regression model is when the errors
(residuals) have non-constant variance, known as heteroscedasticity.

13. True/False: Bias-variance decomposition explains the U-shaped curve for the test error
observed when model complexity increases.
14. True/False: High leverage points in a regression model are those with unusual predictor
variable values.

15. True/False: Cross-validation can reduce the test error of a model.

16. True/False: In a classification task, the decision boundary is more continuous when
using a k-nearest neighbors classifier with a smaller value of 𝑘 (e.g., k=1).
17. True/False: In a boxplot, the whiskers extend to show the range of the entire dataset.
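Several of the True/False items above (notably 1, 6, and 16) turn on how k-nearest neighbors behaves at small k. A minimal sketch, using a hypothetical toy dataset and a from-scratch `knn_predict` helper (not part of any course code), shows why training error alone is misleading: with k = 1 every training point is its own nearest neighbor, so the training error is always exactly zero regardless of how well the model generalizes.

```python
# Sketch for revision purposes only: toy data and helper are assumptions,
# not taken from the course materials.

def knn_predict(train_X, train_y, x, k=1):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical 2-D training set with two well-separated classes.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
y = [0, 0, 0, 1, 1, 1]

# With k=1 each training point is its own nearest neighbor (distance 0),
# so predictions reproduce the labels and the training error is zero.
train_preds = [knn_predict(X, y, x, k=1) for x in X]
train_error = sum(p != t for p, t in zip(train_preds, y)) / len(y)
print(train_error)  # 0.0
```

This zero training error says nothing about unseen data (question 6), which is exactly why held-out estimates such as cross-validation (question 8) are needed.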
Multiple Choice Questions:
18. Which data exploration technique focuses primarily on visualization?
A) Data Mining
B) Exploratory Data Analysis (EDA)
C) Predictive Modeling
D) Sampling Bias Correction
19. Which of the following types of data is best represented by a data matrix?
A) Graph data
B) Text data
C) Image data
D) Data with a fixed set of numeric variables
20. Which of the following is true about the bias-variance tradeoff?
A) Increasing model complexity always decreases both bias and variance.
B) As bias decreases, variance generally increases.
C) A perfect model has zero bias and variance.
D) High bias is desirable for complex models.
21. What does a high variance inflation factor (VIF) indicate?
A) The predictors are uncorrelated.
B) There is significant multicollinearity in the data.
C) The residuals are normally distributed.
D) The model has a high prediction accuracy.
22. In cross-validation, which of the following is true about Leave-One-Out Cross
Validation (LOOCV)?
A) It is less biased than K-Fold Cross Validation.
B) It is computationally less expensive than K-Fold Cross Validation.
C) LOOCV uses the entire dataset for training each time.
D) LOOCV is preferred for small datasets due to its low variance.
23. Which of the following error terms cannot be reduced through better model fitting?
A) Irreducible error
B) Reducible error
C) Bias
D) Variance
24. Which of the following statements is true about classification models?
A) Linear regression is always a better model for classification than k-nearest
neighbors.
B) K-nearest neighbors tend to use fewer parameters than linear regression.
C) Misclassification error is more relevant for regression tasks than classification tasks.
D) K-nearest neighbors classify based on the majority class in the neighborhood of a
new point.
25. Which of the following is a correct statement about missing data?
A) Missing data should always be imputed using the mean of the available values.
B) Missing data can be handled by removing rows with missing values or imputing them.
C) Missing data never impacts the quality of the model.
D) Missing data is always caused by data collection errors.
26. What is the main motivation for exploratory data analysis (EDA)?
A) To replace formal statistical analysis
B) To help recognize patterns and choose the appropriate analysis techniques
C) To finalize the model for production use
D) To confirm the distribution of errors
27. Which of the following is not considered a summary statistic?
A) Median
B) Skewness
C) Mode
D) Mean squared error
28. Which of the following graphs is most suitable for detecting relationships between
two continuous variables?
A) Bar chart
B) Boxplot
C) Scatter plot
D) Histogram
29. Which of the following indicates that a regression model may be overfitting?
A) High training error
B) Very low training error but high test error
C) High test error and low bias
D) The model works well on new, unseen data.
30. Which of the following is an example of a qualitative predictor in regression?
A) Gender
B) Height
C) Income
D) Temperature
31. Which method is commonly used to test the statistical significance of all predictors
in a multiple regression model?
A) T-test
B) F-test
C) Chi-square test
D) Z-test
32. What does the R-squared value in regression analysis represent?
A) The proportion of variance in the dependent variable explained by the independent
variables
B) The overall accuracy of the regression model
C) The strength of correlation between two variables
D) The fraction of predictors that perfectly predict the dependent variable.
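Question 32 asks what R-squared represents. A short sketch on hypothetical, randomly generated linear data (not a real dataset) makes the definition concrete: fit ordinary least squares, then compute R² as one minus the ratio of the residual sum of squares to the total sum of squares, i.e., the proportion of variance in the response explained by the fit.

```python
# Illustrative only: the data below are simulated, with an assumed
# true relationship y ≈ 2x + 1 plus Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1.0 - ss_res / ss_tot      # proportion of variance explained
print(round(r_squared, 3))  # close to 1, since the data are nearly linear
```

Because the simulated data are strongly linear, R² here is near 1; with pure noise it would be near 0, matching answer A of question 32.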
