
ISE 529 (Predictive Analysis)

Mock Midterm

1. Why should there not be an even number of K in K-Nearest Neighbors (KNN)?


A) It leads to overfitting
B) It may cause ties when classifying
C) It reduces the accuracy of the model
D) It increases the computational complexity

2. Why is it generally recommended to use an odd number for K in KNN?

A) It simplifies the computational complexity
B) It avoids ties when determining the majority class
C) It ensures better accuracy
D) It prevents overfitting
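
A small illustration of the tie issue behind questions 1 and 2, assuming scikit-learn is available; the data points are made up. With k = 2 a binary vote can split evenly, while k = 3 always yields a strict majority for two classes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

for k in (2, 3):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # The query point 1.5 is equidistant from a class-0 and a class-1 neighbor,
    # so with k=2 the vote is 50/50 and the tie is broken arbitrarily.
    print(k, knn.predict([[1.5]]), knn.predict_proba([[1.5]]))
```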

3. Why is it important to split the data before scaling?

A) To ensure the model performs better on the training data.
B) To prevent information from the test set from leaking into the training data.
C) To reduce the computational cost of scaling.
D) To avoid overfitting the model to the training data.
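
A minimal sketch of the split-then-scale workflow, assuming scikit-learn and using random made-up data: the scaler learns its mean and standard deviation from the training split only, so no test-set statistics leak into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std on the test data
```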

4. Why doesn't feature scaling typically apply to dummy variables?

A) Dummy variables represent categorical data with inherent ordering.
B) Dummy variables are already normalized between 0 and 1.
C) Dummy variables can cause multicollinearity if scaled.
D) Dummy variables are binary and do not require scaling for most algorithms.
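
An illustrative sketch only (column names are hypothetical): scale the numeric column while passing the binary dummy columns through untouched, since they already sit on a 0/1 scale.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "size_sqft": [1500, 2000, 1700],
    "loc_urban": [1, 0, 0],      # dummy variable, already 0/1
    "loc_suburban": [0, 1, 0],   # dummy variable, already 0/1
})

pre = ColumnTransformer(
    [("scale_numeric", StandardScaler(), ["size_sqft"])],
    remainder="passthrough",     # leave the dummy columns as they are
)
X = pre.fit_transform(df)
```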

5. In the Bag of Words model, what is the primary limitation that can
affect text classification tasks?

A) It considers word order and context, making it computationally expensive.
B) It does not consider the semantic meaning of words and treats them as
independent features.
C) It only works with small datasets and cannot handle large text corpora.
D) It requires deep learning models for effective implementation.

6. In the Bag of Words model, how is text data typically represented for machine
learning models?
A) As a structured table where each word is a feature with its frequency as a value.
B) As a continuous vector where word meanings are preserved.
C) As a sequence of words in the order they appear in the document.
D) As an image matrix where words are represented as pixels.
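
A small sketch of the Bag of Words representation behind questions 5 and 6, assuming scikit-learn; the sentences are made up. Each document becomes a row of word counts, and word order is discarded.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog chased the cat", "the cat slept"]
vec = CountVectorizer()
X = vec.fit_transform(docs)           # sparse document-term count matrix

print(vec.get_feature_names_out())    # vocabulary, one column per word
print(X.toarray())                    # word frequencies per document
```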

7. Why is it important to handle missing values before training a model?

A) To reduce the size of the dataset
B) To improve model interpretability
C) To avoid biased estimates from the model
D) To enhance the speed of training
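
A minimal sketch with toy data, assuming scikit-learn: fill missing numeric values with the column mean before fitting a model, so rows are not silently dropped and estimates are not biased by incomplete cases.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)   # NaNs replaced by per-column means
```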

8. When should you use adjusted R² instead of R²?

A) When you have a large dataset with a single predictor variable.
B) When comparing models with different numbers of predictor variables.
C) When your model has a very high R² value.
D) When the predictor variables are highly correlated.
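
A short sketch of the adjusted R² formula, with made-up numbers: it penalizes additional predictors, which is why it is the fairer metric when comparing models with different numbers of features (n = observations, p = predictors).

```python
def adjusted_r2(r2, n, p):
    # Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(r2=0.80, n=50, p=3))   # ~0.787
print(adjusted_r2(r2=0.80, n=50, p=20))  # ~0.662, heavier penalty for more predictors
```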

9. Which of the following preprocessing steps helps reduce the dimensionality of text
data while preserving important information?
A) Removing stop words and applying stemming or lemmatization
B) Converting text into uppercase letters for uniformity
C) Keeping all punctuation and special characters
D) Replacing all words with their synonyms to increase variability
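
An illustrative sketch only, assuming scikit-learn and NLTK are available and using made-up sentences: removing stop words and stemming both shrink the vocabulary, and therefore the dimensionality of a Bag of Words representation.

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the runners were running quickly", "a runner runs every day"]

stemmer = PorterStemmer()
stemmed_docs = [" ".join(stemmer.stem(w) for w in d.split()) for d in docs]

vec = CountVectorizer(stop_words="english")   # drop common words like "the", "a"
X = vec.fit_transform(stemmed_docs)
print(vec.get_feature_names_out())            # e.g. "running" and "runs" collapse toward "run"
```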

10. What is the primary goal of constructing a decision tree?

A) To find the maximum likelihood of a feature
B) To minimize the variance among branches
C) To split data in a way that maximizes information gain
D) To calculate the distance to the nearest neighbor
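
A small sketch of the idea behind information gain, with made-up class counts: compare the parent node's entropy with the weighted entropy of the children produced by a candidate split; the tree chooses the split that maximizes the gain.

```python
import numpy as np

def entropy(counts):
    p = np.array(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

parent = entropy([10, 10])                       # 1.0 bit, perfectly mixed node
left, right = entropy([9, 1]), entropy([1, 9])   # much purer child nodes
gain = parent - (10 / 20) * left - (10 / 20) * right
print(gain)                                      # ~0.53; the split maximizing this is chosen
```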

11. What does a confusion matrix illustrate?

A) The correlation between predicted and actual values
B) The distribution of data points in a clustering problem
C) The performance of a classification model in terms of true/false positives/negatives
D) The accuracy of a regression model
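
A minimal sketch with made-up labels, assuming scikit-learn: the confusion matrix tabulates true negatives, false positives, false negatives, and true positives for a classifier.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)              # rows = actual class, columns = predicted class
print(tn, fp, fn, tp)  # 2 1 1 2 for this toy example
```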

12. Which evaluation metric is most suitable for a classification model when dealing
with imbalanced datasets?
A) Accuracy
B) Precision, Recall, and F1-score
C) Mean Squared Error (MSE)
D) R-squared (R²)
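
An illustrative sketch with made-up labels, assuming scikit-learn: on a 90/10 class imbalance, always predicting the majority class scores 90% accuracy but zero recall on the minority class, which precision, recall, and F1 expose.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100              # a useless "always predict the majority class" model

print(accuracy_score(y_true, y_pred))                    # 0.90, looks deceptively good
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0
```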

13. How are Decision Trees different from linear models?

A) Decision Trees create splits in the data based on linear combinations of features.
B) Decision Trees can capture non-linear relationships without needing feature
transformations.
C) Decision Trees only work with continuous data, while linear models handle both continuous
and categorical data.
D) Decision Trees require the data to be scaled like linear models.

14. How can we overcome overfitting in machine learning models?

A) By increasing the number of features in the model.
B) By reducing the size of the training dataset.
C) By using techniques like cross-validation, regularization, or pruning.
D) By increasing the complexity of the model to better fit the training data.
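
A minimal sketch of two of the listed remedies, assuming scikit-learn and using synthetic data: cross-validation to estimate generalization performance, and L2 regularization (Ridge) to shrink coefficients and reduce variance.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=60)

model = Ridge(alpha=1.0)                      # alpha controls the regularization strength
scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated R²
print(scores.mean())
```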

15. Assume you are working on predicting housing prices in a highly volatile market
where some houses are priced unusually high due to unique features (like proximity
to landmarks).
You want to create a model that ignores these extreme outliers and focuses on
capturing the general trend in pricing. There are a few things you are considering:

- You expect that the relationship between the house features (like size, location,
  and age) and the price is non-linear.

Which model would be the most appropriate for this situation to ensure the best fit
while considering maximum margin and handling non-linearity?

A) Linear Regression, to fit a straight line through the data.
B) Decision Trees, to split the data based on feature values and handle outliers directly.
C) Support Vector Regression (SVR) with a non-linear kernel, to create a margin of tolerance
around the data points and handle non-linearity.
D) K-Nearest Neighbors, to classify prices based on the nearest points in the feature space.
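
An illustrative sketch only, assuming scikit-learn; the house sizes, prices, and outliers below are synthetic. SVR with an RBF kernel captures a non-linear trend, and its epsilon-insensitive margin keeps a handful of extreme points from dominating the fit.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
size = rng.uniform(800, 3000, size=(100, 1))
price = 100 + 5e-5 * size[:, 0] ** 2 + rng.normal(scale=15, size=100)  # price in $1000s, non-linear
price[::20] += 500                                                     # a few extreme outliers

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=10.0))
model.fit(size, price)
print(model.predict([[2000.0]]))
```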

16. Which assumption of linear regression ensures that the error terms have the same
variance across all levels of the independent variables?

A) Linearity
B) Normality of residuals
C) Homoscedasticity
D) Independence of errors

17. What is meant by the "independence of errors" assumption in linear regression?

A) The residuals should have a constant variance.
B) The residuals should follow a normal distribution.
C) The residuals should not be correlated with each other.
D) The predictors should not be correlated with each other.

18. Based on the data table, you are tasked with fitting a polynomial regression model to
predict Y from X. You suspect that a polynomial relationship exists between X and
Y. You decide to increase the polynomial degree until you find the best fit.
Looking at the values of Y as X increases, what degree of polynomial would you expect to
be appropriate for fitting this data?
X (Features)    Y (Target)
1               3
2               12
3               27
4               48
5               75

A) Linear (degree = 1)
B) Quadratic (degree = 2)
C) Cubic (degree = 3)
D) Quartic (degree = 4)
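
A worked check (not part of the original exam): Y / X² is a constant 3 in every row (3/1, 12/4, 27/9, 48/16, 75/25), i.e. Y = 3X², so a quadratic (degree 2) polynomial fits the table exactly. A small scikit-learn confirmation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3, 12, 27, 48, 75])

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)  # columns [x, x²]
model = LinearRegression().fit(X_poly, y)
print(model.coef_, model.intercept_)   # roughly [0, 3] and 0, i.e. y ≈ 3·x²
print(model.score(X_poly, y))          # R² = 1.0, a perfect fit
```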

19. You are working on a multiple linear regression model to predict house prices based
on house size and location of the following table. The "Location" variable is
categorical and needs to be encoded using dummy variables.
House Size (sqft)    Location     Price ($)
1500                 Urban        300,000
2000                 Suburban     350,000
1700                 Rural        280,000
2500                 Urban        400,000
1800                 Suburban     320,000

You create three dummy variables:

● Location_Urban (1 if the house is in Urban, 0 otherwise)
● Location_Suburban (1 if the house is in Suburban, 0 otherwise)
● Location_Rural (1 if the house is in Rural, 0 otherwise)

Which of the following actions should you take to avoid the Dummy Variable Trap?
A) Keep all three dummy variables in the model.
B) Remove the dummy variable for "Urban" since it's the first category.
C) Remove one of the dummy variables (for any category) to prevent multicollinearity.
D) Add an additional dummy variable for houses that do not belong to any of the existing
categories.
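
A minimal sketch using pandas (column names follow the table above): drop_first=True keeps only two of the three location dummies, so the dummy columns are no longer perfectly collinear with the intercept, avoiding the dummy variable trap.

```python
import pandas as pd

df = pd.DataFrame({
    "size_sqft": [1500, 2000, 1700, 2500, 1800],
    "location": ["Urban", "Suburban", "Rural", "Urban", "Suburban"],
})

encoded = pd.get_dummies(df, columns=["location"], drop_first=True)  # drops one category
print(encoded.columns.tolist())   # e.g. ['size_sqft', 'location_Suburban', 'location_Urban']
```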

20. What are the methods to tackle non-linearly separable data?

A) Use a linear regression model and increase the number of features.
B) Apply feature scaling to make the data linearly separable.
C) Use kernel methods like the kernel trick in Support Vector Machines (SVM) or polynomial
features to map data to a higher-dimensional space.
D) Use k-means clustering to separate the data into different clusters.
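
An illustrative sketch, assuming scikit-learn: the classic XOR pattern is not linearly separable, but an RBF-kernel SVM (the kernel trick) separates it by implicitly mapping the points into a higher-dimensional space.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])   # XOR labels, not separable by any straight line

clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)
print(clf.predict(X))        # recovers all four XOR labels
```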

21. The following table represents an ANOVA table in an experiment.

Calculate R² and adjusted R².

R² = 173383 / 225993, adjusted R² = 1 − (1 − R²) × (58 / 44)
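
Evaluating these expressions (assuming 173383 is the regression sum of squares, 225993 the total sum of squares, and 58 / 44 the total / error degrees of freedom): R² = 173383 / 225993 ≈ 0.767, and adjusted R² ≈ 1 − 0.233 × (58 / 44) ≈ 0.693.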

22. Plant performance is based on pulp brightness as measured by a reflectance meter. Each of the four shift operators (denoted by A, B, C, and D) made five pulp handsheets from unbleached pulp. Reflectance was read for each of the handsheets using a brightness tester, as reported in Table 2.1. A goal of the experiment is to determine whether there are differences between the operators in making the handsheets and reading their brightness.

Write an appropriate null hypothesis and a way to analyze it.

Null Hypothesis (H₀): The mean pulp brightness is the same for all four shift operators (μA = μB = μC = μD).

Use ANOVA (Analysis of Variance) to compare means across the four operators.
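
A minimal sketch, assuming SciPy is available; the brightness readings below are made up because Table 2.1 is not reproduced here. A one-way ANOVA compares mean brightness across the four operators, and a small p-value would reject the null hypothesis above.

```python
from scipy.stats import f_oneway

# Hypothetical reflectance readings, five handsheets per operator
operator_A = [59.8, 60.0, 60.8, 60.8, 59.8]
operator_B = [59.8, 60.2, 60.4, 59.9, 60.0]
operator_C = [60.7, 60.7, 60.5, 60.9, 60.3]
operator_D = [61.0, 60.8, 60.6, 60.5, 60.5]

f_stat, p_value = f_oneway(operator_A, operator_B, operator_C, operator_D)
print(f_stat, p_value)   # reject H0 if p_value is below the chosen significance level
```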
