Feel free to edit this (formatting, missing content, answers...). Can an instructor provide answers
to these? I can also try to make a key if I get the time. Maybe we can pin and keep growing this
post?
1.1 (Intro): select all of the following statements which are True
Varada's answers: A, D, E
Select all of the following statements which are examples of supervised machine learning
Varada's answers: B, D, E
Select all of the following statements which are examples of regression problems
(A) Predicting the price of a house based on features such as number of bedrooms and the
year built.
(B) Predicting if a house will sell or not based on features like the price of the house,
number of rooms, etc.
(C) Predicting percentage grade in CPSC 330 based on past grades.
(D) Predicting whether you should bicycle tomorrow or not based on the weather
forecast.
(E) Predicting appropriate thermostat temperature based on the wind speed and the
number of people in a room.
Varada's answers: A, C, E
Varada's answer:
o Creating X and y
o Creating a model instance
o fit
o score to evaluate the performance of a given model
o predict on new examples
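The steps above can be sketched roughly as follows. This is a minimal sketch, assuming a made-up toy dataset (temperature and rain as features, "bicycle or not" as the target) and a DecisionTreeClassifier as the model; none of these values come from the course.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# 1. Create X (features) and y (targets). Toy data, invented for illustration:
#    columns are [temperature_celsius, is_raining]; target is 1 = bicycle, 0 = don't.
X = np.array([[20, 0], [25, 0], [5, 1], [10, 1], [22, 0], [8, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

# 2. Create a model instance.
model = DecisionTreeClassifier(random_state=0)

# 3. fit: learn from the training data.
model.fit(X, y)

# 4. score: accuracy on the data passed in (here, the training data itself).
train_accuracy = model.score(X, y)

# 5. predict on new examples.
X_new = np.array([[23, 0], [6, 1]])
predictions = model.predict(X_new)
```

Scoring on the training data, as done here, says little about generalization; in practice score would be called on held-out data.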
Varada's answers: B, D
3.1: ML Fundamentals
(A) A decision tree model with no depth limit (the default max_depth=None in sklearn) is likely to
perform very well on the deployment data.
(B) Data splitting helps us assess how well our model would generalize.
(C) Deployment data is only scored once.
(D) Validation data could be used for hyperparameter optimization.
(E) It's recommended that data be shuffled before splitting it into train and test sets.
Varada's answers: B, D, E
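To see why (A) is false and (B) is true, here is a minimal sketch, assuming synthetic data from make_classification with some label noise (the dataset and the noise level are my own choices, not from the lecture). A tree with no depth limit memorizes the training set, and the held-out split is what reveals that it does not generalize as well.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y), so memorizing the training set
# cannot carry over perfectly to unseen examples.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)

# train_test_split shuffles the data by default (shuffle=True).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_depth=None is the sklearn default: the tree grows until leaves are pure.
tree = DecisionTreeClassifier(max_depth=None, random_state=0)
tree.fit(X_train, y_train)

train_score = tree.score(X_train, y_train)  # accuracy on data the tree has seen
test_score = tree.score(X_test, y_test)     # accuracy on held-out data
```

Comparing train_score with test_score gives a rough picture of the gap we would expect on deployment data.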
3.2: ML Fundamentals
Varada's answers: A, B, C, E
(A) Analogy-based models find examples from the test set that are most similar to the
query example we are predicting.
(B) Euclidean distance will always have a non-negative value.
(C) With k-NN, setting the hyperparameter k to larger values typically reduces training
error.
(D) Similar to decision trees, k-NN finds a small set of good features.
(E) In k-NN, with k > 1, the classification of the closest neighbour to the test example
always contributes the most to the prediction.
Varada's answers: B
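A quick way to check (B) and (C) yourself is sketched below, assuming synthetic noisy data and three arbitrary values of k (both are my own choices). Euclidean distance is a square root of a sum of squares, so it cannot be negative, and with k = 1 every training point is its own nearest neighbour, so larger k typically increases training error rather than reducing it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Euclidean distance between two points: sqrt(3^2 + 4^2 + 0^2).
a = np.array([1.0, -2.0, 3.0])
b = np.array([4.0, 2.0, 3.0])
distance = np.sqrt(np.sum((a - b) ** 2))

# Synthetic data with label noise, invented for illustration.
X, y = make_classification(n_samples=200, n_features=5, flip_y=0.2, random_state=0)

# Training accuracy for a few values of k (neighbours come from the training set).
train_scores = {}
for k in [1, 15, 101]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    train_scores[k] = knn.score(X, y)
```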
(A) k-NN may perform poorly in high-dimensional space (say, d > 1000).
(B) In SVM RBF, removing a non-support vector would not change the decision
boundary.
(C) In sklearn’s SVC classifier, large values of gamma tend to result in higher training
score but probably lower validation score.
(D) If we increase both gamma and C, we can't be certain if the model becomes more
complex or less complex.
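For statement (C), a minimal sketch is below, assuming the make_moons toy dataset and three arbitrary gamma values (my own choices, not from the lecture). It records training and validation accuracy for each gamma so the two can be compared; C is held fixed so only gamma changes.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-class toy data, invented for illustration.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit an RBF SVM for a small, medium, and large gamma.
results = {}
for gamma in [0.01, 1.0, 100.0]:
    svc = SVC(kernel="rbf", gamma=gamma, C=1.0)
    svc.fit(X_train, y_train)
    results[gamma] = {
        "train": svc.score(X_train, y_train),
        "valid": svc.score(X_valid, y_valid),
        "n_support": int(svc.n_support_.sum()),  # number of support vectors
    }
```

Printing results should show the training score rising with gamma; whether the validation score drops depends on the data, which is why the statement says "probably".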
Take a guess: In your machine learning project, how much time will you typically spend on data
preparation and transformation?
1. StandardScaler ensures a fixed range (i.e., minimum and maximum values) for the
features.
2. StandardScaler calculates mean and standard deviation for each feature separately.
3. In general, it's a good idea to apply scaling on numeric features before training k-NN or
SVM RBF models.
4. The transformed feature values might be hard to interpret for humans.
5. After applying SimpleImputer, the transformed data has a different shape than the
original data.
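Statements 1, 2, and 5 can be checked on a tiny example. This is a minimal sketch, assuming a made-up 4 x 2 array: StandardScaler stores one mean and one standard deviation per column, the scaled columns do not share a fixed minimum or maximum, and SimpleImputer keeps the shape of the data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy data with two features on very different scales, invented for illustration.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [10.0, 400.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# One mean and one standard deviation per feature (column).
per_feature_means = scaler.mean_
per_feature_stds = scaler.scale_

# Each scaled column has mean 0 and standard deviation 1, but the minimum
# and maximum differ between columns: the range is not fixed.
col_min = X_scaled.min(axis=0)
col_max = X_scaled.max(axis=0)

# SimpleImputer fills in missing values with the column mean;
# the number of rows and columns stays the same.
X_missing = np.array([[1.0, np.nan], [2.0, 200.0], [np.nan, 300.0]])
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)
```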
1. You can have scaling of numeric features, one-hot encoding of categorical features,
and a scikit-learn estimator within a single pipeline.
2. Once you have a scikit-learn pipeline object with an estimator as the last step, you can
call fit, predict, and score on it.
3. You can carry out data splitting within scikit-learn pipeline.
4. We have to be careful of the order we put each transformation and model in a pipeline.
5. If you call cross_validate with a pipeline object, it will call fit and transform on the
training fold and only transform and predict/score on the validation fold.
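The pipeline statements above can be tried out with something like the sketch below. It assumes a made-up housing table (the column names sqft, rooms, and city and all values are hypothetical) and uses pandas for the column names. Scaling, one-hot encoding, and the estimator live in one pipeline; the data splitting is done outside the pipeline by cross_validate.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy housing data, invented for illustration. Target: 1 = sold, 0 = not sold.
X = pd.DataFrame(
    {
        "sqft": [800, 950, 1200, 1500, 700, 2000, 1100, 1750, 900, 1300, 1600, 1000],
        "rooms": [2, 2, 3, 4, 1, 5, 3, 4, 2, 3, 4, 2],
        "city": ["A", "B", "A", "C", "B", "C", "A", "C", "B", "A", "C", "B"],
    }
)
y = pd.Series([0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0])

# Scale the numeric columns and one-hot encode the categorical column.
preprocessor = make_column_transformer(
    (StandardScaler(), ["sqft", "rooms"]),
    (OneHotEncoder(handle_unknown="ignore"), ["city"]),
)

# The estimator is the last step, so the pipeline has fit, predict, and score.
pipe = make_pipeline(preprocessor, LogisticRegression())

# For each fold, the transformers are fit on the training portion only and
# then applied (transform only) to the validation portion.
cv_results = cross_validate(pipe, X, y, cv=3, return_train_score=True)

pipe.fit(X, y)
predictions = pipe.predict(X.head(2))
accuracy = pipe.score(X, y)
```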
Varada's answers: 2., 4., 5.
Varada's answers: 4.
Varada's answers: A, B, D
Varada's answers: A, B, D
Varada's answers: A, B, C, D
(A) If you get best results at the edges of your parameter grid, it might be a good idea to
adjust the range of values in your parameter grid.
(B) Grid search is guaranteed to find the best hyperparameter values.
(C) It is possible to get different hyperparameters in different runs
of RandomizedSearchCV.
Varada's answers: A, C
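Here is a minimal sketch for (A) and (C), assuming synthetic data, a LogisticRegression model, and a small grid over C (all my own choices). The helper sampled_C_values is hypothetical and only exists for this illustration. Grid search can only pick among the values in the grid, and two randomized searches with different seeds sample different candidates.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Grid search only tries the values listed in the grid.
C_grid = [0.01, 0.1, 1.0, 10.0]
grid = GridSearchCV(LogisticRegression(max_iter=1000), {"C": C_grid}, cv=3)
grid.fit(X, y)
best_C = grid.best_params_["C"]

# If the best value sits at an edge, widening the grid in that direction may help.
best_at_edge = best_C in (C_grid[0], C_grid[-1])


def sampled_C_values(seed):
    """Run a small randomized search and return the C values it sampled."""
    search = RandomizedSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": loguniform(1e-3, 1e3)},
        n_iter=5,
        cv=3,
        random_state=seed,
    )
    search.fit(X, y)
    return [params["C"] for params in search.cv_results_["params"]]


# Different seeds give different sampled candidates, so the selected
# hyperparameters can differ from run to run.
run_1 = sampled_C_values(seed=1)
run_2 = sampled_C_values(seed=2)
```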
You have a dataset and you give me 1/10th of it. The dataset given to me is rather small
and so I split it into 96% train and 4% validation split. I carry out hyperparameter
optimization using a single 4% validation split and report validation accuracy of 0.97.
Would it classify the rest of the data with similar accuracy?
1. Probably
2. Probably not
(A) In medical diagnosis, false positives are more damaging than false negatives (assume
"positive" means the person has a disease, "negative" means they don't).
(B) In spam classification, false positives are more damaging than false negatives
(assume "positive" means the email is spam, "negative" means it's not).
(C) If method A gets a higher accuracy than method B, that means its precision is also
higher.
(D) If method A gets a higher accuracy than method B, that means its recall is also
higher.
Varada's answers: B
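A small counterexample shows why (C) and (D) are false. This is a sketch with made-up labels (4 positives, 16 negatives) and two hypothetical sets of predictions: method A has the higher accuracy, but both its precision and its recall are lower than method B's.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 4 positives followed by 16 negatives.
y_true = [1] * 4 + [0] * 16

# Method A: 1 true positive, 3 false negatives, 3 false positives, 13 true negatives.
pred_a = [1, 0, 0, 0] + [1] * 3 + [0] * 13
# Method B: 4 true positives, 8 false positives, 8 true negatives.
pred_b = [1] * 4 + [1] * 8 + [0] * 8

metrics = {}
for name, pred in [("A", pred_a), ("B", pred_b)]:
    metrics[name] = {
        "accuracy": accuracy_score(y_true, pred),
        "precision": precision_score(y_true, pred),
        "recall": recall_score(y_true, pred),
    }
```

By hand: A has accuracy 14/20, precision 1/4, recall 1/4; B has accuracy 12/20, precision 4/12, recall 4/4.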
(A) Price per square foot would be a good feature to add in our X.
(B) The alpha hyperparameter of Ridge has a similar interpretation to the C hyperparameter
of LogisticRegression; higher alpha means a more complex model.
(C) In Ridge, smaller alpha means bigger coefficients whereas bigger alpha means
smaller coefficients.
Varada's answers: C
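Statement (C) is easy to verify. Here is a minimal sketch, assuming synthetic regression data and three arbitrary alpha values (my own choices): as alpha grows, the overall size of the Ridge coefficients shrinks. This is also why (B) is false, since alpha works in the opposite direction to C: higher alpha means more regularization and a simpler model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data, invented for illustration.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Record the L2 norm of the coefficient vector for each alpha.
coef_norms = {}
for alpha in [0.01, 1.0, 1000.0]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(ridge.coef_))
```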
(A) We can still use precision and recall for regression problems but now we have
other metrics we can use as well.
(B) In sklearn for regression problems, using r2_score() and .score() (with default
values) will produce the same results.
(C) RMSE is always going to be non-negative.
(D) MSE does not directly provide information about whether the model is
underpredicting or overpredicting.
(E) We can pass multiple scoring metrics to GridSearchCV or RandomizedSearchCV for
regression as well as classification problems.
Varada's answers: B, C, D, E
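Statements (B) through (E) can be checked with a short sketch like the one below, assuming synthetic regression data and a Ridge model (my own choices). It compares .score() with r2_score(), computes RMSE as the square root of MSE, looks at signed residuals (which MSE discards), and passes two scoring metrics to GridSearchCV.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = Ridge().fit(X, y)
y_pred = model.predict(X)

# (B) For regressors, .score() returns R^2 by default.
score_from_method = model.score(X, y)
score_from_function = r2_score(y, y_pred)

# (C) RMSE is the square root of a mean of squares, so it is never negative.
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)

# (D) Squaring removes the sign; the signed residuals keep the direction.
mean_residual = np.mean(y_pred - y)

# (E) Several metrics at once; refit names the one used to pick the best model.
search = GridSearchCV(
    Ridge(),
    {"alpha": [0.1, 1.0, 10.0]},
    scoring=["r2", "neg_root_mean_squared_error"],
    refit="r2",
    cv=3,
)
search.fit(X, y)
metric_columns = [key for key in search.cv_results_ if key.startswith("mean_test_")]
```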
11.1: Ensembles
(A) Every tree in a random forest uses a different bootstrap sample of the training set.
(B) To train a tree in a random forest, we first randomly select a subset of features. The
tree is then restricted to only using those features.
(C) The n_estimators hyperparameter of random forests should be tuned to get a better
performance on the validation or test data.
(D) In random forests we build trees in a sequential fashion, where the current tree is
dependent upon the previous tree.
(E) Let classifiers A, B, and C have training errors of 10%, 20%, and 30%, respectively.
Then, the best possible training error from averaging A, B and C is 10%.
Varada's answers: A
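For (E), a made-up counterexample shows that averaging can beat the best individual classifier. This is a sketch with hypothetical predictions on 10 examples: A, B, and C have training errors of 10%, 20%, and 30%, but they make their mistakes on different examples, so a majority vote gets every example right. The helpers flip and error_rate are hypothetical names used only here.

```python
import numpy as np

# Hypothetical true labels for 10 training examples.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])


def flip(labels, positions):
    """Return a copy of labels with the given positions predicted wrongly."""
    wrong = labels.copy()
    wrong[positions] = 1 - wrong[positions]
    return wrong


def error_rate(pred):
    """Fraction of examples where pred disagrees with y_true."""
    return float(np.mean(pred != y_true))


# Each classifier is wrong on a different set of examples.
pred_a = flip(y_true, [0])        # 1 mistake out of 10
pred_b = flip(y_true, [1, 2])     # 2 mistakes out of 10
pred_c = flip(y_true, [3, 4, 5])  # 3 mistakes out of 10

# Majority vote: predict 1 when at least two of the three classifiers say 1.
votes = pred_a + pred_b + pred_c
pred_vote = (votes >= 2).astype(int)
```

Since no example is misclassified by more than one classifier, error_rate(pred_vote) is 0, which is below the 10% of the best single classifier.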