This Study Resource Was

The document discusses various preprocessing and machine learning techniques in scikit-learn. It provides code examples and questions related to preprocessing techniques like normalization, encoding, imputing etc. It also discusses machine learning algorithms like k-nearest neighbors, decision trees, clustering, support vector machines etc. and the relevant scikit-learn APIs. The document tests the understanding of core scikit-learn concepts through multiple choice questions.

Uploaded by

John Solomon
Copyright
© © All Rights Reserved

The preprocessing technique in which a dataset is transformed to a distribution
of mean 0 and variance 1 is known as __________________.
Standardization (mean removal followed by variance scaling)
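
As a minimal sketch of the idea, assuming scikit-learn's StandardScaler is the utility in question (the quiz does not name it) and reusing the toy values from the Binarizer example, the scaled column should end up with mean 0 and variance 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy single-feature dataset; the values are only for illustration.
x = np.array([[7.8], [1.3], [4.5], [0.9]])

# fit_transform() learns the column mean and standard deviation,
# then subtracts the mean and divides by the standard deviation.
x_scaled = StandardScaler().fit_transform(x)

col_mean = x_scaled.mean(axis=0)
col_std = x_scaled.std(axis=0)
print(col_mean, col_std)
```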

What is the output of the following code?


import sklearn.preprocessing as preprocessing
x = [[7.8], [1.3], [4.5], [0.9]]
print(preprocessing.Binarizer().fit(x).transform(x).shape)
(4, 1)

The preprocessing technique in which categorical values are transformed to
categorical integers is known as ________________.
Encoding

What is the output of the following code?


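import sklearn.preprocessing as preprocessing
x = [[0, 0], [0, 1], [2,0]]
# Column 0 is fit on the values {0, 2}, so the 1 passed to transform() is unseen.
# Recent scikit-learn versions raise ValueError unless handle_unknown='ignore'.
enc = preprocessing.OneHotEncoder(handle_unknown='ignore')
print(enc.fit(x).transform([[1, 1]]).toarray())
[[0. 0. 0. 1.]]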

Which of the following modules of sklearn contains preprocessing utilities?
preprocessing

Which of the following APIs is used to scale a dataset to the range 0 to 1?
MinMaxScaler
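
A small sketch of MinMaxScaler on made-up values (the data is an assumption, not part of the quiz). With the default feature_range of (0, 1), the smallest value should map to 0 and the largest to 1:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy single-feature dataset; the values are only for illustration.
x = np.array([[7.8], [1.3], [4.5], [0.9]])

# The default feature_range is (0, 1).
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)
print(x_scaled.ravel())
```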

What is the output of the following code?
import sklearn.preprocessing as preprocessing
regions = ['HYD', 'CHN', 'MUM', 'HYD', 'KOL', 'CHN']
print(preprocessing.LabelEncoder().fit(regions).transform(regions))
[1 0 3 1 2 0]

The preprocessing technique in which missing values are replaced with the mean
of a dataset is known as _______________.
Imputing
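
The quiz does not say which class performs the imputation. The sketch below assumes SimpleImputer from sklearn.impute, which is where current scikit-learn releases keep it, and uses a made-up column with one missing entry:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One missing value (NaN) in a single-feature column; data is illustrative.
x = np.array([[1.0], [np.nan], [3.0]])

# strategy="mean" replaces each NaN with the mean of the observed values
# in that column, here (1.0 + 3.0) / 2.
imputer = SimpleImputer(strategy="mean")
x_filled = imputer.fit_transform(x)
print(x_filled.ravel())
```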

What is the output of the following code?
import sklearn.preprocessing as preprocessing
x = [[7.8], [1.3], [4.5], [0.9]]
# Binarizer's default threshold is 0.0, so every value above 0 maps to 1.
print(preprocessing.Binarizer().fit(x).transform(x))
[[1.]
 [1.]
 [1.]
 [1.]]

Scikit-learn provides the Pipeline utility to build a pipeline, which performs a
series of transformations.
True
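
To illustrate, here is a rough sketch of a two-step pipeline. The step names "scale" and "knn", and the choice of a scaler followed by a nearest-neighbors classifier on the iris data, are assumptions made for the example:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each step is a (name, estimator) pair; every step but the last
# must be a transformer. The step names are arbitrary labels.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# fit() runs fit_transform on the scaler, then fits the classifier
# on the scaled data.
pipe.fit(X, y)
train_accuracy = pipe.score(X, y)
print(train_accuracy)
```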

The ________ parameter is used to control the number of neighbors of
KNeighborsClassifier.
n_neighbors

Which regressor utility of sklearn.neighbors is used to learn from k nearest
neighbors of each query point?
KNeighborsRegressor

Which of the following modules of sklearn is used to deal with Nearest Neighbors?
neighbors

Which of the following parameters can be used to give more weightage to the
points, which are nearer to a point in the nearest neighbors method?
weights
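
The effect of weights can be sketched on a tiny made-up dataset. With weights="uniform" every neighbor gets one vote, while weights="distance" weights each vote by the inverse of its distance, so one very close neighbor can outvote two distant ones:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two class-0 points far from the query and one class-1 point close to it.
X = [[0.0], [0.2], [1.0]]
y = [0, 0, 1]
query = [[0.9]]

uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)

# Uniform voting: two votes for class 0 against one for class 1.
uniform_pred = int(uniform.predict(query)[0])
# Distance weighting: the neighbor at distance 0.1 dominates.
distance_pred = int(distance.predict(query)[0])
print(uniform_pred, distance_pred)
```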

This study source was downloaded by 100000832806195 from CourseHero.com on 09-28-2021 04:30:41 GMT -05:00

https://www.coursehero.com/file/52381959/Machine-Learning-Using-scikit-learntxt/
What is the strategy followed by the Radius Neighbors method?
It looks in the vicinity of each training point, within the area covered by a
fixed radius.

Which of the following classes is used to implement the K-Nearest Neighbors
classification in scikit-learn?
KNeighborsClassifier

Neighbors-based regression is mainly used when the data labels are continuous
rather than discrete variables.
True

Which of the following algorithms can be used with any nearest neighbors utility
in scikit-learn?
all

A feature can be reused to split a tree during Decision tree creation.
True

Which of the following modules of sklearn is used for dealing with Decision
Trees?
tree

Which of the following utilities is used for regression using decision trees?
DecisionTreeRegressor

Decision trees overfit the data very easily.
True

Which of the following parameters is used to tune a Decision Tree?
max_depth or min_samples_split (scikit-learn's tree estimators have no
parameter named split)
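
As one possible illustration of tuning, the sketch below assumes max_depth is the parameter being tuned and compares an unrestricted tree with a depth-limited one on the iris data. The dataset and the depth limit of 2 are arbitrary choices for the example:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Without limits the tree keeps splitting until its leaves are pure,
# which is how decision trees come to overfit.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# max_depth caps how many levels of splits the tree may use.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

full_depth = full.get_depth()
shallow_depth = shallow.get_depth()
print(full_depth, shallow_depth)
```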

Data used for Decision Trees have to be preprocessed compulsorily.
False

A small change in data features may change a Decision Tree completely.
True

Ensemble methods generally perform better than a single Decision Tree.
True

More improvement is found in an ensemble when base estimators are highly
correlated.
False

Which of the following utilities of sklearn.ensemble is used for classification
with extra randomness?
ExtraTreesClassifier

Which of the following utilities of sklearn.ensemble is used for implementing
classification with the bagging method?
BaggingClassifier

Which of the following are Boosting ensemble methods?
AdaBoost, Gradient Tree Boosting

Which of the following modules of sklearn is used for dealing with ensemble
methods?
ensemble

Which parameter is used to set the number of base estimators in
RandomForestClassifier?
n_estimators
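
A short sketch of how n_estimators shows up in practice. The value 25 and the iris data are assumptions for the example. After fitting, the individual trees are available in the estimators_ attribute:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators sets how many decision trees the forest builds.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# estimators_ holds the fitted trees, one per base estimator.
n_trees = len(forest.estimators_)
print(n_trees)
```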

Which approach is used by SVC and NuSVC for multi-class classification?
one vs one

What happens when a very small value is used for parameter C in support vector
machines?
Regularization becomes stronger, so the margin widens and more training points
may be misclassified.

Which of the following modules of sklearn provides the utilities to deal with
support vector machines?
svm

SVM algorithms are memory efficient.
True

Which attribute provides details of obtained support vectors, after classifying
data using SVC?
support_vectors_
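
A minimal sketch of reading support_vectors_ after fitting. The four training points, the linear kernel, and C=1.0 are assumptions chosen only to keep the example small:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated classes in two dimensions; data is illustrative.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Attributes learned during fit() carry a trailing underscore.
# support_vectors_ holds the training points that define the margin.
support_vectors = clf.support_vectors_
print(support_vectors)
```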

What values can be used for the kernel parameter of the SVC class?
All

The LinearSVC class accepts a kernel parameter value.
False

Scaling or Normalization of data improves the accuracy of support vector
machines.
True

Which of the following parameters of the SVC class is used for fine-tuning the
model?
C

Which of the following utilities are provided by sklearn to perform
classification using support vector machines?
All

Agglomerative Clustering follows a top-down approach.
False (it follows a bottom-up approach)

Which of the following attributes is used to access cluster centers, after
completing clustering of given data points, using one of the clustering
algorithms?
cluster_centers_

Which of the following parameters are used to control Density-based clustering?
eps, min_samples
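
The two parameters can be sketched with DBSCAN, assuming that is the density-based algorithm the question refers to. The data is made up: two tight groups of three points and one isolated point, which DBSCAN labels -1 (noise):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # first dense group
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # second dense group
    [20.0, 20.0],                         # isolated point
])

# eps is the neighborhood radius; min_samples is how many points
# (the point itself included) must fall inside it to form a core point.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_
print(labels)
```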

Which of the following parameters are used to control Affinity Propagation
clustering?
preference, damping

Which of the following clustering techniques is used to group data points into
k clusters, where k is given by the user?
K-means clustering

What values can be used for the linkage parameter in AgglomerativeClustering?
All

Which of the following utilities of sklearn.cluster is used for performing
k-means clustering?
KMeans
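
A rough sketch that ties KMeans to the cluster_centers_ attribute mentioned earlier. The four points and the n_init and random_state values are assumptions for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of two points each; data is illustrative.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

# n_clusters is the user-given k.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each row of cluster_centers_ is the mean of one cluster's points.
centers = km.cluster_centers_
# Sort by the first coordinate, since cluster numbering is arbitrary.
centers_sorted = centers[np.argsort(centers[:, 0])]
print(centers_sorted)
```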

Spectral Clustering is best suited for identifying dense clusters.
False

What does the Homogeneity score of a clustering algorithm indicate?
It verifies whether each cluster contains only members of a single class.
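
The definition can be checked with homogeneity_score from sklearn.metrics. In this small sketch with made-up labels, clusters that each hold a single class score 1.0, while one cluster mixing both classes scores 0.0:

```python
from sklearn.metrics import homogeneity_score

labels_true = [0, 0, 1, 1]

# Each predicted cluster contains members of only one class.
# Cluster ids need not match class ids.
pure = homogeneity_score(labels_true, [1, 1, 0, 0])

# A single cluster that mixes both classes.
mixed = homogeneity_score(labels_true, [0, 0, 0, 0])
print(pure, mixed)
```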

Which of the following modules of sklearn contains popular processed datasets?
datasets

Which of the following APIs is used to normalize a sample to the unit norm?
Normalizer
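
Unlike the scalers, Normalizer works row by row. A brief sketch on made-up rows, assuming the L2 norm (the default): each sample is divided by its own norm so that it has length 1.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Each row is one sample; [3, 4] has L2 norm 5.
x = np.array([[3.0, 4.0], [1.0, 0.0]])

x_unit = Normalizer(norm="l2").fit_transform(x)

# Every row should now have unit L2 norm.
row_norms = np.linalg.norm(x_unit, axis=1)
print(x_unit, row_norms)
```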

Which of the following libraries is widely used to read structured data from
external sources?
pandas

Which of the following expressions can access the features of the iris dataset
loaded by the code below?
from sklearn import datasets
iris = datasets.load_iris()
iris.data

What do the methods starting with fetch in the sklearn.datasets module do?
They download a specific dataset from an online repository.

Which of the following is an important parameter of RadiusNeighborsClassifier?
radius
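
A minimal sketch of the radius parameter on made-up one-dimensional data. Only training points within the given radius of the query take part in the vote:

```python
from sklearn.neighbors import RadiusNeighborsClassifier

# Class 0 sits near 0, class 1 sits near 5; data is illustrative.
X = [[0.0], [0.5], [5.0], [5.5]]
y = [0, 0, 1, 1]

# radius sets the size of the neighborhood around each query point.
clf = RadiusNeighborsClassifier(radius=1.0).fit(X, y)

# Only the points at 0.0 and 0.5 lie within 1.0 of the query 0.2.
prediction = int(clf.predict([[0.2]])[0])
print(prediction)
```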