0% found this document useful (0 votes)
4 views

CS6301 Homework2 KR

This document outlines Homework 2 for the CS6301 Machine Learning course, due on February 25, 2019. It involves generating synthetic data for a nonlinear binary classification problem and exploring the effects of the regularization parameter C and the RBF kernel parameter gamma on Support Vector Machines. Students are required to analyze training and validation errors, select optimal parameters, and report test set accuracy based on their findings.

Uploaded by

Erhan Tiryaki
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

CS6301 Homework2 KR

This document outlines Homework 2 for the CS6301 Machine Learning course, due on February 25, 2019. It involves generating synthetic data for a nonlinear binary classification problem and exploring the effects of the regularization parameter C and the RBF kernel parameter gamma on Support Vector Machines. Students are required to analyze training and validation errors, select optimal parameters, and report test set accuracy based on their findings.

Uploaded by

Erhan Tiryaki
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

CS6301-Homework2_KR 2/24/19, 7)38 PM

CS6301.010: Machine Learning for Engineers and Scientists (Spring '19)


Instructor: Gautam Kunapuli
Due: In class, February 25 (Monday)

Kaitlin Rabe

Homework 2
The report component of this assignment is the hard copy of this homework, along with your answers to
questions, and is due at the start of class on Monday, February 25, 2019.

The electronic version of this homework must be uploaded on eLearning by 9:59am Central Standard
Time, Monday, February 25, 2019. All deadlines are hard and without exceptions unless permission was
obtained from the instructor in advance.

You may work in groups to discuss the problems and work through solutions together. However, you must
write up your solutions on your own, without copying another student's work or letting another student
copy your work. In your solution for each problem, you must write down the names of your partner (if any);
this will not affect your grade.

1. **Support Vector Machines with Synthetic Data**, 50


points.

For this problem, we will generate synthetic data for a nonlinear binary classification problem and partition it
into training, validation and test sets. Our goal is to understand the behavior of SVMs with Radial-Basis
Function (RBF) kernels with different values of C and γ .

In [1]: #
# DO NOT EDIT THIS FUNCTION; IF YOU WANT TO PLAY AROUND WITH DATA GENE
RATION,
# MAKE A COPY OF THIS FUNCTION AND THEN EDIT
#
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def generate_data(n_samples, tst_frac=0.2, val_frac=0.2):


# Generate a non-linear data set
X, y = make_moons(n_samples=n_samples, noise=0.25, random_state=42)

# Take a small subset of the data and make it VERY noisy; that is, g
enerate outliers
http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 1 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

enerate outliers
m = 30
np.random.seed(42)
ind = np.random.permutation(n_samples)[:m]
X[ind, :] += np.random.multivariate_normal([0, 0], np.eye(2), (m, ))
y[ind] = 1 - y[ind]

# Plot this data


cmap = ListedColormap(['#b30065', '#178000'])
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, edgecolors='k')

# First, we use train_test_split to partition (X, y) into training a


nd test sets
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=tst_fr
ac,
random_state=42)

# Next, we use train_test_split to further partition (X_trn, y_trn)


into training and validation sets
X_trn, X_val, y_trn, y_val = train_test_split(X_trn, y_trn, test_siz
e=val_frac,
random_state=42)

return (X_trn, y_trn), (X_val, y_val), (X_tst, y_tst)

In [2]: #
# DO NOT EDIT THIS FUNCTION; IF YOU WANT TO PLAY AROUND WITH VISUALIZ
ATION,
# MAKE A COPY OF THIS FUNCTION AND THEN EDIT
#

def visualize(models, param, X, y):


# Initialize plotting
if len(models) % 3 == 0:
nrows = len(models) // 3
else:
nrows = len(models) // 3 + 1

fig, axes = plt.subplots(nrows=nrows, ncols=3, figsize=(15, 5.0 * nr


ows))
cmap = ListedColormap(['#b30065', '#178000'])

# Create a mesh
xMin, xMax = X[:, 0].min() - 1, X[:, 0].max() + 1
yMin, yMax = X[:, 1].min() - 1, X[:, 1].max() + 1
xMesh, yMesh = np.meshgrid(np.arange(xMin, xMax, 0.01),
np.arange(yMin, yMax, 0.01))

for i, (p, clf) in enumerate(models.items()):


# if i > 0:
# break
r, c = np.divmod(i, 3)
ax = axes[r, c]

# Plot contours
zMesh = clf.decision_function(np.c_[xMesh.ravel(), yMesh.ravel()])
zMesh = zMesh.reshape(xMesh.shape)
ax.contourf(xMesh, yMesh, zMesh, cmap=plt.cm.PiYG, alpha=0.6)

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 2 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

if (param == 'C' and p > 0.0) or (param == 'gamma'):


ax.contour(xMesh, yMesh, zMesh, colors='k', levels=[-1, 0, 1],
alpha=0.5, linestyles=['--', '-', '--'])

# Plot data
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap, edgecolors='k')
ax.set_title('{0} = {1}'.format(param, p))

In [3]: # Generate the data


n_samples = 300 # Total size of data set
(X_trn, y_trn), (X_val, y_val), (X_tst, y_tst) = generate_data(n_sampl
es)

a. (25 points) The effect of the regularization parameter, C


Complete the Python code snippet below that takes the generated synthetic 2-d data as input and learns
non-linear SVMs. Use scikit-learn's SVC (https://scikit-
learn.org/stable/modules/generated/sklearn.svm.SVC.html) function to learn SVM models with radial-basis
kernels for fixed γ and various choices of C ∈ {10−3 , 10−2 ⋯ , 1, ⋯ 105 }. The value of γ is fixed to
γ = d⋅σ1 , where d is the data dimension and σX is the standard deviation of the data set X . SVC can
X
automatically use these setting for γ if you pass the argument gamma = 'scale' (see documentation for more
details).

Plot: For each classifier, compute both the training error and the validation error. Plot them together,
making sure to label the axes and each curve clearly.

Discussion: How do the training error and the validation error change with C ? Based on the visualization of
the models and their resulting classifiers, how does changing C change the models? Explain in terms of
minimizing the SVM's objective function 12 w′ w + C Σni= 1 ℓ(w ∣ xi , yi ), where ℓ is the hinge loss for each
training example (xi , y i ) .

Final Model Selection: Use the validation set to select the best the classifier corresponding to the best
value, Cbest . Report the accuracy on the test set for this selected best SVM model. Note: You should report
a single number, your final test set accuracy on the model corresponding to $C{best}$_.

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 3 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

In [4]: # Learn support vector classifiers with a radial-basis function kernel


with
# fixed gamma = 1 / (n_features * X.std()) and different values of C
C_range = np.arange(-3.0, 6.0, 1.0)
C_values = np.power(10.0, C_range)

models = dict()
trnErr = dict()
valErr = dict()
tstAcc = dict()

from sklearn.svm import SVC

for C in C_values:

classifier = SVC(C, kernel='rbf', gamma='scale')


models[C] = classifier.fit(X_trn, y_trn)
trnErr[C] = 1 - classifier.score(X_trn, y_trn)
valErr[C] = 1 - classifier.score(X_val, y_val)
tstAcc[C] = classifier.score(X_tst, y_tst)

visualize(models, 'C', X_trn, y_trn)

# Plot training error and validation error vs. regularization paramete


r for each classifier

plt.figure()
plt.plot(trnErr.keys(), trnErr.values(), marker='o', linewidth=3, mark
ersize=12)
plt.plot(valErr.keys(), valErr.values(), marker='s', linewidth=3, mark
ersize=12)
plt.xlabel('Regularization Parameter, C', fontsize=16)
plt.ylabel('Training/Validation Error', fontsize=16)
plt.legend(['Training Error', 'Validation Error'], fontsize=16)
plt.xscale('log')

# Code to perform model selection

min_Error = min(valErr.values())
C_best = 0

for C in valErr.keys():
if valErr.get(C) == min_Error:
C_best = C
tst_Accuracy = tstAcc.get(C)
print("C_best is", C_best)
print("Test accuracy with C_best is", "%.5f" % tst_Accuracy)
print("-------------------------------------------------------
---")

C_best is 1.0
Test accuracy with C_best is 0.65000
----------------------------------------------------------

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 4 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

Discussion

As the regularization parameter, C, is increased, the training error continues to decrease approaching
zero. The C parameter acts as a trade off between correct classification and a large margin for the
support vector. As C continues to increase, the margin will decrease and the classifier will attempt to
correctly classify all of the training examples. This corresponds to overfitting the training dataset. This is
confirmed when you look at the validation error, as it begins to increase with increasing C. Based on the
data for this example, a C value of 1 would be the most generalizable without overfitting the data.

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 5 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

b. (25 points) The effect of the RBF kernel parameter, γ


Complete the Python code snippet below that takes the generated synthetic 2-d data as input and learns
various non-linear SVMs. Use scikit-learn's SVC (https://scikit-
learn.org/stable/modules/generated/sklearn.svm.SVC.html) function to learn SVM models with radial-basis
−2 −1 2 3
kernels for fixed C and various choices of γ ∈ {10 , 10 1, 10, 10 10 }. The value of C is fixed to
C = 10.

Plot: For each classifier, compute both the training error and the validation error. Plot them together,
making sure to label the axes and each curve clearly.

Discussion: How do the training error and the validation error change with γ ? Based on the visualization of
the models and their resulting classifiers, how does changing γ change the models? Explain in terms of the
functional form of the RBF kernel, κ(x, z) = exp(−γ ⋅ ‖x −z‖2 )

Final Model Selection: Use the validation set to select the best the classifier corresponding to the best
value, γbest . Report the accuracy on the test set for this selected best SVM model. Note: You should report a
single number, your final test set accuracy on the model corresponding to $\gamma{best}$_.

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 6 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

In [5]: # Learn support vector classifiers with a radial-basis function kernel


with
# fixed C = 10.0 and different values of gamma
gamma_range = np.arange(-2.0, 4.0, 1.0)
gamma_values = np.power(10.0, gamma_range)

models = dict()
trnErr = dict()
valErr = dict()
tstAcc = dict()

from sklearn.svm import SVC

for G in gamma_values:

classifier = SVC(C=10, kernel='rbf', gamma=G)


models[G] = classifier.fit(X_trn, y_trn)
trnErr[G] = 1 - classifier.score(X_trn, y_trn)
valErr[G] = 1 - classifier.score(X_val, y_val)
tstAcc[G] = classifier.score(X_tst, y_tst)

visualize(models, 'gamma', X_trn, y_trn)

# Plot training error and validation error vs. gamma for each classifi
er

plt.figure()
plt.plot(trnErr.keys(), trnErr.values(), marker='o', linewidth=3, mark
ersize=12)
plt.plot(valErr.keys(), valErr.values(), marker='s', linewidth=3, mark
ersize=12)
plt.xlabel('Gamma', fontsize=16)
plt.ylabel('Training/Validation Error', fontsize=16)
plt.legend(['Training Error', 'Validation Error'], fontsize=16)
plt.xscale('log')

# Code to perform model selection

min_Error = min(valErr.values())
G_best = 0

for G in valErr.keys():
if valErr.get(G) == min_Error:
G_best = G
tst_Accuracy = tstAcc.get(G)
print("G_best is", G_best)
print("Test accuracy with G_best is", "%.5f" % tst_Accuracy)
print("-------------------------------------------------------
---")

G_best is 1.0
Test accuracy with G_best is 0.66667
----------------------------------------------------------
G_best is 10.0
Test accuracy with G_best is 0.61667
----------------------------------------------------------

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 7 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

Discussion

As gamma is increased, the training error continues to decrease to zero. This is because gamma defines
the magnitude of the influence of a a single training example, or the inverse of the radius of each
training examples support vector. Therefore when gamma increases, the classifier begins to dramatically
overfit the training dataset. This becomes very obvious when you look at the the validation error, as it
begins to increase rapidly with increasing gamma. Based on the data for this example, a gamma value
of 1 would be the most generalizable without overfitting the data.

2. **Breast Cancer Diagnosis with Support Vector


Machines**, 25 points.

For this problem, we will use the Wisconsin Breast Cancer


(https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)) data set, which has already
been pre-processed and partitioned into training, validation and test sets. Numpy's loadtxt
(https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.loadtxt.html) command can be used
to load CSV files.

In [6]: # Load the Breast Cancer Diagnosis data set; download the files from e
Learning
# CSV files can be read easily using np.loadtxt()

wdbc_trn = np.loadtxt('wdbc_trn.csv', dtype = 'float', delimiter = ','


)
http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 8 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

)
wdbc_val = np.loadtxt('wdbc_val.csv', dtype = 'float', delimiter = ','
)
wdbc_tst = np.loadtxt('wdbc_tst.csv', dtype = 'float', delimiter = ','
)

y_trn = wdbc_trn[:, 0]
X_trn = wdbc_trn[:, 1:]

y_val = wdbc_val[:, 0]
X_val = wdbc_val[:, 1:]

y_tst = wdbc_val[:, 0]
X_tst = wdbc_val[:, 1:]

Use scikit-learn's SVC (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) function to


learn SVM models with radial-basis kernels for each combination of C ∈ {10−2 , 10−1 , 1, 101 , ⋯ 104 }
and γ ∈ {10−3 , 10−2 10−1 , 1, 10, 102 }. Print the tables corresponding to the training and validation errors.

Final Model Selection: Use the validation set to select the best the classifier corresponding to the best
parameter values, Cbest and γbest . Report the accuracy on the test set for this selected best SVM model.
Note: You should report a single number, your final test set accuracy on the model corresponding to $C{best}
and \gamma{best}$.

In [7]: from sklearn.svm import SVC

# Learn SVM models for each combination of C and gamma

C_range = np.arange(-2.0, 5.0, 1.0)


C_values = np.power(10.0, C_range)

gamma_range = np.arange(-3.0, 3.0, 1.0)


gamma_values = np.power(10.0, gamma_range)

models = dict()
trnErr = dict()
valErr = dict()
tstAcc = dict()

print("C", '\t', '\t', "G", '\t', '\t', "Training Error", '\t', "Valid
ation Error")
print("---", '\t', '\t', "---", '\t', '\t', "---------------", '\t', "
----------------")

for C in C_values:

for G in gamma_values:

classifier = SVC(C, kernel='rbf', gamma=G)


models[C, G] = classifier.fit(X_trn, y_trn)
trnErr[C, G] = 1 - classifier.fit(X_trn, y_trn).score(X_trn, y
_trn)
valErr[C, G] = 1 - classifier.score(X_val ,y_val)
tstAcc[C, G] = classifier.score(X_tst, y_tst)

#Print tables corresponding to training and validation errors

print(C, '\t', '\t', G, '\t', '\t', "%.5f" % trnErr[C, G], '\t


', '\t', "%.5f" % valErr[C, G])

# Code for final model selection


http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 9 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

# Code for final model selection

min_Error = min(valErr.values())
C_best = 0
G_best = 0

for k in valErr.keys():
if valErr.get(k) == min_Error:
C_best, G_best = k
tst_Accuracy = tstAcc.get(k)
print("C_best and G_best are", C_best, "and", G_best, ", r
espectively.")
print("Test accuracy with (C_best, G_best) is", "%.5f" % t
st_Accuracy)
print("---------------------------------------------------
-------")

C G Training Error Validation


Error
--- --- --------------- -----------
-----
0.01 0.001 0.37168 0.37391
0.01 0.01 0.37168 0.37391
0.01 0.1 0.37168 0.37391
0.01 1.0 0.37168 0.37391
0.01 10.0 0.37168 0.37391
0.01 100.0 0.37168 0.37391
0.1 0.001 0.30678 0.30435
0.1 0.01 0.05015 0.06957
0.1 0.1 0.03540 0.07826
0.1 1.0 0.37168 0.37391
0.1 10.0 0.37168 0.37391
0.1 100.0 0.37168 0.37391
1.0 0.001 0.04720 0.06087
1.0 0.01 0.02950 0.06087
1.0 0.1 0.01180 0.04348
1.0 1.0 0.00000 0.37391
1.0 10.0 0.00000 0.37391
1.0 100.0 0.00000 0.37391
10.0 0.001 0.02655 0.03478
10.0 0.01 0.01180 0.04348
10.0 0.1 0.00000 0.03478
10.0 1.0 0.00000 0.37391
10.0 10.0 0.00000 0.37391
10.0 100.0 0.00000 0.37391
100.0 0.001 0.01475 0.03478
100.0 0.01 0.00295 0.02609
100.0 0.1 0.00000 0.03478
100.0 1.0 0.00000 0.37391
100.0 10.0 0.00000 0.37391
100.0 100.0 0.00000 0.37391
1000.0 0.001 0.00590 0.03478
1000.0 0.01 0.00000 0.02609
1000.0 0.1 0.00000 0.03478
1000.0 1.0 0.00000 0.37391
1000.0 10.0 0.00000 0.37391
1000.0 100.0 0.00000 0.37391
10000.0 0.001 0.00000 0.0
2609
10000.0 0.01 0.00000 0.0
2609
10000.0 0.1 0.00000 0.0
3478
10000.0 1.0 0.00000 0.3
7391
10000.0 10.0 0.00000 0.3
7391
10000.0 100.0 0.00000 0.3
7391
http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 10 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

7391
C_best and G_best are 100.0 and 0.01 , respectively.
Test accuracy with (C_best, G_best) is 0.97391
----------------------------------------------------------
C_best and G_best are 1000.0 and 0.01 , respectively.
Test accuracy with (C_best, G_best) is 0.97391
----------------------------------------------------------
C_best and G_best are 10000.0 and 0.001 , respectively.
Test accuracy with (C_best, G_best) is 0.97391
----------------------------------------------------------
C_best and G_best are 10000.0 and 0.01 , respectively.
Test accuracy with (C_best, G_best) is 0.97391
----------------------------------------------------------

3. **Breast Cancer Diagnosis with k -Nearest


Neighbors**, 25 points.

Use scikit-learn's k-nearest neighbor (https://scikit-


learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) classifier to learn models
for Breast Cancer Diagnosis with k ∈ {1, 5, 11, 15, 21}, with the kd-tree algorithm.

Plot: For each classifier, compute both the training error and the validation error. Plot them together,
making sure to label the axes and each curve clearly.

Final Model Selection: Use the validation set to select the best the classifier corresponding to the best
parameter value, k best . Report the accuracy on the test set for this selected best kNN model. Note: You
should report a single number, your final test set accuracy on the model corresponding to $k{best}$_.

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 11 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

In [8]: from sklearn.neighbors import KNeighborsClassifier

# Learn kNN models for k = 1, 5, 11, 15, 21

k_values = [1, 5, 11, 15, 21]

models = dict()
trnErr = dict()
valErr = dict()
tstAcc = dict()

for k in k_values:

kNNclassifier = KNeighborsClassifier(n_neighbors=k, algorithm='kd_


tree')
models[k] = kNNclassifier.fit(X_trn, y_trn)
trnErr[k] = 1 - kNNclassifier.score(X_trn, y_trn)
valErr[k] = 1 - kNNclassifier.score(X_val, y_val)
tstAcc[k] = kNNclassifier.score(X_tst, y_tst)

# Plot training error and validation error vs. k for each classifier

plt.figure()
plt.plot(trnErr.keys(), trnErr.values(), marker='o', linewidth=3, mark
ersize=12)
plt.plot(valErr.keys(), valErr.values(), marker='s', linewidth=3, mark
ersize=12)
plt.xlabel('# of Nearest Neighbors, k', fontsize=16)
plt.ylabel('Training/Validation Error', fontsize=16)
plt.legend(['Training Error', 'Validation Error'], fontsize=16)
plt.axis('tight')

# Code to perform model selection

min_Error = min(valErr.values())
k_best = 0

for k in valErr.keys():
if valErr.get(k) == min_Error:
k_best = k
tst_Accuracy = tstAcc.get(k)
print("k_best is", k_best)
print("Test accuracy with k_best is", "%.5f" % tst_Accuracy)
print("-------------------------------------------------------
---")

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 12 of 13
CS6301-Homework2_KR 2/24/19, 7)38 PM

k_best is 5
Test accuracy with k_best is 0.95652
----------------------------------------------------------
k_best is 11
Test accuracy with k_best is 0.95652
----------------------------------------------------------

Discussion: Which of these two approaches, SVMs or kNN, would you prefer for this classification task?
Explain.

I would choose the support vector machine classifier for this task because it gives the best classification
accuracy when applied with to test dataset.

I worked with Erhan Tryaki on parts of this assignment.

In [ ]:

http://localhost:8888/nbconvert/html/Box%20Sync/Coursework/Spring%202019/CS%206301.010%20-%20Machine%20Learning/CS6301-Homework2_KR.ipynb?download=false Page 13 of 13

You might also like