Lab Assignment - SVM - 2024

This document outlines a lab assignment on support vector machines (SVM) for students to complete individually. The assignment has two exercises: [1] Building SVM classifiers with different kernels on breast cancer data and evaluating their performance; [2] Using grid search to tune hyperparameters of an SVM pipeline on the same data. Students are instructed to load and explore the data, build and test SVM models, perform grid search, and submit a written report and demonstration video.

Uploaded by

tessliz2003
Copyright
© All Rights Reserved

Lab assignment “Support Vector Machines”

Pre-requisite to carrying out the assignment:

1. Go through and watch all the lab tutorials of modules 3&4:


2. Download the breast cancer dataset.

Assignment due date: end of week # 6

General Instructions:

Be sure to read the following general instructions carefully:

1. This assignment must be completed individually by all the students.


2. Accompany your solution submission with an analysis report that contains your findings
and the required screenshots.
3. A 5-minute demonstration video must be provided for your solution, and both the video and
the solution must be uploaded to the eCentennial assignment dropbox. See the directions for
recording a video at the end of this document. Any submission without a demo video will
incur a 75% penalty.

Assignment – exercise1: (35 marks)

Load & check the data:

1. Load the data into a pandas dataframe named data_firstname, where firstname is your first name.
2. Carry out some initial investigations:
a. Check the names and types of the columns.
b. Check for missing values.
c. Check the statistics of the numeric fields (mean, min, max, median, count, etc.).
d. In your written response, write a paragraph explaining your findings about each column.
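
The checks in steps 1 and 2 could be sketched as follows. This is only an illustration: the inline CSV text and the column names other than 'bare' and the ID column are hypothetical stand-ins for the downloaded dataset, and data_firstname should carry your own first name.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded breast cancer file.
# The real file name and most column names may differ.
csv_text = """ID,thickness,size,bare,class
1001,5,1,1,2
1002,5,4,10,2
1003,3,1,?,2
1004,8,10,9,4
1005,6,8,?,4
1006,1,1,1,2
"""

# In the assignment, pass the path of the downloaded file to pd.read_csv instead.
data_firstname = pd.read_csv(io.StringIO(csv_text))

# a. Names and types of the columns
print(data_firstname.dtypes)

# b. Missing values per column (only real NaN values are counted here,
#    so a '?' placeholder does not show up as missing)
print(data_firstname.isnull().sum())

# c. Statistics of the numeric fields
print(data_firstname.describe())
print(data_firstname.median(numeric_only=True))

# A column holding '?' is read as text, which is worth noting in the findings
print((data_firstname["bare"] == "?").sum())
```

Note that a column containing '?' placeholders is read as text rather than as numbers, so it is left out of the numeric statistics until it is cleaned.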

Pre-process and visualize the data

3. Replace the ‘?’ mark in the ‘bare’ column with np.nan and change the column type to ‘float’.
4. Fill any missing data with the median of the column.
5. Drop the ID column.
6. Using Pandas, Matplotlib, or seaborn (you can use any one or a mix), generate 3-5 plots and add them
to your written response, explaining the key insights and findings from the plots.
7. Separate the features from the class.
8. Split your data into 80% train and 20% test; use the last two digits of your student number
for the seed.
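
A minimal sketch of steps 3 to 5, 7 and 8 (plots omitted), assuming a small randomly generated frame in place of the real data. The column names 'thickness', 'size' and 'class' and the seed value 42 are hypothetical; use your own column names and the last two digits of your student number.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the loaded dataset
rng = np.random.RandomState(0)
n = 50
data_firstname = pd.DataFrame({
    "ID": np.arange(1000, 1000 + n),
    "thickness": rng.randint(1, 11, size=n),
    "size": rng.randint(1, 11, size=n),
    "bare": rng.randint(1, 11, size=n).astype(str),
    "class": np.where(np.arange(n) % 2 == 0, 2, 4),
})
data_firstname.loc[[3, 17], "bare"] = "?"  # simulate the '?' placeholders

# Step 3: replace '?' with np.nan and convert the column to float
data_firstname["bare"] = data_firstname["bare"].replace("?", np.nan).astype(float)

# Step 4: fill missing values with the median of each column
data_firstname = data_firstname.fillna(data_firstname.median(numeric_only=True))

# Step 5: drop the ID column
data_firstname = data_firstname.drop(columns=["ID"])

# Step 7: separate the features from the class
X = data_firstname.drop(columns=["class"])
y = data_firstname["class"]

# Step 8: 80/20 split; 42 is a placeholder for the last two digits of the student number
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```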
Build Classification Models

Support vector machine classifier with linear kernel

9. Train an SVM classifier using the training data, set the kernel to linear and set the regularization
parameter to C= 0.1. Name the classifier clf_linear_firstname.
10. Print out two accuracy scores: one for the model on the training set, i.e. X_train, y_train, and the
other on the testing set, i.e. X_test, y_test. Record both results in your written response.
11. Generate the accuracy matrix. Record the results in your written response.
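
Steps 9 to 11 might look like the sketch below. It assumes synthetic data from make_classification instead of the breast cancer file, a placeholder seed of 42, and it reads the "accuracy matrix" of step 11 as a confusion matrix, which is an interpretation rather than something the assignment states.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); use the pre-processed breast cancer data instead
X, y = make_classification(n_samples=200, n_features=9, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 9: linear kernel with regularization parameter C = 0.1
clf_linear_firstname = SVC(kernel="linear", C=0.1)
clf_linear_firstname.fit(X_train, y_train)

# Step 10: accuracy on the training set and on the testing set
train_acc = accuracy_score(y_train, clf_linear_firstname.predict(X_train))
test_acc = accuracy_score(y_test, clf_linear_firstname.predict(X_test))
print("train accuracy:", train_acc)
print("test accuracy:", test_acc)

# Step 11: confusion matrix on the testing set (rows: true class, columns: predicted class)
cm = confusion_matrix(y_test, clf_linear_firstname.predict(X_test), labels=[0, 1])
print(cm)
```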

Support vector machine classifier with “rbf” kernel

12. Repeat steps 9 to 11, in step 9 change the kernel to “rbf” and do not set any value for C.

Support vector machine classifier with “poly” kernel

13. Repeat steps 9 to 11, in step 9 change the kernel to “poly” and do not set any value for C.

Support vector machine classifier with “sigmoid” kernel

14. Repeat steps 9 to 11, in step 9 change the kernel to “sigmoid” and do not set any value for C.

(Optional: for steps 9 to 14 you can consider a loop)
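
One possible shape for the optional loop is sketched below, again on synthetic stand-in data with a placeholder seed. The dictionary layout and the name results are illustrative choices, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=9, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Only the linear kernel gets an explicit C; the others keep the default value
kernel_settings = {
    "linear": {"C": 0.1},
    "rbf": {},
    "poly": {},
    "sigmoid": {},
}

results = {}
for kernel, extra_params in kernel_settings.items():
    clf = SVC(kernel=kernel, **extra_params)
    clf.fit(X_train, y_train)
    test_pred = clf.predict(X_test)
    results[kernel] = {
        "train_accuracy": accuracy_score(y_train, clf.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, test_pred),
        "confusion_matrix": confusion_matrix(y_test, test_pred, labels=[0, 1]),
    }

for kernel, scores in results.items():
    print(kernel, scores["train_accuracy"], scores["test_accuracy"])
    print(scores["confusion_matrix"])
```

Collecting the scores in one dictionary makes it easier to compare the four kernels side by side in the written report.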

By now you have the results of four SVM classifiers with different kernels recorded in your written
report. Please examine and write a small paragraph indicating which classifier you would recommend
and why.

Assignment – exercise2: (65 marks)

1. Load the data into a pandas dataframe named data_firstname_df2, where firstname is your first
name.
2. Replace the ‘?’ mark in the ‘bare’ column with np.nan and change the column type to ‘float’.
3. Drop the ID column.
4. Separate the features from the class.
5. Split your data into 80% train and 20% test; use the last two digits of your student number
for the seed.
6. Use the preprocessing library to define two transformer objects to transform your training
data:
a. Fill the missing values with the median (hint: check out SimpleImputer).
b. Scale the data (hint: check out StandardScaler).
7. Combine the two transformers into a pipeline and name it num_pipe_firstname.
8. Create a new Pipeline that has two steps: the first is num_pipe_firstname and the second is
an SVM classifier with random state = last two digits of your student number. Name the pipeline
pipe_svm_firstname. (Make note of the labels.)
9. Take a screenshot showing your num_pipe_firstname object and add it to your written report.
10. Define the grid search parameters in an object named param_grid, as follows:
a. 'svc__kernel': ['linear', 'rbf', 'poly']
b. 'svc__C': [0.01, 0.1, 1, 10, 100]
c. 'svc__gamma': [0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
d. 'svc__degree': [2, 3]

Make sure you replace svc with the label you used for the model in pipe_svm_firstname.
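
Steps 6 to 8 and step 10 could be sketched as follows. The step labels "imputer", "scaler", "num" and "svc" and the random state 42 are assumptions chosen for illustration. Note that in scikit-learn SimpleImputer is imported from sklearn.impute, while StandardScaler comes from sklearn.preprocessing.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Steps 6 and 7: median imputation followed by scaling
num_pipe_firstname = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Step 8: the numeric pipeline followed by the classifier.
# The label "svc" is what the parameter grid keys must start with;
# 42 is a placeholder for the last two digits of the student number.
pipe_svm_firstname = Pipeline([
    ("num", num_pipe_firstname),
    ("svc", SVC(random_state=42)),
])

# Step 10: each key has the form <step label>__<parameter name>
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__C": [0.01, 0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.03, 0.1, 0.3, 1.0, 3.0],
    "svc__degree": [2, 3],
}

print(num_pipe_firstname)
print(param_grid)
```

If the classifier step were labelled differently, for example "model", every key would need the matching prefix, such as "model__C".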

11. Take a screenshot showing your grid search parameter object and add it to your written report.
12. Create a grid search object named grid_search_firstname with the following parameters:
a. estimator = pipe_svm_firstname
b. param_grid = param_grid (the object defined in step 10)
c. scoring='accuracy'
d. refit = True
e. verbose = 3
13. Take a screenshot showing your grid search object and add it to your written report.
14. Fit your training data to the grid search object. (This will take some time, but you will see the
results on the console.)
15. Print out the best parameters and note them in your written response.
16. Print out the best estimator and note it in your written response.
17. Store the best model, i.e. the best estimator, in an object named
best_model_firstname.
18. Fit the training data to the best model. Print out the accuracy score and note it in your written
response.
19. Save the model using joblib (dump).
20. Save the full pipeline using joblib (dump).
21. Finally, in your written response, compare the results and write your conclusions. As part of your
conclusions, indicate the main difference between exercise #1 and exercise #2.
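
Steps 12 to 20 could be sketched as shown below. Several things here are assumptions: the data is synthetic, the parameter grid is deliberately reduced so the example runs quickly, the accuracy in step 18 is computed on the test set, the file names are made up, and "the model" in step 19 is read as the fitted SVC step while "the full pipeline" in step 20 is the whole best estimator.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data with a few missing values (assumption)
X, y = make_classification(n_samples=120, n_features=9, n_informative=5, random_state=0)
X[::15, 0] = np.nan
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

num_pipe_firstname = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
pipe_svm_firstname = Pipeline([
    ("num", num_pipe_firstname),
    ("svc", SVC(random_state=42)),
])

# Reduced grid to keep the sketch fast; the assignment lists a larger one
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": [0.01, 0.1],
}

# Steps 12 and 14: build the grid search and fit it on the training data
grid_search_firstname = GridSearchCV(
    estimator=pipe_svm_firstname,
    param_grid=param_grid,
    scoring="accuracy",
    refit=True,
    verbose=3,
)
grid_search_firstname.fit(X_train, y_train)

# Steps 15 to 17: best parameters and best estimator
print(grid_search_firstname.best_params_)
print(grid_search_firstname.best_estimator_)
best_model_firstname = grid_search_firstname.best_estimator_

# Step 18: refit=True already fitted it; fitting again mirrors the step as written
best_model_firstname.fit(X_train, y_train)
test_accuracy = best_model_firstname.score(X_test, y_test)
print("test accuracy:", test_accuracy)

# Steps 19 and 20: save the classifier step and the full pipeline (hypothetical file names)
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "svm_model_firstname.pkl")
pipeline_path = os.path.join(out_dir, "svm_pipeline_firstname.pkl")
joblib.dump(best_model_firstname.named_steps["svc"], model_path)
joblib.dump(best_model_firstname, pipeline_path)
```

Saving the full pipeline keeps the imputation and scaling steps together with the classifier, so a reloaded pipeline can be applied directly to raw feature values.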

Naming and Submission Rules:

1. Submit one zipped file, use


2. You must name your submission according to the following rule:
YourFullname_COMP247_assignment#.zip
Example: AdamPerjouski_COMP247_assignment2.zip

3. Upload the zipped submission file on e-Centennial using the Assignment link(s).
4. In total you should submit the following:
a. One demonstration video
b. One python script for all exercises
c. One analysis report covering all exercises. Make sure you write your name and student
Id in the analysis report.
Evaluation criteria:

| Evaluation criteria | Not acceptable (0%-24%) | Below Average (25%-49%) | Average (50%-69%) | Competent (70%-83%) | Excellent (84%-100%) |
|---|---|---|---|---|---|
| Data exploration, Visualization & Pre-processing code (30%) | Missing all requirements required. | Some requirements are implemented. | Majority of requirements are implemented but some are malfunctioning. | Majority of requirements are implemented. | All requirements implemented correctly. |
| Model building, Validation & Testing (30%) | No evidence of testing and evaluation of the requirements. | Minor evaluation and testing efforts. | Some of the requirements have been tested & evaluated. | Majority of requirements are tested & evaluated. | Realistic evaluation and testing, comparing the solution to the requirements. |
| Code Documentation (5%) | No comments explaining code. | Minor comments are implemented. | Some code is correctly commented. | Majority of code is correctly commented. | All code is correctly commented. |
| Written analysis: Content (10%) | Missed all the key ideas; very shallow. | Shows some thinking and reasoning but most ideas are underdeveloped. | Indicates thinking and reasoning applied with original thought on a few ideas. | Indicates original thinking and develops ideas with sufficient and firm evidence. | Indicates synthesis of ideas, in-depth analysis and evidences original thought and support for the topic. |
| Written analysis: Format and organization (5%) | Writing lacks logical organization. It shows no coherence and ideas lack unity. Serious errors. No transitions. Format is very messy. | Writing lacks logical organization. It shows some coherence but ideas lack unity. Serious errors. Format needs attention, some major errors. | Writing is coherent and logically organized. Some points remain misplaced. Format is neat but has some assembly errors. | Writing is coherent and logically organized with transitions used between ideas and paragraphs to create coherence. Overall unity of ideas is present. Format is neat and correctly assembled. | Writing shows high degree of attention to logic and reasoning of all points. Unity clearly leads the reader to the conclusion. Format is neat and correctly assembled with professional look. |
| Demonstration Video (20%) | Very weak, no mention of the code changes. Execution of code not demonstrated. | Some parts of the code changes presented. Execution of code partially demonstrated. | All code changes presented but without explanation why. Code demonstrated. | All code changes presented with explanation, exceeding time limit. Code demonstrated. | A comprehensive view of all code changes presented with explanation, within time limit. Code demonstrated. |

Demonstration Video Recording


Please record a short video (max 4-5 minutes) to explain/demonstrate your assignment solution. You
may use the Windows 10 Game bar to do the recording:
1. Press the Windows key + G at the same time to open the Game Bar dialog.

2. Check the "Yes, this is a game" checkbox to load the Game Bar.

3. Click on the Start Recording button (or Win + Alt + R) to begin capturing the video.

4. Stop the recording by clicking on the red recording bar that will be on the top right of the program
window.

(If it disappears on you, press Win + G again to bring the Game Bar back.)

You'll find your recorded video (MP4 file), under the Videos folder in a subfolder called Captures.

Submit the video together with your solution and written response.
End of lab assignment
