Assignment 3
Problem Statement:
A floriculture research team X is studying the use of multiple measurements to distinguish three
different iris flower species. The dataset contains a set of 150 records under five attributes: sepal
length, sepal width, petal length, petal width and species (see Fig. 1). Develop a logistic regression
model that classifies the species from these measurements.
Implementation: [3+2=5]
● Implement multiclass logistic regression trained with minibatch gradient descent
[LOG_MUL_GRAD].
● Evaluate the model using (a) accuracy; (b) confusion matrix; (c) precision, recall and F1-score.
**Implement [LOG_MUL_GRAD] from scratch. You may use the NumPy library to perform
matrix operations.
**In general, you may use libraries to process and handle data.
**For training [LOG_MUL_GRAD], use a minibatch size of 30 and a total of 50 epochs.
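The training procedure described above can be sketched as follows. This is only a minimal illustration, not a reference solution: it assumes a softmax (multinomial) formulation with integer class labels 0..K-1, and the function names, the learning rate lr=0.1, and the random seed are hypothetical choices that the assignment does not specify.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(y, n_classes):
    """Convert integer labels of shape (n,) into a one-hot matrix (n, n_classes)."""
    out = np.zeros((y.shape[0], n_classes))
    out[np.arange(y.shape[0]), y] = 1.0
    return out

def train_log_mul_grad(X, y, n_classes, lr=0.1, batch_size=30, epochs=50, seed=0):
    """Minibatch gradient descent on the mean cross-entropy loss.

    lr and seed are illustrative assumptions, not values from the assignment.
    Returns weights W (d, n_classes), bias b (n_classes,), and the training
    loss measured after each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = one_hot(y, n_classes)
    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            P = softmax(X[idx] @ W + b)          # predicted class probabilities
            G = (P - Y[idx]) / idx.shape[0]      # gradient of the loss w.r.t. the logits
            W -= lr * (X[idx].T @ G)
            b -= lr * G.sum(axis=0)
        P_all = softmax(X @ W + b)
        losses.append(-np.mean(np.log(P_all[np.arange(n), y] + 1e-12)))
    return W, b, losses

def predict(X, W, b):
    """Return the most probable class index for each row of X."""
    return softmax(X @ W + b).argmax(axis=1)
```

With 90 training records and a minibatch size of 30, each epoch in this sketch performs three weight updates. Standardising the four features before training usually makes a fixed learning rate easier to choose, but that is a design decision left to the implementer.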
Experiments: [3+3+2=8]
The dataset will be split into Train:Validation:Test sets in a 60:20:20 ratio.
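One possible way to produce such a split is sketched below. It assumes a simple shuffled (non-stratified) split; the function name and the seed are hypothetical, and a stratified split that keeps the class proportions equal in each part would be an equally valid reading of the requirement.

```python
import numpy as np

def train_val_test_split(X, y, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the row indices once and cut them into three contiguous parts.

    The seed is an illustrative assumption; fix it so the split is reproducible.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    order = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    # The test set receives whatever remains after the first two cuts.
    train, val, test = np.split(order, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```

For the 150 records described in the problem statement, this yields 90 training, 30 validation, and 30 test records.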
2. Experiment 2: With the optimal parameters found in the earlier experiments, plot the average
class probability for each class in the training data after every epoch. Specifically,
a. Segregate the training data into three different sets according to their true class label.
b. For a particular set, compute the mean predicted probability of each class using the
weights obtained after every epoch. Then plot these probabilities against the epoch number in a single figure.
c. Repeat the previous step (Step b) for all three sets.
(For the 3 class classification problem, there will be 3 plots corresponding to 3 different
sets, with each plot having 3 probability curves corresponding to 3 different classes.)
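Steps a to c can be sketched as below. The sketch assumes that the training loop stores a (W, b) snapshot after every epoch and that labels are integers 0..K-1; the function names and the snapshot format are hypothetical. Plotting uses matplotlib, which is imported only inside the plotting function.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_class_probabilities(X, y, snapshots, n_classes):
    """Mean predicted probabilities per true class and per epoch.

    snapshots is a list of (W, b) pairs, one per epoch (an assumed format).
    result[c, e, k] is the mean probability assigned to class k, at epoch e,
    over the training samples whose true class is c.
    """
    result = np.zeros((n_classes, len(snapshots), n_classes))
    for e, (W, b) in enumerate(snapshots):
        P = softmax(X @ W + b)
        for c in range(n_classes):
            result[c, e] = P[y == c].mean(axis=0)   # step a: segregate by true label
    return result

def plot_mean_probabilities(result, class_names):
    """One subplot per true class, one curve per predicted class (steps b and c)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    n_classes, n_epochs, _ = result.shape
    epochs = np.arange(1, n_epochs + 1)
    fig, axes = plt.subplots(1, n_classes, figsize=(5 * n_classes, 4), sharey=True)
    for c, ax in enumerate(axes):
        for k in range(n_classes):
            ax.plot(epochs, result[c, :, k], label=f"P({class_names[k]})")
        ax.set_title(f"True class: {class_names[c]}")
        ax.set_xlabel("Epoch")
        ax.legend()
    axes[0].set_ylabel("Mean predicted probability")
    fig.tight_layout()
    return fig
```

In a well-trained model, the curve for the true class in each subplot would be expected to rise towards 1 while the other two fall, although the exact shape depends on the learning rate and the data.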
3. Experiment 3:
a. Analyse the performance of the [LOG_MUL_GRAD] model with the optimal
hyperparameters found in the earlier experiments using the following:
1. Confusion matrix as a matrix and heat map.
2. Precision, recall, and F1-score for individual classes. Which characteristics of the
dataset are reflected in this class-specific information (e.g., a particular class having
relatively few samples or a large amount of noise)?
Report your observations with appropriate explanations.
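The evaluation measures named above can be computed from a confusion matrix, as in the following sketch. It assumes integer labels 0..K-1, with rows of the matrix indexing the true class and columns the predicted class; the function names are hypothetical, and a class with no predictions or no samples is given a score of 0 by convention here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # accumulates correctly for repeated index pairs
    return cm

def per_class_metrics(cm):
    """Return (precision, recall, f1) arrays with one entry per class."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)           # column sums: how often each class was predicted
    actual = cm.sum(axis=1)              # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()
```

The matrix returned by confusion_matrix can be printed directly for the "as a matrix" view. For the heat map, one option is to pass it to matplotlib's imshow and annotate each cell with its count.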
Datasets:
The dataset comprises three iris species with 50 samples each, along with several measured properties of each
flower. You can find the dataset here.
Submission:
A .zip file containing the Python source code and a PDF report file. The file name should
follow the template: <Assign-No>_<Your Roll No>.zip. For example, if your roll no is
15CE30021, the filename for Assignment 3 will be: Assign-3_15ce30021.zip
1. A single Python source file (.py) containing the implementations of the models and experiments, with
comments at function level. The first two lines should contain your name and roll no.
Deadline:
The deadline for submission is 2nd SEPTEMBER (Saturday), 11:55 PM, IST. Irrespective of the time
on your device, once submission in Moodle is closed, no request for post-deadline submission will be
entertained. No email submission will be considered. You are therefore advised to start submitting your
solution at least one hour before the deadline.