Homework 2
Collaboration policy: You may complete this assignment with a partner, if you choose. In that case,
both partners should sign up on Canvas as a “team”, and only one of you should submit the assignment.
You are not permitted to use ChatGPT on any part of this assignment.
1. A linear NN will never solve the XOR problem [10 points, on paper]: Read the description of the XOR problem from Section 6.1 of the Deep Learning textbook: https://www.deeplearningbook.org/contents/mlp.html. Then show (by deriving the gradient, setting it to 0, and solving mathematically, not in Python) that the values for $w = [w_1, w_2]^\top$ and $b$ that minimize the function $f_{\text{MSE}}(w, b)$ in Equation 6.1 are: $w_1 = 0$, $w_2 = 0$, and $b = 0.5$ – in other words, the best prediction line is simply flat and always guesses $\hat{y} = 0.5$.
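Although the derivation itself must be done on paper, a quick numerical spot check can help confirm the claimed minimizer before you start. The sketch below assumes the setup of Section 6.1 (the four XOR inputs with targets 0, 1, 1, 0, and the MSE averaged over those four points); it is only a sanity check, not a substitute for, or part of, the required derivation.

```python
import numpy as np

# The four XOR inputs and their targets, as in Section 6.1 of the textbook.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def f_mse(w, b):
    """MSE of the linear model yhat = Xw + b, averaged over the four XOR points."""
    yhat = X @ w + b
    return np.mean((yhat - y) ** 2)

# The claimed minimizer gives an MSE of 0.25...
print(f_mse(np.array([0.0, 0.0]), 0.5))

# ...and randomly chosen alternative settings of (w, b) should never do better.
rng = np.random.default_rng(0)
print(all(f_mse(rng.normal(size=2), rng.normal()) >= 0.25 for _ in range(1000)))
```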
2. Derivation of softmax regression gradient updates [20 points, on paper]: As explained in class, let
\[
W = \begin{bmatrix} w^{(1)} & \cdots & w^{(c)} \end{bmatrix}
\]
be an $m \times c$ matrix containing the weight vectors from the $c$ different classes. The output of the softmax regression neural network is a vector with $c$ dimensions such that:
\[
\hat{y}_k = \frac{\exp z_k}{\sum_{k'=1}^{c} \exp z_{k'}} \tag{1}
\]
\[
z_k = x^\top w^{(k)} + b_k
\]
for each $k = 1, \ldots, c$. Correspondingly, our cost function will sum over all $c$ classes:
\[
f_{\text{CE}}(W, b) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{c} y_k^{(i)} \log \hat{y}_k^{(i)}
\]
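For concreteness, here is a minimal NumPy sketch of the forward pass in Equation 1 and the cross-entropy cost above. The variable names are assumptions: X holds one example per row, Y holds the corresponding one-hot labels, W is the $m \times c$ weight matrix, and b is the vector of $c$ biases.

```python
import numpy as np

def forward(X, W, b):
    """Softmax outputs (Equation 1) for every row of X.

    X: (n, m) examples, W: (m, c) weight matrix, b: (c,) biases.
    Returns an (n, c) matrix whose i-th row is yhat^(i).
    """
    Z = X @ W + b                          # z_k = x^T w^(k) + b_k, for every example at once
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract each row's max for numerical stability (yhat is unchanged)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def f_ce(X, Y, W, b):
    """Cross-entropy cost -(1/n) sum_i sum_k y_k^(i) log yhat_k^(i); Y is one-hot, shape (n, c)."""
    yhat = forward(X, W, b)
    return -np.mean(np.sum(Y * np.log(yhat), axis=1))
```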
Important note: When deriving the gradient expression for each weight vector $w^{(l)}$, it is crucial to keep in mind that the weight vector for each class $l \in \{1, \ldots, c\}$ affects the outputs of the network for every class, not just for class $l$. This is due to the normalization in Equation 1 – if changing the weight vector increases the value of $\hat{y}_l$, then it necessarily must decrease the values of the other $\hat{y}_{l'}$ for $l' \neq l$.
In this homework problem, please complete the derivation outlined below:
Derivation: For each weight vector $w^{(l)}$, we can derive the gradient expression as:
\[
\nabla_{w^{(l)}} f_{\text{CE}}(W, b) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{c} y_k^{(i)} \nabla_{w^{(l)}} \log \hat{y}_k^{(i)}
\]
\[
= -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{c} y_k^{(i)} \left( \frac{\nabla_{w^{(l)}} \hat{y}_k^{(i)}}{\hat{y}_k^{(i)}} \right)
\]
For $l \neq k$:
\[
\nabla_{w^{(l)}} \hat{y}_k^{(i)} = \text{complete me...}
\]
\[
= -x^{(i)} \hat{y}_k^{(i)} \hat{y}_l^{(i)}
\]
To compute the total gradient of $f_{\text{CE}}$ w.r.t. each $w^{(l)}$, we have to sum over all examples and over $k = 1, \ldots, c$. (Hint: $\sum_k a_k = a_l + \sum_{k \neq l} a_k$. Also, $\sum_k y_k = 1$.)
\[
\nabla_{w^{(l)}} f_{\text{CE}}(W, b) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{c} y_k^{(i)} \nabla_{w^{(l)}} \log \hat{y}_k^{(i)}
\]
\[
= \text{complete me...}
\]
\[
= -\frac{1}{n} \sum_{i=1}^{n} x^{(i)} \left( y_l^{(i)} - \hat{y}_l^{(i)} \right)
\]
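Once you have filled in the missing steps, a finite-difference check is a quick way to gain confidence in the final expression. The sketch below reuses the forward and f_ce helpers from the earlier sketch (an assumption) and compares the derived gradient against a numerical estimate on small random data.

```python
import numpy as np

# Assumes forward() and f_ce() from the earlier sketch are already defined.
rng = np.random.default_rng(0)
n, m, c = 6, 4, 3
X = rng.normal(size=(n, m))
Y = np.eye(c)[rng.integers(c, size=n)]   # random one-hot labels
W = rng.normal(size=(m, c))
b = rng.normal(size=c)
l = 1                                    # which weight vector w^(l) to check

# Analytic gradient from the derivation: -(1/n) sum_i x^(i) (y_l^(i) - yhat_l^(i)).
yhat = forward(X, W, b)
grad_analytic = -(1.0 / n) * X.T @ (Y[:, l] - yhat[:, l])

# Numerical gradient by central differences on each component of w^(l).
eps = 1e-6
grad_numeric = np.zeros(m)
for j in range(m):
    Wp, Wm = W.copy(), W.copy()
    Wp[j, l] += eps
    Wm[j, l] -= eps
    grad_numeric[j] = (f_ce(X, Y, Wp, b) - f_ce(X, Y, Wm, b)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))   # should be tiny (around 1e-9 or less)
```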
3. Train a 2-layer softmax neural network to classify images of fashion items (10 different classes, such as shoes, t-shirts, dresses, etc.) from the Fashion MNIST dataset. The input to the network will be a 28 × 28-pixel image (converted into a 784-dimensional vector); the output will be a vector of 10 probabilities (one for each class). The cross-entropy loss function that you minimize should be
\[
f_{\text{CE}}(w^{(1)}, \ldots, w^{(10)}, b^{(1)}, \ldots, b^{(10)}) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{10} y_k^{(i)} \log \hat{y}_k^{(i)} + \frac{\alpha}{2} \sum_{k=1}^{c} {w^{(k)}}^{\top} w^{(k)}
\]
After training, evaluate the network on the test set. Record the performance both in terms of (unregularized) cross-entropy
loss (smaller is better) and percent correctly classified examples (larger is better); put this information
into the PDF you submit.
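As a sketch of how the two numbers you report might be computed, assuming your code produces an (n_test, 10) matrix of predicted test-set probabilities and the corresponding one-hot labels (the argument names below are placeholders):

```python
import numpy as np

def evaluate(yhat_test, Y_test):
    """Unregularized cross-entropy and percent correct on the test set.

    yhat_test: (n_test, 10) predicted probabilities; Y_test: (n_test, 10) one-hot labels.
    """
    ce = -np.mean(np.sum(Y_test * np.log(yhat_test), axis=1))                  # smaller is better
    acc = 100.0 * np.mean(yhat_test.argmax(axis=1) == Y_test.argmax(axis=1))   # larger is better
    return ce, acc
```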
Hint 1: it accelerates training if you first normalize all the pixel values of both the training and testing
data by dividing each pixel by 255. Hint 2: when using functions like np.sum and np.mean, make
sure you know what the axis and keepdims parameters mean and that you use them in a way that is
consistent with the math!
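To make Hint 2 concrete, here is a tiny, self-contained illustration of what axis and keepdims do. The keepdims=True form is what lets a row-wise sum broadcast back against the original matrix, which is exactly the pattern needed to divide each row of exp(Z) by its own sum.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(A.sum(axis=0))                   # [5. 7. 9.]    -> sums down the columns, shape (3,)
print(A.sum(axis=1))                   # [ 6. 15.]     -> sums across each row, shape (2,)
print(A.sum(axis=1, keepdims=True))    # [[ 6.] [15.]] -> shape (2, 1), which broadcasts row-wise

# The pattern you want for softmax-style normalization: each row divided by its own sum.
print(A / A.sum(axis=1, keepdims=True))
```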
4. Logistic Regression [15 points, on paper]: Consider a 2-layer neural network that computes the
function
\[
\hat{y} = \sigma(x^\top w + b)
\]
where x is an example, w is a vector of weights, b is a bias term, and σ is the logistic sigmoid function.
Assume we train this network using the log loss, as described in class. Moreover, suppose all the
training examples are positive. Answer the following questions about convergence. (Informally,
a sequence of numbers converges if it gets closer and closer to a specific number as the sequence
progresses. A sequence that does not converge can do different things, e.g., change erratically, or grow towards $+\infty$ or $-\infty$.) While you are not required to give formal proofs, you should explain your reasoning, which could either be a mathematical argument or a simulation result (a minimal simulation sketch is given after part (c)). Put your answers into your PDF file.
(a) Given a well-chosen learning rate: what value will the training loss converge to during gradient
descent?
(b) Given a well-chosen learning rate: will b always converge; does convergence depend on the exact
training examples; or does it never converge?
(c) Suppose the training set contains exactly 2 examples, $x^{(1)}, x^{(2)} \in \mathbb{R}^2$. Give specific values for these training data such that:
i. w will converge during gradient descent (given a well-chosen learning rate).
ii. w will not converge during gradient descent (no matter what the learning rate).
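If you go the simulation route for any of these parts, a minimal sketch might look like the following: plain gradient descent on the log loss with every label set to 1, printing the loss, w, and b occasionally so you can see what settles down and what keeps growing. The two training examples here are arbitrary placeholders; substitute your own choices when answering part (c).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two training examples in R^2, all labeled positive (placeholders: swap in your own values for part (c)).
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
y = np.ones(2)

w = np.zeros(2)
b = 0.0
lr = 0.1

for t in range(20001):
    yhat = sigmoid(X @ w + b)
    if t % 5000 == 0:
        # Log loss; since every label is 1, the (1 - y) log(1 - yhat) term vanishes.
        loss = -np.mean(np.log(yhat))
        print(f"step {t:6d}  loss {loss:.6f}  w {w}  b {b:.4f}")
    # Gradient descent step on the log loss.
    w -= lr * (X.T @ (yhat - y) / len(y))
    b -= lr * np.mean(yhat - y)
```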
Create a Zip file containing both your Python and PDF files, and then submit on Canvas. If you are working
as part of a group, then only one member of your group should submit (but make sure you have already
signed up in a pre-allocated team for the homework on Canvas).