
Homework #3
EE 541: Fall 2023
Assigned: 25 September

1. An MLP has two input nodes, one hidden layer, and two outputs. Recall that the output for layer $l$ is given by $a^{(l)} = h_l\!\left(W_l\, a^{(l-1)} + b_l\right)$. The two sets of weights and biases are given by:

$$W_1 = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix}, \quad b_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad W_2 = \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 0 \\ -4 \end{bmatrix}$$

The non-linear activation for the hidden layer is ReLU (rectified linear unit) – that is, $h(x) = \max(x, 0)$. The output layer is linear (i.e., identity activation function). What is the output activation for input $x = [+1, -1]^T$?
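If you want to check your hand computation, a minimal numpy sketch of the forward pass is below; it simply encodes the matrices given above (running it reveals the answer, so do the calculation by hand first).

    import numpy as np

    # Weights and biases from the problem statement
    W1 = np.array([[1, -2],
                   [3,  4]])
    b1 = np.array([1, 0])
    W2 = np.array([[2,  2],
                   [2, -3]])
    b2 = np.array([0, -4])

    x = np.array([1, -1])            # input x = [+1, -1]^T

    a1 = np.maximum(W1 @ x + b1, 0)  # hidden layer: ReLU activation
    a2 = W2 @ a1 + b2                # output layer: identity (linear) activation
    print(a2)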

2. The HDF5 format can store multiple data objects in a single file, each keyed by object name – e.g., a single file can store a numpy float array called regressor and a numpy integer array called labels. HDF5 also allows fast non-sequential access to objects without scanning the entire file. This means you can efficiently access objects and data such as x[idxs] with non-consecutive indexes, e.g., idxs = [2, 234, 512]. This random-access property is useful when extracting a random subset from a larger training database.

In this problem you will create an HDF5 file containing a numpy array of binary random sequences you generate yourself. Follow these steps:

(1) Run the provided template python file (random binary collection.py). The script is set to DEBUG mode by default.

(2) Experiment with the assert statements to trap errors and understand what they are doing, e.g., by using the shape attribute on numpy arrays, etc.

(3) Set the DEBUG flag to False. Manually create 25 binary sequences, each with length 20. It is important that you do this by hand, i.e., do not use a coin, computer, or random number generator.

(4) Verify that your HDF5 file was written properly by checking that it can be read back.

(5) Submit your HDF5 file as directed.
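As a rough illustration (not a replacement for the provided template), writing a 25 × 20 binary array to an HDF5 file and reading back a non-consecutive subset might look like the sketch below. The file name 'binary_sequences.hdf5' and the dataset key 'sequences' are placeholders, not required names.

    import numpy as np
    import h5py

    # 25 binary sequences of length 20 (the zeros here are placeholders --
    # fill in your own hand-generated 0/1 values).
    sequences = np.zeros((25, 20), dtype=np.int64)

    # Write the array to an HDF5 file under a named key.
    with h5py.File('binary_sequences.hdf5', 'w') as hf:
        hf.create_dataset('sequences', data=sequences)

    # Read it back to verify the file, including non-consecutive row access.
    with h5py.File('binary_sequences.hdf5', 'r') as hf:
        x = hf['sequences']
        subset = x[[2, 7, 19]]            # fancy indexing: rows 2, 7, 19 only
        assert x.shape == (25, 20)
        assert subset.shape == (3, 20)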


3. Logistic regression

The MNIST dataset of handwritten digits is one of the earliest and most used datasets for benchmarking machine learning classifiers. Each datapoint contains 784 input features – the pixel values from a 28 × 28 image – and belongs to one of 10 output classes, represented by the digits 0-9.

In this problem you will use numpy to classify input images using logistic regression. Use only Python standard library modules, numpy, and matplotlib for this problem.

(a) Logistic “2” detector

In this part you will use the provided MNIST handwritten-digit data to build and train a logistic "2" detector:

$$y = \begin{cases} 1 & x \text{ is a ``2''} \\ 0 & \text{else.} \end{cases}$$

A logistic classifier takes a learned weight vector $w = [w_1, w_2, \ldots, w_L]^T$ and the unregularized bias offset $w_0 \triangleq b$ to estimate the probability that an input vector $x = [x_1, x_2, \ldots, x_L]^T$ is a "2":

$$p(x) = P[Y = 1 \mid x, w] = \frac{1}{1 + \exp\!\left(-\left(\sum_{k=1}^{L} w_k x_k + w_0\right)\right)} = \frac{1}{1 + \exp\!\left(-(w^T x + w_0)\right)}.$$

Train a logistic classifier to find weights that minimize the binary log-loss (also called the binary cross entropy loss):

$$\ell(w) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p(x^{(i)}) + (1 - y_i) \log\!\left(1 - p(x^{(i)})\right) \right]$$

where the sum is over the $N$ samples in the training set. Train your model until convergence according to some metric you choose. Experiment with variations of $\ell_1$- and/or $\ell_2$-regularization to stabilize training and improve generalization.
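One possible structure for the training loop is sketched below: plain batch gradient descent on the binary cross entropy with an optional $\ell_2$ penalty. The function name, learning rate, iteration count, and regularization strength are illustrative assumptions, not required settings.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic(X, y, lr=0.1, n_iters=500, lam=0.0):
        """Batch gradient descent on the binary cross entropy.
        X: (N, 784) inputs scaled to [0, 1]; y: (N,) 0/1 labels ("is a 2").
        lam: optional l2 regularization strength (0 disables it)."""
        N, L = X.shape
        w = np.zeros(L)
        b = 0.0
        for _ in range(n_iters):
            p = sigmoid(X @ w + b)             # predicted P[Y = 1 | x, w]
            err = p - y                        # per-sample gradient w.r.t. the logit
            grad_w = X.T @ err / N + lam * w   # l2 term is not applied to the bias
            grad_b = err.mean()
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b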

Submit answers to the following.

i. How did you determine a learning rate? What values did you try? What was your final value?

ii. Describe the method you used to establish model convergence.

iii. What regularizers did you try? Specifically, how did each impact or improve your model's performance?

iv. Plot the log-loss (i.e., learning curve) of the training set and test set on the same figure. On a separate figure plot the accuracy of your model on the training set and test set. Plot each as a function of the iteration number.

v. Classify each input to the binary output "digit is a 2" using a 0.5 threshold. Compute the final loss and final accuracy for both your training set and test set.

Submit your trained weights to Autolab. Save your weights and bias to an hdf5 file. Use keys w and b for the weights and bias, respectively. w should be a length-784 numpy vector/array and b should be a numpy scalar. Use the following as guidance:
Use the following as guidance:

with h5py.File(outFile, 'w') as hf:
    hf.create_dataset('w', data=np.asarray(weights))
    hf.create_dataset('b', data=np.asarray(bias))

Note: you will not be scored on your model's overall accuracy. But a low score may indicate errors in training or poor optimization.

(b) Softmax classification: gradient descent (GD)

In this part you will use soft-max to perform multi-class classification instead of distinct "one against all" detectors. The target vector is

$$[Y]_l = \begin{cases} 1 & x \text{ is an ``}l\text{''} \\ 0 & \text{else} \end{cases}$$

for $l = 0, \ldots, K-1$. You can alternatively consider a scalar output $Y$ equal to the value in $\{0, 1, \ldots, K-1\}$ corresponding to the class of input $x$. Construct a logistic classifier that uses $K$ separate linear weight vectors $w_0, w_1, \ldots, w_{K-1}$. Compute estimated probabilities for each class given input $x$ and select the class with the largest score among your $K$ predictors:

$$P[Y = l \mid x, w] = \frac{\exp(w_l^T x)}{\sum_{i=0}^{K-1} \exp(w_i^T x)}$$

$$\hat{Y} = \arg\max_l \, P[Y = l \mid x, w].$$

Note that the probabilities sum to 1. Use log-loss and optimize with batch gradient descent. The (negative) likelihood function on an $N$-sample training set is:

$$L(w) = -\frac{1}{N} \sum_{i=1}^{N} \log P\!\left[Y = y^{(i)} \mid x^{(i)}, w\right]$$

where the sum is over the $N$ points in our training set.
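For orientation, a minimal numpy sketch of batch gradient descent on this loss is shown below. The function name, zero initialization, one-hot encoding of the labels, learning rate, and iteration count are illustrative assumptions; the gradient it uses is the one you are asked to derive by hand in part i.

    import numpy as np

    def softmax(Z):
        # Subtract the row-wise max for numerical stability before exponentiating.
        Z = Z - Z.max(axis=1, keepdims=True)
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def train_softmax_gd(X, y, K=10, lr=0.1, n_iters=500):
        """Batch gradient descent on the multi-class log-loss.
        X: (N, 784) inputs; y: (N,) integer labels in {0, ..., K-1}."""
        N, L = X.shape
        W = np.zeros((K, L))
        b = np.zeros(K)
        Y = np.eye(K)[y]                  # one-hot targets, shape (N, K)
        for _ in range(n_iters):
            P = softmax(X @ W.T + b)      # P[Y = l | x, w] for every sample and class
            G = (P - Y) / N               # gradient of the loss w.r.t. the logits
            W -= lr * (G.T @ X)
            b -= lr * G.sum(axis=0)
        return W, b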

Submit answers to the following.

i. Compute (by hand) the derivative of the log-likelihood of the soft-max function. Write the derivative in terms of conditional probabilities, the vector x, and indicator functions (i.e., do not write this expression in terms of exponentials). You need this gradient in subsequent parts of this problem.

ii. Implement batch gradient descent. What learning rate did you use?

iii. Plot the log-loss (i.e., learning curve) of the training set and test set on the same figure. On a separate figure plot the accuracy of your model on the training set and test set. Plot each as a function of the iteration number.

iv. Compute the final loss and final accuracy for both your training set and test set.

(c) Softmax classification: stochastic gradient descent

In this part you will use stochastic gradient descent (SGD) in place of the (deterministic) gradient descent above. Test your SGD implementation using single-point updates and a mini-batch size of 100. You may need to adjust the learning rate to improve performance. You can either modify the rate by hand or according to some decay scheme, or you may choose a single learning rate. You should get a final predictor comparable to that in the previous question.
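A rough sketch of how the mini-batch loop could be organized is below; the batch size, learning rate, epoch count, and seed are placeholder assumptions, and recording the log-loss and accuracy every 5,000 samples (as required in the questions that follow) is left to you.

    import numpy as np

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    def train_softmax_sgd(X, y, K=10, lr=0.01, batch_size=100, n_epochs=5, seed=0):
        """Mini-batch SGD on the multi-class log-loss.
        batch_size=1 gives single-point updates; batch_size=100 gives the mini-batch variant."""
        rng = np.random.default_rng(seed)
        N, L = X.shape
        W = np.zeros((K, L))
        b = np.zeros(K)
        for _ in range(n_epochs):
            order = rng.permutation(N)                # reshuffle the training set each epoch
            for start in range(0, N, batch_size):
                idx = order[start:start + batch_size]
                Xb = X[idx]
                Yb = np.eye(K)[y[idx]]                # one-hot targets for this mini-batch
                P = softmax(Xb @ W.T + b)
                G = (P - Yb) / len(idx)               # averaged gradient w.r.t. the logits
                W -= lr * (G.T @ Xb)
                b -= lr * G.sum(axis=0)
        return W, b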

Submit answers to the following.

i. Implement SGD with a mini-batch size of 1 (i.e., compute the gradient and update weights after each sample). Record the log-loss and accuracy of the training set and test set every 5,000 samples. Plot the sampled log-loss and accuracy values on the same (respective) figures against the batch number. Your plots should start at iteration 0 (i.e., include the initial log-loss and accuracy). Your curves should show performance comparable to batch gradient descent. How many iterations did it take to achieve comparable performance with batch gradient descent? How does this number depend on the learning rate (or learning rate decay schedule if you have a non-constant learning rate)?

ii. Compare (to batch gradient descent) the total computational complexity to reach a comparable accuracy on your training set. Note that each iteration of batch gradient descent costs an extra factor of N operations, where N is the number of data points.

iii. Implement SGD with a mini-batch size of 100 (i.e., compute the gradient and update weights with the accumulated average after every 100 samples). Record the log-loss and accuracies as above (every 5,000 samples – not 5,000 batches) and create similar curves. Your plots should show performance comparable to batch gradient descent. How many iterations did it take to achieve comparable performance with batch gradient descent? How does this number depend on the learning rate (or learning rate decay schedule if you have a non-constant learning rate)?

iv. Compare the computational complexity to reach comparable performance between the 100-sample mini-batch algorithm, the single-point mini-batch, and batch gradient descent.

Submit your trained weights to Autolab. Save your weights and bias to an hdf5 file. Use keys W and b for the weights and bias, respectively. W should be a 10 × 784 numpy array and b should be a 10 × 1 – shape: (10,) – numpy array. The code to save the weights is the same as (a), substituting W for w.

Note: you will not be scored on your model's overall accuracy. But a low score may indicate errors in training or poor optimization.
