25 September
Homework #3
EE 541: Fall 2023
1. An MLP has two input nodes, one hidden layer, and two outputs. Recall that the output for layer $l$ is given by $\mathbf{a}^{(l)} = h_l\left(W_l \mathbf{a}^{(l-1)} + \mathbf{b}_l\right)$. The two sets of weights and biases are given by:

$$W_1 = \begin{bmatrix} 1 & -2 \\ 3 & 4 \end{bmatrix}, \quad \mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad W_2 = \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} 0 \\ -4 \end{bmatrix}$$
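For illustration, a minimal numpy sketch of the two-layer forward pass is shown below. The activations $h_1, h_2$ are not specified above, so ReLU is assumed purely as an example, and the input vector is hypothetical:

    import numpy as np

    # Weights and biases from the problem statement.
    W1 = np.array([[1, -2],
                   [3,  4]])
    b1 = np.array([1, 0])
    W2 = np.array([[2,  2],
                   [2, -3]])
    b2 = np.array([0, -4])

    def relu(z):
        # Assumed activation, for illustration only; substitute whichever h_l the problem intends.
        return np.maximum(z, 0)

    def forward(x):
        a1 = relu(W1 @ x + b1)    # hidden-layer output a^(1)
        a2 = relu(W2 @ a1 + b2)   # network output a^(2)
        return a2

    print(forward(np.array([1.0, -1.0])))   # hypothetical input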
2. The hd5 format can store multiple data objects in a single file, each keyed by object name – e.g., you can store a numpy float array called regressor and a numpy integer array called labels in the same file. Hd5 also allows fast non-sequential access to objects without scanning the entire file. This means you can efficiently access objects and data such as x[idxs] with non-consecutive indexes, e.g., idxs = [2, 234, 512]. This random-access property is useful when extracting a random subset from a larger training database.
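As a sketch of this access pattern (the file name train.h5 is illustrative; the key regressor matches the example object name above):

    import h5py

    idxs = [2, 234, 512]   # non-consecutive indices; h5py expects them in increasing order

    with h5py.File('train.h5', 'r') as f:
        x_subset = f['regressor'][idxs]   # reads only the requested rows, not the whole dataset
        print(x_subset.shape)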
In this problem you will create an hd5 file containing a numpy array of binary random sequences
you generate yourself.
Follow these steps:
(1) Run the provided template python file (random binary collection.py). The script is set to DEBUG mode by default.
(2) Experiment with the assert statements to trap errors and understand what they are doing by using the shape attribute on numpy arrays, etc.
(4) Verify that your hd5 file was written properly by checking that it can be read back (see the sketch after this list).
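A minimal sketch of the read-back check in step (4), assuming a file name binary_random.hdf5 and key data (both hypothetical; use whatever the template script actually writes):

    import h5py
    import numpy as np

    fname, key = 'binary_random.hdf5', 'data'   # hypothetical names

    # Example write: a 100 x 20 array of binary random sequences.
    sequences = np.random.randint(0, 2, size=(100, 20))
    with h5py.File(fname, 'w') as f:
        f.create_dataset(key, data=sequences)

    # Read back and verify the round trip.
    with h5py.File(fname, 'r') as f:
        read_back = f[key][...]
    assert np.array_equal(sequences, read_back), "read-back does not match what was written"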
The MNIST dataset of handwritten digits is one of the earliest and most used datasets to benchmark machine learning classifiers. Each datapoint contains 784 input features – the pixel values from a 28 × 28 image – and belongs to one of 10 output classes – represented by the numbers 0-9.
In this part you will use the provided MNIST handwritten-digit data to build and train a logistic "2" detector:

$$y = \begin{cases} 1 & x \text{ is a "2"} \\ 0 & \text{else.} \end{cases}$$
A logistic classifier takes learned weight vector $\mathbf{w} = [w_1, w_2, \ldots, w_L]^T$ and the unregularized offset bias $b \triangleq w_0$ to estimate a probability that an input vector $\mathbf{x} = [x_1, x_2, \ldots, x_L]^T$ is "2":

$$p(\mathbf{x}) = P[Y = 1 \mid \mathbf{x}, \mathbf{w}] = \frac{1}{1 + \exp\left(-\left(\sum_{k=1}^{L} w_k x_k + w_0\right)\right)} = \frac{1}{1 + \exp\left(-(\mathbf{w}^T \mathbf{x} + w_0)\right)}.$$
Train a logistic classifier to find weights that minimize the binary log-loss (also called the binary cross entropy loss):

$$\ell(\mathbf{w}) = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log p(\mathbf{x}_i) + (1 - y_i) \log\left(1 - p(\mathbf{x}_i)\right) \right]$$
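For instance, the probability and log-loss above can be computed with a few lines of numpy (X, y, w, and b below are placeholder arrays, not the provided data):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict_proba(X, w, b):
        # p(x) = 1 / (1 + exp(-(w^T x + b))), applied row-wise to X (N x 784)
        return sigmoid(X @ w + b)

    def binary_log_loss(y, p, eps=1e-12):
        # l(w) = -(1/N) sum[ y log p + (1 - y) log(1 - p) ]
        p = np.clip(p, eps, 1 - eps)   # avoid log(0)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Placeholder data, purely to show the shapes involved.
    X = np.random.rand(5, 784)
    y = np.array([1, 0, 0, 1, 0])
    w, b = np.zeros(784), 0.0
    print(binary_log_loss(y, predict_proba(X, w, b)))   # log(2) ~ 0.693 for all-zero weights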
iii. What regularizers did you try? Specifically, how did each impact or improve your model's performance?
iv. Plot the log-loss (i.e., learning curve) of the training set and test set on the same figure. On a separate figure plot the accuracy of your model on the training set and test set. Plot each as a function of the iteration number.
v. Classify each input to the binary output "digit is a 2" using a 0.5 threshold. Compute the final loss and final accuracy for both your training set and test set.
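One way to apply the 0.5 threshold and score accuracy (predict_proba refers to the hypothetical helper sketched earlier):

    import numpy as np

    def accuracy(y_true, p, threshold=0.5):
        # Map probabilities to the binary decision "digit is a 2" and compare to labels.
        y_hat = (p >= threshold).astype(int)
        return np.mean(y_hat == y_true)

    # e.g., accuracy(y_train, predict_proba(X_train, w, b))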
Submit your trained weights to Autolab. Save your weights and bias to an hdf5 file. Use keys w and b for the weights and bias, respectively.
w should be a length-784 numpy vector/array and
b should be a numpy scalar.
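A sketch of the required save format (the output file name weights.hdf5 is just an example; the keys w and b are as specified above):

    import h5py
    import numpy as np

    # Placeholder values -- replace with your trained parameters.
    w = np.zeros(784)      # length-784 weight vector
    b = np.float64(0.0)    # scalar bias

    with h5py.File('weights.hdf5', 'w') as f:
        f.create_dataset('w', data=w)   # key "w" for the weights
        f.create_dataset('b', data=b)   # key "b" for the bias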
Use the following as guidance:
$$L(\mathbf{w}) = -\frac{1}{N} \sum_{i=1}^{N} \log P\left[Y = y^{(i)} \mid \mathbf{x}^{(i)}, \mathbf{w}\right]$$
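In code, this average negative log-likelihood might look like the following (probs is assumed to be an N x K array of predicted class probabilities and y an array of integer labels; both are placeholders):

    import numpy as np

    def neg_log_likelihood(probs, y, eps=1e-12):
        # L(w) = -(1/N) sum_i log P[Y = y_i | x_i, w]
        n = len(y)
        picked = probs[np.arange(n), y]   # probability assigned to each true class
        return -np.mean(np.log(np.clip(picked, eps, None)))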
iii. Plot the log-loss (i.e., learning curve) of the training set and test set on the same figure. On a separate figure plot the accuracy of your model on the training set and test set. Plot each as a function of the iteration number.
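A possible matplotlib layout for these two figures (the history arguments are assumed to hold the per-iteration values you recorded during training):

    import matplotlib.pyplot as plt

    def plot_curves(train_vals, test_vals, ylabel):
        # One figure per metric: log-loss on one figure, accuracy on another.
        plt.figure()
        plt.plot(train_vals, label='train')
        plt.plot(test_vals, label='test')
        plt.xlabel('iteration')
        plt.ylabel(ylabel)
        plt.legend()

    # e.g., plot_curves(train_loss_hist, test_loss_hist, 'log-loss')
    #       plot_curves(train_acc_hist,  test_acc_hist,  'accuracy')
    #       plt.show()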
iii. Implement SGD with a mini-batch size of 100 (i.e., compute the gradient and update weights with the accumulated average after every 100 samples). Record the log-loss and accuracies as above (every 5,000 samples – not 5,000 batches) and create similar curves. Your plots should show performance comparable to batch gradient descent. How many iterations did it take to achieve comparable performance with batch gradient descent? How does this number depend on the learning rate (or learning rate decay schedule if you have a non-constant learning rate)?
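One possible shape for the mini-batch loop (learning rate, epoch count, and recording interval below are placeholders, not values from the assignment):

    import numpy as np

    def sgd_logistic(X, y, lr=0.1, epochs=5, batch_size=100, record_every=5000):
        # X: N x 784 features, y: length-N 0/1 labels.
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        snapshots, seen = [], 0
        for _ in range(epochs):
            order = np.random.permutation(n)            # reshuffle each epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
                w -= lr * (Xb.T @ (p - yb)) / len(idx)  # average gradient over the mini-batch
                b -= lr * np.mean(p - yb)
                seen += len(idx)
                if seen % record_every == 0:
                    snapshots.append((seen, w.copy(), b))   # record every 5,000 samples
        return w, b, snapshots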