
CS445: Neural Networks and Deep Learning

Lecture 4: Backpropagation and Gradient Descent
Professor Chen - Fall 2024

I. Understanding Backpropagation
Today's lecture focused on the mathematics behind neural network training. The
backpropagation algorithm is fundamental to how neural networks learn from data.

Key Concepts:
1. Forward Propagation

- Input signals flow through the network
- Each neuron computes: output = activation_function(weighted_sum + bias)
- Final layer produces prediction

2. Computing the Loss

- Measures the difference between the prediction and the actual target
- Common loss functions (a short NumPy sketch of the forward pass and both losses follows this list):
● Mean Squared Error (MSE): L = (1/n) Σᵢ (yᵢ - ŷᵢ)²
● Cross-Entropy: L = -Σᵢ yᵢ log(ŷᵢ)
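
To make these two steps concrete, here is a minimal NumPy sketch of a forward pass through one dense layer followed by both loss computations. The names x, W, b, and y, and the choice of a sigmoid activation, are illustrative and not part of the lecture's code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])        # input signals
W = np.random.randn(3, 2) * 0.1       # weights: 3 inputs -> 2 outputs
b = np.zeros(2)                       # biases

z = x @ W + b                         # weighted sum + bias
y_hat = sigmoid(z)                    # activation -> prediction

y = np.array([1.0, 0.0])              # example target
mse = np.mean((y - y_hat) ** 2)                     # Mean Squared Error
cross_entropy = -np.sum(y * np.log(y_hat + 1e-12))  # Cross-Entropy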

II. The Chain Rule in Neural Networks


The chain rule is crucial for computing gradients through multiple layers; a numerical check of this factorization appears after the definitions below:

∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w

Where:

- L is the loss
- a is the activation
- z is the weighted sum
- w is the weight
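
As a sanity check on this factorization, the sketch below computes ∂L/∂w for a single neuron with a sigmoid activation and squared-error loss, then compares the chain-rule result with a finite-difference approximation. All names and values are illustrative.

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x, w, b, y = 1.5, 0.8, 0.1, 1.0       # single input, weight, bias, target

z = w * x + b                         # weighted sum
a = sigmoid(z)                        # activation

dL_da = 2 * (a - y)                   # ∂L/∂a for L = (a - y)^2
da_dz = a * (1 - a)                   # ∂a/∂z (sigmoid derivative)
dz_dw = x                             # ∂z/∂w
grad_w = dL_da * da_dz * dz_dw        # ∂L/∂w via the chain rule

# Finite-difference check: both numbers should agree closely
eps = 1e-6
L = lambda w_: (sigmoid(w_ * x + b) - y) ** 2
print(grad_w, (L(w + eps) - L(w - eps)) / (2 * eps))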

III. Gradient Descent Implementation


def backward_pass(network, loss, learning_rate=0.01):
    # Walk the layers from output back to input
    for layer in reversed(network.layers):
        # Compute gradients for this layer
        layer.gradients = compute_gradients(layer)

        # Update weights and biases
        layer.weights -= learning_rate * layer.gradients['weights']
        layer.biases -= learning_rate * layer.gradients['biases']

Types of Gradient Descent:


1. Batch Gradient Descent
- Uses the entire dataset for each update
- Very stable but slow
- High memory requirements

2. Stochastic Gradient Descent (SGD)
- Uses a single sample for each update
- Faster but noisier
- Lower memory requirements

3. Mini-batch Gradient Descent
- Best of both worlds
- Typically 32-256 samples per batch
- Most commonly used in practice (see the batching sketch after this list)
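
The batching pattern behind mini-batch gradient descent looks roughly like this. X, Y, batch_size, and update_step are placeholders for illustration; the gradient computation and parameter update would live inside update_step.

import numpy as np

X = np.random.randn(1000, 20)              # 1000 samples, 20 features (dummy data)
Y = np.random.randint(0, 2, size=1000)     # dummy labels
batch_size = 64

def update_step(x_batch, y_batch):
    pass  # placeholder: forward pass, backward pass, parameter update

for epoch in range(10):
    perm = np.random.permutation(len(X))   # reshuffle once per epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        update_step(X[idx], Y[idx])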

IV. Activation Functions


We covered several activation functions and their derivatives (a NumPy sketch of all three follows this list):

1. Sigmoid
- σ(x) = 1/(1 + e^(-x))
- Derivative: σ(x)(1 - σ(x))
- Issues with vanishing gradients

2. ReLU
- f(x) = max(0, x)
- Derivative: 1 if x > 0, 0 otherwise
- Most commonly used today

3. Tanh
- Range: (-1, 1)
- Often better than sigmoid
- Still has vanishing gradient issues
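
For reference, a small NumPy sketch of all three activations and the derivatives listed above (the ReLU derivative uses the common convention of 0 at x = 0):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)      # 1 where x > 0, else 0

def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x) ** 2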

V. Common Challenges and Solutions

1. Vanishing Gradients
Solutions:

- Use ReLU activation
- Implement residual connections
- Proper initialization

2. Exploding Gradients
Solutions:

- Gradient clipping (see the sketch after this list)
- Batch normalization
- L2 regularization
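
Gradient clipping by global L2 norm is often implemented along these lines. This is a minimal sketch: gradients is assumed to be a list of NumPy arrays, and the max_norm of 5.0 is an illustrative choice.

import numpy as np

def clip_gradients(gradients, max_norm=5.0):
    # Rescale all gradients together if their combined L2 norm exceeds max_norm
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in gradients))
    if total_norm > max_norm:
        gradients = [g * (max_norm / total_norm) for g in gradients]
    return gradients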

VI. Practical Implementation Tips


1. Weight Initialization:

# shape = (n_inputs, n_outputs) for a fully connected layer

# He initialization for ReLU networks
weights = np.random.randn(*shape) * np.sqrt(2 / n_inputs)

# Xavier initialization for tanh networks
weights = np.random.randn(*shape) * np.sqrt(1 / n_inputs)


2. Learning Rate Selection:
- Start with 0.01
- Use learning rate schedules (a simple step-decay sketch follows this list)
- Consider adaptive methods (Adam, RMSprop)
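
A learning rate schedule can be as simple as step decay. The sketch below halves the rate every 10 epochs; all parameter values are illustrative.

def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    # Multiply the rate by `drop` once every `epochs_per_drop` epochs
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# Example: 0.01 for epochs 0-9, 0.005 for 10-19, 0.0025 for 20-29, ...
learning_rates = [step_decay(e) for e in range(30)]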

VII. Today's Lab Exercise


Implement a simple neural network with:

1. One hidden layer (64 units)
2. ReLU activation
3. Softmax output layer
4. Cross-entropy loss
5. Mini-batch gradient descent

Homework Assignment
Due next Tuesday:

1. Implement backpropagation from scratch
2. Train a network on the MNIST dataset
3. Experiment with different:
- Learning rates
- Batch sizes
- Network architectures

Important Formulas to Remember


1. Softmax: σ(z)ᵢ = e^zᵢ / Σⱼ e^zⱼ (see the sketch after this list)

2. Cross-Entropy Loss: L = -Σ yᵢ log(ŷᵢ)

3. Weight Update Rule: w = w - α∇L
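
Putting the first two formulas together, here is a numerically stable NumPy sketch of softmax followed by cross-entropy; subtracting the maximum logit before exponentiating avoids overflow without changing the result. The example vectors are illustrative.

import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))     # shift by max for numerical stability
    return exp_z / np.sum(exp_z)

def cross_entropy(y, y_hat, eps=1e-12):
    # y is a one-hot target vector, y_hat the softmax output
    return -np.sum(y * np.log(y_hat + eps))

z = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])
loss = cross_entropy(y, softmax(z))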

Additional reading: "Deep Learning" by Goodfellow, Bengio, and Courville - Chapter 6.5

Next Week's Preview


- Convolutional Neural Networks
- Feature Maps
- Pooling Layers
- CNN Architectures

Recommended Resources
- TensorFlow Documentation
- PyTorch Tutorials
- Stanford CS231n Course Notes
- Andrew Ng's Deep Learning Specialization

Note: Office hours this week are Wednesday 2-4pm and Thursday 3-5pm in Room 405.
