
Gradient Exploding and Vanishing Problem in Deep Learning

In deep learning, the gradient exploding and gradient vanishing problems are two common issues that occur during the training of neural networks, particularly deep networks.

### Gradient Vanishing Problem

The gradient vanishing problem happens when gradients become very small during backpropagation, particularly in deep networks. This leads to the weights of earlier layers (those closer to the input) updating very slowly or not at all, making it difficult for the model to learn effectively. The issue is most prominent when using activation functions like the sigmoid or tanh, which squash input values into small ranges (e.g., between 0 and 1 for sigmoid), causing the gradients to shrink as they are propagated back through the layers.

**Why it happens:**

- The gradient values computed through backpropagation are products of the derivatives of activation functions and weights. In deep networks, these products can make the gradients exceedingly small as they move from the output layer back to the input layer.

- For example, the derivative of the sigmoid activation function is at most 0.25, so multiplying many such factors together quickly drives the gradients toward zero in the earlier layers (see the sketch below).
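
As a rough, hypothetical illustration of this product effect (not taken from the text above), the sketch below multiplies one sigmoid derivative and one small random weight per layer across a 20-layer chain; the layer count, weight scale, and random inputs are arbitrary choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never exceeds 0.25 (the maximum, at x = 0)

np.random.seed(0)
num_layers = 20
gradient = 1.0  # gradient arriving at the output layer
for step in range(1, num_layers + 1):
    pre_activation = np.random.randn()   # arbitrary pre-activation value
    weight = np.random.randn() * 0.5     # small random weight
    gradient *= sigmoid_derivative(pre_activation) * weight
    print(f"{step} layers back: |gradient| = {abs(gradient):.3e}")
# The magnitude collapses toward zero long before reaching the earliest layers.
```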

**Consequences:**

- Slow or no learning in the earlier layers of the network (those closest to the input), whose gradients have shrunk the most.

- The model may fail to capture complex features or patterns in the data.

**Mitigations:**

- Use ReLU (Rectified Linear Unit) or its variants (like Leaky ReLU) as activation functions, since their derivative is 1 for positive inputs and they don't squash the gradient in the same way.

- Implement techniques like batch normalization, which keeps layer activations (and hence gradients) well scaled.

- Use proper weight initialization techniques, such as Xavier or He initialization (a combined sketch follows this list).
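
A minimal PyTorch sketch combining these mitigations might look like the following; the layer sizes and the specific use of Kaiming (He) initialization with uniform sampling are illustrative assumptions, not prescriptions from the text:

```python
import torch.nn as nn

# Small feed-forward block using the mitigations listed above:
# ReLU activations, batch normalization, and He (Kaiming) initialization.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.BatchNorm1d(256),  # keeps activations in a well-scaled range
    nn.ReLU(),            # derivative is 1 for positive inputs, so gradients are not squashed
    nn.Linear(256, 10),
)

# He initialization is matched to ReLU-family activations;
# Xavier (nn.init.xavier_uniform_) would be the analogue for sigmoid/tanh.
for module in model:
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)
```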

### Gradient Exploding Problem

The exploding gradient problem occurs when gradients become extremely large during backpropagation, which causes the weights of the network to become very large and unstable. This leads to a situation where the model's learning process diverges instead of converging.

**Why it happens:**

- In deep networks, if the per-layer gradient factors are excessively large (for example, because of large weights or an unsuitable activation function), the gradients can grow exponentially as they propagate back through the layers.

- For example, activation functions with large derivatives, or poor weight initialization, can cause this problem (a toy illustration follows this list).
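
As a rough counterpart to the vanishing-gradient sketch above (again a hypothetical illustration), repeatedly multiplying by a per-layer factor whose magnitude exceeds 1 makes the backpropagated gradient grow exponentially; the factor of 1.5 and the 20-layer depth are arbitrary:

```python
num_layers = 20
gradient = 1.0  # gradient arriving at the output layer
for step in range(1, num_layers + 1):
    per_layer_factor = 1.5  # weight-times-derivative factor slightly larger than 1
    gradient *= per_layer_factor
    print(f"{step} layers back: gradient = {gradient:.3e}")
# 1.5 ** 20 is roughly 3.3e3; with more layers or larger weights the value keeps
# growing and can overflow, which is how NaNs appear during training.
```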

**Consequences:**

- Weight updates become too large, causing the model to overshoot and fail to converge to a good solution.

- The model's training can become unstable, sometimes resulting in NaN values during computation.

**Mitigations:**

- Apply gradient clipping, which caps the gradient norm (or individual gradient values) at a chosen threshold (see the sketch below).

- Use weight regularization techniques (like L2 regularization) to prevent the weights from growing too large.

- Use appropriate weight initialization methods (e.g., Xavier initialization for sigmoid or tanh, or He initialization for ReLU).
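
In PyTorch, the first two mitigations are commonly applied as in the sketch below; the clipping threshold of 1.0, the weight_decay value, and the model shape are illustrative assumptions rather than recommendations from the text:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# L2 regularization via weight decay discourages the weights from growing too large.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

def training_step(inputs, targets):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    # Rescale gradients so their global norm does not exceed the threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

Note that clipping is applied after `loss.backward()` and before `optimizer.step()`, so the rescaled gradients are what the weight update actually uses.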


### Summary

- **Vanishing gradients** make training deep networks slow and ineffective by causing gradients to shrink, especially with certain activation functions.

- **Exploding gradients** cause instability and prevent the model from converging by producing excessively large updates to the weights.

Both problems are especially significant in very deep neural networks, but various strategies (like proper initialization, choice of activation functions, and gradient clipping) can help mitigate them.
