OPTIMIZATION _Lecture4_

The document discusses optimization techniques, focusing on the Gradient Descent and Steepest Descent algorithms. Gradient Descent minimizes functions by iteratively updating parameters in the direction of the negative gradient with a fixed learning rate, while Steepest Descent computes an optimal step size at each iteration, making it more robust but computationally more expensive. Key differences include step-size choice, convergence robustness, speed, implementation complexity, and suitability for various problems.


OPTIMIZATION TECHNIQUES

[AcSIR-01-ES-AD-009]
Presented by:
Dr. R. S. Bisht, Pr. Scientist,
[email protected]
CSIR-CBRI, Roorkee, India
Gradient Descent Algorithm
• Gradient Descent is a fundamental optimization algorithm used to
minimize functions by iteratively moving toward the function’s local
minimum. It is widely used in machine learning, especially for training
models by minimizing loss functions.
• The main idea is to update the variables or parameters of the function
in the direction of the steepest descent (i.e., the negative gradient),
which is the direction that reduces the function's value the fastest.
Gradient Descent Algorithm
• Gradient Calculation (Derivative): compute the gradient ∇f(x_k) of the objective function at the current point x_k.
• Update Rule: x_{k+1} = x_k − α ∇f(x_k).
• Learning Rate: the learning rate α controls how fast we move towards the minimum.
• If α is too small, convergence will be slow.
• If α is too large, we may overshoot the minimum and fail to converge.
• Convergence: the algorithm stops when the updates to the parameters become very small, meaning the cost function is (approximately) minimized, or after a fixed number of iterations.
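The steps above fit in a few lines of code. The following is a minimal sketch, not taken from the slides: it assumes the caller supplies the gradient as a callable, and the names gradient_descent, grad_f, x0, alpha, tol and max_iter are illustrative.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimize a differentiable function given its gradient.

    grad_f   : callable returning the gradient at a point
    x0       : starting point
    alpha    : fixed learning rate
    tol      : stop when the update becomes smaller than this
    max_iter : cap on the number of iterations
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad_f(x)          # move against the gradient
        x = x - step
        if np.linalg.norm(step) < tol:    # updates have become very small
            break
    return x

# Example: minimize f(x, y) = x^2 + y^2, whose gradient is (2x, 2y)
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0], alpha=0.1)
print(x_min)  # approaches [0, 0]
```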
Gradient Descent Method
Mathematical Formulation:
Given a differentiable objective f and a fixed learning rate α > 0, iterate
    x_{k+1} = x_k − α ∇f(x_k),   k = 0, 1, 2, …
until the gradient (or the update) becomes sufficiently small, or a maximum number of iterations is reached.
Disadvantages:
• Sensitive to the choice of the learning rate α. If the learning rate is too large, it may overshoot; if too small, convergence can be slow.
• For functions with a complex geometry (e.g., ill-conditioned problems), the algorithm may suffer from slow convergence, especially in narrow valleys.
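To make this sensitivity concrete, the short sketch below (illustrative, not from the slides) runs fixed-step gradient descent on an ill-conditioned quadratic f(x, y) = x² + 100y², whose narrow valley lies along the x-axis; the learning rates 0.02 and 0.001 are assumed values chosen only to show overshooting versus slow progress.

```python
import numpy as np

# Illustrative ill-conditioned quadratic: f(x, y) = x^2 + 100*y^2.
# Its gradient is (2x, 200y).
grad = lambda x: np.array([2 * x[0], 200 * x[1]])

def run(alpha, iters=200):
    x = np.array([10.0, 1.0])
    for _ in range(iters):
        x = x - alpha * grad(x)
        if np.linalg.norm(x) > 1e6:   # iterates are blowing up
            return "diverged"
    return x

# Stability in the steep y-direction requires alpha < 2/200 = 0.01,
# yet such a small alpha crawls along the flat x-direction.
print(run(alpha=0.02))    # too large: overshoots in y and diverges
print(run(alpha=0.001))   # stable, but the x-coordinate shrinks very slowly
```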
Steepest Descent Method
• The steepest descent method is similar to gradient descent, but it finds the optimal step size for each iteration.
• Instead of using a fixed learning rate, it computes the step size by solving a line search problem that minimizes the function along the direction of the negative gradient.
Mathematical Formulation:
    x_{k+1} = x_k − α_k ∇f(x_k),   where   α_k = argmin_{α ≥ 0} f(x_k − α ∇f(x_k)).
Advantages:
• Steepest descent is generally more reliable because it always finds the best step size at each iteration.
• It works well in situations where the landscape of the function is complicated (e.g., narrow valleys, ill-conditioned problems).
Disadvantages:
• The steepest descent method can be computationally expensive because it requires solving a line search problem at every iteration, which can significantly slow down the algorithm for large-scale problems.
• In practice, for many machine learning tasks, it can be overkill due to the overhead of computing the optimal step size.
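For a quadratic objective f(x) = ½ xᵀA x − bᵀx with A symmetric positive definite, the line search along the negative gradient has a closed-form solution, α_k = (gᵀg)/(gᵀAg), which makes the method easy to sketch. The code below is an illustrative implementation under that assumption (function and variable names are mine, not from the slides); for a general objective, α_k would instead come from a one-dimensional minimization routine.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-8, max_iter=10_000):
    """Steepest descent with exact line search for f(x) = 0.5*x^T A x - b^T x,
    A symmetric positive definite. For this objective, min_alpha f(x - alpha*g)
    has the closed-form solution alpha_k = (g^T g) / (g^T A g)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = A @ x - b                     # gradient of the quadratic
        if np.linalg.norm(g) < tol:       # converged
            return x, k
        alpha = (g @ g) / (g @ (A @ g))   # optimal step along -g
        x = x - alpha * g
    return x, max_iter

# Example: an ill-conditioned 2x2 problem with minimizer at the origin
A = np.array([[1.0, 0.0], [0.0, 100.0]])
b = np.zeros(2)
x_star, iters = steepest_descent_quadratic(A, b, x0=[10.0, 1.0])
print(x_star, iters)
```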
Key Differences:

Step Size (Learning Rate)
• Gradient Descent: fixed step size α chosen before the process.
• Steepest Descent: dynamically computed α_k using line search.

Direction
• Gradient Descent: moves in the direction of the negative gradient.
• Steepest Descent: also moves in the direction of the negative gradient, but with an optimal step size at each step.

Convergence
• Gradient Descent: sensitive to the choice of the learning rate.
• Steepest Descent: more robust convergence due to the optimal step size choice.

Speed
• Gradient Descent: potentially faster for well-conditioned problems.
• Steepest Descent: can be slower due to the overhead of calculating α_k in each iteration.

Implementation
• Gradient Descent: simple to implement; commonly used in large-scale optimization.
• Steepest Descent: requires line search, which adds complexity and computational cost.

Suitability
• Gradient Descent: suitable for machine learning problems with large datasets.
• Steepest Descent: suitable for problems where high precision and robust convergence are important (e.g., scientific computations).
