Linear Regression and Gradient Descent (1)
Figure 1. Car heaviness (in pounds) versus miles per gallon rating. As a car gets
heavier, its miles per gallon rating generally decreases.
• We could create our own model by drawing a best fit
line through the points:
Figure 2. A best fit line drawn through the data from the
previous figure.
Figure 4. Using the model, a 4,000-pound car has a predicted fuel efficiency of 15.6
miles per gallon.
By graphing some of these additional features, we can see that they also have a linear relationship
to the label, miles per gallon:
Figure 6. A car's displacement in cubic centimeters and its miles per gallon rating. As a car's
engine gets bigger, its miles per gallon rating generally decreases.
Figure 7. A car's acceleration and its miles per gallon rating. As a car's acceleration takes
longer, the miles per gallon rating generally increases.
Figure 8. A car's horsepower and its miles per gallon rating. As a car's horsepower
increases, the miles per gallon rating generally decreases.
LINEAR REGRESSION LOSS
• Loss is a numerical metric that describes how wrong a
model's predictions are. Loss measures the distance
between the model's predictions and the actual labels.
The goal of training a model is to minimize the loss,
reducing it to its lowest possible value.
• In the following image, you can visualize loss as arrows
drawn from the data points to the model. The arrows show
how far the model's predictions are from the actual values.
Figure 9. Loss is measured from the actual value to the predicted
value.
The functional difference between L1 loss and L2 loss (or between MAE and MSE) is squaring. When the
difference between the prediction and label is large, squaring makes the loss even larger. When the
difference is small (less than 1), squaring makes the loss even smaller.
When processing multiple examples at once, we recommend averaging the losses across all the
examples, whether using MAE or MSE.
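As a concrete sketch of the squaring difference, the following computes MAE (averaged L1 loss) and MSE (averaged L2 loss) for a small batch; the prediction and label values are illustrative, not taken from the car dataset above:

```python
# Minimal sketch: MAE (averaged L1 loss) and MSE (averaged L2 loss) for a batch.
# The prediction/label values below are illustrative, not from the car dataset.

def mae(predictions, labels):
    """Mean absolute error: average of |prediction - label|."""
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)

def mse(predictions, labels):
    """Mean squared error: average of (prediction - label) squared."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

predictions = [20.0, 18.0, 30.0]  # model outputs (mpg)
labels      = [19.0, 17.5, 40.0]  # actual values; the last example is far off

print(mae(predictions, labels))  # ~3.83 -- the large error contributes 10
print(mse(predictions, labels))  # 33.75 -- squaring makes the large error dominate
```

Note how the single large error dominates the MSE but not the MAE; this asymmetry is what drives the choice of loss discussed next.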
CHOOSING A LOSS
• Deciding whether to use MAE or MSE can depend on the dataset and the way you want to handle
certain predictions. Most feature values in a dataset typically fall within a distinct range. For example,
cars normally weigh between 2,000 and 5,000 pounds and get between 8 and 50 miles per gallon. An 8,000-
pound car, or a car that gets 100 miles per gallon, is outside the typical range and would be considered
an outlier.
• An outlier can also refer to how far off a model's predictions are from the real values. For instance,
3,000 pounds is within the typical car-weight range, and 40 miles per gallon is within the typical fuel-
efficiency range. However, a 3,000-pound car that gets 40 miles per gallon would be an outlier in terms
of the model's prediction because the model would predict that a 3,000-pound car would get between
18 and 20 miles per gallon.
• When choosing the best loss function, consider how you want the model to treat outliers. For instance,
MSE moves the model more toward the outliers, while MAE doesn't. L2 loss incurs a much higher
penalty for an outlier than L1 loss. For example, the following images show a model trained using MAE
and a model trained using MSE. The red line represents a fully trained model that will be used to make
predictions. The outliers are closer to the model trained with MSE than to the model trained with MAE.
Figure 10. A model trained with MSE moves the model closer to the
outliers.
Figure 11. A model trained with MAE is farther from the
outliers.
Note the relationship between the model and the data:
• MSE. The model is closer to the outliers but farther away from most of the other data points.
• MAE. The model is farther away from the outliers but closer to most of the other data points.
GRADIENT DESCENT
• Gradient descent is a mathematical technique that iteratively
finds the weights and bias that produce the model with the
lowest loss. Gradient descent finds the best weight and bias by
repeating the following process for a number of user-defined
iterations.
• The model begins training with randomized weights and biases
near zero, and then repeats the following steps:
1. Calculate the loss with the current weight and bias.
2. Determine the direction in which to move the weights and bias to reduce loss.
3. Move the weight and bias values a small amount in the direction that reduces loss.
4. Return to step one and repeat the process until the model can't reduce the loss any further.
The diagram outlines the iterative steps gradient descent performs to find the weights and bias
that produce the model with the lowest loss.
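The steps above can also be written as a short training loop. This is a minimal sketch for a one-feature model trained with MSE; the toy data, learning rate, and iteration count are illustrative assumptions, not the values behind the figures:

```python
# Minimal gradient-descent sketch for a one-feature linear model (y = w*x + b)
# trained with MSE. Data, learning rate, and iteration count are illustrative.

xs = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]        # e.g., car weight in thousands of pounds
ys = [28.0, 25.0, 20.0, 18.0, 15.0, 12.0]  # miles per gallon

w, b = 0.0, 0.0        # start with weight and bias near zero
learning_rate = 0.05   # size of each small step
iterations = 1000      # user-defined number of iterations
n = len(xs)

for step in range(iterations):
    # 1. Calculate the loss (MSE) with the current weight and bias.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    loss = sum(e ** 2 for e in errors) / n

    # 2. Determine the direction that reduces loss: the negative gradient.
    dw = sum(2 * e * x for e, x in zip(errors, xs)) / n
    db = sum(2 * e for e in errors) / n

    # 3. Move the weight and bias a small amount in that direction.
    w -= learning_rate * dw
    b -= learning_rate * db
    # 4. Repeat until the iteration budget is exhausted.

print(w, b, loss)  # final parameters and the last computed loss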
MODEL CONVERGENCE AND LOSS CURVES
• When training a model, you'll often look at a loss curve
to determine if the model has converged. The loss
curve shows how the loss changes as the model trains.
The following is what a typical loss curve looks like. Loss
is on the y-axis and iterations are on the x-axis:
Figure 13. Loss curve showing the model converging around the 1,000th-
iteration mark.
• You can see that loss dramatically decreases during the first few iterations,
then gradually decreases before flattening out around the 1,000th-iteration
mark. After 1,000 iterations, we can be mostly certain that the model has
converged.
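In code, "mostly certain that the model has converged" often translates into a simple check on the loss history: stop once the loss has stopped improving by more than a small tolerance. The following is a sketch with a synthetic, illustrative loss curve; the tolerance, patience, and numbers are assumptions, not values from the figures:

```python
# Sketch of a convergence check based on the loss curve: training can stop
# once the loss stops improving by more than a small tolerance.

def has_converged(loss_history, tolerance=1e-3, patience=10):
    """True when loss improved by less than `tolerance` over the
    last `patience` recorded iterations."""
    if len(loss_history) < patience + 1:
        return False
    return loss_history[-patience - 1] - loss_history[-1] < tolerance

# Synthetic loss curve: steep drop early, then a long flat tail.
losses = [100 * 0.99 ** i + 5.54 for i in range(2000)]

for i in range(len(losses)):
    if has_converged(losses[: i + 1]):
        print(f"converged around iteration {i}")  # flattens near the 900 mark here
        break
```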
• In the following figures, we draw the model at three points during the training
process: the beginning, the middle, and the end. Visualizing the model's state
at snapshots during the training process solidifies the link between updating
the weights and bias, reducing loss, and model convergence.
• In the figures, we use the derived weights and bias at a particular iteration to
represent the model. In the graph with the data points and the model
snapshot, blue loss lines from the model to the data points show the amount
of loss. The longer the lines, the more loss there is.
• In the following figure, we can see that around the second iteration the model
would not be good at making predictions because of the high amount of loss.
Figure 14. Loss curve and snapshot of the model at the beginning of the
training process.
At around the 400th iteration, we can see that gradient
descent has found the weight and bias that produce a
better model.
Figure 15. Loss curve and snapshot of model about midway through
training.
And at around the 1,000th iteration, we can see that the
model has converged, producing a model with the lowest
possible loss.
Figure 16. Loss curve and snapshot of the model near the end of the training
process.
CONVERGENCE AND CONVEX FUNCTIONS
• The loss functions for linear models always produce a
convex surface. As a result of this property, when a
linear regression model converges, we know the model
has found the weights and bias that produce the lowest
loss.
• If we graph the loss surface for a model with one
feature, we can see its convex shape. The following is
the loss surface of the miles per gallon dataset used in
the previous examples. Weight is on the x-axis, bias is
on the y-axis, and loss is on the z-axis:
Figure 17. Loss surface that shows its convex
shape.
In this example, a weight of -5.44 and bias of 35.94 produce
the lowest loss at 5.54:
Figure 18. Loss surface showing the weight and bias values that produce the lowest loss.
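The convex shape can also be seen without plotting, by evaluating the loss over a grid of weight and bias values and locating the bottom of the bowl. This sketch uses illustrative toy data (not the dataset behind the figures), so the minimum it finds won't match the -5.44 / 35.94 values above:

```python
# Sketch: evaluating MSE over a grid of (weight, bias) values reveals the
# convex, bowl-shaped loss surface. The toy data are illustrative.

xs = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]        # e.g., car weight in thousands of pounds
ys = [28.0, 25.0, 20.0, 18.0, 15.0, 12.0]  # miles per gallon

def mse(w, b):
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Coarse grid search over weight and bias; the true surface is continuous.
best = min(
    ((mse(w / 10, b / 10), w / 10, b / 10)
     for w in range(-100, 1)     # weights from -10.0 to 0.0 in steps of 0.1
     for b in range(0, 501)),    # biases from 0.0 to 50.0 in steps of 0.1
    key=lambda t: t[0],
)
print(best)  # (lowest loss, weight, bias) at the bottom of the bowl
```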
A linear model converges when it's found the minimum loss. Therefore, additional
iterations only cause gradient descent to move the weight and bias values in very small
amounts around the minimum. If we graphed the weights and bias points during gradient
descent, the points would look like a ball rolling down a hill, finally stopping at the point
where there's no more downward slope.
Figure 19. Loss graph showing gradient descent points stopping at the lowest point on the graph.
Notice that the black loss points create the exact shape of
the loss curve: a steep decline before gradually sloping
down until they've reached the lowest point on the loss
surface.
It's important to note that the model almost never finds the
exact minimum for each weight and bias, but instead finds
a value very close to it. It's also important to note that the
minimum for the weights and bias doesn't correspond to zero
loss, only to the values that produce the lowest loss for those
parameters.
Using the weight and bias values that produce the lowest
loss—in this case a weight of -5.44 and a bias of 35.94—we
can graph the model to see how well it fits the data:
Figure 20. Model graphed using the weight and bias values that produce the lowest loss.
This would be the best model for this dataset because no other weight and bias values produce
a model with lower loss.
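As a small worked example of applying those values, and assuming the weight feature is expressed in thousands of pounds (consistent with the 3,000-pound / 18-20 mpg example earlier), predictions come straight from the linear equation:

```python
# Applying the trained model: predicted mpg = weight_parameter * feature + bias.
# Assumes the car-weight feature is measured in thousands of pounds.

w, b = -5.44, 35.94  # values that produce the lowest loss, from the text above

def predict_mpg(weight_thousands_lb):
    return w * weight_thousands_lb + b

print(predict_mpg(3.0))  # ~19.6 mpg, inside the 18-20 mpg range cited earlier
```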