EECS 836: Machine Learning
Zijun Yao
Assistant Professor, EECS Department
The University of Kansas
Agenda
• Linear Regression model
• Model definition
• Loss function
• Optimizing parameters
Supervised learning setup
• Given a collection of records (training set)
  • Each record is characterized by a pair (x, y)
  • x: feature, attribute, independent variable
  • y: target, label, dependent variable
• Goal
  • Learn a model (a function f) so that f(x) can correctly predict the value ŷ for the corresponding true y
• Tasks
  • Regression: predict a continuous value ŷ
  • Classification: predict a categorical class ŷ
[Diagram: the training set feeds a learning algorithm, which produces a function f; a feature x goes into f, which outputs the predicted value ŷ of y]
* x is bolded because it represents a set of features; y is not bolded because it is just a single value.
House price prediction - regression
[Diagram: features (size of house, # of bedrooms, …) are fed into f, which outputs the price of the house]
Linear regression
• Given
  • Data: feature values $x^{(1)}, \dots, x^{(n)}$
  • Corresponding labels: $y^{(1)}, \dots, y^{(n)}$
• Goal: find a continuous function that models the continuous points
[Scatter plot: independent variables (features) x on the horizontal axis, dependent variables (targets) y on the vertical axis]
3 ML steps for linear regression
• Step 1: define a set of functions* (define a model)
• Step 2: goodness of function (measure the error)
• Step 3: pick the best function (optimize the parameters)
*A set of functions means the same model but with different values of the parameters.
Step 1: Model definition
$\hat{y} = b + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$
• ŷ is the predicted value of y (the target); $x_1, x_2, \dots, x_d$ are the 1st, 2nd, …, d-th features (the data)
• Parameters: the bias b is a fixed offset; the weights $w_1, \dots, w_d$ give the significance of each feature
• A linear relationship between the features and the target
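To make the model concrete, here is a minimal NumPy sketch (not from the slides; the parameter and feature values are made up for illustration) that evaluates ŷ = b + w₁x₁ + … + w_d x_d for one example:

```python
import numpy as np

# Hypothetical parameter values for a 3-feature model (illustration only).
w = np.array([120.0, -3.5, 15.0])   # weights: significance of each feature
b = 50_000.0                        # bias: a fixed offset

x = np.array([2000.0, 26.0, 3.0])   # one example's feature values

# Linear model: y_hat = b + w_1*x_1 + ... + w_d*x_d
y_hat = b + np.dot(w, x)
print(y_hat)
```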
Step 1: Define a set of functions
Example: predict the price y of a house from two features, the size of the house $x_{size}$ and the # of baths $x_{bath}$:
$f(\mathbf{x}) = y$ (price)
Linear regression model: $\hat{y} = b + w_{size} x_{size} + w_{bath} x_{bath}$
w and b are parameters (they can take any value).
A set of functions $f_1, f_2, \dots$: the same model with different values of the parameters — there are infinitely many such functions.
Step 1: A variant form of the linear model
$\hat{y} = b + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$
Equivalence: by introducing a 0th feature fixed to $x_0 = 1$ and letting $w_0 = b$,
$\hat{y} = w_0 x_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d = \mathbf{w}^\top \mathbf{x}$
Linear regression model: the prediction ŷ is a function of x, where $\mathbf{w}$ collects the parameters and $\mathbf{x}$ the features.
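A small sketch (my own illustration, not from the slides) of this equivalence: prepending a constant feature x₀ = 1 lets the bias fold into the weight vector, so ŷ = wᵀx.

```python
import numpy as np

w = np.array([-3.5, 15.0])      # weights for the original d = 2 features (hypothetical values)
b = 50.0                        # bias

x = np.array([26.0, 3.0])       # one example with d = 2 features

# Original form: y_hat = b + w . x
y_hat_original = b + w @ x

# Variant form: prepend x_0 = 1 and fold b in as w_0
w_aug = np.concatenate(([b], w))    # [w_0, w_1, ..., w_d] with w_0 = b
x_aug = np.concatenate(([1.0], x))  # [x_0, x_1, ..., x_d] with x_0 = 1
y_hat_variant = w_aug @ x_aug

assert np.isclose(y_hat_original, y_hat_variant)   # the two forms give the same prediction
```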
Agenda
• Linear Regression model
• Model definition
• Loss function
• Optimizing parameters
Step 2: Goodness of function
How good is a function? Measure the difference between the predicted and the true y.
Suppose the training data has two house features, size and age (the superscript is the data index):
• Function input $x^{(1)}_{size}$ = 4,043 sqft, $x^{(1)}_{age}$ = 26 years → function output (scalar) $\hat{y}^{(1)} = f(\mathbf{x}^{(1)})$; label value $y^{(1)}$ = 784,000
• Function input $x^{(2)}_{size}$ = 4,976 sqft, $x^{(2)}_{age}$ = 8 years → function output (scalar) $\hat{y}^{(2)} = f(\mathbf{x}^{(2)})$; label value $y^{(2)}$ = 724,900
Measure the difference between each $\hat{y}^{(i)}$ and $y^{(i)}$.
Step 2: Measure error
How good is a function? Use a loss function L.
• Input: a function and the data. Output: the loss, i.e. how far the predictions are from the true values.
• Sum of squared errors (SSE), summed over the examples, where $f(x^{(i)})$ is the estimated y based on the input function and $y^{(i)}$ is the true y:
$L(f) = \sum_{i=1}^{n} \left( y^{(i)} - f(x^{(i)}) \right)^2$
• Averaged by n, you get the mean squared error (MSE) loss: $\frac{1}{n}\sum_{i=1}^{n} \left( y^{(i)} - f(x^{(i)}) \right)^2$
• For the two-feature house example:
$L(w_{size}, w_{age}, b) = \sum_{i=1}^{n} \left( y^{(i)} - \left( b + w_{size} \cdot x^{(i)}_{size} + w_{age} \cdot x^{(i)}_{age} \right) \right)^2$
The loss function is also called the cost function or the objective function.
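As a sketch (only the two examples in the table above come from the slides; the parameter values below are made up), the SSE and MSE losses can be computed like this:

```python
import numpy as np

# The two training examples from the slide: (size, age) -> price
X = np.array([[4043.0, 26.0],
              [4976.0,  8.0]])
y = np.array([784_000.0, 724_900.0])

def sse_loss(w_size, w_age, b):
    """Sum of squared errors L(w_size, w_age, b) over the training data."""
    y_hat = b + w_size * X[:, 0] + w_age * X[:, 1]
    return np.sum((y - y_hat) ** 2)

# Evaluate the loss for one (hypothetical) choice of parameters.
print(sse_loss(w_size=150.0, w_age=-1000.0, b=100_000.0))
# The MSE is the same quantity averaged by n:
print(sse_loss(150.0, -1000.0, 100_000.0) / len(y))
```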
Step 2: Loss function
$L(f) = \sum_{i=1}^{n} \left( y^{(i)} - f(x^{(i)}) \right)^2$
where $f(x^{(i)})$ is the estimated y based on the input function.
[Plots: the residuals between the data points and the fitted function — a simple case where only one feature is used to predict y, and the case where there are 2 features to predict y]
Step 2: Intuition of loss function
• Let's use a simple case with only one feature and no bias (b = 0), with data points (1, 1), (2, 2), (3, 3):
$L(f) = \sum_{i=1}^{n} \left( y^{(i)} - w \cdot x^{(i)} \right)^2$
[Plots: the fitted line f(x) against the data, and the loss L(f) as a function of w]
• w = 1:   $L(f) = (1-1)^2 + (2-2)^2 + (3-3)^2 = 0$
• w = 0.5: $L(f) = (1-0.5)^2 + (2-1)^2 + (3-1.5)^2 = 3.5$
• w = 0:   $L(f) = (1-0)^2 + (2-0)^2 + (3-0)^2 = 14$
The loss function L is convex in w (a bowl shape), so it has a single minimum.
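A quick check of the three values above (a sketch that just reproduces the slide's toy data with b = 0):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    """L(f) = sum_i (y_i - w * x_i)^2 for the one-feature, no-bias model."""
    return np.sum((y - w * x) ** 2)

for w in (1.0, 0.5, 0.0):
    print(w, loss(w))   # -> 0.0, 3.5, 14.0
```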
Step 2: Intuition of loss function
[Plots: in the one-feature case, L(f) is a curve over w; in the two-feature case, L(f) is a surface over (w₁, w₂)]
The loss function tracks the performance of the model as the parameters change.
Agenda
• Linear Regression model
• Model definition
• Loss function
• Optimizing parameters
Step 3: Find the best function
$L(w, b) = \sum_{i=1}^{n} \left( y^{(i)} - \left( b + w_{size} \cdot x^{(i)}_{size} + w_{age} \cdot x^{(i)}_{age} \right) \right)^2$
The "best" function is the one that gives the minimum loss. Optimizing the parameters means searching over w, b to find the minimum of L:
$f^* = \arg\min_f L(f)$
$w^*, b^* = \arg\min_{w,b} L(w, b) = \arg\min_{w,b} \sum_{i=1}^{n} \left( y^{(i)} - \left( b + w_{size} \cdot x^{(i)}_{size} + w_{age} \cdot x^{(i)}_{age} \right) \right)^2$
Derivatives
[Plot: L(w) with its tangent line at w = 1]
• The derivative of the loss function L, dL/dw, is the sensitivity of the loss function to a change in a parameter w.
• Partial derivatives: let $L(w_1, \dots, w_d)$ be the loss as a function of several parameters. The partial derivative $\partial L / \partial w_i$ is the derivative with respect to one of the parameters, $w_i$, with the others held constant.
• Gradient: the gradient $\nabla L$ is the vector consisting of the partial derivatives with respect to each parameter.
• How do we reduce the loss function? Subtract the gradient from each parameter w: the gradient is the direction that increases the value of L(w), so stepping against it decreases the loss.
Step 3: Gradient descent
• Consider a loss function L(w) with one parameter w: $w^* = \arg\min_w L(w)$
  ➢ (Randomly) pick an initial value $w^0$ at time 0
  ➢ Compute $\frac{dL}{dw}\big|_{w=w^0}$: if it is negative, increase w; if it is positive, decrease w
  ➢ Update $w^1 \leftarrow w^0 - \alpha \frac{dL}{dw}\big|_{w=w^0}$, where $\alpha$ is called the "learning rate" (usually small, like 0.05)
  ➢ Compute $\frac{dL}{dw}\big|_{w=w^1}$ and update $w^2 \leftarrow w^1 - \alpha \frac{dL}{dw}\big|_{w=w^1}$
  ➢ Repeat the iterations until convergence: $w^{t+1} \leftarrow w^t - \alpha \frac{dL}{dw}\big|_{w=w^t}$
[Plot: the sequence $w^0, w^1, w^2, \dots, w^T$ stepping downhill on L(w); the search can settle in a local minimum instead of the global minimum]
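A minimal sketch of this one-parameter update rule on the toy data used earlier (the learning rate 0.05 is the value the slide mentions; everything else is illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def dL_dw(w):
    """Derivative of L(w) = sum_i (y_i - w*x_i)^2 with respect to w."""
    return -2.0 * np.sum((y - w * x) * x)

alpha = 0.05   # learning rate
w = 0.0        # (randomly) picked initial value w^0

for t in range(100):             # repeat until (approximate) convergence
    w = w - alpha * dL_dw(w)     # w^{t+1} <- w^t - alpha * dL/dw

print(w)   # converges toward the minimizer w* = 1
```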
Step 3: Gradient descent
• How about two parameters? The gradient is $\nabla L = \left[ \frac{\partial L}{\partial w}, \frac{\partial L}{\partial b} \right]^\top$
$w^*, b^* = \arg\min_{w,b} L(w, b)$
  ➢ (Randomly) pick an initial value for each parameter: $w^0, b^0$
  ➢ Compute $\frac{\partial L}{\partial w}\big|_{w=w^0, b=b^0}$ and $\frac{\partial L}{\partial b}\big|_{w=w^0, b=b^0}$, then update
     $w^1 \leftarrow w^0 - \alpha \frac{\partial L}{\partial w}\big|_{w=w^0, b=b^0}$,  $b^1 \leftarrow b^0 - \alpha \frac{\partial L}{\partial b}\big|_{w=w^0, b=b^0}$
  ➢ Compute $\frac{\partial L}{\partial w}\big|_{w=w^1, b=b^1}$ and $\frac{\partial L}{\partial b}\big|_{w=w^1, b=b^1}$, then update $w^2$ and $b^2$ in the same way
  ➢ Repeat the iterations until convergence
Step 3: Gradient descent
• Gradient of linear regression
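The derivation on this slide did not survive extraction; as a sketch, the standard gradient of the SSE loss defined earlier is:

```latex
% Gradient of L(w, b) = \sum_{i=1}^{n} \big( y^{(i)} - (b + \mathbf{w}^\top \mathbf{x}^{(i)}) \big)^2
\frac{\partial L}{\partial w_j} = -2 \sum_{i=1}^{n} \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)},
\qquad
\frac{\partial L}{\partial b} = -2 \sum_{i=1}^{n} \left( y^{(i)} - \hat{y}^{(i)} \right),
\qquad \text{where } \hat{y}^{(i)} = b + \mathbf{w}^\top \mathbf{x}^{(i)}.
```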
Step 3: Gradient descent
How does the loss get minimized by gradient descent?
[Animation across several slides: on the left, the fitted line f(x) over the training data; on the right, the contour plot of L(w, b) from high L to low L, with the current (w, b) stepping toward lower loss at each iteration until the minimum is found]
Slides by Andrew Ng
Step 3: Gradient descent
• A small gradient can slow down or halt the optimization
[Plot of the loss against the value of the parameter w:
  - very slow at a plateau ($\partial L / \partial w \approx 0$)
  - stuck at a saddle point ($\partial L / \partial w = 0$)
  - stuck at a local minimum ($\partial L / \partial w = 0$)]
Step 3: Gradient descent – learning rate α
Monitor the loss at each iteration:
- Search α by order of magnitude at first (e.g., $10^{-1} \dots 10^{-5}$)
- Then tune α locally to achieve efficient convergence
[Link]
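A sketch of that order-of-magnitude search (it reuses the earlier toy one-feature data, which is my choice, not the linked demo; the $10^{-1} \dots 10^{-5}$ range is the one mentioned above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    return np.sum((y - w * x) ** 2)

def run_gd(alpha, steps=50):
    """Run gradient descent on the toy one-parameter problem; return the loss at each iteration."""
    w, history = 0.0, []
    for _ in range(steps):
        w -= alpha * (-2.0 * np.sum((y - w * x) * x))   # w <- w - alpha * dL/dw
        history.append(loss(w))
    return history

# Search alpha by order of magnitude first, monitoring the loss at each iteration.
for alpha in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    hist = run_gd(alpha)
    print(f"alpha={alpha:g}  final loss={hist[-1]:.4g}  decreasing={hist[-1] < hist[0]}")
```

Too large a learning rate makes the loss blow up; too small a one converges very slowly, which is exactly why monitoring the loss curve is useful.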
Linear algebra review
• A vector in $\mathbb{R}^d$ is an ordered set of d real values
• A matrix in $\mathbb{R}^{n \times m}$ is an n-by-m object with n rows and m columns
• Transpose
• Matrix product: the dimensions must be compatible, e.g., a 4×2 matrix times a 2×3 matrix gives a 4×3 matrix
Vectorization of linear regression
• Benefits of vectorization
  • More compact equations
  • Faster code (using optimized matrix libraries)
• Linear regression model: $\hat{y} = w_0 x_0 + w_1 x_1 + \dots + w_d x_d$ (with $x_0 = 1$)
• Let $\mathbf{w} = [w_0, w_1, \dots, w_d]^\top$ and $\mathbf{x} = [x_0, x_1, \dots, x_d]^\top$
• In vectorized form, the linear regression model is $\hat{y} = \mathbf{w}^\top \mathbf{x}$
Vectorization of linear regression
• Consider the model for n instances
• Let $\mathbf{w} \in \mathbb{R}^{(d+1) \times 1}$ and $\mathbf{X} \in \mathbb{R}^{n \times (d+1)}$, where the i-th row of $\mathbf{X}$ is the instance $[1, x_1^{(i)}, \dots, x_d^{(i)}]$
• In vectorized form, the linear regression model is $\hat{\mathbf{y}} = \mathbf{X} \mathbf{w}$
Vectorization of linear regression
• For the loss function: $L(\mathbf{w}) = (\mathbf{y} - \mathbf{X}\mathbf{w})^\top (\mathbf{y} - \mathbf{X}\mathbf{w})$
• A one-time matrix calculation, without iterating through all data samples.
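A sketch of the vectorized computation in NumPy (the synthetic data, true weights, learning rate, and iteration count below are all illustrative; the formulas match the vectorized model and loss above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = np.c_[np.ones(n), rng.normal(size=(n, d))]   # n x (d+1) design matrix, first column is x_0 = 1
true_w = np.array([2.0, 1.0, -3.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=n)

def predict(X, w):
    return X @ w                                  # y_hat = X w: all n instances at once

def sse(X, y, w):
    r = y - X @ w                                 # residual vector
    return r @ r                                  # (y - Xw)^T (y - Xw): one matrix calculation

# Vectorized gradient of the SSE loss: -2 X^T (y - Xw)
w = np.zeros(d + 1)
alpha = 0.001
for _ in range(500):
    w -= alpha * (-2.0 * X.T @ (y - X @ w))

print(np.round(w, 2))   # close to true_w
```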
Improving learning
• Feature scaling (or normalization)
  • Ensure all features have similar scales
  • Gradient descent converges faster
[Link]
Feature standardization
• Rescale features to have zero mean and unit variance
  • Let $\mu_j$ be the mean of feature j
  • Let $s_j$ be the standard deviation of feature j
  • Replace each value with $x_j \leftarrow \frac{x_j - \mu_j}{s_j}$ for $j = 1 \dots d$ (not $x_0$)
• Must apply the same transformation to both training and testing instances
• Outliers can cause problems
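A sketch of that rule (the first two training rows reuse the slide's house sizes and ages; the third row and the test row are hypothetical): the means and standard deviations are computed on the training set only and then reused for the test set.

```python
import numpy as np

X_train = np.array([[4043.0, 26.0],
                    [4976.0,  8.0],
                    [3100.0, 40.0]])   # third row is made up for illustration
X_test  = np.array([[3600.0, 15.0]])   # hypothetical test instance

mu = X_train.mean(axis=0)          # mu_j: mean of feature j (training data only)
s  = X_train.std(axis=0)           # s_j: standard deviation of feature j

X_train_std = (X_train - mu) / s   # replace each value with (x_j - mu_j) / s_j
X_test_std  = (X_test  - mu) / s   # same transformation applied to the test instances
```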
Regularization
• A method to control the complexity of the model and avoid overfitting
• Why: address overfitting by keeping w small
• How: penalize large values of $w_j$
• Can be incorporated into the loss function
• Works well when we have a lot of features
$L(f) = \underbrace{\sum_{i=1}^{n} \left( y^{(i)} - f(x^{(i)}) \right)^2}_{\text{model fit to data}} + \underbrace{\lambda \sum_{j=1}^{d} w_j^2}_{\text{regularization}}$
The penalty term is also called the (squared) $L_2$-norm.
o $\lambda$ is a predefined hyperparameter that controls the degree of regularization
o No regularization on $w_0$ (the bias b)
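A sketch extending the vectorized gradient step with this $L_2$ penalty (the λ, learning rate, data, and iteration count are arbitrary choices; the bias $w_0$ is excluded from the penalty, as the slide says):

```python
import numpy as np

def ridge_gradient_step(w, X, y, alpha=0.005, lam=0.1):
    """One gradient descent step on L(w) = ||y - Xw||^2 + lam * sum_{j>=1} w_j^2."""
    grad = -2.0 * X.T @ (y - X @ w)     # gradient of the data-fit term
    penalty_grad = 2.0 * lam * w        # gradient of the regularization term
    penalty_grad[0] = 0.0               # no regularization on w_0 (the bias b)
    return w - alpha * (grad + penalty_grad)

# Example usage on small synthetic data:
rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 2))]
y = X @ np.array([1.0, 2.0, -1.0])
w = np.zeros(3)
for _ in range(300):
    w = ridge_gradient_step(w, X, y)
print(np.round(w, 2))   # close to [1, 2, -1], slightly shrunk by the penalty
```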
Summary
• Problem: estimate a real value
• Model: $\hat{y} = b + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$
• Loss function: the sum of squared errors (SSE), here averaged over n (the MSE):
$L(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$
• Optimize the parameters by the gradient descent method
  • Choose a starting point
  • Repeat:
    • Compute the gradient
    • Update the parameters
Demo
• Use ML library
• [Link]
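The linked demo itself is not reproduced here; as a guess at what "use ML library" might look like, a minimal scikit-learn sketch on random data (all values hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                            # 100 instances, 2 features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + 0.1 * rng.normal(size=100)

model = LinearRegression().fit(X, y)   # fits the weights and the bias by least squares
print(model.coef_, model.intercept_)   # weights close to [3, -2], bias close to 5
print(model.predict(X[:3]))            # predictions for the first 3 instances
```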