ML4 Linear Models
Types of Algorithms
Parametric algorithms
• Assumes a known form to model
the input-output relationship
• Learns a fixed, pre-determined set
of parameters/coefficients
• Can learn quickly and work well
even on small data
• Constrained to the specified form,
prone to underfitting
Types of Algorithms
Non-Parametric algorithms
• Does not make strong assumptions
about the form of the input-output
relationship
• Highly flexible to model non-linear,
complex data
• Can result in higher predictive
performance
• Require more data to train and are
prone to overfitting
Supervised Learning Algorithms
[Figure: supervised learning algorithms arranged along a spectrum from simple to complex]
Module 4 Objectives:
At the conclusion of this module, you
should be able to:
[Figure: simple linear regression, where a single input (number of bedrooms) predicts house sale price: y = w0 + w1·x]
[Figure: multiple linear regression, where several inputs (number of bedrooms, square footage, …, school district) predict house sale price]
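A minimal sketch of how such a model produces predictions; the intercept and slope values below are invented purely for illustration:

```python
import numpy as np

# Hypothetical example: predict house sale price from number of bedrooms.
# The intercept w0 and slope w1 are made up for illustration only.
w0, w1 = 50_000.0, 25_000.0

bedrooms = np.array([1, 2, 3, 4])

# y = w0 + w1 * x: each prediction is the intercept plus slope times the input
predicted_price = w0 + w1 * bedrooms
print(predicted_price)  # [ 75000. 100000. 125000. 150000.]
```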
Error(x_i) = ŷ_i − y_i
We seek a model function that minimizes the total error Σ_{i=1..n} (ŷ_i − y_i), or alternatively the Sum of Squared Error (SSE) Σ_{i=1..n} (ŷ_i − y_i)²
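As a small sketch of the SSE calculation (the y values below are made up for the example):

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of Squared Error: sum over i of (y_hat_i - y_i)^2."""
    residuals = y_pred - y_true
    return np.sum(residuals ** 2)

# Toy values, invented only to show the calculation
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 205.0])
print(sse(y_true, y_pred))  # 10^2 + (-10)^2 + 5^2 = 225.0
```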
Estimating the parameters
• SSE is our cost function (loss function)
z = x² OR z = log(x)
Cost function with regularization:
J(w) = Σ_{i=1..n} (y_i − (w0 + w1·x_i,1 + ⋯ + wp·x_i,p))² + λ · Penalty(w1 … wp)
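A sketch of this cost function, assuming a ridge-style sum-of-squared-weights penalty since the slide leaves Penalty() generic; the data and λ below are invented for illustration:

```python
import numpy as np

def regularized_cost(w, X, y, lam):
    """J(w) = sum_i (y_i - (w0 + w1*x_i,1 + ... + wp*x_i,p))^2 + lam * Penalty(w1..wp).
    A ridge-style penalty (sum of squared weights, excluding w0) is assumed here;
    the slide leaves Penalty() generic (e.g., sum(|w_j|) would be another choice)."""
    y_hat = w[0] + X @ w[1:]             # linear predictions for every row of X
    data_fit = np.sum((y - y_hat) ** 2)  # SSE term
    penalty = np.sum(w[1:] ** 2)         # assumed ridge-style penalty on w1..wp
    return data_fit + lam * penalty

# Tiny made-up example: 3 observations, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = np.array([5.0, 4.0, 7.0])
w = np.array([1.0, 1.5, 0.5])            # (w0, w1, w2)
print(regularized_cost(w, X, y, lam=0.1))
```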
[Figure: a binary outcome y ∈ {0, 1} plotted against x, with a fitted linear regression line ŷ = w0 + w1·x1]
Problems
• Linear regression will almost always
predict the wrong value (something other than exactly 0 or 1)
• How do we interpret predictions between 0
and 1?
• What about predictions greater than 1?
Solution: Predict the Probability y=1
• Rather than predicting y, let’s predict
the probability P(y=1)
• To do so we need a function that
predicts outputs between 0 and 1
• We use the logistic/sigmoid function
σ(z) = 1 / (1 + e^(−z))
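A minimal sketch of the sigmoid function and how it squashes any real input into (0, 1):

```python
import numpy as np

def sigmoid(z):
    """Logistic/sigmoid function: 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # 0.5
print(sigmoid(5.0))   # ~0.993
print(sigmoid(-5.0))  # ~0.007
```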
Solution: Predict the Probability y=1
• Desired model output is P(y=1)
• We use the sigmoid function to get outputs
between 0 and 1
• As input to the sigmoid we provide the
output of our linear regression (w0+w1x)
[Diagram: inputs x1 … xp, each weighted by w1 … wp, are combined as z = wᵀx, then passed through σ(z) to produce P(y = 1)]
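Putting the two pieces together, a sketch of the full prediction; the weights and feature values are made up, and the intercept w0 is written separately rather than folded into w:

```python
import numpy as np

def predict_proba(w0, w, x):
    """P(y = 1) = sigmoid(w0 + w . x) for a single example x, as in the diagram."""
    z = w0 + np.dot(w, x)            # linear part: the output of the regression
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes it into (0, 1)

# Made-up weights and feature values, for illustration only
w0 = -1.0
w = np.array([0.8, -0.3, 0.5])
x = np.array([2.0, 1.0, 0.5])
print(predict_proba(w0, w, x))  # the model's estimate of P(y = 1)
```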
Estimating the parameters
To find the optimal values of w1 … wp:
1. Define our cost function J(w)
2. Find the weight/coefficient values that minimize the cost function:
   – Calculate the derivative (gradient)
   – Set the gradient equal to 0
   – Solve for the coefficients using gradient descent
Gradient descent
• Suppose we want to minimize a function
such as y = x²
• We start at some point on the curve and
move iteratively towards the minimum
– Move in the direction opposite the
gradient
– Move by some small value (called the
learning rate, η) multiplied by the
gradient
• We continue until we find the minimum
or reach a set number of iterations
[Figure: on the curve, the gradient points one way; we move the opposite way, downhill toward the minimum]
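A minimal sketch of this procedure, assuming the curve being minimized is y = x² (gradient 2x); the starting point and learning rate are arbitrary:

```python
# Minimal gradient-descent sketch for y = x^2, whose gradient is dy/dx = 2x
x = 5.0        # arbitrary starting point on the curve
eta = 0.1      # learning rate
for step in range(100):
    grad = 2 * x          # gradient at the current point
    x = x - eta * grad    # move a small step opposite the gradient
print(x)  # very close to 0, the minimizer of y = x^2
```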
Estimating the parameters
1. Define our cost function 𝐽(𝑤)
2. Use gradient descent to find the values
of the weights that minimize the cost
– Calculate gradient of the cost function
– Iteratively update the weights using
gradient descent:
w_{t+1} = w_t − η · ∇J(w_t)
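A sketch of this update rule applied to simple linear regression with the SSE cost; the synthetic data and learning rate below are invented for the example:

```python
import numpy as np

# Illustrative sketch: w_{t+1} = w_t - eta * grad J(w_t) applied to simple linear
# regression with the SSE cost J(w) = sum_i (y_i - (w0 + w1*x_i))^2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(0, 0.5, size=50)  # synthetic data, true weights ~ (3, 2)

w = np.zeros(2)    # (w0, w1), initialized at zero
eta = 1e-4         # learning rate
for t in range(10_000):
    y_hat = w[0] + w[1] * x
    grad = np.array([
        -2.0 * np.sum(y - y_hat),        # dJ/dw0
        -2.0 * np.sum((y - y_hat) * x),  # dJ/dw1
    ])
    w = w - eta * grad                   # gradient-descent update
print(w)  # approaches roughly (3, 2)
```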
[Diagram: multiclass (softmax) logistic regression. Inputs x1 … xp have a weight vector w_k per class; for each class k, z_k = w_kᵀx, and softmax(z_k) gives P(y = k). Example outputs: Dog: 0.8, Cat: 0.05, Rabbit: 0.05, Bear: 0.1]
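A minimal sketch of the softmax step; the class scores below are invented, chosen only so the output roughly resembles the probabilities shown in the diagram:

```python
import numpy as np

def softmax(z):
    """Turns a vector of class scores z_k into probabilities P(y = k) that sum to 1."""
    z = z - np.max(z)        # shift by the max for numerical stability (result unchanged)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

# Hypothetical class scores z_k = w_k^T x for (dog, cat, rabbit, bear)
scores = np.array([2.0, -0.8, -0.8, -0.1])
probs = softmax(scores)
print(probs)        # largest probability for "dog"
print(probs.sum())  # 1.0
```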
Wrap-Up: Linear Models
• The mathematical intuition behind
linear models is the foundation of
neural networks