Regression
Dr Vidhya Kamakshi
Assistant Professor
National Institute of Technology Calicut
[email protected]
[Figure: motivating example relating Earnings and Expenses]
• For a best fit, the least-squares error has to be low for most, if not all, data points.
• The objective/loss function is thus:
$$J(w) = \frac{1}{2N} \sum_{i=1}^{N} \big( f(x_i; w) - y_i \big)^2$$
($f(x_i; w)$ is to be read as $f(x_i)$ parameterized by $w$.)
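As a concrete illustration, here is a minimal NumPy sketch of this loss for a linear model $f(x; w) = w_0 + w_1 x$; the function names and sample data are illustrative, not from the slides.

```python
import numpy as np

def f(x, w):
    """Linear hypothesis f(x; w) = w0 + w1 * x."""
    w0, w1 = w
    return w0 + w1 * x

def loss(w, x, y):
    """J(w) = (1 / 2N) * sum_i (f(x_i; w) - y_i)^2."""
    residuals = f(x, w) - y
    return np.mean(residuals ** 2) / 2.0

# Example: evaluate the loss for a candidate parameter vector.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(loss(np.array([0.0, 2.0]), x, y))
```

Dividing by $2N$ keeps the loss comparable across dataset sizes, and the factor 2 cancels when the square is differentiated.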
[Figure: candidate lines for different parameter choices, e.g. $w_0 = 0, w_1 = 0.5$ and $w_0 = 0, w_1 = 2$, among others]
• Gradient-descent update:
$$w_1^{\text{new}} = w_1^{\text{old}} - \alpha \, \frac{1}{N} \sum_{i=1}^{N} x_i \left( f(x_i; w^{\text{old}}) - y_i \right)$$
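A minimal sketch of one such update step, assuming the same linear model as above. The slides give the $w_1$ update; the $w_0$ update shown here is the analogous one without the $x_i$ factor.

```python
import numpy as np

def gradient_step(w, x, y, alpha):
    """One gradient-descent step on J(w) for f(x; w) = w0 + w1 * x."""
    w0, w1 = w
    residuals = (w0 + w1 * x) - y                 # f(x_i; w_old) - y_i
    w0_new = w0 - alpha * np.mean(residuals)      # analogous update, no x_i factor
    w1_new = w1 - alpha * np.mean(x * residuals)  # (1/N) * sum_i x_i * residual_i
    return np.array([w0_new, w1_new])

# Example: repeated steps drive the loss down.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
w = np.array([0.0, 0.0])
for _ in range(100):
    w = gradient_step(w, x, y, alpha=0.05)
print(w)  # approaches roughly [0, 2] for this data
```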
• The factor $\frac{1}{N}$ is mostly ignored in analysis but helps practically, as it keeps the gradient scale independent of the dataset size $N$.
• Practical implementations use batch-mode processing (see the sketch below).
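A sketch of what batch-mode (mini-batch) processing might look like, reusing the hypothetical `gradient_step` above; the batch size and shuffling scheme are illustrative assumptions, not from the slides.

```python
import numpy as np

def minibatch_epoch(w, x, y, alpha, batch_size=2, rng=None):
    """One pass over the data: shuffle, then apply gradient_step
    (defined in the previous sketch) to each mini-batch rather
    than to the full dataset at once."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        w = gradient_step(w, x[idx], y[idx], alpha)
    return w
```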
• Dot product: $f : \mathbb{R}^n \to \mathbb{R}$, $f(p, q) = p^T q = p_1 q_1 + p_2 q_2 + \dots + p_n q_n$
• $\nabla_p f = \begin{bmatrix} \nabla_{p_1} f \\ \nabla_{p_2} f \\ \vdots \\ \nabla_{p_n} f \end{bmatrix} \in \mathbb{R}^n$

• For $f = p^T q$: $\nabla_p f = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix} \in \mathbb{R}^n$
• $\nabla_p \, p^T q = q$; similarly, $\nabla_q \, p^T q = p$
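These identities are easy to sanity-check numerically with a central-difference approximation; the sketch below is illustrative.

```python
import numpy as np

def numerical_gradient(f, p, eps=1e-6):
    """Central-difference estimate of the gradient of f at p."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (f(p + step) - f(p - step)) / (2 * eps)
    return grad

p = np.array([1.0, -2.0, 0.5])
q = np.array([3.0, 0.7, -1.2])
# Gradient of the dot product p^T q with respect to p should equal q.
print(numerical_gradient(lambda v: v @ q, p))  # ~ [ 3.   0.7 -1.2]
print(q)
```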
[Figure: an underfit model, plotting $y$ and the fitted $f(x)$ against $x$]
• The noise $\epsilon$ is assumed to be drawn from a Gaussian (normal) distribution.
  – Why? The $3\sigma$ rule: nearly all of the probability mass lies within three standard deviations of the mean.
$$P(\epsilon) = \frac{1}{(2\pi)^{1/2}\,\sigma} \, e^{-\frac{\epsilon^2}{2\sigma^2}}$$
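A small sketch evaluating this density and checking the $3\sigma$ rule numerically; the names are illustrative, and the $3\sigma$ mass is computed via the error function.

```python
import numpy as np
from math import erf, sqrt

def gaussian_pdf(eps, sigma):
    """P(eps) = exp(-eps^2 / (2 sigma^2)) / ((2 pi)^(1/2) * sigma)."""
    return np.exp(-eps**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# 3-sigma rule: about 99.7% of the mass lies within 3 sigma of the mean.
print(erf(3 / sqrt(2)))  # ~0.9973
```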
• Minimizing the noise is analogous to minimizing the objective function, the least-squares error.
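One way to make this analogy precise (a standard derivation sketched here, not taken verbatim from the slides): with $\epsilon_i = y_i - f(x_i; w)$ i.i.d. Gaussian, maximizing the likelihood of the observed noise is equivalent to minimizing the squared error, since taking the negative log turns the product into a sum and constants independent of $w$ drop out:

$$\max_{w} \prod_{i=1}^{N} P(\epsilon_i) = \max_{w} \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_i - f(x_i; w))^2}{2\sigma^2}\right) \iff \min_{w} \frac{1}{2N} \sum_{i=1}^{N} \big(y_i - f(x_i; w)\big)^2 = \min_{w} J(w)$$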