02-Linear Regression
Deep Learning
Linear Regression
Regression Examples
• Predicting house prices based on features such as size and location.
• Forecasting stock prices using historical data.
• Estimating a person's weight based on height and age.
Basic Elements of Linear Regression
● Assumptions:
○ There is a linear relationship between the independent variables x and the dependent variable y
○ y can be expressed as a weighted sum of the elements of x, given some noise on the observations
○ The noise is well behaved (it follows a Gaussian distribution)
● We need:
○ A training dataset or training set
○ Rows are referred to as examples, data points, data instances, or samples
○ The dependent variable is called the label or target
○ The independent variables are called the features or covariates
Notations
● Example:
○ We would like to estimate the price of a house (the target) based on its area and age (the features)
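As a minimal sketch of this terminology in NumPy (all feature values and prices below are made up for illustration):

```python
import numpy as np

# Each row of X is one example (a house); the columns are the
# features area (m^2) and age (years). y holds the labels (prices).
X = np.array([[120.0, 10.0],
              [ 85.0, 25.0],
              [200.0,  3.0]])
y = np.array([300_000.0, 180_000.0, 550_000.0])

n_examples, n_features = X.shape  # 3 examples, 2 features
```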
● Based on the linearity assumption, we say that the target is a weighted sum of the features plus a translation (the bias)
● Goal: choose the weights and bias such that, on average, the predictions of the model fit the true prices observed in the data
● Linear models rely on the affine transformation specified by the chosen weights and bias
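Concretely, for the house example with the two features area and age, this affine transformation reads:

$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$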
● So, generally speaking, with $d$ features we have

$$\hat{y} = w_1 x_1 + \dots + w_d x_d + b = \mathbf{w}^\top \mathbf{x} + b$$
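A minimal NumPy sketch of this prediction (the function name linreg is illustrative):

```python
import numpy as np

def linreg(X, w, b):
    """Affine prediction y_hat = X w + b for a whole batch.

    X: (n, d) feature matrix, w: (d,) weight vector, b: scalar bias.
    Returns the (n,) vector of predictions.
    """
    return X @ w + b
```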
● A loss function quantifies the difference between the real and the predicted value of the target
● The smaller the value of the loss, the better
● A popular loss function is the squared error
● The empirical error is a function of the parameters:

$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2, \qquad \text{where } \hat{y}^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)} + b$$

○ $\hat{y}^{(i)}$ is the estimation and $y^{(i)}$ is the observation
○ The constant $\tfrac{1}{2}$ cancels when we take the derivative of the loss
Loss Function
$$L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2$$
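This loss translates directly into NumPy (a sketch; squared_loss is an illustrative name):

```python
import numpy as np

def squared_loss(y_hat, y):
    """Average squared error: (1/n) * sum_i (1/2) * (y_hat_i - y_i)^2."""
    return 0.5 * np.mean((y_hat - y) ** 2)

# e.g. squared_loss(linreg(X, w, b), y) evaluates L(w, b) on the dataset.
```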
● We seek to iteratively reduce the error of the model and improve its quality:
○ Update the parameters in the direction that incrementally lowers the loss function
● We use the gradient descent algorithm to achieve this
● We could directly take the derivative of the average loss on the entire dataset
○ This requires a pass over the entire dataset before making a single update
● A better solution is minibatch stochastic gradient descent (see the update rule and sketch below):
○ Sample a random minibatch of examples, and
○ Take the derivative of the average loss on the minibatch with respect to the model parameters, then compute the update
Minibatch Stochastic Gradient Descent
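With a minibatch $\mathcal{B}$ and learning rate $\eta$, each step updates the parameters against the gradient of the minibatch loss:

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)$$

A minimal NumPy sketch of the resulting training loop (the synthetic data, batch size, and learning rate are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ true_w + true_b + Gaussian noise.
true_w, true_b = np.array([2.0, -3.4]), 4.2
X = rng.normal(size=(1000, 2))
y = X @ true_w + true_b + rng.normal(scale=0.01, size=1000)

w, b = np.zeros(2), 0.0                  # parameters to learn
lr, batch_size, num_epochs = 0.03, 10, 3

for epoch in range(num_epochs):
    # Shuffle once per epoch, then walk through the data in minibatches.
    for idx in np.array_split(rng.permutation(len(X)), len(X) // batch_size):
        err = X[idx] @ w + b - y[idx]    # prediction error on the minibatch
        # Gradients of the average (1/2)*(error)^2 loss over the minibatch.
        w -= lr * (X[idx].T @ err) / len(idx)
        b -= lr * err.mean()
    loss = 0.5 * np.mean((X @ w + b - y) ** 2)
    print(f"epoch {epoch + 1}, loss {loss:.6f}")
```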
Linear regression with the squared loss can be motivated by assuming that the observations arise from noisy measurements, where the noise follows a Gaussian distribution. Hence, minimizing the squared error is equivalent to performing maximum likelihood estimation under this noise model.
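To fill in the step: assume $y = \mathbf{w}^\top \mathbf{x} + b + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Then the likelihood of an observation is

$$P(y \mid \mathbf{x}) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\left( -\frac{(y - \mathbf{w}^\top \mathbf{x} - b)^2}{2 \sigma^2} \right)$$

and the negative log-likelihood

$$-\log P(y \mid \mathbf{x}) = \frac{1}{2} \log(2 \pi \sigma^2) + \frac{1}{2 \sigma^2} \left( y - \mathbf{w}^\top \mathbf{x} - b \right)^2$$

depends on $\mathbf{w}$ and $b$ only through the squared-error term, so maximizing the likelihood is the same as minimizing the squared loss.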
Linear Regression as a Single-Layer Neural Network
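As a sketch of this view in PyTorch (the layer sizes are illustrative), the model is a single fully connected layer with no activation function:

```python
import torch

# Linear regression = one affine layer: d inputs, 1 output, no
# activation. The layer's weight and bias play the roles of w and b.
net = torch.nn.Linear(in_features=2, out_features=1)

x = torch.randn(5, 2)   # a batch of 5 examples with 2 features
y_hat = net(x)          # shape (5, 1): w^T x + b for each example
```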