02-Linear Regression

The document provides an overview of linear regression, a supervised learning technique used to predict continuous outcomes by estimating relationships among variables. It discusses the basic elements, notations, loss functions, optimization methods, and the distinction between regression and classification. Key concepts include the linear model, the goal of minimizing loss through gradient descent, and the use of cross-entropy loss in classification problems.

CSD456: Deep Learning

Linear Regression

Regression

• Regression is a type of supervised learning used to predict continuous outcomes.
• It estimates the relationships among variables.
• Common in various fields like finance, biology, and economics.

Examples
• Predicting house prices based on features like size, location.
• Forecasting stock prices using historical data.
• Estimating a person's weight based on height and age.
Basic Elements of Linear Regression
● Assumption:
  ○ Linear relationship between the independent variables x and the dependent
    variable y
  ○ y can be expressed as a weighted sum of the elements in x, given noise on
    the observations
  ○ The noise is well behaved (follows a Gaussian distribution)
● We need:
  ○ A training dataset or training set
  ○ Rows are referred to as examples, data points, data instances, or samples
  ○ The dependent variable is called the label or target
  ○ The independent variables are called the features or covariates
Notations

● n denotes the number of examples
● Index the data examples: $\mathbf{x}^{(i)} = \left[x_1^{(i)}, x_2^{(i)}\right]^{\top}$
● Corresponding labels: $y^{(i)}$

● Example:
  ○ We would like to estimate the price of houses based on area and age
  ○ We want to develop a predictive model for predicting house prices

    area   age   price
    ----   ---   -----
      15    10   15000
      25    15   25000
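For concreteness, the toy table above can be stored as a feature matrix and a label vector; a minimal sketch using NumPy (the slides themselves show no code):

```python
import numpy as np

# Feature matrix X: one row per example, one column per feature (area, age)
X = np.array([[15.0, 10.0],
              [25.0, 15.0]])

# Label vector y: the target (price) of each example
y = np.array([15000.0, 25000.0])

print(X.shape)  # (2, 2): n = 2 examples, d = 2 features
print(y.shape)  # (2,)
```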
Linear Model

Based on the linearity assumption, we say that the target is a weighted sum of
the features plus a translation (bias):

● Weights: the influence of the features on the predictions
● Bias: the value the prediction takes when the features are all 0
Linear Model

● Goal: choose weights and a bias such that, on average, the predictions of the
  model fit the true prices observed in the data
● Linear models rely on the affine transformation specified by the chosen
  weights and bias
● So, generally speaking, we have

  $\hat{y} = w_1 x_1 + \dots + w_d x_d + b$

● Compact form using vectors:

  $\hat{y} = \mathbf{w}^{\top} \mathbf{x} + b$

Linear Regression

Linear Regression (1D input): $\hat{y}^{(i)} = w\, x^{(i)} + b$

For more than 1D input: $\hat{y}^{(i)} = \mathbf{w}^{\top} \mathbf{x}^{(i)} + b$, with $\mathbf{x}^{(i)} \in \mathbb{R}^{d}$, $b \in \mathbb{R}$

• Linear Regression is one of the simplest and most widely used regression
  techniques.
• It models the relationship between input features (independent variables) and
  a continuous output (dependent variable) using a linear function.
• The objective is to find the line (or hyperplane in higher dimensions) that
  best fits the data.

[Figure: a fitted line $y = wx + b$ in the x-y plane; it crosses the x-axis at $-b/w_1$]
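To illustrate the affine prediction $\hat{y}^{(i)} = \mathbf{w}^{\top}\mathbf{x}^{(i)} + b$ above, a minimal NumPy sketch; the weights and bias below are hypothetical values for the two-feature house example, not fitted parameters:

```python
import numpy as np

def predict(X, w, b):
    """Affine prediction y_hat = X @ w + b for a batch of examples.

    X: (n, d) feature matrix, w: (d,) weight vector, b: scalar bias.
    """
    return X @ w + b

# Hypothetical (not fitted) parameters for the two-feature house example
w = np.array([900.0, 150.0])
b = 0.0
X = np.array([[15.0, 10.0], [25.0, 15.0]])
print(predict(X, w, b))  # [15000. 24750.] -- one prediction per row of X
```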
Loss Function

● A function that quantifies the difference between the real and predicted
  values of the target
● The smaller the value of the loss, the better
● A popular loss function: squared error
● The empirical error is a function of the parameters:

  $l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$, where $\hat{y}^{(i)} = \mathbf{w}^{\top} \mathbf{x}^{(i)} + b$

  ○ $\hat{y}^{(i)}$ is the estimation and $y^{(i)}$ the observation
  ○ The factor $\frac{1}{2}$ cancels when we take the derivative of the loss
Loss Function
The total loss averages the per-example losses over the n training examples
(n being the number of examples):

$L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left(\mathbf{w}^{\top} \mathbf{x}^{(i)} + b - y^{(i)}\right)^2$

● When training the model, we seek parameters that minimize the total loss
  across all training examples
● Note: the superscript $(i)$ indicates an operation applied to a single example
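A minimal NumPy sketch of the per-example squared loss $l^{(i)}$ and the averaged total loss $L(\mathbf{w}, b)$ above, evaluated on the toy house data with the same hypothetical parameters as before:

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-example squared error l = (1/2) * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

def total_loss(X, y, w, b):
    """Average loss L(w, b) over all n examples."""
    y_hat = X @ w + b
    return squared_loss(y_hat, y).mean()

# Toy check: house data with the hypothetical parameters from before
X = np.array([[15.0, 10.0], [25.0, 15.0]])
y = np.array([15000.0, 25000.0])
w, b = np.array([900.0, 150.0]), 0.0
print(total_loss(X, y, w, b))  # 15625.0 -- nonzero: the line is not a perfect fit
```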
Optimization

● We seek to iteratively reduce the error of the model and improve its quality by
  ○ updating the parameters in the direction that incrementally lowers the
    loss function
● We use the gradient descent algorithm to achieve this
● We could directly take the derivative of the average loss on the entire dataset
  ○ This requires a pass over the entire dataset before making a single update
● A better solution is minibatch stochastic gradient descent:
  ○ Sample a random minibatch of examples, and
  ○ Take the derivative of the average loss on the minibatch with respect to
    the model parameters, then compute the update
Minibatch Stochastic Gradient Descent

$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)$

● $\frac{1}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)$: the partial derivative of the
  average loss of the minibatch
● $\eta$: the term multiplied with the gradient (the learning rate)
● $|\mathcal{B}|$: the minibatch size
● Subtract the result from the current parameters
Optimization Algorithm

● Initialize the model parameters (typically at random)
● Sample random minibatches
● Update the parameters in the direction of the negative gradient:

  $(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)$
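Putting the pieces together, a minimal NumPy sketch of the full training loop; the synthetic data, learning rate, batch size, and epoch count are illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ true_w + true_b + Gaussian noise
n, d = 1000, 2
true_w, true_b = np.array([2.0, -3.4]), 4.2
X = rng.normal(size=(n, d))
y = X @ true_w + true_b + rng.normal(scale=0.01, size=n)

# Initialize parameters (typically at random)
w, b = rng.normal(scale=0.01, size=d), 0.0
lr, batch_size, epochs = 0.03, 10, 3

for epoch in range(epochs):
    # Sample random minibatches by shuffling the indices each epoch
    for idx in np.array_split(rng.permutation(n), n // batch_size):
        Xb, yb = X[idx], y[idx]
        err = Xb @ w + b - yb            # prediction error on the minibatch
        # Gradients of the average squared loss (1/2)(y_hat - y)^2
        grad_w = Xb.T @ err / len(idx)
        grad_b = err.mean()
        # Subtract the scaled gradient from the current parameters
        w -= lr * grad_w
        b -= lr * grad_b

print(w, true_w)  # learned weights should approach the true weights
print(b, true_b)
```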
Prediction using Linear Regression Model

● We adjust hyperparameters (e.g., the learning rate), assessed on a validation set
● We aim to find parameters that achieve low loss on unseen data
  ○ Also referred to as generalization (discussed later on)
● Given those learned parameters, it is now possible to estimate targets given
  the features of a new instance
● We use the squared loss to quantify the goodness/badness of the model
  ○ Using maximum likelihood estimation principles, we arrive at the negative
    log-likelihood
Normal Distribution and Squared loss

Linear regression with the squared loss can be motivated by assuming that the
observations arise from noisy measurements with Gaussian noise:

$y = \mathbf{w}^{\top} \mathbf{x} + b + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$

Hence, the likelihood of observing $y$ for a given $\mathbf{x}$ is

$P(y \mid \mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\left(y - \mathbf{w}^{\top} \mathbf{x} - b\right)^2\right)$
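Following from the Gaussian likelihood above, a short sketch of the standard derivation connecting maximum likelihood to the squared loss:

```latex
% Assuming independent observations, maximizing the likelihood is equivalent
% to minimizing the negative log-likelihood over the dataset:
-\log P(\mathbf{y} \mid \mathbf{X})
  = \sum_{i=1}^{n} \left( \frac{1}{2}\log\left(2\pi\sigma^{2}\right)
  + \frac{1}{2\sigma^{2}} \left( y^{(i)} - \mathbf{w}^{\top}\mathbf{x}^{(i)} - b \right)^{2} \right)
% The first term and the factor 1/sigma^2 do not depend on (w, b), so
% minimizing the negative log-likelihood is the same as minimizing the sum of
% squared errors, i.e., training with the squared loss.
```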
Linear regression as single-layer neural network

[Figure: linear regression depicted as a single-layer neural network. Source: d2l.ai]


Classification
● Classification aims to predict a category from a set of categories (e.g., cats vs.
  dogs, positive vs. negative)
● Hard assignment of examples to categories (classes)
● Soft assignments: assess probabilities (discussed later on)
● We want a model that estimates the conditional probabilities of all possible
  classes
  ○ A model with multiple outputs (one per class)
  ○ Goal: optimize our parameters to produce probabilities that maximize the
    likelihood of the observed data
  ○ One main approach is softmax regression (sketched below)
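To make the multi-output idea concrete, a minimal NumPy sketch of the softmax function used by softmax regression to turn one raw score per class into probabilities; the scores are made-up values:

```python
import numpy as np

def softmax(logits):
    """Map raw per-class scores to probabilities that sum to 1.

    Subtracting the max first is a standard numerical-stability trick;
    it does not change the result.
    """
    z = logits - logits.max()
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 0.5, -1.0])  # hypothetical model outputs, one per class
probs = softmax(scores)
print(probs, probs.sum())  # approx. [0.79 0.18 0.04], summing to 1.0
```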
Cross-entropy loss

● Used to measure the quality of predicted probabilities
● A common loss function used in classification problems
● Computes the expected value of the loss for a distribution over labels
● Concretely, the cross-entropy objective
  ○ Maximizes the likelihood of the observed data
  ○ Measures the difference between two probability distributions
  ○ Minimizes the surprisal required to communicate the labels (refer to
    information theory); see the sketch below
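As an illustration, a minimal NumPy sketch of the cross-entropy loss for a single example with a hard (one-hot) label; the predicted probabilities reuse the hypothetical softmax output above:

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy loss -log p(true class) for one example.

    probs: predicted class probabilities, label: index of the true class.
    """
    return -np.log(probs[label])

probs = np.array([0.7856, 0.1753, 0.0391])  # hypothetical predictions
print(cross_entropy(probs, 0))  # small loss: the true class got high probability
print(cross_entropy(probs, 2))  # large loss: the true class got low probability
```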
