Class 5 - LinearRegression
Supervised Learning
Types of Machine learning
3.1 Regression:
Linear Regression
Simple linear regression
● Linear Regression with one variable
● The target variable y and the input
variable x1 have the following linear
relationship:
ŷ = f(w,x) = w0 + w1x1
where w0 is the intercept and w1 is the slope.
Simple linear regression
● How do we estimate the coefficients (“fit the model”)?
● How do we evaluate model fit from observed data?
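One common answer to the first question is ordinary least squares, which has a closed-form solution when there is a single input variable. The sketch below assumes that approach; the function name `fit_simple_linear_regression` and the sample data are hypothetical and are not taken from the slides.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (w0, w1) minimizing the sum of squared errors for y ~ w0 + w1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Hypothetical data (not from the slides): points lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w0, w1 = fit_simple_linear_regression(xs, ys)
print(w0, w1)  # prints 1.0 2.0
```

Because the sample points lie exactly on a line, the fit recovers the intercept and slope exactly; with real data the line would only approximate the observations.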
Prediction error
● A good model keeps the difference between
the true value y and the predicted value ŷ
as small as possible
⇒ The sum of squared errors (SSE) over all m
observations is calculated by the formula:
SSE = Σ_i (y_i − ŷ_i)^2, i = 1, …, m
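As a minimal sketch, the sum of squared errors can be computed directly from paired true and predicted values. The helper name `sum_squared_error` and the numbers are illustrative assumptions, not part of the slides.

```python
def sum_squared_error(y_true, y_pred):
    """Sum of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))


# Hypothetical values: the errors are 1, 0 and -2, so the squares sum to 5.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
sse = sum_squared_error(y_true, y_pred)
print(sse)  # prints 5.0
```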
Linear Regression with one variable
Example:
● Given a data set with the population of 15 cities and the profit earned by a restaurant
opened in each city, with the data distributed as follows:
Population Profit Population Profit
3.5 154
● Predict the profit of a restaurant, given the population of the city in which it is
located.
Multiple linear regression
● Multiple linear regression (linear regression with
multiple variables): a model with more than one input variable is
used to predict the target variable (output):
ŷ = f(w,x) = w0 + w1x1 + w2x2 + … + wnxn
● Loss function:
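The formula itself is missing here. A common choice, assumed here rather than confirmed by the slides, is the mean squared error over m training examples, with a factor of 1/2 that simplifies the derivative. The superscript (i) indexes training examples and is notation introduced for this sketch.

```latex
L(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2,
\qquad
\hat{y}^{(i)} = w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} + \dots + w_n x_n^{(i)}
```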
Multiple linear regression
● The derivative of the loss function:
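The expression is missing here. Assuming the loss is the mean squared error L(w) = (1/(2m)) Σ (ŷ − y)^2 over m training examples (an assumption, since the slide's own definition is not available), differentiating with respect to each weight gives the sketch below. X denotes the m × (n+1) matrix of inputs with a leading column of ones, a notation introduced here for illustration.

```latex
\frac{\partial L}{\partial w_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)},
\qquad j = 0, 1, \dots, n, \quad x_0^{(i)} = 1

% Setting every partial derivative to zero gives the normal equation,
% valid when X^T X is invertible:
X^{\top} (X w - y) = 0
\quad\Longrightarrow\quad
w = \left( X^{\top} X \right)^{-1} X^{\top} y
```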
Multiple linear regression
● Example:
Area (m2)   Number of bedrooms   Number of floors   Sale price ($1000)
852         2                    1                  178
1600        3                    2                  329
1985        5                    1                  420
1535        4                    2                  330
1050        2                    1                  195
2300        4                    2                  450
1200        3                    2                  250
Data normalization
● A house has an area of 2100 m2, number of bedrooms: 5,
number of floors: 1. House price forecast:
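One way to obtain such a forecast is to fit the multiple linear regression model to the seven example houses by least squares and then evaluate it for the new house. The sketch below assumes NumPy's `numpy.linalg.lstsq` as the solver and interprets 2100 as the house's area; both choices are assumptions and may differ from the method used in class.

```python
import numpy as np

# Training data copied from the example table:
# area, number of bedrooms, number of floors -> sale price ($1000).
X = np.array([
    [852, 2, 1],
    [1600, 3, 2],
    [1985, 5, 1],
    [1535, 4, 2],
    [1050, 2, 1],
    [2300, 4, 2],
    [1200, 3, 2],
], dtype=float)
y = np.array([178, 329, 420, 330, 195, 450, 250], dtype=float)

# Prepend a column of ones so that w[0] plays the role of the intercept w0.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ w ~ y.
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# New house: 2100 is ASSUMED to be the area; 5 bedrooms, 1 floor.
x_new = np.array([1.0, 2100.0, 5.0, 1.0])
predicted_price = float(x_new @ w)

print("weights (w0, w1, w2, w3):", w)
print("predicted price ($1000):", predicted_price)
```

With only seven training examples and four weights, the fitted coefficients are illustrative rather than reliable estimates.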
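Data normalization
Normalize the data to the same range
● Min-max scaling: x' = (x − min(x)) / (max(x) − min(x))
● Standard scaling: x' = (x − μ) / σ, where μ is the mean and σ is the standard deviation of x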
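The two scalers named above can be sketched in plain Python. The function names `min_max_scale` and `standard_scale` are hypothetical, and the population standard deviation is assumed for standard scaling; the input values are the areas from the house price example.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly so that the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Shift to zero mean and rescale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; the standard deviation is zero")
    return [(v - mu) / sigma for v in values]


# Areas taken from the house price example table.
areas = [852, 1600, 1985, 1535, 1050, 2300, 1200]
scaled = min_max_scale(areas)
standardized = standard_scale(areas)
print(scaled)
print(standardized)
```

After scaling, features such as area and number of bedrooms share a comparable range, which keeps a single large-valued feature from dominating the fit.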
Advantages of linear regression
● Simple model, easy to understand
● Can predict continuous variables
● The optimal solution is simple to find
● Easy to interpret the model through its regression
coefficients
Limitations of linear regression
● The model is simple, so it is not flexible enough to represent
complex relationships in the data.
● Very sensitive to outliers (noise)
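The sensitivity to outliers can be illustrated with a small experiment: fit a line to clean data, then change a single value and fit again. The data and the helper `fit_line` are hypothetical and serve only as a sketch of the effect.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    return mean_y - w1 * mean_x, w1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean_ys = [0.0, 2.0, 4.0, 6.0, 8.0]    # exactly y = 2x
noisy_ys = [0.0, 2.0, 4.0, 6.0, 100.0]  # same data with one outlier

_, clean_slope = fit_line(xs, clean_ys)
_, noisy_slope = fit_line(xs, noisy_ys)
print(clean_slope)  # slope of the clean fit: 2.0
print(noisy_slope)  # slope with the outlier: about 20.4
```

A single corrupted point changes the slope roughly tenfold because squaring the errors gives large deviations a disproportionate weight.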
Practical exercises