Linear Regression
Linear Regression
Learning
-
Classification
Linear Regression: Prediction Model
X (years of Y (salary, Rs
Given one variable X experience) 1,000)
Goal: Predict value of Y 3 30
Example: 8 57
◦ Given Years of Experience 9 64
◦ Predict Salary 13 72
Questions: 3 36
◦ When X=10, what is Y? 6 43
◦ When X=25, what is Y? 11 59
◦ This is known as regression
21 90
1 20
16 83
LINEAR REGRESSION
• Plot a graph between the cost and area of the house
Area Cost (Lakh) • The area of the house is represented in the X-axis while
(sq.feet) X Y cost is represented in Y-axis
1000 30 • What will Regression Do?
1200 40 • Fit the line through these points
1300 50
1450 70
1495 70
1600 80
Cost
Area
LINEAR REGRESSION
We are given a dataset of the form where, is a -dimensional feature vector (real), and a real
value
We want to learn a function which given a feature vector predicts a value that is as close as
possible to the value or itself
Minimize error or sum of square of the difference between actual value y i or predicted value :
Linear Regression Example
Linear Regression: Y=3.5*X+23.2
Y =
120
100
80
Salary
60
40
20
0
0 5 10 15 20 25
Years
Linear Regression - Visualization
The Simple Regression Model
Fit as good as possible a regression line through the data points:
Slope parameter
Intercept
Y =
Dependent variable,
LHS variable,
explained variable, Independent variable,
response variable,… RHS variable,
explanatory variable,
Control variable,…
The Simple Regression Model
Example: Soybean yield and fertilizer
Rainfall,
land quality,
presence of parasites, …
Measures the effect of fertilizer on
yield, holding all other factors fixed
Training Set
Learning Algorithm
h(x) = 1.5 + 0x
Hypothesis
Consider 0 = 0 and 1 = 1 Consider 0 = 0 and 1 = 0.5
h(x) = 0 + 1x h(x) = 0 + 0.5x
Linear Regression Example
Linear Regression: Y=3.5*X+23.2
120
100
80
Salary
60
40
20
0
0 5 10 15 20 25
Years
Basic Idea
◦ To be learned:
Example
Salary Dataset
X Y
Years of Salary in
Experience 1000s
3 30
8 57
9 64
13 72
3 36
6 43
11 59
21 90
1 20
16 83
Multivariate models
simple regression model
(Yrs_Experience) x y (Income)
(Work_score) x2
y (Income)
(Yrs_Experience) x3
(Age) x4
More than one prediction attribute
Equation: Y 1 x1 2 x2
Outliers
In the regression problem sometimes our features may have very different scales:
◦ For example: predict the GDP of a country using the features - number_of_properties and the income
◦ The weights in this case will not be interpretable
Solution: Normalize the features by replacing the values with the z-scores