
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

Machine Learning
CSN-382 (Lecture 3)
Dr. R. Balasubramanian
Professor
Department of Computer Science and Engineering
Mehta Family School of Data Science and Artificial Intelligence
Indian Institute of Technology Roorkee
Roorkee 247 667
[email protected]
https://faculty.iitr.ac.in/cs/bala/
Limitations of ML

There are some limitations of ML, and scope for improvement as well.

● Related to data
○ Lack of suitable data & human bias in the data
○ Data privacy and ethical issues
○ Rapid changes in the data
● Related to models
○ Biased models
○ Poor performance in production
○ Regular training required
○ Black box models
● Related to infrastructure
○ Expensive infrastructure requirement

Limitations of ML

How reliable are the models we train?

[1] https://bgr.com/2018/06/22/uber-self-driving-car-crash-arizona-hulu-logs/
[2] https://www.mirror.co.uk/news/world-news/google-self-driving-car-hits-7529261
Regression

● Regression generally means “stepping back towards the average”.
● Regression analysis is also defined as the measure of the average relationship between two or more variables.
● The predicted variable is known as the dependent/target/response variable.
● The variable(s) used for prediction are known as independent/explanatory/regressor variables.

Terminologies Related to Regression Analysis

► Dependent Variable
► Independent Variable
► Outliers
► Multicollinearity
► Underfitting and Overfitting

Types of Regression

What is linear regression?

● Linear regression is a supervised learning algorithm used to identify a relationship between two or more variables.
● This relationship can be used to predict values for one variable when the value(s) of the other variable(s) are given.
● A simple linear regression model (also called bivariate regression) has one independent variable X that has a linear relationship with the dependent variable Y:
y = β0 + β1x + ε
Here β0 and β1 are the parameters of the linear regression model.
Regression: Use case

● Let us consider the impact of a single variable for now.

Age (independent variable) → Blood Pressure (target variable)

We assume that only Age decides what the Blood Pressure of a person should be.

Regression: Use case

Let us consider this data:
● The task is to predict the blood pressure when the age is 40.

Age in years (x)   Blood pressure (y)
25                 120
36                 135
68                 143
55                 139
49                 120
72                 165
40                 ?
Linear Regression line

y = β0 + β1x + ε

y = set of values taken by the dependent variable Y
x = set of values taken by the independent variable X
β0 = y-intercept
β1 = slope
ε = random error component

Linear Regression line

In the context of our example:

Blood Pressure = β0 + β1·Age + ε

y = set of values taken by the dependent variable, blood pressure
x = set of values taken by the independent variable, age
β0 = blood pressure value where the best fit line cuts the y-axis
β1 = beta coefficient for age
ε = random error component

What is the error term?

● The error term, also called the residual, represents the distance of the observed value from the value predicted by the regression line.
● In our example,
Error term = Actual blood pressure − Predicted blood pressure
for each observation

Calculating the error term
● The equation of the regression line is given by: y = β0 + β1x + ε

● The error term can be calculated as: ε = y − (β0 + β1x)

● We have an error term for every observation in the data:
εi = yactual − ypredicted

● Sum of squared errors = ∑ εi²
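As a minimal illustration of these definitions in Python (the β0 and β1 values below are hypothetical placeholders, not fitted estimates):

# Sketch: residuals and sum of squared errors for given parameter values.
ages = [25, 36, 68, 55, 49, 72]         # x: age (independent variable)
bp   = [120, 135, 143, 139, 120, 165]   # y: observed blood pressure

beta0, beta1 = 100.0, 0.7               # hypothetical values, for illustration only

predicted = [beta0 + beta1 * x for x in ages]
residuals = [y - y_hat for y, y_hat in zip(bp, predicted)]  # εi = y_actual − y_predicted
sse = sum(e ** 2 for e in residuals)                        # ∑ εi²
print(residuals, sse)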


Which line best fits our data?

● The regression line which best explains the trend in the data is the best fit line
● The line with the least error will be chosen as the best fitting line

Methods to get the best fit line

● Two methods can be used to find the best fit line:
○ Ordinary least squares (OLS)
○ Gradient descent

Linear Regression Model

Relationship between variables is a linear function:

Yi = β0 + β1Xi + εi

Yi = dependent (response) variable
Xi = independent (explanatory) variable
β0 = population y-intercept
β1 = population slope
εi = random error
OLS

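The OLS derivation on these slides did not survive extraction as text. As a sketch, the standard closed-form OLS estimates for simple linear regression are β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β0 = ȳ − β1x̄. A minimal Python illustration on the blood-pressure data from earlier:

# Sketch: closed-form OLS estimates for simple linear regression.
ages = [25, 36, 68, 55, 49, 72]
bp   = [120, 135, 143, 139, 120, 165]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(bp) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, bp))
s_xx = sum((x - x_bar) ** 2 for x in ages)

beta1 = s_xy / s_xx            # slope
beta0 = y_bar - beta1 * x_bar  # intercept

print(beta0, beta1)
print(beta0 + beta1 * 40)      # predicted blood pressure at age 40 (roughly 129)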
Measures of variation

Sum of squared error (SSE): Σ(yi − ŷi)²
Sum of squared total (SST): Σ(yi − ȳ)²
Sum of squared regression (SSR): Σ(ŷi − ȳ)²

yi = observed values of y
ŷi = predicted values of y
ȳ = mean value of observed values of y
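These three quantities are related by SST = SSR + SSE, and the ratio R² = SSR/SST summarizes how much of the total variation the regression explains. A minimal self-contained sketch on the blood-pressure data:

# Sketch: SSE, SST, SSR and R² for the OLS-fitted line.
ages = [25, 36, 68, 55, 49, 72]
bp   = [120, 135, 143, 139, 120, 165]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(bp) / n
beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, bp))
         / sum((x - x_bar) ** 2 for x in ages))
beta0 = y_bar - beta1 * x_bar

y_hat = [beta0 + beta1 * x for x in ages]
sse = sum((y - yh) ** 2 for y, yh in zip(bp, y_hat))   # Σ(yi − ŷi)²
sst = sum((y - y_bar) ** 2 for y in bp)                # Σ(yi − ȳ)²
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # Σ(ŷi − ȳ)²

r_squared = ssr / sst   # equivalently 1 − sse/sst, since SST = SSR + SSE
print(sse, sst, ssr, r_squared)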

Gradient descent

● Using the OLS method, we get the estimates of the parameters of the linear regression model by minimizing the sum of squared errors.
● Gradient descent is an optimization technique which finds the β parameters such that the error term is minimum.
● For data of higher dimension, obtaining the parameters with the OLS method is computationally expensive, whereas gradient descent is faster.

Gradient descent

● An error function, also known as a loss function, is used to calculate the cost associated with the deviation of the observed data from the predicted data.
● It is an iterative method which converges to the optimum solution.
● The estimates of the parameters are updated at every iteration.

[Figure: loss function plotted against the beta coefficient to estimate, showing an initial approximation, the gradient, and the cost minimum.]

Gradient descent
● Consider a ball rolling down a slope
● Any position on the slope corresponds to the loss (cost) for the current values of the coefficients
● The bottom of the slope is where the cost function is minimum
● The objective is to find the lowest point of the cost function by continuously trying different values of the parameters
● Repeating this process numerous times, we arrive at the parameters for which the cost is minimum

Gradient Descent Algorithm

► Gradient descent is an algorithm that finds the best-fit line for a given training dataset in a relatively small number of iterations.
► If we plot m and c against MSE, the cost surface acquires a bowl shape.

► For some combination of m and c, we will get the least error (MSE). That combination of m and c will give us our best fit line.
► The algorithm starts with some values of m and c (usually m = 0, c = 0). We calculate the MSE (cost) at the point m = 0, c = 0. Let's say the MSE (cost) at m = 0, c = 0 is 100.
► Then we adjust m and c by a small amount (the learning step) in the direction that reduces the cost. We will notice a decrease in the MSE (cost).
► We continue doing the same until our loss function is a very small value or ideally 0 (which means 0 error, or 100% accuracy).

Algorithm

1. Let m = 0 and c = 0. Let L be our learning rate; it could be a small value like 0.01 for good accuracy.
(The learning rate controls how far the parameters move at each step of gradient descent. Setting it too high makes the path unstable; setting it too low makes convergence slow. Setting it to zero means the model isn't learning anything from the gradients.)

2. (a) Calculate the partial derivative of the cost function with respect to m. Let this partial derivative be Dm (with a little change in m, how much the cost function changes).

Step 2(a)

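The derivative shown on this slide did not extract as text. As a sketch, assuming the cost function is the mean squared error E = (1/n) ∑ (yi − (m·xi + c))², the standard result is:

Dm = ∂E/∂m = −(2/n) ∑ xi (yi − (m·xi + c))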
Step 2(b)

2. (b) Similarly, let's find the partial derivative with respect to c. Let the partial derivative of the cost function with respect to c be Dc (with a little change in c, how much the cost function changes).
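(As with Dm, the slide's formula did not extract; for the same MSE cost the standard result is Dc = ∂E/∂c = −(2/n) ∑ (yi − (m·xi + c)).)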
3. Now update the current values of m and c using the following equations:
m = m − L·Dm
c = c − L·Dc
4. We repeat this process until our cost function is very small (ideally 0).
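Putting steps 1–4 together, here is a minimal runnable sketch in Python (the toy data, learning rate, and iteration count are illustrative assumptions, not values from the slides):

# Sketch: gradient descent for simple linear regression (y ≈ m·x + c).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # toy data lying exactly on y = 2x

m, c = 0.0, 0.0   # step 1: start at m = 0, c = 0
L = 0.01          # learning rate
n = len(xs)

for _ in range(10000):
    y_hat = [m * x + c for x in xs]
    # step 2: partial derivatives of the MSE cost
    d_m = (-2 / n) * sum(x * (y - yh) for x, y, yh in zip(xs, ys, y_hat))
    d_c = (-2 / n) * sum(y - yh for y, yh in zip(ys, y_hat))
    # step 3: update the parameters
    m -= L * d_m
    c -= L * d_c

# step 4: after enough iterations, m ≈ 2 and c ≈ 0 for this toy data
print(m, c)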

Problem

► Perform linear regression using the OLS and gradient descent methods.

x    y
2    3
4    7
6    5
8   10
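A worked sketch for the OLS part: with x̄ = 5 and ȳ = 6.25, the closed-form estimates give β1 = 19/20 = 0.95 and β0 = 6.25 − 0.95·5 = 1.5, so the fitted line is ŷ = 1.5 + 0.95x. Gradient descent with a small learning rate (as in the earlier sketch) should converge to approximately the same values.

# Sketch: solving the practice problem with closed-form OLS.
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]

n = len(xs)
x_bar = sum(xs) / n   # 5.0
y_bar = sum(ys) / n   # 6.25

beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))   # 19/20 = 0.95
beta0 = y_bar - beta1 * x_bar                   # 6.25 − 4.75 = 1.5

print(f"y = {beta0} + {beta1}x")                # y = 1.5 + 0.95x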

Thank You!
