Linear regression for machine learning

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables, providing insights for prediction and data analysis. It can be categorized into simple linear regression, which involves one independent variable, and multiple linear regression, which involves multiple independent variables. Key assumptions for linear regression include linearity, independence, homoscedasticity, and normality, ensuring the model's accuracy and reliability.
• Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
• Linear regression provides valuable insights for prediction and data analysis.

Understanding linear regression


• Linear regression is a type of supervised machine learning algorithm: it learns from labelled datasets and maps the data points to the most optimized linear function, which can then be used for prediction on new datasets.
• Linear regression computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to the observed data.
• It predicts a continuous output variable from the input variables.

Why is linear regression important?


• The interpretability of linear regression is one of its greatest strengths.
• The model's equation offers clear coefficients that illustrate the influence of each independent variable on the dependent variable.
• Linear regression is transparent, easy to implement, and serves as a foundational concept for more advanced algorithms.
What is the best-fit line?
• The best-fit line equation gives the straight line that represents the relationship between the independent and dependent variables.
• The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable.

Here:
• Y = dependent or target variable
• X = independent variable or predictor of Y
(A linear function is the simplest type of function.)
Here X may be a single feature or multiple features representing the problem.
(Linear regression performs the task of predicting the dependent variable (Y) based on the given independent variable (X).)
In the figure above, X is the input (work experience) and Y is the output (salary).
This regression line is the best-fit line for the model.

In regression, some assumptions are made to ensure the reliability of the model's results.
Assumptions in the linear regression model:
➢ Linearity
o It is assumed that there is a linear relationship between the independent and dependent variables.
o Changes in the independent variable lead to proportional changes in the dependent variable.
➢ Independence
o The observations should be independent of each other; that is, the error in one observation should not influence the others.

We have:
X = experience (independent variable)
Y = salary (dependent variable)
Let's assume there is a linear relationship between X and Y. Then salary can be predicted using

Y = ϴ1 + ϴ2X

or

Yi = ϴ1 + ϴ2Xi    (i = 1, 2, 3, …, n)

Y = labelled data
X = input training data
The model gets the best regression fit line by finding the best ϴ1 and ϴ2 values:
ϴ1 = intercept
ϴ2 = coefficient of X
Once we find the best ϴ1 and ϴ2 values, we get the best-fit line.
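The closed-form least-squares estimates of ϴ1 and ϴ2 can be sketched in a few lines of Python; the experience/salary numbers below are made up purely for illustration:

```python
# Minimal sketch: ordinary least squares for the best-fit line.
# theta2 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
# theta1 = y_mean - theta2 * x_mean
xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # experience in years (toy data)
ys = [30.0, 35.0, 40.0, 45.0, 50.0]     # salary in thousands, here y = 25 + 5x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
den = sum((x - mean_x) ** 2 for x in xs)
theta2 = num / den                      # slope
theta1 = mean_y - theta2 * mean_x       # intercept

print(theta1, theta2)  # 25.0 and 5.0 for this exactly linear toy data
```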

How to update the ϴ1 and ϴ2 values to get the best-fit line?

To get the best-fit regression line, the model aims to predict the target value such that the error between the predicted value and the actual value is minimal.
It is therefore important to update the ϴ1 and ϴ2 values iteratively, reaching the values that minimize the error between the predicted and true values.
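One common way to perform these updates is gradient descent on the mean squared error. A minimal sketch, with a hypothetical learning rate and toy data:

```python
# Sketch: iteratively update theta1 and theta2 along the negative gradient
# of the mean squared error until the line fits the data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [30.0, 35.0, 40.0, 45.0, 50.0]   # exactly y = 25 + 5x (toy data)

theta1, theta2 = 0.0, 0.0             # start from zero
lr = 0.02                             # learning rate (hypothetical choice)
n = len(xs)

for _ in range(20000):
    errors = [(theta1 + theta2 * x) - y for x, y in zip(xs, ys)]
    g1 = 2 * sum(errors) / n                              # d(MSE)/d(theta1)
    g2 = 2 * sum(e * x for e, x in zip(errors, xs)) / n   # d(MSE)/d(theta2)
    theta1 -= lr * g1
    theta2 -= lr * g2

print(round(theta1, 3), round(theta2, 3))  # approaches 25 and 5
```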

Types of linear regression

1 : Simple linear regression
(univariate linear regression)

• Simple linear regression is the simplest form of linear regression; it involves only one independent variable and one dependent variable.
The equation for it:
Y = β0 + β1X
Where:
Y = dependent variable
X = independent variable
β0 = intercept
β1 = slope of the line

Assumptions of simple linear regression:

→ Linear regression is a powerful tool for understanding and predicting the behaviour of a variable; however, it needs to meet some conditions in order to be an accurate and dependable solution.
1. Linearity:
• The independent and dependent variables have a linear relationship with one another.
• Changes in the dependent variable follow changes in the independent variable as a linear function.
• This means there must be a straight line that can be drawn through the data points.
• If the relationship is not linear, then linear regression is not an accurate model.
• There are two types of relation:
a. Linear relation
b. Non-linear relation
2. Independence:
• The observations in the dataset are independent of each other.
• The value of the dependent variable for one observation does not depend on the value of the dependent variable for another observation.
• If the observations are not independent, then linear regression is not an accurate model.

3. Homoscedasticity:
• Across all levels of the independent variable, the variance of the errors is constant.
• This indicates that the value of the independent variable has no impact on the variance of the errors.
• If the variance of the residuals is not constant, then linear regression will not be an accurate model.
• There are two cases: homoscedasticity (constant variance) and heteroscedasticity (non-constant variance).
4. Normality:
• The residuals should be normally distributed.
• If the residuals are not normally distributed, then linear regression is not an accurate model.
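These residual assumptions can be probed with simple checks. A rough sketch on toy data (the numbers are invented, and the half-split variance comparison is only a crude stand-in for a formal heteroscedasticity test):

```python
# Sketch: fit a line, then inspect the residuals for the assumption checks.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [31.0, 34.0, 41.0, 44.0, 51.0, 54.0]   # roughly y = 25 + 5x with noise

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# With an intercept, least-squares residuals always sum to zero.
print(round(sum(residuals), 10))

# Crude homoscedasticity check: residual spread on low-x vs high-x halves.
half = n // 2
low_var = sum(r ** 2 for r in residuals[:half]) / half
high_var = sum(r ** 2 for r in residuals[half:]) / (n - half)
print(low_var, high_var)  # similar magnitudes suggest constant variance
```

In practice one would plot the residuals against the fitted values and use a normality plot or test, but the idea is the same: the model's errors should look like structureless, constant-spread noise.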

Multiple linear regression
(multivariate regression)

Multiple linear regression involves more than one independent variable and one dependent variable,

such that:
Y = β0 + β1X1 + β2X2 + … + βnXn
Where:
Y = dependent variable
X1, X2, …, Xn = independent variables
β0 = intercept
β1, β2, …, βn = slopes
• The goal of this algorithm is to find the best line that can predict the value of the dependent variable based on the independent variables.
• In this regression, a set of records is present with X and Y values, and these values are used to learn a function.
• If we want to predict the value of Y for an unseen X, this learned function can be used.
• In regression we have to find the value of Y, so a function is required that predicts a continuous Y given X as the independent feature.
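As a sketch, the multiple-regression coefficients can be recovered by gradient descent on toy data built from a known relationship (all numbers here are hypothetical):

```python
# Sketch: multiple linear regression with two features, fit by gradient
# descent on mean squared error. Toy data generated from y = 1 + 2*x1 + 3*x2.
rows = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (3.0, 2.0), (2.0, 3.0), (4.0, 1.0)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

b0, b1, b2 = 0.0, 0.0, 0.0
lr = 0.02                  # learning rate (hypothetical choice)
n = len(rows)

for _ in range(50000):
    errs = [(b0 + b1 * x1 + b2 * x2) - y for (x1, x2), y in zip(rows, ys)]
    b0 -= lr * 2 * sum(errs) / n
    b1 -= lr * 2 * sum(e * x1 for e, (x1, _) in zip(errs, rows)) / n
    b2 -= lr * 2 * sum(e * x2 for e, (_, x2) in zip(errs, rows)) / n

print(round(b0, 2), round(b1, 2), round(b2, 2))  # approaches 1, 2, 3
```

In practice the coefficients are usually obtained with the closed-form normal equations or a library fit rather than a hand-rolled loop; the loop just makes the "learn a function from records of X and Y" idea concrete.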

Assumptions for multiple regression:

1. No multicollinearity
• Multicollinearity occurs when two or more independent variables are highly correlated with each other, which can make it difficult to determine the individual effect of each variable on the dependent variable.
• If there is multicollinearity, then multiple linear regression will not be an accurate model.
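A simple way to spot multicollinearity between a pair of features is the Pearson correlation coefficient; a sketch on hypothetical data:

```python
# Sketch: Pearson correlation between two features. A value near +1 or -1
# signals the features are nearly redundant (multicollinear).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly 2 * x1, so highly correlated

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
sd1 = (sum((a - m1) ** 2 for a in x1) / n) ** 0.5
sd2 = (sum((b - m2) ** 2 for b in x2) / n) ** 0.5

corr = cov / (sd1 * sd2)
print(round(corr, 3))  # close to 1.0: consider dropping one of the two features
```

Pairwise correlation only catches two-variable redundancy; for more features, the variance inflation factor (VIF) is the usual diagnostic.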
2. Additivity:
• The model assumes that the effect of a change in a predictor variable on the response variable is consistent regardless of the values of the other variables.
• Additivity implies that there is no interaction between variables in their effects on the dependent variable.
3. Feature selection:
• In multiple linear regression, it is essential to carefully select the independent variables that will be included in the model.
• Including irrelevant or redundant variables may lead to overfitting and complicate the interpretation of the model.
4. Overfitting:
• Overfitting occurs when the model fits the training data too closely, capturing noise and random fluctuations that do not represent the true underlying relationship between the variables.
• It leads to poor generalization performance on new, unseen data.