
Linear Regression
Why Machine Learning?
 Develop systems that can automatically adapt and customize themselves to
individual users.
-- Personalized news or mail filters
 Discover new knowledge from large datasets.
-- Market basket analysis
 Mimic humans and take over certain monotonous tasks that require
some intelligence.
-- like recognizing handwritten characters
 Develop systems that are too difficult or expensive to construct manually because
they require specific, detailed skills or knowledge tuned to a specific task.
Supervised Machine Learning
 A machine learning algorithm that makes predictions on a given set of samples.
 A supervised machine learning algorithm searches for patterns within the value
labels assigned to data points.

Working of Supervised Machine Learning


Supervised Machine Learning
 Supervised learning problems can be further grouped into regression and
classification.
 Over time, the algorithm changes its strategy to learn better and achieve the
best performance.
 Regression: A regression problem is when the output variable is a real value, such as
“dollars” or “weight”.
 Classification: A classification problem is when the output variable is a category, such as
“red” or “blue”, or “disease” and “no disease”.
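To make the distinction concrete, the sketch below fits a regressor and a classifier on tiny made-up datasets with scikit-learn; the feature values and labels are assumptions for illustration only, not data from these slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: the output is a real value (e.g. "dollars")
X_reg = np.array([[10], [20], [30], [40]])        # e.g. bill amount
y_reg = np.array([1.5, 3.0, 4.6, 6.1])            # e.g. tip in dollars
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[25]]))                        # a continuous prediction

# Classification: the output is a category (e.g. "disease" / "no disease")
X_clf = np.array([[1], [2], [8], [9]])
y_clf = np.array(["no disease", "no disease", "disease", "disease"])
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[7]]))                         # a categorical prediction
```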
Regression Analysis
 Regression analysis is a form of
predictive modelling technique that
investigates the relationship
between a dependent and an
independent variable.
 Regression shows how changes in
the dependent variable (on the y-axis)
relate to changes in the explanatory
variable (on the x-axis).
Regression Analysis
 Regression is based on a hypothesis that can be linear, quadratic, polynomial,
non-linear, etc.
 The hypothesis is a function of some hidden parameters and the input values.
In the training phase, the hidden parameters are optimized with respect to the
input values presented during training.
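To illustrate the training phase described above, here is a minimal gradient-descent sketch for a linear hypothesis; the data, learning rate, and iteration count are assumptions for demonstration and are not taken from the slides.

```python
import numpy as np

# Toy training data for a linear hypothesis h(x) = b0 + b1 * x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

b0, b1 = 0.0, 0.0        # the "hidden parameters", initialised arbitrarily
lr = 0.01                # learning rate (assumed)

for _ in range(5000):    # training phase: optimize the parameters w.r.t. the data
    error = (b0 + b1 * x) - y
    # gradients of the mean squared error with respect to b0 and b1
    b0 -= lr * 2 * error.mean()
    b1 -= lr * 2 * (error * x).mean()

print(b0, b1)            # converges to roughly intercept 0.3 and slope 1.94
```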
Use of Regression
 Three major uses of regression analysis are:
 Determining the strength of predictors
 Forecasting an effect
 Trend forecasting
 Where is Linear Regression Used?
 Evaluating trends and sales estimates
 Analysing the impact of price changes
 Assessing risk in the financial services and insurance domains
Problems - TIPS for Service
 As the waiter or owner of a restaurant, you would
like to develop a model that will allow you to make
predictions about what amount of tip to expect for
any given bill amount.

TIPS FOR SERVICE


Problems - TIPS for Service
 How might you predict the tip amount for future
meals using only this data?
 With only one variable, the best prediction for the
next measurement is the mean of the sample itself.
Problems - TIPS for Service
"Goodness of Fit" for the TIPS

 The distance of each point from the mean line is called a residual (error).

Squaring the RESIDUALS

Sum of Squared Errors (SSE) = 120

 Why square the residuals?
1) It makes them positive. 2) It emphasizes larger deviations.
Squaring the RESIDUALS
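The underlying tip data appears on the slides only as a table image, so the values below are illustrative ones chosen to reproduce the SSE of 120 quoted above; the sketch shows the mean-baseline prediction, the residuals, and their squared sum in NumPy.

```python
import numpy as np

# Illustrative tip amounts (assumed; chosen so that SSE matches the 120 above)
tips = np.array([5.0, 17.0, 11.0, 8.0, 14.0, 5.0])

baseline = tips.mean()          # with one variable, predict the sample mean (10.0)
residuals = tips - baseline     # distance of each point from the mean line
sse = np.sum(residuals ** 2)    # squaring makes them positive and
                                # emphasizes larger deviations
print(baseline, sse)            # 10.0, 120.0
```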
Simple Linear Regression
 The goal of simple linear regression is to create a linear model that minimizes
the sum of squared residuals/errors (SSE).
 The regression line should literally "fit" the data better than the mean line: it
minimizes the residuals.
Understanding Linear Regression Algorithm

Y = b0 + b1X
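For reference, the least-squares estimates used in the worked example that follows are the standard formulas, written here in the same notation as the table below:

M (slope, b1) = Σ(X – X̅)(Y – Y̅) / Σ(X – X̅)²
C (intercept, b0) = Y̅ – M·X̅
R² = [Σ(X – X̅)(Y – Y̅)]² / [Σ(X – X̅)² · Σ(Y – Y̅)²]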
Simple Linear Regression Example - Implementation

Given data where X is the independent variable and Y is the dependent variable:
(a) Estimate β0 / C and β1 / M for the linear regression E(Y) = β0 + β1·X, i.e. E(Y) = M·X + C
(b) Find the value of R² (the coefficient of determination)
(c) Plot the data and E(Y)

Where,
E(Y) = estimated value of Y
β0, C = constant / intersection point with the Y-axis
β1, M = slope
Simple Linear Regression Example - Implementation

X    Y    (X – X̅)    (Y – Y̅)    (X – X̅)²    (X – X̅)(Y – Y̅)

2 58 -2 -2 4 4
4 32 0 -28 0 0
5 63 1 3 1 3
7 87 3 27 9 81
3 67 -1 7 1 -7
1 45 -3 -15 9 45
6 68 2 8 4 16
Xmean = ?    Ymean = ?    M = ?    C = ?    R² = ?
Simple Linear Regression Example - Implementation

X    Y    (X – X̅)    (Y – Y̅)    (X – X̅)²    (X – X̅)(Y – Y̅)

2 58 -2 -2 4 4
4 32 0 -28 0 0
5 63 1 3 1 3
7 87 3 27 9 81
3 67 -1 7 1 -7
1 45 -3 -15 9 45
6 68 2 8 4 16
Xmean = 4    Ymean = 60
M = 142 / 28 = 5.071429
C = 60 – 5.071429 × 4 = 39.71
R² = 142² / (28 × 1864) = 0.3863
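The values above can be checked with a short NumPy sketch (the 28 and 142 are the column sums Σ(X – X̅)² and Σ(X – X̅)(Y – Y̅) from the table):

```python
import numpy as np

X = np.array([2, 4, 5, 7, 3, 1, 6], dtype=float)
Y = np.array([58, 32, 63, 87, 67, 45, 68], dtype=float)

Sxx = np.sum((X - X.mean()) ** 2)              # Σ(X – X̅)²        = 28
Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))  # Σ(X – X̅)(Y – Y̅) = 142
Syy = np.sum((Y - Y.mean()) ** 2)              # Σ(Y – Y̅)²        = 1864

M = Sxy / Sxx                   # slope      ≈ 5.0714
C = Y.mean() - M * X.mean()     # intercept  ≈ 39.71
R2 = Sxy ** 2 / (Sxx * Syy)     # R²         ≈ 0.3863

print(X.mean(), Y.mean(), M, C, R2)
```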
Standard Deviation and Variance
 Standard Deviation
 Standard Deviation is a measure of how spread out the numbers are.
 Its symbol is σ (the Greek letter sigma).
 The formula is easy: it is the square root of the Variance. So now you ask, "What is the
Variance?"
 Variance
 The Variance is defined as:
 The average of the squared differences from the Mean
Example
 You and your friends have just measured the heights of your dogs (in
millimetres):

• The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.
• Find out the Mean, the Variance, and the Standard Deviation.
Example
 Mean = ( 600 + 470 + 170 + 430 + 300 ) / 5 = (1970 /5) = 394
 so the mean (average) height is 394 mm. Let's plot this on the chart:

 Now we calculate each dog's difference from the Mean:


Example
 To calculate the Variance, take each difference, square it, and then average the
result:
Example
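The finished calculation is not reproduced in this text, so here are the same steps in a small NumPy sketch (using the population variance, i.e. dividing by n, as the definition above implies):

```python
import numpy as np

heights = np.array([600, 470, 170, 430, 300], dtype=float)  # dog heights in mm

mean = heights.mean()            # (600 + 470 + 170 + 430 + 300) / 5 = 394
diffs = heights - mean           # 206, 76, -224, 36, -94
variance = np.mean(diffs ** 2)   # average of the squared differences = 21704
std_dev = np.sqrt(variance)      # ≈ 147.32 mm

print(mean, variance, std_dev)
```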
Why is Linear Regression Important?
 Linear-regression models are relatively simple and provide an easy-to-interpret
mathematical formula that can generate predictions.
 Linear regression can be applied to many areas in business and academic
study.
 You’ll find that linear regression is used in everything from the biological,
behavioural, environmental and social sciences to business.
 Linear-regression models have become a proven way to scientifically and
reliably predict the future.
 Because linear regression is a long-established statistical procedure, the
properties of linear-regression models are well understood, and the models can be
trained very quickly.
Key Assumptions for Effective Linear Regression
Assumptions to be considered for success with linear-regression analysis:

 For each variable: Consider the number of valid cases, mean and standard deviation.
 For each model: Consider regression coefficients, correlation matrix, part and partial
correlations, R2, standard error of the estimate, analysis-of-variance table, predicted values
and residuals.
 Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
 Data: Dependent and independent variables should be quantitative. Categorical variables,
such as religion, major field of study or region of residence, need to be recoded to binary
(dummy) variables or other types of contrast variables (see the sketch after this list).
 Other assumptions: For each value of the independent variable, the distribution of the
dependent variable must be normal. The variance of the distribution of the dependent
variable should be constant for all values of the independent variable. The relationship
between the dependent variable and each independent variable should be linear and all
observations should be independent.
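As an example of the recoding mentioned in the Data assumption above, here is a minimal sketch using pandas; the column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data with one categorical and one quantitative variable
df = pd.DataFrame({
    "region": ["north", "south", "east", "south"],   # categorical predictor
    "income": [40, 55, 48, 62],                       # quantitative predictor
})

# Recode the categorical column into binary (dummy) variables,
# dropping the first level to avoid perfect collinearity
df_encoded = pd.get_dummies(df, columns=["region"], drop_first=True)
print(df_encoded)
```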
Types of Regression
 Simple Linear Regression
 Multiple Linear Regression
 Polynomial Regression
 Support Vector Regression
 Decision Tree
 Random Forest
Multiple Linear Regression
 Multiple linear regression is used to estimate the relationship between two or
more independent variables and one dependent variable.
 You can use multiple linear regression when you want to know:
 How strong the relationship is between two or more independent variables and one
dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect
crop growth).
 The value of the dependent variable at a certain value of the independent variables (e.g. the
expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Multiple Linear Regression
 Used to predict a correlation between more than one independent variable
and a dependent variable.
 E.g., income and age are correlated with spending.
 When the data is plotted on a graph, there appears to be a hyperplane
relationship.
Multiple Linear Regression

Y = β0 + β1X1 + β2X2 + … + βpXp + ε
where:
• Y: The response (dependent) variable
• Xj: The jth predictor (independent) variable
• βj: The average effect on Y of a one-unit increase in Xj, holding all other
predictors fixed
• ε: The error term
• The values for β0, β1, β2, …, βp are chosen using the least-squares method,
which minimizes the sum of squared residuals (RSS):
• RSS = Σ(yi – ŷi)²
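A minimal sketch of fitting such a model by least squares, here with NumPy's lstsq on made-up values (the numbers are assumptions for illustration only):

```python
import numpy as np

# Made-up data: two predictors X1, X2 and a response Y
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = np.array([6.0, 5.5, 11.0, 10.5, 14.0])

# Prepend a column of ones so the first coefficient is the intercept β0
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares chooses the coefficients that minimize RSS = Σ(yi – ŷi)²
beta, rss, rank, sv = np.linalg.lstsq(X_design, Y, rcond=None)

print(beta)              # [β0, β1, β2]
print(rss)               # the minimized residual sum of squares
print(X_design @ beta)   # fitted values ŷ
```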
Multiple Linear Regression

• The model errors are statistically independent and represent a random sample
from the population of all possible errors.
• For a given x, there can exist many values of y, and thus many possible values of ε.
• The errors are normally distributed.
• The mean of the errors is zero.
• Errors have a constant variance.
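These error assumptions can be eyeballed from the residuals of a fitted model; the sketch below refits the made-up example from the previous block and checks the residual mean and spread (an informal check, not a formal test):

```python
import numpy as np

# Refit the made-up example from the previous sketch
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
Y = np.array([6.0, 5.5, 11.0, 10.5, 14.0])
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

residuals = Y - X_design @ beta   # model errors: observed minus fitted
print(residuals.mean())           # should be (numerically) zero
print(residuals.std())            # spread should look constant across fitted values

# Normality and constant variance are usually checked visually, e.g. with a
# residual-vs-fitted scatter plot or a histogram of the residuals:
# import matplotlib.pyplot as plt
# plt.scatter(X_design @ beta, residuals); plt.axhline(0); plt.show()
```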
Multiple Linear Regression Example
 Suppose we have a dataset containing expenditure
information for different companies. We would like to know the profit made by
each company, to determine which company would give the best results if we
collaborated with it.
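The expenditure dataset itself is not reproduced in this text, so the sketch below uses hypothetical expenditure columns (R&D spend, administration, marketing spend) and profit figures purely for illustration, fitted with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-company expenditure (in thousands) and profit
X = np.array([
    [165, 136, 471],   # [R&D spend, administration, marketing spend]
    [162, 151, 443],
    [153, 101, 407],
    [144, 118, 383],
    [142,  91, 366],
])
y = np.array([192, 191, 191, 182, 166])   # profit

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)      # b0 and the slopes b1, b2, b3

# Predicted profit for a new company's expenditure profile
print(model.predict([[150, 120, 400]]))
```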
Multiple Linear Regression Example
 Slope (bi)
 Estimates that the average value of y changes by bi units for each 1-unit
increase in Xi, holding all other variables constant

 Y-Intercept (b0)
 The estimated average value of y when all Xi = 0
Multiple Linear Regression Example
