1.linear Regression PSP
1.linear Regression PSP
Regression
Why Machine Learning?
Develop system that can automatically adapt and customize themselves to
individuals users.
-- Personalized news or mail filters
Discovers new knowledge from large datasets.
-- Market basket analysis
Ability to mimic human and replace certain monotonous task which require
some intelligence.
-- like recognizing handwritten characters
Develop systems that are too difficult/expensive to construct manually because
they require specific detailed skills or knowledge tuned to specific task
Supervised Machine Learning
Machine learning algorithm that makes predications on given set of samples
Supervised machine learning algorithm searches for patterns within the value
labels assigned to data points.
bx
1
Simple Linear Regression Example - Implementation
Find Value of R2
Where,
E(Y) = Estimated value of Y
Β0, C = Constant/ intersection point with Y
Β1, M = Slope
(a)Estimate β0/ C and β1/ M - for the linear regression
E(Y) = β0 + β1* X or E(Y) = M*X + C X is independent variable
(b) Find the value of (Coefficient of determination) R2 Y is Dependent Variable
(c) Plot the data E(Y)
Simple Linear Regression Example - Implementation
2 58 -2 -2 4 4
4 32 0 -28 0 0
5 63 1 3 1 3
7 87 3 27 9 81
3 67 -1 7 1 -7
1 45 -3 -15 9 45
6 68 2 8 4 16
M=?
Xmean = ? C= ?
Ymean = ? R2 =?
Simple Linear Regression Example - Implementation
2 58 -2 -2 4 4
4 32 0 -28 0 0
5 63 1 3 1 3
7 87 3 27 9 81
3 67 -1 7 1 -7
1 45 -3 -15 9 45
6 68 2 8 4 16
M = 142 / 28 = 5.071429
Xmean = 4 C = 39.71
Ymean = 60 R2 = 0.3863
Standard Deviation and Variance
Standard Deviation
Standard Deviation is a measure of how spread out the numbers.
Its symbol is σ (the greek letter sigma)
The formula is easy: it is the square root of the Variance. So now you ask, "What is the
Variance?"
Variance
The Variance is defined as:
The average of the squared differences from the Mean
Example
You and your friends have just measured the heights of your dogs (in
millimetres):
• The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and
300mm.
• Find out the Mean, the Variance, and the Standard Deviation.
Example
Mean = ( 600 + 470 + 170 + 430 + 300 ) / 5 = (1970 /5) = 394
so the mean (average) height is 394 mm. Let's plot this on the chart:
For each variable: Consider the number of valid cases, mean and standard deviation.
For each model: Consider regression coefficients, correlation matrix, part and partial
correlations, R2, standard error of the estimate, analysis-of-variance table, predicted values
and residuals.
Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
Data: Dependent and independent variables should be quantitative. Categorical variables,
such as religion, major field of study or region of residence, need to be recoded to binary
(dummy) variables or other types of contrast variables.
Other assumptions: For each value of the independent variable, the distribution of the
dependent variable must be normal. The variance of the distribution of the dependent
variable should be constant for all values of the independent variable. The relationship
between the dependent variable and each independent variable should be linear and all
observations should be independent.
Types of Regression
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Support vector Regression
Decision Tree
Random Forest
Multiple Linear Regression
Multiple linear regression is used to estimate the relationship between two or
more independent variables and one dependent variable.
You can use multiple linear regression when you want to know:
How strong the relationship is between two or more independent variables and one
dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect
crop growth).
The value of the dependent variable at a certain value of the independent variables (e.g. the
expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
Multiple Linear Regression
Used to predict a correlation between more then one independent variables
and a dependent variable.
E.g., Income and Age is correlated with spending
When the data is plotted on a graph, there appears to be a hyper plane
relationship
Multiple Linear Regression
Y = β 0 + β 1 X 1 + β 2 X 2 + … + βp X p + ε
where:
• Y: The response(dependant) variable
• Xj: The jth predictor(Independent) variable
• βj: The average effect on Y of a one unit increase in Xj, holding all other
predictors fixed
• ε: The error term
• The values for β0, β1, B2, … , βp are chosen using the least square method,
which minimizes the sum of squared residuals (RSS):
• RSS = Σ(yi – ŷi)2
Multiple Linear Regression
Multiple Linear Regression
• The Model errors are statistically independent and represent a random sample
from the population of all possible errors.
• For a given x, there can exist many values of y; thus many possible value of Ƹ
• The errors are normally distributed
• The mean of the errors is zero
• Errors have a constant variance
Multiple Linear Regression Example
Let us suppose that we have a dataset containing a set of expenditure
information for different companies. We would like to know the profit made by
each company to determine which company can give the best results if
collaborated with them.
Multiple Linear Regression Example
Multiple Linear Regression Example
Slope(bi)
Estimates that the average value of y changes by bi units for each 1 unit
increase in Xi holding all other variables constant
Y- Intercept (b0)
The estimated average value of y when all Xi = 0
Multiple Linear Regression Example
Multiple Linear Regression Example
Multiple Linear Regression Example
Multiple Linear Regression Example
Multiple Linear Regression Example