1.a. Regression

The document provides an overview of regression analysis, detailing various types such as simple, multiple, and hierarchical regression, and explaining the purpose of regression in determining the influence of one variable on another. It includes the regression equation, hypothesis testing, significance levels, and interpretations of coefficients, as well as examples of data sets and their analysis. Additionally, it covers concepts like ANOVA, R-squared values, and the importance of sample size in regression modeling.

Regression

R.Kasilingam
Types of Regression
• Simple Regression
• Multiple Regression
• Step-wise Regression
• Hierarchical Regression
• Dummy
• Binary
• Multinomial
• Quantile Regression
Purpose
• To find the influence of one variable (x) on another (y)
• To find out the extent of that influence
• So one variable is independent (x) and the other is dependent (y)
• The influence is measured by beta
• So the equation is
• Y = a+bX
• a is called the intercept
• b is beta, or the regression coefficient
Y=a+bX
• When X = 0, then
• bX = b*0 = 0
• So Y = a
• a is the point where the line intersects the Y axis
Y=a+bx
• If b =2, a=5
• Let us assume X is 10 then
• Y = 5+(2*10)=25
• Now increase X by 1, so X = 11; then
• Y = 5+(2*11) = 27, which means Y increases by 2
• So b (beta) is the increase in Y for a one-unit increase in X
• So b (beta) explains the extent of influence of X on Y
• b (beta) is the regression coefficient
Y = 5+2X
X     a    b    Y     Y = a+bX
10    5    2    25    5+(2*10) = 25
11    5    2    27    5+(2*11) = 27
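
The two rows of the table can be reproduced with a few lines of Python. This is only a small sketch: the function name `predict` is made up for illustration, and a = 5, b = 2 are the example values used above.

```python
def predict(x, a=5, b=2):
    """Return Y = a + b*X for the example line Y = 5 + 2X."""
    return a + b * x

y_at_10 = predict(10)            # 5 + (2*10)
y_at_11 = predict(11)            # 5 + (2*11)
change_in_y = y_at_11 - y_at_10  # change in Y for a one-unit increase in X

print(y_at_10, y_at_11, change_in_y)
```

The change in Y for a one-unit change in X is b itself, and `predict(0)` returns a, the intercept.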

• β=dy/dx
• β =2/1 =2
• Beta is slope
• Correlation and Regression difference
What are a and b?
Y     X     Y
2     1     3
4     2     5
6     3     7
8     4     9
10    5     11
12    6     13
14    7     15
16    8     17
18    9     19
20    10    21
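
One way to answer the question is to fit a line to each Y column by least squares. The sketch below assumes the table holds two separate Y series sharing the same X; the helper name `fit_line` is hypothetical, and the formulas are the usual least-squares solutions for a and b.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_first = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]    # left-hand Y column
y_second = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]   # right-hand Y column

a1, b1 = fit_line(x, y_first)
a2, b2 = fit_line(x, y_second)
print(a1, b1)
print(a2, b2)
```

Both columns have the same slope (b = 2). They differ only in the intercept: a = 0 for the first column and a = 1 for the second.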
Hypothesis
• Null: There is no influence
• Which means b = 0
• Alternate: There is an influence
Data set
• Family Income
• Total Income – Metric
• Total Monthly Savings - Metric
Testing
• If the sig. value is less than 0.05 then reject the Null
hypothesis
• Sig < .05 – Reject H0
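• The sig. value (p-value) is the probability of getting a result at least this extreme when H0 is true
• The 0.05 cut-off is the probability of a Type 1 error that we are willing to accept
• A Type 1 error is rejecting H0 when it is true
• When the sig. value is less than 0.05, the risk of wrongly rejecting H0 is acceptably small
• So we can reject H0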
Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   3281.033           527.447                          6.221    .000
totalin      .161               .016         .404                10.351   .000

a. Dependent Variable: totsav

• The significance value (Sig.) is .000, which is less than 0.05
• So reject H0: the claim of no influence is rejected, which means there is an influence
• Whenever the sig. value is < .05, we can say there is an influence
Critical Values
t Value   Sig.   Confidence
1.64      0.10   90%
1.96      0.05   95%
2.58      0.01   99%

Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   3281.033           527.447                          6.221    .000
totalin      .161               .016         .404                10.351   .000

a. Dependent Variable: totsav

• The sig. value is < .05, so there is an influence
• The t value is 10.351, which is more than 1.96, so there is an influence
t value in Regression
Null Hypothesis

Test          Null Hypothesis   In symbols            t
Chi-Square    No Association    Expected = Observed
Anova         No Difference     µ1 = µ2               Difference/Std. Error
Correlation   No Relation       r = 0                 r/Std. Error
Regression    No Influence      β = 0                 B/Std. Error
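
The last row of the table, t = B/Std. Error, can be checked against the totsav coefficients. The sketch below uses the rounded values printed in the output, so the slope's t comes out near 10.06 rather than the reported 10.351, which SPSS computes from unrounded numbers. The p-value helper uses a normal approximation, an assumption that is reasonable here only because the residual df (550) is large. Function names are illustrative.

```python
import math

def t_value(b, std_error):
    """t statistic for H0: coefficient = 0."""
    return b / std_error

def two_sided_p_normal(t):
    """Two-sided p-value under a normal approximation (assumes large df)."""
    return math.erfc(abs(t) / math.sqrt(2))

t_constant = t_value(3281.033, 527.447)   # close to the reported 6.221
t_totalin = t_value(0.161, 0.016)         # rounded inputs, so only near 10.351

print(round(t_constant, 3), round(t_totalin, 3))
print(two_sided_p_normal(1.96))           # close to 0.05, the usual cut-off
print(t_totalin > 1.96)                   # True, so H0 (b = 0) is rejected
```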


Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   3281.033           527.447                          6.221    .000
totalin      .161               .016         .404                10.351   .000

a. Dependent Variable: totsav

• The unstandardized coefficient b is 0.161 and the intercept a is 3281.033


• Y=a+bX
• Y = 3281 + 0.161 * X
• Monthly savings = 3281+ 0.161 * Monthly Income
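Regression
Slope
• $b = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$
Intercept
• $a = \bar{Y} - b \bar{X}$
t statistics
• $t = b / SE_b$
• $\sum Y = Na + b \sum X$
• $\sum XY = a \sum X + b \sum X^2$
• $S_{y/x} = \sqrt{\sum (Y - \hat{Y})^2 / (N - 2)}$ is the SE for the regression
• $SE_a = S_{y/x} \sqrt{1/N + \bar{X}^2 / \sum (X - \bar{X})^2}$ is the SE for the intercept
• $SE_b = S_{y/x} / \sqrt{\sum (X - \bar{X})^2}$ is the SE for beta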
Confidence Interval
• Lower limit =Co-efficient -1.96*SE
• Upper Limit =Co-efficient +1.96*SE
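
As a small sketch, the two limits can be applied to the totalin coefficient from the coefficients table (B = .161, Std. Error = .016). The function name is hypothetical, and 1.96 is the 95% critical value from the Critical Values slide.

```python
def confidence_interval(coefficient, std_error, critical_t=1.96):
    """Return (lower, upper) limits: coefficient -/+ critical_t * SE."""
    margin = critical_t * std_error
    return coefficient - margin, coefficient + margin

lower, upper = confidence_interval(0.161, 0.016)
print(round(lower, 5), round(upper, 5))

# If the interval does not contain 0, the coefficient is significant at 5%.
excludes_zero = lower > 0 or upper < 0
print(excludes_zero)
```

The interval runs from about 0.130 to 0.192 and does not include zero, which agrees with the Sig. value below .05.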
SE
• Use the standard error of the coefficient to measure the
precision of the estimate of the coefficient. The smaller the
standard error, the more precise the estimate.
• The standard error of the Stiffness (variable) coefficient is
smaller than that of Temp (variable). Therefore, our model
was able to estimate the coefficient for Stiffness with
greater precision.
Save
Predicted and Residuals
• It is a prediction technique

• $Y = \hat{Y} + \text{Residual}$
• Y = Predicted + Residual

• $Y = \hat{Y} + e$
• Residual $= Y - \hat{Y}$
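
The split of Y into a predicted part and a residual can be sketched with the fitted savings equation. The income and savings figures for the single household below are hypothetical, chosen only to illustrate the arithmetic. The coefficients are the ones from the totsav output.

```python
A = 3281.033   # intercept from the coefficients table
B = 0.161      # unstandardized coefficient for totalin

def predicted_savings(income):
    """Predicted Y for a given income, using Y = a + bX."""
    return A + B * income

# Hypothetical household, not taken from the data set.
income = 50000
actual_savings = 12000

predicted = predicted_savings(income)
residual = actual_savings - predicted      # Residual = Y - predicted Y

print(round(predicted, 3), round(residual, 3))
```

Adding the residual back to the predicted value recovers the actual Y, which is what Y = Predicted + Residual states.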
[Figure: scatter plot of Predicted savings (vertical axis, 0 to 35000) against Income (horizontal axis, 0 to 180000) with the fitted line f(x) = 0.161314408073057 x + 3281.03310190484, R² = 1]
ANOVA
ANOVA (a)
Model 1      Sum of Squares    df    Mean Square      F         Sig.
Regression   2909134982.0      1     2909134982.00    107.138   .000 (b)
Residual     14934291750.0     550   27153257.730
Total        17843426730.0     551

a. Dependent Variable: totsav
b. Predictors: (Constant), totalin

Degrees of Freedom:
df Regression = number of predictor variables k = 1
df Residual = n - k - 1 = 552 - 1 - 1 = 550
df Total = n - 1 = 552 - 1 = 551
F Value

With one predictor, F = t^2, so F > 3.8416 (= 1.96^2) is significant at the 0.05 level
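
The F value in the ANOVA table can be rebuilt from the sums of squares and the degrees of freedom. The sketch below takes its numbers from the totsav ANOVA output. The variable names are only illustrative.

```python
ss_regression = 2909134982.0
ss_residual = 14934291750.0
n, k = 552, 1                      # cases and number of predictors

df_regression = k
df_residual = n - k - 1
df_total = n - 1

ms_regression = ss_regression / df_regression   # mean square = SS / df
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

t_squared = 10.351 ** 2            # t for totalin from the coefficients table

print(df_regression, df_residual, df_total)
print(round(f_value, 3), round(t_squared, 3))
print(f_value > 1.96 ** 2)
```

With a single predictor, F and t squared agree up to rounding, which is why the 3.8416 cut-off is just 1.96 squared.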


Model Summary – R value
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .404 (a)   .163       .162                5210.879

a. Predictors: (Constant), totalin

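• R is the correlation value
• It is the relationship between actual Y and predicted Y ($\hat{Y}$)

R²
• $R^2$ is the degree (coefficient) of determination
• $R^2 = SS_{Regression} / SS_{Total} = .163$
• $\sum (Y - \hat{Y})^2 / (n - 2)$ is the mean square error (MSE); its square root is the Std. Error of the Estimate
• Regression sum of squares: Explained/Predicted/Regressed
• Residual sum of squares: Unexplained/Not Predicted/Residual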
Adjusted R²
• Adjusted $R^2 = 1 - \frac{SS_{Residual} / df_e}{SS_{Total} / df_t}$
• 1 - .838 = 0.162
• = .5 = .4
• =2.4, =3
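Adjusted R²
Model 1      Sum of Squares    df    Mean Square      F         Sig.
Regression   2909134982.0      1     2909134982.00    107.138   .000 (b)
Residual     14934291750.0     550   27153257.730
Total        17843426730.0     551   32,383,714.57

• SS Residual / SS Total = 0.837, so R² = 1 - 0.837 = 0.163
• MS Residual / MS Total = .838, so Adjusted R² = 1 - .838 = .162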

• dft is the degrees of freedom n - 1 of the estimate of the population variance of the dependent variable
• dfe is the degrees of freedom n - k - 1 of the estimate of the error variance
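
Both R Square and Adjusted R Square in the Model Summary can be recomputed from the ANOVA sums of squares. This is a minimal sketch with values copied from the totsav ANOVA table. The variable names are illustrative, with `df_e` and `df_t` following the definitions above.

```python
ss_regression = 2909134982.0
ss_residual = 14934291750.0
ss_total = 17843426730.0
n, k = 552, 1

df_t = n - 1          # degrees of freedom of the total variance
df_e = n - k - 1      # degrees of freedom of the error variance

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (ss_residual / df_e) / (ss_total / df_t)

print(round(r_squared, 3), round(adjusted_r_squared, 3))
```

The adjusted value is slightly lower than R Square because it divides each sum of squares by its degrees of freedom, which penalises extra predictors.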
What is R square
Y     X     Y
2     1     3
4     2     5
6     3     7
8     4     9
10    5     11
12    6     13
14    7     15
16    8     17
18    9     19
20    10    21

• What is R Square
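
A sketch of the answer, assuming again that the table holds two separate Y series sharing the same X: fit each line by least squares and compute R Square as 1 minus residual SS over total SS. The helper name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """R Square of the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r2_first = r_squared(x, [2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
r2_second = r_squared(x, [3, 5, 7, 9, 11, 13, 15, 17, 19, 21])
print(r2_first, r2_second)
```

Every point lies exactly on its line, so nothing is left unexplained and R Square is 1 for both columns.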
Standardized-Descriptives
Descriptive – Save Standardized
Changed Data Set
Regression with new variables
Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   -2.776E-16         .039                             .000     1.000
Ztotalin     .404               .039         .404                10.351   .000

a. Dependent Variable: Ztotsav

Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   3281.033           527.447                          6.221    .000
totalin      .161               .016         .404                10.351   .000

a. Dependent Variable: totsav
Interpretation
• A one standard deviation increase in total income is associated with a .404 standard deviation increase in total savings.
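
The link between the unstandardized b and the standardized beta can be sketched on a small made-up data set. The income and savings values below are hypothetical and are not the survey data. The point is only that regressing z-scores gives the same slope as b multiplied by SD(X)/SD(Y).

```python
import statistics as st

# Hypothetical values for illustration only.
income = [20000, 35000, 50000, 65000, 80000, 120000]
savings = [5000, 9500, 9000, 15000, 14000, 24000]

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x, mean_y = st.mean(xs), st.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean, sd = st.mean(values), st.stdev(values)
    return [(v - mean) / sd for v in values]

b = slope(income, savings)
beta_from_b = b * st.stdev(income) / st.stdev(savings)
beta_from_z = slope(z_scores(income), z_scores(savings))

print(round(beta_from_b, 6), round(beta_from_z, 6))
```

This is why the Ztotalin regression reports .404 as both B and Beta: once both variables are standardized, the two coefficients coincide.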
Descriptive Statistics – For Std
Statistics
                 Ztotalin     Ztotsav
N Valid          552          552
N Missing        0            0
Mean             .0000000     .0000000
Std. Deviation   1.00000000   1.00000000
Variance         1.000        1.000
Sample Size
• 10 to 15 cases for each predictor
• 50+8k or 104+k, whichever is larger, where k is the number of predictors
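
The second rule of thumb is easy to turn into a small helper. The function name is hypothetical, and the sketch simply takes the larger of the two formulas.

```python
def minimum_sample_size(k):
    """Larger of 50 + 8k and 104 + k, where k is the number of predictors."""
    return max(50 + 8 * k, 104 + k)

for k in (1, 6, 8):
    print(k, minimum_sample_size(k))
```

For one predictor the rule asks for 105 cases and for six predictors 110. From eight predictors onward the 50+8k term becomes the larger one.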
$Y_i = \beta_1 + \beta_2 X_i + u_i$

Names for $u_i$:
1. Stochastic
2. Error
3. Random
4. Disturbance Term
5. Residual
6. Shock
7. Noise
8. Unknown

Names for $X_i$:
1. Explanatory Variable
2. Independent
3. Predictor
4. Regressor
5. Stimulus
6. Exogenous
7. Covariate
8. Control

$\beta_1$: Parameter, Constant
$\beta_2$: Parameter, Slope of the curve

$\hat{Y} = \beta_1 + \beta_2 X_i$


Car Sales
Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   59.234             31.477                           1.882    .062
mpg          -.267              1.300        -.017               -.206    .837

a. Dependent Variable: sales

• There is not sufficient evidence to reject H0 (Sig. = .837, which is more than .05)


Employee Branding
• Dependent – 5 point scale
• Independent - 5 point Scale
Spurious
Coefficients (a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   -4.618             .888                             -5.203   .000
LCO2         .636               .064         .809                9.920    .000

a. Dependent Variable: LLE

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .809 (a)   .654       .648                .08468                       .081

a. Predictors: (Constant), LCO2
b. Dependent Variable: LLE


Multiple Regression
• Multiple independent variables: X1, X2, X3
• b1 for X1, b2 for X2, b3 for X3
• Y = a+b1X1+b2X2+b3X3
Data Set
• Gold
• Dependent variable Gold Price - Metric
• Independent variables
• GDP, Savings, Inflation, Real Interest Rate, Per Capita,
Money Supply
• All metric
Multiple Regression
Coefficients (a)
Model 1        Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)     9912.561           6296.821                         1.574    .176
GDP            11.616             1.727        .832                6.726    .001
GDP_S          -387.496           86.602       -.235               -4.474   .007
GDP_I          -112.438           197.335      -.041               -.570    .593
RIR            -239.964           204.224      -.072               -1.175   .293
GDP_C          11.810             .902         .482                13.092   .000
money_supply   -82.850            62.963       -.081               -1.316   .245

a. Dependent Variable: GOLD


Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .999 (a)   .999       .998                336.22947

a. Predictors: (Constant), money_supply, RIR, GDP_S, GDP_I, GDP_C, GDP
Interpretation
• Three variables (GDP, GDP_S, GDP_C) have a sig. value less than 0.05, which means only these three variables are influencing
• Y = a+b1*X1+b2*X2+b3*X3 ….
• Y = 9912+11.616*GDP-387*Savings-112*Inflation ….
• It is not good to include variables which are not influencing
• We have to rerun with only the influencing variables so that we get a new R square and new coefficients
• The equation will be different
Important
• Standardized beta
• GDP .832
• Per Capita .482
• Savings -.235
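
Ranking predictors by the size of their standardized beta can be sketched as a sort on absolute values. The betas below are copied from the multiple regression coefficients table for GOLD. Sorting by absolute value is the assumption that the sign shows direction, not importance.

```python
standardized_beta = {
    "GDP": 0.832,
    "GDP_S": -0.235,
    "GDP_I": -0.041,
    "RIR": -0.072,
    "GDP_C": 0.482,
    "money_supply": -0.081,
}

# Largest absolute standardized beta first.
ranked = sorted(standardized_beta, key=lambda name: abs(standardized_beta[name]), reverse=True)

for name in ranked:
    print(name, standardized_beta[name])
```

The first three names in the ranking are GDP, GDP_C (per capita) and GDP_S (savings), the same three variables whose sig. value is below 0.05.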
Stepwise
Coefficients (a)
Model   Term         Unstandardized B   Std. Error   Standardized Beta   t        Sig.
1       (Constant)   -3085.766          1379.533                         -2.237   .049
        GDP          13.380             1.253        .959                10.679   .000
2       (Constant)   -4981.309          673.247                          -7.399   .000
        GDP          7.713              1.029        .553                7.498    .000
        GDP_C        11.787             1.806        .481                6.526    .000
3       (Constant)   1651.450           987.004                          1.673    .133
        GDP          9.926              .518         .711                19.165   .000
        GDP_C        11.024             .727         .450                15.155   .000
        GDP_S        -284.874           40.798       -.173               -6.983   .000

a. Dependent Variable: GOLD
Three Equations
• Model 1: Y = -3085+13.38*GDP
• Model 2: Y = -4981+7.713*GDP+11.787*GDP_C
• Model 3: Y = 1651+9.926*GDP+11.024*GDP_C-284.874*GDP_S
R Square
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .959 (a)   .919       .911                2016.86123
2       .993 (b)   .986       .983                887.96471
3       .999 (c)   .998       .997                353.59630

a. Predictors: (Constant), GDP
b. Predictors: (Constant), GDP, GDP_C
c. Predictors: (Constant), GDP, GDP_C, GDP_S
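
The Adjusted R Square column can be checked from R Square, the number of predictors k and the number of cases n. The sketch assumes n = 12, inferred from the total df of 11 in the stepwise ANOVA output. The function name is illustrative, and because the R Square inputs are rounded to three decimals the results match only approximately.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R Square = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

n = 12  # assumption: total df of 11 implies 12 cases

adj_model_1 = adjusted_r_squared(0.919, n, k=1)
adj_model_2 = adjusted_r_squared(0.986, n, k=2)
adj_model_3 = adjusted_r_squared(0.998, n, k=3)

print(round(adj_model_1, 4), round(adj_model_2, 4), round(adj_model_3, 4))
```

The penalty for each added predictor is visible here: the gap between R Square and Adjusted R Square depends on (n - 1)/(n - k - 1), which grows with k.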
ANOVA (a)
Model   Source       Sum of Squares   df   Mean Square     F          Sig.
1       Regression   463890507.80     1    463890507.80    114.042    .000 (b)
        Residual     40677292.21      10   4067729.221
        Total        504567800.00     11
2       Regression   497471468.00     2    248735734.00    315.462    .000 (c)
        Residual     7096331.955      9    788481.328
        Total        504567800.00     11
3       Regression   503567557.30     3    167855852.40    1342.521   .000 (d)
        Residual     1000242.731      8    125030.341
        Total        504567800.00     11

a. Dependent Variable: GOLD
b. Predictors: (Constant), GDP
c. Predictors: (Constant), GDP, GDP_C
d. Predictors: (Constant), GDP, GDP_C, GDP_S
Thank You
