
MULTIPLE REGRESSION ANALYSIS & APPLICATIONS
Regression Analysis

Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
• Determine whether the independent variables explain a
significant variation in the dependent variable: whether a
relationship exists.
• Determine how much of the variation in the dependent
variable can be explained by the independent variables:
strength of the relationship.
• Determine the structure or form of the relationship: the
mathematical equation relating the independent and
dependent variables.
• Predict the values of the dependent variable.
• Control for other independent variables when evaluating the
contributions of a specific variable or set of variables.
• Regression analysis is concerned with the nature and degree
of association between variables and does not imply or
assume any causality.
Statistics Associated with Bivariate
Regression Analysis

• Bivariate regression model. The basic regression equation is Yi = β0 + β1Xi + ei, where Y = dependent or criterion variable, X = independent or predictor variable, β0 = intercept of the line, β1 = slope of the line, and ei is the error term associated with the i-th observation.

• Coefficient of determination. The strength of association is measured by the coefficient of determination, r². It varies between 0 and 1 and signifies the proportion of the total variation in Y that is accounted for by the variation in X.

• Estimated or predicted value. The estimated or predicted value of Yi is Ŷi = a + bXi, where Ŷi is the predicted value of Yi, and a and b are estimators of β0 and β1, respectively.
Statistics Associated with Bivariate
Regression Analysis

• Regression coefficient. The estimated


parameter b is usually referred to as the non-
standardized regression coefficient.
• Scattergram. A scatter diagram, or
scattergram, is a plot of the values of two
variables for all the cases or observations.
• Standard error of estimate. This statistic,
SEE, is the standard deviation of the actual Y
values from the predicted Y values.
• Standard error. The standard deviation of b,
SEb, is called the standard error.
Statistics Associated with Bivariate
Regression Analysis

• Standardized regression coefficient. Also termed


the beta coefficient or beta weight, this is the slope
obtained by the regression of Y on X when the data
are standardized.

• Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, Σej², which is a measure of total error.

• t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y, or H0: β1 = 0, where t = b/SEb.
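
As a minimal sketch, the Python snippet below (assuming numpy and statsmodels are available; the data are made up) fits a bivariate regression and extracts the statistics defined above: b, SEb, r², SEE, and the t statistic for H0: β1 = 0.

# Minimal sketch (made-up data): bivariate regression Yi = b0 + b1*Xi + ei
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=30)              # predictor X
y = 2.0 + 0.8 * x + rng.normal(0, 1, 30)     # criterion Y with random error

fit = sm.OLS(y, sm.add_constant(x)).fit()    # add_constant supplies the intercept column

b = fit.params[1]                            # non-standardized regression coefficient b
se_b = fit.bse[1]                            # standard error of b (SEb)
r2 = fit.rsquared                            # coefficient of determination r^2
see = np.sqrt(fit.mse_resid)                 # standard error of estimate (SEE)
t = fit.tvalues[1]                           # t = b / SEb, with n - 2 degrees of freedom

print(f"b = {b:.3f}, SEb = {se_b:.3f}, t = {t:.2f}, r^2 = {r2:.3f}, SEE = {see:.3f}")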
Examples of Regression Analysis

AW = f(U, A, I)
Where,
AW= Awareness about the product
U = Uniqueness of the product
A = Advertisement
I = Interest in the product category
Examples of Regression Analysis

S = f(AE, P, D)
Where,
S = Sales
AE = Advertisement expenditure
P = Price
D = Level of distribution
Examples of Regression Analysis

MS = f(SSF, AE, SPB)


Where,
MS= Market share
SSF = Size of sales force
AE = Advertisement expenditure
SPB = Sales promotion budgets
Examples of Regression Analysis

CP = f(PP, BI, BA)


Where,
CP = Consumers’ perceptions of quality
PP = Perceptions of prices
BI = Brand image
BA = Brand attributes
Multiple Regression

The general form of the multiple regression model


is as follows:

Y =  0 +  1 X1 +  2 X2 +  3 X3+ . . . +  k X k + e
which is estimated by the following equation:

Y= a + b1X1 + b2X2 + b3X3+ . . . + bkXk


As before, the coefficient a represents the intercept,
but the b's are now the partial regression coefficients.
Statistics Associated with Multiple Regression

• Adjusted R². R², the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not make much of a contribution.

• Coefficient of multiple determination. The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

• F test. The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero. This is equivalent to testing the null hypothesis H0: β1 = β2 = β3 = ... = βk = 0. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.
Statistics Associated with Multiple Regression

• Partial F test. The significance of a partial regression coefficient, βi, of Xi may be tested using an incremental F statistic. The incremental F statistic is based on the increment in the explained sum of squares resulting from the addition of the independent variable Xi to the regression equation after all the other independent variables have been included.

• Partial regression coefficient. The partial regression coefficient, b1, denotes the change in the predicted value, Ŷ, per unit change in X1 when the other independent variables, X2 to Xk, are held constant.
Conducting Multiple Regression Analysis
Partial Regression Coefficients

To understand the meaning of a partial regression coefficient,


let us consider a case in which there are two independent
variables, so that:

Ŷ = a + b1X1 + b2X2

 First, note that the relative magnitude of the partial


regression coefficient of an independent variable is, in
general, different from that of its bivariate regression
coefficient.
 The interpretation of the partial regression coefficient, b1,
is that it represents the expected change in Y when X1 is
changed by one unit but X2 is held constant or otherwise
controlled. Likewise, b2 represents the expected change in
Y for a unit change in X2, when X1 is held constant. Thus,
calling b1 and b2 partial regression coefficients is
appropriate.
Conducting Multiple Regression Analysis
Partial Regression Coefficients

• It can also be seen that the combined effects of X1 and X2 on Y


are additive. In other words, if X1 and X2 are each changed by
one unit, the expected change in Y would be (b1+b2).

• Suppose one were to remove the effect of X2 from X1. This could be done by running a regression of X1 on X2. In other words, one would estimate the equation X̂1 = a + bX2 and calculate the residual Xr = (X1 - X̂1). The partial regression coefficient, b1, is then equal to the bivariate regression coefficient, br, obtained from the equation Ŷ = a + brXr.
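
A minimal sketch of this equivalence (assuming Python with numpy and statsmodels; the data are synthetic) is given below: the partial coefficient b1 from the multiple regression matches the slope br obtained by regressing Y on the residual Xr.

# Sketch (synthetic data): the partial regression coefficient b1 equals the
# bivariate coefficient br from regressing Y on the residual of X1 given X2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x2 = rng.normal(size=100)
x1 = 0.6 * x2 + rng.normal(size=100)                      # X1 correlated with X2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=100)

# Multiple regression of Y on X1 and X2: take the coefficient on X1
b1 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1]

# Remove the effect of X2 from X1, then regress Y on the residual Xr
x1_hat = sm.OLS(x1, sm.add_constant(x2)).fit().fittedvalues
xr = x1 - x1_hat
br = sm.OLS(y, sm.add_constant(xr)).fit().params[1]

print(np.isclose(b1, br))                                  # True: the coefficients coincide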
Conducting Multiple Regression Analysis
Partial Regression Coefficients

• Extension to the case of k variables is straightforward. The partial


regression coefficient, b1, represents the expected change in Y when
X1 is changed by one unit and X2 through Xk are held constant. It can
also be interpreted as the bivariate regression coefficient, b, for the
regression of Y on the residuals of X1, when the effect of X2 through Xk
has been removed from X1.
• The relationship of the standardized to the non-standardized coefficients remains the same as before:

B1 = b1 (Sx1/Sy)
...
Bk = bk (Sxk/Sy)
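
As a quick check, the sketch below (synthetic data; numpy and statsmodels assumed) computes the beta weights both from the formula Bk = bk(Sxk/Sy) and by refitting the regression on standardized (z-scored) data; the two agree.

# Sketch (synthetic data): standardized (beta) coefficients two ways.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = 0.5 + 1.2 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(size=50)

b = sm.OLS(y, sm.add_constant(X)).fit().params[1:]            # non-standardized b1, b2
betas_formula = b * X.std(axis=0, ddof=1) / y.std(ddof=1)     # Bk = bk * (Sxk / Sy)

Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)             # z-score the predictors
yz = (y - y.mean()) / y.std(ddof=1)                           # z-score the criterion
betas_refit = sm.OLS(yz, sm.add_constant(Xz)).fit().params[1:]

print(np.allclose(betas_formula, betas_refit))                # True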

The estimated regression equation is:

Ŷ = 0.33732 + 0.48108 X1 + 0.28865 X2

or

Attitude = 0.33732 + 0.48108 (Duration) + 0.28865 (Importance)


Multiple Regression

Table 17.3

Multiple R          0.97210
R²                  0.94498
Adjusted R²         0.93276
Standard Error      0.85974

ANALYSIS OF VARIANCE
              df    Sum of Squares    Mean Square
Regression     2       114.26425        57.13213
Residual       9         6.65241         0.73916

F = 77.29364    Significance of F = 0.0000

VARIABLES IN THE EQUATION
Variable       b         SEb       Beta (β)      T      Significance of T
IMPORTANCE   0.28865   0.08608    0.31382      3.353        0.0085
DURATION     0.48108   0.05895    0.76363      8.160        0.0000
(Constant)   0.33732   0.56736                 0.595        0.5668
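
Output of this kind can be produced by any standard regression routine. The sketch below (Python with pandas and statsmodels assumed; the data are synthetic placeholders, not the 12 observations behind Table 17.3) shows how such a summary might be generated.

# Sketch (synthetic placeholder data): regression output in the style of Table 17.3.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"duration": rng.uniform(2, 18, 12),
                   "importance": rng.uniform(1, 7, 12)})
df["attitude"] = 0.3 + 0.5 * df["duration"] + 0.3 * df["importance"] + rng.normal(0, 1, 12)

fit = smf.ols("attitude ~ duration + importance", data=df).fit()
print(fit.summary())     # R2, adjusted R2, F and its significance, b, SEb, t, p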
Multicollinearity

• Multicollinearity arises when intercorrelations among the predictors are very high.

• Multicollinearity can result in several problems, including:
  • The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
  • The magnitudes, as well as the signs, of the partial regression coefficients may change from sample to sample.
  • It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
  • Predictor variables may be incorrectly included or removed in stepwise regression.


Multicollinearity

• A simple procedure for adjusting for multicollinearity consists


of using only one of the variables in a highly correlated set of
variables.

• Alternatively, the set of independent variables can be


transformed into a new set of predictors that are mutually
independent by using techniques such as principal
components analysis.

• More specialized techniques, such as ridge regression and


latent root regression, can also be used.
Multicollinearity

• Tolerance: the proportion of variance in an independent variable (IV) that is not accounted for by the other IVs: Tolerance = 1 - R².

• Tolerance values of .10-.20 are problematic, as they mean that only 10-20% of the variance in that IV is not explained by the other IVs.

• Variance inflation factor (VIF): 1/(1 - R²), i.e., the reciprocal of Tolerance.

• So if Tolerance is .10, the VIF is 10. A VIF of 10 for a predictor variable means that the variance of its coefficient is 10 times as large as it would be if that predictor were uncorrelated with the other predictors (its standard error is √10 ≈ 3.2 times as large).

• Hence VIF should ideally be ≤ 3; 3-5 is not good; 5-10 is bad; > 10 is very bad.
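
A sketch of how tolerance and VIF might be computed for each predictor (Python with statsmodels assumed; synthetic, deliberately collinear data):

# Sketch (synthetic, deliberately collinear data): tolerance and VIF per predictor.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)        # nearly collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):   # index 0 is the constant
    vif = variance_inflation_factor(X, i)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")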


Dummy Variables

• There are situations where the dependent variable may be influenced by qualitative variables such as gender, marital status, profession, geographical region, and religion.
• To quantify these qualitative variables, dummy variables are used.
• The number of dummy variables in a regression model equals the number of categories of the qualitative variable less one.
• A dummy variable may take any two values, such as zero and one; ten and eleven; or any other such pair, although 0/1 coding is conventional.
• Dummy variables can also be used to examine the moderating effect of one variable on another.
Example of a Dummy Variable Regression

Suppose the starting salary of a college lecturer is influenced not only by years of teaching experience but also by gender. Therefore, the model could be specified as:
Y = f (X, D)
Where,
Y = Starting salary of a college lecturer, in thousands per month
X = Number of years of work experience
D is a dummy variable that takes the values
D = 1 (if the respondent is male)
  = 0 (if the respondent is female)

The model could be written as,


Y=α+βX+γD+U
Example of a Dummy Variable Regression

This can be estimated using ordinary least squares (OLS). Suppose the estimated regression equation is:

Ŷ = α̂ + β̂X + γ̂D

Substituting the two values of D gives one equation for male lecturers (D = 1) and one for female lecturers (D = 0):

Ŷ = (α̂ + γ̂) + β̂X    (males)
Ŷ = α̂ + β̂X           (females)

The above two equations differ by the amount γ̂, which can be positive or negative. If γ̂ is positive, it implies that the average salary of a male lecturer is higher than that of a female lecturer by the amount γ̂, keeping the number of years of experience constant.
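
A minimal sketch of the salary example (Python with pandas and statsmodels assumed; the data are synthetic, not from any actual survey):

# Sketch (synthetic data): starting salary on experience plus a gender dummy D.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 100
df = pd.DataFrame({"experience": rng.uniform(0, 20, n),
                   "male": rng.integers(0, 2, n)})        # D = 1 if male, 0 if female
df["salary"] = 30 + 1.5 * df["experience"] + 4.0 * df["male"] + rng.normal(0, 3, n)

fit = smf.ols("salary ~ experience + male", data=df).fit()
print(fit.params["male"])    # estimate of gamma-hat: male-female gap at equal experience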
Moderator Variable

Consider Y = a + b1X + b2Z + b3XZ

• If b3 is insignificant and b2 is significant, then Z is not a moderator variable but simply an independent predictor variable.
• If b2 is insignificant and b3 is significant, then Z is a PURE moderator variable.
• If both b2 and b3 are significant, then Z is a QUASI moderator variable.
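
A sketch of how such a moderation test might be run (Python with statsmodels assumed; synthetic data constructed so that Z acts as a pure moderator):

# Sketch (synthetic data): testing whether Z moderates the effect of X on Y.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 200
df = pd.DataFrame({"x": rng.normal(size=n), "z": rng.normal(size=n)})
df["y"] = 1.0 + 0.8 * df["x"] + 0.5 * df["x"] * df["z"] + rng.normal(size=n)   # Z built in as a pure moderator

fit = smf.ols("y ~ x + z + x:z", data=df).fit()
print(fit.pvalues[["z", "x:z"]])   # b2 insignificant, b3 significant -> Z is a PURE moderator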
