Multiple Regression - Batool & Raya

The document provides an overview of multiple regression analysis, highlighting its ability to analyze relationships between one dependent variable and multiple independent variables. It discusses the differences between simple and multiple regression, the purposes of using multiple regression, and the necessary conditions and assumptions for effective analysis. Additionally, it covers stages of research design, variable selection, interpretation of results, and the importance of assessing multicollinearity and validating results.


Multiple Regression

Done by:
Batool Gherbawe & Raya Al abbadi
Multiple Regression

Multiple regression is an analysis technique that can be used to analyze the relationship between a single (dependent) variable and several (independent) variables.
Simple vs. Multiple Regression

Simple regression:
• One dependent variable Y, which is predicted from one independent variable X.
• One regression coefficient.
• r²: the proportion of variation in the dependent variable Y that is predictable from the independent variable X.

Multiple regression:
• One dependent variable Y, predicted from a set of independent variables (X1, X2, …, Xk).
• One regression coefficient for each independent variable.
• R²: the proportion of variation in the dependent variable Y that is predictable from the set of independent variables (the X's).
Purposes:
– Prediction: regression analysis is, for example, the foundation of business forecasting models, ranging up to the econometric models that predict the national economy based on certain inputs (income levels, business investment, etc.).
– Explanation: applications include evaluating the determinants of effectiveness for a program.
– Theory building: multiple regression analysis is a general statistical technique used to analyze the relationship between a single dependent variable and several independent variables.
Model and Required Conditions

• We allow for k independent variables to be potentially related to the dependent variable:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

where Y is the dependent variable, X1 … Xk are the independent variables, β0 … βk are the coefficients, and ε is the random error term.
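As a concrete illustration of this model, the sketch below fits Y = β0 + β1X1 + β2X2 by ordinary least squares via the normal equations, using only the standard library. The data and the helper functions (`solve`, `ols`) are hypothetical and illustrative, not from any particular statistics package; real analyses would typically use a library such as statsmodels.

```python
# Minimal OLS sketch for k = 2 using the normal equations (X'X)b = X'y.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(rows, y):
    """rows: list of [x1, x2, ...]; returns [b0, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]                 # add intercept column
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(len(X))) for c in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(p)]
    return solve(XtX, Xty)

# Hypothetical data generated from Y = 2 + 3*X1 - 1*X2 with no noise,
# so the fitted coefficients should recover those values.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 2]]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]
b0, b1, b2 = ols(rows, y)
```

With noiseless data the estimates reproduce the generating coefficients exactly (up to floating-point error); with real data they would be the least-squares estimates of β0, β1, β2.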


Multiple Regression for k = 2: Graphical Demonstration

The simple linear regression model allows for one independent variable, X:

Y = β0 + β1X + ε

The multiple linear regression model allows for more than one independent variable; for k = 2:

Y = β0 + β1X1 + β2X2 + ε

Note how the straight line becomes a plane.
THE IMPACT OF MULTICOLLINEARITY

The ability of an additional independent variable to improve the prediction of the dependent variable is related not only to its correlation with the dependent variable, but also to its correlation(s) with the independent variable(s) already in the regression equation.
Collinearity is the association, measured as the correlation, between two independent variables.
Multicollinearity refers to the correlation among three or more independent variables.
STAGE 1: OBJECTIVES OF MULTIPLE REGRESSION
• Multiple regression analysis, a form of general linear modeling,
is a multivariate statistical technique used to examine the
relationship between a single dependent variable and a set of
independent variables.
• In selecting suitable applications of multiple regression, the
researcher must consider three primary issues:
1. The appropriateness of the research problem (Prediction,
Explanation)
2. Specification of a statistical relationship (the researcher is interested in a statistical, not a functional, relationship). A functional relationship calculates an exact value, whereas a statistical relationship estimates an average value.
3. Selection of the dependent and independent variables (strong
theory, measurement error, and specification error).
Stage 2: Research Design of a Multiple Regression Analysis
• Sample size:
• The sample size used in multiple regression is perhaps the single most influential element under the control of the researcher in designing the analysis.
• The size of the sample has a direct impact on the appropriateness and the statistical power of multiple regression.
Stage 3: Assumptions in Multiple Regression Analysis

We must make several assumptions about the relationship between the dependent and independent variables that affect the statistical results. The assumptions to be examined fall into four areas:

1. Linearity of the phenomenon measured
2. Constant variance of the error terms
3. Independence of the error terms
4. Normality of the error term distribution
Linearity of the Phenomenon

Linearity: the relationship between the dependent and independent variables represents the degree to which change in the dependent variable is associated with change in the independent variable.
Constant Variance of the Error Term

• The presence of unequal variances (heteroscedasticity) is one of the most common assumption violations.
• Diagnosis is made with residual plots or simple statistical tests.
• Homogeneity: SPSS tests measure the equality of variance for a single pair of variables.
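One such residual-based check can be sketched as follows. This is a rough Goldfeld-Quandt-style variance comparison of my own construction (an assumption, not the SPSS test mentioned above), with hypothetical numbers: sort residuals by their fitted values, split them into a low half and a high half, and compare the error variance in each.

```python
# Rough heteroscedasticity check: a variance ratio far from 1 across the
# range of fitted values suggests non-constant error variance.

def variance_ratio(fitted, residuals):
    pairs = sorted(zip(fitted, residuals))       # order by fitted value
    half = len(pairs) // 2
    lo = [r for _, r in pairs[:half]]            # residuals, low fitted values
    hi = [r for _, r in pairs[-half:]]           # residuals, high fitted values

    def var(xs):                                 # residuals average roughly 0
        return sum(x * x for x in xs) / len(xs)

    return var(hi) / var(lo)

# Spread roughly constant across fitted values -> ratio near 1
ratio_ok = variance_ratio([1, 2, 3, 4], [0.5, -0.5, 0.5, -0.5])
# Spread growing with the fitted value -> ratio much larger than 1
ratio_bad = variance_ratio([1, 2, 3, 4], [0.1, -0.2, 1.5, -2.0])
```

A formal version of this idea would compare the two variances with an F test rather than an ad hoc ratio.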
Independence of the Error Terms

• We assume in regression that each predicted value is independent, which means that the predicted value is not related to any other prediction.
Normality of the Error Term Distribution
Perhaps the most frequently encountered assumption violation is nonnormality of the independent or dependent variables, or both.
A better method is to compare the distribution of the residuals with the normal distribution (for example, with a normal probability plot).
Stage 4: Estimating the Regression Model and Assessing
Overall Model Fit
• Must accomplish three basic tasks:

1. Select a method for specifying the regression model to be estimated.
2. Assess the statistical significance of the overall model in predicting the dependent variable.
3. Determine whether any of the observations exert an undue influence on the results.
Variable Selection Approaches

• Confirmatory specification.
• Sequential search methods:
✓ Stepwise estimation (variables already in the equation can be removed if they lose significance as others are added).
✓ Forward inclusion & backward elimination (decisions are not reconsidered once a variable is added or removed).
• Combinatorial (all-possible-subsets).
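The forward-inclusion idea can be sketched as follows, assuming a least-squares fit via the normal equations and a simple improvement-in-fit threshold in place of the entry significance tests that real packages use; the data and variable names are hypothetical.

```python
# Forward inclusion sketch: start with no predictors and, at each step,
# add the candidate that most reduces the residual sum of squares (SSE),
# stopping when no candidate improves the fit meaningfully.

def fit_sse(cols, y):
    """SSE from OLS of y on the given columns plus an intercept,
    solved via the normal equations with Gaussian elimination."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    M = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (M[r][p] - sum(M[r][c] * beta[c]
                                 for c in range(r + 1, p))) / M[r][r]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def forward_select(candidates, y, min_gain=1e-6):
    chosen, remaining = [], dict(candidates)
    best_sse = fit_sse([], y)                    # intercept-only model
    while remaining:
        name, sse = min(
            ((nm, fit_sse([candidates[c] for c in chosen] + [col], y))
             for nm, col in remaining.items()),
            key=lambda t: t[1])
        if best_sse - sse < min_gain:            # no meaningful improvement
            break
        chosen.append(name)
        del remaining[name]
        best_sse = sse
    return chosen

# Hypothetical data: y depends on x1 and x2 only; x3 is an unrelated column.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 9]
x3 = [2, 7, 1, 8, 2, 8]
y = [2 * a + 0.5 * b for a, b in zip(x1, x2)]
candidates = {"x1": x1, "x2": x2, "x3": x3}
selected = forward_select(candidates, y)
```

Here the procedure picks x1 first (largest SSE reduction), then x2, and stops before including the irrelevant x3. Note this sketch never removes a variable once added, which is exactly the limitation stepwise estimation addresses.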
Stage 5: Interpretation of Regression Results

• Interpret the impact of each independent variable relative to the other variables in the model, as model re-specification can have a profound effect on the remaining variables:
✓ Use beta weights when comparing relative importance among independent variables.
✓ Regression coefficients describe changes in the dependent variable, but can be difficult to compare across independent variables if the response formats vary.
Assessing Multicollinearity

The researcher’s task is to:
• Assess the degree of multicollinearity,
• Determine its impact on the results, and
• Apply the necessary remedies if needed.
Multicollinearity Diagnostics
• Variance Inflation Factor (VIF) – measures how much the variance of the regression coefficients is inflated by multicollinearity problems. A VIF of 1 indicates no correlation among the independent measures; values somewhat above 1 indicate some association between predictor variables, but generally not enough to cause problems. A common maximum acceptable VIF value is 10; anything higher indicates a problem with multicollinearity.

• Tolerance – the amount of variance in an independent variable that is not explained by the other independent variables (Tolerance = 1 / VIF). If the other variables explain a large share of the variance of a particular independent variable, there is a problem with multicollinearity. Thus, small tolerance values indicate multicollinearity; the typical cutoff is .10, meaning a tolerance value below .10 indicates a problem.
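In the two-predictor case these diagnostics reduce to the squared Pearson correlation between the predictors, since the R² from regressing one predictor on the other is just r². The sketch below uses hypothetical data; with more predictors, the R² would come from a full auxiliary regression instead.

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation coefficient, computed from scratch
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    # With only two predictors, the R^2 from regressing one on the
    # other equals their squared correlation.
    r2 = pearson(x1, x2) ** 2
    tolerance = 1 - r2          # share of variance NOT explained by the other
    return 1 / tolerance, tolerance

# Hypothetical predictors: x2 is nearly a multiple of x1, so the pair
# should fail both cutoffs (VIF > 10, tolerance < .10).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 6, 8, 10, 13]
vif, tol = vif_two_predictors(x1, x2)
```

For this near-collinear pair the VIF comes out well above the cutoff of 10 and the tolerance well below .10, flagging a multicollinearity problem.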
Stage 6: Validation of the Results

• To ensure that the results are generalizable to the population and not specific to the sample used in estimation. How?
• By obtaining another sample from the same population and comparing the two sets of results, or by estimating the regression model on two or more subsamples of the data.
Evaluating Alternative Regression Models
Confirmatory regression model:
- An alternative to stepwise regression estimation, with a focus on prediction and explanation.
- The researcher specifies the independent variables to be included in the regression equation.
- All 13 perceptual measures are included as independent variables and entered directly into the regression equation at the same time.
- The researcher then examines:
1) Model fit
2) Interpretation
Regression Analysis Terms
• Explained variance = R2 (coefficient of determination).
• Unexplained variance = residuals (error).
• Adjusted R-Square = reduces the R2 by taking into account the
sample size and the number of independent variables in the
regression model (It becomes smaller as we have fewer
observations per independent variable).
• Standard Error of the Estimate (SEE) = a measure of the
accuracy of the regression predictions. It estimates the variation
of the dependent variable values around the regression line. It
should get smaller as we add more independent variables, if they
predict well.
Regression Analysis Terms
• Total Sum of Squares (SST) = the total amount of variation that exists to be explained by the independent variables. SST = SSE + SSR.
• Sum of Squared Errors (SSE) = the variance in the dependent
variable not accounted for by the regression model =
residual. The objective is to obtain the smallest possible sum
of squared errors as a measure of prediction accuracy.
• Sum of Squares Regression (SSR) = the amount of
improvement in explanation of the dependent variable
attributable to the independent variables.
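The terms above can be tied together in a short sketch. The observed values `y` and model predictions `yhat` below are hypothetical; note that the decomposition SST = SSE + SSR holds exactly only for a least-squares fit with an intercept, so SSR is taken as SST minus SSE here.

```python
from math import sqrt

def fit_summary(y, yhat, k):
    """y: observed values; yhat: model predictions;
    k: number of independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)             # total variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual (error)
    ssr = sst - sse                                       # explained variation
    r2 = ssr / sst                                        # coefficient of determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)         # penalizes extra predictors
    see = sqrt(sse / (n - k - 1))                         # standard error of estimate
    return {"SST": sst, "SSE": sse, "SSR": ssr,
            "R2": r2, "adj_R2": adj_r2, "SEE": see}

# Hypothetical observations and predictions from some fitted k = 2 model
y = [3, 5, 7, 10, 11, 14]
yhat = [3.5, 4.5, 7.5, 9.0, 11.5, 14.0]
stats = fit_summary(y, yhat, k=2)
```

Because adjusted R² divides by n - k - 1, adding an independent variable that predicts poorly lowers it even though plain R² can only stay the same or rise; SEE likewise shrinks only when added variables genuinely improve prediction.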
Thank you
