15. Multiple Linear Regression

The document provides an overview of intermediate biostatistics, focusing on linear regression models, including simple and multiple regression, assumptions, and model building strategies. It discusses the importance of understanding the relationships between predictor variables and response variables, as well as techniques for assessing model adequacy and addressing issues like multicollinearity. Additionally, it covers various statistical tests and metrics used in regression analysis, such as R-squared, adjusted R-squared, and ANOVA.


Intermediate Biostatistics
By Wondwossen Terefe
Assistant Professor of Biostatistics at Mekele University
Senior Biostatistician at Tulane International Ethiopia
Overview

O LR vs MLR
O MLR with examples
O Assumptions
O General strategy
O Summary
The linear model with a single predictor variable X can easily be extended to two or more predictor variables:

Y = β0 + β1X1 + β2X2 + ... + βpXp + ε
The General Idea
Simple regression considers the relation
between a single explanatory variable and
response variable

4
The General Idea
Multiple regression simultaneously
considers the influence of multiple
explanatory variables on a response
variable Y
The intent is to look at
the independent effect
of each variable while
“adjusting out” the
influence of potential
confounders

5
Regression Modeling

O A simple regression model (one independent variable) fits a regression line in 2-dimensional space

O A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space
Simple Regression Model
Regression coefficients are estimated by minimizing Σresiduals² (i.e., the sum of the squared residuals) to derive this model:

ŷ = a + bX

The standard error of the regression (sY|x) is based on the squared residuals:

sY|x = √( Σresiduals² / (n - 2) )
Multiple Regression Model
Again, estimates for the multiple slope coefficients are derived by minimizing Σresiduals² to derive this multiple regression model:

ŷ = a + b1X1 + b2X2 + ... + bkXk

Again, the standard error of the regression is based on the Σresiduals²:

sY|x = √( Σresiduals² / (n - k - 1) )
Multiple Regression Model
O Intercept α predicts where the regression plane crosses the Y axis
O Slope for variable X1 (β1) predicts the change in Y per unit X1, holding X2 constant
O Slope for variable X2 (β2) predicts the change in Y per unit X2, holding X1 constant
Multiple Regression Model

A multiple regression model with k independent variables fits a regression "surface" in k + 1 dimensional space (cannot be visualized)
Categorical Explanatory Variables in Regression Models
O Categorical independent variables can be incorporated into a regression model by converting them into 0/1 ("dummy") variables
O For binary variables, code dummies "0" for "no" and "1" for "yes"
Dummy Variables: More Than Two Levels
For categorical variables with k categories, use k-1 dummy variables.

SMOKE2 has three levels, initially coded:
0 = non-smoker
1 = former smoker
2 = current smoker

Use k - 1 = 3 - 1 = 2 dummy variables to code this information, as in the sketch below.
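A minimal Stata sketch of this k-1 dummy coding (assuming a dataset in memory with the three-level variable smoke2 from the slide and a placeholder outcome y):

    * Build k-1 = 2 dummies for smoke2 (0 = non-smoker, 1 = former, 2 = current)
    gen former  = (smoke2 == 1) if !missing(smoke2)   // 1 if former smoker, else 0
    gen current = (smoke2 == 2) if !missing(smoke2)   // 1 if current smoker, else 0
    regress y former current                          // non-smoker is the reference category

    * Equivalent, letting Stata create the dummies via factor-variable notation:
    regress y i.smoke2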
[Venn diagram: the variance in Y is partitioned into the unique variance explained by X1, the unique variance explained by X2, the common variance explained by X1 and X2, and the variance NOT explained by X1 and X2.]

A "good" model
[Venn diagram: X1 and X2 each overlap substantially with Y but only slightly with each other.]
Y  o  1 X 1   2 X 2  ...   p X p  

intercept Partial residuals


Regression
Coefficients
Partial Regression Coefficients
(slopes): Regression coefficient of X
after controlling for (holding all other
predictors constant) influence of other
variables from both X and Y.
15
The Matrix Algebra of Ordinary Least Squares

Intercept and slopes:
β = (X'X)^-1 X'Y

Predicted values:
Ŷ = Xβ

Residuals:
e = Y - Ŷ
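A minimal Stata/Mata sketch of these matrix formulas (the variable names y, x1 and x2 are placeholders, not from a specific dataset):

    mata:
        y = st_data(., "y")                   // outcome vector
        X = st_data(., ("x1", "x2"))          // predictor matrix
        X = (X, J(rows(X), 1, 1))             // append a column of ones for the intercept
        b = invsym(X' * X) * X' * y           // b = (X'X)^-1 X'Y  (intercept is the last element)
        yhat = X * b                          // predicted values
        e = y - yhat                          // residuals
        b                                     // display the estimates
    end

The same estimates are produced by: regress y x1 x2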
Regression Statistics
How good is our model?

SST = Σ(Y - Ȳ)²      (total sum of squares)
SSR = Σ(Ŷ - Ȳ)²      (regression sum of squares)
SSE = Σ(Y - Ŷ)²      (error sum of squares)
Coefficient of determination
(R2)
O Also known as the squared multiple
correlation coefficient
O Usually report R2 instead of R
O R2 = % of variance in DV explained
by combined effects of the IVs
O Analogous to r2

18
Interpretation of R2

R2 = .30 is good for the social sciences and health:
O R2 = .10 = small
O R2 = .20 = medium
O R2 = .30 = large
Regression Statistics

R² = SSR / SST

Coefficient of determination: used to judge the adequacy of the regression model.
Adjusted R2
O Adjusted R2 used for estimating explained
variance in a population.
O As number of predictors approaches N, R2 is
inflated
O Hence report R2 and adjusted R2
particularly for small N and where results are
to be generalised
O If N is small, take more note of adjusted R2

21
Regression Statistics

R²adj = 1 - (1 - R²) (n - 1) / (n - k - 1)

n = sample size
k = number of independent variables

Adjusted R² is not biased!
Regression Statistics
Standard error of the regression model:

Se² = SSE / (n - k - 1), where SSE = Σ(Y - Ŷ)²

Se² = MSE, so Se = √MSE
ANOVA
H0: β1 = β2 = ... = βk = 0
HA: βi ≠ 0 for at least one i

Source       df       SS    MS         F           P-value
Regression   k        SSR   SSR / df   MSR / MSE   P(F)
Residual     n-k-1    SSE   SSE / df
Total        n-1      SST

If P(F) < α, the regression model gives significantly better prediction of Y than simply predicting the mean of Y.

ANOVA tests the significance of the overall regression.

Hypothesis Tests for Regression Coefficients

H0: βi = 0
H1: βi ≠ 0

t(n-k-1) = (bi - βi) / Sbi
Hypothesis Tests for Regression Coefficients
H0: βi = 0
HA: βi ≠ 0

t(n-k-1) = (bi - βi) / Se(bi) = (bi - βi) / √(Se² Cii)

(For simple regression, Se(b1)² = Se² / Sxx.)
Confidence Interval on Regression Coefficients

bi - t(α/2, n-k-1) √(Se² Cii)  ≤  βi  ≤  bi + t(α/2, n-k-1) √(Se² Cii)

Confidence interval for βi
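In Stata, a single regress call prints the ANOVA table with the overall F test, the coefficient t tests, and these confidence intervals. A minimal sketch (placeholder variable names):

    regress y x1 x2 x3               // ANOVA F, then b, SE, t, P>|t| and 95% CI per coefficient
    regress y x1 x2 x3, level(99)    // same model with 99% confidence intervals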
1
 ( X ' X ) X ' Y

28
1
 ( X ' X ) X ' Y

29
1
 ( X ' X ) X ' Y

30
t(n-k-1) = (bi - βi) / Se(bi) = (bi - βi) / √(Se² Cii)

[Slide 31: worked example, shown as an image in the original.]
Model Development/Building

Avoid predictors (Xs) that do not contribute significantly to model prediction.
Types of Model Building

O Forward addition
O Backward elimination
O Standard or direct (simultaneous)
O Hierarchical or sequential
O Stepwise (forward & backward)
Forward selection

O The 'best' predictor variables are entered, one by one, if they reach a criterion (e.g., p < .05)
O Best predictor = the X with the highest r with Y
O Add one variable at a time until the contribution is insignificant
O Computer driven – controversial
Forward Selection

35
Backward elimination

O All predictor variables are entered, then the weakest predictors are removed, one by one, if they meet a criterion (e.g., p > .05)
O Remove one variable at a time, starting with the "worst", until R² drops significantly
O Computer driven – controversial
Backward Elimination

37
Direct or Standard

O All predictor variables are entered together
O Allows assessment of the relationship between all predictor variables and the criterion (Y) variable if there is good theoretical reason for doing so
O Manual technique & commonly used
Hierarchical (Sequential)

O Researcher defines order of entry for the


variables, based on theory.
O IVs are entered in blocks or stages.
O May enter ‘nuisance’ variables first to
‘control’ for them, then test ‘purer’ effect
of next block of important variables.
O Manual technique & commonly used.

39
Stepwise
O Combines both forward and backward.
O At each step, variables may be entered or
removed if they meet certain criteria or some
order
O By size or correlation with dependent variable
O In order of significance
O Useful for developing the best prediction
equation from the smallest no. of variables.
O Means that redundant predictors will be
removed.
O Computer driven – controversial

40
Stepwise Regression (example)

Include X3
Include X6
Include X2
Include X5
Remove X2 (when X5 was inserted into the model, X2 became unnecessary)
Include X7
Remove X7 - it is insignificant
Stop
Final model includes X3, X5 and X6
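In Stata these selection strategies can be run with the stepwise prefix; a hedged sketch with placeholder variable names (pe() is the entry p-value, pr() the removal p-value):

    stepwise, pe(.05): regress y x1 x2 x3 x4                    // forward selection
    stepwise, pr(.10): regress y x1 x2 x3 x4                    // backward elimination
    stepwise, pe(.05) pr(.10) forward: regress y x1 x2 x3 x4    // forward stepwise with removal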
Which method?
O Standard: To assess impact of all IVs
simultaneously
O Hierarchical: To test specific hypotheses
derived from theory
O Stepwise: If goal is accurate statistical
prediction – computer driven

42
Model Selection: The General Case

       [SSE(x1, x2, ..., xq) - SSE(x1, x2, ..., xq, xq+1, ..., xk)] / (k - q)
F = -----------------------------------------------------------------------
                SSE(x1, x2, ..., xq, xq+1, ..., xk) / (n - k - 1)

Reject H0 if F > F(α, k-q, n-k-1)
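After fitting the full model, Stata's test command carries out this partial F test for the extra predictors. A minimal sketch (placeholder names; x3 and x4 play the role of x(q+1), ..., x(k)):

    regress y x1 x2 x3 x4     // full model with all k predictors
    test x3 x4                // partial F test of H0: the coefficients on x3 and x4 are both 0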
Linear Regression Assumptions
O Mean of the distribution of error is 0
O Distribution of error has constant variance
O Distribution of error is normal
O Errors are independent

Extremely Important!
MLR Model: Basic Assumptions
O Independence: the data of any particular subject are independent of the data of all other subjects
O Normality: in the population, the data on the dependent variable are normally distributed for each of the possible combinations of the levels of the X variables; each of the variables is normally distributed
O Homoscedasticity: in the population, the variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal
O Linearity: in the population, the relation between the dependent variable and each independent variable is linear when all the other independent variables are held constant
Tests for Normality of Residuals (Stata)

O swilk: performs the Shapiro-Wilk W test for normality.
O kdensity: produces a kernel density plot with a normal distribution overlaid.
O pnorm: graphs a standardized normal probability (P-P) plot.
O qnorm: plots the quantiles of a variable against the quantiles of a normal distribution.
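A minimal sketch of these checks, run on the residuals of a fitted model (placeholder variable names):

    regress y x1 x2
    predict e, residuals      // store the residuals
    swilk e                   // Shapiro-Wilk W test for normality
    kdensity e, normal        // kernel density with a normal curve overlaid
    pnorm e                   // standardized normal probability (P-P) plot
    qnorm e                   // quantiles of e versus quantiles of the normal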
Tests for Heteroscedasticity (Stata)

O hettest: performs the Cook-Weisberg test.

O rvfplot: graphs a residual-versus-fitted plot.
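A minimal sketch (in current Stata the Cook-Weisberg test is run as estat hettest after regress; older releases used the bare hettest shown on the slide):

    regress y x1 x2
    estat hettest             // Breusch-Pagan / Cook-Weisberg test for heteroscedasticity
    rvfplot, yline(0)         // residual-versus-fitted plot; look for fanning in or out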
Diagnostic Tests for Regressions
Expected distribution of residuals for a linear model with a normal distribution of residuals (errors).

[Plot: residuals εi versus Xi, scattered evenly around zero.]
Standardized Residuals

di = ei / √(Se²)

[Plot: standardized residuals di versus observation number, ranging roughly between -2 and 2.5.]
Normality & homoscedasticity

Normality
O If non-normality,
there will be
heteroscedasticity

Homoscedasticity
O Variance around
regression line is
same throughout the
distribution
O Even spread in 50
residual plots
Homoscedasticity

Heteroscedasticity is not fatal, but it weakens the analysis.

Tests for
Multicollinearity

O vif calculates the variance


inflation factor for the
independent variables in the
linear model.

52
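A minimal sketch (placeholder names; estat vif after regress also reports 1/VIF, which is the tolerance statistic mentioned on a later slide):

    regress y x1 x2 x3
    estat vif                 // VIF per predictor; values above about 5 suggest a problem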
Multicollinearity
O Many health research studies have large
numbers of predictor variables
O Problems arise when the various predictors
are highly related among themselves
(collinear)
O Estimated regression coefficients can change
dramatically, depending on whether or not
other predictor(s) are included in model.
O Standard errors of regression coefficients can
increase, causing non-significant t-tests and
wide confidence intervals
O Variables are explaining the same variation in Y
53
Multicollinearity
• A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates (i.e., large sampling variation)
• Imprecise estimates of slopes: even the signs of the coefficients may be misleading
• t-tests which fail to reveal significant factors
Multicollinearity

O Detect via:
— Correlation matrix - are there large
correlations among IVs?
— Tolerance statistics - if < .2 then exclude
that variable.
— Variance Inflation Factor (VIF) - looking
for < 5, otherwise exclude variable.

55
Scatter Plot

56
Multicollinearity
• If the F-test for significance of regression is significant, but tests on the individual regression coefficients are not, multicollinearity may be present.
• Variance Inflation Factors (VIFs) are very useful measures of multicollinearity. If any VIF exceeds 5, multicollinearity is a problem.

VIF(βi) = Cii = 1 / (1 - Ri²)

where Ri² comes from regressing Xi on the other predictors.
Multicollinearity
 The stepwise selection process can
help eliminate correlated predictor
variables
 Other advanced procedures such as
ridge regression can also be applied
 Care should be taken during the
model selection phase as
multicollinearity can be difficult to
detect and eliminate
58
Intercorrelation or Collinearity
O If the two independent variables are uncorrelated, we can uniquely partition the amount of variance in Y due to X1 and X2, and bias is avoided.
O Small intercorrelations between the independent variables will not greatly bias the b coefficients.
O However, large intercorrelations will bias the b coefficients, and for this reason other mathematical procedures are needed.
[Diagram: correlation and regression relate a single X to Y; partial correlation and multiple regression relate several predictors (X1, X2) to Y.]
Partial Correlation
O Measures the strength of association between
Y and a predictor, controlling for other
predictor(s).
O Squared partial correlation represents the
fraction of variation in Y that is not explained
by other predictor(s) that is explained by this
predictor.

rYX2·X1 = (rYX2 - rYX1 rX1X2) / √[(1 - rYX1²)(1 - rX1X2²)],    -1 ≤ rYX2·X1 ≤ 1
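Stata's pcorr command reports these partial correlations directly; a minimal sketch (placeholder names):

    pcorr y x1 x2             // partial correlation of y with each predictor, controlling for the other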
Partial correlations
O Partial correlation: the r between X and Y after controlling for (partialling out) the influence of a 3rd variable from both X and Y.

e.g., Does:
1. # years of marriage (IV) predict
2. marital satisfaction (DV)
3. after # children is controlled for?
Coefficient of Partial
Determination
O Measures proportion of the variation in Y
that is explained by X2, out of the
variation not explained by X1
O Square of the partial correlation
between Y and X2, controlling for X1.
rYX2·X1² = (R² - rYX1²) / (1 - rYX1²),    0 ≤ rYX2·X1² ≤ 1

• where R² is the coefficient of determination for the model with both X1 and X2: R² = SSR(X1, X2) / TSS
• Extends to more than 2 predictors (pp. 414-415)
Multiple Partial
Correlation
O The concept of a partial correlation is related to that of a
partial F test.
O “To what extent are two variables, say X and Y,
correlated after accounting for a control variable, say
Z”?
O Preliminary 1: Regress X on the control variable Z
O Obtain the residuals
O These residuals represent the information in X that is independent
of Z
O Preliminary 2: Now regress Y on the control variable Z
O Obtain the residuals
O These residuals represent the information in Y that is independent
of Z
O These two sets of residuals permit you to look at the relationship between X and Y, independent of Z.
Multiple partial
correlation
O Partial correlation (X,Y | controlling for Z)=
Correlation (residuals of X regressed on Z,
residuals of Y regressed on Z)
O If there is more than one control variable Z, the
result is a multiple partial correlation
O A nice identity allows us to compute a
partial correlation by hand from a
multivariable model development
O Recall that R2 = [model sum of
squares]/[total sum of squares] = MSS / TSS
O A partial correlation is also a ratio of sums of
squares.

65
Cont…

66
Statistical Definition of
the Partial F Test
O Research Question: Does inclusion/exclusion
of the “extra” predictors explain significantly
more of the variability in outcome compared
to the variability that is explained by the
predictors that are already in the model?
O HO: Addition of Xp+1 ... Xp+k is of no statistical
significance for the prediction of Y after
controlling for the predictors X1 ... Xp meaning
that:
O βp+1 =βp+2 = ... =βp+k =0
O HA: not all of βp+1, ..., βp+k are 0

67
Cont…

68
Cont…

69
A Suggested Statistical Criterion for
Determination of Confounding
O A variable Z might be judged to be a
confounder of an X-Y relationship if
BOTH of the following are satisfied:
O Its inclusion in a model that already
contains X as a predictor has
adjusted significance level < .05; and
O Its inclusion in the model alters the
estimated regression coefficient for X
by 15-20% or more, relative to the
model that contains only X as a
predictor.
70
71
A Suggested Statistical Criterion
for Assessment of Interaction

O A “candidate” interaction variable


might be judged to be worth
retaining in the model if BOTH of
the following are satisfied:
O The partial F test for its inclusion
has significance level < .05.
O Its inclusion in the model alters the
estimated regression coefficient
for the main effects by 15-20% or
more.
72
Modeling Interactions
O Statistical Interaction: When the effect of one
predictor (on the response) depends on the
level of other predictors.
O Can be modeled (and thus tested) with cross-
product terms (case of 2 predictors):
O E(Y) = a + b1X1 + b2X2 + b3X1X2
O X2 = 0  =>  E(Y) = a + b1X1
O X2 = 10 =>  E(Y) = a + b1X1 + 10b2 + 10b3X1 = (a + 10b2) + (b1 + 10b3)X1
O The effect of increasing X1 by 1 on E(Y) depends on the level of X2, unless b3 = 0 (t-test)
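A minimal Stata sketch of such a cross-product model (placeholder names; ## adds both main effects and the interaction):

    regress y c.x1##c.x2      // fits a + b1*x1 + b2*x2 + b3*(x1*x2); the t test on c.x1#c.x2 tests b3 = 0

    * Equivalent by hand:
    gen x1x2 = x1 * x2
    regress y x1 x2 x1x2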
Interaction Effects
Interaction implies that how variables occur together has an impact on prediction of the dependent variable.

Y = β0 + β1X1 + β2X2 + β3X1X2 + e

µY = β0 + β1X1 + β2X2 + β3X1X2
Interaction Effects

[Figure 12.26: µY plotted against X1 at X2 = 2 and X2 = 5. In one panel the slopes differ (µY = 18 + 5X1 versus µY = 30 - 10X1), indicating interaction; in the other the lines are parallel (µY = 18 + 15X1 versus µY = 30 + 15X1), indicating no interaction.]
Quadratic and Second-Order Models

Quadratic effects:

Y = β0 + β1X1 + β2X1² + e

Complete second-order models:

Y = β0 + β1X1 + β2X2 + β3X1X2 + β4X1² + β5X2² + e

Y = β0 + β1X1 + β2X2 + β3X3 + β4X1X2 + β5X1X3 + β6X2X3 + β7X1² + β8X2² + β9X3² + e
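Quadratic and second-order terms can be added the same way; a minimal Stata sketch (placeholder names):

    regress y c.x1##c.x1                        // quadratic model: x1 and x1 squared
    regress y c.x1##c.x2 c.x1#c.x1 c.x2#c.x2    // complete second-order model in x1 and x2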
Comparing Regression Models
O Conflicting Goals: Explaining variation in Y while
keeping model as simple as possible
(parsimony)
O We can test whether a subset of k-g predictors
(including possibly cross-product terms) can be
dropped from a model that contains the
remaining g predictors. H0: bg+1=…=bk =0
O Complete Model: Contains all k predictors
O Reduced Model: Eliminates the predictors from H0
O Fit both models, obtaining the Error sum of
squares for each (or R2 from each)
77
Comparing Regression
Models
O H0: bg+1=…=bk = 0 (After removing the effects
of X1,…,Xg, none of other predictors are
associated with Y)
O Ha: H0 is false

Test statistic: Fobs = [(SSEr - SSEc) / (k - g)] / [SSEc / (n - (k+1))]

P = P(F ≥ Fobs)

P-value based on the F-distribution with k-g and n-(k+1) d.f.

Compare the "Model 2" and "Model 3" models using a partial F test.

79
Dealing with outliers
O Extreme cases should be deleted or modified.
O Univariate outliers - detected via initial data
screening
O Bivariate outliers – detected via scatterplots
O Multivariate outliers - unusual combination of
predictors…

80
Multivariate outliers
O Can use Mahalanobis distance or Cook’s D as a
MV outlier screening procedure
O A case may be within normal range on all
variables, but represent a multivariate outlier
which unduly influences multivariate test
results
e.g., a person who:
O Is 19 years old
O Has 3 children
O Has an undergraduate degree
O Identify & check unusual cases
81
Multivariate outliers

O Mahalanobis distance is distributed as χ² with d.f. equal to the number of predictors (α = .001)
O If any case has a Mahalanobis distance greater than the critical level -> multivariate outlier.
O Cook's D identifies influential cases; values > 1 are considered unusual.
Detecting Sample Outliers

• Sample leverages
• Standardized residuals
• Cook's distance measure

Standardized residual = (Yi - Ŷi) / (s √(1 - hi))
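All three diagnostics are available via predict after regress; a minimal sketch (placeholder names):

    regress y x1 x2
    predict h, leverage                    // sample leverages (hat values)
    predict rs, rstandard                  // standardized residuals
    predict d, cooksd                      // Cook's distance
    list if abs(rs) > 2 & !missing(rs)     // flag cases with large standardized residuals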
Regression coefficients

Y = b1X1 + b2X2 + ... + biXi + a + e

O Intercept (a)
O Slopes (b): unstandardised and standardised
O Slopes are the weighted loadings of the IVs, adjusted for the other IVs in the model.
Cook's Distance Measure

Di = [1 / (k + 1)] × [hi / (1 - hi)] × (standardized residual)²

   = [(Yi - Ŷi)² / ((k + 1)s²)] × [hi / (1 - hi)²]

Table 12.1
k      1 or 2    3 or 4    ≥5
DMAX   .8        .9        1.0
Standardized Regression Coefficients
O Measures the change in E(Y) in standard
deviations, per standard deviation change in
Xi, controlling for all other predictors (bi*)
O Allows comparison of variable effects that are
independent of units
O Estimated standardized regression coefficients:

bi* = bi (sXi / sY)

• where bi is the partial regression coefficient and sXi and sY are the sample standard deviations for the two variables
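Stata reports these standardized coefficients with the beta option; a minimal sketch (placeholder names):

    regress y x1 x2, beta     // prints standardized (beta) coefficients in place of the CI columns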
Unstandardised regression coefficients
O B = unstandardised regression coefficient
O Used for regression equations
O Used for predicting Y scores
O Can't be compared with one another unless all IVs are measured on the same scale

Standardised regression coefficients
O Beta (β) = standardised regression coefficient
O Used for comparing the relative strength of predictors
O β = r in simple LR, but this holds in MLR only when the IVs are uncorrelated

Relative importance of IVs
O Which IVs are the most important?
O Compare the standardised regression coefficients (β's)
Split sample validation
O It is expected that the results obtained from split sample validations will vary somewhat from the results obtained from the analysis using the full data set. We will use the following as our criteria that the validation verified our analysis and supports the generalizability of our findings:
O First, the overall relationship between the dependent variable and the set of independent variables must be statistically significant for both validation analyses.
O Second, the R² for each validation must be within 5% (plus or minus) of the R² for the model using the full sample.
Split sample validation - 2
O Third, the pattern of statistical significance for the coefficients of the independent variables for both validation analyses must be the same as the pattern for the full analysis, i.e. the same variables are statistically significant or not significant.

O For stepwise multiple regression, we require that the same variables be significant, but it is not required that they enter in exactly the same order.
O For hierarchical multiple regression, the R² change for the validation must be statistically significant.
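A minimal sketch of a 50/50 split-sample validation in Stata (placeholder names; a real analysis would apply the same transformations and outlier handling in both halves):

    set seed 12345
    gen half = runiform() < .5           // random 50/50 split indicator

    regress y x1 x2 x3                   // full-sample model: note R-squared and significant IVs
    regress y x1 x2 x3 if half == 1      // validation model, first half
    regress y x1 x2 x3 if half == 0      // validation model, second half
    * Check: overall F significant in both halves, each R-squared within about 5% of the
    * full-sample value, and the same pattern of significant coefficients.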
Notes
O Findings are stated on the results for the analysis of the full data set, not the validations.

O If our validation analysis does not support the findings of the analysis on the full data set, we will declare the answer to the problem to be false. There is, however, another common option, which is to only cite statistical significance for independent variables supported by the validation analysis as well as the full data set analysis. All other variables are considered to be non-significant. Generally it is the independent variables with the weakest individual relationship to the dependent variable which fail to validate.
Steps prior to the validation analysis
O Prior to the split sample validation analysis, we must test for conformity to assumptions and examine outliers, making whatever transformations are needed and removing outliers.

O Next, we must solve the regression problem to make certain that the findings (existence, strength, direction, and importance of relationships) stated in the problem are correct and, therefore, in need of validation before final interpretation.

O When we do the validation, we include whatever transformations and omissions of outliers were present in the model we want to validate.
What is MLR?
O Use of several IVs to predict a DV
O Provides a measure of overall fit (R)
O Makes adjustments for inter-
relationships among predictors
O e.g. IVs = height, gender DV = FEV
O Weights each predictor

94
Research question
O Do number of cigarettes (IV1) , exercise (IV2)
and cholesterol (IV3) predict CHD mortality
(DV)?

Cigarettes
Exercise CHD Mortality
Cholesterol

95
Research question
O “Does the number of years of
psychological study (IV1) and the
number of years of counseling
experience (IV2) predict clinical
psychologists’ effectiveness in treating
mental illness (DV)?”

Study
Experience Effectiveness
96
Example
"Does 'ignoring problems' (IV1) and 'worrying' (IV2) predict 'psychological distress' (DV)?"

[Path diagram: zero-order correlations with Y (Distress) of about .32 (Ignore) and .52 (Worry); the correlation between the two predictors is .34.]
100
101
Coefficients a

Unstandardized Standardized
Coefficients Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) 138.932 4.680 29.687 .000
Worry -11.511 1.510 -.464 -7.625 .000
Ignore the Problem -4.735 1.780 -.162 -2.660 .008
a. Dependent Variable: Psychological Distress

102
Coefficients a

Correlations
Model Zero-order Partial
1 Worry -.521 -.460
Ignore the Problem -.325 -.178
a. Dependent Variable: Psychological Distress

103
[Path diagram: the partial correlations with Y, about .18 (Ignore) and .46 (Worry), added alongside the zero-order correlations (.32, .52) and the inter-predictor correlation (.34).]
Prediction equations

Linear regression:
Psych. Distress = 119 - 9.50*Ignore
R² = .11

Multiple linear regression:
Psych. Distress = 139 - 4.7*Ignore - 11.5*Worry
R² = .30
Confidence interval for the slope

b1 = -5.44
The 95% CI: -6.17 ≤ β1 ≤ -4.70
The estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1 °F increase.
Confidence interval for the slope

Mental Health is reduced by between 8.5 and 14.5 units per unit increase in Worry.
Mental Health is reduced by between 1.2 and 8.2 units per unit increase in Ignore the Problem.
Example – Effect of violence,
stress, social support on
internalizing behavior

108
Study
OParticipants were children 8 to
12 years
OLived in high-violence areas,
USA
OHypothesis: violence and stress
lead to internalising behavior,
whereas social support would
reduce internalising behaviour.
109
Variables

O Predictors
  O Degree of witnessing violence
  O Measure of life stress
  O Measure of social support
O Outcome
  O Internalising behaviour (e.g., depression, anxiety symptoms)
Correlations (Pearson)

                                  Amount violence   Current   Social
                                  witnessed         stress    support
Current stress                    .050
Social support                    .080              -.080
Internalizing symptoms on CBCL    .200*             .270**    -.170

*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
R²

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.37a   .135       .108                2.2198

a. Predictors: (Constant), Social support, Current stress, Amount violence witnessed
Test for overall significance
• Shows if there is a linear relationship between all of the X variables taken together and Y
• Hypotheses:
  H0: β1 = β2 = ... = βp = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)
Test for overall significance
O Significance test of R² given by the ANOVA table

ANOVA(b)
             Sum of Squares   df   Mean Square   F       Sig.
Regression   454.482          1    454.48        19.59   .00(a)
Residual     440.757          19   23.198
Total        895.238          20

a. Predictors: (Constant), Cigarette Consumption per Adult per Day
b. Dependent Variable: CHD Mortality per 10,000
Test for significance: individual variables
• Shows if there is a linear relationship between each variable Xi and Y.
• Hypotheses:
  H0: βi = 0 (no linear relationship)
  H1: βi ≠ 0 (linear relationship between Xi and Y)
Regression coefficients

Coefficients(a)
                            Unstandardized Coefficients   Standardized Coefficients
                            B        Std. Error           Beta      t      Sig.
(Constant)                  .477     1.289                          .37    .712
Amount violence witnessed   .038     .018                 .201      2.1    .039
Current stress              .273     .106                 .247      2.6    .012
Social support              -.074    .043                 -.166     -2     .087

a. Dependent Variable: Internalizing symptoms on CBCL

Regression equation

Ŷ = b1X1 + b2X2 + b3X3 + b0
  = 0.038 Wit + 0.273 Stress - 0.074 SocSupp + 0.477

O A separate coefficient or slope for each variable
O An intercept (here it's called b0)

Interpretation

Ŷ = 0.038 Wit + 0.273 Stress - 0.074 SocSupp + 0.477

O Slopes for Witness and Stress are positive, but the slope for Social Support is negative.
O If you had subjects with identical Stress and Social Support, a one unit increase in Witness would produce a .038 unit increase in Internalising symptoms.
Predictions
If Witness = 20, Stress = 5, and SocSupp = 35, then we would predict that internalising symptoms would be .012:

Ŷ = .038*Wit + .273*Stress - .074*SocSupp + 0.477
  = .038(20) + .273(5) - .074(35) + 0.477
  = .012
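The same arithmetic can be checked in Stata with display, using the coefficients reported on the earlier slide:

    display .038*20 + .273*5 - .074*35 + .477    // = .012, the predicted internalising score
    * After running the regression, predict yhat computes this for every case.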
120
Variables
O IVs:
O Human & Built Capital (Human Development Index)
O Natural Capital (Ecosystem services per km2)
O Social Capital (Press Freedom)
O DV = Life satisfaction
O Units of analysis: Countries
(N = 57; mostly developed countries, e.g., in
Europe and America)

121
O R² = .35

O Two sig. IVs (not Social Capital - dropped)
O R² = .72 (after dropping 6 outliers)
126
Assumptions
O IVs = metric (interval or ratio) or
dichotomous
O DV = metric (interval or ratio)
O Linear relations exist between IVs & DVs
O IVs are not overly correlated with one
another (multicollinearity- e.g., over .7)
O Normality enhances solution
O Homoscedasticity
O Ratio of cases to IVs; total N:
• Min. 5:1; > 20 cases total
127
• Ideal 20:1; > 100 cases total
Causality
O Like correlation, regression does not tell us
about the causal relationship between
variables.
O In many analyses, the IVs and DVs could be
switched – therefore, it is important:
O Take a theoretical position
O Acknowledge alternative explanations

128
General MLR strategy
O Check assumptions
O Conduct MLR – choose type
O Interpret the output
O Develop a regression equation

129
1. Check assumptions
O Assumptions
(Xs not correlated, X-Y linear relations,
normal distributions, homoscedasticity)
O Check histograms (normality)
O Check scatterplots (linearity & outliers)
O Check correlation table (linearity &
collinearity)
O Check influential outlying cases (mv
outliers)
O Check residual plots
130
2. Conduct MLR
Conduct a multiple linear regression:
O Standard
O Hierarchical
O Stepwise
O Forward
O Backward

131
3. Interpret the results
Interpret the technical and psychological
meaning of the results, based on:
O Overall amount of Y predicted (R, R2,
Adjusted R2, the statistical
significance of R)
O Changes in R and F change if
hierarchical.
O Coefficients for IVs
Standardised and unstandardised
regression coefficients for IVs in each
model (b, B).
O Relations between X predictors (r)
O Zero-order and partial correlations of the IVs with Y
4. Regression equation
O MLR is usually for explanation,
sometimes prediction
O If useful, develop a regression equation for
the final model.
O Interpret constant and slopes.

133
References
O Kliewer, W., Lepore, S. J., Oskin, D., & Johnson, P. D. (1998). The role of social and cognitive processes in children's adjustment to community violence. Journal of Consulting and Clinical Psychology, 66, 199-209.
O Vemuri, A. W., & Constanza, R. (2006). The role of human, social, built, and natural capital in explaining life satisfaction at the country level: Toward a National Well-Being Index (NWI). Ecological Economics, 58, 119-133.
Goodness-of-Fit and Regression Diagnostics

a. Introduction and Terminology

Neither prediction nor estimation has meaning when the estimated model is a poor fit to the data:
Cont…
O Our eye “tells” us:
O A better fitting relationship between
X and Y is quadratic
O We notice different sizes of
discrepancies
O Some observed Y are close to the
fitted Y’ (e.g. near X=1 or X=8)
O Other observed Y are very far from
the fitted Y’ (e.g. near X=5)
136
Cont…
O Poor fits of the data to a fitted line can occur
for several reasons and can occur even when
the fitted line explains a large proportion (R2)
of the total variability in response:
O The wrong functional form (link function) was fit.
O Extreme values (outliers) exhibit uniquely large
discrepancies between observed and fitted
values.
O One or more important explanatory variables
have been omitted.
O One or more model assumptions have been
violated.

137
Cont…
O Consequences of a poor fit include:
O We learn the wrong biology.
O Comparison of group differences aren’t
“fair” because they are unduly influenced
by a minority.
O Comparison of group means aren’t “fair”
because we used the wrong standard
error.
O Predictions are wrong because the fitted
model does not apply to the case of
interest.
138
Cont…
O Available techniques of goodness-of-fit
assessment are of two types:

O Systematic - those that explore the


appropriateness of the model itself
O Have we fit the correct model?
O Should we fit another model?
O Case Analysis – those that investigate the
influence of individual data points
O Are there a small number of individuals whose
inclusion in the analysis influences excessively
the choice of the fitted model?
139
Systematic Component

140
Case analysis

141
b. Assessment of Normality
O Recall what we are assuming wrt normality:
O Simple Linear Regression:
At each level “x” of the predictor variable X, the
outcomes Y are distributed normal with mean = μY|x=
β0 + β1x and constant variance σ2Y|x
O Multiple Linear Regression:
At each vector level “x = [x1, x2, ...,xp] ” of the
predictor vector X, the outcomes Y are distributed
normal with mean =
μY|x= β0 + β1x1+ β2x2 +...+ βpxp and constant variance
σ2Y|x

142
Some graphical assessments of normality and what to watch out for:

Method                                                What to watch out for
Histogram of the outcome variable Y                   Look for a normal shape of the histogram.
Histogram of the residuals                            Look for a normal shape of the histogram.
  (or studentized or jackknife residuals)
Quantile-quantile plot of the quantiles of the        Normally distributed residuals will appear,
  residuals versus the quantiles of the assumed       approximately, linear.
  normal distribution of the residuals
144
Hypothesis Tests of Normality and
what to watch out for:

145
Cont…
O Guidelines
  O In practice, the assessment of normality is made after assessment of other model assumption violations. The linear model is often more robust to violations of the assumption of normality, and the cure is often worse than the problem (e.g., transformation of the outcome variable).
  O Consider doing a scatterplot of the residuals. Look for:
    O a bell-shaped pattern
    O a center at zero
    O no gross outliers
147
148
149
c. Cook-Weisberg Test of Heteroscedasticity

O Evidence of a violation of homogeneity (this is heteroscedasticity) is seen when
  O there is increasing or decreasing variation in the residuals with the fitted values
  O there is increasing or decreasing variation in the residuals with the predictor X
Cont…

151
152
153
154
Outlier detection

O Outlier detection involves determining whether a residual (residual = actual - predicted) is an extreme negative or positive value.
O We may plot the residuals versus the fitted values to determine which errors are large, after running the regression.
O The command syntax was already demonstrated with the graph on page 55: rvfplot, border yline(0)
156
Limits of Standardized Residuals

O If the standardized residuals have


values in excess of 3.5 and -3.5, they
are outliers.
O If the absolute values are less than
3.5, as these are, then there are no
outliers
O While outliers by themselves only
distort mean prediction when the
sample size is small enough, it is
important to gauge the influence of
157
outliers.
158
159
Studentized Residuals

O Alternatively, we could form


studentized residuals. These are
distributed as a t distribution with
df=n-p-1, though they are not quite
independent.
O Therefore, we can approximately
determine if they are statistically
significant or not.
O •Belsley et al. (1980) recommended
the use of studentized residuals.
160
Studentized Residual

161
162
163
164
Robust statistical options when assumptions are violated

O Nonlinearity
  1. Transformation to linearity
  2. Nonlinear regression
O Influential outliers
  1. Robust regression with robust weight functions
  2. rreg y x1 x2
O Heteroskedasticity of residuals
  1. Regression with Huber/White/sandwich variance-covariance estimators
  2. regress y x1 x2, robust
O Residual autocorrelation correction
  1. Autoregression with prais y x1 x2, robust
  2. newey-west regression
O Nonnormality of residuals
  1. Quantile regression: qreg y x1 x2
  2. Bootstrapping the regression coefficients
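Hedged Stata sketches of the options listed above (placeholder variable names; prais and newey assume time-series data declared with tsset on an assumed time variable):

    rreg y x1 x2                              // robust regression with robust weight functions
    regress y x1 x2, robust                   // Huber/White/sandwich standard errors
    qreg y x1 x2                              // median (quantile) regression
    bootstrap, reps(500): regress y x1 x2     // bootstrapped regression coefficients

    tsset time                                // declare the time variable first
    prais y x1 x2, robust                     // Prais-Winsten autoregression
    newey y x1 x2, lag(1)                     // Newey-West standard errors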
Regression diagnostics for the detection of a poor fit:

The fit is poor because the true relationship is quadratic, not linear.
What do we mean by a poor fit?

O We notice that the discrepancies between


the observed and the fitted values are not
of consistent size.
O Some are large and some are small.
O Goodness-of-fit assessments are formal
techniques for identifying such
inconsistencies.
O These techniques become especially
important when a picture is not possible,
as when the number of predictors is
greater than one.
167
Cont…
O Assessing regression model adequacy was
introduced previously and we noticed that
these assessments are of two types:
O Systematic component
O Is the logistic model formulation correct?
O Is the assumption of linearity on the ln(odds)
scale correct?
O Does the fitted model predict well?
O Case analysis
O Is the fitted model excessively influenced by
one or a small number of individuals?
168
