Lecture 11 Correlation Edited
CORRELATIONAL RESEARCH
• Helps us describe the relationship between two or more
naturally occurring variables.
• It is a way of
\[
\text{Covariance}_{x,y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}
\]
CALCULATE COVARIANCE
Participant: 1 2 3 4 5 Mean S
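\[
\begin{aligned}
\mathrm{cov}(x, y) &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1} \\
&= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} \\
&= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} \\
&= \frac{17}{4} \\
&= 4.25
\end{aligned}
\]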
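The arithmetic in this worked example can be checked with a few lines of R. This is a minimal sketch that starts from the deviation scores shown in the calculation (the raw scores are not reproduced here); the variable names are illustrative.

```r
# Deviations from the mean, copied from the worked calculation
dev_x <- c(-0.4, -1.4, -1.4, 0.6, 2.6)
dev_y <- c(-3, -2, -1, 2, 4)

n <- length(dev_x)                           # N = 5 participants
cross_products <- dev_x * dev_y              # 1.2, 2.8, 1.4, 1.2, 10.4
covariance <- sum(cross_products) / (n - 1)  # sum of cross-products over N - 1
covariance                                   # 4.25
```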
PROBLEMS WITH COVARIANCE
• It is affected by the units of measurement.
• Standardize it!
\[
\text{Correlation Coefficient} = \frac{\text{Covariance}_{x,y}}{S_x S_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)\, S_x S_y}
\]
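To see why the units of measurement matter, the sketch below rescales one variable and compares the covariance with the correlation. The height and weight values are invented for illustration and are not lecture data.

```r
# Hypothetical data (assumption, not from the lecture)
height_m  <- c(1.60, 1.65, 1.70, 1.75, 1.80)  # height in metres
weight_kg <- c(55, 62, 64, 70, 78)            # weight in kilograms
height_cm <- height_m * 100                   # same heights in centimetres

cov_m  <- cov(height_m, weight_kg)
cov_cm <- cov(height_cm, weight_kg)  # 100 times larger: covariance depends on units

r_m  <- cor(height_m, weight_kg)
r_cm <- cor(height_cm, weight_kg)    # unchanged: correlation is unit-free
```

Changing metres to centimetres multiplies the covariance by 100 but leaves the standardized coefficient as it was.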
CALCULATE STANDARDIZED
COVARIANCE
Participant: 1 2 3 4 5 Mean S
\[
\text{Correlation Coefficient} = \frac{4.25}{(1.67)(2.92)} = .87
\]
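The same standardization can be reproduced in R from the three numbers used in this calculation. A minimal sketch; the variable names are illustrative.

```r
covariance <- 4.25  # covariance from the previous slide
s_x <- 1.67         # standard deviation of x
s_y <- 2.92         # standard deviation of y

r <- covariance / (s_x * s_y)  # standardized covariance
round(r, 2)                    # 0.87
```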
CORRELATION COEFFICIENT
• Pearson’s r is the most commonly used measure of
correlation.
• Positive: the variables have similar patterns.
• Negative: the variables have opposite patterns.
• -1.00: perfect negative correlation
• 0: zero correlation
• +1.00: perfect positive correlation
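The two ends of the scale can be illustrated with made-up numbers: any exact increasing linear function of x correlates +1 with x, and any exact decreasing one correlates -1. The values below are hypothetical.

```r
x <- c(2, 4, 5, 7, 9)        # hypothetical scores

y_pos <- 2 * x + 3           # rises exactly in step with x
y_neg <- -x                  # falls exactly as x rises

r_pos <- cor(x, y_pos)       # perfect positive correlation
r_neg <- cor(x, y_neg)       # perfect negative correlation
```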
CORRELATION COEFFICIENT AS
MEASURES OF EFFECT SIZE
• Correlation coefficients are commonly used as indicators of
an effect size.
• General benchmarks:
[Figure: scatterplot of Sales (y-axis) against Airplay (x-axis)]
> cor(album_sales, use = "complete.obs", method = "pearson")
• 0.5989188
[Figure: scatterplot of Anxiety Level (y-axis) against Exam Scores (x-axis)]
> cor(exam, use = "complete.obs", method = "pearson")
• -0.4409934
• Exclude non-numeric variables (such as Gender) before calling the correlation function.
• HOW? Subset, or specify variables as above.
cor(exam[1:4], use="complete.obs", method="pearson")
#or
cor(exam[,names(exam)!="Gender"], use="complete.obs",
method="pearson")
#or
cor(exam[,-5], use="complete.obs", method="pearson")
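Another option, sketched here, is to let R pick out the numeric columns so the call does not depend on column positions or names. The data frame below is a hypothetical stand-in for the exam data: its column names and values are invented for illustration.

```r
# Hypothetical stand-in for the exam data (names and values are assumptions)
exam_demo <- data.frame(
  Revise  = c(4, 11, 27, 53, 4, 22),
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.3, 88.7, 70.2, 61.3, 89.5, 60.5),
  Gender  = c("Male", "Female", "Male", "Male", "Male", "Female")
)

# TRUE for numeric columns, FALSE for Gender
numeric_cols <- sapply(exam_demo, is.numeric)

# Correlation matrix over the numeric columns only
r_matrix <- cor(exam_demo[, numeric_cols], use = "complete.obs", method = "pearson")
round(r_matrix, 2)
```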
COEFFICIENT OF DETERMINATION
r²
r       r²      Interpretation
± .1    .01     1% of the variability in X is accounted for by Y,
                or 1% of the total variance in X is systematic variance shared with Y.
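Squaring a coefficient in R gives the proportion of shared variance directly. The sketch below uses the benchmark value .1 and the two coefficients reported earlier in the lecture; the variable names are illustrative.

```r
# r values: the .1 benchmark, exam anxiety, and album sales/airplay
r <- c(0.1, -0.4409934, 0.5989188)

r_squared <- r^2                       # coefficient of determination
percent <- round(100 * r_squared, 1)   # shared variance as a percentage
percent
```

The sign of r disappears when it is squared, so r² describes how much variance is shared, not the direction of the relationship.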
HYPOTHESIS TESTING
• How do we know whether the correlation we observed in
our sample is indicative of the true correlation, or real
relationship, in the population?
Step 2: Set up sampling distribution & critical r
[Figure: scatterplot of Anxiety Level (y-axis) against Exam Scores (x-axis)]
H0: The true correlation is zero. There is no relationship between the two
variables. rho = 0
H1: The true correlation is not zero. There is a relationship between the two
variables. rho ≠ 0
Based on these data, we reject the null hypothesis. There is a significant
relationship between exam performance and exam anxiety, r = -.44, p < .05,
two-tailed. Exam performance goes down as exam anxiety goes up. Anxiety
accounts for 19.36% of variance in exam performance.
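In R, a two-tailed test of H0: rho = 0 can be run with cor.test(). The sketch below uses invented scores, because the lecture data set is not reproduced here, and also computes the usual t statistic by hand, t = r * sqrt(N - 2) / sqrt(1 - r^2), to show what the function is doing.

```r
# Hypothetical scores (assumption, not the lecture data)
exam_score <- c(40, 65, 80, 80, 40, 70, 20, 55, 50, 40)
anxiety    <- c(86, 88, 70, 61, 89, 60, 98, 72, 80, 84)

# Two-tailed test of H0: rho = 0
result <- cor.test(exam_score, anxiety,
                   alternative = "two.sided", method = "pearson")

r <- unname(result$estimate)   # sample correlation
n <- length(exam_score)

# The same test statistic by hand, with N - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Decision at alpha = .05
reject_h0 <- result$p.value < 0.05
```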
PARTIAL CORRELATIONS
[Figure: diagram; third variable labelled Revision Time]
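One standard way to compute a first-order partial correlation is from the three pairwise coefficients: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below implements that formula in base R. The function name partial_cor and the data are hypothetical, and this is not necessarily the method used in the lecture.

```r
# Partial correlation between x and y, controlling for z (illustrative helper)
partial_cor <- function(x, y, z) {
  r_xy <- cor(x, y)
  r_xz <- cor(x, z)
  r_yz <- cor(y, z)
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Hypothetical data (assumption, not the lecture data)
exam_score <- c(40, 65, 80, 80, 40, 70, 20, 55, 50, 40)
anxiety    <- c(86, 88, 70, 61, 89, 60, 98, 72, 80, 84)
revise     <- c(4, 11, 27, 53, 4, 22, 16, 21, 25, 18)

r_partial <- partial_cor(exam_score, anxiety, revise)

# Equivalent route: correlate what is left of each variable after
# removing the part predicted by the control variable
r_resid <- cor(resid(lm(exam_score ~ revise)), resid(lm(anxiety ~ revise)))
```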
ALPHA
• Your willingness to commit a Type I error (rejecting a null hypothesis that is true).
• Increasing your alpha level makes it more likely that
you will obtain a statistically significant correlation
coefficient. Decreasing alpha makes it less likely.
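The effect of alpha can be made concrete by computing the critical r for a two-tailed test at several alpha levels. This is a sketch under stated assumptions: the sample size of 30 is hypothetical, critical_r is an illustrative helper, and the conversion from critical t to critical r uses r = t / sqrt(t^2 + df) with df = N - 2.

```r
# Smallest |r| that is significant in a two-tailed test (illustrative helper)
critical_r <- function(alpha, n) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)   # two-tailed critical t
  t_crit / sqrt(t_crit^2 + df)      # convert critical t to critical r
}

n <- 30                             # hypothetical sample size
alphas <- c(0.01, 0.05, 0.10)
crit <- sapply(alphas, critical_r, n = n)
round(crit, 3)
```

A larger alpha gives a smaller critical r, so the same sample correlation is more likely to be declared significant.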
CORRELATION AND CAUSATION
• Y may cause X.
• A third variable may cause X and Y.
• Some other variable is related to both our variables
and accounts for their “relationship”
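The third-variable problem can be demonstrated with simulated data. In this sketch, which is an illustrative assumption and not lecture data, z causes both x and y, and x and y have no direct effect on each other. They still correlate, and the correlation largely disappears once z is removed from both.

```r
set.seed(1)                 # make the simulation reproducible
n <- 500

z <- rnorm(n)               # third variable
x <- z + rnorm(n)           # x depends on z, not on y
y <- z + rnorm(n)           # y depends on z, not on x

r_xy <- cor(x, y)           # clearly positive despite no direct link

# Correlate what remains of x and y after removing the part explained by z
r_xy_given_z <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
```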
REQUIREMENTS FOR CAUSATION
• Covariation
• changes in one variable are associated with changes in
the other variable
• Directionality
• the presumed causal variable preceded the presumed
effect in time
• Extraneous variables
• all other variables that may affect the relationship
between the two target variables are controlled or
eliminated
• Correlational research satisfies the first (and sometimes
the second) criterion, but never the third.