
Lecture 11 Correlation Edited

Correlational research examines relationships between naturally occurring variables without implying causation. The correlation coefficient measures the strength and direction of linear relationships between continuous variables on a scale from -1 to 1. Positive correlations indicate that the variables increase together, while negative correlations indicate that as one variable increases, the other decreases. Correlation coefficients of ±0.1, ±0.3 and ±0.5 represent small, medium and large effect sizes.

Uploaded by lamita
Copyright © All Rights Reserved

CORRELATION

CORRELATIONAL RESEARCH
• Helps us describe the relationship between two or more
naturally occurring variables.

• Is age related to political conservatism?

• Are highly extraverted people less afraid of rejection than less extraverted people?

• Is depression correlated with hypochondriasis?

• Is I.Q. related to reaction time?

• Notice that all these questions are about associations or relationships, not causes or impacts.

WHAT IS A CORRELATION?

• It is a way of:
  • measuring the extent to which two variables are linearly related.
  • quantifying the pattern of responses across variables.
  • summarizing the ENTIRE relationship between two variables in a single value.
MEASURING RELATIONSHIPS

We need to see whether, as one variable increases, the other increases, decreases or stays the same.
MEASURING RELATIONSHIPS

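This can be done by calculating the covariance between two variables.

We look at how much each score deviates from the mean of its own variable.

If scores on the two variables deviate from their means in a consistent way (for example, scores above the mean on one variable tend to go with scores above the mean on the other), the variables are likely to be related.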
VARIANCE AND COVARIANCE
• The variance tells us by how much scores deviate from the mean for a single variable.

$$\text{Variance} = \frac{\sum (X - \bar{X})(X - \bar{X})}{N - 1}$$

• Covariance tells us by how much scores on two variables differ from their respective means.

$$\text{Covariance}_{x,y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$
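The variance formula is the covariance formula with Y replaced by X. A quick sketch in R, using the "Adverts Watched" scores from the example that follows, confirms that the variance is simply the covariance of a variable with itself. `var()` and `cov()` are base R functions; both use the N - 1 denominator.

```r
# Scores from the "Adverts Watched" row of the example table
x <- c(5, 4, 4, 6, 8)
n <- length(x)

# Variance from the formula: sum of (X - mean)(X - mean) over N - 1
var_by_hand <- sum((x - mean(x)) * (x - mean(x))) / (n - 1)

var_by_hand  # 2.8
var(x)       # base R sample variance (also divides by N - 1)
cov(x, x)    # covariance of x with itself: the same value
```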
CALCULATE COVARIANCE
Participant: 1 2 3 4 5 Mean S

Adverts Watched 5 4 4 6 8 5.4 1.67

Packets Bought 8 9 10 13 15 11.0 2.92

(S = sample standard deviation, computed with N - 1.)

$$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

$$= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$$

$$= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$$
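The hand calculation can be reproduced step by step in R. This is a minimal sketch using the five participants from the table; base R's `cov()` is used only as a cross-check of the manual sum.

```r
# Scores from the table above
adverts <- c(5, 4, 4, 6, 8)     # Adverts Watched
packets <- c(8, 9, 10, 13, 15)  # Packets Bought
n <- length(adverts)

# Deviation of each score from its own variable's mean
dev_x <- adverts - mean(adverts)
dev_y <- packets - mean(packets)

# Cross-products of the deviations, as in the worked sum
cross_products <- dev_x * dev_y
cross_products                  # 1.2 2.8 1.4 1.2 10.4

# Sum of cross-products divided by N - 1
cov_by_hand <- sum(cross_products) / (n - 1)
cov_by_hand                     # 4.25
cov(adverts, packets)           # base R sample covariance, same value
```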
PROBLEMS WITH COVARIANCE
• It is affected by the units of measurement.

• For example, the covariance of two variables measured in miles might be 4.25, but if the same scores are converted to kilometres, the covariance becomes about 11 (4.25 × 1.609² ≈ 11).

• How can we remedy this?

• Standardize it!

The standardized version of covariance is known as the correlation coefficient.
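To see the units problem concretely, here is a sketch that treats the two example variables as if both were distances in miles (a purely hypothetical relabelling) and converts them to kilometres with the factor 1.609344. The covariance grows by the square of that factor, while the correlation from base R's `cor()` does not change.

```r
# Hypothetical: treat both example variables as distances in miles
x_miles <- c(5, 4, 4, 6, 8)
y_miles <- c(8, 9, 10, 13, 15)

km_per_mile <- 1.609344
x_km <- x_miles * km_per_mile
y_km <- y_miles * km_per_mile

cov(x_miles, y_miles)  # 4.25
cov(x_km, y_km)        # about 11: scaled by km_per_mile^2
cor(x_miles, y_miles)  # standardized: unaffected by the change of units
cor(x_km, y_km)        # same value as the line above
```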
CALCULATE STANDARDIZED
COVARIANCE
Participant: 1 2 3 4 5 Mean S

Adverts Watched 5 4 4 6 8 5.4 1.67

Packets Bought 8 9 10 13 15 11.0 2.92

$$r = \frac{\text{Covariance}_{x,y}}{s_x s_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)\, s_x s_y}$$

$$r = \frac{4.25}{(1.67)(2.92)} = .87$$
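"Standardizing" can be read two equivalent ways, sketched below in base R with the same data: divide the covariance by the product of the standard deviations, or turn each variable into z-scores with `scale()` and take the covariance of those. Both should agree with `cor()`.

```r
adverts <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)
n <- length(adverts)

# Route 1: divide the covariance by the product of the standard deviations
r_from_cov <- cov(adverts, packets) / (sd(adverts) * sd(packets))

# Route 2: convert each variable to z-scores, then sum the
# cross-products and divide by N - 1
z_x <- as.vector(scale(adverts))
z_y <- as.vector(scale(packets))
r_from_z <- sum(z_x * z_y) / (n - 1)

round(r_from_cov, 2)                       # 0.87
round(r_from_z, 2)                         # 0.87
cor(adverts, packets, method = "pearson")  # base R Pearson's r
```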
CORRELATION COEFFICIENT
• Pearson’s r is the most commonly used measure of
correlation.

• It is used when both variables are measured on a continuous scale.

• A correlation coefficient can range from -1.00 to +1.00.

CORRELATION COEFFICIENT
• The sign indicates the direction of the relationship.
  Positive: the variables have similar patterns (as X increases, Y tends to increase).
  Negative: the variables have opposite patterns (as X increases, Y tends to decrease).

• The number indicates the strength of the relationship.
  Weak: the variables are inconsistently related.
  Strong: the variables are consistently related; extreme Xs are paired with extreme Ys.

  -1.00 = perfect negative correlation; 0 = zero correlation; +1.00 = perfect positive correlation.
  Strength increases as r moves away from 0 toward either -1.00 or +1.00.
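A small base R sketch of the two properties on this slide: the sign gives the direction, and an exact straight-line relationship gives the extremes of ±1. The vectors reuse the adverts example; the transformations `2 * x + 3` and `-x` are arbitrary illustrations.

```r
x <- c(5, 4, 4, 6, 8)
y <- c(8, 9, 10, 13, 15)

cor(x, 2 * x + 3)  # exact linear relation, same direction: +1
cor(x, -x)         # exact linear relation, opposite direction: -1

cor(x, y)          # positive and strong, but not perfect
cor(x, -y)         # same strength, sign flipped
```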
CORRELATION COEFFICIENT AS A
MEASURE OF EFFECT SIZE
• Correlation coefficients are commonly used as indicators of
an effect size.

• General benchmarks:

• ±.1 = small effect

• ±.3 = medium effect

• ±.5 = large effect


CORRELATION COEFFICIENT
[Figure: Scatterplot, "Relationship between Airplays and Sales" (x-axis: Airplay; y-axis: Sales)]

There is a positive correlation between the number of airplays and sales.


What is the magnitude of the correlation?


CORRELATION COEFFICIENT IN R

[Figure: Scatterplot, "Relationship between Airplays and Sales" (x-axis: Airplay; y-axis: Sales)]

> cor(album_sales$airplay, album_sales$sales, use = "complete.obs", method = "pearson")
• Result: 0.5989188

> cor(album_sales, use = "complete.obs", method = "pearson")
• Result: the full correlation matrix
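The `use = "complete.obs"` argument matters when data contain missing values. The lecture's `album_sales` data are not reproduced here, so this sketch uses the adverts scores with a hypothetical sixth participant whose `packets` value is missing. By default `cor()` returns `NA`; with `"complete.obs"` it drops every case that has a missing value before computing r.

```r
adverts <- c(5, 4, 4, 6, 8, 7)
packets <- c(8, 9, 10, 13, 15, NA)  # hypothetical: last case is missing

cor(adverts, packets)  # NA under the default use = "everything"

r_complete <- cor(adverts, packets, use = "complete.obs", method = "pearson")
r_complete

# Same result as dropping the incomplete case by hand
keep <- complete.cases(adverts, packets)
cor(adverts[keep], packets[keep])
```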
CORRELATION COEFFICIENT
[Figure: Scatterplot, "Relationship between Exam Scores and Anxiety Levels" (x-axis: Exam Scores; y-axis: Anxiety Level)]

There is a negative correlation between anxiety and exam scores.


CORRELATION COEFFICIENT IN R

[Figure: Scatterplot, "Relationship between Exam Scores and Anxiety Levels" (x-axis: Exam Scores; y-axis: Anxiety Level)]

> cor(exam$Exam, exam$Anxiety, use = "complete.obs", method = "pearson")
• Result: -0.4409934

> cor(exam, use = "complete.obs", method = "pearson")
• Error in cor(exam, use = "complete.obs", method = "pearson") : 'x' must be numeric

• WHY? Check the structure of the variables in the data frame (e.g., with str(exam)). Are there any non-numeric variables? If yes, you need to remove them from the correlation function.
• HOW? Subset the data frame, or specify the variables as above:
cor(exam[1:4], use = "complete.obs", method = "pearson")
#or
cor(exam[, names(exam) != "Gender"], use = "complete.obs", method = "pearson")
#or
cor(exam[, -5], use = "complete.obs", method = "pearson")
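A further option, sketched here on a small hypothetical data frame (not the lecture's `exam` data), is to keep only the numeric columns programmatically with `sapply(..., is.numeric)`, so the call does not depend on knowing which column position holds the non-numeric variable.

```r
# Hypothetical data frame with one non-numeric column
df <- data.frame(
  adverts = c(5, 4, 4, 6, 8),
  packets = c(8, 9, 10, 13, 15),
  group   = c("a", "b", "a", "b", "a")
)

str(df)  # shows that 'group' is not numeric

# Logical vector: TRUE for numeric columns only
is_num <- sapply(df, is.numeric)

# Correlation matrix of the numeric columns only
cor_mat <- cor(df[is_num], use = "complete.obs", method = "pearson")
cor_mat
```

Indexing with `df[is_num]` (rather than `df[, is_num]`) always returns a data frame, even if only one column survives.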
COEFFICIENT OF DETERMINATION

r²

• The squared correlation coefficient, r² (the coefficient of determination), measures the proportion of variability in one variable that is accounted for by variability in the other.
CORRELATION AND COEFFICIENT OF
DETERMINATION BETWEEN X AND Y

r      r²     Interpretation
±.1    .01    1% of the variability in X is accounted for by Y
              (or: 1% of the total variance in X is systematic variance shared with Y).
±.3    .09    9% of the variability in X is accounted for by Y.
±.5    .25    25% of the variability in X is accounted for by Y.
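The coefficient of determination is just r squared. A one-line check in R using the exam-anxiety correlation reported in this lecture (r = -0.4409934): squaring removes the sign, so r = -.44 and r = +.44 account for the same proportion of variance.

```r
r <- -0.4409934  # exam score vs. anxiety correlation from the R output

r_squared <- r^2
round(r_squared, 4)        # 0.1945
round(100 * r_squared, 1)  # 19.4 (percent of variance shared)

# Squaring the rounded value r = -.44 gives the 19.36% figure
(-0.44)^2                  # 0.1936
```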


HYPOTHESIS TESTING
• How do we know whether the correlation we observed in
our sample is indicative of the true correlation, or real
relationship, in the population?

• We conduct a hypothesis test!

Calculate the probability of obtaining a correlation coefficient at least as extreme as the one we've obtained if the true correlation were zero (i.e., assuming H0 is true).
If there is no more than a 5% chance (p ≤ .05) of obtaining a correlation coefficient at least as extreme as the one we obtained had the true correlation been zero, then we conclude that the correlation is significantly different from zero.
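In R the test is usually run with base R's `cor.test()`. The sketch below, on the adverts data from earlier, also recomputes the test by hand using the standard conversion of r to a t statistic with N - 2 degrees of freedom, t = r * sqrt(N - 2) / sqrt(1 - r^2), to show where the p-value comes from. Read the p-value from the printed output rather than from this text.

```r
adverts <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)
n <- length(adverts)

# Built-in test of H0: rho = 0 (two-tailed)
result <- cor.test(adverts, packets, method = "pearson",
                   alternative = "two.sided")
result

# The same test by hand: convert r to t with N - 2 degrees of freedom
r <- cor(adverts, packets)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_two_tailed <- 2 * pt(-abs(t_stat), df = n - 2)

c(t = t_stat, p = p_two_tailed)
```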
HYPOTHESIS TESTING

Step 1: State the hypotheses
Step 2: Set up the sampling distribution and critical r
Step 3: Compute the obtained r
Step 4: Compare the obtained r to the critical r
Step 5: Report the results

(Steps 2 to 4 can be carried out in R.)
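Steps 2 to 4 can be sketched in R by converting the critical t value for df = N - 2 into a critical r, using r_crit = t_crit / sqrt(t_crit^2 + df). This assumes a two-tailed test at alpha = .05 and uses the adverts example (N = 5); the helper name `critical_r` is hypothetical, not a base R function.

```r
# Hypothetical helper: two-tailed critical r for a sample of size n
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df = df)  # critical t (upper tail)
  t_crit / sqrt(t_crit^2 + df)          # convert t to r
}

adverts <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)

r_crit <- critical_r(length(adverts))  # Step 2
r_obtained <- cor(adverts, packets)    # Step 3

# Step 4: significant if |obtained r| exceeds the critical r
significant <- abs(r_obtained) > r_crit
c(obtained = r_obtained, critical = r_crit)
significant
```

With only five participants the critical r is about .878, so the obtained r of about .87 falls just short of significance, which anticipates the point about sample size below.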


HYPOTHESIS TESTING IN R

[Figure: Scatterplot, "Relationship between Exam Scores and Anxiety Levels" (x-axis: Exam Scores; y-axis: Anxiety Level)]

H0: The true correlation is zero. There is no relationship between the two variables. rho = 0
H1: The true correlation is not zero. There is a relationship between the two variables. rho ≠ 0
Based on these data, we reject the null hypothesis. There is a significant relationship between exam performance and exam anxiety, r = -.44, p < .05, two-tailed. Exam performance goes down as exam anxiety goes up. Anxiety accounts for 19.36% of the variance in exam performance (r² = .44² = .1936).
PARTIAL CORRELATIONS

• Measures the relationship between two variables, controlling for the effect that a third variable has on them both.

• If a partial correlation between two variables is significantly lower than the Pearson correlation between the two variables, then the correlation between them is at least partly due to the third variable (or to a variable associated with the third variable).

[Figure: Venn diagrams of the variance in Exam Performance. Panel 1: variance accounted for by Exam Anxiety (19.4%). Panel 2: variance accounted for by Revision Time (15.7%). Panel 3: Exam Performance, Exam Anxiety and Revision Time together, showing the unique variance accounted for by Exam Anxiety, the unique variance accounted for by Revision Time, and the variance accounted for by both Exam Anxiety and Revision Time.]
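Base R has no dedicated partial-correlation function, so here is a minimal sketch of two equivalent routes: correlate the residuals left after regressing each variable on the third, or apply the first-order partial correlation formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The variables are simulated stand-ins with hypothetical names (`performance`, `anxiety`, `revision`), not the lecture's exam data, so the numbers themselves mean nothing.

```r
set.seed(1)
n <- 100

# Simulated data: revision time influences both anxiety and performance
revision    <- rnorm(n)
anxiety     <- -0.5 * revision + rnorm(n)
performance <-  0.5 * revision - 0.3 * anxiety + rnorm(n)

# Route 1: correlate what is left of each variable after removing revision
res_perf <- resid(lm(performance ~ revision))
res_anx  <- resid(lm(anxiety ~ revision))
partial_resid <- cor(res_perf, res_anx)

# Route 2: first-order partial correlation formula
r_xy <- cor(performance, anxiety)
r_xz <- cor(performance, revision)
r_yz <- cor(anxiety, revision)
partial_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Zero-order (Pearson) r next to the two partial correlations
c(zero_order = r_xy, partial_resid = partial_resid,
  partial_formula = partial_formula)
```

Comparing `zero_order` with the partial values is the comparison described on this slide.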
FACTORS AFFECTING STATISTICAL
SIGNIFICANCE OF CORRELATION
COEFFICIENT
Sample size
• Larger samples give more precise estimates of the true correlation, so a correlation of a given size is more likely to reach significance in a larger sample.

Magnitude of the correlation
• Larger correlations are easier to detect.

Alpha
• Your willingness to commit a Type I error.
• Increasing your alpha level makes it more likely that you will obtain a statistically significant correlation coefficient. Decreasing alpha makes it less likely.
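To illustrate the sample-size and alpha points, this sketch holds r fixed and computes the two-tailed p-value for several sample sizes, using the same r-to-t conversion that `cor.test()` relies on. The helper `p_for_r` is hypothetical, and r = .3 is an arbitrary medium-sized value.

```r
# Hypothetical helper: two-tailed p-value for a Pearson r from n cases
p_for_r <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = df)
}

# Same r, growing samples: the p-value shrinks
sizes <- c(10, 30, 50, 100)
p_values <- sapply(sizes, function(n) p_for_r(0.3, n))
names(p_values) <- paste0("n=", sizes)
round(p_values, 4)

# Same r and n, different alpha levels: stricter alpha is harder to meet
alpha <- c(0.10, 0.05, 0.01)
sapply(alpha, function(a) p_for_r(0.3, 50) < a)
```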
CORRELATION AND CAUSATION

• A correlation between two variables does not imply that one causes the other. A significant correlation could mean:
• X may cause Y.

• Y may cause X.
• A third variable may cause X and Y.
• Some other variable is related to both our variables
and accounts for their “relationship”
REQUIREMENTS FOR CAUSATION
• Covariation
• changes in one variable are associated with changes in
the other variable
• Directionality
• the presumed causal variable preceded the presumed
effect in time
• Extraneous variables
• all other variables that may affect the relationship
between the two target variables are controlled or
eliminated
• Correlational research satisfies the first (and sometimes
the second) criterion, but never the third.
