Correlation
First, check whether your data show a correlation by drawing a scatterplot and examining the pattern, direction, and strength of the relationship.
[Figure: example scatterplot]
Patterns of Correlation
Linear Correlation
- a relation between two variables that shows up on a scatter diagram as dots roughly following a straight line.
[Figure: scatterplot of a linear correlation]
Curvilinear Correlation
- a relation between two variables that shows up on a scatter diagram as dots following a systematic pattern that is not a straight line (e.g., age and memory).
Other Types of Non-Linear Relationships in Correlation
No Correlation
- no systematic relationship between two variables.
Direction of Correlation
Positive Correlation
- also known as direct correlation; the two variables tend to change in the same direction.
The Pearson Product-Moment Correlation
The Pearson correlation compares how much X and Y vary together with how much they vary separately:
$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}}$
When there is a perfect linear relationship, every change in the X variable is accompanied by a corresponding change in the Y variable. The covariability (X and Y together) is then identical to the variability of X and Y separately, and the formula produces a correlation with a magnitude of 1.00 (that is, r = +1.00 or r = -1.00).
The Sum of Products of Deviations
This new value and concept is similar to SS (the sum of squared deviations), which is used to measure variability for a single variable. SP is used to measure the amount of covariability between two variables.
Definitional formula
$SP = \Sigma (X - M_X)(Y - M_Y)$
Where:
$M_X$ is the mean for the X scores
$M_Y$ is the mean for the Y scores
1. Find the X deviation $(X - M_X)$ and the Y deviation $(Y - M_Y)$ for each individual.
2. Find the product of the deviations for each individual, $(X - M_X)(Y - M_Y)$.
3. Add the products.
Definitional Formula: Example
Scores      Deviations             Products
X     Y     X - M_X    Y - M_Y     (X - M_X)(Y - M_Y)
1     3     -1.5       -2           3
2     6     -0.5        1          -0.5
4     4      1.5       -1          -1.5
3     7      0.5        2           1
M_X = 2.5, M_Y = 5
SP = 2
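The three steps of the definitional formula can be sketched in a few lines of Python. This is a minimal illustration using the four pairs of scores from the table; the function name `sum_of_products` is an assumption made for this sketch, not something defined in the slides.

```python
def sum_of_products(xs, ys):
    """Definitional formula: SP = sum of (X - M_X) * (Y - M_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n  # M_X
    mean_y = sum(ys) / n  # M_Y
    # Steps 1-3: find each pair of deviations, multiply them, add the products.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Scores from the worked table.
X = [1, 2, 4, 3]
Y = [3, 6, 4, 7]

sp = sum_of_products(X, Y)
print(sp)  # 2.0
```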
Computational Formula
$SP = \Sigma XY - \frac{\Sigma X \Sigma Y}{n}$
X     Y     XY
1     3      3
2     6     12
4     4     16
3     7     21
10    20    52    Totals
$SP = \Sigma XY - \frac{\Sigma X \Sigma Y}{n} = 52 - \frac{(10)(20)}{4} = 52 - 50 = 2$
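As a quick check that the computational formula gives the same SP as the deviation-based approach, here is a small Python sketch using the same four pairs of scores. The function name is illustrative only.

```python
def sum_of_products_computational(xs, ys):
    """Computational formula: SP = sum(XY) - (sum(X) * sum(Y)) / n."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of XY
    return sum_xy - (sum(xs) * sum(ys)) / n


# Scores from the worked table: sum(X) = 10, sum(Y) = 20, sum(XY) = 52.
X = [1, 2, 4, 3]
Y = [3, 6, 4, 7]

sp = sum_of_products_computational(X, Y)
print(sp)  # 2.0
```

The computational form avoids computing the means and deviations first, which is convenient when the means are not whole numbers.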
What is the sum of products for the
following data?
X Y
2 1
2 3
5 4
3 2
1 2
Calculation of the Pearson Correlation
$r = \frac{SP}{\sqrt{SS_X SS_Y}}$
EXAMPLE:
X     Y
0     2
10    6
4     2
8     4
8     6
Calculation of (1) SS_X, (2) SS_Y, and (3) SP for a sample of n = 5 pairs of scores
Scores      Deviations             Squared Deviations            Products
X     Y     X - M_X    Y - M_Y     (X - M_X)^2    (Y - M_Y)^2    (X - M_X)(Y - M_Y)
0     2     -6         -2          36             4              12
10    6      4          2          16             4               8
4     2     -2         -2           4             4               4
8     4      2          0           4             0               0
8     6      2          2           4             4               4
M_X = 6, M_Y = 4, SS_X = 64, SS_Y = 16, SP = 28
$r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{28}{\sqrt{(64)(16)}} = \frac{28}{32} = 0.875$
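The whole calculation (SS for X, SS for Y, SP, then r) can be reproduced with a short Python sketch. It uses only the standard library and the five pairs of scores from the example; the function name `pearson_r` is an assumption for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)  # sum of squared X deviations
    ss_y = sum((y - mean_y) ** 2 for y in ys)  # sum of squared Y deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)


# Scores from the example: SS_X = 64, SS_Y = 16, SP = 28.
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

r = pearson_r(X, Y)
print(r)  # 0.875
```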
A set of n = 5 pairs of X and Y values has SS_X = 5, SS_Y = 20, and SP = 8. What is the Pearson correlation?
Correlations and the Pattern of Data Points
The value of the correlation describes the relationship that exists in the data; it also indicates how closely the points cluster around a line.
A positive sign for the correlation indicates that the points are clustered around a line that slopes upward to the right. A high value for the correlation (near 1.00) indicates that the points are very tightly clustered around the line.
Because the Pearson correlation describes the pattern formed by the data points, any factor that does not change the pattern also does not change the correlation. Adding a constant to (or subtracting a constant from) each X and/or Y value does not change the pattern of data points and does not change the correlation. Also, multiplying (or dividing) each X or each Y value by a positive constant does not change the pattern and does not change the value of the correlation. Multiplying by a negative constant, however, produces a mirror image of the pattern and, therefore, changes the sign of the correlation.
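These properties can be checked numerically. The sketch below reuses the five pairs of scores from the earlier worked example (r = 0.875); the constants 5, 3, and -3 are arbitrary choices for illustration, and `pearson_r` is a helper name assumed for this sketch.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    """Pearson correlation: r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)


X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
r = pearson_r(X, Y)

# Adding a constant to every X score leaves the pattern unchanged.
r_shifted = pearson_r([x + 5 for x in X], Y)
# Multiplying every Y score by a positive constant also leaves it unchanged.
r_scaled = pearson_r(X, [y * 3 for y in Y])
# Multiplying by a negative constant mirrors the pattern and flips the sign.
r_flipped = pearson_r([x * -3 for x in X], Y)

print(isclose(r_shifted, r), isclose(r_scaled, r), isclose(r_flipped, -r))
# True True True
```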
A set of n = 15 pairs of X and Y values has a Pearson correlation of r = 0.10. If each of the X values were multiplied by 2, then what is the correlation for the resulting data?
Correlation Coefficient (r)
A measure of the degree of correlation between two variables, ranging from -1 (a perfect negative linear correlation) through 0 (no correlation) to +1 (a perfect positive linear correlation).
Pearson Product-Moment Correlation
Also called Pearson's r; devised by Karl Pearson, it is a measure of the strength of a linear association between two variables.
Using and Interpreting Pearson Correlation
Where and Why Correlations are Used
1. Prediction
2. Validity
3. Reliability
4. Theory Verification
Interpreting Correlations
When you encounter correlations, there are four additional considerations that you should bear in mind.
1. Correlation and Causation
2. Correlation and Restricted Range
3. Outliers
4. Correlation and the Strength of the Relationship
Correlation and Causation
A correlation only describes a relationship between two variables; it does not explain why the two variables are related to each other.
Correlation and Restricted Range
Whenever a correlation is computed from scores that do not represent the full range of possible values, you should be cautious in interpreting the correlation. The correlation within this restricted range could be completely different from the correlation that would be obtained from a full range of data.
A relationship that exists across the full range can be obscured when the data are limited to a restricted range. For a correlation to provide an accurate description for the general population, there should be a wide range of X and Y values in the data.
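The effect of a restricted range can be illustrated with a small Python sketch. The scores below are hypothetical, made up for this illustration and not taken from the slides: Y follows X with an alternating error of -2 and +2. The correlation is computed once for the full set and once for only the pairs with X at or below 4.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)


# Hypothetical data (an assumption for this sketch): X = 1..12,
# Y = X plus an error of +2 for even X and -2 for odd X.
X = list(range(1, 13))
Y = [x + (2 if x % 2 == 0 else -2) for x in X]

r_full = pearson_r(X, Y)

# Restrict the range: keep only the pairs with X <= 4.
restricted = [(x, y) for x, y in zip(X, Y) if x <= 4]
r_restricted = pearson_r([x for x, _ in restricted], [y for _, y in restricted])

print(round(r_full, 3), round(r_restricted, 3))
```

In this made-up data set the restricted-range correlation comes out lower than the full-range correlation, because the same amount of error is large relative to the narrow spread of X values. Other data sets can behave differently, which is why caution is needed.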
Outliers
Correlation and the Strength of the Relationship
The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
A correlation of r = 0.80 (or -0.80), for example, means that $r^2 = 0.64$ (or 64%) of the variability in the Y scores can be predicted from the relationship with X.
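A short Python sketch of this calculation follows; the function name is an assumption for illustration. It shows that the sign of r does not affect the coefficient of determination.

```python
def coefficient_of_determination(r):
    """Proportion of variability in one variable predictable from the other."""
    return r ** 2


for r in (0.80, -0.80):
    # Rounded to two decimals to hide floating-point noise.
    print(r, round(coefficient_of_determination(r), 2))
# 0.8 0.64
# -0.8 0.64
```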
The correlation for a set of X and Y scores is r = 0.60. The scores are
separated into two groups, with one group consisting of individuals
with X values that are equal to or above the median and the other
group consisting of individuals with X values that are below the
median. If the correlation is computed for the group with X values
below the median, how will the correlation compare with the
correlation for the full set of scores?
The Spearman Correlation (r_S)
Computing the Correlation
We need (1) SS for X, (2) SS for Y, and (3) SP.
Values are to be computed from the ranks, not the original scores. The X ranks are simply the integers 1, 2, 3, 4, and 5.
ΣX = 15
ΣX² = 55
$SS_X = \Sigma X^2 - \frac{(\Sigma X)^2}{n} = 55 - \frac{(15)^2}{5} = 10$
Note that the ranks for Y are the same set of values as the ranks for X; that is, they are the integers 1, 2, 3, 4, and 5. Therefore, the SS for Y is identical to the SS for X:
$SS_Y = SS_X = 10$
When computing the SP value, we need ΣX, ΣY, and ΣXY for the ranks.
$SP = \Sigma XY - \frac{\Sigma X \Sigma Y}{n} = 36 - \frac{(15)(15)}{5} = -9$
Finally, the Spearman correlation simply uses the Pearson formula for the ranks.
$r_S = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{-9}{\sqrt{10(10)}} = -0.9$
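The same steps can be sketched in Python using the computational formulas on the ranks. The slides give ΣXY = 36 but do not list the individual Y ranks, so the Y ranks below are a hypothetical ordering chosen only because it reproduces ΣXY = 36; the function name is likewise an assumption.

```python
from math import isclose, sqrt


def spearman_from_ranks(x_ranks, y_ranks):
    """Apply the Pearson formula to ranks using the computational formulas."""
    n = len(x_ranks)
    sum_x, sum_y = sum(x_ranks), sum(y_ranks)
    ss_x = sum(x * x for x in x_ranks) - sum_x ** 2 / n  # sum(X^2) - (sum X)^2 / n
    ss_y = sum(y * y for y in y_ranks) - sum_y ** 2 / n
    sp = sum(x * y for x, y in zip(x_ranks, y_ranks)) - sum_x * sum_y / n
    return sp / sqrt(ss_x * ss_y)


x_ranks = [1, 2, 3, 4, 5]
# Hypothetical Y ranks (not given in the slides), chosen so that sum(XY) = 36.
y_ranks = [5, 4, 2, 3, 1]

r_s = spearman_from_ranks(x_ranks, y_ranks)
print(round(r_s, 2))  # -0.9
```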
Ranking Tied Scores
When you are converting scores into ranks for the Spearman correlation, you may encounter two (or more) identical scores.
1. List the scores in order from smallest to largest. Include tied values in the list.
2. Assign a rank (first, second, etc.) to each position in the ordered list.
3. When two (or more) scores are tied, compute the mean of their ranked positions, and assign this mean value as the final rank for each score.
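The three-step procedure for tied scores can be sketched as a small Python function. The scores used here are made up for illustration and the name `rank_with_ties` is an assumption, not part of the slides.

```python
def rank_with_ties(scores):
    """Rank scores (1 = smallest); tied scores share the mean of their positions."""
    # Step 1: list the scores from smallest to largest, ties included.
    ordered = sorted(scores)
    # Step 2: assign a position (1, 2, 3, ...) to each place in the ordered list.
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    # Step 3: tied scores receive the mean of the positions they occupy.
    mean_rank = {value: sum(p) / len(p) for value, p in positions.items()}
    return [mean_rank[score] for score in scores]


# Made-up scores: the two 3s occupy positions 1-2, the three 5s positions 3-5.
scores = [3, 5, 3, 9, 5, 5]
print(rank_with_ties(scores))  # [1.5, 4.0, 1.5, 6.0, 4.0, 4.0]
```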
If the following scores are ranked, what rank is assigned to an
individual with a score of X = 7?
Scores: 1, 2, 2, 5, 6, 6, 6, 7, 10, 12
a. 5
b. 6
c. 7
d. 8
After the original X values and Y values have been ranked, the calculations necessary for SS and SP can be greatly simplified. With no tied scores, the ranks are simply the integers 1, 2, ..., n.
To compute the mean for these integers, you can locate the midpoint of the series by
$M = \frac{n + 1}{2}$
Similarly, the SS for this series of integers can be computed by
$SS = \frac{n(n^2 - 1)}{12}$
Also, because the X ranks and the Y ranks are the same values, the SS for X is identical to the SS for Y.
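As a sanity check, the shortcut formulas can be compared with a direct calculation for n = 5, the number of ranks in the earlier Spearman example. This sketch assumes there are no tied ranks, so the ranks are exactly the integers 1 through n; the function names are illustrative.

```python
def rank_mean(n):
    """Mean of the integers 1..n: M = (n + 1) / 2."""
    return (n + 1) / 2


def rank_ss(n):
    """Sum of squared deviations of the integers 1..n: SS = n(n^2 - 1) / 12."""
    return n * (n ** 2 - 1) / 12


n = 5
ranks = list(range(1, n + 1))

# Direct calculation from the ranks themselves, for comparison.
direct_mean = sum(ranks) / n
direct_ss = sum((rank - direct_mean) ** 2 for rank in ranks)

print(rank_mean(n), rank_ss(n))  # 3.0 10.0
print(direct_mean, direct_ss)    # 3.0 10.0
```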
Testing the The null hypothesis states that there is no correlation (no monotonic
relationship) between the variables for the population, or in
Significance of symbols: