Correlation

Czarina Trixia Sy Giangan


Learning Objectives
- Understand the basic logic behind a statistical technique for determining the relationship between two variables (correlation)
- Show competency in calculating, analysing, and interpreting the correlation coefficient
- Recognize the requirements of the Pearson Product-Moment Correlation (Pearson’s r)
- Identify the alternatives to Pearson’s r if its requirements are not met
When should we use a Correlational Statistical Test?
Correlational Design
- A non-experimental quantitative research design in which two different variables are observed to determine whether there is a relationship between them.

Correlation (Bivariate Correlation)
- A statistical technique that is used to measure and describe the strength, magnitude, and direction of the relationship between two variables.
Data Requirements
1. Two or more continuous variables
2. Cases that have values on both variables
3. Linear relationship between the variables
4. Independent cases
5. Bivariate normality
6. Random sample data from the population
7. No outliers
Scatter Diagram (Scatterplot)
- A graph showing the relationship between two variables.
- The values of one variable are along the horizontal axis (X variable/predictor/IV) and the values of the other variable are along the vertical axis (Y variable/outcome variable/criterion/DV).
- Each individual's pair of scores is shown as a dot in this two-dimensional space.
How to Make a Scatter Diagram
Example:
Suppose a researcher is studying the relationship of sleep to mood. As an initial test, the researcher asks six students in her morning seminar two questions.
1. How many hours did you sleep last night?
2. How happy do you feel right now, on a scale from 0 (not at all happy) to 8 (extremely happy)?
The following are the scores acquired from the participants:

Participant   Hours Slept   Happy Mood
A             5             2
B             7             4
C             8             7
D             6             2
E             6             3
F             10            6
Before computing a correlation, first look at a scatterplot of your data to check whether a relationship is visible: its pattern, direction, and strength.
[Figure: scatterplot; horizontal axis from 0 to 12, vertical axis from 0 to 8]
Patterns of Correlation
Linear Correlation
- A relation between two variables that shows up on a scatter diagram as the dots roughly following a straight line.
[Figure: scatterplot; horizontal axis from 0 to 12, vertical axis from 0 to 8]
Patterns of Correlation
Curvilinear Correlation
- A relation between two variables that shows up on a scatter diagram as dots following a systematic pattern that is not a straight line.
e.g. age and memory
Other Types of Non-Linear Relationships in Correlation
Patterns of Correlation
No Correlation
- No systematic relationship between two variables.
Direction of Correlation
Positive Correlation
- Also known as direct correlation: the two variables tend to change in the same direction.
- As the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also decreases.
e.g. IQ and Academic Performance
Direction of Correlation
Negative Correlation
- Also known as inverse correlation: the two variables tend to change in opposite directions.
- As the value of the X variable increases, the Y variable tends to decrease.
e.g. stress and performance
Strength of Correlation
- The correlation coefficient (-1 to +1) tells us whether the relationship between the variables is strong or weak.
- ±1 means a perfect correlation; 0 means no correlation.

Correlation Coefficient   Strength of Relationship
±0.91 to ±1.00            Very Strong
±0.71 to ±0.90            Strong
±0.51 to ±0.70            Moderate
±0.31 to ±0.50            Weak
±0.01 to ±0.30            Very Weak
0.00                      Zero
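The strength table can be applied mechanically. The following is a minimal sketch, not part of the original slides: the function name `strength_label` is hypothetical, and rounding r to two decimals before the lookup is an assumption made so that values falling between the table's bands (such as 0.305) still receive a label.

```python
def strength_label(r):
    """Return the verbal strength label from the table for a coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    # The sign only gives the direction, so strength uses the absolute value.
    # Rounding to two decimals is an assumption, not something the table states.
    size = round(abs(r), 2)
    if size == 0.0:
        return "Zero"
    if size <= 0.30:
        return "Very Weak"
    if size <= 0.50:
        return "Weak"
    if size <= 0.70:
        return "Moderate"
    if size <= 0.90:
        return "Strong"
    return "Very Strong"


for r in (0.85, -0.43, 0.02, -1.00):
    print(r, strength_label(r))
```

Note that -0.43 and +0.43 receive the same label: the sign describes direction, not strength.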
Which of the following is the correct order, from strongest and most consistent to weakest and least consistent, for the following correlations?
a. –1.00, 0.85, –0.43, 0.02
b. 0.85, 0.02, –0.43, –1.00
c. –1.00, –0.43, 0.02, 0.85
d. 0.85, –0.43, 0.02, –1.00
The Pearson Correlation (Pearson Product-Moment Correlation)
The most common correlation; it measures the degree and direction of the straight-line relationship between two variables.

$$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}}$$

When there is a perfect linear relationship, every change in the X variable is accompanied by a corresponding change in the Y variable. The covariability (X and Y together) is then identical to the variability of X and Y separately, and the formula produces a correlation with a magnitude of 1.00 (that is, r = +1.00 or r = -1.00).
The Sum of Products of Deviations
SP is used to measure the amount of covariability between two variables. This new value and concept is similar to SS (the sum of squared deviations), which is used to measure variability for a single variable.

Definitional formula
$$SP = \sum (X - M_X)(Y - M_Y)$$
Where:
$M_X$ is the mean for the X scores
$M_Y$ is the mean for the Y scores

1. Find the X deviation $(X - M_X)$ and the Y deviation $(Y - M_Y)$ for each individual.
2. Find the product of the deviations for each individual.
3. Add the products.
The Sum of Products of Deviations: Definitional Formula

Scores      Deviations             Products
X    Y      X - M_X    Y - M_Y     (X - M_X)(Y - M_Y)
1    3      -1.5       -2          3
2    6      -0.5       1           -0.5
4    4      1.5        -1          -1.5
3    7      0.5        2           1

M_X = 2.5, M_Y = 5
SP = 3 - 0.5 - 1.5 + 1 = 2
The Sum of Products of Deviations: Computational Formula
$$SP = \sum XY - \frac{\sum X \sum Y}{n}$$

X     Y     XY
1     3     3
2     6     12
4     4     16
3     7     21
10    20    52    (Totals)

$$SP = \sum XY - \frac{\sum X \sum Y}{n} = 52 - \frac{(10)(20)}{4} = 52 - 50 = 2$$
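Both formulas for SP can be checked against each other on the same four pairs of scores. This is an illustrative sketch only; the function names `sp_definitional` and `sp_computational` are hypothetical and do not come from the slides.

```python
def sp_definitional(xs, ys):
    """SP as the sum of products of deviations from the two means."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    return sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP as sum(XY) minus (sum(X) * sum(Y)) / n."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy - sum(xs) * sum(ys) / n


X = [1, 2, 4, 3]
Y = [3, 6, 4, 7]
print(sp_definitional(X, Y))   # 2.0
print(sp_computational(X, Y))  # 2.0
```

The computational form avoids working with decimal deviations, which is why it is often easier by hand when the means are not whole numbers.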
What is the sum of products for the
following data?

X Y
2 1
2 3
5 4
3 2
1 2
Calculation of the Pearson Correlation
$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$$

EXAMPLE:
X     Y
0     2
10    6
4     2
8     4
8     6
Calculation of (1) SS_X, (2) SS_Y, and (3) SP for a sample of n = 5 pairs of scores

Scores      Deviations             Squared Deviations           Products
X     Y     X - M_X    Y - M_Y     (X - M_X)^2   (Y - M_Y)^2    (X - M_X)(Y - M_Y)
0     2     -6         -2          36            4              12
10    6     4          2           16            4              8
4     2     -2         -2          4             4              4
8     4     2          0           4             0              0
8     6     2          2           4             4              4

M_X = 6, M_Y = 4, SS_X = 64, SS_Y = 16, SP = 28

$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}} = \frac{28}{\sqrt{(64)(16)}} = \frac{28}{32} = 0.875$$
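The same calculation can be written out step by step in code. This is a minimal sketch using only the standard library; the helper name `pearson_r` is hypothetical, and the data are the five pairs from the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed as SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    m_x = sum(xs) / n
    m_y = sum(ys) / n
    ss_x = sum((x - m_x) ** 2 for x in xs)   # variability of X
    ss_y = sum((y - m_y) ** 2 for y in ys)   # variability of Y
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))  # covariability
    return sp / sqrt(ss_x * ss_y)


X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]
print(pearson_r(X, Y))  # 0.875
```

If either variable has no variability (all scores identical), SS is 0 and the correlation is undefined; this sketch would then raise a ZeroDivisionError.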
A set of n = 5 pairs of X and Y values has SS_X = 5, SS_Y = 20, and SP = 8. What is the Pearson correlation?
Correlations and the Pattern of Data Points
The value of the correlation describes the relationship that exists in the data; it also indicates how tightly the points cluster around a line.
A positive sign for the correlation indicates that the points are clustered around a line that slopes up to the right. A high value for the correlation (near 1.00) indicates that the points are very tightly clustered close to the line.
Correlations and the Pattern of Data Points
Because the Pearson correlation describes the pattern formed by the data points, any factor that does not change the pattern also does not change the correlation. Adding a constant to (or subtracting a constant from) each X and/or Y value does not change the pattern of data points and does not change the correlation. Also, multiplying (or dividing) each X or each Y value by a positive constant does not change the pattern and does not change the value of the correlation. Multiplying by a negative constant, however, produces a mirror image of the pattern and, therefore, changes the sign of the correlation.
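These properties can be checked numerically. The sketch below reuses the five pairs from the earlier calculation example; the helper `pearson_r` and the particular constants (+5, ×3, ×-2) are illustrative choices, not values from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed as SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)


X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

original = pearson_r(X, Y)
shifted = pearson_r([x + 5 for x in X], Y)    # add a constant to every X
scaled = pearson_r([3 * x for x in X], Y)     # multiply X by a positive constant
flipped = pearson_r([-2 * x for x in X], Y)   # multiply X by a negative constant

print(original, shifted, scaled, flipped)
```

The first three values agree, while the last has the same magnitude with the opposite sign.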
A set of n = 15 pairs of X and Y values has a Pearson correlation of r = 0.10. If each of the X values were multiplied by 2, then what is the correlation for the resulting data?
Correlation Coefficient (r)
- A measure of the degree of correlation between two variables, ranging from -1 (a perfect negative linear correlation) through 0 (no correlation) to +1 (a perfect positive linear correlation).

Pearson Product-Moment Correlation
- Also called Pearson’s r; devised by Karl Pearson, it is a measure of the strength of a linear association between variables.
Using and Interpreting
Pearson Correlation
Where and Why Correlations are Used
1. Prediction
2. Validity
3. Reliability
4. Theory Verification
Interpreting Correlations
When you encounter correlations, there are four additional considerations that you should bear in mind.
1. Correlation and Causation
2. Correlation and Restricted Range
3. Outliers
4. Correlation and the Strength of the Relationship
Correlation and Causation
A correlation only describes a relationship between two variables; it does not explain why the two variables are related to each other.
Correlation and Restricted Range
Whenever a correlation is computed from scores that do not represent the full range of possible values, you should be cautious in interpreting the correlation. The correlation within this restricted range could be completely different from the correlation that would be obtained from a full range of data.
A relationship that is clear across the full range can be obscured when the data are limited to a restricted range. For a correlation to provide an accurate description for the general population, there should be a wide range of X and Y values in the data.
Outliers
A single extreme data point (an outlier) can have a dramatic effect on the value of a correlation and thereby can affect one’s interpretation of the relationship between variables X and Y.
Correlation and the Strength of the Relationship
Most researchers prefer to square the correlation and use the resulting value to measure the strength of the relationship. The squared correlation (r²) measures the gain in accuracy that is obtained from using the correlation for prediction.
The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
A correlation of r = 0.80 (or –0.80), for example, means that r² = 0.64 (or 64%) of the variability in the Y scores can be predicted from the relationship with X.
The correlation for a set of X and Y scores is r = 0.60. The scores are separated into two groups, with one group consisting of individuals with X values that are equal to or above the median and the other group consisting of individuals with X values that are below the median. If the correlation is computed for the group with X values below the median, how will the correlation compare with the correlation for the full set of scores?
a. It will still be r = 0.60.
b. It will be greater than r = 0.60.
c. It will be smaller than r = 0.60.
d. It is impossible to predict how the correlation for the smaller group will be related to the correlation for the entire group.
Suppose the correlation between height and weight for adults
is +0.80. What proportion (or percent) of the variability in
weight can be explained by the relationship with height?
Partial Correlations
A partial correlation measures the relationship between two variables while controlling the influence of a third variable by holding it constant.
Sometimes a researcher may observe that the relationship between two variables is being distorted by the influence of a third variable. Two variables may show a strong positive or negative correlation even though it is unlikely that they have a direct relationship; instead, both variables are influenced by a third variable.
Partial Correlation
EXAMPLE:
In a situation with three variables, X, Y, and Z, it is possible to compute three individual Pearson correlations:
1. r_XY measuring the correlation between X and Y
2. r_XZ measuring the correlation between X and Z
3. r_YZ measuring the correlation between Y and Z

The partial correlation between X and Y, holding Z constant, is then
$$r_{XY-Z} = \frac{r_{XY} - (r_{XZ} \, r_{YZ})}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$
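The partial correlation formula translates directly into a small function. This is a sketch under stated assumptions: the function name `partial_correlation` is hypothetical, and the three input correlations (0.60, 0.80, 0.70) are made-up values chosen only to illustrate the arithmetic.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y with the influence of Z held constant."""
    denominator = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when X or Y is perfectly correlated with Z")
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical values: X and Y correlate 0.60, but both are strongly related to Z.
r = partial_correlation(r_xy=0.60, r_xz=0.80, r_yz=0.70)
print(round(r, 3))
```

With these inputs the result is roughly 0.09, which suggests that most of the apparent X-Y relationship in this made-up case would be attributable to Z.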
What is measured by a partial correlation?
a. It is the correlation obtained for a sample with missing scores (X or Y values).
b. It is the correlation obtained for a restricted range of scores.
c. It eliminates the influence of outliers (extreme scores) when computing a correlation.
d. It measures the relationship between two variables while controlling the influence of a third variable.
Hypothesis Tests with
the Pearson Correlation
The Hypotheses
The basic question for this hypothesis test is whether a correlation exists in the population.

TWO-TAILED SIGNIFICANCE TEST
- H0: ρ = 0 (There is no population correlation.)
- H1: ρ ≠ 0 (There is a real correlation.)
- A sample correlation near zero provides support for H0, and a sample value far from zero tends to refute H0.

ONE-TAILED SIGNIFICANCE TEST
When there is a specific prediction about the direction of the correlation, it is possible to do a directional, or one-tailed, test. For example, if a researcher is predicting a positive relationship, the hypotheses would be
- H0: ρ ≤ 0 (The population correlation is not positive.)
- H1: ρ > 0 (The population correlation is positive.)
- For a directional test, a positive value for the sample correlation would tend to refute a null hypothesis stating that the population correlation is not positive.
The Hypotheses
Samples are not expected to be identical to the populations from which they come; there will be some discrepancy (sampling error) between a sample statistic and the corresponding population parameter.
The Hypothesis Test
The hypothesis test evaluating the significance of a correlation can be conducted using either a t statistic or an F-ratio.

Standard error for r
$$s_r = \sqrt{\frac{1 - r^2}{n - 2}}$$

t Statistic
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
The Hypothesis Test
Degrees of Freedom for the t Statistic
df = n - 2
An intuitive explanation for this value is that a sample with only n = 2 data points has no degrees of freedom: two points always fit a straight line exactly, so the first two points always produce a perfect correlation. The sample correlation is free to vary only when the data set contains more than two points.
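The standard error, t statistic, and degrees of freedom can be computed together. The sketch below is illustrative only: the function name `t_for_correlation` is hypothetical, and it reuses r = 0.875 with n = 5 from the earlier calculation example. Looking up the critical value still requires a t table, which this sketch does not do.

```python
from math import sqrt


def t_for_correlation(r, n, rho=0.0):
    """Return (t, df) for testing a sample correlation r against rho."""
    if n <= 2:
        raise ValueError("need more than two pairs of scores (df = n - 2)")
    if abs(r) >= 1:
        raise ValueError("standard error is zero for a perfect correlation")
    df = n - 2
    standard_error = sqrt((1 - r ** 2) / df)
    return (r - rho) / standard_error, df


t, df = t_for_correlation(r=0.875, n=5)
print(round(t, 2), df)
```

For these values t is about 3.13 with df = 3; the decision about H0 then depends on the critical t value for the chosen alpha level.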
Alternatives to the
Pearson Correlation
The Spearman Correlation (r_S)
Used when both variables are measured on ordinal scales (ranks), in two situations:
1. When the original data are ordinal (ranks)
2. When measuring the consistency of a relationship between X and Y, independent of the specific form of the relationship
   - The original scores are first converted to ranks; then the Pearson correlation formula is used with the ranks.
The word monotonic describes a sequence that is consistently increasing (or decreasing). Like the word monotonous, it means constant and unchanging.
The Spearman Correlation (r_S)
EXAMPLE:
The following data show a nearly perfect monotonic relationship between X and Y. When X increases, Y tends to decrease, and there is only one reversal in this general trend. To compute the Spearman correlation, we first rank the X and Y values, and we then compute the Pearson correlation for the ranks.
The Spearman Correlation (r_S)
Values are computed from the ranks, not the original scores. The X ranks are simply the integers 1, 2, 3, 4, and 5.
ΣX = 15
ΣX² = 55
$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 55 - \frac{(15)^2}{5} = 55 - 45 = 10$$
Note that the ranks for Y are identical to the ranks for X; that is, they are the integers 1, 2, 3, 4, and 5. Therefore, the SS for Y is identical to the SS for X:
$$SS_Y = 10$$
Computing the Correlation
To compute the correlation we need (1) SS for X, (2) SS for Y, and (3) SP.
When computing the SP value, we need ΣX, ΣY, and ΣXY for the ranks:
$$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 36 - \frac{(15)(15)}{5} = 36 - 45 = -9$$
Finally, the Spearman correlation simply uses the Pearson formula for the ranks.
$$r_S = \frac{SP}{\sqrt{SS_X \, SS_Y}} = \frac{-9}{\sqrt{10(10)}} = -0.9$$
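The rank-based computation can be traced in code. The slides give only the sums (ΣX = 15, ΣX² = 55, ΣXY = 36), so the Y ranks used here are an assumption: a hypothetical set with one reversal in a decreasing trend, chosen because it reproduces ΣXY = 36. The actual data behind the example may differ.

```python
from math import sqrt

x_ranks = [1, 2, 3, 4, 5]
y_ranks = [5, 3, 4, 2, 1]  # hypothetical ranks, chosen so that sum(XY) = 36
n = len(x_ranks)

# Computational formulas applied to the ranks rather than the original scores.
ss_x = sum(x * x for x in x_ranks) - sum(x_ranks) ** 2 / n
ss_y = sum(y * y for y in y_ranks) - sum(y_ranks) ** 2 / n
sum_xy = sum(x * y for x, y in zip(x_ranks, y_ranks))
sp = sum_xy - sum(x_ranks) * sum(y_ranks) / n

r_s = sp / sqrt(ss_x * ss_y)
print(ss_x, ss_y, sp, r_s)  # 10.0 10.0 -9.0 -0.9
```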
The Spearman Correlation (r_S): Ranking Tied Scores
When you are converting scores into ranks for the Spearman correlation, you may encounter two (or more) identical scores. In that case:
1. List the scores in order from smallest to largest. Include tied values in the list.
2. Assign a rank (first, second, etc.) to each position in the ordered list.
3. When two (or more) scores are tied, compute the mean of their ranked positions, and assign this mean value as the final rank for each score.
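The three steps for ranking tied scores can be sketched as a short function. The name `rank_with_ties` and the sample scores are hypothetical and are not taken from the slides.

```python
def rank_with_ties(scores):
    """Rank scores from smallest to largest, giving tied scores their mean rank."""
    # Step 1: order the positions of the scores from smallest to largest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Find the end of the run of scores tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Steps 2 and 3: positions are 1-based; tied scores share the mean position.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


print(rank_with_ties([3, 3, 5, 6, 6, 6, 12]))
# [1.5, 1.5, 3.0, 5.0, 5.0, 5.0, 7.0]
```

The two scores of 3 occupy positions 1 and 2 and so both receive 1.5; the three scores of 6 occupy positions 4, 5, and 6 and so all receive 5.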
If the following scores are ranked, what rank is assigned to an
individual with a score of X = 7?
Scores: 1, 2, 2, 5, 6, 6, 6, 7, 10, 12

a. 5
b. 6
c. 7
d. 8
The Spearman Correlation (r_S): Special Formula
After the original X values and Y values have been ranked, the calculations necessary for SS and SP can be greatly simplified. With no ties, the ranks are simply the integers 1, 2, ..., n.
To compute the mean for these integers, you can locate the midpoint of the series by
$$M = \frac{n + 1}{2}$$
Similarly, the SS for this series of integers can be computed by
$$SS = \frac{n(n^2 - 1)}{12}$$
Also, because the X ranks and the Y ranks are the same values, the SS for X is identical to the SS for Y.

Spearman correlation equation:
$$r_S = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
Where:
D is the difference between the X rank and the Y rank for each individual
EXAMPLE:
$$r_S = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6(38)}{5(25 - 1)} = 1 - \frac{228}{120} = 1 - 1.90 = -0.90$$

It should be noted that this special formula can be used only after the scores have been converted to ranks and only when there are no ties among the ranks. If there are relatively few tied ranks, the formula still may be used, but it loses accuracy as the number of ties increases.
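The special formula is easy to express as a function of the two sets of ranks. In this sketch the function name `spearman_special` is hypothetical, and the Y ranks are an assumed set that yields ΣD² = 38 as in the example; the original ranks were not recoverable from the slides.

```python
def spearman_special(x_ranks, y_ranks):
    """Spearman correlation from the special formula; valid only without ties."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


x_ranks = [1, 2, 3, 4, 5]
y_ranks = [5, 3, 4, 2, 1]  # hypothetical ranks giving sum(D^2) = 38

print(round(spearman_special(x_ranks, y_ranks), 2))  # -0.9
```

The result is rounded for display because the subtraction 1 - 1.90 is subject to ordinary floating-point error.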
Testing the Significance of the Spearman Correlation
Testing a hypothesis for the Spearman correlation is similar to the procedure used for the Pearson r. The sample correlation could be due to chance, or perhaps it reflects an actual relationship between the variables in the population.
The null hypothesis states that there is no correlation (no monotonic relationship) between the variables for the population, or in symbols:
H0: ρS = 0 (The population correlation is zero.)
The alternative hypothesis predicts that a nonzero correlation exists in the population:
H1: ρS ≠ 0 (There is a real correlation.)
To determine whether the Spearman correlation is statistically significant (that is, whether H0 should be rejected), consult Table B.7.
The Point-Biserial Correlation and Measuring Effect Size with r²
The point-biserial correlation is used to measure the relationship between two variables in situations in which one variable consists of regular, numerical scores, but the second variable has only two values (dichotomous).
A variable with only two values is called a dichotomous variable or a binomial variable.
The dichotomous variable is coded using values of 0 and 1, and the regular Pearson formula is applied. Squaring the point-biserial correlation produces the same r² value that is obtained to measure effect size for the independent-measures t test.
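The link between the point-biserial correlation and the t test can be checked numerically. This sketch makes several assumptions not found in the slides: the two groups of scores are made up, the helper names are hypothetical, and the effect-size formula r² = t² / (t² + df) for the pooled-variance independent-measures t test is a standard textbook result.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed as SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)


def independent_t(a, b):
    """Pooled-variance independent-measures t statistic and its df."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((v - m_a) ** 2 for v in a)
    ss_b = sum((v - m_b) ** 2 for v in b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    standard_error = sqrt(pooled_variance / n_a + pooled_variance / n_b)
    return (m_a - m_b) / standard_error, df


group0 = [2, 4, 3, 3]   # hypothetical scores, dichotomous variable coded 0
group1 = [6, 5, 7, 6]   # hypothetical scores, dichotomous variable coded 1

codes = [0] * len(group0) + [1] * len(group1)
scores = group0 + group1

r_pb = pearson_r(codes, scores)        # point-biserial correlation
t, df = independent_t(group1, group0)
r2_from_t = t ** 2 / (t ** 2 + df)     # effect size from the t test

print(round(r_pb ** 2, 4), round(r2_from_t, 4))
```

Both printed values agree, which is the equivalence the slide describes.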
The Phi-Coefficient
When both variables, X and Y, are dichotomous, the phi-coefficient can be used to measure the strength of the relationship.
Both variables are coded 0 and 1, and the Pearson formula is used to compute the correlation.
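Because the phi-coefficient is just the Pearson formula applied to 0/1 codes, it needs no special machinery. The eight pairs of codes below are hypothetical, as is the helper name `pearson_r`.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed as SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - m_x) ** 2 for x in xs)
    ss_y = sum((y - m_y) ** 2 for y in ys)
    sp = sum((x - m_x) * (y - m_y) for x, y in zip(xs, ys))
    return sp / sqrt(ss_x * ss_y)


# Hypothetical data: both variables are dichotomous and coded 0 or 1.
X = [0, 0, 0, 1, 1, 1, 1, 0]
Y = [0, 0, 1, 1, 1, 0, 1, 0]

phi = pearson_r(X, Y)
print(phi)  # 0.5
```

Six of the eight individuals have matching codes on X and Y, which is what produces the moderate positive value here.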
