Pearson Correlation Coefficient
Introduction:
Sometimes in scientific data, it appears that two variables are
connected in such a way that when one variable changes, the other
variable changes also. This connection is called a correlation.
Examples of this type of correlation include: (1) in deer populations,
large males seem to have more successful matings; and (2) larger
numbers of birds seem to nest in areas with dense vegetation.
Student Procedure
Example: Your students have done some classroom research on
amphibian species found in your area and have discovered that the
red-backed salamander makes its home in fallen logs and debris on
the forest floor. During their earlier census of their Biodiversity
Plot, they noticed that some quadrats have many fallen logs
whereas other quadrats have few or none. They expect to
find more red-backed salamanders in those quadrats with
many fallen logs and design an experiment to test this hypothesis.
This experiment measures: (1) the number of fallen logs in each
quadrat; and (2) the number of red-backed salamanders in each
quadrat. This is a table of the data your class has collected:
SA 3.1
Statistical Analysis - 3
Step 1:
Graphing the data
Graph your data by computer, or by hand, by assigning the number
of fallen logs (the second column) as the X-axis and the number of
salamanders (the last column) as the Y-axis. For example, in
Quadrat 1, the X-value would be 4 and the Y-value would be 3. The
results, when we plot all 25 points on the graph, look like this:
[Figure: RELATIONSHIP OF FALLEN LOGS AND SALAMANDERS. Scatter plot; X-axis: Number of Fallen Logs (0 to 8); Y-axis: Number of Salamanders (0 to 3.5).]
Looking at this graph, there seems to be a positive relationship
between the number of fallen logs and the number of salamanders.
In other words, it appears that when the number of fallen logs
increases, the number of red-backed salamanders also increases.
Some things to remember about the Pearson r correlation:
• The weakest correlation the Pearson r can show is r = 0.00. This
means there is ZERO correlation, and would indicate that X and Y
have no linear relationship to one another.
• The strongest correlation the Pearson r can show is r = 1.00 or
r = -1.00. Either value indicates a PERFECT correlation and would
indicate that X and Y are completely related to one another in the
sample.
• Pearson r values can be either positive or negative, so r always
lies between -1.00 and 1.00. A positive value indicates that
increases in X correspond to increases in Y. A negative value
indicates that increases in X correspond to decreases in Y.
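The three points above can be checked with a short Python sketch. The helper `pearson_r` and the small data sets are made up for illustration (they are not the class data), and the helper uses the mean-centered form of the Pearson r, which is algebraically equivalent to the sums formula used later in this handout.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviations around the mean of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Made-up illustrative values, not the salamander data.
x = [1, 2, 3, 4, 5]
r_perfect_pos = pearson_r(x, [2, 4, 6, 8, 10])  # Y rises exactly in step with X
r_perfect_neg = pearson_r(x, [10, 8, 6, 4, 2])  # Y falls exactly as X rises
r_none = pearson_r(x, [2, 1, 3, 1, 2])          # no linear trend in Y

print(r_perfect_pos, r_perfect_neg, r_none)
```

The first two results sit at the extremes (1 and -1), and the third is zero apart from floating-point rounding.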
Step 3:
Calculating the Pearson r Correlation Coefficient
The graph below was produced with the charting function of
Microsoft Excel, which calculated a correlation coefficient from the
data in our example. The graph shows a trend: there are more
salamanders where more fallen logs are present. Note, however,
that the value calculated by this program is the Pearson r value
squared (R²). You must take the square root of this figure to get
the Pearson r value, giving it a positive sign if the trend line rises
and a negative sign if it falls. From the graph: R² = 0.72;
Pearson r = 0.85. Because 0.85 is close to 1.0 (the maximum value
for the Pearson r), this demonstrates a strong, positive correlation.
[Figure: Scatter plot with trend line; X-axis: Number of Fallen Logs (0 to 8); Y-axis: Number of Salamanders (-0.5 to 3.5); R² = 0.7175; Pearson r = 0.85.]
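As a rough sketch of the conversion from the charted R² value to the Pearson r, the Python snippet below takes the square root and then borrows the sign from the direction of the trend line, since a square root alone is never negative. The function name `r_from_r_squared` and the `slope` argument are illustrative assumptions; only the R² value of 0.7175 comes from the chart.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover Pearson r from R squared, taking the sign from the trend line slope."""
    r = math.sqrt(r_squared)
    return r if slope >= 0 else -r

# R squared = 0.7175 is the value reported on the chart.
# slope=1.0 is a placeholder for "the trend line rises"; only its sign is used.
r = r_from_r_squared(0.7175, slope=1.0)
print(round(r, 2))  # 0.85
```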
If you are not using Excel or another graphing program, you can
calculate the Pearson r by using the following formula:
\[
\text{Pearson } r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}}
\]
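The formula can also be sketched as a small Python function that builds each sum from paired lists of X and Y values. The function name `pearson_r_from_pairs` and the five quadrats of counts are hypothetical and are used only to show the mechanics; they are not the class data.

```python
import math

def pearson_r_from_pairs(xs, ys):
    """Pearson r from the sums formula: N, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical counts for five quadrats (not the class data).
logs = [0, 1, 2, 3, 4]
salamanders = [0, 0, 1, 1, 3]
r = pearson_r_from_pairs(logs, salamanders)
print(r)
```

Note that the function divides by zero if every X value (or every Y value) is the same, because the formula is undefined when one variable does not vary.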
Using the values from the new table, complete the Pearson r
formula:
\[
\text{Pearson } r = \frac{N(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[N(\sum X^2) - (\sum X)^2][N(\sum Y^2) - (\sum Y)^2]}}
\]
The formula looks like this once we plug in all the numbers
(N = 25, ∑XY = 90, ∑X = 57, ∑Y = 22, ∑X² = 213, ∑Y² = 46):
\[
\begin{aligned}
\text{Pearson } r &= \frac{(25)(90) - (57)(22)}{\sqrt{[25(213) - (57)^2][25(46) - (22)^2]}} \\
&= \frac{2250 - 1254}{\sqrt{[5325 - 3249][1150 - 484]}} \\
&= \frac{996}{\sqrt{[2076][666]}} \\
&= \frac{996}{\sqrt{1382616}} \\
&= \frac{996}{1175.85} \\
&\approx 0.847
\end{aligned}
\]
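The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, while the six input values are the ones plugged into the formula in this example.

```python
import math

# Values plugged into the formula in the worked example.
n = 25        # number of quadrats
sum_xy = 90   # sum of X times Y
sum_x = 57    # sum of X (fallen logs)
sum_y = 22    # sum of Y (salamanders)
sum_x2 = 213  # sum of X squared
sum_y2 = 46   # sum of Y squared

numerator = n * sum_xy - sum_x * sum_y    # 2250 - 1254 = 996
x_term = n * sum_x2 - sum_x ** 2          # 5325 - 3249 = 2076
y_term = n * sum_y2 - sum_y ** 2          # 1150 - 484 = 666
denominator = math.sqrt(x_term * y_term)  # square root of 1382616
r = numerator / denominator

print(round(r, 2))       # 0.85
print(round(r ** 2, 4))  # 0.7175
```

The squared result agrees with the R² value shown on the Excel chart.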
Step 4:
Determine if your calculations have statistical significance
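1. Calculate the degrees of freedom (DF), which is the number of
quadrats (N) minus 2: DF = 25 - 2 = 23
2. Find your DF in the table below and read off the critical value.
The table has no row for DF = 23, so use the next lower row,
DF = 20, which gives a slightly stricter critical value of .3598.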
3. The calculated r of 0.85 is greater than the critical value from
the table, so our findings are statistically significant. This supports
our hypothesis: there is a strong positive correlation between the
number of fallen logs and the number of salamanders, and this
correlation is unlikely to be due to chance alone.
DF (N - 2)   Critical Value (5% significance level)
1 .98769
2 .90000
3 .8054
4 .7293
5 .6694
6 .6215
7 .5822
8 .5494
9 .5214
10 .4973
11 .4762
12 .4575
13 .4409
14 .4259
15 .4124
16 .4000
17 .3887
18 .3783
19 .3687
20 .3598
25 .3233
30 .2960
35 .2746
40 .2573
45 .2428
50 .2306
60 .2108
70 .1954
80 .1829
90 .1726
100 .1638
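The table lookup in Step 4 can be sketched in Python as follows. The critical values are copied from the table above. The function names are illustrative, and falling back to the nearest lower DF when the exact DF is not listed is an assumption made here; it is a cautious convention because lower DF rows have larger critical values.

```python
# Critical values copied from the table above (DF -> critical value).
CRITICAL_VALUES = {
    1: 0.98769, 2: 0.90000, 3: 0.8054, 4: 0.7293, 5: 0.6694,
    6: 0.6215, 7: 0.5822, 8: 0.5494, 9: 0.5214, 10: 0.4973,
    11: 0.4762, 12: 0.4575, 13: 0.4409, 14: 0.4259, 15: 0.4124,
    16: 0.4000, 17: 0.3887, 18: 0.3783, 19: 0.3687, 20: 0.3598,
    25: 0.3233, 30: 0.2960, 35: 0.2746, 40: 0.2573, 45: 0.2428,
    50: 0.2306, 60: 0.2108, 70: 0.1954, 80: 0.1829, 90: 0.1726,
    100: 0.1638,
}

def critical_value(df):
    """Critical value for the largest tabulated DF that does not exceed df."""
    usable = [d for d in CRITICAL_VALUES if d <= df]
    if not usable:
        raise ValueError("DF must be at least 1")
    return CRITICAL_VALUES[max(usable)]

def is_significant(r, n):
    """True if the size of r exceeds the critical value for DF = n - 2."""
    return abs(r) > critical_value(n - 2)

print(critical_value(23))        # 0.3598 (row for DF = 20)
print(is_significant(0.85, 25))  # True
```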