Linear Correlation (Pearson) : Assumptions
The LINEAR CORRELATION (PEARSON) command calculates the Pearson product-moment correlation
coefficient between each pair of variables. The Pearson correlation coefficient measures the strength of the
linear association between two variables.
For ranked data, consider using Spearman's correlation coefficient (RANK CORRELATIONS
command).
Assumptions
Each variable should be continuous, drawn as a random sample, and approximately normally distributed.
How To
Run: STATISTICS->BASIC STATISTICS ->LINEAR CORRELATION (PEARSON)...
Pairwise deletion is the default method for handling missing values (use the MISSING VALUES option in the
PREFERENCES window to force casewise deletion).
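The difference between the two deletion modes can be illustrated with a minimal Python sketch. It is not the program's implementation: the data set and the helper names `pearson_r`, `used_rows` and `correlation` are hypothetical, and missing values are assumed to be represented by `None`.

```python
import math

# Hypothetical data set; None marks a missing value (an assumption of this sketch).
data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.1, 3.9, 6.2, 8.1, None, 12.2],
    "C": [5.0, None, 4.0, 2.5, 2.0, 1.0],
}

def pearson_r(xs, ys):
    """Pearson r for two equal-length lists without missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def used_rows(data, pair, casewise):
    """Indices of the rows that survive deletion for one pair of variables.

    Pairwise: a row is dropped only if one of the two variables is missing.
    Casewise: a row is dropped if any variable in the data set is missing.
    """
    check = list(data) if casewise else list(pair)
    n_rows = len(next(iter(data.values())))
    return [i for i in range(n_rows)
            if all(data[v][i] is not None for v in check)]

def correlation(data, a, b, casewise=False):
    """Return (r, number of cases used) for variables a and b."""
    rows = used_rows(data, (a, b), casewise)
    xs = [data[a][i] for i in rows]
    ys = [data[b][i] for i in rows]
    return pearson_r(xs, ys), len(rows)

for mode in (False, True):
    r, n = correlation(data, "A", "B", casewise=mode)
    label = "casewise" if mode else "pairwise"
    print(f"{label}: r = {r:.4f}, cases used = {n}")
```

With pairwise deletion the pair A, B keeps every row in which both A and B are present, while casewise deletion also drops the row in which only C is missing, so fewer cases are used.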
Results
A matrix with correlation coefficients, critical values and p-values for each pair of variables is
produced. The null hypothesis of no linear association is tested for each correlation coefficient. Below the
matrix, the R-values are listed in order of their absolute value.
SAMPLE SIZE – shows how many cases were used for the calculations. The variables must have the same
number of observations (the size of the variable with the fewest observations is used).
CRITICAL VALUE(𝛼%) – the 𝛼% critical value for the T-statistic, used to test the null hypothesis.
$$ r_{X,Y} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_X s_Y} $$
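As an illustration, the formula can be evaluated directly in a few lines of Python. This is only a sketch: the function name `pearson_r` is hypothetical, and 𝑠𝑋 and 𝑠𝑌 are assumed to be the sample standard deviations (computed with 𝑁 − 1 in the denominator), which is what the (𝑁 − 1) factor in the formula suggests.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed term by term from the formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (assumed: N - 1 in the denominator).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Numerator: sum of the products of the deviations from the means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 4))  # 0.8
```

For this small example the sum of products is 8 and (𝑁 − 1)𝑠𝑋𝑠𝑌 is 10, which gives 0.8.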
The correlation coefficient can take values from -1 to +1. A positive correlation coefficient means
that as one variable gets bigger, the other also tends to get bigger, so they tend to move in the same
direction. A negative correlation coefficient means that the variables tend to move in opposite
directions: if one variable increases, the other tends to decrease, and vice versa. When the correlation
coefficient is close to zero, the two variables have little or no linear relationship. Note that even a
strong correlation does not imply causation.
There are many rules of thumb on how to interpret a correlation coefficient, but all of them are
domain specific. For example, Hinkle, Wiersma and Jurs (2003) offer an interpretation of the correlation
coefficient for the behavioral sciences.
R STANDARD ERROR – the standard error of the correlation coefficient. It is used to determine the confidence
interval around a true correlation of zero. If the correlation coefficient falls outside this interval, it is
significantly different from zero.
T – the observed value of the T-statistic. It is used to test the null hypothesis that the two variables are not
correlated. A T-value near 0 is consistent with the null hypothesis that there is no correlation between the
variables. When the sample size 𝑁 is large, the test statistic 𝑇 approximately follows Student's t-distribution
with 𝑁 − 2 degrees of freedom.
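The exact formulas the program uses are not stated here. A common textbook choice, assumed in the sketch below, is SE = sqrt((1 − r²) / (N − 2)) for the standard error and T = r / SE for the test statistic. The function name `r_standard_error_and_t` is hypothetical.

```python
import math

def r_standard_error_and_t(r, n):
    """Standard error of r and the T-statistic for testing zero correlation.

    Assumes the textbook formulas SE = sqrt((1 - r^2) / (n - 2)) and T = r / SE,
    so T is referred to a t-distribution with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    if abs(r) >= 1.0:
        raise ValueError("T is undefined when |r| equals 1")
    se = math.sqrt((1.0 - r * r) / (n - 2))
    return se, r / se

se, t = r_standard_error_and_t(0.8, 5)
print(f"SE = {se:.4f}, T = {t:.4f}")  # SE = 0.3464, T = 2.3094
```

With r = 0.8 and only 5 observations, the standard error is large relative to r, which is why a seemingly strong correlation yields a modest T-value.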
P-VALUE – a low p-value is taken as evidence that the null hypothesis can be rejected. The smaller the p-value,
the stronger the evidence of a linear relationship. If the p-value is less than the selected 𝛼 level, we can say
there is a statistically significant linear relationship between the variables.
H0 (𝛼%) ? – shows whether the null hypothesis (𝑟 = 0) is accepted (written in red) or rejected at the selected
significance level 𝛼.
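To make the decision rule concrete, the following Python sketch computes a two-sided p-value for a T-value with 𝑁 − 2 degrees of freedom and compares it with 𝛼. It uses only the standard library and integrates the t-distribution density numerically with Simpson's rule. This is an illustrative approach, not necessarily how the program computes p-values, and the names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density between 0 and |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical example: T = 2.3094 from N = 5 observations, so df = N - 2 = 3.
alpha = 0.05
p = two_sided_p(2.3094, 3)
print(f"p-value = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

In this example the p-value is roughly 0.10, which is above 𝛼 = 0.05, so the null hypothesis of no linear association is not rejected.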
References
Hinkle, Wiersma, & Jurs (2003). Applied Statistics for the Behavioral Sciences (5th ed.)