CORRELATION
Khairil Anuar Md. Isa BBiomedicalSc.(hons), UKM MSc. (Medical Stat), USM
WHAT IS CORRELATION?
Measures the strength and direction of a linear relationship between a pair of random variables. Measured by the coefficient of correlation, ρ (rho), for the population; ρ has a value between -1 and 1. The sample estimate of ρ is denoted by r (sample coefficient of correlation).
WHAT IS CORRELATION? CONT
The type of coefficient suitable for a pair of variables is determined by their measurement scales (ratio/interval or ordinal) and their distributions (normal or not normal). Correlation does not indicate a cause-and-effect relationship. E.g.: a high correlation does not necessarily mean that one variable caused the other.
CORRELATION ANSWERS:
What is the direction of the linear relationship (negative or positive)? How strong is the linear relationship?
! Regression additionally answers the question of prediction (and gives more detail on the linear relationship)
EXAMPLES OF RESEARCH QUESTIONS
What factors are related to road accident fatalities? Is the satisfaction level of students towards their lectures related to their grade expectation? What factors affect customer satisfaction? Is the shelf life of bread related to the humidity of the storage area?
CORRELATION VERSUS REGRESSION
Correlation
Emphasis on the degree of linear relationship
How strong the relationship is
Does NOT matter which is X and which is Y
Regression
Emphasis on prediction
Directional ~ one variable (the predictor) is used to predict the other
MUST identify which is the predictor variable (X) and which is to be predicted (Y)
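To make the "does NOT matter which is X and which is Y" point concrete, here is a minimal Python sketch on hypothetical weight and blood-pressure values (invented for illustration only). Swapping the variables leaves Pearson's r unchanged, while the least-squares regression slope changes, because regression treats one variable as the predictor. The helper names `pearson_r` and `ols_slope` are illustrative, not taken from any library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    """Least-squares slope when y is regressed on x (x is the predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data, for illustration only
weight = [55, 62, 70, 78, 85, 93]
bp = [110, 118, 121, 130, 128, 140]

print(pearson_r(weight, bp), pearson_r(bp, weight))  # same value either way
print(ols_slope(weight, bp), ols_slope(bp, weight))  # two different slopes
```

The product of the two slopes equals r², which is one way to see how the two ideas are linked while still answering different questions.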
RELATIONSHIP BETWEEN 2 QUANTITATIVE VARIABLES
Measured by Pearson's coefficient of correlation (parametric). Both variables must be quantitative and normally distributed. The nature, direction and strength of the relationship can be seen from a scatter plot.
[Figure: Positive correlation, scatter plot of BP (y-axis) against Weight (x-axis)]
As weight increases, blood pressure increases
[Figure: Negative correlation, scatter plot of PEFR against age (x-axis labelled X5)]
As age increases, PEFR decreases
[Figure: Almost no correlation, scatter plot of Weight against Age]
As age increases, weight does not change systematically
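The longer and narrower the ellipse formed by the points, the larger the magnitude of r
The more scattered the points, the lower the correlation; points scattered in the shape of a circle show little or no correlation
[Figure: example scatter plots with r = -0.94, r = -0.54, r = 0.42 and r = 0.17]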
GUIDE TO INTERPRETING THE COEFFICIENT (r)
|r| = 0 to 0.25: poor / no correlation
|r| = 0.26 to 0.50: fair correlation
|r| = 0.51 to 0.75: good correlation
|r| = 0.76 to 1: excellent / perfect correlation
The same guide applies to both + and - values of r.
! A positive correlation means the two variables move in the same direction; a negative correlation means they move in opposite directions.
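The guide above can be expressed as a small lookup, sketched here in Python. It is an illustration, not a standard function: the name `interpret_r` is hypothetical, and because the printed bands leave small gaps (e.g. 0.255 falls between 0.25 and 0.26), this sketch assumes such values belong to the lower band. The cut-offs are applied to |r| and the sign is reported separately as the direction.

```python
def interpret_r(r):
    """Label r with the slide's guide: (direction, strength).

    Assumption: values in the gaps between printed bands go to the lower band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the guide applies to + and - values alike
    if size <= 0.25:
        strength = "poor / no"
    elif size <= 0.50:
        strength = "fair"
    elif size <= 0.75:
        strength = "good"
    else:
        strength = "excellent / perfect"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, strength

print(interpret_r(-0.244))  # ('negative', 'poor / no')
print(interpret_r(0.42))    # ('positive', 'fair')
```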
ASSUMPTIONS FOR CORRELATION TEST
At least one of the variables is normally distributed
There is a linear relationship between the variables
Random sample
H0: There is no correlation between X and Y (ρ = 0)
HA: There is a correlation between X and Y (ρ ≠ 0)
SUMMARY..
To see whether there is any relationship between two numerical variables in terms of strength and direction.
Scatter plot
Linear association? Direction (+/-) of association?
Correlation test = r
PEARSON'S CORRELATION COEFFICIENT
A measure of the degree of straight-line relationship between two numerical variables
At least one variable must have a normal distribution
Population correlation coefficient: ρ (rho)
Sample correlation coefficient: r
FORMULA..
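The formula on this slide did not survive extraction. For reference, below is the standard definition of Pearson's sample correlation coefficient for n pairs (x_i, y_i) with sample means x̄ and ȳ, followed by an algebraically equivalent computational form often used for hand calculation. The original slide may have shown either form.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[ n \sum x_i^2 - \left(\sum x_i\right)^2 \right]
                \left[ n \sum y_i^2 - \left(\sum y_i\right)^2 \right]}}
```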
TESTING THE SIGNIFICANCE OF THE ASSOCIATION
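The details of this slide were also lost. A standard way to test H0: ρ = 0 for Pearson's r is the t statistic t = r√(n - 2) / √(1 - r²), compared against a t distribution with n - 2 degrees of freedom. The Python sketch below computes only t and the degrees of freedom; the function name `correlation_t_test` is hypothetical, and obtaining an exact p-value is left to a t table or a statistics package (for example, SciPy's `scipy.stats.pearsonr` reports r together with its p-value).

```python
from math import sqrt

def correlation_t_test(r, n):
    """t statistic and degrees of freedom for H0: rho = 0 (Pearson's r)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("|r| must be below 1 for a finite t")
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t, df

t, df = correlation_t_test(0.5, 27)
print(f"t = {t:.3f} with df = {df}")  # compare |t| with the critical t value
```

For this example, t is about 2.887 with 25 degrees of freedom; the two-sided 5% critical value for df = 25 is 2.060, so H0 would be rejected at the 5% level.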
CAUTIONS ABOUT INTERPRETING CORRELATION COEFFICIENTS
Appropriate data type
Data MUST be drawn from a random sample
One variable should not be a component of the other
The data should not be the combined data from two identifiable groups
..CAUTIONS..
Effect of outliers
Look at the scatter plot!
Outliers affect the means, so they can affect the coefficient
The effect may be muted if the sample size is large
Check whether it is a true outlier or not, then handle the outlier
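A quick numerical illustration of why the scatter plot matters: in this Python sketch on hypothetical data (invented for illustration), a single outlying point turns a strong positive correlation into a weak one. The helper `pearson_r` is written out here rather than taken from a library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data with a clear positive trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 5, 4, 6, 7, 8, 9]

# The same data plus one outlying point at (9, 0)
x_out = x + [9]
y_out = y + [0]

r_clean = pearson_r(x, y)
r_outlier = pearson_r(x_out, y_out)
print(f"without outlier: r = {r_clean:.2f}")    # about 0.97
print(f"with one outlier: r = {r_outlier:.2f}")  # about 0.25
```

With only nine points, one observation is enough to move r from "excellent" to "poor" on the guide; with a much larger sample the same point would shift r far less.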
Correlation is not agreement
Do not use correlation to compare the change in a variable with its initial value
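Why correlation is not agreement: in this Python sketch, two hypothetical measurement methods (invented data) are perfectly correlated, yet method B always reads much higher than method A, so the two methods clearly do not agree. Agreement is usually assessed from the differences between methods (e.g. a Bland-Altman analysis) rather than from r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical readings of the same quantity by two methods
method_a = [10, 20, 30, 40, 50]
method_b = [2 * v + 5 for v in method_a]  # B always reads 2*A + 5

r = pearson_r(method_a, method_b)
mean_difference = sum(b - a for a, b in zip(method_a, method_b)) / len(method_a)
print(f"r = {r:.3f}, mean difference (B - A) = {mean_difference}")
# r = 1.000, yet B exceeds A by 35 units on average
```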
..CAUTION..LIMITATIONS.
A very weak relationship (|r| < 0.25) can still give a significant result with a sufficiently large sample size
High correlation does not imply a cause-and-effect relationship
No temporal component, which is needed to establish causality
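The first limitation follows from the t statistic for H0: ρ = 0, t = r√(n - 2) / √(1 - r²), which grows with the sample size for a fixed r. This Python sketch (hypothetical r = 0.2; the helper name `correlation_t` is illustrative) shows the same weak correlation failing the test with 20 pairs but passing easily with 1000.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

r = 0.2  # a weak correlation (hypothetical value)
t_small = correlation_t(r, 20)    # df = 18
t_large = correlation_t(r, 1000)  # df = 998

# Two-sided 5% critical values from t tables: 2.101 for df = 18,
# and roughly 1.96 for very large df.
print(f"n = 20:   t = {t_small:.2f}")  # about 0.87, not significant
print(f"n = 1000: t = {t_large:.2f}")  # about 6.45, significant
```

The strength of the relationship is identical in both cases; only the precision of the estimate changes, which is why r itself should be reported and interpreted alongside the p-value.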
SPEARMAN RANK CORRELATION
Distribution free
For non-normally distributed data ~ nonparametric procedure
Replaces the observations by their ranks in the calculation
d is the difference between the rank of x and the rank of y for each pair of observations
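The slide's formula image was lost; the standard rank-difference form of Spearman's coefficient is r_s = 1 - 6Σd² / (n(n² - 1)), where d is as defined above and n is the number of pairs. This form is exact only when there are no tied values; with ties, a common alternative is to compute Pearson's r on the ranks. The Python sketch below is an illustration under that assumption; `ranks` and `spearman_rs` are hypothetical helper names, and tied values receive the average of their ranks.

```python
def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman's r_s via 1 - 6*sum(d^2) / (n(n^2 - 1)); exact without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))     # 0.8
print(spearman_rs([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # 1.0
```

The second example is monotonic but not linear (y = x³), and Spearman's coefficient still reports a perfect relationship because only the ranks are used.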
COMPUTING CORRELATION COEFFICIENTS
Scatter plot
Correlation matrix
Coefficient
Significance test
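A correlation matrix simply lists the coefficient for every pair of variables. The Python sketch below builds one from hypothetical measurements (invented for illustration); the function name `correlation_matrix` is illustrative. The diagonal is always 1, since each variable correlates perfectly with itself, and the matrix is symmetric because r(X, Y) = r(Y, X).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's sample correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(data):
    """Pairwise Pearson r for every pair of variables in `data` (name -> list)."""
    names = list(data)
    matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical measurements on 6 subjects (illustration only)
data = {
    "age": [25, 32, 41, 50, 58, 66],
    "weight": [60, 72, 68, 75, 70, 66],
    "pefr": [520, 505, 480, 450, 430, 400],
}

names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(f"{name:>7}", " ".join(f"{v:6.2f}" for v in row))
```

Each off-diagonal entry still needs its own significance test, and Spearman's coefficient can be substituted for variables that are not normally distributed.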
STEPS..
Check normality of the two variables: Pearson's or Spearman rank?
Do a scatter plot: identify outliers (remove?); visually assess whether the relationship is LINEAR or not, and its direction
Correlation test: P-value and correlation coefficient
Steps using SPSS..
2 Click the two variables
INTERPRETATION
There is a significant (p < 0.001), negative, poor correlation (r = -0.244) between age and PEFR.
Thank you..