CORRELATION ANALYSIS
KEY CONCEPTS:
• Concept of correlation
• Types of correlation
• Methods of studying correlation
  a) Scatter diagram
  b) Karl Pearson’s coefficient of correlation
  c) Spearman’s rank correlation coefficient
CONCEPT OF CORRELATION
❖ Ya-Lun Chou
• Mutual dependence
• Influence of a third variable
• Pure chance
• Spurious correlation
TYPES OF CORRELATION
• Type 1: Positive correlation, Negative correlation
• Type 2: Simple correlation, Partial correlation, Multiple correlation
• Type 3: Linear correlation, Non-linear correlation
TYPES OF CORRELATION: SIMPLE, PARTIAL AND MULTIPLE
[Figure: classification of correlation into simple and multiple, and into total and partial]
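TYPES OF CORRELATION: LINEAR AND NON-LINEAR
Linear correlation: The correlation is linear if the amount of change in one variable bears a constant ratio to the amount of change in the other variable.
Ex: X = 1, 2, 3, 4, 5, 6, 7, 8
    Y = 5, 7, 9, 11, 13, 15, 17, 19
    Here Y = 3 + 2X, so Y changes by 2 for every unit change in X.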
Non-linear correlation: The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
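The difference between the two definitions can be checked numerically by computing the ratio of the change in Y to the change in X between consecutive observations. The sketch below uses the linear series from the example; the non-linear series Y = X² is a hypothetical illustration added here, and the helper name `change_ratios` is an assumption, not part of the original material.

```python
def change_ratios(xs, ys):
    """Return (change in Y) / (change in X) for each pair of consecutive observations."""
    pairs = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]

x = [1, 2, 3, 4, 5, 6, 7, 8]
y_linear = [5, 7, 9, 11, 13, 15, 17, 19]   # Y = 3 + 2X, from the example
y_nonlinear = [v ** 2 for v in x]          # hypothetical Y = X^2

linear_ratios = change_ratios(x, y_linear)
nonlinear_ratios = change_ratios(x, y_nonlinear)

print(linear_ratios)     # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0] -> constant ratio, linear
print(nonlinear_ratios)  # [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0] -> ratio varies, non-linear
```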
THE CORRELATION COEFFICIENT
• If cov(X, Y) > 0, the covariance is positive and the two variables tend to move in the same direction.
• If cov(X, Y) < 0, the covariance is negative and the two variables tend to move in opposite directions.
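For reference, Karl Pearson's coefficient standardises the covariance by the two standard deviations, which keeps r between −1 and +1 and gives it the same sign as cov(X, Y):

$$ r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

A minimal Python sketch of this definition follows. It assumes the population convention (dividing by n); the n cancels in the ratio, so r is the same under the n − 1 convention. The two data series are hypothetical.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def pearson_r(xs, ys):
    """Karl Pearson's r = cov(X, Y) / (sigma_X * sigma_Y)."""
    sx = math.sqrt(covariance(xs, xs))  # cov(X, X) is the variance of X
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # hypothetical: tends to rise with x
y_down = [5, 4, 5, 4, 2]  # hypothetical: tends to fall as x rises

print(covariance(x, y_up), covariance(x, y_down))  # positive, then negative
print(round(pearson_r(x, y_up), 4))    # 0.7746
print(round(pearson_r(x, y_down), 4))  # -0.7746
```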
EXAMPLE 1
The points in the scatter plot lie close to a straight line sloping upward, indicating a strong positive relationship between x and y (r close to +1).
EXAMPLE 2
The points in the scatter plot do not follow any line, indicating a very weak relationship between x and y (r close to 0).
EXAMPLE 3
The points in the scatter plot lie close to a straight line sloping downward, indicating a strong negative relationship between x and y (r close to −1).
SCATTER DIAGRAM MERITS AND LIMITATIONS
Merits
Demerits
• A scatter diagram does not measure the precise extent of correlation.
EXAMPLE 4
Find r.
$$ r = \frac{772.5}{\sqrt{257.5 \times 2317.5}} = \frac{772.5}{772.5} = 1 $$
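The arithmetic can be checked with a short sketch. It assumes, as in the actual-mean (deviation) method, that 772.5 is Σxy and that 257.5 and 2317.5 are Σx² and Σy², where x and y are deviations from the means; the underlying table is not reproduced in these notes. Since 257.5 × 2317.5 = 596,756.25 = 772.5², r is exactly 1. The function names are illustrative, and the second call reuses the linear series Y = 3 + 2X as hypothetical raw data.

```python
import math

def r_actual_mean(sum_xy, sum_x2, sum_y2):
    """Karl Pearson's r from deviations taken from the actual means:
    r = sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

def deviation_sums(xs, ys):
    """Return (sum(xy), sum(x^2), sum(y^2)) for deviations from the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return (
        sum(a * b for a, b in zip(dx, dy)),
        sum(a * a for a in dx),
        sum(b * b for b in dy),
    )

# Example 4: the three figures quoted above, assumed to be the deviation sums.
r4 = r_actual_mean(772.5, 257.5, 2317.5)
print(r4)  # 1.0

# Hypothetical raw data: the linear series Y = 3 + 2X.
sums = deviation_sums([1, 2, 3, 4, 5, 6, 7, 8], [5, 7, 9, 11, 13, 15, 17, 19])
print(sums, r_actual_mean(*sums))
```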
EX 6: CALCULATE KARL PEARSON’S COEFFICIENT OF CORRELATION FROM THE FOLLOWING DATA AND INTERPRET ITS VALUE:
EXAMPLE 5
Find r.
$$ r = \frac{-104.5}{\sqrt{253.5 \times 547.5}} = -0.2805 $$
EXAMPLE 7
SOLUTION
EXAMPLE 8
SOLUTION
KARL PEARSON'S COEFFICIENT OF CORRELATION
Method used so far
SOLUTION
EXAMPLE 15
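CONCEPT OF PROBABLE ERROR
• Probable error is used to judge the reliability of the correlation coefficient.
• P.E.(r) = 0.6745 × S.E.(r), where the standard error of r is
$$ S.E.(r) = \frac{1 - r^2}{\sqrt{n}} $$
  and n is the number of pairs of observations.
• The factor 0.6745 comes from the normal distribution: 50% of observations in a normal distribution lie in the range $\mu \pm 0.6745\sigma$.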
Uses:
• If |r| < P.E.(r), there is no evidence of correlation between the variables; the coefficient of correlation is not at all significant.
• If |r| > 6 P.E.(r), the value of r is significant.
• In other situations, nothing can be concluded with certainty.
• If another random sample of the same size n is drawn from the same population, the coefficient of correlation $r_1$ observed in that second sample can be expected to lie within the limits r ± P.E.(r).
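A minimal sketch of the probable-error rules, assuming the usual textbook formula S.E.(r) = (1 − r²)/√n, so that P.E.(r) = 0.6745 × (1 − r²)/√n. The values of r and n used below are hypothetical, and the comparison uses |r| so that negative coefficients are treated the same way as positive ones. The function names are illustrative.

```python
import math

def standard_error_r(r, n):
    """S.E.(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error_r(r, n):
    """P.E.(r) = 0.6745 * S.E.(r)."""
    return 0.6745 * standard_error_r(r, n)

def interpret(r, n):
    """Apply the rules of thumb: |r| < P.E. or |r| > 6 P.E."""
    pe = probable_error_r(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

def limits(r, n):
    """Range r +/- P.E.(r) for the coefficient in another sample of size n."""
    pe = probable_error_r(r, n)
    return r - pe, r + pe

print(interpret(0.8, 25))  # PE ~ 0.0486, 6 PE ~ 0.291 -> significant
print(interpret(0.1, 10))  # PE ~ 0.211 -> not significant
print(interpret(0.3, 16))  # PE ~ 0.153, 6 PE ~ 0.921 -> inconclusive
print(limits(0.8, 25))
```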
CONCEPT OF PROBABLE ERROR: CONDITIONS FOR ITS USE
• The data must approximate a bell-shaped curve, i.e. a normal frequency curve.
• The statistical measure for which the probable error is computed must have been obtained from a sample.
• The sample items must be selected in an unbiased manner and must be independent of each other.
EX:15
SOLUTION
EX:16
EX:17
COEFFICIENT OF DETERMINATION
• Coefficient of determination = $r^2$ = Explained variation / Total variation
• Suppose r = 0.9; then $r^2$ = 0.81, which means that 81% of the variation in the dependent variable has been explained by the independent variable.
• Coefficient of non-determination = $k^2 = 1 - r^2$ = Unexplained variation / Total variation
• The maximum value of $r^2$ is 1: it is possible to explain all of the variation in y, but not more than all of it.
COEFFICIENT OF DETERMINATION: AN EXAMPLE
Suppose $r_1 = 0.60$ and $r_2 = 0.30$. This does not mean that the first correlation is twice as strong as the second; the strength of r is better understood by computing $r^2$: $r_1^2 = 0.36$ and $r_2^2 = 0.09$.
This implies that in the first case 36% of the total variation is explained, whereas in the second case only 9% is explained, so the first correlation explains four times as much of the variation.
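Both coefficients are simple to compute; this sketch reproduces the figures quoted above (r = 0.9, 0.60 and 0.30) and the four-to-one comparison. The function names are illustrative.

```python
def determination(r):
    """Coefficient of determination: share of total variation explained."""
    return r ** 2

def non_determination(r):
    """Coefficient of non-determination k^2 = 1 - r^2: share left unexplained."""
    return 1 - r ** 2

for r in (0.9, 0.60, 0.30):
    print(r, round(determination(r), 2), round(non_determination(r), 2))

# Halving r does not halve the explained variation; it divides it by four.
print(determination(0.60) / determination(0.30))
```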
SPEARMAN’S RANK COEFFICIENT OF CORRELATION
• For series whose values cannot be measured quantitatively but can be arranged in serial order, Spearman’s rank correlation can be used.
• This coefficient measures the association between the rankings.
• Spearman’s rank coefficient of correlation is given by:
$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$
  where D is the difference between the two ranks of each item and N is the number of pairs of ranks.
• If R = +1, there is complete agreement in the order of the ranks and the ranks run in the same direction (perfect association).
• If R = −1, there is complete agreement in the order of the ranks but the ranks run in opposite directions (perfect association for reversed rankings).
• If R = 0, there is no association (correlation) between the rankings.
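A minimal sketch of the formula R = 1 − 6ΣD²/(N(N² − 1)) when the ranks are already given, assuming no repeated ranks. The three rank pairs are hypothetical and chosen to reproduce the three interpretations above: identical order (R = +1), exactly reversed order (R = −1) and no association (R = 0).

```python
def spearman_from_ranks(rx, ry):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), with D = rank in X - rank in Y."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

same = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])     # sum D^2 = 0
reverse = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])  # sum D^2 = 40
none = spearman_from_ranks([1, 2, 3, 4, 5], [2, 5, 3, 1, 4])     # sum D^2 = 20

print(same, reverse, none)  # 1.0 -1.0 0.0
```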
TYPES OF RANK METHODS
• If the ranks are not given, we need to assign ranks to the data series.
• Either the lowest value in the series or the highest value in the series can be assigned rank 1.
• The same ranking scheme must be followed for the other series.
Marks by X   Rx   Marks by Y   Ry   D² = (Rx − Ry)²
50           5    31           3    4
66           6    64           6    0
34           3    53           5    4
21           2    41           4    4
15           1    17           1    0
79           7    73           7    0
42           4    29           2    4
                                    ΣD² = 16
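The table can be reproduced in code by ranking each series with the lowest mark as rank 1 and then applying R = 1 − 6ΣD²/(N(N² − 1)). This sketch assumes no repeated marks, which holds for this data; the helper name `rank_ascending` is illustrative. It ranks the mark 79 as 7 (the highest of seven values), consistent with the D² entry of 0 in that row, and gives ΣD² = 16 and R = 1 − 96/336 ≈ 0.714.

```python
def rank_ascending(values):
    """Assign rank 1 to the lowest value (assumes no repeated values)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

marks_x = [50, 66, 34, 21, 15, 79, 42]
marks_y = [31, 64, 53, 41, 17, 73, 29]

rx = rank_ascending(marks_x)                  # [5, 6, 3, 2, 1, 7, 4]
ry = rank_ascending(marks_y)                  # [3, 6, 5, 4, 1, 7, 2]
d2 = [(a - b) ** 2 for a, b in zip(rx, ry)]   # [4, 0, 4, 4, 0, 0, 4]

n = len(marks_x)
R = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))
print(sum(d2), round(R, 4))  # 16 0.7143
```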
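RANK CORRELATION COEFFICIENT (R)
When values are repeated, each repeated item is given the average of the ranks it would otherwise occupy, and a correction is added to ΣD²:
$$ R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{N(N^2 - 1)} $$
Here, m1, m2, … are the numbers of times a value is repeated in the given X, Y, … series, respectively; one correction term is added for each group of repeated values.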
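For repeated values, a common convention (assumed here) is to give tied items the average of the ranks they would otherwise occupy and then add (m³ − m)/12 to ΣD² for every group of m repeated values. The sketch below applies that correction to a small hypothetical data set in which the value 20 occurs twice in X (m₁ = 2): the average ranks give ΣD² = 2.5, the correction adds 0.5, and R = 1 − 6 × 3/(5 × 24) = 0.85. Function names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank ascending (lowest = 1); tied values share the mean of their ranks."""
    order = sorted(values)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1            # first 1-based position of v
        count = order.count(v)
        ranks[v] = first + (count - 1) / 2    # mean of positions first .. first+count-1
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

x = [10, 20, 20, 30, 40]  # hypothetical: 20 is repeated twice (m1 = 2)
y = [1, 3, 2, 5, 4]       # hypothetical: no repeated values

print(average_ranks(x))           # [1.0, 2.5, 2.5, 4.0, 5.0]
print(spearman_with_ties(x, y))
```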
REPEATED RANKS
FEATURES OF SPEARMAN’S RANK CORRELATION
• Simple to understand.
• Useful when the data are qualitative.
• Useful when the initial data are in the form of ranks.