
Correlation

The document discusses different concepts related to correlation analysis including types of correlation, correlation coefficient, covariance, and methods of studying correlation. It defines key terms and provides examples to explain concepts like positive correlation, negative correlation, simple correlation, and linear correlation. Scatter diagram and Karl Pearson's coefficient of correlation are mentioned as methods to study the relationship between variables.

Uploaded by

Disha
Copyright
© © All Rights Reserved

CORRELATION ANALYSIS
CORRELATION

key concepts:
Concept of correlation
Types of correlation
Methods of studying correlation
a) Scatter diagram
b) Karl Pearson’s coefficient of correlation
c) Spearman’s Rank correlation coefficient
CONCEPT OF CORRELATION

• The method of correlation was developed by Francis Galton in 1885.

• Correlation is a statistical technique that can reveal whether and how strongly pairs of variables are associated.

• Correlation measures the strength of a linear relationship between two quantitative variables.

• Correlation is used to measure the closeness of the relationship between variables. Example: price and demand.
DEFINITIONS

❖ Simpson and Kafka

❖ “Correlation analysis deals with the association between two or more variables”.

❖ Ya Lun Chow

❖ “Correlation analysis attempts to determine the degree of relationship between variables”.

❖ Croxton and Cowden

❖ “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in brief formula is known as correlation”.
SIGNIFICANCE OF CORRELATION

❖Correlation can measure the degree of relationship existing between the


variables. It measures the strength of linear relationship.
❖Correlation analysis contributes to the understanding of economic
behavior.

❖Correlation helps executives to estimate costs, prices and other variables.

❖The effect of correlation is to reduce the range of uncertainty. The


prediction based on correlation analysis is likely to be more reliable and
near to reality.
CORRELATION

Correlation: The degree of relationship between the


variables under consideration is measured through the
correlation analysis.
❖ The measure of correlation is called the correlation coefficient.
❖ The degree of relationship is expressed by a coefficient which
ranges from -1 to +1 ( -1 ≤ r ≤ +1).
❖ The direction of change is indicated by a sign.
❖ Correlation analysis enables us to have an idea about the
degree & direction of the relationship between the two or
more variables under study.
CORRELATION & CAUSATION
❖ Causation means cause & effect relation.
❖ Correlation denotes the interdependency among the
variables. For a correlation between two phenomena to be
meaningful, there should be a plausible cause-effect link
between them; where no such link exists, an observed
correlation may be spurious.
❖ If two variables vary in such a way that movements in one
are accompanied by movements in the other, the variables
are said to be correlated. This alone does not establish a
cause and effect relationship.

❖ Causation generally implies correlation, but correlation does
not necessarily imply causation.
CAUSATION AND CORRELATION

• Mutual Dependence
• Influence of third variable
• Pure chance
• Spurious correlation
TYPES OF CORRELATION

TYPE 1: Positive correlation, Negative correlation

TYPE 2: Simple, Partial, Multiple

TYPE 3: Linear, Non linear
TYPES OF CORRELATION TYPE I

Correlation: Positive Correlation, Negative Correlation

Positive Correlation: The correlation is said to be positive
if the values of the two variables change in the same direction.
Ex. Pub. Exp. & sales, Height & weight.
The coefficient of correlation is positive, between 0 and +1.

Negative Correlation: The correlation is said to be negative
when the values of the variables change in opposite directions.
Ex. Price & qty. demanded.
There is an inverse relation between the variables. The coefficient of
correlation is negative, between 0 and -1.
X 100 90 80 70 60
Y 15 20 22 25 37
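
A quick way to see the direction of the relationship in the X and Y series above is to check whether each step in X is matched by a step in Y of the opposite sign. The sketch below is only illustrative; the variable names are assumptions and it checks direction, not strength.

```python
# Series from the table above: X falls while Y rises.
X = [100, 90, 80, 70, 60]
Y = [15, 20, 22, 25, 37]

# Change between consecutive observations in each series.
delta_x = [X[i + 1] - X[i] for i in range(len(X) - 1)]
delta_y = [Y[i + 1] - Y[i] for i in range(len(Y) - 1)]

# A negative product means the two variables moved in opposite directions.
products = [dx * dy for dx, dy in zip(delta_x, delta_y)]

if all(p < 0 for p in products):
    direction = "negative"
elif all(p > 0 for p in products):
    direction = "positive"
else:
    direction = "mixed"

print(delta_x, delta_y, direction)
```

Every step here has X going down and Y going up, so the direction is reported as negative.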
DIRECTION OF THE CORRELATION

The direction of the relationship is indicated by the sign of the coefficient: (+) or (-).

Positive relationship - Variables change in the same direction.
• As X is increasing, Y is increasing
• As X is decreasing, Y is decreasing
• E.g., as height increases, so does weight.
Negative relationship - Variables change in opposite directions.
• As X is increasing, Y is decreasing
• As X is decreasing, Y is increasing
• E.g., as TV time increases, grades decrease.
A zero correlation exists when there is no relationship between two
variables.
Example: there is no relationship between the amount of tea drunk and level
of intelligence.
MORE EXAMPLES

Positive relationships:
• water consumption and temperature.
• study time and grades.

Negative relationships:
• alcohol consumption and driving ability.
• Price & quantity demanded
TYPES OF CORRELATION TYPE II

Correlation: Simple, Multiple, Partial, Total


Simple correlation: Only two variables are studied.

Multiple correlation: Three or more variables are studied.
Ex. Qd = f(P, Pop, income, taste)
Partial correlation: Recognizes more than two variables but considers
only two variables, keeping the others constant.
Total correlation: Based on all the relevant variables; normally not
feasible.
TYPES OF CORRELATION TYPE III

Correlation: Linear, Non linear


Linear correlation: The correlation is linear when the amount of change
in one variable tends to bear a constant ratio to the amount of change
in the other. The graph of variables having a linear relationship will
form a straight line.

Ex. X = 1, 2, 3, 4, 5, 6, 7, 8
    Y = 5, 7, 9, 11, 13, 15, 17, 19

    Y = 3 + 2X
Non linear correlation: The correlation is non linear if the
amount of change in one variable does not bear a constant
ratio to the amount of change in the other variable.
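
The "constant ratio" test can be checked directly on the example series. The sketch below compares Y = 3 + 2X with a curved series; the curved series (Y = X squared) and the helper name `change_ratios` are assumptions added purely for contrast.

```python
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y_linear = [3 + 2 * x for x in X]   # the example series: 5, 7, 9, ..., 19
Y_curved = [x ** 2 for x in X]      # hypothetical non-linear series for contrast

def change_ratios(xs, ys):
    """Ratio of the change in Y to the change in X between consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

linear_ratios = change_ratios(X, Y_linear)
curved_ratios = change_ratios(X, Y_curved)

print(linear_ratios)   # the same ratio at every step
print(curved_ratios)   # the ratio changes from step to step
```

A single repeated ratio corresponds to a straight-line graph; a changing ratio corresponds to a curve.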
THE CORRELATION COEFFICIENT

• It is a statistical measure of the strength of a linear
relationship between two variables. It is denoted by r.
• Its values can range from -1 to 1. A weak correlation is
signaled when the coefficient approaches zero: when ‘r’ is
near zero, we can deduce that the relationship is weak.
• A correlation coefficient is a bivariate statistic when it
summarizes the relationship between two variables, and
it is a multivariate statistic when we have more than two
variables.
THE CORRELATION COEFFICIENT

• It is a pure number, independent of the units of measurement.
It is not affected when we add the same number to all the
values of one variable, nor when we multiply all the values
of a variable by the same positive number (independent of
origin and scale).
• Correlation coefficients are used in science and in finance
to assess the degree of association between two variables,
factors, or data sets. For example, since high oil prices are
favorable for crude producers, one might assume the
correlation between oil prices and forward returns on oil
stocks is strongly positive.
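
The independence of origin and scale noted above can be checked numerically. This is a minimal sketch on made-up data; the helper `pearson_r` and the sample values are assumptions, not taken from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 8, 9]

r_original = pearson_r(x, y)
r_shifted = pearson_r([v + 100 for v in x], y)                     # change of origin
r_scaled = pearson_r([3 * v for v in x], [0.5 * v for v in y])     # change of scale

print(r_original, r_shifted, r_scaled)
```

All three values agree up to floating-point rounding, since adding a constant or multiplying by a positive constant leaves r unchanged.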
THE CORRELATION COEFFICIENT-INTERPRETATION
COVARIANCE

• Covariance is a measure of the relationship between two random
variables and of the extent to which they change together.
• It describes how the two variables vary jointly, that is, whether a
change in one variable tends to be accompanied by a change in the other.

Another formula for covariance:

cov(X, Y) = ΣXY / N - (ΣX / N)(ΣY / N)
COVARIANCE

If cov(X, Y) is greater than zero, the covariance is positive and the two
variables tend to move in the same direction.

If cov(X, Y) is less than zero, the covariance is negative and the two
variables tend to move in opposite directions.

If cov(X, Y) is zero, there is no linear relation between the two variables.
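
As an illustration, covariance with divisor N can be computed either as the mean product of deviations from the means or from raw sums, and the sign can then be read as described above. The sketch reuses the X and Y series from the negative-correlation table; the function names are assumptions.

```python
X = [100, 90, 80, 70, 60]
Y = [15, 20, 22, 25, 37]

def cov_from_deviations(xs, ys):
    """Mean product of deviations from the means (divisor N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n

def cov_from_raw_sums(xs, ys):
    """Sum of XY over N, minus the product of the two means."""
    n = len(xs)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    return sum_xy / n - (sum(xs) / n) * (sum(ys) / n)

cov_dev = cov_from_deviations(X, Y)
cov_raw = cov_from_raw_sums(X, Y)
print(cov_dev, cov_raw)  # both come to about -98, i.e. opposite movement
```

Both forms give the same value; the negative sign matches the inverse relation between X and Y in the table.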


COVARIANCE
• Measures the degree to which two variables change together.
• Indicates the direction of the linear relationship between the variables.
• The magnitude of the covariance is not standardized, so it can be difficult to interpret.
• Covariance can be positive, negative, or zero, indicating the direction of the relationship (positive meaning they move together, negative meaning they move inversely, and zero meaning no linear relationship).
• However, it doesn't provide a standardized measure of the strength of the relationship.

CORRELATION
• Correlation is a standardized measure of the relationship between two variables.
• It measures both the strength and direction of the linear relationship between two variables.
• Correlation coefficients range between -1 and 1.
• A correlation coefficient of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship.
• Correlation is preferred over covariance when comparing the relationships between variables because it's standardized and easier to interpret.
METHODS OF STUDYING CORRELATION

• Scatter Diagram Method


• Karl Pearson’s Coefficient of Correlation
• Rank correlation
SCATTER DIAGRAM METHOD

A scatter diagram is a graph of observed plotted points where each
point represents the values of X & Y as a coordinate. It portrays the
relationship between these two variables graphically.
• The pairs of values are plotted on graph paper, giving a graph of
dots. These are called scatter diagrams or dot diagrams.
• When all the dots lie on a straight line that rises from left to
right (often drawn at a 45° angle to the X axis): Perfect Positive
Correlation.
• When all the dots lie on a straight line that falls from left to
right (often drawn at a 45° angle to the X axis): Perfect Negative
Correlation.
DIAGRAMS
EXAMPLE 1

• Create a scatter plot and calculate the correlation coefficient.

The scatter plot shows data points that are closely aligned, which indicates a strong
positive relationship between x and y, with r close to 1.
EXAMPLE 2

• Create a scatter plot and calculate the correlation coefficient.

The scatter plot shows data points that are not aligned, which indicates a very
weak relationship between x and y, with r close to 0.
EXAMPLE 3

• Create a scatter plot and calculate the correlation coefficient.

The scatter plot shows data points that are closely aligned, which indicates a strong
negative relationship between x and y, with r close to -1.
SCATTER DIAGRAM MERITS AND LIMITATIONS

Merits

• It is a very simple method of studying correlation between two variables

• It explains if the values of the variables have any relation or not

• Scatter diagram indicates whether the relationship is positive or negative

Demerits

• Scatter diagram does not measure the precise extent of correlation

• It gives only an approximate idea of the relationship


KARL PEARSON'S COEFFICIENT OF CORRELATION

• Pearson’s ‘r’ is the most common correlation coefficient.


• It measures the degree of linear relationship between two variables, say
x & y.
Assumptions of Karl Pearson’s coefficient
• There is linear relationship between two variables, i.e. when the
two variables are plotted on a scatter diagram a straight line will
be formed by the points.
• Cause and effect relation exists between the two variable
series.
KARL PEARSON'S COEFFICIENT OF CORRELATION

• The Pearson’s product-moment correlation coefficient, also known as


Pearson’s r, describes the linear relationship between two quantitative
variables.
• It is calculated by the formula which divides the covariance between the
variables by the product of their standard deviations:

rxy = cov(x, y) / (sx · sy)

• rxy = strength of the correlation between variables x and y
• cov(x, y) = covariance of x and y
• sx = sample standard deviation of x
• sy = sample standard deviation of y
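
The "covariance divided by the product of the standard deviations" definition can be sketched as follows. The data values are made up for illustration, and the sample covariance is assumed to use the same n - 1 divisor as the sample standard deviations so that the divisors cancel.

```python
from statistics import mean, stdev

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sample_covariance(xs, ys):
    """Sample covariance with divisor n - 1, matching statistics.stdev."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

cov_xy = sample_covariance(x, y)   # covariance of x and y
s_x = stdev(x)                     # sample standard deviation of x
s_y = stdev(y)                     # sample standard deviation of y

r_xy = cov_xy / (s_x * s_y)
print(round(r_xy, 4))
```

Using divisor N throughout instead of n - 1 would give the same r, because the divisor appears once in the numerator and once in the denominator.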
KARL PEARSON'S COEFFICIENT OF CORRELATION

r(x, y) = Σxy / √(Σx² · Σy²)


TABULAR REPRESENTATION

X | Y | x = X - X̄ | y = Y - Ȳ | x² | y² | xy

Column totals: Σx, Σy, Σx², Σy², Σxy
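
The tabular layout above can be built row by row. The sketch below fills the columns for the X and Y series from the negative-correlation table earlier in the deck (reused here as an assumption, since the worked example's own data did not survive) and then applies the deviation formula to the column totals.

```python
from math import sqrt

X = [100, 90, 80, 70, 60]
Y = [15, 20, 22, 25, 37]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# One row per observation: X, Y, x = X - mean, y = Y - mean, x^2, y^2, xy
rows = []
for big_x, big_y in zip(X, Y):
    x = big_x - mean_x
    y = big_y - mean_y
    rows.append((big_x, big_y, x, y, x * x, y * y, x * y))

# Column totals used in the formula.
sum_x2 = sum(row[4] for row in rows)
sum_y2 = sum(row[5] for row in rows)
sum_xy = sum(row[6] for row in rows)

print("X\tY\tx\ty\tx^2\ty^2\txy")
for row in rows:
    print("\t".join(f"{value:g}" for value in row))

r = sum_xy / sqrt(sum_x2 * sum_y2)
print("r =", round(r, 4))
```

The deviation columns x and y each sum to zero, which is a useful check on the arithmetic before computing r.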


EXAMPLE 4

Find r
EXAMPLE 4

r = 772.5 / √(257.5 × 2317.5) = 772.5 / 772.5 = 1
EX 6: CALCULATE KARL PEARSON’S COEFFICIENT OF CORRELATION
FROM THE FOLLOWING DATA AND INTERPRET ITS VALUE:
EX: 6
CONTD:
EXAMPLE 5

Find r
EXAMPLE 5

r(x, y) = Σxy / √(Σx² · Σy²)

r = -104.5 / √(253.5 × 547.5) = -0.28051
EXAMPLE 7
SOLUTION
EXAMPLE 8
SOLUTION
KARL PEARSON'S COEFFICIENT OF CORRELATION
Method used so far

When deviations are taken from the actual mean:

r(x, y) = Σxy / √(Σx² · Σy²)


where x = X - X̄ and y = Y - Ȳ
KARL PEARSON'S COEFFICIENT OF CORRELATION

Other two methods

2. When actual data is used:

r = (N ΣXY - ΣX ΣY) / (√(N ΣX² - (ΣX)²) × √(N ΣY² - (ΣY)²))

3. When deviations are taken from an assumed mean:

r = (N Σdxdy - Σdx Σdy) / (√(N Σdx² - (Σdx)²) × √(N Σdy² - (Σdy)²))
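
The actual-data and assumed-mean methods share one formula shape, applied either to the raw values or to deviations dx, dy from any chosen constants. The sketch below is an illustration under that reading; the function names, the assumed means (80 and 22), and the reuse of the earlier X and Y series are all assumptions.

```python
from math import sqrt

def r_from_sums(n, sum_ab, sum_a, sum_b, sum_a2, sum_b2):
    """Shared formula: (N*Sab - Sa*Sb) / (sqrt(N*Saa - Sa^2) * sqrt(N*Sbb - Sb^2))."""
    numerator = n * sum_ab - sum_a * sum_b
    denominator = sqrt(n * sum_a2 - sum_a ** 2) * sqrt(n * sum_b2 - sum_b ** 2)
    return numerator / denominator

def r_actual_data(xs, ys):
    """Method using the raw X and Y values directly."""
    return r_from_sums(
        len(xs),
        sum(a * b for a, b in zip(xs, ys)),
        sum(xs), sum(ys),
        sum(a * a for a in xs), sum(b * b for b in ys),
    )

def r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Method using deviations dx, dy taken from assumed means."""
    dx = [a - assumed_x for a in xs]
    dy = [b - assumed_y for b in ys]
    return r_actual_data(dx, dy)

X = [100, 90, 80, 70, 60]
Y = [15, 20, 22, 25, 37]

r_raw = r_actual_data(X, Y)
r_assumed = r_assumed_mean(X, Y, assumed_x=80, assumed_y=22)
print(round(r_raw, 4), round(r_assumed, 4))
```

The two results coincide whatever assumed means are chosen, which is why the assumed-mean method is a convenient shortcut when the actual means are awkward fractions.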
METHOD 1- WHEN ACTUAL DATA IS USED
EX: 9
METHOD 1- WHEN ACTUAL DATA IS USED
EX: 9
3.WHEN DEVIATIONS ARE TAKEN FROM
ASSUMED MEAN
EX:12
CONTD:
EX: 13
EX:14
EX:14

SOLUTION
EXAMPLE 15
EXAMPLE 15
CONCEPT OF PROBABLE ERROR
• Probable error is used to find the reliability of the correlation coefficient.
• P.E.(r) = 0.6745 × Standard Error = 0.6745 × (1 - r²) / √N

(50% of observations in a normal distribution lie in the range µ ± 0.6745σ.)

Uses:
• If r < P.E.(r), there is no evidence of correlation between the variables. This
shows that the coefficient of correlation is not at all significant.
• If r > 6 P.E.(r), this shows that the value of ‘r’ is significant.
• In other situations, nothing can be concluded with certainty.
• If another random sample of the same size n is drawn from the same population
as the first sample, the value of ‘r’ observed in the second sample can be
expected to lie within the limits given by r ± P.E.(r).
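
The decision rules for probable error can be sketched in a few lines, assuming the usual textbook form P.E.(r) = 0.6745 × (1 - r²) / √N. The function names, the use of the absolute value of r for negative coefficients, and the sample inputs are assumptions for illustration.

```python
from math import sqrt

def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(N) (assumed textbook form)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def interpret(r, n):
    """Apply the rules: below P.E. not significant, above 6 P.E. significant."""
    pe = probable_error(r, n)
    if abs(r) < pe:          # abs() is an assumption so negative r is handled too
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

# Hypothetical inputs, for illustration only.
print(probable_error(0.8, 25), interpret(0.8, 25))
print(probable_error(0.1, 9), interpret(0.1, 9))

# Limits within which r from a second sample of the same size is expected to lie.
pe = probable_error(0.8, 25)
limits = (0.8 - pe, 0.8 + pe)
print(limits)
```

With r = 0.8 and N = 25 the coefficient is far above six times its probable error, while r = 0.1 with N = 9 falls below its probable error.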
CONCEPT OF PROBABLE ERROR

Three conditions to use probable error:

• The data must approximate to the bell-shaped curve, i.e. a normal frequency curve.
• The statistical measure for which the probable error is computed must have been
calculated from a sample.
• The sample items must be selected in an unbiased manner and must be
independent of each other.
EX:15
EX:15
SOLUTION
EX:16
EX:16
EX:17
EX:17
COEFFICIENT OF DETERMINATION

• Another way of interpreting the value of the correlation coefficient is
by using the Coefficient of Determination.

• Coefficient of Determination = r²
= Explained variation / Total variation
• Suppose r = 0.9; then r² = 0.81. This would mean that 81% of the
variation in the dependent variable has been explained by the
independent variable.
• Coefficient of Non Determination
= k² = 1 - r²
= Unexplained variation / Total variation
• The maximum value of r² is 1 because it is possible to explain all of
the variation in y but it is not possible to explain more than all of it.
COEFFICIENT OF DETERMINATION: AN
EXAMPLE

Suppose the first correlation is r = 0.60 and the second is r = 0.30. It does
not mean that the first correlation is twice as strong as the second. The
strength of ‘r’ can be understood by computing the value of r².

When r = 0.60, r² = 0.36 -----(1)
When r = 0.30, r² = 0.09 -----(2)

This implies that in the first case 36% of the total variation is explained
whereas in the second case 9% of the total variation is explained.
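
The comparison above can be reproduced with a short calculation. The helper name `determination` is an assumption; it simply returns r² and 1 - r² for a given r.

```python
def determination(r):
    """Return (coefficient of determination, coefficient of non-determination)."""
    explained = r ** 2
    unexplained = 1 - explained
    return explained, unexplained

first_explained, first_unexplained = determination(0.60)
second_explained, second_unexplained = determination(0.30)

print(round(first_explained, 2), round(second_explained, 2))

# Ratio of explained variation between the two cases.
ratio = first_explained / second_explained
print(round(ratio, 2))
```

Doubling r from 0.30 to 0.60 quadruples the share of variation explained, which is why the two coefficients should be compared through r² rather than r.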
Spearman’s Rank Coefficient Of Correlation

• For series in which the variables are not capable of quantitative measurement
but can be arranged in a serial order, Spearman's Rank correlation can be used.
• This coefficient indicates the association between the rankings.
• Spearman's Rank coefficient of correlation is given by:

R = 1 - 6 Σd² / (n (n² - 1))

d = Difference of ranks between paired items in the two series.
n = Total number of observations.

• It was developed by the British psychologist Charles Edward Spearman in 1904.


INTERPRETATION OF RANK CORRELATION
COEFFICIENT (R)

• The value of rank correlation coefficient, R ranges from -1 to +1

• If R = +1, then there is complete agreement in the order of the ranks and
the ranks are in the same direction (perfect association)

• If R = -1, then there is complete disagreement in the order of the ranks and
the ranks are in opposite directions (perfect association for reverse
rankings)
• If R = 0, then there is no association/correlation
TYPES OF RANK METHODS

• In the rank correlation we may have following types of


problems:
• Where ranks are given .
• Where ranks are not given.
• Where repeated ranks occur
RANK CORRELATION COEFFICIENT (R)
a) Problems where actual ranks are given.
❖ Draw up a table with the two series of ranks, R1 and R2.
❖ Calculate the difference ‘D’ of the two ranks, i.e. (R1 - R2).
❖ Square the differences & calculate their sum, i.e. ∑D².
❖ Substitute the values obtained in the formula.
WHEN RANKS ARE GIVEN
WHEN RANKS ARE GIVEN
RANK CORRELATION COEFFICIENT

b) Problems where Ranks are not given :

• If the ranks are not given, then we need to assign ranks to the data series.

• The lowest value in the series can be assigned rank 1 or the highest value in

the series can be assigned rank 1.

• We need to follow the same scheme of ranking for the other series.

• Then calculate the rank correlation coefficient in similar way as we do

when the ranks are given.


WHEN RANKS ARE NOT GIVEN
WHEN RANKS ARE NOT GIVEN

Marks by X | Rx | Marks by Y | Ry | d²
50 | 5 | 31 | 3 | 4
66 | 6 | 64 | 6 | 0
34 | 3 | 53 | 5 | 4
21 | 2 | 41 | 4 | 4
15 | 1 | 17 | 1 | 0
79 | 7 | 73 | 7 | 0
42 | 4 | 29 | 2 | 4
Σd² = 16
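
The ranking and substitution steps for this table can be sketched as follows, assuming the usual formula R = 1 - 6Σd² / (n(n² - 1)) and ranking the lowest mark as 1. The helper names are illustrative, and the ranking helper assumes no repeated values.

```python
def ranks_ascending(values):
    """Rank 1 for the lowest value; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_r(rank_x, rank_y):
    """R = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Marks from the table above (seven pairs, so ranks run from 1 to 7).
marks_x = [50, 66, 34, 21, 15, 79, 42]
marks_y = [31, 64, 53, 41, 17, 73, 29]

rx = ranks_ascending(marks_x)
ry = ranks_ascending(marks_y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
R = spearman_r(rx, ry)

print(rx, ry, sum_d2, round(R, 3))
```

With the sum of squared rank differences equal to 16 and n = 7, R comes to about 0.714, a fairly strong agreement between the two sets of marks.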
RANK CORRELATION COEFFICIENT (R)

Equal Ranks or tie in Ranks: In such cases, the rank given to each tied
item is the average of the ranks which these items would have got if they
were different from each other. The formula is then adjusted as:

R = 1 - 6 [Σd² + (m1³ - m1)/12 + (m2³ - m2)/12 + …] / (n (n² - 1))

Here, m1, m2, … are the number of times each repeated value occurs in the
X and Y series.
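
Average ranks and the tie adjustment can be sketched as below, assuming the common textbook correction of (m³ - m)/12 added to Σd² for every value repeated m times. The data and function names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = lowest; tied values share the average of the ranks they occupy."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value that is repeated m times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

# Hypothetical data: the value 20 occurs twice in the X series.
xs = [10, 20, 20, 30, 40]
ys = [1, 2, 3, 4, 5]

print(average_ranks(xs))                    # the tied items share rank 2.5
print(round(spearman_with_ties(xs, ys), 2))
```

Here the two tied items would have held ranks 2 and 3, so each receives 2.5, and the single tie (m = 2) adds 0.5 to Σd² before substitution.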
REPEATED RANKS
REPEATED RANKS
CONTD:
FEATURES OF SPEARMAN’S RANK
CORRELATION

• The value of such co-efficient of correlation lies between +1 and -1.


• The sum of the differences between the corresponding ranks i.e.
∑d=0.
• It is independent of the nature of distribution from which the sample data
are collected for calculation of the co-efficient.
• It is calculated on the basis of the ranks of the individual items
rather than their actual values.
• Its result equals Karl Pearson’s co-efficient of correlation computed
on the ranks, provided there is no repetition of any rank. This is because
Spearman’s correlation is nothing more than Pearson’s co-efficient
of correlation between the ranks.
MERITS OF SPEARMAN’S RANK CORRELATION

• simple to understand
• Useful when the data is qualitative.
• Useful where the initial data in the form of ranks.

LIMITATIONS OF SPEARMAN’S RANK CORRELATION

• Cannot be used for finding out correlation in a grouped frequency
distribution.

• This method should not be applied where N exceeds 30, because assigning
ranks and computing their differences becomes tedious.
