
Notes For Correlation Unit - 3 Business Statistics

The document discusses correlation analysis and different types of correlation. It defines correlation as the relationship between two or more variables where a change in one variable produces a corresponding change in the other. There are three main types of correlation discussed: 1) positive correlation where variables change in the same direction, 2) negative correlation where variables change in opposite directions, and 3) no correlation where variables are unrelated. The document also discusses scatter diagrams as a method to study and visualize correlations between two variables by plotting their relationship.

Uploaded by

Mayank Majoka
Copyright
© All Rights Reserved

UNIT-III

CORRELATION ANALYSIS

CORRELATION/ COVARIATION

It is the relationship that exists between two or more variables. If two variables are
related to each other in such a way that a change in one is accompanied by a corresponding
change in the other, then the variables are said to be correlated.

Example

• Relationship between height and weight
• Relationship between the price of a commodity and the demand for the commodity
• Relationship between the age of individuals and their blood pressure
• Relationship between advertising expenditure and sales

Types of Correlation

1. Depending on the direction of change of the variables:

• Positive correlation
• Negative correlation

Positive correlation
If both the variables vary in the same direction, correlation is said to be positive.
If one variable increases, the other increases or, if one variable decreases, the other
decreases, then the correlation between the two variables is said to be positive
correlation.
Example

Height (cm)  158  161  163  166  168  171  173  176
Weight (kg)   60   62   64   65   67   69   71   73

As height is increasing we observe that the weight is also increasing. Hence, the
direction of change is same. Height and weight share a positive correlation.

Negative correlation
If both the variables vary in the opposite direction, correlation is said to be
negative.
If one variable increases, the other decreases or, if one variable decreases, the other
increases, then the correlation between the two variables is said to be negative
correlation.

Example
Price (Rs.) 5 4 3 2 1
Demand (units) 100 200 300 400 500

As price is decreasing we observe that the demand is increasing. Hence, the
direction of change is opposite. Price and Demand share a negative correlation.

2. Depending upon the number of variables studied:

• Simple Correlation
• Multiple Correlation

Simple Correlation
When only two variables are studied, it is the case of simple correlation.
Example
Relationship between the wheat output per acre and the amount of rainfall.

Multiple Correlation
When three or more variables are studied, it is a case of multiple correlation.
Example
Relationship between the wheat output per acre, amount of rainfall and the amount
of fertilizers used.
Multiple correlation is of two types: partial or total.
Partial Multiple Correlation
When one studies three or more variables but considers only two of them to be
influencing each other, with the effect of the other influencing variables held
constant, it is a case of partial correlation. Its order depends on the number of
variables held constant, i.e. if one variable is held constant, it is called
first-order partial correlation.
Total Multiple Correlation
When one studies three or more variables together without holding the effect of any
variable constant, it is a case of total correlation.

3. Depending upon the constancy of the ratio of change between the variables:
• Linear Correlation
• Non-Linear / Curvilinear Correlation

Linear Correlation

Correlation is linear if the amount of change in one variable bears a constant ratio
to the amount of change in the other variable.
If such variables are plotted, all the points fall on a straight line.
Example
Milk (L)     10  20  30  40  50
Cheese (Kg)   2   4   6   8  10
The ratio of the change in milk quantity to the change in cheese quantity is constant
at 10:2.

Non-Linear / Curvilinear Correlation

Correlation is non-linear (curvilinear) if the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable.
If such variables are plotted, the points fall on a curve and not on a straight line.
Example
Advertising Expenditure 2 4 6 8 10
Sales 10 12 15 15 16
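
The distinction between the two examples can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `change_ratios` and `is_linear` are made up for illustration, and the check simply compares the ratio of successive changes in the two variables.

```python
def change_ratios(xs, ys):
    """Ratio (change in y) / (change in x) for each consecutive pair of points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


milk, cheese = [10, 20, 30, 40, 50], [2, 4, 6, 8, 10]
advertising, sales = [2, 4, 6, 8, 10], [10, 12, 15, 15, 16]

print(change_ratios(milk, cheese))        # the same ratio (2 per 10) every time
print(is_linear(milk, cheese))            # True
print(change_ratios(advertising, sales))  # [1.0, 1.5, 0.0, 0.5]
print(is_linear(advertising, sales))      # False
```

The milk and cheese data keep the same ratio throughout, while the advertising and sales data do not, which is what makes the second relationship curvilinear.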
Method of Studying Correlation

Scatter Diagram Method


It is a diagrammatic representation of bivariate data to ascertain the correlation
between two variables. Scatter diagrams are useful to determine the relationship
between two variables. This relationship can be between two causes, or a cause and
an effect, etc. It can be positive, negative or no relationship at all. Usually the
first variable is treated as independent and plotted on the horizontal axis, and the
second variable, which depends on the first, is plotted on the vertical axis. To
analyze the pattern of the relationship, you change the independent variable and
monitor the changes in the dependent variable. A scatter diagram can also be drawn
for two variables neither of which depends on the other.
Steps
1. Plot both the variables on the graph.
2. Observe the scatter of the plotted points.

Depending on the scatter of the plotted points, the diagram may show:

1. Perfect Positive Correlation
2. Perfect Negative Correlation
3. Low Degree of Positive Correlation
4. Low Degree of Negative Correlation
5. High Degree of Positive Correlation
6. High Degree of Negative Correlation
7. No Correlation

[Figures: scatter diagrams illustrating each of the seven cases]

Scatter Diagram with No Correlation


There isn’t any relationship between the two variables to be seen. It might just be a
series of points with no visible trend, or it might be a straight, flat row of points. In
either case, the independent variable has no effect on the second variable; it is not
dependent.

A perfect positive correlation is given the value of 1. A perfect negative correlation
is given the value of -1. If there is absolutely no correlation present the value given
is 0. The closer the number is to 1 or -1, the stronger the correlation, or the stronger
the relationship between the variables. The closer the number is to 0, the weaker
the correlation. So something that seems to kind of correlate in a positive direction
might have a value of 0.67, whereas something with an extremely weak negative
correlation might have the value -0.21.

An example of a situation where you might find a perfect negative correlation, as
in the scatter diagram for perfect negative correlation above, would be if you were
comparing the time a car has been travelling at constant speed towards a destination
with its remaining distance from that destination.

On the other hand, a situation where you might find a strong but not perfect
positive correlation would be if you examined the number of hours students spent
studying for an exam versus the grade received. This won't be a perfect correlation
because two people could spend the same amount of time studying and get
different grades. But in general the rule will hold true that as the amount of time
studying increases so does the grade received.
Limitations of a Scatter Diagram

The following are a few limitations of a scatter diagram:

• Scatter diagrams cannot give you the exact extent of correlation.
• A scatter diagram does not give a quantitative measurement of the
relationship between the variables. It only gives a qualitative (visual)
expression of the quantitative change.
• This chart does not show you the relationship for more than two
variables.

Benefits of a Scatter Diagram

The following are a few advantages of a scatter diagram:

• It shows the relationship between two variables.
• It is the best method to show you a non-linear pattern.
• The range of data flow, i.e. maximum and minimum value, can be
determined.
• Observation and reading are straightforward.
• Plotting the diagram is easy.

Question (2014)

Distinguish between Linear and Curvilinear Correlation.

Question (2015)

What is correlation? Explain the various types of correlation. Does it always signify
a cause-and-effect relationship between the two variables?

There are all sorts of correlations we can look at. Sometimes variables increase or
decrease over time. For example, the earth’s temperature is increasing over
time. So are the levels of greenhouse gases. If you run a correlation analysis on
these two variables, you will find that global temperature correlates strongly to the
level of greenhouse gases. But does this mean that one is the cause of the
other? Not necessarily. When two variables are trending up or down, a correlation
analysis will often show there is a significant relationship – simply because of the
trend – not necessarily because there is a cause and effect relationship between the
two variables.
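
The effect of a shared trend can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two series are made-up numbers (each a steady upward trend plus its own small wiggle), and comparing period-to-period changes is just one common way of looking past a trend, not a method taken from these notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def changes(series):
    """Period-to-period changes; a steady trend becomes a constant here."""
    return [b - a for a, b in zip(series, series[1:])]


# Made-up series: both rise over time, but their wiggles are unrelated.
series_a = [1, 9, 21, 29, 41, 49, 61, 69]
series_b = [1, 6, 9, 14, 21, 26, 29, 34]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(changes(series_a), changes(series_b))
print(round(r_levels, 3), round(r_changes, 3))
```

The correlation of the raw series is close to 1 simply because both trend upwards, while the correlation of their changes is weak.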
COVARIANCE

Given a set of N pairs of observations (X1, Y1), (X2, Y2), (X3, Y3), ..., (XN,
YN) relating to two variables X and Y, the covariance of X and Y, usually represented
by Cov(X, Y), is defined as:

$$\mathrm{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum xy}{N}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

The covariance is a measure for how two variables are related to each other, i.e.,
how two variables vary with each other.

Example

Calculate the covariance

X 1 2 3 4 5
Y 10 20 30 50 40
Solution

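X      x = X - X̄     Y      y = Y - Ȳ     xy
1         -2         10        -20        40
2         -1         20        -10        10
3          0         30          0         0
4          1         50         20        20
5          2         40         10        20
∑X = 15   ∑x = 0    ∑Y = 150   ∑y = 0    ∑xy = 90

N = number of pairs = 5

$$\bar{X} = \frac{\sum X}{N} = \frac{15}{5} = 3$$

$$\bar{Y} = \frac{\sum Y}{N} = \frac{150}{5} = 30$$

$$\mathrm{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\sum xy}{N} = \frac{90}{5} = 18$$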
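
The same calculation can be written as a short sketch in Python. The function name `covariance` is chosen here for illustration; it divides by N, as in the definition used in these notes (library routines such as `statistics.covariance` divide by N - 1 instead, so they give a different number).

```python
def covariance(xs, ys):
    """Covariance of paired data, dividing by N (the number of pairs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


X = [1, 2, 3, 4, 5]
Y = [10, 20, 30, 50, 40]
print(covariance(X, Y))  # 18.0
```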

A covariance of 0 indicates that there is no linear relationship between the two
variables (they may still be related in a non-linear way). If the covariance is
positive, the variables tend to move in the same direction, and if the covariance is
negative, the variables tend to move in opposite directions.
Properties of Covariance

1. Independent of the choice of origin: The value of covariance is not affected
if a constant is added to or subtracted from each of the individual values of X
and Y. The constants need not be the same for X and Y.

Cov(X + A, Y + B) = Cov(X, Y)
Cov(X - A, Y - B) = Cov(X, Y)

2. Dependent on the choice of scale: The value of covariance is affected if each of
the individual values of X and Y is multiplied or divided by some non-zero
constant. The covariance is multiplied by the product of the two constants.

Cov(A*X, B*Y) = A*B*Cov(X, Y)
Cov(A*X, A*Y) = A^2*Cov(X, Y)
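
A quick numerical check of the two properties, offered as a sketch: the `covariance` helper divides by N as in the definition of covariance, and the constants 100, 7, 3 and 2 are arbitrary values picked for illustration.

```python
def covariance(xs, ys):
    """Covariance of paired data, dividing by N."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


X = [1, 2, 3, 4, 5]
Y = [10, 20, 30, 50, 40]

base = covariance(X, Y)
# Change of origin: add 100 to every X and subtract 7 from every Y.
shifted = covariance([x + 100 for x in X], [y - 7 for y in Y])
# Change of scale: multiply every X by 3 and every Y by 2.
scaled = covariance([3 * x for x in X], [2 * y for y in Y])

print(base, shifted, scaled)
```

Shifting leaves the covariance at 18, while scaling X by 3 and Y by 2 multiplies it by 3 × 2 = 6, giving 108.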

Covariance may be positive, negative or zero.

Limitation
Covariance shows the direction of the relationship between two variables, but it
cannot be used as a meaningful measure of the strength of that relationship.
Its value is unbounded and depends on the units in which X and Y are measured, so
covariances from different data sets cannot be compared directly.

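Karl Pearson’s Coefficient of Correlation

Given a set of N pairs of observations (X1, Y1), (X2, Y2), (X3, Y3), ..., (XN,
YN) relating to two variables X and Y, the coefficient of correlation between X
and Y, usually represented by r, is defined as:

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where
Cov(X, Y) = covariance of X and Y
$\sigma_X$ = standard deviation of X
$\sigma_Y$ = standard deviation of Y

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\frac{1}{N}\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{1}{N}\sum (X - \bar{X})^2}\,\sqrt{\frac{1}{N}\sum (Y - \bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}}$$

$$\Longrightarrow\; r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.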

Pearson’s coefficient of correlation, ρ or “r”, measures the linear correlation
between two features and is closely related to the covariance. In fact, it’s a
normalized version of the covariance.

By dividing the covariance by the features’ standard deviations, we ensure that the
correlation between two features is in the range [-1, 1], which makes it more
interpretable than the unbounded covariance. However, note that the covariance
and correlation are exactly the same if the features are normalized to unit variance
(e.g., via standardization or z-score normalization). Two features are perfectly
positively correlated if ρ = 1 and perfectly negatively correlated if ρ = -1. No
linear correlation is observed if ρ = 0.

Properties of Coefficient of correlation

Independent of the choice of origin: The value of r is not affected if a constant
is added to or subtracted from each of the individual values of X and Y.
Independent of the choice of scale: The value of r is not affected if each of the
individual values of X and Y is multiplied or divided by some positive constant.
(If exactly one of the two variables is multiplied by a negative constant, only the
sign of r changes.)

Coefficient of correlation r lies between -1 and 1.

Independent of the unit of measurement: r is a pure number devoid of units.
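
The behaviour of r under a change of origin and scale can be checked with a small sketch. The helper name `pearson_r` and the constants used (100, 7, 2.54, 1000) are assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


X = [1, 2, 3, 4, 5]
Y = [10, 20, 30, 50, 40]

r_base = pearson_r(X, Y)
r_shifted = pearson_r([x + 100 for x in X], [y - 7 for y in Y])      # change of origin
r_scaled = pearson_r([2.54 * x for x in X], [y / 1000 for y in Y])   # change of scale
r_flipped = pearson_r([-x for x in X], Y)                            # negative multiplier

print(round(r_base, 3), round(r_shifted, 3), round(r_scaled, 3), round(r_flipped, 3))
```

Shifting and rescaling by positive constants leave r at 0.9; multiplying one variable by -1 only reverses its sign.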

Interpretation
Example

Calculate the Coefficient of Correlation

X 1 2 3 4 5
Y 10 20 30 50 40

Solution
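X      x = X - X̄     Y      y = Y - Ȳ     xy       x²       y²
1         -2         10        -20        40        4       400
2         -1         20        -10        10        1       100
3          0         30          0         0        0         0
4          1         50         20        20        1       400
5          2         40         10        20        4       100
∑X = 15   ∑x = 0    ∑Y = 150   ∑y = 0    ∑xy = 90  ∑x² = 10  ∑y² = 1000

N = number of pairs = 5

$$\bar{X} = \frac{\sum X}{N} = \frac{15}{5} = 3$$

$$\bar{Y} = \frac{\sum Y}{N} = \frac{150}{5} = 30$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{90}{\sqrt{10 \times 1000}} = \frac{90}{100} = 0.9$$

⟹ There exists a high positive correlation.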

Example
Calculate the correlation coefficient from the following data:
X 6 8 12 15 18 20 24 28 31
Y 10 12 15 15 18 25 22 26 28

Solution

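X      Y      x = X - X̄    y = Y - Ȳ    xy      x²      y²
6      10       -12          -9         108     144     81
8      12       -10          -7          70     100     49
12     15        -6          -4          24      36     16
15     15        -3          -4          12       9     16
18     18         0          -1           0       0      1
20     25         2           6          12       4     36
24     22         6           3          18      36      9
28     26        10           7          70     100     49
31     28        13           9         117     169     81
∑X = 162  ∑Y = 171  ∑x = 0  ∑y = 0  ∑xy = 431  ∑x² = 598  ∑y² = 338

N = number of pairs = 9

$$\bar{X} = \frac{\sum X}{N} = \frac{162}{9} = 18$$

$$\bar{Y} = \frac{\sum Y}{N} = \frac{171}{9} = 19$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{431}{\sqrt{598 \times 338}} = \frac{431}{449.58} = 0.959$$

⟹ There exists a high positive correlation.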
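
The columns of the working table can be reproduced step by step. This is a sketch only; the variable names (`sum_xy`, `sum_x2`, `sum_y2`) are chosen here to mirror the column totals.

```python
from math import sqrt

X = [6, 8, 12, 15, 18, 20, 24, 28, 31]
Y = [10, 12, 15, 15, 18, 25, 22, 26, 28]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n   # 18.0 and 19.0

x = [v - mean_x for v in X]               # deviations of X from its mean
y = [v - mean_y for v in Y]               # deviations of Y from its mean

sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(sum_xy, sum_x2, sum_y2, round(r, 3))  # 431.0 598.0 338.0 0.959
```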


Example
Find the correlation coefficient between age and playing habit of students.

Age(years) 15 16 17 18 19 20
No. of students 250 200 150 120 100 80
Regular Players 200 150 90 48 30 12

Let us first find the percentage of regular players and then calculate the coefficient
of correlation between the age and percentage so obtained.

Solution
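Percentage of regular players: 200/250 × 100 = 80, 150/200 × 100 = 75, and so on.

Age (X)  x = X - X̄   x²     No. of     Regular   % of regular   y = Y - Ȳ   y²      xy
                             students   players   players (Y)
15        -2.5       6.25    250        200        80              30        900    -75
16        -1.5       2.25    200        150        75              25        625    -37.5
17        -0.5       0.25    150         90        60              10        100     -5
18         0.5       0.25    120         48        40             -10        100     -5
19         1.5       2.25    100         30        30             -20        400    -30
20         2.5       6.25     80         12        15             -35       1225    -87.5
∑X = 105  ∑x² = 17.5  ∑Y = 300  ∑y = 0  ∑y² = 3350  ∑xy = -240

$$\bar{X} = \frac{105}{6} = 17.5, \qquad \bar{Y} = \frac{300}{6} = 50$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{-240}{\sqrt{17.5 \times 3350}} = \frac{-240}{242.13} = -0.991$$

High negative correlation.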
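
The two-stage working (percentages first, then r) can be sketched as follows. The names `percent` and `pearson_r` are illustrative choices, not notation from the notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
regular_players = [200, 150, 90, 48, 30, 12]

# Step 1: percentage of regular players in each age group.
percent = [100 * p / s for p, s in zip(regular_players, students)]
print(percent)  # [80.0, 75.0, 60.0, 40.0, 30.0, 15.0]

# Step 2: correlate age with the percentage, not with the raw counts.
r = pearson_r(ages, percent)
print(round(r, 3))  # -0.991
```

Percentages are used because the age groups differ in size, so the raw number of players would mix up group size with playing habit.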

Question

A consulting firm is preparing a study on consumer behavior. The company has collected
the following data (in thousand rupees) to determine whether there is a relationship
between consumer income and consumption level.
Consumer No.        1    2    3    4    5    6
Income (Rs.)      300  350  320  400  295  315
Consumption (Rs.) 250  275  270  300  269  290
Calculate the correlation coefficient for the above data. Write your comments about
the correlation coefficient value.

Advantages

• Gives the direction as well as the degree of the relationship between the variables.
• Helps in estimating the value of the dependent variable from the known value of
the independent variable.

Limitations

• Time-consuming method.
• Affected by extreme values.
• It is to be interpreted after taking into consideration other factors as well.

SPEARMAN’S RANK CORRELATION (R)

It uses ranks rather than actual observations. The correlation coefficient between
two series of ranks is called the Rank Correlation Coefficient.

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

R = rank correlation coefficient
D = difference of the ranks between paired items in the two series
N = number of pairs of ranks

Advantages of Spearman’s Rank Correlation Coefficient

1. Simple to understand and easy to apply.

2. Suitable for qualitative data, i.e. for association between variables which
cannot be quantified but can only be ranked in some order. Example: it is
possible for two judges to rank 10 girls by preference in terms of beauty,
whereas it may be difficult to give them numerical grades in terms of beauty.

3. Suitable for abnormal data.

4. It is the only method that can be used when ranks are given and not the actual data.

When to Use

1. Number of pairs of observations is fairly small.


2. The original data is in the form of ranks.

Steps

1. Calculate the difference between the two ranks, i.e. D = R1 – R2, where
R1 – ranks of the first variable’s values
R2 – ranks of the second variable’s values

2. Square these differences and obtain the total, i.e. $\sum D^2$.

3. Calculate the rank correlation:

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

4. Interpret the rank correlation.

Case 1: When actual ranks are given

Example

(2015)
Two judges in a beauty competition rank the 12 entries as follows:
X 1 2 3 4 5 6 7 8 9 10 11 12
Y 12 9 6 10 3 5 4 7 8 2 11 1
What degree of agreement is there between the judgement of the two judges?
Solution

R1 (rank by X)   R2 (rank by Y)   D = R1 – R2   D²
1                12               -11           121
2                9                -7            49
3                6                -3            9
4                10               -6            36
5                3                2             4
6                5                1             1
7                4                3             9
8                7                1             1
9                8                1             1
10               2                8             64
11               11               0             0
12               1                11            121
                                                ∑D² = 416

N = 12

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 416}{12(12^2 - 1)} = 1 - \frac{2496}{1716} = 1 - 1.455 = -0.455$$

Moderate degree of negative correlation

Low degree of agreement: the two judges tend to rank the entries in opposite order.
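
The rank correlation formula is easy to evaluate mechanically. The sketch below assumes the standard form R = 1 - 6∑D²/(N(N² - 1)); the function name `spearman_from_ranks` is made up for illustration. For this data 6 × 416 / (12 × 143) = 2496/1716 ≈ 1.455, so R ≈ -0.455.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R for two lists of ranks without ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


judge_x = list(range(1, 13))
judge_y = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]

sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_x, judge_y))
print(sum_d2)                                            # 416
print(round(spearman_from_ranks(judge_x, judge_y), 3))   # -0.455
```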

Example

Three judges in a beauty contest ranked the entries as follows:

X 1 2 3 4 5
Y 5 4 3 2 1
Z 3 5 2 1 4

Which pair of judges has the nearest approach to common tastes in beauty?

Solution

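Let us compute the squared rank differences for each pair of judges.

R1 (rank for X)  R2 (rank for Y)  R3 (rank for Z)  D1² = (R1 – R2)²  D2² = (R2 – R3)²  D3² = (R3 – R1)²
1                5                3                16                4                 4
2                4                5                4                 1                 9
3                3                2                0                 1                 1
4                2                1                4                 1                 9
5                1                4                16                9                 1
                                                   ∑D1² = 40         ∑D2² = 16         ∑D3² = 24

N = 5

Rank correlation between the judgement of the first and second judges

$$R = 1 - \frac{6 \sum D_1^2}{N(N^2 - 1)} = 1 - \frac{6 \times 40}{5(5^2 - 1)} = 1 - \frac{240}{120} = 1 - 2 = -1$$

Rank correlation between the judgement of the second and third judges

$$R = 1 - \frac{6 \sum D_2^2}{N(N^2 - 1)} = 1 - \frac{6 \times 16}{5(5^2 - 1)} = 1 - \frac{96}{120} = 1 - 0.8 = 0.2$$

Rank correlation between the judgement of the first and third judges

$$R = 1 - \frac{6 \sum D_3^2}{N(N^2 - 1)} = 1 - \frac{6 \times 24}{5(5^2 - 1)} = 1 - \frac{144}{120} = 1 - 1.2 = -0.2$$

The second and the third judges have the nearest approach to common tastes in beauty,
since their rank correlation coefficient is the highest of the three (and the only positive one).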
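
The pairwise comparison can be sketched with a loop over all pairs of judges. The dictionary layout and the names `ranks`, `results` and `best_pair` are illustrative assumptions.

```python
from itertools import combinations


def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R for two lists of ranks without ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


ranks = {
    "X": [1, 2, 3, 4, 5],
    "Y": [5, 4, 3, 2, 1],
    "Z": [3, 5, 2, 1, 4],
}

# One coefficient per pair of judges: (X, Y), (X, Z), (Y, Z).
results = {
    (a, b): spearman_from_ranks(ranks[a], ranks[b])
    for a, b in combinations(ranks, 2)
}
for pair, value in results.items():
    print(pair, round(value, 2))

# The pair with the largest coefficient agrees the most.
best_pair = max(results, key=results.get)
print(best_pair)  # ('Y', 'Z')
```

Picking the largest coefficient, rather than just any positive one, is the rule that generalises when several pairs happen to be positively correlated.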

Case 2: When actual ranks are not given

Example

Calculate rank correlation coefficient for the following data:

X 59 69 39 49 29
Y 79 69 59 49 39
Solution
X      Y      R1     R2     D = R1 – R2    D²
59     79     4      5      -1             1
69     69     5      4      1              1
39     59     2      3      -1             1
49     49     3      2      1              1
29     39     1      1      0              0
                                           ∑D² = 4

N = 5
Rank correlation

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 4}{5(5^2 - 1)} = 1 - \frac{24}{120} = 1 - 0.2 = 0.8$$
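
When only raw values are given, the ranking step comes first. A minimal sketch, assuming all values in a series are distinct and that rank 1 goes to the smallest value, as in the table above; the helper names are made up for illustration.

```python
def ranks_ascending(values):
    """Rank 1 for the smallest value; assumes no repeated values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R for two lists of ranks without ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


X = [59, 69, 39, 49, 29]
Y = [79, 69, 59, 49, 39]

r1 = ranks_ascending(X)
r2 = ranks_ascending(Y)
print(r1)  # [4, 5, 2, 3, 1]
print(r2)  # [5, 4, 3, 2, 1]
print(round(spearman_from_ranks(r1, r2), 2))  # 0.8
```

Ranking both series in descending order instead would give the same coefficient, as long as the same direction is used for X and Y.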

Question

For the data X and Y given below:


X 1313 2020 2222 1818 1919 1111 1010 1515
Y 1717 1919 2323 1616 2020 1010 1111 18

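Find Spearman’s rank correlation coefficient.

Solution

Ranks are assigned in ascending order of the values as given, so Y = 18 (the smallest
value of Y) receives rank 1.

X      Y      R1     R2     D = R1 – R2    D²
1313   1717   3      5      -2             4
2020   1919   7      6      1              1
2222   2323   8      8      0              0
1818   1616   5      4      1              1
1919   2020   6      7      -1             1
1111   1010   2      2      0              0
1010   1111   1      3      -2             4
1515   18     4      1      3              9
                                           ∑D² = 20

N = 8
Rank correlation

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 20}{8(8^2 - 1)} = 1 - \frac{120}{504} = 1 - 0.238 = 0.762$$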
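Case 3: When values of some variables are equal

In case there is more than one item with the same value in the series, usually the
average rank is allotted to each of these items and the factor $\frac{1}{12}(m^3 - m)$
is added to $\sum D^2$ for each such group of tied items, where m is the number of
items sharing the value. Thus, in case of tied ranks, the modified formula for the
rank correlation coefficient becomes:

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{N(N^2 - 1)}$$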

Example

Calculate rank correlation coefficient for the following data:

X 49 69 39 49 29
Y 59 59 59 49 39

Solution
X      Y      R1     R2     D = R1 – R2    D²
49     59     3.5    4      -0.5           0.25
69     59     5      4      1              1
39     59     2      4      -2             4
49     49     3.5    2      1.5            2.25
29     39     1      1      0              0
                                           ∑D² = 7.5

X has 49 two times (m = 2): assign the average of the 3rd and 4th positions as
the rank of 49, i.e. (3 + 4)/2 = 7/2 = 3.5.

Y has 59 three times (m = 3): assign the average of the 3rd, 4th and 5th positions
as the rank of 59, i.e. (3 + 4 + 5)/3 = 12/3 = 4.

N = 5

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2)\right]}{N(N^2 - 1)} = 1 - \frac{6\left[7.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{5(5^2 - 1)}$$

$$= 1 - \frac{6(7.5 + 0.5 + 2)}{120} = 1 - \frac{60}{120} = 1 - 0.5 = 0.5$$

There exists a moderate positive correlation.
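
The tied-rank procedure can be sketched in code. This assumes the usual correction of (m³ - m)/12 per group of m tied values; the function names `average_ranks`, `tie_correction` and `spearman_with_ties` are made up for illustration.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the average of the positions they occupy."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m > 1 equal values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    n = len(xs)
    r1, r2 = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))


X = [49, 69, 39, 49, 29]
Y = [59, 59, 59, 49, 39]

print(average_ranks(X))          # [3.5, 5.0, 2.0, 3.5, 1.0]
print(average_ranks(Y))          # [4.0, 4.0, 4.0, 2.0, 1.0]
print(spearman_with_ties(X, Y))  # 0.5
```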

Exercise

Find the rank correlation coefficient for the given data.

X 49 69 39 49 29 49 79
Y 59 59 59 49 39 39 69

Questions

1. Calculate the number of pairs of X and Y variables if the covariance between X and
Y is 18 and ∑xy = 288.

2. Calculate the number of pairs of X and Y variables if ∑xy = 324, variance of X =
64, variance of Y = 81 and coefficient of correlation = 0.3.

3. Fill in the blanks in each of the following alternative cases:

Case   Coefficient of correlation   Variance of X   Variance of Y   Covariance
A      0.8                          9               16              ?
B      ?                            16              25              14
C      0.5                          36              ?               21

Answers

1. 16
2. 15
3. A. 9.6  B. 0.7  C. 49
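
The answers follow from rearranging Cov(X, Y) = ∑xy / N and r = Cov(X, Y) / (σX σY). A short sketch of the arithmetic, with variable names chosen here for illustration:

```python
from math import sqrt

# Question 1: Cov = sum(xy) / N, so N = sum(xy) / Cov.
n_1 = 288 / 18

# Question 2: Cov = r * sd_x * sd_y, then N = sum(xy) / Cov.
cov_2 = 0.3 * sqrt(64) * sqrt(81)
n_2 = 324 / cov_2

# Question 3, case A: Cov = r * sd_x * sd_y.
cov_a = 0.8 * sqrt(9) * sqrt(16)
# Case B: r = Cov / (sd_x * sd_y).
r_b = 14 / (sqrt(16) * sqrt(25))
# Case C: sd_y = Cov / (r * sd_x), and the variance is its square.
var_y_c = (21 / (0.5 * sqrt(36))) ** 2

print(round(n_1), round(n_2), round(cov_a, 1), round(r_b, 1), round(var_y_c))
# 16 15 9.6 0.7 49
```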
