Correlation Regression Analysis

Correlation analysis examines the relationship between two or more variables, identifying whether they move in the same or opposite directions, and can be categorized into types such as positive/negative, simple/partial/multiple, and linear/non-linear correlations. Various methods, including scatter diagrams and Pearson's coefficient, are used to study correlation, with each method having its own merits and limitations. The document also discusses the assumptions, properties, and limitations of Pearson's coefficient and introduces Spearman's rank correlation coefficient for cases where ranks are involved.

Correlation analysis

If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are said to be correlated. For example, there exists some relationship between the price of a commodity and the amount demanded. A relationship between two such sets of observations is called correlation. Correlation analysis refers to the techniques used in measuring the relationship between the variables.

Thus correlation is a statistical device which helps us in understanding the co-variation of two or more variables.

Types of correlation

1. Positive and negative correlation :

It depends upon the direction of change of the variables. If both the variables are

varying in the same direction correlation is said to be positive. If, on the other hand, the

variables are varying in opposite direction correlation is said to be negative.


2. Simple, partial and multiple correlation :

The distinction is based upon the number of variables studied. When only two variables are studied it is a problem of simple correlation. When more than two variables are studied it is a problem of either multiple or partial correlation.

In multiple correlation three or more variables are studied simultaneously. In partial correlation, on the other hand, we recognize more than two variables but consider only two of them to be influencing each other, the effect of the other influencing variables being kept constant.

3. Linear and non-linear (curvilinear) correlation :

The distinction is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear.

Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
 Methods of studying correlation :

1. Scatter diagram method.

2. Graphic method.

3. Karl Pearson's coefficient of correlation.

4. Concurrent deviation method.

5. Method of least squares.

1. Scatter diagram method :

The simplest device for ascertaining whether two variables are related is to prepare a dot chart called a scatter diagram. Under this method the given data are plotted on graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot, thus obtaining as many points as the number of observations. By looking at the scatter of the various points we can form an idea of whether the variables are related or not. The greater the scatter of the plotted points on the chart, the lesser the relationship between the two variables. The more closely the points come to a straight line, the higher the degree of relationship.
i. Perfect positive correlation :-

If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive (i.e., r = +1).

ii. Perfect negative correlation :-

If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = -1).
iii. High degree of positive correlation :-

If the plotted points fall in a narrow band there would be a high degree of correlation between the variables -- correlation shall be positive if the points show a rising tendency from the lower left-hand corner to the upper right-hand corner.

iv. High degree of negative correlation :-

Correlation shall be negative if the points show a declining tendency from the
upper left-hand corner to the lower right-hand corner of the diagram.

v. Low degree of positive correlation :-

If the points are widely scattered over the diagram it indicates very little relationship between the variables -- correlation shall be positive if the points are rising from the lower left-hand corner to the upper right-hand corner.

vi. Low degree of negative correlation :-

Correlation shall be negative if the points are running from the upper left-hand side to the

lower right-hand side of the diagram.


vii. No correlation :-

If the plotted points lie in a haphazard manner, or on a straight line parallel to the X-axis, it shows the absence of any relationship between the variables (i.e., r = 0).

Merits

a) It is a simple and non-mathematical method of studying correlation between the variables. As such it can be easily understood and a rough idea can very quickly be formed as to whether or not the variables are related.

b) It is not influenced by the size of extreme items.

c) It is the first step in investigating the relationship between two variables.

 Limitations :-

By applying this method we can get an idea about the direction of the correlation and also whether it is high or low. But we cannot establish the exact degree of correlation between the variables, as is possible by applying the mathematical methods.
2. Graphic method

Under this method the individual values of the two variables are plotted on graph paper, thus obtaining two curves, one for the X variable and another for the Y variable. By examining the direction and closeness of the two curves so drawn we can infer whether or not the variables are related. If both the curves drawn on the graph are moving in the same direction (either upward or downward) correlation is said to be positive. On the other hand, if the curves are moving in opposite directions correlation is said to be negative.


• From the following data ascertain whether the income and expenditure of 100 workers of a factory are correlated.

Year    Average income (in Rs)    Average expenditure (in Rs)
1979    100                       90
1980    102                       91
1981    105                       93
1982    105                       95
1983    101                       92
1984    112                       94
1985    118                       100
1986    120                       105
1987    125                       108
1988    130                       110


3. Karl Pearson's coefficient of correlation

The Pearson coefficient of correlation is denoted by the symbol r. The formulae for computing the Pearsonian r are:

(i) When deviations of the items are taken from the actual mean:

r = Σxy / (N σx σy)

Where,
x = (X − X̅) and y = (Y − Y̅),
σx = standard deviation of series X,
σy = standard deviation of series Y,
N = number of pairs of observations,
r = the correlation coefficient.

The value of r lies between +1 and −1, i.e., it cannot be greater than 1 or less than −1. If r = +1 correlation is perfect and positive. If r = −1 correlation is perfect and negative. If r = 0 there is no correlation, i.e., the variables are independent. The above formula for computing the Pearson coefficient of correlation can be transformed to the following form, which is easier to apply:

r = Σxy / √(Σx² Σy²),  where x = (X − X̅) and y = (Y − Y̅)

(ii) Direct method of finding out correlation:

r = [N ΣXY − (ΣX)(ΣY)] / [√(N ΣX² − (ΣX)²) √(N ΣY² − (ΣY)²)]

(iii) When deviations are taken from an assumed mean:

r = [N Σdxdy − (Σdx)(Σdy)] / [√(N Σdx² − (Σdx)²) √(N Σdy² − (Σdy)²)]

(iv) Correlation of grouped data:

r = [N Σfdxdy − (Σfdx)(Σfdy)] / [√(N Σfdx² − (Σfdx)²) √(N Σfdy² − (Σfdy)²)]
Calculate Karl Pearson's coefficient of correlation for the following data:

X     Y     x = X−X̅   x²     y = Y−Ȳ   y²     xy
48    45    14         196    10         100    140
35    20    1          1      −15        225    −15
17    40    −17        289    5          25     −85
23    25    −11        121    −10        100    110
47    45    13         169    10         100    130
170   175   0          776    0          550    280

r = Σxy / √(Σx² Σy²),  where x = (X − X̅), y = (Y − Ȳ)

X̅ = ΣX/N = 170/5 = 34    Ȳ = ΣY/N = 175/5 = 35

r = 280 / √(776 × 550)
  = 280 / 653.3
  = 0.429
(ii) Direct method of finding out correlation

r = [N ΣXY − (ΣX)(ΣY)] / [√(N ΣX² − (ΣX)²) √(N ΣY² − (ΣY)²)]

Example: calculate the correlation coefficient from the following data by the direct method.

X    Y     X²    Y²     XY
9    15    81    225    135
8    16    64    256    128
7    14    49    196    98
6    13    36    169    78
5    11    25    121    55
4    12    16    144    48
3    10    9     100    30
2    8     4     64     16
1    9     1     81     9
45   108   285   1356   597

r = (9 × 597 − 45 × 108) / [√(9 × 285 − 45²) √(9 × 1356 − 108²)]
  = (5373 − 4860) / (√(2565 − 2025) √(12204 − 11664))
  = 513 / (√540 × √540)
  = 513 / 540

r = 0.95
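The direct-method arithmetic above can be checked with a short script. This is an illustrative sketch (not part of the original text) using only the Python standard library; the helper name `pearson_r` is ours, and the data are the nine (X, Y) pairs from the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Direct-method Pearson r: works on raw values, no mean deviations needed."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx**2) * sqrt(n * syy - sy**2))

X = [9, 8, 7, 6, 5, 4, 3, 2, 1]
Y = [15, 16, 14, 13, 11, 12, 10, 8, 9]
print(round(pearson_r(X, Y), 2))  # 0.95, matching the worked example
```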
(iii) When deviations are taken from an assumed mean:

r = [N Σdxdy − (Σdx)(Σdy)] / [√(N Σdx² − (Σdx)²) √(N Σdy² − (Σdy)²)]

Example:
Calculate Karl Pearson's coefficient of correlation between the values of X and Y for the following data:
X : 78 89 96 69 59 79 68 61
Y : 125 137 156 112 107 136 123 108

X    dx = (X−69)   dx²    Y     dy = (Y−112)   dy²    dxdy
78   +9            81     125   +13            169    +117
89   +20           400    137   +25            625    +500
96   +27           729    156   +44            1936   +1188
69   0             0      112   0              0      0
59   −10           100    107   −5             25     +50
79   +10           100    136   +24            576    +240
68   −1            1      123   +11            121    −11
61   −8            64     108   −4             16     +32
     Σdx = 47      1475         Σdy = 108      3468   2116

Here N = 8.

r = (8 × 2116 − 47 × 108) / [√(8 × 1475 − 47²) √(8 × 3468 − 108²)]
  = (16928 − 5076) / (√(11800 − 2209) √(27744 − 11664))
  = 11852 / (√9591 × √16080)
  = 11852 / (97.93 × 126.80)
  = 11852 / 12417.78
r = 0.954
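The same assumed-mean computation can be sketched in code as a check. This is an illustrative sketch with a helper name of our own choosing; the assumed means 69 and 112 and the eight (X, Y) pairs come from the example above.

```python
from math import sqrt

def pearson_r_assumed(xs, ys, ax, ay):
    """Pearson r computed from deviations dx = X - ax, dy = Y - ay
    about assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    num = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    den = (sqrt(n * sum(p * p for p in dx) - sum(dx)**2)
           * sqrt(n * sum(q * q for q in dy) - sum(dy)**2))
    return num / den

X = [78, 89, 96, 69, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]
print(round(pearson_r_assumed(X, Y, 69, 112), 3))  # 0.954
```

Note that the result is the same whatever assumed means are chosen; the assumed-mean form only simplifies the hand arithmetic.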
Assumptions of Pearson's coefficient

1) There is a linear relationship between the variables.

2) The two variables under study are affected by a large number of causes so as to form a normal distribution.

3) There is a cause-and-effect relationship between the forces affecting the distribution of the items in the two series.

Merits of Pearson's coefficient

1) It is the most popular method.

2) It summarizes in one figure not only the degree of correlation but also the direction, i.e., whether the correlation is positive or negative.

 Limitations of Pearson's coefficient

1) The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct or not.

2) Great care must be taken in calculating the value.

3) It is unduly affected by extreme items.

4) This method takes more time to compute the value of the correlation coefficient.
Properties of the coefficient of correlation :
1. The coefficient of correlation lies between −1 and +1. Symbolically, −1 ≤ r ≤ +1.
2. The coefficient of correlation is independent of the change of scale and origin of the variables X and Y.
   By change of origin we mean subtracting some constant from every given value of X and Y, and by change of scale we mean dividing or multiplying every value of X and Y by some constant.
3. The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically,
   r = √(bxy × byx)
4. The degree of relationship between the two variables is symmetric, as shown below:
   rxy = ryx
   rxy = Σxy / (N σx σy) = Σyx / (N σy σx) = ryx
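Property 3 (r is the geometric mean of the two regression coefficients) is easy to verify numerically. The sketch below uses a small made-up data set and helper names of our own choosing; it is an illustration of the property, not part of the original text.

```python
from math import sqrt

# Small illustrative data set (our own choice)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
mx, my = sum(X) / len(X), sum(Y) / len(Y)
x = [v - mx for v in X]          # deviations from the mean of X
y = [v - my for v in Y]          # deviations from the mean of Y
sxy = sum(a * b for a, b in zip(x, y))

bxy = sxy / sum(b * b for b in y)   # regression coefficient of X on Y
byx = sxy / sum(a * a for a in x)   # regression coefficient of Y on X
r = sxy / sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# |r| equals the geometric mean of the two regression coefficients
print(round(abs(r), 4), round(sqrt(bxy * byx), 4))
```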
Spearman's rank correlation coefficient :

Spearman's rank correlation coefficient is defined as

R = 1 − (6 ΣD²) / [N (N² − 1)]

R = rank correlation coefficient,
D = difference of ranks between paired items in the two series,
N = total number of observations.

The value of this coefficient, interpreted in the same way as Karl Pearson's correlation coefficient, ranges between +1 and −1. When R is +1 there is complete agreement in the order of the ranks and they are in the same direction; when R is −1 there is complete agreement in the order of the ranks but they are in opposite directions.
Example :

R1   R2   D = R1 − R2   D²
1    3    −2            4
2    2    0             0
3    1    2             4
                ΣD² = 8

R = 1 − (6 ΣD²) / [N (N² − 1)]
  = 1 − (6 × 8) / [3 (3² − 1)]
  = 1 − 48/24

R = −1
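The formula is short enough to code directly. This is an illustrative sketch (the function name is ours), applied to the fully reversed ranks of the example above:

```python
def spearman_R(r1, r2):
    """Spearman's rank correlation: R = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - (6 * d2) / (n * (n**2 - 1))

# Ranks in exactly opposite order give perfect negative rank correlation
print(spearman_R([1, 2, 3], [3, 2, 1]))  # -1.0
```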
 In rank correlation we may have two types of problems :

A) Where ranks are given.

B) Where ranks are not given.

A. Where ranks are given :

Where actual ranks are given to us, the steps required for computing rank correlation are:

(i) Take the difference of the two ranks, i.e., (R1 − R2), and denote these differences by D.

(ii) Square these differences and obtain the total ΣD².

(iii) Apply the formula

R = 1 − (6 ΣD²) / (N³ − N)
Example :
The rankings of 10 students in the subjects accounting and auditing are as follows:
Accounting : 3 5 8 4 7 10 2 1 6 9
Auditing :   6 4 9 8 1 2 3 10 5 7

Solution:

R1   R2   D = R1 − R2   D²
3    6    −3            9
5    4    1             1
8    9    −1            1
4    8    −4            16
7    1    6             36
10   2    8             64
2    3    −1            1
1    10   −9            81
6    5    1             1
9    7    2             4
                ΣD² = 214

R = 1 − (6 ΣD²) / (N³ − N)
  = 1 − (6 × 214) / (10³ − 10)
  = 1 − 1284/990
  = −294/990
R = −0.297
B) Where ranks are not given :

When we are given the actual data and not the ranks, it is necessary to assign the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1; but whether we start with the lowest value or the highest value, we must follow the same method for both variables.

Example:

Quotations of index numbers of security prices of a certain joint stock company are given below. Using the rank correlation method, determine the relationship between debenture prices and share prices.

Year   Debenture price x   Rx   Share price y   Ry   D = Rx − Ry   D²
1      97.8                3    73.2            1    2             4
2      99.2                7    85.8            6    1             1
3      98.8                6    78.9            4    2             4
4      98.3                4    75.8            2    2             4
5      98.4                5    77.2            3    2             4
6      96.7                1    87.2            7    −6            36
7      97.1                2    83.8            5    −3            9
                                                         ΣD² = 62

R = 1 − (6 ΣD²) / (N³ − N)
  = 1 − (6 × 62) / (7³ − 7)
  = 1 − 372/336
  = −36/336
R = −0.107
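The rank-assignment step can be sketched as follows. This is an illustrative sketch, assuming no tied values (the `ranks` helper is our own name; ties are handled in the next section), using the debenture and share prices from the example:

```python
def ranks(values):
    """Assign ranks taking the lowest value as rank 1 (assumes no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

deb = [97.8, 99.2, 98.8, 98.3, 98.4, 96.7, 97.1]
share = [73.2, 85.8, 78.9, 75.8, 77.2, 87.2, 83.8]
rx, ry = ranks(deb), ranks(share)

d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(deb)
R = 1 - 6 * d2 / (n**3 - n)
print(round(R, 3))  # -0.107
```

Ranking from the highest value instead would reverse both rank series and leave R unchanged, which is why the text only requires that the same convention be used for both variables.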
Equal ranks

If two or more items are of equal value, they are assigned the average rank. An adjustment is required for each group of equal ranks. The formula for calculating the rank coefficient of correlation in the case of equal ranks is:

R = 1 − 6 {ΣD² + (m³ − m)/12 + (m³ − m)/12 + ……} / (N³ − N)

where m stands for the number of items whose ranks are common.

If there is more than one such group of items with common ranks, the correction term (m³ − m)/12 is added as many times as the number of such groups.


Example:
Calculate the rank coefficient of correlation for the following data.

X    Rx    Y    Ry   D = Rx − Ry   D²
80   8     12   1    7             49
78   7     13   2    5             25
75   5.5   14   4    1.5           2.25
75   5.5   14   4    1.5           2.25
68   4     14   4    0             0
67   3     16   7    −4            16
60   2     15   6    −4            16
59   1     17   8    −7            49
                          ΣD² = 159.5

Here X has one group of 2 equal values (m = 2) and Y has one group of 3 equal values (m = 3).

R = 1 − 6 {ΣD² + (m³ − m)/12 + (m³ − m)/12} / (N³ − N)
  = 1 − 6 {159.5 + (2³ − 2)/12 + (3³ − 3)/12} / (8³ − 8)
  = 1 − 6 {159.5 + 0.5 + 2} / 504
  = 1 − 972/504
  = 1 − 1.929
R = −0.929
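The average-rank assignment and tie correction can be sketched in code. This is an illustrative sketch (helper name and structure are ours) using the data from the equal-ranks example:

```python
def avg_ranks(values):
    """Assign ranks taking the lowest value as rank 1; tied values
    share the average of the rank positions they occupy."""
    order = sorted(values)
    positions = {}
    for i, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

X = [80, 78, 75, 75, 68, 67, 60, 59]
Y = [12, 13, 14, 14, 14, 16, 15, 17]
rx, ry = avg_ranks(X), avg_ranks(Y)

d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(X)
# One tie group of m = 2 in X and one of m = 3 in Y
correction = (2**3 - 2) / 12 + (3**3 - 3) / 12
R = 1 - 6 * (d2 + correction) / (n**3 - n)
print(round(R, 3))  # -0.929
```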
Merits

(1) This method is simpler to understand and easier to apply compared to Karl Pearson's method.

(2) When the data are of a qualitative nature, like honesty, efficiency, etc., this method can be used with great advantage.

(3) This is the only method that can be used where we are given the ranks and not the actual data.

(4) Even where actual data are given, the rank method can be applied for ascertaining correlation.

 Limitations

1) This method cannot be used for finding out correlation in a grouped frequency distribution.

2) This method should not be applied where N exceeds 30 unless we are given the ranks and not the actual values of the variables.

Regression analysis
 Introduction

Regression analysis reveals the average relationship between two variables, and this makes estimation or prediction possible. The two-variable regression model assigns one of the variables the status of an independent variable and the other variable the status of a dependent variable.

 Uses of regression analysis :

1. Regression analysis provides estimates of values of the dependent variable from values of the independent variable. The device used to accomplish this estimation procedure is the regression line.
2. A second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis for estimation. For this purpose the standard error of estimate is calculated.
3. With the help of the regression coefficients we can calculate the correlation coefficient.
I. Regression equations
Regression equations, also known as estimating equations, are algebraic expressions of the regression lines.

 Regression equation of Y on X
The regression equation of Y on X is used to describe the variation in the values of Y for given changes in X.
It is expressed as
Yc = a + bX
Here a and b are constants; the symbol Yc stands for the value of Y computed from the relationship for a given X.
To determine the values of a and b, the following normal equations are to be solved simultaneously:
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²

 Regression equation of X on Y
The regression equation of X on Y is used to describe the variation in the values of X for given changes in Y.
It is expressed as
Xc = a + bY
To determine the values of a and b, the following normal equations are to be solved simultaneously:
ΣX = Na + bΣY
ΣXY = aΣY + bΣY²

Deviations taken from the arithmetic means of X and Y

 Regression equation of X on Y
X − X̅ = r σx/σy (Y − Y̅)
X̅ is the mean of the X series,
Y̅ is the mean of the Y series,
r σx/σy is known as the regression coefficient of X on Y.
The regression coefficient of X on Y is denoted by the symbol bxy.
bxy = r σx/σy = Σxy / Σy²

 Regression equation of Y on X
Y − Y̅ = r σy/σx (X − X̅)
r σy/σx is known as the regression coefficient of Y on X.
The regression coefficient of Y on X is denoted by the symbol byx.
byx = r σy/σx = Σxy / Σx²


Example: From the following data calculate the regression equations:

X    Y    x = X−X̅   x²   y = Y−Ȳ   y²   xy
6    9    0          0    1          1    0
2    11   −4         16   3          9    −12
10   5    4          16   −3         9    −12
4    8    −2         4    0          0    0
8    7    2          4    −1         1    −2
30   40   0          40   0          20   −26

X̅ = ΣX/N = 30/5 = 6    Ȳ = ΣY/N = 40/5 = 8

Regression equation of X on Y
X − X̅ = r σx/σy (Y − Y̅)

bxy = r σx/σy = Σxy/Σy² = −26/20 = −1.3

Hence, X − 6 = −1.3 (Y − 8)
X − 6 = −1.3Y + 10.4
X = −1.3Y + 16.4
or X = 16.4 − 1.3Y

Regression equation of Y on X
Y − Y̅ = r σy/σx (X − X̅)

byx = r σy/σx = Σxy/Σx² = −26/40 = −0.65

Hence,
Y − 8 = −0.65 (X − 6)
Y − 8 = −0.65X + 3.9
Y = −0.65X + 11.9
or Y = 11.9 − 0.65X
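The two regression equations above can be recovered in a few lines of code. This is an illustrative sketch using the five (X, Y) pairs from the example; variable names are our own.

```python
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n          # X-bar = 6, Y-bar = 8
x = [v - mx for v in X]                  # deviations from mean of X
y = [v - my for v in Y]                  # deviations from mean of Y
sxy = sum(a * b for a, b in zip(x, y))   # sum(xy) = -26

bxy = sxy / sum(b * b for b in y)        # -26/20 = -1.3
byx = sxy / sum(a * a for a in x)        # -26/40 = -0.65

# X on Y: X = mx + bxy*(Y - my)  ->  X = 16.4 - 1.3 Y
# Y on X: Y = my + byx*(X - mx)  ->  Y = 11.9 - 0.65 X
print(round(mx - bxy * my, 1), round(bxy, 2))
print(round(my - byx * mx, 1), round(byx, 2))
```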
 Deviations taken from assumed means :

 Regression equation of X on Y :

(X − X̅) = bxy (Y − Y̅)

The value of bxy = r σx/σy is now obtained as follows:

bxy = [Σdxdy − (Σdx)(Σdy)/N] / [Σdy² − (Σdy)²/N]

where dx = (X − A) and dy = (Y − B), A and B being the assumed means of X and Y respectively.

 Similarly, the regression equation of Y on X is

(Y − Y̅) = byx (X − X̅)

byx = [Σdxdy − (Σdx)(Σdy)/N] / [Σdx² − (Σdx)²/N]
Example: From the following data calculate the regression equations:

X    Y    dx = X−6   dx²   dy = Y−7   dy²   dxdy
6    9    0           0     2          4     0
2    11   −4          16    4          16    −16
10   5    4           16    −2         4     −8
4    8    −2          4     1          1     −2
8    7    2           4     0          0     0
30   40   0           40    5          25    −26

dx = X − A,  dy = Y − B
X̅ = ΣX/N = 30/5 = 6    Ȳ = ΣY/N = 40/5 = 8

Regression equation of Y on X
Y − Y̅ = byx (X − X̅)
byx = [Σdxdy − (Σdx)(Σdy)/N] / [Σdx² − (Σdx)²/N]
    = [−26 − (0)(5)/5] / [40 − 0²/5]
    = −26/40
    = −0.65

Regression equation of X on Y
(X − X̅) = bxy (Y − Y̅)
bxy = [Σdxdy − (Σdx)(Σdy)/N] / [Σdy² − (Σdy)²/N]
    = [−26 − (0)(5)/5] / [25 − 5²/5]
    = −26/20
    = −1.3
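The assumed-mean computation of the regression coefficients can be sketched as follows. This is an illustrative check, assuming the data and the assumed means A = 6 and B = 7 from the example above; variable names are our own.

```python
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
n = len(X)
A, B = 6, 7                                  # assumed means of X and Y
dx = [v - A for v in X]
dy = [v - B for v in Y]
sdxdy = sum(p * q for p, q in zip(dx, dy))   # sum(dx*dy) = -26

byx = ((sdxdy - sum(dx) * sum(dy) / n)
       / (sum(p * p for p in dx) - sum(dx)**2 / n))   # -26/40 = -0.65
bxy = ((sdxdy - sum(dx) * sum(dy) / n)
       / (sum(q * q for q in dy) - sum(dy)**2 / n))   # -26/20 = -1.3
print(round(byx, 2), round(bxy, 2))
```

The coefficients agree with those found from the actual means, which is the point of the assumed-mean shortcut: any convenient constants serve.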
Difference Between Correlation And Regression

1. The correlation coefficient measures the degree of co-variability between X and Y. Regression analysis studies the nature of the relationship between the dependent and independent variables.

2. Correlation measures merely the degree of relationship, not a cause-and-effect relationship. Regression expresses a cause-and-effect relationship.

3. rxy is a measure of the direction and degree of linear relationship between two variables X and Y; rxy and ryx are symmetric (rxy = ryx), i.e., it is immaterial which of X and Y is the dependent variable and which is the independent variable. In regression analysis the regression coefficients bxy and byx are not symmetric, i.e., bxy ≠ byx, and hence it definitely makes a difference as to which variable is dependent and which is independent.

4. There may be nonsense correlation between two variables which is purely due to chance and has no practical relevance. There is nothing like nonsense regression.

5. The correlation coefficient is independent of change of scale and origin. Regression coefficients are independent of change of origin but not of scale. The coefficient of correlation (r) takes the same sign as the regression coefficients (bxy and byx).
• The following points should be noted about the regression coefficients :

1. Both regression coefficients will have the same sign, i.e., either both will be positive or both negative. It is never possible that one of the regression coefficients is negative and the other positive.

2. Since the value of the coefficient of correlation (r) cannot exceed one, if one regression coefficient is greater than one the other must be less than one; in other words, both regression coefficients cannot be greater than one. For example, if bxy = 1.2 and byx = 1.4, the value of the correlation coefficient would be √(1.2 × 1.4) = 1.296, which is not possible.

3. The coefficient of correlation will have the same sign as that of the regression coefficients, i.e., if the regression coefficients have a negative sign, r will also be negative; if the regression coefficients have a positive sign, r will also be positive. For example, if bxy = −0.8 and byx = −1.2, then r = −√(0.8 × 1.2) = −0.98, and not +0.98.

4. Since bxy = r σx/σy, we can find any of the four values given the other three. For example, if we know that r = 0.6, σx = 0.4 and bxy = 0.8, we can find σy: from bxy = r σx/σy, 0.8 = 0.6 (0.4/σy), or σy = 0.24/0.8 = 0.3.

5. Regression coefficients are independent of change of origin but not of scale (change of origin: subtracting some constant from every value of X and Y; change of scale: dividing or multiplying every value of X and Y by some constant).
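The numerical checks in points 2-4 above can be confirmed in a few lines; this is an illustrative sketch restating those worked figures, not new results.

```python
from math import sqrt

# Point 2: bxy = 1.2 and byx = 1.4 would force r above 1 -- impossible
r_impossible = sqrt(1.2 * 1.4)
print(round(r_impossible, 3))        # 1.296, greater than 1

# Point 3: bxy = -0.8, byx = -1.2 -> r takes the coefficients' (negative) sign
r = -sqrt(0.8 * 1.2)
print(round(r, 2))                   # -0.98

# Point 4: solve bxy = r*sx/sy for sy, given r = 0.6, sx = 0.4, bxy = 0.8
sy = 0.6 * 0.4 / 0.8
print(round(sy, 2))                  # 0.3
```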
