Correlation and Regression Analysis
If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are correlated. For example, there exists some relationship between the price of a commodity and the quantity demanded. The relationship between two such sets of observations is called correlation, and correlation analysis refers to the techniques used in measuring the closeness of that relationship.
Types of correlation
1. Positive and negative correlation :-
This distinction depends upon the direction of change of the variables. If both variables vary in the same direction, the correlation is said to be positive. If, on the other hand, the variables vary in opposite directions, the correlation is said to be negative.
2. Simple, partial and multiple correlation :-
This distinction is based upon the number of variables studied. When only two variables are studied it is a problem of simple correlation; when more than two variables are studied it is a problem of either multiple or partial correlation. In multiple correlation three or more variables are studied simultaneously. In partial correlation, on the other hand, we recognize more than two variables but consider only two of them to be influencing each other, the effect of the other influencing variables being kept constant.
3. Linear and non-linear correlation :-
This distinction is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear; otherwise it is non-linear (curvilinear).
1. Scatter diagram method.
The simplest device for ascertaining whether two variables are related is to prepare a dot chart called a scatter diagram. When this method is used, the given data are plotted on graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot, thus obtaining as many points as the number of observations. By looking at the scatter of the various points we can form an idea as to whether the variables are related or not. The greater the scatter of the plotted points on the chart, the lesser the relationship between the two variables; the more closely the points come to a straight line, the higher the degree of relationship.
i. Perfect positive correlation :-
If all the points lie on a straight line falling from the lower left-hand corner to the upper right-
hand corner, correlation is said to be perfectly positive (i.e., r = +1).
ii. Perfect negative correlation :-
If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = -1).
iii. High degree of positive correlation :-
If the plotted points fall in a narrow band, there would be a high degree of correlation between the variables; the correlation is positive if the points show a rising tendency from the lower left-hand corner to the upper right-hand corner.
iv. High degree of negative correlation :-
If the plotted points fall in a narrow band and show a declining tendency from the upper left-hand corner to the lower right-hand corner of the diagram, the correlation is negative.
v. Low degree of positive correlation :-
If the points are widely scattered over the diagram, there is very little relationship between the variables; the correlation is positive if the points show a rising tendency from the lower left-hand corner to the upper right-hand corner.
vi. Low degree of negative correlation :-
If the widely scattered points run from the upper left-hand side to the lower right-hand side of the diagram, the correlation is negative.
vii. No correlation :-
If the plotted points lie on a straight line parallel to the X-axis, or are scattered in a haphazard manner, this shows the absence of any relationship between the variables (i.e., r = 0).
Merits
By applying this method we can get an idea about the direction of the correlation and also whether it is high or low. But we cannot establish the exact degree of correlation between the variables, as is possible with mathematical methods.
2. Graphic method
Under this method the individual values of the two variables are plotted on graph paper, thus obtaining two curves, one for the X variable and another for the Y variable. By examining the direction and closeness of the two curves so drawn, we can infer whether or not the variables are related. If both curves move in the same direction, correlation is said to be positive. On the other hand, if the curves move in opposite directions, correlation is said to be negative.
Year     X      Y
1979    100    90
1980    102    91
1981    105    93
1982    105    95
1983    101    92
1984    112    94
3. Karl Pearson's coefficient of correlation
The Pearson coefficient of correlation is denoted by the symbol r. The formulae for computing the Pearsonian r are:
(i) When deviations of the items are taken from the actual mean:
r = Σxy / (N δx δy)
Where,
x = (X − X̅) and y = (Y − Y̅),
δx = standard deviation of series X,
δy = standard deviation of series Y,
N = number of pairs of observations,
r = the correlation coefficient.
The value of r lies between +1 and -1, i.e., it cannot be greater than +1 or less than -1. If r = +1 the correlation is perfect and positive; if r = -1 the correlation is perfect and negative; if r = 0 there is no correlation, i.e., the variables are independent. The above formula for computing the Pearson coefficient of correlation can be transformed to the following form, which is easier to apply:
r = Σxy / √(Σx² Σy²)
where x = (X − X̅) and y = (Y − Y̅).
Example:

X      Y     x = X−X̅    x²     y = Y−Ȳ    y²     xy
48    45       14       196      10       100    140
35    20        1         1     -15       225    -15
17    40      -17       289       5        25    -85
23    25      -11       121     -10       100    110
47    45       13       169      10       100    130
170   175       0       776       0       550    280

r = Σxy / √(Σx² Σy²), where x = (X − X̅) and y = (Y − Ȳ)
X̅ = ΣX/N = 170/5 = 34;  Ȳ = ΣY/N = 175/5 = 35
r = 280 / √(776 × 550)
  = 280 / 653.29
  = 0.429
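The worked example above can be checked with a short sketch of the deviation-from-actual-mean formula r = Σxy / (N δx δy). This is an illustrative helper (the function name is my own), using only the standard library:

```python
from math import sqrt

def pearson_r_mean_deviation(X, Y):
    """Pearson's r via deviations from the actual means: r = Σxy / (N·δx·δy)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]                    # deviations x = X − X̅
    y = [v - my for v in Y]                    # deviations y = Y − Ȳ
    sxy = sum(a * b for a, b in zip(x, y))     # Σxy
    dx = sqrt(sum(a * a for a in x) / n)       # δx (population s.d.)
    dy = sqrt(sum(b * b for b in y) / n)       # δy
    return sxy / (n * dx * dy)

X = [48, 35, 17, 23, 47]
Y = [45, 20, 40, 25, 45]
print(round(pearson_r_mean_deviation(X, Y), 3))   # 0.429, as in the example
```

Note that the data use X = 47 in the fifth row, the value consistent with ΣX = 170 and the deviation column.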
(ii) Direct method of finding out correlation:

r = [N ΣXY − (ΣX)(ΣY)] / [√(N ΣX² − (ΣX)²) × √(N ΣY² − (ΣY)²)]

X     Y     X²     Y²     XY
9    15     81    225    135
8    16     64    256    128
7    14     49    196     98
6    13     36    169     78
5    11     25    121     55
4    12     16    144     48
3    10      9    100     30
2     8      4     64     16
1     9      1     81      9
45   108    285   1356    597

r = (9 × 597 − 45 × 108) / [√(9 × 285 − 45²) × √(9 × 1356 − 108²)]
  = (5373 − 4860) / (√540 × √540)
  = 513 / 540
r = 0.95
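The direct-method formula works on the raw values without computing any means. A minimal sketch (function name is my own) reproducing the example above:

```python
from math import sqrt

def pearson_r_direct(X, Y):
    """Direct method: r = [NΣXY − ΣXΣY] / [√(NΣX² − (ΣX)²)·√(NΣY² − (ΣY)²)]."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    syy = sum(v * v for v in Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

X = [9, 8, 7, 6, 5, 4, 3, 2, 1]
Y = [15, 16, 14, 13, 11, 12, 10, 8, 9]
print(round(pearson_r_direct(X, Y), 2))   # 0.95 = 513/540
```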
(iii) When deviations are taken from an assumed mean:

r = [N Σdxdy − (Σdx)(Σdy)] / [√(N Σdx² − (Σdx)²) × √(N Σdy² − (Σdy)²)]

Example :
Calculate the Karl Pearson coefficient of correlation between the values of X and Y for the following data:
X : 78 89 96 69 59 79 68 61
Y : 125 137 156 112 107 136 123 108
Taking assumed means A = 69 for X and B = 112 for Y:
r = (16928 − 5076) / (√(11800 − 2209) × √(27744 − 11664))
  = 11852 / (√9591 × √16080)
  = 11852 / (97.93 × 126.80)
  = 11852 / 12417.78
r = 0.954
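The assumed-mean computation can be sketched as follows. The assumed means A = 69 and B = 112 are my reconstruction (they reproduce the printed intermediate sums 16928, 5076, 11800 and 27744), and the second Y value is taken as 137, the value consistent with those sums. A useful property shown at the end: the result does not depend on the choice of A and B.

```python
from math import sqrt

def pearson_r_assumed_mean(X, Y, A, B):
    """r from deviations dx = X − A, dy = Y − B taken from assumed means A, B."""
    n = len(X)
    dx = [v - A for v in X]
    dy = [v - B for v in Y]
    num = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    den = (sqrt(n * sum(a * a for a in dx) - sum(dx) ** 2)
           * sqrt(n * sum(b * b for b in dy) - sum(dy) ** 2))
    return num / den

X = [78, 89, 96, 69, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]
print(round(pearson_r_assumed_mean(X, Y, 69, 112), 3))   # 0.954
# Any other assumed means give the same r:
print(round(pearson_r_assumed_mean(X, Y, 0, 0), 3))      # 0.954
```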
Assumptions of Pearson's coefficient
1) There is a linear relationship between the two variables.
2) The two variables under study are affected by a large number of causes so as to form a normal distribution.
3) There is a cause and effect relationship between the forces affecting the distribution of the items in the two series.
Merits
1) It is the most popular method of measuring correlation.
2) It summarizes in one figure not only the degree of correlation but also its direction, i.e., whether the correlation is positive or negative.
Limitations
1) The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is correct or not.
2) This method takes more time to compute the value of the correlation coefficient than other methods.
Properties of coefficient of correlation :
1. The coefficient of correlation lies between -1 and +1; symbolically, -1 ≤ r ≤ +1 or |r| ≤ 1.
2. The coefficient of correlation is independent of the change of scale and origin of the
variables X and Y.
By change of origin we mean subtracting some constant from every given value of X and Y, and by change of scale we mean dividing or multiplying every value of X and Y by some constant.
3. The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically,
r = √(bxy × byx)
4. The degree of relationship between the two variables is symmetric, as shown below:
rxy = ryx
rxy = Σxy / (N δx δy) = Σyx / (N δy δx) = ryx
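Properties 2 and 3 can be verified numerically. The sketch below (helper names are my own) checks on the earlier example data that r is unchanged by shifting and rescaling the variables, and that r equals the geometric mean of the two regression coefficients:

```python
from math import sqrt

def r_and_slopes(X, Y):
    """Return (r, bxy, byx) computed from deviations about the means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]
    y = [v - my for v in Y]
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    r = sxy / sqrt(sxx * syy)
    byx, bxy = sxy / sxx, sxy / syy      # the two regression coefficients
    return r, bxy, byx

X = [48, 35, 17, 23, 47]
Y = [45, 20, 40, 25, 45]
r, bxy, byx = r_and_slopes(X, Y)
# Property 2: change of origin and scale leaves r unchanged
r2, _, _ = r_and_slopes([(v - 10) / 2 for v in X], [(v - 5) * 3 for v in Y])
# Property 3: r is the geometric mean of bxy and byx
print(abs(r - r2) < 1e-9, abs(r - sqrt(bxy * byx)) < 1e-9)   # True True
```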
Spearman’s coefficient of rank correlation :
R = 1 − (6 ΣD²) / (N (N² − 1))
where R = rank correlation coefficient, D = difference between the two ranks of each pair, and N = number of pairs of observations.
The value of this coefficient, interpreted in the same way as Karl Pearson’s correlation coefficient, ranges between +1 and -1. When R is +1 there is complete agreement in the order of the ranks and they are in the same direction; when R is -1 there is complete agreement in the order of the ranks but they are in opposite directions.
Example :

R1    R2    D = R1 − R2    D²
1     3         -2          4
2     2          0          0
3     1          2          4
                      ΣD² = 8

R = 1 − (6 ΣD²) / (N (N² − 1))
  = 1 − (6 × 8) / (3 × (3² − 1))
  = 1 − 48/24
R = -1
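The rank-correlation formula is short enough to express directly. A minimal sketch (function name is my own) reproducing the example of completely reversed ranks:

```python
def spearman_R(R1, R2):
    """Spearman's rank correlation: R = 1 − 6ΣD² / (N(N² − 1))."""
    n = len(R1)
    d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))   # ΣD²
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

print(spearman_R([1, 2, 3], [3, 2, 1]))   # -1.0: complete agreement, opposite order
```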
In rank correlation we may have two types of problems:
1) Where actual ranks are given :
When the actual ranks are given to us, the steps required for computing rank correlation are:
(i) Take the difference of the two ranks, i.e., (R1 − R2), and denote these differences by D.
(ii) Square these differences and obtain the total ΣD².
(iii) Apply the formula R = 1 − (6 ΣD²) / (N (N² − 1)).
2) Where ranks are not given :
When we are given the actual data and not the ranks, it is necessary to assign the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1; but whether we start with the lowest value or the highest value, we must follow the same method for both the variables.
Example:
The following data of a company are given below. Using the rank correlation method, determine the relationship between X and Y.

No.    X      R1     Y      R2    D = R1 − R2    D²
1     97.8     3    73.2     1         2          4
2     99.2     7    85.8     6         1          1
3     98.8     6    78.9     4         2          4
4     98.3     4    75.8     2         2          4
5     98.4     5    77.2     3         2          4
6     96.7     1    87.2     7        -6         36
7     97.1     2    83.8     5        -3          9
                                  ΣD² = 62

R = 1 − (6 ΣD²) / (N³ − N)
  = 1 − (6 × 62) / (7³ − 7)
  = 1 − 372/336
  = -36/336
R = -0.107
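When ranks are not given, they must first be assigned from the values. The sketch below (helper names are my own) ranks both series with the lowest value as 1, as in the table above, and reproduces R = -0.107. The simple ranking helper assumes there are no ties in the data.

```python
def rank_lowest_first(values):
    """Assign rank 1 to the smallest value (assumes no tied values)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_R(R1, R2):
    n = len(R1)
    d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

X = [97.8, 99.2, 98.8, 98.3, 98.4, 96.7, 97.1]
Y = [73.2, 85.8, 78.9, 75.8, 77.2, 87.2, 83.8]
print(round(spearman_R(rank_lowest_first(X), rank_lowest_first(Y)), 3))   # -0.107
```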
Equal ranks
If two or more items have equal values, they are assigned the average of the ranks they would otherwise have received. An adjustment is then required for each group of equal ranks: for every group of m tied items, the value m(m² − 1)/12 is added to ΣD². If there is one such group of items with common ranks, this value is added once; if there are two such groups, it is added twice; and so on.
X     Rx     Y     Ry    D = Rx − Ry     D²
80     8    12      1         7          49
78     7    13      2         5          25
75    5.5   14      4        1.5        2.25
75    5.5   14      4        1.5        2.25
68     4    14      4         0           0
67     3    16      7        -4          16
60     2    15      6        -4          16
59     1    17      8        -7          49
                                  ΣD² = 159.5
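The table above stops at ΣD² = 159.5; the final coefficient follows by adding the tie correction m(m² − 1)/12 for each tied group (one group of 2 in X, one group of 3 in Y) and applying the formula. A sketch with average ranks and the correction (helper names are my own):

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the smallest value; tied values share the average rank."""
    s = sorted(values)
    # first rank of v is s.index(v)+1, last is s.index(v)+s.count(v); take the mean
    return [(s.index(v) + 1 + s.index(v) + s.count(v)) / 2 for v in values]

def tie_correction(values):
    """Add m(m² − 1)/12 for every group of m tied values."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(X, Y):
    n = len(X)
    Rx, Ry = average_ranks(X), average_ranks(Y)
    d2 = sum((a - b) ** 2 for a, b in zip(Rx, Ry))      # 159.5 for this data
    d2 += tie_correction(X) + tie_correction(Y)         # + 0.5 + 2 = 162
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

X = [80, 78, 75, 75, 68, 67, 60, 59]
Y = [12, 13, 14, 14, 14, 16, 15, 17]
print(round(spearman_with_ties(X, Y), 3))   # -0.929
```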
Merits
(1) This method is simpler to understand and easier to apply than Karl Pearson's method.
(2) When the data are of a qualitative nature, like honesty or efficiency, this method can be used with great advantage.
(3) This is the only method that can be used where we are given the ranks and not the actual data.
(4) Even where actual data are given, the rank method can be applied for ascertaining correlation.
Limitations
1) This method cannot be used for finding out correlation in a grouped frequency distribution.
2) This method should not be applied where N exceeds 30 unless we are given the ranks and not the actual values of the variables.
Regression analysis
Introduction
Regression analysis reveals the average relationship between two variables, and this makes estimation or prediction possible. The two-variable regression model assigns one of the variables the status of an independent variable and the other the status of a dependent variable.
Regression equation of Y on X:
Y − Y̅ = r (δy/δx) (X − X̅)
X     Y     x     x²     y     y²     xy
6     9     0      0     1      1      0
2    11    -4     16     3      9    -12
10    5     4     16    -3      9    -12
4     8    -2      4     0      0      0
8     7     2      4    -1      1     -2
30   40     0     40     0     20    -26
X̅ = ΣX/N = 30/5 = 6;  Y̅ = ΣY/N = 40/5 = 8
byx = Σxy / Σx² = -26/40 = -0.65
Hence, the regression equation of Y on X is:
Y − 8 = -0.65 (X − 6)
Y − 8 = -0.65X + 3.9
Y = 11.9 − 0.65X
The regression equation of X on Y, X − X̅ = r (δx/δy) (Y − Y̅), follows similarly with bxy = Σxy / Σy² = -26/20 = -1.3, giving X − 6 = -1.3 (Y − 8), i.e., X = 16.4 − 1.3Y.
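The regression line of Y on X computed above can be checked with a short sketch (function name is my own) that returns the slope byx = Σxy/Σx² and the intercept Y̅ − byx·X̅:

```python
def regression_Y_on_X(X, Y):
    """Fit Y = a + byx·X via byx = Σxy/Σx² with deviations about the means."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]
    y = [v - my for v in Y]
    byx = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)   # slope
    return byx, my - byx * mx                                        # slope, intercept

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
byx, a = regression_Y_on_X(X, Y)
print(round(byx, 2), round(a, 2))   # -0.65 11.9, i.e. Y = 11.9 − 0.65X
```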
Deviations taken from assumed means :
Regression equation of X on Y:
X − X̅ = bxy (Y − Y̅), where bxy = r δx/δy = [Σdxdy − (Σdx)(Σdy)/N] / [Σdy² − (Σdy)²/N]
Regression equation of Y on X:
Y − Y̅ = byx (X − X̅), where byx = r δy/δx = [Σdxdy − (Σdx)(Σdy)/N] / [Σdx² − (Σdx)²/N]
Example: From the following data calculate the regression equations, taking deviations from the assumed means A = 6 for X and A = 7 for Y:

X     Y    dx = X−6    dx²    dy = Y−7    dy²    dxdy
6     9       0          0       2          4       0
2    11      -4         16       4         16     -16
10    5       4         16      -2          4      -8
4     8      -2          4       1          1      -2
8     7       2          4       0          0       0
30   40       0         40       5         25     -26

X̅ = ΣX/N = 30/5 = 6;  Y̅ = ΣY/N = 40/5 = 8
Regression equation of Y on X:
Y − Y̅ = byx (X − X̅)
byx = [Σdxdy − (Σdx)(Σdy)/N] / [Σdx² − (Σdx)²/N]
    = [-26 − (0)(5)/5] / [40 − 0²/5]
    = -26/40
    = -0.65
Regression equation of X on Y:
X − X̅ = bxy (Y − Y̅)
bxy = [Σdxdy − (Σdx)(Σdy)/N] / [Σdy² − (Σdy)²/N]
    = [-26 − (0)(5)/5] / [25 − 5²/5]
    = -26/20
    = -1.3
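Both assumed-mean regression coefficients above can be reproduced in one sketch (function name is my own), which also shows they have the same numerator and differ only in the denominator:

```python
def regression_coeffs_assumed_mean(X, Y, A, B):
    """Return (byx, bxy) from deviations dx = X − A, dy = Y − B."""
    n = len(X)
    dx = [v - A for v in X]
    dy = [v - B for v in Y]
    num = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n   # shared numerator
    byx = num / (sum(a * a for a in dx) - sum(dx) ** 2 / n)
    bxy = num / (sum(b * b for b in dy) - sum(dy) ** 2 / n)
    return byx, bxy

byx, bxy = regression_coeffs_assumed_mean([6, 2, 10, 4, 8], [9, 11, 5, 8, 7], 6, 7)
print(round(byx, 2), round(bxy, 2))   # -0.65 -1.3, as computed above
```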
Difference Between Correlation And Regression
1. Correlation merely measures the degree of relationship between two variables; it does not establish a cause and effect relationship. Regression analysis studies the cause and effect relationship between the variables.
2. rxy is a measure of the direction and degree of the linear relationship between two variables X and Y; rxy and ryx are symmetric (rxy = ryx), i.e., it is immaterial which of X and Y is the dependent variable and which is the independent variable. In regression analysis the regression coefficients bxy and byx are not symmetric (bxy ≠ byx), and hence it definitely makes a difference which variable is taken as dependent and which as independent.
3. There may be nonsense correlation between two variables that is purely due to chance and has no practical relevance. There is nothing like nonsense regression.
4. The coefficient of correlation (r) takes the same sign as the regression coefficients (bxy and byx).
• The following points should be noted about the regression coefficients :
1. Both the regression coefficients will have the same sign. i.e., either they will be positive or
negative. It is never possible that one of the regression coefficients is negative and other
positive.
2. Since the value of the coefficient of correlation (r) cannot exceed one, if one of the regression coefficients is greater than one the other must be less than one; in other words, both regression coefficients cannot be greater than one. For example, if bxy = 1.2 and byx = 1.4, the value of the correlation coefficient would be √(1.2 × 1.4) = 1.296, which is not possible.
3. The coefficient of correlation will have the same sign as the regression coefficients, i.e., if the regression coefficients have a negative sign, r will also be negative, and if the regression coefficients have a positive sign, r will also be positive. For example, if bxy = -0.8 and byx = -1.2, then r = -√(0.8 × 1.2) = -0.98.
4. If the value of r, one of the standard deviations and one regression coefficient are known, the other standard deviation can be found. For example, if we know that r = 0.6, δx = 0.4 and bxy = 0.8, we can find δy from bxy = r δx/δy, so δy = r δx / bxy = (0.6 × 0.4)/0.8 = 0.3.
5. Regression coefficients are independent of change of origin but not of change of scale (change of origin means subtracting some constant from every value of X and Y; change of scale means dividing or multiplying every value of X and Y by some constant).
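Point 5 can be demonstrated numerically: shifting the origin of X or Y leaves byx unchanged, while multiplying X by a constant k divides byx by k (since Σxy scales by k but Σx² scales by k²). A sketch using the earlier example data (function name is my own):

```python
def byx(X, Y):
    """Regression coefficient of Y on X: byx = Σxy / Σx²."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]
    y = [v - my for v in Y]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
b0 = byx(X, Y)
b_origin = byx([v + 100 for v in X], [v - 50 for v in Y])   # change of origin
b_scale = byx([v * 2 for v in X], Y)                        # change of scale (k = 2)
print(abs(b0 - b_origin) < 1e-9, abs(b_scale - b0 / 2) < 1e-9)   # True True
```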