Data Analysis
LINEAR CORRELATION ANALYSIS
INTRODUCTION
With the measures of central tendency, dispersion and skewness, we study problems relating to one variable. But in the real world we have problems pertaining to two or more than two variables. We know there is correlation between price and quantity demanded, price and quantity supplied, ages of husband and wife, income and consumption etc. The extent of relationship between any two variables can be measured with the help of correlation. The measure of correlation is called correlation coefficient. It gives one index or figure which shows the degree and direction of correlation. Thus the coefficient of correlation helps us in determining the closeness of the relationship between two or more than two variables.
Thus whenever two variables are so related that a change in the value of one is accompanied by a change in the value of the other, in such a way that (a) an increase or decrease in one variable is accompanied by an increase or decrease in the other or (b) a fall in one variable is accompanied by a fall or rise in the other; then the variables are said to be correlated. The correlation coefficient varies between -1 and +1; if the two regression lines are identical (one straight line), it equals +1 or -1.
1.1 Definition :
"Correlation is an analysis of the co-variation between two or more variables."
-A.M. Tuttle
"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation." -Croxton and Cowden
"The effect of correlation is to reduce the range of uncertainty of one's prediction." -Tippett
"If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the other, then they are said to be correlated."
-L.R. Connor
"Correlation analysis attempts to determine the degree of relationship between variables."
-Ya Lun Chou
Thus, we can say that correlation is a statistical technique which shows the degree and direction of relationship between the two variables.
USES OR SIGNIFICANCE OF CORRELATION ANALYSIS
The use or utility of the study of correlation is clear from the following points:
(1) The correlation coefficient helps us in measuring the extent of relationship between two or more than two variables. The degree and extent of the relationship between two variables is, of course, one of the most important problems in statistics.
(2) It is through correlation that we can predict about the future. For instance, if there are good monsoons, we can expect better food supply and hence can expect a fall in the price of foodgrains and other products.
(3) If the value of a variable is given, we can know the value of another variable. It is, of course, done with the help of regression analysis.
(4) Correlation contributes to the understanding of economic behaviour. It helps us in knowing the important variables on which others depend.
Price  : 50 60 70 80 90
Supply : 100 120 140 160 180
This numerical illustration shows that as price increases the supply also increases and vice versa.
Price  : 50 60 70 80 90
Demand : 180 160 140 120 100
This table shows that with the increase in price, the demand falls and vice versa.
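The direction of the relationship in the price and demand table can be checked by comparing the signs of successive changes. The following is only a minimal sketch in Python; the variable names are chosen for this illustration and do not come from the text.

```python
# Price and demand figures from the table above.
price = [50, 60, 70, 80, 90]
demand = [180, 160, 140, 120, 100]

# Change from each observation to the next one.
price_changes = [b - a for a, b in zip(price, price[1:])]
demand_changes = [b - a for a, b in zip(demand, demand[1:])]

# The variables move in opposite directions when every pair of
# changes has opposite signs (their product is negative).
opposite_direction = all(p * d < 0 for p, d in zip(price_changes, demand_changes))

print(price_changes)       # [10, 10, 10, 10]
print(demand_changes)      # [-20, -20, -20, -20]
print(opposite_direction)  # True
```

Every rise of 10 in price goes with a fall of 20 in demand, so the two series move in opposite directions throughout.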
4.2. Simple and Multiple Correlation
(a) Simple Correlation. When there are only two variables and the relationship is studied
between those two variables, it is a case of simple correlation. Relationships between height and
weight, price and demand or income and consumption etc. are examples of simple correlation.
(b) Multiple Correlation. When there are more than two variables and we study the
relationship between one variable and all the other variables taken together then it is a case of multiple
correlation. Suppose there are three variables A, B and C (numbered 1, 2 and 3); we can study the multiple correlation between A and B & C taken together, or between B and A & C taken together, etc. It can be denoted as R1.23 or R2.13 or R3.12.
4.3. Partial and Total Correlation.
(a) Partial Correlation. When there are more than two variables and the relationship between any two of the variables is studied assuming other variables as constant, it is a case of partial correlation. This, in fact, is an extension of multiple correlation. Suppose we study the relationship between rainfall and crop, without taking into consideration the effects of other inputs like fertilizers, seeds and pesticides etc., this technique will be known as partial correlation. Symbolically, if x, y, z are the three variables, then the partial correlation between x and y excluding z will be given by r_xy.z; similarly r_xz.y or r_yz.x.
(b) Total Correlation. When the correlation between the variables under study, taken together at a time, is worked out, it is called total correlation.
The main point worth consideration is that this line will indicate positive relation if 'a' is positive, and in case 'a' is negative the correlation will also be negative. In such type of correlation the coefficient of correlation is always +1 or -1 depending on the sign of 'a' in the equation y = ax + b. Correlation will be +1 if 'a' is +ve and -1 if 'a' is -ve.
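This claim can be verified in a few lines. The sketch below assumes that every pair of values lies exactly on the line $y = ax + b$ with $a \neq 0$, and uses the product-moment form of the coefficient; the subscripted notation $x_i$, $\bar{x}$ is introduced here only for the illustration.

```latex
\begin{aligned}
y_i - \bar{y} &= a\,(x_i - \bar{x}) \\
\sum (x_i - \bar{x})(y_i - \bar{y}) &= a \sum (x_i - \bar{x})^2 \\
\sum (y_i - \bar{y})^2 &= a^2 \sum (x_i - \bar{x})^2 \\
r &= \frac{a \sum (x_i - \bar{x})^2}{\sqrt{\sum (x_i - \bar{x})^2 \cdot a^2 \sum (x_i - \bar{x})^2}} = \frac{a}{|a|}
\end{aligned}
```

Hence r = +1 when 'a' is positive and r = -1 when 'a' is negative, whatever the value of 'b'.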
4.4. Linear and Non-Linear Correlation
(a) Linear Correlation. The correlation between two variables will be linear if corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of the values. For instance, we consider the following data :
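$$\sigma_x = \sqrt{\frac{\sum x^2}{N}}$$
Similarly
$$\sigma_y = \sqrt{\frac{\sum y^2}{N}}$$
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum xy / N}{\sqrt{\sum x^2 / N}\,\sqrt{\sum y^2 / N}}$$
Thus
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
This formula is simple to apply as it does not require calculations of standard deviations of the two series.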
Steps.
1. Calculate the mean of X and Y series (i.e., calculate $\bar{X}$ and $\bar{Y}$).
2. Calculate the deviations of X and Y from their respective means [i.e., $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$].
3. Square these deviations in X and Y individually in the two series. Find the sum of the squares of the deviations [i.e., $\sum x^2 = \sum (X - \bar{X})^2$ and $\sum y^2 = \sum (Y - \bar{Y})^2$].
4. Multiply the single deviation in X series with its corresponding single deviation in Y series and find the sum of it [i.e., $\sum xy = \sum (X - \bar{X})(Y - \bar{Y})$].
5. Put the values in the formula
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
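The five steps can be sketched in Python as follows. This is an illustrative sketch only: the function name `pearson_r` and the small data set are assumptions made for the example and are not taken from the text.

```python
from math import sqrt

def pearson_r(X, Y):
    """Coefficient of correlation from deviations about the actual means."""
    n = len(X)
    # Step 1: means of the two series.
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    # Step 2: deviations from the respective means.
    x = [v - mean_x for v in X]
    y = [v - mean_y for v in Y]
    # Step 3: sums of the squared deviations.
    sum_x2 = sum(d * d for d in x)
    sum_y2 = sum(d * d for d in y)
    # Step 4: sum of the products of corresponding deviations.
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Step 5: substitute in the formula.
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Hypothetical data, used only to exercise the function.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
print(round(pearson_r(X, Y), 4))  # 0.7746
```

For this data the sum of products is 6 and the sums of squares are 10 and 6, so r = 6 / sqrt(60).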
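EXAMPLE 1. Calculate the co-efficient of correlation from the following data :
X : 12   9   8  10  11  13   7
Y : 14   8   6   9  11  12   3
SOLUTION. Calculation of co-efficient of correlation
X      Y      x = X - $\bar{X}$   y = Y - $\bar{Y}$   x²     y²     xy
12     14      2                   5                   4      25     10
9      8      -1                  -1                   1      1      1
8      6      -2                  -3                   4      9      6
10     9       0                   0                   0      0      0
11     11      1                   2                   1      4      2
13     12      3                   3                   9      9      9
7      3      -3                  -6                   9      36     18
ΣX = 70   ΣY = 63                                      Σx² = 28   Σy² = 84   Σxy = 46
$$\bar{X} = \frac{\sum X}{N} = \frac{70}{7} = 10, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{63}{7} = 9$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{46}{\sqrt{28 \times 84}} = \frac{46}{48.497}$$
r = 0.9485
Aliter
$$\sigma_x = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2$$
$$\sigma_y = \sqrt{\frac{\sum y^2}{N}} = \sqrt{\frac{84}{7}} = \sqrt{12} = 3.464$$
$$r = \frac{\sum xy}{N \sigma_x \sigma_y} = \frac{46}{7 \times 2 \times 3.464} = \frac{46}{48.497} = 0.9485$$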
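The arithmetic of both routes can be checked from the totals stated in the solution (N = 7, Σxy = 46, Σx² = 28, Σy² = 84). The short Python sketch below uses only those totals; the variable names are our own.

```python
from math import sqrt

# Totals taken from the worked solution above.
n, sum_xy, sum_x2, sum_y2 = 7, 46, 28, 84

# Route 1: r = Σxy / sqrt(Σx² · Σy²)
r_direct = sum_xy / sqrt(sum_x2 * sum_y2)

# Route 2 (Aliter): r = Σxy / (N · σx · σy)
sigma_x = sqrt(sum_x2 / n)
sigma_y = sqrt(sum_y2 / n)
r_aliter = sum_xy / (n * sigma_x * sigma_y)

print(round(r_direct, 4), round(r_aliter, 4))  # 0.9485 0.9485
```

Both routes agree because the factor N in the two standard deviations cancels against the N in the denominator.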
6.3.1.1 Limitations of the Method.
This method has a lengthy process because the true means of both the series are to be calculated first and then the deviations are taken. The values of the standard deviations are also to be known; only then can the final formula be used. Easier than this method is the product moment method. The product moment method does not involve the calculation of standard deviations of the two series separately.
EXAMPLE 2. From the following data compute the co-efficient of correlation between X and Y :
                                  X series   Y series
No. of items                      15         15
Arithmetic Mean                   25         18
Square of deviations from mean    136        138
Correlation and Regression Analysis
Example: Production expenses and sales, height & weight, water consumption and temperature, study time and grades etc.
2. Negative Correlation
Scatter Diagram
Scatter Diagram is a graph of observed plotted points where each point represents the values of X & Y as a coordinate. It portrays the relationship between these two variables graphically.
SL. No.   Maths   Statistics
1.        55      60
2.        70      65
3.        35      50
4.        40      60
5.        65      75
6.        40      70
8.        20      40
9.        30      60
10.       50      30
11.       10      30
12.       20      10
[Figure: Scatter Diagram with Maths on the horizontal axis, showing a low degree of positive correlation]
Advantages of Scatter Diagram
1. It is a very simple and non-mathematical method.
2. It is not influenced by the size of extreme items.
3. It is the first step in investigating the relationship between two variables.
Disadvantage of Scatter Diagram
1. It cannot give the exact degree of correlation.
Karl Pearson's Co-efficient of Correlation
Karl Pearson's Coefficient of Correlation is denoted by r (-1 ≤ r ≤ +1). The coefficient of correlation r measures the degree of linear relationship between two variables, say x & y.
Degree of correlation is expressed by the value of the coefficient. Direction of change is indicated by the sign (-ve) or (+ve).
The Coefficient of Determination
Suppose r = 0.9; then r² = 0.81. This would mean that 81% of the variation in the dependent variable has been explained by the independent variable.
The maximum value of r² is 1 because it is possible to explain all of the variation in y but it is not possible to explain more than all of it.
Coefficient of Determination = Explained variation / Total variation.
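The calculation in the illustration can be written out as a tiny Python sketch. The function name `coefficient_of_determination` is a label chosen here for the example.

```python
def coefficient_of_determination(r):
    """Square of the coefficient of correlation."""
    return r * r

r = 0.9
r2 = coefficient_of_determination(r)
print(f"{r2:.2f}")  # 0.81
print(f"{r2:.0%}")  # 81%

# The sign of r is lost on squaring: r = -0.9 explains the same share.
print(f"{coefficient_of_determination(-0.9):.2f}")  # 0.81
```

Because r lies between -1 and +1, the squared value always lies between 0 and 1.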
Merits of Karl Pearson's Co-efficient of Correlation
1. This method indicates the presence or absence of correlation between two variables and gives the exact degree of their correlation.
2. It ascertains the direction of the correlation, positive or negative.
3. This method has many algebraic properties for which the calculation of co-efficient of correlation and other related factors are made easy.
Demerits of Karl Pearson's Co-efficient of Correlation
1. It is more difficult to calculate than other methods of calculations.
2. It is much affected by the values of the extreme items.
3. It is very much likely to be misinterpreted in case of homogeneous data.
Procedure for Computing the Correlation Co-efficient
1. Calculate the mean of the two series X & Y.
2. Calculate the deviations x & y in the two series from their respective means.
3. Square each deviation of x & y, then obtain the sum of the squared deviations, i.e., $\sum x^2$ & $\sum y^2$.
4. Multiply each deviation under x with the corresponding deviation under y & obtain the product xy.
5. Then obtain the sum of the products of x, y, i.e., $\sum xy$.
6. Substitute the values in the formula.
Direct Method
Type 1: This method is used when given variables are small in magnitude.
Type 2: It is direct formula to find r. This formula can effectively be used where $\bar{X}$ and $\bar{Y}$ are not in fractions. The formula is
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
Negative correlation
Simple and multiple correlation
Linear and non-linear correlation
1. Positive Correlation
When two variables are moving in the same direction, it is called positive correlation, i.e., an increase in the value of one variable leads to an increase in the value of another variable, or a decrease in the value of one variable leads to a decrease in the value of another. For example, height and weight, rainfall and food production, etc.
2. Negative Correlation
When two variables are moving in the opposite direction, i.e., an increase in one variable leads to a decrease in another variable and vice versa, it is called negative correlation. For example, prices and demand; yield of crops and prices.
3. Simple Correlation
When the relationship of only two variables is studied, the relationship is called simple correlation. When the relationship of more than two variables is studied at a time, we call it multiple correlations.
4. Partial Correlation
Partial correlation is a method used to describe the relationship between two variables whilst taking away the effects of another variable, or several other variables, on this relationship. For the calculation of the partial correlation, the three correlations between the individual variables are required. Here r_xy is the correlation between variable x and y. The partial correlation r_xy.z tells how strongly the variable x correlates with the variable y if the correlation of both variables with the variable z is calculated out.
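How the three pairwise correlations combine can be sketched in Python. The formula used below is the standard first-order partial correlation formula; it is not legible in the text above, so it is supplied here as an assumption, and the function name and the input values are purely illustrative.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations, for illustration only.
r = partial_corr(r_xy=0.8, r_xz=0.6, r_yz=0.5)
print(round(r, 4))  # 0.7217

# If z is uncorrelated with both x and y, nothing is removed.
print(partial_corr(0.8, 0.0, 0.0))  # 0.8
```

In the first case part of the observed correlation of 0.8 is attributable to z, so the partial coefficient is smaller than r_xy.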
5. Multiple Correlations
When the relationship of more than two variables is studied at a time, we call it multiple correlations.
6. Linear and Non-Linear Correlation
If the ratio of change between two variables is uniform, then there will be linear correlation between them. For example:
X: 10 15 20 25 30
Y: 2 4 6 8 10
It is clear from the above example that the ratio of change in X variable and Y variable is uniform. When we plot these on the graph we will get a straight line.
The relationship between two variables is said to be non-linear or curvilinear, when the amount of change in one variable does not bear a constant ratio of the amount of change in the other variable. For example:
X: 5 10 15 20 25
Y: 1 4 6 7 9
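The test for linearity described above, a constant ratio between the changes in the two variables, can be sketched in Python. The helper names `change_ratios` and `is_linear` and the small linear data set are assumptions made for this illustration; the curvilinear figures are the ones in the second example.

```python
def change_ratios(xs, ys):
    """Ratio of the change in Y to the change in X between successive pairs."""
    pairs = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]

def is_linear(xs, ys, tol=1e-9):
    """True when every successive ratio of change is the same."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)

# Figures from the curvilinear example.
x_curved = [5, 10, 15, 20, 25]
y_curved = [1, 4, 6, 7, 9]
print(change_ratios(x_curved, y_curved))  # [0.6, 0.4, 0.2, 0.4]
print(is_linear(x_curved, y_curved))      # False

# Hypothetical data lying on a straight line (Y = 2X + 1).
print(is_linear([1, 2, 3, 4], [3, 5, 7, 9]))  # True
```

The ratios for the second example change from one interval to the next, which is why its graph is not a straight line.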
Methods of Correlation
In order to know the existence of correlation between the variables, the following methods are helpful. They are:
1. Scatter diagram
2. Karl Pearson's coefficient of correlation
3. Coefficient of correlation and probable error
4. Rank Correlation Coefficient
5. Regression line
CORRELATION
[Figure: scatter diagram panel titled "No correlation"]
Merits and Limitations of Scatter Diagram
1. It is the simplest graphical method of studying the existence of correlation between two variables without mathematical calculation.
2. The conclusion drawn under this method is based on the position of the majority of the dots. So, this method is not affected by extreme observations.
3. This method is time saving as compared to other methods.
Limitation: This method fails to give the exact numerical value of the correlation between two variables.
2. Karl Pearson's Coefficient of Correlation
Karl Pearson is well known in the field of biometrics and statistics. He has suggested the mathematical measurement of correlation between two variables. His method of calculating correlation is popularly called Karl Pearson's coefficient of correlation and is denoted by 'r'. The following formula is used to calculate the coefficient of correlation.
$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$
Where,
r = Karl Pearson's coefficient of correlation
$\sum xy$ = Summation of the product of x, i.e. $(X - \bar{X})$, and y, i.e. $(Y - \bar{Y})$
N = No. of pairs of observations
$\sigma_x$ = Standard deviation of x series
$\sigma_y$ = Standard deviation of y series
The value of the coefficient of correlation shall always lie between +1 and -1. When r = +1, there is a perfect positive correlation; when r = -1, there is a perfect negative correlation. When r = 0, there is no linear correlation between the two variables.
The above said formula for calculation of the coefficient of correlation is tedious. In order to make the calculation easy, the following methods are suggested to calculate the coefficient of correlation.
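a) Direct method
1. Square the given x and y values i.e., x² and y².
2. Sum up the squared x and y values i.e., $\sum x^2$ and $\sum y^2$.
BUSINESS STATISTICS
$$r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}}$$
Note: This method is used when the items are in small number in both the series.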
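A minimal Python sketch of the direct method follows. It works on the original values without taking deviations and uses the usual raw-sum form of the coefficient, written here with every term multiplied through by N (an equivalent arrangement). The function name `pearson_r_direct` and the data are assumptions made for the illustration.

```python
from math import sqrt

def pearson_r_direct(xs, ys):
    """Coefficient of correlation from the raw sums, without deviations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical small series, as the note suggests the method suits few items.
xs = [2, 4, 6, 8]
ys = [1, 3, 2, 5]
print(round(pearson_r_direct(xs, ys), 4))  # 0.8315
```

Here Σx = 20, Σy = 11, Σxy = 66, Σx² = 120 and Σy² = 39, so r = 44 / sqrt(80 × 35).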
b) When the deviations are taken from arithmetic mean:
1. Find out the arithmetic mean of x and y series.
2. Take the deviations of the observations of x and y series from their means and denote them dx and dy.
3. Multiply dx by dy, sum up the products and denote the total $\sum dx\,dy$.
4. Square the deviations of x and y series and denote them dx² and dy².
5. Sum up the squared deviations of x and y series and denote the totals $\sum dx^2$ and $\sum dy^2$.
6. Substitute the values obtained from the above steps in the following formula:
$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$
2. Take the step deviations of x variable i.e., $dx = \frac{X - A}{i}$, where A is the assumed mean and i the class interval of the series.
3. Take the step deviations of y variable i.e., $dy = \frac{Y - A}{i}$.
4. Multiply the deviations of x and y variable obtained in step-2 and 3 with the corresponding frequencies of each cell and note the obtained figure in the upper right hand corner of each cell.
5. Sum up all the values obtained in step-4 and get the total i.e., $\sum f\,dx\,dy$.
6. Sum up the product of dx with corresponding frequencies (f) and get $\sum f\,dx$.
7. Sum up the product of dy with corresponding frequencies (f) and get $\sum f\,dy$.
8. Square the dx and multiply with corresponding frequencies, then sum up and get $\sum f\,dx^2$.
9. Square the dy and multiply with corresponding frequencies, then sum up and get $\sum f\,dy^2$.
10. Substitute the values in the following formula to get the correlation coefficient.
$$r = \frac{\sum f\,dx\,dy - \frac{\sum f\,dx \times \sum f\,dy}{N}}{\sqrt{\sum f\,dx^2 - \frac{(\sum f\,dx)^2}{N}}\,\sqrt{\sum f\,dy^2 - \frac{(\sum f\,dy)^2}{N}}}$$
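The procedure for a two-way frequency table can be sketched in Python. This is an illustration under stated assumptions: the step deviations are taken as already worked out, rows of the table are assumed to carry the y classes and columns the x classes, and the function name `grouped_pearson_r` and the table itself are hypothetical.

```python
from math import sqrt

def grouped_pearson_r(dx, dy, freq):
    """Correlation coefficient for a two-way frequency table.

    dx: step deviations of the x classes (one per column)
    dy: step deviations of the y classes (one per row)
    freq[i][j]: frequency of the cell in row i and column j
    """
    n = sum(sum(row) for row in freq)
    col_totals = [sum(row[j] for row in freq) for j in range(len(dx))]
    row_totals = [sum(row) for row in freq]
    # Cell by cell product f * dx * dy, then the total.
    sum_fdxdy = sum(f * dx[j] * dy[i]
                    for i, row in enumerate(freq)
                    for j, f in enumerate(row))
    sum_fdx = sum(f * d for f, d in zip(col_totals, dx))
    sum_fdy = sum(f * d for f, d in zip(row_totals, dy))
    sum_fdx2 = sum(f * d * d for f, d in zip(col_totals, dx))
    sum_fdy2 = sum(f * d * d for f, d in zip(row_totals, dy))
    numerator = sum_fdxdy - sum_fdx * sum_fdy / n
    denominator = (sqrt(sum_fdx2 - sum_fdx ** 2 / n)
                   * sqrt(sum_fdy2 - sum_fdy ** 2 / n))
    return numerator / denominator

# Hypothetical 3 x 3 table, for illustration only.
dx = [-1, 0, 1]
dy = [-1, 0, 1]
freq = [[4, 2, 0],
        [2, 6, 2],
        [0, 2, 4]]
print(round(grouped_pearson_r(dx, dy, freq), 4))  # 0.6667
```

In this table N = 22, Σfdxdy = 8, Σfdx = Σfdy = 0 and Σfdx² = Σfdy² = 12, so r = 8/12.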
3. Coefficient of Correlation and Probable Error
To know the reliability or significance of the value of Pearson's coefficient of correlation, probable error is used. According to Horace Secrist, "The probable error of the coefficient of correlation is an amount which, if added to or subtracted from the mean correlation coefficient, produces amounts within which the chances are even that a coefficient of correlation from a series selected at random will fall." The formula for calculating probable error is:
$$\text{Probable Error of } r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$$
Example:
Find the probable error. Assume that the correlation coefficient is 0.8 and the pairs of samples are 25.
Solution: Here, r = 0.8 and n = 25. We know that,
$$\text{Probable Error} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$
So, on putting the values:
Probable Error = 0.6745 x {(1 - (0.8)²)/√25}
= 0.6745 x {(1 - 0.64)/5}
= 0.6745 x (0.36/5)
= 0.0486
Therefore, the probable error is 0.0486
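The same calculation can be reproduced with a short Python sketch. The function name `probable_error` is an assumption of this illustration; the limits r ± P.E. follow the "added to or subtracted from" wording of the definition quoted earlier.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

r, n = 0.8, 25
pe = probable_error(r, n)
print(round(pe, 4))      # 0.0486

# Limits obtained by subtracting and adding the probable error.
print(round(r - pe, 4))  # 0.7514
print(round(r + pe, 4))  # 0.8486
```

A larger number of pairs shrinks the probable error, since n enters through the square root in the denominator.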
4. Rank Correlation Coefficient
This is the fourth method in correlation. This method was developed by Charles Edward Spearman, a British psychologist, in the year 1904. This method is used to ascertain the coefficient of correlation by assigning ranks to the items. This measure is useful in dealing with qualitative characteristics of items such as beauty, intelligence, morality, etc. But it fails to give the quantitative measurement of the correlation coefficient which is possible by the use of Pearson's coefficient of correlation. Calculation of rank correlation is possible in case of individual series only. The result obtained is an approximate one, because the actual data is not considered for calculation; only the assigned ranks are taken into consideration. The following formula is used to calculate the Rank Correlation Coefficient.
$$R = 1 - \frac{6 \sum D^2}{N^3 - N} \quad \text{or} \quad R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
Where,
R = Rank Correlation Coefficient
$\sum D^2$ = Sum of the squares of differences of two ranks
N = Number of paired observations
Steps
1. Assign the ranks to x and y variables taking the highest value as rank 1; continue this process till all the values are exhausted and denote these columns as Rx and Ry respectively.
2. Take the deviations of the two ranks and denote them 'D'.
3. Square the 'D' and sum up, then we get $\sum D^2$.
4. Substitute the values in the above said formula.
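These four steps can be sketched in Python for the case where no value is repeated. The function names and the marks used below are hypothetical, chosen only to illustrate the procedure.

```python
def ranks_descending(values):
    """Rank 1 goes to the highest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Rank correlation coefficient R = 1 - 6 * sum(D^2) / (N^3 - N)."""
    n = len(xs)
    rx = ranks_descending(xs)                           # Step 1
    ry = ranks_descending(ys)
    d = [a - b for a, b in zip(rx, ry)]                 # Step 2
    sum_d2 = sum(v * v for v in d)                      # Step 3
    return 1 - 6 * sum_d2 / (n ** 3 - n)                # Step 4

# Hypothetical marks of five candidates given by two judges.
xs = [85, 60, 73, 40, 90]
ys = [93, 75, 65, 50, 80]
print(ranks_descending(xs))  # [2, 4, 3, 5, 1]
print(ranks_descending(ys))  # [1, 3, 4, 5, 2]
print(spearman_r(xs, ys))    # 0.8
```

The rank differences are 1, 1, -1, 0 and -1, so the sum of their squares is 4 and R = 1 - 24/120.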
Tied Ranks or Equal Ranks or Repeated Ranks
When two or more values are equal, they are called tied values. It is difficult to assign the rank in such a case; we have to allot the average rank to the tied values. For example, if two individuals are placed in 5th place, they are each given the rank 5.5 (i.e., (5+6)/2), and the next will be 7. If three are ranked equal at the 5th place, they are given the rank 6 (i.e., (5+6+7)/3), which is the common rank to be assigned to each; and the next rank will be 8.
The formula used in this case is as follows:
$$R = 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right\}}{N^3 - N}$$
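The averaging of tied ranks and the correction term can be sketched in Python. The sketch assumes that m stands for the number of items sharing a tied value (its usual meaning in this formula, though not defined in the legible text), with one correction term added for each group of ties in either series. All function names and data are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 to the highest value; tied values share the average rank."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1          # first place occupied by v
        m = order.count(v)                  # number of items tied at v
        ranks[v] = first + (m - 1) / 2      # average of the places occupied
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_r_tied(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Two items tied at the 5th place share rank 5.5 and the next gets 7.
print(average_ranks([100, 90, 80, 70, 60, 60, 50]))
# [1.0, 2.0, 3.0, 4.0, 5.5, 5.5, 7.0]

# Hypothetical series with one tie in x.
print(spearman_r_tied([10, 20, 20, 30], [1, 2, 3, 4]))  # 0.9
```

In the last call the sum of squared rank differences is 0.5 and the single tie of two items adds (8 - 2)/12 = 0.5, giving R = 1 - 6(1.0)/60.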