Data Analysis

This document discusses linear correlation between two or more variables. It defines correlation as the analysis of co-variation between variables and notes that the correlation coefficient measures the degree and direction of correlation. A positive correlation means that as one variable increases, the other also increases, while a negative correlation means that as one variable increases, the other decreases. Simple correlation analyzes the relationship between two variables, while multiple correlation analyzes the relationship between one variable and multiple other variables. Partial correlation studies the relationship between two variables while controlling for other variables, and total correlation analyzes the relationship between all variables together. Linear correlation means a constant change in one variable corresponds to a constant change in the other over the entire range of values.

Chapter 4

LINEAR CORRELATION ANALYSIS
INTRODUCTION
In the case of central tendency, dispersion and skewness, we study problems relating to one variable. But in the real world we have problems pertaining to two or more than two variables. We know there is correlation between price and quantity demanded, price and quantity supplied, age of husband and wife, income and consumption etc. The extent of relationship between any two variables can be measured with the help of correlation. The measure of correlation is called the correlation coefficient. It gives one index or figure which shows the degree and direction of correlation. The coefficient of correlation helps us in determining the closeness of the relationship between two or more than two variables.
Thus whenever two variables are so related that a change in the value of one is accompanied by a change in the value of the other, in such a way that (a) an increase or decrease in one variable is accompanied by an increase or decrease in the other, or (b) a fall in one variable is accompanied by a fall or rise in the other, then the variables are said to be correlated. The correlation coefficient varies between -1 and +1; if the two regression lines are identical (one straight line), it equals +1 or -1.
1.1 Definition :
"Correlation is an analysis of the co-variation between two or more variables." -A.M. Tuttle
"When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation." -Croxton and Cowden
"The effect of correlation is to reduce the range of uncertainty of one's prediction." -Tippett
"If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the other, then they are said to be correlated." -L.R. Connor
"Correlation analysis attempts to determine the degree of relationship between variables." -Ya Lun Chou
Thus, we can say that correlation is a statistical technique which shows the degree and direction of relationship between the two variables.
USES OR SIGNIFICANCE OF CORRELATION ANALYSIS
The use or utility of the study of correlation is clear from the following points:
(1) The correlation coefficient helps us in measuring the extent of relationship between two or more than two variables. The degree and extent of the relationship between two variables is, of course, one of the most important problems in statistics.
(2) It is through correlation that we can predict about the future. For instance, if there are good monsoons, we can expect better food supply and hence can expect a fall in the price of foodgrains and other products.
(3) If the value of a variable is given, we can know the value of another variable. It is, of course, done with the help of regression analysis.
(4) Correlation contributes to the understanding of economic behaviour. It helps us in knowing the important variables on which others depend.
(a) Positive or Direct Correlation. If the two variables move in the same direction, i.e. with an increase in one variable the other variable also increases, or with a fall in one variable the other variable also falls, the correlation is said to be positive. For example, price and supply are positively related. It means if price goes up, the supply goes up and vice versa. It can be shown with the help of arrows:
P↑ S↑, P↓ S↓

Price : 50 60 70 80 90
Supply : 100 120 140 160 180
This numerical illustration shows that as price increases the supply also increases and vice versa.

(b) Negative or Inverse Correlation. If two variables move in opposite directions, i.e. with an increase in one variable the other variable falls, or with a fall in one variable the other variable rises, the correlation is said to be negative or inverse. For example, the law of demand shows an inverse relation between price and demand. It can be shown with the help of arrows:
P↑ D↓, P↓ D↑

Price : 50 60 70 80 90
Demand : 180 160 140 120 100
This table shows that with the increase in price, the demand falls and vice versa.
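The direction of the two illustrations above can be checked numerically. The following is only a minimal sketch: `pearson_r` is a helper name introduced here, and the second price value (60) in the supply table is an assumption, since that entry is illegible in the source.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

price = [50, 60, 70, 80, 90]          # 60 is an assumed value (see lead-in)
supply = [100, 120, 140, 160, 180]
demand = [180, 160, 140, 120, 100]

r_supply = pearson_r(price, supply)   # same direction -> positive sign
r_demand = pearson_r(price, demand)   # opposite direction -> negative sign
print(r_supply > 0, r_demand < 0)
```

Because both tables happen to be exactly linear, the coefficients here sit at the extreme ends of the range; real price data would normally give values strictly between -1 and +1.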
4.2. Simple and Multiple Correlation
(a) Simple Correlation. When there are only two variables and the relationship is studied
between those two variables, it is a case of simple correlation. Relationships between height and
weight, price and demand or income and consumption etc. are examples of simple correlation.
(b) Multiple Correlation. When there are more than two variables and we study the
relationship between one variable and all the other variables taken together then it is a case of multiple
correlation. Suppose there are three variables A, B and C; we can study the multiple correlation between A and B & C taken together, or between B and A & C taken together etc. Numbering the variables 1, 2 and 3, it can be denoted as $R_{1.23}$ or $R_{2.13}$ or $R_{3.12}$.
4.3. Partial and Total Correlation.
(a) Partial Correlation. When there are more than two variables and the relationship between any two of the variables is studied assuming the other variables as constant, it is a case of partial correlation. This, in fact, is an extension of multiple correlation. Suppose we study the relationship between rainfall and crop, without taking into consideration the effects of other inputs like fertilizers, seeds and pesticides etc.; this technique will be known as partial correlation. Symbolically, if x, y, z are the three variables then partial correlation between x and y excluding z will be given by $r_{xy.z}$, and similarly $r_{yz.x}$ for y and z excluding x.
(b) Total Correlation. When the correlation between the variables under study, taken together at a time, is worked out, it is called total correlation.
4.4 Linear and Non-Linear Correlation
(a) Linear Correlation. The correlation between two variables will be linear if corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of the values. That is, for a unit change in the value of x, there is a constant change in the corresponding values of y. In general, two variables are linearly related if there exists a relationship of the form $Y = a + bX$ between them. In the above equation, 'a' is the intercept, whereas 'b' is the slope. If we plot the values of the two variables, we will get a straight line.
[Fig. 1. Linear Positive Correlation. Fig. 2. Linear Negative Correlation.]
The main point worth consideration is that this line will indicate positive relation if the slope 'b' is positive, and in case 'b' is negative then correlation will also be negative. In such types of correlation the value of the coefficient of correlation is always +1 or -1 depending on the sign of 'b' in the equation $Y = a + bX$. Correlation will be +1 if 'b' is +ve and -1 if 'b' is -ve. This type of relation does not exist in economics and other social sciences. This type of relation can exist only in physical sciences. However, it has great theoretical importance in economics and other social sciences.
(b) Non-linear Correlation. The relationship between two variables will be non-linear or curvi-linear if, corresponding to a unit change in one variable, the other variable changes at a different rate. If such data is plotted, we do not get a straight line but a curve type figure. It means the slope of the plotted curve is not constant. Mathematically the relationship between x and y will never be of the form $y = a + bx$, but in the form of $y = ax^2 + bx + c$ or $y = ab^x$ etc. When the values are plotted on graph paper, we will get the graphs as shown in fig. 3 and 4.
[Fig. 3. Curvilinear +ve correlation. Fig. 4. Curvilinear -ve correlation.]
Such types of curves, rather than straight lines, are found very commonly in the fields of social sciences. As such these correlations are very important in the study of economics and the other social sciences.
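The claim that an exact straight-line relation gives a coefficient of +1 or -1 according to the sign of the slope, while a curved relation does not, can be illustrated with a short sketch. The data and the helper `pearson_r` are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

x = [1, 2, 3, 4, 5]
rising = [3 + 2 * v for v in x]    # straight line with positive slope
falling = [3 - 2 * v for v in x]   # straight line with negative slope
curved = [v ** 2 for v in x]       # curvilinear relation y = x^2

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)
print(r_rising, r_falling, r_curved)
```

Note that the intercept (3 in both lines) plays no part in the sign of r; only the slope does. The curved series is still strongly correlated with x, but its coefficient stays strictly below 1.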
4.5 Logical and Illogical Correlation
(a) Logical Correlation. When the correlation between two variables is not only mathematically defined but also logically sound, it is called logical correlation. In other words, it can be said that these correlations are confirmed by logic or by the functional relationship that exists between the variables. For example, the correlations between income and consumption, price and demand, age and playing habits etc. are such cases of correlation. These correlations, whether negative or positive, can be determined in both ways, mathematically as well as logically.
(b) Illogical Correlation. In certain cases we come across variables which, though well defined by the statistical method and established by applying the requisite statistical tools of correlation coefficient, yet when tested from the logical point of view fail to justify their relationship. For example, the relationship between rainfall and the number of babies born, or between the production of cycles and the death rate, etc. These variables are not connected with each other in any way. But their correlations can be established by applying the statistical methods of correlation. Such type of correlation is known as Non-Sense Correlation or Spurious Correlation.
5. DEGREE AND INTERPRETATION OF CORRELATION
According to Karl Pearson, the coefficient of correlation lies between two limits i.e. ±1. It implies that if there is perfect positive relationship between two variables, the value of correlation would be +1. On the contrary, if there is perfect negative relationship between two variables, the value of the correlation will be -1. It means r lies between +1 and -1. Within these limits the value of correlation is interpreted as :
(a) When r = +1 : Perfect positive correlation
(b) When r > +0.75 but < +1 : High degree of positive correlation
(c) When r > +0.5 but < +0.75 : Moderate degree of positive correlation
(d) When r > 0 but < +0.5 : Low degree of positive correlation
(e) When r = 0 : No correlation at all
(f) When r < -0.75 but > -1 : High degree of negative correlation
(g) When r < -0.5 but > -0.75 : Moderate degree of negative correlation
(h) When r < 0 but > -0.5 : Low degree of negative correlation
(i) When r = -1 : Perfect negative correlation
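The interpretation scale above can be written as a small lookup function. This is a sketch under one stated assumption: the scale leaves the exact boundary values ±0.5 and ±0.75 unassigned, and here they are placed in the higher band. The function name `degree_of_correlation` is introduced only for this illustration.

```python
def degree_of_correlation(r):
    """Map a correlation coefficient to the verbal degree used in the scale."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No correlation at all"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"Perfect {direction} correlation"
    if size >= 0.75:   # assumption: boundary 0.75 counted as 'high'
        return f"High degree of {direction} correlation"
    if size >= 0.5:    # assumption: boundary 0.5 counted as 'moderate'
        return f"Moderate degree of {direction} correlation"
    return f"Low degree of {direction} correlation"

print(degree_of_correlation(0.9485))
print(degree_of_correlation(-0.62))
print(degree_of_correlation(0.2))
```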
6. Methods of Studying Correlation
In correlation analysis, we are required to know the relationship between variables and the extent of that relationship. There are two methods which visualize the relationship between the two variables i.e. (1) the Scatter Diagram Method and (2) Graphic Method. These are based on graphs and diagrams. Then there are mathematical methods in which we include (3) Karl Pearson's coefficient of correlation, (4) Rank Correlation Method, (5) Concurrent Deviation Method, and (6) Method of Least Squares.
6.1 Scatter Diagram Method
This method is also known as dot diagram, dotagram or scattergram.
Scatter diagram is one of the simplest methods of diagrammatic representation of a bivariate (two variables) distribution. It provides the simplest tool of determining the correlation between two variables. Suppose we are given n pairs of values i.e. $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ of X and Y variables. If X and Y show the demand and price respectively, then the pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ show the demand and price in n pairs. These n points can be plotted on the x and y axes in the xy plane. The two sets of figures are known as subject and relative. The most important set, which is used as the standard, is known as "Subject" and the one of less importance is known as "Relative". The values of the subject are plotted along the X-axis and the corresponding values of the relative along the Y-axis. The diagram so obtained is called a "Scatter diagram". From the diagram, we can form a rough idea about the relationship between the two variables.
The term scatter refers to the dispersion or spread of the dots on the graph. We should keep the following points in mind while interpreting the correlation between two variables through a scatter diagram:
(i) If the points plotted are very close to each other, it shows high correlation, otherwise poor correlation is expected.
(ii) If the points on the diagram show an upward or downward trend, then we say there is correlation. But in case no trend is shown by the points, it shows that the variables are uncorrelated.
(iii) If there is an upward trend from left to right, the correlation is positive, that means the values of the two variables move in the same direction. On the other hand, if the points show a downward trend from left to right, the correlation is negative as the values of the two variables move in opposite directions.
(iv) The correlation would be perfect or equal to one if all the points lie on a straight line starting from the left bottom and going up towards the right top. On the other hand, the correlation would be perfect and negative if all the points lie on a straight line starting from the top left and falling to the right bottom.
We show the following diagrams to exhibit the different types of correlation.
... positive correlation and -1 means perfect negative relationship. In case the value is 0, then it means no relationship between the variables.
4. Ideal Measure. Karl Pearson's method is considered to be an ideal method of calculating the correlation coefficient. It is because of the covariance, which is most reliable as a standard statistical tool.
Calculation of Karl Pearson's Coefficient of Correlation
The calculation of Karl Pearson's coefficient of correlation can be divided into two parts
1. In case of Individual series or ungrouped data.
2. In case of grouped data.
1. Calculation of coefficient of correlation in case of Individual series or ungrouped data
According to Karl Pearson's Method, the coefficient of correlation in case of ungrouped data is measured by the following two methods.
(1) Direct Method (Actual Mean Method)
(2) Short-cut Method (Assumed Average Method)
(1) Direct Method: Calculation of 'r' Using Actual Means
When the deviations are taken from actual means the following formula is applied i.e.
$$r \text{ or } r_{xy} = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum xy}{N \sigma_x \sigma_y}$$
Where
$\sum xy / N$ = Co-variance of X and Y
$x = (X - \bar{X})$ means deviations in X series from its actual mean
$y = (Y - \bar{Y})$ means deviations in Y series from its actual mean
$\sigma_x$ = Standard deviation of X-series
$\sigma_y$ = Standard deviation of Y-series
N = Number of observations.
In the above formula
$$\operatorname{Cov}(X, Y) = \frac{\sum xy}{N}, \qquad \sigma_x = \sqrt{\frac{\sum x^2}{N}}, \qquad \text{similarly } \sigma_y = \sqrt{\frac{\sum y^2}{N}}$$
Thus
$$r = \frac{\sum xy / N}{\sqrt{\dfrac{\sum x^2}{N}} \sqrt{\dfrac{\sum y^2}{N}}} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
Therefore $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$ is the direct method to find r.
This formula is simple to apply as it does not require calculations of standard deviations of the two series.
Steps.
1. Calculate the mean of X and Y series (i.e., calculate $\bar{X}$ and $\bar{Y}$).
2. Calculate the deviations of X and Y from their respective means [i.e., $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$].
3. Square these deviations in X and Y series. Find the sum of the squares of the deviations individually in the two series [i.e., $\sum x^2 = \sum (X - \bar{X})^2$ and $\sum y^2 = \sum (Y - \bar{Y})^2$].
4. Multiply the single deviation in X series with its corresponding single deviation in Y series and find the sum of it [i.e., $\sum xy = \sum (X - \bar{X})(Y - \bar{Y})$].
5. Put the values in the formula $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$.
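EXAMPLE 1. Calculate the co-efficient of correlation from the following data :
X : 12  9  8  10  11  13  7
Y : 14  8  6   9  11  12  3
SOLUTION. Calculation of co-efficient of correlation

X     Y     x = X - X̄    y = Y - Ȳ    x²    y²    xy
12    14     2             5            4     25    10
9     8     -1            -1            1     1     1
8     6     -2            -3            4     9     6
10    9      0             0            0     0     0
11    11     1             2            1     4     2
13    12     3             3            9     9     9
7     3     -3            -6            9     36    18
ΣX = 70   ΣY = 63                       Σx² = 28   Σy² = 84   Σxy = 46

$\bar{X} = \dfrac{\sum X}{N} = \dfrac{70}{7} = 10$, $\bar{Y} = \dfrac{\sum Y}{N} = \dfrac{63}{7} = 9$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{46}{\sqrt{28 \times 84}} = \frac{46}{48.497} = 0.9485$$

Aliter:
$\sigma_x = \sqrt{\dfrac{\sum x^2}{N}} = \sqrt{\dfrac{28}{7}} = \sqrt{4} = 2$
$\sigma_y = \sqrt{\dfrac{\sum y^2}{N}} = \sqrt{\dfrac{84}{7}} = \sqrt{12} = 3.464$
$$r = \frac{\sum xy}{N \sigma_x \sigma_y} = \frac{46}{7 \times 2 \times 3.464} = \frac{46}{48.497} = 0.9485$$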
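The five steps of the direct method can be traced in code on Example 1. This is a sketch, not the book's own procedure: the individual X and Y values are reconstructed from the partly legible table and agree with the printed totals (ΣX = 70, ΣY = 63, Σx² = 28, Σy² = 84, Σxy = 46), and the variable names are introduced here for illustration.

```python
from math import sqrt

X = [12, 9, 8, 10, 11, 13, 7]   # reconstructed values, consistent with the totals
Y = [14, 8, 6, 9, 11, 12, 3]
N = len(X)

# Step 1: actual means of the two series
mean_x, mean_y = sum(X) / N, sum(Y) / N

# Step 2: deviations from the respective means
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

# Step 3: sums of squared deviations
sum_x2 = sum(d * d for d in x)
sum_y2 = sum(d * d for d in y)

# Step 4: sum of products of corresponding deviations
sum_xy = sum(a * b for a, b in zip(x, y))

# Step 5: direct formula, then the alternative (Aliter) via standard deviations
r = sum_xy / sqrt(sum_x2 * sum_y2)
sigma_x, sigma_y = sqrt(sum_x2 / N), sqrt(sum_y2 / N)
r_aliter = sum_xy / (N * sigma_x * sigma_y)

print(round(r, 4), round(r_aliter, 4))
```

Both routes use the same three sums, so they must agree; the direct form simply avoids computing the two standard deviations separately.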
6.3.1.1 Limitations of the Method.
This method has a lengthy process because the true means of both the series are to be calculated first and then the deviations are taken. The original values of the standard deviations are also to be known; only then can the final formula be used. Easier than this method is the product moment method. The product moment method does not involve the calculation of standard deviations of the two series separately.
EXAMPLE 2. From the following data compute the co-efficient of correlation between X and Y :
                                  X series    Y series
No. of items                      15          15
Arithmetic Mean                   25          18
Square of deviations from mean    136         138
Correlation and Regression Analysis
Example: Production expenses and sales, height & weight, water consumption and temperature, study time and grades etc.
2. Negative Correlation
The correlation is said to be negative when the values of the variables change in opposite directions. Example: Price & quantity demanded, alcohol consumption and driving ability etc.
Direction of the Correlation
Positive relationship - Variables change in the same direction.
As X is increasing - Y is increasing.
As X is decreasing - Y is decreasing.
Example, As height increases, so does weight.
Negative relationship - Variables change in opposite directions.
As X is increasing - Y is decreasing.
As X is decreasing - Y is increasing.
Example, As TV time increases, grades decrease.
3. Partial Correlation
In partial correlation more than two variables are recognised but only two variables are considered to influence each other; the effect of the other influencing variable is kept constant. In the example of rice yield, rainfall and fertilizer, if we limit our correlation analysis to yield and rainfall, keeping the fertilizer variable constant, it becomes a problem of partial correlation.
I. On the basis of Number of Sets
1. Simple Correlation
When only two variables are studied it is a case of simple correlation.
2. Multiple Correlation
When three or more variables are studied it is known as multiple correlation. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizer used, it is a case of multiple correlation.
II. On the basis of Change
1. Linear Correlation
Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of the variables having a linear relationship will form a straight line.
Example:
X = 1, 2, 3, 4, 5, 6, 7, 8,
Y = 5, 7, 9, 11, 13, 15, 17, 19,
Y = 3 + 2X
2. Non Linear Correlation
The correlation would be non linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
X 1 2 3 4
7 14 21 28 35

Methods of Determining Correlation


The various methods of studying correlation are as follows:
1. Scatter Diagram
2. Karl Pearsons Coefficient of Correlation
3. Rank Correlation

Scatter Diagram
Scatter Diagram is a graph of observed plotted points where each point represents the values of X & Y as a coordinate. It portrays the relationship between these two variables graphically.
SL. No.   Maths   Statistics
1.        55      60
2.        70      65
3.        35      50
4.        40      60
5.        65      75
6.        40      70
7.        60      50
8.        20      40
9.        30      60
10.       50      30
11.       10      30
12.       20      10
[Figure: Scatter Diagram of Statistics marks against Maths marks, with an estimating line; caption "Low Degree of Positive Correlation".]
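A scatter diagram only suggests the degree of correlation, so it can be useful to compute the coefficient for the same twelve pairs of marks. The sketch below uses the direct (actual mean) formula; the helper name `pearson_r` is introduced here and is not from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

maths = [55, 70, 35, 40, 65, 40, 60, 20, 30, 50, 10, 20]
statistics_marks = [60, 65, 50, 60, 75, 70, 50, 40, 60, 30, 30, 10]

r_marks = pearson_r(maths, statistics_marks)
print(f"r = {r_marks:.3f}")
```

The sign confirms the upward drift of the plotted points, while the numerical value supplies the exact degree that the diagram by itself cannot give.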
Advantages of Scatter Diagram
1. It is a very simple and non-mathematical method.
2. It is not influenced by the size of extreme items.
3. It is the first step in investigating the relationship between two variables.
Disadvantage of Scatter Diagram
1. It cannot be adopted to find the exact degree of correlation.
Karl Pearson's Co-efficient of Correlation
Karl Pearson's Coefficient of Correlation is denoted by r ($-1 \le r \le +1$). The coefficient of correlation r measures the degree of linear relationship between two variables, say x & y.
The degree of correlation is expressed by the value of the coefficient. The direction of change is indicated by the sign (-ve) or (+ve).

When deviations are taken from the actual mean: $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$
Interpretation of Correlation Co-efficient (r)


The value of the correlation coefficient r ranges from -1 to +1.
a) If r = +1, then the correlation between the two variables is said to be perfect and positive.
b) If r = -1, then the correlation between the two variables is said to be perfect and negative.
c) If r = 0, then there exists no correlation between the variables.
Properties of Correlation Co-efficient
a) The correlation coefficient lies between -1 & +1, symbolically ($-1 \le r \le 1$).
b) The correlation coefficient is independent of the change of origin & scale.
c) The co-efficient of correlation is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{xy} \times b_{yx}}$
d) If one regression coefficient is (+ve), the other regression coefficient is also (+ve) and the correlation coefficient is (+ve).
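Properties (b) and (c) can be checked numerically. The following sketch uses hypothetical data; the regression coefficients are taken as $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$ (deviations from the means), and the scale factors used for the change of scale are assumed to be positive.

```python
from math import sqrt

def deviation_sums(xs, ys):
    """Return (sum xy, sum x^2, sum y^2) for deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy, sum_x2, sum_y2

def pearson_r(xs, ys):
    sum_xy, sum_x2, sum_y2 = deviation_sums(xs, ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

X = [2, 4, 6, 8, 10]   # hypothetical data
Y = [1, 3, 4, 8, 9]
r = pearson_r(X, Y)

# Property (b): shift the origin and rescale both series (positive scale factors)
U = [(v - 6) / 2 for v in X]
W = [10 * v + 3 for v in Y]
r_transformed = pearson_r(U, W)

# Property (c): r is the geometric mean of the two regression coefficients
sum_xy, sum_x2, sum_y2 = deviation_sums(X, Y)
b_yx = sum_xy / sum_x2   # regression coefficient of Y on X
b_xy = sum_xy / sum_y2   # regression coefficient of X on Y
r_from_b = sqrt(b_yx * b_xy)   # positive root, since both coefficients are +ve

print(round(r, 4), round(r_transformed, 4), round(r_from_b, 4))
```

If one of the scale factors were negative, the magnitude of r would be unchanged but its sign would flip, which is why the sketch restricts itself to positive factors.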
Coefficient of Determination
The convenient way of interpreting the value of the correlation coefficient is to use the square of the coefficient of correlation, which is called the Coefficient of Determination.
The Coefficient of Determination = $r^2$.
Suppose r = 0.9, then $r^2$ = 0.81; this would mean that 81% of the variation in the dependent variable has been explained by the independent variable.
The maximum value of $r^2$ is 1 because it is possible to explain all of the variation in y but it is not possible to explain more than all of it.
Coefficient of Determination = Explained variation / Total variation.
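The identity between $r^2$ and the ratio of explained to total variation can be illustrated with a least-squares line. This is a sketch on hypothetical data; taking "explained variation" as the variation of the fitted values about the mean of y is the usual reading, but it is an assumption here since the text does not define it.

```python
from math import sqrt

X = [1, 2, 3, 4, 5]   # hypothetical independent variable
Y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_x2 = sum((x - mean_x) ** 2 for x in X)
sum_y2 = sum((y - mean_y) ** 2 for y in Y)

r = sum_xy / sqrt(sum_x2 * sum_y2)

# Least-squares line of Y on X
slope = sum_xy / sum_x2
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in X]

explained = sum((f - mean_y) ** 2 for f in fitted)   # variation accounted for by X
total = sum_y2                                       # total variation in Y
determination = explained / total

print(round(r ** 2, 4), round(determination, 4))
print(round(0.9 ** 2, 2))   # the r = 0.9 case from the text
```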
Merits of Karl Pearson's Co-efficient of Correlation
1. This method indicates the presence or absence of correlation between two variables and gives the exact degree of their correlation.
2. It ascertains the direction of the correlation, i.e. positive or negative.
3. This method has many algebraic properties, for which the calculation of the co-efficient of correlation and other related factors is made easy.
Demerits of Karl Pearson's Co-efficient of Correlation
1. It is more difficult to calculate than other methods of calculation.
2. It is much affected by the values of the extreme items.
3. It is very much likely to be misinterpreted in case of homogeneous data.
Procedure for Computing the Correlation Co-efficient
1. Calculate the mean of the two series X & Y.
2. Calculate the deviations x & y in the two series from their respective means.
3. Square each deviation of x & y, then obtain the sum of the squared deviations i.e., $\sum x^2$ & $\sum y^2$.
4. Multiply each deviation under x with each deviation under y & obtain the product xy.
5. Then obtain the sum of the products of x, y i.e., $\sum xy$.
6. Substitute the values in the formula.

Direct Method
Type 1: This method is used when the given variables are small in magnitude.
Type 2: It is the direct formula to find r. This formula can effectively be used where $\bar{X}$ and $\bar{Y}$ are not in fractions. The formula is
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
Negative correlation
Simple and multiple correlation
Linear and non-linear correlation
1. Positive Correlation
When two variables are moving in the same direction, we call it positive correlation, i.e., an increase in the value of one variable leads to an increase in the value of another variable, or a decrease in the value of one variable leads to a decrease in the value of another variable. For example, height and weight, rainfall and food production, etc.
2. Negative Correlation
When two variables are moving in the opposite direction, i.e., an increase in one variable leads to a decrease in another variable and vice versa, it is called negative correlation. For example, prices and demand; yield of crops and prices.
3. Simple Correlation
When the relationship of only two variables is studied, the relationship is called simple correlation. When the relationship of more than two variables is studied at a time, we call it multiple correlation.
4. Partial Correlation
Partial correlation is a method used to describe the relationship between two variables whilst taking away the effects of another variable, or several other variables, on this relationship. For the calculation of the partial correlation, the three correlations between the individual variables are required. The partial correlation $r_{xy.z}$ tells how strongly the variable x correlates with the variable y, if the correlation of both variables with the variable z is calculated out.
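The text says that the three pairwise correlations are required but does not give the formula. A sketch using the standard first-order partial correlation formula, $r_{xy.z} = \dfrac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$, is shown below; the function name and the three input values are hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations between x, y and a third variable z
r_xy, r_xz, r_yz = 0.8, 0.6, 0.5

r_xy_z = partial_r(r_xy, r_xz, r_yz)
print(round(r_xy_z, 4))

# If z is unrelated to both x and y, removing it changes nothing
unchanged = partial_r(r_xy, 0.0, 0.0)
```

In this illustration part of the simple correlation between x and y is attributable to their common association with z, so the partial coefficient comes out smaller than the simple one.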
5. Multiple Correlation
When the relationship of more than two variables is studied at a time, we call it multiple correlation.
6. Linear and Non-Linear Correlation
If the ratio of change between two variables is uniform, then there will be linear correlation between them. For example:
X: 10 15 20 25 30
Y: 2 4 6 8 10
It is clear from the above example that the ratio of change in the X variable and the Y variable is uniform. When we plot these on the graph we will get a straight line.
The relationship between two variables is said to be non-linear or curvilinear when the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example:
X: 5 10 15 20 25
Y: 1 4 6 7 9
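The "uniform ratio of change" test can be applied mechanically to the two examples above. This is a minimal sketch; the helper names are introduced here, and the fourth Y value of the linear example (8) is an assumption because it is missing in the source.

```python
def change_ratios(xs, ys):
    """Ratio of the change in Y to the change in X between successive pairs."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

def is_linear(xs, ys, tol=1e-9):
    """True when every successive ratio of change is (numerically) the same."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)

# Linear example (the value 8 is assumed, see lead-in)
x_lin, y_lin = [10, 15, 20, 25, 30], [2, 4, 6, 8, 10]
# Non-linear example
x_non, y_non = [5, 10, 15, 20, 25], [1, 4, 6, 7, 9]

print(change_ratios(x_lin, y_lin), is_linear(x_lin, y_lin))
print(change_ratios(x_non, y_non), is_linear(x_non, y_non))
```

The check assumes the X values are distinct and listed in order; with real, noisy data the ratios are rarely exactly equal, and the scatter diagram or the value of r is the more practical guide.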
Methods of Correlation
In order to know the existence of correlation between the variables, the following methods are helpful. They are:
1. Scatter diagram
2. Karl Pearson's coefficient of correlation
3. Coefficient of correlation and probable error
4. Rank Correlation Coefficient
5. Regression line

[Figure: scatter diagram showing no correlation]
Merits and Limitations of Scatter Diagram
1. It is the simplest graphical method of studying the existence of correlation between two variables without mathematical calculation.
2. The conclusion drawn under this method is based on the position of the majority of dots. So, this method is not affected by extreme observations.
3. This method is time saving as compared to other methods.
Limitation: This method fails to give the exact numerical value of the correlation existing between two variables.
2. Karl Pearson's Coefficient of Correlation
Karl Pearson is well known in the field of biometrics and statistics. He suggested the mathematical measurement of correlation between two variables. His method of calculating correlation is popularly called Karl Pearson's coefficient of correlation and is denoted by 'r'. The following formula is used to calculate the coefficient of correlation.
$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$
Where,
r = Karl Pearson's coefficient of correlation
$\sum xy$ = Summation of the products of $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$
N = No. of pairs of observations
$\sigma_x$ = Standard deviation of x series
$\sigma_y$ = Standard deviation of y series
The value of the coefficient of correlation shall always lie between +1 and -1. When r = +1, then there is a perfect positive correlation; when r = -1, then there is a perfect negative correlation. When r = 0 then there is no correlation between the two variables.
The above said formula for calculation of the coefficient of correlation is tedious. In order to make the calculation easy, the following methods are suggested to calculate the coefficient of correlation.
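a) Direct method
1. Square the given x and y values i.e., $x^2$ and $y^2$.
2. Sum up the squared x and y values i.e., $\sum x^2$ and $\sum y^2$.
3. Multiply the x and y values and sum up i.e., $\sum xy$.
4. Apply the formula
$$r = \frac{(\sum xy \times N) - (\sum x \times \sum y)}{\sqrt{\sum x^2 \times N - (\sum x)^2}\,\sqrt{\sum y^2 \times N - (\sum y)^2}}$$
OR
$$r = \frac{\sum xy - \dfrac{\sum x \times \sum y}{N}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \dfrac{(\sum y)^2}{N}}}$$
Note: This method is used when the items are small in number in both the series.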
b) When the deviations are taken from arithmetic mean:
1. Find out the arithmetic mean of x and y series.
2. Take the deviations of the observations of x and y series from their means and denote them dx and dy.
3. Multiply dx by dy, sum up the products of dx and dy and denote the total $\sum dx\,dy$.
4. Square the deviations of x and y series and denote them $dx^2$ and $dy^2$.
5. Sum up the squared deviations of x and y series and denote the totals $\sum dx^2$ and $\sum dy^2$.
6. Substitute the values obtained from the above steps in the following formula:
$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \times \sum dy^2}}$$
When deviations are taken from assumed mean:

When the deviations are taken from an assumed mean to calculate the coefficient of correlation, the following formula is used,
$$r = \frac{\sum dx\,dy - \dfrac{\sum dx \times \sum dy}{N}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}\,\sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}}$$
Where,
r = coefficient of correlation
dx = deviations of the items of x series from the assumed mean i.e., dx = (X - A)
dy = deviations of the items of y series from the assumed mean i.e., dy = (Y - A)
N = Number of items in the series
$\sum dx\,dy$ = sum of the products of the deviations of x and y series from their assumed means
$\sum dx^2$ = sum of the squared deviations of x series
$\sum dy^2$ = sum of the squared deviations of y series
$\sum dx$ = sum of the deviations of x series
$\sum dy$ = sum of the deviations of y series
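The assumed-mean formula gives the same r whatever values are assumed, because the correction terms $\sum dx \times \sum dy / N$ and $(\sum dx)^2 / N$ remove the effect of the chosen origin. A minimal sketch on hypothetical data follows; `r_assumed_mean` and the parameter names `a_x`, `a_y` (the assumed means of the x and y series) are introduced here for illustration.

```python
from math import sqrt

def r_assumed_mean(xs, ys, a_x, a_y):
    """Pearson's r from deviations about assumed means a_x and a_y."""
    n = len(xs)
    dx = [x - a_x for x in xs]
    dy = [y - a_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator

X = [10, 12, 15, 19, 24]   # hypothetical data
Y = [40, 41, 48, 60, 61]

r_assumed = r_assumed_mean(X, Y, a_x=15, a_y=48)   # convenient round values
r_raw = r_assumed_mean(X, Y, a_x=0, a_y=0)         # assumed mean 0: raw-value formula
r_actual = r_assumed_mean(X, Y, a_x=sum(X) / len(X), a_y=sum(Y) / len(Y))

print(round(r_assumed, 4), round(r_raw, 4), round(r_actual, 4))
```

Taking the assumed means as zero reproduces the direct (raw value) formula, and taking them equal to the actual means makes the correction terms vanish, which recovers the deviation formula.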
Correlation Coefficient of Bivariate Data
When there is a large number of observations in the two series, the data is classified into a two-way (bivariate) frequency distribution or correlation table. The formula used in the calculation of the coefficient of correlation is the same as the one used above (when deviations are taken from assumed mean). The only difference is that the deviations of x and y are to be multiplied with the corresponding frequencies.
Steps
1. Find the mid values of the x and y class intervals.
2. Take the step deviations of the x variable i.e., $dx = \dfrac{X - A}{i}$, where A is the assumed mean and i the class interval of the x series.
3. Take the step deviations of the y variable i.e., $dy = \dfrac{Y - A}{i}$, where A is the assumed mean and i the class interval of the y series.
4. Multiply the deviations of the x and y variables obtained in steps 2 and 3 with the corresponding frequency of each cell and note the obtained figure in the upper right hand corner of each cell.
5. Sum up all the values obtained in step 4 and get the total i.e., $\sum f\,dx\,dy$.
6. Sum up the products of dx with the corresponding frequencies (f) and get $\sum f\,dx$.
7. Sum up the products of dy with the corresponding frequencies (f) and get $\sum f\,dy$.
8. Square dx and multiply with the corresponding frequencies, then sum up and get $\sum f\,dx^2$.
9. Square dy and multiply with the corresponding frequencies, then sum up and get $\sum f\,dy^2$.
10. Substitute the values in the following formula to get the correlation coefficient.
$$r = \frac{\sum f\,dx\,dy - \dfrac{\sum f\,dx \times \sum f\,dy}{N}}{\sqrt{\sum f\,dx^2 - \dfrac{(\sum f\,dx)^2}{N}}\,\sqrt{\sum f\,dy^2 - \dfrac{(\sum f\,dy)^2}{N}}}$$
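The ten steps above can be sketched for a small two-way frequency table. Everything in the example is hypothetical: the mid values, the cell frequencies, the assumed means and the class intervals are invented for illustration, and `grouped_r` is a name introduced here. As a cross-check, the table is also expanded into individual pairs and the ordinary deviation formula is applied.

```python
from math import sqrt

def grouped_r(x_mid, y_mid, freq, a_x, a_y, i_x=1.0, i_y=1.0):
    """r for a two-way frequency table; freq[j][k] belongs to (y_mid[j], x_mid[k])."""
    dx = [(x - a_x) / i_x for x in x_mid]   # step deviations of x mid values
    dy = [(y - a_y) / i_y for y in y_mid]   # step deviations of y mid values
    n = s_fdx = s_fdy = s_fdx2 = s_fdy2 = s_fdxdy = 0.0
    for j, row in enumerate(freq):
        for k, f in enumerate(row):
            n += f
            s_fdx += f * dx[k]
            s_fdy += f * dy[j]
            s_fdx2 += f * dx[k] ** 2
            s_fdy2 += f * dy[j] ** 2
            s_fdxdy += f * dx[k] * dy[j]
    numerator = s_fdxdy - s_fdx * s_fdy / n
    denominator = sqrt(s_fdx2 - s_fdx ** 2 / n) * sqrt(s_fdy2 - s_fdy ** 2 / n)
    return numerator / denominator

# Hypothetical correlation table: columns are x classes, rows are y classes
x_mid = [5, 15, 25]
y_mid = [10, 20, 30]
freq = [[4, 2, 0],
        [1, 5, 2],
        [0, 3, 3]]

r_grouped = grouped_r(x_mid, y_mid, freq, a_x=15, a_y=20, i_x=10, i_y=10)

# Cross-check: expand the table into individual (x, y) pairs
pairs = [(x_mid[k], y_mid[j]) for j, row in enumerate(freq)
         for k, f in enumerate(row) for _ in range(f)]
xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
r_expanded = (sum((x - mx) * (y - my) for x, y in pairs)
              / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)))

print(round(r_grouped, 4), round(r_expanded, 4))
```

Dividing by the class interval only shortens the arithmetic; since r is independent of change of origin and scale, the result is the same with or without it.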
Coefficient of Correlation and Probable Error
3.
To know the reliability or significance of the value of Pearson's coefficient of correlation, probable error is used. According to Horace Secrist, "The probable error of the coefficient of correlation is an amount which, if added to or subtracted from the mean correlation coefficient, produces amounts within which the chances are even that a coefficient of correlation from a series selected at random will fall." The formula for calculating probable error is:
$$ \text{P.E.}(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$
Where, 0.6745 is a constant number,
r = Pearson's coefficient of correlation
N = Number of pairs of observations
The limits for the population correlation coefficient are:
r ± P.E.(r)
Functions of Probable Error:
a) If the value of r is less than the probable error, the value of r is not at all significant.
b) If the value of r is more than six times the probable error (r > 6 P.E.), the value of r is significant.
c) If the probable error is less than 0.3, the correlation should not be considered at all.
d) If the probable error is small, the correlation definitely exists.
Conditions for the use of Probable Error:
i) The number of items should be large enough. When the number of pairs of observations is small, the probable error may lead to fallacious conclusions.
ii) The data should follow a normal distribution, that is, a bell-shaped curve.
iii) The items in the sample must have been selected by the random sampling method and in an unbiased manner.
iv) The statistical measure for which the probable error is computed must have been calculated from a sample.

Example:
Find the probable error. Assume that the correlation coefficient is 0.8 and the number of pairs of samples is 25.
Solution: Here, r = 0.8 and N = 25. We know that
Probable Error = 0.6745 × (1 − r²)/√N
So, on putting the values:
Probable Error = 0.6745 × {(1 − (0.8)²)/√25}
= 0.6745 × {(1 − 0.64)/5}
= 0.6745 × (0.36/5)
= 0.0486
Therefore, the probable error is 0.0486.
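The same calculation can be checked with a short Python sketch. The function names `probable_error` and `interpret` are hypothetical. Two choices are assumptions of this sketch rather than statements from the text: the "inconclusive" label for values falling between the two rules, and the use of the absolute value of r so that negative coefficients are treated symmetrically.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of Pearson's r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def interpret(r, n):
    """Compare r with its probable error using rules a) and b) above.

    Assumption: |r| is used, and values between the two rules are
    labelled "inconclusive" (a label chosen for this sketch).
    """
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"


r, n = 0.8, 25
pe = probable_error(r, n)
print(round(pe, 4))     # 0.0486
print(r - pe, r + pe)   # limits r - P.E.(r) and r + P.E.(r)
print(interpret(r, n))  # significant
```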
4. Rank Correlation Coefficient
This is the third method in correlation. This method was developed by Charles Edward Spearman, a British psychologist, in the year 1904. It is used to ascertain the coefficient of correlation by assigning ranks to the items. This measure is useful in dealing with qualitative characteristics of items such as beauty, intelligence, morality, etc., which cannot be measured quantitatively as the calculation of Pearson's coefficient of correlation requires, but which can be ranked. Calculation of rank correlation is possible in the case of individual series only. The result obtained is an approximate one, because the actual data is not considered for calculation; only the assigned ranks are taken into consideration. The following formula is used to calculate the Rank Correlation Coefficient.
$$ R = 1 - \frac{6 \sum D^2}{N^3 - N} \quad \text{or} \quad R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$
Where,
R = Rank Correlation Coefficient
ΣD² = Sum of the squares of the differences of the two ranks
N = Number of paired observations
Steps
1. Assign the ranks to the x and y variables, taking the highest value as rank 1; continue this process until all the values are exhausted, and denote these columns as Rx and Ry respectively.
2. Take the deviations of the two ranks and denote them 'D'.
3. Square the D values and sum up; then we get ΣD².
4. Substitute the values in the above formula.
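The steps above can be sketched in Python as follows. This is a minimal illustration that assumes no two values in a series are equal; the function names `ranks_descending` and `rank_correlation` and the sample scores are hypothetical and not taken from the text.

```python
def ranks_descending(values):
    """Step 1: rank the values so the highest gets rank 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_correlation(x, y):
    """Steps 2 to 4: R = 1 - 6 * sum(D^2) / (N^3 - N) for untied data."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of D squared
    n = len(x)
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical scores of five items on two characteristics.
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 48, 41]
print(ranks_descending(x))     # [5, 4, 3, 2, 1]
print(ranks_descending(y))     # [5, 3, 4, 1, 2]
print(rank_correlation(x, y))  # 0.8
```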
Tied Ranks or Equal Ranks or Repeated Ranks
When two or more values are equal, they are called tied values. It is difficult to allot the rank in such a case, so we have to allot the average rank to the tied values. For example, if two individuals are placed in 5th place, they are each given the rank 5.5 (i.e., (5+6)/2), and the next will be 7. If three are ranked equal at the 5th place, they are given the rank 6 (i.e., (5+6+7)/3), which is the common rank to be assigned to each; and the next rank will be 8.
The formula used in this case is as follows:
$$ R = 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \dots\right\}}{N^3 - N} $$

Where, m = the number of items whose ranks are common (or tied)
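A rough Python sketch of the average-rank rule and the tie-corrected formula is given below. The function names and the sample data are hypothetical. The sketch adds one correction term (m³ − m)/12 for every group of tied values in either series, which is how the repeated terms in the formula are read here.

```python
from collections import Counter


def average_ranks(values):
    """Highest value gets rank 1; tied values share the average of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1          # first place occupied by v
        count = ordered.count(v)              # m: how many items share v
        rank_of[v] = first + (count - 1) / 2  # mean of first..first+count-1
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def rank_correlation_with_ties(x, y):
    """Tie-corrected rank correlation as in the formula above."""
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n ** 3 - n)


# Two items tied at 5th place share rank 5.5; the next item gets rank 7.
print(average_ranks([90, 80, 70, 60, 50, 50, 40]))
# Three items tied at 5th place share rank 6; the next item gets rank 8.
print(average_ranks([90, 80, 70, 60, 50, 50, 50, 40]))
# Hypothetical data with one tie in x.
print(rank_correlation_with_ties([10, 20, 20, 30], [1, 2, 3, 4]))
```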


Regression Line
Regression lines are drawn whenever two variables have a linear relationship. This aspect will be discussed elaborately in the next chapter.
Illustrations
Illustration-1
Following are the ranks obtained by 10 students in two subjects, Marketing and Management. Find the Rank Correlation Coefficient.
Rank in Marketing     1   2   3   4   5   6   7   8   9   10
Rank in Management    2   4   1   5   3   9   7   10  6   8
Solution: Calculation of Rank Correlation Coefficient
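Rank in           Rank in
Marketing (Rx)    Management (Ry)    D = Rx − Ry    D²
1                 2                  −1             1
2                 4                  −2             4
3                 1                  2              4
4                 5                  −1             1
5                 3                  2              4
6                 9                  −3             9
7                 7                  0              0
8                 10                 −2             4
9                 6                  3              9
10                8                  2              4
                                                    ΣD² = 40

$$ R = 1 - \frac{6 \sum D^2}{N^3 - N} $$
ΣD² = 40, N = 10
$$ R = 1 - \frac{6 \times 40}{10^3 - 10} = 1 - \frac{240}{1000 - 10} = 1 - \frac{240}{990} = 1 - 0.24 = 0.76 $$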
Illustration-2
Calculate Rank Correlation Coefficient for the data given below:
X    60   50   40   70   75   55   65   80
Y    40   70   60   55   65   75   50   45
Solution: Calculation of Rank Correlation Coefficient
X     Y     Rx    Ry    D = Rx − Ry    D²
60    40    5     8     −3             9
50    70    7     2     5              25
40    60    8     4     4              16
70    55    3     5     −2             4
75    65    2     3     −1             1
55    75    6     1     5              25
65    50    4     6     −2             4
80    45    1     7     −6             36
                                       ΣD² = 120

$$ R = 1 - \frac{6 \sum D^2}{N^3 - N} $$
ΣD² = 120, N = 8
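The text breaks off before the substitution is carried out. As a sketch of the remaining arithmetic, putting the totals above into the rank correlation formula gives a negative coefficient:

```latex
R = 1 - \frac{6 \times 120}{8^3 - 8}
  = 1 - \frac{720}{504}
  \approx 1 - 1.4286
  = -0.4286
```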
