Unit 3-1


CORRELATION

In a bivariate distribution, we may be interested in finding out whether there is any relationship between the two variables under study. Correlation is a statistical tool which studies the relationship between two variables, and correlation analysis involves the various methods and techniques used for studying and measuring the extent of the relationship between the two variables.

WHAT THEY SAY ABOUT CORRELATION — SOME DEFINITIONS AND USES

“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.” —Croxton and Cowden

“Correlation is an analysis of the covariation between two or more variables.” —A.M. Tuttle

“Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread and suggest to him the paths through which stabilising forces may become effective.” —W.A. Neiswanger

“The effect of correlation is to reduce the range of uncertainty of our prediction.” —Tippett

Two variables are said to be correlated if the change in one variable results in a corresponding change in the other variable.

8·1·1. Types of Correlation

(a) POSITIVE AND NEGATIVE CORRELATION

If the values of the two variables deviate in the same direction, i.e., if an increase in the values of one variable results, on the average, in a corresponding increase in the values of the other variable, or if a decrease in the values of one variable results, on the average, in a corresponding decrease in the values of the other variable, correlation is said to be positive or direct. Some examples of series of positive correlation are:

(i) Heights and weights.
(ii) The family income and expenditure on luxury items.
(iii) Amount of rainfall and yield of crop (up to a point).
(iv) Price and supply of a commodity, and so on.

On the other hand, correlation is said to be negative or inverse if the variables deviate in the opposite direction, i.e., if the increase (decrease) in the values of one variable results, on the average, in a corresponding decrease (increase) in the values of the other variable. Some examples of negative correlation are the series relating to:

(i) Price and demand of a commodity.
(ii) Volume and pressure of a perfect gas.
(iii) Sale of woolen garments and the day temperature, and so on.

(b) LINEAR AND NON-LINEAR CORRELATION

The correlation between two variables is said to be linear if, corresponding to a unit change in one variable, there is a constant change in the other variable over the entire range of the values. For example, let us consider the following data:

x: 1  2  3  4   5
y: 5  7  9  11  13

Thus, for a unit change in the value of x, there is a constant change, viz., 2, in the corresponding values of y. Mathematically, the above data can be expressed by the relation

y = 2x + 3

In general, two variables x and y are said to be linearly related if there exists between them a relationship of the form

y = a + bx … (*)

We know that (*) is the equation of a straight line with slope ‘b’ which makes an intercept ‘a’ on the y-axis [c.f. the y = mx + c form of the equation of a line]. Hence, if the values of the two variables are plotted as points in the xy-plane, we shall get a straight line. This can be easily checked for the example given above. Such phenomena occur frequently in the physical sciences, but in economics and the social sciences we very rarely come across data which give a straight line graph. The relationship between two variables is said to be non-linear or curvilinear if, corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate. In such cases, if the data are plotted in the xy-plane, we do not get a straight line. Mathematically speaking, the correlation is said to be non-linear if the slope of the plotted curve is not constant. Such phenomena are common in data relating to economics and the social sciences.
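The constant-rate condition above can be checked mechanically. The following is a minimal Python sketch (the code and variable names are illustrative, not part of the original text) using the five pairs from the example; it confirms that y changes by the constant amount 2 for every unit change in x and that each pair satisfies y = 2x + 3.

# Data from the example above: x = 1..5, y = 5, 7, 9, 11, 13.
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]

# Successive differences of y are all 2, the constant change per unit change in x.
diffs = [y[i + 1] - y[i] for i in range(len(y) - 1)]
print(diffs)                                            # [2, 2, 2, 2]

# Every pair also satisfies the linear relation y = 2x + 3.
print(all(yi == 2 * xi + 3 for xi, yi in zip(x, y)))    # True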

METHODS OF STUDYING CORRELATION

We shall confine our discussion to the methods of ascertaining only the linear relationship between two variables (series). The commonly used methods for studying the correlation between two variables are:

(i) Scatter diagram method.

(ii) Karl Pearson’s coefficient of correlation (Covariance method).

(iii) Two-way frequency table (Bivariate correlation method).

(iv) Rank method.

(v) Concurrent deviations method.

SCATTER DIAGRAM METHOD

Scatter diagram is one of the simplest ways of diagrammatic representation of a bivariate distribution and provides us with one of the simplest tools for ascertaining the correlation between two variables. Suppose we are given n pairs of values (x1, y1), (x2, y2), …, (xn, yn) of two variables X and Y. For example, if the variables X and Y denote the height and weight respectively, then the pairs (x1, y1), (x2, y2), …, (xn, yn) may represent the heights and weights (in pairs) of n individuals. These n points may be plotted as dots (·) in the xy-plane. (It is customary to take the dependent variable along the y-axis and the independent variable along the x-axis.) The diagram of dots so obtained is known as a scatter diagram.
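As a concrete illustration, a scatter diagram can be drawn in a few lines of Python. This is only a sketch, assuming the matplotlib library is available; the height and weight pairs are invented for illustration and are not taken from the text.

import matplotlib.pyplot as plt

# Hypothetical heights (independent variable) and weights (dependent variable).
heights = [150, 154, 158, 162, 166, 170, 174, 178]
weights = [52, 55, 57, 60, 64, 66, 70, 73]

# Independent variable on the x-axis, dependent variable on the y-axis.
plt.scatter(heights, weights)
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.title("Scatter diagram of heights and weights")
plt.show()
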
KARL PEARSON’S COEFFICIENT OF CORRELATION (COVARIANCE METHOD)

A mathematical method for measuring the intensity or the magnitude of the linear relationship between two variable series was suggested by Karl Pearson (1857-1936), the great British biometrician and statistician, and is by far the most widely used method in practice.
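As a computational aside (not part of the original text), the covariance method can be sketched in a few lines of Python, using the standard definition r = cov(x, y) / (σx σy) with divide-by-n (population) formulas; the helper function and data below are illustrative.

import math

def pearson_r(x, y):
    # Karl Pearson's coefficient by the covariance method:
    # r = cov(x, y) / (sd(x) * sd(y)), using divide-by-n formulas.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (sd_x * sd_y)

# The perfectly linear data from the earlier example give r = 1.
print(pearson_r([1, 2, 3, 4, 5], [5, 7, 9, 11, 13]))    # 1.0
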
Assumptions Underlying Karl Pearson’s Correlation Coefficient. The Pearsonian correlation coefficient r is based on the following assumptions:

(i) The variables X and Y under study are linearly related. In other words, the scatter diagram of the data will give a straight line.

(ii) Each of the variables (series) is being affected by a large number of independent contributory causes of such a nature as to produce a normal distribution. For example, the variables (series) relating to ages, heights, weights, supply, price, etc., conform to this assumption. In the words of Karl Pearson: “The sizes of the complex organs (something measurable) are determined by a great variety of independent contributing causes, for example, climate, nourishment, physical training and innumerable other causes which cannot be individually observed or their effects measured.” Karl Pearson further observes, “The variations in intensity of the contributory causes are small as compared with their absolute intensity and these variations follow the normal law of distribution.”

(iii) The forces so operating on each of the variable series are not independent of each other but are related in a causal fashion. In other words, a cause and effect relationship exists between the forces operating on the two variable series. If the operating forces are entirely independent of each other and not related in any fashion, then there cannot be any correlation between the variables under study. For example, the correlation coefficient between:

(a) the series of heights and incomes of individuals over a period of time,
(b) the series of marriage rate and the rate of agricultural production in a country over a period of time,
(c) the series relating to the size of the shoe and the intelligence of a group of individuals,

should be zero, since the forces affecting the two variable series in each of the above cases are entirely independent of each other. However, if in any of the above cases the value of r for a given set of data is not zero, then such correlation is termed chance correlation or spurious or non-sense correlation.
9·9. CORRELATION ANALYSIS Vs. REGRESSION ANALYSIS

1. Correlation literally means the relationship between two or more variables which vary in
sympathy so that the movements in one tend to be accompanied by the corresponding
movements in the other(s). On the other hand, regression means stepping back or returning
to the average value and is a mathematical measure expressing the average relationship
between the two variables.
2. The correlation coefficient rxy between two variables x and y is a measure of the direction and degree of the linear relationship between the two variables, and this relationship is mutual. It is symmetric, i.e., ryx = rxy, and it is immaterial which of x and y is the dependent variable and which is the independent variable.
Regression analysis aims at establishing the functional relationship between the two variables under study and then using this relationship to predict or estimate the value of the dependent variable for any given value of the independent variable. It also reflects upon the nature of the variables, i.e., which is the dependent variable and which is the independent variable. Regression coefficients are not symmetric in x and y, i.e., byx ≠ bxy.
3. Correlation need not imply a cause and effect relationship between the variables under study. [For details, see § 8·1·2.] Regression analysis, however, clearly indicates the cause and effect relationship between the variables. The variable corresponding to cause is taken as the independent variable and the variable corresponding to effect is taken as the dependent variable.
4. The correlation coefficient rxy is a relative measure of the linear relationship between x and y and is independent of the units of measurement. It is a pure number lying between −1 and +1.
On the other hand, the regression coefficients byx and bxy are absolute measures representing the change in the value of the variable y (x) for a unit change in the value of the variable x (y). Once the functional form of the regression curve is known, by substituting the value of the independent variable we can obtain the value of the dependent variable, and this value will be in the units of measurement of that variable.
5. There may be non-sense correlation between two variables which is due to pure chance and has no practical relevance, e.g., the correlation between the size of the shoe and the intelligence of a group of individuals. There is no such thing as non-sense regression.
6. Correlation analysis is confined only to the study of linear relationship between the
variables and, therefore, has limited applications. Regression analysis has much wider
applications as it studies linear as well as non-linear relationship between the variables.
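A short numerical sketch (with made-up data, not from the text) may make points 2 and 4 concrete. Assuming the usual definitions byx = cov(x, y)/var(x) and bxy = cov(x, y)/var(y), the Python fragment below shows that the correlation coefficient is symmetric, that the two regression coefficients generally differ, and that their product equals the square of r.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance; cov(a, a) is the variance of a.
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

x = [2, 4, 6, 8, 10]       # illustrative data
y = [3, 7, 8, 12, 15]

r_xy = cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5
r_yx = cov(y, x) / (cov(y, y) * cov(x, x)) ** 0.5
b_yx = cov(x, y) / cov(x, x)    # regression coefficient of y on x
b_xy = cov(x, y) / cov(y, y)    # regression coefficient of x on y

print(r_xy == r_yx)                              # True: correlation is symmetric
print(b_yx, b_xy)                                # 1.45 and about 0.68: not equal
print(abs(r_xy ** 2 - b_yx * b_xy) < 1e-12)      # True: b_yx * b_xy = r^2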
