QMM 1

This document discusses correlation analysis and different correlation coefficients. [1] Correlation analysis studies the relationship between two variables and determines the degree of association between them. [2] The Pearson correlation coefficient measures the strength of the linear relationship between two quantitative variables. [3] Spearman's rank correlation coefficient is used to measure the association between two variables when only the ranking of their values is known.

Uploaded by

Ravi Reddy
Copyright
© Attribution Non-Commercial (BY-NC)

Correlation Analysis

• Correlation is a statistical tool which studies the relationship between two
variables, and Correlation Analysis involves various methods and techniques used
for studying and measuring the extent of the relationship between two variables.

• Correlation Analysis is a statistical procedure by which we can determine the
degree of association or relationship between two or more variables.

Statistical Relationship

Examples: relation between height & weight; price & demand; age & height; radius & area
of a circle.

Two variables are said to be correlated if a change in the value of one variable is
accompanied by a change in the value of another variable.

Such a relationship is called Statistical Relationship.

When both the variables in the bi-variate data are quantitative, we use the term
Correlation analysis to describe the methods used to find out whether a relationship exists.

Croxton and Cowden defined correlation as

“When the relationship is of a quantitative nature, the appropriate statistical tool for
discovering and measuring the relationship and expressing it in a brief formula is
known as Correlation.”

According to the Statistician A. M. Tuttle

“Correlation is an analysis of the covariation between two or more variables.”

Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)

245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Graphical Presentation

• House price model: scatter plot

Types of Relationships

• Linear relationships

• Curvilinear relationships

• Strong relationships

• Weak relationships

• No relationship

UNIVARIATE & BIVARIATE DISTRIBUTION

• In a bivariate population we are interested to know whether there exists some sort
of functional relationship between the two variables involved.

• Does a change in one variable bring about a change in the other variable?

• If yes, what is the nature of this relationship?

COVARIANCE

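• Covariance is an absolute measure of the joint variation of two variables X & Y,
denoted by Cov(X, Y) and defined as

$$\mathrm{Cov}(X,Y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the means of X and Y. Equivalently,

$$\mathrm{Cov}(X,Y) = \frac{1}{n}\left\{\sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)\right\}$$

• The sign of the covariance shows the direction of the linear relationship between two
variables; its magnitude depends on the units of measurement, so it cannot by itself
show the strength of the relationship.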
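
As a rough illustration, the two covariance formulas can be checked against each other on the house price sample given earlier. This is only a sketch; the function names are illustrative and not part of the original notes.

```python
# House price sample from the notes: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def covariance_definition(x, y):
    """Cov(X, Y) as the mean product of deviations from the means (divisor n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def covariance_shortcut(x, y):
    """Cov(X, Y) from raw sums: (1/n) * (sum(xy) - sum(x) * sum(y) / n)."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / n


cov_a = covariance_definition(square_feet, price)
cov_b = covariance_shortcut(square_feet, price)
print(cov_a, cov_b)
```

Both forms give the same value. It is positive, so price tends to rise with floor area, but the number itself is in units of (square feet × $1000s) and changes if either unit changes.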

SCATTER DIAGRAM OR DOT DIAGRAM METHOD

• Scatter diagram is a graphical method of showing the correlation between the two
variables x & y.

• The scatter diagram may indicate both degree and the type of correlation.

• From scatter diagram, we can form a fairly good, though rough idea about the
relationship between the two variables.

Scatter Plot

A scatter plot (or scatter diagram) can be used to show the relationship between two
variables.

Volume per day    Cost per day

23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

Advantage & Disadvantage of Scatter Diagram

• Readily comprehensible and enables us to form a rough idea of the nature of the
relationship between the two variables

• Not influenced by extreme observations

• Not a suitable method if the number of observations is very large

• Provides only a rough measure of correlation, which can differ from person to person

Co-efficient of Correlation r

It gives the degree of association or relationship between two variables.

The relationship between two variables such that a change in one variable results in a
positive or negative change in the other variable, and a greater change in one variable
results in a correspondingly greater or smaller change in the other variable, is known
as Correlation.

Coefficient of Correlation

Measures the strength of the linear relationship between two quantitative variables:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
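
A minimal sketch of this formula applied to the house price sample from these notes. The helper name `pearson_r` is illustrative, not from the source.

```python
import math

# House price sample from the notes: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def pearson_r(x, y):
    """Product-moment correlation: sum of cross-deviations over the root of
    the product of the two sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(square_feet, price)
print(round(r, 4))  # about 0.7621
```

For this sample r is about 0.76, a fairly strong positive linear association between floor area and price.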

Application of Correlation analysis

• Correlation analysis is used to measure the strength of the association (linear
relationship) between two variables

– Correlation is only concerned with strength of the relationship

– No causal effect is implied with correlation


Properties of Co-efficient of correlation

1. It is a measure of the closeness of a fit in a relative sense

2. r lies between -1 & +1

3. The correlation is perfect negative when r = -1

4. The correlation is perfect positive when r = +1

5. If r = 0 then there is no linear correlation. This does not by itself mean that the
variables are independent, since they may still be related in a non-linear way

6. r is a pure number and is not affected by a change of origin & scale

7. Relative measure of association between two or more variables

Scatter Plots of Data with Various Correlation Coefficients

Karl Pearson’s Coefficient of Correlation

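• Karl Pearson (1857-1936), a great statistician, provided a formula for measuring the
magnitude of the linear correlation between two variables.

• $\rho(X,Y) = r_{xy} = \dfrac{\mathrm{Cov}(x,y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$

• $\rho(X,Y) = r_{xy} = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2\,\sum (y-\bar{y})^2}}$, where $\bar{x}$ and $\bar{y}$ are the means of X and Y.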

Karl Pearson’s Coefficient of Correlation contd.

$$\rho(X,Y) = r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left\{n\sum x^2 - \left(\sum x\right)^2\right\}\left\{n\sum y^2 - \left(\sum y\right)^2\right\}}}$$

The above formula saves a lot of computational labour.

It also reduces the error due to computation & rounding off.

Other forms also can be used

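$$\rho(X,Y) = r_{xy} = \frac{\sum dx\,dy}{\sqrt{\sum dx^2\,\sum dy^2}}, \qquad dx = x-\bar{x},\quad dy = y-\bar{y}$$

$$\rho(X,Y) = r_{xy} = \frac{\sum dx\,dy}{n\,\sigma_x\,\sigma_y}$$

Here $\bar{x}$, $\bar{y}$ are the means and $\sigma_x$, $\sigma_y$ the standard deviations of X and Y.
Note that the numerator is the sum of the products dx·dy, not the product of the sums.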

Another Formula called short cut method

$$\rho(X,Y) = r_{xy} = \frac{n\sum dx\,dy - \sum dx \sum dy}{\sqrt{\left\{n\sum dx^2 - \left(\sum dx\right)^2\right\}\left\{n\sum dy^2 - \left(\sum dy\right)^2\right\}}}$$

where, in this formula, dx = (x - a) and a is the assumed mean for X,

and dy = (y - b) and b is the assumed mean for Y.
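
The short cut method can be sketched as follows on the house price sample from these notes. The assumed means a = 1700 and b = 280 are arbitrary choices made for this illustration; any other values give the same r, which is the point of the method.

```python
import math

# House price sample from the notes: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def r_short_cut(x, y, a, b):
    """Pearson r from deviations taken about assumed means a (for X) and b (for Y)."""
    n = len(x)
    dx = [xi - a for xi in x]
    dy = [yi - b for yi in y]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = math.sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator


r_assumed = r_short_cut(square_feet, price, a=1700, b=280)  # hypothetical assumed means
r_other = r_short_cut(square_feet, price, a=0, b=0)         # reduces to the raw-sums formula
print(round(r_assumed, 4), round(r_other, 4))
```

Choosing round assumed means close to the data keeps the deviations small, which is what makes the method convenient for hand calculation.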

Nature of Relationship

• Positive correlation means that low values of one variable are associated with low
values of the other, and high values of one variable are associated with high
values of the other.

• Negative correlation means that low values of one variable are associated with
high values of the other, and high values of one variable are associated with low
values of the other.

• The degree of correlation between two variables is measured by the Pearsonian
(product moment) correlation coefficient (r).

• The nearer “r” is to +1 or –1, the stronger the relationship.

Spearman’s Rank Correlation Coefficient R

• It is applied in problems in which the data cannot be measured quantitatively but
qualitative assessment is possible, such as beauty, honesty etc.

• In this case the best individual is given rank number 1, the next 2 and so on.

• $R = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$

where $D^2$ is the square of the difference of corresponding ranks
and n is the number of pairs of observations.
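
A minimal sketch of the rank correlation formula for untied ranks. The two rank lists below are hypothetical (say, five contestants ranked by two judges) and are not taken from the source.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), valid when there are no tied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five contestants.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

R = spearman_from_ranks(judge_a, judge_b)
print(R)  # sum of D^2 is 4, so R = 1 - 24/120
```

Identical rankings give R = +1 and exactly reversed rankings give R = -1.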

Spearman’s Rank Correlation Coefficient When Ranks are Tied or Repeated

• $R = 1 - \dfrac{6\left[\sum D^2 + \frac{p^3 - p}{12} + \frac{q^3 - q}{12} + \cdots\right]}{n(n^2 - 1)}$

where p, q, ... are the number of times a value is repeated.

Tie Rank Procedure

• Suppose an item is repeated twice at rank 5; then the common rank assigned to both
items (which would occupy ranks 5 & 6) is 5.5, i.e. the average of 5 & 6,

• and the next rank assigned will be 7.

• If an item is repeated three times at rank 2, then the common rank assigned to each
value will be the average of 2, 3 & 4 = 3, and the next rank assigned will be 5. Then
the correction formula will be used to calculate R.
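
The tie procedure and the correction formula can be sketched together as below. This is an illustrative implementation under the assumption that the largest value receives rank 1; the function names are not from the source.

```python
from collections import Counter


def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the average
    of the rank positions they occupy."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_with_ties(x, y):
    """Rank correlation with the (m^3 - m)/12 correction for each tied group."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # One correction term per group of m equal values, in either series.
    correction = sum(
        (m ** 3 - m) / 12
        for series in (x, y)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


print(average_ranks([50, 40, 40, 30]))      # two-way tie at rank 2
print(average_ranks([90, 80, 80, 80, 70]))  # three-way tie at rank 2
```

The second call reproduces the case described above: three items tied at rank 2 each receive rank 3, and the next item receives rank 5.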

Spearman’s Rank Correlation Coefficient

• It is simpler to understand and easier to calculate than Karl Pearson’s method.

• It is useful for qualitative data such as beauty, honesty, efficiency etc.

• It is a useful method when the actual data is not given but only ranks are given.

• Limitations

– It can’t be used for a grouped frequency distribution.

– It is not as accurate as Pearson’s coefficient.

– It can’t be used in continuous series.

– When the number of items is more than 30 and ranks are not given, it takes more
time and therefore can’t be used conveniently.

Quiz

• State the nature of the following correlation

• (positive, Negative or no correlation)


• 1. The amount of rainfall & Yield of crops

• 2. The colour of a saree and the intelligence of the girl wearing it

• 3. Age of the life insurance applicant & the premium of insurance

• 4. Demand for goods and their prices under normal time

• 5. Production of pig iron and soot contents in Durgapur

• 6. Unemployment index and the purchasing power of the common man

PROBABLE ERROR IN CORRELATION

• The PE of r, helps in interpreting its value.

• Since r is calculated from the sample data, it is subject to errors of sampling.

• So, from interpretation point of view PE of r is very useful.

• Probable error of the co-efficient of correlation:

• $PE = \dfrac{2(1 - r^2)}{3\sqrt{n}}$ (approximately)

• or $PE = 0.6745 \times \dfrac{1 - r^2}{\sqrt{n}}$

• where n is the number of pairs of observations

• and r is the coefficient of correlation.

Properties of Probable Error

• If |r| < 6·PE then r is not significant; in particular, if |r| < PE there is no
evidence of correlation.

• If |r| > 6·PE then r is significant; correlation exists.

• By adding the value of PE to r and subtracting it from r, we get respectively the
upper & lower limits within which the r in the population can be expected to lie.

• Correlation of the population = r ± PE

• Thus PE is used for testing the reliability of the value of r.

• Standard Error = $(1 - r^2)/\sqrt{n}$
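
A small sketch of the probable error test stated above. The values r = 0.7 and n = 5 are the ones used in the worked example elsewhere in these notes; the function names are illustrative.

```python
import math


def probable_error(r, n):
    """PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def standard_error(r, n):
    """SE = (1 - r^2) / sqrt(n); PE is 0.6745 times this."""
    return (1 - r ** 2) / math.sqrt(n)


def is_significant(r, n):
    """Rule from the notes: r is significant when |r| exceeds 6 * PE."""
    return abs(r) > 6 * probable_error(r, n)


r, n = 0.7, 5
pe = probable_error(r, n)
print(round(pe, 4), round(r / pe, 2), is_significant(r, n))
print("limits:", round(r - pe, 4), round(r + pe, 4))
```

With only five pairs, r = 0.7 is a little over four and a half times its probable error, so it falls short of the 6·PE threshold.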

Conditions for the use of PE


• The sample must have been taken out in an unbiased manner and the individual
items must be independent.

• The whole data is symmetrical and gives a normal frequency curve (Bell Shaped
Curve)

• The statistical measure for which the PE is computed, must have been calculated
from the sample.

• The items in two series should not be independent of each other.

Determination Coefficient

• The Coefficient of Determination, $r^2$, is the proportion of the total variation in
the dependent variable Y that is explained or accounted for by the variation in the
independent variable X.

– The coefficient of determination is the square of the coefficient of
correlation, and ranges from 0 to 1.

Co-efficient of Determination $r^2$

• It is the square of the coefficient of correlation.

• The “$r^2$” is preferred to the “r”, because it gives the proportion of variation in
the dependent variable which is explained by a change in the independent variable.

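• Ex.: A student calculates the value of r as 0.7 when the value of n is 5 and
concludes that r is highly significant. Is he correct?

• $PE = 2(1 - 0.7^2)/(3\sqrt{5}) = 0.34/2.24 = 0.15$

• $r/PE = 0.7/0.15 = 4.67$, so $r = 4.67 \times PE$. Since r is less than 6·PE, r is
not significant and the student is not correct.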

REGRESSION ANALYSIS

Regression is the measure of the average relationship between two or more variables
in terms of the original units of the data. (Blair)

Regression Analysis attempts to establish the nature of the relationship between
variables, that is, to study the functional relationship between the variables and
thereby provide a mechanism for prediction or forecasting. (Ya-Lun Chou)

Regression Analysis is a statistical device with the help of which we can estimate or
predict the unknown values of one variable from the known values of the other
variable.

The variable which is used to predict the variable of interest is called Independent
variable, generally denoted as X and the variable we are trying to predict is called
as Dependent Variable generally denoted as Y.

X is the regressor, predictor or explanatory variable & Y is the regressed or explained
variable.

Regression means to return or to go back, so it implies the act of returning to or
going back to.

Natural phenomena generally have a tendency to return to normal.

In statistics, the term regression is used to denote a backward tendency, which means
going back to the average or normal.

Sir Francis Galton used this term in the study of heredity (regression towards
mediocrity).

Regression Analysis

• Purpose: to determine the regression equation; it is used to predict the value of the
dependent variable (Y) based on the independent variable (X).

• Procedure: select a sample from the population and list the paired data for each
observation; draw a scatter diagram to give a visual portrayal of the relationship;
determine the regression equation.

• Y = a + bX, where a is the Y-intercept and b is the slope of the regression line.
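
The notes do not say how a and b are obtained. A common choice, assumed here, is the least squares fit, where b is the sum of cross-deviations divided by the sum of squared deviations of X, and a is chosen so the line passes through the two means. A sketch on the house price sample from these notes:

```python
# House price sample from the notes: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares_line(x, y):
    """Return (a, b) for the line Y = a + bX fitted by least squares (an assumption;
    the source does not name the fitting method)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares_line(square_feet, price)
predicted = a + b * 2000  # predicted price ($1000s) for a hypothetical 2000 sq ft house
print(round(a, 3), round(b, 5), round(predicted, 2))
```

Under this fit, each additional square foot is associated with an increase of about 0.11 thousand dollars in price, within the range of areas observed in the sample.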

Regression Line Assumptions

• For each value of X, there is a group of Y values, and these Y values are normally
distributed.
• The means of these normal distributions of Y values all lie on the straight line of
regression.

• The standard deviations of these normal distributions are equal.

• The Y values are statistically independent. This means that in the selection of a
sample, the Y values chosen for a particular X value do not depend on the Y values
for any other X values.

UTILITY OF REGRESSION ANALYSIS

1. The cause & effect relations are indicated from the study of regression analysis.

2. It establishes the rate of change in one variable in terms of the changes in another
variable.

3. It is useful in economic analysis as a regression equation can determine an increase
in the cost of living index for a particular increase in the general price level.

4. It helps in prediction and thus it can estimate the values of unknown quantities.

5. It enables us to study the nature of relationship between the variables.

6. It helps in determining the coefficient of correlation, as $r = \pm\sqrt{b_{yx}\, b_{xy}}$,
where r takes the common sign of the two regression coefficients.

7. It can be useful to all natural, social and physical sciences, where the data are in
functional relationship.
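
The relation between r and the two regression coefficients in point 6 can be checked numerically. This sketch assumes least squares coefficients (Y on X and X on Y) and uses the house price sample from these notes.

```python
import math

# House price sample from the notes: square feet (X) and price in $1000s (Y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(price) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price))
s_xx = sum((x - mean_x) ** 2 for x in square_feet)
s_yy = sum((y - mean_y) ** 2 for y in price)

b_yx = s_xy / s_xx  # regression coefficient of Y on X
b_xy = s_xy / s_yy  # regression coefficient of X on Y

# r is the geometric mean of the two coefficients, carrying their common sign.
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
r_direct = s_xy / math.sqrt(s_xx * s_yy)
print(round(b_yx, 5), round(b_xy, 5), round(r_from_b, 4))
```

The two coefficients differ greatly in size because they carry different units, yet their product is the unit-free $r^2$.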

RELATION BETWEEN REGRESSION ANALYSIS AND CORRELATION ANALYSIS

1. Correlation Analysis: It is the relationship between two or more variables.
   Regression Analysis: Regression means returning to the average value.

2. Correlation Analysis: r between X & Y is a measure of the direction & degree of the
   linear relationship.
   Regression Analysis: byx & bxy are mathematical measures expressing the average
   relationship between X & Y.

3. Correlation Analysis: It is symmetric in X & Y: ryx = rxy.
   Regression Analysis: The coefficients are not symmetrical: in general byx ≠ bxy.

4. Correlation Analysis: It indicates the degree of association.
   Regression Analysis: It is used to forecast the nature of the dependent variable
   when the independent variable is known.

5. Correlation Analysis: It is a relative measure and is independent of the units of
   measurement.
   Regression Analysis: Regression coefficients are absolute measures of the
   relationship between two or more variables.

6. Correlation Analysis: It does not imply a cause & effect relationship between the
   variables under study.
   Regression Analysis: It indicates the cause & effect relationship between the
   variables. The variable corresponding to cause is taken as the independent variable,
   whereas the one corresponding to effect is taken as the dependent variable.

7. Correlation Analysis: r does not reflect upon the nature of the variables.
   Regression Analysis: It estimates the value of the dependent variable for any given
   value of the independent variable.

8. Correlation Analysis: It has limited application as it is confined to the study of
   the linear relationship between two variables.
   Regression Analysis: It has wider applications as it also studies non-linear
   relationships between the variables.

Quantitative Approaches to Forecasting

• Quantitative methods are based on an analysis of historical data concerning one or
more time series.

• A time series is a set of observations measured at successive points in time or over
successive periods of time.

• If the historical data used are restricted to past values of the series that we are
trying to forecast, the procedure is called a time series method.

• Time Series Analysis

• If the historical data used involve other time series that are believed to be related
to the time series that we are trying to forecast, the procedure is called a causal
method.
More Definitions

• A time series consists of statistical data in chronological order. (Croxton & Cowden)

• A set of data depending on the time is called a time series. (Kenny)

• A time series may be defined as a collection of magnitudes belonging to
different time periods, of some variable or composite of variables, such as
production of steel, per capita income, gross national product, price of tobacco or
index of industrial production.

• It reflects the dynamic pace of the movements of a phenomenon over a period of
time.

• Most of the series relating to Economics, Business, and Commerce are all time
series spread over a period of time.

Components of a Time Series

• The trend component accounts for the gradual shifting of the time series over a
long period of time.

• ( Secular Trend) Trend is either upward or downward, generally smooth long term
tendency.

• Any regular pattern of sequences of values above and below the trend line is
attributable to the cyclical component of the series. (5-7, 7-9)

• Cyclic variations are the oscillatory movements in a time series due to ups and
downs recurring after a period greater than a year. They may not be uniformly periodic.

• The seasonal component of the series accounts for regular patterns of variability
within certain time periods.

• The seasonal variation may be attributed to causes resulting from natural
forces and social customs and traditions. Seasonal variations are the results of
factors which rise and fall uniformly and regularly in magnitude.

• These variations usually repeat themselves within a period of less than one year.

• The irregular component of the series is caused by short-term, unanticipated and
non-recurring factors that affect the values of the time series.

• One cannot attempt to predict its impact on the time series in advance.

• Random variations are accidental changes which are purely random, unforeseen
and unpredictable, e.g. earthquakes, wars, floods & droughts.

• Normally, they are short-term variations, but sometimes their effect is so intense
that they may give rise to new cyclical or other movements.

Utility of a Time Series

• 1. Analysis: It helps in the analysis of the past behaviour of a variable.

– Analysis discloses the effect of various factors on the variable and helps in
predicting the future.

• 2. Forecasting: Analysis becomes the base for predicting the future behaviour of the
variable.

– All five-year plans are based on analysis of past performances.

• 3. Evaluation: It helps in evaluating progress.

• 4. Comparison: Comparative studies are possible when data are available
chronologically.

• 5. Approximation: It can help to provide approximate indicators.

Analysis of a time series

• 1. Identifying or determining the various forces or influences whose interaction
produces the variations in the time series.

• 2. Isolating, studying, analysing and measuring them independently, i.e. by
holding other things constant.

• Time series analysis is of great importance not only to a businessman or an
economist but also to people working in various disciplines in the natural, social and
physical sciences.

Mathematical models for a Time Series

• The following are the two models commonly used for the decomposition of a time
series into its components.

• 1. Additive Model: Y = T + S + C + I

– This model assumes that the observed value is the sum of the four components
of the time series: trend (T), seasonal (S), cyclical (C) and irregular (I).

– All components operate independently of one another; the behaviour of the
components is additive in nature.

• 2. Multiplicative Model: Y = T*S*C*I

– The observed value is obtained by multiplying the trend T by the rates of the three
other components.

– This model assumes that the components, although due to different causes,
are not necessarily independent and can affect one another; their behaviour is
multiplicative in nature.

– In practice, the additive model is rarely used.

– Most of the time series related to economic and business phenomena
conform to the multiplicative model.
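
A brief sketch of how the two models combine the components for a single period. All component values below are hypothetical. In the additive model S, C and I are amounts in the same units as Y; in the multiplicative model they are ratios around 1.

```python
import math


def additive_model(t, s, c, i):
    """Y = T + S + C + I: every component is in the units of Y."""
    return t + s + c + i


def multiplicative_model(t, s, c, i):
    """Y = T * S * C * I: T is in the units of Y; S, C and I are ratios."""
    return t * s * c * i


# Hypothetical values for one period.
y_add = additive_model(200.0, 15.0, -8.0, 3.0)
y_mul = multiplicative_model(200.0, 1.075, 0.96, 1.015)

# Taking logarithms turns the multiplicative model into an additive one.
log_sum = math.log(200.0) + math.log(1.075) + math.log(0.96) + math.log(1.015)
print(y_add, round(y_mul, 2), math.isclose(math.log(y_mul), log_sum))
```

Because the logarithm of a product is a sum of logarithms, a multiplicative series can be analysed with additive tools after a log transformation.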

Editing of a Time Series

• 1. Time Variation: When data are available on a monthly basis.

• All months do not have the same number of days. So each monthly total is divided by
the number of days in that month and then multiplied by 365/12, the average number of
days in a month.

• 2. Population Change:

• 3. Price Change : Current values & Real values

• 4. Comparability :
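
The time variation adjustment in point 1 can be sketched as follows. The monthly totals are hypothetical: they were built from a constant rate of 10 units per day in a non-leap year, so that the effect of the adjustment is easy to see.

```python
# Days per month in a non-leap year.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
AVERAGE_DAYS = 365 / 12


def calendar_adjust(monthly_totals, days_in_month=DAYS_IN_MONTH):
    """Divide each monthly total by the days in that month, then multiply by
    the average month length (365/12)."""
    return [total / days * AVERAGE_DAYS
            for total, days in zip(monthly_totals, days_in_month)]


# Hypothetical raw totals: 10 units per day, so they differ only because of month length.
raw_totals = [10 * days for days in DAYS_IN_MONTH]
adjusted = calendar_adjust(raw_totals)
print(raw_totals[:3], [round(v, 2) for v in adjusted[:3]])
```

The raw totals vary from 280 to 310 purely because of month length, while the adjusted values are all equal, so any remaining movement in a real series reflects the variable itself.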

Index Numbers

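Consider the following data:

(Rs/Mtr)     1994  1995  1996  1997  1998  1999  2000

Cotton         10    12    15    18    20    24    28

Polyester      30    35    39    42    45    48    58

Cotton price has gone from 10 to 28, i.e. up by Rs 18.

Polyester price has gone from 30 to 58, i.e. up by Rs 28.

So, in absolute terms, polyester appears to be becoming costlier than cotton. In
relative terms, however, cotton rose by 180% (28/10 = 2.8) while polyester rose by about
93% (58/30 ≈ 1.93); index numbers are built to make this kind of relative comparison.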
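
A short sketch of price relatives (price in a given year as a percentage of the base-year price) for the cotton and polyester data above, taking 1994 as the base year. The function name is illustrative.

```python
years = [1994, 1995, 1996, 1997, 1998, 1999, 2000]
cotton = [10, 12, 15, 18, 20, 24, 28]     # Rs per metre, from the table
polyester = [30, 35, 39, 42, 45, 48, 58]  # Rs per metre, from the table


def price_relatives(prices, base_position=0):
    """Each price as a percentage of the base-period price, rounded to 2 places."""
    base = prices[base_position]
    return [round(100 * p / base, 2) for p in prices]


cotton_index = price_relatives(cotton)
polyester_index = price_relatives(polyester)
for year, c, p in zip(years, cotton_index, polyester_index):
    print(year, c, p)
```

By 2000 the cotton index stands at 280 against about 193 for polyester, so cotton prices rose faster in relative terms even though the rupee increase was smaller.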

CHARACTERISTICS OF INDEX NUMBER

From definitions, important characteristics can be summarized:

• Expressed in percentage to measure the relative change; however, the sign (%) is
not used.

• Absolute numbers (free from units)

• Specialized averages (averages are used to compare two or more series which
are expressed in the same units; however, indices can be used when units are
different)

• Measure the effect of change over a period of time

• Measure changes not capable of direct measurement
(e.g. cost of living, price level, business activity etc.)

• Tools to measure relative change

USES OF INDEX NUMBERS

As economic barometers

Help us in framing suitable policies

Helpful in determining Trends and Tendencies

Useful in deflating

Index numbers are used to measure the purchasing power of Money

TYPES OF INDEX NUMBERS

Price Index (It measures the changes in prices of items between two points of time)

Quantity Index (It measures the changes in physical volume of goods produced
or consumed)

Value Index (It measures the change in actual value between the base and the
given period)

Special Purpose Index (Consumer Price Index, PPI, Sensex, Dow Jones Industrial
Average etc.)

Process of construction of Index Numbers

1. Definition of Purpose (Objectives)

2. Selection of base period (Reference Point may be year, month)

3. Selection of number of items (Neither too small, nor too big)

4. Selection of source of data (Reliable, Correct, Relevant )

5. Price Quotation (Prices vary from place to place, shop to shop so selection of
cities, shops and persons for price quote)

6. Choice of an average

7. Selection of an appropriate Method

LIMITATION OF INDEX NUMBERS

1. Since index numbers are based on samples, they cannot represent all items.

2. Index numbers are constructed from deliberately selected samples, which may
introduce errors. (Not random sampling)

3. Approximate indicators

4. Quality is assumed to be the same

5. A large number of methods are in practice, which may result in different values

Methods of constructing Index Numbers

1. Simple Index Numbers (Un-weighted )

• A. Price Relatives ( Simple Index Number)

• B. Simple Aggregative Method

• C. A simple average of price relatives

2. Composite Index Numbers (Weighted)

• A. Weighted aggregate index


• B. Laspeyres index

• C. Paasche index
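
The notes name the Laspeyres and Paasche indices without giving formulas. The sketch below uses the standard textbook definitions: Laspeyres weights prices by base-period quantities, and Paasche weights them by current-period quantities. The prices and quantities are hypothetical.

```python
def laspeyres_index(p0, p1, q0):
    """100 * sum(p1 * q0) / sum(p0 * q0): base-period quantities as weights."""
    return 100 * sum(a * q for a, q in zip(p1, q0)) / sum(b * q for b, q in zip(p0, q0))


def paasche_index(p0, p1, q1):
    """100 * sum(p1 * q1) / sum(p0 * q1): current-period quantities as weights."""
    return 100 * sum(a * q for a, q in zip(p1, q1)) / sum(b * q for b, q in zip(p0, q1))


# Hypothetical two-item basket (not from the source).
base_prices = [10, 30]
current_prices = [12, 33]
base_quantities = [5, 2]
current_quantities = [4, 3]

laspeyres = laspeyres_index(base_prices, current_prices, base_quantities)
paasche = paasche_index(base_prices, current_prices, current_quantities)
print(round(laspeyres, 2), round(paasche, 2))
```

The two indices differ only in the weights used, which is one reason different methods can give different values for the same price data.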
