QMM 1
Statistical Relationship
Examples of related variables: height & weight; price & demand; age & height; radius & area of a circle.
Two variables are said to be correlated if a change in the value of one variable is accompanied by a change in the value of the other variable.
When both variables in the bivariate data are quantitative, we use the term correlation analysis to describe the methods used to find out whether a relationship exists.
(Y) (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Graphical Presentation
Types of Relationships
No relationship
• In a bivariate population we are interested to know whether there exists some sort
of functional relationship between the two variables involved.
• Does a change in one variable bring about a change in the other variable or not?
COVARIANCE
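• The covariance measures the direction of the linear relationship between two variables; its magnitude depends on the units of measurement, so on its own it is not a unit-free measure of strength.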
• Scatter diagram is a graphical method of showing the correlation between the two
variables x & y.
• The scatter diagram may indicate both degree and the type of correlation.
• From scatter diagram, we can form a fairly good, though rough idea about the
relationship between the two variables.
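As a rough illustration of the covariance bullet above, the sketch below computes the sample covariance of the ten (Y, X) pairs listed earlier. The function name `sample_covariance` and the n − 1 divisor are assumptions made for this example; the slides do not say which divisor is intended.

```python
# (Y, X) pairs copied from the data table near the start of these notes.
# The slides do not name the two variables, so they stay as y and x here.
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor (an assumption of this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return cross / (n - 1)


cov_xy = sample_covariance(x, y)

# Re-expressing X in thousands shrinks the covariance by the same factor,
# even though the underlying relationship is unchanged.
cov_rescaled = sample_covariance([v / 1000 for v in x], y)
```

A positive value says that Y tends to rise with X. Because the magnitude changes with the units, the correlation coefficient is used when a unit-free measure of strength is needed.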
Scatter Plot
A scatter plot (or scatter diagram) can be used to show the relationship between two
variables
Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
Advantage & Disadvantage of Scatter Diagram
• Readily comprehensible and enables us to form a rough idea of the nature of relationship between the two variables
• Not affected by extreme observations (not influenced by extreme items)
• Not a suitable method if the number of observations is very large
• Provides only rough measure of Correlation which can differ from man to man
Coefficient of Correlation
\[
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
\]
• Karl Pearson (1857-1936), a great statistician, provided a formula for measuring the magnitude of the linear correlation between two variables.
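Equivalent forms of the coefficient:
\[
r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}
\]
\[
r = \frac{\sum (x - \mu_x)(y - \mu_y)}{\sqrt{\sum (x - \mu_x)^2 \, \sum (y - \mu_y)^2}}
\]
Assumed-mean (shortcut) form:
\[
r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}
         {\sqrt{\left[n \sum d_x^2 - \left(\sum d_x\right)^2\right]\left[n \sum d_y^2 - \left(\sum d_y\right)^2\right]}}
\]
where d_x = x - a, so that d_x^2 = (x - a)^2
where d_y = y - b, so that d_y^2 = (y - b)^2
and a, b are the assumed means of x and y.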
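The following sketch applies the coefficient of correlation to the volume and cost figures from the scatter plot table. It computes r once from deviations about the actual means and once from deviations about assumed means a and b, using the standard assumed-mean shortcut that the dx and dy definitions above appear to refer to. The function names and the choice a = 33, b = 160 are assumptions made for illustration.

```python
import math

# Volume per day and cost per day, copied from the scatter plot table.
volume = [23, 26, 29, 33, 38, 42, 50]
cost = [125, 140, 146, 160, 167, 170, 188]


def pearson_r(xs, ys):
    """Correlation from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_assumed_mean(xs, ys, a, b):
    """Same coefficient from deviations dx = x - a and dy = y - b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    numerator = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    denominator = math.sqrt(
        (n * sum(p * p for p in dx) - sum(dx) ** 2)
        * (n * sum(q * q for q in dy) - sum(dy) ** 2)
    )
    return numerator / denominator


r = pearson_r(volume, cost)
r_shortcut = pearson_r_assumed_mean(volume, cost, a=33, b=160)
```

Both routes give the same value whatever a and b are chosen, which is why the shortcut is convenient for hand calculation when the actual means are fractional.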
Nature of Relationship
• Positive correlation means that low values of one variable are associated with low
values of the other, and high values of one variable are associated with high
values of the other.
• Negative correlation means that low values of one variable are associated with
high values of the other, and high values of one variable are associated with low
values of the other.
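• In this case the best individual is given rank number 1, the next rank 2, and so on.
• \[ R = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} \]
  where D is the difference between the two ranks of an item and n is the number of items.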
• Suppose an item is repeated at rank 5 (two items tie); then the common rank assigned to both is 5.5, i.e. the average of 5 & 6.
• If an item occurs three times starting at rank 2, the common rank assigned to each value is the average of 2, 3 & 4 = 3, and the next rank assigned is 5. The correction formula for tied ranks is then used to calculate R.
• It is a useful method when the actual data is not given but only ranks are given.
• Limitation
• When the number of items is greater than 30 and ranks are not given, ranking takes more time, so the method cannot be used conveniently.
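The ranking and tie rules above can be sketched as follows. The slides mention a correction formula for ties without stating it; this example assumes the usual textbook adjustment of adding (m^3 - m)/12 to the sum of D^2 for every group of m tied values in either series. The function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # average of the ranks occupied
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values (assumed form)."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def rank_correlation(xs, ys):
    """R = 1 - 6 * (sum D^2 + tie corrections) / (n (n^2 - 1))."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    sum_d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))
```

For instance, `average_ranks([90, 80, 80, 80, 60])` gives the ranks 1, 3, 3, 3, 5, matching the "average of 2, 3 & 4" rule, and two values tied at position 5 each receive 5.5.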
Quiz
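• PE (probable error): \[ PE \approx \frac{2}{3} \cdot \frac{1 - r^2}{\sqrt{n}} \]
• or \[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} \]
• r = coefficient of correlation, n = number of pairs of observations
• By adding PE to r and subtracting PE from r, we get respectively the upper & lower limits within which the r in the population can be expected to lie.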
• The whole data is symmetrical and gives a normal frequency curve (Bell Shaped
Curve)
• The statistical measure for which the PE is computed, must have been calculated
from the sample.
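A minimal sketch of the probable error calculation, using the 0.6745 form given above. The function names and the sample values r = 0.8, n = 100 are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def probable_error(r, n):
    """PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def pe_limits(r, n):
    """Lower and upper limits r - PE and r + PE."""
    pe = probable_error(r, n)
    return r - pe, r + pe


# Hypothetical sample: r = 0.8 from n = 100 pairs.
pe = probable_error(0.8, 100)      # 0.6745 * 0.36 / 10
lower, upper = pe_limits(0.8, 100)
```

With these illustrative numbers the limits are roughly 0.776 and 0.824. The 2/3 form gives 0.024 instead of about 0.0243, so the two versions differ only in the third decimal place.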
Determination Coefficient
Coefficient of Determination r^2
• r^2 is preferred to r because it gives the proportion of variation in the dependent variable that is explained by a change in the independent variable.
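As a short worked reading of the statement above, assuming a straight line fitted by least squares: the symbols \hat{Y} (fitted value) and \bar{Y} (mean of Y) are notation introduced here, and r = 0.8 is an illustrative value only.

```latex
r^2 = \frac{\text{explained variation}}{\text{total variation}}
    = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2},
\qquad
r = 0.8 \;\Rightarrow\; r^2 = 0.64
```

So a correlation of 0.8 means that 64% of the variation in Y is explained by X, while 36% remains unexplained. This is a smaller share than the value of r alone might suggest.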
REGRESSION ANALYSIS
Regression is the measure of the average relationship between two or more variables
in terms of the original units of the data. -- Blair
Regression Analysis is a statistical device with the help of which we can estimate or
predict the unknown values of one variable from the known values of the other
variable.
The variable which is used to predict the variable of interest is called the independent
variable, generally denoted as X, and the variable we are trying to predict is called
the dependent variable, generally denoted as Y.
(regression or mediocrity)
Regression Analysis
• Purpose: to determine the regression equation; it is used to predict the value of the
dependent variable (Y) based on the independent variable (X).
• Procedure: select a sample from the population and list the paired data for each
observation; draw a scatter diagram to give a visual portrayal of the relationship;
determine the regression equation.
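• Y = a + bX, where a is the Y-intercept and b is the slope of the regression line (the regression coefficient of Y on X).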
• For each value of X, there is a group of Y values, and these Y values are normally
distributed.
• The means of these normal distributions of Y values all lie on the straight line of
regression.
• The Y values are statistically independent. This means that in the selection of a
sample, the Y values chosen for a particular X value do not depend on the Y values
for any other X values.
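The procedure above can be sketched on the ten (Y, X) pairs listed near the start of these notes. The slides do not say how the regression equation is determined; this example assumes the usual least-squares estimates, and the names `fit_line` and `predict` are hypothetical.

```python
# (Y, X) pairs copied from the data table near the start of these notes.
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    sxx = sum((u - mean_x) ** 2 for u in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the two means
    return a, b


a, b = fit_line(x, y)


def predict(x_new):
    """Estimate Y for a given X from the fitted equation."""
    return a + b * x_new


estimate = predict(2000)
```

For this data the fitted equation is approximately Y = 98.25 + 0.1098 X, so an X of 2000 gives an estimate of about 317.8. Predictions are safest for X values inside the observed range (1100 to 2450 here).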
1. The cause & effect relations are indicated from the study of regression analysis.
2. It establishes the rate of change in one variable in terms of the changes in another
variable.
4. It helps in prediction and thus it can estimate the values of unknown quantities.
7. It can be useful to all natural, social and physical sciences, where the data are in
functional relationship.
Correlation vs. Regression
1. Correlation: r between X & Y is a measure of the direction & degree of linear relationship.
   Regression: byx & bxy are mathematical measures expressing the average relationship between X & Y.
2. Correlation: It is symmetric in X & Y: ryx = rxy.
   Regression: These are not symmetrical: byx is not equal to bxy.
3. Correlation: It indicates the degree of association.
   Regression: It is used to forecast the nature of the dependent variable when the independent variable is known.
4. Correlation: It is a relative measure and is independent of the units of measurement.
   Regression: Regression coefficients are absolute measures of the relationship between two or more variables.
5. Correlation: It does not imply cause & effect relationships between the variables under study.
   Regression: It indicates the cause & effect relationship between the variables. The variable corresponding to cause is taken as the independent variable, whereas the variable corresponding to effect is taken as the dependent variable.
6. Correlation: r does not reflect upon the nature of the variables.
   Regression: It estimates the value of the dependent variable for any given value of the independent variable.
7. Correlation: It has limited application as it is confined to the study of linear relationship between two variables.
   Regression: It has wider applications as it also studies non-linear relationships between the variables.
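Two of the contrasts above (symmetry and dependence on units) can be checked numerically. The sketch below uses the volume and cost figures from the scatter plot table and the standard identity r^2 = byx * bxy, which is not stated in the slides. The function name and the rescaling factor of 100 are assumptions made for illustration.

```python
import math

# Volume per day and cost per day, copied from the scatter plot table.
volume = [23, 26, 29, 33, 38, 42, 50]
cost = [125, 140, 146, 160, 167, 170, 188]


def regression_coefficients(xs, ys):
    """Return (byx, bxy): slope of Y on X and slope of X on Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


byx, bxy = regression_coefficients(volume, cost)
# r is the geometric mean of the two coefficients, carrying their common sign.
r = math.copysign(math.sqrt(byx * bxy), byx)

# Express cost in a unit 100 times smaller (e.g. a sub-unit of the currency).
cost_rescaled = [100 * c for c in cost]
byx_rescaled, bxy_rescaled = regression_coefficients(volume, cost_rescaled)
r_rescaled = math.copysign(math.sqrt(byx_rescaled * bxy_rescaled), byx_rescaled)
```

The two regression coefficients differ from each other and change when the units change, whereas r stays the same under the rescaling.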
• If the historical data used are restricted to past values of the series that we are
trying to forecast, the procedure is called a time series method.
• If the historical data used involve other time series that are believed to be related
to the time series that we are trying to forecast, the procedure is called a causal
method.
More Definitions
• Most of the series relating to economics, business and commerce are time series, i.e. observations spread over a period of time.
• The trend component accounts for the gradual shifting of the time series over a
long period of time.
• (Secular trend) The trend is a generally smooth, long-term tendency, either upward or downward.
• Any regular pattern of sequences of values above and below the trend line is
attributable to the cyclical component of the series. (5-7, 7-9)
• Cyclic variations are the oscillatory movements in a time series due to ups and
downs recurring after a period greater than a year. They may not be uniformly periodic.
The seasonal component of the series accounts for regular patterns of variability within
certain time periods.
• The seasonal variation may be attributed to causes resulting from natural
forces and social customs and traditions. Seasonal variations are the result of
factors which uniformly and regularly rise and fall in magnitude.
• These variations usually repeat themselves in less than one year's time.
• Random variations are accidental changes which are purely random, unforeseen
and unpredictable, e.g. earthquakes, wars, floods & droughts.
• Normally they are short-term variations, but sometimes their effect is so intense
that they may give rise to new cyclical or other movements.
• 2. Forecasting : Analysis becomes the base for predicting future behaviour of the
variable.
• The following are the two models commonly used for the decomposition of a time
series into its components (T = trend, S = seasonal, C = cyclical, I = irregular).
• 1. Additive Model : Y = T + S + C + I
– This model assumes that the observed value is the sum of the four components
of the time series.
• 2. Multiplicative Model : Y = T × S × C × I
– This model assumes that the components, although due to different causes,
are not necessarily independent and can affect one another.
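To make the two models concrete, the sketch below builds a short series from its components under each model. All component values are hypothetical numbers invented for illustration; in the additive case the seasonal, cyclical and irregular parts are in the units of Y, while in the multiplicative case they are ratios around 1.

```python
import math

# Hypothetical quarterly components, invented for illustration only.
trend = [100.0, 102.0, 104.0, 106.0]

# Additive model: S, C and I are amounts in the same units as Y.
seasonal_add = [5.0, -3.0, -4.0, 2.0]
cyclical_add = [1.0, 1.0, 0.0, -1.0]
irregular_add = [0.5, -0.5, 0.0, 0.5]

# Multiplicative model: S, C and I are indices (ratios) applied to the trend.
seasonal_idx = [1.05, 0.97, 0.96, 1.02]
cyclical_idx = [1.01, 1.01, 1.00, 0.99]
irregular_idx = [1.005, 0.995, 1.0, 1.005]

y_additive = [
    t + s + c + i
    for t, s, c, i in zip(trend, seasonal_add, cyclical_add, irregular_add)
]
y_multiplicative = [
    t * s * c * i
    for t, s, c, i in zip(trend, seasonal_idx, cyclical_idx, irregular_idx)
]

# Taking logarithms turns the multiplicative model into an additive one:
# log Y = log T + log S + log C + log I.
log_y = [math.log(v) for v in y_multiplicative]
log_parts = [
    math.log(t) + math.log(s) + math.log(c) + math.log(i)
    for t, s, c, i in zip(trend, seasonal_idx, cyclical_idx, irregular_idx)
]
```

In the multiplicative series the size of the seasonal swing grows with the trend level, which is one practical way to tell which model suits a given series.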
• All months do not have the same number of days. So each monthly total is divided by the number
of days in that month and then multiplied by 365/12 (the average number of days in a month).
• 2. Population Change:
• 4. Comparability :
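The calendar adjustment described in the first bullet of this list can be sketched as below. The function name `calendar_adjusted` and the monthly totals are hypothetical. The factor 365/12 follows the slide, so leap years are not treated specially in the factor.

```python
import calendar

AVERAGE_DAYS_PER_MONTH = 365 / 12  # factor given in the slide


def calendar_adjusted(monthly_total, year, month):
    """Convert a monthly total to a month of average length."""
    days_in_month = calendar.monthrange(year, month)[1]
    daily_average = monthly_total / days_in_month
    return daily_average * AVERAGE_DAYS_PER_MONTH


# Hypothetical totals: the same daily rate of 100 in a 31-day and a 28-day month.
january = calendar_adjusted(3100, 2023, 1)
february = calendar_adjusted(2800, 2023, 2)
```

The raw totals 3100 and 2800 differ only because of the number of days. After adjustment both months show the same value, so they can be compared directly.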
Index Numbers
Cotton 10 12 15 18 20 24 28
Polyester 30 35 39 42 45 48 58
• Specialized averages (averages are used to compare two or more series which
are expressed in the same units; index numbers can be used when the units are
different)
• As economic barometers
• Useful in deflating
• Price Index (measures the changes in prices of items between two points of time)
• Quantity Index (measures the changes in the physical volume of goods produced
or consumed)
• Value Index (measures the change in actual value between the base period and the
given period)
• Special Purpose Index (Consumer Price Index, PPI, Sensex, Dow Jones Industrial
Average, etc.)
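As a rough sketch of a price index, the example below uses the Cotton and Polyester rows listed under Index Numbers. It assumes that the two rows are prices in seven successive periods and that the first period is the base (= 100); the slides state neither. Price relatives and the simple aggregative index are only two of several possible methods, and the function names are hypothetical.

```python
# Rows copied from the Index Numbers table; treated here as prices in seven
# successive periods, with the first period as base (an assumption).
cotton = [10, 12, 15, 18, 20, 24, 28]
polyester = [30, 35, 39, 42, 45, 48, 58]


def price_relatives(prices, base=0):
    """Each price as a percentage of the base-period price."""
    return [100 * p / prices[base] for p in prices]


def simple_aggregative_index(series, base=0):
    """Total of all prices in each period as a percentage of the base total."""
    totals = [sum(period_prices) for period_prices in zip(*series)]
    return [100 * t / totals[base] for t in totals]


cotton_relatives = price_relatives(cotton)
polyester_relatives = price_relatives(polyester)
aggregate_index = simple_aggregative_index([cotton, polyester])
```

Under these assumptions the cotton relative reaches 280 in the last period while the aggregate index reaches 215, because the higher-priced polyester series dominates the unweighted total. Weighted methods address this.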
5. Price Quotation (prices vary from place to place and shop to shop, so the
cities, shops and persons supplying price quotes must be selected)
6. Choice of an average
1. Since index numbers are based on samples, they cannot represent all items.
2. Index numbers are constructed from deliberately selected samples which may
introduce errors. (Not Random sampling)
3. Approximate indicators
5. A large number of methods are in practice, so different methods may give different values.