Statistics
Aggregate of facts
Numerically expressed
Affected by multiplicity of causes
Reasonable accuracy
Placed in relation to each other
Predetermined purpose
Estimated
According to Seligman, “Statistics is the science which deals with the methods of collecting,
classifying, presenting, comparing and interpreting numerical data collected to throw some light
on any sphere of enquiry”.
Limitations of Statistics:
Collection of Data
Sources of Data There are two sources of data
Primary Source of Data It implies collection of data from its source of origin.
Secondary Source of Data It implies collection of data from some agency or
institution which already happens to have collected the data through statistical
survey.
Primary Data Data collected by the investigator for his own purpose for the first
time, from beginning to end are called primary data.
Secondary Data These data have already been collected by somebody else and
are available in the form of published or unpublished reports.
Primary data are original, whereas secondary data are already in existence and are
therefore not original.
Primary data do not need any adjustment, whereas secondary data need to be adjusted to
suit the objective of the study in hand.
Primary data are expensive and secondary data are less expensive.
Originality
Reliability
Uniformity
Accuracy
Related information
Elastic
(b) Demerits
Wide coverage
Expert opinion
Simple
Less expensive
Free from bias
(b) Demerits
Less accurate
Doubtful conclusions
Biased
Economical
Wide coverage
Continuity
Suitable for special purpose
(b) Demerits
Loss of originality
Lack of uniformity
Personal bias
Less accurate
Delay in collection
(b) Enumerator’s Method Under this method, the enumerator himself fills in the schedules after
seeking information from the informants. This method is mostly used when
the field of investigation is large.
the investigation needs specialised and skilled investigators.
the investigators are well versed in the local language and cultural norms of the
informants.
(c) Collection of Secondary Data There are two main sources of secondary data
Published sources
Unpublished sources
(d) Published Sources Some of the published sources of secondary data are
Government publication
Semi-government publication
Reports of committees and commissions
Publications of trade associations
Publication of research institutions
Journals and papers
Publication of research scholars
International publication
(e) Unpublished Sources These data are collected by government organisations and others,
generally for their own use or office record.
In order to assess the reliability, suitability and adequacy of the data, the following
points must be kept in mind
Ability of the collecting organisation
Objective and scope
Method of collection
Time and condition of organisation
Definition of the unit
Accuracy
Costly
Large manpower
Not suitable for large investigation
Economical
Time saving
Identification of error
Large investigation
Administrative convenience
More scientific
(b) Demerits
Partial
Wrong conclusions
Difficulty in selecting representative sample
Difficulty in framing a sample
Specialised knowledge
Methods of Sampling
(i) Random Sampling Random sampling is that method of sampling in which each and every
item of the universe has equal chance of being selected in the sample.
Random sampling may be done in any of the following ways
Lottery method
Tables of random numbers
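The lottery idea can be sketched in a few lines of Python. This is only an illustration: the function name `lottery_sample`, the household labels and the seed are assumptions made for the example, and the standard library's pseudo-random generator stands in for a physical lottery or a printed table of random numbers.

```python
import random


def lottery_sample(universe, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every item of the universe has the same chance of selection.
    The seed is optional and only makes the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(list(universe), sample_size)


# Hypothetical universe of 100 numbered households.
households = [f"H{i:03d}" for i in range(1, 101)]
sample = lottery_sample(households, 10, seed=42)
print(sample)
```

Because the draw is without replacement, no household can appear twice in the sample.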
(ii) Purposive or Deliberate Sampling It is that method in which the investigator himself makes
the choice of the sample items which in his opinion are the best representative of the universe.
(iii) Stratified or Mixed Sampling According to this method of sampling, the population is divided
into different strata having different characteristics and some of the items are selected from each
stratum, so the entire population gets represented.
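A minimal sketch of stratified sampling is given here, assuming that the same number of items is drawn at random from every stratum. The notes only say that "some of the items" are taken from each stratum, so the equal allocation, the function name `stratified_sample` and the sample data are all assumptions of this example.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, per_stratum, seed=None):
    """Group items into strata, then draw a random sample from each stratum.

    stratum_of is a function returning the stratum label of an item.
    per_stratum is the (assumed) number of items taken from every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    # Never ask for more items than a stratum contains.
    return {
        label: rng.sample(members, min(per_stratum, len(members)))
        for label, members in strata.items()
    }


# Hypothetical workers tagged by area of residence.
workers = [("W1", "rural"), ("W2", "urban"), ("W3", "rural"),
           ("W4", "urban"), ("W5", "rural"), ("W6", "urban")]
chosen = stratified_sample(workers, lambda w: w[1], per_stratum=2, seed=1)
print(chosen)
```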
(iv) Systematic Sampling According to this method, units of the population are arranged numerically,
geographically or alphabetically. Every nth item of the numbered list is selected as a
sample item.
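Selecting every nth item can be sketched as below. The random starting point within the first n units is a common convention and an assumption here, since the notes do not say where counting begins. The function name `systematic_sample` is illustrative.

```python
import random


def systematic_sample(units, n, start=None, seed=None):
    """Pick every n-th unit from an already ordered list.

    start is the zero-based position of the first selected unit.
    If it is not given, it is chosen at random from the first n positions.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if start is None:
        start = random.Random(seed).randrange(n)
    return units[start::n]


# Units numbered 1 to 20, every 5th unit, starting from the 3rd unit.
numbered_units = list(range(1, 21))
picked = systematic_sample(numbered_units, 5, start=2)
print(picked)
```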
(v) Quota Sampling In this method, the population is divided into different groups or classes
according to different characteristics of the population.
(vi) Convenience Sampling In this method, sampling is done by the investigator in such a
manner that suits his convenience.
Important agencies at the national level which collect, process and tabulate statistical data are
NSSO (National Sample Survey Organisation), RGI (Registrar General of India), DGCIS
(Directorate General of Commercial Intelligence and Statistics) and the Labour Bureau.
Organisation of Data
Organisation of data refers to the arrangement of figures in such a form that comparison of the
mass of similar data may be facilitated and further analysis may be possible.
Classification
Classification is the process of arranging things in groups or classes according to their
resemblances and affinities and gives expression to the unity of attributes that may exist amongst
a diversity of individuals.
Objectives of Classification
Comprehensiveness
Clarity
Homogeneity
Suitability
Stability
Elastic
Basis of Classification
Simple classification
Manifold classification
Raw Data A mass of data in its crude form is called raw data.
Individual Series These are those series in which the items are listed singly. These
series may be presented in two ways
Frequency distribution is also known as continuous series or series with class-intervals, or series
of grouped data.
Mid Values Frequency Series Mid value frequency series are those series in which
we have only mid values of the class intervals and the corresponding frequencies.
Univariate Distribution The frequency distribution of a single variable is called a
univariate distribution.
Bivariate Distribution A bivariate distribution is the frequency distribution of two
variables.
Presentation of Data
Textual Presentation
In textual presentation, data are a part of the text of study or a part of the description of the
subject matter of study.
Table number
Title
Head note
Stubs
Caption
Body or field
Footnotes
Source
Quantitative Classification of Data This occurs when data are classified on the
basis of quantitative characteristics of a phenomenon.
Temporal Classification of Data In this, data are classified according to time, and time
becomes the classifying variable.
(iii) Spatial Classification In spatial classification, place or location becomes the classifying
variable. It may be a village, a town, a district, etc.
(iv) Merits of Tabular Presentation
Simple Bar Diagrams Simple bar diagrams are those diagrams which are based on
a single set of numerical data.
Multiple Bar Diagrams These are those diagrams which show two or more sets of
data simultaneously.
Sub Divided Bar Diagram Sub-divided bar diagrams are those diagrams which
simultaneously present total values as well as part values of a set of data.
Percentage Bar Diagram Percentage bar diagrams are those diagrams which show
simultaneously, different parts of the values of a set of data in terms of percentages.
(ii) Pie or Circular Diagrams Pie diagram is a circle divided into various segments showing the
per cent values of a series. This diagram does not show absolute values.
(iii) Frequency Diagram Data in the form of grouped frequency distributions are generally
represented by frequency diagram like histogram, frequency polygon, frequency curve and
ogive.
Less than Method In this method, beginning from the upper limit of the
1st class interval, we go on adding the frequencies corresponding to every
next upper limit of the series.
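The running addition used in the less than method can be illustrated as follows. The function name `less_than_cumulative` and the class limits and frequencies are made up for the example.

```python
from itertools import accumulate


def less_than_cumulative(upper_limits, frequencies):
    """Pair each upper class limit with the cumulative frequency up to it."""
    if len(upper_limits) != len(frequencies):
        raise ValueError("each class needs exactly one frequency")
    return list(zip(upper_limits, accumulate(frequencies)))


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
upper_limits = [10, 20, 30, 40, 50]
frequencies = [4, 6, 10, 7, 3]
ogive_points = less_than_cumulative(upper_limits, frequencies)
print(ogive_points)
```

The resulting pairs are the points that would be plotted to draw a less than ogive, and the last cumulative frequency equals the total number of observations.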
(iv) Arithmetic Line Graph An arithmetic line graph is also called a time series graph. In it, time is
plotted along the x-axis and the value of the variable along the y-axis. The line obtained by joining these
plotted points is called a time series graph.
Choice of scale
Proportion of axis
Method of plotting the points
Lines of different types
Table of data
Use of false base line
To draw a line or curve
One Variable Graph One variable graphs are those graphs in which
values of only one variable are shown with respect to some time
period.
Two or More than Two Variable Graphs These are the graphs in
which values of two or more variables are simultaneously shown with respect
to some period of time.
Limited use
Misuse
Only preliminary conclusions
Mathematical Averages
Positional Averages
Arithmetic Mean
Arithmetic Mean is the number which is obtained by adding the values of all the items of a series
and dividing the total by the number of items.
Arithmetic Mean is generally written as $\bar{X}$. It may be expressed in the form of the following formula
$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$ or $\bar{X} = \frac{\Sigma X}{N}$
Types of Arithmetic Mean
Direct Method According to this method, we find the Arithmetic mean from the
following formula
$\bar{X} = \frac{\Sigma X}{N}$ or $\bar{X} = \frac{\text{Total value of the items}}{\text{Number of items}}$
Short-cut Method By short cut method, we find the Arithmetic Mean from the
following formula
$\bar{X} = A + \frac{\Sigma d}{N}$
Here, $\bar{X}$ = Arithmetic Mean; A = Assumed average; $\Sigma d$ = Net sum of the
deviations of the different values from the assumed average; and N = Number of
items in the series.
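The direct and short-cut methods should give the same mean for an individual series. A small sketch can be used to check this. The function names and the marks are assumptions of the example, and the assumed average A = 55 is an arbitrary choice.

```python
def mean_direct(values):
    """Direct method: sum of the items divided by the number of items."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short-cut method: A plus the average deviation from A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


# Hypothetical marks of five students.
marks = [40, 50, 55, 78, 58]
direct_result = mean_direct(marks)
shortcut_result = mean_shortcut(marks, assumed_mean=55)
print(direct_result, shortcut_result)
```

Any assumed average gives the same answer; a value close to the middle of the series simply keeps the deviations small.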
(ii) Discrete Series There are three methods of calculating mean of the discrete series
Direct Method Direct method of estimating mean of the discrete frequency series
uses the formula
$\bar{X} = \frac{\Sigma fX}{\Sigma f}$
Short-cut Method Short cut method of estimating mean of the discrete frequency
series uses the following formula
$\bar{X} = A + \frac{\Sigma fd}{\Sigma f}$
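Step-deviation Method This method is a variant of the short-cut method. It is adopted
when deviations from the assumed mean have some common factor c
$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c$
Here, d' = d/c, the deviation from the assumed average divided by the common factor c.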
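The three methods for a discrete series can be compared on one small data set. The sketch below is illustrative only: the function names, the wage figures, the assumed average A = 20 and the common factor c = 10 are all assumptions, and the step deviation is taken as d' = (X - A) / c.

```python
def discrete_mean_direct(values, freqs):
    """Direct method: sum of fX divided by sum of f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def discrete_mean_shortcut(values, freqs, assumed_mean):
    """Short-cut method: A + (sum of fd / sum of f), with d = X - A."""
    total_fd = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + total_fd / sum(freqs)


def discrete_mean_step(values, freqs, assumed_mean, common_factor):
    """Step-deviation method: A + (sum of fd' / sum of f) * c, with d' = (X - A) / c."""
    total_fd_step = sum(
        f * ((x - assumed_mean) / common_factor) for x, f in zip(values, freqs)
    )
    return assumed_mean + total_fd_step / sum(freqs) * common_factor


# Hypothetical daily wages and the number of workers earning each wage.
wages = [10, 20, 30, 40, 50]
workers = [2, 3, 5, 3, 2]
m_direct = discrete_mean_direct(wages, workers)
m_short = discrete_mean_shortcut(wages, workers, assumed_mean=20)
m_step = discrete_mean_step(wages, workers, assumed_mean=20, common_factor=10)
print(m_direct, m_short, m_step)
```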
(iii) Frequency Distribution
There are three methods of calculating mean in frequency distribution
(a) Direct Method Direct method of estimating mean of the frequency distribution uses the
formula
$\bar{X} = \frac{\Sigma fm}{\Sigma f}$
m = mid-value, mid-value = $\frac{L_1 + L_2}{2}$
L1 = lower limit of the class
L2 = upper limit of the class
(b) Short-cut Method Short cut method of estimating mean of the frequency distribution uses the
formula
$\bar{X} = A + \frac{\Sigma fd}{\Sigma f}$
(c) Step Deviation Method According to this method, we find the Arithmetic Mean by the
following formula
$\bar{X} = A + \frac{\Sigma fd'}{\Sigma f} \times c$
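For a frequency distribution, the same calculations are carried out on the mid-values of the classes. A minimal sketch follows, assuming classes of equal width and taking d' = (m - A) / c. The function names and the data are hypothetical.

```python
def class_mid_values(class_limits):
    """Mid-value of each class: (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in class_limits]


def grouped_mean_direct(class_limits, freqs):
    """Direct method: sum of fm divided by sum of f."""
    mids = class_mid_values(class_limits)
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)


def grouped_mean_step(class_limits, freqs, assumed_mean, common_factor):
    """Step-deviation method applied to the mid-values."""
    mids = class_mid_values(class_limits)
    total_fd_step = sum(
        f * ((m - assumed_mean) / common_factor) for m, f in zip(mids, freqs)
    )
    return assumed_mean + total_fd_step / sum(freqs) * common_factor


# Hypothetical marks distribution with classes of width 10.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [5, 8, 15, 16, 6]
g_direct = grouped_mean_direct(classes, students)
g_step = grouped_mean_step(classes, students, assumed_mean=25, common_factor=10)
print(g_direct, g_step)
```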
(d) Weighted Arithmetic Mean It is the mean of weighted items of the series. Different items are
accorded different weights depending on their relative importance. The weighted sum of the
items is divided by the sum of the weights.
Calculation of Weighted Mean
The weighted mean is found from the following formula
$\bar{X}_W = \frac{\Sigma WX}{\Sigma W}$
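The weighted mean can be computed as below. The function name `weighted_mean`, the marks and the weights are assumptions made for the example.

```python
def weighted_mean(values, weights):
    """Weighted sum of the items divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical marks in three subjects with weights 1, 2 and 3.
subject_marks = [70, 80, 90]
subject_weights = [1, 2, 3]
w_mean = weighted_mean(subject_marks, subject_weights)
print(round(w_mean, 2))
```

When all the weights are equal, the weighted mean reduces to the simple arithmetic mean.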
(i) Merits
Simplicity
Certainty
Based on all items
Algebraic treatment
Stability
Basis of comparison
Accuracy test
(ii) Demerits
Median
“The Median is that value of the variable which divides the
group into two equal parts, one part comprising all values
greater than the Median value and the other part
comprising all the values smaller than the Median value”.
(i) Calculation of Median
(a) Individual Series Calculation of Median in individual
series involves the following formula
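M = Size of $\left(\frac{N+1}{2}\right)$th item
When N of the series is an even number, Median is estimated
using the following formula
M = $\frac{\text{Size of } \left(\frac{N}{2}\right)\text{th item} + \text{Size of } \left(\frac{N}{2}+1\right)\text{th item}}{2}$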
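A short sketch of the median of an individual series is given here. For an even number of items it follows the usual convention of averaging the two middle items of the arranged series. The function name `median_individual` and the data are illustrative.

```python
def median_individual(values):
    """Median of an individual series.

    The items are first arranged in ascending order. With an odd N the
    middle item is returned; with an even N the two middle items are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the series is empty")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_series = [7, 3, 9, 5, 11]
even_series = [7, 3, 9, 5]
print(median_individual(odd_series), median_individual(even_series))
```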
Range
Quartile deviation
Mean deviation
Standard deviation
Range Range is the difference between the highest value and the lowest value in a series.
R = H – L or L – S
H or L = Highest or Largest value of series
L or S = Lowest or Smallest value of series
Mid values of the class intervals are found; the difference between the highest and
lowest mid values would be the range.
According to this method, the difference between the upper limit of the last
class interval and the lower limit of the first class interval in the series would be the
range.
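The range calculations can be sketched for an individual series and for the two approaches to a frequency distribution. The function names and the data are assumptions of the example, and the classes are taken to be listed in ascending order.

```python
def range_individual(values):
    """R = H - L for an individual series."""
    return max(values) - min(values)


def range_by_mid_values(class_limits):
    """Difference between the highest and the lowest class mid-values."""
    mids = [(lower + upper) / 2 for lower, upper in class_limits]
    return max(mids) - min(mids)


def range_by_limits(class_limits):
    """Upper limit of the last class minus lower limit of the first class."""
    return class_limits[-1][1] - class_limits[0][0]


observations = [25, 40, 15, 60]
class_intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
print(range_individual(observations))
print(range_by_mid_values(class_intervals), range_by_limits(class_intervals))
```

The two methods for a frequency distribution generally give different figures, because the mid-value method ignores half a class width at each end.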
Mean Deviation
“Mean deviation is the arithmetic average of deviation of all the values taken from a statistical
average of series. In taking deviation of values, algebraic signs + and – are not taken into
consideration, that is negative deviations are also treated as positive deviations”.
(i) Formulas for Mean Deviation
(a) If deviations are taken from median, the following formula is used
(b) If deviations are taken from arithmetic mean of the series
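Following the definition quoted above, mean deviation for an individual series can be sketched as the average of the absolute deviations from the chosen average. The function name `mean_deviation`, its `about` parameter and the data are assumptions made for the illustration.

```python
from statistics import mean, median


def mean_deviation(values, about="median"):
    """Average of absolute deviations from the median or the arithmetic mean.

    Signs are ignored, so negative deviations count as positive.
    """
    if about == "median":
        centre = median(values)
    elif about == "mean":
        centre = mean(values)
    else:
        raise ValueError("about must be 'median' or 'mean'")
    return sum(abs(x - centre) for x in values) / len(values)


series = [1, 2, 3, 4, 10]
md_median = mean_deviation(series, about="median")
md_mean = mean_deviation(series, about="mean")
print(md_median, md_mean)
```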
Quartiles
If a statistical series is divided into four equal parts, the end value of each part is called a
Quartile.
(i) Calculation of Quartiles Quartile values (Q1 and Q3) are estimated differently for different sets
of series,
(a) Individual and Discrete Series
(b) Frequency Distribution Series In frequency distribution series, the class interval of Q1 and
Q3 are first identified as under
Percentiles
Percentiles divide the series into 100 equal parts, and are generally expressed as P.
Percentiles are estimated for different types of series as under
(i) Individual and Discrete Series
(ii) Frequency Distribution Series
Mode
The value of the variable which occurs most frequently in a distribution is called the mode.
According to Croxton and Cowden, “The mode may be regarded as the most typical of a series
of values”.
(i) Calculation of Mode
Individual Series There are two ways of calculating Mode in individual series
By inspection
Discrete Series There are two methods for calculation of mode in discrete frequency
series
Inspection Method
Grouping Method
Frequency Distribution Series The exact value of Mode can be calculated with the
following formula
$Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
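The formula can be tried on a small distribution. The sketch below reads the symbols in the usual way: L1 is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the classes just before and after it, and i the class width. Two further assumptions are made: the modal class is the first class with the highest frequency, and a missing neighbouring class is given a frequency of 0. The function name `grouped_mode` and the data are hypothetical.

```python
def grouped_mode(class_limits, freqs):
    """Mode of a frequency distribution: Z = L1 + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    k = max(range(len(freqs)), key=lambda idx: freqs[idx])  # modal class position
    lower, upper = class_limits[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # class before the modal class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # class after the modal class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("the formula is undefined for this distribution")
    return lower + (f1 - f0) / denominator * (upper - lower)


mode_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
mode_freqs = [5, 7, 12, 9, 2]
z = grouped_mode(mode_classes, mode_freqs)
print(z)
```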
Relative Position of Arithmetic Mean, Median and Mode Suppose we express,
Arithmetic Mean = Me
Median = Mi
Mode = Mo
In a moderately skewed distribution, the relative magnitudes of the three are Me > Mi > Mo or Me < Mi < Mo. The Median
lies between the Arithmetic Mean and the Mode.
Correlation
It is a statistical method or a statistical technique that measures quantitative relationship between
different variables, like between price and demand.
According to Croxton and Cowden, “When the relationship is of a quantitative nature, the
appropriate statistical tool for discovering and measuring the relationship and expressing it in a
brief formula is known as correlation.”
Types of Correlation
Correlation is commonly classified into negative and positive correlation.
Positive Correlation When two variables move in the same direction, such a
relation is called positive correlation, e.g., the relationship between price and supply.
Negative Correlation When two variables change in opposite directions, it is
called negative correlation, e.g., the relationship between price and demand.
Degree of Correlation
Degree of correlation refers to the coefficient of correlation
The study of correlation shows the direction and degree of relationship between the
variables.
The correlation coefficient sometimes suggests a cause and effect relationship.
Correlation analysis facilitates business decisions because the trend path of one
variable may suggest the expected changes in the other.
Correlation analysis also helps policy formulation.
Index Numbers
Index Number
An index number is a statistical device for measuring changes in the magnitude of a group of
related variables. It represents the general trend of diverging ratios from which it is calculated.
According to Croxton and Cowden, “Index numbers are devices for measuring difference in the
magnitude of a group of related variables.”
$P_{01} = \frac{\Sigma\left[\frac{P_1}{P_0} \times 100\right]}{N}$
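The formula above reads as a simple average of price relatives: each current price P1 is expressed as a percentage of its base price P0, and the percentages are averaged over the N items. A small sketch follows; the function name `simple_price_relative_index` and the prices are made up for the illustration.

```python
def simple_price_relative_index(base_prices, current_prices):
    """P01 = sum of (P1 / P0 * 100) divided by the number of items."""
    if len(base_prices) != len(current_prices):
        raise ValueError("each item needs a base price and a current price")
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
    return sum(relatives) / len(relatives)


# Hypothetical prices of three goods in the base year and the current year.
p0_list = [10, 20, 40]
p1_list = [15, 25, 50]
simple_index = simple_price_relative_index(p0_list, p1_list)
print(round(simple_index, 2))
```

An index above 100 indicates that prices have risen, on average, relative to the base year.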
Construction of Weighted Index Numbers
(i) Weighted Average of Price Relative Method
According to this method, weighted sum of the price relatives is divided by the sum total of the
weights. In this method, goods are given weights according to their quantity, thus
$P_{01} = \frac{\Sigma RW}{\Sigma W}$
Here, P01 = Index number for the current year in relation to the base year
W = weight
R = price relative
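The weighted average of price relatives can be sketched in the same way, taking R = P1 / P0 × 100 for each good. The function name `weighted_price_relative_index`, the prices and the weights are assumptions of this example.

```python
def weighted_price_relative_index(base_prices, current_prices, weights):
    """P01 = sum of (R * W) divided by sum of W, with R = P1 / P0 * 100."""
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


# Hypothetical prices and weights for three goods.
base = [10, 20, 40]
current = [15, 25, 50]
item_weights = [5, 3, 2]
weighted_index = weighted_price_relative_index(base, current, item_weights)
print(weighted_index)
```

Here the first good has the largest price rise and the largest weight, so the weighted index comes out higher than a simple average of the same price relatives.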
(ii) Weighted Aggregative Method Under this method, different goods are accorded weights
according to the quantity bought. Different techniques of weighting have been suggested; some of the
well known methods are as under
Quantity weight
Expenditure weight
Classification of industries
Statistics or data related to industrial production
Weightage
Sensex
Sensex is the index showing changes in the Indian stock market. It is short for the Bombay
Stock Exchange Sensitive Index. It is constructed with 1978-79 as the reference year or the base
year. It consists of 30 stocks of leading companies in the country.