Module - 1
MODULE : 1
BASIC CONCEPTS
Prepared by: Bhumi Vyas (Asst. Prof.)
CHAPTER 1 – Basics of Statistics
• CONTENTS
Basics of Statistics
Function of Statistics
Scope of Statistics
Limitation of Statistics
Basic Statistical Concepts
Basics of statistics
• Introduction:
Variation occurs naturally in data, and the subject of
statistics exists essentially because of that variation.
Ex: Height, Weight, Marks, Percentage, Income,
Sales of companies, Prices of Stocks.
• Statistics is the science of dealing with numbers.
Example:
1. I walk an average of 4 km/day.
2. There is a 60 percent chance that a particular political party
will get re-elected at the next election.
• It facilitates comparison
Facts by themselves have little value unless they are
seen in the correct context in which they occur.
One purpose of statistics is to enable comparison
between past and present results, to ascertain the reasons
for the changes that have taken place and the likely effect
of such changes in the future.
• It helps in forecasting
The future is uncertain. Statistics helps in forecasting
trends and tendencies. Statistical techniques, e.g. a
regression line, are used for predicting the future values
of a variable.
A producer forecasts future production on the basis
of present demand conditions and past experience.
• There is hardly any area where statistics has not been
applied, whether trade, industry, commerce, economics,
life sciences, or education.
• However, certain fields have used statistics very frequently and
effectively. Some of the important fields are listed below.
State
Economics
Business Management
STATISTICS & STATE
• Data
• The word data is the plural of the Latin word datum, which
means "something given", i.e. a fact.
• It is a collection of observations expressed in numerical
quantities.
• Any raw collection of facts and figures which is not yet
meaningful to the user is known as data.
• Data is always used in the collective sense and not in the singular.
• Population
• The word population in statistics means the totality of
the set of objects under study.
• Statistical populations are used in order to observe
behaviors, trends, and patterns in the way individuals in
a defined group interact with the world around them,
allowing statisticians to draw conclusions about the
characteristics of the subjects of study.
• An example of a population is the more than eight million
people living in New York City.
• Sample
• A subset of the population.
• A sample is a selected number of entities or individuals
which form a part of the population under study.
• Studying a sample is more practical and economical
in most situations where the population is large. The sample
is used to draw conclusions about the entire population.
• Characteristics
• The word characteristic means an aspect possessed by
an individual entity.
• We may study the rainfall of a certain region, or the
marks scored by students in a certain school. These are
referred to as characteristics.
• Variable & Attributes
• In statistics, characteristics are of two types: measurable
(variables) and non-measurable (attributes).
110 175 161 157 155 108 164 128 114 128 165 133
164 146 116 149 104 141 103 204 162 149 74 113
69 121 93 143 140 144 187 184 197 87 35 122 203 148
• k = 7
• Minimum value = 30
• Maximum value = 204
• So, Range (R) = Max Value – Min Value
= 204 – 30
= 174
• Class width C = R/k = 174/7 ≈ 25
• k is generally between 5 and 15.
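The range and class-width calculation above can be sketched in a few lines of Python. This is a minimal illustration that uses only the values stated on the slide (minimum 30, maximum 204, k = 7). Rounding the width up, rather than to the nearest integer, is an assumption made here so that the k classes are sure to cover the largest value.

```python
import math

# Values stated on the slide.
minimum, maximum, k = 30, 204, 7

data_range = maximum - minimum            # R = Max - Min
# C = R / k, rounded up (assumption) so the classes cover the whole range.
class_width = math.ceil(data_range / k)

# Class limits: lower limit inclusive, upper limit exclusive.
class_limits = [
    (minimum + i * class_width, minimum + (i + 1) * class_width)
    for i in range(k)
]

print("Range:", data_range)
print("Class width:", class_width)
for low, high in class_limits:
    print(f"{low} - {high}")
```

The printed classes run from 30 - 55 to 180 - 205, matching the class column of the frequency table.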
Classes    Tally marks    Frequency    Relative Frequency    Percentage Frequency
30 – 55
55 – 80
80 – 105
105 – 130
130 – 155
155 – 180
180 – 205
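As a rough sketch of how the table could be filled in, the Python below tallies observations into the seven classes and derives relative and percentage frequencies. It uses only the 38 values visible in this section. The stated minimum of 30 does not appear among them, so the original data set may be larger and the counts below may not match the complete table.

```python
# Only the 38 values visible in this section (the full data set may be larger).
data = [
    110, 175, 161, 157, 155, 108, 164, 128, 114, 128, 165, 133,
    164, 146, 116, 149, 104, 141, 103, 204, 162, 149, 74, 113,
    69, 121, 93, 143, 140, 144, 187, 184, 197, 87, 35, 122, 203, 148,
]

lower, width, k = 30, 25, 7   # first lower limit, class width, number of classes

frequencies = [0] * k
for x in data:
    index = (x - lower) // width   # upper class limits are exclusive
    frequencies[index] += 1

n = len(data)
print("Class       Tally       f   Rel.f   %f")
for i, f in enumerate(frequencies):
    low = lower + i * width
    print(f"{low:>3} - {low + width:<3}  {'|' * f:<10} {f:>2}  {f / n:.3f}  {100 * f / n:5.1f}")
```

Relative frequency is f/N and percentage frequency is 100 × f/N, so the two columns sum to 1 and 100 respectively.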
Less Than Cumulative Distribution
Less than cumulative series    Calculation    No. of students
Less than 55
Less than 80
80 – 100
100 – 120
120 – 140
140 – 160
160 – 180
180 – 200
200 – 220
HW sum: Prepare ‘less than’ and ‘more than’ series.
Marks obtained by students    No. of students
20 – 30 6
30 – 40 18
40 – 50 25
50 – 60 22
60 – 70 17
70 – 80 12
Total 100
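One possible way to build the two cumulative series, sketched in Python with the marks table above. The variable names are illustrative only. The 'less than' series accumulates frequencies against the upper class limits, and the 'more than' series counts the observations at or above each lower class limit.

```python
# Class intervals and frequencies from the table above.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [6, 18, 25, 22, 17, 12]
total = sum(freqs)

# 'Less than' series: running total against each upper limit.
less_than = []
running = 0
for (low, high), f in zip(classes, freqs):
    running += f
    less_than.append((high, running))

# 'More than' series: observations remaining at or above each lower limit.
more_than = []
remaining = total
for (low, high), f in zip(classes, freqs):
    more_than.append((low, remaining))
    remaining -= f

for limit, cf in less_than:
    print(f"Less than {limit}: {cf}")
for limit, cf in more_than:
    print(f"More than {limit}: {cf}")
```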
GRAPHS:
• A large variety of graphs are used in practice.
Here we will be discussing the graphs of
frequency distributions only.
• A frequency distribution can be presented
graphically in any of the following ways:
1) Histogram
2) Frequency Polygon
3) ‘Ogive’ or cumulative frequency curves.
Histogram:
• It is a graph of a frequency distribution in which
the class intervals are plotted on the x-axis and their
respective frequencies on the y-axis.
• On each class interval a rectangle is drawn. When the
classes have equal width, the height of each rectangle is
taken equal to the frequency of the corresponding class.
• The construction of such a Histogram is shown
in the following example.
Example
• Draw a Histogram & Frequency polygon from the following
distribution giving marks of 50 students in statistics.
Marks in Statistics    No. of students
0 – 10     0
10 – 20    2
20 – 30    3
30 – 40    7
40 – 50    13
50 – 60    13
60 – 70    9
70 – 80    2
80 – 90    1
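To make the construction concrete, here is a small text-only sketch in Python that draws one bar per class for the marks distribution of 50 students. It prints the bars sideways with `#` characters instead of using a plotting library. That is a simplification chosen for this illustration, not the way the histogram would be drawn on paper.

```python
# Marks distribution of 50 students from the example.
labels = ["0-10", "10-20", "20-30", "30-40", "40-50",
          "50-60", "60-70", "70-80", "80-90"]
freqs = [0, 2, 3, 7, 13, 13, 9, 2, 1]

# All classes have equal width (10), so bar length = class frequency.
bars = ["#" * f for f in freqs]

for label, bar, f in zip(labels, bars, freqs):
    print(f"{label:>6} | {bar} {f}")
```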
Frequency Polygon:
• Frequency polygons are more suitable than
histograms whenever two or more frequency
distributions are to be compared.
• The frequencies of the classes are plotted against
the mid-values of the corresponding classes.
• The points so obtained are joined by straight
lines to obtain the frequency polygon.
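The points of a frequency polygon can be computed as follows. This sketch reuses the marks distribution of 50 students from the histogram example and lists the (mid-value, frequency) pairs that would be joined by straight lines. The actual drawing is left to graph paper or a plotting tool.

```python
# Marks distribution of 50 students: class limits and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [0, 2, 3, 7, 13, 13, 9, 2, 1]

# Mid-value of a class = (lower limit + upper limit) / 2.
points = [((low + high) / 2, f) for (low, high), f in zip(classes, freqs)]

for mid, f in points:
    print(f"mid-value {mid:>4}: frequency {f}")
```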
HW: Draw a frequency polygon and histogram.
Monthly Income (in Rs.)    No. of Families
0 – 500 10
500 – 1000 15
1000 – 1500 18
1500 – 2000 12
2000 – 2500 8
2500 – 3000 4
Draw a Histogram & Frequency Polygon for the
following data:
No. of Students    7 11 15 21 16 6 4
Averages of Position:
• Positional averages, namely median, quartiles, deciles,
percentiles, and mode, are used as measures of central tendency.
• Median:
• Median may be defined as the middle value in the
data set when the elements are arranged in ascending
or descending order.
• Median is a measure of location or centrality of the
observations.
• The median can be calculated for both grouped and
ungrouped data sets.
Median: Question
Xi fi
20 6
9 4
25 16
50 7
40 8
80 2
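A minimal sketch of one way to work this question in Python, assuming the table gives values Xi with their frequencies fi. The values are first sorted, then the cumulative frequency is scanned for the ((N + 1)/2)-th observation. This shortcut suits an odd N, as here. For an even N the two middle observations would be averaged.

```python
# Values and frequencies from the question (in the order given).
values = [20, 9, 25, 50, 40, 80]
freqs = [6, 4, 16, 7, 8, 2]

pairs = sorted(zip(values, freqs))   # arrange in ascending order of Xi
n = sum(freqs)                       # total number of observations
position = (n + 1) / 2               # position of the middle observation (N is odd)

cumulative = 0
median = None
for x, f in pairs:
    cumulative += f
    if cumulative >= position:       # first class whose cumulative frequency reaches it
        median = x
        break

print("N =", n, "position =", position, "median =", median)
```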
Question:
10 – 20 23
20 – 30 27
30 – 40 21
40 – 50 15
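For grouped data the median is usually found by interpolation inside the median class. The formula is not shown in this section, so the sketch below assumes the standard one, Median = L + ((N/2 − cf) / f) × h, where L is the lower limit of the median class, cf the cumulative frequency before it, f its frequency and h its width. It also assumes the table above lists class intervals with their frequencies.

```python
# Class intervals and frequencies from the question.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [23, 27, 21, 15]

n = sum(freqs)
half = n / 2

cf_before = 0                          # cumulative frequency before the current class
median = None
for (low, high), f in zip(classes, freqs):
    if cf_before + f >= half:          # this is the median class
        h = high - low
        # Assumed standard formula: L + ((N/2 - cf) / f) * h
        median = low + (half - cf_before) / f * h
        break
    cf_before += f

print(f"N = {n}, median class starts at {low}, median = {median:.2f}")
```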
Merits & Limitations of Median:
Merits:
• It is a useful measure of central value, especially in the
case of open-ended classes.
• It is the most suitable measure for qualitative data that
can be ranked, such as beauty, intelligence, honesty, etc.
• It is not affected by extreme values.
• The value of the median can be determined graphically,
whereas the value of the mean cannot.
Limitations:
• For calculating the median it is necessary to arrange the
data in ascending or descending order.
• Since it is a positional average, it is not based on all the
observations.
• It is affected by sampling fluctuations.
Mode:
1) For raw data:
Mode is the value which occurs most frequently in
a set of observations, i.e. the value repeated the
maximum number of times. It is denoted by Z.
Limitations:
The value of the mode cannot always be determined,
and in some cases a distribution has more than one
mode.
It is not capable of further algebraic treatment, i.e. from the
modes of two sets of data we cannot calculate the combined mode.
It is not based on all the observations.
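The definition and the first limitation can be illustrated with Python's standard `statistics.multimode`, which returns every value that shares the highest frequency. The two small data sets below are hypothetical and are not taken from the slides.

```python
from statistics import multimode

# Hypothetical raw data with a single most frequent value.
single = [4, 7, 7, 9, 7, 3, 9]
# Hypothetical raw data in which two values tie for the highest frequency.
tied = [2, 3, 3, 5, 7, 7, 9]

mode_single = multimode(single)   # one mode
mode_tied = multimode(tied)       # more than one mode

print("Mode Z:", mode_single)
print("Multiple modes:", mode_tied)
```

When every value occurs exactly once, `multimode` returns all of them, which corresponds to the case where no useful mode can be determined.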
Example: 1
20 – 40 21
40 – 60 28
60 – 80 35
80 – 100 40
100 – 120 24
120 – 140 18
140 – 160 10
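Assuming Example 1 asks for the mode of this grouped distribution, one common approach is the interpolation formula Z = L + ((f1 − f0) / (2·f1 − f0 − f2)) × h, where f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes before and after it, L is its lower limit and h its width. Neither the question wording nor the formula is visible in this section, so both are assumptions of this sketch.

```python
# Class intervals and frequencies from Example 1.
classes = [(20, 40), (40, 60), (60, 80), (80, 100),
           (100, 120), (120, 140), (140, 160)]
freqs = [21, 28, 35, 40, 24, 18, 10]

# Modal class = class with the highest frequency.
i = freqs.index(max(freqs))
low, high = classes[i]
h = high - low

f1 = freqs[i]
f0 = freqs[i - 1] if i > 0 else 0               # class before the modal class
f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after the modal class

# Assumed standard formula: Z = L + (f1 - f0) / (2*f1 - f0 - f2) * h
mode = low + (f1 - f0) / (2 * f1 - f0 - f2) * h

print(f"Modal class: {low} - {high}, Z = {mode:.2f}")
```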
Measures of Dispersion
• Measures of central tendency indicate the central value,
called the average, around which the values in the data
set tend to cluster. But these measures do not
reveal how the values are spread or scattered on
each side of the central value.
• Just as central tendency can be measured by a number in
the form of an average, the amount of variation
(spread or scatter) among the values in the data set
can also be measured.
Measures of Dispersion: