1 Unnamed 04 01 2024
• Step 1: Gather first-hand information from the sample; this is called
the raw data.
• Step 2: Tabular representation of the raw data, i.e., represent the raw data
in a table.
• Step 3: Pictorial representation of the data, i.e., draw diagrams from the
data organized in the table.
• Step 4: Numerically summarize the data, i.e., describe the entire data set
with some key numbers.
• Step 5: Analyze the data using mathematical formulae.
• Step 6: Draw the final inference or conclusion about the population
under study.
Data Analysis:
• Primary data
• Secondary data
Primary Data:
Example:
If a researcher wants to know the impact of the noon meal scheme
on school children, he or she has to undertake a survey and collect data
on the opinions of parents and children by asking relevant questions.
Data collected first-hand for such a specific purpose are called primary data.
Methods for Collecting Primary Data:
Secondary Data:
Secondary data are data that have already been collected
and analyzed by some earlier agency for its own use, and that are
later used by a different agency.
Frequency Distribution:
A frequency distribution is a series in which observations
with similar or closely related values are put into separate
groups, with the groups arranged in order of magnitude. It is
simply a table in which the data are grouped into classes and the
number of cases that fall in each class is recorded. It shows the
frequency of occurrence of the different values of a single phenomenon.
The statistical data collected are generally raw data or ungrouped data.
Example:
Let us consider the daily wages (in Rs.) of 30 laborers in a factory.
800, 700, 550, 500, 600, 650, 400, 300, 800, 900, 750, 450, 350, 650,
700, 800, 820, 550, 650, 800, 600, 550, 380, 650, 750, 850, 900, 650,
450, 750.
Given a raw data set, we can rearrange it in two different ways.
One of them is to arrange it by the frequency of each value of the
variable; this representation of the data is known as a frequency distribution.
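As an illustration, the wage data from the example above can be tabulated with a short Python sketch. The discrete table counts each distinct wage; the grouped table uses classes of width 100 starting at 300, which is an assumption made here for illustration and is not specified in the notes. The helper name `grouped_frequency` is also hypothetical.

```python
from collections import Counter

# Daily wages (in Rs.) of the 30 laborers from the example.
wages = [
    800, 700, 550, 500, 600, 650, 400, 300, 800, 900, 750, 450, 350, 650,
    700, 800, 820, 550, 650, 800, 600, 550, 380, 650, 750, 850, 900, 650,
    450, 750,
]

# Discrete frequency distribution: each distinct wage with its frequency.
discrete = sorted(Counter(wages).items())


def grouped_frequency(values, start, width):
    """Count values in classes [start, start+width), [start+width, ...), etc.

    The class width and starting point are assumptions chosen by the caller.
    """
    counts = Counter((v - start) // width for v in values)
    last = max(counts)
    return [
        ((start + i * width, start + (i + 1) * width), counts[i])
        for i in range(last + 1)
    ]


grouped = grouped_frequency(wages, start=300, width=100)

for (lower, upper), f in grouped:
    print(f"{lower} - {upper}: {f}")
```

The frequencies in either table add up to 30, the number of observations, which is a quick check that no value was dropped.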
𝑥     𝑓    |  𝑥          𝑓    |  𝑥          𝑓
15    2    |  1 - 9      3    |  5 - 10     2
17    3    |  10 - 19    5    |  10 - 15    3
18    5    |  20 - 29    10   |  15 - 20    5
20    4    |  30 - 39    4    |  20 - 25    4
22    7    |  40 - 49    7    |  25 - 30    7
25    9    |  50 - 59    6    |  30 - 35    9
30    3    |  60 - 69    3    |  35 - 40    3
If $d$ is the gap between the upper limit of any class and the lower limit
of the succeeding class, the class boundaries for any class are then given by:
Upper class boundary = Upper class limit + $\frac{d}{2}$
Lower class boundary = Lower class limit - $\frac{d}{2}$
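A minimal sketch of the class boundary rule above, applied to classes of the form 10 - 19, 20 - 29, 30 - 39. The function name `class_boundaries` is hypothetical, and the sketch assumes the gap $d$ is the same between every pair of consecutive classes.

```python
def class_boundaries(limits):
    """Convert (lower limit, upper limit) pairs into class boundaries.

    Assumes the classes are in ascending order and that the gap d between
    one class's upper limit and the next class's lower limit is constant.
    """
    d = limits[1][0] - limits[0][1]  # gap between consecutive classes
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]


classes = [(10, 19), (20, 29), (30, 39)]
boundaries = class_boundaries(classes)
print(boundaries)  # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
```

When the classes already share their end points (for example 5 - 10, 10 - 15), the gap is 0 and the boundaries coincide with the limits.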
Summarizing a raw data set or an organized data set
There are two basic properties of a quantitative data set that are
commonly studied. These are central tendency and variability (or
dispersion).
Central Tendency: Quite often the entries in a data set
cluster around a central (or middle) value. This behavior of the data
set is called the central tendency. The main challenge is to locate the
central value around which the clustering takes place.
* Range
* Quartile deviation
* Variance
* Standard deviation
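The four measures listed above can be computed with Python's standard `statistics` module, as in the rough sketch below. It uses the series 4, 7, 10, 13, 16, 19, 22 that appears in the exercises. Two choices here are assumptions, not taken from the notes: the variance and standard deviation divide by N (population form), and the quartiles use the module's "exclusive" convention; other conventions can give slightly different quartile values.

```python
import statistics

data = [4, 7, 10, 13, 16, 19, 22]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartile deviation: half the distance between Q3 and Q1.
# The "exclusive" quartile convention is an assumption of this sketch.
q1, _q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
quartile_deviation = (q3 - q1) / 2

# Population variance and standard deviation (divide by N, assumed here).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(data_range, quartile_deviation, variance, std_dev)
```

For this series the mean is 13, the squared deviations sum to 252, so the population variance is 252 / 7 = 36 and the standard deviation is 6.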
1. Arithmetic Mean or Average
Quartiles: The three points which divide the series into four equal parts
are called quartiles. They are denoted by 𝑄1 , 𝑄2 , 𝑄3 .
Deciles: The nine points which divide the series into ten equal parts
are called deciles. They are denoted by 𝐷1 , 𝐷2 , ⋯ , 𝐷9 .
Percentiles: For 𝑃𝑘 , compute $\frac{kN}{100}$. Identify the same value in the
cumulative frequency (𝑐𝑓) list; otherwise find the 𝑐𝑓 just greater than
$\frac{kN}{100}$. The corresponding value of the variable is the
percentile value. Here, 𝑘 = 1, 2, ⋯ , 99.
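The cumulative frequency lookup described above can be sketched as follows, using the discrete table with values 15, 17, 18, 20, 22, 25, 30 and frequencies 2, 3, 5, 4, 7, 9, 3. The function name `percentile` is hypothetical. The comparison is done with integers (100 × 𝑐𝑓 against 𝑘𝑁) to avoid rounding problems.

```python
from itertools import accumulate


def percentile(values, freqs, k):
    """Return P_k for a discrete frequency distribution.

    Picks the first value whose cumulative frequency equals or
    exceeds kN/100, where N is the total frequency.
    """
    cf = list(accumulate(freqs))  # cumulative frequencies
    n = cf[-1]                    # total number of observations N
    for x, c in zip(values, cf):
        if 100 * c >= k * n:      # same test as c >= k*N/100
            return x


x = [15, 17, 18, 20, 22, 25, 30]
f = [2, 3, 5, 4, 7, 9, 3]

p25 = percentile(x, f, 25)  # kN/100 = 8.25, next cf is 10
p50 = percentile(x, f, 50)  # kN/100 = 16.5, next cf is 21
p90 = percentile(x, f, 90)  # kN/100 = 29.7, next cf is 30
print(p25, p50, p90)
```

Here N = 33 and the cumulative frequencies are 2, 5, 10, 14, 21, 30, 33, so the three percentiles are 18, 22 and 25. Quartiles and deciles follow from the same lookup, since 𝑄1 = 𝑃25 and 𝐷9 = 𝑃90.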
For continuous frequency distribution:
Q1) Find the first four moments about 𝑥 = 10 for the series 4, 7, 10, 13, 16, 19, 22.
Q2) Calculate the first four moments about the mean for the series 4, 7, 10, 13, 16,
19, 22.
Q3) The first four moments of a distribution about 𝑥 = 4 are 1, 4, 10 and 45.
Comment upon the nature of the distribution.
Q4) In a certain distribution, the first four moments about 𝑥 = 5 are 2, 20, 40 and 50.
Calculate and state whether the distribution is leptokurtic or platykurtic.
Q5) The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Test the
skewness and kurtosis of the distribution.
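For exercises of this kind, the following sketch may help as a way to check hand calculations. It computes the first four moments of a series about an arbitrary point and converts them to central moments. The conversion relations and the coefficients β1 = μ3² / μ2³ and β2 = μ4 / μ2² are the standard textbook ones and are not stated in these notes, so treat them as assumptions of the sketch; the function names are hypothetical.

```python
from fractions import Fraction


def moments_about(data, a, order=4):
    """First `order` moments of the data about the point a (exact fractions)."""
    n = len(data)
    return [Fraction(sum((x - a) ** r for x in data), n)
            for r in range(1, order + 1)]


def central_from_raw(m1, m2, m3, m4):
    """Convert moments about an arbitrary point into moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return (Fraction(0), mu2, mu3, mu4)  # first central moment is always 0


series = [4, 7, 10, 13, 16, 19, 22]
raw = moments_about(series, 10)   # moments about x = 10
central = central_from_raw(*raw)  # moments about the mean

beta1 = central[2] ** 2 / central[1] ** 3  # skewness coefficient
beta2 = central[3] / central[1] ** 2       # kurtosis coefficient

print([str(m) for m in raw], [str(m) for m in central], beta1, beta2)
```

Under the usual convention, β2 greater than 3 indicates a leptokurtic distribution and β2 less than 3 a platykurtic one, while β1 = 0 indicates no skewness by this measure.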