1 Unnamed 04 01 2024

Statistics is the science of collecting, organizing, summarizing, and interpreting data. It involves gathering raw data, organizing it into tables or diagrams, numerically summarizing the data using measures like the mean, median and mode, analyzing patterns in the data using mathematical formulas, and making inferences about the broader population based on the sample data. Key steps in any statistical study are collecting raw data, tabulating it, representing it pictorially or graphically, summarizing it numerically using measures of central tendency and dispersion, analyzing it mathematically, and drawing a conclusion.

Uploaded by vanchagarg
Copyright: © All Rights Reserved

Statistics:

Statistics is the branch of science in which we plan, gather and
analyze information about a particular collection of individuals or
objects under investigation.
Different authors have defined statistics differently over time.

• Statistics are numerical statements of facts in any department of
enquiry placed in relation to each other.
- A.L. Bowley
• Statistics may be defined as the science of collection, presentation,
analysis and interpretation of numerical data from the logical
analysis.
- Croxton and Cowden

The definition of statistics by Croxton and Cowden is the most
scientific and realistic one. According to this definition there are
four stages: collection of data, presentation of data, analysis of data
and interpretation of data.
Basic Steps in a Statistical Study:
For any statistical study, there are some basic steps to be followed once
we draw a sample. These are:

• Step 1: Gather first-hand information from the sample; this is called
the raw data.
• Step 2: Tabular representation of the raw data, i.e., organize the raw
data in a table.
• Step 3: Pictorial representation of the data, i.e., draw diagrams from
the data organized in the table.
• Step 4: Numerically summarize the data, i.e., describe the entire data set
with some key numbers.
• Step 5: Analyze the data using mathematical formulae.
• Step 6: Draw the final inference or conclusion about the population
under study.
Data Analysis:

• Data can be collected with reference to time, to geographical
location, or to both time and location.
• Any statistical data can be classified under two categories
depending upon the sources utilized:
  • Primary data
  • Secondary data
Primary Data:

Primary data is data collected by the investigator for the purpose of a
specific inquiry or study. Such data is original in character and is
generated by a survey conducted by individuals, a research institution
or any other organisation.

Example:
If a researcher wants to know the impact of a noon meal scheme for
school children, the researcher has to undertake a survey and collect
data on the opinions of parents and children by asking relevant
questions. Data collected for this purpose is called primary data.
Methods for Collecting Primary Data:

Primary data can be collected by the following five methods.

1. Direct personal interviews
2. Indirect oral interviews
3. Information from correspondents
4. Mailed questionnaire method
5. Schedules sent through enumerators
Secondary Data:

Secondary data are those data which have been already collected
and analyzed by some earlier agency for its own use; and later
the same data are used by a different agency.
Frequency Distribution:
A frequency distribution is a series in which observations with similar
or closely related values are put into separate groups, with the groups
arranged in order of magnitude. It is simply a table in which the data
are grouped into classes and the number of cases falling in each class
is recorded. It shows the frequency of occurrence of different values
of a single phenomenon.

A frequency distribution is constructed for three main reasons:

1) To facilitate the analysis of data.
2) To estimate frequencies of the unknown population distribution
from the distribution of sample data.
3) To facilitate the computation of various statistical measures.
Raw Data or Ungrouped Data:

The statistical data collected are generally raw data or ungrouped data.

Example:
Let us consider the daily wages (in Rs.) of 30 laborers in a factory.

800, 700, 550, 500, 600, 650, 400, 300, 800, 900, 750, 450, 350, 650,
700, 800, 820, 550, 650, 800, 600, 550, 380, 650, 750, 850, 900, 650,
450, 750.
Given a raw data set, we can rearrange it in two different ways.

• Frequency distribution or discrete frequency distribution:
List each distinct value of the variable together with its frequency
(the number of times it occurs). This representation of the data is
known as a frequency distribution.

• Grouped frequency distribution / continuous frequency distribution:
Alternatively, divide the range of the variable into class intervals and
record the number of observations in each class. This representation is
called a grouped frequency distribution of the variable.
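
As a rough illustration, both arrangements can be built from the daily wage data given above. This is a minimal sketch: the class width of 100 and the starting point of 300 are assumptions chosen for this example only, and each class is taken to include its lower limit and exclude its upper limit.

```python
from collections import Counter

# Daily wages (in Rs.) of 30 laborers, copied from the raw data example.
wages = [
    800, 700, 550, 500, 600, 650, 400, 300, 800, 900, 750, 450, 350, 650,
    700, 800, 820, 550, 650, 800, 600, 550, 380, 650, 750, 850, 900, 650,
    450, 750,
]

# Discrete frequency distribution: each distinct wage with its frequency.
discrete = dict(sorted(Counter(wages).items()))

# Grouped frequency distribution.
# WIDTH and START are illustrative assumptions, not values from the text.
WIDTH, START = 100, 300
grouped = {}
for w in wages:
    lower = START + ((w - START) // WIDTH) * WIDTH
    key = (lower, lower + WIDTH)  # class [lower, lower + WIDTH)
    grouped[key] = grouped.get(key, 0) + 1
grouped = dict(sorted(grouped.items()))

for value, f in discrete.items():
    print(value, f)
for (low, high), f in grouped.items():
    print(f"{low}-{high}", f)
```

In both tables the frequencies add up to 30, the number of observations.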
Examples:

(i) Frequency distribution or discrete frequency distribution

x       f
15      2
17      3
18      5
20      4
22      7
25      9
30      3

(ii) Grouped frequency distribution

x        f
1-9      3
10-19    5
20-29    10
30-39    4
40-49    7
50-59    6
60-69    3

(iii) Continuous frequency distribution

x        f
5-10     2
10-15    3
15-20    5
20-25    4
25-30    7
30-35    9
35-40    3
Special case in grouped frequency distribution:

If $d$ is the gap between the upper limit of any class and the lower limit
of the succeeding class, the class boundaries for any class are given by:

Upper class boundary = Upper class limit + $\frac{d}{2}$
Lower class boundary = Lower class limit - $\frac{d}{2}$
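
A small sketch of this rule follows, using class limits such as 10-19, 20-29, and so on, where the gap is $d = 20 - 19 = 1$. The function name `class_boundaries` is a hypothetical helper introduced only for this illustration, and it assumes the gap is the same between every pair of neighbouring classes.

```python
def class_boundaries(limits):
    """Convert (lower limit, upper limit) pairs into class boundaries.

    Assumes at least two classes and the same gap d between every pair
    of neighbouring classes.
    """
    # d = lower limit of the second class - upper limit of the first class
    d = limits[1][0] - limits[0][1]
    return [(low - d / 2, high + d / 2) for low, high in limits]


limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
boundaries = class_boundaries(limits)
print(boundaries[0])  # (9.5, 19.5)
print(boundaries[1])  # (19.5, 29.5)
```

After the adjustment, the upper boundary of each class equals the lower boundary of the next, so the classes become continuous.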
Summarizing a raw data set or an organized data set:
There are two basic properties of a quantitative data set that are
commonly studied. These are central tendency and variability (or
dispersion).

Central Tendency: Quite often the entries in a data set cluster around
a central (or middle) value. This behavior of the data set is called
the central tendency. The main challenge is to locate a central value
around which the clustering takes place.

Three standard measures of central tendency are:
* Mean
* Median
* Mode
Variability or Dispersion: Variability or dispersion of a data set means
the extent to which the data entries differ from one another. There are
several ways to measure dispersion or variability in a data set:

* Range
* Quartile deviation
* Variance
* Standard deviation
1. Arithmetic Mean or Average

• The mean of $n$ observations $(x_1, x_2, \dots, x_n)$ is given by

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

• In case of the discrete frequency distribution:

If the $f_i$ are the frequencies of the values $x_i$, then the mean is

$\bar{x} = \frac{1}{n} \sum_i f_i x_i$, where $n = \sum_i f_i$.

• In case of the continuous frequency distribution:

If the $f_i$ are the frequencies of the class intervals, then the mean is

$\bar{x} = \frac{\sum_i (\text{mid value of each class}) \times f_i}{\sum_i f_i}$.
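
The three cases can be sketched in a few lines. The function names below are hypothetical helpers introduced for illustration; the data are the discrete and continuous example tables shown earlier (values 15 to 30, and classes 5-10 to 35-40, each with frequencies 2, 3, 5, 4, 7, 9, 3).

```python
def mean_raw(xs):
    """Mean of n raw observations: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def mean_discrete(values, freqs):
    """Mean of a discrete frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_continuous(classes, freqs):
    """Mean of a continuous distribution, using the mid value of each class."""
    mids = [(low + high) / 2 for low, high in classes]
    return mean_discrete(mids, freqs)


freqs = [2, 3, 5, 4, 7, 9, 3]
values = [15, 17, 18, 20, 22, 25, 30]
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]

print(mean_raw([4, 7, 10]))             # 7.0
print(mean_discrete(values, freqs))     # 720 / 33, about 21.82
print(mean_continuous(classes, freqs))  # 827.5 / 33, about 25.08
```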
Calculation of mean by using deviation concept:
Sometimes the values of the variable ($x$) or the frequencies ($f$) or
both are large. Calculating the mean by the previous formulas is then
quite time-consuming. To avoid this, we calculate the mean by taking the
deviations of the given values from an arbitrary point $A$, as explained
below:

Discrete frequency distribution: $\bar{x} = A + \frac{1}{N} \sum_i f_i d_i$,
where $A$ is an arbitrary point, $d_i = x_i - A$ and $N = \sum_i f_i$.

Continuous frequency distribution: $\bar{x} = A + \frac{h}{N} \sum_i f_i d_i$,
where $A$ is an arbitrary point, $h$ is the magnitude of the class interval
and $N = \sum_i f_i$. Here, $d_i = \frac{x_i - A}{h}$, where the $x_i$ are the
mid values of the classes.
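
As a quick check, the continuous-case deviation formula can be applied to the continuous example table (classes 5-10 to 35-40). The choice $A = 22.5$ (the mid value of the middle class) and the helper name `mean_step_deviation` are assumptions made for this sketch; any other $A$ should give the same mean.

```python
def mean_step_deviation(classes, freqs, A, h):
    """Mean via deviations: A + (h / N) * sum(f_i * d_i), d_i = (x_i - A) / h."""
    N = sum(freqs)
    mids = [(low + high) / 2 for low, high in classes]
    d = [(x - A) / h for x in mids]
    return A + h * sum(f * di for f, di in zip(freqs, d)) / N


classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
freqs = [2, 3, 5, 4, 7, 9, 3]

# With A = 22.5 and h = 5 the deviations d_i are -3, -2, -1, 0, 1, 2, 3,
# so sum(f_i * d_i) = 17 and the mean is 22.5 + 5 * 17 / 33.
result = mean_step_deviation(classes, freqs, A=22.5, h=5)
print(result)  # about 25.08
```

The result agrees with the direct formula $\sum f_i x_i / \sum f_i = 827.5 / 33$, while the intermediate numbers stay small.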
Karl Pearson Relationship:

Sometimes the mode is estimated from the mean and the median.
For a symmetrical distribution, the mean, median and mode
coincide. If the distribution is moderately asymmetrical, the
mean, median and mode obey the following empirical
relationship (due to Karl Pearson):

The distance between the mean and the median is about one-third of
the distance between the mean and the mode:

Mean - Mode = 3 (Mean - Median),

which gives Mode = 3 Median - 2 Mean.


Relation between Mean, Median, Mode:

1. In symmetrical distribution Mean = Median = Mode.

2. In positively skewed distribution Mode < Median < Mean.

3. In negatively skewed distribution Mean < Median < Mode.
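
The empirical relationship can be tried on the daily wage data from the raw data example. This is only a sketch: the relationship is approximate, so the estimated mode need not equal the most frequent value.

```python
import statistics

# Daily wages (in Rs.) of 30 laborers, copied from the raw data example.
wages = [
    800, 700, 550, 500, 600, 650, 400, 300, 800, 900, 750, 450, 350, 650,
    700, 800, 820, 550, 650, 800, 600, 550, 380, 650, 750, 850, 900, 650,
    450, 750,
]

mean = statistics.mean(wages)      # 19250 / 30, about 641.67
median = statistics.median(wages)  # 650.0

# Karl Pearson's empirical estimate: Mode = 3 Median - 2 Mean
mode_estimate = 3 * median - 2 * mean

print(mean, median, mode_estimate)
print(statistics.mode(wages))  # most frequent wage, for comparison
```

Here the mean is smaller than the median, and the estimated mode (about 666.67) is larger than both, which matches the ordering listed for a negatively skewed distribution.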


Partitions:
These are the values which divide the series into a number of equal parts.

Quartiles: The three points which divide the series into four equal parts
are called quartiles. They are denoted by $Q_1, Q_2, Q_3$.

Deciles: The nine points which divide the series into ten equal parts
are called deciles. They are denoted by $D_1, D_2, \dots, D_9$.

Percentiles: The ninety-nine points which divide the series into a hundred
equal parts are called percentiles. They are denoted by $P_1, P_2, \dots, P_{99}$.
For discrete frequency distribution:
Here $N = \sum f_i$ and $cf$ denotes the cumulative frequency.

Quartiles:- $Q_k$: Compute $\frac{kN}{4}$. Identify the same value in the $cf$
list; otherwise find the $cf$ just greater than $\frac{kN}{4}$. The
corresponding value of the variable is the quartile value. Here, $k = 1, 2, 3$.

Deciles:- $D_k$: Compute $\frac{kN}{10}$. Identify the same value in the $cf$
list; otherwise find the $cf$ just greater than $\frac{kN}{10}$. The
corresponding value of the variable is the decile value. Here,
$k = 1, 2, \dots, 9$.

Percentiles:- $P_k$: Compute $\frac{kN}{100}$. Identify the same value in the
$cf$ list; otherwise find the $cf$ just greater than $\frac{kN}{100}$. The
corresponding value of the variable is the percentile value. Here,
$k = 1, 2, \dots, 99$.
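
The lookup rule is the same for quartiles, deciles and percentiles; only the divisor changes. A minimal sketch, using the discrete example table (values 15 to 30 with frequencies 2, 3, 5, 4, 7, 9, 3), follows. The helper name `partition_value` and its `parts` parameter (4, 10 or 100) are assumptions made for this illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_value(values, freqs, k, parts):
    """k-th partition value of a discrete frequency distribution.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    cf = list(accumulate(freqs))    # cumulative frequencies
    target = k * cf[-1] / parts     # kN / parts
    # First cf equal to the target, otherwise the cf just greater than it.
    return values[bisect_left(cf, target)]


values = [15, 17, 18, 20, 22, 25, 30]
freqs = [2, 3, 5, 4, 7, 9, 3]       # cf: 2, 5, 10, 14, 21, 30, 33

print(partition_value(values, freqs, 1, 4))     # Q1: 8.25  -> cf 10 -> 18
print(partition_value(values, freqs, 2, 4))     # Q2: 16.5  -> cf 21 -> 22
print(partition_value(values, freqs, 3, 4))     # Q3: 24.75 -> cf 30 -> 25
print(partition_value(values, freqs, 90, 100))  # P90: 29.7 -> cf 30 -> 25
```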
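For continuous frequency distribution:
In each formula below, $l$ is the lower boundary of the class found in
Step-1, $h$ is the width of that class, $f$ is its frequency, $c$ is the
cumulative frequency of the class preceding it, and $N = \sum f_i$.

Quartiles:- Step-1: Find the quartile class:
Compute $\frac{kN}{4}$. Identify the same value in the $cf$ list; otherwise
find the $cf$ just greater than $\frac{kN}{4}$.
Step-2: Use the formula
$Q_k = l + \frac{h}{f} \left( \frac{kN}{4} - c \right)$; $k = 1, 2, 3$.

Deciles:- Step-1: Find the decile class:
Compute $\frac{kN}{10}$. Identify the same value in the $cf$ list; otherwise
find the $cf$ just greater than $\frac{kN}{10}$.
Step-2: Use the formula
$D_k = l + \frac{h}{f} \left( \frac{kN}{10} - c \right)$; $k = 1, 2, \dots, 9$.

Percentiles:- Step-1: Find the percentile class:
Compute $\frac{kN}{100}$. Identify the same value in the $cf$ list; otherwise
find the $cf$ just greater than $\frac{kN}{100}$.
Step-2: Use the formula
$P_k = l + \frac{h}{f} \left( \frac{kN}{100} - c \right)$; $k = 1, 2, \dots, 99$.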
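
The two steps can be sketched for the continuous example table (classes 5-10 to 35-40 with frequencies 2, 3, 5, 4, 7, 9, 3). The helper name `grouped_partition_value` is hypothetical, and the code takes $l$ as the lower boundary of the located class, $h$ as its width, $f$ as its frequency and $c$ as the cumulative frequency of the preceding class, which is the usual reading of these symbols.

```python
from bisect import bisect_left
from itertools import accumulate


def grouped_partition_value(classes, freqs, k, parts):
    """k-th partition value for a continuous frequency distribution.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    cf = list(accumulate(freqs))
    target = k * cf[-1] / parts        # kN / parts
    i = bisect_left(cf, target)        # Step-1: locate the class
    l, upper = classes[i]
    h = upper - l                      # class width
    f = freqs[i]                       # frequency of the located class
    c = cf[i - 1] if i > 0 else 0      # cf of the preceding class
    return l + (h / f) * (target - c)  # Step-2: interpolate within the class


classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40)]
freqs = [2, 3, 5, 4, 7, 9, 3]          # cf: 2, 5, 10, 14, 21, 30, 33

q1 = grouped_partition_value(classes, freqs, 1, 4)  # 15 + (5/5)(8.25 - 5)
q2 = grouped_partition_value(classes, freqs, 2, 4)  # 25 + (5/7)(16.5 - 14)
q3 = grouped_partition_value(classes, freqs, 3, 4)  # 30 + (5/9)(24.75 - 21)
print(q1, q2, q3)
```

For this table the sketch gives $Q_1 = 18.25$, $Q_2 \approx 26.79$ and $Q_3 \approx 32.08$.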
Problems based on the Moments, Skewness and Kurtosis concepts:

Q1) Find the first four moments about 𝑥 = 10 for the series 4, 7, 10, 13, 16, 19, 22.

Q2) Calculate the first four moments about the mean for the series 4, 7, 10, 13, 16,
19, 22.

Q3) The first four moments of a distribution about 𝑥 = 4 are 1, 4, 10 and 45.
Comment upon the nature of the distribution.

Q4) In a certain distribution, the first four moments about $x = 5$ are 2, 20, 40 and 50.
Calculate $\beta_2$ and state whether the distribution is leptokurtic or platykurtic.

Q5) The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Test the
skewness and kurtosis of the distribution.
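
For the series used in Q1 and Q2, the moments can be checked numerically. This sketch assumes the usual definition, which is not stated in this section: the $r$-th moment about a point $A$ is $\frac{1}{n} \sum_i (x_i - A)^r$, and the central moments are the moments about the mean. The helper name `moments_about` is hypothetical.

```python
def moments_about(xs, a, orders=(1, 2, 3, 4)):
    """Moments of the listed orders about the point a: (1/n) * sum((x - a)^r)."""
    n = len(xs)
    return [sum((x - a) ** r for x in xs) / n for r in orders]


series = [4, 7, 10, 13, 16, 19, 22]

# Q1: first four moments about x = 10.
about_10 = moments_about(series, 10)
print(about_10)  # [3.0, 45.0, 351.0, 4293.0]

# Q2: first four moments about the mean (central moments).
mean = sum(series) / len(series)  # 13.0
central = moments_about(series, mean)
print(central)   # [0.0, 36.0, 0.0, 2268.0]
```

The third central moment is zero because the series is symmetric about its mean of 13.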
