
Classification of Data

This document discusses concepts related to data classification and tabulation. It defines key terms like variables, quantitative vs. qualitative variables, random variables, and ordered arrays. It explains that data classification involves grouping related data into classes based on common characteristics like geography, time, attributes, or quantitative intervals. The objectives of classification are to condense data, facilitate comparison, identify significant features, and enable statistical analysis. Principles of classification include exhaustive and mutually exclusive classes and an appropriate number and width of classes. Frequency distributions can be continuous or discrete based on class intervals.

Uploaded by rga
Copyright: © All Rights Reserved

CLASSIFICATION AND TABULATION OF DATA

CONTENT
• Concept of variable
• Ordered array
• What is data classification?
• Objectives of classification
• Frequency distributions
• Variables and attributes
• Tabulation of data
Concept of Variable
• Variable: a characteristic which takes on different values in different persons, places or things.
Example: diastolic/systolic blood pressure, heart rate, the heights of adult males, the weights of preschool children and the ages of patients seen in a dental clinic.
Notation: X = blood pressure, Y = age.
• Quantitative variable: one that can be measured and expressed numerically. The measurements convey information regarding amount.
Example: diastolic/systolic blood pressure, heart rate, the heights of adult males, the weights of preschool children and the ages of patients seen in a dental clinic; also grade point average, the number of stars in a galaxy, bank account balance, the average number of patients discharged daily from a COVID-19 ward, etc.
• Qualitative variable: a characteristic that cannot be measured quantitatively but can be categorized. The measurement conveys information regarding an attribute. Measurement in the real sense cannot be achieved, but the persons, places or things belonging to different categories can be counted.
Example: gender of a patient, eye colour, hair colour, region, etc.
A random variable may be discrete or continuous.
Random Variable
• A variable whose values arise as a result of chance factors, so that they cannot be predicted exactly in advance.
Example: the heights of a group of randomly selected adults.
• Discrete random variable: characterized by gaps or interruptions in the values that it can assume.
  • It assumes values with definite jumps.
  • It cannot take all possible values within a range.
  • It is observed through counting only.
Example: the number of daily admissions to a general hospital; the number of decayed, missing or filled teeth per child in an elementary school.
Discrete variable X: takes separate values such as 0, 1, 2, 4, 5, 6, 8, 10, with gaps between them.
• Continuous random variable:
  • It can take all possible values (positive, negative, integral and fractional) within a specified relevant interval.
  • It does not possess gaps or interruptions within the specified interval of values assumed by the variable.
  • It is derived through measurement.
Example: height, weight and skull circumference.
Because of the limitations of available measuring instruments, however, observations on variables that are inherently continuous are recorded as if they were discrete.
Continuous variable X: in the range 0 to 10, X can take any value (0.0001, 0.0002, …, 10). Such values are grouped into intervals, e.g. 0 to 1, 1 to 2, 2 to 3, and so on; for a range of 0 to 100, the intervals could be 0 to 10, 10 to 20, and so on.
The ordered array
• A first step in organizing data is the preparation of an ordered array.
• It is a listing of the values of a data series from the smallest to the largest.
• It enables one to quickly determine the smallest and largest values in the data set, and other facts about the arrayed data that might be needed in a hurry.
• Look at the unordered and ordered data in the file Data Example.xls.
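A minimal Python sketch of preparing an ordered array, using the 35-family survey data given later in this section (number of children per family). The built-in `sorted()` produces the ordered array, and the smallest and largest values are simply read off its two ends.

```python
# Number of children per family in the 35-family village survey (from the text)
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 9, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

# The ordered array lists the values from the smallest to the largest
ordered = sorted(children)

# Smallest and largest values sit at the two ends of the ordered array
smallest, largest = ordered[0], ordered[-1]

print("Ordered array:", ordered)
print("Smallest:", smallest, "Largest:", largest)
```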

DATA CLASSIFICATION: the grouping of related facts/data into different classes according to some common characteristic.
• Bases of data classification: there are four broad bases.
1. Geographical, i.e. area-wise
• Total population of Orissa by districts
• Number of deaths due to malaria by districts
• Infant deaths in Orissa by districts
2. Chronological or temporal (on the basis of time)
Example: deaths by lightning
YEAR NUMBER
1990 10
1991 5
1992 12
1993 6
1994 9
1995 5
1996 3
1997 3
1998 12
1999 12
2000 8
2001 7
2002 8
3. Qualitative, i.e. on the basis of some attributes
Example: people by place of residence, sex and literacy
Place of residence
  Rural
    Male: Literate, Illiterate
    Female: Literate, Illiterate
  Urban
    Male: Literate, Illiterate
    Female: Literate, Illiterate
4. Quantitative: on the basis of quantitative class intervals
For example, the students of a college may be classified according to weight as follows:
Table 3: Weight of students of a college
Weight (lb)   No. of students
90-100 50
100-110 200
110-120 260
120-130 360
130-140 90
140-150 40
Total 1000
Classification of age of 600 persons in the social survey
Class interval (years)   Frequency   Relative frequency (%)
15-24                        56             9.3
25-34                       153            25.5
35-44                       149            24.8
45-54                        75            12.5
55-64                        61            10.2
65-74                        70            11.7
75-84                        28             4.7
85-94                         8             1.3
Total                       600           100.0
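The relative frequency column above is each class frequency divided by the total frequency, expressed as a percentage. A minimal Python sketch reproducing it from the class frequencies in the table (rounded to one decimal place, as in the table):

```python
# Age classes and frequencies from the social-survey table (600 persons)
age_classes = ["15-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75-84", "85-94"]
frequencies = [56, 153, 149, 75, 61, 70, 28, 8]

total = sum(frequencies)

# Relative frequency (%) = class frequency / total frequency * 100
relative = [round(100 * f / total, 1) for f in frequencies]

for cls, f, r in zip(age_classes, frequencies, relative):
    print(f"{cls:>6}  {f:>4}  {r:>5}")
print(f"{'Total':>6}  {total:>4}  {100 * total / total:>5.1f}")
```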
In a survey of 35 families in a village, the number of children per family was recorded and the following data were obtained.

1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 9 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
OBJECTIVES OF CLASSIFICATION
• It helps in condensing the mass of data so that similarities and dissimilarities can be readily distinguished.
No. of children   No. of families   Cum. frequency   Cum. frequency
                  (frequency)       (less than)      (greater than)
0-2                      7                 7               35
3-5                     16                23               28
6 and above             12                35               12
Total                   35
• It facilitates comparison.
No. of children   No. of families   Cum. frequency   Cum. frequency
                  (frequency)       (less than)      (greater than)
0-2                      7            7 (20%)         35 (100%)
3-5                     16           23 (65.7%)       28 (80%)
6 and above             12           35 (100%)        12 (34.3%)
Total                   35
• The most significant features of the data can be pinpointed at a glance.
• It enables statistical treatment of the collected data:
  • averages can be computed
  • variations can be revealed
  • associations can be studied
  • models for prediction/forecasting can be built
  • hypotheses can be formulated and tested, etc.
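A minimal Python sketch of how the condensed tables above can be derived from the raw 35-family data: values are grouped into the classes 0-2, 3-5 and "6 and above", then the less-than and greater-than cumulative frequencies and their percentages are computed. Representing the open-ended class with an infinite upper limit is an illustrative choice, not something the text prescribes.

```python
from itertools import accumulate

# Number of children per family in the 35-family village survey (from the text)
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 9, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

# Class limits used in the text; the open-ended class "6 and above" has no upper limit
classes = [("0-2", 0, 2), ("3-5", 3, 5), ("6 and above", 6, float("inf"))]

# Frequency of each class: number of values within its (inclusive) limits
freq = [sum(lo <= x <= hi for x in children) for _, lo, hi in classes]
n = sum(freq)

# "Less than" type: running total starting from the first class
less_than = list(accumulate(freq))
# "Greater than" type: running total starting from the last class
greater_than = list(accumulate(reversed(freq)))[::-1]

for (label, _, _), f, lt, gt in zip(classes, freq, less_than, greater_than):
    print(f"{label:<12}{f:>4}{lt:>5} ({100 * lt / n:.1f}%){gt:>5} ({100 * gt / n:.1f}%)")
print(f"{'Total':<12}{n:>4}")
```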
Principles of Classification:
There are no hard and fast rules for deciding the class interval; however, it depends upon:
  • knowledge of the data
  • the lowest and highest values of the set of observations
  • the utility of the class intervals for meaningful comparison and interpretation
• The classes should be collectively exhaustive and non-overlapping, i.e. mutually exclusive.
• The number of classes should not be too large, otherwise the purpose of classification, i.e. summarization of the data, will not be served.
• The number of classes should not be too small either, for this may also obscure the true nature of the distribution.
• The classes should preferably be of equal width. Otherwise the class frequencies would not be comparable, and the computation of statistical measures will be laborious.
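Some of these principles can be checked mechanically. The sketch below is a hypothetical helper, `check_classes`, assuming exclusive classes (lower <= x < upper) such as those of Table 3; the `data_min` and `data_max` values are made-up observed extremes used only for illustration.

```python
def check_classes(classes, data_min, data_max):
    """Hypothetical helper: check exclusive classes (lower <= x < upper) against the principles."""
    classes = sorted(classes)
    # Non-overlapping with no gaps: each class starts exactly where the previous one ends
    contiguous = all(prev[1] == nxt[0] for prev, nxt in zip(classes, classes[1:]))
    # Exhaustive: contiguous classes covering both the smallest and the largest observation
    exhaustive = contiguous and classes[0][0] <= data_min and data_max < classes[-1][1]
    # Equal width: every class has the same span (upper limit - lower limit)
    equal_width = len({upper - lower for lower, upper in classes}) == 1
    return {"contiguous": contiguous, "exhaustive": exhaustive, "equal_width": equal_width}

# Weight classes from Table 3; data_min and data_max are hypothetical observed extremes
weight_classes = [(90, 100), (100, 110), (110, 120), (120, 130), (130, 140), (140, 150)]
print(check_classes(weight_classes, data_min=92, data_max=148))

# Overlapping classes of unequal width fail the checks
print(check_classes([(0, 10), (5, 20)], data_min=0, data_max=15))
```

The number of classes is left to judgement here, since the text gives no fixed rule for it.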
Frequency distributions may be of two kinds:
• Continuous frequency distribution
• Discrete frequency distribution
Continuous Frequency Distribution
1. Class limits
2. Class interval
3. Class frequency
1. Class limits: the lowest and highest values that can be included in a class.
  (a) In the class 70-89, the lower limit is 70 and the upper limit is 89; that is, no value in this class can be less than 70 or more than 89.
  (b) In the class 90-109, the lower limit is 90 and the upper limit is 109; that is, no value in this class can be less than 90 or more than 109.
2. Class interval: the span of a class, that is, the difference between the upper limit and the lower limit, is known as the class interval.
Example: if the number of persons with income between 10000 and 15000 is 50, the class interval in this case is 15000 - 10000 = 5000.
3. Class frequency: the number of observations corresponding to a particular class is known as the frequency of that class, or class frequency.
Example: if the number of persons with income between 10000 and 15000 is 50, then 50, the number of persons in that class interval, is the frequency of the class.
Class mid-point: the value lying halfway between the lower and upper limits of a class interval.
$$\text{Mid-point} = \frac{\text{upper limit of the class} + \text{lower limit of the class}}{2}$$
Example: for the class 10000-15000, mid-point = (15000 + 10000)/2 = 12500.
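The class interval and mid-point definitions translate directly into two small functions. A minimal Python sketch, checked against the income class 10000-15000 and the class 70-89 used in the text (the function names are illustrative):

```python
def class_interval(lower, upper):
    """Span of a class: upper limit minus lower limit."""
    return upper - lower

def mid_point(lower, upper):
    """Value lying halfway between the lower and upper class limits."""
    return (upper + lower) / 2

# Income class 10000-15000 from the example in the text
print(class_interval(10000, 15000))  # 5000
print(mid_point(10000, 15000))       # 12500.0

# Class 70-89 from the class-limits example
print(mid_point(70, 89))             # 79.5
```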
Classification based on class intervals may be:
• Exclusive (continuous)
• Inclusive (discontinuous)
Classification is called exclusive (continuous) when the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, and the upper limit is not included in the class interval.
An example
Income (Rs.)                          No. of families
1000-1100 (1000 but under 1100)             15
1100-1200 (1100 but under 1200)             25
1200-1300 (1200 but under 1300)             10
Total                                       50
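In the exclusive method a value equal to a shared boundary belongs to the higher class, because the upper limit is not included. A minimal Python sketch, using a hypothetical helper `exclusive_class` and the income classes of the example above (width 100 assumed from the table):

```python
def exclusive_class(value, lower_limits, width):
    """Hypothetical helper: return the exclusive class (lower, upper) with lower <= value < upper."""
    for lower in lower_limits:
        upper = lower + width
        if lower <= value < upper:
            return (lower, upper)
    return None  # the value lies outside every class

# Lower limits of the income classes in the example table, each of width 100
income_classes = [1000, 1100, 1200]

print(exclusive_class(1099, income_classes, 100))  # (1000, 1100)
print(exclusive_class(1100, income_classes, 100))  # (1100, 1200): 1100 is excluded from 1000-1100
print(exclusive_class(1300, income_classes, 100))  # None
```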
• Classification is inclusive (discontinuous) when both the lower and the upper limit of a class are included in that class itself.
Income (Rs.)                                 No. of persons
1000-1099 (1000 to 1099, both included)            50
1100-1199 (1100 to 1199, both included)           100
1200-1299 (1200 to 1299, both included)           200
Total                                             350
• A discontinuous class interval can be made continuous by applying a correction factor (CF):
$$CF = \frac{\text{lower limit of the 2nd class} - \text{upper limit of the 1st class}}{2}$$
The correction factor is subtracted from the lower limit and added to the upper limit of each class to make the class intervals continuous.
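A minimal Python sketch of the correction-factor method applied to the inclusive income classes above (the helper name `make_continuous` is illustrative). For these classes CF = (1100 - 1099)/2 = 0.5, so 1000-1099 becomes 999.5-1099.5, and so on.

```python
def make_continuous(classes):
    """Convert inclusive (lower, upper) classes into continuous ones using the correction factor."""
    # CF = (lower limit of the 2nd class - upper limit of the 1st class) / 2
    cf = (classes[1][0] - classes[0][1]) / 2
    # Subtract CF from every lower limit and add it to every upper limit
    return cf, [(lower - cf, upper + cf) for lower, upper in classes]

# Inclusive income classes from the example table
inclusive = [(1000, 1099), (1100, 1199), (1200, 1299)]
cf, continuous = make_continuous(inclusive)

print(cf)          # 0.5
print(continuous)  # [(999.5, 1099.5), (1099.5, 1199.5), (1199.5, 1299.5)]
```

After conversion the upper limit of each class equals the lower limit of the next, which is exactly the exclusive (continuous) form.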
Frequency distributions
• Quantitative variables:
  • Discrete variable
  • Continuous variable
• Qualitative variables (attributes)
• The manner in which the total number of observations is distributed over different classes is called a frequency distribution.
Frequency distribution of an attribute
In 1993, 1674 inhabitants of Calcutta, Bombay and Madras were surveyed. Each was asked, among other questions, whether he/she knew about HIV/AIDS. The results are tabulated below.

Table 4: Results of survey on awareness of HIV/AIDS
State of knowledge   Number of people
Aware                      620
Unaware                   1054
Total                     1674

Table 5: Proportion of people aware of HIV/AIDS
State of knowledge   Relative frequency
Aware                     0.370
Unaware                   0.630
Total                     1.000
Frequency distribution of a discrete variable
• The data are grouped into classes and the number of cases which fall in each class is recorded.

Example: In a survey of 35 families in a village, the number of children per family was recorded and the following data were obtained.

1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 9 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
Steps for frequency distribution
• Find the largest and smallest values; these are 9 and 0 respectively.
• Form a table with 10 classes for the 10 values 0, 1, 2, …, 9.
• Look at the given values of the variable one by one, and for each value put a tally mark in the table against the appropriate class.
• To facilitate counting, the tally marks are arranged in blocks of five, every fifth stroke being drawn across the preceding four. This is done below.
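The tallying steps can be sketched in Python with `collections.Counter`, which counts each value once per occurrence, just as one tally mark is made per observation. This is a minimal sketch; writing a block of five as the ASCII string `||||/` (four strokes crossed by a fifth) is an illustrative convention, not part of the original method.

```python
from collections import Counter

# Number of children per family in the 35-family village survey (from the text)
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 9, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

# Steps 1-2: the smallest and largest values fix the classes 0, 1, ..., 9
smallest, largest = min(children), max(children)

# Step 3: one count per observation, like one tally mark per value
counts = Counter(children)

def tally(n):
    """ASCII tally marks in blocks of five; '||||/' stands for four strokes crossed by a fifth."""
    blocks = ["||||/"] * (n // 5)
    if n % 5:
        blocks.append("|" * (n % 5))
    return " ".join(blocks)

# One class per value, including any value that never occurs (Counter returns 0 for it)
frequency = {value: counts[value] for value in range(smallest, largest + 1)}

for value, f in frequency.items():
    print(f"{value:>2}  {tally(f):<8} {f:>2}")
```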
Table 6: Frequency table
No. of     Tallies    Frequency   Cumulative frequency   Cumulative frequency
children                          (less than type)       (more than type)
0          ||             2               2                     35
1          |              1               3                     33
2          ||||           4               7                     32
3          ||||/ |        6              13                     28
4          ||||/          5              18                     22
5          ||||/          5              23                     17
6          |||            3              26                     12
7          ||||           4              30                      9
8          ||             2              32                      5
9          |||            3              35                      3
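Cumulative Frequency
The cumulative frequency (less than type) of a class is the sum of the frequency of that class and the frequencies of all classes below it in a frequency distribution. The more than type is the sum of the frequency of that class and the frequencies of all classes above it.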
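Both cumulative columns of Table 6 are running totals of the frequency column, taken from the top and from the bottom respectively. A minimal Python sketch using `itertools.accumulate` on the frequencies of Table 6:

```python
from itertools import accumulate

# Number of children (0-9) and their frequencies from Table 6
values = list(range(10))
freq = [2, 1, 4, 6, 5, 5, 3, 4, 2, 3]

# Less than type: frequency of the class plus all classes below it
less_than = list(accumulate(freq))
# More than type: frequency of the class plus all classes above it
more_than = list(accumulate(reversed(freq)))[::-1]

for v, f, lt, mt in zip(values, freq, less_than, more_than):
    print(f"{v:>2} {f:>3} {lt:>4} {mt:>4}")
```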
TABULATION OF DATA
• Tabulation compresses the data into rows and columns so that relationships can be understood.
• Tabulation simplifies complex data, facilitates comparison, gives identity to the data and reveals patterns.
Different parts of a table
• Table number
• Title of the table
• Caption: column heading
• Stubs: row heading
• Body: contains the data
• Head notes: anything that is not explained in the title, caption or stubs can be explained in the head notes, at the top of the table below the title.
• Foot notes: the source of the data and any exceptions in the data can be given in the foot notes.
Tables can be classified in three ways:
Type of table                            Characteristic feature
1. Simple table                          Only one characteristic is shown.
2. Complex table
   a. Two-way table                      Shows two characteristics; it is formed when either the stub or the caption is divided into two coordinate parts.
   b. Higher-order table                 When three or more characteristics are represented in the same table, it is called a higher-order table.
3. General and special purpose tables    Tables published by the Government, such as those in the Statistical Abstract of India or census reports, are general purpose tables.
Simple Table/One-way Table
• In this type of table, only one characteristic is shown.
• It is the simplest of tables.
Simple Table -Example
Age (years)   No. of employees
Below 25 50
25-35 67
35-45 43
45-55 15
55 and above 5
Total 180
Complex table
Average number of OPD patients in a PHC in a tribal area in different age groups, according to sex
Age (years)   OPD patients
              Male   Female   Total
Below 25 25 5 30
25-35 30 4 34
35-45 25 5 30
45-55 22 3 25
Above 55 15 1 16
Total 117 18 135
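In a two-way table the Total column and Total row are marginal sums: across the caption for each stub entry, and down each caption column. A minimal Python sketch that recomputes them from the body of the OPD table above:

```python
# Two-way table: OPD patients by age group (stub) and sex (caption), from the text
table = {
    "Below 25": {"Male": 25, "Female": 5},
    "25-35": {"Male": 30, "Female": 4},
    "35-45": {"Male": 25, "Female": 5},
    "45-55": {"Male": 22, "Female": 3},
    "Above 55": {"Male": 15, "Female": 1},
}

# Row totals: Male + Female for each age group
row_totals = {age: sum(cells.values()) for age, cells in table.items()}

# Column totals: sum of each sex over all age groups
column_totals = {sex: sum(cells[sex] for cells in table.values()) for sex in ("Male", "Female")}

grand_total = sum(row_totals.values())

print(row_totals)     # {'Below 25': 30, '25-35': 34, '35-45': 30, '45-55': 25, 'Above 55': 16}
print(column_totals)  # {'Male': 117, 'Female': 18}
print(grand_total)    # 135
```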
Number of patients in OPDs of a public sector hospital by religion, age, rank and sex
                                          Rank
Religion   Age (years)    Supervisor   Assistant   Clerks     Total
                          F  M  T      F  M  T     F  M  T    F  M  T
Hindu      Below 25
           25-35
           35-45
           45-55
           55 & above
Muslim     Below 25
           25-35
           35-45
           45-55
           55 & above
Total
