
Classification of Data (1)

The document provides a comprehensive overview of data classification, defining it as the grouping of related facts into classes and explaining various types including geographical, chronological, qualitative, and quantitative classifications. It outlines the objectives of classification, methods for creating frequency distributions, and the importance of relative frequency and bivariate distributions. Additionally, it includes examples and methods for organizing data effectively for analysis.

Uploaded by ellen deus
Copyright © All Rights Reserved

CLASSIFICATION OF DATA

DEFINITION

• Classification is the grouping of related facts into classes.
• Facts in one class differ from those in another class with respect to some characteristic, called the basis of classification.
• Sorting facts on one basis of classification and then on another basis is called cross-classification.
• This process can be repeated as many times as there are possible bases of classification.
• Example: sorting letters in a post office.
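Cross-classification can be illustrated with a short Python sketch. The letter records, the place names and the two bases used (`city` first, then `district`) are hypothetical, chosen only to show sorting on one basis and then on another; they are not taken from the source.

```python
from collections import defaultdict

# Hypothetical letters, each recorded as (letter_id, city, district).
letters = [
    (1, "Dodoma", "North"),
    (2, "Arusha", "East"),
    (3, "Dodoma", "South"),
    (4, "Arusha", "East"),
    (5, "Dodoma", "North"),
]


def cross_classify(records, first_basis, second_basis):
    """Group records on a first basis, then on a second basis within each group."""
    groups = defaultdict(lambda: defaultdict(list))
    for record in records:
        groups[first_basis(record)][second_basis(record)].append(record)
    # Return plain dicts with keys in sorted order, for easier reading.
    return {
        outer: dict(sorted(inner.items()))
        for outer, inner in sorted(groups.items())
    }


classes = cross_classify(letters, lambda r: r[1], lambda r: r[2])
for city, districts in classes.items():
    for district, items in districts.items():
        print(city, district, [r[0] for r in items])
```

Running it prints `Arusha East [2, 4]`, `Dodoma North [1, 5]` and `Dodoma South [3]`. Adding a third basis would add one more level of nesting, which is the repetition described in the bullets above.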
OBJECTIVES OF CLASSIFICATION

• To condense the mass of data in such a manner that similarities and dissimilarities can be readily apprehended.
• To facilitate comparison.
• To pinpoint the most significant features of the data at a glance.
• To give prominence to the important information gathered while dropping out unnecessary elements.
• To enable statistical treatment of the material collected.
Types of Classification

• Geographical, i.e., area-wise, e.g., cities, districts, etc.
• Chronological, i.e., on the basis of time.
• Qualitative, i.e., according to some attributes.
• Quantitative, i.e., in terms of magnitudes.
Geographical classification
In this type of classification, data are classified on the basis of geographical or locational differences between the various items, such as countries, states, cities, regions, zones, areas, etc. For example, the international comparison of rice yield for 2023-04 is given in the following table.
Name of the country Yield of rice (tonnes/hectare)
Egypt 9.80
India 2.90
Japan 6.42
Myanmar 2.43
Korea 6.73
Thailand 2.63
USA 7.83
Average 5.53

In a geographical classification, items are usually listed in alphabetical order for easy reference. Items may also be listed by size to emphasize the important areas, as in ranking the states by population.
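The two listing orders can be sketched in Python using the rice-yield figures from the table. The dictionary name `rice_yield` is illustrative. The sketch also recomputes the table's Average row, assuming it is the simple (unweighted) mean of the seven countries; that assumption reproduces the printed 5.53.

```python
# Rice yields (tonnes/hectare) copied from the table above.
rice_yield = {
    "Egypt": 9.80,
    "India": 2.90,
    "Japan": 6.42,
    "Myanmar": 2.43,
    "Korea": 6.73,
    "Thailand": 2.63,
    "USA": 7.83,
}

# Alphabetical order, for easy reference.
alphabetical = sorted(rice_yield)

# Ranked by size (largest yield first), to emphasize the important areas.
ranked = sorted(rice_yield, key=rice_yield.get, reverse=True)

# Simple (unweighted) average over the seven countries.
average = sum(rice_yield.values()) / len(rice_yield)

print(alphabetical)
print(ranked)
print(f"Average: {average:.2f}")
```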
Chronological Classification
When data are observed over a period of time, the classification is known as chronological classification. For example, we may present the figures of population (or production, sales, etc.) as follows:

Year Population (in millions)
1972 36.11
1982 43.92
1992 54.82
2002 68.33
2012 84.64
2022 102.87

Time series are usually listed in chronological order, normally starting with
the earliest period. When emphasis falls on the most recent events, a
reverse time order may be used.
Qualitative Classification
In qualitative classification, data are classified on the basis of some attribute or quality such as sex, color of hair, literacy, religion, etc. The attribute under study cannot be measured; one can only find out whether it is present or absent in the units of the population under study. For example, the population under study may be divided into two categories as follows:

Population
  Urban
  Rural

In a similar manner, we may classify the population on the basis of sex, i.e., into males and females, or literacy, i.e., into literates and illiterates, and so on. Applying several attributes one after another (here sex, then literacy, then employment status) gives a cross-classification:

Population
  Males
    Literates
      Employed
      Unemployed
    Illiterates
      Employed
      Unemployed
  Females
    Literates
      Employed
      Unemployed
    Illiterates
      Employed
      Unemployed
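Counting the units that fall in each branch of such a classification can be sketched with Python's `collections.Counter`. The six survey records below are hypothetical; only the three attributes (sex, literacy, employment status) come from the diagram above.

```python
from collections import Counter

# Hypothetical survey records: (sex, literacy, employment status).
people = [
    ("Male", "Literate", "Employed"),
    ("Male", "Illiterate", "Unemployed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Literate", "Unemployed"),
    ("Male", "Literate", "Employed"),
    ("Female", "Illiterate", "Employed"),
]

# Frequencies at each level of the classification tree.
by_sex = Counter(p[0] for p in people)             # first level: sex
by_sex_literacy = Counter(p[:2] for p in people)   # second level: sex and literacy
by_all_three = Counter(people)                     # third level: all three attributes

print(by_sex)
print(by_sex_literacy[("Female", "Literate")])
print(by_all_three[("Male", "Literate", "Employed")])
```

Each level's counts add up to the total number of records, since every unit falls in exactly one branch.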


Quantitative Classification
Quantitative classification refers to the classification of data according
to some characteristics that can be measured, such as height, weight,
income, sales, profits, production, etc. For example, the students of a
college may be classified according to weight as follows:

Weight (in lb.) No. of Students
90 − 100 50
100 − 110 200
110 − 120 260
120 − 130 360
130 − 140 90
140 − 150 40
Total 1,000

Such a distribution is known as an empirical frequency distribution or a simple frequency distribution.
In this type of classification, there are two elements:
1. The variable, i.e., the weight, and
2. The frequency, i.e., the number of students in each class.

A frequency distribution refers to data classified on the basis of some variable that can be measured, such as prices, wages, age, or the number of units produced or consumed. The term variable refers to the characteristic that varies in amount or magnitude in a frequency distribution. A variable may be either continuous or discrete.
A continuous variable, also called a continuous random variable, is capable of manifesting every conceivable fractional value within the range of possibilities, such as the height or weight of persons or the weight of a product. For a continuous variable, data are thus obtained by measurement rather than by counting.

A discrete variable is one that can vary only by finite jumps and cannot manifest every conceivable fractional value. For instance, the number of rooms in a house can only take values such as 1, 2, 3, etc. Discrete data are obtained by counting.
Series which can be described by a continuous variable are called continuous
series. Series represented by a discrete variable are called discrete series.

DISCRETE

No. of children No. of families
0 10
1 40
2 80
3 100
4 250
5 150
6 50
Total 680

CONTINUOUS

Weight (in lb.) No. of Students
90 − 100 50
100 − 110 200
110 − 120 260
120 − 130 360
130 − 140 90
140 − 150 40
Total 1,000
Formation of a discrete frequency distribution
• To prepare this type of distribution, count the number of times each value is repeated; this count is called the frequency of that class.
• To facilitate counting, prepare a column of tallies.
• In another column, place all possible values of the variable from the lowest to the highest.
• Then, for each observation, put a bar (vertical line) in the tally column opposite the value to which it relates.
• To facilitate counting, the bars are grouped in blocks of five, with some space left between blocks.
• Finally, count the bars opposite each value to get its frequency.
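The tally procedure above can be sketched as a small Python function. It assumes the observations are whole numbers (a discrete variable obtained by counting) and that at least one observation is given; the function name `discrete_frequency_table` and the sample data are hypothetical.

```python
from collections import Counter


def discrete_frequency_table(values):
    """Return (value, tally, frequency) rows for a discrete variable.

    Every integer from the lowest to the highest observed value is listed,
    so a value that never occurs still appears, with frequency 0.
    """
    counts = Counter(values)
    rows = []
    for value in range(min(values), max(values) + 1):
        freq = counts[value]
        # One bar per observation, grouped in blocks of five separated by a space.
        blocks = ["|" * 5] * (freq // 5)
        if freq % 5:
            blocks.append("|" * (freq % 5))
        rows.append((value, " ".join(blocks), freq))
    return rows


# Hypothetical sample: number of cars owned by nine households.
for value, tally, freq in discrete_frequency_table([2, 0, 1, 2, 2, 3, 2, 2, 1]):
    print(value, tally, freq)
```

Because every value between the lowest and the highest is listed, a value that never occurs still gets its own row, with an empty tally and frequency 0.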
Task

• In a survey of 35 families in a village, the number of children per family was recorded and the following data were obtained:

1 0 2 3 4 5 6
7 2 3 4 0 2 5
8 4 5 12 6 3 2
7 6 5 3 3 7 8
9 7 9 4 5 4 3
• Represent the data in the form of a discrete frequency distribution.
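A hand-made answer to the task can be checked with a short Python sketch that counts the 35 observations. The variable names are illustrative. Following the procedure described earlier, every value from the lowest (0) to the highest (12) is listed, so values that do not occur receive a frequency of 0.

```python
from collections import Counter

# Number of children per family for the 35 families in the task.
children = [
    1, 0, 2, 3, 4, 5, 6,
    7, 2, 3, 4, 0, 2, 5,
    8, 4, 5, 12, 6, 3, 2,
    7, 6, 5, 3, 3, 7, 8,
    9, 7, 9, 4, 5, 4, 3,
]

counts = Counter(children)
# List every value from the lowest to the highest, including those never observed.
table = [(k, counts[k]) for k in range(min(children), max(children) + 1)]

for k, f in table:
    print(f"{k:>2} children: {f} families")
print("Total:", sum(f for _, f in table))
```

The printed frequencies should add up to 35, the number of families surveyed.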
Formation of a continuous frequency distribution
• The following technical terms are important when a continuous frequency distribution is formed, i.e., when data are classified according to class intervals:
• Class limits: the class limits are the lowest and the highest values that can be included in the class. For example, take the class 20 − 40. The two boundaries of the class are known as the lower limit and the upper limit of the class. The lower limit of a class is the value below which there can be no item in the class. The upper limit of a class is the value above which no item can belong to that class.
• Class interval: the difference between the upper and lower limits of a class is known as the class interval. For example, in the class 200-300, the class interval is 100 (i.e., 300 minus 200).
• Class frequency: the number of observations corresponding to a particular class is known as the frequency of that class, or the class frequency.
• Class mid-point or class mark: the value lying half-way between the lower and upper limits of a class interval.
$$\text{Mid-point of a class} = \frac{\text{Upper limit of the class} + \text{Lower limit of the class}}{2}$$
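The definitions of class interval and class mid-point translate directly into arithmetic. The sketch below uses two hypothetical helper functions, `class_interval` and `mid_point`, applied to the classes 200-300 and 20 − 40 mentioned above.

```python
def class_interval(lower, upper):
    """Width of a class: upper limit minus lower limit."""
    return upper - lower


def mid_point(lower, upper):
    """Class mark: the value half-way between the lower and upper limits."""
    return (upper + lower) / 2


print(class_interval(200, 300))  # class 200-300
print(mid_point(20, 40))         # class 20 - 40
```

For the class 20 − 40 this gives (40 + 20) / 2 = 30, and for the class 200-300 a class interval of 100.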
Methods of classifying data according to class intervals
• Exclusive method: when the class intervals are so fixed that the upper limit of one class is the lower limit of the next class, it is called the exclusive method of classification. An item equal to the upper limit of a class is not included in that class but is counted in the next one.
Income (Tsh) No. of persons
1000-1100 50
1100-1200 100
1200-1300 200
1300-1400 150
1400-1500 40
1500-1600 10
Total 550
• Inclusive method: the upper limit of one class is included in that class itself.

Income (Tsh) No. of persons
1000-1099 50
1100-1199 100
1200-1299 200
1300-1399 150
1400-1499 40
1500-1599 10
Total 550
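The practical difference between the two methods lies in which class an item equal to a limit belongs to. A minimal Python sketch, assuming the usual convention that under the exclusive method the lower limit is included and the upper limit excluded; the function names are hypothetical.

```python
def in_exclusive_class(x, lower, upper):
    """Exclusive method: lower limit included, upper limit excluded."""
    return lower <= x < upper


def in_inclusive_class(x, lower, upper):
    """Inclusive method: both limits included."""
    return lower <= x <= upper


# An income of exactly 1100 Tsh under each method.
print(in_exclusive_class(1100, 1000, 1100))  # not in 1000-1100
print(in_exclusive_class(1100, 1100, 1200))  # in 1100-1200
print(in_inclusive_class(1100, 1100, 1199))  # in 1100-1199
print(in_inclusive_class(1099, 1000, 1099))  # 1099 stays in 1000-1099
```

The inclusive method suits data recorded in whole units, such as incomes in whole shillings: a value such as 1099.5 would fall into no class of the inclusive table, whereas the exclusive classes leave no gaps.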
Example

• Prepare a frequency table for the following data, taking the width of each class interval as 10. Use the exclusive method of classification:

57 44 80 75 00 18 45 14 04 64
72 51 69 34 22 83 70 20 57 28
96 56 50 47 10 34 61 66 80 46
22 10 84 50 47 73 42 33 48 65
10 34 66 53 75 90 58 46 39 69
Solution:
Marks Tallies Frequency
0 − 10 || 2
10 − 20 ||||| 5
20 − 30 |||| 4
30 − 40 ||||| 5
40 − 50 ||||| ||| 8
50 − 60 ||||| ||| 8
60 − 70 ||||| || 7
70 − 80 ||||| 5
80 − 90 |||| 4
90 − 100 || 2
Total 50
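The solution can be checked with a short Python sketch that applies the exclusive method to the 50 marks. It assumes every mark lies between 0 and 99, since a mark of exactly 100 would fall in a class 100 − 110 that the table does not have. The marks 00 and 04 are written as 0 and 4 because Python 3 does not accept integer literals such as 04 with a leading zero.

```python
from collections import Counter

marks = [
    57, 44, 80, 75, 0, 18, 45, 14, 4, 64,
    72, 51, 69, 34, 22, 83, 70, 20, 57, 28,
    96, 56, 50, 47, 10, 34, 61, 66, 80, 46,
    22, 10, 84, 50, 47, 73, 42, 33, 48, 65,
    10, 34, 66, 53, 75, 90, 58, 46, 39, 69,
]

WIDTH = 10
# Exclusive method: a mark x belongs to the class [lower, lower + WIDTH).
counts = Counter((x // WIDTH) * WIDTH for x in marks)


def tally(n):
    """Tally marks in blocks of five bars, separated by spaces."""
    return " ".join(["|||||"] * (n // 5) + (["|" * (n % 5)] if n % 5 else []))


for lower in range(0, 100, WIDTH):
    f = counts[lower]
    print(f"{lower} - {lower + WIDTH}  {tally(f):<10} {f}")
print("Total", sum(counts.values()))
```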
Relative Frequency Distribution

• At times it may be desirable to convert class frequencies to relative class frequencies to show the proportion (or, multiplied by 100, the percentage) of the total number of observations in each class.
• To convert a frequency distribution to a relative frequency distribution, each class frequency is divided by the total number of observations, so the relative frequencies always total 1.
For illustration:

Marks Frequency Relative frequency
0 − 10 2 0.04
10 − 20 5 0.10
20 − 30 4 0.08
30 − 40 5 0.10
40 − 50 8 0.16
50 − 60 8 0.16
60 − 70 7 0.14
70 − 80 5 0.10
80 − 90 4 0.08
90 − 100 2 0.04
Total N = 50 1.00
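The conversion rule can be sketched in Python using the class frequencies of the marks example. Variable names are illustrative; the `:.0%` format specifier multiplies by 100 to show each relative frequency as a percentage.

```python
# Class frequencies from the marks example (classes 0-10, 10-20, ..., 90-100).
frequencies = [2, 5, 4, 5, 8, 8, 7, 5, 4, 2]

N = sum(frequencies)                      # total number of observations
relative = [f / N for f in frequencies]   # each class frequency divided by N

for lower, f, rf in zip(range(0, 100, 10), frequencies, relative):
    print(f"{lower} - {lower + 10}  {f}  {rf:.2f}  ({rf:.0%})")
print("N =", N, " sum of relative frequencies =", round(sum(relative), 10))
```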
Bivariate or Two-way Frequency Distribution
• So far, we have described frequency distributions involving one variable only. Such frequency distributions are called univariate frequency distributions.
• In many situations, a simultaneous study of two variables becomes necessary.
• For example, we may want to classify data relating to the ages of husbands and wives, marks in statistics and marks in accountancy, or the height and weight of students.
• Data classified on the basis of two variables give rise to what is called a bivariate frequency distribution or bivariate frequency table.
• In preparing a bivariate frequency distribution, we first group the values of each variable into classes.
• If the data corresponding to one variable, say X, are grouped into m classes and the data corresponding to the other variable, say Y, are grouped into n classes, then the bivariate table will consist of m × n cells.
• By going through the different pairs of values (X, Y) and using tally marks, we can find the frequency of each cell and thus form the bivariate frequency distribution.
• The classes of X together with their total frequencies (summed over all classes of Y) form the marginal frequency distribution of X. Similarly, the classes of Y together with their total frequencies (summed over all classes of X) form the marginal frequency distribution of Y.
Example: the data below relate to the height and weight of 20 persons. You are required to form a two-way frequency table with height class intervals 62" to 64", 64" to 66", and so on, and weight class intervals 115 to 125 lb., 125 to 135 lb., and so on.
S.No. Weight (lb.) Height (in.) S.No. Weight (lb.) Height (in.)
1 170 70 11 163 70
2 135 65 12 139 67
3 136 65 13 122 63
4 137 64 14 134 68
5 148 69 15 140 67
6 121 63 16 132 69
7 117 65 17 120 65
8 128 70 18 148 68
9 143 71 19 129 67
10 129 62 20 152 67
Solution

                Weight (lb.)
Height (in.)    115-125   125-135   135-145   145-155   155-165   165-175   Total
62-64           || (2)    | (1)     -         -         -         -         3
64-66           || (2)    -         ||| (3)   -         -         -         5
66-68           -         | (1)     || (2)    | (1)     -         -         4
68-70           -         || (2)    -         || (2)    -         -         4
70-72           -         | (1)     | (1)     -         | (1)     | (1)     4
Total           4         5         6         3         1         1         20
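The two-way table can be reproduced with a Python sketch. It assumes the exclusive method, so a value equal to an upper limit (for example a weight of 135 lb. or a height of 64") is counted in the next class; the helper `lower_limit` is hypothetical. The row totals give the marginal distribution of height and the column totals the marginal distribution of weight.

```python
from collections import Counter

# (weight in lb., height in inches) for the 20 persons, in S.No. order.
persons = [
    (170, 70), (135, 65), (136, 65), (137, 64), (148, 69),
    (121, 63), (117, 65), (128, 70), (143, 71), (129, 62),
    (163, 70), (139, 67), (122, 63), (134, 68), (140, 67),
    (132, 69), (120, 65), (148, 68), (129, 67), (152, 67),
]


def lower_limit(x, start, width):
    """Lower limit of the exclusive class [start + k*width, start + (k+1)*width) holding x."""
    return start + ((x - start) // width) * width


# Cell counts keyed by (height class, weight class), each identified by its lower limit.
cells = Counter((lower_limit(h, 62, 2), lower_limit(w, 115, 10)) for w, h in persons)
weight_classes = range(115, 175, 10)   # 115-125, ..., 165-175
height_classes = range(62, 72, 2)      # 62-64, ..., 70-72

print("Height".ljust(8) + "".join(f"{w}-{w + 10}".rjust(9) for w in weight_classes) + "Total".rjust(7))
for h in height_classes:
    row = [cells[(h, w)] for w in weight_classes]
    # Row total = marginal frequency of this height class.
    print(f"{h}-{h + 2}".ljust(8) + "".join(str(c or "-").rjust(9) for c in row) + str(sum(row)).rjust(7))

# Column totals = marginal distribution of weight.
weight_marginal = [sum(cells[(h, w)] for h in height_classes) for w in weight_classes]
print("Total".ljust(8) + "".join(str(t).rjust(9) for t in weight_marginal) + str(sum(weight_marginal)).rjust(7))
```

Each cell count should match the number in brackets in the solution table, and both sets of marginal totals should add up to 20.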
TABULATION OF DATA

• One of the simplest and most revealing devices for summarizing data and presenting them in a meaningful fashion is the statistical table.
• A table is a systematic arrangement of statistical data in columns and rows.
• Rows are horizontal arrangements, whereas columns are vertical ones.
• The purpose of a table is to simplify the presentation and to facilitate comparisons.
Role of tabulation
• It simplifies complex data: when data are tabulated, unnecessary details and repetitions are avoided. Data are presented systematically in columns and rows, so the reader gets a clear idea of what the table presents. Also, a large amount of space is saved because headings and designations are not repeated: the description at the top of a column serves for all the items beneath it.
• It facilitates comparison: since a table is divided into various parts, with totals and sub-totals for each part, the relationship between different parts of the data can be studied much more easily with the help of a table than without it.
• It gives identity to the data: when the data are arranged in a table with a title and number, they can be distinctly identified and used as a source of reference in the interpretation of a problem.
• It reveals patterns: tabulation reveals patterns within the figures that cannot be seen in narrative form. It also facilitates the summation of the figures if the reader wishes to check the total.
Types of tables
1. Simple and Complex Tables
• The distinction between simple and complex tables is based upon the number of characteristics studied.
• In a simple table, only one characteristic is shown. Hence, this type of table is also known as a one-way table.
• In a complex table, two or more characteristics are shown. Such tables are more popular in practice because they enable full information to be incorporated and facilitate a proper consideration of all related facts.
• When two characteristics are shown, the table is known as a two-way table or double tabulation. It is formed when either the row or the column heading is subdivided into two coordinate parts.
• When three or more characteristics are represented in the same table, it is called a higher-order table. The need for such a table arises when we are interested in presenting a number of characteristics simultaneously. While constructing such a table, it is necessary first to establish an order of precedence among the attributes or characteristics to be classified, having regard to their relative importance.
Simple Table

Age (in years)   No. of Employees
Below 25         ……
25 − 35          ……
35 − 45          ……
45 − 55          ……
Above 55         ……
Total            ……

Two-way Table

                 Employees
Age (in years)   Males    Females   Total
Below 25         ……       ……        ……
25 − 35          ……       ……        ……
35 − 45          ……       ……        ……
45 − 55          ……       ……        ……
Above 55         ……       ……        ……
Total            ……       ……        ……
2. General and Special Purpose Tables

• General purpose tables, also known as reference tables or repository tables, provide information for general use or reference. They usually contain detailed information and are not constructed for a specific discussion.
• Special purpose tables, also known as summary tables, provide information for a particular discussion. These tables are also called derivative tables since they are often derived from general tables.
