Classification of Data

Uploaded by

Elijah Moturi
Copyright
© © All Rights Reserved

CLASSIFICATION AND TABULATION OF DATA

CLASSIFICATION
- Classification is the process of grouping data into different categories, on the basis of nature,
behavior, or common characteristics.
Objectives of Classification
1. Simplification and Briefness: Classification presents data in a brief manner. Hence, it
becomes fairly easy to analyze the data.
2. Utility: As classification highlights the similarity in the data, it brings out its utility.
3. Distinctiveness: With the help of grouping data into different classes, classification also brings out
the distinctiveness in data.
4. Comparability: Classification facilitates comparison of data.
5. Scientific Arrangement: Classification arranges data on scientific lines. Thus it also increases the
reliability of data.
6. Attractive and Effective: Lastly, through the process of classification, data becomes effective and
attractive.
Characteristics of a Good Classification
1. Comprehensiveness: Classification should cover all the items of the data. In other words, it should
be so comprehensive that it classifies all items in some group or class.
2. Clarity: There should be no confusion about the placement of any data item in a group or class. That is,
classification should be absolutely clear.
3. Homogeneity: The items within a specific group or class should be similar to each other.
4. Suitability: The attribute or characteristic according to which classification is done should agree
with the purpose of classification.
5. Stability: The same classification should be used throughout a particular kind of investigation, so
that the results remain comparable.
6. Elasticity: As the purpose of classification changes, one should be able to change the basis of
classification.
Primary Rules of Classification
- In quantitative classification, we classify data by assigning arbitrary limits called class-limits. The
group between any two class-limits is termed a class or class-interval.
- The primary rules of classification are given below:
1. There should not be any ambiguity in the definition of classes. It will eliminate all doubts while
including a particular item in a class.
2. All the classes should preferably have equal width or length. Only in some special cases, we use
classes of unequal width.
3. The class-limits (integral or fractional) should be selected in such a way that no value of the item in
the raw data coincides with the value of the limit.
4. The number of classes should preferably be between 10 and 20, i.e., neither too large nor too small.
5. The classes should be exhaustive, i.e., each value of the raw data should be included in them.
6. The classes should be mutually exclusive and non-overlapping, i.e., each item of the raw data
should fit only in one class.
7. The classification must be suitable for the object of inquiry.
8. The classification should be flexible and items included in each class must be homogeneous.
9. The width of the class-interval is determined by first fixing the number of class-intervals and then
dividing the total range by that number.
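Rule 9 can be illustrated with a short Python sketch. The data values and the function name `class_intervals` are hypothetical, and the width is rounded up to the next whole number (an assumption made here so that the largest value always falls inside the last class).

```python
def class_intervals(values, num_classes):
    """Return (width, [(lower, upper), ...]) for equal-width classes."""
    lowest, highest = min(values), max(values)
    total_range = highest - lowest
    # Rule 9: divide the total range by the chosen number of classes.
    # The quotient is bumped to the next whole number so that the highest
    # value lies strictly below the last upper limit.
    width = total_range // num_classes + 1
    intervals = [
        (lowest + i * width, lowest + (i + 1) * width)
        for i in range(num_classes)
    ]
    return width, intervals


# Hypothetical raw data (whole-number marks).
marks = [12, 25, 31, 47, 58, 62, 70, 84, 93, 110]
width, intervals = class_intervals(marks, 10)

for lower, upper in intervals:
    print(f"{lower} - {upper}")
```

Here the range is 110 - 12 = 98, so ten classes need a width of 10 after rounding up.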
Basis of Classification
- We can classify given data according to various characteristics, depending on the
purpose of our study. The main bases of classification are:
i. Geographical or Spatial Classification
- When we classify data according to different locations, it is termed a geographical
classification of data.
ii. Chronological or Temporal Classification
- In chronological classification, we classify data according to time, i.e. it follows a chronological
sequence. For example, the classification of the data about the number of deaths in Kenya according
to the years.
iii. Qualitative Classification
- Here, we classify data according to the qualities or attributes of data. One key point to remember is
that an attribute is qualitative in nature, i.e. we cannot measure an attribute in quantitative terms like
5, 1, 2 etc. This classification is further of two types:
- Simple: In the simple qualitative classification of data, we classify data into exactly two groups.
One group has data items that exhibit the quality; the other group does not. It is also
known as classification according to a dichotomy. Examples of classes are educated-uneducated,
male-female and so on.
- Manifold: Here we classify data according to more than one attribute. This
means that once we classify data into two groups according to an attribute, the two groups are further
divided into two according to another attribute. As a result, there can be many levels of
classification, coupled with more than just two classes. For example, the classification of data
about students in a class, according to their gender, followed by classification according to
whether they are fat or not.
iv. Quantitative or Numerical Classification
- Quantitative classification allows numerical division of data into classes. Here, each class represents
a range of numerical values for the phenomenon under consideration. Accordingly, we frame each
class with a lower and an upper value, according to the range of the data.
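The manifold qualitative classification described under item iii can be sketched in Python as grouping by two attributes in turn. The records, the attribute names and the function `classify_manifold` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def classify_manifold(records):
    """Group records first by gender, then by education status."""
    groups = defaultdict(list)
    for name, gender, educated in records:
        # Each item lands in exactly one (gender, education) class.
        status = "educated" if educated else "uneducated"
        groups[(gender, status)].append(name)
    return dict(groups)


# Hypothetical records: (name, gender, educated?)
people = [
    ("A", "male", True),
    ("B", "female", True),
    ("C", "male", False),
    ("D", "female", True),
    ("E", "female", False),
]

classes = classify_manifold(people)
for key in sorted(classes):
    print(key, classes[key])
```

Two dichotomies applied one after the other give four classes, and every record falls into exactly one of them.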
TABULATION
- Tabulation is a process of summarizing data and presenting it in a compact form, by putting data
into a statistical table.
- It is a systematic arrangement of data in columns and rows that represents data in a concise and
attractive way.
- Tabulation may be defined as the systematic presentation of numerical data in rows and/or
columns according to certain characteristics.
- It expresses the data in a concise and attractive form which can be easily understood and used to
compare numerical figures. Before drafting a table, you should be sure what you want to show
and who the reader will be.
Objectives of Tabulation
- The main objectives of tabulation are stated below:
(i) To carry out investigation
(ii) To do comparison
(iii) To locate omissions and errors in the data
(iv) To use space economically
(v) To study the trend
(vi) To simplify data
(vii) To use it as future reference
Basic principles of tabulation
1. Tables should be clear, concise & adequately titled.
2. Every table should be distinctly numbered for easy reference.
3. Column headings & row headings of the table should be clear & brief.
4. Units of measurement should be specified at appropriate places.
5. Explanatory footnotes concerning the table should be placed at appropriate places.
6. Source of information of data should be clearly indicated.
7. The columns & rows should be clearly separated with dark lines.
8. Demarcation should also be made between data of one class and that of another.
9. Comparable data should be put side by side.
10. Percentage figures should be rounded before tabulation.
11. Figures, symbols etc. should be properly aligned and adequately spaced to
enhance readability.
12. Abbreviations should be avoided.
Contents of a table
1. Table number: A number must be allotted to the table for identification, particularly when there
are many tables in a study.
2. Title: The title should explain what is contained in the table. It should be clear, brief and set in
bold type on top of the table. It should also indicate the time and place to which the data refer.
3. Date: The date of preparation of the table should be given.
4. Stubs, or Row designations: Each row of the table should be given a brief heading. Such
designations of rows are called “stubs” or “stub items”, and the entire column is called the “stub
column”.
5. Column headings, or Captions: A column designation is given on top of each column to explain
what the figures in the column refer to. It should be clear and precise. This is called a “caption”
or “heading”. Columns should be numbered if there are four or more columns.
6. Body of the table: The data should be arranged in such a way that any figure can be located
easily. Various types of numerical variables should be arranged in ascending order, i.e., from
left to right in rows and from top to bottom in columns. Column and row totals should be given.
7. Unit of measurement: If the unit of measurement is uniform throughout the table, it is stated at
the top right-hand corner of the table along with the title. If different rows and columns contain
figures in different units, the units may be stated along with the “stubs” or “captions”. Very large
figures may be rounded, but the method of rounding should be explained.
8. Source: At the bottom of the table a note should be added indicating the primary and secondary
sources from which data have been collected.
9. Footnotes and references: If any item has not been explained properly, a separate explanatory
note should be added at the bottom of the table. A table should be logical, well-balanced in
length and breadth and the comparable columns should be placed side by side. Light/heavy/thick
or double rulings may be used to distinguish sub columns, main columns and totals. For large
data more than one table may be used.
The advantages of a tabular presentation over the textual presentation are:
i. It is concise
ii. There is no repetition of explanatory matter
iii. Comparisons can be made easily
iv. The important features can be highlighted
v. Errors in the data can be detected.
Differences between Classification and Tabulation
- The main differences between classification and tabulation are discussed in the points given
below:
1. The process of arranging data into different categories, on the basis of nature, behavior, or common
characteristics is called classification. A process of condensing data and presenting it in a compact
form, by putting data into statistical table, is called tabulation.
2. Classification of data is done after the data collection process is completed. On the other hand,
tabulation follows classification.
3. Data classification is based on similar attributes and variables of the observations. Conversely, in
tabulation the data is arranged in rows and columns, in a systematic way.
4. Classification of data is performed with the objective of analyzing data in order to draw inferences.
Tabulation, by contrast, aims at presenting data so that the various figures can be compared easily.
5. In classification, data is bifurcated into categories and sub-categories while in tabulation data is
divided into headings and sub-headings.
Illustration:
The total number of accidents in Kenya Railway in 1960 was 3,500, and it decreased by 300 in
1961 and by 700 in 1962. The total number of accidents in the meter gauge section showed a
progressive increase from 1960 to 1962. It was 245 in 1960, 346 in 1961, and 428 in 1962. In the
meter gauge section, “not compensated” cases were 49 in 1960, 77 in 1961, and 108 in 1962.
“Compensated” cases in the broad gauge section were 2,867, 2,587 and 2,152 in these three
years respectively. From the above report, you are required to prepare a neat table as per the
rules of tabulation.
Answer:
Table 1.15: Number of Accidents in Kenya Railway from 1960 to 1962
(C = compensated, N = not compensated)

Section              1960                 1961                 1962
                   C      N  TOTAL      C      N  TOTAL      C      N  TOTAL
Meter gauge      196     49    245    269     77    346    320    108    428
Broad gauge    2,867    388  3,255  2,587    267  2,854  2,152    220  2,372
TOTAL          3,063    437  3,500  2,856    344  3,200  2,472    328  2,800
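The cells that the report does not state can be derived by subtraction. The Python sketch below does this under one assumption: both decreases (300 and 700) are measured from the 1960 total, which is the reading the answer table uses. The variable names are illustrative only.

```python
years = [1960, 1961, 1962]

# Figures stated in the report.
grand_total = {1960: 3500, 1961: 3500 - 300, 1962: 3500 - 700}
meter_total = {1960: 245, 1961: 346, 1962: 428}
meter_not_comp = {1960: 49, 1961: 77, 1962: 108}
broad_comp = {1960: 2867, 1961: 2587, 1962: 2152}

derived = {}
for year in years:
    # Meter gauge compensated = meter total - meter not compensated.
    meter_comp = meter_total[year] - meter_not_comp[year]
    # Broad gauge total = grand total - meter gauge total.
    broad_total = grand_total[year] - meter_total[year]
    # Broad gauge not compensated = broad total - broad compensated.
    broad_not_comp = broad_total - broad_comp[year]
    derived[year] = {
        "meter_comp": meter_comp,
        "broad_total": broad_total,
        "broad_not_comp": broad_not_comp,
        "total_comp": meter_comp + broad_comp[year],
        "total_not_comp": meter_not_comp[year] + broad_not_comp,
    }

for year in years:
    print(year, derived[year])
```

For every year the compensated and not compensated column totals add up to the grand total, which is a quick check on the finished table.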
Frequency distribution
- A frequency distribution is a tabular arrangement of data whereby the data is grouped into
different intervals, and then the number of observations that belong to each interval is
determined. Data presented in this manner are known as grouped data.
- The smallest value that can belong to a given interval is called the lower class limit, while the
largest value that can belong to the interval is called the upper class limit. The difference
between the upper class limit and the lower class limit is defined to be the class width. When
designing the intervals to be used in a frequency distribution, it is preferable that the class widths
of all intervals be the same.
- Important Terms
i. Class – a quantitative or qualitative category. A class may be a range of numerical values (that
acts like a “category”) or an actual category
ii. Class-limits: The maximum and minimum values of a class-interval are called upper class limit
and lower class-limit respectively. In Table 1.5 the lower class-limits of nine classes are 56, 58,
60, 62, 64, 66, 68, 70, 72 and the upper class-limits are 57, 59, 61, 63, 65, 67, 69, 71, and 73.
iii. Class-mark, or, Mid-value: The class-mark, or, mid-value of the class-interval lies exactly at the
middle of the class-interval.
iv. Class boundaries: Class boundaries are the true limits of a class interval. They are associated with
a grouped frequency distribution in which there is a gap between the upper class-limit of one class
and the lower class-limit of the next class.
v. Width or Length (or size) of a Class-interval: Width of a class-interval = Upper class boundary −
Lower class boundary
vi. Frequency: The frequency of any value is the number of times that value appears in a data set. For
example, if two children in a group say they like the color blue, the frequency of blue
is two. To make sense of raw data, we must organize it, and finding the frequency of
each data value is how this organization is done.
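The terms above can be tied together with a small Python sketch that uses the class-limits listed under item ii. It assumes the gap between consecutive classes is constant and splits it evenly to obtain the class boundaries; the function name `describe_classes` is hypothetical.

```python
def describe_classes(limits):
    """Compute class mark, class boundaries and width for each class."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = limits[1][0] - limits[0][1]
    half_gap = gap / 2
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - half_gap
        upper_boundary = upper + half_gap
        rows.append({
            "limits": (lower, upper),
            "mark": (lower + upper) / 2,  # mid-value of the class
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
        })
    return rows


lower_limits = [56, 58, 60, 62, 64, 66, 68, 70, 72]
upper_limits = [57, 59, 61, 63, 65, 67, 69, 71, 73]
classes = describe_classes(list(zip(lower_limits, upper_limits)))

for row in classes:
    print(row)
```

For the class 56 - 57 this gives a class mark of 56.5, class boundaries of 55.5 and 57.5, and a width of 2.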
Types of frequency distributions
1. Ungrouped Frequency Distribution
- Is a frequency distribution where each class is only one unit wide
- Meaningful when the data does not take on many values.
- Each class is constructed using a single data value for each class, e.g., 0, 1, 2, 3, …, 10
- Class boundaries will be defined to separate the classes (when graphing) so there are no gaps in
the frequency distribution.
2. Grouped frequency distribution
- Used for a quantitative variable with a large range of values, where the data must be grouped
into classes that are more than one unit in width.
3. Cumulative frequency distribution
- Distribution that shows the number of observations less than or equal to a specific value
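The three types listed above can be sketched in Python as follows. The data sets and function names are hypothetical, and the grouped version assumes equal-width classes in which the lower limit is included and the upper limit is excluded.

```python
from collections import Counter
from itertools import accumulate


def ungrouped_frequency(values):
    """One class per distinct value, sorted by value."""
    return sorted(Counter(values).items())


def grouped_frequency(values, start, width, num_classes):
    """Equal-width classes [lower, upper); every value must fit a class."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} is outside the classes")
        counts[index] += 1
    return [
        ((start + i * width, start + (i + 1) * width), count)
        for i, count in enumerate(counts)
    ]


def cumulative_frequency(freq_table):
    """Running total: observations less than or equal to each value."""
    labels = [label for label, _ in freq_table]
    totals = accumulate(freq for _, freq in freq_table)
    return list(zip(labels, totals))


# Hypothetical data: children per family, and heights in cm.
children = [0, 2, 1, 2, 3, 1, 2, 0, 1, 2]
heights = [150, 151, 153, 154, 155, 156, 158, 159, 160, 161]

ungrouped = ungrouped_frequency(children)
grouped = grouped_frequency(heights, start=150, width=3, num_classes=4)
cumulative = cumulative_frequency(ungrouped)

print(ungrouped)
print(grouped)
print(cumulative)
```

The last cumulative figure always equals the total number of observations, which gives a simple check on the tally.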
Class limits in exclusive and inclusive form:
Exclusive form
- In exclusive form, the lower and upper limits are known as true lower limit and true upper limit
of the class interval.
- In this method the upper limit of a class becomes the lower limit of the next class.
- Thus, the class limits of the class interval 10 - 20 in the exclusive form are 10 and 20.
- When the lower limit is included but the upper limit is excluded, it is an exclusive class
interval. For example, 150 - 153, 153 - 156, etc. are exclusive class intervals. In the
class interval 150 - 153, 150 is included but 153 is excluded.
- Usually, in the case of a continuous variate, exclusive class intervals are used.
Inclusive form
- In inclusive form, both the lower and the upper limit of a class belong to that class, i.e. the upper
limit of any class interval is kept in the same class-interval.
- Because there is a gap between the upper limit of one class and the lower limit of the next, the true
class limits (class boundaries) are obtained by subtracting 0.5 from the lower limit and adding 0.5 to
the upper limit, assuming the data are recorded in whole units.
- Thus, the class boundaries of the inclusive class interval 10 - 20 are 9.5 and 20.5.
- When both the lower and the upper class limits are included, it is an inclusive class interval. For
example, 220 - 234, 235 - 249, etc. are inclusive class intervals. Usually, in the case of a
discrete variate, inclusive class intervals are used.
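The difference between the two forms comes down to whether the upper limit belongs to the class. A minimal Python sketch, using the example intervals from the text and a hypothetical helper `find_class`:

```python
def find_class(value, intervals, inclusive):
    """Return the (lower, upper) class containing value, or None."""
    for lower, upper in intervals:
        if inclusive:
            # Inclusive form: both limits belong to the class.
            if lower <= value <= upper:
                return (lower, upper)
        # Exclusive form: lower limit included, upper limit excluded.
        elif lower <= value < upper:
            return (lower, upper)
    return None


exclusive_classes = [(150, 153), (153, 156)]
inclusive_classes = [(220, 234), (235, 249)]

print(find_class(153, exclusive_classes, inclusive=False))
print(find_class(234, inclusive_classes, inclusive=True))
print(find_class(234.5, inclusive_classes, inclusive=True))
```

The value 153 goes to the class 153 - 156, not 150 - 153. The value 234.5 fits no inclusive class, which is why the inclusive form suits discrete data and the exclusive form suits continuous data.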
