Diagrammatic & Graphical Presentation of Data
Introduction
The most common and simplest forms of pictorial representation of
data are:
(i) Bar diagram
(ii) Histogram
(iii) Pie diagram
(iv) Stem-Leaf display
(v) Frequency Polygon
(vi) Ogive
Although the first two forms above look similar, the bar diagram is
meant for categorical data, whereas the histogram and the stem-leaf
display are meant solely for quantitative data. The pie diagram, on
the other hand, can be used for both types of data.
Amity Business School
Data
1 1 2 4 1 4 2 4 5 2 5 4 1 1 4 2 3
4 5 1 4 1 3 2 4 3 1 2 5 4 2 3 3 2
5 4 1 4 1 4 5 5 1 4 2 4 2 2 5 2 5
1 5 3 4 1 4 1 2 1 3 4 2 4 5 5 1 2
2 1 4 3 3 1 4 1 1 1 1 2 4 1 4 3 2
2 4 1 1 2 4 4 4 5 4 5 1 1 3 2 1 3
3 1 5 3 1 3 2 1 1 1 5 3 2 3 4 2 5
1 3 1 1 1 4 2 4 4 2 1 4 4 5 5 2 1
4 4 2 5 3 2 4 1 1 4 3 2 4 2 3 1 1
1 2 1 1 4 1 4 3 4 4 2 3 1 4 5 3 3
1 4 1 2 4 1 4 5 2 2 2 5 4 4 4 1 4
4 1 4 4 1 2 4 2 2 3 2 1 4 4 3 4 1
3 4 5 3 3 1 5 1 4 2 2 1 5 5 4 1 1
1 4 3 2 2 1 1 4 2 3 1 3 3 2 2 3
4 2 2 1 4 2 3 1 5 1 1 2 1 1 1
Pie Chart
If we wish to emphasize relative frequencies, we draw a pie chart
instead of a bar chart. A pie chart is simply a circle subdivided
into slices that represent the categories. It is drawn so that the
size of each slice is proportional to the percentage corresponding
to that category. For example, since the entire circle is composed
of 360 degrees, a category that contains 25% of the observations is
represented by a slice that spans 25% of 360 degrees, which is equal
to 90 degrees. The number of degrees for each category in Example 1.1
is shown in Table 1.2.
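The angle computation described above can be sketched in Python. The category counts below are made-up values chosen only for illustration (they are not the counts of Example 1.1), and the function name slice_degrees is an assumption of this sketch.

```python
def slice_degrees(counts):
    """Return the central angle (in degrees) of each pie slice.

    counts maps a category label to its frequency.
    """
    total = sum(counts.values())
    # Each slice gets the same share of 360 degrees as its share of the data.
    return {label: 360 * freq / total for label, freq in counts.items()}


# Hypothetical counts for illustration only (not taken from Example 1.1).
example_counts = {"A": 25, "B": 50, "C": 25}
angles = slice_degrees(example_counts)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

A category holding 25% of the observations gets 90 degrees, and the angles always add up to 360.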
Histogram
Example 1.2: A random sample of 40 days gave the following
information about the total number of people treated per day at a
community hospital emergency room (ER).
40 35 42 6 13 50 60 27
8 42 53 17 25 23 24 12
26 32 28 28 31 29 30 28
21 46 22 19 20 30 31 30
36 30 40 38 30 29 31 41
Here,
the population = collection of days over a long period of time, and
the sample = collection of 40 days
The (quantitative) variable = number of people being treated at the ER
per day.
Since the variable is quantitative and can take many possible values
(many more than a typical categorical variable), it does not make
sense to tabulate frequencies for distinct entries (we might end up
with 40 distinct entries, each having frequency 1). So here we first
find the minimum (min) and maximum (max) entries to get the spread
of the variable (in the sample).
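There is a systematic way of finding the min and max. First, find
the min and max for each column, which is easy to do, since there
are far fewer entries in a single column (compared to the whole
array). Next, find the smallest of the column minima and the largest
of the column maxima; these are the min and max of the whole array.
For the present data set, min = 6 and max = 60.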
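The column-by-column search can be sketched as follows. The data are the 40 ER counts of Example 1.2, entered row by row; the variable names are choices made for this sketch.

```python
# ER counts from Example 1.2, one inner list per row of the array.
er_rows = [
    [40, 35, 42, 6, 13, 50, 60, 27],
    [8, 42, 53, 17, 25, 23, 24, 12],
    [26, 32, 28, 28, 31, 29, 30, 28],
    [21, 46, 22, 19, 20, 30, 31, 30],
    [36, 30, 40, 38, 30, 29, 31, 41],
]

# Transpose the rows so that each tuple holds one column of the array.
columns = list(zip(*er_rows))

# Step 1: min and max of each (short) column.
col_mins = [min(col) for col in columns]
col_maxs = [max(col) for col in columns]

# Step 2: min of the column minima, max of the column maxima.
overall_min = min(col_mins)
overall_max = max(col_maxs)

print("column minima:", col_mins)
print("column maxima:", col_maxs)
print("min =", overall_min, "max =", overall_max)
```

In practice min and max could be applied to the whole data set at once; the two-step version only mirrors the hand calculation.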
Note that the unit here (i.e., the smallest possible increment of the
quantitative variable) is 1 (or 1 patient). We modify the range (6,
60) by extending it by one half of a unit on both sides. This is
called the modified range, and for the present data set the modified
range is (5.5, 60.5). The lower limit of the modified range is 5.5,
and the upper limit is 60.5. The idea behind the modified range is
that the boundary values (6 and 60) then lie strictly inside it. The
length (L) of the modified range is
L = upper limit – lower limit
  = 60.5 – 5.5 = 55
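The arithmetic of the modified range can be written as a small Python helper. The function name modified_range and its unit parameter are assumptions of this sketch.

```python
def modified_range(minimum, maximum, unit=1):
    """Extend (minimum, maximum) by half a unit on both sides.

    Returns the lower limit, the upper limit and the length L.
    """
    half_unit = unit / 2
    lower = minimum - half_unit
    upper = maximum + half_unit
    return lower, upper, upper - lower


lower, upper, length = modified_range(6, 60, unit=1)
print(lower, upper, length)  # 5.5 60.5 55.0
```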
Class           Frequency
5.5 – 16.5          4
16.5 – 27.5        10
27.5 – 38.5        17
38.5 – 49.5         6
49.5 – 60.5         3
Total              40
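The class frequencies in the table can be reproduced with a short Python sketch. The five classes of width 11 are read off the table itself; how that number of classes was chosen is not stated here, so num_classes = 5 is treated as given.

```python
# ER counts from Example 1.2.
er_counts = [
    40, 35, 42, 6, 13, 50, 60, 27,
    8, 42, 53, 17, 25, 23, 24, 12,
    26, 32, 28, 28, 31, 29, 30, 28,
    21, 46, 22, 19, 20, 30, 31, 30,
    36, 30, 40, 38, 30, 29, 31, 41,
]

lower, upper = 5.5, 60.5      # modified range
num_classes = 5               # taken from the table (assumed as given)
width = (upper - lower) / num_classes

# Class boundaries: 5.5, 16.5, 27.5, 38.5, 49.5, 60.5
boundaries = [lower + i * width for i in range(num_classes + 1)]

# No observation can sit on a boundary, because the data are whole
# numbers and every boundary ends in .5.
frequencies = [
    sum(1 for x in er_counts if lo < x < hi)
    for lo, hi in zip(boundaries, boundaries[1:])
]

for lo, hi, freq in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lo} - {hi}: {freq}")
print("Total:", sum(frequencies))
```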
Frequency Polygon
The histogram gives rise to another simple concept, the relative
frequency polygon. Find the midpoint of each class (the midpoint of
a class is found by adding the two endpoints of the class and then
dividing by 2), and then plot the relative frequencies (on the
y-axis) against the midpoints (on the x-axis). Connect adjacent
points with straight line segments; the resulting diagram is a
frequency polygon. A frequency polygon shows the trend in the data in
terms of frequency (which is also evident in the histogram).
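The points of the relative frequency polygon for the ER data can be computed as in the Python sketch below. It uses the classes and frequencies from the histogram table; the plotting step itself is left out, and the variable names are choices made for this sketch.

```python
# Classes and frequencies from the ER frequency table.
classes = [(5.5, 16.5), (16.5, 27.5), (27.5, 38.5), (38.5, 49.5), (49.5, 60.5)]
frequencies = [4, 10, 17, 6, 3]

total = sum(frequencies)

# Midpoint of a class = (lower endpoint + upper endpoint) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Relative frequency = class frequency / total number of observations.
relative_frequencies = [freq / total for freq in frequencies]

# Joining these (x, y) points with straight segments gives the polygon.
polygon_points = list(zip(midpoints, relative_frequencies))
for x, y in polygon_points:
    print(f"midpoint {x}: relative frequency {y:.3f}")
```

The largest relative frequency sits at the midpoint 33, the peak of the polygon.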
From the frequency polygon in Figure 1.3 it is clear that, for the
emergency room data set, the frequency (or relative frequency)
increases as the number of patients per day increases to 33, and
beyond this the frequency starts falling. Roughly, we see that there
are more days with about 25 patients than with about 15 patients.
Similarly, there are fewer days with about 45 patients than with
about 35 patients.
If a frequency polygon with a single hump has a longer right tail
than left tail, the frequency polygon (or the histogram) is called
positively skewed; if the left tail is the longer one, it is called
negatively skewed. If a frequency polygon with a single hump has
approximately equal left and right tails (i.e., looks symmetric),
then it is said to have a bell shape.
Example 1.3: Table 1.4 gives the one-way commuting distance (to the
nearest mile) of 30 working mothers in a large city.
Table 1.4 Commuting Distance Data
13 47 10 3 16
7 25 8 21 19
12 45 1 8 4
6 2 14 13 7
34 13 41 28 50
14 26 10 24 36
For the data in Table 1.4, where all entries are one- or two-digit
numbers, we use the tens digit of an entry to form the stem and the
units digit to form the corresponding leaf. For the first entry, 13,
the stem is 1 and the leaf is 3. The entry 8 is treated as 08, meaning
0 for its stem and 8 for its leaf. Figure 1.5 gives the stem-leaf
display of these data. From Figure 1.5, it is clear that most of the
entries are in the 10-mile range (i.e., 10 to 19 miles), followed by
the 0-mile range (i.e., 0 to 9 miles). The horizontal length of the
row of leaves represents the frequency of the corresponding stem,
which is essentially a class. The stem 1 represents the class 10-19
miles, or more correctly the class 9.5-19.5 miles, since the data
entries are rounded values and hence anyone commuting 9.5 (or 9.6 or
9.7 or 9.8 or 9.9) miles would be assigned the value 10.
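The stem and leaf construction can be sketched in Python for the Table 1.4 data. Sorting the leaves within each stem is a choice made for this sketch; the layout of Figure 1.5 may differ. The function name stem_leaf is likewise an assumption.

```python
from collections import defaultdict

# Commuting distances from Table 1.4, read row by row.
distances = [
    13, 47, 10, 3, 16,
    7, 25, 8, 21, 19,
    12, 45, 1, 8, 4,
    6, 2, 14, 13, 7,
    34, 13, 41, 28, 50,
    14, 26, 10, 24, 36,
]


def stem_leaf(values):
    """Group values by tens digit (stem) with units digits as leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 13 -> (1, 3), 8 -> (0, 8)
        stems[stem].append(leaf)
    return dict(stems)


display = stem_leaf(distances)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

The length of each printed row is the frequency of that stem, so the row for stem 1 (10 leaves) is the longest, followed by the row for stem 0 (9 leaves).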
Ogive
The frequency distribution lists the number of observations that fall into each
class interval. In some situations we may wish to highlight the number of
observations that lie below each of the class limits. In such cases we create the
cumulative frequency distribution. Table 1.5 displays this type of distribution for
Example 1.2.
Table 1.5 Cumulative frequency table for the number of individuals treated at the ER per day
Class           Frequency    Cumulative Frequency
5.5 – 16.5          4                 4
16.5 – 27.5        10                14
27.5 – 38.5        17                31
38.5 – 49.5         6                37
49.5 – 60.5         3                40
From Table 1.5 we can see that, for example, 77.5% of the observations (31 of 40) are less
than or equal to 38.5 and that 92.5% (37 of 40) are less than or equal to 49.5.
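The cumulative frequencies of Table 1.5 and the corresponding percentages can be reproduced with the Python sketch below. Starting the ogive at the point (5.5, 0) is a common convention that is assumed here, not something stated in the text.

```python
from itertools import accumulate

# Upper class limits and frequencies from Table 1.5.
upper_limits = [16.5, 27.5, 38.5, 49.5, 60.5]
frequencies = [4, 10, 17, 6, 3]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Percentage of observations at or below each upper class limit.
cumulative_percent = [100 * c / total for c in cumulative]

# Ogive points: (upper class limit, cumulative percentage), with an
# assumed starting point at the lower limit of the first class.
ogive_points = [(5.5, 0.0)] + list(zip(upper_limits, cumulative_percent))

for limit, percent in ogive_points:
    print(f"<= {limit}: {percent:.1f}%")
```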
Summary
A set of data, even if modest in size, is often difficult to interpret
directly in the form in which it is gathered. Graphical methods
provide procedures for organizing and summarizing data so that
patterns are revealed and the data are more easily interpreted.
Frequency distributions, relative frequency distributions, percent
frequency distributions, bar graphs, and pie charts were presented
as tabular and graphical procedures for summarizing qualitative
data. Frequency distributions, relative frequency distributions,
percent frequency distributions, histograms, cumulative frequency
distributions, and ogives were presented as ways of summarizing
quantitative data. A stem-and-leaf display provides an exploratory
data analysis technique that can be used to summarize quantitative
data.
Self Test
1. A frequency distribution is a tabular summary of data showing the
a. fraction of items in several classes
b. percentage of items in several classes
c. relative percentage of items in several classes
d. number of items in several classes
2. Qualitative data can be graphically represented by using a(n)
a. histogram
b. frequency polygon
c. ogive
d. bar graph
Exhibit 1
Michael's Rent-A-Car, a national car rental company, has kept a
record of the number of cars they have rented for a period of 80
days. Their rental records are shown below:
17. To help determine the need for more golf courses, a survey
was undertaken. A sample of 75 self-declared golfers was asked
how many rounds of golf they played last year. The data are as
follows:
18 26 16 35 30 15 18 15 18 19 25
30 35 14 20 18 24 21 25 18 29 23
15 19 27 28 9 17 28 25 23 20 24
28 36 20 30 26 12 31 13 26 22 30
29 26 17 32 36 24 29 18 38 31 36
24 30 20 13 23 3 28 5 14 24 13
18 10 14 16 28 19 10 42 22
a. Draw a histogram.
b. Draw a stem-and-leaf display.
c. Draw an ogive.
d. Describe what you have learned.