Data Visualization Techniques Explained

The document discusses various methods for presenting categorical and numerical data through tables, charts, and graphs. These include summary tables, bar charts, pie charts, Pareto diagrams, contingency tables, dot plots, stem-and-leaf displays, and histograms. Examples are provided for many of these.
Chapter 2

Summarizing Data (6 hours)


Learning Objectives
In this chapter you learn:
◼ 1. Frequency distribution classes
◼ 2. Descriptive statistics: tables and charts/graphs
Data Presentation
◼ Categorical Data
  ◼ Summary Table
  ◼ Bar Graph, Pie Chart, Pareto Diagram
◼ Numerical Data
  ◼ Dot Plot
  ◼ Stem-&-Leaf Display
  ◼ Frequency Distribution
    ◼ Histogram
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both

Major (each row is a category)    Count (from the tally)
Accounting                        130
Economics                         20
Management                        50
Total                             200
Summary Table-Example

◼ A summary table of the retirement funds,


categorized by risk
Example

◼ The sample of 318 retirement funds includes


the variable risk that has the defined categories
Low, Average, and High. Construct a summary
table of the retirement funds, categorized by
risk (Dataset: Retirement Funds).
Summary Table: Retirement Funds
categorized by Risk

Fund Risk Level Number of funds Percentage of Funds


Low 99 31.13%
Average 145 45.60%
High 74 23.27%
Total 318 100.00%
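The percentages in the table above follow directly from the counts. A minimal Python sketch, using only the counts shown in the table (the variable names are illustrative, not part of the dataset):

```python
# Counts taken from the summary table above (retirement funds by risk level).
counts = {"Low": 99, "Average": 145, "High": 74}

total = sum(counts.values())  # 318 funds in the sample

# Percentage of funds in each category, rounded to two decimals as in the table.
percentages = {level: round(100 * n / total, 2) for level, n in counts.items()}

for level, n in counts.items():
    print(f"{level:<8} {n:>4} {percentages[level]:>7.2f}%")
print(f"{'Total':<8} {total:>4} {100:>7.2f}%")
```

Each percentage is the category count divided by the total, times 100.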
Bar Chart
(for an Investor’s Portfolio)
[Figure: horizontal bar chart titled "Investor's Portfolio". Categories: Savings, CD, Bonds, Stocks. Horizontal axis: Amount in K$, from 0 to 50.]
Bar Chart

◼ The bar chart visualizes a categorical variable


as a series of bars.
◼ The length of each bar represents either the
frequency or percentage of values for each
category.
◼ Each bar is separated by a space called a gap.
Pie Chart Example
Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110                        100

(Variables are Qualitative)
[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%. Percentages are rounded to the nearest percent.]
Pie Chart

◼ The pie chart is a circle broken up into slices


that represent categories.
◼ The size of each slice of the pie varies
according to the percentage in each category.
Doughnut chart
Example

◼ The sample of 318 retirement funds includes


the variable risk that has the defined categories
Low, Average, and High. Construct a bar chart
and a pie chart of the retirement funds,
categorized by risk (Dataset: Retirement
Funds).
Pareto Diagram
[Figure: Pareto diagram of the investment portfolio. Categories from left to right: Stocks, Bonds, Savings, CD. The left axis (0% to 45%) is for the bar chart and shows % invested in each category. The right axis (0% to 100%) is for the line graph and shows cumulative % invested.]
VILFREDO PARETO
(1843–1923)

The Pareto Principle


◼ Pareto showed that approximately 80% of

the total wealth in a society lies with only


20% of the families.
◼ This famous law about the “vital few and the
trivial many” is widely known as the Pareto
principle in economics.
Pareto Diagram
▪ Used to portray categorical data
▪ A bar chart, where categories are shown in
descending order of frequency
▪ A cumulative polygon is shown in the same graph
▪ Used to separate the “vital few” from the “trivial
many”.
▪ Pareto charts are also powerful tools for prioritizing
improvement efforts, such as when data are
collected that identify defective or nonconforming
items.
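The ordering and the cumulative polygon can be sketched in a few lines of Python. This example reuses the amounts from the investment portfolio shown earlier; the variable names are illustrative assumptions:

```python
# Amounts (in thousands of dollars) from the investment portfolio example.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())

# Step 1: sort the categories in descending order of amount (the bar order).
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)

# Step 2: compute each category's percentage and the running (cumulative)
# percentage that the line graph plots.
pareto_rows = []
running = 0.0
for category, amount in ordered:
    running += amount
    pareto_rows.append((category, 100 * amount / total, 100 * running / total))

for category, pct, cum_pct in pareto_rows:
    print(f"{category:<8} {pct:6.2f}% {cum_pct:7.2f}%")
```

The first rows of the output are the "vital few": here Stocks and Bonds together account for about 71% of the total.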
Pareto Diagram

The “Vital
Few”
Pareto Diagram

◼ Using the bank’s own processing systems as the primary data source, the causes of incomplete ATM transactions were collected and stored in ATM Transactions. Use these data to construct a Pareto diagram.
Example
Pareto diagram
Pareto diagram

◼ The first three categories account for about


60% of the defects.
◼ Decision makers can see where to concentrate
efforts to improve the process. Attempts to
reduce defects due to warpage, damage, and
pin marks should produce the greatest payoff.
Summary
Bar graph: The categories (classes) of the
qualitative variable are represented by bars, where
the height of each bar is either the class frequency,
class relative frequency, or class percentage.
Pie chart: The categories (classes) of the
qualitative variable are represented by slices of a
pie (circle). The size of each slice is proportional to
the class relative frequency.
Pareto diagram: A bar graph with the categories
(classes) of the qualitative variable (i.e., the bars)
arranged by height in descending order from left to
right.
Bivariate Categorical Data

◼ Contingency Tables
◼ Side By Side Bar Charts
Bivariate Categorical Data

◼ Contingency Tables: Investment in Thousands of Dollars

Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32            44            19            95
CD                     15.5          20            13.5          49
Savings                16            28            7             51
Total                  110           147           67            324
Bivariate Categorical Data

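◼ Side by Side Charts
[Figure: side-by-side horizontal bar chart titled "Comparing Investors". Categories: Savings, CD, Bonds, Stocks. One bar per investor (Investor A, Investor B, Investor C) in each category. Horizontal axis from 0 to 60.]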


Contingency Table
◼ A contingency table cross-tabulates, or tallies jointly,
the data of two or more categorical variables, allowing
you to study patterns that may exist between the
variables.
◼ Tallies can be shown as a frequency, a percentage of the
overall total, a percentage of the row total, or a
percentage of the column total, depending on the type
of contingency table you use.
◼ Each tally appears in its own cell, and there is a cell for
each joint response, a unique combination of values for
the variables being tallied.
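The three kinds of percentages can be computed from any table of joint tallies. The sketch below uses the investment table shown earlier, so dollar amounts play the role of the tallies. The dictionary layout and variable names are assumptions made for illustration:

```python
# Cells from the investment example: one row per category,
# one column per investor (A, B, C), in thousands of dollars.
investors = ["A", "B", "C"]
table = {
    "Stocks":  [46.5, 55.0, 27.5],
    "Bonds":   [32.0, 44.0, 19.0],
    "CD":      [15.5, 20.0, 13.5],
    "Savings": [16.0, 28.0, 7.0],
}

row_totals = {cat: sum(cells) for cat, cells in table.items()}
col_totals = [sum(cells[j] for cells in table.values()) for j in range(len(investors))]
grand_total = sum(row_totals.values())

# Each cell as a percentage of the overall total, of its row total,
# and of its column total.
overall_pct = {cat: [100 * x / grand_total for x in cells] for cat, cells in table.items()}
row_pct = {cat: [100 * x / row_totals[cat] for x in cells] for cat, cells in table.items()}
col_pct = {cat: [100 * x / col_totals[j] for j, x in enumerate(cells)]
           for cat, cells in table.items()}

print("Percentage of row total:")
for cat, pcts in row_pct.items():
    print(f"{cat:<8}" + "".join(f"{p:8.1f}" for p in pcts))
```

Row percentages sum to 100 across each row and column percentages sum to 100 down each column, which is a quick check that the table was built correctly.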
Contingency Table- Example
Contingency Table- Example

◼ Use Mutual Funds to construct a contingency


table for the levels of risk, and type of funds.
Side by Side Bar Chart- Example

◼ Use Mutual Funds to construct a side-by-side chart to visualize the data for the levels of risk for growth and value funds.
Side-By-Side Bar Charts
The doughnut chart
Dot Plot
1. Horizontal axis is a scale for the quantitative
variable, e.g., percent.
2. The numerical value of each measurement is
located on the horizontal scale by a dot.
Stem-and-Leaf

◼ Data in Raw Form (as Collected):


24, 26, 24, 21, 27, 27, 30, 41, 32, 38
◼ Data in Ordered Array from Smallest to Largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
◼ Stem-and-Leaf Display:
2 | 144677
3 | 028
4 | 1
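A stem-and-leaf display for two-digit values can be built by splitting each value into its tens digit (stem) and units digit (leaf). A minimal Python sketch using the raw data above (the variable names are illustrative):

```python
from collections import defaultdict

# Raw data as collected.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Sort first so that the leaves on each stem appear in ascending order.
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit, units digit
    stems[stem].append(leaf)

# Join the leaves of each stem into one string per row.
display = {stem: "".join(str(leaf) for leaf in leaves) for stem, leaves in stems.items()}

for stem in sorted(display):
    print(f"{stem} | {display[stem]}")
```

The number of leaves on each stem is the frequency of that class, which is why the display resembles a histogram when turned sideways.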
Example

◼ Suppose you collect the following meal costs (in


$) for 15 classmates who had lunch at a fast-food
restaurant (stored in FastFood ):
Example
◼ To construct the stem-and-leaf display, you use whole
dollar amounts as the stems and round the cents to one
decimal place to use as the leaves. For the first value,
7.42, the stem is 7 and its leaf is 4. For the second
value, 6.29, the stem is 6 and its leaf is 3.
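The rounding rule can be checked with a short Python sketch. The helper name `stem_and_leaf` is hypothetical, and only the two values quoted in the text are used:

```python
def stem_and_leaf(cost):
    """Split a dollar amount into (stem, leaf) after rounding to one decimal place."""
    tenths = round(cost * 10)   # 7.42 -> 74, 6.29 -> 63
    return divmod(tenths, 10)   # (whole dollars, first decimal digit)

for cost in (7.42, 6.29):
    stem, leaf = stem_and_leaf(cost)
    print(f"{cost:.2f}: stem {stem}, leaf {leaf}")
```

Note that Python's `round` sends exact ties to the nearest even integer, so a different tie-breaking convention would need explicit handling.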

◼ A stem-and-leaf display turned sideways looks like a histogram.


Frequency Distributions

◼ A frequency distribution is a list or a table


containing the values of a variable (or a set of
ranges within which the data fall) and the
corresponding frequencies with which each value
occurs (or frequencies with which data fall within
each range).
◼ A frequency distribution is a way to summarize data
◼ The distribution condenses the raw data into a
more useful form and allows for a quick visual
interpretation of the data.
Example
Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Frequency    Relative Frequency    Percentage
10 but under 20    3            .15                   15
20 but under 30    6            .30                   30
30 but under 40    5            .25                   25
40 but under 50    4            .20                   20
50 but under 60    2            .10                   10
Total              20           1                     100
Frequency Distribution:
Discrete Data
◼ The following data record the number of
children in the families of the 47 workers in a
company:
1 1 3 2 0 2 0 1 2 2 1 3

5 2 4 0 0 2 4 1 1 2 2 0

3 0 0 2 1 3 6 0 2 1 0 3

2 2 2 1 0 0 1 1 3 1 4
Frequency distribution table
Number of children Number of workers
in family
0
1
2
3
4
5
6
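One way to complete the table is to tally the raw data programmatically. A minimal Python sketch using `collections.Counter` on the 47 values listed above (the variable names are illustrative):

```python
from collections import Counter

# Number of children in the families of the 47 workers, as listed above.
children = [
    1, 1, 3, 2, 0, 2, 0, 1, 2, 2, 1, 3,
    5, 2, 4, 0, 0, 2, 4, 1, 1, 2, 2, 0,
    3, 0, 0, 2, 1, 3, 6, 0, 2, 1, 0, 3,
    2, 2, 2, 1, 0, 0, 1, 1, 3, 1, 4,
]

# Tally how many workers have each number of children.
tally = Counter(children)
frequency = {k: tally[k] for k in sorted(tally)}

for k, count in frequency.items():
    print(f"{k} children: {count} workers")
print(f"Total: {sum(frequency.values())} workers")
```

The frequencies must add up to 47, the number of workers. That check catches most tallying mistakes.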
Frequency Distribution:
Discrete Data
◼ Discrete data: possible values are countable
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.

Number of days read    Frequency
0                      44
1                      24
2                      18
3                      16
4                      20
5                      22
6                      26
7                      30
Total                  200
Relative Frequency
Relative Frequency: What proportion is in each category?

Number of days read    Frequency    Relative Frequency
0                      44           .22
1                      24           .12
2                      18           .09
3                      16           .08
4                      20           .10
5                      22           .11
6                      26           .13
7                      30           .15
Total                  200          1.00

For 0 days: 44/200 = .22, so 22% of the people in the sample report that they read the newspaper 0 days per week.
NOTE
For developing frequency and relative frequency
distributions for discrete data
(1) List all possible values of the variable. If the
variable is quantitative, order the possible
values from low to high.
(2) Count the number of occurrences at each
value of the variable and place this value in a
column labeled “frequency”
(3) Determine the relative frequencies by dividing
each frequency by the total number of observations
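The steps in this note can be sketched in Python with the newspaper example. The frequencies are taken from the table above; the variable names are illustrative:

```python
# Steps 1 and 2: the possible values in order, with the frequency of each
# (number of customers who read the newspaper that many days per week).
frequency = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

n = sum(frequency.values())  # 200 customers

# Step 3: relative frequency = frequency / total number of observations.
relative_frequency = {days: count / n for days, count in frequency.items()}

for days, count in frequency.items():
    print(f"{days}  {count:>3}  {relative_frequency[days]:.2f}")
print(f"Total {n}  {sum(relative_frequency.values()):.2f}")
```

The relative frequencies always sum to 1, up to rounding.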
Frequency Distribution:
Continuous Data

◼ Continuous Data: may take on any value in


some interval
Example: A manufacturer of insulation randomly selects
20 winter days and records the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41,
43, 44, 27, 53, 27
(Temperature is a continuous variable because it could
be measured to any degree of precision desired)
Distribution classes
◼ Class limits: are the lower and upper values of the
classes as physically described in the distribution.

[Figure: example classes for discrete data and for continuous data, each labelled with its lower limit and upper limit.]
Distribution classes
◼ Class widths (class lengths):
- continuous data: are the numerical differences
between lower and upper class limits.
- discrete data: are the numerical differences
between the lower limit of one class and the lower
limit of the immediately following class
◼ Class mid-points: are situated in the centre of the
classes.
Distribution classes
◼ Open-ended class:
- A class without a lower or an upper limit.
- Usually used for the first class, which has no defined
lower limit, and/or the last class, which has no defined
upper limit.
Example classes: < 10, 10-15, 15-20, >= 20
Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

◼ Find range: 58 - 12 = 46
◼ Select number of classes: 5 (usually between 5 and 20)
◼ Compute class width: 10 (46/5 = 9.2, then round up to a convenient value)
◼ Determine class boundaries: 10, 20, 30, 40, 50, 60
◼ Compute class midpoints: 15, 25, 35, 45, 55
◼ Count observations & assign to classes
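The steps above can be sketched in Python. Rounding the width up to the next multiple of 10 and starting the first class at a multiple of the width are assumptions that match the choices on this slide. Other convenient values are possible:

```python
import math

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

data_range = max(data) - min(data)          # 58 - 12 = 46
num_classes = 5
raw_width = data_range / num_classes        # 9.2

# Assumption: round the width up to the next multiple of 10.
class_width = math.ceil(raw_width / 10) * 10

# Assumption: the first boundary is the multiple of the width at or below the minimum.
first_boundary = (min(data) // class_width) * class_width
boundaries = [first_boundary + i * class_width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Count observations: class i holds values with boundaries[i] <= value < boundaries[i + 1].
counts = [0] * num_classes
for value in data:
    counts[(value - first_boundary) // class_width] += 1

for lo, hi, count in zip(boundaries, boundaries[1:], counts):
    print(f"{lo} but under {hi}: {count}")
```

Using "lower boundary included, upper boundary excluded" keeps the classes mutually exclusive, so each observation falls in exactly one class.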
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution

Class              Frequency    Relative Frequency
10 but under 20    3            .15
20 but under 30    6            .30
30 but under 40    5            .25
40 but under 50    4            .20
50 but under 60    2            .10
Total              20           1.00
Questions for Grouping Data
into Classes

◼ 1. How wide should each interval be?


(How many classes should be used?)

◼ 2. How should the endpoints of the


intervals be determined?
◼ Often answered by trial and error, subject to user judgment
◼ The goal is to create a distribution that is neither too "jagged"
nor too "blocky"
◼ Goal is to appropriately show the pattern of variation in the
data
How Many Class Intervals?
◼ Many (narrow class intervals)
  ◼ may yield a very jagged distribution with gaps from empty classes
  ◼ can give a poor indication of how frequency varies across classes
[Figure: histogram with narrow classes. Horizontal axis: Temperature, labelled 4, 8, 12, ..., 60, More. Vertical axis: Frequency, from 0 to 3.5.]
◼ Few (wide class intervals)
  ◼ may compress variation too much and yield a blocky distribution
  ◼ can obscure important patterns of variation
[Figure: histogram with wide classes. Horizontal axis: Temperature, labelled 0, 30, 60, More. Vertical axis: Frequency, from 0 to 12.]
(X axis labels are upper class endpoints)



Guidelines for grouping values into
classes
• Use between 5 and 20 classes.
Or Sturges’s Rule: Classes = 1 + 3.322[log10(n)]
where n: number of data values.
The classes should meet four criteria
- First, they must be mutually exclusive.
- Second, they must be all-inclusive.
- Third, if at all possible, they should be of equal-width
and
- Fourth, avoid empty classes if possible.
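Sturges's Rule is straightforward to evaluate. A minimal Python sketch, where the function name `sturges_classes` is hypothetical and rounding the result to a whole number of classes is a judgment call rather than part of the rule as stated:

```python
import math

def sturges_classes(n):
    """Suggested number of classes for n data values: 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

for n in (20, 47, 200):
    suggested = sturges_classes(n)
    print(f"n = {n:>3}: {suggested:.2f} -> about {round(suggested)} classes")
```

For the 20 temperature readings used in this chapter, the rule suggests about 5 classes, which matches the number chosen in the grouping example.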
General Guidelines
Number of Data Points    Number of Classes
under 50                 5 - 7
50 - 100                 6 - 10
100 - 250                7 - 12
over 250                 10 - 20

◼ Class widths can typically be reduced as the number of


observations increases
◼ Distributions with numerous observations are more
likely to be smooth and have gaps filled since data are
plentiful
Class Width

◼ The class width is the distance between the
lowest possible value and the highest possible
value for a frequency class

◼ The minimum class width is
W = (Largest Value - Smallest Value) / Number of Classes
The Histogram

Data in Ordered Array:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
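Histogram
[Figure: frequency histogram of the data above. Vertical axis: Frequency, from 0 to 7. Horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More, with the class boundaries falling between the bars. Bar heights: 0, 3, 6, 5, 4, 2, 0. There are no gaps between bars.]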
The Histogram

▪ A graph of the data in a frequency distribution is called


a histogram.

▪ The class boundaries (or class midpoints) are shown


on the horizontal axis.

▪ The vertical axis is either frequency, relative


frequency, or percentage.

▪ Bars of the appropriate heights are used to represent


the number of observations within each class.
Example

◼ To improve the quality of a product, a sample of


10 products is randomly collected each day for
ten days. The outside diameter is gauged and
reported in dataset diameter. Construct a
frequency distribution and a histogram to show
the distribution of the diameter.
The Histogram

Three general types of information:


❑ A visual indication of where the approximate

center of data is.


❑ The degree of spread (or variation) in the data.

❑ The shape of the distribution.


Data pattern

◼ Patterns in data are commonly described in
terms of center, spread, shape, and unusual
features
◼ Some common distributions have special
descriptive labels such as symmetric, bell-shaped,
skewed, etc.
Center

◼ the center of a distribution is located at


the median of the distribution
◼ This is the point in a graphic display where
about half of the observations are on either side
◼ Example:
Spread

◼ The spread of a distribution refers to the
variability of the data
◼ If the observations cover a wide range, the
spread is larger. If the observations are
clustered around a single value, the spread is smaller.
Shape

The shape of a distribution is described by the


following characteristics.
◼ Symmetry

◼ Number of peaks

◼ Skewness

◼ Uniform: A uniform distribution has no
clear peaks.
Example
Unusual features

◼ The two most common unusual features are


gaps and outliers.
◼ Gaps: areas of a distribution where there are
no observations.
◼ Outliers: extreme values that differ greatly from
the other observations.
Example
Histogram-Example
◼ A manufacturer of industrial wheels suspects that
profitable orders are being lost because of the long time
the firm takes to develop price quotes for potential
customers. To investigate this possibility, 50 requests for
price quotes were randomly selected from the set of all
quotes made last year, and the processing time was
determined for each quote. Each quote was classified
according to whether the order was “lost” or not (i.e.,
whether or not the customer placed an order after
receiving a price quote).
◼ Use data set QUOTES to create a frequency histogram
for these data. Then shade the area under the histogram
that corresponds to lost orders. Interpret the result.
Histogram-Example
[Figure: frequency histogram of processing time, with the “lost” orders shaded. The “lost” orders are in the upper tail of the distribution.]
Histogram- Example
◼ In the Journal of Experimental Social Psychology (Vol. 45, 2009) study on
whether money can buy love (p. 63), the researchers randomly assigned
participants to the role of either gift-giver or gift-receiver. (Gift-givers, recall,
were asked about a birthday gift they recently gave, while gift-recipients were
asked about a birthday gift they recently received.) Two quantitative variables
were measured for each of the 237 participants: gift price (measured in dollars)
and overall level of appreciation for the gift (measured as the sum of the two 7-
point appreciation scales, with higher values indicating a higher level of
appreciation).
◼ One of the objectives of the research was to investigate whether givers and
receivers differ on the price of the gift reported and on the level of appreciation
reported.
◼ Use BUYLOV to construct side-by-side histograms for the quantitative
variables, one histogram for gift-givers and one for gift-recipients.
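The histograms for birthday gift price
[Figure: side-by-side histograms of gift price for gift-givers and gift-recipients.]
The prices reported by gift-recipients tended to be higher than the prices reported by gift-givers.
The histograms for overall level of appreciation
[Figure: side-by-side histograms of overall level of appreciation for gift-givers and gift-recipients.]
Gift-givers and gift-recipients respond differently, with gift-recipients more likely to express a greater level of appreciation for the gift than what gift-givers perceive.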
Organizing Numerical Data
Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
◼ Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
◼ Stem-and-Leaf Display:
2 | 144677
3 | 028
4 | 1
◼ Frequency Distributions and Cumulative Distributions: Tables, Histograms, Polygons, Ogive
Cumulative and relative cumulative
frequency distribution

◼ Cumulative frequency distribution: a summary of
a set of data that displays the number of observations
with values less than or equal to the upper limit of
each of its classes.
◼ Relative cumulative frequency distribution: a summary
of a set of data that displays the proportion of
observations with values less than or equal to the
upper limit of each of its classes.
Relative frequency histograms
and ogives

• A relative frequency histogram is
formed in the same manner as a
frequency histogram, but it uses
relative frequencies rather than frequencies.
• The cumulative relative frequency
is presented using a graph called an
ogive.
Tabulating Numerical Data:
Cumulative Frequency

Data in Ordered Array:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Lower Limit    Cumulative Frequency    Cumulative % Frequency
10             0                       0
20             3                       15
30             9                       45
40             14                      70
50             18                      90
60             20                      100
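The cumulative columns are running totals of the class frequencies. A minimal Python sketch using the class frequencies from the frequency distribution earlier in the chapter (the variable names are illustrative):

```python
from itertools import accumulate

# Class boundaries and class frequencies from the frequency distribution.
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)  # 20 observations

# Number of observations below each boundary: 0 below the first boundary,
# then the running total of the class frequencies.
cumulative = [0] + list(accumulate(frequencies))
cumulative_pct = [100 * c / n for c in cumulative]

for limit, cum, pct in zip(boundaries, cumulative, cumulative_pct):
    print(f"{limit:>3} {cum:>3} {pct:>5.0f}")
```

The pairs (boundary, cumulative %) are the points that an ogive connects.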
The Ogive (Cumulative %)

Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: ogive. Vertical axis: cumulative percentage, from 0 to 100. Horizontal axis: class boundaries 10, 20, 30, 40, 50, 60 (lower boundaries, not midpoints).]



Ogives
Unlike the percentage polygon, the ogive plots the lower boundaries of the class intervals for the numerical variable along the X axis, at their respective cumulative percentages, as points joined by a line.
The Frequency Polygon

Data in Ordered Array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: frequency polygon. Vertical axis: Frequency, from 0 to 7. Horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More.]
The Polygon
▪ A percentage polygon is formed by having the
midpoint of each class represent the data in that
class and then connecting the sequence of
midpoints at their respective class percentages.
▪ The cumulative percentage polygon, or ogive,
displays the variable of interest along the X axis,
and the cumulative percentages along the Y axis.
▪ Useful when there are two or more groups to
compare.
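The points of a percentage polygon can be computed from the class boundaries and frequencies. A minimal Python sketch using the temperature example from this chapter (the variable names are illustrative):

```python
# Class boundaries and class frequencies from the frequency distribution.
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

# The midpoint of each class represents the data in that class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# The percentage of observations in each class.
percentages = [100 * f / n for f in frequencies]

# The polygon connects these (midpoint, percentage) points in order.
polygon_points = list(zip(midpoints, percentages))
print(polygon_points)
```

Plotting one such sequence of points per group on the same axes is what makes the polygon convenient for comparing two or more groups.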
The Polygons
[Figure: percentage polygon.]
This chart uses the midpoints of each class interval to represent the data of each class and then plots the midpoints, at their respective class percentages, as points on a line along the X axis.
The Percentage Polygon
Line Charts and Scatter Diagrams
◼ Line charts show values of one variable
vs. time
◼ Time is traditionally shown on the horizontal axis

◼ Scatter Diagrams show points for bivariate


data
◼ one variable is measured on the vertical axis and
the other variable is measured on the horizontal
axis
Example
◼ In collecting meal cost data as part of a study that reviews the
travel and entertainment costs that a business incurs in a major
city, you might want to determine if the cost of meals at restaurants
located in the center city district differs from the cost at restaurants
in the surrounding metropolitan area. As you collect meal cost data
for this study, you also note the restaurant location, center city or
metro area. The file Restaurants stores the data.

◼ Construct ogives and polygons of meal costs for center city and
metro area restaurants
Line Chart Example

Year    Inflation Rate
1985    3.56
1986    1.86
1987    3.65
1988    4.14
1989    4.82
1990    5.40
1991    4.21
1992    3.01
1993    2.99
1994    2.56
1995    2.83
1996    2.95
1997    2.29
1998    1.56
1999    2.21
2000    3.36
2001    2.85
2002    1.58

[Figure: line chart titled "U.S. Inflation Rate". Horizontal axis: Year, from 1984 to 2002. Vertical axis: Inflation Rate (%), from 0 to 6.]
Scatter Diagram Example

Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200

[Figure: scatter diagram titled "Production Volume vs. Cost per Day". Horizontal axis: Volume per Day, from 0 to 70. Vertical axis: Cost per Day, from 0 to 250.]
Types of Relationships
◼ Linear Relationships
◼ Curvilinear Relationships
◼ No Relationship
[Figure: for each type of relationship, two example scatter diagrams of Y against X.]
Summary
◼ END OF CHAPTER 2
Seven Basic Tools of Quality Control
1. Process Flowcharts
2. Brainstorming
3. Fishbone Diagram
4. Histogram
5. Trend Charts
6. Scatter Plots
7. Statistical Process
Control Charts
Seven Basic Tools of Quality Control
(continued)
◼ Process Flowcharts: Map out the process to better visualize and understand opportunities for improvement.
◼ Fishbone (cause-and-effect) Diagram:
[Figure: fishbone diagram in which Cause 1, Cause 2, Cause 3, and Cause 4, each with sub-causes, lead to the Problem.]
◼ Histogram: Show patterns of variation.
◼ Trend Charts: Identify trend.
[Figure: y plotted against time.]
◼ Scatter Plots: Examine relationships.
[Figure: y plotted against x.]
◼ Statistical Process Control Charts: Examine the performance of a process over time.
[Figure: X plotted against time.]
