Data Visualization Techniques Explained
◼ Categorical Data
◼ Numerical Data
Bar Chart
[Figure: Bar chart "Investor's Portfolio"; categories Savings, CD, Bonds, Stocks; horizontal axis: Amount in K$ (0 to 50)]
Pie Chart
(Variables are Qualitative)
[Figure: Pie chart; Bonds 29%. Percentages are rounded to the nearest percent.]
Pareto Diagram
Vilfredo Pareto (1848–1923): the “Vital Few”
[Figure: Pareto diagram with categories in the order Stocks, Bonds, Savings, CD; the bar chart shows the % invested in each category, and the line graph (on its own axis) shows the cumulative % invested]
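A Pareto diagram orders the categories from largest to smallest and overlays a cumulative-percentage line. The sketch below builds the table behind such a chart. The portfolio amounts (in K$) are hypothetical, chosen only so that Bonds comes out near the 29% shown in the pie chart; they are not given on the slides, and the function name pareto_table is likewise illustrative.

```python
# Hypothetical portfolio amounts in K$ (illustrative only; not from the slides).
portfolio = {"Savings": 20.0, "CD": 16.0, "Bonds": 32.0, "Stocks": 42.0}

def pareto_table(amounts):
    """Return (category, percent, cumulative percent) rows, largest category first."""
    total = sum(amounts.values())
    rows = []
    cumulative = 0.0
    # Pareto ordering: sort categories by amount, descending.
    for category, amount in sorted(amounts.items(), key=lambda kv: kv[1], reverse=True):
        percent = 100.0 * amount / total
        cumulative += percent          # height of the cumulative line at this bar
        rows.append((category, percent, cumulative))
    return rows

for category, percent, cumulative in pareto_table(portfolio):
    print(f"{category:8s} {percent:5.1f}%   cumulative {cumulative:5.1f}%")
```

The bars use the percent column and the line uses the cumulative column, which always ends at 100%.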
Bivariate Categorical Data
◼ Contingency Tables
◼ Side-by-Side Bar Charts
[Figure: Side-by-side bar chart; categories Savings, CD, Bonds, Stocks; horizontal axis 0 to 60]
Example
Class             Frequency   Relative Frequency   Percentage
10 but under 20       3              .15               15
20 but under 30       6              .30               30
30 but under 40       5              .25               25
40 but under 50       4              .20               20
50 but under 60       2              .10               10
Total                20             1.00              100
Frequency Distribution: Discrete Data
◼ The following data record the number of children in the families of the 47 workers in a company:
1 1 3 2 0 2 0 1 2 2 1 3
5 2 4 0 0 2 4 1 1 2 2 0
3 0 0 2 1 3 6 0 2 1 0 3
2 2 2 1 0 0 1 1 3 1 4
Frequency distribution table
Number of children in family   Number of workers
0
1
2
3
4
5
6
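One way to fill in the "Number of workers" column is to tally each possible value, including values that never occur. A minimal sketch using Python's collections.Counter on the 47 observations listed above; the helper name frequency_table is illustrative.

```python
from collections import Counter

# Number of children in the families of the 47 workers (data from the slide).
children = [
    1, 1, 3, 2, 0, 2, 0, 1, 2, 2, 1, 3,
    5, 2, 4, 0, 0, 2, 4, 1, 1, 2, 2, 0,
    3, 0, 0, 2, 1, 3, 6, 0, 2, 1, 0, 3,
    2, 2, 2, 1, 0, 0, 1, 1, 3, 1, 4,
]

def frequency_table(values, possible_values):
    """Count the observations equal to each possible value (0 if the value never occurs)."""
    counts = Counter(values)
    return [(v, counts[v]) for v in possible_values]

table = frequency_table(children, range(0, 7))   # every possible value 0..6 is listed
for value, count in table:
    print(f"{value:>2} {count:>3}")
print(f"Total {sum(c for _, c in table)}")
```

Listing every possible value (rather than only those observed) follows the rule for discrete data in the NOTE below.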
Frequency Distribution: Discrete Data
◼ Discrete data: possible values are countable
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.
Number of days read   Frequency
0                     44
1                     24
2                     18
3                     16
4                     20
5                     22
6                     26
7                     30
Total                 200
Relative Frequency
Relative Frequency: What proportion is in each category?
Number of days read   Frequency   Relative Frequency
0                     44          .22
1                     24          .12
2                     18          .09
3                     16          .08
4                     20          .10
5                     22          .11
6                     26          .13
7                     30          .15
Total                 200         1.00
Example: 44/200 = .22, so 22% of the people in the sample report that they read the newspaper 0 days per week.
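Each relative frequency is the category's frequency divided by the sample size (here 200). A minimal sketch reproducing the column above; the function name relative_frequencies is illustrative.

```python
# Newspaper example: number of days read per week -> number of customers.
days_read = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

def relative_frequencies(freqs):
    """Divide each frequency by the total number of observations."""
    n = sum(freqs.values())
    return {value: f / n for value, f in freqs.items()}

rel = relative_frequencies(days_read)
for days, r in rel.items():
    print(f"{days} days: {r:.2f}")   # e.g. 0 days: 44/200 = 0.22
```

Because every observation falls in exactly one category, the relative frequencies sum to 1.00.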
NOTE
For developing frequency and relative frequency distributions for discrete data:
(1) List all possible values of the variable. If the
Distribution classes
[Figure: a class interval marked by its lower limit and upper limit]
◼ Class widths (class lengths):
- Continuous data: the numerical difference between the lower and upper class limits.
- Discrete data: the numerical difference between the lower limit of one class and the lower limit of the immediately following class.
◼ Class midpoints: located at the centre of each class, halfway between its lower and upper limits.
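As a worked case of these definitions, the block below uses the "10 but under 20" class from the example table. The discrete classes 0-4 and 5-9 are hypothetical, added only to illustrate the lower-limit-to-lower-limit rule.

```latex
\begin{aligned}
\text{width (continuous)} &= \text{upper limit} - \text{lower limit} = 20 - 10 = 10\\
\text{width (discrete, classes 0-4 and 5-9)} &= \text{next lower limit} - \text{this lower limit} = 5 - 0 = 5\\
\text{midpoint} &= \frac{\text{lower limit} + \text{upper limit}}{2} = \frac{10 + 20}{2} = 15
\end{aligned}
```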
Distribution classes
◼ Open-ended class:
- A class without a lower or upper limit.
- Usually used for the first class, which has no defined lower limit, and/or the last class, which has no defined upper limit.
Example classes:
< 10
10-15
15-20
>= 20
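Open-ended first and last classes guarantee that every value, however extreme, lands in some class. A minimal sketch for the example classes; it assumes each interior class includes its lower limit but not its upper limit (so 10-15 means 10 <= x < 15), which the slide does not state. The function name is illustrative.

```python
def open_ended_class(x):
    """Return the label of the example class containing x.

    Assumption: interior classes include their lower limit but not their
    upper limit, e.g. "10-15" means 10 <= x < 15.
    """
    if x < 10:
        return "< 10"    # first class: open-ended, no lower limit
    if x < 15:
        return "10-15"
    if x < 20:
        return "15-20"
    return ">= 20"       # last class: open-ended, no upper limit

# Extreme values still fall into the first or last class.
for value in [-1000, 9.99, 10, 14.5, 15, 19, 20, 1_000_000]:
    print(value, "->", open_ended_class(value))
```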
Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
◼ Find range: 58 - 12 = 46
◼ Select number of classes: 5 (usually between 5 and 20)
◼ Compute class width: 10 (46/5 = 9.2, then round up)
◼ Determine class boundaries: 10, 20, 30, 40, 50
◼ Compute class midpoints: 15, 25, 35, 45, 55
◼ Count observations & assign to classes
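The grouping steps above can be sketched in Python. The function name group_by_classes is hypothetical, and starting the first class at the multiple of the width at or below the minimum is an assumption that happens to reproduce the slide's limits of 10, 20, 30, 40, 50. The resulting counts match the frequencies in the "10 but under 20" example table.

```python
import math

# Raw data from the slide (20 observations).
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def group_by_classes(values, num_classes):
    """Group values into equal-width classes [lower, lower + width) and count each class."""
    ordered = sorted(values)                        # sort in ascending order
    data_range = ordered[-1] - ordered[0]           # range = max - min (58 - 12 = 46)
    width = math.ceil(data_range / num_classes)     # 46 / 5 = 9.2, rounded up to 10
    # Assumption: start at the multiple of the width at or below the minimum.
    first_lower = (ordered[0] // width) * width
    # Starting below the minimum can leave the maximum uncovered, so add classes if needed.
    n = max(num_classes, int((ordered[-1] - first_lower) // width) + 1)
    lowers = [first_lower + i * width for i in range(n)]
    midpoints = [lower + width / 2 for lower in lowers]
    counts = [0] * n
    for x in ordered:
        counts[int((x - first_lower) // width)] += 1   # index of the class containing x
    return lowers, width, midpoints, counts

lowers, width, midpoints, counts = group_by_classes(data, 5)
for lower, mid, count in zip(lowers, midpoints, counts):
    print(f"{lower} but under {lower + width}: midpoint {mid:g}, frequency {count}")
```

Each class includes its lower limit and excludes its upper limit, matching the "but under" wording of the example table.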
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Frequency Distribution
[Figure: two histograms of temperature data (horizontal axis: Temperature), one with many narrow classes and one with few wide classes]
◼ Many (narrow class intervals): can give a poor indication of how frequency varies across classes
◼ Few (wide class intervals): may compress variation too much and yield a blocky distribution
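The Histogram
[Figure: Histogram of the 20 observations; horizontal axis: class boundaries, labelled at the class midpoints 5, 15, 25, 35, 45, 55, More; vertical axis: frequency (0 to 7); bar heights 0, 3, 6, 5, 4, 2, 0; no gaps between bars]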
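A histogram's bar heights are just the class frequencies. As a rough text-only sketch (the slides use a graphical chart; this rendering is an illustrative assumption), the frequencies 3, 6, 5, 4, 2 from the grouped example can be drawn as rows of # characters, one row per class midpoint. The function name text_histogram is hypothetical.

```python
# Class midpoints and frequencies from the grouped 20-observation example.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: the label, then a bar whose length equals the frequency."""
    return [f"{label:>4} | {symbol * count} ({count})" for label, count in zip(labels, counts)]

for line in text_histogram(midpoints, frequencies):
    print(line)
```

Adjacent rows touch, mirroring the "no gaps between bars" rule for histograms of numerical data.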
◼ Number of peaks
◼ Skewness
[Figure annotation: “lost” orders in the upper tail of the distribution]
Histogram Example
◼ In the Journal of Experimental Social Psychology (Vol. 45, 2009) study on
whether money can buy love (p. 63), the researchers randomly assigned
participants to the role of either gift-giver or gift-receiver. (Gift-givers, recall,
were asked about a birthday gift they recently gave, while gift-recipients were
asked about a birthday gift they recently received.) Two quantitative variables
were measured for each of the 237 participants: gift price (measured in dollars)
and overall level of appreciation for the gift (measured as the sum of the two 7-point
appreciation scales, with higher values indicating a higher level of
appreciation).
◼ One of the objectives of the research was to investigate whether givers and
receivers differ on the price of the gift reported and on the level of appreciation
reported.
◼ Use BUYLOV to construct side-by-side histograms for the quantitative
variables, one histogram for gift-givers and one for gift-recipients.
The histograms for birthday gift price
The prices reported by gift-recipients tended to be higher than the prices reported by gift-givers.
The histograms for overall level of appreciation
Gift-givers and gift-recipients respond differently, with gift-recipients more likely to express a greater level of appreciation for the gift than gift-givers perceive.
Organizing Numerical Data
Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
◼ Tables: cumulative and relative cumulative frequency distributions
◼ Polygons: ogive
[Figure: Ogive; horizontal axis 10 to 60; vertical axis 0 to 100]
The Polygon
[Figure: Polygon plotted at the class midpoints 5, 15, 25, 35, 45, 55, More; vertical axis 0 to 7]
▪ A percentage polygon is formed by having the
midpoint of each class represent the data in that
class and then connecting the sequence of
midpoints at their respective class percentages.
▪ The cumulative percentage polygon, or ogive,
displays the variable of interest along the X axis,
and the cumulative percentages along the Y axis.
▪ Useful when there are two or more groups to
compare.
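The points for both charts can be computed directly from the class frequencies. A minimal sketch, reusing the 20-observation example (classes 10 but under 20 through 50 but under 60, frequencies 3, 6, 5, 4, 2). Plotting the ogive at the upper class boundaries, starting from 0% at the first lower boundary, is a common convention assumed here; the function names are illustrative.

```python
# Class lower limits, width, and frequencies from the grouped example.
lowers = [10, 20, 30, 40, 50]
width = 10
frequencies = [3, 6, 5, 4, 2]

def polygon_points(lowers, width, frequencies):
    """Percentage polygon: (class midpoint, class percentage) for each class."""
    total = sum(frequencies)
    return [(lower + width / 2, 100 * f / total) for lower, f in zip(lowers, frequencies)]

def ogive_points(lowers, width, frequencies):
    """Ogive: (upper class boundary, cumulative percentage), starting at 0% on the first lower boundary."""
    total = sum(frequencies)
    points = [(lowers[0], 0.0)]
    running = 0
    for lower, f in zip(lowers, frequencies):
        running += f                                   # cumulative frequency so far
        points.append((lower + width, 100 * running / total))
    return points

print(polygon_points(lowers, width, frequencies))
print(ogive_points(lowers, width, frequencies))
```

Connecting the polygon points gives the percentage polygon; connecting the ogive points gives a line that rises to 100% at the last boundary.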
The Polygons
◼ Construct ogives and polygons of meal costs for center city and
metro area restaurants
Line Chart Example
Year   Inflation Rate (%)
1985   3.56
1986   1.86
1987   3.65
1988   4.14
1989   4.82
1990   5.40
1991   4.21
1992   3.01
1993   2.99
1994   2.56
1995   2.83
1996   2.95
1997   2.29
1998   1.56
1999   2.21
2000   3.36
2001   2.85
2002   1.58
[Figure: Line chart "U.S. Inflation Rate"; horizontal axis: Year (1984 to 2002); vertical axis: Inflation Rate (%) (0 to 6)]
Scatter Diagram Example
Volume per Day (X)   Y
26                   140
29                   146
33                   160
38                   167
42                   170
50                   188
55                   195
60                   200
[Figure: Scatter diagram of the pairs above; horizontal axis: Volume per Day (0 to 70)]
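Types of Relationships
◼ Linear Relationships
[Figure: two scatter plots of Y against X showing linear relationships]
Types of Relationships (continued)
◼ Curvilinear Relationships
[Figure: two scatter plots of Y against X showing curvilinear relationships]
Types of Relationships (continued)
◼ No Relationship
[Figure: two scatter plots of Y against X showing no relationship]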
Summary
◼ END OF CHAPTER 2
Seven Basic Tools of Quality Control
1. Process Flowcharts
2. Brainstorming
3. Fishbone Diagram
4. Histogram
5. Trend Charts
6. Scatter Plots
7. Statistical Process Control Charts
Seven Basic Tools of Quality Control (continued)
[Figure: tool illustrations; axis label x]
Seven Basic Tools of Quality Control (continued)
[Figure: tool illustrations; horizontal axis: time]