03 GraphicalPart2 Numerical
Class 3: Describing Data by Graphs, Charts, Tables
Bar and Pie Charts
Stem and Leaf Diagram
Scatter Plots, Line Charts
Describing Data by Numerical Measures
A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes. The relative frequency of a class equals the fraction or proportion of items belonging to that class.
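The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the raw data arrive as a list of category labels; the function name `frequency_table` and the purchase records are hypothetical, made up for the example.

```python
from collections import Counter

def frequency_table(items):
    """Return {class: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(items)  # frequency of each class
    n = len(items)
    return {
        label: (freq, freq / n, 100 * freq / n)
        for label, freq in counts.items()
    }

# Hypothetical purchase records, for illustration only.
purchases = ["Pepsi", "Coke", "Pepsi", "Sprite", "Coke", "Pepsi", "Coke", "Coke"]
for label, (f, rel, pct) in frequency_table(purchases).items():
    print(f"{label:8s} {f:3d} {rel:6.3f} {pct:6.1f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a quick sanity check on any hand-built table.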
Three steps are necessary to define the classes for a frequency distribution with quantitative data:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
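Too many (narrow) classes may yield a very jagged distribution with gaps from empty classes.
Too few (wide) classes may compress variation too much and yield a blocky distribution.
To choose the number of classes k, the guideline 2^k > n may be used, where n is the number of data values.
Class width = range of data / number of classes, where range of data = largest data point - smallest data point.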
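A minimal sketch of the two rules above: pick the smallest k with 2^k > n, then divide the range by k. Rounding the width up to a whole number is an assumed convention here (it guarantees the k classes cover the data); `suggest_classes` and the sample values are hypothetical.

```python
import math

def suggest_classes(data):
    """Apply the 2**k > n guideline and the range / k width rule."""
    n = len(data)
    k = 1
    while 2 ** k <= n:  # stop at the smallest k with 2**k > n
        k += 1
    data_range = max(data) - min(data)  # largest - smallest
    width = math.ceil(data_range / k)   # rounded up (assumed convention)
    return k, data_range, width

values = list(range(10, 60))  # hypothetical: 50 values from 10 to 59
print(suggest_classes(values))  # (6, 49, 9)
```

With n = 50, 2^5 = 32 is too small and 2^6 = 64 exceeds 50, so k = 6; a range of 49 over 6 classes gives a width of about 8.2, rounded up to 9.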
Guidelines for classes:
- Classes must be mutually exclusive.
- Classes must be all-inclusive.
- Classes (categories) should be of equal width.
- Avoid empty classes.
Histogram: A histogram is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.
Cumulative distributions (ogives): show the number of data items with values less than or equal to the upper class limit of each class.
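The cumulative frequencies plotted by an ogive can be computed by running a total over the class frequencies. This sketch assumes integer data and integer class limits (class j runs from `start + j*width` to `start + (j+1)*width - 1`); the function name and the data are hypothetical.

```python
def cumulative_distribution(data, start, width, k):
    """Frequency and cumulative frequency for k integer classes."""
    rows = []
    cumulative = 0
    for j in range(k):
        lower = start + j * width
        upper = lower + width - 1  # upper class limit (integer data assumed)
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq  # number of items <= upper class limit
        rows.append((lower, upper, freq, cumulative))
    return rows

# Hypothetical integer data.
data = [12, 15, 21, 22, 28, 30, 33, 37, 41, 49]
for lower, upper, f, cf in cumulative_distribution(data, start=10, width=10, k=4):
    print(f"{lower}-{upper}: frequency {f}, cumulative {cf}")
```

The last cumulative frequency equals n, which is why an ogive always ends at the total number of items (or at 1 or 100% in the relative versions).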
Today's Focus
- Bar and Pie Charts
- Stem and Leaf Diagram
- Scatter Plots and Line Charts
Bar Charts
A bar chart is a graphical device for depicting categorical data summarized in a frequency, relative frequency, or percent frequency distribution.
Horizontal axis: specify the labels used for the classes (categories).
Vertical axis: using a bar of fixed width drawn above each class label, extend the bar until it reaches the frequency, relative frequency, or percent frequency of the class.
For categorical data, the bars should be separated to emphasize that each class is distinct.
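Soft Drink      Percent Frequency
Coke Classic    38
Diet Coke       16
Dr. Pepper      10
Pepsi           26
Sprite          10
Total           100

[Figure: bar chart of soft drink purchases; horizontal axis: Soft Drinks; vertical axis: Frequency]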
Histograms are used to represent a frequency distribution associated with a single quantitative (ratio or interval scale) variable; there are no gaps between the histogram bars. Bar charts are used when one or more variables of interest are categorical.
A useful feature of bar charts is that they can display multiple groups (here, several investors) side by side.
Investment Category   Investor A   Investor B   Investor C   Total
Stocks                      46.5         55.0         27.5   129.0
Bonds                       32.0         44.0         19.0    95.0
Derivatives                 15.5         20.0         13.5    49.0
Savings                     16.0         28.0          7.0    51.0
Total                      110.0        147.0         67.0   324.0
(Amounts in thousand dollars.)
[Figure: Investment Categories Across Investors; clustered column chart with Investment Categories (Stocks, Bonds, Derivatives, Savings) on the horizontal axis, Thousand Dollars (0 to 60) on the vertical axis, and one bar each for Investor A, Investor B, and Investor C]
Besides vertical (column) bar charts, we can also display the data with horizontal bar charts.
[Figure: Investment Categories Across Investors as a horizontal bar chart; Investment Categories (Stocks through Savings) on the vertical axis, Thousand Dollars (0 to 60) on the horizontal axis]
Pie Charts
A pie chart is another graphical device for depicting the relative frequency or percent frequency distribution of categorical data.
A pie chart has the shape of a circle. The circle is divided into slices corresponding to the categories or classes to be displayed. The size of each slice is proportional to the magnitude of the displayed variable associated with each category or class.
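Because slice size is proportional to each class's share, a slice's central angle is its proportion of the total times 360 degrees. A minimal sketch using the soft drink percent frequencies (38, 16, 10, 26, 10); the function name `slice_angles` is hypothetical.

```python
def slice_angles(percents):
    """Convert percent (or raw) frequencies to central angles in degrees."""
    total = sum(percents.values())
    return {label: 360 * p / total for label, p in percents.items()}

# Percent frequencies from the soft drink data.
soft_drinks = {"Coke Classic": 38, "Diet Coke": 16, "Dr. Pepper": 10,
               "Pepsi": 26, "Sprite": 10}
for label, angle in slice_angles(soft_drinks).items():
    print(f"{label:12s} {angle:6.1f} degrees")
```

Dividing by the total (rather than assuming it is 100) lets the same function accept raw frequencies as well.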
[Figure: Pie Chart for Soft Drink Purchases, built from the same percent frequencies as the bar chart; visible slice labels include Pepsi 26% and Sprite 10%]
Both bar and pie charts are used to depict categorical data. Which type of chart should be preferred?
A pie chart is appropriate for showing each category's proportion of a total. Otherwise, a bar chart is usually more appropriate.
Stem and Leaf Diagram
A simple way to see distribution details in quantitative data.
Constructing a stem and leaf diagram:
1. Sort the data from low to high.
2. Decide how to split each value into a stem and a leaf.
3. List all possible stems in a single column.
4. For each stem, list all leaves associated with that stem.
Example data (unsorted, partial listing):
115, 95, 76, 141, 91, 81, 102, 80, 81, 106, 84, 68, 100, 119, 98, 85, 113, 115, 94, 98, 106, 106, 75, 95, 119
Sorted data array (n = 50, read down the columns):
  68  75  81  86  94  98 104 108 118 127
  69  76  82  91  95  98 106 112 119 128
  72  76  83  92  95 100 106 113 119 132
  73  80  84  92  96 100 106 115 124 134
  73  81  85  92  97 102 107 115 126 141
List the stems 6 through 14 (every digit except the last) in a single column to the left of a vertical line. To the right of the vertical line, record the last digit of each data value as its leaf.
 6 | 89
 7 | 233566
 8 | 01123456
 9 | 12224556788
10 | 002466678
11 | 2355899
12 | 4678
13 | 24
14 | 1
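The construction steps can be sketched in Python for non-negative integers, taking the stem as all digits but the last (`x // 10`) and the leaf as the last digit (`x % 10`). The data are the 50 values shown in the stem and leaf display; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem = all digits but the last; leaf = last digit (non-negative ints)."""
    leaves = defaultdict(list)
    for x in sorted(data):  # step 1: sort low to high
        leaves[x // 10].append(x % 10)
    lines = []
    # List every possible stem, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_str = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {leaf_str}")
    return "\n".join(lines)

data = [68, 69, 72, 73, 73, 75, 76, 76, 80, 81, 81, 82, 83, 84, 85, 86,
        91, 92, 92, 92, 94, 95, 95, 96, 97, 98, 98, 100, 100, 102, 104,
        106, 106, 106, 107, 108, 112, 113, 115, 115, 118, 119, 119, 124,
        126, 127, 128, 132, 134, 141]
display = stem_and_leaf(data)
print(display)
```

Sorting first means each row's leaves come out in ascending order, so the display doubles as a sorted data array.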
A stem and leaf diagram is similar to a histogram in that it displays the distribution of a quantitative variable. Advantages of the stem and leaf diagram over the histogram:
- The stem and leaf display is easier to construct by hand.
- Within a class interval, the stem and leaf display provides more information than the histogram because it shows the actual data.
- In a histogram, the individual value of a data point is lost once it falls into a class; the stem and leaf diagram shows the individual data values.
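Using the 100s digit as the stem: round off the 10s digit to form the leaves.
Example: 6 | 1 represents a value of about 610, 7 | 8 a value of about 780, and stem 12 covers values in the 1200s.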
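A minimal sketch of the rounding variant, assuming non-negative integer data and rounding halves up (a convention chosen here; `round()` in Python rounds halves to even instead). The function name and the sample values 613, 776, and 1224 are hypothetical.

```python
def rounded_stem_leaf(value):
    """Stem = hundreds part; leaf = tens digit after rounding to the nearest ten."""
    tens = (value + 5) // 10  # value rounded to the nearest ten, in units of ten
    return tens // 10, tens % 10

for v in [613, 776, 1224]:  # hypothetical values
    stem, leaf = rounded_stem_leaf(v)
    print(f"{v} -> {stem} | {leaf}")
```

Rounding happens before splitting, so a value such as 995 carries over into the next stem (10 | 0) instead of being truncated to 9 | 9.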
Line Charts
Line charts are effective tools for representing data that are measured over time (e.g., monthly, quarterly, annually). A line chart shows the values of one variable versus time.
Time is traditionally shown on the horizontal axis and the variable of interest on the vertical axis.
[Figure: line chart of the Inflation Rate by Year, 1985 to 2006; vertical axis: Inflation Rate (0 to 6); horizontal axis: Year]
Scatter Diagrams
A scatter diagram shows points for bivariate data (paired values of two quantitative variables). One variable is measured on the vertical axis and the other variable on the horizontal axis.
Number of commercials   2   5   1   3   4   1   5   3   4   2
Sales in $1000         50  57  41  54  54  38  63  48  59  46
Relationship between number of commercials shown and sales:
[Figure: scatter diagram, Sales vs Number of Commercials Shown; horizontal axis: Number of commercials shown (0 to 6); vertical axis: Sales in $1000 (0 to 70)]
What type of relationship is there between the number of commercials shown and sales?
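One rough numerical check, offered as a sketch alongside the scatter diagram rather than a formal method, is to average the sales observed for each number of commercials and see whether the averages rise. The data are the ten pairs in the table; the function name `average_by_x` is hypothetical.

```python
from collections import defaultdict

def average_by_x(xs, ys):
    """Average y value for each distinct x value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(g) / len(g) for x, g in sorted(groups.items())}

commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # in $1000

for c, avg in average_by_x(commercials, sales).items():
    print(f"{c} commercials: average sales {avg:.1f} thousand dollars")
```

The averages increase steadily with the number of commercials, which is consistent with the upward pattern of points in the scatter diagram (a positive relationship).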
Tabular and graphical methods by data type:
Categorical Data
- Tabular Methods: frequency distribution; relative frequency distribution; percent frequency distribution
- Graphical Methods: bar chart; pie chart
Quantitative Data
- Tabular Methods: frequency distribution; relative frequency distribution; percent frequency distribution; cumulative frequency distribution; cumulative relative frequency distribution; cumulative percent frequency distribution
- Graphical Methods: histogram; ogive; stem and leaf display; line chart; scatter diagram
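Measures of Central Tendency
- Mean (balance point): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median: the midpoint
- Mode: the value that occurs most often
- Weighted mean: $\bar{X}_W = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$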
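The weighted mean formula translates directly into code. A minimal sketch; the function name and the example (grade points weighted by hypothetical credit hours) are illustrative assumptions.

```python
def weighted_mean(values, weights):
    """X_W = sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical: grade points 4.0, 3.0, 2.0 earned in courses of 3, 4, and 1 credit hours.
print(weighted_mean([4.0, 3.0, 2.0], [3, 4, 1]))  # 3.25
```

With equal weights the weighted mean reduces to the ordinary mean, since every w_i cancels.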
The mean is the most common measure of central tendency.
Mean = sum of values divided by the number of values.
The mean is affected by extreme values (outliers).
Example:
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
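The same arithmetic as a short sketch, showing how replacing 5 with the extreme value 10 pulls the mean from 3 up to 4. The helper name `mean` is a local definition, not a library function.

```python
def mean(values):
    """Sum of values divided by the number of values."""
    return sum(values) / len(values)

print(mean([1, 2, 3, 4, 5]))   # 3.0
print(mean([1, 2, 3, 4, 10]))  # 4.0
```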
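Population mean, where N = population size:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Sample mean, where n = sample size:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$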
Median
In an ordered array (lowest to highest), the median is the middle number, i.e., the number that splits the distribution in half.
Data 1, 2, 3, 4, 5: Median = 3
Data 1, 2, 3, 4, 10: Median = 3
Unlike the mean, the median is not affected by the extreme value.
To find the median:
1. Sort the n data values from low to high (sorted data is called a data array).
2. Find the position i = (1/2)n.
3. If i is not an integer, round up to the next integer; the median is the value in position i. (For an odd number of observations, the median is the middle value.)
4. If i is an integer, the median is the average of the values in positions i and i + 1. (For an even number of observations, the median is the average of the two middle values.)
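The position rule can be followed literally in code. A minimal sketch with 1-based positions translated to Python's 0-based indexing; the function name is hypothetical, and the example array is the one from the median example.

```python
import math

def median_by_position(values):
    """Median using the i = (1/2)n position rule; positions are 1-based."""
    data = sorted(values)  # the data array
    n = len(data)
    if n == 0:
        raise ValueError("median is undefined for empty data")
    i = n / 2
    if not i.is_integer():
        # Odd n: round i up; the median is the value in position ceil(i).
        return data[math.ceil(i) - 1]
    i = int(i)
    # Even n: average the values in positions i and i + 1.
    return (data[i - 1] + data[i]) / 2

array = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_by_position(array))  # 12
```

For the 13-value array, i = 6.5 rounds up to 7 and the 7th value is 12; for an even-sized list such as [1, 2, 3, 4], i = 2 and the result is (2 + 3)/2 = 2.5.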
Median Example
Data array: 4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24
Position: i = (1/2)n = (1/2)(13) = 6.5. Since 6.5 is not an integer, round up to 7. The median is the value in the 7th position: Md = 12.
Mean vs Median
The median is the measure of location most often reported for annual income and property value data because a few extremely large incomes or property values can inflate the mean.
Shape of a Distribution
The shape describes how the data are distributed:
- Symmetric or skewed
- The greater the difference between the mean and the median, the more skewed the distribution.
[Figure: left-skewed, symmetric (Mean = Median), and right-skewed distributions]
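The mean-versus-median comparison can be sketched as a single number. This follows the common rule of thumb (mean above the median suggests a long right tail, below it a long left tail), which can fail for some distributions; the function name and the data sets are hypothetical.

```python
def mean_minus_median(values):
    """Rule of thumb: > 0 suggests right skew, < 0 left skew, 0 symmetric."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    median = data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2
    mean = sum(data) / n
    return mean - median

print(mean_minus_median([1, 2, 3, 4, 5]))   # 0.0: symmetric
print(mean_minus_median([1, 2, 3, 4, 10]))  # 1.0: long right tail
print(mean_minus_median([0, 6, 7, 8, 9]))   # -1.0: long left tail
```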
[Figure: left-skewed, symmetric (Mean = Median), and right-skewed distributions of Exam Scores]
Mode
A measure of location:
- The value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes (2 modes = bimodal)
[Figure: two data sets plotted on a 0 to 14 scale; one has Mode = 5, the other has No Mode]
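A minimal sketch that reports every mode and returns an empty list when there is none. It assumes the convention that "no mode" means no value occurs more than once (textbooks differ on ties where every value repeats equally often); the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """All values occurring most often; [] if no value repeats (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode (assumed convention)
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 5, 5, 5, 7]))           # [5]
print(modes([2, 2, 3, 3, 4]))           # [2, 3]: bimodal
print(modes([0, 1, 2, 3]))              # []: no mode
print(modes(["Pepsi", "Coke", "Pepsi"]))  # ['Pepsi']: categorical data
```

Because it only counts occurrences, the same function works for numerical and categorical data, and an extreme value changes the result only if it happens to repeat.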