Unit 01 Statistics
When looking at data, we must identify the following aspects before we can analyze the data
and draw appropriate conclusions.
Who?________________________________________________________________________________
Population
Sample
What? _______________________________________________________________________________
Qualitative (categorical)
Quantitative
AP Statistics Chapter 3 Part I Displaying and Describing Categorical Data Pages 20-43
I. Frequency Tables
A. Define Frequency Table
II.
III. Bar Charts
A. What is a bar chart?
IV. Pie Charts
A. What is a pie chart?
V. Contingency Tables
A. What is a contingency table?
VI. Conditional Distributions
A. What is a conditional distribution?
VII.
2. Stem-and-leaf display: a graph in which each number in the data set is broken
into two pieces called a stem and a leaf. The stem is the first part of the number
and consists of the beginning digit(s). The leaf is the last part of the number and
consists of the final digit.
When to use: Numerical data sets with a small to moderate number of observations
(does not work for very large data sets)
(Back-to-back stemplots can be used to compare two distributions)
How to Construct:
1. Select one or more leading digits for the stem values. The trailing digits
become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value
in order.
4. Create a key indicating the units for the stems and leaves.
What to look for: Center (mode, median), Shape (symmetry or clustering, location
of peaks), Spread (the extent of spread about the typical value), Deviations (the
presence of gaps or outliers).
Example:
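The construction steps above can be sketched in Python. This is only an illustrative sketch: the function name `stem_and_leaf` and the `scores` data are made up, and it assumes non-negative whole numbers with the final digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display (leaf = final digit).

    Assumes non-negative integers; hypothetical helper for illustration.
    """
    rows = defaultdict(list)
    # Steps 1 and 3: split each value into stem and leaf, leaves in order.
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)

    lines = []
    # Step 2: list every possible stem, even empty ones, so gaps stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    # Step 4: a key giving the units of the stems and leaves.
    lines.append("Key: 4 | 7 means 47")
    return lines


scores = [47, 52, 58, 61, 63, 63, 70, 89]  # made-up data
for line in stem_and_leaf(scores):
    print(line)
```

Keeping the empty stems in the column is what lets gaps and possible outliers show up in the display.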
I. Measures of Center
A. Mean
1. most common measure of center
2. arithmetic average
3. sample mean: $\bar{x}$; population mean: $\mu$
4. always exists
5. takes every data value into account
6. NOT resistant to outliers
B. Median
1. middle value
2. denoted Med
3. commonly used
4. always exists
5. RESISTANT to outliers
6. The median may not be an actual data value; with an even number of
observations it is the average of the two middle values.
C. Mode
1. does not always exist
2. may have more than one mode
3. most frequent value
4. the only measure of center that may be used with categorical data
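A short Python sketch of the three measures, using the standard `statistics` module (`multimode` needs Python 3.8 or later). The data values are made up; the point is to see how one outlier moves the mean much more than the median.

```python
import statistics

data = [2, 3, 3, 4, 5, 6]        # made-up data
with_outlier = data + [100]      # same data plus one extreme value

# Mean: uses every value, so the outlier pulls it up sharply.
mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)

# Median: middle value (average of the two middle values when n is even).
median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

# Mode: most frequent value(s); multimode returns a list because
# there may be more than one mode.
modes = statistics.multimode(data)
color_modes = statistics.multimode(["red", "blue", "red", "blue"])

print(mean_before, mean_after)
print(median_before, median_after)
print(modes, color_modes)
```

Here the median only moves from 3.5 to 4 when the outlier is added, while the mean jumps from under 4 to over 17. The last line also shows the mode working on categorical data.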
II. Shape
Symmetric: mean ≈ median
Right-skewed: typically mean > median
Left-skewed: typically mean < median
III. Measures of Spread
A. Range = max - min (NOT resistant)
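B. Variance, $s^2$: the average of the squares of the deviations of the
observations from their mean. (NOT resistant)
Formula: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ (sample variance; the sample std dev is $s = \sqrt{s^2}$)
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{n}$ (population variance; the population std dev is $\sigma = \sqrt{\sigma^2}$)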
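The two variance formulas can be checked with a small Python sketch. The function names `sample_variance` and `population_variance` and the data are made up for illustration; the results are compared against `statistics.variance` and `statistics.pvariance` from the standard library.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def population_variance(xs):
    """Sum of squared deviations from the population mean, divided by n."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean = 5

s2 = sample_variance(data)            # 32 / 7
sigma2 = population_variance(data)    # 32 / 8
s = math.sqrt(s2)                     # sample standard deviation
sigma = math.sqrt(sigma2)             # population standard deviation

print(s2, statistics.variance(data))
print(sigma2, statistics.pvariance(data))
print(s, sigma)
```

The only difference between the two functions is the divisor, n - 1 for a sample and n for a population.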
C.
D. Quartiles (resistant)
1. Q1 = 25% mark, P25, the median of the values between the min and the
median
2. Q2 = 50% mark, P50, the median
3. Q3 = 75% mark, P75, the median of the values between the median and the
max
E.
F. Five-number summary
min, Q1, med, Q3, max
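A possible Python sketch of the five-number summary. It assumes the convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving the overall median out of both halves when n is odd. Other textbooks and software use slightly different quartile rules, so results may differ a little. The function name and data are made up.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least two values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median position
    return (
        xs[0],
        statistics.median(lower),
        statistics.median(xs),
        statistics.median(upper),
        xs[-1],
    )


data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up data
print(five_number_summary(data))
```

For this data the lower half is 1, 3, 4, 6 and the upper half is 8, 10, 12, 15, so Q1 = 3.5 and Q3 = 11 with a median of 7.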
BOXPLOT: a graph based on the five-number summary. The box spans the
quartiles and shows the spread of the central half of the distribution. The median
is marked within the box. The whiskers extend to the extremes and show the
full spread of the data.
MODIFIED BOXPLOT: plots outliers as isolated points and pulls the
whiskers back to the most extreme data values that are not outliers.
IV. Deviations
Outlier: an observation that falls more than 1.5 × IQR below Q1 or above Q3,
where IQR = Q3 - Q1.
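The 1.5 × IQR rule can be sketched in Python as follows. The function name `find_outliers` and the data are made up, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (other quartile conventions can give slightly different fences). The function also returns where the whiskers of a modified boxplot would end.

```python
import statistics


def find_outliers(values):
    """Return (outliers, (low_whisker, high_whisker)) using the 1.5 x IQR rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    q1 = statistics.median(xs[:half])
    q3 = statistics.median(xs[half + 1:] if n % 2 else xs[half:])
    iqr = q3 - q1

    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    # Outliers lie beyond the fences; whiskers stop at the most extreme
    # data values that are still inside the fences.
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return outliers, (inside[0], inside[-1])


data = [1, 3, 4, 6, 7, 8, 10, 12, 40]  # made-up data
print(find_outliers(data))
```

With this data Q1 = 3.5 and Q3 = 11, so IQR = 7.5 and the fences are -7.75 and 22.25. Only 40 is flagged, and the whiskers run from 1 to 12.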