Introduction To Probability and Statistics Twelfth Edition
Median
• The median of a set of measurements is the middle measurement when the measurements are ranked from smallest to largest.

Example
• The set: 2, 4, 9, 8, 6, 5, 3 (n = 7)
• Sort: 2, 3, 4, 5, 6, 8, 9
• Position: .5(n + 1) = .5(7 + 1) = 4th
• Median = 4th measurement in the ranked list = 5
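
The rank-then-locate procedure above can be sketched in Python. This is only an illustration: the function name `median_by_position` is hypothetical, and averaging the two middle values when n is even is the usual convention, assumed here because the slide only works an odd n.

```python
def median_by_position(measurements):
    """Median via the .5(n + 1) position rule applied to the ranked data."""
    ranked = sorted(measurements)
    n = len(ranked)
    position = 0.5 * (n + 1)            # 1-based position in the ranked list
    lower = ranked[int(position) - 1]   # value at the floor of the position
    if position.is_integer():
        return lower                    # odd n: a single middle measurement
    upper = ranked[int(position)]       # even n: position falls between two values
    return (lower + upper) / 2          # assumed convention: average the two


example_median = median_by_position([2, 4, 9, 8, 6, 5, 3])
print(example_median)
```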

Mode
• The mode is the measurement which occurs most frequently.
• The set: 2, 4, 9, 8, 8, 5, 3
– The mode is 8, which occurs twice
• The set: 2, 2, 9, 8, 8, 5, 3
– There are two modes: 8 and 2 (bimodal)

Example
The number of quarts of milk purchased by 25 households:
0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5
• Mean? $\bar{x} = \frac{\sum x_i}{n} = \frac{55}{25} = 2.2$
• Median?
[Figure: relative frequency histogram of quarts purchased; vertical axis: Relative frequency]
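
As a quick check of the mode and milk examples, here is a small sketch using Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The variable names are illustrative, and the list `quarts` is simply the 25 values above written as counts.

```python
from statistics import mean, median, multimode

# Modes of the two small sets from the Mode slide
one_mode = multimode([2, 4, 9, 8, 8, 5, 3])    # a single most frequent value
two_modes = multimode([2, 2, 9, 8, 8, 5, 3])   # bimodal: two values tie

# Milk purchases: two 0s, five 1s, nine 2s, five 3s, three 4s, one 5
quarts = [0] * 2 + [1] * 5 + [2] * 9 + [3] * 5 + [4] * 3 + [5]

milk_mean = mean(quarts)       # sum of 55 over n = 25
milk_median = median(quarts)   # 13th ranked value, since .5(25 + 1) = 13
milk_mode = multimode(quarts)

print(one_mode, two_modes)
print(milk_mean, milk_median, milk_mode)
```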
• The median is often used as a measure of center when the distribution is skewed.
• Skewed left: Mean < Median
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.

Measures of Variability
• A measure along the horizontal axis of the data distribution that describes the spread of the distribution from the center.

The Range
• The range, R, of a set of n measurements is the difference between the largest and smallest measurements.
• Example: A botanist records the number of petals on 5 flowers: 5, 12, 6, 8, 14
• The range is R = 14 – 5 = 9.
• Quick and easy, but only uses 2 of the 5 measurements.
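
A short sketch of the range calculation for the petal counts. The second list is a made-up variation, included only to illustrate that the range depends on just the two extreme measurements.

```python
petals = [5, 12, 6, 8, 14]
petal_range = max(petals) - min(petals)   # largest minus smallest

# Hypothetical variation: the three interior values change, the extremes do not
altered = [5, 5, 5, 5, 14]
altered_range = max(altered) - min(altered)

print(petal_range, altered_range)
```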

Tchebysheff's Theorem: Some Notes
• Can be used for either samples ($\bar{x}$ and s) or for a population (µ and σ).
• Important results:
– If k = 2, at least $1 - 1/2^2 = 3/4$ of the measurements are within 2 standard deviations of the mean.
– If k = 3, at least $1 - 1/3^2 = 8/9$ of the measurements are within 3 standard deviations of the mean.

The Empirical Rule
• The interval µ ± σ contains approximately 68% of the measurements.
• The interval µ ± 2σ contains approximately 95% of the measurements.
• The interval µ ± 3σ contains approximately 99.7% of the measurements.
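
To set the two results side by side, the sketch below tabulates the guaranteed lower bound 1 – 1/k² next to the approximate mound-shaped proportions quoted above. The function and dictionary names are illustrative only.

```python
def tchebysheff_lower_bound(k):
    """Minimum fraction of measurements within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound 1 - 1/k^2 is only informative for k >= 1")
    return 1 - 1 / k**2


# Approximate proportions for mound-shaped data, as quoted in the slides
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (1, 2, 3):
    bound = tchebysheff_lower_bound(k)
    print(f"k={k}: at least {bound:.3f}, approximately {EMPIRICAL_RULE[k]}")
```

The bound holds for any distribution, which is why it is looser than the approximate figures for mound-shaped data.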

• Do they agree with the Empirical Rule?
– No. Not very well.
• Why or why not?
– The data distribution is not very mound-shaped, but skewed right.
[Figure: relative frequency histogram of Ages (horizontal axis from 25 to 73); s = 10.73; shape: skewed right]
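
One way to make this comparison concrete is to count how many measurements fall within 1, 2 and 3 standard deviations of the mean. The sketch below does so for the 50 faculty ages listed in the Approximating s example of this handout; the variable names are illustrative, and the sample standard deviation (divisor n – 1) is assumed.

```python
from statistics import mean, stdev

ages = [
    34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
    42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
    34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
    43, 50, 30, 43, 32, 44, 58, 53,
]

x_bar = mean(ages)
s = stdev(ages)   # sample standard deviation

# Number of ages inside x_bar +/- k*s for k = 1, 2, 3
counts_within = {
    k: sum(1 for age in ages if abs(age - x_bar) <= k * s)
    for k in (1, 2, 3)
}

for k, count in counts_within.items():
    print(f"within {k} s: {count}/{len(ages)} = {count / len(ages):.2f}")
```

The proportion within one standard deviation comes out below the 68% that the Empirical Rule suggests, which is consistent with the skewed shape noted above.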

Example
The length of time for a worker to complete a specified operation averages 12.8 minutes with a standard deviation of 1.7 minutes. If the distribution of times is approximately mound-shaped, what proportion of workers will take longer than 16.2 minutes to complete the task?
• 95% between 9.4 and 16.2
• 47.5% between 12.8 and 16.2
• (50 – 47.5)% = 2.5% above 16.2
[Figure: mound-shaped curve with areas .475 and .475 on either side of the mean and .025 in the upper tail]

Approximating s
• From Tchebysheff's Theorem and the Empirical Rule, we know that R ≈ 4-6 s
• To approximate the standard deviation of a set of measurements, we can use: s ≈ R / 4, or s ≈ R / 6 for a large data set.
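
The arithmetic in the worker example can be traced step by step. This sketch assumes, as the example does, a mound-shaped and roughly symmetric distribution so that the area outside the interval splits evenly between the two tails; the variable names are illustrative.

```python
mean_time = 12.8   # minutes
sd_time = 1.7      # minutes
cutoff = 16.2      # minutes

# Approximate Empirical Rule proportions within k standard deviations
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

# How many standard deviations above the mean is the cutoff?
k = round((cutoff - mean_time) / sd_time)

lower = mean_time - k * sd_time        # lower end of the interval
within = EMPIRICAL_RULE[k]             # proportion inside the interval
upper_tail = (1 - within) / 2          # symmetry assumption: half of the rest

print(f"{within:.0%} between {lower:.1f} and {cutoff}")
print(f"{upper_tail:.1%} above {cutoff}")
```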

Approximating s
The ages of 50 tenured faculty at a state university.
• 34 48 70 63 52 52 35 50 37 43 53 43 52 44
• 42 31 36 48 43 26 58 62 49 34 48 53 39 45
• 34 59 34 66 40 59 36 41 35 36 62 34 38 28
• 43 50 30 43 32 44 58 53
R = 70 – 26 = 44
s ≈ R / 4 = 44 / 4 = 11
Actual s = 10.73

Measures of Relative Standing
• Where does one particular measurement stand in relation to the other measurements in the data set?
• How many standard deviations away from the mean does the measurement lie? This is measured by the z-score.
$z\text{-score} = \frac{x - \bar{x}}{s}$
• Suppose s = 2, $\bar{x} = 5$ and x = 9. Then x = 9 lies z = 2 std dev from the mean.
[Figure: number line showing x = 9 a distance of 4 = 2s above $\bar{x} = 5$]
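
The range approximation for the faculty ages can be compared with the computed sample standard deviation. This is a sketch using the standard `statistics` module; the variable names are illustrative.

```python
from statistics import stdev

ages = [
    34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
    42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
    34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
    43, 50, 30, 43, 32, 44, 58, 53,
]

age_range = max(ages) - min(ages)   # R = largest - smallest
approx_s = age_range / 4            # rough approximation s = R / 4
actual_s = stdev(ages)              # sample standard deviation (divisor n - 1)

print(f"R = {age_range}, R/4 = {approx_s}, actual s = {actual_s:.2f}")
```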

z-Scores
• From Tchebysheff's Theorem and the Empirical Rule
– At least 3/4 and more likely 95% of measurements lie within 2 standard deviations of the mean.
– At least 8/9 and more likely 99.7% of measurements lie within 3 standard deviations of the mean.
• z-scores between –2 and 2 are not unusual. z-scores should not be more than 3 in absolute value. z-scores larger than 3 in absolute value would indicate a possible outlier.
[Figure: z scale from -3 to 3; between -2 and 2: not unusual; between 2 and 3 in absolute value: somewhat unusual; beyond 3 in absolute value: outlier]

Measures of Relative Standing
• How many measurements lie below the measurement of interest? This is measured by the pth percentile.
[Figure: distribution of x with p% of the measurements below the p-th percentile and (100-p)% above]
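
The z-score guidelines above can be expressed as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and how to label values of exactly 2 or 3 is a choice made here, since the slides do not say.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def describe_z(z):
    """Label a z-score using the guidelines from the slide."""
    size = abs(z)
    if size <= 2:               # assumed: exactly 2 still counts as not unusual
        return "not unusual"
    if size <= 3:               # assumed: exactly 3 counts as somewhat unusual
        return "somewhat unusual"
    return "possible outlier"


# The slide's example: x = 9 with mean 5 and s = 2
z = z_score(9, 5, 2)
print(z, describe_z(z))
```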

Examples
• 90% of all men (16 and older) earn more than $319 per week. (Source: Bureau of Labor Statistics)
• $319 is the 10th percentile: 10% earn less, 90% earn more.
• 50th Percentile ≡ Median
• 25th Percentile ≡ Lower Quartile (Q1)
• 75th Percentile ≡ Upper Quartile (Q3)

Quartiles and the IQR
• The lower quartile (Q1) is the value of x which is larger than 25% and less than 75% of the ordered measurements.
• The upper quartile (Q3) is the value of x which is larger than 75% and less than 25% of the ordered measurements.
• The range of the “middle 50%” of the measurements is the interquartile range, IQR = Q3 – Q1
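
A sketch of computing the quartiles and the IQR for the seven measurements from the median example. It relies on `statistics.quantiles` with `method="exclusive"` (Python 3.8 or later), which locates the quartiles at positions .25(n + 1), .5(n + 1) and .75(n + 1) in the ranked data. Using that position rule for Q1 and Q3 is an assumption here; this section states it only for the median.

```python
from statistics import quantiles

measurements = [2, 4, 9, 8, 6, 5, 3]

# n=4 requests the three cut points that split the ranked data into quarters
q1, med, q3 = quantiles(measurements, n=4, method="exclusive")
iqr = q3 - q1   # range of the "middle 50%"

print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
```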

Constructing a Box Plot
• Calculate Q1, the median, Q3 and IQR.
• Draw a horizontal line to represent the scale of measurement.
• Draw a box using Q1, the median, Q3.
• Isolate outliers by calculating
– Lower fence: Q1 – 1.5 IQR
– Upper fence: Q3 + 1.5 IQR
• Measurements beyond the upper or lower fence are outliers and are marked (*).
[Figure: box from Q1 to Q3 with a line at the median m; outliers marked * beyond the fences]
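
The fence calculation can be sketched as follows. The data set is made up for illustration (the seven measurements from the median example plus one extreme value), the names are hypothetical, and the quartiles use the .25(n + 1) and .75(n + 1) position rule via `statistics.quantiles(..., method="exclusive")`, which is an assumption rather than something this section spells out.

```python
from statistics import quantiles

# Hypothetical data: the earlier seven measurements plus one extreme value
data = [2, 3, 4, 5, 6, 8, 9, 25]

q1, med, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Measurements beyond either fence are flagged as outliers (marked * on the plot)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the smallest and largest measurements that are not outliers
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1 = {q1}, m = {med}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}, outliers: {outliers}")
print(f"whiskers: {whisker_low} to {whisker_high}")
```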

Key Concepts
I. Measures of Center
1. Arithmetic mean (mean) or average
a. Population: µ
b. Sample of size n: $\bar{x} = \frac{\sum x_i}{n}$
2. Median: position of the median = .5(n + 1)
3. Mode
4. The median may be preferred to the mean if the data are highly skewed.
II. Measures of Variability
1. Range: R = largest − smallest
2. Variance
a. Population of N measurements: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
b. Sample of n measurements: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$
3. Standard deviation
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
4. A rough approximation for s can be calculated as s ≈ R / 4. The divisor can be adjusted depending on the sample size.
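
The two forms of the sample variance in item II.2.b give the same value. The sketch below checks this on the petal counts from the range example; the function names are illustrative.

```python
from math import sqrt


def variance_definition(xs):
    """s^2 from the definition: sum of squared deviations over n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """s^2 from the computing formula: (sum x^2 - (sum x)^2 / n) over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


petals = [5, 12, 6, 8, 14]
s2_definition = variance_definition(petals)
s2_shortcut = variance_shortcut(petals)
s = sqrt(s2_definition)   # sample standard deviation

print(s2_definition, s2_shortcut, round(s, 3))
```

For comparison, the rough approximation R / 4 = 9 / 4 = 2.25 is well below the computed s for this very small sample, in line with the note that the divisor can be adjusted depending on the sample size.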
Key Concepts
3. Upper and lower fences are used to find outliers.
a. Lower fence: Q1 − 1.5(IQR)
b. Upper fence: Q3 + 1.5(IQR)
4. Whiskers are connected to the smallest and largest
measurements that are not outliers.
5. Skewed distributions usually have a long whisker in
the direction of the skewness, and the median line is
drawn away from the direction of the skewness.