Essentials of Statistics
Rao Jammalamadaka
University of California, Santa Barbara
Preface
Statistical ideas have become indispensable not only for doing and understanding scien-
tific research but just for being well-informed citizens. We deal with uncertainties around
us in everyday life from weather forecasting to stock-market gyrations and are bombarded
with polls and advertisements containing claims and counter-claims. So a certain level of
“Statistical literacy” is desirable for all of us. The need for such statistical literacy in this
modern age was foreseen by H.G. Wells, who said “Statistical thinking will one day be as
necessary for efficient citizenship as the ability to read and write” and that day, we think, is
already upon us!
This book attempts to cover basic ideas of statistics in a direct and succinct way. The
goal is to help develop familiarity with statistical concepts among students and others with
varied backgrounds. Minimal mathematical background is assumed and the emphasis is on
understanding concepts and how they apply to data. Statistical ideas are explained and then
illustrated with one or two simple examples. We use practical and realistic examples in many
places, and believe a statistical idea is more easily explained through a simple illustrative
example.
Besides teaching basic statistics, this book can also serve students and novices as an
introduction to modern computational packages that are being commonly used nowadays
for statistical analysis, namely Python and R. Two Appendices at the end of the book
provide a basic introduction to these two packages, and a follow-up Appendix demonstrates
their use by working out sample exercises taken from the book. Further introduction to
these packages comes in the form of worked-out Examples in various chapters throughout
the book. Clearly many more complex data analyses can be done using Python and R, and
we restrict ourselves to their use in connection with the basic topics covered in this book on
“Essential Statistics”.
Contents
1 What is Statistics? 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
3 Probability ideas 39
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4 Random variables 53
6 Sampling distributions 91
B Introduction to R 214
D Python and R Code for Answering Selected Exercises from the Book 223
E Tables 242
List of Figures
3.1 Venn Diagram for two events A and B with no common outcomes . . . . . 41
8.2 Rejection regions for testing H0 : µ = µ0 with fixed significance level α. . . . 145
10.3 Scatter plot of savings vs. income and the best fitting line . . . . . . . . . . 189
Chapter 1
What is Statistics?
1.1 Introduction
The word “statistics” is used primarily in two different contexts. First, to refer to a collection
of numbers or data, as in “football statistics” and “crime statistics” and secondly, when we
refer to the subject of statistics, which we are about to study.
The subject of “statistics” may be called the “science of data” and broadly described
as “a body of methods for collecting, summarizing, analyzing, and interpreting data”.
Each of these aspects of statistics, simple as it may seem, can be discussed in considerable
detail and at a highly sophisticated mathematical level. We will avoid both the detail and
the mathematics and try to provide a brief glimpse at statistics, focusing on some essential
basic techniques.
We now amplify a bit on each of these aspects, namely collecting, summarizing, analyzing
and interpreting data, thus providing a bird’s eye-view of the subject.
I. Collecting data
Collecting accurate and representative data in the most economical way is the first
step in statistics. How to collect data is the subject of two specialized branches of
statistics, called “Sampling Techniques” and “Design of Experiments”. Sampling techniques
or methods deal with collecting data in the real world as it exists, namely through opinion
polls, cross-sectional studies, surveys dealing with political and social issues, etc. Sampling
is an essential part of everyday life and we return to it in the next section. The object of
Design of Experiments, on the other hand, is to design data collection in a more controlled
setting in order to answer specific scientific questions, as in agricultural or clinical trials. The
goal in either case is to maximize the “information” obtained for a given amount of money
or conversely, to minimize the cost for attaining a given level of precision. To illustrate this
point through a very simple example, consider two objects whose unknown weights say θ1
and θ2 we need to determine, using a simple balance. Since a measurement always involves
error and can only be considered as an “estimate”, one very simple approach is to take one
measurement on each one of these objects, thus getting two separate “estimates”. An alter-
nate procedure (which may not make much sense unless you think more about it), is to take
the same two measurements, costing the same amount of time and money, but this time on
the “total weight” and the “difference of weights” of the two objects. From these, it is
easy to figure out the individual weights θ1 and θ2 . It can be shown (and we will see later)
that the second approach, which costs the same time or effort, gives estimates of θ1 and θ2 ,
which are 100% more accurate than the first! This is a consequence of the fact that averages
tend to have higher precision, a fact we will learn soon. Thus, designing a proper experiment
is very crucial and different methods of collecting information can be quite different in terms
of their efficiency.
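The claim that the second weighing design gives estimates that are 100% more accurate can be checked with a short simulation. The sketch below is illustrative and not from the book; it assumes the measurement errors are independent with a common standard deviation. From the measured sum S and difference D, the first weight is recovered as (S + D)/2, and its variance works out to half that of a single direct measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
theta1, theta2, sigma = 10.0, 4.0, 1.0  # hypothetical true weights and error SD
reps = 200_000

# Design 1: one direct measurement on the first object.
est1_direct = theta1 + rng.normal(0, sigma, reps)

# Design 2: measure the total (S) and the difference (D),
# then recover theta1 as (S + D) / 2.
S = (theta1 + theta2) + rng.normal(0, sigma, reps)
D = (theta1 - theta2) + rng.normal(0, sigma, reps)
est1_design = (S + D) / 2

var_direct = est1_direct.var()
var_design = est1_design.var()
print(var_direct, var_design)  # the second variance is about half the first
```

The halving occurs because (S + D)/2 is in effect an average of two measurements, and averages have higher precision, as noted above.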
II. Summarizing
Summarizing data is the next essential step before we can make sense out of any large
set of numbers i.e., to understand what the data is about. Suppose we are given the birth-
weights of say, 1000 babies born in a hospital in a particular year. Such a large set of numbers
by themselves would not make much sense even to an expert eye. If they are appropriately
summarized, either graphically or numerically, we can figure out some essential features
of the data, such as where the center is and how much spread there is around that center.
You have no doubt heard the saying “A picture is worth a thousand numbers”! Providing
graphical and numerical summaries of data (often sample data from a larger population), is
called Descriptive statistics. Surprisingly, in some cases we find that under appropriate
assumptions, one or two numerical summaries is all that we need to make inferences about
the larger population in place of the original set of 1000 numbers. In the next chapter, we
talk about “graphical summaries” such as the bar charts, pie charts, histograms etc. as well
as “numerical summaries” such as the mean, median, range, standard deviation to measure
certain features of data sets, like the center and spread.
1.2 Population, sample and inference

It can be argued that analyzing and interpreting data is the heart of statistics. This is also
sometimes called inferential statistics or statistical inference, i.e., drawing conclusions
or inferences about the “population” after observing only a subset – a “sample” from it.
Most often the data statisticians collect is a sample, i.e., a randomly selected (and hence
representative) subset of the population. We use the word population to refer to the
“entirety of data one can collect on a topic of interest”. For instance, if one is interested
in the “average family income” in the US, the population here consists of incomes of every
family in the US – over 100 million numbers (after one tackles such non-trivial questions as
to what we mean by a “family” and what we mean by “family income” etc.). Suppose now
we are interested in the “television rating” for a particular TV show in a given week; the word
population then refers to all the data on time spent by each of the potential viewers watching
this show. As a third example, consider the situation where one is interested in figuring out
the “chance of heads” for a given coin. The entirety of data here refers to the results of all
possible tosses with this coin. Since the coin can be flipped forever to obtain more and more
data about this coin, the population is theoretically infinite.
In all these cases, a sensible thing to do is to take a representative sample from such a
population and try to interpret what this sample has to say, regarding the population. This
type of generalization, i.e., drawing conclusions about the population from the observed
sample, is called “statistical inference”. Statistical inference is a fundamental ingredient in
most scientific advances because we can not wait in most cases for “all the data to be in”. In
some cases, it is patently unwise or impossible to collect all the data - like in the coin-tossing
example. Think what would happen if a manufacturer wanted to determine the “lifetime”
of a certain brand of electric bulb and started to burn each of the bulbs manufactured in
order to get the “totality of data” on the lifetimes. Not good for business since there would
be nothing left to sell!
Population characteristics (like its center or spread) are called parameters and are de-
noted typically by Greek letters such as µ and σ. On the other hand, a sample characteristic,
which is computed based on the sample values, is called a statistic (yet another meaning
for the word “statistics” — as the plural of the word “statistic”) and is generally denoted
by letters of the English alphabet, using symbols like x̄ and s. Statistical inference typically proceeds
through “estimating” or sometimes “testing hypotheses” i.e., statements about the unknown
parameters, using sample based values, i.e., the statistics. Sometimes, the populations may
not be characterized through a few parameters and the inference for such populations, called
“nonparametric inference”, is not discussed in this book.
Of course, when one tries to generalize from an observed sample to the (incompletely
observed) population, there are bound to be pitfalls. One can never be sure that the conclu-
sions are quite right and such generalizations may involve errors. The beauty of statistics is
that it allows us to quantify these errors and control them as desired. Since typically a larger
sample size provides more information about the population, choosing an appropriately large
enough sample size is one way to reduce the error. We will come back to this idea of choosing
an appropriate sample size, in later chapters.
Consider the following simple example to illustrate the process of inference. In estimating
the “chance of heads”, recall that the coin can be tossed indefinitely. We are forced to stop
at some finite point, say after 100 tosses. This is a sample of size 100 and suppose the data
collected looks like this (with H and T standing for Heads and Tails respectively):
H, H, T, H,....., T.
Suppose there are altogether 46 heads (and the other 54 are tails) in this sample of 100
tosses. It appears reasonable to declare the observed proportion of heads in our sample,
namely (46/100) = 0.46 as our best guess or estimate of the “chance of heads”. How certain
are we? Of course there is no guarantee that if we tossed the same coin another 100 times,
we will again get 46 heads and 54 tails. This kind of variation from sample to sample is
referred to later on as sampling variability. How different can that other estimate be, from
0.46? If we can declare something like, “Well, we have 0.46 for our estimate but if we repeat
this again, 90% of the time it will be within 0.08 of 0.46”, i.e., the error in our estimate is
no more than 0.08, with 90% chance – that would be somewhat reassuring. This is the type
of error we are talking about measuring. This type of measurement of error is based on
ideas of probability, which we explore in Chapter 3.
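The “90% of the time it will be within 0.08” statement can be illustrated with a quick simulation, a hypothetical sketch not taken from the book: assume the true chance of heads is 0.46 and repeat the 100-toss experiment many times, recording how often the estimated proportion lands within 0.08 of 0.46.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.46, 100, 100_000  # assumed true chance, tosses, repetitions

# Each entry of heads is the head count from one 100-toss experiment.
heads = rng.binomial(n, p, size=reps)

# Fraction of repetitions whose estimate falls within 0.08 of 0.46,
# i.e. whose head count is within 8 of 46.
within = np.mean(np.abs(heads - 46) <= 8)
print(within)  # close to 0.9
```

The simulated fraction comes out close to 90%, matching the kind of probability statement described above.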
EXERCISES

1.1 Criticize the following statements:
(a) The average height of the family is 5.5 feet whereas the average depth of the
stream is only 4 feet. So the family can safely walk across the stream.
(b) Statistics is like accounting since both deal with numbers.
(c) Most of the people who wrote the company had nothing but praise for its product.
So the product must be good.
(d) Almost 1,000 people died of a deadly disease in spite of being vaccinated for it.
So the vaccination must be useless.
1.2 Define what the population is in each of the following cases and how you might draw
a sample:
1.4 Find a topic of statistical inference in today’s paper and report on it critically. Does
the “stock table” giving price changes of stocks represent statistics? If so, in what
sense?
1.5 Market research is often done by telephoning people randomly selected from a telephone
directory. This clearly eliminates people who have no phones or have unlisted numbers.
Can you think of any ways to include one or both these groups in a survey?
1.6 How representative is the sample and how valid is the inductive reasoning in the fol-
lowing examples?
(a) A young lady has been jilted by 3 boyfriends, all of them above 5 feet 6 inches
and she vows never to date guys above 5 feet 6 inches.
(b) A doctor takes a small amount of blood from you and concludes your blood sugar
is normal.
(c) You buy a sack of potatoes after closely inspecting a few that you can actually
see.
1.7 Explain why the following poll which was actually advertised in a major newspaper,
might not be unbiased: “Should handgun control be tougher? You call the shots in
a special call-in poll tonight. If yes call 1-900-720-6181. If no, call 1-900-720-6182.
Charge is $1.00 for the first minute.”
1.8 It is known that 1 in 12 industrial workers has a drinking problem. So the supervisors
are warned “If you have 12 people working for you, one of them has a drinking problem.
If you do not believe it, you are kidding yourself!”. Criticize the conclusion.
Chapter 2
Descriptive Statistics: Graphical and Numerical Summaries
As the British physicist and mathematician Lord Kelvin said in 1883: “.... when you can
not measure it, when you can not express it in numbers, your knowledge is of a meager and
unsatisfactory kind ... scarcely advanced to the stage of science”. And since repeated mea-
surements even under identical conditions vary, Galileo (1564-1642) emphasized this further
by saying: “Measure, measure, measure. Measure again and again to find out the difference
and the difference of the difference”. This is what statisticians really do — measure things
and repeat these measurements, in order to figure out the magnitude of the characteristic of
interest, as well as its variability or the measurement error.
We saw that a “population” refers to all possible data on a given topic of interest. A
particular member of this population will be referred to as an “individual”, on whom we
make the measurement(s). Such measurements can be “qualitative” (or attributes) like
the eye-color, marital status etc. or be “quantitative” — taking on numerical values, like
the height, the family income etc. More systematically, one can define any measurement as
being on 4 levels or scales, each successively more precise than the preceding. The first
two levels of measurement (Nominal and Ordinal) lead to “Qualitative” variables whereas
the last two (Interval and Ratio) lead to “Quantitative” variables.
(i) Nominal scale: This is the most basic level of measurement where an individual is
classified into a group depending on whether or not they possess certain attributes
or characteristics. For example, “gender” may be measured as “male” or “female”,
while “marital status” is measured as “single”, “married”, “divorced”, etc. There is no
natural order or ranking to these nominal classes, i.e., we can not say whether male or
female is a better or higher category.
(ii) Ordinal scale: Here, not only do we have a label or name for each category as in
the nominal scale, but also an order attached to them. For example, we may rate a
product as “excellent”, “very good”, “good”, “poor” or when a student is given a grade
A, B, C, D and F. In the latter example, supposedly, an A is better than a B and so
on!
(iii) Interval scale: Here the measurements can be quantified with numerical values which
bear an order relationship. In addition, arithmetic differences are meaningful and in-
tervals of equal width signify equal differences in the characteristic as e.g., temperature
measured in degrees or the scores on a test. However, a zero on this scale does not
signify the absence of the characteristic i.e., there is no “absolute zero”. For instance,
when the temperature is zero, it does not mean there is no temperature.
(iv) Ratio scale: This is a level of measurement which has all the properties of the inter-
val scale and in addition, possesses an “absolute zero”, signifying the absence of the
characteristic. For instance things like height and weight belong to the ratio scale and
taking ratios makes sense here, unlike things measured on interval scale. One who is
6 feet tall is twice as tall as one who is only 3 feet tall. On the other hand, we could
not say that a city where the temperature is 60 ◦ F on a given day is twice as hot as
one where the temperature is 30 ◦ F!
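The temperature example can be made concrete with a small computation (illustrative, not from the text): converting to the Kelvin scale, which does possess an absolute zero, shows that 60 ◦F is only about 6% “hotter” than 30 ◦F, nowhere near twice as hot.

```python
def fahrenheit_to_kelvin(f):
    # Kelvin has an absolute zero, so ratios of Kelvin values are meaningful.
    return (f - 32) * 5 / 9 + 273.15

ratio_f = 60 / 30  # naive ratio taken on the interval (Fahrenheit) scale
ratio_k = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)
print(ratio_f, round(ratio_k, 2))  # 2.0 versus about 1.06
```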
2.2 Graphical Summaries

Qualitative variables can be displayed graphically by bar charts, pie charts etc.
A Bar Chart is a simple graphical display in which the length of each bar represents
the frequency in each category. To draw one:
(i) On the horizontal axis, mark the labels of the various categories,
(ii) On the vertical axis, mark the frequency of each of these categories using an appropriate
scale,
(iii) Place a bar (or rectangle) above each category label with the height corresponding to
the frequency. Base widths of these bars should be equal.
We now give the instructions needed to generate the same bar chart using R or Python.
Before using either of these software packages, the reader is referred to Appendices B and C,
to get familiar with the general introduction given there to these two popular packages.
R code instruction:
A bar chart can be generated in R by the built-in function with the syntax:

barplot(height, names.arg = , main = )

(Note that “height” is the name of the data set, “main=” defines the title for the chart,
and “names.arg=” gives the names for each bar.)
Example 2.1: For purposes of national mortality statistics, every death is attributed to one
underlying condition. According to the National Center for Health Statistics (U.S. Dept. of
Health and Human Services), the 6 leading causes of death in 1995 are as shown in the table
below, along with a bar chart in Figure 2.1:
12 Descriptive Statistics: Graphical and Numerical Summaries
R code:

dataset = c(738, 538, 158, 103, 93, 682)
barplot(height=dataset, main="Bar Chart of causes of death",
        names.arg=c("Heart", "Cancer", "Stroke", "Pulmonary", "Accidents", "Others"))
Python code:

import matplotlib.pyplot as plt
dataset = (738, 538, 158, 103, 93, 682)
name = ('Heart', 'Cancer', 'Stroke', 'Pulmonary', 'Accidents', 'Others')
plt.bar(name, dataset)
plt.title('Bar Chart of causes of death')
plt.show()
2.2 Graphical Summaries 13
A Pie Chart on the other hand is one in which the total “pie” is divided into slices
proportional to the number of observations (frequency) in each category. Its main advantage
is in letting the eye see how the total count is divided up into the different categories. To
draw a pie chart:
(i) Find the relative frequency (or percentage) corresponding to each category (e.g., heart
disease has 738/2312 = 0.3192 of the total),
(ii) Find the angle corresponding to each of the categories, keeping in mind that there are
360 degrees in all, at the center. If f is the frequency of a particular category and
n is the total number of observations, then the corresponding angle in the pie
chart is (f /n) × 360 degrees (e.g., heart disease should correspond to an angle of 0.3192 × 360 ≈
115 degrees). A pie chart for this data is given in Figure 2.2.
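The computation in steps (i) and (ii) can be sketched in Python for the mortality data of Example 2.1 (the code is an illustration, not part of the text):

```python
dataset = [738, 538, 158, 103, 93, 682]   # frequencies from Example 2.1
labels = ["Heart", "Cancer", "Stroke", "Pulmonary", "Accidents", "Others"]

n = sum(dataset)                           # total number of observations
angles = [f / n * 360 for f in dataset]    # (f/n) x 360 degrees per slice

for lab, ang in zip(labels, angles):
    print(f"{lab}: {ang:.1f} degrees")
# Heart disease gets about 115 degrees, and the angles add up to 360.
```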
[Figure 2.2: Pie chart of the causes of death, with slices for Heart, Cancer, Stroke, Pulmonary, Accidents and Others.]
R code instruction:
Similarly, a pie chart can be plotted in R by using the built-in function with the command:

pie(x, labels = names(x), radius = k, main = )

(Note that x represents the data set, “radius = k” defines the size of the pie chart where k is
a constant, and “labels” gives the names of each category.)
R code:

pie(dataset, radius=1.0, labels=c("Heart", "Cancer", "Stroke", "Pulmonary", "Accidents", "Others"),
    main="Pie Chart of causes of death")
Python code:

import matplotlib.pyplot as plt
# dataset and name are as defined in the bar chart example above
colors = ('r', 'y', 'g', 'b', 'grey', 'purple')
plt.pie(dataset, labels=name, colors=colors, radius=1.0)
plt.title('Pie Chart of causes of death')
plt.show()
Quantitative variables on the other hand, can be represented in many ways. We will
describe just two basic graphical methods — histograms and stem-and-leaf plots.
A good way to summarize and make sense of a large set of numbers is to form a frequency
distribution which tells us where the values are and how frequently they occur. To form a
frequency table (or frequency distribution table), we proceed as follows:
(i) Locate the minimum and maximum values among the data.
(ii) Break this range of values into a small number of groups — called “class intervals” or
“bins”.
(iii) Find the frequency in each bin i.e., how many data points fall into each of these class
intervals.
2.2 Graphical Summaries 15
Remark 2.1 An important question here is “how many bins should one use?” Any frequency
table, while it provides a more easily understandable summary, involves loss of information
since we can not reconstruct the original values of the data from the frequencies in the
bins. The smaller the number of bins (with large bin-widths), the more such loss of information.
On the other hand, it is not good to have too many bins either since this will mean not
enough summarization has taken place. Typically 5-15 bins are reasonable depending on the
size of the data. In terms of easy interpretation, it is preferable to have equal bin-widths
although it is not always practical, especially when the distribution has long tails with very
few observations there.
We can now construct what is known as a histogram from such a frequency table. A
histogram is just a bar chart with the classes (bins) on the x-axis, with the corresponding
frequencies represented on the y-axis. Note that by looking at a histogram, one can judge
where the center is and how much spread there is around that center. Also one can observe
the “shape” of the histogram i.e., whether it is symmetric or asymmetric. If it is
asymmetric and the longer tail is on the right hand side, the distribution is called “positively
skewed” or “right skewed”, whereas if the longer tail is on the left hand side, it is called
“negatively skewed” or “left skewed”.
Sometimes, we are interested in the proportions of values that fall into each bin
(called “relative frequencies”) instead of the frequencies. This is called the relative frequency
distribution, from which we can construct a relative frequency histogram. The relative
frequencies, of course, add up to 1. Note that the shape of a relative frequency histogram
is exactly the same as the corresponding histogram, only the scale on the y-axis is different.
For instance, the relative frequency histogram for the website data (Example 2.2 below) will
look exactly the same as Figure 2.3 except on the y-axis, the values 5, 10, 15, 20 are replaced
by 0.1, 0.2, 0.3 and 0.4 respectively.
R code instruction:
In general, a frequency table is necessary as preparation for plotting a histogram. In R,
however, a histogram can be generated without first forming a frequency table, by using
the built-in function:

hist(x, xlab = , ylab = , main = )

(Note: x is still the data set, xlab gives the label for the x-axis, and ylab is frequency by default.)
Python code instruction:

plt.hist( )
Example 2.2: The following data give the number of “hits” per day for the website of a
statistics course, over a 50-day period:
20, 14, 21, 29, 43, 17, 15, 26, 8, 14, 39, 23, 16, 46, 28, 11, 26, 35, 26, 28,
22, 30, 17, 23, 9, 27, 18, 22, 19, 25, 31, 55, 63, 52, 16, 13, 23, 33, 43, 49,
25, 32, 26, 51, 39, 42, 55, 41, 36, 32.
Solution: By selecting class-intervals of width 10 units each, we get the following frequency
table. The last column in the table represents “relative frequencies” i.e., frequencies in each
class-interval divided by the total frequency which is 50 in this case.
Class interval   Frequency   Relative frequency
0-9              2           0.04
10-19            11          0.22
20-29            17          0.34
30-39            9           0.18
40-49            6           0.12
50-59            4           0.08
60-69            1           0.02
Total            50          1.00
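The frequency table above can be reproduced in Python with numpy.histogram, using bin edges at 0, 10, ..., 70 so that a class such as 0-9 corresponds to the half-open interval [0, 10). This sketch is illustrative, not part of the text:

```python
import numpy as np

hits = [20, 14, 21, 29, 43, 17, 15, 26, 8, 14, 39, 23, 16, 46, 28, 11,
        26, 35, 26, 28, 22, 30, 17, 23, 9, 27, 18, 22, 19, 25, 31, 55,
        63, 52, 16, 13, 23, 33, 43, 49, 25, 32, 26, 51, 39, 42, 55, 41,
        36, 32]

# Bin edges 0, 10, ..., 70: class 0-9 is [0, 10), class 10-19 is [10, 20), etc.
freq, edges = np.histogram(hits, bins=range(0, 80, 10))
rel_freq = freq / len(hits)
print(freq.tolist())      # [2, 11, 17, 9, 6, 4, 1]
print(rel_freq.tolist())  # relative frequencies summing to 1
```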
[Figure 2.3: Histogram of the website hits; x-axis: score, y-axis: frequency.]
R code:

dataset1 = c(20, 14, 21, 29, 43, 17, 15, 26, 8, 14, 39, 23, 16,
             46, 28, 11, 26, 35, 26, 28, 22, 30, 17, 23, 9, 27,
             18, 22, 19, 25, 31, 55, 63, 52, 16, 13, 23, 33, 43,
             49, 25, 32, 26, 51, 39, 42, 55, 41, 36, 32)
hist(dataset1, main="Histogram of website hits", xlab="score")
Python code:

import matplotlib.pyplot as plt
dataset = (20, 14, 21, 29, 43, 17, 15, 26, 8, 14, 39, 23, 16, 46, 28, 11, 26, 35,
           26, 28, 22, 30, 17, 23, 9, 27, 18, 22, 19, 25, 31, 55, 63, 52, 16, 13, 23,
           33, 43, 49, 25, 32, 26, 51, 39, 42, 55, 41, 36, 32)
plt.hist(dataset, color='grey')
plt.title('Histogram of website hits')
plt.xlabel('score')
plt.show()
R code instruction:
There is also a built-in function for the stem-and-leaf plot:

stem(x, scale = n, width = t)

(Note: Unlike the other graphs, stem-and-leaf diagrams do not carry a title since “main=” is
not a part of this function. x is still the data set, “scale=n” controls the plot length, and
“width=t” is the desired width of the plot.)
Example 2.3: Consider the following set of 25 values:

92, 88, 74, 83, 86, 64, 82, 85, 80, 66, 83, 98, 77, 69, 61, 57, 78, 86, 90, 81, 87, 79, 62,
89, 72.

The corresponding stem-and-leaf plot (stems = tens digits, leaves = units digits) is:

5 | 7
6 | 1 2 4 6 9
7 | 2 4 7 8 9
8 | 0 1 2 3 3 5 6 6 7 8 9
9 | 0 2 8
When a stem has many leaves, it is sometimes useful to split each
stem into two, once with the low leaves (the lower half) and once with the high leaves (the
upper half). For Example 2.3, this gives:
5 | 7
6 | 1 2 4
6 | 6 9
7 | 2 4
7 | 7 8 9
8 | 0 1 2 3 3
8 | 5 6 6 7 8 9
9 | 0 2
9 | 8
R code:

dataset2 = c(92, 88, 74, 83, 86, 64, 82, 85, 80, 66, 83, 98, 77,
             69, 61, 57, 78, 86, 90, 81, 87, 79, 62, 89, 72)
stem(dataset2, scale=1, width=100)
Python code:

import matplotlib.pyplot as plt
dataset = (92, 88, 74, 83, 86, 64, 82, 85, 80, 66, 83, 98, 77, 69, 61,
           57, 78, 86, 90, 81, 87, 79, 62, 89, 72)
# Note: plt.stem draws a "stem" (lollipop) plot of the values,
# which is not the same as a stem-and-leaf display.
plt.stem(dataset)
plt.show()
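Since matplotlib has no built-in stem-and-leaf function, a display like the one shown for Example 2.3 can be built in a few lines of plain Python. The helper stem_and_leaf below is a hypothetical sketch of our own, not a library function:

```python
def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display (stems = tens digits)."""
    stems = {}
    for v in sorted(values):
        # group each value by its tens digit, keeping the units digit as a leaf
        stems.setdefault(v // 10, []).append(v % 10)
    return [f"{s} | " + " ".join(str(leaf) for leaf in leaves)
            for s, leaves in sorted(stems.items())]

scores = [92, 88, 74, 83, 86, 64, 82, 85, 80, 66, 83, 98, 77, 69, 61,
          57, 78, 86, 90, 81, 87, 79, 62, 89, 72]
for line in stem_and_leaf(scores):
    print(line)
```

The printed lines match the display given for Example 2.3 above.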
2.3 Numerical Summaries

It will be useful to have a summary measure — a single number that represents the center
of a set of values. There are several possible measures of the center and the most important
among these are: the Mean, the Median and the Mode.
We will now employ some symbols to represent our data, so we do not have to write
sets of specific numbers each time. This will simplify presentation of the ideas. The values
observed in a sample of size n may be represented as x1 , x2 , ..., xn , where xi denotes the
value of the i-th observation. Here n represents the size of the sample. Of course, in any given problem in practice,
these (x1 , x2 , ..., xn ) are specific numbers. To measure the center for such a data set, we may
use:
(i) Mean: The arithmetic mean, sometimes loosely called the average of a set of data,
is obtained by summing all the values in the data and dividing by the number of values.
In a physical sense, this corresponds to the “center of gravity” of a system where unit
weights are attached at each of the data points. We use x̄ as the standard notation for
the mean of the x values, so that:

x̄ = (x1 + x2 + · · · + xn )/n = (1/n) Σ xi ,

where Σ xi = the sum over all the observations xi , with i = 1 to n,
and n = the total number of observations.
(ii) Median: The median is the “middle value” when the data set is arranged in
increasing (or decreasing) order of magnitude. An equal number of observations are
smaller than the median as are larger than it.
If the number of observations in the sample, n, is odd, the median is the ((n + 1)/2)th
largest value.
On the other hand, if n is even, the average of the two middle values, namely the
(n/2)th and the (n/2 + 1)th, is taken as the median.
For example, if n = 21, the median is the 11th largest value, whereas if n = 20, the
median is the average of the 10th and 11th largest values. In either case, note that 10
observations are smaller than the median value and the same number of observations
are larger than the median value.
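The odd/even rule above can be written directly as a small Python function and checked against numpy.median. This is an illustrative sketch; the function name median is our own:

```python
import numpy as np

def median(values):
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th largest value
        return x[(n + 1) // 2 - 1]
    # even n: average of the (n/2)th and (n/2 + 1)th values
    return (x[n // 2 - 1] + x[n // 2]) / 2

odd_data = [8, 5, 7, 3, 7]   # n = 5: median is the 3rd largest value
even_data = [3, 5, 7, 7]     # n = 4: average of the two middle values
print(median(odd_data), median(even_data))  # 7 6.0
```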
(iii) Mode: The mode is the most frequent value in a given data set.
The mode may not always be uniquely defined since there may be no single most
frequent value.
R code instruction:
Built-in functions exist in R to find the mean and median, with the syntax:

mean( )
median( )

Even though there is no built-in function in R for finding the mode, the function mfv in
the statip package gives the value of the mode. The syntax is:

install.packages("statip")
library(statip)
mfv( )
Python code instruction:
The NumPy library provides functions for the mean and median:

import numpy as np
np.mean( )
np.median( )

The stats module from the SciPy library is used to find the mode of a given data set, with the syntax:
22 Descriptive Statistics: Graphical and Numerical Summaries
from scipy import stats
stats.mode( )
Example 2.4: Suppose we have a data set with n = 5 observations, say the scores obtained
by a student in the 5 quizzes given to her. Suppose her scores are
8, 5, 7, 3, 7.

The mean is x̄ = (8 + 5 + 7 + 3 + 7)/5 = 30/5 = 6. To find the median, we first arrange
the values in increasing order:

3, 5, 7, 7, 8.
Since we have an odd number of observations, the median is the ((5 + 1)/2)th, i.e., the 3rd
largest observation, which in this case is 7.
Finally the mode here is 7 (most frequent value) since it occurs twice compared to other
values which occur only once.
If however n were 4, say with the data being, 3, 5, 7, 7 then the median would be the
average of the middle two values, namely 5 and 7, which is 6.
R code:

dataset = c(8, 5, 7, 3, 7)
mean(dataset)
median(dataset)
library(statip)
mfv(dataset)
> dataset=c(8,5,7,3,7)
> mean(dataset)
[1] 6
> median(dataset)
[1] 7
> library(statip)
> mfv(dataset)
[1] 7
An outlier is an observation, which is far outside the regular pattern of data. Outliers can
be due to errors in measurement and/or in recording, in which case people tend to disregard
them. However, often outliers may point towards something very important and the cause
of outliers should be investigated. The median and the mode are resistant to outliers and
are referred to as resistant or robust measures of center, whereas the arithmetic mean is
not robust, as the following simple example illustrates.
Example 2.5: Suppose in Example 2.4 with n = 5, the first observation is changed from 8
to 38. Clearly 38 is a very large or an “extreme” value compared to the rest of the numbers
— an outlier. Then for the new data set with the values ordered,
3, 5, 7, 7, 38,
we get

Mean x̄ = 12,
Median = 7,
Mode = 7.
Python code:

import numpy as np
dataset = (3, 5, 7, 7, 38)
np.mean(dataset)
np.median(dataset)
from scipy import stats
stats.mode(dataset)

Output:
12.0
7.0
ModeResult(mode=array([7]), count=array([2]))
Note that by changing a single observation, the mean x̄ jumped from 6 to 12 whereas
the median and mode did not change — illustrating that a single outlier (or a few) does not
affect the median or mode, while it can change the mean substantially.
Remark 2.2 Here are some general guidelines for choosing an appropriate measure of center:
If the data is symmetrically distributed and approximately follows a Normal or bell-shaped
curve (about which we shall talk in Chapter 5 and in later chapters), the mean x is the
preferred measure. The mean is also unambiguously defined and has some nice mathematical
properties. In fact for nice symmetrical distributions with one peak, it is easy to see that the
mean, the median and the mode coincide. However if the data comes from a decidedly skewed
distribution as can be judged from the plots of a histogram or a stem-and-leaf diagram, or
when outliers are present, we would prefer a more robust measure of the center like the
median or the mode. For instance, when talking about house-prices (or incomes), since we
would not want a few very expensive homes (or a few rich individuals) to distort the center,
one would prefer to use the “median selling price” or “median income” rather than the mean,
to represent the center.
Variability or diversity is a natural part of life, and the theory of statistics is based on such
inherent variability, which is all around us. Without it, imagine how monotonous and
dull life would be! Every individual would be an exact replica of every other, and from a
statistical point of view, we would not need any more than a sample of size one to
reveal the mysteries of the population. Fortunately, such is not the case!
To measure this variation (also called the spread, scatter or dispersion) in a given
data set, there are several possible measures one could use. We will discuss three of the
most common and important measures of spread: the Range, the Inter-Quartile
Range and the Standard Deviation.
(i) Range is the difference between the largest and the smallest values.
Clearly, the more spread out the data is, the larger the range and vice versa. Thus
range is a simple-to-calculate measure of spread.
• The 25th percentile is also called the first quartile and is denoted by Q1 . It is
also the median of the lower half of the observations, i.e., those which are to the
left of the overall median in the ordered list.
• The 75th percentile is also called the third quartile and is denoted by Q3 . It is
also the median of the top half of the observations, i.e., those which are to the
right of the overall median in the ordered list.
Note that the median is also the second quartile. The range between the Q1 and Q3 ,
which contains the middle 50 percent of the data, is another measure of spread for a
given data set and is called the Interquartile range. This is clearly a resistant or
robust measure since it ignores very large or very small values at either end.
R code instruction:
R provides built-in functions to find the IQR and the range:
IQR( )
range( )
Note: range( ) only outputs the maximum and minimum values of a given
data set. Therefore, the following command is used to find the range:
max( )-min( )
Example 2.3 (contd.): (i) Find the range and the interquartile range (IQR).
(ii) Suppose the score 79 is entered as 19 by mistake. Find the new range and IQR.
(iii) Suppose the student with the midterm score of 57 dropped out of the course later.
What is the new range and the IQR?
Solution:
57 61 62 64 66 69 72 74 77 78 79 80 81 82 83 83 85 86 86 87 88 89 90 92 98
The range = 98 - 57 = 41. The median is the 13th largest observation, which is 81.
Q1 is the median of the 12 observations to the left of the median = (69 + 72)/2 = 70.5,
Q3 is the median of the 12 observations to the right of the median = (86 + 87)/2 = 86.5.
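The hand rule above (Q1 and Q3 as medians of the lower and upper halves) can be mirrored in plain Python. This is an illustrative sketch, not the book's code; note that built-in functions such as R's IQR( ) or scipy.stats.iqr use interpolation rules that may give slightly different quartiles than this median-of-halves rule.

```python
def median(xs):
    # median of a sorted list: middle value, or average of the two middle values
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(data):
    # Q1/Q3 as medians of the lower/upper halves, excluding the overall
    # median itself when the sample size is odd
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values to the left of the median
    upper = xs[(n + 1) // 2 :]    # values to the right of the median
    return median(lower), median(upper)

scores = [57, 61, 62, 64, 66, 69, 72, 74, 77, 78, 79, 80, 81, 82, 83,
          83, 85, 86, 86, 87, 88, 89, 90, 92, 98]
q1, q3 = quartiles(scores)
print(q1, q3, q3 - q1)   # 70.5 86.5 16.0
```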
R code:
dataset=c(57,61,62,64,66,69,72,74,77,78,79,80,81,82,
83,83,85,86,86,87,88,89,90,92,98)
max(dataset)-min(dataset)
IQR(dataset)
> dataset=c(57,61,62,64,66,69,72,74,77,78,79,80,81,82,83,83,85,86,
86,87,88,89,90,92,98)
> max(dataset)-min(dataset)
[1] 41
> IQR(dataset)
[1] 16
Python code:
data = (61,64,69,74,77,78,79,80,81,82,82,82,83,83,85,86,86,86,87,88,
89,90,92,98)
max(data)-min(data)
from scipy import stats
stats.iqr(data)
The output of this Python code gives the range and the IQR:
37.0 8.0
The five values, namely (the minimum value, Q1, the median, Q3, the maximum value),
provide a useful description of where the data is and how much spread there is, and
are called the Five Point Summary.
R code instruction:
To find all five values at once, the function is:
summary( )
Boxplots
The information contained in the five-point summary can also be represented graphically
by what is called a boxplot, which draws a box of a desired width from Q1
through Q3, noting the center at the median. The box has whiskers which connect
it to the minimum value at one end and the maximum value at the other end.
R code instruction:
boxplot(X, horizontal=TRUE)
# horizontal=TRUE sets the boxplot in the horizontal display. If it is not
# specified, the boxplot will be displayed vertically by default.
Python code:
import matplotlib.pyplot as plt
plt.boxplot( )
plt.show( )
Thus the five point summary consists of these 5 numbers and is (57, 70.5, 81, 86.5, 98).
The boxplot is given in Figure 2.4.
R code:
dataset=c(57,70.5,81,86.5,98)
boxplot(dataset, horizontal=TRUE)
Remark 2.3 Although there is no formal definition of an outlier, we may consider
a data point an outlier if it is more than 1.5×IQR above Q3 or more than 1.5×IQR below Q1. If such
outliers are present in a given data set, then the whiskers are drawn only to points
within the range ( Q1 - 1.5×IQR, Q3 + 1.5×IQR ), with outliers identified by
small open circles: isolated points unconnected to the main part of the data.
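The 1.5×IQR rule of Remark 2.3 is easy to sketch in Python; here we reuse the quartiles Q1 = 70.5 and Q3 = 86.5 from Example 2.3 (carried over purely for illustration):

```python
def outlier_fences(q1, q3):
    # points outside (Q1 - 1.5*IQR, Q3 + 1.5*IQR) are flagged as outliers
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

scores = [57, 61, 62, 64, 66, 69, 72, 74, 77, 78, 79, 80, 81, 82, 83,
          83, 85, 86, 86, 87, 88, 89, 90, 92, 98]
low, high = outlier_fences(70.5, 86.5)
flagged = [x for x in scores if x < low or x > high]
print(low, high, flagged)   # 46.5 110.5 [] -- no outliers in this data
```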
Thus, averaging these deviations does not tell us anything useful about the data spread.
On the other hand, squaring these deviations makes them all positive and if we then average
these, it will result in a useful measure of spread. It turns out that it is better to divide the
sum of these squared deviations by (n − 1), rather than n, to give us what we will later call
an “unbiased” measure of the true variation. This division by (n − 1) may also be justified
by the fact that only (n − 1) of these deviations are actually independently determined, since
their total is always fixed at zero by Fact 2.1. Thus we define a very useful alternate measure
of spread namely the
Sample Variance,

s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1),

which can also be computed by the equivalent short-cut form

s² = [ Σ xᵢ² − (Σ xᵢ)²/n ] / (n − 1) = [ n Σ xᵢ² − (Σ xᵢ)² ] / [ n(n − 1) ].
If the data is more spread out from the center, then the deviations are larger and so are the
resulting values of s² and s. Thus, s is a useful measure of variation or spread in a given data
set.
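Both forms of s² give the same answer, which can be checked numerically; the sketch below uses the quiz scores 3, 5, 7, 7, 8 of Example 2.4 (chosen here purely for illustration):

```python
data = [3, 5, 7, 7, 8]
n = len(data)
xbar = sum(data) / n
# definitional form: sum of squared deviations, divided by (n - 1)
s2_def = sum((x - xbar) ** 2 for x in data) / (n - 1)
# short-cut form: (n * sum of squares - (sum)^2) / (n * (n - 1))
s2_short = (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))
print(s2_def, s2_short)   # 4.0 4.0
```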
Fact 2.2: Since we are adding the squares of deviations, it is clear that
s² ≥ 0 for all data sets,
and
s² = 0 if and only if there is no variation in the data,
i.e., all the values are the same.
Sample Standard Deviation (SD), s = √variance = √s².
Notice that taking the square root gets us back to the original units. For instance, if the
weights are measured in lbs., s² is in (lbs.²) while s brings it back to the original units, viz.,
lbs.
R code instruction:
sd( )
# function for calculating the sample standard deviation
var( )
# function for calculating the sample variance
In this example, the “deviations from the mean”, i.e. (xᵢ − x̄), are −3, −1, 1, 1, 2,
and their sum is zero as claimed in Fact 2.1. However the sample variance
s² = (9 + 1 + 1 + 1 + 4)/(5 − 1) = 16/4 = 4.
Thus the standard deviation, which is the square root of the variance, is 2. In this
example it is easy to see that the SD is zero if and only if all the 5 observations are the same,
namely 6, 6, 6, 6, 6 i.e., when data has no variation.
Python code:
data = (3,5,7,7,8)
import statistics
statistics.variance(data)
statistics.stdev(data)
Remark 2.4 If we ever needed to reconstruct the original data from a frequency table,
for instance to find the average or the sample variance, it is a reasonable approximation to
assume that all the observations falling inside a bin have their values at the “midpoint”
of that class-interval. Thus, in the above formulas for the mean and standard deviation,
replace (Σ xᵢ) by (Σ fᵢmᵢ) and (Σ xᵢ²) by (Σ fᵢmᵢ²), where mᵢ is the mid-point of class
interval i and fᵢ is the frequency of that interval.
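Remark 2.4's midpoint approximation can be sketched as follows; the frequency table here is hypothetical, made up only to illustrate substituting Σfᵢmᵢ and Σfᵢmᵢ² into the formulas:

```python
# hypothetical frequency table: class midpoints m_i with frequencies f_i
mids  = [50, 60, 70, 80, 90]
freqs = [3, 5, 8, 6, 3]

n = sum(freqs)
# grouped mean: sum of f_i * m_i, divided by n
mean = sum(f * m for f, m in zip(freqs, mids)) / n
# grouped variance via the short-cut formula, with f_i * m_i^2 in place of x_i^2
var = (sum(f * m * m for f, m in zip(freqs, mids)) - n * mean ** 2) / (n - 1)
print(round(mean, 2), round(var, 2))   # 70.4 145.67
```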
Remark 2.5 Most calculators have a button for σn−1 . This is the s with the denominator
(n − 1), that we want.
In general, a large part of statistical inference is based on the so-called Normal Distribution
(which we discuss in Chapter 5), and then the preference is to use the mean x̄ as the
measure of center and s² (or s) as the measure of spread. Although it is beyond the scope
of this book, these statistics (x̄, s) are called “sufficient statistics” for such data, i.e., they
contain “all” the information about the model that the entire data possesses. Also, (x̄, s)
are clearly and unambiguously defined algebraically. On the other hand, in dealing with
distributions which have longer or heavier tails, and when outliers are present, more robust
measures like the median and the IQR are preferred to x̄ and s.
EXERCISES
2.1 Determine the level/scale of measurement for each of the following variables:
2.2 A Statistics class consisted of the following majors (with the corresponding number of
students):
Communication 36
Biology 15
Economics 22
Psychology 32
Undeclared 40
What is the variable being measured and what scale is it measured in? Plot a bar
graph and a pie chart representing this data.
2.3 (a) What distinguishes measurements that can be made in “Ratio scale” versus those
that can only be made in “Interval scale”?
(b) Can a measurement be in both interval and ratio scale? Give an example.
(c) Can qualitative data be in interval scale?
(d) What scale is “the price of IBM computer stock” measured in?
2.4 Of the total of 14,526 domestic freshman admissions at the University of Califor-
nia, Santa Barbara in 1997, 140 were American Indian/Alaskan, 438 were African
American, 2,215 were Chicano/Latino, 3,075 were Asian/Pacific Islander, 7,933 were
White/Other, while 725 declined to state their race. Draw a bar chart and a pie chart
representing this data.
2.5 Suppose a radar records the speeds of several automobiles, based on which the mean,
median, mode, range, IQR and s2 were computed. But on closer scrutiny, it was found
that the radar had been calibrated wrongly and all the recorded speeds are 5 miles too
fast. How does it affect each of these summary measures?
2.6 The following data represent the number of hours that 30 students devote to homework
and study in a typical week:
5, 5, 16, 7, 18, 8, 3, 5, 10, 16, 18, 26, 25, 30, 35, 5, 30, 7, 6, 11, 14, 12, 10, 17, 10, 9,
12, 13, 10, 7.
Make a split stem-and-leaf plot. From the plot, what can you say about the center,
spread and shape (symmetry) of this data?
2.7 The following data gives the weight gain (in ozs.) of 10 rabbits in a nutrition experi-
ment.
2.8 The following data gives the waiting times in minutes of 15 calls to a company’s
customer service telephone line.
10, 1, 2, 9, 5, 3, 4, 1, 0, 5, 6, 5, 3, 9, 3
(a) Find the “five-point” summary and draw a box-plot for this data.
(b) Calculate the sample mean and sample standard deviation for this data.
2.9 When you drive on a certain highway at 55 miles per hour, you pass nearly the same
number of cars as the number that pass you in the same direction. Is 55 the mean,
median or the mode of speed on that highway?
2.10 Albert Michelson (1852-1931) won a Nobel Prize in physics in 1907 for development and
use of precision optical instruments. In 1882, he obtained the following measurements
on the speed of light (in kms. per second):
yi = xi − c.
Show that the sample variance of the y-values is exactly the same as the sample
variance of the x-values, but the mean ȳ = x̄ − c.
(c) Subtract 299,000 from each of the measurements on the speed of light and recal-
culate the sample mean and variance and relate to part (b).
s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) = [ Σ xᵢ² − (Σ xᵢ)²/n ] / (n − 1)
2.13 The ratings of several football players is summarized in the following stem and leaf
plot (with colon representing the decimal):
79 : 3 7
80 : 2 3 5 5
81 : 6 6 7 8
82 : 6 6
83 : 4
84 : 8
85 :
86 : 6
2.14 A mathematics achievement test consisting of 100 questions was given to 25 sixth
grade students at Maple Elementary school. The following data shows the number of
questions answered correctly by each student:
49 61 51 57 49 55 69 88 89 55 77 51 61
68 54 67 57 63 71 77 84 79 75 65 50
(a) Prepare frequency table with class width 10, starting from 45.
2.15 A TA is looking at a student’s scores for 6 quizzes. The average of the 6 quiz scores is
9 with a standard deviation of 1. Next day the student takes Quiz # 7 and receives a
score of 9.
(a) What is the average and the standard deviation of the 7 quizzes?
(b) How well must the student do on Quiz # 8 in order to keep 9 as his average score?
2.16 A city administrator is investigating the efficiency of its Fire Department. The time,
in minutes, taken by the Fire Department to respond to 15 alarms, is given below:
1, 1, 2, 2, 6, 9, 4, 1, 7, 10, 2, 4, 5, 10, 5
(a) Find the “five-number summary” and draw a boxplot for this data.
(b) Calculate the sample mean and sample standard deviation for this data.
2.17 Find (a) mean (b) median (c) mode and (d) SD of the following 10 observations:
2.18 Draw a stem and leaf plot for the following enrollment data for a statistics class:
36 28 33 41 39 36 31 35 33 35 39 41 36
30 38 37 41 44 46 38 33 35 47 27 36
2.19 Create a stem-and-leaf display, five-number summary and boxplot for these data on
reaction times in a psychology experiment.
20.5 18.7 17.2 18.5 19.2 16.7 19.0 21.6 28.3 21.2
18.3 25.3 17.9 19.9 18.9 19.0 15.7 20.3 22.6 18.1
23.0 18.4 20.4 19.3 22.5 21.9 19.2 18.0 19.0 19.5
2.20 What are the mean, median and mode of the following distributions? Which measure
of central tendency best reflects each distribution and why?
2.22 The mean advanced mathematics achievement for male and female students in 16
countries having taken advanced mathematics in their final year of secondary school
(1994–1995) are listed in the table below. (Source: Mullis et al. (1998), Mathematics
and Science Literacy in the Final Year of Secondary School, Chestnut Hill, MA.)
Draw separate side-by-side boxplots for males and for females and comment if you
think there is any “gender gap” in mathematics.
Chapter 3
Probability ideas
3.1 Introduction
Any discussion of probability starts with what we call a random experiment. A random
experiment is one for which we know in advance the set of all possibilities or outcomes,
but we cannot predict which of these outcomes will occur in any specific trial. The set
of all possible outcomes is called the sample space. We give below a few simple examples
of random experiments and their sample spaces:
Example 3.2: Roll a die and observe the score on the top face.
Example 3.3: Throw a basketball and keep track of how many attempts it takes for the
1st successful basket.
Example 3.4: Wait for a taxi in a new town and record the time it takes for the taxi to
arrive. The outcomes are possible times t, say in seconds, and can be represented by the set
{ t : t ≥ 0 }.
We would like to define probabilities for various events. To be consistently defined, such
probability should satisfy the following 3 basic rules, called the axioms:
Rule 2. In any random experiment, probability of all the outcomes added together,
is one.
The third axiom or rule can be generalized for any collection of events, not just two, but
we shall keep out of such trouble for the sake of simplification. A pictorial representation
of sets and outcomes is through the so-called Venn Diagram:
Figure 3.1: Venn Diagram for two events A and B with no common outcomes
Recall the die-rolling Example 3.2, where there are 6 outcomes. Since we assumed that the
die is fair, all scores are equally likely, i.e., any score has the same probability as any other.
However, Rule 2 stipulates that all the probabilities have to add to one. Therefore, it follows
that
P(1) = P(2) = · · · = P(6) = 1/6.
From this and by following Rule 3, we can now compute the probability of any event of
interest. In particular,
P(A) = P( score is odd ) = P({1, 3, 5}) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 3/6,
while
P(B) = P( score is 5 or 6 ) = P({5, 6}) = 2/6.
Not always will two events be non-overlapping or disjoint. When two events A and B
have common outcomes, we can represent them as follows:
Some consequences of the three basic rules or axioms provide us with further “extended
rules”, 4 and 5:
Figure 3.2: Venn Diagram for two overlapping events A and B
Observe that A and “not A” do not have any common outcomes and they make up the
whole space. So by Rules 2 and 3,
P(A) + P( not A) = P(all outcomes) = 1,
Therefore,
P(A or B) = P(A) + P(B) − P(A and B).
In Example 3.2, since the only outcome common to A and B is the score 5, which has
probability 1/6, we get
P(A or B) = P(A) + P(B) − P(A and B) = 3/6 + 2/6 − 1/6 = 4/6.
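For equally likely outcomes, such identities can be verified by brute-force enumeration. Here is an illustrative Python sketch (the event B = {5, 6} is taken from the computation above):

```python
from fractions import Fraction

die = set(range(1, 7))
A = {s for s in die if s % 2 == 1}          # score is odd
B = {5, 6}                                  # event with P(B) = 2/6
p = lambda E: Fraction(len(E), len(die))    # equally likely outcomes

# addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p(A | B), p(A) + p(B) - p(A & B))     # 2/3 2/3
```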
Example 3.5: Suppose we select a 2 digit random number. Find the probability it is
divisible by either 3 or 4.
Solution: There are altogether 100 outcomes, all of them equally likely, namely
00, 01, . . . , 99. Let A be the event that the number is divisible by 3 and B the event that
it is divisible by 4, so that “A and B” is the event that it is divisible by 12. Then
P(A) = 34/100,
P(B) = 25/100,
P(not A) = 1 − P(A) = 1 − 34/100 = 66/100,
P(A or B) = P(A) + P(B) − P(A and B) = 34/100 + 25/100 − 9/100 = 50/100 = 0.5.
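The counts 34, 25 and 9 can be checked by listing the outcomes in Python (an illustrative sketch, treating the outcomes as 00 through 99):

```python
nums = range(100)                        # outcomes 00, 01, ..., 99
A  = [x for x in nums if x % 3 == 0]     # divisible by 3
B  = [x for x in nums if x % 4 == 0]     # divisible by 4
AB = [x for x in nums if x % 12 == 0]    # divisible by both, i.e., by 12
print(len(A), len(B), len(AB))           # 34 25 9
print((len(A) + len(B) - len(AB)) / 100) # 0.5
```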
Remark 3.1 In such cases as in Example 3.2 and Example 3.5, where the total number of
outcomes is finite and they are all equally likely, the probability of an event E is obtained as
a ratio of the number of favorable outcomes to the total number of outcomes, by using the formula
P(E) = (number of outcomes favorable to E) / (total number of outcomes).
Example 3.6: If three people are selected at random, the chance that they all have the
same birthday is 365/(365 × 365 × 365) = 1/(365)², observing there are 365 days in a year
(ignoring leap years) and these are equally likely.
Remark 3.2 None of these rules tell us exactly how we assign probabilities in a given
experiment. Such probability assignment is done using known physical or natural laws that
govern the experiment (e.g., a coin from a US mint is typically fair, certain combinations
of atmospheric conditions lead to rain, etc.) or empirical evidence, i.e., past experience.
Sometimes probabilities may be merely subjective, i.e., expert or educated guesses.
3.2 Conditional probability and independence
A = Score is odd = { 1, 3, 5 },
B = Score is 2,
and C = Score is 3.
P(B) is 1/6 since there are six possible outcomes to start with. But if we are given the
event A has occurred, i.e., we are told an odd score is the outcome, then the conditional
probability of B, given A has happened, denoted by P(B|A) (the vertical bar inside the
probability statement is read “given”), is 0 since the score cannot be 2 any longer. In fact,
the occurrence of A makes our new sample space of possible outcomes { 1, 3, 5 }. For
instance, P(C|A) = 1/3 since C is one of the 3 possible outcomes that can happen. Thus the
conditional probability of B given A is the chance of the outcomes in B relative to those in
A, or
P(B|A) = P(A and B) / P(A).
Referring to the Venn Diagram 3.2 is helpful. Originally the rectangular box represented
the set of all outcomes but once we are given A, the set of outcomes are restricted to those
inside the circle A. In the above example, since the numerator P (A and B) = 0, so is
P(B|A).
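The "restricted sample space" view of conditional probability can be sketched by simple counting (an illustrative Python snippet, using the events A, B and C above):

```python
from fractions import Fraction

A = {1, 3, 5}   # given: the score is odd
B = {2}         # score is 2
C = {3}         # score is 3

# P(E|A) = (number of outcomes of E inside A) / (number of outcomes in A)
cond = lambda E, given: Fraction(len(E & given), len(given))
print(cond(B, A), cond(C, A))   # 0 1/3
```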
Example 3.6: Suppose we throw a pair of fair dice and observe the 2 scores. Let
If the chance of an event A is the same as the conditional chance of A given another event
B, it means that the occurrence (or non-occurrence) of B does not affect the chance of A.
46 Probability ideas
All the three statements are equivalent, i.e., if any one of them is true, then all three
statements are true. For example,
P(B|A) = P(B)  ⇔  P(A and B)/P(A) = P(B).
This leads to another extended rule, the so called multiplication rule for independent
events:
Rule 6. P(A and B) = P(A)×P (B), if A and B are independent.
Example 3.6 (contd.): In this example, it is seen that A and B are independent since
Knowing that the score on the second die is 3 does not change the chance of getting score
2 (or any other event) relating to the first die. The multiplication rule extends in the same
way to more than two independent events.
Independence of events is what allows us to say, for instance, that if the chance of making a
“free throw” is 0.2 each time for a particular basketball player, then in 3 attempts, the
chance that he makes all three is 0.2 × 0.2 × 0.2 = 0.008.
3.3 Bayes theorem
In other words, if we know the probability of an event A conditional on other events, and we
also know what proportion of the time these conditions hold, we can put these together to
find P(A).
Example 3.7: A microchip manufacturing plant has 3 machines that produce the chips.
Machine 1 produces 30% of the output and of these, 2% are defective; Machine 2 produces
45% of the output and of these, 1% are defective; Machine 3 produces the remaining 25% of
the chips and of these, 3% are defective. Find the probability that a randomly selected chip
from this company is defective.
Solution: Let A = chip is defective and Bi = chip comes from Machine i, for i = 1, 2, 3.
By the total probability formula,
P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + P(A|B3)P(B3)
= (0.02)(0.30) + (0.01)(0.45) + (0.03)(0.25) = 0.018.
The so-called Bayes formula (due to Reverend Thomas Bayes (1702-1761)) allows
us now to calculate some “inverse” probabilities. Since, by the definition of conditional
probability,
P(Bi |A) = P(A and Bi )/P(A),
and we can write the numerator as the product P(A|Bi )P(Bi ) while the denominator is
given by the total probability formula, we obtain
P(Bi |A) = P(A|Bi )P(Bi ) / [ P(A|B1 )P(B1 ) + P(A|B2 )P(B2 ) + . . . + P(A|Bk )P(Bk ) ].
This formula allows us to find the conditional probability of Bi given A using the other
type of conditional probabilities namely, P(A|Bi ).
Example 3.7 (contd.): If a chip selected from this factory’s output turns out to be defective,
find the probability that it was made by Machine 3.
Solution: In our notations, we need P(B3 |A), which, by the Bayes formula, is
P(B3 |A) = P(A|B3 )P(B3 ) / P(A) = (0.03)(0.25) / 0.018 = 0.0075/0.018 ≈ 0.417.
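The two-step calculation for Example 3.7 (total probability, then Bayes) can be sketched in Python:

```python
priors = [0.30, 0.45, 0.25]   # P(B1), P(B2), P(B3): each machine's share
defect = [0.02, 0.01, 0.03]   # P(A|Bi): each machine's defect rate

# total probability formula for P(A)
p_a = sum(d * p for d, p in zip(defect, priors))
# Bayes formula for P(B3|A)
post3 = defect[2] * priors[2] / p_a
print(round(p_a, 4), round(post3, 4))   # 0.018 0.4167
```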
EXERCISES
3.1 California’s lotto involves picking six numbers without repetition from 1 to 51. If you
buy a ticket and pick a set of six numbers, what is the probability that you select the
six winning numbers?
3.2 A student’s chance of passing a statistics test is 0.8 if he studies and only 0.2 if he
does not study before the test. If the chance that this student studies is 0.6, find the
probability
3.3 On a day in December, the probability that it will snow in Boston is 0.4 and the prob-
ability that it will snow in Moscow is 0.7. Assuming independence, find the probability
that it will snow:
3.4 A, B are two events such that P(A) = 0.5, P(B) = 0.2. Find P(A or B) if A and B
are (a) disjoint (b) independent.
3.5 Suppose a weather bureau reports that the chance of precipitation tomorrow is 70%,
while the chance that the temperature will be less than 32°F is 25%. If these two
events are independent, what is the probability that tomorrow it will
(a) snow? (assume that the precipitation turns into snow at a temperature less than
32°F.)
(b) rain?
(c) be less than 32°F but not snow?
(d) be over 32°F but not rain?
3.6 A die is loaded so that any even score is twice as likely as any odd score. Find
(a) P(score ≥ 4)
(b) P(odd score).
3.7 A committee of four people is selected from a group of five men and six women.
3.8 Assume that all the 12 months in a year are equally likely to be the month of birth. If
we selected 3 people at random, what is the probability that
3.9 Three students A, B, C can solve problems in a statistics book with probabilities 0.6,
0.8 and 0.3 respectively. Assuming that they all work independently, for a randomly
selected problem, find the probability that
3.10 Let A and B be two events such that P(A) = 0.5, P(A or B) = 0.7. Find P(B) if
3.11 Assume that the probability is 0.95 that a jury selected to try a criminal case will
arrive at the correct verdict, i.e., find one guilty if actually guilty and find innocent
if actually innocent. Suppose the police in a certain town are quite careful in their
investigations and 99% of the people brought for trial before a jury are actually guilty.
Find the probability that
3.12 Two different suppliers A and B provide a manufacturer with the same part. All the
supplies of this part are kept in a large bin. In the past, 5% of the parts supplied by
A and 9% of the parts supplied by B are defective. A supplies four times as many
parts as B. Suppose you reach into the bin and select a part and find it is nondefective.
What is the probability that it was supplied by A?
3.13 In the parking lot of a high-tech company, 35% of the cars parked are US-made and of
these, 20% are luxury cars; 40% are European-made and of these, 40% are luxury cars;
the other 25% are Japanese-made and of these, 15% are luxury cars. Find the probability
that (a) if a car is picked at random in this lot, it is a luxury car (b) if a car is a luxury
model, it is European-made.
3.14 A fair coin is flipped three times. Let A denote the event “head occurs on the first
flip” and B, the event “the same face does not occur on all three tosses”.
(a) Write down the sample space for this experiment as well as the sample points
that correspond to the two events A and B.
(b) Are A and B independent events? Are they mutually exclusive?
(c) Find P(A and (not B)) and P(not A|B).
3.15 A student has 4 pairs of socks in his dresser, each pair a different color. He has to
dress before dawn without waking his roommate and so he grabs a pair of socks without
seeing them and puts them on. What is the probability that they are of matching color?
3.16 A deck of playing cards consists of 52 cards = 4 suits × 13 ranks. A poker hand consists
of five different cards, chosen so that any five cards are equally likely. Clubs is one of
the suits with 13 of them in the deck. What is the probability that a poker hand will
consist of all clubs?
3.17 Five people get into an elevator on the ground floor of a building which has 5 upper
floors. What is the probability that they all get off on different floors?
Chapter 4
Random variables
Or the random experiment may be to “select a student from a class of 100 students” and
we may be interested in
Notice that the random variable X defined above can take only 5 discrete values: 0, 1,
2, 3 and 4, whereas Y can take all values continuously in a given range of heights. This
property distinguishes how we deal with the calculation of probabilities corresponding to
these random variables. We now discuss these two basic types of random variables:
A discrete random variable is one which takes a finite (or a countable) number of values. A
list of such values and the corresponding probabilities is called the probability distribution
of the r.v.
Values of x:    x1   x2   . . .   xn
Probabilities:  p1   p2   . . .   pn

These pi = P(X = xi ), being probabilities, satisfy the conditions pi ≥ 0 and Σᵢ₌₁ⁿ pi = 1.
Example 4.1: Toss a coin 3 times, and let X = the number of heads obtained. There are
8 (= 2 × 2 × 2) equally likely outcomes (with probability 1/8 each), namely
HHH, HHT, HTH, THH, HTT, THT, TTH, TTT.
All this can be summarized in the following table which lists possible values of X and
the corresponding probabilities:
x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8
Example 4.2: Consider a slightly more complex random experiment which is to “roll a fair
die twice”. Then
outcomes = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
             (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
             (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
             (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
             (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
             (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }.

Since this is a fair die, all these 36 outcomes are equally likely and the chance of any
particular outcome is 1/36. Define the random variable
X = total score on the two dice.
This random variable can take values from 2 through 12 and the probabilities correspond-
ing to each of these values can be computed. For instance, the total score is 5 corresponds
to the outcomes {(1, 4), (2, 3), (3, 2), (4, 1)} and hence,
P(total score is 5) = P(X = 5) = 4/36.
It can be checked that the Probability distribution of X is given by the following table.
x          2      3      4      5      6      7      8      9      10     11     12
P(X = x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
Note that a total score of 7 has the largest probability, with the probabilities decreasing
symmetrically for values on either side of 7.
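This distribution table can be generated by enumerating all 36 outcomes (an illustrative Python sketch):

```python
from collections import Counter
from fractions import Fraction

# count how many of the 36 equally likely outcomes give each total score
totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))
dist = {x: Fraction(c, 36) for x, c in sorted(totals.items())}
print(dist[5], dist[7])   # 1/9 1/6
```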
4.2 Continuous random variables
A continuous random variable takes all the values in a given interval. For instance, we
may select an individual at random and measure his or her height, say X, or the weight, Y.
It is clear that X can take all the values within the range of say 0 to 8 feet, not just integer
values like 0, 1, 2, . . . as in Examples 4.1 and 4.2.
Continuous random variables take too many (an uncountable number of) values and this
makes it impossible for us to make a list of values and assign probabilities to each individual
value. Instead, in the continuous case, we define probabilities through what is known as
a density curve. This is a curve above the x-axis and with a total area of 1 under it.
Probabilities for intervals are defined as areas under such a density curve, i.e.,
P(a < X ≤ b) = area under the density curve between a and b.
Figure 4.1: Continuous density curve
Example 4.3: Let X be the time (in minutes) taken for a fire truck to arrive after one
dials 911. Suppose this time can equally well be anywhere between (2, 15) minutes. Such
a situation where all the values within a range are equally likely for a continuous random
variable, is called a continuous uniform distribution and the random variable is called a
continuous uniform random variable. The density curve has the same height and has a
rectangular shape. In this example, since the base is of length 13 units, the height of the
density curve is 1/13, so that we have
P(5 < X ≤ 8) = (Area between 5 and 8) = 3 × (1/13) = 3/13,
whereas
P(X ≤ 6) = P(2 < X ≤ 6) = 4 × (1/13) = 4/13.
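Since uniform probabilities are just rectangle areas, they are easy to compute directly; here is a small illustrative sketch for the (2, 15) uniform distribution of Example 4.3:

```python
from fractions import Fraction

def p_uniform(a, b, lo=2, hi=15):
    # P(a < X <= b) for the uniform density on (lo, hi): base times height
    a, b = max(a, lo), min(b, hi)
    return Fraction(max(b - a, 0), hi - lo)

print(p_uniform(5, 8), p_uniform(2, 6))   # 3/13 4/13
```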
4.3 Mean and Variance of a random variable

µ = (x1 × p1 ) + (x2 × p2 ) + · · · + (xn × pn ) = Σᵢ₌₁ⁿ xᵢpᵢ .
The mean represents the “long-run average” value of the random variable X. That is,
if we keep a record of the actual values the random variable takes and average these over a
large number of times, it will be very close to µ. Intuitively, this makes sense as a “weighted
average” with the weights being the probabilities or the proportion of times the random
variable takes a particular value.
One can also define a measure of spread around this mean by calculating the deviations
from the mean (xi −µ), squaring them and averaging these with the same weights, pi .
This process of taking deviations from the mean and averaging their squares is similar to
calculating the sample variance s². This gives us the variance of the r.v. X, denoted by σ²:
σ² = Σ (xᵢ − µ)² pᵢ = Σ xᵢ²pᵢ − µ²,
so that once we have µ, it suffices to calculate Σ xᵢ²pᵢ .
variance, i.e., the σ, is called the standard deviation (SD) of the random variable X.
Example 4.1 (contd.): Recall X = number of heads in 3 tosses of a fair coin. From the
probability distribution given before, we have mean
µ = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 12/8 = 3/2 = 1.5.
Thus, if we repeat the “experiment of tossing a fair coin 3 times” a very large number of
times and record the number of heads each time, the average of the number of heads will be
very close to 1.5.
The variance
σ² = (0 − 1.5)² × (1/8) + (1 − 1.5)² × (3/8) + (2 − 1.5)² × (3/8) + (3 − 1.5)² × (1/8) = 6/8 = 3/4 = 0.75.
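The two computations for Example 4.1 can be checked with a short Python sketch; the lists xs and ps below simply restate the distribution of the example and are not from the book's code:

```python
# Mean and variance of a discrete random variable from its distribution.
xs = [0, 1, 2, 3]              # values of X (number of heads in 3 tosses)
ps = [1/8, 3/8, 3/8, 1/8]      # their probabilities

mu = sum(x * p for x, p in zip(xs, ps))                 # weighted average
var = sum(x * x * p for x, p in zip(xs, ps)) - mu**2    # shortcut formula

print(mu, var)   # 1.5 0.75
```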
One can define the mean µ (as a measure of center or the center of gravity) and the
variance σ 2 (as a measure of spread) for continuous r.v.s also, but this involves calculus and
is beyond the scope of this book. However, one may see for instance, in Example 4.3, that
the center of gravity is at the middle of the range, so that
µ = (2 + 15)/2 = 8.5 minutes.
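For a continuous uniform random variable such as the one in Example 4.3, probabilities are just proportional lengths. This can be sketched in Python (unif_prob is our own illustrative helper, not a function from the book):

```python
def unif_prob(a, b, lo=2.0, hi=15.0):
    """P(a < X <= b) for X uniform on (lo, hi): overlap length over total length."""
    a, b = max(a, lo), min(b, hi)
    return max(b - a, 0.0) / (hi - lo)

print(unif_prob(5, 8))    # 3/13
print(unif_prob(2, 6))    # 4/13
print((2 + 15) / 2)       # mean: midpoint of the range, 8.5
```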
EXERCISES
4.1 The number of ice cubes put into a soft drink glass in a restaurant is a random variable
X with the following probability distribution:
x 1 2 3 4 5
P(X = x) 0.1 0.2 0.4 0.2 0.1
Find
4.2 The number of passengers in a car has the following probability distribution:
x 1 2 3 4 5
P(X = x) 0.4 0.3 0.15 ? 0.05
(a) Find the probability that there are exactly 4 passengers in a car.
(b) What is the probability of at most 4 passengers in a car?
(c) What is µ? How do you interpret this number?
(d) Find the standard deviation of the number of passengers in a car.
4.3 The number of defects (denoted by X) in a new car from a certain manufacturer has
the following probability distribution:
x 0 1 2 3 4
P (X = x) 0.50 0.30 0.10 ? 0.05
(a) Find the probability that there are exactly 3 defects in a new car.
(b) What is the probability of at most 2 defects in a new car?
(c) What is the mean and standard deviation of the number of defects?
4.4 If X is the difference in scores when a pair of fair dice are rolled, find its probability
distribution. What is the mean µ and standard deviation σ for X?
4.5 A statistics professor has determined from past experience the following probability
distribution for X, the GPA (with A = 4, . . . , F = 0):
x 0 1 2 3 4
P (X = x) 0.10 0.15 0.25 ? 0.15
4.6 In the Goren point-count system for bidding in contract bridge, cards are assigned
points as follows (there are 52 cards in a deck):
(a) Give the probabilities that a card drawn at random will have 0, 1, 2, 3, and 4
points, respectively.
(b) Compute the mean value of the number of points assigned to a card selected at
random.
(c) Find the average number of points in a hand of 13 cards dealt at random. (Hint:
Determine the total number of points in the deck and decide how they would be
distributed among the four players, on the average.)
4.7 In a lottery your winnings/return, say X, is a random variable. Suppose it has the
following probability distribution:
x 0 100 1,000
P(X = x) 0.998 0.001 0.001
4.8 The following table gives the probability distribution for the rating (the number of
stars) X of hotels in a large town.
(a) Find the probability that a randomly picked hotel has 4 or 5 stars.
(b) Find the mean rating and interpret this result.
(c) Find the standard deviation of hotel ratings.
4.9 In a certain state, historical records indicate that it takes anywhere from 1 to 5 attempts
for a person to acquire his or her first driver’s license. The following table represents
the number of attempts X, needed in order to acquire the license.
(a) Find the probability that one does not get a driver's license in the first two attempts.
(b) Find the mean and the standard deviation for X, the number of attempts one
needs.
4.10 An outdoor rock concert draws a "full-house" and the organizers make a $30,000 profit if the day of the event is sunny; they make only a $15,000 profit if it is cloudy; and they expect to lose $10,000 if it rains on the day of the event. Suppose the weather bureau predicts that there is a 20% chance of rain, a 30% chance of a cloudy day, and a 50% chance for the day to be sunny.
(a) Write down the probability distribution for X, the profit the organizers make.
What is the expected profit and its standard deviation?
(b) Would it be worthwhile for the organizers to buy an insurance policy costing
$5,000 from an insurance company, if they cover the $10,000 loss in the event of
rain? (Find the expected profit again, in this case)
(c) If $5,000 is too much, what is a reasonable premium to pay, to cover the loss in
the event of rain?
4.11 A Pizza shop sells pizzas in four different sizes. If X denotes the size (diameter) of the pizza ordered, then, based on last year's sales, the following information was obtained:
4.12 The number of calls X, to a 911 number in a small city has the following probability
distribution:
x 0 1 2 3 4 5 6
P(X = x) .1 .2 .2 .3 .1 .05 .05
(a) Find the probability that there will be three or more calls on a given day.
(b) Find the expected number of calls µ and the s.d. σ.
4.13 A particular statistics professor always takes a few minutes extra before she dismisses
the class. Suppose X denotes the extra time (in minutes) and it can equally well be
anywhere between 0 and 8 minutes i.e., the density curve has a constant height between
0 and 8 minutes.
4.14 Suppose the amount of gasoline customers buy from a gas station (in number of gallons)
is denoted by the random variable X. From past records, it was found that the density
curve is a triangle, starting at height zero when x=0 and going with constant slope to
a height of (1/10) at x=20.
(a) Draw a picture of this density curve and verify the total area is 1. (Recall the area of a triangle = (height × base)/2.)
(b) Find the probability that a customer who pulls in buys, (i) less than 10 gallons
(ii) between 12 and 15 gallons (iii) more than 16 gallons.
(c) Try and figure out the average number of gallons bought by a customer, remem-
bering that the average µ corresponds to the center of gravity of this triangular
shape. What can you say about the standard deviation of X?
Chapter 5
Binomial and Normal distributions
In this chapter, we consider two very basic probability distributions that are omnipresent
in statistics. One of them is a discrete probability distribution — the so-called Binomial
distribution, which arises in connection with counts and/or proportions. The other one
we will discuss is a continuous distribution — called the Normal distribution, which is
a typical model assumed in connection with continuous measurements like heights, weights,
scores, etc.
5.1 Binomial distribution
The Binomial distribution is a very important and basic discrete probability distribution,
which arises under the following three conditions:
(i) A random experiment has just 2 possible outcomes, which we label, for convenience, as "Success" (S) and "Failure" (F).
(ii) The probability of success, denoted by p, (0 < p < 1) is fixed from trial to trial. It is
clear that
P(Failure) = 1 - P(Success) = 1 - p.
(iii) The random experiment is repeated n times, i.e., there are n independent trials.
Then, we define X = the number of successes in these n trials.
The number of trials n and the probability of success on any single trial p, completely de-
termine the situation and the probabilities of various events. We refer to this as Binomial(n, p)
or Bin(n, p) distribution. Note that an outcome of these n trials can be written as a string
of n symbols consisting of S or F. If k is any number, i.e., an integer between 0 and n, then
the probability of any specific outcome with k successes and (n − k) failures such as
S S · · · S F F · · · F   (k S's followed by (n − k) F's)
is
(p × · · · × p) × ((1 − p) × · · · × (1 − p)) = p^k (1 − p)^(n−k),
where the first k factors are p and the remaining (n − k) factors are (1 − p).
The outcome given above has the k successes in the first k places and all the failures in the
last (n − k) places. If, on the other hand we ask for the probability of k successes out of
n trials, no matter which k places they occur, we have to add the probabilities of all such
outcomes (n-tuples) which have exactly k successes and (n − k) failures in them. It turns out that there are (n choose k) of them, each with the same probability p^k (1 − p)^(n−k), where
(n choose k) = number of ways of selecting k places out of n places = n!/(k!(n − k)!),
and n! = n · (n − 1) · · · 3 · 2 · 1.
Also, (4 choose 2) = (4 · 3 · 2 · 1)/((2 · 1)(2 · 1)) = 6, while (5 choose 2) = 5!/(2! 3!) = (5 · 4 · 3 · 2 · 1)/((2 · 1)(3 · 2 · 1)) = 10.
P(k successes out of n trials) = P(X = k) = (n choose k) p^k (1 − p)^(n−k),   k = 0, 1, 2, . . . , n,   0 < p < 1.
Table B in the Appendix lists these probabilities for n = 0, 1, . . ., 20 and p = .1, .2, .25, .3, .4, .5, .6, .7, .75, .8, .9.
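The Binomial formula can also be evaluated exactly using only Python's standard library; this is a sketch, and binom_pmf is our own helper name, not a library function:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p), straight from the formula
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(2, 5, 0.5))                          # 5/16 = 0.3125
print(sum(binom_pmf(k, 5, 0.5) for k in range(6)))   # the probabilities sum to 1
```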
Python code instruction: In Python, the NumPy package can simulate Binomial data with the following syntax:
Syntax:
np.random.binomial(n, p, size)
R code instruction:
In R the computation becomes more succinct and accurate: the probability at a certain point, P(X = a) for example, can be computed by the dbinom() function, whose syntax is as follows:
Syntax:
dbinom(x, size= , prob= )
Example 5.1: What is the probability that in a family with 5 children 2 are girls?
Solution: Here the basic trial or random experiment occurs when the parents have a child.
Since the sex of the child is of interest, there are 2 possibilities. Let us label “female child”
as a "Success". Then, assuming boy/girl are equally likely, p = P(Success) = 0.5,
and n, the number of trials, = 5. Then X = # of female children among the 5, is conceptually
the same as the number of successes in the n = 5 trials. This has a Bin (5, 0.5) distribution.
So, using the formula, we get
P(X = 2) = (5 choose 2) (0.5)^2 (0.5)^3 = 10/2^5 = 10/32 = 5/16.
Python code:
import numpy as np                                # import numpy package
girl_number = np.random.binomial(5, 0.5, 10000)   # simulate Bin(5, 0.5) 10000 times
count = 0                                         # number of times 2 occurs in array girl_number
for i in range(10000):
    if girl_number[i] == 2:                       # check every element to see if it is 2
        count = count + 1
print('The simulated probability is', count / 10000)   # probability that 2 are girls among the 5 kids
As we can clearly see, the simulated value is pretty close to the theoretical value since
the simulation was done using a very large sample size (10,000) and it accurately reflects the
P (X = 2) for a population which follows a Bin(5, 0.5).
R code:
# P(X=2)
dbinom(2, size=5, prob=0.5)
# P(X=5)
dbinom(5, size=5, prob=0.5)
# P(X=0)
dbinom(0, size=5, prob=0.5)
The R code, with a one-line built-in function, directly calculates the probability that we want. Though the outputs of Python and R for this question agree numerically, the R code gives the more accurate answer in this Example 5.1, since it is obtained by a direct calculation and not by simulation.
Example 5.2: A commuter plane has 10 seats. However, the airline books 12 people on
each flight since not everyone who books a seat will actually show up for the flight. Suppose
the chance of a person, who makes a reservation, actually showing up is 0.8.
Let X = # of passengers who actually show up for the flight among the 12 who reserve.
The possible values for X are x = 0, 1, 2, . . . , 11, 12 and the probabilities are given
by the Binomial formula with n = 12, p = 0.8. For instance,
P(X = 10) = (12 choose 10) (.8)^10 (.2)^2 = 0.2835
and the probability that more passengers show up than the 10 available seats is
P(X = 11 or 12) = (12 choose 11) (0.8)^11 (0.2)^1 + (12 choose 12) (0.8)^12 (0.2)^0
= .2062 + .0687
= .2749.
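The airline probabilities of Example 5.2 can be verified exactly with the Binomial formula, here sketched using Python's standard library math.comb:

```python
from math import comb

p = 0.8   # chance that a person who reserves actually shows up
p10 = comb(12, 10) * p**10 * (1 - p)**2                          # P(X = 10)
p_over = comb(12, 11) * p**11 * (1 - p) + comb(12, 12) * p**12   # P(X = 11 or 12)

print(round(p10, 4), round(p_over, 4))   # 0.2835 0.2749
```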
Remark 5.1 Do the 12 passengers correspond to 12 independent trials i.e., does the chance
of success on one trial (i.e., someone showing up) depend on the chance of success on other
trials (i.e., others showing up)? Since some passengers may be friends or members of the
same family and may make decisions as a group, the trials may not, strictly speaking, be independent and therefore the binomial distribution may not be appropriate. However, for simplicity, we assumed they are independent.
If X has a Bin(n, p) distribution, then its mean and standard deviation are
µX = np and σX = √(np(1 − p)).
Example 4.1 (contd.): Fair coin tossed 3 times. Let X = # of heads in the 3 tosses. X
has a Bin(3, 1/2) distribution. Therefore, it has
mean, µ = np = 3 · (1/2) = 1.5
and
standard deviation, σ = √(np(1 − p)) = √(3 · (1/2) · (1/2)) = √(3/4) = 0.8660.
Note that we obtained the same values for µ and σ in Chapter 4 by a direct computation
that uses their definitions.
Example 5.2 (contd.): 12 passengers book seats. p = 0.8 is the chance for an individual
to show up. Here
µ = np = 12(0.8) = 9.6
and
σ = √(np(1 − p)) = √(12(0.8)(0.2)) = √1.92 = 1.38.
Thus, on the average, 9.6 people show up for the flight and the standard deviation is 1.38.
Python/R Instruction
One common limitation that Python and R share in calculating the mean or the sd of a binomial distribution is that computing them by simulation involves a large amount of calculation. So the theoretical derivations and formulas given above are useful.
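As a sketch, the theoretical formulas translate into a few lines of Python; binom_mean_sd is our own illustrative name, not a library function:

```python
from math import sqrt

def binom_mean_sd(n, p):
    # mean = np and sd = sqrt(np(1 - p)) for X ~ Bin(n, p)
    return n * p, sqrt(n * p * (1 - p))

print(binom_mean_sd(3, 0.5))    # (1.5, 0.8660...)
print(binom_mean_sd(12, 0.8))   # approximately (9.6, 1.3856)
```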
Example 5.3: Roll a fair die 36 times and let X = number of times face 6 shows up. What
is the average and standard deviation of this random variable?
Solution: Here, each time one rolls a die, there are six possible scores and one might think
a Binomial is not appropriate. But since we are just interested in whether “the score is 6” (a
success) or “not 6” (a failure), we have a Binomial situation. We have 36 independent trials
of a Binomial experiment where we think of getting a score 6 as a “success” and thus, X has
a Binomial distribution, with n = 36 and p = 1/6. We write: X has a Bin(36, 1/6) distribution (in symbols, X ∼ Bin(36, 1/6)). Hence, for instance,
P(X = 5) = (36 choose 5) (1/6)^5 (5/6)^31,
P(X = 4 or 5) = P(X = 4) + P(X = 5) = (36 choose 4) (1/6)^4 (5/6)^32 + (36 choose 5) (1/6)^5 (5/6)^31.
Mean of X, µ = np = 36 × (1/6) = 6
and
standard deviation of X, σ = √(np(1 − p)) = √(36 · (1/6) · (5/6)) = √5 = 2.2361.
Python code:
import numpy as np
dice = np.random.binomial(36, 1/6, 10000)   # simulate the experiment 10000 times
count5 = 0         # number of 5s in array dice
count4and5 = 0     # number of 4s and 5s in array dice
for i in range(10000):
    if dice[i] == 5:
        count5 = count5 + 1
    if dice[i] == 5 or dice[i] == 4:
        count4and5 = count4and5 + 1
print('The probability of P(X=5) is', count5 / 10000)
print('The probability of P(X=4 or 5) is', count4and5 / 10000)
R code:
# P(X=5)
dbinom(5, size=36, prob=1/6)
# P(X=4 or 5)
dbinom(5, size=36, prob=1/6) + dbinom(4, size=36, prob=1/6)
sum(dbinom(4:5, size=36, prob=1/6))
# find mean: compare sample result with theoretical result
bin_sample <- rbinom(10000, size=36, prob=1/6)
mean(bin_sample)
sd(bin_sample)
> dbinom(5, size = 36, prob = 1/6)
[1] 0.1701991
> dbinom(5, size = 36, prob = 1/6)+dbinom(4, size = 36, prob = 1/6)
[1] 0.3031671
> sum(dbinom(4:5,size=36, prob = 1/6))
[1] 0.3031671
> # find mean: compare sample result with theoretical result
> mean(bin_sample)
[1] 6.0068
> sd(bin_sample)
[1] 2.260501
> mean(new_bin_sample)
[1] 6.00079
> sd(new_bin_sample)
[1] 2.236377
We used a new built-in function in R, namely "rbinom()", for working with Example 5.3. This function randomly generates data from a Bin(size, prob) distribution; here, 10000 trials from a Bin(36, 1/6) are randomly generated and stored in an array called "bin_sample". Then the mean and standard deviation of the array "bin_sample" are used to approximate the population mean and standard deviation. When we increase the
sample size from 10,000 to 10,000,000 as we did in the succeeding lines, the corresponding
mean and standard deviation become closer to the theoretical value, as one would expect. This
phenomenon, whereby the sample mean approaches the true mean with increased sample
sizes, is more generally true and is called the “Law of Large Numbers” (See Remark 6.3
later).
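The Law of Large Numbers effect just described can also be sketched with Python's standard library alone; the sample sizes below are our own choices, kept small only so the run stays fast:

```python
import random

random.seed(1)

def bin_draw(n, p):
    """One draw from Bin(n, p): count successes in n independent Bernoulli trials."""
    return sum(random.random() < p for _ in range(n))

# Sample means of Bin(36, 1/6) draws; the theoretical mean is np = 6.
for size in (100, 1000, 10000):
    sample = [bin_draw(36, 1/6) for _ in range(size)]
    print(size, sum(sample) / size)
```

As the sample size grows, the printed sample means settle ever closer to 6.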
5.2 Normal distribution
This is a very useful continuous distribution, used to model many data sets whose frequency curves are approximately bell-shaped, i.e., the frequency curve is symmetric around a single
peak at the center and the frequencies decline rather rapidly, as we move away from this
center. For instance, the distribution of heights, weights, SAT scores etc. tend to have such
bell-shaped curves. A Normal distribution is completely described by its central value or
point of symmetry, µ and the spread around this center, denoted by the standard deviation,
σ. We write N(µ, σ) for such a distribution. There is a different Normal curve for each pair
of values of µ and σ. These µ and σ are called parameters of the Normal distribution and
their roles in describing the center and variability are illustrated in Figures 5.2 and 5.3. Note
that the mean µ tells us where the curve is centered. On the other hand, the parameter σ
is a measure of spread. The smaller the σ, the taller and more concentrated the curve is at the center, so that there is a greater chance of observing a value near the mean.
Hence, (µ − 3σ, µ + 3σ) can actually be considered as the “practical range” for a
N(µ, σ) distribution. Similarly, if we take a symmetric interval around the mean going
one standard deviation on each side, it has approximately 68% probability, while an
interval two standard deviations from the mean has nearly 95% probability, i.e.,
This rule is sometimes called the 1σ-2σ-3σ rule or the 68-95-99.7 rule.
Property 5.3: In particular, the Normal curve centered at µ = 0 and with σ = 1, is called the
Standard Normal curve and a random variable having such a standard Normal
distribution is usually denoted by Z. See Figure 5.4 for the curve of a standard Normal
random variable. If X is any Normal random variable with a N(µ, σ) distribution,
then it can be standardized by subtracting its mean µ and dividing by σ to obtain a
standard Normal or N(0, 1) distribution.
[Figure: Normal density curves N(0, 1/2), N(0, 1), and N(0, 2), illustrating the effect of σ]
That is, if X has a N(µ, σ) distribution, Z = (X − µ)/σ has a N(0, 1) distribution.
Example 5.4: Suppose it is given that the height of adult women has a Normal distribution,
with µ = 64.5 inches and σ = 2.5 inches. The probability that a randomly selected woman
has height between 59.5 and 69.5 inches, i.e., between (µ − 2σ, µ + 2σ), is 95%, while the probability that her height exceeds 72, which is (µ + 3σ) inches, is (1/2) × (1 − 0.997) = 0.0015, because of symmetry and the 99.7% rule.
However, in most cases, we need to use Property 5.3, which allows us to translate
the problem into one on Z and then look up the areas for the standard Normal or the Z
variable, which are tabulated in Table A of the book. This table, given on the inside front cover as well as in the Appendix, provides the area to the left of any given value z.
For instance, from this table, one can read that the area to the left of 1.20 is 0.8849, that
to the left of −1.32 is 0.0934 and that to the left of 0 is 0.5000.
The last probability reflects the fact that the N(0, 1) curve is symmetric about zero with a probability of 0.5 on either side. If one wants the probability to the right of a given number, i.e., on the right tail, then recalling that the total probability is one, we use P(Z > z) = 1 − P(Z ≤ z).
If we want the area between any two points, we take the cumulative area up to the larger value and subtract the cumulative area up to the smaller value, which leaves us with the area in between these two points. Thus
P(−1.32 < Z ≤ 1.2) = P(Z ≤ 1.2) − P(Z ≤ −1.32) = 0.8849 − .0934 = 0.7915.
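The Table A values quoted above can be reproduced without a table using Python's standard library statistics.NormalDist; this is a sketch of our own, not the book's code:

```python
from statistics import NormalDist

Z = NormalDist()   # standard Normal: mean 0, sd 1

print(round(Z.cdf(1.20), 4))                    # area to the left of 1.20: 0.8849
print(round(Z.cdf(-1.32), 4))                   # area to the left of -1.32: 0.0934
print(round(Z.cdf(1.20) - Z.cdf(-1.32), 4))     # P(-1.32 < Z <= 1.2): 0.7915
```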
The last three probability calculations can be stated more generally as:
When simulating a Normal distribution, the NumPy package in Python can do the job using the Syntax:
np.random.normal(mean, sd, size)
R Code Instruction:
• lower.tail decides whether the area under the curve is taken to the left of x or to the right of x; the output is the probability P(X ≤ x) (lower.tail = TRUE) or P(X > x) (lower.tail = FALSE).
Example 5.5: Suppose X, the weight in lbs. of a college student, has a N(150, 20) distri-
bution. Find the probability that a randomly selected student weighs more than 160lbs.
Python code:
import numpy as np
count = 0
x = np.random.normal(150, 20, 10000)   # simulate 10000 student weights
for i in range(10000):
    if x[i] <= 160:
        count = count + 1
print('The probability that a randomly selected student weighs less than 160 lbs is', count / 10000)
print('The probability that a randomly selected student weighs more than 160 lbs is', 1 - count / 10000)

The probability that a randomly selected student weighs less than 160 lbs is 0.6934
The probability that a randomly selected student weighs more than 160 lbs is 0.3066
R code:
# P(X <= 160)
pnorm(160, mean=150, sd=20, lower.tail=T)
# P(X > 160)
pnorm(160, mean=150, sd=20, lower.tail=F)
1 - pnorm(160, mean=150, sd=20, lower.tail=T)   # same as line above
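The simulated answer of Example 5.5 can also be checked exactly with statistics.NormalDist from Python's standard library (a sketch; compare it with the simulated values above):

```python
from statistics import NormalDist

X = NormalDist(mu=150, sigma=20)   # weight distribution of Example 5.5

p_more = 1 - X.cdf(160)   # P(X > 160)
print(round(p_more, 4))   # 0.3085
```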
Example 5.4 (contd.): Recall X, the height of women has a N(64.5, 2.5) distribution.
Hence,
P(X ≤ 70) = P(X − 64.5 ≤ 70 − 64.5) = P((X − 64.5)/2.5 ≤ (70 − 64.5)/2.5) = P(Z ≤ 2.2) = 0.9861.
Table A can be used not only to read the areas for a given value of z but also to read off
the z value corresponding to a given area. For instance from Table A, the value z which has
a 20% probability below it is −0.84 while the value z with a 95% probability to its left, falls
between 1.64 and 1.65 and is indeed 1.645.
Example 5.6: Suppose the score, X on a certain test has a N(430, 100) distribution. Find
the 90th percentile score on this test i.e., x such that P(X ≤ x) = 0.9.
From Table A, the z-value corresponding to an area of 0.90 is 1.28, so that we have the
equation
(x − 430)/100 = 1.28
x − 430 = 128
x = 430 + 128 = 558.
i.e., if the scores follow this Normal distribution, then there is a 90% chance that a randomly
selected student will have a score of 558 or less, or to put it differently, 90% of the students
have a score of 558 or less.
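The percentile in Example 5.6 can be computed directly by inverting the Normal cdf (a sketch using statistics.NormalDist; since the table value 1.28 is rounded, the exact answer differs slightly from 558):

```python
from statistics import NormalDist

X = NormalDist(mu=430, sigma=100)   # test scores of Example 5.6

x90 = X.inv_cdf(0.90)   # 90th percentile score
print(round(x90, 1))    # about 558.2
```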
As the code discussed in this chapter will be frequently called upon in the next few
chapters, we provide a brief review of the ideas here:
• Python
• R
EXERCISES
5.1 In a medical study it is found that there is a 5% chance of a false positive in a mammogram test, i.e., the test suggests the presence of breast cancer but a biopsy reveals that cancer is not present. If a woman has 30 mammograms during her lifetime (say, once a year,
between the ages of 40 and 70), what is the chance of at least one false positive during
her lifetime?
5.2 Suppose a multiple choice test has 20 questions with each question having 4 choices,
only one of which is correct. Suppose an unprepared student writes the answers, each
time randomly picking one of the 4 choices. What is the probability that the student
will get
5.3 A girl-scout makes visits to 15 households over a weekend, trying to sell cookies. She
feels that there is a 30% chance of selling cookies at any one household.
(a) What probability model would be appropriate for describing the number of house-
holds that buy cookies?
(b) Compute the probability that 10 or more households buy cookies, among the 15
she visits.
5.4 If I throw a peanut up in the air, there is a probability 4/5 that I can catch it in my mouth. I throw four peanuts up in the air. Assume that the throws are independent.
(a) Find the probability that I catch
i. all four?
ii. exactly two?
iii. at least one?
(b) Find the mean (µ) and variance (σ 2 ) for the number I catch in my mouth.
5.5 Since 2 out of 3 is the same proportion as 4 out of 6, can you conclude that the
probability of getting 2 heads in 3 tosses of a fair coin is the same as that of getting 4
heads in 6 tosses?
5.6 The Intelligence Quotient (IQ) of a person is normally distributed with mean 100 and
standard deviation 16.
(a) What percentage of the population possess an IQ in the interval (84, 116)? What
rule, if any, did you use to arrive at your answer?
(b) MENSA is an organization for people with IQ in the top 2% of the population.
What IQ should a person possess to get admitted to MENSA?
5.7 Suppose the weight of newborn babies is normally distributed with mean 7 pounds and
standard deviation 1 pound.
(a) Find the probability that a given newborn weighs more than 5 pounds.
(b) What percentage of the newborns has weight greater than 8 pounds and less than
9 pounds?
(c) Find the weight such that 10% of newborns have weight less than or equal to that
value.
5.8 Suppose that the daily high temperature in January in New York City’s Central Park
has a Normal distribution with mean µ = 33 degrees and σ = 4 degrees.
(a) What is the probability that a day in January will have a high temperature of
more than 30 degrees?
(b) What is the probability that a day in January will have a high temperature
between 34 and 40 degrees?
(c) What is the temperature such that 9% of days in January will have daily highs
less than that temperature?
5.9 Suppose the lengths of frogs bred for human consumption follow a Normal distribution with mean length µ = 20 cm and σ = 5 cm. They are classified as "small" if the length is less than 15 cm, "medium" if between 15 and 25 cm, "large" if between 25 and 30 cm, and "jumbo" if longer than 30 cm.
(a) What is the probability that a frog you select at random is “jumbo”?
(b) What is the probability that a randomly selected frog is “medium”?
(c) If you select two frogs at random for dinner, what is the probability that one is
“large” while the other one is “jumbo”?
5.10 When customers call a certain 800 number, the time for which they are put on hold is
normally distributed with a mean of 3 minutes and standard deviation of 1 minute.
(a) What is the probability that a randomly chosen caller has to wait on hold for
more than 5 minutes?
(b) If I have to call the number five times a week, what is the probability that I wait
for more than 5 minutes exactly once during the week?
5.11 Suppose that the scores, X, on a college entrance examination are normally distributed
with a mean of 550 and a standard deviation of 90. A certain university will consider for
admission only those applicants, whose scores fall in the top 10%. Find the minimum
score an applicant must achieve in order to be considered for admission to the university.
5.12 Suppose the “waiting time at a bank teller’s window” is normally distributed with
mean µ = 5 minutes and standard deviation σ = 1.2 minutes. If a customer visits this
bank on two different days, what is the probability that she has to wait more than 7
minutes on both these occasions?
5.13 A beer distributor believes that the actual amount of beer in a 12 ounce can of beer
has a Normal distribution with a mean of 12 ounces and a standard deviation of 1
ounce. If a 12 ounce beer can is randomly selected, find the probability that
(a) the 12 ounce can of beer will actually contain less than 11 ounces of beer.
(b) the 12 ounce can of beer will actually contain more than 12.5 ounces of beer.
(c) the 12 ounce can of beer will actually contain between 10.5 and 11.5 ounces of
beer.
5.14 A tire manufacturer claims that the steel-belted radial tire manufactured by their
company has tread life that is normally distributed with a mean life of 22,000 miles
and a standard deviation of 18,000 miles.
(a) Find the probability that a tire selected at random will last
i. less than 18,000 miles.
ii. more than 18,000 miles.
(b) If two tires are selected at random, what is the probability that both will last less
than 18,000 miles?
5.15 A car battery of a certain brand has a mean life, µ = 5 years with a standard deviation,
σ = 1.5 years. Assume a Normal distribution for the lifetime.
(a) What is the probability that a battery of this brand will last longer than 8 years?
(b) If this battery is guaranteed for 3 years, what percent of these can be expected
to need replacement before the warranty period?
(c) If this company wishes to replace no more than 10% of the battery sold under its
guarantee program, how long should the guarantee period be?
5.16 A machine produces nails whose length X is a random variable with a Normal distri-
bution with µ = 2” and σ = 0.02”.
5.17 Suppose the IQ scores for American children have a Normal distribution with mean µ
= 100 and variance σ 2 = 25.
5.18 A children’s bicycle says the “safe maximum load” for any rider is 150 lbs. If a UCSB
student gets on this bike, what is the probability that she will have a “safe ride”, given
that the weights of students at UCSB are normally distributed with µ = 120 lbs. and
σ = 10 lbs.?
5.19 A cafeteria coke machine dispenses coke into 6-ounce cups in such a way that the
actual amount dispensed into any particular cup is normally distributed with standard
deviation of 0.1 ounce. The machine can be set so that the mean of the amount
dispensed is any desired level. At what level should the mean be set so that the
6-ounce cup will overflow about 2% of the time?
5.20 Suppose at a given hospital, newborn birth weights are normally distributed with an
average of 120 ounces and a standard deviation of 10 ounces. A doctor knows that
infants with hypothyroidism have high birth weights and are always above the 90th
percentile in birth weight. At what birth weight should the doctor be concerned about
the baby having hypothyroidism?
5.21 In a study of long distance runners, the average weight was found to be 140 lbs. with
a standard deviation of 10 lbs. Assuming the distribution of runners’ weights to be
normal,
5.23 Suppose the heights of adult males follow a Normal distribution with mean µ = 68
inches and σ = 3 inches. Find
(a) the proportion of males whose heights are between 65 and 71 inches.
(b) the proportion of males who are taller than 6 feet.
(c) the chance that if two males are selected at random from this population, both
are taller than 6 feet.
(d) the proportion of males that is exempt, if a certain police department does not
recruit anyone who is shorter than 5 feet.
(e) the first quartile i.e., a value below which 25% of the male heights lie.
5.24 The composite scores in a statistics course have a Normal distribution with a mean µ
of 68 and a standard deviation σ of 8. If the teacher decides to give the top 15% of the
students an A grade, what is the lowest score needed for an A ? If the bottom 10% are
to be given an F, below what score is it an F?
5.25 Consider a student who is taking a multiple choice exam where there are 5 possible
answers for each question. Since the student has not studied or attended any of the
classes, the student decides to randomly guess each question. Suppose there are 10
questions on this exam.
(a) Find the probability the student gets the first 2 guesses right and the last 8 guesses
wrong.
(b) Find the probability that the student gets 2 questions out of the 10 correct (give
an exact expression and also use table).
(c) If it takes 6 or more correct answers to pass this test, what is the chance that this
student will pass?
(d) What is the expected number of correct guesses and its variance?
Chapter 6
Sampling distributions
Recall that population characteristics, like its center and spread, are called parameters and
are denoted by µ and σ respectively. Suppose we are interested in the average height, µ,
of all the adult males in the United States. To estimate it, we may select a representative
random sample of 100 men. Suppose in this sample of 100 men, we get
If we repeat the same process again, i.e., draw another random sample of 100 and calculate
the sample mean, then we may (and almost surely will) get a different value for the sample
mean, say 5’2” — the third time possibly 4’6” . The values of the sample means differ
because the samples, on which they are based, differ. The important question here is, if we
repeat this many times, what possible distribution of values of these sample means would we
get? This is called the sampling distribution of X̄, i.e., the distribution of values of X̄ over all possible samples. If we know features of this sampling distribution, like the location
of its center and how spread out it is about this center, it will help us in interpreting and
using the sample mean for inference.
The idea applies to the sampling distribution of any other statistic (recall a statistic is
just a quantity based on the sample). For example, in tossing a coin consider the “probability
of getting heads”, say p, which typically we do not know. Suppose a sample of size n=20
is obtained by tossing the coin 20 times and say, we obtain 12 heads in this sample of 20
tosses. Then the observed proportion of heads, i.e.,

proportion of heads in the sample = 12/20 = 0.6,

can be used as an estimate of the unknown p. If we toss the same coin again and obtain
another sample of 20, we may end up getting 11 heads — so that, this time, proportion of
heads in the sample = 11/20 = 0.55. The question again is: “What is the sampling distribution
of the observed proportion, i.e., the distribution of values of this observed proportion of heads
over repeated sampling?” In some important cases, we are able to obtain such sampling
distributions and use the center and spread of such distributions for purposes of inference.
Remark 6.1 One simple way to generate the sampling distribution of the observed propor-
tion when n = 20 and the true proportion p = 1/2, is to do the following experiment:
I. Toss a fair coin 20 times, observe the number of heads, note down the observed
proportion of heads.
II. Repeat step I again and again.
III. Draw a histogram of the different values of the observed proportion, which ranges from 0 to 1.
This process can be simulated easily on a computer.
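The three steps of Remark 6.1 can be sketched in Python, the book's other companion language; this is a minimal simulation, with variable names of our own choosing:

```python
# Simulate the sampling distribution of the observed proportion of heads
# when a fair coin (p = 1/2) is tossed n = 20 times, over many repetitions.
import random

random.seed(1)                      # for reproducibility
n, p, repeats = 20, 0.5, 10000
proportions = []
for _ in range(repeats):
    heads = sum(random.random() < p for _ in range(n))   # step I: 20 tosses
    proportions.append(heads / n)                        # note the proportion
# step III: tabulate how often each possible proportion occurred
for value in sorted(set(proportions)):
    count = proportions.count(value)
    print(f"{value:.2f}: {'*' * (60 * count // repeats)}")
```

The printed histogram piles up near 0.5, the true value of p, just as Remark 6.1 suggests.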
Definition: A statistic that is used to estimate a parameter is said to be unbiased for that
parameter if the sampling distribution of this statistic centers at the true parameter value.
That is, even though the statistic will vary from sample to sample, on the average, the
statistic centers at the correct or true value which we are trying to estimate. As we proceed,
we will be able to check that the sample mean X̄ is unbiased for the true population mean
µ, and the observed sample proportion in the Binomial situation is unbiased for the true
probability p. The sample variance s2 with a denominator of (n − 1) that we defined earlier,
is also an unbiased estimate of the true variance σ 2 , although, this is a bit harder to check.
When two different statistics have sampling distributions, both of which center around
the correct or true value (i.e., both are unbiased), we would prefer the one which has a
smaller spread around this center. Such a statistic is more likely to be closer to the true
value which we are trying to estimate and hence preferable. See Figure 6.2.
Figure 6.2: Among two unbiased statistics, the one with the smaller spread is preferred.
Coding Instruction:
In this chapter, we will mainly use the R functions that were previously discussed, to
get a deeper understanding of sampling distributions. While Python can also be used to
consider such questions, it is somewhat more complicated and not so straightforward at the
introductory level; thus our main focus in this chapter will be on R.
One such illustration is to look at the distribution of the sample mean from a binomial and
demonstrate the normal approximation to the binomial distribution.
6.2 Distribution of the sample mean

Fact 1: The mean and standard deviation for the X̄ distribution are given by

µX̄ = µ  and  σX̄ = σ/√n,

i.e., the sampling distribution of X̄ centers at the same value µ as the center of any individual
observation. On the other hand, the spread of the X̄ distribution (as described by its
standard deviation) reduces by a factor of √n and is 1/√n of the standard deviation of any
individual value. Thus the X̄ values, i.e., averages, tend to be much less variable or more
stable than single observations.
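Fact 1 can be checked empirically. The sketch below is our own illustration (not from the text): it uses a Uniform(0, 1) population, whose mean µ = 0.5 and standard deviation σ = √(1/12) are known, simulates many sample means, and compares their center and spread with µ and σ/√n:

```python
# Check Fact 1 by simulation: sample means of n = 25 draws from a
# Uniform(0, 1) population (mu = 0.5, sigma = sqrt(1/12) ~ 0.2887).
import random
from statistics import mean, pstdev

random.seed(2)
n, repeats = 25, 20000
sample_means = [mean(random.random() for _ in range(n)) for _ in range(repeats)]

mu, sigma = 0.5, (1 / 12) ** 0.5
print(round(mean(sample_means), 3))     # close to mu = 0.5
print(round(pstdev(sample_means), 3))   # close to sigma/sqrt(n) ~ 0.058
```

The simulated standard deviation of the averages is about one fifth (1/√25) of the population σ, exactly as Fact 1 predicts.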
Remark 6.2 The fact that the sampling distribution of X̄ tends to be centered at µ can be
rephrased to say that the sample mean X̄ is unbiased for µ.
Remark 6.3 Note that since σX̄ = σ/√n, this standard deviation of X̄ tends to zero as n → ∞.
Recall that a variable with variance or standard deviation of zero is a constant (in this case,
µ), so that we may conclude that X̄ comes arbitrarily close to µ for large enough n. This
fact is referred to as the Law of Large Numbers, which says that the sample mean X̄
approaches the true mean µ when the sample size is sufficiently large.
For Fact 1, we assumed that the individual values have mean µ and standard deviation
σ and nothing about what distribution they have. Suppose we assume further that
the individual values are from a N(µ, σ) curve. Then X̄ also follows a normal curve, with
the center and standard deviation indicated in Fact 1 above, i.e., µX̄ = µ, σX̄ = σ/√n.
Fact 2: If X1, X2, . . . , Xn are from N(µ, σ), X̄ has a N(µ, σ/√n) distribution, so that

Z = (X̄ − µ)/(σ/√n) = √n(X̄ − µ)/σ

has a N(0, 1) distribution.
Example 6.1: Students in a certain university have a weight distribution that is known
to be N(150, 20). Let X1 , X2 ,. . . , X16 represent the weights of 16 randomly selected
students from this university. If X is the average weight for this sample of the 16 students,
find P(X > 160).
Solution: Recall from Fact 2 that X̄ has a N(µ, σ/√n) ≡ N(150, 20/√16) = N(150, 5) distribution.
Therefore,

P(X̄ > 160) = P((X̄ − 150)/5 > (160 − 150)/5)
= P(Z > 10/5)
= P(Z > 2)
= 1 − P(Z ≤ 2)
= 1 − 0.9772
= 0.0228.
R code:
# given that X ~ N(150, 20); 16 students are randomly chosen
# calculate the mean and standard deviation of the sample mean
miu_bar=150
sigma_bar=20/sqrt(16)
# calculate P(X_bar > 160)
pnorm(160, mean = miu_bar, sd=sigma_bar, lower.tail = FALSE)
> miu_bar=150
> sigma_bar=20/sqrt(16)
> pnorm(160,mean = miu_bar,sd=sigma_bar,lower.tail = FALSE)
[1] 0.02275
Example 6.1 (contd.): An elevator at this university has a carrying capacity of 1500
pounds. What is the probability that 9 students who enter the elevator will have a safe ride,
i.e., their total weight is less than 1500 pounds?
P(Total weight of 9 people is < 1500) = P(X̄ < 1500/9) = P(X̄ < 166.67)
= P((X̄ − 150)/(20/√9) < (166.67 − 150)/(20/√9))
= P(Z < 2.5) = 0.9938,

i.e., more than a 99% chance that they will have a safe ride (unless all these 9 guys are
from the university’s football team! — in which case we may have a very unusual and not a
representative sample!)
R code:
miu_bar=150
sigma_bar=20/sqrt(9)   # n = 9 students ride the elevator
# total weight of 9 people less than 1500: P(X_total < 1500) = P(X_bar < 1500/9)
pnorm(1500/9, mean = miu_bar, sd=sigma_bar, lower.tail = TRUE)
> miu_bar=150
> sigma_bar=20/sqrt(9)
> pnorm(1500/9,mean = miu_bar,sd=sigma_bar,lower.tail = TRUE)
[1] 0.9937903
We note that for calculating probabilities about X̄ when a single observation has the
N(µ, σ) distribution, we merely switch the σ used for a single observation to σX̄ = σ/√n.
Example 6.2: Suppose X, the score on a certain test has N(500, 100). Let X1 , . . . , X16
be a sample of scores for 16 individuals and let X be the average score for these 16 people.
Find P(550 < X ≤ 600).
Solution: X̄ has a normal distribution with µX̄ = µ = 500, and with σX̄ = σ/√n = 100/√16 = 25.
Hence,

P(550 < X̄ ≤ 600) = P((550 − 500)/25 < (X̄ − 500)/25 ≤ (600 − 500)/25)
= P(2 < Z ≤ 4)
= P(Z ≤ 4) − P(Z ≤ 2)
= 1 − 0.9772 = 0.0228.
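Since this example carries no accompanying R code, here is a Python sketch using the standard library's statistics.NormalDist; note it gives 0.0227 rather than the table-rounded 0.0228, because P(Z ≤ 4) is close to, but not exactly, 1:

```python
# Example 6.2 without tables: X_bar ~ N(500, 100/sqrt(16)) = N(500, 25),
# so P(550 < X_bar <= 600) = Phi(4) - Phi(2) on the standard normal scale.
from statistics import NormalDist

xbar_dist = NormalDist(mu=500, sigma=100 / 16 ** 0.5)   # N(500, 25)
prob = xbar_dist.cdf(600) - xbar_dist.cdf(550)
print(round(prob, 4))   # 0.0227
```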
It is a somewhat surprising fact that even if the sample is not from a normal distribu-
tion as we assumed in Fact 2, the sampling distribution of X is well approximated by the
normal curve, provided the sample size is large. This is called the Central Limit Theorem (Fact 3).
Remark 6.4 An important question is how large should n be before we can use this normal
approximation. Unfortunately, there is no simple answer, as it depends on a number of
factors. However, the larger the n, the better the normal curve approximation — at least a
sample of size of 20 is desirable in most cases, before we use this approximation.
Example 6.3: Suppose X1, . . . , X25 is a random sample from a population (not necessarily
normal) with mean µ = 700 and standard deviation σ = 10. Find P(X̄ ≤ 702).
Solution: By the Central Limit Theorem,

X̄ has approximately a N(µ, σ/√n) = N(700, 10/√25) = N(700, 2) distribution.

Thus,

P(X̄ ≤ 702) = P((X̄ − 700)/2 ≤ (702 − 700)/2) = P(Z ≤ 1) = 0.8413.
Remark 6.5 The main difference between Facts 2 and 3 is that Fact 2 holds true for any
sample size as long as the original data is assumed to be from a normal distribution, while
Fact 3 (or the Central Limit Theorem) is true even when the observations are not from a
normal curve, but the approximation is good only for sufficiently large n.
6.3 Normal approximation to the Binomial

Recall the Bin(n, p) random variable X, which counts the number of successes in n trials with
probability of success p on each trial. When the number of trials n is sufficiently large, the
Binomial probabilities can be approximated by the Normal probabilities. This result can be
stated in either of the two following equivalent ways — for the distribution of the observed
number of successes, X (Fact 4), or for the distribution of the observed proportion of
successes, p̂ = X/n, which is a fraction (Fact 4′).
∗ We can also use Python to test the accuracy of Fact 4. (See Appendix)
Fact 4: For n large enough, the Bin(n, p) random variable X can be approximated by a normal distribution with

µX = np  and  σX = √(np(1 − p)).

Fact 4′: Equivalently, for n large enough, the observed proportion p̂ = X/n is approximately normal with

µp̂ = p  and  σp̂ = √(p(1 − p)/n).
Remark 6.6 It can be demonstrated that Facts 4 and 4′ are indeed consequences of the
Central Limit Theorem, stated in Fact 3.
Remark 6.7 For this approximation to be reasonable, a general rule of thumb is that n and
p should be such that np ≥ 5 and n(1 − p) ≥ 5 . The approximation becomes quite good
when np ≥ 10 and n(1 − p) ≥ 10.
For instance, following the stricter rule, if p = 1/2, n should be greater than or equal to 20,
whereas if p = 1/4, n should be greater than or equal to 40 for the Normal approximation to be
quite good. We plot in Figure 6.3 Binomial distributions for different n and p to demonstrate
the normal approximation. (Code for creating this plot is given in the Appendix.)
Figure 6.3: Binomial distributions for n = 10, 20, 50, 80 (rows) and p = .05, .5, .8 (columns), illustrating the Normal approximation.
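The rule of thumb in Remark 6.7 can also be probed numerically. The following Python sketch (our own check, not from the text) computes the worst-case gap between the exact Bin(n, 1/2) CDF and its Normal approximation, for n = 20 and n = 100:

```python
# How good is the Normal approximation? Compare the exact Bin(n, 1/2) CDF
# with the N(np, sqrt(np(1-p))) CDF and report the worst-case gap over all k.
from math import comb
from statistics import NormalDist

def max_cdf_gap(n, p=0.5):
    normal = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)
    exact, worst = 0.0, 0.0
    for k in range(n + 1):
        exact += comb(n, k) * p ** k * (1 - p) ** (n - k)   # exact CDF at k
        worst = max(worst, abs(exact - normal.cdf(k)))
    return worst

print(round(max_cdf_gap(20), 3))    # noticeable gap at n = 20
print(round(max_cdf_gap(100), 3))   # the gap shrinks as n grows
```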
Remark 6.8 We mentioned that Facts 4 and 4′ are equivalent since both lead to the
same standardization or Z-scores, i.e., if X is a Binomial and n is sufficiently large, we can
use the Normal approximation with

µX = np  and  σX = √(np(1 − p)),

leading to

Z = (X − np)/√(np(1 − p)) = [(1/n)(X − np)] / [(1/n)√(np(1 − p))] = (p̂ − p)/√(p(1 − p)/n).
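The equivalence of the two standardizations in Remark 6.8 is easy to verify numerically; the numbers below (X = 60 successes in n = 100 trials with p = 0.5) are our own illustration:

```python
# Remark 6.8 numerically: standardizing the count X and the proportion
# p_hat = X/n gives the same Z-score (here X = 60, n = 100, p = 0.5).
n, p, X = 100, 0.5, 60
z_count = (X - n * p) / (n * p * (1 - p)) ** 0.5    # (X - np)/sqrt(np(1-p))
phat = X / n
z_prop = (phat - p) / (p * (1 - p) / n) ** 0.5      # (p_hat - p)/sqrt(p(1-p)/n)
print(round(z_count, 6), round(z_prop, 6))          # both 2.0
```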
Example 6.4: If a fair coin is tossed 100 times, what is the chance that the observed pro-
portion of heads exceeds 0.6?
Solution: If X denotes the number of heads in these 100 tosses, it has a Binomial distribu-
tion with n = 100 and p = 1/2. In terms of X, we want P(X > 60). The exact answer can be
obtained by using the binomial probabilities i.e.,
P(X > 60) = C(100, 61)(.5)^61(.5)^39 + C(100, 62)(.5)^62(.5)^38 + · · · .
Clearly, this is a very difficult expression to evaluate numerically partly because of the large
factorials involved. On the other hand, we can use Fact 4′ (or Fact 4), since n is large enough.
The observed proportion p̂ is approximately normal with

µp̂ = p = 0.5

and

σp̂ = √(p(1 − p)/n) = √((.5)(.5)/100) = .5/10 = 0.05.
Thus, we may compute the probability of interest by using the much simpler Normal
approximation to the Binomial distribution, namely

P(p̂ > 0.6) = P(Z > (0.6 − 0.5)/0.05) = P(Z > 2) = 1 − 0.9772 = 0.0228.
R code:
# Example 6.4
# the binomial distribution function can be used to get P(X > 60)
n=100
p=1/2
pbinom(60, size = n, prob = p, lower.tail = FALSE)
> n=100
> p=1/2
> pbinom(60,size = n,prob = p,lower.tail = FALSE)
[1] 0.0176001
> norm_mean=n*p
> norm_std=sqrt(100*1/2*(1-1/2))
> pnorm(60,mean = norm_mean,sd=norm_std,lower.tail = FALSE)
[1] 0.02275013
Example 6.5: Suppose the true proportion of foreign cars in California is known to be p
= 0.4. Suppose we observe a sample of n = 100 cars. What is the chance that (a) more
than half the cars observed are foreign-made? (b) the observed proportion of foreign cars is
between 0.35 and 0.45?
Solution: From Fact 4′, p̂ is approximately Normally distributed with

µp̂ = 0.4,  σp̂ = √((0.4)(0.6)/100) ≈ 0.05.
Therefore,

(a) P(p̂ > 0.5) = P(Z > (0.5 − 0.4)/0.05) = P(Z > 2) = 0.0228.

(b) Similarly,
P(.35 < pb ≤ .45) = P(−1 < Z ≤ 1) = 0.68.
Example 6.6: Suppose the admissions office at a certain university sends out 1000 ad-
mission letters to prospective students. Suppose it is known from past statistics, that the
probability that an admitted student will actually enroll at that university is 0.40. Find
the probability that fewer than 420 of the admitted students will enroll at this
university.
Solution: Let X = Number of students who actually enroll among the 1000 offered admis-
sion. This is a Binomial distribution with n = 1000 and p = 0.40. Hence,
P(X ≤ 420) = C(1000, 0)(.4)^0(.6)^1000 + C(1000, 1)(.4)^1(.6)^999 + · · · + C(1000, 420)(.4)^420(.6)^580,

a sum that is impractical to evaluate directly. Instead, by the Normal approximation (Fact 4′),

µp̂ = p = 0.4,  σp̂ = √(p(1 − p)/n) = √(.24/1000) = 0.0155.
Therefore,
P(X ≤ 420) = P(pb ≤ .42) = P(Z ≤ 1.29) = 0.9015.
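Although the text rightly avoids evaluating this Binomial sum by hand, a computer can do it by working with logarithms of the factorials. The Python sketch below (our own check, not from the text) compares the exact sum with the Normal approximation used above:

```python
# Example 6.6: evaluate the Binomial sum P(X <= 420) directly, working in
# log space to avoid enormous factorials, and compare it with the Normal
# approximation on the proportion scale.
from math import lgamma, log, exp
from statistics import NormalDist

def binom_cdf(x, n, p):
    total = 0.0
    for k in range(x + 1):
        # log of C(n, k) * p^k * (1-p)^(n-k), via the log-gamma function
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log(p) + (n - k) * log(1 - p))
        total += exp(log_term)
    return total

n, p = 1000, 0.4
exact = binom_cdf(420, n, p)
approx = NormalDist(mu=p, sigma=(p * (1 - p) / n) ** 0.5).cdf(0.42)
print(round(exact, 4), round(approx, 4))   # the two agree to about 0.01
```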
EXERCISES
6.1 An insurance company is studying the age at death of its clients. If the standard
deviation in the population is 6.2 years, how likely is the sample mean based on 64
observations to differ from the unknown population mean by 1 year or less?
6.2 The distribution of scores for persons over 16 years of age on the Wechsler Adult Intel-
ligence Scale (WAIS) is approximately normal with mean 100 and standard deviation
15. The WAIS is one of the most common “IQ tests” for adults.
(a) What is the probability that a randomly chosen individual has a WAIS score of
105 or higher?
(b) What are the mean and standard deviation of the average WAIS score x for an
independent sample of 60 people?
(c) What is the probability that the average WAIS score of the sample of 60 people
is 105 or higher?
(d) Would your answers to (a), (b) or (c) be affected if the distribution of WAIS
scores in the adult population were not normal?
6.3 If the SAT verbal scores are normally distributed with mean 420 and a standard devia-
tion of 100 and SAT math scores are distributed normally with mean 480 and standard
deviation of 100, answer the following questions:
(a) Assuming independence, find the probability one gets more than 600 on verbal
AND more than 700 on math. Is this the same as asking the chance of the total
score (verbal + Math) being more than 1300? Without calculating, which of these
outcomes do you think has a higher probability?
(b) If 10 tenth-graders attempt this exam in a particular high school, what is the
chance their average math score is more than 550?
6.4 Suppose the lifetime of light bulbs follow a distribution with mean 700 hours and
standard deviation of 25.
(a) For a random sample of 100 light bulbs, give an approximate probability distri-
bution for the sample average, and give justification through a fact or theorem.
(b) For a random sample of 100 light bulbs, find an approximate probability that the
sample average of lifetimes is greater than 750.
6.5 A soda fountain dispenses to any customer an amount X at random anywhere between
6.5 and 7.5 ounces, i.e., X is uniformly distributed between 6.5 and 7.5 ounces.
6.6 The length of a TV commercial (in seconds) is a random variable with the following
probability distribution:
6.7 A buyer for a lumber company must determine whether to buy a piece of land con-
taining 5,000 pine trees. If at least 1,000 of the trees are 32 feet or taller, the buyer
will purchase the land; otherwise, she will not. The owner of the land reports that the
distribution of the heights of the trees has mean of 30 feet, and standard deviation of 3
feet. Based on this information, what should the buyer decide? State any assumptions
you made.
6.8 The probability that a door-to-door salesperson makes a sale when she visits a house
is 0.2. There are 10 houses on a particular street.
(a) What is the probability that she makes at least one sale on this street?
(b) What is the probability that the number of sales is less than or equal to 3?
(c) On a typical day, the salesperson visits 100 houses. What is the approximate
probability that she makes a sale in at least 40% of them?
6.9 The probability that a person with a certain disease is cured after taking a new medicine
is 0.40. Suppose there are 10 patients and the probability of a patient being cured is
independent of what happens to the other patients.
(a) What is the exact probability that at least three patients out of the ten will be
cured?
(b) What is the mean and the standard deviation of the number of patients that will
be cured out of the 10?
(c) Suppose that new medicine has now been tested out on a different group of 500
patients. What is the approximate probability that at least 190 of the patients
will be cured?
6.10 A restaurant offers its patrons free dinner on the 10th of each month, if their birthday
falls on that month. What is the approximate probability that out of 200 customers
that they plan to serve on the coming May 10th, more than 25 will be entitled to a
free meal? (You may assume that the birthday of a person is equally likely to fall in
any one of the 12 months.)
6.11 A radio station in Santa Barbara claims 30% of the people in town listen to it on
a regular basis i.e., p = 0.3. In a random sample of 200 radio listeners, find the
probability that the number who tune to this station is 50 or less.
6.12 A new drug aimed at alleviating mental depression seems to be 80% effective, according
to an intensive prelicensing testing program.
(a) If the new drug is prescribed for 15 patients, with what probability can we expect
fewer than 11 to benefit from it?
(b) If the drug is prescribed for 150 patients, how likely is it that fewer than 110 will
benefit?
(c) Of 1500 patients taking this drug, what are the chances that fewer than 1100 will
benefit from it?
(d) With 15,000 patients being given the new drug, what is the probability that fewer
than 11,000 will benefit from it?
6.13 A statistics professor decides to give a 20-question true-false quiz to determine who
has read the weekly assignment. She wants to choose the passing mark such that the
probability of passing a student who guesses on every question is less than .05. What
score should she set as the lowest passing grade?
(a) If we selected 6 people randomly, what is the chance that 2 of these 6 jog regularly?
(b) Suppose we take a random sample of 200 adults and observe the proportion of
adults who jog in our sample. Find the probability that this is between 15% and
22%.
6.15 Suppose there is an 80% chance that an insect egg hatches and becomes an adult. If
a “mommy insect” lays 100 eggs, find the chance that 75 or more of these eggs hatch
and grow into adults.
6.16 Suppose 55% of self-employed workers in the United States do not have health insurance
coverage (i.e. they are uninsured). 100 self-employed workers are randomly surveyed.
What is the probability that at least one half of those surveyed will be uninsured?
Chapter 7
Estimation and Confidence Intervals
Thus far, we have been describing how the number of successes follows a Binomial distri-
bution and how variables like heights or scores or their means follow a Normal curve. We
have assumed all this time that we knew the so-called parameters — the true proportion
or probability p in a Binomial distribution or the µ and σ for a Normal curve. However, in
practice, these parameters are typically unknown and one of the main purposes of statistics
is (i) to estimate the parameters or (ii) to test hypotheses about these unknown parameters
using sample data. This is, as we said before, part of statistical inference and in this
chapter, we will try to cover some basic ideas in estimation.
The parameters p, µ or σ are generally unknown. For instance, in the Binomial context,
the true proportion or “probability of success” p is unknown. However, if we conducted
the Binomial experiment n times and observed, say, X successes among these n trials, it is
intuitively clear that the observed proportion of successes in n trials, p̂ = X/n, is a reasonable
estimate of p. Indeed, in Fact 4′ in Chapter 6, we observed

µp̂ = p,

so that such p̂ values, which can vary from one sample to the other, still center at the true
(but unknown) value p. This means that p̂ is an unbiased estimator of p. Similarly, in the
case of data (X1, X2, · · · , Xn) from a Normal curve with unknown center µ, the sample
mean

X̄ = (X1 + X2 + · · · + Xn)/n

is a natural heuristic first guess at this µ. Again, Fact 1 of Chapter 6 tells us that µX̄ = µ,
so that X̄ is unbiased for µ. Likewise, the sample variance

S² = (1/(n − 1)) Σ (Xi − X̄)²,  summed over i = 1, . . . , n,

estimates the true variance σ².
Such estimates are called point estimates since they give a single value or point as an
estimate of the corresponding unknown quantity. In fact, it can be shown that these are all
very good point estimates in the sense that they are not only unbiased (i.e., centered at
the value we are trying to estimate) but also have the smallest possible spread around that
center among all unbiased estimators (see Section 6.1).
p̂ = X/n, if X successes are observed in n trials;

µ̂ = X̄  and  σ̂² = S².

These are known to be the best (unbiased as well as having minimum variance) point
estimates of the corresponding parameters.
R Code Instruction:
In R, if the unbiased estimator is a mean of the sample data, it can be obtained using
the built-in function mean( ) that was mentioned before. Similarly, for the estimator of the
population proportion p, the equation p̂ = X/n is used to calculate p̂. As an example, the
following data set contains different genders and we calculate the proportion of females in
the data set.
Example :
dataset1=c("female","male","male","female","male","female","female")
X=sum(dataset1=="female")
n=length(dataset1)
phat=X/n
print(phat)
> dataset1=c("female","male","male","female","male","female","female")
> X=sum(dataset1=="female")
> n=length(dataset1)
> phat=X/n
> print(phat)
[1] 0.5714286
Note: For estimating the variance, the built-in function var() is used as mentioned in
Chapter 2.
7.2 Confidence interval for mean with known σ

An alternative to finding point estimates is to seek a possible range of values for the unknown
parameter and to state along with it the confidence that we have in such an interval of values.
The latter approach is called interval estimation and the intervals are called confidence
intervals.
Suppose (X1, X2, · · · , Xn) come from a Normal curve with unknown mean µ and known σ. Then

X̄ = (X1 + X2 + · · · + Xn)/n

has a Normal distribution with µX̄ = µ, σX̄ = σ/√n. Hence,

Z = (X̄ − µ)/(σ/√n)

has a N(0, 1) distribution. Thus, from the 68-95-99.7 rule, or more accurately from Table A,
we can claim that the Z-value lies between (−1.96, 1.96) with a 95% probability. In other
words, the chance is 95% that

−1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96

or

(−1.96) σ/√n ≤ (X̄ − µ) ≤ (1.96) σ/√n

or

X̄ − 1.96 σ/√n ≤ µ ≤ X̄ + 1.96 σ/√n.
This interval, (X̄ − 1.96 σ/√n, X̄ + 1.96 σ/√n), is entirely known once we have the data (and
σ is known) and can hence be used as an interval estimate of µ. The interval is called a 95%
confidence interval for µ when σ is known, and the corresponding probability, 95%, is
called the confidence level or confidence coefficient.
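The interval just derived can be wrapped in a small Python helper (a sketch: the function name and the illustrative numbers x̄ = 100, σ = 15, n = 36 are our own, and the conf argument anticipates the general confidence levels discussed next):

```python
# The interval (X_bar - z*sigma/sqrt(n), X_bar + z*sigma/sqrt(n)) as a helper.
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # e.g. 1.96 when conf = 0.95
    half = z * sigma / n ** 0.5                # the margin of error
    return (xbar - half, xbar + half)

# hypothetical data: xbar = 100, known sigma = 15, n = 36 observations
low, high = z_interval(100, 15, 36)
print(round(low, 1), round(high, 1))   # 95.1 104.9
```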
We can, of course, arrange for any other confidence level C (usually 0.90, 0.95, 0.99, etc.)
by looking up Table A for the appropriate Z-value. If C is the desired confidence level, we
need a Z-value to correspond to a right-tail area of (1 − C)/2. It is more common to use (1 − α)
in place of C, where α is some small given value (usually 0.1, 0.05, 0.01, etc.). Then, we seek
values −z and z such that there is a probability (1 − α) in between these two values and α/2
in each tail. We denote such a z by zα/2. This makes the area between −zα/2 and zα/2 the
probability C = (1 − α) that we desire.
For instance, 1.96 corresponds to a 95% confidence level, while 2.576 corresponds to a
99% confidence level. Or, in other words, z.025 = 1.96 and z.005 = 2.576.
[Three N(0, 1) curves: the first shows the central area C = (1 − α) between −zα/2 and zα/2, with area α/2 in each tail; the second shows the central 95% between −1.96 and 1.96; the third shows the central 99% between −2.576 and 2.576.]
The zα/2 values corresponding to different confidence levels are obtained readily from the
last row of Table C of this book.
We may now restate the more general confidence interval for the unknown mean µ, as
follows:
X̄ ± zα/2 σ/√n
R Code Instruction:
From what has already been discussed, we know that X̄ follows a normal distribution with
µX̄ = µ and σX̄ = σ/√n, so that the interval estimate with known σ is obtained as x̄ ± 1.96 σ/√n
with 95% confidence. Recall that the interval (−1.96, 1.96) contains 95% probability under
the standard normal distribution. In R, we can use the built-in function qnorm() to find
such values or z-scores corresponding to any specified probability. For example:
qnorm(0.95)
qnorm(0.975)
> qnorm(0.95)
[1] 1.644854
> qnorm(0.975)
[1] 1.959964
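For readers following along in Python rather than R, the standard library offers the same quantile lookups through statistics.NormalDist:

```python
# Python's counterpart of R's qnorm(): statistics.NormalDist().inv_cdf()
from statistics import NormalDist

std_normal = NormalDist()                    # N(0, 1)
print(round(std_normal.inv_cdf(0.95), 6))    # matches qnorm(0.95)
print(round(std_normal.inv_cdf(0.975), 6))   # matches qnorm(0.975), i.e. 1.96
```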
A confidence interval in this case can be found through R using qnorm() function. We
will use Example 7.1 to illustrate this.
Example 7.1: Suppose X, the scores of a test on numerical ability, have a Normal distri-
bution with unknown µ and known σ= 60. Suppose the sample values (X1 , . . . , X900 ) give
a sample mean X = 272. Find a 95% as well as a 98% confidence interval for the (unknown)
true mean µ.
Solution: We have µ̂ = X̄ = 272, so the 95% confidence interval for µ is

(X̄ − 1.96 σ/√n, X̄ + 1.96 σ/√n)
= (272 − 1.96 × 60/√900, 272 + 1.96 × 60/√900)
= (272 − 3.92, 272 + 3.92)
= (268.08, 275.92).
For a 98% confidence interval for µ, we replace 1.96 by 2.326 (see Table C) and get (272 − 2.326 × 2, 272 + 2.326 × 2) = (267.35, 276.65).
R code:
xbar=272
sigma=60
n=900
ci=qnorm(0.975)*(sigma/sqrt(n))   # z for a 95% CI is qnorm(0.975) = 1.96
confidenceInterval=xbar+c(-ci,ci)
print(confidenceInterval)
> xbar=272
> sigma=60
> n=900
> ci=qnorm(0.975)*(sigma/sqrt(n))
> confidenceInterval=xbar+c(-ci,ci)
> print(confidenceInterval)
[1] 268.0801 275.9199
Remark 7.2 From Fact 3, we know that if the sample size n is large,

Z = (X̄ − µ)/(σ/√n)
is approximately N(0, 1), no matter what the parent population is. We can therefore use
the same confidence intervals and confidence statements for µ for any data set, provided n
is large.
7.3 Choice of sample size

The margin of error (or half-width of the confidence interval), zα/2 σ/√n, depends on the
following three factors: the confidence level (through zα/2), the population standard deviation
σ, and the sample size n. Given the first two, we can ask “What is the sample size n needed
for the margin of error to be a specified value, say m?” Setting

m = zα/2 σ/√n

or, √n = zα/2 σ/m

or, n = (zα/2)² σ²/m².
Thus, the required sample size is n = (zα/2 σ/m)², rounded up to the next whole number.
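The sample-size formula is a one-liner to compute; in this Python sketch the numbers σ = 15 and m = 2 are hypothetical:

```python
# Sample size for a desired margin of error m: n = (z * sigma / m)^2,
# rounded UP, since n must be a whole number of observations.
from math import ceil
from statistics import NormalDist

def sample_size(sigma, m, conf=0.95):
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return ceil((z * sigma / m) ** 2)

# hypothetical: sigma = 15, margin of error m = 2 at 95% confidence
print(sample_size(15, 2))   # 217, since (1.96 * 15 / 2)^2 = 216.09
```

Halving the margin of error roughly quadruples the required n, since m enters the formula squared.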
Example 7.2: Suppose (X1 , . . . , Xn ) are heights from N(µ, σ), with σ given to be 10”. If
in a sample of size 16, we obtain X = 60” , find a 95% confidence interval for µ. Find the n
needed in order that the margin of error m = 3 inches.
Solution: Since we want a confidence level of 95%, C = 0.95. From the last row of Table
C, the appropriate zα/2 = 1.96. Thus, the required confidence interval is:
x̄ ± zα/2 σ/√n = 60 ± 1.96 × 10/√16 = 60 ± 4.9 = (55.1, 64.9).
However, if we wish to keep the margin of error to m = 3 inches, then the required sample
size is obtained by substituting in the equation above and we obtain
n = (zα/2 σ/m)² = (1.96 × 10/3)² = 42.68,
which we round up to 43. Thus if we increase the sample size to 43 instead of the original
16, we can reduce the margin of error from 4.9 inches to 3 inches.
7.4 Confidence interval for mean with unknown σ

When σ is unknown, we estimate it by the sample standard deviation S, where

S² = (1/(n − 1)) Σ (Xi − X̄)²,  summed over i = 1, . . . , n,

and consider the standardized quantity

t ≡ (X̄ − µ)/(S/√n).
Notice that when we know σ we use it to compute the Z-statistic whereas when it is
unknown, we replace σ by the sample estimate S. The result is now called a t-statistic and
it does not follow the N(0,1) distribution any longer.
This t-statistic however follows a t(n − 1) distribution where (n − 1) is called the degrees
of freedom (df). A t-distribution is very similar in shape to the Standard Normal or Z
distribution except that it has somewhat heavier (or fatter) tails (See Figure 7.3). In other
words, there is less concentration near the center compared to the Standard Normal curve.
As the degrees of freedom of a t distribution increase, it becomes indistinguishable from a
N(0, 1) distribution. The t-values corresponding to the commonly used upper-tail areas are
given in Table C. Each row corresponds to a different df with the last row (∞ df) giving us
the Z-values of the N(0, 1) distribution.
Recall that when σ is known, we use

z = (X̄ − µ)/(σ/√n).

Figure 7.3: The t(2) and t(8) densities compared with the N(0, 1) curve; the t curves have heavier tails.
Similarly, when σ is unknown, we can replace σ by S and use the corresponding fact that

t = (X̄ − µ)/(S/√n)

has a t(n − 1) distribution, leading to the interval

X̄ ± tα/2 × S/√n,

with confidence level (1 − α). Note that tα/2, coming from Table C and corresponding to (n − 1)
df, satisfies:

P(t(n − 1) > tα/2) = α/2.
We thus get:
X̄ ± tα/2 S/√n
R Code Instruction:
For finding a confidence interval for the mean with unknown variance, the t-score is used
instead of the z-score. R can find the t-score for a specified probability using the
built-in function qt(p, df).
Example 7.3: A scientist who is studying the brain-weights of tigers took a random sample
of 16 animals and measured their brain-weights in ounces. Suppose this data gave a sample
mean x̄ = 10 and standard deviation s = 3.2. Assuming that these weights follow a Normal
distribution, find a 95% confidence interval for the true mean weight µ.
Solution: Here, n = 16, X = 10, S = 3.2. Since we want a 95% confidence, α = 0.05 and
α/2= 0.025. Looking up Table C, we have tα/2 = 2.131 corresponding to 15 df, so that the
required confidence interval is
10 ± 2.131 × 3.2/√16 = 10 ± 1.705 = (8.295, 11.705).
R code:
xbar1=10
s=3.2
n=16
ci1=qt(0.975,df=n-1)*(s/sqrt(n))   # t for a 95% CI is qt(0.975, 15) = 2.131
ConfidenceInterval=xbar1+c(-ci1,ci1)
print(ConfidenceInterval)
> xbar1=10
> s=3.2
> n=16
> ci1=qt(0.975,df=n-1)*(s/sqrt(n))
> ConfidenceInterval=xbar1+c(-ci1,ci1)
> print(ConfidenceInterval)
[1]  8.29484 11.70516
Thus we are 95% confident that the true mean weight of a tiger’s brain is somewhere
between 8.295 and 11.705 ozs. (Just for comparison, a human brain weighs on the average
46 ozs, while an African elephant’s brain weighs 158 ozs, on the average.)
We mentioned in Section 7.1 that the sample variance, S 2 , is a good estimate of the true
(population) variance, σ 2 . To obtain a confidence interval for σ (or σ 2 ), we need to introduce
yet another distribution called the chi-square (written as χ2 ) distribution.
[Figure: χ² densities for df = 2, 4, and 10.]
The procedure we are about to describe (called the equal-tails method) works well for
reasonably large n, although choosing equal tails is not the “optimal” thing to do. If we
wish to set a C = (1 − α) level confidence interval for σ for a given sample size n, consult a
χ²(n − 1) distribution and find the low and high values χ²1−α/2 and χ²α/2 respectively, so that
there is a probability of α/2 in each tail (and hence a probability of C = (1 − α) in between
χ²1−α/2 and χ²α/2). Such values can be found from Table D.
Since

χ²1−α/2 ≤ (n − 1)S²/σ² ≤ χ²α/2

with probability C = (1 − α), solving for σ gives

√[(n − 1)S²/χ²α/2] ≤ σ ≤ √[(n − 1)S²/χ²1−α/2].
where χ²1−α/2 and χ²α/2 are values from a χ²(n − 1) distribution with areas α/2
below χ²1−α/2 and above χ²α/2, respectively.
R Code Instruction:
For finding a confidence interval for σ, R can compute the needed χ² cutoff values using the built-in function:
qchisq(p, df)
Example 7.3 (contd.): Recall here n = 16, X = 10 and S = 3.2. Find a 90% confidence interval for σ.
Solution: We consult a χ²(15) distribution to find χ²_{0.95} = 7.26 as the value with 5% probability below it, and χ²_{0.05} = 25.00 as the value with 5% probability above. Therefore, the 90% confidence interval is
(√(15 × (3.2)²/25.00), √(15 × (3.2)²/7.26))
or
(2.48, 4.60).
That is, we have 90% confidence that the true σ is in between 2.48 and 4.60.
R code:
s=3.2
n=16
chi1=qchisq(0.95,df=n-1)   # upper cutoff: area 0.05 above it
chi2=qchisq(0.05,df=n-1)   # lower cutoff: area 0.05 below it
ci_low=sqrt((n-1)*(s^2)/chi1)
ci_high=sqrt((n-1)*(s^2)/chi2)
Ci=c(ci_low,ci_high)
print(round(Ci,3))
[1] 2.479 4.599
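The same equal-tails interval for σ can be sketched in Python (assuming scipy is available; chi2.ppf(p, df) plays the role of R's qchisq(p, df)):

```python
from math import sqrt
from scipy.stats import chi2

s, n, alpha = 3.2, 16, 0.10               # 90% interval, as in the example
chi_hi = chi2.ppf(1 - alpha/2, df=n - 1)  # area alpha/2 above it: 25.00
chi_lo = chi2.ppf(alpha/2, df=n - 1)      # area alpha/2 below it: 7.26
lo = sqrt((n - 1) * s**2 / chi_hi)
hi = sqrt((n - 1) * s**2 / chi_lo)
print(round(lo, 2), round(hi, 2))  # → 2.48 4.6
```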
Suppose we make n Binomial observations and X of these result in successes. Then we have seen that
p̂ = X/n = observed proportion
is a good point estimate of p. For instance, if 10 tosses result in 6 heads, then p̂ = 6/10 = 0.6 is a point estimate of "the probability of getting heads".
Recall that the sample proportion p̂ has mean and standard deviation
µ_p̂ = p and σ_p̂ = √(p(1 − p)/n),
so that
Z = (p̂ − p)/√(p(1 − p)/n)
has an approximate N(0, 1) distribution. Therefore, using the z_{α/2} corresponding to a specified confidence level C = (1 − α), there is a chance C = (1 − α) that
−z_{α/2} ≤ (p̂ − p)/√(p(1 − p)/n) ≤ z_{α/2}
or
p̂ − z_{α/2}√(p(1 − p)/n) ≤ p ≤ p̂ + z_{α/2}√(p(1 − p)/n).
In large samples, it is justified to plug in p̂ in place of p inside the square root on either side of the above inequality. This results in
p̂ ± z_{α/2}√(p̂(1 − p̂)/n)
as the confidence interval for p with confidence level C.
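As an illustration, the interval p̂ ± z_{α/2}√(p̂(1 − p̂)/n) for the coin example above (6 heads in 10 tosses) can be sketched using only the Python standard library; NormalDist().inv_cdf is the Normal quantile function:

```python
from math import sqrt
from statistics import NormalDist

x, n = 6, 10                      # 6 heads in 10 tosses
phat = x / n                      # 0.6
z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for 95% confidence, about 1.96
margin = z * sqrt(phat * (1 - phat) / n)
print(round(phat - margin, 3), round(phat + margin, 3))  # → 0.296 0.904
```

(With n as small as 10 the Normal approximation is rough; the numbers are shown only to illustrate the formula.)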
7.6 Confidence interval for proportion 127
R code Example:
dataset1=c("female","male","male","female","male","female","female")
X=sum(dataset1=="female")
n=length(dataset1)
phat=X/n
qhat=1-phat
ci2=qnorm(0.90)*sqrt(phat*qhat/n)   # qnorm(0.90) = 1.28, i.e. an 80% two-sided interval
confidenceint=phat+c(-ci2,ci2)
print(confidenceint)
[1] 0.3317222 0.8111350
As before, we can ask how large a sample we should draw to attain a given precision. Suppose we wish to find n such that the margin of error z_{α/2}√(p(1 − p)/n) is some pre-selected value m. Solving an equation like the one in the last section, we have
n = z²_{α/2} p(1 − p)/m².
Since p is unknown, we have to plug in our best guess of p, say p0, in the above equation. If no guess is available, a conservative approach is to take p = 1/2. This choice gives us the maximum sample size needed, no matter what the true value of p is. This results in
n = z²_{α/2}(1/2)²/m² = z²_{α/2}/(4m²).
n = z²_{α/2} p0(1 − p0)/m²
is the sample size required for the margin of error to be m, where p0 is the best prior
guess about p.
If no prior information about p is available, use the conservative value p0 = 1/2.
Example 7.4: In obtaining a 90% confidence interval for p, if it is desired to have an error of no more than 3 percentage points, i.e., m = 0.03, how large an n do we need?
Solution: Since (1 − α) = 0.90, α/2 = .05 and z_{α/2} = 1.645. We want m = 0.03. Therefore,
n = (1.645)²/(4(.03)²) = 751.67,
so we round up and take n = 752.
R code:
z=qnorm((1-0.9)/2, lower.tail=FALSE)
m=0.03
n=z^2/(4*m^2)
n
[1] 751.5398
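The conservative sample-size formula n = z²_{α/2}/(4m²) can also be sketched with the Python standard library alone, rounding up to the next whole observation:

```python
from math import ceil
from statistics import NormalDist

alpha, m = 0.10, 0.03                  # 90% confidence, margin of error 0.03
z = NormalDist().inv_cdf(1 - alpha/2)  # 1.645
n = z**2 / (4 * m**2)                  # 751.54
print(ceil(n))  # → 752
```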
EXERCISES
7.1 A psychologist is studying "learning" in rats and wants to determine the overall time required for rats to learn to traverse a maze. She randomly selected 12 rats and recorded the following times (in minutes):
5.1, 1.6, 6.8, 2.3, 5.4, 1.8, 2.6, 6.2, 3.6, 3.4, 4.4, 4.8
(a) Assuming this data is from a N(µ, σ) distribution where σ is known to be 3.7,
estimate the average time required for rats to learn to traverse this maze with a
90% confidence interval.
(b) If we wished to estimate this with 90% confidence and with a margin of error, m
= 1 minute, how large a sample will be needed?
7.2 The League for Moral Rectitude felt it was necessary to verify if the average hemline
of skirts worn by girls at a local college is more than 4” above the knee. Suppose µ
denotes the true mean (inches above the knee). Careful measurement of a random
sample of 15 girls yielded the following data (in inches above the knee):
2, 5, 7, 4, 5, 7, 6, 8, 0, 9, 4, 5, 3, 2, 4
7.3 If a sample of 40 men have a mean weight of 178 pounds with a sample standard
deviation s of 8.5, find a 99% confidence interval for the population mean weight.
7.4 A company is interested in estimating the mean number of days of sick leave taken by
its employees.
(a) The firm’s statistician selects 25 personnel files at random and notes the number of
sick days taken by each employee. The following sample statistics are computed:
x = 12.2 days and s = 10 days. Find a 90% confidence interval for the mean
number of sick days taken by each employee.
(b) How many personnel files would the statistician have to go through to estimate
the mean number of sick days within a margin of error of 2 days with a 99%
confidence interval? (Assume σ = 10 days)
Find a 95% confidence interval for the mean nicotine content, µ, for this brand of cigarettes. (Assume the nicotine content is normally distributed with both µ and σ unknown.)
7.6 A random sample of the birth weights of 20 babies born in a Santa Barbara hospital
are recorded. This sample has x = 6.87 lbs. and s = 1.76. Find a 99% confidence
interval for the mean weight of babies born in this hospital.
7.7 Nine measurements of the percentage of sugar in 9 boxes of cereal A yielded the fol-
lowing results:
Construct a 90% confidence interval for the mean percentage of sugar in cereal A.
7.8 A physical model suggests that the mean temperature increase in the water used as
coolant in a compressor chamber should not be more than 5◦ C. Temperature increases
in the coolant measured on 8 independent runs of the compressing unit revealed the
following data:
Give a 95% confidence interval for the mean increase of the temperature in the coolant.
7.9 The amount of dissolved oxygen in water is an important measure of the quality of
water and its ability to support aquatic life. The following readings (mg/l) were ob-
tained when 12 random samples downstream from an industrial region were tested.
7.13 6.68 6.14 6.39 5.14 5.28 4.47 5.75 7.05 4.78 6.79 5.67
7.10 A manager of a large production facility wants to determine the average time required
to assemble a widget. A random sample of the times to produce 15 assembled widgets
gave x = 15.2 minutes and s = 2.2 minutes.
(a) Assuming that the assembly times are normally distributed, estimate the mean
assembly time with 95% confidence.
(b) How would your answer to part (a) change if the population standard deviation
were known to be 2.0 minutes?
7.11 If 12 out of a sample of 50 Halloween-revelers at Isla Vista are from out of town, find
a 95% confidence interval for the true proportion of people coming from out of town
for the Halloween.
7.12 In a telephone survey on the subject of the death penalty, 250 out of the 400 people
contacted said they are in favor. Find a point estimate for the “true” proportion of
people who support death penalty, as well as a 95% confidence interval for it.
7.13 A random sample of 200 persons from the labor force of a large city are interviewed,
and 22 of them are found to be unemployed. Give a 95% confidence interval for the
unemployment rate in the city.
7.14 A news reporter would like to predict the percentage of members of Congress who favor
a certain controversial change in the banking laws. If a poll of 120 members yields 72
who support the change, find a 90% confidence interval for the true percentage of
members favoring the change.
7.15 A news organization is interested in finding out the proportion of people who believe
the explanation of a high-ranking government official concerning an alleged incident.
(a) Find the smallest sample size needed to attain a desired margin of error of 0.05
for a 95% confidence interval for the true proportion.
(b) Suppose now that out of 100 people surveyed randomly, 55 people were found to
believe the explanation. Give a 90% confidence interval for the true proportion
of people who believe the official.
(c) If 10 people are sampled randomly and the true proportion is 0.5, what is the
exact probability that the number of people who believe the official is less than
or equal to 8?
7.16 A survey is to be conducted to estimate the true proportion of faculty in the UC system
who favor “Affirmative Action”. How large a sample should be used if we want that
with 90% confidence, the sample proportion will not differ from the true proportion
by more than 0.03?
7.17 An employee of the CHP desires to estimate the true proportion p of California drivers
who wear seat belts.
(a) How many drivers should be sampled in order that the sample proportion will not
differ from the true proportion by more than 0.03 with 90% confidence.
(b) Suppose 100 cars were randomly stopped and 85 of these drivers wear seat belts
on a regular basis, find a 95% confidence interval for p, the true proportion.
7.18 You have been asked to conduct a survey to determine the proportion of students at
UCSB who favor a fee increase to support a UCEN expansion. If you would like your
pb to be in error of no more than ±0.05 when constructing a 90% confidence interval,
find how large a sample you will need for your survey.
7.19 (a) Of the 500 cars observed on a California freeway, 160 were foreign imports. Find
a 90% confidence interval for the true proportion of foreign imports, assuming
that this is a representative sample of cars in California.
(b) A survey is to be conducted to estimate the proportion of citizens who favor
trade restriction on imports. How large should the sample be so that with 98%
confidence, the sample proportion will not differ from the true proportion by more
than 0.04?
Chapter 8: Testing Hypotheses for a single sample
In the last chapter, we were concerned with estimating unknown parameters by using point
estimates as well as through confidence intervals. Sometimes, we are more concerned with
verifying if the observed data fits a certain hypothesis about these parameters or contradicts
it. A hypothesis is a statement about the unknown parameter, say p or µ. For
instance, if p is the probability of observing a heads after flipping a coin, we may wish to
test the hypothesis that the coin is fair, i.e.,
H : p = 1/2.
Or, if µ refers to the true or population average height of individuals, we may wish to verify
if the observed sample data is consistent with the hypothesis
H : µ = 64 inches.
Based on the sample data, we wish to test if the hypothesis H is true or false and
accordingly accept or reject it. If it is consistent with the data, we have no reason to reject
H (i.e., we accept H) and otherwise, we reject H. Before we accept or reject a hypothesis, we
should ask what the alternatives are and set up an alternative hypothesis, relative to which
we judge the original hypothesis.
8.1 Introduction
A null hypothesis, written H0 , is a statement denoting “no effect”, “no change” or “status
quo”. An alternative hypothesis, written Ha , reflects the “expected change” or what is
called the “research hypothesis”.
In hypotheses testing context, we take the attitude that we will hold on to the null
hypothesis as true and reject it only if there is sufficient evidence against it — much the same
attitude as expressed in statements like “innocent until proven guilty” or the conservative
philosophy, “do not fix things if they are not broken”. For instance,
Example 8.1: A coin when tossed 100 times, gives 62 heads. Is this a fair coin? Set up H0
and Ha .
Solution: We assume the coin is fair until proven otherwise, so that we set up the null
hypothesis
H0 : p = 1/2.
The alternative hypothesis in this general context is that the coin is not fair, or that p is different from 1/2. In other words,
Ha : p ≠ 1/2.
Such an alternative is called a two-sided alternative, since values on either side of the hypothesized value 1/2 are allowed. In some cases, we may have more specific information or even hunches about p. In such cases we may set up the alternative to be either
Ha : p > 1/2
or
Ha : p < 1/2.
These two latter types of alternatives are called one-sided alternatives, since they allow for alternative values to be only on one side of the null-hypothesis value, 1/2 in this case.
Example 8.2: Suppose there are two treatments or drugs (or a treatment and a control) that we wish to compare. Often we need to test if one treatment or drug is better than the other. With the null hypothesis being that the two drugs are equally effective, the alternative may be
Ha : Drug 1 is better than drug 2
or
Ha : Drug 1 is worse than drug 2.
Example 8.3: Suppose the average score on midterms over the past few years has been
equal to 70 out of a possible 100 points. This year, suppose with a sample of n = 100, we
obtain X = 73 . Are this year’s students any smarter? Assume that the midterm scores
follow a Normal distribution with σ = 10.
Solution: Suppose we use µ to represent this year’s true (unknown) mean score on the
midterm. Remember that X = 73 is just the mean of one sample and it can vary if we took
another sample. So we set up the null hypothesis that the students this year have the same
true mean as before (indicating status-quo), i.e.,
H0 : µ = 70
versus the alternative hypothesis suggested in the question, viz., that they are smarter,
Ha : µ > 70.
To decide whether to accept or to reject the null hypothesis, we ask “If the hypothesis
H0 is true and the true mean µ is still only 70, how likely are we to see a sample mean of
73 or even larger values than 73?” If the observed sample mean of 73 is quite possible, i.e.,
consistent with a true mean of 70, there is no reason for us to change our mind about µ still
being 70.
Remark 8.1 While a null hypothesis as we said before, represents status-quo or no change,
it is not always clear-cut as to how to set up an alternative hypothesis. If no specific
possible alternative values are being indicated by the question (corresponding to a research
hypothesis), the default is a two-sided test.
A P -value is the probability of observing a value of the test statistic at least as contradictory
to the null hypothesis (and favoring the alternative hypothesis) as the observed value, when
the null hypothesis is indeed true i.e., the probability of finding the observed value or more
extreme or contradictory values than the observed one, if the null hypothesis is true.
Thus, the P -value (also called the observed significance level) is a measure of credi-
bility of the null hypothesis, given the data. If the P -value of the observed data is very small,
we reject H0 . As will be seen in the following examples, if the alternative is one-tailed, the
P -value is the tail area beyond the observed value, in the same direction as the alternative
hypothesis. If the alternative is two-tailed, the P -value is the probability of observing a test
statistic value at least as different from the hypothesized value, on either side, i.e., twice the
observed tail area.
How small should the P -value be before we reject the null hypothesis? There is no simple
answer but the smaller the P -value, the more convincing the disagreement between the data
and H0 . Typically a P -value of less than 1% or 5% might be convincing enough for most
purposes to decide to reject the null hypothesis.
For testing H0 : µ = µ0 when σ is known, the test statistic
Z = (X − µ0)/(σ/√n)
has a N(0, 1) distribution under the null hypothesis. The computation of the P-value, i.e., the tail area, depends on the alternative.
If the alternative is Ha : µ > µ0, and z is the calculated value, the P-value is the tail area to the right of z, i.e., P(Z > z). If the alternative is Ha : µ < µ0, the P-value is the tail area to the left of z, i.e., P(Z < z). Finally, if the alternative is Ha : µ ≠ µ0, the P-value is 2 × P(Z > |z|). See Figure 8.1 for illustration.
Figure 8.1: P -values for testing H0 : µ = µ0 against various alternatives.
R Code Instruction:
In the context of hypothesis testing, we mostly use R to calculate critical values and P-values; essentially no brand-new functions are needed. We take up the example below to demonstrate:
Example 8.3 (contd.): If H0 is true and µ = 70, from Fact 2, the sample mean X follows a Normal distribution with
µ_X = µ = 70 and σ_X = σ/√n = 10/√100 = 1.
The P-value in this example is the probability that the sample mean can be as large as 73 or larger when the null hypothesis is true. Thus
P-value = P(X ≥ 73) = P((X − 70)/1 ≥ (73 − 70)/1) = P(Z ≥ 3) = 0.0013.
R code:
# H0: mu=70 vs Ha: mu>70
n <- 100
sd_x <- 10
mu_xbar <- 70
sd_xbar <- sd_x/sqrt(n)
# P-value = P(xbar >= 73) = P(Z >= z), with
z <- (73-mu_xbar)/sd_xbar
p_val <- 1-pnorm(z)   # pnorm(z) = P(Z < z)
p_val
Since this is very small, for instance less than 0.01, the null hypothesis does not appear credible and we reject H0. We may thus conclude that the true mean for this year's students is more than 70; that is, they are indeed smarter.
Example 8.3 (contd.): If on the other hand a sample mean of X = 73 is based not on 100
observations, but only say on n = 25, do we still reject H0 ?
Solution: In this case, µ_X is still equal to 70 under H0, but note that now
σ_X = σ/√n = 10/√25 = 10/5 = 2.
Thus,
P-value = P(X ≥ 73) = P(Z ≥ (73 − 70)/2) = P(Z ≥ 1.50) = 1 − 0.9332 = 0.0668.
R code:
# p_val = P(xbar >= 73) = P(Z >= z), now with n=25
n <- 25
sd_x <- 10
mu_xbar <- 70
sd_xbar <- sd_x/sqrt(n)
z <- (73-mu_xbar)/sd_xbar
p_val <- 1-pnorm(z)
p_val
> n <- 25
> sd_x <- 10
> mu_xbar <- 70
> sd_xbar <- sd_x/sqrt(n)
> z <- (73-mu_xbar)/sd_xbar
> p_val=1-pnorm(z)
> p_val
[1] 0.0668072
Since this is not sufficiently small (say smaller than 1% or 5%), we have no reason to
reject H0 that µ = 70. Thus the same difference of 3 extra points in the sample average is
convincing enough for us to reject H0 : µ = 70 if the sample mean is based on n = 100, but
not if it is based only on a sample of size n = 25.
In other words, sample means based on n = 25 have more variability and we are not able
to rule out chance variability as the cause for those extra 3 points.
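The contrast between n = 100 and n = 25 can be seen in a few lines of Python, using the standard library's NormalDist for the upper-tail area P(Z ≥ z):

```python
from math import sqrt
from statistics import NormalDist

for n in (100, 25):
    z = (73 - 70) / (10 / sqrt(n))    # z = 3 for n = 100, z = 1.5 for n = 25
    p_val = 1 - NormalDist().cdf(z)   # upper-tail P-value
    print(n, round(p_val, 4))
# prints: 100 0.0013
#         25 0.0668
```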
Example 8.3 (contd.): Suppose now that the alternative is Ha : µ < 70 and the observed sample mean (still with n = 25) is 68. To compute the P-value, we need to interpret what we mean by more "extreme" values. Under the alternative that µ < 70, we expect
smaller X values. Values of X equal to 68 or smaller are considered more extreme relative
to the hypothesized mean of 70 and in favor of the present Ha . Thus, the P -value now is the
chance of seeing the observed sample mean value of 68 or values even smaller than this (the
left tail). We get
P-value = P(X ≤ 68) = P((X − 70)/2 ≤ (68 − 70)/2) = P(Z ≤ −1) = 0.1587.
R code:
# p_val = P(xbar <= 68) = P(Z <= z), with n=25
n <- 25
sd_x <- 10
mu_xbar <- 70
sd_xbar <- sd_x/sqrt(n)
z <- (68-mu_xbar)/sd_xbar
p_val <- pnorm(z)
p_val
This is not too small; for instance, it is not less than 0.01 (or even 0.05), and so we do
not reject H0 .
How does one compute the P-value for a 2-sided alternative, say Ha : µ ≠ 70? That is, suppose we test
H0 : µ = 70 vs. Ha : µ ≠ 70.
If the alternative allows values on either side of the hypothesized value, "extreme" values of X are those that are either too small or too large, relative to 70. In terms of the calculation of the P-value for this example, this refers to X values at least 2 units smaller than 70 as well as those at least 2 units larger than 70. By the symmetry of Normal curves, this is just twice the one-tail area:
P-value = P(X ≤ 68) + P(X ≥ 72) = 2 × P(Z ≤ −1) = 0.3173.
R code:
# 2-sided test H0: mu=70 vs Ha: mu != 70
# then p_val = P(xbar <= 68) + P(xbar >= 72)
n <- 25
sd_x <- 10
mu_xbar <- 70
sd_xbar <- sd_x/sqrt(n)
z <- (68-mu_xbar)/sd_xbar
z_1 <- (72-mu_xbar)/sd_xbar
p_val <- pnorm(z) + (1-pnorm(z_1))
p_val
> n <- 25
> sd_x <- 10
> mu_xbar <- 70
> sd_xbar <- sd_x/sqrt(n)
> z <- (68-mu_xbar)/sd_xbar
> z_1 <- (72-mu_xbar)/sd_xbar
> p_val=pnorm(z) + (1-pnorm(z_1))
> p_val
[1] 0.3173105
> # it could also be calculated as 2*P(Z <= -|z|)
> p_val_same=2*pnorm(-abs(z))
> p_val_same
[1] 0.3173105
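The two-sided calculation above, twice the one-tail area, in a standard-library Python sketch:

```python
from statistics import NormalDist

z = (68 - 70) / 2                      # observed z = -1
p_val = 2 * NormalDist().cdf(-abs(z))  # both tails, by symmetry
print(round(p_val, 4))  # → 0.3173
```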
As we can see, the calculation of the P-value depends on the alternative hypothesis. In essence, the P-value calculation just finds the tail area, either one tail or both tails, depending on whether the alternative is one-sided or two-sided; if the P-value (i.e., the area) we calculate is small, say less than some α, then we reject H0.
Example 8.4: Suppose the mean height for men is known to be 66”, and on the basis of a
sample of size n = 36 women, we get X = 62”. Can we determine if on the average, women
are shorter than men? (Use σ = 10.)
Solution: If µ represents the true mean height for women, we wish to test
H0 : µ = 66 versus Ha : µ < 66.
Here σ_X = 10/√36 = 1.67. Hence,
P-value = P(X ≤ 62) = P((X − 66)/1.67 ≤ (62 − 66)/1.67) = P(Z ≤ −2.4) = 0.0082.
Since this P -value is small (less than 0.05 or 0.01), we reject H0 and conclude that the women
are shorter on the average.
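A standard-library Python sketch of the P-value computation in Example 8.4:

```python
from math import sqrt
from statistics import NormalDist

n, sigma = 36, 10
z = (62 - 66) / (sigma / sqrt(n))  # = -2.4
p_val = NormalDist().cdf(z)        # lower-tail area, since Ha: mu < 66
print(round(p_val, 4))  # → 0.0082
```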
Example 8.5: Suppose the verbal SAT score for n = 100 students gives X = 500, and it
is known that σ = 100. Test the hypothesis H0 : µ = 475 versus the 2-sided alternative, Ha : µ ≠ 475.
Solution: Here
z = (X − µ0)/(σ/√n) = (500 − 475)/(100/√100) = 25/10 = 2.5
with
P(Z > 2.5) = 0.0062.
Recall that the P-value for a 2-sided alternative takes into account the possibility that extreme values can be on either tail. We therefore double this to get
P-value = 2 × 0.0062 = 0.0124.
Since this is not smaller than 0.01, we do not have a strong enough reason to reject H0.
However, this P -value is smaller than 0.05, and someone using that as a yard-stick would
reject H0 . Thus, this P -value may be considered a borderline result in terms of providing
conclusive enough evidence in favor of the alternative.
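The borderline two-sided P-value of Example 8.5, sketched with the standard library:

```python
from math import sqrt
from statistics import NormalDist

z = (500 - 475) / (100 / sqrt(100))    # = 2.5
p_val = 2 * (1 - NormalDist().cdf(z))  # double the upper-tail area
print(round(p_val, 4))  # → 0.0124
```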
Figure 8.2: Rejection regions for testing H0 : µ = µ0 with fixed significance level α.
Example 8.4 (contd.): If, based on a sample of size n = 36 we get a sample mean of x =
64”, test H0 : µ = 66 versus Ha : µ < 66. Assume σ = 6” and use α = 0.05 .
Solution:
R code:
# one-sided test H0: mu=66 vs Ha: mu<66
n <- 36
mu_xbar <- 64
sd_x <- 6
alpha <- 0.05
sd_xbar <- sd_x/sqrt(n)
z <- (mu_xbar-66)/sd_xbar
# find critical value Z_c
Z_c <- -qnorm(1-alpha)
print(z)
print(Z_c)   # compare z with the critical value Z_c
> n <- 36
> mu_xbar <- 64
> sd_x <- 6
> alpha <- 0.05
> sd_xbar <- sd_x/sqrt(n)
> z=(mu_xbar-66)/(sd_xbar)
> #find critical value Z_c
> Z_c <- -qnorm(1-alpha)
> print(z)
[1] -2
> print(Z_c) # comparing z with critical value Z_c
[1] -1.644854
In this case, since −2 < −1.645, we do reject H0 and conclude that the true mean is less
than 66.
Very similar to the P-value approach, a fixed level of significance approach can also be used to do the same job. Depending on the alternative hypothesis, H0 is rejected when the calculated value of z falls into the shaded rejection region.
As for Python, a new package called SciPy, which is as widely used as NumPy, will be introduced in the next few sections. While SciPy is quite an extensive package, we will use just a few of its basic functions.
8.3 Fixed Level of significance 147
Example 8.5 (contd.): Recall here n = 100, x = 500, and σ = 100. Let α = .01. Also recall that we already tested H0 : µ = 475 versus Ha : µ ≠ 475 earlier using the P-value approach. To use this alternate approach,
Step 1 Compute
z = (x − µ0)/(σ/√n) = (500 − 475)/(100/√100) = 25/10 = 2.5.
Step 2 Since this is a 2-sided alternative, we find z_{α/2} corresponding to α/2 = .01/2 = .005, which, from Table C, is 2.576.
Step 3 Reject H0 if |z| > 2.576. Since 2.5 is not larger than the critical value, 2.576, we
do not reject H0 — the same conclusion we reached before by using the alternate P -value
approach.
Python code:
import numpy as np
from scipy.stats import norm
n=100
mu_xbar=500
sd=100
alpha=0.01
sd_xbar=sd/np.sqrt(n)
z=(mu_xbar-475)/sd_xbar          # compute z
Z_c=norm.ppf(1-(alpha/2))        # critical value for the 2-sided test
p_val=2*norm.cdf(-abs(z))        # p-value for the 2-sided test
print('z value is', z, '. critical value is', Z_c, '. p value is', p_val)
Compared with the R code, the way scipy.stats computes the critical value and the P-value is quite similar.
8.4 One sample t-test
In Sections 8.2 and 8.3 we illustrated testing ideas for the unknown µ after assuming that
the population standard deviation σ is known. In practice, σ also is typically unknown. In this section, we consider testing H0 : µ = µ0 when σ is not known. Recall that we mentioned in Section 7.4 that when σ is unknown, in the z-statistic
z = (x − µ0)/(σ/√n),
we substitute the sample standard deviation s for σ, giving the t-statistic
t = (x − µ0)/(s/√n),
which has a t(n − 1) distribution. Apart from this substitution and using a t-distribution in place of a z-distribution, the mechanics of hypothesis testing remain the same as in the last two sections. We now restate the three steps in fixed-α testing for this situation.
Step 1 Calculate
t = (x − µ0)/(s/√n).
Step 2 Find tα (or, for a 2-sided alternative, tα/2) from Table C, using (n − 1) df.
Step 3 Reject H0 if t falls in the corresponding rejection region: t > tα for Ha : µ > µ0, t < −tα for Ha : µ < µ0, or |t| > tα/2 for Ha : µ ≠ µ0.
R Code Instruction: When dealing with a t distribution, the entire structure of our R code remains the same except for substituting qnorm() by qt().
Python Code Instruction: Everything in Python also remains the same except for using the built-in function t.ppf(), with the syntax:
t.ppf(q, df)
which returns the quantile (inverse cumulative distribution function) of the t distribution with df degrees of freedom at probability q, the analogue of R's qt().
Step 1 Here t = (x − µ0)/(s/√n) = (64 − 66)/(5/√36) = −12/5 = −2.4.
Step 2 For this one-sided alternative with df = 35 and α = 0.05, the critical value from Table C is −tα = −1.690.
Step 3 Since −2.4 < −1.690, we reject H0 and conclude that the true mean is less than 66.
R code:
# one-sided test H0: mu=66 vs Ha: mu<66
alpha=0.05
n=36
mu_xbar=64
samp_var=25
t <- (mu_xbar-66)/sqrt(samp_var/n)
t
# Fixed level of significance approach
t_c <- qt(alpha,n-1)
print(t_c)
# p-value approach
p_val <- 1-pt(abs(t),df=n-1)
print(p_val)
> alpha=0.05
> n=36
> mu_xbar=64
> samp_var=25
> t <- (mu_xbar-66)/sqrt(samp_var/n)
> t
[1] -2.4
> # Fixed level of significance approach
> t_c <- qt(alpha,n-1)
> print(t_c)
[1] -1.689572
> # p-value approach
> p_val <- 1-pt(abs(t),df=n-1)
> print(p_val)
[1] 0.01092493
Python code:
from scipy.stats import t
alpha=0.05
n=36
mu_xbar=64
samp_var=25
test_result=(mu_xbar-66)/(5/6)     # t = (xbar - mu0)/(s/sqrt(n)) = -2.4
print('the test result is', test_result)
t_val=-t.ppf(1-alpha, n-1)          # critical value of t
print('the critical value of t is', t_val)
Example 8.7: Suppose we have a sample of size n = 16, from which we obtain x = 10 and s = 3.2. Test H0 : µ = 8 vs. Ha : µ > 8 using α = .05.
Solution:
Step 1
t = (x − µ0)/(s/√n) = (10 − 8)/(3.2/√16) = 8/3.2 = 2.5.
Step 2 Since n = 16, df = (n − 1) = 15. Corresponding to this row and for α = .05, we get tα = 1.753 from Table C.
Step 3 Since t = 2.5 > 1.753, we reject H0 and conclude that the true mean is greater than 8.
If instead the alternative were 2-sided, Ha : µ ≠ 8, we still have df = 15, but with α/2 = .025, so that tα/2 = 2.131; since 2.5 > 2.131, we would again reject H0.
Remark 8.2 The P -values are a bit harder to compute for the t-tests since Table C for the
t-distribution is not as extensive as Table A for the z-statistic. We can still use this table
to find the approximate range for the P -value. For instance, in Example 8.7, with df = 15
and observed (or calculated) value of t = 2.5, we can look up Table C to conclude that the
P -value is between 0.01 and 0.02. This follows since the tail probability for 2.249 is 0.02 and
the tail probability for 2.602 is 0.01, and our value 2.5 is right in between. Most statistical software packages provide exact P-values for t and other distributions.
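For instance, the exact P-value for t = 2.5 with 15 df can be obtained in one line (assuming scipy is available; t.sf is the upper-tail area 1 − F(t)):

```python
from scipy.stats import t

p_val = t.sf(2.5, df=15)    # upper-tail area beyond 2.5
print(0.01 < p_val < 0.02)  # → True, consistent with the Table C bounds
```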
8.5 Tests on σ
Often, we are interested in testing and controlling the variability in given measurements.
For instance, a bottler filling soda cans wants the mean to be 12 ozs. (say) and neither too
much more nor too much less. She may wish to test a hypothesis about the true σ, say
H0 : σ = σ0 . Such a test can be conducted as follows:
Step 1 Calculate
χ² = (n − 1)s²/σ0².
Step 2 Find χ²_α (or, for a two-sided test, the two numbers χ²_{1−α/2} and χ²_{α/2}) from Table D for the given df and level of significance α.
Step 3
Case A: If the alternative is Ha : σ > σ0, reject H0 if χ² > χ²_α, where χ²_α is the value with area α above it.
Case B: If the alternative is Ha : σ < σ0, reject H0 if χ² < χ²_{1−α}, where χ²_{1−α} is the value with area α below it.
Case C: If the alternative is Ha : σ ≠ σ0, reject H0 if χ² < χ²_{1−α/2} (with area α/2 to the left) or if χ² > χ²_{α/2} (with area α/2 to the right).
Python Code Instruction:
In the scipy.stats package, we use chi2 as our tool for problems involving the chi-square distribution. chi2.isf(q, df) plays the role of R's qchisq(): it returns the value with tail probability q above it (the inverse survival function), so chi2.isf(q, df) equals qchisq(1 − q, df) in R.
Example 7.3 (contd.): Suppose on the basis of n = 16 observations, we find s = 3.2. Test
the hypothesis H0 : σ = 4.2 versus the alternative Ha : σ < 4.2, using α = .05.
Solution:
Step 1 We compute
χ² = (n − 1)s²/σ0² = 15(3.2)²/(4.2)² = 8.71.
154 Testing Hypotheses for a single sample
Steps 2 and 3 Here the alternative Ha : σ < 4.2 is one-sided (see Case B above). From a χ² distribution with (n − 1) = 15 df, we find the lower-tail critical value χ²_{1−α} = χ²_{0.95} = 7.26. Since the computed value of χ², namely 8.71, is not below 7.26, we do not have strong enough reason to reject the hypothesis that the actual value of σ is 4.2.
R code:
std=3.2
sigma=4.2
n=16
alpha=0.05
chi=(n-1)*std*std/(sigma*sigma)   # observed value of chi-square
print(chi)
chi_critical=qchisq(alpha, df=n-1)   # critical value of chi-square
print(chi_critical)
> std=3.2
> sigma=4.2
> n=16
> alpha=0.05
> chi=(n-1)*std*std/(sigma*sigma) #observed chi-square
> print(chi)
[1] 8.707483
> chi_critical=qchisq(alpha, df=n-1) #critical value of chi-square
> print(chi_critical)
[1] 7.260944
Python code:
from scipy.stats import chi2
sd=3.2
n=16
sigma=4.2
chi=(n-1)*sd*sd/(sigma*sigma)
chi_critical=chi2.isf(q=0.95, df=n-1)   # critical value: area 0.95 above it, i.e. the lower 5% point
print('calculated value of chi-square is', chi)
print('critical value of chi-square is', chi_critical)
8.6 Large sample tests on proportions
In dealing with counts or proportions, we are interested in testing hypotheses about the true
proportion p. Suppose in a Binomial experiment with n trials, we observed x successes with
resulting sample proportion
p̂ = x/n.
Based on this, we wish to test
H0 : p = p0 ,
where p0 is a specified value. Recall from Fact 40 that if we have a large enough sample,
z = (p̂ − p0)/√(p0(1 − p0)/n)
has an approximate N(0,1) distribution under the hypothesis that the true p is p0 . This fact
can be used to test hypotheses on p.
Step 1 Calculate
$$z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}.$$
Steps 2 and 3 Compare z with the appropriate critical value from the normal table ($z_\alpha$ for a
one-sided alternative, $z_{\alpha/2}$ for a two-sided one), just as in the earlier large-sample tests.
Example 8.8: When a coin is tossed 100 times, suppose we get 60 heads and 40 tails. Test
if the coin is fair versus the alternative it is loaded in favor of heads, using significance level
α = 0.05.
Solution: We want to test
$$H_0 : p = \tfrac{1}{2} \ \text{(fair coin)} \quad \text{versus} \quad H_a : p > \tfrac{1}{2} \qquad (\text{here } p_0 = \tfrac{1}{2} = 0.5).$$
Since
$$\hat p = \text{observed proportion} = \frac{60}{100} = 0.6,$$
Step 1 gives
$$z = \frac{0.6 - 0.5}{\sqrt{0.5(1-0.5)/100}} = \frac{0.1}{0.05} = 2.$$
Steps 2 and 3 For this one sided alternative, we reject H0 if z > zα . From Table A, zα =
1.645. Since z = 2 > 1.645, we reject the null hypothesis H0 that the coin is fair.
R code:
p_hat=60/100
p0=1/2
n=100
alpha=0.05
std=sqrt(p0*(1-p0)/n)
z=(p_hat-p0)/std # observed z
print(z)
Z_c <- qnorm(1-alpha) # critical z for Ha: p > 1/2
print(Z_c)
> p_hat=60/100
> p0=1/2
> n=100
> alpha=0.05
> std=sqrt(p0*(1-p0)/n)
> z=(p_hat-p0)/std #observed z
> print(z)
[1] 2
> Z_c <- qnorm(1-alpha) # critical z for Ha:p > 1/2
> print(Z_c)
[1] 1.644854
Python code:
import numpy as np
from scipy.stats import norm
p_hat=60/100 # observed p (60 heads in 100 tosses)
p0=1/2 # p in H0
n=100
sd=np.sqrt(p0*(1-p0)/n)
alpha=0.05
z=(p_hat-p0)/sd # compute z
Z_c=norm.ppf(1-alpha) # critical value for the one-sided test Ha: p > 1/2
print('z value is', z, '. critical value is', Z_c)
For quick and easy reference, we summarize all these tests as well as those that appear
in the next chapter, in the form of a Table in Appendix F.
EXERCISES
8.1 A government agency is planning to send a spaceship to the moon with you in it.
Clearly, you are quite concerned about returning safely. The trip will take place only
if a test of the hypothesis H: “the ship will return safely”, is accepted, instead of
alternative A: “the ship will not return safely”. What do the Type I and Type II errors
represent here? Which will be of more concern to you?
8.2 A label on a certain cereal package states that the true mean weight of the packages
is 16 ounces (denote this by µ = 16). A consumer group insists that the true mean
weight is less than the stated weight. Suppose that the total weight of a sample of 100
boxes is 1550 ounces, and that the standard deviation of the packages is known to be
1.0 ounce.
(a) Test the hypothesis that µ = 16 versus the suggested alternative. Let α = 0.05.
(c) Suppose that µ actually is equal to 16. What is the probability that the sample
mean of 100 boxes is between 15.8 and 16.1?
8.3 Suppose the amount of soda in a can is normally distributed with a mean µ of 12
ounces. On the other hand, a random sample of 20 cans from a bottling plant gave
$\bar x$ = 11.2 ounces and $s^2$ = 0.4. Test if there is any validity to a consumer complaint that
they are being short-changed. State a null and alternative hypothesis, and test them
using α = 0.05.
8.4 A company named Acme Semiconductors has developed a new microprocessor. It wants
to test how fast one of these new chips can conduct a certain benchmark calculation.
Suppose that the time it takes to complete the calculation is normally distributed.
After 10 runs, the sample average time to completion is 32.7 seconds, and the sample
variance is 16. Letting α = 0.05, conduct a two-sided test of the hypothesis that the
true average time to completion is 30 seconds.
8.5 For the data in Exercise 7.2, test the hypothesis H0 : µ = 4 against the alternative
Ha : µ > 4 using α = 0.05 under both situations of (a) and (b).
8.6 The manufacturer in Exercise 7.5 claims in his advertisements that the mean nicotine
content is no more than 16 mgs. Test their claim at α = 0.10 level of significance.
8.7 Use the data in Exercise 7.7. The manufacturer of cereal A claims that the percentage
of sugar does not exceed 19.8%. Test the hypothesis that they are right. Use α = .05.
8.8 Use the data in Exercise 7.8. Do the data contradict the assertion of the physical
model? (Test at the level 5%.)
8.9 A manufacturer of automobile tires claims that the average number of trouble-free miles
given by one line of tires made by his company is more than 40,000. When sixteen
randomly picked tires were tested, the mean number of miles was 39,200, with the
sample standard deviation s equal to 8,200 miles. At the 5 percent level of significance,
is the manufacturer’s claim justified?
8.10 Suppose the weight of a fish caught off of the Santa Monica Pier is normally distributed.
From a random sample of 20 recently caught fish, the sample mean is found to be 22.0
ounces and the sample variance is found to be 25.0 (ounces²). The Fish and Wildlife
Department states that the true mean weight is 24.0 ounces. Fishermen claim that
due to increased pollution in the bay, the true mean weight is less than 24.0 ounces.
Test the claim of the Fish and Wildlife Department at α = 0.05.
8.11 The average IQ of entering high school students is known to be 113. A random sample
of 49 students who were firstborn children was found to have a mean IQ of 117 with a
standard deviation of 15. Is it reasonable to conclude that firstborns have a different IQ
from that of the general population of students? Give your conclusions using α = 0.01.
What is the P -value of the test?
8.12 (a) A soft-drink dispensing machine is supposed to dispense 8 ozs. per cup. The
machine needs to be adjusted if the true standard deviation, σ, of the amount per
cup is greater than 1.2 ozs. Does the machine need adjustment if the 6 sample
cups gave
8.13 (a) Past achievement test scores have been normally distributed with a mean of 25. Is
there evidence to suggest that the performance of this year’s class is significantly
above the past average? Set-up and perform an appropriate test using α = .10.
(b) Construct a 99% confidence interval for the true mean score, µ.
(c) Find a 95% confidence interval for the true standard deviation, σ.
8.14 It has been conjectured that the Dow Jones Index for stock prices is equally likely to
rise as not rise on a given day relative to the previous day’s index value. Over a period
of 1000 days, suppose that for 550 of those days the Dow Jones index value rose from
the previous day’s value. Test this hypothesis at α = 0.02.
8.15 If in 80 tosses of a coin, one observes 48 heads, test the hypothesis that the coin is fair.
Use α = .05.
8.16 A real estate agent in Santa Barbara claims that during summer, 70% of the days
are “sunny”. In other words, one can see the sun for more than 4 hours without
interruption. A client spends 10 days in Santa Barbara and observes only 5 “sunny”
days. Test the hypothesis that the agent is telling the truth. (Use α = .05.)
8.17 A politician takes a random sample of 50 voters and finds that 28 of these 50 are
pro-choice. Use a 5% level of significance to test if the actual proportion of pro-choice
voters is significantly more than 0.5. What is the P -value of the test?
8.18 Two girls who share an apartment take turns at washing dishes. Out of a total of 10
broken dishes that occurred over a quarter, 8 were caused by the younger girl. Do
you think she is clumsier (or can the event be attributed to chance)? [Use a binomial
distribution with n = 10 and p = 1/2 to find the P -value.]
8.19 It is believed that a certain die is loaded, and specifically that the tendency for a “six”
to appear is greater than if the die were fair. Suppose that after rolling the die n = 96
times, “six” appeared 24 times. Test the hypothesis that the die is fair at α = 0.01.
8.20 A sample of 900 cars passing through a busy intersection in Santa Barbara was taken
on a particular morning. It was found that 13% of the cars observed were SUVs.
The purpose of the study was to determine whether the proportion of SUVs in Santa
Barbara has increased from the last year’s value of 10%.
8.21 In a large city, 30% of the households had a particular newspaper delivered to their
doors. After the newspaper conducted an aggressive marketing campaign to increase
that figure, a random sample of 200 households was surveyed. Of the sample group,
85 households now have the paper delivered.
(a) Can we conclude at 5% level of significance that the campaign was a success?
(b) Find the P -value of the test.
Chapter 9
Comparing two samples
Quite frequently, we wish to compare the means of two groups or populations based on sam-
ples from each of them. For example, we may have available a sample of m observations on
men’s heights (x1 , x2 , . . . , xm ) and another independently drawn sample of n observations
on women’s heights (y1 , y2 , . . . , yn ). It could be of interest to test if the mean heights
are the same for these two groups. As another example, (x1 , x2 , . . . , xm ) may represent
observations corresponding to a “treatment” group, whereas (y1 , y2 , . . . , yn ) represent ob-
servations for the “control” group. We would want to test the hypothesis that the treatment
made no difference or, that the treatment and control groups have the same mean, i.e.,
$$H_0 : \mu_x = \mu_y$$
versus one of the alternatives
$$H_a : \mu_x > \mu_y, \qquad H_a : \mu_x < \mu_y \qquad \text{or} \qquad H_a : \mu_x \neq \mu_y.$$
Before we proceed to deal with genuine two-sample questions such as when comparing 2
independent samples, we will discuss a special situation where the observations (X, Y )
actually come as matched pairs and hence are not independent. For instance X could be the
score on a test for an individual before (s)he undergoes certain training and Y the score for
the same individual after such training. Or, (X, Y ) could be the heights of twins or siblings.
9.1 Paired t-test
In such cases, the independence assumption about X and Y is not valid. In Chapter 1, we
mentioned briefly the advantages of “blocking” and the paired situation is a simple yet very
useful and powerful example of such blocking. Here a block consists of the two twins or
siblings or sometimes even the same person being measured before and after. By pairing
the data and differencing, we eliminate the so-called block-effect and capture what we are
actually trying to measure.
In this case, it is easy to see that the problem of testing the equality of the means can
be translated into one which says that the differences between X and Y have mean zero.
Let µx be the true mean for X’s (say, before training) and µy the true mean for the Y’s (say,
after training). We wish to test the null hypothesis that training makes no difference, i.e.,
H0 : µx = µy
versus say, the alternative that the mean is bigger after training than before training
Ha : µx < µy .
Let d stand for the difference (X − Y ). Then it is easy to see that the mean of the
difference, µd , is equal to the difference in the means, (µx − µy ). We can now rephrase H0
and Ha in terms of µd and formulate the hypothesis as :
H0 : µd = 0 versus Ha : µd < 0.
If the data consist of the pairs of observations (xi , yi ), we can then proceed as follows.
First, compute the differences di = xi − yi for i = 1, . . . , n, along with their mean $\bar d$ and
$$s_d^2 = \frac{\sum_{i=1}^{n}(d_i - \bar d)^2}{n-1}.$$
We then use the one-sample t-statistic of the last chapter with x replaced by d and the
hypothesized mean µ0 here being zero (or it can be any other specified value for the mean
difference that might be postulated), so that we have:
$$t = \frac{\bar d - 0}{s_d/\sqrt{n}} = \frac{\bar d}{s_d/\sqrt{n}},$$
which has the t(n − 1) distribution. Against the one-sided alternative we have stated, we
reject H0 if t < −tα . This is called the paired t-test or matched pairs t-test.
Example 9.1: The amount of lactic acid in the blood was examined for 10 men, before and
after a strenuous exercise, with the following results:
Before: 15 16 13 13 17 20 13 16 14 18
After: 33 20 30 35 40 37 18 26 21 19
(a) Test if exercising changes the level of lactic acid in blood. Use α = .005.
(b) Find a 95% confidence interval for the mean change in the blood lactic acid level.
Solution:
(a) Let d = (After level - Before level). The hypotheses are H0 : µd = 0 versus Ha : µd 6= 0
since we are just checking if there is a change. The 10 values for d are 18, 4, 17, 22,
23, 17, 5, 10, 7 and 1, with $\bar d$ = 12.4 and $s_d^2$ = 63.1556. Thus,
$$t = \frac{\bar d}{s_d/\sqrt{n}} = 4.9342.$$
With df = 10 − 1 = 9 and α = .005, tα/2 = 3.69. We reject H0 if |t| > tα/2 . Since
4.9342 > 3.690, we do indeed reject H0 .
(b) By Table C, tα/2 = 2.262 for 95% confidence level when df = 9. Since s2d = 63.1556, sd
= 7.9470. Hence, the confidence interval for the mean change in lactic acid level µd , is
$$12.4 \pm (2.262)\,\frac{7.9470}{\sqrt{10}} = 12.4 \pm 5.6845 = (6.7155,\ 18.0845).$$
R code:
We are given the number of men is 10, and use alpha = 0.005. We use the mean function
to find the mean of the differences of x and y.
diff<-c(18, 4, 17, 22, 23, 17, 5, 10, 7, 1) # the 10 differences d = After - Before
dbar<-mean(diff)
dbar
s_d_sq<-63.1556
t_acid<-dbar/(sqrt(s_d_sq/10))
t_acid
t_stat_acid=qt(.9975,df=9)
round(t_stat_acid, digits=3)
acid_lower<-dbar - qt(0.975,df=9)*sqrt(s_d_sq/10)
acid_upper<-dbar + qt(0.975,df=9)*sqrt(s_d_sq/10)
print(paste(round(acid_lower,digits=4),round(acid_upper,digits=4)))
> dbar<-mean(diff)
> dbar
[1] 12.4
166 Comparing two samples
> s_d_sq<-63.1556
> t_acid<-dbar/(sqrt(s_d_sq/10))
> t_acid
[1] 4.934189
> t_stat_acid=qt(.9975,df=9)
> round(t_stat_acid, digits=3)
[1] 3.69
> acid_lower<-dbar - qt(0.975,df=9)*sqrt(s_d_sq/10)
> acid_upper<-dbar + qt(0.975,df=9)*sqrt(s_d_sq/10)
> print(paste(round(acid_lower,digits=4),round(acid_upper,digits=4)))
[1] "6.715 18.085"
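For readers following along in Python, the same paired analysis can be sketched with the scipy package (this block is our illustration, not part of the book's original listings; scipy's `ttest_rel` carries out the paired t-test directly):

```python
from scipy import stats

# Lactic acid levels for the 10 men in Example 9.1
before = [15, 16, 13, 13, 17, 20, 13, 16, 14, 18]
after = [33, 20, 30, 35, 40, 37, 18, 26, 21, 19]

# The paired t-test is a one-sample t-test on the differences d = after - before
t_stat, p_value = stats.ttest_rel(after, before)
print(round(float(t_stat), 4))   # 4.9342, agreeing with the hand computation

# Two-sided critical value t_{alpha/2} with df = 9 for alpha = .005
t_crit = stats.t.ppf(1 - 0.005 / 2, df=9)
print(round(float(t_crit), 2))   # 3.69
```

Since 4.9342 > 3.69, Python reproduces the rejection of H0 reached above.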
Case (1) Assume that the standard deviations for X and Y observations, σx and σy are
known.
Step 1 Calculate
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_x^2}{m} + \dfrac{\sigma_y^2}{n}}}.$$
Case (2) Assume the standard deviations are unknown. In this case, we need to calculate
the sample variance of the x’s, $s_x^2$ , and the sample variance of the y’s, $s_y^2$ , by the usual formulae:
$$s_x^2 = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar x)^2, \qquad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2.$$
If it is reasonable to assume that the variances in both these groups are nearly equal
(say from past experience) then we can combine these two separate estimates of variances
to obtain what is called the pooled variance
$$s_p^2 = \frac{1}{m+n-2}\left\{\sum_{i=1}^{m}(x_i - \bar x)^2 + \sum_{i=1}^{n}(y_i - \bar y)^2\right\}
= \frac{1}{m+n-2}\left\{(m-1)s_x^2 + (n-1)s_y^2\right\}.$$
Example 9.2: A medication for blood pressure was administered to a group of 13 randomly
selected patients with elevated blood pressures while a group of 15 was given a placebo. At
the end of three months, the following data was obtained on their Systolic Blood Pressure.
                     n    sample mean    s
Control group, x    15            180   50
Treated group, y    13            150   30
Test if the treatment has been effective. (Assume the variances are the same in both the
groups and use α = .01.)
Solution: Let µx denote the true mean blood pressure for the control group and µy denote
the true mean blood pressure for the treated group. The hypothesis of interest is
H0 : µx = µy versus Ha : µx > µy . Here
$$s_p^2 = \frac{(m-1)s_x^2 + (n-1)s_y^2}{m+n-2} = \frac{(15-1)\,50^2 + (13-1)\,30^2}{15+13-2} = 1761.54.$$
Thus,
$$t = \frac{(\bar x - \bar y) - 0}{s_p\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}} = \frac{180 - 150}{\sqrt{1761.54}\,\sqrt{\dfrac{1}{15}+\dfrac{1}{13}}} = 1.8863.$$
With df = m + n − 2 = 26 and α = .01, the one-sided critical value is $t_\alpha$ = 2.479. Since
1.8863 < 2.479, we do not reject H0 ; the data do not give strong enough evidence at this
level that the treatment has been effective.
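As a sketch (assuming the scipy package is available; not part of the book's own listings), the pooled-variance calculation of Example 9.2 can be checked in Python; scipy's `ttest_ind_from_stats` works directly from the summary statistics:

```python
import math
from scipy import stats

# Summary statistics from Example 9.2
m, xbar, sx = 15, 180.0, 50.0   # control group
n, ybar, sy = 13, 150.0, 30.0   # treated group

# Pooled variance and t statistic, as computed in the text
sp2 = ((m - 1) * sx**2 + (n - 1) * sy**2) / (m + n - 2)
t_stat = (xbar - ybar) / (math.sqrt(sp2) * math.sqrt(1 / m + 1 / n))
print(round(sp2, 2), round(t_stat, 4))   # 1761.54 1.8863

# scipy computes the same statistic directly from the summary numbers
res = stats.ttest_ind_from_stats(xbar, sx, m, ybar, sy, n, equal_var=True)

# One-sided critical value with df = m + n - 2 = 26 at alpha = .01
t_crit = stats.t.ppf(1 - 0.01, df=26)
print(round(float(t_crit), 3))   # 2.479
```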
Option A: The degrees of freedom is obtained from the two sample variances given above
by using the formula
$$k = \frac{\left(\dfrac{s_x^2}{m} + \dfrac{s_y^2}{n}\right)^2}{\dfrac{1}{m-1}\left(\dfrac{s_x^2}{m}\right)^2 + \dfrac{1}{n-1}\left(\dfrac{s_y^2}{n}\right)^2}.$$
If the resulting value of k is not an integer, use the integer part of it. Thus, if k turns out to
be 15.2, use k = 15 degrees of freedom in looking up the tables. This procedure provides a
more accurate approximation to the degrees of freedom, although it is slightly complicated.
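As an illustration of Option A, the following short Python sketch (ours, not from the book) evaluates k for the summary statistics of Example 9.2:

```python
# Welch's approximate df (Option A), using the summary statistics of Example 9.2
sx2, m = 50.0**2, 15   # control group variance and sample size
sy2, n = 30.0**2, 13   # treated group variance and sample size

num = (sx2 / m + sy2 / n) ** 2
den = (sx2 / m) ** 2 / (m - 1) + (sy2 / n) ** 2 / (n - 1)
k = num / den
print(round(k, 2), int(k))   # 23.35 -> use k = 23 degrees of freedom
```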
Step 2 This step is the same for Case 2 and 3 (options A and B) except for the different
degrees of freedom.
In Case (1) of last section, when the σx and σy are known, we may use the z-statistic to
obtain a (1 − α) = C level confidence interval for (µx − µy ) as
$$(\bar x - \bar y) \pm z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}.$$
If σx , σy are not given and we are prepared to assume they are equal (Case (2) of last
section), we use the t-statistic to get
$$(\bar x - \bar y) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{m} + \frac{1}{n}}.$$
Here, tα/2 has degrees of freedom (m + n − 2). Finally, in Case (3) of the last section, we
have the confidence interval
$$(\bar x - \bar y) \pm t_{\alpha/2}\sqrt{\frac{s_x^2}{m} + \frac{s_y^2}{n}},$$
where we use a t-distribution with k df (see Option A or Option B in Case (3) for the
appropriate k) to get tα/2 .
Example 9.2 (contd.): Construct a 95% confidence interval for the difference in the means
of blood pressures for the two groups , i.e., (µx − µy ).
Solution: As we have done before in the testing context, we assume that there is a common
unknown variance, which we estimate by the pooled variance. Recall $\bar x$ = 180, $\bar y$ = 150 and
$s_p = \sqrt{1761.54} = 41.9707$. Corresponding to 26 df and C = (1 − α) = .95, $t_{\alpha/2}$ = 2.056.
Therefore, the 95% confidence interval for µx − µy is
$$(180 - 150) \pm (2.056)(41.9707)\sqrt{\frac{1}{13} + \frac{1}{15}} \approx (-2.69,\ 62.69).$$
R code:
We construct a 95% confidence interval where our difference in means is 30, standard devi-
ation is 41.97, and sample sizes are 13 and 15. We use qt() to find t0.025 .
sd.pool<-41.97
t_.025<-qt(.975,df=26)
lower<-30 - t_.025*sd.pool*sqrt(1/13+1/15)
upper<-30 + t_.025*sd.pool*sqrt(1/13+1/15)
print(paste(round(lower,digits=4),round(upper,digits=4)))
> sd.pool<-41.97
> t_.025<-qt(.975,df=26)
> lower<-30 - t_.025*sd.pool*sqrt(1/13+1/15)
> upper<-30 + t_.025*sd.pool*sqrt(1/13+1/15)
> print(paste(round(lower,digits=4),round(upper,digits=4)))
[1] "-2.6912 62.6912"
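A Python version of this interval calculation might look as follows (a sketch assuming scipy is installed; compare with the R output above):

```python
import math
from scipy import stats

# 95% confidence interval for mu_x - mu_y in Example 9.2, mirroring the R code
diff_means, sp = 30.0, 41.9707   # xbar - ybar and the pooled sd
m, n = 15, 13

t_crit = stats.t.ppf(0.975, df=m + n - 2)   # t_{.025} with 26 df
margin = t_crit * sp * math.sqrt(1 / m + 1 / n)
print(round(diff_means - margin, 2), round(diff_means + margin, 2))   # -2.69 62.69
```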
9.4 Comparing two proportions in large samples
Suppose there are x successes in one Binomial experiment with m independent trials and y
successes in a second Binomial experiment with n independent trials. Let p1 and p2 denote
the true probabilities for these two populations. We wish to test H0 : p1 = p2 .
Step 1 We compute
$$\hat p_1 = \frac{x}{m}, \qquad \hat p_2 = \frac{y}{n},$$
as well as
$$\hat p = \text{combined or pooled proportion} = \frac{x+y}{m+n},$$
and then the test statistic
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}.$$
This follows a Standard Normal distribution under the null hypothesis when both m and n
are sufficiently large (say, at least 25 or 30 each).
Steps 2 and 3
Example 9.3: A sample of 180 college graduates was surveyed, 100 of them men and 80
women, and each was asked if they make more or less than $40,000 a year. The following
data was obtained:
Are men more likely to make more than $40,000 than women?
Solution:
We want to test
$$H_0 : p_1 = p_2 \quad \text{versus} \quad H_a : p_1 > p_2$$
(a higher proportion of men make more than $40,000 compared to women).
Here,
$$m = 100, \quad \hat p_1 = \frac{60}{100} = 0.6, \qquad n = 80, \quad \hat p_2 = \frac{30}{80} = 0.375.$$
Also,
$$\hat p = \text{combined/pooled proportion} = \frac{60 + 30}{100 + 80} = \frac{90}{180} = 0.5.$$
Thus, the test statistic is
$$z = \frac{0.6 - 0.375}{\sqrt{0.5(1-0.5)\left(\dfrac{1}{100} + \dfrac{1}{80}\right)}} = \frac{0.225}{0.075} = 3.$$
Since z = 3 exceeds the critical value $z_{0.05}$ = 1.645, we reject H0 and conclude that a
significantly higher proportion of men make more than $40,000.
> z<-3
> z_0.05<-qt(.95,df=Inf)
> z_0.05
[1] 1.644854
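The same two-proportion test can be sketched in Python (assuming scipy is available; this block is ours, not from the book):

```python
import math
from scipy.stats import norm

# Two-proportion z-test for Example 9.3
x, m = 60, 100   # men making more than $40,000
y, n = 30, 80    # women making more than $40,000

p1_hat, p2_hat = x / m, y / n
p_pool = (x + y) / (m + n)   # pooled proportion

z = (p1_hat - p2_hat) / math.sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
print(round(z, 4))   # 3.0

z_crit = norm.ppf(0.95)   # one-sided critical value at alpha = .05
print(round(float(z_crit), 4))   # 1.6449
```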
As mentioned in Chapter 8, we summarize all the tests that appear in this and the last
chapter in the form of a Table in Appendix F.
EXERCISES
9.1 A new drug treatment for patients with elevated cholesterol levels is administered to
six patients for a four-month test period. Results of the experiment are given in the
following table:
Patient 1 2 3 4 5 6
Cholesterol count Before 252 311 280 293 312 327
After 211 251 241 248 256 268
(a) Treating the subjects as a random sample of patients with a high cholesterol
count, test whether the drug decreases the mean cholesterol level at least by 50
units. Use 5% level of significance.
(b) What is the P -value of this test?
(Hint: Draw the picture and think!)
9.2 A psychologist suspects that the first-born twin of a pair of twins is smarter, on average.
She measures the IQs of 8 pairs of twins and obtains the following data:
twin 1 2 3 4 5 6 7 8
IQ of first-born 103 98 107 127 116 134 152 110
IQ of second-born 101 96 102 125 117 131 140 112
Does this confirm the psychologist’s theory? Do a one-sided test using α = 0.05.
Entering weight 208 215 170 140 182 177 150 191 190
Weight after 10 weeks 200 200 172 143 165 159 150 175 172
(a) Using the information on “weight losses” for the above 9 people, construct a 90%
confidence interval for the “true mean” weight loss. What assumptions are you
making?
(b) Is the claim that the program enables one to lose 10 lbs. on average substantiated?
Test using level α = .05.
9.4 A placement exam in mathematics was given to 10 students who had new-math and
to 20 students who had a traditional math training. The mean score of the modern
math students was 88 points and that of the traditional math students was 82 points.
Suppose it may be assumed that the variances of the score for modern math and
traditional math are known to be σ12 = 20 and σ22 = 12 respectively. At the 5 percent
level of significance, do the true mean scores differ significantly? Assume that the
scores in the populations are normally distributed.
9.5 A psychologist wishes to test the hypothesis that “blondes have more fun” based
on the “Subjective Fun Scale” (higher scores indicating more fun). 13 brunettes are
selected at random and their SFS measured. He then bleaches the hair of all 13 subjects
blonde, waits two weeks and administers the SFS again. The scores are given in the
following table.
Person 1 2 3 4 5 6 7 8 9 10 11 12 13
Brunettes 70 47 57 61 65 57 58 58 62 56 52 60 61
Blondes 50 56 63 73 51 65 60 63 64 66 61 83 71
(b) If in the above experiment, the investigator selects 13 brunettes at random and
independently another 13 blondes at random and administers the SFS to them, result-
ing in the same table, use α = 0.05 to test if blondes have more fun.
(c) If your conclusions are different in (a) and (b), how do you explain it?
9.6 To compare two programs for training industrial workers to perform a skilled job, 20
workers are included in an experiment. Of these 10 are selected at random to be
trained by method 1; the remainder to be trained by method 2. After completion of
training, all the workers are subjected to a time-and-motion test that records the speed
of performance of a skilled job. The following data are obtained:
Method 1 15 20 11 23 16 21 18 16 27 24
Method 2 23 31 13 19 23 17 28 26 25 28
(a) Test the hypothesis H0 : mean job time after training with method 1 is the same
as after training with method 2 against alternative A: the first mean time is less
than the second. (Assume normality and use level α = .05).
(c) Suppose that the data above are obtained from paired samples (e.g., each column
of the table represents observations of two workers of the same age, mental and
physical abilities, etc.). Answer now the questions (a) and (b).
9.7 In one discussion section of a statistics course, there were 15 students and for the
midterm exam, their mean was 70 points and the sample variance equal to 36. In
another section, there were 20 students whose mean was 78 with a sample variance of
64. Is there reason to believe the second section is significantly better than the first?
(Use α = 0.01).
9.8 The manufacturer of a new type of baseball claims that their product has the same
playing characteristics as the one currently in use. To verify the manufacturer’s claim,
20 baseballs of the currently used type and 15 of the new type are placed in an auto-
matic ball-hitting machine, and the distance the machine hits each baseball is recorded.
The average distance for the baseball in current use is 352 ft with a standard deviation
of 20 ft, while that for the new ball is 375 ft with a standard deviation of 25 ft. Do
you think that the manufacturer’s claim is valid? (Use α = 0.10).
9.9 In a study of cereal beetle damage on oats, researchers measured the number of beetle
larvae per stem in small plots of oats after randomly applying one of the two treatments:
no pesticide (the control), or malathion at the rate of 0.25 lb/acre. The data appear
nearly normal. Here are the summary statistics:
Group Treatment n x s
1 Control 12 3.5 1.2
2 Malathion 15 1.4 0.5
9.10 Electrical measurements on the nerve activity of 6 rats poisoned with DDT gave the
following observations:
Test the hypothesis that the mean level of electrical activity is the same in both groups,
versus the alternative hypothesis that poisoning increases the electrical activity. (Use
α = .05). Assume that the two samples come from normal distributions with the same
variance.
9.11 The percentage fat-content for samples of ice cream of two different brands, say A and
B, are given below:
(a) Assuming both these are from normal populations with common unknown σ 2 ,
test if these brands have the same percentage fat on the average. (Hint: Test H0
: µA = µB versus Ha : µA 6= µB , where µA , µB are the true means for the two
brands respectively. Use α= .05)
(b) Do the test without the assumption of equal σ.
Test H : the two true means are the same. Use α = .05.
9.13 In order to determine driving habits, an auto insurance company surveyed 30 male and
25 female drivers. Each was asked how many miles he or she had driven in the past
year. The means and the standard deviations are shown in the accompanying table.
Male Female
Mean 9,117 10,014
Standard deviation 3,249 3,960
Can we conclude at the 5% significance level that the male and female drivers differ in
the number of miles driven per year?
9.14 On the issue of an “Oil Initiative” in Santa Barbara county, an opinion poll was con-
ducted which gave the following results: Among 80 registered Democrats sampled, 65
said they are in favor of the Initiative while among 75 registered Republicans, 50 said
they are in favor of it. Test if there is significant difference between Democrats and
Republicans, using 5% level of significance.
9.15 The percentages of adults who are right-handed, left-handed, and ambidextrous are well
documented. What is not so well known is that a similar phenomenon can be found in
animals. Dogs, for example, can be either right-pawed or left-pawed. Suppose that in
a random sample of 200 beagles it is found that 55 are left-pawed and that in a random
sample of 200 collies 40 are left-pawed. Can we conclude that the true proportion of
collies that are left-pawed is significantly different from the true proportion of beagles
that are left-pawed? Let α = 0.05.
Chapter 10
Bivariate Data: Correlation and Regression
So far, we have dealt with situations where we measured only a single characteristic or
variable on each individual — so-called univariate data. Often, it is of interest to measure
two or more variables on the same individual. For instance, we may measure (height and
weight) or (SAT score and GPA) of each individual; (price of a commodity and demand),
(the amount of fertilizer and yield of a crop), etc. If we just measure two variables on each
individual, the resulting data is called bivariate data and takes the form
$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n),$$
where (xi , yi ) represent measurements on the “same” ith individual. The primary goals in
studying two variables together, are
(i) correlation studies: to study the type and amount of association between x and y, and/or
(ii) regression studies: to predict, say, y from x by setting up a simple equation, if possible,
that relates y to x.
10.1 Correlation
The first step in trying to figure out what kind of association, if any, there is between x and
y is to make a Scatter-plot. After looking through the x and y values, find an appropriate
scale for the horizontal and vertical axes. A scatter plot consists of just plotting each of the
observations (xi , yi ) as a point on the x-y plane.
The type of relationship or association between the two variables can be broadly categorized as:
(i) Positive association: Two variables are said to be positively associated if larger x-values
are generally associated with larger y-values and vice versa, i.e., the two variables tend to
move in the same direction. For instance, the heights (x) and weights (y) of individuals
would typically be of this kind.
(ii) Negative association: Two variables are said to be negatively associated if larger x-
values are generally associated with smaller y-values and vice versa, i.e., the two variables
tend to move in opposite directions. For instance, the relationship between the price (x) and
demand (y) for a commodity would be of this kind since the higher the price, typically the
less the demand for that commodity.
182 Bivariate Data: Correlation and Regression
$$r_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar x}{s_x}\right)\left(\frac{y_i - \bar y}{s_y}\right).$$
Since each of the standardized deviations can be positive or negative, this measure itself
can be positive or negative. However, if x and y have a positive association, recall that
large x’s tend to be observed with large y’s. Hence, positive deviations of x, i.e.
$(x_i - \bar x)/s_x$, get multiplied with positive deviations of y, i.e. $(y_i - \bar y)/s_y$, and negative
deviations of x get multiplied with negative deviations of y. In either case, the product of such deviations is
positive and when added up, this results in a positive value for rxy . Similarly it can be seen
that r takes negative values when there is a negative association between x and y. Thus the
sign of the correlation coefficient represents the type of association, i.e., positive values of
r indicating positive association, while negative r indicating negative association between x
and y. Further, it can be shown that the correlation coefficient r takes values between -1
and +1, i.e., −1 ≤ r ≤ 1. Its magnitude indicates the strength of such association. For
instance, r = 0.2 represents a weak positive association while an r value of -0.8 represents a
stronger negative relationship between the variables.
For computation, r can equivalently be written in terms of the raw sums:
$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}.$$
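Both forms of the correlation coefficient can be checked numerically. The following Python sketch uses a small made-up data set (ours, not from the book) and verifies that the two formulas agree with numpy's built-in `corrcoef`:

```python
import numpy as np

# A small made-up data set, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
n = len(x)

# Definition: average product of the standardized deviations
r_def = np.sum(((x - x.mean()) / x.std(ddof=1)) *
               ((y - y.mean()) / y.std(ddof=1))) / (n - 1)

# Computational formula in terms of raw sums
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) *
              (n * np.sum(y**2) - np.sum(y)**2))
r_comp = num / den

print(round(float(r_def), 4), round(float(r_comp), 4))  # both about 0.822
```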
Remark 10.2 Watch out for spurious correlation, i.e. false correlation that may exist
because of other so-called lurking or hidden variables.
Example 10.1: In a study on home fires in a town it was found that rxy = 0.85, where
x = amount of damage and y = number of firemen on the scene. In spite of this strong
positive correlation, it is inappropriate to conclude that the more firefighters are used, the
more damage occurs. In this case, z (= the size of the fire) is driving both x and y, and is
called the lurking/hidden variable.
Remark 10.3 Correlation does not imply causation. Even if there is a strong correlation,
we can not say which is the cause and which is the effect between x and y, without further
controlled studies.
For instance, in the case of smoking and incidence of lung cancer, just observing a high
correlation between the two factors does not imply smoking causes lung cancer. As the
tobacco companies would have us believe, it could be the other way around!
10.2 Regression
The second important goal in studying two variables together (i.e. bivariate data) is to
“predict” one variable say y, using the other variable x. This may be accomplished by
setting up a simple equation like
y = α + βx.
In this straight-line relation between y and x, α represents the “intercept”, i.e. where the
line meets the vertical or y-axis, and β represents the “slope” , i.e. the change in y for each
unit increase in x.
Figure 10.1: Intercept and slope of a regression line
How do we set up such an equation? In other words, how do we estimate the intercept
α and the slope β? We need sample data on such x and y, on which to base the
best-fitting regression line. This data, (xi , yi ), is sometimes called the training data.
We use the so-called Least squares principle to estimate these unknown α and β. If we
use the equation y = α + β x, then corresponding to the actual value of xi , our prediction
of the value of y would be (α + β xi ), while the observed value corresponding to this xi is
yi . The difference $e_i = y_i - (\alpha + \beta x_i)$ is called the residual.
The “Least Squares Principle” is to find α and β which minimize the sum of these squared
residuals, $\sum e_i^2$.
Figure 10.2: Least squares method
$$b = \hat\beta = r_{xy}\left(\frac{s_y}{s_x}\right) = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}.$$
Observe that this estimate of slope has the same sign as rxy since positive association
between x and y gives a positive slope and vice versa. Also, the sample estimate of the
intercept is given by
$$a = \hat\alpha = \bar y - b\,\bar x.$$
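These least-squares formulas can be sketched in Python on a small made-up data set (ours, purely for illustration):

```python
import numpy as np

# Least-squares slope and intercept on a small made-up data set
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
n = len(x)

b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = y.mean() - b * x.mean()
print(b, round(float(a), 4))   # slope 1.0, intercept 0.2 for this data

# Equivalent form of the slope: b = r * (sy / sx)
r = np.corrcoef(x, y)[0, 1]
b_alt = r * y.std(ddof=1) / x.std(ddof=1)
```

Note that the slope computed from the raw sums and the form b = r·(sy/sx) agree, as the text asserts.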
Putting these together, the fitted (least squares) regression line is
$$\hat y = a + bx,$$
where
$$b = \hat\beta = r_{xy}\cdot\frac{s_y}{s_x} \qquad \text{and} \qquad a = \bar y - b\,\bar x.$$
Before we start using a regression equation for prediction, we should ask if doing so has any
value. In other words, does y depend on x significantly or is what we have found just random
noise? Notice that in the latter situation the regression line becomes horizontal, indicating
y does not change with x. However, the sample estimate b may never be exactly equal to zero
even if the true slope β = 0. Thus we want to test the null hypothesis H0 : β = 0.
The sum of the squared residuals is Σ(residual)² = Σ ei ², and the residual standard error is

se = √[ Σ(residual)² / (n − 2) ] = √[ Σ(yi − ŷi )² / (n − 2) ] = sy √[ (n − 1)/(n − 2) · (1 − rxy ²) ].
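The two expressions for se above are algebraically identical; a short Python sketch (an added check, not the book's code) confirms this numerically on the Example 10.2 data:

```python
# The residual standard error computed two equivalent ways, as a check on the
# identity s_e = s_y * sqrt((n-1)/(n-2) * (1 - r^2)).
import math

x = [25, 28, 35, 39, 44, 48, 52, 55, 65, 72]
y = [0.5, 0.3, 0.8, 1.6, 1.8, 3.1, 4.3, 4.4, 5.6, 7.2]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n

sxx = sum((v - xbar)**2 for v in x)
syy = sum((v - ybar)**2 for v in y)
sxy = sum((u - xbar)*(v - ybar) for u, v in zip(x, y))

b = sxy / sxx
a = ybar - b * xbar
resid = [v - (a + b*u) for u, v in zip(x, y)]

se_direct = math.sqrt(sum(e*e for e in resid) / (n - 2))
r = sxy / math.sqrt(sxx * syy)
s_y = math.sqrt(syy / (n - 1))
se_formula = s_y * math.sqrt((n - 1)/(n - 2) * (1 - r*r))

print(round(se_direct, 4), round(se_formula, 4))   # 0.4921 0.4921
```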
10.3 Inference for regression
Step 1 Calculate the standard error of b:

standard error of b = SEb = se / √[ Σ(xi − x̄)² ] = se / ( sx √(n − 1) )

Step 2 Compute the test statistic

t = b / SEb .
Prediction:
Here we have to make a distinction between two different scenarios. First, we may be
interested in predicting the “mean response” in y corresponding to a given value x = x∗ .
This is done using our regression equation given by α + βx∗ , and it is estimated by
μ̂y = a + bx* .

A C-level confidence interval for this mean response is

μ̂y ± tα/2 SEμ̂y ,   where   SEμ̂y = se √[ 1/n + (x* − x̄)² / Σ(xi − x̄)² ]

and tα/2 is the C = (1 − α) level value for the t distribution with (n − 2) df from Table C.
Second, to predict the individual response y for a new observation with x = x*, we use the
same point prediction

ŷ = a + bx* .

One can also obtain a C-level confidence interval for this predicted value, called the
prediction interval, which is given by

ŷ ± tα/2 SEŷ ,   where   SEŷ = se √[ 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² ].
Remark 10.4 Notice from the formula for the standard error, SEŷ , that the error in the
predicted value increases as x∗ is farther away from x. This implies that extrapolating for
values x∗ far away from the sample mean is not desirable.
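To see this numerically, here is a small sketch (my illustration, not the book's) of how the multiplier √(1 + 1/n + (x* − x̄)²/Σ(xi − x̄)²) grows with distance from x̄, using n = 10, x̄ = 46.3, and Σ(xi − x̄)² = 2096.1 from the example that follows:

```python
# How the prediction-error multiplier grows as x* moves away from xbar;
# n, xbar, and Sxx are taken from Example 10.2 below.
import math

n, xbar, Sxx = 10, 46.3, 2096.1
for xstar in (46.3, 60, 80, 100):
    mult = math.sqrt(1 + 1/n + (xstar - xbar)**2 / Sxx)
    print(xstar, round(mult, 3))
# 46.3 1.049
# 60 1.091
# 80 1.281
# 100 1.573
```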
Example 10.2: An economist studying the relationship between income and savings (mea-
sured in thousands of dollars) collected the following data on 10 households. We illustrate
through this example, many of the concepts discussed in this chapter starting with a scatter
plot.
Figure 10.3: Scatter plot of savings vs. income and the best fitting line
  xi      yi      xi²      yi²     xi·yi
  25     0.5      625     0.25     12.5
  28     0.3      784     0.09      8.4
  35     0.8     1225     0.64     28.0
  39     1.6     1521     2.56     62.4
  44     1.8     1936     3.24     79.2
  48     3.1     2304     9.61    148.8
  52     4.3     2704    18.49    223.6
  55     4.4     3025    19.36    242.0
  65     5.6     4225    31.36    364.0
  72     7.2     5184    51.84    518.4
 ------------------------------------------
 463    29.6    23533   137.44   1687.3
 Σxi     Σyi     Σxi²     Σyi²    Σxi·yi
r = [10(1687.3) − (463)(29.6)] / √{ [10(23533) − 463²] [10(137.44) − 29.6²] } = .9804

b = [10(1687.3) − (463)(29.6)] / [10(23533) − 463²] = 0.1511

and the intercept is a = ȳ − b x̄ = −4.0381.
SEb = se / √[ Σ(xi − x̄)² ] = se / √[ Σxi² − (Σxi )²/n ] = .4921 / √(23533 − 463²/10) = .0107
t = b / SEb = .1511 / .0107 = 14.1215
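These hand calculations can be reproduced in a few lines of Python (a sketch I have added, not the book's code), which also confirms the unrounded t-value reported by R below:

```python
# Reproducing se, SE_b, and the t statistic for the income/savings data.
import math

x = [25, 28, 35, 39, 44, 48, 52, 55, 65, 72]
y = [0.5, 0.3, 0.8, 1.6, 1.8, 3.1, 4.3, 4.4, 5.6, 7.2]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n

Sxx = sum((v - xbar)**2 for v in x)
Sxy = sum((u - xbar)*(v - ybar) for u, v in zip(x, y))
Syy = sum((v - ybar)**2 for v in y)

b = Sxy / Sxx
a = ybar - b * xbar
rss = Syy - b * Sxy                      # residual sum of squares
se = math.sqrt(rss / (n - 2))            # residual standard error
SE_b = se / math.sqrt(Sxx)
t = b / SE_b
print(round(SE_b, 4), round(t, 2))       # 0.0107 14.06
```

The small difference from the hand value 14.1215 comes from rounding b and SEb to four decimals in the hand calculation.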
From Table C, P -value = 2P(T > |t|) = 2P(T > 14.1215) ≈ 0. Hence, reject H0 : β = 0.
A 99% confidence interval for β is given by b ± t.005 (8) SEb = .1511 ± 3.355 × .0107 = (.1152, .1870).
Next, a 99% confidence interval for the mean savings when income is x* = 50 is given by:
(a + bx*) ± tα/2 se √[ 1/n + (x* − x̄)² / Σ(xi − x̄)² ]
= −4.0381 + .1511 × 50 ± 3.355 × .4921 × √[ 1/10 + (50 − 46.3)² / (23533 − 463²/10) ]
= 3.5169 ± .5389
= (2.9780, 4.0558)
Similarly, a 99% prediction interval for an individual household's savings when x* = 50 is

(a + bx*) ± tα/2 se √[ 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² ]
= −4.0381 + .1511 × 50 ± 3.355 × .4921 × √[ 1 + 1/10 + (50 − 46.3)² / (23533 − 463²/10) ]
= 3.5169 ± 1.7367
= (1.7802, 5.2536)
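Both intervals can be recomputed without rounding the intermediate estimates (an added sketch, not the book's code); the results agree with the text's up to the rounding of a and b used there:

```python
# 99% confidence interval for the mean response and 99% prediction interval
# for an individual response at x* = 50, using unrounded estimates.
import math

x = [25, 28, 35, 39, 44, 48, 52, 55, 65, 72]
y = [0.5, 0.3, 0.8, 1.6, 1.8, 3.1, 4.3, 4.4, 5.6, 7.2]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
Sxx = sum((v - xbar)**2 for v in x)
Sxy = sum((u - xbar)*(v - ybar) for u, v in zip(x, y))
Syy = sum((v - ybar)**2 for v in y)

b = Sxy / Sxx
a = ybar - b * xbar
se = math.sqrt((Syy - b * Sxy) / (n - 2))

xstar, t005 = 50, 3.355                  # t_{.005} with 8 df, from Table C
fit = a + b * xstar
ci = t005 * se * math.sqrt(1/n + (xstar - xbar)**2 / Sxx)        # mean response
pi = t005 * se * math.sqrt(1 + 1/n + (xstar - xbar)**2 / Sxx)    # individual
print(round(fit - ci, 2), round(fit + ci, 2))   # about (2.98, 4.06)
print(round(fit - pi, 2), round(fit + pi, 2))   # about (1.78, 5.26)
```

Note how the prediction interval is substantially wider than the confidence interval for the mean, because of the extra "1 +" term under the square root.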
R code instructions:
We can find the sample correlation coefficient r using the function cor, other regression
estimates using summary(), and plot a scatter plot and best fitting line. As seen in the
output for the R-code, this gives an r value of 0.9804, slope estimate b of 0.15115, intercept
a of -4.03812, residual standard error se of 0.4921, standard error SEb of 0.01075, and t-value
of 14.062.
R code:
x_i <- c(25,28,35,39,44,48,52,55,65,72)
y_i <- c(0.5,0.3,0.8,1.6,1.8,3.1,4.3,4.4,5.6,7.2)
r <- cor(x_i, y_i)
round(r, digits=4)
plot(y_i ~ x_i, xlab="income", ylab="savings",
     main="Figure 10.3",
     xlim=c(0,80), ylim=c(0,8), pch=8)
s <- lm(y_i ~ x_i)
abline(s, col="blue")
summary(s)
# 99% Confidence Interval for Beta
t_stat <- qt(p=.995, df=8)
round(t_stat, digits=3)
print(paste(round(0.15115 - t_stat*0.01075, digits=4),
            round(0.15115 + t_stat*0.01075, digits=4)))
> x_i <- c(25,28,35,39,44,48,52,55,65,72)
> y_i <- c(0.5,0.3,0.8,1.6,1.8,3.1,4.3,4.4,5.6,7.2)
> r <- cor(x_i, y_i)
> round(r, digits=4)
[1] 0.9804
> plot(y_i ~ x_i, xlab="income", ylab="savings",
  main="Figure 10.3",
  xlim=c(0,80), ylim=c(0,8), pch=8)
> s <- lm(y_i ~ x_i)
> abline(s, col="blue")

[Figure 10.3: scatter plot of savings vs. income with the fitted regression line]
> summary(s)
Call:
lm(formula = y_i ~ x_i)
Residuals:
Min 1Q Median 3Q Max
-0.81236 -0.23908 -0.00548 0.29789 0.75944
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.03812 0.52144 -7.744 5.51e-05 ***
x_i 0.15115 0.01075 14.062 6.35e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Python instruction:
We use the function sns.regplot() to plot our scatter plot and the best fitting line. We begin
by importing the necessary libraries to run our Python code.
Python code:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Code for scatter plot and best fitting line
x = pd.Series([25,28,35,39,44,48,52,55,65,72], name='Income')
y = pd.Series([0.5,0.3,0.8,1.6,1.8,3.1,4.3,4.4,5.6,7.2], name='Savings')
sns.regplot(x=x, y=y, color="blue", marker="*", scatter_kws={"s": 110})
sns.set(font_scale=1.2)
plt.title("Figure 10.3")
Residual Analysis
Even the best-fitting linear regression does not completely account for the variability in
the y values. It still leaves the so-called "residuals" or the unexplained part, namely

ei = yi − ŷi = yi − (a + bxi ).

The sum of squares of these residuals, appropriately called the residual sum of squares (RSS),
represents the left-over variation in y that the dependence on x has failed to account for.
Or one may consider the residual sum of squares as that part of variation in the data not
explained by the linear regression model. We may then ask what proportion of the total
variation in y values the linear regression has accounted for. Recall that the total variation
in the y's is represented by Σ(yi − ȳ)² and is called the Total SS. Thus the answer to our
question is given by

rxy ² = 1 − RSS / Total SS,

the square of the correlation coefficient, often called the coefficient of determination.
Since the residuals reflect what is “left over” in the observed y, after the assumed linear
dependence on x is accounted for, one might ask if these are purely vestiges of noise, centered
at zero and with constant variance or if there are any significant and noticeable patterns still
left in them. Various plots of residuals are possible including (i) a time-ordered plot of the
residuals ri versus i, if the data is collected over time, to see if one can detect dependence on
time, (ii) a plot of the residuals ri versus the xi , to see if there are other detectable patterns
which may indicate that a linear regression does not adequately describe the dependence of
y on x. This plot can also reveal if the variation in ri changes with the magnitude of xi ,
invalidating the assumption of constant variance that is usually made in the linear regression
model. (iii) Also of concern is the assumption of normality of the residuals, which can be
checked either by using some rather complex statistical tests or graphically by making what
is known as a “normal probability plot”. A normal probability plot is a scatter plot of the
“ordered residuals” against a set of expected values of an ordered sample of the same size
from a standard normal distribution. Such expected values are found from statistical theory
and are available in many computer packages and tables. For instance, if n = 10, such values
are
-1.539, -1.001, -0.656, -0.376, -0.123, 0.123, 0.376, 0.656, 1.001 and 1.539.
Here -1.539 has the interpretation that it is the mean value (long-run average) of the
smallest value if a sample of size 10 were to be drawn from a N(0,1), -1.001 is the mean value
of the second smallest out of 10 observations from N(0,1), etc. When a normal probability
plot is approximately a straight line, then we have reason to believe that the data, in this
case the residuals, are from a normal curve.
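Such normal scores can also be approximated in Python with Blom's plotting positions (i − 3/8)/(n + 1/4), a common approximation I have added here; the text's values are the exact expected order statistics, and the approximation comes close:

```python
# Approximate expected normal order statistics ("normal scores") for n = 10
# via Blom's plotting positions; slightly different from the exact values
# quoted in the text, but close.
from statistics import NormalDist

n = 10
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
print([round(s, 3) for s in scores])
```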
Example 10.2 (contd.): Examine the residuals in Example 10.2, to see if the normality
assumption is valid.
Solution: We order the 10 residuals obtained earlier and plot these against the 10
ordered scores given above and obtain the plot given below. A visual inspection reveals that
this plot is approximately linear and therefore we can conclude that normality is a reasonable
assumption.
[Figure 10.4: normal probability plot of the ordered residuals against the normal scores]
R code:
The function qqplot() is used to check normality.
x1 <- c(-1.539,-1.001,-0.656,-0.376,-0.123,
        0.123,0.376,0.656,1.001,1.539)
y_i <- c(0.5,0.3,0.8,1.6,1.8,3.1,4.3,4.4,5.6,7.2)
y_ihat <- c(-0.2594,0.1940,1.2520,1.8566,2.6124,
            3.2170,3.8215,4.2750,5.7865,6.8445)
e_i <- y_i - y_ihat
y.lm <- lm(e_i ~ x1)
y.res <- resid(y.lm)
qqplot(x1, y.res, ylim=c(-1,1), xlab="x", ylab="Residuals",
       xlim=c(-1.7,1.7), pch=8, cex=0.8, main="Figure 10.4")
Python code:
The function sm.qqplot() is used to check for normality. We begin by importing the necessary
libraries to run our Python code.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import numpy as np

# Code to check normality
y_i = [0.5,0.3,0.8,1.6,1.8,3.1,4.3,4.4,5.6,7.2]
y_ihat = [-.2594,.194,1.252,1.8566,2.6124,3.2170,3.8215,4.275,5.7865,6.8445]
x_1 = [-1.539,-1.001,-0.656,-0.376,-.123,.123,.376,.656,1.001,1.539]
e_i = np.subtract(y_i, y_ihat)
model = sm.OLS(e_i, x_1)
results = model.fit()
res = results.resid
fig = sm.qqplot(res, marker='*', markersize=9, color="black")
plt.title("Figure 10.4")
plt.xlabel("x")
plt.ylabel("Residuals")
plt.show()
In many cases, the dependent variable y may depend on more than one independent
variable, rather than the single x assumed in simple linear regression. For example, the yield
y of a crop may depend on the amount of fertilizer applied, x1 , as well as the amount of
rainfall, x2 . Or, one's blood pressure y may depend on one's weight, x1 , one's age, x2 , as
well as several other factors. If we believe that the dependence is linear, one can use a
Multiple Linear Regression equation of the form
y = α + β1 x1 + β2 x2 + · · · + βk xk .
Again the least squares principle provides a set of simultaneous equations in the unknown
parameters from which it is possible to estimate these coefficients and find the best-fitting
line
y = a + b1 x1 + b2 x2 + ... + bk xk .
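As a sketch of how such least squares estimates can be computed (the data below are made up for illustration, not from the text), the normal equations (XᵀX)β = Xᵀy can be solved directly:

```python
# Least squares for y = a + b1*x1 + b2*x2 via the normal equations.
# Illustrative data: y is exactly 1 + 2*x1 + 3*x2, so the fit should
# recover those coefficients.

def solve(A, v):
    """Gaussian elimination with partial pivoting for A @ beta = v."""
    m = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        beta[r] = (M[r][m] - sum(M[r][j] * beta[j] for j in range(r + 1, m))) / M[r][r]
    return beta

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [1 + 2*a + 3*b for a, b in zip(x1, x2)]

X = [[1.0, a, b] for a, b in zip(x1, x2)]           # design matrix with intercept
XtX = [[sum(row[r]*row[c] for row in X) for c in range(3)] for r in range(3)]
Xty = [sum(X[i][r]*y[i] for i in range(len(X))) for r in range(3)]
coef = solve(XtX, Xty)
print([round(c, 6) for c in coef])   # [1.0, 2.0, 3.0]
```

In practice one would use R's lm() or Python's statsmodels, as in the examples above, rather than solving the equations by hand.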
EXERCISES
10.1 The gas mileage of an automobile first increases and then decreases as the speed in-
creases. Suppose the following data on speed x (miles per hour) and mileage y (miles
per gallon) is obtained.
Speed 20 30 40 50 60
Mileage 24 28 30 28 24
(b) Show that the correlation coefficient rxy = 0 in this example. Do you really believe
there is no association between the 2 variables? Explain the zero correlation in
spite of possible association between speed and mileage.
10.2 A car company has been advertising on television. The number of commercials broad-
casts in a month (x) and the corresponding number of car sales in the same month (y,
in thousands) are given in a table below.
x 15 10 9 11 8 7 12 13
y 33 21 20 25 17 17 26 27
10.3 Suppose it is reasonable to assume that the weight of newborn babies, y in lbs, is a
linear function of its age, x in months, during the first 6 months. The following values
were obtained for n = 10 babies:
x̄ = 3.7, ȳ = 15.6, sx ² = 23.1 and sxy = (1/(n − 1)) Σ(xi − x̄)(yi − ȳ) = 47.8.
(a) Find the best fitting line and interpret what the intercept and the slope denote.
(b) Use this equation to predict the weight of a baby who is (i) two-months old and
(ii) 5 months old.
10.4 The following table represents x, the weight in pounds and y, the miles per gallon in
the city, for 16 car models:
10.5 Eight randomly selected people were asked to watch a 1-hour television program. In
the middle of the show, a commercial advertising a breakfast cereal appeared. Each
person was shown a commercial of a different length ranging from 20 to 48 seconds,
with the essential content being the same in all cases. After the show, each person was
given a test to determine how much he or she remembered about the product. The
commercial times and test scores (on a 20-point scale) are given below:
Person 1 2 3 4 5 6 7 8
Length of commercial(x) 20 24 28 32 36 40 44 48
Test score(y) 10 8 10 11 14 16 12 13
Summary statistics: Σx = 272, Σy = 94, Σxy = 3,320, Σx² = 9,920, Σy² = 1,150.
(a) Do you think a straight line explains the data? Draw a scatter plot to decide.
(b) Calculate the correlation coefficient of the data. What percentage of the total
variation in test scores would be explained by a regression line?
(c) Calculate the regression line for predicting test-score using commercial length.
(d) What is the predicted test score of a person who watches a 30 second commercial?
10.6 It has been conjectured that the age at which a child speaks his/her first word can
predict the score on an aptitude test given to children entering first grade. Let x = the
first month a word is spoken, y = the score on the aptitude test. Data was collected
on 8 kids.
x 15 10 9 11 8 7 12 13
y 76 90 93 87 97 100 84 80
Note that Σ xi yi = 7358, Σ xi ² = 953, Σ yi ² = 62959.
(a) Draw a scatter plot (use a scale of 0-20 on x and 70-110 on y).
(b) Compute the correlation coefficient between x and y.
(c) Briefly, interpret the value you obtain for the correlation coefficient.
(d) Obtain the least squares regression line, y = a + bx.
(e) Predict the aptitude test score for a child whose first spoken word came at 9
months.
10.7 The following data has been gathered on the used cars for sale:
x = age of the car (in years) 1.0 0.5 5.0 3.0 0.6
y = asking Price (in dollars) 12,900 15,100 5,900 8,950 15,800
(a) Draw a scatter plot and describe the type of association between the age of the
car (x) and the asking price (y).
(b) Obtain the regression line y = a + bx. What do a and b signify in this problem?
(c) Predict the asking price for a car which is 4.5 years old and a 95% prediction
interval for this value.
(d) Test if the regression is statistically significant, i.e., test H0 : β = 0. (Use α = .05.)
10.8 A study of soil erosion produced the following data on the rate at which water flows
across land (x) and the resulting amount of erosion (y).
(a) Find the least squares regression line, y = a + bx, for predicting the amount of
soil eroded as a function of flow rate.
(b) Find the predicted amount of soil erosion when the flow rate is x = 100.
(c) Find the correlation coefficient between the two variables and interpret its value.
10.9 In order to estimate the current year’s inventory of tires, a tire company sampled 6 of
its dealers, in each case obtaining this year’s inventory (Y, in 100’s) along with last
year’s (X, in 100’s). The summary statistics are as follows:
X̄ = 5, Ȳ = 8, Σ xi ² = 450, Σ yi ² = 684, Σ xi yi = 50.
(a) Find the best linear regression line Y = a + bX, showing how this year’s inventory
Y is related to last year’s value X. In what sense is this the “best” line?
(b) At a 5% level, test the hypothesis that last year’s inventory X is not useful in
predicting Y.
(c) Suppose the actual value of an inventory X for last year turned out to be 8 for a
dealer. Corresponding to this value of x0 = 8, find a 95% confidence interval for
the predicted value of Y, the number of tires in this year’s inventory.
10.10 Eight students obtained the following scores (out of 100) for a psychology test (X) and
a statistics test (Y).
Student A B C D E F G H
Psych. test score (X) 70 38 90 50 96 65 58 77
Stat. test score (Y) 75 56 60 68 56 90 78 52
(a) Compute the sample correlation coefficient, rXY . Interpret its meaning in this
context.
(b) Find the best regression line Y = A + BX for predicting the statistics score from
the psychology score. In what sense is this the “best” line? Use it to predict the
statistics score of someone whose psychology score is X = 60 points.
10.11 Six students of a statistics course recorded the number of hours per week they spent
in studying for the course, x, and their score in the final exam, y. The following table
gives this data:
10.12 Data was collected from n = 6 people about the number of years of schooling, x, and
their beginning salary in thousands, y.
(a) Compute the sample correlation coefficient, r, and interpret your result in this
context.
(b) Find the least squares regression line, y = a + bx.
(c) Test if this dependence of y on x is statistically significant (use α = .05).
(d) Predict your salary if you decide to stop with a Bachelor’s degree, i.e., x = 16
years of schooling. Also, find a 90% confidence interval for this predicted value.
10.13 Some recent studies indicate that moderate consumption of wine is beneficial to your
heart. Let x be the annual consumption of wine in liters per capita, and y the annual
death rate due to heart disease per population of 100,000. The following table, (Source:
Laura Shapiro, “Food to Your Health?”, Newsweek, January 22, 1996), gives data on
x and y for 10 countries.
Country x y
France 63.5 61.1
Italy 58.0 94.1
Switzerland 46.0 106.4
Australia 15.7 173.0
Britain 12.2 199.9
United States 8.9 176.0
Russia 2.7 373.6
Czech Republic 1.7 283.7
Japan 1.0 34.7
Mexico 0.2 36.4
(a) Compute the correlation coefficient between x and y, and interpret it. Can you
argue a cause and effect relation between x and y?
(b) Set up a regression line of y on x and test if β = 0. Use significance level = 0.05.
(c) Predict the heart disease rate when x = 10 liters and find a 95% prediction
interval.
10.14 According to the authoritative “Third International Mathematics and Science Study”
done in 1995, American 12th graders rank near the bottom in Math and Science literacy,
placing 18th among the 21 countries compared. We look at part of the data to see if
these test scores Y , can be explained by different factors such as time (hours per day)
students spend on hobbies outside school (e.g. such as watching TV, playing sports,
etc.) denoted as x1 , the hours spent on a paid job x2 , the hours spent on homework
x3 and finally the per capita expenditure in US dollars that the government spends on
elementary and secondary education x4 . The table below summarizes the mean scores
of students of different countries on a test measuring knowledge of math and science
given in the 1994-95 school year, as well as the factors mentioned above. (Source:
Mulis et al. (1998), Mathematics and Science Literacy in the Final Year of Secondary
School, Chestnut Hill, MA, Boston College)
(a) Compute the correlation coefficients between y and x1 , y and x2 , y and x3 , y and
x4 , respectively and interpret them. Can you argue a cause and effect relation
between y and the x’s?
(b) Set up a regression line of y on each of the x’s separately and test if β = 0 in each
case. (Use significance level = .05).
(c) Does the time students spend on homework explain the differences in the test
scores? Find a 95% confidence interval for the mean score if U.S. students were
to get an extra hour of homework per day (i.e., let x3 = 2.7).
(d) How much better can we expect the U.S. students to do if we increased the
expenditure on them, say to x4 = $1,500. Find a 95% prediction interval.
10.15 The following data from the Third International Mathematics and Science Study com-
pare the scores of twelfth graders in advanced mathematics (x) and physics (y).
Country x y
Sweden 512 573
Switzerland 533 488
Denmark 522 534
Canada 509 485
Austria 436 435
Australia 525 518
Slovenia 475 523
France 557 466
Czech Republic 469 451
Russia 542 545
United States 442 423
Cyprus 518 494
Greece 513 486
Germany 465 522
(a) Make a scatter plot and describe the type of association between math and physics
scores.
(b) Compute the correlation coefficient between x and y, and interpret it.
(c) Set up a regression line of y on x and test if β = 0. Use significance level = 0.05.
Appendix A
Answers to Odd Numbered Problems
Chapter 1
1.1a. An average of 5.5 feet does not mean that all or even most of the family members are
taller than 4 feet. Also, even with an average depth of 4 feet, the stream can be very very
deep in parts!
b. Statistics deals with drawing conclusions about the population while accounting deals
with actual amounts of money.
c. It is possible that many unsatisfied customers simply do not write.
d. Without the vaccine, the number could have been 100,000 instead of 1,000. It is better
to compare the proportions dead, among the vaccinated and unvaccinated groups.
1.3 The former summarizes data, while the latter draws conclusions about the population
from data.
1.5 Do a street survey, a door-to-door survey or collect information in front of a supermarket.
Unlisted numbers are covered by randomly selecting the digits of a telephone number instead
of from a telephone directory
1.7 The poll will be biased if people in one group are more likely to call in than people from
another group. Also the $1.00 charge excludes all but those who are highly motivated.
Chapter 2
2.1 a. ratio b. ratio c. nominal d. ratio e. ordinal
2.3 a. existence of ”absolute zero” and a meaningful ratio of two data points. b. Yes. Money
you have. c. No. d. ratio
2.5 Each of the first three will be 5 miles faster than its corresponding true value, while the
remaining three will be correct.
2.7 a. (-2,5,8,10,12) b. Center: mean = 8; Spread: IQR = 5.
2.9 Median
2.11 mean = 44, s = 8.8556
2.13 a. 818.1333 b. 81.6 c. Median, since data is skewed. d. (79.3, 80.3, 81.6,82.6, 86.6)
2.15 a. average=9, s=.9129 b. 9
2.17 a. 12.1 b. 11.5 c. 11 d. 5.685
2.19
15. 7
16. 7
17. 2 9
18. 0 1 3 4 5 7 9
19. 0 0 0 2 2 3 5 9
20. 3 4 5
21. 2 6 9
22. 5 6
23. 0
24.
25. 3
26.
27.
28. 3
Chapter 3
3.1 1/18009460
3.3 a. .28 b. .18 c. .42 d. .54
3.5 a. .175 b. .525 c. .075 d. .225
3.7 a. 330 b. .0606
3.9 a. .144 b. .336 c. .92 d. .944
3.11 a. .9410 b. .000531
Chapter 4
4.1 a. .9 b. µ = 3, σ = 1.0954
4.3 a. .05 b. .9 c. .85, 1.1079
4.5 a. .35, .25 b. 2.3, 1.1874
4.7 a. 1.10 b. 1008.79 c. No, even though the expected winnings is $1.10, the variance is
very large and 998 out of a 1000 end up losers.
4.9 a. .75 b. 1.89, .9262
4.11 a. 15.3 b. 1.9261
4.13 a. 1/8 b. 3/8, 1/8 c. 4, 2.3094
Chapter 5
5.1 .7854
5.3 a. Binomial with n = 15, p = .30 b. .0037
5.5 No. C(3,2)(0.5)²(0.5)¹ = .375, but C(6,4)(0.5)⁴(0.5)² = .234375
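A quick Python check of these two binomial probabilities (an added verification, not part of the original answer key):

```python
# Exact binomial probabilities for answer 5.5: 2 heads in 3 tosses vs. 4 in 6.
from math import comb

p1 = comb(3, 2) * 0.5**2 * 0.5**1
p2 = comb(6, 4) * 0.5**4 * 0.5**2
print(p1, p2)   # 0.375 0.234375
```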
5.7 a. P(X > 5) =P(Z > -2) = .9772 b. .1359 c. 5.72
5.9 a. .0228 b. .6826 c. 0.1359 × 0.0228 = 0.0031
5.11 665.2
5.13 a. .1587 b. .3085 c. .2405
5.15 a. .0228 b. .0918 c. 8.84 years
5.17 a. .9690 b. 0 c. 95.80
5.19 P(X > 6) = P(Z > (6 − µ)/.1) = .02, so µ = 6 − 2.055 × .1 = 5.7945
5.21 a. .0668 b. .9282 c. 152.8
5.23 a. .6826 b. .0918 c. .0084 d. .0038 e. 65.99
5.25 a. .0164 b. .7373 c. 0 d. 2, 1.6
Chapter 6
6.1 P(|X − µ| ≤ 1) = P(-1.29 ≤ Z ≤ 1.29) = .8030
6.3 a. .0359 × .0139 = .0005. No. The latter, since it covers any combination of scores
totalling 1300, including (600 and 700). b. .0136
6.5 a. Continuous, X is anywhere between 6.5 and 7.5. b. .5 c. .0003
Chapter 7
2
3.7 1.645×3.7
7.1 a. 4.0 ± 1.645 × √
12
= (2.243, 5.757). b. n = 1
=37.045. Rounded up to the
next integer gives n = 38.
7.3 178 ± 2.704 × 8.5/√40 = (174.3659, 181.6341)
7.5 17 ± 2.571 × 3.162/√6 = (13.681, 20.319)
7.7 20 ± 1.860 × .141421/√9 = (19.9123, 20.0877)
7.9 5.9392 ± 2.201 × .8971/√12 = (5.3692, 6.5092)
7.11 p̂ = 12/50 = .24; .24 ± 1.960 × √[(.24)(.76)/50] = (.1216, .3584)
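The interval arithmetic in 7.11 is easy to verify in Python (an added check, not the book's code):

```python
# Verifying the 95% confidence interval for the proportion in answer 7.11.
import math

phat = 12/50
half = 1.960 * math.sqrt(phat * (1 - phat) / 50)
print(round(phat - half, 4), round(phat + half, 4))   # 0.1216 0.3584
```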
7.13 p̂ = 22/200 = .11; .11 ± 1.960 × √[(.11)(.89)/200] = (.0666, .1534)
7.15 a. n = (1.960/(2 × 0.05))² = 384.16. Round up to 385.
b. p̂ = 55/100 = .55; .55 ± 1.645 × √[(.55)(.45)/100] = (.4682, .6318)
c. P(X ≤ 8) = 1 − P(X = 9) − P(X = 10) = .9893
7.17 a. 752 b. (.7800, .9200)
7.19 a. (.286, .354) b. 846
Chapter 8
8.1 Type I: error of concluding that the space-ship will not return safely when in fact it will.
Type II: error of concluding that the ship will return safely when in fact it will not. Type II
is of more concern.
8.3 H0 : µ = 12 vs. Ha : µ < 12; t = -5.657 < -t.05 (19) = -1.729. Reject H0 .
8.5 a. z = 1.8934 > 1.645. Reject H0 . b. t = 1.1669 ≯ t.05 (14) = 1.761. Do not reject H0 .
8.7 H0 : µ = 19.8 vs. Ha : µ > 19.8; t = 4.2433 > t.05 (8) = 1.860. Reject H0 .
8.9 H0 : µ ≥ 40000 vs. Ha : µ < 40000; t = -.3902 ≯ t.05 (15) = 1.753. Do not reject H0 .
8.11 a. H0 : µ = 113 vs. Ha : µ ≠ 113; t = 1.86 ≯ t.005 (48) ≈ 2.704. Do not reject H0 .
b. P-value is between .05 and .10.
8.13 a. H0 : µ = 25 vs. Ha : µ > 25; t = 3.18 > t.10 (9) = 1.383. Reject H0 .
b. 29.7 ± 3.250 × 4.6678/√10 = (24.9027, 34.4973)
c. χ²1−α/2 = 2.700, χ²α/2 = 19.023; ( √[9(4.6678)²/19.023], √[9(4.6678)²/2.700] ) = (3.21, 8.52)
8.15 H0 : p = .5 vs. Ha : p ≠ .5; z = 1.7889. |z| ≯ z.025 = 1.960. Do not reject H0 .
8.17 H0 : p = .5 vs. Ha : p > .5; z = 0.855 ≯ z.05 = 1.645. Do not reject H0 . P-value =
.1977
8.19 H0 : p = 1/6 vs. Ha : p > 1/6; z = 2.1909 ≯ z.01 = 2.326. Do not reject H0 .
8.21 a. H0 : p = .3 vs. Ha : p > .3; z = 3.8576 > z.05 = 1.645. Reject H0 . b. P-value = 0.
Chapter 9
9.1 a. H0 : µd ≤ 50 vs. Ha : µd > 50; t = 0 ≯ t.05 (5) = 2.015. Do not reject H0 .
b. 0.5
9.3 a. 9.677 ± 1.860 × 9.042/√9 = (4.061, 15.273). Weight loss is assumed normally distributed.
b. H0 : µx = µy vs. Ha : µx < µy ; t = -.110 ≮ -t.05 (8) = -1.860. Do not reject H0 .
9.5 a. t = 1.55 ≯ t.05 (12) = 1.782. Do not reject H0 . b. t = 1.63 ≯ t.05 (24) = 1.711. Do not
reject H0 .
9.7 t = -3.3806 < -t.01 (33) = -2.43. Reject H0 .
9.9 a. H0 : µ1 = µ2 vs. Ha : µ1 > µ2
b. t= 6.162 > t.05 (25) = 1.708. Reject H0 .
c. t= 5.68 > t.05 (11) = 1.796. Reject H0 .
9.11 a. sp = 1.8148. t = .538 ≯ t.025 (10) = 2.228. Do not reject H0 .
b. t = -.5783; |t| ≯ t.025 (4) = 2.776. Do not reject H0 .
9.13 H0 : µx = µy vs. Ha : µx ≠ µy ; t = .9065. |.9065| ≯ t.025 (24) = 2.064. Do not reject H0 .
9.15 H0 : p1 = p2 vs. Ha : p1 ≠ p2 ; z = 1.7624. |1.7624| ≯ z.025 = 1.960. Do not reject H0 .
Chapter 10
10.1 b. rxy = (1/(n − 1)) Σⁿᵢ₌₁ [(xi − x̄)/sx ][(yi − ȳ)/sy ]. In the sum, (20, 24) cancels with
(60, 24) and (30, 28) with (50, 28), while (40, 30) results in a zero summand.
10.3 a. ŷ = 7.9437 + 2.0693x. The line intercepts the y-axis at (0, 7.9437), and a baby will
gain 2.0693 lbs. (on average) every month.
b. (i) 12.0823 lbs. (ii) 18.2902 lbs.
10.5 a. Yes, roughly. b. r = .709; r² = .503, i.e., 50.3% c. ŷ = 5.63 + .18x d. 11.03
10.7 a. negative association b. ŷ = 14939.0933 − 1753.2839x
c. 7049.3158, se = 1495.0401, 95% CI = (3064.4696, 11034.1620)
d. t = −1753.2839/310.3641 = −5.6491. |t| > t.025 = 2.306. Reject H0 .
10.9 a. ŷ = 11.1665 − .6333x. The sum of squares of residuals will be minimized.
b. t = -1.63. |t| ≯ t.025 = 2.776. Do not reject H0 .
c. 11.1665 − .6333 × 8 ± 2.776 × 6.7023 × √[1 + 1/6 + (8 − 5)²/(450 − 30²/6)] = (-14.2530, 26.4532)
10.11 a. r = .9165. Strong positive linear relationship. b. ŷ = 33.33 + 7.00x c. 71.83, (60.6, 83.1)
10.13 a. r = .3948. No. b. ŷ = 190.7307 − 1.7552x. t = -1.2153. |t| ≯ t.025 = 2.306. Do not
reject H0 .
c. 173.1787 ± 2.306 × 108.0030 × √[1 + 1/10 + (10 − 20.99)²/(9998.01 − 209.9²/10)] = (-90.5842, 436.9416)
Appendix B
Introduction to R
B.1 What is R?
R is one of the most popular and useful programming languages, free and open
source. It has many statistical computing and graphing tools, allowing one to
analyze data sets, large and small. Many built-in functions and packages are already
part of R, making it very convenient for users. Users can compile and run R on various
operating systems including Windows, Mac OS X, and Linux.
• Choose the first one, RStudio Desktop Open Source License, and click on DOWNLOAD
• Choose the installer for your platform and download RStudio
• Follow the installer’s simple steps to install R
(c) Video Link for tutorial to download R and RStudio
• For Mac: https://www.youtube.com/watch?v=d-u7 vdag − 0
• For Windows: https://www.youtube.com/watch?v=9-RrkJQQYqY
As a tool for statistical analysis, there are many built-in functions in R programming
that help users to find the information more accurately and conveniently.
However, several powerful functions are not contained in R by default, but we can
install packages to use these functions. To install a package, we only have to use
the function install.packages("name of the package"). For example, we can install the
package "statip" in R by saying

install.packages('statip')
The package has been successfully installed if it shows the following line at the end of
the output.
Note that the directory would be different for each user depending on different settings
of the computer; but once this line prints out, package installation is complete.
The ? operator offers exactly the same result as the help() function. Therefore, even if
we forget the usage of a function in R, we can use these two to search for the information
we want.
Appendix C
Introduction to Python
• Like several other programming languages, Python has a number of basic types
including integers, floats, booleans, and strings, which behave in ways similar to
other programming languages.
• Unlike many languages, Python does not have unary increment (x++) or decre-
ment (x−−) operators.
Numbers: include integers and floats.
Booleans: Python uses English words like and, not, and or, instead of symbols
like (&&, ||, etc.).
Strings: Python has great support for strings, which can be written with single or
double quotes.
• Python has several built-in containers, including lists, tuples, and dictionaries.
List: the Python equivalent of arrays, but resizable and able to contain elements of
different types. A list stores a series of items in a specific order, which can be accessed
using an index or within a loop, e.g.
– nums = [1, 5, 2]
– print(nums[-1])
• Numpy is the core scientific computing library in Python. To get started, type
import numpy as np in Jupyter and run it.
• It can be used to deal with powerful N −dimensional arrays and matrices, along
with high-level mathematical functions.
For instance, numpy.random.binomial(n, p, size): draws samples from a
binomial distribution where n is number of trials, p is probability of success, and
size is output shape. (see Ch 5)
Similarly, numpy.random.uniform(low, high, size): draws samples from a
uniform distribution where low is the lower bound of the interval, high is the
upper bound, and size is the output shape. (Ch 5)
numpy.mean(a): calculates the average of the elements of array a.
numpy.std(a): calculates the standard deviation of the elements of array a.
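For readers without numpy at hand, the standard library's statistics module offers analogues of the two calls just mentioned (note that numpy.std defaults to the population form, ddof = 0, which statistics.pstdev matches):

```python
# Standard-library counterparts of numpy.mean and numpy.std (population form).
import statistics

a = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(a))     # average of the elements
print(statistics.pstdev(a))   # population standard deviation
```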
• Matplotlib is a Python 2D plotting library which allows one to plot figures and
create visualizations such as line graphs and scatterplots. To get started, type
import matplotlib.pyplot as plt in Jupyter.
• Plots can be customized with functions such as plt.scatter(), plt.title(),
plt.xlabel(), and plt.ylabel(), which add points, titles, and axis labels;
plt.show() displays the figure. plt.savefig() saves a plot, whereas plt.figure()
is used to set a custom figure size.
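A minimal, self-contained sketch combining these calls; the Agg backend line and the file name scatter.png are assumptions added here so the script runs without a display:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
plt.figure(figsize=(6, 4))       # custom figure size
plt.scatter(x, y)
plt.title("A Simple Scatterplot")
plt.xlabel("x values")
plt.ylabel("y values")
plt.savefig("scatter.png")       # save to a file instead of calling plt.show()
```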
Appendix D
• Ex. 2.2
Using R:
student=c(36,15,22,32,40)
major=c("Communication","Biology","Economics","Psychology","Undeclared")
barplot(student, names.arg=major, xlab="Major", ylab="Number of Students",
        main="Statistics Class")
• Ex. 2.7
Using Python:
weight=(10,8,12,-2,8,8,9,11,5,4)
import matplotlib.pyplot as plt
plt.boxplot(weight)
plt.title('Boxplot For the Weights of Rabbits')
plt.show()
• Ex. 2.8
Using R:
a.)
time=c(10,1,2,9,5,3,4,1,0,5,6,5,3,9,3)
summary(time)
boxplot(time, main="Waiting time in minutes")
> time=c(10,1,2,9,5,3,4,1,0,5,6,5,3,9,3)
> summary(time)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 2.5 4.0 4.4 5.5 10.0
b.)
mean(time)
sd(time)
> mean(time)
[1] 4.4
> sd(time)
[1] 3.065942
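For comparison, the same summary statistics can be computed in plain Python; this is a sketch, not code from the book, and like R's sd() it divides by n - 1:

```python
import math

time = [10, 1, 2, 9, 5, 3, 4, 1, 0, 5, 6, 5, 3, 9, 3]
n = len(time)
mean = sum(time) / n
# sample variance divides by n - 1, matching R's sd()
var = sum((t - mean) ** 2 for t in time) / (n - 1)
sd = math.sqrt(var)
print(round(mean, 1), round(sd, 6))   # 4.4 3.065942
```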
226 Python and R Code for Answering Selected Exercises from the Book
• Ex 2.18
Using Python:
enrollment=(36,28,33,41,39,36,31,35,33,35,39,41,36,
            30,38,37,41,44,46,38,33,35,47,27,36)
import matplotlib.pyplot as plt
plt.stem(enrollment, linefmt="-.")
plt.title('Stem and Leaf plot For the Enrollment')
plt.show()
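Note that plt.stem draws a stem plot (vertical lines from a baseline), which is not the same as a classical stem-and-leaf display; as a sketch, a text-based stem-and-leaf table for the same data can be produced in plain Python:

```python
from collections import defaultdict

enrollment = [36, 28, 33, 41, 39, 36, 31, 35, 33, 35, 39, 41, 36,
              30, 38, 37, 41, 44, 46, 38, 33, 35, 47, 27, 36]

# group the leaf (last digit) of each value under its stem (leading digit)
stems = defaultdict(list)
for value in sorted(enrollment):
    stems[value // 10].append(value % 10)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```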
Chapter 5
• Ex. 5.2
Using R:
# Ex 5.2
# P(X = 4)
dbinom(4, size=20, prob=1/4)
# P(X = 8 or more)
sum(dbinom(8:20, size=20, prob=1/4))
Using Python:
import numpy as np
n=20
p=1/4
count_4=0
count_8_more=0
x=np.random.binomial(n, p, 10000)
for i in range(10000):
    if x[i]==4:
        count_4=count_4+1
    if x[i]>=8:
        count_8_more=count_8_more+1
print(count_4/10000)
print(count_8_more/10000)
0.1862
0.1022
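The simulated frequencies above can be checked against the exact binomial probabilities; the following sketch uses the standard-library math.comb (an assumption beyond the book's numpy-based code) to compute P(X = 4) and P(X ≥ 8) directly:

```python
from math import comb

n, p = 20, 1/4

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p4 = binom_pmf(4, n, p)                                        # P(X = 4)
p8_or_more = sum(binom_pmf(k, n, p) for k in range(8, n + 1))  # P(X >= 8)
print(round(p4, 4), round(p8_or_more, 4))   # 0.1897 0.1018
```

These exact values agree closely with the simulated 0.1862 and 0.1022 above.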
• Ex. 5.9 a) b)
Using R:
# prob jumbo
1-pnorm(30, mean=20, sd=5)
pnorm(30, mean=20, sd=5, lower.tail=F)  # same as line above
# prob medium (between 15 and 25)
pnorm(25, mean=20, sd=5)-pnorm(15, mean=20, sd=5)
Using Python:
import numpy as np
mu=20
sigma=5
count_jumbo=0
count_medium=0
x=np.random.normal(mu, sigma, 10000)
for i in range(10000):
    if x[i]>30:
        count_jumbo=count_jumbo+1
    if x[i]>=15 and x[i]<=25:
        count_medium=count_medium+1
print(count_jumbo/10000)
print(count_medium/10000)
0.0179
0.6866
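As a side note, the loop over 10,000 draws can be replaced by vectorized numpy operations; this sketch (with an added seed for reproducibility) computes the same proportions with boolean arrays:

```python
import numpy as np

np.random.seed(1)          # added for reproducibility
mu, sigma = 20, 5
x = np.random.normal(mu, sigma, 10000)

# np.mean of a boolean condition gives the fraction of True values
p_jumbo = np.mean(x > 30)
p_medium = np.mean((x >= 15) & (x <= 25))
print(p_jumbo, p_medium)   # roughly 0.023 and 0.68
```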
• Ex. 5.22
Using R:
# P(X < 1)
pnorm(1.00, mean=0, sd=1)
# P(X < -1.5)
pnorm(-1.5, mean=0, sd=1)
# P(X > 2.4)
1-pnorm(2.4, mean=0, sd=1)
# density at x = 2 (P(X = 2) is 0 for a continuous variable)
dnorm(2, mean=0, sd=1)
# between 1 and 2
pnorm(2, mean=0, sd=1)-pnorm(1, mean=0, sd=1)
# between -1 and 2.2
pnorm(2.2, mean=0, sd=1)-pnorm(-1, mean=0, sd=1)
• Ex. 5.23 a) b) d) e)
Using R:
# a)
pnorm(71, mean=68, sd=3)-pnorm(65, mean=68, sd=3)
# b)
# 6 feet is 6*12 = 72 inches
1-pnorm(72, mean=68, sd=3)
# d)
pnorm(60, mean=68, sd=3)
# e)
qnorm(p=0.25, mean=68, sd=3)
> #a)
> pnorm(71, mean=68, sd=3)-pnorm(65, mean=68, sd=3)
[1] 0.6826895
> #b)
> 1-pnorm(72, mean=68, sd=3)
[1] 0.09121122
> #d)
> pnorm(60,mean=68, sd=3)
[1] 0.003830381
> #e)
> qnorm(p=0.25, mean=68, sd=3)
[1] 65.97653
Chapter 6
• Interpreting Fact 4:
Fact 4 is the core principle of this chapter. Before digging into specific problems,
we can use Python to check its accuracy.
Suppose we have a binomial distribution Bin(50, 1/6); the simulation goes as follows:
import numpy as np
n=50
p=1/6
x=np.random.binomial(n, p, 1000000)  # simulating random array following binomial distribution
sim_mean=np.mean(x)  # mean of the simulated array x
sim_sd=np.std(x)  # sd of simulated array x
theoretical_mean=n*p  # formula in Fact 4
theoretical_sd=np.sqrt(n*p*(1-p))  # formula in Fact 4
print("the simulated result mean and standard deviation is", sim_mean, sim_sd)
print("the theoretical result mean and standard deviation is", theoretical_mean, theoretical_sd)
[1] the simulated result mean and standard deviation is 8.329962 2.63570124987
[2] the theoretical result mean and standard deviation is 8.333333333333332
2.63523138347
As we can see, the simulated values are very close to the results expected from the
formula.
In Figure 6.3, we increase the sample size n from 10 to 80 as we go from top to
bottom, and vary p over the values 0.05, 0.5 and 0.8 as we go from left to right. These
plots demonstrate how the normal approximation improves with increasing n and for
p values closer to 0.5. The R code for plotting these graphs is attached below, for
readers who would like to experiment with other values of n and p.
x1 <- seq(0, 10, by=1)
y1 <- dbinom(x1, 10, 0.05)
plot_1_1 <- plot(x1, y1, col="red", pch=16)
y2 <- dbinom(x1, 10, 0.5)
plot_1_2 <- plot(x1, y2, col="red", pch=16)
y3 <- dbinom(x1, 10, 0.8)
plot_1_3 <- plot(x1, y3, col="red", pch=16)

x2 <- seq(0, 20, by=1)
y4 <- dbinom(x2, 20, 0.05)
plot_2_1 <- plot(x2, y4, col="green", pch=16)
y5 <- dbinom(x2, 20, 0.5)
plot_2_2 <- plot(x2, y5, col="green", pch=16)
y6 <- dbinom(x2, 20, 0.8)
plot_2_3 <- plot(x2, y6, col="green", pch=16)

x3 <- seq(0, 50, by=1)
y7 <- dbinom(x3, 50, 0.05)
plot_3_1 <- plot(x3, y7, col="blue", pch=16)
y8 <- dbinom(x3, 50, 0.5)
plot_3_2 <- plot(x3, y8, col="blue", pch=16)
y9 <- dbinom(x3, 50, 0.8)
plot_3_3 <- plot(x3, y9, col="blue", pch=16)

x4 <- seq(0, 80, by=1)
y10 <- dbinom(x4, 80, 0.05)
plot_4_1 <- plot(x4, y10, col="purple", pch=16)
y11 <- dbinom(x4, 80, 0.5)
plot_4_2 <- plot(x4, y11, col="purple", pch=16)
y12 <- dbinom(x4, 80, 0.8)
plot_4_3 <- plot(x4, y12, col="purple", pch=16)
• Ex 6.11
# this question meets the prerequisite for using the normal distribution as approximation
n=200
p=0.3
norm_mean=n*p
norm_std=sqrt(n*p*(1-p))
# calculate P(X>50) = 1-P(X<=50)
1-pnorm(50, mean=norm_mean, sd=norm_std, lower.tail=TRUE)
# if using the binomial distribution
1-pbinom(50, size=n, prob=p, lower.tail=TRUE)
# the outputs are pretty close to each other
> n=200
> p=0.3
> norm_mean=n*p
> norm_std=sqrt(n*p*(1-p))
> 1-pnorm(50,mean = norm_mean,sd=norm_std,lower.tail = TRUE)
[1] 0.9385887
> 1-pbinom(50,size = n,prob = p,lower.tail = TRUE)
[1] 0.9304547
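For comparison, the same two numbers can be computed in Python; this sketch uses the standard-library erf for the normal CDF and math.comb for the exact binomial sum, neither of which appears in the book's code:

```python
from math import comb, erf, sqrt

n, p = 200, 0.3
mean = n * p                # 60
sd = sqrt(n * p * (1 - p))  # sqrt(42)

def norm_cdf(x, mu, sigma):
    """Normal CDF expressed through the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

approx = 1 - norm_cdf(50, mean, sd)   # normal approximation to P(X > 50)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(51, n + 1))
print(approx, exact)   # close to R's 0.9385887 and 0.9304547
```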
• Ex 6.12
p=0.8
# a) exact
pbinom(11, 15, prob=p, lower.tail=TRUE)
# a) approximating by normal distribution
pnorm(11, mean=15*p, sd=sqrt(15*p*(1-p)), lower.tail=TRUE)

# b) exact
pbinom(110, 150, prob=p, lower.tail=TRUE)
# b) approximating by normal distribution
pnorm(110, mean=150*p, sd=sqrt(150*p*(1-p)), lower.tail=TRUE)

# c) exact
pbinom(1100, 1500, prob=p, lower.tail=TRUE)
# c) approximating by normal distribution
pnorm(1100, mean=1500*p, sd=sqrt(1500*p*(1-p)), lower.tail=TRUE)

# d) exact
pbinom(11000, 15000, prob=p, lower.tail=TRUE)
# d) approximating by normal distribution
pnorm(11000, mean=15000*p, sd=sqrt(15000*p*(1-p)), lower.tail=TRUE)
Output:
> p=0.8
> pbinom(11,15,prob = p,lower.tail = TRUE)
[1] 0.3518379
> pnorm(11,mean = 15*p,sd=sqrt(15*p*(1-p)),lower.tail=TRUE)
[1] 0.2593025
Apart from what is being asked, we can also view the problem from a graphical per-
spective:
# a) plot interpretation
x1 <- seq(0, 15, by=1)
y1 <- dbinom(x1, 15, 0.8)
y2 <- dnorm(x1, mean=15*0.8, sd=sqrt(15*0.8*0.2))
plot_1_1 <- plot(x1, y1, col="red", main="6.12 a)")
lines(x1, y2, col="blue")
# b) plot interpretation
x1 <- seq(0, 150, by=1)
y1 <- dbinom(x1, 150, 0.8)
y2 <- dnorm(x1, mean=150*0.8, sd=sqrt(150*0.8*0.2))
plot_1_1 <- plot(x1, y1, col="red", main="6.12 b)")
lines(x1, y2, col="blue")
# c) plot interpretation
x1 <- seq(0, 1500, by=1)
y1 <- dbinom(x1, 1500, 0.8)
y2 <- dnorm(x1, mean=1500*0.8, sd=sqrt(1500*0.8*0.2))
plot_1_1 <- plot(x1, y1, col="red", main="6.12 c)")
lines(x1, y2, col="blue")
# d) plot interpretation
x1 <- seq(0, 15000, by=1)
y1 <- dbinom(x1, 15000, 0.8)
y2 <- dnorm(x1, mean=15000*0.8, sd=sqrt(15000*0.8*0.2))
plot_1_1 <- plot(x1, y1, col="red", main="6.12 d)")
lines(x1, y2, col="blue")
Chapter 9
• Ex 9.1
Using R:
# a)
t_0.05 <- qt(0.95, df=5)
round(t_0.05, digits=3)
# b)
acid_lower <- 12.4 - qt(0.975, df=9)*sqrt(63.1556/10)
acid_upper <- 12.4 + qt(0.975, df=9)*sqrt(63.1556/10)
> # a)
> t_0.05<-qt(0.95,df=5)
> round(t_0.05,digits=3)
[1] 2.015
> # b)
> acid_lower <- 12.4 - qt(0.975,df=9)*sqrt(63.1556/10)
> acid_upper <- 12.4 + qt(0.975,df=9)*sqrt(63.1556/10)
> print(paste(round(acid_lower,digits=4),
round(acid_upper,digits=4)))
[1] "6.715 18.085"
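The same interval can be computed in plain Python; as a sketch, the t quantile qt(0.975, df=9) = 2.262157 is taken as a given constant (from the t table) rather than computed:

```python
from math import sqrt

xbar = 12.4      # sample mean (from the exercise)
s2 = 63.1556     # sample variance (from the exercise)
n = 10
t = 2.262157     # t value with right-tail area 0.025 and 9 df, i.e. qt(0.975, df=9)

half_width = t * sqrt(s2 / n)
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 3), round(upper, 3))   # 6.715 18.085
```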
Chapter 10
• Ex 10.4
Using R:
# a)
x_lb <- c(2705,3470,3935,3195,4025,
          2270,1845,3735,1695,2285,3695,
          2970,2295,3240,2045,2985)
y_mpg <- c(25,19,16,21,15,
           29,31,15,46,26,17,
           26,29,19,33,21)
plot(x_lb, y_mpg, main="Exercise 10.4a",
     xlab="Weight (lbs)", ylab="Miles Per Gallon",
     pch=20, cex=0.8)
# b)
r <- cor(x_lb, y_mpg)
round(r, digits=4)
# c)
c <- lm(y_mpg~x_lb)
plot(c, main="Exercise 10.4c",
     xlab="Weight (lbs)", ylab="Miles Per Gallon",
     pch=20, cex=0.8)
abline(c, col="red")
# d)
summary(c)
# e)
x_car <- 2500
y_car <- 53.05 - 0.01*x_car
y_car
df_car <- data.frame(x_lb=2500)
predict(c, df_car, se.fit=TRUE, level=0.95,
        interval='confidence')
> # b)
> cor(x_lb,y_mpg)
> round(r,digits=4)
[1] -0.9186
> # c)
> c <- lm(y_mpg~x_lb)
> plot(c,main="Exercise 10.4c",
xlab="Weight (lbs)",ylab="Miles Per Gallon",
pch=20,cex=0.8)
> abline(c,col="red")
> # d)
> summary(c)
Call:
lm(formula = y_mpg ~ x_lb, data = cars)
Residuals:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.047551 3.417447 15.523 3.23e-10 ***
x_lb -0.009932 0.001142 -8.694 5.13e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$se.fit
[1] 0.9572027
$df
[1] 14
$residual.scale
[1] 3.365896
• Ex 10.11
Using R:
# a)
x_hrs <- c(5,6,8,4,6,7)
y_score <- c(70,75,85,60,72,90)
cor(x_hrs, y_score)
# b)
score <- lm(y_score~x_hrs)
summary(score)
> # a)
> x_hrs <- c(5,6,8,4,6,7)
> y_score <- c(70,75,85,60,72,90)
> r <- cor(x_hrs,y_score)
> round(r,digits=4)
[1] 0.9165
> # b)
> score<-lm(y_score~x_hrs)
> summary(score)
Call:
lm(formula = y_score ~ x_hrs, data = exam)
Residuals:
1 2 3 4 5 6
1.6667 -0.3333 -4.3333 -1.3333 -3.3333 7.6667
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.333 9.375 3.556 0.0237 *
x_hrs 7.000 1.528 4.583 0.0102 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Appendix E
Tables
Standard Normal Distribution: entries give the cumulative probability P(Z ≤ z), for z ≤ 0
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-3.4 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0002
-3.3 .0005 .0005 .0005 .0004 .0004 .0004 .0004 .0004 .0004 .0003
-3.2 .0007 .0007 .0006 .0006 .0006 .0006 .0006 .0005 .0005 .0005
-3.1 .0010 .0009 .0009 .0009 .0008 .0008 .0008 .0008 .0007 .0007
-3.0 .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010
-2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014
-2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019
-2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026
-2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036
-2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048
-2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064
-2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
-2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
-2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143
-2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183
-1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233
-1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294
-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
-1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559
-1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681
-1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823
-1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985
-1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170
-1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379
-0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
-0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
-0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
-0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451
-0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776
-0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
-0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
-0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859
-0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247
-0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
Standard Normal Distribution: entries give the cumulative probability P(Z ≤ z), for z ≥ 0
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997
3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998
Binomial Probabilities: entries give P(X = k) for n trials with success probability p
p
n k 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9
1 0 .9000 .8000 .7500 .7000 .6000 .5000 .4000 .3000 .2500 .2000 .1000
1 .1000 .2000 .2500 .3000 .4000 .5000 .6000 .7000 .7500 .8000 .9000
2 0 .8100 .6400 .5625 .4900 .3600 .2500 .1600 .0900 .0625 .0400 .0100
1 .1800 .3200 .3750 .4200 .4800 .5000 .4800 .4200 .3750 .3200 .1800
2 .0100 .0400 .0625 .0900 .1600 .2500 .3600 .4900 .5625 .6400 .8100
3 0 .7290 .5120 .4219 .3430 .2160 .1250 .0640 .0270 .0156 .0080 .0010
1 .2430 .3840 .4219 .4410 .4320 .3750 .2880 .1890 .1406 .0960 .0270
2 .0270 .0960 .1406 .1890 .2880 .3750 .4320 .4410 .4219 .3840 .2430
3 .0010 .0080 .0156 .0270 .0640 .1250 .2160 .3430 .4219 .5120 .7290
4 0 .6561 .4096 .3164 .2401 .1296 .0625 .0256 .0081 .0039 .0016 .0001
1 .2916 .4096 .4219 .4116 .3456 .2500 .1536 .0756 .0469 .0256 .0036
2 .0486 .1536 .2109 .2646 .3456 .3750 .3456 .2646 .2109 .1536 .0486
3 .0036 .0256 .0469 .0756 .1536 .2500 .3456 .4116 .4219 .4096 .2916
4 .0001 .0016 .0039 .0081 .0256 .0625 .1296 .2401 .3164 .4096 .6561
5 0 .5905 .3277 .2373 .1681 .0778 .0312 .0102 .0024 .0010 .0003
1 .3280 .4096 .3955 .3601 .2592 .1562 .0768 .0284 .0146 .0064 .0004
2 .0729 .2048 .2637 .3087 .3456 .3125 .2304 .1323 .0879 .0512 .0081
3 .0081 .0512 .0879 .1323 .2304 .3125 .3456 .3087 .2637 .2048 .0729
4 .0005 .0064 .0146 .0283 .0768 .1562 .2592 .3601 .3955 .4096 .3280
5 .0003 .0010 .0024 .0102 .0312 .0778 .1681 .2373 .3277 .5905
6 0 .5314 .2621 .1780 .1176 .0467 .0156 .0041 .0007 .0002 .0001
1 .3543 .3932 .3560 .3025 .1866 .0938 .0369 .0102 .0044 .0015 .0001
2 .0984 .2458 .2966 .3241 .3110 .2344 .1382 .0595 .0330 .0154 .0012
3 .0146 .0819 .1318 .1852 .2765 .3125 .2765 .1852 .1318 .0819 .0146
4 .0012 .0154 .0330 .0595 .1382 .2344 .3110 .3241 .2966 .2458 .0984
5 .0001 .0015 .0044 .0102 .0369 .0938 .1866 .3025 .3560 .3932 .3543
6 .0001 .0002 .0007 .0041 .0156 .0467 .1176 .1780 .2621 .5314
7 0 .4783 .2097 .1335 .0824 .0280 .0078 .0016 .0002 .0001
1 .3720 .3670 .3115 .2471 .1306 .0547 .0172 .0036 .0013 .0004
2 .1240 .2753 .3115 .3177 .2613 .1641 .0774 .0250 .0115 .0043 .0002
3 .0230 .1147 .1730 .2269 .2903 .2734 .1935 .0972 .0577 .0287 .0026
4 .0026 .0287 .0577 .0972 .1935 .2734 .2903 .2269 .1730 .1147 .0230
5 .0002 .0043 .0115 .0250 .0774 .1641 .2613 .3177 .3115 .2753 .1240
6 .0004 .0013 .0036 .0172 .0547 .1306 .2471 .3115 .3670 .3720
7 .0001 .0002 .0016 .0078 .0280 .0824 .1335 .2097 .4783
8 0 .4305 .1678 .1001 .0576 .0168 .0039 .0007 .0001
1 .3826 .3355 .2670 .1977 .0896 .0313 .0079 .0012 .0004 .0001
2 .1488 .2936 .3115 .2965 .2090 .1094 .0413 .0100 .0038 .0011
3 .0331 .1468 .2076 .2541 .2787 .2188 .1239 .0467 .0231 .0092 .0004
4 .0046 .0459 .0865 .1361 .2322 .2734 .2322 .1361 .0865 .0459 .0046
5 .0004 .0092 .0231 .0467 .1239 .2188 .2787 .2541 .2076 .1468 .0331
6 .0011 .0038 .0100 .0413 .1094 .2090 .2965 .3115 .2936 .1488
p
n k 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9
8 7 .0001 .0004 .0012 .0079 .0313 .0896 .1977 .2670 .3355 .3826
8 .0001 .0007 .0039 .0168 .0576 .1001 .1678 .4305
9 0 .3874 .1342 .0751 .0404 .0101 .0020 .0003
1 .3874 .3020 .2253 .1556 .0605 .0176 .0035 .0004 .0001
2 .1722 .3020 .3003 .2668 .1612 .0703 .0212 .0039 .0012 .0003
3 .0446 .1762 .2336 .2668 .2508 .1641 .0743 .0210 .0087 .0028 .0001
4 .0074 .0661 .1168 .1715 .2508 .2461 .1672 .0735 .0389 .0165 .0008
5 .0008 .0165 .0389 .0735 .1672 .2461 .2508 .1715 .1168 .0661 .0074
6 .0001 .0028 .0087 .0210 .0743 .1641 .2508 .2668 .2336 .1762 .0446
7 .0003 .0012 .0039 .0212 .0703 .1612 .2668 .3003 .3020 .1722
8 .0001 .0004 .0035 .0176 .0605 .1556 .2253 .3020 .3874
9 .0003 .0020 .0101 .0404 .0751 .1342 .3874
10 0 .3487 .1074 .0563 .0282 .0060 .0010 .0001
1 .3874 .2684 .1877 .1211 .0403 .0098 .0016 .0001
2 .1937 .3020 .2816 .2335 .1209 .0439 .0106 .0014 .0004 .0001
3 .0574 .2013 .2503 .2668 .2150 .1172 .0425 .0090 .0031 .0008
4 .0112 .0881 .1460 .2001 .2508 .2051 .1115 .0368 .0162 .0055 .0001
5 .0015 .0264 .0584 .1029 .2007 .2461 .2007 .1029 .0584 .0264 .0015
6 .0001 .0055 .0162 .0368 .1115 .2051 .2508 .2001 .1460 .0881 .0112
7 .0008 .0031 .0090 .0425 .1172 .2150 .2668 .2503 .2013 .0574
8 .0001 .0004 .0014 .0106 .0439 .1209 .2335 .2816 .3020 .1937
9 .0001 .0016 .0098 .0403 .1211 .1877 .2684 .3874
10 .0001 .0010 .0060 .0282 .0563 .1074 .3487
11 0 .3138 .0859 .0422 .0198 .0036 .0005
1 .3835 .2362 .1549 .0932 .0266 .0054 .0007
2 .2131 .2953 .2581 .1998 .0887 .0269 .0052 .0005 .0001
3 .0710 .2215 .2581 .2568 .1774 .0806 .0234 .0037 .0011 .0002
4 .0158 .1107 .1721 .2201 .2365 .1611 .0701 .0173 .0064 .0017
5 .0025 .0388 .0803 .1321 .2207 .2256 .1471 .0566 .0268 .0097 .0003
6 .0003 .0097 .0268 .0566 .1471 .2256 .2207 .1321 .0803 .0388 .0025
7 .0017 .0064 .0173 .0701 .1611 .2365 .2201 .1721 .1107 .0158
8 .0002 .0011 .0037 .0234 .0806 .1774 .2568 .2581 .2215 .0710
9 .0001 .0005 .0052 .0269 .0887 .1998 .2581 .2953 .2131
10 .0007 .0054 .0266 .0932 .1549 .2362 .3835
11 .0005 .0036 .0198 .0422 .0859 .3138
12 0 .2824 .0687 .0317 .0138 .0022 .0002
1 .3766 .2062 .1267 .0712 .0174 .0029 .0003
2 .2301 .2835 .2323 .1678 .0639 .0161 .0025 .0002
3 .0852 .2362 .2581 .2397 .1419 .0537 .0125 .0015 .0004 .0001
4 .0213 .1329 .1936 .2311 .2128 .1208 .0420 .0078 .0024 .0005
5 .0038 .0532 .1032 .1585 .2270 .1934 .1009 .0291 .0115 .0033
6 .0005 .0155 .0401 .0792 .1766 .2256 .1766 .0792 .0401 .0155 .0005
7 .0033 .0115 .0291 .1009 .1934 .2270 .1585 .1032 .0532 .0038
8 .0005 .0024 .0078 .0420 .1208 .2128 .2311 .1936 .1329 .0213
9 .0001 .0004 .0015 .0125 .0537 .1419 .2397 .2581 .2362 .0852
10 .0002 .0025 .0161 .0639 .1678 .2323 .2835 .2301
11 .0003 .0029 .0174 .0712 .1267 .2062 .3766
12 .0002 .0022 .0138 .0317 .0687 .2824
p
n k 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9
13 0 .2542 .0550 .0238 .0097 .0013 .0001
1 .3672 .1787 .1029 .0540 .0113 .0016 .0001
2 .2448 .2680 .2059 .1388 .0453 .0095 .0012 .0001
3 .0997 .2457 .2517 .2181 .1107 .0349 .0065 .0006 .0001
4 .0277 .1535 .2097 .2337 .1845 .0873 .0243 .0034 .0009 .0001
5 .0055 .0691 .1258 .1803 .2214 .1571 .0656 .0142 .0047 .0011
6 .0008 .0230 .0559 .1030 .1968 .2095 .1312 .0442 .0186 .0058 .0001
7 .0001 .0058 .0186 .0442 .1312 .2095 .1968 .1030 .0559 .0230 .0008
8 .0011 .0047 .0142 .0656 .1571 .2214 .1803 .1258 .0691 .0055
9 .0001 .0009 .0034 .0243 .0873 .1845 .2337 .2097 .1535 .0277
10 .0001 .0006 .0065 .0349 .1107 .2181 .2517 .2457 .0997
11 .0001 .0012 .0095 .0453 .1388 .2059 .2680 .2448
12 .0001 .0016 .0113 .0540 .1029 .1787 .3672
13 .0001 .0013 .0097 .0238 .0550 .2542
14 0 .2288 .0440 .0178 .0068 .0008 .0001
1 .3559 .1539 .0832 .0407 .0073 .0009 .0001
2 .2570 .2501 .1802 .1134 .0317 .0056 .0005
3 .1142 .2501 .2402 .1943 .0845 .0222 .0033 .0002
4 .0349 .1720 .2202 .2290 .1549 .0611 .0136 .0014 .0003
5 .0078 .0860 .1468 .1963 .2066 .1222 .0408 .0066 .0018 .0003
6 .0013 .0322 .0734 .1262 .2066 .1833 .0918 .0232 .0082 .0020
7 .0002 .0092 .0280 .0618 .1574 .2095 .1574 .0618 .0280 .0092 .0002
8 .0020 .0082 .0232 .0918 .1833 .2066 .1262 .0734 .0322 .0013
9 .0003 .0018 .0066 .0408 .1222 .2066 .1963 .1468 .0860 .0078
10 .0003 .0014 .0136 .0611 .1549 .2290 .2202 .1720 .0349
11 .0002 .0033 .0222 .0845 .1943 .2402 .2501 .1142
12 .0005 .0056 .0317 .1134 .1802 .2501 .2570
13 .0001 .0009 .0073 .0407 .0832 .1539 .3559
14 .0001 .0008 .0068 .0178 .0440 .2288
15 0 .2059 .0352 .0134 .0047 .0005
1 .3432 .1319 .0668 .0305 .0047 .0005
2 .2669 .2309 .1559 .0916 .0219 .0032 .0003
3 .1285 .2501 .2252 .1700 .0634 .0139 .0016 .0001
4 .0428 .1876 .2252 .2186 .1268 .0417 .0074 .0006 .0001
5 .0105 .1032 .1651 .2061 .1859 .0916 .0245 .0030 .0007 .0001
6 .0019 .0430 .0917 .1472 .2066 .1527 .0612 .0116 .0034 .0007
7 .0003 .0138 .0393 .0811 .1771 .1964 .1181 .0348 .0131 .0035
8 .0035 .0131 .0348 .1181 .1964 .1771 .0811 .0393 .0138 .0003
9 .0007 .0034 .0116 .0612 .1527 .2066 .1472 .0917 .0430 .0019
10 .0001 .0007 .0030 .0245 .0916 .1859 .2061 .1651 .1032 .0105
11 .0001 .0006 .0074 .0417 .1268 .2186 .2252 .1876 .0428
12 .0001 .0016 .0139 .0634 .1700 .2252 .2501 .1285
13 .0003 .0032 .0219 .0916 .1559 .2309 .2669
14 .0005 .0047 .0305 .0668 .1319 .3432
15 .0005 .0047 .0134 .0352 .2059
p
n k 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9
16 0 .1853 .0281 .0100 .0033 .0003
1 .3294 .1126 .0535 .0228 .0030 .0002
2 .2745 .2111 .1336 .0732 .0150 .0018 .0001
3 .1423 .2463 .2079 .1465 .0468 .0085 .0008
4 .0514 .2001 .2252 .2040 .1014 .0278 .0040 .0002
5 .0137 .1201 .1802 .2099 .1623 .0667 .0142 .0013 .0002
6 .0028 .0550 .1101 .1649 .1983 .1222 .0392 .0056 .0014 .0002
7 .0004 .0197 .0524 .1010 .1889 .1746 .0840 .0185 .0058 .0012
8 .0001 .0055 .0197 .0487 .1417 .1964 .1417 .0487 .0197 .0055 .0001
9 .0012 .0058 .0185 .0840 .1746 .1889 .1010 .0524 .0197 .0004
10 .0002 .0014 .0056 .0392 .1222 .1983 .1649 .1101 .0550 .0028
11 .0002 .0013 .0142 .0667 .1623 .2099 .1802 .1201 .0137
12 .0002 .0040 .0278 .1014 .2040 .2252 .2001 .0514
13 .0008 .0085 .0468 .1465 .2079 .2463 .1423
14 .0001 .0018 .0150 .0732 .1336 .2111 .2745
15 .0002 .0030 .0228 .0535 .1126 .3294
16 .0003 .0033 .0100 .0281 .1853
17 0 .1668 .0225 .0075 .0023 .0002
1 .3150 .0957 .0426 .0169 .0019 .0001
2 .2800 .1914 .1136 .0581 .0102 .0010 .0001
3 .1556 .2393 .1893 .1245 .0341 .0052 .0004
4 .0605 .2093 .2209 .1868 .0796 .0182 .0021 .0001
5 .0175 .1361 .1914 .2081 .1379 .0472 .0081 .0006 .0001
6 .0039 .0680 .1276 .1784 .1839 .0944 .0242 .0026 .0005 .0001
7 .0007 .0267 .0668 .1201 .1927 .1484 .0571 .0095 .0025 .0004
8 .0001 .0084 .0279 .0644 .1606 .1855 .1070 .0276 .0093 .0021
9 .0021 .0093 .0276 .1070 .1855 .1606 .0644 .0279 .0084 .0001
10 .0004 .0025 .0095 .0571 .1484 .1927 .1201 .0668 .0267 .0007
11 .0001 .0005 .0026 .0242 .0944 .1839 .1784 .1276 .0680 .0039
12 .0001 .0006 .0081 .0472 .1379 .2081 .1914 .1361 .0175
13 .0001 .0021 .0182 .0796 .1868 .2209 .2093 .0605
14 .0004 .0052 .0341 .1245 .1893 .2393 .1556
15 .0001 .0010 .0102 .0581 .1136 .1914 .2800
16 .0001 .0019 .0169 .0426 .0957 .3150
17 .0002 .0023 .0075 .0225 .1668
18 0 .1501 .0180 .0056 .0016 .0001
1 .3002 .0811 .0338 .0126 .0012 .0001
2 .2835 .1723 .0958 .0458 .0069 .0006
3 .1680 .2297 .1704 .1046 .0246 .0031 .0002
4 .0700 .2153 .2130 .1681 .0614 .0117 .0011
5 .0218 .1507 .1988 .2017 .1146 .0327 .0045 .0002
6 .0052 .0816 .1436 .1873 .1655 .0708 .0145 .0012 .0002
7 .0010 .0350 .0820 .1376 .1892 .1214 .0374 .0046 .0010 .0001
8 .0002 .0120 .0376 .0811 .1734 .1669 .0771 .0149 .0042 .0008
9 .0033 .0139 .0386 .1284 .1855 .1284 .0386 .0139 .0033
10 .0008 .0042 .0149 .0771 .1669 .1734 .0811 .0376 .0120 .0002
11 .0001 .0010 .0046 .0374 .1214 .1892 .1376 .0820 .0350 .0010
12 .0002 .0012 .0145 .0708 .1655 .1873 .1436 .0816 .0052
13 .0002 .0045 .0327 .1146 .2017 .1988 .1507 .0218
14 .0011 .0117 .0614 .1681 .2130 .2153 .0700
p
n k 0.1 0.2 0.25 0.3 0.4 0.5 0.6 0.7 0.75 0.8 0.9
18 15 .0002 .0031 .0246 .1046 .1704 .2297 .1680
16 .0006 .0069 .0458 .0958 .1723 .2835
17 .0001 .0012 .0126 .0338 .0811 .3002
18 .0001 .0016 .0056 .0180 .1501
19 0 .1351 .0144 .0042 .0011 .0001
1 .2852 .0685 .0268 .0093 .0008
2 .2852 .1540 .0803 .0358 .0046 .0003
3 .1796 .2182 .1517 .0869 .0175 .0018 .0001
4 .0798 .2182 .2023 .1491 .0467 .0074 .0005
5 .0266 .1636 .2023 .1916 .0933 .0222 .0024 .0001
6 .0069 .0955 .1574 .1916 .1451 .0518 .0085 .0005 .0001
7 .0014 .0443 .0974 .1525 .1797 .0961 .0237 .0022 .0004
8 .0002 .0166 .0487 .0981 .1797 .1442 .0532 .0077 .0018 .0003
9 .0051 .0198 .0514 .1464 .1762 .0976 .0220 .0066 .0013
10 .0013 .0066 .0220 .0976 .1762 .1464 .0514 .0198 .0051
11 .0003 .0018 .0077 .0532 .1442 .1797 .0981 .0487 .0166 .0002
12 .0004 .0022 .0237 .0961 .1797 .1525 .0974 .0443 .0014
13 .0001 .0005 .0085 .0518 .1451 .1916 .1574 .0955 .0069
14 .0001 .0024 .0222 .0933 .1916 .2023 .1636 .0266
15 .0005 .0074 .0467 .1491 .2023 .2182 .0798
16 .0001 .0018 .0175 .0869 .1517 .2182 .1796
17 .0003 .0046 .0358 .0803 .1540 .2852
18 .0008 .0093 .0268 .0685 .2852
19 .0001 .0011 .0042 .0144 .1351
20 0 .1216 .0115 .0032 .0008
1 .2702 .0576 .0211 .0068 .0005
2 .2852 .1369 .0669 .0278 .0031 .0002
3 .1901 .2054 .1339 .0716 .0123 .0011
4 .0898 .2182 .1897 .1304 .0350 .0046 .0003
5 .0319 .1746 .2023 .1789 .0746 .0148 .0013
6 .0089 .1091 .1686 .1916 .1244 .0370 .0049 .0002
7 .0020 .0545 .1124 .1643 .1659 .0739 .0146 .0010 .0002
8 .0004 .0222 .0609 .1144 .1797 .1201 .0355 .0039 .0008 .0001
9 .0001 .0074 .0271 .0654 .1597 .1602 .0710 .0120 .0030 .0005
10 .0020 .0099 .0308 .1171 .1762 .1171 .0308 .0099 .0020
11 .0005 .0030 .0120 .0710 .1602 .1597 .0654 .0271 .0074 .0001
12 .0001 .0008 .0039 .0355 .1201 .1797 .1144 .0609 .0222 .0004
13 .0002 .0010 .0146 .0739 .1659 .1643 .1124 .0545 .0020
14 .0002 .0049 .0370 .1244 .1916 .1686 .1091 .0089
15 .0013 .0148 .0746 .1789 .2023 .1746 .0319
16 .0003 .0046 .0350 .1304 .1897 .2182 .0898
17 .0011 .0123 .0716 .1339 .2054 .1901
18 .0002 .0031 .0278 .0669 .1369 .2852
19 .0005 .0068 .0211 .0576 .2702
20 .0008 .0032 .0115 .1216
Table C: Critical Values of t distribution
Critical values χ²α of the χ² distribution, where the subscript α gives the area to the right of the tabled value
df χ2.995 χ2.990 χ2.975 χ2.950 χ2.900 χ2.100 χ2.050 χ2.025 χ2.010 χ2.005
1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169
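The χ² critical values above can be reproduced numerically, in the spirit of the Python appendix; a minimal sketch, assuming SciPy is installed:

```python
# Reproduce chi-square critical values, assuming SciPy is available.
# chi2.ppf(q, df) returns the value with area q to its LEFT, so the
# upper-tail critical value chi^2_alpha is ppf(1 - alpha, df).
from scipy.stats import chi2

for df in (5, 10, 20):
    upper = chi2.ppf(0.95, df)   # chi^2_.050: area .05 in the upper tail
    lower = chi2.ppf(0.05, df)   # chi^2_.950: area .95 in the upper tail
    print(f"df={df}: chi2_.950={lower:.3f}, chi2_.050={upper:.3f}")
```

For df = 10 this gives 3.940 and 18.307, matching the table row above.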
Appendix F
Chapter 2
• Mean
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n-1}$$
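Both forms of the sample variance can be checked with Python's standard `statistics` module, whose `variance` uses the same (n − 1) divisor; a short illustration on made-up data:

```python
# Check the mean and sample-variance formulas with the standard library.
# statistics.variance uses the (n - 1) divisor, matching s^2 above.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
xbar = statistics.mean(data)           # (x1 + ... + xn) / n
s2 = statistics.variance(data)         # sum((x - xbar)^2) / (n - 1)

# The computational shortcut gives the same answer:
n = len(data)
s2_shortcut = (sum(x**2 for x in data) - sum(data)**2 / n) / (n - 1)

print(xbar, s2)    # mean 5, variance 32/7 (about 4.571)
```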
Chapter 3
Chapter 4
• Variance of a discrete random variable
$$\sigma_X^2 = (x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \cdots + (x_n - \mu)^2 p_n = \sum_{i=1}^{n}(x_i - \mu)^2 p_i = \sum_{i=1}^{n} x_i^2 p_i - \mu^2$$
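For a concrete check that the definitional and shortcut forms agree, take a fair six-sided die (values 1 to 6, each with probability 1/6) as the distribution; a quick sketch in Python:

```python
# Variance of a discrete random variable computed both ways, using a
# fair six-sided die (values 1..6, each with probability 1/6).
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

mu = sum(x * p for x, p in zip(values, probs))                    # mean: 3.5
var_def = sum((x - mu)**2 * p for x, p in zip(values, probs))     # definition
var_short = sum(x**2 * p for x, p in zip(values, probs)) - mu**2  # shortcut

print(mu, var_def)    # 3.5 and 35/12 (about 2.917)
```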
Chapter 5
• The binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
•
$$P(k \text{ successes out of } n \text{ trials}) = \binom{n}{k}\, p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n, \quad 0 < p < 1.$$
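Entries of the binomial table can be computed directly from this formula; a minimal sketch using only the standard library:

```python
# Binomial probability P(k successes in n trials) from the formula above.
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. n = 20, p = 0.5, k = 10 reproduces the table entry .1762
print(round(binom_pmf(10, 20, 0.5), 4))   # 0.1762
```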
Chapter 7
• A C = (1 − α) confidence interval for σ:
$$\left( \sqrt{\frac{(n-1)S^2}{\chi^2_{\alpha/2}}},\ \sqrt{\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}} \right)$$
where $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are values from a $\chi^2(n-1)$ distribution with areas $\frac{\alpha}{2}$ below $\chi^2_{1-\alpha/2}$ and above $\chi^2_{\alpha/2}$, respectively.
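This interval can be sketched in a few lines of Python, assuming SciPy is available for the χ² quantiles (the sample numbers below are illustrative, chosen so the critical values match the χ² table entries for df = 9):

```python
# C = (1 - alpha) confidence interval for sigma, assuming SciPy is installed.
from math import sqrt
from scipy.stats import chi2

def sigma_ci(s2, n, alpha=0.05):
    """CI for sigma from a sample variance s2 based on n observations."""
    upper_crit = chi2.ppf(1 - alpha / 2, n - 1)   # chi^2_{alpha/2}
    lower_crit = chi2.ppf(alpha / 2, n - 1)       # chi^2_{1-alpha/2}
    return (sqrt((n - 1) * s2 / upper_crit),
            sqrt((n - 1) * s2 / lower_crit))

# n = 10, s^2 = 4 uses the table values 19.023 and 2.700 for df = 9
print(sigma_ci(4, 10))
```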
•
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
with confidence C = (1 − α), where $\hat{p} = \frac{x}{n}$ is the observed proportion and n is large.
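A minimal standard-library sketch of this interval, using `statistics.NormalDist` for the normal quantile (the 52-out-of-100 sample below is a made-up illustration):

```python
# Large-sample confidence interval for a proportion, standard library only.
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """p-hat +/- z_{alpha/2} * sqrt(p-hat (1 - p-hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf((1 + conf) / 2)   # z_{alpha/2}, about 1.96 at 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 52 successes in 100 trials:
print(proportion_ci(52, 100))
```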
•
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{m}\right)^2 = \frac{z_{\alpha/2}^2\,\sigma^2}{m^2}$$
•
$$n = \frac{z_{\alpha/2}^2\,p_0(1-p_0)}{m^2}$$
is the sample size required for the margin of error to be m, where $p_0$ is the best prior guess about p.
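The sample-size formula for a proportion is easy to compute directly; a short sketch (rounding up, since n must be an integer, and using the worst-case guess p₀ = 0.5 by default):

```python
# Sample size needed for margin of error m when estimating a proportion.
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(m, p0=0.5, conf=0.95):
    """n = z^2 p0 (1 - p0) / m^2, rounded up; p0 = 0.5 is the worst case."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return ceil(z**2 * p0 * (1 - p0) / m**2)

# A 3-point margin (m = 0.03) at 95% confidence with no prior guess:
print(sample_size_for_proportion(0.03))   # 1068
```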
Chapter 9
•
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}$$
• If $\sigma_x, \sigma_y$ are not known but are assumed equal, use the t-statistic to get a C = (1 − α) level confidence interval for $\mu_x - \mu_y$:
$$(\bar{x} - \bar{y}) \pm t_{\alpha/2}\,s_p\sqrt{\frac{1}{m} + \frac{1}{n}}, \qquad s_p^2 = \frac{(m-1)s_x^2 + (n-1)s_y^2}{m+n-2}$$
• If $\sigma_x, \sigma_y$ are not assumed equal:
$$(\bar{x} - \bar{y}) \pm t_{\alpha/2}\sqrt{\frac{s_x^2}{m} + \frac{s_y^2}{n}}$$
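The pooled interval can be sketched as a small function; the t critical value is passed in (from a t table, or from a package such as SciPy), and the sample numbers below are a made-up illustration:

```python
# Pooled two-sample t interval for mu_x - mu_y (equal variances assumed).
# t_crit is t_{alpha/2}(m + n - 2), supplied from a table or a package.
from math import sqrt

def pooled_t_ci(xbar, ybar, s2x, s2y, m, n, t_crit):
    """(xbar - ybar) +/- t_crit * s_p * sqrt(1/m + 1/n)."""
    s2p = ((m - 1) * s2x + (n - 1) * s2y) / (m + n - 2)   # pooled variance
    margin = t_crit * sqrt(s2p) * sqrt(1 / m + 1 / n)
    return xbar - ybar - margin, xbar - ybar + margin

# m = 10, n = 12, s_x^2 = 4, s_y^2 = 6 gives pooled variance 5.1;
# t_crit = 2.086 is the table value t_{.025}(20).
print(pooled_t_ci(20.0, 18.5, 4.0, 6.0, 10, 12, t_crit=2.086))
```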
Chapter 10
• The sample correlation coefficient
$$r_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{n(\sum x_i y_i) - (\sum x_i)(\sum y_i)}{\sqrt{\left[n(\sum x_i^2) - (\sum x_i)^2\right]\left[n(\sum y_i^2) - (\sum y_i)^2\right]}}$$
• The least squares line $\hat{y} = a + bx$, with slope
$$b = \hat{\beta} = r_{xy}\left(\frac{s_y}{s_x}\right) = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
• and intercept
$$a = \hat{\alpha} = \bar{y} - b\bar{x}$$
• The residual standard error
$$s_e = \sqrt{\frac{\sum \text{residual}_i^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} = s_y\sqrt{\frac{(n-1)(1 - r_{xy}^2)}{n-2}}$$
• Standard error of b:
$$SE_b = \frac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{s_e}{s_x\sqrt{n-1}}$$
• Test statistic for the slope:
$$t = \frac{b}{SE_b}$$
• Estimated mean response at $x^*$: $\hat{\mu} = a + bx^*$, with confidence interval $\hat{\mu} \pm t_{\alpha/2} \times SE_{\hat{\mu}}$.
• Predicted value at $x^*$: $\hat{y} = a + bx^*$, with
$$SE_{\hat{y}} = s_e\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
and prediction interval $\hat{y} \pm t_{\alpha/2} \times SE_{\hat{y}}$.
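The slope, intercept, and correlation formulas translate directly into a few lines of standard-library Python; a minimal sketch, checked on perfectly linear made-up data:

```python
# Least-squares slope, intercept, and correlation from the computational
# formulas above, using only the standard library.
from math import sqrt

def simple_regression(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)
    a = sy / n - b * sx / n                      # a = ybar - b * xbar
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))
    return a, b, r

# Perfectly linear data: intercept 0, slope 2, correlation 1
print(simple_regression([1, 2, 3, 4], [2, 4, 6, 8]))
```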
§8.4: $x_1, \ldots, x_n$ are N(µ, σ), σ unknown. Compute
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2.$$

  H0        Ha         Reject H0 if        P-value
  µ = µ0    µ > µ0     t > tα(n−1)         P(T > t)
  µ = µ0    µ < µ0     t < −tα(n−1)        P(T < t)
  µ = µ0    µ ≠ µ0     |t| > tα/2(n−1)     2 P(T > |t|)

§9.4: Comparing $p_1$, $p_2$: Find $\hat{p}_1 = \frac{x_1}{m}$, $\hat{p}_2 = \frac{x_2}{n}$, and $\hat{p} = \frac{x_1 + x_2}{m + n}$; m, n large. Compute
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{m} + \frac{1}{n}\right)}}.$$

  H0         Ha          Reject H0 if       P-value
  p1 = p2    p1 > p2     z > zα             P(Z > z)
  p1 = p2    p1 < p2     z < −zα            P(Z < z)
  p1 = p2    p1 ≠ p2     |z| > zα/2         2 P(Z > |z|)
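The two-proportion z test above can be sketched with the standard library alone, using `NormalDist` for the normal tail probability (the 60/100 vs 45/100 counts below are a made-up illustration):

```python
# Two-proportion z test for H0: p1 = p2, standard library only.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, m, x2, n):
    """Return (z, two-sided P-value) for H0: p1 = p2."""
    p1, p2 = x1 / m, x2 / n
    p_pool = (x1 + x2) / (m + n)                    # pooled proportion p-hat
    z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # 2 P(Z > |z|)
    return z, p_value

# 60/100 successes vs 45/100 successes:
z, p = two_proportion_z(60, 100, 45, 100)
print(round(z, 3), round(p, 4))
```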
Index

Estimate, 2
Event, 40
Explanatory variable, 184
Frequency distribution
    of qualitative data, 14
    of quantitative data, 15
Histogram, 15
Hypothesis tests, 133
    for a population mean
        σ known, 135
        σ unknown, 148
    for a population proportion, 155
    for σ, 152
    for comparing two means, 162, 166
    for comparing two proportions, 171
    level of significance, 144
    P-value, 136
    paired difference, 162
    rejection region, 144
    test statistic, 136
    Type I error, 144
    Type II error, 144
Independent
    events, 46
    sample, 162
    variable, 184
Inferential Statistics, 3
Interquartile range, 25
Laws of Large Numbers, 95
Least squares line, 184
Level of scales, 9
Level of significance, 144
Linear Regression
    simple, 184
Mean
    of a discrete random variable, 57
    sample, 20
Measures of central tendency
    mean, 20
    median, 20
    mode, 21
Measures of spread
    interquartile range, 25
    range, 24
    standard deviation, 29
Median, 20
Mode, 21
Mutual exclusive, 47
Nominal data, 10
Normal
    approximation to Binomial, 99
    distribution, 74
Null hypothesis, 134
One-sided alternative, 134
Ordinal data, 10
Outlier, 23
P-value, 136
Parameters, 4
Percentiles, 25
Pie chart, 13
Population, 3
Probability
    conditional, 44
    distribution, 54
    event, 40
    independent events, 46
    multiplication rule, 46
    mutual exclusive, 47
    random experiment, 53
    rules, 40, 41, 46
    sample space, 39
Proportion, 15
Qualitative data, 9
Quantitative data, 9
Quartiles, 25
Random experiment, 39