Robert V. Hogg
Allen T. Craig
THE UNIVERSITY OF IOWA
Introduction to
Mathematical
Statistics
Fourth Edition
Macmillan Publishing Co., Inc.
NEW YORK
Collier Macmillan Publishers
LONDON
Copyright © 1978, Macmillan Publishing Co., Inc.
Printed in the United States of America
All rights reserved. No part of this book may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the Publisher.
Earlier editions © 1958 and 1959 and copyright © 1965 and 1970 by
Macmillan Publishing Co., Inc.
Macmillan Publishing Co., Inc.
866 Third Avenue, New York, New York 10022
Collier Macmillan Canada, Ltd.
Library of Congress Cataloging in Publication Data
Hogg, Robert V.
Introduction to mathematical statistics.
Bibliography: p.
Includes index.
1. Mathematical statistics. I. Craig, Allen
Thornton, (date) joint author. II. Title.
QA276.H59 1978 519 77-2884
ISBN 0-02-355710-9 (Hardbound)
ISBN 0-02-978990-7 (International Edition)
PRINTING 13 14 15 YEAR 5 6 7 8 9
Preface
We are much indebted to our colleagues throughout the country who
have so generously provided us with suggestions on both the order of
presentation and the kind of material to be included in this edition of
Introduction to Mathematical Statistics. We believe that you will find
the book much more adaptable for classroom use than the previous
edition. Again, essentially all the distribution theory that is needed is
found in the first five chapters. Estimation and tests of statistical
hypotheses, including nonparametric methods, follow in Chapters 6, 7,
8, and 9, respectively. However, sufficient statistics can be introduced
earlier by considering Chapter 10 immediately after Chapter 6 on
estimation. Many of the topics of Chapter 11 are such that they may
also be introduced sooner: the Rao-Cramér inequality (11.1) and
robust estimation (11.7) after measures of the quality of estimators
(6.2), sequential analysis (11.2) after best tests (7.2), multiple com-
parisons (11.3) after the analysis of variance (8.5), and classification
(11.4) after material on the sample correlation coefficient (8.7). With this
flexibility the first eight chapters can easily be covered in courses of
either six semester hours or eight quarter hours, supplementing with
the various topics from Chapters 9 through 11 as the teacher chooses
and as the time permits. In a longer course, we hope many teachers and
students will be interested in the topics of stochastic independence
(11.5), robustness (11.6 and 11.7), multivariate normal distributions
(12.1), and quadratic forms (12.2 and 12.3).
We are obligated to Catherine M. Thompson and Maxine Merrington
and to Professor E. S. Pearson for permission to include Tables II and
V, which are abridgments and adaptations of tables published in
Biometrika. We wish to thank Oliver & Boyd Ltd., Edinburgh, for
permission to include Table IV, which is an abridgment and adaptation
of Table III from the book Statistical Tables for Biological, Agricultural,
and Medical Research by the late Professor Sir Ronald A. Fisher,
Cambridge, and Dr. Frank Yates, Rothamsted. Finally, we wish to
thank Mrs. Karen Horner for her first-class help in the preparation of
the manuscript.
R.V.H.
A.T.C.
Contents
Chapter 1
Distributions of Random Variables
1.1 Introduction
1.2 Algebra of Sets
1.3 Set Functions
1.4 The Probability Set Function
1.5 Random Variables
1.6 The Probability Density Function
1.7 The Distribution Function
1.8 Certain Probability Models
1.9 Mathematical Expectation
1.10 Some Special Mathematical Expectations
1.11 Chebyshev's Inequality
Chapter 2
Conditional Probability and Stochastic Independence
2.1 Conditional Probability
2.2 Marginal and Conditional Distributions
2.3 The Correlation Coefficient
2.4 Stochastic Independence
Chapter 3
Some Special Distributions
3.1 The Binomial, Trinomial, and Multinomial Distributions
3.2 The Poisson Distribution
3.3 The Gamma and Chi-Square Distributions
3.4 The Normal Distribution
3.5 The Bivariate Normal Distribution
Chapter 4
Distributions of Functions of Random Variables
4.1 Sampling Theory
4.2 Transformations of Variables of the Discrete Type
4.3 Transformations of Variables of the Continuous Type
4.4 The t and F Distributions
4.5 Extensions of the Change-of-Variable Technique
4.6 Distributions of Order Statistics
4.7 The Moment-Generating-Function Technique
4.8 The Distributions of X̄ and nS²/σ²
4.9 Expectations of Functions of Random Variables
Chapter 5
Limiting Distributions
5.1 Limiting Distributions
5.2 Stochastic Convergence
5.3 Limiting Moment-Generating Functions
5.4 The Central Limit Theorem
5.5 Some Theorems on Limiting Distributions
Chapter 6
Estimation
6.1 Point Estimation
6.2 Measures of Quality of Estimators
6.3 Confidence Intervals for Means
6.4 Confidence Intervals for Differences of Means
6.5 Confidence Intervals for Variances
6.6 Bayesian Estimates
Chapter 7
Statistical Hypotheses
7.1 Some Examples and Definitions
7.2 Certain Best Tests
7.3 Uniformly Most Powerful Tests
7.4 Likelihood Ratio Tests
Chapter 8
Other Statistical Tests
8.1 Chi-Square Tests
8.2 The Distributions of Certain Quadratic Forms
8.3 A Test of the Equality of Several Means
8.4 Noncentral χ² and Noncentral F
8.5 The Analysis of Variance
8.6 A Regression Problem
8.7 A Test of Stochastic Independence
Chapter 9
Nonparametric Methods
9.1 Confidence Intervals for Distribution Quantiles
9.2 Tolerance Limits for Distributions
9.3 The Sign Test
9.4 A Test of Wilcoxon
9.5 The Equality of Two Distributions
9.6 The Mann-Whitney-Wilcoxon Test
9.7 Distributions Under Alternative Hypotheses
9.8 Linear Rank Statistics
Chapter 10
Sufficient Statistics
10.1 A Sufficient Statistic for a Parameter
10.2 The Rao-Blackwell Theorem
10.3 Completeness and Uniqueness
10.4 The Exponential Class of Probability Density Functions
10.5 Functions of a Parameter
10.6 The Case of Several Parameters
Chapter 11
Further Topics in Statistical Inference
11.1 The Rao-Cramér Inequality
11.2 The Sequential Probability Ratio Test
11.3 Multiple Comparisons
11.4 Classification
11.5 Sufficiency, Completeness, and Stochastic Independence
11.6 Robust Nonparametric Methods
11.7 Robust Estimation
Chapter 12
Further Normal Distribution Theory
12.1 The Multivariate Normal Distribution
12.2 The Distributions of Certain Quadratic Forms
12.3 The Independence of Certain Quadratic Forms
Appendix A
References
Appendix B
Tables
Appendix C
Answers to Selected Exercises
Index
Chapter 1
Distributions of
Random Variables
1.1 Introduction
Many kinds of investigations may be characterized in part by the
fact that repeated experimentation, under essentially the same con-
ditions, is more or less standard procedure. For instance, in medical
research, interest may center on the effect of a drug that is to be
administered; or an economist may be concerned with the prices of
three specified commodities at various time intervals; or the agronomist
may wish to study the effect that a chemical fertilizer has on the yield
of a cereal grain. The only way in which an investigator can elicit
information about any such phenomenon is to perform his experiment.
Each experiment terminates with an outcome. But it is characteristic of
these experiments that the outcome cannot be predicted with certainty
prior to the performance of the experiment.
Suppose that we have such an experiment, the outcome of which
cannot be predicted with certainty, but the experiment is of such a
nature that the collection of every possible outcome can be described
prior to its performance. If this kind of experiment can be repeated
under the same conditions, it is called a random experiment, and the
collection of every possible outcome is called the experimental space or
the sample space.
Example 1. In the toss of a coin, let the outcome tails be denoted by
T and let the outcome heads be denoted by H. If we assume that the coin
may be repeatedly tossed under the same conditions, then the toss of this
coin is an example of a random experiment in which the outcome is one of
the two symbols T and H; that is, the sample space is the collection of these
two symbols.
Example 2. In the cast of one red die and one white die, let the outcome
be the ordered pair (number of spots up on the red die, number of spots up
on the white die). If we assume that these two dice may be repeatedly cast
under the same conditions, then the cast of this pair of dice is a random
experiment and the sample space consists of the 36 ordered pairs (1, 1), …,
(1, 6), (2, 1), …, (2, 6), …, (6, 6).
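For readers who like to experiment, a minimal Python sketch (ours, not part of the text) enumerates this sample space:

```python
from itertools import product

# All ordered pairs (spots on the red die, spots on the white die).
sample_space = list(product(range(1, 7), repeat=2))
print(len(sample_space))   # 36
print(sample_space[:3])    # [(1, 1), (1, 2), (1, 3)]
```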
Let 𝒞 denote a sample space, and let C represent a part of 𝒞. If,
upon the performance of the experiment, the outcome is in C, we shall
say that the event C has occurred. Now conceive of our having made N
repeated performances of the random experiment. Then we can count
the number f of times (the frequency) that the event C actually occurred
throughout the N performances. The ratio f/N is called the relative
frequency of the event C in these N experiments. A relative frequency is
usually quite erratic for small values of N, as you can discover by
tossing a coin. But as N increases, experience indicates that relative
frequencies tend to stabilize. This suggests that we associate with the
event C a number, say p, that is equal or approximately equal to that
number about which the relative frequency seems to stabilize. If we do
this, then the number p can be interpreted as that number which, in
future performances of the experiment, the relative frequency of the
event C will either equal or approximate. Thus, although we cannot
predict the outcome of a random experiment, we can, for a large value
of N, predict approximately the relative frequency with which the
outcome will be in C. The number p associated with the event C is given
various names. Sometimes it is called the probability that the outcome
of the random experiment is in C; sometimes it is called the probability
of the event C; and sometimes it is called the probability measure of C.
The context usually suggests an appropriate choice of terminology.
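A small simulation (ours, not from the text) illustrates the stabilization just described: toss a coin N times and print the relative frequency f/N of the event C = {tails}, assuming a fair coin.

```python
import random

random.seed(1)
for N in [10, 100, 1_000, 10_000, 100_000]:
    f = sum(random.random() < 0.5 for _ in range(N))   # count the tails
    print(f"N = {N:>6}:  f/N = {f / N:.4f}")           # settles near 0.5
```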
Example 3. Let 𝒞 denote the sample space of Example 2 and let C be
the collection of every ordered pair of 𝒞 for which the sum of the pair is equal
to seven. Thus C is the collection (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1).
Suppose that the dice are cast N = 400 times and let f, the frequency of a
sum of seven, be f = 60. Then the relative frequency with which the outcome
was in C is f/N = 60/400 = 0.15. Thus we might associate with C a number p
that is close to 0.15, and p would be called the probability of the event C.
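The following sketch (ours, not the book's) repeats the experiment of Example 3 by simulation. Since 6 of the 36 equally likely ordered pairs give a sum of seven, relative frequencies near 1/6 ≈ 0.167 are typical.

```python
import random

random.seed(2)
N = 400
# Count casts of two dice in which the sum of the pair is seven.
f = sum(random.randint(1, 6) + random.randint(1, 6) == 7 for _ in range(N))
print(f / N)   # a value near 1/6; the text's illustrative run gave 0.15
```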
Remark. The preceding interpretation of probability is sometimes re-
ferred to as the relative frequency approach, and it obviously depends upon
the fact that an experiment can be repeated under essentially identical con-
ditions. However, many persons extend probability to other situations by
treating it as a rational measure of belief. For example, the statement p = 2/5
would mean to them that their personal or subjective probability of the event
C is equal to 2/5. Hence, if they are not opposed to gambling, this could be
interpreted as a willingness on their part to bet on the outcome of C so that
the two possible payoffs are in the ratio p/(1 − p) = (2/5)/(3/5) = 2/3. Moreover,
if they truly believe that p = 2/5 is correct, they would be willing to accept
either side of the bet: (a) win 3 units if C occurs and lose 2 if it does not occur,
or (b) win 2 units if C does not occur and lose 3 if it does. However, since the
mathematical properties of probability given in Section 1.4 are consistent
with either of these interpretations, the subsequent mathematical develop-
ment does not depend upon which approach is used.
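The arithmetic of the Remark can be checked in one line; the value p = 2/5 is the subjective probability discussed above.

```python
p = 2 / 5
print(p / (1 - p))   # 0.666...: payoffs stand in the ratio 2 : 3, i.e.,
                     # win 3 units if C occurs and lose 2 if it does not.
```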
The primary purpose of having a mathematical theory of statistics
is to provide mathematical models for random experiments. Once a
model for such an experiment has been provided and the theory
worked out in detail, the statistician may, within this framework, make
inferences (that is, draw conclusions) about the random experiment.
The construction of such a model requires a theory of probability. One
of the more logically satisfying theories of probability is that based on
the concepts of sets and functions of sets. These concepts are introduced
in Sections 1.2 and 1.3.
EXERCISES
1.1. In each of the following random experiments, describe the sample
space 𝒞. Use any experience that you may have had (or use your intuition) to
assign a value to the probability p of the event C in each of the following
instances:
(a) The toss of an unbiased coin where the event C is tails.
(b) The cast of an honest die where the event C is a five or a six.
(c) The draw of a card from an ordinary deck of playing cards where the
event C occurs if the card is a spade.
(d) The choice of a number on the interval zero to 1 where the event C
occurs if the number is less than 4.
(e) The choice of a point from the interior of a square with opposite
vertices (−1, −1) and (1, 1) where the event C occurs if the sum of the
coordinates of the point is less than 4.
1.2. A point is to be chosen in a haphazard fashion from the interior of a
fixed circle. Assign a probability p that the point will be inside another circle,
which has a radius of one-half the first circle and which lies entirely within
the first circle.
1.3. An unbiased coin is to be tossed twice. Assign a probability p₁ to
the event that the first toss will be a head and that the second toss will be a
tail. Assign a probability p₂ to the event that there will be one head and one
tail in the two tosses.
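A Monte Carlo check (not part of the text) of the intuition behind Exercise 1.2, taking the two circles to be concentric for the sketch: points are dropped uniformly in the larger circle, and the fraction landing in the circle of half the radius estimates p. The area ratio suggests p = 1/4.

```python
import random

random.seed(3)
N, inside, hits = 100_000, 0, 0
while inside < N:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:          # keep only points in the unit circle
        inside += 1
        if x * x + y * y <= 0.25:   # the smaller circle has radius 1/2
            hits += 1
print(hits / N)                     # approximately 0.25
```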
1.2 Algebra of Sets
The concept of a set or a collection of objects is usually left undefined.
However, a particular set can be described so that there is no misunder-
standing as to what collection of objects is under consideration. For
example, the set of the first 10 positive integers is sufficiently well
described to make clear that the numbers 3/4 and 14 are not in the set,
while the number 3 is in the set. If an object belongs to a set, it is said
to be an element of the set. For example, if A denotes the set of real
numbers x for which 0 ≤ x ≤ 1, then 3/4 is an element of the set A. The
fact that 3/4 is an element of the set A is indicated by writing 3/4 ∈ A.
More generally, a ∈ A means that a is an element of the set A.
The sets that concern us will frequently be sets of numbers. However,
the language of sets of points proves somewhat more convenient than
that of sets of numbers. Accordingly, we briefly indicate how we use
this terminology. In analytic geometry considerable emphasis is placed
on the fact that to each point on a line (on which an origin and a unit
point have been selected) there corresponds one and only one number,
say x; and that to each number x there corresponds one and only one
point on the line. This one-to-one correspondence between the numbers
and points on a line enables us to speak, without misunderstanding, of
the “point x” instead of the “number x.” Furthermore, with a plane
rectangular coordinate system and with x and y numbers, to each
symbol (x, y) there corresponds one and only one point in the plane; and
to each point in the plane there corresponds but one such symbol. Here
again, we may speak of the “point (x, y),” meaning the “ordered number
pair x and y.” This convenient language can be used when we have a
rectangular coordinate system in a space of three or more dimensions.
Thus the “point (x₁, x₂, …, xₙ)” means the numbers x₁, x₂, …, xₙ in
the order stated. Accordingly, in describing our sets, we frequently
speak of a set of points (a set whose elements are points), being careful,
of course, to describe the set so as to avoid any ambiguity. The nota-
tion A = {x; 0 ≤ x ≤ 1} is read “A is the one-dimensional set of
points x for which 0 ≤ x ≤ 1.” Similarly, A = {(x, y); 0 ≤ x ≤ 1,
0 ≤ y ≤ 1} can be read “A is the two-dimensional set of points (x, y)
that are interior to, or on the boundary of, a square with opposite
vertices at (0,0) and (1, 1).” We now give some definitions (together
with illustrative examples) that lead to an elementary algebra of sets
adequate for our purposes.
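A convenient computational analogue of this set-builder notation (ours, not the book's): an infinite set of points can be represented on a computer by its membership predicate.

```python
def in_A(x):                      # A = {x; 0 <= x <= 1}
    return 0 <= x <= 1

def in_B(point):                  # B = {(x, y); 0 <= x <= 1, 0 <= y <= 1}
    x, y = point
    return 0 <= x <= 1 and 0 <= y <= 1

print(in_A(0.75), in_A(1.5))      # True False
print(in_B((0.2, 0.9)))           # True: the point lies in the unit square
```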
Definition 1. If each element of a set A₁ is also an element of set
A₂, the set A₁ is called a subset of the set A₂. This is indicated by writing
A₁ ⊂ A₂. If A₁ ⊂ A₂ and also A₂ ⊂ A₁, the two sets have the same
elements, and this is indicated by writing A₁ = A₂.
Example 1. Let A₁ = {x; 0 ≤ x ≤ 1} and A₂ = {x; −1 ≤ x ≤ 2}.
Then A₁ ⊂ A₂.
… = {x; 0 < x ≤ 1}. Note that the number zero is not in
this set, since it is not in one of the sets A₁, A₂, A₃, ….
Definition 4. The set of all elements that belong to each of the sets
A₁ and A₂ is called the intersection of A₁ and A₂. The intersection of A₁
and A₂ is indicated by writing A₁ ∩ A₂. The intersection of several sets
A₁, A₂, A₃, … is the set of all elements that belong to each of the sets
A₁, A₂, A₃, …. This intersection is denoted by A₁ ∩ A₂ ∩ A₃ ∩ ⋯
or by A₁ ∩ A₂ ∩ ⋯ ∩ Aₖ if a finite number k of sets is involved.
Example 8. Let A₁ = {(x, y); (x, y) = (0, 0), (0, 1), (1, 1)} and A₂ =
{(x, y); (x, y) = (1, 1), (1, 2), (2, 1)}. Then A₁ ∩ A₂ = {(x, y); (x, y) = (1, 1)}.
[Figure 1.1: Venn diagram of two sets A₁ and A₂ and their intersection A₁ ∩ A₂.]
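Example 8 can be checked directly with Python's built-in set operations; tuples stand in for the ordered pairs (x, y).

```python
A1 = {(0, 0), (0, 1), (1, 1)}
A2 = {(1, 1), (1, 2), (2, 1)}

print(A1 & A2)    # {(1, 1)}: the intersection A1 ∩ A2 found in Example 8
print(A1 | A2)    # the union A1 ∪ A2: all five distinct pairs
print(A1 <= A2)   # False: A1 is not a subset of A2
```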
Example 9. Let A₁ = {(x, y); 0 < … x > 0, y > 0}.
Definition 6. Let 𝒜 denote a space and let A be a subset of the
set 𝒜. The set that consists of all elements of 𝒜 that are not elements
of A is called the complement of A (actually, with respect to 𝒜). The
complement of A is denoted by A*. In particular, 𝒜* = ∅.
Example 16. Let 𝒜 be defined as in Example 14, and let the set A =
{x; x = 0, 1}. The complement of A (with respect to 𝒜) is A* = {x; x = 2, 3, 4}.
Example 17. Given A ⊂ 𝒜. Then A ∪ A* = 𝒜, A ∩ A* = ∅, A ∪ 𝒜 = 𝒜,
A ∩ 𝒜 = A, and (A*)* = A.
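A finite-space sketch of Definition 6 and Examples 16 and 17, taking the space (the script 𝒜 of the text) to be {0, 1, 2, 3, 4}, as Example 16 suggests.

```python
space = {0, 1, 2, 3, 4}
A = {0, 1}

A_star = space - A                 # the complement A* of A
print(A_star)                      # {2, 3, 4}
print((A | A_star) == space)       # True:  A ∪ A* = the space
print((A & A_star) == set())       # True:  A ∩ A* is the null set
print((space - A_star) == A)       # True:  (A*)* = A
```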
EXERCISES
1.4. Find the union A₁ ∪ A₂ and the intersection A₁ ∩ A₂ of the two
sets A₁ and A₂, where:
(a) A₁ = {x; x = 0, 1, 2}, A₂ = {x; x = 2, 3, 4}.
(b) A₁ = {x; 0 < x < 2}, A₂ = {x; 1 < x < 3}.
(c) A₁ = {(x, y); 0 < …
… If A₁, A₂, A₃, … are sets such that Aₖ ⊃ Aₖ₊₁, k = 1, 2, 3, …, the
sequence is said to be a nonincreasing sequence. Give an example of this
kind of sequence of sets.
1.10. If A₁, A₂, A₃, … are sets such that Aₖ ⊂ Aₖ₊₁, k = 1, 2, 3, …,
lim Aₖ (as k → ∞) is defined as the union A₁ ∪ A₂ ∪ A₃ ∪ ⋯. Find lim Aₖ if:
(a) Aₖ = {x; 1/k ≤ x ≤ 3 − 1/k}, k = 1, 2, 3, …;
(b) Aₖ = {(x, y); 1/k ≤ x² + y² ≤ 4 − 1/k}, k = 1, 2, 3, ….
1.11. If A₁, A₂, A₃, … are sets such that Aₖ ⊃ Aₖ₊₁, k = 1, 2, 3, …,
lim Aₖ (as k → ∞) is defined as the intersection A₁ ∩ A₂ ∩ A₃ ∩ ⋯. Find
lim Aₖ if:
(a) Aₖ = {x; 2 − 1/k < x ≤ 2}, k = 1, 2, 3, …;
(b) Aₖ = {x; 2 < x ≤ 2 + 1/k}, k = 1, 2, 3, …;
(c) Aₖ = {(x, y); 0 ≤ x² + y² ≤ 1/k}, k = 1, 2, 3, ….
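One way to explore Exercise 1.11(b) numerically (not part of the text): Aₖ = {x; 2 < x ≤ 2 + 1/k} is a nonincreasing sequence, and a point x survives in lim Aₖ only if it belongs to every Aₖ. Each x > 2 is eventually excluded, so the limit (the intersection) is the null set.

```python
from fractions import Fraction

def in_A(k, x):
    return 2 < x <= 2 + Fraction(1, k)

for x in [Fraction(5, 2), Fraction(21, 10), Fraction(201, 100)]:
    ks = [k for k in range(1, 1001) if in_A(k, x)]
    print(f"x = {x}: in A_k only for k <= {max(ks)}")
```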
1.3 Set Functions
In the calculus, functions such as
f(x) = 2x, −∞ < x < ∞,
… then Q(A) is undefined.
At this point we introduce the following notations. The symbol
∫_A f(x) dx
will mean the ordinary (Riemann) integral of f(x) over a prescribed
one-dimensional set A; the symbol
∫∫_A g(x, y) dx dy
will mean the Riemann integral of g(x,y) over a prescribed two-
dimensional set A; and so on. To be sure, unless these sets A and these
functions f(x) and g(x, y) are chosen with care, the integrals will
frequently fail to exist. Similarly, the symbol
∑_A f(x)
will mean the sum extended over all x ∈ A; the symbol
∑_A g(x, y)
will mean the sum extended over all (x, y) ∈ A; and so on.
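Minimal computational sketches (ours, not the book's) of these kinds of set function, again representing sets by membership predicates; the helper names are assumptions of the sketch.

```python
def Q_count(points, in_A):
    """Q(A) = the number of the listed points that lie in A."""
    return sum(1 for x in points if in_A(x))

def Q_sum(f, support, in_A):
    """Q(A) = the sum of f(x) over the points of a finite support lying in A."""
    return sum(f(x) for x in support if in_A(x))

def Q_integral(f, a, b, in_A, n=100_000):
    """Q(A) = the Riemann integral of f over A within [a, b], via a midpoint sum."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h
               for i in range(n) if in_A(a + (i + 0.5) * h))
```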
Example 4. Let A be a set in one-dimensional space and let Q(A) =
∑_A f(x), where
f(x) = (1/2)ˣ, x = 1, 2, 3, …,
= 0 elsewhere.
If A = {x; 0 ≤ x ≤ 3}, then
Q(A) = 1/2 + (1/2)² + (1/2)³ = 7/8.
Example 5. Let Q(A) = ∑_A f(x), where
f(x) = pˣ(1 − p)¹⁻ˣ, x = 0, 1,
= 0 elsewhere.
If A = {x; x = 0}, then
Q(A) = p⁰(1 − p)¹⁻⁰ = 1 − p;
if A = {x; 1 ≤ x ≤ 2}, then Q(A) = f(1) = p.
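Both examples can be confirmed with the Q_sum sketch given above; the value p = 0.3 below is an arbitrary illustrative choice.

```python
# Example 4: sum of (1/2)^x over the integers 1, 2, 3 in A.
print(Q_sum(lambda x: 0.5 ** x, range(1, 200), lambda x: 0 <= x <= 3))  # 0.875

# Example 5 with p = 0.3.
p = 0.3
f = lambda x: p ** x * (1 - p) ** (1 - x)
print(Q_sum(f, [0, 1], lambda x: x == 0))        # 0.7 = 1 - p
print(Q_sum(f, [0, 1], lambda x: 1 <= x <= 2))   # 0.3 = p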
Example 6. Let A be a one-dimensional set and let
Q(A) = ∫_A e⁻ˣ dx.
Thus, if A = {x; 0 ≤ x < ∞}, then
Q(A) = ∫₀^∞ e⁻ˣ dx = 1;
if A = {x; 1 ≤ x ≤ 2}, then
Q(A) = ∫₁² e⁻ˣ dx = e⁻¹ − e⁻²;
if A₁ = {x; 0 … P(C).
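Numerically, the crude Q_integral sketch above can confirm the first two values of Example 6; the infinite range {x; 0 ≤ x < ∞} is truncated at 50, which is harmless for e⁻ˣ.

```python
import math

g = lambda x: math.exp(-x)

print(Q_integral(g, 0.0, 50.0, lambda x: x >= 0))       # ≈ 1
print(Q_integral(g, 0.0, 50.0, lambda x: 1 <= x <= 2))  # ≈ 0.23254
print(math.exp(-1) - math.exp(-2))                      # exact e⁻¹ − e⁻²
```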
Theorem 4. For each C ⊂ 𝒞, 0 ≤ P(C) ≤ 1.
Proof. Since ∅ ⊂ C ⊂ 𝒞, we have by Theorem 3 that
P(∅) ≤ P(C) ≤ P(𝒞) = 1.
Since P(∅) = 0, the theorem follows.