Regression Analysis and Its Application: A Data-Oriented Approach, First Edition, Richard F. Gunst
REGRESSION ANALYSIS
AND ITS APPLICATION
STATISTICS: Textbooks and Monographs
A SERIES EDITED BY
Richard F. Gunst
Department of Statistics
Southern Methodist University
Dallas, Texas

Robert L. Mason
Automotive Research Division
Southwest Research Institute
San Antonio, Texas
CRC Press
Taylor & Francis Group
Boca Raton London New York
Neither this book nor any part may be reproduced or transmitted in any form or by
any means, electronic or mechanical, including photocopying, microfilming, and
recording, or by any information storage and retrieval system, without permission
in writing from the publisher.
MARCEL DEKKER, INC.
10 9 8 7
PRINTED IN THE UNITED STATES OF AMERICA
To Ann and Carmen
PREFACE
Regression analysis is considered indispensable as a data analysis technique
in a variety of disciplines. In recent years many research articles dealing
with new techniques for regression analysis have appeared in the statistical
and applied literature, but few of these recent advances have appeared in
regression textbooks written for data analysts. Regression Analysis and Its
Application: A Data-Oriented Approach bridges the gap between a purely
theoretical coverage of regression analysis and the needs of the data analyst
who requires a working knowledge of regression techniques. Data analysts,
consultants, graduate and upper-level undergraduate students, faculty
members, research scientists, and employees of governmental data-collection
agencies comprise only a few of the groups whose research activities
can benefit from the material in this book.
The main prerequisites for reading this book are a first course in statistical
methods and some college-level mathematics. A first course in statistical
methods is required so that summation notation, basic probability distributions
(normal, t, chi-square, F), hypothesis testing, and confidence interval
estimation are already familiar to the reader. Some mathematical knowledge
of algebra, functional relationships [f(x) = x^k, ln(x)], and solving
equations is also important to an appreciation of the material covered in the
text. Although manipulation of vectors and matrices is essential to the
understanding of the topics covered in the last two-thirds of this book, we do
not presume that readers have had a course in vector algebra. Rather, we
include an introduction to the properties and uses of vectors and matrices in
Chapter 4.
Richard F. Gunst
Dallas, Texas
CONTENTS
1. INTRODUCTION
1.1 DATA COLLECTION
1.1.1 Data Base Limitations
1.1.2 Data-Conditioned Inferences
1.2 REGRESSION ANALYSIS
1.2.1 Linear Regression Models
1.2.2 Regression vs. Correlation
1.3 USES OF REGRESSION ANALYSIS
1.3.1 Prediction
1.3.2 Model Specification
1.3.3 Parameter Estimation
1.4 ABUSES OF REGRESSION ANALYSIS
1.4.1 Extrapolation
1.4.2 Generalization
1.4.3 Causation
4. MULTIPLE-VARIABLE PRELIMINARIES
4.1 REVIEW OF MATRIX ALGEBRA
4.1.1 Notation
4.1.2 Vector and Matrix Operations
4.1.3 Model Definition
4.1.4 Latent Roots and Vectors
4.1.5 Rank of a Matrix
4.2 CARE IN MODEL BUILDING
4.2.1 Misspecification Bias
4.2.2 Overspecification Redundancy
4.3 STANDARDIZATION
4.3.1 Benefits
4.3.2 Correlation Form of X'X
6. INFERENCE
6.1 MODEL DEFINITION
6.1.1 Four Key Assumptions
6.1.2 Alternative Assumptions
6.2 ESTIMATOR PROPERTIES
6.2.1 Geometrical Representation
6.2.2 Expectation
6.2.3 Variances and Covariances
6.2.4 Probability Distributions
BIBLIOGRAPHY
INDEX
CHAPTER 1
INTRODUCTION
Data analysis of any kind, including a regression analysis, has the
potential for far-reaching consequences. Conclusions drawn from small
laboratory experiments or extensive sample surveys might only influence
one’s colleagues and associates, or they could form the basis for policy decisions
by governmental agencies that could conceivably affect millions of
people. Data analysts must, therefore, have an adequate knowledge of and
a healthy respect for the procedures they utilize.
Consider as an illustration of the potential for far-reaching effects of a
data analysis one of the most massive research projects ever undertaken, the
Salk polio vaccine trials (Meier, 1972). The conclusions drawn from the
results of this study ultimately culminated in a nationwide polio immunization
program and virtual elimination of this tragic disease in the United
States. The foresight and competence of the principal investigators of the
study prevented ambiguity of the results and possible criticism of the
conclusions. The handling of this experiment provides valuable lessons in the
overall role of data analysis and the care with which it must be approached.
Polio in the early 1950s was a mysterious disease. No one could predict
where or when it would strike. It did not affect a large segment of any
community, but those it did strike, mostly children, were often left paralyzed. Its
crippling effect on young children and the sporadic nature of its occurrence
led to demands for a major effort in eradicating the disease. Salk’s vaccine
was one of the most promising ones available, but it had not been sufficiently
tested.
Since the occurrence of polio in any specific community could not be predicted
and only a small portion of the population actually contracted the
disease in any year, a large-scale experiment including many communities
was necessary. In the end over one million children participated in the
study, some receiving the vaccine and others just a placebo.
In allowing their children to participate, many parents insisted on knowing
whether their child received the vaccine or the placebo. These children
constituted the “observed-control” group (Meier, 1972). The planners of
the experiment, realizing potential difficulties in the interpretation of the
results, insisted that there be a large number of communities for which
neither child, parent, nor diagnosing physician knew whether the child
received the vaccine or the placebo. This group of children made up the
“placebo-control” group.
For both groups of children the incidence of polio was lower for those
vaccinated than for those who were not vaccinated. The conclusion was
unequivocal: the Salk vaccine proved effective in preventing polio. This
conclusion would have been compromised, however, had the planners of
the study not insisted that the placebo-control group be included. Doubts
that the observed-control group could reliably indicate the effectiveness of
the vaccine were raised both before and after the experiment. The indicators
of polio are so similar to those of some other diseases that the diagnosing
physician might tend to diagnose polio if he knew the child had not been
vaccinated and diagnose one of the other diseases if he knew the child had
been vaccinated. After the experiment was conducted, analysis of the data
for the observed-control group indicated that the vaccine was effective but
the differences were not large enough to prevent charges of (unintentional)
physician bias. Differences in the incidence of polio between vaccinated and
nonvaccinated children in the placebo-control group were larger than those
in the observed-control group, and the analysis of these data provided the
definitive conclusion. Thus, due to the careful planning and execution of this
study, including the data collection and analysis, the immunization program
that was later implemented has resulted in almost complete eradication
of polio in the United States.
conclusions can be of great value, provided that either the nonresponse rate
is small enough to be ignored or that it can be ascertained that the
nonrespondents would reply similarly to those who did respond (this latter point
holds true also for Coleman’s data but the large nonresponse rate cannot be
ignored).
Deficiencies in the data base that can be identified, therefore, may enable
conditional inferences or conclusions to be drawn. Poor data-collection
procedures that result in suspected data deficiencies of an unknown nature
can render any attempt at analysis of the data fruitless.
In this study, mortality would be regarded as the response variable and the
above characteristics as possible predictor variables.
A host of difficulties must be addressed before the technical details of a
regression analysis can be performed to examine the relationship between
mortality and these predictor variables. Among the problems addressed by
Lave and Seskin are the lack of adequate information on many of the variables,
ambiguity in the definition of others, errors in measurement, and the
controversy over causal assumptions; e.g., if air pollution levels are found
to be beneficial as predictor variables, does this imply that air pollution
causes increases in mortality? Putting these questions aside for the moment,
the authors obtained prediction equations (“fitted” or estimated regression
models) for mortality using air pollution and socioeconomic variables. One
of the equations, using 1960 data from the 117 largest Standard Metropolitan
Statistical Areas (SMSA’s), is
Using this fitted regression model, mortality rates for SMSA’s can be estimated
by inserting values for minimum sulfate level, etc., and performing
the multiplications and additions indicated in the prediction equation. The
value of each predictor variable can be assessed through statistical tests on
the estimated coefficients (multipliers) of the predictor variables. These
procedures and other evaluations of the prediction equation will be detailed in
later chapters. We now turn to a formal algebraic definition of a regression
model.
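The "inserting values and performing the multiplications and additions" step can be sketched in a few lines of Python. This is only an illustration: the intercept, the coefficient values, and every predictor name other than the minimum sulfate level are hypothetical placeholders, not the Lave and Seskin estimates, which are not reproduced in this excerpt.

```python
# Hypothetical fitted equation. The intercept and coefficients are
# placeholders for illustration only, NOT the published estimates.
INTERCEPT = 100.0
COEFFICIENTS = {
    "min_sulfate": 0.5,        # air pollution variable named in the text
    "pct_aged_65_plus": 7.0,   # hypothetical socioeconomic variable
    "pct_poor": 0.25,          # hypothetical socioeconomic variable
}


def predict(predictors):
    """Evaluate intercept + sum(coefficient_j * value_j) for one area."""
    missing = set(COEFFICIENTS) - set(predictors)
    if missing:
        raise KeyError(f"missing predictor values: {sorted(missing)}")
    return INTERCEPT + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )


# Made-up predictor values for a single metropolitan area.
area = {"min_sulfate": 40.0, "pct_aged_65_plus": 8.0, "pct_poor": 8.0}
estimate = predict(area)
print(f"estimated mortality rate: {estimate:.1f}")
```

Keeping the coefficients in a dictionary keyed by predictor name makes the multiply-and-add structure of the prediction equation explicit and guards against supplying values in the wrong order.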
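All applications of linear regression methodology involve the specification
of a linear relationship between the response and predictor variables.
Denoting the response variable by $Y$ and the $p$ predictor variables by $X_1, X_2,
\ldots, X_p$, the linear relationship takes the form
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon. \qquad (1.2.1)$$
With two predictor variables, for example, the model is
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon,$$
where $\varepsilon$ measures the error in determining $Y$ from only $X_1$ and $X_2$. Numerical
procedures discussed in later chapters allow $\alpha$, $\beta_1$, and $\beta_2$ to be estimated
as well as the probable size of the error.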
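As a rough preview of such a numerical procedure, the sketch below estimates the intercept and the two coefficients of the two-predictor model. It assumes ordinary least squares solved through the normal equations, which this excerpt does not itself name. The function names and the small data set are made up for illustration.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2 + error."""
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]  # leading 1 carries the intercept
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def residual_std(x1, x2, y, coef):
    """Estimate the typical error size from the residuals (n - 3 degrees of freedom)."""
    a, b1, b2 = coef
    resid = [yi - (a + b1 * u + b2 * v) for u, v, yi in zip(x1, x2, y)]
    return (sum(e * e for e in resid) / (len(y) - 3)) ** 0.5


# Made-up data: generated from Y = 2 + 1.5*X1 - 0.7*X2 plus small errors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
errors = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1]
y = [2.0 + 1.5 * u - 0.7 * v + e for u, v, e in zip(x1, x2, errors)]

coef = fit_two_predictors(x1, x2, y)
print("estimates (a, b1, b2):", coef)
print("estimated error size:", residual_std(x1, x2, y, coef))
```

The normal equations are used here only because they keep the sketch free of external libraries. A numerical library's least-squares routine would usually be preferred in practice.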
The term “linear” is used to distinguish the type of regression models
that are analyzed in this book: the unknown parameters in eqn. (1.2.1)
occur as simple multipliers of the predictor variables or, in the case of $\alpha$, as
additive constants. If one assigns numerical values to any $(p - 1)$ of the
predictor variables, the relationship between the response variable and the
remaining predictor variable is, apart from the error term, a straight line.
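For example, in the model
$$Y = 10 + 5X_1 + 3X_2 - 4X_3 + \varepsilon,$$
fixing $X_1$ and $X_3$ at any values with $5X_1 - 4X_3 = 6$ (for instance, $X_1 = 2$ and $X_3 = 1$) gives
$$Y = 16 + 3X_2 + \varepsilon.$$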
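The straight-line property can be checked numerically for the example model Y = 10 + 5X1 + 3X2 - 4X3 + error. In the sketch below the fixed values X1 = 2 and X3 = 1 are an assumption: they are one choice that reproduces the intercept of 16, and the excerpt does not say which values were used. The function names are illustrative.

```python
def mean_response(x1, x2, x3):
    """Error-free part of the example model Y = 10 + 5*X1 + 3*X2 - 4*X3 + error."""
    return 10 + 5 * x1 + 3 * x2 - 4 * x3


def line_in_x2(x1, x3):
    """Hold X1 and X3 fixed; return (intercept, slope) of the line in X2."""
    intercept = mean_response(x1, 0, x3)
    slope = mean_response(x1, 1, x3) - intercept
    return intercept, slope


# Assumed fixed values: X1 = 2 and X3 = 1.
intercept, slope = line_in_x2(x1=2, x3=1)
print(f"Y = {intercept} + {slope}*X2 + error")

# The reduced equation agrees with the full model at every X2 tried.
for x2 in range(-3, 4):
    assert mean_response(2, x2, 1) == intercept + slope * x2
```

Changing the fixed values of X1 and X3 shifts only the intercept. The slope on X2 stays at 3 because the parameters enter the model as simple multipliers.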
She had been out in the vain hope of being able to find
a little work for herself, for although they were better off
than many of the neighbours, it was hard work to provide
for all their wants even with the help they got from the
strike fund.
But one day there came a letter with news that set the
little household in a quiver of excitement.
"I wouldn't own her for my 'little un' again if she didn't
do all she could for you, Winny," he replied. "But we shall
miss you, my dear; we shall all miss you. But look here, if
you hadn't give my gal the chance you did, why she
couldn't have done this for you, so you see after all, it's just
your own kindness coming back to you again. The seed you
sowed is just bearing the right kind of fruit. That's what it
is, my lass, you may depend upon that. We heard
something like it down at the mission-hall the other day,
when Miss Lavender give us that tea. She stood up
afterwards and warned us against losing our patience or our
temper, telling us in good plain words that the seed we
sowed would bring the same kind of fruit to us."
She waited a minute, but not more, and then she went
to her husband again. "You must come home at once," she
said a little sharply, for she did not like to be put off for a
stranger like this.
"No, no, it isn't that; but we've had a letter from Annie
Brown, and she says there is constant work for you in the
country if you like to go and see about it."
This would make him give up his talk with the stranger
and go home with her, she thought. She had not meant to
tell him so quickly, but she wanted to get him away, and
thought that this would do it, if everything else failed.
"Do you want this work at all, Tom Chaplin?" asked his
wife.
She could not see that just lounging about the dock
gates, walking up and down, speaking occasionally to the
policeman, taking with a smile some ugly epithet thrown at
them by the dock foreman who might be passing, was by
any means so important as her husband seemed to think,
and she was more angry with him than ever she had been
in her life before.
The lady heard the poor woman's story, and could well
sympathize with her impatience at what seemed like her
husband's apathy. But having done so, she said, "He could
not have left his post without leave from those who placed
him there. You see it is not every man who could be trusted
to do such duty, for these pickets must be careful, steady
men. No, no, Mrs. Chaplin, he could not leave such a post
as that for anything," added the lady.
"We must take care he does not do that," said the lady.
"I will write a telegram and give you the money to send it to
the country." And as she spoke, the lady took a pencil from
her pocket, and wrote on the leaf of her pocket-book:
"Chaplin will come to-morrow—cannot leave
post of duty."
Chaplin came home soon after four, very tired but full of
eager expectation.
But Mrs. Chaplin soon took it from her, for she was all
eagerness to see whether her husband had a chance of
making a decent appearance at the place he was going to.
To see him once more clad like a decent carpenter was the
highest ambition of her life. Her friend knew this, and felt
that the man would stand a much better chance of success
in his new venture, if he could go down in trim, tidy clothes
instead of the poor rags he wore as a dock labourer. So she
had managed to get a decent gray suit about his size, and a
clean white shirt, and a pair of boots, so that nothing was
wanting to complete his attire.
To see them all when these were laid out for inspection
can better be imagined than described. Letty danced round
the table, bumping her head against the bedstead in the
process, while Winny clapped her hands, and insisted that
her father should dress himself in them at once that they
might have time to admire him in them before he went
away the next day.
The next letter that came from Annie had almost the
same words.