Dana S. Dunn, Suzanne Mannes - Statistics and Data Analysis For The Behavioral Sciences-McGraw-Hill Companies (2001)
Selected Descriptive Statistics | Application | Location
Frequency distribution | Categorizes numerical responses based on a scale | Chapter 3
Proportion | Reflects a frequency's relationship to a sample's N | Chapter 3
Percent | A number representing the proportion of a score per hundred | Chapter 3
Relative frequency distribution | Percent or proportion of raw scores for X | Chapter 3
Grouped frequency distribution | Places raw scores (X) into preset value intervals | Chapter 3
True limits | Range of values between which a variable lies | Chapters 1, 3
Cumulative frequency | Number of interval values added to the total values falling below an interval | Chapter 3
Cumulative percentage | Percentage of interval values added to the total percentage falling below an interval | Chapter 3
Quartile | Divides a distribution into four equal portions | Chapter 3
Mode | Most frequently occurring score(s) in a data set | Chapter 4
Range | Difference between the high and low scores in a distribution | Chapter 4
Interquartile range | Range of scores falling between the 75th and 25th percentiles (middle) of a distribution | Chapter 4
Semi-interquartile range | Numerical index of half the distance between the first and third quartiles in a distribution | Chapter 4
Variance | Average of the squared deviations from the mean of a distribution | Chapter 4
Standard deviation | Average deviation between a given score and a distribution's mean | Chapter 4
z score | Indicates the distance between a given score and a distribution's mean in standard deviation units (also known as a standard score) | Chapter 5
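As a brief illustrative sketch (not part of the text itself), several of the descriptive measures listed above can be computed with Python's standard statistics module. The raw scores below are hypothetical, and the population forms of the variance and standard deviation are used, matching the "average of squared deviations" definition in the table.

```python
# Hypothetical example: computing descriptive statistics from the table above
# using Python's standard library (population formulas).
import statistics

scores = [82, 90, 75, 90, 68, 85, 77]  # hypothetical raw scores (X)

mode = statistics.mode(scores)             # most frequently occurring score
score_range = max(scores) - min(scores)    # high score minus low score
variance = statistics.pvariance(scores)    # average squared deviation from the mean
std_dev = statistics.pstdev(scores)        # square root of the variance

# z score: distance of one score from the mean in standard deviation units
mean = statistics.mean(scores)
z = (scores[0] - mean) / std_dev

print(mode, score_range, round(variance, 2), round(z, 2))  # 90 22 57.14 0.13
```

Note that `pvariance` and `pstdev` divide by N; the related `variance` and `stdev` functions divide by N - 1 and give the sample estimates discussed in Chapter 4.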
Nonparametric ("assumption free") Statistics | Application | Location
Chi-square "goodness-of-fit" test | Examines whether categorical data conform to proportions specified by a null hypothesis | Chapter 14
Chi-square test of independence | Determines whether frequencies associated with two nominal variables are independent | Chapter 14
Phi coefficient (φ) | Assesses association between two dichotomous variables | Chapter 14
Cramer's V | Assesses association between two dichotomous variables when one or both have more than two levels | Chapter 14
Mann-Whitney U test | Identifies differences between two independent samples of ordinal data | Chapter 14
Wilcoxon matched-pairs signed-ranks test | Identifies differences between two dependent samples of ordinal data | Chapter 14
Spearman rank-order correlation coefficient (rs) | Assesses strength of association between ordinal data | Chapter 14
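As a hedged illustration (not drawn from the text), the chi-square "goodness-of-fit" statistic in the first row can be computed by hand: square each observed-minus-expected difference, divide by the expected frequency, and sum across categories. The counts below are invented, and the null hypothesis assumed here is equal proportions across the four categories.

```python
# Hypothetical example: chi-square "goodness-of-fit" test computed by hand.
# H0 (assumed here): the four categories occur in equal proportions.
observed = [18, 22, 29, 11]                     # invented category frequencies
n = sum(observed)
expected = [n / len(observed)] * len(observed)  # 20 per category under H0

# chi-square = sum over categories of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom = number of categories - 1

print(f"chi-square({df}) = {chi_square:.2f}")  # prints chi-square(3) = 8.50
```

With df = 3, the obtained value of 8.50 exceeds the .05 critical value of 7.81, so in this invented example the equal-proportions hypothesis would be rejected.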
Selected Statistical Symbols | Description | First appears on page
DANA S. DUNN
Moravian College
Boston Burr Ridge, IL Dubuque, IA Madison, WI New York San Francisco St. Louis
Bangkok Bogota Caracas Lisbon London Madrid
Mexico City Milan New Delhi Seoul Singapore Sydney Taipei Toronto

McGraw-Hill Higher Education
A Division of The McGraw-Hill Companies
Some ancillaries, including electronic and print components, may not be available to customers
outside the United States.
234567890VNHNNH0987654321
ISBN 0-07-234764-3
Photo credit
Figure 1.5: ©Corbis/Bettmann

The credits section for this book begins on page C-1 and is considered an extension of the
copyright page.
Library of Congress Cataloging-in-Publication Data
Dunn, Dana.
Statistics and data analysis for the behavioral sciences / Dana S. Dunn. -1st ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-07-234764-3
1. Psychometrics. 2. Psychology-Research-Methodology. I. Title.
BF39.D825 2001
150'.1'5195-dc21 00-030546
CIP
www.mhhe.com
To the memory of my father and grandfather,
James L. Dunn and Foster E. Kennedy.
"WHAT'S PAST IS PROLOGUE" - THE TEMPEST (ACT II, SC. I)
ABOUT THE AUTHOR
Dana S. Dunn is currently an Associate Professor and the Chair of the Depart-
ment of Psychology at Moravian College, a liberal arts and sciences college in
Bethlehem, Pennsylvania. Dunn received his Ph.D. in
experimental social psychology from the University of
Virginia in 1987, having previously graduated with a BA
in psychology from Carnegie Mellon University in 1982.
He has taught statistics and data analysis for over 12 years.
Dunn has published numerous articles and chapters in the
areas of social cognition, rehabilitation psychology, the
teaching of psychology, and liberal education. He is
the author of a research methods book, The Practical
Researcher: A Student Guide to Conducting Psychological Research (McGraw-Hill,
1999). Dunn lives in Bethlehem with his wife and two children.
CONTENTS IN BRIEF
Preface
Acknowledgments
1 INTRODUCTION: STATISTICS AND DATA ANALYSIS AS TOOLS
FOR RESEARCHERS 3
2 PROCESS OF RESEARCH IN PSYCHOLOGY AND RELATED
FIELDS 45
3 FREQUENCY DISTRIBUTIONS, GRAPHING, AND DATA
DISPLAY 85
4 DESCRIPTIVE STATISTICS: CENTRAL TENDENCY AND
VARIABILITY 133
5 STANDARD SCORES AND THE NORMAL DISTRIBUTION 177
6 CORRELATION 205
7 LINEAR REGRESSION 241
8 PROBABILITY 273
9 INFERENTIAL STATISTICS: SAMPLING DISTRIBUTIONS AND
HYPOTHESIS TESTING 315
10 MEAN COMPARISON I: THE t TEST 365
11 MEAN COMPARISON II: ONE-VARIABLE ANALYSIS OF
VARIANCE 411
12 MEAN COMPARISON III: TWO-VARIABLE ANALYSIS OF
VARIANCE 459
13 MEAN COMPARISON IV: ONE-VARIABLE REPEATED-
MEASURES ANALYSIS OF VARIANCE 499
14 SOME NONPARAMETRIC STATISTICS FOR CATEGORICAL AND
ORDINAL DATA 523
15 CONCLUSION: STATISTICS AND DATA ANALYSIS IN
CONTEXT 563
Contents in Brief viii
Preface xxi
Acknowledgments xxvi
Reader Response xxviii
1 INTRODUCTION: STATISTICS AND DATA ANALYSIS AS
TOOLS FOR RESEARCHERS 3
DATA BOX 1.A: What Is or Are Data? 5
Tools for Inference: David L.'s Problem 5
College Choice 6
College Choice: What Would (Did) You Do? 6
Statistics Is the Science of Data, Not Mathematics 8
Statistics, Data Analysis, and the Scientific Method 9
Inductive and Deductive Reasoning 10
Populations and Samples 12
Descriptive and Inferential Statistics 16
DATA BOX 1.B: Reactions to the David L. Problem 18
Knowledge Base 19
Discontinuous and Continuous Variables 20
DATA BOX 1.C: Rounding and Continuous Variables 22
Writing About Data: Overview and Agenda 23
Scales of Measurement 24
Nominal Scales 25
Ordinal Scales 26
Interval Scales 27
Ratio Scales 28
Writing About Scales 29
Knowledge Base 31
Overview of Statistical Notation 31
What to Do When: Mathematical Rules of Priority 34
DATA BOX 1.D: The Size of Numbers Is Relative 38
Mise en Place 39
About Calculators 39
Knowledge Base 40
PROJECT EXERCISE: Avoiding Statisticophobia 40
Looking Forward, Then Back 41
Summary 42
Key Terms 42
Problems 42
8 PROBABILITY 273
The Gambler's Fallacy or Randomness Revisited 275
Probability: A Theory of Outcomes 277
Classical Probability Theory 277
DATA BOX 8.A: "I Once Knew a Man Who ...": Beware Man-Who
Statistics 278
Probability's Relationship to Proportion and Percentage 281
DATA BOX 8.B: Classical Probability and Classic Probability
Examples 282
Probabilities Can Be Obtained from Frequency Distributions 283
Knowledge Base 283
DATA BOX 8.C: A Short History of Probability 284
Calculating Probabilities Using the Rules for Probability 285
The Addition Rule for Mutually Exclusive and Nonmutually
Exclusive Events 285
The Multiplication Rule for Independent and Conditional
Probabilities 287
DATA BOX 8.D: Conjunction Fallacies: Is Linda a Bank Teller or a
Feminist Bank Teller? 288
Tukey's HSD Revisited 510
Effect Size and the Degree of Association Between the Independent
Variable and Dependent Measure 511
PREFACE
In my view statistics has no reason for existence except as a catalyst for learning and
discovery.
- GEORGE BOX
This quotation serves as the guiding rationale for this book and, I hope, provides an
outlook for teaching and learning about statistics. From the main content to the ped-
agogical aids and end-of-the-chapter exercises, this textbook fosters learning and dis-
covery. As students learn how to perform calculations and interpret the results, they will
discover new ways to think about the world around them, uncover previously unrec-
ognized relationships among disparate variables, and make better judgments about how
and why people behave the way they do.
Statistics and Data Analysis for the Behavioral Sciences teaches the theory behind
statistics and the analysis of data through a practical, hands-on approach. Students will
learn the "how to" side of statistics: how to select an appropriate test, how to collect
data for research, how to perform statistical calculations in a step-by-step manner, how
to be intelligent consumers of statistical information, and how to write up analyses and
results in American Psychological Association (APA) style. Linking theory with prac-
tice will help students retain what they learn for use in future behavioral science courses,
research projects, graduate school, or any career where problem solving is used. Com-
bining statistics with data analysis leads to a practical pedagogical goal-helping stu-
dents to see that both are tools for intellectual discovery that examine the world and
events in it in new ways.
• To the Student
Two events spurred me to write this book, and I want you to know that I wrote it with
students foremost in my mind. First, I have taught statistics for over 12 years. In that
time, I've come to believe that some students struggle with statistics and quantitative
material simply because it is not well presented by existing textbooks. Few authors, for
example, adequately translate abstract ideas into concrete terms and examples that can
be easily understood. Consequently, as I wrote this book, I consciously tried to make
even the most complex material as accessible as possible. I also worked to develop ap-
plications and asides that bring the material to life, helping readers to make connec-
tions between abstract statistical ideas and their concrete application in daily life.
Second, the first statistics course that I took as an undergraduate was an unmiti-
gated disaster, really, a nightmare-it was dull, difficult, and daunting. I literally had no
idea what the professor was talking about, nor did I know how to use statistics for any
purpose. I lost that battle but later won the war by consciously trying to think about
how statistics and the properties of data reveal themselves in everyday life. I came to
appreciate the utility and even--dare I say it-the beauty of statistics. In doing so, I
also vowed that when I became a professor, no student of mine would suffer the pain
and intellectual doubt that I did as a first-time statistics student. Thus, I wrote this book
with my unfortunate "growing" experience in mind. I never want anyone in my classes
or using my book to feel the anxiety that I did and, though it is a cliche, I think that
the book is better because of my trying first experience.
How can you ensure that you will do well in your statistics class? Simple: Attend
classes, do the reading, do the homework, and review what you learn regularly. Indeed,
it is a very good idea to reserve some meaningful period of time each day for studying
statistics and data analysis (yes, I am quite serious). When you do not understand some-
thing mentioned in this book or during class, ask the instructor for clarification im-
mediately, not later, when your uncertainty has had time to blossom into full-blown
confusion (remember my first experience in a statistics class-I know whereof I speak).
Remember, too, the importance of reminding yourself that statistics is for something.
You should be able to stop at any given point in the course of performing a statistical
test in order to identify what you are doing, why, and what you hope to find out by us-
ing it. If you cannot do so, then you must backtrack to the point where you last un-
derstood what you were doing and why; to proceed without such understanding is not
only a waste of time, it is perilous, even foolhardy, and will not help you to compre-
hend the material. By the way, if you feel that you need a review of basic mathematics,
Appendix A provides one, including some helpful ideas on dealing with math anxiety.
Beyond these straightforward steps, you should also take advantage of the peda-
gogical tools I created for this book. They are reviewed in detail in the To the Instruc-
tor section, and I suggest you take a look at their descriptions below. I do, however, take
the time to explain these tools and their use as they appear in the first few chapters of
the book. I urge you to take these devices seriously, to see them as complementary to
and not replacements for your usual study habits. I promise you that your diligence will
have a favorable payoff in the end-actual understanding, reduced anxiety, and prob-
ably a higher grade than you expected when you first began the class.
• To the Instructor
This book was written for use in a basic, first, non-calculus-based statistics course for
undergraduate students in psychology, education, sociology, or one of the other be-
havioral sciences. I assume little mathematical sophistication, as any statistical proce-
dure is presented conceptually first, followed by calculations demonstrated in a step-
by-step manner. Indeed, it is important for both students and instructors to remember
that statistics is not mathematics, nor is it a subfield of mathematics (Moore, 1992).
This book has a variety of pedagogical features designed to make it appeal to in-
structors of statistics (as well as students) including the following:
Decision Trees. Appearing on the opening page of each chapter, these very simple
flow charts identify the main characteristics of the descriptive or inferential procedures
reviewed therein, guiding readers through what a given test does (e.g., mean compari-
son), when to use it (i.e., to what research designs does it apply), and what sort of data
it analyzes (e.g., continuous). At the close of each chapter, readers are reminded to rely
on the decision trees in a section called "Looking Forward, Then Back." A special icon
prompts them to recall the features found in the decision tree(s) opening the
chapters.
Key Terms and Concepts. Key terms (e.g., mean, variance) and concepts (e.g., ran-
dom sampling, central limit theorem) are highlighted throughout the text to gain read-
ers' attention and to promote retention. An alphabetical list of key terms (including the
page number where each is first cited) appears at the end of every chapter.
Focus on Interpretation of Results and Presenting Them in Written Form. All sta-
tistical procedures conclude with a discussion of how to interpret what a result actu-
ally means. These discussions have two points: what the test literally concludes about
some statistical relationship in the data and what it means descriptively-how did par-
ticipants behave in a study, what did they do? The focus then turns to clearly commu-
nicating results in prose form. Students will learn how to put these results into words
for inclusion in American Psychological Association (APA) style reports or draft arti-
cles. I used this approach successfully in a previous book (Dunn, 1999). Appendix C,
which provides a brief overview of writing APA style reports, gives special emphasis to
properly presenting research results and statistical information.
Statistical Power, Effect Size, and Planned and Post Hoc Comparisons. Increasingly,
consideration of statistical power and effect size estimates is becoming more common-
place in psychology textbooks as well as journals. I follow this good precedent by at-
taching discussion of the strength of association of independent to dependent variables
along with specific inferential tests (e.g., estimated omega-squared (ω²) is presented
with the F ratio). In the same way, review of planned or post hoc comparisons of means
are attached to discussions of particular tests. I focus on conceptually straightforward
approaches for doing mean comparisons (e.g., Tukey's Honestly Significant Difference
[HSD] test), but I also discuss the important-but often neglected-perspectives pro-
vided by contrast analysis (e.g., Rosenthal & Rosnow, 1985).
Project Exercises. Appearing at the end of each chapter, a Project Exercise complements
or extends issues presented therein. Project Exercises are designed to give students the
opportunity to think about how statistical concepts can actually be employed in re-
search or to identify particular issues that can render data analysis useful for the design
of experiments or the interpretation of behavior. On occasion, a chapter's Project Ex-
ercise might be linked to a Data Box.
End-of-Chapter Problems. Each chapter in the text concludes with a series of prob-
lems. Most problems require traditional numerical answers, but many are designed to
help students think coherently and write cogently about the properties of statistics and
data. Answers to the odd-numbered problems are provided in the back of the textbook
in Appendix E.
• Supplements
Statistics and Data Analysis for the Behavioral Sciences has several supplements designed
to help both instructors and students. These supplements include:
Elementary Data Analysis Using Microsoft Excel by Mehan and Warner (2000). This
easy-to-use workbook introduces students to Microsoft Excel spreadsheets as a tool to
be used in introductory statistics courses. By utilizing a familiar program such as Ex-
cel, students can concentrate more on statistical concepts and outcomes and less on the
mechanics of software.
Instructor's Manual and Test Bank. The book has a detailed Instructor's Manual
(IM) and Test Bank (TB). The IM includes syllabus outlines for one- or two-semester
statistics courses, detailed chapter outlines, key terms, lecture suggestions, sugges-
tions for classroom activities and discussions, film recommendations (where avail-
able and appropriate), and suggested readings for the instructor (i.e., articles and
books containing teaching tips, exercises). The TB contains test items (i.e., multiple
choice items, short essays, problems), and is also available on computer diskette for
PC and Macintosh.
ACKNOWLEDGMENTS
Writers of statistics books require willing, even charitable, readers of rough drafts.
My colleagues and friends, Stacey Zaremba, Matthew Schulz, and Robert Brill,
read and commented on most of the chapters in this book. Peter von Allmen and
Jeanine S. Stewart provided valuable suggestions regarding specific issues and chapters.
Dennis Glew and Clif Kussmaul improved the clarity of some examples. During spring
1999, several students in my Statistics and Research Methods class took the time to read
initial drafts of the first half of the book, and their subsequent suggestions refined the
material. The Reference Librarians and the Interlibrary Loan Department of Reeves
Library helped me to track down sometimes obscure materials or references. Ever
patient, Jackie Giaquinto shepherded the manuscript and me through our appropriate
paces, and reminded me of my other responsibilities. Sarah Hoffman helped to pull bits
and pieces of the manuscript together at the end of the revision process. I want to
express my gratitude to the Moravian College Faculty Development and Research
Committee for the summer grant that enabled me to finish the book on time. My friend,
Steve Gordy, studiously avoided reading anything this time round, but his support was
there, and welcome, nonetheless.
Beyond my campus, I am very grateful for the constructive comments and criticism
offered by an excellent group of peer reviewers, including:
and several anonymous reviewers. I followed many but not all of their recommenda-
tions, so any errors in omission, commission, inference, or good judgment are mine
alone. Special thanks to Dennis Cogan, O. Joseph Harm, and Helen Kim for their assis-
tance in providing accurate answers to the problems and computations within this text.
This is the second book I have written with the McGraw-Hill College Division, and
I remain convinced that the professionals who work there are rare and true. I hope our
relationship is a long one. My editor and friend, Joe Terry, established the project's
vision, and then developmental editor, Susan Kunchandy, helped to move it forward.
Editorial director Jane Vaicunas continued to show confidence in my work. Barbara
Santoro-a tireless and dedicated individual-answered all my queries, organized end-
less details, and provided help at every turn. Marketing manager Chris Hall provided
sage advice about the book's development in its later stages. Project manager Susan
Brusch steered the book (and me) skillfully through the production schedule. Wayne
Harms created the book's elegant and clear design. I am grateful to copy editor, Pat
Steele, for consistently improving my prose.
Finally, my family-past and present-enabled me to write this book. Daily, my
wife, Sarah, and my children, Jake and Hannah, reminded me of the link between love
and work. I am grateful for their patience and good humor. Dah K. Dunn's faith in my
writing was as steadfast as ever. I dedicate this book to two fine men from my family.
READER RESPONSE
No book can satisfy every reader, but every author makes a genuine effort to try, any-
way, and I am no different. I welcome your comments to this first edition of Statistics
and Data Analysis for the Behavioral Sciences. I pledge to listen carefully to critical re-
actions as well as any compliments, using both to improve this book's pedagogy in the
future. I encourage you to take a moment, log on to www.mhhe.com/dunn and com-
plete the short questionnaire found on this website. Your questionnaire will be e-mailed
to the publisher, who will share it with me. You may also contact me directly at the De-
[Decision tree: Data or Datum? Which Scale of Measurement Is Being Used? If yes, then go to 2.]
Chapter Outline
• Scales of Measurement
  Nominal Scales
  Ordinal Scales
  Interval Scales
  Ratio Scales
  Writing About Scales
  Knowledge Base
• Overview of Statistical Notation
  What To Do When: Mathematical Rules of Priority
  Data Box 1.D: The Size of Numbers Is Relative
  Mise En Place
  About Calculators
  Knowledge Base
• Project Exercise: Avoiding Statisticophobia
• Looking Forward, Then Back
• Summary
• Key Terms
• End of Chapter Problems

related behavioral science discipline, you should know that statistics and data analysis
can help you to answer focused questions about cause and effect, to simplify complexity,
to uncover heretofore unrecognized relationships among observations, and to make
more precise judgments about how and why people behave the way they do. This book
will teach you about some of the theory behind statistics and the analysis of data
through a practical, hands-on approach. As you read, you will learn the "how to" side of
statistics and data analysis, including:

• How to select an appropriate statistical test
• How to collect the right kinds of information for analysis
• How to perform statistical calculations in a straightforward, step-by-step manner
• How to accurately interpret and present statistical results
• How to be an intelligent consumer of statistical information
• How to write up analyses and results in American Psychological Association (APA) style

Linking theory with practice will help you to retain what you learn so that you can use it
in future courses in psychology or the other behavioral sciences, research projects, grad-
uate or professional school, or any career where problem solving, statistics, and data
analysis are used.

But we are getting ahead of ourselves. First, we need to define some terms, terms
that have been used as if you already understood them! What is a statistic, anyway? What
is data analysis? Why are these terms important?
KEY TERM A statistic is some piece of information that is presented in numerical form. For example, a nation's
5% unemployment rate is a statistic, and so is the average number of words per minute read by a
group of second-graders or the reported high temperature on a July day in Juneau, Alaska.
The field of statistics-and the statisticians who work within it-focuses on appro-
priate ways to collect, codify, analyze, and interpret numerical information. The ways
statisticians examine information are formalized into a set of rules and procedures com-
monly referred to as "statistics." Scholars, researchers, teachers, and students from many
disciplines rely on these organized rules and procedures to answer questions and explore
topics unique to their fields.
KEY TERM Data analysis refers to the systematic examination of a collection of observations. The examination
can answer a question, search for a pattern, or otherwise make some sense out of the observations.
These observations are either numerical (i.e., quantitative) or not based on numbers
(i.e., qualitative). If the observations are quantitative-for example, number of puzzles
solved in an experiment-statistics are frequently used in the data analysis. What was the
highest number solved? The lowest? In the case of qualitative information, some organiz-
ing principle-identifying the emotional content of words exchanged between husbands
and wives, for instance-can draw meaning from the observations. Do women use words
that establish relationships, whereas men rely on words that express their individuality?
Statistics and data analysis are complementary-not equivalent-terms.

Despite popular opinion, as terms, statistics and data analysis are neither synony-
mous nor redundant with one another. For our purposes, the first term emphasizes the
importance of working through necessary calculations in order to identify or discover
relationships within the quantitative results of research. The second term, however, ac-
knowledges the interpretive, methodological, or analytic side of working with informa-
tion-knowing, for instance, what information to collect and what to do with it once it
is collected. Unlike the term statistics, data analysis also allows for the possibility that not
all the information you encounter or are interested in will necessarily be quantitative in
nature. As we will see later in this book (Appendix F), qualitative-that is, non-
numerical, often descriptive or narrative-relationships within data can be equally re-
vealing and, increasingly, social scientists are taking an active interest in them.
Quantitative relationships are numerical. Qualitative relationships are based on descriptions or organizing themes, not numbers.

Key terms like these will be highlighted throughout the book. Whenever you come
across a new term, plan to take a few minutes to study it and to make sure that you un-
derstand it thoroughly. Why? Because learning the vocabulary and conceptual back-
ground of statistics is akin to learning a foreign language; before you can have an actual
conversation in, say, French or German, you need to know how to conjugate verbs, use
pronouns, recognize nouns, and so on. Like learning the parts of speech in another lan-
guage, it takes a bit of time and a little effort to learn the language of statistics. As you will
see, it can be done-you will learn to use this new language, understand, and even ben-
efit from it. One important point, though: the more you work with the statistical terms
and their meanings, the more quickly they will become second nature to you-but you
must make the effort starting here and now. Similar to studying a foreign language, sta-
tistical concepts build upon one another; learning the rudimentary parts of speech, as it
were, is essential to participating in the more complex dialog that comes later.
DATA BOX 1.A: What Is or Are Data?
Inexplicably, the word data has developed a certain cachet in contemporary society. Although it is
usually associated with science, use of the word is now common in everyday speech. As a "buzz"
word, it seems to lend an air of credibility to people's pronouncements on any number of topics.
But what does the word data actually mean? "Data" refer to a body of information, usually a col-
lection of facts, items, observations, or even statistics. The word is Latin in origin, meaning "a
thing given." Thus, medical information from one patient, such as heart rate, blood pressure, and
weight, constitute data, as does the same information from everyone admitted to a hospital in the
course of a year. Conceptually, then, the term is flexible.
The grammatical usage of data, however, is proscribed. How so? The word "data" is plural-
the word datum, which means a piece of information, is singular. So, all the medical entries on a
patient's chart are data, whereas the patient's weight when admitted to the ward-say, 165 lb-is a
datum. Why does this distinction matter? When writing about or describing data, you will want to
be both correct and precise. Data are, datum is:
"These data are flawed." (correct) "These data were helpful." (correct)
"This data is flawed." (incorrect) "The data was helpful." (incorrect)
"The datum is flawed." (correct) "The datum helped." (correct)
I urge you to listen carefully to how your friends, faculty, and family members use the term
data-usually incorrectly, I'll wager-not to mention newscasters and some newspaper colum-
nists, all professionals who should know better. Do your best to turn the tide by resolving to use
the terms data and datum correctly from now on.
than the question they are trying to answer or the topic they are exploring. In fact, first-
time statistics students can sometimes feel overwhelmed by the trappings of statistical
analysis-the formulas, the math, the tables and graphs-so that they lose sight of what
statistics are supposed to offer as a way of looking at things. Remember the message from
George Box that appeared earlier-to paraphrase him, statistics are for something, they
are supposed to enlighten us, to help us discover things. For most people, teachers like
me and students like you, they are not ends in themselves.
Let's consider an example of how statistics can shed some light on a decision. Read
the following "problem," as it deals with a situation you probably know firsthand. After
you read the problem and think about it, take out a piece of paper and answer the ques-
tion that appears below.
College Choice
David L. was a senior in high school on the East Coast who was planning to go to college.
He had completed an excellent record in high school and had been admitted to his two
top choices: a small liberal arts college and an Ivy League university. The two schools
were about equal in prestige and were equally costly. Both were located in attractive East
Coast cities, about equally distant from his home town. David had several older friends
who were attending the liberal arts college and several who were attending the Ivy League
university. They were all excellent students like himself and had interests that were sim-
ilar to his. His friends at the liberal arts college all reported that they liked the place very
much and that they found it very stimulating. The friends at the Ivy League university re-
ported that they had many complaints on both personal and social grounds and on ed-
ucational grounds. David thought that he would initially go to the liberal arts college.
However, he decided to visit both schools for a day. He did not like what he saw at the
private liberal arts college: Several people whom he met seemed cold and unpleasant; a
professor he met with briefly seemed abrupt and uninterested in him; and he did not like
the "feel" of the campus. He did like what he saw at the Ivy League university: Several of
the people he met seemed like vital, enthusiastic, pleasant people; he met with two dif-
ferent professors who took a personal interest in him; and he came away with a very
pleasant feeling about the campus.
Question. Which school should David L. choose, and why? Try to analyze the arguments on both sides, and explain which side is stronger. (Nisbett, Krantz, Jepson, & Fong, 1982, pp. 457-458)
Hair color can be a variable (i.e., blonde, brunette, redhead), as can a score on a personality test, the day of the week, or your weight.
As we will see later in the chapter, statistics usually rely on variables X and Y to represent
numerical values in formulas or statistics about data.
Several variables stand out in the David L. problem. First and foremost, David's
friends at the liberal arts institution were generally satisfied, as they told him they liked
it and even found it to be a stimulating place. His pals at the Ivy League school, however,
reported just the opposite, voicing complaints and qualifications on personal, social, and
educational grounds. You will recall that David planned to go to the smaller school until
visits at both places called his initial decision into question-he liked what he saw at the
university but had an unpleasant time at the college. In short, his experiences were the
opposite of his friends' experiences.
What else do we know? Well, a few factors appear to be what are called constants, not
variables.
KEY TERM A constant is usually a number whose value does not change, such as 11" (pronounced"pie"), which
equals 3.1416. A constant can also refer to a characteristic pertaining to a person or environment
that does not change.
We know, for example, that David's friends attending both schools were strong students
(like himself, apparently) and shared outlooks like his own. In other words, intellectual
ability appears to be a constant, as David is not very different from his friends. Yet we
know he had decidedly different experiences than they did at the two schools. We also know
that both schools are equally prestigious, cost about the same, are metropolitan, and are
equidistant from his home. These normally important factors do not appear to be very
influential in David's decision making because they, too, are constants and, in any case,
he is more focused on his experiences at the schools than on money issues, location, or
distance.
Variables take on different values; constants do not change.

Nonetheless, these constants do tell us something-and perhaps they should tell David something, as well. In a sense, his friends are constants and because they are so similar to him, he might do well to pay close attention to their experiences and to wonder rather critically why his experiences when he visited were so different from their
own. What would a statistician say about this situation? In other words, could a rudi-
mentary understanding of statistics and the properties of data help David choose be-
tween the small liberal arts college and the Ivy League university?
Approaching the problem from a statistical perspective would highlight two con-
cepts relevant to David L.'s data-base rate and sampling. Base rate, in this case, refers to
the common or shared reactions of his friends to their respective schools; that is, those
attending the liberal arts college liked the place, while those at the Ivy League school did
not. If we know that David L. is highly similar to his friends, shouldn't we assume that
across time he will have reactions similar to theirs, that he will like the college but not the
university? Thus, the base rate experiences of his friends could reasonably be weighed
more heavily in the college choice deliberations than his own opinion.
Similarity of reaction leads to the second issue, that of sampling, which may explain
why David's reactions were different from those of his peers. Put simply, is one day's ex-
posure to any campus sufficient to really know what it is like? Probably not, especially
when you consider that his friends have repeatedly sampled what the respective schools
offer for at least a year, possibly longer. (Consider these thought questions: Did you
know what your present school was really like before you started? What do you know
now that you did not know then?) In other words, David really doesn't have enough in-
formation-enough data-to make a sound choice. His short visits to each campus were
8 Chapter 1 Introduction: Statistics and Data Analysis as Tools for Researchers
The field of statistics is concerned with making sense out of empirical data, particu-
larly when those data contain some element of uncertainty so that we do not know the
true state of affairs, how, say, a set of variables affects one another.
KEY TERM Empirical refers to anything derived from experience or experiment.
Empiricism is a philosophical stance arguing that all knowledge is developed from sen-
sory experience. Indeed, one can verify what the world is like by experiencing it. I know
the floor is solid because I am presently standing on it. If I had doubts about its struc-
tural integrity, I could do an experiment by testing how much weight it would support
before buckling. The philosophical doctrine seen as the traditional foil to empiricism is
called rationalism. Rationalism posits that reason is the source of all knowledge, and that
such knowledge is completely independent of sensory experience, which is deemed
faulty (Leahey, 1997).
Certainly it is the case that statistics relies on mathematical operations, but these
operations are secondary to the process of reasoning behind statistics. In fact, I always
tell students in my statistics classes that the meaning behind the data, the inferences we
make about the data, are more important than the math. Please don't miss the subtle
message here: Understanding how to do a mathematical procedure is very useful, as is
getting the "right" answer, but these facts will not do you much good if you cannot interpret the statistical result. Thus, the ideal you should strive for is the ability to select an appropriate statistic, perform the calculation, and to know what the result means.
Some students who take a first statistics course are concerned about whether their
math background is sufficient to do well in the class. Other students are actually fearful
of taking statistics precisely because they believe that they will do poorly because of the
math involved. If you fall into either group, let me offer some solace. First, I firmly be-
lieve that if you can balance your checkbook, then you will be able to follow the formu-
las and procedures presented in this book. Second, Appendix A contains a review of sim-
ple mathematics and algebraic manipulation of symbols if you feel that your math skills
are a little rusty. If so, consult Appendix A after you finish reading this chapter. Third,
you may actually be experiencing what is commonly called math anxiety. To help
with the latter possibility, a discussion of this common-and readily curable-form of
anxiety, as well as references, can also be found in Appendix A. Finally, a project exercise
presented at the end of this chapter will help you to overcome the normal trepidation
students feel when studying statistics for the first time.
look at objects that are not too familiar or that are not too novel; the former are boring
and the latter can be confusing (McCall, Kennedy, & Appelbaum, 1977). By presenting a
group of young children with different groupings of objects (e.g., blocks with patterns)
representing different degrees of familiarity, the researcher could measure their
interest-how long they look at each object, for instance. Different hypotheses examin-
ing the same issue are then combined to form what is called a theory.
KEY TERM A theory is a collection of related facts, often derived from hypotheses and the scientific method,
forming a coherent explanation for a larger phenomenon.
One theory is that infants' interests in novelty also reveal their innate preferences for
learning. Some researchers have suggested that preferences for novelty, in turn, are
linked with intelligence. Another theory suggests that intelligent infants are drawn to
novel information more readily than less intelligent infants (e.g., Bornstein & Sigman,
1986). Note that these theories were developed by examining a variety of hypotheses and
the results of many studies about how infants perceive the objects they encounter. A re-
searcher using any theory would be aware of the existing data and would make certain to
use the scientific method and careful reasoning before executing research aimed at test-
ing any hypotheses consistent with it.
Statistics, Data Analysis, and the Scientific Method 11
seek to explain individual aspects of human behavior-just look at the table of contents
of any introductory textbook in the field-suggesting that we are far from having a uni-
fied position that ties them all together (Kuhn, 1970, 1977; Watson, 1967).
Much older areas of science, especially physics, have unified theories that enable
them to employ the second type of reasoning, which is called deductive.
KEY TERM Deductive reasoning is characterized by the use of existing theories to develop conclusions, called
deductions, about how some unexamined phenomenon is likely to operate. Theory is used to
search for confirming observations.
Deduction promotes a particular type of prediction: whenever event X occurs, event Y
usually follows. Deductive reasoning is essentially fact-based reasoning, so that what we
already know points us in the direction of what we can also expect to be true. In physics,
for example, Albert Einstein created the theory of relativity, which, among other things,
posited that time was not absolute and that it depended on an object's state of motion.
Initially, there was no experimental evidence to support his radical contentions, but
across the 75 or so years since it first appeared, Einstein's theory has been overwhelm-
ingly supported by experimental data. From theory, then, Einstein deduced how time
and motion should behave.
Deduction: theory leads to data.

As I am sure you recognize, our understanding of how infants learn is not as finely honed as what we know about time, motion, or the speed of light. Numerous theories in the behavioral sciences are still being developed, tested, revised, or discarded in accordance with empirical data and inductive reasoning. Until a generally accepted theory of human behavior based on facts arrives (if ever!), we will need to be content with inductive reasoning, and the statistical analyses which allow us to verify our induction.
Figure 1.1 illustrates the direction of inference inherent in inductive and deductive
reasoning. As you can see, when observations lead a researcher to create a theory to ex-
plain some event, the process is inductive. Incidentally, the inferences David L. (and you)
made about college choice were largely inductive. When an investigator relies on some
existing theory to posit the existence of particular observations, the process is deductive.
Let's turn now to consider how our ability to use these two types of reasoning can help
us to determine when an observation is representative of the whole.
[Figure 1.1: The direction of inference in inductive and deductive reasoning. Induction moves from observations to theory; deduction moves from theory to observations.]
KEY TERM A population is a complete set of data possessing some observable characteristic, or a theoretical
set of potential observations.
Perhaps because the word is commonly associated with groups of people-the popula-
tion of a city or country, for instance-students are quick to assume that the term is ex-
clusively demographic. Keep in mind that a population is any complete set of data, and
these data can be animal, vegetable, or mineral. Test scores can comprise a population, as
can birthrates of Monarch butterflies in Nebraska, sales figures for the East Coast fishing
industry, or all the words printed in a given book like this one. Typically, of course, some
numerical characteristic of the population will be used in statistical calculations.
When psychologists study people's behavior, they typically want to describe and un-
derstand the behavior of some population of people. The behavior studied can be an ac-
tion, an attitude, or some other measurable response. Psychologists may talk about pop-
ulations of people, but keep in mind that they are usually focused on a population of
some characteristic displayed by the population of people. A developmental psychologist
studying preschool social relations might work with a population of children who are 5
years of age or younger, observing the content of comments made by one peer to an-
other. A gerontologist interested in memory decline and age could examine information
processing speed and efficiency in recalling words among persons 80 years of age or
older. Again, note that the term population does not literally mean "every person";
rather, it means the numerical responses for observations-here, comments or words-
of every person within some identified group (e.g., children under age 5 years or persons
80 years or older).
When any psychologist examines the behavior of interest in a population, he or she
cannot examine the responses of all members of that population. A given clinical psy-
chologist who does therapy with people who have agoraphobia (i.e., fear of crowds or
public spaces) does not literally work with every person who has the disorder. The group
of people receiving research attention constitutes a sample from the population of peo-
ple who have agoraphobia.
KEY TERM A sample is a smaller unit or subset bearing the same characteristic or characteristics of the pop-
ulation of interest.
When researchers collect data from a sample, they hope to be able to demonstrate
that the sample is representative of-is highly similar to-the population from which it
was drawn. Why is it necessary to rely on a sample? Practically speaking, most popula-
tions are simply too large and unwieldy to allow a researcher to gauge every observation
within them. Such undertakings would be too expensive, too time consuming, not
feasible, and given good samples-the observations within them reflect the characteris-
tics of their populations of origin-completely unnecessary anyway. Thus, the reactions
of the group of agoraphobics to a new therapy are assumed to hold true in general for
[Figure 1.2: Samples are drawn from a population in order to discern what the population is like.]
the population of all extant (or potential) agoraphobics. Similarly, a good sample of 80-
year-old men and women should be sufficient to illustrate how aging affects processing
speed and memory. In both cases, what we learn from a sample should enable us to ac-
curately describe the population at large. Figure 1.2 illustrates this process: Samples are
drawn from a population in order to discern what the population is like.
Adequate samples describe populations accurately.

Wait a moment-what constitutes a good sample? How do we know if a sample approximates the characteristics of the population from which it was drawn? These and related questions are actually the foundation of statistics and data analysis, as all we will really do throughout this book is variations on this same theme: Does our sample of behavior accurately reflect its population of origin? Will an observed change in behavior in a sample reflect a similar change in behavior within the population? To answer these questions, researchers rely on statistics and what are called population parameters.
KEY TERM A population parameter is a value that summarizes some important, measurable characteristic of
a population. Although population parameters are estimated from statistics, they are constants.
Parameters are estimated but they do not change.

For all intents and purposes, we will probably never know the true parameters of any population unless it is reasonably small or extensive research funds are available. When you hear advertisers say, "four out of five dentists recommend" a mouthwash or toothpaste, for example, not all practicing dentists were asked to give an opinion! Generally, then, researchers must content themselves with estimating what the parameters are apt to be like. Many populations have parameters that could never be measured because their observations are constantly changing. Consider the number of people who are born and die in the United States each minute of every day-the American population is theoretically the same from moment to moment, but in practical terms it is ever changing. Despite this apparent change, we can still estimate the average height and weight of most Americans, as well as their projected life spans, from sample statistics. That is, sample statistics enable us to approximate the population parameters.
When we discuss sample statistics and their relations to population parameters, the
former term takes on added meaning.
KEY TERM A sample statistic is a summary value based upon some measurable characteristic of a sample.
The values of sample statistics can vary from sample to sample.
That is, because different pieces of research rely on different samples, statistics based on
sample data are not constants-the same variables collected in two different samples can
take on different values. If a visual perception study reveals that people blink their eyes
an average of 75 times a minute but another on the same topic finds that the average is
closer to 77, is one study "right" and the other "wrong"? Of course not! The studies re-
lied on different samples containing different people who were from the same popula-
tion, and each sample probably differed in size (i.e., number of observations), when and
where it was drawn, and so forth. It is quite common for one study to find sample statis-
tics with values different from those found in other, similar pieces of research. Such sub-
tle or not so subtle variations pose no problem because we are concerned with trying to
approximate the characteristics of the larger population, which has unchanging values
(i.e., parameters).
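The eye-blink scenario can be made concrete with a short Python sketch. The blink counts below are simulated for illustration only; they are not data from the studies just described, though their average is set near the 75-77 range the text mentions:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# A hypothetical population of per-minute eye-blink counts.
population = [random.gauss(76, 6) for _ in range(10_000)]
population_mean = sum(population) / len(population)  # a parameter: one fixed value

# Two independent "studies" each draw their own random sample...
study_a = random.sample(population, 60)
study_b = random.sample(population, 60)

# ...and their sample statistics (the means) differ from one another,
# even though both samples come from the very same population.
mean_a = sum(study_a) / len(study_a)
mean_b = sum(study_b) / len(study_b)
print(round(mean_a, 1), round(mean_b, 1), round(population_mean, 1))
```

Neither study is "wrong": both means hover near the fixed population mean, and the difference between them is exactly the sample-to-sample variation described above.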
Statistics are calculated from sample data, and they change from sample to sample.

Thus, one very important conclusion can be drawn: Although sample statistics are apt to change from sample to sample, a parameter, such as a population average, will not change. This conclusion is shown in Figure 1.3. Different samples from a larger population can yield different sample statistics, but the parameters found in the relevant population are constant.
How, then, can we be sure that a given sample is actually representative of a popula-
tion? We make use of another technique from the arsenal of statistics and data analysis-
simple random sampling.
KEY TERM Simple random sampling is a process whereby a subset is drawn from a population in such a way
that each member of the population has the same opportunity of being selected for inclusion in the
subset as all the others.
If no member of a group has a greater opportunity of being selected than any other,
then there is no bias or undue influence affecting the sample's composition. If no
individual or observation was more or less likely to be chosen than another, those ac-
tually selected should, therefore, be deemed representative of the population. A sample
chosen in this way is called a simple random sample. All else being equal, a simple random sample enables a researcher to make inferences about the population and its parameters based on sample statistics.

[Figure 1.3: Different samples drawn from one population can yield different sample statistics, but the population's parameters are constant.]

[Figure 1.4: The random sampling procedure draws a sample from the population; the process of inference runs from sample statistics back to the population's parameters.]

As shown in Figure 1.4, a sample drawn randomly from a population (see the arrow pointing from the right to the left at the top
of the diagram) yields sample statistics, which are then used to infer the qualities of
the population's parameters (note the arrow leading from the left to the right at the
bottom of the diagram).
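In code, drawing a simple random sample is straightforward. The sketch below uses only Python's standard library; the population of student ID numbers is hypothetical:

```python
import random

# A hypothetical population: ID numbers for 500 students.
population = list(range(1, 501))

# random.sample draws without replacement and gives every member of the
# population the same chance of selection -- the defining property of a
# simple random sample.
sample = random.sample(population, 25)

print(len(sample))       # 25
print(len(set(sample)))  # 25 -- no member appears twice
```

Because every ID had the same chance of being chosen, the 25 selected students can reasonably be treated as representative of all 500.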
Let's consider the meaning of random samples in a practical, everyday example.
When soliciting student opinion about a proposed tuition hike or a change in the policy
regarding who can keep a car on campus, a college's administration would not just poll
a few members of the senior class. On the contrary, administrators would be sure to ask
members of the first-year, sophomore, and junior classes, as well. Further, there may be
graduate or special students who lack a class label (e.g., second-year law students, part-
time students, evening program students), but whose input would be desirable before
any policy changes are implemented. The administrators involved in making new policies would want to know how the student community felt about any changes in advance, and a representative sample of student opinion is necessary to achieve that end.
Perhaps you are aware of the deleterious effects of nonrandom sampling in the his-
tory of American politics. Based on a mail poll conducted by the Literary Digest during
the 1936 presidential election, Alfred Landon was predicted to defeat Franklin Delano
Roosevelt by a wide margin-but FDR won handily in November. What happened? The
pollsters' sample of registered voters was taken from car registration records, periodical
subscription lists, and phone books, a procedure that worked well in determining the
winner of several prior presidential contests (e.g., Shaughnessy & Zechmeister, 1994).
What the pollsters failed to recognize was that the Great Depression encouraged less affluent voters to vote; although wealthier voters favored Landon, greater numbers of poorer voters wanted Roosevelt. These poorer voters were never polled, of course, because their names did not appear on the (unrepresentative) polling lists! At that time, poor Americans would be less likely to own cars or have phones, for example.
A similar situation occurred in the 1948 election. Polling results so strongly pointed
to a winner that a famous headline-DEWEY DEFEATS TRUMAN-was determined in ad-
vance of the election results (e.g., Boyer, 1995; Donaldson, 1999). The surprise, though,
was on the Chicago Tribune: Republican candidate Thomas Dewey did not defeat
incumbent Democrat Harry S. Truman, as a combination of inadequate voter samples
and swing votes (i.e., voters who changed from Dewey to Truman late in the election)
carried the day. Having the last laugh on the paper and the pollsters, the triumphant
Truman appeared in a now famous photograph holding up the offending headline (see
Figure 1.5).
Figure 1.5 Sampling in the 1948 Election Went Awry, but Not in the Eyes of President-Elect
Harry S. Truman
There is no substitute for an adequate-and random-sample.

The lesson here is a good one: Avoid overconfidence regarding research outcomes before the data are known to come from representative samples. These electoral incidents serve as strong reminders that there is no substitute for good, representative samples, and that wise researchers avoid counting the proverbial chickens (data) before they hatch (the results are in, analyzed, and interpreted). Though not perfect, sampling is now much more sophisticated and precise than earlier in the century, and there are appropriate techniques for random sampling that duly increase the chance of making correct inferences about data.
men and women attended all the conference workshops?), and the most common index,
the average (e.g., On average, how many days did people stay at the conference?).
Descriptive statistics describe samples.

Even if a data set contains 200 observations (e.g., the grade point averages of all psychology majors at a college), a descriptive statistic like the average indicates what the academic performance of the typical major (e.g., 3.00 or B) is like. The main advantage associated with descriptive statistics, then, is simplicity-they tend to be concrete, easy to understand, and readily presentable to others (i.e., "The typical student in our program has a B average").
Inferential statistics extend the scope of descriptive statistics by examining the rela-
tionships within a set of data. In particular, inferential statistics enable the researcher to
make inferences-that is, judgments-about a population based on the relationships
within the sample data.
KEY TERM Inferential statistics permit generalizations to be made about populations based on sample data
drawn from them.
As you probably realize, inferential statistics are based on induction. That is, the obser-
vations constituting a sample are used to make generalizations about the population's
characteristics-some data lead to a theory. Inferential statistics enable us to use what we
know at one point in time to make assumptions about what we do not know, to reduce
our uncertainty. As we will see later in this book, inferential statistics enable researchers
to ask specific, testable questions about the sample data in order to draw tentative conclusions about the population.
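Here is a hedged Python sketch of that inferential logic, using the campus tuition-poll scenario from earlier in the chapter. The student body and its true level of support are invented for illustration:

```python
import random

random.seed(7)  # fixed for reproducibility

# A hypothetical student body of 10,000: 1 = supports the tuition policy,
# 0 = opposes it. The true proportion of support (0.60) is a population
# parameter that a real researcher would not know.
population = [1] * 6_000 + [0] * 4_000

# Poll a simple random sample of 200 students...
sample = random.sample(population, 200)

# ...and use the sample statistic to infer the population parameter.
sample_proportion = sum(sample) / len(sample)
print(sample_proportion)  # close to, but rarely exactly, 0.60
```

The generalization from 200 polled students to all 10,000 is the inference; it is tentative precisely because a different sample would yield a slightly different proportion.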
Inferential statistics infer population characteristics.

Figure 1.6 summarizes the processes we have just reviewed. Descriptive statistics are calculated from the data collected in a random sample, and then inferential statistics are performed on the data to estimate the characteristics of the population. We assume that such inferences based on statistical evidence are true for the time being, and that any "truths" they disclose are suggestive and conditional-better samples, different data, a clearer sense of the population, or a revised theory can change the nature of what we
know. Our clinician's treatment for agoraphobia might appear to help the members of
his sample a great deal; however, it does not necessarily follow that the therapeutic in-
tervention will help all persons affected by the disorder. Consider the real possibility that
people with the most severe forms of the disorder never came to the clinic for treat-
ment-they are literally afraid to leave the perceived security of their homes-and thus
did not find themselves in this (or any) study. Future efforts could include more severe
Figure 1.6 Descriptive Statistics Describe a Sample. Inferential Statistics Infer Population Characteristics
Upon first blush, many respondents react to David L.'s dilemma (see p. 6) by relying on personal evidence (Nisbett, Krantz, Jepson, & Kunda, 1983), that is, his own opinion about his campus visits. To wit, "He's got to choose for himself, not his friends" (Nisbett et al., 1982, p. 458). Taking this course of action can be personally satisfying, if not statistically sound. Here is one statistically sound account, however, from a student who never had a course in statistics (from Nisbett et al., 1982, p. 458):
I would say he should go to the liberal arts college. His negative experience there was a
brief, very shallow contact with the school. His friends, all veritable clones of himself, have
been there (presumably) for a while and know the place intimately, and like it, whereas the
opposite statements are true of the Ivy League school. He would be justified, however, to
go with his own feelings about the places. Often, this intuition is a higher perception that
we can't analyze, and he may be right to go with it. I think, though, that the first choice I've
mentioned is more reliable, for his experience is too limited with the two schools.
What prompts statistical versus nonstatistical answers? There are apparently situational fac-
tors that can cause people to respond differently to David's situation. Nisbett and his colleagues
(1983) conducted a simple experiment where two groups of students received the same version of
the David L. problem that you read. One of the groups, however, also received a cue highlighting
sampling as a relevant issue that David himself chose to consider. Before learning the results of
David's campus visits, the cued group read that:
He proceeded systematically to draw up a long list, for both colleges, of all the classes
which might interest him and all the places and the activities on campus that he wanted
to see. From each list, he randomly selected several classes and activities to visit, and sev-
eral spots to look at (by blindly dropping a pencil on each list of alternatives and seeing
where the point landed). (Nisbett et al., 1983, p. 353)
Were readers' decisions influenced by the presence of the cue? Emphatically, yes. When the
sampling cue was present, fewer participants (56%) recommended that David should attend the
Ivy League university than when the cue was absent (76%). In other words, the cue encouraged
them to think about the quality of David's visits-as representative samples-of the schools. Fur-
ther, Nisbett et al. (1983) note that participants in the problem cue condition were more likely to
refer to statistical concerns (recall the example shown above) regarding the adequacy of David's
sample data. We will discuss experimentation in detail in chapter 2 and examine the utility of per-
centages for summarizing data in chapter 3.
Be certain to note, however, that in both of these groups, the majority of respondents still
encouraged David to follow his heart when selecting a school rather than to use the statistically more
valid opinions of his friends. You should think about why people are often apt to follow the personal
and not the statistical, an issue we will come back to throughout the book in examples like this one.
cases of agoraphobia, thereby revealing the clinician's therapy to be less efficacious under
some circumstances. No doubt you can imagine other occasions where the greater di-
versity present in a population could be masked within a given sample or samples.
Psychologists rely on inferential statistics to identify which variables cause pre-
dictable changes within experiments. How does X (e.g., an old familiar song) influence
the occurrence of Y (e.g., retrieval of personal memories related to the time the song was
popular)? Relationships like these will be presented in detail in chapter 2, but it is
critical for you to appreciate this fact: Inferential statistics do not prove any relationship
among any variables definitively. As noted earlier, our inductions can sometimes go awry,
especially if some critical, influential variable goes unnoticed. This real possibility ex-
plains why researchers never assume that the results of anyone study or experiment tell
the whole story about a given topic. More data must be collected from different samples
and new hypotheses must be tested to better understand some behavior. Note, however,
that any such understanding is assumed to be temporary-as we will see in chapter 2
when we discuss the research loop of experimentation, new data and new understanding
lead to theory revision and development.
Knowledge Base
Each chapter in this book contains one or more "Knowledge Bases." The idea is to in-
crease your base of knowledge in statistics and data analysis, as well as to help you pace
your reading of chapters. Take a few minutes and answer the following questions that re-
view the first part of this chapter. Answers to the questions are provided below, so be sure
to go back and review anything you miss.
1. True or False: As disciplines, statistics and mathematics study the same topics and
answer the same questions.
2. Which of the following are variables, which are constants?
a. Date of autumn's first frost
b. Month of the year
c. Your birth date
d. π
e. Score on a political science quiz
3. A developmental psychologist notices that children with older siblings tend to speak
2 to 3 months sooner than children with no siblings. The researcher concludes that
sibling imitation is a key component in language acquisition. This is an example of
(a) inductive or (b) deductive reasoning.
4. Determine which of the following statements are true and which are false:
a. Samples try to characterize populations.
b. Population parameters are always known.
c. Different samples from the same population are not likely to have the same sam-
ple statistics.
d. Random sampling insures that every member of a population has the same
chance as all the others of being selected.
e. Descriptive statistics allow researchers to determine if a sample adequately char-
acterizes a population.
Answers
1. False
2. a. Variable
b. Variable
c. Constant
d. Constant
e. Variable
3. Inductive reasoning
4. a. True
b. False
c. True
d. True
e. False
20 Chapter 1 Introduction: Statistics and Data Analysis as Tools for Researchers
KEY TERM A continuous variable can take on any numerical value on a scale, and there exists an infinite num-
ber of values between any two numbers on a scale.
We will be reviewing the types of scales most frequently used by behavioral scien-
tists, most of which are continuous, shortly. For the present, think about some continu-
ous variables you encounter with regularity, such as grade point average or time. If you
have ever determined your grade point average (GPA) using the typical scale (i.e., A =
4.0, A- = 3.67, B+ = 3.33, etc.), then you know that it is entirely possible to achieve a
GPA of 3.066 for one semester (i.e., the average of 5 course grades, such as B+, B-, A-,
C+, and B+). The usual practice is to "round" a GPA like this to two places behind the
decimal point, or 3.07 (see Data Box 1.C for guidelines on rounding numbers). As a con-
tinuous variable, however, there are an infinite number of fractional values that can
occur between a solid A (4.00) and a solid B (3.00) average. In fact, there are an infinite
number of possible values between the GPAs of 2.62 and 2.64!
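The GPA arithmetic above can be sketched in a few lines of Python. The point values for B- (2.67) and C+ (2.33) are the standard ones implied, though not listed, by the text:

```python
# Grade points on the typical scale cited in the text (A = 4.0, A- = 3.67,
# B+ = 3.33, etc.); B- and C+ values are the standard ones (assumed here).
points = {"A": 4.0, "A-": 3.67, "B+": 3.33, "B": 3.0, "B-": 2.67, "C+": 2.33}
grades = ["B+", "B-", "A-", "C+", "B+"]  # the five hypothetical course grades

gpa = sum(points[g] for g in grades) / len(grades)
print(round(gpa, 3))  # 3.066 -- the semester GPA from the example
print(round(gpa, 2))  # 3.07  -- rounded to two places, the usual practice
```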
Another continuous variable, time, can be parsed at any number of levels-hours,
minutes, seconds, and milliseconds, for example, or as days, weeks, months, years,
decades, and so on. Indeed, the only practical limitation on how any continuous vari-
ables are studied or reported is the precision of the measuring instruments involved. For
most of us, noting that it is quarter past the hour, not 16 minutes and 32 seconds past, is
sufficient to tell time, as is relying on a normal watch rather than an atomic clock in
Greenwich to track time's passage.
Figure 1.7 True Limits and Different Levels of Precision for Hypothetical Weight
[Figure: A bathroom scale shows an observed weight of 158 lb with true limits of 157.5
and 158.5 lb; a more precise doctor's scale shows an observed weight of 158.10 lb with
true limits of 158.05 and 158.15 lb.]
Dividing one number by another is probably the most frequent mathematical operation done in
statistical analyses. As you know from personal experience, the product of such division can
sometimes be a number with seemingly innumerable places beyond the decimal point. When con-
verted to its decimal form, for example, the simple fraction 1/6 is equal to 0.1666. . . . In
print, a line across the top of the last 6 means that this series of numbers is without end.
Some calculators, like mine for instance, round off the number stream; that is, the number
is rounded at a fixed decimal place. My calculator reports that 1/6 is equal to 0.1667.
Where did that 7 come from? The fourth digit-originally a 6-was rounded up to 7 due to
the fifth digit, also a 6 (note that this digit was implied but not displayed by the calculator). When-
ever you round a number "up" or "down," you do so to what are called significant digits.
KEY TERM Significant digits are the numbers beyond a decimal point that indicate the desired accuracy of
measurement.
If a cognitive psychologist were reporting the average time it took for participants in an experi-
ment to retrieve some fact from memory, the average could be reported to different significant
digits (with corresponding true limits) depending on the desired level of accuracy. If the average
retrieval time for a learned fact were 2.36453231 seconds, it could be reported to:
• The nearest second: 2 seconds with true limits of 1.5 and 2.5 seconds (i.e., ± 0.5 of a
second)
• The nearest 1/10th of a second: 2.4 seconds with true limits of 2.35 and 2.45 seconds (i.e.,
± 0.05 of a second)
• The nearest 1/100th of a second: 2.36 with true limits of 2.365 and 2.355 (i.e., ± 0.005 of
a second)
• The nearest 1/1000th of a second: 2.365 with true limits of 2.3655 and 2.3645 (i.e., ± 0.0005
of a second)
How far from the decimal point significant digits are reported is for the researcher to decide. The
rules for rounding digits, however, are rather specific, and you should check the rounding per-
formed on the reaction times to make sure you understand them:
1. Decide how many places beyond the decimal point a number should be reported. In general,
I recommend that you round to two places beyond the decimal point. The American Psycho-
logical Association Publication Manual (APA, 1994) advocates that any reported statistic
should be shown to two digits beyond what were shown in the raw (i.e., unanalyzed) data.
2. If you are rounding to two places beyond the decimal point, any remainder extending beyond
the two places that is less than 5 should be dropped. Thus, 28.46455721 would be reported as
28.46.
3. If you are rounding to two places beyond the decimal point and the remainder extending be-
yond the two places is greater than 5, then 1 is added to the last number. Thus, 75.4582163
would be reported as 75.46.
4. If the remainder after the two places beyond the decimal is exactly 5, then (a) add 1 to the last
digit if it is an odd number or (b) drop the remainder if the last digit is an even number. Thus,
1.235 becomes 1.24 because the digit in the second place, a 3, is odd-and 418.725
becomes 418.72 because the digit in the second place, a 2, is an even number.
5. There is one final but critical rule about rounding: Do not do any rounding until the final cal-
culation for a statistic is completed. Rounding is reserved for answers or results that are reported.
If you round numbers early in a calculation, you will experience what is called rounding error-in
your final calculation, your answer will either underestimate or overestimate the actual answer.
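Rules 2 through 4 amount to what is often called round-half-to-even. As a sketch, Python's decimal module implements exactly this rounding mode; the function name below is mine, not the text's:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_stat(value: str, places: int = 2) -> Decimal:
    """Round per rules 2-4: drop remainders below 5, round up above 5,
    and break exact ties toward the even digit."""
    exponent = Decimal(1).scaleb(-places)  # e.g., Decimal('0.01') for two places
    return Decimal(value).quantize(exponent, rounding=ROUND_HALF_EVEN)

print(round_stat("28.46455721"))  # 28.46  (remainder below 5 is dropped)
print(round_stat("75.4582163"))   # 75.46  (remainder above 5 rounds up)
print(round_stat("1.235"))        # 1.24   (exact 5 after an odd digit rounds up)
print(round_stat("418.725"))      # 418.72 (exact 5 after an even digit is dropped)
```

Passing the raw numbers as strings avoids introducing floating-point error before the rounding rule is applied, in the spirit of rule 5.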
assumed to fall. Compare this finer measurement with the original weight shown in the
upper portion of Figure 1.7. Both diagrams illustrate the same continuous variable but
at different levels of precision; the basic relationships within the data, however, are the
same.
True limits draw boundaries for identifying the probable location of any continu-
ous variable. For the present, think about the message conveyed by true limits, that we
never know the actual value of any continuous variable, and not the math involved.
True limits will concern us again in chapter 3 when we learn to construct graphs.
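The idea of true limits, half of the smallest measurement unit on either side of an observed value, can be sketched as follows (the function is illustrative, not from the text):

```python
def true_limits(value, unit=1.0):
    """True limits of a continuous measurement: the observed value
    plus or minus half of the smallest unit of measurement."""
    half = unit / 2
    return value - half, value + half

print(true_limits(16))               # (15.5, 16.5) -- pounds
print(true_limits(27))               # (26.5, 27.5) -- minutes
print(true_limits(158.10, unit=0.1)) # the doctor's scale reading in Figure 1.7
```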
most likely the field of psychology. As a result, you should be very interested in learning
how writing can be linked to statistics and data analysis. You should want to know what
to say about statistics and how to go about saying it. In most chapters in this book, I will
provide concrete guidance about how to put statistical results into words that can be
read and understood by others. These "asides" on writing will be highlighted and usually
follow the presentation of a statistical test or data presentation. I urge you to give as
much attention to how to write about the meaning of any given test statistic as you do its
calculation. This extra effort will help you to learn and to retain the various statistical
procedures better, and to present them in precise and accurate ways to others.
Table 1.1 Common Mathematical Symbols, Their Operations, and Supporting Examples
Mathematical Symbol    Operation                   Example
+                      addition                    5 + 3 = 8
-                      subtraction                 10 - 5 = 5
×, ( )                 multiplication              5 × 4 = 20, 8(7) = 56
÷, /                   division                    6 ÷ 2 = 3, 100/5 = 20
<                      less than                   2 < 3
>                      greater than                7 > 5
≤                      less than or equal to       3.49 ≤ 3.50
≥                      greater than or equal to    3.65 ≥ 3.642
≠                      not equal to                34 ≠ 18
≈                      approximately equal to      1.2123 ≈ 1.21
Nominal Scales
When an observation is simply given a name, a label, or otherwise classified, a nominal
scale is being used. To be sure, nominal scales use numbers, but these numbers are not in
any mathematical relationship with one another.
KEY TERM A nominal scale uses numbers to identify qualitative differences among measurements. The mea-
surements made by a nominal scale are names, labels, or categories, and no quantitative distinc-
tions can be drawn among them.
Nominal scales are actually much more common to your experience than you realize. Can
you recall the last time you purchased an appliance or a piece of stereo equipment? You
probably filled out a warranty card and, while doing so, answered a variety of questions
that superficially appeared to have little to do with the item. Typical questions
concern gender ("Are you male or female?"), income ("Which of the following ranges
best categorizes your household income?"), and hobbies ("Please check any of the below
activities you do during your leisure time"). You probably checked a few boxes, dropped
the card into the mail, and quickly forgot about it. The manufacturer, however, was
deeply interested in these facts and quickly converted them into numerical equivalents:
You could be a female (a 2 was checked) earning between $30,000.00 and $35,000.00 a
year (a 4 was checked) who likes skiing (5), bodysurfing (12), and macrame (27). The in-
formation you provided, along with that of hundreds or thousands of other consumers,
is summarized so that a profile of buyers can help the manufacturer target future adver-
tising (i.e., toward who is or is not purchasing the product).
Nominal scales name things.
The point here is that nominal information can be easily coded (e.g., 1 = male,
2 = female), tallied (i.e., 4,000 women and 1,800 men bought the product in March last
year), and interpreted (e.g., women buy the product more than men do). No informa-
tion is lost in this process; rather, it is converted into numbers for a quick, summary re-
view. These numbers can then be stored in computer files for examination, classification,
and categorization.
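The coding-and-tallying process just described can be sketched with hypothetical responses:

```python
from collections import Counter

# Hypothetical warranty-card responses coded on a nominal scale
# (1 = male, 2 = female); the codes label categories, nothing more.
responses = [2, 1, 2, 2, 1, 2, 2, 1, 2, 2]

labels = {1: "male", 2: "female"}
tally = Counter(labels[code] for code in responses)
print(tally)  # Counter({'female': 7, 'male': 3})
```

Note that no arithmetic beyond counting is legitimate here; the codes 1 and 2 could be swapped without losing any information.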
Nominal scales dealing with individuals' characteristics are often used in psycho-
logical research. Gender is nominally scaled, as are race, ethnicity, religion, and many
other variables. Indeed, nominal scaling can be performed on data that were collected
using another scale. Imagine that we have a group of male and female college students
answer a series of questions and complete a variety of psychological batteries dealing
Note: Each observation within a category represents one hypothetical respondent. The hypothetical sample
has a total of 400 respondents.
with gender issues. Following research on androgyny, the combination of masculine and
feminine psychological characteristics found in some individuals, we could categorize
each of the respondents according to their gender-related responses to the measures
(e.g., Bem, 1977; Spence & Helmreich, 1978). The standard classifications for gender-
related behaviors are masculine, feminine, androgynous, and undifferentiated (i.e., indi-
viduals who do not think of themselves in gender-related ways).
Table 1.2 shows how the participants in this hypothetical project on gender could be
categorized nominally. Notice that once a respondent is categorized, he or she can be
placed in one and only one nominal category. Thus, for example, a given male partici-
pant could not be androgynous and masculine-he can appear in one category or the
other. Not surprisingly, perhaps, there are more "masculine" males than females (i.e., 68
vs. 28) and more "feminine" females than males (i.e., 64 vs. 16). Interestingly, however,
there are more androgynous males than females, but the reverse holds true for those stu-
dents categorized as undifferentiated.
If pressed about a quantitative dimension for nominal scales, it is true that they pos-
sess an equivalence (=) or a nonequivalence (≠) dimension. An observation is either
equivalent to the others in a category, so that it is then counted as being part of that cat-
egory, or not; if not, then it is included in another category. If one were studying smok-
ing behavior among preadolescents, for example, a given participant could be catego-
rized as "male smoker," "male nonsmoker," "female smoker," or "female nonsmoker." A
10-year-old female who smoked would be placed in the "female smoker" category-she
is equivalent to the observation therein and not equivalent to any of the remaining three
categories. Note also that = and ≠ are effectively "all or nothing" for categorization pur-
poses, so that a male smoker is ≠ to a female smoker. The male smoker is not "greater
than" (>) or "less than" (<) the female smoker, however. Such ordering distinctions
require an ordinal scale of measurement.
Ordinal Scales
When the measurement of an observation involves ranking or ordering based on an un-
derlying dimension, an ordinal scale is being used.
KEY TERM An ordinal scale ranks or orders observations based on whether they are greater than or less than
one another. Ordinal scales do not provide information about how close or distant observations are
from one another.
Ordinal scales rank or order things.
When we observe who graduated as valedictorian and salutatorian of a high school
class, for example, we learn who had the highest and second highest GPA. We do not
usually learn mathematically how far apart the GPAs of the valedictorian and the salutato-
rian were from one another, or what common or disparate classes they took that con-
tributed to their standings. We only know who had the highest and the next highest
academic performance based on the underlying dimension of GPA.
Scales of Measurement 27
Table 1.3 Ordinal Scaling-The Top Ten Films of All Time as Ranked by Two Friends
Top Ten Films Friend l's Rankings Friend 2's Rankings
1. Citizen Kane 5 1
2. Casablanca 7 3
3. The Godfather 2 2
4. Gone with the Wind 6 9
5. Lawrence of Arabia 8 4
6. The Wizard of Oz 1 7
7. The Graduate 3 6
8. On the Waterfront 10 10
9. Schindler's List 4 5
10. Singin' in the Rain 9 8
Note: A rank of 1 refers to the most highly regarded film, a 2 the second most highly regarded film, and so on.
Source: American Film Institute (AFI)
Anytime we are presented with data pointing to who was fastest (or slowest) in a
race, scored 12th highest on an exam, expressed the strongest reservations about a deci-
sion, won third place in a contest, or was least distressed by witnessing an accident, ordi-
nal measurement is being used. The observations are compared to one another in terms
of their greater or lesser magnitudes, so that the mathematical operations used for ordi-
nal scaling are > and <, respectively. These operations illustrate a modest improvement
over nominal scaling because the observations are evaluated on one dimension as op-
posed to an either-or process of categorization.
Here is a different, but straightforward, example of a simple ordinal scale. Suppose
you and a friend are discussing a recently published list of the top ten films of all time. It
turns out you both have seen all ten, so you decide to determine your own rankings for the
films; that is, which film did you like the most, second most, and so on. Table 1.3 illustrates
(in order) the published list of the American Film Institute's (AFI) top 10 films of all time
and the respective rankings of these same films by two friends. As you can see, the pub-
lished ranking serves as the baseline data and the respective ordinal rankings of the two
friends illustrate the deviations in liking from this list. The issue is not whose film taste is
better or worse, rather that the same stimuli can be ordered differently by different people.
A modest irony associated with ordinal scaling is that we often assign the person (or
stimulus) with the highest score or best performance on a given dimension a 1, which is
actually the lowest score. In the earlier example concerning the valedictorian and the
salutatorian, the two persons with the highest GPA (≤ 4.00, presumably) received a
rank of 1 and 2, respectively. Does that "the highest score is first" irony matter? Not at all,
just as long as the individuals performing the ranking use the system consistently. As ca-
sual observers, we are all quite familiar with the intent of most ordinal rankings, implic-
itly understand the irony involved, and take virtually no notice of it. Greater degrees of
precision in measurement, however, are associated with interval and ratio scales.
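The "highest score is first" convention can be sketched with invented exam scores:

```python
# Hypothetical exam scores; ordinal scaling keeps only the ordering,
# so rank 1 goes to the highest score.
scores = {"Ana": 88, "Ben": 95, "Cal": 72, "Dee": 91}

ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: rank for rank, name in enumerate(ordered, start=1)}
print(ranks)  # {'Ben': 1, 'Dee': 2, 'Ana': 3, 'Cal': 4}
```

Notice that the ranks say nothing about distance: Ben outranks Dee by the same one rank that Ana outranks Cal, despite very different score gaps.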
Interval Scales
Interval scales are used when the distance between observations is measurable, equal,
and ordinal, but a true zero point is unnecessary. What is a true zero point? A true zero
point occurs when a scale cannot measure any observations or ratings at or below the
scale's value of 0.
KEY TERM An interval scale is quantitative, contains measurably equal distances between observations, but
lacks a true zero point.
The basic mathematical operations of addition, subtraction, multiplication, and divi-
sion can be performed on data collected from interval scales. When a zero point appears
on an interval scale, however, its placement does not mean that information stops at that
point. Think about it: If you have ever lived through a cold winter, you know that tem-
perature does not stop being meaningful at 0° Fahrenheit. The recorded temperature can
and often does fall below 0°, so that it becomes colder still. Indeed, "below zero" mea-
surement entails the use of negative numbers (e.g., "With the wind chill, it was -5°
below 0° last night").
Interval scales are quantitative measures that lack a true zero.
The Fahrenheit scale found on most thermometers is an example of an interval scale
you know well. Each degree entry on a thermometer is equally far apart from every other
degree, a property that renders the relationships among temperatures meaningful. A
temperature of 62° is objectively higher (and warmer) than 60°, and the same 2° differ-
ence exists between 53° and 55°-temperatures that are in turn cooler than the first
pair.
Despite what appears to be a clear mathematical relationship in interval data, how-
ever, one cannot claim that one measure is twice or three times the magnitude of an-
other. Though it is colder, a 20° winter day in Vermont is not 3 times colder than a 60°
day in Georgia, for example. A person who receives a score of 50 on a scale that measures
depressive symptoms is clearly at greater risk for depression than another person with a
score of 25, but it would be incorrect to say that one is "twice as depressed as the other"
(e.g., Radloff, 1977). The question becomes how much colder is one temperature than
another or how much of a difference in risk for depression is illustrated by disparate
scores on a standardized measure. Whether a meteorologist or a psychologist, a re-
searcher must present interval scale results in ways that are true to the data-ordered
mathematical relationships are fine, but they cannot be based on ratios.
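One way to see why the temperature ratio is misleading: converting to a scale with a true zero (the Kelvin scale, an illustration that goes beyond the text) changes the ratio entirely:

```python
def f_to_kelvin(deg_f: float) -> float:
    """Convert degrees Fahrenheit (an interval scale) to kelvins,
    a ratio scale whose zero marks a true absence of thermal energy."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

vermont, georgia = 20.0, 60.0  # the winter-day temperatures from the text
print(georgia / vermont)       # 3.0 -- but this ratio is meaningless on an interval scale
print(f_to_kelvin(georgia) / f_to_kelvin(vermont))  # about 1.08 on a ratio scale
```

The naive "3 times" vanishes once the arbitrary zero is removed, which is exactly why ratio claims are out of bounds for interval data.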
Within behavioral science research, many personality scales, intelligence (IQ) mea-
sures, educational tests, and rating scales use an interval scale. Whether standardized or
not, practically all the tests, quizzes, and exams that you have completed throughout
your education are based on an interval scale of some type. Here is an important issue to
think about: If an individual receives a score of 0 on a self-esteem scale or the equivalent
score on a test of verbal skills, does such performance necessarily imply the complete ab-
sence of the relevant personality trait or verbal ability? Certainly not. Such performance
simply indicates low-but not no-verbal ability, as well as very low levels of self-esteem.
On an interval scale, then, a measure of 0 does not mean that the phenomenon being
measured is absent. After all, the scale being used might not be sensitive enough to ade-
quately measure the phenomenon of interest. A measurement of 0 does take on a precise
meaning when appearing on a ratio scale, however.
Ratio Scales
The ratio scale incorporates all of the properties found in the previous three scales, as
well as an absolute zero point.
KEY TERM A ratio scale ranks observations, contains equal and meaningful intervals, and has a true zero
point.
In the case of a ratio scale, a zero point is meaningful because it indicates a true absence
of information. For instance, the zero measurement on a ruler means that there is no
object being measured, just as a reading of 0 miles per hour (mph) on a speedometer in-
dicates that the automobile is not in motion. The existence of a true zero point on a ratio
scale enables users to describe measurements in terms of numerical ratios. Weight is a
ratio scale, so that a 2-ton object is to a 1-ton object as a 4-ton object is to a 2-ton object
(i.e., each is twice the weight of the other). Similarly, where height is concerned, a 6-ft
person is twice as tall as a 3-ft person. The same relationship among ratios of length can
be demonstrated on any standard yardstick or, for that matter, a more precise measuring
device, such as a micrometer.
Ratio scales are quantitative measures that have a true zero.
Most scales used by behavioral scientists and most used throughout this book turn
out to be interval scales. Although nominal and ordinal scales are used in research, they
lend themselves to a relatively small number of statistical tests (see chapter 14). Ratio
scales are used less frequently in behavioral research than interval scales, but they are by
no means rarities. Anytime a project involves the measurement of reaction time, the am-
plitude of sound or the intensity of light, or familiar measures such as height or weight
or even the number of cigarettes smoked, a ratio scale is presumably being used.
Writing About Scales
Our review of the scales moved from the simplest variety to the most complex form of
measurement. Table 1.4 summarizes the main points associated with the four scales of
measurement. Nominal scales provide the least amount of information for researchers
and ratio scales provide the most (see the left side of Table 1.4). As noted on the right
side of Table 1.4, nominal and ordinal scales tend to identify qualitative relationships
within data while interval and ratio scales focus on quantitative ones. The defining
features of each scale, as well as representative examples, are presented in the center of
Table 1.4.
I recommend that you refer back to Table 1.4 when preparing to describe a scale
within a paper or a presentation, or even when solving a statistics problem. It is essential
Table 1.4 Comparing Qualities of Measurement Scales

Nominal
Defining features: Names, labels, categories. Qualitative operations: =, ≠
Examples: Gender (1 = male, 2 = female); ethnicity or religion of person; smoker vs.
nonsmoker

Ordinal
Defining features: Observations ordered or ranked. Qualitative operations: <, >
Examples: Class rank (1st, 2nd, 3rd, 4th); rank on personality measure (high vs. low
optimism); self-esteem scale scores (10 = high self-esteem, 1 = low self-esteem)

Interval
Defining features: Order or ranking; equal intervals between observations; no true zero
point. Quantitative operations: +, -, ×, ÷
Examples: Fahrenheit temperature; most standardized psychological tests; score on a
measure of verbal ability; IQ score

Ratio
Defining features: Order or ranking; equal intervals between observations; true zero
point. Quantitative operations: +, -, ×, ÷
Examples: Weight, height, reaction time, number of bar presses, amplitude of sound,
intensity of light, speed

Note: Reading down the table, the scales provide progressively more information;
nominal and ordinal scales capture qualitative relationships, while interval and ratio
scales capture quantitative ones.
that you have a firm grasp of what sort of data a scale measures, as well as the operations
that can be conducted on those data, before you write about them. Sometimes it is very
clear which of the four scale types a given measure falls under, but other times a reader
of the research literature or even an investigator will not be so certain.
Consider an example illustrating this uncertainty: When children are screened for
special academic or remedial programs, various intelligence tests and achievement mea-
sures are used to assess their intellectual abilities. Most educators and administrators be-
lieve that such tests are interval scales but, oddly, they are often used as ordinal measures.
Students with a measured IQ score of 130 are placed in the gifted program, for example,
while those with an IQ of 85 or less end up in a remedial class. What happens to a child
with an IQ of 129 or 86? Nothing, really; that is, the child remains in the regular class-
room environment, receiving neither enrichment nor remediation. An observer might
reasonably ask whether a person with an IQ of 129 is really less gifted than the individual
with an IQ of 130. Does it follow that a child with an IQ of 86 is at less educational risk
than a child whose IQ is just 1 point lower? Again, the key is knowing what a given scale
is supposed to measure and how that measure is being used.
When writing about a scale, be sure to let the reader know if the scale was created
previously for another purpose-it may be a standardized test or an existing personality
measure-by providing a reference (see Appendix D). If the scale is unique to your piece
of research, briefly describe how and why it was created. In either of these cases, readers
will understand your research better if you provide detailed information about the scale
you used. Be sure to note how many items appear on the scale, if some numerical rating
or ranking was used (i.e., did respondents circle numbers? rank order preferences?), the
range of scale scores (high to low), as well as the average given (if appropriate). Most of
this information is germane to interval and ratio scales, of course. If you used a nominal
or an ordinal scale, be sure to describe what was categorized or ranked, and how the pro-
cedure was performed on the data. If possible, create a table or a figure to summarize the
relationships in the scale data (see chapter 3 for suggestions on pictorial representa-
tions).
One of the most helpful things you can do when writing about a particular scale is
to share an item or two from it with readers. If items from the scale are shown, readers
will gain a better sense of what it is designed to measure, as well as an appreciation of the
respondents' point of view when completing it. Unless you are writing about a very fa-
miliar scale (e.g., the Scholastic Aptitude Test [SAT]), it is likely that most readers will
not know the scale's characteristics. If there is nothing memorable to latch on to, lack of
familiarity with a scale can quickly lead to reader disinterest or mild confusion. This
problem is especially true in the psychological literature, where scales are routinely re-
ferred to by their acronyms (e.g., CES-D) instead of their actual names (e.g., The Center
for Epidemiological Studies Depression Scale; Radloff, 1977). The shorthand of
acronyms saves time and printed space, but their heuristic value as memory aids does lit-
tle good for people who are unfamiliar with them.
Here is a description of the CES-D, an interval scale, excerpted from the Method
section of an empirical article (Dunn, 1996, p. 290):
Depression. Depressive symptomatology was assessed using the Center for Epidemi-
ological Studies Depression Scale (CES-D; Radloff, 1977), a general measure of de-
pressed affect or mood designed for use with cross-sectional samples in survey research.
The CES-D's 20 items are scored on a 4-point scale (0 to 3) that measures the frequency
of a symptom's occurrence during the previous week (e.g., "My sleep was restless").
Scores can range from 0 to 60 (present sample range was 0 to 41), and higher scores re-
flect a greater prevalence of depressive symptoms....
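The scoring scheme in the excerpt, 20 items rated 0 to 3 and summed, can be sketched with invented responses:

```python
# Hypothetical responses to a 20-item scale scored like the CES-D excerpt:
# each item rated 0-3 for symptom frequency, then summed. The item
# values below are invented for illustration.
responses = [0, 1, 2, 3, 1, 0, 2, 1, 0, 3, 1, 2, 0, 1, 1, 0, 2, 3, 1, 0]
assert len(responses) == 20 and all(0 <= r <= 3 for r in responses)

score = sum(responses)
print(score)  # 24 -- a total between 0 and 60; higher means more symptoms
```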
Overview of Statistical Notation 31
Strive to give readers sufficient detail about whatever measures you employ or analyze. Even
a relatively brief description of a scale can give readers a coherent picture of who should
complete the scale, how the scale is used and scored, and what a given item from it is like.
Shorter scales, those containing only an item or two, can be included in their entirety
in any report. If interpersonal attraction were the topic of the study, the major question of
interest might be: "Using the following rating scale, indicate the attractiveness of the
person you just met." Such scales typically use a 1 ("not at all attractive") to 7 ("very
attractive") rating scheme, where higher numbers on the scale correspond to higher levels
of perceived attractiveness. Circling the number 4 on this scale would suggest a neutral
rating. Note that the same scale could use a rating system of -3 ("not at all attractive")
to +3 ("very attractive"), where 0 would be the neutral midpoint. As you can see, the
quantitative relationship among the numbers on the scale matters, but the particular
numbers appearing on the scale do not-any numbers would do, as long as their meaning
is clear to respondents (i.e., the numbers are ordered with equal intervals between them).
Knowledge Base
1. Indicate which of the following variables is discontinuous and which is continuous.
a. Number of items selected from a menu
b. Rated satisfaction with menu items
c. Number of rats run in an experimental maze
d. Volume gauge on a stereo system
e. Car speedometer
2. Calculate the true limits for the following:
a. 16 lb
b. 27 minutes
3. Identify which scale of measurement best describes the following:
a. Need for acceptance by others
b. Democrat
c. Second place in a pie eating contest
d. Sensitivity to light
Answers
1. a. Discontinuous
b. Continuous
c. Discontinuous
d. Continuous
e. Continuous
2. a. 15.5 lb and 16.5 lb
b. 26.5 minutes and 27.5 minutes
3. a. Interval
b. Nominal
c. Ordinal
d. Ratio
another: symbols and numbers. Numbers are familiar to you, but the symbols we will
learn are probably new to you. Symbols are used in statistical formulas as directions or
guidelines, nothing more. These symbols are usually shorthand for specific mathemati-
cal operations, the majority of which will be either familiar to you or very easy to learn.
At other times, the symbols will serve as variables that can take on different values.
As you read this section of the chapter, make a concerted effort to learn what the
symbols mean and how they are used. Again, your goal is to immerse yourself in the
symbolic language of statistics so that it becomes second nature to you. Learning the
rudiments of this language will enable you to broaden your vocabulary, so to speak, as
you read and learn from subsequent chapters.
There are four basic symbols that appear throughout the various formulas pre-
sented in this book and they are highlighted in Table 1.5. In the future, refer back to this
table when you need a quick summary. These symbols are the real "workhorses" of data
analysis. As you study these symbols and the accompanying statistical notation, imagine
that the numbers used to illustrate them are from interval or ratio scales.
Variables X and Y. The first two symbols, X and Y, are used to represent variables, but
usually indicate different types of information.
KEY TERM X and Y are variables that take on the values of some set of observations or data.
Typically, these variables will be used to refer to the data collected in an experiment.
Variable X could refer to friendly comments and Y could indicate the amount of time
spent smiling in a study on peer friendships among adolescents. Variable X will be used
more frequently than Y, as most formulas contain only one variable, but occasionally
they will be used together in the same formula.
Notice that when they stand alone, X or Y refers to a set of data, not necessarily an
individual observation. There is a particular notation for illustrating the scores or ob-
servations that comprise X. To indicate an observation within X, we use Xi, where the
subscript i indicates a specific observed value for X. If there were five scores in X, say, 10,
14, 20, 21, and 36, these scores could be represented as X1, X2, X3, X4, and X5. That is,
X1 = 10, X2 = 14, X3 = 20, X4 = 21, X5 = 36.
In this case, X refers to the object being measured and the numbers 1 to 5 indicate the in-
dividual observations or participants comprising the data set of X. We can do the same
with variable Y. Variable Y could indicate, say, a set of data with three observations:
Y1 = 4, Y2 = 7, Y3 = 10.
In a large set of data collected for a study, there could be two different measures taken
from each participant (i.e., Xi and Yi). Data from the second participant in this study
could then be referred to as X2 and Y2.
Please be aware that there is nothing magical about X or Y. Their use as variables is
simply a convention (no doubt you remember using x in algebra). If you wished to do
Overview of Statistical Notation 33
so, you could use letters like K and G in place of X and Y. You could even employ Z, the
de facto variable of choice in those rare instances where a third variable is needed.
Total Observations Are Equal to N. The third basic symbol identifies the number of
observations or participants available in a data set or for a particular analysis.
KEY T ER M N signifies the total number of observations in a set of data.
In the data sets for X and Y we just examined, N = 5 and N = 3, respectively. That
is, there were five scores for X and three scores for Y, or one from each participant in the
two data sets. There might be 75 peers in the friendship study noted above, so N = 75. If
there were 30 students enrolled in your statistics class, then N = 30, and so on. The sym-
bol N always signifies the total number of observations within some data set.
Summation Rules. Unless you know the Greek alphabet, the fourth symbol will be
new to you. In statistics, the symbol Σ (pronounced "sigma") means "to sum." In gen-
eral, when you see the Σ before a string of numbers, you should add them together. If we
use the observations for X from above, we can use Σ to sum them as follows:
ΣX = X1 + X2 + X3 + X4 + X5.
This is the same as saying:
ΣX = 10 + 14 + 20 + 21 + 36.
If so, then,
ΣX = 101.
Take a moment and check your understanding of how Σ is used. Go back and per-
form the summation for the Y observations shown above (be sure to write out the indi-
vidual observations with subscripts for Y, the actual scores for Y, and then the sum of Y):
What was your answer? The sum of Y, or ΣY, is equal to 21. If you made an error, please
go back and review the example for X, and then redo the summation for Y until you un-
derstand how to obtain the answer.
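If you like, you can mirror this notation in code; Σ corresponds directly to Python's built-in sum. A minimal sketch using the X and Y data above:

```python
X = [10, 14, 20, 21, 36]   # X1 through X5
Y = [4, 7, 10]             # Y1 through Y3

print(sum(X))   # ΣX = 101
print(sum(Y))   # ΣY = 21
```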
There is a slightly more formal way to express summation notation. The additional
information actually makes the procedure for adding a string of numbers much easier to
follow as long as you keep the basic process we covered in mind. It is also true that the
additional notation will prove to be useful in more advanced calculations later on. For
the present, imagine a new data set for X and that we want to sum all of the observations
within it together. Another way to represent the summation would be:
X1 + X2 + X3 + ... + XN.
That is, all of the values of X, from the first value (X1) to the last or Nth value (XN),
must be added together, starting with i = 1. If there were six observations in X, they could
be X1 = 5, X2 = 9, X3 = 11, X4 = 9, X5 = 6, and X6 = 2. The summation is written as
 N
 Σ Xi = 5 + 9 + 11 + 9 + 6 + 2 = 42.
i=1
34 Chapter 1 Introduction: Statistics and Data Analysis as Tools for Researchers
Do you see the notations above and below the sigma? The i = 1 notation below the sigma
means i adopts successive values of 1, 2, 3, 4, and so on up to N, the last observation
in the data set, which appears at the top of the summation sign. In simple, descriptive
terms, this summation means: "Add all the observations from i = 1 (X1) through i = N
(XN) together."
Of course, you will not always want to sum all the values within a data set. Many
times, you will need to work with only a portion of the observations available. The sta-
tistical notation is flexible here, as well. Imagine that you were interested in summing
only the first three numbers in the data set for X. This time, the summation notation
would be written as
 3
 Σ Xi = X1 + X2 + X3.
i=1
Notice that now i runs from 1 (X1) to 3 (X3), not N. In turn, this means summing the first
three observations in the data set, or
 3
 Σ Xi = 5 + 9 + 11 = 25.
i=1
What if the summation sign directs you to add only selected observations within the
data set? Try solving this summation:
4
IX; =
i=3
This time, you would start at i =3 (X3 ) and add it to i =4 (Xt). Be sure that you under-
stand how to read the notation, which is like a guidepost. You begin with the informa-
tion under the summation sign ("start at i = 3") and add the observations until you
reach the end point indicated above the sign ("include but stop at i = 4"). The answer to
the above summation is 20. Was your sum correct? If not, please reread the section on
summation before proceeding with the rest of the chapter.
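The start and stop values around the summation sign map naturally onto list slicing in Python. One caution in this sketch: Python lists are indexed from 0, so the observation Xi sits at position i − 1.

```python
X = [5, 9, 11, 9, 6, 2]     # X1 through X6

total = sum(X)              # sum from i = 1 to N: all six observations
first_three = sum(X[0:3])   # sum from i = 1 to 3: 5 + 9 + 11
middle = sum(X[2:4])        # sum from i = 3 to 4: 11 + 9

print(total, first_three, middle)   # 42 25 20
```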
You would begin at the left, which happens to contain an operation in parentheses. So,
you would first perform the operation (here, addition) inside the parentheses,
X = (8)² - 12 × 10/5.
Then, you would square the value as indicated, recalling that any squares or exponents
are next in priority,
X = 64 - 12 × 10/5.
In the next step, you would perform the multiplication and division. In this case, the
division would take place first,
X = 64 - 12 × 2,
then the multiplication,
X = 64 - 24.
Subtraction comes last, so you would take 24 from 64. The solution to the equation is the
number 40, or X = 40.
Let's try another, one involving negative numbers and a square root. Here is the
equation,
Y = √4 - (-12 + 3).
Although we work from the left to the right, operations in parentheses still take prece-
dence over the square root. A negative number is also present inside the parentheses, so
you must remember that a larger negative number added to a smaller positive number
results in a negative number, or
Y = √4 - (-9).
In the next step, we can do the square root, which takes precedence over the negation,
Y = 2 - (-9).
The negation, the changing of the two negative signs to a positive sign, follows,
Y = 2 + 9.
Finally, we finish the problem with addition for,
Y = 11.
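Both worked examples can be checked in Python, which applies the same priority rules. This is a sketch only; the first equation is picked up at the point where its parenthetical sum has already reduced to 8.

```python
import math

# First equation, after the parenthetical sum has reduced to 8:
# exponent first, then division and multiplication, then subtraction.
X = 8**2 - 12 * (10 / 5)
print(X)   # 40.0

# Second equation: the parenthesized sum first, then the square root,
# then subtracting the resulting negative number.
Y = math.sqrt(4) - (-12 + 3)
print(Y)   # 11.0
```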
If you had any difficulty with these mathematical operations-either understand-
ing the priority of some operations over others or just doing the operations-take a few
minutes and review the two sample equations. Before you do so, though, you might want
to take a look at Table 1.6, which contains isolated examples of mathematical operations
in order of their priority.
Up to this point, basic statistical notation, the priority rules, and some sample equa-
tions have been reviewed. One last step remains, which is to combine the statistical no-
tation and priority rules into a few sample equations. Equations like these will recur
throughout the text, but it is a good idea for you to begin to solve them now, so that their
reintroduction in subsequent chapters will be familiar, not a surprise. I do not expect you
to memorize what follows; rather, I want you to focus on understanding how the symbols
and numbers work together to produce the answers. Once you understand how these com-
ponents work together in any given equation, you will be able to solve it with little or no
difficulty.
Assume that you have two sets of data, X and Y, and each one contains four values.
X1 = 5, Y1 = 4,
X2 = 3, Y2 = 7,
X3 = 6, Y3 = 5,
X4 = 9, Y4 = 2.
[Excerpt from Table 1.6, mathematical operations in order of priority:]
Squares, exponents, and square roots:  X²: 7² = 49;  Y⁵: 3⁵ = 243;  √X: √36 = 6
Negation:  -Y: -10;  -(X): -(5) = -5;  (-X)(Y): (-4)(5) = -20;  (-X)(-Y): (-4)(-3) = 12
Multiplication:  X × Y: 2 × 8 = 16;  X(Y): 8(7) = 56;  (X)(Y): (2)(15) = 30
Although you already know the basic summation rule, we will repeat it here for both
data sets,
ΣX = 5 + 3 + 6 + 9,
ΣX = 23,
ΣY = 4 + 7 + 5 + 2,
ΣY = 18.
What if we squared each of the observations in a data set and then summed them? Here
is what it would look like for X,
ΣX² = X1² + X2² + X3² + X4²,
ΣX² = 5² + 3² + 6² + 9²,
ΣX² = 25 + 9 + 36 + 81,
ΣX² = 151,
and for Y,
ΣY² = Y1² + Y2² + Y3² + Y4²,
ΣY² = 4² + 7² + 5² + 2²,
ΣY² = 16 + 49 + 25 + 4,
ΣY² = 94.
These are called "the sum of the squared values of X" and "the sum of the squared values
of Y," respectively.
Another alternative is to sum the observations in the data sets and to then square the
sum (recall that we previously determined the sums of X and Y). These products are the
"sum of X squared" and "the sum of Y squared." For data set X it looks like,
(ΣX)² = (X1 + X2 + X3 + X4)²,
(ΣX)² = (23)²,
(ΣX)² = 529,
and for data set Y,
(ΣY)² = (Y1 + Y2 + Y3 + Y4)²,
(ΣY)² = (18)²,
(ΣY)² = 324.
Please note that the sum of the squared values of X is not equal to the sum of X
squared (nor, obviously, is the sum of the squared values of Y equal to the sum of Y
squared). In the case of X, the rule is presented symbolically as:
ΣX² ≠ (ΣX)².
In the data set for X we just reviewed, the former is equal to 151, while the
latter is 529. This is an important rule, one that we will revisit frequently in the course of
learning statistical formulas. Learning it now will save you heartache, confusion, and
redoing calculations later.
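The rule is easy to demonstrate in a couple of lines of Python, using the X data above:

```python
X = [5, 3, 6, 9]

sum_of_squares = sum(x**2 for x in X)   # ΣX²: square first, then add
square_of_sum = sum(X)**2               # (ΣX)²: add first, then square

print(sum_of_squares)   # 151
print(square_of_sum)    # 529
```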
We can also perform what is called the "sum of the cross products," which uses
multiplication.
ΣXY = X1Y1 + X2Y2 + X3Y3 + X4Y4,
ΣXY = 5(4) + 3(7) + 6(5) + 9(2),
ΣXY = 20 + 21 + 30 + 18,
ΣXY = 89.
Finally, we can produce what is called the "product of two sums" through multipli-
cation.
(ΣX)(ΣY) = (23)(18),
(ΣX)(ΣY) = 414.
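Both quantities can be checked in Python with the same four-value data sets:

```python
X = [5, 3, 6, 9]
Y = [4, 7, 5, 2]

# Sum of the cross products: multiply each Xi by its paired Yi, then add.
sum_xy = sum(x * y for x, y in zip(X, Y))
print(sum_xy)             # ΣXY = 89

# Product of the two sums: total each set first, then multiply the totals.
product_of_sums = sum(X) * sum(Y)
print(product_of_sums)    # (ΣX)(ΣY) = 414
```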
[Excerpted from "A Billion, A Trillion, Whatever" by Michael T. Kaufman. The New York Times,
Sunday, October 18, 1998; Week in Review, Section 4, page 2.]
Mise En Place
When French chefs cook, they rely on a preparation technique known as mise en place
(pronounced "meeze ehn plass"), which literally means "everything in its place." Before
any recipe is executed, all the raw materials are gathered, chopped, diced, or otherwise
readied, and then these ingredients are lined up in the order in which they will be used
for cooking. In much the same way, you should have all of your statistical materials ready
to go before you begin any data analyses. That is, the data should be collected and orga-
nized, appropriate formulas or procedures should be selected, and paper, sharp pencils,
and a well-lighted working area set up.
The mise en place philosophy toward doing statistics also entails a regular rhythm,
a standard routine, to your work. You should set aside some time each day to study this
book and to do your homework. I can guarantee you that you will learn more and per-
form better in your class if your reading, studying, and homework are done consistently.
I can also assure you that you will experience difficulty if you do the work occasionally,
haphazardly, or at the last minute. Again, your goal should be to understand and retain
what you learn about statistics for the long term (i.e., future classes, research projects, ca-
reer), not the short term (i.e., tomorrow's class, the quiz on Friday, next week's exam).
Steady work on statistics will pay off, so before you start to "cook" with your statistics, re-
peat the mantra to yourself, "mise en place, mise en place."
About Calculators
Many years ago, I received a very expensive, programmable, scientific calculator as a gift.
It has 34 buttons on it, each of which has 2 or 3 separate functions (or between 68 and
102 possible operations!). Its number display can go out to 10 places behind the decimal
point, and it has 9 separate memories for number storage (I actually think there is still
more memory, I just never learned to access it). In short, my calculator is a wonder
in spite of the fact that by my estimate, I only know how to do about 5 or 10% of its
functions.
Should you obtain one like mine to do the statistics presented in this book? Ab-
solutely not. You should find a good calculator, but you will really only use the set of
basic operations common to most calculators. The operations are addition, subtraction,
multiplication, and division, of course, but also a key for taking square roots of numbers,
and a squaring function or exponent key. Some memory storage capability, too, is desir-
able. In contrast to my calculator, though, less definitely is more.
Some calculators also have basic statistical procedures and tests programmed into
them, which can be very useful for checking your answers to examples in the text or
homework problems. You should not solve any statistics problems by using these pro-
grams, however, because one of the goals of this book is to teach you to work through the
calculations by hand. Yes, hand calculations (supplemented by a calculator, of course) do
take a bit more time, but they also help you to get a real feel for the data, a sense of where
the numbers came from and how they are used to calculate a statistic.
I believe that you will retain more concepts from your class and the material in
this book by doing the bulk of the work by hand. Calculators are necessary and very
helpful tools, but they are only tools to augment, not replace, your understanding of
mathematical and statistical operations. Whether you own, buy, or borrow one, make
certain that it does what you need and that you avoid becoming distracted by functions
you will not need for your course work. Just think of me and the 89 or so operations I
have yet to figure out!
Knowledge Base
Examine these two data sets and then solve the expressions.
X    Y
7    1
5    3
10   5
2    5
6    6
3    2
1. ΣX
2. ΣY²
3. (ΣX)²
4. (ΣX)(ΣY)
5. ΣXY
Answers
1. 33
2. 100
3. 1,089
4. 726
5. 124
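These answers are easy to verify in Python (assuming the first Y value is 1, which is the only value consistent with answers 2, 4, and 5):

```python
X = [7, 5, 10, 2, 6, 3]
Y = [1, 3, 5, 5, 6, 2]

print(sum(X))                              # 1. ΣX = 33
print(sum(y**2 for y in Y))                # 2. ΣY² = 100
print(sum(X)**2)                           # 3. (ΣX)² = 1089
print(sum(X) * sum(Y))                     # 4. (ΣX)(ΣY) = 726
print(sum(x * y for x, y in zip(X, Y)))    # 5. ΣXY = 124
```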
Each chapter in this book concludes with a project exercise, an activity that applies or ex-
tends some of that chapter's main points. These project exercises are designed to give
you an opportunity to think about how statistical concepts can actually be employed in
research or to identify particular issues that can render data analysis useful for the design
of experiments or the interpretation of behavior. This first project exercise, however, has
a different goal. It is meant to help you overcome some of the fears and prejudices you
may unknowingly harbor toward statistics.
Take a few moments and answer the following questions:
1. When I think of learning about statistics, I feel _ _ _ _ _ _ _ _ _ __
2. When I look at this equation,
t = [(X̄1 - X̄2) - (μ1 - μ2)] / s(X̄1 - X̄2)
I feel __________.
Generally, students respond to both questions with a mixture of fear and trepida-
tion. Dillon (1982) notes that students rarely respond with confident or interested com-
ments. Instead, they express uncertainty or even fear by using words like "unsure,"
"nauseous," "panicky," "doomed," and "overwhelmed." When you answered the two questions,
did words like these come to mind? If so, then you reacted like most students do the first
time they encounter statistics.
Summary
1. The chapter opened by discussing the "how to" side of statistics, the focus of this book, as well as drawing a distinction between statistics and data analysis. A statistic is any information presented in numerical form, where data analysis is the systematic collection of observations. Both terms emphasize quantitative relationships, but because it is broader, data analysis also encompasses qualitative issues.
2. Statistics and data analysis are tools used to accomplish the task of interpreting human behavior; by themselves, they are not as important as that task.
3. Variables and constants were introduced in the context of the David L. problem, where they helped to guide college choice. Base rates (that is, a minimal consensus of opinion) and sampling issues were identified as essential ingredients in a statistically based college choice for David.
4. Although the disciplines of statistics and mathematics have different agendas, both share an appreciation for the scientific method, hypothesis testing, and theory development.
5. Inductive and deductive reasoning were presented as the dominant modes of reasoning in science. The behavioral sciences are largely inductive; that is, theories are created in order to search for confirming evidence.
6. Samples and simple random sampling were discussed as the only way to adequately characterize populations, or large, often theoretical sets of data. The parameters of populations are constants that cannot be known but are usefully approximated by sample statistics.
7. Descriptive statistics describe (that is, summarize the values in) samples, while inferential statistics allow researchers to determine if sample data are sufficient to characterize a population's parameters.
8. Discontinuous variables (those with real gaps where information does not occur) and continuous variables are commonly used in behavioral science research. These variables are related to the four scales of measurement: nominal, ordinal, interval, and ratio.
9. Researchers need to recognize that good writing and quality statistics and data analysis go hand in hand. Such writing is essential to the planning, interpretation, and dissemination of research results.
10. Basic statistical notation and the rules of mathematical priority were reviewed in detail. Readers were advised to adopt the "mise en place" philosophy, obtain a good but simple calculator, and avoid becoming "statisticophobes" by falling prey to math anxiety.
Key Terms
Constant (p. 7), Continuous variable (p. 20), Data (p. 5), Data analysis (p. 4), Datum (p. 5),
Deductive reasoning (p. 11), Descriptive statistics (p. 16), Discontinuous variable (p. 20),
Empirical (p. 9), Hypothesis (p. 9), Inductive reasoning (p. 10), Inferential statistics (p. 17),
Interval scale (p. 28), Lower true limit (p. 21), Mise en place (p. 39), N (p. 33),
Nominal scale (p. 25), Ordinal scale (p. 26), Population (p. 12), Population parameter (p. 13),
Ratio scale (p. 28), Sample (p. 12), Sample statistic (p. 13), Scientific method (p. 9),
Significant digits (p. 22), Simple random sampling (p. 14), Statistic (p. 4), Stimuli (p. 24),
Theory (p. 10), True limits (p. 21), Upper true limit (p. 21), Variable (p. 6)
Chapter Problems
1. What is a statistic? Can data analysis differ from statistical analysis? Why or why not?
2. Define the term variable, and then provide an example.
3. How do variables differ from constants? Give an example of each.
4. Explain how an appreciation for base rates and sampling issues could have helped David L. choose a college.
5. Why are mathematics and statistics different disciplines? What makes some mathematical operations statistical?
6. Use the terms data, datum, stimulus, and stimuli correctly in four different sentences.
7. What are empirical data?
8. Briefly describe the scientific method and the role hypotheses and theories play in it.
Chapter Problems 43
9. Define inductive and deductive reasoning, and then give an example of each process.
10. Explain the relationship between samples and populations, and sample statistics and population parameters. How does simple random sampling relate to these concepts?
11. Define simple random sampling.
12. Describe the difference(s) between descriptive and inferential statistics.
13. Can inferential statistics prove without any doubt that a given sample is from some particular population? Why or why not?
14. List three examples each of discontinuous and continuous variables.
15. Identify the upper and lower true limits for the following:
   a. 2,050 lb
   b. 58.30 minutes
   c. 3 inches
   d. a score of 70 on a 100-point test
16. Round the following numbers to two significant digits:
   a. 2178.762143
   b. 1.22222
   c. 3.41982
   d. 2.1
17. Why is writing relevant to statistics and data analysis? How can good, clear writing help investigators with their research?
18. A college entrance questionnaire asks first-year students to indicate their age, gender, hair color, height, weight, the number of books they read in the last 3 months, and their SAT scores.
   a. Indicate which of the variables is discontinuous or continuous.
   b. Identify which scale of measurement best describes each variable.
19. Name and define the four basic statistical symbols.
20. Briefly summarize the mathematical rules of priority.
21. Solve the following equations:
   a. Y = (7 + 2)² - √25
   b. X = (11)² + (12 - 5) × 4
   c. Y = √10 - (-15 + 10)²
   d. X = 8 × 2 + (10 + 12)²
22. Using the following data, solve the expressions:
   X1 = 12, X2 = 2, X3 = 15, X4 = 10, X5 = 7
   N =
   Σ (from i = 1 to N) Xi =
   Σ (from i = 2 to 3) Xi =
   Σ (from i = 4 to N) Xi =
23. Using the following data sets, solve the expressions:
   X: 4, 2, 2, 3, 1, 5
   Y: 7, 2, 4, 4, 4, 1
   ΣX =
   ΣY =
   ΣXY =
   (ΣX)(ΣY) =
   ΣX² =
   ΣY² =
24. Explain the "mise en place" philosophy in statistics and data analysis.
25. What is "statisticophobia"? How can it be overcome?
[Endpaper decision charts (flowcharts rendered here as text; only partially recoverable). Three charts appear:

Random Selection or Random Assignment?
1. Will a sample be used to make generalizations about its population of origin? If yes, perform random selection; if no, go to Question 2.
2. Will the members of a sample be assigned to two (or more) experimental groups? If yes, perform random assignment; if no, consider using another available randomizing procedure.

Which Category of Research Design is being used? A sequence of questions (Are participants randomly assigned to groups? Are the participants randomly selected from some population? Is at least one independent variable being manipulated? Is at least one dependent variable being measured? Is the behavior of an intact but nonrandom group of participants being compared to that of a comparison group? Are degrees of association between two or more variables being examined? Is the behavior of a unique, intact group of participants being studied without any comparison group?) leading to one of several outcomes: an experiment, a quasi-experiment, a correlational design, or a design that cannot be categorized as experimental, quasi-experimental, or correlational.

Independent or Dependent Variable?
1. Will the variable be manipulated?
2. Does the variable have two or more conditions?
3. Will the variable be measured?
4. Are you certain that you have answered the previous questions correctly?
Answers lead either to classifying the variable as an independent variable, as a dependent variable (also known as a dependent measure), or to changing the variable before it is used in a study.]
CHAPTER 2

Chapter Outline
• The Importance of Determining Causality
   Data Box 2.C: The "Hot Hand" in Basketball and the Misrepresentation of Randomness
• Operational Definitions in Behavioral Research
   Writing Operational Definitions
   Knowledge Base
• Reliability and Validity
   Reliability
   Validity
   Knowledge Base
• Research Designs
   Correlational Research
   Experiments
   Quasi-experiments
   Data Box 2.D: Quasi-experimentation in Action: What to Do Without Random Assignment or a Control Group

How is research in the behavioral sciences conducted? How do researchers go about designing experiments or other modes of inquiry that enable them to tease apart cause and effect relations in data? Is any particular approach to research superior to others? These and related questions will be answered in this chapter, which is devoted to explaining the theory and practice behind research ventures in psychology and related disciplines.

This second chapter is an interlude between chapter 1's overview of statistics and the emphasis on statistical concepts, formulas, and data analysis techniques to be found in subsequent chapters. This interlude is important because anyone who wants to learn to use statistics properly must have a context for their application to data. Knowledge cannot be acquired, understood, or meaningfully applied inside a vacuum. As you learned in the last chapter, statistics are for something; they are tools that mean very little unless they are used to answer a question or to discover heretofore unnoticed relationships among variables. In the behavioral sciences, notably psychology, statistics and data analysis are used to predict and to interpret human behavior in all its myriad forms. In this chapter, we will examine the fundamentals of basic experimentation and research design, the mechanics of the research enterprise. The message in this chapter is simple but important: Good research and quality research design are enhanced by rigorous and appropriate use of statistics and data analysis.
Step 3: Conduct experiment(s) to test the hypothesis and to eliminate any alternative hypothesis(es)
Step 4: Analyze the data and interpret the results of the experiment(s)
hypothesis. This account of the scientific method is both brief and idealized. We need
to establish a better sense of how research actually gets done, and to do so we turn to
the research loop of experimentation (Dunn, 1999; see also Platt, 1964).
The research loop of experimentation (Figure 2.1) is a series of steps that identify
the work done at each stage of the research process. Although these steps ap-
pear to be discrete, numerous smaller activities occur between each one. Such smaller
activities are practical matters that investigators routinely perform but rarely discuss
with nonresearchers, what we might call the tacit or implicit side of conducting re-
search. I mention this fact so that you will not make the mistake of viewing research as
a cut-and-dried affair rather than a dynamic, detailed, and demanding enterprise.
In step 1 of the research loop (see Figure 2.1), an investigator observes some in-
teresting relationship among some observations or decides to explore some as-yet-
untested aspect of a theory. Thus, the impetus to initiate research can range from al-
most casual curiosity to theory extension, and a host of possible reasons for asking a
research question can fall between these extremes. Step 1 represents a scientific com-
mitment, one that requires the investigator-whether student or professional-to
think critically about the research topic before proceeding to the next step.
The second step entails the development of a research hypothesis or a testable ques-
tion derived from the research topic (see Figure 2.1). In many investigations, the hy-
pothesis identifies which variables will be manipulated and which will be measured to
best answer questions of cause and effect. In others, the hypothesis is less specific and
more oriented toward observing the relationships among variables in order to develop
firmer speculations for future research.
As we will see later in chapter 9, any research project really has two hypotheses-
one the investigator wants to put forth as a satisfactory account for some behavior, and
the other, an alternative hypothesis that the investigator wants to invalidate. The latter
usually offers an account of behavior opposite that of the research hypothesis. The idea
here is to pit the research hypothesis against its alternate so that only one of them can
be shown to offer superior explanatory power.
The Research Loop of Experimentation: An Overview of the Research Process 47
Step 3 is the data collection phase, which brings the theory and hypothesis together
in some empirical fashion (see Figure 2.1). The empirical realization is usually (but not
always) an experiment.
KEY TERM An experiment introduces intentional change into some situation so that reactions to it can be
systematically observed, measured, and recorded.
In lieu of an experiment, step 3 could also involve a correlational investigation or what
is called a quasi-experiment. Each of these research alternatives will be defined and
discussed in detail below.
Step 3 also includes drawing a sample from some larger population and making
certain that it was drawn randomly. If an experiment is conducted, some members of
the sample are then given an experience that is consistent with the research hypothe-
sis, while the others receive information fitting the alternative hypothesis.
In step 4, the results of the experiment are interpreted (see Figure 2.1). It is here
that the bulk of the statistical analyses and accompanying scientific reasoning takes
place. Descriptive statistics are calculated from the sample data, and inferential statis-
tics are then employed to see how well the sample results fit the population parame-
ters. In practical terms, the investigator tries to discern whether the manipulated vari-
able(s) created the hypothesized change in the measured variable(s). If the hypothesized
change took place, then there is evidence for the favored hypothesis and accompa-
nying theory. The research hypothesis, then, is treated as a reasonable explanation
for the cause and effect-the give and take-among the variables in the research. On
the other hand, if unpredicted change or no change occurs, the investigator cannot be
confident that the research hypothesis is tenable-instead, the opposing or alternative
hypothesis is embraced for the time being.
The phrase "for the time being" is an apt one for the last step in the research
loop of experimentation. In step 5, the process begins anew, and the researcher ef-
fectively goes back to step 1 where he or she had only some idea or bit of evidence
about how and why some behavior occurs (see Figure 2.1). Even armed with the re-
sults of one experiment, the investigator is really starting over from scratch-the same
process must be acted out again from start to finish to start again because research
is really never finished! Variations of the original topic must be examined, cherished
hypotheses must be revised or discarded, and new questions must be formulated.
Some researchers spend their entire careers exploring subtle distinctions within the
same topic, while others migrate from topic to topic as the spirit or scientific inspi-
ration moves them.
The research loop of experimentation should demonstrate to you that science is
done in a somewhat cyclical way. The knowledge gained through this looping cycle ad-
vances gradually, even incrementally. Forward movement (the identification, classifi-
cation, and application of scientific facts) is usually slow. The scientific community ac-
cepts results only after they are critically reviewed by peers or occasionally even debated.
The results of related and disparate investigations are compared so that consistencies
can be identified and inconsistencies can be explained or more thoroughly explored.
The process of recursion inherent in the research loop of experimentation (that is, con-
tinually repeating the five main steps in order) advances knowledge by ensuring that
established findings are continually examined in light of newer results, and that no data,
no matter how persuasive, are seen as permanent.

[Margin note: The research loop of experimentation is recursive; that is, after completing
four steps, a researcher loops back to step 1 to begin the process anew.]
Two other advantages are associated with the research loop of experimentation, the
replication and extension of results. The term replication refers to repeating or redoing
an investigation to verify that the results can be duplicated.
48 Chapter 2 Process of Research in Psychology and Related Fields
KEY TERM A replication study, which is usually an experiment, is performed to repeat or duplicate some
scientific result.
Replication is a necessity in scientific research; indeed, many heralded findings are not
accepted or "trusted" until they are independently confirmed by other investigators from
different laboratories. To their credit, many researchers are loath to share their results
with the scientific community until they have verified them more than once. With its
recursive design, the research loop of experimentation enables investigators to repeat
the same study more than once.
Replication-literally repeating a procedure to find the same results to verify their accuracy-is a cornerstone of behavioral science research.

What about extending known results? How does this process take place? Generally, extending known results occurs through what is called conceptual replication. A literal replication study is a relatively precise re-creation of what was done before, whereas a conceptual replication (sometimes called systematic replication; see Aronson, Ellsworth, Carlsmith, & Gonzales, 1990) keeps some aspects of the situation constant from the original work while allowing other parts to vary or be left uncontrolled. In other words, some part of the conceptual replication study differs from the original work. This research possibility is also covered by the research loop of experimentation. By changing the conditions from the original study somewhat in steps 1, 2, and 3 (see Figure 2.1), the investigator has the opportunity to see how well the observed result stands when change is
introduced. If the result does not change, then the researcher knows that it is not unique
to the way the original work was done. In this case, the result is strong and pronounced so that it is sometimes described as "robust." The result may change, of course, in which
case the investigator must study the amount and direction of that change to learn if the
"new" results point to unanticipated relationships with other variables or to a limit for
the result. Such limits are often referred to as "boundary conditions" because the ex-
tent-the boundary-of the observed effect has been located.
Keep the research loop of experimentation in mind as you read the remainder of
this chapter. As you will see, it serves as a useful guide-really, a guiding force-
behind much of the work conducted in psychology and the other behavioral sciences.
[Figure: Random sampling. A selection process (random or convenience) draws a sample from the population; random assignment then places sample members into groups.]
employees are included). Additionally, there may be some hidden biases embedded
within the list. Even if the list is alphabetical, for example, some letters may be more heavily represented on it (many last names begin with "S") than others (few last names begin with "Q" or "Z"). For most situations, this alphabetization problem is probably minor,
but the researcher must be aware of it nonetheless.
Some populations of people are made up of different sized groups or subpopula-
tions. Consider a small college with an enrollment of 8,000 undergraduates. Is it the
case that each class-first year through senior-has 2,000 students in it (i.e., 4 classes × 2,000 students each = 8,000)? Probably not-why? Simply because the size of an
entering class tends to shrink across a 4-year period, a phenomenon known in higher
education as attrition. Some students drop out, leave for unknown reasons, transfer to another school, or take time off, among a variety of other possibilities, across the standard 4 years of college (not to mention that some students take 5 years or longer to complete their degrees). A first-year class that began with 2,000 students might graduate
with, say, 1,750 after 4 years. The important implication here is that the size of each of
the four classes would differ within the college.
What does class size have to do with sampling? If you wanted to accurately assess
student opinion in all four classes, you would need to know how many people were in
each class. If one class has more students than another, you would not want to over-
sample the former (or undersample the latter). Further, you might also want to make
sure that an appropriate representation of males and females within each class was col-
lected (i.e., if a school has more men than women, are more men sampled than women?).
To accurately represent various subgroups existing within a population, researchers often use a stratified random sample. A researcher will divide a population into "strata" or subgroups, and then randomly sample an appropriate number of observations from each one. Thus, relatively more first-year students than seniors would need to be sampled because there are relatively more of them enrolled. Stratification turns out to be especially helpful when investigators are trying to ensure that minority groups are adequately represented in a sample.
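The allocation logic of a proportional stratified sample can be sketched in a few lines of Python. The class sizes below are hypothetical (the text specifies only an 8,000-student college shaped by attrition), as is the total of 400 interviews; the allocation rule itself is the standard proportional one.

```python
# A minimal sketch of proportional stratified sampling. Class sizes and the
# total sample size are assumptions for illustration.
strata_sizes = {"first-year": 2150, "sophomore": 2050, "junior": 2000, "senior": 1800}

def stratified_allocation(strata_sizes, total_n):
    """Split a total sample size across strata in proportion to stratum size."""
    population = sum(strata_sizes.values())
    return {name: round(total_n * size / population)
            for name, size in strata_sizes.items()}

allocation = stratified_allocation(strata_sizes, total_n=400)
print(allocation)
# The largest class (first-year) gets the largest share of the 400 interviews;
# a simple random sample of that size would then be drawn within each stratum.
# (With these numbers the rounded shares happen to sum to 400; in general a
# remainder rule is needed.)
```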
A third technique is called cluster sampling. Sometimes it is too expensive or time
consuming to sample an entire town or city, for example, and yet it is desirable to learn
the opinion of its inhabitants. Cluster sampling involves identifying a few smaller units
("clusters") within the population and then sampling their opinion. Imagine you were
running for city council and you wanted to get a sense of the impression your cam-
paign was creating in the minds of voters. Instead of spending time and money to ob-
tain a true random sample of voters, why not randomly select a few neighborhood
blocks throughout the city? If there were 50 neighborhood blocks, you might randomly
select five to visit. Each block would serve as a cluster within the larger population. You
and your campaign workers could then go door to door within those five blocks to learn
how your candidacy was faring. Cluster sampling is economical, quick to do, and
enables researchers to study a few observations with some depth. It may lack the pre-
cision of a true random sample, but its practical orientation renders it a useful sam-
pling tool for particular situations.
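The city-council scenario can be sketched the same way. The block labels and the figure of 40 households per block are assumptions added for illustration; only the two-step logic (randomly pick clusters, then canvass everyone within them) comes from the text.

```python
import random

# A sketch of the city-council cluster sample: 50 neighborhood blocks, of
# which 5 are randomly chosen as clusters. Block labels and the 40-household
# figure are assumptions for illustration.
random.seed(7)

blocks = [f"block-{i:02d}" for i in range(1, 51)]
clusters = random.sample(blocks, k=5)        # step 1: pick the clusters

# Step 2: canvass every household within each chosen cluster.
households_to_visit = [(block, house)
                       for block in clusters
                       for house in range(1, 41)]

print(clusters)
print(len(households_to_visit))  # 5 blocks x 40 households = 200 stops
```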
Convenience sampling was already identified as a nonrandom way to round up a
group of people who are later randomly assigned to conditions in a study. One other
nonrandom sampling technique will be mentioned here. Quota sampling entails select-
ing a number or "quota" of persons who fit some set of predetermined characteristics.
If you were interested in the opinions of students involved in Greek life on your cam-
pus, you might decide to interview four members of each fraternity and sorority.
A quota sample can be further refined along the lines of other demographic consid-
erations (e.g., each of the four members from the fraternity and the sorority would
hail from the first-year, sophomore, junior, or senior class). Quota samples are clearly
biased, but they are biased toward involving special interest groups to obtain some
information on the topic of interest.
Table 2.1 summarizes the sampling techniques we have discussed so far. You can
refer back to this table when deciding what sort of sampling procedure would be appropriate for research you undertake, or when you read a study that identifies a particular sampling procedure or one that could be improved by better sampling.
Sampling Error
Our discussion of sampling here and in chapter 1 could lead readers to believe that
proper use of randomization rules out most difficulties. This conclusion is largely true,
but it neglects one important fact about sampling and populations-even very good
samples are only relatively representative, not wholly representative, of a population. In
other words, no sample is perfect nor can it completely portray a population's charac-
teristics. Statisticians refer to the discrepancy between sample statistics and population
parameters as sampling error.
KEY TERM Sampling error is the difference existing between a sample statistic and its corresponding popu-
lation parameter.
The word "error" in this context means that our measurement is not precise. A sample
statistic is an estimate of some true score found in the population. Regardless of the
quality of a study, there will always be some degree of sampling error. The error in mea-
surement may be large or it may be small. Obviously, a researcher's goal is to minimize
that error as much as possible.
Using a randomizing procedure is a good start, of course, and giving appropriate
consideration to the size of a sample is another. Generally speaking, sampling error can
be predicted by the size of a given sample. Here is a good rule of thumb: Larger samples exhibit smaller amounts of sampling error than do smaller samples. This rule makes good
sense if you think about it for a moment. When trying to gauge public opinion, for ex-
ample, is it wiser to ask a few people or a large number? As you well know from the
countless opinion polls you have seen on television or read about in the paper, it is
better to ask a fairly large number of people. Note that you need not ask everyone-a
representative sample will do precisely because it will best approximate the public's
opinion. By the way, if you were to poll everyone you would be performing a census, a
sample that includes each and every member of a population (see Table 2.1).
Larger samples characterize a population more accurately than smaller samples.

We will not consider exactly how many people to include in a poll or an experiment at this point. For now, I would prefer that you remember that sampling error can be reduced substantially through randomization and by obtaining a reasonably large sample of observations. We will revisit the connection between size of a sample and error in a later chapter.
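The rule of thumb about sample size can be checked with a small simulation. The population below is invented (10,000 normally distributed opinion scores are an assumption, not data from the text); the point is only the pattern: the average distance between a sample mean and the population mean shrinks as samples grow.

```python
import random
import statistics

# A simulation of the rule of thumb: larger samples show smaller sampling
# error. The population of opinion scores is invented for illustration.
random.seed(42)
population = [random.gauss(50, 15) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter

def mean_abs_error(n, trials=200):
    """Average |sample mean - population mean| over many samples of size n."""
    return statistics.mean(
        abs(statistics.mean(random.sample(population, n)) - mu)
        for _ in range(trials))

small_n_error = mean_abs_error(25)
large_n_error = mean_abs_error(400)
print(f"n=25: {small_n_error:.2f}   n=400: {large_n_error:.2f}")
# The n=400 samples hug the population mean far more tightly.
```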
Knowledge Base
1. True or False: The repetitive nature of the research loop of experimentation en-
sures that knowledge advances incrementally and that established results are eval-
uated against new data.
2. True or False: A replication study maintains some aspects of the original study while
varying others or leaving them uncontrolled.
3. True or False: The goal of random assignment is to create equivalent participant
groups before an experiment begins.
4. Recruiting people who happen to be available or around is called a
a. quota sample
b. convenience sample
c. random sample
d. systematic sample
5. If you sample every 27th person in a large group, you are creating a
a. quota sample
b. convenience sample
c. random sample
d. systematic sample
6. Identifying and then sampling several smaller units within a larger population is
called a
a. quota sample
b. haphazard sample
c. cluster sample
d. systematic sample
7. True or False: If a sample is carefully drawn, there can be no sampling error; that
is, sample statistics will match corresponding population parameters.
Answers
1. True
2. False-the statement describes a conceptual replication.
3. True
4. b. Convenience sample
5. d. Systematic sample
6. c. Cluster sample
7. False-there will always be some degree of sampling error.
There is an irony in human behavior where randomness is concerned. We try to recognize ran-
domness when it operates, but end up trying to impose order to make sense of it. The irony is
that, by definition, random events have no order and make no "sense" in the way humans usually
define it.
We persist, believing that even minor chaos cries out for order, which can sometimes get us
into trouble where inference, accuracy, and understanding are concerned. Here are two promi-
nent cases where we run into trouble-exerting control we do not have and relying too much on
a judgment strategy called representativeness.
Illusory Control and Random Events. Life experiences teach us that skill and effort usually pay off, that we can be masters of our destinies. The problem is that we frequently assume
control over events that are in actuality not controllable, and there is often a random element in-
volved. Psychologist Ellen Langer coined the term illusion of control to explain what happens when
people's expectations of personal success are inappropriately higher than the objective nature of
their situation warrants. In general, this illusion operates when people insert skill-related behav-
iors into settings that have a random component. By skill-related behaviors, we refer to situations
where control is a possibility, when we compete with others, have choice, exert effort, and the like.
In one famous study, choice was shown to elicit illusory control by encouraging people to over-
look the role randomness played in the situation (Langer, 1975, study 2). Office workers were ap-
proached by a fellow worker and asked if they would like to take part in a lottery. Half of those ap-
proached were given the opportunity to choose a $1 lottery ticket (the tickets were cards with pictures
of football players), while the remaining participants were assigned a card. All participants knew
that the lottery winner would receive $50. The day before the lottery was to take place, all the of-
fice workers were asked how much money they would be willing to sell their lottery ticket to an-
other person for (all the tickets were supposedly sold but a fellow employee still wanted to buy one).
Participants who selected their own ticket wanted an average of $8.67 to resell it, but those
assigned a card only asked for an average of $1.96! Participants who had a choice were willing to
resell it for over eight times the purchase price, which can be construed as indicative of their con-
fidence of winning the lottery. Langer (1975; see also, Langer, 1983) concluded that the simple
act of making a choice induced illusory control, enabling some participants to ignore the ran-
dom information in the situation-only 1 ticket out of 50 could win and choosing (or not choos-
ing) a ticket made absolutely no difference.
Can you think of any similar instances where you have subjectively tried to control some-
thing over which objective influence was not possible? What role, if any, did randomness play?
There is a group of 100 professionals; 70 of them are engineers and 30 are lawyers. One
individual is selected at random from the group. His name is Dick. He is a 30-year-old
man. He is married with no children. A man of high ability and high motivation, he
promises to be quite successful in his field. He is well liked by his colleagues. Is Dick
an engineer or a lawyer?
Did you pick engineer or lawyer? What led to your choice? If you look over the paragraph once
more, you can see that no information that sheds light on Dick's actual profession was provided.
Yet I have no doubt that many of you used the details provided (e.g., "married," "no children," "promises to be successful") to create a mental profile of Dick, which would in turn help you make the choice based on other information you have about engineers ("low social skills," "nerdy") or lawyers ("aggressive," "there are a lot of them out there").
The only information you need, though, is that (a) the choice was made randomly and (b) there are more engineers (i.e., 70) than lawyers (i.e., 30) in the sample. Given these considerations, Dick is apt to be an engineer. Kudos to you if you got it right, but I am guessing that some of you were misled by the worthless evidence that made you downplay the role of randomness. If you had simply been told that the group of 100 was composed of 70 engineers and
30 lawyers, that 1 was selected, and you were then asked to indicate his profession, I have no
doubt you would pipe up "engineer" in a heartbeat. Due to randomness, Dick could be a lawyer,
but it is more likely he is an engineer-there are more engineers so it is more likely that an en-
gineer was randomly selected. Tversky and Kahneman (1974) note that people get it right when
no other evidence is shared, but they tend to be less confident or to make an error when worth-
less evidence draws their attention away from the role randomness plays.
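A quick simulation makes the base-rate logic concrete. Because the biographical details carry no diagnostic information, the probability that a randomly selected member is an engineer reduces to the base rate of 70 out of 100:

```python
import random

# The base-rate logic, checked by simulation: only the 70/30 composition of
# the group matters when one member is drawn at random.
random.seed(1)
group = ["engineer"] * 70 + ["lawyer"] * 30

draws = [random.choice(group) for _ in range(10_000)]
p_engineer = draws.count("engineer") / len(draws)
print(f"P(engineer) is about {p_engineer:.2f}")  # hovers near the .70 base rate
```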
How often does the representativeness heuristic lead you inferentially astray? When do you
neglect base rates and randomness by focusing on extraneous details that trigger stereotypes? We
will review other roles for randomness in this chapter and later in the book. Until then, the les-
son is clear: When you recognize randomness is at work, avoid imposing order.
An experimenter manipulates independent variables, whereas dependent variables are measured by an experimenter.

Imagine that you are a cognitive psychologist who has applied interests in human memory. Cognitive psychologists often study the selection, perception, interpretation, storage, and retrieval of information from memory, as well as decision making and problem solving. Your general interest is how people search through memory to recall learned information, and your particular interest is improving search engines on the Internet so they become more compatible with human reasoning. When people sit down
to search the Internet for information on some topic of interest, for example, how do
they retrieve appropriate search terms from memory? Some people are more
efficient at such searches than others; that is, they locate what they learned in memory
with relative ease, while others seem to struggle a bit, gradually recalling useful search
terms or helpful categories.
As a start to your research, you decide to examine how well people remember search
terms on their own versus being given a cue to stimulate their recall. In the cognitive lit-
erature, this is a distinction between free recall and recognition in memory. Free recall is
simply remembering whatever you can from some list of stimuli without regard to the or-
der in which it was learned and without any prompting. In contrast, recognition involves
being given a stimulus and then being asked if you saw it before, for example, or you could
be asked to pick out previously encountered information from a list of alternatives.
Perhaps future Internet search engines, or "web browsers," should suggest related terms to users once a term is entered. These related terms would have to be intuitively related to the original search term, yet unique enough to access different material on the Internet. As each search occurs, the search engine would need to "learn" from a user's search style so that previously located terms would not be repeated. In turn, the user
would have to adapt to the search engine's style of responding to search terms with en-
tries, websites, narrower or broader terms, and so on. Thus, computer hardware and
software would need to complement human hardware and software.
To begin your research on human-computer compatibility with search terms, you
decide to conduct a straightforward experiment illustrating memory differences
between free recall and recognition for search terms one might use on the Internet to
locate information on investing in the stock market. As a cognitive psychologist, you
would be very familiar with the voluminous literature on human memory and related
processes, as your theorizing about how people come up with search terms in memory
would be based on it. The hypothesis you intend to test is derived from the available
theory: Recognition searches of memory result in higher recall of search terms than
do free-recall searches. You would then identify, define, and describe the variables that
will be used in the experiment.
Generally, experimental research-indeed, almost any type of research-relies on
the two types of variables mentioned earlier. The independent variable is the variable
that is manipulated or allowed to vary in any experiment.
KEY TERM An independent variable is the variable that is manipulated by a researcher. In experimental
research, it must have two or more levels.
Laboratory experiments, for example, typically examine how the presence and absence
of a given variable affects people's behavior. When a variable is sometimes present and
other times absent within a study, the researcher is said to be "manipulating the
independent variable."
An independent variable, then, must have at least two levels, usually an experi-
mental treatment and a control treatment. Some study participants-usually half of
those available-are exposed to the experimental treatment while the remainder ex-
perience the control treatment. The experimental treatment represents the hypothesis
favored by the investigator.
Please be aware that an independent variable can have more than two levels. An independent variable can illustrate a range of values, for example, so it could have three levels-high, medium, and low. Alternatively, an independent variable could have four, five, six, or even more levels to it-the proverbial sky is the limit as long as the respective levels can reasonably be expected to elicit behavioral differences on the part of research participants. The researcher, too, must be sure that a more complicated independent variable, one with several levels, can be adequately manipulated in an experimental or other research context. Still, the best way to become familiar and comfortable with thinking about independent variables is to learn about their most basic form-one independent variable with two levels. Complexity in the form of several levels or even several independent variables will come later.
To continue our hypothetical cognitive example, the experimental treatment would be exposure to a recognition task subsequent to the learning phase. Half the participants could be given pairs of stock investment search terms to review, such that one member of each pair would be familiar (i.e., from the stimulus list used in the learning phase), such as "share," while the other would be novel, say, "security." The experimental treatment participants would simply indicate which term from each pair they recognized from the learning phase.
Participants receiving the control treatment, however, would not receive any recognition prompts-they would simply be asked to perform a free recall of whatever search terms they remembered. Notice that the control treatment really refers to the absence of any intervention at all, a condition that is used for comparison with the experimental treatment. Here is a key point: The researcher compares the effects of both levels of the independent variable on some outcome variable to assess whether any observed difference can be attributed to the experimental treatment.
To continue our cognitive research example, the dependent measure would be the number of search terms recalled in the experimental group versus the control group. In line with the hypothesis, the predicted outcome would be that a relatively greater number of search terms would be recalled by participants in the experimental group (i.e., those who performed the recognition task) than the control group (i.e., those who performed the free-recall task). As the researcher, you would probably examine the average number of search terms recalled by participants in the two conditions, anticipating a higher number in the experimental group than in the control group.
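As a sketch of this comparison, the recall counts below are entirely hypothetical (no such data appear in the chapter); they simply show how the two group means on the dependent measure would be computed.

```python
import statistics

# Hypothetical recall counts (number of search terms remembered); these data
# are invented for illustration and do not come from an actual study.
recognition_group = [14, 12, 15, 13, 16, 12, 14, 15]  # experimental treatment
free_recall_group = [9, 11, 8, 10, 9, 12, 10, 9]      # control treatment

mean_recognition = statistics.mean(recognition_group)
mean_free_recall = statistics.mean(free_recall_group)
print(mean_recognition, mean_free_recall)
# A higher recognition mean would be consistent with the hypothesis; whether
# the difference is reliable awaits the statistical tests covered later.
```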
Where are we in the research loop of experimentation? The identification and explication of independent variables and dependent measures occurs in step 2 (see Figure 2.1). Once the variable selection is accomplished, we then enter step 3, where the actual experiment is conducted (see Figure 2.1). This step entails recruiting participants and randomly assigning them to one of the two conditions in the experiment. As we discussed earlier, it is likely that you, the cognitive researcher, would also have to rely on a convenience sample. As you might imagine, step 3 is rather involved, as it requires staging your piece of research from start to finish. A review of the nuts-and-bolts of how to conduct an experiment is beyond the scope of this book but, if you are interested, you can consult any number of books for detailed advice (e.g., Dunn, 1999; Martin, 1996; Rosnow & Rosenthal, 1996; Shaughnessy & Zechmeister, 1997).
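The random assignment in step 3 can be sketched as a shuffle-and-split, here with a hypothetical convenience sample of 20 participant IDs (the IDs and the group size are assumptions for illustration):

```python
import random

# Random assignment as a shuffle-and-split: each member of the (convenience)
# sample has an equal chance of landing in either condition.
random.seed(3)
participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs

random.shuffle(participants)
half = len(participants) // 2
recognition_condition = participants[:half]   # experimental treatment
free_recall_condition = participants[half:]   # control treatment

print(recognition_condition)
print(free_recall_condition)
```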
We now turn back to step 4 in the research loop of experimentation, which is concerned with interpreting the results from the memory experiment. The two averages-one representing the recognition group, the other the free-recall group-would be compared with one another to see if the former was reliably larger than the latter. It is at this point in the research that statistical analysis comes into play, when the researcher tests the hypothesis by determining if the anticipated relationship is confirmed by the empirical data (i.e., a recognition task leads to higher levels of recall for search terms
than no recognition task). Later in the text we will discuss the specific statistical tests
that would be best suited to analyze the data collected in this hypothetical study. For
now, concentrate on the fact that both the independent variable and the dependent
measure have a role to play where statistical analyses are concerned. The independent
variable is often used as what is called a "blocking" variable in the analyses, as here when
the two discrete groups-recognition versus free recall-were presumed to exhibit
different behavior. The dependent measure is important, too, because it serves as
tangible behavioral evidence that something happened. Here, of course, the behavioral
difference between the two groups suggests that recognition facilitates recall.
You should also focus on the fact that the statistical analyses occur after the data
are collected, though the investigator would have determined the statistical test long be-
fore the first datum would be collected. That's right-statistical analyses are such an in-
tegral part of the research process that they must be planned well in advance of the ac-
tual research. If the analyses are not planned in advance, it is entirely possible to collect
the wrong sort of data. In other words, one can inadvertently collect data that cannot
be analyzed. The design of an experiment or a study, for instance, can necessitate the
use of a particular statistical test or data analytic technique, but if the wrong type of
data were collected, then no test can be performed on them. This problem is actually
more common than you might guess, and we will discuss specific ways to avoid it by
proper planning in chapter 15 (see also, Appendix D). In the meantime, we need to
consider what sort of data can comprise a dependent measure.
old are you?"). Surveys, standardized personality measures, essays, rating scales, attitude
scales, mood instruments-almost anything that has a verbal component-can be la-
beled a self-report measure. The majority of self-report data are based on paper-and-
pencil measures, where respondents write, circle, or check responses corresponding to
their thoughts or feelings. Naturally, self-report data can also be drawn from videotapes,
tape recordings, interviews, and even phone calls, though in psychological research the
term most often refers to participants' own written comments.
There is a drawback to self-reports, however, in that it is very difficult, if not impossible, to verify their accuracy. If I ask someone why she acted a certain way, how will I know that she is telling me the truth? Moreover, how does she know what prompted her action? If you find these questions odd, then you will be surprised to learn that many psychologists are deeply concerned about linking what people say with what they actually do. Research evidence actually promotes the view that we often do not know why we act the way we do, rendering our self-reports and introspection suspect (e.g., Nisbett & Wilson, 1977; Wilson, 1994; Wilson, Dunn, Kraft, & Lisle, 1989; but see Ericsson & Simon, 1993). Although self-report measures are very useful, even integral to behavioral science research, a good research strategy is to bolster their effectiveness by simultaneously measuring related behavioral variables. What people say can then be compared to what they do.
To examine the link (if any) between people's words and their deeds, self-report measures should be accompanied by behavioral measures.

Regarding the memory experiment, I hope that it is clear why search terms recalled or recognized do not constitute what is normally considered a self-report measure. The reason is that neither recall nor recognition of terms requires participants to share any thoughts or feelings-the memory measure, then, represents a behavior rather than a response driven by attitude or opinion.

In contrast to the public nature of behavioral and self-report measures, physiological measures are markers of much more private, internal psychological states. Common
physiological measures in psychological research include pupil dilation, blood pressure, heart rate, and galvanic skin response, an indicator of electrodermal activity. Please notice that each of these measures provides indirect evidence for an individual's psychological reactions. Why is the evidence indirect? Imagine that you were interested in studying how people can become physiologically aroused when they watch an exciting clip of film, say, a downhill ski run taken from the perspective of the skier. Instead of asking people how they felt about the film or watching their facial expressions-self-report and behavioral indicators, respectively-you could measure their heart rate, respiration levels, perspiration, and so on to study their arousal reactions.
The one problem with physiological measures is that they are often not easy to
interpret; that is, despite their emotional differences, both fear and excitement tend
to result in elevated heart rate, rapid breathing, and heightened perspiration. An in-
vestigator must develop a coherent, logical account of why particular physiological
changes are caused by one stimulus and not another. Thus, physiological measures are
still somewhat controversial, though they are becoming increasingly common in psycho-
logical research.
The fourth and final class of dependent measures is called behavioroid measures.
Aronson and colleagues (1990), who wanted a way to describe situations where re-
search participants provide future-oriented responses, coined the term behavioroid.
In some studies, for example, participants are asked to volunteer to perform some
activity in the future-devoting time to community service or visiting patients in a
nursing home. The participants never actually perform the activities, of course,
but their responses are treated in an "as if they did" manner. Researchers who use
behavioroid measures are interested in studying participant commitment to some
possible future event, not whether the event actually occurs. Really, anytime a
60 Chapter 2 Process of Research in Psychology and Related Fields
participant in a piece of research is asked to think about and react to some hypo-
thetical event (e.g., "What is the first thing you would do if you won the lottery?"),
a behavioroid measure is being used.
Table 2.2 summarizes the four classes of dependent measures found in psycholog-
ical research. Be sure to examine the representative dependent measures listed within
each class and try to remember some of them. As you learn statistical analyses in
subsequent chapters, they will be easier to conceptualize if you recall dependent mea-
sures that can serve as examples.
Variable Distinctions: Simple, Sublime, and All Too Easily Forgotten

After years of teaching, I am convinced of one thing: One of the easiest concepts to forget is the
distinction between independent variables and dependent measures. I am so convinced of this
fact-particularly after having graded hundreds of exams and papers demonstrating the error-
that I created this small Data Box to call your attention to the problem. You are probably not con-
vinced of this fact, and I am guessing that more than a few readers believe they know the distinc-
tion cold. But why risk missing needless points on an exam or paper? Take a moment and review
the distinction between these two critically important variable types one more time and ensure
your future success.
Commit these facts to memory-I have tried to make them mnemonically meaningful:
Please understand that it is not necessary for the same researcher to continue a par-
ticular line of research using the research loop. Another investigator somewhere else
may learn of the results and become interested in continuing the work or examining
some empirical variation of it (recall the earlier discussion of replication and concep-
tual replication studies). Thus, step 5 is not an end but really only another beginning
for researchers who use the research loop of experimentation.
experiment progresses forward only as rival hypotheses are tested against established
wisdom; old data must sometimes yield to new.
To put it more bluntly still, we will never know the whole truth of human behavior,
but each careful, systematic piece of research and its accompanying statistical analyses
move us closer to it than we would otherwise be. We need to study how independent
variable X caused a verifiable change in dependent measure Y-and to be sure, statis-
tics and data analysis will help us do this-but we must also keep in mind that deter-
Operational Definitions in Behavioral Research 63
literature, you would be amazed by the variety of operational definitions and the
ingenuity of the researchers in creating them.
Knowledge Base
1. The _____ is manipulated, whereas the _____ is measured.
a. dependent measure
b. independent variable
2. An independent variable must have at least _____ levels.
a. 1
b. 2
c. 3 or more
d. 6
3. Give one example of each of the four types of dependent measures: behavioral,
self-report, physiological, and behavioroid.
4. You are a psychologist who is interested in aggression. Provide a descriptive defini-
tion and an operational definition for this concept.
5. Identify which of the following is likely to be an independent variable or a depen-
dent measure.
a. varied temperature
b. ratings of personal attraction
c. bright and dim lighting
d. recognition test
e. recall test
f. hearing or reading a prepared speech.
Reliability and Validity 65
Answers
1. b. independent variable; a. dependent measure
2. b. 2
3. See Table 2.2 for examples
4. Examples: descriptive definition-aggression is the intent to harm another person. Opera-
tional definition-number of hostile comments made within an experiment.
5. Independent variable: a, c, f; dependent measure: b, d, e.
6. True
KEY TERM A hypothetical construct is an image, an idea, or theory used to organize hypotheses and data.
Hypothetical constructs enable researchers to speculate about the processes underlying, even
causing, thought and behavior.
Hypothetical constructs cannot be seen, nor can you reach out and touch one. Their
power is not physical but rhetorical-hypothetical constructs help researchers
to create persuasive accounts of how variables appear to behave or influence one
another.
You have lots of attitudes or opinions, for example, but you cannot show them to
anyone. Rather, your thoughts, words, and deeds provide indirect evidence for your
attitudes. If your political attitudes are liberal, then you probably vote for Democratic
candidates, donate to left-wing causes, regularly read The New York Times in lieu of
more conservative newspapers, and are unlikely to sport a "Rush is Right" bumper-
sticker on your car. You may also speak up about traditionally liberal causes, such as
the environment, affirmative action, and the women's movement. Whether the exam-
ple is behavioral or verbal, though, it is clearly the case that we are not "seeing" your
actual liberal attitude-we are simply encountering aspects of what you say and do that
suggest or strongly imply that you harbor liberal tendencies.
Do you see the subtle problem? Hypothetical constructs have no reality per se, but
their presence is essential to theory development and the testing of hypotheses. If you
want to predict which candidate is likely to be elected in a local or national election,
for example, some knowledge of his or her political attitudes is critical-even if that
attitude is only known imperfectly and indirectly. Similarly, I may believe that you have
high self-esteem if you exude confidence, appear articulate, give a firm handshake, and
look me in the eye when you speak to me. Can I actually see your high self-esteem? No,
I only see traces of it via the (potential) effects it has on the way you present yourself.
I could try to envision your self-esteem in a different way, however, by asking you to
complete a standardized psychological instrument designed to measure self-esteem by
reporting a score. Your score could then be compared to the known range of scores, as
well as the average, so that I could determine if you do, indeed, have high self-
esteem. Such test scores are proxy measures, close substitutes, for the actual but hypo-
thetical level of an individual's self-esteem.
As you can no doubt appreciate, hypothetical constructs are integral to our the-
ories and hypotheses-they are literally everywhere in the behavioral sciences-
so that even if we have difficulty establishing their actual existence, we must try to
verify their influence on behavior. Fortunately, psychologists and other behavioral
scientists have focused on ways to carefully measure hypothetical constructs and to
then provide supporting evidence for the measurements. Two main questions are
relevant to the measurement of any psychological construct: Is the measurement of
the construct (or variable thought to represent it) reliable? Is the construct's meas-
ure (or the variable serving as the surrogate) valid? We will review each question and
its implications in detail.
Reliability
In everyday use, the word reliable corresponds to "trustworthy" or possibly "faithful"
or even "true." In research terms, the word reliability has a more precise and circum-
scribed meaning: A reliable measure is one that is stable or consistent across time. That
is, all else being equal, a reliable measure is anticipated to give the same measurement
of the same phenomenon each and every time it is used.
KEY TERM Reliability refers to a measure's stability or consistency across time. If used on the same stimu-
lus, a reliable measure gives approximately the same result each time it is used.
An instrument-a survey, a personality questionnaire, a thermometer, a bathroom
scale-is deemed reliable if it consistently gives the same answer, score, reading, weight,
or result when the same person, object, or construct is measured on two or more oc-
casions. Unless you have had an especially happy series of events in your life (or a num-
ber of crushing personal defeats), your score on a reliable self-esteem scale should not
vary more than a few points from some original score each time you take it.
Reliability = stability = consistency.

Given the earlier discussion of sampling error, though, we would not expect you
to get the exact same score on the self-esteem measure each time. There is presumably
a true score reflecting your self-esteem, but any given administration of a self-esteem
measure-yielding what is called an observed score-will not necessarily capture it. If
you took the same self-esteem measure, say, six times over a 3-year period, it is likely
that the true score would fall somewhere among the six observed scores. If the self-
esteem measure is a particularly reliable one, then the difference among the six scores
is apt to be small (i.e., there is a low degree of measurement error). On the other hand,
if the measure is not reliable (or you have recently experienced dramatic ups or downs
in your life) then there is apt to be a relatively large amount of measurement error. Not
only would the six observed scores vary greatly from one another, they would presuma-
bly differ rather substantially from the true (but unknown) score, as well. Let me reit-
erate the main point here one more time: Any measure is apt to demonstrate some
measurement error or "drift" between true scores and observed scores-it is simply the
case that a reliable measure shows less error or drift.
Thus far, our discussion of reliability has been conceptual. In actuality, reliability
is very much a statistical matter. Countless standardized tests and personality inven-
tories used in laboratories, clinics, classrooms, and courtrooms contain information
regarding what are called reliability coefficients. A reliability coefficient is a numeri-
cal index of how stable or consistent the scores on a measure are across two or more
administrations. Reliability coefficients are based on correlation, a method of mea-
suring the association between two or more variables. Although we will introduce cor-
relational research in this chapter, correlation is a basic and important statistical tech-
nique that we will need to spend sufficient time exploring later in this book. There are
also several different types of reliability coefficients and we will discuss them, and the
concept of correlation more broadly, in chapter 6. For the present, I recommend that
you focus on the conceptual understanding of reliability as useful and desirable to any
measure used in behavioral science research. We will also review the notion of relia-
bility and its relation to sampling in chapter 9, when we discuss what are called sam-
pling distributions.
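The idea that a reliability coefficient is simply a correlation between two administrations of the same measure can be sketched with simulated data. This is a minimal illustration, not from the text: the function names, the sample of 50 hypothetical people, and the normal-error model for observed scores are all assumptions.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient (introduced formally in chapter 6)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)

# Hypothetical true self-esteem scores for 50 people.
true_scores = [random.gauss(50, 10) for _ in range(50)]

def administer(error_sd):
    """One administration: observed score = true score + measurement error."""
    return [t + random.gauss(0, error_sd) for t in true_scores]

# A reliable measure drifts little from the true scores across administrations...
r_reliable = pearson_r(administer(2), administer(2))

# ...while an unreliable measure carries much more measurement error, so its
# test-retest reliability coefficient is far lower.
r_unreliable = pearson_r(administer(20), administer(20))
```

Here the coefficient for the low-error measure lands close to +1.00, while the high-error measure's coefficient is dragged toward zero, mirroring the "drift" between true and observed scores discussed above.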
We have examined why the measurement of a variable should be consistent or
stable across time. We now turn to the necessity of demonstrating that a research mea-
sure is a valid one.
Validity
The intuitive meaning of the term validity-sound, just, or even well-founded-is close
to its definition in research contexts: Does a measure actually measure what it is sup-
posed to measure? The last phrase may seem like quite a mouthful, but this definition
for validity gets right to the heart of the research enterprise.
KEY TERM Validity is the degree to which an observation or a measurement corresponds to the construct
that was supposed to be observed or measured.
KEY TERM Construct validity examines how well a variable's operational definition reflects the actual nature
and meaning of the theoretical variable.
Consider a research topic such as intelligence, which can be defined as the capacity to
learn from experience and the ability to adapt to one's environment (e.g., Sternberg &
Detterman, 1986). If a researcher were studying intelligence, then the intelligence test
she chose to use would serve as the operational definition of intelligence. The test's
construct validity can be determined by whether and how well it actually measures the
theoretical construct commonly referred to as "intelligence."
Validity is present when a hypothetical construct of interest is actually being
observed or measured.

Pause for a moment and think about exactly what sort of information you believe
could reasonably be placed on a test designed to measure an individual's intelligence. I
think you will agree that it is by no means easy to come up with indicators of intelli-
gence that are fair, describe large numbers of people, differentiate among people's dif-
ferent levels of intelligence, and can be agreed on by psychologists and educators who
concern themselves with the study of intelligence. In fact, defining intelligence de-
scriptively and operationally, and then verifying its construct validity, is no small task,
as it can involve verbal comprehension, math skills, pattern analysis, and memory (e.g.,
words, digits, objects). Besides performing the empirical side of research, then, investi-
gators have their conceptual work cut out for themselves, as well.
Let's turn to reviewing the several different approaches to establishing the valid-
ity of a measure. The most basic form of validity is called face validity. When a mea-
sure is said to have face validity, it means that after a superficial analysis, it appears to
be measuring what it set out to measure. To continue our intelligence example, any
number of general knowledge tests would probably serve as reasonable evidence for a
measure's face validity. Thus, a series of questions examining basic math and verbal
skills would be appropriate, but more esoteric questions concerning the art of the high
Renaissance or the poems of Octavio Paz might be out of place (i.e., too few people
could answer them). Yet some people would regard esoteric knowledge as more
representative of "intelligence" than skill at answering more basic, even commonsensi-
cal, sorts of questions. As you can see, face validity is a start, but nothing prevents dif-
ferent researchers (or observers like us) from offering different claims about what is or
is not a sign of intelligence.
Convergent validity reduces the difficulty posed by competing opinion or inter-
pretation by focusing on comparing a measure with other related measures and vari-
ables. Responses to a novel measure for intelligence would be compared to people's re-
sponses on existing-previously validated-measures of intelligence. In other words,
the new and existing measures would "converge" on the construct we refer to as intel-
ligence. If the new measure really did tap into the construct, then the researcher would
expect to find that it was at least moderately related to the existing measures of intel-
ligence. To be safe, the researcher would probably also want to see how responses on
the measures fared in comparison to responses on related constructs, such as problem
solving and creativity. Note that problem solving and creativity are related to the con-
struct of-but are not the same as-intelligence. In fact, the strength of the positive re-
lationships among problem solving, creativity, and intelligence should be much lower
than the patterns among the new and existing measures of the construct.
Most researchers also want to show that a new measure of a construct is not related
to particular constructs or variables. In terms of theory and research practice, some mea-
sures should be specifically predicted to be unrelated to a new measure. This form of va-
lidity is called discriminant validity because a new measure should discriminate-dif-
ferentiate or note differences-between itself and other constructs or variables.
After all, what use would a measure be if all other constructs were somehow related to
it? Thus, our novel measure of intelligence would probably not be related to measures
of aggression, sociability, risk-taking, or depression, and its level of association would
presumably be low on a positive scale, close to zero, or perhaps even negative.
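In practice, convergent and discriminant validity show up as a pattern of correlations. The sketch below is entirely hypothetical: scores are simulated so that a new measure shares a latent factor with an established test (convergent, high correlation), overlaps only partly with a related construct such as creativity (moderate correlation), and is independent of an unrelated construct such as sociability (discriminant, near-zero correlation).

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) ** 0.5 *
                  sum((y - my) ** 2 for y in ys) ** 0.5)

random.seed(7)
n = 200

# A latent "intelligence" factor shared by the related measures (made-up data).
g = [random.gauss(0, 1) for _ in range(n)]

new_measure = [gi + random.gauss(0, 0.4) for gi in g]      # the novel test
established = [gi + random.gauss(0, 0.4) for gi in g]      # a validated test
creativity  = [0.5 * gi + random.gauss(0, 1) for gi in g]  # related construct
sociability = [random.gauss(0, 1) for _ in range(n)]       # unrelated construct

r_convergent   = pearson_r(new_measure, established)   # should be high
r_related      = pearson_r(new_measure, creativity)    # moderate
r_discriminant = pearson_r(new_measure, sociability)   # near zero
```

The resulting ordering, high convergent correlation, weaker correlation with the related construct, and near-zero correlation with the unrelated one, is exactly the pattern a researcher validating a new measure hopes to see.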
Finally, there are two types of validity that are particularly relevant to experimen-
tation, though they do apply in varying degrees to any research effort. Internal valid-
ity is defined as the unambiguous effect of some independent variable on a dependent
measure. That is, the causal relationship between what is manipulated in the experi-
ment and its outcome is clear and free of competing interpretations. Internal validity
addresses the "inside" of a piece of research while the second type, external validity,
focuses on the "outside" implications of the research (Dunn, 1999). Specifically, exter-
nal validity addresses the question of whether research results can generalize to other
people, at other times, and in other places. This generalizability is of greatest concern
for research that addresses "real-world" issues, such as the search for solutions to social
problems, educational reform, and the psychological study of health and well-
being. To be considered scientific and therefore useful for expanding knowledge, a study
must have internal validity, but it need not possess much in the way of external valid-
ity (for more detail on this issue, see Mook, 1983).
A summary of these various approaches to validity can be found in Table 2.4. Simi-
lar to reliability, illustrating the presence or absence of validity within a piece of re-
search relies on some statistical evidence. In general, this evidence entails demonstrat-
ing how a new measure is statistically associated (or not) with other measures-does it
converge (i.e., associate highly and positively) or discriminate (i.e., have low or no as-
sociation), for example. At this point, the conceptual background of validity is impor-
tant for you to understand and use as you read the final section of this chapter, which
deals with research designs. The statistical basis for validity will be visited later in the
text in a variety of ways.
Knowledge Base
1. True or False: Although they are not real in any physical sense, hypothetical con-
structs allow researchers to build theories and test hypotheses.
2. You complete a personality measure once and then take it again 6 months later.
The two scores are approximately equal to one another, indicating that the mea-
sure has a high degree of _____.
a. Discriminant validity
b. Reliability
c. Convergent validity
d. Internal validity
3. _____ occurs when, as predicted, related measures are positively associated with
a new measure or variable.
a. Discriminant validity
b. Reliability
c. Convergent validity
d. Internal validity
4. In order to demonstrate _____, some variables should not be associated with
a new measure or variable.
a. Discriminant validity
b. Reliability
c. Face validity
d. External validity
5. Although _____ is desirable in an experiment, _____ determines
whether the results are viewed as scientifically worthy.
a. Face validity
b. Internal validity
c. Reliability
d. External validity
Answers
1. True
2. b. Reliability
3. c. Convergent validity
4. a. Discriminant validity
5. d. External validity; b. internal validity
• Research Designs
The questions of interest to psychologists and other behavioral scientists are asked
within the context of a research design.
KEY TERM A research design is an organized collection of procedures used by researchers to collect be-
havioral data.
There are numerous research designs available, any one of which can be used for a dis-
tinct purpose or to answer a specific question about behavior. For the sake of clarity
and to foster links between design and data analysis, we will define three main cate-
gories of research design. These three categories are correlational research, experiments,
and quasi-experiments. These research designs are quantitative in nature; that is, they
lend themselves to and make considerable use of statistical analyses. Qualitative designs
tend not to rely on statistics, but they are very useful for examining behavioral phe-
nomena that are not measurable in the traditional sense of the word (see Appendix F).
Although we will discuss the basic features of each category of quantitative design,
please be advised that by their very nature, research designs are tailored to the partic-
ular situation of interest to researchers-presenting them in a generic fashion does not
do justice to their richness or applicability.
Correlational Research
Many research questions begin when an investigator is not sure how two or more
variables relate to one another. Do they "go together" or, as psychologists and sta-
tisticians are wont to say, covary with one another? When variables covary with one
another in some discernable pattern, they are described as "correlated." Correlational
designs examine the pattern of association among variables, and such patterns are
useful for planning future experiments or discovering unknown relations among
variables.
KEY TERM A correlational design is used to discover predictive relationships and the degree of association
among variables.
The reason investigators study the covariation between (or among) variables is to es-
tablish predictive relationships. If grade point average (GPA) and debating skill were
found to go together, then the admissions teams of law schools might be able to pre-
dict which applicants would make good lawyers. Thus, skilled speakers-those who in-
terview well, for instance-would presumably have good grades, and applicants with
high GPAs would presumably be able debaters.
A correlation can be positive, negative, or zero, ranging in value from -1.00 to +1.00.

This pattern of association between speaking skills and GPA is described as a
positive correlation. A positive correlation occurs between two variables when, as the
value of one variable increases or decreases, the other variable behaves in a similar
manner. In contrast, a negative correlation occurs when the value of one variable in-
creases (e.g., GPA) as the other decreases (e.g., time spent socializing). Perhaps people
Research Designs 71
with higher grades socialize less than individuals with lower grades. No correlation,
sometimes called a zero correlation, means that there is no clear pattern between two
variables; speaking skills, socializing, and GPA do not covary-correlate-with one
another in any discernable fashion.
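These patterns can be computed directly from a tiny, made-up data set. The Pearson coefficient used below is formally introduced in chapter 6, and the five students' GPA, debating, and socializing scores are illustrative values only, not data from any study.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: always ranges from -1.00 to +1.00."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) ** 0.5 *
                  sum((y - my) ** 2 for y in ys) ** 0.5)

# Hypothetical data for five students (illustrative values only).
gpa          = [2.0, 2.5, 3.0, 3.5, 4.0]
debate_skill = [3.0, 4.0, 6.0, 7.0, 9.0]   # rises along with GPA
social_hours = [30, 28, 20, 15, 10]        # falls as GPA rises

r_positive = pearson_r(gpa, debate_skill)  # near +1: positive correlation
r_negative = pearson_r(gpa, social_hours)  # near -1: negative correlation
```

Because debating skill rises with GPA, the first coefficient is strongly positive; because socializing hours fall as GPA rises, the second is strongly negative. A zero correlation would produce a coefficient near 0. Note that none of these numbers says anything about what causes what.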
Do you see any problems with the interpretations offered for these covarying
variables? That is, are the positive or negative associations trustworthy or merely sug-
gestive? To be sure, some skilled speakers do have high GPAs, but many do not, just
as only some persons with high GPAs will be skilled speakers. Some sociable people,
too, have very high grades despite the fact that they burn the candle at both ends
(I'm guessing you know one or two people like this). The fundamental dictum re-
garding correlational research explains these counterexamples: Correlation does not
imply causation. The reason that causality is not implicated in correlational research
is because the variables involved are measured and the associations among them are
identified, but no manipulation of some variables to test their effect on others oc-
curs. Thus, variable X can cause a change in Y, Y can create change in X, or both X
and Y can be affected by some unknown variable Z (see chapter 6). In fact, some
unknown variables P, Q, and R could be the source of the relationship between X
and Y! Correlation does not, and cannot, identify the causal relationships (if any) among
some group of variables.
Figure 2.3 illustrates how two variables can be related to one another. A random
or nonrandom (e.g., convenience) sample is drawn from some population. Sample
participants react to variables-some independent, some dependent-some of which
are correlated with one another. As you can see in Figure 2.3, the precise direction
of association is unknown because the causal connections between (or among) the vari-
ables are not identified.
Where correlational research is concerned, then, researchers know association and
not the causal ordering among the variables unless some are manipulated while others
are measured. Learn the dictum now-correlation does not imply causation-so that
it will serve you well in the future. If you follow it, you can be sure that you will not
draw erroneous conclusions about behavior until you have causal, experimental evi-
dence to support your speculations.
[Figure 2.3: A random or nonrandom (convenience) sample is drawn from a population; some association is found between two variables. The correlation between two variables can be positive, negative, or zero, and other (unknown) variables may underlie the pattern.]
Experiments
We discussed the basic framework of experimental designs earlier in the chapter. Some
hypothesis derived from a theory is used to test the effects of manipulating two or more
levels of an independent variable on some dependent measure. Participants are either
randomly selected from a population before being placed in groups or they are ran-
domly assigned to one of several experimental conditions (i.e., levels of the indepen-
dent variable). The goal of experimentation is to specify a causal relationship based on
intervening in some situation-Does the predicted change occur in the outcome (de-
pendent) variable? You will recall the hypothetical research on memory for Internet
search terms that illustrated these concepts in action.
One main methodological aspect of experimentation remains to be introduced,
however. Even when all potentially influential variables are controlled, held constant,
or selectively manipulated by a researcher, the experiment still may not represent a
strong and clear test of the hypothesis. Why? Simply because one or more confounded
variables may be present. The confounding of variables occurs when the influence of
the independent variable becomes entangled with the effects of some other variable,
one not under the control of the researcher. The dependent measure, in turn, is affected
by this uncontrolled variable.
KEY TERM A confounded variable is an uncontrolled variable that unknowingly but systematically varies with
the independent variable, thereby preventing a clear interpretation of cause and effect between the
independent variable and the dependent measure.
The chief concern about confounds is that they can systematically bias the results
of any study. Of course, it is also true that it is impossible to think of or control for
every possible variable that could affect the outcome and interpretation of a study. In
fact, one of the reasons investigators run multiple studies to examine some behavioral
phenomenon is to rule out potential confounds along the way. Such efforts are not only
necessary where confounds are concerned, but they also enhance the reliability and
validity of obtained results. Researchers, then, must do the best they can to identify and
isolate the most probable confounds in their work.
Are there any confounded variables in the memory for Internet search terms ex-
periment? Well, as the investigator, you might want to be concerned about the previ-
ous computer experiences of the research participants. Participants who have had a lot
of computer experience, especially Internet experience, could systematically bias the re-
sults-their familiarity with Internet searching, for instance, gives them a distinct ad-
vantage over novice users. It is probably difficult to find many people who have not had
some exposure to computers-the technology is now ubiquitous in homes, schools, and
offices-but it may be possible to locate some individuals who have not had much ex-
perience searching the Internet. To avoid this confounded variable-prior computer ex-
perience-your primary concern should be to make certain that either all participants
have little prior experience with the Internet or that those who have had considerable
experience are dropped from the study. In a future study, of course, you might want to
specifically test whether such prior experience makes any difference in retaining search
terms-but that is for another time.
Table 2.5 illustrates a model of the standard experiment. The sequence of events is
presented on the left side of the table, and whether the experimental and control groups
receive identical or different information is noted on the right. The discussion in this
chapter, as well as Table 2.5, assumes a very basic experiment involving random as-
signment to one of two participant groups, manipulation of one independent variable,
and measurement of one dependent measure. As you can see in Table 2.5, the partici-
pant groups receive different levels of the independent variable but the identical de-
pendent measure so that any resulting differences can be assessed. Assessment of these
differences (if any) relies on statistics and data analysis.
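The sequence summarized in Table 2.5 can be sketched in a few lines. Everything below is assumed for illustration: the sample size, the recall task, and the roughly three-term treatment effect are invented, and the simple difference in group means stands in for the formal statistical tests introduced later in the text.

```python
import random
import statistics

random.seed(42)

# Forty hypothetical participant IDs.
participants = list(range(40))

# Random assignment: shuffle the pool, then split it into two groups.
random.shuffle(participants)
experimental, control = participants[:20], participants[20:]

def recall_score(treated):
    """Simulated dependent measure: number of search terms recalled.
    The treatment effect (about 3 extra terms) is an assumption for the demo."""
    effect = 3.0 if treated else 0.0
    return random.gauss(10 + effect, 2)

# Each group receives a different level of the independent variable,
# but the identical dependent measure.
exp_scores = [recall_score(True) for _ in experimental]
ctl_scores = [recall_score(False) for _ in control]

# Assessment of the between-group difference.
difference = statistics.mean(exp_scores) - statistics.mean(ctl_scores)
```

The shuffle-and-split step is what makes this a true experiment: chance, not participant characteristics, decides who receives which level of the independent variable, so a clear between-group difference can be attributed to the manipulation.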
The basic model of the two-group experiment is graphically illustrated in
Figure 2.4. Random assignment must be used to create two or more groups (prior
[Figure 2.4: Random or convenience selection draws a sample from the population; random assignment forms the participant groups; the independent variable is introduced; the dependent measure is collected; the between-group difference is then assessed.]
Quasi-experiments
Cook and Campbell (1979) coined the term "quasi-experiments" to apply to those re-
search investigations that lacked one or more of the characteristics that denote a true
experiment (e.g., randomization, manipulation of an independent variable, presence
of a control group). Thus, a quasi-experiment is akin to an experiment, except for the
fact that the actual causal dynamics underlying the research question remain some-
what obscure.
KEY TERM A quasi-experiment is a research design resembling an experiment, but it lacks one or more key
features heightening experimental control. Quasi-experiments enable investigators to approximate
but not delineate causal effects.
Quasi-experiments are used to study those situations that do not lend themselves
to experimentation. The most common reason a quasi-experiment is employed is due
to the inability to randomly assign participants to some treatment condition or group
corresponding to a level of an independent variable. Random assignment is not possi-
ble, for example, when a researcher wants to study the behavior of some intact group
of persons who have experienced something unique. An investigator cannot randomly
assign people to live along the San Andreas fault in California to assess their reactions
to earthquakes, for example. Instead, investigators must be content with drawing con-
clusions about the experiences of those who choose to live there, even if that experi-
ence is "contaminated" by or confounded with many other influential variables (e.g.,
Pennebaker & Harber, 1993).
Use of a quasi-experimental design necessitates demonstrating that between-group
differences are due to some treatment, a naturally occurring event (e.g., an earthquake),
or something about the research participants themselves. Many times an adequate con-
trol or comparison group is not even available. For example, should you compare the
responses of earthquake survivors with those of people who do not live in earthquake
zones, or is it better to compare them with the survivors of other natural (e.g., floods,
fires) or man-made (e.g., shipwrecks, plane crashes) disasters? Is it even necessary to
have a control group (see Data Box 2.D)? Questions like these-and the concerns they
raise-must be duly considered by researchers before any firm conclusions about the
results of a quasi-experiment can be drawn.
Research Designs 75

Data Box 2.D
Quasi-experimentation in Action: What to Do Without Random Assignment or a Control Group
What recourse do researchers have when interesting behavioral phenomena occur in less than
ideal research circumstances? They adapt by adopting one of many quasi-experimental ap-
proaches. We will briefly consider two excellent examples of quasi-experimental work that lacked
random assignment and traditional control groups.
Baum, Gatchel, and Schaeffer (1983) studied stress reactions in the community surround-
ing Three Mile Island (TMI) after the nuclear accident that took place there back in the late 1970s.
Clearly, this traumatic event had obvious societal implications for how people cope with sudden
but unseen dangers-dangers with long-term consequences (e.g., cancer). A quasi-experimental
approach was selected because few variables could be controlled, replication was neither desir-
able nor feasible, and the event could never be adequately simulated in a laboratory setting. For
comparing stress reactions, the researchers used three comparison groups in lieu of a single con-
trol group. A comparison group is composed of individuals who approximate the characteristics
found in the treatment group, but they are not usually randomly assigned nor can we be sure
they were drawn from the same theoretical population.
Besides interviewing people living near TMI, Baum et al. (1983) also spoke to individuals
dwelling near a functional nuclear power plant and a coal-powered energy facility, as well as a
group who lived 20 miles from any power plant. The responses of the participants from the three
comparison groups helped the investigators rule out alternative explanations for the stress reac-
tions observed among the people living near the TMI facility. Over a year after the accident, TMI
residents reported more physical symptoms, anxiety, depression, and social alienation than the
members of the three comparison groups, who did not differ from one another on these psy-
chosocial indicators.
In contrast to the work of Baum and his colleagues, the second study represents quasi-
experimentation on a much smaller scale. Despite the lack of random assignment or any con-
trol (comparison) group, the psychosocial implications of this second study are as profound
as the first. Pennebaker, Barger, and Tiebout (1989) conducted a quasi-experiment on the ef-
fects of disclosing long past traumatic experiences on health, predicting that higher levels of
disclosure regarding the trauma would yield better health several months later. They filmed
33 Holocaust survivors who described their lives and fortunes during World War II, care-
fully monitoring various physiological measures as the survivors spoke on camera. Over a
year later, Pennebaker and colleagues found that higher levels of disclosure were positively
correlated with health indicators-sharing painful memories seemed to enhance health and
well-being.
Strong conclusions about such correlational data are precluded, of course, but should we
raise the lack of a control group as a concern? Probably not-what experience could rival the
horror of the Holocaust, the murder of millions of European Jews at the hands of the Nazis in
World War II? Of what use-scientific or otherwise-would there be in comparing the night-
mare tales of survivors to the experiences of some arbitrary comparison group? Meaningful com-
parison groups sometimes elude us, and in this case, perhaps, we should be grateful. Many of
Pennebaker et al.'s (1989) participants benefited from their testimonials, and the findings were
consistent with other, more experimental evidence (e.g., Pennebaker, 1989). Lack of random as-
signment or control groups in quasi-experimental design, then, is often offset by the rich, hu-
mane quality these studies bring to the scientific enterprise.
76 Chapter 2 Process of Research in Psychology and Related Fields
[Figure 2.5: One possible quasi-experimental design. A sample or group of interest is drawn (nonrandomly) from one population; an event or treatment of interest occurs either before or after recruitment. A comparison group is randomly selected from a second population. Both groups complete the dependent measure, followed by assessment of the between-group difference.]
The review of specific quasi-experimental designs is beyond the scope of this chap-
ter (but see the two examples in Data Box 2.D). Figure 2.5 illustrates but one possible
quasi-experimental design. As you can see, the intact group of interest was drawn (non-
randomly) from one population and the comparison group from another. An event or
treatment could occur to the group of interest either before or after recruitment (see
Figure 2.5). Naturally, the comparison group has no similar experience; indeed, the only
common experience the two groups share is the presentation of the dependent mea-
sure and the subsequent assessment of between group differences at the study's con-
clusion (see Figure 2.5).
Various other quasi-experimental designs exist and they can be found in any one
of several references (e.g., Campbell, 1969; Campbell & Stanley, 1963; Cook & Camp-
bell, 1979; Judd & Kenny, 1981). Keep in mind that the results from quasi-experiments
are analyzed using the same statistical and data analytic techniques that are employed
on experimental data. The difference, of course, is that investigators must spend more
time and energy using statistical tools to rule out competing or alternative accounts
for the results obtained in quasi-experiments.
Knowledge Base
1. A teacher observes a positive correlation between classroom seating and grades:
Students who sit toward the front of the room have higher grades than those who
sit in the back of the class do. Does it follow that seating leads to scholastic achieve-
ment? Why or why not?
2. Provide an example of (a) a positive correlation and (b) a negative correlation.
3. True or False: It is possible to identify and control for every confounded variable
that could affect an experiment.
4. True or False: Research designs with more than one independent variable and
dependent measure exist.
5. What makes a quasi-experiment different from an experiment?
Answers
1. No. Correlation does not imply causation.
2. Any two variable examples are fine. The positive correlation must indicate that as variable X
increases (decreases), variable Y behaves similarly. The negative correlation must show that
as variable X increases in value, variable Y decreases (and vice versa).
3. False: No piece of research can eliminate all potential confounded variables.
4. True
5. Quasi-experiments lack one or more defining experimental characteristics (e.g., control
group, random assignment).
Project Exercise
USING A RANDOM NUMBERS TABLE
We turn to an exercise for performing random selection and another for random as-
signment. Other randomization techniques can be found in Snedecor and Cochran
(1980), and for interested readers, a work on the history of randomization is also avail-
able (Gigerenzer, Swijtink, Porter, Daston, Beatty, & Kruger, 1989).
Table 2.6 Random Numbers

Row number
00000 10097 32533 76520 13586 34673 54876 80959 09117 39292 74945
00001 37542 04805 64894 74296 24805 24037 20636 10402 00822 91665
00002 08422 68953 19645 09303 23209 02560 15953 34764 35080 33606
00003 99019 02529 09376 70715 38311 31165 88676 74397 04436 27659
00004 12807 99970 80157 36147 64032 36653 98951 16877 12171 76833
00005 66065 74717 34072 76850 36697 36170 65813 39885 11199 29170
00006 31060 10805 45571 82406 35303 42614 86799 07439 23403 09732
00007 85269 77602 02051 65692 68665 74818 73053 85247 18623 88579
00008 63573 32135 05325 47048 90553 57548 28463 28709 83491 25624
00009 73796 45753 03529 64778 35808 34282 60935 20344 35273 88435
00010 98520 17767 14905 68607 22109 40558 60970 93433 50500 73998
00011 11805 05431 39808 27732 50725 68248 29405 24201 52775 67851
00012 83452 99634 06288 98033 13746 70078 18475 40610 68711 77817
00013 88685 40200 86507 58401 36766 67951 90364 76493 29609 11062
00014 99594 67348 87517 64969 91826 08928 93785 61368 23478 34113
00015 65481 17674 17468 50950 58047 76974 73039 57186 40218 16544
00016 80124 35635 17727 08015 45318 22374 21115 78253 14385 53763
00017 74350 99817 77402 77214 43236 00210 45521 64237 96286 02655
00018 69916 26803 66252 29148 36936 87203 76621 13990 94400 56418
00019 09893 20505 14225 68514 46427 56788 96297 78822 54382 14598
00020 91499 14523 68479 27686 46162 83554 94750 89923 37089 20048
00021 80336 94598 26940 36858 70297 34135 53140 33340 42050 82341
00022 44104 81949 85157 47954 32979 26575 57600 40881 22222 06413
00023 12550 73742 11100 02040 12860 74697 96644 89439 28707 25815
00024 63606 49329 16505 34484 40219 52563 43651 77082 07207 31790
00025 61196 90446 26457 47774 51924 33729 65394 59593 42582 60527
00026 15474 45266 95270 79953 59367 83848 82396 10118 33211 59466
00027 94557 28573 67897 54387 54622 44431 91190 42592 92927 45973
00028 42481 16213 97344 08721 16868 48767 03071 12059 25701 46670
00029 23523 78317 73208 89837 68935 91416 26252 29663 05522 82562
00030 04493 52494 75246 33824 45862 51025 61962 79335 65337 12472
00031 00549 97654 64051 88159 96119 63896 54692 82391 23287 29529
00032 35963 15307 26898 09354 33351 35462 77974 50024 90103 39333
00033 59808 08391 45427 26842 83609 49700 13021 24892 78565 20106
00034 46058 85236 01390 92286 77281 44077 93910 83647 70617 42941
00035 32179 00597 87379 25241 05567 07007 86743 17157 85394 11838
00036 69234 61406 20117 45204 15956 60000 18743 92423 97118 96338
00037 19565 41430 01758 75379 40419 21585 66674 36806 84962 85207
00038 45155 14938 19476 07246 43667 94543 59047 90033 20826 69541
00039 94864 31994 36168 10851 34888 81553 01540 35456 05014 51176
00040 98086 24826 45240 28404 44999 08896 39094 73407 35441 31880
00041 33185 16232 41941 50949 89435 48581 88695 41994 37548 73043
00042 80951 00406 96382 70774 20151 23387 25016 25298 94624 61171
00043 79752 49140 71961 28296 69861 02591 74852 20539 00387 59579
00044 18633 32537 98145 06571 31010 24674 05455 61427 77938 91936
00045 74029 43902 77557 32270 97790 17119 52527 58021 80814 51748
00046 54178 45611 80993 37143 05335 12969 56127 19255 36040 90324
00047 11664 49883 52079 84827 59381 71539 09973 33440 88461 23356
00048 48324 77928 31249 64710 02295 36870 32307 57546 15020 09994
00049 69074 94138 87637 91976 35584 04401 10518 21615 01848 76938
00050 09188 20097 32825 39527 04220 86304 83389 87374 64278 58044
00051 90045 85497 51981 50654 94938 81997 91870 76150 68476 64659
00052 73189 50207 47677 26269 62290 64464 27124 67018 41361 82760
00053 75768 76490 20971 87749 90429 12272 95375 05871 93823 43178
00054 54016 44056 66281 31003 00682 27398 20714 53295 07706 17813
00055 08358 69910 78542 42785 13661 58873 04618 97553 31223 08420
00056 28306 03264 81333 10591 40510 07893 32604 60475 94119 01840
00057 53840 86233 81594 13628 51215 90290 28466 68795 77762 20791
00058 91757 53741 61613 62669 50263 90212 55781 76514 83483 47055
00059 89415 92694 00397 58391 12607 17646 48949 72306 94541 37408
PERFORMING RANDOM SELECTION

Perhaps you have a population of individuals-the members of a seminar class-and
you want to draw a sample from it. Table 2.7 lists the 10 students (in alphabetical or-
der) enrolled in the seminar. You elect to randomly select 5 individuals from the popu-
lation of 10. (Please note that the procedure I am using here can be expanded for
application to much larger populations-the logic is the same.)
Close your eyes and take your index finger (either hand) and place it anywhere on
Table 2.6. When I did so, I landed on a number string in row 00038, column 2. The
number string is 14938. If I read across and treat each digit in the string as corresponding
to a one-digit numbered name in Table 2.7, then I have randomly selected Arletta, Doyle,
Isaac, Carol, and Harriett to be the sample from the larger population (i.e., class).
Alternatively, I could have begun reading down the table from that first digit in row
00038, column 2 of Table 2.6. If so, then Arletta would still be in the sample, and she
would be followed by (3) Carol and (2) Biff-I would then skip 1 because Arletta was
already selected.
Table 2.7 Population Comprised of 10 Students

1. Arletta
2. Biff
3. Carol
4. Doyle
5. Ernest
6. Fran
7. Geoff
8. Harriet
9. Isaac
10. Jennie
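The digit-reading procedure is easy to mimic in a few lines of code. The sketch below is a Python illustration only (the function name and the convention that the digit 0 stands for student 10 are my assumptions, not the text's); it walks the digits of the number string 14938 and picks the correspondingly numbered names from Table 2.7:

```python
# Sketch of random selection using digits read from a random numbers table.
# Assumption (not from the text): the digit 0 stands for student 10 (Jennie).

ROSTER = {
    1: "Arletta", 2: "Biff", 3: "Carol", 4: "Doyle", 5: "Ernest",
    6: "Fran", 7: "Geoff", 8: "Harriet", 9: "Isaac", 10: "Jennie",
}

def select_sample(digit_stream, k):
    """Walk the digits left to right, skipping repeats, until k names are chosen."""
    chosen = []
    for ch in digit_stream:
        number = 10 if ch == "0" else int(ch)
        name = ROSTER[number]
        if name not in chosen:      # skip digits corresponding to names already drawn
            chosen.append(name)
        if len(chosen) == k:
            break
    return chosen

# The number string landed on in the text (row 00038, column 2):
print(select_sample("14938", 5))
# -> ['Arletta', 'Doyle', 'Isaac', 'Carol', 'Harriet']
```

Reading down a column instead of across simply means feeding the function a different digit stream; duplicates are skipped, just as in the text.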
PERFORMING RANDOM ASSIGNMENT
As you might expect, the random selection exercise we just used can be adjusted to per-
form random assignment. Keep in mind that we use random selection when we have a
clearly identified population to draw from, and that random assignment is typically em-
ployed when we have a convenience sample available. That is, we have a group of indi-
viduals who will be randomly assigned to experience one of two (or more) levels of an
independent variable in an experiment. The following random assignment procedure
is adapted from Salkind (1997) and Dunn (1999):
1. Recruit a convenience sample containing the number of participants you need.
Imagine that the names in Table 2.7 comprise such a sample.
2. Assign numbers to each member in the sample (the names do not need to be in
any special order). The names in Table 2.7 are already numbered 1 to 10.
3. Use a table of random numbers (see Table 2.6) to select the appropriate number
to be placed in one condition (e.g., 5); remaining names will be assigned to the
control condition.
4. You can start the random assignment procedure by closing your eyes and placing
your finger anywhere on Table 2.6. Following essentially the same procedure out-
lined above for random selection, you can then begin to search for one-digit num-
bers from 1 to 10. The first five names selected are assigned to one group, the other
five to the other group.
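Steps 1 through 4 can be automated as well. The following Python sketch is illustrative only (the function name is mine, and shuffling stands in for reading the printed table; both yield an unbiased ordering of the sample):

```python
import random

def random_assignment(names, seed=None):
    """Shuffle the convenience sample, then split it into two equal groups."""
    rng = random.Random(seed)        # seed is only here to make the example repeatable
    pool = list(names)
    rng.shuffle(pool)                # unbiased ordering, like reading the table
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment group, control group)

sample = ["Arletta", "Biff", "Carol", "Doyle", "Ernest",
          "Fran", "Geoff", "Harriet", "Isaac", "Jennie"]
treatment, control = random_assignment(sample, seed=1)
print(treatment)
print(control)
```

Every name lands in exactly one group, and with 10 names each group contains 5 members, as in step 3.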
Suggested Exercise: Add 10 names to Table 2.7 (i.e., there will now be 20) and then
randomly assign 10 names to one group-the remaining 10 will comprise the control
group.
Thought Exercise: How could you use the random numbers table (Table 2.6) to as-
sign members of some sample to three or four conditions in an experiment? Be cre-
ative but avoid making the procedure too cumbersome. Explain the logic of your strat-
egy, and be sure that you do not violate randomness.
Summary
1. The main message of this chapter was that good research and quality research designs are enhanced by the rigorous and appropriate use of statistics and data analysis.
2. The research loop of experimentation outlined five discrete steps that identify the work conducted at each stage of the research process, which is cyclical-research on any question never really ends. Activities known only to actual researchers occur between each of the steps in the research loop.
3. Experiments introduce influential changes into a setting so that participants' reactions can be observed, measured, and recorded. True experiments enable investigators to determine cause and effect relationships between variables.
4. In order to verify research results, duplicate or "replication" studies are frequently conducted. Some replications are literal, others are conceptual; that is, not all parts of the new study remain constant with the original.
5. Random assignment entails assigning participants from some sample-random or nonrandom-to one of the conditions or groups in an experiment. The goal of random assignment is to equalize groups prior to the manipulation and subsequent measurement of any variables.
6. Different types of random samples (e.g., stratified random sampling, quota sampling) were introduced, and each was shown to apply to particular research circumstances or to address certain needs.
7. Even the most meticulous research can have sampling error, the difference between a sample statistic's value and its corresponding population parameter. Generally speaking, large samples demonstrate lower sampling error than small samples.
8. To test hypotheses, researchers manipulate independent variables within experiments. Independent variables are under the researcher's control. An independent variable must have at least two different levels, one of which is the treatment of interest and the other traditionally represents a control condition.
9. A dependent measure is the outcome measure in an experiment or study. Its value depends on the effect of the independent variable. By assessing a dependent measure, a researcher can determine whether a treatment or intervention has any effect on behavior (i.e., did the predicted change occur?).
10. The four basic types of dependent measures were introduced: behavioral, self-report, physiological, and behavioroid.
11. Descriptive definitions explain theoretical relationships among variables, whereas operational definitions outline concrete operations of how these variables are actually measured or manipulated.
12. The measurement of any hypothetical construct should be both reliable and valid. That is, the measure should be stable, giving consistent readings across time. It should also be demonstrated to be actually tapping into the theoretical construct it is supposed to measure.
13. Three classes of research designs-correlational, experimental, and quasi-experimental-were presented. Only experimental designs ensure that cause and effect relations among variables can be isolated. Correlational and quasi-experimental designs are suggestive but not definitive where causality is concerned.
14. Investigators must be careful to eliminate confounds, or uncontrolled variables that can systematically mask the effects, if any, among independent and dependent variables.
15. Random numbers tables contain sequences of numbers between 0 and 9 that were generated in an unbiased manner. Such tables have great utility when it comes to randomly assigning participants to groups or randomly selecting individuals to comprise a sample.
Key Terms
Comparison group (p. 75 in Fig. 2.5)
Conceptual replication (p. 48)
Confounded variable (p. 72)
Construct validity (p. 67)
Convergent validity (p. 68)
Correlational design (p. 70)
Dependent measure (p. 57)
Dependent variable (p. 57)
Descriptive definition (p. 63)
Discriminant validity (p. 68)
Experiment (p. 47)
External validity (p. 68)
Face validity (p. 67)
Hypothetical construct (p. 65)
Independent variable (p. 56)
Internal validity (p. 68)
Negative correlation (p. 70)
Operational definition (p. 63)
Positive correlation (p. 70)
Quasi-experiment (p. 74)
Random assignment (p. 49)
Random numbers table (p. 77)
Reliability (p. 66)
Reliability coefficient (p. 66)
Replication (p. 48)
Research design (p. 70)
Sampling error (p. 52)
Validity (p. 67)
Zero correlation (p. 71)
Chapter Problems
1. Describe the steps comprising the research loop of experimentation. Does it differ from the scientific method introduced in chapter 1? How does the research loop help investigators do research?
2. What is an experiment? What is the goal of experimentation? How does it differ from other forms of behavioral research?
3. What are replication studies and why are they scientifically useful? How does a conceptual (systematic) replication study differ from a standard replication?
4. What is random assignment? How and why is it useful in behavioral science research?
5. In terms of scientific utility and purpose, how does random assignment differ from random selection?
6. What is a convenience or haphazard sample?
7. Explain the difference between systematic random sampling and stratified random sampling. How are these sampling techniques used? Create a hypothetical example to illustrate each one.
8. Explain the difference between cluster sampling and quota sampling. How are these techniques used? Create a hypothetical example to illustrate each one.
9. Define sampling error. Why do researchers need to be concerned about it? Is sampling error common or rare in research? Why?
10. What can investigators do to reduce sampling error?
11. Define the term independent variable and provide a concrete example of one. How are independent variables used in research?
12. What is the minimum number of levels for an independent variable? Why? Is there a maximum number?
13. Define the term dependent measure (or variable) and provide a concrete example of one. How are dependent measures used in research?
14. How does the independent variable differ from a dependent measure?
15. Create a hypothetical experiment. Describe the randomizing procedure(s) you would use, as well as the hypothesis, the independent variable(s), and the dependent variable(s). What do you predict will happen in the experiment? Why?
16. Define the four types of dependent measures and then provide an example to represent each one.
17. Why does good research in the behavioral sciences generate more questions than it answers?
18. What is causality? Why is it important to research in the behavioral sciences? In what sort of research is causality illustrated?
19. What is a descriptive definition and how does it differ from an operational definition?
20. Create a descriptive definition for a variable of choice and then illustrate it using an operational definition.
21. Write operational definitions for each of the following variables: helping, fear, procrastination, tardiness, happiness, attraction, factual recall.
22. What are hypothetical constructs? Why are these constructs important to research efforts? Provide an example of one.
23. What is reliability and why is it important when measuring variables? Create an example to illustrate reliability.
24. In general, what does the term validity mean within behavioral science research? Provide a general example of validity.
25. What is construct validity? Why is it important? Provide a hypothetical example of construct validity.
26. What is face validity? Why is it used? Does it have any drawbacks?
27. Define convergent validity and discriminant validity. Are these concepts related? Illustrate using examples how these two types of validity are used in research.
28. What is internal validity? Why is it important to research?
29. What is external validity? Why is it important to research?
30. Describe a hypothetical experiment to illustrate the difference between internal and external validity.
31. What are the three categories of research design? Describe the strengths and weaknesses of each one.
32. Why are correlational research designs used? How do they differ from experiments or quasi-experiments?
33. Define the concept of correlation. Create hypothetical examples to illustrate a positive, a negative, and a zero correlation.
34. Does correlation imply causation? Why or why not?
35. What is a confounded variable in an experiment? Why does it prove to be problematic for understanding causality?
36. Why are quasi-experimental designs used in lieu of experimental designs? Do quasi-experimental research designs have an advantage over correlational designs? Why or why not?
37. What is a random numbers table? How would a researcher use one in her work?
38. Describe the procedure for using a random numbers table to perform random selection from some population.
39. Describe the procedure for using a random numbers table to perform random assignment for a basic two group experiment.
40. Why is randomization so important to research in the behavioral sciences?
APA Style: Figure or Table?
1. When creating a data display, is your goal to provide a quick summary for readers? If yes, then graph the data using a bar graph, histogram, frequency polygon, or stem and leaf diagram. If no, then go to question 2.
2. When creating a data display, is your goal to encourage readers to carefully review the data? If yes, then place the data into a frequency distribution table that has clear labels and identifies important categories. If no, then reevaluate your goals and decide whether a data display is necessary.
[Also recoverable from this chart: "If yes, then it is technically a figure"; "If no, then it is technically a table, which can contain only numbers."]

Choosing a Graph
[Only the outcome boxes of this chart are recoverable: "If yes, plot the data using a bar graph"; "If no, then go to question 2"; "If yes, plot the data using a histogram or a frequency polygon"; "If no, then reevaluate the data and consider an alternative way to present them."]

[Decision chart: envisioning the shape of a distribution]
1. Does the distribution appear to be symmetric? If yes, then go to question 2. If no, then go to question 3.
2. Does the distribution have more than one bell-shaped curve? If yes, then the distribution may be bimodal or multimodal. If no, then the distribution may be normal.
3. Are the scores in this nonsymmetric distribution clustered at the low end of the scale? If yes, then the distribution has a positive skew. If no, then go to question 4.
4. Are the scores in this nonsymmetric distribution clustered at the high end of the scale? If yes, then the distribution has a negative skew. If no, then the distribution cannot be described in conventional terms.
CHAPTER 3
FREQUENCY DISTRIBUTIONS, GRAPHING, AND DATA DISPLAY

Chapter Outline
Knowledge Base
• Graphing Frequency Distributions
Bar Graphs
Data Box 3.B: Biased Display-Appearances Can Be Deceiving
Tukey's Tallies
Knowledge Base
• Envisioning the Shape of Distributions
Data Box 3.C: Kurtosis, or What's the Point Spread?
Data Box 3.D: Elegant Information-Napoleon's Ill-fated March to Moscow
• Percentiles and Percentile Ranks
Cumulative Frequency
Cumulative Percentage
Calculating Percentile Rank
Reversing the Process: Finding Scores from Percentile Ranks
Exploring Data: Calculating Middle Percentiles and Quartiles
Writing About Percentiles
Knowledge Base
Less is More: Avoiding Chartjunk and Tableclutter, and Other Suggestions
American Psychological Association (APA) Style Guidelines for Data Display
Project Exercise: Discussing the Benefits of Accurate but Persuasive Data Display
• Looking Forward, Then Back
• Summary
• Key Terms

Patterns in a graph of data can be revealing. In the mid-19th century, a London physician plotted the location of residents who died in an on-going cholera epidemic (Gilbert, 1958; Tufte, 1983). Cholera is a dangerous bacterial infection of the small intestine marked by continuous watery diarrhea and severe dehydration. This deadly disease is most often transmitted through contaminated drinking water. Figure 3.1 shows Dr. John Snow's map of central London. The dots in Figure 3.1 represent deaths due to cholera, and the eleven Xs (X) mark water pumps in the downtown area. Snow concluded that the majority of the deaths occurred among people who lived close to and presumably drank from the Broad Street water pump (which is denoted by the X adjacent to the "D" in BROAD STREET in Figure 3.1). After examining the pattern of the dots on the map, he had the handle of the contaminated water pump removed and the cholera epidemic ended-after the loss of 500 souls living in the immediate neighborhood.

Figure 3.1 Dr. Snow's Central London Map of the 1854 Cholera Epidemic
Source: Tufte (1983), p. 24; Gilbert (1958).

Although you will probably not draw maps or track diseases, you can still learn to organize information and identify relationships between variables, the real lessons of Snow's cholera map. Even the most interesting ideas or facts can be missed if they are not presented in ways that highlight their promise or meaning for interpreting behavior. Chapter 3 represents a return to the study of statistics and data analysis, as the material in this chapter is designed to help you develop basic skills for organizing, presenting, and explaining data. The research foundation interlude comprising chapter 2 provided essential background material, of course, as you needed a context for the descriptive statistical procedures that appear in this and the next chapter.

As you read this chapter, then, imagine that you have some set of data and that you are beginning to look for any underlying relationships or obvious patterns that might be in it. In this chapter, you will learn about frequency distributions, graphing, and data display. These three activities are related ways of summarizing data, and thus they fall
under the heading of descriptive statistics, a rubric that was introduced in chapter 1.
Why summarize data? There are two reasons for examining frequencies and graphing or
displaying data. First, investigators examine their data to get a "feel" for it. What does the
pattern of results look like? Do the data conform to any established patterns? Are partic-
ipants' responses within anticipated boundaries or do some data fall at the extremes-
the outer limits-of a scale? Thus, researchers summarize their data to fully understand
it before they begin to ask questions about the hypothesis that was tested or whether any
predictions were realized.
The second reason for investigators to summarize their data is for presentation to
interested others. These "others" can include fellow researchers, students, or future readers
of resulting publications. Summarizing data through plotting their frequency-how
often they occur in a sample-or through some other means of display serves a public,
educational function. Researchers want to share their work because science does not ad-
vance in a vacuum. By sharing any patterns in their data, investigators can begin to tell
their research "story" to an audience. The initial audience may be a friend or two, or per-
haps a colleague, but as clarity in the data emerges, a researcher's audience is apt to grow.
Audiences, too, are not passive-they will ask questions of the researcher and the data-
indeed, they will "demand" that any emerging facts or patterns be explained clearly and
concisely. On occasion, members of a researcher's audience will challenge his or her con-
clusions, an event that is entirely acceptable within the framework provided by both the
scientific method and the research loop of experimentation. What better way to antici-
pate or prepare for possible debate than to know your data set, as it were, inside and out?
As you will see, the presentation of data in tabular or graphic form is both basic and
critical to statistical analysis.

Summaries-tables, figures, and graphs-educate and encourage viewers to think about relationships within data; sometimes, such summaries can spark controversies, leading to creative insights about behavior.

What is a Frequency Distribution?
KEY TERM A frequency distribution is a table presenting the number of participant responses (e.g., scores,
values) within the numerical categories of some scale of measurement.
Thus, frequency distributions display the type of responses made in a piece of research,
as well as how many participants actually made each response (i.e., frequency).
Imagine you were interested in studying how transfer students adapt to a new
campus environment. Personality factors certainly foster such adaptation, so you de-
cide to measure the self-reported optimism-pessimism of new transfers to your insti-
tution. The data in Table 3.1 are scores on the Life Orientation Test (LOT; Scheier &
Carver, 1985). The LOT is an eight-item measure of dispositional optimism, a gen-
eralized expectation that future outcomes will be positive. Respondents with
higher scores on the LOT can be described as optimistic, while those with lower scores
tend toward pessimism. The LOT is comprised of favorable ("In uncertain times I usu-
I ally expect the best") and unfavorable ("If something can go wrong for me, it
; will") statements. Respondents rate their agreement with each statement on a five-
I
I
point interval scale (1 = strongly disagree to 5 = strongly agree). The ratings are then
summed to create an optimism score for each respondent, and these scores can range
) from 8 to 40.
/
Table 3.1 Thirty Raw Scores on the Life Orientation Test

10   22   30    8   27
33   21   14   17   20
22   22    9   10   10
10   35   33   31   40
30   29   22   34   32
25   36   22   19   12

Note: Scores on the LOT can range from 8 to 40.
88 Chapter 3 Frequency Distributions, Graphing, and Data Display
As you can see, the LOT scores are in their "raw," that is, unorganized, form. It is difficult to look at the scores and draw any conclusions about them (or the sample they were drawn from) because they are not yet organized. Our "eyeballing" of the data tells us there are quite a few scores available, but there is no organizing scheme or framework to tell us much more than that. In fact, you would have to take more than a few moments to search through the data to identify the largest value, the smallest, the most or least frequently occurring score, and so on.
The structure imposed by a simple frequency distribution, however, enables a viewer to make sense of the data at a glance. In Table 3.2, the highest possible score is listed at the top of the first column, with each successive value
Table 3.2 Frequency Distribution for Life Orientation Test (LOT) Scores

X     f     fX
40    1     40     (40 x 1 = 40)
39    0      0     (39 x 0 = 0)
38    0      0     (38 x 0 = 0)
37    0      0     (37 x 0 = 0)
36    1     36     (36 x 1 = 36)
35    1     35     (35 x 1 = 35)
34    1     34     (34 x 1 = 34)
33    2     66     (33 x 2 = 66)
32    1     32     (32 x 1 = 32)
31    1     31     (31 x 1 = 31)
30    2     60     (30 x 2 = 60)
29    1     29     (29 x 1 = 29)
28    0      0     (28 x 0 = 0)
27    1     27     (27 x 1 = 27)
26    0      0     (26 x 0 = 0)
25    1     25     (25 x 1 = 25)
24    0      0     (24 x 0 = 0)
23    0      0     (23 x 0 = 0)
22    5    110     (22 x 5 = 110)
21    1     21     (21 x 1 = 21)
20    1     20     (20 x 1 = 20)
19    1     19     (19 x 1 = 19)
18    0      0     (18 x 0 = 0)
17    1     17     (17 x 1 = 17)
16    0      0     (16 x 0 = 0)
15    0      0     (15 x 0 = 0)
14    1     14     (14 x 1 = 14)
13    0      0     (13 x 0 = 0)
12    1     12     (12 x 1 = 12)
11    0      0     (11 x 0 = 0)
10    4     40     (10 x 4 = 40)
 9    1      9     (9 x 1 = 9)
 8    1      8     (8 x 1 = 8)
      Σf = 30     ΣfX = 685
noted beneath it until the lowest possible score (i.e., X = 8) is reached. Please note that the convention of ranking data from high to low is traditional but arbitrary; there would be nothing wrong with ranking them in a reverse order.
The next consideration is frequency, the number of times particular measurements appear in the data. Individuals giving the same response, here, having the same scores on the LOT, are grouped together (see the second column labeled f for "frequency" in Table 3.2). The sum of the frequencies (Σf) is equal to N. As you can see at the bottom of the second column in Table 3.2, there are 30 transfer students in the sample (i.e., N = 30). As shown in Table 3.2, 5 transfer students scored a 22 on the LOT, and 4 others scored a 10. These are the most frequently occurring scores. The scores with the next highest frequency represent a tie: scores of 33 and 30 each had 2 respondents. The remaining scores in the distribution had either 1 or no respondents. Why
bother indicating those scores lacking respondents? Well, 0 is a frequency, and for reasons of consistency, clarity, and thoroughness, most frequency distributions routinely include those scores that lack corresponding respondents (e.g., no one in the sample scored a 37 on the LOT, so the f for that score is 0). In fact, please note that 11 other individual scores had a frequency of 0, as well (see Table 3.2).
There are several virtues associated with any frequency distribution. First, a viewer
can see the spread of scores-are they high or low, for example, or concentrated in one
or several areas? Second, any given score can be considered in relation to all other scores.
A score of 34 on the LOT can easily be seen as more frequent and higher than most other
scores in the sample. Third, by looking at the base of the f column, the N of the respon-
dents can be quickly known. Thus,
[3.1.1]  Σ f = N,
[3.1.2]  Σ f = 30.

From this point forward in the book, all equations will be numbered. The numbering scheme will help you locate information you need quickly and easily. For example, 3.1.1 means "chapter 3, equation 1, step 1."

Fourth, the data are organized and ready for use in additional analyses. The sum of scores, that is, Σ X, is one quick calculation for other analyses that a researcher might want from data like those listed in Table 3.2. The sum of the scores cannot be determined simply by summing all the values of X, however. Why not? Because that summation ignores the fact that the different scores occur with different frequencies; the appropriate calculation must use both X and f. It is possible to add all the (raw) values appearing in Table 3.1 (i.e., Σ X = 10 + 22 + 30 + ... + 12 = 685), but that would be time consuming and defeat the purpose of creating a frequency distribution in the first place.
Instead, the quickest and easiest way to compute Σ X is to multiply each score of X by its corresponding frequency (f) and to then add the resulting products together. Symbolically, then,

[3.2.1]  Σ X = Σ fX.

The third column in Table 3.2 shows the products fX for each score. The Σ fX is equal to 685, a sum shown at the bottom of the third column in the table. The fourth column in Table 3.2 shows the multiplication so that you can verify the numbers in the third (fX) column of the table. Please note that the fourth column of information is provided here for you to make certain you understand how Σ fX was calculated in this example only. You need not add this fourth column to any frequency distribution you construct in the future (nor will it appear in later tables in the book).
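For readers who like to check such tallies by computer, here is a minimal Python sketch (an illustration, not part of the original text) that builds the frequency distribution from the thirty LOT scores of Table 3.1 and computes Σ f and Σ fX:

```python
from collections import Counter

# The thirty raw LOT scores from Table 3.1
scores = [10, 22, 30, 8, 27,
          33, 21, 14, 17, 20,
          22, 22, 9, 10, 10,
          10, 35, 33, 31, 40,
          30, 29, 22, 34, 32,
          25, 36, 22, 19, 12]

freq = Counter(scores)          # maps each score X to its frequency f

N = sum(freq.values())          # Σf = N
sum_fx = sum(x * f for x, f in freq.items())   # ΣX = ΣfX

print(N)                        # 30
print(sum_fx)                   # 685
print(freq.most_common(2))      # [(22, 5), (10, 4)]

# Display the distribution from the highest score down to the lowest,
# including scores whose frequency is 0, as in Table 3.2:
for x in range(max(scores), min(scores) - 1, -1):
    print(x, freq.get(x, 0), x * freq.get(x, 0))
```

Note how the loop lists every possible score value, so zero-frequency scores appear just as they do in the table.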
Proportions and Percentages
Two other pieces of information, proportion and percentage, are often included in frequency distributions. As you will see, either one can be determined from the other.
A proportion is a number reflecting a given frequency's (f) relationship to the N of the available sample or group. Put another way, a proportion is a fractional value of the total group associated with each individual score. Let's consider an example using the data in Table 3.2. As shown in Table 3.2, 4 of the transfer respondents had a score of X = 10. The proportion would be 4 out of 30 respondents had X = 10, or 4/30 = .1333. The formula for determining a proportion, then, is

[3.3.1]  proportion = p = f/N.

Proportions can be shown as fractions (i.e., 1/2) within text or a table, but they are usually presented in decimal form (i.e., 0.50). Table 3.3 repeats the information from
Table 3.3 Relative Frequency Distribution for Life Orientation Test (LOT) Scores

X     f     p = f/N     p(100) = %
40    1     .0333        3.33
39    0     0            0
38    0     0            0
37    0     0            0
36    1     .0333        3.33
35    1     .0333        3.33
34    1     .0333        3.33
33    2     .0667        6.67
32    1     .0333        3.33
31    1     .0333        3.33
30    2     .0667        6.67
29    1     .0333        3.33
28    0     0            0
27    1     .0333        3.33
26    0     0            0
25    1     .0333        3.33
24    0     0            0
23    0     0            0
22    5     .1667       16.67
21    1     .0333        3.33
20    1     .0333        3.33
19    1     .0333        3.33
18    0     0            0
17    1     .0333        3.33
16    0     0            0
15    0     0            0
14    1     .0333        3.33
13    0     0            0
12    1     .0333        3.33
11    0     0            0
10    4     .1333       13.33
 9    1     .0333        3.33
 8    1     .0333        3.33
      Σf = 30    Σp ≈ 1.00    Σ% ≈ 100

Note: The data in this table are the raw scores from Table 3.1.
Table 3.2 except that the new third column, which is labeled p, shows the proportions corresponding to each of the LOT scores. As noted at the bottom of column 3, all of the available proportions will sum to 1.00.
Although the two indexes are closely related, percent is a much more common index of relative position within data than is the proportion. A percent is a number that expresses the proportion of some score per hundred, and it is often used to simplify explaining a set of data or reporting relationships within it. Instead of reporting that 12 of the 30 transfer students came from a large state university, you could report that "40% came from large state universities, 50% came from liberal arts colleges, and the remaining 10% of the sample arrived from community colleges." Percents, or as they are interchangeably called, percentages, simplify numerical relationships by making the component parts sum to 100%.
Calculating percentages from proportional data is easy. Using the data from Table 3.2 once more, we can determine the percentage of transfer students who scored a 22 on the LOT. The percent formula entails multiplying a proportion by 100, or

[3.4.1]  percent = (p) x (100) = (f/N) x (100),
[3.4.2]  (5/30) x (100) = (.1667) x (100) = 16.67.

When you know a proportion, its percentage equivalent is quickly and easily calculable (and vice versa).
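As a quick computational illustration (mine, not part of the original text), a few lines of Python derive p = f/N and the corresponding percentage for every score in the Table 3.1 data:

```python
from collections import Counter

scores = [10, 22, 30, 8, 27, 33, 21, 14, 17, 20,
          22, 22, 9, 10, 10, 10, 35, 33, 31, 40,
          30, 29, 22, 34, 32, 25, 36, 22, 19, 12]

freq = Counter(scores)
N = len(scores)

# For each distinct score X: proportion p = f/N, percent = p * 100
relative = {x: (f / N, 100 * f / N) for x, f in freq.items()}

p22, pct22 = relative[22]
print(round(p22, 4), round(pct22, 2))    # 0.1667 16.67

# Proportions across all scores must sum to 1.0 (i.e., 100%)
print(round(sum(p for p, _ in relative.values()), 4))   # 1.0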
Below, I review several guidelines for creating a grouped frequency distribution, illustrating how the LOT scores can be grouped into intervals in the process. Please note that these are guidelines, not hard and fast rules; you may group a frequency distribution in any number of ways as long as it can be understood by you and interested others. Indeed, you need not follow these or any other guidelines as long as your data are grouped in a reasonably logical and consistent manner. I think, though, that the following guidelines will prove to be helpful to you.
1. Begin by calculating the difference between the highest and the lowest scores in a data set, and then add 1 to this difference. The resulting number represents the number of possible score values within a grouped frequency distribution. As shown by Table 3.2 or Table 3.3, the difference between the high (40) and low score (8) is 32. Adding 1 to this number indicates that there are 33 possible score values. Here is a formula illustrating the highest minus the lowest difference plus 1: number of possible values = (highest score - lowest score) + 1 = (40 - 8) + 1 = 33.
The first (lowest) interval of LOT scores will range between 8 (lower limit) and 11
(upper limit). You would then calculate the next interval by treating the next high-
est score as the lower limit of this second grouping (i.e., 12):
Table 3.4 Intervals for Grouped Frequency Distribution for LOT Scores

Class Intervals of X
40-43
36-39
32-35
28-31
24-27
20-23
16-19
12-15
 8-11

Note: The intervals in this table are based on the frequency distribution shown in Table 3.2, which in turn is based on the raw scores from Table 3.1.
Real data are rarely neat and tidy, and a good data analyst learns to be flexible by revising any data formulation or presentation as needed. Providing the clearest account of the information is always the goal.

and you will probably experience them again. That is, if you ever conduct independent or supervised research, you will probably rely on some published inventory, personality scale, or intelligence measure that has a fixed low and high score (most of the standardized tests you have taken all your life have these qualities, as well). Thus, I purposefully selected this example so that you could experience a real-life data problem that confronts many researchers. Sometimes, even the best laid plans, or easy to use guidelines, will not work for the data you possess. But how do we present the LOT scores? When the dimensions of a measure are constrained by fixed upper and lower limits (i.e., scores), start constructing the intervals from the highest interval, not the lowest. To do so, you would subtract the interval size minus 1 from the highest score and work backward to the lower limit of an interval. The first (highest) interval for the LOT data would be from 37 to 40, and it is calculated using:
[3.7.1]  lower limit = upper limit - (interval size - 1),
[3.7.2]  lower limit = 40 - (4 - 1) = 40 - 3 = 37.
A table can then be constructed in the same (albeit reverse) fashion as was discussed above. The revised grouped frequency distribution for the LOT data is shown in Table 3.5. As you can see, there is still a ninth interval, the lowest one, added to the original eight intervals envisioned in step 2. This addition is a small price to pay, as it were, so that the uppermost possible score is correct, and there is little difficulty posed by the fact that the lowest possible score on the LOT (i.e., 8) resides in the scoring interval 5-8 (see Table 3.5). A note about the range of fixed scores on any standardized measure is then added beneath the table.
4. Finally, we work with the actual data. Enter all of the raw scores into their appropriate scoring intervals and under a column labeled "frequency" or f (see Table 3.5).

5. I add this last step as an error check to avoid mistakes or misinterpretation: You should make certain that no errors have occurred in the construction of the grouped frequency distribution. To do so, check the following information in Table 3.5 and in any future tables you construct:

• Are all the class intervals the same width? They should be the same width.
• Do any class intervals overlap with one another? They should not.
• Do all the data fit into the table? There should be no leftover scores.
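The interval-building procedure just described can be sketched in a few lines of Python (an illustration, not part of the original text). Starting from the fixed top score of 40, it builds size-4 intervals downward, tallies the Table 3.1 LOT scores into them, and applies the error check from step 5:

```python
from collections import Counter

scores = [10, 22, 30, 8, 27, 33, 21, 14, 17, 20,
          22, 22, 9, 10, 10, 10, 35, 33, 31, 40,
          30, 29, 22, 34, 32, 25, 36, 22, 19, 12]

size = 4            # interval width
top = 40            # fixed upper limit of the LOT

# Work backward from the highest interval: lower = upper - (size - 1)
intervals = []
upper = top
while upper >= min(scores):
    lower = upper - (size - 1)
    intervals.append((lower, upper))
    upper = lower - 1

# Tally each raw score into its class interval
freq = Counter()
for x in scores:
    for lower, upper in intervals:
        if lower <= x <= upper:
            freq[(lower, upper)] += 1
            break

for lower, upper in intervals:
    print(f"{lower}-{upper}: f = {freq[(lower, upper)]}")

# Error check from step 5: every score fits, none left over
assert sum(freq.values()) == len(scores)
```

Running this reproduces the nine intervals of Table 3.5, from 37-40 at the top down to 5-8 at the bottom.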
Table 3.5 Revised Grouped Frequency Distribution for LOT Scores

Class Intervals of X     f
37-40                    1
33-36                    5
29-32                    5
25-28                    2
21-24                    6
17-20                    3
13-16                    1
 9-12                    6
 5-8                     1
                         Σf = 30
Note: The intervals in this table are based on the frequency distribution shown in Table 3.2, which in turn is
based on the raw scores from Table 3.1.
By examining the grouped frequency distribution in Table 3.5, you can get a much
clearer sense of the data. Most respondents scored above a 20 on the LOT, indicating that
the majority of the sample could be described as optimistic. There was a pocket of pes-
simists who scored in the 9-12 interval, as well as two groups of five respondents scoring
in the 29-32 and the 33-36 intervals, respectively. Finally, only 1 respondent fell into the
highest (37-40) and the lowest (5-8) class intervals-most scored in the middle of the
scale (21-24) or above. As you can see, a grouped frequency distribution like this one
provides much more information more quickly than a standard or even a relative fre-
quency distribution.
Table 3.6 Grouped Frequency Distribution of LOT Scores with True Limits and Class Intervals

Class Intervals of X     f
36.5-40.5                1
32.5-36.5                5
28.5-32.5                5
24.5-28.5                2
20.5-24.5                6
16.5-20.5                3
12.5-16.5                1
 8.5-12.5                6
 4.5-8.5                 1
                         Σf = 30

Note: The intervals in this table are based on the frequency distribution shown in Table 3.2, which in turn is based on the raw scores from Table 3.1.
Do you notice something interesting happening here with the interval values? The
upper true limit of a given class is the same value as the lower true limit of the next
class interval (or lower true limit of a given class is the upper true limit of the one
below it). Table 3.6 illustrates these facts clearly. Why does this matter? No information
is lost because no gaps appear in the LOT or whatever scale we choose to use. We may
be partitioning a continuous variable to make sense out of it, but by using true limits
with class intervals, we are making certain that nothing is being missed or overlooked.
The use of true limits within a grouped frequency distribution makes the display of
data precise. As we will see shortly, it also prepares the data for ready use in graphs.
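Because each true limit lies half a unit of measurement beyond the stated score, true limits can be generated mechanically. Here is a short Python illustration (mine, not the book's) applied to the Table 3.5 intervals:

```python
# Convert a stated class interval into its true limits by extending each
# boundary half a unit of measurement (0.5 for whole-number scores).
def true_limits(lower, upper, unit=1):
    half = unit / 2
    return (lower - half, upper + half)

stated = [(37, 40), (33, 36), (29, 32), (25, 28), (21, 24),
          (17, 20), (13, 16), (9, 12), (5, 8)]

limits = [true_limits(lo, hi) for lo, hi in stated]
print(limits[0])     # (36.5, 40.5)
print(limits[-1])    # (4.5, 8.5)

# The lower true limit of each class equals the upper true limit
# of the class below it, so no gaps appear in the scale.
for (lo1, hi1), (lo2, hi2) in zip(limits, limits[1:]):
    assert lo1 == hi2
```

The final loop verifies the "no gaps" property discussed above: adjacent intervals share a boundary exactly.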
Knowledge Base
1. Construct a relative frequency distribution using the following data, and be sure to
include a column for proportions and one for percentages.
8, 9, 8, 8, 9, 10, 11, 14, 16, 8, 9, 12, 14, 13, 13, 11, 9, 10
2. What is the Σf of the distribution in 1? What is the Σ X? What is the easiest way to calculate the Σ X from a frequency distribution?
3. A group of scores ranges from 40 to 80. How many possible scores are there in the data set? What interval size is appropriate? How many intervals would you recommend?
4. You have an interval ranging from 58 to 62. What are its true limits?
Answers
1.  X     f     p        %
    16    1     .0556     5.56
    15    0     0         0
    14    2     .1111    11.11
    13    2     .1111    11.11
    12    1     .0556     5.56
    11    2     .1111    11.11
    10    2     .1111    11.11
     9    4     .2222    22.22
     8    4     .2222    22.22
KEY TERM A graph is a diagram illustrating connections or relationships among two or more variables. Graphs are often made up of connecting lines or dots.

Graphs are different from figures, which in turn are different from tables. Learning the precise distinctions among them will enhance your understanding in the long run.

All graphs present data using a two-dimensional display composed of two perpendicular lines or axes. One axis, the x axis, is a horizontal line usually portrayed as running along the bottom of a graph (see Figure 3.2). The x axis is sometimes referred to as the abscissa, and it is used to plot independent variables. The other axis, or y axis, is represented by a vertical line extending up and down the left side of the graph (see Figure 3.2). The alternate name for the y axis is the ordinate, and it traditionally plots dependent measures. Within most graphs, the two axes meet in the bottom-left corner at the value of 0. As one moves along the horizontal x axis from the lowest value (i.e., 0) to the right, the values for x increase. Similarly, the values for y begin at 0 and increase as you move up the vertical line to their highest point (see Figure 3.2). In general, the range of values is typically higher along the x axis than the y axis. This guideline creates graphics that are proportionally pleasing to the eye as well as accurate where data presentation is concerned.
[Figure: the vertical y axis (ordinate) and the horizontal x axis (abscissa) meet at 0, with values of x increasing to the right and values of y increasing upward.]
Figure 3.2 The Two Axes (x and y) Used for Graphing Data
Bar Graphs
The most basic graph is the bar graph, which consists of enclosed lines forming vertical bars rising along the x axis. Bar graphs are used to illustrate data from nominal or ordinal scales. Because of the discrete nature of these scale types, bar graphs always have space between each bar (i.e., no scale values exist between the bars). Another defining characteristic of bar graphs is that the x axis will not illustrate any quantitative measure or scale; instead, it will include labels referring to the nominal or ordinal categories being displayed. The y axis in a bar graph is quantitative, as it corresponds to the frequency of the nominal or ordinal data within each category.

Bar graphs are one of the most basic forms of data representation.

Figure 3.3 shows a bar graph for a nominally scaled variable, the gender of respondents to an anonymous survey. The bar graph's virtue lies in the immediacy with which it conveys this (or any) information. You can tell with a glance at Figure 3.3 that more females than males completed and returned the survey: 35 females versus 20 males. Remember that gender is discrete, a given individual is categorized as either a male or a female, so that the gap between the two bars indicates the absence of any meaningful information (see Figure 3.3). The relationship between the bars representing the number of males versus females is thus clear, obvious, and easy to follow. Researchers often use bar graphs when they wish to draw readers' attention to straightforward but still important information; indeed, if the information is not particularly noteworthy, then it should appear in the text, not a graph.
Naturally, bar graphs can be more complex than the example presented in Figure 3.3. If you were graphing the respective number of people from four (or more) different campus organizations, you would label each bar, plot its frequency, and make certain that none of the bars touched one another. On the other hand, I would caution you against making any bar graph too complicated by having too many bars to examine. How many is too many? The answer is probably more personal preference than analytic prowess, but I would say no more than 8 or 10 bars (though preferably fewer) to a single bar graph. Any more than that defeats the purpose of a bar graph's simplicity, while at the same time slowing viewers substantially.
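Even without graphing software, the idea behind a bar graph can be conveyed with a crude text-based chart. The Python fragment below (an illustration of the concept, not part of the original text) uses the survey counts reported for Figure 3.3:

```python
# Crude text-based bar graph for a nominal variable (survey returns by gender).
# One '#' per respondent; the categories stay discrete, like separated bars.
counts = {"Males": 20, "Females": 35}

lines = [f"{category:>8} | {'#' * f} ({f})" for category, f in counts.items()]
print("\n".join(lines))
```

The longer row of `#` marks for females makes the frequency comparison visible at a glance, which is exactly the virtue of a bar graph.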
[Figure: two separated vertical bars, Males (f = 20) and Females (f = 35), with frequency plotted on the y axis.]
Figure 3.3 Bar Graph Indicating Number of Surveys Returned by Respondents' Gender
Histograms
Where slightly more complex or technical displays are concerned, a histogram is often a better choice than a traditional bar graph. This second level of graphing presents quantitative relationships from interval or ratio scales, and the data can be from either an ungrouped or a grouped frequency distribution. That is, frequencies can be plotted in a graph by individual scores or as they appear within ranges (i.e., class intervals) of scores. The main distinction between a bar graph and a histogram is that in the latter, the x axis displays a range of numerical values (see Figure 3.4). As noted in Figure 3.4, each of the bars has the same width and touches another, indicating that the scale of measurement for x is continuous and quantitative, not qualitative and discrete. Despite this additional quantitative information, a reader can still extract the main message of a given histogram. With a brief glance at the title and contents of Figure 3.4, for example, readers can see the frequency of scores, high (5 scores of "8") as well as low (1 score of "3" and "10," respectively), on a quiz.

Due to their quantitative nature, histograms are usually more complex than bar graphs.
Of course, the real advantage that histograms have over bar graphs is the way they portray class intervals with true limits. Indeed, there are two ways to indicate the true limits of a grouped frequency distribution within a histogram. In the first place, the true limits could be suggested by showing the raw scores appearing at the upper and lower limit of each class interval, but then extending the width of each bar one half a unit of measure. This method is shown in Figure 3.5 with a different set of hypothetical quiz scores. As you can see, the true limits are not labeled in the graph, but they are delimited by the placement of the columns (e.g., the vertical lines appearing halfway between 3 and 4, as well as 6 and 7, to indicate the upper and lower true limits of the class interval 4-6). For purposes of convenience and comparison, a table of the grouped frequency distribution data also appears in Figure 3.5.
J The second method for displaying grouped frequency data actually includes the
true limits along the x axis. Figure 3.6 presents the same data shown in Figure 3.5, but
this time the true limits are clearly indicated. Thus, the same information can be con-
veyed in a slightly different way. In some situations, a researcher might want to draw
[Figure 3.4: histogram of quiz scores; the x axis is labeled "Scores on Quiz" (3 through 10), the y axis shows frequency, and an inset table lists each score's frequency (e.g., a score of 8 occurred five times).]
[Figure 3.5: histogram with bars spanning the class intervals 4-6 (f = 1), 7-9 (f = 3), 10-12 (f = 2), and 13-15 (f = 5); the x axis is numbered 1 through 15, and the inset grouped frequency distribution appears in the upper right corner.]
Figure 3.5 Histogram of Grouped Frequency Distribution of Quiz Scores
Note: This graph suggests the true limits of a grouped frequency distribution. The vertical lines associated with each bar appear halfway above and below each of the intervals identified in the grouped frequency distribution shown in the above-right corner of the graph.
attention to the true limits of a grouped frequency distribution by actually placing them
in a figure. Other times, it may be appropriate to simply suggest their presence. As a data
analyst, you can determine which approach best fits your research needs.
Frequency Polygons
Data display is not limited to the boxlike presentation of bar graphs or histograms. A re-
searcher can also display frequency data within a frequency polygon. As you may re-
member from geometry, a polygon is a multisided figure that has three or more sides,
which are usually straight or flat. A frequency polygon illustrates data from grouped fre-
quency distributions using a series of interconnected lines and is meant for graphing
[Figure 3.6: histogram with the true limits labeled along the x axis (0.5, 1.5, ..., 15.5); the inset table lists the class intervals 3.5-6.5 (f = 1), 6.5-9.5 (f = 3), 9.5-12.5 (f = 2), and 12.5-15.5 (f = 5).]
Figure 3.6 Histogram of Grouped Frequency Distribution of Quiz Scores with True Limits
Note: This histogram identifies the true limits of the grouped frequency distribution shown in the upper right
corner of the graph. Please note that these data are the same as those shown in Figure 3.5.
data from interval or ratio scales. These lines are connected to form the frequency poly-
gon, or "curve," as some researchers prefer to call it.
To create a basic frequency polygon, a single dot is placed above each number
(e.g., score, category) along the x axis. The height or placement of the dot is determined
by the frequency corresponding to the number. Once all the dots are placed in the space
to the right of the y axis and above the x axis, a line is then drawn to connect them all to-
gether. It is traditional to begin drawing the line at the 0 point where the two axes join.
The line forms the polygon, as the space between any two points creates a "side" of the
figure.
Figure 3.7 shows a basic frequency polygon plotting the grouped frequency data
from Figure 3.4 (i.e., refer especially to the small grouped frequency distribution shown
in the upper right portion of Figure 3.4).
What if a researcher wants to show a frequency polygon that displays class intervals,
even the true limits associated with class intervals? This desire can be accommodated
with little effort, as only minor calculations are involved. The goal is to place the dot above the midpoint of a class interval. The minor calculation to locate this midpoint is determining the average of the class interval limits. If the class interval runs from 10 to 12, you would add the upper and lower limits together and then divide by 2 to find the midpoint. The formula would be:
[3.9.1]  midpoint of the interval = (upper limit + lower limit) / 2, or
[3.9.2]  (10 + 12) / 2 = 22 / 2 = 11.
Naturally, the same procedure would be performed if true limits were being graphed. The true limits for the interval 10-12 are 9.5 and 12.5, respectively, so the midpoint is (9.5 + 12.5) / 2 = 22 / 2 = 11.
[Figure: a frequency polygon formed by dots connected with lines above each quiz score from 1 through 10.]
Figure 3.7 Frequency Polygon of Scores on a 10-point Quiz
[Figure: a frequency polygon plotted above the true limits 0.5 through 15.5 on the x axis; the inset table lists the class intervals with their frequencies and midpoints: 3.5-6.5 (f = 1, midpoint 5), 6.5-9.5 (f = 3, midpoint 8), 9.5-12.5 (f = 2, midpoint 11), and 12.5-15.5 (f = 5, midpoint 14).]
Figure 3.8 Frequency Polygon for Hypothetical Quiz Scores with True Limits
Note: The data in this graph were taken from Figure 3.5.
Figure 3.8 is a frequency polygon based on the data from Figure 3.5. The polygon is
formed by connecting the dots placed above the midpoint of the true limits of each class
interval.
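The midpoint arithmetic is simple enough to automate. The Python fragment below (an illustration, not from the text) computes the plotting midpoints for the true limits used in Figure 3.8:

```python
# Midpoint of a class interval: the average of its lower and upper limits.
def midpoint(lower, upper):
    return (lower + upper) / 2

true_limit_intervals = [(3.5, 6.5), (6.5, 9.5), (9.5, 12.5), (12.5, 15.5)]

midpoints = [midpoint(lo, hi) for lo, hi in true_limit_intervals]
print(midpoints)    # [5.0, 8.0, 11.0, 14.0]
```

These are exactly the midpoints listed in the inset table of Figure 3.8, the x positions above which each dot of the polygon is placed.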
We mentioned that some of the goals of graphing were the quick and clear presentation of information so interested observers can understand the relationships between two variables of interest. As a reader of books and periodicals in the behavioral sciences, however, you must be wary about how data are presented to you. Few graphs intentionally lie about actual relationships in data (but see Tufte, 1983); many graphs, however, can be misleading about the magnitude of relationships between variables. You should learn to detect when you are (potentially) being misled by purported relationships between variables.

Besides being a consumer of graphic images, if you will, it is very likely that you will also be a creator of graphs and charts. Many readers, for example, will want to present data from some class project in a graph. Others will conduct independent or supervised research with a faculty member who will expect that they know how to summarize data relationships in concise but meaningful forms. Still other students are planning to attend graduate school or begin a career where quantitative skills (and information) abound. For these and many other reasons, it is likely that you will be pressed to create a histogram or frequency polygon at some point in the future. Why not plan ahead by learning to identify and avoid creating biased or otherwise misleading data presentations right now?
"Data is power (sic)."
-Anonymous Researcher

Edward R. Tufte (1983), the dean of graphic images, has published several books detailing how to properly present data in graphic form. He argues for what he calls graphical integrity. In other words, a graph should mean what it actually says, and it should not include any deceptive information. As Tufte put it, "Graphical excellence begins with telling the truth about the data" (Tufte, 1983, p. 53). To help readers and producers of graphics develop acumen in this regard, Tufte listed several principles to keep in mind as you examine or develop any graphic display. Here are a few of them (from Tufte, 1983, p. 77):
• Emphasize how the data vary, not how the design presenting them can vary. Focus on the numbers, not fonts, colors, textures, or other stylistic additions that distract readers.
• To defeat distortion and ambiguity in a data presentation, clear and detailed labels should appear with any graph. Where necessary, explanations can be written on the actual graph, just as any important events in the data should be labeled.
• A good graphic should never quote data out of their proper context.
The last principle, presenting information in proper context, is a good one to use as an illustration of how factual relationships within a set of data can be properly used or easily abused. By "context," we refer to how some data or datum compares to other relevant information. Consider graph (a) in Figure 3.9, which shows traffic deaths in the state of Connecticut before (1955) and after (1956) stronger enforcement of speed limits. It appears to show a strong and favorable event, a dramatic decline in traffic deaths once enforcement became stronger. The message of graph (a) is strong and clear: Deaths dropped from 325 in one year to somewhere around 280 the next year.
[Figure 3.9. Panel (a): Connecticut traffic deaths before (1955) and after (1956) stricter enforcement by the police against cars exceeding the speed limit; the y axis runs from 275 to 325. Panel (b): Connecticut traffic deaths, 1951-1959; the y axis runs from 225 to 325.]
But wait a moment. Take a look at graph (b) in Figure 3.9. The addition of more contextual information, the number of deaths a few years before and after the stricter speed limit enforcement, tells us more. Yes, there were fewer deaths in 1956, but there were actually many fewer deaths still in the years 1951 through 1954, before enforcement began in earnest. In fact, the decline continues beyond 1956 to levels approximating the earlier mortality rates. Thus, context matters: As critical students of behavioral science, we must always ask, "Compared to what?"

Cultivate the habit of asking "Compared to what?" whenever you examine any data presentation, including your own.

Our search for useful contextual information is not over yet, however. The researchers who actually worked with these data wondered if the decline in traffic fatalities was unique to Connecticut or if adjacent states were in any way affected (Campbell, 1969; Campbell & Ross, 1970). Figure 3.10 provides even more contextual information because it shows the fatalities for three states plus Connecticut pre and post speed limit changes in the latter. What can we learn from the data? Figure 3.10 clearly shows that Connecticut was not the only state to experience declines in traffic deaths between 1955 and 1956. We would not have known that by examining graph (a) or (b) in Figure 3.9. A change may appear to be a dramatic one, but a good graph must provide adequate information for readers to concur with, and critically accept, such a judgment.
Care must be taken whether you are a producer or consumer of graphs, and developing a constructive but appropriately skeptical regard for graphs can be healthy. Some guidelines for developing accurate graphs are provided in Table 3.7. I recommend that you consult these guidelines whenever you are creating or evaluating a graph.
New Alternatives for Graphing Data: Exploratory Data Analysis
traditional, and advanced statistical procedures (what Tukey calls confirmatory data
analyses) that are later applied to the data. In other words, EDA is used before a re-
searcher even determines whether a study's results confirm his or her hypotheses. You
will learn many confirmatory analyses later in this book.
What is involved in EDA? Taking a fresh look at one's data by getting a preliminary feel for it. EDA comprises guidelines for quickly organizing, summarizing, and interpreting data from a piece of research. We will not (indeed, we cannot) review all of the procedures entailed in EDA, but we can learn to use two of them. These procedures are a novel way to present numbers in a frequency distribution and a quick way to tally numbers. Both procedures will help you think about the rest of the material in this book, and no doubt they will prove to be useful for any research you do in the future. If you become interested in EDA, I heartily recommend that you consult Tukey's (1977) now classic work.
Stem and Leaf Diagrams
Previously, we tried to make sense out of the LOT scores shown in Table 3.1, and we concluded that if such data were not organized, it would be difficult to quickly and efficiently point to the high and low scores, the most frequent scores, and so on. We organized the LOT scores into a frequency distribution, which enabled us to answer these (and other) questions with greater ease. There is yet another way to organize the same sorts of data for quick perusal, a technique that Tukey (1977) calls a stem and leaf diagram.
KEY TERM Stem and leaf diagrams are numerical graphs that promote exploration of a data set while retaining the values of the original observations.
Imagine that you are working with the test scores shown in Table 3.8. These are scores on a 100-point Introductory Psychology test given to 50 students. These scores have been converted into the stem and leaf diagram shown in Figure 3.11. The vertical line between the two sets of numbers divides the "stems" from their "leaves." The numbers to the left of the line are the stems, which represent the first or base digit of a two-digit number (e.g., the stem of 5 accounts for all of the numbers between 50 and 59). The numbers to the right of the vertical line, the leaves, are the second digits in these two-digit numbers.
Chapter 3 Frequency Distributions, Graphing, and Data Display
Why would anyone want to "lie" with a biased graph? The creators of a deceiving graph may not be out to lie to people but, rather, to persuade them to adopt some point of view. Deceptive data displays are often used to convince an audience (a jury, a school board, some trustee group) to commit to one choice and not another. A good way to persuade people is by presenting them with graphic evidence that appears simple, logical, and incontrovertible. Not only should you avoid persuading people with untruthful data, you should also avoid being persuaded by others.
Let's consider a hypothetical example. Imagine you are reading the newspaper and you see an
advertisement for a new medically supervised "wonder" diet guaranteed to help people lose un-
wanted pounds. Placed squarely in the center of the page is a graph purporting to show dramatic
weight loss [see graph (a)]. The x axis shows three nominally scaled diet groups and the y axis
plots the average number of pounds lost after 1 month. As you can see, clients on the "wonder"
diet appear to have lost much more weight than those on a competitor's diet program or a control
group.
[Two bar graphs of the hypothetical diet data. Both plot the three nominally scaled diet groups (Wonder Diet, Competitor Diet, Control) on the x axis and average pounds lost after 1 month on the y axis. Graph (a) uses a y axis running from 0 to 7, so the bars look dramatically different; graph (b) uses a y axis running from 0 to 14, so the same differences look modest.]
But was the weight loss actually dramatic? To answer that question, take a look at graph (b), which presents the same information, but the height of the bars is not as great, so that the weight loss does not appear as great. In fact, graph (b) indicates that weight loss on the "wonder" diet was not miraculous; indeed, it was only a couple of pounds higher than the competing diet.
Is one graph better than another? No, not really. Graph (a) exaggerates the relationship between diet and weight loss, while graph (b) indicates that the between-group differences are slight and not at all interesting. If truth rather than deception were the goal, a better route for the researcher would be to place numerical information in a table rather than a bar graph.
The string of numbers 334455 represents the 6 scores that fell in the 50 to 59 range (i.e., 53, 53, 54, 54, 55, and 55).
The stem and leaf diagram's virtue is that all the original scores are still visible in the distribution. Readers still receive the desired information about the relative frequency of scores in several ways. One can see the actual data, as well as the range of scores where the most frequent (i.e., scores in the 80 range) or least frequent (i.e., scores in the 30 range) observations fell (see Figure 3.11). With one glance, too, the high and the low scores can be readily discerned (here, 99 and 34, respectively). The shape of the distribution is also apparent, and a reader can get a sense of whether the
Table 3.8 Introductory Psychology Test Scores
55 34 66 87 85 88 54 44 39 84
74 88 92 55 70 63 41 90 88 72
66 65 91 53 40 99 81 73 65 87
92 88 70 86 74 90 88 54 87 73
88 45 92 84 76 53 76 74 80 85
3 | 49
4 | 0145
5 | 334455
6 | 35566
7 | 002334466
8 | 014455677788888
9 | 0012229
Figure 3.11 Stem and Leaf Diagram of Introductory Psychology Test Scores
Note: These data are from Table 3.8.
data are uniformly distributed or grouped in one or more areas. As shown by Figure 3.11, the scores appear to be weighted toward the high end, as most scores are above 60 (as they should be!).
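The grouping just described is mechanical enough to automate. The short Python sketch below is my own illustration (the text itself contains no code, and the function name `stem_and_leaf` is mine): it splits each two-digit score into a tens-digit stem and a units-digit leaf, then prints the sorted leaves for each stem.

```python
# Build a stem and leaf diagram for two-digit scores (an illustrative sketch).
def stem_and_leaf(scores):
    """Return a dict mapping each stem (tens digit) to its string of sorted leaves."""
    diagram = {}
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)   # e.g., 87 -> stem 8, leaf 7
        diagram.setdefault(stem, []).append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in diagram.items()}

# The 50 Introductory Psychology test scores from Table 3.8.
scores = [55, 34, 66, 87, 85, 88, 54, 44, 39, 84,
          74, 88, 92, 55, 70, 63, 41, 90, 88, 72,
          66, 65, 91, 53, 40, 99, 81, 73, 65, 87,
          92, 88, 70, 86, 74, 90, 88, 54, 87, 73,
          88, 45, 92, 84, 76, 53, 76, 74, 80, 85]

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", leaves)
```

Because sorting the whole scores orders them by stem and then by leaf, the leaves within each stem come out already sorted, matching the layout of Figure 3.11.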
Can we do more with these data using the stem and leaf diagram approach? Absolutely. For instance, we need not look at ranges of scores in a stem and leaf diagram by units of 10 (i.e., 0 to 9 values). Following Tukey's (1977) encouragement to get a feel for the data, we can divide the leaves, if you will, in half: that is, we look at score values from 0 to 4 in one stem and those ranging from 5 to 9 in another stem. This "breakpoint" effectively divides each potential range of scores in half, and enables us to examine a distribution of scores with a still finer eye. These new breakpoints are shown in Figure 3.12. The asterisk (*) next to the first stem for 4 indicates that the leaf values can range from 40 to 44. The dot (.) next to the second stem for 4, then, covers values from 45 to 49.
What do we know now? As shown by Figure 3.12, we can now see that most scores fell into the low 70 range and the high 80 range. Looking closer, it is apparent that there is a concentrated clustering of scores from the high 60s to the low 70s, and then again from the high 80s to the low 90s (notice the two "peaks" represented by the leaf spreads in these ranges). As shown by the shape of the distribution, the majority of the test scores are still shown to fall in the 60 and above range (cf. Figure 3.11), while fewer scores are found in the lower end or "tail" of the distribution. Note, too, that with the use of breakpoints in Figure 3.12, the high and low scores in the distribution are now even more apparent. There is a certain advantage, then, to looking at a data distribution from a variety of perspectives before the actual analyses get underway, and EDA is particularly helpful in this regard.
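The breakpoint idea is a small variation on the same sketch. The code below is again my own illustration (not the book's): it labels low leaves (0 to 4) with an asterisk and high leaves (5 to 9) with a dot, following the notation described for Figure 3.12.

```python
# Split each stem at a breakpoint of 5, as in Figure 3.12 (an illustrative sketch).
def split_stem_and_leaf(scores):
    """Map labels like '5*' (leaves 0-4) and '5.' (leaves 5-9) to leaf strings."""
    diagram = {}
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        label = f"{stem}*" if leaf <= 4 else f"{stem}."
        diagram.setdefault(label, []).append(str(leaf))
    return {label: "".join(leaves) for label, leaves in diagram.items()}

# The 50 Introductory Psychology test scores from Table 3.8.
scores = [55, 34, 66, 87, 85, 88, 54, 44, 39, 84,
          74, 88, 92, 55, 70, 63, 41, 90, 88, 72,
          66, 65, 91, 53, 40, 99, 81, 73, 65, 87,
          92, 88, 70, 86, 74, 90, 88, 54, 87, 73,
          88, 45, 92, 84, 76, 53, 76, 74, 80, 85]

for label, leaves in split_stem_and_leaf(scores).items():
    print(label, "|", leaves)
```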
3* | 4
3. | 9
4* | 014
4. | 5
5* | 3344
5. | 55
6* | 3
6. | 5566
7* | 0023344
7. | 66
8* | 0144
8. | 55677788888
9* | 001222
9. | 9
Figure 3.12 Stem and Leaf Diagram of Introductory Psychology Test Scores
Note: These data are from Table 3.8.
As you might guess, it is possible to make even smaller breakpoints than are shown in Figure 3.12. The stems in a stem and leaf diagram can be made into interval classes like those we reviewed earlier in the chapter, causing their leaves to reflect the greater or lesser width of the classes. Further, the digits in a stem and leaf diagram are not restricted
Tukey's Tallies
What happens if the observations within a stem and leaf diagram become too numerous? Tukey (1977) suggests that once 20 or more leaves appear on a stem, the data analyst is apt to feel cramped. Stem and leaf diagrams, then, are very useful for examining smaller sets of data, but what about larger distributions? Can we retain their information but render them easy to handle visually? Tukey recommends relying on a tallying system that uses dots first, then lines that form boxes, and then crossed lines that represent a final symbol for 10. Tukey's line box tally system is shown in Figure 3.13. It is much easier to use than the old-fashioned method of counting by lines of five.
Why is the line box system better? Because even the most careful data analyst can make a mistake as he or she adds yet another line to make a bundle of five lines to count. I used the older tallying system more times than I care to remember; invariably I would miscount, make an error, and have to start all over again. I became frustrated by my error and annoyed when I noticed the time I wasted trying to correct it. The Tukey (1977) tallying system prevents such mistakes, is pleasing to the eye, and reveals accurate counts of observations quickly.
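One way to see the structure of the line box system is to decompose a count into complete symbols of 10 plus a partial symbol. The sketch below is my own rendering of the scheme the text describes (dots for the first four marks, box sides for the next four, crossing diagonals for the last two); the function name is hypothetical, not Tukey's.

```python
# Decompose a count into Tukey tally components (an illustrative sketch).
def tukey_tally(n):
    """Return (complete_tens, dots, sides, diagonals) for a count n >= 0."""
    complete_tens, remainder = divmod(n, 10)
    dots = min(remainder, 4)               # first four marks are dots
    sides = min(max(remainder - 4, 0), 4)  # next four connect the dots into a box
    diagonals = max(remainder - 8, 0)      # final two marks cross the box
    return complete_tens, dots, sides, diagonals

print(tukey_tally(26))  # 2 complete symbols of 10, then 4 dots and 2 box sides
```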
[Figure 3.13 Tukey's (1977) Tallying System. The left column shows the tally symbols for counts 1 through 10: dots for counts 1 through 4, lines connecting the dots into a box for counts 5 through 8, and crossing diagonals for counts 9 and 10. The upper right shows a simple frequency distribution with X values, tally symbols, and frequencies; the lower left shows sample tallies for larger counts such as 22, 34, and 50.]
Tukey's tallying system prevents counting errors, a bane of good data analysis.

Consider the simple frequency distribution shown in the upper right section of Figure 3.13. The X observation values are noted to the left of the vertical line and the symbols for the tally counts of the observations are shown to its right. For your convenience, the actual frequency (f) of the tallies is shown to the far right. Notice how quickly you can pick up the tallying symbols and know what the count is with a glance. As with the stem and leaf diagram, the graphic quality of the tallies enables viewers to get a sense of what the distribution of scores is like and where the heaviest concentration of scores falls. Other larger sample counts are shown in the bottom left of Figure 3.13.
By the way, it makes no difference where you start a box with four dots, nor which line you draw first when you want to connect two points (see the column labeled Tally in Figure 3.13). An added virtue of this system for tallying is that there is no correct order; like most EDA procedures, this one is very flexible for both researcher and data.
Knowledge Base
1. Traditionally, which axis is used to plot the dependent measure? Which one plots the independent variable?
2. True or False: A histogram and a frequency polygon based on the same data set convey the same information.
3. Using the data from the frequency distribution that follows, plot a histogram and a frequency polygon.
X  f
7  0
6  8
5  4
4  5
3  5
2  2
1  2
4. True or False: In a bar graph, both the x and the y axes are based on quantitative
scales.
5. Why is contextual information important in graphs?
6. How do stem and leaf diagrams differ from other types of graphs?
7. Place the following data in a stem and leaf diagram:
33 22 24 35 47 50 53 33 30 29 22 41 40 33 21 42 59 20 43 27
8. How many observations are represented by the following tally:
[a tally of two complete crossed-box symbols plus a partial box]
Answers
3. [A histogram and frequency polygon of the tabled data: X values 1 through 7 on the x axis, frequencies 0 through 8 on the y axis.]
4. False: In a bar graph, only the y axis is quantitative; the x axis is qualitative.
5. Critical reviewers should be able to ask, "Compared to what?" about any particular result shown in a graph. To do so, context is important: one result cannot be properly understood in isolation from other relevant information.
6. Stem and leaf diagrams retain and display the actual raw scores in a set of data.
7.
2 | 0122479
3 | 03335
4 | 01237
5 | 039
8. 26
Envisioning the Shape of Distributions

KEY TERM A normal distribution is a hypothetical, bell-shaped curve wherein the majority of observations appear at or near the midpoint of the distribution.
A typical normal distribution is shown in (a) of Figure 3.14. As you can readily see, the distribution is symmetric around its midpoint; that is, if you split the bell curve in half, each side is the mirror image of the other. Most of the observations or scores fall at or around the midpoint of the distribution, and fewer and fewer fall away from this center (the observations occur less frequently) as you move into what are usually referred to as the "tails" of the distribution. We will review the statistical properties of the normal distribution that make it so helpful to psychologists and other behavioral scientists later in the book.
[Figure 3.14 Four Distribution Shapes: (a) a normal distribution, (b) a bimodal distribution, (c) a positively skewed distribution, and (d) a negatively skewed distribution.]
The classic example of the normal distribution is apt to induce fear and trembling in most students, and memories of things long past in the minds of many former students. I refer, of course, to the distribution of grades within the typical college class of yore: The majority of students earned grades in the C (i.e., average) range, which is represented by the bulk of observations underneath the "bell" in the normal curve. Moving outward to the left and right of the bell, a smaller number of students received grades of D and B, respectively. Finally, in the tails of the distribution fall the fewest students, those receiving the highest (i.e., As) and lowest (i.e., Fs) grades. Some readers will be surprised to learn that it used to be assumed that a portion of students enrolled in any given class would be anticipated to fail it, just as the vast majority were expected to pass it with only average grades. Fortunately, student goals and desires, as well as faculty teaching philosophies, have moved away somewhat from the tenets of the normal curve where grading is concerned.
Anything that can be measured, from behaviors to beliefs, will have a distribution of scores comprising some shape.

It is also possible to have a distribution that looks like two normal distributions put together. This two-"humped" distribution is still deemed symmetrical, and it is called a bimodal distribution. "Bi" means two and "modal" refers to the frequency of scores, an issue we will take up again in the next chapter. For the present, think of a bimodal distribution as one that has two more or less symmetrical curves in it (if the distribution has more than two score clusters or curves, it is sometimes described as being multimodal). Chances are that you have been part of a bimodal distribution at one time or another. Have you ever taken a test where half of the students score either very highly (A or B range) or very poorly (D or F range)? Few or no students receive an average grade of C on this all or nothing test. In this case, the range of test scores is said to be bimodal, as one of the humps corresponds to very high grades and the other to very low grades. A bimodal distribution is shown by (b) in Figure 3.14.
Do the distributions of data that behavioral scientists work with tend to be normal
ones? Not necessarily. In fact, statisticians have developed various descriptions for the
different forms distributions can take. One of the chief features of a distribution is a
property called skew, which refers to the lack of symmetry within a distribution.
KEY TERM Skew refers to a nonsymmetrical distribution whose observations cluster at one end.
When a distribution has a positive skew, its observations tend to be clustered at the
lower end of the distribution-very few observations fall into the upper region of the
scale, which has a long but low tail. An example of positive skew is shown in (c) of Fig-
ure 3.14. Positive skew can occur, for example, when a test is so difficult that the distri-
bution of scores reveals that the majority of people who took it performed poorly-only
a relative handful scores highly [see (c) in Figure 3.14]. Negative skew, then, occurs
when many observations fall at the upper end of the scale, which can be seen in (d) in
Figure 3.14. As shown in (d), the tail in a negatively skewed distribution is long and low
to the left of the scale. A test that is too easy will show negative skew in its distribution of
scores-most everyone scores relatively highly on it, and only a few test-takers receive
poor or failing scores [see (d) in Figure 3.14]. The distribution shown in the stem and
leaf diagrams in Figures 3.11 and 3.12 also has a negative skew. Normal distributions do
not have skew, of course, because the observations are symmetric about the midpoints
of these distributions [see (a) in Figure 3.14]. Another prominent feature of some dis-
tributions is discussed in Data Box 3.C.
Distributions of observations can be normal or they can be skewed. In either case statisticians sometimes like to further characterize what sort of clustering of scores takes place within a given distribution. Is a distribution "skinny" or "fat"? This relative peakedness or flatness of a curve is called kurtosis. There are three categories of kurtosis.
a. A normal distribution is called mesokurtic because most scores appear in its middle ("meso" refers to middle). The normal curve shown in curve (a) is mesokurtic.
b. When a curve is very tall and skinny with only a quasi-normal shape, it is referred to as leptokurtic. Curve (b) is leptokurtic ("lepto" refers to thin).
c. Fatter curves that still possess a somewhat normal shape [see curve (c)] are said to be platykurtic. As you can see, the spread of scores is broad or flat, the definition of "platy."
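Skew and kurtosis can be quantified rather than just eyeballed. The Python sketch below is mine, not the book's: it computes the usual moment-based statistics (the average cubed and fourth-power z scores; subtracting 3 from the latter so a normal curve scores 0 is a common convention I am assuming here). Applied to the Table 3.8 test scores, which bunch at the high end with a long lower tail, the skewness comes out negative.

```python
# Moment-based shape statistics (an illustrative sketch, not the book's code).
def skewness(values):
    """Average cubed z score; negative when the long tail points left."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in values) / n

def kurtosis(values):
    """Average fourth-power z score minus 3 (0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return sum(((x - mean) / sd) ** 4 for x in values) / n - 3

# The Table 3.8 test scores: weighted toward the high end, tail to the left.
scores = [55, 34, 66, 87, 85, 88, 54, 44, 39, 84,
          74, 88, 92, 55, 70, 63, 41, 90, 88, 72,
          66, 65, 91, 53, 40, 99, 81, 73, 65, 87,
          92, 88, 70, 86, 74, 90, 88, 54, 87, 73,
          88, 45, 92, 84, 76, 53, 76, 74, 80, 85]

print(round(skewness(scores), 2), round(kurtosis(scores), 2))
```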
Information can be presented in elegant and informative ways. An example of such graphical elegance is shown below. The image represents Napoleon Bonaparte's ill-fated march to Moscow in the war of 1812. It was created by a French engineer, Charles Joseph Minard (1781-1870). Minard
deftly shows what happened to Napoleon's army as it marched from the Polish-Russian border
near the Niemen river toward Moscow and then back again. The thick band at the top of the
graphic represents the army (which began with 422,000 men) as it invaded Russia during June of
1812. As the band moves to the right, notice that it reduces in width, Minard's way of illustrating
Napoleon's losses-indeed, by the time the army reaches Moscow in September, there are only
100,000 soldiers remaining (see the far right of the graphic). By then, Moscow was sacked and de-
serted.
The black band in the lower portion of the graphic is Napoleon's army in retreat-black is
appropriate, of course, because as your eye follows the band back toward the Niemen River, it
grows ever smaller in width. Why? The winter was bitterly cold, as noted by the temperatures and
months Minard provided at the bottom of the graphic. Soldiers died or were lost by the thou-
sands. The numbers running along the bottom of the black band represent the number of
soldiers left as the army returns to its point of origin-only 10,000 men remained alive at the
campaign's end.
Minard's presentation of the data is simple but staggering. This tragic tale is conveyed graphically through five variables: the army's size, its location, the direction of travel, and the temperatures and dates during the retreat from Moscow. Tufte (1983, p. 40) remarked about this graphic that, "It may well be the best statistical graphic ever drawn." I want to draw your attention to this and similar data displays (cf. Dr. Snow's map of cholera in central London; see p. 86) to remind you that relationships between variables can be powerful. Such presentations can, and should, be portrayed in ways that capture a viewer's attention, even imagination, but the integrity and factual basis of the data must not be sacrificed. Although you will probably not create images like Snow's or Minard's, allow their careful work to inspire you to develop meaningful graphs and tables for your own data.
[Minard's chart. Its title reads: "Carte figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813. Dressée par M. Minard, Inspecteur Général des Ponts et Chaussées en retraite" (Figurative map of the successive losses in men of the French Army in the Russian campaign of 1812-1813). The temperature scale along the bottom of the original is not legible in this reproduction.]
Percentiles and Percentile Ranks
Cumulative Frequency
Before any percentile rank or percentile can be calculated, however, we first need to determine the cumulative frequency for the data we are analyzing.
KEY TERM Cumulative frequency refers to the number of values within a given interval added to the total number of values that fall below that interval.
Cumulative frequencies are organized into what are called cumulative frequency distributions.
Table 3.9 shows a cumulative frequency distribution for the LOT data we examined earlier in the chapter. As you can see, we created the intervals previously and their original frequencies were retained. The third column of Table 3.9 is labeled "cf" for "cumulative frequency." The convention associated with creating cumulative frequencies is to begin with the frequency of the lowest class interval (see 1 in interval 4.5-8.5); this is the base value of the cf. This base value of 1 is then added to the value in the class interval immediately above it (here, 6 in 8.5-12.5) to create the cf for this next-to-the-lowest class interval, which is 7. The same procedure is used to determine the cf for the next highest (the third one from the bottom) interval (i.e., 7 + 1 = 8), and so on, until all of the frequencies are accounted for in the topmost class interval. Note that the total cf = 30, which is equal to the Σf and, in turn, N. Please make sure you understand why this is (and must be) so.
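The bottom-up accumulation just described is a running sum. A minimal Python sketch (my own, using the LOT interval frequencies from Table 3.9, listed from the lowest class interval to the highest):

```python
from itertools import accumulate

# LOT interval frequencies from Table 3.9, lowest class interval (4.5-8.5)
# first, highest (36.5-40.5) last.
freqs = [1, 6, 1, 3, 6, 2, 5, 5, 1]

# Cumulative frequency: each entry is its own f plus everything below it.
cf = list(accumulate(freqs))
print(cf)  # [1, 7, 8, 11, 17, 19, 24, 29, 30]
```

The final entry necessarily equals Σf and, in turn, N, which is why the total cf must be 30 here.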
What can we learn from the cf of the LOT scores? Take another look at Table 3.9. For example, consider the 6 respondents who had LOT scores falling in the interval 20.5-24.5. We know that 17 (i.e., the cf for that class interval) respondents in the sample received LOT scores that were less than or equal to 24.5, the upper true limit of the interval. We also know, then, that 13 people in the sample had LOT scores that were greater than 24.5, the lower true limit of the above interval. How so? If we know that the Σf is 30, then all we need to do is subtract 17 from that sum (i.e., 30 - 17 = 13). Thus, our reliance on
cumulative frequencies provides a bit more information about the relative positions of
scores than was available through a grouped frequency distribution.
Please note that cumulative frequencies need not be calculated using data that are
already organized into class intervals. One could determine cumulative frequencies for
frequencies of individual scores. Thus, for example, a column of cumulative frequencies
could be added to the frequency distribution of LOT scores shown back in Table 3.2. Be-
cause larger samples of data are usually presented in class intervals (with or without true
limits), however, it is likely that any cumulative frequencies you work with will already
be in some sort of grouped frequency distribution. As a result, I feel it is more beneficial
for you to see and work with examples that present cumulative frequencies for data that
are already organized into class intervals.
Cumulative Percentage
Once they are available, cumulative frequencies, in turn, can be converted into cumulative percentages.
KEY TERM A cumulative percentage is the percentage of values within a given interval added to the total percentage of values that fall below that interval.
Cumulative percentages enable you to consider a distribution of data as having 100 equal parts, and can be determined from cumulative frequencies with ease. A cumulative frequency can be changed into a cumulative percentage simply by dividing the cf value by N and multiplying the resulting quotient by 100, or:

[3.10.1] cumulative percentage = (cumulative frequency / total number of scores (i.e., N)) × 100.

Thus, if we want to know the cumulative percentage corresponding to the cf of 17 in the class interval 20.5-24.5 in Table 3.9, the calculation would be:

[3.10.2] cumulative percentage = (17/30) × 100 = 56.67%.
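Formula [3.10.1] can be applied to every interval at once. A Python sketch of mine, starting from the cumulative frequencies of Table 3.9 (lowest interval first):

```python
# Convert cumulative frequencies to cumulative percentages: (cf / N) * 100.
cf = [1, 7, 8, 11, 17, 19, 24, 29, 30]   # Table 3.9, lowest interval first
N = cf[-1]                               # total number of scores (30)

cum_pct = [round(c / N * 100, 2) for c in cf]
print(cum_pct)  # the 20.5-24.5 interval (cf = 17) gives 56.67
```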
Table 3.10
Class Interval   f    cf    %        c%
36.5-40.5        1    30    3.33%    100.00%
32.5-36.5        5    29    16.67%   96.67%
28.5-32.5        5    24    16.67%   80.00%
24.5-28.5        2    19    6.67%    63.33%
20.5-24.5        6    17    20.00%   56.67%
16.5-20.5        3    11    10.00%   36.67%
12.5-16.5        1    8     3.33%    26.67%
8.5-12.5         6    7     20.00%   23.33%
4.5-8.5          1    1     3.33%    3.33%
Σf = 30
Note: The intervals in this table are based on the frequency distribution shown in Table 3.2, which in turn is based on the raw scores from Table 3.1.
where
cf_ll = the cumulative frequency of the class interval below the one containing X_i,
X_i = the score to be converted to a percentile rank,
section back into its original score. Once again, we will rely on a useful formula and series of steps provided by Runyon and colleagues:

[3.12.1] X_PR = X_ll + w(cf_PR - cf_ll)/f_i,

where
cf_PR = the cumulative frequency found by multiplying the known percentile rank (PR) by the sample N and then dividing by 100,
cf_ll = the cumulative frequency of the class below the one containing the known PR,
X_ll = the score at the lower true limit of the class containing the known PR,
2. Using information provided in Table 3.10, we then substitute numbers for the variables shown in formula [3.12.1]. First, we now know that the cf_PR falls within the class interval of 32.5-36.5. Thus, we know that the value of X_ll is 32.5. The cf below this class interval is 24, so that cf_ll = 24. Finally, the f_i of the class containing the PR is 5, and the width (w) of the class intervals remains 4.
3. Once the numbers are entered into formula [3.12.1], it can be solved:

[3.12.2] X_PR = 32.5 + 4(27.126 - 24)/5 = 32.5 + 2.50 = 35.00.
Source: Adapted from Table 3.7 on p. 82 of Runyon, Haber, Pittenger, and Coleman (1996).
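Formula [3.12.1], and its companion for finding the percentile rank of a known score, can be wrapped in small functions. The Python sketch below is my own; the argument names mirror the symbols in the formulas, and since the source shows only the "where" list of the companion formula, the `percentile_rank` function is reconstructed from the standard Runyon et al. formulation.

```python
# Score corresponding to a known percentile rank, formula [3.12.1]:
#   X_PR = X_ll + w * (cf_PR - cf_ll) / f_i, with cf_PR = PR * N / 100.
def score_at_percentile_rank(pr, n, x_ll, w, cf_ll, f_i):
    cf_pr = pr * n / 100
    return x_ll + w * (cf_pr - cf_ll) / f_i

# Percentile rank of a known score (reconstructed companion formula):
#   PR = (100 / N) * (cf_ll + f_i * (X - X_ll) / w)
def percentile_rank(x, n, x_ll, w, cf_ll, f_i):
    return 100 / n * (cf_ll + f_i * (x - x_ll) / w)

# The worked example above: cf_PR = 27.126 corresponds to PR = 90.42 when
# N = 30, with X_ll = 32.5, w = 4, cf_ll = 24, and f_i = 5.
x = score_at_percentile_rank(pr=90.42, n=30, x_ll=32.5, w=4, cf_ll=24, f_i=5)
print(round(x, 1))  # 35.0
```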
scores; that is, smaller samples are apt to be more affected by changes in frequency than
are larger samples. Why? Because in a larger sample, any given score creates very little
percentage change, whereas in a small sample like this one, any given score contributes
3.33% (i.e., 1/30). If the sample size were 100, then a given score would represent only
1% change (i.e., 1/100), and so on for even larger samples.
What about the 50th percentile, also known as Q2? Well, the 50th percentile has a special name: the median. We will spend considerable time discussing the median in the next chapter, but for now, know that the median divides a distribution of data exactly in half; that is, half of the scores fall above a median point and half below it. The median LOT score was determined to be 26 (see Table 3.11). Thus, we can say that half of the LOT scores were higher than 26 and half were lower.
The SEI was administered to 7593 public school children in grades 4 through 8. The
sample included all socioeconomic ranges and Black and Spanish surnamed students.
Derived percentiles (see Table 5) showed a consistency of score values at a given per-
centile regardless of the population considered.
The written text refers to a table (I have not provided the table of data here because I want to emphasize the writing side of the link between data and explanation). What do we know from this brief passage? We know that across grades 4 through 8, the SEI scores in every percentile (1 to 99 for the SEI) were consistent with one another, so that the reader knows what to expect when she reviews the percentile data in the table. The author presents this information clearly and succinctly. If any score or scores stood out (were aberrant, for instance), they would be discussed in greater detail. Let me close this section by encouraging you to treat describing percentile results as a guide for writing about data: create links between what you know and what you must tell readers, and do so in a judicious manner.
Knowledge Base
1. A group of clinically depressed individuals completes a self-esteem inventory, and
most of the scores cluster at the low end of the scale (i.e., most members of the sam-
ple have low self-esteem). The distribution of scores is best described as
a. positively skewed b. negatively skewed c. bimodal
d. normal
2. A score on a psychological inventory is observed to be at the 85th percentile. Explain
what this fact means.
3. Calculate the percentile rank for a score of 8 using the following data:
X f cf
25.5-30.5 8 36
15.5-20.5 7 28
10.5-15.5 5 21
5.5-10.5 10 16
0.5-5.5 6 6
4. Using the data set in 3, find the score corresponding to a percentile rank of 70.
Answers
1. a. Positively skewed
2. Eighty-five percent of the inventory scores fall at or below the observed score.
3. Percentile rank of a score of 8 = 30.5556 ≈ 31.
4. Score corresponding to a percentile rank of 70 = 15.02 ≈ 15.
Constructing Tables and Graphs
Quite a bit of the material in this chapter focuses on how to present and interpret information in graphs. As you know by now, the presentation of data within a clear graph is an important way to convey results to an audience of readers or listeners. We have not, however, spent much time reviewing tables or how to present information in tabular form. I have not neglected this topic; on the contrary, most chapters in this book contain numerical information presented within tables. Due to the frequency with which tables appear throughout the book, however (and compared to the relative paucity of graphs), I felt that emphasis in this chapter should be placed on the latter.
I do want to close this chapter with some suggestions that can improve the construction and the presentation of data in tables as well as graphs. Some information will be a continuation of themes raised by our earlier review of Tufte's (1983) comments on graphic images, while the remainder deals with guidelines for data display advocated by the American Psychological Association (APA), notably its rules of style. For continuity, we close the chapter with a project exercise dealing with a historical use of data reminiscent of Dr. Snow's cholera map of central London; old information can be presented in relatively new and powerful ways.
[A graph whose y axis runs from 0 to 800 and whose x axis spans the years 1949 through 1979; the y axis label is not legible in this reproduction.]
What about tables? Can they also have chartjunk or, to coin a term, tableclutter? Of
course. Any table that contains too many unnecessary numbers, labels, categories, and
footnotes is probably suffering from tableclutter. If a reader must spend more than a
minute or two reviewing the contents of a table in order to understand it or to gain his
or her bearings, as it were, something is wrong with both its conception and construc-
tion. Many complex and detailed relationships are appropriately displayed through nu-
merical entries and headings in a table, but when too much numerical information ap-
pears within a table, its message will be lost on readers. To be useful, data tables must be
informative but not encyclopedic. The best guide when creating a table is to determine
the minimal information needed for readers to understand the main point(s) the table
is presenting. In this sense, a good table is like a good paragraph-it says what it is sup-
posed to say and no more.
A second consideration in a good graph or table is the link between it and the ac-
companying written text. A graph or a table can be wonderfully clear and meaningful,
but it will go unheeded if the author does not make direct, guiding reference to it in the
text. In most cases, it is not sufficient to tell a reader to "See Table 2" if you do not bother
to explain either before or after this directive what the fuss is all about. If you plan to take
the time to create informative tables and graphs, then you must also explain what
relationships-numerical or otherwise-you want viewers to notice. Keep in mind that
graphs and tables are supplements supporting a researcher's ideas and insights, and that
they stand alone only in the sense that their contents should be interesting to look at or
consider. Researchers must still write clear and cogent prose that leads readers through
this distilled information, pointing to interesting or suggestive trends along the way.
Readers' imaginations are best left to fiction, not to scientific writing in the behavioral
sciences-some link between text and table or graph must be established.
A third and final consideration is redundancy. A moment ago I advocated having
) written text complement tables and graphs-this still holds true-but the issue here is
the amount of detail involved. A table or a graph should provide some new information,
not repeat the same information presented in the text. Of course, there will be some
small degree of overlap. A researcher will want to specifically point to numbers that
I stand out or a line that portrays a crucial pattern, but this should be done carefully and
I sparingly. Experience is the best teacher here. As you gain experience presenting and
discussing data in graphs and tables, you will acquire a sense that what to leave out of the
; written text is as important as what you put into it.
Table 3.12 Some Considerations for Creating High-Quality Tables and Figures
1. Be certain that a table or figure is absolutely necessary. Does it add to, rather than detract from, the
point you want to make?
2. A table or figure must be double-spaced.
3. Tables and figures must be consistent with other tables and figures in a text or presentation.
4. Titles, notes, and footnotes accompanying tables and figures must be brief but explanatory.
5. Entries in columns or rows must have headings.
6. Any abbreviations or acronyms appearing in a table or figure must be defined or explained in
a note or a footnote.
7. Tables and figures must be connected to material in the text.
8. Spelling in any table or figure must always be checked.
9. A table or figure must appear on a separate page in a manuscript or a report.
Source: Some entries were adapted from the Publication Manual of the American Psychological Association,
4th ed. (1994).
tables, see sections 3.62 to 3.74; for figures, see 3.75 to 3.86; see also section 4.21). For
your convenience, I have distilled several of the Publication Manual's key points, as well
as some intuitive observations, about table and graph construction into Table 3.12.
Study this table before you create or evaluate any tables or figures. Finally, you should
also consult Appendix C in this book, which is focused on presenting results (including
tables and figures) in APA style.
Project Exercise  DISCUSSING THE BENEFITS OF ACCURATE BUT PERSUASIVE DATA DISPLAY
In our day and age, we take many things for granted, including the high quality of health
care. Prenatal care and childbirth are a good case in point. In the middle of the last cen-
tury, for example, maternal mortality following a birth was often quite high and physi-
cians had only vague speculations as to its causes (Wertz & Wertz, 1977). The work of
Dr. Ignaz P. Semmelweis, a Hungarian doctor, stands out as a clear example where casual
observation led to the systematic collection of convincing data. Semmelweis noted a link
between a contagious fever (then called puerperal fever) and hygiene, identifying an
antiseptic environment as a means to prevent it. Working in Vienna's largest maternity hospital, Semmel-
weis discerned an interesting pattern of maternal mortality. In a ward where medical
students delivered infants, the maternal mortality rate was 437% higher than in a ward
where midwives did the deliveries (in the year 1846, the former ward had 459 deaths versus
105 losses in the latter). He knew that training and experience were not the key, as
women who delivered babies on the hospital's steps or in the corridors before reaching
the wards never contracted the fever.
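The comparison cited above can be checked with a little arithmetic: 459 deaths is about 4.37 times 105, so the medical students' ward total was roughly 437% *of* the midwives' total (equivalently, about 337% higher). A quick sketch:

```python
# Quick check of the 1846 death counts cited in the text.
medical_students_ward = 459
midwives_ward = 105

ratio = medical_students_ward / midwives_ward  # ≈ 4.37

print(round(ratio * 100))        # ward total as a percent OF the midwives' total → 437
print(round((ratio - 1) * 100))  # percent HIGHER than the midwives' total → 337
```

The distinction between "percent of" and "percent higher than" is exactly the kind of detail a careful data display should make unambiguous.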
Semmelweis developed a hypothesis when a colleague died from puerperal fever
after completing an autopsy. He believed-correctly-that women who survived child-
birth had not been touched by student doctors whose hands had been contaminated
from the handling of decomposed matter (i.e., the student doctors worked with and
learned from cadavers in the hospital's autopsy ward). Those mothers who died, however,
had been infected by disease lingering on the hands or person of the physicians who de-
livered their children.
Medicinal hygiene, a given in our century, was the key. Following his deduction,
Semmelweis required anyone helping with births to wash their hands in a chloride of
lime solution before ever touching the mothers. Two years later, only 45 women out of
3,556 died in the medical students' ward (43 died the same year in the midwives' care).
The drop in maternal mortality was dramatic, though you may be surprised to learn that
the medical community still did not universally accept that cleanliness was necessary for
health after childbirth (or many other medical procedures; see Wertz & Wertz, 1977).
Can some of the remaining deaths in the medical students' and midwives' wards be
accounted for? Indeed, they can. Wertz and Wertz (1977) note that even the insightful
Dr. Semmelweis did not understand the full importance of hygienic hospital conditions.
He did not know, for instance, that it was necessary for each mother-to-be to have clean
sheets on her bed! Other opportunistic infections beyond puerperal fever were still
present when hospital beds were vacated by one patient in order to make room for the
next one.
Think about Semmelweis' data: He had no tabular or graphic knowledge to call on
when he set out to convince his medical colleagues that they were infecting their patients
and indirectly causing their deaths. For your project exercise, think about and then an-
swer the following questions by calling on the skills you have developed in the course of
reading this chapter:
1. What makes the link between hygiene and maternal mortality a problem for behav-
ioral science? How was Semmelweis acting like a behavioral scientist?
2. Knowing what you now know about graphs, for example, how could you reasonably
use the mortality data cited above to make Semmelweis' case a convincing one?
3. If you were to construct a table linking mortality to the medical student ward but
not the midwives' ward, how would you do it?
4. Are there other ways to break the mortality totals for 1846 down into other mean-
ingful units? What units would you use? How would you use them to make your case
regarding the link between disease and hygiene?
5. How would you link your data display to your arguments for creating and main-
taining hygienic conditions?
6. Do you think your ideas for displaying data would be as convincing as (a) Semmelweis'
numerical observations or (b) Snow's cholera map? Why or why not?
The decision trees gracing the opening page of this chapter are very practical.
The first three trees are designed to help you determine, respectively,
whether a data display is properly called a figure or a table, whether to make a
graph of some data, and whether to create a table or a graph. The fourth decision
tree will help you to characterize the shape of a distribution of data (i.e., normal or
skewed), a matter of greater or lesser importance depending on the nature of the data and
the statistical tests used for analysis. In each case, these decision trees promote clarity
when organizing research results into written form-a report, say, or a paper-or ready-
ing them for formal presentation in a classroom or conference setting. After you use
these guidelines a few times, selecting and then creating an appropriate mode for pre-
senting or describing information will become second nature to you.
Summary
1. There are two reasons for examining frequencies and graphing data: to get a feel for the data and to present it to others.
2. The best way to organize and summarize data is a frequency distribution. A frequency distribution is a table of scores organized into numerical categories. Frequency distributions indicate how many times some particular type of response was made. Scores in a frequency distribution are ranked from the highest to the lowest observation.
3. Proportions and percentages, indices of relative position, are often included in frequency distributions. Percentages are readily calculated from proportions (and vice versa). Because both illustrate the relationship of frequency to the total number of available observations, they are often called relative frequencies, and thus, they appear in relative frequency distributions.

   proportion = p = f/N,
   percent = p(100).

4. Grouped frequency distributions collapse the data into class intervals of a set size. Each interval contains a group of scores, the origin of the name "grouped frequency distribution." Depending on the range of values in a data set, most grouped frequency distributions will have around 10 class intervals; larger data sets will require more intervals, smaller ones fewer. The width of the class intervals will be a fixed whole number (e.g., 6, 8), and no gaps must exist between the intervals, nor should they overlap one another.
5. The class intervals in most grouped frequency distributions will have upper and lower true limits. In general, the true limits will be determined by adding or subtracting one half of the unit of measure, respectively, from upper and lower class interval values.
6. The x axis in a graph of a frequency distribution has scores running horizontally, whereas the y axis has scores running vertically. In most graphs, the two axes meet in the bottom-left corner at the value of 0. By convention, the dependent measure is graphed along the y axis and the independent variable is graphed on the x axis.
7. Bar graphs are used to show data from ordinal or nominal scales. Bar graphs illustrate simple group differences in terms of frequency. Nonoverlapping bars are drawn above labeled or "named" groups on the x axis, and the height of a group's bar corresponds to its frequency on the y axis.
8. Histograms and frequency polygons plot data from interval or ratio scales. A histogram is similar to a bar graph except that a given bar is attached to a score on a measure. Bars are drawn above their corresponding scores on the x axis until they reach the heights of their frequencies on the y axis. When the bars are drawn extending up from the x axis, they should touch one another because each bar has upper and lower true limits. Frequency polygons are graphs of connected lines. The lines meet at points placed above a score or the midpoint of a class interval along the x axis. The height of any point is determined by its frequency on the y axis.
9. Graphs can be misleading if their data are not accurately presented or represented. Good behavioral scientists always examine relationships portrayed in graphs by asking, "Compared to what?" That is, what other pertinent information does a viewer need to know in order to be sure the graph is both accurate and honest?
10. Exploratory data analysis refers to initial, nonmathematical work with data that gives the researcher a sense of the data before any testing for hypothesized relationships begins. Stem and leaf diagrams allow the researcher to quickly graph a data set while retaining the original scores for examination. A "stem" is the first digit or digits in some array of scores, and a leaf is the last digit or digits of a score within the array. Stems appear in a column (usually ranked from lowest to highest) and the leaf or leaves appear to the right of each stem. Stem and leaf diagrams present a quick "picture" of the data, but become less useful if too many scores are present. In that case, a "tally" of scores using Tukey's line box tally system is useful because it allows for accurate counts of observations quickly.
11. The shape of a distribution tells a story about a sample's data. A symmetrical or bell-shaped curve is called a normal distribution. When distributions are symmetrical but they have two "bells" or "lumps," they are described as "bimodal." If there are more than two clusters of scores, then the distribution may be "multimodal." When a distribution is not symmetrical, it is skewed; that is, its scores are clumped either at the high end or the low end. When scores appear predominantly at the left or low end of the distribution, positive skew is present. When the scores clump at the high end, negative skew is said to have occurred.
12. A percentile is a statistic disclosing the relative position of a score within a distribution. Percentile rank is a number indicating the percentage of scores that fall at or below a given score (e.g., a score at the 80th percentile means that it is higher than or equal to 80% of the remaining scores on the test). Percentile information is useful because it places the score in context; only 20% of test scores were higher than one at the 80th percentile.
13. Cumulative frequency refers to the total number of frequencies falling at or below a given point in a distribution. Cumulative percentage, the percent of responses or respondents at or below a given point in a distribution, can be determined from cumulative frequency. Both can be used to determine percentiles and percentile ranks.
14. Data distributions can be divided into four equal parts called quartiles. By convention, the 25th and 75th percentiles in a distribution are called Q1 and Q3. The 50th percentile (Q2) is called the median.
15. Quality graphs and tables should strive for accuracy, brevity, and clarity. Graphs that are too complex and weighted down with extra (largely useless) information have an abundance of chartjunk. Tables with too many numbers and categories that make discerning meaning and patterns difficult suffer from tableclutter. Any graph or table should be clearly linked to accompanying text, but that text should complement, not be redundant with, the data display.
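Summary points 2 and 3 can be illustrated directly. The sketch below (my own illustration, not from the text) builds a relative frequency distribution, with proportion p = f/N and percent = p(100), from the raw scores given in problem 1 of the Chapter Problems:

```python
# Sketch: a relative frequency distribution from raw scores,
# with proportion p = f/N and percent = 100p for each value of X.
from collections import Counter

scores = [3, 1, 4, 2, 1, 3, 1, 5, 6, 3, 2, 3, 4, 1, 2, 2, 3, 5, 3, 2]
freq = Counter(scores)
n = len(scores)  # N = 20

for x in sorted(freq, reverse=True):  # ranked highest to lowest, as in the text
    f = freq[x]
    p = f / n
    print(f"X={x}  f={f}  p={p:.2f}  %={100 * p:.0f}")
```

Note that the proportions across all rows necessarily sum to 1.00 and the percents to 100.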
Key Terms
Bar graph (p. 98)
Bimodal (p. 112)
Chartjunk (p. 124)
Cumulative frequency (p. 116)
Cumulative frequency distribution (p. 116)
Cumulative percentage (p. 117)
Exploratory data analysis (EDA) (p. 104)
Frequency (p. 87)
Frequency distribution (p. 87)
Frequency polygon (p. 100)
Graph (p. 97)
Grouped frequency distribution (p. 92)
Histogram (p. 99)
Kurtosis (p. 113)
Leptokurtic (p. 113)
Median (p. 122)
Mesokurtic (p. 113)
Multimodal (p. 112)
Negative skew (p. 113)
Normal distribution (p. 111)
Percent (p. 91)
Percentage (p. 91)
Percentile (p. 115)
Percentile rank (p. 115)
Platykurtic (p. 113)
Positive skew (p. 113)
Proportion (p. 90)
Quartile (p. 121)
Relative frequency (p. 92)
Relative frequency distribution (p. 92)
Skew (p. 112)
Stem and leaf diagram (p. 105)
Tableclutter (p. 125)
Tukey's line box tally (p. 108)
x axis (p. 97)
y axis (p. 97)
Chapter Problems
1. Place the following data into a relative frequency distribution. Be sure to add columns for proportion and percent.

   3, 1, 4, 2, 1, 3, 1, 5, 6, 3, 2, 3, 4, 1, 2, 2, 3, 5, 3, 2

2. What is a proportion? What is its relationship to percent? How can percent be calculated from proportion?
3. Place the following data into a relative frequency distribution.

   10, 11, 13, 13, 15, 13, 10, 11, 11, 12, 14, 13, 15, 16, 13, 12, 12, 11, 10, 10, 11, 12, 12, 13, 15

4. Using the data shown in problem 1, draw a histogram and a frequency polygon.
5. Using the data shown in problem 3, draw a histogram and a frequency polygon.
6. What is the difference between a bar graph and a histogram? When should you use a bar graph instead of a histogram?
7. The following are quiz scores from a high school physics class. The maximum score possible is a 10:

   8, 8, 9, 7, 5, 3, 3, 5, 7, 8, 9, 10, 3, 2, 3, 8, 9, 6

   a. Put the scores into a frequency distribution.
   b. Graph the scores using a histogram.
   c. If the usual score on this quiz is a 6, how would you describe the performance of the students on this quiz?
8. An anthropology professor assigns numerical grades to the term papers she receives. Here are the scores:

   79, 85, 88, 72, 65, 89, 94, 75, 72, 82, 80, 89, 92, 96, 75, 70, 81, 69

   a. Place these scores into a frequency distribution.
   b. Graph the scores using a histogram.
   c. The anthropologist usually gives an average grade of 77 on term papers. Compared to that standard level of performance, how well did this class do on the term paper?
9. Examine the following frequency distribution:

   X    f
   5   10
   4    7
   3    3
   2    5
   1    6

   a. What is the N of this distribution?
   b. What is the Σf?
   c. What is the ΣX for the distribution?
10. Examine the following frequency distribution:

   X    f
   5   12
   4    0
   3    5
   2    7
   1    2

   a. What is the N of this distribution?
   b. What is the Σf?
   c. What is the ΣX for the distribution?
11. Using the data provided in problem 9, draw a frequency polygon.
12. Using the data provided in problem 10, draw a frequency polygon.
13. Review the following data:

   30 28 20 25 18 15
   12 20 10 28 17 8
   22 16 12 13 27 25
   6 10 18 17 18 27
   29 30 22 16 11 9

   Place the data into a grouped frequency distribution using
   a. An interval width of 2
   b. An interval width of 5
14. Using the data from problem 13, determine the true limits of each class interval when
   a. The interval width is 2
   b. The interval width is 5
15. Using the data from problem 13, draw a histogram (using true limits) when
   a. The interval width is 2.
   b. The interval width is 5.
16. Using the data from problem 13, draw a frequency polygon (using true limits) when
   a. The interval width is 2.
   b. The interval width is 5.
17. Describe some of the ways that a graph can misrepresent data. What should a critical researcher or viewer do to verify that a graph's data are presented accurately?
18. You are planning to graph some data: what are some of the steps you should take to ensure you present them accurately?
19. What is exploratory data analysis (EDA)? Why do researchers find it useful?
20. Using a unit of 10, construct a stem and leaf diagram of these data:

   20 32 43 52 67 77 81 83
   65 23 33 72 80 79 45 31 27
   35 48 71 76 89 72 65 63
   42 41 30 44 31 22 29 56 70 79

21. Using the data from problem 20, construct a stem and leaf diagram using units of 5.
22. Examine the following frequency distribution and then recreate it using Tukey's tally system of dots, lines, and boxes:

   X    f
   6   15
   5   11
   4   20
   3    3
   2    8
   1    5

23. Using the following table of data, add a column for cumulative frequency (cf) and one for cumulative percentage (c%):

   X    f
   10   8
   9    5
   8    0
   7    4
   6    9
   5    6
   4    8
   3    5
   2    3
   1    6

24. Using the data shown in problem 22, add a column for cumulative frequency (cf) and one for cumulative percentage (c%).
25. Examine this data table and then answer the questions that follow.

   X       f
   18-20   3
   15-17   7
   12-14   8
   9-11    6
   6-8     2

   a. What is the percentile rank for a score of 13 (i.e., X = 13)?
   b. What score (X) corresponds to the 65th percentile?
   c. What is the median score (i.e., 50th percentile)?
26. Complete the following table and then answer the questions that follow:

   X       f
   26-30   12
   21-25   15
   16-20   10
   11-15   5
   6-10    11
   1-5     2

   a. What is the percentile rank for X = 27?
   b. What is the percentile rank for X = 9?
   c. What score (X) falls at the 35th percentile?
   d. What scores (X) fall at Q1, Q2, and Q3?
27. Assume that the data presented in problem 25 represent achievement test scores for fifth-graders (higher scores indicate better mastery of grade-level materials). Put the percentile results into words; that is, write about the answers you gave to parts a, b, and c in the problem.
28. Assume that the data presented in problem 26 are scores on a measure of introversion-extraversion (higher scores correspond to more extroverted behavior). Put the percentile results into words by writing about the answers you gave to parts a through d in the problem.
Choosing a Measure of Central Tendency
1. Are the data based on an interval or ratio scale?
2. Are the data based on an ordinal scale?

Choosing a Measure of Variability
1. Are you reporting the mean?
2. Are you reporting the median?

1. Are you trying to describe a sample? If yes, then calculate sample statistics (i.e., X̄, S, s). If no, then go to question 2.
2. Are you trying to estimate the parameters of a population? If yes, then approximate the parameters by calculating the unbiased estimates of the population (i.e., ŝ², ŝ). If no, go to question 3.
3. Are you trying to describe an entire (knowable) population? If yes, then calculate the actual population parameters (i.e., μ, σ², σ). If no, review the data and your goals, and then go back to step 1.
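The sample-versus-population decision tree above translates naturally into code. The sketch below is a hypothetical helper (the function name and return strings are mine, not the text's) that walks the same three questions in order:

```python
# Sketch of the sample/population decision tree (hypothetical helper).
def which_statistics(describing_sample, estimating_population, whole_population_known):
    if describing_sample:
        # Question 1: describing a sample -> sample statistics
        return "sample statistics (X-bar, S, s)"
    if estimating_population:
        # Question 2: estimating a population -> unbiased estimates
        return "unbiased estimates of the population parameters (s-hat^2, s-hat)"
    if whole_population_known:
        # Question 3: a finite, knowable population -> actual parameters
        return "population parameters (mu, sigma^2, sigma)"
    return "review the data and your goals, then go back to step 1"

print(which_statistics(False, True, False))
```

Each branch mirrors one "If yes / If no" arm of the printed tree.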
CHAPTER 4
Chapter Outline
The Mean: The Behavioral Scientist's Statistic of Choice 137
Some of the populations that behavioral scientists and students work with are said to be
finite; that is, the number of observations (N) is known and it is small enough that the
observations can be counted. Other populations are infinitely large and their Ns are un-
knowable. These populations contain so many observations, real or potential, that
counting or keeping track of them all is not possible. In the case of infinite populations,
we rely on X̄ to estimate the value of μ. We will learn a bit more about estimation
later in this chapter and in greater detail in chapter 9. For the present, we will focus on
finite populations.
If we have a population of six observations, for example,
34 22 12 20 24 26
the mean is calculated as
[4.2.2]  μ = ΣX/N,

[4.2.3]  μ = (34 + 22 + 12 + 20 + 24 + 26)/6,

[4.2.4]  μ = 138/6 = 23.00.
Measure                          Sample    Population    Estimate
Mean                             X̄         μ
Variance                         S²        σ²            ŝ²
Standard deviation               S         σ             ŝ
Pearson correlation coefficient  r         ρ
138 Chapter 4 Descriptive Statistics: Central Tendency and Variability
Consistency is said to be the hobgoblin of little minds, but any adequate data analyst wants to
know when to use N or n to refer to the number of observations available for a calculation.
Though it is not the stuff of high drama, there are minor inconsistencies in the way statistics books
present how and when to use N and n. In chapter 1, N was introduced as a way to designate the
number of observations in a population. What about n? Different texts introduce n differently.
Some books indicate that n refers to the number of known observations in a sample of data
(e.g., n = 6 means there are 6 raw scores in the sample). The use of n, then, is reserved for sample data.
Other books, including this one, often refer to sample data using N in lieu of n. Instead, the latter
books often reserve n for those occasions where there is more than one sample of data; that
is, an experimental group might have 10 observations (n1 = 10) and the control group might have
12 (n2 = 12). Still other books use N1 and N2 to denote the same circumstance; it all depends on
the analytic temperament of the author.
Chances are that you will be able to figure out the meaning of N or n in a particular context by
determining whether you are working with a sample or a population. To make that determination
is usually quite straightforward-just answer this question: Is the symbol for the test statistic you
are considering written using a Roman (sample data) or Greek (population) letter? Because we
will spend relatively little time dealing with population parameters in this book, I elected to
use N wherever possible, as its meaning is by now both familiar and intuitive. In those
few situations where n is employed, I will draw your attention to its presence and identify the ap-
propriate interpretation.
The most representative value in a given distribution is its central tendency.

The search for an indicator of central tendency, then, is an exercise in data reduction.
We are not altering the data; rather, we are beginning to explore one of its most
basic characteristics by working with (really, manipulating) its raw values. Instead of
discussing the various qualities of a distribution, we want to first home in on one
characteristic, usually that area of the distribution where the majority of scores appear
to "clump" or cluster together. In a perfectly normal distribution (which was
introduced in chapter 3), the observations cluster in its center, where they form a bell.
The most representative numerical value in a normal distribution is apt to be at the center
of this bell shape, where the curve can be split into two symmetric halves. In distributions
of other shapes, particularly those lacking symmetry, the index of central tendency
may move around a bit, but we should still be drawn by the clustering of
observations as an initial guide to its location.
But we are getting ahead of ourselves. Before we review some specific measures of
central tendency, it is important to discuss the meaning of the word average. This word
has a clear meaning to statisticians and behavioral scientists; namely, it refers to "a mathematical
quantity indicating the most typical or representative number in a set of observations."
True enough. No doubt this definition (or words to that effect) comes to mind
when we hear televised reports or read news stories about the "average rainfall in the
Mississippi basin," "the average compensation package of CEOs of major corporations,"
"the grade point average," "the average number of emergency calls the local 911 number
receives each week," and so on.
The meaning of the word average in everyday language, however, has a somewhat
different connotation, one that is sometimes deemed pejorative. In my experience, few
professionals want to be evaluated as average by their employers, just as few students-
most readers of this book included-want to earn C or "average" grades in college. The
term average has, as it were, fallen in with bad company. Indeed, many people equate the
term average with mediocrity or being "second rate." In some quarters, notably athletics,
average performance is erroneously seen as evidence of poor or even failing efforts.
American culture and our educational systems (largely unknowingly, perhaps) encourage
this view, so that people bristle if they are deemed average. Many aspire to be citizens
of humorist Garrison Keillor's community, Lake Wobegon, where "all the women are
strong, all the men are good-looking, and all the children are above average."
The term average has a statistical as well as a colloquial meaning. Be aware of which one is being used in a given situation.

Statistically and mathematically, of course, we cannot all be above average on any
given dimension; if we were, there would be no average per se. Think back to the normal
distribution for a moment. The bell-shaped curve illustrates the problem precisely:
Although some observations will fall below the bell (i.e., below the average) and others
above it (i.e., above average), the preponderance of the data will be found at a central,
thereby representative, point. The idea of being average should not have any stigma
attached to it; as we will see, the concept has a great deal of utility. In practical terms,
being average should not be seen as an odd or troublesome circumstance either, as one
is in good company (i.e., with most other people). To be sure, people should excel on some
dimensions (for instance, I hope you are excelling in your study of statistics and data
analysis), but assuming that one is above average on a host of skills is misguided, if not
foolhardy. Keep this in mind the next time you witness the idea of average performance
being derided or criticized. (While you are at it, you might begin to think about your
answers to the four questions that opened the chapter, as well; see also Data Box 4.B.)
Our job, then, is not to change people's understanding of the word "average"-that
is too tall an order-but to acknowledge that perception of what the word means has
changed. Be advised, though, that the term average means different things to different
people, and it has finer shades of meaning in various contexts. Let's turn back to central
tendency. There are three common measures of central tendency-the mean, the
median, and the mode-and we will review each in turn. We will use the term average
with the first statistical measure of central tendency we learn-the mean. Keep in
mind that the term average should be used appropriately, accurately, and judiciously
when discussing statistical relationships-to do otherwise is to forget the linguistic
baggage it carries.
[4.1.1]  X̄ = ΣX/N.
In words, "X-bar is equal to the sum of the values of X divided by the number of values
in the sample," or

[4.1.2]  X̄ = (X1 + X2 + X3 + . . . + XN)/N.
If we have an array of scores from a sample that were randomly drawn, say, 12, 15, 10, 14,
and 8, their mean would be:

[4.1.3]  X̄ = (12 + 15 + 10 + 14 + 8)/5,

[4.1.4]  X̄ = 59/5,

[4.1.5]  X̄ = 11.8.
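Formulas [4.1.1] through [4.1.5] reduce to a single line of code; a minimal sketch using the same five scores:

```python
# Sketch of formula [4.1.1]: the sample mean is the sum of the scores divided by N.
scores = [12, 15, 10, 14, 8]
mean = sum(scores) / len(scores)  # sum(X) / N
print(mean)  # → 11.8, matching [4.1.5]
```

The same computation applies to a finite population; only the symbol changes, from X̄ to μ.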
When we are working with sample statistics, the symbols are based on the Roman alphabet
(i.e., X leads to X̄), but when we are considering population parameters, we
switch to Greek letters. (If you need to review the distinctions between samples and populations,
please turn back to chapter 1 before proceeding.) Thus, the population mean is
symbolized as μ (pronounced "mew" or "mu").
The formula for calculating the mean of a population would be:
Ix
[4.2.1] J.L = N·
Note: Hypothetical ratings are based on a 1 ("not at all funny") to 7 ("very funny") scale. Higher scores
indicate greater perceived funniness.
what are supposed to be humorous circumstances. You conceptually replicate these prior
research efforts by randomly assigning participants to two groups, an experimental
group and a control group. The experimental group rates a cartoonist's work containing
characters experiencing painful pratfalls. Cartoons lacking violent themes but drawn by
the same artist were evaluated by the control group. Exposure or nonexposure to hostile
cartoons was the independent variable, and the participants' ratings of how funny the
cartoons actually were constituted the dependent measure. Table 4.2 contains the ratings
for both groups, and these data are not grouped in any particular way-the raw humor
ratings are simply listed within the respective group.
As you can see, there were eight participants in each group. Please note that for
convenience, the subscripts "e" for "experimental" and "c" for "control" are attached to
the symbols shown in Table 4.2 (use of subscripts as labels for keeping track of infor-
mation will become more frequent and necessary as illustrative examples become more
complex; see Data Box 4.B). By eyeballing the raw scores in the groups, you can see that
the ratings are slightly higher in the experimental group. The means shown at the bot-
tom of the two columns confirm this expectation: The experimental group rated the
cartoons with hostile content as funnier on average (X̄e = 6.0) than did the control
group, which rated cartoons lacking the theme (X̄c = 4.0). The means were calculated
using the formula for sample means presented back in [4.1.1]. As descriptive statistics,
the two means are suggestive-they appear to support Freud's theory of hostile humor
because we know the typical reactions of the members of both groups. We need to
confirm this prediction, though, by determining if the difference between the two
means is a reliable one (i.e., the mean humor rating of the hostile cartoons is actually
higher than the mean associated with the nonhostile cartoons), a topic we explore in
later chapters.
What happens if the data you are working with are arranged in a frequency distri-
bution? Is it possible to determine the mean with relative ease and accuracy? Fortunately,
yes. We can capitalize on the working knowledge of frequency distributions we devel-
oped in chapter 3.
140 Chapter 4 Descriptive Statistics: Central Tendency and Variability
Table 4.3 Frequency of Spatial Errors Made by Rats in a Radial Arm Maze
X        f        fX
8        3        24
7        5        35
6        6        36
5        8        40
4       10        40
3        4        12
2        2         4
1        5         5
0        1         0
        N = 44   ΣfX = 196
Note: X refers to the number of possible errors rats could make while navigating the radial arm maze.
Table 4.3 shows a frequency distribution of errors from a radial arm maze used in an
animal learning experiment. Experimental psychologists often measure spatial ability in
rats by using a maze that has eight "corridors," or arms, extending outward from a central
chamber (e.g., Suzuki, Augerinos, & Black, 1980). A rat must navigate its way through
space to find a food reward in each arm. If a rat navigates all eight arms successfully, it
makes no errors and can eat all the food rewards. If the animal goes down a given arm
twice (i.e., once after the first trial when it ate the food available in the arm), the behavior
counts as an error. If it backtracks through the same arm three or four times, it counts as
two and three errors, respectively. The rats in this study were allowed a maximum of eight
errors before they were removed from the maze. As you can see in Table 4.3, only one of the
44 rats in the study made no errors in the maze, just as only three animals managed to
make eight errors. The remaining 40 animals had errors ranging between these two ex-
tremes-but what was the mean error rate for the sample of rats?
How can we calculate the mean number of errors from the data shown in Table 4.3?
Recall that in chapter 3, we learned how to find the ΣX by multiplying each value of X
by its frequency (f) and then summing the products (i.e., ΣfX = ΣX; see [3.2.1]). Once
we know ΣfX, we can determine the X̄ of this frequency distribution by dividing
the ΣfX by N, or

[4.3.1]  X̄ = ΣfX/N.
We need to enter only the numbers from Table 4.3 to find the mean:
[4.3.2]  X̄ = 196/44,
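The tallying in Table 4.3 can be checked with a short script. The dictionary below simply re-enters the table's error scores (X) and frequencies (f); it is illustrative only:

```python
# Frequency distribution of maze errors from Table 4.3: {X: f}
freqs = {8: 3, 7: 5, 6: 6, 5: 8, 4: 10, 3: 4, 2: 2, 1: 5, 0: 1}

n = sum(freqs.values())                        # N = 44 rats
sum_fx = sum(x * f for x, f in freqs.items())  # ΣfX = 196 errors
mean = sum_fx / n                              # X̄ = 196/44, about 4.45
print(n, sum_fx, round(mean, 2))
```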
are very far away-either high or low in value-from the central tendency of the distri-
bution. Such scores are often called outliers because they lie very far outside the central
part of a distribution. Outliers that are high in value can inflate the mean; the opposite
effect-deflating the average-occurs when an outlying score (or scores) is very low in
value. We will consider the effect of outlying scores conceptually as well as through a
practical example.
Let's do the conceptual work first. Think about a balance beam or a playground
seesaw. It is possible to put several smaller weights on one side of a balance beam and a
larger, heavier one on the opposite side-despite the presence of several smaller weights,
the bigger weight can still tilt the beam to one side. In the same way, two or three smaller
toddlers on one side of a seesaw cannot balance the weight of a full-grown adult. Can
balance ever be achieved when weight magnitudes vary? Yes. Lighter weights can balance
a heavier weight when the latter is moved closer to the fulcrum, the support that bal-
ances the beam or seesaw. If our hypothetical adult moved closer to the fulcrum (and the
toddlers), the children can balance his or her weight.
In virtually the same way, the presence and magnitude of any outliers can influence
the mean of a distribution of numbers. Now we can turn to a practical illustration near
and dear to the hearts of any people who have ever been employed-average salaries.
Imagine that you worked in a small business, one that had seven employees and one
owner. Hypothetical salaries for this business are shown in Table 4.4. As you can see, the
owner makes a very high salary compared to the modest but generally similar salaries of
the employees. When the mean of these salaries-a figure of $57,000-is calculated, the
problem is apparent: No one in the business actually earns a salary near the mean; indeed,
the figure closest to it is $42,000.00, a difference of $15,000.00. The owner's salary exceeds
this average by $143,000.00! Why is the average inflated? Like the heavy weight on a balance beam, the owner's salary drags the value of the mean away from where the preponderance of salaries lies. If one read about the average salary of employees working in this small business in a trade report or a newspaper article, the impression received would be quite different from the financial reality. Caveat emptor!
The mean of a distribution can be inflated or deflated when an extreme or outlying score is present.

We may not be able to examine the salary data for privately held companies, but we can assuredly examine our own data or those published in behavioral science books and journals. The utility of the mean as an index of central tendency is not offset by its sensitivity to extreme scores; on balance, as it were, it is still the best indicator. We must be vigilant, however, and always carefully review any distribution of data in search of outliers that could substantially influence the magnitude of a mean. In general, extreme scores pose greater problems when they appear in data sets with fewer observations (see Table 4.4) than in larger ones; once again, larger samples are apt to be more representative of the population than smaller ones. It is also true that more representative scores will appear in larger samples, a property we will revisit in later chapters.
[4.4.1]  X̄w = Σ(N X̄)/ΣN,

where X̄w is the symbol for the weighted mean. This formula indicates that the user must multiply the mean of each sample by its respective N, sum the products, and then divide the resulting total by the number of observations available.
Here are the hypothetical data for the two statistics classes:

NA = 60          NB = 45
X̄A = 82.0       X̄B = 87.0

Class A had an average of 82.0 on the test and class B had an average of 87.0. What was the test
average across both classes? We simply enter the numerical information into [4.4.1]:

[4.4.2]  X̄w = [(60)(82.0) + (45)(87.0)]/(60 + 45),

[4.4.3]  X̄w = (4,920 + 3,915)/105,

[4.4.4]  X̄w = 8,835/105 = 84.14.
Thus, the mean score on the test across both classes was 84.14. Note that if we did not
use the weighted mean approach, that is, if we simply calculated the average of 82.0 and
87.0 (i.e., (82 + 87)/2 = 169/2 = 84.5), we would have overestimated the true mean by a
slight amount. Imagine how the overestimation could increase, however, if the sample
sizes were more disparate from one another. Avoiding the bias inherent in such overesti-
mation is the primary reason for relying on the weighted means approach.
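Formula [4.4.1] can be sketched as a small helper function. This is an illustrative implementation, not the authors' notation:

```python
def weighted_mean(ns, means):
    """X̄w = Σ(N × X̄) / ΣN, weighting each group mean by its sample size."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)

# Class A: N = 60, mean 82.0; class B: N = 45, mean 87.0
print(round(weighted_mean([60, 45], [82.0, 87.0]), 2))  # 84.14
```

Note how the larger class (A) pulls the weighted mean below the simple average of 84.5.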
A weighted mean should be calculated when you are trying to identify the average of more than one group, each of which has a different sample size.

We will leave our review of the mean, its applications and properties, for the time being so that the remaining measures of central tendency can be introduced. We will return, however, to consider two mathematical properties of the mean later in the chapter. Chapter 4's Project Exercise entails having you prove that these properties make the mean the most useful gauge of central tendency.
■ The Median
You may remember being introduced to the median in chapter 3 when the calculation
of quartiles was introduced. At that time, I promised to discuss the median in some de-
tail at the proper place, which is here by virtue of its utility as a measure of central ten-
dency. The median can be used for calculations involving ordinal, interval, or ratio
scale data.
KEY TERM The median is a number or score that precisely divides a distribution of data in half. Fifty percent of
a distribution's observations will fall above the median and fifty percent will fall below it.
One way to think about the median is that it represents a particular case of percentile rank: the median is the score that falls at the 50th percentile of a given distribution. Thus, you could determine the median for a grouped frequency distribution of data by calculating Q2, the 50th percentile (see formula [3.13.1] and the related distribution in chapter 3).
A simple memory aid for the median: The median strip on a highway divides opposing lanes of traffic in half.

Many times, of course, the distribution of data is relatively small and not grouped into a frequency distribution. The formula for determining the median score in this case is surprisingly easy.

For an Odd Number of Scores.  Here is a data set of 15 scores to consider:
26 32 21 12 15 11 27 16 18 21 19 28 10 13 31
To calculate the median, arrange the scores from the lowest to the highest:
10 11 12 13 15 16 18 19 21 21 26 27 28 31 32
When you have an odd number of scores, find the score that splits the distribution into
two halves. The location of the median score can be found by using this simple formula:
[4.5.1]  median score = (N + 1)/2,

[4.5.2]  median score = (15 + 1)/2,

[4.5.3]  median score = 16/2 = 8.
Locate the 8th score in the original array, which is 19 (see below):
10 11 12 13 15 16 18 19 21 21 26 27 28 31 32
As you can see, 7 scores appear on either side of this median score.
For an Even Number of Scores. What happens if we are confronted with an even num-
ber of scores? Consider a new array of 8 scores:
55 67 78 83 88 92 98 99
We need to calculate the median, which will fall halfway between the two middle scores in the previous array. We can again employ formula [4.5.1] to find the median score:

median score = (8 + 1)/2 = 9/2 = 4.5.

The median score can be found 4.5 scores into the array, which places it between the scores of 83 and 88:

55 67 78 83 88 92 98 99

The average of these two middle scores is the median score: (83 + 88)/2 = 171/2 = 85.5.
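Both cases, odd and even N, can be handled by one short function. This is an illustrative sketch of the procedure just described:

```python
def median(scores):
    """Sort, then take the middle score (odd N) or the mean of the
    two middle scores (even N), following formula [4.5.1]."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

odd = [26, 32, 21, 12, 15, 11, 27, 16, 18, 21, 19, 28, 10, 13, 31]
even = [55, 67, 78, 83, 88, 92, 98, 99]
print(median(odd))   # 19
print(median(even))  # 85.5
```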
The median has one important characteristic that sometimes makes it a better indi-
cator of central tendency than the mean. Unlike the mean, the median is not particularly
sensitive to extreme scores. Recall the data we examined back in Table 4.4. The mean we
calculated from those data was inflated by the relatively large, outlying salary of the
owner-what if we instead determined the median annual salary of that small business?
If you follow the above instructions for calculating a median when there is an even number of scores present, you will discover that the median salary based on the data in Table 4.4 is $37,000.00. Here are the salaries (ranked low to high) from Table 4.4:
$31,000.00 $32,000.00 $36,000.00 $36,000.00 $38,000.00 $41,000.00
$42,000.00 $200,000.00
To locate the median score, we use formula [4.5.1] and calculate a value of 4.5, which
means that the median salary falls halfway between score 4 ($36,000.00) and score 5
($38,000.00)-hence, we know that the median is $37,000.00. Clearly, the median pro-
vides a much more realistic figure of central tendency for this relatively small distribu-
tion than the biased mean of $57,000.00 we calculated previously.
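The mean-versus-median contrast is easy to see with Python's standard library. The salaries below re-enter the hypothetical figures from Table 4.4:

```python
import statistics

# Hypothetical Table 4.4 salaries: seven employees plus the owner's outlier
salaries = [31_000, 32_000, 36_000, 36_000, 38_000,
            41_000, 42_000, 200_000]

mean_salary = statistics.mean(salaries)      # 57,000: inflated by the outlier
median_salary = statistics.median(salaries)  # 37,000: resistant to the outlier
print(mean_salary, median_salary)
```

Removing the outlier would barely move the median, but it would change the mean dramatically, which is exactly why the median is preferred for skewed data like incomes.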
Further, we can extrapolate from this small business example to the work done by
demographers when they report "average" income in the United States. Demographers
study what are called vital and social statistics, such as the number of births, deaths, mar-
riages, and so forth, within a given population. They are also concerned with employ-
ment and salary data. Demographers routinely eschew reporting mean income be-
cause-similar to the small business example-the distribution of data is unduly
influenced by the relatively small number of citizens who make salaries of incredible
size. That is, most households in our nation probably earn somewhere around
$40,000.00 per year or less, but this majority is offset by other members of the population who make salaries in the hundreds of thousands, even multimillions, of dollars. If
we determined mean household income in the United States, we would no doubt see the
same problem we witnessed with the data from Table 4.4-the mean would be so in-
flated that it would not numerically represent the compensation experience of most
Americans. As we discussed, the median is not as sensitive to extreme scores, so demog-
raphers usually report median household income when they portray the average eco-
nomic experience of the population.
A final note: There is no Roman letter designating the median. In APA style, how-
ever, the word "median" is abbreviated and then written as mdn. This abbreviation can
appear in written text or within a table.
■ The Mode

Although it can be used to describe the most frequent observation based on any one of the four scales of measurement, the mode is usually associated with nominal scales.
KEY TERM The mode of a distribution is the score or category that has the highest frequency.
Take a second look at the definition of the mode and be sure you understand a subtle but
defining point: the mode is the most frequent score or category-it is not the frequency
associated with the score or category. Thus, if we reexamine a sample of scores from our
previous review of the median:
10 11 12 13 15 16 18 19 21 21 26 27 28 31 32
Which score (or scores) occurs with the most frequency? As you can see, the score 21 appears twice, whereas all the other scores appear only once. The mode of this simple distribution, then, is 21.
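Finding the mode amounts to counting frequencies and keeping whichever score (or scores) is tied for the top count. A sketch using the standard library, returning a list so bimodal and multimodal cases are handled too:

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency; the result
    has more than one entry for bimodal or multimodal data."""
    counts = Counter(scores)
    top = max(counts.values())
    return [score for score, f in counts.items() if f == top]

data = [10, 11, 12, 13, 15, 16, 18, 19, 21, 21, 26, 27, 28, 31, 32]
print(modes(data))  # [21]
```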
The mode is usually quickly and easily discerned from any frequency distribution.
Imagine you wanted to know the modal or most frequently declared major among a
group of prospective first-year college students. You could form a frequency distribution
like the one shown in Table 4.5, subsequently locating the mode with alacrity. As you can
see, psychology is the modal prospective major in this hypothetical distribution: 97
prospective first-year students indicated they were interested in majoring in that field. Its
next closest rival in terms of student interest is economics. For the sake of argument,
what if both psychology and economics had the same frequency of expressed interest? Let's say that 110 students expressed interest in each field; can there be more than one mode? Absolutely. Psychology and economics could both be the mode;
that is, the simple distribution in Table 4.5 could be bimodal. Indeed, if three or four or
even five majors had the same frequency, they would all constitute the mode, which
would mean that the distribution is multimodal. It really does not matter as long as the
most frequent score (or scores) is identified as the mode. The chief advantage of using this measure of central tendency is that no calculations need to be performed. The data just need to be organized, and "eyeballing" by the data analyst can take over from there.
In contrast to its definition, the mode is the least frequently used measure of central tendency.

Because the mode relies on only one aspect of central tendency, frequency, its use is rather limited. It is the least flexible and applicable of the three measures we reviewed here. Finally, the mode does not have any special statistical abbreviation or symbol to identify it, and APA style has no separate designation for it. When writing about the mode of some distribution, just write "mode," along with the score or category name.
[Figure 4.1: Relative locations of the mode, median, and mean in (a) a normal distribution, (b) a positively skewed distribution, and (c) a negatively skewed distribution.]
that has the longest tail, and the median will appear approximately one-third of the dis-
tance between the mean and the mode (Kirk, 1999). Why this and not other orderings?
Extreme scores, which fall into the longer tail of a given distribution, will pull the mean
toward their positions. The median, in turn, is affected by the relative position of these
outliers but not their magnitudes. The mode remains unaffected by the presence of extreme
scores unless one or more of them happen to have the greatest frequency of occurrence.
Kirk notes that the ordering can be remembered as an alphabetical mnemonic (mean, median, and mode) as long as the user begins the ordering in the longer of a distribution's two tails.
A distribution's departure from symmetry can be tracked by the magnitude of the
discrepancy existing between the mean and the median. A smaller discrepancy implies
greater symmetry; less symmetry occurs when there is a greater discrepancy. In most
cases, the actual ordering of the mean and median can indicate whether a distribution is
positively or negatively skewed (Kirk, 1999). As introduced in chapter 3, positively
skewed distributions show a clustering of scores toward the low end of the scale of measurement; the opposite pattern occurs in negatively skewed distributions. Where central tendency is concerned, a positively skewed distribution will have a mean with a higher value than its median. As shown in panel (b) of Figure 4.1, X̄ > mdn, and the mode falls below the latter measure. In negatively skewed distributions, however, medians tend to be of greater value than means. Panel (c) of Figure 4.1 illustrates that mdn > X̄, and that the mode lies above the median.
The fact that the values of the three measures of central tendency can vary widely
depending on the shape of a distribution should serve as ample warning to any data an-
alyst or reader of the behavioral science literature. Relatively few distributions are com-
pletely normal-indeed, depending on the research topic, nonsymmetric distributions
may be, if you will, the relative "norm." Let me encourage you to carefully examine your
own data as well as what you read in the published literature, and to know the value and
location of each measure of central tendency before drawing any psychological or be-
havioral conclusions dependent on them.
In the wrong hands, an asymmetric distribution of data can be used to distort fac-
tual information (e.g., Campbell, 1974). If you were trying to negotiate a higher salary
for yourself or your coworkers, for instance, you would want to know whether the mean,
median, or modal salary in your company was the highest one. If the distribution of
salaries were positively skewed, you would want to argue for the mean, but a disingenu-
ous but savvy management would probably suggest the median or even the mode. If the
data were negatively skewed, however, you would be undercutting yourself if you took
the mean salary, as the median and modal salaries, respectively, would be relatively
higher (see also, Kirk, 1999, p. 87). Caveat emptor with a vengeance!
Each peer tutor offered academic assistance to his or her learning group more times
(M = 7.2) during the work session than did the traditional tutors (M = 4.3) or the
teacher, who worked with students in the control group (M = 4.0).
If there is a relatively large number of means to report, you will probably want to place
them into a table (see chapter 3). If you do so, just be sure to explain the table to readers
and to lead them carefully through it. Describing the mean relationships in the text is
important, of course, and you will presumably need to repeat a mean or two from the table. Avoid repeating in the text all the numbers shown in the table, however; such redundancy defeats the legitimate purpose of using a data table.
Data concerning the median and the mode are similarly descriptive in nature. In an
animal learning experiment, an author might note that, "The median pecking rate of the
pigeons placed in the operant chamber was 127." The mode might best be linked to the
median or the mean within a research summary. Consider a hypothetical sociological
investigation of the smoking habits of middle school students:
Sixth-, seventh-, and eighth-graders were asked to indicate whether they ever smoked
cigarettes and, if so, how many per day. Forty percent of the students said they had
smoked at least one cigarette, and 26% of these indicated they smoked daily. The mean
number of cigarettes smoked per day was 8.0. For comparison purposes, the mode was
4.0 cigarettes per day.
In this study, the presence of the mode told readers a bit more about the students' smoking habits than the mean alone would have. A statistically sophisticated reader like you would posit that the distribution of cigarettes consumed in the
middle school sample was probably positively skewed. If you are not certain why this
should be so, review the earlier discussion concerning the shape of distributions and
likely positions of the central tendency measures.
Because they are somewhat less flexible than the mean, the median and the mode
appear less often in the Results section of an APA style article. The two measures are
much more likely to appear in the Method section of the article, where they often ac-
company pretest information (e.g., "The mdn of practice trials was 5."), brief descrip-
tions of standardized psychological measures (e.g., "High and low agoraphobia groups
were created by using the median score on the Public Space Scale as a guide."), and par-
ticipant sample descriptions (e.g., "The most frequent participant category was divorced
mother with two children. This modal group had 12 members."). Of course, the mean
can also be a useful statistic for a Method section (e.g., "Mean age of respondents was
27.5 years"). It all depends on what information appropriately but concisely helps to
clarify the scope and goals of the research in question.
Please understand, then, that there is no pressing need to report more than one or
even all three measures of central tendency unless you have a good reason for doing so.
The descriptive quality of these measures is important, but only when they support a
conclusion or fact the researcher wishes to highlight-otherwise they serve as excess
writerly baggage that detracts from the main points of the research.
Knowledge Base
1. Calculate the mean of the following scores:
19 22 8 14 25 17 6 16 9 2
Answers
1. The mean is 13.8.
2. The mean is 3.54.
3. The median is 10.5.
4. The modes are 5 and 3, respectively.
5. Measures of central tendency enable us to characterize data with one representative or typical
number.
6. The distribution is apt to be positively skewed.
■ Understanding Variability
Measures of central tendency, particularly the mean, characterize a sample or popula-
tion of observations with one number. To be sure, this indexing of central tendency-
where does the general behavior tend to fall and what is it like?-is useful, but re-
searchers need contextual information in order to properly evaluate it. Context is
provided by the manner in which the observations within a distribution disperse them-
selves. Are the scores clustered close together, right on top of one another, or spread
very far apart? Clustering, spread, dispersion-each word is a synonym for the concept
of variability.
KEY TERM Variability refers to the degree to which sample or population observations differ or deviate from
the distribution's measures of central tendency.
When the values in a distribution are very similar to the distribution's average value-
that is, the spread of the observations around the mean is small-then variability is said
to be low. When a distribution's values are dissimilar from its mean-that is, the spread
of observations around the mean is large-variability is high.
The clustering, spread, or dispersion of data emphasizes the relative amount of variability present in a distribution.

For the moment, using numerical values to illustrate variability may seem a bit abstract. In more concrete terms, how can we construe variability in a sample or population? Imagine that you wanted to compare recall rates for two participant groups, an experimental group that relied on a particular memory strategy and a control group that did not. Cognitive psychologists have studied what is called the method of loci (Hayes, 1981), a strategy for remembering a list of unrelated items sequentially.
Participants are coached to mentally "walk through" a familiar place (e.g., home, cam-
pus) and to associate each item (e.g., "umbrella") with an object ("I will visualize the
'umbrella' on the chair in my living room") found along the "path." To recall the items,
you simply "retrace" your steps. The loci method is very effective and easy to use, so
much so that a list of 20 items can be recalled with ease (Hayes, 1981). The participants
in the experimental group could learn the loci strategy in a few minutes, whereas the
control group would receive no special instruction before hearing and then recalling the
item list.
Behaviorally, low variability in the experimental group's recall scores could mean that most participants recalled all the items; that is, they more or less acted or responded in the same way (i.e., on average, individual recall rates were close to the group's mean). Perhaps members of the experimental group were found to recall an average of 18 out of 20 words. Higher variability in the control group's behavior could indicate that recall rates were relatively freewheeling, that without a common memory strategy, some people tended to remember fewer words and make more errors than others (i.e., generally, recall scores were far away from the group's mean). The hypothetical study might find that the mean recall rate for the control group was 12 words.
Thus, the loci method appears to be superior; the mean number of items recalled in the experimental group (M = 18.0) was higher than that of the control group (M = 12.0).
Please note, however, the importance of knowing about the variability within the distri-
bution of each group. A mean tells you only part of the behavioral story. You need to
know how close or far away from one another-how variable-the scores were in the re-
spective groups in the experiment.
The shape of a distribution, its skewness or kurtosis, often characterizes its variability.

You may not realize it, but in a statistical sense, you already have some working knowledge of variability. In chapter 3, the concepts of skewness and kurtosis were introduced (skewness was also reviewed here in chapter 4). Both concepts deal with the shape of distributions, or how the observations in a set of data are spread along a continuum. When a distribution is positively or negatively skewed, the scores cluster, respectively, on the lower and higher ends of the measurement scale. Skewed distributions, too, have a longish "tail" that increases their variability. In contrast, a normal distribution will have two much shorter tails and observations falling in its center, two factors that contribute to its relatively lower variability.
What about kurtosis? If you look back to Data Box 3.e, you will recall that kurtosis
refers to the peakedness or flatness of a curve. Three types of kurtosis-meso-, lepto-,
and platykurtic-were introduced to characterize whether the spread of scores in a given
distribution was relatively normal, tall and skinny, or broad and flat (see Data Box 3.e).
That is, a mesokurtic distribution is apt to have relatively low variability, a leptokurtic
one will have almost no variability, and a platykurtic curve will have a great deal. Again,
it is important for you to envision the degree to which values in a set of data deviate from
the mean, as the amount of deviation indicates the relative amount of variability present.
Keep the shapes of distributions in mind as you learn the quantitative indices used to in-
dicate variability.
Generally speaking, what do researchers in the behavioral sciences desire where
variability in their research is concerned? As we will see in later chapters where inferen-
tial statistics are reviewed, researchers usually want a low degree of variability within a
given condition in an experiment. In other words, an investigator hopes that the level of
an independent variable presented to the experimental condition will cause most people
in it to behave similarly to one another (i.e., close to some mean level). Accordingly, the
members of a control group, too, should act like one another (i.e., close to another mean
level, one differing in value from the experimental group). Further, researchers hope that
the variability between the two groups is relatively high; that is, the independent variable
truly had an effect because the behavioral differences between the two groups are pro-
nounced. We will review this research desideratum many times. Here is a simple
mnemonic, a memory aid, you can learn to describe the desired pattern of variability in
experimental groups: Low within, high without.
■ The Range
The most basic index of variability is called the range or, on occasion, the crude range or
even the simple range. When it is used to describe the variability of a distribution, the
range is reported with the mean, the median, or the mode.
KEY TERM The range is the difference between the highest and the lowest score in a distribution.
The range entails subtracting a smaller number from a larger one, and it identifies the distance between the two extreme points in a distribution. The formula for the range is:

[4.6.1]  range = Xhigh - Xlow.

If the lowest observation in a sample of data were 20 and the highest was 75, then the range would be 55, or:

[4.6.2]  range = 75 - 20 = 55.
Please note that we are calculating the range-represented by only one number-and
not the range of scores (i.e., 20 to 75), which entails two values.
The range is the most basic index of variability.

The range is readily capable of accommodating a distribution's true limits. In the prior example, the true limits of the scores 20 and 75 might be 19.5 and 75.5, respectively (i.e., 1/2 the unit of measure added to the highest and subtracted from the lowest scores in a distribution). In this case, the range would be 56 (i.e., 75.5 - 19.5 = 56) instead of 55, which was found for the raw scores.
'd J~e range is a rough-and-ready way to get the sense of a distribution but it has a de
~~n~e i~~~~j~~;~oL~:t:~~ean, t~
extreme scores easily affect it. Similar the mean, th~
lady one aberrant Score th7t ~hen e:reme Scores are present. Outlying scores, particu-
range, giving a false sense ofwlsh"teryth adr away from any other, can artificially inflate the
a e ata are really 1& Th " f
example, might fall into a somewhat I d" e. e maJonty 0 the scores, for
outlying score will obscure the d' t 'bno:m~ b Istnbution, but the presence of even one
. IS n utlOn s alance.
As a practical matter, however, calculating or checking the range of a measure with
a limited number of possible values is an excellent way to catch mistakes. If a given
measure, say, a 7-point rating scale, is used and the range is found to be 10 (e.g., a high
score of 11 minus a low score of 1), some error occurred; logically, this range cannot
exceed 6 (i.e., 7 - 1). The error in this case is apt to be a data entry error or "typo"-
perhaps the number 7 was being entered and 11 was substituted by mistake. Thus,
checking ranges to make certain that the values make sense is a straightforward way to
catch errors in a data set.
Checking the ranges of variables is a good way to catch errors in a data set.
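This sort of range check is easy to automate. Below is a minimal sketch in Python; the function name and the example scores are my own illustration, not the text's:

```python
# Hypothetical illustration of a range check on a 7-point rating scale.
def check_range(scores, low, high):
    """Return the observed range and any scores outside the scale's limits."""
    observed_range = max(scores) - min(scores)   # range = X_high - X_low
    out_of_bounds = [x for x in scores if not (low <= x <= high)]
    return observed_range, out_of_bounds

ratings = [3, 5, 7, 2, 11, 4]        # 11 is a plausible typo for 1
rng, bad = check_range(ratings, 1, 7)
print(rng)   # 9 -- exceeds the maximum possible range of 6, so something is wrong
print(bad)   # [11]
```

Because the maximum possible range on a 1-to-7 scale is 6, any observed range above 6 signals a data entry problem.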
In practice, the interquartile range is less likely to be used than the semi-
interquartile range, which characterizes the distance between the 75th and the 25th
percentiles, and then divides the distance in half.
KEY TERM The semi-interquartile range overcomes the instability of the range by providing a numerical
index of half the distance between the first (Q1) and third (Q3) quartiles in a distribution.
Larger values representing the semi-interquartile range indicate a greater distance be-
tween Q1 and Q3; that is, a greater spread of scores. Smaller values, then, would point to
less spread. Here is the formula used to calculate the semi-interquartile range, which can
be abbreviated as SIQR:
[4.8.1] SIQR = (Q3 - Q1)/2.
[Figure 4.2: The Semi-Interquartile Range in a Normal and a Skewed Distribution. Panel (a) shows a normal distribution with Q1 = 40, Q2 (median) = 55, and Q3 = 70; panel (b) shows a skewed distribution with Q1 = 18.5, Q2 = 27, and Q3 = 35.5.]
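A quick arithmetic sketch of the SIQR formula, applied to the quartile values shown in Figure 4.2 (the function name is mine):

```python
# Semi-interquartile range: half the distance between Q1 and Q3.
def siqr(q1, q3):
    return (q3 - q1) / 2

print(siqr(40, 70))      # 15.0 for the normal distribution in panel (a)
print(siqr(18.5, 35.5))  # 8.5 for the skewed distribution in panel (b)
```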
Variance and Standard Deviation
KEY TERM Variance is equal to the average of the squared deviations from the mean of a distribution. Sym-
bolically, sample variance is s² and population variance is σ².
This definition for variance seems all too technical, but bear with me for a few min-
utes. You already have a conceptual knowledge regarding variability, but it does need
to be bolstered with a bit of technical knowledge. As you will see, working through
this technical knowledge will help you appreciate the utility of statistics generally and,
more specifically, the virtues of the standard deviation-but now back to explaining
variance.
What is meant by a deviation from the mean? One way to explain the fundamental
role the mean plays in statistics is to acknowledge that its value is completely dependent
on every single score in a distribution-adding or subtracting an observation will
change the value of the mean. There is more to the mean than such numerical
sensitivity, however. Let's examine the role of deviation from the mean in a simple data
set. Below is a frequency distribution that notes the values of X in column 1, the mean
(X̄) of the distribution in column 2, and deviation scores-each value of X minus
X̄-in column 3.
Variance is the numerical index of variability, being based on the average of the squared deviations from the mean of a data set.
X    X̄    X - X̄
6    4      2
5    4      1
3    4     -1
2    4     -2
          Σ(X - X̄) = 0
What stands out here? When calculating the sum of the deviation scores, or Σ(X - X̄),
you will discover that it is always equal to O. What does this mean? Two things, really.
First, the mean is the balance point between the high and the low scores in this (or any)
distribution. Second, statistically speaking, we cannot work with the deviation scores in
their present form precisely because they are positive and negative numbers that always
sum to O. The deviation scores must be transformed in some way so they can be entered
into formulas and used in calculations.
Our statistical recourse is to square the deviation scores so that we will always be
working with relatively large and positive numbers. Unless you are working with 0, 1, or
with decimals between 0 and 1, squaring a number will always increase its value. By
squaring the deviation scores, we create a fundamental and helpful number-the sum of
the squared deviations from the mean, otherwise known as the sum of squares, which is
always abbreviated in statistical works as SS.
When squared and summed, mean deviation scores yield positive numbers that are useful in a variety of statistical analyses.
KEY TERM The sum of squares (SS) is the sum of the squared deviations from the mean of a distribution. The
SS is fundamental to descriptive and inferential statistics.
Symbolically, the SS looks like this:
[4.9.1] SS = Σ(X - X̄)².
The SS will be used extensively in the calculation of inferential statistics that appear later
in this book, so I advise you to become familiar with it now. This frequency distribution
Chapter 4 Descriptive Statistics: Central Tendency and Variability
is modified somewhat by (X - X̄)². As you can see immediately below, summing the
squared deviations results in a positive number (i.e., SS = 10). Please make certain that
you understand where the SS came from before you proceed:
X    X̄    (X - X̄)²
6    4      4
5    4      1
3    4      1
2    4      4
          Σ(X - X̄)² = SS = 10
There is a second formula that is used to calculate the SS, and it, too, is useful for
working with a table of data. It is often referred to as the "computational formula" for the
SS. Here it is:
[4.10.1] SS = ΣX² - (ΣX)²/N.
There are no unfamiliar calculations here. We can use the data table from above and
recalculate the SS using the computational formula. All we need to do is calculate
ΣX² and (ΣX)². As we noted back in chapter 1, remember that ΣX² ≠ (ΣX)². Here
is the revised data table:
X     X²
6     36
5     25
3      9
2      4
ΣX = 16    ΣX² = 74
The values of ΣX and ΣX², as well as N, are then entered into [4.10.1]:
[4.10.2] SS = 74 - (16)²/4,
[4.10.3] SS = 74 - 256/4,
[4.10.4] SS = 74 - 64,
[4.10.5] SS = 10.
As must be the case, the same answer for the calculation of the SS is found using [4.9.1]
and [4.10.1].
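The agreement between the definitional formula [4.9.1] and the computational formula [4.10.1] can be checked in a few lines of Python (a sketch; the function names are mine):

```python
# Definitional formula [4.9.1]: SS = sum of squared deviations from the mean.
def ss_definitional(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

# Computational formula [4.10.1]: SS = sum(X^2) - (sum X)^2 / N.
def ss_computational(xs):
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n

data = [6, 5, 3, 2]            # the text's example data
print(ss_definitional(data))   # 10.0
print(ss_computational(data))  # 10.0
```

Both routes necessarily give the same SS; the computational version simply avoids calculating individual deviation scores.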
There will generally be more than one procedural formula available for calculating statistics like the sum of squares (SS).
Now, consider what happens when we have a larger number of observations than
what is shown in the frequency distribution, and these observations have values great
and small. If we use them all to calculate the SS of their distribution, we will discover
something interesting. If the scores are spread out and relatively far from the mean-
that is, they have a relatively high degree of variability-then the total SS for the
distribution will be a relatively large number. On the other hand, if the scores are close
to the mean, even clustered, then there will be less variability and a smaller total for
the SS. Intuitively, then, we know that scores that are more deviant from the mean
will result in a higher SS and be indicative of greater variability (people behaved
somewhat differently from each other). The reverse will hold true for scores
that fall closely around the mean of a distribution (people behaved similarly to one
another).
The size of the SS, then, is influenced by the magnitude of the deviation scores
around the mean. A second factor, the number of observations available, also plays a
role in the SS's size. Generally speaking, the more scores available in a distribution, the
larger the SS will be, especially if some of the scores diverge greatly from the mean. As
we acknowledged on earlier occasions, the more observations we have in a data set the
better, but in that case, we need a way to make the SS a good indicator of dispersion.
We need-so to speak-to get the SS under our control. The most direct way to do so
is by dividing the SS by the number of observations (i.e., N) available. By doing so, we
are calculating a number that represents the average of the squared deviations from the
mean, that is, the variance. (At this point, you might want to review the definition of
variance presented at the start of this section of the chapter.) We now turn to the
measurement of variance in samples, then populations, and finally, where and how
sample statistics are used to estimate population parameters.
Greater variability means less consistency (larger deviations between X and X̄) in behavior, whereas less variability leads to greater consistency (smaller deviations between X and X̄).
[4.11.1] s² = Σ(X - X̄)²/N.
Due to its derivation from the SS, s² can also be represented by:
[4.12.1] s² = SS/N.
Because we already calculated the SS above, we need only enter the relevant numbers
into [4.12.1] to determine the variance:
[4.12.2] s² = 10/4,
[4.12.3] s² = 2.5.
It follows that the variance can also be calculated by incorporating the SS information
from [4.10.1], except that now the formula, which already includes division by N, must
itself be divided by N:
[4.13.1] s² = [ΣX² - (ΣX)²/N]/N.
Let's enter the numbers we know into [4.13.1] to illustrate that we can obtain the same
variance shown previously in [4.12.3]. We can simply repeat the information found in
[4.10.2] and then divide by N = 4 again:
[4.13.2] s² = [74 - (16)²/4]/4,
[4.13.3] s² = [74 - 256/4]/4,
[4.13.4] s² = [74 - 64]/4,
[4.13.5] s² = 10/4,
[4.13.6] s² = 2.5.
The variance, then, can be calculated directly if the SS is known (i.e., using [4.12.1])
or with a few more steps if the SS must be determined from scratch (i.e., using [4.13.1]).
As a measure of dispersion, the variance is useful for averaging the effects of
greater and lesser deviations from the mean. However, the variance represents the av-
erage of the sum of the squared deviations between each observation X and the mean
of a distribution. It does not represent the average deviation between an observation
and the mean using the original scale of measurement. In other words, we need to
transform the squared deviations to average deviations-and the way to achieve this
goal is by taking the square root of the variance, which converts the variance to what
is called the standard deviation.
KEY TERM The standard deviation is the average deviation between an observed score and the mean of a dis-
tribution. Standard deviation, symbolized s, is determined by taking the square root of the variance,
or √s² = s.
The formula for the standard deviation of a sample is:
[4.14.1] s = √s².
We can easily determine the standard deviation for the variance shown in [4.13.6]:
[4.14.2] s = √2.5,
[4.14.3] s = 1.5811.
As can be seen from [4.14.1], once you know the value of the variance, the standard de-
viation is also known once the square root of the variance is taken-an easy feat with a
calculator. It follows, of course, that if you know the standard deviation, you can deter-
mine the variance by squaring s, or:
[4.15.1] s² = (s)²,
[4.15.2] s² = (1.5811)²,
[4.15.3] s² = 2.5.
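The round trip from variance to standard deviation and back ([4.12.1], [4.14.1], and [4.15.1]) can be verified with the chapter's data (a Python sketch; the helper name is mine):

```python
import math

# Sample variance via [4.12.1]: s^2 = SS / N.
def sample_variance(xs):
    mean = sum(xs) / len(xs)
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / len(xs)

data = [6, 5, 3, 2]
s2 = sample_variance(data)          # 2.5
s = math.sqrt(s2)                   # [4.14.1]: s = sqrt(s^2)
print(round(s2, 4), round(s, 4))    # 2.5 1.5811
print(round(s ** 2, 4))             # 2.5 -- squaring s recovers the variance
```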
As we will see, the idea of standard deviation and the descriptive statistic, s, are es-
sential to many of the inferential statistics that researchers use. The basic idea behind the
concept and the numerical value is actually rather straightforward: Standard deviation
describes the typical distance-the average deviation-between a given score in a distri-
bution and the mean. It is a very useful index of variability, and as you gain experience
calculating and thinking about it, you will develop an intuitive sense of what it means in
terms of any given data set. For now, conceptualize it the same way we conceptualize
other indicators of variability-smaller standard deviations indicate that observations
fall closer to the mean and larger standard deviations suggest that, on average, observa-
tions fall farther away from the mean.
Homogeneity and Heterogeneity: Understanding the Standard
Deviations of Different Distributions
Smaller standard deviations indicate that observations fall close to a distribution's mean.
These low average deviations from the mean also indicate that the observations in the
distribution tend to be similar to one another. Statisticians refer to high degrees of sim-
ilarity-that is, low dispersion-as homogeneity, and the observations in such a distri-
bution are said to be homogeneous. When data are spread farther apart in a distribu-
tion, when observations tend to be dissimilar in value, a distribution is classified as
heterogeneous. Heterogeneity occurs when a distribution's values show a relatively high
degree of dispersion around a mean.
The standard deviation is the most common and, arguably, the best single index of a distribution's variability.
These twin concepts are very useful when it comes to characterizing and comparing
distinct distributions. Figure 4.3 shows two distributions, each with a different mean.
Their standard deviations, too, vary: panel (a) has a relatively low degree of spread
around the mean (i.e., the standard deviation is 2.0) and panel (b) shows greater disper-
sion around the mean (i.e., the standard deviation is 5.0). As you can see, panel (a) in
Figure 4.3 would best be described as homogeneous-the observations are closely clus-
tered around the mean. Notice that the numbers around the mean in panel (a) represent
the standard deviation, which is small-this fact is also shown by the rather tight curve
surrounding the mean. The spread in panel (b) in Figure 4.3 is much wider, just as this
distribution's standard deviation is much larger. As you can see, the standard deviation
intervals around the mean of panel (b) cause the curve to be much shorter and more
spread out than panel (a). Thus, especially when it is compared to panel (a), the panel
(b) distribution is clearly heterogeneous.
Low dispersion = homogeneous observations; high dispersion = heterogeneous observations.
Besides serving as a good index of dispersion and providing a base for comparison
among different distributions, the standard deviation serves one other function: It lo-
cates the bulk of the scores in any distribution. As can be seen in Figure 4.3, both distri-
butions contain shaded areas around the mean and, as you know, these areas were cre-
ated by taking each mean and then adding and subtracting their respective standard
deviations. More to the point, though, by "bulk" of the scores in each distribution, I
mean that about 68% of the available observations fall within the first standard devia-
tion intervals around each mean, leaving about 32% of the remaining observations in
the distributions' tails (i.e., in the second and third standard deviation intervals
Distribution    Mean    Standard Deviation
A               30      2.0
B               72      5.0
[Figure 4.3: Comparing Distributions: Homogeneity and Heterogeneity. Panel (a) shows homogeneous distribution A (mean 30, standard deviation 2.0); panel (b) shows heterogeneous distribution B (mean 72, standard deviation 5.0).]
extending outward to the left and right of the mean). Why do distributions possess this
sort of standardization? We will explore this matter in chapter 5, when we learn about
the consistency of standard deviation intervals and their utility for locating observations
in a distribution.
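The 68% figure can be checked with a rough simulation (my own illustration, using the mean and standard deviation from panel (a) of Figure 4.3):

```python
import random

# Simulate a large, roughly normal sample with mean 30 and SD 2.0, then
# count the share of scores within one standard deviation of the mean.
random.seed(42)
mean, sd = 30.0, 2.0
scores = [random.gauss(mean, sd) for _ in range(100_000)]
within_one_sd = sum(1 for x in scores if mean - sd <= x <= mean + sd)
share = within_one_sd / len(scores)
print(round(share, 3))   # roughly 0.68
```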
Table 4.7

X     X²     f     fX     f(X²)
10    100    6     60     600
 9     81    8     72     648
 8     64    7     56     448
 7     49    6     42     294
 6     36    6     36     216
 5     25    5     25     125
 4     16    9     36     144
 3      9    2      6      18
 2      4    8     16      32
 1      1    2      2       2
N = Σf = 59    ΣfX = 351    Σf(X²) = 2,527
ΣX = 351    ΣX² = 2,527    (ΣX)² = 123,201

s² = [ΣX² - (ΣX)²/N]/N
s² = (2,527 - 2,088.1525)/59
s² = 438.8475/59
s² = 7.4381 ≈ 7.44
s = √s²
s = √7.4381
s = 2.7272 ≈ 2.73
numbers into the respective equations for variance and standard deviation shown at the
bottom of Table 4.7.
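The Table 4.7 computation can be reproduced from the frequency table itself, weighting each score by its frequency (a Python sketch; the variable names are mine):

```python
import math

# (X, f) pairs from Table 4.7: each score X occurs f times.
table = [(10, 6), (9, 8), (8, 7), (7, 6), (6, 6),
         (5, 5), (4, 9), (3, 2), (2, 8), (1, 2)]

n = sum(f for _, f in table)              # N = sum of f = 59
sum_x = sum(f * x for x, f in table)      # sum of fX = 351
sum_x2 = sum(f * x * x for x, f in table) # sum of f(X^2) = 2,527

s2 = (sum_x2 - sum_x ** 2 / n) / n        # computational formula for s^2
s = math.sqrt(s2)
print(n, sum_x, sum_x2)                   # 59 351 2527
print(round(s2, 2), round(s, 2))          # 7.44 2.73
```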
That is, the population variance is the sum of the squared deviations between all obser-
vations (X) in the population and the mean of the population (IL), which is then divided
by the total number of available observations. If we want to "unpack" this formula a bit,
we remember that the population 55 would be based on:
[4.17.1] SS = Σ(X - μ)².
Because this relationship holds true here as well as for sample data, it follows that the
population variance can also be calculated by:
[4.18.1] σ² = SS/N,
or by
[4.19.1] σ² = [ΣX² - (ΣX)²/N]/N.
As you can see, these three formulas (i.e., [4.17.1], [4.18.1], [4.19.1]) are mathematically
similar to [4.9.1], [4.12.1], and [4.13.1], respectively, because the relationships being ex-
amined among the variables are the same. The only difference is that we are now con-
sidering describing the variance of a "knowable" population.
What about calculating the standard deviation of a knowable population? The
symbol for a population's standard deviation is σ, or "lowercase sigma." As you no
doubt discerned already, it is simply a matter of taking the square root of the population
variance, or:
[4.20.1] σ = √σ².
We showed above that when the standard deviation of a sample is known, by squaring it
the variance also can be known. This relationship holds true for population data as well,
therefore:
[4.20.2] σ² = (σ)².
Finally, the population standard deviation can also be determined by taking the square
root of the SS divided by N:
[4.21.1] σ = √(SS/N).
Practically speaking, then, accessible, intact populations are treated no differently
than sample data-only the symbolic relations within the appropriate formulas change.
What happens when we try to use sample statistics to infer the characteristics of larger,
potentially unknowable populations? We must estimate population characteristics by
using slightly adjusted formulas for working with sample data, the topic of the next
section.
s² = Σ(X - X̄)²/N.
If we place a radical or square root sign over the equation from [4.11.1], we have the
formula for the sample standard deviation. Written more simply, of course, we can rely
on [4.14.1]:
s = √s².
What if we are not describing a sample but, rather, trying to estimate the population
variance and standard deviation? We must use modified formulas to do so because
[4.11.1] and [4.14.1] provide what is called a biased estimate of these population
parameters.
KEY TERM A biased estimate (sometimes called a "biased estimator") is any descriptive statistic that will
reliably overestimate or underestimate a corresponding population parameter.
KEY TERM An unbiased estimate (sometimes referred to as an "unbiased estimator") is any descriptive
statistic that reliably approximates the value of a population parameter.
"Unbiased" means that any calculated sample statistics are closer to actual population
parameter values than would be the case if a biased estimate formula were used.
This is another occasion when the size of a sample affects the accuracy of a statistic;
where sampling is concerned, bigger is better. When the N of a sample is relatively small
(e.g., < 20 observations), estimates based on the unbiased population variance and stan-
dard deviation formulas will still differ somewhat from the actual parameters. For a
larger N (e.g., 100 or 200 observations), however, unbiased estimates will approach the
actual values of the population parameters. Errors in estimation, then, will be greater
when samples are small, gradually decreasing as sample sizes grow.
Biased estimators consistently overestimate or underestimate parameters, whereas unbiased estimators tend to approximate parameters.
New symbols will indicate when unbiased estimates are being used (recall Table 4.1).
The unbiased estimate of the variance is ŝ² and the unbiased estimate of the standard de-
viation is ŝ. The "caret" or ^ means that the correction factor of N - 1 was used as the
denominator in the respective formulas. The formulas for the unbiased estimate of the
population variance and population standard deviation, respectively, are:
[4.22.1] ŝ² = Σ(X - X̄)²/(N - 1) = SS/(N - 1),
[4.23.1] ŝ = √(SS/(N - 1)).
ŝ² = Σ(X - X̄)²/(N - 1) ≅ σ² = Σ(X - μ)²/N
It follows that the unbiased estimate of the population standard deviation should ap-
proximate the population's true standard deviation:
ŝ = √[Σ(X - X̄)²/(N - 1)] ≅ σ = √[Σ(X - μ)²/N]
We can work through one example to illustrate how the unbiased estimates for the
population parameters differ from the (biased) estimates provided by sample statistics.
Imagine that we draw a sample of N = 6 observations. These X observations, ranked
from high to low scores, are shown below. A second column provides the X². The
calculation of the SS is shown underneath the data table. Please beware of a potential
error when calculating the SS using the computational formula: N, not N - 1, is first
used to determine the SS; N - 1 (or, in the case of sample variance, N) is subsequently
used as a denominator after the SS is known.
The presence of a caret (^) indicates that an estimate is unbiased, one usually involving a correction factor of N - 1 in a formula's denominator.
X     X²
9     81
7     49
5     25
4     16
3      9
1      1
ΣX = 29    ΣX² = 181

SS = ΣX² - (ΣX)²/N,
SS = 181 - (29)²/6,
SS = 181 - 841/6,
SS = 181 - 140.1667,
SS = 40.833.
Once we know that the SS is 40.833, we are free to calculate the variance and the
standard deviation for the data. If we want to estimate the parameters of the population
where these data were drawn from, then formula [4.22.1] should be used to calculate the
variance:
ŝ² = SS/(N - 1) = 40.833/5 = 8.167.
The unbiased estimate of the population standard deviation would be based on formula
[4.23.1]:
ŝ = √8.167 = 2.8577.
In contrast, the sample statistics use N in the denominator:
s² = SS/N = 40.833/6 = 6.8055,
s = √s² = √6.8055 = 2.6087.
As you can see, the unbiased estimates of the population variance and standard devia-
tion are indeed larger than the corresponding sample statistics.
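The contrast between the biased and unbiased estimates can be verified directly (a Python sketch using the six observations above):

```python
import math

# The six observations from the text's example.
data = [9, 7, 5, 4, 3, 1]
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)   # SS = 40.833

s2_sample = ss / n                        # biased: divides by N
s2_unbiased = ss / (n - 1)                # unbiased: divides by N - 1
s_sample = math.sqrt(s2_sample)
s_unbiased = math.sqrt(s2_unbiased)

print(round(ss, 3))                                    # 40.833
print(round(s2_sample, 4), round(s2_unbiased, 3))      # 6.8056 8.167
print(round(s_sample, 4), round(s_unbiased, 4))        # 2.6087 2.8577
```

As the text notes, the N - 1 versions are larger than the corresponding sample statistics.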
Skeptical readers and researchers-you should be the former working toward be-
coming the latter-will wonder whether the N - 1 correction for unbiased estimates is
truly necessary. As behavioral scientists are wont to say, "it's an empirical question,"
meaning that we need to see some actual data demonstrating the presumed effect to re-
duce or eliminate any skepticism. As modern statistics developed, statisticians found
that over many, many trials employing observations from real as well as theoretical
populations, the N - 1 correction lessened the distance between estimated and actual
parameters. Is the correction perfect? No, but it reaches a level that statisticians find to
be descriptively acceptable and satisfactory for subsequent applications or analyses.
Keep in mind that statistics and data analysis are tools that provide guidance to be-
havioral scientists, not definitive answers-an estimate is always just an estimate, and a
researcher or student must decide if it seems to be a reasonable one. What matters here
is how close these approximations are to parameters that are sometimes knowable,
other times not. When analyzing data using formulas for unbiased estimates, we assume
that our results are close, if not entirely accurate, the majority of the time. In other
words, statistical estimates enable us to approximate reality, to close in on the relative lo-
cation of a parameter, not to define it with absolute certainty. Later, we will learn how
this assumption is very important for the testing of research hypotheses within
experiments.
In chapter 1, we acknowledged that some calculators are more complicated to use than others.
This observation is especially true for calculators that have statistical functions already pro-
grammed into them. There is a good chance you have one-and I am guessing that there is an
equally good chance that you never bothered to read the instruction manual that accompanied it.
You should remedy this oversight quickly, right now.
Why? Do you know whether your calculator uses the biased estimate formula or the unbiased
estimate formula when it calculates the variance and the standard deviation? I have taught more
than my fair share of students who become understandably frustrated when the variance they
calculated by hand does not match the answer on their calculator's screen. Why the error or ap-
parent confusion? The students might be trying simply to describe a sample's variance (denomi-
nator of N) while the calculator is seeking to estimate the variance of the population the sample
hailed from (denominator of N - 1).
To add to the confusion, some calculators have a variance key (labeled s², S², or σ²) or a
standard deviation key (labeled s, S, or σ), and still others use a key symbol like "s - 1" or even
"σ - 1". What do these various keys mean? Presumably, the presence of a "-1" indicates the cal-
culator's standard routine is to provide unbiased estimates of parameters. Yet some calculators will
give unbiased estimates but their keypad lacks the telltale "-1," and some machines provide both
sample statistics as well as population estimates (users just need to know which keys to push in
what sequence!). Avoid being plagued by wrong answers and frustration-before you check your
homework or analyze any data, be sure that you know which formulas for variance and standard
deviation your calculator relies on.
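The same N versus N - 1 ambiguity appears in software, where it is at least documented. Python's standard statistics module, for example, exposes the two conventions under separate names:

```python
import statistics

data = [6, 5, 3, 2]                       # the example data from this chapter
print(statistics.pvariance(data))         # 2.5 -- divides by N (describes the sample)
print(statistics.variance(data))          # divides by N - 1 (estimates the population)
print(round(statistics.pstdev(data), 4))  # 1.5811
print(round(statistics.stdev(data), 4))   # 1.8257
```

Knowing which convention a tool uses is exactly the check this box recommends for calculators.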
Knowledge Base
1. Determine the range for the following data:
10 12 16 19 22 23 28 32 34
166 Chapter 4 Descriptive Statistics: Central Tendency and Variability
2. Turn to Table 3.11 on page 121. Using the quartile information from this distribu-
tion of scores on the Life Orientation Test (LOT):
a. Calculate the semi-interquartile range.
b. If the mean LOT score is 22.83, between what two scores do half of the distribu-
tion's observations fall?
3. Consider the following table of sample data:
X
8
5
4
2
1
a. Calculate the SS for these data.
b. Calculate the sample variance (s²).
c. Calculate the sample standard deviation (s).
4. Treating the data in question 3 as a population, determine μ, σ², and σ.
5. Using the data in question 3, calculate the unbiased estimates of the population
variance and standard deviation.
Answers
1. range = 34 - 10 = 24
2. a. SIQR = (36 - 13)/2 = 11.5 b. 22.83 ± 11.5 = 11.33 and 34.33
3. a. SS = 30 b. s² = 6 c. s = 2.45
4. μ = 4.0; σ² = 6.0; σ = 2.45
5. ŝ² = 7.5; ŝ = 2.74
Factors Affecting Variability
Sample Size. Generally speaking, greater variability is apt to be found in studies em-
ploying smaller sample sizes than in research based on larger numbers of observations
(see also, Data Box 4.D). Although the utility of larger samples will be discussed in
detail in chapter 9, a short preview can be made here. Unlike smaller samples, larger
samples of data are more likely to reflect the characteristics of their parent populations.
In this chapter, for example, larger samples were shown to result in closer matches
between unbiased parameter estimates and actual population parameters. Larger
samples, too, are likely to actually take on the shape of the parent distribution; indeed,
as we will see later, large enough samples can even approximate normal distributions,
making them ideal for inferential statistics.
Bigger samples are generally better samples where the reliability of inferential statistics is concerned.
Selection Process. All else being equal, random sampling and random selection are
more likely to result in distributions containing relatively low amounts of variability. If
every member of a given population has the same chance of being selected as all the oth-
ers, there is no bias in the formation of the sample. That is, the randomization process
will probably yield a distribution of observations that adequately portray the parent
population's characteristics. In the absence of any randomization, researchers can un-
knowingly draw on biased subsamples of a population or include nonrepresentative
members, resulting in higher than desired levels of variability in their responses. Ran-
domization is the raison d'être-the reason or justification-for a quality selection
process in any research effort.
response, say, because alternative responses are unavailable. The opposite problem, too
much variability, could occur if dependent measures are too sensitive, leading to many
outliers, as well as inconsistent responses to even related ideas. Ideal dependent measures
cut a middle course between these extremes by allowing participants, as well as investi-
gators, a moderate degree of response flexibility-change, reaction, response, action-
all must be measured appropriately.
Passage of Time Between the Presentation of the Independent Variable and Depen-
dent Measure. The passage of time is a practical matter that can influence judg-
ment and behavior in the lab or in everyday life: How long does the effect of a given
independent variable last? In most experiments, only a few minutes pass between the
manipulation of the independent variable and the measurement of the dependent
variable. Although the latter may have greater or lesser amounts of variability, it is not
likely to be due to time passage or related problems (e.g., fatigue, memory loss; see
also, Cook & Campbell, 1979). If the time between the presentation of one and then
the other is longer-days or weeks pass, even months, before a dependent measure is
administered-then variability in response is less likely to be attributed to the inde-
pendent variable. As demonstrated by Elizabeth Loftus' (Loftus & Palmer, 1974; see
also, Loftus, 1979) research on constructive memory processes for events occurring
during an accident, for example, people fill any vacuums in recall with stereotypic ex-
pectations (see also, Nisbett & Wilson, 1977). As an extraneous variable, then, time
can affect variability in ways that are not necessarily desirable where predicted results
are concerned.
What is the correct answer? Across a year, the smaller hospital is apt to have more days that
diverge from the 50% male and 50% female birth norm. Why? Think about the relationship be-
tween variability and sample size: Larger samples are likely to be more representative of any given
population (here, 50:50) than are smaller samples. In statistical contexts, this fact is called the law
of large numbers. Tversky and Kahneman (1971) note that people appear to fall prey to what they
dubbed (tongue in cheek) the law of small numbers, which renders even skeptical researchers in-
sensitive to the statistical problems present in small samples. Indeed, smaller samples are usually
not representative of their parent populations-they often contain extreme or outlying scores.
Thus, we should expect that because fewer births occur at the small hospital, anyway, it should be
the one to show the greatest divergence from the expectation of equal male and female births. Put
another way, smaller samples usually show higher variability or heterogeneity than large samples.
If you missed the problem, please go back and read it again-do you see why the smaller hos-
pital is the correct choice? Congratulations to readers who selected the right answer, but don't
worry if you missed it, as psychologists, mathematicians, and even statisticians have been known
to get such problems wrong. Tversky and Kahneman (1974; see also, Tversky & Kahneman, 1971)
report that most people who encounter this problem are insensitive to the effects sample size has
on outcomes; that is, they assume that each hospital should show about the same number of days
diverging from the 50:50 split.
Problems like this one, as well as real-life situations where some form of sampling occurs,
should demonstrate to you the powerful way that the size of a sample can sway its variability. The
lesson is clear: It is always a good idea to consider the size of a sample before drawing any conclu-
sions from it or answering any questions about it.
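The hospital problem invites a quick simulation. The sketch below is my own illustration; the daily birth counts (15 versus 45) and the 60% cutoff are assumed values chosen to echo the problem:

```python
import random

random.seed(1)

def extreme_days(births_per_day, days=365, cutoff=0.6):
    """Count days on which the proportion of boys exceeds the cutoff."""
    count = 0
    for _ in range(days):
        boys = sum(random.random() < 0.5 for _ in range(births_per_day))
        if boys / births_per_day > cutoff:
            count += 1
    return count

small = extreme_days(15)   # small hospital: about 15 births per day
large = extreme_days(45)   # large hospital: about 45 births per day
print(small, large)        # the small hospital logs many more extreme days
```

Running this for simulated years consistently shows the smaller hospital with far more extreme days, which is the law of small numbers at work.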
Please take note that when reporting such relationships, APA style dictates that values in-
side the parentheses be separated by a comma.
The second way to report means and accompanying standard deviations is,
of course, as entries displayed in a table of data. Instead of reporting the central ten-
dency and variability information in sentence form, it can be summarized in a table.
Table 4.8

           Experimental    Control
Males
  M            6.50          4.05
  SD           1.10          3.30
Females
  M            6.00          4.15
  SD           3.60          1.60
Note: Higher numbers indicate the task was rated as more interesting on a 7-point rating scale (extremely
uninteresting to extremely interesting).
Table 4.8 contains the male and female participants' mean ratings of two different tasks,
as well as the standard deviation within each participant group (for related suggestions
about presenting information in tabular form, see chapter 3). As shown by the relatively
small size of the respective standard deviations, males tended to rate the experimental
task the same, while females rated the control task similarly to one another. In contrast,
greater variability was found for the males' ratings of the control condition and the fe-
males' evaluation of the experimental task (see Table 4.8).
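The pattern Table 4.8 illustrates, small standard deviations where raters agree and larger ones where they disagree, is easy to verify from raw ratings. The 7-point ratings below are invented for illustration:

```python
import math

def mean_and_sd(ratings):
    """Return the mean and (biased, descriptive) standard deviation of a group."""
    n = len(ratings)
    m = sum(ratings) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in ratings) / n)
    return m, sd

agreeing = [6, 6, 7, 7, 6, 7]      # invented ratings clustered together
disagreeing = [1, 7, 2, 6, 3, 7]   # invented ratings spread apart

print(mean_and_sd(agreeing))       # (6.5, 0.5): small SD, raters agree
print(mean_and_sd(disagreeing))    # similar mean, but a much larger SD
```

Two groups can share nearly the same mean while their standard deviations tell very different stories about agreement, which is why a table like 4.8 reports both.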
Remember, N = the total number of observations comprising a sample; n = the number of observations within a given subsample of N.

Specific mention should be made somewhere regarding the number of participants in each condition in the experiment, as well as the total participating in the research. Some writers prefer to make note of such numerical information in the method section (e.g., "There were 40 participants in the study; 20 were placed in the experimental condition and 20 in the control condition") or in a Note appearing under a figure or table. Other writers prefer to insert specific numerical information into a table. Under each of the 4 SDs in Table 4.8, for example, one could add an "n = 10." Note that the use of n in this circumstance refers to the number of participants appearing in subgroups comprising the larger N (recall Data Box 4.A).
• Early on in this chapter, the mean-the arithmetic average-was identified as the single
best measure of central tendency for research in the behavioral sciences. Various expla-
nations for the ubiquity and utility of the mean were offered, but the supporting argu-
ments were more rhetorical than quantitative. I want to change that perspective some-
what now that you have a relatively solid understanding of the relation between central
tendency and variability. Several properties of the mean were discussed, but one was
postponed until now: the least squares principle for the mean. In fact, once I introduce
and define the least squares principle, I will leave it to you to prove its importance using
a distribution of data.
KEY TERM The least squares principle refers to the fact that, within a distribution, the sum of the squared deviations between the mean and the individual scores will be smaller than the sum of the squared deviations between any other value and those individual scores.
Factors Affecting Variability 171
X̄ = ΣX/N = 20/5 = 4.0
Another way to define the mean is that it represents the smallest average squared difference
between itself and the individual observations in a distribution; hence, the term "least
squares." As we will see later, this minimal amount of squared deviation is one of the pri-
mary reasons that the mean figures so prominently in the family of inferential statistics.
What must be shown for the present, though, is that the sum of squares (SS)
around the mean of a distribution must be smaller than the SS for all other obser-
vations (or possible observations) in a distribution. Table 4.9 presents a small distri-
bution of scores (X). There is space provided for you to calculate the squares and the
sum of the squared deviations from the distribution's mean as well as the other scores in
the distribution. Your object here is to demonstrate the least squares principle, that
Σ(X − X̄)² is less than the sum of squared deviations between X and any of the other scores in the
distribution. To help you get started, a few entries already appear in the column under
(X − 1)². Be sure to record the SS under each of the columns so that you show the one
corresponding to the mean is indeed less than the others.
This project exercise is by no means difficult or time consuming, but it will give you
firsthand experience with one of the main reasons that the mean is the statistic of choice
for the majority of analyses performed in inferential statistics. The calculated answers
appear upside down under Table 4.9, but please do not look at them until you have tried
to prove the least squares principle on your own.
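The same demonstration can be run numerically. In the sketch below the five scores are invented (Table 4.9's actual values are not reproduced here); they are chosen to sum to 20 so that the mean is 20/5 = 4.0, matching the computation in the text:

```python
def sum_of_squared_deviations(scores, center):
    """Sum of squared deviations of each score from a chosen center value."""
    return sum((x - center) ** 2 for x in scores)

# Hypothetical scores standing in for Table 4.9; ΣX = 20, N = 5, so X̄ = 4.0.
scores = [1, 3, 4, 5, 7]
mean = sum(scores) / len(scores)

ss_mean = sum_of_squared_deviations(scores, mean)
# The SS around any individual score is never smaller than the SS around the mean.
for x in scores:
    assert sum_of_squared_deviations(scores, x) >= ss_mean
print(ss_mean)  # → 20.0
```

Replacing the mean with any other score in the loop inflates the sum of squared deviations, which is the least squares principle in action.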
point of central tendency is identified. When the scale (i.e., interval, ratio, ordinal, nominal) used and the shape of a distribution are known, the decision tree opening this
chapter will help you to select an appropriate measure of central tendency. Once a mea-
sure of central tendency is known, the second decision tree will guide your choice of an
index of variability (an important choice, especially when it and the central tendency are
presented in the context of a research summary).
The third and final decision tree serves as a reminder regarding the difference
between, and appropriate use of, the sample statistics (X̄, s², s) and population
parameters (μ, σ², σ). Whether to report sample statistics or to estimate population
parameters (biased or unbiased) depends on the source of the data (sample, population)
and whether the research goal is description or statistical inference. In later chapters, too,
the role and use of unbiased estimators will become more important. At present, how-
ever, consider how this decision tree, as well as the two noted immediately above, can be
used to characterize typical responses and the relative level of agreement with them, as well
as to accurately portray such information to interested others.
Summary
1. Measures of central tendency and variability comprise two classes of basic, descriptive statistics. The two topics are bound together because when behavioral scientists ask about one, they necessarily need to know about the other.
2. A measure of central tendency is used in order to identify one numerical value that is the most representative one within a distribution. A given measure of central tendency is the best single indicator of what a sample or population is like.
3. The word "average" originally was reserved for describing some numerical value that was the most typical or representative one in a data set. Unfortunately, the word average has developed other connotations, some pejorative. As a result, it is often a good idea for researchers and students to make clear whether they are interpreting the word in a statistical or a cultural sense.
4. The mean, the arithmetic average, is the most useful measure of central tendency. Much of the literature in the behavioral sciences, especially psychology, is based on the relationship among mean observations in experiments. The mean is symbolized X̄ in statistical formulas and M in text written in APA style.
5. The formula for calculating the sample mean is similar to that used to determine the population mean; one relies on symbols denoting sample statistics (i.e., X̄), the other population parameters (i.e., μ).
6. Although the mean is very useful, extreme or outlying scores in a distribution can unduly influence its value. To combat this sensitivity, the median, another index of central tendency, is sometimes reported in place of the mean.
7. The median (abbreviated in APA style as mdn) is the number or score in a distribution that precisely divides it into two equal halves. Fifty percent of the scores fall above the median in a distribution, and 50% fall below it. It is not sensitive to extreme scores, so its value tends not to shift dramatically when scores (including outliers) are added to or subtracted from the distribution.
8. The mode is the most frequently occurring value in a distribution and, as such, is not calculated. Typically, the mode is based on a nominal scale. As it highlights frequency exclusively, use of the mode is rather limited.
9. In normal, that is, symmetric distributions, the mean, median, and mode will share the same value. In positively skewed distributions, the mean will be higher than the median and mode, respectively. The median and mode, in that order, will fall higher than the mean in negatively skewed distributions.
10. Measures of central tendency should be written about in clear, descriptive terms. Readers should get a real sense of what the mean, median, or modal behavior was like.
11. Measures of dispersion or variability account for the ways data in samples or populations deviate from the relevant measure(s) of central tendency. When the spread of scores around a mean is small, for example, variability is low. Variability is high when scores around the mean are far apart from one another.
12. The range is the simplest index of variability, and it is based on the difference between the highest and lowest scores in a distribution. Similar to the mean, however, the range's value can be inflated by extreme scores.
13. When a distribution is not normal, the interquartile range or the semi-interquartile range (SIQR) can be employed as measures of dispersion. The latter measure overcomes the instability of the range by numerically indexing half of the distance between a distribution's first and third quartiles. A larger SIQR value implies a greater spread of scores, just as a smaller value indicates lower variability.
14. The most flexible measures of variability are the variance and the standard deviation. The variance is the average of the squared deviations from a distribution's mean. Taking the square root of the variance yields the standard deviation, or the average deviation between an observed score and the mean of a distribution.
15. Although they contain different symbols, the formulas used to describe the variance and standard deviation of a sample or population are mathematically equivalent. When sample data are used to infer the characteristics of a population, however, formulas for making unbiased estimates of the population parameters are used. N − 1 serves as the denominator in the variance and standard deviation formulas for unbiased estimates, as N routinely underestimates population values.
Chapter Problems 173
16. Homogeneous distributions contain a preponderance of similar values, generally pointing to response unanimity and low dispersion. When values are spread far apart and tend to be dissimilar in value, such high variability distributions are labeled heterogeneous.
17. Within behavioral science research, variability can be influenced by sample size, how participants were selected, participant characteristics, the clarity of the independent variable(s), the sensitivity of dependent measure(s), and the passage of time between the manipulation and measurement of variables.
18. Measures of variability tend to be reported in the Method and Results sections of APA style articles and, inevitably, they accompany measures of central tendency. If a study's variability is not summarized in the text, then it is noted within a table.
19. The fact that the sum of the squared deviations between a distribution's mean and its individual scores is smaller than the sum of squared deviations between the individual scores themselves is called the least squares principle for the mean.
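Summary points 14 and 15 can be condensed into a short sketch; the sample scores below are invented for illustration:

```python
import math

def variance(scores, unbiased=False):
    """Average squared deviation from the mean; N - 1 denominator if unbiased."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squares (SS)
    return ss / (n - 1) if unbiased else ss / n

def standard_deviation(scores, unbiased=False):
    """Square root of the variance."""
    return math.sqrt(variance(scores, unbiased))

data = [2, 4, 4, 4, 5, 5, 7, 9]           # invented example scores
print(variance(data))                      # biased (descriptive): 4.0
print(variance(data, unbiased=True))       # unbiased estimate: ~4.57
print(standard_deviation(data))            # 2.0
```

Note how the N − 1 denominator makes the unbiased estimate slightly larger than the descriptive value, compensating for N's tendency to underestimate population variability.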
Key Terms
Average (p. 135)
Biased estimate (p. 163)
Central tendency (p. 134)
Heterogeneous (p. 159)
Heterogeneity (p. 159)
Homogeneous (p. 159)
Homogeneity (p. 159)
Interquartile range (p. 153)
Law of large numbers (p. 169)
Least squares principle (p. 170)
Mean (p. 136)
Median (p. 144)
Mode (p. 146)
Outlier (p. 141)
Range (p. 153)
Semi-interquartile range (p. 154)
Standard deviation (p. 158)
Sum of squares (SS) (p. 155)
Unbiased estimate (p. 163)
Variability (p. 151)
Variance (p. 155)
Weighted mean (p. 142)
Chapter Problems
1. Define the term "central tendency." Explain the concept's utility for statistics and data analysis, as well as research in the behavioral sciences.
2. Calculate the mean, the median, and the mode of each group of scores:
   a. 10, 16, 8, 9, 20, 15, 21, 6, 13, 18, 12, 10
   b. 3, 1, 3, 4, 7, 2, 1, 7, 8, 2, 3, 1, 5, 7, 3, 7, 6, 2, 3
   c. 20, 34, 21, 45, 32, 33, 16, 84
   d. 13, 13, 13, 11, 10, 9, 14, 16, 15, 15, 12, 11
3. Review your answers to problem 2. Would reporting the mean for any of the four distributions pose a problem for an investigator? Why?
4. Select one of the distributions from problem 2 and use it to demonstrate the least squares principle. (Hint: Repeat the procedure used for the Project Exercise presented at the end of the chapter.)
5. You have a distribution of scores and most of them are relatively close in value to one another. There is, however, an outlying score that is very far from the main cluster of scores. What measure of central tendency should you report? Why?
6. Why is the mean sensitive to extreme scores? How can researchers reduce the interpretive problem posed by outlying scores? In such situations, is one of the other measures of central tendency a better choice than the mean? Why?
7. An investigator wants to report the average score on a standardized personality measure that was administered to three different groups of participants. Calculate the mean based on these data: The 32 members of group A had a mean of 27.5; the 48 members of group B had a mean of 23.0; and the 12 members of group C had a mean of 25 on the personality measure.
8. When should a researcher use the weighted means approach to analyze data? Why was the weighted means approach the appropriate one for analyzing the data presented in problem 7? Would it make any difference if an investigator simply calculated the mean of the three means in problem 7? In other words, what makes the weighted average a better statistic for this example?
9. Most people who need welfare tend to receive this form of public assistance for a very short time (6 months or less), and the number of people who receive welfare for the long term is relatively small. If you were a government demographer who was researching welfare, which measure(s) of central tendency (the mean, the median, or the mode) would probably be the best one to use for your analyses? Why?
10. Examine the following information and then indicate whether these hypothetical distributions are apt to be normal, or positively or negatively skewed:
    a. X̄ = 34, mdn = 36, mode = 39
    b. X̄ = 12, mdn = 12, mode = 12
    c. X̄ = 5, mdn = 3, mode = 2
11. Create examples of data where the appropriate measure of central tendency is:
    a. The mean
    b. The median
    c. The mode
12. How would you characterize the following hypothetical distribution? X̄ = 53, median = 57, mode = 63
13. Which measure of central tendency is most affected by skew in a distribution? Which one is least affected? Why?
14. Compute the mean of the following table of data:

    X     f
    7     10
    5     8
    3     11
    2     9
          7

15. Calculate the range for distributions a to d in problem 2.
16. Calculate the semi-interquartile range for distributions a to d in problem 2.
17. Calculate the sum of the squares (SS) for the following distribution:

    X: 11, 8, 7, 6, 5, 4, 2

18. Assume the distribution shown in problem 17 is sample data and you want to describe it. Calculate the mean, range, sample variance, and standard deviation.
19. Assume the distribution shown in problem 17 represents a population. Calculate μ, σ², and σ.
20. Imagine that you want to use the sample data shown in problem 17 to estimate the parameters of the population from which it was drawn. Calculate the unbiased estimates of the population's variance and standard deviation.
21. Explain the difference between so-called "biased" and "unbiased" estimates of population parameters. Which type of estimate is used in what sort of situation?
22. Review the SS and N information provided below, then calculate the biased and unbiased estimates of the variance and standard deviation for each one.
    a. SS = 127, N = 30
    b. SS = 78, N = 40
    c. SS = 100, N = 15
    d. SS = 45, N = 10
23. Calculate the biased and unbiased estimates of the variance and standard deviation for distributions a to d in problem 2.
24. Although the two measures of dispersion are intimately related, why is the standard deviation preferred over the variance? Explain the concept underlying the standard deviation as a measure of variability.
25. Explain the role variability plays in both homogeneous and heterogeneous distributions.
26. What measure of dispersion is usually reported with the median? Why?
27. Review the four distributions presented in problem 2 and then answer this question for each one: Between what two scores do half of a distribution's observations fall? (Hint: Use the semi-interquartile ranges calculated in problem 16.)
28. You are a psychologist who is about to conduct a social psychology experiment on interpersonal attraction. Name three factors that can affect variability in a research project.
29. What is the relationship between sample size and variability? Is it better to have data from a larger or a smaller sample? Why?
30. Consider these two samples:
    Sample X: 3, 4, 6, 8, 9, 12, 14, 18
    Sample Y: 2, 8, 17, 25, 26, 27, 35, 36
    a. Intuitively, which sample do you think has greater variability? Why?
    b. Calculate the mean and standard deviation for each sample.
    c. Does either mean convey a better sense of the data it is based on? What does each standard deviation convey about the data?
31. Assume that you want to use samples X and Y in problem 30 to infer the characteristics of their respective populations of origin. Calculate the unbiased estimates of the variance and standard deviation for each one.
32. Examine the following data:
    11, 14, 15, 15, 12, 10, 8, 11, 12, 14, 47, 18, 20, 25, 27, 20, 42
    a. Calculate the mean, variance, and standard deviation.
    b. Calculate the median and semi-interquartile range.
    c. Does either combination of central tendency and variability provide a better sense of the data? Explain your answer.
33. Use the decision tree(s) at the start of this chapter to answer the following:
    a. The data are based on an interval scale, and the distribution is skewed. Which measure of central tendency should be used?
    b. The data are based on an ordinal scale. Which measure of central tendency should be used?
    c. The data are based on a nominal scale. Which measure of central tendency should be used?
    d. The data are based on a ratio scale, and the distribution is normal. Which measure of central tendency should be used?
34. Use the decision tree(s) at the start of this chapter to answer the following:
    a. A median is being reported. Which measure of variability should be used?
    b. A mode is being reported, but it is not based on a nominal scale. Which measure of variability should be used?
    c. A mean is being reported. Which measure of variability should be used?
35. Use the decision tree(s) at the start of this chapter to answer the following:
    a. A researcher plans to estimate the characteristics of a population. Should statistics or parameters be calculated? Why?
    b. A researcher plans to describe the characteristics of a sample. Should statistics or parameters be calculated? Why?
    c. A researcher plans to describe the characteristics of an entire population. Should statistics or parameters be calculated? Why?
Describing the Placement of Observations
1. Do you want to know an observation's location relative to the mean? If yes, then convert the raw score to a z score. If no, then go to step 2.
2. Do you want to know the score's standing relative to other scores? If yes, then convert the raw score to a z score and determine its percentile rank. If no, then go to step 3.
3. Do you want to summarize the entire distribution of scores? If yes, then select measures of central tendency and variability (see chapter 4). If no, then review your goals and go back to step 1.
4. Do you know the μ and the σ of the population? Do you know the X̄ and the s of the sample?
5. If the population values are known, enter the appropriate numbers into the following formula and report the z score's location relative to the population μ: z = (X − μ)/σ.
6. If the sample values are known, enter the appropriate numbers into the following formula and then report the z score's location relative to the sample X̄: z = (X − X̄)/s.
7. A z score cannot be calculated unless information about the raw score's parent distribution is available.

To find a percentile rank, determine whether the z score is equal to 0.0 and then locate the z score in Table B.2 in Appendix B:
a. Find the area below the z by reading across to column C.
b. Convert the column C proportion to a percentage by multiplying it by 100.
c. The resulting percentage is the percentile rank of the z score. (For a z score falling above the mean, add the percentage to 50%; the resulting total percentage is the percentile rank of the z score.)
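The decision-tree steps above can be sketched in code. Instead of looking areas up in Table B.2, this sketch approximates the normal curve's cumulative area with `math.erf`; the raw score, mean, and standard deviation are invented for illustration:

```python
import math

def z_score(x, mean, sd):
    """Standardize a raw score: its distance from the mean in SD units."""
    return (x - mean) / sd

def percentile_rank(z):
    """Percentage of a normal distribution falling at or below z
    (a stand-in for reading areas from a table like Table B.2)."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Invented example: a raw score of 600 from a distribution
# with a mean of 500 and a standard deviation of 100.
z = z_score(600, mean=500, sd=100)
print(round(z, 2))                 # 1.0
print(round(percentile_rank(z)))   # 84
```

A score one standard deviation above the mean lands at roughly the 84th percentile, the same answer the table-lookup procedure yields (50% below the mean plus about 34% between the mean and z = 1.0).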
Chapter Outline
• Data Box 5.A: Social Comparison Among Behavioral and … Scientists: How Many Peers Review Research Before Publication?
• Data Box 5.B: Explaining the Decline in SAT Scores: Lay Versus Statistical Accounts
• …DISTRIBUTION
• The Area Under the Normal Curve
• Application: Comparing Performance on More than One Measure
• Knowledge Base
• Working with z Scores and the Normal Distribution
• Finding Percentile Ranks with z Scores
• Further Examples of Using z Scores to Identify Areas Under the Normal Curve
• Data Box 5.C: Intelligence, Standardized IQ Scores, and the Normal Distribution
• A Further Transformed Score: The T Score
• Writing About Standard Scores and the Normal Distribution
• Knowledge Base
• Looking Ahead: Probability, z Scores, and the Normal Distribution
• Project Exercise: Understanding the Recentering of the Scholastic Aptitude Test Scores
• Looking Forward, Then Back
• Summary
• Key Terms

One of the great lessons taught by research methods, statistics, and life is the importance of asking a particular question: Compared to what? The methodology used by behavioral scientists reminds them to constantly compare the behavior of one group with that of another, usually a control group. Comparing what research participants do within the various conditions of an experiment is reinforced when the data are analyzed. As we learned in chapter 4, most statistics are meaningless unless they compare the measures of central tendency and dispersion observed in one group with those drawn from others. We will broaden our conceptual understanding of the process of drawing meaning from descriptive statistics by starting to compare them in this chapter. Actual, inferential statistics will be presented later in the book, but their underlying foundations begin in earnest here.

But wait a moment, what about life? How does life teach us to ask about or to make various comparisons? The act of comparing ourselves to others is ubiquitous and ongoing in the social world. In the language of social psychology, for example, the act of evaluating our own opinions and abilities against those of other people is called social comparison (Festinger, 1954). We do it all the time, no doubt because other people, their acts, abilities, and feelings, are such a good source of information for us. In fact, we cannot seem to help ourselves, as we almost effortlessly, seemingly automatically, look to others for information about what we should do or how we should feel in a particular situation.

There is no more obvious realm for comparing ourselves to others than the classroom. Take a moment and think about how often you spend time comparing your academic performance to that of others, whether real (e.g., roommate, friends, family, classmates) or imagined (e.g., the "perfect" student, the "struggling" peer). There is a good chance that your social comparing in the educational realm has been a minor
178 Chapter 5 Standard Scores and the Normal Distribution
preoccupation since middle or high school, if not earlier (in my experience, even the
most laid back college student has a competitive streak, one prompted by comparing the
self to others).
Here is a past example of your drive to compare, one that may still be a source
of mild pride or subtle discomfort for your conscious self: What were your scores
on the Scholastic Aptitude Test (SAT)? Way back when you received two main
scores, one for verbal ability and one for math. Despite the fact that the College
Board and the Educational Testing Service cautioned against adding these subtest
scores together, your primary concern was presumably whether their combination
put you in the "admit range" of the college or university of your choice. After
that, you probably shared (and compared!) your scores-and the hopes and
dreams you had for those scores-with friends and peers who had similar plans
for their futures.
When examining actual behavior or descriptive statistics referring to it, be a critical observer by asking, "Compared to what?"

Back in chapter 3, we established that you were probably unaware that the percentile rank information accompanying your SAT scores provided a relative sense of how your scores compared to those of other people who took the test at the same time you did. If your verbal score was at the 70th percentile, for instance, then you performed better than or equal to 70% of predominantly high-school-aged peers who took the test. But there was also some other information that you probably did not know or really attend to: that the test was standardized so that individuals evaluating performance on the SAT would have a sense of where a given verbal or math score fell along the distribution of possible scores. In other words, people evaluating your scores were able to ask, and to answer, compared to what?
The SAT is a single but important component of student applications to college,
one that has been used for admissions purposes for over 70 years, and not without
its share of controversy (Schwartz, 1999; for a related discussion, see Bowen & Bok,
1998). The verbal and mathematical subtests each can have scores ranging be-
tween 200 and 800 or, if you prefer, combined scores ranging from 400 to 1,600 (but
see Data Box 5.B and the end-of-chapter Project Exercise). The mean of each subtest
was set at 500: scores falling above this point were deemed above average and scores
falling below were said to be below average. Different educational institutions deter-
mine different admissions standards for acceptable test scores (i.e., how many stu-
dents within a given range of scores would be considered for admission at the college or
university).
Your understanding of central tendency (mean scores on the SAT), dispersion
(range of possible values on the two subtests or the combined score), and percentile
rank (relative standing of a given score within the distribution of SAT scores) should-
albeit with some hindsight-render your own SAT scores a bit more meaningful.
What remains to be explained, however, is how scores on the SAT or any other
standardized test, psychological inventory, or personality measure can be compared to
past, present, and yes, even future test takers. How are such comparisons, really ex-
trapolations, possible? We will touch on these and related issues throughout the
chapter.
To do so, we will apply knowledge acquired in the first four chapters of the book
to examine what are called standard or z scores and their relation to the normal
distribution. Both z scores and the normal distribution will provide us with a
meaningful context for understanding how a given score or statistic can be interpreted
in light of the distribution it was drawn from, an excellent preparation for the in-
ferential statistics to come. To begin, however, we must discuss the matter of stan-
dardizing measures.
[Table for Data Box 5.A not reproduced here]
Note: aHigher numbers reflect more peers acknowledged per article; bHigher numbers indicate more authors per article.
Source: Adapted from Suls & Fletcher (1983).
Pundits have noticed a steady but modest decline in the SAT verbal and math scores for years,
puzzling over the meaning and significance of the changes in these averages. Are students be-
coming dumb or dumber? Or, has the SAT been "dumbed down"? If these two possibilities are
both false, how else can the noted decline in SAT scores be explained? To answer this question, we
need to examine the lay or nonstatistical explanation for the decline, as well as a more data-driven,
statistical account for the change in scores.
Lay explanations for the decline in SAT scores focus on social themes that can quickly become
politicized. One popular explanation is that today's students are less intellectual or motivated than
the students of past generations, having been raised on television and rock videos rather than
books and newspapers. A companion theme of this "anti-intellectual" youth argument is that our pub-
lic schools, teachers as well as administrators, are doing a worse job at educating students than they did
20 or 30 years ago (i.e., "in the good old days"). This latter argument tends to appear in the con-
text of test scores or whenever school taxes are levied or increases are contemplated. You have
probably seen national and local politicians of various stripes make political hay with these sorts
of arguments.
Stop and think for a moment: Given the educational and technological booms the
United States has enjoyed for the past few decades, is it really possible that students are be-
coming, as it were, "dumber"? What else might be going on to make a decline in SAT scores be
more apparent than real? Well, consider the fact that in 1941, 10,000 predominantly white
males from the Northeastern United States took the test-recently, more than 1 million-
indeed, closer to 2 million-men and women take the SAT in a given year. These men and
women hail from diverse ethnic, racial, and social backgrounds as well as educational experiences
(College Board, 1995).
Can you guess what a statistician might say about the SAT data and the purported decline in
scores? Consider what you know about sample size and populations. First, the relative size and
composition of the yearly population of college bound students has grown dramatically in size.
This fact might bring to mind the law of large numbers introduced in chapter 4 - the apparent
decline in scores may be a false one because the performance of (recent) larger samples of stu-
dents taking the SAT may be more representative of the population's parameters. In other words,
the "decline" in average scores is really a better reflection of the μ of the distribution of possible
SAT scores.
Second, and in a related way, the population of students now taking the test is more diverse, that is, more heterogeneous, than the homogeneous student samples from decades past. More students, and different subpopulations of students, taking the test should result in some shift in the test scores; the
one that has occurred just happens to be in a downward direction, which may more appropriately
capture or represent the population parameters of the test. The College Board and the Educational
Testing Service have actually addressed this artificial decline by performing what is called a "re-
centering" of the scores, a simple transformation that returns the verbal and math subscale means
back to 500 each. We will examine this transformation procedure in more detail in the Project
Exercise at the end of this chapter.
Score realignments aside, I hope that the lesson here is clear. You should always be suspicious of claims that skills are changing, for better or worse, when the population on which those skills are assessed is changing at the same time. Remember the lessons taught by larger
samples, as well as the theme of this chapter-compared to what?
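Purely to illustrate the idea of recentering, which the text describes only as a simple transformation returning each subscale mean to 500 (the College Board's full equating procedure is not detailed here), a sketch using a constant shift, with invented scores:

```python
def recenter(scores, target_mean=500):
    """Shift every score by a constant so the group mean equals target_mean.
    A constant shift moves the mean but leaves the spread (SD, range) untouched;
    the actual SAT recentering was a more elaborate equating than this."""
    mean = sum(scores) / len(scores)
    shift = target_mean - mean
    return [x + shift for x in scores]

verbal = [380, 420, 470, 510, 560]     # invented scores with a "declined" mean of 468
adjusted = recenter(verbal)
print(sum(adjusted) / len(adjusted))   # 500.0
```

Because the shift is the same for every score, each test taker's standing relative to the others is preserved; only the scale's anchor point moves.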
Why Standardize Measures? 181
[Figure 5.1: A normal curve marked off in standard deviation (σ) units, with the first, second, and third standard deviations falling below and above μ.]
direction-that is, in the so-called "tails" of the distribution-they occur less frequently
as compared to those scores that cluster close to the mean (note that in Figure 5.1, the
median and the mode would share the population mean's value; see chapter 4). Thus, for
example, a person who scores a 142 on an IQ test is certainly not unheard of, but by falling
way out in the distribution's upper tail, neither is she commonplace.
Various and sundry IQ tests and the SAT are by no means the only standardized
tests you may know of or encounter. Other common standardized tests include the Law
School Admissions Test (LSAT), the Graduate Record Exam (GRE), the Graduate Man-
agement Admissions Test (GMAT), and the Medical College Admissions Test (MCAT).
There are also a host of psychological tests and inventories with unfamiliar names or
acronyms that also yield standard scores.
Standardization of IQ scores or, for that matter, any measure, enables researchers to
precisely locate where a particular score falls in a distribution and to describe how it
compares to other scores in the distribution. This achievement involves converting a raw
score into a standard score.
KEY TERM A raw score is any score or datum that has not been analyzed or otherwise transformed by a sta-
tistical procedure.
A raw score, then, is any basic score, rating, or measurement in its pure form.
KEY TERM A standard score is derived from a raw score. Standard scores report the relative placement of in-
dividual scores in a distribution and are useful for various inferential statistical procedures.
Raw scores are turned into standard scores so that they can be used for comparison pur-
poses (e.g., Is one score closer to the mean than another?) or to make inferences (e.g.,
How likely is it that a given score is from one rather than another population?). When
raw scores are converted into standard scores, their apparent value will change but no in-
formation is lost; rather, the conversion renders the score easier to work with and to
compare with other scores along a distribution.
Converting raw scores to standard scores promotes comparison and inference.
In order for standard scores to be truly useful, however, the mean of the relevant distribution must also be taken into account. Does a given score fall above, at, or below the mean of its distribution? The mean continues to serve as the main reference point, the anchor, if you will, of any distribution. Any mean, of course, provides limited information unless it is accompanied by its standard deviation. We rely on the standard deviation to inform us whether a given score is similar to or divergent from the mean, as well as other scores in the distribution. For example, does a particular score fall into the first or the second standard deviation above or below a mean? To begin to answer these questions more concretely, we turn to the z score.
[Figure 5.2: a z distribution with a mean of 50 and a standard deviation of 5, with raw scores such as 50, 55, and 60 marked along the axis.]
Suppose we observe a score of 55 in a distribution whose mean is 50; subtracting the mean from the score (i.e., 55 - 50 = 5) gives the absolute
difference between the mean and the observed score. Imagine, then, that we also know
the standard deviation, which is equal to 5. If we divide the absolute difference between
the mean and the observed score (i.e., 5) by the standard deviation of 5, we will know the
relative deviation of the observed score from the mean. That is, 5/5 = + 1.0, indicating
that a score of 55 lies 1.0 standard deviation above the mean of 50 (see Figure 5.2). In
fact, anytime a relative deviation is positive in value, we know that the score falls above
the mean-and we know where it falls in terms of standard deviation units.
What if the observed score were equal to 40? This time, although we know the ab-
solute difference between the mean and the observed score is 10, we will have a negative
number (i.e., 40 - 50 = -10.0). If we divide -10.0 by the standard deviation of 5, we
find that a score of 40 lies 2.0 standard deviations below the mean (i.e., -10.0/5 = -2.0;
see Figure 5.2). Anytime a relative deviation is negative, then, we know how many stan-
dard deviation units below the mean it is located.
What have we accomplished? We have just calculated two z scores, common types of
standard scores derived from raw scores.
KEY TERM A descriptive statistic, the z score indicates the distance between some observed score (X) and the
mean of a distribution in standard deviation units.
The z score tells us one very important thing: How many standard deviations away from
the mean is a given score? As we just learned in the above examples, a score of 55 was 1.0
standard deviation above the mean, while a score of 40 happened to be 2.0 standard de-
viations below the mean. Put another way, we know that the first score was relatively
closer to the mean than the second score (see Figure 5.2). In fact, a defining character-
istic of any z distribution is that the width of the standard deviation around the mean
will always be equal to 1.0.
The standard deviation of a z distribution is always 1.0.
Please note that a z score need not fall precisely on a standard deviation as the two scores in this example did. I used numbers that fell precisely on the standard deviations for convenience. If we have an observed score of 52, for example, then the corresponding z score would be equal to +0.40 (i.e., (52 - 50)/5 = 2/5 = +0.40), which falls less than halfway across the first standard deviation above the mean (see Figure 5.2). That is, +0.40 is less than +1.0, which tells us that a score of 52 is very close to the mean of 50, just as +0.40 is very close to the z distribution's mean of 0.
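The conversions worked through above are easy to sketch in code. A minimal Python helper (the function name is my own, not the book's):

```python
def z_score(x, mean, sd):
    """Distance between a raw score and the mean, in standard deviation units."""
    return (x - mean) / sd

# The chapter's running example: mean = 50, standard deviation = 5.
print(z_score(55, 50, 5))   # 1.0: one standard deviation above the mean
print(z_score(40, 50, 5))   # -2.0: two standard deviations below the mean
print(z_score(52, 50, 5))   # 0.4: less than halfway across the first interval
```

A positive result always signals a score above the mean, a negative result one below it, exactly as in the text.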
184 Chapter 5 Standard Scores and the Normal Distribution
Key Points Regarding z Scores. Let's review these three important characteristics of z scores:
1. The mean of any z distribution is always 0.
2. The standard deviation of any z distribution is always 1.0.
3. When the value of a z score is positive (+), then the score falls above the mean of 0; when it is negative (-), it falls below it. The only time a z score lacks a sign is when it is equal to 0 (i.e., the raw score is equivalent to the original mean of the distribution).
The third point merits more emphasis and explanation. When we use the + and -
signs, we again use them as guideposts for the relative placement of a score above or
below the mean. We do not treat these signs as being indicative of value per se. Although
it is true that a z of +1.50 is greater than a z of -1.2, it is more appropriate to think of the former as being higher in location relative to the mean than as greater in value or magnitude.
But wait, there is one more point to add to this list:
4. The distribution of z scores will always retain the shape of the distribution of the
original raw scores.
Standardization ≠ normalization; conversion to z scores does not alter the shape of a distribution in any way.
Students-and many faculty members, actually-often forget this last one because they misunderstand the definition of standard scores. Specifically, they may erroneously believe that the conversion to z scores somehow "sanitizes" or "normalizes" the data, taking an any-old-shaped distribution of raw scores and turning it into that paragon of statistical virtue, a normal distribution (see below). Not so. The conversion to relative deviation provides a much better indication of where scores lie relative to one another, but it does not change the shape of the distribution in any way-any skew or kurtosis present, as well as any outlying scores, remains intact.
A Brief Digression on Drawing z Distributions for Plotting Scores. Despite the likelihood of an irregularly shaped distribution, I have always found it useful to draw a normal-shaped distribution when I work with z scores. Figure 5.2 is a formal example of what I have in mind, as it enables you to visualize the placement of one score relative to another, and to identify their relation to the mean of 0 and the 1-unit standard deviations surrounding it. Again, few distributions of scores you encounter will actually be normal, but
jotting down a bell-shaped curve is a heuristic convenience, a short cut or rule of thumb,
that makes interpreting the converted raw scores easier to conceptualize.
Your bell-shaped curve need not be elegantly drawn, perfectly symmetric, or even
particularly neat to look at-but it should enable you to "see" how scores in the data re-
late to one another. A simple way to proceed is to sketch what I call a "volcano" shape
first (see panel (a) in Figure 5.3), and then to draw a semicircle over its top (see panel (b) in Figure 5.3). Once you add this lid, you can erase the lines underneath it (see panel (c) in Figure 5.3) and draw in lines representing the mean and standard deviations (see panel (d) in Figure 5.3). It then becomes a snap to mark the z scores on the distribution for ready reference.
normal distribution but rather a "family" of curves that can be defined as normal (Elifson, Runyon, & Haber, 1990). The reason so many exist is the endless potential for different combinations of population means (μ) and population standard deviations (σ). As we will see, all normal curves share some basic characteristics, but they are not "cookie-cutter" constructs; some are larger than others, flatter in shape, or have a steeper peak (recall the discussion of distribution shapes in chapter 3).
Across time, researchers realized that the normal distribution had a wide variety of applications and that it was a very effective way to describe the behavior of various naturally occurring events. Natural and behavioral scientists were quick to realize that the normal curve was useful for studying phenomena germane to their respective areas of inquiry. A wide variety of sociological, psychological, and biological variables distribute themselves normally or in an approximately normal fashion, or they lend themselves to transformation to a normal curve (e.g., Rosenthal & Rosnow, 1991). The discipline of psychology, for example, would be at a loss if the properties of the normal distribution-notably its usefulness for hypothesis testing (see chapter 9)-were not available. Of course, some everyday variables are also normally distributed. Height and weight represent everyday examples of variables that qualify in this regard.
r
r Although we will not be using it directly, there is a formula that specifies the shape
i, of the normal distribution. Here it is:
(
122
( [5.5.1] f(X) = e-(X-p.) f2u •
V27TU 2
Take a good look at it-it won't bite you. Forgive me, but I merely want you to think about the following information pertaining to this formula and not be intimidated by it. The conceptual information the formula provides will be useful background material for the rest of this section of the chapter. For statisticians, the main advantage of having this formula is that although available data can change or vary, this formula can always be used to determine what a normal distribution looks like, no matter what the value of its mean and standard deviation. What we see in [5.5.1] is that the relative frequency or function of any score (X) is dependent upon the population mean (μ) and variance (σ²), the constant π (which is ≈ 3.1416), and the constant e (the base of the natural logarithm, which is ≈ 2.7183). In other words, if the relative frequencies of X were entered into the equation, we would be able to see how they must form the now familiar normal curve.
Of course, such plotting of scores is not our purpose here, because we already know-or at least assume-that the normal distribution provides us with certain statistical advantages. Chief among these advantages is the ability to partition the area under the normal curve to identify what proportion or percentage of observations must fall within given ranges.
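Formula [5.5.1] translates directly into a few lines of Python; this sketch simply evaluates the height of the normal curve at any X for a given μ and σ:

```python
import math

def normal_pdf(x, mu, sigma):
    """Relative frequency f(X) from formula [5.5.1]."""
    coeff = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The curve peaks at the mean and is perfectly symmetric around it:
print(normal_pdf(100, 100, 15))                               # the peak height
print(normal_pdf(85, 100, 15) == normal_pdf(115, 100, 15))    # True
```

Whatever values of μ and σ you supply, the same formula traces out a member of the normal "family" of curves.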
• Standard Deviation Revisited: The Area Under the Normal Curve
In some sense, normal distributions are "standardized" because particular percentages of observations occur in a predictable pattern. (For ease of comparison, we will rely on percentages, but discussing proportions under the curve is equally appropriate; recall the relation between percentages and proportions introduced in chapter 3.) When the area under the curve was described as being predictable in chapter 4, you did not know anything about z scores or greater detail about the utility of the normal distribution. We noted then that approximately 68% of the available observations fall within the first standard deviation interval above and below the mean, or 34.13% in each one. Formula [5.5.1] actually enables us to specify what percentage of observations fall within each of these standard deviation intervals occurring to the left and the right of the mean.
Figure 5.4 Area Between Standard Deviation Intervals Along a z Distribution
[The figure shows a z distribution marked from z = -3.0 to z = +3.0 (with tails extending toward -∞ and +∞), with brackets indicating that approximately 68%, 95%, and 99.7% of the area falls within ±1, ±2, and ±3 standard deviations of the mean.]
Our understanding of z scores and their fixed standard deviation interval widths of 1.0 will now pay off. Theoretically, the total area under the normal curve is equal to 100%. Figure 5.4 illustrates a standard normal distribution with z scores running along the x axis. As you can see in Figure 5.4, on either side of the mean of 0 is one standard deviation interval equal to 34.13% of the area under the normal curve (i.e., 2 × 34.13% = 68.26% of the available area under the curve). The area between the first and second standard deviation on either side of the mean is equal to 13.59% (i.e., 2 × 13.59% = 27.18% of the available area; see Figure 5.4). In the third standard deviation from the mean resides 2.15% of observations (i.e., 2 × 2.15% = 4.30% of the available area; see Figure 5.4). If you add the total area accounted for under the curve in Figure 5.4, you have accounted for 99.74% of the available observations or z scores.
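These band percentages can be recovered without a printed table by way of the standard normal cumulative distribution function, which Python's math.erf makes available. A quick check (the 2.15% quoted above reflects two-decimal table rounding; direct computation gives 2.14%):

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve up to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Percentage of cases in each standard deviation band above the mean:
print(round(100 * (phi(1) - phi(0)), 2))   # first band:  34.13
print(round(100 * (phi(2) - phi(1)), 2))   # second band: 13.59
print(round(100 * (phi(3) - phi(2)), 2))   # third band:  2.14
```

Doubling each value (the bands below the mean mirror those above) reproduces the 68.26%, 27.18%, and roughly 4.3% figures in the text.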
What about this less than 1% of other (potential) observations? They appear to re-
main but are not accounted for under the curve-why? In practical terms, these obser-
vations fall relatively far in either tail, the resting place of true outliers. But there is a the-
oretical consideration regarding the normal curve that must be noted as well. The
normal curve never "closes" because its two tails never touch the x axis (they are asymptotic to it)-that is, it is comprised of a potentially infinite number of cases (Elifson et al., 1990). As a result, notice that the sign for infinity-both positive (+∞) and negative (-∞)-is included in Figure 5.4. These signs serve as theoretical reminders about the nonclosing nature of the normal distribution, but they also have some practical significance. These signs reinforce the idea that researchers are actually trying to generalize findings to populations that are large and subject to change, but still measurable.
Contrary to expectation, a normal distribution never ends (theoretically) because its tails remain open.
Beyond the uniformity of observations falling in standard deviations, what else do we know about normal distributions? For most practical and theoretical purposes, the standard normal distribution is treated as having outer boundaries that end at ±3.0 standard deviations (see Figure 5.4). Normal distributions are symmetric (lack any skew) and are described as generally mesokurtic (see chapter 4). Finally, using a normal distribution, the percentile ranking of any given z score is easily determined.
Table 5.1 Scores on Three Hypothetical Measures of Psychological Well-Being
[Columns: Measure, Raw Score, Population Parameters, z Score. The table lists the client's scores on the depression (z = -2.00), self-esteem (z = +1.88), and life satisfaction (z = -3.00) measures discussed below.]
items, as well as different means and standard deviations. The ability to convert disparate
measures to comparable scores is invaluable for researchers, as it frees them to consider
how all kinds of variables can affect one another.
We can consider a simple example in this vein. A clinical psychologist might be interested in examining a client's scores on a few standardized measures, say, a depression inventory, a self-esteem scale, and a life satisfaction scale. Perhaps the clinician wants to confirm her assessment that the client is not depressed, but merely dissatisfied with some life circumstances involving both home and work. The client's raw scores on the standardized measures, the population mean and standard deviation for each measure, and the z scores are shown in Table 5.1.
When converted to z scores, different, even disparate, variables can be compared to one another.
As shown in Table 5.1, the z score for the client's depression level (z = -2.00) is relatively far below the mean, which indicates a low likelihood of depression (i.e., higher scores reflect a greater incidence of depression). The z score corresponding to self-esteem (z = +1.88), however, falls fairly high above the mean (i.e., the client has a relatively high level of self-esteem). In contrast, the z score representing life satisfaction (z = -3.00) is not in the desired direction-the client is clearly dissatisfied with salient aspects of his life-as it is three standard deviations below the mean. As shown by this simple example, then, scores from different scales can be compared relative to one another once they are converted to standard scores. The psychologist can now focus on helping the client to recognize which aspects of his life need to be addressed, and the comparison of some empirical "apples" and "oranges" allowed that to happen.
Knowledge Base
1. What are the characteristics of any distribution of z scores?
2. You have a sample with a mean of 25 and a standard deviation of 3. What are the z scores corresponding to the following raw scores?
   a. 18 b. 26 c. 25 d. 32 e. 15
3. You have a population with a μ of 65 and a σ of 6. Convert the following z scores back to their raw score equivalents.
   a. +1.2 b. -2.5 c. -1.0 d. 0 e. +2.9
4. In percentage terms, what is the total area under the normal curve? What percentage of the observations fall in the first standard deviation below the mean?
Answers
1. Any z distribution has a mean of 0 and a standard deviation of 1.0. Positive z scores are always greater than the mean, whereas negative z scores are less than the mean in value. A distribution of z scores will retain the shape of the original raw score distribution.
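Questions 2 and 3 above can be checked with two one-line conversions (a sketch; values rounded to two decimals):

```python
def z_from_raw(x, mean, sd):
    return (x - mean) / sd       # raw score -> z score

def raw_from_z(z, mean, sd):
    return mean + z * sd         # z score -> raw score

# Question 2: sample mean of 25, standard deviation of 3.
print([round(z_from_raw(x, 25, 3), 2) for x in (18, 26, 25, 32, 15)])
# Question 3: population with mu = 65 and sigma = 6.
print([round(raw_from_z(z, 65, 6), 2) for z in (1.2, -2.5, -1.0, 0, 2.9)])
```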
By locating z = +1.75 in column A of Table B.2 and then reading across to column B, we find that the area between the mean and z is .4599. Thus, 45.99%, really 46%, of the available scores lie between the student's score and the mean (see Figure 5.5). Because we also know that 50% of the scores fall below the mean, we can say that 95.99% (i.e., 50% + 45.99%) of the area under the curve falls at or below the student's score of 35. In other words, by using the z distribution, we know the percentile rank of a score of 35 on this particular test.
If you examine the entry in column C corresponding to the z score of + 1.75, you
can see that 4.01% of the cases fall above it (see Figure 5.5). In other words, about 4% of
the people who took the test could have scores greater than a score of 35.
What if a second student received a score of 23 on the test? Where does this score
fall? Calculate a z score using the now familiar formula:
z = (23 - 28)/4 = -5/4 = -1.25.

Figure 5.5 Location of z = +1.75 in a z Distribution with μ = 28.0 and σ = 4.0
Note: In percentage terms, the area between the mean and the z score of +1.75 is 45.99%. The area in the curve beyond the z score of +1.75 is 4.01%.
[Figure 5.6: the raw score of 23 plotted as z = -1.25 on a z distribution running from -3.0 to +3.0.]
Looking up z = -1.25 in Table B.2 shows that 10.56% is the available area of the curve that falls below z = -1.25 (see Figure 5.6). Thus, a score of 23 on the test fell at approximately the 11th percentile-11% of the scores fell at or below the score of 23.
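Both percentile lookups can be reproduced in code, again assuming access to the normal CDF rather than the printed table:

```python
import math

def percentile_rank(x, mu, sigma):
    """Percentage of cases at or below x in a normal distribution."""
    z = (x - mu) / sigma
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The chapter's test: mu = 28, sigma = 4.
print(round(percentile_rank(35, 28, 4), 2))   # score of 35 (z = +1.75): ~96th
print(round(percentile_rank(23, 28, 4), 2))   # score of 23 (z = -1.25): ~11th
```

The results, 95.99 and 10.56, match the table-based values worked out above.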
Further Examples of Using z Scores to Identify Areas Under the Normal Curve
Besides using z scores to determine the percentile rank of raw scores, z scores can also be
used to delineate the area between two raw scores. We will consider two illustrative
examples, one using z scores falling on either side of the mean and one involving stan-
dard scores appearing on the same side of the mean.
Area Between z Scores on Either Side of the Mean. An investigator wants to know what percentage of the cases on a standardized instrument fall between scores of 85 and 111. The standardized test, which measures people's knowledge of geography, has a μ of 96 and a σ of 7. The first step is to convert both test scores to zs using formula [5.2.1]. The test score of 85 is equal to:

[5.8.1]  z = (85 - 96)/7 = -11/7 = -1.57,

and the score of 111 is equal to:

[5.9.1]  z = (111 - 96)/7 = 15/7 = +2.14.
To begin, of course, we draw a diagram of a z distribution similar to the one shown
in Figure 5.7. As you well know, the z of -1.57 is just over one and one half standard de-
viations below the mean and the z of + 2.14 is slightly over the boundary of the second
standard deviation to the right of the mean (see Figure 5.7). How do we proceed?
Well, because we want to know the area-the percentage of the cases-falling be-
tween scores falling on either side of the mean of 0.0, intuitively, we can (1) first deter-
mine the percentage distance between each score and the mean, and then (2) add these
percentages together. Using column B in Table B.2, we learn that 44.18% of the cases fall
between a z of -1.57 and the mean, and that 48.38% of the cases fall between a z of
Figure 5.7 z Distribution Representing Geographic Knowledge Test with μ = 96 and σ = 7
Note: The raw scores of 85.0 and 111.0 are shown as z scores of -1.57 and +2.14, respectively. The area between z = -1.57 and the mean is 44.18%, and between +2.14 and the mean is 48.38%. The total area between these scores is 92.56% (i.e., 44.18% + 48.38% = 92.56%). Thus, 92.56% of the area under the normal curve falls between scores of 85.0 and 111.0.
+2.14 and the mean (see Figure 5.7). To describe the total area between these two scores, then, we add 44.18% to 48.38% and learn that 92.56% of the area under the curve falls between the raw scores of 85 and 111. As long as you understand the logic behind working with z scores and the z distribution, and you bother to draw a diagram similar to Figure 5.7 (for guidance, see Figure 5.3), answering the question is not at all difficult.
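The whole procedure, convert each raw score to z and combine the areas, condenses to one function (this sketch uses the normal CDF in place of Table B.2; computing with unrounded zs gives 92.59% rather than the table-based 92.56%):

```python
import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def pct_between(x1, x2, mu, sigma):
    """Percentage of the normal curve lying between raw scores x1 and x2."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return 100 * abs(phi(z2) - phi(z1))

# Geography test: mu = 96, sigma = 7, raw scores 85 and 111.
print(round(pct_between(85, 111, 96, 7), 2))
```

Because the function subtracts cumulative areas, it works whether the two scores straddle the mean or sit on the same side of it.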
Area Between Two z Scores on the Same Side of the Distribution. Let's do a second
example. This time, however, we will delineate the area between two scores that happen
to fall on the same side of the mean. A teacher of gifted middle school students wants to
know what percentage of IQ scores in the population fall between 132 and 138. As you
may recall, standardized IQ tests usually have a μ of 100.0 and a σ of 15.0.
Right at the start, we know that these two scores are well into the right (upper) tail
of the distribution. Before we begin to perform any calculations, a pressing issue of plan-
ning must be addressed: How can we determine the percentage distance between two
scores that fall on the same side of the distribution (here, the positive side)? Previously, we
have added relative distances together. Now, however, we must identify the area between
the mean and where the two scores overlap one another-the remaining area under that
section of the curve will represent the percentage of cases existing between the two IQ
scores.
As always, begin by drawing a z distribution like the one shown in Figure 5.8 and
then calculate the z scores corresponding to the two IQ scores. An IQ score of 132 is
equal to a z of +2.13 (i.e., z = (132 - 100)/15 = 32/15 = +2.13), and an IQ score of 138 is equal to a z of +2.53 (i.e., z = (138 - 100)/15 = 38/15 = +2.53).
What is intelligence? How can it be measured and reduced to one score? What does the distribution of IQ scores look like?
Defining the term "intelligence" is an old problem for educators and researchers, especially
psychologists. Robert J. Sternberg, a prominent researcher who studies intelligence, defines it as
the ability to learn from past experiences, to understand and control one's own thinking to pro-
mote learning, and to adapt to environments containing varied cultural and social elements
(Sternberg, 1999; Sternberg & Detterman, 1986). To some extent, intelligence means different
things to different people who find themselves (or the people they notice or, for psychologists,
formally study) in varied contexts. A doctor diagnosing a medical problem can display a different
type of diagnostic intelligence than a plumber repairing a drain or a mechanic fixing a plane en-
gine, but each one demonstrates what can be called a contextual "intelligence."
Historically, psychology has been interested in assessing human intelligence via various intel-
ligence tests, subsequently ranking people based on their scores. Although this practice has been
subject to stark and accurate criticism (e.g., Gould, 1996), the use of intelligence testing and
scores-notably the IQ score-continues. The original IQ score was devised by Stern (1912, cited
in Sternberg, 1999), who argued that intelligence should be based on the ratio of an individual's
mental age (MA) divided by chronological age (CA), multiplied by 100, or:
MA
IQ = CA X 100.
If a teenager's mental age was the same as his chronological age (e.g., 13 years) then his IQ score
would be 100-the exact average on the scale (i.e., IQ = [13/13) X 100 = 1.0 X 100 = 100). When
mental age is greater than chronological age, the IQ score exceeds the average; when the reverse is true,
the IQ score will fall below the mean of 100. This type ofIQ score is called a ratioIQ(Sternberg, 1999).
[Figure 5.3: four panels, (a) through (d), showing how to sketch a bell-shaped curve: a "volcano" outline, a semicircle drawn over its top, the interior lines erased, and lines added for the mean and standard deviations.]
I promise you that these quick curve drawings will come in handy in later sections of this chapter and for solving the problems at its end.
z scores can be calculated from sample or population data.

[5.1.1]  z = (X - X̄) / s.

As you can see, X represents the known raw score, X̄ is the sample mean, and s is the sample's standard deviation. The formula for calculating a z score from population data is conceptually identical-only the symbols change:

[5.2.1]  z = (X - μ) / σ.

Once again, X is the known raw score, but now μ represents the mean of the population and σ is its standard deviation.
What if you know a z score and you want to transform it back to its original raw
score form? This reverse conversion is really quite simple. Besides the z score itself, you
must know the value of the mean and the standard deviation of the distribution of raw
scores, and then it becomes a simple matter of multiplication and addition. Here is the
transformation formula back to a sample's raw score:
[5.3.1]  X = X̄ + z(s).
As shown in [5.3.1], the original raw score can be determined by multiplying the z
score times the sample standard deviation, which is then added to the sample mean. Let's
use sample data from earlier in the chapter to demonstrate this reconversion. Recall that
we earlier calculated that a score of 55 from a distribution with a mean of 50 and a standard deviation of 5 corresponded to a z score of 1.0. If we enter the z, the mean (X̄), and
the standard deviation (s) into [5.3.1] we can show that the raw score of X is indeed 55:
[5.3.2] X = 50 + 1.0(5),
[5.3.3] X= 50 + 5,
[5.3.4] X=55.
The transformation formula for converting a population-based z score back to its
raw form is conceptually identical but symbolically different:
[5.4.1]  X = μ + z(σ).

In this version of the transformation formula, the population standard deviation (σ) is multiplied by the z score, the product of which is then added to the population mean (μ).
Conceptually, of course, both the sample and population formulas achieve the same end. The distance between an observed score and the mean of its distribution is divided by the standard deviation of that distribution. Before we learn more about the statistical utility of z scores, we need to review the standard normal distribution, which will help us to work with the information the scores provide.
[Figure: the normal distribution applied to IQ, showing standard deviations from -4 to +4, IQ scores from 40 to 160 (mean 100), and the corresponding percentile ranks for each range of the curve.]
This figure shows a normal distribution as it applies to IQ, including identifying labels that are sometimes used to characterize different levels of IQ. It
is important not to take these labels too seriously, as they are only loose characterizations, not scientific descriptions of performance.
Ratio IQs turned out to pose practical and conceptual measurement problems, so in recent
decades they were replaced by deviation IQ scores. As you might guess, deviation IQ scores are
based on the normal distribution and its assumptions for large populations. "Deviation" in this
context means scores that deviate from the mean score within a normal distribution of IQ scores.
The normal distribution of deviation IQs is shown in the figure above.
With any luck, you remain blissfully unaware of your measured IQ score and the above distri-
bution is a mere curiosity for you. If you know your IQ score, however, then you are probably locating
it on this curve to see how your performance compares to the population at large. In either case, you
would do well to heed the sage wisdom of many research psychologists as well as intelligence test crit-
ics: The IQ score is but one measure of intelligence and an imperfect, incomplete one at that-no skill
or behavior can be adequately reduced to one numerical index. As the definition provided earlier sug-
gests, intelligence is comprised of various abilities. In other words, you are more than one score.
For further discussion of intelligence, its definition, and measurement, as well as accompa-
nying controversies, see Ceci (1996), Gardner (1983), Gould (1996), and Sternberg (1985).
Locate and then plot where the two zs fall on your sketch of the z distribution (see
Figure 5.8). Before proceeding any further, think about what relationship the plotted
scores should disclose. First, both IQ scores are at least two standard deviation units
away from the mean, placing both almost into the tail of the distribution. As a result, we
should anticipate that the area between the two scores is probably quite small because
few cases exist at the extreme of any distribution.
Turn to Table B.2 in Appendix B and locate the area between the mean and each of
the two z scores. The percentages for +2.13 and +2.53, respectively, are 48.34% and
49.43%. As shown in Figure 5.8, then, a considerable amount of area is shared by the two
scores (see the arrows pointing left from the z scores back to the mean)-only a small
area exists between the scores (see the double-headed arrow pointing to both the scores).
This small area represents the answer to the original question: What percentage of IQ
scores on the test fall between 132 and 138? To determine its numerical value, we sub-
tract the smaller area associated with the z of +2.13 from the larger area for the z of
+ 2.53, or 49.43% - 48.34% = 1.09%. In other words, only about 1% of the IQ scores
in the distribution fall between scores of 132 and 138. Given that we know that rela-
tively few people score in the upper ranges of the scale, this result makes sense. (As an
aside, try to develop the practice of questioning any result-does it make statistical
sense? Is it too large or too small?)
What if you needed to determine the area between two z scores that fell below
the mean? The logic is the same as that used in the last example-you would simply be
using negative z scores. To begin, of course, you would plot the standard scores,
determine the area of overlap between each z and the mean, and then subtract
the smaller area under the curve from the larger area. As an exercise, take a moment and
make the sign on the z scores from the above example negative, sketch a z distribution,
and then enter the scores on to it. Redo the above example using scores that now fall
below the mean to verify that you understand the process and obtain the same answers.
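By symmetry, the exercise comes out exactly the same as the worked example; a sketch that confirms it with the normal CDF:

```python
import math

def phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area between +2.13 and +2.53 versus the area between -2.53 and -2.13:
above = 100 * (phi(2.53) - phi(2.13))
below = 100 * (phi(-2.13) - phi(-2.53))
print(round(above, 2), round(below, 2))   # both round to ~1.09%
```

Because the normal curve lacks skew, the slice between two negative z scores always mirrors the slice between their positive counterparts.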
Procedurally, the resulting T score is rounded to the nearest value of the whole number.
Thus, a score of 87.2 would be reported as 87.0. Again, please note that the value of T
will vary depending on the characteristics of the measure being used.
Can T scores be converted back to their z score equivalents? Yes, with little difficulty.
The formula for this transformation back is:

[5.13.1]    z = (T − T̄)/s_T,

where T̄ and s_T are the mean and standard deviation of the T scale in use.
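The transformation between z and T runs cleanly in both directions, as the short sketch below shows. The function names are mine; the T-scale mean of 100 and standard deviation of 10 in the first usage line reproduce the Knowledge Base answer of T = 78.0 for z = -2.20, and the 400/100 pair comes from problem 30 at the end of the chapter.

```python
def z_to_T(z, T_mean, T_sd):
    """Forward transformation: T = T_mean + T_sd * z."""
    return T_mean + T_sd * z

def T_to_z(T, T_mean, T_sd):
    """Reverse transformation [5.13.1]: z = (T - T_mean) / T_sd."""
    return (T - T_mean) / T_sd

print(round(z_to_T(-2.20, 100, 10), 1))   # 78.0
print(round(T_to_z(420, 400, 100), 2))    # 0.2
```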
When writing about a z or T score, the goal is not to draw attention to the statistics
per se, but to focus on the properties of the measure in question and the observations
drawn from it.
Knowledge Base
1. You have a raw score of 220 drawn from a sample with an X̄ of 215 and an s of 11.
a. What is the z score corresponding to the raw score of 220?
b. What proportion and percentage of cases occur between the mean and z?
c. What proportion and percentage of cases exist beyond z?
2. What is the percentile rank of the raw score from question 1?
3. Using the sample statistics from question 1, determine the percentage of the area
under the curve falling between scores of 210 and 218.
4. Using the sample statistics from question 1, determine the percentage of the area
under the curve falling between scores of 200 and 213.
5. Using the z score transformation to a T score shown in formulas [5.12.1] and
[5.12.2], determine the T score for a z score of -2.20.
Answers
1. a. +0.46  b. .1772, 17.72%  c. .3228, 32.28%
2. 67.72% of the cases fell at or below a score of 220.
3. 28.0%
4. 34.17%
5. T score = 78.0
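Answers like items 1 and 2 can be verified with a short script; the function name is mine, and math.erf again supplies the normal CDF. The small gap between this result and the tabled 67.72% above comes from rounding z to two decimals before consulting the table.

```python
import math

def percentile_rank(x, mean, sd):
    """Percent of cases at or below raw score x in a normal distribution."""
    z = (x - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Raw score of 220 in a sample with mean 215 and s = 11 (item 1):
# z = 5/11, about +0.45, for a percentile rank in the high 60s.
print(round(percentile_rank(220, 215, 11), 2))
```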
essary? As noted in Data Box 5.B, the population of students taking the test has expanded
greatly since 1941, the year that the scores of 10,000 college-bound students were used to es-
tablish norms-performance expectations and comparison scores-for the test.
Prior to this 1995 recentering of scores, the average SAT verbal had "fallen" from 500
to 424 and the math score changed from an average of 500 to 478. In other words, the
distribution of test scores changed from 1941 to the 1990s. As shown graphically in Fig-
ure 5.9, the average score shifted from the 1941 midpoint to a point well below it. The dis-
tribution for the current (pre-1995) scale shown in Figure 5.9 appears to be showing some
Figure 5.9 Shape of Distributions of Test Scores for 1941 and 1990s Student Samples
Note: The average score on the 1941 version of the verbal and math subscales of the SAT was 500. The average
scores on these subscales in the early 1990s were 424 (verbal) and 478 (math), indicating that the population
of test takers changed in the intervening years.
positive skew; that is, test scores appear to be clustering toward the lower end of the dis-
tribution. Recentering of the SAT made the distribution of scores once again appear
like the normal (original) distribution shown in the left side of Figure 5.9.
By balancing scores on the SAT, the College Board returned the scores to a standard
normal distribution. Following the logic of transforming scores presented in this chap-
ter, "old" SAT scores were transformed into "new" ones. The difference here, of course, is
that the shape of the distribution was changed somewhat-it was "normalized" once
again, presumably by a transformation formula similar to one used for calculating T
scores. Although the exact formula is not public information, I do have two equivalence
tables that quickly illustrate how pre-April 1995 SAT scores on the verbal and math sub-
tests can be changed to the "new" SAT scores (see Table 5.2).
Key Terms
Raw score (p. 182)
Standard score (p. 182)
T score (p. 196)
z score (p. 183)
Chapter Problems
1. What are the properties of the z distribution? Why are z scores useful?
2. Explain why comparison between one observation and another is so important for statistics and data analysis, as well as the behavioral sciences.
3. What are the properties of the standard normal distribution? Why is the normal curve useful for statisticians and behavioral scientists?
4. Explain what role the statistical concept of standard deviation plays in understanding z scores and the area under the normal curve.
5. Why do researchers standardize data? What is a standard score? Why are z scores and T scores standard scores?
6. If z scores appear to have such a wide range of applicability to statistical analysis, why did behavioral scientists find it necessary to develop T scores? What is the relationship between z and T scores?
7. What is probability? How does probability conceptually relate to z scores and the normal distribution?
8. A researcher examines her data and notices that its distribution is negatively skewed, and that its μ = 342.0 and its σ = 21.0. If she converts all the data to z scores, what will be the numerical value of the mean and the standard deviation? What will the shape of the distribution of scores look like now?
9. Imagine you have a distribution with a μ = 78.0 and its σ = 12.5. Find the z score equivalents of the following raw scores: 54.0, 63.5, 66.0, 77.0, 78.5, 81.0.
10. Imagine you have a distribution with a μ = 78.0 and its σ = 12.5. Convert the following z scores back to their raw score equivalents: -3.10, -1.55, -1.0, +0.55, +1.76, +2.33, +3.9.
11. Sketch a z distribution and plot the z scores from problem 9. Determine the area under the curve (in percent) between each z and the mean of the distribution.
12. Imagine you have a sample with a X̄ = 35 and an s = 3.5. Find the z score equivalents for the following raw scores: 27, 29.5, 34.3, 35, 45, 47.5.
13. Imagine you have a sample with a X̄ = 35 and an s = 3.5. Convert the following z scores back to their raw score equivalents: -3.01, -2.56, -1.21, +1.40, +2.77, 3.00.
14. Sketch a z distribution and plot the z scores from problem 12. Determine the area under the curve (in percent) between each z and the mean of the distribution.
15. You have a very large distribution of IQ scores. As you know, the IQ test has a μ = 100.0 and its σ = 15.0. Find the percentage of the curve falling between each of the following scores and the mean: 85, 88, 98.0, 112, 120, 133.
16. You have a very large distribution of IQ scores. As you know, the IQ test has a μ = 100.0 and its σ = 15.0. Find the percentage of the curve falling between each of the following pairs of scores: 85 and 88; 92 and 96; 98 and 108; 115 and 128; 130 and 140; 141 and 143.
17. You have a data set with a X̄ = 50.0 and an s = 7.0. Find the percentage of the curve falling between each of the following scores and the mean: 38.0, 39.5, 45.0, 52.0, 57.0, 66.6.
18. You have a data set with a X̄ = 50.0 and an s = 7.0. Find the percentage of the curve falling between each of the following pairs of scores: 39.0 and 43.0; 44.5 and 56.5; 51.0 and 61.0; 62.0 and 63.0; 65.0 and 75.0; 76.0 and 80.0.
19. Determine what percentage of the cases under the normal curve are beyond each of the z scores calculated in problem 9.
20. Determine what percentage of the cases under the normal curve are beyond each of the z scores calculated in problem 12.
21. Determine what percentage of the cases under the normal curve are beyond each of the z scores calculated in problem 15.
22. Calculate the percentile rank of each of the z scores found in problem 9.
23. Calculate the percentile rank of each of the z scores found in problem 12.
24. Calculate the percentile rank of each of the z scores found in problem 15.
25. Are z scores normally distributed? Why or why not?
26. Is there one normal curve or many normal curves? Can there be more than one? How? Explain.
27. A student has taken four area tests designed to measure particular intellectual abilities. The following table identifies each test, summarizes its characteristics, and provides the student's score on it. Assume that the possible scores on each test are normally distributed.

    Area Test           μ       σ       Student's Score
    Verbal ability      58.5    6.50    63.0
    Visualization       110.0   15.0    102.5
    Memory              85.0    11.5    98.0
    Spatial relations   320.0   33.5    343.0

    a. Change each of the student's scores to its z score equivalent.
    b. On which test did the student receive a high score? A low score?
    c. What is the percentile rank of the student's verbal ability score? What percentage of the students who took the spatial relations test scored higher than the student?
28. Envision a normal distribution with μ = 88 and σ = 14.
    a. Identify the scores at the 25th, 75th, and 95th percentiles.
    b. What percentage of cases fall below a score of 74.0?
    c. What percentage of scores are higher than a 93.0?
    d. What percentage of cases lie between the mean and a score of 99.0?
    e. What percentage of cases fall between scores of 86.5 and 92.5?
29. Using the T score formula of T = 75 + 10(z), transform the following z scores into T scores. Round your answers to the nearest whole number: -1.12, -2.30, +1.18, +2.67, +3.58.
30. A transformation to T scores resulted in the following values when the formula T = 400 + 100(z) was used. Convert these T scores back to their original z scores: 420, 373, 510, 485, 624.
31. Use the decision trees opening this chapter to answer the following questions:
    a. An investigator wants to know a data point's location relative to the mean of its (large) distribution. What should the investigator do?
    b. A researcher draws an observation from an intact population whose parameters are known. How can this observation be converted to a z score?
    c. What is the percentile rank of a z score equal to 1.76?
Choosing a Measure of Association

First decision tree:
1. Are the data organized into pairs of scores (i.e., X, Y)?
2. Are both variables X and Y measured on interval or ratio scales?
3. Are both variables X and Y measured on ordinal scales?
4. Are both variables X and Y measured on nominal scales?
5. You cannot perform correlational analyses on these data. Consider an alternative method of analysis or presentation of results.
If yes to question 1 and to one of questions 2 through 4, then select a measure of association from Table 6.6. If no, then go to step 5.

Second decision tree:
1. Is the range of values for variable X or Y "truncated" or restricted?
2. When graphed in a scatter plot, does the relationship between variables X and Y appear to be nonlinear?
3. Are there any extreme or outlying scores in the data set?
4. Is the size of the sample small (i.e., <20 pairs of observations)?
5. Has the examination of scatter plots involving all the main measures and subject variables ruled out the presence of any preexisting subpopulations in the data?
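The first tree's logic is simple enough to sketch as a small function. The scale-family checks and the deferral to Table 6.6 follow the tree; the function name and the return strings are my own shorthand, not the book's.

```python
def choose_association_strategy(paired, x_scale, y_scale):
    """Walk the first decision tree: the data must be paired, and both
    variables must share a measurement-scale family (interval/ratio,
    ordinal, or nominal) before a measure of association is chosen."""
    if not paired:
        return "no correlational analysis; consider another method"
    families = (("interval", "ratio"), ("ordinal",), ("nominal",))
    for family in families:
        if x_scale in family and y_scale in family:
            return "select a measure of association from Table 6.6"
    # Mixed families (e.g., ordinal with nominal) fall through to step 5.
    return "no correlational analysis; consider another method"

print(choose_association_strategy(True, "interval", "ratio"))
```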
CHAPTER 6
Chapter Outline
• The Pearson r
  Data Box 6.A: Mood as Misbegotten: Correlating Predictors with Mood States
  Calculating the Pearson r
• Interpreting Correlation
  Magnitude of r
  Coefficients of Determination and Nondetermination
  Factors Influencing r
  Writing About Correlational Relationships
  Knowledge Base
• Correlation as Reliability
  Data Box 6.B: Personality, Cross-Situational Consistency, and Correlation
  A Brief Word About Validity
  Research
• What to Do When: A Brief, Conceptual Guide to Other Measures of Association
  Data Box 6.D: Perceived Importance of Scientific Topics and Evaluation Bias
  Project Exercise: Identifying Predictors of Your Mood
• Looking Forward, Then Back
• Summary
• Key Terms

Most of us search for predictable, stable patterns in our experience. By seeing or recognizing connections between events, we can make sense of the world around us. We are drawn to make connections between two events or variables, and we probably make conjoint judgments in this vein with great regularity. If you have ever lived with another person for any length of time, you probably began to recognize associations between certain events and your roommate's behavior. A phone call from an old flame, for example, might elicit happy reminiscences for an hour or two, or a sullen demeanor for the rest of the day. A mountain of homework, in turn, could elicit considerable grumbling and procrastination or extraordinary feats of efficient study.

When these same events happen more than once, we begin to detect a pattern between one event (an old flame's call) and another (a bad mood), or what statisticians call a correlation between the two variables. Literally, we create our own hypotheses about how and why one variable "co-relates" with another variable. An ironic aspect of our desire to seek and identify predictable, consistent correlations in everyday experience is that we are actually not very proficient at picking them out (e.g., Nisbett & Ross, 1980). That is, sometimes we believe that we see a pattern when, in fact, the association between variables is illusory and nonexistent (e.g., Chapman & Chapman, 1982). Other times we sense that a pattern is present but, as intuitive rather than formal scientists, we are not by nature well-calibrated enough to recognize its strength or magnitude (e.g., Dawes, 1988; Plous, 1988).

A classic series of studies by Jennings, Amabile, and Ross (1982) illustrates the real obstacles people run into when they try to detect correlations. Research participants encountered paired stimuli from three data sets-10 pairs of numbers, sketches of 10 male figures of different heights with canes of varied lengths, and a tape recording of 10 people reciting a letter from the alphabet and singing a musical note. Participants tried to determine if any association existed between the variables in each data set. In other words, for example, did tall figures tend to have long walking sticks and short figures have short sticks (a positive relationship)? Or did figures with greater heights seem to have shorter sticks, and vice versa (a negative relationship)? Alternatively, there might be no discernable pattern between height and stick length (a zero relationship). As we will see later in this chapter, correlations can take on a value ranging between -1.00 and +1.00 and, as values move toward either extreme, they become stronger (i.e., more
Chapter 6 Correlation
206
predictive). Jennings et al. (1982) purposefully created patterns within each data set so
that the objective correlations ranged between 0 (no relationship) and 1.00 (a perfect,
predictable relationship).
KEY TERM Correlation refers to whether the nature of association between two (or more) variables is positive,
negative, or zero.
Did the participants do a good job at identifying the presence and strength of the
correlations for each data set? Not really. Jennings and colleagues (1982) found that the
participants did not really begin to notice any positive relationships in any of the three
data sets until they were "strong," that is, with objective numerical values ranging be-
tween +.60 and +.70. In fact, participants generally viewed objectively "strong" rela-
tionships as "moderate," while objectively "moderate" relationships in the data were
missed altogether.
Calculating correlations is necessary, if only because humans are not "calibrated" to recognize objective empirical relationships unless they are obvious ones.

Sometimes, of course, we are capable of recognizing that two variables share a relationship, one that we can readily seek (e.g., Hard work usually leads to rewards) or avoid (e.g., If you touch a hot stove, you will burn yourself). Under relatively controlled conditions, there is also some evidence indicating that people can recognize everyday life correlations with some accuracy, such as the degree to which social behavior is consistent across occasions (e.g., If Jill is more honest than Jane once, how will she behave in the future?; see Kunda & Nisbett, 1986). The world is a complex
place much of the time, however, and it can be difficult for the person on the street to
isolate which variable is linked with which variable, and in what way. To circumvent
this complexity, statisticians and behavioral scientists rely on a straightforward pro-
cedure enabling them to convert variables to numbers, which are then entered into a
formula that provides an objective index of correlation. Correlational research,
which was introduced in chapter 2, can be a powerful tool for identifying heretofore
unnoticed relationships in the behavioral sciences, engaging in exploratory research,
or examining associations between variables when experimental research is not a
possibility. This chapter reviews the conceptual material underlying one of the more
common correlation statistics, as well as practical advice about its calculation and
application .
The simple number, of course, is the correlation coefficient. An ardent admirer and
junior colleague of Galton, the statistician Karl Pearson, enlarged the mathematical
background and precision of the index of correlation. The official name of this, the most
frequently used index of correlation, is the Pearson product-moment correlation coef-
ficient, or Pearson r for short (its population counterpart is symbolized by the Greek
letter rho, ρ). The word "moment" is a synonym for the mean and, as you well know, the
multiplication of two numbers results in a product. "Coefficient" is a mathematical term
specifying an operation on some data. "Product-moment," then, refers to the average of
the multiplied numbers in the coefficient's calculation (Runyon et al., 1996). Before we
examine the conceptual under-
pinnings of the Pearson r, we must review a general but powerful aspect of all correlational
relationships-that they are not causal.
[Diagram: X → Y (Variable X causes Y); Y → X (Variable Y causes X); Z → X and Y (The association between X and Y is caused by a third, unknown, variable Z).]
Figure 6.1 Graphic Representation of Possible Causal Links Between Variables X and Y, and the
Third Variable Problem
lived. What was the single best predictor of whether birth control methods were adopted
and used? The number of electrical appliances (e.g., toasters, fans, blenders) found in the
home: more appliances, more birth control-fewer or no labor-saving devices, less or
no contraceptive practice.
Despite the correlation's implication, as Stanovich (1998, p. 73) wryly observes, we
are not apt to conclude that teenage pregnancy can be discouraged by passing out free
toasters in Taiwan's schools. Intuitively, this conclusion cannot be correct; that is, there
is a noncausal relationship lurking hereabouts. Can you think of an explanation that
can parsimoniously explain this quirky anthropological observation? While you do
that, let's focus on thinking about the relations between variables identified by
any correlation.
First, variable X (presence or absence of appliances) can lead to a change in variable
Y (use or nonuse of birth control). The problem, however, is that correlational relation-
ships are bidirectional-that is, Y might also be creating the change in X. Both of these
possibilities are illustrated graphically in Figure 6.1. This lack of certainty in directional
relationships-the origins of cause and effect-is the reason that correlations are
not causal.
Understanding the relation between variables is more complicated still due to what
is sometimes referred to as the third variable problem in correlational analysis. A third
variable (or fourth or fifth, for that matter) is a variable that exercises an unrealized ef-
fect in a correlational result. Why is this third variable a problem? Perhaps X does not
lead to Y nor Y to X. Instead, it may be that the apparent relationship between X and Y is
actually the result of some third, unknown variable Z (see Figure 6.1). The problem, of
course, is there can be an infinite number of candidates for variable Z. And so we return
to the question of how birth control and electric appliances are linked-have you devel-
oped any hypotheses about the probable causal agent here?
Correct causal ordering is difficult to recognize, let alone prove-be on the lookout for mediating variables driving the association between two seemingly related variables.

The mediating variable-the one that ties the relationship between the two represented by the correlation-is probably education or socioeconomic status (SES to behavioral scientists), two variables that, in turn, are highly correlated. In other words, individuals with more education are also apt to earn more money, which in turn is spent on electric appliances. Persons with more education, then, are more likely to know about and exercise birth control; they also happen to have a higher disposable income to spend on food processors and related items. This account is plausible, but consider how long it took us to get here. More to the point, we still lack definitive (i.e., causal) proof.
Only when some controlled intervention is created-a manipulation in an experiment
The Pearson Correlation Coefficient 209
(see chapter 2), perhaps-can we actually determine what factors increased or inhibited
the use of birth control on the island of Taiwan. Other examples of correlations ex-
hibiting the third variable problem can be found in Stanovich (1998) and Abelson
(1995).
If correlation does not imply causation, then should such analyses be avoided? Not
at all. Correlational research, the examination of associations among variables, is ex-
tremely valuable and very often revealing, even enlightening. After all, few people
would posit a link between toasters and birth control out of the blue! More seriously,
however, correlations are a start. They can be highly suggestive, pointing researchers in
the right direction so that experiments identifying the actual causal ordering among
known and unknown variables can be designed and subsequently executed. As a
critical consumer of behavioral science research, one who is becoming well versed in
data analysis, however, you must always avoid confusing correlated variables with
causal variables.
KEY TERM The Pearson r, a correlation coefficient, is a statistic that quantifies the extent to which two
variables X and Y are associated, and whether the direction of their association is positive, negative,
or zero.
We will review this conceptual definition for the Pearson r in the context of an ex-
ample here and then learn how to calculate it in the subsequent section. Conceptually,
the Pearson r assesses the degree to which X and Y vary together-this shared infor-
mation is then divided by the degree to which X and Y vary separately from one
another.
Imagine that a personality psychologist is interested in developing a new inven-
tory to measure extraversion (e.g., Jung, 1931/1971; Myers, 1962). As you know, ex-
traverts are individuals who display highly sociable characteristics-they are drawn to
form connections with others, to seek new experiences, and are generally "outgoing" in
their demeanor. In contrast, persons with introverted personalities appear to be shy,
less outgoing, and are much less sociable, particularly in novel or public settings.
Introverts are usually hesitant to form new social connections, holding back until
they become comfortable in a situation. The new inventory contains 20 questions,
each of which requires respondents to rate whether their reaction to a hypo-
thetical situation would be introverted or extraverted. Thus, scores on the inven-
tory can range between 1 and 20, and higher scores indicate a greater degree of
extraversion.
Column 2 in Table 6.1 shows the extraversion score for the 10 participants who took
part in a small-scale validation study (for convenience, column 1 assigns a number to
each participant). The third column in Table 6.1 is a behavioral measure employed to
test whether the inventory is adequately measuring the personality construct of
introversion-extraversion. One week after the 10 participants completed the inventory,
they returned to the laboratory to take part in a staged social interaction. Specifically, the
10 participants met 12 confederates in a cocktail party setting, and their instructions
were simply to greet and meet new people for 30 minutes. Unbeknownst to the partici-
pants, their sociability was carefully monitored by the investigator-the whole numbers
in column 3 of Table 6.1 represent the number of "new" people they interacted with dur-
ing the party (the numerical value of these interactions could range between 0 ["met no
one"] to 12 ["met everyone"]). The personality psychologist predicts that higher levels
of extraversion will be linked with higher levels of sociability; that is, extraverts will tend
to meet more new people relative to introverts. If the inventory is valid, then the direc-
tion of the correlational relationship between the personality scores (X) and the ob-
served social behavior (Y) should be a positive one.
Table 6.1  Extraversion Scores and Social Behavior

Participant   Extraversion Score (X)   People Met (Y)
1             20                       8
2             5                        2
3             18                       10
4             6                        3
5             19                       8
6             3                        4
7             4                        3
8             3                        2
9             17                       7
10            18                       9

Note: Higher X scores indicate a higher level of extraversion (range: 1 to 20); higher Y values indicate more
social contacts made in the 30-minute get-acquainted session (range: 0 to 12).
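Although the chapter develops the computational formula later, the hypothesized positive relationship in Table 6.1 can be previewed numerically. The sketch below uses the standard raw-score product-moment formula; the function and variable names are mine, not the text's worked example.

```python
import math

extraversion = [20, 5, 18, 6, 19, 3, 4, 3, 17, 18]   # X, from Table 6.1
people_met   = [8, 2, 10, 3, 8, 4, 3, 2, 7, 9]       # Y, from Table 6.1

def pearson_r(x, y):
    """Product-moment correlation via the raw-score formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

print(round(pearson_r(extraversion, people_met), 2))  # a strong positive r, near +0.94
```

The result is consistent with the psychologist's prediction: higher extraversion scores travel with more social contacts.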
KEY TERM A negative correlation identifies an inverse relationship between variables X and Y-as the value
of one increases, the other necessarily decreases.
In the academic realm, a negative correlation can occur between the number of
hours spent watching television and the grades one earns. Logically, an inverse re-
lationship would exist between these variables-more hours spent watching TV
potentially leads to lower grades, while fewer hours propped in front of the tube
would be associated with higher grades. Bear in mind that although these examples
seem to be intuitive, even plausible, they cannot be treated as identifying causal re-
lationships. Many students who do well scholastically spend untold hours watching
television, while others avoid any such leisure activity and still receive lower grades.
Correlation does not imply causation!
Negative correlation: As X increases, Y decreases (and vice versa).

A negative correlational relationship between extraversion and the number of people engaged during the 30-minute study would be counterintuitive to the goals of the study, not to mention what is known about the personality constructs involved. Nonetheless, of course, a researcher must anticipate what pattern of results an (unexpected and unwanted) negative correlation would reveal, as planful investigators always envision alternative results and consider their impact on theory (Dunn, 1999). A negative correlation would occur if extraverted participants introduced themselves to relatively few confederates, while the introverts in the study ended up meeting many more people by comparison.
Finally, a zero correlation occurs when there is no discernable pattern of covaria-
tion-how things vary together-between two variables.
KEY TERM A zero correlation indicates that there is no pattern or predictive relationship between the behavior
of variables X and Y.
Do you know what factors put you into a good mood? Pleasant conversation? A sunny day? Is
happiness really a warm puppy, or is it more likely to come to you in the form of a rich cup of
real (caffeinated) coffee?
Wilson, Laser, and Stone (1982) had 50 students complete daily questionnaires regarding
their moods and various predictor variables (e.g., how much they slept, the quality of the food
they ate, the weather) for 5 weeks. Correlations between the predictors and mood ratings were de-
termined for each participant, and these were subsequently compared with the participants' esti-
mates of what the relationships were like (i.e., positive, negative, or zero). A second group of ob-
server participants (who did not know the first group) estimated how much each of the
aforementioned variables predicted mood.
Were the actual participants more accurate than the observers were when it came to knowing
what affected their moods? Wilson et al. (1982) found that although the participants achieved a
moderate level of accuracy at judging how their moods covaried with the predictors (the average
r between estimated and actual mood predictors was +.42), the peer observers who had no direct
experience were equally accurate. How so? Wilson et al. correlated the observers' estimates of how
predictors were linked to mood with the participants' actual estimates, yielding an r of + .45! In
other words, the observers who had no experience with judging what does or does not affect mood
were as accurate as individuals who carefully monitored predictor variables and their own moods
for over a month!
How can this intriguing event be explained? Wilson and colleagues (1982) suggest that in-
stead of basing mood judgments on the actual data they collected, the 5-week participants relied
on cultural beliefs and theories about what causes moods (e.g., "Rainy days and Mondays always
get me down."). These same shared beliefs and theories were used by the observers when they were
asked to predict to what extent each factor was linked to people's moods. Both groups, then, relied
on what some researchers have called a priori causal theories (e.g., Nisbett & Wilson, 1977) and
not the data found in their actual experiences (for further reading, see Wilson, 1985; Wilson &
Stone, 1985).
Now, do you think you really know what factors predict whether you will be in a good mood
or a bad one? Would you be any more (or less) accurate than the students who tracked their moods
for 5 weeks? To test your accuracy, try your hand at the Project Exercise presented at the end of the
chapter.
[Figure panels: (a) Positive Linear Relationship; (b) Negative Linear Relationship; (c) Perfect Positive Linear Relationship; (d) Perfect Negative Linear Relationship]
Figure 6.2 Linear Scatter Plots Illustrating Positive and Negative Correlations
linearity-are shown in panels (c) and (d), respectively, in Figure 6.2. In other words, the
known value of each variable would predict its complement; if you knew one value, you
would necessarily know the other. In practice, such perfect prediction rarely occurs,
though the ideal prediction of points will be the goal of a technique called linear regres-
sion, which is based on correlation. We will learn about this technique-one that gives
life, as it were, to correlations-in chapter 7.
Greater linearity in a scatter plot's points indicates a stronger correlation.

Of course, scatter plots are not used only to plot perfect linear relationships. They are at home, as it were, plotting any correlational data. Figure 6.3 is a scatter plot representing the relationship between the scores on the personality inventory and the participants' social interaction with the study's confederates (the data were taken from Table 6.1). As shown in Figure 6.3, each of the plotted points represents the intersection of pairs of X and Y values from our hypothetical study. Thus, for example, participant 5 had an inventory score of 19 and he or she met 8 confederates during the 30-minute session (see Figure 6.3). What sort of relationship does the scatter plot in Figure 6.3 suggest? It looks to be quite a strong positive relationship-extraverts tended to meet many of the confederates, while introverts sought out relatively few. We will see if our supposition of a strong, positive relationship is borne out by the correlation between these variables below.
What about correlational relationships between X and Y that hover near O? What
do these patterns look like when they are graphed? As shown by the data presented in
Figure 6.4, there is not much to see by way of any discernable pattern. This lack of
consistency between values X and Y is, of course, the point of a zero correlation. No
meaningful, interpretable pattern should appear (see Figure 6.4). In general, weaker
correlations between variables can be known by a broader scattering of points
in space.
Figure 6.3 Scatter Plot of Personality Inventory Scores and Interaction Behavior
Note: These data were taken from Table 6.1.
[Figure 6.4: scatter plot of X and Y values showing no discernable pattern, illustrating a zero correlation]
Scatter plots can appear in an endless variety falling between the strict linearity of
perfect correlations and the spatial disarray of zero correlations. Figure 6.5 illustrates a
few other (hypothetical) possibilities. In each case, the value of X is matched with another
value Y. Some scatter plots can illustrate a curvilinear relationship, as shown by panel
(a) in Figure 6.5. Such a pattern of points will yield a correlation close to 0 because the
Pearson r does not detect covariation that does not conform to a linear relationship.
Panel (b) in Figure 6.5 illustrates how scores on a dichotomous variable-one with
only two levels or values, such as gender where M stands for male and F stands for
female (X)-can be plotted with a variable, say, test scores (Y), that can take on many
values. Finally, panel (c) presents a scatter plot representing the effects of practice. As
time (X) passes, for instance, the memorization of novel information (Y) gradually
216 Chapter 6 Correlation
Figure 6.5 Other Possible Scatter Plots Showing Relationships Between Some Variables X and Y
improves until it eventually levels off (note that the curve in panel (c) eventually becomes "flat").
The scatter plot of a data set allows a researcher to think about what the relation-
ship between two variables is apt to be like. By examining the pattern of points, one can
get a feel for whether the relationship between the variables is positive, negative, or
zero, as well as the probable strength of the actual correlation statistic (i.e., are the
points relatively close together or far apart). Of course, the only way to truly under-
stand the nature of the relationship is by actually calculating the r statistic, the mat-
ter we turn to next.
The Pearson r's relation to z scores. In chapter 5, we learned that the distinct advantage posed by z scores is their ability to standardize variables. Once a score is converted to a z, we can see how close it lies to the z distribution's mean of 0, as it is now represented in standard deviation units (if these facts seem unfamiliar to you, please turn back to chapter 5 before proceeding with your reading here). By using a z score, we can precisely locate the placement of an individual score in a distribution.
The Pearson r capitalizes on this standardization by assessing and reconciling, as it were, the relative locations of X and Y simultaneously. In other words, the Pearson r can be understood as the average product of the paired z scores for X and Y.
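To make the z-score idea concrete, here is a short Python sketch (the data are the hypothetical X and Y scores from Table 6.1; the function names are my own). It computes r as the average product of paired z scores, using population standard deviations (dividing by N):

```python
import math

# Hypothetical Table 6.1 data: personality inventory scores (X) and
# number of confederates met in the 30-minute session (Y), N = 10 pairs.
X = [20, 5, 18, 6, 19, 3, 4, 3, 17, 18]
Y = [8, 2, 10, 3, 8, 4, 3, 2, 7, 9]

def z_scores(values):
    """Standardize raw scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(x, y):
    """Pearson r as the average product of paired z scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

r = pearson_r_from_z(X, Y)
print(round(r, 2))  # 0.94
```

The same value of +.94 emerges from the computational formulas developed later in the chapter.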
Sum of the Squares Revisited. Back in chapter 4, you will recall learning that the sum of the squared deviations from the mean-the "sum of the squares" for short-played an essential role when it came to the calculation of the variance and the standard deviation. As we noted then, the sum of the squares is integral to many, if not most, inferential statistical tests. You will see in a moment that the formula for the Pearson r is no exception; the sum of the squares is prominent within it.
Here is a second conceptual formula for the Pearson r:

[6.2.1]    r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² · Σ(Y − Ȳ)²]
The numerator of the formula is based on the sum of the mean deviations from X̄ and Ȳ multiplied by one another. With the possible exception of the square root radical, however, the denominator should look somewhat more familiar. The Σ(X − X̄)² and Σ(Y − Ȳ)² are the sums of squares for variables X and Y, respectively. The formula we use to compute the sum of the squares is [4.10.1], or:

SS = ΣX² − (ΣX)²/N.
This formula can certainly be used to determine the sum of squares for X, but we also need to remind ourselves that we must calculate the same information for Y. To simplify matters, we can revise [4.10.1] by making two versions of it, one for X and one for Y, or:

[6.3.1]    SSx = ΣX² − (ΣX)²/N

and

[6.4.1]    SSy = ΣY² − (ΣY)²/N
Once we think of the denominator of [6.2.1] in terms of SSx and SSy, it no longer seems so forbidding. Now, take another look at the numerator of [6.2.1]-it, too, is another form of the sum of squares, one that assesses what is called the covariance of X and Y. Statistically, the covariance is defined as the average of the sum of products of deviations from the mean, or (X − X̄)(Y − Ȳ). The calculation formula for the covariance of X and Y is:

[6.5.1]    covxy = ΣXY − (ΣX)(ΣY)/N
If we place [6.5.1] in the position of the numerator and then put [6.3.1] and [6.4.1] under the radical, the original Pearson r formula shown in [6.2.1] can be rewritten as the common computational formula:

[6.6.1]    r = [ΣXY − (ΣX)(ΣY)/N] / √(SSx · SSy)
Table 6.2    Data Prepared for Calculation of the Pearson r (Raw Score Method)

Participant     X      X²      Y      Y²     XY
     1         20     400      8      64    160
     2          5      25      2       4     10
     3         18     324     10     100    180
     4          6      36      3       9     18
     5         19     361      8      64    152
     6          3       9      4      16     12
     7          4      16      3       9     12
     8          3       9      2       4      6
     9         17     289      7      49    119
    10         18     324      9      81    162

ΣX = 113   ΣX² = 1,793   ΣY = 56   ΣY² = 400   ΣXY = 831
X̄x = 11.3   X̄y = 5.6
N = 10
Table 6.3    Calculation of the Pearson r (Raw Score Method)

r = [ΣXY − (ΣX)(ΣY)/N] / √(SSx · SSy)

r = [831 − (113)(56)/10] / √{[1,793 − (113)²/10][400 − (56)²/10]}
r = [831 − 6,328/10] / √{[1,793 − 1,276.9][400 − 313.6]}
r = [831 − 632.8] / √{[516.10][86.4]}
r = 198.20 / √44,591.04
r = 198.20 / 211.1659
r = +.9386 ≅ +.94
shown under the last column of Table 6.2. If you have any questions regarding how any of
these sums or statistics were determined, please review the appropriate sections of chap-
ter 1 or 4.
We now calculate the Pearson r between scores on the psychological inventory and the number of social interactions during the 30-minute get-acquainted session. This calculation is shown in Table 6.3. For convenience, formula [6.6.1] is repeated there. The sums entered into the second step of the calculation here were taken from Table 6.2. Be sure that you can follow each of the steps shown in Table 6.3. The computed r = +.9386. The convention is to round the correlation coefficient to two places behind the decimal, thus r = +.94.
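As a check on the hand calculation in Table 6.3, here is a sketch of the raw score method, formula [6.6.1], in Python (the variable names are my own):

```python
import math

# Raw score method, formula [6.6.1], applied to the Table 6.2 data.
X = [20, 5, 18, 6, 19, 3, 4, 3, 17, 18]
Y = [8, 2, 10, 3, 8, 4, 3, 2, 7, 9]
N = len(X)  # N = 10 pairs of scores, not 20 observations

sum_x, sum_y = sum(X), sum(Y)                # 113 and 56
sum_x2 = sum(x * x for x in X)               # 1,793
sum_y2 = sum(y * y for y in Y)               # 400
sum_xy = sum(x * y for x, y in zip(X, Y))    # 831

numerator = sum_xy - (sum_x * sum_y) / N     # 198.20
ss_x = sum_x2 - sum_x ** 2 / N               # 516.10
ss_y = sum_y2 - sum_y ** 2 / N               # 86.40
r = numerator / math.sqrt(ss_x * ss_y)
print(round(r, 4))  # 0.9386
```

Each intermediate quantity matches the corresponding entry in Tables 6.2 and 6.3.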
What about the second method for calculating the Pearson r? The second method-the mean deviation approach-is no more difficult than the raw score method. Here is the computational formula for the mean deviation method:

[6.7.1]    r = Σ(X − X̄)(Y − Ȳ) / √(SSx · SSy)

As you can see in formula [6.7.1], this approach also relies on the sum of squares and the covariance of X and Y.
Table 6.4    Data Prepared for Calculation of the Pearson r (Mean Deviation Method)

Participant    X     X − X̄   (X − X̄)²    Y     Y − Ȳ   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
     1        20      8.7     75.69      8      2.4      5.76        20.88
     2         5     −6.3     39.69      2     −3.6     12.96        22.68
     3        18      6.7     44.89     10      4.4     19.36        29.48
     4         6     −5.3     28.09      3     −2.6      6.76        13.78
     5        19      7.7     59.29      8      2.4      5.76        18.48
     6         3     −8.3     68.89      4     −1.6      2.56        13.28
     7         4     −7.3     53.29      3     −2.6      6.76        18.98
     8         3     −8.3     68.89      2     −3.6     12.96        29.88
     9        17      5.7     32.49      7      1.4      1.96         7.98
    10        18      6.7     44.89      9      3.4     11.56        22.78

SSx = 516.10   SSy = 86.40   Σ(X − X̄)(Y − Ȳ) = 198.20
X̄x = 11.3   X̄y = 5.6
sx = 7.57   sy = 3.10
N = 10
Calculating r using the mean deviation method is really no more difficult than the raw score method. The data for X and Y from Table 6.1 are shown in Table 6.4 (see the second and fifth columns, respectively). Column 3 contains the deviation scores based on subtracting the mean of X (i.e., X̄x = 11.3) from each of the values of X. Column 4, in turn, shows the square of the mean deviation scores for X, the sum of which, of course, is the sum of squares (SSx = 516.10; see the bottom of column 4). Columns 6 and 7 repeat these respective procedures for variable Y (X̄y = 5.60). The sum of squares for Y, which is equal to 86.4, can be found at the bottom of column 7. The sum of the last column in the far right of Table 6.4 shows the covariance of X and Y, or 198.20. This number is based on the deviation scores for both variables. Multiplying the values in column 3 by those in column 6, and then summing the resulting products, calculates the covariance of X and Y.
Once the covariance of X and Y and their respective sums of squares are known, they can be entered into formula [6.7.1]. For your convenience, each of the steps used to calculate r using this formula is summarized in Table 6.5. Naturally, the same result-r = +.94-is found using the mean deviation method (see Table 6.5). To verify that you understand how all the numbers were determined, please compare the contents of Tables 6.3 and 6.5 with one another.
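The mean deviation method of formula [6.7.1] can be sketched the same way (again with my own variable names); it reproduces the covariance term (198.20) and the sums of squares from Table 6.4:

```python
import math

# Mean deviation method, formula [6.7.1], applied to the same ten pairs.
X = [20, 5, 18, 6, 19, 3, 4, 3, 17, 18]
Y = [8, 2, 10, 3, 8, 4, 3, 2, 7, 9]
mean_x = sum(X) / len(X)   # 11.3
mean_y = sum(Y) / len(Y)   # 5.6

sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # 198.20
ss_x = sum((x - mean_x) ** 2 for x in X)                     # 516.10
ss_y = sum((y - mean_y) ** 2 for y in Y)                     # 86.40
r = sp / math.sqrt(ss_x * ss_y)
print(round(r, 2))  # 0.94, identical to the raw score method
```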
Which of the two calculation methods should you use? The choice is your own; some individuals prefer the raw score method, others the mean deviation method. Both approaches will provide the same answer and probably take approximately the same amount of time to perform.

In correlational analyses, N refers to the number of X, Y pairs, not the total number of observations present.

Regardless of which formula you select, there are a couple of pitfalls to be wary of when calculating r. An easy-to-forget feature of the Pearson r is that N refers to the number of
Interpreting Correlation 221
Table 6.5    Calculation of the Pearson r (Mean Deviation Method)

r = Σ(X − X̄)(Y − Ȳ) / √(SSx · SSy)

r = 198.20 / √{(516.10)(86.4)}
r = 198.20 / √44,591.04
r = 198.20 / 211.1659
r = +.9386 ≅ +.94
r
( pairs of variables X and Y (here, N = 10), not the total number of observations available
>
( (i.e., 20-10 for X and 10 for Y).1t is easy to make this mistake because you are not (yet)
f used to working with pairs of observations. Further, it is important to keep the mise en
place philosophy from chapter 1 in mind when calculating a correlation. As you can see
by the amount of information that must be kept track of (see Tables 6.2 and 6.4) and
then entered into an rformula (see Tables 6.3 and 6.5), it pays to be organized. None of
the calculations are particularly difficult or even time consuming, but with so many
numbers, it is easy to write down the right number in the wrong place. To avoid compu-
f
tational errors, then, it is essential for you to keep track of which number goes where by
(
r carefully writing them down on a separate sheet with column headings like those shown
r in Tables 6.2 and 6.4.
We now turn to the important matter of interpreting a correlational result.

• Interpreting Correlation
Based on our analysis of the scatter plot (see Figure 6.3) and the correlation calculated in Tables 6.3 and 6.5, what do we know? The scatter plot provided us with graphic evidence that the correlation was positive in nature and the r statistic of +.94 confirmed it. We know that extraverts-those individuals who had higher scores on the personality inventory-were apt to meet many more confederates during the staged "get acquainted" session than their introverted counterparts. The latter, as predicted, tended to meet fewer confederates, that is, to engage in less social contact. In short, particular personalities were correlated with particular social behaviors.
Beyond discounting causal accounts, however, there are other considerations that
must be examined before any correlational result is to be relied on. These considerations
include examining the magnitude of the correlation, its predictive accuracy, and ruling
out factors that can preclude an accurate interpretation of the statistic.
Magnitude of r
Beyond indicating the sign and strength of a correlational relationship, researchers
typically also describe the magnitude of the association between the two variables. Different researchers use different descriptors, but what follows is a helpful guide of arbitrary labels for ranges of r values (based on Evans, 1996, p. 146):

Range of r       Descriptive label for r
±.80 to 1.00     Very strong
±.60 to .79      Strong
±.40 to .59      Moderate
±.20 to .39      Weak
±.00 to .19      Very weak
These ranges and descriptions are not writ in stone. One investigator might describe
an r of +.75 as very strong, while another might say that one equal to -.35 is moderate.
The choice of label depends on the data, the context of the scientific argument, and the
investigator's temperament.
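The guide above can be expressed as a small helper function; here is a Python sketch (the function name is my own, and the cutoffs follow Evans's admittedly arbitrary labels):

```python
def describe_r(r):
    """Arbitrary descriptive labels for the magnitude of |r| (after Evans, 1996)."""
    size = abs(r)
    if size >= 0.80:
        return "very strong"
    if size >= 0.60:
        return "strong"
    if size >= 0.40:
        return "moderate"
    if size >= 0.20:
        return "weak"
    return "very weak"

print(describe_r(+0.94))  # very strong
print(describe_r(-0.35))  # weak (another investigator might say moderate)
```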
Though conventions exist, adjectives describing the magnitude of correlational relationships are somewhat arbitrary.

How can our hypothetical study's Pearson r of +.94 be classified? It is clearly a very strong correlation, indeed, falling in the +.80 to +1.00 range. An investigator could legitimately describe this correlation as "very strong" in a written summary of the project (see below). In most cases, however, it is not sufficient to just note the value of the correlation and to describe its magnitude-readers will also want to know how accurately it portrayed the data. The issue of accuracy is wrapped up with how knowing the variability of one variable can allow you to predict the change in another.
The coefficient of nondetermination provides the proportion of the variance in one variable that is not attributable to any change within the other variable in a correlational relationship. It is easily calculated simply by subtracting r² from 1.

KEY TERM   The coefficient of nondetermination (k, or 1 − r²) indicates the proportion of variance or change in one variable that cannot be accounted for by another variable.
r = +.94 (r² = +.88)    r = +.40 (r² = +.16)    r = +.10 (r² = +.01)    r = 0.0 (r² = 0.0)
Figure 6.6 Graphic Illustration of Correlation and the Coefficients of Determination and Nondetermination
Note: Overlapping circles indicate the extent of the correlation between variables X and Y. The shaded area shared between circles can be quantified as the coefficient of determination (r²). Nonoverlapping areas of circles can be quantified as the coefficient of nondetermination (1 − r²).
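The two coefficients illustrated in Figure 6.6 can be verified directly; here is a short Python sketch (the function names are my own), using the chapter's rounded r of +.94:

```python
def determination(r):
    """Coefficient of determination (r squared): proportion of shared variance."""
    return r ** 2

def nondetermination(r):
    """Coefficient of nondetermination: 1 minus r squared."""
    return 1 - r ** 2

# Using the chapter's rounded correlation of +.94:
print(round(determination(0.94), 2))     # 0.88
print(round(nondetermination(0.94), 2))  # 0.12
```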
Factors Influencing r
Before one can conclude definitively that a correlation is trustworthy, several factors
should be examined. These factors are the range of values used to compute a result, the
extent of a linear relationship between variables, the presence of any outlying scores, the
size of a sample, and the potential for preexisting subpopulations within a data set. Each
of these factors can lead to abnormally high or low correlations, masking any real effects within a set of data.
Limited Linear Relationship Between X and Y. As noted earlier, the Pearson r is designed to detect the linear relationship between two variables. If the data of interest depart too much from a straight-line orientation in space (see the earlier discussion of scatter plots of data), then the size and the magnitude of the correlation will be adversely affected. Curvilinear relationships between X and Y-when plotted they form a U shape or an inverted U-will result in very low Pearson r values, for example, despite the fact that a scatter plot makes the data appear to be predictable (see panel (a) in Figure 6.5). The reason for low predictability is that the positive and negative values on either side of the curve tend to cancel one another out, resulting in a low, often spurious, correlation. A savvy researcher always examines the scatter plot of any data before beginning an analysis or drawing conclusions from one that has already been completed.
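The cancellation argument is easy to verify with fabricated data; in the Python sketch below, Y is perfectly (but curvilinearly) predictable from X, yet the Pearson r comes out exactly zero:

```python
import math

# Y is perfectly predictable from X, but the relation is U-shaped, not linear.
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x ** 2 for x in X]

mean_x, mean_y = sum(X) / len(X), sum(Y) / len(Y)
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
r = sp / math.sqrt(sum((x - mean_x) ** 2 for x in X) *
                   sum((y - mean_y) ** 2 for y in Y))
print(r)  # 0.0 -- deviations on the two arms of the U cancel exactly
```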
Outlying Scores and Correlation. Extreme scores-those with aberrantly high or low values-have a similar deleterious effect on correlations as they do on descriptive statistics like the mean. Outlying scores can artificially increase (or decrease) the apparent size of a correlation; just one or two "deviant" scores in a small or medium-sized sample can wield a great influence.
Researchers really have two choices to make to reduce the deleterious effects of extreme scores. First, increasing the size of the sample can usually diminish the effects of any given outlier (i.e., recall the law of large numbers introduced in chapter 4). Second, an investigator can examine the scatter plot of scores for variables X and Y to determine if any outliers are present and, therefore, if they should be removed from subsequent analyses. Given a choice, increasing a sample's size seems to be a more ethically compelling maneuver than dropping problematic observations on a whim. On the other hand, removing outliers is often easier to carry out than widening the size of a sample. Whichever alternative is chosen, researchers must be honest with themselves and with their readers by providing a full account of how they dealt with the problem of outlying scores.
Sample Size and Correlation. On occasion, sample size can influence whether a correlation reveals a spurious relationship between two variables. Whenever possible, investigators should recruit sufficiently sized samples of participants for research purposes. The goal, of course, is to obtain enough participants so that their responses, and correlations between pairs of particular responses, will adequately represent their population of origin. Do not fall prey to a common myth, however, that larger samples yield larger correlations and that smaller samples necessarily result in smaller correlations. Sample size and the strength of a given correlation are actually independent of one another; that is, one does not predict the other (e.g., Runyon et al., 1996). This qualification is not an invitation to run correlational analyses on small samples, though; rather, it is encouragement to seek adequately sized samples in order to reduce the risk of uncovering spurious correlations in your data.
What is a reasonably sized sample? How many participants will you need for your research? Answers to these questions involve exploring (a) available resources and (b) the effect you hope to demonstrate (Dunn, 1999). Resources can be divided into two components, the time available to you to collect data and the number of participants available to you. If you have a large amount of time and a corresponding number of research participants, then you are probably in good shape. If you have a shortage of either resource, beware of the heightened potential for spurious correlations. Indeed, where correlational analyses are concerned, you should endeavor to have around 30 or so participants before running any Pearson rs.
What about the particular effect, some positive or negative relationship between two variables, you hope to demonstrate? There are several techniques available to researchers that can help them to determine how many participants they need in order to demonstrate a statistical effect that reaches a particular magnitude. Collectively, these techniques are called power analysis (e.g., Cohen, 1988, 1992; Rosenthal & Rosnow, 1991), a topic we will
return to later in the book. Until that time, follow the general rule that larger samples are
apt to be helpful toward testing for the presence of desired effects in data.
Preexisting Subpopulations and Correlation. The final factor that can preclude a clear
and accurate interpretation of a correlation between variables X and Y is the presence of
preexisting subpopulations within a sample of data. By preexisting subpopulations, I
refer to any groups of participants who share some common feature or trait that escapes
being noticed by a researcher. Subject variables such as age, gender, marital status, ethnic
or religious background, as well as any number of personality traits (e.g., self-esteem),
can reduce the overall correlation between two variables. How so? Imagine that a re-
searcher was interested in women's self-esteem and how well it correlated with life satis-
faction, among other indices of psychological well-being. Other investigators routinely
document strong relationships between these variables, yet the researcher finds a
relatively low correlation (i.e., r = .15).
Why did the researcher fail to replicate the usual correlational results? When
examining a scatter plot of data, the researcher notices two distinct clusters of scores, ei-
ther of which might be predictive on its own but, in combination, cancel out each other's
effects. She quickly realizes that the upper cluster contains mostly scores of married pro-
fessional women, while the lower cluster is composed primarily of divorced women
heading single-parent households. In other words, the investigator neglected to consider
the effects of marital status, its potential link to economic well-being and, in turn, how
it affects self-esteem. Had she examined scatter plots displaying self-esteem and its rela-
tion to various other variables (including marital status) before running any analyses,
she would have anticipated the distinct and problematic subgroups. Thus, it is always
beneficial to search for patterns in your data before actually sitting down to analyze them
in earnest (recall the spirit of exploratory data analysis introduced in chapter 3; see, for
example, Tukey, 1977).
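A fabricated illustration of the subgroup problem: in the Python sketch below, within each of two hypothetical clusters the X-Y relationship is perfect, yet pooling them yields a much weaker Pearson r (the data and group labels are invented for the sketch):

```python
import math

def pearson_r(x, y):
    """Pearson r via the mean deviation formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) *
                          sum((b - my) ** 2 for b in y))

# Two invented subgroups; within each, X and Y are perfectly correlated.
group1_x, group1_y = [1, 2, 3, 4], [6, 7, 8, 9]  # e.g., married professionals
group2_x, group2_y = [1, 2, 3, 4], [1, 2, 3, 4]  # e.g., single parents

print(pearson_r(group1_x, group1_y))  # 1.0
print(pearson_r(group2_x, group2_y))  # 1.0
combined = pearson_r(group1_x + group2_x, group1_y + group2_y)
print(round(combined, 2))  # 0.41 -- pooling the clusters weakens the r
```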
¹A correlation is also usually accompanied by a probability or p value, which indicates whether the r is statistically significant. Significance testing and p values are introduced in chapters 8 and 9, and I will postpone discussing their relation to r until then.
Correlational results can be presented in several ways. First, a correlation can be reported in the heart of a sentence:

   The Pearson correlation between number of children and marital satisfaction was −.35, suggesting that couples with more children tend to experience lower levels of contentment. Those with fewer or no children report greater marital compatibility with their spouses.

Second, numerical information is placed in a table, often in the form of a correlation matrix (see Data Box 6.C for suggestions about reading and interpreting a correlation matrix). Such matrices are very useful, especially when several correlations are reported. In practice, a matrix will show the correlation values computed between all of the possible variable pairings. Although only some of the correlations represented by these pairings will be of primary interest to researchers and readers alike, the scientific convention is to disclose them all for public scrutiny. Indeed, astute readers sometimes identify relationships neglected by the researcher or pose alternative hypotheses for the data when viewing a given pattern of correlations.
If you do place correlational data into a table, be sure to follow the guidelines for describing tabular information within the body of a paper (see chapter 3; see also, Dunn, 1999). You must guide readers to correlational relationships that are of particular interest to you or that support your arguments rather than assuming they will find them on their own.
Knowledge Base
1. What are the three types of directional relationships that can be found in a correlational analysis?
2. A middle-aged man begins to exercise regularly because he read a newspaper study identifying an inverse correlation between heart disease and physical activity. He proudly announces to his spouse, "Well, I'll never have a heart attack." What is wrong with his reasoning?
3. True or False: An r = +.47 is stronger than an r = −.47.
4. What is the coefficient of determination for r = −.40? What is the coefficient of nondetermination for r = −.40?
5. A scatter plot reveals scores falling close together into a well-defined U-shaped curve, but the researcher is surprised to learn that the Pearson r is only equal to +.12. Why is the correlation so low?

Answers
1. A correlation can reveal a positive, a negative, or a zero relationship.
2. Although exercise is certainly beneficial, correlation does not imply causation-there is no guarantee that he will or will not have a heart attack at some point.
3. False: The + or − sign denotes directionality in the relationship. The shared numerical value of these correlations makes them equally strong.
4. The r² = +.16; 1 − r² = +.84.
5. The Pearson r is designed to detect linear, not curvilinear, relationships within a set of data.
pect that, on the average, each member of each pair of scores would not differ substan-
tially from one another (recall the discussion of observed versus true scores presented in
chapter 2).
What is the statistical component of reliability? The reliability of any given measure can be quantified or assessed using the Pearson correlation coefficient. All a researcher has to do, for example, is to correlate one set of scores on a measure with another set from the same measure as long as they were collected at a different point in time. This simplest form of reliability, referred to as test-retest reliability, involves giving the same test or measure twice to the same group of respondents. If the measure in question were truly reliable, then the Pearson r between the scores on the first and second administration should be relatively high, say, between +.80 and +.90. When a Pearson r is used to demonstrate reliability, it is often called a reliability coefficient.

A reliable measure provides consistent or similar measurements each time it is used.

In order to be deemed reliable, a measure must display a positive correlation (reliability coefficient) that reaches or exceeds an r of +.70. A correlation lower than this minimal value suggests that the responses across the two administrations vary considerably from one another (i.e., consistency from time 1 to time 2 is low). By definition, measures that show low levels of consistency cannot be considered reliable for research purposes.
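A sketch of test-retest reliability in Python, using invented scores for six respondents measured twice (the helper reuses the mean deviation formula [6.7.1]):

```python
import math

def pearson_r(x, y):
    """Pearson r via the mean deviation formula [6.7.1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) *
                          sum((b - my) ** 2 for b in y))

# Invented scores for six respondents who took the same measure twice.
time1 = [12, 18, 9, 15, 21, 11]
time2 = [13, 17, 10, 16, 20, 12]

reliability = pearson_r(time1, time2)
print(round(reliability, 2))  # 0.99, comfortably above the +.70 criterion
```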
Other Types of Reliability Defined
Although it is popular, test-retest reliability is not the only statistical way to determine if a measure is reliable by using the Pearson r. In fact, giving the same test to participants twice can be problematic, as many respondents may recall the answers they gave on the first administration. Parroting an earlier response on a later administration of a measure only serves to inflate the test-retest reliability coefficient. To combat this problem, some researchers rely on alternate form reliability, where participants complete different measures containing the same sort of items on two separate occasions. Their two scores are then correlated, and a higher r value indicates that the alternate form reliability is successful.
Reliability need not necessarily be assessed using the same measure given at two points in time-under certain conditions a single administration can actually be sufficient. To do so, behavioral scientists occasionally rely on what are collectively called internal consistency measures of reliability. The main condition for this form of reliability is that the measure in question must be comprised of a reasonably large number of items. The rationale for this condition will be articulated shortly.
The first form of internal consistency reliability is called split-half reliability because it entails literally "splitting" a test in half so that respondents' total scores on the first half of the test can be correlated with their total scores on the second half. A second, related form is called odd-even reliability. As you might guess, responses to the odd numbered items on a test are totaled and then correlated with the total scores of the even numbered items. A third method of internal consistency reliability is labeled item-total reliability, and it requires a bit more effort for researchers than the previous two approaches. Scores on each item of a test are correlated with the total score on the test. A test with 100 items, for instance, would require that 100 correlation coefficients be computed, the mean of which would serve as the reliability coefficient.
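Odd-even reliability, for example, can be sketched as follows (the scale items, ratings, and function names are all invented for the illustration):

```python
import math

def pearson_r(x, y):
    """Pearson r via the mean deviation formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / math.sqrt(sum((a - mx) ** 2 for a in x) *
                          sum((b - my) ** 2 for b in y))

def odd_even_reliability(item_scores):
    """Correlate each respondent's total on odd-numbered items with the
    total on even-numbered items (items alternate along each row)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return pearson_r(odd_totals, even_totals)

# Invented 1-5 ratings from five respondents on a six-item scale.
scores = [
    [5, 4, 5, 5, 4, 5],
    [2, 1, 2, 2, 1, 1],
    [4, 4, 3, 4, 4, 3],
    [1, 2, 1, 1, 2, 2],
    [3, 3, 4, 3, 3, 4],
]
print(round(odd_even_reliability(scores), 2))  # 0.99
```

Split-half reliability works the same way, with `row[:3]` and `row[3:]` in place of the alternating slices.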
A Brief Word About Validity
Correlational analysis can also be used to establish the validity of a measure or construct. In chapter 3, validity was defined as the degree to which some observation or measure corresponds to the construct it was designed to gauge. Many times a researcher will want to verify that some novel test or measure for a psychological construct (e.g., chronic
Note: CES-D is a depression scale (Radloff, 1977); LOT-R is the revised Life Orientation Test, a measure of
dispositional optimism (Scheier, Carver, & Bridges, 1994).
¹Higher scores indicate greater risk for depression (a score above 16 is considered "at risk").
²Higher scores indicate greater levels of self-esteem (Rosenberg, 1965).
³Higher scores reflect a tendency to conceal uncomfortable thoughts, feelings, and information about the self from others (Larson & Chastain, 1990).
What to Do When: A Brief, Conceptual Guide to Other Measures of Association 231
Correlation Matrix
1 2 3 4 5 6 7
Note: Only negative entries have a sign (-); positive entries do not. Please note that the entries of 1.00 in the
matrix indicate the correlation between a given variable and itself (i.e., a correlation matrix shows every
variable correlated with every other variable).
⁴Higher scores indicate greater willingness to disclose to a male friend (Miller, Berg, & Archer, 1983).
⁵Higher scores indicate greater willingness to disclose to a female friend (Miller et al., 1983).
⁶Higher scores indicate greater incidence of health symptoms associated with HIV/AIDS within the past week (Reed, Taylor, & Kemeny, 1993).
⁷Higher scores indicate higher levels of optimism, a belief that future outcomes will be positive (Scheier & Carver, 1985).
data. The Spearman rs is probably the second most common measure of association to
which students are exposed.
What about other measures of association? Is it possible to "correlate" nominally
scaled data? Are there other statistics for analyzing ordinal data besides the Spearman
correlation? There are a variety of other measures of association available, many of which are amenable to nominal as well as ordinal data. It is beyond the scope of the present book to provide you with detailed instructions regarding how to calculate all of the alternative approaches to measuring association. Later in the book, we will review two of these indices in some detail because they support other statistics (recall the role of r² in explaining the strength of association between correlated variables). I do, however, want
to provide you with a resource where some of the other measures of association are concerned, a summary guide to rely on in the future when confronted with data that cannot be analyzed by the Pearson or Spearman statistics.
Table 6.6 lists the four scales of measurement and then some recommended, alternative measures of association under each one. The specific formulas for these statistics can be found in any number of advanced statistics texts (e.g., Agresti & Finlay, 1997; Elifson, Runyon, & Haber, 1990; Rosenthal & Rosnow, 1991). I hope that Table 6.6 will be a useful resource for you in the future, especially if you develop a strong interest in statistics and data analysis.
• Did you read Data Box 6.A? If not, then please go read it now before proceeding. If you
read Data Box 6.A, then you no doubt found yourself wondering whether you could
identify what aspects of your daily experience affect how you feel. Such questions are al-
ways interesting, if not provocative, because they motivate us to wonder how well we
know ourselves. Is it possible that you don't know what affects your mood each day, or
to what degree given factors like the weather or physical exercise actually influence how
you feel?
The following Project Exercise, designed by Timothy D. Wilson of the University of
Virginia, enables you to effectively replicate the Wilson et al. (1982) study by keeping
track of your daily moods for a few weeks. You can perform this project by yourself and
then share and compare your results with your classmates. While tracking your moods,
you can also record how you believe that each of several factors influenced your mood
each day.
When this data collection period is over, you will have the opportunity to judge the
degree to which each of the tracked factors covaried with your mood. You will also learn
how well each of the factors is actually correlated with your mood. Finally, you will com-
pute a correlation that will indicate your relative judgment accuracy-in other words,
how well did you identify what factors influence why you feel the way you do.
234 Chapter 6 Correlation
Answer the following set of questions at the same time each day. Complete one questionnaire daily. You should complete at least 7 (i.e., one week's worth) of these questionnaires; the results will be more stable if you record data for 3 or more weeks, however. If you miss a day, do not try to recall how you felt; just skip it until the next day.
Today's date _____
Step 1. Figure 6.7 contains a short questionnaire that you should fill out each day
for a week or two. You can either write down your responses or ratings to answer the
questions in Figure 6.7 on a separate sheet of paper each day or use a copy machine to
make the number of copies you think will be necessary (7 for a week, 14 for 2 weeks, and
so on). Be advised that the correlational results will be more stable if you collect data for
a longer period of time, say, 3 weeks or more (the original study was 5 weeks long; see
Data Box 6.A). Be sure to follow the instructions shown in Figure 6.7. Please note that
questions 10 and 11 in Figure 6.7 encourage you to keep track of two factors of your own
choosing-just be consistent in your record keeping! When your week or two is up and
you are finished collecting data, go on to step 2.
Step 2. Complete the questionnaire shown in Figure 6.8 and then go on to step 3.
Step 3. Compute the correlation coefficient between your daily mood ratings (the
first question in Figure 6.7) and each of the factors that might predict your mood (see
What to Do When: A Brief, Conceptual Guide to Other Measures of Association 235
Read the following questions and then rate the relationship you believe exists between the
cited factors and your mood during the time you completed the Daily Questionnaire
shown in Figure 6.7. Each question asks you to determine whether there was a positive
relationship between each factor (e.g., amount of exercise, the weather) and your mood, a
negative relationship, or no relationship. A positive relationship occurred when a factor
was given a high rating (e.g., amount of exercise) and your mood was relatively high (i.e.,
positive), or when the factor was rated as being low and so was your mood. A negative
relationship would indicate that as a factor tended to be rated as high, mood was rated as
low (or vice versa). No relationship exists when you believe that a given factor and your
mood were unrelated to one another.
Use the following scale to rate each of the factors cited below:
-3 = Strong negative relationship
-2 = Moderate negative relationship
-1 = Slight negative relationship
0 = No relationship
+1 = Slight positive relationship
+2 = Moderate positive relationship
+3 = Strong positive relationship
1. How was your daily mood related to whether it was Monday? _ _
(negative relationship = mood tends to be lower on Mondays than on other days; positive
relationship = mood tends to be higher on Mondays than on other days)
2. How was your daily mood related to whether it was Tuesday? _ _
3. How was your daily mood related to whether it was Wednesday? _ _
4. How was your daily mood related to whether it was Thursday? _ _
5. How was your daily mood related to whether it was Friday? _ _
6. How was your daily mood related to whether it was Saturday? _ _
7. How was your daily mood related to whether it was Sunday? _ _
8. How was your daily mood related to the weather? _ _
9. How was your daily mood related to your physical health? _ _
10. How was your daily mood related to your relations with your romantic
partner/spouse/friends? _ _
11. How was your daily mood related to the food you ate? _ _
12. How was your daily mood related to your workload? _ _
13. How was your daily mood related to your amount of physical exercise? _ _
14. How was your daily mood related to the amount of sleep you had the night before? _ _
15. Your Question 1: _ _
16. Your Question 2: _ _
questions 2 to 11 in Figure 6.7). Here are brief instructions illustrating how to correlate
your recorded mood with the amount of sleep you got each night. The same procedure,
then, can be used to correlate your rated mood with the remaining ten predictors. As you
finish calculating each correlation coefficient, enter it into the appropriate place in col-
umn 2 of Figure 6.9.
To determine the Pearson r between mood and number of hours of sleep per night, use formula [6.6.1], where X is your daily mood rating and Y is the predictor being correlated with mood (here, hours of sleep).
Please note that for the days of the week, such as Monday, enter Y = 1 only when it is
Monday; enter Y = 0 when it is not Monday, and do the same for the remaining 6 days
(i.e., for Tuesday, Y = 1 and for not Tuesday, Y = 0).
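As a sketch of the Step 3 calculation, the snippet below correlates daily mood ratings with a 0/1 Monday indicator using the standard Pearson r; all mood values are invented for illustration, not taken from the text.

```python
# Hedged sketch: Pearson r between daily mood ratings and a dichotomous
# "Monday" indicator (Y = 1 on Mondays, Y = 0 otherwise), as the project
# instructions describe. The mood data below are hypothetical.
from math import sqrt

def pearson_r(x, y):
    """Computational Pearson r: covariance over the product of SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Two weeks of hypothetical mood ratings (higher = better mood), Monday first.
mood = [3, 5, 6, 4, 7, 8, 6, 2, 5, 6, 5, 7, 8, 7]
monday = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # Y = 1 only on Mondays

r = pearson_r(mood, monday)
print(f"r(mood, Monday) = {r:+.2f}")  # negative here: mood runs lower on Mondays
```

The same function is reused, factor by factor, for each of the remaining predictors in column 2 of Figure 6.9.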
Step 4. You should end up with a correlation between your daily mood ratings and
each of the 16 predictor variables shown in column 1 of Figure 6.9. Be certain that the
correlations are properly entered into the second column of Figure 6.9.
Step 5. Take the estimates of how much you believe each factor predicts mood
from Figure 6.8 (Reviewing the Daily Mood Questionnaires) and write them into column
3 of Figure 6.9.
Step 6. Compute the correlation between columns 2 and 3 of Figure 6.9 (Mood
Rating Worksheets) and record it in the space provided at the bottom of the figure. This
correlation represents your accuracy score. If it falls between +.50 and +1.00, then you are fairly accurate when it comes to judging how each variable affects your mood state. If the correlation is less than +.50, then you are either fairly inaccurate or there are problems with the data (e.g., there was not enough variance in your mood, perhaps due to truncated ranges, during the time you completed the daily questionnaires).
[Figure 6.9 predictor variables, column 1: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, Weather, Physical Health, Relationships, Food, Workload, Amount of Exercise, Amount of Sleep, Other 1, Other 2]
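The Step 6 accuracy score can be sketched as follows; the judged ratings and "actual" correlations below are hypothetical stand-ins for one student's Figure 6.9 worksheet, not data from the study.

```python
# Hedged illustration of Step 6: correlate your judged relationships
# (Figure 6.8 ratings, -3 to +3, column 3) with the actual Pearson r values
# (column 2). All sixteen pairs of numbers are hypothetical.
from math import sqrt

def pearson_r(x, y):
    """Computational Pearson r: covariance over the product of SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

judged = [-2, 0, 0, 0, 1, 2, 1, 1, 2, 3, 0, -2, 2, 3, 0, 0]        # column 3
actual = [-0.41, 0.05, -0.10, 0.02, 0.25, 0.38, 0.20, 0.15,
          0.44, 0.61, -0.05, -0.33, 0.29, 0.52, 0.08, -0.02]        # column 2

accuracy = pearson_r(actual, judged)
print(f"accuracy score = {accuracy:+.2f}")
# A value of +.50 or higher suggests fairly accurate judgments of what
# drives your mood; lower values suggest inaccuracy or data problems.
```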
Step 7. Using the correlational results now written into Figure 6.9, as well as Wilson et al.'s (1982) research (see Data Box 6.A), write a brief explanation wherein you discuss how the correlational results relate to your self-beliefs (e.g., how well do you seem to know what predictor variables influence your moods?). Discuss the role the correlation coefficient plays in this type of research: What are the advantages and disadvantages of correlation as a technique for behavioral scientists?
LOOKING FORWARD, THEN BACK

By far, the Pearson correlation coefficient (r) is the one most frequently encountered in the behavioral science literature. It is also the index of association that you are most likely to use. The first decision tree opening this chapter, however, will help you to think about the association between variables in somewhat broader terms. To be sure, if you are working with interval or ratio-scaled data, then you will be apt to use the Pearson r. If you end up using ranked or ordinal data, however, the Spearman, which the decision tree will guide you to, is available (see chapter 14). Nominal data can also be accommodated.

The second decision tree is designed to help you get your hands a bit dirty with your data. In other words, it is often important to verify that a correlation's numerical value is informative. The second decision tree invites you to check the range of values used to calculate the Pearson: is there some degree of difference or are the observations too close together? Plotting the observations can also reveal the absence of a linear relationship (the only relationship the Pearson r is designed to detect) but the presence of some other (e.g., curvilinear) pattern that could give you empirical pause. Finally, any outlying scores or preexisting subpopulations lurking in the data must be reckoned with, which serves as a reminder that statistical analysis is all about data analysis.

Summary
1. The term "correlation" refers to the association between two variables, that is, how they vary or relate to one another.
2. Because the variables are measured and not manipulated in any experiment, a correlational result is not a causal result. In a correlational relationship, variable X can lead to Y, Y can lead to X, or an unknown third variable Z can mediate the relationship between the two. Correlations can be suggestive, pointing to relationships between variables that are worthy of experimental investigation.
3. The Pearson product-moment correlation coefficient, or Pearson r, is the most commonly used index of correlation. Conceptually, this statistic assesses the degree to which variables X and Y vary together, as well as the degree to which they vary separately from one another.
4. The direction of a correlational relationship is described as being positive, negative, or zero. In a positive relationship, as the value of one variable increases (or decreases), the value of the second variable behaves in a similar fashion. In a negative relationship, the variables' values are inverted: as the value of one increases, the other necessarily decreases. In a zero relationship, no pattern or predictable relationship exists between the variables.
5. Correlational values range from -1.00 to +1.00. The positive and negative signs indicate the direction of the relationship, whereas the strength of association between the variables is known by the correlation's numerical value. Values close to 0.0 indicate that the variables are not related to one another, but as values move away from 0.0, their strength, as well as their predictability, increases.
6. A scatter plot is a graph used to represent correlational data. The points in a scatter plot indicate the intersection of each X value with its corresponding Y value. As points become more linear within the graph, the strength of the correlation will increase.
7. When an r value is squared (i.e., r2), it can be used to learn what proportion of the variance in one variable within a correlation is accounted for by the other variable. This statistic is known as the coefficient of determination. The coefficient of nondetermination, which is equal to 1 - r2, reveals the proportion of changes in one variable that cannot be accounted for by the other variable.
8. Spurious correlations can possess unusually high or low values that disguise any real (or nonreal) effects within a data set. Such correlations occur due to several factors: a truncated range of values in one or both variables, nonlinear relationships in the data, the presence of extreme scores, problems with sample size, and preexisting subgroups within the data.
9. The Pearson r can be used to determine the consistency of a measure (i.e., its reliability), as well as its relationship with other similar and dissimilar instruments (i.e., its validity).
10. The Pearson r is the most popular index for measuring the strength of association between two interval or ratio-scaled variables. There are, however, a variety of other types of correlations that can be used for ordinally or nominally scaled variables.
Key Terms
Alternate form reliability (p. 229)
Coefficient of determination (p. 222)
Coefficient of nondetermination (p. 223)
Correlation (p. 206)
Internal consistency measure of reliability (p. 229)
Item-total reliability (p. 229)
Negative correlation (p. 211)
Odd-even reliability (p. 229)
Pearson product-moment correlation coefficient (p. 207)
Pearson r (p. 209)
Positive correlation (p. 211)
Power analysis (p. 225)
Reliability coefficient (p. 229)
Scatter plot (p. 212)
Split-half reliability (p. 229)
Spurious correlation (p. 224)
Test-retest reliability (p. 229)
Third variable problem (p. 208)
Zero correlation (p. 211)
Chapter Problems
1. Conceptually, correlation assesses the association between two variables. Why isn't the association causal in nature?
2. What is the third variable problem in correlational analyses? Provide a concrete example.
3. Describe the possible directions of relationships between two variables in correlational analyses. What role do positive (+) and negative (-) signs play?
4. What is the possible range of values for the Pearson r? Is an r = +.35 stronger than an r of -.35?
5. Describe a scatter plot showing a positive correlation, a negative correlation, and a zero correlation.
6. Describe a scatter plot of a perfectly linear positive correlation and a perfectly linear negative correlation.
7. In what way is the Pearson r related to z scores? What advantage does the z score provide to the Pearson r?
8. Provide descriptive labels for the magnitudes of the following r values: -.67, +.98, -.03, +.25, -.81, +.79, -.55, +.37.
9. Conceptually define the coefficients of determination and nondetermination. How do these coefficients aid researchers?
10. Provide the coefficient of determination corresponding to each of the r values shown above in problem 8.
11. Provide the coefficient of nondetermination corresponding to each of the r values shown above in problem 8.
12. Some factors lead to abnormally high or low correlational values. Briefly describe three of the factors that can have deleterious effects on correlations.
13. Examine the following set of data:

   X   Y
   4   7
   3   2
   8   5
   7   6
   8   7
   2   5

   a. Draw a scatter plot for the X and Y variables and then describe the relationship, if any, between them.
   b. Calculate a Pearson r between X and Y. Does the r fit the relationship you described in part a?
   c. What are the coefficients of determination and nondetermination for these data?
14. Examine the following set of data:

   X   Y
   1   3
   5   5
   3   2
   4   5
   5   4
   2   3

   d. Draw a scatter plot for the X and Y variables and then describe the relationship, if any, between them.
Chapter Problems 239
   e. Calculate a Pearson r between X and Y. Does the r fit the relationship you described in part d?
   f. What are the coefficients of determination and nondetermination for these data?
15. A health psychologist is interested in the relationship between self-reported stress (X) and objective health ratings (Y). Higher scores on each measure indicate more stress and better health, respectively. Using the following data, determine the nature of this relationship by calculating a Pearson r and then interpreting the result in words.

   X    Y
   2   10
   9    3
  10    4
  11    5
   4   13
   5   11
   1   10
   3    4

16. Read each of the following descriptions of correlations. Evaluate whether each description is a reasonable account of what the correlation means. If the description is not accurate, then suggest how it should be corrected.
   a. An educator correlates the number of course books students purchase in a semester with their grade point averages (GPA). This r is +.56, and the educator concludes that using more books in a given semester leads to higher grades (and vice versa).
   b. A negative correlation exists between number of visits to the student health center per year and GPA. Healthier students appear to have better grades than those who are frequently ill.
   c. GPA in college and scores on the Scholastic Aptitude Test (SAT) are moderately correlated. We cannot conclude, however, that higher scores on the SAT cause better grades later on.
17. How is the Pearson r used to determine the reliability of a measure? What range of r values is appropriate for demonstrating reliability?
18. How is the Pearson r used to determine the validity of a measure? What range of r values is appropriate for demonstrating validity?
19. You are an industrial/organizational psychologist who wants to develop a new measure of job stress for use in corporate settings. How would you go about assessing the reliability of your new measure?
20. You are an educational psychologist who has developed a new intelligence test for use in elementary school grades 1-3. How would you go about assessing the validity of this new intelligence test?
21. The following scores represent a personality test that was given to the same group of participants on two separate occasions. Calculate an r to represent the test's reliability. Does the test appear to be reliable? Why or why not?

   X   Y
   9   9
   8   6
   6   7
   9   8
   7  10
   8   5
   9   9
   3   8

22. Draw a scatter plot of the X and Y data provided in problem 21. Does plotting the data provide any further evidence for the reliability of the measure? Why or why not?
23. Use the decision trees opening this chapter to answer the following questions:
   a. A researcher wants to assess the correlation between temperature and rainfall. Which measure of association is appropriate?
   b. A student wants to determine the correlation between his rankings of national league baseball teams and those of a friend. Which measure of association is appropriate?
   c. An investigator discovers a correlation of +.75 between five pairs of test scores. Is this correlation apt to be reliable? Why?
   d. Before calculating a correlation between two variables, a student notices that one of them is dichotomous: the score for X is either "1" or "2," but Y has a relatively wide range of values. Should the student be concerned? Why?
Deciding to Perform a Linear Regression Analysis

[Decision tree: Are the data organized into pairs of scores (i.e., X, Y)? Do you intend to use one variable (X) to predict another (Y)? If yes, then go to step 5; if no, then go to step 7. Is one variable (Y) a criterion or dependent variable, and the other variable (X) a predictor or independent variable?]
Chapter Outline
usually a scatter plot of points representing hours studied (X) by exam scores (Y). In ef-
fect, this statistical technique seeks to find the best fit for a straight line projected among
the points on the scatter plot, one that "captures" as many of them as possible (if nec-
essary, you may wish to review the discussion of scatter plots introduced in chapter 6 be-
fore proceeding). In the process of doing so, linear regression can provide a viable estimate of each individual's exam score, given the number of hours he or she studied
for the test.
Of course, regression analysis does not guarantee that a given prediction will match the reality imposed by the available data, but it can help us to approximate actual observations, as well as to identify reasons for the error between our estimates and reality. Remember that regression is only one tool that guides us through the process of drawing inferences. It cannot provide absolute or definitive answers. As you will see, regression is a powerful technique that enables us to apply skills and concepts learned from earlier chapters, especially knowledge about correlation and the standardization of z scores, toward developing more precise accounts of human reasoning and behavior.

Correlation = association; regression = prediction.
When rXY is negative, however, the sign of zY will be opposite that of zX; low scores will be associated with high scores and high scores with low scores (see [7.1.1]).

The second point regarding the z score equation for regression is that when rXY = ±1.00, zY will have the same score as zX. As we know, of course, such perfect correlation is rare in behavioral data. Thus, when rXY < ±1.00, zY will be closer to 0.0 than zX. Any z score that approaches 0.0 is based on a raw score that is close to a distribution's mean, implying that a predicted score will always be closer to a mean than the raw score on which the prediction was based. The most extreme example in this vein occurs when rXY = 0.0. When this happens, zX is multiplied by 0.0 and zY becomes equal to 0.0, the mean of the z distribution (see [7.1.1]).

When two variables are uncorrelated with one another, the best predictor of any individual score on one of the variables is the mean. The mean is the predicted value of X or Y when the correlation between these variables is 0.
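The shrinkage of predicted z scores toward 0.0 described above can be checked numerically; the r values in this sketch are assumed for illustration, not taken from the text.

```python
# Sketch of the z score regression equation [7.1.1]: predicted z_Y = r * z_X.
# The correlation values below are assumed purely for illustration.
def predicted_zy(r, zx):
    """Predicted z score on Y for a given z score on X."""
    return r * zx

zx = 2.0  # a raw score 2 standard deviations above the mean of X

print(predicted_zy(1.0, zx))   # perfect correlation: prediction equals z_X
print(predicted_zy(0.5, zx))   # |r| < 1: prediction shrinks toward 0.0 (the mean)
print(predicted_zy(0.0, zx))   # r = 0: the mean itself is the best prediction
print(predicted_zy(-0.5, zx))  # negative r: the predicted sign reverses
```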
Computational Approaches to Regression
Linear relationships between variables X and Y can also be represented by this well-
known computational equation:
[7.2.1] Y = a + b(X),
where Y is the criterion variable-what we are trying to predict-and a and b are con-
stants with fixed values. Variable X, of course, is the predictor variable that can take on
different values. As in correlational analyses, variables X and Y are pairs of scores that
will vary or change from individual to individual within a given data set, whereas the val-
ues of a and b will not change.
Some of you may remember having seen this formula (or a variation) earlier in your educational career, where it was probably used as an equation for plotting points into a straight line. If so, then you will remember that b is also called the slope of the line, the purpose of which is to link Y values to X values. The slope of any straight line defines its angle in space. Conceptually, the slope of a line represents changes occurring in Y that take place when X is changed 1 unit, or

[7.3.1] b = change in Y / change in X.

Y = a + b(X) is the formula for a straight line.
[7.4.1] Y = 20 + 5(X).

We can compute Y values using [7.4.1] and then plot them to confirm that they fall into a straight line. Let's assume that one research participant spends 3.0 minutes (i.e., X = 3.0) on the procrastination task. Her score would be,

[7.4.2] Y = 20 + 5(3.0),
[7.4.3] Y = 20 + 15,
[7.4.4] Y = 35.

A second participant spends 5.5 minutes on the task. His procrastination score is:

[7.4.5] Y = 20 + 5(5.5),
[7.4.6] Y = 20 + 27.5,
[7.4.7] Y = 47.5.

What if a third participant took 2 minutes to complete the task? What is the value of this individual's Y? Use the formula shown in [7.4.1] and calculate the Y value now.
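The worked examples above can be reproduced in a few lines; this sketch simply evaluates the straight-line formula Y = 20 + 5(X) for the three participants.

```python
# The text's procrastination example in code: a criterion score Y predicted
# from minutes on the behavioral task via Y = a + b(X), with a = 20 and b = 5.
def predict_score(minutes, a=20.0, b=5.0):
    """Straight-line prediction Y = a + b(X)."""
    return a + b * minutes

for x in (3.0, 5.5, 2.0):
    print(f"X = {x:>4} minutes -> Y = {predict_score(x)}")
# X = 3.0 gives 35.0, X = 5.5 gives 47.5, and X = 2.0 gives 30.0,
# matching the calculations shown in formulas [7.4.2] through [7.4.7].
```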
Because the slope of the equation (b) is positive, we expect a graph of the computed
criterion scores (Y) plotted with the predictor scores (X) to indicate a direct relation-
ship. Figure 7.1 illustrates the relationship between scores on the procrastination task
and number of minutes spent performing its behavioral component. As you can see, the
plotted data reveal a straight line, which begins at the Y intercept of a = 20. Did you cal-
culate the value for Y when X is equal to 2? Note that when X = 2.0 minutes, Y = 30 (see
Figure 7.1).
As seen here-and as we learned in chapter 6-a straight line is often an excellent
way to portray the relationship between two variables. Regression analysis effectively
tries to determine the best fitting straight line that can be projected among a set of points
for any set of data. This line is referred to as the regression line.
KEY TERM   A regression line is a straight line projecting through a given set of data, one designed to represent the best fitting linear relationship between variables X and Y.
What is meant by a "best fitting" line? We must first recognize that scatter plots of
data points rarely look as organized and linear as those plotted in Figure 7.1. In fact, be-
havioral data are typically not very neat and tidy, let alone linearly predictable. Indeed, if
we actually had a scatter plot of points that fell into perfect alignment with one another,
we would know that the correlation between X and Y was either +1.00, -1.00, or 0!
Generally, scatter plots of data look like those shown back in chapter 6 (e.g., see
Figure 6.2 on page 214)-there are many points in a graph's space that can suggest a
positive or a negative pattern, but they rarely fall into an orderly, let alone straight,
shape or pattern. Of course, a straight line can be drawn among plotted points that
Simple Linear Regression 245
[Figure 7.1: Procrastination Scores as a Function of Time Spent Performing Behavioral Task (in Minutes). The y-axis shows procrastination scores (30 to 80); the x-axis shows minutes spent on the behavioral task (0 to 6).]
Note: Higher scores indicate a greater predisposition for procrastination. The plotted points were determined using the formula Y = 20 + 5(X), where Y is the score on a behavioral procrastination measure and X is the minutes spent performing the procrastination task. As shown above, the formula reveals a linear relationship between X and Y.
are either close together or very far apart from one another, and such a line can be
drawn by using our own best guess or (now) by haphazardly using the formula Y =
a + b(X). Still, both of these methods lack precision-one is too intuitive, the other
somewhat unwieldy. A preferable alternative, of course, is finding a way to consistently
determine a best fitting straight line tailored to a given set of data.
[Figure 7.2: Hypothetical Scatter Plot of Scores with Accompanying Regression Line, Ŷ = a + b(X). A broken line marks the distance between one point's Y and Ŷ; this distance also represents error in measurement.]
This formula, Σ(Y − Ŷ)², calculates the total squared difference between the actual values for Y and the predicted values for Y, and a better fitting straight line will result when Ŷ ≅ Y. In the meantime, we must recast the formula for a straight line as:

[7.6.1] Ŷ = a + b(X).
When we say the "distance" between Y and Ŷ, what do we mean exactly? Figure 7.2 shows a hypothetical scatter plot of points with a regression line drawn among them. As you can see, one X, Y data point falling above the line has been highlighted. The broken line drawn between this point and the regression line represents the distance between an actual Y value and the value predicted by the regression line (i.e., Ŷ). The distance or difference between these two values represents error, or how measured reality differs from a prediction, a topic we will explore in the next section of the chapter. Obviously, a desirable state of affairs entails minimizing error or, put another way, having predicted Ŷ values that closely match actual Y values. Some of the distances between Y and Ŷ will be positive, others negative; by squaring these distance values we ensure that we have a positive measure of error, one that can be used for subsequent analytic purposes. (Recall that in chapter 4, we learned that such deviation values must be squared, otherwise the positive and negative values will sum to 0.)
Now that we will rely on revised regression formula [7.6.1] for a best fitting line, we need to learn how to determine specific numerical values for a and b within the formula. The mathematics involved is actually rather complicated, providing much more detail than we need for present purposes. Fortunately, however, the results of the mathematics can be presented in familiar terms we can both understand and use. Let's begin with b, which is equal to:

[7.7.1] b = r(sY/sX).

That is, the slope of the regression line can be determined by multiplying the correlation between X and Y by the quotient of the standard deviation of Y divided by the standard deviation of X.

The value of the intercept, a, in turn, can be determined by:

[7.8.1] a = Ȳ − bX̄.
Table 7.1  Scores on a Practice Aptitude Test (X) and the Actual Aptitude Test (Y)

Participant    X     Y
    1         75    82
    2         83    86
    3         92    88
    4         95    98
    5         74    75
    6         80    86

X̄ = 83.17    Ȳ = 85.83
sX = 8.70    sY = 7.55
rXY = .8925 ≅ .89
N = 6
Indeed, this formula produces the same result as formula [7.6.1], but it does so in a way
that will probably make more sense to you as we work through an example.
Table 7.1 presents X and Y scores for six people. The X scores represent performance on a practice aptitude test and the Y values are scores on the actual aptitude test given a week later. Can we use X to predict Y? Take a few moments and look over the data in the table. The respective means and standard deviations for X and Y are provided here for you, and you will recall how these values were calculated from chapter 4.

As shown in Table 7.1, participant 2 scored an 83 on the practice aptitude test. Using the information available in Table 7.1, we can see what score he is predicted to receive (Ŷ) and then compare it with the score that we know he did receive (Y = 86). We simply enter the relevant information from Table 7.1 into formula [7.9.1], or:

A rule of thumb for selecting r(sX/sY) or r(sY/sX) for the raw score regression formula: the standard deviation for the variable you wish to predict is the numerator and the standard deviation for the predictor variable is in the denominator.
practice test: what is her predicted score on the actual test? We once again use formula [7.9.1], entering the necessary values from Table 7.1 into:

[7.10.2] b = +.89(7.55/8.70),
[7.10.3] b = +.89(.8678),
[7.10.4] b = +.7723 ≅ +.77.

[7.11.1] a = Ȳ − bX̄,
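Formulas [7.7.1] and [7.8.1] can be checked against the Table 7.1 data; this sketch recomputes r, b, and a from the raw scores (using sample standard deviations, as chapter 4 does) and then predicts participant 2's actual test score.

```python
# Sketch: slope b = r(sY/sX) and intercept a = mean(Y) - b * mean(X),
# computed from the raw Table 7.1 scores (practice test X, actual test Y).
from math import sqrt

X = [75, 83, 92, 95, 74, 80]  # practice aptitude test
Y = [82, 86, 88, 98, 75, 86]  # actual aptitude test

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sx = sqrt(sum((x - mx) ** 2 for x in X) / (n - 1))  # sample standard deviations
sy = sqrt(sum((y - my) ** 2 for y in Y) / (n - 1))
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / ((n - 1) * sx * sy)

b = r * (sy / sx)  # slope, formula [7.7.1]
a = my - b * mx    # intercept, formula [7.8.1]

print(f"r = {r:.4f}, b = {b:.2f}, a = {a:.2f}")
# Participant 2 (X = 83): predicted score is a + b(83)
print(f"predicted Y = {a + b * 83:.2f} (actual Y = 86)")
```

The computed r of about .89 and b of about +.77 match the values the text derives in [7.10.2] through [7.10.4].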
Y Can Also Predict X.  As you no doubt recall from our discussion of correlation in chapter 6, the relationship between variables X and Y is a reciprocal one. Thus, in regression analysis, X can be used to predict Y, but X can also be predicted from Y (i.e., X becomes X̂). The formula for predicting X is:

[7.13.1] X̂ = X̄ + r(sX/sY)(Y − Ȳ).

Besides the fact that X is now being predicted instead of Y, the main differences between this formula and [7.9.1] are that the standard deviation of X is being divided by the standard deviation of Y, and that Y and Ȳ have been substituted for the respective complements of X. Like correlation, regression is flexible, so that researchers need not necessarily always denote the independent variable as X and the dependent variable as Y.

One final point about regression lines and the calculation of Ŷ or X̂ values. You should avoid calculating any predicted values for Ŷ or X̂ that do not fall within the range of the original data. Why not? Simply because we do not know the quality of the relationship between pairs of X and Y values beyond their known boundaries. The X values in Table 7.1 range between 75 and 95, but it would not be appropriate to calculate Ŷ values for X = 62 or X = 121, for example, as both fall outside the data set's boundaries.

X can predict Y or Y can predict X.

[Figure 7.3: Scatter Plot with Regression Line for Practice and Actual Aptitude Test Scores; both axes run from 0 to 100.]
Note: The line projected through the above points is the best fitting straight (regression) line. Predicted values for Ŷ fall onto the line. The points shown around the line are the actual values for Y. As long as the correlation between X and Y is less than ±1.00, there will be some degree of error between the predicted and actual values of Y.
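Formula [7.13.1] can be sketched with the Table 7.1 summary statistics; the Y value of 86 below is chosen purely for illustration.

```python
# Sketch of formula [7.13.1], predicting X from Y:
#   X-hat = mean(X) + r(sX/sY)(Y - mean(Y)),
# using the Table 7.1 summary statistics quoted in the text.
mx, my = 83.17, 85.83   # means of X and Y
sx, sy = 8.70, 7.55     # standard deviations of X and Y
r = 0.89                # correlation between X and Y

def predict_x(y):
    """X-hat = mean(X) + r(sX/sY)(Y - mean(Y))."""
    return mx + r * (sx / sy) * (y - my)

# Predict the practice-test score for someone who scored 86 on the actual test.
print(f"X-hat = {predict_x(86):.2f}")
# Note: avoid predicting for Y values outside the observed range of the data.
```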
Knowledge Base

1. Within this equation, Y = 10 − 4.5(X), what are the respective values of the intercept and the slope of the line?
2. Using the equation Y = 20 − 2.8(X), calculate Y when X = 3, 5, 6, 8.
3. What does Σ(Y − Ŷ)² represent?
4. True or False: As r approaches −1.00, the amount of error in a regression analysis will increase.

Answers

1. Intercept = 10; slope = −4.5.
2. The values of Y are 11.6, 6.0, 3.2, and −2.4.
250 Chapter 7 Linear Regression
3. Error in the prediction of Y, or the sum of the squared differences between the predicted and
actual values of Y.
4. False: As r approaches −1.00 or +1.00, the amount of error decreases (i.e., Ŷ will increasingly approximate Y). Error increases as r gets closer to 0.0.
Can a linear regression model be used to predict academic success? Dawes (1979) describes how
variables such as grade point average (GPA), scores on the Graduate Record Exam (GRE), and
ratings of undergraduate institutions' selectivity can be used to develop formulas for graduate ad-
missions committees. Such formulas determine which applicants are apt to succeed in a given
graduate program. As Dawes and Corrigan (1974, p. 105) put it, "the whole trick is to know what
variables to look at and then to know how to add." Developing a useful regression equation involves
a bit more work than this wry observation suggests, but the point is well taken.
Humor aside, however, many people become upset at the thought that a mere formula rather
than a flesh-and-blood judge or committee decides who is (or is not) granted admission to a grad-
uate program. Are such reactions justified, especially if the method proves to be successful, that is,
accurate in its predictions? Knowing what you now know about linear regression, for example,
how would you feel about your future being decided by the outcome of an equation? Is it more or
less fair than the decision reached by a committee evaluating the same information, which the lat-
ter would do in a much less formal and systematic manner?
Dawes (1979) persuasively argues that linear models are better at making important judg-
ments in a fair and decisive manner because they are more objective and technically superior to
human judges. Many people-including some behavioral scientists-are loath to abdicate their
right to admit students on subjective grounds, however. There are also a variety of psychological
arguments against linear models, notably the persistent, if erroneous, belief that clinical intuition
regarding admission decisions will be correct with greater regularity (see Data Box 7.B). Finally,
Dawes addresses the ethical implications of relying on linear models. Is making an important
choice about graduate admissions-or any other decision forum-"dehumanizing" if no inter-
view occurs? Dawes (1979, pp. 580-581) addresses this matter cogently.
I think that the question of whether people are treated in a fair manner has more to do
with the question of whether or not they have been dehumanized than does the question
of whether the treatment is face to face. (Some of the worst doctors spend a great deal of
time conversing with their patients, read no medical journals, order few or no tests, and
grieve at funerals.) A GPA represents 3½ years of behavior on the part of the applicant.
(Surely, not all the professors are biased against his or her particular form of creativity.)
The GRE is a more carefully devised test. Do we really believe that we can do a better or a
fairer job by a 10-minute folder evaluation or a half-hour interview than is done by these
two numbers? Such cognitive conceit (Dawes, 1976, p. 7) is unethical, especially given the
fact of no evidence whatsoever indicating we can do a better job than does the linear
equation. (And even making exceptions must be done with extreme care if it is to be eth-
ical, for if we admit someone with a low linear score on the basis that he or she has some
special talent, we are automatically rejecting someone with a higher score, who might
well have had an equally impressive talent had we taken the trouble to evaluate it.)
Before you decide that admitting people based on "mere numbers" is too extreme, take a
few minutes and think about Dawes's comments and their implications for the use of regression in
decision making (see also, Dawes, 1971, 1975).
Residual Variation and the Standard Error of Estimate
The residual variance can be determined from the sum of squares of the distance between actual Y values and predicted Y values (i.e., Ŷ), which we introduced previously in [7.5.1] as Σ(Y - Ŷ)². The symbol for the residual variance is s²est Y and, in formula terms, it looks like this:

[7.14.1] s²est Y = Σ(Y - Ŷ)² / (N - 2).
The standard error of the estimate is similar to the standard deviation, as both measures
provide a standardized indication of how close or far away observations lie from a cer-
tain point-the regression line in the case of the standard error of the estimate and a
mean in the case of the standard deviation. The standard error of the estimate, symbolized sest Y, can be conceptually presented as the square root of the residual variance. In computational terms, it can be written using SY and r:

[7.16.1] sest Y = SY √[N(1 - r²)/(N - 2)].

Take a good look at [7.16.1] and then think about these properties. First, when r is
equal to -1.00 or + 1.00 (or very close to either value), then Sest Y will be equal to 0.0. In
other words, there are no predictive errors because the regression line perfectly matches
up to the actual values of Y. As the value of r draws closer to 0.0, however, then the cor-
responding value of sest Y will increase (i.e., error will increase as variables X and Y are
shown to be less associated with one another). In fact, when r = 0.0, errors in prediction
will be at their maximum for a given distribution of scores, drawing close to the value of
the standard deviation for Y, or Sy. In symbolic terms,
[7.17.1] sest Y = SY √[N/(N - 2)].

As the sample size becomes increasingly large, sest Y → SY because √[N/(N - 2)] will be
approximately equal to 1.0.
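The convergence of sest Y on SY can be checked numerically. This brief Python sketch (illustrative only; the function name is ours) evaluates the correction factor √[N/(N - 2)] from [7.17.1] for increasing N and shows it drawing toward 1.0:

```python
# The factor relating s_est_Y to S_Y when r = 0, from [7.17.1].
import math

def correction_factor(n):
    """Return sqrt(N / (N - 2)); approaches 1.0 as N grows."""
    return math.sqrt(n / (n - 2))

for n in (10, 100, 1000):
    print(n, round(correction_factor(n), 4))
```

With N = 10 the factor is about 1.118; by N = 1,000 it is about 1.001, so sest Y and SY become nearly interchangeable in very large samples.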
What is the standard error of the estimate for the aptitude test data presented ear-
lier in the chapter? We need only to turn back to Table 7.1 to collect the information we
need to enter into [7.16.1]:

[7.18.3] sest Y = (7.55)√[6(1 - .7921)/4],

[7.18.4] sest Y = (7.55)√[6(.2079)/4].
Thus, the standard error of estimate for Y, the actual aptitude test scores, is equal to
4.22. Similar to the standard deviation, the standard error of estimate indicates that
there is some dispersion of Y scores around the regression line shown in Figure 7.3. As
the standard error of estimate increases in magnitude, this statistic indicates a relatively
greater amount of dispersion around the regression line. Further, we now know more
about a prediction than simply where a regression line is plotted-we also have a sense
of how well it captures the observations in a set of data.
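Readers who want to verify the 4.22 result can do so in a few lines. The Python sketch below is purely illustrative (the function name is ours); it plugs the chapter's values SY = 7.55, r = +.89, and N = 6 into [7.16.1]:

```python
# Standard error of estimate from [7.16.1]: s_est_Y = S_Y * sqrt(N(1 - r^2)/(N - 2)).
import math

def standard_error_of_estimate(s_y, r, n):
    """Return s_est_Y given the standard deviation of Y, the correlation, and N."""
    return s_y * math.sqrt(n * (1 - r ** 2) / (n - 2))

s_est = standard_error_of_estimate(7.55, 0.89, 6)
print(round(s_est, 2))  # 4.22, matching the value reported in the text
```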
A long-running debate in the discipline of psychology concerns the proper role, if any, of statistical formulas for use in diagnosing and predicting psychological disorders or making other clinically related decisions (e.g., Meehl, 1977). These statistical, sometimes called "actuarial," methods do not involve human judges; rather, such judges identify diagnostic criteria for inclusion in the formulas. The term actuarial means computed statistics, which are usually based on large samples of people. Clinical conclusions based on such formulas, then, focus entirely on empirically valid relations between predictor variables and outcomes. In contrast, clinical judgments rely entirely on human judges (e.g., psychiatrists) and their intuitions. Such human judges must carefully observe, collect, and codify clinically related information in their heads to diagnose clients.
Close to 100 behavioral science studies on various topics indicate that where accuracy is concerned, statistical predictions are equal to or better than clinical judgments made by humans (see Dawes, Faust, & Meehl, 1989). The fact is that statistical predictions remain superior even when human judges have access to all the available actuarial information (e.g., diagnostic test scores, personality inventory results) that is entered into the formula. Such data indicate rather conclusively that our intuitions are not nearly as good as formal statistical models, which raises an important question: Why don't we rely on these models consistently?
Presumably, people do not like the idea of allowing a nonhuman entity to make decisions about the fate or treatment of people. More than this understandable concern, of course, may be another human concern: taking the role of decision-maker away from human experts (psychiatrists, psychologists, and social workers, among others) who have years of training and vested interests in making clinical decisions. A way out, of course, is not to assume that the data indicate human judges do a mediocre job compared to actuarial formulas; rather, we should focus on what a difficult job it is to assess a variety of variables in order to reach a decision (Nisbett & Ross, 1980). Human experts should be lauded for developing accurate formulas and for subsequently treating people based on a formula's diagnosis. After all, what shame can there be in using a tool that is consistently shown to actually help people?
Research continues to demonstrate the actuarial method's superiority to the clinical method (for a review, see Dawes, Faust, & Meehl, 1989; see also Chapman & Chapman, 1971; Dawes, 1979), yet many people cannot abide allowing a decision to be made by formulas, not people, even when the former consistently matches or outperforms the latter (see also Goldberg, 1970). What do you think about the trade-off between accuracy and who (or in this case, what) is making a decision? If a formula consistently and correctly diagnoses clinical conditions or points to appropriate decisions in other realms, should it be favored over the (sometimes flawed) opinions of human judges? Or are such formulas admittedly useful tools that must still be necessarily displaced by the final judgment of human experts?
Assumptions Underlying the Standard Error of Estimate
There are a few important assumptions underlying the use of the standard error of
estimate and, more generally, reliance on regression analysis. First, although any stan-
dard error of estimate you calculate will be based on sample data, it is meant to rep-
resent the larger population from which the sample data were drawn. In other words,
we assume that the best fitting regression line created for one set of data, as well as its
standard error of estimate, is theoretically applicable to other, similar sets of data
Figure 7.4 Scatter Plot of Hypothetical Data Illustrating the Assumption of Homoscedasticity
collected at other times and based on the responses of other participants. Indeed, one
motivation for researchers is to develop regression equations that become increas-
ingly predictive as they are applied to various data sets (and such equations are fre-
quently revised based on new data). A college or university might create a regression
model-a term often used to describe an equation that is repeatedly tested and re-
vised-for predicting the qualifications students need to academically succeed at the
institution.
Second, as was the case for correlational analysis, variables X and Y presumably
share a linear or straight-line relationship with one another.
Third, we assume that in theory, every X value has a possible distribution of Y values associated with it; Ŷ is equal to the mean of each distribution of Y values, which vary around it. The standard deviation of each of these distributions is said to be equal to sest Y. The spread of Y values around Ŷ values should be the same up and down a regression line, then, meaning that sest Y will have the same value for each and every predicted (Ŷ) value. Put another way, the distribution of Y scores for each X value will have the same variance. This particular condition is called homoscedasticity, which basically means that the variability associated with one variable (Y) remains constant at all of the levels of the other variable (X).
Essentially, it is as if small but relatively normal distributions of scores occur at regular intervals all along the regression line. Figure 7.4 portrays some hypothetical data that adhere to the assumption of homoscedasticity of variance. As you can see, various small distributions of Y values appear to "line up" in a regular fashion above each value of X. A scatter plot of data violating the homoscedasticity assumption would show irregular spreadings of Y values for each value of X along a regression line; some of these Y distributions would contain closely aligned Y scores whereas others would illustrate scores widely apart from one another. This condition, which is the opposite of homoscedasticity, is referred to as heteroscedasticity. When heteroscedasticity occurs, the homoscedasticity assumption is violated precisely because the value of sest Y varies from distribution to distribution along the regression line.

Heteroscedasticity is the opposite of homoscedasticity. It refers to the condition where Y observations vary in differing amounts at different levels of X.
When homoscedasticity is present and each of the distributions of Y corresponding to some value of X is normal (i.e., Y is normally distributed at every level of X), the standard error of estimate is known to share some characteristics with the standard normal distribution.
Figure 7.6 Standard Error of Estimate with Assumptions of Homoscedasticity and Normal Distributions of Y at Every Level of X Being Met (68.26% of actual Y scores fall within ±1 sest Y of Ŷ; 95.44% fall within ±2 sest Y.)
Note: Approximately 68.3% of the Y scores fall within ±1 sest Y of Ŷ. This condition will hold true as long as the distribution of Y scores at every score of X is approximately normally distributed and all the distributions have the same pattern of dispersion (i.e., homoscedasticity).
Figure 7.6 nicely illustrates these two assumptions and their theoretical effects on the standard error of estimate. The mean of each of the theoretical distributions along the regression line (i.e., the middle line in Figure 7.6) is equal to Ŷ, and the boundaries for ±1 sest Y are shown to the left and right of this point (see Figure 7.6).

Once the total sum of squares (SStot) is known, the variance and standard deviation for Y are easily known (i.e., SStot/N and √(SStot/N), respectively; see chapter 4).
In mathematical terms, we can show that Total sum of squares = Unexplained variation in Y (i.e., error sum of squares) + Explained variation in Y (i.e., regression sum of squares), or:

[7.19.1] SStot = SSunexplained + SSexplained.

Though it is time consuming, it is entirely possible to take a data set and calculate each of these different variances in order to demonstrate that the above relations are true.
",; Table 7.2 displays the partitioned variance for the practice and aptitude test data from
Table 7.1. As shown in Table 7.2, the explained and unexplained variation do sum to the
!
total sum of squares (SStot). Please note that the SStot and the sum of the explained vari-
: ation and the unexplained variation are approximately equal to one another due to
! rounding (see Table 7.2). Such minor rounding error illustrates the real nature of data
r
analysis, that calculations do not always add up perfecdy. As long as the values involved
I'
are relatively close to one another, such approximations are fine.
Is there an easier way to conceptualize the explained variation versus the unex-
plained variation in a regression analysis? Yes, indeed, and instead of actually perform-
ing calculations like the ones shown in Table 7.2, we can rely on the coefficients of de-
termination and nondetermination to help us achieve the same end.
Table 7.2

                         Explained       Unexplained     Total
                         Variation       Variation       Variation
X     Y     Ŷ            (Ŷ - Ȳ)²        (Y - Ŷ)²        (Y - Ȳ)²
75    82    79.52        39.82           6.15            14.67
83    86    85.70        0.017           0.09            0.029
92    88    92.65        46.51           21.62           4.71
95    98    94.97        83.54           9.18            148.11
74    75    78.75        50.13           14.06           117.29
80    86    83.38        6.00            6.86            0.029
Sum of Squares           226.02          57.96           284.84
[7.20.1] r² = Regression sum of squares / Total sum of squares = Σ(Ŷ - Ȳ)² / Σ(Y - Ȳ)².
We need not do all these calculations if we already know the correlation between
variables X and Y. Thus, if we know that the correlation between the practice aptitude
test scores and the actual aptitude test scores in the earlier example was + .89, then r2 =
(r)2 = (+.89)2 = .7921. In other words, about 79% of the variance in the actual aptitude
test can be predicted from the practice exam. We know this to be true because r2 is the
proportion of variance in Y accounted for by X.
Naturally, we can just as easily and quickly determine the variation in Y not accounted for by X by calculating k, the coefficient of nondetermination. In regression terms, then:

[7.21.1] k = 1 - r² = Error sum of squares / Total sum of squares = Σ(Y - Ŷ)² / Σ(Y - Ȳ)².
As was true for the coefficient of determination, we do not need to perform labori-
ous calculations to determine k if we know the correlation between X and Y or the value
of r². Because we already know the value of r², we need only to subtract it from 1, or k = 1 - r² = 1 - .7921 = .2079 ≈ .21. Thus, about 21% of the variation in the actual aptitude test scores cannot be predicted from the practice exam scores (i.e., other unmeasured factors, systematic as well as random, are responsible).
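Both coefficients follow from r alone, as this tiny illustrative Python sketch confirms for the aptitude example:

```python
# Coefficients of determination (r^2) and nondetermination (k = 1 - r^2)
# for the aptitude-test example, where r = +.89.
r = 0.89
r_squared = r ** 2      # proportion of variance in Y explained by X
k = 1 - r_squared       # proportion of variance in Y not explained by X
print(round(r_squared, 4), round(k, 4))  # 0.7921 0.2079
```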
Knowledge Base
1. True or False: When performing a regression analysis, investigators generally want a
larger standard error of estimate (sest Y).
2. Define homoscedasticity.
3. The total sum of squares (SStot) can be partitioned into two sources of variation.
What are these sources called?
4. The correlation between the X and Y variables in a regression analysis is +.55. What
percentage of the variability in Y is explained by changes in X? What percentage of
variability in Y is not accounted for by changes in X?
Answers
1. False: Investigators generally want a smaller standard error of estimate, as it involves less dis-
persion of Y scores around the regression line.
2. Homoscedasticity indicates that the variability associated with one variable (Y) remains the
same or constant at all levels of another variable (X) all along the regression line.
3. Sum of squares for explained variance in Y and the sum of squares for unexplained variance
in Y.
4. r² = (+.55)² = .3025 ≈ .30, or 30% of the variability in Y is explained by X; k = 1 - r² = 1 - .3025 = .6975 ≈ .70, or 70% of the variability in Y is not explained by X.
Regression to the Mean
KEY TERM Regression toward the mean refers to situations where initially high or low observations are found
to move closer to or "regress toward" their mean after subsequent measurement.
Do you recall the tall parent and child example presented at the opening of the
chapter? One reason we might anticipate two tall parents to have somewhat shorter chil-
dren is regression to the mean. Why? As you know, regression analysis is based on the ex-
amination of relative deviation scores from a mean, the point in any distribution that is
closest to the largest number of scores. This is not to say that our hypothetical parents
will absolutely have a shorter child, rather, any offspring are simply more likely to be
shorter (i.e., closer to a mean height) than their parents. Note that this regressive predic-
tion is an educated guess-sometimes taller parents do have an equally tall (or even
taller) child, but the best prediction is to assume that the child will be somewhat shorter
(i.e., closer to the mean) in height.
Why does regression to the mean occur? To begin, we know observations in any dis-
tribution tend to cluster around a mean. If variables X and Y are more or less independent of one another (i.e., rXY ≈ 0.0), then some outlying score on one variable is likely
to be associated with either a high or a low score on the other variable (recall the earlier
review of the z score formula for regression). More to the point, though, if we obtain an
extreme score on X, the corresponding Y score is likely to regress toward the mean of Y.
If, however, X and Y are highly correlated with one another (i.e., rXY ≈ ±1.00), then an
extreme score on X is likely to be associated with an extreme score on Y, and regression
to the mean will probably not occur. Regression to the mean, then, can explain why an
unexpected or aberrant performance on one exam does not mean subsequent performance will be equally outstanding or disastrous.
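A small simulation makes the point concrete. The Python sketch below uses hypothetical standardized scores, not data from the chapter: it generates X and Y correlated at r = .50, selects the cases most extreme on X, and shows that their mean Y sits closer to the mean of Y (zero) than their mean X sits to the mean of X:

```python
# Simulate regression toward the mean with standardized (z) scores.
import math
import random

random.seed(42)
r = 0.50  # assumed correlation between X and Y
pairs = []
for _ in range(10_000):
    zx = random.gauss(0, 1)
    # Construct Y so that corr(X, Y) is approximately r.
    zy = r * zx + math.sqrt(1 - r ** 2) * random.gauss(0, 1)
    pairs.append((zx, zy))

pairs.sort(reverse=True)        # order cases by X, largest first
top = pairs[:1000]              # the 10% most extreme cases on X
mean_x = sum(x for x, _ in top) / len(top)
mean_y = sum(y for _, y in top) / len(top)
print(round(mean_x, 2), round(mean_y, 2))
```

For cases selected because X is high, the mean Y is still above zero but markedly smaller than the mean X (roughly r times it): the selected group "regresses" toward the mean of Y.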
Regression toward the mean can be a particular problem for research, especially
when participants are recruited precisely because they appear to possess relatively high
or low degrees of some trait. For this reason, Cook and Campbell (1979) list regression
toward the mean as a potential threat to the internal validity of experiments or quasi-
experiments. That is, an unknown regression effect can mask the effect (or noneffect) of
an independent variable on some dependent measure.
Regression effects are especially apt to occur in studies that recruit participants
exhibiting extreme qualities-very high or very low-on dimensions critical to the re-
search hypothesis. Studies concerned with the effects of intensive remedial education,
for example, involve students whose academic performance is particularly low on
standard grading scales (e.g., below a GPA of 2.00). Regression to the mean becomes a
problem if these low performing students are recruited based on poor performance on
Regression to the mean is often an elusive phenomenon, one that can lead to errors in judgment.
Can it also hamper insight into the human condition? Kahneman and Tversky (1973) offered
the following problem, which was based on an actual experience. Read the following problem and
formulate your answer before consulting the explanation that follows:
A Problem of Training
A group of graduate students was asked to explain the performance of the pilots. Although
they suggested a number of possible explanations (e.g., verbal reinforcement is ineffective for
some pilots, the instructors' observations were biased), not one nominated regression to the
mean. Kahneman and Tversky (1973, p. 251) remarked that, "Regression is inevitable in flight ma-
neuvers because performance is not perfectly reliable and progress between successive maneuvers
is slow. Hence, pilots who did exceptionally well on one trial are likely to deteriorate on the next,
regardless of the flight instructors' reaction to the initial success." They also waggishly note that
even well-trained students can be thrown by regression examples that do not involve descriptions
of tall parents and their children, a comment that should motivate all of us to be on the lookout
for regression to the mean effects with greater vigilance!
Oddly, perhaps, this real-life example of regression to the mean actually illustrates how
teachers can be led to see punishment as being a more efficacious course for learning novel skills
than reinforcement; effects of the former are overestimated while the latter's impact is underesti-
mated. The danger, of course, is that we can generalize from experiences where rewards and pun-
ishments appear to switch places (Tversky & Kahneman, 1973). Due to regression to the mean, we
may see evidence that behavior is most likely to get better after punishment and to decline after
praise or reward (perhaps this is one reason that people who work hard and well are so often
rewarded with still more work). Like Tversky and Kahneman (1973), do you ever wonder if the
chance elements in human experience often lead to rewards for those who punish others and to
punishment for those who reward them?
initial (screening) measures, then subjected to remediation, and then tested again.
Many of the recruits might demonstrate a rebound in performance on subsequent
testing. Such improvement might not be due to remediation but, instead, to statistical
regression-the students are simply returning to their own average level of perfor-
mance. Any cited improvement, then, might be artificial, but unless regression is con-
trolled for in the research design (e.g., adequate control groups are used), the investi-
gators might conclude that intensive remediation caused the scores on the subsequent
test to rise.
Cook and Campbell (1979) suggest that extreme performance in any domain is
anomalous because it is influenced by chance factors. As time passes and new but simi-
lar events occur, people will return to their more typical or mean levels of performance.
And you should not assume that regression toward the mean only occurs in testing-
related environments. A variety of everyday events can probably accurately be described
Regression as a Research Tool
as regression phenomena, though we are more likely to seek special explanations for
them. Plous (1993) cites some superstitious behavior in the context of sport as reactions
to regression phenomena, as when a baseball team is having a winning streak and a
player elects to wear the same clothing for weeks running so as not to "disrupt" it. In fact,
almost any extreme run of events-sudden changes in crime rates, heavy weather for
weeks straight, a dramatic rise in demand for a good or service-is quite likely to be an
instance of regression to the mean. Nisbett and Ross (1980) note that one discomfiting
aspect of implementing actions to address these problems is the intervention's illusory
impact-the return to "normal" or average events is regression to the mean, not entirely
the results of one's good work or intentions.
Under what circumstances does regression to the mean dissipate? Regression effects
are less likely to pose any problems for research based on a true experiment, one where
participants are randomly assigned to two or more levels of an independent variable.
Within a true experiment, regression to the mean is assumed to affect all participant
groups equally-that is, relatively-so that the effect is spread across conditions and
with no adverse effects.
The utility of regression analysis as a research tool can be most easily understood by
comparing it to correlational analysis. Correlational analysis is often used to develop be-
havioral science theories. Researchers often look for patterns of association between
variables in exploratory research. In contrast, regression analysis is often used in research
applications. We already know, for example, that regression is frequently used for ad-
missions purposes in higher education. An admissions officer might want to learn
whether and to what degree variables such as IQ, aptitude test scores, socioeconomic sta-
tus (SES), hobbies, and athletic participation are helpful in predicting future academic
achievement (Pedhazur, 1982).
Harris (1998) points out that with effort and planning, regression is a research
tool that could advise people to select appropriate careers or create programs de-
signed to help individuals with special needs excel in some domain. In theory, one
could use empirically derived scores pertaining to success or failure in any number of
professions to develop predictive models. Models for competitive sports (e.g., foot-
ball, golf), musical performance (e.g., pianist, violist), service industry (e.g., cook,
waiter), or medicine (e.g., doctor, nurse, surgeon), among many others, could be de-
veloped. Imagine how helpful linear regression could be in directing people toward
situations where their skills and abilities could be channeled in constructive, benefi-
cial ways.
Some investigators already use linear regression to predict an individual's future performance based on his or her responses to some test. Corporations routinely employ industrial-organizational psychologists to help select new employees from large applicant pools. Some of these psychologists simplify the hiring process by using regression analysis to make decisions about which applicants should receive job offers. Job candidates, for example, are sometimes asked to complete aptitude tests related to work and job performance as part of the interview process in a company. Candidates' scores on such measures can then be entered into a regression equation in order to determine if their future performance is projected to be sufficient for the needs of the corporation.

In practice, we forget that there is really little to be gained from thinking about regression as a way to predict Y from X when we have all the actual values of Y. Regression is really for predicting the behavior of individuals in samples beyond the original sample.

Researchers usually rely on regression only when relatively large samples of respondents, often numbering in the hundreds, are available. In fact, regression equations based on small samples are rarely useful (i.e., predictive), as they cannot be applied from sample to sample with reasonable rates of success. Reliance on large samples makes a
great deal of sense when we remember that such samples tend to be more representative
of populations than small samples. In regression terms, then, larger samples ensure a
greater degree of predictive accuracy than small samples. By comparison, correlational
analyses tend to be performed on relatively small data sets (we noted in chapter 5 that
values of N greater than 30 are desirable).
In sum, then, linear regression is used when one variable is being used to predict the
value of another variable. Prediction, then, is the watchword. Both of the variables
entered into any regression analysis must be based on either interval or ratio scales. The
working assumption of investigators using regression is that an equation derived from a
sufficiently large sample can be applied in the future to other similar samples. This
assumption is valid only when the samples are representative of a larger population of
origin.
KEY TERM Multiple regression is a statistical technique for exploring the relationship between one dependent variable (Y) and more than one independent variable (X1, X2, ..., XN).
Multiple regression is widely used in the behavioral and natural sciences, where
relationships are complex and a more detailed sense of how variables work together is
necessary. Thus, for example, a multiple regression equation with two independent
variables would look like this:
[7.22.1] Ŷ = a + b1X1 + b2X2.

As you can see, this equation contains an intercept (a), but there are two slopes (b1 and b2), each of which is aligned with two independent or predictor variables, X1 and X2. This particular equation could be the basis for predicting respondents' stress scores on some standardized measure (Y), so that variable X1 could be a self-reported rating of job pressure and X2 might be a score on a measure of physiological arousal. Please understand that formula [7.22.1] is still a formula for a straight line, but the line is somewhat harder to conceptualize than those presented earlier in the chapter. Indeed, a line
based on [7.22.1] would be three-dimensional and best represented by a plane (Allison,
1999) or perhaps a vector passing through a plane.
Within the behavioral and natural sciences, multiple regression is used in two gen-
eral ways: prediction and causal analysis (Allison, 1999). You are already familiar with
the idea of prediction, as we have discussed it extensively in linear regression terms.
Where multiple regression is concerned, prediction is based on the combination of
many variables, which often results in the optimal forecasting of a dependent measure.
Causal analysis refers to the use of multiple regression to identify the underlying causes
of some observed dependent measure. Researchers are drawn to the causal analysis side
of regression because it allows them to separate and explore the unique contributions of
each independent variable to the prediction of some dependent measure (Allison, 1999).
Similar to linear regression, multiple regression also develops an equation for pre-
dicting respondents' scores (Ŷ). More than this, however, multiple regression is also
used to learn how well some predictor variables (X) actually do predict the criterion
variable (Y). Any multiple regression analysis yields what is called a multiple correlation
coefficient, which is symbolized by the letter R and can range in value from .00 to + 1.00.
The multiple R, or simply R, indicates the degree of relationship between a given crite-
rion variable (Y) and a set of predictor variables (X). As R increases in magnitude, the
multiple regression equation is said to perform a better job of predicting the dependent
measure from the independent variables.
Similar to the relationship between rand r2, a multiple R value can also be squared
to illustrate the percentage of variance in Y that is accounted for by the set of predictors,
that is, X variables. The job of the researcher is to empirically determine which combi-
nation of predictor variables yields the best fitting regression equation. Those predictor
variables that add little to the value of R are usually dropped from the analysis.
There is a great deal to learn about multiple regression, much more than can be de-
scribed here, as it is beyond the scope of this text. Interested readers are encouraged to
consult the variety of advanced texts available on the topic (e.g., Allison, 1999; Cohen &
Cohen, 1983; Newton & Rudestam, 1999; Pedhazur, 1982).
Table 7.4

Cause of Death                                  Deaths per 100 Million People per Year
Smallpox                                        0
Poisoning by vitamins                           0.5
Botulism
Measles                                         2.4
Fireworks                                       3
Smallpox vaccination                            4
Whooping cough                                  7.2
Polio                                           8.3
Venomous bite or sting                          23.5
Tornado                                         44
Lightning                                       52
Nonvenomous animal                              63
Flood                                           100
Excess cold                                     163
Syphilis                                        200
Pregnancy, childbirth, and abortion             220
Infectious hepatitis                            330
Appendicitis                                    440
Electrocution                                   500
Motor vehicle and train collision               740
Asthma                                          920
Firearm accident                                1,110
Poisoning by solid or liquid                    1,250
Tuberculosis                                    1,800
Fire and flames                                 3,600
Drowning                                        3,600
Leukemia                                        7,100
Accidental falls                                8,500
Homicide                                        9,200
Emphysema                                       10,600
Suicide                                         12,000
Breast cancer                                   15,200
Diabetes                                        19,000
Motor vehicle (car, truck, or bus) accident     27,000
Lung cancer                                     37,000
Cancer of the digestive system                  46,400
All accidents                                   55,000
Stroke                                          102,000
All cancers                                     160,000
Heart disease                                   360,000
All diseases                                    849,000

Note: These data were taken from Fischhoff et al. (1977). The entries appearing in bold are those also shown
in Table 7.3.
people, and damaging property, say, in the Midwest with little or no difficulty. In-
stances of people succumbing to asthma do not leap to mind, however, despite the
fact that more people die (on average) each year from asthma (920 deaths per 100
million U.S. residents per year) than tornadoes (44 deaths per 100 million people). As
you may have guessed by now, the second cause of death in each pair in Table 7.3 is
far more prevalent than the first cause-and not coincidentally, these second causes
Multivariate Regression: A Conceptual Overview 267
[Figure 7.7 appears near here: judged frequency of death per year (log scale) plotted against actual frequency of death; "Pregnancy" is among the labeled points.]
are also usually much more subtle, even quiet killers, compared to the first member
of a pair. For comparison purposes, Table 7.4 lists all of the causes of death (and their
relative rates) from Table 7.3, plus several other lethal events (these data were taken
from Fischhoff et al., 1977). Please note that the causes of death (and their incidence)
from Table 7.3 appear in boldface in Table 7.4.

How accurate were the predictions you made in Table 7.4? Did you behave like most
respondents-that is, did the availability bias guide your selections and boost your
confidence in them? If you made some accurate choices in Table 7.4, what guided your
selection? Were your confidence ratings appropriately gauged to any correct choices?

What if we considered the estimated frequency of deaths per year to be Y and the
actual frequency of deaths attributable to lethal events to be X? In other words, can we
treat the estimates people make as a regression problem? Figure 7.7 illustrates the
relationship between the judged frequency and actual frequency of death due to lethal
events. If people's estimates and the actual death rates matched, then the data points
would fall on the straight line shown in Figure 7.7. The points plotted in Figure 7.7, as
well as the curved line that is more or less fitted to them, represent the average responses
of laypeople who took part in research conducted by Lichtenstein, Slovic, Fischhoff,
Layman, and Combs (1978). As shown in Figure 7.7, hazards that occur with greater
regularity were generally estimated to occur more often, yet the data points are sometimes
above, other times below, the line of accurate judgment.

How can the data shown in Figure 7.7 be characterized? Generally speaking, participants
appeared to inflate the frequency of rare causes of death and to minimize the incidence
of common fatalities. Table 7.5 summarizes the participants' judgmental biases
regarding the relative frequencies of death. As an exercise, compare the answers you
recorded in Table 7.4 to the respondents' data shown in Table 7.5. Do you see why
people's predictions often over- or underestimate the actual incidence of some events? It
should be clear that regression analysis is very useful when it comes to linking predicted
scores with actual observations.
268 Chapter 7 Linear Regression
Most Overestimated Fatalities            Most Underestimated Fatalities
All accidents                            Smallpox vaccination
Motor vehicle accidents                  Diabetes
Pregnancy, childbirth, and abortion      Stomach cancer
Tornadoes                                Lightning
Flood                                    Stroke
Botulism                                 Tuberculosis
All cancers                              Asthma
Fire and flames                          Emphysema
Venomous bite or sting
Homicide
Summary
1. Linear regression involves predicting the value of one variable (Y) from another
variable (X). Regression is related to correlation: Unless one variable is correlated
with the other, a reasonable degree of prediction is not possible.
2. Regression attempts to locate patterns within data, which are usually shown in a
scatter plot. Specifically, linear regression projects a "best fitting" straight line
through the plotted points representing X and Y. Ideally, this line is placed so as to
minimize the distance between the predicted and actual values of Y.
3. Regression analysis examines changes in the level of variable Y to corresponding
changes in the levels of X. By convention, variable Y is designated the criterion or
dependent variable, whereas variable X is the predictor or independent variable in
most regression analyses.
4. Although regression can be both understood and performed using z scores, it is
usually more practical to rely on one of the computational formulas. The most
common computational formula is Ŷ = a + bX, the formula for a straight line,
where Ŷ is a predicted score, a is equal to the y intercept, b is the slope of the line,
and X is a predictor variable whose value varies.
5. Any variance found around a regression line is termed residual or error variance,
and it is symbolized s²est Y. The index most frequently used to describe the distance
between actual data points and those points predicted from a regression line is
called the standard error of estimate, or sest Y. In essence, the standard error of
estimate is a standard deviation found around any regression line.
6. As the correlation between variables X and Y draws closer to ±1.00, the sest Y will
move closer to 0.0 (i.e., no error exists between actual and predicted scores). As the
correlation becomes closer to 0.0, however, errors in prediction will increase,
closing in on the value of the standard deviation of Y (i.e., sY).
7. Homoscedasticity refers to the fact that variability associated with one variable (Y)
remains constant at all of the levels of another variable (X). In theory,
homoscedasticity involves small distributions of Y scores lining up in a regular
fashion above each value of X along the regression line. This condition must be met
before a regression analysis can properly be performed (in practice,
heteroscedasticity-irregular Y distributions-will result in poor or no prediction of Y).
8. In sum of squares terms, the variation in a regression analysis can be partitioned
into two components: explained variation and unexplained variation. When
summed, these components equal the total sum of the squares (SStot).
9. The coefficient of determination (r²) is used to identify the percentage of variance
in Y explained by changes in X, whereas the coefficient of nondetermination (k)
indicates the percentage of variance in Y not explained by changes in X.
10. Regression toward the mean occurs when extreme observations are later found to
regress or move toward a distribution's mean after subsequent measurement. In
research, regression toward the mean can be a particular problem when
participants who score very high or very low on measures of interest are recruited.
Subsequent increases or decreases in some behavior can be erroneously attributed
to an intervention when regression to the mean is actually the source of the
perceived change. For this reason, regression to the mean is considered to be a
serious threat to the internal validity of research projects lacking adequate control
groups.
11. Linear regression is best used in research that involves very large data sets, where
observations can number in the hundreds or even thousands. Although it is often
associated with data analysis in achievement settings (e.g., college admissions,
hiring decisions), regression can be used in any forum where predicting future
behavior from past behavior is desired.
12. Multiple regression predicts one dependent variable (Y) from more than one
independent variable (X). As an analytic technique, multiple regression is used for
prediction when a combination of variables is available. This technique can also be
used for causal analysis, that is, to separate and identify the unique contributions of
each independent variable in the prediction of a dependent variable.
Key Terms
Criterion variable (p. 242)
Error sum of squares (p. 256)
Error variance (p. 251)
Heteroscedasticity (p. 254)
Homoscedasticity (p. 254)
Intercept (p. 243)
Multiple regression (p. 263)
Predictor variable (p. 242)
Regression analysis (p. 242)
Regression line (p. 244)
Regression sum of squares (p. 256)
Regression toward the mean (p. 259)
Residual (p. 251)
Residual variance (p. 251)
Slope (p. 243)
Standard error of the estimate (p. 251)
Chapter Problems
1. What is the nature of the relationship between correlation and regression?
2. Define each of the variables and constants in the formula Ŷ = a + bX.
3. Explain the relationship between a "best fitting" regression line and the "method of
least squares."
4. Explain the relationship between r, residual variance, and errors in prediction. That
is, describe the relative incidence of error when r = ±1.00 and when r = 0.0.
5. What is residual variance and how is it related to the standard error of estimate?
Why are lower values for the standard error of estimate more desirable than larger
values?
6. Define homoscedasticity and heteroscedasticity. Why is homoscedasticity an
important assumption for regression analysis?
7. What does it mean when statisticians "partition" variation? How and why is the
variation around a regression line partitioned?
8. What roles do the coefficients of determination and nondetermination play in
regression analysis?
9. A college athlete plays very well in a series of games, but his performance drops off
late in the season. The team's coach explains the change in performance by noting
that his star player is under a lot of pressure, as he must choose between
professional sports or graduate school. Can you offer the coach a statistical
explanation for the player's slump in performance?
10. Regression to the mean is said to be a common threat to the internal validity of
experiments and quasi-experiments. How and why is it a threat? Can anything be
done to reduce or eliminate the deleterious effects of regression to the mean?
11. Conceptually, how does multiple regression differ from linear regression? In what
ways is multiple regression used?
12. Suggest two real world applications for regression analysis.
13. Sketch a graph showing the linear equation Y = 5 + 4X.
14. Using the equation Ŷ = -5 + 6.3X, calculate Ŷ when X = 7, 12, 14, 20, and 21.
15. Using the equation X̂ = 15 + .87Y, calculate X̂ when Y = 2.8, 3.7, 4.5, 5, and 6.
16. Determine the regression equation for these data:

    X    Y
    1    2
    3    4
    6    4
    3    2
    2    6
         3

    Compute each of the Ŷ values based on each of the X values. What is the standard
    error of estimate for Ŷ?
17. What percentage of the Y scores lie between ±1 sest Y around any given Ŷ value?
    What percentage of the Y scores lies between ±3 sest Y around any given Ŷ value?
18. A sociologist believes that there is a relationship between hours spent exploring the
Internet and grade point average (GPA). Specifically, the sociologist believes that
lower grades can be predicted from increasing amounts of time spent "surfing" the
Internet. She surveyed 80 college students, asking them to indicate their GPAs (Y)
and the number of hours per week spent on the Internet (X). Here are the data:

    Hours on Internet     GPA
    X = 20                Y = 2.75
    sX = 5.0              sY = .45
    r = -.70
    N = 80

    a. Determine the regression equation of Y on X for these data.
    b. Biff uses the Internet for 30 hours a week-estimate his GPA.
    c. Susan has a GPA of 3.25. Estimate the number of hours she spends on the
       Internet.
    d. What is the standard error of estimate of Y?
19. Researchers believe that positive mood is related to work productivity. The
following data were collected in an office:

    Positive Mood         Productivity
    X = 15.5              Y = 25
    sX = 3.0              sY = 6.0
    r = .66
    N = 50

    a. Determine the regression equation of Y on X for these data.
    b. Sam's recorded mood is a 12. What is his projected productivity level?
    c. Andrea's productivity level is 33. What level of mood led to such a high level of
       productivity?
    d. Calculate the standard error of estimate for both X and Y.
    e. What percentage of total variation in productivity is accounted for by positive
       mood?
20. For each of the following data sets, what is the value of the slope of a regression line?
    a. r = +.43, sX = 2.5, sY = 4.0.
    b. r = -.72, sX = 4.0, sY = 5.4.
    c. r = +.83, sX = 1.0, sY = 1.8.
    d. r = -.53, sX = 2.8, sY = 2.1.
21. Assume that each of the data sets in problem 20 is based on an N of 90. What is the
value of sest Y for each one? Which data set has the largest sest Y? The smallest sest Y?
22. Report the coefficients of determination and nondetermination for each of the data
sets shown above in problem 20.
23. A psychologist examines the link between a measure of depression (Y) and one for
stress (X). Based on a random sample of students from a very large state university,
the psychologist obtains the following data:

    Stress Test           Depression Measure
    X = 22.5              Y = 45.0
    sX = 2.0              sY = 4.5
    r = +.73
    N = 400

    a. If a student's stress score is 32, what is his or her depression score apt to be?
    b. A new student receives a 20 on the depression measure. What is her estimated
       score on the stress test?
    c. What percentage of the variation in the depression measure is not explained by
       scores on the stress test?
24. The psychologist introduced in problem 23 creates an intervention program
designed to reduce stress and alleviate depression. He recruits 20 highly stressed
freshmen and meets in private and group sessions with them for 1 month. At the
end of that time, their depression scores drop by an average of 8.5 points. The
psychologist presents his results to the university's administration in order to ask
for funding to undertake a larger intervention project. Are the psychologist's
conclusions about the intervention program justified? Why or why not? How could
he improve his intervention project?
25. A gerontologist wants to see if age predicts declines in scores on a particular
intelligence test. Despite having a large number of research participants who range
in age from 25 to 85, age is not highly correlated with performance on the
intelligence test. Is a regression analysis appropriate? (Hint: Use the decision tree at
the start of this chapter to answer this question.)
26. A student wonders if birth order (firstborn, secondborn, and so on) predicts
shyness, such that first or only children tend to be shyer than later born children
are. The student gives a standardized measure of shyness to 60 participants (30
males, 30 females), asking each one to indicate their number of siblings and their
birth orders. Is a regression analysis appropriate? (Hint: Use the decision tree at the
start of this chapter to answer this question.)
27. A researcher conducts a field study under trying conditions, so that she has some
data from each participant. It is the case, however, that some participant records
lack a criterion variable, others a predictor-few have both pieces of data. Is a
regression analysis appropriate? (Hint: Use the decision tree at the start of this
chapter to answer this question.)
Calculating a Probability
1. Were the data collected randomly?
2. Do you know the total number of observations favoring each possible event?
3. Adjust the total N of observations (i.e., the denominator) and the N for any event
(i.e., the numerator) following any selection before calculating each probability,
i.e., p(A) = events favoring A/total number of events = A/[(A) + (not A)].
4. It is not appropriate to calculate a probability when this condition is not met in
advance.

Working with a Probability Value
1. Does the probability's value fall between 0 and 1.00?
2. Will you be reporting more than two or three probabilities at once?
3. Will the probability be reported in a written text? If yes, then round the probability
to two places behind the decimal point. If no, then you do not need to round it to
two places behind the decimal point.
Chapter Outline
Probability involves the degree to which some event is likely to happen in the future.

In the course of this chapter, we will explore some basic aspects of probability theory,
and our goals in doing so will be twofold. First, familiarity with some formal properties
of probability theory will help us to understand the statistics derived from them.
Generally, we will learn that probability entails examining the ratio of the number of
actual occurrences of some event to the total number of possible occurrences for the event.
Second, we will use probability in connection with a main theme of statistics, in-
ferring the characteristics of a population based on a sample drawn from it. The link
shared between samples and populations is often described in probabilistic terms-
how likely is it that sample A was drawn from population B rather than population C?
When a sample is established as originating from a given population, empirical re-
search and statistical analysis can be used to make inferences about the characteristics
of the population. These inferences are based on inferential statistics, which enable in-
vestigators to use the limited information from a sample to draw broad conclusions
about a population.
Let's consider an example that will illustrate these two goals in microcosm:
Consider the letter V and its placement in words in the English language. Is V more
likely to appear as
_the first letter in words?
_the third letter in words?
How can we explore this problem in the context of the chapter's first goal, the
examination of the ratio of actual to possible occurrences of some event? When
people answer this problem by treating it as a subjective probability judgment-
a reliance on belief, opinion, or intuition-most select the first option. My guess
is that most readers solved this problem by making a subjective probability judg-
ment-V was perceived to occur more frequently as the first rather than the third
letter of words. A more empirical approach, of course, would be to randomly se-
lect words from a book or newspaper, comparing how often V appeared in the first
or third position of the sampled words. Using this method, one would not know
the actual number of possible occurrences, but the relative frequency of first letter
V words to third letter V words could be assessed (i.e., counted) through this sort
of sampling.
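The sampling approach just described is easy to sketch in code. The word list below is only a stand-in for words randomly sampled from a book or newspaper, so the counts illustrate the method rather than the true relative frequencies in English.

```python
# Count how often a letter appears as the first versus the third letter
# in a sample of words. The sample here is invented for illustration.
sample = ("the diver gave a live report as seven rivers rose and never "
          "fell while vivid movie scenes played above the valley").split()

letter = "v"
first = sum(1 for w in sample if len(w) >= 1 and w[0] == letter)
third = sum(1 for w in sample if len(w) >= 3 and w[2] == letter)

# These are relative frequencies within the sample, not true
# probabilities, since the total number of possible occurrences of the
# event in the language is unknown.
print(first, third)  # → 2 8
```

Even this tiny contrived sample shows how counting can correct an availability-driven intuition: words with the letter in third position outnumber those where it comes first.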
Such sampling and subsequent examination of words from some text would, of
course, involve the second of the stated chapter goals. You would be using the sample
of words containing V in the first or third position to try to characterize the quality of
the larger population of words-does it contain more words with V in the first or third
position? No inferential statistics would be involved in this case, but inference as-
suredly would, as a limited sample of information would be used to estimate the char-
acteristics of the larger (and largely unknown) population.
As you almost certainly guessed by now, actual word counts reveal that the letter V
is more frequently the third rather than the first letter of English words. Tversky and
Kahneman (1973) explain the pronounced tendency for placing V in the first position
as another case of availability bias, the ease with which examples can be brought to
mind. This heuristic or inferential shortcut was introduced in chapter 7's Project
Exercise. Thus, it is by far easier to mentally retrieve words starting with V (e.g., verb,
vendetta, victory, viper, vim) than it is to think of words having V in third place
(e.g., live, love, dive). Availability biases aside, we are clearly familiar with probabil-
ity judgments-but we must learn ways to make them properly and, therefore,
more accurately.
The Gambler's Fallacy or Randomness Revisited 275
H-T-H-H-H-T-T-H-H-H-H
H-H-H-T-T-T
H-T-H-T-T-H
Neither one is more likely than the other, but many people will nonetheless select the
second array because it appears to be relatively more "random" than the first one. In fact,
the first one appears to many people to be too ordered or sequenced-nature, after all,
should be choppy or somewhat haphazard, shouldn't it? Not in this case, nor in many
similar cases.

We commit the gambler's fallacy when we assume that one chance event will cause
another chance event.

When we fall prey to seeing order amidst randomness, order where it does not
apply, we are being influenced by what is traditionally called the gambler's fallacy.

KEY TERM The gambler's fallacy entails the erroneous belief that one randomly determined event will affect
the outcome of another randomly determined event.

A fallacy is a delusion, an untruth, one that is disruptive to the extent that people
follow or believe it. Misconstruing chance is called the gambler's fallacy for
276 Chapter 8 Probability
[Figure appears here: a map of an area including Regent's Park and Cumberland, overlaid with scattered points.]
The quotation from Lem illustrates that the study of probability is used to predict what
is likely to happen in the future given that some information is already known. Such
information usually concerns the characteristics of a sample, which are used to infer the
actual nature of the larger population from which they were drawn (for a review of
samples and populations, see chapter 1). Probability, then, serves as a guide to under-
standing what could happen in the future, and it is based on assessing the nature of the
relationships among the variables found within a sample.
A person could spend a lifetime learning all there is to know about probability the-
ory, as it is a special area of study within the discipline of statistics. Thus, our review of
probability theory and probabilistic concepts will necessarily be a selective one. We will
begin with the so-called "classical approach" to probability, which will be linked with our
working knowledge of proportions and frequency distributions.
The students in my statistics class comprise not only a sample but also a sample space.
When I call on any one of them, I am sampling their opinion-one student is an
observation from the larger sample space of students.
Sampling with Replacement. What is the probability that you will draw a red marble?
We know there are 6 red marbles and 10 black ones. If we designate the red marbles "A"
and the black marbles "not A," then based on [8.3.1], the probability of A can be
determined by:

[8.3.2] p(A) = 6/(6 + 10).

Please note that we are still using the conceptual information from formulas [8.1.1] and
[8.2.1]. We are simply treating "not A," or the choice of black marbles, as the probability
of interest. (If so desired, we could redefine the black marbles as "A" or even call them "B"
or whatever-it is the numerical relationship between the probabilities that matters,
not the nomenclature. Just be sure to keep the chosen variable designations in mind as
you proceed with any probability calculations.) To continue:

[8.4.2] p(not A) = 10/(10 + 6),

[8.4.3] p(not A) = 10/16.

Thus, the probability of selecting one of the 10 black marbles (.625) is greater than
selecting one of the 6 red marbles (.375). If we continue to sample in this manner-with
replacement-the probability associated with selecting a red or a black marble will not
change.
Here is an important lesson we can point to now and come back to later:

[8.5.1] p(A) + p(not A) = 1.00.

This mathematical relationship must be true and we can certainly demonstrate it with
the probabilities we have already determined. The probability of selecting a red marble
plus the probability of selecting a black marble is equal to 1.00, or:

[8.5.2] .375 + .625 = 1.00.
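Both lessons, that sampling with replacement leaves the probabilities unchanged and that p(A) and p(not A) sum to 1.00, can be checked with a short sketch. The seed and number of draws below are arbitrary choices, not part of the text's example.

```python
import random

# The marble example: 6 red ("A") and 10 black ("not A"), using the
# classical definition p(A) = f(A) / [f(A) + f(not A)].
p_red = 6 / (6 + 10)     # .375
p_black = 10 / (10 + 6)  # .625
assert abs((p_red + p_black) - 1.00) < 1e-9  # p(A) + p(not A) = 1.00

# Sampling WITH replacement never changes these probabilities, which a
# simulation of many independent draws makes visible.
random.seed(8)
bag = ["red"] * 6 + ["black"] * 10
draws = [random.choice(bag) for _ in range(100_000)]
print(round(draws.count("red") / len(draws), 3))  # close to .375
```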
Sampling Without Replacement. Imagine that we repeated this example but used
sampling without replacement, that is, once a marble is drawn from our hypothetical
sack, it is not replaced therein. Let's say that we chose a red marble on the first draw-we
know that our probability of doing so was .375. What is the probability of selecting a red
marble on the next draw? When sampling without replacement, we need to take into
account that there is one less red marble (i.e., now 5, not 6) and one less marble overall
(now 15, not 16); thus, the f(A) changes, as does the denominator (i.e., f(A) + f(not A))
used to determine the p(A), or

[8.6.1] p(A) = 5/(5 + 10),

[8.6.2] p(A) = 5/15,

[8.6.3] p(A) = .333.

As you can see, the likelihood of selecting a red marble decreases somewhat-from .375
to .333-but we have taken into account the fact that we removed the original red
marble. The probability of selecting a black marble, too, is affected due to sampling one
red marble without replacement.

[8.7.1] p(not A) = 10/(10 + 5),

[8.7.2] p(not A) = 10/15,

[8.7.3] p(not A) = .667.
Probability: A Theory of Outcomes 281
The likelihood of selecting a black marble increases once a red marble is removed
from the sample. Naturally, the two probabilities-peA) + p(not A)-still sum to 1.00
(Le., .333 + .667).
Anytime you sample without replacement, you must be sure to account for the re-
moval of an observation from the denominator of a probability formula, and possibly its
numerator, as well (see [8.6.1]). Whether you are examining marbles, people, or person-
ality test scores, just be sure to keep track of what observations are drawn from a sample
each time and to account for them after every round.
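The bookkeeping described above can be sketched with exact fractions; the `draw_without_replacement` helper is a hypothetical name introduced here, not something from the text.

```python
from fractions import Fraction

def draw_without_replacement(counts, color):
    """Return p(color) from the current counts, then remove one marble
    of that color so later probabilities reflect the smaller sample."""
    total = sum(counts.values())
    p = Fraction(counts[color], total)
    counts[color] -= 1
    return p

counts = {"red": 6, "black": 10}
p_first_red = draw_without_replacement(counts, "red")           # 6/16
p_next_red = Fraction(counts["red"], sum(counts.values()))      # 5/15
p_next_black = Fraction(counts["black"], sum(counts.values()))  # 10/15
assert p_next_red + p_next_black == 1  # p(A) + p(not A) still sums to 1.00

print(float(p_first_red), round(float(p_next_red), 3),
      round(float(p_next_black), 3))  # → 0.375 0.333 0.667
```

Using `Fraction` keeps the numerator and denominator explicit, mirroring how the text adjusts f(A) and f(A) + f(not A) after each draw.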
At this point, you have a very basic working knowledge of classical probability.
There are a few key points we need to highlight here before proceeding to the next topic.
These points are:

1. Any probability discloses a pattern of behavior that is expected to occur in the long run.
A probability value does not mean that a behavior is actually going to happen at any
given point in time, just that it is more (or less) likely to happen depending upon its
strength. In fact, it is a good idea to think about probability as if you were repeatedly
sampling for a given behavior across some period of time in order to understand how
it is truly likely to occur.

Probability refers to an event that is likely to happen, not an event that must
happen. Higher probabilities indicate what events are apt to take place in the long run.

2. A given probability can take on a value ranging from 0 to 1.00. An event that is never
expected to occur has a probability of 0 and one that is absolutely certain to occur
has a probability equal to 1.00. In the previous example, the probability of selecting
a blue marble is 0 because there are no blue marbles in the sample. The probability
of selecting a red or a black marble is 1.00 because all 16 of the marbles are either
black or red. The individual probabilities of selecting red or black, of course, were
around .300 and .600, respectively, varying somewhat due to whether sampling
occurred with or without replacement.

Probability values can range from 0 to 1.00.

3. Probabilities can but need not be rounded. If a probability is determined to be, say,
p(A) = .267, it can be reported as .267 or rounded to .27. For pedagogical purposes
and issues of calculation accuracy, the convention in this book is to report a
probability to three places behind the decimal point. Be advised that the convention of the
American Psychological Association (1994) is to report such information to only two
places behind a decimal point. Both approaches are correct, and here is a compromise
to help you accommodate their difference: When doing a calculation, go out
to three places-when reporting a result based on a calculation, round to two places.
Please also note that APA style eschews placing a 0 in front of the decimal point of
probabilities and any numbers that will never have a value greater than 1.00.
We can now turn to interpreting numerical probabilities, which is linked to proportions
and percentages, two familiar topics.
We have already examined one of the classic tools used to teach probability, coin flips, but it is
worth revisiting it for a moment in the context of classical probability theory. What is the
probability of flipping a coin and having it come up "heads"?

If p(A) = heads, then p(not A) = tails, so:

p(A) = A/[(A) + (not A)] = 1/(1 + 1) = 1/2 = .50.
What is the probability of tossing a coin and having "tails," or p(not A), come up?
What is the probability of selecting a queen from the same deck after the ace of spades is re-
turned and the cards are reshuffled? There are four queens in any deck, so p(queen) = 4/52 = .077.

Thus, the odds of picking the ace of spades or a queen on any given draw are few and far be-
tween. After shuffling and drawing from a deck of cards 100 times, you can expect to select the ace
of spades once or twice, and to pick a queen approximately seven or eight times.
Σf = 25

We can determine the simple probability of any event X within a frequency distribution
by using this formula:

[8.8.1] p(X) = f/N,

where f is the frequency of any given value for X and N is equal to the Σf (here, 25), the
total number of observations available. Let's do an example. What is the probability of
randomly selecting a 10 from this population?

[8.8.2] p(X = 10) = 2/25 = .080.

Note once again that this probability value is also a proportion.
Besides determining the probability of any single event X, we can also explore the
probability associated with sampling a range of possible values within a population.
In other words, if we randomly select an observation X from the above frequency
distribution, what is the probability it will be less than «) 6? If you look at the above
frequency table, you can see that X is less than 6 when X is equal to 5, 3, or 2. We need
only sum the frequencies (fs) associated with these three observations (X) to determine
the probability of selecting an observation less than 6, or:
[8.9.1] p(X < 6) = 16/25 = .640.
This probability was determined by summing the fs of 7, 4, and 5, which correspond to
X = 5, 3, and 2, respectively.
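The same counting logic is easy to sketch in code. Because the original frequency table is not reproduced here, the distribution below is a partial reconstruction from the values the text does give (Σf = 25; f = 2 for X = 10; f = 3 for X = 6; f = 7, 4, and 5 for X = 5, 3, and 2); the frequency for X = 8 is an assumption that brings the total to 25.

```python
from collections import Counter

# Partially reconstructed frequency distribution (see lead-in: the
# f = 4 for X = 8 is assumed so that Σf = 25).
freq = Counter({10: 2, 8: 4, 6: 3, 5: 7, 3: 4, 2: 5})
N = sum(freq.values())  # Σf = 25

def p(predicate):
    """p of randomly drawing an X satisfying predicate: Σ f(X) / N."""
    return sum(f for x, f in freq.items() if predicate(x)) / N

print(p(lambda x: x == 10))  # 2/25 = 0.08
print(p(lambda x: x < 6))    # (7 + 4 + 5)/25 = 0.64
```

Passing a predicate rather than a single value lets the same function handle both a simple probability, p(X = 10), and a range, p(X < 6).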
What is the probability of drawing an observation that is less than or equal to
(≤) 6? This time, of course, we need to account for the frequency associated with X = 6,
which, as shown above, is f = 3. This additional frequency would be added to the 16
observations already accounted for in [8.9.1], or

[8.9.2] p(X ≤ 6) = 19/25 = .760.
Knowledge Base
1. A bag contains 10 marbles-4 red, 2 green, and 4 blue. When sampling with re-
placement, what is the probability of selecting a blue marble? A green marble?
2. A bag contains 12 marbles-4 red, 3 green, 3 blue, 2 yellow. You draw a green mar-
ble. Using sampling without replacement, what is the probability of drawing a green
marble again? What is the probability of subsequently drawing a yellow marble, and
then another green marble on a third round?
3. Examine the following frequency distribution:
x f
9 4
8
5
3 3
2 6
5
Answers
1. p(blue) = 4/10 = .400; p(green) = 2/10 = .200
2. p(green) = 2/11 = .182; p(yellow) = 2/11 = .182; p(green) = 2/10 = .200
3. p(X = 3) = 3/20 = .150; p(X > 5) = 5/20 = .250; p(X ≤ 3) = 14/20 = .700
The Addition Rule for Mutually Exclusive and Nonmutually Exclusive Events
The addition rule for probability is used when considering the "union" of two events, that is, the probability that one event or the other (or both) occurs. We will introduce the formal definition for the addition rule and then demonstrate its use in an example.
KEY TERM The addition rule for probability indicates that the probability of the union of two events, A and B, that is, p(A or B), is equal to: p(A or B) = p(A) + p(B) - p(A and B).
Imagine, for example, that we have a small class of 20 elementary school students.
Eight of the children have blond hair, 6 have blue eyes, and 4 have both blond hair
and blue eyes. What is the probability of selecting a child who has blond hair or blue
eyes? (Not both.)
In evaluating this sample, we notice a few things. First, we are considering a subset of the children, not all 20, though we will use 20 as the denominator for our probability calculations. Second, we know the probability of having blond hair is equal to 8/20 (i.e., p(A) = .400) and that associated with blue eyes is equal to 6/20 (i.e., p(B) = .300). Third, we know that a joint probability, having both blond hair and blue eyes, is also present.
KEY TERM A joint probability is the mathematical likelihood of selecting an observation wherein two conditions, p(A) and p(B), are present. In formula terms, a joint probability is equal to p(A and B).
In the present example, the joint probability of hair and eye color can be known by determining p(A and B). If the question were "What is the probability of selecting a child from the sample who has blond hair and blue eyes?" we could readily answer it using the information presented: p(A and B) = 4/20 = .200.
Because we were asked to provide the probability of having blond hair or blue eyes, however, we need to take into account that some of the children in the sample have both conditions (there are four of them); in effect, we do not want to count them twice. Figure 8.2 shows an example of what is called a Venn diagram to illustrate the relationship between hair color and eye color (there is a good chance you encountered Venn diagrams at some earlier point in your education). As you can see, the circle on the left represents blond-haired children and the one on the right those with blue eyes. The shaded intersection of the two circles indicates the overlap, the joint
probability, between the two groups (i.e., the four blond-haired, blue-eyed children).
By using the addition rule we can develop a probability estimate that partials out this
joint probability.
The probability of selecting a blond-haired or a blue-eyed child from the sample
(and not a blond-haired, blue-eyed child), then, would be:
[8.12.1] p(A or B) = p(A) + p(B) - p(A and B),
[8.12.2] p(A or B) = (.400 + .300) - .200,
[8.12.3] p(A or B) = .700 - .200,
[8.12.4] p(A or B) = .500.
As you can see, the probabilities of A and B are added together, and then the joint prob-
ability of A and B is subtracted from the resulting sum. Thus, when randomly selecting
a child from the sample, we know that the probability is .50 that the child will have either
blond hair or blue eyes.
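As a quick check on this arithmetic, the addition rule can be sketched in a few lines of code, using the counts from the classroom example above:

```python
# Addition rule for the classroom example: 20 children, 8 blond (A),
# 6 blue-eyed (B), and 4 who are both (A and B).
N = 20
p_a = 8 / N          # p(A) = .400
p_b = 6 / N          # p(B) = .300
p_a_and_b = 4 / N    # joint probability, p(A and B) = .200

# Subtracting the joint probability keeps the four blond-haired,
# blue-eyed children from being counted twice.
p_a_or_b = p_a + p_b - p_a_and_b
print(round(p_a_or_b, 3))  # 0.5, matching [8.12.4]
```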
The addition rule is also applicable to probabilistic events that are said to be mutu-
ally exclusive.
KEY TERM Two events are mutually exclusive when they have no observations within a sample in common.
Mutually exclusive events cannot occur simultaneously.
In contrast to a joint probability and its points of overlap or intersection, when two
events are described as being mutually exclusive, they are disjoint from one another. A
person is either male or female; one cannot be both because gender is a mutually exclusive construct. Similarly, the U.S. government does not permit its citizens to hold dual
citizenship in another nation. If you are a U.S. citizen, for example, you cannot also be a
citizen of Canada.
When the addition rule is used to identify probabilities for mutually exclusive
events, it is simplified somewhat. Specifically, the joint probability of A and B, that is,
p(A and B), is equal to 0. Imagine that we revisited the same elementary school introduced above and went to a different room of children. In this room there are 18 children,
7 with blond hair and 5 with blue eyes-none of the children have both blond hair and
blue eyes. What is the probability of having blond hair or blue eyes? We follow the same
Calculating Probabilities Using the Rules for Probability 287
procedure for calculating simple probabilities, f/N, so that the likelihood of having blond hair in this group is 7/18 (i.e., p(A) = .389) and that of having blue eyes is 5/18 (i.e., p(B) = .278). In the face of such mutually exclusive events, then, our addition rule
becomes
[8.13.1] p(A or B) = p(A) + p(B),
[8.13.2] p(A or B) = .389 + .278,
[8.13.3] p(A or B) = .667.
Please note that [8.13.1] is the same as [8.12.1] except that the p(A and B) term is absent. The p(A and B) is still implied, but because it is equal to 0, there is no need to represent it in the formula. If we randomly select a child from the second classroom, then, we know that the probability he or she will be blond haired or blue eyed is .667. Figure 8.3 shows a Venn diagram for the mutually exclusive events of having blond hair or blue eyes (note that the two circles never intersect, as their probabilities are disjoint from one another).
The Multiplication Rule for Independent and Conditional Probabilities
The second basic rule of probability is called the multiplication rule, which is used for estimating the probability of a particular sequence of events. In this case, a sequence of events refers to the joint or co-occurrence of two or even more events, such as a particular pattern of coin tosses (e.g., what is the probability of flipping H-T-H-H-T?).
Determining the probability of a precise sequence of coin flips involves recognizing the independence of each flip from every other flip. To determine this probability, we rely on the multiplication rule for independent events.
KEY TERM When a sequence of events is independent, the multiplication rule entails multiplying the probability of one event by the probability of the next event in the sequence, or p(A then B then ...) = p(A) × p(B) × p(...).
If we want to know the probability of a sequence of coin flips, such as the aforementioned pattern H-T-H-H-T, we need only remember that the probability of an H or a T on any toss is .500. Thus,
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As
a student, she was deeply concerned with issues of discrimination and social justice, and
also participated in anti-nuclear demonstrations. Which of the following statements is
apt to be true about Linda? Check one.
Linda is a bank teller. ____
Linda is a bank teller and is active in the feminist movement. ____
At first blush, many people immediately decide that Linda must be a bank teller who is also a feminist; after all, she appears to be something of an activist, doesn't she? The descriptive information about Linda is socially interesting, but it has no bearing (that is, it should have no bearing) on the decision at hand. Yet after reading the above (brief) description, the majority of respondents in a study by Tversky and Kahneman (1983) were quite willing to predict that she was more likely to exhibit both characteristics than just one. These researchers labeled this judgment bias the conjunction fallacy (for related efforts, see Abelson, Leddo, & Gross, 1987; Leddo, Abelson, & Gross, 1984; Morier & Borgida, 1984).
What do they mean by conjunction fallacy? The conjunction of events is a key concept in elementary probability. The conjunction or joint occurrence of two events (e.g., "bank teller" and "active in the feminist movement") cannot be more likely than the probability of either event on its own (e.g., "bank teller" or "active in the feminist movement"). The veracity of the conjunction rule is easily demonstrated by the probabilistic relationships portrayed in the Venn diagram shown below. As you can readily see, the shaded area shared by the two circles represents the joint probability of being a "bank teller and active in the feminist movement." The probability of the two events co-occurring must be lower than either event, being a "bank teller" (see the left circle) or being "active in the feminist movement" (see the circle on the right), separately. Why? Well, not all bank tellers are feminists, nor do all feminists have careers as bank tellers.
According to Tversky and Kahneman (1983), then, we frequently violate the conjunction rule when making everyday judgments. Ironically, we see specific, detailed scenarios as being more likely to occur than general events, despite the fact that the latter are by definition much more likely to happen. Indeed, as the detail associated with a projected event or a person's behavior increases, the objective probability linked to the event or act actually decreases. In our minds, however, detailed, even wild, scenarios take on a life of their own, and our belief in the likelihood that they will occur actually increases.
Plous (1993, p. 112) provides another compelling example that reinforces Tversky and Kahneman's conclusion that we see specific scenarios as more likely than general events because they are representative of the ways we imagine particular events occurring. Which of the following is more likely to occur:
Scenario 1: An all-out nuclear war between Russia and the United States
Scenario 2: A situation in which neither country intends to attack the other side with
nuclear weapons, but an all-out nuclear war between the United States and Russia is
triggered by the actions of a third country such as Iraq, Libya, Israel, or Pakistan
Similar to the Linda problem, most respondents believe that a specific crisis (a third country triggers a war between the United States and Russia) is more likely than a general one (an all-out war between the United States and Russia). Plous (1993) notes that Pentagon planners behaved like most respondents for decades; they committed the conjunction fallacy by developing incredibly detailed war plans to deal with any number of improbable chains of events in spite of the fact that the two-nation conflict is more likely.
[8.14.1] p(H then T then H then H then T) = p(H) × p(T) × p(H) × p(H) × p(T),
[8.14.2] p(H then T then H then H then T) = (.500) × (.500) × (.500) × (.500) × (.500),
[8.14.3] p(H then T then H then H then T) = .031.
In spite of the fact that each draw of a marble is independent of the others, the likelihood
of first selecting a red marble, then a green one, and finally a blue one is really quite low.
What happens if we sample without using replacement? The events are still independent of one another, but it is necessary to take into account that the number of marbles in the population is reduced after each draw. Using random sampling without replacement, what is the probability of selecting a blue marble, then a red marble, and then a blue marble once again? Following the same procedure as before, we set the probability up as
[8.17.1] p(blue then red then blue) = p(blue) × p(red) × p(blue),
[8.17.2] p(blue then red then blue) = 6/20 × 10/19 × 5/18 = .044.
The probability, then, of drawing one blue, one red, and a second blue marble, in that order, is rather low (i.e., p = .044). Please take another look at [8.17.2] in order to be sure you understand that sampling without replacement means that one observation is subtracted from the value of the denominator after each draw. Please also note that because we assumed that we would select a blue marble on the first draw, the numerator for selecting a blue marble again on the third round is also reduced by 1 (i.e., there are now five blue marbles to choose from, as the sixth was drawn on the first round).
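The bookkeeping for sampling without replacement, decrementing the counts after each draw, can be sketched as follows. The bag composition is partly an assumption: the text fixes 20 marbles with 6 blue and 10 red, and the remaining 4 are labeled green here only for completeness.

```python
from fractions import Fraction

def sequence_prob(bag, draws, replacement=False):
    """Multiply per-draw probabilities for a sequence of colors,
    decrementing the counts when sampling without replacement."""
    counts = dict(bag)              # copy so the caller's bag is untouched
    total = sum(counts.values())
    p = Fraction(1)
    for color in draws:
        p *= Fraction(counts[color], total)
        if not replacement:         # remove the drawn marble from the bag
            counts[color] -= 1
            total -= 1
    return p

bag = {"blue": 6, "red": 10, "green": 4}   # green count is an assumption
p = sequence_prob(bag, ["blue", "red", "blue"])
print(round(float(p), 3))  # 0.044, matching [8.17.2]
```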
Probabilistic relationships can be more complex than the ones we have reviewed so far. In fact, behavioral scientists often examine what are called conditional probabilities precisely because the probability of one event can be better understood by taking into account the occurrence of some other event.
Table 8.1 Observed Frequencies of Students Classified by Self-Monitoring Type and Gender

Gender          High Self-Monitors   Low Self-Monitors   Row Total
Male                   55                   23               78
Female                 25                   47               72
Column Total           80                   70              150
Students were invited to apply for tour guide positions and part of the application
process entailed completing a self-monitoring questionnaire. Table 8.1 shows the ob-
served frequencies of male and female students subsequently classified as either high or
low self-monitors. In this sample, relatively more men than women were classified as high
self-monitors, whereas the reverse pattern was true in the case of low self-monitors (see
Table 8.1).
By relying on the inherent relationship between proportion and probability, the fre-
quencies shown in Table 8.1 can be readily converted into probabilities. Each entry of the
four cell entries in the center of the table, as well as the two column totals and the two
row totals, were divided by the overall N of the sample (which is 150; see the bottom
right corner of the table). The resulting probabilities are presented in Table 8.2.
Please note that I use subscripts to denote the two levels of variables A and B in Table 8.2. Where self-monitoring is concerned, for example, I could just as easily have used "A" and "not A" instead of "A1" and "A2" to denote "high" and "low" self-monitors, respectively, which would mean that the former personality type would be "A." In this and similar contexts, however, I find such designations to be arbitrary, if not confusing; after all, low self-monitoring personalities are on the same continuum with high self-monitors, so designating one group as "not A" seems to me to be odd, even inaccurate.
Table 8.2 contains two important types of probability information. First, the four entries in the center of the table are joint probabilities shared between the two levels of A (self-monitoring type) and B (gender). The probability of being a high self-monitor and a male (p(A1 and B1)), for example, is .367, just as the relative likelihood of being a low self-monitor and a female (p(A2 and B2)) is .313 (see Table 8.2). Please note that the four joint probability cells must sum to 1.00 (i.e., .367 + .153 + .167 + .313 = 1.00). It is always a good idea to perform this quick check for calculation errors before using or reporting the probabilities.
Table 8.2 also includes what are called marginal probabilities, probabilities based on collapsing across one of the two variables shown in the table. For example, by glancing down columns A1 and A2, the probability of being a high or a low self-monitor, respectively, can be known. In the same way, reading across the two rows representing gender will reveal the probability of being a male or a female in the sample. (Remember: a proportion is also a probability, and a probability is also a proportion.)
Table 8.2 Joint and Marginal Probabilities of Students by Self-Monitoring Type and Gender

                  High Self-Monitors   Low Self-Monitors   Marginal
Gender                   A1                   A2           Probability B
Male (B1)               .367                 .153              .520
Female (B2)             .167                 .313              .480
Marginal
Probability A           .533                 .467             1.00
KEY TERM A marginal probability, sometimes known as an "unconditional probability," indicates the likeli-
hood of an independent event's occurrence.
As shown in the far right of Table 8.2, the marginal probability of being a male in the sample (p(B1)) is .520. Similarly, the marginal probability of being a high self-monitor (p(A1) = .533) is slightly higher than the probability of being a low self-monitor (p(A2) = .467; see Table 8.2). The sum of the marginal probabilities for either personality type (A1 + A2) or gender (B1 + B2) must also be 1.00 (see Table 8.2). As noted above, it is always appropriate to verify that each respective set of marginal probabilities does sum to 1.00 to eliminate any errors that could plague later calculations.
What about conditional probabilities? Can we go beyond the joint and marginal probabilities provided here and create more specific (that is, conditional) probability estimates? Keep in mind that the conditional probability of a given event is one that depends upon the presence of some other event. Although they are related, conditional probabilities differ from joint probabilities and marginal probabilities; conditional probabilities incorporate both. Our hypothetical admissions director might ask a question like the following: Given that we know a student in the sample is a male, what is the probability he is also a high self-monitor? In other words, the student's gender (here, male) is a "given" piece of information, one that makes any subsequent probability information conditional (i.e., we can only determine probabilities involving men). This conditional probability estimate is written in the following symbolic terms: The probability of being a high self-monitor (p(A1)) given he is a male (p(B1)), or
[8.18.1] p(A1 | B1) = p(A1 and B1) / p(B1).
When you see (or use) a vertical line like this one ( | ), you know you are working with a conditional probability. The line means "given," so that the probability is read as "the probability of A1 given B1." Notice that the probability of being male (p(B1)), the given information, is the denominator. The numerator is conditional on this information: If we want to know the probability that the person is a high self-monitor, we can assume that this high self-monitor must be male because it is given information. Thus, we will use the joint probability of being a high self-monitor and male (p(A1 and B1)). If we take these probabilities from Table 8.2 and enter them into [8.18.1], we get
[8.18.2] p(A1 | B1) = .367/.520,
[8.18.3] p(A1 | B1) = .706.
[8.19.1] p(B2 | A1) = p(A1 and B2) / p(A1).
That is, given that we know the selected student is a high self-monitor, what is the prob-
ability the student is also female? Or
[8.19.2] p(B2 | A1) = .167/.533,
[8.19.3] p(B2 | A1) = .313.
Thus, if we select a person at random and know that the person is a high self-monitor,
the chance that the person is also female is not terribly high. Please note that this con-
clusion makes sense because we established earlier that males in the sample were apt to
be high self-monitors, whereas females were more likely to be low self-monitors.
There is one final important thing you should know about conditional probabilities: they are not interchangeable with one another. As a general rule, then, p(A | B) ≠ p(B | A). We can demonstrate this fact by determining specific conditional probabilities from Table 8.2. Thus, for example, p(B2 | A2) ≠ p(A2 | B2) because the former conditional probability (.670) is not equal to the latter (.652). As an exercise, go back and perform the calculations to prove this fact and to verify that you understand how to determine conditional probabilities.
The accuracy of this probability can be verified by looking at the joint probability of A2 and B2 given in Table 8.2; as you can see, it is indeed equal to .313.
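A short sketch shows how the joint, marginal, and conditional probabilities above all flow from the Table 8.1 frequencies. Computing from the raw counts avoids the small rounding drift in the tabled values (e.g., 55/78 ≈ .705 versus the tabled .367/.520 ≈ .706):

```python
# Joint, marginal, and conditional probabilities from the Table 8.1
# frequencies (self-monitoring type by gender, N = 150).
freq = {("high", "male"): 55, ("low", "male"): 23,
        ("high", "female"): 25, ("low", "female"): 47}
N = sum(freq.values())                       # 150
joint = {cell: f / N for cell, f in freq.items()}

# Marginals: collapse across the other variable.
p_male = joint[("high", "male")] + joint[("low", "male")]        # .520
p_low = joint[("low", "male")] + joint[("low", "female")]        # .467
p_female = joint[("high", "female")] + joint[("low", "female")]  # .480

# Conditionals: the joint probability divided by the "given" marginal.
p_high_given_male = joint[("high", "male")] / p_male      # 55/78, ~.705
p_female_given_low = joint[("low", "female")] / p_low     # 47/70, ~.671
p_low_given_female = joint[("low", "female")] / p_female  # 47/72, ~.653

# Conditional probabilities are not interchangeable: p(A|B) != p(B|A).
assert p_female_given_low != p_low_given_female
```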
Knowledge Base
1. A jar contains 30 marbles. Nine of the marbles are green, 8 are pink, 8 are brown and
white, and 5 are pink and green. What is the probability of selecting a marble that is
pink or green?
2. A jar contains 20 marbles. Ten are yellow, 6 are blue, 2 are pink, and 2 are brown.
What is the probability of selecting a yellow or a pink marble? Using sampling with
replacement, what is the probability of selecting a blue and then a brown marble?
Using sampling without replacement, what is the probability of selecting a blue, a
pink, and then a yellow marble?
3. You are flipping a coin. What is the probability of obtaining the following sequence
of heads (H) and tails (T): H-H-T-T-T-H-T?
4. You are a developmental psychologist who is interested in humor. Two age groups of
children hear a joke, and you classify whether they found it to be funny. Examine the
following table of data and answer the questions that follow it.

Age            Funny   Not Funny
5-year-olds      20        8
10-year-olds      4       23

a. What is the probability of randomly selecting a 5-year-old child from the sample?
b. What is the joint probability of selecting a child who is a 10-year-old and who found the joke funny?
c. Given that a child is a 10-year-old, what is the probability that he or she did not find the joke funny?
5. Using the multiplication rule, what is the probability of selecting a child who is a 5-year-old and who did not find the joke funny?
Answers
1. p(pink or green) = 8/30 + 9/30 - 5/30 = .267 + .300 - .167 = .400
2. p(yellow or pink) = 10/20 + 2/20 = .500 + .100 = .600; p(blue then brown) = 6/20 × 2/20 = .300 × .100 = .030; p(blue then pink then yellow) = 6/20 × 2/19 × 10/18 = .300 × .105 × .556 = .018
3. p(H-H-T-T-T-H-T) = (.500) × (.500) × (.500) × (.500) × (.500) × (.500) × (.500) = .0078
4. a. p(5-year-old) = 28/55 = .509
   b. p(10-year-old and funny) = 4/55 = .073
   c. p(not funny | 10-year-old) = .418/.490 = .853
5. p(5 and not funny) = p(5) × p(not funny | 5) = (.51) × (.286) = .146, or p(not funny) × p(5 | not funny) = (.564) × (.259) = .146
Back in chapter 5, we noted that z scores could be used to determine the probabilities as-
sociated with areas under the standard normal ("bell-shaped") curve. In other words,
investigators can ask focused questions concerning the probability of randomly selecting
an observation or individual possessing some quality or characteristic that is less than,
greater than, or equal to a particular range of values. As we learned in chapter 5, any
measure can be converted to a z score as long as the mean and standard deviation of its
distribution are known. Once a score on a measure is transformed into a z score, its rel-
ative position is understood in terms of the z distribution's mean of 0 and standard de-
viation of 1.0. If these ideas seem unfamiliar to you or your recollection of the z distri-
bution's properties is a bit hazy, then you should review chapter 5 before proceeding
with the remainder of this section.
One other matter must be discussed before we firmly establish the link between
probability and the area under the normal distribution. Most of the probabilities we
have calculated so far in this chapter have involved discrete rather than continuous vari-
ables. With the exception of calculating the probability that a score within a grouped fre-
quency distribution was above or below a certain point or range of scores (see page 283),
Using Probabilities with the Standard Normal Distribution: z-Scores Revisited 295
our work in this chapter has focused largely on counting the number of observations fa-
voring one outcome and dividing that number by the total number of available obser-
vations. To be sure, discrete probabilities are quite useful, but behavioral science research, with its need to generalize results from one sample to other samples and populations, will be hampered unless we can also work with a continuum of probabilities, and with ranges of probabilities therein.
Our interest, then, turns toward learning to determine the probability that an
observation or score can be shown to fall within a certain area under the normal
curve. The probability for a continuous variable (one that has no gaps between values) can be expressed as the proportion of the area under the normal curve. Thus, we can ask the probability that some score, X, falls within a range of values represented by A and B. In conceptual terms, we mean that p(A ≤ X ≤ B) equals the proportion of the total area under the curve lying between A and B.
Suppose, for example, that IQ scores are normally distributed with a μ of 100 and a σ of 15. What is the probability of randomly selecting a person whose IQ is 125 or higher? Recall formula [5.2.1] from chapter 5: z = (X - μ)/σ. That is, the population mean is subtracted from some known score X, and the resulting difference is divided by the population standard deviation. Using the information for the present example, we find that
[8.22.1] z = (125 - 100)/15,
[8.22.2] z = 25/15,
[8.22.3] z = +1.67.
We know, then, that an IQ score of 125 lies 1.67 standard deviation units above the mean. The top curve in Figure 8.4 illustrates a normal distribution of IQ scores with a μ of 100 and σ of 15. A score of 125 is highlighted in Figure 8.4, and the area under the curve at or beyond 125 is shaded. The curve representing the z distribution in the bottom half of Figure 8.4 portrays the same relationships as those in the upper curve; this time, however, the observed z score of +1.67 is highlighted, and the area at or beyond this z, too, is shaded.
Turning to Table B.2 in Appendix B, we want to determine the probability of selecting a score equal to or greater than 125. As you can see, the z score of +1.67 is located in the upper left corner of the second page of Table B.2. Because we are interested in the probability of selecting a score that is greater than or equal to 125, we look to column C, which is labeled "area beyond z." The entries in Table B.2 are proportions under the curve, but we already know that proportions are interchangeable with probabilities. Thus, the probability (proportion) of randomly selecting an IQ score of 125 or higher is equal to .0475, or p(X ≥ 125) = .0475 (which is equal to the shaded area under the z
Figure 8.4 Normal Distribution of IQ Scores (μ = 100, σ = 15) and Corresponding z Distribution
Note: The shaded area under each of the curves represents the proportion (probability) of scores equal to or greater than an IQ score of 125.
distribution shown in the lower half of Figure 8.4). In other words, there is a less than 5% chance of randomly selecting a person whose IQ score is 125 or higher.
In our second example, we can rely on the addition rule for mutually exclusive events that was presented earlier in this chapter. What is the probability of randomly selecting someone from the population whose IQ falls below 75 or above 130? The upper curve in Figure 8.5 illustrates the two areas of the curve, the lower and upper tails of the distribution, that are relevant to answering this question. All that we need to do is to identify the two proportions (probabilities) under the curve corresponding to the shaded areas shown in the upper curve in Figure 8.5, and then sum these values.
We first convert each of the IQ scores to their z score equivalents. Using formula
[5.2.1], the z score corresponding to an IQ score of 75 is
[8.23.1] z = (75 - 100)/15,
[8.23.2] z = -25/15,
[8.23.3] z = -1.67.
The z for an IQ of 130 is
[8.24.1] z = (130 - 100)/15,
[8.24.2] z = 30/15,
[8.24.3] z = +2.00.
Figure 8.5 Normal Distribution of IQ Scores (μ = 100, σ = 15) and Corresponding z Distribution
Note: The shaded area to the left under each of the curves represents the proportion (probability) of scores less than or equal to an IQ score of 75. The shaded area to the right under each of the curves represents the proportion (probability) of scores greater than or equal to 130.
As shown by the shaded areas in the z distribution presented in the lower half of Figure 8.5, we need to determine the proportion of the curve that is equal to or below z = -1.67 and the area equal to or greater than z = +2.00. Recall that when we turn to Table B.2 in Appendix B, no negative signs accompany the z scores in the table because the z distribution is symmetric. Turning to Table B.2, then, we conceptually want to know the proportion (probability) of the area beyond (that is, below) z = -1.67. We look, then, to column C for z = +1.67 and learn that p(X ≤ 75) = .0475. We then look to see the area beyond z = +2.00 (again, column C) and find that p(X ≥ 130) = .0228. The respective probabilities are noted in the z distribution presented in the bottom of Figure 8.5. If we add these two probabilities together, we will know the likelihood of selecting an individual whose IQ is less than or equal to 75 or greater than or equal to 130, or
[8.25.1] p(X ≤ 75 or X ≥ 130) = p(X ≤ 75) + p(X ≥ 130),
[8.25.2] p(X ≤ 75 or X ≥ 130) = (.0475) + (.0228),
[8.25.3] p(X ≤ 75 or X ≥ 130) = .0703.
In other words, there is about a 7% chance of randomly selecting a person whose IQ falls
at or below 75 or at or beyond 130.
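This two-tail calculation can also be checked in code. The sketch below computes normal-curve areas directly from the error function rather than from Table B.2, so its answers differ from the tabled ones in the third decimal place (the table rounds z to 1.67):

```python
from math import erf, sqrt

def normal_cdf(x, mu=100.0, sigma=15.0):
    """p(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

p_lower = normal_cdf(75)           # p(X <= 75), ~.048 (Table B.2: .0475)
p_upper = 1.0 - normal_cdf(130)    # p(X >= 130), ~.023 (Table B.2: .0228)

# The two tails are mutually exclusive, so the addition rule of
# [8.25.1] reduces to a simple sum.
print(round(p_lower + p_upper, 3))  # 0.071, close to the tabled .0703
```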
Our third example concerns an IQ score that falls below the mean of 100. What is the probability of randomly selecting an individual with an IQ score of 89 or higher? First, we sketch a normal curve for the IQ scores and identify where 89 falls in relation to the mean of the distribution (see the upper curve in Figure 8.6). Note that the probability we are looking for is associated with a "score of 89 or higher"; thus, besides the area under the curve
Figure 8.6 Normal Distribution of IQ Scores (μ = 100, σ = 15) and Corresponding z Distribution
Note: The shaded area under each of the curves represents the proportion (probability) of scores equal to or greater than an IQ score of 89.
falling between 89 and 100, we must also take into account the entire area from the mean
and beyond (see the shaded areas in the upper curve shown in Figure 8.6).
For our next step, we calculate the z score corresponding to an IQ score of 89, or
[8.26.1] z = (X - μ)/σ,
[8.26.2] z = (89 - 100)/15,
[8.26.3] z = -11/15,
[8.26.4] z = -.73.
The bottom curve in Figure 8.6 highlights the location of z = -.73. The shaded area in this curve indicates the proportion (probability) under the curve associated with randomly selecting a person who has an IQ score of 89 or higher. Before we proceed to Table B.2 to determine the area between the mean and the observed z score, by looking at the bottom curve in Figure 8.6, we know that we will be adding .50 to this area precisely because the original probability question asked for the probability of selecting a score of 89 or higher. In this case, "higher" entails the whole upper half of the z distribution, which you will remember is equal to .50 (if you do not recall why this must be so, please stop now and review chapter 5, especially Figure 5.4).
We locate the entry for z = +.73, keeping in mind that we are working with z = -.73. Following the logic developed in the bottom curve shown in Figure 8.6, we need to know the proportion of the area (probability) under the curve falling
Determining Probabilities with the Binomial Distribution: An Overview 299
between z = +.73 and the mean. Looking in the center section of the first page of Table B.2, we learn that the area between the mean and z = +.73 is equal to .2673 (see the bottom curve in Figure 8.6). In symbolic terms, of course, we are describing the probability that an IQ score is greater than or equal to 89 but less than or equal to the mean IQ of 100, or p(89 ≤ X ≤ 100) = .2673. In turn, the probability of having an IQ score equal to or greater than 100 is p(X ≥ 100) = .50. (Please note that this fact can be verified by looking up the mean of 0.00, the first entry in the top left section of the second page of Table B.2, where the area beyond the mean, that is, z = 0.00, is .500.)
The probability of randomly selecting an individual who has an IQ score of 89 or greater can be known by adding the p(89 ≤ X ≤ 100) to p(X ≥ 100). Symbolically, this probability is
[8.27.1] p(X ≥ 89) = p(89 ≤ X ≤ 100) + p(X ≥ 100),
[8.27.2] p(X ≥ 89) = (.2673) + (.500),
[8.27.3] p(X ≥ 89) = .767.
Thus, the probability of sampling a person who has an IQ equal to 89 or greater is actu-
ally quite high. If you sampled 100 people, you would expect to find that about 77% of
them would have IQ scores at or above 89.
We will close this section of the chapter with a final example, one that relies on the
multiplication rule for independent events. Imagine that we are sampling with replace-
ment. What is the probability of selecting a person with an IQ less than or equal to 75,
followed by one with an IQ equal to or greater than 130? To simplify matters, we can use
the information we collected when working with the second example. There, we learned that the probability of selecting someone with a score at or below 75 is .0475, or p(X ≤ 75) = .0475. We know, too, that the likelihood of drawing an individual with an IQ of 130 or greater is .0228, or p(X ≥ 130) = .0228.
Using the multiplication rule for independent events, the probability of randomly selecting two people, one with an IQ less than or equal to 75 and one with an IQ equal to or above 130, is
[8.28.1] p(X ~ 75) then p(X ~ 130) = (.0475) X (.0228),
[8.28.2] p(X~ 75) thenp(X~ 130) = .0011.
There is litde likelihood that we would randomly sample two people in a row who fell in
the lower and upper tail, respectively, of the distribution of IQ scores. Indeed, there is
about a 1 in a thousand chance that we would do so!
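A quick sketch of the same computation, again using only the standard library (the `normal_cdf` helper is our own, not part of the text):

```python
import math

def normal_cdf(x, mu, sigma):
    # P(X <= x) for a normal distribution with mean mu and SD sigma
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 100, 15

p_low = normal_cdf(75, mu, sigma)          # p(X <= 75), about .0475
p_high = 1.0 - normal_cdf(130, mu, sigma)  # p(X >= 130), about .0228

# Multiplication rule for independent events (sampling with replacement)
p_joint = p_low * p_high                   # about .0011
```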
"Do you speak a foreign language fluently?" "Are you married?"), too, are binomial, as are any variables that can be categorized into two groups. One's gender, male or female, is obviously binomial because it naturally falls into one of two possible classes.

Other variables can become binomial through reclassification. Earlier in the chapter we mentioned the personality construct called self-monitoring (Snyder, 1987), noting that based on their scores on a self-monitoring scale, people can be classified as either high or low self-monitors (e.g., Snyder, 1974). That is, a score below a designated cutoff point (e.g., the median score) indicates one is best described as a low self-monitor, whereas someone scoring above the cutoff would be placed in the high self-monitoring group. As you can imagine, any personality or other measure with a continuum of possible scores (e.g., high versus low, present versus absent) can probably be divided into two discrete groupings (e.g., self-esteem, intelligence, reading ability, anxiety).
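A median split of this kind is easy to sketch in code. The scores below are made up for illustration; they are not drawn from Snyder's actual scale.

```python
# Hypothetical self-monitoring scores (illustrative values only)
scores = [12, 7, 15, 9, 14, 6, 11, 13, 8, 10]

scores_sorted = sorted(scores)
n = len(scores_sorted)
# Median of an even-sized sample: mean of the two middle scores
median = (scores_sorted[n // 2 - 1] + scores_sorted[n // 2]) / 2

# Reclassify the continuous scores into two discrete (binomial) groups
groups = ["high" if s > median else "low" for s in scores]
```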
When behavioral scientists work with binomial data, they typically know or can deter-
mine the probabilities associated with the two categories of the variable in question. To sim-
plify matters and to demonstrate the basic idea, let's return to a coin tossing example. When
tossing a fair coin, the probability of heads is the same as the probability of tails: p(heads) =
p(tails) = .500. When working with such binomial data, we usually want to know how
often each category is anticipated to occur across time-that is, across a number of trials
(here, coin tosses). What is the probability of getting 20 tails out of 30 flips of a fair coin?
Now, if we leave coin tosses for a moment, we can ask a similar binomial question regarding data based on some personality construct. What, for example, is the probability of finding 25 high self-monitors in a sample of 40 research participants? Alternatively, if I have a colony of lab rats, what is the probability that 35 out of 100 research animals will be abnormally aggressive? To answer questions like these, we can rely on the normal distribution, but we will do so in a way that enables us to determine probabilities with binomial data.
If the sample size is n = 2 and X is the number of tails, we can create a table of the possible outcomes and their accompanying probabilities. Table 8.3 illustrates the precise probabilities for the four possible combinations of outcomes. These probabilities were determined by using the multiplication rule for independent events, as well as the addition rule for mutually exclusive events. Using the multiplication rule, we know that the probability of tossing two tails in a row (i.e., T-T) is .250, as is the probability of tossing no tails (i.e., H-H) (see the upper portion of Table 8.3). We again use the multiplication rule to determine the likelihood of tossing one tail and one head in either order (see the lower portion of Table 8.3). In both cases, the probability is again .250, but this time we must use the addition rule for mutually exclusive events: two of the four possible outcomes have one tail, so these probabilities must be added together (i.e., p(T-H) + p(H-T) = (.250) + (.250) = .500). In short, the likelihood of obtaining one tail out of two tosses is .500. (Please note that the probabilities for the four possible binomial outcomes shown in Table 8.3 sum to 1.00.)

The binomial probabilities shown in Table 8.3 can be used to answer other questions, as well. What, for example, is the probability of tossing at least one tail in two tosses? To answer this question, we must take note of the three possible sequences in Table 8.3 that include at least one tail, and then sum the probabilities together. That is, p(T-T) + p(T-H) + p(H-T) = (.250) + (.250) + (.250) = .750. Do you see why the probability of tossing at least one tail on two coin flips is equal to .750? A tail occurs in three of the four possible binomial sequences; adding the probability of each independent sequence together gives .750 (see Table 8.3).
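The probabilities in Table 8.3 can be reproduced by brute-force enumeration; a minimal Python sketch:

```python
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses
outcomes = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]
p_each = 0.5 * 0.5                        # multiplication rule: .250 per sequence

# Addition rule for mutually exclusive sequences
p_one_tail = sum(p_each for o in outcomes if o.count("T") == 1)  # .500
p_at_least_one_tail = sum(p_each for o in outcomes if "T" in o)  # .750
```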
Relying on the same logic presented here, you could develop binomial distributions that would allow you to determine probabilities associated with 3, 5, 6, or even 10 tosses of a fair coin. If you did so, you would witness an important aspect of the binomial distribution, one that should not surprise you: As the size of the sample (n) increases, the binomial distribution becomes more normal in shape. Across 10 coin tosses, for example, the most likely outcome is 5 tails and 5 heads. In contrast, if the coin were fair, the probabilities associated with tossing 10 tails (or 10 heads) across 10 tosses would be very low indeed. Ten tails or 10 heads occurs in the tail of either side of the probability distribution, areas that represent the least likelihood of occurrence.

Think about it: A binomial distribution approximates a normal distribution because the highest probabilities occur in the center of the distribution (i.e., 5 tails and 5 heads); other binomial patterns become less frequent as you move toward either tail (i.e., 10 tails or 10 heads).

Approximating the Standard Normal Distribution with the Binomial Distribution
Before we can proceed with using the binomial distribution to approximate the normal distribution, we need to introduce a few more facts about the links between these distributions. We already know that as n increases in size, the binomial distribution will approximate the normal distribution very well. It turns out that the binomial distribution will fit the normal distribution best when pn and qn are both greater than or equal to a value of 10. When this requirement is met, the μ and the σ of the binomial distribution, respectively, can be found using the following formulas:

[8.31.1] μ of a binomial distribution = pn.
302 Chapter 8 Probability
[8.32.1] σ of a binomial distribution = √(npq).

Once μ and σ are known, any score X from the binomial distribution can be converted to a z score:

[8.33.1] z = (X − μ)/σ = (X − pn)/√(npq).
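Earlier we asked the probability of obtaining 20 tails in 30 flips of a fair coin. With these formulas, plus the true-limit correction discussed next, a sketch of the answer (the `normal_cdf` helper is our own stdlib stand-in for Table B.2):

```python
import math

def normal_cdf(x, mu, sigma):
    # P(X <= x) for a normal distribution with mean mu and SD sigma
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

p, q, n = 0.5, 0.5, 30              # fair coin, 30 flips
assert p * n >= 10 and q * n >= 10  # requirement for using the approximation

mu = p * n                   # [8.31.1]: mean = pn = 15
sigma = math.sqrt(n * p * q) # [8.32.1]: SD = sqrt(npq), about 2.74

# Probability of exactly 20 tails, using the true limits 19.5 and 20.5
p_20 = normal_cdf(20.5, mu, sigma) - normal_cdf(19.5, mu, sigma)  # about .028
```

The exact binomial answer, C(30, 20)(.5)^30 ≈ .028, agrees closely with this approximation.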
True limits are necessary when the normal distribution is used to approximate probabilities for the binomial distribution.

There is one important difference between the normal distribution and its binomial counterpart: the latter has gaps between observations. The normal distribution is continuous, so we need not worry that any gaps appear between observations along the continuum. Although the binomial distribution is generally a good fit to the normal distribution, its observations are discrete, not continuous; gaps exist. We deal with this minor discrepancy here in the same way we learned to do so when working with discrete data in any grouped frequency distribution: we rely on true limits (for a review, see page 95 in chapter 3). Thus, if we want to know the probability of getting a score of X = 18 on some measure, we set its boundaries as 17.5 and 18.5. We then find the proportion of the area under the curve (probability) falling between these two true limits. (As an aside, I hope you realize the cumulative nature of the material you have learned in the first eight chapters of this book. The material really builds on itself, and earlier ideas are central to understanding and working with later ones. Please do not hesitate to review any concepts that you cannot remember; what you read will mean more to you and, more importantly, you will retain it better if you take a few minutes to reread earlier sections of the book.)
By taking all the properties of the binomial distribution into account, we can now work through a complete example. Imagine that you are an industrial organizational psychologist who is interested in developing a screening measure for potential employees. This screening measure is designed to identify those persons who have a predilection for telling lies or being otherwise deceptive in their responses. The instrument contains 60 multiple choice questions, and each question has 3 possible answers. One of the three answers to each question is deemed to be a deceptive (i.e., lie-prone) response. You constructed the instrument so that a respondent is considered to be potentially untruthful if 25 deceptive responses out of 60 questions are endorsed.

If a respondent were just answering randomly, the chance of selecting a deceptive answer would be 1 out of 3 (i.e., p = 1/3 = .333) and the probability of selecting a nondeceptive response would be 2 out of 3 (i.e., q = 2/3 = .667). If we have 60 questions, then we can calculate both pn and qn to ascertain whether they are greater than or equal to 10, the aforementioned requirement that must be met before we can use the normal distribution.
[8.34.1] pn = (.333)(60) = 19.98,
[8.35.1] qn = (.667)(60) = 40.02.
Both pn and qn clearly exceed the criterion of 10, so we can proceed to the calculation of the population mean and standard deviation for this binomial distribution. We will use formulas [8.31.1] and [8.32.1], respectively, for determining μ and σ:

[8.31.2] μ of a binomial distribution = pn = (.333)(60) = 19.98.
(Please note that we already calculated this number earlier using [8.34.1]; however, we did so for a different reason.)

[8.32.2] σ of the binomial distribution = √(npq) = √((60)(.333)(.667)),
[8.32.3] σ of the binomial distribution = √13.33,
[8.32.4] σ of the binomial distribution = 3.65.
Thus, the distribution of 60 questions is normal, and it has a μ of 19.98 and a σ of 3.65. Our interest is to determine the area of the distribution where X = 25. As shown in the upper curve in Figure 8.7, X = 25 falls into the region bounded by the lower true limit of 24.5 and the upper true limit of 25.5. How do we determine the probability associated with the highlighted area in the upper curve shown in Figure 8.7? We simply calculate the z scores corresponding to 24.5 and 25.5 using [8.33.1], or

[8.36.1] z = (X − pn)/√(npq),
[8.36.2] z = (24.5 − 19.98)/3.65,
[8.36.3] z = 4.52/3.65,
[8.36.4] z = 1.24.
Figure 8.7 Binomial (Normal) Distribution Representing the Number of Deceptive Answers (X) on a 60-Item Employment Instrument
Note: The shaded portion in the upper and lower curves represents the probability of giving 25 (X) out of 60 deceptive answers. The score of 25 is bounded by the true limits 24.5 and 25.5, which correspond to z scores of +1.24 and +1.51 in the lower curve.
[8.37.1] z = (X − pn)/√(npq),
[8.37.2] z = (25.5 − 19.98)/3.65,
[8.37.3] z = 5.52/3.65,
[8.37.4] z = 1.51.
Please note that these two z scores have been plotted in the z distribution shown in the lower half of Figure 8.7. As you can see, we need to determine the proportion (probability) of the curve falling between these two boundaries. To do so, we need to (a) determine the area between the mean and z for each z score and then (b) calculate the amount of overlapping area between the two. That overlapping area can be calculated by subtracting the smaller proportion (probability) of z = 1.24 from the larger proportion (probability) of z = 1.51. Turning to Table B.2 in Appendix B, we find that the area between the mean and z for 1.24 is .3925 and the area between the mean and z for 1.51 is .4345. The difference between these two probabilities represents the likelihood of getting a deception score of 25 on the instrument, or .4345 − .3925 = .042. Thus, the probability of getting a lie score of 25 is very low (less than 5%), and if a respondent did obtain such a score, the industrial organizational psychologist would be appropriately concerned about extending that person a job offer in the corporation.
What is the probability of obtaining a lie score of less than 25 on the same deception scale used for applicant screening? The industrial organizational psychologist also needs to be aware of the relative likelihood of not being considered a deceptive applicant. If you look back at Figure 8.7, we are interested in the area under the curve that falls at or below z = 1.24. Why z = 1.24? Precisely because it is the z score of the lower true limit associated with a score of 25 on the lie scale, and the question cited above specifies a score less than 25.

Notice that we need to use the proportion of the curve falling between the mean and z, which we already determined was equal to .3925. To this, of course, we must add .50, the proportion (probability) of the curve falling at or below the mean of the z distribution (see the lower curve shown in Figure 8.7). Why? Because the question asked the probability of obtaining a score less than 25 on the deception scale. Thus, .3925 + .500 = .8925. In other words, the majority of people who take the deception scale are not likely to be identified as potential liars; indeed, over 89% will "pass" the test without ever raising concern about the nature of the answers they provided on it.
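The full deception-scale example can be verified in a few lines. The `normal_cdf` helper is our own stdlib stand-in for Table B.2; using the exact fractions 1/3 and 2/3 (rather than the text's rounded .333 and .667) shifts the third decimal place slightly.

```python
import math

def normal_cdf(x, mu, sigma):
    # P(X <= x) for a normal distribution with mean mu and SD sigma
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

p, q, n = 1/3, 2/3, 60
mu = p * n                    # 20.0 (the text's rounded p gives 19.98)
sigma = math.sqrt(n * p * q)  # about 3.65

# p(X = 25): area between the true limits 24.5 and 25.5
p_25 = normal_cdf(25.5, mu, sigma) - normal_cdf(24.5, mu, sigma)

# p(X < 25): everything at or below the lower true limit of 24.5
p_below_25 = normal_cdf(24.5, mu, sigma)
```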
Knowledge Base
1. A personality test has a μ of 50 and a σ of 10. What is the probability of obtaining a score of 62 or higher on the test?
2. Using the parameters presented in question 1, what is the probability of obtaining a score less than or equal to 40 or greater than or equal to 55?
3. When can the normal distribution be used to approximate the binomial distribution?
4. There are 50 multiple choice questions on a quiz, and each question has four possible responses. What is the probability that a person would get 20 questions right just by guessing?

Answers
1. p(X ≥ 62) = .1151
2. p(X ≤ 40) + p(X ≥ 55) = (.1587) + (.3085) = .467
3. When both pn and qn are greater than or equal to 10.
4. p = .25, q = .75; pn and qn ≥ 10; μ = 12.5, σ = 3.06; p(X = 20) = .0065
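The Knowledge Base answers can be checked numerically with a short stdlib sketch (the `normal_cdf` helper is ours, standing in for Table B.2):

```python
import math

def normal_cdf(x, mu, sigma):
    # P(X <= x) for a normal distribution with mean mu and SD sigma
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Question 1: personality test with mu = 50, sigma = 10
p_q1 = 1.0 - normal_cdf(62, 50, 10)                             # about .1151

# Question 2: p(X <= 40) + p(X >= 55)
p_q2 = normal_cdf(40, 50, 10) + (1.0 - normal_cdf(55, 50, 10))  # about .467

# Question 4: 50 four-choice questions answered by guessing
p, q, n = 0.25, 0.75, 50
mu, sigma = p * n, math.sqrt(n * p * q)  # 12.5 and about 3.06
# p(X = 20), using the true limits 19.5 and 20.5
p_q4 = normal_cdf(20.5, mu, sigma) - normal_cdf(19.5, mu, sigma)  # about .0065
```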
Every hypothesis and each statistic used to evaluate or test its efficacy is accompanied
by what is called a probability value or p value. The role p values play in hypothesis
testing will be conceptually introduced here-their actual use will be taken up in the
next several chapters.
Any p value helps a researcher determine whether the results obtained in an experiment or other research investigation deviate from a conservative expectation of no difference. Thus, the baseline expectation in any research venture is that the independent variable will create no observed difference between an experimental and a control group. We use probability in the guise of a p value to gauge the likelihood that a favored research hypothesis defies this conservative expectation, that a difference between
groups is detected. What is the likelihood that a difference between groups is actually found? How do we know that the observed difference is actually due to an independent variable (i.e., a systematic factor) rather than chance (i.e., random, uncontrollable factors)?
When a p value is linked to a test statistic, the p value indicates the observed rela-
tionship between some independent variable and its influence on a dependent measure.
Specifically, the p value reveals the degree of the likelihood that the test statistic detected
any difference between the two (or more) groups represented by different levels of the
independent variable (e.g., the mean level of behavior in one differed from the mean be-
havior observed in a second group). In the behavioral sciences, the convention is to rely
on a p value equal to or lower than .05. A p value of .05 (read as "point oh-five") tells an
investigator that there is a 95% chance that an observed difference is real-that is, the in-
dependent variable presumably caused the observed difference between group perfor-
mances on the dependent measure. At the same time, the p value reminds us that there
is a probability of 5% (or less) that the observed difference is due to chance factors-
random variation in the sample-and not the effect of the independent variable on the
dependent measure.
To be more conservative in the sense of making it more difficult to detect a differ-
ence between two groups, one can use a p value of .01 (read as "point-oh-one"). A
p value equal to .01 means that there is a 99% chance that a test statistic is identifying a
reliable effect and only a 1% chance that the difference is due to influences beyond an ex-
perimenter's control. Reducing the level of a p value makes it harder to detect a differ-
ence between groups, so that when a difference is detected under these strenuous condi-
tions, we are reassured that it is apt to be a reliable one. One can even rely on still lower
p values, a matter we will discuss later in this book.
For the time being, however, I want you to think about how probability can help in-
vestigators to determine the likelihood of obtaining particular research results. Proba-
bility helps researchers decide whether a particular event is likely to occur due to some
intervention or the degree to which a treatment can be seen as actually creating verifiable
change in behavior.
could be expected to perform above average, the criterion for inclusion was deemed to be too low..."

Naturally, readers should be informed if a particular type of probabilistic relationship is reported. One might write that, "The marginal probability of being male in the sample was .54" or that the "joint probability of being both male and a fraternity member was equal to .26." When writing about a conditional probability, I would err on the side of caution. In addition to reporting the actual probability value, I would also be sure to describe the conditional quality of the relationship among the variables: "Given that we knew that a respondent was male, the probability that he was also an active member of a fraternity chapter was .64."
When reporting several probabilities at once, it is a very good idea to place them into a well-labeled table. By using a table, you can describe the main points or gist of the probability results in the text and then direct readers to consider examining specific probabilities in the table. If you try to report more than a handful of probabilities in the text, readers will become bored or distracted; they will skip the presumably important probability information and go on to other material. In fact, reading about more than a few probabilities at one point in time is boring, as the tendency is to present them in a "sing-song" sort of fashion (e.g., "The probability of being a female was .46, while the probability of being female and on the Dean's list was .24. On the other hand, the probability of ..."). Like more than one or two correlational relationships, multiple probabilities should be placed in a table so they can be examined or referred to when a reader's pace or level of interest allows. Only the most important or otherwise salient probabilistic relationships need to be described in the actual text.

A final suggestion: When you have occasion to write about probability several times in the same paper, try to break up the descriptive monotony for the reader (and yourself as writer) by using probabilistic synonyms such as "likelihood," "expectation," or even "chance." It may seem to be a minor point, but repeated use of the word "probability" can grow tiresome; judicious use of synonyms will spice up the prose and maintain reader interest.
are the initial directions:

1. Flip the coin 30 times and record the results of each trial (using H for "heads" and T for "tails") in the space provided in Table 8.4. Spin the coin 30 times and record the results (again, H or T) in the space provided in Table 8.4. Be sure to total the number of "heads" for the respective categories at the bottom of Table 8.4.

[Table 8.4: rows numbered 1 through 30, with space to record H or T for each flip and each spin, followed by lines for "Total number of heads by flipping ___" and "Total number of heads by spinning ___"]

2. Be prepared to hand in the results shown in Table 8.4, as you will need to combine your results with those found by your classmates. For convenience, your instructor may want to pool the class results and distribute a master data set to everyone in your class (see Figure 8.8).

3. Create a simple matrix with four columns (see Figure 8.8). The number of rows should correspond to the number of students (i.e., 1 to N) who share the results of their 30 flips and 30 spins from Table 8.4. The information in the columns of Figure 8.8 should contain the following information in this order: number of "heads" obtained by flipping; total number of flips; number of "heads" obtained by spinning; and the total number of spins (see Figure 8.8). Create a master record like the one shown in Figure 8.8 and then make the appropriate number of copies for all students who collected data.
[Figure 8.8: a sample master record. Each row lists one student's results in four columns: number of heads by flipping, total flips, number of heads by spinning, and total spins (e.g., "1. Jane Doe 10 30 14 30"), continuing through row N.]

[Figure 8.9: a plot of the cumulative proportion of "heads" (Y axis, from 0 to 1.00) against the cumulative number of tosses or spins (X axis, from 30 to 270).]
4. Every student should have flipped a coin 30 times, as well as indicated the number of heads that appeared in 30 flips. Using the results from the N students, compute the cumulative proportion of "heads" (Y) and plot this value versus the total number of flips (X) (see Figure 8.9). Imagine that your class had three students, for example: if the first student obtained 10 heads in 30 flips, the second had 16 heads in 30 flips, and the third had 14 heads across 30 flips, then the cumulative proportions would be 10/30, 26/60, and 40/90, respectively. These proportions (probabilities) would then be plotted on the Y axis versus the number of flips (e.g., 30, 60, and 90) shown in Figure 8.9. What does your plot of these data reveal about the probability of observing heads as you flip a coin across an increasing number of trials?
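The cumulative-proportion bookkeeping in step 4 can be sketched as follows, using the three hypothetical students' results from the text:

```python
# Three hypothetical students' results: (heads, flips) per student
results = [(10, 30), (16, 30), (14, 30)]

cumulative_props = []
heads_so_far = flips_so_far = 0
for heads, flips in results:
    heads_so_far += heads
    flips_so_far += flips
    # Running proportion of heads across all flips seen so far
    cumulative_props.append(heads_so_far / flips_so_far)
# cumulative_props holds 10/30, 26/60, and 40/90
```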
5. Using the spin data from Table 8.4, repeat the steps presented in question 4. Com-
pare the plot of the spin data to the plot of the flip data. Are these plots similar? How
so? How would you characterize the proportion (probability) of "heads" as the N of
tosses or spins increases?
6. In your class, everyone flipped a coin 30 times and then spun a coin 30 times. Ex-
plain the advantages and disadvantages of this research design as compared to a
study where (a) each student is randomly assigned to either flip a coin 30 times or
spin a coin 30 times and (b) everyone is randomly assigned to either flip a coin 30
times and then spin a coin 30 times or to spin a coin 30 times and then to flip a coin
30 times.
Summary

1. A probability is a quantitative statement about the likelihood that some event will occur in the future. Probabilities range between 0 and 1.00, and higher probabilities indicate a relatively greater likelihood that some event will take place in the long run.
2. Probabilities usually entail examining the ratio of the number of actual occurrences of some event to the total number of possible occurrences. Probability is also used to determine the likelihood that a sample is drawn from one population rather than another. To make such judgments, probability is linked with inferential statistics.
3. The gambler's fallacy refers to the erroneous belief that one randomly determined event will influence the outcome of another randomly determined event. Examples of the gambler's fallacy include any game of chance where players assume that a run of one event must necessarily change to another event (e.g., a string of tossing "heads" means that "tails" must occur next).
4. Independence refers to situations where the probability of one event remains unaffected by the probability of some other event. When flipping a coin, for example, a "head" is just as likely to be followed by another "head" as it is a "tail."
5. Classical probability theory calculates the probability of some event A to be p(A) = the number of outcomes favoring event A / the total number of available outcomes (i.e., A + not A).
6. Sampling with replacement entails selecting an observation from a sample, determining the probability, and then returning the observation before drawing the next observation. When sampling with replacement, the probabilities associated with a given sample remain constant. Sampling without replacement follows the same process except that the observation is never returned to the sample; thus, the numerator(s) and denominator(s) in subsequent probability calculations must be adjusted to reflect the removal of observations from the sample.
7. Proportion and probability are related concepts; indeed, both are calculated the same way. Thus, the area under the normal curve can be discussed interchangeably as proportion or probability information. Standard or z scores can be used to determine various proportions (probabilities) associated with various areas under the normal curve.
8. Two events are mutually exclusive from one another when they share no observations in common (e.g., a person is either male or female). Two events are nonmutually exclusive when they can occur simultaneously (e.g., a person can be male and a Yankees fan). Two rules for probability, the addition rule and the multiplication rule, were introduced, and variations of each rule were applied to mutually exclusive and not mutually exclusive events.
9. A joint probability is the numerical likelihood of selecting some observation that shares two characteristics or events (i.e., p(A and B)). A conditional probability occurs when the probability of one event is dependent upon the role of another event (e.g., given that a student is a freshman, what is the probability he will be placed on academic probation, or p(A|B)). Finally, a marginal or "unconditional" probability provides the likelihood of some independent event's occurrence.
10. Binomial data occur when information can be divided into or changed into two discrete groupings. A binomial distribution is one wherein events can have one of two possible outcomes. When particular conditions are met, the binomial distribution can approximate the standard normal distribution.
11. Probability or p values were briefly introduced. These p values will become increasingly important as we review inferential statistical tests in subsequent chapters. For present purposes, p values were described as guides to determining whether some inferential statistical test actually found that the independent variable created some observed change in the dependent measure.
12. When reporting probabilistic information, the numerical value should be explained in concrete terms; it is not sufficient to rely exclusively on numbers, as readers require a context for understanding results. When several probabilities are presented, they should appear in a table, whereas three or fewer probabilities can probably be discussed within the text. Judicious use of synonyms for the word "probability" will retain readers' attention, while overuse of the latter term can quickly become monotonous.
Key Terms
Addition rule (p. 285) "Man-who" statistics (p.278) Probability value (p. 306)
Binomial distribution (p. 299) Marginal probability (p. 292) Sample space (p. 277)
Conditional probability (p.290) Multiplication rule (p. 287) Sampling with replacement (p.279)
Gambler's fallacy (p. 275) Multiplication rule for dependent Sampling without replacement (p. 280)
Independence (p.276) events (p.293) Subjective probability (p. 274)
Law of averages (p. 276) Mutually exclusive (p.286)
Joint probability (p.285) p value (p. 306)
Chapter Problems
l. What are some examples of the sort of probability judg- know that I'm due to win soon!" Characterize the inherent
ments you make in daily life? Even if they are difficult to flaws in Steve's reasoning. What do statistician's call this sort
quantify, can they still be probability judgments? Why or of thinking and behavior?
why not? 4. A sociologist presents data on the average life span of male
2. You are standing outside on a dark and cloudy day. You and female adults in the United States, noting than the aver-
remark to a friend, "I am fairly certain that it is going to age age of death for men is about 72 years and that for
rain:' Why is your remark best described as a subjective women is about 74. A student raises her hand and says she
probability? cannot believe those statistics because "my grandmother is 92
3. Steve is playing a slot machine in one of the major hotels in and my next door neighbor, Mr. Smith, is 97!" What sort of
Las Vegas. His friend, Paul, tries to convince him to leave it statistic is the student citing? Why is it problematic? What
and join other friends in the main casino. Steve refuses, noting should the sociologist say in response?
that "I've spent 20 bucks on this thing so far-I've primed it. I 5. When flipping a coin and obtaining a string of either "heads"
312 Chapter 8 Probability
or "tails;' why do some people assume that they can accu- 16. A social psychologist studies helping behavior when a person
rately predict what the next flip will be? is either alone or when others are present. She performs a
6. Why isn't one flip of a coin, say a "head," affected by the flip study in the field-an older woman appears to fall and hurt
that occurred just before it? Does chance "correct" itself? Why her leg. Sometimes only one participant witnesses the event,
or why not? other people besides the participant are present the remain-
7. In your opinion, what does the quote from Stanislaw Lem on der of the time. Examine the following data table and then
page 277 mean about the role probability plays where human answer the questions that follow it.
judgments and decisions are concerned?
Help Given No Help Help Given
8. A sack contains 30 blue marbles, 10 green ones, and 20 red
Participant to Confederate to Confederate
ones. Using sampling with replacement, what is the probabil-
ity of selecting a green marble? A red marble? Two blue mar- Alone 30 8
bles in a row? A green, then a red, and then a blue marble? With others 6 42
9. Using sampling without replacement, provide the probabili-
ties listed in question 8. What is the probability a person waiting alone offered to help
10. An upper-level seminar class offered by the Psychology the confederate? What is the probability that a someone wait-
Department enrolls 15 students, 8 of whom are juniors and the ing with others did not offer to help the confederate? Given
rest are seniors. Six of the juniors are female and 3 of the se- that a person was with others, what is the probability that the
niors are female. The remaining students are males. What is the person offered to help the confederate?
probability of being a male in this class? A female? What is the 17. Using the data provided in question 16, show how the multi-
probability of being a junior and female? A senior and male? plication rule for dependent events can be used to calculate
11. Examine the following frequency distribution: the joint probability of being alone and not offering to help
the confederate.
x f 18. An intelligence test has a Ii of 100 and a u of IS. What is the
12 8 probability of obtaining a score of 103 or higher? What is the
11 10 probability of obtaining a score of 85 or lower? What is
8 11 the probability of obtaining a score less than or equal to 95
6 10 and greater than or equal to lIS?
5 6 19. A measure of romantic attraction has a Ii of 75 and a u of 8.
4 3 What is the probability of obtaining a score between 76 and
3 5 82? What is the probability of obtaining a score greater than
or equal to 90? What is the probability of obtaining a score of
Determine the following probabilities: p(X = 4); p(X = 11);
less than 50?
p(X> 5); p(X < 3); p(X;::: 8); p(X S 8).
20. In what ways is the binomial distribution similar to the stan-
12. Examine the following frequency distribution:
dard normal distribution? How is the former different than
x f the latter? Under what specific conditions can the binomial
distribution be used to approximate the standard normal dis-
15 7
tribution?
12 13
21. A multiple-choice test has 100 questions, and four possible
7 10
responses to each question. Only one out of each of the four
3 8
responses is correct. If a respondent is just guessing, what is
2 6
the probability of getting 48 questions correct by chance?
Determine the following probabilities: p(X = 3); p(X = 15); 22. A true-false quiz has 50 questions, and a student needs to get
p(X> 7); p(X < 15); p(X;::: 5); p(X S 3). 30 of them correct to pass. What is the probability of earning
13. A sack contains 13 black marbles, 16 white marbles, 4 pink a passing grade by guessing?
marbles, and 8 pink and white marbles. What is the probabil- 23. A statistician calculates the probability that an event will
ity of selecting a marble that is black? What is the probability occur to be - .35. How likely is it that this event will occur?
of selecting a marble that is pink? What is the probability of (Hint: Use the decision tree(s) at the start of the chapter to
selecting a pink or a white marble? What is the probability of answer this question properly.)
selecting a marble that is pink and white? 24. A researcher calculates 20 separate probabilities to include in
14. You are flipping a coin. What is the probability of obtaining a research report. How should these probabilities be pre-
each of the following sequences: H-H-H-T-H-H-T; sented to readers? (Hint: Use the decision tree(s) at the start
H-H-H;T-T-T-T-T-T-T-T;H-H-T-T-H. of the chapter to answer this question properly.)
IS. Which of the following distributions of coin tosses is more 25. A students wants to calculate the likelihood that several events
likely to occur than any of the others: H-H-H-T-T-T or will occur but she does not know the number of possible ob-
H-H-H-H-H-H orT-H-T-T-H-H? servations favoring each event. Can she still calculate the
Chapter Problems 313
probabilities? (Hint: Use the decision tree(s) at the start of the demonstration? (Hint: Use the decision tree(s) at the start of
chapter to answer this question properly.) the chapter to answer this question properly.)
26. A statistics instructor wants to demonstrate probability theory 27. Which probability rule is appropriate for each of the follow-
to his class. He dumps 10 red marbles, then 5 green marbles, ing situations (Hint: Use the decision tree(s) at the start of the
and then 40 yellow marbles into a can. He wants to explain to chapter to answer this question properly):
his class the likelihood that if he reaches into the can he will se- a. Events are conditional on one another.
lect a yellow marble. Is he ready to demonstrate the probability b. Events occur in an independent sequence.
of selecting a yellow marble? Why or why not? Properly speak- c. Events are mutually exclusive.
ing, does he need to do anything else in order to prepare this d. Events are not mutually exclusive.
Performing a One-Sample Hypothesis Test

1. Is there more than one separate sample available?
   If yes, then a one-sample hypothesis test is not appropriate; consult a later chapter (e.g., chapters 10, 11, 12, or 13) for guidance.
   If no, then go to step 2.
2. Is there information regarding population parameters (especially σ, but perhaps μ or p as well)?
   If yes, then go to step 3.
   If no, then you cannot perform a one-sample hypothesis test.
3. Will a two-tailed significance test be used to test the null hypothesis?
   If yes, then go to step 4.
   If no, then provide a clear rationale for using a one-tailed test before proceeding to step 4.
4. Is the significance level for the statistical test appropriately conservative (i.e., p = .05 or less)?
   If yes, then consider whether the significance level is sufficient to guard against Type I errors; go to step 5.
   If no, then consider lowering the significance level for the test while reviewing its potential impact on the test's power; go to step 5.
5. Is the value of the test statistic (z or t) greater than or equal to an obtained critical value?
   If yes, then reject the null hypothesis of no difference and interpret the result.
   If no, then accept the null hypothesis of no difference and interpret the result.
Enhancing a Study's Power

1. Will the sample size be reasonably large (i.e., ≥ 30)?
   If yes, then go to step 2.
   If no, then recruit more participants before beginning data collection.
2. Is the dependent measure sensitive enough to reveal differences between groups?
   If yes, then go to step 3.
   If no, then consider using a more established dependent measure; go to step 3.
3. Is there evidence to indicate that the independent variable will be salient to research participants?
   If yes, then go to step 4.
   If no, then consider verifying the independent variable's effectiveness in a pilot study before actual data collection begins; go to step 4.
4. Can a one-tailed significance test be used?
   If yes, then be sure to balance its use against the threat posed by Type I errors; go to step 5.
   If no, then go to step 5.
5. Will a conventional significance level (.05) be appropriate for the statistical test used to analyze the data?
   If yes, then be sure to balance its use against the threat posed by Type I errors.
   If no, then be advised that more conservative significance levels (e.g., .01, .001) restrict power.
CHAPTER 9
HYPOTHESIS TESTING

Chapter Outline
• … the Standard Error of the Mean
• Data Box 9.B: Standard Error as an Index of Stability and Reliability
• Knowledge Base
• Data Box 9.C: Representing Standard Error Graphically
• Asking and Testing Focused Questions: Conceptual Hypotheses
• Data Box 9.D: What Constitutes a Good Hypothesis?
• Directional and Nondirectional Hypotheses
• The Null and the Experimental Hypothesis
• Statistical Significance: A … Account
• Data Box 9.E: Distinguishing … Statistical and …
• Critical Values: Establishing … for Rejecting the Null
• One- and Two-Tailed Tests
• Degrees of Freedom

The first eight chapters in this book were a necessary preamble to prepare readers for this chapter. These earlier chapters introduced and explained a variety of statistical theories and concepts, which will render this chapter's material, the logic underlying the testing of hypotheses, comprehensible and doable. Your understanding of sampling, populations, and probability will all be brought to bear in the exploration of hypotheses, which were previously defined as testable questions or focused predictions. Once a hypothesis is identified, the research is executed, and the data are collected, inferential statistics are used to test the viability of the hypothesis; that is, was an anticipated relationship found within the data?

Here are a few examples of the sorts of hypotheses behavioral scientists investigate:

• …a variety of public and social policy issues. The political psychologist wants to determine if fourth-year students are more likely to vote for liberal candidates, while first-year students will tend to endorse conservative candidates.
• A health psychologist believes that middle-aged individuals who care for elderly parents are at greater risk for illness than similarly aged persons with no caregiver responsibilities. The investigator interviews the two sets of adults and then gains permission to examine their medical records at the end of a 1-year period, hypothesizing that the caregiver group will show more frequent illnesses, visits to the doctor, hospitalizations, and medicine prescriptions than the noncaregiver group.
• An experimental psychologist is interested in the role that zinc, a nutrient in a normal daily diet, plays in learning and memory processes. The psychologist exposes two groups of laboratory rats, a control group of "normal" animals and an experimental group of nutritionally zinc-deprived animals, to a novel maze. After a few baseline trials, the animals are individually returned to the maze and the number of errors made prior to the discovery of a food reward is recorded. The psychologist wants to demonstrate that the zinc-deprived group will show a higher average number of search errors relative to the control group.
In each of these four scenarios, the procedures involved in testing the respective hypotheses are similar. None of these researchers can ever hope to measure the responses of every possible respondent in their population of interest, so such data are usually collected in the form of some random sample, one that is presumed to be representative of a larger population. This random sample is then generally randomly divided into two (or, on occasion, more) distinct groups: a control group and an experimental group. Each group is then exposed to one level of the independent variable, the variable manipulated by or under the control of the investigator. As suggested above, not every variable of interest can be manipulated (e.g., time to acquire sociopolitical sophistication, the stress of caregiving); some are naturally occurring, but their effect on the members of a distinct subgroup can still be examined. But in every case, the collected data are the participants' responses to a dependent variable, usually some verbal or behavioral measure, an identical version of which is presented to each group in each of the respective studies.

Hypothesis testing entails comparing the groups' reactions to the dependent measure following the introduction of the independent variable. The practical matter is this: Did the independent variable create an observed and systematic change in the dependent measure? Specifically, did the experimental group behave or respond differently than the control group after both were exposed to the independent variable? The theoretical matter is this: Following exposure to the independent variable, is the μ of the experimental group verifiably different from the μ of the control group? In other words, can we attribute the differential and measured between-group differences to the fact that the control and experimental groups now effectively represent different populations with different parameters? We turn now to the importance of establishing a link among the questions researchers pose, the samples they draw, and issues of estimation and experimentation.
…drawn from this population. The investigator hopes that the sample is representative of the population from which it was drawn, that the typical or average behavior witnessed in the sample reflects what is usually true of the population. In chapter 1, we learned that the first question to ask of any sample is whether its sample statistics (i.e., the sample mean and standard deviation) are similar to those of the population's parameters (i.e., the population mean and standard deviation). Thus, our first matter of concern is the role of inferential statistics in the estimation of population parameters.
Point Estimation

Can we characterize the parameters of a population based on a single sample? Yes, we can (or we can at least try), and when we do so, we are engaging in what statisticians refer to as point estimation.

KEY TERM Point estimation is the process of using a sample statistic (e.g., X̄, s) to estimate the value of some population parameter (e.g., μ, σ).

Anytime we calculate the mean of some sample of data in preparation for use in some statistical test, we are also, in effect, asking whether that mean (X̄) is close to or equal in value to the μ of the population from which it came (to refresh your memory on the link between sample statistics and population parameters, see Table 4.1). Indeed, we truly hope that X̄ = μ.
The obvious drawback to point estimation is that only one sample is used to estimate the characteristics of a population that could very well be infinite in size and scope. To phrase the problem in the language you acquired in the last chapter, what is the probability that one sample statistic is going to provide an adequate estimate of a given population parameter? The chances that a sample statistic will closely match a population parameter are few and far between; one sample is simply not sufficient, though it is often all that a behavioral scientist has to go on. We assume, then, that any given sample statistic is apt to contain some degree of error (i.e., the difference between estimated and actual reality), and there is little we can do to improve the situation besides using random sampling and ensuring that the size of the sample is reasonably large. This form of error, called sampling error, was introduced in chapter 2, where we acknowledged that there will always be some degree of sampling error. The question is, how much sampling error is reasonable or acceptable?
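The gap between a single sample mean and μ can be made concrete with a quick simulation. This is a minimal sketch, not from the text: the population of 100,000 scores (with μ near 100 and σ near 15, IQ-like values), the seed, and the sample size of 50 are all illustrative choices.

```python
import random
import statistics

random.seed(7)

# Hypothetical population of 100,000 roughly normal scores (mu ~ 100, sigma ~ 15).
population = [random.gauss(100, 15) for _ in range(100_000)]
mu = statistics.mean(population)

# Point estimation: one random sample's mean stands in for mu.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

# Sampling error is the difference between the estimate and the parameter.
sampling_error = x_bar - mu
print(round(mu, 2), round(x_bar, 2), round(sampling_error, 2))
```

Rerunning with a different seed changes x_bar but not μ, which is the point of the passage: a single X̄ rarely equals μ exactly, and the mismatch is sampling error.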
Overcoming point estimation's limitations can be achieved through a somewhat more laborious process known as interval estimation. Where point estimation focuses on comparing one statistic and one parameter, as it were, interval estimation relies on repeated sampling and repeated generation of statistics in order to characterize a population of interest, as well as some parameter (usually μ). The repeated sampling, though, is for the express purpose of examining the amount of variability observed to occur among the sample statistics.

KEY TERM Interval estimation involves careful examination and estimation of the variability noted among sample statistics based on the repeated sampling of the same population.

Interval estimation provides a range within which sample statistics can fall by allowing behavioral scientists to project by how much these statistics would be expected to vary through repeated sampling of the population. If repeated sampling reveals sample X̄s of similar value, then a researcher can be reasonably confident that any given sample is representative of the population. If the sampling process reveals sample X̄s that vary somewhat from one another in value (in other words, there is some degree of variability among the observed means), then the researcher will want to report the sampling error around the mean of the sample means.
318 Chapter 9 Inferential Statistics: Sampling Distributions and Hypothesis Testing
You probably do not realize it, but you frequently hear or read about interval estimates when public opinion surveys are reported. When a scientific study reports that "62% of Americans favor shoring up the Social Security program," the written text or televised voice-over will also typically note that the study is "accurate to within 3 percentage points." This accuracy reading refers to the fact that not every American was actually polled, so that "majority" opinion actually falls within ±3 percentage points, or between 59% and 65%. This interval estimate, then, provides an adequate cushion or set of boundaries, really, for any sampling error in measurement. We will discuss how to calculate a similar sort of interval later in the chapter.
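A cushion like that ±3-point report can be reproduced with the normal-approximation interval for a proportion. This is a sketch under stated assumptions: the sample size of 1,000 and the 95% z value of 1.96 are hypothetical, since the text does not report the poll's actual n or confidence level.

```python
import math

def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a sample proportion (normal approximation)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical poll: 62% of roughly 1,000 respondents favor the program.
low, high = proportion_interval(0.62, 1000)
print(round(low * 100, 1), round(high * 100, 1))  # roughly 59.0 to 65.0
```

With n = 1,000 the margin works out to about 3 percentage points, matching the "accurate to within 3 percentage points" phrasing used in such reports.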
KEY TERM Hypothesis testing compares sample data and statistics to known or estimated population pa-
rameters.
Figure 9.1 The Process of Sampling and Inferring Whether a Statistic Is from One or Another Population. The figure poses two questions: Is the sample mean of the control group equal to the population μ? Is the sample mean of the experimental group unequal (≠) to the original population μ (i.e., is the experimental group's X̄ from some other population with a different μ)?
…representative sampling. This "as if" hypothesizing enables behavioral scientists to evaluate the strength of hypotheses in an indirect manner; that is, given what the sample data show, how likely is it that the same effects hold true in the population at large?
This review of hypothesis testing and point estimation is a conceptual introduction.
We will learn their practical side-how to calculate and apply them-later in the chap-
ter. No doubt some of the material reviewed so far can sometimes seem redundant, as we
discussed much of it earlier in the book. Some ideas merit repetition, however, and I
think that later you will agree that it is better to err on the side of repetition to ensure
understanding than to assume that understanding already exists or that your memory of
the eight previous chapters is flawless. We turn now to some critical theoretical topics
concerning sampling, probability, and estimating the characteristics of unknown
populations .
…composed of different respondents, they also possess a different X̄ and s). What if we continued to draw samples of the same fixed sample size N? Could we begin to discern any pattern or shared characteristics among the samples? Yes, and in fact, we could even make some predictions based on all of these sample means. If we did so, we would be using what is known as the distribution of sample means.

KEY TERM A distribution of sample means is a group or collection of sample means based on random samples of a fixed size N from some population.

Theoretically, the distribution of sample means can be exhaustive (that is, all possible sample means based on an infinite number of samples), but in a conceptually more manageable way, it is possible to think of this distribution as being based on a very large number of samples.
Sampling distributions are different than frequency distributions, as they are comprised of an array of a single sample statistic (e.g., X̄), not raw scores.

Please take careful notice of one important aspect of the distribution of sample means: It is a distribution comprised of sample statistics (here, X̄s) rather than individual scores or observations, a clear departure from the familiar sorts of frequency distributions we have encountered previously. This distinction is an important one to keep in mind as you continue reading.

Any time a researcher works with a distribution that is based on some sample statistic like the mean or the standard deviation, such a distribution is labeled a sampling distribution.

KEY TERM A sampling distribution is a distribution comprised of statistics (e.g., X̄, s) based on samples of some fixed size N drawn from a larger population.

The distribution of sample means, then, is a sampling distribution, and there is a theoretical sampling distribution for every statistic that exists. Besides one for means, then, there is one for variances, standard deviations, correlations, proportions or probabilities, and so on. Any statistic that can be calculated from some sample of data has its own sampling distribution.
What would the sampling distribution of means look like if we plotted it? That is, what shape would its distribution adopt or take on? Given its ubiquity in our discussions in this book, it will come as no shock that a sampling distribution of means of some fixed size N will take on the shape of the standard normal distribution (see Figure 9.2). As shown in Figure 9.2, most of the sample means will be similar in value to one another; they will cluster under the bell in the normal curve (for a review of the normal distribution and its properties, see chapter 5). A few stray or aberrant sample means will fill out the tails of the normally shaped sampling distribution (see Figure 9.2). What will probably surprise you is the enormous importance the bell-shaped assumption for any sampling distribution takes on in inferential statistics, leading us to carefully consider what is called the central limit theorem in the next section of the chapter. Before we proceed to review this critical theorem, however, let's pause for a moment to consider two important concepts of any sampling distribution, expected value and standard error.
Figure 9.2 A Sampling Distribution Created by Repeated Sampling (Fixed Sample Size N) of a Population
KEY TERM The mean of any sampling distribution of a sample statistic is referred to as the sampling distribution's expected value.

Thus, the mean of a sampling distribution of means is called the expected value of the sampling distribution of the mean, which is symbolically known as μ_X̄. The formula for calculating μ_X̄ is:

[9.1.1]  μ_X̄ = ΣX̄ / N_k,

where the expected value of the sample mean can be known by summing all the sample means (ΣX̄) and then dividing them by the number of samples (N_k), not the size of the samples. Please be aware that formula [9.1.1] is meant to conceptually present how μ_X̄ could be known. Given that most sampling distributions of means are presumably composed of an infinite number of samples, we will not actually be using [9.1.1] to perform any calculations.
The standard deviation of any sampling distribution of sample statistics is also known by a particular name, the standard error.

KEY TERM The standard deviation of any sampling distribution of a sample statistic is referred to as the sampling distribution's standard error.

When working with a sampling distribution of means, then, the standard deviation of that distribution would be known as the standard error of the mean. The symbol for the standard error of the mean is σ_X̄. Please note that σ_X̄ does not mean the same thing as σ, the symbol for the standard deviation of a population. The standard error of the mean can be calculated using:

[9.2.1]  σ_X̄ = √(σ²/N) = σ/√N.
Besides the sample size (N), this formula assumes that the population variance (σ²) or the population standard deviation (σ) is known.
We will learn more about the expected value of the mean and its standard error
shortly. We now turn to a review of the central limit theorem and its implications for
sampling, hypothesis testing, and inferential statistics.
KEY TERM The central limit theorem proposes that as the size of any sample, N, becomes infinitely large in size, the shape of the sampling distribution of the mean approaches normality; that is, it takes on the appearance of the familiar bell-shaped curve, with a mean equal to the population mean μ and a standard deviation equal to σ/√N, which is known as the standard error of the mean. As N increases in size, the standard error of the mean, or σ_X̄, will decrease in magnitude, indicating that the sample mean will be close in value to the actual population μ. Thus, it will also be true that μ_X̄ = μ and that σ_X̄ = σ/√N.
Two main points stand out in the definition of the central limit theorem:
• Despite a parent population's shape, mean, or standard deviation, the central limit theorem can be used to describe any distribution of sample means. Thus, a population does not have to be normally distributed in order for the central limit theorem to be true. The central limit theorem, then, can be applied to any population as long as samples can be randomly drawn and be of a reasonable, fixed size.
• As N increases in size, the shape of the distribution of sample means quickly approaches normality. When N = 30 observations or greater, a sampling distribution will take on the familiar, symmetric bell-shaped curve. Interestingly, if a population is normally distributed to begin with, then even small fixed size N samples (i.e., < 30) will create a normally shaped sampling distribution of means.
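Both points can be checked by simulation. The sketch below is illustrative only (the flat population on [0, 100], the sample size N = 30, and the 2,000 repeated samples are made-up choices, not from the text): even though the parent population is decidedly non-normal, the collected sample means center near the population mean of 50, with a spread near σ/√N.

```python
import random
import statistics

random.seed(1)

# Draw repeated samples of fixed size N from a flat (uniform) population
# and collect each sample's mean -- a sketch of the central limit theorem.
N = 30
num_samples = 2000
sample_means = [
    statistics.mean(random.uniform(0, 100) for _ in range(N))
    for _ in range(num_samples)
]

# The uniform population has mu = 50 and sigma = 100/sqrt(12), about 28.9,
# so the standard error should be near 28.9 / sqrt(30), about 5.3.
print(round(statistics.mean(sample_means), 1))   # close to 50
print(round(statistics.stdev(sample_means), 1))  # close to 5.3
```

Plotting sample_means as a histogram would show the familiar bell shape, despite the flat parent population.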
The Law of Small Numbers Revisited
The law of large numbers and the central limit theorem are essential to hypothesis testing, but do not assume that their influence is assured in every circumstance. Keep an open mind as you read and then answer the following problem from Tversky and Kahneman (1971):

The mean IQ of the population of eighth graders in a city is known to be 100. You have selected a random sample of 50 children for a study of educational achievements. The first child tested has an IQ of 150. What do you expect the mean IQ to be for the whole sample?

Given our previous experience with IQ and sampling issues, my guess is that most readers guessed that the mean IQ should still be 100. Although that guess is a good one, the actual average should be 101. Why? Well, despite the fact that the first child selected has an IQ of 150, the 49 remaining children are still anticipated to have an IQ of 100 each. To calculate the expected mean IQ score, we would have to determine the average of the total number of IQ points available. To do so, multiply the 49 children × 100 points = 4,900 points, which is added to the 150 points we know about from the first child (i.e., 150 + 4,900 = 5,050). The mean of the available IQ scores would be based on dividing the 5,050 points by the 50 children, which is equal to 101. The expected IQ average of the class, then, is 101.

Why, then, do people say the average IQ for the above problem should be 100? Think about what you know about the normal distribution; it is symmetric on both sides of the mean of the distribution. Many readers who answer the IQ problem assume that the aberrantly high IQ score of 150 will necessarily be balanced out by a very low score (or group of scores) on the other side of the distribution so that the mean will remain 100. In other words, people erroneously assume that chance is self-correcting, but as you know from chapter 8, it is not! To paraphrase Tversky and Kahneman (1971), chance is not self-correcting but rather it is self-diluting; the effects of extreme scores are diluted by other scores that are closer to the mean (here, the IQ average of 101). Samples, then, are not always representative. Remember that the law of large numbers simply states that the larger samples drawn from their parent populations will generally, not absolutely, be close to the true population average. Beware, then, the law of small numbers, whereby people assume that any random sample (a) must resemble all other random samples and that (b) any random sample will necessarily approximate the true population average. On average, any random sample of sufficient size is expected to be representative of the parent population, but this expectation is easily violated by chance; a given sample may not be representative.
KEY TERM The law of large numbers proposes that the larger the size of a sample (N), the greater the probability that the sample mean (X̄) will be close to the value of the population mean (μ).
The X̄ of a larger sample is more likely to approximate μ than the X̄ of a smaller sample.

Why is the law of large numbers important? Precisely because it reminds us of the virtues behind and importance of obtaining an adequately sized sample. Anytime you conduct a piece of research or read a published summary, one of the first questions you should ask yourself is whether the sample size is adequate for the inferential task involved. In a more concrete way, think back to the David L. problem that was introduced in chapter 1. If you were looking at a college again or searching for an appropriate graduate or professional school, would you want to know the opinions of 2 students or 20 students? Without going to Herculean efforts or bothering too many passersby on your campus visits, I think that you will agree that bigger samples are better samples when an inference is being made, especially because such samples are apt to more accurately reflect the true values inherent in a population, or the opinions of the indigenous members of college and university communities!
…sure that any research effort is free from sampling error that is caused by unknown, systematic factors.

Knowing the standard error of the mean provides a researcher with a very important advantage: the standard error indicates how well a sample mean estimates the value of a population mean. A smaller standard error specifies a close match between a sample mean and a population mean; a larger error points to considerable disparity between the two indices. To determine standard error, we need to review what we previously learned about estimating population variance and standard deviation from known values of sample variance and standard deviation.
In chapter 4, we discussed the difference between biased and unbiased estimators of variance and standard deviation (see pages 162 to 165). In the course of our detailed review of variability, we learned that the standard formula for calculating sample variance, formula [4.11.1], routinely underestimates the population variance. Here it is, renumbered to reflect its placement in this chapter:

[9.3.1]  s² = Σ(X − X̄)² / N.

We noted that this version of the sample variance formula is known as a biased estimator when used for any purpose besides describing sample data; when used as an estimate of the actual population variance, it falls short of the mark.

Standard Error and Sampling Error in Depth 325

It follows, then, that a standard deviation (formula [4.14.1]) based on taking the square root of s², too, would underestimate the standard deviation of the population, or:

[9.4.1]  s = √s² = √(Σ(X − X̄)² / N).
The solution to the problems posed by biased estimators was a simple one. Statisticians learned that a biased estimate can be corrected, that is, converted into an unbiased estimate, by reducing the value of the denominator by one observation; in formula terms, N becomes N − 1. The formulas for sample variance and standard deviation are easily recast into unbiased estimates that more closely approximate population values. The unbiased estimate of population variance (σ²) can be determined by using formula [4.22.1], recast as:

[9.5.1]  ŝ² = Σ(X − X̄)² / (N − 1).

As you will recall, the caret (ˆ) over s² indicates that the statistic is an unbiased estimate. The population's standard deviation (σ) can then be estimated by formula [4.23.1], or:

[9.6.1]  ŝ = √ŝ² = √(Σ(X − X̄)² / (N − 1)).

It follows, then, that the standard error of the mean can be estimated by using:

[9.8.1]  Estimated σ_X̄ = s_X̄ = √(ŝ²/N) = ŝ/√N.

On occasion, of course, unbiased estimates of ŝ² and ŝ may be unavailable. If so, the standard error of the mean can be estimated using the following formula, which corrects for the use of biased estimates:

[9.9.1]  Estimated σ_X̄ = s_X̄ = s/√(N − 1) = √(Σ(X − X̄)² / (N(N − 1))) = √(SS / (N(N − 1))).
Given your prior experiences with their component parts, I very much hope that
you feel that the preceding set of formulas is not difficult to follow. I also hope that you
appreciate how interrelated each of the formula variations is with the others-by now,
you should be able to see how one formula can be derived from (is related to) another. If
you are feeling a bit overwhelmed at this moment, please take a break for a few minutes
and then reread this section of the chapter. You must have a solid grasp of the theoreti-
cal nature and computational derivation of the standard error of the mean before
proceeding to the next section.
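To see how the formula variations fit together, here is a brief Python sketch (the six raw scores are invented for illustration; the text itself does not use Python):

```python
import math

def variance_stats(scores):
    """Return the biased variance, the unbiased variance, and the
    estimated standard error of the mean for a list of raw scores."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squares, SS
    biased_var = ss / n                 # formula [9.3.1]: divide by N
    unbiased_var = ss / (n - 1)         # formula [9.5.1]: divide by N - 1
    std_error = math.sqrt(unbiased_var) / math.sqrt(n)  # formula [9.8.1]
    return biased_var, unbiased_var, std_error

biased, unbiased, se = variance_stats([4, 8, 6, 5, 3, 7])
print(f"{biased:.3f} {unbiased:.3f} {se:.3f}")  # 2.917 3.500 0.764
```

Notice that the biased estimate is always smaller than the unbiased one, which is exactly why dividing by N underestimates the population variance.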
326 Chapter 9 Inferential Statistics: Sampling Distributions and Hypothesis Testing
[9.10.1]    σ_X̄ = √(σ²/N) = σ/√N.
Because we know σ = 15 already, we can simply divide this value by the square root of N.
Let's begin with a sample size of one (N = 1) observation. Obviously, when we draw
only one observation, the mean of this distribution of 1 will be that observation (i.e., X̄ = X).
Put another way, when N = 1, the standard error for the distribution of sample means
will be equal to the population's standard deviation. We can show this using the latter
half of formula [9.10.1], or
[9.11.1]    σ_X̄ = σ/√N,
[9.11.2]    σ_X̄ = 15/√1,
[9.11.3]    σ_X̄ = 15.
As you can see, when N = 1, the standard error of the mean is identical to the popula-
tion standard deviation.
As the sample size increases, however, we should see a smaller standard error. If we
make N = 20, what happens to the value of σ_X̄?
[9.12.1]    σ_X̄ = 15/√20,
[9.12.2]    σ_X̄ = 15/4.47,
[9.12.3]    σ_X̄ = 3.36.
Clearly, the expected standard error of the mean for a sample size of 20 is much smaller
than that for N = 1. What if we make N = 100?
[9.13.1]    σ_X̄ = 15/√100,
[9.13.2]    σ_X̄ = 15/10,
[9.13.3]    σ_X̄ = 1.5.
Thus, when a sample size increases to 100 observations, the standard error of the mean
can be expected to drop substantially to only 1.5 standard error units. In other words, the
width of the bell-shaped curve narrows when sample size rises, thereby confirming the
central limit theorem and the law of large numbers, as well as demonstrating the useful-
ness of the standard error concept. As N increases, then, the error between a given X̄ and
µ readily decreases, so that the standard error of the mean serves as an index of how well
a sample mean estimates the population mean.
For example, the likelihood that a subsequent sample mean will fall within a particular
range can be estimated in terms of some specific probability.
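The shrinking standard error can be verified with a quick Python sketch, again using the population standard deviation of 15 from the running example:

```python
import math

sigma = 15  # the population standard deviation used in the running example

for n in (1, 20, 100):
    standard_error = sigma / math.sqrt(n)  # formula [9.10.1]
    print(f"N = {n:3d}: standard error = {standard_error:.2f}")
# N =   1: 15.00; N =  20: 3.35 (the text's 3.36 reflects rounding the
# square root of 20 to 4.47 first); N = 100: 1.50
```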
A researcher draws a sample of 75 participants from a population. The sample's
mean is 65 and the unbiased estimate of its standard deviation is 4.5. The first thing we
do is calculate the standard error of the mean, which is:
[9.14.1]    s_X̄ = ŝ/√N.
We then enter the standard deviation and the N of 75 for:
[9.14.2]    s_X̄ = 4.5/√75,
[9.14.3]    s_X̄ = 4.5/8.66,
[9.14.4]    s_X̄ = .52.
Using this value for the standard error of the mean, a confidence interval for the mean
can be created. To begin the next step, we remember that the sampling distribution for
the mean is normally distributed. As a result, we can use standard or z scores to create a
confidence interval. Do you remember that back in chapter 5 we examined the percent-
age of the area under the normal curve that fell within standard deviation units around
the mean of 0 of the z distribution? Figure 5.4 on page 188 illustrates the areas between
the standard deviation intervals along the z distribution. You will remember, for exam-
ple, that approximately 68% of the area under the curve falls between ±1.0 standard
deviation units around the mean.
The formula for calculating the confidence interval around the mean is:
[9.15.1]    confidence interval_X̄ = X̄ ± z(s_X̄).
sure is usually taken to be a sufficient indicator of a measure's reliability (see chapter 6). In a similar
way, the standard error can be treated as a measure of a sample mean's reliability. Think about it: If
the means repeatedly drawn from some population are very close to one another in value, then they
appear to provide a reliable measure of the population and its µ. More to the point, of course, is the
fact that the standard error corresponding to all of these similar means would of necessity be very
small. Thus, a small standard error would indicate a high degree of reliability, whereas a large one
would indicate a low degree of reliability.
How do we determine the appropriate z score to use? Turn to Table B.2 in Appendix B
and locate z = 1.00. As you can see, the entry in column B corresponding to this z score is
.3413, indicating that approximately 34% of the area under the curve falls between the
mean of 0 and the first standard deviation. If we take into account the same percentage
area falling below the mean (i.e., z = -1.0), then we have accounted for the aforemen-
tioned 68% of the area under the curve. To define the confidence interval for the mean, all
that remains is to enter the z score of ±1.0, the s_X̄, and the sample's mean into [9.15.1], or:
[9.15.2]    confidence interval_X̄ = 65 ± 1.00 (.52).
Thus, the lower boundary of the confidence interval is:
[9.15.3]    confidence interval_X̄ = 65 - .52 = 64.48,
and the upper boundary is:
[9.15.4]    confidence interval_X̄ = 65 + .52 = 65.52.
The 68% confidence interval for the mean is between 64.48 and 65.52. The width of
this confidence interval should make intuitive sense to you, as it represents where the
bulk of the sample means should fall-within 1 standard deviation on either side of
the mean.
In general, researchers tend to rely on confidence intervals that capture a greater
amount of area under the curve than just 1 standard deviation unit on either side of the
mean. It is common, for example, to see 95% and 99% confidence intervals cited in the
research literature. Using the hypothetical sample data here, we can determine the confi-
dence intervals for these more frequently used ranges for the mean. The z scores corre-
sponding to 95% and 99% of the area under the curve can be found in Table B.2, and they
are 1.96 and 2.58, respectively. I encourage you to examine the area between the mean and
z for each of these z scores (see their column B entries in Table B.2), remembering to dou-
ble the value to take into account the symmetric nature of the normal distribution.
Using [9.15.1], the 95% confidence interval for a mean of 65 and an s_X̄ of .52 is:
[9.16.1]    confidence interval_X̄ = 65 ± 1.96 (.52),
which yields a confidence interval ranging between 63.98 and 66.02. The 99% confi-
dence interval, then, is based on:
[9.17.1]    confidence interval_X̄ = 65 ± 2.58 (.52),
resulting in confidence interval boundaries of 63.66 and 66.34.
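All three confidence intervals can be checked with a short Python sketch built from the sample values of the running example (mean = 65, ŝ = 4.5, N = 75); the z values of 1.00, 1.96, and 2.58 come from Table B.2:

```python
import math

mean, sd, n = 65, 4.5, 75        # sample values from the running example
std_error = sd / math.sqrt(n)    # formula [9.14.1]; about .52

for z, label in ((1.00, "68%"), (1.96, "95%"), (2.58, "99%")):
    lower = mean - z * std_error  # formula [9.15.1], lower boundary
    upper = mean + z * std_error  # formula [9.15.1], upper boundary
    print(f"{label} CI: {lower:.2f} to {upper:.2f}")
# 68% CI: 64.48 to 65.52
# 95% CI: 63.98 to 66.02
# 99% CI: 63.66 to 66.34
```

Note that computing with the unrounded standard error (.5196 rather than .52) reproduces the same boundaries to two decimal places.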
We must keep in mind what information a confidence interval does and does not
supply. A confidence interval defines only an interval where the population mean is be-
lieved to fall, but it does not specify the likelihood that the mean we have is the correct one.
Confidence intervals like the ones we have just identified are often misinterpreted-and
misrepresented-to mean that an investigator can be, say, 95% confident that the popu-
lation mean is 65. This is an incorrect statement-please read it one more time so that you
will remember that confidence intervals do not specify the accuracy of observed means.
Neither do confidence intervals suggest that the likelihood a population mean falls
within a specified interval is .68, .95, .99, or whatever probability you choose. Why not?
Keep in mind that population parameters are unknown constants; sample data we use to
create confidence intervals for them will vary from sample to sample, and from sample
size to sample size. We should anticipate that the accompanying sample statistics, too, will
vary substantially, implying that our confidence interval estimates will vary, as well. What,
then, does a confidence interval establish? A confidence interval means that if repeated
samples of a given size are drawn from a population, then 95% or 99% of the confidence
interval estimates will include the population mean. In a sense, we know where the pop-
ulation mean is apt to be much of the time, but we still do not know its true value.
Knowledge Base
1. Assuming that a fixed sample size is sufficiently large, what will the shape of a sam-
pling distribution of means be like?
2. How do sampling distributions differ from the distributions we have previously
studied in this book?
3. What is the law of large numbers? How does it relate to the central limit theorem?
4. What are the mean and the standard deviation of a sampling distribution called?
5. You have a population with µ = 82 and σ = 5.0, and you have 100 observations.
What is the σ_X̄? What is the 95% confidence interval for the mean?
Answers
1. Normal; it will appear to be a bell-shaped curve.
2. Sampling distributions are made up of statistics (e.g., means), whereas the other distributions
we have reviewed contain raw scores or observations.
A second style of standard error representation is shown in Figure B. Each dot represents a
fish's average granule cell density, and the "Ts" rising above and below the dot are the standard
error around the mean. Granule cells-so-called due to their granular appearance-process local
information in the brain. Stewart and Brunjes (1990) were interested in seeing how the size and
stage of development of the goldfish would influence the development of the olfactory bulb's
granule cell population.
Asking and Testing Focused Questions: Conceptual Rationale for Hypotheses 331
".-
150 -
~
1.L with increasing fish size. Plot
liNe 125
CI ::!.
- • of granule cell density (each
point represents one animal's
=i g 100 - T average score) vs fish body
'-'t::.
.. III
~:;;
•
1 T length (bars represent ± 1
75 - T
~!:.
~
50 -
.LIi> i! S.E.M.).
25 I I I I I I
45 60 75 90 105 120 135 150
Body Length (mm)
3. The law of large numbers posits that larger sample sizes increase the chance that a sample
mean (X̄) will approximate µ. In part, the central limit theorem proposes that the mean of a
sampling distribution based on a large, fixed size N will be equal to µ.
4. The expected value and the standard error.
5. σ_X̄ = .50; confidence interval ranges between 81.02 and 82.98.
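The answer to item 5 can be verified with a few lines of Python:

```python
import math

mu, sigma, n = 82, 5.0, 100
std_error = sigma / math.sqrt(n)   # 5.0/10 = .50
lower = mu - 1.96 * std_error      # lower 95% boundary
upper = mu + 1.96 * std_error      # upper 95% boundary
print(f"{std_error:.2f}, {lower:.2f} to {upper:.2f}")  # 0.50, 81.02 to 82.98
```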
Whether you are developing a hypothesis for your own research or evaluating one that already
appears in the behavioral science literature, it should satisfy the following list of criteria
(adapted from Dunn, 1999). A good hypothesis:
1. Is a clear statement, not a question. Despite the fact that a hypothesis is a testable question, it
should appear as a declarative statement (e.g., crowded environments promote aggressive be-
havior).
2. Identifies specific relationships between or among variables. In general, some independent vari-
able (e.g., the presence or absence of crowded conditions) should be expected to create an ob-
servable change in some dependent measure (e.g., physically or verbally aggressive behavior
directed from one person to another).
3. Is theory-based or somehow linked to existing knowledge. There are few new psychological ideas
under the sun. Chances are that some literature exists pertaining to a hypothesis or at least the
general area it touches on (e.g., there are extensive literatures on human as well as animal ag-
gression).
4. Is concise and to the point. Less is more (or at least less is better). Most hypotheses should be
no more than one or perhaps two clear sentences in length. Supporting details should be saved
for the theory underlying a hypothesis (see points 1 and 3 above).
5. Can be tested. The variables within a hypothesis should be operationalized with relative ease
and, as a declarative statement, the hypothesis should lend itself to empirical examination in an
experiment, quasi-experiment, observational study, or other form of investigation (see point 2).
6. Is readily understood by others. People interested in the behavioral sciences-researchers,
scholars, teachers, students, and outside observers-should be able to interpret a hypothesis
with little difficulty. Some hypotheses will be more technical than others will, but an ideal hy-
pothesis expresses an empirical relationship that can be understood by educated persons.
We now turn to the statistical hypothesis, which concerns quantifying the relation-
ship between the variables involved in the conceptual and operational hypotheses.
KEY TERM A statistical hypothesis represents the mathematical relationship presumed to exist between two
or more population parameters.
We might specify, for example, that the group of participants exposed to the small dis-
play of objects would notice more changes (Le., make fewer recall errors) than those pre-
sented with the larger object display. Thus, the statistical hypothesis highlights the math-
ematical relationship presumed to exist between the two groups-exposure to a small
array of objects leads to a higher number of correctly recognized changes than exposure
to a larger array of objects.
Statistical hypotheses are usually represented in symbolic as well as written form.
Thus, the average number of correctly recognized changes for the control group (µc) is
anticipated to be higher than the changes noted by the experimental group (µe), or
µc > µe. Why are we using µ to designate the group means? Precisely because we an-
ticipate that changes observed in the respective groups of research participants (who
were randomly assigned to one or the other condition) are representative of changes in
population parameters. Indeed, the symbolic use of population means reminds us that
the results of one experiment are meant to generalize beyond the original participant
sample to the larger population of participants, extant as well as potential.
KEY TERM A directional hypothesis anticipates the specific nature, or direction, of a difference between two or more population parameters.
When using a control group and an experimental group, a directional hypothesis will be
either µc > µe or µc < µe. Most of the hypotheses that behavioral scientists entertain
and investigate are directional. Before conducting any actual research, most investigators
have a strong sense of what relationship between the manipulated and measured vari-
ables is specified by a theory. Note that the explicit use of the "greater than" (>) or "less
than" (<) sign provides the directionality in any hypothesis. Naturally, the mathemati-
cal signs representing "less than or equal to" (≤) and "greater than or equal to" (≥) can
also be used to represent relationships in, as well as to test, directional hypotheses.
In contrast, a nondirectional hypothesis does not specify the direction of a differ-
ence, merely that some difference will exist between two population parameters. If we
employed a nondirectional hypothesis in the environmental sensitivity study, we would
simply note that there would be a difference in the number of changes noted by the con-
trol group versus the experimental group. We would not, however, speculate about
which group would have a higher recall rate.
KEY TERM A nondirectional hypothesis anticipates a difference will exist between population parameters but
does not specify the nature, or direction, of that difference.
Researchers elect to use nondirectional hypotheses when they are uncertain or unwilling
to characterize the difference between the population means, often when the relevant re-
search is novel or the investigator is new to the area of inquiry. Symbolically, a nondirec-
tional hypothesis can be represented as µc ≠ µe, but the representative magnitudes of
the population means are not specified by this relationship-all we know is that a dif-
ference should exist between them. The "not equal to" (≠) sign reminds us that either
population could be larger than the other one.
behavior is never regarded as ultimately true but merely a useful, even reliable guide
until it is proven to be false, at which time another, better account replaces it. This
process then repeats itself virtually endlessly (a variation of this theme was introduced as
the research loop of experimentation in chapter 2). How, then, do we go about the busi-
ness of creating, testing, and falsifying hypotheses?
When testing any hypothesis, researchers actually develop two competing hypotheses
and effectively pit one against the other. These hypotheses are typically labeled the null
hypothesis (Ho) and the alternative hypothesis (HI). In experimental terms, the null hy-
pothesis specifies that an independent variable does not have any effect on a dependent
measure. In the parlance of statistics, the population mean of one group (e.g., the control
group) is the same as the population mean of the other group (e.g., the experimental or
"research" group )-indeed, these groups ate said to behave the same or in very similar
ways. In other words, the .J!,oups share the same population mean because they are
from-and remain a art of-the same 0 ulation because the treatment or in rven-
tion represente y the different eve of the independent variable had no effect.
KEY TERM The null hypothesis traditionally indicates all the population parameters in an experiment, which
are represented by sample statistiCS, are equal. The null hypothesis predicts that a given indepeD.;
~ varia hie or other intervention will notcause a change in some dependent measure. The null
hypothesis is symbolically presented as Ho. 7
The null hypothesis (Ho is pronounced "H-oh" or sometimes "H-naught") of the
environmental sensitivity study would be that there would be no difference between the
average number of changes recalled by the control group and the experimental group, or:
H0: µc = µe.
For heuristic reasons, the null hypothesis is sometimes referred to as "the hypothesis
of no difference," highlighting the fact that it usually posits that no discernible differ-
ences exist between means at the level of the samples tested or the population(s)
from which they hail.
The alternative hypothesis (HI is pronounced "H-one"), sometimes called the ex-
perimental hypothesis, is the idea actually being examined or tested by a piece of re-
search. The alternative hypothesis embodies the researcher's expectation of what will
happen when an independent variable is manipulated and a dependent measure is mea-
sured.
KEY TERM The alternative or experimental hypothesis specifies that a difference exists between the popula-
tion parameters identified by the null hypothesis. The alternative hypothesis predicts that a given
independent variable or other intervention will cause a change in some dependent measure. The
alternative hypothesis is symbolically presented as H1.
We already specified the alternative hypothesis of the environmental sensitivity study
earlier, but it merits mention once more. The control group-those exposed to the small
array of objects-should identify a higher number of perceived changes than the exper-
imental group, which was confronted with a relatively larger number of objects, or:
H1: µc > µe.
The projected results of any experiment are compared to the results anticipated by
its null hypothesis. In other words, the experimental results (e.g., a verified difference be-
tween sample means) are used to demonstrate that the null hypothesis is false-the
groups specified by the null hypothesis are in fact not equal to one another. When the hy-
pothesized relationship is found in the analysis of an experiment's data, statisticians say
that we can reject the null hypothesis as false and accept the alternative hypothesis as an
adequate explanation for the time being. Please note that we specifically did not say that
the alternative hypothesis was true-remember, we can disprove a statement (i.e., render
it false) but we cannot prove a statement (i.e., declare it to necessarily be true and accu-
rate). When the hypothesized relationship (HI) is not found in the course of the data
analysis, statisticians say that we can accept or retain the null hypothesis. Here again, we
do not say that we are rejecting the alternative hypothesis (though, in effect, we are) be-
cause the null hypothesis represents the touchstone for hypothesis testing-every statis-
tical conclusion is understood in terms of the null hypothesis.
Any statistical conclusion is based exclusively on the null hypothesis, which is either accepted or rejected.
When we accept the null hypothesis, however, please do not assume that the sample
means representing the control and the experimental groups are literally equal in value
to one another (see Table 9.1). There will always be some degree of observable sampling
error, and such superficial differences between these means are readily attributable to
random influences. At the level of the population parameters, of course, the population
means represented by the sample means are equal to one another because we are con-
cluding that the two groups come from the same population. We retain or accept the null
hypothesis to show that the independent variable did not achieve its intended or desired
effect on the dependent measure. Moreover, our statistical analysis does not literally
demonstrate that µc = µe is absolutely true; rather, the research is unable to show that it
is not true in the present situation (see Table 9.1).
What about the statistical import of the alternative hypothesis? We know that by
showing the null hypothesis to be an incorrect account in this instance, the observed
sample means are different from one another due to the effect of the independent vari-
able. More to the point, by rejecting the null hypothesis we are claiming that the popu-
lation mean of the control group (lLc) is not equivalent to that of the experimental group
(lLe)-the behavior observed in one sample is from a different population than the be-
havior exhibited by the other sample. The statistical difference portrayed by the rejection
of the null hypothesis is one where the observed difference between two sample means
Table 9.1 What Are We Doing When We Accept the Null Hypothesis?
Accepting the null hypothesis (Ho)-the hypothesis of no difference-is not necessarily and not
always the same thing as declaring it to be true. Here are some reasons that the null hypothesis
is not rejected:
1. The null hypothesis is actually true (i.e., no mean difference exists) and therefore it should
not be rejected.
But it is also possible that:
2. The null hypothesis is actually false (i.e., a mean difference actually exists) and therefore it
should be rejected, however, the obtained sample of participants is not representative of the
true population. A biased sample, then, leads to the acceptance of the null hypothesis.
Or:
3. The null hypothesis is actually false (i.e., a mean difference actually exists) and therefore it
should be rejected, however, the experimental methodology is insufficient to detect the true
situation (i.e., the mean difference). Experimental methodology can fail to detect differences
because the manipulation of the independent variable is weak, the dependent measure is in-
sensitive, randomization is flawed or absent, the sample size is too small, an unknown but
influential "third variable" is present, and so on.
Thus, the null hypothesis is accepted as an appropriate conclusion until more data are collected, a
new experiment(s) is performed, experimental methodology is improved-whatever additional
information can be gathered to execute another test of the alternative hypothesis (H1) or some
other alternative hypothesis.
is too great to be due to chance alone, an issue we will discuss in detail shortly. Some sys-
tematic influence-presumably the hypothesized effect represented by the independent
variable-is the cause and the sample means are described in statistical terms as being
significantly different from one another.
• Statistical Significance: A Concrete Account
How can two means be "significantly different" from one another? What is statistical sig-
nificance? A difference between means is described as being statistically significant-one
mean is larger in value than another-when only a very low probability exists that the
results are due to random error rather than the systematic influence of an independent
variable. Although significance is examined locally, at the level of sample means, the in-
ferential process involves making judgments about population parameters. A significant
difference between group averages, for example, is unlikely to occur when population
means are actually equal to one another. Any significant difference, then, suggests that
each sample mean represents a distinct and different population. Inferential statistics
rely heavily on this form of significance testing.
KEY TERM Significance testing entails using statistical tests and probabilities to determine whether sample
data can be used to accept or reject a null hypothesis involving population parameters.
In practical terms, when a test statistic is said to be significant, a mathematical differ-
ence-say, one mean is larger in magnitude than another-is observed to occur between
the two (or more) groups within an experiment (i.e., the sample means represent
different populations with different parameters). If a difference is in the predicted
direction, then it is attributed to the independent variable and the null hypothesis of no
difference is rejected. When a test statistic is not significant, then no mathematical
difference exists between the means of the two groups-both are similar in magnitude
(i.e., they presumably represent the same population and its parameters). Lack of a
significant difference promotes retention of the null hypothesis. We will learn how to
assess significance using particular test statistics later in this chapter as well as in four
subsequent chapters in this book.
Regrettably, the word significance has a great deal of conceptually confusing bag-
gage attached to it. In everyday language, the word "significant" is used as a synonym for
"important;' "meaningful;' "notable;' or even "consequential." In the context provided
by statistics and data analysis, however, the word means something quite different, and
it possesses much narrower, even modest, applications. It refers to whether a result is
statistically reliable, one that is sufficiently trustworthy so that an investigator can
reasonably reject the null hypothesis. Wright (1997) sagely observes that researchers
would be better served if the word detected were used in place of the word significant. For
example, did some inferential statistical test detect a difference between groups? The
word "detect" is certamly a less confusing choice, and it does not implywnether a
difference is great or small-merely that the difference exists. In contrast, the word
~ significant raises expectations in the minds of naive readers and novice data ana-
lysts-they erroneously assume that a difference is big, powerful, and sometimes
Statistical significance is not even dramatic. --..
synonymous with scientific Significance is directly related to probability or p values, which were briefly intro-
importance or the strength or size ofh duced at the end of chapter 8. In statistical jargon, {!robabili values are often referred
a result. A significant result is one ~ t ... els or si ni cance ~ecause they indica e d .re h
. . aI!..Qbserved dIfference between sample means IS a re Ia e one. An alternative term that
that IS reliable and detectable thrOUgh/ is frequently used to denote a p value or a significancelevel is alpha level, which is
statistical means. J,..
symbolized by the Greek letter a. (Although these three terms are largely interchangeable,
Keep in mind that significance testing deals only with the analysis of measures according to the group-
ings dictated by the hypothesis-it has absolutely nothing to do with whether the theory underly-
ing the hypothesis is reasonable, tenable, absurd, or clever. Indeed, significance tests are mute
where theory is concerned-they only deal with the numbers based on samples, which in turn re-
flect the relationships that are present or absent in their populations of origin.
In contrast, the practical significance of results refers to what they mean in the context of the-
ory. Practical significance also addresses the implication of results for the understanding of be-
havior. Does a result disclose something new and dramatic about why people or animals behave
the way they do? Does it tell us what we already knew to be true? Or, in the worst of cases, does it
reveal something that is, in the "big picture" of behavior, rather trivial? Sad but true-one can ob-
serve a significant statistical result that is ultimately not terribly meaningful, a point that should
reinforce for you the distinction between statistical and practical significance (e.g., Abelson, 1995).
α-levels are often discussed in connection with inferential errors, a matter we will take
up at the end of this chapter.)
At the conclusion of chapter 8, two of the more conventional p values or levels of
significance were introduced-.05 and .01. When a test statistic is said to be significant
at the .05 level, for example, the likelihood that the difference it detected is due to chance
(i.e., random error) is less than 1 in 20. In other words, if the experiment were repeated
100 times, we would expect to observe the same results by chance-not due to the effect
an independent variable exerts on a dependent measure-5 times or less. Indeed, when
reporting the differences detected by test statistics, no difference is deemed to be a reli-
able one unless it reaches (or is still lower than) this conventional 5 percent mark or, in
statistical terms, the .05 ("point oh-five") level of significance.
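A small Monte Carlo simulation (a sketch, not from the text; the population values µ = 100, σ = 15, and N = 25 are arbitrary choices) illustrates what the .05 level means: when the null hypothesis is true, roughly 5% of repeated experiments will still produce a z statistic at or beyond ±1.96 by chance alone:

```python
import random
import statistics

# Repeat a "no effect" experiment many times: draw samples from a single
# population and count how often |z| reaches the 1.96 cutoff anyway.
random.seed(1)
mu, sigma, n, trials = 100, 15, 25, 20_000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu) / (sigma / n ** 0.5)
    if abs(z) >= 1.96:
        rejections += 1

print(rejections / trials)  # close to .05
```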
The terms p value, significance level, level of significance, and alpha (α) level are generally interchangeable. They all refer to some predetermined probability level (e.g., .05, .01, .001) used to assess the null hypothesis.
If a researcher or data analyst chooses to use a more stringent requirement for de-
tecting a statistically significant difference, then a result could be deemed acceptable or
reliable ("significant") only when it was apt to be due to chance once in 100 trials, the 1
percent or .01 level of significance. Some investigators rely on the even more demanding
p value of .001, which indicates that over 1,000 trials, there is only 1 chance of ob-
taining a predicted difference when the null hypothesis is actually true.
Most behavioral scientists choose to use the .05 level of significance as the mini-
mum acceptable level of significance and, in fact, a p value is invariably reported with
any test statistic. These p values are commonly reported as p < .05 ("pee less than point
oh-five"), p < .01 ("pee less than point oh-one"), and p < .001 ("pee less than point
oh-oh-one"). In truth, of course, it is actually possible to determine a precise p value
[Figure: the region of retention (= .950) spans the middle of the z distribution between the critical values of -1.96 and +1.96; the two tail areas beyond them sum to .025 + .025 = .05.]
Figure 9.3 Two-Tailed Critical Values and Critical Regions of a z Distribution
KEY TERM A critical value is a numerical value that is compared to the value of a calculated test statistic. When
a test statistic is less than a critical value, the null hypothesis is accepted or retained. When a test
statistic is equal to or higher than a critical value, then the null hypothesis is rejected.
A critical value is the minimum value that a calculated test statistic must match or exceed
in order for a statistically significant difference to be realized. Critical values are usually
organized into a table of values wherein statistical significance is indicated for any given
sample size (Appendix B contains several examples we will use shortly and in later chap-
ters). Naturally, these critical values correspond to one of the conventional significance
levels, such as .05, .01, or .001, used to test hypotheses.
A critical value is a cutoff, a guide to whether a test statistic is or is not significant.

Figure 9.3 portrays the graph of a sampling distribution similar to those discussed previously. This version of the standard or z distribution highlights two critical values that serve as cutoff points for regions of the distribution where statistically significant and statistically nonsignificant results are found. The z values of -1.96 and +1.96 correspond to the .05 significance level because each one "cuts off" .025 of the scores at either end of the distribution (i.e., .025 + .025 = .05). Why is this so? Turn to Table B.2 in Appendix B and locate the entry for a z score of 1.96. Now check the value found in column C, to the right of where z = 1.96 is located. As you will remember, column C represents the area under the curve that is beyond z, and here it turns out to be equal to .0250. Because the z distribution is symmetric, we can account for both -1.96 and +1.96-hence, we add the probabilities associated with the areas beyond each z and achieve .05 (see Figure 9.3).
The region of the sampling distribution lying at and beyond these two z scores is referred to as the critical region or, sometimes, the region of rejection. This region or, if you prefer, either region, is one leading to the rejection of the null hypothesis. When the value of a test statistic is equal to or greater than +1.96 (or less than or equal to -1.96 in the case of a test statistic with a negative value), the null hypothesis of no difference is rejected. Thus, for example, if an observed statistic were z = +2.21 (or -2.21), then the null hypothesis could be rejected because these values fall into the rejection regions (see Figure 9.3). Any critical region contains relatively rare or extreme values that are not likely to be found when a null hypothesis is true; thus, any test statistic falling inside a critical region is sufficient grounds to reject a null hypothesis.
When a test statistic is found to have a value between -1.96 and +1.96, then the null hypothesis is accepted. Retention of the null hypothesis occurs because the test statistic falls into the region of retention (see Figure 9.3) or, in other words, it neither reaches nor exceeds the required critical value. A z score of +1.34 (or -1.34) would fall into the region of retention, so the alternative hypothesis it represented would be found wanting (i.e., the null hypothesis of no difference would be accepted instead; see Figure 9.3).
two), making it more difficult to reject H0.

[Figure: two z distributions, each shaded with a single one-tailed critical region at a critical value of -1.67 or +1.67]
Figure 9.4 One-Tailed Critical Values and Critical Regions of Two z Distributions

Regarding curiosity, I mean that the two-tailed test can detect differences on both sides of a distribution, so that an observed difference between sample means can be found in the predicted direction or its exact opposite-but it would always be found. In contrast, one-tailed tests emphasize a single direction, so that if an observed difference does not conform to that direction, it is not found.
Many researchers, for example, want to know why a significant difference was observed on one rather than the other side of a distribution, speculating about a result's cause in the process. In other words, two-tailed significance tests allow researchers to be a bit more exploratory about their results. When a result is opposite from a prediction, then a researcher can consider whether some revision to the existing theory is warranted, if a methodological innovation is the culprit, and so on-interesting, plausible possibilities abound.
When the Null Hypothesis is Rejected-Evaluating Results with the MAGIC Criteria
When you evaluate the meaning of the statistical results of any piece of research, you should
consider them in light of Abelson's (1995) MAGIC criteria. The five criteria are presented in
the form of questions and definitions below:
Magnitude. In statistical terms, are the results reliable? How strong are they in terms of sup-
porting the alternative hypothesis? Magnitude also addresses issues of effect size and what is
known as statistical power, topics considered later in this chapter.
Articulation. Are the results presented in clear, concise, and cogent terms? In general, shorter
explanations are better than longer explanations as long as crucial details are not lost or over-
looked. Qualifications-too many ifs, buts, and but alsos-undermine readers' interest and confi-
dence in research conclusions.
Generality. Are the results externally valid, that is, can they be applied toward understanding
behavior in other settings (see chapter 2 for more detail on validity issues)? Are the results broad
or narrow in scope?
Interestingness. Will behavioral scientists care or be interested in the results? Is anything about
the results surprising or counterintuitive? Will existing theories need to be reevaluated in light of
the findings? Abelson suggests that results satisfy this criterion to the extent that they alter what
people currently believe to be true about important research issues (see also Data Box 9.E).
Credibility. In the end, are the results believable? Was the research methodology-everything
from sampling to the last statistical test-appropriate? Do the results "hang together" in support
of or to advance a theory?
rather complex, but here is a relatively straightforward account. The degrees of freedom in any set of data are the number of scores that are free to take on any value once some statistical test is performed. Generally, the degrees of freedom for a test can be known by taking the total number of available values-the sample size-and then subtracting the number of population parameters that will be estimated from the sample. Statisticians describe this as N minus the number of restrictions researchers place on any data by asking focused questions in the form of hypotheses.
Here is a simple example illustrating the concept behind degrees of freedom. Per-
haps you know that a sample's mean is 25, and you have six observations-five of them
can take on any value whatsoever. Once the values of those five observations are known,
however, the sixth observation can have one and only one possible value in order for the
distribution's mean to be equal to 25. Thus, the degrees of freedom would be 5-based
on N - 1-because the value of the sixth observation is no longer free to vary.
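The mean-of-25 illustration can be checked directly. In this Python sketch (the function name and the five sample values are invented for illustration), once five observations are fixed, the sixth is forced:

```python
def forced_sixth_value(free_values, target_mean=25, n=6):
    """Given n - 1 freely chosen observations, return the single value the
    final observation must take for the sample mean to equal target_mean."""
    return target_mean * n - sum(free_values)

# Five observations may take on any value whatsoever...
five_scores = [20, 30, 22, 28, 24]

# ...but the sixth is no longer free to vary:
sixth = forced_sixth_value(five_scores)
print(sixth)                              # the one and only possible value
print(sum(five_scores + [sixth]) / 6)     # the mean is exactly 25
```

Because one value is determined by the mean, only N - 1 = 5 scores are free to vary, which is exactly the degrees-of-freedom count in the text.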
KEY TERM Degrees of freedom are based on the number of available observations or scores minus the num-
ber of parameters that are being estimated from the data. Virtually all inferential statistical tests
have accompanying calculations for their degrees of freedom.
A good rule of thumb for thinking about degrees of freedom is this: You lose one degree of freedom for every mean that you calculate.

In more practical terms, degrees of freedom are numerical guides that enable the data analyst to select the appropriate critical value from a statistical table. Indeed, most statistical tables have a column labeled "degrees of freedom," enabling users to quickly and efficiently select a proper critical value once the corresponding degrees of freedom are calculated or identified. (We will calculate the degrees of freedom for the Pearson correlation coefficient [r; see chapter 6] and use an appropriate table of critical values in the next section of this chapter.)
Single Sample Hypothesis Testing: The zTest and the Significance of r 343
As we conclude our discussion of statistical significance, let me remind you how im-
portant it is to link statistical results with their meaning. Too many first-time data ana-
lysts become overly focused on the finer details of testing hypotheses and, in the process,
they lose sight of what their results are telling them about behavior. To combat this ana-
lytical myopia, I urge you to develop the habit of thinking about results in terms of Abel-
son's (1995) MAGIC criteria. MAGIC is an acronym standing for magnitude, articula-
tion, generality, interestingness, and credibility, five evaluative criteria that are defined in
Data Box 9.F. Abelson suggests that these criteria should be kept in mind when deciding
whether a set of results adequately supports a research hypothesis. When analyzing a
study's data, develop the practice of asking the questions associated with each criterion
(see Data Box 9.F). This simple practice will remind you to always link calculations with
their meaning and, in turn, to think about whether this linkage is strong enough to con-
vincingly persuade others that the results merit their attention (see also the later section
of this chapter on Writing about hypotheses and results).
Knowledge Base
1. Which of the following hypotheses is a directional hypothesis? A nondirectional hy-
pothesis?
a. μc ≠ μe
b. μc < μe
c. μc ≥ μe
2. When a null hypothesis is accepted, what does a researcher conclude about a popu-
lation?
3. Why is the null hypothesis either accepted or rejected? Why isn't the alternative or
research hypothesis accepted or rejected?
4. True or false: A statistically significant difference is a meaningful difference.
5. What is a critical region?
Answers
1. a. Nondirectional
b. Directional
c. Directional
2. When Ho is accepted, the researcher concludes that the independent variable did not create
any observable change in the dependent variable meant to represent the population.
3. Although empirical statements can be falsified, they can never be deemed true. Thus, the null
hypothesis of no difference serves as a researcher's comparison point-it is rejected when dif-
ferences are found and retained when no differences are found. An alternative hypothesis is
assumed to provide an adequate description of events only until another, more complete ex-
planation is identified.
4. False: Statistical significance refers to the detection of some difference-a reliable one-
between groups, not whether a finding is meaningful or noteworthy.
5. A critical region or region of rejection is the area of a sampling distribution that contains "cut-
off" values that are unlikely to be obtained unless a difference between groups exists (i.e., Ho
can be rejected). When a test statistic based on sample data falls at or inside a critical region,
the null hypothesis of no difference is rejected.
Note: The steps outlined in this table are meant to be flexible and adaptive. Some statistical tests will use all
the steps, others only one or two.
the hypothesis tested will involve making inferences about a population from a single
sample of data (e.g., is a mean representative of one population or another?). In keeping
with the mise en place philosophy of analysis espoused in chapter 1, we follow a series of
flexible steps that will make hypothesis testing a straightforward, orderly procedure.
These steps are summarized in Table 9.2, and they will be loosely-not absolutely-
followed in the examples of hypothesis testing presented below, as well as in those
appearing in later chapters. Take a moment and review Table 9.2 before going on to the
examples, and remember that these are flexible, not lockstep, guidelines. Once all the
necessary numerical information is available and organized, the required calculations
are actually rather simple. I think you will be surprised to see how material you previ-
~e. 1& ously learned comes together in the process.
Jl't:(~ _ 1....L.'aA
~ i.1..~';;;W
l' pofWhat Is the Probability a Sample Is from One Population or Another?
"oS <.' ~.s-t The ztest, which is derived from the zor standard distribution (see chapter 5), tests hy-
potheses about parameters based on sample data drawn from populations whose stan-
dard deviations are known. The z test is used in these circumstances to test a hypothesis
involving one sample mean. Calculating a z test statistic is not terribly different than cal-
culatmg a z score. Where a z score is determins.,d by s!!btractjn~ a population mean from
some observation, and then dividing the difference between cores b the 0 uIa-
tion standar eviation, or (X - JL) U", the ztest e a 'ne e difference between a sam-
ple mean (X) divi e t e standard error of the mean, u"x, or:
X - /1.
[9.18.1] Z = .::..::.........
U"x
Once a z score is calculated, its value can be looked up in Table B.2 in Appendix B in
order to see the probability of observing a score that extreme (cf., chapter 8). The prob-
ability of the observed z helps a researcher decide the likelihood that some sample mean really came from a distribution of scores where the hypothesized value of μ is the same as the population mean. If the observed z score is not found to originate from a population with the hypothesized value of μ, then the researcher assumes that the sample mean is from some other population with a different μ. In other words, there is a statistically significant difference between the sample mean and the hypothesized value of μ.
Let's look at a concrete example so that you can see that there is little difference here compared to your prior work with the z distribution. Imagine that an instructor of 75 gifted students knows that the mean IQ of the group is 132. She also knows that the IQ distribution of all school children has a μ of 100 and a σ of 15. How likely is it that the sample mean of 132 is reflective of a random sample from the population of school children?
Step 1. Following Table 9.2, the first step involves identifying the null and alternative hypotheses, and then deciding upon a significance level. We are essentially asking if the μ of the gifted students (μg) is greater than or equal to the general population μ (or, if you prefer, that the general population μ is less than μg). Thus, the null and alternative hypotheses, respectively, are:

H0: μ = μg,
H1: μ < μg.
Relying on conventional wisdom, the gifted student instructor decides to set the signifi-
cance level at .05.
Step 2. This step involves calculating the standard error of the mean, which is eas-
ily determined by using formula [9.2.1]. Entering the known population standard devi-
ation of 15 and the gifted sample's size of 75 into it, we find that:
[9.19.1]  σX̄ = σ/√N,
[9.19.2]  σX̄ = 15/√75,
[9.19.3]  σX̄ = 15/8.66,
[9.19.4]  σX̄ = 1.73.
The instructor then decides to use a two-tailed significance test (each tail of the distri-
bution covers a critical region equal to .025). Because we are determining a probability
based on the z distribution, no critical values or regions of rejection are necessary.
Step 3. We can now calculate the ztest statistic using formula [9.18.1]. We simply
need to enter the sample mean (132), the known μ (100), and the standard error of the
mean (1.73):
[9.20.1]  z = (132 - 100)/1.73,
[9.20.2]  z = 32/1.73,
[9.20.3]  z = 18.50.
This z is very, very large and, in fact, it does not even appear in Table B.2 (please turn
there now), which indicates that the likelihood of obtaining a z of this size by chance is
far less than .05 (i.e., .025 in the upper tail because we used a two-tailed test); indeed, it
is much less than .001! Therefore, we conclude that the sample of gifted students does not
constitute a random sample from the population that has a mean IQ of 100. The gifted
students have an IQ that is far above the average, so they are not representative of the
general population of school children. The null hypothesis is rejected.
Step 4. There is not much more to conclude here beyond the fact that the group of
75 gifted students cannot be considered a random sample within the larger population
of school children. This conclusion should come as no surprise, yet it nicely illustrates
the basic procedures and underlying logic of testing the likelihood a sample is from
some given population.
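The arithmetic in Steps 2 and 3 can be collapsed into a short computation. This is an illustrative sketch (the function name is an assumption, not from the text) using the gifted-student numbers; carrying full precision gives z of about 18.48, which matches the text's 18.50 once σX̄ is first rounded to 1.73:

```python
import math

def z_test(sample_mean, mu, sigma, n):
    """One-sample z test: (X-bar - mu) / (sigma / sqrt(N))."""
    standard_error = sigma / math.sqrt(n)   # formula [9.2.1]
    return (sample_mean - mu) / standard_error

# Gifted-student example: X-bar = 132, mu = 100, sigma = 15, N = 75
z = z_test(132, 100, 15, 75)
print(round(z, 2))   # about 18.48 at full precision
```

Either way, the statistic dwarfs any conventional critical value, so the null hypothesis is rejected.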
middle school (sixth graders) show rapid growth in social and emotional development
as compared to peers who are only one grade below them in elementary school. An
established measure of socioemotional development, one tested on generations of
elementary school students, has a μ of 45 and a σ of 6. The researcher decides to give the
same measure to a sample of 60 sixth-graders. He assumes that their scores on the
developmental measure will be abnormally high, suggesting a level of maturity beyond
that typically found in the elementary school population. The mean score of the sixth
grade sample turns out to be 48.
Step 1. We again follow the steps laid out in Table 9.2, so that formulating the
statistical hypotheses is a rather straightforward exercise. The researcher wants to
determine if the μ of the known population of elementary school students is lower than the μ of the middle school group (μm). The null and alternative hypotheses can be readily expressed as:

H0: μ = μm,
H1: μ < μm.
Given that the developmental measure is an established instrument, the investigator
elects to use a conservative significance level of .01 instead of the usual .05.
Step 2. Using a two-tailed significance test, each of the tails will be equal to .005
(i.e., .01/2 = .005). The researcher then determines the standard error of the mean
using formula [9.2.1]. The population standard deviation of 6 and the N of 60 are en-
tered into this formula:
[9.21.1]  σX̄ = σ/√N,
[9.21.2]  σX̄ = 6/√60,
[9.21.3]  σX̄ = 6/7.75,
[9.21.4]  σX̄ = .775.
The critical z value for a two-tailed significance test can be obtained by finding the z
score that cuts off .005 of the area in the tail of the z curve. A close examination
of Table B.2 in Appendix B reveals that z = 2.58 is the critical value for .0049, which,
when rounded up, is equal to .005. Figure 9.5 illustrates the (shaded) critical regions of
rejection for a two-tailed significance test with a critical value of ±2.58.
Step 3. The z test is then used to determine if the sixth-graders' mean score on the
developmental measure is higher than the established mean of the elementary school-
aged population. Following formula [9.18.1], the population mean (45) is subtracted
from the sample mean (48), and the difference is then divided by the value of the stan-
dard error of the mean (.775), or:
[9.22.1]  z = (X̄ - μ)/σX̄,
[9.22.2]  z = (48 - 45)/.775,
[9.22.3]  z = 3/.775,
[9.22.4]  z = 3.87.
Is the observed z of 3.87 greater than or equal to the previously established critical value
of ± 2.58? Yes, indeed it is, so the researcher rejects the null hypothesis of no difference.
[Figure 9.5: z distribution with shaded critical regions of rejection beyond -2.58 and +2.58]
The observed z of 3.87 falls well into the upper critical region of rejection for Ho (see
Figure 9.5).
Step 4. Based on their average scores on the maturity measure, the developmental
psychologist is justified in arguing that the socioemotional development of the middle
school children is significantly higher than the population of elementary school
children-the sixth-graders represent a different population because they responded at
a higher level of development. How would the researcher succinctly report this result?
He might write something like this: "After completing a standardized instrument, middle school students displayed higher levels of socioemotional development than their elementary school peers, z = 3.87, p < .01." Make careful note of the statistical nomenclature appearing at the end of the sentence-the researcher provides information about (a) the test statistic (z) and its obtained value (3.87) as well as (b) a specific indication of the significance level (the p value or α level of .01) used to successfully reject the null hypothesis. Throughout the remainder of the text, we will learn to present statistical results in this common reporting style (see also Appendix C).
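Both the ±2.58 critical value and the observed z can be reproduced with Python's standard library (an illustrative sketch; `statistics.NormalDist` is the stdlib normal distribution, used here in place of Table B.2):

```python
import math
from statistics import NormalDist

# Critical value cutting off .005 in each tail (two-tailed test, alpha = .01)
z_crit = NormalDist().inv_cdf(1 - 0.005)
print(round(z_crit, 2))                 # 2.58, as read from Table B.2

# Sixth-grader example: X-bar = 48, mu = 45, sigma = 6, N = 60
standard_error = 6 / math.sqrt(60)      # about .775
z = (48 - 45) / standard_error
print(round(z, 2))                      # 3.87
print(abs(z) >= z_crit)                 # True: reject the null hypothesis
```

Using `inv_cdf` also shows where the tabled 2.58 comes from: it is the z score below which .995 of the distribution lies.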
different from 0 (i.e., the absence of any measurable association between X and Y). In
fact, the null hypothesis for testing whether a correlation is significant assumes that its
value is equal to 0, while the alternative hypothesis posits that the association is detected
as different from O. A statistically significant correlation, then, is one that is large enough
in value that a researcher anticipates obtaining the same result in other, similar samples.
A significant correlation (r) is assumed to reflect a true positive or negative correlation existing in a population (ρ; see Table 4.1), one different from 0, so that the researcher can be interested in generalizing the relationship portrayed by the correlation beyond the
current sample. Sociologists and demographers, for example, routinely find a positive
correlation between years of education and income in any reasonably large sample of
adults-more education tends to result in higher earned income, just as abbreviated
schooling is linked to generally lower earnings.
Once a correlation is calculated (see chapter 6), determining whether it is significant is relatively easy. Generally, two criteria must be met: the value of r must be relatively far from 0, and the statistic should be based on a reasonably large sample. Some balancing can be achieved using these two criteria: smaller correlations can still be declared signif-
icant if they are based on very large samples and, in turn, very large correlations-those
close to ± l.OO-can compensate for smaller samples. It is rare, however, to observe a
significant correlation that is based on fewer than 10 pairs of observations.
To demonstrate how to determine a correlation's significance, we can return to an
example from chapter 6 and consider it in light of the steps for hypothesis testing out-
lined in Table 9.2. A personality psychologist was interested in validating a new inven-
tory designed to measure extraversion. Ten participants completed the inventory and
then, a week later, they took part in a staged social interaction-a cocktail party-where
they were instructed to meet and greet 12 new people (confederates of the investigator)
for 30 minutes. The correlation between their scores on the extraversion measure and
the number of confederates they met was + .94 (to review the data and the calculation of
r, please see tables 6.1-6.3). Is this correlation significantly different from O?
Step 1. The personality researcher decides to use the .05 level of significance for
testing whether the observed correlation is different from O. The null hypothesis is that
the population correlation coefficient (p) is equal to 0, while the alternative hypothesis
is that the correlation in the population the sample was drawn from is greater than 0, or:

H0: ρ = 0,
H1: ρ > 0.
Step 2. The value of the correlation coefficient was calculated previously, so there
is no need to calculate any other statistics, including standard error. The personality re-
searcher elects, however, to rely on a two-tailed significance test.
Step 3. The formula for determining the degrees of freedom for r is N - 2. There
were 10 participants in the study, so the degrees of freedom are equal to 8 (i.e., 10 - 2 = 8).
To determine whether r = +.94 is statistically significant, we turn to Table B.3 in
Appendix B - please turn there now-which contains critical values for the Pearson r.
We look for the critical value that represents the intersection between the row contain-
ing 8 degrees of freedom (located in the far left column) and the column labeled .05
under the heading "Level of Significance for Two-Tailed Test." As you can see, the critical value of r-usually called rcrit-is equal to .632. Is the calculated r of +.94 greater than or equal to the rcrit of .632? Yes, it is, so we can reject the null hypothesis. The observed correlation coefficient of +.94 is from a population whose ρ is greater than 0.

A correlation representing sample data is symbolized r; one representing a population is symbolized ρ.

Step 4. The personality researcher successfully demonstrated a significant, positive correlation between the trait of extraversion and the number of people participants met socially. The correlation could be reported as: "A significant correlation between extraversion and the number of people met was observed, r = +.94, p < .05."
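The table lookup in Step 3 can be mimicked in code. This sketch is illustrative: the dictionary holds only the single critical value used here (.632 for df = 8, two-tailed, .05 level), standing in for the full Table B.3, and the function name is an assumption:

```python
# Excerpt of two-tailed critical values of the Pearson r at the .05 level,
# keyed by degrees of freedom (df = N - 2). Only the value used in the
# extraversion example is included; a full table appears in Appendix B.
R_CRIT_05_TWO_TAILED = {8: 0.632}

def r_is_significant(r, n):
    """Compare |r| against the tabled critical value for df = n - 2."""
    df = n - 2
    return abs(r) >= R_CRIT_05_TWO_TAILED[df]

# Extraversion example: r = +.94 based on N = 10 pairs (df = 8)
print(r_is_significant(0.94, 10))   # True: reject H0 that rho = 0
```

The same structure shows the balancing act described above: a smaller r (say, .50) with the same N = 10 would fall short of .632 and the null hypothesis would be retained.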
Inferential Errors: Types I and II

A data analyst has two chances to make correct decisions and two opportunities to draw faulty inferences. These four decision possibilities are summarized in Table 9.3. Please take a few minutes to review each cell carefully.

Type I Error. Sometimes a researcher incorrectly rejects the null hypothesis, thereby believing in results that only appear to contain statistically reliable differences. How can such an error occur? Unknowingly, for instance, a researcher can draw an unrepresentative sample that shows a difference or effect that does not occur at the level of the
Table 9.3 Statistical Decisions and Reality: Making Correct Decisions or Inferential Errors

                            Statistical Decision
Reality                 Accept H0                    Reject H0
H0 is true              Correct decision (1 - α)     Type I error (α)
H0 is false             Type II error (β)            Correct decision (1 - β)
population. The Type I error, then, can be due to unlikely but extant random variation. When a null hypothesis is actually true, the probability of making a Type I error is equal to α, that is, the significance level used for rejection. Thus, if a researcher is using a .05 significance level, then there is a 5% chance of rejecting a true null hypothesis (see the upper right cell in Table 9.3). A researcher would be making a Type I error by concluding that an herbal dietary supplement helped people feel more alert when, in actuality, it had no effect on mental acuity whatsoever. As shown in Table 9.3, the probability of making a correct decision-accepting the null hypothesis when it is true-is equal to 1 - α or, here, .95 (see the upper left cell in Table 9.3).
The probability of making a Type I error is equal to a test's predetermined significance (α) level.

What can be done to avoid a Type I error? Researchers can hedge their empirical bets by using a more stringent level of significance to reject the null hypothesis. In other words, the probability of making a Type I error can be reduced by lowering α. Thus, for example, a researcher reduces the risk of making this inferential error by changing α from .05 to .01-to be sure, this change decreases the likelihood of incorrectly rejecting the null hypothesis, just as it increases the probability of correctly accepting the null hypothesis when it is actually true (i.e., 1 - α = .99). There is an empirical price to be paid for this careful logic, however: By reducing the risk of making a Type I error, the chance of making a Type II error increases.
Type II Error. Type II errors happen when researchers accept the null hypothesis when it is actually false; that is, they fail to reject it and miss identifying an actual statistical effect in the data. What causes this error? Lowering the level of α to protect against making a Type I error decreases the area available in a sampling distribution used for detecting significant differences-if the critical region is smaller, obtaining a test statistic of sufficient magnitude to fall within its confines becomes more difficult. In fact, by decreasing α you necessarily decrease the chance of rejecting the null hypothesis whether it is actually true or false. Thus, an investigator out to rigorously test for the mood enhancing effects of an herbal dietary supplement might miss documenting the effects entirely by decreasing α to .01 or .001-reliable differences become harder to detect while accepting the null hypothesis becomes a stronger possibility.
Protecting against Type I errors (reducing α) increases the likelihood of making a Type II error.

Can we determine the probability of making a Type II error? Although it is called a β (the Greek letter "beta") error (see the lower left cell in Table 9.3), the actual likelihood of making a Type II error is theoretical and, therefore, very hard to determine (and if we knew β, we could also know the probability of correctly rejecting the null hypothesis, or 1 - β; see the bottom right cell in Table 9.3). Indeed, a Type II error is influenced by several factors besides the level of α, including sample size, random variation present in the data set, and the size of the effect the study's independent variable has on the dependent measure. Although we cannot readily pinpoint a probability value for β, it is
certainly the case that making a Type II error becomes less likely when a larger sample
size is available, α is increased (i.e., .01 is moved "up" to .05), and the independent
variable's effect size-a matter we take up in the next section-is pronounced. When
there is a great deal of error variance present, however, the probability of making a Type
II error remains high.
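The claim that α equals the Type I error rate can be checked by simulation. In this illustrative sketch (the function name and simulation parameters are assumptions), every sample is drawn from a population where the null hypothesis is true, so every rejection is, by definition, a Type I error:

```python
import math
import random
import statistics

def type_i_error_rate(critical_z, n=30, trials=2000, seed=42):
    """Draw samples from a population where H0 is true (mu = 0, sigma = 1)
    and count how often a two-tailed z test wrongly rejects the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        # z test with sigma known to be 1: z = X-bar / (sigma / sqrt(N))
        z = statistics.mean(sample) / (1 / math.sqrt(n))
        if abs(z) >= critical_z:
            rejections += 1
    return rejections / trials

# With +-1.96 (alpha = .05), roughly 5% of true null hypotheses are
# rejected; with +-2.58 (alpha = .01), roughly 1% are.
print(type_i_error_rate(1.96))
print(type_i_error_rate(2.58))
```

Raising the critical value from 1.96 to 2.58 visibly lowers the Type I error rate, which is exactly the trade-off described above: protection against Type I errors comes at the cost of power.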
Which error-Type I or Type II-is worse? A cursory analysis of the question leads many people to reply that a Type II error is the worse of the two because a researcher's effort is not rewarded. A good idea is operationalized and tested in an experiment, but the hypothesized effect never pans out empirically-the researcher moves on to other projects, unaware that his or her research expectation is actually true.
Although this scenario is no doubt frustrating, Type I errors are actually assumed to be much worse than Type II errors (e.g., Bakan, 1966). Think about it for a moment-what could be worse than having faith in a research result that is not actually true? Once a Type I error finds its way into a scholar's program of research, his or her subsequent publications and presentations will spread the contaminated result to other investigators who, in turn, will use it in their theorizing, writings, and teaching. The false result quickly becomes established wisdom within the research literature (Dunn, 1999) and, across time, false theories and dubious findings mislead future generations of students and researchers. When a result appears in the published literature, most researchers assume it is valid, if not the gospel truth. When they cannot replicate it in their own studies, researchers are apt to blame themselves because the false effect is already known to be "true"-after all, it appeared in the literature, didn't it?

Unless they are caught early in the research process, then, Type I errors can haunt a
research literature for quite some time. Type II errors are frustrating, and the damage caused by not knowing whether a result could benefit a topical area is hard to assess. Still, most behavioral scientists agree that it is better to forgo a good idea (Type II error) than to perpetuate a potentially destructive falsehood (Type I error). One thing all researchers can do to combat both types of errors is perform more than one experiment to examine a hypothesized relationship between an independent variable and a dependent measure. Replicating an effect is certainly more work, but it is one of the best ways to avoid making inferential errors that can be damaging and costly in the long run. We now turn to a related issue, ensuring that we do what we can statistically to identify real differences and to avoid locating spurious ones.

Type I errors are considered to be more harmful to the research enterprise than Type II errors.
Statistical Power and Effect Size

Power is the ability to accomplish research goals, to correctly reject the null hypothesis.

Determining probabilities like these is actually a very complex process, one that is beyond the scope of our present discussion and the level of this book. We will, however, consider practical steps that you, as a researcher and data analyst, can take to enhance the level of power in experiments and in the analyses used to interpret experimental results. You should be aware, too, that there are power tables containing values associated with various statistical procedures based on different sized samples. Such tables are very useful when it comes to estimating the power necessary for a particular research design, and they can be found in various advanced statistics texts (e.g., Cohen & Cohen, 1983; Cohen, 1988; see also Aron & Aron, 1997).
There is one simple but important lesson to be learned about the concept of power and whether to accept or reject a null hypothesis. Many budding researchers assume that if an alternative or research hypothesis is true, then a significant result (one based on the rejection of a null hypothesis) will be found. Although this scenario is very desirable, the data and the results based upon them will not always cooperate. Why not? Well, the sample data drawn from a population of interest may not contain values extreme enough to reject the null hypothesis; that is, the statistical power associated with the analysis of a particular sample may not be sufficient. Thus, just because a research hypothesis is tenable (it is routinely replicated), do not assume that a significant difference will actually be found; indeed, steps should always be taken to enhance the likelihood that predicted, significant differences will be observed.
Experienced researchers and savvy data analysts are well aware of the important role that power plays in research efforts. As a result, they try to maximize the power available within any study, great or small, by taking concrete action before any data are collected. Power is not something to worry about only when analysis begins (i.e., when it is too late); rather, it is a concern when the research begins, when it is in its planning stages. Despite the fact that researchers are usually unable to measure power precisely, there are several ways to increase power. Some of these actions were mentioned earlier as factors influencing Type II errors, but they merit broader discussion here:
Increase a Study's Sample Size. As you know by now, larger samples are apt to be representative of their parent populations to a much higher degree than smaller samples. If a result actually exists at the level of a population, then a hypothesized effect is likely to be found if the sample size is sufficiently large. More to the point, increasing a sample's size is often the most straightforward and easy way to boost a study's power. How large a sample is "big enough"? As unsatisfying as it may sound, "it depends" on the circumstances. As noted earlier in the book, always strive to obtain as many participants as possible. As a general rule of thumb, it is difficult to detect trustworthy significant differences with fewer than 30 participants and, as a study's N moves toward 100, the probability of rejecting a null hypothesis is enhanced dramatically.
Figure 9.6 illustrates how a larger N can increase the probability (i.e., 1 - β) of rejecting the null hypothesis. Drawing (a) in Figure 9.6 represents two hypothetical, somewhat overlapping distributions when N = 20. Note the two-tailed critical regions shaded in the null distribution, especially that the lower tail shares some overlap with the experimental distribution. The shaded portion of the experimental distribution in drawing (a) corresponds to the power of the test being used to detect a difference between the control and experimental groups. Now look at drawing (b) in Figure 9.6, which illustrates two narrower, less overlapping distributions based on N = 100. Two-tailed critical regions are again shaded for the null (control) distribution, but notice that the relative overlap of the two distributions in drawing (b) is much less: a larger sample size leads to less variability in a sampling distribution. More important for present purposes, of course, is the fact that the shaded area under the experimental curve, the area indicating the power available to reject the null hypothesis, is greater than that found in drawing (a). In general, then, a larger sample size increases the amount of power available to reject the null hypothesis.

Statistical Power and Effect Size 353

[Figure 9.6 Illustration of Statistical Power for Two Different Sample Sizes: drawing (a) shows the null and experimental distributions and the "Reject H0" regions for N = 20; drawing (b) shows the same for N = 100.]
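The relationship pictured in Figure 9.6 can be checked with a short simulation. The sketch below (plain Python, standard library only; the half-standard-deviation effect, trial count, and function names are illustrative assumptions, not values from the text) estimates power for a two-tailed z test by counting how often random samples of size N land in the rejection region:

```python
import random
import statistics

def simulated_power(n, true_mean, null_mean=0.0, sigma=1.0,
                    alpha_z=1.96, trials=2000, seed=42):
    """Estimate power: the proportion of samples of size n whose
    z statistic falls in the two-tailed .05 rejection region."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5  # standard error of the mean shrinks as n grows
    rejections = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(
            rng.gauss(true_mean, sigma) for _ in range(n))
        z = (sample_mean - null_mean) / se
        if abs(z) >= alpha_z:
            rejections += 1
    return rejections / trials

# The same modest effect (0.5 sigma) is detected far more often with N = 100.
print(simulated_power(20, true_mean=0.5))
print(simulated_power(100, true_mean=0.5))
```

Running it shows the N = 100 samples rejecting the false null far more often than the N = 20 samples, just as the narrower, less overlapping distributions in drawing (b) suggest.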
Use Precise Dependent Measures. Harris (1998) advocates using sensitive or precise dependent measures. These terms usually imply relying on established or well-tested instruments, such as published questionnaires or valid inventories, as well as concrete behavioral indices, found to reliably demonstrate differences among groups in prior research. I am not trying to stifle your creativity where developing novel measures is concerned, but tried and true dependent variables are likely to show lower levels of error variance than instruments being given a trial run. The goal is to avoid increasing measurement error, which unduly inflates standard deviations within groups, making it more difficult to locate any significant differences between them. In contrast, sensitive measures possess lower levels of error, thereby increasing the chance of identifying even small, but reliable, differences between groups.
Select Salient Independent Variables. Many research methods texts offer sage guidance about how to enhance the impact of independent variables (e.g., Cozby, 1997; Dunn, 1999; Martin, 1996; Rosenthal & Rosnow, 1991). One good way to determine the effectiveness of a novel independent variable is to pilot test it prior to the start of a project's actual data collection (Dunn, 1999).
Reduce Error Variation by Controlling for Random Factors. Obviously, using sensitive dependent measures and salient independent variables will go far toward reducing error
variance, but only so far. Researchers can still take other steps to enhance power by
keeping all aspects of an experiment constant except exposure to a level of an indepen-
dent variable. These steps include reducing participant attrition or determining its
cause, using trained experimenters who memorize a research script (i.e., the "lines" and
"blocking" in an experimental production), balancing gender across conditions in the
experiment, and tracking (measuring) any potentially influential variables for inclusion in subsequent analyses or discussion (Dunn, 1999).
Consider Using a One-Tailed Test. This piece of advice will seem contradictory given
the earlier recommendation to employ exclusively two-tailed significance tests. To be
sure, such rigor will reduce, even negate, Type I errors, but the power of the statistical test
used to examine the null hypothesis is also reduced in the process. The truth of the
matter is clear, however; it is far easier to reject a null hypothesis by using a one-tailed
rather than a two-tailed test. In concrete terms, a one-tailed z test at the .05 level relies on a critical value of 1.65 (in the hypothesized direction), while its two-tailed counterpart requires a z equal to ±1.96. Assuming the one-tailed test's hypothesized direction pans out, it is much simpler to exceed the lower (one-tailed) than the higher (two-tailed) critical z in an analysis.
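These critical values come straight from the standard normal distribution, and you can verify them yourself. A quick sketch using Python's standard library (the variable names are mine):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve
alpha = 0.05

# One-tailed test: all of alpha sits in a single tail.
one_tailed = z.inv_cdf(1 - alpha)        # roughly 1.645
# Two-tailed test: alpha is split evenly between the two tails.
two_tailed = z.inv_cdf(1 - alpha / 2)    # roughly 1.960

print(round(one_tailed, 3), round(two_tailed, 3))
```

Because 1.645 is smaller than 1.960, a directional prediction needs a less extreme z to reach significance, which is exactly the power advantage described above.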
Effect Size
The power of a statistical test is not dependent exclusively on sample size, the clarity or sensitivity of variables, the level of significance tests, controlling error variance, or any of the other influences noted above. Power is also affected by what statisticians call effect size. Conceptually, the degree to which two populations, one for the control group and the other for the experimental group, do not overlap with one another is indicative of effect size, as the measurable disparity reveals whether the manipulation of an independent variable truly separated the values of the populations from one another.
KEY TERM Effect size is the measured strength of association among variables within a study or the magnitude of an experimental effect. Effect size influences power, such that the greater the effect size, the greater the power within a study.
In practical terms, effect size refers to how far apart group means are from one another, and how much variance occurs within each group (e.g., Cohen, 1988). Effect size is usually reported in standardized units: the mean difference between the groups is divided by the standard deviation of the control group.
Effect size is not difficult to calculate and there are a variety of different indices
available to researchers. We will learn to calculate several effect size indices in subsequent
chapters in this book. Many investigators now routinely report effect sizes in their pre-
sentations and publications, allowing interested observers to learn the magnitude of an
independent variable's effect and how difficult it was to obtain. In fact, some journal ed-
itors now require that authors include effect size statistics in articles they submit for peer
review and publication. Presumably, larger effect sizes point to larger effects, which are more detectable than smaller effects (but see the Project Exercise concluding this chapter). Reporting effect size in this manner is very helpful to the scientific community because other researchers can plan related studies knowing the level of difficulty entailed in finding a particular result (e.g., during a study's planning stage, effect sizes can be used in concert with the aforementioned power tables; see Cohen, 1988; see also Rosenthal & Rosnow, 1991). Such studies are more likely to succeed (legitimately reject a null hypothesis) when researchers can determine how difficult it is to find significant differences between the groups in an experiment.
Although it is usually determined afterward, Cohen (1988) persuasively argues that
researchers should consider the effect size they plan (or hope) to obtain before they
perform the research. Cohen's extensive research concerning the analysis of power in
numerous studies led to the identification of conventional values for effect size. Effect
size is usually reported as a coefficient ranging between 0 and 1.00, and higher values in-
dicate that an independent variable had greater influence on a dependent measure.
Cohen suggests that an effect size value is labeled small (.20), medium (.50), or large (.80), a matter we will explore in detailed examples beginning in the next chapter.
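Putting the last two ideas together, here is a small sketch of the standardized effect size index described above (often called Cohen's d), using the control group's standard deviation as the yardstick. The scores and the helper names are hypothetical, chosen only for illustration:

```python
import statistics

def cohens_d(experimental, control):
    """Effect size as defined in this chapter: the mean difference
    divided by the standard deviation of the control group."""
    mean_diff = statistics.fmean(experimental) - statistics.fmean(control)
    return mean_diff / statistics.stdev(control)

def label(d):
    """Cohen's (1988) conventional benchmarks for effect size."""
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "negligible"

# Hypothetical anxiety scores, for illustration only.
control = [24, 27, 25, 26, 28, 25, 27, 26]
treated = [21, 23, 20, 22, 24, 21, 23, 22]
d = cohens_d(treated, control)
print(round(d, 2), label(d))
```

By Cohen's benchmarks, the hypothetical treatment effect here would count as large; most behavioral research deals in the small-to-medium range.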
[S]tatistical analysis also has a narrative role. Meaningful research tells a story with some
point to it, and statistics can sharpen the story. Students are often not mindful of this.
Ask a student the question, "If your study were reported in the newspaper, what would
the headline be?" and you are likely to receive in response a rare exhibition of incoher-
ent mumblings, as though such a question had never been even remotely contemplated.
Let me add that the word "student" should be read as "student of statistics," and this in-
cludes instructors, statisticians, and the author whose words you are now reading.
Everyone can easily miss the "forest" (a clean and clear summary of a hypothesis and its link to a study's results) for the "trees," all the small details and asides that can be spun off from a hypothesis and supporting results.
As you analyze the homework problems presented in the remaining chapters in this
book, do calculations for your next statistics exam, or when the opportunity arises,
conduct your own research project from start to finish, keep Abelson's (1995) admoni-
tion in mind. It is essential that you learn to quickly and efficiently link up the results
of a given statistical analysis with a clear, relatively concise interpretation of what those
results actually mean. Making this link is no small feat because we often become bogged
down in the technical details or minutiae of "doing the math" required for a statistical
test, or we neglect to think about what a test's results tell us about behavior. Please keep
in mind that statistics and data analysis are tools designed to help investigators draw clearer conclusions about events that take place in the world, but the interpretation and articulation of those events is still much more important than the number crunching involved (see Chapter 1).
From this point forward, then, cultivate the habit of expressing each result you
calculate in terms of Abelson's (1995) "headline" approach. Alternatively, adopt a
"bumper sticker" point of view by effectively but efficiently getting across the essence of
some research results succinctly. That is, put the result into simple terms that someone who is not a veteran of a statistics class (a close friend, your grandmother) can understand. Your explanation should be no more than a sentence or two in length. Why
such a short summary? After more than 12 years of classroom teaching, I've observed
that if students cannot define a concept briefly or summarize it concisely, they rarely
understand the concept very well. Regrettably, our culture fosters the impression that
lengthy (and too often windy) explanations, "big" multisyllabic words, and overly
technical jargon correspond to expert understanding of some phenomenon. Don't
you believe it! The best explanation is still the simplest explanation. Strive to explain
complex concepts in straightforward prose as long as you do not lose any important
meaning in the process. If some technical information is needed, then by all means use
it, jargon and all-just be sure to properly explain what any complex relationship or
unusual terms mean for readers. One good way to make technical information less dry
is to explain it in the context provided by an example, so that you will give the reader
something to latch onto in the midst of a detailed account.
Concern about making the technical more comprehensible nicely leads to the
concluding point I want to make in this section. When reading behavioral science
journal articles that employ statistical analysis, you will notice a curious thing-
generally, authors do not discuss predictions or analyses in terms of either the null or the alternative hypothesis. Inferential statistical tests are certainly performed and significant differences are noted, even discussed at length, but the technical terms we learned in this chapter (whether to "accept" or "reject," for example) rarely appear.
Many students find the absence of this (now) familiar material a bit disconcerting
(though others rejoice!)-what is going on?
Behavioral scientist authors and journal editors assume that general readers of the literature are already initiated into the mysteries of statistics and data analysis, that educated consumers of research "know" the null or alternative hypothesis is implied in any published piece of research. Readers who are unfamiliar with the symbolic and mathematical language of statistics, too, can skip over any sections of a research report containing overly technical material in favor of details provided in more accessible sections.
Journal articles written in APA style, for example, are crafted in such a way that readers
can get the "gist" of what a researcher did and found by reading any single section of a
text, from the Abstract to the Discussion section (see Appendix C). Why? Because each
section of an APA style paper is meant to "stand alone," that is, it should be readily interpretable to almost any level of reader with little effort (Bem, 1987; Dunn, 1999).
Statistically sophisticated readers know whether a null hypothesis is accepted or re-
jected based on the statistical information provided by an analysis and any accompany-
ing prose explanations. Thus, an author might report that,
The intensive treatment group reported lower anxiety scores than are generally observed
in clinical populations, z = 3.75, p < .05.
In this case, the "p < .05" indicates a significant difference was obtained-here, the in-
tensive treatment group appears to represent the characteristics of another popula-
tion-so that the null hypothesis of no difference was rejected. If the intensive treatment
proved to be ineffective, then you might read a summary like this one:
The intensive treatment group reported mean anxiety scores that were similar to those
observed in clinical populations, z = 1.20, p > .05.
Here, the "greater than" sign indicates that the result did not reach the conventional level of significance. Some authors choose to report the same result by using "p = ns" instead, where "ns" translates as "not significant." In both cases, of course, the message to readers is that the null hypothesis was accepted, that no reliable difference from the known population was discerned. By literally learning to read the signs and their link to descriptive accounts of what took place in a study, you can determine whether a null hypothesis was accepted or rejected with relative ease.
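The link between a reported z and its p value can also be made concrete with a few lines of code. This sketch (standard library Python; the function name is mine) converts the two z values reported above into two-tailed probabilities:

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a |z| this large or larger under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# z = 3.75 from the significant intensive-treatment result
print(two_tailed_p(3.75) < 0.05)   # well below .05: reject the null hypothesis
# z = 1.20 from the nonsignificant result
print(two_tailed_p(1.20) < 0.05)   # well above .05: retain the null hypothesis
```

Either way, the shorthand in a journal article maps onto an exact probability that readers can recompute for themselves.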
Knowledge Base
1. What is a Type I error? A Type II error?
2. What is the probability of correctly rejecting a false null hypothesis?
3. True or False: Lowering a significance level from .05 to .01 increases the chance of
rejecting a null hypothesis.
4. True or False: Increasing power reduces the chance of making a Type II error.
5. Name any three steps a researcher can perform to increase the power of a statistical test.
Answers
1. A Type I error occurs when a researcher finds a significant effect that does not really exist. A
Type II error occurs when a researcher does not find a significant effect that does exist.
2. 1 - β, which is also known as power.
3. False. Lowering a significance level reduces the power of a test.
4. True. Increasing power reduces the incidence of "missed" effects.
5. Increase sample size; use sensitive dependent measures or salient independent variables; re-
duce error variance by controlling for random factors; use a one-tailed significance test; avoid
reducing significance levels.
This chapter is devoted to reviewing the key components that enable researchers to for-
mulate and subsequently test hypotheses using statistics and data analysis. A great deal
of material was covered in the course of this chapter, everything from sampling distri-
butions to standard error, from the null hypothesis to power and effect size. When used
Table 9.4 Some Guidelines for Reading for Meaning in Journal Articles
1. Read actively, not passively. Identify the main message behind the author's research and sum-
marize it in a sentence or two. What does the article's title reveal about the research and its re-
sults? What are the main points highlighted in the study's Abstract?
2. What do you know about the author? Is the author a noted expert on the topic or in the field?
Has the author written similar publications? Do the current findings complement or oppose
earlier findings? Is the author affiliated with a college or university, a research institution, or
an organization that might espouse a particular political philosophy or point of view that
might color any conclusions drawn from the results?
3. What do you know about the article and the journal where it appears? Is the article recent or
several years old? Are the data new (sometimes called "cutting edge") or somewhat dated? Is
the journal reputable, that is, is it cited regularly by scholars in the field? Is the journal subject
to peer review via an editorial board? Is the journal known to be rigorous (e.g., it has a high
rejection rate for submissions, only high quality articles appear in it) or is it relatively easy to
publish there (e.g., acceptance rate is high, submissions vary in quality)?
4. How is the article organized? After a quick perusal of the article, is the author's organizational
plan-the paper's outline-apparent? Do the article's main points seem to be represented by
the headings and subheadings in the text?
5. Read for knowledge, read for notes. Read the article at least twice, once to get a sense of the article's main points and a second time for careful notetaking. The first reading can be quick-you
can skim the article in order to get a sense of the whole. The second reading, however, needs to
be relatively slow and deliberate so that you can really highlight the main points being made by
the research, especially how the results fit into what is already known about the topic.
in concert with one another, these components enable researchers to ask focused ques-
tions of their data, and to use the resulting answers to build or revise theories in the be-
havioral sciences. Teaching you to use and understand these components and how they
work together will enable you to perform analyses for assignments in your statistics class
and for any independent piece of research you choose to conduct. The knowledge you
are acquiring, too, can be used to think about research that has already been published.
When reading and evaluating a published piece of research, you should begin by
thinking about whether the study's results are meaningful-important or consequential
for research and theory-as well as whether significant or nonsignificant relationships
exist among the study's independent and dependent variables. Table 9.4 offers some
suggestions about how to read for meaning when you review an empirical article. The
suggestions in Table 9.4 are meant to remind you that an article is more than just its sta-
tistical results, that you must get a sense of the whole text-from the topic of the study
to the author's intentions in conducting the work and writing it up.
Once an article is read for meaning, the statistical results can be considered. By re-
viewing the four possible outcomes of any statistical analysis, Table 9.5 is meant to pro-
vide you with some perspective when you evaluate data published within an article. As
shown by the entries in Table 9.5, the fact that a study obtains a significant result is not
necessarily grounds for assuming the result is particularly noteworthy. If the sample size
is small and the result is significant, then that result could be an important one (see pos-
sibility 1 in Table 9.5). On the other hand, a significant result that is based on a large
sample size might or might not be interesting or prove to be useful for advancing knowl-
edge about a topic. Possibility 2 is likely to motivate researchers to collect (or want) more
data, while it can remind readers to be appropriately critical, even skeptical consumers.
Table 9.5 Four Possible Outcomes of a Statistical Analysis
Possibility 1: Is result significant? Yes. Sample size: small. Conclusion: the result could be an important one.
Possibility 2: Is result significant? Yes. Sample size: large. Conclusion: the result might or might not be noteworthy; more data are desirable.
Possibility 3: Is result significant? No. Sample size: small. Conclusion: ambiguous; the null hypothesis could be true or false.
Possibility 4: Is result significant? No. Sample size: large. Conclusion: the research hypothesis is likely to be false.
What happens when a predicted result is not found? The absence of a result can be
tricky to interpret when a sample size is small-you cannot be sure if the null hypothesis
is actually true or really false at the level of the population (see possibility 3 in Table 9.5).
Nonresults and large samples do conspire together, as it were, highlighting the fact that
even a favored research hypothesis is likely to be false (see possibility 4 in Table 9.5).
This chapter's Project Exercise involves having you search for empirical articles on a topic
that interests you. You will then judge each article's hypothesis, read the article for mean-
ing, and then evaluate the reported data-the results and the statistical analyses that led
to them. A series of suggested steps follow:
1. Select a well-defined area of research within the behavioral science literature (e.g.,
social psychology, experimental psychology, sociology of the family).
2. Select a well-defined topic within the area of research (e.g., deceiving and detecting
deceit in social psychology; divided attention research in experimental psychology;
divorce and remarriage in the sociology of the family).
3. Search the topic's literature for recent, representative experimental or quasi-
experimental investigations. To do so, visit your institution's reference library and
use one of the online databases for your search (e.g., PsycLIT, Sociological Abstracts,
ERIC; for search tips, see Hult, 1996; Dunn, 1999).
4. Obtain copies of two or three articles, read each one for meaning (follow the sug-
gested guidelines in Table 9.4), determine whether the hypotheses are appropriate
(rely on the criteria listed in Data Box 9.D), and then evaluate any statistical results
based on the possibilities listed in Table 9.5. As you perform these activities, answer
the questions listed below to the best of your ability. At this point in time, you
probably feel uncertain about your knowledge of hypothesis testing; indeed, you
may be hesitant to use what you know to critique a published article. Your reserva-
tions are understandable, but there is no better way to learn the utility and limits of
hypothesis testing than to "leap in" and apply what you know. I promise you that
this exercise will help you to learn the material that is presented in the next few
chapters. Base the answers to the following questions on your reactions to the
articles you collected:
• Are the published results important? What makes the results important-the
research topic(s), the question(s) asked, the statistical conclusions about the re-
sults, and/or the author's opinion? What is your opinion about the meaning and
statistical reliability of the results?
• What makes each study's hypothesis a good one? Do the hypotheses have any
shortcomings? If so, what are they?
360 Chapter 9 Inferential Statistics: Sampling Distributions and Hypothesis Testing
Summary
1. Hypothesis testing compares the reactions of distinct groups 2. Statistical differences are examined at the level of sample data,
to a dependent measure following exposure to a level of an but any observed differences between groups are assumed to
independent variable. Any resulting systematic change in a originate at the population level. Such differences indicate
group's reaction to a dependent measure is attributed to the that one population (represented by a control group) is differ-
independent variable. ent from another (represented by the experimental group).
Summary 361
3. The use of a sample statistic (e.g., X, 5) to estimate a popula- research hypothesis (HI). The null hypothesis posits that no
tion parameter (e.g., fJ., a) is called point estimation. In con- discernable differences exist among population parameters,
trast, interval estimation provides a range of values for while the alternative hypothesis specifIes that some difference
sample statistics based on repeated sampling of a population exists between population parameters.
of interest. 16. The null hypothesis is the touchstone or guide for any statis-
4. Sampling error refers to a distribution of sample means that tical test. When no predicted difference is found by a test sta-
vary somewhat from one another in value. tistic, the researcher retains or accepts the null hypothesis of
5. Inferential statistics are used in hypothesis testing, generally no difference. When a predicted difference is found by a test
to demonstrate mean differences. Hypothesis testing com- statistic, the researcher rejects the null hypothesis of no dif-
pares sample data and statistics to either known or estimated ference. Note that the alternative hypothesis is neither ac-
population parameters. cepted nor rejected.
6. A distribution of sample means is a gathering of sample 17. SignifIcance testing relies on statistical tests and probabilities
means that are all based on some random sample of a fIxed to decide whether to accept or reject a null hypothesis. A sig-
size N of some population. A sampling distribution contains nifIcant difference is one where a mathematically reliable, de-
statistics based on samples of some fIxed size N that were tected difference exists between two (or more) sample means
drawn from a larger population. that reflect distinctions between two (or more) population
7. The mean of a sampling distribution of any sample statistic is means. The word "signifIcant" does not refer to "meaningful"
called its expected value. The mean of a sampling distribu- or "important;' rather it simply indicates that some differ-
tion of means (fJ.J{) is called the expected value of the sam- ence was "detected."
pling distribution of means. 18. Statistical signifIcance is guided by probability or p values,
8. The standard deviation of a sampling distribution of sample which are also called signifIcance levels or ex levels. The most
statistics is called its standard error. The standard deviation common p values employed in statistical tests are .05, .01, and
of a sampling distribution of means is known as the standard .001; indeed, the p value of .05 is taken to be the conventional
error of the mean. cutoff distinguishing between signifIcant (i.e., p < .05) and
9. The central limit theorem states that as the size (N) of a sam- non-signifIcant (Le., p > .05) differences between means and
ple becomes increasingly large, the shape of the sampling dis- the populations they approximate.
tribution of the mean becomes normal or bell-shaped. The 19. Critical values are numerical values that a calculated test sta-
mean of this distribution is equal to fJ., and its standard devi- tistic must match or exceed in value in order for statistical
ation-the standard error of the mean-is equal to aNN. signifIcance to be realized. Critical values are usually pre-
10. The so-called law of large numbers states that as a sample in- sented in tabular form, and such tables of values exist for vir-
creases in size, the value of a sample mean (X) will close in on tually all inferential statistics. When the value of a test statis-
the value of the population mean (fJ.). tic falls at or above a designated critical value and into a
11. Although it is based on a population, the standard error of critical region, the null hypothesis is usually rejected. When a
the mean can be estimated using sample data and accompa- test statistic falls below a critical value, inside what is called
nying statistics. As the N of a sample increases in size, the error between the observed X̄ and a known μ substantially decreases. The standard error of the mean, then, reveals how closely a sample mean approximates μ.
12. The range of values where a sample mean is expected to fall is called a confidence interval.
13. There are two types of hypotheses, conceptual/theoretical and statistical. Conceptual hypotheses identify predicted relationships among independent variables and dependent measures. Statistical hypotheses test whether the predicted relationships are mathematically supported by the existing data, that is, do differences based on sample statistics reflect differences among population parameters.
14. Statistical hypotheses can be directional or nondirectional. A directional hypothesis identifies the precise nature, the ordered difference, between population parameters. No precise or ordered difference is specified by a nondirectional hypothesis, simply that some difference between parameters will occur.
15. Any experiment pits a null hypothesis (H0) against the researcher's prediction, which is embodied in the alternative or
the region of retention, then the null hypothesis is accepted.
20. Significance tests can be either one- or two-tailed. A one-tailed significance test relies on a single critical value and region of rejection. Two critical values and regions of rejection, however, are used for a two-tailed significance test.
21. Degrees of freedom are numerical guides that help researchers to select critical values for comparison with calculated test statistics. Technically, degrees of freedom are the number of scores that are free to vary or take on different values once a statistical test is performed on a data set.
22. Single sample hypothesis testing is usually used to determine whether a sample of data is consistent in value with a known population or another different population. The z test is frequently used in single sample hypothesis tests where population values are known. Testing whether a correlation coefficient (r) is significantly different than 0, too, is a single sample hypothesis test.
23. There are two types of inferential errors, Type I and Type II. A Type I error takes place when a null hypothesis is rejected but it is actually true; in other words, an effect is found but it is not really a true result. A Type II error happens when a
362 Chapter 9 Inferential Statistics: Sampling Distributions and Hypothesis Testing
null hypothesis is accepted but it is actually false; an effect is missed when it actually exists. Type I errors are considered worse because false, misleading results affect future theorizing, research, and education concerning a topic.
24. Statistical power is the probability that a statistical test will correctly reject a null hypothesis. Power is defined as 1 − β. Concrete steps, such as increasing sample size, will enhance a test's power to reject a null hypothesis.
25. Effect size is the degree to which two populations (represented by control and experimental groups) do not share overlapping distributions. Greater effect size is linked to higher levels of statistical power, leading to an increased likelihood of rejecting a null hypothesis. Practically speaking, effect size involves how far apart two sample means are from one another, as well as how much variance occurs within these respective groups.
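The interplay of α, Type I errors, and power summarized in the points above can be illustrated with a short simulation (a sketch, not part of the text; the population values here are arbitrary): when the null hypothesis is actually true, a two-tailed z test held at α = .05 should reject H0 on roughly 5% of repeated samples.

```python
import random
import math

random.seed(42)

MU, SIGMA, N = 100.0, 15.0, 25   # illustrative population values; H0 is true
Z_CRIT = 1.96                    # two-tailed critical value for alpha = .05

def z_test_rejects(sample_mean):
    """Reject H0 when |z| meets or exceeds the two-tailed .05 critical value."""
    se = SIGMA / math.sqrt(N)          # standard error of the mean
    z = (sample_mean - MU) / se
    return abs(z) >= Z_CRIT

# Draw many samples from a population where H0 is actually true.
trials = 10_000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    if z_test_rejects(sum(sample) / N):
        rejections += 1

type_i_rate = rejections / trials
print(f"Empirical Type I error rate: {type_i_rate:.3f}")  # close to .05
```

By design, the long-run rate of false rejections tracks the chosen α; tightening α to .01 lowers the Type I risk but, all else equal, also lowers power (raises β).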
Key Terms
Alpha (α) level (p. 336) Hypothesis testing (p. 318) Sampling error (p. 317)
Alternative hypothesis (p. 334) Interval estimation (p.317) Significance level (p.336)
Central limit theorem (p. 322) Law of large numbers (p. 323) Significance testing (p. 336)
Confidence interval (p. 327) MAGIC criteria (p. 342) Significant difference (p. 336)
Critical region (p. 339) Nondirectional hypothesis (p.333) Standard error (p.321)
Critical value (p.339) Null hypothesis (p. 334) Standard error of the mean (p. 321)
Degrees of freedom (p.342) One-tailed significance test (p. 340) Statistical hypothesis (p. 332)
Directional hypothesis (p. 333) Point estimation (p.317) Two-tailed significance test (p. 340)
Distribution of sample means (p. 320) Power (p.351) Type I error (p. 349)
Effect size (p. 355) Region of rejection (p. 339) Type II error (p. 349)
Expected value (p.321) Region of retention (p. 340)
Experimental hypothesis (p. 334) Sampling distribution (p. 320)
Chapter Problems
1. What is point estimation? Is point estimation different than interval estimation? How so? What role do these two forms of estimation play in hypothesis testing?
2. Explain the difference between a distribution of sample means drawn from a population and a sampling distribution.
3. How do frequency distributions differ from sampling distributions?
4. What are the specific labels used to denote the mean and standard deviation of a sampling distribution? What is the name used for the standard deviation of a sampling distribution of means?
5. If fixed, reasonably large sample sizes are repeatedly and randomly drawn from a population, what will the shape of the sampling distributions of means be like? Why?
6. What is the central limit theorem? Why is this theorem so important to inferential statistics? What is the central limit theorem's particular relationship with the standard error of the mean?
7. Why is the law of large numbers relevant to the central limit theorem?
8. A normally distributed population has a μ of 43 and a σ of 16. Determine the standard error of the mean (σ_X̄) for each of the following sample sizes: N = 10, 30, 55, 70, 100.
9. A normally distributed population has a μ of 80 and a σ of 20. Determine the standard error of the mean (σ_X̄) for each of the following sample sizes: N = 15, 40, 65, 80, 110.
10. A sample of 85 participants is drawn from a population. The mean of the sample is 30 and the unbiased estimate of its standard deviation is 6. Calculate 75%, 95%, and 99% confidence intervals for the mean.
11. A sample of 100 participants is drawn from a population. The mean of the sample is 56 and the unbiased estimate of its standard deviation is 12.5. Calculate 80%, 95%, and 99% confidence intervals for the mean.
12. What is a statistical hypothesis? How does it differ from a conceptual or theoretical hypothesis?
13. Name several of the components comprising a good hypothesis.
14. Define directional and nondirectional hypotheses. Is there any advantage to favoring one type over the other? Why?
15. An educational psychologist develops a new reading program she believes will accelerate reading skills and comprehension of the population of at-risk readers in grades 1 and 2. A sample of at-risk readers takes part in the reading program for 4 months. The researcher decides to determine if the sample's scores on a reading and comprehension test exceed the population average for at-risk readers. Formulate H0 and H1 using a directional test and then a nondirectional test.
16. Explain the status of the null hypothesis within statistical analysis and science more broadly.
17. Statistical analysis is guided by the null hypothesis and not the alternative hypothesis; why is this so? Why is it difficult to prove an alternative or research hypothesis? How does this difficulty enhance the utility of the null hypothesis?
Chapter Problems 363
18. An investigator can accept (retain) the null hypothesis or she can reject it; why can't a researcher accept the alternative hypothesis?
19. Define the word significant, as well as its use, in statistical contexts.
20. Explain the difference between statistically significant research results and those possessing practical significance. Are these research terms opposites or complements of one another?
21. Review the following p values and indicate which, in conventional statistical terms, are significant, marginal, or nonsignificant: .031, .040, .003, .076, .051, .120, .250, .098, .0001, and .046.
22. What is a critical value? What role do critical values play in hypothesis testing? In general, is it easier to reject a null hypothesis using one-tailed or two-tailed critical values? Explain your answer.
23. Explain the difference between one-tailed and two-tailed significance tests. Is one test considered to be more statistically rigorous than the other? Why? Which test enables researchers to satisfy their curiosity regarding relationships among variables in a statistical analysis?
24. Assume that the educational psychologist cited above in problem 15 analyzed her data using a z test. What critical value(s) for z would be used for a one-tailed significance test at the .05 and .01 levels? For a two-tailed test at the .05 and .01 levels?
25. In conceptual terms, what are degrees of freedom? More practically, how are degrees of freedom used by data analysts?
26. Using the study described in problem 14 for context, review and summarize the four steps for testing a hypothesis.
27. What does the acronym MAGIC mean? Why should researchers adhere to its criteria when conducting research?
28. An herbal dietary supplement is believed to enhance the short-term memory capacity of individuals who are age 85 and over. A gerontologist dispenses the supplement to a group of 65 elderly persons living in a nursing care facility. Four months later the gerontologist tests their short-term memory, observing that the group's mean performance on a standardized measure is 35. The researcher knows that the performance distribution for individuals 85 years and older has a μ of 30 and a σ of 10. What is the probability that the group's sample mean of 35 is different from the general performance distribution? Use a two-tailed significance test where α = .05.
29. A developmental psychologist studies moral awareness in male and female adolescents. Casual observation suggests that girls develop moral awareness earlier than boys, though most published research contains male participants. A standard measure of moral awareness, one validated using only male samples, has a μ of 72 and a σ of 18. The psychologist administers the measure to 58 13-year-old girls, whose mean score is 76. Test the hypothesis that the known μ representing males is different from the sample mean for females. Use a two-tailed test where α = .01.
30. In statistical terms, what does a significant correlation (r) between two variables reflect? What criteria promote the likelihood of observing a significant correlation?
31. The correlation between self-esteem and weight in a sample of 50 people is equal to −.37. Using a two-tailed test, determine if this correlation is significant. What are the degrees of freedom and the value of r_crit? Report what this result means in words and using statistical nomenclature.
32. What is a Type I error? Why do Type I errors occur? Provide an example.
33. What is a Type II error? Why do Type II errors occur? Provide an example.
34. As inferential errors, both Type I and Type II errors are disruptive to the research process. Is one worse than the other? Why? Provide an example to support your answer.
35. What can be done to reduce the incidence of making a Type I error? What can be done to reduce the incidence of making a Type II error? How can a researcher balance the demands of these competing concerns in a research project?
36. An experimental psychologist finds a significant difference at the .05 level between an experimental and a control condition in a highly technical study. What is the probability the investigator is committing a Type I error? What can the researcher do to reduce the chances a Type I error is being committed?
37. Define the word power, as well as its use, in statistical contexts.
38. A clinical psychologist is planning a yearlong intervention study on alleviating depression among elderly women whose spouses died of medical complications following surgery. What are some concrete steps the researcher can take to enhance the power of her statistical analyses at the study's conclusion? (Hint: Consider using the decision trees that open the chapter to answer this question.)
39. Define the term effect size, as well as its use, in statistical contexts.
40. How does sample size affect power, critical regions for rejection and retention, and a researcher's ability to reject the null hypothesis?
41. A researcher acquires some sample data and wants to determine whether it comes from a larger population of interest. Unfortunately, there is no information available regarding the population's parameters; can the researcher still perform the appropriate hypothesis test? Why or why not? (Hint: Consider using the decision trees that open the chapter to answer this question.)
42. An investigator wants to select a significance level for a topic where little information is available. What significance level should she choose? (Hint: Consider using the decision trees that open the chapter to answer this question.)
43. A researcher wants to avoid making a Type I error when analyzing the data from a recent project where the N = 24. She intends to use a one-tailed test. Can you give her any specific analytic guidance before the analyses begin? (Hint: Consider using the decision trees that open the chapter to answer this question.)
Choosing a t test:
1. How many samples are drawn from a larger population? If one, then go to step 2; if two, then go to step 4.
2. How many observations were collected from each participant? If one, then go to step 3; if two, then perform a correlated groups t test.

Conducting an Independent Groups t Test:
1. Are the samples independent of one another? If yes, then go to step 2; if no, then you cannot perform an independent groups t test.
2. Are the sample sizes for each independent group the same? If yes, then go to step 3; if no, then beware that unequal sample sizes are not ...

Determining sample size for adequate power:
1. Do you know the typical effect size (e.g., effect size r) associated with your research topic? If so, write it here _ and proceed to step 2; if not, use the literature to ascertain or estimate the typical effect size and then go to step 2.
2. What level of power do you anticipate seeking to test the hypothesis? If you are using the recommended or default power level of .80, then proceed to step 3; if you seek another power level besides the default value, write it here _ and then go to step 3; if you have not determined a power level, do so now before proceeding to step 3.
3. Find the intersection between the effect size r value and the power level identified in Table 10.6, and write that number here _. This number represents the total participants needed to achieve adequate power for a t test.
CHAPTER 10

Figure 10.1 Experimental Logic Underlying the Three Variations of the t Test
Recapitulation: Why Compare Means? 367
Before we learn how to appropriately calculate and apply each of these two group
inferential tests in experimental and quasi-experimental settings, a brief review of the
important role of mean comparison in the behavioral sciences is needed.
As this quote from Edgeworth suggests, statistics are used to represent possibilities-
what might be true given the conditions found within some situation or, in behavioral
science terms, a controlled research setting. The statistic most often used to represent
the behavior of many observations is, of course, the mean. In chapter 4, we learned that
the mean is the best single indicator for describing an array or sample of observations.
This singular status is awarded because the mean identifies the typical or average response
in a group-how Edgeworth's "generic portrait" of the average individual responds to
some particular circumstance or influential stimulus.
Researchers cannot study every person's singular reactions to an event, so people's
collective response is examined in the form of a mean, just as samples are meant to
stand in for larger populations. The mean possesses a canonical-that is, a sanctioned
or authoritative-place in the behavioral sciences, so much so that discipline-based
books and journal articles rarely bother to explain why it is used with such regularity
or that such faith is attached to any results based upon it. Beyond its mathematical
properties (e.g., the least squares principle; see chapter 4) and ubiquity, most statistics
texts never mention other beneficial aspects of the mean. Indeed, on occasion, just the
opposite occurs-the "tyranny" of the mean within the behavioral sciences is decried
(Aron & Aron, 1997) and the dearth of alternative approaches in general is mentioned,
instead, implying everyone implicitly understands the unique status of the mean (some
alternative, more qualitative, approaches are introduced in Appendix F).
Think about it: what does the mean actually mean, as it were? You don't often "see"
people's average behavior or learn their average opinions in some topical domain
(e.g., politics, religion, women's issues), rather, you infer a group's typical response by
sifting through and thinking about what several people did or said. On rare occasions,
of course, responses unite or are highly similar to one another, and this low variability
clues us into actually recognizing a mean or average reaction. More often than not, how-
ever, we are left to our own judgment devices where mean reactions are concerned-
in a real sense, we make inferences about averages in much the same way inferential
statistical tests do. We judge whether one sample of an act or opinion is similar to or
distinguishable from another. Our judgments of significance where mean performance
is concerned, too, are simply less formalized and somewhat subjective when compared
against the objective methods of statistics (e.g., Kahneman, Slovic, & Tversky, 1982;
Nisbett & Ross, 1980).
As you acquire the skills for comparing means statistically, try to keep in mind the
hypothetical but general behaviors they are meant to represent. Do not simply focus on
the means as numbers; focus on what they refer to, what story they tell within a re-
search context. When you compare one mean with another to learn if some reliable,
detectable difference exists between them, you are really speculating about whether what
was said or done in one group is substantially different from the acts performed in another
group. Following the outline of the standard experiment summarized in chapter 2 and
noted selectively thereafter, then, did the manipulation of some independent variable
368 Chapter 10 Mean Comparison I: The t Test
create a different average level of behavior in the dependent measure for an experi-
mental and control group, respectively?
The t Distribution
Enter the t distribution-or, actually, t distributions-which were derived to specifically
deal with the inherent shortcomings posed by small samples and the statistics based
upon them.
KEY TERM The t distributions are sampling distributions of means designed for use with small samples. Any
t distribution has a mean of 0 and a standard deviation that decreases as the available degrees of
freedom or number of observations increase.
To begin with, then, no t distribution is a standard normal distribution, though it pos-
sesses a characteristic "bell-shape" and, thus, is symmetric on both sides of its mean.
As shown by the overlaying curves in Figure 10.2, the t distribution looks somewhat
squat compared to the z distribution-it is shorter, flatter, and broader. In fact, t dis-
tributions tend to have a greater spread than the standard normal distribution, espe-
cially when sample sizes are relatively small.
Why is this so? Despite the fact that both t and z distributions have a mean of 0,
they have different standard deviations. Where the standard normal distribution has
a fixed standard deviation equal to 1.0 (for a review, see chapter 5), the standard deviations of any t distribution vary in size; indeed, there is a distinct distribution for all possible degrees of freedom. The spread of the t distributions and their standard deviations do decrease in size as the available degrees of freedom increase in value (remember that z tests do not have or need degrees of freedom; see chapters 5 and 9).
t tests are used to compare one or two sample means, but not more than two.

The distributions are similar where probabilities are concerned, however; similar to the z distribution, a t distribution can be used to determine the relative likelihood a sample mean differs from a population mean. Later, we will see that directional and nondirectional hypotheses can be posed and tested in a similar fashion using either distribution. Both the t and z distributions test hypotheses involving either one or two sample means, but no more than two (other, more advanced tests are used to compare more than
The Relationship Between the t and the z Distributions 369
Figure 10.2 Comparing the Standard Normal (z) Distribution with a Hypothetical t Distribution
Note: The z distribution is more compact than the t distribution, which tends to display greater variability
under its curve.
two means simultaneously; see chapters 11, 12, and 13). Finally, when a sample size is
quite large, the statistical procedures and conclusions drawn from them are identical (and
some researchers routinely rely on t tests when a z test would suffice; Harris, 1998).
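The convergence of t toward z as degrees of freedom grow can be seen by tabulating two-tailed .05 critical values. The values below come from a standard t table, not from this excerpt (though the df = 19 entry, 2.093, matches Table B.4 in the text).

```python
# Two-tailed .05 critical values of t for selected degrees of freedom,
# taken from a standard t table (df = 19 matches Table B.4 in the text).
T_CRIT_05 = {5: 2.571, 10: 2.228, 19: 2.093, 30: 2.042, 60: 2.000, 120: 1.980}
Z_CRIT_05 = 1.960  # the limiting case as df grows without bound

for df in sorted(T_CRIT_05):
    print(f"df = {df:3d}: t_crit = {T_CRIT_05[df]:.3f} "
          f"(exceeds z by {T_CRIT_05[df] - Z_CRIT_05:.3f})")
```

Every t critical value exceeds 1.96, reflecting the t distribution's greater spread, and the gap shrinks steadily as degrees of freedom increase.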
no practical way to acquire the necessary information to avoid violating one or another
assumption. We acknowledged earlier in the book that true random sampling is rarely
used, and that random selection is treated as an adequate, though less-than-perfect,
substitute for it. If one knew a great deal about a population-that it was normal, for
instance-the need to conduct experimental research relying on the t test would be
much less pressing. Violation of the t test's assumptions is acceptable in most situations
(though trying to perform an analysis without calculating a sample mean(s) would
surely be problematic) because the test is said to be robust.
KEY TERM A statistical test is described as robust when it provides reasonably valid portrayals of data (and
relationships therein) when some of the test's assumptions are not met during its application.
The robust nature of the t test ensures that results remain reliable even when one of the test's assumptions is violated.

In particular, the robustness of a test lies in its ability to hold the probability of making a Type I error relatively constant. When some of the test's assumptions are violated (e.g., no random sampling, nonnormal parent populations), the t test holds the likelihood of incorrectly rejecting the null hypothesis to the predetermined p value (e.g., .05, .01). A robust test provides researchers with a decided advantage: they are no more likely to detect a false effect when conditions for the statistical test are fair than when conditions are ideal.
Across time and frequent application, the t test has proven itself to be relatively im-
mune to the general violations of some of its assumptions. As a statistical procedure,
then, it still indicates reliable, valid differences between a sample mean and a popula-
tion parameter or two-sample means from potentially different populations when con-
ditions are less than ideal. Naturally, no researcher should actively seek to violate any
of the test's assumptions, nor should an investigator assume that every detected differ-
ence is real-it pays to be cautious and conservative when reporting any observed dif-
ference, even when no assumption was violated. (By now, I hope your reflex is to think
about why replicating results from any experiment or other piece of research is so
important.)
Is there a situation where a researcher should not calculate a t test? Yes, there is one clear case for worry. Harris (1998) suggests that the t test will provide less than trustworthy results when two samples are being compared and (a) they have an unequal number of observations in each, (b) the respective distributions are not all normal in shape, and (c) the sample variances differ considerably from one another (e.g., s₁² = 15.5 and s₂² = 27.0). When these three problems coincide, then, the t test will not be robust; some other statistical test or an alteration in the data will be required (see also, Coombs, Algina, & Oltman, 1996).
What Does the t Test Actually Do? The practical, mathematical mechanics of the t test are introduced shortly, but I want you to have a conceptual foundation for the statistic before you do any calculations. The t test assesses whether any measurable difference exists between means; any difference found in sample data is examined as a possible reflection of a difference at the population level. When no difference occurs, the value of the t statistic will be equal to 0, suggesting that the distributions being examined are identical. As the distributions are measured as being different from one another, the value of the t statistic deviates from 0 in either a positive or a negative direction.

Because t distributions are symmetric (recall Figure 10.2), the positive or negative sign accompanying the test statistic lets the user know which mean was larger than the other (i.e., a negative t value indicates the second mean was greater, as subtracting a larger number from a smaller one yields a negative result), as well as where the statistic falls within the regions of rejection and retention under the t distribution (but see Data Box 10.B for a discussion of what to do when a negative t value is obtained).
A t test detects a significant difference between means when the difference is large, the sample standard deviation is small, and/or the sample size is large.

The number representing the measurable difference between means is then divided by the standard error of the difference between the means. Because the value of the population standard deviation(s) (σ) is unknown, we rely on a sample standard deviation (s) to calculate the value of the standard error (recall that standard error is reviewed extensively in chapter 9). Larger values of t, which point to significant mean differences, occur when:
• The difference between means is relatively large, and this difference serves as the numerator for calculating any t statistic.
• The standard deviation, which is used to estimate the standard error of the difference between the means, is relatively small. As the denominator for the t statistic, a smaller standard error will result in a larger value of t.
• As always, larger sample sizes are desirable because they lead to smaller standard deviations, which in turn leads to a smaller standard error for the difference between the means.
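These three influences can be sketched numerically with the simplest (one-sample) form of t, where the standard error is s/√N. The numbers below are illustrative only, not taken from the text:

```python
import math

def one_sample_t(sample_mean, pop_mean, sd, n):
    """t = (mean difference) / (estimated standard error, s / sqrt(N))."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))

# Illustrative values (hypothetical): population mean 50, sample mean 55.
baseline = one_sample_t(55, 50, 8, 16)         # t = 5 / (8/4) = 2.50

assert one_sample_t(60, 50, 8, 16) > baseline  # larger mean difference -> larger t
assert one_sample_t(55, 50, 4, 16) > baseline  # smaller standard deviation -> larger t
assert one_sample_t(55, 50, 8, 64) > baseline  # larger sample size -> larger t
print(f"baseline t = {baseline:.2f}")          # baseline t = 2.50
```

Each assertion mirrors one bullet point: widening the numerator, or shrinking the standard-error denominator (via s or N), drives t away from 0.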
William S. Gosset (1876-1937), who published his work under the pseudonym "A Student," studied chemistry and math at New College, Oxford. After leaving Oxford, Gosset was one of several scientists hired by the Guinness Brewing Company, where he examined how brewing outcomes (e.g., beer quality) were influenced by variability in the quality and type of ingredients, as well as temperature fluctuations (Peters, 1987; Tankard, 1984).
Prior to Gosset's innovations, most such quality control analyses relied on the standard normal distribution, which, as you know, has wide and correct applications when samples are large. Because he necessarily contended with small samples, however, Gosset set out to devise ways to make adequate inferences when the standard normal distribution could not be used. Gosset developed a new distribution by combining mathematics with what is now referred to as simulation, systematically creating a sampling distribution of means and then examining the ratios of the differences between sample means and the (known) mean of the population they were drawn from (Peters, 1987). The resulting ratios fit the new distribution but not the normal distribution. Gosset's work drew the attention of the great statistician Ronald A. Fisher, who offered mathematical proofs for the new distribution, later deriving the formula most often recognized as the t test from it. (Fisher, one of the 20th century's foremost experimentalists, is profiled in the next chapter.)
Student's t is ubiquitous in the behavioral sciences and elsewhere; today, it is even a common fixture on the keypad of many pocket calculators, a fact that would no doubt surprise the modest Gosset (Vadum & Rankin, 1998). But why did Gosset publish under "Student" and not his own name? Gosset wanted to save the Guinness Brewery from the embarrassment of admitting that it sometimes produced a less than potable batch or two of ale (Aron & Aron, 1997). Generations of statisticians, researchers, and students, not to mention distillers and consumers, are forever in his debt.
Please notice two things about the null hypothesis shown here. First, I added the word "recall" as a subscript to the population mean. Hypotheses should always be presented in terms of the parameter μ, but adding a label to it is very useful because it will always remind you about the dependent measure (i.e., mean recall for digits) you are actually testing. Second, the "less than or equal to" sign (≤) is used here to indicate that although the normal outer limit for memory span is 7 digits, some people remember fewer; hence it is reasonable to indicate that recall of "7 or fewer" digits reflects the population average.
Hypothesis Testing with t: One-Sample Case 373
Following usual practice, the significance level for the test will be held at .05.
Step 2. As always, the second step involves calculating the standard error of the
mean and choosing whether to use a one- or a two-tailed test (see Table 9.2). Because
we do not know the values of any population parameters, we must calculate the esti-
mated standard error of the mean using the sample standard deviation, the sample N,
and formula [9.8.1], which is renumbered here for consistency:
[10.2.1]  s_X̄ = s / √N,

[10.2.2]  s_X̄ = 2.5 / √20,

[10.2.3]  s_X̄ = 2.5 / 4.47,

[10.2.4]  s_X̄ = .559.
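The arithmetic in formulas [10.2.1] through [10.2.4] can be checked with a few lines of Python (a sketch using the values from the running example):

```python
import math

s, n = 2.5, 20           # sample standard deviation and sample size from the example
se = s / math.sqrt(n)    # formula [10.2.1]: estimated standard error of the mean
print(round(se, 3))      # 0.559
```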
Should we use a one- or a two-tailed significance test? In chapter 9, I encouraged you to rely on two-tailed tests due to their rigor, counseling that one-tailed tests be used when the direction of a relationship is unquestionably certain or to increase available power. Let's assume the memory researcher never conducted a memory training program before, thus he decides to follow a more conservative approach and employ a two-tailed significance test here.
Step 3. We can now calculate the actual value of the one-sample t, as well as the test's degrees of freedom, and then determine whether to accept or to reject the null hypothesis. The formula for t is:

[10.3.1]  t = (X̄ − μ) / s_X̄.

Calculating the t test is straightforward; we need only to enter the sample mean of 10 (X̄), the population mean of 7 (μ), and, from step 2, the estimated standard error of .559:

[10.3.2]  t = (10 − 7) / .559,

[10.3.3]  t = 3 / .559,

[10.3.4]  t = 5.37.

The formula for degrees of freedom for the one-sample t test is:

[10.4.1]  degrees of freedom = N − 1.

Because there were 20 participants in the sample, the degrees of freedom for this test are:

[10.4.2]  degrees of freedom = 20 − 1 = 19.
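Steps 2 and 3 can be verified together in a short script (a sketch mirroring formulas [10.2.1], [10.3.1], and [10.4.1], with the values from the running example):

```python
import math

sample_mean, pop_mean = 10, 7   # mean digit recall after training vs. population mu
s, n = 2.5, 20                  # sample standard deviation and sample size

se = s / math.sqrt(n)                 # step 2: estimated standard error of the mean
t = (sample_mean - pop_mean) / se     # formula [10.3.1]
df = n - 1                            # formula [10.4.1]
print(f"t({df}) = {t:.2f}")           # t(19) = 5.37
```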
In order to decide whether to accept or reject H0, we must learn to read and use
a new statistical table, one containing critical values of t. Table B.4 in Appendix B
Table 10.1  Excerpt of Table B.4 (Selected One- and Two-Tailed Critical Values of t)

Level of Significance for One-Tailed Test
.10   .05   .025   .01   .005   .0005

Critical Values of t
For any given df, the table shows the values of t corresponding to various levels of probability. The obtained t is significant at a given level if it is equal to or greater than the value shown in the table.
contains various critical values of t for samples with various degrees of freedom. A portion of Table B.4 is reproduced in Table 10.1. To read Table 10.1, locate the .05 column shown under the heading "Level of Significance for Two-Tailed Test" toward the top of the table. Now, look over to the leftmost column, which is labeled df ("degrees of freedom"), and locate 19, the number of degrees of freedom available for this test. Read across in the table from the number 19 until you locate a number under the .05 column for two-tailed tests; what is the critical value? If you selected 2.093, you are correct (see Table 10.1; to verify that you have your bearings, as it were, please find this same critical value in Table B.4 in Appendix B now).
Is the calculated t value greater than or equal to the critical value of t? The ob-
served t value of 5.37 is clearly greater than the critical value of 2.093, so we can
reject the null hypothesis. What can we conclude about the training, which was the
whole point of this statistical exercise anyway? We can conclude that the training
group displayed a significantly higher average recall score for digits than is found
in the general population (we will return to the proper way to report this below in
step 4).
Symbolically, the decision to reject Ho is shown as:
tcalculated(19) = 5.37 > tcrit(19) = 2.093: Reject H0.
Both t values have 19 degrees of freedom available, and these are shown in parenthe-
ses immediately following the statistic's symbol. Please note that I added the abbrevia-
tion "crit" as a superscript to the second or critical t value. I prefer to add such super-
scripts to avoid forgetting which value was calculated and which came from a table of
critical values (another reminder of the mise en place philosophy).
Superscripts on calculated test statistics and critical values reduce confusion by organizing results.

Step 4. The memory researcher effectively demonstrated to his class that three training sessions across a one-week period were sufficient to increase the participants' memory for digits. The mean recall of the sample (X̄ = 10.0) was significantly higher than the population average of 7 digits. Put another way, the training was effective where memory for digits is concerned.
Writing Up the Result of a One-Sample t Test. How can this result be presented
in accordance with APA style? The memory researcher could write that, "A one-
sample t test found that the training group of 20 students displayed a significantly
Hypothesis Testing with t: One-Sample Case 375
t test, then, is:
t(df) = tcalculated, p < α.
If you calculated the t statistic using a software program or a calculator rather than by
hand and with the use of a table, you could report the actual significance value associ-
ated with it. The format is:
t(df) = tcalculated, p = p,
where the value of p is provided by the printout on a screen or paper. In practice, most
researchers report the standard range significance levels along with a result (i.e., .05,
.01) rather than the actual probability value (e.g., p = .022). You should know, how-
ever, that you do have different reporting options available to you.
Note that it is not necessary to indicate that the test is two-tailed because this choice
is the default option-educated readers and researchers expect that this more rigorous
test will be used. In contrast, the choice of a one-tailed test would need to be explained
in some detail, and an explicit statement that the test was one-tailed would be included
along with the above explanation.
When evaluating means or mean differences using inferential statistics, cultivate the important habit of asking, "Compared to what?"

Please also note that both the observed sample mean and the usual population mean
of recall memory for digits were reported. There is no use in reporting any statistical result
unless readers are able to ask and answer the ever-important question, "Compared
to what?" As a parametric test, the t test searched for-and found-a probable difference
between the sample mean of 10 and the population mean of 7. Readers need to
know what the respective mean recall levels for digits are so that they can intuitively understand
what difference the test is identifying. Indeed, critical readers (and you are be-
coming one) also want to verify for themselves the directional relationship portrayed by
the means. One of the worst sins of data analysis is the failure to report the means (and
their accompanying standard deviations) along with a significance test. Such omission
leads more critical readers to assume that the researcher either is sloppy or is hiding
something, both of which are unfavorable impressions to create in the minds of others.
where the critical value of t under the null hypothesis is multiplied by the standard error
of the mean. The resulting product is then added to and subtracted from the known
sample mean to form the confidence interval for the one-sample t test. We will use the
information from the memory for digits example here, but you could determine the
confidence interval for a sample mean without previously conducting a one-sample t
test (i.e., determine the degrees of freedom based on sample size and then look up the
corresponding critical t value in Table B.4).
Because we used a critical value of t at the .05 level for the previous problem, we
will be calculating a 95% confidence interval for the sample mean from the training
376 Chapter 10 Mean Comparison I: The t Test
project (i.e., 1 - a = 1 - .05 = 95%). The known sample mean (10), the two-tailed
critical value of t at the .05 level (2.093), and the standard error of the mean (.559) are
all entered into formula [10.5.1], or:
[10.5.2] 10 ± 2.093(.559).
The first step is to calculate the lower limit of the confidence interval:
[10.5.3] 10 - 2.093(.559),
[10.5.4] 10 - 1.17 = 8.83.
The upper limit of the confidence interval, then, is:
[10.5.5] 10 + 1.17 = 11.17.
What can we conclude? Remember, we cannot conclude that there is a 95% chance
that the population mean is 10. Rather, the confidence interval establishes boundaries
for the range of possible means that might be observed if repeated samples of the same
size were drawn from the population. By relying on the observed sample mean and
standard deviation, which are based on a population whose parameters are unknown,
an interval is identified where other means selected from this population are likely to
fall. Means representing mean digit recall should appear in the interval ranging between
8.83 and 11.17. We cannot be more precise here because we do not know much about
the parent population. We are also limited by the reality of working with relatively small
samples of observations, which increase the chance of sampling error. If we continued
to sample in this fashion, we would no doubt learn that subsequent sample statistics-
means and standard deviations-would all vary somewhat in value from one another.
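The interval arithmetic above can be verified with a short Python sketch (not part of the original text; the critical t is again taken from Table B.4 rather than computed):

```python
# 95% confidence interval for the one-sample case, per formula [10.5.2]:
# mean +/- t_crit * standard error.
mean, t_crit, se = 10.0, 2.093, 0.559

margin = t_crit * se                      # 2.093(.559) = 1.17
lower, upper = mean - margin, mean + margin

print(round(lower, 2), round(upper, 2))   # 8.83 11.17
```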
Table 10.2
Significance (α or p) level-less conservative p values (e.g., .05 instead of .01) increase the
likelihood of rejecting Ho. Bear in mind that concerns about power must be balanced against
concerns regarding Type I errors (i.e., less conservative p values increase the probability of
making a Type I error; see chapter 9).

Variability within the sample data-all else being equal, less variability in the form of a
lower standard deviation(s) will lead to a lower standard error of the mean (or the standard
error of the difference between means).

Sample size-bigger samples are better samples. The more observations available, the
greater the chances of rejecting Ho. Larger samples, too, mean that more degrees of freedom
are available for a t test.

Mean differences-the larger the difference between a sample mean and a population
mean (or two independent or dependent sample means), the larger the resulting t statistic,
thereby increasing the chance of rejecting Ho. When there is little difference between
means, a calculated t statistic will be close to 0, indicating the absence of any significant
difference.

Note: These criteria are meant to apply to the one-sample, as well as the independent and dependent
group t tests.
Power Issues and the One-Sample t Test

Remember: Power helps to achieve research and data analytic goals by enabling investigators to reject Ho when Ho is false.

Do you remember some of the criteria that enhance the power of a statistical test? Several
criteria affecting the degree of power available to reject a null hypothesis were introduced
in chapter 9. Naturally, these criteria are relevant to the t test, including the
one-sample variety. Table 10.2 lists four key criteria that influence the viability of the
one-sample t test's power. Please take a moment to examine the criteria listed in Table 10.2
and refresh your memory about power issues. You can refer back to this table as necessary
when planning a study and its analyses or reviewing statistical results.
Knowledge Base
1. When is it appropriate to use the t test and the z test, respectively?
2. How does sample size affect the standard deviations and degrees of freedom associated
with t distributions?
3. True or False: One of the assumptions of the t test is that means are based on interval
or ratio scales of measurement.
4. True or False: A robust test is one that applies to many different types of data.
5. When is it appropriate to use a one-sample t test?
6. A sample comprised of 30 participants has a mean of 27 and a standard error of
1.2. Calculate a 99% confidence interval for the mean.
Answers
1. The t test is used when population parameters remain unknown and the z test is used when
parameters are known.
2. As sample sizes increase, the standard deviation decreases and the available degrees of freedom
increase. Smaller sample sizes are associated with larger standard deviations and fewer
degrees of freedom.
3. True.
4. False. Robust tests can be used when a test's assumptions are violated. Under these circumstances,
the results of a robust test remain valid.
5. One-sample t tests are used for hypothesis tests comparing a sample mean to a related population
where the population mean is known but the population standard deviation is unknown.
6. 27 ± 2.756(1.2) = interval between 23.69 and 30.31.
to one or the other gender, for example; the participant is already a male or a female.
The experimenter, however, can compare the presence or absence of a subject variable's
influence on participant reactions to something (e.g., male reactions to a given stimu-
lus-say, a film of violent behavior-are compared to the reactions of women). Sub-
ject variables provide a "lens" through which other behaviors can be evaluated or com-
pared, though causal conclusions cannot be drawn.
An experimenter manipulates independent variables, but subject variables can only be measured for making between-groups comparisons.

As you may recall from chapter 2, though, a researcher cannot determine the causal
ordering of events unless a variable is manipulated; thus, identifying causality is always
indirect-and potentially problematic-when subject variables are present. Instead, the
investigator must be content with making an inference about the effect the presence of
a subject variable may have on some behavioral outcome. Perhaps males are generally
attracted to violent images due to socialization processes. These same social influences,
in turn, lead most females to be repelled by violent images, a proposition that is worth
considering but one that can never be fully tested because of the limitations posed by
subject variables.
Figure 10.3 Understanding the Origins of the Sampling Distribution of the Difference
Between Means
[Figure: two populations with unknown means (μ1 = ?, μ2 = ?), the distributions of sample means drawn from each, and the resulting distribution of differences between means (X̄1 - X̄2).]
Source: Adapted from Figure 9-1 in Aron and Aron (1997, p. 180).
the bottom of Figure 10.3, the sampling distribution of the difference between means
is theoretically derived from the respective samples and their statistics (i.e., sample
means and sample standard deviations) which, in turn, were drawn from the popula-
tions with the unknown parameters.
Adopting an apt metaphor from Henry David Thoreau, Aron and Aron (1997,
pp. 180-181) note that the theory behind this sampling procedure is "really a kind of com-
plicated castle in the air." We never actually create the sampling distribution of the dif-
ference between means, but we behave as if we did by following the logical implications
derived from sampling theory. The only mortar of our castle, as it were, the only concrete
realities involved here, are the data we collect after two samples are randomly created and
exposed to one or another level of some independent variable. The data comprising each
sample are then used to estimate the variances of the respective populations, which in
turn are used to estimate the variances of the two distributions of sample means that
lead to the sampling distribution of the difference between means.
You are probably anticipating what this sampling distribution's characteristics are
like and no doubt guessed correctly that it will be normal (congratulate yourself now
if this thought occurred to you when you began reading this section). Because it is a
normal distribution composed of difference scores, the sampling distribution of the difference
between means will have a mean of 0. Here is an important aspect of this normally
shaped sampling distribution-we can describe it even when the pairs of samples
we use to create it are not from the same population. Symbolically, the mean of
this sampling distribution is shown as (μX̄1 - μX̄2) and it is used to approximate the
actual population mean difference of (μ1 - μ2).
Hypothesis Testing with Two Independent Samples 381
KEY TERM The standard error of the difference between means is the standard deviation of the sampling
distribution of mean differences derived from two independent samples presumed to represent
two separate populations.
The resulting distribution will be normal and because the population parameters are
known, hypotheses can be tested using the standard normal distribution.
Given that we will probably not know the value of any population parameters, we
will need to estimate the standard error of the difference between means and substitute
its value for the denominator shown in [10.6.1]. As you will see, doing so converts this
z test into the t test for independent groups. Before we get to that point, however, we
first need to estimate σX̄1-X̄2 from sample data.

Using sX̄1-X̄2 to estimate σX̄1-X̄2. The estimated standard error of the difference between
means is known as sX̄1-X̄2, and we will review its utility in conceptual and then
computational terms. To begin, the estimated standard error of the difference between
means can be determined by taking the square root of the combined variances of the
means for each of the two samples, or

[10.7.1] sX̄1-X̄2 = √(s²X̄1 + s²X̄2).
another. As an aside, maintaining equal sample sizes between conditions in any exper-
iment is no small feat, and it must always be a concern for data analysts. Ideally, the
conditions or groups in any experiment will have the same number of participants ran-
domly assigned to them (another desirable factor is to have the same number of males
and females in each condition), but the reality of experimentation often disrupts this
goal-participants drop out, neglect to attend research sessions, equipment fails-any
number of tragedies great and small can prevent a researcher from achieving equal Ns
across conditions. As a result of this real concern, a variation of formula [10.7.1] pro-
vides an unbiased estimate of the standard error of the difference between means:
[10.8.1] sX̄1-X̄2 = √[((SS1 + SS2)/(N1 + N2 - 2))(1/N1 + 1/N2)].
The advantage of formula [10.8.1] is that it takes into account the biasing influence of
unequal groups by providing a more balanced-and unbiased-estimate. Take a mo-
ment to examine formula [10.8.1] and to refresh your memory about the source of the
numbers shown within it. The sums of the squares (i.e., SS1 and SS2) are calculated sep-
arately by using formula [4.10.1], which was introduced back in chapter 4. This formula
is renumbered here for each of the two groups (note that the calculations are precisely
the same, except that the data entered into each is from one sample or the other):

[10.9.1] SS1 = ΣX1² - (ΣX1)²/N1

and

[10.10.1] SS2 = ΣX2² - (ΣX2)²/N2.
If you feel a bit rusty where calculating the sum of the squares is concerned, take a
minute and examine the relevant portion of chapter 4.
Besides being an unbiased estimate, what statistical information is formula [10.8.1]
actually providing? Formula [10.8.1] estimates σX̄1-X̄2 by "pooling"-or combining-
the separate sums of squares and degrees of freedom representing the two samples that,
when their square root is taken, yield a pooled estimate of the standard error of the difference
between means (i.e., sX̄1-X̄2).
Why does this pooling procedure to estimate the standard error of the difference
between means make sense? Think about it for a moment. If you were working with
one sample drawn from a larger population, you could calculate sample statistics from
it. By dividing the unbiased estimate of the variance (i.e., ŝ²) by the available N and
then taking the square root of the resulting quotient, you could determine the standard
error of the mean for that sample. In effect, formula [10.8.1] is doing the same thing
except that it combines the separate standard errors for two samples into one useful numerical
index-the standard error of the difference between means.
Formula [10.8.1] is readily reduced to a simpler version when the available Ns for
the two samples are equal to one another. When N1 and N2 are equal to one another,
use the following formula to estimate sX̄1-X̄2:
[10.11.1] sX̄1-X̄2 = √((SS1 + SS2)/(N(N - 1))),
where N is equal to the equivalent number of observations in each group, not the total
number of observations available (i.e., N1 + N2). When unbiased estimates of the sample
variances are available, an alternative formula can be used:

[10.12.1] sX̄1-X̄2 = √(ŝ1²/N1 + ŝ2²/N2).
Please remember that the caret (^) not only indicates that a statistic is unbiased, its
presence specifically means that N - 1 rather than N was the denominator in the formula
used to calculate the respective sample variance estimates (i.e., Σ(X - X̄)²/(N - 1)).
The variance estimates are then divided by the respective sample sizes, the resulting
products are summed, and the square root of the sum is taken.
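As a check on the algebra, here is a small Python sketch (not in the original text; the sums of squares are made-up illustration values) showing that the pooled formula [10.8.1] and the equal-N shortcut [10.11.1] give the same standard error when N1 = N2:

```python
import math

def pooled_std_error(ss1, ss2, n1, n2):
    # Formula [10.8.1]: pool the sums of squares and degrees of freedom,
    # then weight by the two sample sizes.
    pooled_var = (ss1 + ss2) / ((n1 + n2) - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def equal_n_std_error(ss1, ss2, n):
    # Formula [10.11.1]: shortcut when both groups have N observations each.
    return math.sqrt((ss1 + ss2) / (n * (n - 1)))

# Hypothetical sums of squares for two samples of 10 observations each:
ss1, ss2, n = 12.0, 15.0, 10

print(round(pooled_std_error(ss1, ss2, n, n), 4))  # 0.5477
print(round(equal_n_std_error(ss1, ss2, n), 4))    # 0.5477
```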
The standard error of the difference between means (sX̄1-X̄2) is often referred to as the error term, the denominator in statistical calculations.

When sample sizes and their population variances are approximately equal to one
another (i.e., σ1² ≈ σ2²), the standard error of the difference between means (sX̄1-X̄2) is
ideally suited for use in the t test for independent groups. Many statistics books and some
research methods texts refer to this form of standard error as an error term. In other
words, the standard error of the difference between means serves as a numerical gauge
of how much random error-the degree to which observations deviate from a population's
true characteristics-is present when samples are drawn from a population.
KEY TERM The error term is a synonym for the standard error of the difference between means, one used to
estimate the random error that is found anytime samples are drawn from a population.
The error term is always used as a denominator in statistical calculations. Keep in mind,
too, that the standard error of the difference between means is just a different incarnation
of the standard deviation-it just so happens that this standard deviation lets data an-
alysts estimate how much random samples differ from population values. Armed with
this statistic, we are almost ready to test a hypothesis involving two independent groups.
Before we do so, however, a short overview of the relationship between error terms and
mean differences is in order.
Comparing Means: A Conceptual Model and an Aside for Future Statistical Tests
This section is a very important one, so much so that if you read it carefully and think
about it critically, I assure you that the statistical tests that follow in this and even the
next chapter will make a great deal of sense. Take your time reading and refer back to
previous sections as needed, but remember that by comparing means we are trying to
assign a numerical value to how people behave under various conditions. The dispar-
ity or similarity of behavior within and between groups of people allows us to deter-
mine whether and where any statistically significant differences lie.
Various statistical tests used to compare means-especially those relying on two
samples-are based on the following conceptual model:
Statistical test = Difference between sample means / Standard error.

Given that we just learned that error term is a substitute phrase for standard error, this
formula is easily recast as:

Statistical test = Difference between sample means / Error term.
This model indicates that the measured difference between sample means (numerator)
is divided by estimated random error of the difference between means (denominator).
When the observed difference between the means is great-so great that it exceeds the
estimated random error-the difference between the sample means is presumably
due to some influence beyond random variation in the samples. A researcher, of
course, hopes that the influence is the independent variable, one that created a measurable
difference in behavior between the samples. When an observed difference be-
tween means is sufficiently large, that is, greater than would be anticipated by chance,
the sample means-and the result they represent-are said to be significantly different
from one another.
Earlier, we discussed what is meant by a between-subjects design, a designation we
can return to here to describe what occurs between samples. The mean of each sample
represents an average level of behavior within that group, so that if we examine the
difference between the averages of the two groups, we identify what statisticians call
between-groups or between-samples variation. The difference in variability between
the groups-literally, a difference in average behavior-is attributed to the independent
variable.
Let's turn back to the error term for a moment. It represents a combined index of
average deviation from the mean behavior within each sample. When the error term is
relatively small, it means that behavior within each sample was similar to the mean of
the sample (i.e., low variability within each respective group). On the other hand, when
the error term is larger, behavior within either group deviated from what was typical
for the group. Where mean comparisons are concerned, statisticians refer to the stan-
dard error or error term in a test statistic as within-group or within-sample variation.
Within-group variation-deviation from a group's average-is explained by ex-
amining two sources: random error and experimental error. Random error is ascribed
to a host of minor influences, chief of which is that people are simply different from
one another (many investigators refer to this as individual differences within a sam-
ple). Random error is expected and there is little or nothing to be done about it. Ex-
perimental error is more problematic-indeed, it is often systematic, originating from
equipment failure, environmental conditions, poorly presented instructions, or any
other disruption in what should ideally be a smoothly running study (for more detailed
discussion of such biases, see Dunn, 1999; Rosenthal & Rosnow, 1991).
Within-group variation-the error term-is comprised of random error and experimental error.

As you might imagine, a researcher's goal is to control experimental error and to
reduce random error as much as possible. The practical reason for doing so is statistical:
by doing so, the error term is kept small so that when it is divided into the observed
mean difference, the resulting test statistic should be numerically large enough to exceed
some critical value so that the null hypothesis of no difference between the means
is thereby rejected. When the error term is large, its utility as a denominator suffers;
any resulting test statistic is unlikely to even reach a critical value and the null hypothesis
of no difference must be retained.
What, then, is the conceptual "take home" message from this section? In order to find
a statistically significant difference between two means, a researcher desires a relatively
large amount of between-groups variation and a relatively small amount of within-
groups variation. Put another way, due to the effect of the independent variable, each
sample should behave somewhat differently from the other, whereas the behavior dis-
played within each sample should be similar. We can now recast the relationship between
error terms and mean differences as:

[10.13.1] t = ((X̄1 - X̄2) - (μ1 - μ2)) / sX̄1-X̄2.
You may remember this version of the t formula, as you encountered it back in chap-
ter 1's project exercise. Let's review each part of formula [10.13.1] before using it to ac-
tually test a hypothesis. As you can see, the denominator is the standard error of the
difference between means or error term that we just reviewed in detail. The magnitude
of the error term will depend on the size of the two samples and the variability within
each one, though two general rules of data analysis apply: Larger sample sizes will usu-
ally result in a smaller error term, as will lower variance within each sample.
The numerator presented in [10.13.1] is unusual because it has two components,
one representing sample means and one representing population means. Actually, the
second component is literally symbolic-(μ1 - μ2) points to the null hypothesis, which
will equal 0 if no difference between the population means exists. When Ho is true, any
minor difference between the observed means is attributed to sampling error, not the
intervention of any independent variable. The first component in formula [10.13.1],
then, is the actual numerator, so that the numerical difference between X̄1 and X̄2 is as-
sessed and then divided by the error term. A simplified way to present the t test for independent
groups is:

[10.14.1] t = (X̄1 - X̄2) / sX̄1-X̄2.
Table 10.3 Test Scores from Control and Cooperative Learning Groups

        Control Group    Cooperative Learning Group
              3                    6
              5                    7
              3                    7
              5                    8
              4                    6
              3                    8
              3                    8
              3                    7
N             8                    8
ΣX           29                   57
ΣX²         111                  411
X̄          3.63                 7.13
SS         5.88                 4.88
s          .916                 .835
Step 1. The null hypothesis for the present study is that neither group has a higher
test score, or:

Ho: μ1 = μ2.

The alternative or research hypothesis would be that the control group would exhibit
a lower test score than the cooperative learning group, or:

H1: μ1 < μ2.
As always, the significance level for rejecting Ho will be .05.
Step 2. We need to calculate the standard error of the difference between means
(error term), but we will do that in the course of calculating the t statistic in step 3.
What we can do now is to decide to use a one- or a two-tailed significance test. To be
appropriately conservative, a two-tailed test will be performed.
Step 3. We can now perform the t test for independent groups using formula
[10.14.1]. Because the sample sizes are equal, we can use formula [10.11.1] to deter-
mine the error term (denominator).
Information drawn from the bottom of Table 10.3 can now be entered into this
formula:

[10.14.3] t = (3.63 - 7.13) / √((5.88 + 4.88)/(8(8 - 1))),

[10.14.4] t = -3.5 / √(10.76/(8(7))),

[10.14.5] t = -3.5 / √(10.76/56),

[10.14.6] t = -3.5 / √.1921,

[10.14.7] t = -3.5/.4383,

[10.14.8] t = -7.99.
Because this is the first statistical test used to compare independent samples, I provided
more detailed steps for performing the calculation than usual. Please note that I did not
demonstrate how to calculate the sum of the squares here, as we have reviewed it nu-
merous times before (see formulas [10.9.1] and [10.10.1]).
Before we can identify a critical t value, enabling us to accept or reject the null hy-
pothesis, we need to calculate the degrees of freedom for this t test. Degrees of freedom
for an independent group t test are calculated using:
[10.15.1] degrees of freedom (df) = (N1 + N2) - 2.
There were 8 students in each group, so:
[10.15.2] df = (8 + 8) - 2,
[10.15.3] df = 16 - 2,
[10.15.4] df = 14.
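The full computation can be replayed from the raw scores in Table 10.3 with a short Python sketch (not part of the original text); it recovers the same t statistic and degrees of freedom as the hand calculation:

```python
import math

# Raw scores from Table 10.3
control = [3, 5, 3, 5, 4, 3, 3, 3]
cooperative = [6, 7, 7, 8, 6, 8, 8, 7]

def sum_of_squares(scores):
    # SS = sum of X**2 minus (sum of X)**2 / N (formulas [10.9.1], [10.10.1])
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / len(scores)

n = len(control)  # equal group sizes, N = 8
mean_diff = sum(control) / n - sum(cooperative) / n

# Error term via the equal-N formula [10.11.1]
error_term = math.sqrt(
    (sum_of_squares(control) + sum_of_squares(cooperative)) / (n * (n - 1))
)

t = mean_diff / error_term   # formula [10.14.1]
df = (n + n) - 2             # formula [10.15.1]

print(round(t, 2), df)       # -7.99 14
```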
We then turn to Table B.4 in Appendix B (please turn there now, as you need to
develop the habit of using this table). Locate the number 14 in the leftmost column
(labeled "df"), as well as the .05 column under the heading "level of
significance for two-tailed test." What critical value for t meets your gaze as you read
across the row for 14 degrees of freedom and down the .05 column? You should locate
2.145. Did you?
We now compare the calculated t against this critical t value-but wait, the calculated
t is negative? Recall the guidance provided in Data Box 10.B and "drop" the negative
sign (for now) and compare the absolute value of the calculated statistic with the
critical value from Table B.4. Is the absolute value of the calculated t greater than the
critical t? Yes, indeed:

|tcalculated(14)| = 7.99 > tcrit(14) = 2.145: Reject Ho.
Writing up the Result of an Independent Groups tTest. What can the educational re-
searcher conclude? Reporting the results in APA style, the researcher would briefly re-
mind the reader about the hypothesis and then highlight the obtained results. Here is
one way to achieve these ends:
Did those students who learned about industrial pollution while working together learn
more than those who studied the material alone? At the end of the week, a 10-point
knowledge test concerning pollution reduction was given to all 16 students. As expected,
students in the control group received lower scores (M = 3.63, SD = .916) than did
those in the cooperative learning group (M = 7.13, SD = .835), t(14) = -7.99,
P < .05, two-tailed.
Please note a few things about how the result is reported. First, I elected to report
the means and respective standard deviations within a single sentence. As noted earlier
in this book, this style is preferable when there are few means to review-if several
means were being tested, a tabular format would be appropriate. Second, the
parenthetical means use "M" for mean rather than "X̄" and "SD" for standard deviation
rather than s-APA style abbreviation conventions that subtly emphasize average
behavior, not statistical analysis, as the key point when results are presented.
Third, the actual result of the test statistic is presented at the end of the sentence de-
scribing what happened. By placing the statistical information here, it can easily be
examined by interested researchers or safely ignored by lay readers. Finally, note that
in the interest of reporting accurate analyses, I retained the negative sign on the
calculated t value.
Deflaters (words used to describe "dramatic" negative change or poor performance): only,
meager growth, poor performance, mere, nose dived, tumbled, deteriorated, slipped, collapsed, dis-
integrated, toppled, shrank, as little as.
-----------------------------------------
On first blush, these words and phrases appear to be associated with the mass media, but
some of them find their way into the behavioral science literature, as well. As you read articles,
books, and book chapters containing statistical analysis and interpretation, watch for strategic
use of inflaters and deflaters to emphasize the magnitude of mean differences or other results.
Remember that the relationships observed within a set of data may not be as strong or as weak
as they are said to be!
Are we finished with the analysis? We are actually not finished yet, as the opportunity
to examine the obtained effect remains. What information are we missing? We
need to have an index for the cooperative learning effect, one that denotes its relative
strength.
In the present example, what was the magnitude of the effect of the experimental
level of the independent variable (i.e., cooperative learning) on the dependent measure
(i.e., test scores)? A statistical effect size is readily calculable-different indices for various
test statistics are available-and higher values generally indicate that the independent
variable had a stronger influence on the dependent measure. One effect size formula
for the independent groups t test is effect size r, which is derived from the Pearson
correlation coefficient introduced in chapter 6 (e.g., Rosenthal, 1991; Rosenthal &
Rosnow, 1991; see also, Cohen, 1988):

effect size r = √(t²/(t² + df)).

Entering the calculated t of -7.99 (so that t² = 63.84) and its 14 degrees of freedom:

[10.16.3] effect size r = √(63.84/(63.84 + 14)),

[10.16.4] effect size r = √(63.84/77.84),

[10.16.5] effect size r = √.8201,

[10.16.6] effect size r = .9056 ≅ .91.
Clearly, the effect size of .91 in this hypothetical example is very strong (in the neigh-
borhood of what Rosnow and Rosenthal [1996] refer to as "jumbo-sized"), indicating
that the cooperative learning of material between pairs of students enhanced their later
test performance.
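The computation above is easy to wrap as a small Python helper (a sketch, not from the original text):

```python
import math

def effect_size_r(t, df):
    # Effect size r derived from t: r = sqrt(t**2 / (t**2 + df)).
    # The sign of t drops out, so r always falls between 0 and 1.
    return math.sqrt(t ** 2 / (t ** 2 + df))

# Values from the cooperative learning example: t(14) = -7.99
print(round(effect_size_r(-7.99, 14), 2))   # 0.91
```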
How is effect size r reported along with the main test statistic? Rather easily, actu-
ally. As a data analyst, you can assume that some of your readers will be familiar with-
even knowledgeable about-effect size, and so you do not need to go to great pains to
explain it within the Results section of an APA style paper. Readers who are unfamiliar
with statistical analysis will be looking for the "gist" or take-home message of your re-
sults, which will appear in clear prose, anyway. To indicate that the effect size of the t
statistic in the cooperative learning study was strong, we need only add an additional
sentence to the results paragraph (excerpted from above):
... students in the control group received lower scores (M = 3.63, SD = .916) than did
those in the cooperative learning group (M = 7.13, SD = .835), t(14) = -7.99, p < .05,
two-tailed. The size of this effect was large (effect size r = .91).
Note that by presenting the result in this manner, the context provided by the descrip-
tive portion of the results will help people who know little about statistics understand
the result's importance.
The concept of effect size, or the magnitude of some statistically significant result, was concep-
tually introduced in the last chapter, and a particular formula for its calculation (effect size
r derived from t) was presented here. Behavioral scientists now routinely note that effect size is
as important and informative as the statistical significance denoted by a p value. Data analysts
are now encouraged to report effect size along with standard inferential tests as a matter of course
(e.g., Rosenthal & Rosnow, 1991). Still, it is reasonable to ask whether there are other ways to
think about effect size besides statistical indices.
Prentice and Miller (1992) persuasively argue for the long tradition of examining effects
from a methodological rather than a merely statistical perspective. Indeed, generations of re-
searchers in psychology approached issues of effect size by way of research design strategies, not
data analysis. An intriguing quality inherent in these strategies is their emphasis on statistically
small effects that are nonetheless "impressive demonstrations" (Prentice & Miller, p. 161). Two
strategies are prominent: minimally manipulating an independent variable and choosing difficult-
to-influence dependent measures.
Minimal manipulations. The minimalist approach is impressive because even the relatively mi-
nor manipulation of an independent variable can account for change in the variability of a cor-
responding dependent measure. An obvious, classic example in this vein is Tajfel's minimal group
experiments (e.g., Billig & Tajfel, 1973; Tajfel & Billig, 1974), where arbitrary assignment to a
group leads to displays of ethnocentrism, a pronounced preference for one's own group and a
decided distrust of outsiders. Even when group assignment is done explicitly by some random-
izing procedure, individuals still demonstrate a clear preference for the members of their own
group (Locksley, Ortiz, & Hepburn, 1980).
This undue preference is all the more sobering when observers consider the complete arbi-
trariness of the exercise-a group member may literally have nothing whatever in common with
group peers except the accident of assignment to one rather than another collection of individ-
uals. Yet that group assignment is sufficient to engender loyalty and commitment to one's com-
patriots in the arbitrary group, as well as a strong tendency to favor this ingroup at the expense
of individuals randomly assigned to the (other) outgroup (see also, Tajfel, 1981). Beyond these
group assignment effects, of course, other independent variables adhering to the minimalist ap-
proach can be found in Prentice and Miller (1992).
Difficult dependent measures. On the surface, some dependent measures appear less open to
influence by particular independent variables than others. What independent variable, for ex-
ample, could be expected to show clear and consistent effects where people's subjective ratings
of intelligence, success, sociability, kindness, and sensitivity are concerned? Prentice and Miller
(1992) point to the pronounced effect that physical attractiveness has on social perceivers-more
attractive people are readily perceived to be more intelligent, successful, sociable, kind, and sen-
sitive, among other positive qualities, than are less attractive people (see Berscheid & Walster,
1974, for a review).
More to the point, physical attractiveness is repeatedly demonstrated to influence human
judgment in socially important domains, such as mock jury trials. Attractive defendants, for ex-
ample, are given lighter sentences and judged guilty less often than their less physically appeal-
ing counterparts (e.g., Efran, 1974; Sigall & Ostrove, 1975). As Prentice and Miller wryly observe,
if personal attractiveness has such subtle but powerful effects in what should be the paradigm
example of objectivity, the courtroom, then there may be no social setting that is immune to its
influence.
Hypothesis Testing with Two Independent Samples 391
The moral to the effect size story? Prentice and Miller (1992) remind us that our statistical
zeal for searching out strong effect sizes is all to the good, but that we should not neglect alter-
native research strategies in the process. Minimal manipulations and difficult-to-influence de-
pendent measures highlight roles for independent variables and psychological processes, respec-
I tively, even when neither source provides particularly impressive results in statistical terms. As
I
behavioral science researchers and data analysts, we can learn a valuable lesson from these ob-
I
r servations and supporting studies-even small effects can be impressive where understanding
human behavior is concerned.
about the effect of the independent variable (e.g., the presence or absence of coopera-
tive learning) on the dependent measure (e.g., follow-up knowledge test score). To be
sure, the significant difference identified by the independent groups t test indicates
that there is some degree of association between these two variables; it does not
quantify it.
Presumably, the educational researcher would like to characterize the relationship
between cooperative learning and subsequent test performance as an important one.
Generally speaking, the greater the degree of association between an independent vari-
able and a dependent measure, the more the finding can be designated an important
one. A straightforward index of the association between an independent variable and
a dependent measure is ω² (estimated omega-squared).

KEY TERM  As a statistical index, estimated omega-squared (ω²) indicates the degree to which an indepen-
dent variable accounts for variation or change in a dependent variable.
Calculating ω² for an independent groups t test is done using:

[10.17.1]   ω² = (t² - 1) / (t² + N1 + N2 - 1).

Once again, we need only insert the value of the calculated t statistic (it will be squared)
and, in addition, the number of participants appearing in both groups. Thus,

[10.17.2]   ω² = ((-7.99)² - 1) / ((-7.99)² + 8 + 8 - 1),

[10.17.3]   ω² = (63.84 - 1) / (63.84 + 15),

[10.17.4]   ω² = 62.84 / 78.84,

[10.17.5]   ω² = .7971 ≈ .80.
How do we interpret an ω² of .80? This statistic is interpreted in the same way we
learned to interpret r² (see chapter 7), that is, the degree to which the independent vari-
able accounts for the dependent variable. In concrete terms, we can say that approxi-
mately 80% of the variance in the test scores (dependent measure) is accounted for by
the learning method (independent variable). This amount of variation is considerable,
suggesting that the degree of association between the variables is very high.

Unlike r², however, ω² can have a positive or a negative value (r² values only range
between 0 and 1.0; see chapter 7). Very small t statistics (those falling below 1.00 in mag-
nitude) can lead to a negative ω². Negative values for ω² are not at all meaningful; in-
deed, there is no reason to calculate this statistic if the t test does not reveal a significant
between-group difference, anyway (Runyon, Haber, Pittenger, & Coleman, 1996).
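Formula [10.17.1] translates directly into code. A brief sketch (the function name is mine, not the authors'):

```python
def omega_squared(t, n1, n2):
    # Estimated omega-squared: (t^2 - 1) / (t^2 + n1 + n2 - 1).
    # For |t| below 1.00, the numerator (and the result) goes negative,
    # which is not meaningful, as noted above.
    return (t ** 2 - 1) / (t ** 2 + n1 + n2 - 1)

# The cooperative learning example: t = -7.99, 8 participants per group
w2 = omega_squared(-7.99, 8, 8)  # about .80
```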
392 Chapter 10 Mean Comparison I: The t Test
Thus, we use ω² for the same reason that we rely on effect size r: Despite their
utility for research, inferential statistics like the t test only identify the presence or
absence of a difference between means. To more fully understand the empirical qual-
ity of the relationship between variables, these additional indices are used. But wait,
how do we report ω² in the results of the t test for independent groups? We need only
add another clause to our growing account:

Students in the control group received lower scores (M = 3.63, SD = .916) than did
those in the cooperative learning group (M = 7.13, SD = .835), t(14) = -7.99, p < .05,
two-tailed. The size of this effect was large (effect size r = .91), as was the degree of
association between the independent variable and the dependent measure (ω² = .80).
Knowledge Base
1. True or False: Similar to traditional independent variables, subject variables are also
manipulated by researchers.
2. In contrast to the one-sample t test, when is the t test for independent groups used?
3. Where the sources of variation are concerned, most inferential test statistics are
based on _ _ groups variation divided by _ _ groups variation.
4. A health psychologist exposes two groups of randomly selected student volunteers
to a common flu virus. One group is comprised of 10 athletes who get regular ex-
ercise and the other is made up of 10 students who do not exercise regularly. The
researcher measures the average number of days each group stays ill with a cold.
Here are the data:
Athletes        Nonathletes
X̄ = 6.0         X̄ = 8.0
SS = 10         SS = 18
Can the researcher argue that regular exercise decreases the amount of time one
is ill?
5. What is the effect size and the value of ω² for the study described in question 4?
Answers
1. False: Subject variables (also known as organismic variables) are individual differences that
cannot be changed or otherwise manipulated.
2. An independent groups ttest is used in traditional or standard experiments where two ran-
domly assigned groups are exposed to one of the two levels of an independent variable. Any
between-group difference in the sample data is assumed to reflect a real difference between
respective populations and their parameters.
3. Test statistic = between-groups variation/within-groups variation.
Hypothesis Testing with Correlated Research Designs 393
4. Reject H0: t(18) = -3.76, p < .05 (two-tailed). The athletes were sick for fewer days on av-
erage (M = 6.0) than the nonathletes (M = 8.0).
5. Effect size r = .66; ω² = .40.
Note: Carryover effects can plague any correlated or repeated measures design where two or more
measures are administered to the same group of research participants.
Carryover effects are threats to the internal validity of the research because the investigator
is not always sure that the observed change is exclusively due to the effect of the inde-
pendent variable on the dependent measure (for a review of internal validity, see
chapter 2).
Table 10.4 lists six of the most common types of carryover effects observed in correlated
research designs. Depending on the nature of the research, one or more of the
sources listed in Table 10.4 can plague a correlated design. Plan to consult this list of
common carryover effects anytime you are reading about, planning, or analyzing data
from a correlated groups design. When presenting or writing about such results, you
should always acknowledge the potential presence of carryover effects by discussing
them in detail. Beyond such acknowledgement-and depending on the data's source or
use-you may want to assure yourself, as well as those learning about your research,
that carryover effects were either controlled for or ruled out in your work insofar as it
was possible to do so.
Counterbalancing is one methodological maneuver experimenters can use to pre-
vent carryover effects. Counterbalancing is a form of experimental control where each
participant encounters the same stimulus materials but in a different order than any of the
other participants (e.g., Campbell & Stanley, 1963). Imagine that before a pretest is administered
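The logic of counterbalancing can be sketched in a few lines of code. In the sketch below (the function name and the full-permutation scheme are my illustration, not from the text), every participant receives the same stimulus materials in a distinct serial order:

```python
from itertools import permutations

def counterbalanced_orders(materials, n_participants):
    # Full counterbalancing: each participant sees the same stimulus
    # materials, but in a different presentation order.
    orders = list(permutations(materials))
    if n_participants > len(orders):
        raise ValueError("more participants than distinct orders")
    return orders[:n_participants]

# Three stimulus sets yield 3! = 6 distinct orders, one per participant.
orders = counterbalanced_orders(["A", "B", "C"], 6)
```

With larger stimulus sets, researchers often use a Latin square rather than all possible permutations; the full-permutation version shown here is the simplest case.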
The Statistical Advantage of Correlated Groups Designs: Reducing Error Variance

The relatively low error variance associated with correlated groups designs increases the
chance of finding a significant research result.

The correlated group design enables us to build on the knowledge we gained earlier in
this chapter when we reviewed the conceptual model for comparing means. Statisti-
cally, the great advantage of the correlated group design is that the same participants
experience any and all treatments. Thus, assessing participants' states before and after
the presentation of the independent variable establishes a firm basis for comparison. In
a very real sense, each participant serves as his or her own control group. How so? Be-
cause each participant experiences both levels of the independent variable; that is, each per-
son encounters the study's control level (pretest measure) as well as experimental level
(posttest measure). Think about it: Assuming that carryover effects pose no interpre-
tive threat (see Table 10.4), the correlated research design is ideal because any measur-
able change from one measure to the next is attributed to the influence of the inde-
pendent variable.
Correlated group designs effectively keep error variation-the error term in the de-
nominator of the correlated groups t test-relatively low by using the same (or care-
fully matched) participants before and after the treatment is introduced. What keeps
To be sure, this formula looks a bit forbidding, though parts of it should certainly be
recognizable. Similar to the two prior t test variations, the numerator in formula
[10.18.1] denotes the comparison between means-here they happen to be from the
same or similar group of people, of course; and, as in the t test for independent groups,
the parameter notation involving μ is strictly symbolic. The denominator, the error
term, is not very different from what you already know, except that a correlation between
the standard errors of the two dependent measures is indicated.
Fortunately, another, much less laborious formula (with accompanying calculation
procedures) is available, one based on raw scores resulting from the mean differences
between timel and time2:
[10.19.1]   tD = X̄D / sX̄D,

where X̄D is the mean of the difference scores (the average change) between the be-
fore and after measures, and sX̄D = sD/√(N - 1) is the standard error of these difference
scores. As was the case for the Pearson r, please note that N refers to the number of
pairs of difference scores and not the total number of observations available. One
other note before we work through a sample problem: To avoid confusing this varia-
tion of the t test with the other two we know, I prefer to identify it as tD when per-
forming a calculation (the subscript D refers to "difference scores" or, if you prefer,
"dependent" groups). When reporting a result in written form, just be certain to let
the reader know that a t test for correlated group designs was used to analyze the data.

The t test for correlated groups assesses the average change between a measure taken at
time1 and another taken at time2.
Let's consider a straightforward example involving attitude change, one adapted
from Dunn (1999). An applied social psychologist wants to determine if an antismok-
ing campaign aimed at high school students is effective when it comes to making ex-
r isting attitudes toward cigarettes less positive. Following some intervention (e.g., a
I month-long series of speakers, class discussions on the perils of smoking, a poster cam-
paign in school hallways), a group of high school students is anticipated to perceive
smoking more negatively than before the intervention was initiated. Imagine that par-
ticipants are eight randomly selected high school students (four males, four females)
who rate their attitudes toward cigarette smoking at time1 (preintervention) and then
one month later at time2 (postintervention). Naturally, we will follow the four steps
for testing a hypothesis outlined in Table 9.2.
Step 1. What is the null hypothesis we are testing? By using formula [10.19.1], we
are assessing a mean difference in an unusual way. How so? We treat two samples
(pre- and posttest attitudes) as one sample (i.e., the calculated difference score based
on a raw score at time1 minus a raw score at time2). In effect, the population we are
sampling from is comprised of difference scores, and we are trying to determine whether
an observed mean difference (X̄D) is greater than 0. A larger average difference indi-
cates the presence of attitude change, while a smaller average change implies relative
constancy of belief. This logic is somewhat similar to that used to test hypotheses in-
volving the Pearson r in chapter 9 (i.e., is an observed value for r different from 0?).
The hope, of course, is that the obtained difference is in a direction consistent with the
study's hypothesis.

Thus, the null hypothesis regarding attitude change toward cigarette smoking is:

H0: μD = 0.
In other words, no change in attitude is expected to occur under Ho; the responses
collected before and after the antismoking campaign would essentially be from the
same population of attitudes. The alternative or research hypothesis is that some at-
titude change occurred so that the difference between the observed means is greater
than 0. In particular, the social psychologist expects that the mean attitude will de-
crease (become less positive) from time1 to time2. Symbolically, however, the alter-
native hypothesis is:

H1: μD > 0.
Following standard convention, our significance level for the correlated groups t test
will be .05.
Step 2. This second step usually entails calculating the standard error of the mean
or, in this case, the standard error of the difference between means. We must postpone
this calculation until step 3, when the standard error is determined after the standard
deviation of the difference scores becomes known. At this point, however, we can de-
cide to use a two-tailed significance test.
Step 3. Now we can get down to the actual hypothesis testing. The upper portion
of Table 10.5 illustrates how to set up the t test for correlated groups. As you can see,
there are eight participants (see the column labeled "participants" on the far-left side
of Table 10.5). Raw scores representing the pre- and the posttest attitude measures are
displayed in the second and third columns of Table 10.5. As shown by the means be-
neath the respective columns, the average attitude before the antismoking campaign
(X̄1 = 5.88) was more favorable than the average attitude (X̄2 = 3.75) after it. How do
we know this is so? The participants completed the same seven-point rating scale twice,
where higher scores indicate greater tolerance for smoking. Our interest, though, is not
in these two means per se but in the average difference between them. By performing
the correlated groups t test, we can determine if the average difference between these
means is significant (i.e., attitudes toward smoking became more negative, as the mean
dropped, from time1 to time2).
""" .
q
Participant
1
Pretesta
6
Posttesta
4
Difference (D)
2
XD
2.13
XD-XD
-.13
(XD - X D)2
.017
2 5 2 3 2.13 .87 .757
3 7 4 3 2.13 .87 .757
4 6 3 3 2.13 .87 .757
5 5 5 0 2.13 -2.14 4.54
6 6 4 2 2.13 -.13 .017
7 5 2 3 2.13 .87 .757
8 7 6 2.13 -1.13 1.28
XD = I D = 12. = 2.13
N 8
Note: Data are hypothetical attitudes concerning smoking collected before (pretest) and after (posttest) an
antismoking campaign.
"Ratings are based on a seven-point scale where higher numbers indicate more favorable attitudes toward
cigarette smoking.
How do we begin the calculations? The first step is to calculate the difference (D)
between the pretest and posttest scores. These difference scores are shown in column 4
of Table 10.5 (please note that the ΣD and the mean of D (i.e., X̄D) are shown at the
bottom of this fourth column). The formula for calculating the mean of D is shown in
the lower portion of Table 10.5. The X̄D, equal to 2.13, is then entered into column 5,
and the difference between each entry in columns 4 and 5 is subsequently entered into
column 6 (i.e., XD - X̄D; see Table 10.5). These difference scores are then squared (see
the entries in column 7) and the sum of the squared difference scores (i.e., Σ(XD -
X̄D)² = 8.88) is shown below column 7 (see Table 10.5).
The next step is to determine the standard deviation of the difference scores by using:

[10.20.1]   sD = √(Σ(XD - X̄D)² / N).

We need only enter the number noted at the bottom of column 7, which is 8.88, as well
as the N. Again, please note that N refers to the number of participants or, if you pre-
fer, paired measures, but not the total number of observations available.
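Putting the steps together, here is a sketch of the whole correlated groups computation on the Table 10.5 data (my code, not the authors'; it assumes the chapter's convention of sD = √(SSD/N) with standard error sD/√(N - 1), and because it carries full precision rather than rounding at each step, its result differs slightly from a hand calculation):

```python
import math

def correlated_t(pre, post):
    # t test for correlated groups via difference scores:
    # t_D = mean(D) / (s_D / sqrt(N - 1)), where s_D = sqrt(SS_D / N).
    d = [a - b for a, b in zip(pre, post)]
    n = len(d)
    mean_d = sum(d) / n                          # average change, X-bar_D
    ss_d = sum((x - mean_d) ** 2 for x in d)     # sum of squared deviations
    s_d = math.sqrt(ss_d / n)                    # SD of difference scores
    return mean_d / (s_d / math.sqrt(n - 1))     # t_D with df = N - 1

pre = [6, 5, 7, 6, 5, 6, 5, 7]    # Table 10.5 pretest ratings
post = [4, 2, 4, 3, 5, 4, 2, 6]   # Table 10.5 posttest ratings
t_d = correlated_t(pre, post)      # about 5.34 with df = N - 1 = 7
```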
[10.22.3]   effect size r = √(28.84 / (28.84 + 7)),

[10.22.4]   effect size r = √(28.84 / 35.84),
Use of this supplemental statistical information provides readers with a clearer sense of
what happened in the research project, one that extends beyond the province of the
mean difference. We turn now to the final topic in this chapter, one that will help us to
think about predicting differences when a research topic is still on the drawing board
or merely the seed of an interesting idea.
Source: Rosnow and Rosenthal (1996, p. 263), which was adapted from Cohen (1988, pp. 92-93).
Knowledge Base
1. What does the word "correlated" refer to in correlated group designs?
2. True or False: Carryover effects are threats to the internal validity of correlated
group designs.
3. True or False: In a matched groups design, the same group of participants appears
in both the control and experimental condition.
4. Why is the error term in the correlated groups t test usually relatively low?
5. What is the nature of the mean difference assessed by the t test for correlated
groups?
6. An investigator is planning an experiment on a topic that usually finds effect sizes
around r = .30, and she assumes a power level of .70. How large should her par-
ticipant sample be in order to have a reasonable chance of rejecting the null
hypothesis?
Answers
1. The dependent measures are said to be correlated because the same participants respond to
them at least twice (e.g., before and after exposure to an independent variable).
2. True.
3. False: Similar participants are paired together on some matching variable, one highly corre-
lated with the dependent measure. One member of each matched pair is then randomly as-
signed to a control or an experimental condition; the other member is then placed in the re-
maining condition.
4. The error term remains low because the research participants are measured more than once
or matched so that any subject-related factors (i.e., individual differences that are sources of
error) are more or less identical.
5. The t test for correlated groups assesses whether the mean of the difference scores (X̄D), the
average change between a before and after measure, is greater than 0.
6. Using Table 10.6, a total of 65 participants would be needed.
We began this chapter by discussing the statistical exploits of William Gosset, a creative
researcher who developed an effective statistical alternative-the t test-enabling him
to cope with the research vagaries endemic to the brewing industry (e.g., small sam-
ples). Fortunately, similar problems were noticed in the behavioral sciences, and an al-
liance between the t test and experimental research was forged. We closed the chapter
by touting the use of power analysis tables as effective tools for increasing the chance
of obtaining significant results by creating relatively ideal research conditions (e.g., ample
sample size of participants). Such tools decrease a researcher's risk and render expenditures
of time, effort, and energy worthwhile.
Besides the ubiquitous statistical theme, what else do these topics have in com-
mon? Both illustrate the need for careful and thoughtful planning where statistical
analyses are concerned. An unsung but important part of becoming a skilled re-
searcher and savvy data analyst is planning the statistical analysis you will use at the
same time you are planning the experiment that will use it. By "planning the analysis,"
I mean thinking about all the necessary parts of a research project from start to
finish, from idea conception to the sharing of results, with an eye on how the col-
lected data will be analyzed.
You would be surprised (if not shocked) by the number of researchers-
students and faculty alike-who do not bother to think about what statistical test will
be used to analyze what data until after the study is over, when it is almost
always too late to do anything about the empirical sins committed along the way
(e.g., Dunn, 1999; Martin, 1996; McKenna, 1995). Despite entreaties and earnest warn-
ings, I still encounter students who design and execute a semester's research project
without ever thinking about how to analyze their data. To our mutual regret, many
times the student and I discover that the data are not analyzable by conventional tests-
indeed, the data often cannot be understood by any statistical test! Once data are col-
lected, salvage missions are hard to launch and, in any case, they rarely lead to behav-
ioral insight or salvation for researchers (or in the case of students, their grades).
Why don't people plan their analyses more carefully? Fear and loathing of statis-
tics, time constraints, disorganization, and general sloth all conspire to keep students
and researchers from planning what they will do with the information they gather at a
project's outset. To combat this problem, I encourage-sometimes exhort-my students
to develop what I refer to as a Before and After Data Collection Analysis Plan before they
begin a project in earnest.
An analysis plan identifies and defines independent and dependent variables, as
well as the statistical test(s) used to analyze the relationship(s) between them. Chapter
10's Project Exercise is designed to help you to plan your analyses-literally or hypo-
thetically-while avoiding the research fate of less organized peers. This Project Exer-
cise can be performed for (a) an experiment you are actually conducting or (b) a hy-
pothetical experiment you could conduct. In either case, I encourage you to follow the
steps outlined in Table 10.7 and to think about the issues highlighted therein. The ques-
tions provided will help you to think more critically about the links among the avail-
able literature on a topic, research methodology, and data analysis. Table 10.7 makes
these links more apparent by outlining four steps that should be followed before any
data are collected and four steps guiding research activities once the data are in hand.
Please note that only step 5 in Table 10.7 refers to conducting the actual project; there
is more to research planning than just data collection.
If you are creating an analysis plan for a project you intend to conduct (option a
above), then obviously you can use the steps in Table 10.7 with any number of different
statistical tests in mind. If you develop an analysis plan for a hypothetical study (option b
above), however, I encourage you to create a study where one of the three variations of
the t test presented in this chapter would be appropriate. By using your knowledge of
mean comparison and the t test in this way, you are increasing the chance that you will
remember information about this test and its applications for the future.
Your instructor may ask you to share your research topic and analysis plan with
the class. Before you do so, be sure to think about how your choice of statistical analy-
ses can be used to support the ideas underlying your research.
"The decision tree that opens chapter 15, as well as that chapter's Project Exercise, provide guidance for
choosing an appropriate statistical test.
Source: Adapted from Dunn (1999) and McKenna (1995).
Here are the Project Exercise questions you should answer as you link options (a)
or (b) to the guidelines shown in Table 10.7:
1. What is your research topic? What behavioral science discipline (e.g., psychology,
sociology, education) does the topic fall under?
2. What types of studies-experiments, quasi-experiments, observational research-
are published on your topic? Characterize the usual research design(s), indepen-
dent and dependent variable(s), and statistical analyses employed. What differences
were found? How strong were they?
3. What will your study be like? What question will you address empirically? Opera-
tionally define your variables, characterize the research design, and explain who will
take part in the study. How many participants will be necessary for your research?
Who will they be?
4. What statistical hypothesis will you test? What statistical analysis will you use to test
it? Describe the results in behavioral and statistical terms you anticipate observing.
5. Are there any special issues for your research that do not appear in Table 10.7? If
so, what are they? Why do they matter?
Looking Forward, Then Back

This chapter represents the first group of inferential statistical tests designed for use
with sample data drawn from one or two populations. The t test is probably the
most common statistical test used in the behavioral sciences, and it is one with
more applicability than many students and researchers realize. The three decision trees
at the beginning of this chapter attest to this fact. The first decision tree is useful be-
cause it will help you to decide which of the three possible t tests (i.e., single-sample,
independent groups, or correlated groups) is best suited to your data. The second de-
cision tree reinforces the fact that the independent groups t test is by far the most fa-
miliar one, and its test statistic is the one students are apt to encounter in their read-
ing or analyses for a research project they undertake. More to the point, perhaps, this
second tree can guide you from start to finish through any analysis requiring the inde-
pendent groups t test.

The third and final decision tree is designed to encourage you to think about key
data analytic issues before you start an experiment to test some hypothesis. Effect size
and power are two essential considerations that can "make or break" the results from a
piece of research. As noted on previous occasions in this book, planning and forethought
where statistical analyses are concerned enhances the chance of success; here, correctly
rejecting the null hypothesis of no difference.
Summary
1. The t test-Student's t-was born out of necessity to deal with small samples when the parameters and variability of larger populations remain unknown. The t test is used to detect significant differences between two sample means, which are either independent or dependent, or between a sample mean and an estimated population mean.
2. The t distribution substitutes for the z distribution because the latter is of limited use when small samples (<30 participants) are examined. The t distributions are a family of bell-shaped sampling distributions with a mean of 0 and standard deviations that reduce in magnitude as sample sizes increase.
3. Several assumptions underlie the t test: (a) the data are drawn from a normally distributed population(s); (b) the data are either randomly sampled from a larger population or individually selected from a larger population; (c) dependent measures must be based on an interval or a ratio scale; and (d) when independent samples are used, they are presumed to originate in populations with equal variances.
4. The t test is said to be statistically robust; that is, it can still provide reliable answers when some of its assumptions are violated. Robust tests generally guard against Type I errors.
5. The t test assesses whether a difference exists between two means, and this difference serves as the numerator in a calculation. The standard deviation(s) of a sample(s) is used to estimate the standard error of the mean or the difference between means, which acts as the denominator. Smaller standard error values tend to occur when sample sizes are large, two conditions that lead to larger t statistics.
6. The single- or one-sample t test is used to compare an observed sample mean with a hypothesized value thought to be representative of a population. This variation of the t test is often used to measure deviation from some known behavioral pattern or standard.
7. The power of a t test is influenced by the selected significance level, variability within the sample data, the size of a sample(s), and the magnitude of the difference between means.
8. Hypothesis testing aimed at assessing whether a significant difference exists between two independent samples is the most common-and familiar-application of the t test. The independent groups t test compares the mean reaction of one group (e.g., experimental condition) to another (e.g., control condition). This inferential statistic is commonly used to analyze data from two-group between-subjects designs.
9. Independent variables are either manipulated, as in traditional or standard experiments, or measured as a subject variable. A subject variable is an individual characteristic a participant possesses, a trait or characteristic (e.g., age, gender, height, IQ) that is beyond the control of the experimenter. For example, subject variables can be used for comparing one group showing the trait to another group where the trait is absent.
10. Various inferential statistical tests, including the t test, are based on a conceptual model where between-groups variation (attributable to an independent variable) is divided by within-groups variation (the error term or the degree of behavioral similarity observed in each group). Generally, researchers want to obtain a large amount of between-groups variation relative to a small amount of within-groups variation.
11. Beyond reporting a mean difference revealed by an independent groups t test, it is also important to index any significant effect size (e.g., effect size r) and to indicate the degree of association existing between the relevant independent variable and dependent measure (e.g., estimated ω²).
12. The t test for correlated groups is used to assess an observed difference between two conditions for the same or a matched group of participants. This test is used to analyze correlated group designs, where participant responses are assessed before and after exposure to an independent variable. The test can also analyze data from matched group designs, where pairs of participants are equated on some dimension that is highly related to the dependent measure.
13. Carryover effects involve response bias, where prior responses to a dependent measure influence reactions to the same measure on a subsequent occasion(s). These carryover effects are common in correlated group designs and can mislead investigators by mimicking, masking, or eliminating the effects of an independent variable. As a result, carryover effects serve as a threat to internal validity.
14. Correlated group and matched group designs pose a distinct statistical advantage, the reduction of error variance. In contrast to between-groups research designs, these correlated designs draw error variance from one source (e.g., one group of participants) rather than two (e.g., two independent samples of participants). When a smaller error term is present in the denominator of the t test, there is a greater likelihood of rejecting the null hypothesis even when the difference between means is modest.
15. Power analysis is a tool to enhance a researcher's probability of correctly rejecting a null hypothesis. By planning an experiment in advance (i.e., identifying effect sizes in the relevant research literature and selecting a power level), a researcher can identify the total number of participants needed to have a reasonable chance of detecting a statistically significant result.
16. Before collecting one datum, every student and researcher should complete an analysis plan, a guiding "game plan" that promotes identifying necessary statistical tests in advance.

406 Chapter 10 Mean Comparison I: The t Test
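Summary point 6 describes the single-sample t test, where t = (sample mean − μ) / (s / √n). As an illustrative sketch only (the function name and data below are my own, not the book's), the statistic can be computed directly:

```python
import math

def one_sample_t(scores, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s the unbiased (n - 1) SD."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return (mean - mu) / (s / math.sqrt(n))

# Hypothetical scores compared against a hypothesized population mean of 45
print(one_sample_t([48, 52, 50, 47, 53], 45))
```

The resulting t is then compared with the critical value from the t distribution with n − 1 degrees of freedom at the chosen significance level.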
Key Terms
Between-groups design (p. 378)
Between-groups variation (p. 383)
Between-samples variation (p. 383)
Between-subjects design (p. 378)
Carryover effects (p. 393)
Correlated groups design (p. 393)
Error term (p. 382)
Estimated ω² (p. 391)
Individual differences (p. 384)
Matched groups design (p. 393)
Paired t test (p. 393)
Power analysis (p. 400)
Repeated measures design (p. 393)
Robust (p. 370)
Sampling distribution of the difference between means (p. 379)
Single- or one-sample t test (p. 372)
Standard error of the difference between means (p. 381)
Subject variables (p. 378)
t distributions (p. 368)
t test for correlated groups (p. 393)
t test for dependent groups (p. 393)
t test for independent groups (p. 384)
Within-group variation (p. 383)
Within-sample variation (p. 383)
Within-subjects design (p. 393)
Chapter Problems
1. Why is the t test used in place of the z test?
2. How do t distributions differ from the z distribution?
3. List the assumptions underlying use of a t test. What happens when one of these assumptions is violated? Can the t test still be used-why or why not?
4. What does it mean when a statistical test is described as "robust"? Is the t test robust? Under what particular conditions does use of the t test become problematic, that is, a result based on it becomes less robust?
5. Why are larger samples desirable? How do larger samples influence the size of a sample's standard deviation and standard error?
6. When should a researcher use the single sample t test? Describe a hypothetical but concrete example.
7. A personality researcher wants to know if students attending smaller colleges are more introverted than those going to larger universities. The researcher selects a sample of N = 32 19-year-old students from a small liberal arts college. These students complete a standardized introversion-extroversion scale, one whose scale characteristics were developed using university populations (lower scores indicate higher levels of introversion). The scale's μ = 65. The mean of the sample X = 62 and the standard deviation is 7. Using a significance level of .05, can the personality researcher assume that the college students are more introverted than the general population?
8. What is the 95% confidence interval for the mean reported in problem 7?
9. An organization of middle school educators believes that students' geographical knowledge has improved over the past five years. To verify this belief, a sample of 26 middle school students completes a world geography test (X = 53, s = 8.5). The middle school average on this measure is 50. Using a significance level of .01, have the educators demonstrated that geographical knowledge is actually improving? Why or why not?
10. What is the 99% confidence interval for the mean reported in problem 9?
11. A random sample is drawn and observed to have an X = 45 and an s = 11. Use this sample data to test the null hypothesis that μ = 42 when:
   a. The sample size is 40 and the significance level for the test is .05.
   b. The sample size is 25 and the significance level for the test is .01.
   c. The sample size is 62 and the significance level is .05.
12. A random sample is drawn and observed to have an X = 77 and an s = 7.5. Use this sample data to test the null hypothesis that μ = 80 when:
   a. The sample size is 30 and the significance level for the test is .01.
   b. The sample size is 43 and the significance level for the test is .05.
   c. The sample size is 20 and the significance level is .05.
13. A campus psychologist is starting a weekly group to discuss low level depression, the type often triggered by stressful academic events. The psychologist knows that the population mean of the screening test is 35, and that scores at or above this level indicate the probable presence of low level depression. His sample of students obtained screening test scores of 30, 36, 34, 34, 38, 40, 32, 30, 28, 37, 36, and 37. Can the psychologist label this group of students as having low level depression so that he can begin the therapy group? Use a .05 level of significance and perform a one-tailed test.
14. A teacher of gifted students believes that her sample of students has IQs still higher than the superior gifted cutoff of 134. Her students' scores on the IQ test were 132, 130, 136, 137, 135, 136, 133, 135, and 137. Is the teacher correct? Can she claim that as a group, her students exceed the superior level? Use a significance level of .01 and perform a one-tailed test.
15. What are the confidence intervals for the sample means based on the data reported in problems 13 and 14? Base the confidence interval on the appropriate significance level provided in each problem.
16. Discuss the factors affecting the statistical power of a t test.
17. Why is the t test for independent groups ideal for hypothesis testing in experimental research?
18. How do traditional independent variables differ from subject variables?
19. What is a subject variable? How are subject variables used in concert with between-groups designs and the independent groups t test?
20. How does the standard error of the difference between means differ from the usual measures of standard error?
21. Explain the nature of the conceptual model for comparing means presented in this chapter. Why is this model an appropriate prelude for most inferential statistical tests?
22. Unbeknownst to his class, an instructor decides to replicate a classic study on experimenter expectancy effects (Rosenthal & Fode, 1963). In an experimental psychology lab, each student was given a rat to teach to run a standard maze. However, half of the students were told their rats were specially bred to be "maze bright" while the remaining students were told their rats were "maze dull." In actuality, of course, all the rats were of the same breed and possessed no special talents. The following data represent the rats' skill level at running the mazes, where lower numbers represent fewer errors (i.e., higher skill). Did the students with the maze bright rats transmit that expectancy to the animals, so that they outperformed the "maze dull" animals? Use a one-tailed test with a significance level of .05.
   Maze bright scores: 15, 10, 11, 10, 12, 13, 10, 13, 12, 11
   Maze dull scores: 17, 18, 17, 16.5, 17, 19, 13, 12, 18, 17
23. What is the effect size for the result obtained in problem 22? If the t statistic calculated in problem 22 reached significance, then determine the value of estimated ω².
24. Using the examples presented earlier in the chapter, write a paragraph or two summarizing the results obtained in problems 22 and 23.
25. Participants took part in a study on aversive noise and its effect on performing skilled tasks. An experimental psychologist believes that the predictability of the aversive noise is key to understanding its influence on performance. Specifically, predictable noise-noise occurring at fixed intervals of time-is less disruptive than random noise, which reminds participants that they lack control in the situation. Two groups of participants solved moderately difficult math problems while listening to a loud noise. One group heard the loud noise at fixed intervals, the other heard it at random intervals. The following data are the total number of math problems that participants in each group got correct. Determine
whether the experimenter was right, that random noise is more disruptive to skilled performance than fixed noise (use a .05 significance level for a one-tailed test).
   Random noise: 10, 9, 8, 8, 9, 10, 9, 8, 7, 8
   Fixed noise: 6, 7, 8, 8, 7, 7, 6, 5, 8, 6
26. What is the effect size for the result obtained in problem 25? If the t statistic calculated in problem 25 reached significance, then determine the value of estimated ω².
27. Using the examples presented earlier in the chapter, write a paragraph or two summarizing the results obtained in problems 25 and 26.
28. When teaching about science, instructors are assumed to rely on less descriptive or flowery language because the natural sciences contain topical information that is relatively fixed in content. Classroom instructors in the humanities, however, rely on much more descriptive language because the topics are open to debate and multiple interpretations. An educational researcher sits in several natural science and humanities classes at several high schools and monitors the teaching styles of instructors. What follows are the number of different words and phrases used by the respective sets of instructors to convey course material. Did the humanities teachers use more descriptive language than the natural science instructors? Use a .01 significance level and perform the appropriate one-tailed test, and take note that the groups are not of equal size.
   Humanities instructors' speech: 34, 28, 27, 32, 33, 30, 38, 32, 30, 28, 34, 32
   Natural science teachers' speech: 27, 26, 23, 30, 25, 26, 27, 26, 27
29. What is the effect size for the result obtained in problem 28? If the t statistic calculated in problem 28 reached significance, then determine the value of estimated ω².
30. Using the examples presented earlier in the chapter, write a paragraph or two summarizing the results obtained in problems 28 and 29.
31. Are there any advantages to conducting a correlated groups design rather than an independent groups design? If so, what are they?
32. How does a correlated groups design differ from a matched groups design? When is it appropriate to use one design rather than the other?
33. What is a carryover effect? Why do such effects pose concerns for correlated groups designs?
34. An educational psychologist believes that relaxation training can help students to improve their performance on standardized tests. A group of high school students completes two grade-appropriate reading comprehension tests, one before the relaxation training and the other a week later, after the training is complete. Using the following data, demonstrate whether reading comprehension scores increased following the training. Use a significance level of .05 as well as a one-tailed test.

   Participant   Test Score 1   Test Score 2
   A             8              10
   B             6              7
   C             6              8
   D             5              6
   E             9              10
   F             8              9

35. Review the research project described above in problem 34. Do you think this project could be susceptible to any carryover effects? If so, which one(s) and why?
36. An industrial psychologist is concerned that a recent round of layoffs at a plant may have increased the stress felt by employees who retained their positions. To measure whether this survivor stress actually exists, the psychologist administered a stress measure to a sample of employees before and after the layoffs occurred. Evaluate whether self-reported stress increased from time 1 to time 2 by using a .01 significance level and a one-tailed test.

   Employee   Stress Score 1   Stress Score 2
   A          21               23
   B          24               26
   C          15               15
   D          18               22
   E          19               20
   F          20               19
   G          18               21

37. Review the research project described above in problem 36. Do you think this project could be susceptible to any carryover effects? If so, which one(s) and why?
38. A social psychologist is interested in how emotional contagion-adopting the affective state of people after hearing them describe a positive or negative experience-affects people who have an optimistic disposition. The psychologist is particularly interested in whether the number of people sharing their emotional experiences makes any difference in the affective transfer (e.g., perhaps two people sharing a happy event with a third enhance her mood to a greater degree than one person talking about the same experience). The social psychologist recruits a group of people who are matched on their measured level of optimism, age, and gender. One member of each pair is then exposed to three people who discuss the same emotion-eliciting experience, while the other matched participant meets with one person for emotion sharing. The following data are emotion ratings on a 1 to 7 scale, where higher numbers indicate greater emotion transfer. Evaluate whether the participants who met with a group rather than a single person showed a greater degree of emotional contagion. Use a .05 significance level for a one-tailed hypothesis test.
   Matched Pair   Solo-Emotion Encounter   Group Emotional Encounter
   A              4                        6
   B              3                        5
   C              7                        6
   D              5                        7
   E              2                        4
   F              6                        6
   G              5                        7
   H              5                        6

39. Why should investigators learn to perform a power analysis? Should a power analysis be performed before or after a study? Why?
40. Examine the following effect sizes and power levels, and then determine how many total participants are needed for each of the studies represented. (Hint: Consider using the decision trees that open the chapter to answer this question.)
   a. effect size r = .30, power = .40
   b. effect size r = .20, power = .60
   c. effect size r = .70, power = .15
   d. effect size r = .30, power = .50
41. Which t test is most appropriate for each of the following situations? (Hint: Consider using the decision trees that open the chapter to answer this question.)
   a. Samples are not independent of one another.
   b. Population parameters are known.
   c. Two observations at different points in time were gathered for each participant.
   d. Two independent samples were drawn.
   e. One observation was drawn for each participant, and population parameters were not known.
42. Two independent samples of data are available, but their sample sizes are unequal. What should the data analyst do? (Hint: Consider using the decision trees that open the chapter to answer this question.)
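Several of the two-group problems above call for the independent groups t test. As a hedged pure-Python sketch (the function name is my own; a library routine such as scipy.stats.ttest_ind computes the same quantity), the pooled-variance formula from this chapter handles equal or unequal sample sizes:

```python
import math

def independent_t(a, b):
    """Pooled-variance t for two independent samples; works for unequal n."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)          # sum of squared deviations, group a
    ss_b = sum((x - mb) ** 2 for x in b)          # sum of squared deviations, group b
    pooled = (ss_a + ss_b) / (na + nb - 2)        # pooled variance estimate
    se_diff = math.sqrt(pooled * (1 / na + 1 / nb))
    return (ma - mb) / se_diff                    # evaluate with df = na + nb - 2
```

The resulting t is compared with the critical value from the t distribution with na + nb − 2 degrees of freedom at the problem's stated significance level.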
Deciding Whether to Use a One-Way ANOVA

1. Are the data based on an interval or a ratio scale? If yes, then go to step 2. If no, then go to step 5.
2. How many samples or groups? If two, then go to step 3. If three or more, then go to step 4; otherwise, go to step 5.
3. Are the samples or groups independent of one another? If yes, then perform an independent groups t test (see chapter 10) or a one-way ANOVA. If no and they are dependent observations, then perform a dependent groups t test (see chapter 10).
4. Are the samples or groups independent of one another? If yes, then perform a one-way ANOVA with three or more levels to the independent variable. If no and they are dependent observations, then perform a repeated measures ANOVA (see chapter 13).
5. A one-way ANOVA is not the appropriate analysis-reevaluate the data and your analysis plan.

Procedure Following a One-Way ANOVA Yielding an Omnibus F Ratio

1. Did the one-way ANOVA's F ratio reach significance (e.g., p < .05)? If yes, then go to step 3. If no, then go to step 2.
2. Does an a priori theory exist that provides a more compelling explanation of the results than the omnibus F ratio? If yes, then perform a contrast analysis or some other appropriate post hoc comparison, and then go to step 4. If no, no additional analyses may be possible-reevaluate the theory and the method used to test it.
3. Are there more than two means? If yes, perform Tukey's HSD test and then go to step 4. If no, no post hoc test is necessary; go to step 4.
4. Do the results generally support the hypothesis? If yes, then perform the appropriate supplementary analyses (i.e., effect size f, estimated ω²) and go to step 5. If no, then determine whether any supplementary analyses are appropriate (i.e., effect size f, estimated ω²) and consider whether the obtained results are pushing the theory in a new direction; then go to step 5.
5. Write up the results in clear prose, display the means in a table and/or a graph, and create an ANOVA source table.
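Read as code, the first decision tree above amounts to a small rule function. This is an illustrative sketch only; the function name and return labels are mine, not the book's:

```python
def choose_analysis(scale, n_groups, independent):
    """Sketch of the 'Deciding Whether to Use a One-Way ANOVA' tree."""
    if scale not in ("interval", "ratio"):           # step 1: wrong scale -> step 5
        return "one-way ANOVA not appropriate; reevaluate the analysis plan"
    if n_groups == 2:                                # step 2 -> step 3
        return ("independent groups t test or one-way ANOVA"
                if independent else "dependent groups t test")
    if n_groups >= 3:                                # step 2 -> step 4
        return "one-way ANOVA" if independent else "repeated measures ANOVA"
    return "one-way ANOVA not appropriate; reevaluate the analysis plan"

print(choose_analysis("ratio", 3, True))   # -> one-way ANOVA
```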
CHAPTER 11
Chapter Outline
Ninety-six percent of a group of students reported that Mr. Tees would be more
upset-after all, he just missed the plane by a scant 5 minutes, whereas Mr. Crane missed
his by a full 30 minutes. In the minds of perceivers, that 25-minute difference looms
large-but should it? Is the travel plight of the two men so different?
Consider the fact that the situation of these two hypothetical figures is identical-
both missed their planes and no doubt fully expected to do so when they were stuck in
traffic (i.e., the departure time came and went). As Kahneman and Tversky (1982,
p. 203) point out, the "only reason for Mr. Tees to be more upset is that it was 'more'
possible for him to reach his flight." These investigators note that this scenario in-
vites us to run a mental simulation, a "what-if" exercise where reality and fantasy, hope
and desire, meet-we (and Mr. Tees) feel "upset" because we can imagine arriving
5 minutes sooner; it is much harder to imagine being there 30 minutes earlier, the less
reasonably imaginable event that would help Mr. Crane.
My point here is that doing more complex data analyses like those presented in this
chapter-especially with our own data-can sometimes put us in a mental place analo-
gous to Mr. Tees. Our study "almost" reached significance, for example, or our results were
close to the pattern predicted by the hypothesis and so on. It can be difficult to fight the
tendency to be disappointed when fate-and our hard-won data-is less than coopera-
tive, but it can and will happen to most of us. As you read this chapter and learn a new
set of statistical procedures, bear in mind that you may be faced with a situation like the
hypothetical Mr. Tees, one where you confront, say, a significance level of .06 rather
than the "magical" (predetermined) significance level of .05. When a situation like this
comes to pass, try to imagine a researcher whose p value was .07 or even .09-are you in
such a different place with .06, or even the magical .05? No, not really, and in any case,
you can still focus on a practical matter: discerning why the experiment and the analyses
did not work out, then focusing on how to rectify the problem(s) in a future study. Armed
with this philosophy and, I hope, the ability to run mental simulations about data in their
proper perspective, we can move on to the main topic of this chapter.
As you know by now, the statistical analysis of data is an eminently practical
enterprise. The t test, the focus of the last chapter, was originally developed to ensure
consistency and maintain quality in the brewing of beer and ale. That statistical procedure is now a ubiquitous tool in the natural and behavioral sciences, one at home in
the laboratory as well as the classroom. This chapter's topic, the F test, had an equally
propitious start in the analysis of agricultural data-crop yields, in particular-in
England and later the United States. By studying the growth of wheat and other crops
in carefully planted fields, the statistician Sir Ronald A. Fisher (see Data Box 11.A)
examined how varied soil quality, as well as the presence or absence of fertilizer, had an
effect on the amount and quality of a harvest. Given the agricultural origins of Fisher's
work on the F test, you should never take the term "field experiment" for granted again!
By linking statistical reasoning with modern farming techniques, Fisher almost
single-handedly created a detailed theory of experimentation, a variety of experimen-
tal designs, and most important for our purposes, a method of data analysis capable of
working with more than two means at once. The F test is at the heart of this method,
which is formally known as the analysis of variance but commonly recognized by the
acronym "ANOVA." If the t test is indeed the inferential statistic most often recognized
by students, then the ANOVA is arguably the most familiar procedure for behavioral
scientists, and you will learn why as you read this chapter. For the last half-century, the ANOVA, as well as many of Fisher's other contributions, has exerted widespread influence over the basic and applied research conducted in many disciplines. Without these statistical and experimental innovations, disciplines in and outside of the behavioral sciences would literally be at a loss-the very existence of some would probably be threatened-and most fields would certainly lack the intellectual maturity and analytic sophistication they currently enjoy.

Overview of the Analysis of Variance
The ANOVA is based on another statistical distribution, the F distribution (named,
as you might guess, for Fisher), which is used to test hypotheses regarding significant
differences among two or more means. Where the t test was limited to identifying
whether the average of one group of scores was larger or smaller than the average of a
second group of scores, the ANOVA and its F test can search for reliable differences
among the magnitudes of two, three, four, or even more means simultaneously. As we
will see, the ability to examine more than two means at once is a boon for experimen-
tation in the behavioral sciences, as more complex questions about behavior and its un-
derlying causes can be asked. Indeed, this is the second chapter in this book explicitly
dealing with issues involving comparing means from different groups or experimental
conditions. Because more than two means can be examined at one time, the scope and
the complexity of the hypotheses being tested by the ANOVA, too, are on the rise.
In contrast to the independent groups t test, for example, the ANOVA can compare more than two means at the same time.

Why do we need to move beyond the t test and its two-group comparisons? Despite the fact that the two-group standard experiment is the raison d'être-the reason for being-for experimental research, its impact is necessarily limited. How many interesting questions consistently lend themselves to this form of dichotomous examination? Life, as well as the human behavior represented by it, is not exclusively an either/or process or one governed by the precise, if limited, comparison afforded by control and experimental groups. Many important topics require an investigator to consider how more than two levels of an independent variable affect a dependent measure. For example, which of four seating patterns maximizes group discussion and decision making in an organizational setting? What is the optimal number of students for an undergraduate research team-2, 4, or 6? Can short-term depression observed in newly retired males be most effectively reduced by psychotherapy, group counseling, a combination of the two therapies, or no intervention at all? As you can see, moving beyond simple two-group comparisons opens up many other research possibilities.
The ANOVA enables investigators to analyze data addressing complex questions (i.e., beyond those posed by standard two-group experiments).

What will you learn in this chapter? Beyond acquiring the ability to compare more than two groups at one time, I think you will learn-and be surprised by-how much knowledge about statistics and data analysis you are now carrying around in your head. In other words, the bulk of what you will learn in this chapter is about thinking more broadly about the relationships portrayed in data-you already have a firm grasp on variance (as in the analysis of variance), the sum of squares we use to calculate variation, and the logic underlying hypothesis testing and comparing means. We are simply fleshing out the important skills you currently possess. Let's get down to it, shall we?
Figure 11.1. Note: Each of these samples was exposed to a different level of an independent variable. The large bell shape encompassing the other three distributions covers their total variance.
randomly assigned to three groups or samples, eight in each one. Each sample is then
exposed to one-and only one-level of the independent variable. To begin with, we
can think of the three samples of research participants as a collective source of varia-
tion. In fact, the variance of all the participants' scores from the three samples com-
bined would be labeled the total variance or total variation.
KEY TERM Total variance entails the combined variance of all the scores or observations within an
experiment.
Figure 11.1 shows three distinct distributions corresponding to the three levels of the
independent variable in our hypothetical study. Two of these distributions are relatively
close together and the third is located farther away from them (see Figure 11.1). Do
you see the larger bell-shape surrounding the three smaller distributions? That larger
bell is covering the total variance present in the experiment. We could determine the
total variance by summing the squared deviations from the mean of the entire sample
of 24 participants (the mean of a larger sample that is comprised of smaller samples is
sometimes called the grand mean). Please note that the total variation indicated in
Figure 11.1 is greater than the variance in any one of the three samples.
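The grand mean and total variation just described can be sketched numerically. This is my own illustrative Python fragment, not from the text; the three eight-score samples are hypothetical:

```python
def total_variation(samples):
    """Sum of squared deviations of every score from the grand mean."""
    scores = [x for sample in samples for x in sample]
    grand_mean = sum(scores) / len(scores)
    return sum((x - grand_mean) ** 2 for x in scores)

# Three hypothetical samples of eight participants each, as in the running example
groups = [[4, 5, 5, 6, 4, 5, 6, 5],
          [7, 8, 8, 9, 7, 8, 9, 8],
          [5, 6, 6, 7, 5, 6, 7, 6]]
print(total_variation(groups))   # total sum of squares across all 24 scores
```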
The ANOVA yields an F ratio, a test statistic that is determined by dividing between-groups variance (largely based on an independent variable) by within-groups variance (random error).

Our next consideration is to partition or divide the total variation into smaller amounts of variability that are due to (a) the effect of an independent variable creating change in a dependent measure and (b) error due to random or chance influences. How does the ANOVA partition variance when some hypothesis is being tested? Conceptually, the ANOVA pits some alternative hypothesis against a null hypothesis that two or more samples of randomly assigned participants hail from populations sharing the same means. The ANOVA provides a statistic, the F ratio, comprised of two elements:

- The numerator of the F ratio indicates the variability between or among the means of two or more samples (i.e., between-group variance).
- The denominator of the F ratio identifies the variability among the observations within each sample (i.e., within-group variance).
Thus, we can construe the relationship between these two elements as:

[11.1.1]  F = between-group variance / within-group variance
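Formula 11.1.1 can be made concrete with a short sketch. Here the two variances are computed as mean squares (sums of squares divided by their degrees of freedom, as ANOVA ultimately defines them); this is my own illustrative code, not the book's notation:

```python
def f_ratio(samples):
    """F = between-group variance / within-group variance (as mean squares)."""
    k = len(samples)                                  # number of groups
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(x for s in samples for x in s) / n_total
    means = [sum(s) / len(s) for s in samples]
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A larger F indicates more between-group variation relative to within-group (error) variation, exactly the comparison the formula expresses.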
These elements should be familiar to you, as they are virtually identical with those
we encountered in the conceptual and procedural reviews of the t test (see chapter 10).
Similar to the t test, the within-group variance can also be referred to as the error term
or, more precisely in this case, the error variance.
KEY TERM  Error variance, which is estimated by within-group variance, refers to the idiosyncratic, uncontrollable, unknown factors or events that create differences among the observations within a group.

The error variance represents the differential behavior of participants (i.e., individual differences) within the samples as well as experimental error (e.g., misunderstood directions, equipment problems). If all the participants in a given sample behaved exactly alike, then there would be no error variance in that sample. In fact, we would be apt to attribute their common reactions exclusively to the influence of the independent variable. As people act more differently from one another, however, the error variance increases, so we know that individual differences and possibly experimental error are present.
Figure 11.2 shows the familiar three distributions from Figure 11.1, each of which represents a different independent group of randomly assigned participants who were exposed to a single level of an independent variable. As you can see, the within-group or error variance is identified within each distribution. Keep in mind that the observed differences within a sample are not-should not be-systematic, but instead are due to chance. When calculating an F ratio, the estimate of within-group variance or error variance is based on the average variance of the observations within each sample.

Treatment is another name for an independent variable.

The between-group variance also adopts a particular name in the context of the ANOVA-it is frequently referred to as treatment variance. Each level of an independent variable can be construed as a "treatment," literally a manipulation designed to elicit some reaction or response from research participants. A paradigmatic example of
Figure 11.2 Within-Group or Error Variance Within Three Independent Samples in
an Experiment
Note: Each of the three samples contains scores or observations that vary from their sample mean. Each
sample was exposed to a different level of an independent variable.
416 Chapter 11 Mean Comparison II: One-Variable Analysis of Variance
a treatment, of course, is when one group is given an experimental drug, another takes
a current prescription used for the ailment, and a third group is administered a placebo.
Naturally, a treatment does not need to be a drug-it can just as easily be more con-
ceptual, as when some stimulus (e.g., an unexpected gift) elicits some psychological state
(e.g., positive mood; Isen, 1987)-some X causes a change in some Y.
Besides the variability attributable to some independent variable, treatment
variance is also comprised of error variance, that is, individual differences and experi-
mental error. Why? Simply because not all of the behavioral change within a sample
can be attributed to the influence of an independent variable-some change is neither
predictable nor controllable. Ideally, most of the between-groups or treatment variance
will be due to the impact of an independent variable and only a small portion will be
error variance.
KEY TERM Treatment variance is based on the systematic influence of different levels of an independent vari-
able on a dependent measure, combined with error variance.
Figure 11.3 shows the same three independent samples from Figure 11.2. This time,
however, we are interested in the variance existing between the three group means, which
makes each of the groups distinct from the others. As the treatment variance increases,
the three samples appear to stand out from one another-in other words, the respec-
tive levels of the independent variable lead to distinct behaviors and accompanying
treatment variance. The lines drawn from each mean to the other two means in Figure
11.3 represent this type of variance. We can now recast the F ratio shown in formula
[11.1.1] slightly differently:
[11.1.2]  F = (treatment variance + error variance) / error variance.
Naturally, the two formulations are identical, but seeing both will help you think
about what the ANOVA actually does with the data.
Figure 11.3 Between-Group or Treatment Variance Shared Among Three Independent Samples
in an Experiment
Note: Each of the three samples can be described by its respective mean. Each sample was exposed to a
different level of an independent variable. The between-group or treatment variance is based on variation
among the three sample means.
[11.2.1]  F = σ²between / σ²within.
Of course, we will always be working with sample data so it is probably better to think
of the F ratio in terms of sample variance:
[11.3.1]  F = s²between / s²within.
Because any F ratio is based on separate variance estimates-one divided by
another-you know that calculations involving sums of squares will be involved. Why
is this issue worth mentioning? When we began to calculate variance in chapter 4, we
learned that the sum of squares is an integral part of the calculation process-without
squaring any numbers, the mean deviations called for in the variance formula sum to
0. Because the square of any number is nonnegative, the value of any F ratio, too, will
always be positive. Whatever its magnitude or significance, an F ratio must be positive,
a fact that will serve you well when checking your statistical calculations for errors. You
know that you cannot have any negative numbers entering into the calculation of an
F ratio (unlike the t and the z distributions, then, the F distribution lacks negative
values).
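Because both halves of the ratio are built from sums of squared deviations, no F ratio computed from real data can come out negative. The following Python sketch is my own illustration (the helper `f_ratio` and the simulated scores are not from the text); it computes F from scratch for many random data sets and confirms the point:

```python
import random

def f_ratio(groups):
    """Compute a one-way F ratio from a list of samples (lists of scores)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: weighted squared deviations of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations of scores from their group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)      # mean squares between
    ms_within = ss_within / (n_total - k)  # mean squares within
    return ms_between / ms_within

random.seed(1)
# Draw many sets of three random samples and confirm every F ratio is positive.
ratios = []
for _ in range(500):
    groups = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]
    ratios.append(f_ratio(groups))
print(all(f >= 0 for f in ratios))  # an F ratio can never be negative
```

However the random samples fall, the squared deviations in both numerator and denominator guarantee a nonnegative result.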
Because they are based on variance estimates, which in turn are determined by sums of squares, F ratios are always positive numbers.

A second characteristic of the F distribution concerns the logic underlying the ac-
tual ratio calculation. In theory, when the null hypothesis is true, the two variance es-
timates representing the numerator and the denominator, respectively, should be equal
in value to one another. Under such conditions, the resulting F ratio is equal to
1.00. In practice, of course, obtaining a perfect 1.00 ratio does not occur very often be-
cause of sampling error-even if the two variance estimates are equal at the popula-
tion level, they are unlikely to match one another precisely when they are based on sam-
ple data.
What happens when an F ratio exceeds 1.00? When an F statistic is greater than
1.00, it is a clear indication that the null hypothesis of no difference between or among
a set of means may be false. In other words, two or more means might actually be re-
liably different from one another, and the pattern of the difference(s) will ideally reflect
the theory linking the levels of a treatment variable to changes in a dependent measure.
Again, obtaining a value greater than 1.00 indicates that there is more between-groups
418 Chapter 11 Mean Comparison II: One-Variable Analysis of Variance
variability (presumably due to the effect of an independent variable) and less within-
groups variability (participants within a given condition in the experiment behaved
more or less alike). As F ratios increase in value (i.e., > 2.00 or 3.00), the probability
of rejecting Ho increases accordingly.
Note: This distribution represents F ratios with 3 (numerator) and 14 (denominator) degrees of freedom.
Only 5% of the distribution's values exceed 3.34 and only 1% are greater than 5.56.
When only two sample means are involved, an independent groups t test and a
one-way-one independent variable-ANOVA (to be reviewed in detail shortly) will
reach the same empirical conclusion: either there is or is not a reliable difference
between two averages. Naturally, the actual values for a t statistic and an F ratio will
differ, but in the two-sample (two-means) case, both are related to the normal distri-
bution, a relationship providing statistical as well as conceptual links. When an
independent groups t test yields a statistic (which, as you will recall from chapter 10,
has N - 2 degrees of freedom), the corresponding F ratio has 1 and N - 2 degrees of
freedom (remember, the F distribution, which the ANOVA is based on, supplies two
numbers for degrees of freedom-one corresponding to a numerator, the other a
denominator). The relationship between a t value and a two-group F ratio is verifiable
by squaring the obtained t statistic:
[11.4.1]  F = t².
[11.6.2]  C = 4(4 - 1)/2,

[11.6.3]  C = 4(3)/2,

[11.6.4]  C = 12/2,

[11.6.5]  C = 6.
Six separate t tests! That is quite a bit of calculation. Imagine how easily one could make
a calculation error given the repetitive nature of the exercise. The possibility of making
a simple but inferentially disruptive error should give any researcher or data analyst
pause (of course, relying on a statistical software package eliminates this fear).
The second problem with performing multiple t tests is much more insidious than
mere math errors, however. In normal circumstances, when only one comparison
between two independent means is being performed, the risk of making a Type I error
is equal to the level of α, the significance level of the statistical test (e.g., .05). This form
of comparison between two means using the t test is called a pairwise comparison α.
KEY TERM Pairwise comparison α is the probability of making a Type I error (e.g., α = .05) when two sam-
ple means are assessed using the t test.
Performing multiple independent t tests on the same data set increases the probability of committing a Type I error-reliance on the ANOVA reduces this problem to an acceptable level of risk.

By performing numerous, even repeated, analyses of the same data set, a researcher
substantially increases the risk of committing a Type I error. In other words, the more
similar tests you perform-each based on the same significance level, say, α = .05-the
greater the chance of making a Type I error when the null hypothesis is actually true.
Table 11.1 shows the probability of making at least one Type I error when all pairs of
means are examined using separate, independent t tests. Notice that when the differ-
ence between two means is examined, the probability of making a Type I error is equal
to α (i.e., p = .05). So far, so good. But notice what happens when three means are ex-
amined-performing the necessary three independent t tests increases the probability
of making one Type I error (there may be more!) to .14. As shown in Table 11.1, once
you go beyond three means, the likelihood increases still more dramatically. In short,
performing more than one legitimate pairwise comparison α inflates the probability of
identifying a false difference between means.
The probability of committing one or more Type I errors under these conditions
has a special name-statisticians refer to it as experimentwise error.
KEY TERM Experimentwise error refers to the likelihood of making one or more Type I errors in the course
of performing multiple pairwise comparisons with independent t tests in the same data set.
The probability of committing experimentwise error under particular conditions-
known significance level of a statistical test and the number of two mean comparisons
being contemplated-is easily calculated. The following formula was used to estimate
the probabilities of making a Type I error shown in Table ILl:
[11.7.1]  p(experimentwise error) = 1 - (1 - α)^C,
Number of Means    Number of Comparisons (C)    p(At Least One Type I Error)
     2                        1                            .05
     3                        3                            .14
     5                       10                            .40
    10                       45                            .90
Note that p(experimentwise error) = 1 - (1 - α)^C if the C comparisons are independent of one another.
where α is the significance level employed in a given pairwise comparison and C is the
number of two mean comparisons being made (recall that we learned to calculate C
using formula [11.6.1]). Suffice it to say that as the number of comparisons increases,
the probability of making at least one Type I error accelerates rather dramatically.
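Both the formula for C and formula [11.7.1] are easy to verify directly. This Python sketch (the function names are mine, not the book's) reproduces the probabilities shown in Table 11.1:

```python
def pairwise_comparisons(k):
    """C = k(k - 1)/2, the number of two-mean comparisons among k means."""
    return k * (k - 1) // 2

def experimentwise_error(k, alpha=0.05):
    """p(experimentwise error) = 1 - (1 - alpha)^C, per formula [11.7.1]."""
    return 1 - (1 - alpha) ** pairwise_comparisons(k)

# Reproduce the entries of Table 11.1: k means, C comparisons, p(Type I error).
for k in (2, 3, 5, 10):
    print(k, pairwise_comparisons(k), round(experimentwise_error(k), 2))
```

Running the loop recovers the tabled values (.05, .14, .40, .90), and trying larger k shows just how quickly the error rate approaches 1.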
Can anything be done to alleviate this problem? The usual practice of lowering the
significance level of the test (e.g., changing .05 to a more rigorous .01) makes some
sense, but is ultimately counterproductive. You no doubt remember that reducing the
significance level for a test statistic necessarily leads to a reduction in the power avail-
able to reject the null hypothesis of no difference (see chapters 9 and 10).
What is the moral of the story? A researcher is unwise, even foolhardy, to perform
multiple t tests on the same data set. This foolhardiness is not limited to the possibility
of inferring the existence of a relationship in the data that is not, in fact, real or true
(i.e., committing a Type I error). More damaging still is the ethical implication of per-
forming multiple t tests-any researcher should know better than to risk drawing false
conclusions that could potentially harm the welfare of others if the results were un-
knowingly used as if they were accurate. The mind reels when considering the great dam-
age the publication of faulty research results can incur. In fact, I encourage you to be
highly skeptical of any published piece of research you encounter that relies on multiple
t tests in the analysis of results. Please keep in mind that any test statistic-not just the t
test-needs to be used carefully, appropriately, and judiciously in the analysis of data.
How is the ANOVA Distinct from Prior Statistical Tests? Some Advantages
In spite of its frequent appearance in everyday language, the word "unique" is a tough
one to use correctly. It literally means solo, solitary, or without peers-in other words,
when something is unique, it is one of a kind; there really is nothing else like it. In some
ways, the ANOVA is unique from the tests we studied before, but there is more than
one type of ANOVA-we will learn the most basic version in this chapter and then
some variations on it in the next one. Thus, we cannot properly describe the ANOVA
as unique, but we can say that it is distinct from the other tests we have examined.
How does the ANOVA differ from prior statistical tests we have studied? In three
ways, really-by virtue of the way it compares means, protects against Type I errors,
and enables us to think about complex causal relationships among variables.
When is it correct to substitute the preposition among for between? Grammarians recommend
that between be used for comparing or contrasting two things, and that among be reserved for
those times when more than two things are at issue. Correct usage turns out to be a bit more com-
plicated than this simple rule suggests, however. The language maven William Safire recently wrote
about the use of between and among in his weekly On Language column in The New York Times
(Safire, 1999, p. 24).
Although I fuzzily warned about the trickiness of the relationship of several items con-
sidered a pair at a time, [the late linguist James] McCawley came thundering back with
the vivid examples that bring his theories to life: "Only between is appropriate when
you say, 'He held four golf balls between his fingers' or 'He has a fungus between his
toes.'"
No arguing with that, even as I try the four-golf-ball stretch; it is not in the gram-
mar that is hard-wired in our heads to say, "The golf balls are among my fingers." Jim
then zinged home the lesson: "What determines the choice of the preposition isn't
whether its object denotes two entities or more than two, but whether the entities are
being referred to in twos or in combinations of more than two." Therefore, I will fol-
low the simple rule of style (between two, among several), remembering the exception's
complexity every time I feel an itch in my shoes. (Italics in original.)
In much the same way, statisticians stretch the use of between, referring to relationships be-
tween means rather than among them. Statisticians prefer to always use between, but I think that
it is possible to rely on among, as well, and I have tried to do so at various points throughout
this chapter and book. When writing about the process of comparing means, I encourage you to
try your hand at using both of these prepositions, learning by doing to see which one is better
suited to what sort of descriptive situation or set of results. The main point, of course, is to be
consistent in your usage and to recognize that many authors will stick to between when among
might well suffice.
We will learn a statistical procedure used to identify the precise differences between
or among means later in this chapter. Such procedures are referred to as post hoc ("post
hoke") or "after the fact" tests because the data are already collected and analyzed in a
preliminary fashion by the ANOVA or some other test. For the present time, think about
the advantages posed by omnibus statistical tests, especially the way they enable inves-
tigators to entertain and explore complex questions that are precluded by most
two-group experiments.
Type I errors beyond the usual level of risk (i.e., the α level or significance level cho-
sen before the analysis is undertaken).
The entries in Table 11.2 generally move from relatively simple relationships to more complex ones. I
want you to view this table as a resource for your thinking about and planning of re-
search efforts, as well as their analyses. You may never actually conduct a study that hy-
pothesizes the presence of ceiling effects, say, or developmental effects, but being aware
of the behavioral patterns they reflect can help you to think more critically about the
work you do conduct and the data you analyze. The ANOVA, then, will help you to see
causal relations in somewhat broader strokes than prior opportunities allowed.
One last point: The ANOVA is still only one tool for pinpointing cause and effect
between independent variables and dependent measures, just as no one experiment-
no matter how complex its design-can answer every question. The ANOVA is very
useful, though, for analyzing studies that empirically bring together points of view based
on prior research (i.e., using an old operationalization of an independent variable) and
novel approaches (i.e., use of a new operationalization)-and the ability to compare
more than two participant groups means that an adequate control group need not be
sacrificed in the process.
Knowledge Base
1. What is the source of the variability in the F ratio's numerator? The denominator?
2. What makes the F distribution different than the t or the z distributions?
3. When does the t statistic bear a special relationship to the F ratio? What is the
nature of the relationship?
4. Why is it a bad idea to use multiple t tests to assess whether significant differences
exist among more than two means?
5. What makes the ANOVA distinct from previous statistical tests reviewed in this
book?
Answers
1. The numerator of the F ratio is comprised of between-groups or treatment variance (i.e., vari-
ance attributable to an independent variable and error variance). The denominator is based
on within-group or error variance, which is made up of variance from individual differences
and experimental error.
2. The F distribution has two degrees of freedom (one for the numerator, the other for the
denominator in the F ratio) based on two independent sources of variance; it is nonnormal
in shape except when these two degrees of freedom are large; the F ratios comprising the
distributions are always positive in value.
3. The value of a t statistic can be determined from an F ratio (and vice versa) only when both
statistical tests are used to analyze data from two independent samples, in which case t² = F
and √F = t.
4. Multiple pairwise comparison α tests inflate the probability of making at least one Type I
error, known as experimentwise error. Use of the ANOVA is prescribed instead because it
simultaneously evaluates whether any differences exist among a collection of means while
holding the risk of Type I error constant.
5. The ANOVA (a) is an omnibus statistical test, (b) protects against experimentwise error, and
(c) assesses more complex causal relationships than prior statistical tests we have learned.
a practical warning. The ANOVA involves more steps and a bit more calculation than
you are used to-yet. As you read through the next several pages, this statistical tech-
nique may even look a bit daunting-but it only looks that way; in statistics, appear-
ances really are deceiving.
As you work through the example in a step-by-step manner, you will discover that
the ANOVA is no more difficult a technique than any of the others we have already
learned. As always, keep focused on what you are doing so that if you stop at any point,
you will know how and why you got to that point. Keep the mise en place philosophy
in mind here, as it will serve you well-there is quite a bit of quantitative information
to organize and interpret, but you will have little difficulty if you take your time with
this section of the chapter and think about what you are doing every step of the way.
We can begin defining the statistical test that is the focus of the remainder of this
chapter, the so-called one-way analysis of variance or one-way ANOVA, for short. It is
called a one-way ANOVA because it involves the use of one-and only one-indepen-
dent variable (treatment), which must have two or more levels to it.
KEY TERM A one-way analysis of variance (one-way ANOVA) is a statistical technique for analyzing the
variation found within the various levels of a single independent or treatment variable. A one-way
ANOVA will compare the means of two or more levels with one another in order to determine if
any significant difference(s) exist(s) between or among them.
An environmental psychologist, for example, might be concerned with identifying the
proper amount of ambient light that should be prevalent in public structures-an
office building or a public library-where work or study take place. Ambient light would
be the treatment or independent variable, and the psychologist would simply manipu-
late the amount of light (measured in lumens) different samples of participants are
exposed to in the setting. The psychologist might use four different levels of illumina-
tion-these correspond to the levels of treatment or of the independent variable-so
that some aspect of people's behavior (e.g., work productivity, books read or checked
out) could serve as a dependent measure.
Within any ANOVA, the independent variable is assigned a different name than is
commonly used anywhere else. Independent variables are referred to as factors, which
contain different levels, the components or values that serve to differentiate what one
sample of participants experiences and another-or others-do not.
KEY TERM A factor is a synonym for a treatment or independent variable within an ANOVA. To be analytically
viable, a factor must have two or more levels within it.
In the previous example, ambient light was a factor, and the four distinct amounts of
illumination were its levels.
samples are functionally equivalent to one another (i.e., conceptually, they hail from
the same population, therefore, each μ is equal to all the others), or:

Ho: μ1 = μ2 = ... = μj,

where j is the individual number of the k means (or samples) present. If there
were three sample means (i.e., k = 3) representing three levels of a treatment vari-
able (i.e., j = 1, 2, and 3) then the null hypothesis would specify their presumed
equivalence as:

Ho: μ1 = μ2 = μ3.
Naturally, there is no reason to anticipate that the three means (or however many are
present) actually share precisely the same magnitudes-we are dealing with sample data
and so we assume that any observed differences among the means are superficial, cre-
ated by chance or sampling error.
Runyon, Haber, Pittenger, and Coleman (1996) offer a compelling substitute for
this traditional formulation of the null hypothesis. These authors suggest emphasizing
the fact that through the F ratio, the ANOVA literally looks at the ratio of between-
groups variance to within-groups variance. As we noted earlier, when an independent
variable fails to create marked differences between samples or groups, then the variance
estimate for the numerator is treated as being equivalent to the denominator-the F
ratio is equal to 1.00 (or thereabouts due to measurement or sampling error). Thus, the
null hypothesis could be convincingly recast as:

Ho: σ²between = σ²within.
What about the alternative hypothesis? How do we present it using these two
schemes? To begin with, we acknowledge-and remind ourselves-that the ANOVA
is an omnibus statistical test, one that indicates whether at least one mean is signif-
icantly different from another mean; this test does not (except in the case of two
means) identify where or how many differences exist. We will learn a technique to
tease this important matter out of the data later. For now, we simply want to note
that the alternative hypothesis implies that at least one difference exists (keeping in
mind, though, that an investigator will have a desired, theory-based pattern of
results in mind).
We can achieve this end in a few ways. First, we can broadly state that the alterna-
tive hypothesis does not conform to Ho, suggesting some lack of equivalence among the
means stipulated by the null hypothesis, or:
H1: not Ho.

A second possibility is to indicate that some difference will exist, as in:

H1: One treatment mean, at least, will be different than another
treatment mean.
Note that both alternative hypotheses are identical in their meaning.
The third possibility, one advanced by Runyon et al. (1996), complements the
variance comparison made for Ho above:

H1: σ²between ≠ σ²within.
Note that this alternative hypothesis is nondirectional, though we hope that a relatively
large amount of estimated variance exists in the numerator and a smaller amount of
variance is found in the denominator. If the empirical tables turn-there is more
variance in the denominator than the numerator-then the F ratio will not detect any
One-Factor Analysis of Variance 429
significant difference between any means. We will come back to hypothesis formula-
tion again when we work through an ANOVA example in detail.
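One way to appreciate the variance-based formulation of the null hypothesis is by simulation: when every sample really does hail from the same population, the between- and within-groups estimates target the same quantity, so F ratios should hover near 1.00 in the long run. The Python sketch below is my own illustration (the helper function and the population values are arbitrary choices, not from the text):

```python
import random

def one_way_f(groups):
    """One-way F ratio: between-groups variance estimate over within-groups estimate."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ms_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups) / (n - k)
    return ms_between / ms_within

random.seed(42)
# Under a true null hypothesis all three samples come from ONE population,
# so the long-run average F ratio should sit near 1.00 (sampling error aside).
fs = [one_way_f([[random.gauss(50, 10) for _ in range(20)] for _ in range(3)])
      for _ in range(2000)]
print(round(sum(fs) / len(fs), 2))  # close to 1
```

Individual F ratios still bounce around because of sampling error, which is exactly why a tabled critical value, not "F > 1.00," is the decision criterion.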
[11.10.1]  SSbetween = Σ[(ΣXj)²/nj] - (ΣXij)²/N.
This formula prescribes that (a) the observations within a factor level (j) are
summed and then squared ((ΣXj)²), and this result is then divided by the number
of observations or participants within that level (nj). In turn, each of the products is
then summed-if there are three levels in a factor, for example, then three prod-
ucts are summed (Σ[(ΣXj)²/nj]). Finally, (b) the correction term (i.e., the same
number from the previous step) is subtracted from the sum of the products from
(a) (see formula [11.10.1]).
The degrees of freedom for the between-groups variance estimate (dfbetween) is:

[11.11.1]  dfbetween = k - 1,

where k is the total number of levels in the study's factor (and the total number of
j levels should equal k, of course).
The between-groups variance estimate, s²between, is also called the mean squares
between-groups or MSbetween. "Mean squares" refers to the fact that it is based on
the mean or the average of the sum of the squared deviations between the groups.
Thus, the estimated variance between groups is based on:

[11.12.1]  MSbetween = SSbetween/dfbetween.
3. Calculate the within-groups variance estimate. The third and last calculation for
the one-way ANOVA is the within-groups variance estimate (s²within). As always,
we begin by calculating the within-groups sum of squares (SSwithin) using this
formula:

[11.13.1]  SSwithin = ΣX²ij - Σ[(ΣXj)²/nj].
If you look closely at both parts of this formula, you will recognize the fact that we
already did the requisite calculations in the course of the previous two steps. Thus,
all of the raw scores were squared and summed back in (a) in step 1 (i.e., ΣX²ij),
just as Σ[(ΣXj)²/nj] was determined in (a) in step 2. We need only calculate the
difference between the numbers corresponding to each to know SSwithin-I hope
you appreciate the virtue provided by the quick substitutions of prior calculations
(you will shortly).
The formula for the degrees of freedom for the within-group variance
estimate is:

[11.14.1]  dfwithin = N - k.

That is, the number of groups or samples is subtracted from the total number of
participants available. We can then determine the within-group variance estimate-
known as the mean squares within-groups-using:

[11.15.1]  MSwithin = SSwithin/dfwithin.
4. The final step, of course, is the calculation of the F ratio, which is based on the
between-groups variance estimate (s²between) divided by the within-groups vari-
ance estimate (s²within), or:

[11.16.1]  F = MSbetween/MSwithin.
Total Variance
There is yet another way to construe variability. We are so accustomed to thinking about
variability in group terms (i.e., behavioral differences between groups and behavioral similarity
within groups) that we lose sight of the fact that any behavioral change begins in each individ-
ual in a given experiment. In theory, a score corresponding to each individual in a data set can
be described by a single equation. This equation is known as the general linear model (GLM).
The GLM posits that any observed score is based on the sum of the population mean, the
specific level of some independent variable and random or chance influences. Symbolically, the
GLM for a one-way ANOVA looks like this:

Xij = μ + αj + εij,

where

Xij = the score of one person (i) in a given group (j)
μ = the population mean of the domain of interest
αj = the positive or negative effects of a specific treatment condition
(note that this α is not the one we associate with significance testing)
εij = the Greek letter ε (epsilon) denotes random error (e.g., individual
differences and experimental error).
This alternative approach to variability and the ways it is partitioned should help you to con-
strue behavioral change in individual rather than exclusively group terms. To be sure, the ANOVA
model encourages us to think of behavioral change occurring as a group phenomenon, but the GLM
nicely illustrates how change actually occurs one person at a time. This individualized approach
should remind you of one important truth-an independent variable affects each person slightly dif-
ferently, just as random vagaries of personal experience will have different effects, as well.
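A tiny simulation makes the GLM concrete: each simulated score is literally the sum of a population mean, a treatment effect, and random error. (The numerical values below are illustrative choices of mine, loosely echoing a three-level experiment; they are not figures from the text.)

```python
import random

random.seed(7)
mu = 18.0                          # population mean of the domain of interest
alpha = {1: 8.0, 2: 0.0, 3: -6.5}  # treatment effect for each level j (illustrative)

def glm_score(j, sigma=2.0):
    """Generate one score via the general linear model: X_ij = mu + alpha_j + epsilon_ij."""
    epsilon = random.gauss(0, sigma)  # random error: individual differences, etc.
    return mu + alpha[j] + epsilon

# Four simulated participants per treatment level.
samples = {j: [glm_score(j) for _ in range(4)] for j in (1, 2, 3)}
for j, scores in samples.items():
    print(j, [round(x, 1) for x in scores])
```

Notice that no two participants within a level share the same score: the treatment effect αj is constant within a group, but the εij term differs for every person, which is exactly the error variance the ANOVA estimates.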
Table 11.3  Hypothetical Speed Estimates (in Miles per Hour) from a Replication of Loftus
and Palmer's (1974) Language and Memory for Auto Accidents Study

             Group 1        Group 2        Group 3
Verb         "Smashed"      "Bumped"       "Contacted"
             X1,1 = 30      X1,2 = 20      X1,3 = 10
             X2,1 = 25      X2,2 = 16      X2,3 = 14
             X3,1 = 22      X3,2 = 18      X3,3 = 12
             X4,1 = 27      X4,2 = 18      X4,3 = 10
             ΣX1 = 104      ΣX2 = 72       ΣX3 = 46
             ΣX²1 = 2,738   ΣX²2 = 1,304   ΣX²3 = 540

N = 12
ΣXij = 222
ΣX²ij = 4,582
to be greater than the one reported by the contacted group (X3 = 11.5). By analyzing these
data with an ANOVA, the student researcher will determine (a) whether any differences
exist between (or among) the means representing the three levels of the independent vari-
able and (b) whether the replication of Loftus and Palmer's (1974) work was successful.
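Before grinding through the hand calculations, it may help to preview them in code. This Python sketch applies the total, between-groups, and within-groups sum-of-squares formulas to the Table 11.3 raw scores (the between- and within-group values shown in the comments are my own computations from those scores):

```python
# Table 11.3 data: speed estimates (mph) by verb condition.
groups = {
    "smashed":   [30, 25, 22, 27],
    "bumped":    [20, 16, 18, 18],
    "contacted": [10, 14, 12, 10],
}

scores = [x for g in groups.values() for x in g]
N = len(scores)                    # 12 participants
k = len(groups)                    # 3 levels of the verb factor
correction = sum(scores) ** 2 / N  # (ΣXij)²/N = 222²/12

ss_total = sum(x ** 2 for x in scores) - correction
ss_between = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)  # mean squares between
ms_within = ss_within / (N - k)    # mean squares within
F = ms_between / ms_within

print(ss_total, ss_between, ss_within)  # 475.0 422.0 53.0
print(round(F, 2))
```

The total sum of squares, 475, matches the value derived by hand in the text; the remaining quantities fall out of the same raw scores.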
Similar to the previous detailed examples we conducted from start to finish, we are
going to rely on guidance from Table 9.2's list of steps for testing a hypothesis-
however, that list of steps needs to be revised a bit to conform to the constraints of the
ANOVA. A suitable revision is shown in Table 11.4, which contains four steps for test-
ing a hypothesis with an ANOVA. The first step and part of the second step in Table 11.4
are identical with their counterparts in Table 9.2-the differences occur in steps 3 and
4, primarily because the ANOVA involves more detailed steps than the t or the z tests.
In addition, Table 11.4 is meant to accommodate other forms of ANOVA beyond the
one-variable case we are introducing in this chapter (other ANOVA designs are
introduced in chapters 12 and 13).
Conducting an ANOVA requires some different steps than are used to calculate the t or the z test.

We will now begin the actual ANOVA analyses of the data in Table 11.3. Follow-
ing step 1 of Table 11.4, we state the null and alternative hypotheses for the replication
experiment. Naturally, the student researcher assumes that Ho entails no difference
among the sample means and the populations they hail from (i.e., use of a different
action verb in each group will have no effect on the participants' estimates of how fast
the filmed cars were traveling when they crashed), or:

Ho: μ1 = μ2 = μ3.
The ANOVA source or summary table organizes various statistics in preparation for calculating an F ratio.

Table 11.5 illustrates the basic source table used for any one-way ANOVA, though
it only contains the statistical symbols we reviewed previously-the actual statistics
(numbers) will be entered into a copy of this summary table that comes later, after the
calculations. Take a moment to look over this table so that you are familiar with the
logic involved in its construction. In terms of making entries, we move from the left
(source of the variation) to the right (the actual F ratio and its accompanying p value).
We will refer back to Table 11.5 as we proceed with the analyses here in step 3. Each of
the calculations we perform will be entered in the appropriate place in the completed
source table, Table 11.6.
We begin by calculating the sum of squares of the total variance and then the
degrees of freedom associated with it. (Please note that for the most part, we are using
the formulas and repeating the formula numbers presented earlier in the chapter.) To
calculate the total sum of squares, we use:
[11.8.1] SStotal = ΣX²ij - (ΣXij)²/N.
The numbers corresponding to formula [11.8.1]'s entries are readily found in Table 11.3,
so that:
[11.8.2] SStotal = 4,582 - (222)²/12,
[11.8.3] SStotal = 4,582 - 49,284/12,
[11.8.4] SStotal = 4,582 - 4,107,
[11.8.5] SStotal = 475.
The SStotal of 475 is then entered at the bottom of the second column in Table 11.6.
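The arithmetic in [11.8.2] through [11.8.5] is easy to verify with a few lines of code. This is a minimal sketch using the totals from Table 11.3; the variable names are my own, not the book's notation.

```python
# Verify SS_total = ΣX²ij - (ΣXij)²/N with the totals from Table 11.3.
sum_of_squared_scores = 4_582   # ΣX²ij, the sum of the 12 squared raw scores
grand_sum = 222                 # ΣXij, the sum of all 12 raw scores
N = 12                          # total number of participants

correction_term = grand_sum ** 2 / N          # 49,284 / 12 = 4,107
ss_total = sum_of_squared_scores - correction_term
print(ss_total)   # 475.0
```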
Calculating the degrees of freedom for the SStotal is readily accomplished using
formula [11.9.1]:
[11.9.1] dftotal =N - 1.
Because there are a total of 12 participants,
[11.9.2] dftotal = 12 - 1,
[11.9.3] dftotal = 11.
These 11 degrees of freedom are then entered at the bottom of the third column in
Table 11.6.
We now perform the three calculations for the between-groups variance estimate.
The first step is identifying the sum of squares between groups, or SSbetween:
[11.10.1] SSbetween = Σ[(ΣXj)²/nj] - (ΣXij)²/N.
Once again, the requisite information is easily obtained from the prior work done in
Table 11.3:

[11.10.2] SSbetween = [(104)²/4 + (72)²/4 + (46)²/4] - 4,107,

Please stop for a moment here and verify that you recognize where each of the
numbers shown in [11.10.2] came from in Table 11.3, and that you know how they
were calculated. (Remember, too, that we simply tacked 4,107, the correction term,
or (ΣXij)²/N, on to the end of [11.10.2] without recalculating it; we determined its
value earlier when we calculated the SStotal.) By proceeding with the calculation for the
SSbetween, we see that:

[11.10.3] SSbetween = [2,704 + 1,296 + 529] - 4,107,
[11.10.4] SSbetween = 4,529 - 4,107,
[11.10.5] SSbetween = 422.

The SSbetween of 422 is entered at the top of column 2 in Table 11.6. The degrees of
freedom for the between-groups variance estimate are then found using:
[11.11.1] dfbetween = k - 1,
where k is the total number of levels of the independent variable (factor), or:
[11.11.2] dfbetween = 3 - 1,
[11.11.3] dfbetween = 2.
These 2 degrees of freedom are entered into the top place in column 3 in Table 11.6.
By dividing the SSbetween by the dfbetween, we can determine the mean square
between groups, that is, the between-groups variance estimate:

[11.12.1] MSbetween = SSbetween/dfbetween,
[11.12.2] MSbetween = 422/2,
[11.12.3] MSbetween = 211.

The MSbetween of 211 is entered at the top of column 4 in Table 11.6. The within-groups
sum of squares is then found using:

[11.13.1] SSwithin = ΣX²ij - Σ[(ΣXj)²/nj].
Again, the necessary numerical information for the first half of the formula is drawn
from Table 11.3 (verify you know how and where it came from, please), whereas the
second part, Σ[(ΣXj)²/nj], is taken from the previous calculation of SSbetween, for:
[11.13.2] SSwithin = 4,582 - 4,529,
[11.13.3] SSwithin = 53.
The SSwithin is then entered into the middle of column 2 in Table 11.6. Before we pro-
ceed to the calculation of the degrees of freedom for the within-groups variance esti-
mate, however, we can perform a quick error check involving the sources of the three
sums of squares. Specifically, using the three entries in column 2 of Table 11.6, we can
verify that the SSbetween and SSwithin sum to the SStotal, or:

[11.17.1] (SSbetween + SSwithin) = SStotal,
[11.17.2] (422 + 53) = 475,
[11.17.3] 475 = 475.
Be sure to perform any and all error checks; they will save time and increase accuracy in the long run.

As you can see, the sum of squares corresponding to the two variance estimates does add
up to the total sum of squares; thus, we can be confident that no math error was made.
The degrees of freedom within groups is found by:

[11.14.1] dfwithin = N - k,
[11.14.2] dfwithin = 12 - 3,
[11.14.3] dfwithin = 9.
The degrees of freedom within groups can be entered into the middle of column 3 in
Table 11.6, enabling us to perform a second error check. We must verify that the degrees
of freedom for the between- and within-groups variance estimates add up to the to-
tal degrees of freedom shown at the bottom of column 3 in Table 11.6. Do they?

[11.18.1] (dfbetween + dfwithin) = dftotal,
[11.18.2] (2 + 9) = 11,
[11.18.3] 11 = 11.
They do, so that once again, we can assume that no errors have been made.
The within-groups variance estimate, or mean square within groups, is based on:

[11.15.1] MSwithin = SSwithin/dfwithin,
[11.15.2] MSwithin = 53/9,
[11.15.3] MSwithin = 5.89.

The MSwithin of 5.89 is entered into the middle of column 4 in Table 11.6.
Table 11.6 Completed (Hypothetical) One-Way ANOVA Source Table with Entries

Source of Variation    Sum of Squares    df    Mean Square    F        p
Between groups         422               2     211            35.82    .05
Within groups          53                9     5.89
Total                  475               11
A final error check involves the division across the columns of rows 1 and 2 in
Table 11.6 (i.e., when SSbetween is divided by dfbetween, does the resulting number
equal MSbetween? When SSwithin is divided by dfwithin, does the resulting number
equal MSwithin?). If the numbers check out, as they should, then the final calculation
in the source table, the F ratio, is a snap.
We now calculate the F ratio for this one-way ANOVA, the test statistic that will
indicate whether any reliable difference exists between (among) any of the speed esti-
mates corresponding to the three action verbs (represented by the three means shown
in Table 11.3). Be sure to remember that an F ratio for a one-way ANOVA is based on
the ratio of between-group variation to within-group variation-and that more of the
former is preferred relative to a small amount of the latter. Using formula [11.16.1] and
the two variance estimates from column 4 in Table 11.6, we find this to be the case:

[11.16.1] F = MSbetween/MSwithin,
[11.16.2] F = 211/5.89,
[11.16.3] F = 35.82.
How do we report this F ratio and determine whether it is significant? The F ratio
is correctly reported with the degrees of freedom taken from both its numerator and
denominator (remember that any F distribution is based on both these number sources),
as well as the statistic itself:
F(2, 9) = 35.82.
Make careful note that these two numbers for the degrees of freedom were taken from
column 3 in Table 11.6, as they represent-in order-the between-group variance es-
timate (i.e., 2) and the within-group variance estimate (i.e., 9).
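The whole source table can be reproduced with the computational formulas [11.8.1], [11.10.1], [11.13.1], and [11.16.1], working from the group totals in Table 11.3. This is a sketch with my own variable names; note that carrying full precision gives F = 35.83, while the text's 35.82 comes from rounding MSwithin to 5.89 before dividing.

```python
# One-way ANOVA source table from the summary statistics in Table 11.3.
group_sums = [104, 72, 46]   # ΣX for the smashed, bumped, contacted groups
sum_sq = 4_582               # ΣX²ij across all 12 raw scores
n = 4                        # participants per group
k = len(group_sums)          # number of groups (levels of the factor)
N = n * k                    # total N = 12

correction = sum(group_sums) ** 2 / N                          # 4,107
ss_total = sum_sq - correction                                 # 475
ss_between = sum(t ** 2 / n for t in group_sums) - correction  # 422
ss_within = ss_total - ss_between                              # 53

df_between, df_within = k - 1, N - k                           # 2 and 9
ms_between = ss_between / df_between                           # 211
ms_within = ss_within / df_within                              # ≈ 5.89
f_ratio = ms_between / ms_within
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```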
We now need to locate a critical value for F, a number whose magnitude indicates
whether we can accept or reject the null hypothesis of no difference. To achieve this
end, we must learn to use a new table of critical values-you guessed it, a table of F
values. A section of Table B.5 is reprinted in Figure 11.5 (the complete table may be
found in Appendix B). The F table is not at all difficult to use: All we need to do is find
the intersection between the column corresponding to the numerator (here, 2) and the
row for the denominator (here, 9; see where the arrows are drawn and meet in Figure 11.5).
The lighter entries in Figure 11.5 (and Table B.5 in Appendix B) are critical values at
the .05 level, while the darker ones represent the .01 level. Back in step 2 (following the
guidelines shown in Table 11.4), we chose .05 as the significance level, so the critical
value of F for this experiment is 4.26, written as:
Fcrit (2, 9) = 4.26.
Following the usual statistical convention, we determine that if the calculated sta-
tistic is greater than or equal to the critical value, then Ho is rejected-otherwise, it is
438 Chapter 11 Mean Comparison II: One-Variable Analysis of Variance
[Figure 11.5 A portion of Table B.5, the critical values of F. Lighter entries are critical values at the .05 level; darker entries are critical values at the .01 level. The column for 2 degrees of freedom (numerator) and the row for 9 degrees of freedom (denominator) intersect at the .05 critical value of 4.26 (the corresponding .01 value is 8.02).]
accepted (i.e., retained). Because the obtained value is greater than the critical value, the
null hypothesis of no difference is rejected, or:

Fcalculated(2, 9) = 35.82 ≥ Fcrit(2, 9) = 4.26: Reject H0.
A significant omnibus ("big, dumb") F encourages the data analyst to undertake the (post hoc) search for where precise differences between or among means are found.

What does this significant F ratio tell the student researcher who set out to con-
ceptually replicate Loftus and Palmer (1974)? More specifically, where exactly is (or are)
the difference(s) between (among) the three means shown in the middle of Table 11.3?
In actuality, we cannot say, yet, where any given difference lies. Why not? Remember
that earlier in the chapter we learned that except in the two-sample case, the F test yields
an omnibus statistic; that is, it partitions the variance and detects the presence or ab-
sence of a reliable difference(s), but it does not pinpoint where it is (except in the two-
sample, two-mean case when the difference is obvious; recall the independent groups
t test). In a real sense, then, the omnibus F statistic is effectively a "hunting license," one
that, when significant, as it is here, entitles the data analyst to use supplemental sta-
tistics to determine the precise nature and location of the significant difference(s). This
possibility was mentioned before when the idea of post hoc tests was introduced, as
well as being noted in step 4 of the guidelines in Table 11.4.
In other words, we are not yet finished with the one-way ANOVA. We know that
some difference exists between or among the three speed estimates, but we cannot be
certain where it lies until we perform the requisite post hoc analysis. We cannot begin
to write up the results in detailed prose yet either-we need more information before
any concrete behavioral conclusions can be drawn. At this point in time, the F ratio is
mute about the true nature of the results. In fact, some researchers only half-jokingly
refer to a significant F ratio for more than two sample means as a "big, dumb F," a sur-
prisingly odd but apt and even memorable name for it. I assign the F this nom de
guerre, an assumed name, because it helps me (as it will help you) to remember the
nature of significant differences in the ANOVA and the necessity of post hoc tests, mat-
ters we take up in earnest in the next section.
Post Hoc Comparisons of Means: Exploring Relations in the "Big, Dumb F"

An omnibus or "big, dumb F" lets an investigator know that some difference between
the means in a study exists, one where the variability attributed to the influence of the
treatment variable exceeds that associated with sampling error or chance variation. To
locate where the difference or differences exist, a data analyst will rely on what is called
a comparison, one usually but not necessarily performed between pairs of means.
KEY TERM A comparison is a statistical procedure wherein a researcher tests a particular hypothesis re-
garding how various means compare to one another.

The logic of the comparison process is similar to that used by the independent groups
t test, except that most comparison procedures are designed to protect the researcher
from making a Type I error.

Statistical comparisons can be further subdivided into two approaches: a priori and
post hoc comparisons. We already mentioned the latter in passing, so we will define the ap-
proach advocated by a priori (pronounced "aye pry-or-ee") comparisons first. An a pri-
ori comparison is created prior to the collection of any data, one that is usually an inte-
gral part of the study's hypothesis. These types of comparisons are often called "planned
comparisons" or even "planned contrasts" (see Rosenthal & Rosnow, 1985). A priori com-
parisons are often based on existing theories and the weight of data from the published
literature, or at least the certainty assured from an ongoing program of research.
Comparisons can be planned (a priori) or developed after the fact (post hoc), and either type is appropriate, depending on the particular circumstance.

To be sure, almost any piece of research conducted involves some degree of hy-
pothesizing about the relationships between or among levels of a treatment variable
before the data are collected. The key issues are the relative amount of certainty a
researcher has in those relationships and the degree of statistical rigor desired in a given
set of analyses. In many situations, unexpected, even serendipitous patterns between
means do show up, rendering an a priori comparison untenable as well as inappropri-
ate. When this sort of event occurs (and it does fairly often, as data rarely meet a
hypothesis in a perfect one-to-one match) the researcher must rely on the aforemen-
tioned post hoc comparison.
Post hoc comparisons are identical to planned comparisons in computational
terms. Where they differ from planned comparisons is in the ease with which a critical
value delineating a significant difference from a nonsignificant one can be reached or
exceeded. In other words, post hoc tests make it tougher on the data analyst to find a
reliable difference, thereby increasing the level of confidence or faith one can have in a
result. To be sure, a priori comparisons are statistically more powerful than post hoc
comparisons, but the latter have the advantage of being useful when (a) no firm hy-
potheses are established (a proverbial "fishing expedition" where one is fishing around
in the data in order to "catch" some result; other authors [e.g., Kirk, 1999] refer to this
process as "data snooping") or (b) unanticipated relationships are uncovered in the data.
Other points of comparison, as it were, between these two approaches include:
• A researcher undertakes post hoc tests only when the initial pass of the ANOVA
indicates a null hypothesis is rejected. In contrast, an a priori comparison can
actually be done without doing an ANOVA (see the later section introducing
contrast analysis).
• Post hoc tests control the experimentwise error rate via the possible number of
comparisons one could perform rather than the number actually performed.
A priori tests control experimentwise error in terms of the specific number of
comparisons being conducted.
• Each of these types of statistical procedures protects data analysts from inflated
alpha or Type I errors, and can compare pairs of means or groups of means.
We focus on post hoc comparisons in this section of the chapter for two reasons. First,
as a student of ANOVA, you should learn the logical progression of steps a researcher
follows when a significant F ratio is found-and the next step is usually a post hoc
comparison. Second, most researchers rely on the ANOVA in a more or less ex-
ploratory fashion, anyway, so it will be important for you as a producer and consumer
of behavioral science data to comprehend the whys and wherefores of post hoc tests.
I will, however, close this chapter with an example of how one sort of planned com-
parison can be liberating and enlightening for researchers, but I offer it there as food
for thought and future statistics courses, as it covers advanced terrain. Here, in a ba-
sic introductory course, it is more important for you to acquire the necessary back-
ground for understanding what is typically done with statistics and data analysis in
mainstream behavioral science research (but see the controversies regarding main-
stream approaches in chapter 15). If you are interested in exploring a priori compar-
isons further than the descriptive overview provided here or at the end of the chap-
ter, there are very good references available to do so (Harris, 1998; Kirk, 1994;
Rosenthal & Rosnow, 1985).
[11.19.1] HSD = qα √(MSwithin/n),

where HSD is a critical difference between means that must be reached or exceeded in
order to identify a reliable difference between pairs of means. The value of qα, which
is based on k (the number of means or groups in a study) and the dfwithin drawn from
the ANOVA, is taken from Table B.6 in Appendix B. The within-groups variance esti-
mate, or MSwithin, is taken directly from the ANOVA source table. Finally, n refers to
the number of participants within each sample or group; a data analyst need only be
concerned about whether the n is equivalent across groups. If not, then n' ("n-prime"),
the mean of various different values of nj, is substituted in the HSD formula, where

[11.20.1] n' = (number of means)/Σ(1/nj).
If there were three means with respective sample sizes of 6, 5, and 8, then n' would be

[11.20.2] n' = 3/(1/6 + 1/5 + 1/8),
[11.20.3] n' = 3/(.167 + .20 + .125),
[11.20.4] n' = 3/.492,
[11.20.5] n' = 6.10.

This value of n' would be entered into the HSD formula (i.e., [11.19.1]) in place of an
equal n.
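Formula [11.20.1] is simply the harmonic mean of the group sizes, so [11.20.2] through [11.20.5] can be checked in one line (full precision also rounds to 6.10, though the intermediate decimals differ slightly from the rounded ones above):

```python
# n' = (number of means) / Σ(1/nj): the harmonic mean of unequal group sizes.
group_ns = [6, 5, 8]
n_prime = len(group_ns) / sum(1 / nj for nj in group_ns)
print(round(n_prime, 2))   # 6.1
```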
The three groups in the replication of the Loftus and Palmer (1974) study, however,
had an equal number of participants (four per group). We can now finish that ANOVA
analysis by calculating an HSD value and determining where, precisely, differences be-
tween pairs of means lie. As a first step, we need to identify qα using Table B.6 in
Appendix B (please turn there now; the table is on page 605). Table B.6 is titled "Per-
centage Points of the Studentized Range," and it has three main column headings: one
labeled "error df" (for the within-groups degrees of freedom), one for α (the desired
significance level; here, .05 or .01), and k (for the number of means being considered).
Based on the results of the ANOVA shown in Table 11.6, we know to locate the row for
9 degrees of freedom under the "error df" column. We then read down the k column
corresponding to 3 (for the three sample means in the study). Please locate the inter-
section point for row 9 and column 3-you should see 2 values, 3.95 and 5.43. We
select 3.95 as the qα value because it corresponds to the .05 significance level (i.e., the
significance level chosen using step 2 in Table 11.4).
Now that qα is known, we can calculate the HSD value using formula [11.19.1].
Besides substituting 3.95 for qα, all we need to do is enter MSwithin from Table 11.6, as
well as the n of 4, for:
[11.19.2] HSD = (3.95)√(5.89/4),
[11.19.3] HSD = (3.95)√1.4725,
[11.19.4] HSD = (3.95)(1.21),
[11.19.5] HSD = 4.78.
What do we do with this HSD value? The first thing to do is recall what the
value means: any absolute difference between a pair of means that is 4.78 or greater
Table 11.7 Pairwise Comparisons Between All Means Using the Tukey HSD Test

Action Verb Mean Estimates (in Miles per Hour)

                              Smashed    Bumped    Contacted
Action Verb Mean Estimates    26.0       18.0      11.5
Smashed      26.0                        8*        14.5*
Bumped       18.0                                  6.5*
Contacted    11.5

Note: The table contains absolute differences between all possible pairs of means in the replication of
Loftus and Palmer (1974). An asterisk (*) indicates that the absolute difference between the means
(pairwise comparison) is significant at the .05 level using the Tukey HSD test.
is statistically significant at the .05 level. The second thing to do is to create a sim-
ple matrix of the three means so that we can identify the absolute differences be-
tween every possible pair. The matrix shown in Table 11.7 fits the bill-note that the
three means are presented in both the columns and the rows of this table. (The
upper-right and lower-left portions of a matrix like this one are always symmetric.
To avoid confusion, the absolute differences are shown only in the upper right of
Table 11.7.) Inside the table, you can see that the absolute differences between each
possible pair of means are indicated (remember, when you take the absolute value
of any number, the negative sign is dropped). How many of the absolute differences
presented in Table 11.7 reach or exceed the HSD of 4.78? In this particular case, they
all do-thus, we know that each of the sample means was significantly different from
each of the others at the .05 level (the asterisk [*] shown by each entry in Table 11.7
indicates that the pairwise comparison is statistically reliable according to the HSD
test).
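The decision rule behind Table 11.7 can be sketched as follows. The values for q, MSwithin, and n come from the worked example; note that carrying full precision gives HSD ≈ 4.79 rather than the text's 4.78, which rounds the square root to 1.21 first. The variable names are mine.

```python
import math
from itertools import combinations

# Tukey HSD test for the three action-verb means in Table 11.7.
means = {"smashed": 26.0, "bumped": 18.0, "contacted": 11.5}
q_alpha = 3.95        # Studentized range value for k = 3, df = 9, α = .05
ms_within = 5.89      # from the ANOVA source table (Table 11.6)
n = 4                 # participants per group

hsd = q_alpha * math.sqrt(ms_within / n)   # critical difference ≈ 4.79
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff >= hsd else "not significant"
    print(f"{a} vs. {b}: |difference| = {diff} ({verdict})")
```

Every absolute difference (8, 14.5, and 6.5) exceeds the critical value, matching the three asterisks in Table 11.7.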
Remember to compare the HSD value with the absolute value of any difference (drop all negative signs!) between a pair of means.

In practical terms (how use of a particular action verb influenced speed esti-
mates), what do we now know based on the results of the Tukey HSD test? As hy-
pothesized, speed estimates tended to increase as a function of the action verb used
in the question, "About how fast were the cars going when they ___-ed into each
other?" Participants who encountered the most dramatic verb, "smashed" (X = 26.0),
offered significantly greater speed estimates than those who read the verbs "bumped"
(X = 18.0) or "contacted" (X = 11.5). In turn, the speed estimates for the "bumped"
sample were significantly higher than those made by participants in the "contacted"
group. In short, the student researcher successfully replicated the Loftus and Palmer
(1974) study.
How are these statistical results reported in an APA-style narrative? We will post-
pone this important matter until after we perform two supplementary analyses that
support the results of the one-way ANOVA: effect size, and the degree of association
existing between the treatment variable and the dependent measure.
[11.21.1] f = √(η²/(1 - η²)),

where

[11.22.1] η² = SSbetween/SStotal.
We can readily determine the value of η² ("eta-squared") by taking the respective
sums of squares from the ANOVA source table (see Table 11.6):

[11.22.2] η² = 422/475,
[11.22.3] η² = .8884.
This value for η² can then be entered into formula [11.21.1] so that f, too, can be known:

[11.21.2] f = √(.8884/(1 - .8884)),
[11.21.3] f = √(.8884/.1116),
[11.21.4] f = √7.96,
[11.21.5] f = 2.82.
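Both effect size statistics follow directly from the source table entries, so [11.22.2] through [11.21.5] can be checked with a short sketch (variable names are mine):

```python
import math

# Effect size for the one-way ANOVA: η² = SS_between / SS_total,
# then Cohen's f = sqrt(η² / (1 - η²)).
ss_between, ss_total = 422, 475

eta_sq = ss_between / ss_total           # ≈ .8884
f = math.sqrt(eta_sq / (1 - eta_sq))     # ≈ 2.82
print(round(eta_sq, 4), round(f, 2))
```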
Data analysts speak of effect sizes for ANOVA results as being small, medium, or large, though fs above .50 are rare in behavioral science data.

This effect size statistic is clearly quite large; in fact, it is off Cohen's (1988)
scale, though you must keep in mind that these data are hypothetical. I do, how-
ever, want you to get a feel for the magnitude of the differences that can exist be-
tween means. The effect size in this hypothetical replication study is quite large, alert-
ing us to the fact that where verbs are concerned, how a question is phrased has a
powerful effect on resulting judgments, at least where speed estimates are con-
cerned. I must point out, though, that Cohen argued that effect sizes above .50 are
rarely observed in real behavioral science data. I mention this fact so that you can
make a realistic appraisal when you collect and analyze your own data or evaluate
the research of others.
Robert Abelson (1985), a statistically savvy social psychologist and baseball fan, drew attention
to an intriguing paradox involving ω² and America's favorite pastime. In your opinion, how
much does a player's skill affect his batting record? Now, answer this same question in variance
terms: how much variability in performance at any given time at bat is due to skill, how much
is due to chance and other factors?
Using baseball "stats," as they are called, Abelson examined the percentage of variance in bat-
ting performance attributable to skill differences among major league baseball players. In a typ-
ical year, the batting averages of major league baseball players range between the low .200s and
the low .300s. A batting average is determined by dividing a player's number of hits by the
number of times he has batted; a team average can be calculated using the same logic. Abelson
estimated an ω² of .00317 for the average major league baseball player, a minuscule figure sug-
gesting that the variability in any single batting performance accounted for by skill is equal to one
third of 1% (or, if you prefer, over 99% of the available variation is not explained by a player's
abilities)!
This result should give you pause, just as it did Abelson and his interested colleagues who
found it surprising, even unbelievable. The casual observation of countless fans, as well as coaches,
teams, and team owners-not to mention sportswriters-is at odds with this sort of conclusion, or
is it? Abelson explains the baseball skill versus variance paradox this way: A given player's batting
prowess is measured across a long season, not on a particular trial at bat. As Abelson (1985) put it,
"a way to understand the paradox is to realize that in the major leagues, skills are much greater than
in the general population. However, even the best batters make outs most of the time" (p. 131).
Furthermore, Abelson's (1985) calculation of ω² takes into account players of all skill levels,
and most of them, of course, are average (see chapter 4). When people think of a player's skill at
batting, they are usually reflecting on his entire career, not the single time at bat entailed in this
statistic. As observers, we frame perceived athletic performance in different terms than those mea-
sured by a statistic like ω². Should we belittle this statistic, then? Not according to Abelson (p. 132):
The single at bat is a perfectly meaningful context. I might have put the question this
way: As the team's manager, needing a hit in a crucial situation, scans his bench for a
pinch hitter, how much of the outcome is under his control? Answer: one third of 1%.
Qualification: This assumes that the standard deviation of the batting averages against
a given pitcher is the same as the standard deviation of batting averages in general.
Are there any statistical implications of this sort of variance analysis? Yes, indeed there are
some implications. Teams score runs as a result of a conjunction of events (e.g., a prior hit by
player A is linked to a current hit by player B), and teams with better-than-average batters should
perform better across time than those with lower than average players. A team's athletic success is
affected more by the average batting skill of its players than any individual's successful performance
on a given try at bat. Skill, then, is cumulative, and this is true for any given player and for an en-
tire team. What matters is the process whereby variables like these operate in real world settings-
they may have an additive quality so that their effects are not salient or influential in any given
circumstance. Belief that the variance in batting averages is related to skill is intuitively correct but
only in the long run-skill is not very powerful or even consequential in a single situation.
Abelson's (1985) point is a powerful one: relying on ω² or a similar index that accounts for
variability can be inappropriate in situations where small changes become cumulative across time.
Despite appearances and belief to the contrary, some meaningful outcomes (e.g., a player's bat-
ting prowess, a winning team), then, develop or unfold gradually, and though skill is clearly a
part of the process, it is more apparent than real at any point in time. Explaining variance via ω²
is not always the best indicator of systematic influence. Like so many things in life, skill is in the
eye of the beholder. Some social processes, such as educational interventions or the effects of per-
suasive communication on consumer behavior, as well as baseball, take some time, and a longer
perspective, to arrive at an adequate understanding of their operations (Abelson, 1985).
Here it is:

[11.23.1] ω̂² = dfbetween(F - 1)/(dfbetween(F - 1) + N).

Once again, all the information we need to perform the necessary calculations can be
drawn from the ANOVA source table we created (see Table 11.6) and knowledge about
the study (i.e., we know that 12 participants took part in the replication, so N = 12).
Entering the appropriate statistical information we find that:
Entering the appropriate statistical information we find that:
(;j2 = 2(35.82 - 1)
[11.23.2]
2(35.82 - 1) + 12
2(34.82)
[11.23.3] (;j2 =
2(34.82) + 12
()} = 69.64
[11.23.4]
69.64 + 12
69.64
[11.23.5] ()} = - -,
81.64
An t} based on ANOVA results is
[11.23.6] (;j2 = .8530 == .85.
interpreted in the same manner as a
The student researcher can conclude that the independent variable (action verb
tJ2 for a t test, though the choice) accounted for approximately 85% of the variance in the dependent measure
calculations differ. (speed estimates). In short, there was a very high degree of association between the in-
dependent variable and the dependent measure.
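Formula [11.23.1] can be checked the same way; the inputs below come straight from Table 11.6 and the study's N:

```python
# Estimated omega squared from the ANOVA results:
# ω̂² = df_between(F - 1) / (df_between(F - 1) + N).
df_between, f_ratio, N = 2, 35.82, 12

numerator = df_between * (f_ratio - 1)    # 2(34.82) = 69.64
omega_sq = numerator / (numerator + N)    # 69.64 / 81.64
print(round(omega_sq, 2))   # 0.85
```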
With practice as a researcher and data analyst, you will develop an intuitive sense of how to
best present some data, whether it makes more sense to see it in numerical form in a text
or table, or in a more graphic presentation-a figure of some sort (see chapter 3 for rel-
evant guidelines). Just remember to provide thorough statistical information for critical
readers to follow and, if they so desire, to verify the logic of your arguments.
Here is one way that the student researcher could summarize the replication of
Loftus and Palmer (1974). Please note that any supporting statistics (e.g., effect size)
are reported following the main results:

When writing about the results of any analysis, ANOVA or otherwise, be sure to describe what happened in behavioral terms.

Previous research indicated that the language used in a question can be as important
as the question itself. Participants in the present study all saw the same car crash and
were later asked to estimate how fast the cars were traveling when they hit one an-
other; however, the three groups of participants encountered three different action
verbs embedded in the speed question. Were these eyewitnesses as prone to memory-
related errors due to language as Loftus and Palmer (1974) suggest? Preliminary analy-
ses indicated that participants' speed estimates varied according to how dramatic the
action verb embedded in the dependent measure happened to be. The means and stan-
dard deviations for the three action verb groups are shown in Table 1. A one-way analy-
sis of variance revealed a significant difference among the three groups, F(2, 9) = 35.82,
p < .05. Tukey's HSD test indicated that the average speed estimate for the "smashed"
group was significantly higher than the other two groups; in turn, the mean of the
"bumped" group was statistically greater than that observed in the "contacted" group
(see Table 1). The effect size of this result was quite large (f = 2.82), and there was a high
degree of association between the changes in the action verbs and participants' speed
estimates (ω̂² = .85).

Note: All of the mean entries are significantly different from one another at the .05 level.
Knowledge Base
1. True or False: Factor is another term for independent or treatment variable.
2. You conduct an experiment where the independent variable has five levels. The data
are analyzed with a one-way ANOVA and a significant F ratio is found. Does the F
ratio indicate where the difference(s) between means are? Why or why not?
3. What is meant by the term "big, dumb F"?
4. True or False: A post hoc test can be undertaken when an ANOVA's F ratio is not
significant.
Answers
1. True
2. For more than two independent groups, the one-way ANOVA yields an omnibus F ratio, one
that indicates the presence of a difference but not its precise location. A follow-up compar-
ison, for example, one relying on a post hoc test (e.g., Tukey HSD test) must be conducted
to specify the nature of any difference(s) between means.
An Alternative Strategy for Comparing Means: A Brief Introduction to Contrast Analysis 447
3. This nontechnical but apt term refers to the omnibus nature of the F ratio when more than
two groups are being analyzed by an ANOVA-the statistic cannot identify where any pre-
cise difference occurs so that additional analysis is necessary.
4. False. Some a priori comparisons can be performed under these circumstances, but a post
hoc test is only proper when the F ratio reaches significance.
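The omnibus F described in these answers can be computed by hand from the between- and within-groups variance estimates. A minimal sketch, using three invented groups (not data from the chapter), shows that the F ratio flags the presence of a difference without locating it:

```python
# A one-way ANOVA computed from scratch: partition the total variation into
# between-groups and within-groups pieces, then form F = MS_between / MS_within.

def one_way_anova(groups):
    all_scores = [x for g in groups for x in g]
    N = len(all_scores)
    grand_mean = sum(all_scores) / N
    # Between-groups sum of squares: group means versus the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores versus their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = N - len(groups)
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within

groups = [[2, 3, 4, 3], [5, 6, 5, 6], [5, 6, 6, 7]]  # hypothetical data
F, df_b, df_w = one_way_anova(groups)
print(F, df_b, df_w)  # a large F, but it cannot say WHERE the difference lies
```

The resulting F exceeds any conventional critical value, yet a follow-up comparison (e.g., Tukey's HSD) would still be needed to say which of the three means differ.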
[Figure: Mean performance scores plotted by age group (8 through 12 years). Performance increases steadily with age.]
Why are these visually apparent differences not significant? The omnibus F based on the one-way ANOVA looks for any mean differences, and it does so in an unfocused, diluted manner-the arrangement of means by increasing age, for example, is not considered by this omnibus test. Rosenthal and Rosnow (1991) argue that the hypothesis that task performance is enhanced by age is better tested by a contrast, a focused comparison where weights for each mean are used to compare obtained results against some predetermined pattern (i.e., one based on theory, a hypothesis, or even an educated guess). These weights are called λ ("lambda") weights, and they adopt numerical values such that the sum of λ (Σλ) is equal to 0 for any given contrast (the rationale for this procedure will become clear shortly).
Any contrast has one (1) degree of freedom corresponding to the numerator.

Any contrast has one (1) degree of freedom for the numerator, while the denominator is linked to the error degrees of freedom found in the ANOVA source table. An F ratio based on a contrast can be determined using this formula:

[11.24.1]  MS_contrast = SS_contrast = L²/(nΣλ²),
where L is the sum of all the condition totals (T), each one of which is multiplied by a λ-weight based on some hypothesis, or:

[11.25.1]  L = Σ(Tλ),

where n = 10 (here, we assume an equal number of 10 per group; if the group sizes were unequal, we would substitute n' using formula [11.20.1]).
Rosenthal and Rosnow (1991) note that the necessary contrast is actually already hardwired into the example. The means shown at the top of Table 11.8 can each be multiplied by 10 (i.e., the n in each condition) in order to determine the T for each condition¹ (see Table 11.9). The T value of 20 for the 8-year-old group was determined by multiplying the group mean (2.0) by n (10; see Table 11.9). In turn, each T must be multiplied by a λ-weight based on the educational researcher's theory (i.e., cognitive skill increases with age). The weights corresponding to this linear trend can be determined by subtracting the mean age of the five groups (i.e., X̄age = (8 + 9 + 10 + 11 + 12)/5 = 50/5 = 10) from each age group (i.e., 8 - 10 = -2; 9 - 10 = -1; 10 - 10 = 0; 11 - 10 = +1; 12 - 10 = +2). (Tables of different contrast weights can be found in Rosenthal & Rosnow, 1985; Snedecor & Cochran, 1967.) These weights are shown in the second row of Table 11.9; as required, the weights sum to 0 (see the 0 entry at the
Table 11.9  Table of Calculations for Age Level and Performance Contrast

                      Age Level
             8      9     10     11     12      Σ
  T (nX̄)    20     30     50     70     80    250
  λ         -2     -1      0     +1     +2      0
  Tλ       -40    -30      0     70    160    160

¹If the number of observations per group were unequal, then T would be calculated using n'.
450 Chapter 11 Mean Comparison II: One-Variable Analysis of Variance
end of the row). Each T is then multiplied by its appropriate λ-weight (see row three in Table 11.9), and the resulting products are summed:

L = ΣTλ = (-40) + (-30) + 0 + 70 + 160 = 160.

For convenience, the ΣTλ, that is, L, is also shown in the lower right corner of Table 11.9. By entering L into formula [11.24.1], we can calculate the sum of squares and then the variance estimate for the contrast and, in turn, an F ratio that will reveal whether the means are significantly different from one another in a linear fashion. The only other entry into [11.24.1] that we need to calculate is Σλ², which is:

[11.26.1]  Σλ² = λ₁² + λ₂² + λ₃² + λ₄² + λ₅²,
[11.26.2]  Σλ² = (-2)² + (-1)² + (0)² + (+1)² + (+2)²,
[11.26.3]  Σλ² = 4 + 1 + 0 + 1 + 4,
[11.26.4]  Σλ² = 10.
[11.24.2]  MS_contrast = SS_contrast = (160)²/(10(10)),
[11.24.3]  MS_contrast = SS_contrast = 25,600/100,
[11.24.4]  MS_contrast = SS_contrast = 256.
As always, the F ratio is determined by dividing the variance estimate for the numerator (MS_contrast) by that for the denominator (MS_within; see Table 11.8) for:

[11.27.1]  F_contrast = MS_contrast/MS_within,
[11.27.2]  F_contrast = 256/63,
[11.27.3]  F_contrast = 4.06.
Turning to Table B.5, we find a critical value corresponding to 1 and 45 degrees of freedom (remember that any contrast has 1 df in the numerator-the denominator df is from the source table in Table 11.8). When you turn to Table B.5, however, you will find that there are error degrees of freedom entries for 44 and 46, but not 45 degrees of freedom. Using what is called interpolation, literally estimating or finding a value between two others (here, critical Fs of 4.06 and 4.05), we identify the critical F as being equal to 4.055. We find, then, that:

F_contrast (1, 45) = 4.06 > F_crit = 4.055: Reject H0.
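The arithmetic of this contrast can be sketched in a few lines of Python; the means, n = 10 per group, λ-weights, and MS_within = 63 are the values from the worked example above:

```python
# Contrast analysis sketch: condition totals T, L = sum(T * lambda), and
# F_contrast = (L**2 / (n * sum(lambda**2))) / MS_within  (formula [11.24.1]).

means = [2.0, 3.0, 5.0, 7.0, 8.0]   # age groups 8-12 (Tables 11.8/11.9)
n = 10                               # observations per group
lambdas = [-2, -1, 0, 1, 2]          # linear trend weights; they must sum to 0

T = [n * m for m in means]                       # condition totals: 20, 30, 50, 70, 80
L = sum(t * lam for t, lam in zip(T, lambdas))   # (-40) + (-30) + 0 + 70 + 160 = 160
ss_contrast = L ** 2 / (n * sum(lam ** 2 for lam in lambdas))  # 25,600/100 = 256
ms_within = 63.0                     # error term from the ANOVA source table
F_contrast = ss_contrast / ms_within
print(round(F_contrast, 2))  # 4.06
```

Changing the λ-weights (say, to test a quadratic rather than a linear pattern) changes only the `lambdas` list; the rest of the computation is identical.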
What does this significant F tell us? It indicates that the linear trend of means
shown in Table 11.8 is significant, such that children's performances on a cognitive
activity increased in a steady manner as they aged. Literally, the contrast shows us that
the 8-year-olds scored lower than the 9-year-olds, who in turn did less well than the 10-year-olds, and so on up to the 12-year-old group, which had the highest average
score of all (see the display of means in Table 11.8).
How would the educational researcher report these results? There are two options. The more conservative approach would be to first report the (nonsignificant) findings from the traditional one-way ANOVA before presenting the (significant) contrast results. The newer approach, that advocated by Rosnow and Rosenthal (1989), would be to focus exclusively on the results of the contrast and to not even bother reporting the one-way ANOVA findings as, strictly speaking, they are not germane. The choice is ultimately your own, though you may be asked by an instructor or editor to follow the more conservative approach in the interest of thoroughness. Here is one way to present the data in that more inclusive spirit:
Did the older children perform better than the younger ones? The mean performance scores appeared to adhere to a linear trend, where ability increased systematically with age (see Table 1). This developmental question was addressed by analyzing the performance scores with a one-way analysis of variance (ANOVA), with age level serving as the independent variable. The ANOVA did not reach significance, F(4, 45) = 1.03, p = ns.² The more appropriate analysis, however, entails a planned contrast where performance was predicted to increase with age. As hypothesized, this contrast was significant, indicating that cognitive ability improved with age, F(1, 45) = 4.06, p < .05....
Note: All of the mean entries are significantly different from one another at the .05 level, and the n
for each age group is 10. These data were adapted from Rosenthal and Rosnow (1991, p. 467).
²Some writers use "ns" for "not significant," whereas others either report the actual p value (here, .40) or indicate p > .05 (note the direction of the sign). I happen to prefer "ns."
452 Chapter 11 Mean Comparison II: One-Variable Analysis of Variance
data. Writing about a complex topic requires you to really think about how to "marry"
abstract and concrete elements to one another in a coherent way where others can un-
derstand your point of view.
Here are step-by-step instructions for writing a letter on what you know about the
one-way ANOVA. Chances are that as you write your letter, some aspects of the ANOVA
will become clearer to you, others less so. I want you to know that this sort of mixed
reaction is perfectly normal, as it will motivate you to ask particularly focused ques-
tions and to reread portions of this chapter so that your understanding of one-way
ANOVA is relatively complete.
1. Identify your intended audience. You can write a letter describing what you know
about the ANOVA to your statistics instructor, a peer in your class, or a peer in
some other class (Dunn, 2000; Konzem & Baker, 1996). Often times, it can be con-
structive to write to someone who knows little about the subject, as you will need
to explain any technical issues carefully and thoroughly.
2. Choosing the content and focus of the letter. Your goal is to write a letter wherein
you explain one or two things you have learned about the one-way ANOVA that
are interesting, compelling, confusing, or otherwise noteworthy (Dunn, 2000). De-
scribe why it was important to learn the ANOVA, for example, particularly where
it can aid students in the behavioral sciences. You could also describe some hypo-
thetical experiment whose results illustrate either very little or considerable
between-groups variability, and how this variability impacts on the presence or
absence of statistical significance via the F ratio (Johnson, 1989). Alternatively, you
could write about the nature of the F ratio and what it means to obtain an F ra-
tio close to 1.00. The choice of what to write about is entirely up to you-just be
somewhat creative and detailed.
3. Sharing letters: Writing, reading, and responding. A variation of this exercise en-
tails exchanging letters with a peer in or outside of the class. That is, you not only
write about the ANOVA, you also read what another student thought about this
analytical tool and, in addition, have the opportunity to write back to him or her
in the form of a second letter. Exchanging letters can be a helpful exercise because
it allows you to see-and sometimes critique-the ideas of others while they com-
ment on the point of view espoused in your letter. (Your instructor may want copies
of any letters you share with or receive from a peer.)
4. Other suggestions. Your letter can be printed out and mailed to your instructor or
a peer, or sent directly via e-mail. If your instructor is interested or you are partic-
ularly motivated, a chat room could be created for the members of your statistics
class on your campus network. How long should your letter be? I would strive for
two or so typed, double-spaced pages. Remember, the best way to really learn some-
thing is to write about it!
of hypothesis testing that began with the z test and the t test, respectively. The one-way ANOVA enables the data analyst to simultaneously compare two or more means, whereas these previous tests restricted such comparisons to two sample
means (see chapter 10) or a sample mean and a known population average (see chap-
ter 9). The two decision trees at the start of this chapter promote the idea of moving
beyond simple two-group comparisons. In fact, there are many steps involved in the
calculation of a one-way ANOVA, and these trees provide an excellent review of pro-
cedures, thereby ensuring that no important issues are neglected. In this spirit, the
first decision tree is designed to help you, the investigator or data analyst, decide
whether a one-way ANOVA is the appropriate tool for the available data and research
design. By answering a few questions concerning how the data are scaled and whether
the samples or groups are independent, one can quickly determine the best way to
proceed with an analysis.
What happens after the one-way ANOVA is performed? The second decision tree
opening the chapter is a guide to the procedures that follow the calculation of an F ra-
tio. Beyond determining whether the test statistic is significant, the data analyst must
also decide to proceed with any necessary post hoc test or the calculation of supple-
mentary analyses (i.e., effect size, degree of association between independent and de-
pendent variables). When correctly performed, the one-way ANOVA and these supple-
mentary procedures can provide a detailed portrayal of behavior, which can help you
to build more specific theories of why people act the way they do.
Summary
1. The F ratio, the test statistic based on the analysis of variance (ANOVA), was originally used to analyze agricultural data.
2. The ANOVA is probably the most commonly recognized statistic used by behavioral scientists. It is used to test hypotheses involving two or more means.
3. Statistically, the ANOVA "partitions" or divides variance into portions that can be attributed to the effect of an independent variable on a dependent measure. Total variance refers to the combined variance of all available data in an experiment, which can be broken down into between-groups variance (usually caused by the manipulation of an independent variable) and within-groups variance (caused by random error or chance events).
4. Between-group variance is often called treatment variance, and it involves examining how some treatment (e.g., drug, novel stimulus) affects some outcome (e.g., elimination of disease, learning). Within-group variance is referred to as error variance, as its source is largely uncontrollable as unknown elements affect people. Error variance, then, creates the usually minor behavioral differences that occur within a group of people.
5. Conceptually, the F ratio is based on between-groups variance/within-groups variance, or (treatment variance + error variance)/error variance.
6. The F distribution is generally skewed in a positive direction, though it approaches normality when both the sample sizes for the numerator and the denominator are large. Unlike the t and z distributions, then, the F distribution lacks negative values.
7. When an F ratio equals 1.00, then the between-groups and the within-groups variances are equal-in other words, the null hypothesis of no difference is appropriately accepted. As the F ratio becomes greater than 1.00 in value, the probability of rejecting H0 increases substantially.
8. The t statistic and the F ratio share a specific relationship when two-and only two-means are present. Specifically, the square root of an F ratio yields a t statistic for a two-group comparison, while squaring a t value from an independent groups t test results in a comparable F ratio.
9. The statistical assumptions underlying the ANOVA match those related to the t test. Similar to the t test, the ANOVA provides quite a robust statistic. The t test is limited, however, when it comes to comparing multiple means. Practically speaking, only two means can be compared at one time and, as the number of t tests is increased due to interest or need, the probability of making a Type I error increases dramatically. Where multiple means are concerned, the ANOVA is a more appropriate analytical tool in that it provides adequate protection against Type I errors.
10. Pairwise comparison α, the probability of making a Type I error when two samples are compared, increases as the number of t tests increases. Experimentwise error is the likelihood of making one or more Type I errors when performing multiple t tests in the same data set.
11. The F test is an appropriate antidote to the problem posed by using multiple t tests, especially since the ANOVA is designed to hold Type I error rates constant. The F test is called an omnibus test precisely because it can assess the presence of any statistically significant differences among more than two means, but it cannot pinpoint these differences (or difference) until some post hoc test is performed.
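The t-F relationship described in point 8 is easy to verify numerically. A sketch with invented two-group data, computing both statistics from scratch:

```python
# For exactly two independent groups, t**2 equals the one-way ANOVA's F ratio.

def independent_t(g1, g2):
    """Pooled-variance independent groups t statistic."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (m1 - m2) / se

def f_ratio(g1, g2):
    """One-way ANOVA F ratio for the two-group case."""
    groups = [g1, g2]
    scores = g1 + g2
    grand = sum(scores) / len(scores)
    ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_b / 1) / (ss_w / (len(scores) - 2))  # df = 1 and N - 2

a = [4.0, 5.0, 6.0, 5.0]   # hypothetical group 1
b = [7.0, 8.0, 6.0, 9.0]   # hypothetical group 2
t = independent_t(a, b)
F = f_ratio(a, b)
print(abs(t ** 2 - F) < 1e-9)  # True: t squared equals F
```

Note that the t statistic here is negative (group 1's mean is lower), yet squaring it still reproduces F, which is never negative.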
12. A one-way analysis of variance searches for behavioral differences among more than two sample means. The term "one-way" refers to the fact that the analysis can only handle one independent variable with two or more levels.
13. The term "factor" is a synonym for a treatment or independent variable in any type of ANOVA.
14. The one-way ANOVA compares the null hypothesis of no difference, which stipulates that all samples or groups come from the same population, against a flexible alternative hypothesis. The alternative hypothesis can be precisely stated in terms of predicted relations among means or, as is more typical, simply noted as "not H0" or that at least one treatment mean will differ from the others.
15. The one-way ANOVA can be calculated in a variety of different ways, and different texts promote different formulas. In most cases, however, some notation or symbol system is introduced to make the data analysts' work easier. The notation is used to keep track of observations for between- and within-group (sample) analyses necessary to determine the variance estimates that eventually yield an F ratio.
16. The conceptual steps that make up the ANOVA analysis reflect the partitioning of variation noted previously. The between-groups sum of squares (SS_between), degrees of freedom (df_between), and mean square or between-groups variance estimate (MS_between) are calculated initially, followed by the same statistics corresponding to the within-groups or error perspective (i.e., SS_within, df_within, MS_within). The F statistic is then calculated by dividing the MS_between by the MS_within estimates. The resulting F ratio is significant-some mean difference or differences exists-when it equals or exceeds a critical value of F (drawn from Table B.5 in Appendix B) determined by the respective degrees of freedom for the numerator (between-groups variance estimate) and the denominator (within-groups variance estimate).
17. When an F ratio is determined to be significant with more than two means present, the location of the difference(s) must wait until some a priori (i.e., planned) or post hoc (after the fact) comparisons are performed. In other words, the omnibus or "big, dumb F" indicates the presence of difference but does not specify the location between or among the means. In general, most students and many researchers rely on post hoc tests rather than planned comparisons.
18. Tukey's honestly significant difference (HSD) test is frequently used by investigators because its calculations are straightforward and it can be applied to most data sets. Essentially, the researcher calculates an HSD statistic, a numerical value, based on statistical information drawn from the ANOVA source table. The HSD statistic is then compared to all possible absolute differences between pairs of means. When the difference between any pair of means is equal to or greater than the HSD, the two means are significantly different from one another.
19. Supplementary analyses that support the interpretation of ANOVA results include effect size f and a variation of ω² based on the F ratio and related ANOVA information. These supplementary measures allow a researcher to flesh out what is known about the effect of an independent variable on a dependent measure when an obtained difference(s) is significant.
20. When writing about the results of an ANOVA, one should be sure to connect the numerical information (i.e., means and mean differences) with actual behavior-in other words, what participants actually did is as important as the statistical interpretation being proffered.
21. Contrast analysis-an a priori, theory-based comparison of means-is an alternative strategy one can use in lieu of post hoc analyses. An advantage of contrast analysis is that obtaining a significant omnibus F ratio in an ANOVA design is not necessary. A researcher can pose a 1 degree of freedom contrast using weights to compare the magnitude of the obtained means against a hypothesis. If the resulting F ratio is found to be significant, then the investigator knows that the pattern of the means conforms to the hypothesis.
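The growth of experimentwise error described in point 10 can be made concrete with a short sketch. The familiar 1 - (1 - α)^c formula below treats the c pairwise tests as independent, a common textbook simplification:

```python
# Experimentwise error: the chance of at least one Type I error across all
# pairwise t tests among k means, with per-comparison alpha = .05.

def experimentwise_error(k, alpha=0.05):
    c = k * (k - 1) // 2              # number of pairwise comparisons
    return 1 - (1 - alpha) ** c       # assumes the c tests are independent

for k in (2, 3, 6):
    print(k, round(experimentwise_error(k), 3))
# With 6 means (15 pairwise t tests), the error rate climbs past .50,
# which is why a single omnibus F test is preferred in the multigroup case.
```

This is the risk calculation behind chapter problem 13 as well: the per-test α stays at .05, but the chance of at least one false rejection balloons as comparisons multiply.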
Key Terms
Analysis of variance (ANOVA) (p. 413)
Contrast (p. 448)
Contrast analysis (p. 447)
Comparison (p. 439)
Error variance (p. 415)
Experimentwise error (p. 421)
Factor (p. 427)
Grand mean (p. 414)
Omnibus statistical test (p. 423)
One-way analysis of variance (one-way ANOVA) (p. 427)
Pairwise comparison α (p. 421)
Post hoc test (p. 424)
Source table (p. 433)
Total variance (p. 414)
Treatment variance (p. 416)
Chapter Problems
1. Why should any researcher-student or professional-avoid becoming overly focused on finding statistical significance when conducting behavioral science research?
2. How is the ANOVA similar to other statistical tests used to establish mean differences? How is the ANOVA distinct or different from these tests?
3. What are some advantages of the ANOVA that make it a versatile and useful statistical test? Is it more versatile or useful than the other statistical tests used for mean comparison? Why or why not?
4. In what ways are the ANOVA and the independent groups t test related to one another? Can they be used interchangeably? Why or why not?
5. Explain the source of the variation used to calculate the F ratio in a one-way ANOVA. Is the partitioning of variance for the ANOVA similar to the way it is divided by the analysis performed by the t test?
6. What is treatment variance? What is error variance? How do these terms relate to between-group and within-group variance?
7. Briefly describe the F distribution-how is it similar to or different from the t and z distributions?
8. In terms of variance and the behavior of participants, explain what it means when a data analyst obtains an F ratio equal to 1.00.
9. Is it possible to calculate an F ratio with a negative value? Why or why not?
10. Explain the statistical assumptions underlying the ANOVA. How are these assumptions similar to or different from those associated with the t test?
11. Under what particular circumstances do the independent groups t test and the F ratio share a special relationship with one another? Explain the nature of this relationship.
12. Is it advisable to conduct numerous t tests when, say, six means are involved? Why or why not? What analysis should be performed with this many means, that is, levels of an independent variable?
13. Given the number of means cited in question 12, what is the risk of committing at least one Type I error using multiple t tests? What would the probable risk be if there were 8 means?
14. How is the ANOVA distinct from prior statistical tests?
15. A student uses an independent groups t test to compare the number of hours per week men versus women spend studying in the library. She finds a t statistic of -3.22. Another student analyzes the same data using an ANOVA. What does he find?
    a. F = 10.37   b. F = 9.2   c. F = -9.2   d. F = 3.22   e. F = 5.23
16. Under what circumstances is it appropriate to conduct a post hoc test following an ANOVA? How do post hoc tests differ from a priori comparisons?
17. Examine the following ANOVA summary table and complete the missing information. (Hint: use the numbers shown and the formula provided in the chapter.)

    Source            Sum of Squares    df     MS     F
    Between-groups          15                 7.5    F = __
    Within-groups                       27
    Total                   72.5        29

18. Calculate the effect size f and the ω² using the information provided in question 17.
19. Examine the following ANOVA summary table and complete the missing information. Hint: use the numbers shown and the formula provided in the chapter. The independent variable is comprised of three groups, and each group has 15 participants.

    Source            Sum of Squares    df     MS     F
    Between-groups          20                        F = __
    Within-groups           90
    Total

20. Calculate the effect size f and the ω² using the information provided in question 19.
21. Examine the following ANOVA summary table and complete the missing information. (Hint: use the numbers shown and the formula provided in the chapter.) The independent variable is comprised of four groups, and each group has 10 participants.

    Source            Sum of Squares    df     MS     F
    Between-groups         100                        F = __
    Within-groups                              5.00
    Total

22. Calculate the effect size f and the ω² using the information provided in question 21.
23. These data are based on an experiment comprised of three independent groups. Perform a one-way ANOVA using α = .05. Can you accept or reject H0? If you reject H0, perform the appropriate post hoc test in order to identify where the differences lie.

    Treatment A       Treatment B       Treatment C
    10  8  10  9       6  6  7  6        5  4  5  6
    11  9  10  8       7  7  6  5        4  5  7  6

24. These data are based on an experiment comprised of two independent groups. Perform a one-way ANOVA using α = .01. Can you accept or reject H0? Once you complete the ANOVA, use an independent groups t statistic to verify a significant difference (if one is found).

    Treatment A            Treatment B
    12  10    10   8        8  9  8  7
    11  11.6  10   7        7  9  8  7
    12  11    10  11        8  7  6  6.6

25. Calculate the effect size and degree of association between the independent variable and dependent measure for the data presented in question 23.
26. Calculate the effect size and degree of association between the independent variable and dependent measure for the data presented in question 24.
27. Nursing students are taught clinical skills in various ways-some learn from nursing texts, others rely on computer simulations of patient condition profiles, still others learn by observing staff, and some learn by lecture. Imagine that you are a member of a nursing school and are interested in learning which method of acquiring clinical skills is the best. Here are clinical quiz scores for each of the four methods (NOTE: lower scores mean poorer performance):

...bers, and friends, are permitted to be any closer physically. Below are the self-reported comfort levels (where 1 = very uncomfortable to 7 = very comfortable) of 25 people based on measured distances between themselves and a stranger. Is it true that personal space violations by strangers lead to higher levels of discomfort?

    Distance:   3 inches   1 foot   1.5 feet   2 feet   3 feet
34. An F ratio based on an analysis of three independent groups reaches statistical significance (p < .05). What is the next step the data analyst must undertake? Is a one-way ANOVA the appropriate statistical test to analyze these data? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
35. Once the results of a one-way ANOVA are found to support a hypothesis, what should a researcher do next? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
Deciding Which ANOVA to Use

[Decision tree, recoverable steps:]
2. How many independent variables are there?
3. Are both variables between-groups factors?
4. Is one of the variables a repeated-measures factor while the other is a between-groups factor?
5. Are the samples or groups represented by a single independent variable so that they are independent of one another? If yes, then perform a one-way ANOVA (see chapter 11). If no, they are dependent observations, so perform a repeated-measures ANOVA (see chapter 13).
6. Are there at least two levels in the independent variable? If yes, then go to step 8. If no, then go to step 9.
7. Are the two levels independent of one another? If yes, then perform an independent groups t test (see chapter 10). If no, then perform a dependent groups t test (see chapter 10).
9. No ANOVA is appropriate for the analysis-please reevaluate the data and your analysis plan.
Chapter Outline

• Overview of Complex Designs: Life Beyond Manipulating One Variable
• Two-Factor Analysis of Variance
  Data Box 12.A: Thinking Factorially
  Reading Main Effects and the Qualification: Interactions Supersede Main Effects
  Knowledge Base
• The Effects of Anxiety and Ordinal Position on Affiliation: A Detailed Example of a Two-Way ANOVA
  Linear Model for the Two-Way ANOVA
  Effect Size
  Estimated Omega-Squared for the Two-Way ANOVA
  Writing About the Results of Two-Way ANOVA
  Coda: Beyond 2 X 2 Designs
  Knowledge Base
  Project Exercise: More on Interpreting Interaction-Mean Polish and Displaying Residuals
• Looking Forward, Then Back
• Summary
• Key Terms

An odd bit of statistical history: Ronald A. Fisher (Cowles, 1989; see Data Box 11.A) once illustrated his principles of research design by entertaining the following "experiment":

    A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup. We will consider the problem of designing an experiment by means of which this assertion can be tested (Fisher, 1966, p. 11).

Fisher went on to describe how to test this assertion, which was prompted by a Dr. B. Muriel Bristol's refusal of a cup of tea because the tea was added before the milk (Fisher Box, 1978). Fisher's colleague at the Rothamsted Agricultural Research Station insisted that the order by which milk or tea is added to a cup affects the taste of the beverage favored by the British. There will always be an England!

In terms of topics presented in the last chapter, "tea first" or "milk first" into the cup represents the two levels of an independent variable, perhaps for a one-way analysis of variance (ANOVA). Where the topic of this chapter is concerned-complex
ANOVA designs-we might complicate matters by adding a second independent vari-
able, whether a cube of sugar is added to the cup before or after the tea or milk is poured.
In other words, we are interested in whether the effects of these two independent vari-
ables (tea then milk versus milk then tea, and sugar before versus sugar after the sec-
ond liquid is introduced) on some dependent measure-mean ratings of the taste of
tea-can be observed simultaneously. Preferences about the preparation of hot drinks
may seem trivial, but Fisher's real interest was not this experimental domain per se, but
rather that inferences from multivariable questions could be drawn in the first place.
460 Chapter 12 Mean Comparison III: Two-Variable Analysis of Variance

This is the third chapter devoted exclusively to comparing means in this book. We began with simple two-group comparisons using the t test in chapter 10 and moved on to the simultaneous comparison of more than two independent samples in chapter 11.
This chapter builds on the logic and empirical framework found in the previous two,
but its goal is to demonstrate how introducing a relatively minor degree of change in
a standard experiment enables researchers to ask questions containing higher levels of
complexity. The change I refer to here involves introducing a second independent
variable into a research design and its accompanying analysis. In the language of sta-
tistics and experimentation, we will learn to think about and analyze data using the
two-way ANOVA.
The two-way ANOVA is used to analyze data from experiments wherein each par-
ticipant is exposed to one level from each of two treatment variables instead of just one.
An investigator might be curious about the effects of room temperature (e.g.,
ambient versus hot) and time pressure (e.g., pressure versus no pressure) on people's
problem-solving abilities. Instead of running two different experiments-one ex-
amining how temperature affects the number of problems correctly solved, the other how
time pressure influences the same outcome-and then conducting two separate one-way
ANOVAs on the resulting data, both variables can be investigated and analyzed in one study.
As we will see, research time and effort are not the only advantages-conclusions about the
nature of relationships between variables are positively enhanced, as well.
Overview of Complex Research Designs:
Life Beyond Manipulating One Variable
It is a cliche, but still true, that modern life is complex. Each and every day, most of us
encounter a host of different people across a variety of different settings, just as we are
barraged with media influences from the time our clock-radios go off or, later on, as we
listen to the news on our commute to and from daily destinations. Quite a lot happens
to the average person on a daily basis; some of it is extraordinary and much of it, no
doubt, is rather mundane. Yet our consideration of how to go about studying life through
the methods and techniques of behavioral science has been rather unidimensional. In
other words, we spent almost two thirds of this book acquiring the knowledge necessary
for comparing multiple means representing different levels of the same independent vari-
able (see chapter 11). At that time, we noted that life was more than a two group re-
search design, and now we are saying, in turn, that there is more than comparing means
representing several levels of the same independent variable with one another. What more
can we learn about analyzing complex situations? Why should we bother learning it?
As we just acknowledged, life is more complex than the one-way ANOVA; indeed, it is more varied, even infinite, than can be properly grasped, measured, or analyzed in the most complex experiment imaginable. Poets and philosophers are more comfortable addressing such questions from the heart, and they have long-standing traditions the typical sociologist or political scientist cannot follow. Still, however, the behavioral scientist must enter the breach by relying on the tools he or she has available. In the present situation, we can try to capture a slice of life, as it were, through undertaking more complex research designs and analyses. By examining how two (or even more) independent variables affect a single dependent measure, we develop a broader sense of how human behavior is influenced by people, situations, or thoughts. Such complex experiments can tell us whether a given variable has a pronounced effect, no effect, or an effect that becomes apparent only in combination with another variable. Studying one variable in isolation, then, tells us something about its characteristics, but its role in concert with others or how it changes across time tells us still more.

Human behavior is complex and varied, and it cannot be completely understood-only approximated-by isolating the influences of one or two independent variables and some dependent measure.
Here are some reasons to manipulate more than one variable in a factorial exper-
iment. This list of reasons is by no means an exhaustive one, but it should help you to
see beyond the mere mechanics of the analyses we will undertake shortly, to remember
that there is actually a larger purpose to these statistical exercises:
• Economy. Manipulating more than one factor introduces a tremendous savings
in terms of time, energy, and resources. By performing only one rather than
several studies addressing a series of questions, a researcher can gain knowledge
more quickly, is apt to maintain interest in and enthusiasm for the topic, and will
not overtax available materials (e.g., space where the research will occur, creation
and production of any questionnaires, funding for experimenter[s] to aid with the
project, modest payment [if any] to participants).
• Efficiency. Economic considerations aside, it is simply easier to run one-not
several-empirical investigations. When designed correctly, a complex investiga-
tion will answer more than one pertinent question.
• Elegance. Two-factor designs are more elegant-gracefully concise, admirably
succinct-than the basic two-group, one independent-variable design because
they cover a bit more empirical terrain without a great deal of additional effort
or expense. These complex designs are often ideal candidates for developing the
broader theories of behavior so prized by behavioral scientists.
• Generality of effects. Two-factor designs not only tell us more without making
additional taxing investments of time, effort, or expense, they also enable investi-
gators to offer a more comprehensive account of behavior, one that aspires to por-
tray life's intricacies. We are back to accurately characterizing the aforementioned
slice of life while emphasizing that the presence of multiple independent variables
or repetitions of a dependent measure enhances external validity. Regardless of its
complexity, no laboratory experiment can ever be said to approach the complexity
of real life, but the juxtaposition of more than one variable or measure clearly ap-
proximates real life in spirit more than a single independent variable.
• Interaction of variables. Finally, where experiments with two or more indepen-
dent variables are concerned, there is a statistical concept called interaction,
which will be discussed in detail shortly. An interaction occurs when the effect of
one independent variable changes at different levels of another independent
variable. Why is this a matter of design complexity? As we will see, an interaction
can occur only in the presence of two (or more) independent variables, one
hallmark of more complex designs.
Please keep these characteristics in mind as we review the two-way ANOVA.
KEY TERM A two-factor or two-way ANOVA is a statistical test used to identify differences between
sample means representing the independent effects of variables A and B, as well as the interaction
between the two variables. As a statistical technique, the two-factor ANOVA partitions variability, at-
tributing portions of it to the respective independent variables and the interaction between them.
In order to get an accurate sense of the scope and utility of the two-factor ANOVA,
we will review a classic two-factor study conducted by Glass, Singer, and Friedman
(1969). In study 1, Glass and his colleagues were interested in the aftermath of being
Figure 12.1 Design of Two-Factor Study of Noise Predictability and Noise Level on Tolerance for
Frustration (after Glass, Singer, & Friedman, 1969)
Thinking Factorially
Factorial designs assess the effects of more than one independent variable on the same dependent measure. The most basic factorial design is the so-called 2 X 2 design, one that replicates the format of the simplest possible one-factor study twice. A researcher can examine how variables A and B affect a dependent measure separately (main effects) and together (the A X B interaction). A key idea behind factorial design-the presence of more than one independent variable in an experiment or study-is the creation of different combinations of variable packages. We know that a 2 X 2 design has four possible combinations of variables a participant could experience (i.e., A1B1, A1B2, A2B1, and A2B2).
The combinations here are easy to envision, but what about more complex combinations?
Here is a simple problem to think about in this vein:
Any student buying lunch can have one and only one item from the main course, drink,
and dessert categories. How many possible combinations of items from these three cat-
egories are there?
If you think about the three categories as independent variables with two or three levels
each, the combination question becomes much easier to answer. Some readers will be tempted
to write out the combinations in a systematic manner (e.g., hamburger, milk, cake; hamburger,
milk, ice cream; and so on), but there is an easier way, one used by researchers all the time. When
planning or conducting a factorial design, you simply multiply the number of levels of each in-
dependent variable times the number of levels in the other independent variable(s). Thus, there
are 12 possible meal combinations because 3 X 2 X 2 is equal to 12 (just as the ubiquitous 2 X
2 design has four groups). That is all there is to it!
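The factorial multiplication rule can be sketched in a few lines of Python. The helper name below is ours, not anything from the text; the rule itself is exactly the one just described.

```python
# The number of distinct conditions in a factorial design is the product
# of the number of levels of each factor (the factorial multiplication rule).
from math import prod

def n_groups(levels):
    """Return the number of distinct cells for the given list of factor levels."""
    return prod(levels)

print(n_groups([2, 2]))     # the ubiquitous 2 x 2 design: 4 groups
print(n_groups([3, 2, 2]))  # the lunch example: 3 mains, 2 drinks, 2 desserts: 12
```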
Despite the fact that we are not going to analyze complex research designs with more than
two variables in this book, you can now readily identify the number of distinct groups such de-
signs possess. Use the factorial multiplication rule to determine the number of groups for the
following designs:
2 X 3
2 X 4 X 2
3 X 3 X 2
2 X 2 X 2 X 2
5 X 2
[Answers: 6, 16, 18, 16, 10]
tration-just one where both variables are represented. (For convenience, assume that
each of the four cells has an equal number of female participants-here, 10-within it.)
We will consider some actual data resulting from this design in a moment. In the meantime, let's review the two-factor ANOVA's link to the F distribution and the ratios derived from it. Like the one-way ANOVA, the two-factor ANOVA relies on the F
distribution for hypothesis testing (for a review of the F distribution's qualities, see
chapter 11). Instead of testing for the significance of a single F ratio, however, the two-
factor ANOVA actually provides three F ratios, each of which must be compared against
some critical value (or, depending on the degrees of freedom calculations required, two
or three critical values). Using the experiment performed by Glass et al. (1969) to pro-
vide context, the respective F ratios test for:
1. A mean difference due to variable A (noise predictability): Does predictable noise
lead to greater tolerance for frustration (more attempts to solve puzzles) than un-
predictable noise?
2. A mean difference due to variable B (noise level): Does soft noise lead to greater
tolerance for frustration (more attempts to solve puzzles) than loud noise?
3. An interaction between variables A and B, or differences among group means that
are not a result of the main effects or random variability. An interaction occurs
when the influence of one variable on a dependent measure is affected by the pres-
ence of a second variable (e.g., perhaps soft noise is linked with greater tolerance
for frustration [more attempts to solve puzzles] when the noise is predictable rather
than unpredictable). Please note that there are always several possible interaction
patterns for a given research design-the actual one found by Glass et al. (1969)
will be presented later.
Statistically, the F ratios for the two independent variables are calculated essentially the same way an F ratio based on a one-way analysis is determined. The difference, of course, is that two variables yield three Fs, not just one. Conceptually, the F ratio for the noise predictability or A factor is:

F_A = (variance differences between the sample means due to variable A + error variance for variable A) / (error variance for variable A).

Similarly, the F ratio for the noise level, or factor B, is:

F_B = (variance differences between the sample means due to variable B + error variance for variable B) / (error variance for variable B).

A two-way ANOVA provides three F ratios, two corresponding to variables A and B, and one for the interaction between the variables.
As always, the error variance is comprised of variability presumably due to individual
differences possessed by the participants and experimental error.
What about the interaction between the two variables? In what way does noise predictability interact with noise level, if at all? How should we construe the variability partitioned within it? We can think of the variance attributable to an interaction in a couple of ways. First, we can think of it as the interaction between noise predictability and noise level (symbolized A X B for "A by B," not "A times B"):

F_A X B = (variance differences between the sample means due to the A X B interaction + error variance for the A X B interaction) / (error variance for the A X B interaction).

This formulation is another way of identifying the residual or leftover variance that is not explained by either noise predictability or noise level alone, or:

F_A X B = (variance differences between the sample means not due to variable A or B + error variance not due to variable A or B) / (error variance not due to variable A or B).
At base, of course, each of the three F ratios that are derived from any two-factor
ANOVA is based on the conceptual formulation we rely on for exploring mean differences:
F = between-groups variation / within-groups variation,
where the bulk of the between-groups variation is presumed to be systematic, caused
by factor A (noise predictability), factor B (noise level), or the A X B (noise pre-
dictability X noise level) interaction, as well as individual differences (each participant
displays some idiosyncratic behavior), and experimental error (e.g., less-precise mea-
surement, deviation from experimental script, equipment problems). The within-
groups variation, in contrast, should be random, not systematic, being caused by the
aforementioned combination of individual differences and experimental error.
As was true for the one-way ANOVA, as the value of any of the three F ratios ex-
ceeds 1.00, there exists the possibility that an observed difference between sample means
is not due to chance but, rather, either variable or the interaction between the two vari-
ables. In order to determine if one or more of the F ratios is significant, each calculated
value must be compared to a critical value of F drawn from Table B.5 in Appendix B.
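The decision rule just described can be sketched in Python. The obtained F ratios and the critical value below are hypothetical placeholders, not values drawn from the Glass et al. (1969) study or from Table B.5.

```python
# Decision rule for the three F tests in a two-way ANOVA: each obtained F
# is compared against a critical value (looked up from a table such as
# Table B.5, based on the appropriate degrees of freedom).
def significant(f_obtained, f_critical):
    """An F ratio is significant when it meets or exceeds its critical value."""
    return f_obtained >= f_critical

# Hypothetical obtained Fs paired with a hypothetical table critical value
tests = {"A": (7.25, 4.11), "B": (3.02, 4.11), "A x B": (5.40, 4.11)}
for effect, (f_obs, f_crit) in tests.items():
    verdict = "significant" if significant(f_obs, f_crit) else "not significant"
    print(f"F({effect}) = {f_obs}: {verdict}")
```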
Note: Data are drawn from Table 3 in Glass et al. (1969, p. 204).
To begin, an advantage of the two-way ANOVA is its ability to isolate the effects of
one independent variable from the other, a process that identifies what is called a main
effect for each independent variable.
KEY TERM A main effect occurs when an independent variable has an overall and significant effect on a de-
pendent measure. A main effect is likely to exist when there are mean differences between or
among the levels of a factor.
Take a look at the far right column of two means in Table 12.1. These means represent
the effect of factor A (noise predictability) on the number of tries participants made to
solve the puzzle. What do we know? Participants exposed to the random noise (X̄_A1 = 6.64) made fewer attempts than those who heard the noise in the fixed pattern (X̄_A2 = 19.23). The difference between these means indicates that there was a main effect for
noise predictability-unpredictable noise was more frustrating for the women than pre-
dictable noise. Of course, this main effect would only be considered a reliable one if the
aforementioned F ratio for factor A (i.e., FA) reached statistical significance. If the means
were the same or similar-and the resulting F ratio was close to 1.00-then no main
effect for noise predictability would be present, and any superficial mean differences
would be attributed to sampling error or other random variation.
Please take note of one other important fact: In examining the main effect for noise
predictability, we have collapsed across the noise level factor B (see Table 12.1). In other
words, in order to determine the marginal means for noise predictability, we had to take
the average of cells A1B1 and A1B2 (i.e., X̄_A1) as well as the average of cells A2B1 and A2B2 (i.e., X̄_A2). No information regarding factor B (noise level) is lost in the process of
examining the main effect for factor A (noise predictability)-we are merely interested
in isolating the effects of factor A before examining factor B.
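Collapsing across a factor can be sketched in Python. The fixed-noise cell means below are the ones reported from Table 12.1 (18.55 and 19.90); the random-noise cell means are illustrative stand-ins, since those cells are not reproduced in this passage.

```python
# Marginal means are found by averaging the cell means that share a level
# of one factor, "collapsing" across the other factor (equal cell ns assumed).
cell_means = {
    ("random", "loud"): 3.7,    # illustrative stand-in
    ("random", "soft"): 9.6,    # illustrative stand-in
    ("fixed", "loud"): 18.55,   # from Table 12.1
    ("fixed", "soft"): 19.90,   # from Table 12.1
}

def marginal_mean(level, position):
    """Average all cell means whose key matches `level` at index `position`."""
    vals = [m for key, m in cell_means.items() if key[position] == level]
    return sum(vals) / len(vals)

print(marginal_mean("fixed", 0))  # collapse across noise level
print(marginal_mean("loud", 1))   # collapse across noise predictability
```

Collapsing the fixed-noise row this way reproduces the marginal mean of about 19.23 reported in the text.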
Now, what about factor B? This time, we will be collapsing downward across fac-
tor A-noise predictability-in order to examine whether noise level has any influence
on tolerance for frustration. Look at the bottom row of marginal means in Table 12.1.
As you can see, there also appears to be a main effect for factor B (noise level), though
the magnitude of the mean difference is more modest than that for factor A. Women
who heard the loud noise (X̄_B1 = 11.11) were frustrated more easily than those who heard the soft noise (X̄_B2 = 14.75). Once again, we say there appears to be a main ef-
fect because of the mean difference-note the importance of linking the means in ques-
tion to participants' behavior-but we cannot be sure unless we determined that the F
ratio for factor B (FB ) was statistically significant.
Because each main effect represents the independent effect of one independent
variable-here noise predictability and level-it is as if you conducted two separate
experiments. In fact, this is precisely one of the advantages I raised earlier: one exper-
iment reveals that predictability matters where tolerance is concerned (i.e., the main
effect for factor A) and another suggests that noise level, too, affects performance (i.e.,
the main effect for factor B). Fortuitously, however, we discover these two effects by per-
forming only one experiment!
Although these two main effects are informative, they can be qualified by the pres-
ence of an interaction between the two variables in the experiment. As we will see, an
interactive effect between two variables-here, noise predictability and noise level-
actually supersedes the simpler information disclosed by the ANOVA's two main effects.
There are several issues to consider when looking for an interaction. Let's get a precise
definition out of the way first, and then consider whether one exists in the data from
the experiment performed by Glass and colleagues (1969). Recall that an interaction re-
sults from some unique combination of the independent variables and their effects on
a dependent measure.
KEY TERM A statistical interaction involves the combined effect of two or more factors on a dependent vari-
able, one that is independent of the separate effects of the two or more factors. In a two-way
ANOVA, an A X B interaction exists when the effect of one factor depends on the level of the sec-
ond factor. A statistically significant interaction between two or more variables qualifies the results
identified by one or more main effects.
Take another look at Table 12.1. This time, we are interested in the four cell means
shown inside the table because the focus is on whether the two variables affect one another
in some unique fashion. The easiest way to begin to determine if an interaction exists is to
plot the data. A simple plot of the four cell means is shown in Figure 12.2. The mean num-
ber of puzzle-solving attempts, a quantitative scale, is noted on the Y axis, while the sec-
ond index-loud noise versus soft noise (i.e., factor B)-appears on the X axis (the choice
of placing factor A or B on this axis is an arbitrary one). The upper line graph in Figure 12.2 links the two cell means (loud and soft) reflecting predictable noise. This line reveals
something interesting about the predictable means-the cell means are very similar (i.e.,
18.55 and 19.90), so the line connecting the means is almost level. In the presence of the
fixed noise, then, level of the noise did not seem to affect the number of attempts to solve
the puzzles. The cell means for unpredictable noise (loud and soft) are shown in the lower
line graph. This line has a relatively steep slope, one indicating that fewer attempts occurred
when the unpredictable noise was loud than when it was soft.
Is there an interaction between factors A and B? The accurate way to determine whether
the F ratio for the A X B interaction is statistically significant, of course, involves compar-
ing its value to an established critical value. There is a quicker way to estimate the likeli-
hood that an interaction is present, however-simply examine the two lines shown in Fig-
ure 12.2 and then answer one simple question: Are the lines parallel? If the answer is "yes, the lines are parallel," then there is not likely to be an interaction between the two vari-
ables. If the lines are not parallel, however-they either intersect with one another at some
point or they could intersect with one another if their respective lengths were extended-
then an interaction is probably present. The parallel test is a quick guide for examining ob-
tained data and then making an educated guess as to whether an interactive relationship
exists between the variables.
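For a 2 X 2 design, the parallel test amounts to comparing the simple effect of one factor at each level of the other: if the two differences are roughly equal, the lines are parallel. A minimal Python sketch follows; the unpredictable-noise cell means are illustrative stand-ins, and the tolerance cutoff is an arbitrary choice.

```python
# Quick "parallel test" for a 2 x 2 design: compute the effect of factor B
# at each level of factor A; nearly equal differences mean parallel lines
# (no interaction suggested), unequal differences suggest an interaction.
def interaction_suggested(cells, tolerance=0.5):
    """cells maps (a_level, b_level) -> cell mean; True if lines are nonparallel."""
    slope_a1 = cells[(1, 2)] - cells[(1, 1)]  # effect of B at A1
    slope_a2 = cells[(2, 2)] - cells[(2, 1)]  # effect of B at A2
    return abs(slope_a1 - slope_a2) > tolerance

# Cell means patterned after Figure 12.2 (predictable-noise cells from the
# text; unpredictable-noise cells are illustrative values)
means = {(1, 1): 18.55, (1, 2): 19.90, (2, 1): 3.7, (2, 2): 9.6}
print(interaction_suggested(means))  # nonparallel lines -> True
```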
[Figure 12.2: Line graph of the mean number of puzzle-solving attempts (Y axis, 0 to 25) by noise level (X axis: loud noise, soft noise), with one nearly level line for fixed noise and one steeply sloped line for random noise.]
Note: Graph of tolerance for frustration means from Table 12.1. Higher means indicate more tolerance for
frustration (i.e., participant attempts to solve first insoluble puzzle).
Based on this quick guide to reading the nonparallel line graph shown in Figure 12.2, we can say that there appears to be an interaction between noise level and noise predictability. Specifically, Glass and colleagues (1969) found that the number of attempts to solve the insoluble puzzle was greater when the noise was soft than when it was loud-but this relationship held only when the noise was unpredictable. When the noise was predictable, the number of attempts was relatively high, between an average of 18 and 20 tries, regardless of the noise intensity (see the cell means in Table 12.1 and the graph in Figure 12.2). Changing the noise predictability altered the relationship between noise intensity and the attempts to solve the puzzle-thus, noise intensity and noise predictability interacted with one another. As noted earlier, then, the presence of this interaction qualifies-limits the interpretive power of-the two main effects. We know more about people's tolerance for frustration based on the interaction between noise predictability and noise level than by considering the effects of either variable in isolation.

Plotting cell means in a line graph can suggest whether an interaction is present: parallel lines = no interaction, whereas nonparallel lines = potential interaction.
If no interaction were present, the simple plot of the means might have looked like
any one of the line graphs shown in the top row of Figure 12.3. Each line in a given
pair is parallel to the other, thereby indicating the absence of any interaction (see the
top row of Figure 12.3). The bottom row of Figure 12.3, however, illustrates some other
possible interaction patterns. Each of these line graphs suggests the presence of an
interaction (note that some of the lines actually cross over or connect with one an-
other-others do not cross but would eventually if the lines were extended). Naturally,
many other graphic patterns illustrating the presence or absence of interactions are also
possible. The examples presented in Figure 12.3 are suggestive, not exhaustive.
Any two-way ANOVA will test three separate hypotheses-one for factor A, an-
other for B, as well as one for the A X B interaction. Each of these hypothesis tests is
independent of the others, which means that the result of any one of the tests is not re-
lated to the outcome of the other two. In other words, the results of any two-way ANOVA
[Panels (d), (e), and (f) of Figure 12.3: lines are not parallel-interaction present.]
Figure 12.3 Some Line Graphs Illustrating the Presence or Absence of Interaction Between
Hypothetical Variables A and B
can display several combinations of significant or nonsignificant main effects and in-
teractions. In the present example, we found a main effect for factors A and B, and an
A X B interaction. We could just as easily have found a main effect for A, none for B,
and no interaction. How many possible combinations of present or absent main effects
and interactions exist? Think about it: If we have three effects with two levels (i.e., ef-
fect present or effect absent) each, how many possible combinations are there? That's
right-there are eight (i.e., 2 X 2 X 2; see also Data Box 12.A).
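The eight possible outcome patterns can be enumerated directly with Python's itertools, which makes the 2 X 2 X 2 counting argument concrete:

```python
# Each of the three hypothesis tests (A, B, A x B) is independent and can be
# either present or absent, giving 2 x 2 x 2 = 8 possible outcome patterns.
from itertools import product

effects = ("A", "B", "A x B")
patterns = list(product((True, False), repeat=len(effects)))
print(len(patterns))  # 8
for pattern in patterns:
    print(", ".join(f"{e}: {'present' if p else 'absent'}"
                    for e, p in zip(effects, pattern)))
```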
Figure 12.4 presents four data tables illustrating means that do or do not yield main
effects and interactions. Please take a few minutes and examine these data tables to make
certain that you know how to spot main effects, and to plot and identify interactions.
We will learn to read and take apart data tables like these using a more advanced tech-
nique later in this chapter's Project Exercise. In the meantime, remember that a thor-
ough data analyst will always examine each and every effect in a two-way ANOVA. Any
significant main effect must yield, however, to the more detailed interpretation offered
by a significant interaction (see Data Box 12.B).
Hypotheses, Notation, and Steps for Performing the Two-Way ANOVA
How do we present the null and alternative hypotheses for the two main effects and the
interaction found in any two-way ANOVA? We proceed no differently than we have done for any other inferential statistical test, notably the one-way ANOVA. We
will briefly present the null and alternative hypotheses for each of the three effects
provided by any two-way ANOVA. For continuity and ease of understanding, the Glass et
al. (1969) study's design will provide some context for our hypotheses; indeed, you may
want to turn back to examine the data in Table 12.1 as you review each one.
(a)          B1      B2
             22.5    22.5
(no main effect)

(b)          B1      B2
      A1     30      15
      A2     15      30

(c)          B1      B2
      A1     10      30
      A2     10      10

(d)          B1      B2
      A1     10      20
      A2     15      15

[Answers: b: no main effect for A or B, but an A X B interaction; c: main effects for A and B, and an A X B interaction; d: no main effect for A, but a main effect for B, and an A X B interaction.]
Figure 12.4 Reading Data Tables for Main Effects and Interactions
Four data sets illustrating the presence or absence of main effects and interactions for studies involving two
independent variables are shown above. The numbers in each cell of a table represent the mean values
found within a condition. To illustrate the task involved, the first data table (a) is done for you. Determine
whether each of the remaining three tables has a main effect for A, one for B, and an interaction.
W hat happens when both main effects and the interaction in a two-way ANOVA are signifi-
cant? How do we go about reporting the results? Are the main effects as important as the
interaction?
Statistically speaking, a significant interaction qualifies-that is, modifies or limits-any conclusions drawn from either of the separate main effects in a two-way ANOVA. How so? Consider
the Glass et al. (1969) tolerance for frustration results for a moment. We know that relatively
fewer attempts were made when the noise was loud than when it was soft (main effect for factor B),
just as we know that fewer tries took place when the noise was random than when it was fixed
(main effect for factor A) (see Table 12.1). Despite the fact that these two main effects provide a
clear and relatively concise portrayal of the participants' behavior, we know still more when we
examine the interaction between them. The interaction revealed that noise level only mattered
when it was unpredictable-tolerance for frustration was lowest in the presence of loud noise-
but not when it was predictable (i.e., the number of attempts for the loud and soft noise condi-
tions were approximately equal) (see Figure 12.2). Thus, we know more about both variables
when they interact together, much more than is apparent in the information provided by either
main effect.
As Harris (1998) put it, a significant interaction invites us to interpret results by reporting
that "it all depends" (p. 419). Any effect of factor A must be understood in terms of the level of
factor B, and any effect of factor B is dependent on the level of factor A. Here, then, are some
rules to follow when reviewing the results of a two-way ANOVA: Report any and all significant main effects and any interactions, but remember that a significant interaction captures more of the available information than either or both of the main effects. Thus, the more complex relationship revealed by a significant interaction should be the focus of any discussion-consideration of either or both main effects must be subordinate to that given to the interaction. When an interaction is not significant, emphasize the results provided by the significant main effect(s) and then speculate about why an interaction was not found. Keep in mind, too, that researchers do not necessarily always want to find an interaction-it all depends on the theory or hypothesis they are testing.
The null hypothesis for factor A (noise predictability) is that there is no difference between the effects of random or fixed noise on tolerance for frustration, or:

H0: μA1 = μA2.

As was true for factor A, the alternative hypothesis for the effect of loud versus soft noise on the number of attempts to solve the puzzle can be presented as nondirectional or directional (i.e., loud noise leads to fewer tries than soft noise):

H1: μB1 ≠ μB2,
H1: μB1 < μB2.
Where the hypothesis for the A X B interaction is concerned, we could specify some
particular pattern of results involving the two variables. It usually makes more sense,
though, to be general where both the null and the alternative hypothesis is concerned,
as in:
Ho: The effect of one factor is not dependent on the levels of the other factor.
HI: The effect of one factor depends on the levels of the other factor.
The notation and formulas used to perform a two-way ANOVA are virtually iden-
tical to those used for the one-way ANOVA calculations in chapter 11. The goal is to
identify and partition the variance attributable to each of the two main effects (A and
B), the interaction (A X B), and the within-group or error variance. Each of these vari-
ance estimates is found by carving up the total sums of squares based on the two-way
ANOVA (see Data Box 12.C):
SStotal = SSA + SSB + SSA X B + SSwithin.
In addition, the degrees of freedom for each of the variance estimates must also be de-
termined. All of this information will be placed into a two-way ANOVA source table, a
convenient method for organizing and interpreting the obtained F ratios from any in-
vestigation. Keep in mind that the notation and formulas presented here are but one of
several methods available to perform a two-way ANOVA; other writers and texts pre-
sent alternative approaches that will yield the same answers.
1. Calculate the total sum of squares. Prior to assigning variation to main effects, an
interaction, or an error term, we need to determine the total sum of squares. Here
is the computation formula for SStotal:
2 (2: X ijk )2
[12.1.1) SStotai = 2: X ijk - N '
where i refers to the number of a given participant (e.g., participant 2), j identifies
the level of factor A (i.e., 1 = random noise and 2 = fixed noise), and k identifies
the level of factor B (i.e., 1 = loud noise and 2 = soft noise). The content and me-
chanics of formula [12.1.1] are no different than any other formula we have pre-
viously used to find the total sum of squares-we are just highlighting two inde-
pendent variables as well as the number of a participant. The formula directs us to
(a) square and then sum all of the raw scores in the data (Σ X²ijk); and (b) then sum all of the raw scores, square the sum ((Σ Xijk)²), and divide it by the total number of available observations or participants (N). As was true for the one-way ANOVA, the second half of the equation for the total sum of squares, (b) (i.e., (Σ Xijk)²/N), is referred to as the correction term (Harris, 1998), and it is subtracted from the value found in (a) (see [12.1.1]).
The degrees of freedom associated with the total variance are found by
[12.2.1] dftotal = N − 1.
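As an illustration (not part of the text), formulas [12.1.1] and [12.2.1] can be sketched in a few lines of Python; the scores below are made-up values, not the chapter's data:

```python
# Sketch of formulas [12.1.1] and [12.2.1] using hypothetical raw scores.
scores = [3, 4, 5, 2, 3, 4, 1, 2, 3]  # X_ijk values pooled across all cells

N = len(scores)
sum_sq = sum(x ** 2 for x in scores)      # sum of the squared raw scores
correction = sum(scores) ** 2 / N         # (sum of X)^2 / N, the correction term
ss_total = sum_sq - correction            # [12.1.1]
df_total = N - 1                          # [12.2.1]
```
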
2. Calculate the between-group variance estimate for factor A. In order to calculate
the variance estimate for factor A, we first need to determine the sum of squares
for A using:
Two-Factor Analysis of Variance 473
[12.3.1] SSA = Σ[(Σ Xj)²/nj] − (Σ Xijk)²/N.
The degrees of freedom for factor A are based on the number of levels (j) of the factor:
[12.4.1] dfA = j − 1.
The mean square, or variance estimate, for factor A is then:
[12.5.1] MSA = SSA/dfA.
3. Calculate the between-group variance estimate for factor B. The between-group
variance for factor B is calculated by the same logic used to find the sum of squares,
degrees of freedom, and mean square for factor A. Only the notation for the sum
of squares for B differs slightly, as k denotes the levels of factor B:
[12.6.1] SSB = Σ[(Σ Xk)²/nk] − (Σ Xijk)²/N,
[12.7.1] dfB = k − 1,
[12.8.1] MSB = SSB/dfB.
4. Calculate the variance estimate for the interaction. Calculating the interaction sum
of squares is only slightly more complicated than either of the main effect estimates.
Here is the formula:
[12.9.1] SSA×B = Σ[(Σ Xjk)²/njk] − (Σ Xijk)²/N − (SSA + SSB).
This looks like quite a bit of work, but there is actually not much calculation here
because the values for two thirds of the formula were determined earlier. Notice,
for example, that we already know the (b) correction term-the middle of formula
[12.9.1]-and the (c) respective sum of squares for factors A and B from the two
previous calculations. All that we really need to calculate is the first third of the
equation. Notice that the first part of formula [12.9.1] indicates that we should
(a) take the sum of the observations for each cell in the design (Σ Xjk), square each
sum (i.e., (Σ Xjk)²), and then divide each squared sum by the number of observations
found in the appropriate cell (njk). The resulting products representing each
474 Chapter 12 Mean Comparison III: Two-Variable Analysis of Variance
cell of the design are then summed (Σ[(Σ Xjk)²/njk]) together, and the rest of the
equation can be solved (i.e., the [b] correction term and the [c] sums of squares for
the two main effects are subtracted from the value found in [a]).
The degrees of freedom for the interaction are based on:
[12.10.1] dfA×B = (j − 1)(k − 1),
where j and k refer to the total number of levels of factors A and B, respectively.
The mean square or variance estimate for the interaction is calculated using:
[12.11.1] MSA×B = SSA×B/dfA×B.
5. Calculate the within-group variance estimate. The final estimate concerns the within-group or error sum of squares, whose formula is:
[12.12.1] SSwithin = Σ X²ijk − Σ[(Σ Xjk)²/njk].
We can complete the first half of the formula, (a), by inserting the value for Σ X²ijk, which
was calculated back in step 1. Similarly, the second half of the formula, (b),
Σ[(Σ Xjk)²/njk], can be found in step 4. We need only subtract (b) from (a) (see [12.12.1]).
The degrees of freedom for the within-group variance estimate are equal to:
[12.13.1] dfwithin = N − c,
where c refers to the total number of cells in the research design. In a 2 X 2 design,
for example, c is equal to 4 (recall Data Box 12.A).
The mean square for within-groups or error term is based on:
[12.14.1] MSwithin = SSwithin/dfwithin.
6. Calculate the F ratios for the main effects and the interaction. The final calculations
for the two-way ANOVA involve the F ratios for the two main effects (factors A and
B), and the A X B interaction. These calculations entail dividing the appropriate mean
square for each of the three separate effects by the mean square representing the within-
groups variance estimate. Thus, the F ratio for factor A is equal to:
[12.15.1] FA = MSA/MSwithin.
In turn, the F ratio for factor B is equal to:
[12.16.1] FB = MSB/MSwithin.
Finally, the F ratio for the interaction between factors A and B is determined by:
[12.17.1] FA×B = MSA×B/MSwithin.
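The six calculation steps can be sketched end to end in Python. The 2 × 2 data set below is hypothetical (only two scores per cell, for brevity) and the variable names are my own; the partition logic follows formulas [12.1.1] through [12.17.1]:

```python
# Sketch of the full two-way partition for a tiny hypothetical 2 x 2 data set;
# cells[(j, k)] holds the raw scores of the cell at level j of A and k of B.
cells = {(1, 1): [4, 5], (1, 2): [3, 2], (2, 1): [3, 2], (2, 2): [1, 2]}

all_scores = [x for cell in cells.values() for x in cell]
N = len(all_scores)
corr = sum(all_scores) ** 2 / N                    # correction term
ss_total = sum(x ** 2 for x in all_scores) - corr  # [12.1.1]

def level_sums(axis):
    # Collapse the cells into (sum of X, n) pairs for each level of
    # factor A (axis=0) or factor B (axis=1).
    out = {}
    for key, cell in cells.items():
        out.setdefault(key[axis], [0, 0])
        out[key[axis]][0] += sum(cell)
        out[key[axis]][1] += len(cell)
    return out.values()

ss_a = sum(s ** 2 / n for s, n in level_sums(0)) - corr   # [12.3.1]
ss_b = sum(s ** 2 / n for s, n in level_sums(1)) - corr   # [12.6.1]
ss_cells = sum(sum(c) ** 2 / len(c) for c in cells.values())
ss_axb = ss_cells - corr - (ss_a + ss_b)                  # [12.9.1]
ss_within = sum(x ** 2 for x in all_scores) - ss_cells    # [12.12.1]

df_a = len({j for j, _ in cells}) - 1
df_b = len({k for _, k in cells}) - 1
df_axb = df_a * df_b                                      # [12.10.1]
df_within = N - len(cells)                                # [12.13.1]

ms_within = ss_within / df_within                         # [12.14.1]
f_a = (ss_a / df_a) / ms_within                           # [12.15.1]
f_b = (ss_b / df_b) / ms_within                           # [12.16.1]
f_axb = (ss_axb / df_axb) / ms_within                     # [12.17.1]
```

Note that ss_a + ss_b + ss_axb + ss_within reproduces ss_total, which mirrors the error check performed later in the chapter.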
Naturally, all of these calculations are systematically entered into a source table de-
signed for the two-way ANOVA. We will complete one from start to finish in the next
The Effects of Anxiety and Ordinal Position on Affiliation: A Detailed Example of a Two-Way ANOVA 475
Table 12.2  General Format of a Two-Way ANOVA Source Table (Symbolic Entries)

Source           Sum of Squares   df         Mean Square   F                 p
Between groups
 Factor A         SSA             dfA        MSA           MSA/MSwithin      .05
 Factor B         SSB             dfB        MSB           MSB/MSwithin      .05
 A × B            SSA×B           dfA×B      MSA×B         MSA×B/MSwithin    .05
Within groups     SSwithin        dfwithin   MSwithin
section of the chapter. In the meantime, however, it will be good preparation for you
to examine one that includes all of the symbolic entries we just reviewed. Table 12.2
contains these symbolic entries and, as was true for the one-way ANOVA source table,
the calculations build from the left to the right, from the sum of squares for each ef-
fect to the three resulting F ratios. This table can serve as a resource to remind you
where a particular calculation is placed. We will review the simple error checks you can
perform as you proceed through a two-way source table in the next section, as we per-
form a two-way ANOVA from start to finish.
Knowledge Base
1. List the reasons it is sometimes desirable to manipulate more than one indepen-
dent variable.
2. How many conditions would a 3 X 4 design have? A 2 X 2 design?
3. How many main effects and interactions does a two-way ANOVA provide?
4. What is a quick method to determine if two variables interact with one another?
Answers
1. Economy of time, energy, resources; efficiency, as more than one question can be addressed;
elegance of design; generality of effects, which leads to more comprehensive portrayals of be-
havior; interaction of variables, such that one independent variable changes at different lev-
els of another independent variable.
2. A 3 X 4 design would have 12 separate conditions. A 2 X 2 design has 4 separate condi-
tions.
3. A two-way ANOVA provides two main effects-one for factor A, the other for factor B-
as well as one interaction between factor A and B.
4. Plot the cell means in a line graph. If the two lines are parallel, there is no interaction. If the
lines are not parallel (i.e., they intersect or could intersect), then an interaction may be present.
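A minimal sketch of the parallel-lines check in answer 4, using hypothetical cell means of my own choosing:

```python
# Quick parallelism check: if the difference between the two B levels is the
# same at every level of A, the lines are parallel and no interaction is
# suggested.  The cell means here are hypothetical.
cell_means = {("A1", "B1"): 3.0, ("A1", "B2"): 5.0,
              ("A2", "B1"): 2.0, ("A2", "B2"): 4.0}

diff_a1 = cell_means[("A1", "B1")] - cell_means[("A1", "B2")]
diff_a2 = cell_means[("A2", "B1")] - cell_means[("A2", "B2")]
parallel = abs(diff_a1 - diff_a2) < 1e-9   # equal simple effects -> parallel
```
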
When it was first introduced in chapter 11, the general linear model (GLM) was described as another way to think about partitioning the variance in the one-way ANOVA (see Data Box 11.C). As you will recall, the novelty of the GLM is that it enables us to think about the component parts of a single observation within some research design. The GLM's logic can also be extended to the two-way ANOVA. Thus far in this chapter, we have grown accustomed to partitioning the variation in a two-way ANOVA into the following components:
[Diagram: total variation partitioned into between-groups variation (factor A, factor B, and the A × B interaction) and within-groups (error) variation.]
Using the GLM, we can think of any participant's score obtained in a research project as being based upon this formulation:
Xijk = μ + αj + βk + αβjk + εijk,
where
Xijk = the score of one person within one cell (combined treatment condition).
The i refers to an individual observation, the j is the particular level of factor A, and the k is the
level of factor B.
μ = the population mean of the domain of interest.
αj = effects due to a specific level of factor A.
βk = effects due to a specific level of factor B.
αβjk = effects due to the interaction between factors A and B.
εijk = random error caused by individual differences and experimental error.
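A small numerical sketch of this formulation; every effect value below is hypothetical and chosen only to show how a single score decomposes under the GLM:

```python
# Sketch of the GLM decomposition for a two-way design: one score is rebuilt
# from the grand mean plus the A, B, and A x B effects plus random error.
mu = 3.0                             # hypothetical grand mean
alpha = {1: 0.5, 2: -0.5}            # effects of the levels of factor A
beta = {1: 0.75, 2: -0.75}           # effects of the levels of factor B
alpha_beta = {(1, 1): 0.1, (1, 2): -0.1,
              (2, 1): -0.1, (2, 2): 0.1}   # interaction effects

def score(j, k, error):
    # X_ijk = mu + alpha_j + beta_k + (alpha*beta)_jk + e_ijk
    return mu + alpha[j] + beta[k] + alpha_beta[(j, k)] + error

x = score(1, 1, error=0.15)   # one participant's score in cell (1, 1)
```
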
contact with others (Schachter, 1959). His research examined a variety of situational and
personal variables influencing the degree to which people prefer the company of others, par-
ticularly when they feel under duress. Using a variation of one of Schachter's studies and some
hypothetical data, we will illustrate the utility of the two-way ANOVA for data analysis.
Schachter's basic paradigm entailed presenting research participants with an anxiety-
producing communication regarding an experiment they were to take part in, and then
observing whether they preferred to wait alone or with others (i.e., engage in affiliation)
who would be undergoing the same experience. In general, Schachter found that the de-
sire to affiliate was stronger when participants felt a high rather than a low degree of anx-
iety about the upcoming experiment. To examine individual differences in affiliation,
Schachter then considered a second variable-birth order-positing that firstborn chil-
dren would show a stronger desire to affiliate relative to later-born children. Why? Casual
observation as well as empirical research suggests that parents dote on firstborn children
(a group that includes "only" children, as well) more than later-born children, possibly cre-
ating a pronounced degree of dependence in the process (e.g., Schachter, 1959).
Imagine that a group of 24 college-aged men (run in groups of 6) sign up to take
part in an experiment. Half of the participants are firstborn children, the other half are
the second- or later-born children within their families. Once a participant arrives at
the psychology laboratory, he and the other members of his group are told they will be
taking part in an experiment on the effects of electric shock. When the preliminary
instructions are over, participants in the high-anxiety condition are informed by a
somewhat authoritarian experimenter that, "I do want to be honest with you and tell
you that these shocks will be quite painful but, of course, they will do no permanent
damage" (Schachter, 1959, p. 13). From a somewhat friendlier researcher, the low-
anxiety participants learn that, "what you feel will not in any way be painful. It will
resemble more a tickle or a tingle than anything unpleasant" (pp. 13-14).
Participants were then told that they would wait 10 minutes before the actual ex-
periment would begin, and that they were welcome to wait alone or with others. The
prediction, of course, is that participants in the high anxiety condition would want to
wait with others rather than be alone (i.e., "misery loves company"), and that firstborns
would have a generally higher affiliative desire than later-borns. To see if these predic-
tions were borne out, the participants then completed the main dependent measure, a
simple rating scale that looked like this:
1 = I very much prefer being alone.
2 = I prefer being alone.
3 = I don't care very much.
4 = I prefer being together with others.
5 = I very much prefer being together with others.
Participants circled the number corresponding to how they felt about being alone or
with others. The data resulting from the four experimental conditions are shown in
Table 12.3. Please take a moment and familiarize yourself with these data, as we will be
using them to complete the subsequent calculations for the two-way ANOVA.
What does a preliminary review of the data in Table 12.3 reveal? By eyeballing
the main effect for birth order, participant reactions appear to be consistent with the
hypothesis: firstborns (X̄A1 = 3.67) wanted to wait with others more than later-borns
(X̄A2 = 2.25) (see Table 12.3). Similarly, participants exposed to the high anxiety
communication (X̄B1 = 3.58) rated their desire to wait with others as higher than the
low anxiety group (X̄B2 = 2.33) (see Table 12.3). The means corresponding to the interaction are in the four cells inside Table 12.3. Figure 12.5 shows a simple plot of these cell
means. Because the lines are more or less parallel, we anticipate that no interaction is
present as we begin the calculations for the two-way ANOVA. We will see if this spec-
ulation is borne out by the data analysis.
As usual, we will follow the four steps for testing hypotheses using the ANOVA out-
lined in Table 11.4 (which is a revised version of Table 9.2). Step 1 involves identifying
the null and alternative hypotheses for the two main effects and the interaction. As you
review these three sets of hypotheses, please make certain you understand how and
Table 12.3  Hypothetical Data for Birth Order, Anxiety, and Affiliation Study
[Raw scores (n = 6 per cell) for the 2 × 2 design crossing Factor A (birth order: firstborn vs. later-born) with Factor B (anxiety: high vs. low), along with the squared scores, cell sums, marginal means, and the preliminary calculations referred to below.]
why they match up with the conceptual discussion of the means we just had. The null
and alternative hypotheses for birth order (factor A) are:
H0: μA1 = μA2,
H1: μA1 > μA2.
In turn, the null and alternative hypotheses for the anxiety communication (factor B)
are:
H0: μB1 = μB2,
H1: μB1 > μB2.
Finally, the null and alternative hypotheses for the A X B interaction entail:
Ho: The effect of one factor is not dependent on the levels of the other factor.
HI: The effect of one factor depends on the levels of the other factor.
Step 2 from Table 11.4 requires us to select a significance level for rejecting the null
hypotheses for the two-way ANOVA. Following convention, we rely on a p value of .05.
As we begin step 3-the actual calculation of the sums of squares, degrees of free-
dom, variance estimates (mean squares), and the three F ratios-we acknowledge that
there is quite a bit of procedure to be followed and information to be organized. The
mise en place philosophy must be followed to the letter-any and all calculations we
Figure 12.5  Plot of the cell means for the birth order and anxiety data. Mean rating of desire to affiliate (y axis) is plotted against anxiety level (high vs. low; x axis), with separate lines for firstborn and later-born participants; the two lines are roughly parallel.
Note: Graph of mean rating of desire to affiliate on 1 to 5 rating scale, where higher numbers reflect a greater desire to be with others.
perform and information we track must be neatly arranged. To that end, Table 12.4 represents the completed two-way ANOVA source table for the birth order and anxiety data. Please refer back to it as we proceed with each step in each calculation; that way,
you will see where each entry is placed and how it fits into the calculations of the three
F ratios. I will draw your attention to a few error checks that will help us avoid com-
mon mistakes along the way.
One last reminder: Please read the following pages carefully and critically so that
you know where each and every number came from and why-the moment you feel
uncertain or confused, stop where you are and backtrack immediately until you iden-
tify where and why you got lost. I realize that you will be tempted to speed read the
next few pages, maybe even skip them, with an eye to reading them "later" when you
have to do so in order to complete homework problems. Fight this temptation-the
time and effort you expend now will save you time in the long run-and you will also
have the satisfaction of knowing what you are doing now rather than having some doubts
later.
As always, we begin by calculating the sum of squares of the total variation and then
the accompanying degrees of freedom. I will be using the formulas introduced earlier in
the chapter, hence I will be repeating formula numbers you encountered earlier. The
total sum of squares (SStotal) is equal to:
[12.1.1] SStotal = Σ X²ijk − (Σ Xijk)²/N.
Remember that the value of Σ X²ijk is found by squaring the individual observations in
each of the four cells in the design (see the squared values of X and the resulting Σ X²
values representing each of the cells in Table 12.3). The four Σ X² values are then
summed to form Σ X²ijk, which is equal to 245 (this calculation is shown in the preliminary calculations section at the bottom of Table 12.3). The value of Σ Xijk is also
obtained from Table 12.3, as the four Σ X values are summed and found to be equal
to 71 (the calculation for this sum appears at the bottom of Table 12.3; be sure you
see where the values came from in the upper portion of this table). Finally, N is equal
to the total number of participants, which we already know is equal to 24 (i.e., 6
participants per cell, and there are 4 cells; see Table 12.3). All of these numbers are then
entered into [12.1.1] for:
[12.1.2] SStotal = 245 − (71)²/24,
[12.1.3] SStotal = 245 − 5,041/24,
[12.1.4] SStotal = 245 − 210.04,
[12.1.5] SStotal = 34.96.
The SStotal is entered into the bottom of the second column of Table 12.4, and we proceed
to calculate the accompanying degrees of freedom based on the 24 participants in the study:
[12.2.1] dftotal = N − 1,
[12.2.2] dftotal = 24 − 1,
[12.2.3] dftotal = 23.
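As a quick check of these steps (a sketch, not from the text), the same arithmetic can be reproduced from the reported summary values ΣX² = 245, ΣX = 71, and N = 24:

```python
# Check of [12.1.2] through [12.2.3] using the summary values in the text.
sum_sq = 245   # sum of the squared scores from Table 12.3
sum_x = 71     # sum of all scores
N = 24

ss_total = sum_sq - sum_x ** 2 / N   # 245 - 5,041/24
df_total = N - 1
```
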
We then enter the 23 degrees of freedom at the bottom of the third column in Table 12.4.
We now begin to partition the between-groups variation by calculating the sum of
squares for each of the two main effects and the interaction, their respective degrees of
freedom, and the three separate variance estimates. Let's start with the sum of squares for
factor A, the main effect for birth order, using:
[12.3.1] SSA = Σ[(Σ Xj)²/nj] − (Σ Xijk)²/N.
Note that the correction term, the second half of the formula (i.e., (Σ Xijk)²/N), was
already determined when SStotal was calculated previously (see the second half of
[12.1.1]); we need not do this math a second time. Rather, we can simply insert its known
value, 210.04 (this value was found in [12.1.4]). The information for the first half of
the formula, the Σ X for each of the two levels of factor A (i.e., Σ XA1 and Σ XA2),
can be drawn from the leftmost column in Table 12.3. Please take careful note of the
fact, too, that each level of A is based on 12 observations (nj), not 6 observations, because we are collapsing across the B factor (see Table 12.3):
[12.3.2] SSA = [(44)²/12 + (27)²/12] − 210.04,
[12.3.3] SSA = [161.33 + 60.75] − 210.04,
[12.3.4] SSA = 222.08 − 210.04,
[12.3.5] SSA = 12.04.
The degrees of freedom for factor A are based on the number of levels (j) of the factor:
[12.4.1] dfA = j − 1.
Remember, there are two levels (j) to the birth order factor, so:
[12.4.2] dfA = 2 − 1,
[12.4.3] dfA = 1.
The Effects of Anxiety and Ordinal Position on Affiliation: A Detailed Example of a Two-Way AN OVA 481
This 1 degree of freedom is entered in the first slot in column 3 of Table 12.4.
Because we know SSA and dfA, we can also calculate the mean square or variance
estimate for birth order (factor A) using:
[12.5.1] MSA = SSA/dfA.
Please note that in this case the MSA will be equal to SSA because dfA is equal to 1 (i.e., a
number divided by 1 is equal to that number). By entering the two values we just calculated we find:
[12.5.2] MSA = 12.04/1,
[12.5.3] MSA = 12.04.
The sum of squares for factor B, the main effect for the anxiety communication, is found using:
[12.6.1] SSB = Σ[(Σ Xk)²/nk] − (Σ Xijk)²/N.
Once again, the correction term (the end of the formula) is based on a previous calculation, and each level of factor B is based on 12, not just 6, observations. The
first portion of formula [12.6.1] is based on information provided in the lower margin
of Table 12.3, for:
[12.6.2] SSB = [(43)²/12 + (28)²/12] − 210.04,
[12.6.3] SSB = [154.08 + 65.33] − 210.04,
[12.6.4] SSB = 219.41 − 210.04,
[12.6.5] SSB = 9.37.
Because factor B also has two levels, dfB = k − 1 = 1, so that MSB = SSB/dfB = 9.37. These values are entered into the second row of Table 12.4.
The sum of squares for the A × B interaction is then found using:
[12.9.1] SSA×B = Σ[(Σ Xjk)²/njk] − (Σ Xijk)²/N − (SSA + SSB).
We can readily enter the known correction term as (b) the second part of the formula,
and we can insert the recently calculated sums of squares for the respective main
effects (c) at the end of the formula. In order to complete the first part of the formula
(a), all that we need to do is to take the sum of each of the four cells in the design, square
it, divide the squared value by the number of observations in the cell, and then add the
resulting products together. Parts (b) and (c) are then subtracted from (a). As you will
see when you check the numerical entries, all of them can be found in Table 12.3:
Summing the four squared cell sums, each divided by njk = 6, yields 231.51, so that:
SSA×B = 231.51 − 210.04 − (12.04 + 9.37).
Here is the tricky part of the calculation: Treat the last two numbers in the equation
(210.04 and 21.41) as negative, so that you add the numbers together to form a larger
negative number (231.45), as in:
SSA×B = 231.51 − 231.45,
SSA×B = .060.
The SSA x B is rather small, confirming our suspicion that little variability is available
for a statistical interaction. Before the SSA x B is entered into Table 12.4, we can also
determine the degrees of freedom and the variance estimate for the A X B interaction.
The degrees of freedom are based on:
[12.10.1] dfA×B = (j − 1)(k − 1),
where j and k, respectively, refer to the number of levels in factors A and B; each is
equal to 2:
[12.10.2] dfA×B = (2 − 1)(2 − 1),
[12.10.3] dfA×B = 1.
Due to this single degree of freedom, the MSA x B will be equal to the SSA x B follow-
ing this formula:
[12.11.1] MSA×B = SSA×B/dfA×B,
[12.11.2] MSA×B = .060/1,
[12.11.3] MSA×B = .060.
The MSA x B, as well as the 1 degree of freedom, and the SSA x B are now entered into
their proper columns in row 3 of Table 12.4. Please look at row 3 in Table 12.4 now to
verify that you understand the rationale behind their placement.
The final set of variance estimate calculations concerns the within-group or error
sum of squares. The formula for SSwithin is:
[12.12.1] SSwithin = Σ X²ijk − Σ[(Σ Xjk)²/njk].
Numerical values for both portions of the formula are already known. The first part of the
formula, (a), the sum of all the squared values of X in the sample (Σ X²ijk), was
calculated when the SStotal was determined (see [12.1.1], as well as the additional preliminary calculations at the bottom of Table 12.3). The Σ X²ijk, which is equal to 245, was
found by adding the individual Σ X² from each of the four cells (see Table 12.3). We just
calculated the second part of the formula, (b), Σ[(Σ Xjk)²/njk], when the SSA×B was
identified (see [12.9.1]). The value of Σ[(Σ Xjk)²/njk], which is equal to 231.51, was
found by squaring each of the four Σ X values from Table 12.3, dividing each by the number of observations per cell (i.e., 6), and then summing the resulting products together
(see [12.9.2] to [12.9.6]). Taking the values from these prior procedures, we find that:
[12.12.2] SSwithin = 245 − 231.51,
[12.12.3] SSwithin = 13.49.
The degrees of freedom for the within-group variance estimate are then equal to:
[12.13.1] dfwithin = N − c,
where c is the number of cells or conditions in the research design. Here, there are four
(see Table 12.3):
[12.13.2] dfwithin = 24 - 4,
[12.13.3] dfwithin = 20.
Table 12.4  Two-Way ANOVA Source Table for Birth Order, Anxiety, and Affiliation Data

Source                             Sum of Squares   df   Mean Square   F       p
Between groups
 Factor A (birth order)                 12.04        1      12.04      17.84   p < .05
 Factor B (anxiety)                      9.37        1       9.37      13.88   p < .05
 A × B (birth order × anxiety)           .06         1       .06        .089   p > .05
Within groups                           13.49       20      .675
Total                                   34.96       23
[12.14.1] MSwithin = SSwithin/dfwithin,
[12.14.2] MSwithin = 13.49/20,
[12.14.3] MSwithin = .675.
These three values-SSwithin, d!within, and MSwithin-are then entered into the re-
spective places in Table 12.4 (see the three entries in row 4).
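The mean squares and F ratios in Table 12.4 can be rebuilt from the sums of squares and degrees of freedom alone, as in this sketch. Note that carrying full precision gives MSwithin = .6745 rather than the text's rounded .675, so the resulting Fs differ from the table's entries in the second decimal:

```python
# Sketch: rebuilding the mean squares and F ratios of Table 12.4 from the
# sums of squares and degrees of freedom computed in the text.
rows = {"A": (12.04, 1), "B": (9.37, 1), "AxB": (0.06, 1)}
ss_within, df_within = 13.49, 20

ms_within = ss_within / df_within                    # [12.14.1]
table = {name: (ss / df, (ss / df) / ms_within)      # (MS, F) per effect
         for name, (ss, df) in rows.items()}
```
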
All that remains is to calculate the three F ratios corresponding to the two main
effects and the interaction. Before we perform these calculations, however, we should
first verify that we have not made any calculation errors when determining the various
sums of squares or the variance estimates. Here are three quick error checks using the completed two-way ANOVA source table shown in Table 12.4:
By employing any and all error checks, data analysts increase the chance that the observed results are actually correct.
First, using the entries from Table 12.4, make certain that the partitioned sums of
squares for the main effects, the interaction, and the error term actually add up
to the total sum of squares, or:
SStotal = SSA + SSB + SSA×B + SSwithin,
34.96 = (12.04 + 9.37 + .06 + 13.49),
34.96 = 34.96.
In this case, the SStotal precisely matched the sum of the partitioned sum of
squares values. Keep in mind, however, that some rounding error often occurs;
the point is that the SStotal and the sum of the remaining sums of squares should
be approximately equal to one another.
Second, be sure that the individual degrees of freedom for the respective variance
estimates actually add up to the dftotal using:
dftotal = dfA + dfB + dfA×B + dfwithin,
23 = (1 + 1 + 1 + 20),
23 = 23.
Once again, our calculations prove to be accurate-the error check is successful.
A third possible error check involves checking the accuracy of the four variance
estimates based on the division of each sum of squares value by its accompanying
degrees of freedom. When a 2 × 2 design is used, each of the degrees of freedom
for the main effects and the interaction will be equal to 1 (as noted previously,
each SS value will necessarily be the same as its MS value). It is a good idea,
however, to verify that the MSwithin is actually equal to SSwithin/dfwithin, or:
MSwithin = SSwithin/dfwithin,
.675 = 13.49/20,
.675 = .675.
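The three error checks can be written as simple assertions over the Table 12.4 entries (a sketch, with small tolerances added because the table's values are rounded):

```python
# The three error checks over the Table 12.4 entries, written as assertions.
ss = {"A": 12.04, "B": 9.37, "AxB": 0.06, "within": 13.49, "total": 34.96}
df = {"A": 1, "B": 1, "AxB": 1, "within": 20, "total": 23}

# 1. Partitioned sums of squares recombine into SStotal.
assert abs(sum(ss[k] for k in ("A", "B", "AxB", "within")) - ss["total"]) < 0.01
# 2. Component degrees of freedom add up to df_total.
assert sum(df[k] for k in ("A", "B", "AxB", "within")) == df["total"]
# 3. Each mean square equals SS/df; here, the within-groups estimate.
assert abs(ss["within"] / df["within"] - 0.675) < 0.001
```
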
We can now calculate the three F ratios and determine which one, if any, is statis-
tically significant. All the information we need is easily gleaned from the right side of
Table 12.4, the far end of which contains space where the F ratios are entered (with the
accompanying significance levels, if any).
The F ratio for the main effect of factor A (birth order) is found by:
[12.15.1] FA = MSA/MSwithin,
[12.15.2] FA = 12.04/.675,
[12.15.3] FA = 17.84.
The F ratio for the main effect of factor B (anxiety level) is calculated using:
[12.16.1] FB = MSB/MSwithin,
[12.16.2] FB = 9.37/.675,
[12.16.3] FB = 13.88.
Finally, the F ratio for the A × B interaction is calculated, despite the fact that the extremely small magnitude of SSA×B assures us that the effect will not be a significant one:
[12.17.1] FA×B = MSA×B/MSwithin,
[12.17.2] FA×B = .060/.675,
[12.17.3] FA×B = .089.
To determine whether any of the F ratios reach significance, we first report each
one by recording its numerical value and accompanying degrees of freedom: the numerator df corresponds to the effect's variance estimate df (see the first three rows in
column 3 in Table 12.4) and the denominator degrees of freedom is the dfwithin (see
row 4 in column 3 in Table 12.4), for:
FA(1, 20) = 17.84,
FB(1, 20) = 13.88,
FA×B(1, 20) = .089.
We now turn to Table B.5 (critical values of F) in Appendix B in order to determine
the critical value at the .05 level for an F ratio with 1 and 20 degrees of freedom. We
locate the column for a numerator df of 1 across the top of Table B.5 and then read down the
table's leftmost rows until we find the denominator value of 20. When we find the intersection between row and column, we choose the lighter typefaced value of 4.35,
which is the .05 critical value (recall step 2). We now compare this number with each
of the obtained Fs in order to determine if they equal or exceed it in value, in which
case they can be declared significant:
Is FA(1, 20) = 17.84 ≥ Fcrit(1, 20) = 4.35? Yes, reject H0.
Is FB(1, 20) = 13.88 ≥ Fcrit(1, 20) = 4.35? Yes, reject H0.
Is FA×B(1, 20) = .089 ≥ Fcrit(1, 20) = 4.35? No, accept or retain H0.
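The decision rule in these comparisons amounts to a single threshold test against the tabled critical value, as this small sketch shows:

```python
# Sketch of the decision rule: compare each obtained F with the tabled
# critical value F(1, 20) = 4.35 at the .05 level (from Table B.5).
f_crit = 4.35
obtained = {"A": 17.84, "B": 13.88, "AxB": 0.089}

decisions = {name: ("reject H0" if f_value >= f_crit else "retain H0")
             for name, f_value in obtained.items()}
```
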
Thus, the two main effects in the affiliation study reached significance but
the interaction between the two variables did not. What do we know? Look back
at Table 12.4. We know that there was a main effect for birth order (factor A),
such that firstborn participants rated the desire to wait with others (X̄A1 = 3.67) as
significantly greater than later-born participants (X̄A2 = 2.25) (see the marginal means
in Table 12.3). There was also a main effect for anxiety level, so that
participants in the high anxiety group rated the desire to wait with others (X̄B1 = 3.58)
as significantly greater than those who heard a less anxiety-provoking communication
from the experimenter (X̄B2 = 2.33) (see the marginal means in Table 12.3). There was no interaction between these two variables, however, as was originally
shown by the parallel plots of cell means in Figure 12.5.
Please notice that no post hoc tests are necessary with the two-way ANOVA when it is
used to analyze a 2 × 2 research design. All the mean comparisons for the main effects
and the interaction are, as it were, "hardwired." We need only to examine the marginal
means or the cell means in a table similar to Table 12.4 in order to determine the
direction of any effect and whether it is consistent with or in opposition to a relevant
hypothesis. We will postpone further interpretation of these data, as well as advice
on how to write up the results, until after we consider issues of effect size and the
degree of association between the independent variables and the dependent variable
in this study.
Effect Size
How can we characterize the effect sizes of the significant results, the main effects, we
found? We can use Cohen's (1988) f, which was first introduced in chapter 11. Recall
that we use f to label a significant effect as small (f = .10), medium (f = .25), or large
(f = .40). To calculate f, we use formulas [11.21.1] and [11.22.1], respectively:
f = √(η²/(1 − η²)),
where
η² = SSeffect/SStotal.
We are only interested in the effect sizes of the main effects for birth order (factor A)
and anxiety level (factor B), as they were both statistically significant. All of the infor-
mation we need can be drawn from the ANOVA source table in Table 12.4. The effect
size for birth order is:
[12.18.1] η²A = SSA/SStotal,
[12.18.2] η²A = 12.04/34.96,
[12.18.3] η²A = .34.
This value of η²A is then entered into the formula for f:
[12.19.1] fA = √(η²A/(1 − η²A)),
[12.19.2] fA = √(.34/(1 − .34)),
[12.19.3] fA = √(.34/.66),
[12.19.4] fA ≈ √.51,
[12.19.5] fA = .71.
Thus, the main effect for birth order is quite large.
What about the effect size of anxiety level? We repeat the same analyses, now sub-
stituting the symbols for factor B as well as SSB:
[12.20.1] η²B = SSB/SStotal,
[12.20.2] η²B = 9.37/34.96,
[12.20.3] η²B = .268.
This value is then entered into the formula for f:
[12.21.1] fB = √(η²B/(1 − η²B)),
[12.21.2] fB = √(.268/(1 − .268)),
[12.21.3] fB = √(.268/.732),
[12.21.4] fB = √.3661,
[12.21.5] fB = .61.
The main effect for anxiety level, too, is quite large.
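A sketch of the η² and f computations; note that because the text rounds η² before taking the square root, it reports fA = .71, while carrying full precision gives roughly .72:

```python
# Sketch of the eta-squared and Cohen's f calculations for the two
# significant main effects, using the sums of squares from Table 12.4.
import math

ss_total = 34.96
effects = {"A (birth order)": 12.04, "B (anxiety)": 9.37}

eta_sq = {name: ss / ss_total for name, ss in effects.items()}        # SS_effect / SS_total
cohens_f = {name: math.sqrt(e / (1 - e)) for name, e in eta_sq.items()}
```
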
The degree of association between birth order (factor A) and affiliative desire, then,
would be estimated with ω̂²:
[12.23.1] ω̂²A = dfA(FA − 1)/[dfA(FA − 1) + N],
[12.23.2] ω̂²A = (1)(17.84 − 1)/[(1)(17.84 − 1) + 24],
[12.23.3] ω̂²A = (1)(16.84)/[(1)(16.84) + 24],
[12.23.4] ω̂²A = 16.84/(16.84 + 24),
[12.23.5] ω̂²A = 16.84/40.84,
[12.23.6] ω̂²A = .41.
The comparable estimate for anxiety level (factor B) is:
[12.24.1] ω̂²B = dfB(FB − 1)/[dfB(FB − 1) + N].
Once again, we draw on numerical information provided in Table 12.4 and complete
the formula:
[12.24.2] ω̂²B = (1)(13.88 − 1)/[(1)(13.88 − 1) + 24],
[12.24.3] ω̂²B = (1)(12.88)/[(1)(12.88) + 24],
[12.24.4] ω̂²B = 12.88/(12.88 + 24),
[12.24.5] ω̂²B = 12.88/36.88,
[12.24.6] ω̂²B = .35.
About 35% of the variance pertaining to participants' affiliative tendencies, then, can
be ascribed to their anxiety level. This variation is not explained by birth order or the
interactive effects (which are virtually absent) between birth order and anxiety. We can
now take this information, along with the effect size and the main two-way ANOVA re-
sults, and report the results in written form.
Their ratings, which were based on a 1 to 5 rating scale (higher numbers reflect a
greater desire to affiliate with others), were analyzed by a 2 (first- vs. later-born) × 2 (high
vs. low anxiety) analysis of variance (ANOVA). Table 1 presents the means separated by
birth order and anxiety level. A significant main effect for birth order was found, indicating that, as expected, firstborn men (M = 3.67) rated their desire to affiliate as greater
than later-born men (M = 2.25), F(1, 20) = 17.84, p < .01. The main effect for anxiety
level was also significant. Participants who heard the high anxiety communication (M =
3.58) wanted to wait with others more than those in the low anxiety condition (M = 2.33),
F(1, 20) = 13.88, p < .01. There was no interaction between these two factors, however,
F(1, 20) = .089, ns. The effect sizes for birth order (f = .71) and anxiety level (f =
.61) were both quite large. The degree of association between the desire to affiliate and
birth order (ω̂² = .41) and anxiety level (ω̂² = .35) was quite strong, as well.
Note: Means are based on a 5-point scale. Higher values reflect a greater desire to affiliate with others.
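The ω̂² estimates reported above can be wrapped in a small helper function; the function name below is my own, not the text's:

```python
# Sketch of the omega-squared estimate used in the text:
# est. omega^2 = df(F - 1) / (df(F - 1) + N).
def omega_sq(df_effect, f_ratio, n_total):
    num = df_effect * (f_ratio - 1)
    return num / (num + n_total)

w_a = omega_sq(1, 17.84, 24)   # birth order
w_b = omega_sq(1, 13.88, 24)   # anxiety level
```
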
Please notice that I elected not to include a line graph of the cell means (but see Figure 12.5) because the interaction did not reach significance. Had an interaction been found,
then I would want to draw the reader's attention to it. Instead, I chose to highlight the marginal means when the main effects were reported in the text (see above). There is no one
right way to report these or any set of data. As the data analyst, the choice is your own. As
you gain experience writing and presenting results, you will acquire intuition about which
results to highlight and which ones to downplay a bit; just be sure to be honest and
report all of the main results from any multi-factor ANOVA design.
Higher order ANOVA designs, those beyond the typical 2 X 2 design, can contain three or even four independent variables. Such complexity can sacrifice interpretability, however.

low), we might have added gender (male vs. female). Recall that Schachter's (1959) original work used women and our replication study employed men. If we included both men and women in the study, then our research design would be a 2 X 2 X 2 design. Not only would we have three main effects, one each for birth order (A), anxiety level (B), and participant gender (C); we would also have three interactions (i.e., A X B, B X C, and A X C) and what is known as a triple interaction (i.e., A X B X C) to analyze.

As more variables are added, the designs become more complex to carry out and to interpret, and their practical value is sometimes questionable (Gravetter & Wallnau,
1996; but see Rosenthal & Rosnow, 1991). Indeed, one must be quite practiced at eval-
uating more basic two-variable designs before tackling anything on a grander scale.
Higher order factorial designs are best saved for advanced statistics classes, though I
will encourage you to look for comprehensible examples as you read in the behavioral
science literature (for some cases, see Rosenthal & Rosnow, 1991). In the next chapter,
we will examine another advanced ANOVA technique-the repeated measures
ANOVA-that is neither difficult to perform nor to understand.
Knowledge Base
Examine the following table of means and then answer the questions that follow.

                 Variable A
Variable B       A1      A2      Marginal Means
B1               10      20      15
B2               20      10      15
Marginal means   15      15

1. Is there a main effect for variable A?
2. Is there a main effect for variable B?
3. Is there an interaction between variables A and B?

Answers
1. No. The means for A1 and A2 are both 15.
2. No. The means for B1 and B2 are both 15.
3. Yes. A plot of the cell means indicates that there is a classic crossover interaction.
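The logic behind these answers can be sketched in a few lines of code (the helper below is my own illustration, not from the text): compare the marginal means to detect main effects, and compare the simple effects of A at each level of B to detect an interaction.

```python
# Hedged sketch: inspect a 2 x 2 table of exact cell means. A main effect
# appears when marginal means differ; an interaction appears when the
# simple effect of A changes across the levels of B (a crossover if it
# reverses sign). Exact comparisons are fine for tidy textbook tables.

def describe_2x2(cells):
    """cells[i][j] = cell mean for level B(i+1) and level A(j+1)."""
    (b1a1, b1a2), (b2a1, b2a2) = cells
    a_marginals = ((b1a1 + b2a1) / 2, (b1a2 + b2a2) / 2)  # columns (A)
    b_marginals = ((b1a1 + b1a2) / 2, (b2a1 + b2a2) / 2)  # rows (B)
    simple_b1 = b1a2 - b1a1   # effect of A within B1
    simple_b2 = b2a2 - b2a1   # effect of A within B2
    return {
        "main_effect_A": a_marginals[0] != a_marginals[1],
        "main_effect_B": b_marginals[0] != b_marginals[1],
        "interaction": simple_b1 != simple_b2,
        "crossover": simple_b1 * simple_b2 < 0,
    }

# The Knowledge Base table: B1 row (10, 20), B2 row (20, 10)
print(describe_2x2([(10, 20), (20, 10)]))
```

Run on the table above, it reports no main effect for A or B but a crossover interaction, matching answers 1 through 3.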
Psychologists Ralph Rosnow and Robert Rosenthal (1989; Rosenthal & Rosnow, 1991)
launched something of a minor crusade to decrease the widespread misunderstanding
surrounding the concept of statistical interaction within the ANOVA. Interactions are
the bread-and-butter results of many areas of the behavioral sciences, and yet many re-
searchers, data analysts, and readers remain unaware of a major problem in how such
effects are presented-interaction effects are rarely separated from the main effects
(Rosenthal & Rosnow, 1991). We already noted that a significant interaction qualifies
any accompanying main effect (see Data Box 12.B), but what do these researchers mean
by "separating" the two effects from one another? How will such separation help you
to interpret interactions in the future?
Rosenthal and Rosnow (1991) point out that when an interaction from a 2 X 2
ANOVA is significant, the pattern of results will always be the classically "X"-shaped
pattern.

The Effects of Anxiety and Ordinal Position on Affiliation: A Detailed Example of a Two-Way ANOVA 491

Table 12.5 Hypothetical Test Scores by Learning Environment and Subject Matter

Environment                     English   Math   Mean
Computer-enhanced classroom     7         13     10
Traditional classroom           7         5      6
Mean                            7         9      8

At this point, you should be wondering how this observation could be true
when we learned earlier that an interaction can take on any number of forms. Both
points are true-Rosenthal and Rosnow are simply drawing attention to the fact that
most researchers never actually display the "true" interaction effect (i.e., those we ex-
amined earlier still had the main effects embedded within them). Let's quote the
authors on this matter before we move on to an illustrative example: "The diagram of
the interaction is X-shaped; indeed, it is true in general that in any 2 X 2 analysis of
variance the display of any nonzero (i.e., significant) interaction will be X-shaped"
(Rosenthal & Rosnow, 1991, p. 366).
In other words, before an accurate interpretation of an interaction can be offered,
the interaction itself must be identified and then displayed. To do so, Rosenthal and
Rosnow (1991) advocate that researchers calculate the residual values within a research
design that point to the interaction. The term "residual" usually refers to something that
is left over when something else has been removed. In the context of interactions, the
residuals are guiding numbers that remain once the main effects have been removed.
Perhaps we have some data from an education project on the use of computer-enhanced
classrooms and learning in traditional high school subjects. In other words, do com-
puter classrooms generally enhance learning or does the subject matter make a differ-
ence (e.g., English, mathematics)?
Take a look at the hypothetical test scores shown in Table 12.5. As you can see,
there appears to be a main effect for environment (i.e., students score higher when
learning occurred in the computer-based rather than traditional class setting) and one
for subject matter (i.e., math scores were slightly higher than English scores). (If you
feel a bit rusty about reading this data table, turn back to the relevant section on read-
ing tables presented earlier in this chapter before proceeding with the rest of the dis-
cussion.) There also appears to be an interaction, which is plotted in Figure 12.6. Imag-
ine that the interaction is statistically significant-how do we interpret it following
Rosenthal and Rosnow (1991)?
To uncover the interaction, we must first subtract the row and the column ef-
fects (i.e., the means) from each of the cells in the experiment using a technique that
Rosenthal and Rosnow (1991) refer to as the mean polish.
KEY TERM The mean polish is a technique whereby row, column, and grand mean effects are removed from
a table of means in order to illustrate residuals that define an interaction.
A row effect is based on subtracting the grand mean, the average of all the obser-
vations, (here, the number 8 shown in the lower right portion of Table 12.5) from each
of the row means for the computer-enhanced and traditional environment, respectively,
or: 10 - 8 = 2 and 6 - 8 = -2 (see the far right column in Table 12.6). We then sub-
tract the respective row effects from each of the conditions found within its row. For
the computer-enhanced row, the cell value of 7 becomes 5 (i.e., 7 - 2) and the cell value
of 13 becomes 11 (13 - 2; see the upper row in Table 12.7). We then subtract the row
492 Chapter 12 Mean Comparison III: Two-Variable Analysis of Variance
Figure 12.6 Figure Illustrating Two Main Effects and Interaction Involving Learning Environment
and Subject Matter
[Line graph of the Table 12.5 means: test score (y-axis, 0 to 20) by subject matter (x-axis: English, Math), with separate lines for the computer-enhanced and traditional classrooms.]
effect from the cells representing the traditional environment and find the new cell val-
ues of 9 [7 - (-2) = 9] and 7 [5 - (-2) = 7] (see the lower row in Table 12.7). Finally,
each of the row means is also corrected by subtracting its respective row effects (see the
new means of 8 and 8, respectively, in Table 12.7).
The next step involves removing the column effects from the data table of means.
We use the same procedure that was used to remove the row effects. To begin, we iden-
tify the column effects, which are based on subtracting the grand mean of 8 from each
of the two column means. The column effects are 7 - 8 = -1 for the English course
and 9 - 8 = 1 for math (these values are identified in the bottom row of Tables 12.6
and 12.7). Removal of a given column effect means that the effect's value must be sub-
tracted from each of the cells within its own column (remember that the cell values be-
ing used have had the row effects already removed; see Table 12.7). Thus, the two values
in the first column in Table 12.7 now become 6 [5 - (-1) = 6] and 10 [9 - (-1) = 10].
The two values in the second column of Table 12.7, in turn, now are 10 (11 - 1 = 10) and
6 (7 - 1 = 6). We must also subtract the respective column effects from the column means,
yielding values of 8 [7 - (-1) = 8] and 8 (9 - 1 = 8), respectively. Test scores corrected
for both row and column effects are shown in Table 12.8.
What are we left with in Table 12.8? Once the row and column effects are removed,
we are left with a set of residuals that define an interaction effect, yet one last calcula-
tion remains. We must remove the effect of the grand mean from the residuals because
Table 12.6 Test Scores with Row and Column Effects Identified

                                Subject Matter
Environment                     English   Math   Mean   Row Effect
Computer-enhanced classroom     7         13     10     2
Traditional classroom           7         5      6      -2
Mean                            7         9      8
Column effect                   -1        1
Table 12.7 Test Scores with Row Effects Removed

Environment                     English   Math   Mean   Row Effect
Computer-enhanced classroom     5         11     8      0
Traditional classroom           9         7      8      0
Mean                            7         9      8
Column effect                   -1        1
this number inflates their values. To remove the grand mean (we already know that its
value is 8; see Table 12.5), we subtract its value from each of the cells, as well as each
of the row and column means. If you subtract the grand mean of 8 from all of the num-
bers shown in Table 12.8, then you end up with the numbers presented in Table 12.9.
The positive and negative pattern shown by the numbers in the cells in Table 12.9, not
the numbers per se, reveals the descriptive nature of the interaction. That is, positive and
negative signs indicate the relative magnitude of the relationships between the cell means.
What do these numbers suggest about the relationship between learning environ-
ment and subject matter? Take a closer look at Table 12.9: Math test scores were higher
than English test scores when learning took place in the computer-based classroom;
however, English scores were higher than math scores when the traditional classroom
settings were utilized. Put in still simpler terms: Math scores are higher when learning
is computer-enhanced, whereas English scores are higher when learning occurs in the
traditional classroom. Compare these statements with the residuals shown in Table 12.9.
This interpretation of the interaction is not only simple, it is also completely accurate.
To plot the means disclosing this relationship, we need only go back to the corrected
means provided in Table 12.8 (remember that these means are all positive because the
grand mean was not yet removed from them). As you can see in Figure 12.7, the plot
of the means is X-shaped, just as Rosenthal and Rosnow (1991) promised, and it does
illustrate the interaction once the main effects due to learning environment and sub-
ject matter are removed from the data.
That is all there is to it! The calculations involved are not at all difficult and once
you have the concepts down, you can polish the means of any 2 X 2 study quickly and
efficiently in order to make certain that you are reporting the accurate interaction, one
based on the actual pattern of the residuals.
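To make the arithmetic concrete, here is a short sketch of the mean polish applied to the Table 12.5 means (the function name is my own). Removing row effects, then column effects, then the grand mean collapses algebraically to a single subtraction per cell:

```python
# A hedged sketch of the mean polish described above. Subtracting the row
# effect, the column effect, and the grand mean from each cell leaves the
# residuals that define the interaction.

def mean_polish(cells):
    """cells is a list of rows of cell means; returns the residual table."""
    rows, cols = len(cells), len(cells[0])
    row_means = [sum(r) / cols for r in cells]
    col_means = [sum(cells[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    grand = sum(row_means) / rows
    row_effects = [m - grand for m in row_means]   # e.g., 10 - 8 = 2
    col_effects = [m - grand for m in col_means]   # e.g., 7 - 8 = -1
    # residual = cell - row effect - column effect - grand mean
    return [[cells[i][j] - row_effects[i] - col_effects[j] - grand
             for j in range(cols)] for i in range(rows)]

# Table 12.5 means: computer-enhanced (7, 13) vs. traditional (7, 5)
print(mean_polish([[7, 13], [7, 5]]))   # [[-2.0, 2.0], [2.0, -2.0]]
```

The signs of the residuals (positive for math in the computer-enhanced room and for English in the traditional room, negative otherwise) reproduce the crossover pattern described in the text.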
Here is the Project Exercise, one designed to give you some practice with using the
mean polish procedure and displaying residuals.
Table 12.8 Test Scores Corrected for Both Row and Column Effects

Environment                     English   Math   Mean   Row Effect
Computer-enhanced classroom     6         10     8      0
Traditional classroom           10        6      8      0
Mean                            8         8      8
Column effect                   0         0

Table 12.9 Residuals Remaining Once the Grand Mean Is Removed

Environment                     English   Math   Mean   Row Effect
Computer-enhanced classroom     -2        2      0      0
Traditional classroom           2         -2     0      0
Mean                            0         0      0
Column effect                   0         0
1. Locate some data based on a 2 X 2 between-groups research design. You can lo-
cate studies published in the behavioral science literature, reanalyze data from
tables presented earlier in this chapter (see Table 12.1 or Figure 12.4, for in-
stance), or ask your instructor for some data. Alternatively, create a data table
demonstrating an interaction-make up some data similar to what we just re-
viewed.
2. Perform the mean polish from start to finish. Be sure to create progressive data
tables like those we reviewed above in order to carefully track your calculations
(i.e., remove the row effects first, then the column effects, then the grand mean,
etc.). Keep the mise en place philosophy in mind here, as it is relatively easy to
make a simple math error, one that will disrupt the inherent symmetry of the
residual patterns.
3. Once you have the residuals from the mean polish, describe what they mean in the
simplest terms possible-be sure to write out your interpretation and then prac-
tice simplifying it still more, if necessary. Finally, plot the mean residuals before
the grand mean is removed so that you can get a real sense of the X-shape of the
interaction, as well as to verify that your calculations were correct. (Keep in mind
that you cannot perform the mean polish and expect to get tidy residuals, as well
as a neat X-plot, if there is no interaction present in the data.)
Figure 12.7 Figure Illustrating Interaction Between Learning Environment and Subject Matter
Once the Two Main Effects Are Removed
Note: The data here are taken from Table 12.8.
[Line graph of the corrected means: test score (y-axis, 0 to 20) by subject matter (x-axis: English, Math); the computer-enhanced and traditional classroom lines cross in an X pattern.]
The decision tree that opens this chapter is designed to help you differentiate among
the various ANOVAs and to choose the one that best fits a research design and its
data. (For continuity, this same tree is reprinted at the opening of chapter 13, the
last chapter devoted to mean comparisons). When, for example, is it appropriate to use
a one-way (chapter 11) or a two-way ANOVA (this chapter's topic) rather than a one-
way repeated-measures ANOVA or a mixed-design ANOVA (topics covered in the next
chapter)? Such decisions should not-indeed, cannot-be taken lightly by researchers;
there is little worse than choosing an incorrect statistic or collecting data that cannot
be analyzed.
Assuming that a two-way ANOVA is identified as appropriate, the second decision
tree will help you to decide which supplementary analyses (if any) should accompany
the main results of an analysis, as well as to pinpoint what additional information
(e.g., tables, graphs, source tables) should also be included in any writeup of research
data. Remember, a good data analyst tries to portray a complete picture of the results-
how results are presented matters as much as what they mean in the context of a theory
or an application. Thus, tables and graphs must accompany the main analyses which,
in turn, are supported by supplementary statistics. The whole "package" of results should
provide readers with a coherent framework within which to think about the implica-
tions of and future directions for the findings.
Summary
1. Two-variable ("two-way") analysis of variance (ANOVA) designs involve two independent variables.
2. Human behavior is complex, and no single research design with an accompanying method of analysis can hope to capture it entirely. Complex ANOVA designs enable investigators to explore multivariable relationships, thereby extending knowledge beyond conventional one-variable studies.
3. Multivariable (factorial) designs present advantages where economy, efficiency, elegance, generality of effects, and interaction between variables are concerned.
4. A two-way ANOVA provides three effects: a main effect (mean difference) due to variable A, a second main effect (mean difference) caused by variable B, and an A X B interaction. An interaction between two variables reveals how a change in one variable influences changes in the other variable.
5. The most common factorial design is the 2 X 2 ("two by two") design, which involves two independent variables with two levels in each one. By multiplying the number of levels in one variable by those in another, a researcher can identify the number of unique experimental conditions. Thus, a 2 X 2 has four conditions (i.e., 2 X 2 = 4) and a 2 X 3 design has six (i.e., 2 X 3 = 6), and so on.
6. The statistical assumptions of the two-way ANOVA are identical with those associated with the traditional one-way ANOVA or the independent groups t test. An additional assumption is that any two-way ANOVA should be fully factorial (all combinations existing between the two independent variables are displayed).
7. Partitioning variability in the two-way ANOVA is analogous to the procedure used for the one-way ANOVA; the chief difference, of course, is that variability for two independent variables and their interaction is explained. Though they include a few more steps, the calculations for the two-factor ANOVA are no more difficult than those associated with the one-factor ANOVA.
8. Once the significant F ratios are identified in a two-way ANOVA, it is advisable to calculate their respective effect sizes (f) as well as the association between the significant independent variables and dependent measure (ω²). Such supplementary information is routinely included in the writeup of the results, along with a table or graph of a study's means.
9. Multifactorial designs (studies where 3 or more independent variables are manipulated) are possible to conduct, but interpreting their results becomes increasingly difficult as their complexity increases.
Key Terms
Factorial design (p. 462) Main effect (p. 466) Residual values (p. 491)
Interaction (p. 467) Mean polish (p. 491) Two-way or two-factor ANOVA (p. 461)
Chapter Problems
1. Conceptually, how does a two-way ANOVA differ from a one-way ANOVA? (Hint: Use examples to illustrate your points.)
2. Why manipulate more than one independent variable? What are the advantages of using a two-way ANOVA design?
3. What is a factorial design? How can you tell how many different levels are present in a given factorial design?
4. How many separate conditions exist in the following designs: 3X~3X~2X2X~2X~2X2X2XL
5. What is a main effect? Define this term and provide an example.
6. What is an interaction? Define this term and provide an example. Why are interactions often graphed?
7. Why are interactions only associated with factorial designs?
8. What is the quickest, simplest method to determine if an interaction is present in a set of data?
9. How many F ratios result from a two-way ANOVA? What is the purpose of each of the Fs?
10. In the context of the two-way ANOVA, what is a residual? How are residuals used for interpreting results from an ANOVA?
11. A developmental psychologist believes that language learning in preschool girls differs from boys, such that the former use more complex sentence constructions earlier than the latter. The researcher believes that a second factor affecting language skills is the presence of older siblings; that is, preschool children with older siblings will generate more complex speech than only children. The researcher carefully records the speech of a classroom of 40 preschool children (20 males, 20 females), half of whom have older siblings. The speech of each child is then given a complexity score in preparation for analysis. What method of analysis should the researcher use? Why? Use the decision trees that open this chapter to answer and to guide your thinking about these questions.
12. Assume that the sentence complexity result(s) from the analysis identified in question 11 were statistically significant. Using prose, describe what the main effects for gender and the presence of older siblings, respectively, might be like. Speculate about the nature of the interaction between gender and the presence of older siblings on children's sentence complexity. What might it reveal? What next step(s) should the investigator take? Use the appropriate decision tree provided at the opening of this chapter to answer these questions.
13. Examine the following data table and indicate the presence of any main effect(s) or interaction:

    Factor B
    5    10
    10   5

14. Examine the following data table and indicate the presence of any main effect(s) or interaction:

    Factor B
    10   10
    10   5

15. Examine the following data table and indicate the presence of any main effect(s) or interaction:

    Factor B
    20   10
    5    20

16. Create your own data table wherein:
    a. There is a main effect for variable A and B, but no interaction.
    b. There is a main effect for variable A but not B, and an interaction.
    c. There is no main effect for A or B, but there is an interaction.
17. The following data were collected in a 2 X 2 between-groups research design. Perform the appropriate analyses and be sure to include a data table, a source table, and, if necessary, a graphical display of the relationship between the variables.

    Factor B
    5, 6, 7, 5, 7, 6       3, 3, 2, 4, 5, 5
    10, 11, 10, 7, 8, 8    5, 6, 7, 7, 7, 8

18. The following data were collected in a 2 X 2 between-groups research design. Perform the appropriate analyses and be sure to include a data table, a source table, and, if necessary, a graphical display of the relationship between the variables.

    Factor B
    7, 7, 8, 7, 7    5, 6, 3, 4, 3
    6, 5, 6, 6, 6    7, 6, 7, 7, 6

19. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 17.
20. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 18.
21. A student taking a cognitive psychology class decides to conduct a memory experiment examining recall for nonsense (e.g., sleemta) versus real (e.g., sleeve) words. Half of the participants in his study learn nonsense words while the other half study real ones. The student also decides to see how the environment where the learning occurs affects recall, so that half the students learn their words in a very warm room (i.e., 82 degrees F) while the remaining students are placed in an ambient room (70 degrees F) during the study. The data (number of correctly recalled words) are shown below. Using the appropriate ANOVA, determine whether word type or learning environment had any discernable effect on recall. Be sure to create any appropriate tables or graphs, calculate supplementary statistics as appropriate, and to write up the results of the analysis in APA style. (Hint: you may find the decision trees appearing at the opening of the chapter to be helpful.)

                             Word Type
    Learning Environment     Nonsense            Real
    Warm room                2, 3, 4, 3, 4, 2    3, 4, 2, 4, 4, 3
    Ambient room             5, 5, 4, 5, 6, 4    7, 8, 4, 7, 6, 8

    Note: There are six participants per condition.

22. Complete the following ANOVA source table and answer the accompanying questions:

    Source            Sum of Squares    df    MS     F
    Between groups
      Factor A        125                     125
      Factor B        225                     225
      A X B           65                      65
    Within groups     1,100
    Total                               51

    a. How many levels are there in factor A? Factor B?
    b. If the cell sizes are equal, how many participants are in each cell?
    c. Did any of the F ratios reach significance?
    d. What are the critical values associated with each of the Fs?
23. Complete the following ANOVA source table and answer the accompanying questions:

    Source            Sum of Squares    df    MS     F
    Between groups
      Factor A        85
      Factor B        93
      A X B           52                1
    Within groups                       36
    Total             570

    a. How many levels are there in factor A? Factor B?
    b. If the cell sizes are equal, how many participants are in each cell?
    c. Did any of the F ratios reach significance?
    d. What are the critical values associated with each of the Fs?
24. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 22.
25. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 23.
26. What is the procedure known as a "mean polish"? Why is it a more useful procedure than simply looking at a data table and guessing the nature of an interaction?
27. Create a 2 X 2 data table that you believe illustrates an interaction between some variables A and B, and perform a mean polish on the table of means using the procedure outlined in the Project Exercise.
28. An experiment has two between-groups factors with two levels in each one, and the data are based on an interval scale. Which type of ANOVA should be used for the analysis? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
29. The following data were collected in a 2 X 3 between-groups research design. Perform the appropriate analyses and be sure to include a data table, a source table, and, if necessary, a graphical display of the relationship between the variables.

          Factor B
    A1    10, 11, 10, 12, 11, 10    8, 8, 7, 9, 8, 7    5, 4, 5, 4, 5, 4
    A2    5, 5, 5, 6, 4, 4          7, 8, 8, 9, 8, 7    11, 10, 9, 12, 11, 10

30. Calculate the effect size(s) and association between the independent variable and dependent measure (if appropriate) for the results from question 29.
Deciding Which ANOVA to Use

[The chapter-opening decision tree survives only as flowchart fragments; its recoverable logic is summarized below.]

1-2. How many independent variables are there? If there is one independent variable, then go to step 5; if there are two independent variables, then go to step 3.
3. If both independent variables are between-groups factors, then perform a two-way ANOVA (see chapter 12); if not, go to step 4.
4. Is one of the variables a repeated-measures factor while the other is a between-groups factor? If yes, then perform a mixed-design ANOVA (consult an appropriate source for guidelines). If no, then the answer to a previous question may be incorrect; go back to step 1 and begin anew.
5. Are there three or more levels in the independent variable? If yes, then go to step 6; if no, then go to step 7.
6. Are the samples or groups represented by a single independent variable independent of one another? If yes, then perform a one-way ANOVA (see chapter 11); if no, they are dependent observations, so perform a repeated-measures ANOVA.
7. Are there at least two levels in the independent variable? If yes, then go to step 8; if no, then go to step 9.
8. Are the two levels independent of one another? If yes, then perform an independent groups t test (see chapter 10); if no, then perform a dependent groups t test (see chapter 10).
9. No ANOVA is appropriate for the analysis; please reevaluate the data and your analysis plan.
Chapter Outline
• One-Factor Repeated-Measures ANOVA
• Statistical Assumptions of the One-Way Repeated-Measures ANOVA
• Hypothesis, Notation, and … for Performing the One-…
• Project Exercise: Repeated-Measures Designs: Awareness of Threats to Validity and Inference
• Looking Forward, Then Back
• Summary
• Key Terms

Any honest researcher or statistician will admit that it is virtually impossible to
control for or rule out every conceivable variable that could influence the results
of a study. Still, researchers and data analysts alike exert reasonable efforts to iden-
tify, reduce, and, ideally, eliminate any obvious influences that could jeopardize a study's
outcome, or at least render its interpretive character murky rather than clear. The pa-
tron saint of this effort to control the research situation turns out to be none other than
Ivan Pavlov, the Nobel prize winning physiologist who performed the original, classic
research on the conditioned reflex (Pavlov, 1927/1960; see also, Pavlov, 1928). As the fa-
ther of classical conditioning, Pavlov holds a revered place in the discipline of psychol-
ogy but, ironically, he held the field (and presumably the other behavioral sciences, as
well) in such low esteem that his assistants were fined or fired for using mentalistic
terms (e.g., mind, thought) while carrying out his research program!
Pavlov examined the salivary reflex of dogs, noting that saliva production initially
associated with the presentation of food (meat powder, actually) could be conditioned
to occur when a bell rang. The food served as the unconditioned stimulus for saliva-
tion, the unconditioned response; later, the ringing sound became the conditioned stim-
ulus for salivation, a (now) conditioned response. Pavlov compared the rate of saliva-
tion to the bell before conditioning (i.e., the control condition) with the rate of salivation
to the bell after the conditioning sessions (i.e., the experimental condition) within a
single subject.
500 Chapter 13 Mean Comparison IV: One-Variable Repeated-Measures Analysis of Variance

Vadum and Rankin (1998) report that Pavlov tackled the problem of uncontrolled
variables by attempting to literally control all aspects of the experimental situation his
participants, the dogs, encountered. Pavlov expended great effort to ensure that the dogs'
experiences were completely under his control, thereby demonstrating the hypothesized
relations between experimental variables and behavioral responses. The laboratory
building housing the dogs was known as the "tower of silence" because it was sound-
proof, a feature designed to eliminate any potential effects from outside (i.e., uncon-
trolled) noise. Aside from those times when an experimental procedure was running,
it was rare for anyone to directly interact with the animals. Instead, the dogs' behavior
was often observed via periscope to prevent any unplanned contacts with their human
caretakers or experimenters. As you can see, Pavlov was somewhat obsessed with the
idea of total control and its possible link to accurate inference about the causes of be-
havior. Vadum and Rankin (1998) note that his obsession for control led to the devel-
opment of the so-called "n = 1" or single-subject designs (for discussion of these
research designs, see Dunn, 1999; Shaughnessy & Zechmeister, 1994).
It is much easier to impose absolute control in animal studies than in the sort typi-
cally involving people, as practical, methodological, and ethical issues abound in hu-
man research. Substitute methods reinforcing the Pavlovian desire for control have
developed, however, not the least of which is the one-variable repeated-measures or
within-subjects research design, the topic of this chapter. This statistical technique tests
for mean differences in participants' responses to the same measure, one that is assessed
multiple times-usually following the presentation of some level of an independent
variable-in the course of a study. The repeated measures ANOVA, then, serves the
same purpose as the dependent groups t test (see chapter 10)-it assesses change in
people's reactions to the same dependent measure across some period of time. Where
the dependent variant of the t test could only explore change between two means mea-
sured at two distinct points in time, however, the repeated-measures ANOVA examines
change in the same measure at more than two points in time.
A researcher might study how and why people's emotional reactions shift depend-
ing on the nature of what they encounter. Thus, how the same group of people react,
feel, or act toward some stimulus-say, an exciting or distressing film-can be tracked
so that distinct changes (if any) can be noted. Such repeated measures enable the re-
searcher to estimate when and if a change occurred, as well as its direction (i.e., posi-
tive, negative, or zero). Repeatedly assessing participants' reactions to varied stimuli pro-
vides us with a more coherent picture of how changes in thought and behavior occur.
As we will see, a researcher can introduce participants to all levels of an independent
variable using a repeated-measures design, a huge savings in time and materials. Instead
of running, say, four independent groups and exposing each to one level of an indepen-
dent variable, one group is presented with all four levels of treatment. Not surprisingly,
repeated-measures designs capitalize on several of the advantages associated with factor-
ial experiments cited in chapter 12. That is, repeated-measures designs are generally:
KEY TERM A one-variable or one-way repeated-measures ANOVA is a statistical test used to identify differ-
ences between or among three or more sample means representing the corresponding (different)
levels of a single independent variable.
In a repeated-measures design, each participant is said to serve as his or her own "control group." The only available source of error, then,
should be that due to unsystematic but uncontrolled experimental error (e.g., environ-
mental variables such as heat, lighting, or room temperature; inconsistent instructions).
As a result, the F ratio of interest in a one-variable repeated-measures design is based
on the effect of the independent variable in question and experimental error only, or:

F between = (variance differences between sample means due to the independent variable + variance differences due to experimental error) / (variance differences due to experimental error).
This F ratio represents a between-groups or treatment factor in that each of its levels
is systematically presented to the same group of participants, and a different sample
mean results from each exposure to a given level. For this reason, we continue to refer
to the statistic as a "between-groups" F ratio but it is a "within-subjects" or repeated-
measures statistic because it determines the presence or absence of mean differences
based on the responses of the same participants across some period of time.
Do make careful note that a repeated-measures design only has experimental
error in the denominator-there is no error attributable to individual differences,
unlike the between-subjects designs reviewed previously. What happens to the error
due to individual differences? This error-which is typically quite low in a repeated-
measures design-is removed separately from the analysis. Typically, the lower error
variance in the denominator of the F ratio will yield a larger statistic (i.e., even modest
between-groups differences based on an independent variable are often significant
when a denominator is small).
How does the one-variable repeated-measures ANOVA separate or partition variability?
With one exception-the inclusion of a new sum of squares term-the variability
components will be familiar to you. The total sum of squares (SStotal) is comprised of:

SStotal = SSbetween + SSsubjects + SSwithin.

Error variance in a repeated-measures ANOVA is based on experimental error
exclusively, as individual difference variation is partialed out of the analysis.

The only new component of variability here is SSsubjects, which statistically represents
the change within the group of participants across the given number of measures. When
a considerable aspect of a study or experiment of interest is devoted to individual
difference issues, then the SSsubjects will be relatively large (Runyon, Haber, Pittenger,
& Coleman, 1996). Research that is not focused on individual differences per se,
however, will generally yield values for SSsubjects that are low in value.
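This partition can be sketched as a small Python helper (an illustration, not from the text): given a participants-by-levels matrix of scores, it returns each sum-of-squares component, using the computational formulas developed later in the chapter.

```python
# Sketch of partitioning variability in a one-way repeated-measures design:
# SStotal = SSbetween + SSsubjects + SSwithin. Rows = participants, columns =
# levels of the repeated-measures factor. (Hypothetical helper, not the text's.)

def partition_ss(data):
    """Return (ss_total, ss_between, ss_subjects, ss_within)."""
    n = len(data)        # number of participants
    k = len(data[0])     # number of levels (repeated measures)
    N = n * k            # total number of observations
    grand_sum = sum(x for row in data for x in row)
    correction = grand_sum ** 2 / N
    ss_total = sum(x ** 2 for row in data for x in row) - correction
    # Between-groups: squared column (level) totals over n, minus correction.
    col_sums = [sum(row[j] for row in data) for j in range(k)]
    ss_between = sum(s ** 2 / n for s in col_sums) - correction
    # Subjects: squared row (participant) totals over k, minus correction.
    row_sums = [sum(row) for row in data]
    ss_subjects = sum(s ** 2 / k for s in row_sums) - correction
    # Residual (experimental) error is what remains of the total.
    ss_within = ss_total - ss_between - ss_subjects
    return ss_total, ss_between, ss_subjects, ss_within
```

Because SSwithin is obtained by subtraction, the four components are additive by construction, which is also the basis of the error checks performed later in the chapter.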
When researchers employ a repeated-measures design, they must necessarily be con-
cerned about the sequencing used to present any stimuli (i.e., levels of the independent
variable) to research participants. If all the participants experienced the same material in
the same order of presentation, an inherent bias or confound might develop (e.g., par-
ticipants might attend more readily to the early items and pay less or no attention to
information coming later in the study). This particular problem and its antidote, coun-
terbalancing, were discussed in the context provided by the dependent groups t test (please
see chapter 10 for a review; but see the Project Exercise at the end of this chapter).
Like the other ANOVA techniques, the one-way repeated-measures ANOVA is based on several statistical assumptions:
• The data collected within each level of the independent variable are independent
of one another (although the dependent [repeated] measures are not independent
of one another-a participant's response to measure 1 is necessarily related to her
response to measure 2, and so on).
• The distribution of the population from which the measures in each level of the
independent variable are drawn is assumed to be normal. Concern over the
normality assumption is only appropriate, however, when the sample of participants
and the number of observations drawn from them are both very small.
• The variances of the populations within each level of the independent variable
are homogeneous (i.e., the homogeneity of variance requirement).
A fifth and somewhat novel assumption exists for the repeated-measures ANOVA:
• The effects of each level of the independent variable on behavior should be
consistent across a study's participants and, as noted previously, no carryover
or other order-related effects should be present in the data (Gravetter &
Wallnau, 1996).
Table 13.1 Hypothetical Data from a Study on Rumination and Therapeutic Meditation

Subject   Baseline Week      Meditation Week 1    Meditation Week 2    Subject Totals
          X        X²        X        X²          X        X²
1         10       100       5        25          4        16           19
2         8        64        5.5      30.25       6        36           19.5
3         10       100       6        36          5        25           21
4         9.5      90.25     7        49          6        36           22.5
5         7        49        5        25          5.5      30.25        17.5
6         11       121       6        36          4        16           21

Measure
totals    Σ X1 = 55.5        Σ X2 = 34.5          Σ X3 = 30.5
          Σ X²1 = 524.25     Σ X²2 = 201.25       Σ X²3 = 159.25
Means     X̄1 = 9.25          X̄2 = 5.75            X̄3 = 5.08

Additional Calculations for the One-Way Repeated-Measures ANOVA:
Σ Xij = 55.5 + 34.5 + 30.5 = 120.5
Σ X²ij = 524.25 + 201.25 + 159.25 = 884.75

Note: Each table entry (X) represents the average (mean) number of ruminative thoughts across a
7-day period.
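The preliminary quantities in Table 13.1 can be reproduced directly from the raw scores; a short sketch:

```python
# The entries of Table 13.1, keyed by measurement period; each list holds the
# six participants' mean daily rumination counts for that week.
weeks = {
    "baseline": [10, 8, 10, 9.5, 7, 11],
    "meditation_1": [5, 5.5, 6, 7, 5, 6],
    "meditation_2": [4, 6, 5, 6, 5.5, 4],
}

# Column (measure) totals, squared-score totals, and means.
measure_totals = {w: sum(xs) for w, xs in weeks.items()}
squared_totals = {w: sum(x ** 2 for x in xs) for w, xs in weeks.items()}
means = {w: sum(xs) / len(xs) for w, xs in weeks.items()}

# Row (subject) totals: each participant's scores summed across the 3 weeks.
subject_totals = [sum(scores) for scores in zip(*weeks.values())]

grand_sum = sum(measure_totals.values())          # Σ Xij = 120.5
grand_sum_squares = sum(squared_totals.values())  # Σ X²ij = 884.75
```

Every value needed for the source-table calculations that follow (the measure totals, subject totals, 120.5, and 884.75) falls out of these few lines.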
We will again follow the basic steps for testing a hypothesis laid out in Table 11.4.
Step 1 entails the identification of the null and the alternative hypotheses. The null
hypothesis posits that no differences exist among the means representing participant
responses to the different levels of an independent variable. In the context of our
example, the null hypothesis would be:

H0: μ1 = μ2 = μ3.
The most straightforward alternative hypothesis involves assuming that some mean dif-
ference between the repeated measures will be found, or:

H1: There will be at least one mean different from another mean.
The second step in the process of testing a hypothesis involves choosing a significance level
for rejecting the null hypothesis. Following precedent, we will rely on a p value of .05.
We begin the actual analyses of the data here in step 3. As we proceed with the
analyses, we will complete a source table for the one-way repeated-measures ANOVA
(see Table 13.2 later in this section). This statistical test requires far fewer calculations
than the two-way ANOVA you learned in chapter 12, but still more than the one-way
ANOVA introduced in chapter 11. Please be certain that you always know the origin
of any entries within a given formula (hint:
following along by comparing the numbers and answers shown in the formula with the
numbers in Tables 13.1 and 13.2, respectively, will help in this regard). If you ever be-
come confused or uncertain, please stop and review the appropriate portions of the text
before proceeding-a moment's delay is a small price to pay where accurate under-
standing is concerned! Moreover, your careful review now will pay dividends in the fu-
ture, when homework or review for an exam is necessary.
As always, we begin by calculating the total sum of squares (SStotal):

[13.1.1] SStotal = Σ X²ij - (Σ Xij)²/N,
One-Factor Repeated-Measures ANOVA
where Σ X²ij is based on squaring and then summing all of the individual observations
shown in Table 13.1 (please notice that this particular calculation is shown there un-
der the heading "Additional Calculations for the One-Way Repeated-Measures ANOVA"
at the bottom of the table). The error or correction term-(Σ Xij)²/N-is based on sum-
ming all of the observations, squaring the sum (this calculation is also shown at the
bottom of Table 13.1), and then dividing the result by N. Here is an impor-
tant note: In any repeated-measures ANOVA, N refers to the total number of observa-
tions available, rather than the number of participants (see the 18 entries in the mid-
dle of Table 13.1); n is used to denote the number of participants in this kind of research
design (there are six here-see the leftmost column in Table 13.1).
We can now enter the numbers from Table 13.1 and calculate SStotal:

[13.1.2] SStotal = 884.75 - (120.5)²/18,
[13.1.3] SStotal = 884.75 - 14,520.25/18,
[13.1.4] SStotal = 884.75 - 806.68,
[13.1.5] SStotal = 78.07.

Avoid confusion: In a repeated-measures ANOVA, N refers to the total number of
observations present in a data set-n designates the number of participants who took
part in a study.
Once the value of the SStotal is entered at the bottom of column 2 in Table 13.2,
its accompanying degrees of freedom are calculated:
[13.2.1] dftotal = N - 1.
Again, do not forget that N refers to the number of available observations and not the
number of participants in the study:
[13.2.2] dftotal = 18 - 1,
[13.2.3] dftotal = 17.
The dftotal is then entered at the bottom of column 3 in Table 13.2.
Our next goal is to calculate the between-groups variance estimate. To obtain this
value, we first need to calculate the between-groups sum of squares and its degrees of
freedom. The SSbetween is known by:
[13.3.1] SSbetween = Σ [(Σ Xj)²/n] - (Σ Xij)²/N.
The first half of the formula (a) involves summing the observations within each of the
treatment periods (i.e., the 3 weeks; Σ Xj), squaring the 3 sums separately ((Σ Xj)²),
and then dividing each product by the number of observations within its treatment
(here, n = 6)-the resulting 3 values are then added together. The second half of the
formula (b) is the correction term, whose value was determined when we calculated
the SStotal (the value-806.68-is first shown above in [13.1.4]). All of the numerical
information we need for this calculation can be obtained from the preliminary work
done in Table 13.1. Thus,

[13.3.2] SSbetween = (55.5)²/6 + (34.5)²/6 + (30.5)²/6 - 806.68,
[13.3.3] SSbetween = 513.38 + 198.38 + 155.04 - 806.68,
[13.3.4] SSbetween = 866.80 - 806.68,
[13.3.5] SSbetween = 60.12.
The SSbetween can then be entered into the first row under column 2 in Table 13.2.
The degrees of freedom corresponding to the SSbetween are equal to:
[13.4.1] dfbetween = k - 1,

where k is the number of levels of the independent variable (here, the 3 weeks).

[13.4.2] dfbetween = 3 - 1,
[13.4.3] dfbetween = 2.

The value of the dfbetween can be recorded under column 3 in the first row of Table 13.2.
Given that we now know the SSbetween and the dfbetween, we can determine the
MSbetween using:

[13.5.1] MSbetween = SSbetween/dfbetween,
[13.5.2] MSbetween = 60.12/2,
[13.5.3] MSbetween = 30.06.

The resulting MSbetween is entered into the first row under column 4 of Table 13.2.
We now need the sum of squares for the subjects factor, SSsubjects:

[13.6.1] SSsubjects = Σ [(Σ Xi)²/k] - (Σ Xij)²/N.

The second part of the formula (b) is simply the correction factor used in the previous
calculations. In contrast, the first portion of this formula can seem to be unfamiliar un-
til you realize that it merely entails squaring and then summing each of the "subject to-
tals" (i.e., (Σ Xi)²) shown in the far right side of Table 13.1. Once these values are squared
and summed, the resulting sum is divided by the number of levels in the indepen-
dent variable (k), which is equal to 3 in the present analysis, or:

[13.6.2] SSsubjects = [(19)² + (19.5)² + (21)² + (22.5)² + (17.5)² + (21)²]/3 - 806.68,
[13.6.3] SSsubjects = 2,435.75/3 - 806.68,
[13.6.4] SSsubjects = 811.92 - 806.68,
[13.6.5] SSsubjects = 5.24.
As you can see, there is very little variation in this study that is linked with individual
differences. The value of SSsubjects is entered into row 2 under column 2 of Table 13.2.
The dfsubjects is calculated by the following formula before its value is placed in row
2 under column 3 of Table 13.2:
[13.7.1] dfsubjects = n - 1.

There are six participants, so:

[13.7.2] dfsubjects = 6 - 1,
[13.7.3] dfsubjects = 5.
The value of MSsubjects can then be readily determined using:

[13.8.1] MSsubjects = SSsubjects/dfsubjects,
[13.8.2] MSsubjects = 5.24/5,
[13.8.3] MSsubjects = 1.05.
The MSsubjects is then recorded in row 2 under column 4 in Table 13.2.
The final set of calculations for the ANOVA source table deal with within-groups
variability. We begin by calculating the SSwithin using:
[13.9.1] SSwithin = Σ X²ij - Σ (Σ Xi)²/k - Σ (Σ Xj)²/n + (Σ Xij)²/N.
There are a few more numbers in this formula than the others, but all of the values are
already available, either from prior calculations or Table 13.1. The first part of the for-
mula (a) is readily found in Table 13.1 (i.e., Σ X²ij = 884.75), for example, and the sec-
ond part (b) was just determined in the previous set of calculations (Σ (Σ Xi)²/k = 811.92;
see [13.6.5]). The third entry (c) can be taken from the between-groups sum of squares
calculation (Σ (Σ Xj)²/n = 866.80; see [13.3.5]) and the last entry (d), of course, is the
correction term, which was originally determined in [13.1.4]. Entering the appropriate
values we find:
[13.9.2] SSwithin = 884.75 - 811.92 - 866.80 + 806.68.
Do not be put off by all these numbers-just perform the addition or subtraction as
appropriate to find SSwithin:
[13.9.3] SSwithin = 12.71.
This value is then placed into the third row of column 2 in Table 13.2.
We then calculate the degrees of freedom for the within-groups variance estimate:
[13.10.1] dfwithin = (n - 1)(k - 1),
[13.10.2] dfwithin = (6 - 1)(3 - 1),
[13.10.3] dfwithin = (5)(2),
[13.10.4] dfwithin = 10.

The dfwithin is then recorded in the third row of column 3 in Table 13.2.
The variance estimate within-groups is determined by:

[13.11.1] MSwithin = SSwithin/dfwithin,
[13.11.2] MSwithin = 12.71/10,
[13.11.3] MSwithin = 1.27.
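The entire source table can be verified in a few lines of Python. Carrying full precision yields values within rounding error of the hand calculations above, which round intermediate quantities to two decimal places.

```python
# Verifying the source-table entries for the rumination data (Table 13.2).
data = [  # rows = participants; columns = baseline, meditation week 1, week 2
    [10, 5, 4], [8, 5.5, 6], [10, 6, 5],
    [9.5, 7, 6], [7, 5, 5.5], [11, 6, 4],
]
n, k = len(data), len(data[0])
N = n * k                                       # 18 observations in all
correction = sum(x for row in data for x in row) ** 2 / N
ss_total = sum(x ** 2 for row in data for x in row) - correction
ss_between = sum(sum(row[j] for row in data) ** 2 / n for j in range(k)) - correction
ss_subjects = sum(sum(row) ** 2 / k for row in data) - correction
ss_within = ss_total - ss_between - ss_subjects

df_between, df_subjects, df_within = k - 1, n - 1, (n - 1) * (k - 1)
ms_between = ss_between / df_between            # ≈ 30.06
ms_subjects = ss_subjects / df_subjects         # ≈ 1.05
ms_within = ss_within / df_within               # ≈ 1.27
f_between = ms_between / ms_within              # ≈ 23.6 (23.67 in the text)
f_subjects = ms_subjects / ms_within            # ≈ 0.82 (.827 in the text)
```

The small discrepancies in the final F ratios reflect nothing more than the two-decimal rounding used in the worked example.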
Cell Size Matters, But Keep the Cell Sizes Equal, Too
Throughout this book, the importance of obtaining an adequately sized sample is stressed again
and again. Larger samples are much more likely to approximate or even closely characterize
populations than smaller ones. This truth is evident whether one is performing a t test or one of
the various types of ANOVAs we have learned about in this and the previous two chapters.
So, size matters, but should we also worry about whether each cell in a simple or complex
research design has an equivalent number of observations in it? In a word, yes. A one-way ANOVA
can have differing numbers of participants in each level of the independent variable with no ill
effects where the results are concerned, but a two-way ANOVA is another story. Performing a fac-
torial ANOVA with unequal numbers of observations in each cell is a problematic but, unfortu-
nately, rather common practice; conducting an actual research project from start to finish is not
always a tidy enterprise (Dunn, 1999). Thus, obtaining all the participants you need is a desir-
able goal, but often a difficult one to realize.
You will notice that most of this book's hypothetical examples are based on equal sample
sizes within cells. As a writer and data analyst, I did not make this choice simply for
convenience or to create order in the data-I also wanted to illustrate the methodological
importance of what otherwise might be seen as a small detail. Why does equivalent cell
size matter? The explanation is rather technical, but the simple fact of the matter is that
when cell sizes are unequal, the results one obtains are generally somewhat distorted
(e.g., Aron & Aron, 1999).
What is a conscientious researcher to do where unequal cell sizes are concerned? Here are a
few possibilities.
1. Never work with unequal cell sizes! Do your best to balance the number of participants in
each cell in a factorial design. Make certain that all of the cells in your 2 X 2 design, for ex-
ample, have equal membership and representation.
2. Drop extra data or collect some more. Some researchers use a random selection procedure
to remove the "extra" data from a cell in order to keep its size equivalent to the others in the
research design. This procedure seems to be a bit drastic and even wasteful-why would one
want to literally throw data away (and statistical power in the process)? A better idea might
be to recruit a few more participants in order to "fill up" those cells that have lower mem-
bership than the others.
3. Learn a new ANOVA technique. Finally, there is a regression-based statistical procedure called
least squares analysis of variance. This computer-based data analysis procedure effectively
equalizes each cell's respective influence on any main effects and interactions (Aron & Aron,
1999). Many software packages rely on this procedure-indeed, Aron and Aron note that
when cell sizes are unequal, it may automatically be used in lieu of the more conventional
ANOVA procedures. If you use statistical software, be sure to consult a manual to learn how
unequal cell sizes are handled.
This value, too, is entered into Table 13.2 (see row 3, column 4).
Before we calculate the F ratio indicating whether any difference( s) existed between
the average number of ruminative thoughts across the 3 weeks, we must be certain that
no errors exist in the calculations we just performed. Here are two quick error checks
to verify what you have already done:
• Using information from Table 13.2, check whether the individual sum of squares
estimates add up to the SStotal:

SStotal = SSbetween + SSsubjects + SSwithin,
78.07 = 60.12 + 5.24 + 12.71,
78.07 = 78.07.

• Similarly, check whether the individual degrees of freedom sum to the dftotal:

dftotal = dfbetween + dfsubjects + dfwithin,
17 = 2 + 5 + 10.
Table 13.2 ANOVA Source Table for Rumination and Therapeutic Meditation Study

Source            Sum of Squares   df   Mean Square   F       p
Between-groups    60.12            2    30.06         23.67   .05
Subjects          5.24             5    1.05          .827
Within-groups     12.71            10   1.27
Total             78.07            17

Is Fcalculated (2, 10) = 23.67 ≥ Fcritical (2, 10) = 4.10? Yes: Reject H0.
Thus, we can reject the null hypothesis of no difference, indicating that at least one
mean difference exists among the means representing daily ruminative thoughts across
the 3-week period. Please note that the F ratio for this one-way repeated-measures
ANOVA is an omnibus F because there are more than two groups (means) present. If
there were only two means present, the nature of the difference would be apparent
(though it is likely that we would have used a dependent groups t test in lieu of the
repeated-measures ANOVA; see chapter 10). Given that we have a "big, dumb F," our
only recourse is to conduct a post hoc comparison before we can properly complete
step 4 of the analysis, the interpretation of the results.
Before we perform this post hoc comparison, however, one other calculation re-
mains involving Table 13.2. Despite the fact that we are only interested in the presence
of differences between the means representing the 3 weekly averages-the between-
groups factor-we should also calculate the F ratio linked to the subjects factor. As I
will show you later on, this other F ratio plays a particular role in a supplementary mea-
sure. Calculating the F for the subjects is done the same way as any other F ratio:

[13.13.1] Fsubjects = MSsubjects/MSwithin.

The necessary information is drawn from Table 13.2 for:

[13.13.2] Fsubjects = 1.05/1.27,
[13.13.3] Fsubjects = .827.
Table 13.3 Pairwise Comparisons Between All Means Using the Tukey HSD Test

                             Baseline Week   Meditation Week 1   Meditation Week 2
Average Daily Ruminations    9.25            5.75                5.08

Two of the three differences between the means are significant: More ruminations
occurred daily during the baseline week (X̄1 = 9.25)-before the meditation therapy-than
in either the first (X̄2 = 5.75) or the second week (X̄3 = 5.08) after its introduction (see
Table 13.3). There was no difference, however, between the number of daily ruminative
thoughts during the two weeks after the therapy was taught to the six research
participants (see Table 13.3).
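The pairwise checks behind this conclusion can be sketched as follows. The critical value of the Studentized range statistic, q(3, 10) = 3.88 at α = .05, is taken from a standard table and is an assumption here, since the text's HSD calculation falls outside this excerpt; the three means come from Table 13.1.

```python
import math

# Tukey HSD sketch: a pairwise mean difference is significant when its
# absolute value meets or exceeds HSD = q * sqrt(MSwithin / n).
ms_within, n_per_level = 1.27, 6
q_crit = 3.88  # q(k = 3, dfwithin = 10) at alpha = .05 (assumed table value)
hsd = q_crit * math.sqrt(ms_within / n_per_level)

means = {"baseline": 9.25, "week_1": 5.75, "week_2": 5.08}
pairs = [("baseline", "week_1"), ("baseline", "week_2"), ("week_1", "week_2")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff >= hsd else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.2f} ({verdict})")
```

The two baseline comparisons clear the HSD criterion while the week 1 versus week 2 comparison does not, matching the pattern reported above.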
As was true for the other variations of the ANOVA, we can compute the usual sup-
plementary indices-effect size f and ω̂²-to provide more meaningful context for the
results. Once these two indices are known, we can tie everything together into a writ-
ten summary for this one-way repeated-measures ANOVA.

Remember that an absolute difference between two means must be greater than or
equal to the HSD value in order for a difference to be significant.

Effect Size and the Degree of Association Between the Independent
Variable and Dependent Measure

The effect size for the introduction of meditation therapy to combat ruminative
thoughts can be known by using formulas [11.21.1] and [11.22.1], which are renum-
bered here for our present purposes:
[13.17.3] ω̂² = 2(22.67) / [2(22.67) + 4.135 + 10 + 1],
[13.17.4] ω̂² = 45.34 / (45.34 + 15.14),
[13.17.5] ω̂² = 45.34 / 60.475,
[13.17.6] ω̂² = 0.7497 = 0.75.
Thus, approximately 75% of the difference between the 3 ruminative means can be
attributed to the effects of the meditation therapy.
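The arithmetic in [13.17.3] through [13.17.6] can be reproduced directly. Reading the printed numbers, the pieces appear to be dfbetween(Fbetween - 1) in the numerator, plus dfsubjects·Fsubjects, dfwithin, and 1 in the denominator; this decomposition is an inference from the worked values, so consult the formula as given earlier in the text for the authoritative form.

```python
# Reproducing the omega-squared arithmetic from the worked example.
# (Term labels inferred from the printed numbers, as noted above.)
df_between, df_subjects, df_within = 2, 5, 10
f_between, f_subjects = 23.67, 0.827

numerator = df_between * (f_between - 1)        # 2(22.67) = 45.34
denominator = numerator + df_subjects * f_subjects + df_within + 1
omega_sq = numerator / denominator
print(round(omega_sq, 2))  # 0.75
```

Note that this is where the Fsubjects ratio computed earlier earns its keep: it enters the denominator of the supplementary measure.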
The greatest number of ruminative thoughts occurred during the baseline period, prior
to the introduction of the meditation therapy for the second and third weeks of the study.
Indeed, participants reported experiencing far less anxiety-based cognition on a daily
basis during weeks 2 and 3 (see Table 1).
A one-way repeated-measures analysis of variance (ANOVA) revealed a significant
decrease in the number of ruminative thoughts following the introduction of medi-
tative therapy, F (2, 10) = 23.67, p < .01. Post hoc comparisons of means based on
Tukey's HSD test indicated that participants reported a higher number of daily rumi-
native thoughts in the baseline period than in either of the subsequent weeks, where
lower but equivalent numbers of ruminations were reported (see Table 1).
Based on this clinical sample, the effects of meditation therapy show great promise.
The effect size for the therapeutic intervention is extremely large (f = 1.83), and there
was a high degree of association between the independent variable and the dependent
(repeated) measures of rumination (ω̂² = 0.75).
Note: Means sharing the same subscripts are significantly different from each other at the .05
level.
Knowledge Base
1. Why is it sometimes said that a participant in a repeated-measures design serves as
his or her own control group?
2. Complete the following ANOVA source table for a one-way repeated-measures
design:

Source            Sum of Squares   df   Mean Square   F
Between-groups    60                    20
Subjects                           9
Within-groups     30
Total             130              39
Answers
1. Because each participant appears in every level (k) of an independent variable, the variabil-
ity attributable to that person is apt to be low, and it can be statistically removed from the
study's random error. The remaining error will generally be small because it is due exclu-
sively to experimental error, not individual differences and experimental error. As a denom-
inator, the experimental error will be small, leading-through division into the numerator
of the between-groups variability-to a relatively large F ratio.
2.
Source            Sum of Squares   df   Mean Square   F
Between-groups    60               3    20            18.02
Subjects          40               9    4.44          3.996
Within-groups     30               27   1.11
Total             130              39
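The missing entries follow mechanically from the additivity of the sums of squares and degrees of freedom; a sketch of the bookkeeping:

```python
# Recovering the blanks in the Knowledge Base source table from the given
# entries, using SS and df additivity plus MS = SS/df.
ss_total, df_total = 130, 39
ss_between, ms_between = 60, 20
df_subjects, ss_within = 9, 30

df_between = ss_between // ms_between            # df = SS/MS = 60/20 = 3
df_within = df_total - df_between - df_subjects  # 39 - 3 - 9 = 27
ss_subjects = ss_total - ss_between - ss_within  # 130 - 60 - 30 = 40
ms_subjects = ss_subjects / df_subjects          # 40/9 ≈ 4.44
ms_within = ss_within / df_within                # 30/27 ≈ 1.11
f_between = ms_between / ms_within               # exactly 18.0
```

At full precision the F ratio is 18.0; the answer table's 18.02 reflects dividing by the rounded MSwithin of 1.11.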
Order of Treatment
Group 1   A   C   B
Group 2   B   A   C
Group 3   C   B   A
As you can see, each of the three treatments-A, B, and C-appears once in each of the three
order positions, yielding three distinct treatment orders. By the way, the term "group" can refer to
an actual group comprised of several people or it can denote only one person who receives a par-
ticular treatment combination. Although this form of randomization eliminates systematic bias,
the measurement of error and the calculation of statistical tests for within- and between-groups
differences are more advanced than those presented in this book. Guidance on creating or se-
lecting particular Latin square designs, as well as advice on analyzing their data, can be found in
Kirk (1982) and Winer, Brown, and Michels (1991).
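A Latin square like the one above can also be built programmatically. The cyclic construction below is one simple illustration (it produces a valid square, though not necessarily the particular arrangement printed above): each treatment appears exactly once in every row and every order position.

```python
# Minimal cyclic construction of a k x k Latin square: shift the treatment
# list by one position for each successive row/group.
def latin_square(treatments):
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)]
            for row in range(k)]

for order in latin_square(["A", "B", "C"]):
    print(" ".join(order))
```

For serious designs, purpose-built tables of Latin squares (e.g., those in Kirk, 1982) are preferable, since some applications require squares chosen at random from the full set rather than a fixed cyclic one.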
Wolraich and colleagues (1994) used a three-treatment Latin square like the one shown above
to examine the effect of sugar intake on children's hyperactivity. Many parents and educators
believe that some children are highly reactive to sugar, so that aggressive behavior, attentional
problems, and poor academic performance can be attributed to a sugar-laden diet. To verify
the accuracy of this perception, 48 children and their families were assigned to one of three
groups. The groups followed three different diets (A, B, & C; see above Latin square design), each
for a period of three weeks. Here is a description of the diets:
Diet A-high in sugar but no artificial sweeteners (e.g., aspartame, saccharin)
Diet B-low in sugar; aspartame used as a sweetener
Diet C-low in sugar; saccharin used as a sweetener
Across the 9 weeks of the study, the children completed various tests and measures weekly,
and their parents, teachers, and the researchers rated their behavior for evidence of hyperactiv-
ity and aggression. What did the investigators find by using this Latin square design? The chil-
dren's behavior was not linked to diet or sugar intake--observers (parents, teachers) believe in a
link between sugar and hyperactivity that is more apparent than real. The origin of this
widespread fallacy is an intriguing question for future research efforts.
Mixed Design AN OVA: A Brief Conceptual Overview of Between-Within Research Designs 515
• No drink group (participants are not given anything to drink before the memory
trials)
• Placebo group (participants are given decaffeinated coffee and told that it is
caffeinated)
• Alternate beverage group (participants are given water to drink)
What about the repeated-measures factor? The repeated-measures factor in a
mixed design is usually, but not always, temporal (e.g., measurement across hours,
days, weeks, months, number of trials). The investigator has all of the participants
complete four separate memory tasks designed to examine their ability to retain stim-
ulus items in short- and long-term memory. Participants might learn a word list, for
example, and after a fixed time interval, be tested to see how many items they recalled.
To control for any order effects, the order of the 4 memory tasks is counterbalanced
across the 32 participants (see Dunn, 1999; Shaughnessy & Zechmeister, 1994; for coun-
terbalancing recommendations).
The advantage of a mixed design ANOVA is that it can compare mean recall for
items on each of the four memory tasks between coffee drinkers and noncoffee
drinkers (factor A), between the four types of caffeine administration (factor B), and
across the four different memory tasks (factor C). This is a 2 X 4 (between-groups
factors) X 4 (repeated-measures factor) design, and you are no doubt already
using knowledge gained in this and the last chapter to identify what particular effects
emerge from the analysis. In terms of specific results, this mixed design ANOVA would
reveal three main effects-one each for coffee drinking status, caffeine administra-
tion, and recall, as well as the following interactions: A X B, B X C, A X C, and a
triple interaction-A X B X C. Explaining the interpretation of these main effects
and interactions-even in the context of a hypothetical example-is a somewhat
complex exercise. I offer this closing example to chapter 13 as a teaser, an invitation,
really, for you to pursue learning advanced statistical techniques like the mixed de-
sign ANOVA in future classes.
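The inventory of effects just listed follows mechanically from the factor list: one main effect per factor plus every two- and three-way combination. A short sketch (factor labels borrowed from the example above) enumerates them:

```python
from itertools import combinations

# Enumerate the testable effects in the 2 x 4 x 4 mixed design: three main
# effects, three two-way interactions, and one three-way interaction.
factors = ["A (coffee status)", "B (caffeine administration)", "C (memory task)"]
effects = [" x ".join(combo)
           for r in range(1, len(factors) + 1)
           for combo in combinations(factors, r)]
for effect in effects:
    print(effect)
```

Seven effects in all, which is why interpreting a three-factor mixed design is described above as a somewhat complex exercise.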
In a now classic work on experimental design, Campbell and Stanley (1963) identified
several major threats to the internal validity-the unambiguous influence of an inde-
pendent variable on a dependent measure (see chapter 2)-of repeated-measures re-
search designs. By use of the term "threats," the authors were referring to possible rival
explanations for the observed outcomes of experiments. In other words, a researcher
might assume that some obtained mean difference based on a repeated measures
ANOVA was reliable when, in fact, it might have resulted from some unknown-and
uncontrolled-factor other than the intended independent variable.
This Project Exercise involves examining repeated-measures research designs that
employ (or could do so) a one-variable repeated-measures ANOVA, and then identify-
ing any potential threats to accurate inference where internal and external validity are
concerned (see chapter 2 for a review of validity issues). According to Campbell and
Stanley (1963), the four threats that are apt to occur when a repeated-measures design
is employed include:
Testing. After completing any sort of test, the original testing affects the scores an
individual obtains at any subsequent testing (e.g., standardized test scores often im-
prove across time as respondents become familiar with the question format, time de-
mands, and the like).
Summary
1. A one-way repeated-measures ANOVA is often referred to as a within-subjects ANOVA, as the same group of participants is exposed to all the levels of a single independent variable. This statistical test is a close relative of the dependent groups t test-the difference between the two tests is that the latter can only examine the difference between two means.
2. There is a distinct statistical advantage to a repeated-measures ANOVA: Because each participant serves as his or her own control group, the variability in the denominator of the F ratio is exclusively due to experimental error-variability due to individual differences is partialed out separately in the analysis.
3. The statistical assumptions of the one-way repeated-measures ANOVA differ from those associated with other ANOVA techniques in one respect only-it is assumed that the effects of each independent variable are consistent across the study's participants and that no carryover or other order-related effects are present.
4. Partitioning variability in the one-way repeated-measures ANOVA is analogous to the procedure associated with the other ANOVA techniques; however, the variability due to the subjects (MSsubjects) is kept separate and distinct from that associated with the independent variable (MSbetween). This separation leads to a smaller denominator, which frequently enhances the likelihood of finding a significant F ratio associated with the repeated-measures factor.
5. When the F ratio for a repeated-measures factor is found to be significant, some post hoc test-such as Tukey's HSD test-must be employed to locate differences between means when more than two are present. Although the F ratio for the "subjects" factor is calculated, it is used to determine the value of a supplementary measure and not as a result per se.
6. Supplementary measures (e.g., effect size f and ω̂²) should be included in any write-up of one-way repeated-measures results, along with a table illustrating how the means representing the different levels of the independent variable changed (if at all) across time.
7. A mixed design is one comprised of at least one between-groups factor and one repeated-measures factor, and data from it are analyzed by the mixed design ANOVA. Calculation of this test is beyond the scope of this chapter, though it is important to be aware that a main effect for each variable, as well as an interaction between all variable pairs, results.
Key Terms

Mixed design (p. 515)
Mixed design ANOVA (p. 515)
One-variable or one-way repeated-measures ANOVA (p. 501)
Chapter Problems
1. How does a one-way repeated-measures ANOVA differ from a one-way ANOVA? (Hint: Use examples to illustrate your points.)
2. Why measure a dependent variable more than once? What are the advantages of using a repeated-measures ANOVA design?
3. Why would a data analyst elect to use the dependent groups t test instead of a one-variable repeated-measures ANOVA? Illustrate your points through an example(s).
4. A one-variable repeated-measures ANOVA is conducted on data collected from eight participants whose responses were measured at four different points in time. What is the value of the degrees of freedom for the F ratio?
5. A one-variable repeated-measures ANOVA is conducted on data collected from 10 participants whose responses were measured at three different points in time. What is the value of the degrees of freedom for the F ratio?
6. An educational psychologist is interested in how students' moods vary across a given school day. The researcher is interested in identifying stable mood patterns among middle school students, as such patterns may reveal the "best" or "worst" time for teaching and learning particular academic topics. Twenty students complete a mood scale at six intervals during a typical day (i.e., about every 2 hours). The measures are administered at home (when they wake up and just before bed), as well as in school. What type of analysis should the educational psychologist use to analyze the data from the mood scales? Why? Use the decision trees that open this chapter to answer and to guide your thinking about these questions.
7. Assume that the result(s) of the analysis proposed in question 6 reached significance. What is the next step(s) the investigator should take? Use the appropriate decision tree provided at the opening of this chapter to answer this question.
8. An investigator performs a one-variable repeated-measures ANOVA and finds an F ratio with 4 and 36 degrees of freedom. How many participants took part in the study? How many times was the dependent measure administered?
9. An investigator conducts a study where 4 dependent measures are administered to 10 participants. The investigator analyzes the data using a one-variable repeated-measures ANOVA. What are the degrees of freedom associated with the resulting F ratio for the repeated-measures variable?
10. A guidance counselor is interested in how student motivation changes across time. She tracks six students from 10th through 12th grade in order to document any change in their academic motivation. To do so, the students complete the Student Academic Motivation (SAM) scale during the second week of school in each of the three grades (higher scores indicate greater scholastic motivation; the data are provided below). Calculate the average SAM scores for the three years and then use the appropriate ANOVA to determine whether (and how) the scores changed across the 3-year period (use a significance level of .05). If any significant differences are observed, perform the required post hoc tests and supplementary statistics. (Hint: You may find the decision trees appearing at the opening of the chapter to be helpful.) Be sure to write up the results of your analysis in APA style.

SAM Scores

Students   10th grade   11th grade   12th grade
1          10           6            11
2          9            8            12
3          8            6            10
4          10           5            9
5          9            6            10
6          8            8            9
520 Chapter 13 Mean Comparison IV: One-Variable Repeated-Measures Analysis of Variance
11. Are there any advantages that repeated-measures designs have over between-groups designs?
12. How does the calculation of a repeated-measures ANOVA's error term (i.e., the denominator in the F ratio) differ from the one associated with a between-groups ANOVA?
13. The following data were collected in a repeated-measures design. Analyze these data using the appropriate test and explain whether (and where) any significant differences lie (use a significance level of .05):

Participants   1   2   3   4
A              2   4   5   7
B              2   4   5
C              2   5   3   6
D              4   3   6

14. The following data were collected in a repeated-measures design. Analyze these data using the appropriate test and explain whether (and where) any significant differences lie (use a significance level of .05):

Sessions
Participants   1   2   3   4
A              8   7   5   2
B              9   8   4   2
C              8   7   4   1
D              8   6   5   3
E              7   4   4   2

15. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 13.
16. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 14.
17. The following source table represents the results of an experiment, the data from which were analyzed by a one-way repeated-measures ANOVA. Assume that there were 10 participants in the study and then complete the source table's entries.
18. The following source table represents the results of an experiment, the data from which were analyzed by a one-way repeated-measures ANOVA. Assume that there were six participants in the study and then complete the source table's entries.

Source           Sum of Squares   df   MS   F
Between-groups         48          3
Subjects
Within-groups          35
Treatments
Total                  90

19. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 17.
20. Calculate the effect size and association between the independent variable and dependent measure (if appropriate) for the results presented in question 18.
21. An industrial organizational psychologist examines the productivity scores of a trainee group at a manufacturing plant. The trainees' performance is assessed weekly for the first three weeks of their employment. The psychologist wants to learn whether there are any significant changes in performance across the training period. Here are the data:

Trainee   Week 1   Week 2   Week 3
A            5        8        9
B            4        7       10
C            2        4        8
D            6        9       10

Calculate an ANOVA to determine whether any differences in productivity occur across the training period (use a significance level of .05 in your analysis). If appropriate, determine effect size as well as the degree of association between the independent variable and the dependent measures.
22. A coach examines the hours per week her players devote to weight training. She wants to see whether the time devoted to weight training increases steadily over a period of one month. Here are the data:

Calculate an ANOVA to determine whether any differences in training time occur across the month (use a significance level of .05 in your analysis). If appropriate, determine effect size as well as the degree of association between the independent variable and the dependent measures.
23. Under what sort of circumstances would an investigator elect to use a mixed design ANOVA? How does a mixed design ANOVA differ from the other forms of ANOVA reviewed in this and the previous chapter?
24. A research project on weight loss and health follows a group of dieters for one year. During the year, their weight is measured monthly, so each participant has 12 recorded weights. What sort of analysis should be performed on these weight scores in order to determine whether dieting led to weight loss across the year? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
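Several of the problems above (4, 5, 8, and 9) turn on the degrees of freedom for a one-way repeated-measures F ratio. A minimal sketch for checking such answers, assuming the standard partitioning for this design (df for the treatment is k − 1 and df for the error term is (k − 1)(n − 1), where k is the number of measurement occasions and n the number of participants); the function name is illustrative, not the book's:

```python
def repeated_measures_df(n_participants, n_measurements):
    """Degrees of freedom for a one-way repeated-measures F ratio.

    Assumes df_treatment = k - 1 and df_error = (k - 1)(n - 1),
    the standard partitioning for this design.
    """
    df_treatment = n_measurements - 1
    df_error = (n_measurements - 1) * (n_participants - 1)
    return df_treatment, df_error

# Problem 4: eight participants measured at four points in time.
print(repeated_measures_df(8, 4))   # (3, 21)
# Problem 5: 10 participants measured at three points in time.
print(repeated_measures_df(10, 3))  # (2, 18)
```

Running the function in reverse also checks problem 8: an F ratio with 4 and 36 degrees of freedom implies five measurement occasions (k − 1 = 4) and ten participants ((k − 1)(n − 1) = 36).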
[Decision tree: Choosing Which Nonparametric Test to Use for Data Analysis (e.g., "Are the data ordinal? If no, then go to step 10.")]
[Decision tree: Selecting a Supporting (Strength of Association) Statistic for the Chi-Square (χ²) Test for Independence (e.g., "Is the chi-square (χ²) test statistic based on a 2 × 2 contingency table?")]

Chapter 14
SOME NONPARAMETRIC STATISTICS FOR CATEGORICAL AND ORDINAL DATA

Chapter Outline
• Statistical Assumptions of the Chi-Square
• The Chi-Square Test for One-Variable: Goodness-of-Fit
• The Chi-Square Test of Independence of Categorical Variables
  Data Box 14.B: A Chi-Square for Independence Shortcut for 2 × 2 Tables
  Supporting Statistics for the Chi-Square Test of Independence: Phi (φ) and Cramer's V
• The Mann-Whitney U Test
  Data Box 14.D: Handling Ranks in Ordinal Data
  The Mann-Whitney U Test for Large (Ns > 20) Samples: A Normal Approximation of the U Distribution
  Writing About the Results of the Mann-Whitney U Test
• The Wilcoxon Matched-Pairs Signed-Ranks Test
  Data Box 14.E: Even Null ...
  Writing About the Results of the Wilcoxon Test
• The Spearman Rank Order Correlation Coefficient
  Writing About the Results of the Spearman rs Test
• Knowledge Base
• Project Exercise: Survey ... Using Nonparametric Tests on Data
• Looking Forward, Then Back
• Summary
• Key Terms

Some years ago, I had an officemate who worked in a developmental research laboratory where children's social behavior was observed and coded. The codings were often used to create frequency tables wherein the behavioral styles of families (e.g., nurturing, distracted, abusive) and their impact on the children's social and emotional development could be studied. The leader of this laboratory was a dedicated researcher who kept a close watch on incoming data as they were coded for later analysis. She was known to closely eyeball a table of frequencies, sometimes remarking that "a visual chi-square [an inferential statistical test] tells me this pattern is significant."

Of course, this researcher did not seriously trust her intuition in place of actual statistical analyses, nor was she actually performing an analysis with just a glance (but note that I always implore you to eyeball your data before beginning any analyses). The spirit of the researcher's remark, however, is important for our present purposes. She identified an appropriate statistical test for the data that is not very complicated to calculate; indeed, the chi-square test (and others like it) are much less complex than the inferential tests presented earlier in this book. These tests, some of which will be discussed in this chapter, represent a collection of alternative approaches to hypothesis testing.

This chapter represents a departure from our careful and thorough examination of how to statistically test for mean differences, the focus of the previous four chapters. We are still very much interested in asking focused questions of data through hypothesis testing, but now we will use different sorts of data. Previously, our emphasis was on measuring things (observations, participants' reactions to some stimulus) using either an interval or ratio scale, and then determining whether any significant between- or within-groups differences involving the measurements existed. To be sure, the behavioral sciences have something of a mania for measurement, invoking a philosophy only half-jokingly referred to as the "if-it-moves, then-measure-it" approach. It is also true, however, that there are many other interesting questions that are not at all dependent
on measurement per se, and these questions, too, are well within the purview of the behavioral sciences.
Questions that do not lend themselves to analysis by the usual inferential statistical tests require an altogether different set of statistical techniques, which are collectively referred to as nonparametric statistics. Here are some nonmeasurement questions that are amenable to analysis by the nonparametric approach:
• A campus physician is interested in determining whether male and female students are equally aware that regular workshops on wellness, fitness, and diet are offered by the health center. She randomly samples 200 students (half male, half female) and asks whether they are aware of these programs, subsequently comparing the frequencies of their responses. Here are the data:
As you can see, their responses are readily categorized as raw frequencies-
"Yes, I know about the workshops" or "No, I didn't know about the
workshops"-so that no rating scales or other types of measurement are needed.
As you look over the table, the obvious question to ponder is whether the
university's women are more aware of the health initiatives than men-the
pattern of their "yes" and "no" answers certainly points to this conclusion. A
nonparametric test performed on such data enables us to take any speculation a
step further, to consider whether the observed pattern of responses is statistically
valid (i.e., women are more aware of available campus health initiatives than
men). Such a significant finding serves as an invitation for the physician to
embark on research designed to reveal the possible reasons for and implications
of the disparity in knowledge between the two groups.
• A professor of decision sciences wants to demonstrate that the problem-solving
efforts of groups generate better, more creative solutions than individuals
working alone. To test this idea empirically, the researcher gives groups of five
people or solo individuals a complex task involving describing a government for
a hypothetical society. All participants are given 1 hour to develop mock
bureaucratic systems on paper, each of which is then rated (in rank order) by a
political scientist who is unaware of the study's intent. The decision researcher
wants to show that more complex but possible structures are likely to originate in
groups rather than being thought up by persons working alone. The decision
sciences professor will need to employ a method of data analysis enabling him
to determine whether group solutions will be ranked more highly than the plans
of individual thinkers.
• A perception researcher is interested in people's similarity judgments where
texture gradients are concerned. She provides two participants with a collection
of 10 different grades of sandpaper, asking each to rank order the sandpaper
samples from the most fine-grained to the coarsest texture. The researcher wants
to determine the degree of overlap between the rankings made by each of the
participants: Were their rankings of texture unanimous, somewhat similar, or
rather diverse? To adequately address this question, the researcher must assess the
degree of association (correlation) between the respective sets of rankings (note
again that the data are rankings rather than ratings).
How Do Nonparametric Tests Differ from Parametric Tests?
In the course of this chapter, then, we will examine a variety of statistical tests that
stretch our general conception of hypothesis testing, enabling us to think about data in
new ways. These new ways chiefly entail working with ordered rankings of observations
or critically examining frequency information to determine if either reveals any clear
conclusions about behavior. Before we learn the actual statistical tests, however, we must
be certain we understand the underlying difference between the inferential tests we
learned in earlier chapters and those we are studying now.
Note: The entries in this table are selective. Various other nonparametric and parametric tests could be included in this table. The blank entries indicate that an appropriate test(s) is not reviewed here.
The Nonparametric Bible for the Behavioral Sciences: Siegel and Castellan (1988)

Every student of the behavioral sciences should be aware of a classic reference work, Nonparametric Statistics for the Behavioral Sciences. I use the word "aware" because you are not likely to use this book unless a particular need arises, but when one does, you need to know where to go for guidance.

Written by Sidney Siegel, the first edition of this guide to nonparametric statistics appeared in 1956 (Siegel, 1956). The decided strengths of the book include its focus on presenting nonparametric techniques in concert with the research designs they were developed to analyze. To his great credit, Siegel employed a step-by-step calculation method throughout the book. A revised version of the book (Siegel died in 1961) appeared in 1988 (Siegel & Castellan, 1988). This revision, undertaken by John Castellan, preserves the best aspects of the original text while incorporating newer techniques from the ever-expanding repertoire of nonparametric tests (contrary to popular opinion, the discipline of statistics is anything but static; it is changing, growing, and developing like any other academic field).

The Siegel and Castellan (1988) text is best described as the bible for nonparametric statistics. Why? Simply because the authors put together a clearly written reference tool that aids researchers who must decide which statistic is most appropriate for their particular circumstances, designs, and data. The chapters in the Siegel and Castellan text guide readers through tests for single samples, one sample with two measures, two independent samples, two or more dependent samples, two or more independent samples, and measures of association. Keep this key reference work in mind should you find yourself with data that do not conform to the expectations of parametric tests.
Now, you should reflexively think of employing the independent groups t test (for a review, see chapter 10) to analyze interval or ratio scaled data from such a design. As you can see, the independent groups t test is identified as the parametric test of choice. When the available data are nominal, however, the χ² (chi-square, which is pronounced "kie-square") test of independence is the correct analytic tool (we will learn to calculate this statistic in the next section). On the other hand, ordinal data placed into two independent groups would be examined using what is called the Mann-Whitney U test, a statistic that will be introduced later in the chapter. Table 14.1 will serve as a valuable reference for you when you must decide which nonparametric test is appropriate for a given research design.
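The selection logic just described can be sketched as a small lookup, here restricted to the two-independent-groups design discussed above. The dictionary below is an illustrative stand-in for Table 14.1, not a reproduction of it:

```python
# Test choice for a two-independent-groups design, keyed by the scale
# of measurement of the available data (after the logic of Table 14.1).
TWO_INDEPENDENT_GROUPS_TESTS = {
    "interval": "independent groups t test",
    "ratio": "independent groups t test",
    "nominal": "chi-square test of independence",
    "ordinal": "Mann-Whitney U test",
}

def choose_test(scale):
    """Return the appropriate test for a two-independent-groups design."""
    return TWO_INDEPENDENT_GROUPS_TESTS[scale]

print(choose_test("ordinal"))  # Mann-Whitney U test
```

A fuller version would also key on the number of groups and whether the samples are dependent, which is exactly the information the chapter's decision trees ask for.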
favorite class this semester." To respond to the statement, a student simply circled one
(and only one) of the rating options (i.e., strongly agree, agree, undecided, disagree,
strongly disagree). Here is a breakdown of the students' responses (N = 35):
Observed Data
Strongly Agree Agree Undecided Disagree Strongly Disagree
17 8 3 2 5
Please note that the instructor elected to treat the responses as categorical data; she might just as easily have calculated a mean rating based on the 35 responses.
What can we conclude from these categorical data? By casually "eyeballing" the
data, it appears that the majority of the students in the sample (those comprising the
"strongly agree" and "agree" categories) affirmed the sentiment that this was their fa-
vorite course that particular semester. (As an aside, I wonder how you would answer
this question, and whether you and your classmates agree with the positive sentiment!)
The statistical issue, of course, involves demonstrating that this positive pattern of
agreement deviates from what would be expected by chance (i.e., if the students were
responding more or less randomly to the question).
Thus, the chi-square test statistic (χ²) indicates whether there is a difference between some observed set of frequencies (the data drawn from a piece of research) and a set of expected frequencies. These expected frequencies constitute the prediction
made under the null hypothesis. We will review the null hypothesis for the chi-square
test in some detail momentarily.
Before we do so, however, we should review the necessary steps for testing a hy-
pothesis using a nonparametric statistic. I say "review" because the steps involved do not
diverge much from those we followed for previous inferential tests (recall Tables 9.2 and
11.4). The five basic steps for performing a nonparametric test are shown in Table 14.2.
The main difference here is that it is advisable for you to first verify that the data are either nominal or ordinal, as well as to select a test statistic (see step 1 in Table 14.2). (We already completed this step when we began to talk about the data presented earlier in this section. When you perform this step "solo," you can rely on Table 14.1 as well as the decision trees that open the chapter.) Once this step is completed, the next few steps (stating a null and alternative hypothesis, step 2; choosing a significance level, step 3; performing requisite calculations in order to accept or reject H0, as well as interpreting and evaluating the results, step 4) follow the standard pattern. Nonparametric tests do not require any post hoc tests, though some supporting statistics are sometimes calculated (this flexibility is provided by step 5 in Table 14.2). We will refer back to this table through the remainder of this chapter.
We can now return to the null hypothesis for the chi-square test for goodness-of-fit. This null hypothesis can take one of two forms: no frequency difference among a set of different categories or no frequency difference from a comparison population.
To return to our example, the Ho for no frequency difference among a set of dif-
ferent categories would be:
Ho: No difference in course ratings across the five rating categories
(i.e., strongly agree to strongly disagree).
Because there are 35 students, the expected frequency for each category when no dif-
ference exists would be 7 (i.e., 35 students divided by the five possible categories equals
7 students in each one), or:
7 7 7 7 7
Thus, the general rule of thumb for determining the expected frequencies for the chi-square test for goodness-of-fit is simply to divide N by the number of available categories. The alternative hypothesis, then, is:
H1: There is a statistically reliable difference between the observed and the expected frequencies.
What if a comparison population existed? In that case, you would simply compare
the observed data with some existing comparison data. To continue our example, it is
possible that a faculty member wanted to compare student ratings from a prior semester
(say, one where a different textbook was used) with the current semester. The compar-
ison data might look like this:
4 10 11 5 5
The null hypothesis for the no frequency difference from a comparison population,
then, could look like this:
Ho: No difference in course ratings from prior semester across the
five rating categories (i.e., strongly agree to strongly disagree).
The expectation of this null hypothesis is that the observed data would not depart
significantly from the ratings gathered in the previous semester. If a difference were found
to exist between the observed data and the comparison data, then perhaps-depending
on the pattern of the frequencies-the instructor could conclude that the new book led
to more favorable ratings. More simply, however, the alternative hypothesis could be:
H1: There is a statistically reliable difference between the observed and the comparison frequencies.
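Although the text goes on to test only the first null hypothesis, the comparison-population variant can be sketched the same way by treating the prior semester's frequencies as the expected values. The calculation below is illustrative only; it is not one the chapter itself performs:

```python
# Hedged sketch: chi-square goodness-of-fit against a comparison
# population, with the prior semester's ratings as expected frequencies.
observed = [17, 8, 3, 2, 5]     # current semester, strongly agree ... strongly disagree
comparison = [4, 10, 11, 5, 5]  # prior semester; note both sets sum to N = 35

chi_square = sum((fo - fe) ** 2 / fe
                 for fo, fe in zip(observed, comparison))
print(round(chi_square, 2))
```

As with the equal-frequencies form of the test, a large value here would signal that the current semester's pattern of ratings departs reliably from the comparison data.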
532 Chapter 14 Some Nonparametric Statistics for Categorical and Ordinal Data
In general, most chi-square tests focus on testing the first null hypothesis, where
some set of observed data is compared to a pattern expected by chance. Thus, we will
finish this example by testing just this hypothesis and so complete step 2 in Table 14.2.
Before we proceed, however, what other information do we need? Beyond identifying
the hypotheses, we must choose a significance level for the test statistic (step 3 in Table 14.2; as usual, the alpha level of .05, two-tailed, is fine), which will later help us to pinpoint the critical value for accepting or rejecting H0.
We can now move on to step 4 in Table 14.2, the actual calculations for the test
statistic. Here is the formula for calculating the chi-square test:
[14.1.1]  χ² = Σ (fO - fE)² / fE,

where fO refers to the observed frequency in a given category and fE is the expected frequency under H0 for that category. The numerical difference between these two frequencies is calculated, its value is squared, and the result is divided by fE. The process is repeated for each of the remaining cells, and then all of the products are summed to create the chi-square test statistic.
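Formula [14.1.1] translates directly into a few lines of code. The sketch below is illustrative (the function name is mine, not the book's), using the course-rating data with expected frequencies of 35 / 5 = 7 per category:

```python
def chi_square_stat(observed, expected):
    """Formula [14.1.1]: sum over categories of (fO - fE)^2 / fE."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# The course-rating data: five categories, expected frequency 7 each.
print(round(chi_square_stat([17, 8, 3, 2, 5], [7] * 5), 2))  # 20.86
```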
Before we proceed with the actual calculations, pause for a moment and think about
what intuitively must be true about the process of assessing differences between ob-
served and expected frequencies: When the respective frequencies are similar to one another, the value of the χ² test statistic will be relatively small (i.e., the null hypothesis
is apt to be accepted). As the differences between the observed and expected values in-
crease, there is an increased likelihood that the null hypothesis can be rejected (i.e., a
statistically reliable difference is identified).
An easy way to organize the calculation of the χ² test statistic is by using a tabular format. Table 14.3 illustrates each of the observed (fO; see column 2) and expected (fE; see column 3) frequencies, the difference between them (fO - fE; see column 4), the squared value of the difference ((fO - fE)²; see column 5), and the final product resulting once each squared-difference score is divided by fE ((fO - fE)²/fE; see column 6). The value of the χ² test statistic is determined by summing each of these final products together. Once summed (see the bottom of column 6 in Table 14.3), the value of the χ² test statistic is known. (Please take a moment and review the calculations presented in Table 14.3 and assure yourself that you know where all the numbers came from and why. Take special note of the fact that both Σ fO and Σ fE must be equal to N; here, 35.)
Using [14.1.1] (repeated here for consistency) and entering the values from Table 14.3, we can calculate the χ². (In any χ² analysis, the Σ fO and the Σ fE must both be equal to N.)

[14.1.1]  χ² = Σ (fO - fE)² / fE,
[14.1.2]  χ² = (17 - 7)²/7 + (8 - 7)²/7 + (3 - 7)²/7 + (2 - 7)²/7 + (5 - 7)²/7,
[14.1.3]  χ² = (10)²/7 + (1)²/7 + (-4)²/7 + (-5)²/7 + (-2)²/7,
[14.1.4]  χ² = 100/7 + 1/7 + 16/7 + 25/7 + 4/7,
[14.1.5]  χ² = 14.29 + 0.143 + 2.29 + 3.57 + 0.571,
[14.1.6]  χ² = 20.86.
In this example, the value of the χ² test statistic (20.86) is quite large.
Once the value of the test statistic is known, you can calculate the degrees of freedom for the χ², which are based on:

[14.2.1]  df = k - 1,

where k is equal to the number of available categories. The original rating scale is based on five categories, so the degrees of freedom for this χ² test statistic are:

[14.2.2]  df = 5 - 1,
[14.2.3]  df = 4.
Once the degrees of freedom are identified, we can turn to Table B.7 in Appendix B, a table of critical values of χ². Please turn to Table B.7 now and locate the row corresponding to 4 degrees of freedom in the table's leftmost column. Read across that row until you find the value under the column labeled ".05." What value did you find? If you located the χ² critical value of 9.488, then you are correct. As we have done with all other hypothesis testing ventures, we ask a straightforward question: Did the observed test statistic exceed or equal the value found in the table? Or,

Is χ²(4) = 20.86 ≥ χ²critical(4) = 9.488? Yes, so reject H0.

As shown here, the degrees of freedom are included (parenthetically) along with the reported test statistic and critical value. To report this significant χ² in APA style, you would write:

χ²(4, N = 35) = 20.86, p < .05.
The statistic is reported in the standard APA manner but with one important excep-
tion: Because the degrees of freedom bear little resemblance to a study's sample size, N
is always included for clarification. Please note that there is no supporting statistic
(i.e., step 5 in Table 14.2) for the chi-square test for goodness-of-fit.
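Pulling the preceding steps together, the whole goodness-of-fit decision can be sketched in a few lines. This is an illustrative sketch, assuming only the critical value 9.488 read from Table B.7 (df = 4, alpha = .05):

```python
observed = [17, 8, 3, 2, 5]        # strongly agree ... strongly disagree
n = sum(observed)                  # N = 35
k = len(observed)                  # five rating categories
expected = [n / k] * k             # 35 / 5 = 7 per category under H0

# Formula [14.1.1]: sum of (fO - fE)^2 / fE over the categories.
chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = k - 1                         # formula [14.2.1]
critical_value = 9.488             # Table B.7, df = 4, alpha = .05

reject_h0 = chi_square >= critical_value
print(f"chi-square({df}, N = {n}) = {chi_square:.2f}")
print("reject H0" if reject_h0 else "retain H0")
```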
Interpreting and Writing About the χ² Goodness-of-Fit Result. What does this significant test statistic mean? As shown by the original data, students enrolled in statistics that semester generally (and strongly) agreed with the sentiment that it was their favorite course. When a χ² test statistic is significant, "what you see is what you get" where interpretation is concerned. Can we conclude that all students share this belief?
Certainly not. Aside from required courses (and statistics may be one), anytime
students take a class they generally elect to be there, so they were not randomly selected
from the larger population. We can be reasonably confident that students in this sam-
ple defied expectations-they did indeed like the course-but we cannot generalize be-
yond this sample.
Just as there is relatively little to say where the interpretation of the χ² is concerned, there is not that much to write about, either. This virtue enables the researcher/data analyst to get right to the point. Here is one way this example could be shared with others:
Students enrolled in the statistics class were asked to indicate their level of agreement with the statement that "Statistics was my favorite course this semester." Five rating categories were available for participant responses, which were in general agreement with the statement (see Table 1). A chi-square test for goodness-of-fit revealed that the observed data departed from the expectation of no difference across the categories, χ²(4, N = 35) = 20.86, p < .05 (two-tailed). Thus, the students generally felt that statistics was their favorite course that semester.
Table 1

Strongly Agree   Agree   Undecided   Disagree   Strongly Disagree
      17           8         3           2              5
News Source
Educational Status   Television   Newspaper   Row Totals
College                  47           62          109
High School              58           39           97
Column Totals           105          101          206

A glance at the cell entries inside the table suggests that college graduates are more likely to rely on the newspaper as their source for news, whereas high school graduates tend to use the television predominantly for gathering information. Note that the row totals and the column totals representing the respective variables are very similar in magnitude.
The chi-square test for independence will reveal whether this apparent relationship be-
tween the two categorical variables is a dependent one-that is, both variables need to be
considered simultaneously-or whether they are actually independent of one another.
Given that we are dealing with two categorical variables, we know to apply the chi-square test for independence, thereby satisfying step 1 of Table 14.2. Step 2 entails identifying the null and the alternative hypotheses. In this case, the null hypothesis is that the participants' level of education is independent of their mode of news acquisition. The varying pattern of frequencies shown above in the contingency table, then, is due to chance rather than any dependent relationship. In contrast, the alternative hypothesis tests whether a dependent relationship exists between education and mode of news gathering. In other words, more education is associated with whether people select a more active mode of news acquisition (i.e., reading the paper versus watching the television). Before we test the veracity of this alternative hypothesis, we can complete step 3 by selecting .05 as the significance level for rejecting H0.
Step 4 entails the actual calculation of the nonparametric statistic, and here is the formula for doing so:

[14.3.1]  χ² = Σr Σc (fO - fE)² / fE,

where r is the number of rows and c is the number of columns, and fO and fE refer to the observed and expected frequencies, respectively. This formula directs the data analyst to determine the difference between the observed (fO) and expected (fE) frequencies in a given cell (i.e., fO - fE), square that difference ((fO - fE)²), and then divide it by the expected frequency ((fO - fE)²/fE); the resulting products for the corresponding four cells are then summed to create the chi-square statistic for independent events. If the χ² value equals or exceeds a critical value, then the null hypothesis of independence is rejected and the two variables are said to be in a dependent relationship with one another.
So far, the steps involved in calculating this form of the chi-square are identical to those used for the goodness-of-fit variation (recall [14.1.1]). We now introduce a pronounced difference from the latter statistic, however, by illustrating the procedure for determining the cell values expected under the null hypothesis of no difference. Please recognize that because there are two variables, the expected cell frequencies cannot be identified by simply dividing the number of participants by the number of cells available. Instead, we use a simple procedure called the "cell A" strategy to calculate the expected frequencies for the four cells. By cell A, we refer to the somewhat arbitrary identification of the upper left cell in any 2 × 2 table (see below):

                Television   Newspaper
College             A            B
High school         C            D
The remaining three cells are labeled B, C, and D, accordingly. Our goal is to calculate
the expected frequency corresponding to each of these four cells.
Here is the formula for the cell A strategy:
Going back to the original contingency table presented above, we simply need to enter
the column total found under cell A, divide that number by N, and multiply the result
by row total found to the far right of cell A. Entering the appropriate numbers from
the contingency table, we find:
[14.4.2] Cell A = (105/206) × 109,
[14.4.3] Cell A = 0.5097 × 109,
[14.4.4] Cell A = 55.56.
Thus, the expected frequency for cell A (i.e., college grads who watch the television for
news) is 55.56. Please note that we would get the same result if we switched the place-
ment of the row and column totals, or:
[14.5.1] Cell A = (row total/N) × column total,
[14.5.2] Cell A = (109/206) × 105,
[14.5.3] Cell A = 55.56.
How do we calculate the expected frequencies for the remaining three cells? There
are really two ways. First, we could simply use the cell A strategy for each of the re-
maining cells; that is, dividing a given row (column) total by N and then multiplying
the product by the column (row) total. Based on the relevant entries from the contin-
gency table, the expected frequency for cell B, then, would be:
[14.6.1] Cell B = (column total/N) × row total,
[14.6.2] Cell B = (101/206) × 109,
[14.6.3] Cell B = 0.4903 × 109,
[14.6.4] Cell B = 53.44.
This procedure is then repeated the same way for cells C and D, a relatively easy feat
requiring the data analyst to select only the appropriate row and column totals.
Alternatively, the original expected value for cell A (i.e., 55.56) and some relatively
easy subtraction can be used to determine the expected values for cells B, C, and D.
Here's how. Once the value of cell A is fixed at 55.56, we need only subtract this value
from the row total of 109 (see the original contingency table) to determine cell B
(i.e., 109 - 55.56 = 53.44). Note that this is the same value we just calculated with for-
mula [14.6.1]. Using the same logic, cell C's value can be known by subtracting the
value of cell A from the column total of 105 (see the contingency table). Thus, cell
C's value is 49.44 (i.e., 105 - 55.56 = 49.44). To calculate the expected frequency of
cell D, we have a choice. We can subtract the value of cell C from the row total of 97
or cell B's value can be subtracted from the column total under it, which is 101. Either
The Chi-Square (x 2) Test for Categorical Data 537
calculation will yield the same result of 47.56 (as an exercise, show that this claim is
true). Here, then, are all the expected frequencies alongside the four lettered cells:
Television Newspaper Television Newspaper
College A B College 55.56 53.44
High school C D High school 49.44 47.56
Once again, please note that the sum of all the expected frequencies for the 4 cells
must be equal to N (here, 206; remember that in any chi-square test, the sum of the
observed data must always equal the sum of the expected frequencies). Another error
check, of course, is to verify that when both row totals or both column totals are summed
together, their values should both be equal to N, as well (check this assertion by going
back to the original contingency table).
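As a quick check on the arithmetic, the cell A strategy (row total × column total, divided by N) can be applied to every cell by machine. The following is a minimal sketch; the function name and table layout are illustrative, not from the text:

```python
def expected_frequencies(observed):
    """Expected cell counts under the null hypothesis of independence:
    (row total * column total) / N -- the "cell A" strategy applied
    to every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# the education-by-news-source table from the text
observed = [[47, 62],   # college: television, newspaper
            [58, 39]]   # high school: television, newspaper
expected = expected_frequencies(observed)
```

Rounded to two decimals, the result reproduces the expected frequencies 55.56, 53.44, 49.44, and 47.56, and the four expected values sum to N = 206, satisfying the error check described above.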
Using these expected frequencies, we can now continue with the chi-square analysis
using formula [14.3.1], repeated here for convenience:

[14.3.1] χ² = Σ[(fO − fE)²/fE].

When computing a χ² test for independence, be certain that the row totals and the
column totals each sum to N; any discrepancy from N indicates that a math error is
present.

[14.3.2] χ² = (47 − 55.56)²/55.56 + (62 − 53.44)²/53.44 + (58 − 49.44)²/49.44 + (39 − 47.56)²/47.56,
[14.3.3] χ² = (−8.56)²/55.56 + (8.56)²/53.44 + (8.56)²/49.44 + (−8.56)²/47.56,
[14.3.4] χ² = 73.27/55.56 + 73.27/53.44 + 73.27/49.44 + 73.27/47.56,
[14.3.5] χ² = 1.32 + 1.37 + 1.48 + 1.54,
[14.3.6] χ² = 5.71.
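The cell-by-cell computation in [14.3.2] through [14.3.4] amounts to summing (fO − fE)²/fE over the four cells. A minimal Python sketch (the function name is mine, not the text's):

```python
def chi_square(observed, expected):
    """Formula [14.3.1]: sum (fO - fE)^2 / fE over every cell."""
    return sum((o - e) ** 2 / e
               for row_o, row_e in zip(observed, expected)
               for o, e in zip(row_o, row_e))

obs = [[47, 62], [58, 39]]                 # observed frequencies
exp = [[55.56, 53.44], [49.44, 47.56]]     # expected frequencies from the text
```

With these values, chi_square(obs, exp) reproduces the observed statistic of 5.71 (to two decimals).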
Instead of computing expected frequencies using either of the two procedures presented in the
text, the following simple procedure can be used. The other two procedures can be expanded
for contingency tables that are larger than 2 × 2 (e.g., 2 × 3, 3 × 3); however, the following
method can only be used for 2 × 2 tables. We will review the necessary steps using the data from
the example problem presented in the text:

1. Label the cells A through D and organize the observed frequencies with their marginal totals:

   A       B       A + B          47      62     109
   C       D       C + D          58      39      97
   A + C   B + D   N             105     101     206
2. Enter the organized numbers into this formula (and do not be put off by unusually large
numbers!):
χ² = N(AD − BC)²/[(A + B)(C + D)(A + C)(B + D)],

χ² = 206[(47)(39) − (62)(58)]²/[(109)(97)(105)(101)],

χ² = 206(1,833 − 3,596)²/112,126,665,

χ² = 206(3,108,169)/112,126,665,

χ² = 640,282,814/112,126,665,

χ² = 5.71.
3. The observed χ² has the same value as that found using formula [14.3.1].
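The shortcut formula in step 2 is easy to verify by machine. A sketch (valid for 2 × 2 tables only, as the data box notes; the function name is illustrative):

```python
def chi_square_2x2(a, b, c, d):
    """Computational shortcut for a 2 x 2 contingency table:
    N(AD - BC)^2 / [(A+B)(C+D)(A+C)(B+D)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

chi_square_2x2(47, 62, 58, 39) reproduces the value 5.71 found in step 3, confirming that the shortcut and the expected-frequency method agree.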
Turning to Table B.7 in Appendix B with df = (r − 1)(c − 1) = 1, we
read down the column of .05 values, thereby finding the critical value of 3.84 (please turn
to Table B.7 and verify that you would select the same critical value). Is the observed chi-
square statistic greater than or equal to the critical value? Yes, in symbolic terms:

χ²(1) = 5.71 ≥ χ²critical(1) = 3.84 → Reject H0.

The significant χ² statistic is then reported as:

χ²(1, N = 206) = 5.71, p < .05.
How do we interpret this result? Clearly, the two variables are not independent of one
another; source of news acquisition depends on level of education. College graduates
tend to read the newspaper for news, whereas high school graduates are more likely to
watch a news program on television.

The strength of the relationship between two variables can be assessed using one of two
supporting statistics, the phi (φ) coefficient or Cramer's V statistic.
Phi (φ) coefficient. The phi coefficient can be calculated when an investigator is per-
forming a chi-square analysis on a 2 × 2 contingency table. In a manner similar to the
Pearson r, the phi coefficient provides a measure of association between two dichoto-
mous variables. The formula for this supporting statistic is:

[14.8.1] φ = √(χ²/N).

Using the data from the education and news acquisition example, we find that:

[14.8.2] φ = √(5.71/206),
[14.8.3] φ = √.0277,
[14.8.4] φ = .17.
The value of the phi coefficient can range between 0 and 1, where higher values indicate
a greater degree of association between the variables. Values closer to 0 suggest that there
is little or no relationship between the variables. In the present case, we see that the
strength of association between education and news acquisition is not very pronounced,
despite the fact that the chi-square value is significant. Once again, we must remind our-
selves that significance per se is not the issue-just because a result reaches conventional
statistical significance does not give us license to claim its effect is strong. In the present
example, the two variables share only a minor relationship with one another.
Cramer's V statistic. When is Cramer's V preferred over the phi coefficient? Cramer's
V statistic is used only when a contingency table is larger than the standard 2 × 2 size.
The formula for Cramer's V is:

[14.9.1] V = √(χ²/[N(n − 1)]),

where n refers to the smaller of the number of rows or columns present in a contingency table. If
the table were a 3 × 4 design, then n would be equal to 3. One last suggestion: Neither the
phi coefficient nor Cramer's V should be calculated unless the χ² is statistically significant.
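Both supporting statistics are one-line computations once the chi-square value is in hand. A sketch (function names are mine, not the text's):

```python
import math

def phi(chi_sq, n):
    """Phi coefficient for a 2 x 2 table (formula [14.8.1])."""
    return math.sqrt(chi_sq / n)

def cramers_v(chi_sq, n, k):
    """Cramer's V (formula [14.9.1]); k is the smaller of the
    number of rows and columns in the contingency table."""
    return math.sqrt(chi_sq / (n * (k - 1)))
```

phi(5.71, 206) gives .17, matching [14.8.4]; note that for a 2 × 2 table (k = 2), Cramer's V reduces to the phi coefficient.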
Does nonhuman companionship (caring for a dog or a cat, for instance) promote health and
well-being? Can the presence of animal companions positively enhance the I-year survival rate
for people who have had a heart attack? Friedman, Katcher, Lynch, and Thomas (1980) argued
that similar to human beings-a spouse, a close friend or other family member-pets provide an
important source of companionship that may mitigate subsequent health problems associated with
heart disease. Pets must serve some important function in the lives of many people, as almost half
of the homes in the United States report having some kind of pet (Friedman et al., 1980).
The researchers interviewed 67 male and 29 female patients diagnosed with heart attack
(myocardial infarction or angina pectoris) and then admitted to a hospital, contacting them again
a year later. The survival rate of the patient group after one year was 84%. Of the original sam-
ple, 58% claimed they had one or more pets at the time of their heart attack. The relationship
between owning a pet and the survival rate following a year after hospital admission was exam-
ined in the following contingency table (from Friedman et al., 1980, p. 308):
         No Pet    Pet
Alive      28       50
Dead       11        3
A chi-square test for independence performed on these data proved to be significant, χ²(1, N =
92) = 8.90, p < .002. Of the 39 patients who did not own pets, 11 of them (28%) died within
that first year following their heart attacks, compared to only 3 (6%) of the 53 pet-owning
patients. Thus, pet ownership was linked with an increased survival rate 1 year after heart attack.
The authors are quick to note that the increased survival rate is presumably not due to the ben-
eficial effects of physical activity associated with pet care (i.e., walking a dog regularly), as a sub-
sequent analysis showed that owners of pets other than dogs also had a higher survival rate than
individuals who had no pets.
Did Friedman et al. (1980) conclude that the presence of pets definitively reduces mortal-
ity following heart disease? Certainly not, but these authors did use the chi-square test (as well
as other, related statistical analyses) to make a case to researchers and health care professionals
that pets, like other social factors, can have a potentially important effect on people's health fol-
lowing debilitating illness. Questions concerning the presence of pets, too, can be easily added
to standard patient questionnaires, yielding a ready source of potentially insightful data.
The Mann-Whitney U Test 541
Knowledge Base
1. Name a few ways in which nonparametric statistics differ from parametric tests.
2. What does Σ fO always equal? What does Σ fE always equal?
3. Examine the following contingency table and determine the expected frequencies
for cells A and D.
Al A2
BI 10 25
B2 15 10
4. A chi-square test for independence is performed on a 3 X 4 contingency table. As-
suming the test finds a significant result, which supporting statistic should be cal-
culated?
Answers
1. Nonparametric tests are distribution free, generally require nominal or ordinal data, and tend
to require less complex calculations. When parametric assumptions are violated, interval or
ratio scale data can sometimes be analyzed by nonparametric statistics.
2. Both sums must be equal to the N of the observations available.
3. Cell A = 14.58; cell D = 14.58
4. Cramer's V is the appropriate supporting statistic for any contingency table larger than a
2 X 2 table.
(see Data Box 14.D). When the majority of ranks in a data set are tied, however, con-
sult statistical works like Hays (1988) or Kirk (1990) for guidance.
The Mann-Whitney U test is employed when two independent samples exist (the
presence of noninterval scale data precludes using the independent groups t test) and
different participants appear in each of the two samples. Let's review an example in
some detail. Perhaps a linguist is interested in comparing the effectiveness of traditional,
classroom-based language learning versus total immersion learning where elementary
students are concerned. The linguist randomly assigns a group of 18 fourth-graders to
either a traditional Spanish language class (i.e., the teacher gives directions in English,
though the emphasis is on learning to speak Spanish) or a total immersion class
(i.e., the teacher speaks exclusively in Spanish). At the end of the school year, a panel
of judges gives an age-appropriate Spanish-language test to the students, subsequently
using the scores to rank the children's linguistic skills from 0 to 100 (the judges re-
main unaware of which learning technique each child was exposed to). The rankings
were then categorized by the respective teaching techniques the students were exposed
to (see Table 14.4).
The Mann-Whitney U test is the nonparametric analog of the t test for independent
groups.

The goal of any nonparametric test is to establish overall differences between two
(or possibly more) distributions, not to identify the differences between any particular
parameters (e.g., Evans, 1996). The Mann-Whitney U test assumes that if two collec-
tions of rankings originate from the same parent population, then the rankings from
each group will be unsystematically "mixed" with one another. If, however, one group's
rankings (say, group A) are found to be localized in upper or lower positions relative to
the other group's ranks (group B), then we can reasonably assume that the ranks of one
group come from a different population than the other. Let's follow the usual steps for
testing a hypothesis using this nonparametric test.
Following Table 14.2, we have already completed step 1 by recognizing that the data
are ordinal, rather than nominal, interval, or ratio-scaled. Stating the null and alterna-
tive hypotheses for the Mann-Whitney U test is not at all difficult (step 2); indeed, it
only entails noting whether a systematic difference exists between the language skills of
the two groups (based, of course, on their respective rankings), or:
Ho: There will be no systematic difference between the Spanish-speaking
skills of the traditional-learning group and the total immersion group.
Note: Each number represents the relative ranking of a student's ability to speak Spanish after 1 year of
receiving one mode of instruction.
The alternative hypothesis will suggest that such a difference exists, as in:

H1: There will be a systematic difference between the Spanish-speaking skills
of the traditional-learning group and the total immersion group.
Following the convention associated with step 3, we will rely on a significance level
of .05 for the Mann-Whitney U test. Naturally, step 4 requires the lion's share of our
efforts, as we must learn the calculation procedures for a new statistical test. We begin
by noting that NA, the number of students in the traditional learning group, is equal to
10; NB is equal to 8 (i.e., the overall N for the study is 18, or NA + NB; see Table 14.4).
We then sort the raw ordinal rankings into a new table, one where they are listed in
ascending order, and next to these raw data are rankings from 1 (lowest score) to N
(i.e., NA + NB, the highest score; see Table 14.5). Any tied ranks (a situation where two
or more observations receive the same ranking) must be resolved using the straight-
forward procedure outlined in Data Box 14.D.

The third column in Table 14.5 identifies whether a given score is from group
A (traditional learning) or B (total immersion). This labeling is accomplished in order
to separate the rankings in the subsequent steps in the calculation of the Mann-
Whitney U statistic. Columns 4 and 5 in Table 14.5 identify the respective rankings for
groups A and B. As you can see, the former's ranks tend to be lower relative to the latter's
ranks, suggesting that a between-group difference is present (see columns 4 and 5 in
Table 14.5).
We now select what is called the Ucritical value for the Mann-Whitney U test. To do
so, we need to know the sample sizes of groups A (NA = 10) and B (NB = 8), as well
as the predetermined significance level from step 3 (i.e., .05). Armed with this infor-
mation, we turn to Table B.8 in Appendix B, which is the table of critical values for the
When analyzing ordinal data, it is common to come across two or more ranks that are tied
with one another. The following straightforward procedure can be readily used to "break the
ties" and determine appropriate ranking values for substitution in whatever ordinal procedure
you are conducting.

In the case of the Mann-Whitney U test performed in the text, for example, two scores of
42 represented tied ranks (see column 1 in Table 14.5). These two identical ranks held places 3
and 4 in the rankings of scores shown in column 2 of Table 14.5, but because the scores are in-
deed the same, we cannot call them 3 and 4! Instead, we rely on this simple formula to rectify
the problem:
Rank of tied scores = (sum of rank positions possessed by tied scores)/(number of tied scores present).

Because the two 42s held positions 3 and 4, their shared ranking becomes:

Rank of tied scores = (3 + 4)/2,
Rank of tied scores = 7/2,
Rank of tied scores = 3.5.
As shown in column 2 of Table 14.5, both scores of 42 are assigned the shared rank of 3.5,
thereby "breaking the tie," as it were.
What if we had three tied scores, say, in the fifth, sixth, and seventh places in a data set? We
would simply add these places and divide by 3, or:
Rank of tied scores = (5 + 6 + 7)/3,
Rank of tied scores = 18/3,
Rank of tied scores = 6.
The number of rankings needed to break any tie(s) can be expanded or contracted for the
analysis of ordinally scaled data.
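The tie-breaking rule (assign each tied score the mean of the rank positions it jointly occupies) can be sketched as a small Python helper; the function name is illustrative, and the example data are hypothetical:

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) to N, assigning tied scores the
    mean of the rank positions they jointly occupy (Data Box 14.D)."""
    ordered = sorted(scores)
    positions = {}
    # collect the 1-based positions each distinct score occupies
    for pos, s in enumerate(ordered, start=1):
        positions.setdefault(s, []).append(pos)
    # a tied score's shared rank is the mean of those positions
    rank_of = {s: sum(p) / len(p) for s, p in positions.items()}
    return [rank_of[s] for s in scores]
```

For a data set in which two 42s occupy positions 3 and 4, both receive the shared rank of 3.5, exactly as computed above.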
Mann-Whitney U test (the boldface entries in this table indicate two-tailed critical val-
ues at the .05 level and then the .01 level; we are interested in the .05 critical values,
which comprise the first half of Table B.8). Table B.8 permits one-tailed (directional)
hypothesis tests, too, but most research questions will be presented in a two-tailed
(nondirectional) manner. Table B.8 requires a user to locate the intersection between a
column heading corresponding to the value of NA (10) and a row heading corresponding
to the value of NB (8); when reading down and across, the boldface Ucritical value is
17. If the subsequent analysis, the actual U test, produces a statistic that is less than or
equal to this Ucritical value, then we can reject H0. Please note that this procedure is dif-
ferent from, indeed the opposite of, the usual hypothesis testing logic, as an observed value
should be lower than or equal to a critical value in order to reject the null hypothesis
of no difference.
Note well that unlike previous inferential statistics, a computed value of U must be
less than or equal to Ucritical in order to reject H0. Most test statistics must be equal to
or greater than a critical value.

To compute the value of U, we must first determine the sum of the ranks of the
two groups shown in Table 14.5. For convenience, the respective sums are shown at the
bottom of columns 4 and 5 in Table 14.5 (i.e., 61 and 110, respectively). We now com-
pute a U statistic for groups A and B. The formula for UA is:

[14.10.1] UA = NANB + NA(NA + 1)/2 − ΣRA.

This formula requires us to enter the number of rankings for group A (NA = 10) and
group B (NB = 8), as well as the sum of the ranks for group A (i.e., ΣRA = 61; see
Table 14.5). Entering the values for these sample sizes and the sum of the ranks for A,
we find:

[14.10.2] UA = (10)(8) + 10(10 + 1)/2 − 61,
[14.10.3] UA = 80 + 10(11)/2 − 61,
[14.10.4] UA = 80 + 110/2 − 61,
[14.10.5] UA = 80 + 55 − 61,
[14.10.6] UA = 135 − 61,
[14.10.7] UA = 74.
The value of UB is determined by:

[14.11.1] UB = NANB + NB(NB + 1)/2 − ΣRB.

We again enter the number of ranks for groups A (NA = 10) and B (NB = 8); instead
of the sum of the ranks for group A, however, we substitute the sum of the ranks for
group B (i.e., ΣRB = 110). Entering these values we find:

[14.11.2] UB = (10)(8) + 8(8 + 1)/2 − 110,
[14.11.3] UB = 80 + 36 − 110,
[14.11.4] UB = 6.
(see the ranks for groups A and B in columns 4 and 5, respectively, of Table 14.5). Apparently, then, the lan-
guage immersion group demonstrated relatively greater proficiency speaking Spanish
than did the traditional learning group.
[14.13.1] σU = √[NANB(NA + NB + 1)/12],
[14.13.2] σU = √[(10)(8)(10 + 8 + 1)/12],
[14.13.3] σU = √[(80)(19)/12],
[14.13.4] σU = √(1,520/12),
[14.13.5] σU = √126.67,
[14.13.6] σU = 11.26.
The Wilcoxon Matched-Pairs Signed-Ranks Test 547
The third step involves computing a z ratio by entering the (now) known values of
the population mean and the population standard deviation into [14.14.1]. Please note
that either UA or UB can be used as U in this formula.

Step 3. If we use UB = 6, then:
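The z ratio uses the mean and standard deviation of the U sampling distribution. In the sketch below, the mean formula μU = NANB/2 is the standard one (its numbered equation falls in a page gap of this excerpt), and full-precision arithmetic gives σU = 11.25, a hair under the rounded 11.26 shown in [14.13.6]:

```python
import math

n_a, n_b, u = 10, 8, 6                     # U_B from the example in the text
mu_u = n_a * n_b / 2                       # mean of the U sampling distribution
sigma_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)   # formula [14.13.1]
z = (u - mu_u) / sigma_u                   # z ratio for step 3
```

The resulting z of about −3.02 exceeds the two-tailed .05 critical value of ±1.96 in magnitude, consistent with a significant difference between the two groups of ranks.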
Writing About the Results of the Mann-Whitney U Test

A singular virtue of most parametric tests is the ease with which their results can be re-
ported. There are no hard-and-fast rules for reporting results from a Mann-Whitney U
test in APA style; however, reasonable advice is to keep the explanation simple but thor-
ough. Here is one way to report the results of the comparison of the Spanish-teaching
techniques:

The students' performance on the age-appropriate Spanish language proficiency test
was rank ordered by Spanish-speaking judges, and then a Mann-Whitney U test com-
pared the ranks for the traditional teaching approach (n = 10) and total immersion
(n = 8) approach. The test revealed a significant difference between the groups, where
students in the total immersion group tended to rank higher in language proficiency
than those who were taught traditionally, U = 6, p < .05, where the sum of ranks was
equal to 110 for the former group and 61 for the latter group.
• The Wilcoxon Matched-Pairs Signed-Ranks Test

The Wilcoxon matched-pairs signed-ranks test (the Wilcoxon test for short) analyzes
ordinal data from a basic one-group, repeated measures experiment.

KEY TERM The Wilcoxon matched-pairs signed-ranks test is a nonparametric statistic used to identify a dif-
ference between two dependent samples of rank-ordered (ordinal) data.

As noted in Table 14.1, the parametric analog of the Wilcoxon test is the dependent
groups t test. Like the dependent groups t test, the data used to perform the Wilcoxon
test are comprised of difference scores (i.e., the difference between how each partici-
pant reacted during a first treatment and then again in a second treatment) or obser-
vations collected in a matched-pairs research design (i.e., two separate participant
groups are matched on some variable or variables). This nonparametric test involves
ranking these difference scores from the smallest to the largest one in absolute value
terms (i.e., without attending to the presence of a positive or negative sign), and it fol-
lows the same statistical assumptions associated with the Mann-Whitney U test (see
previous page). The Wilcoxon test produces a test statistic referred to as T (please note
that the letter is always capitalized so that it is not confused with the t reserved for its
parametric cousin; please also note that the Wilcoxon T is not a T [transformed] score;
see chapter 5).
The Wilcoxon test is symbolized T, not t; be certain not to confuse the two statistics
(here is a mnemonic: "Capital T is used when lowercase t cannot be").

A clinical researcher interested in decreasing social phobias might rely on the
Wilcoxon test to analyze data resulting from an intervention involving a therapy group.
The clinician is running a weekly group comprised of eight people who fear meeting
new people, public places, and having to engage in conversations with strangers for pro-
tracted periods of time. The clinician has a colleague who is a social worker rate the so-
cial acumen and competence of these group members before the first group meeting
occurs (this rater is only asked to perform the rating; the true purpose of the study is
not revealed to her). This pretest rating was based on a 20-point social competence
scale, where a higher score indicates greater competence. The rater was explicitly asked
to compare the social competence of each participant with that displayed by all the
other participants.
After the group therapy took place for 2 months, the clinician asked the social
worker to again rate the social competence displayed by the participants. Thus, the rater
used the same scale to assess the perceived social competence of the therapy group mem-
bers after 8 sessions (i.e., posttest rating). The pretest and posttest ratings are shown
below in Table 14.6.
Why is the Wilcoxon test a more appropriate analytic choice than the dependent
groups t test? For a couple of reasons, actually. First, the participants and their behav-
ior do not constitute a normal population where social competence is concerned; if any-
thing, their social reticence would best be described as extreme (i.e., far away from
average). Second, the rater was asked to make a comparison of each person to all the
others while executing the ratings, a request that built a ranking component directly
into the exercise. Thus, the only way to adequately assess the difference between pretest
and posttest behavior is through using the Wilcoxon T test.
Following Table 14.2, this line of reasoning leads us to complete step 1, the iden-
tification of an appropriate test statistic. Although the content of their research designs
varies from one another, the style or format of the null and alternative hypotheses of the
Table 14.6 Rated Social Competence of Eight Socially Phobic Therapy Group Members

(1) Participant   (2) Pretest   (3) Posttest   (4) Difference
1 8 11 3
2 14 15 1
3 7 12 5
4 8 6 -2
5 2 8 6
6 13 11 -2
7 10 14 4
8 9 9 0
Wilcoxon test is almost identical to that of the Mann-Whitney U test. Thus, the null hypoth-
esis for this group therapy study might be:
H0: There will be no systematic difference between the pretest
and posttest social competency skills.

It follows that the alternative hypothesis would be:

H1: There will be a systematic difference between the pretest
and posttest social competency skills.
Once step 2 is completed and we then decide on a significance level of .05 for the test
(i.e., step 3), we can perform the actual analysis for step 4.
To begin the analysis, each pretest rating is subtracted from its posttest rating,
and this difference score is placed in column 4 of Table 14.6. Because these differ-
ence scores are the basic observations in this correlated or dependent-groups re-
search design, only this single set of difference scores will be ranked in lieu of rank-
ing the two original sets of scores (see columns 2 and 3 in Table 14.6). We now
prepare the rankings for the analysis by:

■ Changing all the difference scores to absolute values (i.e., ignore the sign of the
difference score; see column 3 in Table 14.7).
■ The difference scores are then reordered from the lowest to the highest in value
(see column 3 in Table 14.7).
■ Beginning with "1," a ranking is then assigned to each difference score until N is
reached.
■ Difference scores with a value of 0 are not, however, included further in the
analysis (see the data for participant 8 in the top row of Table 14.7), and any ties
are broken using the procedure presented in Data Box 14.D (see the two
difference scores with values of −2 in Table 14.7).
■ The ranks showing positive differences are then placed into one column and those
illustrating negative differences are placed into another column (see columns 5
and 6, respectively, in Table 14.7).
■ Once the positive and negative ranks are separated in two columns, the sum of
each column should be noted beneath it (see the ΣR+ and the ΣR− under
columns 5 and 6).
Table 14.7 Calculating the Wilcoxon T Statistic from Difference Scores

(1)            (2)           (3)             (4)        (5)     (6)
Participant    Difference    |Difference|    Rank       R+      R−
8               0             0              dropped
2               1             1              1          1
4              −2             2              2.5                2.5
6              −2             2              2.5                2.5
1               3             3              4          4
7               4             4              5          5
3               5             5              6          6
5               6             6              7          7

ΣR+ = 23     ΣR− = 5
550 Chapter 14 Some Nonparametric Statistics for Categorical and Ordinal Data
We have not had occasion to discuss an important but often overlooked aspect of writing up
statistical results: what happens when we accept the null hypothesis, when we find that no
statistically reliable results were obtained in our analyses? Many novice data analysts become frus-
trated when expected results are not found (e.g., means are in the wrong directions, frequencies
spread across categories are almost identical to one another), so much so that they may be moved
to "shelve" the data and the analyses. In truth, there are many prominent behavioral scientists
who effectively do the same thing: they box up and file away (mentally as well as physically) the
null findings and move on to the next project.

There is nothing wrong with this understandably human reaction, but I advocate that you
should still try to learn from analyses yielding nonsignificant results, anyway. Here are some rea-
sons (guidelines, really) as to why you should write up results before putting them away:
• Null results are still results, and they can sometimes tell you something true or useful about
behavior that you heretofore overlooked. Writing up null results can help you to identify or
consider possibilities that you missed when you designed the project and executed the analyses.
• It is incumbent on any researcher or data analyst to tell the true story about his or her re-
search, warts and all. You should not write endlessly about "what might have been" or spec-
ulate in too great a detail in an article or paper about why some statistically reliable differ-
ence was not found. You should, however, report null results as a matter of course alongside
those findings that do match up to their hypotheses.
• A careful reading of the behavioral science literature will reveal that many, if not most, pub-
lications contain at least some null findings. Such empirical honesty is worthy of your con-
gratulation and emulation.
• An honestly written research summary can shed light on why a desired result was not found
(e.g., low power, small sample size, an odd sampling procedure).
• A research summary can point investigators and readers to consider new directions for the
research. Theory revision, then, becomes a possibility.
• By the same token, a written summary can help a researcher to see how subsequent studies
on the same or similar topics can be improved.
• Although it is difficult, if not impossible, to publish research results that are entirely null (see
chapter 15), researchers and data analysts will want to maintain a record of "failed" studies
for future reference. Such studies can often be "imported" into other work (e.g., a multi-
study publication, grant writing, conference presentations).
• Finally, sometimes the null hypothesis is actually false.
Suggested Guidelines for Reporting Null Results
1. Report all test statistics (e.g., t, F, χ², U, T, rs) and include their actual p values (if known),
as in p = .26, or report p = ns ("not significant").
2. Describe the original hypothesis and clearly acknowledge that the current results were not
consistent with it. Briefly speculate as to why this might be the case (remaining cognizant
that you cannot prove whether any explanation for a null result is compelling outside an ex-
periment designed to test its tenability).
3. Provide a table of means or, in the case of categorical data, frequencies or percentages. Link
the table to the statistical results (see point 1 above).
Remember that you are in good company. Getting a study to "work" takes practice, patience, ex-
perience, and time (see Dunn, 1999, for further discussion of this issue). Congratulate yourself
for having collected the data and/or performed the statistical analyses.
The Spearman Rank Order Correlation Coefficient 551
Unlike the Mann-Whitney U test, no further computations are necessary. The value
of Wilcoxon's T is equal to the smaller of the two sums of the ranks. In this case, T =
ΣR− = 5 (see Table 14.7). To determine the critical value of T (i.e., Tcritical), turn to
Table B.9 in Appendix B. This table of critical values simply requires the user to iden-
tify the value at the intersection of α (recall our previously determined significance level
of .05) and N; but N in the context of the Wilcoxon T statistic means the number of
nonzero differences found within the data. Thus, although we began with eight partici-
pants, one of them did not show any change from the pretest to the posttest (a zero dif-
ference), so that our N becomes 7 (cf. Gravetter & Wallnau, 1996). Following Table B.9,
then, the Tcritical value (two-tailed) is equal to 2.
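The whole procedure (drop zero differences, rank the absolute values with averaged ties, sum the positive and negative ranks, and take the smaller sum) can be sketched in Python; the function name is mine, not the text's:

```python
def wilcoxon_t(differences):
    """Wilcoxon T for a set of pre/post difference scores."""
    nonzero = [d for d in differences if d != 0]   # zero differences drop out
    ordered = sorted(nonzero, key=abs)
    ranks = []
    i = 0
    while i < len(ordered):                        # average ranks for tied |d|
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        ranks.extend([(i + 1 + j) / 2] * (j - i))  # mean of positions i+1..j
        i = j
    r_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return min(r_plus, r_minus), r_plus, r_minus
```

For the Table 14.6 differences (3, 1, 5, −2, 6, −2, 4, 0), this returns T = 5 with ΣR+ = 23 and ΣR− = 5, matching Table 14.7.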
Why sum ranks for both positive and negative difference scores? The Wilcoxon test
assumes that an effective treatment will show consistent effects (mostly + or −) on
difference scores, but that chance (no effect) will be apparent when inconsistent
difference scores appear (about equal numbers of + and −).

Can we accept or reject the null hypothesis? Similar to the Mann-Whitney U test,
the observed T statistic must be equal to or less than the Tcritical value. Because 5 is not
less than 2, we cannot reject H0, or:

T = 5 ≥ Tcritical = 2; Accept H0.

The observed T is greater than the critical value for T, suggesting that the two months
of group therapy were not successful in reducing social phobia or enhancing social
competence.

Writing About the Results of the Wilcoxon (T) Test

Like the Mann-Whitney U test, there is no prescribed format for presenting the results
of a Wilcoxon (T) test in APA style. Once again, though, a good rule of thumb is to
present the results in simple terms and to provide as much statistical information as
necessary. I hasten to add one more point: even the null results of the test we just re-
viewed should be written up for later use in a written report or oral presentation (see
also Data Box 14.E):
i
The eight members of a social phobia therapy group had their social competence rated
by an independent, expert judge prior to the start of the therapy (pretest) and then
I again after two months (posttest). The magnitude of change in social competence from
;
time] to timez was examined by a Wilcoxon T test. Despite the group therapy, the re-
sults revealed no significant change in social competence rankings, T = 5, P = ns, where
positive change in ranks totaled 23 and negative change totaled 5.
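The mechanics just described (drop the zero differences, rank the absolute differences with tied values sharing an average rank, then sum the ranks attached to positive and to negative differences) can be sketched in Python. The pretest and posttest ratings below are hypothetical, not the chapter's actual Table 14.7 data:

```python
def wilcoxon_t(pre, post):
    """Wilcoxon matched-pairs signed-ranks T: rank the absolute nonzero
    differences (ties share the average rank); T is the smaller of the
    positive-rank and negative-rank sums."""
    diffs = [b - a for a, b in zip(pre, post) if b - a != 0]  # drop zero differences
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(value):
        # average of the positions this absolute difference occupies
        positions = [i + 1 for i, x in enumerate(abs_sorted) if x == value]
        return sum(positions) / len(positions)

    r_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    r_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return min(r_plus, r_minus), len(diffs)  # T and the effective N

# Hypothetical ratings for 8 participants; one shows no change,
# so the effective N drops to 7, as in the chapter's example.
pre = [3, 5, 4, 6, 2, 5, 4, 3]
post = [5, 6, 4, 7, 4, 7, 6, 2]
t_obs, n = wilcoxon_t(pre, post)
```

With these invented scores T = 2 on N = 7 nonzero differences; the chapter's actual data yielded T = 5, which did not reach the Tcritical of 2.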
Pearson. Indeed, it is usually the case that interval or ratio-scaled data can be converted
into ordinal rankings with relative ease, making the Spearman rs a viable alternative in
many nonlinear situations.
When two variables are related to one another in a uniform manner, a rank or-
dering of their values will be linearly related (Gravetter & Wallnau, 1996). How so? Con-
sider the data for 5 individuals (denoted as A through E) shown below:
Individuals Variable X Variable Y
A 27 17
B 6 9
C 43 20
D 32 19
E 12 15
As you can see, the third person (C) has the highest scores on both variable X (i.e., 43)
and variable Y (i.e., 20), while the second person (B) has the lowest scores corresponding
to these two variables (i.e., 6 and 9, respectively). Actually, these data portray a consis-
tent relationship between the two variables, but if we plotted the X and Y scores shown
above, they would not appear to be linear because of the different scales involved.
What if we convert the raw scores for X and Y into rankings (i.e., 1, 2, 3, and so
on)? The lowest ranks for both the X and Y values can be assigned a "1," the next lowest
a "2," and so on up through the rank of 5 for each variable. Once these data are converted
to ranks (see table below), a different picture, a linear relationship between the
two variables, emerges.
Individuals Variable X Variable Y
A 3 3
B 1 1
C 5 5
D 4 4
E 2 2
Thus, the second person (B) is ranked first on both variables, the fifth person (E)
is ranked second, the first (A) is ranked third, and so on. When these ranks are plotted
and labeled, the linear relationship inherent in the data becomes clear:

[Figure: plot of the ranked X and Y values; the five labeled points (A through E) fall on a straight line.]
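The rank conversion just described is easy to verify in code. This sketch uses the chapter's own A-through-E scores; the simple index-based ranking assumes no tied scores (ties need the averaging procedure in Data Box 14.D):

```python
def to_ranks(scores):
    """Assign rank 1 to the lowest score, 2 to the next lowest, and so on.
    Assumes no tied scores."""
    ordered = sorted(scores)
    return [ordered.index(s) + 1 for s in scores]

x = [27, 6, 43, 32, 12]  # individuals A through E
y = [17, 9, 20, 19, 15]
rx, ry = to_ranks(x), to_ranks(y)
```

Because the two variables order the five individuals identically, the two lists of ranks match exactly, which is the perfect linear relationship the plot shows.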
Table 14.8

Actor     Judge 1's Rankings   Judge 2's Rankings    D    D²
Adam              1                    3            −2     4
Bill              2                    1             1     1
Cara              3                    2             1     1
Deena             4                    5            −1     1
Ernesto           5                    4             1     1
Fran              6                    7            −1     1
Gerald            7                    8            −1     1
Helen             8                    6             2     4
Σ                36                   36          ΣD = 0  ΣD² = 14
We can now apply the logic of the Spearman correlation and the linearity of rank-
ings in an example. In preparation for conducting an experiment on detecting decep-
tion, a social psychologist recruited a group of eight actors and asked them to tell a col-
lection of truths and lies while being filmed by a camera. The film was then shown to
two judges who were asked to rank order the actors from "best" to "worst" liar. The
judges' rankings are shown in Table 14.8, accompanied by the differences between
each pair of rankings (i.e., D = Judge 1's rank − Judge 2's rank). When any tied
ranks appear, follow the guidelines outlined in Data Box 14.D.
In order to assess the level of agreement expressed by these two judges, the social
psychologist performed a Spearman rs on their rankings. You will no doubt be pleased
to learn that calculating the Spearman rs is quicker and easier than the Pearson r. Here
is the formula for the Spearman rs:
[14.15.1]  rs = 1 − (6ΣD²) / [N(N² − 1)],
where D² is the difference between ranks, squared (see the last column in
Table 14.8), and N refers to the number of pairs of data (here, 8; not the total number
of observations).
Notice that the sum of each of the judges' rankings must be equal, while the
sum of the difference scores must equal 0 (see these respective sums shown under
columns 2, 3, and 4 in Table 14.8). The sum of the squared differences, which
will be included in the Spearman rs formula, is shown at the bottom of the last
column in Table 14.8 (i.e., ΣD² = 14). We can now complete the calculations for the
Spearman rs:
Marginal note: Be careful: The value determined on the right side of the Spearman formula is subtracted from 1, which appears on the left side of the formula. The number 1 should not appear in the formula's numerator!

[14.15.2]  rs = 1 − 6(14) / 8[(8)² − 1],

[14.15.3]  rs = 1 − 84 / 8(64 − 1),

[14.15.4]  rs = 1 − 84 / 8(63),

[14.15.5]  rs = 1 − 84 / 504,

[14.15.6]  rs = 1 − .17,
554 Chapter 14 Some Nonparametric Statistics for Categorical and Ordinal Data
[14.15.7] rs = .83.
Thus, there appears to be a very high degree of association between the judges' rank-
ings of the actors' skills at lying. Put another way, there is a relatively high degree of
consistency between the judges' rankings of the actors' lying skills.
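The full computation can be sketched directly from formula 14.15.1, using the judges' rankings from Table 14.8 (the formula assumes no tied ranks; ties are handled by the procedure in Data Box 14.D):

```python
def spearman_rs(ranks1, ranks2):
    """Spearman rank order correlation:
    rs = 1 - 6 * sum(D^2) / (N * (N^2 - 1)),
    where D is the difference between paired ranks and N is the number of pairs."""
    n = len(ranks1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

judge1 = [1, 2, 3, 4, 5, 6, 7, 8]  # Table 14.8: Adam through Helen
judge2 = [3, 1, 2, 5, 4, 7, 8, 6]
rs = spearman_rs(judge1, judge2)   # sum of D^2 is 14, so rs = 1 - 84/504
```

Rounded to two decimals this reproduces the rs = .83 obtained by hand above.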
Before we report this statistic in APA style, however, we need to determine whether
it is statistically significant. To do so, we turn to Table B.10 in Appendix B (please do
so now). Table B.10 contains both one-tailed and two-tailed critical values for rs. To
determine a critical value for comparison, we need to locate the N associated with the
calculation in the leftmost column of Table B.10 (once again, please remember that N
refers to the number of pairs of ranks included in the analysis). As usual, we will
perform a two-tailed test at the .05 level, so we read across the row for N = 8 and
locate the critical value of .738 (please verify that you can locate this critical value
in Table B.10 before proceeding).
Marginal note: When the Pearson r does not reach statistical significance, it is unlikely that the Spearman rs will either (and vice versa).

Is the observed rs of .83 greater than or equal to the rs critical value of .738? Yes, so
we can reject the null hypothesis of no difference, or:

rs(8) = .83 ≥ rs critical(8) = .738; Reject H0.

The judges' ranks reached a level of agreement that was greater than chance (i.e., significantly
different from 0). In the interests of thoroughness, the number of pairs of
rankings is included in parentheses following the rs in both the test statistic and the
critical value.
Knowledge Base

Marginal note: Congratulate yourself, you made it! The Spearman rs is the last statistic and set of calculations presented in this text.

1. Rank these scores and be sure to take any tied ranks into account: 2, 5, 8, 3, 1, 6, 8, 4, 12.
2. When performing a Mann-Whitney U test, should UA or UB be used to test a hypothesis?
3. Under what conditions is it proper to use the z transformation for the Mann-Whitney U test?
4. What hypothesis testing procedure associated with the Mann-Whitney U test and the Wilcoxon test renders them different from all other testing procedures introduced in this book?
5. How does the Spearman correlation coefficient differ from the Pearson correlation coefficient?
Since early in this century, social psychologists have argued that people's attitudes should
predict their subsequent actions, though demonstrating this empirical link has been a
challenge (Eagly & Chaiken, 1993; Wicker, 1969). Within this well-established research tradi-
tion, Wilson, Dunn, Bybee, Hyman, and Rotondo (1984) used the Spearman correlation
coefficient (rs) to demonstrate the disruptive effects of analyzing reasons on attitude-
behavior consistency.
Wilson and colleagues (1984) had research participants become familiar with a variety of
paper-and-pencil puzzles (e.g., letter series completion task) for 5 minutes (see Fazio & Zanna,
1981, for a review of research on direct experience with such attitude objects). During this "get
acquainted" session, half of the participants were instructed to analyze why they felt the way they
did about the puzzles ("I liked the maze because it was challenging"), while those comprising the
control group were given no additional directions. When the time was up, participants in the rea-
sons analysis condition wrote down the reasons they found each of the puzzles to be interesting
or boring. The control participants, on the other hand, were given a filler questionnaire to com-
plete. Later, all participants rated how interesting they found each of the puzzles to be on a (1)
extremely boring to (7) extremely interesting rating scale. They were then left alone for 15 minutes
with several packets containing the five puzzle types, which the experimenter told them they could
play with while waiting for the next part of the study. Unbeknownst to the participants, trained
coders behind a one-way mirror recorded the amount of time they played with each of the types
of puzzles.
Of chief interest to Wilson and colleagues (1984) was the within-subject Spearman rank order
correlation computed between participants' interest ratings of the five puzzles and the amount
of time they spent playing with each one during the 15-minute free play period. In the control
group, the average attitude-behavior correlation (i.e., rated level of interest in a puzzle with
amount of actual time spent playing with that puzzle) was rs = .54, p < .001. The average
correlation in the reasons analysis conditions, however, was not significantly different from 0 (rs =
.17, p = ns), but it was significantly lower than the mean correlation found in the control group,
t(24) = 2.23, p < .05.
Wilson and colleagues concluded that analyzing the reasons for one's feelings toward an
attitude object can change attitudes in a less accurate direction and that, as a result, the expressed
attitude does not correspond very well with subsequent actions (i.e., the observed discrepancy
found in attitude-behavior consistency). These investigators confirmed and elaborated on
these basic findings in a series of subsequent studies (see Wilson, Dunn, Kraft, & Lisle, 1989, for
a review).
Answers
1. Ranks: 1, 2, 3, 4, 5, 6, 7.5, 7.5, 9.
2. Whichever value of U is smaller is used to test whether to accept or reject H0.
3. When both groups of ordinal data have ns that are greater than or equal to 20.
4. Both the Mann-Whitney U test and the Wilcoxon test must show statistics (i.e., U or T) that are less than or equal to some critical value; all other statistical tests in this book require that the observed statistic equal or exceed some critical value.
5. The Spearman correlation coefficient assesses the degree of association between ordinally ranked data, whereas the Pearson is used with interval or ratio-scaled data.
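The z transformation mentioned in answer 3 can be sketched as follows. The group sizes and observed U below are hypothetical, chosen only so that both ns exceed 20:

```python
import math

def mann_whitney_z(u, n_a, n_b):
    """Normal approximation to the Mann-Whitney U distribution,
    appropriate when both groups contain 20 or more rankings:
    z = (U - mu_U) / sigma_U, with mu_U = n_a * n_b / 2 and
    sigma_U = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)."""
    mu_u = n_a * n_b / 2
    sigma_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mu_u) / sigma_u

# Hypothetical example: 25 and 22 rankings, observed U = 180.
z = mann_whitney_z(180, 25, 22)
```

The resulting z (about −2.03 here) is compared against the usual normal critical values, e.g., ±1.96 for a two-tailed test at α = .05.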
• TESTS ON DATA
There are two straightforward project exercises for this chapter, one dealing with the χ²
test and the other involving the Spearman correlation coefficient. Both exercises are
designed to give you a bit of experience collecting and analyzing some simple data using
these nonparametric tests. The first exercise, adapted from Tanner (1990), requires that
you collect data from your entire class (your instructor may assign this one) or some
other group of people, whereas the second entails performing a straightforward survey
with one of your peers.
Note: Your instructor may collect your responses to these questions so they can be pooled with those given
by the other members of your class.
Adapted from Tanner (1990, p. 185).
questions 1 through 4, did you and your peers have trouble generating numbers in
an equally likely manner?
5. Now compare the responses of the men ("1" = male) and the women ("2" = female; see Table 14.10) based on the data collected in the first round. (To answer this question, you will need to construct a 2 × 3 contingency table.) Is there any reason to believe that men and women differ with regard to number generation? If so, explain how they differ; does one group show a preference for a given digit, for example?
6. Compare the responses of the men and the women when the three rounds of data from Table 14.10 are combined (see question 4). Construct a 2 × 3 contingency table to answer this question. Is there any reason to believe that men and women differ with regard to number generation? If so, explain how they differ; does one group show a preference for a given digit, for example?
Alternatively, create a list of 10 or so films you have seen in the last few years and
then rank them. (If you would prefer another domain of interest, rank popular songs,
rock groups, actors and actresses, music videos, books, etc., instead.) Have a peer re-
view your list, asking him or her to rank order the same stimuli. Perform the Spearman
correlational analysis and evaluate the result.
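For the first project exercise, the goodness-of-fit computation on the pooled digit counts might be sketched like this (the counts below are hypothetical, standing in for whatever your class generates):

```python
def chi_square_goodness_of_fit(observed):
    """One-variable chi-square: compares observed category frequencies
    against equal expected frequencies, chi2 = sum((O - E)^2 / E)."""
    n = sum(observed)
    expected = n / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical counts of how often each of three digits was generated.
counts = [18, 25, 17]
chi2 = chi_square_goodness_of_fit(counts)  # df = number of categories - 1 = 2
```

Here χ² is 1.9 on 2 degrees of freedom, well below the .05 critical value of 5.99, so these invented counts would not depart significantly from an equally likely pattern.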
Summary
1. Research questions that cannot be analyzed by conventional inferential statistics can often be examined by nonparametric statistics.
2. Nonparametric statistics generally test hypotheses involving ordinal rankings of data or frequencies.
3. Where parametric tests require that certain assumptions be met, especially the shape of a population's distribution, nonparametric tests are said to be "distribution free." That is, nonparametric tests make no assumptions regarding parent populations or the shape of their distributions.
4. Parametric tests are very powerful when their assumptions are met, but nonparametric tests make worthy substitutes.
5. The advantages of nonparametric tests include that they are distribution free; can analyze data that are not precisely numerical; are applicable when sample sizes are small; and tend to be relatively easy to calculate.
6. Categorical or nominal data are analyzed by the chi-square (χ²) test, which compares observed frequencies against expected frequencies. The one-variable chi-square test, known as the chi-square test for goodness-of-fit, determines whether the obtained observations in each level of a variable are approximately equal (i.e., chance determined) or if they adhere to a pattern (i.e., a significant difference exists between the observed and expected observations).
7. The degrees of freedom for the chi-square goodness-of-fit test are based on the number of available categories rather than the total number of observations available.
8. The chi-square test for the independence of categorical variables indicates whether two variables are independent of one another (i.e., can be understood separately) or dependent (i.e., the effect of one variable cannot be properly understood without taking the other variable into account).
9. The degrees of freedom for the chi-square test for independence are based on both the number of rows (levels) representing one variable and the number of columns (levels) representing the other.
10. The chi-square test for independence reveals whether two variables have a statistically significant relationship with one another but not the strength of that relationship. The phi (φ) coefficient is used to measure the strength of association between variables from a 2 × 2 contingency table, whereas Cramér's V statistic is used when a contingency table exceeds this standard size.
11. The Mann-Whitney U test is a nonparametric test used to assess whether a statistically significant difference exists between two independent samples of rank ordered data. It is the nonparametric equivalent of the independent groups t test.
12. The Mann-Whitney U test can be applied when the data are ordinal; were randomly selected; and no ties are present within the rankings (or those ties have been dealt with using a standard procedure presented in the chapter; see Data Box 14.D).
13. Unlike previous inferential statistics, the Mann-Whitney U test only identifies a significant difference when the lower of its two U values is still less than or equal to some critical value of U.
14. When large sample sizes are available (i.e., both of the groups contain 20 or more rankings), then the normal approximation of the Mann-Whitney U test, one based on a z transformation, can be used for hypothesis testing.
15. The Wilcoxon matched-pairs signed-ranks test examines whether a difference between two dependent samples of ordinal rankings is significant. This test is based on the same statistical assumptions as its cousin, the Mann-Whitney U test; indeed, it also shares the same hypothesis testing procedure (i.e., the observed statistic, Wilcoxon's T, must be less than or equal to a critical T value in order to reject a null hypothesis).
16. Wilcoxon's T statistic is different from the t test (see chapter 10) or a transformed score, or T score (see chapter 5).
17. The Spearman rank order correlation coefficient (rs) assesses the degree of association between two sets of ordinally ranked data. The Spearman rs is ideally used when the available rankings are not linear.
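The chi-square test for independence and the phi coefficient summarized above can be sketched for a 2 × 2 table. The code uses the standard computational shortcut for 2 × 2 tables, χ² = N(AD − BC)² / [(A+B)(C+D)(A+C)(B+D)], and the cell counts are hypothetical:

```python
import math

def chi_square_2x2(a, b, c, d):
    """Standard computational shortcut for a 2 x 2 contingency table:
    chi2 = N * (A*D - B*C)^2 / ((A+B) * (C+D) * (A+C) * (B+D))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def phi_coefficient(chi2, n):
    """Strength of association for a significant 2 x 2 chi-square:
    phi = sqrt(chi2 / N)."""
    return math.sqrt(chi2 / n)

# Hypothetical 2 x 2 table of frequencies:
#        col 1  col 2
# row 1    20     30
# row 2    40     10
chi2 = chi_square_2x2(20, 30, 40, 10)
phi = phi_coefficient(chi2, 100)
```

For these invented counts χ² is about 16.67 on 1 degree of freedom (significant at .05) and φ is about .41; for tables larger than 2 × 2, Cramér's V replaces φ.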
Key Terms
Chi-square (X 2 ) test for Mann -Whitney U test (p. 541) Spearman rank order correlation
goodness-of-fit (p. 529) Nonparametric statistic coefficient (rs) (p. 551)
Chi-square (X2) test for (p.525) Wilcoxon matched-pairs signed-ranks
independence (p. 534) Parametric statistic (p. 525) test (p. 547)
Cramer's V statistic (p. 539) Phi (1)) coefficient (p. 539)
Chapter Problems
1. What is a nonparametric statistical test? How do nonparametric tests differ from parametric tests?
2. What is a parametric test? Do parametric tests share the same statistical assumptions with nonparametric tests? Why or why not?
3. How do interval and ratio scales differ from nominal and ordinal scales? Provide an example of each scale type.
4. What are some advantages associated with using nonparametric tests?
5. Why should researchers in the behavioral sciences be open to learning to use nonparametric statistical tests?
6. A researcher performs a study that yields no statistically reliable differences. What should the researcher do with these null results? Why?
7. Which nonparametric tests are most similar to the t test for independent groups?
8. What is the nonparametric test that is most similar to the t test for dependent groups?
9. Which nonparametric test can be used to analyze data from one sample only?
10. Which nonparametric test can analyze data based on two variables?
11. What is the nonparametric counterpart of the Pearson r?
12. A chi-square test performed on a 2 × 2 contingency table reaches statistical significance. Which supporting statistic can be used to describe the strength of association between the two variables? Why? (Hint: Use one of the decision trees presented at the opening of this chapter.)
13. Under what conditions is it preferable to use Cramér's V statistic instead of the phi (φ) coefficient?
14. A researcher creates a contingency table of frequencies based on the intersection of two variables, each of which has three levels. What nonparametric statistic should the researcher use to analyze the data? Why? (Hint: Use one of the decision trees presented at the opening of this chapter.)
15. Two restaurant critics for the same newspaper rank order the top 30 places to dine in the city. What nonparametric test will allow them to quickly determine their level of agreement? Why? (Hint: Use one of the decision trees presented at the opening of this chapter.)
16. The office of the dean of students wants to examine whether the priorities of freshmen change across their first year in college. The first year students ranked their priorities (e.g., grades, dating) during the first week of school and then again during the last week of the academic year. What test should be used to demonstrate that the students' rankings changed across time? Why? (Hint: Use one of the decision trees presented at the opening of this chapter.)
17. A pizza parlor wants to determine whether one of its "special" pizzas is ordered more frequently than two other special pies. The owner of the shop examines the phone orders for the three types of pizza for a week. What test should be used to demonstrate that requests for the three special pizzas varied? (Hint: Use one of the decision trees presented at the opening of this chapter.)
18. Rank order the following data and be certain to "break" any ties in the process: 12, 4, 3, 6, 8, 3, 4, 5, 9, 11, 3, 6, 7, 0, 1, 21, 45.
19. Rank order the following data and be certain to "break" any ties in the process: 99, 67, 32, 12, 100, 23, 33, 11, 6, 78, 34, 56, 22, 32, 70, 89, 99.
20. What are the expected frequencies for this contingency table?

    A1   A2
    10   23
    30    8

21. What are the expected frequencies for this contingency table?

    A1   A2
     9   10
    10    9

22. What are the expected frequencies for this contingency table?

         A1   A2   A3
    B1    5   10    3
    B2    8    4   10

23. What are the expected frequencies for this contingency table?

         A1   A2   A3
    B1    5   10   20
    B2   20    5   10

24. Use the chi-square test for independence shortcut presented in Data Box 14.B on the contingency table presented in problem 20. What is the value of the χ² statistic? Is it significant?
25. Use the chi-square test for independence shortcut presented in Data Box 14.B on the contingency table presented in problem 21. What is the value of the χ² statistic? Is it significant?
26. A professor wonders whether an unusual number of upper-class students are enrolled in his introductory psychology class, which is usually populated primarily by first-, second-, and third-year students. Here are the enrollment data:

    Freshmen   Sophomores   Juniors   Seniors
       15          12          14        27

    Is there any significant preference by senior class standing? Use an α level of .05 to test the appropriate hypothesis.
27. A university health center tracks the number of flu-related visits during each month of the fall semester. The center director wonders whether students come down with the flu more often around midterm (mid-October) and final (mid-December) exams. Can these data shed any light on this issue?

    Flu-Related Visits to the University Health Center (by months)
    September   October   November   December
       20          48         27         56

    Is there any significant difference among the flu-related visits during the fall semester? Use an α level of .05 to test the appropriate hypothesis.
28. The chair of a psychology department wants to determine if student gender is linked with area of interest in the field of psychology. She distributes a survey to a random sample of majors within the department, noting that each person can endorse one, and only one, interest area. Evaluate the data using an α level of .05 to determine if gender is linked with preference for an area of study:

              Clinical   Social   Biopsychology
    Male         15        23          10
    Female        8        25          40

29. A researcher wonders whether self-esteem is linked with willingness to persist at a difficult task. Evaluate the data using an α level of .05 to determine if self-esteem is linked with persistence:

                        Continue Working on Task   Quit
    High self-esteem              30                10
    Low self-esteem               14                25

30. What is the strength of association between the two variables in problem 28?
31. What is the strength of association between the two variables in problem 29?
32. A clinical psychologist wonders whether individuals with Type A personality answer the telephone faster (fewer rings) than those with the Type B personality. Use a Mann-Whitney U test with α = .05.

    Type A: 2, 5, 4, 3, 2, 3, 6, 3, 2, 4
    Type B: 5, 6, 3, 6, 7, 6, 5, 3, 5, 6, 4, 6, 8, 4

33. An animal behaviorist wonders whether one strain of rats runs in a play-wheel more often than another. The researcher observes the number of times 8 rats from each species run in a wheel for a 5-day period. Do the following data suggest that one species runs more often than the other? Use the Mann-Whitney U test at the .05 level of significance to examine these data.

    Species 1: 12, 32, 10, 20, 22, 19, 8, 8
    Species 2: 3, 4, 2, 1, 1, 3, 4, 7

34. Use the normal approximation of the U distribution to test the significance of the following research results. Use an α level of .05: NA = 46, NB = 34, UA = 312, UB = 332.
35. Use the normal approximation of the U distribution to test the significance of the following research results. Use an α level of .01: NA = 38, NB = 42, UA = 252, UB = 236.
36. A study on prejudice reduction assessed perceived changes in students' attitudes toward members of minority groups. A clinical psychologist rated the students' prejudicial attitudes before and after they participated in a weeklong encounter group with members of various minority groups. Examine the following data (pre- and posttest prejudice ratings for each of the students) and then use the Wilcoxon matched-pairs signed-ranks test (α = .05) to determine if there was a significant change from pre- to posttest attitudes.
    Student   Pretest   Posttest
    A            21        17
    B            17        13
    C            20        18
    D            23        16
    E            17        18
    F            25        19
    G            18        15
    H            19        13

37. Student teachers were rated on their classroom presence before and after being filmed teaching an elementary school class. All of the students watched themselves on film before teaching (and being rated again) in a subsequent class. Did the film appear to improve the new teachers' classroom presence? Use an α level of .05.

    Student Teacher   Before Seeing Film   After Seeing Film
    A                         3                    2
    B                         5                    3
    C                         4                    3
    D                         7                    5
    E                         2                    2
    F                         6                    4
    G                         8                    5

38. Two instructors are team-teaching a high school science class. In order to make certain that they are grading the students consistently, the instructors each rank the students in terms of their perceived ability from 1 to 10. Using an α level of .05, determine their level of agreement.

    Teacher A's ranking   Teacher B's ranking
             1                     3
             2                     2
             3                     1
             4                     6
             5                     7
             6                     5
             7                     4
             8                     8
             9                    10
            10                     9

39. Rank the following scores and then compute the Spearman rank-ordered correlation between X and Y (use α = .05 to test the hypothesis that rs is significantly different from 0):

    X    Y
    6   12
    7   10
    9   14
    4
    5    7
    7   10

40. Two variables are arranged into a 3 × 4 table, one where the entries (nominal data) appear in each of the 12 cells. Which nonparametric test should be used to analyze the data? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
41. Imagine that the analysis described in question 40 reaches statistical significance. What is the next step the researcher should take? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
42. A researcher wants to assess the degree of association between two sets of ordinal rankings. Which nonparametric test should be used to analyze the data? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
43. A researcher wants to determine whether a difference exists between two sets of ordinal rankings that are not independent of one another. Which nonparametric test should be used to analyze the data? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
44. After calculating a significant χ² test statistic based on a 2 × 2 contingency table, what should the data analyst do next? Why? (Hint: Use the decision tree(s) at the start of this chapter to answer these questions.)
Matching a Statistical Test to a Research Design
[Decision tree; the branches for steps 1-3 and steps 4-7 are not recoverable in this excerpt.]
1. Are participants randomly assigned to groups in your study?
2. Are participants randomly selected from some population?
3. Is at least one independent variable being manipulated in your study?
8. What are the two levels of the independent variable like? If different participants appear in each level, then do an independent groups t test: go to chapter 10. If the same participants completed a dependent measure twice, then do a dependent groups t test: go to chapter 10.
9. Were the same participants exposed to every level of the independent variable? If yes, then do a one-way repeated measures analysis of variance (ANOVA): go to chapter 13. If no, then do a one-way analysis of variance (ANOVA): go to chapter 11.
10. Were two (or more) measures per participant collected outside an experimental design? If yes, then use the Pearson product moment correlation: go to chapter 6. If no, then go to step 11.
11. Inferential statistics are not appropriate. Try presenting the results in a graph, table, or other diagram (go to chapter 3), or consider adopting a qualitative approach (go to Appendix F).
[A second checklist for evaluating research; its title and items 5-6 are not recoverable in this excerpt.]
1. Was the minimally sufficient statistical test used to analyze the data?
2. If causal conclusions were drawn from the data, was the evidence sufficiently strong to rule out alternative interpretations?
3. Do supporting statistics (e.g., effect size) accompany any test statistics?
4. Were any procedural problems reported in the Method or prior to the main Results?
7. Does psychometric information accompany any standardized tests, measures, or other tools?
8. Are the results of the research generalizable beyond the current work?
Chapter Outline
[Earlier outline entries are not fully recoverable in this excerpt.]
• Data Box 15.A: Statistical Heuristics and Improving Reasoning
• Data Analysis with Computers: The Tools Perspective
• Knowledge Base
• Thinking Like a Behavioral Scientist: Educational, Social, and Ethical Implications of Statistics and Data Analysis
• Data Box 15.B: Recurring Problems with Fraudulent or Misleading Data Analysis: The Dracula Effect
• Conclusion
• Project Exercise: A Checklist for Reviewing Published Research or Planning a Study
• Looking Forward, Then Back
• Summary
• Key Terms

When planning to write this book, I had occasion to examine many existing statistics
books in and outside of the behavioral sciences. Admittedly, I did not read
every book I came in contact with from cover to cover, but I did form some lasting
impressions of many works, taking notice of some of their more salient characteristics.
One aspect of practically all the books I looked at stood out more than any other:
rarely did a book have any sort of concluding chapter. Indeed, within my haphazard
collection of 20 or so statistics books (I admit the possibility of selection bias in this
nonrandom sample) only two texts bothered to include a conceptual parting shot.
Only two! The majority of the texts actually ended rather abruptly, most final chapters
being devoted to reviewing some nonparametric statistics.

Why the paucity of concluding chapters? Why no final words of wisdom or direction,
or at least an appeal to properly use, not abuse, statistics in the course of conducting
behavioral science research? I cannot speak for my fellow authors, but in my opinion
some closing comments designed to put what you have learned into perspective are
truly required. I want you to leave your course and this book with an awareness that
statistics and data analysis are for something, not just means to inferential ends. Thus,
we will conclude what is probably your first exposure to statistics and data analysis by
discussing the recent fuss over null hypothesis testing, a public discussion that is likely
to influence how statistics and data analysis are used in the future. We will then consider
how to avoid allowing statistical analysis to determine rather than merely steer your
thinking, emphasizing once more that good interpretation always supersedes a right
answer. I will suggest some ways to forge links between analysis and behavioral science
research. We will also consider computers as research tools, as well as some educational,
social, and ethical implications of statistics and data analysis. The book's final Project
Exercise provides a checklist of statistical issues to consider when evaluating or planning
a piece of research.
564 Chapter 15 Conclusion: Statistics and Data Analysis in Context
After reading an entire book on the statistical analysis of data, it may surprise you to
learn of a growing controversy surrounding the proper use of null hypothesis
significance testing. Many professional researchers claim that the statistical techniques
we reviewed throughout this book are often misunderstood and, worse still, misused in
the classroom, the lab, and out in the field. Concerns about the appropriate use of sta-
tistical methods are leading to new guidelines for students, teachers, and researchers in
psychology and the other behavioral sciences. We will review the fruits of this contro-
versy as a reminder that statistics are helpful tools for behavioral science research but
that they must not be used dogmatically.
Various researchers have questioned the wisdom of the prevailing reliance on null
hypothesis significance tests (e.g., Cohen, 1994; Loftus, 1993, 1996; Shrout, 1997).
These researchers highlight concerns including the inappropriate, even "slavish," re-
liance on significance tests when other-often simpler-ways of discerning differ-
ences among research groups are available; failing to consider issues of power and
effect size before planning and executing a study; and neglecting to view the presence
or absence of an observed difference as somewhat arbitrary. In addition, there is the simple failure to educate a lay public that still erroneously equates phrases like "statistical significance" with "truthful," "powerful," and "meaningful" (e.g., Scarr, 1997; recall Data Boxes 9.E and 10.C). These various and sundry problems suggest that researchers become "distant" from their data when they rely too heavily on statistics for statistics' sake. To borrow an observation from John W. Tukey (1969, p. 83), a staunch advocate for analytic clarity, "There really seems to be no substitute for 'looking at the data.'"
Though critics decry their overuse or misuse, null hypothesis significance tests are the analytic method of choice for most behavioral scientists.

Fortunately, Tukey's is not a lone voice. Geoffrey Loftus (1991, 1993, 1996), a cognitive psychologist and journal editor, jumped into the statistical fray by suggesting several alternatives to the usual null hypothesis-testing program found in the average journal article. His modest proposals for improving the way data are analyzed and presented include (adapted from Loftus, 1996):
• Plotting data for visual examination in lieu of presenting table after table of test statistics (e.g., F, t) with their accompanying p values. The heuristic value of this suggestion is obvious: it is easy to see the meaning (e.g., presence or absence of a predicted difference) in a graph of data (recall chapter 3; see also Tukey, 1977).
• Performing planned comparisons or contrasts, which can lead to more focused research questions and, eventually, agendas (see the discussion of contrast analysis in chapter 11).
• Reporting confidence intervals for any sample statistics. Smaller confidence intervals around a sample mean, for example, indicate greater statistical power; wider intervals point to less power (Loftus, 1996).
• Combining the effects of independent research efforts in order to isolate and identify consistent, predictable patterns of behavior. The advanced statistical technique used to achieve this synthesis is called meta-analysis, and it is beyond the scope of this book (but see, for example, Rosenthal, 1991). In a related vein, it is also important to assess the effect sizes (e.g., strength of association between independent and dependent variables) found within these efforts.
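Loftus's link between confidence interval width and statistical power follows directly from the formula for an interval around a mean. Here is a minimal sketch in Python; the standard deviation of 10 and the sample sizes are invented for illustration:

```python
import math

def ci_width(sd, n, z_crit=1.96):
    """Width of an approximate 95% confidence interval around a sample
    mean: 2 * z * (sd / sqrt(n))."""
    return 2 * z_crit * sd / math.sqrt(n)

# Hypothetical measurements with a standard deviation of 10:
wide = ci_width(10, 25)      # n = 25  -> width 7.84
narrow = ci_width(10, 400)   # n = 400 -> width 1.96
# The larger sample brackets the mean four times more tightly, which is
# what Loftus means by reading power off the width of the interval.
```

Because the width shrinks with the square root of n, halving an interval requires quadrupling the sample size.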
Forces and concerns like those raised by Loftus recently led the American Psychological Association (APA) to revisit the role of significance testing in psychology journals (Wilkinson & The Task Force on Statistical Inference, 1999).
Panel Recommendations: Wisdom from the APA Task Force on Statistical Inference
A task force of statistical and methodological experts was recently convened by the APA
to develop a list of recommendations for the use and reporting of statistical techniques
(Azar, 1999). The APA Task Force on Statistical Inference originally convened to address the concern that too many investigators were relying exclusively on null hypothesis significance tests in their research and that other useful statistical or data analytic techniques were being overlooked (other analytic matters were subsequently examined as well). One of the most historically contentious issues has been the faulty perception
that a study failing to reject a null hypothesis at the .05 (or less) level is "not signifi-
cant" so it is not worth publishing (e.g., Locascio, 1999). The presence or absence of
significance, of course, is not the point-any inferential light a statistic sheds on the
meaning of a result is the point.
Although the recommendations were developed for the psychological community, they are equally valid and useful for other behavioral science disciplines. In general, the points raised by the task force will come as no surprise to researchers, educators, and students who are conscientious about conducting and reporting research. One of the main points raised in the report is the importance of carefully and clearly describing a study's methodology, acknowledging any related limitations in its design or execution, and providing a sufficient amount of data for readers to critically evaluate the effort (Azar, 1999). In addition, researchers should describe the characteristics of any research populations as well as the membership in any control or comparison groups. Following the lead of Loftus (1996), researchers are also encouraged to carefully examine their data, plotting or graphing it whenever possible, so as to identify consistency as well as any aberrations within the observations. As one of the coleaders of the task force, Robert Rosenthal, put the matter, researchers should "make friends with their data" (quoted in Azar, 1999).

Remember, statistics are only tools; their interpretive meanings rather than their numerical values are what matter.
A particularly important message of the report is one that I have tried to em-
phasize throughout this book: Select the simplest or most straightforward statistical
test or approach to answer a question. It is never a good idea to become enslaved to
powerful but complicated computer packages or statistical software that removes the
user from his or her data. Any result should be cross-checked by hand calculation or
another program, and then conveyed in the clearest prose possible. The goal, as
always, is to focus on describing and interpreting behavior by using statistics as a
tool (recall chapter 1); data, not data analytic techniques, must be the focus of
the work.
The APA Task Force on Statistical Inference (Azar, 1999; Wilkinson & the Task Force
on Statistical Inference, 1999) also recommends that researchers and students alike:
Provide Supporting Statistics Along with Any Main Statistics. As we learned in pre-
vious chapters, any main statistical result should be bolstered by appropriate support-
ing statistics. When a mean difference is found, for example, be sure to include an
accompanying effect size statistic and/or a statistic demonstrating the amount of variance accounted for by the observed difference.
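A mean difference can be paired with Cohen's d, one common effect size statistic, computed here from the pooled standard deviation. The two groups below are invented purely for illustration:

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: the standardized mean difference, using the
    pooled standard deviation of the two groups."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

treatment = [12, 14, 11, 15, 13, 16]   # hypothetical scores
control = [10, 11, 9, 12, 10, 11]

diff = statistics.mean(treatment) - statistics.mean(control)   # 3.0
d = cohens_d(treatment, control)   # about 1.98 standard deviations
```

Reporting both numbers tells the reader not only that the groups differ, but by how much in standardized units.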
Account for Problems in Research Procedure Before Presenting Any Results. Any
number of problems can occur in the course of conducting a piece of research; some
are merely annoying while others are potentially hazardous to the drawing of mean-
ingful conclusions from the work. Be sure to report any problems (e.g., research pro-
tocol violations, missing data, participant attrition) before presenting the results so that
readers can consider the magnitude of the complication's effects on the data.
Document That Randomizing Procedures Are Truly Random. When participants are
described as being "randomly assigned" or "randomly selected," and so on, verify for
yourself and any readers that these terms are being used truthfully and accurately. If
not, then discuss what statistical or methodological controls were used to reduce po-
tential bias in the actual research and any conclusions drawn from it.
Define Any Variables in Detail and Explain Their Relevance. Independent and de-
pendent variables must be explicitly defined and presented contextually. Why was one
variable rather than another chosen for inclusion in a given project? The use of any and
all variables in any study must be justified, and their presence must support the goals
of the research effort.
Knowledge Base
1. True or False: Complex statistical analyses are better than simple ones because they
can address difficult research questions.
2. True or False: There is a growing consensus that there are viable alternatives to null
hypothesis significance tests.
Answers
1. False: Simpler statistical tests are easier to use, more familiar to users and readers, and more
readily interpretable.
2. True.
alienated from your data, merely going through the motions of an analysis. The survey researcher who continually collects data in the same way study after study, for example, is no different from the psychologist who uses the same paradigm in every experiment: not only is innovation lacking, but it is likely that novelties (real or potential) in the data are being missed.
Alienation is also inevitable when statistics and data analysis are placed on a high intellectual pedestal, one where quantification is portrayed as an unassailable authority ("If it's a number, it must be true"). Regrettably, such quantitative worship occurs when students, and all too many teachers and researchers, become empirical zealots who see rigorous statistical analysis as the sole way to discern cause and effect relationships in the world. Anything that does not conform to the dominant rules of evidence (formulas, Greek letters, p values, or any form of quantification) is deemed less worthy or, worse, less "correct" than observations made using some other technique (see the discussion of qualitative methods in Appendix F). Once again, I am not trying to debunk the arguments I made throughout this book: yes, statistical analysis is an important part of the research process, but it is only one part of the process. To unduly inflate the importance of statistics at the cost of other aspects of the research process is not only misguided, it reduces a researcher's flexibility and creativity.

Remain open to (and open-minded about) new analytic techniques and research methods.
Finally, alienation from data and even the research process occurs when statistics
and data analysis are viewed from a "been there, done that" frame of reference. Some
students feel that a statistics class is something to be suffered through once and that no
return engagement is necessary-statistics was a requirement for a major or for grad-
uation, one that merely needed to be checked off the list with all those other distribu-
tion requirements. Others see statistical analysis as a ritual to be performed in a set way
forever, one where no deviation from the script (or the old class notes or user-friendly
statistical software) is permitted. Believe it or not, some spontaneity in the planning
and execution of research is to be welcomed-and the field of statistics is one that ac-
tually continues to advance and grow. In any case, I hope that what you learned about
statistics and data analysis in this book is more than some rite of passage for you, that
exposure to quantitative methods will help you to be an active thinker, producer, and
consumer of research.
Reprise: Right Answers Are Fine, but Interpretation Matters More
Every picture tells a story, and so does every statistical result. In the same way that I
hope you avoid becoming alienated from your data, I hope that you will see beyond a
result. In other words, learn to focus on what the result of an analysis reveals about
behavior rather than the numbers present in the analysis. Similar to selecting the ap-
propriate test, getting the right or correct answer to a statistics question is a desirable
goal. Remember, though, not to become too fixated on the value of a test statistic or
whether its level of significance is .03 or .003. Your concern is better directed at dis-
cerning what a significant or nonsignificant test statistic tells you about people's be-
havior. How does a given statistical analysis inform or change your thinking about some
behavior? The statistic itself is mute on this matter-you must use a combination of
theory and inference in order to develop a coherent explanation for what people do un-
der particular conditions, and why.
As noted above, statistics provide credibility in the form of technical support, but such support is incidental to the goal of describing what happened behaviorally. The meaning underlying a statistical result matters much more than the result itself. I encourage you to keep Abelson's (1995) MAGIC criteria in mind whenever you are considering the meaning and importance of a statistical result based on your own work or one you come across in the published literature (see Data Box 9.F; see also Dunn, 1999).

Numerical results are secondary to behavioral results.
You may recall that MAGIC stands for a result's magnitude, articulation, generality, in-
terestingness, and credibility. How strongly does a result support a hypothesis? Can a
finding be explained clearly and succinctly? Does a result apply to other contexts? Who
will care about a finding? Is there sufficient evidence to believe what a result reveals?
Abelson exhorts researchers to examine results through the critical lens of these five cri-
teria, which also serve as reminders that numerical results represent only a superficial
part of the story. It is the relationships among variables in a set of data-the stories the
numbers tell about the interplay between or among the variables-which matter more
than any clean, correct statistical result.
Students in the behavioral sciences improved their statistical and methodological reasoning across their four undergraduate years (Lehman & Nisbett, 1990). Foundation courses in statistics and research methods favorably affected students' inferential reasoning across time, so much so that they outperformed peers in the humanities and the natural sciences (Lehman & Nisbett, 1990; see also Data Box 15.B; Lehman, Lempert, & Nisbett, 1988; Nisbett, Fong, Lehman, & Cheng, 1987). Another study, one emphasizing methodological issues (dependent in part on statistical knowledge), corroborates these results. VanderStoep and Shaughnessy (1997) found that students enrolled in a one-semester research methods course improved their inferential reasoning (e.g., generalizing from samples to populations, recognizing regression to the mean) regarding real-life events compared to a control group enrolled in a different behavioral science course.
Third, and arguably most important, you can pledge to conduct some independent piece of research that relies on statistics and data analysis at your earliest opportunity. There is really no better way to reinforce what you learned than by applying these skills in a project run from start to finish. Research projects run by individuals or groups of students can be conducted in the context of almost any content course (e.g., social psychology, cognitive psychology), as an independent study guided by a faculty member, or as a senior or honors project. A variety of books are available that can help you to develop ideas for a project, prepare the necessary groundwork, and then actually run the study (e.g., Cozby, 1997; Dunn, 1999; Martin, 1996; Pelham, 1999).
Alternatively, you could hone your research skills by volunteering to work in a fac-
ulty member's lab on his or her program of research. Many faculty members are de-
lighted to have help with their work, and they truly enjoy teaching students about their
research passions outside the classroom. It is not uncommon for such volunteer efforts
to blossom into collaborative ones where students serve as coinvestigators-indeed,
many prominent behavioral scientists began their careers in this fashion. This sort of
"behind the scenes" experience is excellent preparation for graduate study, as well as a
meaningful way to make an informed choice about your future.
Knowledge Base
1. True or False: Statistical analysis remains the best method available for drawing
conclusions about behavior.
2. Why is the interpretation of a result more important than the statistics used to find
it?
3. How can students maintain their statistical knowledge base?
4. True or False: Less is generally more where reliance on computer software is
concerned.
Answers
1. False: Statistical analysis is but one perspective on behavior. Although it is certainly a dom-
inant mode of inquiry, others (e.g., qualitative approaches) serve as compelling alternatives.
2. Statistical analysis is an important tool used to discern or support the interpretation of a re-
search result. Such statistics are not as important as the result, however.
3. Students can keep their knowledge of statistics fresh by relying on chapter decision trees, by
remembering that transfer of learned material does occur, or by conducting an independent
project or volunteering to work on scholarly projects with active researchers. Learning to ask
for help, too, can enhance recall of course material.
4. True. Computers are extremely useful tools, but data analysts should avoid becoming dependent on them.
children than were other mothers. Moreover, these mothers were also likely to partici-
pate in various activities with their intended children, including taking them places out-
side the home. The lead researcher, Dr. Jennifer Barber, told an interviewer that "moth-
ers with intended births engaged in fun stuff like going to the movies or the zoo with
their children 3.7 times a week" (Berger, 1999). Mothers with unintended births, how-
ever, reported doing such pleasurable activities only an average of 3.5 times per week. Dr.
Barber commented that, "Over the lifetime of a child, that's a huge difference" (Berger,
1999, p. F8).
Many readers, even the most sophisticated ones, might have stopped short of say-
ing that the difference between a mean of 3.7 and 3.5 was a particularly "huge" one be-
cause the article was short-only a handful of paragraphs-and written for the lay pub-
lic (the data were originally published in an issue of the Journal of Health and Social
Behavior; see Barber et al., 1999). As a statistically knowledgeable individual, however,
you understand the meaning the researchers were trying to convey, that it is not the
means per se that matter, but their direction and magnitude. The difference begins to
be more powerful, if not sobering, when we learn that the survey included 13,000 respondents who were interviewed between 1987 and 1988.
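A rough calculation shows why so small a mean difference can reach statistical significance in so large a survey. The within-group standard deviation and the equal group split below are assumptions made for illustration; the news article reports neither:

```python
import math

# Hypothetical recreation of the comparison: 3.7 vs. 3.5 outings per week.
# The SD of 2.0 outings and the 6,500-per-group split are invented figures.
m1, m2, sd, n_per_group = 3.7, 3.5, 2.0, 6500

se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
z = (m1 - m2) / se                     # roughly 5.7, far beyond 1.96
d = (m1 - m2) / sd                     # 0.1, a very small effect size
```

Under these assumptions the test statistic is enormous while the standardized effect is tiny, which is exactly why direction and magnitude, not the bare p value, deserve the reader's attention.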
Furthermore, we need to move beyond the numbers, as it were, to think about what the occurrence (or relative absence) of pleasurable outings represents in the context of familial relationships. Emphasizing context enables us to examine the effects of relationship quality in childhood and its impact on later socioemotional development, a feat that cannot be achieved when results are not "unpacked" for readers. Unless attention is given to the educative nature of statistics and data analysis in situations like this one, there is the unfortunate possibility that the implications of results can be missed, or at least misconstrued.

Remember: data are meaningless without context.
The possibility of misconstruing statistical results, especially controversial ones, is
related to the second area, social welfare. Some readers might wonder what statistics could
possibly have to do with the welfare of society but, again, behavioral scientists have a
responsibility to present their work in constructive ways-even when the fruits of their
labors are not greeted warmly. At the same time I read the article on unintended births,
I read a second, longer one discussing some data that have not yet been published.
Because the data are so controversial, however, they began to attract attention before
appearing in any peer-reviewed publication.
The investigators, two economists, suggest that a portion of the large drop in the national crime rate in the 1990s, maybe as much as 50%, is attributable to the distinct rise in the number of abortions since the Roe v. Wade ruling made by the Supreme Court in 1973 (Goode, 1999; see also Perman, 1999). In other words, at least some of the reduced level of crime is due to the fact that children who might have grown up to perpetrate the crime were never born. The constitutionally guaranteed right to abortion led to approximately one-quarter of all pregnancies being aborted within a few years of the Roe v. Wade ruling.
The conceptual details and technical aspects of the economists' argument are beyond the scope of our present discussion, but the factual side of their work, and its controversial if unintended effects, is not. The two researchers conducted their research
in the spirit of interested economic inquiry, as many writers, scholars, politicians, and
pundits have tried to account for the drop in crime rates over the last decade. The econ-
omists simply wanted to demonstrate that a given social factor (i.e., legal right to abor-
tion) could be legitimately linked with a second social factor (i.e., reduced crime rates).
They never intended to make any controversial splash with their work, nor did they
think anyone would react to it as a public policy piece, something it was never intended
to be (the authors were never out to advocate abortion). According to Goode (1999),
1. Hoaxing involves claiming that some novel result has been found when there is no
concrete evidence for it. A researcher perpetrates a statistical hoax by claiming to
have measured some behavior or detected some significant difference when noth-
ing of the kind actually occurred. Hoaxing is, of course, pure fabrication and, there-
fore, a form of intentional fraud.
2. Forging is a close cousin of hoaxing, but it is worse because the false claim-the
hoax-is supported by fabricated ("forged") data. A behavioral scientist practicing
forging would make up some data and then perform statistical analyses on it so as
to show results confirming some favored hypothesis. Sadly, this type of forging oc-
curs, even among eminent scientists (see the case of Cyril Burt and arguments for
the heritability of intelligence; Gould, 1981).
3. Cooking data entails selecting (and perhaps selectively presenting or publishing)
only those observations that confirm a cherished hypothesis or theory. A behav-
ioral scientist guilty of cooking his or her data would simply drop, overlook, or
otherwise remove any offending (usually extreme) observations from the analysis,
perhaps pushing the (recalculated) difference between or among a group of means
toward significance. To be sure, removing problematic observations from a data
set is acceptable, but only under stringent, clear, and proscribed conditions-
conditions normally determined prior to the start of data collection and absolutely
before any analysis was undertaken (for further discussion, see Dunn, 1999). Cook-
ing is perhaps less dramatic than hoaxing or forging, but it is just as insidious and,
in any case, careful readers are always suspicious of data that look too perfect.
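One way to honor the "decided in advance" requirement is to write the exclusion rule down, even in code, before any data arrive. The sketch below assumes a pre-registered cutoff of z = 3.0; the rule and the scores are hypothetical:

```python
import statistics

def flag_outliers(scores, z_cutoff=3.0):
    """Flag, rather than silently drop, observations whose distance from
    the sample mean exceeds a cutoff fixed before data collection."""
    m = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [x for x in scores if abs(x - m) / sd > z_cutoff]

# Nineteen typical scores plus one apparent recording error:
scores = [5, 6, 5, 7, 6, 5, 6, 5, 6, 7, 5, 6, 5, 6, 5, 6, 5, 6, 5, 40]
flagged = flag_outliers(scores)   # flags only the extreme score, 40
```

Reporting which observations were flagged, and under what rule, keeps readers in a position to judge the decision, which is the ethical point of the passage above.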
Recurring Problems with Fraudulent, False, or Mistaken Data Analysis: The Dracula Effect
I often think of fraudulent or false data that manage to get published as being a lot like Dracula. Think of all those vampire movies you have ever seen: generally, even after Dracula has a stake driven through his heart, he comes back, maybe not in that movie, but certainly in a host of sequels. In much the same way, false results, whether erroneous due to chance or through calculated fakery, do not "go away," either. It is very hard to "kill" a false result once it is published in the scientific literature. Here's why:
• It might never be caught or recognized as false, as when other researchers try to publish results conflicting with the accepted (but false) ones. Peers may reject these competing works as "wrong," or challenging data may simply be deemed inconsistent with the existing (unknowingly false) data.
• Alternatively, a false result might be acknowledged as such, as when the original author pub-
lishes a retraction and apology. There is still a problem, however, because not everyone who is
aware of the original "result" will ever learn of the retraction!
• A false result can spur an ongoing series of research exchanges designed to prove or disprove
the veracity of the result. On the one hand, such intellectual police actions are probably a good
idea to the extent they alert readers to the questionable findings of some study. On the other
hand, however, the authors of such follow-up investigations lose valuable time and energy in
this diversion-their efforts are thus drawn away from their own research agendas.
• Finally, false findings affect the research programs of committed investigators, leading them
down blind empirical alleys, and wasting their efforts and funding in the process. This re-
source drain probably sets researchers and sciences back for quite some time, as everyone in-
volved-researchers, teachers, students, and members of the general public-lose out when
false results are perpetrated or perpetuated.
In the end, you must make a concerted effort to be an honest researcher whose integrity and work ethic preclude committing the sort of transgression shown in Figure 15.1. The humorous intent of Figure 15.1 belies a larger problem, however: As
Babbage knew, some people misuse statistics and data analysis for inappropriate ends.
Avoid being one of them, and do your best to combat erroneous statistics and data
analysis whenever you encounter them. It is your responsibility as an educated person,
data analyst, and producer and consumer of statistics.
One final ethical concern for the data analyst is worth mentioning: Should one report all the statistics that were computed or only those that reached statistical significance? If I did my job well earlier in the book, then your automatic response to this question should be something like, "Well, it all depends." Why? Just because a statistic is significant does not mean it is worth reporting; by the same token, some nonsignificant statistics can be informative. What to report or leave out all depends. The point, then, is to be honest with readers by specifying what statistical tests were conducted (and why). You may not need or want to report all the results from the data that were analyzed, but you are honor bound to disclose what sort of analyses were actually conducted even when the specific findings are not provided. Following the advice of Kromrey (1993) and other researchers, you do not want to give the misleading impression that the reported (mostly significant) tests were the only ones actually performed (but recall Data Box 14.E).
Conclusion
I closed the first chapter of this book by reciting a fortune appropriate to an individual's
first exposure to statistics and data analysis, so it is only fitting to do the same here and
now, at the journey's end. Whatever your goals are now or in the future, I know that the
quantitative knowledge presented in this book will serve you well, teaching you to look
at the world around you from a different perspective. Maintaining this different point of
view will refine the judgments you make because, in the spirit of the slip of paper found
in a recent fortune cookie, "You should be able to undertake and complete anything."
This final Project Exercise is designed to supplement the decision tree that opened the
chapter, as well as to build on the aforementioned goal of applying the statistical knowl-
edge you acquired in the course of reading this book. You can review three aspects of
your knowledge by performing this Project Exercise: characterizing what existing or po-
tential data are like, identifying the purpose of a piece of research, and reviewing or
reporting statistical information. The Project Exercise consists of a series of simple ques-
tions (and accompanying checklists) designed to help you evaluate published data and
analyses appearing in a journal article, chapter, or book. The questions are designed to
refresh your memory about a test, as well as to verify the proper use of an analysis and
the presence (if any) of supporting statistics.
As you review some published data, for example, you can compare the authors' analyses and results with those prescribed by the checklist shown below. Alternatively,
the following questions can also be used as a checklist during the analysis planning stages
of a research project you are conducting (e.g., to keep track of completed analyses, to
plan any remaining ones). When you set out to plan analyses-preferably in advance of
any actual data collection-the questions and checklist provided below will help you to
frame your thinking appropriately. You can also rely on the three sets of questions to or-
ganize your presentation of data and any analyses you have already completed.
1. Characteristics of the data
   a. What is the scale of measurement?
      __ nominal
      __ ordinal
      __ interval
      __ ratio
   b. What is the shape of the distribution of data?
      __ normal
      __ positively skewed
      __ negatively skewed
      __ bi- or multimodal
   c. Is the sample size reasonable?
      __ very small (potential for bias)
      __ small
      __ adequate
      __ large
2. Purpose of the research
   a. Is the research descriptive?
      __ summary or descriptive statistics (i.e., mean, variance, standard deviation, range)
      __ tables (e.g., frequencies)
      __ graphs (e.g., bar graphs, Tukey's tallies)
   b. Is the research focused on associations between variables?
      __ continuous data are analyzed by the Pearson correlation coefficient (r)
      __ discontinuous (discrete) data are analyzed by the Spearman rank-order correlation coefficient (rs)
   c. Is the research predictive?
      __ continuous data are analyzed by linear regression
   d. Is the research inferential, generally focused on mean comparison?
      __ one independent variable
         __ one mean: single-sample t test or z test
         __ two means
            __ independent groups t test
               __ estimated ω²
            __ one-way ANOVA (F test) for two means
               __ estimated ω²
               __ f (effect size)
            __ dependent groups t test
         __ three or more means
            __ one-way ANOVA (F test)
               __ Tukey's HSD for post hoc comparison of means
               __ estimated ω²
               __ f (effect size)
            __ one-way repeated-measures ANOVA (F test)
               __ Tukey's HSD for post hoc comparison of means
               __ estimated ω²
               __ f (effect size)
      __ two independent variables
         __ two-way ANOVA (F test)
            __ estimated ω²
            __ f (effect size)
   e. Is the research focused on nonparametric differences?
      __ discontinuous (discrete) data are:
         __ ordinal
            __ association between ranks: Spearman rank-order correlation coefficient (rs)
            __ independent rankings: Mann-Whitney U test
            __ dependent rankings: Wilcoxon matched-pairs signed-ranks test
         __ categorical
            __ one variable: chi-square (χ²) goodness-of-fit test
            __ two variables: chi-square (χ²) test for independence
               __ phi coefficient (φ)
               __ Cramer's V
   f. Other _____________________________________
Summary
1. Traditional null hypothesis significance tests have been repeatedly criticized as being too confining, limited in scope, or flagrantly misapplied by many researchers. The APA's Task Force on Statistical Inference recommends that researchers be more flexible and less dogmatic where statistics and data analysis are concerned.
2. Meaning is more important than method, and researchers are encouraged to examine their data more closely and with more of an eye to replication than has heretofore been the case. Generalizing results from one setting to another is as dependent on the practical matter of replication as it is on the statistical significance of results.
3. Readers were warned not to treat statistics as an ideology or too stringent a set of beliefs or procedures for working with data. A statistical ideology can lead to alienation by reducing a researcher's analytic flexibility, promoting mere analysis over the importance of interpretation, or curtailing openness to new analytic techniques.
4. The results of any statistical analysis should be viewed from their interpretive utility first and foremost: does a finding increase understanding of behavior? Whether a calculation leads to a right or correct answer is less important than what the answer reveals about behavior. Data analysts should always keep Abelson's (1995) MAGIC criteria (magnitude, articulation, generality, interestingness, and credibility) in mind.
5. Learning about statistics and data analysis is more than plugging numbers into formulas. A prepared, thoughtful data analyst knows which statistical test to use under what conditions, tries to retain statistical knowledge by being involved in research, and knows when to seek advice or assistance.
6. Computers are valuable aids, but like statistical analysis itself, they ideally serve as tools to help the research process run smoothly. Budding statisticians should avoid becoming overly dependent on computers and their statistical software; it is too easy (and problematic) to let a program do your thinking for you.
7. In order to truly think like a behavioral scientist, one must consider the educational, social, and ethical implications of statistics and data analysis. A researcher must educate audiences about the interpretation, meaning, and application of any result by placing it in context. When a result has social implications, even controversial ones, a researcher must defend the choice of an analysis and its meaning, and be accountable for both.
8. Ethical standards in statistics and data analysis require researchers to acknowledge and avoid four forms of scientific fraud identified by Babbage (1989). These include pointing to findings that do not exist (hoaxing), fabricating data and analyses (forging), selectively presenting information (cooking), and removing extreme scores from samples (trimming).
9. The chapter and book conclude by encouraging readers to undertake behavioral science research and the statistical analysis of data with confidence.
Key Terms
Ideology (p. 567)   Meta-analysis (p. 564)
Chapter Problems
1. Identify some of the problems or concerns linked to the use of null hypothesis significance testing in the behavioral sciences, especially psychology.
2. What are some of the remedies proposed to improve the use of significance testing and the reporting of results in the behavioral sciences?
3. Discuss six of the specific recommendations made by the American Psychological Association's (APA) Task Force on Statistical Inference (TFSI). How can these recommendations improve the use of statistics and data analysis, as well as the reporting of written results?
4. In lieu of exclusive reliance on statistical results, what alternative do Loftus and others recommend? In your opinion, why have researchers been relatively slow to embrace this alternative?
5. How can replication be used to demonstrate generalizability?
6. Why should researchers avoid treating statistics as an ideology? How can investigators become alienated from their data?
7. In your opinion, is it possible for a researcher to become overly dependent on statistics? How?
8. Are right answers to statistical calculations more or less important than their interpretation?
9. How do Abelson's (1995) MAGIC criteria impact on the tension between obtaining right answers and offering accurate interpretation?
10. Why is it important to transfer statistical knowledge from one setting to another?
11. What evidence supports the contention that statistical or data analytic skills acquired in one domain can be transferred to others? Explain.
12. How can first-time data analysts retain the statistical knowledge and skills they have acquired after reading a book like this one or taking a statistics course?
13. As a novice data analyst, how will you keep your statistical skills sharp?
14. Characterize the proper role of computers and statistical software in data analysis.
15. When planning to analyze some data using statistical software, what are some of the dos and don'ts a data analyst should follow?
16. Why is the computer a "tool"? Explain.
17. Why should behavioral scientists be concerned about the educational implications of the analyses used to support their research?
18. Why should behavioral scientists be concerned about the social implications of the results and analyses based on their research?
19. You are a behavioral scientist whose research and analyses have led to some controversial finding that is upsetting to the general public. How should you react? What are your responsibilities to the public and to your research, including the statistical analyses?
20. Identify, define, and explain the four classes of scientific fraud identified by Babbage (1989).
21. Why do you suppose some investigators might see "trimming" as more acceptable than the other three classes of fraud? Is it more acceptable? Why or why not?
22. Why is scientific fraud so dangerous to the scientific enterprise, including statistics and data analysis?
23. Why are fraudulent, false, or otherwise misleading data so problematic where the scientific literature is concerned?
24. What is the "Dracula effect"? What problem does it present to behavioral science?
25. Is a researcher ethically bound to report the results of all statistical analyses performed on a data set? Why or why not?
26. In your opinion, what is the proper role of statistics and data analysis in the behavioral sciences?
27. In your opinion, what contributions do statistics and data analysis make to the behavioral sciences?
28. A criminologist studies the decrease in juvenile delinquency and petty crime in a group of middle school students following an intensive educational intervention. Delinquent behavior was measured before and then 6 months after the intervention. The criminologist wants to demonstrate a decrease in mean crimes committed by the children across the 6-month period. Which statistical test is best suited to test the researcher's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
29. An animal breeder compares the rated temperaments of four different breeds of dogs (Labrador retriever, rottweiler, fox terrier, Jack Russell terrier). How can the breeder determine which breed has the least reactive (i.e., emotional) temperament, which one has the most, and so on? Which statistical test is best suited to test the breeder's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
30. A college admissions director wants to compare the mean achievement test scores of this year's entering class with the average of the previous year. The director believes that the entering class has a higher mean score than last year's average. Which statistical test is best suited to test the director's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
31. Imagine that the admissions director described in question 30 was interested in comparing the class rank of the entering class against that of the rank found in the previous year's class. The director believes that last year's class has a lower rank than this year's class. Which statistical test is best suited to test the director's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
32. A family practice physician examines the link between age, gender, and the incidence of motion sickness in children. She recruits three groups of children (ages 4, 10, and 14). Each group has an equal number of males and females in it. The doctor believes that motion sickness decreases with age, but that the decrease is especially pronounced in female children. Which statistical test is best suited to test the doctor's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
33. A developmental psychologist studies whether the details young children leave out of stories are linked to the well-known serial position effect associated with memory research (i.e., items appearing at the start or finish of a list are recalled better relative to those appearing in the middle). The researcher has a group of children listen to a story and then each child (separately from the others) is asked to repeat the story from start to finish. The children's recall is then checked for accuracy (i.e., how many errors occur at the beginning, middle, or end of the story). The psychologist assumes that most errors will occur in the middle of the children's stories. Which statistical test is best suited to test the researcher's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
34. A medical researcher looks at the incidence (number of cases) of meningitis in a large city, categorizing the occurrence of the disease by age (i.e., 25 and under, 26 to 50, and 51 and above). The researcher wants to determine whether meningitis risk is linked to any particular age group. Which statistical test is best suited to test for this link? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
35. A demographer believes that the number of siblings (X) people have is inversely related to the intellectual outlook of the family environment (Y). The researcher rates the intellectual atmosphere of a large number of families and then tries to see if fewer siblings predict a more favorable intellectual family environment. Which statistical test is best suited to test the researcher's hypothesis? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
36. A professor of nursing has male and female college students categorize themselves as light, medium, or heavy (binge) drinkers of alcoholic beverages. This nursing researcher wants to see whether gender is linked to how students categorize themselves as drinkers. Which statistical test is best suited to test the researcher's question? Why? (Hint: Use the decision tree at the opening of the chapter to answer this question.)
APPENDIX A
Basic Mathematics Review and Discussion of Math Anxiety
Overview
This appendix presents the basic mathematical operations that are used throughout the calculations pre-
sented in this book (i.e., addition, subtraction, multiplication, division, exponentiation and square roots, ab-
solute values, and elementary algebra). Many readers will be very familiar with the operations presented here,
needing only to skim the following material. Other students might feel a bit rusty, however, so this review
should be useful in jump-starting their memories of basic math and algebra. To help spur recall, I include a
Knowledge Base at the end of each section, which can serve as a short quiz of material that was just presented.
If you find yourself having difficulty with particular but consistent sorts of problems within a given review
section and/or Knowledge Base, you should seek help from your instructor, a teaching assistant, or a tutor.
Alternatively, consider looking up the relevant topics in one or more of the references in basic mathematics
provided at the end of the first half of this appendix.
The second half of Appendix A briefly discusses math anxiety-the fear of doing poorly in math or ac-
tivities requiring mathematics (including, naturally, statistics)-and what can be done about it. I sincerely
regret that I cannot dedicate a large amount of space to addressing ways to combat math anxiety in this book.
What I can do, however, is to identify the symptoms and consequences of math anxiety and then encourage
you to seek appropriate assistance if you believe you are experiencing it. To that end, some references on the
topic are provided at the end of this appendix.
Where possible, I consciously try to avoid repeating information that I presented elsewhere in the book.
Thus, you will need to turn to chapter 1 for a review of the order of mathematical operations and see Table
1.1 for a comprehensive list of mathematical symbols, their meaning, and an example illustrating each one.
Addition and Subtraction
The order in which numbers are added has no effect on their sum. Thus,
[A.1.1] 2 + 7 + 3 = 12
Appendix A Basic Mathematics Review and Discussion of Math Anxiety
is equivalent to
[A.2.1] 7 + 3 + 2 = 12.
Note, too, that although the presence of parentheses indicates which numbers should be summed first, this
imposition of order has no effect on the final sum, as in:
[A.3.1] (2 + 7) + 3 = 12,
[A.3.2] 9 + 3 = 12,
[A.3.3] 12 = 12.
Adding positive and negative numbers together has no effect on a sum, either. Consider this equation,
where the negative sign is treated as a subtraction sign:
[A.4.1] 4 + (-2) + 3 = 5,
[A.4.2] 4-2+3=5,
[A.4.3] 2 + 3 = 5,
[A.4.4] 5 = 5.
What happens when a long string of positive and negative numbers is present? An easy way to proceed is to
(a) add all the positive numbers together, (b) add all the negative numbers together, and then (c) subtract
the negative sum from the positive sum. Thus, for instance, we might have:
[A.5.2] 4 + 6 + 2 + 1 = 13,
[A.5.3] -1 - 5 - 4 = -10.
Subtracting the negative sum from the positive sum, we find:
[A.5.4] 13 - 10 = 3.
Naturally, if the negative sum were -13 and the positive sum were 10, you would subtract 13 from 10, which
is equal to -3.
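The three-step rule (add the positives, add the negatives, then subtract the negative sum from the positive sum) can be sketched as a short Python function. The language and the function name are purely illustrative:

```python
# Illustrative helper (hypothetical name): sum a string of signed numbers by
# (a) adding the positives, (b) adding the negatives, and (c) subtracting
# the negative sum from the positive sum.
def signed_sum(numbers):
    positive_sum = sum(n for n in numbers if n > 0)
    negative_sum = sum(-n for n in numbers if n < 0)  # magnitude only
    return positive_sum - negative_sum

# Positives: 4 + 6 + 2 + 1 = 13; negatives: 1 + 5 + 4 = 10; 13 - 10 = 3.
print(signed_sum([4, -1, 6, -5, 2, -4, 1]))  # prints 3
```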
When subtracting negative numbers, however, keep in mind that two adjacent negative signs convert
to a positive sign, meaning that addition is taking place. Thus,
Knowledge Base
Perform the following calculations to check your understanding of addition and sub-
traction.
1. 10 - (-4) + 3 + (-6) =
2. 3 + 5 + 6 + 1 + 9 =
3. (3 + 5) + 2 =
4. (10 - 5) + (-5) + 2 =
5. 7 - 5 + 9 - (-3) + 1 + 3 + 5 + (-5) =
6. -5 + 3 - 5 - (-2) + 4 - 3 =
Answers
1. 11
2. 24
3. 10
4. 2
5. 18
6. -4
Multiplication
The order or placement of numbers in multiplication has no influence on a product's value. Thus,
[A.7.1] 4X5X2=40
is equivalent to
[A.8.1] 2 X 4 X 5 = 40.
One of the most common errors occurring in the calculation of statistics is the dropping or misplacing of a decimal point. For example:
[A.10.1] 5 X 3.0 = 15
versus
[A.11.1] 5 X 0.3 = 1.5.
As shown by [A.10.1] and [A.11.1], placement of a decimal point (i.e., 3.0 vs. 0.3) has a dramatic effect on the magnitude of an answer (i.e., 15 vs. 1.5).
Knowledge Base
1. 3 X 5 X 7 X 5 =
2. (7 X 4) X 3 X 2 =
3. 5 X 1 X (2 X 9) =
4. 2.5 X 25 =
5. 10.0 X 1.0 =
6. 5.0 X 4.0 =
Answers
1. 525
2. 168
3. 90
4. 62.5
5. 10.0
6. 20.0
Division
Division involves two numbers-one is divided by the value of the other-and the order
of these numbers in an equation makes a difference. Thus, for example,
[A.12.1] 6 ÷ 2 ≠ 2 ÷ 6,
[A.12.2] 3 ≠ 0.33.
Although division is usually designated by the symbol ÷, most division in this book is denoted by one number appearing on top of another, with a straight line separating the two, or:
[A.13.1] 6/2 = 3,
[A.13.2] 3 = 3.
Similarly, it must be true that:
[A.14.1] 2/6 = 0.33.
The number appearing on top of the line is called the numerator, whereas the one under the line (the number that does the actual dividing) is called the denominator. As acknowledged previously in the discussion
of multiplication, the presence of a decimal point has an effect on an answer, and you must be vigilant about
where it is placed. If you are not careful, you can easily make a miscalculation. Consider, for instance,
[A.15.1] 18/15 = 1.2
versus
[A.16.1] 18/1.5 = 12.
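Because decimal placement is such a common source of miscalculation, it can help to verify division results on a computer. A brief Python sketch, included only for illustration:

```python
# Order matters in division, and a misplaced decimal point shifts the
# answer by a factor of ten.
print(18 / 15)          # 1.2
print(18 / 1.5)         # 12.0 -- same digits, decimal point moved
print(6 / 2)            # 3.0
print(round(2 / 6, 2))  # 0.33 -- not the same as 6 / 2
```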
Knowledge Base
1. 100/25 =
2. 100/2.5 =
3. 10/25 =
4. 10/2.5 =
5. 8/8 =
6. 8.0/0.8 =
Answers
1. 4
2. 40
3. 0.40
4. 4
5. 1.0
6. 10
Exponentiation and Square Roots
Raising a number to a power means multiplying the number by itself some number of times. Squaring the number 9, for example, gives:
[A.17.1] 9² = 9 X 9 = 81.
The number presented as a superscript indicates the number of times the base is to be multiplied by itself. To continue with the number 9, we can take 9 to the third power, or:
[A.18.1] 9³ = 9 X 9 X 9 = 729.
Taking 9 to the fourth power results in:
[A.19.1] 9⁴ = 9 X 9 X 9 X 9 = 6,561.
The reverse operation, taking the square root of a number (denoted by the radical sign, √), finds the value that, when multiplied by itself, returns the original number. For example:
[A.22.1] √227 = 15.07.
Calculating all but the simplest (i.e., usually memorized) square roots by hand is tedious. Fortunately,
most calculators have a square root key (i.e., √x), and you should make certain that your calculator is
equipped with one.
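As a supplement to the calculator's keys, the same operations can be checked in Python (shown only for illustration; `**` is the exponent operator and `math.sqrt` plays the role of the √x key):

```python
import math

# Exponentiation and square roots from the worked examples above.
print(9 ** 2)                   # 81
print(9 ** 3)                   # 729
print(9 ** 4)                   # 6561
print(round(math.sqrt(28), 2))  # 5.29
```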
Knowledge Base
1. 5² =
2. 10² =
3. 6³ =
4. √28 =
5. √0.10 =
6. √100 =
Answers
1. 25
2. 100
3. 216
4. 5.29
5. 0.3162
6. 10
Absolute Value.
Taking the absolute value of a positive or a negative number means that you are converting the number, regardless of its value, to a positive number. Symbolically, the number is placed between two vertical lines (i.e., |x|). Here are a few examples:
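For readers who check their work on a computer, Python's built-in abs() performs the same conversion (shown purely for illustration):

```python
# The built-in abs() mirrors the |x| notation: it returns the magnitude of
# its argument regardless of sign.
for x in (-45, -2765, 67, 2, -70, 0):
    print(f"|{x}| = {abs(x)}")
```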
Knowledge Base
Find the absolute value of each of the following numbers:
l. -45
2. -2,765
3. 67
4. 2
5. -70
6. 0
Answers
1. |-45| = 45
2. |-2,765| = 2,765
3. |67| = 67
4. |2| = 2
5. |-70| = 70
6. |0| = 0
Transposition. Transposition is the mathematical operation where a number or other mathematical term
is moved from one side of an equation to the other. You may remember the common refrain that genera-
tions of teachers have passed on to their students: "What you do to one side must also be done on the other
side." Words to live by when doing algebra. With this dictum in mind, recognize that each of the following
mathematical statements is equivalent to the others:
x+y=z,
y=z-x,
x=z-y,
O=z-x-y,
0= z - (x + y).
Transposition enables us to fill in the blanks, the unknown values, of any equation. Consider the
following:
[A.26.1] 30 + x = 8.
To isolate x, 30 must be subtracted from both sides of the equation:
[A.26.2] x = 8 - 30,
[A.26.3] x = -22.
As an error check, plug the obtained answer into the original formula shown above in [A.26.1] (but renumbered here for consistency). Replacing x with -22, we find:
[A.27.l] 30 + (-22) = 8,
[A.27.2] 30 - 22 = 8,
[A.27.3] 8 = 8.
Some equations involve subtracting a value from x. Here is an example:
[A.28.l] x - 6 = 14.
In this case, 6 must be added to both sides of the equation in order to isolate the value of x, as in:
[A.28.2] x = 14 + 6,
[A.28.3] x = 20.
Once again, we can check for any errors by substituting 20 for x in [A.28.1] (renumbered here):
[A.29.l] 20 - 6 = 14,
[A.29.2] 14 = 14.
To increase your mathematical flexibility, it is also a very good idea to recall how to solve equations
when they contain fractions. Here is a basic example, one where we need only to solve the equation:
[A.30.1] x = (5 - 2)/2,
[A.30.2] x = 3/2,
[A.30.3] x = 1.5.
A given value of x can also be multiplied by some other value. Imagine that we have an equation like
this one:
[A.31.1] 6x = 30.
In this case, we must remove the 6 that is multiplied by the value x. To do so, we can divide both sides of the
equation by 6, or:
[A.31.2] 6x/6 = 30/6,
[A.31.3] x = 5.
By plugging the value of 5 into the original equation (i.e., [A.31.1]), we find that:
[A.32.1] 6(5) = 30,
[A.32.2] 30 = 30.
Suppose instead that x is divided by some value, as in:
[A.33.1] x/4 = 12.
To isolate x, both sides of the equation are multiplied by 4:
[A.33.2] 4(x/4) = 12(4).
The two 4s on the left side of the equation cancel each other out, so:
[A.33.3] x = 48.
As an error check, we enter 48 into [A.33.1] (renumbered here for consistency):
[A.34.1] 48/4 = 12,
[A.34.2] 12 = 12.
Knowledge Base
1. 12 + x = 10
2. x - 2 = 15
3. x = (3 - 1)/4
4. 9 + x = 12
5. x - 5 = 6
6. x = (7 + 3)/10
Answers
1. -2
2. 17
3. 0.50
4. 3
5. 11
6. 1
Some equations require more than one step to isolate x. Consider:
[A.35.1] 5x - 9 = 15.
First, transpose the 9 by adding it to both sides:
[A.35.2] 5x = 15 + 9,
[A.35.3] 5x = 24.
Then divide both sides by 5:
[A.35.4] 5x/5 = 24/5,
[A.35.5] x = 4.8.
As always, we check the answer by substituting the obtained value for x in the original equation (i.e., [A.35.1]):
[A.36.1] 5(4.8) - 9 = 15,
[A.36.2] 24 - 9 = 15,
[A.36.3] 15 = 15.
Here is a final algebraic example:
[A.37.1] (x + 6)/8 = 5.
Multiplying both sides by 8 removes the fraction:
[A.37.2] 8[(x + 6)/8] = 5(8),
[A.37.3] x + 6 = 40,
[A.37.4] x = 40 - 6,
[A.37.5] x = 34.
As an error check, we can insert 34 into the original formula (i.e., [A.37.1]):
[A.38.1] (34 + 6)/8 = 5,
[A.38.2] 40/8 = 5,
[A.38.3] 5 = 5.
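The transposition steps above can be sketched, for illustration, as a small Python routine that solves equations of the form ax + b = c and performs the error check automatically (the function name is hypothetical):

```python
import math

# Hypothetical routine: solve a*x + b = c by transposition, then verify the
# answer by substituting it back into the original equation.
def solve_linear(a, b, c):
    x = (c - b) / a                    # transpose b, then divide both sides by a
    assert math.isclose(a * x + b, c)  # the error check from the text
    return x

print(solve_linear(5, -9, 15))  # 5x - 9 = 15  ->  4.8
print(solve_linear(1, 6, 40))   # x + 6 = 40   ->  34.0
```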
Knowledge Base
1. 2x - 4 = 10
2. (x + 3)/6 = 10
3. 5x - 8 = 20
4. (x + 1)/10 = 15
5. 7x + 2 = 9
6. (x - 5)/20 = 8
Answers
1. 7
2. 57
3. 5.6
4. 149
5. 1
6. 165
This exercise is not a quiz or a personality inventory-it is a common sense-based approach to diag-
nosing math anxiety. If you agreed with more than half of these statements, then there is a distinct possibil-
ity that you suffer from a common, intellectually disabling condition labeled math anxiety. Math anxiety is
the catchall term for a general fear or feeling of dread associated with any number of mathematically related
activities-from balancing a checkbook to performing inferential statistics.
Tobias (1993) identifies four myths associated with mathematics. They are:
There is ample empirical evidence disproving these unfortunate myths, and more data are being collected all
the time (see Tobias, 1993). Unfortunately, our culture and various social influences-the family, the class-
room, our peers-often reinforce these myths, thereby perpetuating math anxiety. I cannot tackle the entire
problem of math anxiety here, nor can I help you to overcome it right now. But what I can do is warn you
that math anxiety is real and that it can have adverse consequences for how well you learn statistics and data
analysis. Particular groups of people-women and minorities-are especially prone to math anxiety (Tobias,
1993). If you are female or the member of a minority group, and math-related activities make you somewhat
anxious, then you will want to take math anxiety very seriously-and take action to do something about it
(for interesting perspectives on group membership and intellectual performance, see Spencer, Steele, & Quinn,
1999; Steele & Aronson, 1995).
Regardless of your gender or group membership, however, if you believe that you suffer from math anxiety, only you can take the concrete steps necessary to deal with it. First, speak to your statistics instructor or
a teaching assistant about it. Second, call your institution's Department of Mathematics to see if any work-
shops, courses, or materials on coping with and overcoming math anxiety are available (if your school has
one, you might also call the Statistics Department). Third, approach your institution's Dean of Students or
Student Services Office to find out what, if any, resources are available for people with math anxiety. Fourth,
take a look at some of the math anxiety references provided on the next page, read them, and see if they help
to address your concerns about doing math.
On a personal note, I have taught many students who saw themselves as math phobics. When they ex-
pended effort in my statistics course and worked to overcome their math anxiety-being coached by my fa-
vorite admonition that "statistics is a tool to help guide inference only"-they tended to do just fine in the
course. I am sure that you will do fine in your statistics course, as well, but you must spend energy and time
carefully reading this textbook, doing the homework, diligently attending class, asking questions, and immedi-
ately seeking assistance when you have difficulty with course material. All of these activities fall under the head-
ing of the "mise en place" philosophy of doing statistics (see chapter 1), of course, as does doing something
about math anxiety. Good luck-identifying a problem like math anxiety is the first step to its resolution.
APPENDIX B
STATISTICAL TABLES
Table B.2 Proportions of Area Under the Normal Curve (the z Table)
The use of this table requires that the raw score be transformed into a z score and that the variable be normally distributed.
The values in the table represent the proportion of area in the standard normal curve, which has a mean of 0, a standard deviation of 1.00, and a total area also equal to 1.00.
Because the normal curve is symmetrical, it is sufficient to indicate only the areas corresponding to positive z values. Negative z values will have precisely the same proportions of area as their positive counterparts.
Column B represents the proportion of area between the mean and the given z. Column C represents the proportion of area beyond a given z.
[Figures: standard normal curves shading the area between the mean and z (Column B) and the area beyond z (Column C), for both positive and negative z.]
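The proportions in Columns B and C can also be computed directly from the standard normal curve. A Python sketch, for illustration only (the function names are hypothetical), using nothing beyond the standard library's math.erf:

```python
import math

# Columns B and C of the z table follow from the standard normal curve;
# math.erf lets us compute them without any statistical package.
def area_mean_to_z(z):
    """Column B: proportion of area between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_beyond_z(z):
    """Column C: proportion of area beyond z."""
    return 0.5 - area_mean_to_z(z)

print(round(area_mean_to_z(1.00), 4))  # 0.3413
print(round(area_beyond_z(1.96), 4))   # 0.025
```

The symmetry noted above is built in: abs(z) makes negative z values return the same areas as their positive counterparts.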
Source: Table B.4 is taken from Table III (page 46) of Fisher and Yates, Statistical Tables for Biological,
Agricultural, and Medical Research, 6th ed., published by Longman Group Ltd., 1974, London (previously
published by Oliver and Boyd, Edinburgh), and by permission of the authors and publishers.
Table B.5 Critical Values of F
The obtained F is significant at a given level if it is equal to or greater than the value shown in the table. The table gives the 0.05 (lightface row) and 0.01 (boldface row) points for the distribution of F.
The values shown are the right tail of the distribution obtained by dividing the larger variance estimate by the smaller variance estimate. To find the complementary left or lower tail for a given df and α level, reverse the degrees of freedom and find the reciprocal of that value in the F table. For example, the value cutting off the top 5% of the area for 7 and 12 df is 2.92. To find the cutoff point of the bottom 5% of the area, find the tabled value at the α = .05 level for 12 and 7 df. This is found to be 3.57. The reciprocal is 1/3.57 = 0.28. Thus 5% of the area falls at or below F = 0.28.
Degrees of Freedom for Denominator (left column); first line of each pair gives the .05 point, second line the .01 point
2 18.51 19.00 19.16 19.25 19.30 19.33 19.36 19.37 19.38 19.39 19.40 19.41 19.42 19.43 19.44 19.45 19.46 19.47 19.47 19.48 19.49 19.49 19.50 19.50
98.49 99.01 99.17 99.25 99.30 99.33 99.34 99.36 99.38 99.40 99.41 99.42 99.43 99.44 99.45 99.46 99.47 99.48 99.48 99.49 99.49 99.49 99.50 99.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.88 8.84 8.81 8.78 8.76 8.74 8.71 8.69 8.66 8.64 8.62 8.60 8.58 8.57 8.56 8.54 8.54 8.53
34.12 30.81 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23 27.13 27.05 26.92 26.83 26.69 26.60 26.50 26.41 26.30 26.27 26.23 26.18 26.14 26.12
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.93 5.91 5.87 5.84 5.80 5.77 5.74 5.71 5.70 5.68 5.66 5.65 5.64 5.63
21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.54 14.45 14.37 14.24 14.15 14.02 13.93 13.83 13.74 13.69 13.61 13.57 13.52 13.48 13.46
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.78 4.74 4.70 4.68 4.64 4.60 4.56 4.53 4.50 4.46 4.44 4.42 4.40 4.38 4.37 4.36
16.26 13.27 12.06 11.39 10.97 10.67 10.45 10.27 10.15 10.05 9.96 9.89 9.77 9.68 9.55 9.47 9.38 9.29 9.24 9.17 9.13 9.07 9.04 9.02
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00 3.96 3.92 3.87 3.84 3.81 3.77 3.75 3.72 3.71 3.69 3.68 3.67
13.74 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.79 7.72 7.60 7.52 7.39 7.31 7.23 7.14 7.09 7.02 6.99 6.94 6.90 6.88
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.63 3.60 3.57 3.52 3.49 3.44 3.41 3.38 3.34 3.32 3.29 3.28 3.25 3.24 3.23
12.25 9.55 8.45 7.85 7.46 7.19 7.00 6.84 6.71 6.62 6.54 6.47 6.35 6.27 6.15 6.07 5.98 5.90 5.85 5.78 5.75 5.70 5.67 5.65
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.34 3.31 3.28 3.23 3.20 3.15 3.12 3.08 3.05 3.03 3.00 2.98 2.96 2.94 2.93
11.26 8.65 7.59 7.01 6.63 6.37 6.19 6.03 5.91 5.82 5.74 5.67 5.56 5.48 5.36 5.28 5.20 5.11 5.06 5.00 4.96 4.91 4.88 4.86
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.13 3.10 3.07 3.02 2.98 2.93 2.90 2.86 2.82 2.80 2.77 2.76 2.73 2.72 2.71
10.56 8.02 6.99 6.42 6.06 5.80 5.62 5.47 5.35 5.26 5.18 5.11 5.00 4.92 4.80 4.73 4.64 4.56 4.51 4.45 4.41 4.36 4.33 4.31
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.97 2.94 2.91 2.86 2.82 2.77 2.74 2.70 2.67 2.64 2.61 2.59 2.56 2.55 2.54
10.04 7.56 6.55 5.99 5.64 5.39 5.21 5.06 4.95 4.85 4.78 4.71 4.60 4.52 4.41 4.33 4.25 4.17 4.12 4.05 4.01 3.96 3.93 3.91
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.86 2.82 2.79 2.74 2.70 2.65 2.61 2.57 2.53 2.50 2.47 2.45 2.42 2.41 2.40
9.65 7.20 6.22 5.67 5.32 5.07 4.88 4.74 4.63 4.54 4.46 4.40 4.29 4.21 4.10 4.02 3.94 3.86 3.80 3.74 3.70 3.66 3.62 3.60
12 4.75 3.88 3.49 3.26 3.11 3.00 2.92 2.85 2.80 2.76 2.72 2.69 2.64 2.60 2.54 2.50 2.46 2.42 2.40 2.36 2.35 2.32 2.31 2.30
9.33 6.93 5.95 5.41 5.06 4.82 4.65 4.50 4.39 4.30 4.22 4.16 4.05 3.98 3.86 3.78 3.70 3.61 3.56 3.49 3.46 3.41 3.38 3.36
13 4.67 3.80 3.41 3.18 3.02 2.92 2.84 2.77 2.72 2.67 2.63 2.60 2.55 2.51 2.46 2.42 2.38 2.34 2.32 2.28 2.26 2.24 2.22 2.21
9.07 6.70 5.74 5.20 4.86 4.62 4.44 4.30 4.19 4.10 4.02 3.96 3.85 3.78 3.67 3.59 3.51 3.42 3.37 3.30 3.27 3.21 3.18 3.16
14 4.60 3.74 3.34 3.11 2.96 2.85 2.77 2.70 2.65 2.60 2.56 2.53 2.48 2.44 2.39 2.35 2.31 2.27 2.24 2.21 2.19 2.16 2.14 2.13
8.86 6.51 5.56 5.03 4.69 4.46 4.28 4.14 4.03 3.94 3.86 3.80 3.70 3.62 3.51 3.43 3.34 3.26 3.21 3.14 3.11 3.06 3.02 3.00
15 4.54 3.68 3.29 3.06 2.90 2.79 2.70 2.64 2.59 2.55 2.51 2.48 2.43 2.39 2.33 2.29 2.25 2.21 2.18 2.15 2.12 2.10 2.08 2.07
8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.73 3.67 3.56 3.48 3.36 3.29 3.20 3.12 3.07 3.00 2.97 2.92 2.89 2.87
Table B.5 (continued)
Degrees of Freedom for Numerator
1 2 3 4 5 6 7 8 9 10 11 12 14 16 20 24 30 40 50 75 100 200 500 ∞
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.45 2.42 2.37 2.33 2.28 2.24 2.20 2.16 2.13 2.09 2.07 2.04 2.02 2.01
8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.61 3.55 3.45 3.37 3.25 3.18 3.10 3.01 2.96 2.89 2.86 2.80 2.77 2.75
17 4.45 3.59 3.20 2.96 2.81 2.70 2.62 2.55 2.50 2.45 2.41 2.38 2.33 2.29 2.23 2.19 2.15 2.11 2.08 2.04 2.02 1.99 1.97 1.96
8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.52 3.45 3.35 3.27 3.16 3.08 3.00 2.92 2.86 2.79 2.76 2.70 2.67 2.65
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.37 2.34 2.29 2.25 2.19 2.15 2.11 2.07 2.04 2.00 1.98 1.95 1.93 1.92
8.28 6.01 5.09 4.58 4.25 4.01 3.85 3.71 3.60 3.51 3.44 3.37 3.27 3.19 3.07 3.00 2.91 2.83 2.78 2.71 2.68 2.62 2.59 2.57
19 4.38 3.52 3.13 2.90 2.74 2.63 2.55 2.48 2.43 2.38 2.34 2.31 2.26 2.21 2.15 2.11 2.07 2.02 2.00 1.96 1.94 1.91 1.90 1.88
8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.36 3.30 3.19 3.12 3.00 2.92 2.84 2.76 2.70 2.63 2.60 2.54 2.51 2.49
20 4.35 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.35 2.31 2.28 2.23 2.18 2.12 2.08 2.04 1.99 1.96 1.92 1.90 1.87 1.85 1.84
8.10 5.85 4.94 4.43 4.10 3.87 3.71 3.56 3.45 3.37 3.30 3.23 3.13 3.05 2.94 2.86 2.77 2.69 2.63 2.56 2.53 2.47 2.44 2.42
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.28 2.25 2.20 2.15 2.09 2.05 2.00 1.96 1.93 1.89 1.87 1.84 1.82 1.81
8.02 5.78 4.87 4.37 4.04 3.81 3.65 3.51 3.40 3.31 3.24 3.17 3.07 2.99 2.88 2.80 2.72 2.63 2.58 2.51 2.47 2.42 2.38 2.36
22 4.30 3.44 3.05 2.82 2.66 2.55 2.47 2.40 2.35 2.30 2.26 2.23 2.18 2.13 2.07 2.03 1.98 1.93 1.91 1.87 1.84 1.81 1.80 1.78
7.94 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.18 3.12 3.02 2.94 2.83 2.75 2.67 2.58 2.53 2.46 2.42 2.37 2.33 2.31
23 4.28 3.42 3.03 2.80 2.64 2.53 2.45 2.38 2.32 2.28 2.24 2.20 2.14 2.10 2.04 2.00 1.96 1.91 1.88 1.84 1.82 1.79 1.77 1.76
7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.14 3.07 2.97 2.89 2.78 2.70 2.62 2.53 2.48 2.41 2.37 2.32 2.28 2.26
24 4.26 3.40 3.01 2.78 2.62 2.51 2.43 2.36 2.30 2.26 2.22 2.18 2.13 2.09 2.02 1.98 1.94 1.89 1.86 1.82 1.80 1.76 1.74 1.73
... 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.25 3.17 3.09 3.03 2.93 2.85 2.74 2.66 2.58 2.49 2.44 2.36 2.33 2.27 2.23 2.21
~
25 4.24 3.38 2.99 2.76 2.60 2.49 2.41 2.34 2.28 2.24 2.20 2.16 2.11 2.06 2.00 1.96 1.92 1.87 1.84 1.80 1.77 1.74 1.72 1.71
7.77 5.57 4.68 4.18 3.86 3.63 3.46 3.32 3.21 3.13 3.05 2.99 2.89 2.81 2.70 2.62 2.54 2.45 2.40 2.32 2.29 2.23 2.19 2.17
26 4.22 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.18 2.15 2.10 2.05 1.99 1.95 1.90 1.85 1.82 1.78 1.76 1.72 1.70 1.69
7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.17 3.09 3.02 2.96 2.86 2.77 2.66 2.58 2.50 2.41 2.36 2.28 2.25 2.19 2.15 2.13
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.30 2.25 2.20 2.16 2.13 2.08 2.03 1.97 1.93 1.88 1.84 1.80 1.76 1.74 1.71 1.68 1.67
7.68 5.49 4.60 4.11 3.79 3.56 3.39 3.26 3.14 3.06 2.98 2.93 2.83 2.74 2.63 2.55 2.47 2.38 2.33 2.25 2.21 2.16 2.12 2.10
28 4.20 3.34 2.95 2.71 2.56 2.44 2.36 2.29 2.24 2.19 2.15 2.12 2.06 2.02 1.96 1.91 1.87 1.81 1.78 1.75 1.72 1.69 1.67 1.65
7.64 5.45 4.57 4.07 3.76 3.53 3.36 3.23 3.11 3.03 2.95 2.90 2.80 2.71 2.60 2.52 2.44 2.35 2.30 2.22 2.18 2.13 2.09 2.06
29 4.18 3.33 2.93 2.70 2.54 2.43 2.35 2.28 2.22 2.18 2.14 2.10 2.05 2.00 1.94 1.90 1.85 1.80 1.77 1.73 1.71 1.68 1.65 1.64
7.60 5.42 4.54 4.04 3.73 3.50 3.32 3.20 3.08 3.00 2.92 2.87 2.77 2.68 2.57 2.49 2.41 2.32 2.27 2.19 2.15 2.10 2.06 2.03
30 4.17 3.32 2.92 2.69 2.53 2.42 2.34 2.27 2.21 2.16 2.12 2.09 2.04 1.99 1.93 1.89 1.84 1.79 1.76 1.72 1.69 1.66 1.64 1.62
7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.06 2.98 2.90 2.84 2.74 2.66 2.55 2.47 2.38 2.29 2.24 2.16 2.13 2.07 2.03 2.01
32 4.15 3.30 2.90 2.67 2.51 2.40 2.32 2.25 2.19 2.14 2.10 2.07 2.02 1.97 1.91 1.86 1.82 1.76 1.74 1.69 1.67 1.64 1.61 1.59
7.50 5.34 4.46 3.97 3.66 3.42 3.25 3.12 3.01 2.94 2.86 2.80 2.70 2.62 2.51 2.42 2.34 2.25 2.20 2.12 2.08 2.02 1.98 1.96
34 4.13 3.28 2.88 2.65 2.49 2.38 2.30 2.23 2.17 2.12 2.08 2.05 2.00 1.95 1.89 1.84 1.80 1.74 1.71 1.67 1.64 1.61 1.59 1.57
7.44 5.29 4.42 3.93 3.61 3.38 3.21 3.08 2.97 2.89 2.82 2.76 2.66 2.58 2.47 2.38 2.30 2.21 2.15 2.08 2.04 1.98 1.94 1.91
36 4.11 3.26 2.86 2.63 2.48 2.36 2.28 2.21 2.15 2.10 2.06 2.03 1.98 1.93 1.87 1.82 1.78 1.72 1.69 1.65 1.62 1.59 1.56 1.55
7.39 5.25 4.38 3.89 3.58 3.35 3.18 3.04 2.94 2.86 2.78 2.72 2.62 2.54 2.43 2.35 2.26 2.17 2.12 2.04 2.00 1.94 1.90 1.87
38 4.10 3.25 2.85 2.62 2.46 2.35 2.26 2.19 2.14 2.09 2.05 2.02 1.96 1.92 1.85 1.80 1.76 1.71 1.67 1.63 1.60 1.57 1.54 1.53
7.35 5.21 4.34 3.86 3.54 3.32 3.15 3.02 2.91 2.82 2.75 2.69 2.59 2.51 2.40 2.32 2.22 2.14 2.08 2.00 1.97 1.90 1.86 1.84
Table B.5 (continued)
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.07 2.04 2.00 1.95 1.90 1.84 1.79 1.74 1.69 1.66 1.61 1.59 1.55 1.53 1.51
7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.88 2.80 2.73 2.66 2.56 2.49 2.37 2.29 2.20 2.11 2.05 1.97 1.94 1.88 1.84 1.81
42 4.07 3.22 2.83 2.59 2.44 2.32 2.24 2.17 2.11 2.06 2.02 1.99 1.94 1.89 1.82 1.78 1.73 1.68 1.64 1.60 1.57 1.54 1.51 1.49
7.27 5.15 4.29 3.80 3.49 3.26 3.10 2.96 2.86 2.77 2.70 2.64 2.54 2.46 2.35 2.26 2.17 2.08 2.02 1.94 1.91 1.85 1.80 1.78
44 4.06 3.21 2.82 2.58 2.43 2.31 2.23 2.16 2.10 2.05 2.01 1.98 1.92 1.88 1.81 1.76 1.72 1.66 1.63 1.58 1.56 1.52 1.50 1.48
7.24 5.12 4.26 3.78 3.46 3.24 3.07 2.94 2.84 2.75 2.68 2.62 2.52 2.44 2.32 2.24 2.15 2.06 2.00 1.92 1.88 1.82 1.78 1.75
46 4.05 3.20 2.81 2.57 2.42 2.30 2.22 2.14 2.09 2.04 2.00 1.97 1.91 1.87 1.80 1.75 1.71 1.65 1.62 1.57 1.54 1.51 1.48 1.46
7.21 5.10 4.24 3.76 3.44 3.22 3.05 2.92 2.82 2.73 2.66 2.60 2.50 2.42 2.30 2.22 2.13 2.04 1.98 1.90 1.86 1.80 1.76 1.72
48 4.04 3.19 2.80 2.56 2.41 2.30 2.21 2.14 2.08 2.03 1.99 1.96 1.90 1.86 1.79 1.74 1.70 1.64 1.61 1.56 1.53 1.50 1.47 1.45
7.19 5.08 4.22 3.74 3.42 3.20 3.04 2.90 2.80 2.71 2.64 2.58 2.48 2.40 2.28 2.20 2.11 2.02 1.96 1.88 1.84 1.78 1.73 1.70
50 4.03 3.18 2.79 2.56 2.40 2.29 2.20 2.13 2.07 2.02 1.98 1.95 1.90 1.85 1.78 1.74 1.69 1.63 1.60 1.55 1.52 1.48 1.46 1.44
7.17 5.06 4.20 3.72 3.41 3.18 3.02 2.88 2.78 2.70 2.62 2.56 2.46 2.39 2.26 2.18 2.10 2.00 1.94 1.86 1.82 1.76 1.71 1.68
55 4.02 3.17 2.78 2.54 2.38 2.27 2.18 2.11 2.05 2.00 1.97 1.93 1.88 1.83 1.76 1.72 1.67 1.61 1.58 1.52 1.50 1.46 1.43 1.41
7.12 5.01 4.16 3.68 3.37 3.15 2.98 2.85 2.75 2.66 2.59 2.53 2.43 2.35 2.23 2.15 2.06 1.96 1.90 1.82 1.78 1.71 1.66 1.64
60 4.00 3.15 2.76 2.52 2.37 2.25 2.17 2.10 2.04 1.99 1.95 1.92 1.86 1.81 1.75 1.70 1.65 1.59 1.56 1.50 1.48 1.44 1.41 1.39
7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.56 2.50 2.40 2.32 2.20 2.12 2.03 1.93 1.87 1.79 1.74 1.68 1.63 1.60
65 3.99 3.14 2.75 2.51 2.36 2.24 2.15 2.08 2.02 1.98 1.94 1.90 1.85 1.80 1.73 1.68 1.63 1.57 1.54 1.49 1.46 1.42 1.39 1.37
7.04 4.95 4.10 3.62 3.31 3.09 2.93 2.79 2.70 2.61 2.54 2.47 2.37 2.30 2.18 2.09 2.00 1.90 1.84 1.76 1.71 1.64 1.60 1.56
70 3.98 3.13 2.74 2.50 2.35 2.23 2.14 2.07 2.01 1.97 1.93 1.89 1.84 1.79 1.72 1.67 1.62 1.56 1.53 1.47 1.45 1.40 1.37 1.35
7.01 4.92 4.08 3.60 3.29 3.07 2.91 2.77 2.67 2.59 2.51 2.45 2.35 2.28 2.15 2.07 1.98 1.88 1.82 1.74 1.69 1.62 1.56 1.53
80 3.96 3.11 2.72 2.48 2.33 2.21 2.12 2.05 1.99 1.95 1.91 1.88 1.82 1.77 1.70 1.65 1.60 1.54 1.51 1.45 1.42 1.38 1.35 1.32
6.96 4.88 4.04 3.56 3.25 3.04 2.87 2.74 2.64 2.55 2.48 2.41 2.32 2.24 2.11 2.03 1.94 1.84 1.78 1.70 1.65 1.57 1.52 1.49
100 3.94 3.09 2.70 2.46 2.30 2.19 2.10 2.03 1.97 1.92 1.88 1.85 1.79 1.75 1.68 1.63 1.57 1.51 1.48 1.42 1.39 1.34 1.30 1.28
6.90 4.82 3.98 3.51 3.20 2.99 2.82 2.69 2.59 2.51 2.43 2.36 2.26 2.19 2.06 1.98 1.89 1.79 1.73 1.64 1.59 1.51 1.46 1.43
125 3.92 3.07 2.68 2.44 2.29 2.17 2.08 2.01 1.95 1.90 1.86 1.83 1.77 1.72 1.65 1.60 1.55 1.49 1.45 1.39 1.36 1.31 1.27 1.25
6.84 4.78 3.94 3.47 3.17 2.95 2.79 2.65 2.56 2.47 2.40 2.33 2.23 2.15 2.03 1.94 1.85 1.75 1.68 1.59 1.54 1.46 1.40 1.37
150 3.91 3.06 2.67 2.43 2.27 2.16 2.07 2.00 1.94 1.89 1.85 1.82 1.76 1.71 1.64 1.59 1.54 1.47 1.44 1.37 1.34 1.29 1.25 1.22
6.81 4.75 3.91 3.44 3.13 2.92 2.76 2.62 2.53 2.44 2.37 2.30 2.20 2.12 2.00 1.91 1.83 1.72 1.66 1.56 1.51 1.43 1.37 1.33
200 3.89 3.04 2.65 2.41 2.26 2.14 2.05 1.98 1.92 1.87 1.83 1.80 1.74 1.69 1.62 1.57 1.52 1.45 1.42 1.35 1.32 1.26 1.22 1.19
6.76 4.71 3.88 3.41 3.11 2.90 2.73 2.60 2.50 2.41 2.34 2.28 2.17 2.09 1.97 1.88 1.79 1.69 1.62 1.53 1.48 1.39 1.33 1.28
400 3.86 3.02 2.62 2.39 2.23 2.12 2.03 1.96 1.90 1.85 1.81 1.78 1.72 1.67 1.60 1.54 1.49 1.42 1.38 1.32 1.28 1.22 1.16 1.13
6.70 4.66 3.83 3.36 3.06 2.85 2.69 2.55 2.46 2.37 2.29 2.23 2.12 2.04 1.92 1.84 1.74 1.64 1.57 1.47 1.42 1.32 1.24 1.19
1000 3.85 3.00 2.61 2.38 2.22 2.10 2.02 1.95 1.89 1.84 1.80 1.76 1.70 1.65 1.58 1.53 1.47 1.41 1.36 1.30 1.26 1.19 1.13 1.08
6.66 4.62 3.80 3.34 3.04 2.82 2.66 2.53 2.43 2.34 2.26 2.20 2.09 2.01 1.89 1.81 1.71 1.61 1.54 1.44 1.38 1.28 1.19 1.11
∞ 3.84 2.99 2.60 2.37 2.21 2.09 2.01 1.94 1.88 1.83 1.79 1.75 1.69 1.64 1.57 1.52 1.46 1.40 1.35 1.28 1.24 1.17 1.11 1.00
6.64 4.60 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.24 2.18 2.07 1.99 1.87 1.79 1.69 1.59 1.52 1.41 1.36 1.25 1.15 1.00
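Tabled critical values like the ones above can also be computed directly. A minimal sketch in Python, assuming SciPy is available (the function names below are SciPy's, not the book's; the book itself predates such tools):

```python
# Critical values of F, as in Table B.5: the lightface entries are the
# 95th percentile (alpha = .05) and the boldface entries the 99th
# percentile (alpha = .01) of the F distribution, with degrees of
# freedom for the numerator (columns) and denominator (rows).
from scipy.stats import f

def critical_f(alpha, df_between, df_within):
    """Return the critical value of F for a given alpha level."""
    return f.ppf(1 - alpha, df_between, df_within)

# Spot-check a few table entries for df_within = 30:
print(round(critical_f(.05, 1, 30), 2))   # 4.17
print(round(critical_f(.01, 1, 30), 2))   # 7.56
print(round(critical_f(.05, 4, 30), 2))   # 2.69
```

An observed F that equals or exceeds the critical value is significant at the chosen alpha level.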
Statistical Tables B·ll
5 .05 3.64 4.60 5.22 5.67 6.03 6.33 6.58 6.80 6.99 7.17
.01 5.70 6.98 7.80 8.42 8.91 9.32 9.67 9.97 10.24 10.48
6 .05 3.46 4.34 4.90 5.30 5.63 5.90 6.12 6.32 6.49 6.65
.01 5.24 6.33 7.03 7.56 7.97 8.32 8.61 8.87 9.10 9.30
7 .05 3.34 4.16 4.68 5.06 5.36 5.61 5.82 6.00 6.16 6.30
.01 4.95 5.92 6.54 7.01 7.37 7.68 7.94 8.17 8.37 8.55
8 .05 3.26 4.04 4.53 4.89 5.17 5.40 5.60 5.77 5.92 6.05
.01 4.75 5.64 6.20 6.62 6.96 7.24 7.47 7.68 7.86 8.03
9 .05 3.20 3.95 4.41 4.76 5.02 5.24 5.43 5.59 5.74 5.87
.01 4.60 5.43 5.96 6.35 6.66 6.91 7.13 7.33 7.49 7.65
10 .05 3.15 3.88 4.33 4.65 4.91 5.12 5.30 5.46 5.60 5.72
.01 4.48 5.27 5.77 6.14 6.43 6.67 6.87 7.05 7.21 7.36
11 .05 3.11 3.82 4.26 4.57 4.82 5.03 5.20 5.35 5.49 5.61
.01 4.39 5.15 5.62 5.97 6.25 6.48 6.67 6.84 6.99 7.13
12 .05 3.08 3.77 4.20 4.51 4.75 4.95 5.12 5.27 5.39 5.51
.01 4.32 5.05 5.50 5.84 6.10 6.32 6.51 6.67 6.81 6.94
13 .05 3.06 3.73 4.15 4.45 4.69 4.88 5.05 5.19 5.32 5.43
.01 4.26 4.96 5.40 5.73 5.98 6.19 6.37 6.53 6.67 6.79
14 .05 3.03 3.70 4.11 4.41 4.64 4.83 4.99 5.13 5.25 5.36
.01 4.21 4.89 5.32 5.63 5.88 6.08 6.26 6.41 6.54 6.66
15 .05 3.01 3.67 4.08 4.37 4.59 4.78 4.94 5.08 5.20 5.31
.01 4.17 4.84 5.25 5.56 5.80 5.99 6.16 6.31 6.44 6.55
16 .05 3.00 3.65 4.05 4.33 4.56 4.74 4.90 5.03 5.15 5.26
.01 4.13 4.79 5.19 5.49 5.72 5.92 6.08 6.22 6.35 6.46
17 .05 2.98 3.63 4.02 4.30 4.52 4.70 4.86 4.99 5.11 5.21
.01 4.10 4.74 5.14 5.43 5.66 5.85 6.01 6.15 6.27 6.38
18 .05 2.97 3.61 4.00 4.28 4.49 4.67 4.82 4.96 5.07 5.17
.01 4.07 4.70 5.09 5.38 5.60 5.79 5.94 6.08 6.20 6.31
19 .05 2.96 3.59 3.98 4.25 4.47 4.65 4.79 4.92 5.04 5.14
.01 4.05 4.67 5.05 5.33 5.55 5.73 5.89 6.02 6.14 6.25
20 .05 2.95 3.58 3.96 4.23 4.45 4.62 4.77 4.90 5.01 5.11
.01 4.02 4.64 5.02 5.29 5.51 5.69 5.84 5.97 6.09 6.19
24 .05 2.92 3.53 3.90 4.17 4.37 4.54 4.68 4.81 4.92 5.01
.01 3.96 4.55 4.91 5.17 5.37 5.54 5.69 5.81 5.92 6.02
30 .05 2.89 3.49 3.85 4.10 4.30 4.46 4.60 4.72 4.82 4.92
.01 3.89 4.45 4.80 5.05 5.24 5.40 5.54 5.65 5.76 5.85
40 .05 2.86 3.44 3.79 4.04 4.23 4.39 4.52 4.63 4.73 4.82
.01 3.82 4.37 4.70 4.93 5.11 5.26 5.39 5.50 5.60 5.69
60 .05 2.83 3.40 3.74 3.98 4.16 4.31 4.44 4.55 4.65 4.73
.01 3.76 4.28 4.59 4.82 4.99 5.13 5.25 5.36 5.45 5.53
120 .05 2.80 3.36 3.68 3.92 4.10 4.24 4.36 4.47 4.56 4.64
.01 3.70 4.20 4.50 4.71 4.87 5.01 5.12 5.21 5.30 5.37
∞ .05 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.55
.01 3.64 4.12 4.40 4.60 4.76 4.88 4.99 5.08 5.16 5.23
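These Studentized range (q) values can likewise be reproduced in software; a sketch assuming SciPy 1.7 or later, which provides `scipy.stats.studentized_range`:

```python
# Critical values of the Studentized range statistic (q), used with
# Tukey's HSD test. Table rows are error degrees of freedom; columns
# are the number of treatment means being compared.
from scipy.stats import studentized_range

def critical_q(alpha, num_means, df_error):
    """Return the critical q for alpha, k treatment means, and error df."""
    return studentized_range.ppf(1 - alpha, num_means, df_error)

# Spot-check against the table (df = 10, k = 3 means):
print(round(critical_q(.05, 3, 10), 2))   # 3.88
print(round(critical_q(.01, 3, 10), 2))   # 5.27
```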
NA 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
NB
1 0 0
2 0 0 0 1 1 1 1 2 2 2 3 3 3 4 4 4
0 0 0 0 1 1 1 2 2 2 2
3 0 0 2 2 3 3 4 5 5 6 7 7 8 9 9 10 11
0 1 2 2 3 3 4 4 5 5 6 6 7 7 8
4 0 2 3 4 5 6 7 8 9 10 11 12 14 15 16 17 18
0 1 2 3 4 4 5 6 7 8 9 10 11 11 12 13 13
5 0 2 4 5 6 8 9 11 12 13 15 16 18 19 20 22 23 25
0 2 3 5 6 7 8 9 11 12 13 14 15 17 18 19 20
6 0 2 3 5 7 8 10 12 14 16 17 19 21 23 25 26 28 30 32
1 2 3 5 6 8 10 11 13 14 16 17 19 21 22 24 25 27
7 0 2 4 6 8 11 13 15 17 19 21 24 26 28 30 33 35 37 39
1 3 5 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
8 1 3 5 8 10 13 15 18 20 23 26 28 31 33 36 39 41 44 47
0 2 4 6 8 10 13 15 17 19 22 24 26 29 31 34 36 38 41
9 1 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54
0 2 4 7 10 12 15 17 20 23 26 28 31 34 37 39 42 45 48
10 1 4 7 11 14 17 20 24 27 31 34 37 41 44 48 51 55 58 62
0 3 5 8 11 14 17 20 23 26 29 33 36 39 42 45 48 52 55
11 5 8 12 16 19 23 27 31 34 38 42 46 50 54 57 61 65 69
0 3 6 9 13 16 19 23 26 30 33 37 40 44 47 51 55 58 62
12 2 5 9 13 17 21 26 30 34 38 42 47 51 55 60 64 68 72 77
1 4 7 11 14 18 22 26 29 33 37 41 45 49 53 57 61 65 69
13 2 6 10 15 19 24 28 33 37 42 47 51 56 61 65 70 75 80 84
1 4 8 12 16 20 24 28 33 37 41 45 50 54 59 63 67 72 76
14 2 7 11 16 21 26 31 36 41 46 51 56 61 66 71 77 82 87 92
1 5 9 13 17 22 26 31 36 40 45 50 55 59 64 67 74 78 83
15 3 7 12 18 23 28 33 39 44 50 55 61 66 72 77 83 88 94 100
1 5 10 14 19 24 29 34 39 44 49 54 59 64 70 75 80 85 90
16 3 8 14 19 25 30 36 42 48 54 60 65 71 77 83 89 95 101 107
1 6 11 15 21 26 31 37 42 47 53 59 64 70 75 81 86 92 98
17 3 9 15 20 26 33 39 45 51 57 64 70 77 83 89 96 102 109 115
2 6 11 17 22 28 34 39 45 51 57 63 67 75 81 87 93 99 105
18 4 9 16 22 28 35 41 48 55 61 68 75 82 88 95 102 109 116 123
2 7 12 18 24 30 36 42 48 55 61 67 74 80 86 93 99 106 112
19 0 4 10 17 23 30 37 44 51 58 65 72 80 87 94 101 109 116 123 130
2 7 13 19 25 32 38 45 52 58 65 72 78 85 92 99 106 113 119
20 0 4 11 18 25 32 39 47 54 62 69 77 84 92 100 107 115 123 130 138
2 8 13 20 27 34 41 48 55 62 69 76 83 90 98 105 112 119 127
The numbers listed below are the critical U values for α = .01. Use the lightface critical values
for a one-tailed test and the boldface critical values for a two-tailed test. Note: To be significant,
the smaller computed U must be equal to or less than the critical U. A dash (-) means that no
decision is possible at the stated level of significance.
NA 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
NB
1
2 0 0 0 0 0 0 1 1
0 0
3 0 0 1 2 2 2 3 3 4 4 4 5
0 0 0 1 2 2 2 2 3 3
4 0 2 3 3 4 5 5 6 7 7 8 9 9 10
0 0 2 2 3 3 4 5 5 6 6 7 8
5 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 2 3 4 5 6 7 7 8 9 10 11 12 13
6 1 2 3 4 6 7 8 9 11 12 13 15 16 18 19 20 22
0 1 2 3 4 5 6 7 9 10 11 12 13 15 16 17 18
7 0 1 3 4 6 7 9 11 12 14 16 17 19 21 23 24 26 28
0 1 3 4 6 7 9 10 12 13 15 16 18 19 21 22 24
8 0 2 4 6 7 9 11 13 15 17 20 22 24 26 28 30 32 34
1 2 4 6 7 9 11 13 15 17 18 20 22 24 26 28 30
9 1 3 5 7 9 11 14 16 18 21 23 26 28 31 33 36 38 40
0 3 5 7 9 11 13 16 18 20 22 24 27 29 31 33 36
10 3 6 8 11 13 16 19 22 24 27 30 33 36 38 41 44 47
0 2 4 6 9 11 13 16 18 21 24 26 29 31 34 37 39 42
11 4 7 9 12 15 18 22 25 28 31 34 37 41 44 47 50 53
0 2 5 7 10 13 16 18 21 24 27 30 33 36 39 42 45 48
12 2 5 8 11 14 17 21 24 28 31 35 38 42 46 49 53 56 60
3 6 9 12 15 18 21 24 27 31 34 37 41 44 47 51 54
13 0 2 5 9 12 16 20 23 27 31 35 39 43 47 51 55 59 63 67
3 7 10 13 17 20 24 27 31 34 38 42 45 49 53 56 60
14 0 2 6 10 13 17 22 26 30 34 38 43 47 51 56 60 65 69 73
4 7 11 15 18 22 26 30 34 38 42 46 50 54 58 63 67
15 0 3 7 11 15 19 24 28 33 37 42 47 51 56 61 66 70 75 80
2 5 8 12 16 20 24 29 33 37 42 46 51 55 60 64 69 73
16 0 3 7 12 16 21 26 31 36 41 46 51 56 61 66 71 76 82 87
2 5 9 13 18 22 27 31 36 41 45 50 55 60 65 70 74 79
17 0 4 8 13 18 23 28 33 38 44 49 55 60 66 71 77 82 88 93
2 6 10 15 19 24 29 34 39 44 49 54 60 65 70 75 81 86
18 0 4 9 14 19 24 30 36 41 47 53 59 65 70 76 82 88 94 100
2 6 11 16 21 26 31 37 42 47 53 58 64 70 75 81 87 92
19 4 9 15 20 26 32 38 44 50 56 63 69 75 82 88 94 101 107
0 3 7 12 17 22 28 33 39 45 51 56 63 69 74 81 87 93 99
20 1 5 10 16 22 28 34 40 47 53 60 67 73 80 87 93 100 107 114
0 3 8 13 18 24 30 36 42 48 54 60 67 73 79 86 92 99 105
Source: Adapted from R. E. Kirk. (1984). Elementary statistics (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Used with permission of the publisher.
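In practice, the U test can be run in software rather than by table lookup. A sketch with made-up scores for two independent groups, assuming SciPy is available (the data are hypothetical):

```python
# Mann-Whitney U test for two independent samples. SciPy reports the
# U statistic for the first sample; the tables above are entered with
# the SMALLER of the two U values, min(U, n1 * n2 - U).
from scipy.stats import mannwhitneyu

group_a = [12, 15, 9, 20, 17, 11]   # hypothetical scores
group_b = [8, 7, 14, 10, 6, 9]

u, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
u_smaller = min(u, len(group_a) * len(group_b) - u)
print(u_smaller, p)
```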
Level of Significance for a One-Tailed Test
.05 .025 .01 .005
Level of Significance for a Two-Tailed Test
N .10 .05 .02 .01
How to use this table: Use the smaller computed sum of ranks as the test statistic, T. If the computed T is
equal to or less than the critical value in the table, then it is significant.
Source: Adapted from R. E. Kirk (1984). Elementary statistics (2nd ed.) Pacific Grove, CA: Brooks/Cole.
Used with permission of the publisher.
• N = number of pairs.
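The same logic can be run directly in software; a sketch with hypothetical paired scores, assuming SciPy is available:

```python
# Wilcoxon matched-pairs signed-ranks test. For a two-sided test,
# SciPy's statistic is the smaller sum of ranks, T, which the table
# above compares against a critical value for N pairs.
from scipy.stats import wilcoxon

before = [10, 12, 9, 14, 11, 13, 8, 15]   # hypothetical paired scores
after = [12, 15, 8, 17, 15, 16, 9, 18]

t_stat, p = wilcoxon(before, after)
print(t_stat, p)
```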
APPENDIX C
WRITING UP RESEARCH IN
APA STYLE: OVERVIEW AND
FOCUS ON RESULTS
When a study's data are collected and analyzed, a researcher must begin the process
of packaging the results for others to consider, examine, or even challenge. Tra-
ditionally, most behavioral scientists write up research results in the form of an
article. Each of the behavioral sciences has a particular format for a manuscript-the
preprinted form of an article-though there are commonalities across the disciplines.
Because I know the format advocated by the American Psychological Association (APA)
best, it is the one I will present here. If you work in a discipline other than psychology,
then you should consult a set of writing style guidelines in an appropriate reference
work (please note that some academic disciplines outside of psychology-notably nurs-
ing, criminology, and personnel-also employ APA style). Let me also suggest, however,
that you read the material included in this appendix anyway, as much of the advice is
generic and can help you to write clearer prose in whatever format you need to use.
Carefully Check a Paper's Spelling, Grammar, and Punctuation. Given the ubiquity
of spell checking-and even grammar checking-software, there is absolutely no ex-
cuse for poor spelling, problematic grammar, or confused punctuation. In fact, once
you have written a complete rough draft of a paper, the very last thing you should do
is proofread it from start to finish with an eye to catching dropped letters or words,
misspellings, and misused periods, commas, colons, and semicolons. If you believe that
these so-called "surface errors" are not as important as the content of your paper, re-
member that readers will have little chance to think about content if they are constantly
(mentally) correcting mistakes or omissions while reviewing your work.
Write with Generalists, Not Specialists, in Mind. Many novice writers assume that
their prose must be dense and technical in order to be "scientific." Not so. Certainly,
some technical material and even professional jargon is tolerable in disciplinary writ-
ing, but the overarching goal is to present such information in ways that an interested,
but general, audience can follow. Every writer wants his or her ideas to "catch on" or
be appreciated, but such hopes are easily dashed if the supporting prose is too compli-
cated, convoluted, or otherwise unclear.
How can the goal of clear, concise prose be achieved? As you write, remember that
your goal is to present information in an accurate way, one that conveys your interest-
even excitement-about a topic. When presenting your arguments, remember to speak
to your audience directly-after all, you are telling them a research story-so do not
talk down to them (i.e., simplistic language) or over their heads (i.e., formal, dry text).
As I have noted elsewhere (Dunn, 1999), focus on those ideas you can present to read-
ers that they do not already know or have not considered from your perspective. Imag-
ine that you had only a few minutes to describe your work to someone-what would
you say? Obviously, you would only talk about the most important issues. The same
criterion should be used for your writing, as what you leave out can be as influential
to clarity as what information you choose to include.
Write in the Active, Not Passive, Voice. I once received a B on a history paper and
when I asked why, the professor told me that my writing relied too much on the pas-
sive voice. The passive voice involves overuse of passive verbs (e.g., has been) in place
of active ones (e.g., is), so that the syntax of sentences becomes vague, not crisp. When
I claimed that behavioral scientists wrote in the passive voice precisely because they
could not be sure that their conclusions were correct, my professor laughed and told
me that uncertainty was no excuse for boring writing. I was so much younger-and
annoyed-then, but I wholeheartedly agree with him now.
For some reason, behavioral scientists do write in the passive voice and it is dull to
read: "The written instructions and the stimulus materials were given to the research
participants by the experimenter." The meaning is still there, but the presentation could
be snappier, as in: "The experimenter gave the instructions and the stimulus materials
to the participants." Sometimes writing in the passive voice is necessary (O'Connor,
1996), of course, but in general, your writing should be active, involving, and to the
point. The best time to catch overreliance on the passive voice is during the revision
process, when you are rereading your paper to see how it sounds or how well the ideas
flow together.
Revise Again and Again. Good writing is made, not born. Good writers know that
they will need to generate several drafts of what they are working on before their writ-
ing reaches a satisfactory state. (Note that I said satisfactory, not perfect; eventually, a
writer must ask others for suggestions about where further improvements can be made.)
Develop the habit of revising your manuscript from start to finish at the beginning of
each session of writing (yes, really). Not only will reading and editing at the start of
each writing session enable you to get back into the mind-set of your paper, it will also
Be Vigilant About Gender Bias and Sexist Language. Academic writing tends to be
male oriented when it should be gender inclusive. Given the power of language to shape
thought, belief, and culture, the behavioral sciences are sensitized to male-dominated
prose (i.e., "mankind" in place of "humanity"; "girl" or "lady" instead of "woman" or
"female"; overuse of "he" or "him" relative to "her" or "she"). How can you eliminate
subtle sexism from your language? Rather easily, actually, and you can begin by using
pronouns such as "he" and "she" only when it is absolutely necessary to individualize
some person's behavior or when an example requires solo identification. The appro-
priate alternative, of course, is to write using gender-neutral plurals (i.e., they, them,
we). This course is infinitely preferable to the cumbersome and tedious (to read, any-
way) practice of using "he/she" or, worse still, "(s)he." The latter constructions not only
fail to adhere to APA style (see the Publication Manual; APA, 1994), their novelty is a
source of distraction, slowing readers down by drawing their attention to superfluous
details.
[Figure C.1: An hourglass with "Introduction" at its broad top and "Discussion" at its broad base, narrowing through the Method and Results in between.]
the impression that all the attention to detail is promoting a rather commonsense
enterprise. By that I mean the clear explication of scientific information, including meth-
ods, procedures, and statistical tests, all of which advances knowledge by promoting un-
derstanding and encouraging researchers to replicate or extend the work of published
investigators.
What are the components in an APA-style manuscript? There are usually eight: a
Title, Abstract, Introduction, Method, Results, Discussion, References, and Tables or
Figures (but see the Publication Manual for others; APA, 1994). Because I want you to
get a sense of the conceptual structure of a manuscript, we will briefly summarize the
contents of each section below. Envision an hourglass (Bem, 1987), one where the in-
troduction is at the broad top-the lid-of the hourglass and the Discussion lies at the
bottom of its broad base (see Figure C.l). Both the beginning and end of an APA-style
manuscript contain broad statements about the nature of the topic under investigation.
The introduction literally introduces the relevance of the topic to the study of behav-
ior, and gradually narrows its focus to the specific question(s) addressed by the paper.
At the other end of the manuscript, the Discussion deals with the (broader) implica-
tions of what was found for current and future work.
What about the narrow "neck" of the metaphorical hourglass? At the narrowest
point is the Method section of the APA-style paper (see Figure C.1). I am using the word
"narrow" to reflect concern with reporting a step-by-step account of exactly how a piece
of research was carried out from start to finish. The details provided should be suffi-
cient to enable a motivated reader to recreate the experiment from scratch in order to
replicate the results. Below this narrow band of the hourglass lie the Results, which
should contain a thorough accounting of what happened in terms of behavioral out-
comes in the study (see Figure C.1). The Results section is also the place where the spe-
cific statistical analyses, as well as supporting symbols and numbers, are presented. Once
the specific behavioral outcomes are reviewed, the Results section can begin to broaden
into the Discussion, where the previously acknowledged focus on implications can be
reviewed.
Beyond the hourglass metaphor, one other important aspect of APA-style writing
stands out: Each section of a paper should be written so that it stands alone (Bem, 1987;
Dunn, 1999). By "stands alone," I mean that a motivated reader could begin by read-
ing the last section of a paper (the Discussion) and still have a coherent sense of what
was done and why in one or more of the earlier sections of the manuscript. Indeed, ex-
pert readers prefer to scan individual sections of a paper in order to determine whether
to read the whole thing. A given reader may begin with the Abstract, notice some in-
teresting point related to the research participants, and then immediately turn to the
Method. Another might read serially, from the Abstract, to the introduction, and all the
way through to the References.
Ideally, each section of an APA-style paper prepares readers for what appears in its
subsequent section(s). Thus, the introduction prepares the reader for the details of the
procedure found in the Method or for what happened, behaviorally speaking, in the
Results. How is some degree of consistency in content achieved so that readers can move
from one section to another with little difficulty? Usually by having a writer make cer-
tain that the study's purpose-its hypothesis-is recapitulated in some fashion within
each of the sections, despite the fact that it is primarily associated with the tail-end of
the introduction, where it usually serves as a segue into the Method. Such constancy of
purpose in APA-style writing takes practice, but is well worth the effort because it ef-
fectively educates and reminds readers of the study's unique purpose throughout its
written summary.
We can now briefly summarize the content and scope of the eight main sections
of the typical APA-style paper. Because there are ample resources available-both ref-
erence books and software-that provide concrete guidance about APA formatting re-
quirements, everything from pagination to capitalization and citation style, I will not
review such details here (but see, for example, Gelfand & Walker, 1990). For your con-
venience, however, I provide the number(s) of the relevant sections of the Publication
Manual for each of the components of the typical APA-style manuscript.
Title (Refer to Section 1.06 of the Publication Manual). A paper's title should con-
vey the nature of a topic, including relevant variables or theories, in as few words as
possible. Because every word is important, phrases such as "An Experiment Concern-
ing" or "A Study of" should be dropped from a title. Thus, "The Effect of Proximity
Cues on Romantic Attraction" is preferable to "An Experimental Study of the Effect of
Proximity Cues on Romantic Attraction." A good rule of thumb is to keep a title's length
to 12 words or less. Your name and institutional affiliation should appear two spaces
below the title on the title page, the first page of an APA-style manuscript.
the introduction should open with a relatively broad perspective on the research topic,
gradually introducing, emphasizing, and integrating more specific information leading
up to a concrete rationale for the research. Any researcher must expend considerable
effort to make the inferential logic underlying the study's design clear and objective.
Introductions, which begin on the third page of APA-style manuscripts, tend to be sev-
eral pages in length.
Method (Refer to Section 1.09 of the Publication Manual). The purpose of the Method
section is to provide sufficient detail for readers to follow and understand all the pro-
cedures and materials used to run an experiment (or other investigation) from start to
finish. In many ways, a Method section is like a script or set of directions for readers to
follow. Most Method sections will contain a participants section describing the research
participants (e.g., gender, age); an apparatus or materials section where any special
equipment (e.g., operant chambers, computer programs) or written materials (e.g., pub-
lished tests, measures, or personality inventories) are noted; and a procedure section out-
lining what an experimenter(s) and participant(s) did at each stage of the research.
Depending on the nature of the research, other detailed subsections can also be added
into the Method.
Results (Refer to Section 1.10 of the Publication Manual). A Results section reviews
the nature of the data collected, accounts for the manner in which they were analyzed,
and then links the findings to the original hypothesis of the study. Because a later sec-
tion of this appendix is focused on how to structure and write a Results section, I will
defer the majority of my suggestions until then. I do, however, want to remind readers
that a Results section should not be focused on interpreting any findings in detail or
discussing their implications-these activities are reserved for the Discussion section.
Discussion (Refer to Section 1.11 of the Publication Manual). The degree to which
a study's results fit the hypothesis, and whether this "fit" or lack thereof has any impact on related
research, is the focus of the APA-style Discussion section. Most Discussion sections have
three goals: to review a study's main purpose and discuss whether the results indicate it
was achieved; to highlight any deviations from the research expectations, including prob-
lems or limitations with what was found; and, finally, to judge whether the obtained
results add to existing knowledge or point to future directions similar investigations should
follow. Discussion sections invite both the author and reader to see the "forest" of the be-
havioral results (i.e., larger meaning), not just the "trees" (i.e., individual findings).
References (Refer to Sections 1.13 and 3.94-3.117 in the Publication Manual). The
References section of the APA-style paper is its scholarly trail. Interested readers can ex-
amine it to follow a researcher's logic, or use it to add to their own knowledge regard-
ing a topic. Any idea drawn from an article, book, book chapter, or other source-
including the Internet-used in the course of the research must be referenced here and
according to a particular citation style (for numerous examples of APA-style references,
please turn to the References section for this book). The proper citation of references
is a crucial part of the research enterprise, as it is an important way that scientists "com-
municate" with one another and the common way that students learn to navigate the
literature pertaining to particular topics or even fields.
Tables and Figures (Refer to Sections 3.62-3.86 in the Publication Manual). As
noted in Chapter 3 of this book, tables and figures should be used sparingly to high-
light particular aspects of data, analyses, or the relationships among variables. In
APA parlance, a table summarizes some results, including means and standard de-
viations, or the results of more complex inferential analyses. On the other hand, a
figure is a graph, a picture, or some line drawing designed to efficiently convey the
meaning of some relationship or result. A reader should be able to glance at a
figure or a table and derive meaning from it in a matter of seconds. If this reason-
able goal cannot be achieved in a relatively short amount of time, then it is fair to
assume that the data presentation-whether table or figure-is too complicated. For
recent, authoritative guidance on how to prepare APA-style tables, please see Nicol
and Pexman (1999).
Two Common Problems in Student Papers Written in APA Style. Aside from the prob-
lems presented above, students tend to overwrite APA style introductions while under-
writing their Discussions. By overwriting an introduction, I mean that most of the time
and energy in creating a draft is expended in the first section, chiefly on the literature
review. Problems occur when students discuss related research in too much detail, ne-
glecting to focus in on what points in prior work are actually relevant to the current
project, or citing too many studies that are only tangentially related to the current work.
When reviewing studies using a particular research paradigm (e.g., the bogus pipeline;
Jones & Sigall, 1971) to examine people's attitudes, for example, avoid the temptation
also to cite all other studies pertaining to attitude research (and there are hundreds,
possibly thousands)-just choose those that relate to the specific topic at hand.
Less, then, is more: Strive to provide readers with only the information they need to
evaluate your ideas and their relationship to a topical area of research, not an entire
field of research.
What about the problem of underwriting? By the time students reach the Dis-
cussion section, their production of prose often wanes, as if they are too tired to
really think about or expand on the implications of their results. Often, I find that a
quick summary is all that is provided, one that does little more than repeat the
"greatest hits" of the introduction and the Results sections. Ideally, a Discussion sec-
tion invites the writer as well as the reader to really think about what the results mean
in context. Do they extend the theory or identify some limit to it? Was the hypothe-
sis confirmed, or will it need to be revised for use in future studies? Why? Can the
results be used to design future studies addressing some of the unanswered questions?
No matter how you choose to end your paper, though, try not to close with just the
tired, worn phrase, "More research on this topic is necessary." Why? Because more research is always called for-what else can you tell readers that they do not already
know? Be creative!
Other suggestions about writing an APA-style paper can be found in Dunn (1999)
or in several of the behavioral-science-oriented writing references provided in the list
appearing at the end of this appendix.
• Brief discussion of the specific (main) results, which serve to set up main
conclusions that can be mentioned in the Results section but are only fully
explored in the Discussion.
It is also worth mentioning what information should not appear in a Results section:
• Raw data, scores, or other individual observations should not appear except when
there is a compelling reason to discuss a salient example (e.g., an extreme score that
skewed a result) or to review some illustrative case or response (e.g., "One participant
captured the group's feeling when she said ... ").
• Calculations used to determine any descriptive or inferential statistics unless they
involve some novel, controversial, or otherwise noteworthy innovation.
As we discuss ways to craft a good Results section, we must confront an awful
truth: people do not think statistics is interesting-indeed, they usually assume it
is dull (Abelson, 1995). When writing up any results, your first goal is to identify
what, if anything, is surprising, unexpected, interesting, or otherwise revealing. In
other words, remember the MAGIC criteria when presenting what you found (see
Data Box 9.F and the related discussion in chapter 15; see also Abelson, 1995). When
writing the opening paragraph of a Results section, then, begin on a strong note, one
that reminds readers about the main hypothesis being tested and why it is a com-
pelling one.
Beyond the "interestingness" of the results, of course, there is an overarching
goal for the writer of any Results section: Explain in plain language what the results
reveal about some aspect of human (or animal) behavior (Dunn, 1999). Once the
hypothesis of interest is noted, you can conceptually begin to answer these sorts of
questions for the reader. (When there is more than one hypothesis, let the reader
know that early in the Results section, and then examine the hypotheses and results
in a serial fashion unless some other framework is more appropriate.) What, specif-
ically, did the research participants do behaviorally? To answer this question, discuss
how behavior was measured (i.e., review the nature of the dependent measure and
the data gleaned from it). Why did the participants act one way and not another?
In most cases, observed behavior-especially when it more or less confirms a
hypothesis-will be discussed in terms of some independent variable. The writer's
job is to make it clear as to whether any experimental manipulation actually had its
predicted effects.
In the course of addressing these questions, you must provide any relevant de-
scriptive or inferential statistics to support the prose statements or conclusions. When
there are relatively few descriptive statistics available, they can be reported inside de-
clarative statements, usually parenthetically (e.g., "The average number of tokens se-
lected was 12 [SD = 2.0]."). A large number of descriptive statistics should be placed
in a table (see chapter 3), and it is important that the prose guides readers to it
(e.g., "The mean rate of response is shown in Table 1 ... "). Unless there are a variety
of similar tests, most inferential statistics should appear at the end of a sentence rather
than in a table, as in: "The experimental group tapped the key more times (M = 27.5)
than did the control group (M = 16.0), t(30) = 4.56, p < .05." Just be sure to provide the
statistical symbol(s) so readers will know how the data were analyzed, the available de-
grees of freedom (if necessary), and the significance level of the test. In fact, it is often
a good idea to concretely state what test was used: "The two means were compared us-
ing an independent groups t test."
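The arithmetic behind such a report can be sketched in a few lines of code. The data below are hypothetical (they are not from the key-tapping example quoted above); the sketch simply shows how a pooled-variance t statistic and its degrees of freedom are computed so they can be reported in the "t(df) = value" form.

```python
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent samples.

    Returns (t, df) so the result can be reported in the
    "t(df) = value" form used in APA-style prose.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # variance() from the statistics module divides by n - 1 (sample variance)
    pooled = ((n1 - 1) * variance(group1) +
              (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    t = (m1 - m2) / (pooled * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2

# Hypothetical key-tapping scores for two groups of four participants
experimental = [28, 26, 30, 26]
control = [15, 17, 16, 16]
t, df = independent_t(experimental, control)
print(f"t({df}) = {t:.2f}")  # -> t(6) = 11.05
```

The sketch assumes equal-variance, independent groups; verify any value against statistical software before reporting it.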
I urge you to outline the results that will appear in a Results section hierarchically
and on paper before you actually sit down to write. A conceptual example of what I
Table C.1 A Sample Hierarchical Outline for Writing Up Results
Opening Paragraph-Overview of Results
• Brief recap of overall hypothesis and method
• Emphasis on what aspect of human (or animal) behavior is being examined
• Review of independent variable(s) and dependent measure(s)
• Nature of the data
Main Result
• Specific statistical test used and brief rationale for it
• Concrete description of behavioral result and link to hypothesis
• Numerical value of test statistic (symbol, degrees of freedom, significance level)
• Table(s) and/or figure(s) linked to statistical test(s)
• Supporting statistics (e.g., effect size) and power issues
Secondary Results (if any)
• Specific statistical test used and brief rationale for it
• Concrete description of behavioral result and link to hypothesis
• Numerical value of test statistic (symbol, degrees of freedom, significance level)
• Table(s) and/or figure(s) linked to statistical test(s)
• Supporting statistics (e.g., effect size) and power issues
Tertiary Results (if any)
• Specific statistical test used and brief rationale for it
• Concrete description of behavioral result and link to hypothesis
• Numerical value of test statistic (symbol, degrees of freedom, significance level)
• Table(s) and/or figure(s) linked to statistical test(s)
• Supporting statistics (e.g., effect size) and power issues
Transition to Discussion Section
have in mind can be seen in Table C.1. As shown in Table C.1, an overview paragraph prepares the reader for the more detailed results that follow. Many student studies will only include one main result (see Table C.1), though more ambitious projects may contain a main result as well as secondary results-those that relate to or otherwise support the main results-and even tertiary results (see Table C.1).
Two important issues are often overlooked when results are presented in APA-style
articles. First, many writers assume that readers will remember the direction of pre-
dicted effects-that is, which mean is hypothesized to be larger than which other mean
(or means). To the researcher, the direction of effects is obvious because it is based on
the hypothesis (and the desired results) but, to paraphrase the French philosophe
Voltaire, common sense is not so common. Take the time to specifically remind the
reader about the direction of an effect and whether the obtained results (e.g., means)
match it or deviate from it. Second, do not forget to present supporting statistics deal-
ing with effect size and the degree of association between the independent variable and
the dependent measure, among others. It is also a good idea to comment on the power
of the test statistics used, especially in situations where a mean difference appears to be
strong but is actually modest or even weak (refer to the discussion of statistical power
and effect size in chapter 9; see also Data Box 10.D and the discussion of power at the
end of chapter 10).
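One common effect-size index for two independent means, Cohen's d, is simply the difference between the means divided by their pooled standard deviation. A minimal sketch with hypothetical data:

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: the standardized difference between two independent
    means, using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_sd = (((n1 - 1) * stdev(group1) ** 2 +
                  (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# A "significant" difference can still be a modest effect, and vice versa
print(round(cohens_d([12, 14, 13, 15], [11, 13, 12, 12]), 2))  # -> 1.39
```

Reporting d alongside the test statistic lets readers judge the size of an effect, not just its significance.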
When should the APA-style Results section be written? To many novice APA-style
writers, this issue will seem to be an odd one. After all, isn't the Results section writ-
ten after the Method, which, in turn, is composed after the introduction? This linear
Table C.2 Recommended Steps for Writing an APA-Style Paper
Step 1. Draft the Method section
Step 2. Draft the Results section
• Revisit step 1 for revising and editing
Step 3. Draft the Introduction
• Revisit steps 1 and 2 for revising and editing
• As references are used, cite them in the References section
Step 4. Draft the Discussion section
• Revisit steps 1, 2, and 3 for revising and editing
Step 5. Draft the Abstract
• Create the title page
• Create tables and figures
• Verify references
• Revisit steps 1, 2, 3, and 4
Source: Adapted from Table 3.6 in Dunn (1999, p. 98).
approach is one that many student authors use, but I advocate that you consider an alternative approach. Write the Method section when you are still collecting data (i.e., the
procedural details will be fresh in your mind) and then craft the Results section after
the analyses are finished but before you ever write the introduction. Why? The results
of the research really inform every other section of the paper and, in particular, it is
important to have a firm grasp of what happened in the research before the
introduction is undertaken. Think about it: The researcher should know what happened
before wrestling with the conceptual presentation of the research question and the sup-
porting literature. Such foreknowledge will keep the introduction focused-doing the
Results later rather than sooner often leads to a meandering introduction.
Beginning with the Method and proceeding straight to the drafting of the Results,
Table C.2 contains a recommended series of steps for writing an APA-style paper. As
you can see, a process of continual revision and refinement of ideas and prose is built
into the steps shown in Table C.2. For additional suggestions about writing an APA-style
Results section or paper, consult chapters 3 and 8 in Dunn (1999).
O'Connor, P. T. (1999). Words fail me: What everyone who writes should know about writing. New
York: Harcourt Brace.
Parrott, L., III. (1999). How to write psychology papers (2nd ed.). New York: Longman.
Rosnow, R. L., & Rosnow, M. (1995). Writing papers in psychology (3rd ed.). Pacific Grove, CA:
Brooks/Cole.
Scott, J. M., Koch, R. E., Scott, G. M., & Garrison, S. M. (1999). The psychology student writer's
manual. Upper Saddle River, NJ: Prentice Hall.
Smyth, T. R. (1996). Writing in psychology: A student guide (2nd ed.). New York: Wiley.
Sternberg, R. J. (1993). The psychologist's companion: A guide to scientific writing for students and
researchers (3rd ed.). Cambridge: Cambridge University Press.
Strunk, W., Jr., & White, E. B. (1972). The elements of style. New York: Macmillan.
Williams, B. T., & Brydon-Miller, M. (1997). Concept to completion: Writing well in the social sci-
ences. Fort Worth, TX: Harcourt Brace College Publishers.
Williams, J. M. (1990). Style: Toward clarity and grace. Chicago: University of Chicago Press.
APPENDIX D
Organization, Time Management, and Prepping Data for Analysis
Project Organization: Identifying What Needs to Be Done and When
planning to conduct. About how long (in hours and days) will each of the required activities take? Will the
projected amount of time to perform these research activities match or "fit into" the actual amount of avail-
able time? This issue is key: You cannot do a competent project if there is not enough time available for its
completion. As a matter of fact, you may find that you need to revise or streamline various research activities in order to complete the project in the allotted time. One final suggestion: If your instructor does not
give you a deadline, impose one on yourself. There is an old but valued maxim in writing circles, one that
applies equally well to conducting research: A deadline is your friend.
Sit down with a calendar and review each of the project activities listed below in the left column. Write
down the amount of time that you believe it will take you to complete a given activity in the space provided
in the right column. (Please note that each activity is actually composed of many smaller activities-be sure
to take these into account in your planning!) If an activity does not apply to your project, skip it and enter
a "0" in the right column.
What do such time estimates have to do with the statistical analysis of data? Everything, actually-you
cannot do any analyses unless you have some data to analyze, as well as the time to do so. The research project
and all of its inherent details must be taken seriously. When you are finished evaluating the activities, sum the
day(s) and hour(s) in the right column to form a time estimate. Compare this time estimate with the actual amount of time you have available to conduct the research and analyses-is your estimate more or less than the time available? If it is more, then you will need to adjust your schedule or the project activities accordingly. If
the estimate is less, avoid complacency by getting started right away-remember that things often take more
time than projected.
A positive difference means that the estimate exceeds the time available-reevaluate the activities and make appropriate adjustments. A negative difference means that the project can be completed within the time available.
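The worksheet's bookkeeping amounts to summing the estimates and subtracting the time available. A small sketch (the activity names and hours below are hypothetical):

```python
# Hypothetical per-activity time estimates, in hours
estimates = {
    "literature search": 10,
    "materials preparation": 6,
    "data collection": 20,
    "data entry and checking": 5,
    "analysis": 8,
    "writing": 15,
}

available_hours = 60
total = sum(estimates.values())
difference = total - available_hours

print(f"Estimated: {total} h; available: {available_hours} h; difference: {difference} h")
# A positive difference means the plan exceeds the time available
if difference > 0:
    print("Adjust the schedule or streamline activities.")
else:
    print("The project fits the time available--start right away.")
```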
I recommend that you update your time estimate at least two or three times in the course of conducting a project (Dunn, 1999). Some activities will take a shorter time than expected, whereas others can drag on much longer than predicted (this is especially true where participant recruitment and the actual collection of data are concerned). We now consider one project activity in detail, the prepping of data for analysis.
Prepping Data for Analysis
Your experiment or study was organized from beginning to end, and the data were collected and then
coded into some numerical form for analysis. What is the next step? Wise data analysts know that it is never
a good idea to begin the main statistical analyses (those pertaining to the main hypothesis or hypotheses of
interest) until the data are verified for accuracy. Ways to verify data accuracy are the focus of comments in
the remainder of this section, several of which were gleaned from an excellent chapter by Dollinger and
DiLalla (1996). Additional good advice can be found in chapters 1 and 2 of Newton and Rudestam (1999).
Check Scales Before Employing Them in a Study. Many researchers and students rely on published scales
or inventories in their work, often making changes in the words or phrases used in these measures. Such
changes are generally fine, but they must be carefully documented for the reader (see Smith, Budzeika, Ed-
wards, Johnson, & Bearse, 1986). If a scale is changed and no record of the change is maintained, a researcher
can lose valuable time backtracking to identify the nature of the change. Worse still, the specific change may
be forgotten and, thus, never acknowledged in the analyses or write-up of the project. Document a change
before the data collection, making sure that the change does not adversely affect the analysis that will even-
tually be used on data from the measure (Dollinger & DiLalla, 1996).
Some published measures also contain items where respondents' numerical responses must be altered
or "recoded" before a scale total can be determined. Often referred to as "reverse recoding," an item rating
is essentially flipped so that it remains consistent with the rest of the scale. Perhaps most of the items in a
scale are phrased positively ("I am a hard worker") and rated on a five-point scale (higher numbers are more
favorable), but one item is negative ("I am lazy"). In order to make the latter item's rating consistent with
the positive items, a respondent's response must be altered. Specifically, a rating of 1 would be recoded as a
5, and a 2 as a 4 (and vice versa); only the neutral (midscale) rating of 3 remains unchanged. Published mea-
sures will include specific directions about recoding requirements like this one, but researchers must be on
the lookout for them. Be sure, then, to read any fine print, directions, or test manuals before using published
scales-later you will be glad that you did so.
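Reverse coding itself is a one-line computation: subtract the rating from the sum of the scale's endpoints. A sketch for a 1-to-5 scale:

```python
def reverse_code(rating, scale_max=5, scale_min=1):
    """Flip a rating on a Likert-type scale so a negatively worded
    item scores in the same direction as positively worded items."""
    return scale_max + scale_min - rating

# On a 1-5 scale: 1 -> 5, 2 -> 4, 3 -> 3 (midpoint unchanged), 4 -> 2, 5 -> 1
print([reverse_code(r) for r in [1, 2, 3, 4, 5]])  # -> [5, 4, 3, 2, 1]
```

The same function works for any symmetric scale (e.g., 1 to 7) once the endpoints are supplied.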
Check the Scoring of Each Variable Before Beginning Any Analyses. Once the data are collected and coded
(i.e., converted to numbers), it is essential that the possible range of numbers for each variable is checked. A
smart way to proceed is to create a frequency table for each and every variable (see chapter 3), checking to
make sure that a rating scale that has values, say, from 1 to 7, contains no observations greater than 7 or less
than 1 (e.g., 0 and 8 could not appear in the frequency table). Similarly, if any dichotomous codings are used
for variables-gender is often coded as 1 for male and 2 for female-the relevant data must match these two
possibilities exclusively.
In the case of large data sets, statistical software can run frequencies with ease, and a researcher
can "eyeball" the resulting information to find out-of-range errors in a matter of minutes. Smaller data sets
are just as manageable, even when done by hand. Just take the necessary time to check all the values and their
respective ranges-catching errors in the short run will save time as well as heartbreak over wished-for re-
sults in the long run.
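A frequency-based range check like the one described above can be sketched as follows (the ratings shown are hypothetical and deliberately contain two out-of-range codes):

```python
from collections import Counter

def out_of_range(values, low, high):
    """Return a frequency table and any values outside [low, high]."""
    freq = Counter(values)
    bad = sorted(v for v in freq if v < low or v > high)
    return freq, bad

# Hypothetical 1-to-7 rating data containing two coding errors (0 and 8)
ratings = [3, 5, 7, 1, 0, 4, 8, 6, 2, 5]
freq, bad = out_of_range(ratings, 1, 7)
print("Out-of-range values:", bad)  # -> [0, 8]
```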
Calculate Descriptive Statistics for All Variables, Especially the Main Dependent Measures. This suggestion is another way to check for out-of-range errors-also known as "outliers"-and to make sure that a set of
data is clean before the main analyses can begin. You need to get a feel for your data, and "snooping" around
in it to make sure that the means and standard deviations make sense is a smart way to begin. Note that this
"snooping" does not preclude the possibility that people's mean responses did not come out as expected, but it
will help you to recognize-and verify-those situations where the observed means (or medians, variances, stan-
dard deviations, etc.) do not fit expectations.
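Such "snooping" can be as simple as printing the mean, standard deviation, and minimum and maximum for each variable. A sketch with hypothetical data (note how the maximum flags a value worth double-checking):

```python
from statistics import mean, stdev

# Hypothetical data set: one column per variable
data = {
    "age": [19, 21, 20, 22, 19, 47],   # 47 may be an entry error--or a real adult learner
    "score": [12, 15, 11, 14, 13, 12],
}

for name, values in data.items():
    print(f"{name}: M = {mean(values):.2f}, SD = {stdev(values):.2f}, "
          f"min = {min(values)}, max = {max(values)}")
```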
Dealing with Missing Values. Some researchers deal with missing values-that is, participants' nonre-
sponses to some question or series of questions-by not including any data from the nonrespondent in any
analyses. This rule holds true whether one item or a host of items were neglected. Other investigators prefer
some data to none, so they routinely include nonrespondents in their analyses by making certain that the N
in any analysis involving missing data is adjusted downward (i.e., a sample has 20 respondents but only 19
answered item 12-the denominator for calculating the mean for item 12 becomes 19, not 20, because of the
nonrespondent). By the way, the presence of nonrespondents or missing data in a sample must be docu-
mented in the Method and (usually) the Results sections of APA-style papers.
Particular problems occur when novice data analysts enter 0 instead of leaving a missing value blank or empty within a data set. The problem, of course, is twofold: the (non)respondent never endorsed a 0 rating, and the value of 0 has a decided mathematical effect on any calculation in which it is (inappropriately) entered (e.g., deflating the mean). The rule, then, is a simple one-leave blanks blank and never enter a 0 unless it is both meaningful and apt (e.g., question: "How many times were you sick last semester?" answer: "0 times").
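Both rules-adjust N downward for nonresponses and never let a 0 stand in for a blank-can be sketched as follows (the responses are hypothetical):

```python
def mean_with_missing(values):
    """Mean that ignores missing responses (None), adjusting N downward.

    Entering 0 for a nonresponse would wrongly deflate the mean;
    leaving it as None keeps it out of both the sum and the denominator.
    """
    answered = [v for v in values if v is not None]
    return sum(answered) / len(answered), len(answered)

# 20 respondents, but one skipped item 12
item12 = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5, 4, 4, 3, 5, 4, None, 4, 3, 5, 4]
m, n = mean_with_missing(item12)
print(f"M = {m:.2f} (n = {n})")  # -> M = 3.95 (n = 19)
```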
Keep Compulsive Records. This piece of advice will sound silly but it is tried and true-label things compulsively (Smith et al., 1986). There is always a chance that you may sit on your data for a while (e.g., a semester, a year), or you may decide to pick up a research project idea from one course as an independent study much later in your academic career. Unless you keep careful and meticulous notes about why you did various things to your data, or why you picked the analysis of variance (ANOVA) in lieu of the t test, there is a very good chance that you will not remember the rationale for decisions made many months before. As noted in chapter 15, one's memory for statistical concepts is apt to fade unless those concepts are used with some regularity. Why risk "re-creating" the wrong thing when a little note taking or thoughtful labeling can save the day?
Make Certain That Significant Differences Are in the "Right" (i.e., Hypothesized) Direction. An elementary but common mistake for first-time data analysts is to focus on the presence of a significant difference ("Wow! The t test is significant at the .05 level!") without checking to make sure that the difference is in the predicted direction (i.e., one mean is appropriately larger than the other). Many an honors or a master's thesis has foundered on the mistaken assumption that any significant difference is a good difference. You (probably) cannot do anything about means that appear in the wrong direction, but you should be aware of their presence so that you do not make the mistake of claiming that a hypothesis was supported when, in fact, the results ran opposite to the prediction.
Learn to Ask the Humble but Simple Question, "Does This Result Make Sense?" As you gain experience
working with statistics and data analysis, you will gradually get a feel for results and when they make sense.
Once the result of a statistical analysis is known, it is always a good idea to stop and examine it to make cer-
tain that it makes "sense" both statistically (i.e., the calculations are correct) and where interpretation is con-
cerned (i.e., does the result fit the prediction? Why or why not?). Many analytic disasters have been averted
by simply and calmly examining the data and statistical analyses with the criterion of common sense in mind.
When something-a number, a result-does not make sense, there is a good chance you will be able to track
down an error in coding or analysis, or some unusual participant responses.
• Suggested References
Dollinger, S. J., & DiLalla, D. L. (1996). Cleaning up data and running preliminary analyses. In F. T. L. Leong & J. T. Austin (Eds.), The psychology research handbook: A guide for graduate students and research assistants (pp. 167-176). Thousand Oaks, CA: Sage.
Dunn, D. S. (1999). The practical researcher: A student guide to conducting psychological research. New York: McGraw-Hill.
Leong, F. T. L., & Austin, J. T. (Eds.). (1996). The psychology research handbook: A guide for graduate students and research assistants. Thousand Oaks, CA: Sage.
Newton, R. R., & Rudestam, K. E. (1999). Your statistical consultant: Answers to your data analysis questions. Thousand Oaks, CA: Sage.
Smith, P. C., Budzeika, K. A., Edwards, N. A., Johnson, S. M., & Bearse, L. N. (1986). Guidelines for clean data: Detection of common mistakes. Journal of Applied Psychology, 71, 457-460.
APPENDIX E
ANSWERS TO ODD-NUMBERED
END-OF-CHAPTER PROBLEMS
As you compare your answers to those provided here, please keep in mind that different statistics require
differing amounts of calculation in order to determine a solution. When performing a calculation with
multiple stages, it is entirely possible that you will round numerical answers to the nearest decimal place
each step of the way. Despite the conventions presented in chapter 1, data analysts, too, round their an-
swers in various ways, depending on the circumstance. Similarly, different calculators round numbers to
several places or only a few behind a decimal point-it all depends on the designer's whims. The upshot
of all this rounding error is that your answer to a given question may not match what is shown here.
Should you be worried that you did something wrong? No, not unless the difference between the two an-
swers is large, in which case you should verify that your calculations and math are correct. If the differ-
ence is small, it is entirely acceptable and no doubt due to rounding error.
Chapter 1
1. What is a statistic? A statistic is a number representing some piece of information.
Can data analysis differ from statistical analysis? Why, or why not? Yes, they can differ. Statistical analysis
refers to working through the necessary calculations to identify relationships within quantitative results, while
data analysis allows for the possibility that nonquantitative information will also be of interest.
3. How do variables differ from constants? Give an example of each. A variable is any measurement value
that can change from one measurement to the next, while a constant remains the same from measurement to
measurement. Height and weight are variables (from individual to individual). The temperature at which
water freezes is a constant (under the same conditions each time).
5. Why are mathematics and statistics different disciplines? Mathematics is the discipline concerned with
the analysis of logical relationships among quantity and volume. Statistics, while it uses mathematical calcula-
tions, is the discipline concerned with the analysis of relationships among data, inferences that can be drawn
from these relationships, and the rules by which data are collected.
What makes some mathematical operations statistical? When a mathematical operation results in a numer-
ical answer that helps us research or understand an empirical fact, then it is statistical.
7. What are empirical data? Empirical data are measurements assigned to any observation in an experi-
ment or from experience.
9. Define inductive and deductive reasoning, and then give an example of each process. Inductive reason-
ing involves arriving at a general conclusion based on the observations from several specific events of a similar
nature. Example: I notice that moderate increases in anxiety, hunger, or thirst lead to better performance on a
repetitious task, and I conclude that increases in arousal lead to increases in performance.
Deductive reasoning involves using more general knowledge to draw conclusions (make predictions)
about specific instances. If I know moderate arousal improves performance and that mild electrical shock is
arousing, I might predict (conclude) that mild electrical shock during the performance of a task requiring
that a subject cross out all the vowels in a page of text would improve the subject's performance.
11. Define random sampling. Random sampling involves selecting a sample from a population in such a
way that all members of the population have an equal chance of being selected in the sample.
13. Can inferential statistics prove without any doubt that a given sample is from some particular popula-
tion? Why or why not? No. Any sample, however large and representative, may contain unknown biases due to
variables that may have gone unnoticed in the selection of the sample.
15. Identify the upper and lower true limits for the following:
a. 2049.5 and 2050.5 pounds
b. 58.295 and 58.305 minutes
c. 2.5 and 3.5 inches
d. Since this is a discrete rather than a continuous distribution, the score of 70 has no other limits.
Were this value the mean of the test scores for an entire class, however, 69.5 and 70.5 would be the
true limits.
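The rule behind these answers-a continuous score's true limits lie half a unit of measurement below and above the reported value-can be sketched as a short function:

```python
def true_limits(value, unit=1.0):
    """True limits of a continuous measurement: the reported value
    plus and minus half of the unit of measurement."""
    half = unit / 2
    return value - half, value + half

print(true_limits(2050))         # pounds measured to the nearest pound -> (2049.5, 2050.5)
print(true_limits(58.30, 0.01))  # minutes to the nearest hundredth (about 58.295 and 58.305)
print(true_limits(3))            # inches to the nearest inch -> (2.5, 3.5)
```

As in answer d above, the rule applies only to continuous measurements, not to discrete scores.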
17. Why is writing relevant to statistics and data analysis? Data and their interpretation are of little use to a field of study unless they are publicly reported. Thus, writing is, in some sense, as important as (or perhaps more important than) the statistical procedures used.
How can good, clear writing help investigators with their research? Clear, concise, and accurate presentation
of the data, statistical results, and their interpretation eliminates misunderstandings and makes it easier for
other scientists to attempt to replicate and extend the results.
19. Name and define the four basic symbols. X and Y are symbols that represent variables (usually scores or measurements); N refers to the total number of observations or values in a sample or population; and Σ refers to the algebraic process of addition, requesting that the values following it be summed.
a. Y = (7 + 2)² - √25
   = (9)² - 5
   = 81 - 5 = 76
b. X = 10³ + (12 - 5) × 4
   = 1000 + (7) × 4
   = 1000 + 28 = 1028
c. Y = √10 - (-15 + 10)²
   = 3.16 - (-5)²
   = 3.16 - (25)
   = -21.84
d. X = 8 × 2 + (10 + 12)²
   = 16 + 484 = 500
X   Y
4   7
2   2
2   4
3   4
1   4
5   1
ΣX = 4 + 2 + 2 + 3 + 1 + 5 = 17
ΣY = 7 + 2 + 4 + 4 + 4 + 1 = 22
ΣXY = 4(7) + 2(2) + 2(4) + 3(4) + 1(4) + 5(1) = 61
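All of this arithmetic can be verified directly; each assertion below restates one of the worked answers:

```python
# Each assertion restates one of the worked answers above
assert (7 + 2) ** 2 - 25 ** 0.5 == 76                   # a
assert 10 ** 3 + (12 - 5) * 4 == 1028                   # b
assert round(10 ** 0.5 - (-15 + 10) ** 2, 2) == -21.84  # c (rounding the root to two places)
assert 8 * 2 + (10 + 12) ** 2 == 500                    # d

X = [4, 2, 2, 3, 1, 5]
Y = [7, 2, 4, 4, 4, 1]
assert sum(X) == 17                             # sum of X
assert sum(Y) == 22                             # sum of Y
assert sum(x * y for x, y in zip(X, Y)) == 61   # sum of XY products
print("all answers check out")
```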
Chapter 2
1. Describe the steps comprising the research loop of experimentation.
1. Observe an interesting phenomenon or uncover a previously untested theoretical principle.
2. Develop a testable (operationalizeable) hypothesis.
3. Collect data bearing on the hypothesis.
4. Analyze the data and interpret the results.
5. Report or share results and obtain feedback.
6. Return to step 1.
Does it differ from the scientific method introduced in chapter 1? No, it is the practical application of the
scientific method.
How does the research loop help investigators do research? It helps advance scientific knowledge and ensures that established findings are reexamined in light of new findings through the replication and extension of previous results.
3. What are replication studies and why are they scientifically useful? A replication study repeats, or redoes, a previous investigation to determine if the same results can be obtained on a second occasion. Replications are necessary to establish that a new finding can be obtained by other investigators and by the same investigator on different occasions. They are used to determine the reliability of a novel finding.
How does a conceptual (systematic) replication study differ from a standard replication? A conceptual (sys-
tematic) replication extends the new finding beyond the limits of the original experiment. Some aspects of the
systematic replication are kept the same and some aspects are chosen to vary from the original study. A sys-
tematic replication may extend the range of the same variables or look at the effects of new independent vari-
ables in interaction with the original variables.
5. In terms of scientific utility and purpose, how does random assignment differ from random selection?
Random selection requires that every element in the population has an equal chance of being selected in the
sample. That is not always practical or possible. Instead, most social science researchers use random assign-
ment. Individuals who come to participate in an experiment are randomly assigned (i.e., without bias) to one
of however many different conditions there are in the experiment.
7. Explain the difference between systematic random sampling and stratified random sampling. In sys-
tematic random sampling the researcher lists the potential participants in a study according to some order (i.e.,
alphabetically, by telephone number, etc.) and then selects every nth person on the list to obtain their sample.
In stratified random sampling, the researcher selects the sample randomly from subgroups of the population
that are represented in different proportions (i.e., ethnicity, gender, year in school).
How are these sampling techniques used? Systematic random sampling is used in circumstances where sim-
ple random sampling is too expensive or time-consuming (i.e., choosing from the phone book at random),
while stratified sampling is used when subgroups differ in their representation in a population and the re-
searcher does not wish to oversample any of the subgroups.
Create a hypothetical example to illustrate each one.
Systematic random sample: Have all the students in Introductory Psychology listed by social security
number (highest to lowest) and select every 20th student to participate in your study.
Stratified random sample: In an industry in which 60% of the employees are male and 40% female, a
stratified sample of 100 employees would have 60 males and 40 females.
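These two procedures can be sketched in a few lines of Python; the roster size, sampling interval, and subgroup labels below are hypothetical stand-ins, not part of the problem:

```python
import random

# Hypothetical roster of 200 students, identified by index number
roster = [f"student_{i:03d}" for i in range(200)]

# Systematic random sample: order the list, then take every nth person
n = 20
start = random.randrange(n)           # random starting point in the first interval
systematic_sample = roster[start::n]  # every 20th person -> 10 participants

# Stratified random sample: draw from each subgroup in proportion to its size
# (hypothetical workforce: 600 males, 400 females; sample of 100 -> 60 and 40)
males = [f"male_{i}" for i in range(600)]
females = [f"female_{i}" for i in range(400)]
stratified_sample = random.sample(males, 60) + random.sample(females, 40)
```

Note that the stratified sample fixes the subgroup counts in advance, so the 60/40 split in the population is reproduced exactly in the sample.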
9. Define sampling error. Sampling error is the difference between a sample value and the population pa-
rameter.
Why do researchers need to be concerned about it? The size of the sampling error has much to do with the
accuracy of the generalization from samples to populations. The smaller the sampling error, the more accu-
rate the generalization.
Appendix E Answers To Odd-Numbered End-Of-Chapter Problems E-4
Is sampling error common or rare in research? Why? Sampling error is common in research because, how-
ever careful the researcher is, there is always some nonrepresentativeness in all samples and, therefore, some
sampling error.
11. Define the term independent variable and provide a concrete example of one. An independent variable
is that variable manipulated (intentionally changed in a programmatic fashion) by the experimenter to deter-
mine the effect that it will have. An example would be manipulating the type of activity between a learning ses-
sion and a recall session. One group might spend the interval doing active rehearsal of the material, while an-
other group might simply wait quietly, while a third might be asked to do a series of arithmetic problems.
How are independent variables used in research? Independent variables are manipulated (present in some
groups, not in others) to directly test the investigator's experimental hypothesis. At a minimum there must be
at least two levels (amounts, types) of the independent variable to have the study qualify as an experiment.
13. Define the term dependent measure (or variable) and provide a concrete example of one. A dependent
variable is the one measured by the experimenter to assess the effects of the independent variable. In problem
11, for example, the dependent variable might be the number of items recalled from a list under the different
levels of the independent variable.
How are dependent variables used in research? Dependent variables are used to detect differences between ex-
perimental and control conditions.
15. Create a hypothetical experiment. Describe the randomizing procedure(s) you would use, as well as the
hypothesis, the independent variable(s), and the dependent variable(s). What do you predict will happen in
the experiment? Why? The experiment will measure the effects of mood state during learning on recall. The
independent variable is mood state, either happy or sad. Mood state is induced before learning by having the
subjects recall a happy moment or a sad moment in their recent experience. Subjects will be 30 college sopho-
mores randomly assigned to one of the two states upon showing up for the experiment. Subjects will learn a
serial anticipation list of 30 five-letter words (English) of moderate familiarity to a criterion of one complete
correct repetition. A 15-minute interval will follow during which all subjects will spend the time adding
columns of five-digit numbers. Following the interval, subjects will be asked to write down as many of the
words as they remember. The number of items recalled will be recorded.
The independent variable is mood state: happy versus sad.
The dependent variable is number of items recalled.
I predict the happy mood group will recall more items. Learning should be better (and therefore recall)
if you are in a good mood than if you are in a bad mood.
17. Why does good research in the behavioral sciences generate more questions than it answers? The
present state of knowledge in the behavioral sciences, and the character of the research issues, are such that no
one, definitive (universally true) answer can result from a single experiment. Thus, while good research an-
swers some questions, it also points to those areas of study in which answers are still lacking and those areas
where what we know requires further refinement to clarify the nature of the causal factors.
19. What is a descriptive definition and how does it differ from an operational definition? Descriptive def-
initions are abstract, conceptual descriptions of a relationship involving variables. An operational definition
focuses on concrete and testable ways of dealing with the relationship that are consistent with the nature of the
original hypothesis.
21. Write operational definitions for each of the following variables:
a. Helping can be defined as the response of participating with others in solving a problem.
b. Fear can be defined as the reduction in performance of an activity in the presence of a stimulus
associated with an aversive event.
c. Procrastination can be defined as failing to start a task until nearing a deadline for completion.
d. Tardiness can be defined as arriving after the time for an appointment.
e. Happiness can be defined as a high score on a mood assessment test.
f. Attraction can be defined as increased selection of a particular individual as a dance partner.
g. Factual recall can be defined as a score on a test of historical facts.
23. What is reliability and why is it important when measuring variables? In commonsense terms, reliability is consistency in observed events. In more scientific terms, reliability is stability across time in some measurement. Reliable measures tend to yield less variation in scores than unreliable measures (less measurement error), leading to less sampling error.
Create an example to illustrate reliability. If my scores on the SAT the three times I took it were 1050, 1070,
and 1060, my performance would be more reliable than if they had been 1050, 970, and 1160.
25. What is construct validity? Construct validity expresses the degree of fit between the operational defini-
tion of a construct and the actual meaning and nature of the construct.
Why is it important? It is important that we know that our measure of "something" accurately reflects what
it is, so that we use the measure (and, therefore, the construct) effectively in practice and theory development.
Provide a hypothetical example of construct validity. Scores on the Beck Depression Inventory (BDI) are an
operational measure of the construct "depression." To the extent that high scores are obtained by people who
also report being sad and depressed and low scores are obtained by people who do not report being sad and
depressed, the BDI has construct validity.
27. Define convergent validity and discriminant validity. Convergent validity demonstrates that new mea-
sures of a construct are closely related to existing measures that were previously validated. Discriminant valid-
ity indicates that a new measure of one construct is unrelated (or poorly related) to other constructs.
Are these concepts related? Yes, they are complementary. Convergent validity supports the use of a new mea-
sure by showing its relationship to established measures of a construct, while discriminant validity shows that
the measure does not relate to other, possibly conflicting, constructs. Thus, the new measure is shown to mea-
sure what it is supposed to, and nothing else.
Illustrate, using examples, how these two types of validity are used in research. Suppose that I wished to
measure the intensity of felt physical discomfort (pain) by having participants adjust a dial to a number be-
tween 1 and 100. To demonstrate convergent validity I would then vary the intensity of physical discomfort
for participants and have them set the dial appropriately. I would then determine if these measures were sim-
ilar to other established valid measures of physical discomfort (e.g., McGill Pain Inventory, Visual Analog
Scale). If they were, I would have demonstrated convergent validity. If I now relate the dial scores to other
valid measures of psychological discomfort (i.e., depression, sadness, grief) and found no evidence of rela-
tionship, I would have demonstrated discriminant validity.
29. What is external validity? External validity is reflected in the observation that findings in one study may generalize to other people, in other places, at other times; in other words, the findings remain the same in a variety of similar circumstances.
Why is it important in research? The "outside" implications of research may be of considerable concern to practitioners, and to society, but have little importance in the advancement of theory and knowledge within the science of psychology. However, it is of considerable importance in research directed at the solution to
"real-world" and applied problems.
31. What are the three categories of research design? Describe the strengths and weaknesses of each one.
a. Correlational research
Strengths
1. Effective in discovering relationships or associations to be used in prediction.
2. Effective in identifying variables that might be used in other research designs.
Weaknesses
1. Yields no information about any causal connection between the related variables.
2. Does not use random assignment to the variables and thus may be comparing groups which differ
in more ways than are being measured.
b. Experimental research
Strengths
1. Clarifies causal factors by isolating the effects of the independent variable.
2. Ensures group equivalence before the experiment.
3. Can control for extraneous variables that complicate interpretation of data.
Weaknesses
1. Subject to confounding by other variables.
2. Artificial and may not generalize well to nonexperimental settings.
c. Quasi-experimental research
Strengths
1. Useful in examining situations that do not lend themselves to control and randomization (i.e., gender differences).
2. Useful in providing some information in more natural settings.
Weaknesses
1. Random assignment to different conditions is not usually possible.
2. Control groups are not usually available.
33. Define the concept of correlation. Create hypothetical examples to illustrate a positive, a negative, and
a zero correlation. A correlation expresses the relationship (tendency to covary) between two or more vari-
ables. The relationship may be positive (as one variable increases, so do the others) as in the relationship be-
tween number of calories consumed and weight gain (and loss). The correlation may be negative (as one vari-
able increases, the other decreases) as in the relationship between mortgage interest rates and number of
purchases of homes. No relationship may exist (a zero correlation) when, for instance, you measure the num-
bers of hairs on the head and relate that to grade point average in college.
35. What is a confounded variable in an experiment? A confounded variable is some uncontrolled variable
that varies systematically with the variation in the independent variable.
Why does it prove problematic for understanding causality? If a confounded variable is present, the outcome
of the experiment cannot be clearly attributed to the causal effects of the variations in the independent vari-
able. The outcome may have been due to the uncontrolled systematic variation in the confound (or some in-
teraction). Failure to clearly attribute the causal action to the independent variable defeats the purpose of the
experimental method.
37. What is a random numbers table? How would a researcher use one in her work? A random
numbers table is a list of values from 0 to 9 that is generated so that each number occurs equally in a pattern-
less, unbiased sequence.
A researcher would use one in his/her work to sample a smaller portion of a large population. If a researcher
had 30 people and only wanted to sample half of them due to time and/or financial reasons, he/she can use a
random numbers table to select 15 people. First, assign each person a number from 1 to 30. Start the random
assignment anywhere in Table 2.5 (e.g. row 40), and read across every two numbers. In this example, one
would start with 95, 05, 62, 48, 26, etc. Since there isn't a person who is assigned 95, go on to the next number,
05. There is a person who is 5, so that person is a member of your sample group. Continue to go across the rows
until there are 15 people in the sample.
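The table-reading procedure above can be sketched in Python; a seeded random digit generator stands in for the printed table, so the digit values are illustrative rather than those of Table 2.5:

```python
import random

rng = random.Random(7)  # seeded stand-in for a printed random numbers table

def two_digit_reads(rng):
    """Yield an endless stream of numbers read two digits at a time."""
    while True:
        yield 10 * rng.randrange(10) + rng.randrange(10)

sample = []
for number in two_digit_reads(rng):
    # keep a read only if it names one of the 30 people and is not already chosen
    if 1 <= number <= 30 and number not in sample:
        sample.append(number)
    if len(sample) == 15:
        break
```

Out-of-range reads (31–99 and 00) and repeats are simply skipped, exactly as when reading the printed table by hand.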
39. Describe the procedure for using a random numbers table to perform random assignment for
a basic two group experiment. If a researcher is performing an experiment requiring two groups, he/she
can use the procedure as in Problem 37 for the first experiment group and all the remaining people can be in
the second group. Example: if there are 40 people, assign each person a number from 1 to 40. Start from any-
where on the table. Read every two numbers going across or down. If the number falls between 1 and 40, the
person who is assigned that number goes in your first group, e.g. the control group. Continue this method until
20 people have been chosen. The remaining 20 people can be in the second group, e.g. the treatment group.
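The same assignment logic can be sketched in Python, with a shuffle standing in for reading unbiased digits from the table (the roster of 40 numbered people follows the example above):

```python
import random

# Roster of 40 participants, numbered 1 to 40 as in the example
participants = list(range(1, 41))

shuffled = participants[:]      # copy, so the original numbering is untouched
random.shuffle(shuffled)        # unbiased ordering, like reading down the table
control = shuffled[:20]         # the first 20 drawn form the control group
treatment = shuffled[20:]       # the remaining 20 form the treatment group
```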
Chapter 3
1. Relative frequency distribution
X      f     p      %
6      1    .05     5
5      2    .10    10
4      2    .10    10
3      6    .30    30
2      5    .25    25
1      4    .20    20
Total 20   1.00   100

3. Relative frequency distribution
X      f     p      %
16     1    .04     4
15     3    .12    12
14     1    .04     4
13     6    .24    24
12     5    .20    20
11     5    .20    20
10     4    .16    16
Total 25   1.00   100
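As a check, a relative frequency distribution can be computed in a few lines of Python; the raw scores below are hypothetical but consistent with the proportions in the problem 1 table (N = 20):

```python
from collections import Counter

# Hypothetical raw scores consistent with the problem 1 table (N = 20)
scores = [6, 5, 5, 4, 4, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1]

n = len(scores)
freq = Counter(scores)
for x in sorted(freq, reverse=True):   # list X from high to low, as in the table
    f = freq[x]
    p = f / n                          # proportion = f / N
    print(f"{x:>5} {f:>3} {p:.2f} {100 * p:>4.0f}")
```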
[Histograms: X values 10 to 16 on the horizontal axis, frequency 0 to 7 on the vertical axis.]
7. a. Frequency distribution
X      f      p      %
10     1    .056    5.6
9      3    .167   16.7
8      4    .222   22.2
7      2    .111   11.1
6      1    .056    5.6
5      2    .111   11.1
4      0    .000    0.0
3      4    .222   22.2
2      1    .056    5.6
1      0    .000    0.0
Total 18    1.00    100
b. Histogram
[Histogram: X values 1 to 10 on the horizontal axis, frequency on the vertical axis.]
c. If the usual score on this quiz is a 6, how would you describe the performance of the students
on this quiz? Better than usual.
9. a. N = 31
b. Σf = 31
c. ΣX = 5(10) + 7(4) + 3(3) + 5(2) + 1(6) = 103
[Histogram: score values 1.00 to 5.00 on the horizontal axis, frequency on the vertical axis.]
b. An interval width of 5
17. Describe some of the ways that a graph can misrepresent data.
1. Failing to include the complete range of data for comparison.
2. Exaggerating or minimizing the differences in the data by manipulating the scales on one or more axes.
3. Omitting clear labels on parts of the graph.
What should a critical researcher or viewer do to verify that a graph's data are presented accurately? The
critical researcher should be skeptical and check to see that there is no extraneous information, that all la-
bels are clear and understandable, and that the title is clear and accurate.
19. What is exploratory data analysis (EDA)? Exploratory data analysis is the process used to gain an over-
all impression of what the data are about.
Why do researchers find it useful? EDA provides a quick organizational structure to the data, often provid-
ing a fresh look at the data and aiding in planning further analysis.
21. Using the data from problem 20, construct a stem and leaf diagram using units of 5.
Stem Leaf
2* 0 2 3
2. 7 9
3* 0 1 1 2 3 5
3.
4* 1 2 3 4 5
4. 8
5* 2
5. 6
6* 3 5 5
6. 7
7* 0 1 2 2
7. 6 7 9 9
8* 0 1 3
8. 9
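A stem and leaf diagram with units of 5 can be built in a few lines of Python; the data below are hypothetical, since problem 20's scores are not reproduced in this appendix:

```python
from collections import defaultdict

# Hypothetical scores (problem 20's data are not reproduced in this appendix)
data = [20, 22, 23, 27, 29, 31, 35, 41, 48, 52, 56, 63, 67, 70, 76, 80, 89]

# Units of 5: each stem gets two rows, "*" for leaves 0-4 and "." for leaves 5-9
rows = defaultdict(list)
for x in sorted(data):
    stem, leaf = divmod(x, 10)
    marker = "*" if leaf <= 4 else "."
    rows[(stem, marker)].append(str(leaf))

for stem, marker in sorted(rows):      # "*" sorts before "." in ASCII
    print(f"{stem}{marker} {' '.join(rows[(stem, marker)])}")
```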
23. Using the following table of data, add a column for cumulative frequency (cf) and one for cumulative
percentage (c%)
X     f    cf    c%
10    8    54    100
9     5    46    85.2
8     0    41    75.9
X     f    cf    c%
7     4    41    75.9
6     9    37    68.5
5     6    28    51.8
4     8    22    40.7
3     5    14    25.9
2     3     9    16.7
1     6     6    11.1
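The cf and c% columns can be verified in Python by accumulating frequencies from the bottom of the distribution upward (the f column is taken from the problem 23 table):

```python
# Frequencies from the problem 23 table, listed from X = 10 down to X = 1
xs = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
fs = [8, 5, 0, 4, 9, 6, 8, 5, 3, 6]
n = sum(fs)                            # N = 54

# Cumulative frequency accumulates from the bottom of the distribution upward
cf = 0
rows = []
for x, f in zip(reversed(xs), reversed(fs)):
    cf += f
    rows.append((x, f, cf, round(100 * cf / n, 1)))

for row in reversed(rows):             # print the high scores first, as in the table
    print(*row)
```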
25. Examine this data table and then answer the questions that follow:
X65 = 14.5 + 3(16.9 - 16)/7 = 14.54
c. What is the median score (i.e., 50th percentile)?
X50 = 11.5 + 3(13 - 8)/8 = 13.38
27. Put the percentile results into words.
a. A score of 13 indicates a mastery above the median at about the 46th percentile. This result indi-
cates that about 54% of the students had better mastery.
b. About 35% of the students had mastery above a score of 15, while 65% had scores at or below a
score of approximately 15.
c. Fifty percent of the students showed mastery at or below a score of approximately 13 and 50% had
greater mastery scores.
Chapter 4
1. Define the term "central tendency." The central tendency is a statistical measure that locates the
center of a distribution around which most scores cluster. It identifies the most representative value of that
distribution.
Explain the concept's utility for statistics and data analysis, as well as research in the behavioral sciences. It is
useful in defining the score value that is most representative of the distribution, providing a point of compar-
ison for individual scores in terms of position relative to the central tendency.
3. Would reporting the mean for any of the four distributions from problem 2 pose a problem for an in-
vestigator? Why? Yes, it would. Distribution c in problem 2 has one very extreme score (84), which will make
the mean less representative of the central tendency of the scores. Unlike the other measures of central ten-
dency, the mean is unduly influenced by extreme scores.
5. What measure of central tendency should you report? Why? The median would be the best representa-
tive of the central tendency when a distribution has one or two scores very far from the main group of scores.
The mean will be overly influenced by the extreme score, and there may well be several modes since most
scores are relatively close to each other.
7. Calculate a weighted mean:
X̄ = [32(27.5) + 48(23.0) + 12(25)] / 92 = 2,284/92 = 24.83
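The computation can be checked in Python; the group sizes and means are those given in problem 7:

```python
# Weighted mean: each group mean is weighted by its group size (problem 7 data)
sizes = [32, 48, 12]
means = [27.5, 23.0, 25.0]

weighted_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
print(round(weighted_mean, 2))  # 24.83
```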
9. Which measure of central tendency is best? If your measure is number of months receiving welfare, it
would be best to use the median, since it would be less influenced by the measures above six months than the
mean. On the other hand, if the researcher is simply categorizing the observations as above or below six
months, then the mode would be more appropriate.
11. Create examples of data where the appropriate measure of central tendency is:
a. The mean: 15, 19, 12, 21, 11, 19, 14; a more or less symmetrical distribution with no outliers
b. The median: 15, 19, 12, 21, 11, 19, 14, 43, 37; a distribution with two outliers
c. The mode: 15, 19, 21, 19, 12, 19, 11, 21, 19; a distribution in which one or more values is repeated
13. Which measure of central tendency is most affected by skew in a distribution? Which one is least af-
fected? Why? The mean is most affected by skew since it is responsive to extreme scores (in the extended tail
of the skewed distribution). The mode is least affected by skew since no increase in the length of the tail of the
distribution will affect the number of occurrences of the most frequent score, which will be found where
scores cluster.
19. Assume the distribution in problem 17 represents a population. Calculate μ, σ², and σ.
μ = 43/7 = 6.14
σ² = 50.86/7 = 7.27
σ = √7.27 = 2.70
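The same population formulas (dividing the sum of squared deviations by N) can be sketched in Python; the seven scores below are hypothetical, chosen only so that ΣX = 43 as in the answer above:

```python
import math

def population_parameters(scores):
    """Population mean, variance, and SD: divide the sum of squares by N."""
    n = len(scores)
    mu = sum(scores) / n
    ss = sum((x - mu) ** 2 for x in scores)  # sum of squared deviations
    var = ss / n                             # population variance uses N, not N - 1
    return mu, var, math.sqrt(var)

# Hypothetical scores chosen only so that the sum is 43 with N = 7
mu, var, sd = population_parameters([2, 4, 5, 6, 7, 9, 10])
print(round(mu, 2))  # 6.14
```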
21. Explain the difference between so-called "biased" and "unbiased" estimates of population parameters.
Which type of estimate is used in what sort of situation? A biased estimator underestimates the parameter of
the population, while an unbiased estimator yields a value closer to the parameter than any other. While it is
always more accurate to use an unbiased estimator of a population parameter, in large (N > 100 or 200) samples
the difference between the two is very small. With small samples, however, it is best to use an unbiased estimator
of a parameter.
23.
Biased Unbiased
25. Explain the role variability plays in both homogeneous and heterogeneous distributions. The more heterogeneous
the distribution, generally the greater the variability among the values, since heterogeneity implies
many different values. On the other hand, variability in a homogeneous distribution should be relatively small
since the number of different score values will be fewer.
27. Between what two scores do half of the observations fall in the distributions in problem 2?
a. 8.75 and 16.25
b. 0.875 and 5.125
c. 25.5 and 39.5
d. 11.5 and 14.5
29. What is the relationship between sample size and variability? As sample size increases, generally vari-
ability decreases and becomes more representative of the variability in the population.
Is it better to have data from a larger or a smaller sample? Why? Generally, it is better to have data from
larger samples. They tend to be more representative (less biased) of the population's characteristics in terms
of number and types of subsamples that exist, and the general shape of the population distribution.
31. Calculate the unbiased estimates of the variance and standard deviation for the two samples in problem 30.
a. Sample X
X̄ = 74/8 = 9.25
s² = 26.5
s = 5.15
b. Sample Y
Ȳ = 176/8 = 22
s² = 148
s = 12.17
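The unbiased formulas (dividing by n − 1 rather than n) can be sketched in Python; the sample below is hypothetical, since problem 30's raw scores are not reproduced here:

```python
import statistics

def unbiased_estimates(sample):
    """Unbiased variance and SD estimates: divide the sum of squares by n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    var = ss / (n - 1)                 # n - 1 corrects the small-sample bias
    return var, var ** 0.5

# Hypothetical sample (problem 30's raw scores are not reproduced here)
var, sd = unbiased_estimates([4, 6, 7, 9, 10, 12])

# statistics.variance uses the same n - 1 formula, so the two should agree
assert abs(var - statistics.variance([4, 6, 7, 9, 10, 12])) < 1e-9
```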
33. Based on the decision tree for choosing a measure of central tendency:
a. Median
b. Median
c. Mode
d. Mean
35. Based on the decision tree for choosing to calculate sample statistics, unbiased estimates of population
parameters, or actual population parameters:
a. Unbiased estimates of the population parameters
b. Statistics
c. Parameters
Chapter 5
1. What are the properties of the z distribution? The z distribution has a mean of zero and a standard de-
viation equal to 1 and retains the shape of the original distribution faithfully.
Why are z scores useful? The z score is useful in making comparisons of performance between different indi-
viduals on the same measure or within the individual across different measures.
3. What are the properties of the standard normal distribution? The standard normal distribution is a bell-
shaped, bilaterally symmetrical (if folded in half at the mean, the two halves exactly coincide) distribution with
tails that gradually approach the baseline (asymptotic). The mean = median = mode = 0.
Why is the normal curve useful for statisticians and behavioral scientists? The normal curve is approximated
by the frequency distributions of a wide variety of physical (e. g., height, weight, etc.) and behavioral (e. g., IQ
scores, introversion/extroversion scores, etc.) measures. The well-known characteristics of the normal distrib-
ution can be used to describe, and make inferences about, any phenomena that yield a frequency distribution
that tends toward a normal distribution in shape.
5. Why do researchers standardize data? Researchers standardize data so that comparisons about relative
position of individuals on different measures can be made meaningfully.
What is a standard score? A standard score is a raw score expressed as a distance from the mean of a distri-
bution in terms of numbers of standard deviations.
Why are z scores and T scores standard scores? Z scores and T scores are standard scores because they both
use the standard deviation as the unit of measurement for transforming a raw score.
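Both transformations can be sketched in Python; the raw score, mean, and SD below are hypothetical, and the T scale is taken as the usual mean-50, SD-10 convention:

```python
def z_score(x, mean, sd):
    """Express a raw score as a distance from the mean in SD units."""
    return (x - mean) / sd

def t_score(z, mean=50, sd=10):
    """T transformation using the usual mean-50, SD-10 convention (assumed)."""
    return mean + sd * z

# Hypothetical raw score of 650 on a scale with mean 500 and SD 100
z = z_score(650, 500, 100)
print(z, t_score(z))  # 1.5 65.0
```

Because T scores are a linear function of z, they preserve relative position while avoiding negative values and decimals.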
7. What is probability? Probability is the likelihood of the occurrence of a particular event from among all
possible related events.
How does probability conceptually relate to z scores and the normal distribution? The z score tells you
how many standard deviations a particular value is from the mean of a normal distribution. The area
between the mean and that z score, or between any two z scores, represents the proportion of the time (prob-
ability) a value will be randomly sampled from within that area.
9. z54 = (54 - 78)/12.5 = -1.92
z63.5 = (63.5 - 78)/12.5 = -1.16
z66.0 = (66 - 78)/12.5 = -0.96
z77.0 = (77 - 78)/12.5 = -0.08
z78.5 = (78.5 - 78)/12.5 = 0.04
z81.0 = (81 - 78)/12.5 = 0.24
11. A-1.92 = 47.26%
A-1.16 = 37.70%
A-0.96 = 33.15%
A-0.08 = 3.19%
A0.04 = 1.60%
A0.24 = 9.48%
Zmemory = 1.13
Zspatial relations = 0.69
b. The student was relatively high on memory and relatively low on visualization.
c. The student's percentile rank on verbal ability was 75.5, or approximately 76; 24.5%, or approximately 25%, of students scored above the student on spatial relations.
29. T-1.12 = 64
T-2.3 = 52
T1.18 = 87
T2.67 = 102
T3.58 = 111
31. a. Convert the raw score to a z score.
b. z = (X - μ)/σ
c. Percentile rank of 1.76 = 96.1%
Chapter 6
1. Conceptually, correlation assesses the association between two variables. Why isn't the association
causal in nature? Causality cannot be inferred from a correlation because correlations are bidirectional in na-
ture. It is as reasonable to say X is correlated with Y as it is to say Y is correlated with X. Causality is a
unidirectional relationship (A always leads to B). Also, a third factor, Z, may cause X and Y. This unknown fac-
tor makes the relationship between X and Y spurious.
3. Describe the possible directions of relationships between two variables in correlational analyses. What
role do positive ( + ) and negative ( - ) play? There are two possible directions of the relationship between two
variables. X may increase as Y increases, which is a positive (+) relationship, or X may increase while Y de-
creases (or vice versa), which is described as a negative (-) or inverse relationship.
5. Describe a scatter plot showing a positive correlation, a negative correlation, and a zero correlation.
In a positive correlation the pattern of dots tends to start on the lower left and move up and to the right. In a
negative (inverse) relationship the pattern of dots generally moves from upper left to lower right. In the case of
a zero correlation there is no clear directional character to the scatter plot.
7. In what way is the Pearson r related to z scores? What advantage does the z score provide to the Pearson
r? Since z scores standardize a raw score in relationship to the mean in standard deviation units, one can use
the z scores of an individual on two (or more) variables to determine if their relative positions are related by
multiplying the z scores for each pair of raw scores (their covariation). The correlation is the average product
of the pairs of z scores. All that we know of the logic and statistical character of the z score (i.e., comparability
across different measures, relationship to the normal curve, etc.) may then be applied to correlations.
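The "average product of z-score pairs" definition can be written directly in Python (using population SDs, i.e., dividing by N); the paired scores are hypothetical:

```python
import statistics

def pearson_r(xs, ys):
    """r as the average cross-product of paired z scores (dividing by N)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)  # population SDs
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

# Hypothetical paired scores
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

This gives the same value as the usual computational formula Σxy/√(Σx²Σy²), since both are algebraic rearrangements of the same definition.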
9. Conceptually define the coefficients of determination and nondetermination. How do these coefficients
aid researchers? The coefficient of determination (r²) is the proportion of the change in one variable
which may be accounted for ("explained") by the corresponding change in the other variable. The coefficient
of nondetermination (1 - r²) is a measure of the proportion of change that cannot be accounted for ("unexplained")
by such common (covariation or covariance) changes. The coefficients of determination and nondetermination
help us understand the strength and predictive ability of a relationship to support our interpretation
of experimental findings.
11. Provide the coefficient of nondetermination corresponding to each of the r values shown in problem 8.
b. Calculate a Pearson r between X and Y. Does the r fit the relationship you describe in a? Yes. The indicated
correlation is +.458, indicating a moderate relationship.
c. What are the coefficients of determination and nondetermination for these data? Coefficient of determination
= r² = .2097, which rounds off to .21.
Coefficient of nondetermination = 1 - r² = .7903, which rounds off to .79.
15. Using the following data, determine the nature of this relationship by calculating a Pearson r and then
interpreting the result in words. r = -.634. There is a strong negative relationship between stress and health
ratings. This indicates that as stress scores increase, health ratings get lower (or vice versa).
17. How is the Pearson r used to determine the reliability of a measure? What range of r values is appropriate
for demonstrating reliability? The Pearson r can be used to test reliability by correlating answers to questions
by the same individuals on separate occasions (test-retest reliability), or by using odd-even or split-half
correlations. In order to demonstrate reasonable reliability, the correlation should be +.70 or higher.
19. How would you go about assessing the reliability of a new job stress measure? I would develop a relatively
brief (time is money in industrial settings) questionnaire rating stress levels on the job and give it to a represen-
tative sample (probably stratified by salary or job classification) of employees on two separate occasions and
then calculate the test-retest correlation to determine reliability. If two testings are not feasible, I would use the
item-total correlation as the measure of reliability since it is a brief questionnaire.
21. Does the test appear to be reliable? Why, or why not? No. The test-retest correlation was + .031, which
indicates a very low reliability.
23. Use the decision trees opening this chapter to answer the following questions:
a. A researcher wants to assess the correlation between temperature and rainfall. Which measure of
association is appropriate? Since these are both ratio measures, a Pearson r is the appropriate mea-
sure.
b. A student wants to determine the correlation between his rankings of national league baseball
teams and those of a friend. Which measure of association is appropriate? Since these are ordinal
rankings, calculate a Spearman's rho (rs).
c. An investigator discovers a correlation of + .75 between five pairs of test scores. Is this correla-
tion apt to be reliable? Why? No, it is not likely to be reliable because the sample size is very small.
d. Before calculating a correlation between two variables, a student notices that one of them is
dichotomous: the score for X is either "1" or "2," but Y has a relatively wide range of values. Should
the student be concerned? Why? Yes, the student should be concerned. It is not possible to obtain a
correlation between a dichotomous and a continuous variable using the Pearson product moment
correlation.
Chapter 7
1. What is the nature of the relationship between correlation and regression? Regression uses the degree of
relationship contained in the correlation coefficient to predict the performance by an individual on one variable
(usually denoted Y, or the "dependent" variable) from the known performance of another variable (usually
denoted X, or the "independent" variable). Correlation is used to determine the slope of the regression line.
3. Explain the relationship between a "best fitting" regression line and the "method of least squares." The
regression line, or line of best fit, is the line which makes the sum of the squared differences between the actual
score on Y and the predicted score on Y the smallest. The line of best fit minimizes prediction error variance.
5. What is residual variance and how is it related to the standard error of the estimate? Residual variance is cal-
culated based on the sum of the squared differences between actual and predicted scores divided by N - 2. Thus, it
is a measure of errors of prediction.
Why are lower values for the standard error of the estimate more desirable than larger values? Lower values
for the standard error of the estimate represent smaller deviations between actual and predicted scores. The
lower the standard error of the estimate, the more accurate are our predictions.
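The standard error of the estimate can be sketched in Python as the square root of the residual sum of squares divided by N − 2; the actual and predicted values below are hypothetical:

```python
def standard_error_of_estimate(actual, predicted):
    """Square root of the residual sum of squares divided by N - 2."""
    n = len(actual)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(actual, predicted))
    return (ss_res / (n - 2)) ** 0.5

# Hypothetical actual Y values and the predictions a regression line made for them
actual = [3, 5, 7, 9, 11]
predicted = [3.5, 4.5, 7.0, 9.5, 10.5]
see = standard_error_of_estimate(actual, predicted)
```

Smaller residuals shrink ss_res directly, which is why a lower standard error of the estimate signals more accurate predictions.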
7. What does it mean when statisticians "partition" variation? Partitioning variation means dividing vari-
ance mathematically into two or more components, which may be associated with known factors (i.e., "ex-
plained" variation) or unknown factors (i.e., "unexplained variation").
How and why is the variation around a regression line partitioned? The variation around a regression line may be partitioned by directly calculating the "explained" variation [Σ(Y′ − Ȳ)²] and the "unexplained" variation [Σ(Y − Y′)²], or by using the coefficient of determination (r²) as an estimate of "explained" variation and the coefficient of nondetermination (1 − r²) as an estimate of "unexplained" variation. We calculate these values to determine the magnitude of error relative to r² to assess the accuracy of our predictions.
Appendix E Answers To Odd-Numbered End-Of-Chapter Problems E·18
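The identity behind this partitioning, Σ(Y − Ȳ)² = Σ(Y′ − Ȳ)² + Σ(Y − Y′)², can be checked numerically. The data below are made up for illustration.

```python
# Partitioning the variation around a regression line (hypothetical data):
# SS_total splits into SS_explained plus SS_unexplained, and
# r^2 = explained / total.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 6]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / \
    sum((x - mx) ** 2 for x in X)
a = my - b * mx
pred = [a + b * x for x in X]                    # Y' for each X

total = sum((y - my) ** 2 for y in Y)            # SS_total
explained = sum((yp - my) ** 2 for yp in pred)   # SS_explained
unexplained = sum((y - yp) ** 2
                  for y, yp in zip(Y, pred))     # SS_unexplained

r_squared = explained / total        # coefficient of determination
nondetermination = 1 - r_squared     # coefficient of nondetermination
```

The two coefficients necessarily sum to 1, which is why 1 − r² can stand in for the directly computed "unexplained" share.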
9. A college athlete plays very well in a series of games, but his performance drops off late in the season. The
team's coach explains the change in performance by noting that his star player is under a lot of pressure, as he
must choose between professional sports or graduate school. Can you offer the coach a statistical explanation
for the player's slump in performance? The statistical explanation for the drop in performance could be regres-
sion towards the mean. If the player's average true proficiency was somewhere between his performance early in
the season and his later performance, the later performance could be expected to bring that season's performance
numbers closer to his true average level.
11. Conceptually, how does multiple regression differ from linear regression? Multiple regression is a way of
looking at the relationship between several predictor variables and some dependent performance measure in
the complex environment of the behavioral sciences. It is an extension of linear regression, which looks at a
single predictor and response measure.
In what ways is multiple regression used? Multiple regression may be used to improve prediction by including
several nonoverlapping predictors, or to further causal analysis by separating and exploring the unique contribu-
tion of each predictor (independent variable) to the prediction of the dependent variable (i.e., used in an ex-
ploratory analysis).
15. Using the equation X′ = 15 + .87Y, calculate the predicted X′ when Y = 2.8, 3.7, 4.5, 5.0, and 6.0.
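The predictions follow directly from the given equation; a quick check (the rounding in the comment is ours):

```python
# Problem 15's equation: X' = 15 + .87Y. Plug in each Y value.
def predict_x(y):
    return 15 + 0.87 * y

ys = [2.8, 3.7, 4.5, 5.0, 6.0]
predictions = [predict_x(y) for y in ys]
# approximately 17.44, 18.22, 18.92, 19.35, 20.22
```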
17. What percentage of the Y scores lie between ±1s_est·Y around any given Y′ value? What percentage of the Y scores lies between ±3s_est·Y? Assuming that the errors are normally distributed around the predicted value, then .6826 will lie between ±1s_est·Y and .9974 will lie between ±3s_est·Y.
19. Researchers believe that positive mood is related to work productivity. The following data were collected in an office:
Y′ = 20.38, or approximately 20
c. Andrea's productivity level is 33. What level of mood led to such a high level of productivity?
X′ = 18.14, or approximately 18
s_est·Y = 4.55
s_est·X = 2.28
Approximately 44% of the total variation in productivity is accounted for by positive mood.
21. Assume that each of the data sets in problem 20 is based on an N of 90. What is the value of s_est·Y for each one? Which data set has the largest s_est·Y? The smallest?
a. s_est·Y = 3.63
b. s_est·Y = 3.77
c. s_est·Y = 1.01
d. s_est·Y = 1.79
Data set b has the largest s_est·Y; data set c has the smallest.
23. A psychologist examines the link between a measure of depression (Y) and one for stress (X). Based on
a random sample of students from a very large state university, the psychologist obtains the following data:
Stress Test    Depression Measure
a. If a student's stress score is 32, what is his or her depression score apt to be?
Y′ = 60.60, or approximately 61
b. A new student receives a 20 on the depression measure. What is her estimated score on the stress test?
X′ = 14.39, or approximately 14
c. What percentage of the variation in the depression measure is not explained by scores on the stress test? The coefficient of nondetermination (1 − r²) = .4671. Approximately 47% of the variation is unexplained.
25. Is regression analysis appropriate? No, it is not. Regression analysis is only appropriate if the variables
correlate substantially.
27. Is a regression analysis appropriate? No, it is not. Regression analysis requires two scores from each par-
ticipant, one on the "independent" variable and one on the "dependent" variable. Since in this problem two
scores on each variable are not available for most subjects, regression analysis should be avoided.
■ Chapter 8
1. What are some examples of the sort of probability judgments you make in daily life? If I buy this car, it
will run for a long time. If I study, I will get a better grade. If I ask my friend to lend me $5.00, she will.
Even if they are difficult to quantify, can they still be probability judgments? Why or why not? Yes, they can
still be probability judgments because they are expressions of the likelihood of an event based on past life ex-
periences with similar events.
3. Steve is playing a slot machine in one of the major hotels in Las Vegas. His friend, Paul, tries to convince
him to leave it and join other friends in the main casino. Steve refuses, noting that "I've spent 20 bucks on this
thing so far-I've primed it. I know that I'm due to win soon!" Assuming the usual type of programming, the
machine is a random processor. Thus, the machine has no memory. It does not remember how much Steve has
"invested." What comes up on one pull has no relationship to what comes up on any other pull. The machines
cannot be primed.
Characterize the inherent flaws in Steve's reasoning. What do statisticians call this sort of thinking and be-
havior? Steve is committing the gambler's fallacy in assuming some kind of orderly relationship exists where it
does not.
5. When flipping a coin and obtaining a string of either "heads" or "tails," why do some people assume that
they can accurately predict what the next flip will be? They believe that runs of one type must be more likely
to end on the next flip or two (another form of the gambler's fallacy) because extended runs are such a rare
event.
7. In your opinion, what does the quote from Stanislaw Lem mean about the role probability plays where
human judgments are concerned? What he means is that if the outcome is certain to occur, one does not need
a formal method for making decisions. It is only if there is uncertainty about the outcome that one needs to
have some rule-based method for choosing among alternatives.
9. Using sampling without replacement, provide the probabilities listed in question 8.
a. 1/6
b. 2/6, or 1/3
c. (30/60) × (29/59) = 870/3,540 = .246
11. Determine the following probabilities: p(X = 4); p(X = 11); p(X > 5); p(X < 3); p(X ≥ 8); p(X ≤ 8).
p(X = 11) = 10/53 = .189
p(X > 5) = 39/53 = .736
p(X < 3) = 0
p(X ≥ 8) = 29/53 = .547
p(X ≤ 8) = 35/53 = .660
13. A sack contains 13 black marbles, 16 white marbles, 4 pink marbles, and 8 pink and white marbles.
What is the probability of selecting a marble that is black? p(black) = 13/41 = .317
What is the probability of selecting a marble that is pink? p(pink) = 4/41 = .098
What is the probability of selecting a pink or a white marble? p(pink or white) = 4/41 + 16/41 =
20/41 = .488
What is the probability of selecting a marble that is pink and white? p(pink and white) = 8/41 = .195
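The marble probabilities above can be verified mechanically. Note that "pink and white" marbles are their own category here, so all four categories are mutually exclusive and their frequencies simply add.

```python
# The sack from problem 13: four mutually exclusive categories of marble.
counts = {"black": 13, "white": 16, "pink": 4, "pink_and_white": 8}
total = sum(counts.values())                 # 41 marbles in all

p_black = counts["black"] / total            # 13/41, about .317
p_pink = counts["pink"] / total              # 4/41, about .098
# addition rule for mutually exclusive events:
p_pink_or_white = (counts["pink"] + counts["white"]) / total   # 20/41
p_pink_and_white = counts["pink_and_white"] / total            # 8/41
```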
15. Which of the following distributions of coin tosses is more likely to occur than any of the others?
H-H-H-T- T-Tor H-H-H-H-H-H or T-H-T- T-H-H? They are equally likely to occur.
17. Using the data provided in question 16, show how the multiplication rule for dependent events can be used to calculate the joint probability of being alone and not offering to help the confederate.
p(alone and no help) = p(no help | alone) × p(alone) = (8/50) × (50/86) = .093
19. A measure of romantic attraction has a population mean of 75 and a population standard deviation of 8.
What is the probability of obtaining a score between 76 and 82?
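The text leaves the computation of problem 19 to the reader. Assuming the attraction scores are normally distributed, one way to check it with the standard normal CDF:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 75, 8
z_low = (76 - mu) / sigma     # 0.125
z_high = (82 - mu) / sigma    # 0.875
p_between = normal_cdf(z_high) - normal_cdf(z_low)   # roughly .26
```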
21. A multiple-choice test has 100 questions, and four possible responses to each question. Only one out of
each of the four responses is correct. If a respondent is just guessing, what is the probability of getting 48
questions correct by chance?
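Problem 21 is a binomial question with n = 100 trials and p = .25 per guess. A sketch of the exact calculation (the book may instead use the normal approximation):

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# 100 questions, 4 choices each, pure guessing => p = .25 per question.
# The expected number correct is only 25, so 48 correct is very unlikely.
p_exactly_48 = binomial_pmf(48, 100, 0.25)
p_48_or_more = sum(binomial_pmf(k, 100, 0.25) for k in range(48, 101))
```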
25. A student wants to calculate the likelihood that several events will occur but she does not know the
number of possible observations favoring each event. Can she still calculate the probabilities? No, she can-
not. It is not possible to calculate the probability of events if the total number of possible observations is un-
known.
27. Which probability rule is appropriate for each of the following situations?
a. Events are conditional upon one another: calculate the conditional probability [e.g., p(A | B)].
b. Events occur in an independent sequence: use the multiplication rule for independent events.
c. Events are mutually exclusive: use the addition rule for mutually exclusive events [e.g., p(A or B) = p(A) + p(B)].
d. Events are not mutually exclusive: use the addition rule for nonmutually exclusive events [e.g., p(A or B) = p(A) + p(B) − p(A and B)].
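The four rules in problem 27 can be written as small helpers; the numbers in the usage comments are arbitrary.

```python
def p_conditional(p_a_and_b, p_b):
    """Conditional probability: p(A | B) = p(A and B) / p(B)."""
    return p_a_and_b / p_b

def p_independent_sequence(*probs):
    """Multiplication rule for independent events: p(A) * p(B) * ..."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def p_mutually_exclusive(p_a, p_b):
    """Addition rule for mutually exclusive events."""
    return p_a + p_b

def p_not_mutually_exclusive(p_a, p_b, p_a_and_b):
    """Addition rule for nonmutually exclusive events."""
    return p_a + p_b - p_a_and_b

# e.g., three fair-coin heads in a row: 0.5 * 0.5 * 0.5 = 0.125
p_three_heads = p_independent_sequence(0.5, 0.5, 0.5)
```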
■ Chapter 9
1. What is point estimation? Point estimation uses a single sample statistic to estimate the corresponding
population parameter.
Is point estimation different from interval estimation? How so? Yes, point estimation and interval estima-
tion are different. While point estimation uses a single sample value to estimate a population parameter,
interval estimation uses repeated samples from the same population to gain information about the variabil-
ity among sample estimates.
What role do these two forms of estimation play in hypothesis testing? Point values of experimental effects can be used to determine if they differ from known population parameters. Interval estimates yield information about the accuracy of an estimate of the corresponding population parameter.
3. How do frequency distributions differ from sampling distributions? A frequency distribution is based on individual scores and their frequency of occurrence, while a sampling distribution is based on statistics from repeated samples of a particular size. There is only one frequency distribution for a given set of observations, but there is a different sampling distribution for each different sample size.
5. If fixed, reasonably large sample sizes are repeatedly and randomly drawn from a population, what will the shape of the sampling distribution of means be like? Why? The sampling distribution of means of samples of a reasonable size from a population will tend to be normally shaped. Most of the sample means will be close to each other and to the population mean. Only a relatively few sample means will deviate much from the population mean. This outcome pattern, according to the central limit theorem, yields an approximately normal distribution.
7. Why is the law of large numbers relevant to the central limit theorem? Since the central limit theorem argues that as sample size increases, the sampling distribution of means becomes more normal in shape, it is important to know that the law of large numbers implies that large sample sizes yield better estimates of the parameters than small sample sizes. Both laws stress the importance of large sample sizes.
9. a. σ_X̄ = 20/√15 = 5.16
b. σ_X̄ = 20/√40 = 3.16
c. σ_X̄ = 20/√65 = 2.48
d. σ_X̄ = 20/√80 = 2.24
e. σ_X̄ = 20/√110 = 1.91
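All five answers in problem 9 apply the same formula, σ_X̄ = σ/√n, with σ = 20 and only the sample size changing; a quick check:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma_xbar = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# sigma = 20 throughout problem 9; only n changes
results = {n: round(standard_error(20, n), 2) for n in (15, 40, 65, 80, 110)}
# results -> {15: 5.16, 40: 3.16, 65: 2.48, 80: 2.24, 110: 1.91}
```

Note how the error shrinks with √n: quadrupling the sample size only halves the standard error.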
11. CI95% = 56 ± 2.45
CI99% = 56 ± 3.22
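The printed half-widths imply a standard error of about 1.25 (2.45 / 1.96); the z values below (1.96 for 95%, 2.576 for 99%) are the usual two-tailed critical values, and the standard error is inferred rather than taken from the book's data.

```python
# Confidence interval sketch: mean +/- z * standard error.
mean = 56
se = 1.25        # implied by the printed half-width 2.45 / 1.96
ci95 = (mean - 1.96 * se, mean + 1.96 * se)      # about (53.55, 58.45)
ci99 = (mean - 2.576 * se, mean + 2.576 * se)    # about (52.78, 59.22)
```

The 99% interval is necessarily wider: greater confidence costs precision.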
13. Name several of the components comprising a good hypothesis. A good hypothesis is an operationally
clear statement of how the independent and dependent variables are related. It is theory or experience based,
and is testable and understandable.
15. Formulate H0 and H1 using a directional test and then a nondirectional test.
Directional test:
H0: At-risk readers will score at or below the population average after taking part in the program (μ_at-risk ≤ μ_pop).
H1: At-risk readers will score above the population average after taking part in the program (μ_at-risk > μ_pop).
Nondirectional test:
H0: At-risk readers will perform as the general population after taking part in the program (μ_at-risk = μ_pop).
H1: At-risk readers will not perform equally to the general population after taking part in the program (μ_at-risk ≠ μ_pop).
17. Statistical analysis is guided by the null hypothesis and not the alternative hypothesis-why is this so? It
is easier to prove a general (universal) statement false since that requires only one negative example. Proving
the research or alternative hypothesis true requires that it be true in all possible instances. It is not possible, log-
ically or practically, to observe all instances.
Why is it difficult to prove an alternative or research hypothesis? How does this difficulty enhance the utility
of the null hypothesis? Since it is easier to find evidence that the null hypothesis is false, that is how science pro-
gresses, by attempting to falsify the null hypothesis. To the extent the null hypothesis is false, we can logically
infer the alternative is true.
19. Define the word significant, as well as its use, in statistical contexts. The word significant means statisti-
cally reliable (likely to be repeatable) in the context of statistics. It is used to describe experimental outcomes
which are rare or unusual if the null hypothesis is true.
23. Explain the difference between one-tailed and two-tailed significance tests. In a two-tailed test there are
three possible outcomes: the observed statistic falls into the upper rejection region, the lower rejection region,
or the retention region between the two. In a one-tailed test, there are only two possible outcomes: The ob-
served statistic falls either into the rejection region or the retention region.
Is one test considered to be more statistically rigorous than the other? Why? In a sense, the two-tailed test
is considered more rigorous because the critical values are farther into the tails of the distribution than for
a one-tailed test of the same significance level. This makes it harder to reject the null hypothesis since the
statistic must be farther into the tail as well.
Which test enables researchers to satisfy their curiosity regarding relationships among variables in a statis-
tical analysis? The two-tailed test is more useful in exploring relationships regardless of the direction of out-
come.
25. In conceptual terms, what are degrees offreedom? Conceptually, degrees of freedom represent the num-
ber of values in a sample that are free to vary when estimating a particular population parameter (usually one
less than a sample's size). One is said to lose one degree of freedom for each parameter estimated.
More practically, how are degrees offreedom used by data analysts? Practically speaking, the number of degrees
of freedom is used to determine the critical value for a statistical test.
Step 2: s_X̄ = 2.36
z.01 = 2.59
Step 4: Cannot reject H0. There is insufficient evidence to indicate that boys and girls score differently on the test of moral awareness.
r.05 = ±.250, df = 48
Step 3: Reject the null hypothesis.
Step 4: There is a moderately small significant negative (inverse) relationship between weight and self-esteem such that the higher the weight, the lower the self-esteem (and vice versa).
33. What is a Type II error? A Type II error is a failure to reject the null hypothesis when it is, in reality, false.
Why do Type II errors occur? Provide an example. Increases in rigor (i.e., requiring .01 rather than .05 for re-
jection of the null hypothesis) can make a Type II error more likely. Also, small sample sizes increase the like-
lihood of a Type II error, as does a weak effect of the independent variable. For example, if you are interested
in the effects of temperature on learning and you use 70° for the control group and 75° for the experimental
group, you are less likely to detect an effect for temperature than if you use 70° and 110°, respectively.
35. What can be done to reduce the incidence of making a Type I error? You can reduce the chance of a Type I error by making α (alpha) smaller (e.g., use .01 rather than .05).
What can be done to reduce the incidence of making a Type II error? You can reduce the chance of a Type II
error by increasing a (alpha), increasing sample size, and/or by increasing the potency of the independent
variable.
How can a researcher balance the demands of these competing concerns in a research project? Generally,
the best way to balance the problem created by these errors is to replicate experiments, even those where dif-
ferences are not found. In that way, the success or failure of the replications can clarify the nature of the re-
lationship between the independent and dependent variable.
37. Define the word power, as well as its use, in statistical contexts. Power is the ability to achieve particular research goals. Power is the probability of correctly rejecting a false null hypothesis. In the statistical sense, power is 1 − β (where β is the probability of a Type II error). It is used to describe the sensitivity of a particular statistical test in a given experimental situation.
39. Define the word effect size, as well as its use, in statistical contexts. Effect size is a measure of the impact of your independent variable on the behavior being examined (how much change in behavior for each change in the independent variable), or the degree to which your variables are related (the correlation between them). Usually it is reported as a value based on the size of the difference between means divided by the standard deviation, where .20 is considered small, .50 moderate, and .80 or more large.
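The standardized mean-difference measure described above (commonly called Cohen's d) is easy to compute; the group means and pooled standard deviation below are made up for illustration.

```python
# Cohen's d sketch: standardized difference between two means,
# with the conventional .20 / .50 / .80 benchmarks.
def cohens_d(mean1, mean2, sd_pooled):
    return (mean1 - mean2) / sd_pooled

def label(d):
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "moderate"
    return "small"

d = cohens_d(105, 100, 10)   # hypothetical means and pooled SD
size = label(d)              # a half-standard-deviation difference
```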
41. Can the researcher still perform the appropriate hypothesis test? Why or why not? No. With no information about the population parameters, there is nothing with which to compare the sample values.
43. Can you give her any specific analytic guidance before the analyses begin? The sample size is very small.
I would suggest collecting additional data.
■ Chapter 10
1. Why is the t test used in place of the z test? The t test is used instead of the z test because it enables inferences to be made about differences when samples are small and the population parameters are unknown.
What happens when one of these assumptions is violated? Violating one or more assumptions of the t test
results in an uncontrolled change in the probability of a Type I error from the established level of .05 or .01
to some other, unknown, probability. The robustness of the test is reduced.
Can the t test still be used? Why or why not? In many cases the t test may still be used because changes in the actual alpha level tend to be slight. The t test is "robust" and relatively insensitive to violations of the assumptions. The only exception to this robustness occurs when the samples are unequal in size, the populations are nonnormal, and the variances of the populations differ.
5. Why are larger samples desirable? Larger samples mean more accurate estimates (i.e., lower standard errors) and they enhance the opportunity, therefore, to reject H0. In addition, larger samples mean more degrees of freedom available for the hypothesis test.
How do larger samples influence the size of a sample standard deviation and error? In general, for a given size of sums of squared deviations (SS), the larger the sample size, the smaller the standard deviation and standard error. The SS, after all, is divided by degrees of freedom, which are related to a sample's size, and the standard error is calculated by dividing the standard deviation by the square root of sample size.
s_X̄ = 1.237
t_obs = 2.42
t.05 = ±1.697, df = 31
Reject H0. Small college students appear to be more introverted than students from large universities.
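The printed values above can be reproduced by the single-sample t formula, t = (X̄ − μ)/(s/√n). The summary numbers below (s = 7, n = 32, a mean difference of 3) are hypothetical, chosen only because they recreate the printed s_X̄ and t_obs; the book's raw data may differ.

```python
import math

# Hypothetical summary numbers consistent with the printed answer:
s, n = 7, 32
mean_diff = 3                  # sample mean minus hypothesized mu
s_xbar = s / math.sqrt(n)      # estimated standard error, about 1.237
t_obs = mean_diff / s_xbar     # about 2.42
df = n - 1                     # 31
```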
15. a. 34.33 ± 2.33 is the 95% confidence interval for problem 13.
b. 134.556 ± 2.684 is the 99% confidence interval for problem 14.
17. Why is the t test for independent groups ideal for hypothesis testing in experimental research? The t
test for independent groups verifies whether there is a difference between the means of an experimental and a
control group that is unlikely to be due to chance. The experimenter may argue that this difference is due to
the effect of the independent variable since the two groups were the same before it was applied (except for ran-
dom differences).
19. What is a subject variable? A subject variable (organismic variable) is one that is a characteristic the sub-
ject possesses (e.g., gender, height, manual dexterity) which cannot be assigned by the experimenter.
How are subject variables used in concert with between-groups designs and the independent groups t test? In a between-groups design with subject variables, the organismic variable is treated like an independent variable and the t test is used to determine if the performance of participants who possess different degrees of the characteristic (e.g., M/F; Hi, Med, or Lo dexterity) also differs.
21. Explain the nature of the conceptual model for comparing means presented in this chapter. The statis-
tical model is based on the size of an observed statistic (difference between means) relative to the size of the
standard error for that statistic. The statistic is then compared to a critical value from the appropriate sampling
distribution, which exists if Ho is true. If the observed value exceeds the critical value (falls in the rejection re-
gion), then Ho is rejected; if not (falls in the nonrejection region), then it is not rejected.
Why is this model an appropriate prelude for most inferential statistical tests? This is an appropriate model
to learn because it is similar to models used in other statistical tests of inference and can help in understand-
ing those procedures.
Conclusion: Reject H0. The fixed noise condition resulted in reliably fewer correct responses than the random noise condition.
27. The analysis indicates that the random noise condition resulted in significantly more math problems completed (X̄ = 8.6 correct) than in the fixed noise condition (X̄ = 6.8 correct). This is in opposition to the expectation that random noise would be more disruptive.
The effect size of r = .69 indicates that the independent variable had a relatively large effect; ω² = .43, indicating that 43% of the total variance could be accounted for by the different types of noise.
29. Effect size: r = .71, a large effect.
ω² = .466, indicating 47% of the variance is due to the independent variable.
31. Are there any advantages to conducting a correlated groups design rather than an independent groups
design? If so, what are they? Yes, there are advantages to the correlated groups design. The main advantage to
a correlated groups design for an experiment is that it keeps the effects of random variation to a minimum
since the same (or very similar) subjects appear in both conditions. Each subject can serve as his or her own control. The error term (denominator of the t formula) for a correlated groups test is smaller than for a corresponding independent groups test.
33. What is a carryover effect? A carryover effect is an effect of experience at time 1 in an experiment that continues to affect behavior at time 2, but it is not the result of the independent variable.
Why do such effects pose concerns for correlated groups designs? These effects may change the internal va-
lidity of an experiment by offering alternative explanations (confounding) for the outcome of an experiment
and thereby clouding conclusions.
35. Do you think the project in problem 34 could be susceptible to any carryover effects? If so, which one(s)
and why? It is possible that the scores on the second test may have been improved by having experienced a similar test at time 1. The situation on the second testing will be more familiar and the participants may be less nervous, which, by itself, might yield higher scores.
37. Do you think the project described in problem 36 could be susceptible to any carryover effects? If so,
which ones and why? Since one round of layoffs has occurred and more may be expected, the employees may
report more stress because they are sensitized by the stressful event of seeing the fellow employees laid off, their
increased likelihood of being laid off in the future, or both, and not by any effect of being a survivor per se.
39. Why should investigators learn to perform a power analysis? Power analysis enables a researcher to as-
sess, after failing to reject a null hypothesis, if it is possible in the future to find a significant difference under
the conditions of this experiment. It helps to determine if a particular line of research is likely to be fruitful.
Power analysis can also provide a way to plan future research in terms of number of participants necessary
under the experimental conditions to achieve a reasonable level of power.
Should a power analysis be performed before or after a study? Why? A power analysis should be performed
after a study in which the null hypothesis was not rejected to determine if the power in the study was low or
high. If power was low, the analysis could indicate how to proceed (i.e., choose the number of participants) in
subsequent studies. If power was high, this would suggest that perhaps a new experimental approach would be
more useful. It is difficult to assess power before completing a study.
41. a. Use a correlated groups t test.
b. Use a z test.
c. Use a correlated groups t test.
d. Use an independent groups t test.
e. Use a single-sample t test.
■ Chapter 11
1. Why should any researcher-student or professional-avoid becoming overly focused on finding sta-
tistical significance when conducting behavioral science research? Since not all interesting results are found in
significant differences, and since the failure of an effect to appear may often be as interesting as its presence, it
is best to focus on the correct analysis and interpretation of the data rather than solely on finding statistically
significant differences.
3. What are some advantages of the ANOVA that make it a versatile and useful statistical test? The
ANOVA can assess differences between three or more means at the same time.
Is it more versatile or useful than the other statistical tests used for mean comparisons? Why or why not? The
two-group experiment is severely limited in scope because it focuses on the presence or absence of an effect.
ANOVA is decidedly more versatile than the t test because it can look at differences among several values of the
independent variable in a single experiment, which better reflects the complexities of causal relationships in
behavioral research.
5. Explain the source of the variation used to calculate the F ratio in a one-way ANOVA. The numerator of the F ratio contains the variation between or among the means of the several samples (conceptually), while the denominator contains the variation among participants treated alike, that is, experimental error.
Is the partitioning of the variance for the ANOVA similar to the way it is divided by the analysis performed by the t test? Conceptually, this partitioning is identical to the manner in which the variation is divided in the t test. In the t test the numerator is the difference between the means of the two groups (between-groups variation) and the denominator is a measure of error variance (s_X̄1−X̄2).
7. Briefly describe the F distribution. How is it similar to or different from the t and z distributions? The F distribution is a unimodal, positively skewed distribution. Since the F ratio is formed by dividing one variance by another, there can be no negative results, while it is possible to get a negative or positive t or z. Unlike the situation with the t and z distributions, where the numerator of the t or z ratio is expected to be 0 when there is no effect for the independent variable (i.e., H0 is true), if H0 is true in ANOVA the ratio will tend toward 1 (i.e., the expected value of F when the null hypothesis is true is approximately 1).
In addition to being used to detect mean differences, like the t distribution, the F distribution is a family of distributions, the members of which are distinguished by the number of degrees of freedom in the numerator and denominator.
9. Is it possible to calculate an F ratio with a negative value? Why or why not? No, it is not possible. The F
ratio is a ratio between two variances. Variances are the average squared differences between a score and the
mean. Although the differences may be either positive or negative, when squared they are positive. The ratio of
two positive numbers must always be positive.
11. Under what particular circumstances do the independent groups t test and the F ratio share a special relationship with one another? Explain the nature of this relationship. The F and the t test share a special relationship when the F is used to test for differences between two means. Under those circumstances, the square of the t value will equal the F value (i.e., t² = F).
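The t² = F identity can be demonstrated numerically with two made-up groups, computing the pooled-variance t and the one-way ANOVA F from the same scores:

```python
# Two-group demo that t^2 = F (made-up scores).
def mean(xs):
    return sum(xs) / len(xs)

g1 = [3, 5, 4, 6, 2]
g2 = [7, 9, 8, 6, 10]
n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)
grand = mean(g1 + g2)

# pooled-variance t for independent groups
ss1 = sum((x - m1) ** 2 for x in g1)
ss2 = sum((x - m2) ** 2 for x in g2)
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
t = (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

# one-way ANOVA F with the same two groups
ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ms_between = ss_between / 1                 # df_between = k - 1 = 1
ms_within = (ss1 + ss2) / (n1 + n2 - 2)     # df_within = N - k
F = ms_between / ms_within
```

With two groups the denominators are the same pooled error term, so squaring t recovers F exactly.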
13. Assuming α = .05:
p(at least one Type I error in comparing six means) = .54
p(at least one Type I error in comparing eight means) = .76
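These familywise error rates follow from 1 − (1 − α)^c, where c = k(k − 1)/2 is the number of pairwise comparisons among k means:

```python
# Probability of at least one Type I error across all pairwise comparisons.
def familywise_error(k, alpha=0.05):
    c = k * (k - 1) // 2          # number of pairwise comparisons
    return 1 - (1 - alpha) ** c

six_means = round(familywise_error(6), 2)     # 15 comparisons -> 0.54
eight_means = round(familywise_error(8), 2)   # 28 comparisons -> 0.76
```

The rapid growth of this error rate is the standard argument for using ANOVA rather than many separate t tests.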
17. Source            SS      df    MS      F
    Between-groups    15       2    7.5     3.52
    Within-groups     57.5    27    2.13
    Total             72.5    29
19. Source            SS      df    MS      F
    Between-groups    40       2    20      9.34
    Within-groups     90      42    2.14
    Total             130     44
21. Source            SS      df    MS      F
    Between-groups    100      3    33.33   6.67
    Within-groups     180     36    5.00
    Total             280     39
23. Source            SS      df    MS      F
    Between-groups    74.08    2    37.04   41.21*
    Within-groups     18.88   21    .899
    Total             92.96   23
*p < .05
Reject H0
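Each source table above can be completed from its SS and df entries alone; a sketch, checked against the problem 23 values:

```python
# Completing a one-way ANOVA source table from SS and df.
def anova_table(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F) for a one-way ANOVA."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Problem 23's sums of squares and degrees of freedom:
ms_b, ms_w, F = anova_table(74.08, 2, 18.88, 21)
# ms_b about 37.04, ms_w about .899, F about 41.2
```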
25. f = 1.98
ω² = .77
27. a. Source    SS    df    MS    F
d. Twenty-four nursing students were taught clinical skills, six in each of four presentation types: text, computer simulation, observation, and lecture. They were then quizzed over the material. The means and standard deviations of the four groups can be found in Table 1 and are graphed in Figure 1.
An overall analysis of variance yielded a significant value (F(3, 20) = 16.91, p < .01), indicating some of the groups differed. Post hoc analysis using Tukey's HSD test indicated that participants in the simulation condition had higher scores than participants in the text condition but were not reliably better than participants in the observation condition (see Table 2). Participants in the lecture condition did not differ from those in the text condition, but scored significantly more poorly than participants in either the observation or simulation conditions (see Table 2).
The effect size was very large (f = 1.59) and there was a high degree of association between presentation type and test scores (ω² = .665), accounting for approximately 67% of the total variability.
[Figure 1. Mean scores by presentation type (LEC, OBS, SIM, …); x-axis labeled Treat.]
29. a. Source SS df MS F
Table 1. Means and Standard Deviations for the Four Presentation Types
[Figure. Mean performance as a function of distance.]
Distance    Mean    SD
3           1.0     .000
12          2.6     .548
18          3.2     1.304
24          5.0     .707
36          6.8     .447
*p < .05
**p < .01
33. Is a one-way ANOVA the appropriate statistical test to analyze these data? Why or why not? No, ANOVA
is not appropriate. The analysis of variance requires the dependent variable be measured on an interval or ratio
scale. Since the dependent variable is an ordinal number, ANOVA is not appropriate.
35. What should a researcher do next? Why? Following a significant F that supports a hypothesis, the investigator should engage in contrast analyses to determine if the specifics of the hypothesis are supported (i.e., Tukey's HSD); additional exploratory comparisons should also be considered. If the contrast analysis supports the specifics of the hypothesis, the f (effect size) and ω² should be included in the analysis. This approach will clarify the exact nature and strength of the findings.
■ Chapter 12
1. Conceptually, how does a two-way ANOVA differ from a one-way ANOVA? A two-way ANOVA design in-
cludes at least two levels of two different independent variables so that each participant experiences a combination
of one level of each independent variable. For example, an experiment might consist of two levels of anxiety (high
and low) and two levels of problem difficulty (easy and difficult), with each subject getting one of the four
possible combinations: high-easy, high-difficult, low-easy, and low-difficult. Alternatively, the experiment might
consist of two strains of rats (albino and hooded) and three levels of reward (small, medium, and large) result-
ing in six combinations: albino-small, albino-medium, albino-large, hooded-small, hooded-medium, and
hooded-large.
3. What is a factorial design? In a factorial design each level of each independent variable is combined with
each level of all other independent variables once.
How can you tell how many different levels are present in a given factorial design? The total number of com-
binations of each independent variable is the product of the number of levels of each independent variable
(i.e., a 2 X 3 X 4 factorial design has two levels of the first variable, three of the second, and four of the third
for a total number of combinations of 24).
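The combination count described above is simply the product of the numbers of levels, which can be sketched in a line of Python (the function name here is mine, for illustration):

```python
from math import prod

def n_cells(levels_per_factor):
    """Number of treatment combinations (cells) in a factorial design."""
    return prod(levels_per_factor)

print(n_cells([2, 3, 4]))  # a 2 x 3 x 4 design yields 24 combinations
```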
5. What is a main effect? Define this term and provide an example. A main effect is reflected in a signifi-
cant difference among the means of one independent variable in a factorial design (represented by row or col-
umn means) without considering any other independent variable (summed across the other variable [s]). Con-
sidering the example given in the answer to problem 1, if there is a significant difference between the levels of
anxiety when summed across problem difficulty, that would be a main effect for anxiety.
7. Why are interactions only associated with factorial designs? Interactions result when the effects of one
variable are different at the different levels of the other variable(s), a finding which requires all combinations
of the levels of one variable with the levels of the other(s). For example, again from the answer to problem 1,
the effects of anxiety might differ for easy problems (e.g., low anxiety facilitates easy problems, but high anx-
iety has no effect) and difficult problems (e.g., low anxiety has no effect on difficult problems, but high anxi-
ety interferes with them). These combinations and their effects can only be found in factorial designs.
9. How many F ratios result from a two-way ANOVA? What is the purpose of each of the Fs? Three F ra-
tios result from a two-way design: a main effect for variable A; a main effect for variable B; and an interaction
between A and B. These F ratios independently test the three null hypotheses concerning the two main effects
and the interaction.
11. What method of analysis should the researcher use? Why? The method of analysis should be a 2 X 2
ANOVA (two genders by the presence or absence of older siblings). The measure of complexity is, probably, at
least an interval scale and there are two independent variables.
13. Indicate the presence of any main effect(s) or interaction. There are no main effects for either factor A
or B, but there is an interaction.
15. Indicate the presence of any main effect(s) or interaction. There is the possibility of a main effect for
both factors A and B, as well as an interaction.
17. Perform a 2-factor ANOVA: Data table: Mean number for each combination
          B1     B2     Row mean
A1        6.0    3.67   4.83
A2        9.0    6.67   7.83
          7.5    5.17   6.33
Source table
Source          SS       df    MS      F
A               54       1     54      37.67**
B               32.67    1     32.67   22.79**
A×B             0        1     0       0
Within-groups   28.67    20    1.433
Total           115.33   23
**p < .01
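The entries in a source table like this one are tied together by fixed arithmetic (MS = SS / df, and each F is MS_effect / MS_within), so the values can be checked mechanically; a minimal sketch using the numbers from the table:

```python
def mean_square(ss, df):
    """MS = SS / df."""
    return ss / df

def f_ratio(ms_effect, ms_within):
    """F = MS for an effect divided by the within-groups MS."""
    return ms_effect / ms_within

ms_within = mean_square(28.67, 20)               # ~1.433
f_a = f_ratio(mean_square(54.0, 1), ms_within)   # ~37.67
f_b = f_ratio(mean_square(32.67, 1), ms_within)  # ~22.79
```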
[Figure: estimated marginal means for Real and Nonsense word types (Wordtype) under Warm and Ambient conditions.]
Source table
Source    SS    df    MS    F
23. Source      SS     df    MS    F
Factor A        85     1     85    9.00**
Factor B        93     1     93    9.85**
A×B             52     1     52    5.51*
Error within    340    36    9.44
Total           570    39
*p < .05
**p < .01
a. 2 × 2.
b. 10
c. Yes. Factors A and B at p < .01 and the interaction at p < .05.
d. Fcrit(1, 36) ≈ 7.43 for p < .01; approximately 4.12 for p < .05
25. fA = .419, a moderate effect
ω² = .17
fB = .44, a moderate effect
ω² = .18
fA×B = .317, a small effect
ω² = .10
Cell means, with row and column effects:
            B1     B2     Row mean
A1          10     16     13 - 11.5 = 1.5
A2          12      8     10 - 11.5 = -1.5
Col mean    11     12     11.5
Col effect: 11 - 11.5 = -.5    12 - 11.5 = .5

Cell means with row effects removed:
            B1     B2
A1          8.5    14.5   11.5 - 11.5 = 0
A2          13.5    9.5   11.5 - 11.5 = 0
            11 - 11.5 = -.5    12 - 11.5 = .5

Cell means with row and column effects removed:
            B1     B2
A1          9      14     11.5 - 11.5 = 0
A2          14      9     11.5 - 11.5 = 0
            11.5   11.5

Interaction effects:
            B1     B2
A1          -2.5   +2.5
A2          +2.5   -2.5
[Figure: estimated marginal means plotted for levels B1, B2, and B3.]
• Chapter 13
1. How does a one-way repeated-measures ANOVA differ from a one-way ANOVA? Like the dependent
groups t test, the repeated-measures ANOVA involves taking repeated samples from the same participants' be-
havior on three or more occasions, each occasion under a different level of the independent variable (avoiding
carry-over effects, of course). Drug dose studies are a good example. If an investigator is interested in the rela-
tive pain-reducing effects of several different doses of an aspirin, the investigator might give eight subjects four
doses of aspirin (i.e., placebo, 250 mg, 500 mg, and 1,000 mg) in random or counterbalanced order and measure
the time it takes for the participant to remove his or her hand from a bucket of ice and water.
3. Why would a data analyst elect to use the dependent groups t test instead of a one-variable repeated-
measures ANOVA? Illustrate your points through an example. With more than two measures on each sub-
ject, the repeated-measures ANOVA would be the choice, since repeated t tests inflate the probability of getting
a chance difference above the level of alpha. A dependent groups t test assesses a mean difference at 2 points in
time, whereas a one-variable repeated-measures ANOVA can assess mean change at 2 or more points in time.
If a researcher wanted to chart the effectiveness of a drug on some disease across a 6-month period (i.e., mean
change each month for 6 months), the repeated-measures ANOVA would be appropriate. If change was being as-
sessed only twice, then the dependent groups t test would be appropriate.
5. df = 2 for the between-groups term
df = 9 for the between-subjects term
df = 18 for the residual error term
7. What is the next step(s) the investigator should take? If significance is obtained for the overall ANOVA,
the investigator should follow with Tukey's HSD test to locate the differences and then assess the effect size (f)
and degree of association (ω²) between the independent and dependent variables.
9. What are the degrees of freedom for an F ratio from a one-variable repeated-measures ANOVA design
with 4 dependent measures and 10 participants?
numerator degrees of freedom = 3
denominator degrees of freedom = 27
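These degrees of freedom follow directly from k = 4 measures and n = 10 participants: the numerator df is k - 1 and the residual (denominator) df is (k - 1)(n - 1). A quick sketch:

```python
def rm_anova_df(k, n):
    """df for the F ratio in a one-way repeated-measures ANOVA:
    numerator = k - 1, denominator (residual) = (k - 1) * (n - 1)."""
    return k - 1, (k - 1) * (n - 1)

print(rm_anova_df(4, 10))  # (3, 27)
```

The same formula reproduces the residual df of 18 given in the answer to problem 5 (3 levels, 10 participants).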
11. Are there any advantages that repeated-measures designs have over between-groups designs? Yes, several
advantages exist. Repeated-measures designs use fewer subjects, are generally more powerful, view change over
time in the same individual (reducing error variance by eliminating some variation between different subjects),
and they lead to more effective and detailed understanding of complex relationships among variables.
13. Source            SS     df    MS     F
Treatment             40.5   3     13.5   22.09*
Between-subjects      5.0    3     1.67   2.73
Error (residual)      5.5    9     .61
Total                 51     15
*p < .05
Tukey's HSD = 1.72
Absolute differences between means are
2  3  4
*p < .05
The repeated-measures ANOVA yielded an overall significant difference (F(3, 9) = 22.09, p < .05). Tukey's
comparisons indicated that treatment 1 had significantly lower scores than all others; treatments 2 and 3 did not differ
from each other, but both had lower scores than treatment 4.
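The HSD value of 1.72 reported above can be reproduced from the error term of the source table, since HSD = q × sqrt(MS_error / n). A minimal sketch, assuming a studentized-range value of q ≈ 4.41 for 4 treatments and 9 error df at alpha = .05 (read from a table, so treat it as approximate):

```python
import math

def tukey_hsd(q, ms_error, n):
    """Tukey HSD critical difference: q * sqrt(MS_error / n)."""
    return q * math.sqrt(ms_error / n)

# MS_error = 5.5 / 9 from the source table; n = 4 scores per treatment mean
hsd = tukey_hsd(4.41, 5.5 / 9, 4)  # ~1.72
```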
15. fbetween = 1.96, a very large effect.
ω² = .78
17. Source        SS    df    MS     F
Between-groups    30    2     15     5.40*
Subjects          8     9     0.89   .32
Within-groups     50    18    2.78
Total             88    29
*p < .05
19. fbg = .72, a large effect
ω² = .29
21. Source    SS    df    MS    F
ω² = .52
23. Under what sort of circumstances would an investigator elect to use a mixed-design ANOVA? An inves-
tigator would use a mixed-design when the research contains one (or more) repeated measures on the same
participants and one (or more) independent groups. For example, a study of gender differences (independent
groups) and the effects of delta-9-tetrahydrocannabinol (marijuana) on memory for stories would require a
mixed design.
How does a mixed-design ANOVA differ from the other forms of ANOVA reviewed in this and the previous
chapter? In a mixed-design there is both a between groups and a within groups variable, while in the other de-
signs there is one or the other, but not both.
• Chapter 14
1. What is a nonparametric statistical test? A nonparametric test makes few (if any) assumptions about the
nature of the data or the shape of the population distribution from which it is sampled.
How do nonparametric tests differ from parametric tests? Although nonparametric tests make few distribu-
tional assumptions (i.e., they are often referred to as distribution-free tests), parametric tests are concerned
with populations and drawing inferences about their parameters. In order to draw valid inferences, paramet-
ric tests must meet appropriate assumptions (e.g., normality, random sampling, equal variances). In general,
parametric tests are more powerful than nonparametric tests when the assumptions can be met.
3. How do interval and ratio scales differ from nominal and ordinal scales? Provide an example of each
type. Interval and ratio scales have equal intervals between scale points and are compatible with the real num-
ber system. Ordinal and nominal scales do not have equal intervals and are not real number compatible.
Length and temperature (Fahrenheit and Centigrade) are good examples of ratio and interval scales (respec-
tively), while rank in graduating class and gender are good examples of an ordinal and nominal scale (respec-
tively).
5. Why should researchers in the behavioral sciences be open to learning to use nonparametric statistical
tests? Nonparametric tests tend to have fewer restrictive assumptions, may be used with nonnumerical data,
are more effective with small samples, and require easier calculations to arrive at a more intuitively under-
standable conclusion than parametric tests. All of these characteristics make them a significant tool for analy-
sis when measures and procedures do not meet the assumptions of parametric tests.
7. Which nonparametric tests are most similar to the t test for independent groups? The chi-square test of
independence (two groups) and the Mann-Whitney U test are useful in cases where the t test might apply, but
failure to meet critical assumptions prevents its use.
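The Mann-Whitney U itself is simple to compute: count, over all cross-group pairs of scores, how often a score from one group exceeds a score from the other, with ties counting one half. A minimal sketch (the function name is mine, for illustration):

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) from pairwise comparisons; ties contribute 0.5."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0
              for x in a for y in b)
    return u_a, len(a) * len(b) - u_a

print(mann_whitney_u([7, 8, 9], [1, 2, 3]))  # (9.0, 0.0): no overlap at all
```

The smaller of the two U values is compared against the tabled critical value, as in the answers to problems 33 and 35 below.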
9. Which nonparametric test can be used to analyze data from one sample only? The chi-square goodness-
of-fit test is the only nonparametric test useful with one sample.
11. What is the nonparametric counterpart of the Pearson r? The Spearman r is the nonparametric counter-
part of the Pearson r when only ranks (ordinal) data are available.
13. Under what conditions is it preferable to use Cramer's V statistic instead of the phi (φ) coefficient?
Cramer's V statistic is used in place of the phi coefficient following a chi-square test of independence when the
contingency table is greater than 2 X 2 (e.g., 2 X 3, 3 X 3, etc.).
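Both coefficients derive from the chi-square statistic, and for a 2 X 2 table Cramer's V reduces exactly to phi, since min(rows, cols) - 1 = 1. A sketch of the two formulas:

```python
import math

def phi_coefficient(chi2, n):
    """phi = sqrt(chi2 / n), for a 2 x 2 contingency table."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))
```

For a 2 X 2 table the two agree exactly; for anything larger, only V applies.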
15. What nonparametric test will allow them to quickly determine their level of agreement? Why? The
Spearman r correlation coefficient should be used because it estimates the degree of association (agreement)
between two sets of ranks (ordinal data).
17. What test should be used to demonstrate that requests for the three pizzas varied? The appropriate test
to answer the owner's question is the chi-square goodness-of-fit test.
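As a sketch of the computation that test involves, with made-up counts for the three pizzas (the null hypothesis of equal preference fixes the expected counts):

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [45, 30, 15]        # hypothetical requests for the three pizzas
expected = [30.0, 30.0, 30.0]  # equal preference under the null
stat = chi_square_gof(observed, expected)  # compare to chi2 crit(df=2, .05) = 5.99
```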
19. Data    Rank    Data    Rank
    100     1       34      9
    99      2.5     33      10
    99      2.5     32      11.5
    89      4       32      11.5
    78      5       23      13
    70      6       22      14
    67      7       12      15
    56      8       11      16
                    6       17
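The tied ranks in problem 19 (the two 99s sharing rank 2.5, the two 32s sharing 11.5) follow the usual midrank rule: tied scores split the mean of the rank positions they occupy. A minimal sketch (the function name is mine):

```python
def rank_descending(scores):
    """Rank scores with 1 = highest; tied scores share the mean of the
    rank positions they occupy (the midrank rule)."""
    ordered = sorted(scores, reverse=True)
    midrank = {}
    for value in set(ordered):
        positions = [i + 1 for i, v in enumerate(ordered) if v == value]
        midrank[value] = sum(positions) / len(positions)
    return [midrank[v] for v in scores]

print(rank_descending([100, 99, 99, 89]))  # [1.0, 2.5, 2.5, 4.0]
```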
21.      A1     A2
B1       9.5    9.5
B2       9.5    9.5
23.      A1     A2     A3
B1       12.5   7.5    15
B2       12.5   7.5    15
UB = 0
Ucritical = 13
Since UB < 13, reject H0. The species differ in activity.
35. μU = 798    σU = 103.79
zUA = -5.26    zUB = -5.41    zcrit = ±2.58
There is a significant difference.
37. Since the sum of +R = 0 and 0 < 2 (Tcrit), reject H0. The film improved the ratings.
39. Spearman r = .83.
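The raw scores behind the .83 are not reproduced here, but the Spearman computation itself is short when there are no ties: rank both variables, take the rank differences d, and apply 1 - 6Σd² / (n(n² - 1)). A sketch:

```python
def spearman_r(rank_x, rank_y):
    """Spearman correlation from two lists of ranks (no-ties formula)."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

print(spearman_r([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0: perfect agreement
```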
• Chapter 15
1. Identify some of the problems or concerns linked to the use of the null hypothesis. Focusing on the null
hypothesis often eliminates any examination of other, simpler ways of analyzing or describing data. This is fre-
quently coupled with a failure to look at power and/or effect size as a way of determining the practical impor-
tance of the findings or to educate the general reader on the real meaning of the phrase "statistically signifi-
cant" as distinct from meaningful or important.
3. Discuss six of the specific recommendations made by the American Psychological Association's (APA)
Task Force on Statistical Inference (TFSI). How can these recommendations improve the use of statistics and
data analysis, as well as the reporting of results? The task force recommends using a minimally sufficient an-
alytic technique to improve communicability of findings. More sophisticated analyses should be fully justified
and explained. The task force also recommends that care should be taken in imputing causality to experimental
effects and any supporting statistics should be included with the main results to strengthen causal assertions.
Problems with methods, data acquisition, and the like should be discussed clearly in terms of potential impact
on the conclusions before presenting any results. Random selection and/or assignment should be thoroughly
documented, and all variables (independent, dependent, and control) should be clearly operationally defined
and discussed as to their relevance, and the manner in which they support the aims of the research.
By taking these recommendations into account, a researcher can be assured that his or her results and con-
clusions will be clear and understandable, their importance and generality recognizable, and the likelihood of
their being replicated will be increased.
5. How can replication be used to demonstrate generalizability? Successful replication, particularly pro-
grammatic replication, increases awareness that a particular method and analysis (conceptual and statistical)
will produce reliable findings and conclusions in a variety of related settings with different investigators. Thus,
successful replicability implies that those variables and their effects generalize beyond a particular method, set-
ting, or researcher.
7. In your opinion, is it possible for a researcher to become overly dependent on statistics? How? Yes, it is
possible. Too much dependency on statistics may serve to restrict the investigator's examination of the data,
increasing the possibility that an interesting finding might escape notice. This is particularly true when analy-
sis is applied with a routine, unthinking, recipe-like approach.
9. How do Abelson's (1995) MAGIC criteria impact on the tension between obtaining right answers and
offering accurate interpretation? Focusing on the more global implications of your data with respect to their
magnitude (effect size), articulation (how the results fit together), generality (how widely or narrowly the re-
sults apply), interestingness (what importance do they have), and credibility (how believable are the data based
on methods, analysis, and interpretation) ensures that readers will be clear about the relationships represented
by your data in real depth and with real meaning.
11. What evidence supports the contention that statistical or data analytic skills acquired in one domain
can be transferred to others? Explain. Several studies which show that statistical, inferential, and logical skills
improve over time following a first course in statistics, and generalize to other environments, support the con-
clusion that such knowledge is transferable and not domain specific. The work of Lehman and coworkers is
clear evidence that statistical knowledge improves over time, producing better inferential skills in psychology
students than in their humanities counterparts.
13. As a novice data analyst, how will you keep your statistical skills sharp? Retaining statistical knowledge
requires continued use and practice. Perhaps the best way of doing so is to design a research project of one's
own, collect data, and analyze and report the findings. If that is not feasible, one should volunteer to partici-
pate in the ongoing research of a member of the faculty. Failing either of those, periodically set yourself
research problems and use the decision trees to refresh your memory on how to design and analyze an exper-
iment you might do to test your hypotheses.
15. When planning to analyze some data using statistical software, what are some of the dos and don'ts a
data analyst should follow. Do remember the phrase "less is more" and don't use a more complex analysis that
the data and hypothesis require. Do be sure you know where the numbers come from, how they appear, and
where they are going in the chosen analysis (remember the famous computer phrase "GIGO," "garbage in,
garbage out"). Do be able to recognize an unexpected (and perhaps incorrect) statistical result (e.g., an unex-
pected significant difference or no difference where there should be one based on the summary statistics).
Don't just read the output and take the "correctness" for granted.
17. Why should behavioral scientists be concerned about the educational implications of the analyses used
to support their research? Behavioral scientists must be sure that the results and implications of their research
are clear and understandable by the consumer of the information. It is ethically important that a researcher ed-
ucates the public about what a result means, and, perhaps as important, what it does not mean. Failure to do so
may result in the outcome of a study being misunderstood or misinterpreted. The motives of the investigator
may also be suspect when the scientist fails to make the purposes, methods, and the limitations of his or her re-
search clear and understandable.
19. You are a behavioral scientist whose research and analyses have led to some controversial finding that
is upsetting to the general public. How should you react? What are your responsibilities to the public and to
your research, including statistical analysis? A scientist whose work has created a controversy should react
calmly and without defensiveness. He or she must be prepared to explain and defend the work and interpreta-
tion. The scientist is ethically responsible to make as clear as humanly possible the findings, their strength and
generality, the reasons for a particular interpretation, alternative interpretations (if any) and any limitations of
the methods, results, or statistical analysis so that the meaning of the research may be judged on its merits.
21. Why do you suppose some investigators might see "trimming" as more acceptable than the other three
classes of fraud? Is it more acceptable-why or why not? "Trimming," or eliminating extreme scores after the
data are collected but before analysis, might be seen as more acceptable because it helps clarify and make
"cleaner" the statistical results and their interpretation. It is usually felt to be more limited in extent than the
other three fraudulent approaches. "Trimming" after the data are collected is as unethical as any other fraudu-
lent treatment of the data, despite its relatively "minor" scope.
23. Why are fraudulent, false, or otherwise misleading data so problematic where the scientific literature is
concerned? Fraudulent, false, or otherwise misleading data are so problematic because: (a) they might never
be recognized as false; (b) even if retracted, results may persist because some people may not learn of the
retraction, like a bad newspaper story later retracted on the back pages; and (c) false results can waste other
researchers' resources by leading them down wrong pathways and focusing attention on proving or disproving
the false result.
25. Is a researcher ethically bound to report the results of all statistical analyses performed on a data set? Why
or why not? Although the researcher is ethically bound to have available all analyses (and the raw data), I do not
feel he or she is ethically bound to report all analyses in any write-ups. The researcher is, however, ethically bound
to report all results that pertain to the main aims of the research. Exhaustively reporting all analyses run on the
data set may be redundant, confusing, and could cloud important results and the conclusions to which they lead.
27. In your opinion, what contributions do statistics and data analysis make to the behavioral sciences? Sta-
tistics provide an important methodological tool for organizing, summarizing, and analyzing empirical data.
Without the availability of statistical methods, we would have no way of communicating the character of our
data, interpretations, and conclusions with the necessary accuracy and support. Without the shared methods
of analysis that statistics yield, it would be difficult, if not impossible, to obtain agreement on the meaning of
our findings among investigators or the public.
29. How can the breeder determine which breed has the least reactive (i.e., emotional) temperament, which
one has the most, and so on? Which statistical test is best suited to the breeder's hypotheses? Why? Assuming
the dependent measure is quantitative and continuous (e.g., number of squares entered in a novel environ-
ment), then the appropriate analysis would be a one-way ANOVA between breeds. If the dependent measure
is not quantitative and continuous (e.g., ranks) some nonparametric test (chi-square or Wilcoxon) would be
appropriate.
31. Which statistical test is best suited to test the director's hypothesis? Why? Presumably, the director
wants to compare the rankings of all students from both years by rank ordering them all. The most appropri-
ate test for rankings when two groups are involved is the Mann-Whitney U test. Since there will be a larger
number of students involved, the director could use the normal approximation of the Mann-Whitney U.
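The normal approximation mentioned here standardizes U against its null-hypothesis mean and standard deviation; a sketch with the standard formulas (the group sizes below are purely illustrative):

```python
import math

def u_to_z(u, n1, n2):
    """z for the normal approximation to the Mann-Whitney U:
    mu_U = n1*n2/2, sigma_U = sqrt(n1*n2*(n1 + n2 + 1) / 12)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma

print(u_to_z(200, 20, 20))  # 0.0: U exactly at its null expectation
```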
33. Which statistical test is best suited to test the researcher's hypothesis? Why? A repeated-measures
ANOVA is the appropriate analysis. There are three quantitative measures (number of errors at the beginning,
middle, and end) on each child.
35. Which statistical test is best suited to test the researcher's hypothesis? Why? The appropriate analysis
would be a regression analysis based on the Pearson product-moment correlation coefficient to determine if a
predictive relationship exists. This is appropriate because the data consist of two quantitative measures (num-
ber of siblings and rated intellectual environment) not collected in an experiment.
• Appendix F
1. Why are qualitative and quantitative approaches perceived to be incompatible with one another? Ex-
plain why they are arguably compatible with one another. Qualitative and quantitative approaches are per-
ceived to be incompatible because many believe that quantitative research is limited because data are not ap-
plicable to everyday experience, whereas qualitative approaches can better explain everyday life, but not cause
and effect relationships or relationships that may apply to a representative population. They are complements
to one another. Qualitative research needs some quantitative approaches and vice versa (i.e., triangulation).
3. In your opinion, why are the behavioral sciences so compelled to measure and manipulate variables in
the study of human behavior? Behavioral sciences are compelled to measure and manipulate variables be-
cause the world is focused on numbers. Numbers and quantitative reasoning gave rise to modern science and
technologies that humans depend on. Measurement gives rise to some sort of order that we can explain, and
we think that makes human behavior credible as a science.
5. Define qualitative research and list some of its characteristics. Qualitative research is a formal investi-
gation that is not dependent on statistical or other quantitative procedures. Some of its characteristics are par-
ticipant perspective and diversity, appropriate methodologies and theories, reflexive investigators and re-
searchers, and variety of qualitative approaches and methodologies.
7. Why is perspective or point of view less important in quantitative than in qualitative research?
Does it matter less there? Why or why not? In quantitative research, there should be no point of view or per-
spective because the data one gathers and analyzes should be objective and unbiased.
9. Briefly discuss six rules of thumb for qualitative researchers.
1. Avoid studying your own group: reports may be biased and unfair.
2. Rely on outside readers to evaluate your work: unbiased perspective is invaluable.
3. Time spent on analysis is as important as time spent on data collection: the primary objective is to
be able to interpret your findings.
4. Create a model of what happened in your work: develop an outline illustrating the research process
from start to finish.
5. Share the data with the research participants: because they are the source of the data, they should
have the right to see everything.
6. Be on the watch for contradiction and points of conflict in the data: inconsistencies can be a source
of insight.
11. Explain the idea behind participant observation and how it varies from ethnography. After defining
these terms, provide a concrete example of each one. Participant observation entails entering into a research
setting as a study participant and monitoring, recording, and interpreting behavior, and interacting with study
participants. An example would be an investigative journalist disguising herself or himself as a retiree in a nurs-
ing home to discover if the residents there were abused or neglected by the medical staff.
Ethnography is a research approach dependent on the undisguised, objective study of the habits, cus-
toms, and actions of people in a culture as they go about their daily lives. An example would be a researcher
wanting to study the customs and habits of Tibetan monks. He or she would spend time in a Tibetan village or
convent, and then study the everyday habits and customs of the monks, but as an outside person.
13. What is a narrative? Identify and define the two main types of narratives used in research efforts. A
narrative is an individual's story or other verbal account of an event that is prompted by a semi-interview or
question. The two basic types are narrative interviews, where the interview is open-ended and biographical,
and episodic interviews, where the interview is more focused, usually on a specific event.
15. Create a prompt for a narrative interview. Narrative interview prompt: I want to ask you to tell me
about what you wanted to be in life as you were aging. The best way to do this is starting from your childhood
and going in chronological order until now. You can take your time in doing this and also give details.
17. How does a focus group differ from a more traditional interview? Why do some qualitative researchers
argue that focus groups represent a better research technique than other qualitative methods? A focus group
is an interview involving a group of people whereas a traditional interview is usually one-on-one. Some re-
searchers argue that focus groups represent a better research technique because they are more dynamic
and help evaluate more different research populations.
19. Imagine that you are running a focus group. Select a topic for the group to discuss and then describe
how you would lead the group through the four steps usually associated with focus groups.
Step 1: Topic: Should drivers be allowed to use cell phones while driving? Discussion to agree or disagree.
Step 2: Introduce members and emphasize that everyone chosen for the discussion group either drives or
owns cell phones.
Step 3: Show news articles on accidents occurring from drivers using cell phones on the roads.
Step 4: Review group conformity issues, and what to do to jump start discussions.
21. List eight characteristics of qualitative researchers, and briefly explain in your opinion why each is im-
portant.
1. Possessing tolerance for ambiguity: not all questions can be answered concisely.
2. Flexibility and openness to change plans or directions in the course of the research: as with human
behavior, one must realize that research must adapt and change to different needs.
3. Patience and resourcefulness: all research cannot be done in a day; one needs to be patient and to be
able to find other ways to perform research.
4. Ability to commit equal time and effort to field work and analysis: to get valid results, work must be
done thoroughly.
5. Trust in self and others: a person cannot do research alone and has to trust in the ability of others.
6. Self-knowledge and self-awareness: person needs to rely on his or her instincts, whether learned or
natural.
7. Authenticity: as with all research, be honest and scientifically sound.
8. Good writing skills: a researcher must be able to show and explain results to others.
23. Which research approach-qualitative or quantitative-should you follow? Why? Use a qualitative ap-
proach by observing both child and parent. Because it is a specific event, an episodic narrative interview,
gathered through participant observation, may be appropriate. Also, a focus group may aid in collecting information.
Choosing a Qualitative or a Quantitative Approach
1. Will the research involve categorizing verbal communications? If yes, then consider performing a
content analysis.
2. Will the investigator be required to enter a research setting and interact with participants? If yes,
then consider performing a participant observation study (but see step 3).
3. Will the investigator closely monitor the habits, customs, and behaviors of a different culture to
make sense of its worldviews?
5. Will the narrative be focused on some specific event or experience?
APPENDIX F
EMERGING ALTERNATIVES: QUALITATIVE RESEARCH APPROACHES
Thou shalt not sit
With statisticians nor commit
A social science.
-from "Under Which Lyre" by W. H. Auden (1991, p. 339)
These lines are from a long, humorous poem written by W. H. (Wystan Hugh)
Auden, a British emigre to the United States and one of the 20th century's great
poets. Auden wrote these lines in 1946, though their apparent meaning resonates
with the desires, if not experience, of more recent generations of students and quite a
few professional academics. Many students would prefer studying various topics that fall
under the purview of the behavioral sciences without having to rely on statistics and data
analysis (perhaps you felt this way before beginning your statistical odyssey back in chap-
ter 1; I hope that I have been able to reduce any lingering misgivings since then). In the
opinion of these students, "mathematizing" people's behavior often takes what is real and
concrete and makes it vague or abstract, difficult to grasp conceptually. What makes peo-
ple inherently interesting sometimes gets lost, as it were, in the quantitative translation.
For their part, many teachers and researchers are unsure about the placement of the
social and behavioral sciences, as well as some of the fields typically listed under these
headings. Sometimes the discipline of psychology will be identified with the natural rather
than the behavioral sciences, for example. Those who work in behavioral science fields-
sociology, education, anthropology, political science, as well as psychology-often
feel pressed to justify their existence to people outside their respective disciplines (see Figure F.1). Some observers are quick to comment that at best, the behavioral sciences are
trying to make relatively subjective topics appear to be objective, even overly "technical";
at worst, these fields are engaging in describing and cataloging matters that are transpar-
ent to all but the most casual of observers. In correspondence with a fellow philosopher
of science, for example, Imre Lakatos threw up his hands and wrote that, "The social
sciences are on par with astrology [the pseudoscience of how heavenly bodies influence human affairs], it is no use beating around the bush" (parenthetical comment, mine;
quoted in Motterlini, 1999, p. 107; see also, Data Box F.1). Other critics point out that the
quantitative turn of the behavioral sciences is at odds with its subject matter-the social,
political, psychological, and economic lives of people in all their variety and diversity. The
charge is that human experience should be experienced, discussed, or debated, but not
measured or otherwise calculated like materials found in the physical world.
Despite their ubiquity, measurement and quantification remain controversial in some quarters.

As a result of the behavioral sciences' measurement mania, quantitative research is often criticized for being of limited use (e.g., Flick, 1998). In particular, critics and commentators argue that data collected in behavioral science research are rarely seen as directly applicable or helpful to life in the everyday world. Experimental rigor, cause and effect relations, and concerns about representative populations (to name a few of the quantitative topics I defend in this book as essential to statistical analyses) are often
deemed too far removed from the concerns or problems that affect the lives and for-
tunes of many people. Inevitably, even the most rigorously scientific research and pris-
tine results are colored by the interests and the social, political, and cultural backgrounds
of the people conducting it (Flick, 1998; see also, Kuhn, 1970). Although some biases
are reduced through experimentation, those associated with interpretation and point
of view often remain (for a related discussion concerning the power of perspective, see
Dunn, in press).
Collectively, of course, the behavioral sciences occupy contested territory between
the humanities (e.g., literature, theology, philosophy) and the natural or physical sci-
ences (e.g., chemistry, biology, physics), two sets of disciplines that have a relatively solid
sense of what sorts of questions can be asked and which methods can be used to ob-
tain answers. One reason that the behavioral sciences are stuck betwixt and between
these older sets of disciplines is the unavoidable overlap in conceptual material. The be-
havioral sciences purport to examine vast areas of human experience, domains claimed
for a far longer time by the humanities. Questions of human existence, destiny, and
purpose, for example, are readily found in the plays of Sophocles and Shakespeare, nov-
els by Iris Murdoch and William Faulkner, or the philosophy of Hegel, not just text-
books on psychology or education, the sociology of the family, or American national
government. The behavioral sciences do not deny the insights of humanistic sources
but try to back up their unique claim of understanding by arguing for empirical ap-
proaches that contrast with more intuitive explorations in the humanities. At the same
time, of course, use of the term "empirical" often elicits the ire of practitioners in the
natural sciences who do not believe that the behavioral sciences are very scientific to
begin with.
The conceptual home of the behavioral sciences, as well as their relation to the humanities and the natural sciences, is uncertain.

Whether these disparate views carry some weight or represent little more than a tempest in an intellectual teapot is beside the point because they do influence the way people think about the behavioral sciences. Auden's lyrical commandment about avoiding statisticians and behavioral science, then, puts him in good company; some of the people working in, learning about, or observing what is done in the behavioral sciences share his concerns. His observation is clever, even quite funny, because it is poignantly apt. To some extent, the behavioral sciences (and behavioral scientists) are trying to
find their way, though I am quick to argue that the struggle is an intellectually rich and
rewarding one (Dunn, 1999).
Why raise these concerns in an appendix found in a book devoted to statistics and
data analysis, bulwarks upon which the foundations of behavioral science rest? I would
be remiss if I presented research and methods in the behavioral sciences as monolithic
because diverse perspectives not only exist but also flourish. We spent the 15 chapters
of this book familiarizing ourselves with the statistical views of the behavioral sciences,
especially in the discipline of psychology. In this appendix, we will follow a different
path, one leading from quantitative to qualitative approaches to research, from a dom-
inant perspective to a nascent minority view.
KEY TERM Qualitative research uses empirical methodologies where resulting data appear and are interpreted
in nonnumerical forms, and the subjective nature of the research enterprise is acknowledged.
This appendix will discuss the characteristics of qualitative research and then present
some alternative approaches to doing research and performing data analyses that are
not based exclusively on quantitative measurement.
Is it odd to be discussing qualitative alternatives to statistics in a text largely
devoted to such subject matter? Perhaps so, but there is a method to this unconven-
tional madness. I want to demonstrate that data analysis is not just a collection of tech-
niques for working with numerical information. Many techniques are available for the
Between the years 1250 and 1600, there was a dramatic shift in people's thinking about
the physical world, from a qualitative understanding to an increasingly quantitative one.
Intellectual, social, and cultural development in western Europe centered on mechanical
devices, especially clocks, double-entry bookkeeping, increasingly precise maps, and even
perspective drawing in painting (Crosby, 1998). The increasing reliance on numbers and
quantitative reasoning during the late Middle Ages through the Renaissance gave birth to
advances in modern science, the rise of bureaucracies, the economic ideas underlying
businesses and entrepreneurial ventures, and, of course, all of the technologies humans
came to depend on in succeeding generations.
Prior to this change in perspective, western Europeans surely knew how to count
and had more than a basic grasp of mathematical relations, but their understanding of
the world and events within it was somewhat arbitrary. Available ambient light and the
change of the seasons regulated time and labor, for instance, not clocks or calendars.
Somewhere into the 12th century, however, knowledge of number and elementary measurement became, in the words of historian Alfred Crosby, a consuming "passion."
Measurement provided order and later an accounting of events, which in turn led to
numerical record keeping, ledgers, and the like. Where time is concerned, we no longer
have just mechanical clocks but atomic ones that maintain the "correct" or "official"
time to millionths of a second. We are indeed passionate about our numbers.
This passion for numbers continues largely unabated today; after all, you are your social security or student identification number! (Stop for a moment and reflect on how often you rely on that arbitrary but assigned number, as well as how often you will rely on it across your life.)
basic needs that you must satisfy are based on quantification or measurement of some sort.
You break your day into discrete units (time) and each day is assigned a number (date),
for example. You also compare your academic performance with peers (GPA), receive pay-
ment in return for your labor (money), acquire debts (money to be paid back with inter-
est), and rely on computers with their binary programming (all information is actually represented as strings of 0s and 1s); the list of quantified entities you encounter is almost
endless. Consider, too, the millennium fever that has gripped the world for the past
several years, as well as in the past (Gould, 1997). And yes, even the much belabored and ballyhooed Y2K problem is just another case of quantification (albeit one that almost went awry).
Numbers sometimes have a representational power or influence in human affairs.

Not surprisingly, the behavioral sciences gradually came to depend on the power of numbers, as well. Why did measurement and quantification become a fixation of the behavioral sciences? The ability to measure and quantify observations lends credence to the argument that the study of human behavior in a formalized way makes it a science (Leahey, 1997).
Is Psychology a Science?
What makes a science a science? How does a science differ from a nonscience or a pseudoscience (i.e., a discipline pretending to be a science)? In some quarters, there are still attempts to rehabilitate astrology by labeling it a science (Holt, 1999).
The energy (less light than heat, usually) put into answering this question is akin to that spent in 13th-century arguments regarding the question, "How many angels can dance on the
head of a pin?" Answers were found but they did not satisfy everyone. To be sure, there are vol-
umes on the history, sociology, and philosophy of science, and we cannot review their many per-
spectives here. What we can do, however, is to consider a definition for science used by Leahey
(1997), who suggests that psychology cannot be a science in the traditional sense. A science
searches for, "exceptionless general laws that apply to spatiotemporally unrestricted objects"
(Leahey, 1997, p. 462; see also, Kuhn, 1977; Motterlini, 1999). Whoa, just what does that mean?
Well, think about it: Physics's laws presumably apply to planets and stars everywhere, not
just those found in or near our single solar system. In contrast, psychology is a discipline that
deals with really only one species-humans-and when and if any equally detailed laws of hu-
man behavior are derived, they will certainly not be "spatiotemporally unrestricted" as they are
based solely on intelligent life on this planet. This somewhat process-oriented definition of what
constitutes science will not satisfy everyone, but it does point to an important distinction that is
often overlooked when disciplinary boundaries are drawn.
Leahey (1997) is not the only one to cast doubt on psychology's scientific pretensions or to an-
swer in the negative about its status as a science. Considerable work on this question emanates from
the disciplines of philosophy and biology. Some prominent philosophers argue, for example, that the
increasing understanding of the brain and its neurophysiology will gradually replace and eventually
eliminate psychology (e.g., P. M. Churchland, 1985; P. S. Churchland, 1986; Rosenberg, 1983; see also,
Stich, 1983). Independent of but in sympathy with these sorts of philosophical positions, the field of
biology, too, weighs in on psychology's future and finds it to be limited (e.g., Rosenberg, 1980, 1994).
In particular, sociobiology, the idea that even social and cultural behaviors are based on and serve
largely biological or reproductive ends, poses a clear interpretive threat (Wilson, 1975).
All is not lost, of course. Many psychologists take great umbrage at these sorts of conclu-
sions by correctly noting that the discipline is quite young, so it is much too soon to draw any
definitive conclusions about psychology's fate. Reports of its demise may be a bit premature, es-
pecially given the discipline's great strides in recent years. There is also the obvious fact that most
psychologists-educators as well as researchers and practitioners-are going about their business
oblivious to the fact that the scientific status of their work is in question (and as was previously
acknowledged, status in the sense of recognition may be the chief worry, anyway).
Besides, if psychology is not a science by some set of definitions, does that really change any-
thing? Probably not all that much. For your part, you should wonder whether reliance on statistics
helps or hinders psychology's earnest desire to be treated like a member of the natural sciences. At the
same time, you must also recognize that a science is not just defined by its techniques but also by its
subject matter and the degree to which its findings are used to explain phenomena. You need not agree
with Leahey's (1997) definition of the scope of what a science is and what it does. Choosing to dis-
agree with his perspective or any of those being promulgated in philosophy, biology, or any other fields
of study, however, requires some constructive engagement. What counterarguments can you offer?
In the end, is there any truth regarding what is or is not a science? The philosopher Paul
Feyerabend (quoted in Motterlini, 1999, p. 249) claimed, "The truth, whatever it is, be damned. What we need is laughter." A wise, if not necessarily philosophical, observation we should keep
in mind as we go about our work in the behavioral sciences.
form of "field work" that became formalized in both sociology and anthropology in the
19th and early 20th centuries. Then and now, the overriding research concern involves
portraying the meanings and unique frames of reference individuals impute to their
lives. Making sense out of this created or derived meaning is an important goal of qual-
itative research, which has other salient qualities, as well.
Qualitative methods actually developed alongside quantitative approaches. The figure often hailed as the "founder" of experimental psychology, Wilhelm Wundt, also relied on descriptive methods in what he called "folk psychology." Wundt's folk psychology was meant to augment experimentation, as he believed that the latter could
never provide a complete psychology of individual mental life. Generally speaking, folk
psychology was an attempt to examine the historical development of the human mind
by examining aspects of collective experience (e.g., myths, languages, customs, tradi-
tions) that aid individual development (see Leahey, 1997, for more detail). The method-
ology of folk psychology was clearly qualitative but Wundt believed that it comple-
mented and was not antithetical to experimental programs. Regrettably, of course,
researchers following Wundt were drawn to the "rigor" of quantification, characteriz-
ing qualitative approaches as representing a "soft" approach to the study of behavior.
Qualitative methods have always existed, but their identification per se occurred mostly as a reaction to quantitative techniques.

Exceptions to the rule existed, of course. American sociology had a decidedly qualitative twist from the late 19th century until at least the 1940s (Flick, 1998). In particular, sociologists were drawn to employing a descriptive style when writing about case studies, narratives, or biographies of their subjects. For the next 20 years or so, the quantitative mind-set gained followers in sociology, but this trend came under fire in the 1960s. American sociology began to do some soul searching, noting that traditional
modes of inquiry provided rich characterizations of social life that the newer mathe-
matical approaches could not capture. German intellectuals took up this same concern
in the 1970s, heralding an interest in subjectivity and what is now known as postmod-
ernism (for a more detailed history see, for example, Denzin & Lincoln, 1994).
and so forth in the course of an investigation (see the section devoted to writing
issues later in this appendix). This subjective material, too, finds its way into any
publication or report about the research because it is deemed an important part
of the process of doing qualitative research. Whether the investigator studying
female medical school students was also a woman would de facto have some
influence on the conclusions drawn from the work (as would a male examining
the same events).
• A variety of qualitative approaches and methodologies. Although there is often
disagreement between different investigators or schools of thought in the
quantitatively oriented side of the behavioral sciences, there is relative unanimity
where method and theory are concerned. As tools, for example, statistics and data
analysis help to provide some objective guidance in the course of research.
Qualitative research lacks this unified character, as there is no single vision for
what to do, how to do it, and how to explain or think about it. A plethora of
(sometimes conflicting) perspectives and methods exist where, in the words of
Flick (1998, p. 7), "Subjective viewpoints are a first starting point." Our medical
school researcher would literally walk onto the scene with certain expectations
about medical school (e.g., not friendly to women, an "old boys'" club) before
collecting an iota of data. An interesting question then becomes how these
expectations are challenged, met, or possibly changed by the medical school
venue selected for the research, and so on.
These four features should not be construed as exhaustive, of course. Indeed, the
very nature of qualitative approaches, especially their willingness to accept novel per-
spectives, should lead us to assume that other authors or investigators would charac-
terize qualitative research using very different terms. Such constructive "disagreement"
is entirely in keeping with the liberal contextualism of qualitative research. We can now
turn to a review of some of the ways behavioral scientists examine qualitative relation-
ships in research.
Content Analysis
When a qualitative researcher interviews a participant or examines the expressions
found in some written communication, how are these verbal responses analyzed? Some
researchers rely on a broad technique called content analysis.
KEY TERM A content analysis involves creating a system or procedure for categorizing the content and fre-
quency of communications, which are usually verbal.
Content analyses can be performed on the verbal content of live or taped communica-
tions (e.g., interviews, films, documentaries) as well as archival material (e.g., diaries,
records, books, articles). The results of a content analysis can be used to describe the
nature of some exchange or to ask more focused questions about the behavior of the
participants under examination. I once performed a content analysis as part of a study
on people's well-being following the amputation of a limb (Dunn, 1996). Seventy-seven
percent of the participants wrote about something positive resulting from their ampu-
tations, and these coded attributions were organized into conceptual categories based
on prior research dealing with the search for "silver linings" in adversity (e.g., Schultz
& Decker, 1985; Taylor, Lichtman, & Wood, 1984). Categorized sample attributions from
this content analysis are shown in Table F.2.

"Content" refers to the coded meaning of a verbal response.

One of the methodological concerns of doing a content analysis is illustrated by two of the attributions in Table F.2. These attributions require the investigator to "fill in the blanks" left by respondents. The first two attributions include bracketed words (i.e., [where] and [I]), indicating that a respondent left out a word or words. The investigator must carefully select words that finish the spirit of the original thought. The word substitutions shown in Table F.2 appear to be satisfactory, but more radical or detailed changes to a categorized verbal explanation can be problematic. As a result, the content analyst must always proceed cautiously when evaluating his or her data.
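The category-and-frequency bookkeeping at the heart of a content analysis can be sketched in a few lines of code. The categories and keyword rules below are invented stand-ins, not the actual coding scheme from Dunn (1996); in practice the coding is done by trained human judges, but the tallying logic is the same.

```python
# Minimal sketch of content-analysis bookkeeping: assign each verbal
# response to a category via keyword rules, then tally frequencies.
# The categories and keywords are hypothetical illustrations only.
from collections import Counter

CODING_SCHEME = {
    "increased appreciation of life": ["appreciate", "grateful", "lucky"],
    "closer relationships": ["family", "friends", "closer"],
    "personal strength": ["stronger", "cope", "overcome"],
}

def code_response(response: str) -> str:
    """Return the first category whose keywords appear in the response."""
    text = response.lower()
    for category, keywords in CODING_SCHEME.items():
        if any(word in text for word in keywords):
            return category
    return "uncategorized"

def content_analysis(responses):
    """Tally how often each category occurs across all responses."""
    return Counter(code_response(r) for r in responses)

responses = [
    "I appreciate every day now.",
    "My family and I have grown closer.",
    "I learned I could overcome anything.",
    "Nothing good came of it.",
]
print(content_analysis(responses))
```

The frequency table that results is exactly the kind of summary a content analyst would then inspect, refine, or hand off to quantitative procedures.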
Following a somewhat different procedure, psychologists Christopher Peterson and
Martin Seligman developed a particular type of content analysis called the CAVE (con-
tent analysis of verbal explanations) technique (e.g., Peterson & Seligman, 1984), one
that involves some quantitative elements. Relying on the theoretical framework provided
by learned helplessness theory (e.g., Peterson & Bossio, 1989), Peterson and Seligman ar-
gued that the manner by which people habitually explain negative events not only
colors their perceptions, it is also linked to the possible presence of depressive symptomatology (see also, Alloy, Abramson, & Francis, 1999). When people chronically ex-
plain negative events-being snubbed by a friend, performing poorly on a test-by ap-
pealing to internal ("it's me"), stable ("it's not going to change"), and global ("it will
affect everything in my life") causes, these researchers find that people's pessimism puts
them at greater risk for depression and health problems. One study found that college-
aged males who possessed this pessimistic explanatory style were more likely to have
poorer health between the ages of 45 and 60 than their more positive peers (Peterson,
Seligman, & Vaillant, 1988). Another discovered that pessimistic candidates for the
American presidency between 1948 and 1984 tended to lose the election more often
than not, quite possibly due to passivity created by their negative outlooks (Zullow &
Seligman, 1990).
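The quantitative element of a CAVE-style analysis can be illustrated with a small sketch. In the published technique, trained judges rate each extracted causal explanation for a bad event on internality, stability, and globality; the simple 1-to-7 ratings and straight averaging below are invented for illustration and are not Peterson and Seligman's exact scoring protocol.

```python
# Sketch of the quantitative step in a CAVE-style analysis: judges'
# ratings (1 = external/unstable/specific, 7 = internal/stable/global)
# for each causal explanation are averaged into a composite score.
# Higher composites indicate a more pessimistic explanatory style.
# All numbers here are hypothetical.
def composite_score(ratings):
    """ratings: list of (internal, stable, global) tuples, one per explanation."""
    per_explanation = [sum(triple) / 3 for triple in ratings]
    return sum(per_explanation) / len(per_explanation)

# Two hypothetical explanations for negative events, rated by a judge:
pessimist = [(7, 6, 6), (6, 7, 5)]   # "it's me, it won't change, it affects everything"
optimist = [(2, 1, 2), (1, 2, 1)]    # "bad luck, a one-off, limited to this situation"
print(round(composite_score(pessimist), 2))  # → 6.17
print(round(composite_score(optimist), 2))   # → 1.5
```

The point of the sketch is the hybrid character of the method: qualitative verbal material is coded first, and only then does a numerical index enter the picture.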
As shown by the work on attributions for amputation and negative explanatory
style, a content analysis can be used to classify responses or the respondents themselves.
Any resulting categories can be studied in their own right, used to create novel inter-
pretive frameworks, or combined with existing quantitative methodologies.
Generally, participant observers will have some theory or hypothesis guiding their
data collection, though they can often take advantage of unexpected events. Certainly,
then, this element of empirical spontaneity highlights the qualitative character of par-
ticipant observation-unplanned or unforeseen events can rarely be investigated ex-
perimentally or recreated in a laboratory setting (Weick, 1968). A classic demonstra-
tion is the doomsday group described by Festinger, Riecken, and Schachter (1956), where a member of the research team infiltrated the group's membership in order to study its process as the prophesied date (but not Armageddon) drew near.
A more recent but equally intriguing project employed ethnography, a close rela-
tive of participant observation. The study examined the experience of being a college
student living in a university dormitory-before I describe it in some detail, let me de-
fine ethnography.
KEY TERM Ethnography is a research approach dependent on the undisguised, objective study of the habits,
customs, and actions of people in a culture as they go about their daily lives.
Ethnographers utilize a qualitative perspective by trying to describe how a given cul-
ture makes sense out of its social experience. Unlike the standard form of participant
Almost anyone who uses the term [ethnic group] would say that it is a group distinguishable from others by one, or some combination of the following: physical characteristics, language, religion, customs, institutions, or "cultural traits." (Hughes, 1984, p. 153)
Most behavioral scientists would concur with this approach, noting that it fits in nicely with the
quantitative approach of operationally defining terms (e.g., recall chapters 1 and 2 in this book).
Becker (1998, p. 2) notes that he and his fellow students were implicitly out to compare a given,
identified "ethnic" group with a perceived "nonethnic" group. In short, the former is labeled an
"ethnic" group because it is different from the latter.
Hughes said no, that a better perspective was to turn the sequence around a bit using a sim-
ple "trick":
An ethnic group is not one because of the degree of measurable or observable differ-
ences from other groups [both quantitative dimensions]; it is an ethnic group on the
contrary because the people in and the people out of it know that it is one; because
both the ins and the outs talk, feel, and act as if it were a separate group. (Hughes, 1984,
pp. 153-154; parenthetical comment mine)
A group is a group because those in it and those outside it perceive it as different. Other fac-
tors-religion and language, for example-are important but only insofar as they shed light on
the (existing) ethnic relationships and networks occurring between groups. Hughes pointed to
an elegant and often overlooked qualitative distinction that pervades social life. His trick was to
examine the networks where the other factors are derived and used to create social worlds and
the points of view of people in and outside them.
There are, of course, countless other qualitative tricks one can use, and no doubt these tools
can inform or complement the work of statistics and data analysis. The trick for the quantita-
tively oriented researcher is to put his or her usual approaches on hold for a bit, allowing the
qualitative alternative to shed new light in theorizing about behavior.
around with one's subjects for a long enough time to start hearing them in their more
natural adolescent tones ... sensing their own priorities as they understand them."
Moffatt's work is appropriately titled Coming of Age in New Jersey, a play on a classic piece of anthropology by Margaret Mead (i.e., Coming of Age in Samoa; Mead, 1961), herself a proponent of this involving form of interpretive field work.
During the course of his dorm life, Moffatt (1989) found that the way the under-
graduate students interacted with him changed over the course of his relationship with
them. Once they discovered he was not a fellow student but rather an interested an-
thropologist (he made no attempt to hide his identity or real purpose for being there),
they spoke to him differently-they used fewer obscene words and phrases in his pres-
ence. Note that a traditional quantitative investigation would identify this behavioral
change as a confound or bias introduced by the researcher, where a qualitative per-
spective could construe the change as a matter of course that no doubt represents the
complexity of social interaction outside the undergraduate age group. Interpretation is
indeed a matter of perspective.
questions oriented to a given topic. Flick (1996), for example, asked a variety of focused
questions concerning technology and people's reactions to it (e.g., "Which parts of your
life are free of technology? Please tell me about a typical situation!").
One of the most promising directions for narrative research involves the impor-
tance of biography and key episodes to people's lives and beliefs about their own de-
velopment (e.g., Bruner, 1987, 1991). Psychologist Dan P. McAdams (1993), for exam-
ple, uses a biographical narrative approach to study personality, one that incorporates
culture, social roles, and individual construal processes in identity development. Briefly,
McAdams believes that we use our life stories and personal myths to make sense of our
experiences. In effect, we derive meaning in our lives by essentially telling stories about
ourselves to others and, for that matter, to ourselves. This novel view runs counter to
the prevailing wisdom that people's personalities are either based on relatively fixed
personal characteristics or reactions to a general series of stages everyone necessarily encounters.
According to McAdams (1993), these stories define us, and across time we put them
together in order to develop a personal myth. Personal myths can be cultural (e.g., tragic
myths tend to emphasize the absurdity of life) or unique to an individual (e.g., "No one
taught me to be an artist-I just watched the world around me and internalized its
beauty."). The interesting part of myths is the mixing of truth with fiction, as they are
comprised of memories, perceptions, and our hopes for the future. To McAdams (1993,
p. 13), "We do not discover ourselves in myth; we make ourselves through myth." In this
way, the narrative tradition can be seen as a productive method for examining the
development of self-concept across the life span.
tempt to create any sort of a representative sample of users. Remember, focus groups
Janesick (1998, p. 70) characterized the requisite skills and temperament of the ideal qualita-
tive researcher. She believes particular qualities are necessary for conducting as well as com-
pleting a qualitative project. As you read the following list, think about whether these character-
istics are different from those you (now) associate with quantitative investigators. Similarly, how
would you rate yourself on these qualities?
going but must remain "flexible, objective, empathic, persuasive, a good listener"
(Fontana & Frey, 1994, p. 365). More often than not, a group of observers is also stand-
ing by to watch the focus group in action. These observers can be physically present in
the interview room or sequestered behind a partition or one-way mirror. Other times,
the focus group might be filmed for later viewing and analysis. In any case, the ob-
servers are interested in studying the fresh, ongoing impressions or perceptions of the
focus group participants. They are not especially interested in asking the focus group
to engage in any decision making or problem solving; indeed, the responses gathered
during the session can be incorporated into such activities at a later time. Where con-
sumer goods are concerned, then, focus groups entail brainstorming about reactions to
existing products or hopes for product development.
Methodology. Readers should be apprised of the particular methodology used (e.g., par-
ticipant observation, focus group) and research procedures (e.g., interview, notes based on
memory for events, audio or videotape).
Time Frame for the Research. Readers should know how long the study took, as well
as how much time was involved in data collection out in the field.
Participants. Who were the participants? How many were there? What were they like?
In what setting(s) were they observed or interviewed?
Research Design. How were the participants selected or recruited? Why was one re-
search method chosen over another? Provide a detailed rationale of the whys and where-
fores of the project's methodology.
Investigator Point of View. Why was the study undertaken in the first place? How did
a researcher's view change over time? What presuppositions did the investigator bring
to the work?
Relationship with the Participants. How did the investigator interact with the study's
participants? Did the participants act naturally with the researcher? Characterize the
nature of this relationship and how it affected the data collection, analysis, and
conclusions.
Investigator's Analysis. How were the data analyzed? Were the participants given the
opportunity to review the analyses and report? Did the researcher impose any checks
or balances on the conclusion he or she drew?
The content and scope of these issues clearly err toward the descriptive. Little
emphasis is placed on accuracy in the sense of providing a "correct" or absolute ac-
count of what took place. There is a conscientious attempt, however, to be accurate
in reporting details great and small, as well as giving readers a flavor of the research
process. To be sure, many or perhaps all of these issues can be explored in writing
about quantitative issues, but the desire to be reflective about or to actually partici-
pate in the research effort as more than observer or experimenter is unusual. Few
quantitative researchers would openly embrace the qualitative idea of discounting, ei-
ther. Nonetheless, clear and concise prose about behavioral science results (or find-
ings!) is the sincere goal of both the qualitative and quantitative camps, qualities we
should aspire to in our own work.
The two decision trees opening this appendix will prove helpful when deciding
whether to conduct a qualitative or a quantitative piece of research (remembering,
of course, that elements of each approach are found in practically any study; identify
which one predominates). The first decision tree provides gentle guidance where dis-
tinctions between these two approaches are concerned, as it is not feasible to highlight
every possible distinction in one place. The second tree focuses on selecting a qualitative
approach from one of the four main research approaches presented in the appendix. The
distinctions between the research approaches, too, can be fluid, a hallmark of qualitative
work, but you should feel free to consider others that can be found in other resources.
Summary
1. There is a tension between quantitative and non-numerically based or "qualitative" research perspectives in the behavioral sciences.
2. Although qualitative research is popularly perceived as antithetical to quantitative work, it is actually complementary to it. Most quantitative projects possess qualitative elements, and vice versa.
3. Quantitative worldviews have their origins in the 13th century, when western Europeans began to create various mechanized and mathematical devices to measure things, as well as to direct labor into new avenues. The passion from that time continues to this day: numbers and technology drive modern culture.
4. The behavioral sciences embraced quantification in order to gain the credibility associated with the natural sciences, especially physics. The discipline of psychology is noteworthy for having demonstrated "physics envy," a desire to have the theoretical and methodological precision found in the oldest scientific pursuit. This envy or emulation may be problematic because psychology and the other behavioral sciences do not possess the same characteristics as physics.
5. Qualitative approaches have always existed, especially in the form of narratives or other personal forms of writing or record keeping. Qualitative methods became more formalized in the 19th century, however, and gradually gained use in the behavioral sciences as a viable alternative to quantification.
6. Symbolic interactionism, a prominent qualitative tradition, is concerned with the subjective nature of meanings people attach to things.
7. Qualitative research adheres to an open systems view in that data are gathered in numerous ways using a variety of methods. There is no central qualitative method or paradigm; rather, there are many, many ways to examine human behavior.
8. A content analysis is a form of qualitative data reduction. Typically, a content analysis categorizes self-report or verbal data according to some guiding theory or framework.
9. Participant observation refers to situations where a researcher actively observes and sometimes engages participants during the process of data collection. Ethnography, a related approach, is the objective examination of all the characteristics of a given culture.
10. Narratives or stories are used to portray the autobiographical nature of experience. A narrative begun in reaction to an open-ended question is called a narrative interview. One directed toward a specific issue or incident is called a narrative episode.
11. Qualitative group processes can be explored in a focus group, where a relatively small number of people are gathered to discuss some item or idea for a fixed period. The focus group is an antidote to the largely singular or individualized self-reports found in most other qualitative approaches.
12. Writing about qualitative research is just as rigorous as summarizing quantitative efforts, though a couple of differences stand out. Qualitative reports are usually focused on one aspect of a qualitative study; despite the plethora of data usually available, not everything is reported. Qualitative authors usually provide sufficient detail about a project so readers can accurately judge its success or, conversely, "discount" the work as subjective, a recognition that proof is not a crucial issue in qualitative alternatives.
13. Meaning is more important than method, and researchers are encouraged to examine their data more closely and with more of an eye to replication than has heretofore been the case. Generalizing results from one setting to another is as dependent on the practical matter of replication as it is on the statistical significance of results.
Key Terms
Content analysis (p. F-10)
Discounting (p. F-17)
Episodic narrative (p. F-14)
Ethnography (p. F-12)
Focus group (p. F-15)
Narrative (p. F-14)
Narrative interview (p. F-14)
Participant observation (p. F-12)
Qualitative research (p. F-3)
Symbolic interactionism (p. F-8)
Appendix Problems
1. Why are qualitative and quantitative approaches perceived to be incompatible with one another? Explain why they are arguably compatible with one another.
2. What are some advantages associated with qualitative research? What sorts of issues can qualitative efforts tackle that quantitative research cannot?
3. In your opinion, why are the behavioral sciences so compelled to measure and manipulate variables in the study of human behavior?
4. What is "physics envy"? Does psychology suffer from it? In your view, do other areas of study within the behavioral sciences display "physics envy"?
5. Define qualitative research and list some of its characteristics.
6. Why is the theme of perspective, that of the researcher and his or her participants, so prominent in qualitative investigations?
7. Why is perspective or point of view less important in quantitative rather than qualitative research? Does it matter less there? Why or why not?
8. Explain the meanings and use of "open" and "closed" systems in the context of qualitative and quantitative research.
9. Briefly discuss six rules of thumb for qualitative researchers.
10. Define the term content analysis and provide an example that might be found in the behavioral sciences.
11. Explain the idea behind participant observation and how it varies from ethnography. After defining these terms, provide a concrete example of each one.
12. What are the stages of conducting a participant observation? Define each one in the context of an example.
13. What is a narrative? Identify and define the two main types of narratives used in research efforts.
14. Create a prompt for an episodic narrative.
15. Create a prompt for a narrative interview.
16. What advantage(s) does qualitative research in groups have over more interview-oriented work (e.g., narrative, participant observation)?
17. How does a focus group differ from a more traditional interview? Why do some qualitative researchers argue that focus groups represent a better research technique than other qualitative methods?
18. What are some of the reasons a researcher would want to conduct a focus group?
19. Imagine that you are running a focus group. Select a topic for the group to discuss and then describe how you would lead the group through the four steps usually associated with focus groups.
20. Characterize the differences, if any, between the writing associated with qualitative research and that found in quantitative research. (Hint: Review the sections devoted to writing found in the first 14 chapters of this book for some ideas.)
21. List eight characteristics of qualitative researchers, and briefly explain in your opinion why each is important.
22. You are a clinical psychologist who is trying to identify how many young women in a large urban high school are at risk for anorexia nervosa, an eating disorder. Which research approach, qualitative or quantitative, should you follow? Why? (Hint: Use the decision trees at the opening of this appendix for guidance in answering this question.)
23. You are interested in studying child and parental impressions of the first day of school in kindergarten. Your goal is to collect and examine reflections generated by these two groups. Which research approach, qualitative or quantitative, should you follow? Why? (Hint: Use the decision trees at the opening of this appendix for guidance in answering this question.)
24. The dean of students at a university is concerned that students will perceive a new set of behavioral guidelines for dormitory living as restrictive, even harsh. What sort of research technique should the dean employ to get a sense of student opinion before deciding whether to implement the guidelines? Why? (Hint: Use the decision trees at the opening of this appendix for guidance in answering this question.)
References
Babbage, C. (1989). The decline of science in England. Nature, 340, 499-502. (Original
work published 1830)
Bailar, J. C., & Mosteller, F. (1988). Guidelines for statistical reporting in articles for medical journals: Amplifications and explanations. Annals of Internal Medicine, 108, 266-273.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423-437.
Barber, J. S., Axinn, W. G., & Thornton, A. (1999). Unwanted childbearing, health, and mother-child relationships. Journal of Health and Social Behavior, 40, 231-257.
Barfield, W., & Robless, R. (1989). The effects of two- and three-dimensional graphics on the problem-solving performance of experienced and novice decision makers. Behaviour and Information Technology, 8, 369-385.
Baum, A., Gatchel, R. J., & Schaeffer, M. A. (1983). Emotional, behavioral, and physiological effects of chronic stress at Three Mile Island. Journal of Consulting and Clinical Psychology, 51, 565-572.
Becker, H. S. (1998). Tricks of the trade: How to think about your research while you're doing it. Chicago: University of Chicago Press.
Bem, D. J. (1987). Writing the empirical journal article. In M. P. Zanna & J. Darley (Eds.), The compleat academic: A practical guide for the beginning social scientist (pp. 171-201). New York: Random House.
Bem, D. J., & Allen, A. (1974). On predicting some of the people some of the time: The search for cross-situational consistencies in behavior. Psychological Review, 81, 506-520.
Bem, S. L. (1977). On the utility of alternative procedures for assessing psychological androgyny. Journal of Consulting and Clinical Psychology, 45, 196-205.
Bennett, D. J. (1998). Randomness. Cambridge, MA: Harvard University Press.
Berger, A. (1999). Long-lasting scars of unwanted births. The New York Times, August 24, F8.
Berscheid, E., & Walster, E. (1974). Physical attractiveness. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 7, pp. 157-215). San Diego, CA: Academic Press.
Billig, M., & Tajfel, H. (1973). Social categorization and similarity in intergroup behavior. European Journal of Social Psychology, 3, 27-52.
Blumer, H. (1938). Social psychology. In E. Schmidt (Ed.), Man and society (pp. 144-198). New York: Prentice Hall.
Blumer, H. (1969). Symbolic interactionism: Perspective and method. Berkeley and Los Angeles: University of California Press.
Boice, R. (1996). Procrastination and blocking: A novel, practical approach. New York: Praeger.
Bolt, M. (1993). Instructor's manual to accompany Myers' Social Psychology (4th ed.). New York: McGraw-Hill.
Bornstein, M. H., & Sigman, M. D. (1986). Continuity in mental development from infancy. Child Development, 57, 251-274.
Bowen, W. G., & Bok, D. C. (1998). The shape of the river: Long-term consequences of considering race in college and university admissions. Princeton, NJ: Princeton University Press.
Boyer, P. (1995). Promises to keep: The United States since World War II. Lexington, MA: D. C. Heath and Company.
Brookhart, S. (1999). Board actions signal new era. APS Observer, 12(6), 30-31.
Bruner, J. (1987). Life as narrative. Social Research, 54, 11-32.
Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18, 1-21.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues
for field settings. Boston: Houghton Mifflin Company.
Coombs, W. T., Algina, J., & Oltman, D. O. (1996). Univariate and multivariate omnibus
hypothesis tests selected to control Type I error rates when population variances are
not necessarily equal. Review of Educational Research, 66, 137-179.
Coopersmith, S. (1987). SEI: Self-Esteem Inventories. Palo Alto, CA: Consulting Psychol-
ogists Press, Inc.
Cowles, M. (1989). Statistics in psychology: An historical perspective. Hillsdale, NJ:
Erlbaum.
Cozby, P. C. (1997). Methods in behavioral research (6th ed.). Mountain View, CA: May-
field Publishing Company.
Crosby, A. W. (1997). The measure of reality: Quantification and Western society, 1250-1600. Cambridge: Cambridge University Press.
Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. Fort Worth, TX: Harcourt Brace Jovanovich.
Edgeworth, F. Y. (1887). Observations and statistics: An essay on the theory of errors of
observation and the first principles of statistics. Transactions of the Cambridge Philo-
sophical Society, 14, 138-169.
Efran, M. (1974). The effect of physical appearance on the judgement of guilt, interper-
sonal attractiveness, and severity of recommended punishment in a simulated jury
task. Journal of Research in Personality, 8, 45-54.
Egolf, B., Lasker, J., Wolf, S., & Potvin, L. (1992). The Roseto effect: A 50-year compari-
son of mortality rates. American Journal of Public Health, 82, 1089-1092.
Elbow, P., & Belanoff, P. (1995). A community of writers: A workshop course in writing
(2nd ed.). New York: McGraw-Hill.
Elifson, K. W., Runyon, R. P., & Haber, A. (1990). Fundamentals of social statistics (2nd ed.). New York: McGraw-Hill.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.).
Cambridge, MA: MIT Press.
Ericsson, K. A., Chase, W. G., & Faloon, S. (1980). Acquisition of memory skill. Science,
208, 1181-1182.
Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Pacific Grove, CA:
Brooks/Cole.
Fazio, R. H., & Zanna, M. P. (1981). Direct experience and attitude-behavior consistency.
In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 14, pp.
161-202). New York: Academic Press.
Feller, W. (1968). An introduction to probability theory and its applications (3rd ed., Vol. 1). New York: Wiley.
Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7,
117-140.
Festinger, L., Riecken, H. W., & Schachter, S. (1956). When prophecy fails. Minneapolis:
University of Minnesota Press.
Fischhoff, B. (1982). For those condemned to study the past: Heuristics and biases in
hindsight. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncer-
tainty: Heuristics and biases (pp. 335-351). Cambridge: Cambridge University Press.
Fischhoff, B., Slovic, P., & Lichtenstein, S. (1977). Knowing with certainty: The appro-
priateness of extreme confidence. Journal of Experimental Psychology: Human Per-
ception and Performance, 3, 552-564.
Fisher Box, J. (1978). R. A. Fisher: The life of a scientist. New York: Wiley.
Fisher, R. A. (1966). The design of experiments (8th ed.). Edinburgh: Oliver and Boyd.
Flick, U. (1996). Psychologie des technisierten Alltags [Psychology of technologized everyday life]. Opladen: Westdeutscher Verlag.
Flick, U. (1998). An introduction to qualitative research. London: Sage.
Fontana, A., & Frey, J. H. (1994). Interviewing: The art of science. In N. K. Denzin & Y. S.
Lincoln (Eds.), Handbook of qualitative research (pp. 361-376). Thousand Oaks,
CA: Sage.
Freud, S. (1960). Jokes and their relation to the unconscious. In J. Stratchey & A. Freud
(Eds. and Trans.), The standard edition of the complete psychological works of Sig-
mund Freud (Vol. 8, pp. 9-15). London: Hogarth. (Original work published 1905)
Friedman, E., Katcher, A. H., Lynch, J. J., & Thomas, S. A. (1980). Animal companions
and one-year survival of patients after discharge from a coronary care unit. Public
Health Reports, 95, 307-312.
Funder, D. C. (1983). The "consistency" controversy and the accuracy of personality judgments. Journal of Personality, 48, 473-493.
Funder, D. C., & Ozer, D. J. (1983). Behavior as a function of the situation. Journal of Personality and Social Psychology, 44, 1198-1213.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York:
Basic Books.
Gardner, M. (1975). Mathematical carnival. New York: Knopf.
Gelfand, H., & Walker, C. J. (1990). Mastering APA style: Student's workbook and training
guide. Washington, DC: American Psychological Association.
Gibbons, J. D. (1971). Nonparametric statistical inference. New York: McGraw-Hill.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Kruger, L. (1989). The
empire of chance: How probability changed science and everyday life. Cambridge:
Cambridge University Press.
Gilbert, E. W. (1958). Pioneer maps of health and disease in England. Geographical Jour-
nal, 124, 172-183.
Gilovich, T. (1991). How we know what isn't so: The fallibility of human reason in every-
day life. New York: Free Press.
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the mis-
perception of random sequences. Cognitive Psychology, 17, 295-314.
Glass, D. C., Singer, J. E., & Friedman, L. N. (1969). Psychic cost of adaptation to an environmental stressor. Journal of Personality and Social Psychology, 12, 200-210.
Goldberg, L. R. (1970). Man versus model of man: A rationale plus some evidence for a method of improving on clinical inferences. Psychological Bulletin, 73, 422-432.
Goode, E. (1999). Linking drop in crime to rise in abortion: Roe v. Wade resulted in un-
born criminals, 2 economists theorize. The New York Times, August 20, A14.
Gould, S. J. (1981). The mismeasure of man. New York: Norton.
Gould, S. J. (1997). Questioning the millennium: A rationalist's guide to a precisely arbi-
trary countdown. New York: Harmony.
Gravetter, F. J., & Wallnau, L. B. (1996). Statistics for the behavioral sciences: A first course for students of psychology and education (4th ed.). Minneapolis, MN: West.
Grinnell, F. (1992). The scientific attitude (2nd ed.). New York: Guilford.
Johnson, D. (1981). V-1, V-2: Hitler's vengeance on London. New York: Stein & Day.
Johnson, D. E. (1989). An intuitive approach to teaching analysis of variance. Teaching of Psychology, 16, 67-68.
Jones, E. E., & Sigall, H. (1971). The bogus pipeline: A new paradigm for measuring affect and attitude. Psychological Bulletin, 76, 349-364.
Judd, C. M., & Kenny, D. A. (1981). Estimating the effects of social interventions. Cam-
bridge: Cambridge University Press.
Jung, C. G. (1971). The psychological theory of types. In H. Read, M. Fordham, & G.
Adler (Eds.), Collected works of C. G. Jung (Vol. 20, pp. 524-541). Princeton, NJ:
Princeton University Press. (German original published 1931.)
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representa-
tiveness. Cognitive Psychology, 3, 430-454.
Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Re-
view, 80, 237-251.
Kahneman, D., & Tversky, A. (1982). The simulation heuristic. In D. Kahneman, P.
Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp.
201-208). Cambridge: Cambridge University Press.
Kahneman, D., Slovic, P., & Tversky, A. (1982). (Eds.). Judgment under uncertainty:
Heuristics and biases. Cambridge: Cambridge University Press.
Keil, R. (1996, September 17). Welfare study debunks myths about recipients. The Morn-
ing Call, p. A3.
Kenrick, D. T., & Funder, D. C. (1988). Profiting from controversy: Lessons from the
person-situation debate. American Psychologist, 43, 23-34.
Kimball, O. M. (1972). Development of norms for the Coopersmith Self-Esteem Inventory: Grades four through eight. Doctoral dissertation, Northern Illinois University. Dissertation Abstracts International, 34, 1131-1132.
Kirk, R. E. (1982). Experimental designs: Procedures for the behavioral sciences (2nd ed.).
Monterey, CA: Brooks/Cole.
Kirk, R. E. (1990). Statistics: An introduction (3rd ed.). New York: Holt, Rinehart, & Winston.
Kirk, R. E. (1994). Choosing a multiple comparison procedure. In B. Thompson (Ed.),
Advances in social science methodology (pp. 77-121). Greenwich, CT: JAI Press.
Kirk, R. E. (1999). Statistics: An introduction (4th ed.). Fort Worth, TX: Harcourt Brace.
Konzem, P., & Baker, G. (1996). Essay exchanges to improve student writing. Kansas English, 81, 64-69.
Kramer, C. Y. (1956). Extension of multiple range test to group means with unequal
number of replications. Biometrics, 57, 649-655.
Kromrey, J. D. (1993). Ethics and data analysis. Educational Researcher, 22, 24-27.
Kuhlman, T. L. (1985). A study of salience and motivational theories of humor. Journal of Personality and Social Psychology, 49, 281-286.
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd ed.). Chicago: University of
Chicago Press.
Langer, E. J. (1975). The illusion of control. Journal of Personality and Social Psychology, 32, 311-328.
Langer, E. J. (1983). The psychology of control. Beverly Hills, CA: Sage.
Larkey, P. D., Smith, R. A., & Kadane, J. B. (1989). It's okay to believe in the "hot hand." Chance, 2, 22-30.
Larson, D. G., & Chastain, R. L. (1990). Self-concealment: Conceptualization, measurement, and health implications. Journal of Social and Clinical Psychology, 9, 439-455.
Leahey, T. H. (1990, August). Waiting for Newton. Paper presented at the annual Meeting
of the American Psychological Association, Boston.
Leahey, T. H. (1997). A history of psychology: Main currents in psychological thought.
Upper Saddle River, NJ: Prentice Hall.
Leddo, J., Abelson, R. P., & Gross, P. H. (1984). Conjunctive explanations: When two rea-
sons are better than one. Journal of Personality and Social Psychology, 47, 933-943.
Lehman, D. R., & Nisbett, R. E. (1990). A longitudinal study of the effects of undergraduate training on reasoning. Developmental Psychology, 26, 952-960.
Lehman, D. R., Lempert, R. O., & Nisbett, R. E. (1988). The effects of graduate training on reasoning: Formal discipline and thinking about everyday-life events. American Psychologist, 43, 431-442.
Leik, R K. (1997). Experimental design and the analysis of variance. Thousand Oaks, CA:
Pine Forge Press.
Li, C. (1975). Path analysis: A primer. Pacific Grove, CA: Boxwood Press.
Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (1978). Judged fre-
quency of lethal events. Journal of Experimental Psychology: Human Learning and
Memory, 4, 551-578.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage.
Litt, M. D., Tennen, H., Affleck, G., & Klock, S. (1992). Coping and cognitive factors in adaptation to in vitro fertilization failure. Journal of Behavioral Medicine, 15, 171-187.
Locascio, J. J. (1999, May). Significance tests and "results-blindness." APA Monitor, 11.
Locksley, A., Ortiz, V., & Hepburn, C. (1980). Social categorization and discriminatory
behavior: Extinguishing the minimal intergroup discrimination effect. Journal of
Personality and Social Psychology, 39, 773-783.
Loftus, E. F. (1979). The malleability of human memory. American Scientist, 67, 312-320.
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13, 585-589.
Loftus, G. R. (1991). On the tyranny of hypothesis testing. Contemporary Psychology, 36,
102-105.
Loftus, G. R. (1993). Editorial comment. Memory & Cognition, 21, 1-3.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way
we analyze data. Current Directions in Psychological Science, 5, 161-171.
Martin, D. W. (1996). Doing psychology experiments (4th ed.). Pacific Grove, CA:
Brooks/Cole.
Mauro, J. (1992). Statistical deception at work. Hillsdale, NJ: Erlbaum.
McAdams, D. P. (1993). The stories we live by: Personal myths and the making of the self. New York: William Morrow and Company.
McCall, R. B., Kennedy, C. B., & Appelbaum, M. I. (1977). Magnitude of discrepancy and
the distribution of attention in infants. Child Development, 48, 772-786.
McCauley, C., Woods, K., Coolidge, C., & Kulick, W. (1983). More aggressive cartoons are funnier. Journal of Personality and Social Psychology, 44, 817-823.
McKenna, R. J. (1995). The undergraduate researcher's handbook: Creative experimenta-
tion in social psychology. Boston: Allyn and Bacon.
Mead, M. (1961). Coming of age in Samoa. New York: Morrow.
Meehl, P. E. (1977). Psychodiagnosis: Selected papers. New York: Norton.
Mehan, A. M., & Warner, C. B. (2000). Elementary data analysis using Microsoft Excel.
New York: McGraw-Hill.
Milkovich, G. T., & Newman, J. M. (1987). Compensation (2nd ed.). Plano, TX: Business
Publications, Inc.
Miller, L. C., Berg, J. H., & Archer, R. L. (1983). Openers: Individuals who elicit intimate
self-disclosure. Journal of Personality and Social Psychology, 44, 1234-1244.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Moffatt, M. (1989). Coming of age in New Jersey: College and American culture. New
Brunswick, NJ: Rutgers University Press.
Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38, 379-387.
Moore, D. S. (1992). Teaching statistics as a respectable subject. In F. Gordon & S.
Gordon (Eds.), Statistics for the twenty-first century (pp. 14-25). Washington, DC:
Mathematical Association of America.
Morgan, D. L. (1988). Focus groups as qualitative research. Newbury Park, CA: Sage.
Morier, D. M., & Borgida, E. (1984). The conjunction fallacy: A task specific phenomenon? Personality and Social Psychology Bulletin, 10, 243-252.
Moses, L. E. (1952). Nonparametric statistics for psychological research. Psychological
Bulletin, 49, 122-143.
Motterlini, M. (Ed.). (1999). For and against method: The Lakatos lectures and the
Lakatos-Feyerabend correspondence. Chicago: University of Chicago Press.
Myers, I. B. (1962). The Myers-Briggs Type Indicator. Princeton, NJ: Educational Testing Service.
Newton, R. R., & Rudestam, K. E. (1999). Your statistical consultant: Answers to your data
analysis questions. Thousand Oaks, CA: Sage.
Nicol, A. A. M., & Pexman, P. M. (1999). Presenting your findings: A practical guide for cre-
ating tables. Washington, DC: American Psychological Association.
Nisbett, R. E., Fong, G. T., Lehman, D. R., & Cheng, P. W. (1987). Teaching reasoning. Sci-
ence, 238, 625-631.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social
judgment. Englewood Cliffs, NJ: Prentice Hall.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on
mental processes. Psychological Review, 81, 231-259.
Nisbett, R. E., Krantz, D. H., Jepson, C., & Fong, G. T. (1982). Improving inductive inference.
In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty:
Heuristics and biases (pp. 445-459). Cambridge: Cambridge University Press.
Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical heuristics
in everyday reasoning. Psychological Review, 90, 339-363.
Nolen-Hoeksema, S. (1990). Sex differences in depression. Stanford, CA: Stanford Univer-
sity Press.
Nolen-Hoeksema, S. (1993). Sex differences in control of depression. In D. M. Wegner &
J. W. Pennebaker (Eds.), Handbook of mental control (pp. 306-324). Englewood
Cliffs, NJ: Prentice Hall.
Parrott III, L. (1999). How to write psychology papers (2nd ed.). New York: Longman.
Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Lon-
don: Sage.
Pavlov, I. P. (1928). Lectures on conditioned reflexes (W. H. Gantt, Trans.). New York:
International.
Pavlov, I. P. (1960). Conditioned reflexes (Rev. ed.) (G. V. Anrep, Trans. and Ed.). New
York: Dover. (Original work published 1927)
Pearson, E. S. (1968). Some early correspondence between W. S. Gosset, R. A. Fisher,
and Karl Pearson, with notes and comments. In E. S. Pearson & M. G. Kendall
(Eds.), Studies in the history of statistics and probability (Vol. I, pp. 405-417). Lon-
don: Charles Griffin and Co.
Pearson, K. (1930). Life, letters, and labours of Francis Galton. Vol. IIIa, Correlation, per-
sonal identification, and eugenics. Cambridge: Cambridge University Press.
Pedhazur, E. J. (1982). Multiple regression in behavioral research: Explanation and predic-
tion (2nd ed.). New York: Holt, Rinehart, and Winston.
Pelham, B. W. (1999). Conducting experiments in psychology: Measuring the weight of
smoke. Pacific Grove, CA: Brooks/Cole.
Pennebaker, J. W. (1989). Confession, inhibition, and disease. In L. Berkowitz (Ed.),
Advances in experimental social psychology (Vol. 22, pp. 211-244). New York: Ac-
ademic Press.
Pennebaker, J. W., & Harber, K. (1993). A social stage model of collective coping: The
Loma Prieta earthquake. Journal of Social Issues, 49, 125-146.
Pennebaker, J. W., Barger, S. D., & Tiebout, J. (1989). Disclosure of traumas and health
among Holocaust survivors. Psychosomatic Medicine, 51, 577-589.
Perman, S. (1999). The unforeseen effect of abortion. Time, August 23, p. 47.
Peters, W. S. (1987). Counting for something: Statistical principles and personalities. New
York: Springer-Verlag.
Peterson, C. (1996). Writing rough drafts. In F. T. L. Leong & J. T. Austin (Eds.), The psychology
research handbook: A guide for graduate students and research assistants (pp.
282-290). Thousand Oaks, CA: Sage Publications.
Peterson, C., & Bossio, L. M. (1989). Learned helplessness. In R. C. Curtis (Ed.), Self-
defeating behaviors (pp. 235-257). New York: Plenum.
Peterson, C., & Seligman, M. E. P. (1984). Causal explanations as a risk factor for depression:
Theory and evidence. Psychological Review, 91, 347-374.
Peterson, C., Seligman, M. E. P., & Vaillant, G. E. (1988). Pessimistic explanatory style is
a risk factor for physical illness: A thirty-five year longitudinal study. Journal of Per-
sonality and Social Psychology, 55, 23-27.
Peterson, D. R. (1968). The clinical study of social behavior. New York: Appleton.
Pittenger, D. J. (1995). Teaching students about graphs. Teaching of Psychology, 22,
125-128.
Platt, J. R. (1964). Strong inference. Science, 146, 347-353.
Plous, S. (1993). The psychology of judgment and decision making. New York:
McGraw-Hill.
Prentice, D. A., & Miller, D. T. (1992). When small effects are impressive. Psychological
Bulletin, 112, 160-164.
Radloff, L. (1977). The CES-D scale: A self-report depression scale for the general population.
Applied Psychological Measurement, 1, 385-401.
Rand Corporation. (1955). A million random digits. Glencoe, IL: Free Press.
Reed, G. M., Taylor, S. E., & Kemeny, M. E. (1993). Perceived control and psychological
adjustment in gay men with AIDS. Journal of Applied Social Psychology, 23, 791-824.
Rosenberg, A. (1980). Sociobiology and the preemption of social science. Baltimore: Johns
Hopkins University.
Rosenberg, A. (1983). Content and consciousness versus the intentional stance. The Be-
havioral and Brain Sciences, 3, 375-376.
Rosenberg, A. (1994). Instrumental biology, or, the disunity of science. Chicago: University
of Chicago Press.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton
University Press.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Newbury
Park, CA: Sage.
Rosenthal, R., & Fode, K. L. (1963). The effect of experimenter bias on the performance
of the albino rat. Behavioral Science, 8, 127-134.
Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis: Focused comparisons in the
analysis of variance. Cambridge: Cambridge University Press.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: Methods and data
analysis (2nd ed.). New York: McGraw-Hill.
Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of
experimental effect. Journal of Educational Psychology, 74, 166-169.
Rosnow, R. L., & Rosenthal, R. (1989). Definition and interpretation of interaction ef-
fects. Psychological Bulletin, 105, 143-146.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of
knowledge in psychological science. American Psychologist, 44, 1276-1284.
Rosnow, R. L., & Rosenthal, R. (1996). Beginning behavioral research (2nd ed.). Englewood
Cliffs, NJ: Prentice Hall.
Rosnow, R. L., & Rosnow, M. (1995). Writing papers in psychology (3rd ed.). Pacific
Grove, CA: Brooks/Cole.
Ross, L., & Nisbett, R. E. (1991). The person and the situation: Perspectives of social psy-
chology. New York: McGraw-Hill.
Runyon, R. P., Coleman, K. A., & Pittenger, D. J. (1999). Fundamentals of behavioral
statistics (9th ed.). New York: McGraw-Hill.
Runyon, R. P., Haber, A., Pittenger, D. J., & Coleman, K. A. (1996). Fundamentals of be-
havioral statistics (8th ed.). New York: McGraw-Hill.
Safire, W. (1999, May 9). McCawley: Last words from a giant of linguistics. The New York
Times Magazine, pp. 24, 26.
Salkind, N. J. (1997). Exploring research (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Scarr, S. (1997). Rules of evidence: A larger context for the statistical debate. Psychologi-
cal Science, 8, 16-17.
Schachter, S. (1959). The psychology of affiliation: Experimental studies of the sources of
gregariousness. Stanford, CA: Stanford University Press.
Scheier, M. F., & Carver, C. S. (1985). Optimism, coping, and health: Assessment and
implications of generalized outcome expectancies. Health Psychology, 4, 219-247.
Scheier, M. F., Carver, C. S., & Bridges, M. W. (1994). Distinguishing optimism from neuroticism
(and trait anxiety, self-mastery, and self-esteem): A reevaluation of the Life
Orientation Test. Journal of Personality and Social Psychology, 67, 1063-1078.
Scheier, M. F., Matthews, K. A., Owens, J. F., Magovern, G. J., Sr., Lefebvre, R. C., Abbott,
R. A., & Carver, C. S. (1989). Dispositional optimism and recovery from coronary
artery bypass surgery: The beneficial effects on physical and psychological well-
being. Journal of Personality and Social Psychology, 57, 1024-1040.
Scheier, M. F., Matthews, K. A., Owens, J. F., Schulz, R., Bridges, M. W., Magovern, G. J.,
Jr., & Carver, C. S. (in press). Optimism and rehospitalization following coronary
artery bypass graft surgery. Archives of Internal Medicine.
Schulz, R., & Decker, S. (1985). Long-term adjustment to physical disability: The role of
social support, perceived control, and self-blame. Journal of Personality and Social
Psychology, 48, 1162-1172.
Schwartz, T. (1999, January 10). The test under stress. The New York Times Magazine,
pp. 30-35, 51, 56, 63.
Scott, J. M., Koch, R. E., Scott, G. M., & Garrison, S. M. (1999). The psychology student
writer's manual. Upper Saddle River, NJ: Prentice Hall.
Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data
base on social psychology's view of human nature. Journal of Personality and Social
Psychology, 51, 515-539.
Shaughnessy, J. J., & Zechmeister, E. B. (1994). Research methods in psychology (3rd ed.).
New York: McGraw-Hill.
Shaughnessy, J. J., & Zechmeister, E. B. (1997). Research methods in psychology (4th ed.).
New York: McGraw-Hill.
Shrout, P. E. (1997). Should significance tests be banned? Introduction to a special sec-
tion exploring the pros and cons. Psychological Science, 8, 1-2.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
Siegel, S., & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences
(2nd ed.). New York: McGraw-Hill.
Sigall, H., & Ostrove, N. (1975). Beautiful but dangerous: Effects of offender attractive-
ness and nature of the crime on juridic judgments. Journal of Personality and Social
Psychology, 31, 410-414.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1979). Rating the risks. Environment, 21,
14-20, 36-39.
Slovic, P., Fischhoff, B., & Lichtenstein, S. (1982). Facts versus fears: Understanding per-
ceived risk. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncer-
tainty: Heuristics and biases (pp. 463-489). Cambridge: Cambridge University Press.
Smith, P. C., Budzeika, K. A., Edwards, N. A., Johnson, S. M., & Bearse, L. N. (1986).
Guidelines for clean data: Detection of common mistakes. Journal of Applied Psychology,
71, 457-460.
Smyth, T. R. (1996). Writing in psychology: A student guide (2nd ed.). New York: John Wiley
& Sons.
Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods (6th ed.). Ames, IA: Iowa State
University Press.
Snedecor, G. W., & Cochran, W. G. (1980). Statistical methods (7th ed.). Ames, IA: Iowa
State University Press.
Snyder, M. (1974). The self-monitoring of expressive behavior. Journal of Personality and
Social Psychology, 30, 526-537.
Snyder, M. (1987). Public appearances/private realities: The psychology of self-monitoring.
New York: Freeman.
Spence, J. T., & Helmreich, R. L. (1978). Masculinity and femininity: Their psychological
dimensions, correlates, and antecedents. Austin, TX: University of Texas Press.
Spencer, S. J., Steele, C. M., & Quinn, D. M. (1999). Stereotype threat and women's math
performance. Journal of Experimental Social Psychology, 35, 4-28.
Spradley, J. P. (1980). Participant observation. Fort Worth, TX: Harcourt Brace Jovanovich.
Sprent, P. (1989). Applied nonparametric statistical methods. London: Chapman and Hall.
Stanovich, K. E. (1998). How to think straight about psychology (5th ed.). New York:
Longman.
Steele, C. M., & Aronson, J. (1995). Contending with a stereotype: African-American in-
tellectual test performance and stereotype threat. Journal of Personality and Social
Psychology, 69, 797-811.
Stern, W. (1912). Psychologische Methoden der Intelligenzprüfung. Leipzig, Germany:
Barth.
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. New York:
Cambridge University Press.
Sternberg, R. J. (1993). The psychologist's companion: A guide to scientific writing for stu-
dents and researchers (3rd ed.). Cambridge: Cambridge University Press.
Sternberg, R. J. (1999). Cognitive psychology (2nd ed.). Fort Worth, TX: Harcourt Brace.
Sternberg, R. J., & Detterman, D. K. (Eds.) (1986). What is intelligence? Contemporary
viewpoints on its nature and definition. Norwood, NJ: Ablex.
Stewart, J. S., & Brunjes, P. C. (1990). Olfactory bulb and sensory epithelium in goldfish:
Morphological alterations accompanying growth. Developmental Brain Research,
54, 187-193.
Stich, S. P. (1983). From folk psychology to cognitive science: The case against belief. Cam-
bridge, MA: MIT Press.
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900.
Cambridge, MA: Belknap Press.
Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory, procedures,
and techniques. Newbury Park, CA: Sage.
Strunk, W., Jr., & White, E. B. (1972). The elements of style. New York: Macmillan.
Suls, J., & Fletcher, B. (1983). Social comparison in the social and physical sciences: An
archival study. Journal of Personality and Social Psychology, 44, 575-580.
Suzuki, S., Augerinos, G., & Black, A. H. (1980). Stimulus control of spatial behavior on
the eight-arm maze in rats. Learning and Motivation, 11, 1-18.
Tajfel, H. (1981). Human groups and social categories: Studies in social psychology. Lon-
don: Cambridge University Press.
Tajfel, H., & Billig, M. (1974). Familiarity and categorization in intergroup behavior.
Journal of Experimental Social Psychology, 10, 159-170.
Tankard, J. W. (1984). The statistical pioneers. Cambridge, MA: Schenkman Publishing Co.
Tanner, M. A. (1990). Investigations for a course in statistics. New York: Macmillan.
Taylor, S. E., & Brown, J. D. (1988). Illusion and well-being: A social psychological per-
spective on mental health. Psychological Bulletin, 103, 193-210.
Taylor, S. E., Kemeny, M. E., Aspinwall, L. G., Schneider, S. G., Rodriguez, R., & Herbert,
M. (1992). Optimism, coping, psychological distress, and high-risk sexual behavior
among men at risk for Acquired Immunodeficiency Syndrome (AIDS). Journal of
Personality and Social Psychology, 63, 460-473.
Taylor, S. E., Lichtman, R. R., & Wood, J. V. (1984). Attributions, beliefs about control, and
adjustment to breast cancer. Journal of Personality and Social Psychology, 46, 489-502.
Taylor, S. J., & Bogdan, R. (1998). Introduction to qualitative research methods: A guide-
book and resource (3rd ed.). New York: Wiley.
Thompson, S. C., Armstrong, W., & Thomas, C. (1998). Illusions of control, underestimations,
and accuracy: A control heuristic explanation. Psychological Bulletin, 123,
143-161.
Tobias, S. (1993). Overcoming math anxiety: Revised and expanded. New York: Norton.
Traue, H. C., & Pennebaker, J. W. (Eds.). (1993). Emotion inhibition and health. Seattle,
WA: Hogrefe & Huber.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT:
Graphics Press.
Tukey, J. W. (1969). Analyzing data: Detective work or sanctification? American Psychol-
ogist, 24, 83-91.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Tversky, A., & Gilovich, T. (1989). The "hot hand": Statistical reality or cognitive illu-
sion? Chance, 2, 31-34.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological
Bulletin, 76, 105-110.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and
probability. Cognitive Psychology, 4, 207-232.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases.
Science, 185, 1124-1131.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction
fallacy in probability judgment. Psychological Review, 90, 293-315.
Vadum, A. C., & Rankin, N. O. (1998). Psychological research: Methods for discovery and
validation. New York: McGraw-Hill.
VanderStoep, S. W., & Shaughnessy, J. J. (1997). Taking a course in research methods im-
proves reasoning about real-life events. Teaching of Psychology, 24, 122-124.
Wilson, T. D., Dunn, D. S., Kraft, D. T., & Lisle, D. (1989). Introspection, attitude change,
and attitude-behavior consistency: The disruptive effects of explaining why we feel
the way we do. In L. Berkowitz (Ed.), Advances in experimental social psychology
(Vol. 22, pp. 287-343). New York: Academic Press.
Wilson, T. D., Laser, P. S., & Stone, J. I. (1982). Judging the predictors of one's own mood:
Accuracy and the use of shared theories. Journal of Experimental Social Psychology,
18, 537-556.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental
design (3rd ed.). New York: McGraw-Hill.
Wolraich, M. L., Lindgren, S. D., Stumbo, P. J., Stegink, L. D., Appelbaum, M. 1., & Kiritsy,
M. C. (1994). Effects of diets high in sucrose or aspartame on the behavior and cog-
nitive performance of children. The New England Journal of Medicine, 330, 301-307.
Wright, D. B. (1997). Understanding statistics: An introduction for the social sciences. Lon-
don: Sage Publications.
(1974). Science, 185, 1124-1131. American Copyright © 1979 by the American Table 9.1 Source: Adapted from Statistics: An
Association for the Advancement of Science. Psychological Association. Reprinted with Introduction, 4th edition by Roger E. Kirk,
permission. Copyright (1999) Holt, Rinehart and Winston.
Data Box 7.C -Quote from Dawes, R.M. (1979). Reprinted with permission.
Chapter 5
The robust beauty of improper linear models. Data Box 9.C: Representing Standard Error
Data Box 5.A-Mean Number of American Psychologist, 34, 571-582; problem
Acknowledgements and Authors by from Kahneman, D., & Tversky, A. (1973). On
Scientific Discipline Table of data from Suls, the psychology of prediction. Psychological bulb and sensory epithelium in goldfish:
J., & Fletcher, B. (1983). Social comparison in Review, 80, 237-251. American Psychological Morphological alterations accompanying
the social and physical sciences: An archival Association. Reprinted with permission. growth. Developmental Brain Research, 54,
study. Journal of Personality and Social
Figure 7.7 Estimates of Lethal Deaths From 187-193. Copyright © 1993 Elsevier Science.
Psychology, 44, 575-580. Copyright © 1983 by
Judgement Under Uncertainty: Heuristics and Reprinted with permission.
the American Psychological Association.
Biases by Slovic, Fischhoff, and Lichtenstein (1982). Table 9.4 From Hult, C.A. Researching and
Reprinted with permission.
Copyright © 1982 by Cambridge University Press. writing in the social sciences. 1996 by Allyn &
Data Box 5.B Information, Facts, and Reprinted with permission. Bacon. Adapted by permission.
Figures from the College Boards. Reproduced Table 7.5 Reprint of Table 1 from Slovic, P.,
with permission. Copyright © 1995 by College Fischhoff, B., & Lichtenstein, S. (1979). Rating
Entrance Examination Board. All rights the risks. Environment, 21 (3), 14-20, 36-39.
reserved. of Psychology and Education, 4/e by Gravetter
and Wallnau. Copyright © 1996 by Thomson
Data Box 5.C Figure Normal Distribution of
Deviation IQ's from Cognitive Psychology by Chapter 8 Learning.
Robert J. Sternberg copyright © 1996 by Holt, Figure 8.1: Random Pattern of V-1 Bombing Table 9.5 Statistics for the Behavioral and
Rinehart and Winston, reproduced by of Central London Reprinted with the per- Social Sciences: A Brief Course by Aron & Aron,
permission of the publisher. mission of The Free Press, a Division of Simon © 1997. Reprinted by permission of Prentice
& Schuster, Inc., from HOW WE KNOW Hall, Inc., Upper Saddle River, NJ.
WHAT ISN'T SO: The Fallibility of Human
Chapter 6 Reason in Everyday Life by Thomas Gilovich.
Table 6.6: Beyond the Pearson and Spearman Copyright © 1991 by Thomas Gilovich. Chapter 10
Correlation Coefficients Elifson, K.W., Reprinted with permission. Table 10.1-Excerpt of Table B.4 (selected One
Runyon, R., & Haber, A. (1990). Fundamentals Data Box 8.C Quotations from Hacking, I. and Two Tailed Critical Values of t) Source:
of social statistics (2nd edition). Copyright The emergence of probability. Copyright © 1975 Table III of R.A. Fisher and F. Yates, Statistical
(1990) The McGraw-Hill Companies. by Cambridge University Press. Reprinted with tables for Biological, Agricultural, and Medical
Reprinted with permission. the permission of Cambridge University Press. Research, 6th edition. London: Longman Group,
Figure 6.7, Figure 6.8, Figure 6.9 Class Ltd., 1974. (Previously published by Oliver and
Text Permission, Data Box 8.D-"Linda is 31
Handout from Self-Insight, Department of Boyd, Ltd., Edinburgh.) Reprinted with
years old, single,.." Problem from Tversky, A.,
Psychology, University of Virginia. permission.
& Kahneman, D. (1983) Extensional versus
intuitive reasoning: The conjunction fallacy in Table 10.6 Adapted table from Cohen, J. Sta-
probability judgment. Psychological Review, 90, tistical power analysis for the behavioral sciences
Chapter 7 293-315. Copyright © 1983 by the American (2nd edition). 1996 Hillsdale, NJ, Lawrence
Table 7.3-Adaptation of demonstration 2-5 Psychological Association. Reprinted with Erlbaum Associates.
from Bolt, M. Instructor's manual to permission. Table 10.7 Adapted from McKenna, R.J. The
accompany Myers' Social psychology (4th ed.) Project Exercise Adapted material from undergraduate researcher's handbook: Creative
Copyright © 1993 by The McGraw-Hill Tanner, M.A. (1990). Investigations for a course experimentation in social psychology. Copyright
Companies. Reprinted by permission of The in statistics. Copyright © Prentice Hall. (1995) by Allyn & Bacon. Reprinted/adapted by
McGraw-Hill Companies. permission.
Table 7.4-Causes of Death and Their Actual Figure 10.3 Statistics for the Behavioral and
Relative Frequencies Table of data from Chapter 9 Social Sciences: A Brief Course by Aron & Aron,
Fischhoff, B., Slovic, P., & Lichtenstein, S. Data Box 9.A-"The mean IQ of the © 1997. Reprinted by permission of Prentice
(1977). Knowing with certainty: The population of eighth-graders Material and Hall, Inc., Upper Saddle River, NJ.
appropriateness of extreme confidence. Journal problem from Tversky, A., & Kahneman, D.
of Experimental Psychology: Human Perception (1971). Belief in the law of small numbers.
and Performance, 3, 552-564. Copyright © 1977 Psychological Bulletin, 76, 105-110. Copyright Chapter 11
by the American Psychological Association. © 1971 the American Psychological Data Box 11.D-"The single at bat is a
Reprinted with permission. Association. Reprinted with permission. perfectly meaningful context..." Abelson, R.P.
Data Box 7.A -"I think that the question of Data Box 9.D-What Constitutes a Good (1985). A variance explanation paradox: When
whether people are treated in a fair manner..." Hypothesis? Source: Dana S. Dunn, The a little is a lot. Psychological Bulletin, 97,
Quote from pp. 580-581 in Dawes, R.M. Practical Researcher 1/e. Copyright © (1999). 129-133. Copyright © 1985 by the American
(1979). The robust beauty of improper linear The McGraw-Hill Companies. Reprinted with Psychological Association. Reprinted with
models. American Psychologist, 34, 571-582; permission. permission.
Credit CR-3
Page 411-"Mr. Crane and Mr. Tees were recommendations for public comment. APA © 1999 by the American Psychological
scheduled to leave the airport" Kahneman, Monitor, 9.; Wilkinson, L., & The Task Force on Association. Reprinted with permission.
D., & Tversky, A. (1982). Judgement Under Statistical Inference (1999). Statistical methods in
Uncertainty: Heuristics and Biases. Reprinted psychology journals: Guidelines and Cartoon, "That's the gist of what I want to
with the permission of Cambridge University explanations. American Psychologist, 54, 594-604. say." Joseph Mirachi, The New Yorker.
Press. Copyright © 1999 by the American Psychological Copyright © 1977 by The Cartoon Bank.
Association. Reprinted with permission. Reprinted with permission.
Table 11.1 Adapted table from Dunn, D.S. (in
press). Letter exchanges on statistics and research
Poetry W.H. Auden From W.H. Auden: Appendix A
methods: Writing, responding, and learning.
Collected Poems by W.H. Auden, edited by
Teaching of Psychology. Lawrence Erlbaum Page A-9 Material adapted from Overcoming
Edward Mendelson. Copyright © 1950 by
Associates. Math Anxiety, Revised and Expanded Edition by
W.H. Auden. Reprinted by permission of
Table 11.2 Material quoted and adapted from Sheila Tobias. Copyright © 1993, 1978 by
Random House, Inc.
Exhibit 1.6 in Leik, R.K. (1997) Experimental Sheila Tobias Reprinted by permission ofW.W.
design and the analysis of variance. Reprinted by Data Box: 14.A Is Psychology a Science? A Norton & Company, Inc.
permission of R.K. Leik. HISTORY OF PSYCHOLOGY 4/e by Leahey,
T., © 1997. Reprinted by permission of Appendix B
Table 11.8 Adapted table from Rosenthal and Ros-
now (1991, pp. 467-468). Copyright © 1991 by Prentice-Hall, Inc., Upper Saddle River, NJ.
The McGraw-Hill Companies. Reprinted with Corporation (1955). Reprinted with
Table 14.2 Portions of a table (p. 291) from permission of the Rand Corporation.
permission.
Dunn, D.S. (1996). Well-being following
Table 11.9 Adapted from Rosenthal and Ros- Table B.2 Adapted from Runyon, R.P.,
amputation: Salutary effects of positive
now (p. 470). Copyright © 1991 by The Mc- Coleman, K. A., & Pittenger, D. Fundamentals
meaning, optimism, and control. Rehabilitation
Graw-Hill Companies. Reprinted with permis- of behavioral statistics (9th ed.), Copyright ©
Psychology, 41, 285-302. Copyright © 1996
sion. 2000 by The McGraw-Hill Companies.
Springer Publisher Company.
Reprinted with permission.
Figure 11.5 Adaptation of Table of Critical
values of the F-ratio, from Snedecor, G. W., & Table 14.3 Material adapted from Flick, U. Table B.4 Table III of R.A. Fisher and F.
Cochran, W. G. (1980). Statistical Methods (7th An introduction to qualitative research.
ed.). Copyright © 1980 Iowa State University Press. Copyright © 1998 by Sage Publications Ltd.
Reprinted with permission. edition. London: Longman Group, Ltd., 1974.
Figure 11.6 Adapted from Rosenthal and
(Previously published by Oliver and Boyd,
Rosnow (p. 468). Copyright © 1991 by
Page 552-Examples from Statistics for the Ltd., Edinburgh.) Reprinted with permission.
The McGraw-Hill Companies. Reprinted with
Behavioral Sciences. A first course for students Table B.5 Reprint and adaptation from Harris,
permission.
of Psychology and Education, 4th edition, by Mary B., Basic Statistics for Behavioral Science
Data Box 11.A Material quoted with the F.J. Gravetter and L.B. Wallnau. © 1996.
permission from p. 1462 in Cochran, G. W. Reprinted with permission of Wadsworth, a Bacon. Reprinted by permission.
(1967). Footnote by William G. Cochran. division of Thomson Learning.
Science, 156, 1460-1462. Copyright © 1967 by Table B.6 Reprint and adaptation of Table of
American Association for the Advancement of Percentage Points of the Studentized Range
Science. Chapter 15 from Pearson, E.S., & Hartley, H. O. (1958).
Biometrika tables for statisticians, vol. 1 (2nd
Decision Tree for Opening of Chapter 15
Chapter 12 ed.) Copyright © 1958 Cambridge University
Adapted from Dunn, D.S. (1999). The Practical
Press.
Table 12.1-Design of Two-Factor Study of Noise Researcher: A Student Guide to Conducting
Credit Adapted from Glass, D.C., Singer, J.E., & Psychological Research. Reprinted by the Table B.7 Reprinted and adapted Table of
Friedman, L.N. (1969). Psychic cost of permission by The McGraw-Hill Companies. chi-square from Fisher, R.A. (1970). Statistical
adaptation to an environmental stressor. Journal methods for research workers (14th ed.).
of Personality and Social Psychology, 12, 200-210. Data Box 15.A-"Two sports fans are arguing Copyright © (1970) by Prentice Hall.
Copyright © 1969 by the American Psychological over which sport .. :' Nisbett, R.E., Krantz, Table B.8 Adapted from Kirk, R. E.,
Association. Reprinted with permission. D. H., Jepson, c., & Kunda, Z. (1983). The use Elementary Statistics, 2nd ed. © 1984. Reprinted
of statistical heuristics in everyday reasoning. by the permission of Roger E. Kirk.
Figure 12.1 Material quoted and adapted
Psychological Review, 90, 339-363. Copyright ©
from Glass, D.C., Singer, J.E., & Friedman, L. N. Table B.9 Adapted from Kirk, R. E.,
1983 by the American Psychological
(1969). Psychic cost of adaptation to an envi- Elementary Statistics, 2nd ed. © 1984. Reprinted
Association. Reprinted with permission.
ronmental stressor. Journal of Personality and by the permission of Roger E. Kirk.
Social Psychology, 12, 200-210. Copyright © Page 577 Material quoted and adapted from
1969 by the American Psychological Associa-
Azar, B. (1999, May). APA statistics task force Significance Levels of Sums of Squares of Rank
tion. Reprinted with permission.
prepares to release recommendations for Differences and a Correction," Ann. Math.
Chapter 14 public comment. APA Monitor, 9: Wilkinson, L. Statist, 20, 117-118. Reprinted with permission.
& the Task Force on Statistical Inference E.G. Olds (1938), "Distribution of Sums of
Page 562 "The APA Task Force on Statistical (1999). Statistical methods in psychology Squares of Rank Difference for Small Number
Inference also recommends..." Azar, B. (1999, journals: Guidelines and explanations. of Individuals," Ann. Math. Statist, 9, 133-148.
May) APA statistics task force prepares to release American Psychologist, 54, 594-604. Copyright Reprinted with permission.
Appendix C Table C-l Dana S. Dunn, The Practical Figure F-2-Brief quotation (one sentence)
Page C-3-Overview of APA Style From Researcher 1/e. Copyright © (1999). The from The Stories We Live By Dan P. McAdams.
the Publication Manual of the American McGraw-Hill Companies. Reprinted with Copyright © 1993 by Dan P. McAdams.
Psychological Association, 4/e. Copyright © permission. Reprinted by permission of HarperCollins
1994 by the American Psychological Publishers, Inc.
Association. Adapted by permission.
Appendix F Figure F.2 & F.6-Quotations from Motterlini,
Page F-l-Line from the poem, Under which M., For and Against Method. Copyright 1999
Figure C.l: The Hourglass Model of Writing. lyre, by W. H. Auden: Collected Poems by by University of Chicago Press. Reprinted by
Source: Dana S. Dunn, The Practical Researcher W. H. Auden, edited by Edward Mendelson. permission.
1/e. Copyright © (1999). The McGraw-Hill Copyright © 1950 by W. H. Auden. Reprinted
Companies. Reprinted with permission. Page F-12-Selection adapted from Spradley,
by permission of Random House, Inc.
J.P. (1980). Participant observation. Fort Worth,
Appendix D Page F12-3 stages to participant observation. TX: Harcourt Brace Jovanovich. Reprinted by
Material adapted from Spradley, J.P. (1980). the permission of Harcourt, Brace and
Page D-4 Material quoted from Dollinger, Participant observation. Fort Worth, TX: Company.
S. J., & DiLalla, D.L. (1996). Cleaning up Harcourt Brace & Company. Reprinted by
data and running preliminary analyses. In Page F5-FI7-Material adapted from Taylor
permission.
F. T. L. Leong & J. T. Austin (Eds.) The S. J. & Bogdan, R. (1998). Introduction to
psychology research handbook: A guide for Figure F.l-Cartoon "I'm a social scientist, qualitative research methods: A guide book and
graduate students and research assistants Michael..." Bud Handelsman, The New resource (3rd ed.) Copyright © 1998 John
(pp. 167-176). Copyright © 1996 by Sage Yorker. Copyright © 1988 by the Cartoon Wiley and Sons, Inc. Reprinted by permission
Publications. Reprinted by permission of the Bank. Reprinted with permission. of John Wiley and Sons, Inc.
publisher.
NAME INDEX
A Baum, A., 75 Carlsmith, J. M., 48
Abelson, R. P., 209, 288, 337, 342, 343, Beatty, J., 78 Carr, D. E., 138
355, 356, 388, 444, 445, 568, 580 Becker, F-13 Carswell, C. M., 124
Abramson, F-11 Bem, D. J., 149, 228, 357 Carver, C. S., 87, 88, 230
Affleck, T., 88 Bem, S. L., 26 Cascio, W. F., 262
Agresti, A., 233 Bennett, D. J., 77 Castellan, N. J., Jr., 526, 527, 528,
Algina, J., 370, 420 Berg, J. H., 231 530, 558
Allen, A., 228 Berger, A., 574 Ceci, S. T., 195
Allison, P. D., 264 Bernhard, D., 124 Chaiken, S., 555
Alloy, F-11 Berscheid, E., 390 Chapman, J., 205
Amabile, T. M., 205 Billig, M., 390 Chapman, J. P., 253
American Psychological Association, 23, Black, A. H., 140 Chapman, L. J., 205, 253
125,126,226 Blumer, F-8 Chase, W. G., 372
Anselm, F-7 Bogdan, F-5, F-17 Chastain, R. L., 230
Appelbaum, M.I., 10 Boice, R., 243 Churchland, F-6
Archer, R. L., 231 Bok, D. c., 178 Clarke, R. D., 276
Armstrong, W., 304 Bolt, M., 265 Cleary, T. A., 262
Aron, A., 351, 352, 355, 359, 367, 371, Borgida, E., 288 Cochran, G. W., 422
380,508 Bornstein, M. H., 10 Cochran, W.G., 78,438,449
Aron, E. N., 367,371, 380, 508 Bossio, F-ll Cohen, C., 352
Aron,E.~,351,352,355,359 Bowen, W. G., 178 Cohen, T., F-19, 225,264,352,355,398,
Aronson, E., 48, 59 Box, G., 3 401,402,564
Aspinwall, L. G., 88 Boyer, P., 15 Cohen, T. W., 442, 443, 486, 512
Auden, W. H., F-1, F-4 Bridges, M. w., 88, 230 Cohen, P., 264
Augerinos, G., 140 Brookhart, F-5 Coleman, K. A., 118, 121,276,368,391,
Axinn, W. G., 573 Brown, D. R., 514 428, 502
Azar, B., 565 Brown, J. D., 143 College Board, 180
Bruner, F-15 Combs, B., 267
IB Brunjes, P. C., 330 Cook, T. D., 61, 74, 76,168,259,
Babbage, c., 575, 581 Bybee, J. A., 555 260,517
Bailar, J. c., 566 Coolidge,C., 138
Bakan, D., 351 C Coombs, W. T., 370, 420
Baker, G., 451, 452 Campbell, F-lO Coopersmith, S., 122
Barber, J. S., 573, 574 Campbell, D. T., 61, 74, 76, 104, 168, Corbin, F-7, F-17
Barfield, W., 124 231,259,260,394,516,517 Corrigan, B., 250
Barger, S. D., 75 Campbell, S. K., 148 Cowles, M., 8, 459
NI-1
Cozby, P. C., 354, 515, 570 Friedman, L. N., 461 Jepson, C., 6, 18, 571
Crosby, F-4 Funder, D. C., 228 Johnson, D., 277
Crosby, A. W., 284 Johnson, D. E., 451, 452
G Judd, C. M., 76, 517
D Galton, F., 207 Jung, C. G., 210
Darwin, C., 207 Gardner, H., 195
Daston, L., 78 Gardner, M., 77 K
Dawes, R. M., 49, 167, 205, 250, 253, 420 Gatchel, R. J., 75 Kadane, J. B., 62
Decker, F-11 Gibbons, J. D., 527 Kahneman, D., 55, 169, 241, 260, 265,
Deckers, L., 138 Gigerenzer, G., 78 274, 275, 276, 288, 323, 411, 412
Denzin, F-7 Gilbert, E. W., 85, 86 Kahneman, D. T., 367
DePaulo, B. M., 232 Gilovich, T., 62, 276, 278 Katcher, A. H., 540
Detterman, D. K., 67, 194 Glass, D. C., 461, 462, 464, 465, 466, Kaufman, M. T., 38
Deutcher, F-17 468, 469 Keil, R., 278
Dillon, K. M., 40, 41 Goldberg, L. R., 253 Kemeny, M. E., 231
Donaldson, G. A., 15 Gonzales, M. H., 48 Kendrick, S. A., 262
Dunn, F-3, F-9, F-11 Goode, E., 574 Kennedy, C. B., 10
Dunn, D. S., 5, 23, 30, 46, 57, 59, 64, 68, Gosset, W. S., 371, 402 Kenny, D. A., 76, 517
73, 80, 143, 211, 225, 227, 230, Gould, F-4 Kenrick, D. T., 228
276, 278, 304, 332, 338, 351, 354, Gould, S. J., 194, 195, 575 Kimball, O. M., 122
357, 359, 378, 384, 388, 396, 403, Gravetter, F. J., 217, 328, 353, 378, 490, Kirk, R. E., 147, 148, 149, 335, 439, 440,
404, 447, 451, 452, 500, 508, 514, 503, 551, 552 514, 527, 542
516, 550, 555, 569, 570, 572, 575 Grinnell, F., 576 Klaaren, K. J., 232
Gross, P. H., 288 Klock, S., 88
Guba, F-9 Klockars, A. J., 440
E
Konzem, P., 451, 452
Eagly, A. H., 555 H Kraft, D., 59
Edgeworth, F. Y., 367 Haber, A., 118, 121, 187, 233, 276, 368, Kraft, D. T., 555
Efran, M., 390 391, 428, 502 Kramer, C. Y., 440
Egolf, B., 161 Hacking, I., 284 Krantz, D. H., 6, 18, 571
Elifson, K. W., 187, 188, 233 Hamill, R., 278 Kromrey, J. D., 578
Ellsworth, P. C., 48 Hancock, G. R., 440 Kruger, L., 78
Ericsson, K. A., 59, 372 Harber, K., 74 Kuhlman, T. L., 138
Evans, J. D., 222, 526, 542, 546 Harris, M. B., 261, 338, 341, 353, 369, Kuhn, F-3, F-6
370, 429, 440, 471 Kuhn, T. S., 11, 179
F Hayes, J. R., 151, 152 Kulick, W., 138
Faloon, S., 372 Hays, W. L., 542 Kunda, Z., 18, 206, 571
Faust, D., 253 Helmreich, R. L., 26
Feller, W., 276 Hepburn, C., 390 L
Festinger, F-12 Hermanns, F-14 Langer, E. J., 54, 143
Festinger, L., 177 Holland, J. H., 10 Larkey, P. D., 62
Finlay, B., 233 Hollander, M., 527 Larson, D. G., 230
Fischhoff, B., 265, 266, 267, 268, 276 Holt, F-6 Laser, P. S., 213
Fisher, R. A., 412, 413, 422, 459 Holyoak, K. J., 10 Lasker, J., 161
Fiske, F-10 Hughes, F-13 Layman, M., 267
Fiske, D. W., 231 Hult, C. A., 358, 359 Leahey, F-5, F-6, F-7
Fletcher, B., 179 Hume, D., 206 Leahey, T. H., 9, 206, 207
Flick, F-2, F-3, F-7, F-8, F-9, F-14, F-15, Humphreys, L. G., 262 Leddo, J., 288
F-16 Hyman, D. B., 555 Lehman, D. R., 570
Fong, G. T., 6, 18 Leik, R. K., 425
Fontana, F-17 Lempert, R. O., 570
Francis, F-11 Isen, A. M., 416 Li, C., 207
Frankenberger, S., 124 Lichtenstein, S., 265, 267, 268
Freud, S., 138 J Lichtman, F-11
Frey, F-17 Janesick, F-10, F-16 Lincoln, F-7, F-9
Friedman, E., 540 Jennings, D. L., 205, 206 Lisle, D., 555
SUBJECT INDEX
Carryover effect, 393-394 Consistency, correlation as, 228-229 prepping data for, D-3-4
Categorical variables, chi-square test of Constants, 7 statistics versus, 4-5
independence of, 534-538 Constructs, hypothetical, 65-66 Data display. See Graphs; Tables
Causality Construct validity, 67, 69 Datum, 2, 5
ANOVA for examination of, 425-426 Content analysis, F-10-12 Deductive reasoning, 11
correlation and, 71, 207-209, Continuous variables, 2, 20-21 Definitions
221-222,232 Contrast analysis, 447-451 descriptive, 63, 64
importance of determining, 61-63 Contrasts, 448 operational, 63-64
making conclusions about, 565 as carryover effect, 394 Degrees of freedom, 341-343
Cell size, for one-variable repeated- Control, illusion of, 54-55 Dependent events, multiplication rule
measures ANOVA, 508 probability and, 304 for, 272, 293
Census sampling, 52 Convenience samples, 49, 52 Dependent groups, t test for. See t test
Central limit theorem, 322-323 Convergent validity, 68, 69 for correlated (dependent)
Central tendency, 134-150 Cooking data, 575 groups
average and, 135-136 Correlated research designs, hypothesis Dependent variables, 44, 57-60,61
choosing measure to use, 132, testing with. See t test for degree of association between
148, 149 correlated (dependent) groups independent variable and,
mean and. See Means Correlation, 205-238 389-392,443,445,511-512
median and, 144-145, 148, 149 causation and, 71,207-209, precision of, power and, 353
mode and, 145-147, 148, 149 221-222,232 timing of measure of, variability
shapes of frequency distributions and, linear relationships and, 212-216 and, 168
147-148 negative, 70-71, 211, 212-213 types of, 58-60
utility of, 147-150 positive, 70, 211, 212, 213-214 variability and, 167-168
writing about, 149-150 as reliability, 228-229 Descriptive definitions, 63, 64
Chartjunk, 124 significant, 347-349 Descriptive statistics, 16-17. See also
Chi-square test, 528-541 signs of, 212 Central tendency; Means;
for goodness-of-fit, 529-534 spurious, 204, 224 Standard deviation; Variability
for independence, 534-540 strength of, 212 Deviation IQ scores, 195
statistical assumptions of, 529 third variable problems and, Directional hypotheses, 333
supporting statistics for, 522 208-209 Discontinuous variables, 2, 20
two-tailed critical ratios of, B-12 writing about, 226-227 Discounting, F-17
writing about, 533-534, 539-540 zero, 71,211-212,214,215 Discriminant validity, 68, 69
Clinical decision making, intuition Correlational designs, 70-72 Discussion section of paper, C-6
versus prediction in, 253 Correlation coefficients Distribution of sample means, 319-322
Cluster sampling, 51, 52 Pearson. See Pearson (product- Distributions. See Frequency
Coefficient of determination, 222, moment) correlation coefficient distributions
257-258 (Pearson r) Division, review of, A-3-4
Coefficient of nondetermination, Spearman rank order, 551-555 Division symbol, 25
222-224,257-258 Correlation matrices, 230, 231 Dracula effect, 576
College choice, statistical analysis for, Counterbalancing, 394-395
6-8, 18 Covariance, calculating, 217-218 E
Comparison groups, 75 Cramer's V statistic, 539 Economics, linear regression in, 262
Comparisons, 439 Criterion variable, 242 Education
pairwise, 421 Critical region, 339 linear regression in, 262
post hoc, 439-445 Critical values, rejecting null hypothesis statistics and data analysis and,
Computer data analysis, 572-573 and,338-340 573-574
Conceptual replication, 47 Cumulative frequency distributions, Effect size
Conditional probabilities, 290-291, 116-117 for correlated research groups, 399-400
292-293 Cumulative percentages, 117-118 for F ratio, 442-443
Confidence intervals, 327 one-variable repeated-measures
defining using standard error of the D
mean, 327-329 Data, definition of, 2, 5 power and, 354-355
for one-sample t tests, 375-376 Data analysis, 4 ttest and, 388-389, 390-391
Confounded variables, 73-74 computerized, 572-573 two-variable (two-factor) ANOVA
Conjunction fallacies, 288-289 fraudulent, false, or misleading, 576 and,486-487
Multiple correlation coefficients, 264 Null results, need for writing up and Spearman rank order correlation
Multiple (multivariate) regression, reporting, 550 coefficient for, 551-555
263-264 Numbers, relative size of, 38 Wilcoxon matched-pairs signed-ranks
Multiplication, review of, A-3 test for, 547-551
Multiplication rules for probability O Ordinal scales, 24, 26-27, 29
for dependent events, 272, 293 Observations, number of, 33, 138 measures of association under, 233
for independent and conditional probabilities, 272, 287, 289-293 Odd-even reliability, 229 nonparametric tests and, 526
probabilities, 272, 287, 289-293 Odd-even reliability, 229 tied ranks and, 544
Multiplication symbol, 25 Omnibus statistical tests, ANOVA as, Ordinate, 97
Mutually exclusive events, 286-287 423-424 Organismic variables, 378-379
One-factor analysis of variance, 426-439 Outlying scores, correlation and, 225
example of, 431-439 Oversight bias, 232
N factors, 427
N, 32,33 general linear model for, 431 p
Narrative interviews, F-14 identifying statistical hypotheses for, Paired t test. See t test for correlated
Narratives, F-14-15 repeated-measures. See One-variable Papers, writing. See Writing
Negative correlation, 70-71, 211, notation for, 429 Pairwise comparison, 421
212-213 repeated-measures. See One-variable Papers, writing. See Writing
Negative skew, 113 repeated-measures analysis of Parametric statistics, 525
Nominal scales, 24, 25-26, 29 variance Parametric tests, differences from
measures of association under, 233 steps in, 429-430 nonparametric tests, 525-527
nonparametric tests and, 526 One-sample t test, 314, 366, 372-377 Participant observation, F-12
Nondirectional hypotheses, 333 confidence intervals for, 375-376 Pearson (product-moment) correlation
Nonparametric statistics, 525 power of, 377 coefficient (Pearson r), 207,
Nonparametric tests, 523-559 writing about results of, 374-375 209-221
chi-square test. See Chi-square test One-tailed significance tests, 340-341 calculating, 216-221
choosing among, 522 power and, 354 conceptual definition of, 209-210
differences from parametric tests, One-variable (one-way) analysis of critical values for, B-6
525-527 variance, 412-426 factors influencing, 224-226
guide to, 528 comparison with other statistical tests, magnitude of, 222
Mann-Whitney U test, 541-547 423-426 mean deviation approach to, 219-220
Spearman rank order correlation decision to use, 410 pitfalls in calculating, 220-221
coefficient, 551-555 error variance and, 415-416 relation to z scores, 216-217
steps for testing hypotheses using, F distribution and, 417-418 sum of the squares and, 217-221
530 risk of type I error with multiple t Peer review, of manuscripts, 179
uses of, 527-528 tests and, 420-423 Percentages (percents), 91-92
Wilcoxon matched-pairs signed-ranks total variance and, 414-415 cumulative, 117-118
test, 547-551 treatment variance and, 415-417 probability related to, 281-282
Nonrandom sampling, 51-52 t test compared with, 418-420 Percentile ranks, 115
Normal distribution, 111-112, 186-199 yielding omnibus F ratio, procedure calculating, 118-119
approximating with binomial following, 410 finding scores from, 119-120
distribution, 301-305 One-variable repeated-measures analysis finding with z scores, 191-192
area under curve and, B-3-5, of variance, 499-514, 516-519 of z scores, 176
187-189 effect size and, 511-512 Percentiles, 115-116
writing about, 197-198 hypotheses for, 503 middle, calculating, 120-122
z scores and, 190-191. See also procedure following, 498 writing about, 122-123
z scores statistical assumptions of, 502-503 Personnel decisions, linear regression
Not equal to symbol, 25 steps in, 503-510 in, 262
Null hypothesis, 333-336 Tukey's HSD test and, 510-511 Phi coefficient, 539
acceptance of, 335 writing about results of, 512-513 Physiological measures, 59, 60
controversy over significance testing Operational definitions, 63-64 Platycurtic distributions, 113
of, 564-567 writing, 64 Point estimation, 317
establishing criteria for rejecting, Order of operations, 34-38 Population parameters, 13, 132
338-340 Ordinal data, 541 estimation of, 132,317-318
MAGIC criteria and, 342, 343 Mann-Whitney Utest for, 541-547 Populations, 12,316
Population standard deviation, 162 content analysis and, F-10-12 linear. See Linear regression; Simple
Population variance, 155, 161-162 in groups, F-15-17 linear regression
Positive correlation, 70, 211, 212, history of, F-5, F-7 multiple (multivariate), 263-264
213-214 measurement in, F-4-9 Regression analysis, 242
Positive skew, 113 narratives and life history and, Regression lines, 244-245
Post hoc tests, 424, 439-445 F-14-15 Regression sum of squares, 256
Power, 351-355 participant observation and Rejection, region of, 339
effect size and, 354-355 ethnography and, F-12-14 Relative frequency distributions, 92
factors affecting, 314, 351-354 readings on, F-18 Reliability, 66-67
one-sample t test and, 377 alternate form, 229
Power analysis, 225-226, 362, 400-402 skills needed for, F-16 correlation as, 228-229
Practical significance, 337 writing about, F-17-19 internal consistency measures
Predictor variable, 242 Quartiles, calculating, 121-122 of,229
Priority, mathematical rules of, 34-38 Quasi-experiments, 74-76 item-total, 229
Probability, 273-311 Quota sampling, 51-52 odd-even, 229
addition rule for, 272, 285-287 split-half, 229
binomial distribution to determine, R standard error of mean as measure
299-305,307-310 Random assignment, 49, 52 of,328
calculating, 272 performing, 80-81 test-retest, 229
classical theory of, 277-281, 282 random sampling versus, 44, 48-50 Reliability coefficients, 66-67, 229
conditional, 290-291, 292-293 Random errors, power and, 354 Repeated-measures analysis of variance,
gambler's fallacy and, 275-276, 277 Randomization one-variable. See One-variable
history of, 284 documentation of procedures for, 566 repeated-measures analysis of
joint, 285-286 variability and, 167 variance
marginal, 292 Randomness, 62 Repeated measures research designs, 393.
multiplication rule for dependent illusion of control and, 54-55 See also t test for correlated
events and, 272, 293 probability and, 275-276, 277 (dependent) groups
multiplication rule for independent Random numbers tables, B-2, 77-81 Repeated-measures research designs,
and conditional probabilities random assignment using, 80-81 500-501. See also One-variable
and,272,287,289-293 random selection using, 79-80 repeated-measures analysis of
obtaining from frequency sample of, 78-79 variance
distributions, 283 Random sampling Replacement
proportion and percentage related to, performing, 79-80 sampling with, 279-280
281-282 procedures for, 50-51 sampling without, 280-281
p values and, 272, 305-307, 377 random assignment versus, 44, 48-50 Replication, 47-48
subjective, 274 simple, 14-15,52 conceptual (systematic), 47
writing about, 306-307 Range, 153-154 Reports, writing. See Writing
z scores and, 198-199,294-299 interquartile, 153-154 Representativeness heuristic, 55-56
Probability tests, one- and two-tailed, restricted, correlation and, 224 378. See also Independent
340-341 semi-interquartile, 153 between-subjects (between-groups),
Probability values. See p values writing about, 168 378. See also Independent
Problems, in research procedure, Ratio IQs, 194-195 groups t test
accounting for, 566 Ratio scales, 24, 28-29 complex, 460-461
Proportion, probability related to, measures of association under, 233 correlated (matched) groups (before-
281-282 Raw scores, 182 after; repeated-measures; within-
Proportions, 90-91 converting to z scores, 176 subjects), 393. See also t test for
Psychology, as science, F-6 Reasoning correlated (dependent) groups
Psychometric information, 566 deductive, 11 correlational,70-72
Punctuation, checking, C-1-2 inductive, 10-11 experimental,72-74
p values, 272, 305-307 Record keeping, D-4 factorial, 462-463
power of t test and, 377 Redundancy, in charts and tables, 125 Latin square, 514
References section of paper, C-6 matching statistical tests to,
Q Region of rejection, 339 527-528,562
Q test, 440-442 Region of retention, 340 mixed (between-within), 515-516
Qualitative research, F-1-20 Regression multifactorial, 489-490
independent. See Independent active voice for, C-2 spelling, grammar, and punctuation
variables APA style and. See American and, C-1-2
predictor, 242 Psychological Association about standard scores, 197-198
representativeness heuristic and, (APA) style about statistical test results, 355-357
55-56 audience for, C-2 about two-variable (two-factor)
subject (organismic), 378-379 avoiding gender bias and sexist ANOVA results, 488-489
symbols for, 32-33 language in, C-3 about variability, 168-170
Variance, 155 basic advice and techniques for, about Wilcoxon matched-pairs signed-
calculating from a data array, C-1-3 ranks test results, 547-551
160-161 about central tendency, 149-150
error, 251, 415-416 charts and tables in, 123-126
estimators of, 162-165 about chi-square test results, 533-534, x
explained and unexplained, 256-258 539-540 X, 32-33
population, 155, 161-162 about correlated groups t test x axis, 97
residual, 251 results, 399
sample, 155, 157-168 about correlation, 226-227
total,414-415 about correlational relationships, y
treatment, 415-417 226-227 Y, 32-33
writing about, 168 about hypotheses, 355 y axis, 97
Variance key, on calculators, 165 about independent groups t test
Variation results, 387-388
between-group (between-sample), 383 about Mann-Whitney U test results, z
within-group (within-sample), 546-547 z distribution, t distributions related to,
383-384 about normal distribution, 197-198 368
about null results, 550 Zero correlation, 71, 211-212, 214, 215
about one-variable repeated-measures z scores, 182-185, 190-199
W ANOVA results, 512-513 converting raw scores to, 176
Weighted means, 142 about one-way ANOVA results, distributions of, 184-185
Wilcoxon matched-pairs signed-ranks 445-446 on either side of mean, area between,
test (Wilcoxon T), 547-551 of operational definitions, 64 192-193
critical values of T statistic and, B-15 peer review of manuscripts and, 179 finding percentile ranks with,
writing about results of, 547-551 about percentiles, 122-123 191-192
Within-group (within-sample) variation, about probability, 306-307 formulas for calculating, 185-186
383-384 about qualitative research, F-17-19 linear regression and, 242-243
Within-subjects design, 393. See also about regression results, 263 normal distribution and, 190-191
t test for correlated about results of one-sample t tests, Pearson r in relation to, 216-217
(dependent) groups 374-375 percentile rank of, 176
Within-subjects designs, 500-501. See revising and, C-2-3 probability and, 198-199,294-299
also One-variable repeated- about scales of measurement, 29-31 on same side of distribution, area
measures analysis of variance about Spearman rank order between, 193-196
Writing, 23-24, C-1-10 correlation coefficient, 554 z test, 344-347
Selected Statistical Symbols

Symbol  Description  First appears on page
X, Y  Variables  6
N  Total number of observations  33
Σ  Uppercase sigma (Greek): "to sum"  33
μ  Mu (Greek), population mean  136
σ²  Lowercase sigma squared (Greek), population variance  155
σ  Lowercase sigma (Greek), population standard deviation  161
f  Frequency  87
p  Proportion or probability (p value)  281
%  Percent or percentage  91
cf  Cumulative frequency  116
c%  Cumulative percentage  117
w  Width of a class interval  119
PR  Percentile rank  115
Q  Quartile (e.g., Q1 is the first quartile)  121
X̄  Sample mean  319
X̄w  Weighted mean (based on more than one sample)  142
mdn  Median (in APA style)  144
M  Sample mean or mean (in APA style)  136
SD  Standard deviation (in APA style)  168
s²  Sample variance  155
s  Sample standard deviation  158
SIQR  Semi-interquartile range  153
SS  Sum of the squares  155
^  Unbiased estimate or estimator of some statistic (e.g., ŝ)  132
n  Number of observations in a subsample of N  32
z  z score or standard score  182
T  Standard score reported as a positive, whole number; or a condition total in contrast analysis; or Wilcoxon matched-pairs signed-ranks test  182
r  Pearson r or correlation coefficient  207
r²  Coefficient of determination  222
k  Coefficient of nondetermination, or number of groups available  222
p(x)  Probability of some event occurring  278
p(A|B)  Conditional probability (of event A given B is true)  290
μX̄  Expected value of the sampling distribution of the mean  320
σ²X̄  Variance of the sampling distribution of the mean  379
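Several of the symbols in the table above correspond directly to simple computations. As a minimal illustration (not from the book; the data set and variable names are invented), Python's standard statistics module computes the sample mean (X̄), the sample variance (s², which divides by N − 1 to give an unbiased estimate), the sample standard deviation (s), and z scores:

```python
# Minimal sketch (not from the book): computing a few statistics whose
# symbols appear in the table above, using only Python's stdlib.
# The data set and variable names are invented for this example.
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # raw scores (X), N = 8

mean = statistics.mean(scores)       # X-bar, the sample mean
var = statistics.variance(scores)    # s^2, sample variance (divides by N - 1)
sd = statistics.stdev(scores)        # s, sample standard deviation

# z score: a raw score's distance from the mean in standard deviation units
z_scores = [(x - mean) / sd for x in scores]

print("mean:", mean, "variance:", var, "SD:", sd)
print("z for X = 2:", round(z_scores[0], 2))
```

Note that statistics.pvariance and statistics.pstdev give the population versions (σ² and σ, dividing by N) when the data are treated as a complete population rather than a sample.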