How To Use SPSS
Brian C. Cronk
NOTE: Relevant section numbers are given in parentheses. For instance, "(6.9)" refers to Section 6.9 in Chapter 6.
Notice: SPSS is a registered trademark of SPSS, Inc. Screen images © by SPSS, Inc. and Microsoft Corporation. Used with permission. This book is not approved or sponsored by SPSS.
"Pyrczak Publishing" is an imprint of Fred Pyrczak, Publisher, A California Corporation.
Although the author and publisher have made every effort to ensure the accuracy and completeness of the information contained in this book, we assume no responsibility for errors, inaccuracies, omissions, or any other inconsistency herein. Any slights of people, places, or organizations are unintentional.
Project Director: Monica Lopez.
Consulting Editors: George Burruss, Matthew Giblin, Deborah Oh, Jose Galvan, Jack Petit, and Richard Rasor.
Editorial assistance provided by Cheryl Alcorn, Randall R. Bruce, Karen M. Disner, Brenda Koplin, Erica Simmons, and Sharon Young.
Cover design by Robert Kibler and Larry Nichols.
Printed in the United States of America by Malloy, Inc.
Copyright © 2008, 2006, 2004, 2002, 1999 by Fred Pyrczak, Publisher. All rights reserved. No portion of this book may be reproduced or transmitted in any form or by any means without the prior written permission of the publisher.
ISBN 1-884585-79-5
Table of Contents

Introduction to the Fifth Edition
  What's New?
  Audience
  Organization
  SPSS Versions
  Availability of SPSS
  Conventions
  Screenshots
  Practice Exercises
  Acknowledgments

Chapter 1 Getting Started
  1.1 Starting SPSS
  1.2 Entering Data
  1.3 Defining Variables
  1.4 Loading and Saving Data Files
  1.5 Running Your First Analysis
  1.6 Examining and Printing Output Files
  1.7 Modifying Data Files

Chapter 2 Entering and Modifying Data
  2.1 Variables and Data Representation
  2.2 Transformation and Selection of Data

Chapter 3 Descriptive Statistics
  3.1 Frequency Distributions and Percentile Ranks for a Single Variable
  3.2 Frequency Distributions and Percentile Ranks for Multiple Variables
  3.3 Measures of Central Tendency and Measures of Dispersion for a Single Group
  3.4 Measures of Central Tendency and Measures of Dispersion for Multiple Groups
  3.5 Standard Scores

Chapter 4 Graphing Data
  4.1 Graphing Basics
  4.2 The New SPSS Chart Builder
  4.3 Bar Charts, Pie Charts, and Histograms
  4.4 Scatterplots
  4.5 Advanced Bar Charts
  4.6 Editing SPSS Graphs

Chapter 5 Prediction and Association
  5.1 Pearson Correlation Coefficient
  5.2 Spearman Correlation Coefficient
  5.3 Simple Linear Regression
  5.4 Multiple Linear Regression

Chapter 6 Parametric Inferential Statistics
  6.1 Review of Basic Hypothesis Testing
  6.2 Single-Sample t Test
  6.3 Independent-Samples t Test
  6.4 Paired-Samples t Test
  6.5 One-Way ANOVA
  6.6 Factorial ANOVA
  6.7 Repeated-Measures ANOVA
  6.8 Mixed-Design ANOVA
  6.9 Analysis of Covariance
  6.10 Multivariate Analysis of Variance (MANOVA)

Chapter 7 Nonparametric Inferential Statistics
  7.1 Chi-Square Goodness of Fit
  7.2 Chi-Square Test of Independence
  7.3 Mann-Whitney U Test
  7.4 Wilcoxon Test
  7.5 Kruskal-Wallis H Test
  7.6 Friedman Test

Chapter 8 Test Construction
  8.1 Item-Total Analysis
  8.2 Cronbach's Alpha
  8.3 Test-Retest Reliability
  8.4 Criterion-Related Validity

Appendix A Effect Size
Appendix B Practice Exercise Data Sets
  Practice Data Set 1
  Practice Data Set 2
  Practice Data Set 3
Appendix C Glossary
Appendix D Sample Data Files Used in Text
  COINS.sav
  GRADES.sav
  HEIGHT.sav
  QUESTIONS.sav
  RACE.sav
  SAMPLE.sav
  SAT.sav
  Other Files
Appendix E Information for Users of Earlier Versions of SPSS
Appendix F Graphing Data with SPSS 13.0 and 14.0
Chapter 1

Getting Started

Section 1.1 Starting SPSS
Startup procedures for SPSS will differ slightly, depending on the exact configuration of the machine on which it is installed. On most computers, you can start SPSS by clicking on Start, then clicking on Programs, then on SPSS. On many installations, there will be an SPSS icon on the desktop that you can double-click to start the program.
When SPSS is started, you may be presented with the dialog box to the left, depending on the options your system administrator selected for your version of the program. If you have the dialog box, click Type in data and OK, which will present a blank data window. If you were not presented with the dialog box to the left, SPSS should open automatically with a blank data window.
The data window and the output window provide the basic interface for SPSS. A blank data window is shown below.
whether or not they were "morning people" and whether or not they worked. This survey also asked for their final grade in the class (100% being the highest grade possible). The response sheets from two students are presented below:
Response Sheet 1
ID: 4593
Day of class: TTh
Class time: Afternoon
Are you a morning person? No
Final grade in class: 85%
Do you work outside school? No

Response Sheet 2
ID: 1901
Day of class: MWF
Class time: Morning
Are you a morning person? Yes
Final grade in class: 83%
Do you work outside school? Part-time
Our goal is to enter the data from the two students into SPSS for use in future analyses. The first step is to determine the variables that need to be entered. Any information that can vary among participants is a variable that needs to be considered. Example 1.2.2 lists the variables we will use.

Example 1.2.2
ID
Day of class
Class time
Morning person
Final grade
Whether or not the student works outside school

In the SPSS data window, columns represent variables and rows represent participants. Therefore, we will be creating a data file with six columns (variables) and two rows (students/participants).

Section 1.3 Defining Variables

Before we can enter any data, we must first enter some basic information about each variable into SPSS. For instance, variables must first be given names that:
• begin with a letter;
• do not contain a space.
Thus, the variable name "Q7" is acceptable, while the variable name "7Q" is not. Similarly, the variable name "PRE_TEST" is acceptable, but the variable name "PRE TEST" is not. Capitalization does not matter, but variable names are capitalized in this text to make it clear when we are referring to a variable name, even if the variable name is not necessarily capitalized in screenshots.
To define a variable, click on the Variable View tab at the bottom of the main screen. This will show you the Variable View window. To return to the Data View window, click on the Data View tab.
From the Variable View screen, SPSS allows you to create and edit all of the variables in your data file. Each column represents some property of a variable, and each row represents a variable. All variables must be given a name. To do that, click on the first empty cell in the Name column and type a valid SPSS variable name. The program will then fill in default values for most of the other properties.
One useful function of SPSS is the ability to define variable and value labels. Variable labels allow you to associate a description with each variable. These descriptions can describe the variables themselves or the values of the variables.
Value labels allow you to associate a description with each value of a variable. For example, for most procedures, SPSS requires numerical values. Thus, for data such as the day of the class (i.e., Mon/Wed/Fri and Tues/Thurs), we need to first code the values as numbers. We can assign the number 1 to Mon/Wed/Fri and the number 2 to Tues/Thurs. To help us keep track of the numbers we have assigned to the values, we use value labels.
To assign value labels, click in the cell you want to assign values to in the Values column. This will bring up a small gray button (see arrow, below at left). Click on that button to bring up the Value Labels dialog box. When you enter a value label, you must click Add after each entry. This will move the value and its associated label into the bottom section of the window. When all labels have been added, click OK to return to the Variable View window.
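Although this book works through the point-and-click dialogs, the same labels can also be assigned from a syntax window. The following is only a sketch, assuming a variable named DAY coded as described above:

    VARIABLE LABELS day 'Day of class'.
    VALUE LABELS day 1 'Mon/Wed/Fri' 2 'Tues/Thurs'.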
In addition to naming and labeling the variable, you have the option of defining the variable type. To do so, simply click on the Type, Width, or Decimals columns in the Variable View window. The default value is a numeric field that is eight digits wide with two decimal places displayed. If your data are more than eight digits to the left of the decimal place, they will be displayed in scientific notation (e.g., the number 2,000,000,000 will be displayed as 2.00E+09). SPSS maintains accuracy beyond two decimal places, but all output will be rounded to two decimal places unless otherwise indicated in the Decimals column. In our example, we will be using numeric variables with all of the default values.

Practice Exercise

Create a data file for the six variables and two sample students presented in Example 1.2.1. Name your variables: ID, DAY, TIME, MORNING, GRADE, and WORK. You should code DAY as 1 = Mon/Wed/Fri, 2 = Tues/Thurs. Code TIME as 1 = morning, 2 = afternoon. Code MORNING as 0 = No, 1 = Yes. Code WORK as 0 = No, 1 = Part-Time, 2 = Full-Time. Be sure you enter value labels for the different variables. Note that because value labels are not appropriate for ID and GRADE, these are not coded. When done, your Variable View window should look like the screenshot below.
Click on the Data View tab to open the data-entry screen. Enter data horizontally, beginning with the first student's ID number. Enter the code for each variable in the appropriate column; to enter the GRADE variable value, enter the student's class grade.
The previous data window can be changed to look instead like the screenshot below by clicking on the Value Labels icon (see arrow). In this case, the cells display the value labels rather than the corresponding codes. If data is entered in this mode, it is not necessary to enter codes, as clicking the button that appears in each cell as the cell is selected will present a drop-down list of the predefined labels. You may use either method, according to your preference.
Instead of clicking the Value Labels icon, you may optionally toggle between views by clicking Value Labels under the View menu.

Section 1.4 Loading and Saving Data Files

Once you have entered your data, you will need to save it with a unique name for later use so that you can retrieve it when necessary. Loading and saving SPSS data files works in the same way as most Windows-based software. Under the File menu, there are Open, Save, and Save As commands. SPSS data files have a ".sav" extension, which is added by default to the end of the filename. This tells Windows that the file is an SPSS data file.

Save Your Data
When you save your data file (by clicking File, then clicking Save or Save As to specify a unique name), pay special attention to where you save it. Most systems default to the location <c:\program files\spss>. You will probably want to save your data on a floppy disk, CD-R, or removable USB drive so that you can take the file with you.
Load Your Data
When you load your data (by clicking File, then clicking Open, then Data, or by clicking the open file folder icon), you get a similar window. This window lists all files with the ".sav" extension. If you have trouble locating your saved file, make sure you are looking in the right directory.
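For reference, saving and loading can also be done from a syntax window. This is only a sketch; the drive and folder shown are placeholders for wherever you choose to keep your file:

    SAVE OUTFILE='A:\SAMPLE.sav'.
    GET FILE='A:\SAMPLE.sav'.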
Practice Exercise

To be sure that you have mastered saving and opening data files, name your sample data file "SAMPLE" and save it to a removable storage medium. Once it is saved, SPSS will display the name of the file at the top of the data window. It is wise to save your work frequently, in case of computer crashes. Note that filenames may be upper- or lowercase. In this text, uppercase is used for clarity.
After you have saved your data, exit SPSS (by clicking File, then Exit). Restart SPSS and load your data by selecting the "SAMPLE.sav" file you just created.
Section 1.5 Running Your First Analysis

To calculate a mean (average), we are asking the computer to summarize our data set. Therefore, we run the command by clicking Analyze, then Descriptive Statistics, then Descriptives. This brings up the Descriptives dialog box. Note that the left side of the box contains a list of all the variables in our data file. On the right is an area labeled Variable(s), where we can specify the variables we would like to use in this particular analysis.
We want to compute the mean for the variable called GRADE. Thus, we need to select the variable name in the left window (by clicking on it). To transfer it to the right window, click on the right arrow between the two windows. The arrow always points to the window opposite the highlighted item and can be used to transfer selected variables in either direction. Note that double-clicking on the variable name will also transfer the variable to the opposite window. Standard Windows conventions of "Shift" clicking or "Ctrl" clicking to select multiple variables can be used as well.
When we click on the OK button, the analysis will be conducted, and we will be ready to examine our output.
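If you prefer syntax to the dialog boxes, a minimal sketch of the same analysis (assuming the GRADE variable from the sample file) is:

    DESCRIPTIVES VARIABLES=grade.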
Section 1.6 Examining and Printing Output Files

The output is added to the end of your previous output. To switch back and forth between the data window and the output window, select the desired window from the Window menu bar (see arrow, below).
The output window is split into two sections. The left section is an outline of the output (SPSS refers to this as the "outline view"). The right section is the output itself.
The section on the left of the output window provides an outline of the entire output window. All of the analyses are listed in the order in which they were conducted. Note that this outline can be used to quickly locate a section of the output. Simply click on the section you would like to see, and the right window will jump to the appropriate place.
Clicking on a statistical procedure also selects all of the output for that command. By pressing the Delete key, that output can be deleted from the output window. This is a quick way to be sure that the output window contains only the desired output. Output can also be selected and pasted into a word processor by clicking Edit, then Copy Objects to copy the output. You can then switch to your word processor and click Edit, then Paste.
To print your output, simply click File, then Print, or click on the printer icon on the toolbar. You will have the option of printing all of your output or just the currently selected section. Be careful when printing! Each time you run a command, the output is added to the end of your previous output. Thus, you could be printing a very large output file containing information you may not want or need.
One way to ensure that your output window contains only the results of the current command is to create a new output window just before running the command. To do this, click File, then New, then Output. All your subsequent commands will go into your new output window.

Practice Exercise

Load the sample data file you created earlier (SAMPLE.sav). Run the Descriptives command for the variable GRADE and print the output. Your output should look like the example on page 7. Next, select the data window and print it.
Section 1.7 Modifying Data Files

Example 1.7.1
Two more students provide you with surveys. Their information is:
Response Sheet 3
ID: 8734
Day of class: TTh
Class time: Morning
Are you a morning person? No
Final grade in class: 80%
Do you work outside school? No

Response Sheet 4
ID: 1909
Day of class: MWF
Class time: Morning
Are you a morning person? Yes
Final grade in class: 73%
Do you work outside school? Part-time
To add these data, simply place two additional rows in the Data View window (after loading your sample data). Notice that as new participants are added, the row numbers become bold. When done, the screen should look like the screenshot here.
ID        DAY           TIME        MORNING   GRADE   WORK
4593.00   Tue/Thu       afternoon   No        85.00   No
1901.00   Mon/Wed/Fri   morning     Yes       83.00   Part-Time
8734.00   Tue/Thu       morning     No        80.00   No
1909.00   Mon/Wed/Fri   morning     Yes       73.00   Part-Time
New variables can also be added. For example, if the first two participants were given special training on time management, and the two new participants were not, the data file can be changed to reflect this additional information. The new variable could be called TRAINING (whether or not the participant received training), and it would be coded so that 0 = No and 1 = Yes. Thus, the first two participants would be assigned a "1" and the last two participants a "0." To do this, switch to the Variable View window, then add the TRAINING variable to the bottom of the list. Then switch back to the Data View window to update the data.
ID        DAY           TIME        MORNING   GRADE   WORK        TRAINING
4593.00   Tue/Thu       afternoon   No        85.00   No          Yes
1901.00   Mon/Wed/Fri   morning     Yes       83.00   Part-Time   Yes
8734.00   Tue/Thu       morning     No        80.00   No          No
1909.00   Mon/Wed/Fri   morning     Yes       73.00   Part-Time   No
Adding data and adding variables are just logical extensions of the procedures we used to originally create the data file. Save this new data file. We will be using it again later in the book.
Practice Exercise

Follow the example above (where TRAINING is the new variable). Make the modifications to your SAMPLE.sav data file and save it.
Chapter 2

Entering and Modifying Data
In Chapter 1, we learned how to create a simple data file, save it, perform a basic analysis, and examine the output. In this section, we will go into more detail about variables and data.
(assuming that the variable is measured on an interval or ratio scale and that the distribution is normal, so that the mean is an acceptable summary statistic).
We could have had SPSS calculate a mean for the variable TIME instead of GRADE. If we did, we would get the output presented here. The output indicates that the average TIME was 1.25. Remember that TIME was coded as an ordinal variable (1 = morning class, 2 = afternoon class). Thus, the mean is not an appropriate statistic for an ordinal scale, but SPSS calculated it anyway. The importance of considering the type of data cannot be overemphasized. Just because SPSS will compute a statistic for you does not mean that you should
use it. Later in the text, when specific statistical procedures are discussed, the conditions under which they are appropriate will be addressed.

Missing Data

Often, participants do not provide complete data. For some students, you may have a pretest score but not a posttest score. Perhaps one student left one question blank on a survey, or perhaps she did not state her age. Missing data can weaken any analysis. Often, a single missing question can eliminate a subject from all analyses.
If you have missing data in your data set, leave that cell blank. In the example to the left, the fourth subject did not complete Question 2. Note that the total score (which is calculated from both questions) is also blank because of the missing data for Question 2. SPSS represents missing data in the data window with a period (although you should not enter a period; just leave it blank).

Q1      Q2      TOTAL
2.00    2.00    4.00
3.00    1.00    4.00
4.00    3.00    7.00
2.00    .       .
1.00    2.00    3.00
We can use the Select Cases command to specify a subset of our data. The Select Cases command is located under the Data menu. When you select this command, the dialog box below will appear.
You can specify which cases (participants) you want to select by using the selection criteria, which appear on the right side of the Select Cases dialog box. By default, All cases will be selected. The most common way to select a subset is to click If condition is satisfied, then click on the button labeled If. This will bring up a new dialog box that allows you to indicate which cases you would like to use.
You can enter the logic used to select the subset in the upper section. If the logical statement is true for a given case, then that case will be selected. If the logical statement is false, that case will not be selected. For example, you can select all cases that were coded as Mon/Wed/Fri by entering the formula DAY = 1 in the upper-right part of the window. If DAY is 1, then the statement will be true, and SPSS will select the case. If DAY is anything other than 1, the statement will be false, and the case will not be selected. Once you have entered the logical statement, click Continue to return to the Select Cases dialog box. Then, click OK to return to the data window.
After you have selected the cases, the data window will change slightly. The cases that were not selected will be marked with a diagonal line through the case number. For example, for our sample data, the first and third cases are not selected. Only the second and fourth cases are selected for this subset.
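When Select Cases is run, SPSS itself builds a filter variable behind the scenes. A rough syntax sketch of the same selection (the lines SPSS actually generates include a few additional labeling commands) is:

    USE ALL.
    COMPUTE filter_$=(day = 1).
    FILTER BY filter_$.
    EXECUTE.

Running FILTER OFF. followed by USE ALL. afterward restores the full data set.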
ID        DAY           TIME        MORNING   GRADE   WORK        TRAINING   FILTER_$
4593.00   Tue/Thu       afternoon   No        85.00   No          Yes        Not Selected
1901.00   Mon/Wed/Fri   morning     Yes       83.00   Part-Time   Yes        Selected
8734.00   Tue/Thu       morning     No        80.00   No          No         Not Selected
1909.00   Mon/Wed/Fri   morning     Yes       73.00   Part-Time   No         Selected
An additional variable will also be created in your data file. The new variable is called FILTER_$ and indicates whether a case was selected or not.
If we calculate a mean GRADE using the subset we just selected, we will receive the output at right. Notice that we now have a mean of 78.00 with a sample size (N) of 2 instead of 4.

Descriptive Statistics
                     N   Minimum   Maximum   Mean      Std. Deviation
GRADE                2   73.00     83.00     78.0000   7.07107
Valid N (listwise)   2
Be careful when you select subsets. The subset remains in effect until you run the command again and select all cases. You can tell if you have a subset selected because the bottom of the data window will indicate that a filter is on. In addition, when you examine your output, N will be less than the total number of records in your data set if a subset is selected. The diagonal lines through some cases will also be evident when a subset is selected. Be careful not to save your data file with a subset selected, as this can cause considerable confusion later.

Computing a New Variable

SPSS can also be used to compute a new variable or manipulate your existing variables. To illustrate this, we will create a new data file. This file will contain data for four participants and three variables (Q1, Q2, and Q3). The variables represent the number of points each participant received on three different questions. Now enter the data shown on the screen to the right. When done, save this data file as "QUESTIONS.sav". We will be using it again in later chapters.
Now you will calculate the total score for each subject. We could do this manually, but if the data file were large, or if there were a lot of questions, this would take a long time. It is more efficient (and more accurate) to have SPSS compute the totals for you. To do this, click Transform and then click Compute Variable.
After clicking the Compute Variable command, we get the dialog box at right. The blank field marked Target Variable is where we enter the name of the new variable we want to create. In this example, we are creating a variable called TOTAL, so type the word "total." Notice that there is an equals sign between the Target Variable blank and the Numeric Expression blank. These two blank areas are the
two sides of an equation that SPSS will calculate. For example, total = q1 + q2 + q3 is the equation that is entered in the sample presented here (screenshot at left). Note that it is possible to create any equation here simply by using the number and operational keypad at the bottom of the dialog box. When we click OK, SPSS will create a new variable called TOTAL and make it equal to the sum of the three questions.
Save your data file again so that the new variable will be available for future sessions.
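A syntax sketch of the same computation, assuming the Q1, Q2, and Q3 variables created above:

    COMPUTE total=q1 + q2 + q3.
    EXECUTE.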
Recoding a Variable - Different Variable
SPSS can create a new variable based upon data from another variable. Say we want to split our participants on the basis of their total score. We want to create a variable called GROUP, which is coded 1 if the total score is low (less than or equal to 8) or 2 if the total score is high (9 or larger). To do this, we click Transform, then Recode into Different Variables.
This will bring up the Recode into Different Variables dialog box shown here. Transfer the variable TOTAL to the middle blank. Type "group" in the Name field under Output Variable. Click Change, and the middle blank will show that TOTAL is becoming GROUP, as shown below.
To help keep track of variables that have been recoded, it's a good idea to open the Variable View and enter "Recoded" in the Label column in the TOTAL row. This is especially useful with large datasets which may include many recoded variables.
Click Old and New Values. This will bring up the Recode dialog box. In this example, we have entered a 9 in the Range, value through HIGHEST field and a 2 in the Value field under New Value. When we click Add, the blank on the right displays the recoding formula. Now enter an 8 on the left in the Range, LOWEST through value blank and a 1 in the Value field under New Value. Click Add, then Continue. Click OK. You will be redirected to the data window. A new variable (GROUP) will have been added and coded as 1 or 2, based on TOTAL.
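The equivalent recoding in syntax form (a sketch assuming the TOTAL and GROUP variables described above):

    RECODE total (LO THRU 8=1) (9 THRU HI=2) INTO group.
    EXECUTE.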
Chapter 3

Descriptive Statistics
In Chapter 2, we discussed many of the options available in SPSS for dealing with data. Now we will discuss ways to summarize our data. The procedures used to describe and summarize data are called descriptive statistics.

Section 3.1 Frequency Distributions and Percentile Ranks for a Single Variable

Description

The Frequencies command produces frequency distributions for the specified variables. The output includes the number of occurrences, percentages, valid percentages, and cumulative percentages. The valid percentages and the cumulative percentages comprise only the data that are not designated as missing.
The Frequencies command is useful for describing samples where the mean is not useful (e.g., nominal or ordinal scales). It is also useful as a method of getting a feel for your data. It provides more information than just a mean and standard deviation and can be useful in determining skew and identifying outliers. A special feature of the command is its ability to determine percentile ranks.

Assumptions

Cumulative percentages and percentiles are valid only for data that are measured on at least an ordinal scale. Because the output contains one line for each value of a variable, this command works best on variables with a relatively small number of values.

Drawing Conclusions

The Frequencies command produces output that indicates both the number of cases in the sample with a particular value and the percentage of cases with that value. Thus, conclusions drawn should relate only to describing the numbers or percentages of cases in the sample. If the data are at least ordinal in nature, conclusions regarding the cumulative percentage and/or percentiles can be drawn.

SPSS Data Format

The SPSS data file for obtaining frequency distributions requires only one variable, and that variable can be of any type.
Creating a Frequency Distribution

To run the Frequencies command, click Analyze, then Descriptive Statistics, then Frequencies. (This example uses the CARS.sav data file that comes with SPSS. It is typically located at <C:\Program Files\SPSS\Cars.sav>.)
This will bring up the main dialog box. Transfer the variable for which you would like a frequency distribution into the Variable(s) blank to the right. Be sure that the Display frequency tables option is checked. Click OK to receive your output.
Note that the dialog boxes in newer versions of SPSS show both the type of variable (the icon immediately to the left of the variable name) and the variable labels if they are entered. Thus, the variable YEAR shows up in the dialog box as Model Year (modulo 100).

Output for a Frequency Distribution

The output consists of two sections. The first section indicates the number of records with valid data for each variable selected. Records with a blank score are listed as missing. In this example, the data file contained 406 records. Notice that the variable label is Model Year (modulo 100).
The second section of the output contains a cumulative frequency distribution for each variable selected. At the top of the section, the variable label is given. The output itself consists of five columns. The first column lists the values of the variable in sorted order. There is a row for each value of your variable, and additional rows are added at the bottom for the Total and Missing data. The second column gives the frequency of each value, including missing values. The third column gives the percentage of all records (including records with missing data) for each value. The fourth column, labeled Valid Percent, gives the percentage of records (without including records with missing data) for each value. If there were any missing values, these values would be larger than the values in column three because the total number of records would have been reduced by the number of records with missing values. The final column gives cumulative percentages. Cumulative percentages indicate the percentage of records with a score equal to or smaller than the current value. Thus, the last value is always 100%. These values are equivalent to percentile ranks for the values listed.

Model Year (modulo 100) frequency table: one row for each model year, with Frequency, Percent, Valid Percent, and Cumulative Percent columns (Valid N = 405, Missing = 1, Total = 406).
Statistics output for Model Year (modulo 100): the number of valid and missing cases, followed by the requested percentiles (25, 50, 75, and 80).
The Statistics dialog box adds on to the previous output from the Frequencies command. The new section of the output is shown at left. The output contains a row for each piece of information you requested. In the example above, we checked Quartiles and asked for the 80th percentile. Thus, the output contains rows for the 25th, 50th, 75th, and 80th percentiles.
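A syntax sketch of the same request, assuming the YEAR variable from the CARS.sav file:

    FREQUENCIES VARIABLES=year
      /PERCENTILES=25 50 75 80.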
Practice Exercise

Using Practice Data Set 1 in Appendix B, create a frequency distribution table for the mathematics skills scores. Determine the mathematics skills score at which the 60th percentile lies.

Section 3.2 Frequency Distributions and Percentile Ranks for Multiple Variables

Description

The Crosstabs command produces frequency distributions for multiple variables. The output includes the number of occurrences of each combination of levels of each variable. It is possible to have the command give percentages for any or all variables.
The Crosstabs command is useful for describing samples where the mean is not useful (e.g., nominal or ordinal scales). It is also useful as a method of getting a feel for your data.

Assumptions

Because the output contains a row or column for each value of a variable, this command works best on variables with a relatively small number of values.

SPSS Data Format

The SPSS data file for the Crosstabs command requires two or more variables. Those variables can be of any type.

Running the Crosstabs Command
To run the Crosstabs command, click Analyze, then Descriptive Statistics, then Crosstabs. The dialog box initially lists all variables on the left and contains two blanks labeled Row(s) and Column(s). Enter one variable (TRAINING) in the Row(s) box. Enter the second (WORK) in the Column(s) box. To analyze more than two variables, you would enter the third, fourth, etc., in the unlabeled area (just under the Layer indicator).
The Cells button allows percentages and other information to be generated for each combination of values. Click Cells, and you will get the box at right. For the example presented here, check Row, Column, and Total percentages. Then click Continue. This will return you to the Crosstabs dialog box. Click OK to run the analysis.
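A syntax sketch of this crosstabulation (variable names taken from the SAMPLE.sav file used throughout this book):

    CROSSTABS
      /TABLES=training BY work
      /CELLS=COUNT ROW COLUMN TOTAL.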
TRAINING * WORK Crosstabulation

                                   WORK
                             No       Part-Time   Total
TRAINING  Yes   Count         1           1         2
                % of Total   25.0%       25.0%     50.0%
          No    Count         1           1         2
                % of Total   25.0%       25.0%     50.0%
Total           Count         2           2         4
                % of Total   50.0%       50.0%    100.0%

(The full output also reports the % within TRAINING and % within WORK for each cell; each of these is 50.0% in this example.)
Interpreting Crosstabs Output

The output consists of a contingency table. Each level of WORK is given a column. Each level of TRAINING is given a row. In addition, a row is added for total, and a column is added for total.
Each cell contains the number of participants (e.g., one participant received no training and does not work; two participants received training, regardless of employment status). The percentages for each cell are also shown. Row percentages add up to 100% horizontally. Column percentages add up to 100% vertically. For example, of all the individuals who had no training, 50% did not work and 50% worked part-time (using the "% within TRAINING" row). Of the individuals who did not work, 50% had no training and 50% had training (using the "% within WORK" row).

Practice Exercise

Using Practice Data Set 1 in Appendix B, create a contingency table using the Crosstabs command. Determine the number of participants in each combination of the variables SEX and MARITAL. What percentage of participants is married? What percentage of participants is male and married?

Section 3.3 Measures of Central Tendency and Measures of Dispersion for a Single Group

Description

Measures of central tendency are values that represent a typical member of the sample or population. The three primary types are the mean, median, and mode. Measures of dispersion tell you the variability of your scores. The primary types are the range and the standard deviation. Together, a measure of central tendency and a measure of dispersion provide a great deal of information about the entire data set.
We will discuss these measures of central tendency and measures of dispersion in the context of the Descriptives command. Note that many of these statistics can also be calculated with several other commands (e.g., the Frequencies or Compare Means commands are required to compute the mode or median; the Statistics option for the Frequencies command is shown here).
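For example, a Frequencies syntax sketch that requests only the median and mode (GRADE is used here purely as an illustration) might look like this:

    FREQUENCIES VARIABLES=grade
      /FORMAT=NOTABLE
      /STATISTICS=MEDIAN MODE.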
Assumptions

Each measure of central tendency and measure of dispersion has different assumptions associated with it. The mean is the most powerful measure of central tendency, and it has the most assumptions. For example, to calculate a mean, the data must be measured on an interval or ratio scale. In addition, the distribution should be normally distributed or, at least, not highly skewed. The median requires at least ordinal data. Because the median indicates only the middle score (when scores are arranged in order), there are no assumptions about the shape of the distribution. The mode is the weakest measure of central tendency. There are no assumptions for the mode.
The standard deviation is the most powerful measure of dispersion, but it, too, has several requirements. It is a mathematical transformation of the variance (the standard deviation is the square root of the variance). Thus, if one is appropriate, the other is also. The standard deviation requires data measured on an interval or ratio scale. In addition, the distribution should be normal. The range is the weakest measure of dispersion. To calculate a range, the variable must be at least ordinal. For nominal scale data, the entire frequency distribution should be presented as a measure of dispersion.

Drawing Conclusions

A measure of central tendency should be accompanied by a measure of dispersion. Thus, when reporting a mean, you should also report a standard deviation. When presenting a median, you should also state the range or interquartile range.

SPSS Data Format

Only one variable is required.
Running the Command

The Descriptives command will be the command you will most likely use for obtaining measures of central tendency and measures of dispersion. This example uses the SAMPLE.sav data file we have used in the previous chapters.
To run the command, click Analyze, then Descriptive Statistics, then Descriptives. This will bring up the main dialog box for the Descriptives command. Any variables you would like information about can be placed in the right blank by double-clicking them or by selecting them, then clicking on the arrow.
By default, you will receive the N (number of cases/participants), the minimum value, the maximum value, the mean, and the standard deviation. Note that some of these may not be appropriate for the type of data you have selected.
If you would like to change the default statistics that are given, click Options in the main dialog box. You will be given the Options dialog box presented here.
Reading the Output

The output for the Descriptives command is quite straightforward. Each type of output requested is presented in a column, and each variable is given in a row. The output presented here is for the sample data file. It shows that we have one variable (GRADE) and that we obtained the N, minimum, maximum, mean, and standard deviation for this variable.
Descriptive Statistics
                     N   Minimum   Maximum   Mean      Std. Deviation
GRADE                4   73.00     85.00     80.2500   5.25198
Valid N (listwise)   4
Practice Exercise

Using Practice Data Set 1 in Appendix B, obtain the descriptive statistics for the age of the participants. What is the mean? The median? The mode? What is the standard deviation? Minimum? Maximum? The range?

Section 3.4 Measures of Central Tendency and Measures of Dispersion for Multiple Groups

Description

The measures of central tendency discussed earlier are often needed not only for the entire data set, but also for several subsets. One way to obtain these values for subsets would be to use the data-selection techniques discussed in Chapter 2 and apply the Descriptives command to each subset. An easier way to perform this task is to use the Means command. The Means command is designed to provide descriptive statistics for subsets of your data.

Assumptions

The assumptions discussed in the section on Measures of Central Tendency and Measures of Dispersion for a Single Group (Section 3.3) also apply to multiple groups.

Drawing Conclusions

A measure of central tendency should be accompanied by a measure of dispersion. Thus, when giving a mean, you should also report a standard deviation. When presenting a median, you should also state the range or interquartile range.

SPSS Data Format

Two variables in the SPSS data file are required. One represents the dependent variable and will be the variable for which you receive the descriptive statistics. The other is the independent variable and will be used in creating the subsets. Note that while SPSS calls this variable an independent variable, it may not meet the strict criteria that define a true independent variable (e.g., treatment manipulation). Thus, some SPSS procedures refer to it as the grouping variable.
Running the Command

This example uses the SAMPLE.sav data file you created in Chapter 1. The Means command is run by clicking Analyze, then Compare Means, then Means. This will bring up the main dialog box for the Means command. Place the selected variable in the blank field labeled Dependent List.
Place the grouping variable in the box labeled Independent List. In this example, through use of the SAMPLE.sav data file, measures of central tendency and measures of dispersion for the variable GRADE will be given for each level of the variable MORNING.
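A syntax sketch of this command, including the two-layer version discussed later in this section:

    MEANS TABLES=grade BY morning
      /CELLS=MEAN COUNT STDDEV.
    MEANS TABLES=grade BY morning BY training
      /CELLS=MEAN COUNT STDDEV.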
By default, the mean, number of cases, and standard deviation are given. If you would like additional measures, click Options and you will be presented with the dialog box at right. You can opt to include any number of measures.

Reading the Output

The output for the Means command is split into two sections. The first section, called a case processing summary, gives information about the data used. In our sample data file, there are four students (cases), all of whom were included in the analysis.
Case Processing Summary
                    Included         Excluded         Total
                    N    Percent     N    Percent     N    Percent
grade * morning     4    100.0%      0    .0%         4    100.0%
The second section of the output is the report from the Means command.

Report
GRADE
MORNING   Mean      N   Std. Deviation
No        82.5000   2   3.53553
Yes       78.0000   2   7.07107
Total     80.2500   4   5.25198

This report lists the name of the dependent variable at the top (GRADE). Every level of the independent variable (MORNING) is shown in a row in the table. In this example, the levels are 0 and 1, labeled No and Yes. Note that if a variable is labeled, the labels will be used instead of the raw values.
The summary statistics given in the report correspond to the data, where the level of the independent variable is equal to the row heading (e.g., No, Yes). Thus, two participants were included in each row. An additional row is added, named Total. That row contains the combined data, and the values are the same as they would be if we had run the Descriptives command for the variable GRADE.

Extension to More Than One Independent Variable

If you have more than one independent variable, SPSS can break down the output even further. Rather than adding more variables to the Independent List section of the dialog box, you need to add them in a different layer. Note that SPSS indicates with which layer you are working.
If you click Next, you will be presented with Layer 2 of 2, and you can select a second independent variable (e.g., TRAINING). Now, when you run the command (by clicking OK), you will be given summary statistics for the variable GRADE by each level of MORNING and TRAINING. Your output will look like the output at right. You now have two main sections (No and Yes), along with the Total. Now, however, each main section is broken down into subsections (No, Yes, and Total). The variable you used in Level 1 (MORNING) is the first one listed, and it defines the main sections. The variable you had in Level 2 (TRAINING) is listed second.
Report
GRADE
MORNING   TRAINING   Mean      N   Std. Deviation
No        Yes        85.0000   1   .
          No         80.0000   1   .
          Total      82.5000   2   3.53553
Yes       Yes        83.0000   1   .
          No         73.0000   1   .
          Total      78.0000   2   7.07107
Total     Yes        84.0000   2   1.41421
          No         76.5000   2   4.94975
          Total      80.2500   4   5.25198
Thus, the first row represents those participants who were not morning people and who received training. The second row represents participants who were not morning people and did not receive training. The third row represents the total for all participants who were not morning people.
Notice that standard deviations are not given for all of the rows. This is because there is only one participant per cell in this example. One problem with using many subsets is that it increases the number of participants required to obtain meaningful results. See a research design text or your instructor for more details.

Practice Exercise

Using Practice Data Set 1 in Appendix B, compute the mean and standard deviation of ages for each value of marital status. What is the average age of the married participants? The single participants? The divorced participants?

Section 3.5 Standard Scores

Description

Standard scores allow the comparison of different scales by transforming the scores into a common scale. The most common standard score is the z-score. A z-score is based on a standard normal distribution (e.g., a mean of 0 and a standard deviation of 1). A z-score, therefore, represents the number of standard deviations above or below the mean (e.g., a z-score of -1.5 represents a score 1.5 standard deviations below the mean).

Assumptions

Z-scores are based on the standard normal distribution. Therefore, the distributions that are converted to z-scores should be normally distributed, and the scales should be either interval or ratio.

Drawing Conclusions

Conclusions based on z-scores consist of the number of standard deviations above or below the mean. For example, a student scores 85 on a mathematics exam in a class that has a mean of 70 and a standard deviation of 5. The student's score is 15 points above the class mean (85 - 70 = 15). The student's z-score is 3 because she scored 3 standard deviations above the mean (15 ÷ 5 = 3). If the same student scores 90 on a reading exam, with a class mean of 80 and a standard deviation of 10, the z-score will be 1.0 because she is one standard deviation above the mean. Thus, even though her raw score was higher on the reading test, she actually did better in relation to other students on the mathematics test because her z-score was higher on that test.

SPSS Data Format

Calculating z-scores requires only a single variable in SPSS. That variable must be numerical.
Running the Command

Computing z-scores is a component of the Descriptives command. To access it, click Analyze, then Descriptive Statistics, then Descriptives. This example uses the sample data file (SAMPLE.sav) created in Chapters 1 and 2.
This will bring up the standard dialog box for the Descriptives command. Notice the checkbox in the bottom-left corner labeled Save standardized values as variables. Check this box and move the variable GRADE into the right-hand blank. Then click OK to complete the analysis. You will be presented with the standard output from the Descriptives command. Notice that the z-scores are not listed. They were inserted into the data window as a new variable.
Switch to the Data View window and examine your data file. Notice that a new variable, called ZGRADE, has been added. When you asked SPSS to save standardized values, it created a new variable with the same name as your old variable preceded by a Z. The z-score is computed for each case and placed in the new variable.
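In syntax form, it is the /SAVE subcommand that creates the standardized variable; a sketch for the GRADE variable is:

    DESCRIPTIVES VARIABLES=grade
      /SAVE.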
Reading the Output

After you conducted your analysis, the new variable was created. You can perform any number of subsequent analyses on the new variable.

Practice Exercise

Using Practice Data Set 2 in Appendix B, determine the z-score that corresponds to each employee's salary. Determine the mean z-scores for salaries of male employees and female employees. Determine the mean z-score for salaries of the total sample.
Chapter 4

Graphing Data

Section 4.1 Graphing Basics
In addition to the frequency distributions, the measures of central tendency, and the measures of dispersion discussed in Chapter 3, graphing is a useful way to summarize, organize, and reduce your data. It has been said that a picture is worth a thousand words. In the case of complicated data sets, this is certainly true.
With Version 15.0 of SPSS, it is now possible to make publication-quality graphs using only SPSS. One important advantage of using SPSS to create your graphs instead of other software (e.g., Excel or SigmaPlot) is that the data have already been entered. Thus, duplication is eliminated, and the chance of making a transcription error is reduced.
(the far-right column of the Variable View). Switch to the Data View to enter the data values for the 16 participants. Now use the Save As command to save the file, naming it HEIGHT.sav.

HEIGHT:  66, 69, 75, 72, 68, 63, 74, 70, 66, 64, 60, 67, 64, 63, 67, 65
SEX:     1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2
Make sure you have entered the data correctly by calculating a mean for each of the three variables (click Analyze, then Descriptive Statistics, then Descriptives). Compare your results with those in the table below.
Descriptive Statistics table for HEIGHT, WEIGHT, and SEX (N = 16 for each variable; standard deviations 3.9067, 26.3451, and .5164, respectively).
Chart Builder Basics

Make sure that the HEIGHT.sav data file you created above is open. In order to use the Chart Builder, you must have a data file open.
New with Version 15.0 of SPSS is the Chart Builder command. This command is accessed using Graphs, then Chart Builder in the submenu. This is a very versatile new command that can make graphs of excellent quality.
When you first run the Chart Builder command, you will probably be presented with the following dialog box:
"Before you use this dialog, measurement level should be set properly for each variable in your chart. In addition, if your chart contains categorical variables, value labels should be defined for each category. Press OK to define your chart. Press Define Variable Properties to set measurement level or define value labels for chart variables."
This dialog box is asking you to ensure that your variables are properly defined. Refer to Sections 1.3 and 2.1 if you had difficulty defining the variables used in creating the data set for this example, or to refresh your knowledge of this topic. Click OK.
The Chart Builder allows you to make any kind of graph that is normally used in publication or presentation, and much of it is beyond the scope of this text. This text, however, will go over the basics of the Chart Builder so that you can understand its mechanics.
On the left side of the Chart Builder window are the four main tabs that let you control the graphs you are making. The first one is the Gallery tab. The Gallery tab allows you to choose the basic format of your graph.
For example, the screenshot here shows the different kinds of bar charts that the Chart Builder can create.
After you have selected the basic form of graph that you want using the Gallery tab, you simply drag the image from the bottom right of the window up to the main window at the top (where it reads, "Drag a Gallery chart here to use it as your starting point").
Alternatively, you can use the Basic Elements tab to drag a coordinate system (labeled Choose Axes) to the top window, then drag variables and elements into the window.
The other tabs (Groups/Point ID and Titles/Footnotes) can be used for adding other standard elements to your graphs.
The examples in this text will cover some of the basic types of graphs you can make with the Chart Builder. After a little experimentation on your own, once you have mastered the examples in the chapter, you will soon gain a full understanding of the Chart Builder.
Section 4.3 Bar Charts, Pie Charts, and Histograms

Description

Bar charts, pie charts, and histograms represent the number of times each score occurs through the varying heights of bars or sizes of pie pieces. They are graphical representations of the frequency distributions discussed in Chapter 3.

Drawing Conclusions

The Frequencies command produces output that indicates both the number of cases in the sample with a particular value and the percentage of cases with that value. Thus, conclusions drawn should relate only to describing the numbers or percentages of cases in the sample. If the data are at least ordinal in nature, conclusions regarding the cumulative percentages and/or percentiles can also be drawn.

SPSS Data Format

You need only one variable to use this command.
Running the Command

The Frequencies command will produce graphical frequency distributions. Click Analyze, then Descriptive Statistics, then Frequencies. You will be presented with the main dialog box for the Frequencies command, where you can enter the variables for which you would like to create graphs or charts. (See Chapter 3 for other options with this command.)
Click the Charts button at the bottom to produce frequency distributions. This will give you the Charts dialog box. There are three types of charts available with this command: Bar charts, Pie charts, and Histograms. For each type, the Y axis can be either a frequency count or a percentage (selected with the Chart Values option).
You will receive the charts for any variables selected in the main Frequencies command dialog box.
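A syntax sketch for requesting one of these charts (HEIGHT is used as an example; /HISTOGRAM=NORMAL can be replaced with /BARCHART or /PIECHART for the other chart types):

    FREQUENCIES VARIABLES=height
      /HISTOGRAM=NORMAL.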
Output
The bar chart consists of a Y axis, representing the frequency, and an X axis, representing each score. Note that the only values represented on the X axis are those values with nonzero frequencies (61, 62, and 71 are not represented).
The pie chart shows the percentage of the whole that is represented by each value.
The Histogram command creates a grouped frequency distribution. The range of scores is split into evenly spaced groups. The midpoint of each group is plotted on the X axis, and the Y axis represents the number of scores for each group.
If you select With Normal Curve, a normal curve will be superimposed over the distribution. This is very useful in determining if the distribution you have is approximately normal. The distribution represented here is clearly not normal due to the asymmetry of the values.
Practice Exercise
Use Practice Data Set 1 in Appendix B. After you have entered the data, construct a histogram that represents the mathematics skills scores and displays a normal curve, and a bar chart that represents the frequencies for the variable AGE.
Section 4.4 Scatterplots
Assumptions
Both variables should be measured on interval or ratio scales. If nominal or ordinal data are used, be cautious about your interpretation of the scattergram.
SPSS Data Format
You need two variables to perform this command.
Running the Command
You can produce scatterplots by clicking Graphs, then Chart Builder. (Note: You can also use the Legacy Dialogs. For this method, please see Appendix F.)
In the Gallery tab, select Scatter/Dot from the Choose from list. Then drag the Simple Scatter icon (top left) up to the main chart area as shown in the screenshot at left. Disregard the Element Properties window that pops up by choosing Close.
Next, drag the HEIGHT variable to the X-Axis area, and the WEIGHT variable to the Y-Axis area (remember that standard graphing conventions indicate that dependent variables should be Y and independent variables should be X. This would mean that we are trying to predict weights from heights). At this point, your screen should look like the example below. Note that your actual data are not shown, just a set of dummy values.
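As a rough syntax equivalent of the Chart Builder steps above, the legacy GRAPH command produces the same kind of simple scatterplot. The variable names assume the HEIGHT.sav file used in this chapter; the Chart Builder itself pastes longer GGRAPH syntax when you click OK.

  GRAPH
    /SCATTERPLOT(BIVAR)=height WITH weight
    /MISSING=LISTWISE.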
Output
The output will consist of a mark for each participant at the appropriate X and Y levels.
[Scatterplot of WEIGHT plotted against HEIGHT.]
Adding a Third Variable
Even though the scatterplot is a two-dimensional graph, it can plot a third variable. To make it do so, select the Groups/Point ID tab in the Chart Builder. Click the Grouping/stacking variable option. Again, disregard the Element Properties window that pops up. Next, drag the variable SEX into the upper-right corner where it indicates Set Color. When this is done, your screen should look like the image at right. If you are not able to drag the variable SEX, it may be because it is not identified as nominal or ordinal in the Variable View window. Click OK to have SPSS produce the graph.
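In legacy syntax, adding the third variable amounts to a BY clause on the same command (again only a sketch; SEX must be a categorical variable for the grouping to be meaningful):

  GRAPH
    /SCATTERPLOT(BIVAR)=height WITH weight BY sex
    /MISSING=LISTWISE.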
Now our output will have two different sets of marks. One set represents the male participants, and the second set represents the female participants. These two sets will appear in two different colors on your screen. You can use the SPSS chart editor (see Section 4.6) to make them different shapes, as shown in the example below.
[Scatterplot of WEIGHT against HEIGHT with separate markers for each value of SEX.]
Practice Exercise
Use Practice Data Set 2 in Appendix B. Construct a scatterplot to examine the relationship between SALARY and EDUCATION.
Section 4.5 Advanced Bar Charts
Description
Bar charts can be produced with the Frequencies command (see Section 4.3). Sometimes, however, we are interested in a bar chart where the Y axis is not a frequency. To produce such a chart, we need to use the Bar charts command.
SPSS Data Format
You need at least two variables to perform this command. There are two basic kinds of bar charts: those for between-subjects designs and those for repeated-measures designs. Use the between-subjects method if one variable is the independent variable and the other is the dependent variable. Use the repeated-measures method if you have a dependent variable for each value of the independent variable (e.g., you would have three variables for a design with three values of the independent variable). This normally occurs when you make multiple observations over time.
This example uses the GRADES.sav data file, which will be created in Chapter 6. Please see Section 6.4 for the data if you would like to follow along.
Running the Command
Open the Chart Builder by clicking Graphs, then Chart Builder. In the Gallery tab, select Bar. If you had only one independent variable, you would select the Simple Bar chart example (top left corner). If you have more than one independent variable (as in this example), select the Clustered Bar Chart example from the middle of the top row.
Drag the example to the top working area. Once you do, the working area should look like the screenshot below. (Note that you will need to open the data file you would like to graph in order to run this command.)
If you are using a repeated-measures design like our example here using GRADES.sav from Chapter 6 (three different variables representing the Y values that we want), you need to select all three variables (you can <Ctrl>-click them to select multiple variables) and then drag all three variable names to the Y-Axis area. When you do, you will be given a warning message. Click OK.
Next, you will need to drag the INSTRUCT variable to the top right in the Cluster: set color area (see the screenshot at left).
Note: The Chart Builder pays attention to the types of variables that you ask it to graph. If you are getting error messages or unusual results, be sure that your categorical variables are properly designated as Nominal in the Variable View tab (see Chapter 2, Section 2.1).
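A legacy-syntax sketch of this repeated-measures clustered bar chart is shown below. It assumes the GRADES.sav variable names and plots the mean of each of the three score variables, clustered by instructor; the Chart Builder generates equivalent (but longer) GGRAPH syntax when you click OK.

  GRAPH
    /BAR(GROUPED)=MEAN(pretest) MEAN(midterm) MEAN(final) BY instruct.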
Output
[Clustered bar chart of the mean PRETEST, MIDTERM, and FINAL scores for each instructor.]
Practice Exercise
Use Practice Data Set 1 in Appendix B. Construct a clustered bar graph examining the relationship between MATHEMATICS SKILLS scores (as the dependent variable) and MARITAL STATUS and SEX (as independent variables). Make sure you classify both SEX and MARITAL STATUS as nominal variables.
Section 4.6 Editing SPSS Graphs
Graphs can be modified after they are created. Double-click a graph in the output window to open it in the SPSS Chart Editor.
Once the Chart Editor is open, you can easily edit each element of the graph. To select an element, just click on the relevant spot on the graph. For example, if you have added a title to your graph ("Histogram" in the example that follows), you may select the element representing the title of the graph by clicking anywhere on the title.
Once you have selected an element, you can tell whether the correct element is selected because it will have handles around it.
If the item you have selected is a text element (e.g., the title of the graph), a cursor will be present and you can edit the text as you would in a word processing program. If you would like to change another attribute of the element (e.g., the color or font size), use the Properties box. (Text properties are shown below.)
With a little practice, you can make excellent graphs using SPSS. Once your graph is formatted the way you want it, simply select File, Save, then Close.
Chapter 5
Prediction and Association
Section 5.1 Pearson Correlation Coefficient
Running the Command
Click Analyze, then Correlate, then Bivariate. This will bring up the main dialog box for Bivariate Correlations.
Move at least two variables from the box at left into the box at right by using the transfer arrow (or by double-clicking each variable). Make sure that a check is in the Pearson box under Correlation Coefficients. It is acceptable to move more than two variables.
For our example, we will move all three variables over and click OK.
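The same analysis can be pasted as syntax. A minimal sketch, assuming the three HEIGHT.sav variables used here:

  CORRELATIONS
    /VARIABLES=height weight sex
    /PRINT=TWOTAIL NOSIG
    /MISSING=PAIRWISE.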
Reading the Output
The output consists of a correlation matrix. Every variable you entered in the command is represented as both a row and a column. We entered three variables in our command. Therefore, we have a 3 x 3 table. There are also three rows in each cell: the correlation, the significance level, and the N. If a correlation is significant at less than the .05 level, a single * will appear next to the correlation. If it is significant at the .01 level or lower, ** will appear next to the correlation. For example, the correlation in the output at right has a significance level of < .001, so it is flagged with ** to indicate that it is less than .01.

Correlations
                              height    weight    sex
height  Pearson Correlation   1         .806**    -.644**
        Sig. (2-tailed)                 .000      .007
        N                     16        16        16
weight  Pearson Correlation   .806**    1         -.968**
        Sig. (2-tailed)       .000                .000
        N                     16        16        16
sex     Pearson Correlation   -.644**   -.968**   1
        Sig. (2-tailed)       .007      .000
        N                     16        16        16
**. Correlation is significant at the 0.01 level (2-tailed).

To read the correlations, select a row and a column. For example, the correlation between height and weight is determined through selection of the WEIGHT row and the HEIGHT column (.806). We get the same answer by selecting the HEIGHT row and the WEIGHT column. The correlation between a variable and itself is always 1, so there is a diagonal set of 1s.
Drawing Conclusions
The correlation coefficient will be between -1.0 and +1.0. Coefficients close to 0.0 represent a weak relationship. Coefficients close to 1.0 or -1.0 represent a strong relationship. Generally, correlations greater than 0.7 are considered strong. Correlations less than 0.3 are considered weak. Correlations between 0.3 and 0.7 are considered moderate.
Significant correlations are flagged with asterisks. A significant correlation indicates a reliable relationship, but not necessarily a strong correlation. With enough participants, a very small correlation can be significant. Please see Appendix A for a discussion of effect sizes for correlations.
Phrasing Results That Are Significant
A Pearson correlation coefficient was calculated for the relationship between participants' height and weight. A strong positive correlation was found (r(14) = .806, p < .001), indicating a significant linear relationship between the two variables. Taller participants tend to weigh more.
The conclusion states the direction (positive), strength (strong), value (.806), degrees of freedom (14), and significance level (< .001) of the correlation. In addition, a statement of direction is included (taller is heavier).
Note that the degrees of freedom given in parentheses is 14. The output indicates an N of 16. While most SPSS procedures give degrees of freedom, the correlation command gives only the N (the number of pairs). For a correlation, the degrees of freedom is N - 2.
Phrasing Results That Are Not Significant
Using our SAMPLE.sav data set from the previous chapters, we could calculate a correlation between ID and GRADE. If so, we get the output at right.

Correlations
                             ID      GRADE
ID      Pearson Correlation  1.000   .217
        Sig. (2-tailed)              .783
        N                    4       4
GRADE   Pearson Correlation  .217    1.000
        Sig. (2-tailed)      .783
        N                    4       4

The correlation has a significance level of .783. Thus, we could write the following in a results section (note that the degrees of freedom is N - 2):
A Pearson correlation was calculated examining the relationship between participants' ID numbers and grades. A weak correlation that was not significant was found (r(2) = .217, p > .05). ID number is not related to grade in the course.
Practice Exercise
Use Practice Data Set 2 in Appendix B. Determine the value of the Pearson correlation coefficient for the relationship between SALARY and YEARS OF EDUCATION.
Section 5.2 Spearman Correlation Coefficient
SPSS Data Format
Two variables are required in your SPSS data file. Each subject must provide data for both variables.
Running the Command
Click Analyze, then Correlate, then Bivariate. This will bring up the main dialog box for Bivariate Correlations (just like the Pearson correlation). About halfway down the dialog box, there is a section for indicating the type of correlation you will compute. You can select as many correlations as you want. For our example, remove the check in the Pearson box (by clicking on it) and click on the Spearman box.
Use the variables HEIGHT and WEIGHT from our HEIGHT.sav data file (Chapter 4). This is also one of the few commands that allows you to choose a one-tailed test, if desired.
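In syntax, Spearman correlations come from the NONPAR CORR command rather than CORRELATIONS. A sketch for the two variables used here:

  NONPAR CORR
    /VARIABLES=height weight
    /PRINT=SPEARMAN TWOTAIL NOSIG.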
Drawing Conclusions
The correlation will be between -1.0 and +1.0. Scores close to 0.0 represent a weak relationship. Scores close to 1.0 or -1.0 represent a strong relationship. Significant correlations are flagged with asterisks. A significant correlation indicates a reliable relationship, but not necessarily a strong correlation. With enough participants, a very small correlation can be significant. Generally, correlations greater than 0.7 are considered strong. Correlations less than 0.3 are considered weak. Correlations between 0.3 and 0.7 are considered moderate.
Phrasing Results That Are Not Significant
A Spearman rho correlation coefficient was calculated for the relationship between a subject's ID number and grade. An extremely weak correlation that was not significant was found (r(2) = .000, p > .05). ID number is not related to grade in the course.
Practice Exercise
Use Practice Data Set 2 in Appendix B. Determine the strength of the relationship between salary and job classification by calculating the Spearman rho correlation.
Section 5.3 Simple Linear Regression
Description
Simple linear regression allows the prediction of one variable from another.
Assumptions
Simple linear regression assumes that both variables are interval- or ratio-scaled. In addition, the dependent variable should be normally distributed around the prediction line. This, of course, assumes that the variables are related to each other linearly. Typically, both variables should be normally distributed. Dichotomous variables (variables with only two levels) are also acceptable as independent variables.
SPSS Data Format
Two variables are required in the SPSS data file. Each subject must contribute to both values.
Running the Command
Click Analyze, then Regression, then Linear. This will bring up the main dialog box for Linear Regression. On the left side of the dialog box is a list of the variables in your data file (we are using the HEIGHT.sav data file from the start of this section). On the right are blocks for the dependent variable (the variable you are trying to predict) and the independent variable (the variable from which we are predicting).
We are interested in predicting someone's weight on the basis of his or her height. Thus, we should place the variable WEIGHT in the dependent variable block and the variable HEIGHT in the independent variable block. Then we can click OK to run the analysis.
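A syntax sketch of the same analysis (the dialog choices above, with the HEIGHT.sav variable names):

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA
    /DEPENDENT weight
    /METHOD=ENTER height.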
Reading the Output
For simple linear regressions, we are interested in three components of the output. The first is called the Model Summary, and it occurs after the Variables Entered/Removed section. For our example, you should see this output. R Square (called the coefficient of determination) gives you the proportion of the variance of your dependent variable (WEIGHT) that can be explained by variation in your independent variable (HEIGHT). Thus, 64.9% of the variation in weight can be explained by differences in height (taller individuals weigh more).
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .806   .649       .624                16.14801
a. Predictors: (Constant), height

The standard error of estimate gives you a measure of dispersion for your prediction equation. When the prediction equation is used, 68% of the data will fall within one standard error of the estimated (predicted) value. Just over 95% will fall within two standard errors. Thus, in the previous example, 95% of the time, our estimated weight will be within 32.296 pounds (i.e., 2 x 16.148 = 32.296) of being correct.
[ANOVA table (dependent variable: WEIGHT; predictor: (Constant), HEIGHT): Regression df = 1, Residual df = 14, Total df = 15; F = 25.926, Sig. = .000.]
The second part of the output that we are interested in is the ANOVA summary table, as shown above. The important number here is the significance level in the rightmost column. If that value is less than .05, then we have a significant linear regression. If it is larger than .05, we do not.
The final section of the output is the table of coefficients. This is where the actual prediction equation can be found.

Coefficients
                   Unstandardized Coefficients       Standardized Coefficients
Model              B            Std. Error           Beta                        t        Sig.
1   (Constant)     -234.681     71.552                                           -3.280   .005
    height         5.434        1.067                .806                        5.092    .000
a. Dependent Variable: weight

In most texts, you learn that Y' = a + bX is the regression equation. Y' (pronounced "Y prime") is your dependent variable (primes are normally predicted values or dependent variables), and X is your independent variable. In SPSS output, the values of both a and b are found in the B column. The first value, -234.681, is the value of a (labeled Constant). The second value, 5.434, is the value of b (labeled with the name of the independent variable). Thus, our prediction equation for the example above is WEIGHT' = -234.681 + 5.434(HEIGHT). In other words, the average subject who is an inch taller than another subject weighs 5.434 pounds more. A person who is 60 inches tall should weigh -234.681 + 5.434(60) = 91.359 pounds. Given our earlier discussion of standard error of estimate, 95% of individuals who are 60 inches tall will weigh between 59.063 (91.359 - 32.296 = 59.063) and 123.655 (91.359 + 32.296 = 123.655) pounds.
Drawing Conclusions
Conclusions from regression analyses indicate (a) whether or not a significant prediction equation was obtained, (b) the direction of the relationship, and (c) the equation itself.
Phrasing Results That Are Significant
In the examples on pages 46 and 47, we obtained an R Square of .649 and a regression equation of WEIGHT' = -234.681 + 5.434(HEIGHT). The ANOVA resulted in F = 25.926 with 1 and 14 degrees of freedom. The F is significant at the less than .001 level. Thus, we could state the following in a results section:
A simple linear regression was calculated predicting participants' weight based on their height. A significant regression equation was found (F(1,14) = 25.926, p < .001), with an R² of .649. Participants' predicted weight is equal to -234.68 + 5.43(HEIGHT) pounds when height is measured in inches. Participants' average weight increased 5.43 pounds for each inch of height.
The conclusion states the direction (increase), strength (.649), value (25.926), degrees of freedom (1,14), and significance level (< .001) of the regression. In addition, a statement of the equation itself is included.
Phrasing Results That Are Not Significant
If the ANOVA is not significant (e.g., see the output at right), the Sig. section of the output for the ANOVA will be greater than .05, and the regression equation is not significant.
[Model Summary, ANOVA, and Coefficients tables for a regression predicting ACT from HEIGHT: R Square = .227, F = 4.12, Sig. = .062.]
A results section might include the following statement:
A simple linear regression was calculated predicting participants' ACT scores based on their height. The regression equation was not significant (F(1,14) = 4.12, p > .05) with an R² of .227. Height is not a significant predictor of ACT scores.
Note that for results that are not significant, the ANOVA results and R² results are given, but the regression equation is not.
Practice Exercise
Use Practice Data Set 2 in Appendix B. If we want to predict salary from years of education, what salary would you predict for someone with 12 years of education? What salary would you predict for someone with a college education (16 years)?
Section 5.4 Multiple Linear Regression
Assumptions
Multiple linear regression assumes that all variables are interval- or ratio-scaled. In addition, the dependent variable should be normally distributed around the prediction line. This, of course, assumes that the variables are related to each other linearly. All variables should be normally distributed. Dichotomous variables are also acceptable as independent variables.
SPSS Data Format
At least three variables are required in the SPSS data file. Each subject must contribute to all values.
Running the Command
Click Analyze, then Regression, then Linear. This will bring up the main dialog box for Linear Regression. On the left side of the dialog box is a list of the variables in your data file (we are using the HEIGHT.sav data file from the start of this chapter). On the right side of the dialog box are blanks for the dependent variable (the variable you are trying to predict) and the independent variables (the variables from which you are predicting).
We are interested in predicting someone's weight based on his or her height and sex. We believe that both sex and height influence weight. Thus, we should place the dependent variable WEIGHT in the Dependent block and the independent variables HEIGHT and SEX in the Independent(s) block. Enter both in Block 1. This will perform an analysis to determine if WEIGHT can be predicted from SEX and/or HEIGHT.
There are several methods SPSS can use to conduct this analysis. These can be selected with the Method box. Method Enter, the most widely used, puts all variables in the equation, whether they are significant or not. The other methods use various means to enter only those variables that are significant predictors. Click OK to run the analysis.
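The corresponding syntax sketch simply lists both predictors on the METHOD subcommand (Enter method, as in the dialog above):

  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA
    /DEPENDENT weight
    /METHOD=ENTER height sex.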
Reading the Output
For multiple linear regression, there are three components of the output in which we are interested. The first is called the Model Summary, which is found after the
Variables Entered/Removed section. For our example, you should get the output above. R Square (called the coefficient of determination) tells you the proportion of the variance in the dependent variable (WEIGHT) that can be explained by variation in the independent variables (HEIGHT and SEX, in this case). Thus, 99.3% of the variation in weight can be explained by differences in height and sex (taller individuals weigh more, and men weigh more). Note that when a second variable is added, our R Square goes up from .649 to .993. The .649 was obtained using the Simple Linear Regression example in Section 5.3.
The Standard Error of the Estimate gives you a margin of error for the prediction equation. Using the prediction equation, 68% of the data will fall within one standard error of the estimated (predicted) value. Just over 95% will fall within two standard errors of the estimate. Thus, in the example above, 95% of the time, our estimated weight will be within 4.591 (2.296 x 2) pounds of being correct. In our Simple Linear Regression example in Section 5.3, this number was 32.296. Note the higher degree of accuracy.
The second part of the output that we are interested in is the ANOVA summary table. For more information on reading ANOVA tables, refer to the sections on ANOVA in Chapter 6. For now, the important number is the significance in the rightmost column. If that value is less than .05, we have a significant linear regression. If it is larger than .05, we do not.

ANOVA
Model          Sum of Squares   df   F         Sig.
1 Regression   10342.424        2    981.202   .000
  Residual     68.514           13
  Total        10410.938        15
a. Predictors: (Constant), sex, height
b. Dependent Variable: weight
The final section of output we are interested in is the table of coefficients. This is where the actual prediction equation can be found.
[Coefficients table for the model predicting WEIGHT from HEIGHT and SEX; the B values used in the prediction equation below are Constant = 47.138, SEX = -39.133, and HEIGHT = 2.101.]
In most texts, you learn that Y' = a + bX is the regression equation. For multiple regression, our equation changes to Y' = B0 + B1X1 + B2X2 + ... + BzXz (where z is the number of independent variables). Y' is your dependent variable, and the Xs are your independent variables. The Bs are listed in a column. Thus, our prediction equation for the example above is WEIGHT' = 47.138 - 39.133(SEX) + 2.101(HEIGHT) (where SEX is coded as 1 = Male, 2 = Female, and HEIGHT is in inches). In other words, the average difference in weight for participants who differ by one inch in height is 2.101 pounds. Males tend to weigh 39.133 pounds more than females. A female who is 60 inches tall should weigh 47.138 - 39.133(2) + 2.101(60) = 94.932 pounds. Given our earlier discussion of the standard error of estimate, 95% of females who are 60 inches tall will weigh between 90.341 (94.932 - 4.591 = 90.341) and 99.523 (94.932 + 4.591 = 99.523) pounds.
Drawing Conclusions
Conclusions from regression analyses indicate (a) whether or not a significant prediction equation was obtained, (b) the direction of the relationship, and (c) the equation itself. Multiple regression is generally much more powerful than simple linear regression. Compare our two examples.
With multiple regression, you must also consider the significance level of each independent variable. In the example above, the significance level of both independent variables is less than .001.
Phrasing Results That Are Significant
A multiple linear regression was calculated to predict participants' weight based on their height and sex. A significant regression equation was found (F(2,13) = 981.202, p < .001), with an R² of .993. Participants' predicted weight is equal to 47.138 - 39.133(SEX) + 2.101(HEIGHT), where SEX is coded as 1 = Male, 2 = Female, and HEIGHT is measured in inches. Participants increased 2.101 pounds for each inch of height, and males weighed 39.133 pounds more than females. Both sex and height were significant predictors.
The conclusion states the direction (increase), strength (.993), value (981.20), degrees of freedom (2,13), and significance level (< .001) of the regression. In addition, a statement of the equation itself is included. Because there are multiple independent variables, we have noted whether or not each is significant.
Phrasing Results That Are Not Significant
If the ANOVA does not find a significant relationship, the Sig. section of the output will be greater than .05, and the regression equation is not significant.
[Model Summary, ANOVA, and Coefficients tables for a model predicting ACT from HEIGHT and SEX: R Square = .279, F(2,13) = 2.511, not significant.]
A results section for the output at right might include the following statement:
A multiple linear regression was calculated predicting participants' ACT scores based on their height and sex. The regression equation was not significant (F(2,13) = 2.511, p > .05) with an R² of .279. Neither height nor sex is a significant predictor of ACT scores.
Practice Exercise
Use Practice Data Set 2 in Appendix B. Determine the prediction equation for predicting salary based on years of education, years of service, and sex. Which variables are significant predictors? If you believe that men were paid more than women were, what would you conclude after conducting this analysis?
Chapter 6
Parametric Inferential Statistics
Parametric statistical procedures allow you to draw inferences about populations based on samples of those populations. To make these inferences, you must be able to make certain assumptions about the shape of the distributions of the population samples.
Section 6.1 Review of Basic Hypothesis Testing
[Figure: a 2 x 2 table crossing the real world (null hypothesis true or false) with our decision; rejecting a true null hypothesis is a Type I error, and failing to reject a false null hypothesis is a Type II error. The other two cells are correct decisions.]
All hypothesis testing attempts to draw conclusions about the real world based on the results of a test (a statistical test, in this case). There are four possible combinations of results (see the figure at right). Two of the possible results are correct test results. The other two results are errors. A Type I error occurs when we reject a null hypothesis that is, in fact, true, while a Type II error occurs when we fail to reject a null hypothesis that is, in fact, false.
Significance tests determine the probability of making a Type I error. In other words, after performing a series of calculations, we obtain a probability that the null hypothesis is true. If there is a low probability, such as 5 or less in 100 (.05), by convention, we reject the null hypothesis. In other words, we typically use the .05 level (or less) as the maximum Type I error rate we are willing to accept.
When there is a low probability of a Type I error, such as .05, we can state that the significance test has led us to "reject the null hypothesis." This is synonymous with saying that a difference is "statistically significant." For example, on a reading test, suppose you found that a random sample of girls from a school district scored higher than a random
sample of boys. This result may have been obtained merely because chance errors associated with random sampling created the observed difference (this is what the null hypothesis asserts). If there is a sufficiently low probability that random errors were the cause (as determined by a significance test), we can state that the difference between boys and girls is statistically significant.
Significance Levels vs. Critical Values
Most statistics textbooks present hypothesis testing by using the concept of a critical value. With such an approach, we obtain a value for a test statistic and compare it to a critical value we look up in a table. If the obtained value is larger than the critical value, we reject the null hypothesis and conclude that we have found a significant difference (or relationship). If the obtained value is less than the critical value, we fail to reject the null hypothesis and conclude that there is not a significant difference.
The critical-value approach is well suited to hand calculations. Tables that give critical values for alpha levels of .001, .01, .05, etc., can be created. It is not practical to create a table for every possible alpha level. On the other hand, SPSS can determine the exact alpha level associated with any value of a test statistic. Thus, looking up a critical value in a table is not necessary. This, however, does change the basic procedure for determining whether or not to reject the null hypothesis.
The section of SPSS output labeled Sig. (sometimes p or alpha) indicates the likelihood of making a Type I error if we reject the null hypothesis. A value of .05 or less indicates that we should reject the null hypothesis (assuming an alpha level of .05). A value greater than .05 indicates that we should fail to reject the null hypothesis. In other words, when using SPSS, we normally reject the null hypothesis if the output value under Sig. is equal to or smaller than .05, and we fail to reject the null hypothesis if the output value is larger than .05.
One-Tailed vs. Two-Tailed Tests
SPSS output generally includes a two-tailed alpha level (normally labeled Sig. in the output). A two-tailed hypothesis attempts to determine whether any difference (either positive or negative) exists. Thus, you have an opportunity to make a Type I error on either of the two tails of the normal distribution. A one-tailed test examines a difference in a specific direction. Thus, we can make a Type I error on only one side (tail) of the distribution. If we have a one-tailed hypothesis, but our SPSS output gives a two-tailed significance result, we can take the significance level in the output and divide it by two. Thus, if our difference is in the right direction, and if our output indicates a significance level of .084 (two-tailed), but we have a one-tailed hypothesis, we can report a significance level of .042 (one-tailed).
Phrasing Results
Results of hypothesis testing can be stated in different ways, depending on the conventions specified by your institution. The following examples illustrate some of these differences.
Degrees of Freedom
Sometimes the degrees of freedom are given in parentheses immediately after the symbol representing the test, as in this example:
t(3) = 7.00, p < .01
Other times, the degrees of freedom are given within the statement of results, as in this example:
t = 7.00, df = 3, p < .01
Significance Level
When you obtain results that are significant, they can be described in different ways. For example, if you obtained a significance level of .006 on a t test, you could describe it in any of the following three ways:
t(3) = 7.00, p < .05
t(3) = 7.00, p < .01
t(3) = 7.00, p = .006
Notice that because the exact probability is .006, both .05 and .01 are also correct. There are also various ways of describing results that are not significant. For example, if you obtained a significance level of .505, any of the following three statements could be used:
t(2) = .805, ns
t(2) = .805, p > .05
t(2) = .805, p = .505
Statement of Results
Sometimes the results will be stated in terms of the null hypothesis, as in the following example:
The null hypothesis was rejected (t = 7.00, df = 3, p = .006).
Other times, the results are stated in terms of their level of significance, as in the following example:
A statistically significant difference was found: t(3) = 7.00, p < .01.
Statistical Symbols
Generally, statistical symbols are presented in italics. Prior to the widespread use of computers and desktop publishing, statistical symbols were underlined. Underlining is a signal to a printer that the underlined text should be set in italics. Institutions vary on their requirements for student work, so you are advised to consult your instructor about this.
Section 6.2 Single-Sample t Test
Description
The single-sample t test compares the mean of a single sample to a known population mean. It is useful for determining if the current set of data has changed from a long-term value (e.g., comparing the current year's temperatures to a historical average to determine if global warming is occurring).
Assumptions
The distributions from which the scores are taken should be normally distributed. However, the t test is robust and can handle violations of the assumption of a normal distribution. The dependent variable must be measured on an interval or ratio scale.
SPSS Data Format
The SPSS data file for the single-sample t test requires a single variable in SPSS. That variable represents the set of scores in the sample that we will compare to the population mean.
Running the Command
The single-sample t test is located in the Compare Means submenu, under the Analyze menu. The dialog box for the single-sample t test requires that we transfer the variable representing the current set of scores to the Test Variable(s) section. We must also enter the population average in the Test Value blank. The example presented here is testing the variable LENGTH against a population mean of 35 (this example uses a hypothetical data set).
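A syntax sketch of the same test, assuming the hypothetical variable LENGTH and a test value of 35:

  T-TEST
    /TESTVAL=35
    /VARIABLES=length.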
Reading the Output
The output for the single-sample t test consists of two sections. The first section lists the sample variable and some basic descriptive statistics (N, mean, standard deviation, and standard error).
T-Test
One-Sample Statistics
          N    Mean
LENGTH    10   35.9000

One-Sample Test (Test Value = 35)
          t       df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference
LENGTH    2.377   9    .041              .9000             .04356 to 1.7564
(The statistics table also reports the standard deviation and the standard error of the mean.)
The second section of output contains the results of the t test. The example presented here indicates a t value of 2.377, with 9 degrees of freedom and a significance level of .041. The mean difference of .9000 is the difference between the sample average (35.90) and the population average we entered in the dialog box to conduct the test (35.00).
Drawing Conclusions
The t test assumes an equality of means. Therefore, a significant result indicates that the sample mean is not equivalent to the population mean (hence the term "significantly different"). A result that is not significant means that there is not a significant difference between the means. It does not mean that they are equal. Refer to your statistics text for the section on failure to reject the null hypothesis.
Phrasing Results That Are Not Significant
[One-Sample Statistics and One-Sample Test tables for the temperature example: N = 9, mean = 68.667, standard error = 3.037; Test Value = 67.4, mean difference = 1.267, t = .417, df = 8, Sig. (2-tailed) = .688.]
A single-sample t test compared the mean temperature over the past year to the long-term average. The difference was not significant (t(8) = .417, p > .05). The mean temperature over the past year was 68.67 (sd = 9.11) compared to the long-term average of 67.4.
Practice Exercise
The average salary in the U.S. is $25,000. Determine if the average salary of the participants in Practice Data Set 2 (Appendix B) is significantly greater than this value. Note that this is a one-tailed hypothesis.
Section 6.3 Independent-Samples t Test
... the experimental group). The second variable represents the dependent variable, such as scores on a test.
Conducting an Independent-Samples t Test
For our example, we will use the SAMPLE.sav data file. Click Analyze, then Compare Means, then Independent-Samples T Test. This will bring up the main dialog box. Transfer the dependent variable(s) into the Test Variable(s) blank. For our example, we will use
the variable GRADE. Transfer the independent variable into the Grouping Variable section. For our example, we will use the variable MORNING.
Next, click Define Groups and enter the values of the two levels of the independent variable. Independent t tests are capable of comparing only two levels at a time. Click Continue, then click OK to run the analysis.
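A syntax sketch of this test is shown below. The two values in parentheses must match however MORNING is actually coded in SAMPLE.sav; 0 and 1 are shown only as an illustration.

  T-TEST GROUPS=morning(0 1)
    /VARIABLES=grade.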
Output from the Independent-Samples t Test
The output will have a section labeled "Group Statistics." This section provides the basic descriptive statistics for the dependent variable(s) for each value of the independent variable. It should look like the output below.
[Group Statistics table: GRADE for MORNING = No (N = 2) and MORNING = Yes (N = 2), with the mean and standard deviation for each group.]
Next, there will be a section with the results of the t test. It should look like the output below.

Independent Samples Test (grade)
                              t      df      Sig. (2-tailed)   Mean Difference
Equal variances assumed       .805   2       .505              4.50000
Equal variances not assumed   .805   1.471   .530              4.50000
(The full table also reports Levene's Test for Equality of Variances, the standard error of the difference, and the 95% confidence interval of the difference.)
The columns labeled t, df, and Sig. (2-tailed) provide the standard "answer" for the t test. They provide the value of t, the degrees of freedom (the number of participants minus 2, in this case), and the significance level (often called p). Normally, we use the "Equal variances assumed" row.
Drawing Conclusions
Recall from the previous section that the t test assumes an equality of means. Therefore, a significant result indicates that the means are not equivalent. When drawing conclusions about a t test, you must state the direction of the difference (i.e., which mean was larger than the other). You should also include information about the value of t, the degrees of freedom, the significance level, and the means and standard deviations for the two groups.
Phrasing Results That Are Significant
For a significant t test (for example, the output below), you might state the following:
[Group Statistics and Independent Samples Test tables for the control and experimental groups; the key values appear in the results statement that follows.]
An independent-samples t test comparing the mean scores of the experimental and control groups found a significant difference between the means of the two groups (t(5) = 2.835, p < .05). The mean of the experimental group was significantly lower (m = 33.333, sd = 2.08) than the mean of the control group (m = 41.000, sd = 4.24).
Phrasing Results That Are Not Significant
In our example at the start of the section, we compared the scores of the morning people to the scores of the nonmorning people. We did not find a significant difference, so we could state the following:
An independent-samples t test was calculated comparing the mean score of participants who identified themselves as morning people to the mean score of participants who did not identify themselves as morning people. No significant difference was found (t(2) = .805, p > .05). The mean of the morning people (m = 78.00, sd = 7.07) was not significantly different from the mean of nonmorning people (m = 82.50, sd = 3.54).
Practice Exercise
Use Practice Data Set 1 (Appendix B) to solve this problem. We believe that young individuals have lower mathematics skills than older individuals. We would test this hypothesis by comparing participants 25 or younger (the "young" group) with participants 26 or older (the "old" group). Hint: You may need to create a new variable that represents which age group they are in. See Chapter 2 for help.
Section 6.4 Paired-Samples t Test
[Data table for GRADES.sav: 21 participants, with columns for PRETEST, MIDTERM, FINAL, INSTRUCT (coded 1, 2, or 3, with seven students per instructor), and REQUIRED (coded 0 or 1).]
Enter the data and save it as GRADES.sav. You can check your data entry by computing a mean for each instructor using the Means command (see Chapter 3 for more information). Use INSTRUCT as the independent variable and enter PRETEST, MIDTERM, and FINAL as your dependent variables.
[Report table from the Means command: the mean, N, and standard deviation of PRETEST, MIDTERM, and FINAL for each instructor and for the total sample (e.g., PRETEST means of 67.57, 63.14, and 59.29 for Instructors 1, 2, and 3; overall PRETEST mean 63.33 and FINAL mean 86.14).]
Once you have entered the data, conduct a paired-samples t test comparing pretest scores and final scores. Click Analyze, then Compare Means, then Paired-Samples T Test. This will bring up the main dialog box.
You must select pairs of variables to compare. As you select them, they are placed in the Current Selections area. Click once on PRETEST, then once on FINAL. Both variables will be moved into the Current Selections area. Click on the right arrow to transfer the pair to the Paired Variables section. Click OK to conduct the test.
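A syntax sketch of the same paired comparison, using the GRADES.sav variable names:

  T-TEST PAIRS=pretest WITH final (PAIRED).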
Reading the Output
The output for the paired-samples t test consists of three components. The first part gives you basic descriptive statistics for the pair of variables. The PRETEST average was 63.33, with a standard deviation of 8.93. The FINAL average was 86.14, with a standard deviation of 9.63.

Paired Samples Statistics
                   Mean      N    Std. Deviation   Std. Error Mean
Pair 1  PRETEST    63.3333   21   8.9294           1.9485
        FINAL      86.1429   21   9.6348           2.1025

Paired Samples Correlations
                           N    Correlation   Sig.
Pair 1  PRETEST & FINAL    21   .535          .013
The second part of the output is a Pearson correlation coefficient for the pair of variables.
Within the third part of the output (on the next page), the section called Paired Differences contains information about the differences between the two variables. You may have learned in your statistics class that the paired-samples t test is essentially a single-sample t test calculated on the differences between the scores. The final three columns contain the value of t, the degrees of freedom, and the probability level. In the example presented here, we obtained a t of -11.646, with 20 degrees of freedom and a significance level of less than .001. Note that this is a two-tailed significance level. See the start of this chapter for more details on computing a one-tailed test.
[Paired Samples Test table: the mean of the differences (PRETEST - FINAL), with a standard deviation of 8.9756; t = -11.646, df = 20, Sig. (2-tailed) = .000.]
Drawing Conclusions
Paired-samples t tests determine whether or not two scores are significantly different from each other. Significant values indicate that the two scores are different. Values that are not significant indicate that the scores are not significantly different.
Phrasing Results That Are Significant
When stating the results of a paired-samples t test, you should give the value of t, the degrees of freedom, and the significance level. You should also give the mean and standard deviation for each variable, as well as a statement of results that indicates whether you conducted a one- or two-tailed test. Our example above was significant, so we could state the following:
A paired-samples t test was calculated to compare the mean pretest score to the mean final exam score. The mean on the pretest was 63.33 (sd = 8.93), and the mean on the posttest was 86.14 (sd = 9.63). A significant increase from pretest to final was found (t(20) = -11.646, p < .001).
Phrasing Results That Are Not Significant
If the significance level had been greater than .05 (or greater than .10 if you were conducting a one-tailed test), the result would not have been significant. For example, the hypothetical output below represents a nonsignificant difference. For this output, we could state:
[Paired Samples Statistics and Paired Samples Test tables for MIDTERM and FINAL (N = 7): mean difference = -.857, standard deviation = 2.968, t = -.764, df = 6.]
A paired-samples t test was calculated to compare the mean midterm score to the mean final exam score. The mean on the midterm was 78.71 (sd = 9.95), and the mean on the final was 79.57 (sd = 7.96). No significant difference from midterm to final was found (t(6) = -.764, p > .05).
Practice Exercise
Use the same GRADES.sav data file, and compute a paired-samples t test to determine if scores increased from midterm to final.
Section 6.5 One-Way ANOVA
Description
Analysis of variance (ANOVA) is a procedure that determines the proportion of variability attributed to each of several components. It is one of the most useful and adaptable statistical techniques available.
The one-way ANOVA compares the means of two or more groups of participants that vary on a single independent variable (thus, the one-way designation). When we have three groups, we could use a t test to determine differences between groups, but we would have to conduct three t tests (Group 1 compared to Group 2, Group 1 compared to Group 3, and Group 2 compared to Group 3). When we conduct multiple t tests, we inflate the Type I error rate and increase our chance of drawing an inappropriate conclusion. ANOVA compensates for these multiple comparisons and gives us a single answer that tells us if any of the groups is different from any of the other groups.
Assumptions
The one-way ANOVA requires a single dependent variable and a single independent variable. Which group participants belong to is determined by the value of the independent variable. Groups should be independent of each other. If our participants belong to more than one group each, we will have to conduct a repeated-measures ANOVA. If we have more than one independent variable, we would conduct a factorial ANOVA. ANOVA also assumes that the dependent variable is at the interval or ratio levels and is normally distributed.
SPSS Data Format
Two variables are required in the SPSS data file. One variable serves as the dependent variable and the other as the independent variable. Each participant should provide only one score for the dependent variable.
Running the Command
For this example, we will use the GRADES.sav data file we created in the previous section.
To conduct a one-way ANOVA, click Analyze, then Compare Means, then One-Way ANOVA. This will bring up the main dialog box for the One-Way ANOVA command.
You should place the independent variable in the Factor box. For our example, INSTRUCT represents three different instructors, and it will be used as our independent variable. Our dependent variable will be FINAL. This test will allow us to determine if the instructor has any effect on final grades in the course.
Click on the Options box to get the Options dialog box. Click Descriptive. This will give you means for the dependent variable at each level of the independent variable. Checking this box prevents us from having to run a separate Means command. Click Continue to return to the main dialog box. Next, click Post Hoc to bring up the Post Hoc Multiple Comparisons dialog box. Click Tukey, then Continue.
Post-hoc tests are necessary in the event of a significant ANOVA. The ANOVA only indicates if any group is different from any other group. If it is significant, we need to determine which groups are different from which other groups. We could do t tests to determine that, but we would have the same problem as before with inflating the Type I error rate. There are a variety of post-hoc comparisons that correct for the multiple comparisons. The most widely used is Tukey's HSD. SPSS will calculate a variety of post-hoc tests for you. Consult an advanced statistics text for a discussion of the differences between these various tests. Now click OK to run the analysis.
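A syntax sketch of this analysis, including the descriptive statistics and the Tukey post-hoc test selected above:

  ONEWAY final BY instruct
    /STATISTICS DESCRIPTIVES
    /POSTHOC=TUKEY ALPHA(0.05).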
Reading the Output
Descriptive statistics will be given for each instructor (i.e., each level of the independent variable) and the total. For example, Instructor 1 had an average final exam score of 79.57 in his/her class.

Descriptives (FINAL)
        N    Mean      Std. Deviation
1.00    7    79.5714   7.95523
2.00    7    86.4286   10.92180
3.00    7    92.4286   5.50325
Total   21   86.1429   9.63476
(The full table also reports the standard error, the 95% confidence interval for the mean, and the minimum and maximum for each group.)
The next section of the output is the ANOVA source table. This is where the various components of the variance have been listed, along with their relative sizes. For a one-way ANOVA, there are two components to the variance: Between Groups (which represents the differences due to our independent variable) and Within Groups (which represents differences within each level of our independent variable). For our example, the Between Groups variance represents differences due to different instructors. The Within Groups variance represents individual differences in students.

ANOVA (FINAL)
                 Sum of Squares   df   F
Between Groups   579.429          2    4.08
Within Groups    1277.143         18
Total            1856.571         20
(The table also reports the mean squares and the significance level, which is below .05 here.)

The primary answer is F. F is a ratio of explained variance to unexplained variance. Consult a statistics text for more information on how it is determined. The F has two different degrees of freedom, one for Between Groups (in this case, 2 is the number of levels of our independent variable [3 - 1]), and another for Within Groups (18 is the number of participants minus the number of levels of our independent variable [21 - 3]).
The next part of the output consists of the results of our Tukey's HSD post-hoc comparison.
[Multiple Comparisons table (Tukey HSD, dependent variable FINAL): the mean difference, standard error, Sig., and 95% confidence interval for each pairing of the levels of INSTRUCT.]
This table presents us with every possible combination of levels of our independent variable. The first row represents Instructor 1 compared to Instructor 2. Next is Instructor 1 compared to Instructor 3. Next is Instructor 2 compared to Instructor 1. (Note that this is redundant with the first row.) Next is Instructor 2 compared to Instructor 3, and so on. The column labeled Sig. represents the Type I error (p) rate for the simple (2-level) comparison in that row. In our
example above, Instructor 1 is significantly different from Instructor 3, but Instructor 1 is not significantly different from Instructor 2, and Instructor 2 is not significantly different from Instructor 3.
Drawing Conclusions
Drawing conclusions for ANOVA requires that we indicate the value of F, the degrees of freedom, and the significance level. A significant ANOVA should be followed by the results of a post-hoc analysis and a verbal statement of the results.
Phrasing Results That Are Significant
In our example above, we could state the following:
We computed a one-way ANOVA comparing the final exam scores of participants who took a course from one of three different instructors. A significant difference was found among the instructors (F(2,18) = 4.08, p < .05). Tukey's HSD was used to determine the nature of the differences between the instructors. This analysis revealed that students who had Instructor 1 scored lower (m = 79.57, sd = 7.96) than students who had Instructor 3 (m = 92.43, sd = 5.50). Students who had Instructor 2 (m = 86.43, sd = 10.92) were not significantly different from either of the other two groups.
Phrasing Results That Are Not Significant
If we had conducted the analysis using PRETEST as our dependent variable instead of FINAL, we would have received the following output. The ANOVA was not significant, so there is no need to refer to the Multiple Comparisons table.
[Descriptives and ANOVA tables for PRETEST by instructor: F(2,18) = 1.60, Sig. = .229.]
Given this result, we may state the following:
The pretest means of students who took a course from three different instructors were compared using a one-way ANOVA. No significant difference was found (F(2,18) = 1.60, p > .05). The students from the three different classes did not differ significantly at the start of the term. Students who had Instructor 1 had a mean score of 67.57 (sd = 8.38). Students who had Instructor 2 had a mean score of 63.14 (sd = 10.61). Students who had Instructor 3 had a mean score of 59.29 (sd = 6.55).
Practice Exercise
Using Practice Data Set 1 in Appendix B, determine if the average math scores of single, married, and divorced participants are significantly different. Write a statement of results.
Section 6.6 Factorial ANOVA
Description
The factorial ANOVA is one in which there is more than one independent variable. A 2 x 2 ANOVA, for example, has two independent variables, each with two levels. A 3 x 2 x 2 ANOVA has three independent variables. One has three levels, and the other two have two levels. Factorial ANOVA is very powerful because it allows us to assess the effects of each independent variable, plus the effects of the interaction.
Assumptions
Factorial ANOVA requires all of the assumptions of one-way ANOVA (i.e., the dependent variable must be at the interval or ratio levels and normally distributed). In addition, the independent variables should be independent of each other.
SPSS Data Format
SPSS requires one variable for the dependent variable, and one variable for each independent variable. If we have any independent variable that is represented as multiple variables (e.g., PRETEST and POSTTEST), we must use the repeated-measures ANOVA.
Running the Command
This example uses the GRADES.sav data file from earlier in this chapter. Click Analyze, then General Linear Model, then Univariate.
This will bring up the main dialog box for Univariate ANOVA. Select the dependent variable and place it in the Dependent Variable blank (use FINAL for this example). Select one of your independent variables (INSTRUCT, in this case) and place it in the Fixed Factor(s) box. Place the second independent variable (REQUIRED) in the Fixed Factor(s)
box. Having defined the analysis, now click Options. When the Options dialog box comes up, move INSTRUCT, REQUIRED, and INSTRUCT x REQUIRED into the Display Means for blank. This will provide you with means for each main effect and interaction term. Click Continue. If you were to select Post-Hoc, SPSS would run post-hoc analyses for the main effects but not for the interaction term. Click OK to run the analysis.
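A syntax sketch of this factorial analysis, with the estimated marginal means requested through the Options dialog above:

  UNIANOVA final BY instruct required
    /PRINT=DESCRIPTIVE
    /EMMEANS=TABLES(instruct)
    /EMMEANS=TABLES(required)
    /EMMEANS=TABLES(instruct*required)
    /DESIGN=instruct required instruct*required.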
At the bottom of the output, you will find the means for each main effect and interaction you selected with the Options command.

[SPSS output: estimated marginal means tables for INSTRUCT, REQUIRED, and INSTRUCT * REQUIRED, each with means, standard errors, and 95% confidence intervals]

There were three instructors, so there is a mean FINAL for each instructor. We also have means for the two values of REQUIRED. Finally, we have six means representing the interaction of the two variables (this was a 3 x 2 design). Participants who had Instructor 1 (for whom the class was not required) had a mean final exam score of 79.67. Students who had Instructor 1 (for whom it was required) had a mean final exam score of 79.50, and so on.
The example we just ran is called a two-way ANOVA. This is because we had two independent variables. With a two-way ANOVA, we get three answers: a main effect for INSTRUCT, a main effect for REQUIRED, and an interaction result for INSTRUCT * REQUIRED (see top of next page).
Tests of Between-Subjects Effects (Dependent Variable: FINAL)

Source                 Type III Sum of Squares   df   Mean Square   F
Corrected Model        635.821                   5    127.164       1.563
Intercept              151998.893                1    151998.893    1867.691
INSTRUCT               536.357                   2    268.179       3.295
REQUIRED               34.321                    1    34.321        .422
INSTRUCT * REQUIRED    22.071                    2    11.036        .136
Error                  1220.750                  15   81.383
Total                  157689.000                21
Corrected Total        1856.571                  20

R Squared = .342 (Adjusted R Squared = .123)
The source table above gives us these three answers (in the INSTRUCT, REQUIRED, and INSTRUCT * REQUIRED rows). In the example, none of the main effects or interactions was significant. In the statements of results, you must indicate F, two degrees of freedom (effect and residual/error), the significance level, and a verbal statement for each of the answers (three, in this case). Note that most statistics books give a much simpler version of an ANOVA source table where the Corrected Model, Intercept, and Corrected Total rows are not included.

Phrasing Results That Are Significant
If we had obtained significant results in this example, we could state the following (these are fictitious results; for the results that correspond to the example above, please see the section on phrasing results that are not significant):

A 3 (instructor) x 2 (required course) between-subjects factorial ANOVA was calculated comparing the final exam scores for participants who had one of three instructors and who took the course either as a required course or as an elective. A significant main effect for instructor was found (F(2,15) = 10.112, p < .05). Students who had Instructor 1 had lower final exam scores (m = 79.57, sd = 7.96) than students who had Instructor 3 (m = 92.43, sd = 5.50). Students who had Instructor 2 (m = 86.43, sd = 10.92) were not significantly different from either of the other two groups. A significant main effect for whether or not the course was required was found (F(1,15) = 38.44, p < .01). Students who took the course because it was required did better (m = 91.69, sd = 7.68) than students who took the course as an elective (m = 77.13, sd = 5.72). The interaction was not significant (F(2,15) = 1.15, p > .05). The effect of the instructor was not influenced by whether or not the students took the course because it was required.

Note that in the above example, we would have had to conduct Tukey's HSD to determine the differences for INSTRUCT (using the Post-Hoc command). This is not necessary
for REQUIRED because it has only two levels (and one must be different from the other).

Phrasing Results That Are Not Significant
Our actual results were not significant, so we can state the following:

A 3 (instructor) x 2 (required course) between-subjects factorial ANOVA was calculated comparing the final exam scores for participants who had one of three instructors and who took the course as a required course or as an elective. The main effect for instructor was not significant (F(2,15) = 3.30, p > .05). The main effect for whether or not it was a required course was also not significant (F(1,15) = .42, p > .05). Finally, the interaction was also not significant (F(2,15) = .136, p > .05). Thus, it appears that neither the instructor nor whether or not the course is required has any significant effect on final exam scores.

Practice Exercise
Using Practice Data Set 2 in Appendix B, determine if salaries are influenced by sex, job classification, or an interaction between sex and job classification. Write a statement of results.

Section 6.7 Repeated-Measures ANOVA

Description
Repeated-measures ANOVA extends the basic ANOVA procedure to a within-subjects independent variable (when participants provide data for more than one level of an independent variable). It functions like a paired-samples t test when more than two levels are being compared.

Assumptions
The dependent variable should be normally distributed and measured on an interval or ratio scale. Multiple measurements of the dependent variable should be from the same (or related) participants.

SPSS Data Format
At least three variables are required. Each variable in the SPSS data file should represent a single dependent variable at a single level of the independent variable. Thus, an analysis of a design with four levels of an independent variable would require four variables in the SPSS data file.
If any variable represents a between-subjects effect, use the Mixed-Design ANOVA command instead.
Running the Command
This example uses the GRADES.sav sample data set. Recall that GRADES.sav includes three sets of grades (PRETEST, MIDTERM, and FINAL) that represent three different times during the semester. This allows us to analyze the effects of time on the test performance of our sample population (hence the within-groups comparison). Click Analyze, then General Linear Model, then Repeated Measures.
Note that this procedure requires an optional module. If you do not have this command, you do not have the proper module installed. This procedure is NOT included in the student version of SPSS.
After selecting the command, you will be presented with the Repeated Measures Define Factor(s) dialog box. This is where you identify the within-subject factor (we will call it TIME). Enter 3 for the number of levels (three exams) and click Add. Now click Define. If we had more than one independent variable that had repeated measures, we could enter its name and click Add.
You will be presented with the Repeated Measures dialog box. Transfer PRETEST, MIDTERM, and FINAL to the Within-Subjects Variables section. The variable names should be ordered according to when they occurred in time (i.e., the values of the independent variable that they represent).
Click Options, and SPSS will compute the means for the TIME effect (see one-way ANOVA for more details about how to do this). Click OK to run the command.
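As a cross-check outside of SPSS, the same within-subjects analysis can be run in Python once the data are reshaped to long format (one row per participant per exam). The file name and column names below are assumptions, not part of the SPSS procedure.

```python
# Minimal sketch: one-way repeated-measures ANOVA on PRETEST/MIDTERM/FINAL.
# Assumes a hypothetical grades.csv export with one row per student.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

wide = pd.read_csv("grades.csv")          # columns: id, PRETEST, MIDTERM, FINAL
long = wide.melt(id_vars="id",
                 value_vars=["PRETEST", "MIDTERM", "FINAL"],
                 var_name="time", value_name="score")

# One within-subjects factor (time) with three levels.
result = AnovaRM(data=long, depvar="score", subject="id", within=["time"]).fit()
print(result)
```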
Reading the Output
This procedure uses the GLM command. GLM stands for "General Linear Model." It is a very powerful command, and many sections of output are beyond the scope of this text. But for the basic repeated-measures ANOVA, we are interested only in the Tests of Within-Subjects Effects. Note that the SPSS output will include many other sections of output, which you can ignore at this point.
Tests of Within-Subjects Effects (Measure: MEASURE_1; Sphericity Assumed rows shown)

Source        Type III Sum of Squares   df   Mean Square   F        Sig.
time          5673.746                  2    2836.873      121.90   .000
Error(time)   930.921                   40   23.273
The Tests of Within-Subjects Effects output should look very similar to the output from the other ANOVA commands. In the above example, the effect of TIME has an F value of 121.90 with 2 and 40 degrees of freedom (we use the line for Sphericity Assumed). It is significant at less than the .001 level. When describing these results, we should indicate the type of test, the F value, the degrees of freedom, and the significance level.

Phrasing Results That Are Significant
Because the ANOVA results were significant, we need to do some sort of post-hoc analysis. One of the main limitations of SPSS is the difficulty in performing post-hoc analyses for within-subjects factors. With SPSS, the easiest solution to this problem is to conduct protected dependent t tests with repeated-measures ANOVA. There are more powerful (and more appropriate) post-hoc analyses, but SPSS will not compute them for us. For more information, consult your instructor or a more advanced statistics text.
To conduct the protected t tests, we will compare PRETEST to MIDTERM, MIDTERM to FINAL, and PRETEST to FINAL, using paired-samples t tests. Because we are conducting three tests and, therefore, inflating our Type I error rate, we will use a significance level of .017 (.05/3) instead of .05.
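A minimal Python sketch of the same protected t tests is shown below. The Bonferroni-style cutoff of .017 comes from the text; the file and column names are assumptions.

```python
# Minimal sketch: protected (Bonferroni-corrected) paired t tests.
# Assumes a hypothetical grades.csv with columns PRETEST, MIDTERM, FINAL.
import pandas as pd
from scipy import stats

grades = pd.read_csv("grades.csv")
alpha = 0.05 / 3   # three comparisons, so .017 per test

pairs = [("PRETEST", "MIDTERM"), ("MIDTERM", "FINAL"), ("PRETEST", "FINAL")]
for a, b in pairs:
    t, p = stats.ttest_rel(grades[a], grades[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}, significant = {p < alpha}")
```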
[SPSS output: Paired Samples Test tables for PRETEST - MIDTERM (mean difference = -15.2857), PRETEST - FINAL (-22.8095), and MIDTERM - FINAL (-7.5238), each with df = 20]
The three comparisons each had a significance level of less than .017, so we can conclude that the scores improved from pretest to midterm and again from midterm to final. To generate the descriptive statistics, we have to run the Descriptives command for each variable.
Because the results of our example above were significant, we could state the following:

A one-way repeated-measures ANOVA was calculated comparing the exam scores of participants at three different times: pretest, midterm, and final. A significant effect was found (F(2,40) = 121.90, p < .001). Follow-up protected t tests revealed that scores increased significantly from pretest (m = 63.33, sd = 8.93) to midterm (m = 78.62, sd = 9.66), and again from midterm to final (m = 86.14, sd = 9.63).

Phrasing Results That Are Not Significant
With results that are not significant, we could state the following (the F values here have been made up for purposes of illustration):

A one-way repeated-measures ANOVA was calculated comparing the exam scores of participants at three different times: pretest, midterm, and final. No significant effect was found (F(2,40) = 1.90, p > .05). No significant difference exists among pretest (m = 63.33, sd = 8.93), midterm (m = 78.62, sd = 9.66), and final (m = 86.14, sd = 9.63) means.

Practice Exercise
Use Practice Data Set 3 in Appendix B. Determine if the anxiety level of participants changed over time (regardless of which treatment they received) using a one-way repeated-measures ANOVA and protected dependent t tests. Write a statement of results.

Section 6.8 Mixed-Design ANOVA

Description
The mixed-design ANOVA (sometimes called a split-plot design) tests the effects of more than one independent variable. At least one of the independent variables must
be within-subjects (repeated measures). At least one of the independent variables must be between-subjects.

Assumptions
The dependent variable should be normally distributed and measured on an interval or ratio scale.

SPSS Data Format
The dependent variable should be represented as one variable for each level of the within-subjects independent variables. Another variable should be present in the data file for each between-subjects variable. Thus, a 2 x 2 mixed-design ANOVA would require three variables, two representing the dependent variable (one at each level), and one representing the between-subjects independent variable.

Running the Command
The General Linear Model command runs the Mixed-Design ANOVA command. Click Analyze, then General Linear Model, then Repeated Measures. Note that this procedure requires an optional module. If you do not have this command, you do not have the proper module installed. This procedure is NOT included in the student version of SPSS.
The Repeated Measures command should be used if any of the independent variables are repeated measures (within-subjects).
This example also uses the GRADES.sav data file. Enter PRETEST, MIDTERM, and FINAL in the Within-Subjects Variables block. (See the Repeated-Measures ANOVA command in Section 6.7 for an explanation.) This example is a 3 x 3 mixed-design. There are two independent variables (TIME and INSTRUCT), each with three levels. We previously entered the information for TIME in the Repeated Measures Define Factors dialog box. We need to transfer INSTRUCT into the Between-Subjects Factor(s) block.
Click Options and select means for all of the main effects and the interaction (see one-way ANOVA in Section 6.5 for more details about how to do this). Click OK to run the command.
Reading the Output
As with the standard repeated-measures ANOVA, the GLM procedure provides a lot of output we will not use. For a mixed-design ANOVA, we are interested in two sections. The first is Tests of Within-Subjects Effects.
Tests of Within-Subjects Effects (Measure: MEASURE_1; Sphericity Assumed rows shown)

Source            Type III Sum of Squares   df   Mean Square   Sig.
time              5673.746                  2    2836.873      .000
time * instruct   806.063                   4    201.516       .000
Error(time)       124.857                   36   3.468
This section gives two of the three answers we need (the main effect for TIME and the interaction result for TIME x INSTRUCTOR). The second section of output is Tests of Between-Subjects Effects (sample output is below). Here, we get the answers that do not contain any within-subjects effects. For our example, we get the main effect for INSTRUCT. Both of these sections must be combined to produce the full answer for our analysis.
Tests of Between-Subjects Effects (Measure: MEASURE_1; Transformed Variable: Average; key rows)

Source     Type III Sum of Squares   df   Sig.
instruct   18.698                    2    .962
Error      4368.571                  18
If we obtain significant effects, we must perform some sort of post-hoc analysis. Again, this is one of the limitations of SPSS. No easy way to perform the appropriate post-hoc test for repeated-measures (within-subjects) factors is available. Ask your instructor for assistance with this.
When describing the results, you should include F, the degrees of freedom, and the significance level for each main effect and interaction. In addition, some descriptive statistics must be included (either give means or include a figure).

Phrasing Results That Are Significant
There are three answers (at least) for all mixed-design ANOVAs. Please see Section 6.6 on factorial ANOVA for more details about how to interpret and phrase the results.
For the above example, we could state the following in the results section (note that this assumes that appropriate post-hoc tests have been conducted):

A 3 x 3 mixed-design ANOVA was calculated to examine the effects of the instructor (Instructors 1, 2, and 3) and time (pretest, midterm, and final) on scores. A significant time x instructor interaction was present (F(4,36) = 58.10, p < .001). In addition, the main effect for time was significant (F(2,36) = 817.95, p < .001). The main effect for instructor was not significant (F(2,18) = .039, p > .05). Upon examination of the data, it appears that Instructor 3 showed the most improvement in scores over time.

With significant interactions, it is often helpful to provide a graph with the descriptive statistics. By selecting the Plots option in the main dialog box, you can make graphs of the interaction like the one below. Interactions add considerable complexity to the interpretation of statistical results. Consult a research methods text or ask your instructor for more help with interactions.
[Figure: profile plot of the time x instructor interaction (time on the horizontal axis, a separate line for each instructor), produced with the Plots option]
Phrasing Results That Are Not Significant
If our results had not been significant, we could state the following (note that the F values are fictitious):

A 3 x 3 mixed-design ANOVA was calculated to examine the effects of the instructor (Instructors 1, 2, and 3) and time (pretest, midterm, and final) on scores. No significant main effects or interactions were found. The time x instructor interaction (F(4,36) = 1.10, p > .05), the main effect for time (F(2,36) = 1.95, p > .05), and the main effect for instructor (F(2,18) = .039, p > .05) were all not significant. Exam scores were not influenced by either time or instructor.

Practice Exercise
Use Practice Data Set 3 in Appendix B. Determine if anxiety levels changed over time for each of the treatment (CONDITION) types. How did time change anxiety levels for each treatment? Write a statement of results.
Section 6.9 Analysis of Covariance (ANCOVA)

Assumptions
ANCOVA requires that the covariate be significantly correlated with the dependent variable. The dependent variable and the covariate should be at the interval or ratio levels. In addition, both should be normally distributed.

SPSS Data Format
The SPSS data file must contain one variable for each independent variable, one variable representing the dependent variable, and at least one covariate.

Running the Command
The Factorial ANOVA command is used to run ANCOVA. To run it, click Analyze, then General Linear Model, then Univariate. Follow the directions discussed for factorial ANOVA, using the HEIGHT.sav sample data file. Place the variable HEIGHT as your Dependent Variable. Enter SEX as your Fixed Factor, then WEIGHT as the Covariate. This last step determines the difference between regular factorial ANOVA and ANCOVA. Click OK to run the ANCOVA.
Reading the Output
The output consists of one main source table (shown below). This table gives you the main effects and interactions you would have received with a normal factorial ANOVA. In addition, there is a row for each covariate. In our example, we have one main effect (SEX) and one covariate (WEIGHT). Normally, we examine the covariate line only to confirm that the covariate is significantly related to the dependent variable.

Drawing Conclusions
This sample analysis was performed to determine if males and females differ in height, after weight is accounted for. We know that weight is related to height. Rather than match participants or use methodological controls, we can statistically remove the effect of weight.
When giving the results of ANCOVA, we must give F, degrees of freedom, and significance levels for all main effects, interactions, and covariates. If main effects or interactions are significant, post-hoc tests must be conducted. Descriptive statistics (mean and standard deviation) for each level of the independent variable should also be given.
Tests of Between-Subjects Effects (Dependent Variable: HEIGHT)

Source            Type III Sum of Squares   df   Mean Square
Corrected Model   215.027                   2    107.514
Intercept         5.580                     1    5.580
WEIGHT            119.964                   1    119.964
SEX               66.367                    1    66.367
Error             13.911                    13   1.070
Total             71919.000                 16
Corrected Total   228.938                   15

R Squared = .939 (Adjusted R Squared = .930)
Phrasing Results That Are Significant
The above example obtained a significant result, so we could state the following:

A one-way between-subjects ANCOVA was calculated to examine the effect of sex on height, covarying out the effect of weight. Weight was significantly related to height (F(1,13) = 112.11, p < .001). The main effect for sex was significant (F(1,13) = 62.02, p < .001), with males significantly taller (m = 69.38, sd = 3.70) than females (m = 64.50, sd = 2.33).

Phrasing Results That Are Not Significant
If the covariate is not significant, we need to repeat the analysis without including the covariate (i.e., run a normal ANOVA). For ANCOVA results that are not significant, you could state the following (note that the F values are made up for this example):
A one-way between-subjects ANCOVA was calculated to examine the effect of sex on height, covarying out the effect of weight. Weight was significantly related to height (F(1,13) = 112.11, p < .001). The main effect for sex was not significant (F(1,13) = 2.02, p > .05), with males not being significantly taller (m = 69.38, sd = 3.70) than females (m = 64.50, sd = 2.33), even after covarying out the effect of weight.

Practice Exercise
Using Practice Data Set 2 in Appendix B, determine if salaries are different for males and females. Repeat the analysis, statistically controlling for years of service. Write a statement of results for each. Compare and contrast your two answers.
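To make the idea of "statistically removing" the covariate concrete, here is a minimal Python sketch of the same kind of model: height predicted from sex with weight entered as a covariate. The file name height.csv and the column names are assumptions based on the HEIGHT.sav example; SPSS does not require this step.

```python
# Minimal sketch: one-way ANCOVA (HEIGHT by SEX, controlling for WEIGHT).
# Assumes a hypothetical height.csv export with columns HEIGHT, SEX, WEIGHT.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("height.csv")

# Entering WEIGHT alongside the categorical factor adjusts the SEX effect
# for the covariate, which is what ANCOVA does.
model = smf.ols("HEIGHT ~ C(SEX) + WEIGHT", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # rows for WEIGHT (covariate) and SEX
```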
Section 6.10 Multivariate Analysis of Variance (MANOVA)

Assumptions
MANOVA assumes that you have multiple dependent variables that are related to each other. Each dependent variable should be normally distributed and measured on an interval or ratio scale.

SPSS Data Format
The SPSS data file should have a variable for each dependent variable. One additional variable is required for each between-subjects independent variable. It is also possible to do a MANCOVA, a repeated-measures MANOVA, and a repeated-measures MANCOVA as well. These extensions require additional variables in the data file.

Running the Command
Note that this procedure requires an optional module. If you do not have this command, you do not have the proper module installed. This procedure is NOT included in the student version of SPSS.
The following data represent SAT and GRE scores for 18 participants. Six participants received no special training, six received short-term training before taking the tests, and six received long-term training. GROUP is coded 0 = no training, 1 = short-term, 2 = long-term. Enter the data and save them as SAT.sav.
SAT   GRE   GROUP
580   600   0
520   520   0
500   510   0
410   400   0
650   630   0
480   480   0
500   490   1
640   650   1
500   480   1
500   510   1
580   570   1
490   500   1
520   520   2
620   630   2
550   560   2
500   510   2
540   560   2
600   600   2
Locate the Multivariate command by clicking Analyze, then General Linear Model, then Multivariate.
This will bring up the main dialog box. Enter the dependent variables (GRE and SAT, in this case) in the Dependent Variable(s) blank. Enter the independent variable (GROUP, in this case) in the Fixed Factor(s) blank. Click OK to run the command.

Reading the Output
We are interested in two primary sections of output. The first one gives the results of the multivariate tests. The section labeled GROUP is the one we want. This tells us whether GROUP had an effect on any of our dependent variables. Four different types of multivariate test results are given. The most widely used is Wilks' Lambda. Thus, the answer for the MANOVA is a Lambda of .828, with 4 and 28 degrees of freedom. That value is not significant.
Multivariate Tests (GROUP effect; Wilks' Lambda row)

Effect   Test            Value   F      Hypothesis df   Error df   Sig.
group    Wilks' Lambda   .828    .693   4.000           28.000     .603

(SPSS also reports Pillai's Trace, Hotelling's Trace, and Roy's Largest Root for the Intercept and GROUP effects.)

The second section of output we want gives the results of the univariate tests (ANOVAs) for each dependent variable.

Tests of Between-Subjects Effects

Source            Dependent Variable   Type III Sum of Squares   df   Mean Square   F
Corrected Model   sat                  3077.778                  2    1538.889      .360
                  gre                  5200.000                  2    2600.000      .587
Intercept         sat                  5205688.889               1    5205688.889   1219.448
                  gre                  5248800.000               1    5248800.000   1185.723
group             sat                  3077.778                  2    1538.889      .360
                  gre                  5200.000                  2    2600.000      .587
Error             sat                  64033.333                 15   4268.889
                  gre                  66400.000                 15   4426.667
Total             sat                  5272800.000               18
                  gre                  5320400.000               18
Corrected Total   sat                  67111.111                 17
                  gre                  71600.000                 17

R Squared for sat = .046 (Adjusted R Squared = -.081); R Squared for gre = .073 (Adjusted R Squared = .051)
Drawing Conclusions
We interpret the results of the univariate tests only if the group Wilks' Lambda is significant. Our results are not significant, but we will first consider how to interpret results that are significant.
[SPSS output for a fictitious significant example: multivariate tests and univariate Tests of Between-Subjects Effects tables corresponding to the results reported below]
Phrasing Results That Are Significant
A one-way MANOVA was calculated examining the effect of training (none, short-term, or long-term) on SAT and GRE scores. A significant effect was found (Lambda(4,28) = .423, p = .014). Follow-up univariate ANOVAs indicated that SAT scores were significantly improved by training (F(2,15) = 7.250, p = .006). GRE scores were also significantly improved by training (F(2,15) = 9.465, p = .002).

Phrasing Results That Are Not Significant
The actual example presented was not significant. Therefore, we could state the following in the results section:

A one-way MANOVA was calculated examining the effect of training (none, short-term, or long-term) on SAT and GRE scores. No significant effect was found (Lambda(4,28) = .828, p > .05). Neither SAT nor GRE scores were significantly influenced by training.
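For readers who would like to reproduce this analysis outside of SPSS, here is a minimal Python sketch of a one-way MANOVA on the SAT.sav data. The CSV file name is an assumption; the multivariate statistics reported (including Wilks' lambda) are the same ones SPSS prints.

```python
# Minimal sketch: one-way MANOVA with SAT and GRE as dependent variables.
# Assumes the SAT.sav data have been exported to a hypothetical sat.csv
# with columns SAT, GRE, and GROUP (0 = none, 1 = short-term, 2 = long-term).
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

scores = pd.read_csv("sat.csv")

# C(GROUP) treats the training condition as a categorical factor.
manova = MANOVA.from_formula("SAT + GRE ~ C(GROUP)", data=scores)
print(manova.mv_test())  # Wilks' lambda, Pillai's trace, etc., for each effect
```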
Chapter 7

Nonparametric Inferential Statistics
Nonparametric tests are used when the corresponding parametric procedure is inappropriate. Normally, this is because the dependent variable is not interval- or ratio-scaled. It can also be because the dependent variable is not normally distributed. If the data of interest are frequency counts, nonparametric statistics may also be appropriate.

Section 7.1 Chi-Square Goodness of Fit

Description
The chi-square goodness of fit test determines whether or not sample proportions match the theoretical values. For example, it could be used to determine if a die is "loaded" or fair. It could also be used to compare the proportion of children born with birth defects to the population value (e.g., to determine if a certain neighborhood has a statistically higher-than-normal rate of birth defects).

Assumptions
We need to make very few assumptions. There are no assumptions about the shape of the distribution. The expected frequencies for each category should be at least 1, and no more than 20% of the categories should have expected frequencies of less than 5.

SPSS Data Format
SPSS requires only a single variable.

Running the Command
We will create the following data set and call it COINS.sav. The following data represent the flipping of each of two coins 20 times (H is coded as heads, T as tails).

COIN1: H T H H T H H T H H H T T T H T H T T H
COIN2: T H H T H T H T T H H T H H T H T H H T

Name the two variables COIN1 and COIN2, and code H as 1 and T as 2. The data file that you create will have 20 rows of data and two columns, called COIN1 and COIN2.
To run the Chi-Square command, click Analyze, then Nonparametric Tests, then Chi-Square. This will bring up the main dialog box for the Chi-Square Test.
Transfer the variable COIN1 into the Test Variable List. A "fair" coin has an equal chance of coming up heads or tails. Therefore, we will leave the Expected Values set to All categories equal. We could test a specific set of proportions by entering the relative frequencies in the Expected Values area. Click OK to run the analysis.
Reading the Output
The output consists of two sections. The first section gives the frequencies (observed N) of each value of the variable. The expected value is given, along with the difference of the observed from the expected value (called the residual). In our example, with 20 flips of a coin, we should get 10 of each value.
COIN1

        Observed N   Expected N   Residual
1       11           10.0         1.0
2       9            10.0         -1.0
Total   20

The second section of the output gives the results of the chi-square test.

Test Statistics (COIN1)

Chi-Square    .200
df            1
Asymp. Sig.   .655
Drawing Conclusions
A significant chi-square test indicates that the data vary from the expected values. A test that is not significant indicates that the data are consistent with the expected values.

Phrasing Results That Are Significant
In describing the results, you should state the value of chi-square (whose symbol is χ²), the degrees of freedom, the significance level, and a description of the results. For example, with a significant chi-square (for a sample different from the example above, such as if we had used a "loaded" die), we could state the following:

A chi-square goodness of fit test was calculated comparing the frequency of occurrence of each value of a die. It was hypothesized that each value would occur an equal number of times. Significant deviation from the hypothesized values was found (χ²(5) = 25.48, p < .05). The die appears to be "loaded."

Note that this example uses hypothetical values.

Phrasing Results That Are Not Significant
If the analysis produces no significant difference, as in the previous example, we could state the following:

A chi-square goodness of fit test was calculated comparing the frequency of occurrence of heads and tails on a coin. It was hypothesized that each value would occur an equal number of times. No significant deviation from the hypothesized values was found (χ²(1) = .20, p > .05). The coin appears to be fair.

Practice Exercise
Use Practice Data Set 2 in Appendix B. In the entire population from which the sample was drawn, 20% of employees are clerical, 50% are technical, and 30% are professional. Determine whether or not the sample drawn conforms to these values. HINT: Enter the relative proportions of the three samples in order (20, 50, 30) in the "Expected Values" area.
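Outside of SPSS, the same kind of goodness of fit test can be checked with a short Python sketch. The observed counts below are hypothetical and only illustrate the input format; the 20/50/30 population proportions come from the exercise.

```python
# Minimal sketch: chi-square goodness of fit against known population proportions.
from scipy import stats

observed = [8, 15, 7]                      # hypothetical clerical/technical/professional counts
n = sum(observed)
expected = [0.20 * n, 0.50 * n, 0.30 * n]  # 20% / 50% / 30% of the sample size

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```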
Section 7.2 Chi-Square Test of Independence

Description
The chi-square test of independence tests whether or not two variables are independent of each other. For example, flips of a coin should be independent events, so knowing the outcome of one coin toss should not tell us anything about the second coin toss. The chi-square test of independence is essentially a nonparametric version of the interaction term in ANOVA.
Assumptions
Very few assumptions are needed. For example, we make no assumptions about the shape of the distribution. The expected frequencies for each category should be at least 1, and no more than 20% of the categories should have expected frequencies of less than 5.

SPSS Data Format
At least two variables are required.

Running the Command
The chi-square test of independence is a component of the Crosstabs command. For more details, see the section in Chapter 3 on frequency distributions for more than one variable.
This example uses the COINS.sav example. COIN1 is placed in the Row(s) blank, and COIN2 is placed in the Column(s) blank.
Click Statistics, then check the Chi-square box. Click Continue. You may also want to click Cells to select expected frequencies in addition to observed frequencies, as we will do on the next page. Click OK to run the analysis.
Reading the Output
The output consists of two parts. The first part gives you the counts. In this example, the actual and expected frequencies are shown because they were selected with the Cells option.
COIN1 * COIN2 Crosstabulation

                                COIN2
                                Head    Tail    Total
COIN1   Head   Count            7       4       11
               Expected Count   6.1     5.0     11.0
        Tail   Count            4       5       9
               Expected Count   5.0     4.1     9.0
Total          Count            11      9       20
               Expected Count   11.0    9.0     20.0
Note that you can also use the Cells option to display the percentages of each variable for each value. This is especially useful when your groups are different sizes.
The second part of the output gives the results of the chi-square test. The most commonly used value is the Pearson chi-square, shown in the first row (value of .737).
Chi-Square Tests

                                Value     df
Pearson Chi-Square              .737(b)   1
Continuity Correction(a)        .165      1
Likelihood Ratio                .740      1
Fisher's Exact Test
Linear-by-Linear Association    .700      1
N of Valid Cases                20

a. Computed only for a 2x2 table
b. 3 cells (75.0%) have expected count less than 5. The minimum expected count is 4.05.
Drawing Conclusions
A significant chi-square test result indicates that the two variables are not independent. A value that is not significant indicates that the variables do not vary significantly from independence.
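A minimal Python sketch of the same test is shown below for readers who want a cross-check; the 2 x 2 table of counts is hypothetical and only illustrates the input format.

```python
# Minimal sketch: chi-square test of independence on a 2 x 2 table of counts.
from scipy import stats

# Hypothetical observed counts: rows = two groups, columns = two outcomes.
observed = [[19, 9],
            [8, 15]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```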
Phrasing Results That Are Significant
In describing the results, you should give the value of chi-square (whose symbol is χ²), the degrees of freedom, the significance level, and a description of the results. For example, with a significant chi-square (for a data set different from the one discussed above), we could state the following:

A chi-square test of independence was calculated comparing the frequency of heart disease in men and women. A significant interaction was found (χ²(1) = 23.80, p < .05). Men were more likely to get heart disease (68%) than were women (40%).

Note that this summary statement assumes that a test was run in which participants' sex, as well as whether or not they had heart disease, was coded.

Phrasing Results That Are Not Significant
A chi-square test that is not significant indicates that there is no significant dependence of one variable on the other. The coin example above was not significant. Therefore, we could state the following:

A chi-square test of independence was calculated comparing the result of flipping two coins. No significant relationship was found (χ²(1) = .737, p > .05). Flips of a coin appear to be independent events.

Practice Exercise
A researcher wants to know whether or not individuals are more likely to help in an emergency when they are indoors than when they are outdoors. Of 28 participants who were outdoors, 19 helped and 9 did not. Of 23 participants who were indoors, 8 helped and 15 did not. Enter these data, and find out if helping behavior is affected by the environment. The key to this problem is in the data entry. (Hint: How many participants were there, and what do you know about each participant?)

Section 7.3 Mann-Whitney U Test

Description
The Mann-Whitney U test is the nonparametric equivalent of the independent t test. It tests whether or not two independent samples are from the same distribution. The Mann-Whitney U test is weaker than the independent t test, and the t test should be used if you can meet its assumptions.

Assumptions
The Mann-Whitney U test uses the rankings of the data. Therefore, the data for the two samples must be at least ordinal. There are no assumptions about the shape of the distribution.
SPSS Data Format
This command requires a single variable representing the dependent variable and a second variable indicating group membership.

Running the Command
This example will use a new data file. It represents participants in a series of 12 races. There were long races, medium races, and short races. Participants either had a lot of experience (2), some experience (1), or no experience (0). Enter the data from the figure in a new file, and save the data file as RACE.sav. The values for LONG, MEDIUM, and SHORT represent the results of the race, with 1 being first place and 12 being last.
To run the command, click Analyze, then Nonparametric Tests, then 2 Independent Samples. This will bring up the main dialog box. Enter the dependent variable (LONG, for this example) in the Test Variable List blank. Enter the independent variable (EXPERIENCE) as the Grouping Variable. Make sure that Mann-Whitney U is checked. Click Define Groups to select which two groups you will compare. For this example, we will compare those runners with no experience (0) to those runners with a lot of experience (2). Click OK to run the analysis.
Reading the Output
The output consists of two sections. The first section gives descriptive statistics for the two samples. Because the data are only required to be ordinal, summaries relating to their ranks are used. Those participants who had no experience averaged 6.5 as their place in the race. Those participants with a lot of experience averaged 2.5 as their place in the race.
The second section of the output is the result of the Mann-Whitney U test itself. The value obtained was 0.0, with a significance level of .021.
Ranks (long)

experience   N   Mean Rank   Sum of Ranks
.00          4   6.50        26.00
2.00         4   2.50        10.00
Total        8

Test Statistics: Mann-Whitney U = .000, Asymp. Sig. (2-tailed) = .021
Drawing Conclusions
A significant Mann-Whitney U result indicates that the two samples are different in terms of their average ranks.

Phrasing Results That Are Significant
Our example above is significant, so we could state the following:

A Mann-Whitney U test was calculated examining the place that runners with varying levels of experience took in a long-distance race. Runners with no experience did significantly worse (m place = 6.50) than runners with a lot of experience (m place = 2.50; U = 0.00, p < .05).

Phrasing Results That Are Not Significant
If we conduct the analysis on the short-distance race instead of the long-distance race, we will get the following results, which are not significant.
Ranks (short)

experience   N   Mean Rank
.00          4   4.38
2.00         4   4.63
Total        8

Test Statistics: Mann-Whitney U = 7.500
Therefore, we could state the following:

A Mann-Whitney U test was used to examine the difference in the race performance of runners with no experience and runners with a lot of experience in a short-distance race. No significant difference in the results of the race was found (U = 7.50, p > .05). Runners with no experience averaged a place of 4.38. Runners with a lot of experience averaged a place of 4.63.

Practice Exercise
Assume that the mathematics scores in Practice Data Set 1 (Appendix B) are measured on an ordinal scale. Determine if younger participants (< 26) have significantly lower mathematics scores than older participants.
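If you would like to verify a Mann-Whitney U result outside of SPSS, a minimal Python sketch looks like this; the finishing places below are hypothetical, standing in for the two groups being compared.

```python
# Minimal sketch: Mann-Whitney U test comparing two independent groups of ranks.
from scipy import stats

no_experience = [9, 10, 11, 12]   # hypothetical finishing places
lots_experience = [1, 2, 3, 4]    # hypothetical finishing places

u, p = stats.mannwhitneyu(no_experience, lots_experience, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```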
Section 7.4 Wilcoxon Test

Description
The Wilcoxon test is the nonparametric equivalent of the paired-samples (dependent) t test. It tests whether or not two related samples are from the same distribution. The Wilcoxon test is weaker than the paired-samples t test, so the t test should be used if you can meet its assumptions.

Assumptions
The Wilcoxon test is based on the difference in rankings. The data for the two samples must be at least ordinal. There are no assumptions about the shape of the distribution.

SPSS Data Format
The test requires two variables. One variable represents the dependent variable at one level of the independent variable. The other variable represents the dependent variable at the second level of the independent variable.

Running the Command
Locate the command by clicking Analyze, then Nonparametric Tests, then 2 Related Samples. This example uses the RACE.sav data set.
Transfer the variables LONG and MEDIUM as a pair and click OK to run the test. This will determine if the runners perform equivalently on long- and medium-distance races.

Reading the Output
The output consists of two parts. The first part gives summary statistics for the two variables. The second section contains the result of the Wilcoxon test (given as Z).
[SPSS output: Wilcoxon Signed Ranks Test for MEDIUM - LONG (negative ranks, positive ranks, and 3 ties out of 12 cases); Test Statistics: Z = -.121, Asymp. Sig. (2-tailed) = .904]
The example above shows that no significant difference was found between the results of the long-distance and medium-distance races.

Phrasing Results That Are Significant
A significant result means that a change has occurred between the two measurements. If that happened, we could state the following:

A Wilcoxon test examined the results of the medium-distance and long-distance races. A significant difference was found in the results (Z = 3.40, p < .05). Medium-distance results were better than long-distance results.

Note that these results are fictitious.

Phrasing Results That Are Not Significant
In fact, the results in the example above were not significant, so we could state the following:

A Wilcoxon test examined the results of the medium-distance and long-distance races. No significant difference was found in the results (Z = -.121, p > .05). Medium-distance results were not significantly different from long-distance results.

Practice Exercise
Use the RACE.sav data file to determine whether or not the outcome of short-distance races is different from that of medium-distance races. Phrase your results.
Section 7.5 Kruskal-Wallis H Test

Reading the Output
The output consists of two parts. The first part gives summary statistics for each of the groups defined by the grouping (independent) variable.
Ranks (long)

experience   N    Mean Rank
.00          4    10.50
1.00         4    6.50
2.00         4    2.50
Total        12
The second part of the output gives the results of the Kruskal-Wallis test (given as a chi-square value, but we will describe it as an H). The example here is a significant value of 9.846.
Test Statistics (long)

Chi-Square    9.846
df            2
Asymp. Sig.   .007
Drawing Conclusions
Like the one-way ANOVA, the Kruskal-Wallis test assumes that the groups are equal. Thus, a significant result indicates that at least one of the groups is different from at least one other group. Unlike the One-Way ANOVA command, however, there are no options available for post-hoc analysis.

Phrasing Results That Are Significant
The example above is significant, so we could state the following:

A Kruskal-Wallis test was conducted comparing the outcome of a long-distance race for runners with varying levels of experience. A significant result was found (H(2) = 9.85, p < .01), indicating that the groups differed from each other. Runners with no experience averaged a placement of 10.50, while runners with some experience averaged 6.50 and runners with a lot of experience averaged 2.50. The more experience the runners had, the better they performed.

Phrasing Results That Are Not Significant
If we conducted the analysis using the results of the short-distance race, we would get the following output, which is not significant.
Ranks (short)

experience   N    Mean Rank
.00          4    6.38
1.00         4    7.25
2.00         4    5.88
Total        12

Test Statistics (short)

Chi-Square    .299
df            2
Asymp. Sig.   .861
This result is not significant, so we could state the following:

A Kruskal-Wallis test was conducted comparing the outcome of a short-distance race for runners with varying levels of experience. No significant difference was found (H(2) = 0.299, p > .05), indicating that the groups did not differ significantly from each other. Runners with no experience averaged a placement of 6.38, while runners with some experience averaged 7.25 and runners with a lot of experience averaged 5.88. Experience did not seem to influence the results of the short-distance race.
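A minimal Python sketch of the same test is shown below for readers who want to check a Kruskal-Wallis result by hand; the three lists of finishing places are hypothetical.

```python
# Minimal sketch: Kruskal-Wallis H test across three independent groups.
from scipy import stats

none = [9, 10, 11, 12]   # hypothetical finishing places, no experience
some = [5, 6, 7, 8]      # hypothetical finishing places, some experience
lots = [1, 2, 3, 4]      # hypothetical finishing places, lots of experience

h, p = stats.kruskal(none, some, lots)
print(f"H = {h:.3f}, p = {p:.4f}")
```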
Practice Exercise
Use Practice Data Set 2 in Appendix B. Job classification is ordinal (clerical < technical < professional). Determine if males and females have differing levels of job classifications. Phrase your results.

Section 7.6 Friedman Test

Description
The Friedman test is the nonparametric equivalent of a one-way repeated-measures ANOVA. It is used when you have more than two measurements from related participants.

Assumptions
The test uses the rankings of the variables, so the data must be at least ordinal. No other assumptions are required.

SPSS Data Format
SPSS requires at least three variables in the SPSS data file. Each variable represents the dependent variable at one of the levels of the independent variable.

Running the Command
Locate the command by clicking Analyze, then Nonparametric Tests, then K Related Samples. This will bring up the main dialog box.
Place all the variables representing the levels of the independent variable in the Test Variables area. For this example, use the RACE.sav data file and the variables LONG, MEDIUM, and SHORT. Click OK.
Reading the Output
The output consists of two sections. The first section gives you summary statistics for each of the variables. The second section of the output gives you the results of the test as a chi-square value. The example here has a value of 0.049 and is not significant (Asymp. Sig., otherwise known as p, is .976, which is greater than .05).
[SPSS output: Friedman Test mean ranks for LONG, MEDIUM, and SHORT, and Test Statistics (Chi-Square = 0.049, Asymp. Sig. = .976)]
Drawing Conclusions
The Friedman test assumes that the three variables are from the same population. A significant value indicates that the variables are not equivalent.
Chapter 8

Test Construction
Section 8.1 Item-Total Analysis

Description
Item-total analysis is a way to assess the internal consistency of a data set. As such, it is one of many tests of reliability. Item-total analysis comprises a number of items that make up a scale or test designed to measure a single construct (e.g., intelligence), and determines the degree to which all of the items measure the same construct. It does not tell you if it is measuring the correct construct (that is a question of validity). Before a test can be valid, however, it must first be reliable.

Assumptions
All the items in the scale should be measured on an interval or ratio scale. In addition, each item should be normally distributed. If your items are ordinal in nature, you can conduct the analysis using the Spearman rho correlation instead of the Pearson r correlation.

SPSS Data Format
SPSS requires one variable for each item (or question) in the scale. In addition, you must have a variable representing the total score for the scale.
To conduct it, open the QUESTIONS.sav data file you created in Chapter 2. Click Analyze, then Correlate, then Bivariate. Place all questions and the total in the right-hand window, and click OK. (For more help on conducting correlations, see Chapter 5.) The total can be calculated with the techniques discussed in Chapter 2.
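For readers who prefer to see the computation spelled out, here is a minimal Python sketch of the same item-total correlations. The column names Q1, Q2, Q3, and TOTAL follow the QUESTIONS.sav example, and the CSV file name is an assumption.

```python
# Minimal sketch: item-total correlations for a short scale.
# Assumes a hypothetical questions.csv export with columns Q1, Q2, Q3, TOTAL.
import pandas as pd

data = pd.read_csv("questions.csv")
items = ["Q1", "Q2", "Q3"]

# Pearson correlation of each item with the total score.
item_total = data[items].corrwith(data["TOTAL"])
print(item_total)   # flag items that correlate weakly (< .3) or negatively
```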
Reading the Output
The output consists of a correlation matrix containing all questions and the total. Use the column labeled TOTAL, and locate the correlation between the total score and each question. In the example below, Question 1 has a correlation of 0.873 with the total score. Question 2 has a correlation of -0.130 with the total. Question 3 has a correlation of 0.926 with the total.

Interpreting the Output
Item-total correlations should always be positive. If you obtain a negative correlation, that question should be removed from the scale (or you may consider whether it should be reverse-keyed).
Generally, item-total correlations of greater than 0.7 are considered desirable. Those of less than 0.3 are considered weak. Any questions with correlations of less than 0.3 should be removed from the scale.
Normally, the worst question is removed, then the total is recalculated. After the total is recalculated, the item-total analysis is repeated without the question that was removed. Then, if any questions have correlations of less than 0.3, the worst one is removed, and the process is repeated.
When all remaining correlations are greater than 0.3, the remaining items in the scale are considered to be those that are internally consistent.

Section 8.2 Cronbach's Alpha

Description
Cronbach's alpha is a measure of internal consistency. As such, it is one of many tests of reliability. Cronbach's alpha comprises a number of items that make up a scale designed to measure a single construct (e.g., intelligence), and determines the degree to which all the items are measuring the same construct. It does not tell you if it is measuring the correct construct (that is a question of validity). Before a test can be valid, however, it must first be reliable.

Assumptions
All the items in the scale should be measured on an interval or ratio scale. In addition, each item should be normally distributed.
Correlations (Pearson correlations, N = 4)

         Q1       Q2       Q3       TOTAL
Q1       1.000    -.447    .718     .873
Q2       -.447    1.000    -.229    -.130
Q3       .718     -.229    1.000    .926
TOTAL    .873     -.130    .926     1.000
SPSS Data Format
SPSS requires one variable for each item (or question) in the scale.

Running the Command
This example uses the QUESTIONS.sav data file we first created in Chapter 2. Click Analyze, then Scale, then Reliability Analysis. This will bring up the main dialog box for Reliability Analysis. Transfer the questions from your scale to the Items blank, and click OK. Do not transfer any variables representing total scores.
Note that when you change the options under Model, additional measures of internal consistency (e.g., split-half) can be calculated.

Reading the Output
In this example, the reliability coefficient is 0.407. Numbers close to 1.00 are very good, but numbers close to 0.00 represent poor internal consistency.
Reliability Statistics

Cronbach's Alpha    N of Items
.407                3
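If you would like to see where the alpha value comes from, here is a minimal Python sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The item scores below are hypothetical.

```python
# Minimal sketch: Cronbach's alpha from raw item scores.
import numpy as np

# Hypothetical item scores: rows are respondents, columns are items.
items = np.array([
    [3, 4, 3],
    [5, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
], dtype=float)

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)      # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.3f}")
```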
Section 8.3 Test-Retest Reliability

Description
Test-retest reliability is a measure of temporal stability. As such, it is a measure of reliability. Unlike measures of internal consistency that tell you the extent to which all of the questions that make up a scale measure the same construct, measures of temporal stability tell you whether or not the instrument is consistent over time and/or over multiple administrations.

Assumptions
The total score for the scale should be an interval or ratio scale. The scale scores should be normally distributed.

SPSS Data Format
SPSS requires a variable representing the total score for the scale at the time of the first administration. A second variable representing the total score for the same participants at a different time (normally two weeks later) is also required.

Running the Command
The test-retest reliability coefficient is simply a Pearson correlation coefficient for the relationship between the total scores for the two administrations. To compute the coefficient, follow the directions for computing a Pearson correlation coefficient (Chapter 5, Section 5.1). Use the two variables representing the two administrations of the test.

Reading the Output
The correlation between the two scores is the test-retest reliability coefficient. It should be positive. Strong reliability is indicated by values close to 1.00. Weak reliability is indicated by values close to 0.00.

Section 8.4 Criterion-Related Validity

Description
Criterion-related validity determines the extent to which the scale you are testing correlates with a criterion. For example, ACT scores should correlate highly with GPA. If they do, that is a measure of validity for ACT scores. If they do not, that indicates that ACT scores may not be valid for the intended purpose.

Assumptions
All of the same assumptions for the Pearson correlation coefficient apply to measures of criterion-related validity (interval or ratio scales, normal distribution, etc.).

SPSS Data Format
Two variables are required. One variable represents the total score for the scale you are testing. The other represents the criterion you are testing it against.

Running the Command
Calculating criterion-related validity involves determining the Pearson correlation value between the scale and the criterion. See Chapter 5, Section 5.1 for complete information.

Reading the Output
The correlation between the two scores is the criterion-related validity coefficient. It should be positive. Strong validity is indicated by values close to 1.00. Weak validity is indicated by values close to 0.00.
Appendix A
Effect Size
Many disciplines are placing increased emphasis on reporting effect size. While statistical hypothesis testing provides a way to tell the odds that differences are real, effect sizes provide a way to judge the relative importance of those differences. That is, they tell us the size of the difference or relationship. They are also critical if you would like to estimate necessary sample sizes, conduct a power analysis, or conduct a meta-analysis. Many professional organizations (e.g., the American Psychological Association) are now requiring or strongly suggesting that effect sizes be reported in addition to the results of hypothesis tests.
Because there are at least 41 different types of effect sizes,¹ each with somewhat different properties, the purpose of this Appendix is not to be a comprehensive resource on effect size, but rather to show you how to calculate some of the most common measures of effect size using SPSS 15.0.

Cohen's d
One of the simplest and most popular measures of effect size is Cohen's d. Cohen's d is a member of a class of measurements called "standardized mean differences." In essence, d is the difference between two means divided by the overall standard deviation. It is not only a popular measure of effect size, but Cohen has also suggested a simple basis to interpret the value obtained. Cohen² suggested that effect sizes of .2 are small, .5 are medium, and .8 are large.
We will discuss Cohen's d as the preferred measure of effect size for t tests. Unfortunately, SPSS does not calculate Cohen's d. However, this appendix will cover how to calculate it from the output that SPSS does produce.
Effect Size for Single-Sample t Tests
Although SPSS does not calculate the effect size for the single-sample t test, calculating Cohen's d is a simple matter.
¹ Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational & Psychological Measurement, 56, 746-759.
² Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
T-Test
[SPSS output: One-Sample Statistics table for LENGTH, showing N, Mean, and Std. Deviation.]
Cohen's d for a single-sample t test is equal to the mean difference over the standard deviation. If SPSS provides us with the following output, we calculate d as indicated here:
[SPSS output: One-Sample Test table (Test Value = 35), showing t, df, Sig. (2-tailed), the Mean Difference, and the 95% Confidence Interval of the Difference.]

$d = \dfrac{D}{SD}$

where D is the mean difference and SD is the standard deviation.
In this example, using Cohen's guidelines to judge effect size, we would have an effect size between medium and large.
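The same arithmetic can be done in a short Python sketch (not part of SPSS); the mean difference and standard deviation below are hypothetical, so substitute the values from your own One-Sample output.

```python
# A minimal sketch of Cohen's d for a single-sample t test:
# d = |mean difference| / standard deviation.
def cohens_d_single(mean_difference, std_deviation):
    return abs(mean_difference) / std_deviation

print(cohens_d_single(mean_difference=1.2, std_deviation=2.0))   # 0.6
```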
Effect Size for Independent-Samples t Tests
Calculating the effect size from the independent-samples t test output is a little more complex because SPSS does not provide us with the pooled standard deviation. The upper section of the output, however, does provide us with the information we need to calculate it. The output presented here is the same output we worked with in Chapter 6.
[SPSS output: Group Statistics table for grade, split by morning (no/yes), showing N, Mean, Std. Deviation, and Std. Error Mean for each group.]
$s_{pooled} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

$s_{pooled} = \sqrt{\dfrac{(2 - 1)3.5355^2 + (2 - 1)7.0711^2}{2 + 2 - 2}}$

$s_{pooled} = 5.59$
Once we have calculated the pooled standard deviation ($s_{pooled}$), we can calculate Cohen's d:

$d = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}$
So, in this example,using Cohen's guidelinesfor the interpretationof d, we would have obtaineda large effect size.
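The pooled standard deviation and Cohen's d can also be scripted. In the Python sketch below (not part of SPSS), the group sizes and standard deviations match the worked example above, while the two group means are hypothetical placeholders; read your own from the Group Statistics table.

```python
# A minimal sketch of Cohen's d for an independent-samples t test.
from math import sqrt

def cohens_d_independent(n1, mean1, sd1, n2, mean2, sd2):
    s_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return abs(mean1 - mean2) / s_pooled

# n's and standard deviations from the example above; means are hypothetical.
print(cohens_d_independent(2, 82.5, 3.5355, 2, 78.0, 7.0711))   # about 0.8
```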
Effect Size for Paired-Samples t Tests
As you have probably learned in your statistics class, a paired-samples t test is really just a special case of the single-sample t test. Therefore, the procedure for calculating Cohen's d is also the same. The SPSS output, however, looks a little different, so you will be taking your values from different areas.
[SPSS output: Paired Samples Test table for PRETEST - FINAL, showing the Mean of the paired differences (-22.8095), the Std. Deviation of the differences (8.9756), the Std. Error Mean, the 95% Confidence Interval of the Difference, t (-11.646), df (20), and Sig. (2-tailed) = .000.]
$d = \dfrac{D}{SD} = \dfrac{22.8095}{8.9756} = 2.54$

Notice that in this example, we represent the effect size (d) as a positive number even though the mean difference is negative. Effect sizes are always positive numbers. In this example, using Cohen's guidelines for the interpretation of d, we have found a very large effect size.

r² (Coefficient of Determination)
While Cohen's d is the appropriate measure of effect size for t tests, correlation and regression effect sizes should be determined by squaring the correlation coefficient. This squared correlation is called the coefficient of determination. Cohen³ suggested that correlations of .5, .3, and .1 correspond to large, moderate, and small relationships. Those values squared yield coefficients of determination of .25, .09, and .01, respectively. It would appear, therefore, that Cohen is suggesting that accounting for 25% of the variability represents a large effect, 9% a moderate effect, and 1% a small effect.
³ Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New Jersey: Lawrence Erlbaum.
The standard measure of effect size for correlations is the coefficient of determination (r²) discussed above. The coefficient should be interpreted as the proportion of variance in the dependent variable that can be accounted for by the relationship between the independent and dependent variables. While Cohen provided some useful guidelines for interpretation, each problem should be interpreted in terms of its true practical significance. For example, if a treatment is very expensive to implement, or has significant side effects, then a larger correlation should be required before the relationship becomes "important." For treatments that are very inexpensive, a much smaller correlation can be considered "important." To calculate the coefficient of determination, simply take the r value that SPSS provides and square it.
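As a quick illustration (with a hypothetical r value, not one from the text), squaring the correlation in code amounts to a single line:

```python
# A minimal sketch: the coefficient of determination is the square of r.
r = -0.30                    # hypothetical correlation reported by SPSS
r_squared = r ** 2
print(f"r = {r}, r squared = {r_squared:.2f}")   # 0.09, a moderate effect
```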
Effect Size for Regression
The Model Summary section of the output reports R² for you. The example output here shows a coefficient of determination of .649, meaning that almost 65% (.649) of the variability in the dependent variable is accounted for by the relationship between the dependent and independent variables.
Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .806   .649       .624                16.1480
Effect Size for Analysis of Variance
For most Analysis of Variance problems, you should elect to report Eta Squared as your effect size measure. SPSS provides this calculation for you as part of the General Linear Model (GLM) command.
To obtain Eta Squared, simply click on the Options box in the main dialog box for the GLM command you are running (this works for the Univariate, Multivariate, and Repeated Measures versions of the command, even though only the Univariate option is presented here).
Once you have selected Options, a new dialog box will appear. One of the options in that box will be Estimates of effect size. When you select that box, SPSS will provide Eta Squared values as part of your output.
[SPSS output: Tests of Between-Subjects Effects table (Dependent Variable: score), including a Partial Eta Squared column. Key values: group Type III Sum of Squares = 10.450 (df = 1), Error = 3.283 (df = 12), Corrected Total = 13.733, Partial Eta Squared for group = .761; footnote: R Squared = .761 (Adjusted R Squared = .721).]
In the example here, we obtained an Eta Squared of .761 for our main effect for group membership. Because we interpret Eta Squared using the same guidelines as r², we would conclude that this represents a large effect size for group membership.
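Eta Squared can also be recomputed by hand from the sums of squares in the table. The Python sketch below (not part of SPSS) uses the group and Error sums of squares from the example output above.

```python
# A minimal sketch of partial Eta Squared from an ANOVA table:
# SS_effect / (SS_effect + SS_error).
def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

# Sums of squares taken from the example output above.
print(round(partial_eta_squared(ss_effect=10.450, ss_error=3.283), 3))   # 0.761
```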
Practice Data Set 2
A survey of employees is conducted. Each employee provides the following information: Salary (SALARY), Years of Service (YOS), Sex (SEX), Job Classification (CLASSIFY), and Education Level (EDUC). Note that you will have to code SEX (Male = 1, Female = 2) and CLASSIFY (Clerical = 1, Technical = 2, Professional = 3).
SALARY   YOS   SEX      CLASSIFY       EDUC
35,000   8     Male     Technical      14
18,000   4     Female   Clerical       10
20,000   1     Male     Professional   16
50,000   20    Female   Professional   16
38,000   6     Male     Professional   20
20,000   6     Female   Clerical       12
75,000   17    Male     Professional   20
40,000   4     Female   Technical      12
30,000   8     Male     Technical      14
22,000   15    Female   Clerical       12
23,000   16    Male     Clerical       12
45,000   2     Female   Professional   16
Practice Data Set 3
Participants who have phobias are given one of three treatments (CONDITION). Their anxiety level (1 to 10) is measured at three intervals: before treatment (ANXPRE), one hour after treatment (ANX1HR), and again four hours after treatment (ANX4HR). Note that you will have to code the variable CONDITION.
ANXPRE   ANX1HR   ANX4HR   CONDITION
8        7        7        Placebo
10       10       10       Placebo
9        7        8        Placebo
7        6        6        Placebo
7        7        7        Placebo
9        4        5        Valium™
10       6        1        Valium™
9        5        5        Valium™
8        3        5        Valium™
6        3        4        Valium™
8        5        3        Experimental Drug
6        5        2        Experimental Drug
9        8        4        Experimental Drug
10       9        4        Experimental Drug
7        6        3        Experimental Drug
Appendix C
Glossary
All Inclusive. A set of events that encompasses every possible outcome.
Alternative Hypothesis. The opposite of the null hypothesis, normally showing that there is a true difference. Generally, this is the statement that the researcher would like to support.
Case Processing Summary. A section of SPSS output that lists the number of subjects used in the analysis.
Coefficient of Determination. The value of the correlation, squared. It provides the proportion of variance accounted for by the relationship.
Cohen's d. A common and simple measure of effect size that standardizes the difference between groups.
Correlation Matrix. A section of SPSS output in which correlation coefficients are reported for all pairs of variables.
Covariate. A variable known to be related to the dependent variable, but not treated as an independent variable. Used in ANCOVA as a statistical control technique.
Data Window. The SPSS window that contains the data in a spreadsheet format. This is the window used for running most commands.
Dependent Variable. An outcome or response variable. The dependent variable is normally dependent on the independent variable.
Descriptive Statistics. Statistical procedures that organize and summarize data.
Dialog Box. A window that allows you to enter information that SPSS will use in a command.
Dichotomous Variables. Variables with only two levels (e.g., gender).
Discrete Variable. A variable that can have only certain values (i.e., values between which there is no score, like A, B, C, D, F).
Effect Size. A measure that allows one to judge the relative importance of a difference or relationship by reporting the size of a difference.
Eta Squared (η²). A measure of effect size used in Analysis of Variance models.
Grouping Variable. In SPSS, the variable used to represent group membership. SPSS often refers to independent variables as grouping variables; SPSS sometimes refers to grouping variables as independent variables.
Independent Events. Two events are independent if information about one event gives no information about the second event (e.g., two flips of a coin).
Independent Variable. The variable whose levels (values) determine the group to which a subject belongs. A true independent variable is manipulated by the researcher. See Grouping Variable.
Inferential Statistics. Statistical procedures designed to allow the researcher to draw inferences about a population on the basis of a sample.
Interaction. With more than one independent variable, an interaction occurs when a level of one independent variable affects the influence of the other independent variable.
Internal Consistency. A reliability measure that assesses the extent to which all of the items in an instrument measure the same construct.
Interval Scale. A measurement scale where items are placed in mutually exclusive categories, with equal intervals between values. Appropriate transformations include counting, sorting, and addition/subtraction.
Levels. The values that a variable can have. A variable with three levels has three possible values.
Mean. A measure of central tendency where the sum of the deviation scores equals zero.
Median. A measure of central tendency representing the middle of a distribution when the data are sorted from low to high. Fifty percent of the cases are below the median.
Mode. A measure of central tendency representing the value (or values) with the most subjects (the score with the greatest frequency).
Mutually Exclusive. Two events are mutually exclusive when they cannot occur simultaneously.
Nominal Scale. A measurement scale where items are placed in mutually exclusive categories. Differentiation is by name only (e.g., race, sex). Appropriate categories include "same" or "different." Appropriate transformations include counting.
Normal Distribution. A symmetric, unimodal, bell-shaped curve.
Null Hypothesis. The hypothesis to be tested, normally in which there is no true difference. It is mutually exclusive of the alternative hypothesis.
Ordinal Scale. A measurement scale where items are placed in mutually exclusive categories, in order. Appropriate categories include "same," "less," and "more." Appropriate transformations include counting and sorting.
Outliers. Extreme scores in a distribution. Scores that are very distant from the mean and the rest of the scores in the distribution.
Output Window. The SPSS window that contains the results of an analysis. The left side summarizes the results in an outline. The right side contains the actual results.
Percentiles (Percentile Ranks). A relative score that gives the percentage of subjects who scored at the same value or lower.
Pooled Standard Deviation. A single value that represents the standard deviation of two groups of scores.
Protected Dependent t Tests. To prevent the inflation of a Type I error, the level needed to be significant is reduced when multiple tests are conducted.
Quartiles. The points that divide a distribution into four equal parts. The scores at the 25th, 50th, and 75th percentile ranks.
Random Assignment. A procedure for assigning subjects to conditions in which each subject has an equal chance of being assigned to any condition.
Range. A measure of dispersion representing the number of points from the highest score through the lowest score.
Ratio Scale. A measurement scale where items are placed in mutually exclusive categories, with equal intervals between values, and a true zero. Appropriate transformations include counting, sorting, addition/subtraction, and multiplication/division.
Reliability. An indication of the consistency of a scale. A reliable scale is internally consistent and stable over time.
Robust. A test is said to be robust if it continues to provide accurate results even after the violation of some assumptions.
Significance. A difference is said to be significant if the probability of making a Type I error is less than the accepted limit (normally 5%). If a difference is significant, the null hypothesis is rejected.
Skew. The extent to which a distribution is not symmetrical. Positive skew has outliers on the positive (right) side of the distribution. Negative skew has outliers on the negative (left) side of the distribution.
Standard Deviation. A measure of dispersion representing a special type of average deviation from the mean.
Standard Error of Estimate. The equivalent of the standard deviation for a regression line. The data points will be normally distributed around the regression line with a standard deviation equal to the standard error of the estimate.
Standard Normal Distribution. A normal distribution with a mean of 0.0 and a standard deviation of 1.0.
String Variable. A string variable can contain letters and numbers. Numeric variables can contain only numbers. Most SPSS commands will not function with string variables.
Temporal Stability. This is achieved when reliability measures have determined that scores remain stable over multiple administrations of the instrument.
Tukey's HSD. A post-hoc comparison purported to reveal an "honestly significant difference" (HSD).
Type I Error. A Type I error occurs when the researcher erroneously rejects the null hypothesis.
Type II Error. A Type II error occurs when the researcher erroneously fails to reject the null hypothesis.
Valid Data. Data that SPSS will use in its analyses.
Validity. An indication of the accuracy of a scale.
Variance. A measure of dispersion equal to the squared standard deviation.
Appendix D
Variables: COIN1, COIN2
Entered in Chapter 4

QUESTIONS.sav
Variables: Q1, Q2 (recoded in Chapter 2), Q3, TOTAL (added in Chapter 2), GROUP (added in Chapter 2)
Entered in Chapter 2
Modified in Chapter 2
RACE.sav
Variables:
Entered in Chapter 6

Other Files
For some practice exercises, data sets are needed that are not used in any other examples in the text; see Appendix B.
Appendix E
Graphing Functions
Prior to SPSS 12.0, the graphing functions of SPSS were very limited. If you are using a version of SPSS older than 12.0, third-party software like Excel or SigmaPlot is recommended for the construction of graphs. If you are using Version 14.0 of the software, use Appendix F as an alternative to Chapter 4, which discusses graphing.
Variable icons indicate measurement type.
In versions of SPSS earlier than 14.0, variables were represented in dialog boxes with their variable label and an icon that indicated whether the variable was string or numeric (the example here shows all variables that were numeric). Starting with Version 14.0, SPSS shows additional information about each variable. Icons now represent not only whether a variable is numeric or not, but also what type of measurement scale it is. Nominal, ordinal, and interval or ratio variables (SPSS refers to the latter as scale variables) are each represented by their own distinct icon.
Appendix F
Once Chart Editor is open, you can easily edit each element of the graph. To select an element, just click on the relevant spot on the graph. For example, to select the element representing the title of the graph, click somewhere on the title (the word "Histogram" in the example below).
Once you have selected an element, you can tell that the correct element is selected because it will have handles around it.
If the item you have selected is a text element (e.g., the title of the graph), a cursor will be present and you can edit the text as you would in a word processing program. If you would like to change another attribute of the element (e.g., the color or font size), use the Properties box (Text properties are shown above).
With a little practice, you can make excellent graphs using SPSS. Once your graph is formatted the way you want it, simply select File, then Close.

Data Set
For the graphing examples, we will use a new set of data. Enter the data below and save the file as HEIGHT.sav. The data represent participants' HEIGHT (in inches), WEIGHT (in pounds), and SEX (1 = male, 2 = female).

HEIGHT   WEIGHT   SEX
66       150      1
69       155      1
73       160      1
72       160      1
68       150      1
63       140      1
74       165      1
70       150      1
66       110      2
64       100      2
60       95       2
67       110      2
64       105      2
63       100      2
67       110      2
65       105      2
Check that you have entered the data correctly by calculating a mean for each of the three variables (click Analyze, then Descriptive Statistics, then Descriptives). Compare your results with those in the table below.
Descriptive Statistics

                     N    Minimum   Maximum   Mean       Std. Deviation
HEIGHT               16   60.00     74.00     66.9375    3.9067
WEIGHT               16   95.00     165.00    129.0625   26.3451
SEX                  16   1.00      2.00      1.5000     .5164
Valid N (listwise)   16
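If you would also like to double-check the data entry outside of SPSS, the Python sketch below (not part of SPSS) reproduces the three means from the table above.

```python
# A minimal sketch: recompute the means of the HEIGHT.sav data.
from statistics import mean

height = [66, 69, 73, 72, 68, 63, 74, 70, 66, 64, 60, 67, 64, 63, 67, 65]
weight = [150, 155, 160, 160, 150, 140, 165, 150,
          110, 100, 95, 110, 105, 100, 110, 105]
sex = [1] * 8 + [2] * 8

print(mean(height), mean(weight), mean(sex))   # 66.9375 129.0625 1.5
```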
Click the Charts button at the bottom to produce charts of the frequency distributions. This will give you the Frequencies: Charts dialog box.
There are three types of charts available under this command: Bar charts, Pie charts, and Histograms. For each type, the Y axis can be either a frequency count or a percentage (selected through the Chart Values option). You will receive the charts for any variables selected in the main Frequencies command dialog box.
Output
The bar chart consists of a Y axis, representing the frequency, and an X axis, representing each score. Note that the only values represented on the X axis are those with nonzero frequencies (61, 62, and 71 are not represented).
[Bar chart output: frequency of each HEIGHT value.]
The pie chart shows the percentage of the whole that is represented by each value.
The Histogram command creates a grouped frequency distribution. The range of scores is split into evenly spaced groups. The midpoint of each group is plotted on the X axis, and the Y axis represents the number of scores for each group.
If you select With Normal Curve, a normal curve will be superimposed over the distribution. This is very useful for helping you determine if the distribution you have is approximately normal.

Practice Exercise
Use Practice Data Set 1 in Appendix B. After you have entered the data, construct a histogram that represents the mathematics skills scores and displays a normal curve, and a bar chart that represents the frequencies for the variable AGE.
Scatterplots

Description
Scatterplots (also called scattergrams or scatter diagrams) display two values for each case with a mark on the graph. The X axis represents the value for one variable. The Y axis represents the value for the second variable.

Assumptions
Both variables should be interval or ratio scales. If nominal or ordinal data are used, be cautious about your interpretation of the scattergram.

SPSS Data Format
You need two variables to perform this command.

Running the Command
You can produce scatterplots by clicking Graphs, then Scatter/Dot. This will give you the first Scatterplot dialog box. Select the desired scatterplot (normally, you will select Simple Scatter), then click Define.
This will give you the main Scatterplot dialog box. Enter one of your variables as the Y axis and the second as the X axis. For example, using the HEIGHT.sav data set, enter HEIGHT as the Y axis and WEIGHT as the X axis. Click OK.
Output
The output will consist of a mark for each subject at the appropriate X and Y levels.
[Scatterplot output: HEIGHT plotted against WEIGHT.]
Adding a Third Variable
Even though the scatterplot is a two-dimensional graph, it can plot a third variable. To make it do so, enter the third variable in the Set Markers by field. In our example, we will enter the variable SEX in the Set Markers by space. Now our output will have two different sets of marks. One set represents the male participants, and the second set represents the female participants. These two sets will appear in different colors on your screen. You can use the SPSS chart editor to make them different shapes, as in the graph that follows.
Graph
[Scatterplot output: height (Y axis) plotted against weight (X axis), with separate markers for SEX = 1.00 and SEX = 2.00.]
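If you would like to reproduce a similar plot outside of SPSS, the Python sketch below uses matplotlib (a third-party plotting library, not part of SPSS) to draw the same HEIGHT by WEIGHT scatterplot with a separate marker color for each SEX value.

```python
# A minimal sketch of the HEIGHT by WEIGHT scatterplot, marked by SEX.
import matplotlib.pyplot as plt

height = [66, 69, 73, 72, 68, 63, 74, 70, 66, 64, 60, 67, 64, 63, 67, 65]
weight = [150, 155, 160, 160, 150, 140, 165, 150,
          110, 100, 95, 110, 105, 100, 110, 105]
sex = [1] * 8 + [2] * 8

for value, color in [(1, "tab:blue"), (2, "tab:orange")]:
    xs = [w for w, s in zip(weight, sex) if s == value]
    ys = [h for h, s in zip(height, sex) if s == value]
    plt.scatter(xs, ys, color=color, label=f"SEX = {value}")

plt.xlabel("weight")
plt.ylabel("height")
plt.legend()
plt.show()
```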
Practice Exercise
Use Practice Data Set 2 in Appendix B. Construct a scatterplot to examine the relationship between SALARY and EDUCATION.

Advanced Bar Charts

Description
You can produce bar charts with the Frequencies command (see Chapter 4, Section 4.3). Sometimes, however, we are interested in a bar chart where the Y axis is not a frequency. To produce such a chart, we need to use the Bar Charts command.

SPSS Data Format
At least two variables are needed to perform this command. There are two basic kinds of bar charts: those for between-subjects designs and those for repeated-measures designs. Use the between-subjects method if one variable is the independent variable and the other is the dependent variable. Use the repeated-measures method if you have a dependent variable for each value of the independent variable (e.g., you would have three variables for a design with three values of the independent variable). This normally occurs when you take multiple observations over time.
Running the Command
Click Graphs, then Bar for either type of bar chart. This will open the Bar Charts dialog box. If you have one independent variable, select Simple. If you have more than one, select Clustered.
If you are using a between-subjects design, select Summaries for groups of cases. If you are using a repeated-measures design, select Summaries of separate variables.
If you are creating a repeated-measures graph, you will see the dialog box below. Move each variable over to the Bars Represent area, and SPSS will place it inside parentheses following Mean. This will give you a graph like the one below at right. Note that this example uses the GRADES.sav data entered in Section 6.4 (Chapter 6).
Practice Exercise
Use Practice Data Set 1 in Appendix B. Construct a bar graph examining the relationship between mathematics skills scores and marital status. Hint: In the Bars Represent area, enter SKILL as the variable.