
Simple Linear Regression and Correlation

http://faculty.kfupm.edu.sa/SE/salamah/Engineering_Statistics/engineering_statistics.htm Dr Muhammad Al-Salamah محمد السلامة

Objectives of the topic:
- Building simple linear regression models from data.
- Understanding the method of least squares and how it is used to estimate regression model parameters.
- Assessing the adequacy of the regression model.
- Testing hypotheses and constructing confidence intervals on regression model parameters.
- Predicting future values and constructing prediction intervals.
- Applying the correlation model.

Dr Muhammad Al-Salamah, Industrial Engineering, KFUPM

Regression models
Regression models are used to establish a relationship between two or more variables.

Observation   Hydrocarbon level x (%)   Purity y (%)
1             0.99                      90.01
2             1.02                      89.05
3             1.15                      91.43
4             1.29                      93.74
5             1.46                      96.73
6             1.36                      94.45
7             0.87                      87.59
8             1.23                      91.77
9             1.55                      99.42
10            1.40                      93.65
11            1.19                      93.54
12            1.15                      92.52
13            0.98                      90.56
14            1.01                      89.54
15            1.11                      89.85
16            1.20                      90.39
17            1.26                      93.25
18            1.32                      93.41
19            1.43                      94.98
20            0.95                      87.33

When the scatter plot of two variables shows a linear trend, we say the two variables have a linear relationship.
The linear relationship between the mean of a random variable Y and x is given as
$$E(Y|x) = \mu_{Y|x} = \beta_0 + \beta_1 x$$
Since Y is a random variable, we can write
$$Y = \beta_0 + \beta_1 x + \epsilon$$
The variable $\epsilon$ is the random error, which has a mean of 0 and a variance of $\sigma^2$.
It follows that, for a fixed value of x,
$$E(Y|x) = E(\beta_0 + \beta_1 x + \epsilon) = \beta_0 + \beta_1 x + E(\epsilon) = \beta_0 + \beta_1 x$$
$$V(Y|x) = V(\beta_0 + \beta_1 x + \epsilon) = V(\beta_0 + \beta_1 x) + V(\epsilon) = 0 + \sigma^2 = \sigma^2$$

Thus, the regression model is a line of mean values, and the variability of Y at a particular value of x is $\sigma^2$.
When $\epsilon$ is normal with mean 0 and variance $\sigma^2$, Y is normally distributed with mean $\beta_0 + \beta_1 x$ and variance $\sigma^2$.

Simple linear regression
For simple linear regression, there is a single regressor (or predictor) variable x and a dependent (or response) variable Y.
Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
The method of least squares is used to estimate the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squares of the vertical deviations from the straight line.

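The ith response $y_i$ can be expressed as
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
The sum of the squares of the deviations of the observations from the true regression line is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
The least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ of the parameters of the regression line must satisfy
$$\left. \frac{\partial L}{\partial \beta_0} \right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$
$$\left. \frac{\partial L}{\partial \beta_1} \right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) x_i = 0$$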

Simplifying the last two equations leads to the normal equations:
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$
The least squares estimates of the intercept and the slope of the regression line are
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
and
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
The fitted regression line is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

For each pair of observations, the following relation holds:
$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad i = 1, \ldots, n$$
The term $e_i$ is called the residual, and it is equal to
$$e_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n$$
Define the following notation:
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}$$
The least squares estimate of the slope can then be written as
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$
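As a rough illustration of the formulas above, the following Python sketch computes $S_{xx}$, $S_{xy}$ and the least squares estimates for the hydrocarbon level and purity data listed under "Regression models". The function name `least_squares_fit` is an illustrative choice and does not come from the slides.

```python
def least_squares_fit(x, y):
    """Return (b0, b1, sxx, sxy) for the fitted line y = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Computational forms of S_xx and S_xy
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = sxy / sxx                     # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate: y_bar - b1 * x_bar
    return b0, b1, sxx, sxy


# Hydrocarbon level (x, %) and purity (y, %) for the 20 observations
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

b0, b1, sxx, sxy = least_squares_fit(x, y)
print(f"Sxx = {sxx:.3f}, Sxy = {sxy:.3f}")
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```

Up to rounding, the result should agree with the fitted model $\hat{y} = 74.283 + 14.947x$ quoted for these data in the later examples.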

Example
See Example 11-1 in the textbook.

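Estimating $\sigma^2$
The residuals can be used to estimate the variance of the error.
The error sum of squares is given by
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
An unbiased estimator of $\sigma^2$ can be shown to be
$$\hat{\sigma}^2 = \frac{SS_E}{n - 2}$$
The error sum of squares can also be written as
$$SS_E = SS_T - \hat{\beta}_1 S_{xy}$$
where
$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
is the total sum of squares of y.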

Example
A simple linear regression model is assumed to adequately establish the relationship between compressive strength x and intrinsic permeability y of concrete mixes.
A sample of n = 14 was taken, and it was found that
$$\sum y_i = 572, \quad \sum y_i^2 = 23{,}530, \quad \sum x_i = 43, \quad \sum x_i^2 = 157.42, \quad \sum x_i y_i = 1{,}697.8$$
- Calculate the least squares estimates of the slope and the intercept of the regression line.
- Estimate $\sigma^2$.
- Predict the value of y for x = 4.3.
- For x = 3.7 and y = 46.1, compute the value of the residual.

Using the formulas:
$$S_{xx} = 157.42 - \frac{43^2}{14} = 25.35$$
$$S_{xy} = 1{,}697.8 - \frac{(572)(43)}{14} = -59.06$$
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = -2.33, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 48.01$$
Computing the variance:
$$SS_T = 23{,}530 - \frac{(572)^2}{14} = 159.71$$
$$SS_E = SS_T - \hat{\beta}_1 S_{xy} = 159.71 - (-2.33)(-59.06) = 22.11$$
$$\hat{\sigma}^2 = \frac{22.11}{12} = 1.84$$

For x = 4.3, the predicted value of y is
$$\hat{y} = 48.01 - 2.33(4.3) = 37.99$$
For x = 3.7, the regression model gives
$$\hat{y} = 48.01 - 2.33(3.7) = 39.39$$
The residual is
$$e = 46.1 - 39.39 = 6.71$$
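The arithmetic of this example can be checked with a short Python sketch that works directly from the five sums given in the problem statement. The variable names are illustrative assumptions. Because the script does not round intermediate values, $SS_E$ may differ in the second decimal place from the hand calculation.

```python
# Summary statistics given in the example (n = 14 concrete mixes)
n = 14
sum_y, sum_y2 = 572.0, 23530.0
sum_x, sum_x2 = 43.0, 157.42
sum_xy = 1697.8

# Corrected sums of squares and cross products
sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
sst = sum_y2 - sum_y ** 2 / n

# Least squares estimates
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n

# Error sum of squares and the unbiased variance estimate
sse = sst - b1 * sxy
sigma2_hat = sse / (n - 2)


def predict(x0):
    """Point on the fitted line at x0."""
    return b0 + b1 * x0


residual = 46.1 - predict(3.7)

print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
print(f"SSE = {sse:.2f}, sigma^2 estimate = {sigma2_hat:.2f}")
print(f"prediction at x = 4.3: {predict(4.3):.2f}")
print(f"residual at (3.7, 46.1): {residual:.2f}")
```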

Properties of the least squares estimators
The values of the intercept and the slope of the regression line depend on the observed values of the response variable y, which is a random variable with mean $\beta_0 + \beta_1 x$ and variance $\sigma^2$.
Thus, the least squares estimators are in turn random variables.
It has been shown that
$$E(\hat{\beta}_0) = \beta_0, \quad E(\hat{\beta}_1) = \beta_1$$
$$V(\hat{\beta}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right], \quad V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$

The covariance of the slope and intercept estimators has been shown to equal
$$\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_1) = -\frac{\sigma^2 \bar{x}}{S_{xx}}$$
The estimated standard error of the slope and the estimated standard error of the intercept are
$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}, \quad se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}$$

Hypothesis tests
Consider the hypothesis on the slope of the regression line:
$$H_0: \beta_1 = \beta_{1,0} \qquad H_1: \beta_1 \neq \beta_{1,0}$$
Under $H_0$, the statistic
$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$
follows the t distribution with n - 2 degrees of freedom.
The null hypothesis is rejected if $|t_0| > t_{\alpha/2, n-2}$.
The test statistic $T_0$ can also be written as
$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$$

The hypothesis on the intercept can be written as
$$H_0: \beta_0 = \beta_{0,0} \qquad H_1: \beta_0 \neq \beta_{0,0}$$
The test statistic for this hypothesis is
$$T_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)} = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}}$$
The null hypothesis is rejected if $|t_0| > t_{\alpha/2, n-2}$.

Significance of regression
A special hypothesis is given by
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
Failure to reject $H_0$ is equivalent to concluding that there is no linear relationship between x and Y.
If $H_0$ is rejected, this implies that x is of value in explaining the variability in Y.

Example
See Example 11-2 in the textbook.
For the hydrocarbon level (x) and purity (y) data of the 20 observations tabulated under "Regression models", the regression model has been found to be
$$\hat{y} = 74.283 + 14.947 x$$
Test the hypothesis
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
for $\alpha = 0.05$. In the calculations, maintain 3 decimal places.

We compute the estimate of the variance, with $S_{xx} = 0.681$ and $S_{yy} = SS_T$:
$$SS_E = S_{yy} - \hat{\beta}_1 S_{xy} = 173.377 - 14.947(10.179) = 21.231$$
$$\hat{\sigma}^2 = \frac{SS_E}{n - 2} = \frac{21.231}{18} = 1.180$$
We compute the value of the test statistic:
$$t_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{14.947 - 0}{\sqrt{1.180 / 0.681}} = 11.355$$
From the t distribution tables, $t_{0.025,18} = 2.101$ for a two-sided test at $\alpha = 0.05$.
Since $|t_0| > t_{0.025,18}$, $H_0$ should be rejected, and we conclude that the slope of the regression line is different from zero.

Example
A simple linear regression model is assumed to adequately establish the relationship between compressive strength x and intrinsic permeability y of concrete mixes.
A sample of n = 14 was taken, and it was found that
$$\hat{\beta}_1 = -2.33, \quad \hat{\beta}_0 = 48.01, \quad S_{xx} = 25.35, \quad \hat{\sigma}^2 = 1.84$$
- Test for significance of regression for $\alpha = 0.05$.
- Estimate the variance of the estimated slope.

We need to test the hypothesis
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
The value of the test statistic is
$$t_0 = \frac{-2.33 - 0}{\sqrt{1.84 / 25.35}} = -8.65$$
Since $t_0 = -8.65 < -t_{0.025,12} = -2.179$, that is, $|t_0| > t_{0.025,12}$, $H_0$ must be rejected, and we conclude that x significantly explains the variability in y.
The estimated variance of the regression slope is
$$\hat{V}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{S_{xx}} = \frac{1.84}{25.35} = 0.07$$
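The two slope tests worked above can be reproduced with a small Python sketch. The helper name `slope_t_statistic` is an assumption made for illustration, and the critical values are simply the t table values $t_{0.025,18} = 2.101$ and $t_{0.025,12} = 2.179$ quoted in the slides, not values computed by the code.

```python
import math


def slope_t_statistic(b1, b1_null, sigma2_hat, sxx):
    """t statistic for H0: beta1 = b1_null, using se(b1) = sqrt(sigma2_hat / sxx)."""
    se_b1 = math.sqrt(sigma2_hat / sxx)
    return (b1 - b1_null) / se_b1


# Purity example: n = 20, so n - 2 = 18 degrees of freedom
t_purity = slope_t_statistic(14.947, 0.0, 1.180, 0.681)
reject_purity = abs(t_purity) > 2.101      # table value t_{0.025,18}

# Concrete example: n = 14, so n - 2 = 12 degrees of freedom
t_concrete = slope_t_statistic(-2.33, 0.0, 1.84, 25.35)
reject_concrete = abs(t_concrete) > 2.179  # table value t_{0.025,12}

# Estimated variance of the slope in the concrete example
var_b1 = 1.84 / 25.35

print(f"purity:   t0 = {t_purity:.3f}, reject H0: {reject_purity}")
print(f"concrete: t0 = {t_concrete:.2f}, reject H0: {reject_concrete}")
print(f"estimated V(b1) = {var_b1:.2f}")
```

The two-sided rule compares $|t_0|$ with the critical value, so the sign of the slope does not affect the decision.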


Analysis of variance
The method of analysis of variance can be used to test for significance of regression.
The total variability in Y is partitioned into meaningful components, which is the basis of the test.
The analysis of variance identity is stated as
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$SS_T = SS_R + SS_E$$
$SS_R$ measures the amount of variability in $y_i$ accounted for by the regression line, and it is called the regression sum of squares.

[Figure: the fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ with an observed point $(x, y)$, showing the deviation $(y - \bar{y})$ split into $(\hat{y} - \bar{y})$ and $(y - \hat{y})$.]

$SS_E$ is the residual variation left unexplained by the regression line, and it is called the error sum of squares.
$SS_T$ is the total corrected sum of squares of y.
$SS_T$ has n - 1 degrees of freedom, $SS_R$ has 1, and $SS_E$ has n - 2.
Dividing a sum of squares by its degrees of freedom leads to what is called a mean square. Thus, $SS_R / 1 = MS_R$ and $SS_E / (n - 2) = MS_E$.
Define the random variable
$$F_0 = \frac{SS_R / 1}{SS_E / (n - 2)} = \frac{MS_R}{MS_E}$$

Consider the hypothesis
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
If $H_0$ is true, then $F_0$ follows the F distribution with 1 and n - 2 degrees of freedom.
Thus, $H_0$ should be rejected if $f_0 > f_{\alpha, 1, n-2}$.
The computations are organized in the analysis of variance table:

Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            SS_R             1                    MS_R          MS_R / MS_E
Error                 SS_E             n - 2                MS_E
Total                 SS_T             n - 1

Example
See Example 11-3 in the textbook.
Test the significance of regression using the analysis of variance.
The analysis of variance table is

Source of variation   Sum of squares   Degrees of freedom   Mean square   f0
Regression            152.13           1                    152.13        128.86
Error                 21.25            18                   1.18
Total                 173.38           19

At $\alpha = 0.05$, $f_0 = 128.86 > f_{0.05,1,18} = 4.41$; hence we say that there is a significant relationship between x and y.
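The table entries can be rebuilt from the raw data with the following Python sketch, which assumes the 20 hydrocarbon level and purity observations listed under "Regression models". The variable names are illustrative, and the critical value 4.41 is the table value quoted above rather than something the code computes.

```python
# Hydrocarbon level (x, %) and purity (y, %) for the 20 observations
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares, n - 1 df

b1 = sxy / sxx
ssr = b1 * sxy          # regression sum of squares, 1 df
sse = sst - ssr         # error sum of squares, n - 2 df

msr = ssr / 1
mse = sse / (n - 2)
f0 = msr / mse

print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F0':>10}")
print(f"{'Regression':<12}{ssr:>10.2f}{1:>5}{msr:>10.2f}{f0:>10.2f}")
print(f"{'Error':<12}{sse:>10.2f}{n - 2:>5}{mse:>10.2f}")
print(f"{'Total':<12}{sst:>10.2f}{n - 1:>5}")
print("reject H0 at alpha = 0.05:", f0 > 4.41)  # 4.41 = f_{0.05,1,18} from tables
```

In simple linear regression $F_0$ equals the square of the t statistic for the slope, so this test and the t test of $H_0: \beta_1 = 0$ lead to the same decision.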

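Confidence intervals on the slope and intercept
It has been shown that the two random variables
$$\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \quad \text{and} \quad \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}}$$
follow the t distribution with n - 2 degrees of freedom.
A $100(1 - \alpha)\%$ confidence interval on the slope in simple linear regression is
$$\hat{\beta}_1 - t_{\alpha/2, n-2} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \leq \beta_1 \leq \hat{\beta}_1 + t_{\alpha/2, n-2} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$

A $100(1 - \alpha)\%$ confidence interval on the intercept in simple linear regression is
$$\hat{\beta}_0 - t_{\alpha/2, n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]} \leq \beta_0 \leq \hat{\beta}_0 + t_{\alpha/2, n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}$$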

Example
See Example 11-4 in the textbook.
It has been calculated that
$$\hat{\beta}_1 = 14.947, \quad S_{xx} = 0.681, \quad \hat{\sigma}^2 = 1.180$$
We are asked to construct a 95% confidence interval on the slope of the regression line.
From the t distribution tables, $t_{0.025,18} = 2.101$.
The confidence interval is
$$14.947 - 2.101 \sqrt{\frac{1.18}{0.681}} \leq \beta_1 \leq 14.947 + 2.101 \sqrt{\frac{1.18}{0.681}}$$
$$12.181 \leq \beta_1 \leq 17.713$$
Note: maintain 3 decimal places throughout the calculations.
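The interval on the slope can be checked numerically with the sketch below. It also evaluates the intercept interval from the formula given earlier, using $n = 20$, $\bar{x} = 1.196$ and $\hat{\beta}_0 = 74.283$ from the other purity examples in these slides; the intercept interval itself is not worked in the slides, so treat that part as an added illustration. The t value 2.101 is taken from the table, not computed.

```python
import math

# Values quoted in the slides for the purity data
n, x_bar = 20, 1.196
b0, b1 = 74.283, 14.947
sxx, sigma2_hat = 0.681, 1.180
t_crit = 2.101  # t_{0.025,18} from the t distribution tables

# Estimated standard errors of the slope and intercept
se_b1 = math.sqrt(sigma2_hat / sxx)
se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))

slope_ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
intercept_ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)

print(f"95% CI on slope:     {slope_ci[0]:.3f} to {slope_ci[1]:.3f}")
print(f"95% CI on intercept: {intercept_ci[0]:.3f} to {intercept_ci[1]:.3f}")
```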

Confidence interval on the mean response
At a specific value $x_0$, the confidence interval on $E(Y|x_0) = \mu_{Y|x_0}$ is called the confidence interval about the regression line.
Since $E(Y|x_0) = \mu_{Y|x_0} = \beta_0 + \beta_1 x_0$, an unbiased point estimator of the mean response at $x_0$ is
$$\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
The variance of the estimated mean response is
$$V(\hat{\mu}_{Y|x_0}) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]$$
Notice that the estimated mean response is normally distributed because the estimators of the slope and the intercept are both normal.

Therefore, the random variable
$$\frac{\hat{\mu}_{Y|x_0} - \mu_{Y|x_0}}{\sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]}}$$
has a t distribution with n - 2 degrees of freedom.
A $100(1 - \alpha)\%$ confidence interval on the mean response at $x = x_0$ is
$$\hat{\mu}_{Y|x_0} - t_{\alpha/2, n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} \leq \mu_{Y|x_0} \leq \hat{\mu}_{Y|x_0} + t_{\alpha/2, n-2} \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]}$$

Example
See Example 11-5 in the textbook.
It is given that
$$n = 20, \quad \bar{x} = 1.196, \quad \hat{\beta}_0 = 74.283, \quad \hat{\beta}_1 = 14.947, \quad S_{xx} = 0.681, \quad \hat{\sigma}^2 = 1.180$$
We are asked to construct a 95% confidence interval on the mean response for $x_0 = 1$.
The estimated mean response is
$$\hat{\mu}_{Y|x_0} = 74.283 + 14.947(1) = 89.23$$
From the t distribution tables, $t_{0.025,18} = 2.101$.
The confidence interval is
$$89.23 - 2.101 \sqrt{1.18 \left[ \frac{1}{20} + \frac{(1 - 1.196)^2}{0.681} \right]} \leq \mu_{Y|x_0} \leq 89.23 + 2.101 \sqrt{1.18 \left[ \frac{1}{20} + \frac{(1 - 1.196)^2}{0.681} \right]}$$
$$88.486 \leq \mu_{Y|x_0} \leq 89.974$$
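A minimal Python sketch of the mean response interval, using the summary values of this example, is given below. The function name `mean_response_ci` is an illustrative assumption and the t value is the tabulated $t_{0.025,18} = 2.101$.

```python
import math

# Summary values from the example
n, x_bar = 20, 1.196
b0, b1 = 74.283, 14.947
sxx, sigma2_hat = 0.681, 1.180
t_crit = 2.101  # t_{0.025,18} from the t distribution tables


def mean_response_ci(x0):
    """Confidence interval (lower, upper) on the mean response at x0."""
    mu_hat = b0 + b1 * x0
    se = math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / sxx))
    return mu_hat - t_crit * se, mu_hat + t_crit * se


lower, upper = mean_response_ci(1.0)
print(f"x0 = 1.0:   {lower:.3f} to {upper:.3f}")

# The interval is narrowest at x0 = x_bar and widens as x0 moves away from it
lower_c, upper_c = mean_response_ci(x_bar)
print(f"x0 = x_bar: {lower_c:.3f} to {upper_c:.3f}")
```

The second call illustrates why the confidence band around the fitted line flares out toward the ends of the x range: the term $(x_0 - \bar{x})^2 / S_{xx}$ vanishes only at $x_0 = \bar{x}$.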

[Figure: the confidence interval on the mean response plotted against $x_0$.]

Prediction of new observations
Regression models are used to predict new observations Y corresponding to a specific value of the regressor variable x.
If $x_0$ is the value of the regressor, then
$$\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
is the point estimator of the new value of the response $Y_0$.
The error in the prediction is given by
$$e_p = Y_0 - \hat{Y}_0$$
Because the new observation $Y_0$ is independent of the data used to fit the model, this error is a random variable with mean zero and variance
$$V(e_p) = V(Y_0 - \hat{Y}_0) = V(Y_0) + V(\hat{Y}_0) = \sigma^2 + \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right] = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]$$

If the estimated variance is used, then the random variable
$$\frac{Y_0 - \hat{Y}_0}{\sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]}}$$
has a t distribution with n - 2 degrees of freedom.
A $100(1 - \alpha)\%$ prediction interval on the future observation $Y_0$ at the value $x_0$ is
$$\hat{y}_0 - t_{\alpha/2, n-2} \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]} \leq Y_0 \leq \hat{y}_0 + t_{\alpha/2, n-2} \sqrt{\hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right]}$$

Example
See Example 11-6 in the textbook.
It is given that
$$n = 20, \quad \bar{x} = 1.196, \quad \hat{\beta}_0 = 74.283, \quad \hat{\beta}_1 = 14.947, \quad S_{xx} = 0.681, \quad \hat{\sigma}^2 = 1.180$$
We are asked to construct a 95% prediction interval on the response $Y_0$ for $x_0 = 1$.
The estimated response is
$$\hat{y}_0 = 74.283 + 14.947(1) = 89.23$$
From the t distribution tables, $t_{0.025,18} = 2.101$.
The prediction interval is
$$89.23 - 2.101 \sqrt{1.18 \left[ 1 + \frac{1}{20} + \frac{(1 - 1.196)^2}{0.681} \right]} \leq Y_0 \leq 89.23 + 2.101 \sqrt{1.18 \left[ 1 + \frac{1}{20} + \frac{(1 - 1.196)^2}{0.681} \right]}$$
$$86.829 \leq Y_0 \leq 91.631$$
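The prediction interval differs from the interval on the mean response only by the extra 1 inside the bracket. The Python sketch below, with the illustrative function name `prediction_interval` and the tabulated $t_{0.025,18} = 2.101$, evaluates it for this example.

```python
import math

# Summary values from the example
n, x_bar = 20, 1.196
b0, b1 = 74.283, 14.947
sxx, sigma2_hat = 0.681, 1.180
t_crit = 2.101  # t_{0.025,18} from the t distribution tables


def prediction_interval(x0):
    """Interval (lower, upper) for a single future observation Y0 at x0."""
    y0_hat = b0 + b1 * x0
    # The leading 1 accounts for the variance of the new observation itself
    se = math.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))
    return y0_hat - t_crit * se, y0_hat + t_crit * se


lower, upper = prediction_interval(1.0)
print(f"95% prediction interval at x0 = 1: {lower:.3f} to {upper:.3f}")
```

Because the variance of a new observation never shrinks below $\sigma^2$, this interval stays wide even for large n, unlike the interval on the mean response.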

Adequacy of the regression model
The assumptions of regression must be checked before the regression model can be used to provide meaningful information.
The assumptions of regression are:
- Errors are uncorrelated.
- Errors have mean zero.
- Errors have constant variance.
- Errors are normally distributed.
The order of the regression model must also be checked.

Residual analysis
Residuals are the differences between the actual observations and the fitted values:
$$e_i = y_i - \hat{y}_i$$
Residual analysis is used to check the assumption that the errors are approximately normal with constant variance, and to check whether additional terms in the model would be useful.
To check the normality of the residuals, the normal probability plot is used.
The standardized residuals are used more often than the raw residuals.
The standardized residual $d_i$ is computed as
$$d_i = \frac{e_i}{\sqrt{\hat{\sigma}^2}}$$

Approximately 95% of the standardized residuals should fall in the interval between -2 and +2 if the errors are normal.
Other plots are helpful, such as:
- Residuals in time sequence
- Residuals vs. fitted values
- Residuals vs. the regressor x
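As a rough sketch of this check, the following Python code fits the line to the hydrocarbon level and purity data listed under "Regression models", computes the standardized residuals $d_i = e_i / \sqrt{\hat{\sigma}^2}$, and counts how many fall between -2 and +2. The variable names are illustrative assumptions, and the plots themselves are left out.

```python
import math

# Hydrocarbon level (x, %) and purity (y, %) for the 20 observations
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Raw residuals e_i = y_i - y_hat_i
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standardized residuals d_i = e_i / sqrt(sigma2_hat)
sse = sum(e * e for e in residuals)
sigma_hat = math.sqrt(sse / (n - 2))
standardized = [e / sigma_hat for e in residuals]

inside = sum(1 for d in standardized if abs(d) <= 2)
print(f"{inside} of {n} standardized residuals lie in [-2, 2]")
print(f"largest |d_i| = {max(abs(d) for d in standardized):.2f}")
```

The `standardized` list is what would be passed to a normal probability plot or plotted against the fitted values or x.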

Example
See Example 11-7 in the textbook.

Coefficient of determination
A widely used measure of regression model adequacy is the coefficient of determination, $R^2$.
The coefficient of determination is
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$
From the analysis of variance identity, the value of $R^2$ is between 0 and 1.
The value of $R^2$ indicates the proportion of the variability in the data that is explained, or accounted for, by the regression model.
For example, when $R^2 = 0.877$, it is said that the model accounts for 87.7% of the variability in the data.
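The two equivalent forms of $R^2$ can be evaluated from the sums of squares in the analysis of variance table of the purity example (152.13, 21.25 and 173.38). The short Python sketch below uses illustrative variable names.

```python
# Sums of squares from the analysis of variance table of the purity example
ss_r, ss_e, ss_t = 152.13, 21.25, 173.38

r2_from_ssr = ss_r / ss_t       # share of variability explained by the line
r2_from_sse = 1 - ss_e / ss_t   # one minus the unexplained share

print(f"R^2 = {r2_from_ssr:.3f}")
```

Both expressions agree because $SS_T = SS_R + SS_E$.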

Correlation
In some applications of regression, both X and Y are random variables.
Hence, it is assumed that the observations $(X_i, Y_i)$ are jointly distributed random variables, with a joint distribution function $f(x, y)$.
It is assumed that $f(x, y)$ is a bivariate normal distribution.
The random variable Y has mean $\mu_Y$ and variance $\sigma_Y^2$; X has mean $\mu_X$ and variance $\sigma_X^2$.
The correlation coefficient between Y and X is defined as
$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
where $\sigma_{XY}$ is the covariance between X and Y.

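The conditional distribution of Y given X = x is normal:
$$f_{Y|x}(y) = \frac{1}{\sqrt{2\pi}\, \sigma_{Y|x}} \exp\left[ -\frac{1}{2} \left( \frac{y - \beta_0 - \beta_1 x}{\sigma_{Y|x}} \right)^2 \right]$$
where
$$\beta_0 = \mu_Y - \mu_X \rho \frac{\sigma_Y}{\sigma_X}, \quad \beta_1 = \rho \frac{\sigma_Y}{\sigma_X}$$
The variance is given by
$$\sigma_{Y|x}^2 = \sigma_Y^2 (1 - \rho^2)$$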

Thus, the conditional distribution of Y given X = x is normal with
$$E(Y|x) = \beta_0 + \beta_1 x$$
$$V(Y|x) = \sigma_{Y|x}^2$$
The maximum likelihood estimators of $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}$$

The estimator of $\rho$ is the sample correlation coefficient:
$$R = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{S_{XY}}{\sqrt{S_{XX}\, SS_T}}$$
It can be written that
$$\hat{\beta}_1 = \left( \frac{SS_T}{S_{XX}} \right)^{1/2} R$$
Hence, the slope is just the sample correlation coefficient R multiplied by a scale factor, which is the square root of the spread of the Y values divided by the spread of the X values.

It can be written that
$$R^2 = \hat{\beta}_1^2 \frac{S_{XX}}{SS_T} = \frac{\hat{\beta}_1 S_{XY}}{SS_T} = \frac{SS_R}{SS_T}$$
Hence, the coefficient of determination $R^2$ is simply the square of the sample correlation coefficient between Y and X.

Significance of correlation
The hypothesis on the significance of the correlation is stated as
$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$
The random variable
$$T_0 = \frac{R \sqrt{n - 2}}{\sqrt{1 - R^2}}$$
follows the t distribution with n - 2 degrees of freedom if $H_0$ is true.
Therefore, $H_0$ should be rejected if $|t_0| > t_{\alpha/2, n-2}$.

Example
See Example 11-8 in the textbook.
An industrial engineer is investigating the relationship between the pull strength of a wire bond (y) and the wire length (x).

Observation   Pull strength y   Wire length x
1             9.95              2
2             24.45             8
3             31.75             11
4             35.00             10
5             25.02             8
6             16.86             4
7             14.38             2
8             9.60              2
9             24.35             9
10            27.50             8
11            17.08             4
12            37.00             11
13            41.95             12
14            11.66             2
15            21.65             4
16            17.89             4
17            69.00             20
18            10.30             1
19            34.93             10
20            46.59             15
21            44.88             15
22            54.12             16
23            56.63             17
24            22.13             6
25            21.15             5

From the data, the following are computed:
$$S_{xx} = 698.56, \quad S_{xy} = 2027.71, \quad SS_T = 6105.94$$
The sample correlation coefficient is
$$r = \frac{2027.71}{\sqrt{698.56(6105.94)}} = 0.9818 \approx 0.98$$

We test the significance of the correlation:
$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$
at $\alpha = 0.05$. The test statistic is
$$t_0 = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.9818 \sqrt{25 - 2}}{\sqrt{1 - 0.9818^2}} = 24.80$$
The value 24.80 is obtained with the unrounded r; substituting the rounded value 0.98 gives about 23.6.
Since $t_0 > t_{0.025,23} = 2.069$, $H_0$ should be rejected.
It should be concluded that there is a significant correlation between Y and X.
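The correlation example can be verified with the Python sketch below, which starts from the three summary quantities $S_{xx}$, $S_{xy}$ and $SS_T$ given in the example. The variable names are illustrative, and the critical value 2.069 is the tabulated $t_{0.025,23}$ quoted in the slides. The last lines also check the relation between the slope and the sample correlation coefficient stated earlier.

```python
import math

# Summary quantities from the wire bond example (n = 25 observations)
n = 25
sxx, sxy, sst = 698.56, 2027.71, 6105.94

# Sample correlation coefficient
r = sxy / math.sqrt(sxx * sst)

# Test statistic for H0: rho = 0, with n - 2 = 23 degrees of freedom
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject = abs(t0) > 2.069  # table value t_{0.025,23}

# Slope two ways: S_xy / S_xx and r * sqrt(SS_T / S_xx)
b1 = sxy / sxx
b1_from_r = r * math.sqrt(sst / sxx)

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"t0 = {t0:.2f}, reject H0: {reject}")
print(f"slope = {b1:.4f} (from r: {b1_from_r:.4f})")
```

Keeping r unrounded matters here because $1 - r^2$ is small, so rounding r to two decimals changes $t_0$ noticeably.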
