Regression & Correlation
CORRELATION AND REGRESSION

10.1. INTRODUCTION

In the preceding chapters, we were dealing with the statistical analysis of univariate data, i.e., data related to a single measurement. Many statistical problems arise where we have pairs, triplets or a higher number of measurements. Such data are called bivariate, trivariate or multivariate data. In the present chapter, we shall be dealing with bivariate data, in which there will be pairs of values corresponding to two variables or characteristics for each unit under study, e.g., fertilizer used and yield of various plots, income and expenditure of different households, volume and pressure of a gas, height and weight of individuals, etc. In such cases the interest lies in analysing whether there is any relationship between the two characteristics under study, i.e., whether changes in the value of one variable cause changes in the value of the other in some systematic way or not. For example, one may be interested in studying whether, with an increase in the quantity of fertilizer used, the yield of a crop increases, decreases or remains unaffected.

In the case of physical sciences, most of the time there exists a strong relationship between the variables; e.g., for a gas at a fixed temperature, a given value of pressure corresponds to a definite volume. In such a case, the relationship can be studied by plotting the pairs of values. On the other hand, in social and in natural sciences, the exact effect of a change in one variable on the other is rarely known. For example, if we add the same quantity of fertilizer to different plots, we need not get the same yield in all the plots. The reason is that there are many other natural factors affecting the phenomenon which cannot be fully controlled. Here one wishes to know the extent to which yield is affected by the quantity of fertilizer, and in which direction. This will form the subject matter of correlation analysis.
Further, if the two variables are related, one may wish to estimate or predict the value of one variable with the help of the other known variable; e.g., one may wish to know the expected yield if a given quantity of fertilizer is used. This is done with the help of regression analysis, which we shall discuss later in this chapter.

10.2. CORRELATION ANALYSIS

As already pointed out, correlation analysis aims at finding the degree and the direction of relationship between two variables. As far as direction is concerned, if it is found that as the value of one variable increases, the value of the other variable also increases on the average, the two variables are said to be positively correlated. On the other hand, if the increase in one variable is followed by a decrease in the other, the variables are said to be negatively correlated. There may be a third situation where, with the change in one variable, the value of the other remains constant on the average. In such cases there is zero or no correlation, and the variables are then said to be uncorrelated.

Following are the methods commonly used for studying correlation between two variables:

1. Scatter diagram
2. Karl Pearson's coefficient of correlation
3. Rank correlation coefficient

Of these, the first method is based on a diagram, whereas the other two are mathematical procedures. Now we shall discuss these methods in detail.

1. Scatter Diagram. For a bivariate data, if the pairs of values are plotted in the x-y plane, the swarm of dots so obtained is called a scatter diagram. From this scatter diagram one can have a fairly good, though vague, idea about the correlation between the two variables. If the points (dots) are concentrated about some curve, the variables are expected to be correlated. If all the plotted points fall around a straight line, then we say that linear correlation exists between the two variables. For most of the practical situations, we assume the relationship to be linear just to introduce simplicity in the analysis.
On the other hand, if the dots are widely scattered, a poor correlation is expected.

Figure 10.1 represents some scatter diagrams: (a) perfect positive correlation, (b) limited degree of positive correlation, (c) perfect negative correlation, (d) limited degree of negative correlation, (e) zero correlation.

Fig. 10.1. Scatter diagrams showing positive, negative and zero correlation.

In (a) all the points lie on a straight line with positive slope. This is the case of perfect positive correlation. In (b) the dots are scattered about a line with positive slope. This indicates a limited degree of positive correlation. Dots in (c) and (d) indicate perfect negative correlation and a limited degree of negative correlation respectively. Dots in (e) are scattered in an irregular manner. Correlation in this case is said to be zero.

This is quite a subjective method of measuring correlation. It does not give a numerical value to the extent of relationship that exists between the two variables. Moreover, it is difficult to draw a scatter diagram if the number of observations is fairly large.

2. Karl Pearson's Coefficient of Correlation. To measure the degree of linear relationship between two variables, a British biometrician, Karl Pearson, developed a formula called the correlation coefficient. It is a numerical measure of linear relationship and is defined below.

If (X1, Y1), (X2, Y2), ..., (Xn, Yn) is a bivariate data, then Karl Pearson's correlation coefficient r(X, Y) is given as
r(X, Y) = Cov(X, Y) / (σX σY)                                        ...(1)

where

Cov(X, Y) = (1/n) Σ (Xi − X̄)(Yi − Ȳ) = (1/n) Σ XiYi − X̄ Ȳ

is called the covariance of X and Y, in analogy with the term variance that is used in case of a univariate data,

σX² = (1/n) Σ (Xi − X̄)² = (1/n) Σ Xi² − X̄²   is the variance of X, and

σY² = (1/n) Σ (Yi − Ȳ)² = (1/n) Σ Yi² − Ȳ²   is the variance of Y.

Thus, substituting the expressions for Cov(X, Y), σX and σY in (1), we get

r(X, Y) = Σ (Xi − X̄)(Yi − Ȳ) / √[Σ (Xi − X̄)² Σ (Yi − Ȳ)²] = (Σ XiYi − n X̄ Ȳ) / √[(Σ Xi² − n X̄²)(Σ Yi² − n Ȳ²)].

The correlation coefficient calculated by the above formula is also called the product moment correlation coefficient.

Properties of Correlation Coefficient. Some important properties of the correlation coefficient are given below along with their proofs.

(i) The value of the correlation coefficient lies between −1 and 1, i.e., −1 ≤ r ≤ 1.

Proof. Let us put

xi = (Xi − X̄)/σX   and   yi = (Yi − Ȳ)/σY.

Then

(1/n) Σ xi² = 1 = (1/n) Σ yi²,   and   (1/n) Σ xi yi = r(X, Y).

Since (1/n) Σ (xi + yi)² ≥ 0, we have

(1/n) Σ xi² + (1/n) Σ yi² + (2/n) Σ xi yi ≥ 0,   or   2 + 2r ≥ 0,   or   r ≥ −1.

Again, (1/n) Σ (xi − yi)² ≥ 0 gives 2 − 2r ≥ 0, or r ≤ 1, which proves the other part of the property. Thus −1 ≤ r ≤ 1.

The value of r will be −1 if xi = −yi, i.e., Yi − Ȳ = −(σY/σX)(Xi − X̄), for each i. Similarly, the value of r will be 1 if xi = yi, i.e., Yi − Ȳ = (σY/σX)(Xi − X̄), for each i. Thus, in these cases each variable is an exact linear function of the other, and the variables are said to be perfectly correlated.

(ii) The correlation coefficient is a pure number which is independent of the units of measurement of the two variables.

Proof. Let for each pair of values (Xi, Yi) we have the pair of values (Ui, Vi) given as

Ui = (Xi − A)/c   and   Vi = (Yi − B)/d.

Then Xi = A + cUi, so that X̄ = A + cŪ and Xi − X̄ = c(Ui − Ū). Similarly, Yi − Ȳ = d(Vi − V̄). Hence we have the covariance and variances of X and Y as

Cov(X, Y) = cd (1/n) Σ (Ui − Ū)(Vi − V̄) = cd Cov(U, V),

σX² = c² (1/n) Σ (Ui − Ū)² = c² σU²,   and   σY² = d² σV².

Thus we get

r(X, Y) = cd Cov(U, V) / (|c| σU |d| σV) = (cd / |cd|) r(U, V).

In case c and d are of the same sign, cd/|cd| will be equal to 1 and, thus, r(X, Y) and r(U, V) will be equal, both in magnitude and sign.
Alternatively, if c and d have opposite signs, cd/|cd| will be −1, and r(X, Y) and r(U, V) will be equal in magnitude but will have opposite signs.

(iii) Let the following information for two groups be available:

                              Group 1     Group 2
Group size                      n1          n2
Group mean (X)                  X̄1          X̄2
Group mean (Y)                  Ȳ1          Ȳ2
Group variance (X)              σ²x1        σ²x2
Group variance (Y)              σ²y1        σ²y2
Correlation coefficient         r1          r2

Then for the combined group we have

X̄ = (n1 X̄1 + n2 X̄2)/(n1 + n2),   Ȳ = (n1 Ȳ1 + n2 Ȳ2)/(n1 + n2),

σX² = [n1 σ²x1 + n2 σ²x2 + n1 (X̄1 − X̄)² + n2 (X̄2 − X̄)²]/(n1 + n2),

σY² = [n1 σ²y1 + n2 σ²y2 + n1 (Ȳ1 − Ȳ)² + n2 (Ȳ2 − Ȳ)²]/(n1 + n2),

and

Cov(X, Y) = [n1 r1 σx1 σy1 + n2 r2 σx2 σy2 + n1 (X̄1 − X̄)(Ȳ1 − Ȳ) + n2 (X̄2 − X̄)(Ȳ2 − Ȳ)]/(n1 + n2).

Substituting these values in (1), we can get the correlation coefficient for the combined sample.

Illustration 1. In an agricultural field experiment on the wheat crop, the following data were obtained:

Nitrogen content in soil (kg/ha): 69.6  93.7  81.5  96.2  95.9  82.5  70.8  73.8
Wheat yield (q/ha):               12.2  19.6  16.9  27.8  19.2  17.0  15.4  15.9

Calculate the product moment correlation coefficient and interpret the result.

Solution. To calculate r, we obtain the following sums from the data:

ΣX = 664,  ΣY = 144,  ΣX² = 55982.68,  ΣY² = 2739.06,  ΣXY = 12244.87.

Here n = 8, X̄ = 664/8 = 83 and Ȳ = 144/8 = 18, so that

r(X, Y) = (ΣXY − n X̄ Ȳ) / √[(ΣX² − n X̄²)(ΣY² − n Ȳ²)]
        = (12244.87 − 8 × 83 × 18) / √[(55982.68 − 8 × 83²)(2739.06 − 8 × 18²)]
        = 292.87 / √(870.68 × 147.06) = 0.8185.

Alternative Method. By taking deviations of X and Y from the arbitrary numbers 83 and 18 respectively, i.e., U = X − 83 and V = Y − 18, we get

U: −13.4  10.7  −1.5  13.2  12.9  −0.5  −12.2  −9.2     (ΣU = 0)
V:  −5.8   1.6  −1.1   9.8   1.2  −1.0   −2.6  −2.1     (ΣV = 0)

with ΣU² = 870.68, ΣV² = 147.06 and ΣUV = 292.87, so that

r(X, Y) = (ΣUV − n Ū V̄) / √[(ΣU² − n Ū²)(ΣV² − n V̄²)] = (292.87 − 0) / √(870.68 × 147.06)
= 0.8185.

Thus we observe a positive correlation of the order 0.8185 between the two variables, indicating that soil rich in nitrogen content gives more wheat yield.

Illustration 2. While calculating the correlation coefficient between two variables X and Y from 30 pairs of observations, the following results were obtained:

n = 30,  ΣX = 135,  ΣY = 120,  ΣX² = 640,  ΣY² = 530,  ΣXY = 575.

Later on, at the time of checking, it was found that two pairs had been copied wrongly as (7, 15) and (6, 9) in place of the correct values (6, 16) and (7, 8). Obtain the correct value of the correlation coefficient.

Solution. To obtain the correct value of r, we need to find the corrected summations to be used in the formula:

Corrected ΣX  = 135 − 7 − 6 + 6 + 7 = 135
Corrected ΣY  = 120 − 15 − 9 + 16 + 8 = 120
Corrected ΣX² = 640 − 7² − 6² + 6² + 7² = 640
Corrected ΣY² = 530 − 15² − 9² + 16² + 8² = 544
Corrected ΣXY = 575 − 15 × 7 − 6 × 9 + 6 × 16 + 7 × 8 = 568

Thus X̄ = 135/30 = 4.5, Ȳ = 120/30 = 4, and

r(X, Y) = (568 − 30 × 4.5 × 4) / √[(640 − 30 × 4.5²)(544 − 30 × 4²)] = 0.6139.

Illustration 3. Calculate the coefficient of correlation from the following data on marks of ten students in two subjects:

Marks in Mathematics (X): 55  70  40  30  90  80  60  80  90  80
Marks in Statistics (Y):  65  40  30  50  60  70  50  50  60  70

Solution. As one can see, the mean of X as well as of Y will not be a whole number, so we compute r after changing the origin as well as the scale. Let us change the origin of X by 60 and of Y by 50; it is also convenient to divide both deviations by 5. Denote U = (X − 60)/5 and V = (Y − 50)/5.

  X    U    U²     Y    V    V²    UV
 55   −1     1    65    3     9    −3
 70    2     4    40   −2     4    −4
 40   −4    16    30   −4    16    16
 30   −6    36    50    0     0     0
 90    6    36    60    2     4    12
 80    4    16    70    4    16    16
 60    0     0    50    0     0     0
 80    4    16    50    0     0     0
 90    6    36    60    2     4    12
 80    4    16    70    4    16    16

With these new variables, n = 10, ΣU = 15, ΣV = 9, ΣU² = 177, ΣV² = 69 and ΣUV = 65. Since a change of origin and scale (with c and d of the same sign) leaves r unaltered,

r(X, Y) = r(U, V) = (ΣUV − n Ū V̄) / √[(ΣU² − n Ū²)(ΣV² − n V̄²)]
        = (65 − 10 × 1.5 × 0.9) / √[(177 − 10 × 1.5²)(69 − 10 × 0.9²)]
        = 51.5 / √(154.5 × 60.9) = 0.531.

3. Rank Correlation Coefficient. When the individuals are ranked with respect to the two characteristics, the correlation between the two sets of ranks, known as Spearman's rank correlation coefficient, is given by

r = 1 − 6 Σd² / [n(n² − 1)],

where d is the difference between the ranks assigned to an individual in the two characteristics and n is the number of individuals.
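Both coefficients met in this section can be sketched in a few lines of Python. This is an illustrative sketch, not part of the text: the function names and the small check data are ours, and the rank routine uses the average-rank convention for repeated values together with an m(m² − 1)/12 adjustment to Σd² for each group of m ties.

```python
from math import sqrt

def pearson_r(xs, ys):
    # r = (ΣXY − nX̄Ȳ) / √((ΣX² − nX̄²)(ΣY² − nȲ²))
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    syy = sum(y * y for y in ys) - n * ybar * ybar
    return sxy / sqrt(sxx * syy)

def avg_ranks(values):
    # Rank in descending order; repeated values share the average of the
    # ranks they would jointly occupy.
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # best rank this value could take
        m = ordered.count(v)              # number of tied occurrences
        ranks.append(first + (m - 1) / 2)
    return ranks

def spearman_r(xs, ys):
    # r = 1 − 6(Σd² + CF)/(n(n² − 1)), CF = Σ m(m² − 1)/12 over tied groups
    n = len(xs)
    d2 = sum((rx - ry) ** 2
             for rx, ry in zip(avg_ranks(xs), avg_ranks(ys)))
    cf = sum(m * (m * m - 1) / 12
             for vals in (xs, ys)
             for m in (vals.count(v) for v in set(vals)) if m > 1)
    return 1 - 6 * (d2 + cf) / (n * (n * n - 1))

# Illustration 3's marks, by the product moment formula:
maths = [55, 70, 40, 30, 90, 80, 60, 80, 90, 80]
stats = [65, 40, 30, 50, 60, 70, 50, 50, 60, 70]
print(round(pearson_r(maths, stats), 3))             # 0.531
print(spearman_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```

On untied ranks the correction vanishes and the plain formula r = 1 − 6Σd²/[n(n² − 1)] is recovered.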
Repeated Ranks. It may happen that two or more individuals are assigned equal ranks with respect to characteristic A or B. In such a case, the repeated items are given common ranks equal to the average of the ranks which these individuals would have assumed had their values been slightly different from each other. Other items get their ranks as usual.

While calculating the rank correlation coefficient in such a case, the following correction factor should be added to Σd²:

Correction factor = m(m² − 1)/12,

where m is the number of times an item is repeated. If there is more than one such group of items with repeated ranks, the correction factor should be added as many times as the number of such groups.

Illustration 7. The following data relate to the marks of ten students in mathematics and biology:

Marks in Mathematics: 68  64  75  48  64  80  75  40  55  64
Marks in Biology:     62  57  68  45  81  60  68  47  54  70
Similarly in Y-series value 68 is given the common rank 3.5. Thus, for the two repeated values in the X-series and one repeated value in the Y-series. the ‘otal correction is ? 7-1), 2027-1) _ 36 Comection factor -20 =) 30-0, @ =) 2 D2? ‘ 6 (2d? +3) 6(72+3) _ “Rank correlation coefficient r = 1 — aa - 0x99 0.545 Limitations of the Correlation Coefficient. It may be noted that a special care must be taken while interpreting the value of correlation coefficient. Following are few situations where Conelation coefficient may give misleading results. ( Sometimes correlation between two factors may occur due to the common effect of a "d factor, eg, the positive correlation between the per hectare yield of jute and rice may be due ‘ote fact that both are related to the amount of rainfall. sicawn a (i) On ation between two variables when no one exists in eT nye ds tal ape owing ec Ane shoukd assure that there is some theoretical reasoning behind this interdependence of the tw: Variables, di) As'stated earlier the correlation cvefficient measures the degree of linear rtationship. So “low value of correlation coefficient rules out the possibility of a linear relationship, though @ 138 STATISTICAL METHODS FoR RESEARG linear relationship may exist between the two variables. As an example, consi WOR below: * Consider the dae by [x ] 3 -2 | -l 0 1 ; | ni Ly [1s 10 7 6 7 10 1 It can be seen that r(X, Y) = 0 though here the two variables are related as Y 6 =6+y0 So, before using r (X, Y) as a measure of relationship one should see (withthe h agra) whether the general relationship is linear or not. el of cay court ows et tha Vouragle GRESSION ANALY: 2 caer Tithe correlation analysis we have discussed the degree of relationship without consi which is the cause and which is the effect. For example-in-case of field experiments the fet used is a cause variable while the yield is effect of that cause. 
So, depending on the circumstances, one can decide which is the cause and which is the effect variable. The cause variable, denoted by X, is also called the independent variable, and the effect variable, denoted by Y, is called the dependent variable. In regression analysis we find an algebraic function of the form Y = f(X), i.e., we express the dependent variable as a function of the independent variable. Thus regression analysis makes it possible to estimate or predict the unknown values of the dependent variable for known values of the independent variable. In this chapter we shall discuss the case where the relationship between the two variables is linear. Non-linear relationships shall be discussed in the next chapter.

Principle of Least Squares. If the two variables are related, the points in the scatter diagram will cluster around a certain curve, called the curve of regression. In regression analysis we are to find this curve, i.e., to find the equation of a curve that gives the best representation of the relationship between the two variables.

Fig. 10.2. A scatter diagram for pairs of values (X, Y).

In figure 10.2 the points are clustered around a straight line. One can draw a number of straight lines through this scatter. According to the principle of least squares, the curve that minimises the sum of squares of the deviations of the observed points from it, measured parallel to the axis of the dependent variable, gives the best fit. Let e be the distance of a point (say A) of the scatter from the regression line, measured parallel to the Y-axis. Then a line that minimises the quantity Σe² will be called the least squares line. (Here the summation extends over all the paired observations.)

Lines of Regression. If the scatter diagram indicates a linear relationship between the two variables, then the next step is to find the equation of a straight line that gives the best fit to the data. The line thus obtained is called the line of regression. Let us suppose, in the bivariate data, Y is the dependent variable and X is the independent variable. Then the line

Y = a + bX

is called the line of regression of Y on X.
In this equation we want to estimate the values of the constants a and b from the given data; these determine the position of the line completely. The parameters a and b determine the intercept on the Y-axis (for X = 0) and the slope of the line respectively. Using the principle of least squares, the values of a and b can be calculated by solving simultaneously the following two equations, called the normal equations:

ΣY = na + b ΣX                                         ...(2)
ΣXY = a ΣX + b ΣX²                                     ...(3)

After some simplification, we get the following expressions for b and a:

b = (n ΣXY − ΣX ΣY) / (n ΣX² − (ΣX)²)                  ...(4)
a = Ȳ − b X̄                                            ...(5)

From (5) we observe that the line of regression passes through the point (X̄, Ȳ). Further, we know that

Cov(X, Y) = (1/n) ΣXY − X̄ Ȳ   and   σX² = (1/n) ΣX² − X̄²,

so that, dividing the numerator and denominator of (4) by n², we get

b = Cov(X, Y) / σX²                                    ...(6)

Again, we have r = Cov(X, Y)/(σX σY). Multiplying both sides by σY/σX gives

r σY/σX = Cov(X, Y)/σX² = b.

The slope b of the above regression line represents the increase in the value of the Y-variable corresponding to a unit increase in the value of the X-variable. For the sake of convenience we denote b by byx. Thus the regression coefficient of Y on X is given by

byx = Cov(X, Y)/σX² = r σY/σX                          ...(7)

Similarly, if we take X as the dependent variable and Y as the independent variable, then the regression line of X on Y will be X = a′ + bxy Y. Expressions for a′ and bxy can be got from the expressions (4) to (6) by interchanging X and Y. Thus we get

bxy = (n ΣXY − ΣX ΣY)/(n ΣY² − (ΣY)²) = Cov(X, Y)/σY² = r σX/σY      ...(8), (9)

and

a′ = X̄ − bxy Ȳ                                         ...(10)

Here bxy is called the coefficient of regression of X on Y, and it indicates the increase in the value of the X-variable if the value of the Y-variable increases by one unit.

Remarks. (i) It may be noted that we always have two lines of regression. The line of regression of Y on X is to be used when we want to estimate the value of Y for a given value of X, i.e., when Y is the dependent and X the independent variable. On the other hand, if we want to estimate the value of X for a given value of Y, we use the line of regression of X on Y.
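The computation in (4) and (5) can be sketched numerically. This is a sketch only; the function name and the five illustrative pairs are ours, not from the text:

```python
def regression_line(xs, ys):
    # Slope from (4): b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²)
    # Intercept from (5): a = Ȳ − bX̄
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n
    return a, b

# Line of regression of Y on X for five illustrative pairs ...
a, b = regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))    # 2.2 0.6
# ... and of X on Y, obtained by interchanging the two variables,
# in line with Remark (i).
a2, b2 = regression_line([2, 4, 5, 4, 5], [1, 2, 3, 4, 5])
```

For these points the fitted line of Y on X is Y = 2.2 + 0.6X; swapping the arguments gives the other regression line.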
In the case of perfect correlation, the two lines coincide.

(ii) From (5) and (10) we conclude that the two lines of regression intersect each other at the point (X̄, Ȳ).

(iii) We have two different lines of regression because, in the regression of Y on X, the sum of squares of the differences between the observed points and the regression line measured parallel to the Y-axis is minimised, whereas in the case of the regression of X on Y these differences are measured parallel to the X-axis.

Properties of Regression Coefficients. A few important properties of regression coefficients are given below.

(i) The correlation coefficient (r) and the two regression coefficients (bxy and byx) have the same sign.

Proof. We have

r = Cov(X, Y)/(σX σY),   byx = Cov(X, Y)/σX²,   bxy = Cov(X, Y)/σY².

In these three expressions σX, σY, σX² and σY² are always positive. So the signs of r, bxy and byx are the same as the sign of Cov(X, Y).

(ii) The correlation coefficient is equal to the geometric mean of the two regression coefficients.

Proof. Multiplying the two expressions byx = r σY/σX and bxy = r σX/σY, we get

bxy × byx = r²,   or   r = ± √(bxy × byx).

The sign of the correlation coefficient will be the same as that of bxy and byx.

(iii) If one of the regression coefficients is greater than unity, the other must be less than unity.

Proof. Let us suppose that byx is greater than unity. Then byx > 1 implies 1/byx < 1. Also, since bxy byx = r² ≤ 1, we have

bxy ≤ 1/byx < 1.

(iv) The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient.

Proof. We want to prove that (bxy + byx)/2 ≥ r, i.e., that (r σX/σY + r σY/σX)/2 ≥ r. For positive r this reduces to

σX² + σY² − 2 σX σY ≥ 0,   or   (σX − σY)² ≥ 0,

which is always true, since the square of a real number is always non-negative.

(v) Regression coefficients are independent of change of origin but not of scale.

Proof. Let U = (X − a)/h and V = (Y − b)/k, where a, b, h and k are constants. Then it can be proved that

Cov(X, Y) = h k Cov(U, V),   σX² = h² σU²   and   σY² = k² σV²,

so that

byx = Cov(X, Y)/σX² = (k/h) bvu   and   bxy = Cov(X, Y)/σY² = (h/k) buv.

Hence the result.
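Properties (i) to (iv) can be checked numerically. The sketch below (our own helper and data, not the text's) computes the two coefficients from their covariance forms:

```python
def regression_coefficients(xs, ys):
    # byx = Cov(X, Y)/σX²  and  bxy = Cov(X, Y)/σY²
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - xbar * ybar
    var_x = sum(x * x for x in xs) / n - xbar * xbar
    var_y = sum(y * y for y in ys) / n - ybar * ybar
    return cov / var_x, cov / var_y

byx, bxy = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r = (byx * bxy) ** 0.5             # property (ii): r is their geometric mean
print(round(byx, 4), round(bxy, 4))  # 0.6 1.0
print((byx + bxy) / 2 >= r)          # property (iv): True
```

Here both coefficients are positive, matching the positive covariance (property (i)), and their product 0.6 stays below unity (property (iii)).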
Illustration 1. The following table gives the length of green jute plant (in cm) and the weight of dry jute fibre (in gm) for ten jute plants.

Length of green plant (X):  111   118   125   140   135   150   165   160   171   185
Weight of dry fibre (Y):   1.20  2.10  2.15  3.05  2.70  4.10  5.25  5.70  6.05  7.25

Calculate the linear regression equation of weight of dry fibre on length of green plant. Also find the expected dry weight of fibre for a 120 cm long green plant.

Solution.

   X        Y        X²         XY
  111     1.20     12321     133.20
  118     2.10     13924     247.80
  125     2.15     15625     268.75
  140     3.05     19600     427.00
  135     2.70     18225     364.50
  150     4.10     22500     615.00
  165     5.25     27225     866.25
  160     5.70     25600     912.00
  171     6.05     29241    1034.55
  185     7.25     34225    1341.25

ΣX = 1460,  ΣY = 39.55,  ΣX² = 218486,  ΣXY = 6210.30

We are to find the regression equation Y = a + byx X. Now we have

byx = (n ΣXY − ΣX ΣY)/(n ΣX² − (ΣX)²) = (10 × 6210.30 − 1460 × 39.55)/(10 × 218486 − 1460²) = 0.0819

and, using the unrounded value of byx,

a = Ȳ − byx X̄ = 39.55/10 − byx × 1460/10 = −7.997.

∴ The required regression line is Y = −7.997 + 0.0819 X.

Substituting X = 120, we get Y = −7.997 + 0.0819 × 120 ≈ 1.83. Thus the expected dry weight of fibre from a 120 cm long green plant will be about 1.83 gm.

Illustration 2. The following data pertain to marks in two subjects, Physics and Statistics, in a certain examination:

                         Marks in Physics    Marks in Statistics
Mean                           70                    78
Standard deviation             13                    14

The correlation between marks in Physics and Statistics is 0.53. Find the most likely marks in Physics if a student gets 60 marks in Statistics.

Solution. Let us denote the marks in Physics by X and the marks in Statistics by Y. We are required to find the expected value of the variable X for a given value of Y, so we need the regression line of X on Y:
X = a′ + bxy Y.

We know bxy = r σX/σY = 0.53 × 13/14 = 0.492, and

a′ = X̄ − bxy Ȳ = 70 − 0.492 × 78 = 31.62.

So the line of regression of X on Y is X = 31.62 + 0.492 Y. Substituting Y = 60, we get

X = 31.62 + 0.492 × 60 ≈ 61.1,

which is the most likely mark of a student getting 60 marks in Statistics.

Illustration 3. For a study relating the weight of kidneys (Y) to the weight of heart (X), both measured in gm, the two lines of regression are

4X − 10Y + 1725 = 0   and   5X − 6Y + 325 = 0.

(i) Find the means of the two variables.
(ii) Find the correlation between the two variables.
(iii) Find the expected weight of heart for an individual whose kidney weight is 250 gm.
(iv) If the standard deviation of kidney weight is 87.5 gm, find the standard deviation of heart weight.

Solution. (i) We know that the two lines of regression intersect at the point (X̄, Ȳ). So we solve the two equations simultaneously:

4X − 10Y + 1725 = 0     ...(i)
5X − 6Y + 325 = 0       ...(ii)

Multiplying (i) by 5 and (ii) by 4 and subtracting, we get Ȳ = 281.73. Substituting in (i), we get X̄ = 273.08.

(ii) To calculate the correlation coefficient, we use the fact that the correlation coefficient is the geometric mean of the two regression coefficients. So let (i) be the regression equation of X on Y and (ii) be the regression equation of Y on X. Then we can write the two equations as

X = −431.25 + 2.5 Y   and   Y = 54.17 + 0.833 X,

so that r = ± √(2.5 × 0.833) = ± 1.44. But the correlation coefficient cannot exceed 1 in magnitude, so our supposition is wrong. Hence (i) is the regression equation of Y on X, while (ii) is the regression equation of X on Y. With this we have
lation g 7 (ii) We are to find the expected value of X for Y = 250. So we substitute y = so mn Y to get the required value of X. ing 3 eo regression equation of X 0 ‘ X =-65 + (1.2) (250) = 235 gm. ox . oslo (iv) We know that by = 7.5% on substituting the values, we get pe y 1.2 = 0.69 x 2 or oy = 152.2 gm 35 Remarks : 1, For each of the observed values of X if we estimate the value of Yx from the lng of regression of Y on X and then calculate the correlation coefficient betwee tg) be te8 estimated values and the actually observed values of Y, then this correlation coefficient wiles rege kam as the correlation coefficient between X and Y ioe 3 also it ean be observed thatthe square ofthe correlation coefficient is equaltohenig| & Se aty the variance of the estimated values of Y to the variance of observed values of Y. men Variance of estimated values of Y 2 - 1” (% Y) = “Variance of observed values of Y ; tegand mean sg Thus, the square of correlation coefficient may be interpreted as the proportion of thet variance that is accounted for by the regression of Y on X. 3. The main point of difference between correlation and regression analysis, is ttt correlation tells us the degree of linear relationship where as the regression analysis gives 8) Coreton rato ( expected value of dependent variable given the value of independent variable. 10.4. CORRELATION RATIO Ithas been emphasized in the above discussion that the techniques of correlation andregest analysis are useful only when the relationship between two variables is linear. ‘Sometimes wee! low value of correlation coefficient mearly because there exists some sort of curvilinear (0% linear) relationship between two variables. In such a case, correlation ratio denoted bY ot appropriate measure to study the degree of relationship between two variables. Just asris a ea, of concentration of points about the straight line of best fit, 7 is a measure of the concen @ points about a curve of best fit. 
Suppose the pairs of values of X and Y are arranged in arrays of Y according to the values of X:

X1:  Y11, Y12, ..., Y1n1
X2:  Y21, Y22, ..., Y2n2
...
Xk:  Yk1, Yk2, ..., Yknk

Here we observe that, out of the total n individuals, ni have the same value of X, say Xi, while the corresponding values of Y are Yi1, Yi2, ..., Yini. We define the mean of the ith array as

Ȳi = Σj Yij / ni,

and the grand mean of Y as

Ȳ = Σi Σj Yij / n = Σi ni Ȳi / n.

Then the correlation ratio of Y on X, denoted by ηyx, is the positive square root of

η²yx = Σi ni (Ȳi − Ȳ)² / Σi Σj (Yij − Ȳ)².

If the values are grouped into a k × l bivariate frequency table, the classes of X may be supposed to give k arrays of Y-values. Let fij be the frequency of the jth Y-value in the ith array, fi. be the total frequency of the ith array and f.j be the total frequency of the jth Y-value. Then the ith array mean is given by

Ȳi = Σj fij Yj / fi.,

and the grand mean by

Ȳ = Σj f.j Yj / n.

The correlation ratio ηyx will then be given as the positive square root of

η²yx = Σi fi. (Ȳi − Ȳ)² / Σj f.j (Yj − Ȳ)².

Properties of Correlation Ratio. A few important properties of the correlation ratio are given below.

(i) The value of the correlation ratio lies between 0 and 1, i.e., 0 ≤ η² ≤ 1.
(ii) The value of the correlation ratio is always greater than or equal to the correlation coefficient in magnitude. If the relationship is perfectly linear, then r = η.
(iii) η is independent of change of origin and scale.
(iv) rxy and ryx are the same, but ηxy and ηyx are, in general, different from each other.
(v) The value of the correlation ratio is not independent of the classification of the data. As the class intervals become narrower, the value of η approaches unity. On the other hand, if the grouping is very coarse, η approaches r.

Illustration. Calculate the correlation ratio ηyx from the bivariate frequency data presented in Illustration 5 of section 10.2.

Solution.
Taking V = (Y − 225)/50, we have the following table:

              Y:   175   225   275   325
              V:    −1     0     1     2     fi.    Ti = Σj fij Vj    V̄i = Ti/fi.
Array 1:             9    10     7     0      26        −2             −0.077
Array 2:             5     5    12     1      23         9              0.391
Array 3:             1     6    20     2      29        23              0.793
Array 4:             0     1     4     7      12        18              1.500
Total f.j:          15    22    43    10      90        48

The grand mean of V is

V̄ = [(−1)(15) + (0)(22) + (1)(43) + (2)(10)]/90 = 48/90 = 0.533.

The denominator is

Σj f.j (Vj − V̄)² = 35.3 + 6.3 + 9.4 + 21.5 = 72.4,

and the numerator

Σi fi. (V̄i − V̄)² = 9.69 + 0.46 + 1.96 + 11.22 = 23.32,

so that

η²vx = 23.32/72.40 = 0.322,   and   ηyx = ηvx = √0.322 = 0.567.

10.5. INTRACLASS CORRELATION

Intraclass correlation means correlation within a class. As compared to the product moment correlation coefficient, in this case both the variables measure the same characteristic. For example, we may be interested in studying the correlation between the heights of brothers, or between the weights of cobs from the same maize plant. In such cases there is nothing to distinguish one measurement from the other, so that either one may be treated as X and the other as Y.

Let us have n classes C1, C2, ..., Cn with k1, k2, ..., kn members, the members of the ith class being

Ci:  Xi1, Xi2, ..., Xiki,

where Xij (i = 1, 2, ..., n; j = 1, 2, ..., ki) denotes the value of the jth member in the ith class. The intraclass correlation coefficient is given as

rI = [Σi ki² (X̄i − X̄)² − Σi Σj (Xij − X̄)²] / [σ² Σi ki (ki − 1)],
We have the grand sum and sum of squares as, DD X = 68 + 71 + 72 + 72 +..474 + 72 + 73 = 1071 id and DD x? = (68)? + (71)? + (72)? ++ 72)” + (73)? = 76497 77 2 2 _ 16497 _ (1071 _ => =ise 1.840 Again, the means of the heights of brothers of five families are 70.33, 71.33, 71.00, 71.33, 73.00 and the grand mean is 1071 xeap a4 (7033 — 71.4)? + (71.33 - 71.4)? + a, nay? o= +(71.33 - 71.4)? + (73 - 71.47 =0.775 " 3 x 0.775 1.840 0.132 -enf2)-| is requi, . ‘ited intraclass correlation coefficient. jl 14 & STATISTICAL METHODS FOR RESEARCH | WOE PER EXERCISES What do you understand by the term correlation ? Discuss the various measure it methods ty ‘Show that the correlation coefficient is independent of a change of origin and the variable. State the limits between which r lies and give its proof. Scale Comment on the following statements : (@) Anegative correlation indicates that if the value of one variable decreases value of the other also decreases. the (b) Ifthe correlation coefficient between X and Vis positive then the correlation coef between (i) - X and -Y is positive; (i) -X and Y is positive. Explain the difference between product moment and rank correlation coefficients, Write short notes on the following : (a)Lines of regression _—_(b) Least squares principle (c) Regression coefficient (4) Correlation ratio. (0) Intraclass correlation. Why do we have two lines of regression ? Prove that the correlation coefficients he geometric mean of the two regression coefficients. From the following data calculate the coefficient of correlation between age and boty Weight of white leghorn chicks. Also estimate the weights of an 11 weeks old chick ‘Age in weeks : 1 2 3 4 5 Body wt. in gm: 60 100 160 240 320 Age in weeks : 6 7 8 9 10 Body wt in gm : 410 480 580 670 750 Ina field experiment on different varieties of raya crop, the following data are obtains Variety No. 
                            1    2    3    4    5    6    7    8    9    10
Maturity days:              164  178  167  163  158  169  185  157  193  180
Seed yield per plant (gm):  48   64   48   41   26   48   54   32   59   54

(a) Draw a scatter diagram.
(b) Calculate the product moment correlation coefficient between the maturity period and the yield. Comment on the results.

The following data give the number of blind persons per lakh of population in different age groups. Find Pearson's correlation coefficient between age and blindness.

Age in yrs   No. of blinds per lakh   Age in yrs   No. of blinds per lakh
0-10         50                       40-50        ...
10-20        68                       50-60        ...
20-30        102                      60-70        ...
30-40        109                      70-80        ...

10. The following table gives the age and daily wages of 50 workers in a factory:

                        Daily pay in Rs. (X)
Age (Y)    160-169   170-179   180-189   190-199   200-209
20-30         5         3         1         -         -
30-40         2         6         2         1         -
40-50         1         2         4         2         5
50-60         -         1         3         6         2
60-70         -         -         1         1         5

Compute the correlation coefficient between age and daily wages. Also compute the correlation ratio ηYX.

11. Following are the ranks obtained by 12 students in mathematics and statistics. To what extent is the knowledge of the students in the two subjects related?

Mathematics:  1   2   3   4   5   6   7   8   9   10   11   12
Statistics:   1   6   3   4   7   5   2   10  8   9    11   12

12. In a musical competition ten participants were ranked by three judges A, B and C in the following order:

Ranks by judge A:  3   5   8   4   7   10   2   1    6   9
Ranks by judge B:  6   4   9   8   1   2    3   10   5   7
Ranks by judge C:  1   6   5   10  3   2    4   9    7   8

Use the rank correlation coefficients to see which pair of judges has the nearest approach to a common liking in music.

13. The following table gives the heights in inches of 10 fathers and their eldest sons. Calculate the coefficient of rank correlation.

Ht. of father:  63   67   64   68   62   66   68   67   69   71
Ht. of son:     66   68   65   69   66   65   68   69   71   70

14.
From the following data calculate the rank correlation coefficient and the product moment correlation coefficient between height (in inches) and weight (in pounds) of eight individuals.

Weight:  81   90   99   108   117   126   135   144
Height:  64   65   66   67    68    69    70    71

Why are the two coefficients equal?

15. A random sample of ten families had the following total expenditure and food expenditure in Rs. per day per person:

Total Exp.:  16.6   20.1    26.2    30.4   22.5
Food Exp.:   12.3   16.0    20.6    24.5   17.2
Total Exp.:  49.2   66.95   52.81   35.6   98.5
Food Exp.:   36.4   ...     47.4    28.2   61.62

Estimate the regression line of food expenditure on total expenditure.

16. The index numbers of food (X) and clothing (Y) for eight successive years 1996, 1997, ..., 2003 are given below:

X:  100   124   123   132   136   139   ...   ...
Y:  100   104   120   ...   143   ...   ...   ...

(i) Fit the lines of regression of X on Y and of Y on X and hence calculate the correlation coefficient.
(ii) Suggest what the value of Y will be when X is expected to be 150.

17. From the following data, estimate the most likely price at Calcutta corresponding to the price Rs. 68 at Bombay, and that at Bombay corresponding to the price Rs. 70 at Calcutta:

                               Calcutta (X)   Bombay (Y)
Average price in Rs.                ...           ...
Standard deviation of price         1.5           0.5

r(X, Y) = 0.68

18. In a correlation analysis the following two regression equations were obtained:

12X + 3Y = 19,   3X + 9Y = 46.

Obtain: (i) the value of the correlation coefficient, (ii) the mean values of X and Y, and (iii) the ratio of the coefficient of variability of X to that of Y.

19. (a) In a regression analysis the following two regression coefficients were obtained: byx = 3.5 and bxy = 0.5. Comment on these values.
(b) Given the following two regression equations, calculate the correlation coefficient and comment on the results:
Regression of Y on X : ...
Regression of X on Y : X = 0.5 + 0.25Y.

20.
Calculate the correlation coefficient between the following series:

Year:                    1991   1992   1993   1994   1995   1996   1997   1998
Deaths from cancer:      612    583    671    692    689    635    601    600
Production of steel
(in millions):           66.6   84.9   88.6   78.0   96.8   105.2  93.2   116.9

What can you infer from the value of r?

21. For five plants each containing five ears, the number of grains per ear are given below. Calculate the intraclass correlation coefficient.

Plant      No. of grains per ear
  1        ...   ...   ...   ...   ...
  2        33    34    ...   ...   ...
  3        58    ...   50    51    ...
  4        ...   ...   ...   ...   ...
  5        35    44    ...   ...   ...

MULTIVARIATE CORRELATION AND REGRESSION

11.1. INTRODUCTION

In the preceding chapter, we were concerned with correlation and regression studies involving two variables only. But many a times we are required to work with multivariate data, i.e., data having simultaneous information on more than two characteristics. In such a case the value of one variable may be influenced by many others; e.g., the yield per acre of a crop depends upon the quantity of seed, the fertility of the soil, the fertilizer used, irrigation facilities and so on. This type of association can be studied with the help of multiple correlation and multiple regression techniques. These techniques help in investigating the extent of relationship and the effect of a group of variables upon a variable not included in the group.

Suppose that in the case of multivariate data we are interested in a relationship between two variables only. Two alternative methods to study this relationship are: (i) consider only those data in which all the variables, other than the two under study, have a constant value, or (ii) eliminate mathematically the effect of the other variables on the two variables of interest. The first method is rarely used, since data in the required form are usually not available and, if available, their size may be too small.
In the second method it is not possible to eliminate the entire influence of the other variables, but their linear effect can be easily eliminated; the resulting techniques are called the partial correlation and partial regression techniques, which are the subject matter of the present chapter. In the last section, some commonly used non-linear regression equations have also been discussed.

11.2. MULTIPLE REGRESSION

Suppose we have multivariate data involving p variables X1, X2, ..., Xp and we are interested in analysing the effect of the p − 1 independent variables X2, X3, ..., Xp on the dependent variable X1. The objective here is to build up a mathematical relationship of the form X1 = f(X2, X3, ..., Xp), with the idea of using it for predicting the value of the variable X1 from a knowledge of the values of the variables X2, ..., Xp. For example, we may be interested in estimating the yield of a crop in a year from the data on rainfall, average temperature, average humidity, etc., during the period from sowing to harvesting of the crop. Common sense suggests that as more and more independent variables are included in the regression, the prediction becomes better.

In the most general and simple case, the relationship between X1 and X2, X3, ..., Xp is taken to be a linear one, given by an equation of the form

X1 = a + b2X2 + b3X3 + ... + bpXp    ...(1)

This is called the multiple regression equation of X1 on X2, X3, ..., Xp, and b2, b3, ..., bp are called the partial regression coefficients. These constants are estimated from the data. Let the p-variate data consist of values, corresponding to the p variables, for n (> p) distinct individuals, and let the values for the i-th individual be denoted by X1i, X2i, ..., Xpi. Then the least squares estimates of the p constants a, b2, b3, ..., bp can be obtained by solving simultaneously the following p normal equations:

ΣX1i = na + b2ΣX2i + b3ΣX3i + ... + bpΣXpi
ΣX1iX2i = aΣX2i + b2ΣX2i² + b3ΣX2iX3i + ... + bpΣX2iXpi
ΣX1iX3i = aΣX3i + b2ΣX2iX3i + b3ΣX3i² + ... +
bpΣX3iXpi
. . . . . . . . . . . . . . . . . . . . . . . .
ΣX1iXpi = aΣXpi + b2ΣX2iXpi + b3ΣX3iXpi + ... + bpΣXpi²    ...(2)

The summations extend over i from 1 to n.

Aliter. Here we give an alternative method to determine the multiple regression equation (1) in the case of two independent variables. The results can, however, be extended to regression equations having more than two independent variables. We write the regression equation of X1 on X2 and X3 as

X1 = a + b12.3 X2 + b13.2 X3    ...(3)

Here b12.3 is the partial regression coefficient of X1 on X2 when the linear effect of X3 on X1 and X2 is eliminated. Similarly one can define b13.2. In order to determine the values of these partial regression coefficients, we define the following terms:

(a) Correlation matrix of X1, X2 and X3:

        | r11  r12  r13 |   | 1    r12  r13 |
    R = | r21  r22  r23 | = | r12  1    r23 |     [since rii = 1 and rij = rji]
        | r31  r32  r33 |   | r13  r23  1   |

(b) Co-factors of the elements r11, r12, r13 in the correlation matrix R:

Co-factor of r11 = R11 = (−1)^(1+1) |1 r23; r23 1| = 1 − r23²
Co-factor of r12 = R12 = (−1)^(1+2) |r12 r23; r13 1| = −(r12 − r13 r23)
Co-factor of r13 = R13 = (−1)^(1+3) |r12 1; r13 r23| = r12 r23 − r13

(c) Also, we denote by X̄1, X̄2, X̄3 the means and by σ1, σ2, σ3 the standard deviations of X1, X2 and X3 respectively.

Then b12.3 and b13.2 can be obtained from the following formulae:

b12.3 = −(σ1/σ2)(R12/R11) = (σ1/σ2) (r12 − r13 r23)/(1 − r23²)    ...(4)
b13.2 = −(σ1/σ3)(R13/R11) = (σ1/σ3) (r13 − r12 r23)/(1 − r23²)    ...(5)

and a = X̄1 − b12.3 X̄2 − b13.2 X̄3.    ...(6)

Remarks. (i) The partial regression coefficient b12.3 measures the increase in the value of X1 when the value of X2 is increased by one unit and the linear effect of X3 on X1 and X2 is eliminated.

(ii) In terms of the notations of equation (3), b21.3 will be the partial regression coefficient of X2 on X1 when the linear effect of X3 on X2 and X1 is eliminated, and it will in general be different from b12.3.
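The cofactor formulas above are easy to check numerically. The sketch below is our own illustration (the correlations and standard deviations are made up): it builds the correlation matrix, extracts the cofactors R11, R12, R13 by deleting a row and a column of the matrix, and then forms b12.3 and b13.2 as in equations (4) and (5).

```python
import numpy as np

# Hypothetical pairwise correlations and standard deviations (illustrative only)
r12, r13, r23 = 0.6, 0.4, 0.5
s1, s2, s3 = 2.0, 3.0, 4.0

R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])

def cofactor(M, i, j):
    """Signed minor of M with row i and column j deleted."""
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

R11, R12, R13 = (cofactor(R, 0, j) for j in range(3))

# Partial regression coefficients from equations (4) and (5)
b12_3 = -(s1 / s2) * R12 / R11
b13_2 = -(s1 / s3) * R13 / R11
```

For these values, R11 = 1 − r23² = 0.75 and R12 = −(r12 − r13 r23) = −0.4, so the cofactor route and the closed-form expressions in (4) and (5) agree.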
(iii) It may be noted that in both the methods discussed above, we get the same values of the constants of the regression equation. The same point has been verified for a particular case in the following illustration.

Illustration. From a field experiment, the following data related to the yield of wheat (X1), the nitrogen content in the soil at the time of sowing (X2) and the nitrogen applied (X3) are recorded:

Yield (qtl/ha)   Nitrogen content in the soil (kg/ha)   Nitrogen applied (kg/ha)
16.2             69.6                                   0
31.5             69.6                                   40
30.6             69.6                                   60
39.4             69.6                                   80
12.9             81.5                                   0
25.0             81.5                                   40
31.9             81.5                                   60
37.5             81.5                                   80
18.9             95.9                                   0
36.1             95.9                                   40
38.0             95.9                                   60
40.3             95.9                                   80

Find the regression of X1 on X2 and X3 and interpret the regression coefficients.

Solution. It is required to find the regression equation X1 = a + b2X2 + b3X3. For this, the various sums and sums of products are calculated in table 11.1. Substituting the values from table 11.1 in the normal equations (2), we get the following three equations:

358.3 = 12a + 988 b2 + 540 b3    ...(i)
. . .

11.3. MULTIPLE CORRELATION COEFFICIENT

The multiple correlation coefficient of X1 on X2 and X3, denoted by R1.23, is given by

R²1.23 = (r12² + r13² − 2 r12 r13 r23)/(1 − r23²)    ...(7)

Properties of the Multiple Correlation Coefficient. We list below certain important properties of the multiple correlation coefficient.

(i) Suppose, for all the given values of X2, X3, ..., Xp, we estimate the value of X1 from the multiple regression equation (1) and denote it by X̂1.23...p. Then its product moment correlation coefficient with X1 is equal to the multiple correlation coefficient. So we can say that the multiple correlation coefficient measures the closeness of the association between the observed and the expected values of the dependent variable obtained from the multiple regression equation.

(ii) Another way to interpret the multiple correlation coefficient is that its square is the proportion of variation in the dependent variable explained by the independent variables through the fitted linear regression equation.
(iii) While calculating the value of the multiple correlation coefficient from (7), we take the positive square root of the expression on the right hand side, so the value of R1.23...p is taken as non-negative. Again, as said above, it is the simple correlation coefficient between X1 and X̂1.23...p, so it will be at most 1. Therefore we conclude that 0 ≤ R1.23...p ≤ 1.

(iv) R1.23...p = 1 means that the association between the dependent and the independent variables is perfect. In this case the observed and expected values are equal, i.e., X1i = X̂1i, and the multiple regression equation is said to be a perfect prediction formula.

(v) R1.23...p = 0 means that X1 is completely uncorrelated with all the other variables, and the multiple regression equation fails to throw any light on the value of X1 when X2, X3, ..., Xp are known.

(vi) R1.23...p is never less than any of the simple correlation coefficients of the dependent variable with the individual independent variables, i.e., R1.23...p ≥ r12, r13, ..., r1p.

Illustration. For a data set related to three variables X1, X2 and X3, the following correlations were observed:

r12 = 0.80, r13 = −0.56 and r23 = −0.40.

Find R1.23 and R3.12 and interpret the results.

Solution.

R1.23 = √[(r12² + r13² − 2 r12 r13 r23)/(1 − r23²)]
      = √[((0.80)² + (−0.56)² − 2(0.80)(−0.56)(−0.40))/(1 − (−0.40)²)]
      = 0.842

R3.12 = √[(r13² + r23² − 2 r12 r13 r23)/(1 − r12²)]
      = √[((−0.56)² + (−0.40)² − 2(0.80)(−0.56)(−0.40))/(1 − (0.80)²)]
      = 0.566
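The arithmetic of this illustration can be verified with a short program. The helper function below is ours; it implements equation (7), and the same function gives R3.12 once the subscripts are relabelled (r31, r32, r12 play the roles of r12, r13, r23).

```python
import math

def multiple_r(r_ab, r_ac, r_bc):
    """Multiple correlation of A on B and C, per equation (7):
    R_A.BC = sqrt((r_ab^2 + r_ac^2 - 2 r_ab r_ac r_bc) / (1 - r_bc^2))."""
    return math.sqrt((r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc)
                     / (1 - r_bc ** 2))

R1_23 = multiple_r(0.80, -0.56, -0.40)   # X1 on X2 and X3
R3_12 = multiple_r(-0.56, -0.40, 0.80)   # X3 on X1 and X2
print(round(R1_23, 3), round(R3_12, 3))  # 0.842 0.566
```

Both values match the hand computation above, and each exceeds the largest relevant simple correlation, as property (vi) requires.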
This type of correlation is called the partial correlation and the coefficient of comelation between X, and X; obtained after eliminating the linear effect of X;, is called the partial comelation coefficient. ‘The residual e.3 = Xj — a ~ by3 X3, may be regarded as the part of the variable X; obtained after eliminating the linear effect of X3. Similarly the residual e 3 gives the part of variable X» that remains after the linear effect of X3 has been eliminated. Thus the partial correlation coefficient between X; and Xz, denotes by riz, is given by ri23 =r (e153, €23) Cov (¢1,3, €23) VV (413) V (23) ‘After some algebraic manipyulations we get the folloiwng expression riz = "323 Ri oye Rae = tas pe VRuRa~ Ja-)d-B) 8) Remarks : (i) As in case of simple correlation coefficient, partial correlation coefficient r2.3 is equal to the geometric mean of the partial regression coefficient of X, on Xp (b,23) and the partial regression coefficient of X_ on X1 (bp1.3)- Thus we have n23 = £ bi23 X bas (ii) The expressions for 713.2 and r23,1 obtained similar to (8), are given as : na Ris n3 = N23 3- Ra = ms PS e RnR a7) - A) 73 = Mi Tig Ra3 and 1 ty VRnRss fa 13) (0-73) (ii) Ifry2 = ry3 X rpg, then we have 712.3 = 0. Thus, one may ‘observe a correlation between X, and X; which is due tro the effect of another variable X; but after eliminmating the effect of Xs the two vaiables may be observed as uncorrelated. __ (i) Partial correlation coefficient being the simple correlation coefficient between the two ‘esiduals, it lies between —1 and 1. (0) Partial cortelation coefficient isa useful too! to decide whether a variable should be included ‘nthe regression analysis or not. Ilustration, For a trivariate data following results are known. %=42, 07=53, 03> 6.1 168 STATISTICAL METHODS FOR Reg, ARCH rr r= 0.80, 2370.7, 13 = 06 ey Find (i) rin3 (i) 723. (##) Ras and () bis bisa Solution. 
(i) r12.3 = (r12 − r13 r23)/√[(1 − r13²)(1 − r23²)] = (0.8 − (0.6)(0.7))/√[(1 − 0.36)(1 − 0.49)] = 0.38/√0.3264 = 0.665.

(ii) r23.1 = (r23 − r12 r13)/√[(1 − r12²)(1 − r13²)] = (0.7 − (0.8)(0.6))/√[(1 − 0.64)(1 − 0.36)] = 0.22/0.48 = 0.458.

(iii) R2.13 = √[(r12² + r23² − 2 r12 r13 r23)/(1 − r13²)] = √[(0.64 + 0.49 − 0.672)/(1 − 0.36)] = √(0.458/0.64) = 0.846.

(iv) b12.3 = (σ1/σ2) (r12 − r13 r23)/(1 − r23²) = (4.2/5.3) × 0.38/0.51 = 0.590,
and b13.2 = (σ1/σ3) (r13 − r12 r23)/(1 − r23²) = (4.2/6.1) × 0.04/0.51 = 0.054.

11.5. ESTIMATION OF NON-LINEAR RELATIONSHIP

Our discussion of the correlation and regression analysis in this as well as in the preceding chapter is based on the assumption that the relationship between the dependent and the independent variables is a linear one. But, given the complexity of the real world, one can well expect a non-linear relationship between the dependent and the independent variables. The most common forms of non-linear relationship can be expressed as (i) polynomial equations or (ii) power curves. These forms are discussed below.

1. Polynomial relations. Consider the relationship between yield (Y) and fertilizer dose (X). As we increase the quantity of fertilizer, the yield goes on increasing; at a certain level the yield attains a maximum value, and with a further increase in the quantity of fertilizer it decreases. Such a relationship can be adequately represented by a second degree polynomial of the form

Y = b0 + b1X + b2X²

In general, a polynomial of degree p can be written as

Y = b0 + b1X + b2X² + ... + bpX^p

To estimate this polynomial equation we set X1 = X, X2 = X², ..., Xp = X^p and proceed with the fitting of the linear multiple regression equation

Y = b0 + b1X1 + b2X2 + ... + bpXp.

Illustration. Fit a second degree polynomial curve to the following data on the weight gain of chicks:

Age in weeks (X):       1    2    3    4    5    6    7    8
Gain in wt. in gm (Y):  62   125  150  150  130  120  110  90

Solution. We need a table of the following type:

X    X²    X³    X⁴    XY    X²Y
1    1     1     1     62    62
2    4     8     16    250   500
3    9     27    81    450   1350
4    16    64    256   600   2400
5    25    125   625   650   3250
6    36    216   1296  720   4320
7    49    343   2401  770   5390
8    64    512   4096  720   5760
Total: 36   204   1296  8772  4222  23032

Also, from the data, ΣY = 62 + 125 + ... + 90 = 937.

We want to fit the second degree curve Y = b0 + b1X + b2X². Corresponding to the normal equations (2), we get the following three equations:

ΣY = nb0 + b1ΣX + b2ΣX²,
ΣXY = b0ΣX + b1ΣX² + b2ΣX³,
ΣX²Y = b0ΣX² + b1ΣX³ + b2ΣX⁴.

Substituting the values of the sums, sums of squares and sums of products from the table given above, we get

937 = 8b0 + 36b1 + 204b2,
4222 = 36b0 + 204b1 + 1296b2,
23032 = 204b0 + 1296b1 + 8772b2.

Solving these equations, we get b0 = 35.206, b1 = 48.935 and b2 = −5.423. Therefore, the required equation is

Y = 35.206 + 48.935X − 5.423X².

2. Power curves. Power curves are useful in situations where, with an increase in the value of the independent variable, the dependent variable changes at a changing rate. For example, as the income of an individual increases, the demand for items of necessity also increases, but at a decreasing rate, and after a certain income level there is no further increase in this demand with an increase in income. Such relations can be adequately represented with the help of a growth function of the form

Y = b0 X1^b1 X2^b2 ... Xp^bp

For the estimation of this function, if we take logarithms on both sides, we get

loge Y = loge b0 + b1 loge X1 + b2 loge X2 + ... + bp loge Xp,

which is linear in loge Y and the loge Xi's. Thus, by transforming the variables as Y* = loge Y, X1* = loge X1, ..., Xp* = loge Xp, we can find the values of the constants from the linear regression equation

Y* = b0* + b1X1* + b2X2* + ... + bpXp*,

where b0* = loge b0.

Remarks. The value of bi (i = 1, 2, ..., p) gives the percentage increase in Y corresponding to a one per cent increase in Xi.

Illustration. The following table shows the demand (Y) for a commodity and its price (X), measured in arbitrary units:

Y:  543   580   618   695   724   812   887   991   1186   1940
X:  61    54    50    43    38    36    28    23    19     10

Estimate the demand function Y = b0 X^b.

Solution.
We take logarithms of the variables Y and X:

Y      X     Y* = loge Y   X* = loge X
543    61    6.2971        4.1109
580    54    6.3630        3.9890
618    50    6.4265        3.9120
695    43    6.5439        3.7612
724    38    6.5848        3.6376
812    36    6.6995        3.5835
887    28    6.7878        3.3322
991    23    6.8987        3.1355
1186   19    7.0784        2.9444
1940   10    7.5704        2.3026

The reader can verify that the regression equation of Y* on X* comes out as

Y* = 9.121 − 0.69X*,

so that Y = antilog(9.121) X^(−0.69) = 9145.4 X^(−0.69).

The value of the regression coefficient b = −0.69 indicates that a one per cent increase in the price of the commodity will result in about a 0.69 per cent decrease in the demand for that commodity.

3. Exponential relations. Non-linear least squares regression is more general, but many non-linear curves can be put into a linear form by an appropriate transformation of the dependent variable Y or of some (or all) of the independent variables. A non-linear model is of the form

Y = f(X; β) + ε

Exponential functions are used to model continuous growth or decay: changes in population size, the spread of diseases and the growth of investments. They can also accurately describe the types of decline typified by radioactive decay. Exponential functions with the base e are of the form

Y = a e^(bX)

Taking logarithms of both sides (ignoring the error term), this becomes

ln(Y) = ln(a) + bX,

which is linear in X. The exponential function is used more generally when a series increases or decreases in such a way that the percentage difference from observation to observation is constant. The general form of the model is

Y = a b^X,

which represents growth when a > 0 and b > 1, and decay when a > 0 and 0 < b < 1.
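The log-linear fit in the demand illustration above can be reproduced by ordinary least squares on the transformed variables. The following sketch is ours and uses NumPy's `polyfit`:

```python
import numpy as np

# Demand (Y) and price (X) from the illustration above
Y = np.array([543, 580, 618, 695, 724, 812, 887, 991, 1186, 1940], float)
X = np.array([61, 54, 50, 43, 38, 36, 28, 23, 19, 10], float)

# Fit log Y = b0* + b log X by ordinary least squares
b, b0_star = np.polyfit(np.log(X), np.log(Y), 1)
b0 = np.exp(b0_star)   # back-transform the intercept
```

This reproduces the fitted demand curve Y ≈ 9145 X^(−0.69) obtained above, with the elasticity b ≈ −0.69.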
