8. Regression Analysis
Regression analysis: Regression analysis is a statistical technique developed to study and
measure the statistical relationship between two or more variables, with a view to estimating or predicting the
value of the dependent variable for a known value of the independent variable. Examples: 1. Fertilizer
used and the yield of various plots of land. 2. The price of a commodity and the amount demanded.
The sample regression equation of y on x is defined as $y = a + bx$,
where y is the dependent variable, x is the independent variable, a is a constant (the intercept), and b is the slope of the
line, also called the sample regression coefficient of y on x.
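Least squares principle: The constants a and b are chosen so that the sum of squared deviations $\sum (y - a - bx)^2$ is as small as possible. The general form of the regression equation is given below:
$$y = a + bx \quad \ldots (1)$$
Summing (1) over the N observations, and summing again after multiplying by x, gives the normal equations
$$\sum y = aN + b\sum x \quad \ldots (2)$$
$$\sum xy = a\sum x + b\sum x^2 \quad \ldots (3)$$
Solving equations (2) and (3), we have
$$a = \frac{\sum y \sum x^2 - \sum x \sum xy}{N\sum x^2 - \left(\sum x\right)^2} \quad \text{and} \quad b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$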
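The closed-form expressions for a and b translate directly into code. The following is a minimal sketch, assuming plain Python lists as input; the function name `fit_line` and the four sample points are illustrative and do not come from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Common denominator N*sum(x^2) - (sum x)^2 of both formulas.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


# Illustrative points lying exactly on y = 1 + 2x (made-up data).
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the sample points lie exactly on a line, the fit recovers its intercept and slope; with scattered data the same function returns the line that minimizes the sum of squared deviations.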
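Standard error of the estimate: A measure of the dispersion, or scatter, of the observed values around the
line of regression. It is defined as
$$S_{yx} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
where $\hat{y} = a + bx$ is the value predicted by the regression line and n is the number of observations.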
If $S_{yx}$ is small, the data lie relatively close to the regression line and the regression
equation can be used to predict y with little error. If $S_{yx}$ is large, the data are widely
scattered around the regression line and the regression equation will not provide a precise estimate of y.
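As a rough illustration of how the standard error of the estimate is computed, the sketch below sums the squared residuals around a given line and divides by n - 2 before taking the square root. The function name and the four data points are assumptions made for the example; a = 0.5 and b = 0.8 are the least squares coefficients for those made-up points.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Return S_yx for the line y = a + b*x fitted to (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need equal-length samples with more than 2 points")
    # Sum of squared residuals: observed y minus the value on the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


# Made-up data; its least squares line is y = 0.5 + 0.8x.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
s_yx = standard_error_of_estimate(xs, ys, a=0.5, b=0.8)
print(round(s_yx, 4))  # 0.9487
```

Here the residuals are -0.3, 0.9, -0.9 and 0.3, so the sum of squares is 1.8 and the standard error is the square root of 1.8 / 2.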
Some important properties of regression coefficients
1. The regression coefficient measures the average change in the dependent variable for a unit change in the
independent variable.
2. Regression coefficients are not symmetrical functions of x and y; in general $b_{yx} \neq b_{xy}$.
3. Both regression coefficients have the same sign.
4. The correlation coefficient is the geometric mean of the two regression coefficients, $r = \pm\sqrt{b_{yx}\, b_{xy}}$, with the sign of r matching the common sign of the coefficients.
5. The arithmetic mean of the regression coefficients is equal to or greater than the correlation coefficient in absolute value, $\left|\frac{b_{yx} + b_{xy}}{2}\right| \ge |r|$.
6. If one of the regression coefficients is greater than one in absolute value, then the other must be less
than one in absolute value, since $b_{yx}\, b_{xy} = r^2 \le 1$.
7. The correlation coefficient and the regression coefficients have the same sign, because all of these measures depend on
the sign of the covariance appearing in the numerator.
❑ Regression coefficient of x on y is
$$b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2}$$
❑ Regression coefficient of y on x is
$$b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
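The properties listed above can be checked numerically from the deviation formulas. The sketch below is one way to do it, using the test score and sales figures from the salesmen example in this section; the function name `regression_coefficients` is an assumption made for illustration.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = s_xy / s_xx  # regression coefficient of y on x
    b_xy = s_xy / s_yy  # regression coefficient of x on y
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return b_yx, b_xy, r


sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]   # x, in lakh taka
scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]  # y, test scores
b_yx, b_xy, r = regression_coefficients(sales, scores)

print(b_yx > 0 and b_xy > 0 and r > 0)       # same sign (properties 3 and 7)
print(math.isclose(b_yx * b_xy, r ** 2))     # product equals r squared (property 4)
print((b_yx + b_xy) / 2 >= r)                # arithmetic mean vs r (property 5)
print(b_xy > 1 and b_yx < 1)                 # one above 1, the other below (property 6)
```

All three quantities share the numerator $\sum (x_i - \bar{x})(y_i - \bar{y})$, which is why their signs always agree.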
Differences between simple correlation and simple regression

| Simple correlation | Simple regression |
|---|---|
| Simple correlation measures the direction and strength of the linear relationship between two variables. | Regression measures the effect of the independent variable on the dependent variable. |
| The correlation coefficient is symmetrical in the two variables. | The regression coefficients are not symmetrical. |
| The value of the correlation coefficient lies between -1 and 1. | The value of a regression coefficient lies between -∞ and ∞. |
| The correlation coefficient is a pure number. It is a relative measurement. | The regression coefficient is an absolute measurement. It depends on the units of measurement of the variables. |
| The correlation coefficient does not measure a cause-and-effect relationship between the variables under study. | The regression coefficient expresses the cause-and-effect relationship assumed between the variables under study. |
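The contrast between a unit-free correlation coefficient and a unit-dependent regression coefficient can be seen by rescaling one variable. The sketch below is illustrative only: it takes the sales and test score figures from the salesmen example in this section and re-expresses sales in taka instead of lakh taka (1 lakh = 100,000). The helper name `slope_and_r` is an assumption.

```python
import math


def slope_and_r(xs, ys):
    """Return (b_yx, r) for the regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


sales_lakh = [31, 36, 48, 37, 50, 45, 33, 41, 39]
scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]
sales_taka = [x * 100_000 for x in sales_lakh]  # same sales, different unit

b_lakh, r_lakh = slope_and_r(sales_lakh, scores)
b_taka, r_taka = slope_and_r(sales_taka, scores)

print(math.isclose(r_lakh, r_taka))            # r does not depend on the unit
print(math.isclose(b_lakh, b_taka * 100_000))  # the slope shrinks by the unit factor

# Swapping the roles of the variables leaves r unchanged but changes the slope.
b_swapped, r_swapped = slope_and_r(scores, sales_lakh)
print(math.isclose(r_lakh, r_swapped), math.isclose(b_lakh, b_swapped))
```

The last line shows the symmetry row of the table: r is the same in both directions, while the two regression coefficients differ.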
Example: The following data give the test scores and sales made by nine salesmen during the last year at a big
department store:

| Test scores (y) | 14 | 19 | 24 | 21 | 26 | 22 | 15 | 20 | 19 |
|---|---|---|---|---|---|---|---|---|---|
| Sales in lakh taka (x) | 31 | 36 | 48 | 37 | 50 | 45 | 33 | 41 | 39 |

I. Find the regression equation of test scores on sales.
II. Find the test score when the sale is taka 40 lakh.
III. Compute the coefficient of determination and comment.
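Solution: i) The best-fitted regression equation of test scores $y$ on sales $x$ is $y = a + bx$.
Table for calculation of regression equation:

| Test scores (y) | Sales (x) | xy | x² | y² |
|---|---|---|---|---|
| 14 | 31 | 434 | 961 | 196 |
| 19 | 36 | 684 | 1296 | 361 |
| 24 | 48 | 1152 | 2304 | 576 |
| 21 | 37 | 777 | 1369 | 441 |
| 26 | 50 | 1300 | 2500 | 676 |
| 22 | 45 | 990 | 2025 | 484 |
| 15 | 33 | 495 | 1089 | 225 |
| 20 | 41 | 820 | 1681 | 400 |
| 19 | 39 | 741 | 1521 | 361 |
| Σy = 180 | Σx = 360 | Σxy = 7393 | Σx² = 14746 | Σy² = 3720 |

We have
$$a = \frac{\sum y \sum x^2 - \sum x \sum xy}{N\sum x^2 - \left(\sum x\right)^2} = \frac{180 \times 14746 - 360 \times 7393}{9 \times 14746 - (360)^2} = -2.31$$
and
$$b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2} = \frac{9 \times 7393 - 360 \times 180}{9 \times 14746 - (360)^2} = 0.56$$
$$\therefore\; y = -2.31 + 0.56x$$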
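ii) When $x = 40$, the value of $y$ is $y = -2.31 + 0.56 \times 40 = 20.09 \approx 20$. The test score is about 20 when the sale is Tk. 40 lakh.
iii)
$$r^2 = \frac{\left[N\sum xy - \sum x \sum y\right]^2}{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]} = (0.95)^2 = 0.9025$$
with r rounded to 0.95. Comment: About 90.25% of the variation in the sales of the store is explained by the test scores of the salespersons. Hence, the regression line fits well.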
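The arithmetic of this worked example can be reproduced with a short script. This is a sketch for checking the hand calculation, not part of the original solution; the variable names are illustrative, and the data are the nine pairs given in the example.

```python
sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]   # x, in lakh taka
scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]  # y, test scores

n = len(sales)
sum_x = sum(sales)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(sales, scores))
sum_x2 = sum(x * x for x in sales)
sum_y2 = sum(y * y for y in scores)

# Intercept and slope of the regression of test scores on sales.
denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

# Predicted test score at sales of 40 lakh taka.
pred_40 = a + b * 40

# Coefficient of determination from the same sums.
r2 = (n * sum_xy - sum_x * sum_y) ** 2 / (denom * (n * sum_y2 - sum_y ** 2))

print(sum_y, sum_x, sum_xy, sum_x2, sum_y2)  # 180 360 7393 14746 3720
print(round(a, 2), round(b, 2))              # -2.31 0.56
print(round(pred_40, 2), round(r2, 4))
```

Without intermediate rounding the prediction at x = 40 is 20, because the fitted line passes through the point of means (40, 20), and $r^2$ comes out at about 0.897. The figure 0.9025 in the hand calculation arises from rounding r to 0.95 before squaring.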
Homework
1. An industrial engineer collected the following data on the experience and performance rating of 8 operators:

| Operator | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Experience (years) | 16 | 12 | 18 | 4 | 3 | 10 | 5 | 12 |
| Performance rating | 87 | 88 | 89 | 68 | 38 | 80 | 70 | 85 |

(a) Do the data give evidence that experience improves performance?
(b) Estimate the performance rating of an operator having (i) 9 years and (ii) 15 years of experience.
2. The data given below relate the thickness loss during calendering of a viscose needle-punched fabric
to the load on the calender bowl:

| Load (tons): X | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | 3.5 |
|---|---|---|---|---|---|---|---|
| Thickness loss (%): Y | 4 | 13 | 14 | 20 | 24 | 33 | 35 |

1) Fit the regression equation of thickness loss on load.
2) Estimate the thickness loss when the load on the calender bowl is 2.25 tons.
3. Show that $r = \pm\sqrt{b_{xy}\, b_{yx}}$