QT_LESSON 8-Regression & Correlation.docx
I. REGRESSION.
Regression is the measure of the average relationship between two or more variables in
terms of the original units of the data.
A regression study that confines itself to only two variables is called simple regression. A regression analysis that studies more than two variables at a time is called multiple regression.
In a simple regression analysis there are two variables. One is known as the ‘independent’ variable, also called the ‘regressor’, ‘predictor’ or ‘explanatory’ variable. The other variable, whose values are predicted, is called the ‘dependent’, ‘regressed’ or ‘explained’ variable.
With the help of regression studies we can also calculate the coefficient of correlation. The coefficient of determination (r²), which measures how much of the variation in the dependent variable is explained by the independent variable, indicates the predictive value of the regression. We will use the method of least squares to obtain the regression equation.
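For the regression line of Y on X, Y = a + bX, the method of least squares gives the two normal equations
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
If the values of X and Y are substituted into the above equations, we get the values of a and b and thus the regression line of Y on X.
Similarly, for the regression line of X on Y, X = a + bY (with a different pair of constants a and b), the normal equations are
$$\sum X = na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$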
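The two normal equations form a 2×2 linear system in a and b. As a minimal sketch in Python (the helper name `solve_normal_equations` is hypothetical, chosen for illustration), the sums can be formed from the data and the system solved by Cramer's rule:

```python
def solve_normal_equations(xs, ys):
    """Return (a, b) for the least-squares line ys = a + b*xs.

    Solves the normal equations
        sum(y)  = n*a      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    by Cramer's rule. (Hypothetical helper for illustration.)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of [[n, sx], [sx, sxx]]
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (sy * sxx - sx * sxy) / det  # intercept
    b = (n * sxy - sx * sy) / det    # slope
    return a, b
```

Calling `solve_normal_equations(ys, xs)`, with the two lists swapped, solves the second pair of normal equations and returns the constants of the line of X on Y.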
Example
Find the estimates of a and b given that Y = a + bX, and hence find the value of Y when X = 12.
X 8 10 6 11 8 7 10 9 10
Example
The table below shows the number of absences, X, in a Quantitative Methods course and the final exam grade, Y, for 7 students. Find the line that best fits the data.
X 1 0 2 6 4 3 3
Y 85 80 70 55 90 90 95
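A worked version of this example, sketched in Python under the assumption that "best fit" means the least-squares line Y = a + bX (variable names are illustrative):

```python
# Absences (X) and final exam grades (Y) from the example above
absences = [1, 0, 2, 6, 4, 3, 3]
grades = [85, 80, 70, 55, 90, 90, 95]

n = len(absences)
sum_x = sum(absences)                                  # 19
sum_y = sum(grades)                                    # 565
sum_x2 = sum(x * x for x in absences)                  # 75
sum_xy = sum(x * y for x, y in zip(absences, grades))  # 1470

# Slope and intercept obtained from the two normal equations
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(f"a = {a:.2f}, b = {b:.2f}")  # a = 88.08, b = -2.71
```

With these sums, b = -445/164 ≈ -2.71 and a ≈ 88.08, so the fitted line is roughly Y = 88.08 - 2.71X: in this small sample each extra absence goes with about 2.7 fewer marks on average.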
Example
From the following data, obtain the two regression equations (Y on X and X on Y) using the method of least squares.
X 2 4 6 8 10
Y 5 7 9 8 11
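Both lines can be obtained with the same least-squares routine by swapping the roles of the variables. A Python sketch (the helper `least_squares` is a hypothetical name for illustration):

```python
xs = [2, 4, 6, 8, 10]
ys = [5, 7, 9, 8, 11]

def least_squares(u, v):
    """Return (intercept, slope) of the least-squares line v = intercept + slope*u."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suu = sum(p * p for p in u)
    suv = sum(p * q for p, q in zip(u, v))
    slope = (n * suv - su * sv) / (n * suu - su ** 2)
    intercept = (sv - slope * su) / n
    return intercept, slope

a_yx, b_yx = least_squares(xs, ys)  # regression of Y on X
a_xy, b_xy = least_squares(ys, xs)  # regression of X on Y

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f}Y")
```

This gives Y = 4.10 + 0.65X and X = -4.40 + 1.30Y. As a check, the product of the two slopes, 0.65 × 1.30 = 0.845, equals r² for this data.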
If two variables X and Y vary in such a way that changes in one are accompanied by
changes in the other, these variables are said to be correlated.
If an increase in one variable is associated with an increase in the other, the correlation is said to be positive. If an increase in one is associated with a decrease in the other, the correlation is said to be negative.
Thus, correlation is a measure of the association between the values of two variables X and Y. Correlation techniques are used in predicting the values of one variable from the values of another. If the values of the two variables satisfy an equation exactly, we say that the variables are perfectly correlated. If changes in one variable show no tendency to be accompanied by changes in the other, the variables are said to be uncorrelated.
Scatter Diagrams
If all points on a scatter diagram seem to lie near a straight line, the correlation is called
linear. If Y tends to increase as X increases, the correlation is positive; and if Y tends to
decrease as X increases, the correlation is negative. If all the points seem to lie near some
curve, the correlation is called non-linear.
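$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$$
where
$$\operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}, \qquad \operatorname{var}(x) = \frac{\sum x^2}{n} - \bar{x}^2, \qquad \operatorname{var}(y) = \frac{\sum y^2}{n} - \bar{y}^2$$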
1. A researcher studied the connection between x (the age in years of a licensed driver) and y (the percentage of fatal accidents for drivers of that age which are caused by speeding). The collected data is shown below. Calculate the coefficient of correlation.
x 17 27 37 47 57 67 77
y 36 25 20 12 10 7 5
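A Python sketch of the covariance form of r, applied to this data (names are illustrative; the covariance and variances divide by n, as in the formulas for cov(x, y), var(x) and var(y)):

```python
import math

age = [17, 27, 37, 47, 57, 67, 77]         # x
speeding_pct = [36, 25, 20, 12, 10, 7, 5]  # y

n = len(age)
x_bar = sum(age) / n
y_bar = sum(speeding_pct) / n

# Covariance and variances, each dividing by n
cov_xy = sum(x * y for x, y in zip(age, speeding_pct)) / n - x_bar * y_bar
var_x = sum(x * x for x in age) / n - x_bar ** 2
var_y = sum(y * y for y in speeding_pct) / n - y_bar ** 2

r = cov_xy / math.sqrt(var_x * var_y)
print(f"r = {r:.4f}")  # r = -0.9594
```

The result, r ≈ -0.959, indicates a strong negative linear correlation: in this data the percentage of speeding-related fatal accidents falls as driver age rises.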
x 1 0 2 6 4 3 3
y 85 80 70 55 90 90 95
Coefficient of determination
This measure, denoted by r², is used to determine the goodness of fit of the regression equation. It indicates the proportion of the variance in the dependent variable that can be explained by the independent variable(s).
A researcher studied the connection between x (the age in years of a licensed driver) and y (the percentage of fatal accidents for drivers of that age which are caused by speeding). The collected data is shown below. Calculate the coefficient of determination.
X 17 27 37 47 57 67 77
Y 36 25 20 12 10 7 5
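For this data, r² can be computed by squaring Pearson's r. As a cross-check, the sketch below also computes it as the proportion of variation explained by the least-squares line, 1 - SSE/SST (SSE and SST, the residual and total sums of squares, are standard terms not otherwise used in these notes). A minimal Python sketch:

```python
import math

x = [17, 27, 37, 47, 57, 67, 77]
y = [36, 25, 20, 12, 10, 7, 5]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(u * v for u, v in zip(x, y))

# Pearson's r, then square it
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
r_squared = r ** 2

# Cross-check: proportion of the variation in y explained by the fitted line
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n
y_bar = sy / n
ss_total = sum((v - y_bar) ** 2 for v in y)
ss_resid = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
explained = 1 - ss_resid / ss_total

print(f"r^2 = {r_squared:.4f}, 1 - SSE/SST = {explained:.4f}")
```

Both quantities come out at about 0.9204, so roughly 92% of the variation in y is explained by the linear relationship with x.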
To compute Karl Pearson’s correlation coefficient, let’s consider the data given below, which shows the ages and weights of five children at a health centre.
Age (years) 1 2 3 4 5
Weight (kg) 7 4 6 7 10
Let’s calculate Karl Pearson’s correlation coefficient between the age and the weight of these children.
Solution
Karl Pearson’s correlation coefficient (r) is given by the formula
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
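Substituting the age and weight data into this formula uses ∑x = 15, ∑y = 34, ∑x² = 55, ∑y² = 250 and ∑xy = 111, with n = 5. The Python sketch below (variable names are illustrative) does the substitution:

```python
import math

age = [1, 2, 3, 4, 5]      # x, years
weight = [7, 4, 6, 7, 10]  # y, kg

n = len(age)
sum_x, sum_y = sum(age), sum(weight)              # 15, 34
sum_x2 = sum(x * x for x in age)                  # 55
sum_y2 = sum(y * y for y in weight)               # 250
sum_xy = sum(x * y for x, y in zip(age, weight))  # 111

numerator = n * sum_xy - sum_x * sum_y  # 45
# sqrt(50 * 94)
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(f"r = {r:.3f}")  # r = 0.656
```

So r = 45/√(50 × 94) ≈ 0.656, a moderate positive correlation between age and weight for these five children.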
Example 2
Sometimes the ranks are already provided, which makes the task much easier. For example:
The table below shows the ranks of ten students in a Maths test and a Physics test. Calculate the rank correlation coefficient.
R_M (Maths) 8 5 9 2 4 1 7 3 10 6
R_P (Physics) 7 10 6 3 2 5 9 1 8 4
Your task is to obtain D = R_M - R_P and D² for each student, then substitute the values into the formula
$$r_k = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where N is the number of pairs of ranks.
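For the Maths and Physics ranks above, a Python sketch of the computation (this is Spearman's rank correlation formula, which assumes there are no tied ranks; names are illustrative):

```python
rank_maths = [8, 5, 9, 2, 4, 1, 7, 3, 10, 6]
rank_physics = [7, 10, 6, 3, 2, 5, 9, 1, 8, 4]

# D = difference in ranks for each student
d = [m - p for m, p in zip(rank_maths, rank_physics)]
sum_d2 = sum(v * v for v in d)  # sum of D^2
N = len(d)

r_k = 1 - 6 * sum_d2 / (N * (N ** 2 - 1))
print(f"sum D^2 = {sum_d2}, r_k = {r_k:.3f}")  # sum D^2 = 72, r_k = 0.564
```

Here ∑D² = 72 and N = 10, so r_k = 1 - 432/990 ≈ 0.564, a moderate positive agreement between the two sets of ranks.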
1. Given that the covariance between the lengths and weights of 5 items is 6 and their standard deviations are 2.45 and 2.61 respectively, find the coefficient of correlation between the lengths and the weights.
Solution
$$\operatorname{Cov}(X, Y) = 6, \qquad \sigma_X = 2.45, \qquad \sigma_Y = 2.61$$
and we know that
$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \times \sigma_Y}$$
Remember, σ² = variance.
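Completing the substitution, with σ_X and σ_Y denoting the two given standard deviations:

```latex
r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}
  = \frac{6}{2.45 \times 2.61}
  = \frac{6}{6.3945}
  \approx 0.938
```

If variances are given instead of standard deviations, take their square roots first, since σ² is the variance.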
Exercise
1. The table below reports the ages (in years) and the number of hours of sleep in one night of seven adults.
Ages, X 35 20 59 42 38 68 75
Hours of sleep, Y 7 9 5 6 8 5 4
2. The following table shows the prices of a commodity and the amount demanded at each price.
Price (X): 10 12 13 12 16 15
X 63 52 59 57 64 65 55 56 59
X 1 5 3 2 1 1 7 3
Y 6 1 0 0 1 2 1 5
5. The following data give the ages and blood pressures of 10 women:
Age (X) 56 42 36 47 49 42 60 72 63 55
Blood pressure (Y) 147 125 118 128 145 140 155 160 149 150
6. The following data have been collected regarding sales and advertising
expenditure
Sales (Sh. M) 8 9 7 8 9 10
Advertising expenditure (Sh. M) 21 25 29 33 37 41
Calculate Karl Pearson’s correlation coefficient between sales revenue and advertising expenditure. Comment on the results. (10 Marks)
7. The following data give the test scores and sales made by 9 salesmen during the last year:
Test scores 14 19 24 21 26 22 15 20 19
Sales (millions) 31 36 48 37 50 45 33 41 39
Obtain
Applicant A B C D E F G H
Marks in Accountancy 15 20 28 12 40 60 24 80
Marks in Statistics 40 32 50 35 20 10 30 60
9. The following data have been collected regarding sales and advertising
expenditure
10. Two soccer judges were assigned the task of assessing 10 players of a team, and each judge awarded each player points as shown in the table:
Judge A 34 30 44 8 12 41 38 18 26 28
Judge B 26 22 42 10 18 32 46 17 12 30
11. Find the regression equation for predicting Y from X, given the data below. (7 marks)
X 0 2 8 6
Y 4 1 10 9
1. EXERCISES
a. Use the method of least squares to determine the equation of the straight
line that best fits the following data.
(10 Marks)
X 11 13 14 17 18 21 26
Y 20 23 25 28 30 34 38
b. Using the data in (a) above, find the coefficients of correlation and determination and interpret your results.
(10 Marks)
c. The following table gives the profits, in tens of thousands of shillings, of two supermarkets. Compute the coefficient of variation for each supermarket and indicate which one has the higher variability of profits.
(8 Marks)
A 48 15 28 41 59 41
B 33 20 23 69 45 53
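The notes do not define the coefficient of variation; the usual definition is CV = (standard deviation ÷ mean) × 100%. A Python sketch for the two supermarkets, assuming the population standard deviation (`statistics.pstdev`); using the sample standard deviation (`statistics.stdev`) changes the figures but, for this data, not the conclusion:

```python
import statistics

# Profits (tens of thousands of shillings) for the two supermarkets
profits = {
    "A": [48, 15, 28, 41, 59, 41],
    "B": [33, 20, 23, 69, 45, 53],
}

cv = {}
for name, values in profits.items():
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    cv[name] = sd / mean * 100      # coefficient of variation, in percent
    print(f"Supermarket {name}: mean = {mean:.2f}, sd = {sd:.2f}, CV = {cv[name]:.1f}%")

print("Higher variability:", max(cv, key=cv.get))
```

The supermarket with the larger CV has the higher relative variability of profits.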
II. The table shows the turnover and profit before taxation of a supermarket from
1982 to 1987.
Year Turnover (‘0000 ksh) Profit before Taxation
1982 106 10
1983 125 12
1984 147 16
1985 167 17
1986 187 18
1987 220 22
(i) Plot a scatter diagram showing the relationship between profit before taxation and turnover. (4 marks)
(ii) Calculate the regression line of profit before taxation on turnover
( 8 marks)
(iii) Forecast the profit if the turnover is 2,800,000. (2 Marks)
X 17 27 37 47 57 67 77
Y 36 25 20 12 10 7 5
Use this data to: