Regression Analysis
Correlation is concerned with the relationship between two variables. It measures the
strength of the association between two variables, say x and y, that is, the extent to which
changes in one variable are accompanied by changes in the other.
The coefficient of correlation, denoted by ρ (the Greek letter rho) or r, measures how closely
the changes in the values of x and y follow each other. Its range is
$$-1 \le r \le +1$$
and it is computed as
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$
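The raw-sum formula for r can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `pearson_r` and the sample data are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r computed from the raw sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


# Hypothetical data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))
```

The result always lies between -1 and +1; a value near +1 indicates that y tends to rise as x rises.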
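For ranked data, where $D$ is the difference between the two ranks of each pair, the rank correlation coefficient is
$$r = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$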
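Assuming the rank-difference formula here is the Spearman rank correlation, r = 1 - 6ΣD²/(n(n² - 1)), it can be sketched in Python as follows. The function name `spearman_r` and the helper `ranks` are illustrative, and the sketch assumes no tied values, since this form of the formula holds only without ties.

```python
def spearman_r(x, y):
    """Rank correlation r = 1 - 6*sum(D^2) / (n*(n^2 - 1)), assuming no ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")

    def ranks(values):
        # Rank 1 goes to the smallest value, rank n to the largest.
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical scores: identical ordering gives +1, reversed ordering gives -1.
print(spearman_r([10, 20, 30, 40], [1, 2, 3, 4]))
print(spearman_r([10, 20, 30, 40], [4, 3, 2, 1]))
```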
REGRESSION ANALYSIS
The primary objective of regression analysis is to estimate the value of a random variable
(the dependent variable) given that the value of an associated variable (the independent variable) is known.
Regression Equation: the algebraic formula by which the estimated value of the dependent or
response variable is determined.
Simple Regression Analysis: the value of a dependent variable is estimated on the basis
of one independent or predictor variable.
Multiple Regression Analysis: the value of a dependent variable is estimated on the basis
of two or more independent variables.
3. The variances of the conditional distributions of the dependent variable, given different values of
the independent variable, are equal,
4. The conditional distributions of the dependent variable, given different values of the independent
variable, are all normally distributed in the population of values, and
5. The observed values of the dependent variable are independent of each other.
The Method of Least Squares for fitting a Regression Line or Straight Line Model
The statistical procedure for finding the “best-fitting straight line” for a set of points is the
Method of Least Squares. The resulting line minimizes the sum of the squared deviations of the
observed values of y from the values predicted by the line.
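Written out for a sample line $y = a + bx$, the least-squares criterion can be sketched as follows. This derivation is not in the original notes; it is the standard argument, and the symbol $S(a, b)$ for the sum of squared deviations is introduced here only for illustration.

```latex
% Sum of squared deviations of the observed y_i from the line a + b x_i
S(a, b) = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum y = n a + b \sum x
\qquad
\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum xy = a \sum x + b \sum x^2

% Solving the two equations for b and a
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2},
\qquad
a = \bar{y} - b \bar{x}
```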
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$\beta_0$ = first parameter of the regression equation, which indicates the value of Y when X = 0
$X_i$ = the specified value of the independent variable in the $i$th trial or observation
The parameters $\beta_0$ and $\beta_1$ in the linear regression model are estimated by the values $b_0$ and
$b_1$, which are based on sample data. Thus, the linear regression equation based on the sample data,
used to estimate a single (conditional) value of the dependent variable, is
$$\hat{Y} = b_0 + b_1 X$$
where the “hat” over the Y indicates that it is an estimated value. Writing $a$ for the intercept and $b$ for the slope, the same equation is
$$y = a + bx$$
To solve for $b$:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
To solve for $a$:
$$a = \bar{y} - b\bar{x}$$
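The two formulas for b and a translate directly into code. Below is a minimal Python sketch, not taken from the original notes; the function name `fit_line` and the sample points are assumptions made for illustration.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # intercept: y-bar - b * x-bar
    return a, b


# Hypothetical points that lie exactly on y = 2 + 3x.
a, b = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)
```

Computing b first matters, because the formula for a depends on it.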
Example: The data below summarize the midterm grades and final examination results of 10 students.
Let us try to predict a student's final grade from his midterm grade.
From a previous computation using the Pearson Product Moment Coefficient of Correlation,
the computed value is r = 0.949, which is highly significant. This shows a very strong
association between the two results. Below is the solution using the Stepwise Method.
I. Problem: Is there a significant relationship between the midterm grades and final
examination grades of 10 students in Mathematics?
II. Hypothesis:
Null Hypothesis:
There is no significant relationship between the midterm grades and final
examination grades of 10 students in Mathematics.
Alternative Hypothesis:
There is a significant relationship between the midterm grades and final
examination grades of 10 students in Mathematics.
III. Level of Significance
α = 0.05 and 𝑑𝑓 = 𝑛 − 2 = 10 − 2 = 8
𝑥 𝑦 𝑥2 𝑦2 𝑥𝑦
𝑟 = 0.949
V. Decision Rule: If the computed r value is greater than the tabular r value, reject
the null hypothesis 𝐻0.
VI. Conclusion/Implications:
Since the computed r value of 0.949 is greater than the tabular r value of
0.632 at the 0.05 level of significance with 8 degrees of freedom, the null hypothesis is
rejected.
This means that there is a significant relationship between the midterm grades
of the students and their final examination results. It implies that the higher the midterm
grade, the higher the final exam result. This is a positive or direct
correlation.
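The decision can be cross-checked with the equivalent t statistic, t = r√(n - 2)/√(1 - r²), which has n - 2 degrees of freedom. The sketch below is not part of the original solution. The function names are illustrative, and the critical value 2.306 is the usual two-tailed t-table entry for α = 0.05 and df = 8, which the notes do not state.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing that the population correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def critical_r(t_crit, df):
    """Convert a critical t value into the matching critical r value."""
    return t_crit / sqrt(t_crit ** 2 + df)


r, n = 0.949, 10   # values from the example
t_crit = 2.306     # assumed: two-tailed t-table value, alpha = 0.05, df = 8

t = t_from_r(r, n)
print(round(t, 2), round(critical_r(t_crit, n - 2), 3))
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

Converting the critical t back to r gives about 0.632, the tabular value used in the conclusion, so both forms of the test reach the same decision.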
From the previous computation using the Pearson Product Moment Coefficient of Correlation,
the computed value is r = 0.949, which is highly significant. This shows a very strong
association between the two results.
Using
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
we obtain $b = 0.971$. Then, using
$$a = \bar{y} - b\bar{x}$$
we obtain $a = 6.25$.
4. With $a$ and $b$ already computed, we obtain the equation of the regression line
$$y = a + bx$$
which, for the data above, is
$$y = 6.25 + 0.971x$$
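Once the line is known, predicting a final grade is a single substitution. The following Python sketch uses the fitted coefficients from the example; the function name `predict_final` and the midterm grade of 80 are hypothetical and do not come from the original data.

```python
def predict_final(midterm, a=6.25, b=0.971):
    """Estimated final grade from the fitted line y = a + b*x."""
    return a + b * midterm


# Hypothetical midterm grade, chosen only to illustrate the substitution.
print(round(predict_final(80), 2))
```

Such estimates are most trustworthy for midterm grades inside the range of the observed data, since the line was fitted only to those points.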
Exercises:
1. Show a scatter diagram and determine the equation of the regression line for the data below.
x    4   2   3   5   8   7   7   9   3   5   8  10
y   20  25  10  15  30  24  28  35  12  16  32  45
2. Compute the equation of the regression line determined by the data below.
x    2   7   5   4   9   3   3   4   5   8
y   20  35  48  51  71  39  45  25  60  70