Regression Analysis

The document discusses correlation and regression analysis. It defines correlation as measuring the relationship between two variables, and the coefficient of correlation measures the strength of this relationship from -1 to 1. Regression analysis estimates the value of a dependent variable based on the value of one or more independent variables. The method of least squares is used to fit a regression line that minimizes the variation between observed and predicted dependent variable values. An example applies these concepts to predict student final exam grades based on midterm grades.

Uploaded by

Allen Kurt Ramos
Copyright
© All Rights Reserved

CORRELATION ANALYSIS

Correlation is concerned with the relationship between two variables. It measures the
association or strength of relationship between two variables say x and y.

It indicates the extent to which changes in one variable are accompanied by changes in the other.

The coefficient of correlation, denoted by ρ (the Greek letter rho) or 𝑟, measures how closely the
changes in the values of x and y follow each other. Its range is

−𝟏 ≤ 𝒓 ≤ +𝟏

If y increases when x increases, 𝑟 is positive. If y decreases when x increases, 𝑟 is negative.
If y is unaffected by x, then 𝒓 = 𝟎.

Most Familiar Measures of Correlation

Pearson Product Moment Coefficient of Correlation

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} $$
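
The raw-sums form of the Pearson formula translates almost term by term into code. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the two small data sets are hypothetical and chosen only to show the two ends of the range of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation using the raw-sums formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator


# Hypothetical data: y rises in exact proportion to x, then falls.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

With points that lie exactly on a rising line the result is at the upper end of the range (+1), and with points on a falling line it is at the lower end (−1).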

Spearman Rank-Order Coefficient of Correlation

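$$ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$

where D is the difference between the two ranks of each pair of observations and n is the number of pairs.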
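
As an illustration of the rank-order calculation, here is a short sketch. It assumes that no values are tied (tied values need averaged ranks, which this sketch does not handle), and the helper names `ranks` and `spearman_r` and the sample data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(xs, ys):
    """Spearman rank-order correlation: 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    # D is the difference between the two ranks of each pair.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: the ranks of y are 1, 3, 2, 5, 4, so sum(D^2) = 4.
rs = spearman_r([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(round(rs, 3))  # 0.8
```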

REGRESSION ANALYSIS

The primary objective of regression analysis is to estimate the value of a random variable
(the dependent variable) given that the value of an associated variable (the independent variable) is known.

Dependent Variable - also called the response variable or predicted variable.

Independent Variable - also called the predictor variable.

Regression Equation - the algebraic formula by which the estimated value of the dependent or
response variable is determined.

Simple Regression Analysis - estimates the value of a dependent variable on the basis
of one independent or predictor variable.

Multiple Regression Analysis - estimates the value of a dependent variable on the basis
of two or more independent variables.

General Assumptions underlying the regression analysis model:

1. The dependent variable is a random variable.
2. The independent and dependent variables are linearly associated.

Assumption (1) indicates that although the values of the independent variable may be
controlled, the values of the dependent variable must be obtained through the process of
random sampling.

If interval estimation or hypothesis testing is done in the regression analysis, the following
assumptions are also required:

3. The variances of the conditional distributions of the dependent variable, given different values of
the independent variable, are equal.
4. The conditional distributions of the dependent variable, given different values of the independent
variable, are all normally distributed in the population of values.
5. The observed values of the dependent variable are independent of each other.

The Method of Least Squares for fitting a Regression Line or Straight Line Model

The statistical procedure for finding the “best-fitting straight line” for a set of points is the
Method of Least Squares. The resulting line is the one that minimizes the sum of the squares of the
deviations of the observed values of y from the values predicted by the line.

The linear equation that represents the simple linear regression model is

$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$

Where: 𝑌𝑖 = value of the dependent variable in the 𝑖th trial or observation

𝛽0 = first parameter of the regression equation, which indicates the value of Y when X = 0

𝛽1 = second parameter of the regression equation, called the regression coefficient,
which indicates the slope of the regression line

𝑋𝑖 = the specified value of the independent variable in the 𝑖th trial or observation

𝜀𝑖 = random-sampling error in the 𝑖th trial or observation

The parameters 𝛽0 and 𝛽1 in the linear regression model are estimated by the values 𝑏0 and
𝑏1, which are based on sample data. Thus, the linear regression equation based on the sample data,
which is used to estimate a single (conditional) value of the dependent variable, is

$$ \hat{Y} = b_0 + b_1 X $$

where the “hat” over the Y indicates that it is an estimated value.

Or we may adopt the simpler notation

$$ y = a + bx $$

Where: 𝑦 = the dependent or predicted variable

𝑥 = the independent or predictor variable

𝑎 = the y-intercept (𝑏0)

𝑏 = the slope of the regression line (𝑏1)

To solve for 𝑏:

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$

To solve for 𝑎:

$$ a = \bar{y} - b\bar{x} $$
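
The formulas for a and b follow from the least-squares criterion. The short derivation below is a sketch of the standard argument; the symbol S for the sum of squared deviations is introduced here for illustration and does not appear elsewhere in these notes.

```latex
% Sum of squared deviations of the observed y from the line a + bx
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = -2 \sum (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum y = n a + b \sum x

\frac{\partial S}{\partial b} = -2 \sum x_i (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum xy = a \sum x + b \sum x^2

% The first equation gives a; substituting it into the second gives b
a = \bar{y} - b \bar{x}
\qquad
b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}
```
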
Example: The data below summarize the midterm grades and final grades of 10 students. Let us
try to predict a student's final grade from his or her midterm grade.

Let x = midterm grade
y = final grade

x   75  70  65  90  85  85  80  70  65  90
y   80  75  65  95  90  85  90  75  70  90

From a previous computation using the Pearson Product Moment Coefficient of Correlation,
the computed value is r = 0.949, which is highly significant. This shows that there is a very strong
association between the two sets of grades. The solution using the Stepwise Method is given below.

Solving the above problem using the Stepwise Method:

I. Problem: Is there a significant relationship between the midterm grades and final
examination grades of 10 students in Mathematics?
II. Hypotheses:
Null Hypothesis:
There is no significant relationship between the midterm grades and final
examination grades of 10 students in Mathematics.
Alternative Hypothesis:
There is a significant relationship between the midterm grades and final
examination grades of 10 students in Mathematics.
III. Level of Significance:
α = 0.05 and 𝑑𝑓 = 𝑛 − 2 = 10 − 2 = 8

𝑟0.05 = 0.632 (this is the tabulated value)

IV. Statistic Used: Pearson Product Moment Coefficient of Correlation

          x      y      x²      y²      xy
         75     80    5625    6400    6000
         70     75    4900    5625    5250
         65     65    4225    4225    4225
         90     95    8100    9025    8550
         85     90    7225    8100    7650
         85     85    7225    7225    7225
         80     90    6400    8100    7200
         70     75    4900    5625    5250
         65     70    4225    4900    4550
         90     90    8100    8100    8100
Total   775    815   60925   67325   64000

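The computations are summarized as follows:

∑ 𝑥 = 775 , ∑ 𝑦 = 815, ∑ 𝑥² = 60,925 , ∑ 𝑦² = 67,325, ∑ 𝑥𝑦 = 64,000

Substituting these values, with n = 10, into the formula for the Pearson Product Moment
Coefficient of Correlation gives

$$ r = \frac{10(64{,}000) - 775(815)}{\sqrt{\left(10(60{,}925) - 775^2\right)\left(10(67{,}325) - 815^2\right)}} = \frac{8375}{\sqrt{(8625)(9025)}} = 0.949 $$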

V. Decision Rule: If the computed r value is greater than the tabular r value, reject the
null hypothesis 𝐻0.
VI. Conclusion/Implications:
Since the computed r value of 0.949 is greater than the tabular r value of
0.632 at the 0.05 level of significance with 8 degrees of freedom, the null hypothesis is
rejected.
This means that there is a significant relationship between the midterm grades
of the students and their final examination results. It implies that the higher the midterm
grade, the higher the final exam result tends to be. This shows a positive, or direct,
correlation.
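
The six steps above can be checked numerically. The sketch below recomputes the column totals and r from the raw grades, then rebuilds the tabulated value using the standard relation between a critical t value and a critical r, r = t / √(t² + df). The value t = 2.306 is the usual t-table entry for a two-tailed test at α = 0.05 with 8 degrees of freedom; the two-tailed reading is an assumption, since the notes do not state it. The variable names are illustrative only.

```python
import math

midterm = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]  # x
final = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]    # y
n = len(midterm)

# Column totals of the computation table
sum_x = sum(midterm)
sum_y = sum(final)
sum_x2 = sum(x * x for x in midterm)
sum_y2 = sum(y * y for y in final)
sum_xy = sum(x * y for x, y in zip(midterm, final))
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 775 815 60925 67325 64000

# Pearson r from the totals
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Critical r from the critical t (assumed: two-tailed, alpha = 0.05, df = 8)
df = n - 2
T_CRIT = 2.306
r_crit = T_CRIT / math.sqrt(T_CRIT ** 2 + df)

# Decision rule: reject the null hypothesis when r exceeds the critical value
reject_h0 = abs(r) > r_crit
print(round(r, 3), round(r_crit, 3), reject_h0)  # 0.949 0.632 True
```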

As computed above using the Pearson Product Moment Coefficient of Correlation, r = 0.949 is
highly significant, which shows that there is a very strong association between the two sets
of grades.

Applying the method of Linear Regression Analysis to the above problem:

1. Compute the same sums used for the Pearson r, namely ∑ 𝑥 , ∑ 𝑦 , ∑ 𝑥𝑦 , and ∑ 𝑥²:

∑ 𝑥 = 775 , ∑ 𝑦 = 815, ∑ 𝑥² = 60,925 , ∑ 𝑥𝑦 = 64,000

2. Solve for 𝑏 by substituting the values obtained above.

Using

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$

Then

$$ b = \frac{10(64{,}000) - 775(815)}{10(60{,}925) - (775)^2} = \frac{8375}{8625} = 0.971 $$

3. Solve for 𝑦̅ and 𝑥̅, where

$$ \bar{y} = \frac{\sum y}{n} = \frac{815}{10} = 81.5 \quad \text{and} \quad \bar{x} = \frac{\sum x}{n} = \frac{775}{10} = 77.5 $$

Then using

$$ a = \bar{y} - b\bar{x} $$

we obtain

$$ a = 81.5 - 0.971(77.5) = 6.25 $$

4. With 𝑎 and 𝑏 already computed, the equation of the regression line 𝑦 = 𝑎 + 𝑏𝑥 becomes

$$ y = 6.25 + 0.971x $$

where the y-intercept is 6.25 and the slope is 0.971.

The equation can then be used to predict a final grade from a midterm grade. For x = 86:

y = 6.25 + 0.971(86) = 6.25 + 83.51 = 89.76

For x = 73:

y = 6.25 + 0.971(73) = 6.25 + 70.88 = 77.13
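
Steps 1 to 4 can be reproduced with a few lines of code. This is a minimal sketch under the same formulas for a and b; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when x does not vary")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * (sum_x / n)  # a = y-bar - b * x-bar
    return a, b


midterm = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]  # x
final = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]    # y

a, b = fit_line(midterm, final)
print(round(a, 2), round(b, 3))  # 6.25 0.971

# Predicted final grades for midterm grades of 86 and 73
for x in (86, 73):
    print(x, round(a + b * x, 2))
```

Because the code keeps a and b at full precision instead of rounding them to 6.25 and 0.971 first, a prediction can differ from the hand computation in the last decimal place.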

Exercises:

1. Below is a summary of the Advertising-Sales data of EG Merchandising for the year 2001.

Let x = Advertising Expenses (in thousands)
y = Sales Revenue

x 4 2 3 5 8 7 7 9 3 5 8 10
y 20 25 10 15 30 24 28 35 12 16 32 45

Draw a scatter diagram and determine the equation of the regression line for the above data.

2. Compute the equation of the regression line determined by the data below.

x 2 7 5 4 9 3 3 4 5 8
y 20 35 48 51 71 39 45 25 60 70
