Simple Regression and Correlation
I would like to express my sincere gratitude and heartfelt thanks to my guide, Mr. Mian
Abbas, for the encouragement he gave me at every stage of this work. His kind nature
and knowledgeable guidance made it easier for me to complete the work. I am
immensely indebted to him for his affection and for the help he rendered to me in several
ways. I am very grateful for his timely attention and criticism, and for being a constant
source of inspiration. He constantly motivated me to step forward without being
discouraged by setbacks and failure.
Simple Regression and Correlation
We are going to discuss a powerful statistical technique for examining whether or
not two variables are related. Specifically, we are going to talk about the ideas of
simple regression and correlation.
One reason why regression is powerful is that we can use it to model how a
dependent variable changes with an independent variable. Note, however, that
regression and correlation by themselves establish association, not causation;
demonstrating that the independent variable actually causes the change requires
careful research design in addition to the statistics.
Scattergrams
The simplest thing we can do with two variables that we believe are related is to
draw a scattergram. A scattergram is a simple graph that plots values of our
dependent variable Y against our independent variable X. Normally we plot the dependent
variable on the vertical axis and the independent variable on the horizontal axis.
For example, let’s make a scattergram of the following data set:
Individual   No. of Sodas Consumed (X)   No. of Bathroom Trips (Y)
Rick         1                           2
Janice       2                           1
Paul         3                           3
Susan        3                           4
Cindy        4                           6
John         5                           5
Donald       6                           5
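Before drawing anything, we can check numerically which way the cloud of points tilts. A minimal sketch in Python (standard library only; the variable names are ours, not from the text) uses the sign of the cross-product sum, which is positive exactly when an upward-sloping line fits:

```python
# Data from the table above (X = sodas consumed, Y = bathroom trips).
sodas = [1, 2, 3, 3, 4, 5, 6]
trips = [2, 1, 3, 4, 6, 5, 5]

n = len(sodas)
x_bar = sum(sodas) / n
y_bar = sum(trips) / n

# Sum of cross-products of deviations from the means.
# Positive -> the point cloud tilts up to the right.
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(sodas, trips))
print(cross > 0)
```

Here the sum is positive, which matches the upward-sloping pattern described next.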
Just from our scattergram, we can sometimes get a fairly good idea of the
relationship between our variables. In the current scattergram, it looks like a line
that slopes up to the right would “fit” the data pretty well.
Essentially, that is all the math we have to do: find the best-fit line, i.e. the
line that best represents the overall trend of our data points.
Note that sometimes our data won’t be linearly related at all; sometimes, there may
be a “curvilinear” or other nonlinear relationship. If it looks like the data are
related, but the regression doesn’t “fit” well, chances are this is the case.
a = ȳ − b·x̄
Again, this looks ugly, but it’s all the same simple math we already know and
love: just follow the order of operations (PEMDAS), and we will get the right answer.
a = (26/7) − 0.8387(24/7)
  = 3.7143 − (0.8387)(3.4286)
  = 3.7143 − 2.8756 = 0.8387.
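The slope and intercept arithmetic can be checked in a few lines of Python. This is a sketch using the standard least-squares formulas (slope b as the sum of cross-products over the sum of squared X deviations, then a = ȳ − b·x̄); carrying full precision gives an intercept of about 0.8387, differing from the hand-rounded value only in the last digit:

```python
# Least-squares slope and intercept for the soda/bathroom data above.
sodas = [1, 2, 3, 3, 4, 5, 6]   # X
trips = [2, 1, 3, 4, 6, 5, 5]   # Y

n = len(sodas)
x_bar = sum(sodas) / n          # 24/7
y_bar = sum(trips) / n          # 26/7

# b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(sodas, trips))
den = sum((x - x_bar) ** 2 for x in sodas)
b = num / den

# Intercept: a = y_bar - b * x_bar
a = y_bar - b * x_bar

print(round(b, 4), round(a, 4))
```

For this data set both the slope and the intercept happen to come out near 0.8387.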
Pearson’s r
Now that we’ve found a and b, we know the intercept and slope of the regression
line, and it appears that X and Y are related in some way. But how strong is that
relationship?
That’s where Pearson’s r comes in. Pearson’s r is a measure of correlation;
sometimes, we just call it the correlation coefficient. The value of r tells us how
strong the relationship between X and Y is, ranging from −1 (perfect negative)
through 0 (no linear relationship) to +1 (perfect positive).
Calculating Pearson’s r
The formula for Pearson’s r is somewhat similar to the formula for the slope (b). It
is as follows:
r = Σ(x − x̄)(y − ȳ) / [ √(Σ(x − x̄)²) · √(Σ(y − ȳ)²) ]
We already know the numerator from calculating the slope earlier, so the only hard
part is the denominator, where we have to calculate each square root separately,
and then multiply them together.
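Continuing the soda example, Pearson's r can be computed directly from the deviations (a sketch; variable names are ours). For this data set r comes out to about 0.80:

```python
import math

# Pearson's r for the soda/bathroom data.
sodas = [1, 2, 3, 3, 4, 5, 6]   # X
trips = [2, 1, 3, 4, 6, 5, 5]   # Y

n = len(sodas)
x_bar = sum(sodas) / n
y_bar = sum(trips) / n

# Numerator: same sum of cross-products used for the slope.
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(sodas, trips))

# Denominator: each square root computed separately, then multiplied.
sx = math.sqrt(sum((x - x_bar) ** 2 for x in sodas))
sy = math.sqrt(sum((y - y_bar) ** 2 for y in trips))

r = num / (sx * sy)
print(round(r, 4))
```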
To test whether or not r is significantly different from zero, we use the t test for
Pearson’s r:
t = r·√(n − 2) / √(1 − r²)
Since this is like any other hypothesis test, we want to compare the obtained value
t_obt with the critical value t_crit. For this test, we use our chosen alpha level
(conventionally, .05 or .01) and df = n − 2. We subtract 2 from our sample size
because the regression line estimates two parameters (the slope and the intercept).
Now, as in other significance tests, we find our critical value of t from the table
(α = .05, df = 5: 2.571) and compare it to the obtained value. Since 2.9893 > 2.571,
we reject the null hypothesis and conclude that the correlation is statistically
significant.
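The whole test can be reproduced end to end in Python (a self-contained sketch; carrying full precision gives t ≈ 2.99, which matches the text's 2.9893 up to intermediate rounding):

```python
import math

# t test for Pearson's r on the soda/bathroom data.
sodas = [1, 2, 3, 3, 4, 5, 6]   # X
trips = [2, 1, 3, 4, 6, 5, 5]   # Y

n = len(sodas)
x_bar = sum(sodas) / n
y_bar = sum(trips) / n

num = sum((x - x_bar) * (y - y_bar) for x, y in zip(sodas, trips))
sx = math.sqrt(sum((x - x_bar) ** 2 for x in sodas))
sy = math.sqrt(sum((y - y_bar) ** 2 for y in trips))
r = num / (sx * sy)

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2
t_obt = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_crit = 2.571  # from the t table, alpha = .05, df = 5

print(round(t_obt, 2), t_obt > t_crit)
```

Since t_obt exceeds t_crit, we reject the null hypothesis, in agreement with the conclusion above.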