Regression Analysis: Linear and Multiple Regression
Dependent Variable: This is the main factor that you’re trying to understand or predict.
Independent Variables: These are the factors that you hypothesize have an impact on
your dependent variable.
Regression analysis is used in stats to find trends in data. For example, you might
guess that there’s a connection between how much you eat and how much you weigh;
regression analysis can help you quantify that. Regression analysis will provide you with
an equation for a graph so that you can make predictions about your data. For example,
if you’ve been putting on weight over the last few years, it can predict how much you’ll
weigh in ten years' time if you continue to put on weight at the same rate. It will also give
you a slew of statistics (including a p-value and a correlation coefficient) to tell you how
accurate your model is. Most elementary stats courses cover very basic techniques, like
making scatter plots and performing linear regression. However, you may come across
more advanced techniques, such as multiple regression.
In statistics, it’s hard to stare at a set of random numbers in a table and try to make any
sense of it. For example, global warming may be reducing average snowfall in your
town and you are asked to predict how much snow you think will fall this year. Looking
at the following table you might guess somewhere around 10-20 inches. That’s a good
guess, but you could make a better one by using regression.
Essentially, regression is the “best guess” at using a set of data to make some kind of
prediction. It’s fitting a set of points to a graph. There’s a whole host of tools that can run
regression for you, including Excel, which I used here to help make sense of that
snowfall data:
Just by looking at the regression line running down through the data, you can fine-tune
your best guess a bit. You can see that the original guess (20 inches or so) was way off.
For 2015, it looks like the line will be somewhere between 5 and 10 inches! That might
be “good enough”, but regression also gives you a useful equation, which for this chart
is:
y = -2.2923x + 4624.4.
What that means is you can plug in an x value (the year) and get a pretty good estimate
of snowfall for any year. For example, 2005:
y = -2.2923(2005) + 4624.4 = 28.3385 inches, which is pretty close to the actual figure
of 30 inches for that year.
Best of all, you can use the equation to make predictions. For example, how much snow
will fall in 2017?
y = -2.2923(2017) + 4624.4 = 0.8 inches
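If you would rather not do the arithmetic by hand, the same plug-in step can be sketched in a few lines of Python. This is only an illustration: the slope and intercept are copied from the equation y = -2.2923x + 4624.4 quoted above, and the function name predict_snowfall is made up for this example.

```python
# Coefficients copied from the trendline equation quoted in the text.
SLOPE = -2.2923      # change in snowfall (inches) per year
INTERCEPT = 4624.4   # value of y when x = 0


def predict_snowfall(year):
    """Return the snowfall (inches) the regression line gives for a year."""
    return SLOPE * year + INTERCEPT


for year in (2005, 2017):
    print(year, round(predict_snowfall(year), 4))
```

Because the slope is negative, every extra year lowers the predicted snowfall by about 2.3 inches, which is why the 2017 figure comes out so much smaller than the 2005 one.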
Regression also gives you an R squared value, which for this graph is 0.702. This
number tells you how good your model is. The values range from 0 to 1, with 0 being a
terrible model and 1 being a perfect model. As you can probably see, 0.7 is a fairly
decent model, so you can be fairly confident in your weather prediction!
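To see where a number like that comes from, here is a minimal sketch of one common way to compute R squared: one minus the ratio of the leftover (residual) variation to the total variation in the data. The snowfall table is not reproduced here, so the values below are hypothetical and only show the mechanics; the function name r_squared is also just an illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the data around its own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only (not the snowfall data).
actual = [30.0, 25.0, 28.0, 18.0, 15.0]
predicted = [29.0, 26.0, 24.0, 21.0, 16.0]
print(round(r_squared(actual, predicted), 3))
```

Predictions that match the data exactly give 1, and predictions that are no better than always guessing the average give 0.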
Regression analysis is almost always performed in software, such as Excel or SPSS. The output
differs according to how many variables you have but it’s essentially the same type of
output you would find in a simple linear regression. There’s just more of it:
Simple linear regression is a model that assesses the relationship between a dependent
variable and an independent variable. The simple linear model is expressed using the
following equation:
Y = a + bX + ϵ
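Where:
Y: the dependent variable
X: the independent (explanatory) variable
a: the intercept
b: the slope
ϵ: the residual (error)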
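Software finds a and b for you, but the calculation behind a simple linear fit is short enough to sketch. The example below assumes the usual least-squares approach (the slope is the covariance of X and Y divided by the variance of X) and uses hypothetical data; the function name fit_simple_linear is not from any particular package.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_linear(xs, ys)
# The residuals play the role of the error term in the equation.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)
```

The residuals are what is left over after the line has done its best, and for a least-squares fit with an intercept they add up to zero.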
Multiple linear regression analysis is essentially similar to the simple linear model, with
the exception that multiple independent variables are used in the model. The
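mathematical representation of multiple linear regression is:
Y = a + b1X1 + b2X2 + ... + bnXn + ϵ
Where:
Y: the dependent variable
X1, X2, ..., Xn: the independent (explanatory) variables
a: the intercept
b1, b2, ..., bn: the slope for each independent variable
ϵ: the residual (error)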
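As a rough illustration of what the software does with several independent variables, the sketch below fits an intercept and one slope per variable by solving the least-squares normal equations. It is a teaching sketch under stated assumptions, not how Excel or SPSS are implemented: the helper names are hypothetical, and the data is generated from a known rule (Y = 1 + 2*X1 + 3*X2, no noise) so you can check that the fit recovers it.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Build an augmented matrix so row operations act on both sides.
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, from the last unknown to the first.
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_multiple_linear(rows, ys):
    """Return [intercept, slope1, slope2, ...] minimising squared error."""
    # Prepend a constant 1 to each row so the intercept is estimated too.
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) * coefficients = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_linear(rows, ys)
print([round(c, 6) for c in coefficients])
```

With real, noisy data the coefficients will not match any tidy rule exactly, and statistical packages usually use more numerically stable methods than the normal equations.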
Multiple linear regression follows the same conditions as the simple linear model.
However, since there are several independent variables in multiple linear analysis, there
is another mandatory condition for the model: the independent variables must not be
highly correlated with one another (no multicollinearity).
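Assuming the extra condition concerns how strongly the independent variables are related to each other (multicollinearity), one rough first check is to compute the correlation coefficient between each pair of them. The sketch below uses hypothetical data and a hand-written Pearson correlation; it is a quick screen, not a full diagnostic.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical independent variables for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]   # exactly twice x1, so it adds no new information
x3 = [2, 1, 4, 3, 6]    # related to x1, but not a copy of it

print(round(pearson_correlation(x1, x2), 3))
print(round(pearson_correlation(x1, x3), 3))
```

A value close to 1 or -1 between two independent variables suggests they carry nearly the same information, which makes their individual slopes hard to estimate.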
Practice/Drills
Answer the following question:
1. https://www.surveygizmo.com/resources/blog/regression-analysis/
2. http://faculty.cas.usf.edu/mbrannick/regression/Part3/Reg2.html