Regression Analysis, Linear or Nonlinear Regression? That Is The Question. - Minitab
21/11/15, 01:45
Jim Frost (http://blog.minitab.com/blog/adventures-in-statistics), 28 July, 2011
In the process of adding our new Nonlinear Regression analysis to Minitab 16, I had the
opportunity to learn a lot about it.
As you probably noticed, the field of statistics (http://blog.minitab.com/blog/adventures-in-statistics/why-statistics-is-important) is a strange beast. Need more evidence? Linear regression can produce curved lines, and nonlinear regression is not named for its curved lines.
So, when should you use Nonlinear Regression (http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/basics/nonlinear-regression/) over one of our linear methods, such as Regression, Best Subsets, or Stepwise Regression?
Generally speaking, you should try linear regression first. It's easier to use and easier to interpret. However, if you simply aren't able to get a good fit with linear regression, then it might be time to try nonlinear regression.
Let's look at a case where linear regression doesn't work. Often the problem is that, while linear regression can model curves, it might not be able to model the specific curve that exists in your data. The graphs below illustrate this with a linear model that contains a cubed predictor (http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/).
http://blog.minitab.com/blog/adventures-in-statistics/linear-or-nonlinear-regression-that-is-the-question
The fitted line plot shows that the raw data follow a nice, tight function, and the R-squared (http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit) is 98.5%, which looks pretty good.
However, look closer, and you'll see that the regression line systematically over- and under-predicts the data at different points in the curve. When you check the residual plots (http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis) (which you always do, right?), you see patterns in the Residuals versus Fits plot, rather than the randomness that you want to see. This indicates a bad fit, but it's the best that linear regression can do.
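To see how a high R-squared can coexist with a systematically bad fit, here is a minimal Python sketch using NumPy. It does not use the Mobility data: it assumes hypothetical, noise-free data generated from the saturating curve y = 10x / (0.5 + x), fits a linear model with a cubed predictor, and then counts runs of same-signed residuals (ordered by x) as a rough numerical stand-in for eyeballing the Residuals versus Fits plot. Randomly scattered residuals tend to produce many short runs; a handful of long runs points to systematic over- and under-prediction.

```python
import numpy as np

# Hypothetical, noise-free data from a saturating curve (NOT the Mobility data).
x = np.linspace(0.05, 5.0, 50)
y = 10.0 * x / (0.5 + x)

# A cubic polynomial is still linear in its parameters, so ordinary
# least squares fits it directly.
coefs = np.polyfit(x, y, deg=3)
fitted = np.polyval(coefs, x)
residuals = y - fitted

ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Count runs of same-signed residuals along x. Random scatter around zero
# gives roughly n/2 runs; a few long runs indicate systematic misfit.
signs = np.sign(residuals)
runs = 1 + int(np.sum(signs[1:] != signs[:-1]))

print(f"R-squared: {r_squared:.3f}")
print(f"Runs of residual signs: {runs} (out of {len(x)} points)")
```

The runs count is only a crude summary; the residual plots remain the primary diagnostic.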
Let's try it again, but using nonlinear regression. It's important to note that because nonlinear regression allows a nearly infinite number of possible functions, it can be more difficult to set up. In this case, it required considerable effort to determine the function that provided the optimal fit for the specific curve present in these data, but since my main point is to explain when you want to use nonlinear regression instead of linear, we don't need to relate all of those details here. (Just like on a cooking show, on the blog we have the ability to jump from the raw ingredients to a great outcome in the graphs below without showing all of the work in between!)
The fitted line plot shows that the regression line follows the data almost exactly; there are no systematic deviations. It's impossible to calculate R-squared for nonlinear regression (http://blog.minitab.com/blog/adventures-in-statistics/why-is-there-no-r-squared-for-nonlinear-regression), but the S value (http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-s-the-standard-error-of-the-regression) (roughly speaking, the average absolute distance from the data points to the regression line) improves from 72.4 (linear) to just 13.7 for nonlinear regression. You want a lower S value because it means the data points are closer to the fit line. What's more, the Residuals versus Fits plot shows the randomness that you want to see. It's a good fit!
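S is computed from the residuals as the square root of the sum of squared residuals divided by the error degrees of freedom (n minus the number of estimated parameters). Because it squares the residuals and divides by n - p, it is closer to a root-mean-square distance than a true average absolute distance, which is why the description above says "roughly speaking." A minimal Python sketch; the helper name and the residual values are hypothetical:

```python
import math


def standard_error_of_regression(residuals, n_params):
    """Return S = sqrt(SSE / (n - p)) for a fitted model.

    residuals: observed minus fitted values.
    n_params:  number of estimated parameters p (including any constant).
    """
    n = len(residuals)
    if n <= n_params:
        raise ValueError("S needs more observations than estimated parameters")
    sse = sum(r * r for r in residuals)  # sum of squared errors
    return math.sqrt(sse / (n - n_params))


# Hypothetical residuals: every point is exactly 1 unit from the fit line,
# yet S exceeds 1 because SSE is divided by n - p rather than n.
example = [1.0, -1.0, 1.0, -1.0]
s = standard_error_of_regression(example, n_params=2)
mean_abs = sum(abs(r) for r in example) / len(example)
print(f"S = {s:.4f}, mean absolute residual = {mean_abs:.4f}")
# prints: S = 1.4142, mean absolute residual = 1.0000
```

Because S is expressed in the units of the response, a drop like the one from 72.4 to 13.7 can be read directly on the scale of the data.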
Nonlinear regression can be a powerful alternative to linear regression, but there are a few drawbacks. In addition to the aforementioned difficulty in setting up the analysis and the lack of R-squared, be aware that:
- The effect each predictor (http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/) has on the response (http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/regression-and-correlation/regression-models/what-are-response-and-predictor-variables/) can be less intuitive to understand.
- P-values are impossible to calculate (http://blog.minitab.com/blog/adventures-in-statistics/why-are-there-no-p-values-for-the-variables-in-nonlinear-regression) for the predictors.
- Confidence intervals (http://blog.minitab.com/blog/adventures-in-statistics/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals) may or may not be calculable.
If you're using Minitab 17, you can play with these data yourself: go to File -> Open Worksheet, click the Look in Minitab Sample Data folder icon, and choose Mobility.MTW. These are the same data that I've used in the Nonlinear Regression Help example in Minitab 17, which contains a fuller interpretation of the Nonlinear Regression output.
If you'd like to try it, you can download the free 30-day trial of Minitab 17 Statistical Software (http://it.minitab.com/en-us/products/minitab/free-trial.aspx). If you're learning about regression, read my regression tutorial (http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples)!
Comments
Name: Nabil Darwazeh Monday, February 17, 2014
Why is it impossible to calculate R-squared for nonlinear regression, while Excel does calculate R-squared?
If both types of R-squared are low, it's not necessarily bad if you have significant predictors and your residual plots are
good. However, it depends on what you want to do with your model.
Read this blog post for more details about this scenario:
http://blog.minitab.com/blog/adventures-in-statistics/how-high-should-r-squared-be-in-regression-analysis
Jim
Y = Constant + b1 * X1 + b2 * X1 squared
It's still linear in the parameters even though the predictor variable has been squared.
Here's an example of a nonlinear function, the Michaelis-Menten equation. There are 2 parameters (thetas) and one predictor (X1). Very different from the linear form!
y = theta1 * X1 / ( theta2 + X1 )
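The practical consequence of this distinction shows up in how the parameters are estimated. In the minimal NumPy sketch below, the quadratic model is fit with a single least-squares solve because it is linear in b0, b1, and b2. The Michaelis-Menten model is nonlinear in theta2, so it is fit iteratively with a plain Gauss-Newton loop that needs starting values. This is a textbook illustration under stated assumptions (hypothetical noise-free data, hypothetical function names, no step-size control or other safeguards), not the algorithm Minitab uses.

```python
import numpy as np


def fit_quadratic(x, y):
    """Y = b0 + b1*X1 + b2*X1^2 is linear in b0, b1, b2: one least-squares solve."""
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def michaelis_menten(x, theta1, theta2):
    return theta1 * x / (theta2 + x)


def fit_michaelis_menten(x, y, start, max_iter=50, tol=1e-12):
    """Plain Gauss-Newton: linearize around the current thetas, solve, repeat."""
    theta = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        t1, t2 = theta
        residuals = y - michaelis_menten(x, t1, t2)
        # Jacobian: partial derivatives of the model w.r.t. theta1 and theta2.
        jacobian = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])
        step, *_ = np.linalg.lstsq(jacobian, residuals, rcond=None)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


# Hypothetical noise-free data so the true parameters are known.
x = np.linspace(0.1, 5.0, 30)
b = fit_quadratic(x, 1.0 + 2.0 * x + 3.0 * x ** 2)
theta = fit_michaelis_menten(x, michaelis_menten(x, 10.0, 0.5), start=[9.0, 0.8])
print("quadratic coefficients:", b)
print("Michaelis-Menten thetas:", theta)
```

Starting values far from the truth can make plain Gauss-Newton wander or diverge, which is one reason nonlinear regression takes more effort to set up.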
For the mobility example, the 2 equations are:
Linear:
Mobility = 1243 + 412.3 * Density Ln - 94.29 * Density Ln^2 - 32.90 * Density Ln^3
The predictor for this model is the natural log of density and it is also in the model in its squared and cubed forms.
Despite being a natural log and having the higher-order terms, it's still a linear model because it fits the linear functional
form of 1 parameter * 1 predictor for each term and the terms are additive.
Nonlinear:
Mobility = (1288.14 + 1491.08 * Density Ln + 583.238 * Density Ln^2 + 75.4167 * Density Ln^3) / (1 + 0.966295 * Density Ln + 0.397973 * Density Ln^2 + 0.0497273 * Density Ln^3)
Basically, it's one polynomial equation divided by another, which produces a curve that can't be fit by a linear function.
Unfortunately, the graph chopped off the denominator!
I hope this helps!
Jim
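To compare the two fitted Mobility equations numerically, they can be transcribed directly into code. A minimal Python sketch, assuming (as stated in the reply) that "Density Ln" is the natural log of Density and that Density is positive. The function names are hypothetical, the coefficients are copied from the equations as printed (so they carry that rounding), and the densities in the loop are arbitrary illustrative values that may fall outside the range of the Mobility data.

```python
import math


def mobility_linear(density):
    """Linear model: cubic in ln(density), but linear in its 4 coefficients."""
    d = math.log(density)  # "Density Ln"
    return 1243 + 412.3 * d - 94.29 * d ** 2 - 32.90 * d ** 3


def mobility_nonlinear(density):
    """Nonlinear model: one cubic in ln(density) divided by another."""
    d = math.log(density)
    numerator = 1288.14 + 1491.08 * d + 583.238 * d ** 2 + 75.4167 * d ** 3
    denominator = 1 + 0.966295 * d + 0.397973 * d ** 2 + 0.0497273 * d ** 3
    return numerator / denominator


for density in (0.5, 1.0, 2.0, 5.0):
    print(f"Density {density:>4}: linear {mobility_linear(density):8.1f}, "
          f"nonlinear {mobility_nonlinear(density):8.1f}")
```

Note that the denominator's leading constant is fixed at 1; only the other three denominator coefficients are estimated.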
If you have more than one predictor, do the same as above with regular Regression, but compare the residual plots. Look to see whether the model without the polynomial term has a non-random pattern in it. If adding the polynomial removes the pattern, it's generally good to use the polynomial. However, you have to be sure that you're not overfitting the model. See below.
You can check the adjusted R-squared, and especially the predicted R-squared, to be sure that you're not including too many terms. Including too many terms can improve the apparent fit, but the model is then fitting the random error in the data rather than the true relationships. I describe how to assess this here:
http://blog.minitab.com/blog/adventures-in-statistics/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables
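Both statistics can be computed from an ordinary least-squares fit. The minimal NumPy sketch below uses the usual textbook definitions, assumed here to match Minitab's: adjusted R-squared divides each sum of squares by its degrees of freedom, and predicted R-squared is 1 - PRESS / SS Total, where PRESS sums the squared leave-one-out prediction errors. The shortcut e_i / (1 - h_ii), with h_ii the leverage from the hat matrix, gives those errors without refitting the model n times. The data and the function name are hypothetical.

```python
import numpy as np


def r_squared_stats(X, y):
    """Return (R-sq, adjusted R-sq, predicted R-sq) for OLS of y on X.

    X is the design matrix and must already contain a column of ones.
    """
    n, p = X.shape
    q, _ = np.linalg.qr(X)               # thin QR; hat matrix H = Q Q^T
    leverage = np.sum(q ** 2, axis=1)    # diagonal of H
    residuals = y - q @ (q.T @ y)        # y minus fitted values H y
    ss_res = residuals @ residuals
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))
    press = np.sum((residuals / (1 - leverage)) ** 2)  # leave-one-out errors
    pred_r2 = 1 - press / ss_tot
    return r2, adj_r2, pred_r2


# Hypothetical data whose true relationship is a straight line.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=x.size)

for degree in (1, 6):
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    r2, adj, pred = r_squared_stats(X, y)
    print(f"degree {degree}: R-sq {r2:.3f}, adj {adj:.3f}, pred {pred:.3f}")
```

R-squared can only stay the same or rise as terms are added; typically, predicted R-squared stalls or drops instead when the extra terms are fitting noise, and that gap is the overfitting warning sign described above.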
If the residual plots look good and you're not overfitting the model, you can then assess S, the standard error of the
regression. This tells you how wrong the model is on average. Smaller values indicate a better fit. To read about this
statistic, click the link in this post for "S value".
Finally, if you just want some examples of how to compare how well different curvilinear models fit a dataset, including comparing a nonlinear model to linear models, read this post:
http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression
I hope this helps! Please don't hesitate to write again if you have further questions!
Jim
a year ago
Please clarify how a dummy variable like gender could be addressed in linear regression?
Hi, I'm not sure what you mean by "manipulated" in this context. However, to interpret a dummy variable for gender:
If the p-value for Gender is significant, you can conclude that there is a statistically significant difference between males and females after controlling for all of the other variables in your model. But watch out for confounding variables that are not in your model.
If you use binary coding (the default for regression in Minitab 17), the coefficient represents the mean difference between males and females.
Jim
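With binary (0/1) coding, the indicator's coefficient is the difference in mean response between the group coded 1 and the group coded 0. In the minimal NumPy sketch below, Gender is the only predictor, so the coefficient equals the raw difference in group means; with other predictors in the model, it becomes the difference after adjusting for them. The data and the choice of which group is coded 1 are hypothetical.

```python
import numpy as np

# Hypothetical responses; female = 1 for females, 0 for males (the reference group).
y = np.array([5.0, 6.0, 7.0, 9.0, 10.0, 11.0])
female = np.array([0, 0, 0, 1, 1, 1])

X = np.column_stack([np.ones_like(y), female])  # constant + dummy variable
(constant, gender_coef), *_ = np.linalg.lstsq(X, y, rcond=None)

print("constant (male mean):", constant)
print("Gender coefficient:", gender_coef)
print("difference in means:", y[female == 1].mean() - y[female == 0].mean())
```

Swapping the coding (males = 1) flips the sign of the coefficient but not its magnitude.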
QuantizedDude00
8 months ago
Hi, I came across your blog while searching for methods one can use to test whether a given set of data (dependent vs. independent variables) is monotone and nonlinear. I generated a set of data using the random function in MATLAB and wish to perform some regression analysis. Is there any method by which one can test whether the data are nonlinear? I am baffled given the fact that y=theta1*exp(theta2*xVal) + theta3*exp(theta4*xVal2)+theta4 is not monotone or nonlinear, where all the variables are generated randomly between -1 and 1?
Hi,
Have you graphed your data? That's a great starting point! You might be able to answer your
questions using the graph.
Also, I've written two blog posts that should answer your questions.
The first is how to tell the difference between linear and nonlinear regression equations.
The second is using linear and nonlinear regression models to fit a curve.
Good luck!
Jim
7 months ago
Thanks a lot, Jim. But with the graphs, how can we know? Would you describe the relationship between cosine/sine and the angle over [0, 2*pi] as a nonlinear function?
Thanks
Definitely nonlinear!
SHUBHAM GUPTA
5 months ago
I am doing quality analysis, but I am unable to decide what type of relationship exists between my variables. How do I know whether to use a linear or a nonlinear relationship?
Hi Shubham,
There are several things you should do. First, look at related studies to see what they've
determined. This is always a great place to start. You may not need to reinvent the wheel!
Also, be sure to graph the data. If you have continuous variables, you can use a scatterplot. Or,
use a matrix plot to graph multiple pairs of variables at the same time. This will help you
determine whether your data contain curvature.
Using the terms linear and nonlinear in the regression context can be tricky because these terms don't describe your data but rather the type of model that you use. And a linear model can fit curved data. For more information, be sure to read my blog post about the difference between linear and nonlinear models.
You may have to try fitting different models to see which one fits the best. I'd start with linear models (unless your research suggests otherwise) because they're easier to use and they give you more model summary statistics. However, if a linear model can't adequately fit your data, you'll need to try a nonlinear model. I've written a blog post about curve fitting using both linear and nonlinear models and how to compare the results. This post will be very helpful to read and will help you determine which type of model you need.
Best of luck with your study!
Jim
Copyright 2015 Minitab Inc. All rights reserved.