1-Linear Regression
RMSE (Root Mean Squared Error): An RMSE of 24.5$ means we can safely say that, on average, the predicted and the actual price are off by at most 24.5$, since RMSE is an upper bound of the average prediction error (RMSE ≥ MAE).
MAE (Mean Absolute Error): Let's say we have a regressor for predicting house prices with an MAE of 20.5$. Based on MAE, we can certainly say that the average difference between the predicted and the actual price is 20.5$.
R-squared of the model tells us the proportion of the variance in the response variable that can be
explained by the predictor variable(s) in the model.
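To make these three metrics concrete, here is a minimal sketch using scikit-learn's metric functions; the prices below are made-up numbers purely for illustration.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([200.0, 150.0, 320.0, 275.0])  # actual prices (illustrative)
y_pred = np.array([210.0, 140.0, 300.0, 290.0])  # predicted prices (illustrative)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root of the mean squared error
mae = mean_absolute_error(y_true, y_pred)           # mean of the absolute errors
r2 = r2_score(y_true, y_pred)                       # proportion of variance explained

print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}, R^2: {r2:.3f}")
# RMSE >= MAE always holds, which is why RMSE reads as the more
# pessimistic (upper-bound-like) estimate of the typical error.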
Overfitting and Underfitting
Overfitting in Machine Learning
Overfitting refers to a model that models the training data too well.
Overfitting happens when a model learns the detail and noise in the training data to the extent
that it negatively impacts the performance of the model on new data. This means that the noise
or random fluctuations in the training data are picked up and learned as concepts by the model.
The problem is that these concepts do not apply to new data and negatively impact the model's
ability to generalize.
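As a small illustration of noise being learned as concepts, the following sketch fits a very flexible model to synthetic noisy data; the degree-15 polynomial is an arbitrary choice made for this example.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A highly flexible model chases the noise in the training points.
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)
print("train R^2:", model.score(X_train, y_train))  # close to 1.0
print("test  R^2:", model.score(X_test, y_test))    # much lower, possibly negative

The gap between the near-perfect training score and the poor test score is exactly the failure to generalize described above.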
Overfitting is more likely with nonparametric and nonlinear models that have more flexibility
when learning a target function. As such, many nonparametric machine learning algorithms also
include parameters or techniques to limit and constrain how much detail the model learns.
For example, the decision tree is a nonparametric machine learning algorithm that is very
flexible and prone to overfitting the training data. This problem can be addressed by pruning
the tree after it has been trained, in order to remove some of the detail it has picked up.
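A sketch of that remedy, assuming scikit-learn's DecisionTreeRegressor and its built-in cost-complexity pruning; the data and the ccp_alpha value are illustrative choices, not prescriptions.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # signal plus noise
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unpruned tree: grows until it memorizes the training data.
full = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# Pruned tree: cost-complexity pruning trims detail after the tree is grown.
pruned = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("unpruned", full), ("pruned", pruned)]:
    print(name,
          "train R^2:", round(model.score(X_train, y_train), 3),
          "test R^2:", round(model.score(X_test, y_test), 3))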
Underfitting in Machine Learning
Underfitting refers to a model that can neither model the training data nor generalize to new
data.
An underfit machine learning model is not a suitable model, and this will be obvious from its
poor performance on the training data.
Underfitting is often not discussed, as it is easy to detect given a good performance metric. The
remedy is to move on and try alternative machine learning algorithms. Nevertheless, it provides
a good contrast to the problem of overfitting.
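For contrast with the overfitting sketch above, here is a companion sketch of how underfitting shows up: a plain linear model fit to clearly nonlinear synthetic data scores poorly even on its own training data.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)  # quadratic target

model = LinearRegression().fit(X, y)
# A low training R^2 is the telltale sign of underfitting described above.
print("train R^2:", round(model.score(X, y), 3))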