1.1. Linear Models
The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the features. In mathematical notation, if ŷ is the predicted value,

ŷ(w, x) = w_0 + w_1 x_1 + ... + w_p x_p
1.1.1. Ordinary Least Squares

LinearRegression will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member:
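A minimal sketch of this pattern on a two-feature toy dataset (the example on the rendered page may differ):

>>> from sklearn import linear_model
>>> reg = linear_model.LinearRegression()
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
LinearRegression()
>>> reg.coef_
array([0.5, 0.5])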
The coefficient estimates for Ordinary Least Squares rely on the independence of the
features. When features are correlated and the columns of the design matrix X have an
approximately linear dependence, the design matrix becomes close to singular and as a
result, the least-squares estimate becomes highly sensitive to random errors in the
observed target, producing a large variance. This situation of multicollinearity can arise,
for example, when data are collected without an experimental design.
Examples
1.1.1.1. Non-Negative Least Squares
It is possible to constrain all the coefficients to be non-negative, which may be useful
when they represent some physical or naturally non-negative quantities (e.g.,
frequency counts or prices of goods). LinearRegression accepts a boolean
positive parameter: when set to True, Non-Negative Least Squares are applied.
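A hedged sketch of the usage (the toy data here is purely illustrative):

>>> from sklearn.linear_model import LinearRegression
>>> reg = LinearRegression(positive=True)
>>> reg.fit([[1, 0], [0, 1], [1, 1]], [1, 2, 3])
LinearRegression(positive=True)
>>> reg.coef_  # every coefficient is constrained to be non-negative
array([1., 2.])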
Examples
1.1.2. Ridge regression and classification

1.1.2.1. Regression
Ridge regression addresses some of the problems of Ordinary Least Squares by
imposing a penalty on the size of the coefficients. The ridge coefficients minimize a
penalized residual sum of squares:
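min_w ||Xw − y||_2^2 + α ||w||_2^2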
The complexity parameter α ≥ 0 controls the amount of shrinkage: the larger the value
of α , the greater the amount of shrinkage and thus the coefficients become more
robust to collinearity.
As with other linear models, Ridge will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member:
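A short sketch mirroring the usual toy example (the exact numbers depend on the data):

>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=.5)
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5)
>>> reg.coef_
array([0.34545455, 0.34545455])
>>> reg.intercept_
0.13636...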
Note that the class Ridge allows for the user to specify that the solver be
automatically chosen by setting solver="auto" . When this option is specified, Ridge
will choose between the "lbfgs" , "cholesky" , and "sparse_cg" solvers. Ridge will
begin checking the conditions shown in the following table from top to bottom. If the
condition is true, the corresponding solver is chosen.
Solver        Condition
‘lbfgs’       The positive=True option is specified.
‘cholesky’    The input array X is not sparse.
‘sparse_cg’   None of the above conditions are fulfilled.
1.1.2.2. Classification
The Ridge regressor has a classifier variant: RidgeClassifier . This classifier first
converts binary targets to {-1, 1} and then treats the problem as a regression task,
optimizing the same objective as above. The predicted class corresponds to the sign of
the regressor’s prediction. For multiclass classification, the problem is treated as multi-
output regression, and the predicted class corresponds to the output with the highest
value.
It might seem questionable to use a (penalized) Least Squares loss to fit a classification
model instead of the more traditional logistic or hinge losses. However, in practice, all
those models can lead to similar cross-validation scores in terms of accuracy or
precision/recall, while the penalized least squares loss used by the RidgeClassifier
allows for a very different choice of the numerical solvers with distinct computational
performance profiles.
Examples
Plot Ridge coefficients as a function of the regularization
Classification of text documents using sparse features
Common pitfalls in the interpretation of coefficients of linear models
Usage example:
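A sketch using RidgeCV, which performs ridge regression with built-in cross-validation of the alpha parameter (the selected alpha_ depends on the data; the value shown is what this toy example typically yields):

>>> import numpy as np
>>> from sklearn import linear_model
>>> reg = linear_model.RidgeCV(alphas=np.logspace(-6, 6, 13))
>>> reg = reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
>>> reg.alpha_  # the regularization strength selected by leave-one-out CV
0.01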
Specifying the value of the cv attribute will trigger the use of cross-validation with
GridSearchCV , for example cv=10 for 10-fold cross-validation, rather than Leave-
One-Out Cross-Validation.
References
1.1.3. Lasso
The Lasso is a linear model that estimates sparse coefficients. It is useful in some
contexts due to its tendency to prefer solutions with fewer non-zero coefficients,
effectively reducing the number of features upon which the given solution is
dependent. For this reason, Lasso and its variants are fundamental to the field of
compressed sensing. Under certain conditions, it can recover the exact set of non-zero
coefficients (see Compressive sensing: tomography reconstruction with L1 prior
(Lasso)).
min_w (1 / (2 n_samples)) ||Xw − y||_2^2 + α ||w||_1

The lasso estimate thus solves the minimization of the least-squares penalty with α ||w||_1 added, where α is a constant and ||w||_1 is the ℓ1-norm of the coefficient vector.
The implementation in the class Lasso uses coordinate descent as the algorithm to fit
the coefficients. See Least Angle Regression for another implementation:
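A minimal sketch of the usual pattern:

>>> from sklearn import linear_model
>>> reg = linear_model.Lasso(alpha=0.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
Lasso(alpha=0.1)
>>> reg.predict([[1, 1]])
array([0.8])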
The function lasso_path is useful for lower-level tasks, as it computes the
coefficients along the full path of possible values.
Examples
Note
As the Lasso regression yields sparse models, it can thus be used to perform
feature selection, as detailed in L1-based feature selection.
References
For high-dimensional datasets with many collinear features, LassoCV is most often
preferable. However, LassoLarsCV has the advantage of exploring more relevant
values of the alpha parameter, and if the number of samples is very small compared to the number of features, it is often faster than LassoCV .
Indeed, these criteria are computed on the in-sample training set. In short, they penalize the over-optimistic scores of the different Lasso models by their flexibility (cf. the "Mathematical details" section below).

However, such criteria need a proper estimation of the degrees of freedom of the solution, are derived for large samples (asymptotic results) and assume the correct model is among the candidates under investigation. They also tend to break when the problem is badly conditioned (e.g. more features than samples).
Examples
Mathematical details
1.1.4. Multi-task Lasso
The MultiTaskLasso is a linear model that estimates sparse coefficients for multiple
regression problems jointly: y is a 2D array, of shape (n_samples, n_tasks) . The
constraint is that the selected features are the same for all the regression problems,
also called tasks.
The following figure compares the location of the non-zero entries in the coefficient
matrix W obtained with a simple Lasso or a MultiTaskLasso. The Lasso estimates yield
scattered non-zeros while the non-zeros of the MultiTaskLasso are full columns.
Fitting a time-series model, imposing that any active feature be active at all times.
Examples
Mathematical details
1.1.5. Elastic-Net
ElasticNet is a linear regression model trained with both ℓ1 and ℓ2 -norm
regularization of the coefficients. This combination allows for learning a sparse model
where few of the weights are non-zero like Lasso , while still maintaining the
regularization properties of Ridge . We control the convex combination of ℓ1 and ℓ2
using the l1_ratio parameter.
Elastic-net is useful when there are multiple features that are correlated with one
another. Lasso is likely to pick one of these at random, while elastic-net is likely to pick
both.
A practical advantage of trading-off between Lasso and Ridge is that it allows Elastic-
Net to inherit some of Ridge’s stability under rotation.
The objective function to minimize is in this case:

min_w (1 / (2 n_samples)) ||Xw − y||_2^2 + α ρ ||w||_1 + (α (1 − ρ) / 2) ||w||_2^2
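A brief sketch of fitting an elastic net with a chosen l1_ratio (toy data for illustration):

>>> from sklearn.linear_model import ElasticNet
>>> reg = ElasticNet(alpha=0.1, l1_ratio=0.7)
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
ElasticNet(alpha=0.1, l1_ratio=0.7)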
The class ElasticNetCV can be used to set the parameters alpha (α ) and l1_ratio
(ρ) by cross-validation.
Examples
References
Mathematically, MultiTaskElasticNet is a linear model trained with a mixed ℓ1ℓ2-norm and ℓ2-norm for regularization. The objective function to minimize is:
min_W (1 / (2 n_samples)) ||XW − Y||_Fro^2 + α ρ ||W||_21 + (α (1 − ρ) / 2) ||W||_Fro^2
The class MultiTaskElasticNetCV can be used to set the parameters alpha (α ) and
l1_ratio (ρ) by cross-validation.
If two features are almost equally correlated with the target, then their coefficients should increase at approximately the same rate. The algorithm thus behaves as intuition would expect, and is also more stable.
It is easily modified to produce solutions for other estimators, like the Lasso.
Because LARS is based upon an iterative refitting of the residuals, it would appear
to be especially sensitive to the effects of noise. This problem is discussed in detail
by Weisberg in the discussion section of the Efron et al. (2004) Annals of Statistics
article.
The LARS model can be used via the estimator Lars , or its low-level implementation
lars_path or lars_path_gram .
>>> from sklearn import linear_model
>>> reg = linear_model.LassoLars(alpha=.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
LassoLars(alpha=0.1)
>>> reg.coef_
array([0.6..., 0. ])
Examples
The Lars algorithm provides the full path of the coefficients along the regularization parameter almost for free; thus a common operation is to retrieve the path with one of the functions lars_path or lars_path_gram .
Mathematical formulation
Being a forward feature selection method like Least Angle Regression, orthogonal
matching pursuit can approximate the optimum solution vector with a fixed number of
non-zero elements:
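arg min_w ||y − Xw||_2^2   subject to   ||w||_0 ≤ n_nonzero_coefs

Here ||w||_0 denotes the number of non-zero entries of w, and n_nonzero_coefs is the corresponding OrthogonalMatchingPursuit parameter.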
Alternatively, orthogonal matching pursuit can target a specific error instead of a
specific number of non-zero coefficients. This can be expressed as:
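arg min_w ||w||_0   subject to   ||y − Xw||_2^2 ≤ tol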
OMP is based on a greedy algorithm that includes at each step the atom most highly
correlated with the current residual. It is similar to the simpler matching pursuit (MP)
method, but better in that at each iteration, the residual is recomputed using an
orthogonal projection on the space of the previously chosen dictionary elements.
Examples
References
This can be done by introducing uninformative priors over the hyperparameters of the model. The ℓ2 regularization used in Ridge regression and classification is equivalent to finding a maximum a posteriori estimation under a Gaussian prior over the coefficients w with precision λ^{-1}. Instead of setting lambda manually, it is possible to treat it as a random variable to be estimated from the data.
p(y|X, w, α) = N(y|Xw, α^{-1})
where α is again treated as a random variable that is to be estimated from the data.
References
1.1.10.1. Bayesian Ridge Regression

BayesianRidge estimates a probabilistic model of the regression problem as described above. The prior for the coefficient w is given by a spherical Gaussian:

p(w|λ) = N(w|0, λ^{-1} I_p)
The priors over α and λ are chosen to be gamma distributions, the conjugate prior for
the precision of the Gaussian. The resulting model is called Bayesian Ridge Regression,
and is similar to the classical Ridge .
The parameters w, α and λ are estimated jointly during the fit of the model, the
regularization parameters α and λ being estimated by maximizing the log marginal
likelihood. The scikit-learn implementation is based on the algorithm described in
Appendix A of (Tipping, 2001) where the update of the parameters α and λ is done as
suggested in (MacKay, 1992). The initial value of the maximization procedure can be
set with the hyperparameters alpha_init and lambda_init .
After being fitted, the model can then be used to predict new values:
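A sketch of the full round trip, assuming the usual toy fit on four points along the diagonal (the predicted value is approximate):

>>> from sklearn import linear_model
>>> X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
>>> Y = [0., 1., 2., 3.]
>>> reg = linear_model.BayesianRidge()
>>> reg.fit(X, Y)
BayesianRidge()
>>> reg.predict([[1, 0.]])
array([0.50000013])

The coefficients w of the model can be accessed: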
>>> reg.coef_
array([0.49999993, 0.49999993])
Due to the Bayesian framework, the weights found are slightly different from the ones found by Ordinary Least Squares. However, Bayesian Ridge Regression is more robust to ill-posed problems.
Examples
References
1.1.10.2. Automatic Relevance Determination (ARD)

p(w|λ) = N(w|0, A^{-1})

In contrast to the Bayesian Ridge Regression, each coordinate of w_i has its own standard deviation 1/λ_i. The prior over all λ_i is chosen to be the same gamma distribution given by the hyperparameters λ_1 and λ_2.
ARD is also known in the literature as Sparse Bayesian Learning and Relevance Vector
Machine [3] [4]. For a worked-out comparison between ARD and Bayesian Ridge
Regression, see the example below.
Examples
References
[1] Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 7.2.1
[2] David Wipf and Srikantan Nagarajan: A New View of Automatic Relevance
Determination
[3] Michael E. Tipping: Sparse Bayesian Learning and the Relevance Vector Machine
1.1.11. Logistic regression

This implementation can fit binary, One-vs-Rest, or multinomial logistic regression with optional ℓ1, ℓ2 or Elastic-Net regularization.
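For instance, a hedged sketch of fitting an ℓ1-penalized model (the iris data and the liblinear solver are illustrative choices):

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])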
Note
Regularization
Note
Examples
p̂(X_i) = expit(X_i w + w_0) = 1 / (1 + exp(−X_i w − w_0)).

As an optimization problem, binary class logistic regression with regularization term r(w) minimizes the following cost function:

min_w (1/S) Σ_{i=1}^{n} s_i ( −y_i log(p̂(X_i)) − (1 − y_i) log(1 − p̂(X_i)) ) + r(w) / (S C),   (1)

where s_i corresponds to the weights assigned by the user to a specific training sample (the vector s is formed by element-wise multiplication of the class weights and sample weights), and the sum S = Σ_{i=1}^{n} s_i.
We currently provide four choices for the regularization term r(w) via the penalty argument:

penalty      r(w)
None         0
ℓ1           ||w||_1
ℓ2           (1/2) ||w||_2^2 = (1/2) w^T w
ElasticNet   ((1 − ρ)/2) w^T w + ρ ||w||_1

For ElasticNet, ρ (which corresponds to the l1_ratio parameter) controls the strength of ℓ1 regularization vs. ℓ2 regularization.
Note that the scale of the class weights and the sample weights will influence the
optimization problem. For instance, multiplying the sample weights by a constant b > 0
is equivalent to multiplying the (inverse) regularization strength C by b.
Note
Mathematical details
1.1.11.3. Solvers
The solvers implemented in the class LogisticRegression are “lbfgs”, “liblinear”,
“newton-cg”, “newton-cholesky”, “sag” and “saga”:
The following table summarizes the penalties and multinomial multiclass supported by
each solver:
[Table: supported penalties, multiclass support, and behaviors of each solver.]
The “lbfgs” solver is used by default for its robustness. For large datasets the “saga” solver is usually faster. For large datasets, you may also consider using SGDClassifier with loss="log_loss" , which might be even faster but requires more tuning.
1.1.11.3.1. Differences between solvers

There might be a difference in the scores obtained between LogisticRegression with solver=liblinear or LinearSVC and the external liblinear library directly, when fit_intercept=False and the fitted coef_ (or the data to be predicted) are zeroes. This is because for the sample(s) with a decision_function of zero, LogisticRegression and LinearSVC predict the negative class, while liblinear predicts the positive class. Note that a model with fit_intercept=False and many samples with a decision_function of zero is likely to be an underfit, bad model, and you are advised to set fit_intercept=True and increase the intercept_scaling .
Solvers’ details
Note
A logistic regression with ℓ1 penalty yields sparse models, and can thus be
used to perform feature selection, as detailed in L1-based feature selection.
Note
P-value estimation
The “newton-cg”, “sag”, “saga” and “lbfgs” solvers of LogisticRegressionCV are found to be faster for high-dimensional dense data, due to warm-starting (see Glossary).

1.1.12. Generalized Linear Models

Generalized Linear Models (GLM) extend linear models in two ways [10]. First, the predicted values ŷ are linked to a linear combination of the input variables X via an inverse link function h as ŷ(w, X) = h(Xw).
Secondly, the squared loss function is replaced by the unit deviance d of a distribution
in the exponential family (or more precisely, a reproductive exponential dispersion
model (EDM) [11]).
min_w (1 / (2 n_samples)) Σ_i d(y_i, ŷ_i) + (α / 2) ||w||_2^2,
where α is the L2 regularization penalty. When sample weights are provided, the
average becomes a weighted average.
The following table lists some specific EDMs and their unit deviance:

Distribution   Target Domain        Unit Deviance d(y, ŷ)
Normal         y ∈ (−∞, ∞)          (y − ŷ)^2
Categorical    y ∈ {0, 1, ..., k}   2 Σ_{i∈{0,1,...,k}} I(y = i) y_i log(I(y = i) / ŷ_i)
Gamma          y ∈ (0, ∞)           2 (log(ŷ/y) + y/ŷ − 1)
The Probability Density Functions (PDF) of these distributions are illustrated in the
following figure,
The choice of the distribution depends on the problem at hand:
References
[10] McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second
Edition. Boca Raton: Chapman and Hall/CRC. ISBN 0-412-31760-5.
[11] Jørgensen, B. (1992). The theory of exponential dispersion models and analysis of
deviance. Monografias de matemática, no. 51. See also Exponential dispersion
model.
1.1.12.1. Usage
TweedieRegressor implements a generalized linear model for the Tweedie distribution, which allows modeling any of the above-mentioned distributions using the appropriate power parameter. In particular:
power = 0 : Normal distribution. Specific estimators such as Ridge , ElasticNet
are generally more appropriate in this case.
power = 1 : Poisson distribution. PoissonRegressor is exposed for convenience.
However, it is strictly equivalent to TweedieRegressor(power=1, link='log') .
power = 2 : Gamma distribution. GammaRegressor is exposed for convenience.
However, it is strictly equivalent to TweedieRegressor(power=2, link='log') .
power = 3 : Inverse Gaussian distribution.
Usage example:
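A minimal sketch of a Poisson GLM selected via the Tweedie power parameter (toy data; the fitted coefficients are omitted here):

>>> from sklearn.linear_model import TweedieRegressor
>>> reg = TweedieRegressor(power=1, alpha=0.5, link='log')
>>> reg.fit([[0, 0], [0, 1], [2, 2]], [0.1, 0.2, 2])
TweedieRegressor(alpha=0.5, link='log', power=1)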
Examples
Practical considerations
Stochastic gradient descent is a simple yet very efficient approach to fit linear models.
It is particularly useful when the number of samples (and the number of features) is
very large. The partial_fit method allows online/out-of-core learning.
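A hedged sketch of the out-of-core pattern with SGDRegressor and partial_fit (synthetic mini-batches, illustrative hyperparameters):

>>> import numpy as np
>>> from sklearn.linear_model import SGDRegressor
>>> rng = np.random.RandomState(0)
>>> reg = SGDRegressor()
>>> for _ in range(20):  # stream the data in mini-batches
...     X_batch = rng.rand(32, 3)
...     y_batch = X_batch @ np.array([1.0, 2.0, 3.0])
...     reg = reg.partial_fit(X_batch, y_batch)
>>> reg.coef_.shape
(3,)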
You can refer to the dedicated Stochastic Gradient Descent documentation section for
more details.
1.1.14. Perceptron
The Perceptron is another simple classification algorithm suitable for large scale
learning. By default:

It does not require a learning rate.
It is not regularized (penalized).
It updates its model only on mistakes.
The last characteristic implies that the Perceptron is slightly faster to train than SGD
with the hinge loss and that the resulting models are sparser.
The passive-aggressive algorithms are a family of algorithms for large-scale learning.
They are similar to the Perceptron in that they do not require a learning rate. However,
contrary to the Perceptron, they include a regularization parameter C .
References
There are different things to keep in mind when dealing with data corrupted by outliers:
Outliers in X or in y?
An important notion of robust fitting is that of breakdown point: the fraction of data that
can be outlying for the fit to start missing the inlying data.
HuberRegressor should be faster than RANSAC and Theil Sen unless the number of samples is very large, i.e. n_samples >> n_features . This is because RANSAC and Theil Sen fit on smaller subsets of the data. However, both Theil Sen and RANSAC are unlikely to be as robust as HuberRegressor for the default parameters.
RANSAC is faster than Theil Sen and scales much better with the number of
samples.
RANSAC will deal better with large outliers in the y direction (most common
situation).
Theil Sen will cope better with medium-size outliers in the X direction, but this
property will disappear in high-dimensional settings.
RANSAC is a non-deterministic algorithm that produces a reasonable result only with a certain probability, which depends on the number of iterations (see the max_trials parameter). It is typically used for linear and non-linear regression problems and is especially popular in the field of photogrammetric computer vision.
The algorithm splits the complete input sample data into a set of inliers, which may be
subject to noise, and outliers, which are e.g. caused by erroneous measurements or
invalid hypotheses about the data. The resulting model is then estimated only from the
determined inliers.
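A hedged sketch of this inlier/outlier split on synthetic data (the corrupted targets and random_state are illustrative):

>>> import numpy as np
>>> from sklearn.linear_model import RANSACRegressor
>>> X = np.arange(100, dtype=float).reshape(100, 1)
>>> y = 2.0 * X.ravel() + 1.0
>>> y[:5] = 500.0                      # a few gross outliers in y
>>> reg = RANSACRegressor(random_state=0).fit(X, y)
>>> bool(reg.inlier_mask_[0])          # corrupted samples are flagged as outliers
False
>>> reg.estimator_.coef_               # final model refit on the detected inliers
array([2.])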
Examples
References
The TheilSenRegressor estimator uses a generalization of the median in multiple
dimensions. It is thus robust to multivariate outliers. Note however that the robustness
of the estimator decreases quickly with the dimensionality of the problem. It loses its
robustness properties and becomes no better than ordinary least squares in high dimension.
Examples
Theil-Sen Regression
Robust linear estimator fitting
Theoretical considerations
Examples
Mathematical details
The HuberRegressor differs from using SGDRegressor with loss set to huber in the
following ways.
Note that this estimator is different from the R implementation of Robust Regression because the R implementation performs weighted least squares, with weights given to each sample on the basis of how much the residual exceeds a certain threshold.
Quantile regression may be useful when the target variable has a non-constant (but predictable) variance or a non-normal distribution.
Based on minimizing the pinball loss, conditional quantiles can also be estimated by
models other than linear models. For example, GradientBoostingRegressor can
predict conditional quantiles if its parameter loss is set to "quantile" and parameter
alpha is set to the quantile that should be predicted. See the example in Prediction
Intervals for Gradient Boosting Regression.
Examples
Quantile regression
Mathematical details
References
1.1.18. Polynomial regression: extending linear models with basis functions
One common pattern within machine learning is to use linear models trained on
nonlinear functions of the data. This approach maintains the generally fast
performance of linear methods, while allowing them to fit a much wider range of data.
Mathematical details
>>> from sklearn.preprocessing import PolynomialFeatures
>>> import numpy as np
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(degree=2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
This sort of preprocessing can be streamlined with the Pipeline tools. A single object
representing a simple polynomial regression can be created and used as follows:
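A sketch of that pattern, fitting data generated by a degree-3 polynomial (step names and toy data are illustrative):

>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.pipeline import Pipeline
>>> import numpy as np
>>> model = Pipeline([('poly', PolynomialFeatures(degree=3)),
...                   ('linear', LinearRegression(fit_intercept=False))])
>>> # fit to an order-3 polynomial
>>> x = np.arange(5)
>>> y = 3 - 2 * x + x ** 2 - x ** 3
>>> model = model.fit(x[:, np.newaxis], y)
>>> model.named_steps['linear'].coef_
array([ 3., -2.,  1., -1.])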
The linear model trained on polynomial features is able to exactly recover the input
polynomial coefficients.
In some cases it’s not necessary to include higher powers of any single feature, but
only the so-called interaction features that multiply together at most d distinct
features. These can be gotten from PolynomialFeatures with the setting
interaction_only=True.

For example, when dealing with boolean features, x_i^n = x_i for all n and is therefore useless; but x_i x_j represents the conjunction of two booleans. This way, we can solve the XOR problem with a linear classifier:
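A sketch of the setup that the prediction below assumes (four boolean points, interaction features, and a Perceptron configured as in the usual example):

>>> import numpy as np
>>> from sklearn.linear_model import Perceptron
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> y = X[:, 0] ^ X[:, 1]
>>> X = PolynomialFeatures(interaction_only=True).fit_transform(X).astype(int)
>>> X
array([[1, 0, 0, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 1]])
>>> clf = Perceptron(fit_intercept=False, max_iter=10, tol=None,
...                  shuffle=False).fit(X, y)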
>>> clf.predict(X)
array([0, 1, 1, 0])
>>> clf.score(X, y)
1.0