Chapter 2: Optimization
Regularization
Contents
1 Introduction
2 Convex Optimization
   2.1 Overview
   2.2 Mathematical Definition
   2.3 Examples in Machine Learning
3 Simple Linear Regression
   3.2 Objective Function
   3.3 Example
4 Multiple Linear Regression
   4.3 Example
5 Nonlinear Regression
   5.1 Overview
   5.2 Example: Exponential Growth
6 Logistic Regression
   6.1 Overview
   6.2 Objective Function
   6.3 Example
7 Regularization
   7.1 Overview
   7.2 Ridge Regression
   7.3 Lasso Regression
   7.4 Example
8 Conclusion
1 Introduction
In the field of statistical learning, regression techniques are central to building predictive models. This course covers various regression methods, including simple linear regression, multiple linear regression, nonlinear regression, and logistic regression. We also explore convex optimization, which is fundamental to solving many regression problems, and regularization techniques used to improve model generalization.
2 Convex Optimization
2.1 Overview
Convex optimization is the subclass of mathematical optimization problems in which the objective function is convex, so every local minimum is also a global minimum. This property greatly simplifies the search for an optimal solution.
2.2 Mathematical Definition
A convex optimization problem can be written in the standard form:
minimize f(x) subject to gi(x) ≤ 0, i = 1, …, m,
where f(x) is the objective function, the gi(x) are the constraint functions, and all of these functions are convex.
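To make the standard form concrete, here is a minimal numerical sketch using SciPy's general-purpose solver; the particular objective and constraint are invented for illustration and are not part of the course material. Because the problem is convex, the local solution the solver returns is also the global one.

    # A minimal sketch of solving a small convex problem numerically with SciPy.
    import numpy as np
    from scipy.optimize import minimize

    # Convex objective: f(x) = (x1 - 1)^2 + (x2 - 2)^2
    def f(x):
        return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

    # Constraint g(x) = x1 + x2 - 2 <= 0. SciPy's "ineq" type expects
    # fun(x) >= 0, so we pass -g(x) = 2 - x1 - x2.
    constraints = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]

    result = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=constraints)
    print(result.x)  # ~[0.5, 1.5]: the unique global minimum on the feasible set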
3 Simple Linear Regression
Simple linear regression models a dependent variable y as a linear function of a single independent variable x. The fitted line is ŷ = β0 + β1 x, where β0 is the intercept and β1 the slope.
3.2 Objective Function
The goal is to estimate β0 and β1 such that the residual sum of squares (RSS)
is minimized:
RSS = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} (yi − (β0 + β1 xi))².
3.3 Example
Consider a dataset with the following pairs of (x, y) values:
{(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)}.
Using simple linear regression, we aim to fit a line to the data. With x̄ = 3 and ȳ = 4, solving the normal equations gives β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 9/10 = 0.9 and β0 = ȳ − β1 x̄ = 4 − 2.7 = 1.3, so the fitted regression equation is:
ŷ = 1.3 + 0.9x.
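As a quick numerical check of these estimates, the sketch below solves the same least-squares problem with NumPy on the five data pairs above.

    # Verify the simple linear regression fit with NumPy's least-squares solver.
    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([2, 3, 5, 4, 6], dtype=float)

    # Design matrix: a column of ones for the intercept β0, then x for β1.
    X = np.column_stack([np.ones_like(x), x])

    # np.linalg.lstsq minimizes ||X·beta − y||², i.e., the RSS defined above.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)  # approximately [1.3, 0.9]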
4 Multiple Linear Regression
Multiple linear regression extends the simple model to several independent variables: y = β0 + β1 x1 + · · · + βp xp + ϵ.
4.3 Example
Suppose we have a dataset with two independent variables x1 (years of experience) and x2 (education level), and a dependent variable y (salary). Multiple linear regression can be used to model the relationship:
y = 30 + 2.5x1 + 1.2x2.
Here, 30 represents the base salary, 2.5 is the increase in salary per year of
experience, and 1.2 is the increase per level of education.
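A small sketch of using this fitted model for prediction follows; the example inputs (and the implied salary units) are assumptions for illustration.

    # Evaluate the fitted salary model from the example above.
    def predict_salary(years_experience: float, education_level: float) -> float:
        return 30 + 2.5 * years_experience + 1.2 * education_level

    # Hypothetical employee: 5 years of experience, education level 3.
    print(predict_salary(5, 3))  # 30 + 12.5 + 3.6 = 46.1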
5 Nonlinear Regression
5.1 Overview
Nonlinear regression is used when the relationship between the dependent variable and the independent variables cannot be modeled by a straight line. Nonlinear models are of the form:
y = f(x; β) + ϵ,
where f is a function that is nonlinear in the parameters β and ϵ is a random error term.
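For instance, an exponential-growth model y = β0 exp(β1 x) + ϵ is nonlinear in β1 and cannot be fit by ordinary least squares directly. The sketch below fits such a model with SciPy's curve_fit; the data and the true parameter values (2.0 and 1.5) are synthetic, chosen only for illustration.

    # Fit an exponential-growth model by nonlinear least squares.
    import numpy as np
    from scipy.optimize import curve_fit

    def exp_model(x, beta0, beta1):
        # f(x; β) = β0 · exp(β1 · x)
        return beta0 * np.exp(beta1 * x)

    # Synthetic data generated from β0 = 2.0, β1 = 1.5 plus Gaussian noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 2, 30)
    y = 2.0 * np.exp(1.5 * x) + rng.normal(scale=0.3, size=x.size)

    # curve_fit iteratively minimizes the RSS starting from the guess p0.
    params, _ = curve_fit(exp_model, x, y, p0=[1.0, 1.0])
    print(params)  # approximately [2.0, 1.5]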
6 Logistic Regression
6.1 Overview
Logistic regression is used for binary classification problems where the dependent
variable y takes on values 0 or 1. The model estimates the probability that y = 1
as a function of the independent variables:
P(y = 1 | x) = 1 / (1 + exp(−(β0 + β1 x1 + · · · + βp xp))).
6.3 Example
Consider a dataset where we want to predict whether a student will pass (y = 1)
or fail (y = 0) based on study hours (x1) and previous grades (x2). The logistic
regression model might look like:
P(y = 1 | x1, x2) = 1 / (1 + exp(−(0.5 + 1.2 x1 + 0.8 x2))).
This model estimates the probability of passing based on the input features.
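A minimal sketch of evaluating this model follows; the inputs (8 study hours, previous grade level 2) are hypothetical.

    # Evaluate the logistic model from the example above.
    import math

    def prob_pass(x1: float, x2: float) -> float:
        z = 0.5 + 1.2 * x1 + 0.8 * x2
        return 1.0 / (1.0 + math.exp(-z))  # the logistic (sigmoid) function

    print(prob_pass(8, 2))  # ~0.99999: many study hours, decent previous grades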
7 Regularization
7.1 Overview
Regularization is used to prevent overfitting in regression models, especially when the number of features is large or when the model is highly flexible. Two common regularization techniques are Ridge Regression and Lasso Regression.
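7.2 Ridge Regression
Ridge regression adds an L2 penalty on the coefficients to the least-squares objective:
minimize Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{j=1}^{p} βj²,
where λ ≥ 0 controls the strength of the penalty. Ridge shrinks the coefficients toward zero but does not set them exactly to zero.
7.3 Lasso Regression
Lasso regression uses an L1 penalty instead:
minimize Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{j=1}^{p} |βj|.
The L1 penalty can drive some coefficients exactly to zero, so Lasso also performs variable selection.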
7.4 Example
Suppose we are predicting house prices using 100 features. Ridge or Lasso
regression can be used to shrink the coefficients for less important features,
improving model generalization and interpretability.
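As a minimal sketch of this idea, the code below fits both penalties to synthetic data with 100 features, only five of which matter; scikit-learn and the particular alpha (λ) values are illustrative choices, not prescribed by the course.

    # Contrast Ridge and Lasso shrinkage on synthetic high-dimensional data.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 100))              # 200 samples, 100 features
    true_coef = np.zeros(100)
    true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]  # only 5 features are relevant
    y = X @ true_coef + rng.normal(scale=0.5, size=200)

    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)

    # Ridge shrinks all coefficients a little; Lasso typically zeroes out
    # most of the 95 irrelevant ones, which aids interpretability.
    print(np.sum(ridge.coef_ == 0.0))  # typically 0
    print(np.sum(lasso.coef_ == 0.0))  # typically most of the irrelevant features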
8 Conclusion
In this course, we have explored fundamental techniques in statistical learning,
including convex optimization, various types of regression, and regularization.
These methods form the backbone of many machine learning algorithms used in
practice. Regularization techniques, in particular, help improve the performance
of models on unseen data by preventing overfitting.