
Chapter 2: Convex Optimization, Regression and Regularization

October 21, 2024

Contents

1 Introduction
2 Convex Optimization
  2.1 Overview
  2.2 Mathematical Definition
  2.3 Examples in Machine Learning
3 Simple Linear Regression
  3.1 Overview
  3.2 Objective Function
  3.3 Example
4 Multiple Linear Regression
  4.1 Overview
  4.2 Objective Function
  4.3 Example
5 Nonlinear Regression
  5.1 Overview
  5.2 Example: Exponential Growth
6 Logistic Regression
  6.1 Overview
  6.2 Objective Function
  6.3 Example
7 Regularization
  7.1 Overview
  7.2 Ridge Regression
  7.3 Lasso Regression
  7.4 Example
8 Conclusion

1 Introduction
In the field of statistical learning, regression techniques are central to building predictive models. This course covers various regression methods, including simple linear regression, multiple linear regression, nonlinear regression, and logistic regression. We also explore convex optimization, which is fundamental to solving many regression problems, and regularization techniques used to improve model generalization.

2 Convex Optimization
2.1 Overview
Convex optimization is a subclass of mathematical optimization problems in which the objective function (and the feasible set) is convex, so that any local minimum is also a global minimum. This property greatly simplifies the search for an optimal solution.

2.2 Mathematical Definition


A function f : Rⁿ → R is said to be convex if for any x, y ∈ Rⁿ and any θ ∈ [0, 1], the following condition holds:

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).
Convex optimization problems can be written in the standard form:

min_x f(x) subject to gi(x) ≤ 0, i = 1, . . . , m,

where f(x) is the objective function and the gi(x) are the constraint functions.
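
For illustration (not part of the standard form itself), such a problem can be solved numerically. The following minimal Python sketch, assuming NumPy and SciPy are available, minimizes a made-up convex quadratic objective subject to a single linear inequality constraint:

import numpy as np
from scipy.optimize import minimize

# Convex objective: f(x) = (x1 - 1)^2 + (x2 - 2)^2
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

# Constraint g(x) = x1 + x2 - 2 <= 0. SciPy's "ineq" convention requires the
# constraint function to be nonnegative, so we pass -g(x).
cons = [{"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 2.0)}]

res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)  # approximately [0.5, 1.5], on the boundary x1 + x2 = 2

Because the objective is convex and the constraint is linear, the local solution found by the solver is also the global one.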

2.3 Examples in Machine Learning


Many machine learning algorithms, including linear regression, support vector machines (SVM), and logistic regression, rely on solving convex optimization problems. For example, the cost function in linear regression is quadratic, which is a convex function. Minimizing this cost leads to the optimal regression coefficients.

3 Simple Linear Regression


3.1 Overview
Simple linear regression models the relationship between a single independent
variable x and a dependent variable y by fitting a linear equation to the data.
The form of the model is:
y = β0 + β1 x + ϵ,
where β0 is the intercept, β1 is the slope of the line, and ϵ is the error term.

3.2 Objective Function
The goal is to estimate β0 and β1 such that the residual sum of squares (RSS)
is minimized:
RSS = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} (yi − (β0 + β1 xi))².

This is a convex optimization problem where we minimize a quadratic cost function.

3.3 Example
Consider a dataset with the following pairs of (x, y) values:
{(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)}.
Using simple linear regression, we aim to fit a line to the data. Solving the
normal equations yields estimates of β0 = 1.3 and β1 = 0.9, so the regression equation is:
ŷ = 1.3 + 0.9x.
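
These estimates can be verified with a short computation; the sketch below, assuming NumPy, applies the closed-form least-squares formulas to the data above:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 4, 6], dtype=float)

# Closed-form least-squares estimates for y = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 1.3 and 0.9, matching the estimates above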

4 Multiple Linear Regression


4.1 Overview
Multiple linear regression extends simple linear regression by modeling the relationship between a dependent variable y and multiple independent variables x1, x2, . . . , xp. The model is of the form:
y = β0 + β1 x1 + β2 x2 + · · · + βp xp + ϵ.

4.2 Objective Function


Similar to simple linear regression, the objective is to minimize the residual sum
of squares (RSS):
RSS = Σ_{i=1}^{n} (yi − (β0 + β1 xi1 + · · · + βp xip))².

The normal equations can be used to estimate the coefficients β0 , β1 , . . . , βp .

4.3 Example
Suppose we have a dataset with two independent variables x1 (years of experience) and x2 (education level), and a dependent variable y (salary). Multiple
linear regression can be used to model the relationship:
y = 30 + 2.5x1 + 1.2x2 .

Here, 30 represents the base salary, 2.5 is the increase in salary per year of
experience, and 1.2 is the increase per level of education.
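
As a sketch of how such coefficients are obtained in practice (the data below are invented for illustration), one can build a design matrix with an intercept column and solve the least-squares problem with NumPy:

import numpy as np

# Hypothetical data: years of experience (x1), education level (x2), salary (y)
X = np.array([[1, 2], [3, 3], [5, 2], [7, 4], [10, 5]], dtype=float)
y = np.array([36.0, 41.0, 45.0, 52.0, 61.0])

# Prepend a column of ones for the intercept, then solve the least-squares problem
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # [intercept, coefficient for x1, coefficient for x2]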

5 Nonlinear Regression
5.1 Overview
Nonlinear regression is used when the relationship between the dependent variable and the independent variables cannot be modeled by a straight line. Nonlinear models are of the form:

y = f (x; β) + ϵ,

where f (x; β) is a nonlinear function of the parameters β.

5.2 Example: Exponential Growth


A common example of nonlinear regression is exponential growth, where the
model is:
y = β0 exp(β1 x) + ϵ.
For example, if we are modeling population growth, y represents the population
at time x, and β0 and β1 are parameters to be estimated.
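
A minimal sketch of fitting this exponential model numerically, assuming SciPy's curve_fit and synthetic data generated only for illustration:

import numpy as np
from scipy.optimize import curve_fit

def exp_growth(x, b0, b1):
    return b0 * np.exp(b1 * x)

# Synthetic "population" data with some noise
rng = np.random.default_rng(0)
x = np.arange(0, 10, dtype=float)
y = 2.0 * np.exp(0.3 * x) + rng.normal(scale=0.5, size=x.size)

params, _ = curve_fit(exp_growth, x, y, p0=[1.0, 0.1])
print(params)  # estimates of (beta0, beta1), close to the true values (2.0, 0.3)

Note that, unlike linear regression, this objective is generally not convex in the parameters, so a reasonable starting point p0 matters.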

6 Logistic Regression
6.1 Overview
Logistic regression is used for binary classification problems where the dependent
variable y takes on values 0 or 1. The model estimates the probability that y = 1
as a function of the independent variables:
P(y = 1|x) = 1 / (1 + exp(−(β0 + β1 x1 + · · · + βp xp))).

6.2 Objective Function


The objective in logistic regression is to maximize the likelihood of the observed
data. The log-likelihood function is:
ℓ(β) = Σ_{i=1}^{n} [yi log P(yi = 1|xi) + (1 − yi) log(1 − P(yi = 1|xi))].

The log-likelihood is concave, so maximizing it (equivalently, minimizing the negative log-likelihood) is a convex optimization problem.
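
As an illustrative sketch (assuming NumPy, and a design matrix X whose first column is all ones so that β includes the intercept), the log-likelihood above can be evaluated as follows:

import numpy as np

def log_likelihood(beta, X, y):
    # Predicted probabilities P(y = 1 | x) from the logistic (sigmoid) function
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    # Bernoulli log-likelihood summed over all observations
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

Maximizing this function (for example, with a gradient-based solver applied to its negative) yields the estimated coefficients.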

6.3 Example
Consider a dataset where we want to predict whether a student will pass (y = 1)
or fail (y = 0) based on study hours (x1 ) and previous grades (x2 ). The logistic
regression model might look like:
P(y = 1|x1, x2) = 1 / (1 + exp(−(0.5 + 1.2x1 + 0.8x2))).

This model estimates the probability of passing based on the input features.
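
For a concrete (hypothetical) input, the probability can be computed directly from these coefficients:

import numpy as np

def pass_probability(hours, grade):
    # Coefficients taken from the example model above
    z = 0.5 + 1.2 * hours + 0.8 * grade
    return 1.0 / (1.0 + np.exp(-z))

print(pass_probability(2.0, 1.0))  # about 0.98 for this hypothetical student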

7 Regularization
7.1 Overview
Regularization is used to prevent overfitting in regression models, especially
when the number of features is large or when the model is highly flexible. Two
common regularization techniques are Ridge Regression and Lasso Regression.

7.2 Ridge Regression


Ridge regression adds an L2 penalty to the objective function in linear regression, which shrinks the coefficients:

min_β [ Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{j=1}^{p} βj² ].

Here, λ controls the strength of the regularization.

7.3 Lasso Regression


Lasso regression adds an L1 penalty, which encourages sparsity in the coefficients:

min_β [ Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{j=1}^{p} |βj| ].

Lasso is useful when we expect many coefficients to be zero, as it performs feature selection.

7.4 Example
Suppose we are predicting house prices using 100 features. Ridge or Lasso
regression can be used to shrink the coefficients for less important features,
improving model generalization and interpretability.
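
A sketch of this comparison using scikit-learn, where the alpha parameter plays the role of λ (the data are randomly generated and the penalty strengths are arbitrary; in practice they would be chosen by cross-validation):

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))  # 200 samples, 100 features, most irrelevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: sets many coefficients to zero

print(np.sum(ridge.coef_ == 0))  # typically 0: ridge rarely produces exact zeros
print(np.sum(lasso.coef_ == 0))  # many zeros: lasso performs feature selection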

8 Conclusion
In this course, we have explored fundamental techniques in statistical learning,
including convex optimization, various types of regression, and regularization.
These methods form the backbone of many machine learning algorithms used in
practice. Regularization techniques, in particular, help improve the performance
of models on unseen data by preventing overfitting.
