Lecture 02 (3hrs) Linear Regression and Logistic Regression

This document provides an overview of linear regression and logistic regression models. It introduces linear regression for both single and multivariate data. Optimization methods like least squares and gradient descent are discussed for fitting linear regression models. Logistic regression is then introduced as an extension of linear regression to binary classification problems using the logistic function. Maximum likelihood estimation and Newton's method are covered for logistic regression model fitting. Finally, the document discusses multi-class classification approaches and dealing with class imbalance.

Lecture 02

Linear Regression and Logistic Regression
Xizhao WANG
Dian ZHANG
Big Data Institute
College of Computer Science
Shenzhen University

March 2022
Outline

1. Linear regression
   • Mathematical model
   • Pseudo-inverse of the matrix
2. Logistic regression
3. Summary

Linear regression
• Linear model: $f(x) = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$

• Vector format: $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$

(Example training data: watermelon samples with attributes color, root, shape, and sound, labeled good/bad.)
Linear regression with one variable
• Training dataset: $D = \{(x_i, y_i)\}_{i=1}^m$, with $x_i, y_i \in \mathbb{R}$

• Linear regression tries to learn the best-fit line $f(x) = w x + b$ such that $f(x_i) \approx y_i$

(Figure: training samples scattered in the x-y plane with the fitted line.)
Linear regression with one variable
• Optimization: minimize the mean-square error

• Loss function:
$$(w^*, b^*) = \arg\min_{(w,b)} \sum_{i=1}^{m} \big(y_i - w x_i - b\big)^2$$
Linear regression with one variable
• Least squares method: find the best-fit line by minimizing the sum of squared distances from the samples to the line.

• Partial derivatives of $E(w,b) = \sum_{i=1}^m (y_i - w x_i - b)^2$:
$$\frac{\partial E}{\partial w} = 2\Big(w \sum_{i=1}^m x_i^2 - \sum_{i=1}^m (y_i - b)\, x_i\Big), \qquad \frac{\partial E}{\partial b} = 2\Big(m b - \sum_{i=1}^m (y_i - w x_i)\Big)$$
Linear regression with one variable
• Setting both partial derivatives to zero gives two equations in the two unknowns $w$ and $b$, with the closed-form solution
$$w = \frac{\sum_{i=1}^m y_i (x_i - \bar{x})}{\sum_{i=1}^m x_i^2 - \frac{1}{m}\big(\sum_{i=1}^m x_i\big)^2}, \qquad b = \frac{1}{m} \sum_{i=1}^m (y_i - w x_i)$$
where $\bar{x} = \frac{1}{m} \sum_{i=1}^m x_i$
Multivariate linear regression
• Training set: $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^m$, where each sample $\mathbf{x}_i \in \mathbb{R}^d$ (e.g., the watermelon attributes color, root, shape, and sound, with a good/bad label)

• Linear regression tries to learn $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$ such that $f(\mathbf{x}_i) \approx y_i$
Multivariate linear regression
• Vector form: stack the data into $\mathbf{X} \in \mathbb{R}^{m \times (d+1)}$, whose $i$-th row is $(\mathbf{x}_i^\top, 1)$, and $\mathbf{y} = (y_1; \ldots; y_m)$

• Parameter: $\hat{\mathbf{w}} = (\mathbf{w}; b)$

• Optimization:
$$\hat{\mathbf{w}}^* = \arg\min_{\hat{\mathbf{w}}}\; (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})^\top (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})$$
Multivariate linear regression
• Loss function: $E_{\hat{\mathbf{w}}} = (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})^\top (\mathbf{y} - \mathbf{X}\hat{\mathbf{w}})$

• Partial derivative (see [Zhihua Zhou, 2016], Appendix A.2, and The Matrix Cookbook):
$$\frac{\partial E_{\hat{\mathbf{w}}}}{\partial \hat{\mathbf{w}}} = 2\,\mathbf{X}^\top (\mathbf{X}\hat{\mathbf{w}} - \mathbf{y})$$

http://www.cs.toronto.edu/~bonner/courses/2018s/csc338/matrix_cookbook.pdf
Multivariate linear regression
• Parameter estimation has a closed form when $\mathbf{X}^\top \mathbf{X}$ is invertible (otherwise, use the pseudo-inverse):
$$\hat{\mathbf{w}}^* = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$

• Prediction: $f(\hat{\mathbf{x}}) = \hat{\mathbf{x}}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$
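To make the closed form concrete, here is a minimal NumPy sketch; the data values are invented for illustration, and np.linalg.pinv covers the singular case that motivates the pseudo-inverse in the outline:

    import numpy as np

    # Toy data: m = 4 samples with d = 2 features (values invented for illustration).
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])
    y = np.array([3.1, 2.6, 4.8, 7.0])

    # Append a column of ones so the bias b is folded into the weight vector.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])

    # Closed-form least squares via the pseudo-inverse.
    w_hat = np.linalg.pinv(X1) @ y

    # Prediction for a new sample (the trailing 1 matches the appended bias column).
    x_new = np.array([2.5, 1.0, 1.0])
    print(w_hat, x_new @ w_hat)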
Log-linear regression
• Log-linear regression: $\ln y = \mathbf{w}^\top \mathbf{x} + b$

• Generalized linear model: $y = g^{-1}(\mathbf{w}^\top \mathbf{x} + b)$

• Link function $g(\cdot)$: a monotone, differentiable function


Log-linear regression
• Binary classification: define a GLM through a link function $g(\cdot)$

• Unit-step function: output 1 if $z = \mathbf{w}^\top \mathbf{x} + b > 0$, output 0 if $z < 0$
  • Non-differentiable
  • Not continuous

• Sigmoid function: a smooth, differentiable surrogate
$$y = \frac{1}{1 + e^{-z}}$$
Logistic Regression
• Logistic function: $y = \dfrac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}$

• As a GLM: $\ln \dfrac{y}{1-y} = \mathbf{w}^\top \mathbf{x} + b$
Logistic Regression
• Odds: the ratio of the positive probability to the negative probability, $\dfrac{y}{1-y}$

• Log odds (logit): $\ln \dfrac{y}{1-y}$

• Logistic regression: use a linear model to fit the log odds
Logistic Regression
• Logistic regression uses a linear model to fit the log odds; decisions come from probabilities, which makes the model interpretable

• Using the properties of the probability distribution, the positive/negative class probabilities can be derived:
$$p(y=1 \mid \mathbf{x}) = \frac{e^{\mathbf{w}^\top \mathbf{x} + b}}{1 + e^{\mathbf{w}^\top \mathbf{x} + b}}, \qquad p(y=0 \mid \mathbf{x}) = \frac{1}{1 + e^{\mathbf{w}^\top \mathbf{x} + b}}$$
Maximum likelihood method
• Given dataset: $\{(\mathbf{x}_i, y_i)\}_{i=1}^m$ with $y_i \in \{0, 1\}$

• Log-likelihood:
$$\ell(\mathbf{w}, b) = \sum_{i=1}^m \ln p(y_i \mid \mathbf{x}_i; \mathbf{w}, b)$$

• Expanding each likelihood term (writing $\boldsymbol{\beta} = (\mathbf{w}; b)$ and $\hat{\mathbf{x}}_i = (\mathbf{x}_i; 1)$):
$$p(y_i \mid \mathbf{x}_i) = y_i\, p(y=1 \mid \hat{\mathbf{x}}_i; \boldsymbol{\beta}) + (1 - y_i)\, p(y=0 \mid \hat{\mathbf{x}}_i; \boldsymbol{\beta})$$
Maximum likelihood method
• Substituting the expansion back, maximizing the log-likelihood is equivalent to minimizing
$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^m \Big( -y_i\, \boldsymbol{\beta}^\top \hat{\mathbf{x}}_i + \ln\big(1 + e^{\boldsymbol{\beta}^\top \hat{\mathbf{x}}_i}\big) \Big)$$
Maximum likelihood method
• The loss $\ell(\boldsymbol{\beta})$ is a convex function of $\boldsymbol{\beta}$

• The maximum likelihood estimate
$$\boldsymbol{\beta}^* = \arg\min_{\boldsymbol{\beta}} \ell(\boldsymbol{\beta})$$
is found by numerical optimization, e.g., gradient descent or Newton's method
Gradient descent
• Iteratively step against the gradient of the loss (see the sketch below):
$$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - \eta\, \nabla_{\boldsymbol{\beta}} \ell\big(\boldsymbol{\beta}^{(t)}\big), \qquad \nabla_{\boldsymbol{\beta}} \ell = -\sum_{i=1}^m \hat{\mathbf{x}}_i \big(y_i - p(y=1 \mid \hat{\mathbf{x}}_i)\big)$$
where $\eta > 0$ is the learning rate
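A minimal NumPy sketch of this update, assuming labels in {0, 1} and a bias column folded into beta; the division by m is a common stabilization, not part of the formula above:

    import numpy as np

    def sigmoid(z):
        # Plain logistic function; adequate for a sketch.
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
        """Batch gradient descent for logistic regression.

        X: (m, d) feature matrix; y: (m,) labels in {0, 1}.
        Returns beta = (w; b), with the bias column appended internally.
        """
        X1 = np.hstack([X, np.ones((X.shape[0], 1))])
        beta = np.zeros(X1.shape[1])
        for _ in range(n_iters):
            p1 = sigmoid(X1 @ beta)      # p(y=1 | x) under the current beta
            grad = -X1.T @ (y - p1)      # gradient of the negative log-likelihood
            beta -= lr * grad / len(y)   # averaged step for a stable learning rate
        return beta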
Newton’s Method
• Start with an initial guess, approximate the function by its tangent line, and compute the x-intercept of that tangent line. The x-intercept is typically a better approximation of the function's root than the initial guess, and the method can be iterated.

• Applied to minimizing $\ell(\boldsymbol{\beta})$, the update is
$$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - \left(\frac{\partial^2 \ell}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^\top}\right)^{-1} \frac{\partial \ell}{\partial \boldsymbol{\beta}}$$
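For logistic regression the second derivative takes the standard form $\sum_i \hat{\mathbf{x}}_i \hat{\mathbf{x}}_i^\top\, p_i (1 - p_i)$; here is a minimal sketch of the resulting Newton iteration, assuming the Hessian stays invertible on the data at hand:

    import numpy as np

    def fit_logistic_newton(X1, y, n_iters=10):
        """Newton's method for logistic regression.

        X1: (m, d+1) design matrix with a trailing bias column; y: (m,) labels in {0, 1}.
        """
        beta = np.zeros(X1.shape[1])
        for _ in range(n_iters):
            p1 = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
            grad = -X1.T @ (y - p1)              # first derivative of the loss
            W = p1 * (1.0 - p1)                  # per-sample curvature weights
            hess = X1.T @ (X1 * W[:, None])      # Hessian: X1^T diag(W) X1
            beta -= np.linalg.solve(hess, grad)  # Newton step
        return beta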
Multi-class Learning
• Idea: decompose the multi-class task into several binary classification problems

• Approaches:
  • One vs. One (OvO)
  • One vs. Rest (OvR)
  • Many vs. Many (MvM)
Multi-class Learning
• One vs. One
  • Pair up the N classes, producing N(N-1)/2 binary classifiers
  • Combine their predictions by voting

• One vs. Rest
  • Train N classifiers and choose the one with the highest confidence (see the sketch below)
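Both decompositions are available as wrappers in scikit-learn; a brief sketch on a toy three-class problem (the dataset parameters are chosen only so the generator runs):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

    # Toy 3-class problem.
    X, y = make_classification(n_classes=3, n_informative=4, random_state=0)

    # One vs. One: N(N-1)/2 = 3 pairwise classifiers, combined by voting.
    ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)

    # One vs. Rest: N = 3 classifiers, the most confident one wins.
    ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
    print(ovo.predict(X[:5]), ovr.predict(X[:5]))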
Multi-class Learning
• Many vs. Many: Error-Correcting Output Codes (ECOC)
  • Encoding: split the N classes M times, training M binary classifiers
  • Decoding: run all M classifiers on a test sample and predict the class whose codeword is closest to the resulting code
Class imbalance
• Examples: medical diagnosis, bioinformatics

• Approaches:
  • Under-sampling the majority class
  • Over-sampling the minority class
  • Threshold-moving (rescaling), as sketched below

• Research problem: positive-unlabeled (PU) learning, where only the positive class is reliably labeled and the negative class cannot be trusted
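A common form of the threshold-moving (rescaling) rule, following the convention in [Zhihua Zhou, 2016]: with $m^+$ positive and $m^-$ negative training examples, rescale the predicted odds by the class ratio,
$$\frac{y'}{1-y'} = \frac{y}{1-y} \times \frac{m^-}{m^+}$$
and then threshold the rescaled $y'$ at $0.5$ as usual; equivalently, predict positive whenever $\dfrac{y}{1-y} > \dfrac{m^+}{m^-}$.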
Introduction

• Linear Regression is a regression technique
  – Fitting curves
• Logistic Regression is a classification technique
  – Classifying data samples
Conclusion
The Differences between Linear Regression and Logistic Regression

• Linear Regression is used to handle regression problems, whereas Logistic Regression is used to handle classification problems.
• Linear Regression provides a continuous output, but Logistic Regression provides a discrete output.
• Linear Regression finds the best-fit line, while Logistic Regression goes one step further and maps the fitted line's values onto the sigmoid curve.
• Linear Regression is fitted by minimizing the mean squared error, whereas Logistic Regression is fitted by maximum likelihood estimation.
Example: Weights of Books

• hc stands for hardcover
• pb stands for paperback
Linear Regression
Example: Weights of Books

• Construct the data frame to store the data (see the sketch below)
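The slide's data table is not preserved in this extraction; here is a minimal pandas sketch of the data frame, with invented numbers standing in for the lecture's actual values:

    import pandas as pd

    # Book data: volume (cm^3), cover type, weight (g).
    # The numeric values are placeholders, not the lecture's real table.
    books = pd.DataFrame({
        "volume": [885, 1016, 1125, 239, 701, 641],
        "cover":  ["hc", "hc", "hc", "pb", "pb", "pb"],
        "weight": [800, 950, 1050, 350, 600, 575],
    })

    # Encode the cover type as a 0/1 indicator so it can enter a linear model.
    books["cover_hc"] = (books["cover"] == "hc").astype(int)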
Example: Weights of Books

• Include dependencies and packages
• Set the feature vectors and the labels (see the sketch below)
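Continuing the sketch above (the column names come from the hypothetical data frame, not from the lecture):

    from sklearn.linear_model import LinearRegression

    # Features: volume and the hardcover indicator; label: weight.
    X = books[["volume", "cover_hc"]].values
    y = books["weight"].values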
Example: Weights of Books

• Train the model (see https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html):

    reg = LinearRegression().fit(X, y)
Example: Weights of Books

• Understand the output (see the sketch below):
  • reg.coef_ holds the coefficients for volume and cover
  • reg.intercept_ holds the constant term in the model
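A short sketch of inspecting the fitted model; the printed numbers depend on the placeholder data above:

    # One coefficient per feature, plus the intercept.
    print("coefficients:", reg.coef_)    # weight per cm^3 of volume, and the hardcover offset
    print("intercept:", reg.intercept_)  # baseline weight (volume = 0, paperback)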
From Linear Regression to Logistic Regression

• The relationship between weight and height is linear

(Figure: linear regression fit of weight against height.)
From Linear Regression to Logistic Regression

• But what about obesity vs. weight?

(Figure, two panels: probability of obesity against weight. Left: a linear regression fit with a decision threshold. Right: a logistic regression fit.)
