[Slide] Logistic Regression
AI VIETNAM – AI Insight Course
Quang-Vinh Dinh
Ph.D. in Computer Science
Year 2020
Outline
Sigmoid function
From Linear to Logistic Regression
Logistic Regression – Stochastic
Logistic Regression – Mini-batch
Logistic Regression – Batch
Sigmoid Function

Sigmoid function:
    y = σ(u) = 1 / (1 + e^(−u))
    u ∈ (−∞, +∞),  y ∈ (0, 1)

Property (monotonicity):
    ∀ u1, u2 ∈ [a, b] and u1 ≤ u2  →  σ(u1) ≤ σ(u2)

[Plot] The sigmoid curve: u on the horizontal axis, y = σ(u) on the vertical axis; u1 ≤ u2 implies σ(u1) ≤ σ(u2).
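As a quick illustration (not from the slides; NumPy and the variable names are mine), a minimal sketch of the sigmoid and its monotonicity:

```python
import numpy as np

def sigmoid(u):
    """Sigmoid: maps u in (-inf, +inf) to y in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))

u = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(u))      # strictly increasing values, all in (0, 1)
print(sigmoid(0.0))    # 0.5
# Monotonic: u1 <= u2  =>  sigmoid(u1) <= sigmoid(u2)
```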
Sigmoid Function

Linear model:
    y = θᵀx,  y ∈ (−∞, +∞)

Pass the linear output through the sigmoid:
    z = θᵀx
    y = σ(z) = 1 / (1 + e^(−z)),  y ∈ (0, 1)

[Plot] Left: the line y = θᵀx as a function of x. Right: the curve y = 1 / (1 + e^(−θᵀx)).
Sigmoid Function

Fit a sigmoid to labelled data (Feature x, Label ∈ {Category 1, Category 2}):
    z = θᵀx
    y = σ(z) = 1 / (1 + e^(−z)),  y ∈ (0, 1)

Example fit:  z = 0.535·x − 0.654

[Plot] Data points of the two categories with the fitted curve 1 / (1 + e^(−θᵀx)).
Sigmoid Function

The same data (Feature x, Label ∈ {Category 1, Category 2}) with a steeper fit:
    z = θᵀx
    y = σ(z) = 1 / (1 + e^(−z)),  y ∈ (0, 1)

Example fit:  z = 2.331·x − 5.156

[Plot] Data points of the two categories with the fitted curve 1 / (1 + e^(−θᵀx)).
Sigmoid Function

    z = θᵀx
    y = σ(z) = 1 / (1 + e^(−z)),  y ∈ (0, 1)

[Plot] Another two-category dataset (Feature x, Label ∈ {Category 1, Category 2}) with its fitted sigmoid 1 / (1 + e^(−θᵀx)).
Outline
Sigmoid function
From Linear to Logistic Regression
Logistic Regression – Stochastic
Logistic Regression – Mini-batch
Logistic Regression – Batch
Idea of Logistic Regression

Recall linear regression: a model (a line) is fit to training data (x, y).

[Plot] Training data points and the fitted line of a linear regression model.
Idea of Logistic Regression

Given a new kind of data: each sample has a Feature and a Label (Category 1 or Category 2).

Plot the data, then assign numbers to the categories.
A line is not suitable for this data.

[Plot] Feature on the horizontal axis; the two categories form two flat groups that a straight line cannot fit well.
Idea of Logistic Regression

Given the same data (Feature, Label), a sigmoid function could fit the data:
    z = θᵀx
    ŷ = σ(z) = 1 / (1 + e^(−z)),  ŷ ∈ (0, 1)

Error for one sample:
    if y = 1:  error = 1 − ŷ
    if y = 0:  error = ŷ

[Plot] The two-category data with a fitted sigmoid 1 / (1 + e^(−θᵀx)); the error is the gap between ŷ and the true label.
Idea of Logistic Regression

Construct the loss. For sample i:

    Error                            Belief
    if y_i = 1: error = 1 − ŷ_i      if y_i = 1: belief = ŷ_i
    if y_i = 0: error = ŷ_i          if y_i = 0: belief = 1 − ŷ_i

Both cases combine into a single expression:
    P_i = ŷ_i^(y_i) · (1 − ŷ_i)^(1 − y_i)

Minimize error ~ maximize belief ~ minimize (−belief).
Idea of Logistic Regression

Construct the loss over N samples. Since the samples are i.i.d.:

    belief = ∏_{i=1}^{N} P_i
    log-belief = Σ_{i=1}^{N} log P_i,   with P_i = ŷ_i^(y_i) · (1 − ŷ_i)^(1 − y_i)

Maximizing the log-belief is equivalent to minimizing

    L = (1/N) [ −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) ]

which is the binary cross-entropy.
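A short sketch (assuming NumPy; the ŷ values are taken from the four-sample batch example later in the deck) showing that the mean negative log-belief is exactly the binary cross-entropy:

```python
import numpy as np

y     = np.array([0.0, 0.0, 1.0, 1.0])
y_hat = np.array([0.6856, 0.6963, 0.8160, 0.8828])

# Per-sample belief P_i = y_hat^y * (1 - y_hat)^(1 - y)
P = y_hat ** y * (1 - y_hat) ** (1 - y)

# Maximizing the total belief (product of P_i) is the same as
# minimizing the mean negative log-belief, i.e. binary cross-entropy.
neg_log_belief = -np.mean(np.log(P))
bce = np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))
print(neg_log_belief, bce)   # both ~0.6692
```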
Idea of Logistic Regression

Model and loss:
    z = θᵀx
    ŷ = σ(z) = 1 / (1 + e^(−z))
    L = (1/N) [ −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) ]

Derivative (chain rule):
    ∂L/∂θ = (∂L/∂ŷ)(∂ŷ/∂z)(∂z/∂θ)
    ∂L/∂ŷ = (1/N) [ −y/ŷ + (1 − y)/(1 − ŷ) ] = (1/N) (ŷ − y) / (ŷ(1 − ŷ))
    ∂ŷ/∂z = ŷ(1 − ŷ)
    ∂z/∂θ = x
    ∂L/∂θ = (1/N) xᵀ(ŷ − y)
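A minimal sketch (NumPy, with illustrative toy values of my own) that checks the closed-form gradient (1/N)·xᵀ(ŷ − y) against finite differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    y_hat = sigmoid(X @ theta)
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

def gradient(theta, X, y):
    # Chain-rule result from the slide: dL/dtheta = (1/N) X^T (y_hat - y)
    return X.T @ (sigmoid(X @ theta) - y) / X.shape[0]

# Toy data; a finite-difference check confirms the closed-form gradient.
X = np.array([[1.0, 1.4, 0.2], [1.0, 3.0, 1.1]])
y = np.array([0.0, 1.0])
theta = np.array([0.1, 0.5, -0.1])

eps = 1e-6
numeric = np.array([
    (loss(theta + eps * np.eye(3)[i], X, y)
     - loss(theta - eps * np.eye(3)[i], X, y)) / (2 * eps)
    for i in range(3)
])
print(gradient(theta, X, y))   # analytic gradient
print(numeric)                 # numeric gradient, should match closely
```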
[Plot] The two loss components: −log(ŷ) (used when y = 1) and −log(1 − ŷ) (used when y = 0).
Idea of Logistic Regression

[Plot] Two labelled datasets (Feature, Label ∈ {Category 1, Category 2}), each with its fitted sigmoid
    z = θᵀx,  ŷ = σ(z) = 1 / (1 + e^(−θᵀx)).
Outline
Sigmoid function
From Linear to Logistic Regression
Logistic Regression – Stochastic
Logistic Regression – Mini-batch
Logistic Regression – Batch
Logistic Regression – Stochastic

Parameters and input (bias absorbed into θ):
    θᵀ = [b  w1  w2],   xᵀ = [1  x1  x2]

1) Pick a sample (x, y) from the training data.
2) Compute the output ŷ:
       z = θᵀx
       ŷ = σ(z) = 1 / (1 + e^(−z))
3) Compute the loss:
       L(θ) = −y log ŷ − (1 − y) log(1 − ŷ)

Worked example with the model b = 0.1, w1 = 0.5, w2 = −0.1:
    x = (1.4, 0.2),  y = 0
    z = w1·x1 + w2·x2 + b = 0.78
    ŷ = σ(z) = 0.6856
    L = −y log ŷ − (1 − y) log(1 − ŷ) = 1.1573
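A small sketch reproducing this forward pass and loss in NumPy (variable names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b, w1, w2 = 0.1, 0.5, -0.1        # current model parameters
x1, x2, y = 1.4, 0.2, 0           # the picked sample

z = w1 * x1 + w2 * x2 + b         # 0.78
y_hat = sigmoid(z)                # ~0.6856
loss = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)   # ~1.1573
print(z, y_hat, loss)
```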
Logistic Regression – Stochastic

Backward step for the same sample. Dataset sample: x = (x1, x2) = (1.4, 0.2), y = 0.
Learning rate η = 0.01. Model: b = 0.1, w1 = 0.5, w2 = −0.1. Forward pass gave ŷ = 0.6856.

Gradient for one sample:
    L'_θ = x(ŷ − y) = [1  1.4  0.2]ᵀ · 0.6856
         = [0.6856  0.9599  0.1371]ᵀ = [L'_b  L'_w1  L'_w2]ᵀ

Update:
    b  = 0.1  − η·0.6856 ≈ 0.0931
    w1 = 0.5  − η·0.9599 ≈ 0.4904
    w2 = −0.1 − η·0.1371 ≈ −0.1014
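The corresponding gradient and parameter update for that single sample, as a NumPy sketch (names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.01
theta = np.array([0.1, 0.5, -0.1])    # [b, w1, w2]
x = np.array([1.0, 1.4, 0.2])         # leading 1 is the bias input
y = 0.0

y_hat = sigmoid(x @ theta)            # ~0.6856
grad = x * (y_hat - y)                # ~[0.6856, 0.9599, 0.1371]
theta = theta - eta * grad            # ~[0.0931, 0.4904, -0.1014]
print(grad, theta)
```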
Logistic Regression – Mini-batch

Take a mini-batch of m = 2 samples:
    x⁽¹⁾ = (1.5, 0.2), y⁽¹⁾ = 0   and   x⁽²⁾ = (4.1, 1.3), y⁽²⁾ = 1
Model: b = 0.1, w1 = 0.5, w2 = −0.1.  Learning rate η = 0.01.

Forward pass:
    z = w1·x1 + w2·x2 + b = [0.83  2.02]ᵀ
    ŷ = σ(z) = [0.6963  0.8828]ᵀ
    L = −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) = [1.1918  0.1245]ᵀ

Backward pass:
    L'_θ = (1/m) xᵀ(ŷ − y) ≈ [0.28961  0.28217  −0.0064]ᵀ

Update:
    b  = 0.1  − η·0.28961 = 0.097103
    w1 = 0.5  − η·0.28217 = 0.49717
    w2 = −0.1 + η·0.0064  = −0.09993
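A sketch of one mini-batch step with these two samples (assuming NumPy; names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.01
theta = np.array([0.1, 0.5, -0.1])        # [b, w1, w2]

# Mini-batch of m = 2 samples (leading 1 is the bias input)
X = np.array([[1.0, 1.5, 0.2],
              [1.0, 4.1, 1.3]])
y = np.array([0.0, 1.0])

y_hat = sigmoid(X @ theta)                # ~[0.6963, 0.8828]
loss = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)   # ~[1.1918, 0.1245]
grad = X.T @ (y_hat - y) / len(y)         # roughly [0.2896, 0.2820, -0.0065]
theta = theta - eta * grad                # ~[0.0971, 0.4972, -0.0999]
print(loss, grad, theta)
```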
Logistic Regression – Batch

Use all N = 4 samples at once. Model: b = 0.1, w1 = 0.5, w2 = −0.1.

        | 1  1.4  0.2 |        | 0 |
    x = | 1  1.5  0.2 |    y = | 0 |
        | 1  3.0  1.1 |        | 1 |
        | 1  4.1  1.3 |        | 1 |

Forward pass:
    z = w1·x1 + w2·x2 + b = [0.78  0.83  1.49  2.02]ᵀ
    ŷ = σ(z) = [0.6856  0.6963  0.8160  0.8828]ᵀ
    L = −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) = [1.1573  1.1918  0.2032  0.1245]ᵀ

Average loss = 0.6692
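The same full-batch forward pass as a NumPy sketch (names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.1, 0.5, -0.1])        # [b, w1, w2]

# Full batch: all four samples (leading 1 is the bias input)
X = np.array([[1.0, 1.4, 0.2],
              [1.0, 1.5, 0.2],
              [1.0, 3.0, 1.1],
              [1.0, 4.1, 1.3]])
y = np.array([0.0, 0.0, 1.0, 1.0])

z = X @ theta                             # ~[0.78, 0.83, 1.49, 2.02]
y_hat = sigmoid(z)                        # ~[0.6856, 0.6963, 0.8160, 0.8828]
losses = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
print(losses, losses.mean())              # mean ~0.6692
```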
Logistic Regression – Batch (Backward step)

Classify Iris flowers based on petal length and petal width (x1, x2).
Learning rate η = 0.01.  Model: b = 0.1, w1 = 0.5, w2 = −0.1.

Gradient over the batch:
    L'_θ = (1/N) xᵀ(ŷ − y)

         | 1.0  1.0  1.0  1.0 |
    xᵀ = | 1.4  1.5  3.0  4.1 | ,   ŷ − y = [0.6856  0.6963  −0.1840  −0.1172]ᵀ
         | 0.2  0.2  1.1  1.3 |

    L'_θ = (1/4) xᵀ(ŷ − y) ≈ [0.2702  0.2431  −0.0195]ᵀ = [L'_b  L'_w1  L'_w2]ᵀ

Update:
    b  = 0.1  − η·0.2702 ≈ 0.0972
    w1 = 0.5  − η·0.2431 ≈ 0.4975
    w2 = −0.1 + η·0.0195 ≈ −0.0998
Logistic Regression – Batch (Forward step)

Classify Iris flowers based on petal length and petal width.
Forward pass with the updated model: b = 0.0972, w1 = 0.4975, w2 = −0.0998.

    z = w1·x1 + w2·x2 + b = [0.77  0.82  1.48  2.00]ᵀ
    ŷ = σ(z) = [0.6843  0.6950  0.8146  0.8815]ᵀ
    L = −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) = [1.1531  1.1875  0.2050  0.1260]ᵀ

Average loss = 0.6679
The loss decreases from 0.6692 to 0.6679.
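One full batch iteration, reproducing the loss drop from roughly 0.6692 to 0.6679 (a sketch, assuming NumPy; names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_bce(y_hat, y):
    return np.mean(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

eta = 0.01
theta = np.array([0.1, 0.5, -0.1])        # [b, w1, w2]
X = np.array([[1.0, 1.4, 0.2],
              [1.0, 1.5, 0.2],
              [1.0, 3.0, 1.1],
              [1.0, 4.1, 1.3]])
y = np.array([0.0, 0.0, 1.0, 1.0])

y_hat = sigmoid(X @ theta)
print(mean_bce(y_hat, y))                 # ~0.6692 before the update

theta = theta - eta * X.T @ (y_hat - y) / len(y)   # one batch gradient step
print(theta)                              # close to the slide's [0.0972, 0.4975, -0.0998]
print(mean_bce(sigmoid(X @ theta), y))    # ~0.6679 after the update
```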
Logistic Regression – Question

Classify Iris flowers based on petal length and petal width:
    x1 = [1.4  1.5  3.0  4.1]ᵀ,   x2 = [0.2  0.2  1.1  1.3]ᵀ

After the next backward step, what are the model parameters?
    b = ?   w1 = ?   w2 = ?
Tanh Function

    tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = (e^(2x) − 1) / (e^(2x) + 1)
            = 1 − 2/(e^(2x) + 1)

    tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = (1 − e^(−2x)) / (1 + e^(−2x))
            = −(e^(−2x) − 1) / (e^(−2x) + 1) = 2/(e^(−2x) + 1) − 1
Tanh Function

Derivative via the quotient rule, using tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)):

    tanh'(x) = [ (e^x + e^(−x))(e^x + e^(−x)) − (e^x − e^(−x))(e^x − e^(−x)) ] / (e^x + e^(−x))²
             = [ (e^x + e^(−x))² − (e^x − e^(−x))² ] / (e^x + e^(−x))²
             = 1 − [ (e^x − e^(−x)) / (e^x + e^(−x)) ]²
             = 1 − tanh²(x)
AI VIETNAM
AI Insight Course
Tanh Function

Derivative via the form tanh(x) = 2/(e^(−2x) + 1) − 1:

    tanh'(x) = [ 2/(e^(−2x) + 1) − 1 ]'
             = 4e^(−2x) / (e^(−2x) + 1)²
             = 4 [ (e^(−2x) + 1) − 1 ] / (e^(−2x) + 1)²
             = 4/(e^(−2x) + 1) − 4/(e^(−2x) + 1)²
             = 1 − [ 4/(e^(−2x) + 1)² − 4/(e^(−2x) + 1) + 1 ]
             = 1 − [ 2/(e^(−2x) + 1) − 1 ]²
             = 1 − tanh²(x)
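A quick numerical check (my own NumPy snippet, not from the slides) that the derived identity tanh'(x) = 1 − tanh²(x) holds:

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 7)
analytic = 1.0 - np.tanh(x) ** 2            # derivative from the derivation above

eps = 1e-6
numeric = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
print(np.max(np.abs(analytic - numeric)))   # tiny, confirming tanh'(x) = 1 - tanh^2(x)
```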
Logistic Regression – Tanh

Model and loss:
    z = θᵀx
    ŷ = tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))
    L = (1/N) [ −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) ]

Derivative:
    ∂L/∂θ = (∂L/∂ŷ)(∂ŷ/∂z)(∂z/∂θ)
    ∂L/∂ŷ = (1/N) [ −y/ŷ + (1 − y)/(1 − ŷ) ] = (1/N) (ŷ − y) / (ŷ(1 − ŷ))
    ∂ŷ/∂z = 1 − ŷ²
    ∂z/∂θ = x
    ∂L/∂θ = (1/N) xᵀ [ (ŷ − y)(1 + ŷ) / ŷ ]
Logistic Regression – MSE

Model and loss:
    z = θᵀx
    ŷ = σ(z) = 1 / (1 + e^(−z))
    L = (ŷ − y)²

Derivative:
    ∂L/∂θ = (∂L/∂ŷ)(∂ŷ/∂z)(∂z/∂θ)
    ∂L/∂ŷ = 2(ŷ − y)
    ∂ŷ/∂z = ŷ(1 − ŷ)
    ∂z/∂θ = x
    ∂L/∂θ = (2/N) xᵀ [ (ŷ − y) ŷ (1 − ŷ) ]
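A sketch of this MSE gradient (assuming NumPy; two of the samples from the earlier slides are reused for the call, the function name is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_gradient(X, y, theta):
    """dL/dtheta = (2/N) X^T [(y_hat - y) * y_hat * (1 - y_hat)] for sigmoid + MSE."""
    y_hat = sigmoid(X @ theta)
    return 2.0 * X.T @ ((y_hat - y) * y_hat * (1 - y_hat)) / X.shape[0]

X = np.array([[1.0, 1.4, 0.2],
              [1.0, 4.1, 1.3]])
y = np.array([0.0, 1.0])
theta = np.array([0.1, 0.5, -0.1])
print(mse_gradient(X, y, theta))
```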
Summary

1) Pick all the samples from the training data.
2) Compute the output ŷ:
       z = θᵀx
       ŷ = σ(z) = 1 / (1 + e^(−z))
3) Compute the loss (binary cross-entropy):
       L(θ) = (1/N) [ −yᵀ log ŷ − (1 − y)ᵀ log(1 − ŷ) ]
4) Compute the derivative:
       L'_θ = (1/N) xᵀ(ŷ − y)
5) Update the parameters:
       θ = θ − η L'_θ,   where η is the learning rate

[Plot] The sigmoid function y = 1 / (1 + e^(−x)) over (−∞, +∞).
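Putting the five steps together, a minimal full-batch training loop (a sketch, assuming NumPy and the toy Iris data from the earlier slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Full-batch logistic regression with gradient descent, following steps 1-5.
X = np.array([[1.0, 1.4, 0.2],
              [1.0, 1.5, 0.2],
              [1.0, 3.0, 1.1],
              [1.0, 4.1, 1.3]])           # leading 1 is the bias input
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([0.1, 0.5, -0.1])        # [b, w1, w2]
eta = 0.01                                # learning rate

for epoch in range(1000):
    y_hat = sigmoid(X @ theta)                        # 2) compute output
    loss = np.mean(-y * np.log(y_hat)
                   - (1 - y) * np.log(1 - y_hat))     # 3) binary cross-entropy
    grad = X.T @ (y_hat - y) / len(y)                 # 4) gradient
    theta = theta - eta * grad                        # 5) update parameters

print(theta, loss)
```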