Week 3 - Lecture Slides - Logistic Regression
Summer 2024
Today’s Agenda
• Group Quiz
• Generalized Linear Models
• Case Study: Sentiment Analysis
• Logistic Regression
• Decision Boundary
• Loss Function
• Evaluation Metrics
Sentiment Classifier
[Diagram: Sentence from review (input x) → Classifier Model → Predicted class (output ŷ)]
Converting Text to Numbers (Vectorizing):
Bag of Words
Pre-Processing: Sample Dataset

Review                                                                    Sentiment
“Sushi was great, the food was awesome, but the service was terrible”    +1
…                                                                         …
“Terrible food; the sushi was rancid.”                                    -1

After the vectorizer (bag-of-words counts):

sushi  was  great  the  food  awesome  but  service  terrible  rancid  Sentiment
1      3    1      2    1     1        1    1        1         0       +1
…      …    …      …    …     …        …    …        …         …       …
1      1    0      1    1     0        0    0        1         1       -1
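A minimal sketch of how the bag-of-words counts above could be produced. The vocabulary list and the crude tokenizer (lowercase, letters only) are assumptions for illustration, not the exact preprocessing used on the slide.

```python
import re
from collections import Counter

# Assumed vocabulary, in the same order as the table above.
VOCAB = ["sushi", "was", "great", "the", "food",
         "awesome", "but", "service", "terrible", "rancid"]

def bag_of_words(review: str) -> list[int]:
    """Count how often each vocabulary word appears in the review."""
    tokens = re.findall(r"[a-z]+", review.lower())   # crude tokenizer (assumption)
    counts = Counter(tokens)
    return [counts[word] for word in VOCAB]

print(bag_of_words("Sushi was great, the food was awesome, "
                   "but the service was terrible"))
# -> [1, 3, 1, 2, 1, 1, 1, 1, 1, 0]
```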
Attempt 1: Simple Threshold Classifier

Idea: Use a list of good words and bad words, and classify a review by the most frequent type of word.

Simple Threshold Classifier
  Input x: sentence from review
  Count the number of positive and negative words in x
  If num_positive > num_negative:
      ŷ = +1
  Else:
      ŷ = −1

Example: “Sushi was great, the food was awesome, but the service was terrible”

Word       Good?
sushi      None
was        None
great      Good
the        None
food       None
but        None
awesome    Good
service    None
terrible   Bad
rancid     Bad
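A small sketch of the threshold rule above, using good/bad word sets that mirror the table; the helper name and the simple tokenization are illustrative assumptions.

```python
GOOD_WORDS = {"great", "awesome"}
BAD_WORDS = {"terrible", "rancid"}

def threshold_classify(review: str) -> int:
    """Return +1 if the review has more good words than bad words, else -1."""
    tokens = review.lower().replace(",", " ").split()
    num_positive = sum(t in GOOD_WORDS for t in tokens)
    num_negative = sum(t in BAD_WORDS for t in tokens)
    return +1 if num_positive > num_negative else -1

print(threshold_classify("Sushi was great, the food was awesome, "
                         "but the service was terrible"))  # +1 (2 good vs 1 bad)
```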
Limitations of Attempt 1 (Simple Threshold Classifier)
Longer sequences of words result in more context, more features, and a greater chance of overfitting.
Words Have Different Degrees of Sentiment

Idea: Use the regression model we learned! The output will be the sentiment:

Predicted Sentiment = ŷ = Σ_{j=1}^{D} w_j φ_j(x)

Word       Weight
sushi      0
was        0
great      1
the        0
food       0
awesome    2
but        0
service    0
terrible   −1

Feature vector for “Sushi was great, the food was awesome, but the service was terrible”:

         φ1(x)  φ2(x)  φ3(x)  φ4(x)  φ5(x)  φ6(x)   φ7(x)  φ8(x)    φ9(x)
         sushi  was    great  the    food   awesome  but    service  terrible
Counts:  1      3      1      2      1      1        1      1        1

Predicted Sentiment = ŷ = (1·1) + (2·1) + (−1·1) = 2
Attempt 2: Linear Regression

Score(x^(i)) = ŝ
  = w_0 + w_1 φ_1(x^(i)) + w_2 φ_2(x^(i)) + … + w_D φ_D(x^(i))
  = Σ_{j=0}^{D} w_j φ_j(x^(i))
  = w^T φ(x^(i))
Linear Classifier
  Input x: sentence from review
  Compute Score(x)
  If Score(x) > 0:
      ŷ = +1
  Else:
      ŷ = −1
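A sketch of the linear classifier: compute Score(x) = w^T φ(x) and threshold at 0. The weights are the toy values from the earlier word-weight table (with w_0 = 0), not learned parameters.

```python
import numpy as np

# Toy weights in the same order as φ(x): sushi, was, great, the, food,
# awesome, but, service, terrible, rancid (assumption; great=+1, awesome=+2, terrible=-1).
w = np.array([0, 0, 1, 0, 0, 2, 0, 0, -1, 0], dtype=float)

def score(phi_x: np.ndarray) -> float:
    """Score(x) = w^T φ(x), with the intercept w_0 assumed to be 0."""
    return float(w @ phi_x)

def classify(phi_x: np.ndarray) -> int:
    """ŷ = +1 if Score(x) > 0, else -1."""
    return +1 if score(phi_x) > 0 else -1

phi = np.array([1, 3, 1, 2, 1, 1, 1, 1, 1, 0], dtype=float)  # bag-of-words vector from above
print(score(phi), classify(phi))                              # 2.0, +1
```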
Consider if only two words had non-zero coefficients:

Word          Coefficient   Weight
(intercept)   w_0           0.0
awesome       w_1           1.0
awful         w_2           -1.5

[Figure: reviews plotted in the (#awesome, #awful) plane]

Decision Boundary: 1.0 · #awesome − 1.5 · #awful = 0

[Figure: the same plane with the line 1.0 · #awesome − 1.5 · #awful = 0 separating the ŷ = +1 region from the ŷ = −1 region]
Issue: How do we train this?

[Figure: the decision boundary 1.0 · #awesome − 1.5 · #awful = 0 in the (#awesome, #awful) plane; which weights w should we pick?]
Convexity
Probabilities
Examples:
“The sushi & everything else were awesome!”
• Definite positive (+1)
• P(y = +1 | x = “The sushi & everything else were awesome!”) = 0.99

Notes:
Estimating the probability improves interpretability: it is unclear how much better a score of 5 is than a score of 3, but it is clear how much better a probability of 0.75 is than a probability of 0.5.
Connecting Score & Probability

Idea: Let’s try to relate the value of Score(x) to P(y = +1 | x).

[Figure: the decision boundary in the (#awesome, #awful) plane]

What if Score(x) is positive?
What if Score(x) is negative?
What if Score(x) is 0?
Connecting Score & Probability

Want: a function that takes numbers that are arbitrarily large or small and maps them between 0 and 1.

sigmoid(Score(x)) = 1 / (1 + e^(−Score(x)))

Score(x)   sigmoid(Score(x))
−∞         1 / (1 + e^(∞))  = 0
−2         1 / (1 + e^(2))  ≈ 0.12
0          1 / (1 + e^(0))  = 0.5
2          1 / (1 + e^(−2)) ≈ 0.88
∞          1 / (1 + e^(−∞)) = 1
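A quick numeric check of the sigmoid mapping in the table above (a minimal sketch; using NumPy is an assumption).

```python
import numpy as np

def sigmoid(s: float) -> float:
    """sigmoid(s) = 1 / (1 + e^(-s)): squashes any real score into (0, 1)."""
    return float(1.0 / (1.0 + np.exp(-s)))

for s in [-100, -2, 0, 2, 100]:
    print(s, round(sigmoid(s), 3))
# -100 -> ~0.0, -2 -> ~0.119, 0 -> 0.5, 2 -> ~0.881, 100 -> ~1.0
```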
Logistic Function

P(y = +1 | x, w) = sigmoid(Score(x)) = 1 / (1 + e^(−w^T φ(x)))

If P(y = +1 | x, w) > 0.5 (equivalently, w^T φ(x) > 0):
    ŷ = +1
Else:
    ŷ = −1
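Putting the pieces together, a sketch of P(y = +1 | x, w) and the resulting classification rule (thresholding the probability at 0.5 is the same as thresholding the score at 0). The weights and feature vector reuse the toy values from earlier and are assumptions for illustration.

```python
import numpy as np

def predict_proba(w: np.ndarray, phi_x: np.ndarray) -> float:
    """P(y = +1 | x, w) = 1 / (1 + exp(-w^T φ(x)))."""
    return float(1.0 / (1.0 + np.exp(-(w @ phi_x))))

def predict(w: np.ndarray, phi_x: np.ndarray) -> int:
    """ŷ = +1 if P(y = +1 | x, w) > 0.5, else -1 (same as Score(x) > 0)."""
    return +1 if predict_proba(w, phi_x) > 0.5 else -1

w = np.array([0, 0, 1, 0, 0, 2, 0, 0, -1, 0], dtype=float)
phi = np.array([1, 3, 1, 2, 1, 1, 1, 1, 1, 0], dtype=float)
print(round(predict_proba(w, phi), 3), predict(w, phi))   # ~0.881, +1
```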
Quality Metric = Likelihood

Want: compute the probability of seeing our dataset for every possible setting of w, and find the w that makes the data most likely!

Now that we have our new model, we will talk about how to choose ŵ to be the “best fit”. The choice of w affects how likely seeing our dataset is.

ℓ(w) = ∏_i P(y^(i) | x^(i), w)

P(y^(i) = +1 | x^(i), w) = 1 / (1 + e^(−w^T φ(x^(i))))

P(y^(i) = −1 | x^(i), w) = e^(−w^T φ(x^(i))) / (1 + e^(−w^T φ(x^(i))))

[Figure: candidate decision boundaries in the (#awesome, #awful) plane for different settings of w]
Loss Function

ŵ = argmax_w ℓ(w) = argmax_w ∏_{i=1}^{N} P(y^(i) | x^(i), w)

ŵ = argmax_w ℓ(w) = argmax_w log ℓ(w) = argmax_w Σ_{i=1}^{N} log P(y^(i) | x^(i), w)

[Figure: the per-example loss curves −log(a) and −log(1 − a)]
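The log-likelihood objective above can be evaluated directly. Here is a minimal sketch with toy data and weights of my own; it uses the compact form P(y^(i) | x^(i), w) = sigmoid(y^(i) · w^T φ(x^(i))), which combines the +1 and −1 cases from the previous slide.

```python
import numpy as np

def log_likelihood(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """sum_i log P(y_i | x_i, w) for labels y_i in {-1, +1},
    using P(y_i | x_i, w) = sigmoid(y_i * w^T x_i)."""
    scores = X @ w
    return float(np.sum(np.log(1.0 / (1.0 + np.exp(-y * scores)))))

# Toy data: 2 features (#awesome, #awful), 3 reviews (assumption).
X = np.array([[3.0, 0.0], [1.0, 2.0], [0.0, 4.0]])
y = np.array([+1, -1, -1])
w = np.array([1.0, -1.5])
print(log_likelihood(w, X, y))   # closer to 0 is better (each probability near 1)
```

Maximizing this quantity over w (equivalently, minimizing its negative) is what training logistic regression does.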
             Setting 1   Setting 2   Setting 3
w_0          0           0           0
w_#awesome   +1          +2          +6
w_#awful     -1          -2          -6

[Figure: for each weight setting, the predicted probability 1 / (1 + e^(−w^T h(x))) plotted over the two features x_1, x_2]
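One way to read the table and figure above (my own toy example): scaling the same weights up makes the predicted probabilities more extreme for the same review, even though the sign of the score, and hence the decision, does not change.

```python
import numpy as np

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

phi = np.array([2.0, 1.0])           # a review with 2 "awesome" and 1 "awful" (assumption)
for scale in [1, 2, 6]:              # w_#awesome = +scale, w_#awful = -scale, w_0 = 0
    w = np.array([scale, -scale])
    print(scale, round(float(sigmoid(w @ phi)), 3))
# 1 -> 0.731, 2 -> 0.881, 6 -> 0.998  (same decision, increasingly confident)
```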
How do we extend Logistic Regression to Multiclass classification?
• Approach 1: one-versus-one
• Computationally very expensive
• Approach 2: one-versus-rest
• Approach 3: discriminant functions
One-vs-all (one-vs-rest)

[Figure: a three-class dataset in the (x_1, x_2) plane split into three binary problems; each classifier h_θ^(i)(x) separates class i from the rest]

Class 1, Class 2, Class 3

h_θ^(i)(x) = P(y = i | x; θ)   (i = 1, 2, 3)

Slide credit: Andrew Ng
One-vs-all
• Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i
https://dataaspirant.com/difference-between-softmax-function-and-sigmoid-function/
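A sketch of one-vs-rest training; using scikit-learn is my own choice here, not something the lecture prescribes. One binary logistic regression is fit per class, and prediction picks the class whose classifier is most confident.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2-feature data with 3 classes (assumption).
X = np.array([[1, 1], [2, 1], [8, 9], [9, 8], [1, 9], [2, 8]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])

# One binary classifier per class: "is it class i or not?"
classifiers = [LogisticRegression().fit(X, (y == i).astype(int))
               for i in np.unique(y)]

# Predict by taking the class whose classifier gives the highest P(y = i | x).
probs = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
print(probs.argmax(axis=1))   # should recover [0, 0, 1, 1, 2, 2]
```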
Evaluation Metrics
True Positive (TP): an example the model predicts as positive that truly is positive.

Accuracy
• the percentage of correct predictions on the test data: the number of correct predictions divided by the total number of predictions.

Precision
• the fraction of true positives among all examples the model predicted to belong to a class: TP / (TP + FP).

Recall
• the fraction of examples truly belonging to a class that the model predicted as belonging to it: TP / (TP + FN).
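A small sketch computing the three metrics from confusion-matrix counts; the counts themselves are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp: int, fp: int) -> float:
    """Of everything predicted positive, the fraction that really is positive."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Of everything truly positive, the fraction we actually caught."""
    return tp / (tp + fn)

tp, tn, fp, fn = 8, 7, 2, 3   # hypothetical counts
print(accuracy(tp, tn, fp, fn), precision(tp, fp), round(recall(tp, fn), 3))
# 0.75  0.8  0.727
```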
Evaluation Metrics

How would we evaluate the performance of a COVID-19 antigen test kit?

Let’s take a hypothetical kit that tests 20 individuals for a potential case of COVID-19.

[Table: the reality (infected / not infected) versus what the model predicts, for the 20 individuals]
Arithmetic Average vs Harmonic Average
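A short worked comparison (toy numbers of my own): the harmonic average punishes imbalance between the two quantities, which is why the F1 score uses it rather than the arithmetic average of precision and recall.

```python
precision, recall = 1.0, 0.5

arithmetic = (precision + recall) / 2                       # 0.75
harmonic = 2 * precision * recall / (precision + recall)    # ~0.667 (this is F1)
print(arithmetic, round(harmonic, 3))
```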
Jaccard Index and Dice Score
Other Applications
• Semantic Segmentation
• Automatic Speech Recognition
Micro/Macro Average

Our model is 50% accurate?

Last experiment:
Class     Frequency   Metric
Class A   5           100%
Class B   95          0%
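A sketch of the class-imbalance point above (the per-class counts are my reading of the table): with 5 examples of Class A all correct and 95 examples of Class B all wrong, the unweighted "macro" average of per-class accuracy is 50%, while the overall (micro-style) accuracy is only 5%.

```python
# Hypothetical per-class counts matching the table above.
correct = {"A": 5, "B": 0}
total   = {"A": 5, "B": 95}

per_class = {c: correct[c] / total[c] for c in total}    # A: 1.0, B: 0.0
macro = sum(per_class.values()) / len(per_class)         # 0.5  ("50% accurate?")
micro = sum(correct.values()) / sum(total.values())      # 0.05
print(per_class, macro, micro)
```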
Coming up Next…
• Week 4 – Clustering
  • K-Means, Gaussian Mixture Models, & EM
• Homework #2 due Friday, May 31 (@ 7pm Pacific Time)
HW2 – Q5 Walkthrough
https://colab.research.google.com/drive/1iJc23kLBuCEeIygTysHxJc83BCfbHUVK
Questions?