Week 3 - Lecture Slides - Logistic Regression

CS6140: Machine Learning

Week 3 – Logistic Regression

Dr. Ryan Rad

Summer 2024
Today’s Agenda

• Group Quiz
• Generalized Linear Models
• Case Study: Sentiment Analysis
• Logistic Regression
• Decision Boundary
• Loss Function
• Evaluation Metrics

Sentiment Classifier

In our example, we want to classify a restaurant review as positive or negative.

Input x: sentence from review → Classifier Model → Output y: predicted class
Converting Text to Numbers (Vectorizing):
Bag of Words

Idea: One feature per word!


Example: “Sushi was great, the food was awesome, but the service was terrible”

sushi | was | great | the | food | awesome | but | service | terrible
  1   |  3  |   1   |  2  |   1  |    1    |  1  |    1    |    1

This seems too simple, right?

Stay tuned for issues that arise and how to address them :)

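The bag-of-words idea above can be sketched in a few lines of Python. This is a minimal illustration (a real vectorizer such as scikit-learn's CountVectorizer handles tokenization, casing, and unseen words far more carefully):

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["sushi", "was", "great", "the", "food", "awesome", "but", "service", "terrible"]
x = bag_of_words("Sushi was great, the food was awesome, but the service was terrible", vocab)
print(x)  # [1, 3, 1, 2, 1, 1, 1, 1, 1]
```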
Pre-Processing: Sample Dataset

Review Sentiment
“Sushi was great, the food was awesome, but the service +1
was terrible”
… …
“Terrible food; the sushi was rancid.” -1

Vectorizer

sushi | was | great | the | food | awesome | but | service | terrible | rancid | Sentiment
  1   |  3  |   1   |  2  |   1  |    1    |  1  |    1    |    1     |    0   |    +1
  …   |  …  |   …   |  …  |   …  |    …    |  …  |    …    |    …     |    …   |    …
  1   |  1  |   0   |  1  |   1  |    0    |  0  |    0    |    1     |    1   |    −1
Attempt 1: Simple Threshold Classifier

Idea: Use a list of good words and bad words, classify review by the most frequent type of word

Simple Threshold Classifier
  Input x: sentence from review
  Count the number of positive and negative words in x
  If num_positive > num_negative:
      ŷ = +1
  Else:
      ŷ = −1

Word       Good?
sushi      None
was        None
great      Good
the        None
food       None
but        None
awesome    Good
service    None
terrible   Bad
rancid     Bad

Example: “Sushi was great, the food was awesome, but the service was terrible”
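The threshold classifier above can be sketched directly; the good/bad word sets here are just the toy lexicon from the table, not a real sentiment word list:

```python
def threshold_classify(tokens, good_words, bad_words):
    """Predict +1 if positive words outnumber negative words, else -1."""
    num_positive = sum(1 for t in tokens if t in good_words)
    num_negative = sum(1 for t in tokens if t in bad_words)
    return +1 if num_positive > num_negative else -1

good = {"great", "awesome"}
bad = {"terrible", "rancid"}
tokens = "sushi was great the food was awesome but the service was terrible".split()
print(threshold_classify(tokens, good, bad))  # 2 good words vs 1 bad word -> +1
```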
Limitations of Attempt 1 (Simple Threshold Classifier)

• Words have different degrees of sentiment.


Awesome > Great
How can we weigh them differently?

• Single words are not enough sometimes…


“Good” → Positive
“Not Good” → Negative

• How do we get list of positive/negative words?


Single Words Are Sometimes Not Enough!

What if instead of making each feature one word, we made it two?


• Unigram: a sequence of one word
• Bigram: a sequence of two words
• N-gram: a sequence of n words
“Sushi was good, the food was good, the service was not good”

Unigrams:
sushi | was | good | the | food | service | not
  1   |  3  |  3   |  2  |  1   |    1    |  1

Bigrams:
sushi was | was good | good the | the food | food was | the service | service was | was not | not good
    1     |    2     |    2     |    1     |    1     |      1      |      1      |    1    |    1
Longer sequences of words result in more context and more features, but also a greater chance of overfitting.
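N-gram extraction is short enough to sketch; this reproduces the bigram counts from the example above:

```python
def ngrams(tokens, n):
    """Return all n-grams (joined as strings) from a token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "sushi was good the food was good the service was not good".split()
print(ngrams(tokens, 2).count("was good"))  # the bigram "was good" appears 2 times
```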
Words Have Different Degrees of Sentiments

What if we generalize good/bad to a numeric weighting per word?


Word       Good?   Weight
sushi      None      0
was        None      0
great      Good      1
the        None      0
food       None      0
but        None      0
awesome    Good      2
service    None      0
terrible   Bad      −1
rancid     Bad      −2
How do we get the word weights?

What if we learn them from the data?


y = w₀ + w₁φ₁(x) + w₂φ₂(x) + … + w_D φ_D(x)

φ₁(x)  φ₂(x)  φ₃(x)  φ₄(x)  φ₅(x)  φ₆(x)   φ₇(x)  φ₈(x)    φ₉(x)
sushi   was   great   the   food  awesome   but  service  terrible
  1      3      1      2      1      1       1      1        1

Word       Weight
sushi       w₁
was         w₂
great       w₃
the         w₄
food        w₅
awesome     w₆
but         w₇
service     w₈
terrible    w₉

In linear regression we learned the weights for each feature.
Can we do something similar here?
Attempt 2: Linear Regression
y = w₀ + w₁φ₁(x) + w₂φ₂(x) + … + w_D φ_D(x)

Idea: Use the regression model we learned! The output will be the sentiment!

Predicted Sentiment = ŷ = Σ_{j=0}^{D} w_j φ_j(x)

Word       Weight
sushi        0
was          0
great        1
the          0
food         0
awesome      2
but          0
service      0
terrible    −1

“Sushi was great, the food was awesome, but the service was terrible”

Predicted Sentiment = ŷ = (1·1) + (2·1) + (−1·1) = 2
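The weighted sum above can be sketched as follows (the counts and weights are the toy values from this slide's tables):

```python
def score(features, weights, w0=0.0):
    """Linear score: w0 + sum over features of weight * count."""
    return w0 + sum(w * f for w, f in zip(weights, features))

# "Sushi was great, the food was awesome, but the service was terrible"
counts  = [1, 3, 1, 2, 1, 1, 1, 1, 1]    # sushi, was, great, the, food, awesome, but, service, terrible
weights = [0, 0, 1, 0, 0, 2, 0, 0, -1]   # toy learned weights from the table
print(score(counts, weights))  # (1*1) + (2*1) + (-1*1) = 2
```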
Attempt 2: Linear Regression

Score(x⁽ⁱ⁾) = ŝ
            = w₀ + w₁φ₁(x⁽ⁱ⁾) + w₂φ₂(x⁽ⁱ⁾) + … + w_D φ_D(x⁽ⁱ⁾)
            = Σ_{j=0}^{D} w_j φ_j(x⁽ⁱ⁾)
            = wᵀ φ(x⁽ⁱ⁾)

This score will always be numerical!


Attempt 3: Linear Classifier
Idea: Only predict the sign of the output!

Predicted Sentiment = ŷ = sign(Score(x))

Linear Classifier
  Input x: sentence from review
  Compute Score(x)
  If Score(x) > 0:
      ŷ = +1
  Else:
      ŷ = −1
Decision Boundary

Consider if only two words had non-zero coefficients:

Word          Coefficient   Weight
(constant)        w₀          0.0
awesome           w₁          1.0
awful             w₂         −1.5

ŝ = 1.0 · #awesome − 1.5 · #awful

[Plot: reviews plotted in the (#awesome, #awful) plane.]
On the same plot, the decision boundary is the line where the score is zero:

1.0 · #awesome − 1.5 · #awful = 0

[Plot: this line separates the (#awesome, #awful) plane; points with ŝ > 0 are predicted +1 and points with ŝ < 0 are predicted −1.]
Issue: How do we train this?

Say we were to use the MSE…

MSE = (1/n) Σ_{i=1}^{n} (y⁽ⁱ⁾ − sign(Score(x⁽ⁱ⁾)))²

The derivative of the sign function is 0!

Hence, Gradient Descent will no longer work :(


Mathematical Definition

Can we use MSE for a classification task?

One idea is to just model the process of finding ŵ based on what we discussed in linear regression using MSE:

ŵ = argmin_w (1/2n) Σ_{i=1}^{n} ???
Filling in the blank: count the misclassifications with an indicator loss.

ŵ = argmin_w (1/2n) Σ_{i=1}^{n} 𝕀[y⁽ⁱ⁾ ≠ ŷ⁽ⁱ⁾]

Great! This makes sense conceptually!
Will this work?
Will this work?

Assume h₁(x) = #awesome, so w₁ is its coefficient and w₀ is fixed.

[Plot: the 0/1 loss as a function of w₁ is piecewise constant (a flat staircase), so its gradient is 0 almost everywhere and gradient descent makes no progress.]
Convexity

Taken from Prof. Matt Gormley, CMU



Quality Metric for Classification

The MSE loss function doesn’t work here for several reasons:

• The outputs are discrete values with no ordered nature, so we need a different way to frame how close a prediction is to a certain correct category.
• The MSE loss function for a classification task is not continuous, differentiable, or convex, so we can’t use an optimization algorithm like Gradient Descent to find an optimal set of weights.

Note: Convexity is an important concept in Machine Learning. By minimizing error, we want to find the global minimum, and that’s exactly what a convex function guarantees we can find.

Let’s frame this problem in terms of probabilities instead.

Probabilities

Assume that there is some randomness in the world; instead, we will try to model the probability of a positive/negative label.

Examples:
“The sushi & everything else were awesome!”
• Definite positive (+1)
• P(y = +1 | x = “The sushi & everything else were awesome!”) = 0.99

“The sushi was alright, the service was OK”
• Not as sure
• P(y = −1 | x = “The sushi was alright, the service was OK”) = 0.5

Use probability as the measurement of certainty.

P(y | x)

Idea: Estimate probabilities P̂(y | x) and use those for prediction.

Probability Classifier
  Input x: sentence from review
  Estimate class probability P̂(y = +1 | x)
  If P̂(y = +1 | x) > 0.5:
      ŷ = +1
  Else:
      ŷ = −1

Notes:
Estimating the probability improves interpretability.
- It is unclear how much better a score of 5 is than a score of 3, but it is clear how much better a probability of 0.75 is than a probability of 0.5.
Connecting Score & Probability

Idea: Let’s try to relate the value of Score(x) to P̂(y = +1 | x).

What if Score(x) is positive?
What if Score(x) is negative?
What if Score(x) is 0?

[Plot: the decision boundary 1.0 · #awesome − 1.5 · #awful = 0 in the (#awesome, #awful) plane.]
Connecting Score & Probability

Score(x⁽ⁱ⁾) = wᵀφ(x⁽ⁱ⁾) ranges over (−∞, +∞):

Score → −∞ : very sure ŷ⁽ⁱ⁾ = −1,           P̂(y⁽ⁱ⁾ = +1 | x⁽ⁱ⁾) = 0
Score = 0  : not sure if ŷ⁽ⁱ⁾ = −1 or +1,   P̂(y⁽ⁱ⁾ = +1 | x⁽ⁱ⁾) = 0.5
Score → +∞ : very sure ŷ⁽ⁱ⁾ = +1,           P̂(y⁽ⁱ⁾ = +1 | x⁽ⁱ⁾) = 1
Logistic Function

Want: a function that takes numbers arbitrarily large/small and maps them between 0 and 1.

sigmoid(Score(x)) = 1 / (1 + e^(−Score(x)))

Score(x)    sigmoid(Score(x))
 −∞         1 / (1 + e^(−(−∞)))  = 0
 −2         1 / (1 + e^(2))      ≈ 0.12
  0         1 / (1 + e^(0))      = 0.5
  2         1 / (1 + e^(−2))     ≈ 0.88
 +∞         1 / (1 + e^(−∞))     = 1
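The sigmoid is one line of Python; a quick numerical check of the mapping described above:

```python
import math

def sigmoid(score):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(0))   # 0.5
print(sigmoid(2))   # ~0.88, and sigmoid(-2) ~ 0.12
print(sigmoid(20))  # very close to 1
```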
Logistic Function
P(y⁽ⁱ⁾ = +1 | x⁽ⁱ⁾, w) = sigmoid(Score(x⁽ⁱ⁾)) = 1 / (1 + e^(−wᵀφ(x⁽ⁱ⁾)))

Logistic Regression Classifier
  Input x: sentence from review
  Estimate class probability P̂(y = +1 | x, ŵ) = sigmoid(ŵᵀ h(x))
  If P̂(y = +1 | x, ŵ) > 0.5:
      ŷ = +1
  Else:
      ŷ = −1
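Putting the pieces together, the logistic regression classifier in the box above can be sketched as follows (the counts and weights are the toy values from the running example):

```python
import math

def predict_proba(features, weights, w0=0.0):
    """P(y = +1 | x, w) = sigmoid(w^T h(x))."""
    s = w0 + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-s))

def predict(features, weights, w0=0.0):
    """Predict +1 when the estimated probability exceeds 0.5, else -1."""
    return +1 if predict_proba(features, weights, w0) > 0.5 else -1

counts  = [1, 3, 1, 2, 1, 1, 1, 1, 1]    # bag-of-words counts for the example review
weights = [0, 0, 1, 0, 0, 2, 0, 0, -1]   # toy weights
print(predict_proba(counts, weights))    # sigmoid(2) ~ 0.88
print(predict(counts, weights))          # +1
```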
Quality Metric = Likelihood

Want to compute the probability of seeing our dataset for every possible setting
for 𝑤. Find 𝑤 that makes data most likely!

Data point     φ₁(x)   φ₂(x)    y     Choose w to maximize
x⁽¹⁾, y⁽¹⁾       2       1     +1     P(y⁽¹⁾ = +1 | x⁽¹⁾, w)
x⁽²⁾, y⁽²⁾       0       2     −1     P(y⁽²⁾ = −1 | x⁽²⁾, w)
x⁽³⁾, y⁽³⁾       3       3     −1     P(y⁽³⁾ = −1 | x⁽³⁾, w)
x⁽⁴⁾, y⁽⁴⁾       4       1     +1     P(y⁽⁴⁾ = +1 | x⁽⁴⁾, w)
Learn ŵ

Now that we have our new model, we will talk about how to choose ŵ to be the “best fit”.
The choice of w affects how likely seeing our dataset is:

ℓ(w) = ∏_{i=1}^{n} P(y⁽ⁱ⁾ | x⁽ⁱ⁾, w)

P(y⁽ⁱ⁾ = +1 | x⁽ⁱ⁾, w) = 1 / (1 + e^(−wᵀφ(x⁽ⁱ⁾)))

P(y⁽ⁱ⁾ = −1 | x⁽ⁱ⁾, w) = e^(−wᵀφ(x⁽ⁱ⁾)) / (1 + e^(−wᵀφ(x⁽ⁱ⁾)))

[Plot: the four data points in the (#awesome, #awful) plane.]
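The likelihood of the toy dataset above can be computed directly for any candidate w (the weight vector below is an arbitrary illustrative choice, not a fitted one):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(data, w):
    """Product over examples of P(y_i | x_i, w), where P(+1 | x) = sigmoid(w . phi(x))."""
    total = 1.0
    for phi, y in data:
        p_pos = sigmoid(sum(wj * fj for wj, fj in zip(w, phi)))
        total *= p_pos if y == +1 else (1.0 - p_pos)
    return total

# The four data points from the slide: (phi1(x), phi2(x)) -> y
data = [([2, 1], +1), ([0, 2], -1), ([3, 3], -1), ([4, 1], +1)]
print(likelihood(data, [1.0, -1.5]))  # probability of seeing these labels under this w
```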
Loss Function

Find the w that maximizes the likelihood:

ŵ = argmax_w ℓ(w) = argmax_w ∏_{i=1}^{n} P(y⁽ⁱ⁾ | x⁽ⁱ⁾, w)

Generally, we maximize the log-likelihood, which looks like

ŵ = argmax_w log(ℓ(w)) = argmax_w Σ_{i=1}^{n} log P(y⁽ⁱ⁾ | x⁽ⁱ⁾, w)

It is also commonly written by separating out positive and negative terms (with labels y ∈ {0, 1}):

Cost(h_w(x), y) = −log(h_w(x))       if y = 1   (positive terms)
                = −log(1 − h_w(x))   if y = 0   (negative terms)
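In practice we work with the log-likelihood, which turns the product into a sum; a sketch using the same toy dataset and the same illustrative weight vector:

```python
import math

def log_likelihood(data, w):
    """Sum over examples of log P(y_i | x_i, w) -- the quantity maximized during training."""
    total = 0.0
    for phi, y in data:
        p_pos = 1.0 / (1.0 + math.exp(-sum(wj * fj for wj, fj in zip(w, phi))))
        total += math.log(p_pos if y == +1 else 1.0 - p_pos)
    return total

data = [([2, 1], +1), ([0, 2], -1), ([3, 3], -1), ([4, 1], +1)]
print(log_likelihood(data, [1.0, -1.5]))  # always <= 0; closer to 0 means a better fit
```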

Decision Boundary

The decision boundary is the set of x such that

1 / (1 + e^(−wᵀx)) = 0.5

A little bit of algebra shows that this is equivalent to

1 = e^(−wᵀx)

and, taking the natural log of both sides,

0 = −Σ_{j=0}^{D} w_j x_j

So, our decision boundary is linear!


Complex Decision Boundaries?

What if we want to use a more complex decision boundary?


• Need more complex model/features! (More on this later)
The logistic function becomes “sharper” with larger coefficients:

w₀ = 0,  w#awesome = +1,  w#awful = −1
w₀ = 0,  w#awesome = +2,  w#awful = −2
w₀ = 0,  w#awesome = +6,  w#awful = −6

[Plot: sigmoid(wᵀh(x)) against #awesome − #awful for the three settings; larger coefficients make the curve steeper around 0.]

What does this mean for our predictions?


Because the 𝑆𝑐𝑜𝑟𝑒(𝑥) is getting larger in magnitude, the
probabilities are closer to 0 or 1!
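This steepening is easy to verify numerically; in the sketch below, scale stands in for the coefficient magnitude in the three weight settings above:

```python
import math

def p_positive(diff, scale):
    """P(y = +1) when w_awesome = scale, w_awful = -scale, diff = #awesome - #awful."""
    return 1.0 / (1.0 + math.exp(-scale * diff))

for scale in (1, 2, 6):
    print(scale, p_positive(1, scale))  # larger coefficients push the probability toward 1
```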
Binary classification vs. Multiclass classification

[Plot: two scatter plots in the (x₁, x₂) plane — two classes on the left, multiple classes on the right.]
How do we extend Logistic Regression to Multiclass classification?

• Approach 1: one-versus-one
• Computationally very expensive

• Approach 2: one-versus-rest
• Approach 3: discriminant functions
One-vs-all (one-vs-rest)

[Plot: a three-class dataset in the (x₁, x₂) plane split into three binary problems, with one classifier h⁽ⁱ⁾(x) per class separating class i from the rest.]

h⁽ⁱ⁾(x) = P(y = i | x; θ)    (i = 1, 2, 3)

Slide credit: Andrew Ng
One-vs-all
• Train a logistic regression classifier h⁽ⁱ⁾(x) for each class i to predict the probability that y = i
• Given a new input x, pick the class i that maximizes h⁽ⁱ⁾(x)
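A one-vs-rest prediction can be sketched with per-class weight vectors (the classifiers dict and all its weights below are hypothetical toy values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_rest_predict(x, classifiers):
    """Pick the class whose binary logistic classifier gives the highest probability.
    `classifiers` maps class label -> weight vector (toy setup)."""
    return max(classifiers,
               key=lambda c: sigmoid(sum(w * f for w, f in zip(classifiers[c], x))))

clf = {1: [2.0, -1.0], 2: [-1.0, 2.0], 3: [-1.0, -1.0]}  # hypothetical weights, two features
print(one_vs_rest_predict([3.0, 0.5], clf))  # class 1 scores highest for this input
```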
SoftMax

Read more on the difference between Softmax and Sigmoid (6 min):
https://dataaspirant.com/difference-between-softmax-function-and-sigmoid-function/
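For reference, softmax generalizes the sigmoid to multiple classes by normalizing a vector of scores into probabilities; a minimal sketch:

```python
import math

def softmax(scores):
    """Turn a vector of real-valued scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # largest score gets the largest probability
print(sum(probs))  # 1.0 (up to floating-point rounding)
```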
Evaluation Metrics
True Positive (TP):

• Predicted True and True in reality.

True Negative (TN):

• Predicted False and False in reality.

False Positive (FP):

• Predicted True and False in reality.

False Negative (FN):

• Predicted False and True in reality.


Confusion Matrix
Evaluation Metrics

Accuracy
• is defined as the percentage of correct predictions for the test data. It can be calculated easily by dividing the number of
correct predictions by the number of total predictions.
Precision
• is defined as the fraction of relevant examples (true positives) among all of the examples which were predicted to
belong in a certain class.
Recall
• is defined as the fraction of examples predicted to belong to a class among all of the examples that truly belong to the class.
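The three definitions above can be sketched from paired label lists (binary labels, with 1 as the positive class):

```python
def evaluation_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall from paired true/predicted label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(evaluation_metrics(y_true, y_pred))  # accuracy 4/6, precision 2/3, recall 2/3
```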
Evaluation Metrics
Evaluate the performance of a COVID-19 antigen test kit

Let’s take a hypothetical kit that tests 20 individuals for potential cases of COVID-19.

In this sample population we will state:

• 16 people do NOT have COVID-19 (so 4 people do)

The Reality vs. What the Model Predicts:

[Confusion-matrix table comparing the kit’s predictions against actual COVID-19 status.]
Arithmetic Average vs Harmonic Average
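Why a harmonic rather than arithmetic average? The F1 score (the harmonic mean of precision and recall) punishes imbalance between the two, which the arithmetic mean does not; a small sketch:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(1.0, 0.1))      # ~0.18: the harmonic mean is dragged down by the low recall
print((1.0 + 0.1) / 2)   # 0.55: the arithmetic mean hides the problem
```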
Jaccard Index and Dice Score
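Both overlap measures are simple set computations; a sketch on two small sets:

```python
def jaccard(a, b):
    """Jaccard index: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice score: 2 * |A intersect B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

a, b = {1, 2, 3, 4}, {3, 4, 5}
print(jaccard(a, b))  # 2/5 = 0.4
print(dice(a, b))     # 4/7 ~ 0.571
```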
Other Applications
Semantic Segmentation Automatic Speech Recognition
Micro/Macro Average

Our model is 50% accurate?

Last Experiment:

Class      Frequency   Metric
Class A        5        100%
Class B       95          0%
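The "Last Experiment" numbers above show why the averaging choice matters; a sketch computing both averages from per-class counts:

```python
def macro_micro(per_class_correct, per_class_total):
    """Macro: mean of per-class accuracies. Micro: pooled accuracy over all examples."""
    macro = sum(c / t for c, t in zip(per_class_correct, per_class_total)) / len(per_class_total)
    micro = sum(per_class_correct) / sum(per_class_total)
    return macro, micro

# Class A: 5/5 correct (100%); Class B: 0/95 correct (0%)
macro, micro = macro_micro([5, 0], [5, 95])
print(macro)  # 0.5  -> the misleading "50% accurate"
print(micro)  # 0.05 -> only 5% of examples are actually classified correctly
```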
Coming up Next…

• Week 4 – Clustering
  • K-Means, Gaussian Mixture Models, & EM
• Homework #2 due Friday, May 31 (@ 7pm Pacific Time)

HW2 – Q5 Walkthrough
https://colab.research.google.com/drive/1iJc23kLBuCEeIygTysHxJc83BCfbHUVK
Questions?
