CS585 Lecture, October 10th
2
Plan for Today
Classification: Evaluating and improving
performance
Introduction to Neural Networks
3
Classification: Improving
Performance
4
Classifier Evaluation: Confusion Matrix
                          Predicted class
                          Positive               Negative
Actual class   Positive   True Positive (TP)     False Negative (FN)
                                                 (Type II Error)

Sensitivity (Recall) = TP / (TP + FN)
5
Classifier Evaluation: Accuracy
Why don't we use accuracy as our metric?
Imagine we saw 1 million tweets:
100 of them talked about Delicious Pie Co.
999,900 talked about something else
A classifier that labels every tweet as "not about pie" would be 99.99% accurate, yet useless for finding the pie tweets.
7
Classifier Performance Metrics
Precision and recall provide two ways to
summarize the errors made for the positive class
(FP, FN).
9
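As a concrete illustration (ours, not from the slides), here is a minimal Python sketch of how precision, recall, and F1 could be computed from raw confusion-matrix counts; the counts are hypothetical:

    # hypothetical counts for the positive class
    tp, fp, fn = 90, 10, 30

    precision = tp / (tp + fp)   # of the items flagged positive, how many were right
    recall    = tp / (tp + fn)   # of the true positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)

    print(precision, recall, f1)  # 0.9, 0.75, ~0.818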
Receiver Operating Characteristic
Threshold: 0.5
  HAM:  TP = 3, FN = 2      SPAM: FP = 1, TN = 3
  TPR = TP / (TP + FN) = 3/5      FPR = FP / (FP + TN) = 1/4

Threshold: 0.2
  HAM:  TP = 4, FN = 1      SPAM: FP = 1, TN = 3
  TPR = TP / (TP + FN) = 4/5      FPR = FP / (FP + TN) = 1/4

Threshold: 0.8
  HAM:  TP = 3, FN = 2      SPAM: FP = 0, TN = 4
  TPR = TP / (TP + FN) = 3/5      FPR = FP / (FP + TN) = 0/4

[ROC curve: each threshold gives one (FPR, TPR) point; FPR runs from 0 to 1 on the x-axis, TPR on the y-axis]

TPR: Sensitivity | FPR: 1 - Specificity
10
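To make the numbers above concrete, a small sketch (ours, plain Python) that recomputes TPR and FPR for the threshold-0.5 case:

    # counts from the threshold = 0.5 example
    tp, fn = 3, 2        # HAM (treated as the positive class)
    fp, tn = 1, 3        # SPAM (negative class)

    tpr = tp / (tp + fn)   # sensitivity = 3/5 = 0.6
    fpr = fp / (fp + tn)   # 1 - specificity = 1/4 = 0.25
    print(tpr, fpr)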
ROC Area Under the Curve
[Same spam/ham example and thresholds (0.2, 0.5, 0.8) as the previous slide; the ROC curve is drawn through the resulting (FPR, TPR) points, and the shaded area under that curve is the ROC AUC.]
11
Receiver Operating Characteristic
[Same example as before, now comparing two ROC curves: a perfect classifier with AUC: 1.0 and a more realistic one with AUC: ~0.8.]
12
Receiver Operating Characteristic
[Same ROC plot: you want your classifier somewhere toward the upper left of the plot, with high AUC.]

TPR: Sensitivity | FPR: 1 - Specificity
13
Receiver Operating Characteristic
14
Precision - Recall Curve
15
ROC vs. Precision-Recall Curves
Both summarize model performance across different probability thresholds.
ROC curves should be used when there are roughly equal numbers of observations for each class.
Precision-Recall curves should be used when there is a moderate to large class imbalance (when we are interested in the positive class and there are only a few positive samples).
16
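If you are using scikit-learn, both curves can be computed from predicted scores; a small sketch (ours, with made-up data):

    import numpy as np
    from sklearn.metrics import roc_curve, precision_recall_curve, auc

    y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # toy labels
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])   # classifier scores

    fpr, tpr, _ = roc_curve(y_true, y_score)
    precision, recall, _ = precision_recall_curve(y_true, y_score)

    print("ROC AUC:", auc(fpr, tpr))
    print("PR  AUC:", auc(recall, precision))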
3-class Confusion Matrix
17
Macroaveraging and Microaveraging
Macroaveraging:
compute the performance for each class, and then
average over classes
Microaveraging:
collect decisions for all classes into one confusion
matrix
compute precision and recall from that table.
18
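A small sketch (ours, using scikit-learn) showing the difference between the two averaging schemes on a toy 3-class prediction set:

    from sklearn.metrics import precision_score, recall_score

    y_true = ["urgent", "normal", "spam", "spam", "spam", "normal", "urgent", "spam"]
    y_pred = ["urgent", "spam",   "spam", "spam", "normal", "normal", "normal", "spam"]

    for avg in ("macro", "micro"):
        p = precision_score(y_true, y_pred, average=avg, zero_division=0)
        r = recall_score(y_true, y_pred, average=avg, zero_division=0)
        print(avg, round(p, 3), round(r, 3))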
Macroaveraging and Microaveraging
19
Text Classification System Pipeline
1. Obtain / collect / create labeled data set suitable for the task
2. Split the data set into:
two (training and test sets) parts OR
three (training, validation, and test sets) parts
3. Choose evaluation metric
4. Transform raw text into feature vectors:
bag of words
other types
5. Using feature vectors and labels from the training set, train the classifier /
create a model
6. Using the evaluation metric from (3), benchmark the classifier / model
performance on the test set (a minimal sketch of steps 2-6 follows below)
7. Deploy the classifier / model to serve a real-world application and monitor its
performance
20
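A minimal end-to-end sketch of steps 2-6 (our own illustration with scikit-learn; the toy texts and labels are hypothetical):

    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.metrics import f1_score

    texts  = ["great pie", "awful service", "loved it", "terrible pie", "so good", "never again"]
    labels = [1, 0, 1, 0, 1, 0]

    # 2. split the labeled data
    X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.33, random_state=0)

    # 4. bag-of-words feature vectors
    vec = CountVectorizer()
    X_train_bow = vec.fit_transform(X_train)
    X_test_bow  = vec.transform(X_test)

    # 5. train the classifier
    clf = MultinomialNB().fit(X_train_bow, y_train)

    # 3 + 6. evaluate with the chosen metric (F1 here) on the test set
    print("F1:", f1_score(y_test, clf.predict(X_test_bow)))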
Text Classification System Pipeline
[Pipeline diagram, step 1: training data (texts and their labels)]
21
Poor Classifier Performance: Reasons
1. With all possible features extracted, we ended up with a sparse feature vector
(some features are too rare and end up being noise), which makes training hard
2. Having few (~20%) relevant samples compared to non-relevant (~80%) samples in the
data set skews learning towards non-relevant data
3. Need a better learning algorithm
4. Need better pre-processing / feature extraction
5. Classifier parameters / hyperparameters need tuning
22
Underfitting / Overfitting
Underfitting
Overfitting
23
Bias vs. Variance
[Figure: models ranging from high bias to low bias]
24
Parameter Tuning
25
K-Fold Cross-Validation
[Diagram: 4-fold cross-validation - the data is split into 4 folds; in each round, 3 folds are used to train and 1 to validate, producing one score per round]
26
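A short scikit-learn sketch of k-fold cross-validation (our own example; the data is synthetic):

    import numpy as np
    from sklearn.model_selection import cross_val_score, KFold
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                              # 100 samples, 5 features
    y = (X[:, 0] + 0.1 * rng.normal(size=100) > 0).astype(int)

    # one validation score per fold, plus the average
    scores = cross_val_score(LogisticRegression(), X, y,
                             cv=KFold(n_splits=4, shuffle=True, random_state=0))
    print(scores, scores.mean())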
Ensemble Learning
In ensemble learning we create a collection (an ensemble) of hypotheses (models)
h1, h2, ..., hN and combine their predictions by averaging, voting, or another
level of machine learning. Individual hypotheses (models) are base models and
their combination is the ensemble model.
Bagging
Boosting
Random Trees
etc.
27
Bagging: Classification
In bagging we generate K training sets by sampling
with replacement from the original training set.
[Diagram: K training sets of M data points each train Model 1 | h1 through Model K | hK; their predictions are combined by plurality vote to produce the output]
28
Bagging: Classification
In bagging we generate K training sets by sampling
with replacement from the original training set.
[Diagram: the same setup with Naive Bayes base models - NaiveBayes1 | h1 through NaiveBayesK | hK - combined by plurality vote to produce the output]
Bagging tends to reduce variance and helps with smaller data sets.
29
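A sketch of bagging with Naive Bayes base models using scikit-learn's BaggingClassifier (our own illustration; recent scikit-learn versions take the base model via the `estimator` argument, older ones via `base_estimator`):

    from sklearn.ensemble import BaggingClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_extraction.text import CountVectorizer

    texts  = ["cheap pills now", "meeting at noon", "win money fast",
              "lunch tomorrow?", "free prize inside", "project update attached"]
    labels = [1, 0, 1, 0, 1, 0]          # 1 = spam, 0 = ham

    X = CountVectorizer().fit_transform(texts)

    # K bootstrap samples (drawn with replacement), one Naive Bayes model per sample,
    # combined by voting
    bag = BaggingClassifier(estimator=MultinomialNB(), n_estimators=10,
                            bootstrap=True, random_state=0)
    bag.fit(X, labels)
    print(bag.predict(X))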
Ensemble Classification
Individual hypotheses (models) are base models
and their combination is the ensemble model.

[Diagram: different base models, each trained on M data points - NaiveBayes1 | h1, Perceptron | h2, k-NN | h3, ..., NaiveBayes2 | hK - combined by plurality vote to produce the output]
30
Supervised Learning
31
What Kinds of Questions Does ML Answer?

Question                  ML Category               Example
What should I do next?    Reinforcement learning    Adjust room humidity or leave as is?
32
Main Machine Learning Categories
Supervised learning Unsupervised learning Reinforcement learning
33
Choosing Hypothesis / Model
Given a training set of N example input-output
(feature-label) pairs
(x1, y1), (x2, y2), ..., (xN, yN)
where each pair was generated by
y = f(x)
Ideally, we would like our model h(x) (hypothesis)
that approximates the true function f(x) to be:
h(x) = y = f(x) (consistent hypothesis)
34
Choosing Hypothesis / Model
Typically, a consistent hypothesis is impossible or
difficult to achieve:
use a best-fit model / hypothesis
36
McCulloch-Pitts Model (1943)
First computational models of an Artificial Neural Network (loosely inspired by
biological neural networks) were proposed by Warren McCulloch and Walter Pitts in
1943. Their ideas are a key component of modern day machine and deep learning.

[Diagram: inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feeding a single unit that produces output y]
37
A Biological Neuron
A neuron or nerve cell is an
electrically excitable cell that
communicates with other
cells via specialized
connections called synapses.
Most neurons receive signals
via the dendrites and soma
and send out signals down
the axon. At the majority of
synapses, signals cross from
the axon of one neuron to a
dendrite of another.
Source: https://en.wikipedia.org/wiki/Neuron
38
Biological vs. Artificial Neuron
[Figure: a biological neuron next to an artificial one, with "synapses", "axon", and "neuron" labeling the corresponding parts]
39
Artificial Neuron (Perceptron)
A (single-layer) perceptron is a model of a biological neuron. It is made of the
following components:
inputs xi - numerical values (numbers) representing information
weights wi - numerical values

[Diagram: inputs x1, x2 with weights w1, w2, w3 feeding the output y]
40
Artificial Neuron (Perceptron)
[Same perceptron slide, annotated: the weights wi are the model parameters, and y is the output]
41
Artificial Neuron (Perceptron)
[Diagram: two views of the perceptron - inputs x1..x4 with weights w1..w4 feeding a summation node and then the output y]
42
Single-layer Perceptron as a Classifier
[Diagram: the same perceptron, now used to make a yes/no decision]

Σ wi * xi < 0 → f = 0 → NO    |    Σ wi * xi ≥ 0 → f = 1 → YES
43
Perceptron with Step Activation
[Diagram: inputs x1..xN with weights w1..wN and bias b feeding a step activation function f that produces output y]

Σ wi * xi + b → f → y
44
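A minimal NumPy sketch (ours) of this unit: a weighted sum plus bias passed through a step activation; the input and weight values are hypothetical:

    import numpy as np

    def step(z):
        return 1 if z >= 0 else 0          # f = 1 -> YES, f = 0 -> NO

    x = np.array([1.0, 0.0, 2.0, 1.0])     # inputs x1..x4
    w = np.array([0.5, -1.0, 0.25, 0.1])   # weights w1..w4
    b = -0.6                               # bias

    z = np.dot(w, x) + b                   # sum(wi * xi) + b
    y = step(z)                            # y = f(z)
    print(z, y)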
Perceptrons = Linear Classifiers
[Diagram: the same unit; the step activation makes the output y binary, 0 or 1]

Σ wi * xi + b → f → y
45
Classification: Linear Separation
[Figure: HAM and SPAM points in feature space separated by a straight line]
46
Perceptron with Sigmoid Activation
[Diagram: inputs x1..xN with weights w1..wN and bias b feeding a sigmoid activation function f that produces output y]

Σ wi * xi + b → f → y
47
Logistic Regression Classifier
[Diagram: the same unit with sigmoid activation - this is exactly a logistic regression classifier]

Σ wi * xi + b → f → y
48
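Replacing the step with a sigmoid gives the logistic regression classifier on this slide; a small sketch (ours) of the forward pass with made-up values:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([2.0, 0.0, 1.0])     # e.g., word counts (hypothetical)
    w = np.array([1.2, -0.4, 0.3])    # learned weights
    b = -1.0                          # bias

    z = np.dot(w, x) + b              # sum(wi * xi) + b
    y = sigmoid(z)                    # probability of the positive class
    print(y, "-> positive" if y >= 0.5 else "-> negative")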
Single-layer Perceptron as a Classifier
[Diagram: the same single-layer perceptron with word features word1..wordN as inputs]
49
Basic Neural Unit
[Diagram: inputs x1..xN with weights w1..wN and bias b; the weighted sum z passes through a nonlinear transform to produce y]

z = Σ wi * xi + b,    y = f(z)
50
Basic Neural Unit
[Annotated diagram: feature vector (input layer) → weights → weighted sum (plus bias) → nonlinear activation function (can differ for each layer!) → output]
51
Selected Activation Functions
[Figure: selected activation functions]
52
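For reference, small NumPy definitions (ours) of the activation functions mentioned in this part of the lecture:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

    def tanh(z):
        return np.tanh(z)                 # squashes to (-1, 1)

    def relu(z):
        return np.maximum(0, z)           # 0 for negative inputs, identity otherwise

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z), tanh(z), relu(z))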
Classification: Linear Separation?
[Figure: HAM and SPAM points that cannot be separated by a single straight line]
53
Hypothesis: Classification “Boundary”
54
XOR: Not a Linearly Separable f()
55
XOR: Not a Linearly Separable f()?
56
Artificial Neural Network
57
Basic Neural Unit
[Diagram: input layer connected by weights to the output layer]
58
Basic Neural Unit
[Annotated diagram repeated from earlier: feature vector (input layer) → weights → weighted sum (plus bias) → nonlinear activation function (can differ for each layer!) → output]
59
Artificial Neural Network (ANN)
An artificial neural network is made of multiple artificial neuron layers.
60
Feedforward Neural Network
[Diagram: feedforward network - features pass through successive layers of weights to produce the output]
61
XOR: Hidden Layer Approach
62
Hidden Layer
[Diagram: features → weights → hidden layer → weights → output]
63
2 Layer Network
[Diagram: 2-layer network - features → weights → hidden layer → weights → output]
64
Training Data: Features + Labels
Typically input data will be represented by a limited set of features.
Wheels   Weight    Passengers   Label
4        8 tons    1            Truck
6        8 tons    1            Truck
4        1 ton     4            Car
4        2 tons    4            Car
65
ANN: Supervised Learning
[Diagram: network with input features wheels, weight, and passengers passing through layers of weights]
66
Training Data: Images + Labels
A classifier needs to be “shown” thousands of labeled examples to learn.
67
Digit Image as ANN Feature Set
Individual features need to be "extracted" from an image. An image is just numbers (a grid of pixel values).
Source: https://nikolanews.com/not-just-introduction-to-convolutional-neural-networks-part-1/
68
ANN: Supervised Learning
An untrained classifier will NOT label input data correctly.
[Diagram: the untrained network assigns scores such as 0.12, 0.99, 0.55 to its output classes (including "Other"), so the highest score does not match the correct label]
69
ANN: Training
Given: input data and its corresponding expected label (DOG), calculate the "error".

[Diagram: the output for DOG should be 1, but the untrained network produces 0.12 (other outputs: 0.99, 0.55, Other)]

"Error" = 0.88. Go back and adjust all the weights to ensure it is lower next time.
70
ANN: Training
Show a data / label pair: [image] / DOG.

[Diagram: the same network; the DOG output should be 1, but the outputs are 0.12, 0.99, 0.55, Other]
71
ANN as a Complex Function
In ANNs hypotheses take form of complex algebraic circuits with
tunable connection strengths (weights).
72
Exercise: ANN Demo
http://playground.tensorflow.org/
73
Logistic Regression = 1 Layer Network
[Diagram: inputs x1..xN with weights w1..wN and bias b feeding a sigmoid unit that produces a scalar output y]

Σ wi * xi + b → f → y
74
Binary Logistic Regression = 1 Layer
[Diagram: input layer connected by weights to a single output node]
75
Multinomial Logistic Regression
[Diagram: input layer connected by weights to multiple output nodes, one per class]
76
Fully Connected Network
[Diagram: every input-layer node is connected by a weight to every output-layer node]
77
Softmax: Sigmoid Generalization
78
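A small sketch (ours) of softmax, which generalizes the sigmoid from 2 classes to K classes:

    import numpy as np

    def softmax(z):
        z = z - np.max(z)                 # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()                # probabilities that sum to 1

    scores = np.array([2.0, 1.0, 0.1])    # one score per class
    print(softmax(scores))

    # with two classes, softmax([z, 0])[0] equals sigmoid(z)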
Binary Logistic Regression
[Diagram: feature input layer → weights → single sigmoid output node]
79
Multinomial Logistic Regression
[Diagram: feature input layer → weights → one output node per class]
81
2 Layer Network
[Diagram: 2-layer network - features → weights (units indexed i and j) → hidden layer → weights → output]
82
2 Layer Network
[Diagram: 2-layer network - each hidden unit applies activation function f1, and the output unit applies activation function f2]

Activation function f1: sigmoid, tanh, ReLU, etc. | Activation function f2: sigmoid
83
2 Layer Network
[Same 2-layer network diagram; the output activation is now softmax]

Activation function f1: sigmoid, tanh, ReLU, etc. | Activation function f2: softmax
84
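Putting the pieces together, a NumPy sketch (ours) of a 2-layer forward pass: a hidden layer with activation f1 (ReLU here) and an output layer with activation f2 (softmax here); the layer sizes and random weights are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)

    x  = rng.normal(size=4)               # feature vector
    W1 = rng.normal(size=(3, 4))          # hidden layer weights (3 hidden units)
    b1 = np.zeros(3)
    W2 = rng.normal(size=(2, 3))          # output layer weights (2 classes)
    b2 = np.zeros(2)

    h = np.maximum(0, W1 @ x + b1)        # f1 = ReLU
    z = W2 @ h + b2
    y = np.exp(z - z.max()); y /= y.sum() # f2 = softmax
    print(y)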
Multilayer Neural Net: Notation
85
Multilayer Neural Net: Notation
[Diagram: inputs x1..xN with weights w1..wN and bias b; the weighted sum z passes through the nonlinear transform g to produce the activation a, which becomes the output y]

z = Σ wi * xi + b,    a = g(z)
86
Multilayer Neural Net: Notation
[Same notation diagram as on the previous slide]
87
Replacing the Bias Unit
[Same unit diagram, with the separate bias term b]

This is a bit inconvenient:  z = Σ wi * xi + b
88
Replacing the Bias Unit
Let's switch to a notation without the bias unit
1. Add a dummy node a0 = 1 to each layer
2. Its weight w0 will be the bias
3. So the input layer has a[0]0 = 1, the hidden layers a[1]0 = 1, a[2]0 = 1, ...
and the weighted sum becomes z = Σ wi * xi (with i starting at 0), absorbing the bias.
89
Replacing the Bias Unit
90
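A tiny sketch (ours) of the dummy-node trick: prepend a0 = 1 to the inputs and let w0 carry the bias, so the weighted sum needs no separate b:

    import numpy as np

    x = np.array([2.0, 0.0, 1.0])
    w = np.array([1.2, -0.4, 0.3])
    b = -1.0

    # with an explicit bias
    z_with_bias = np.dot(w, x) + b

    # with a dummy input a0 = 1 whose weight is the bias
    x_aug = np.concatenate(([1.0], x))     # [1, x1, x2, x3]
    w_aug = np.concatenate(([b], w))       # [b, w1, w2, w3]
    z_dummy = np.dot(w_aug, x_aug)

    print(z_with_bias, z_dummy)            # identical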
Deep Learning
91
Deep Learning
Deep learning is a broad family of techniques for
machine learning (also a sub-field of ML) in which
hypotheses take the form of complex algebraic
circuits with tunable connections. The word “deep”
refers to the fact that the circuits are typically
organized into many layers, which means that
computation paths from inputs to outputs have
many steps.
92
Shallow vs. Deep Models
Source: https://www.quora.com/What-is-the-difference-between-deep-learning-and-usual-machine-learning
94
Machine Learning vs. Deep Learning
Source: https://www.intel.com/content/www/us/en/artificial-intelligence/posts/difference-between-ai-machine-learning-deep-
learning.html
95
Deep Learning: Feature Extraction
Source: https://en.wikipedia.org/wiki/Deep_learning
96
Neural Networks in NLP
Let’s consider the NLP modeling we explored
so far:
Classification
Language Modeling
Can we apply Neural Networks?
97
Logistic Regression Sentiment Analysis
[Diagram: feature input layer → weights → single output node giving a BINARY answer]
98
Logistic Regression Sentiment Analysis
[Diagram: features → weights → hidden layer → weights → output node giving a BINARY answer]
99
Complex Feature Vector Relationships
Adding hidden layers can help capture non-linear relationships between features!
100
Word Embedding: Definition
Word Embedding:
a term used for the representation of words for text analysis,
typically in the form of a real-valued vector that encodes the
meaning of the word such that the words that are closer in
the vector space are expected to be similar in meaning
from Wikipedia
101
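To illustrate "closer in the vector space" concretely, a toy sketch (ours) using cosine similarity on made-up 4-dimensional embeddings:

    import numpy as np

    emb = {
        "cat": np.array([0.8, 0.1, 0.0, 0.3]),
        "dog": np.array([0.7, 0.2, 0.1, 0.3]),
        "pie": np.array([0.0, 0.9, 0.8, 0.1]),
    }

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(emb["cat"], emb["dog"]))   # high: similar meaning
    print(cosine(emb["cat"], emb["pie"]))   # low: different meaning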
Exercise: Word2Vec
https://www.cs.cmu.edu/~dst/WordEmbeddingDem
o/index.html
10
Embeddings as Input Features
[Diagram: word embeddings (features learned from data) → weights → hidden layer → weights → output]

Multiclass output: add more output layer nodes + use softmax (instead of sigmoid)
103
Embeddings as Input Features
104
Embeddings as Input Features
Assumption:
“3-word sentences”
105
Embeddings as Input Features
[Diagram: embeddings of the input words (features learned from data) → weights → hidden layer → weights → output giving a BINARY answer]
106
Texts in Different Sizes: Ideas
Some simple solutions:
1. Make the input the length of the longest sample
   if shorter, pad with zero embeddings
   truncate if you get longer reviews at test time
2. Create a single "sentence embedding" (the same dimensionality as a word) to
   represent all the words:
   take the mean of all the word embeddings, or
   take the element-wise max of all the word embeddings
   (for each dimension, pick the max value from all words; see the sketch below)
107
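A short sketch (ours) of the pooling options from this slide, applied to a stack of hypothetical word embeddings:

    import numpy as np

    # one row per word, made-up 4-dimensional embeddings
    words = np.array([[0.1, 0.5, -0.2, 0.0],
                      [0.4, 0.1,  0.3, 0.2],
                      [0.0, 0.2,  0.1, 0.9]])

    mean_pool = words.mean(axis=0)     # average across words
    max_pool  = words.max(axis=0)      # per-dimension maximum across words

    print(mean_pool)
    print(max_pool)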
Language Models Revisited
Language Modeling: Calculating the probability of the next
word in a sequence given some history.
• N-gram based language models
• other: neural network-based?
108
Neural Language Model
109
Neural LM Better Than N-Gram LM
Training data:
We've seen: I have to make sure that the cat gets fed.
Never seen: dog gets fed
Test data:
I forgot to make sure that the dog gets ___
N-gram LM can't predict "fed"!
Neural LM can use similarity of "cat" and "dog" embeddings
to generalize and predict “fed” after dog
110
Training Neural Networks
111
Training Neural Networks: Intuition
For every training tuple (x, y) = (feature vector, label):
Run the forward computation to find the estimate ŷ
Run the backward computation to update the weights:
For every output node:
Compute the loss L between the true y and the estimated ŷ
For every weight w from the hidden layer to the output layer:
Update the weight
112
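To make the intuition concrete, a toy sketch (ours, not from the slides) of one forward / backward pass for a single sigmoid output unit with squared-error loss and a gradient-descent weight update:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x, y = np.array([1.0, 2.0]), 1.0          # one training tuple (features, label)
    w, b, lr = np.array([0.1, -0.1]), 0.0, 0.5

    # forward computation: estimate y_hat
    y_hat = sigmoid(np.dot(w, x) + b)

    # loss between the true y and the estimate y_hat
    loss = 0.5 * (y_hat - y) ** 2

    # backward computation: gradient of the loss w.r.t. each weight, then update
    dloss_dz = (y_hat - y) * y_hat * (1 - y_hat)
    w -= lr * dloss_dz * x
    b -= lr * dloss_dz
    print(loss, w, b)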
Back-propagation
[Diagram, three stages for a node z = f(x, y) with weights w1, w2:
 1. Feed forward: compute z = f(x, y)
 2. Evaluate loss: Loss = z - z_expected
 3. Back-propagation: propagate ∂Loss/∂z back to ∂Loss/∂x and ∂Loss/∂y]
114
NN Node: Derivative of the Loss
[Same neural unit diagram: z = Σ wi * xi + b, followed by the nonlinear transform g producing the activation and the output y]
115
Convolutional Neural Networks
The name Convolutional Neural Network (CNN) indicates that the
network employs a mathematical operation called convolution.
CNNs can reduce images (data grids) into a form which is easier to
process without losing features that are critical for getting a good
prediction.
116
Convolutional Neural Networks
[Architecture diagram: convolution layers, pooling, and flattening before the fully connected layers]
117
Convolution: The Idea
3 x 3 Kernel / Filter
Source: https://commons.wikimedia.org/wiki/File:Convolutional_Neural_Network_NeuralNetworkFilter.gif
118
Kernel / Filter: The Idea
3 x 3 Kernel / Filter
Source: https://commons.wikimedia.org/wiki/File:Convolution_arithmetic_-_Padding_strides.gif
119
Convoluting Matrices
Convolution (and Convolutional Neural Networks) can be applied
to any grid-like data (tensors: matrices, vectors, etc.).
kernel          data              "overlay" (element-wise products)
0 1 0           0 2 3             0*0  1*2  0*3
1 1 1    conv   2 4 1      =      1*2  1*4  1*1      sum = 12
0 1 0           0 3 0             0*0  1*3  0*0
120
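The single overlay position above can be checked with a few lines of NumPy (our own sketch):

    import numpy as np

    kernel = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 1, 0]])
    patch  = np.array([[0, 2, 3],
                       [2, 4, 1],
                       [0, 3, 0]])

    # element-wise "overlay" of kernel and data patch, then sum
    print((kernel * patch).sum())   # 12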
Selected Image Processing Kernels
121
Image Processing: Kernels / Filters
122
Applying Kernels / Filters
3 x 3 Kernel / Filter
123
Convolutional NN Kernels
In practice, Convolutional Neural Network kernels can be larger than
3x3 and are learned using back propagation.
124
Convolution Layer 1
Kernel 1
125
Convolution Layer 1
Kernel 2
Kernel 1
126
Convolution Layer 1
[Figure: the original image convolved with Kernels 1-3 produces Convolution 1 (one feature map per kernel)]
127
Convolutional Neural Networks
128
Max Pooling Layer
[Figure: max pooling applied to Convolution 1, downsampling each feature map]
129
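A minimal sketch (ours) of 2x2 max pooling with stride 2 on a small feature map:

    import numpy as np

    fmap = np.array([[1, 3, 2, 0],
                     [4, 2, 1, 1],
                     [0, 1, 5, 2],
                     [2, 2, 3, 4]])

    pooled = np.zeros((2, 2), dtype=fmap.dtype)
    for i in range(2):
        for j in range(2):
            # keep only the largest value in each 2x2 block
            pooled[i, j] = fmap[2*i:2*i+2, 2*j:2*j+2].max()

    print(pooled)   # [[4 2], [2 5]]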
Convolutional Neural Networks
130
Convolution Layer 2
[Figure: the pooled output of the first convolution layer is convolved with Kernels A-C to produce Convolution A]
131
Convolutional Neural Networks
132
Flattening
Final output of convolution layers is “flattened” to become a vector of features.
[Figure: the final feature maps are converted to a single vector]
Source: https://nikolanews.com/not-just-introduction-to-convolutional-neural-networks-part-1/
133
Recurrent Neural Networks
Recurrent Neural Networks (RNNs) allow cycles in the computational graph
(network). A network node (unit) can take its own output from an earlier step as
input (with delay introduced).
Enables having internal state / memory: inputs received earlier affect the RNN's
response to the current input.
134
Long Short-Term Memory (LSTM)
Long short-term memory (LSTM) is an artificial recurrent neural network architecture. Unlike standard
feedforward neural networks, LSTM has feedback connections. Such a recurrent
neural network (RNN) can process not only single data points (such as images), but
also entire sequences of data (such as speech or video). This characteristic makes
LSTM networks well suited to processing and predicting sequential data.
135
Large Language Model (LLM)
A large language model (LLM) is a language model
consisting of a neural network with many
parameters (typically billions of weights or more),
trained on large quantities of unlabeled text using
self-supervised learning.
Source: Wikipedia
136
Generative Pre-trained Transformer 3
What is it?
Generative Pre-trained Transformer 3 (GPT-3) is an
autoregressive language model that uses deep learning
to produce human-like text. It is the third-generation
language prediction model in the GPT-n series (and the
successor to GPT-2) created by OpenAI, a San Francisco-
based artificial intelligence research laboratory.
Size:
175 billion machine learning parameters
~45 GB
Source: Wikipedia
137
Parameters? What Are Those?
[Diagram: the 2-layer network again - the weights (indexed i, j) are the model's parameters]
138
Transformer Architecture
139
GPT-4 Architecture
Source: TheAiEdge.io
140
Self-Attention
In artificial neural networks, attention is a technique that is meant to mimic
cognitive attention. The effect enhances some parts of the input data while
diminishing other parts — the motivation being that the network should devote
more focus to the important parts of the data, even though they may be small.
Learning which part of the data is more important than another depends on the
context, and this is trained by gradient descent.
Source: Park et al. – “SANVis: Visual Analytics for Understanding Self-Attention Networks”
141
Generative Pre-trained Transformer 4
What is it?
Generative Pre-trained Transformer 4 (GPT-4) is a
multimodal large language model created by OpenAI. As a
transformer, GPT-4 was pretrained to predict the next
token (using both public data and "data licensed from
third-party providers"), and was then fine-tuned with
reinforcement learning from human and AI feedback for
human alignment and policy compliance.
Size:
1 trillion machine learning parameters
Source: Wikipedia
142
Large Language Models Data Sources
143
LLM Data Pre-Processing Pipeline
144
ChatGPT
What is it?
ChatGPT is a chatbot developed by OpenAI and released in
November 2022. It is built on top of OpenAI's GPT-3.5 and
GPT-4 families of large language models (LLMs) and has
been fine-tuned (an approach to transfer learning) using
both supervised and reinforcement learning techniques.
Source: Wikipedia
145
Transfer Learning
In transfer learning, experience with one
learning task helps an agent learn better on
another task.
146