3 Sequence and Language Modeling
Magdalena Biesialska
PhD Candidate
Universitat Politècnica de Catalunya
Technical University of Catalonia
Examples of sequences
What is a sequence in NLP?
● A sequence is an ordered collection of items
● Many examples of sequences in real life
● Plenty of sequential data in NLP:
○ characters in words
○ words in sentences
○ sentences in discourse
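The three levels listed above can be illustrated with a minimal sketch. The example text and the naive regular-expression splitting are assumptions made for illustration; a real tokenizer would handle far more cases.

```python
import re

text = "Language is sequential. Order matters a lot!"

# Sentences in discourse: naive split after sentence-final punctuation.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Words in sentences: naive tokenizer that keeps runs of word characters.
words = [re.findall(r"\w+", s) for s in sentences]

# Characters in words.
chars = list(words[0][0])

# Order is what makes these sequences rather than sets:
a = "dog bites man".split()
b = "man bites dog".split()
same_items_different_sequence = set(a) == set(b) and a != b
```

The last lines show why order matters: both word lists contain the same items, yet they are different sequences with different meanings.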
Sequence modeling
● In NLP, a textual sequence can be the input, the output, or both
● Sequences may have different lengths
● Long-distance dependencies in language, e.g.:
○ The lecture has lasted as long as students have had questions to the teacher.
○ The bicycle would not fit in the elevator because it was too bulky.
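Because sequences may have different lengths, batches are commonly padded to a shared length and paired with a mask that marks the real tokens. The sketch below assumes integer token ids and a hypothetical padding id of 0; the helper name `pad_batch` is illustrative.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad integer sequences to a common length and build a 0/1 mask."""
    max_len = max(len(s) for s in sequences)
    batch = [list(s) + [pad_id] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return batch, mask


# Three "sentences" of lengths 3, 1 and 2.
batch, mask = pad_batch([[5, 2, 9], [7], [3, 4]])
```

A model can then use the mask to ignore padded positions when it computes its loss or pools hidden states.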
Sequence modeling using RNNs
[Figure: RNN with input x_t, hidden state h_t, and output y_t]
source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture06-rnnlm.pdf
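The relation between x_t, h_t and y_t can be written down in one common formulation of a vanilla RNN. The weight names W_{xh}, W_{hh}, W_{hy} and the biases b_h, b_y are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h\right) \\
y_t &= \operatorname{softmax}\left(W_{hy}\, h_t + b_y\right)
\end{aligned}
```

The same weights are reused at every time step, which is what lets the network process sequences of any length.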
Sentence Classification with RNNs: Sentiment Classification
source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture06-rnnlm.pdf
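One simple way to classify a sentence with an RNN is to read the words one by one and feed the final hidden state to a logistic output unit. The sketch below is untrained and uses random weights, so its output is not meaningful as a sentiment score. The vocabulary, layer sizes and parameter names are all assumptions for illustration.

```python
import math
import random


def rnn_classify(token_ids, params):
    """Run a vanilla RNN over the tokens; classify from the last hidden state."""
    E, W_x, W_h, w_out = params["E"], params["W_x"], params["W_h"], params["w_out"]
    hidden = len(W_h)
    h = [0.0] * hidden
    for tok in token_ids:
        x = E[tok]  # embedding lookup for the current word
        h = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(len(x)))
                + sum(W_h[i][j] * h[j] for j in range(hidden))
            )
            for i in range(hidden)
        ]
    logit = sum(w_out[i] * h[i] for i in range(hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # P(positive sentiment)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "awful": 4}  # toy vocabulary
params = {
    "E": random_matrix(len(vocab), 4, rng),    # 4-dimensional embeddings
    "W_x": random_matrix(3, 4, rng),           # input-to-hidden weights
    "W_h": random_matrix(3, 3, rng),           # hidden-to-hidden weights
    "w_out": [rng.uniform(-0.5, 0.5) for _ in range(3)],
}
p_positive = rnn_classify([vocab[w] for w in "the movie was great".split()], params)
```

Other pooling choices, such as averaging or taking the maximum over all hidden states, are also common.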
● RNNs are well suited to sequence modeling, but they suffer from vanishing gradients, which makes long-distance dependencies hard to learn
● Other RNN variants:
○ LSTM
○ GRU
○ bidirectional
○ multi-layer
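The vanishing-gradient problem mentioned above can be seen in a deliberately simplified scalar RNN, h_t = tanh(w * h_{t-1} + x_t). This is a sketch under that assumption, with a hypothetical weight of 0.9. The gradient of the last state with respect to the initial state is a product of one factor per time step, and each factor has magnitude below 1 when |w| < 1.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule factor: d h_t / d h_{t-1}
    return abs(grad)


g_short = gradient_through_time(0.9, [0.1] * 5)   # 5 time steps
g_long = gradient_through_time(0.9, [0.1] * 50)   # 50 time steps
```

The signal from 50 steps back is far weaker than the one from 5 steps back. Gated variants such as LSTM and GRU were designed to mitigate this.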
Sequence modeling using CNNs
● Introduced by Collobert et al. (2011)
● The first layer extracts features for each word
● The second layer extracts features from the whole sentence, treating it as a sequence with local and global structure
Sentence Classification with CNNs: Sentiment Analysis
Kim (2014)
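CNN sentence classifiers of this kind slide filters over windows of consecutive word vectors and then keep the strongest response per filter (max-over-time pooling). The sketch below shows a single filter of width 2 on hand-picked toy vectors. All values are assumptions, and a real model would learn many filters of several widths.

```python
def conv1d_max_pool(embeddings, filt):
    """Slide one filter over word windows, apply ReLU, then max-over-time pooling.

    embeddings: list of word vectors, one per word in the sentence.
    filt: list of `width` weight vectors, one per position in the window.
    Assumes the sentence has at least `width` words.
    """
    width = len(filt)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        score = sum(
            w * x
            for w_vec, x_vec in zip(filt, window)
            for w, x in zip(w_vec, x_vec)
        )
        feature_map.append(max(0.0, score))  # ReLU
    return max(feature_map)  # one feature per filter, independent of length


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # 4 words, 2-dim vectors
bigram_filter = [[1.0, -1.0], [0.5, 0.5]]                    # width-2 filter
feature = conv1d_max_pool(sentence, bigram_filter)
```

Because pooling collapses the time axis, sentences of different lengths yield feature vectors of the same size, which a final classification layer can consume.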
Sequence-to-sequence modeling
● Seq2Seq modeling takes a sequence as input and outputs another sequence
● Based on the encoder-decoder architecture
● Examples:
○ Machine Translation
○ Text Summarization
○ Chatbots
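The encoder-decoder idea can be sketched with two small RNNs. The encoder compresses the input into a vector, and the decoder starts from that vector and emits tokens greedily until it produces an end-of-sequence token. Everything here is an assumption for illustration: the weights are random and untrained, the sizes are tiny, and the token ids 0 (start) and 1 (end) are hypothetical conventions.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(W_x, W_h, x, h):
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]


def encode(src_ids, p):
    """Encoder: compress the source sequence into a final hidden state."""
    h = [0.0] * p["hidden"]
    for tok in src_ids:
        h = rnn_step(p["enc_Wx"], p["enc_Wh"], p["src_emb"][tok], h)
    return h


def decode(context, p, bos=0, eos=1, max_len=10):
    """Decoder: start from the encoder state, emit tokens greedily until EOS."""
    h, tok, output = context, bos, []
    for _ in range(max_len):
        h = rnn_step(p["dec_Wx"], p["dec_Wh"], p["tgt_emb"][tok], h)
        scores = matvec(p["W_out"], h)
        tok = max(range(len(scores)), key=scores.__getitem__)  # greedy choice
        if tok == eos:
            break
        output.append(tok)
    return output


def rand(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, D, SRC_V, TGT_V = 4, 3, 6, 5  # hidden size, embedding size, vocabulary sizes
p = {
    "hidden": H,
    "src_emb": rand(SRC_V, D, rng), "tgt_emb": rand(TGT_V, D, rng),
    "enc_Wx": rand(H, D, rng), "enc_Wh": rand(H, H, rng),
    "dec_Wx": rand(H, D, rng), "dec_Wh": rand(H, H, rng),
    "W_out": rand(TGT_V, H, rng),
}
context = encode([2, 5, 3], p)
translation = decode(context, p)
```

The output length is decided by the decoder, not by the input length, which is what makes the architecture suitable for translation and summarization.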
[Figure: next-word prediction for "I want to ______", with candidate words: learn, eat, work, sleep]
Examples of Language Models
Statistical language modeling
● Given a sequence of words, compute the probability distribution of the next word
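In symbols, and using notation assumed here rather than taken from the slides, the next-word distributions combine into a sequence probability through the chain rule. An n-gram model approximates each factor by conditioning only on the previous n-1 words:

```latex
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}),
\qquad
P(w_t \mid w_1, \dots, w_{t-1}) \approx P(w_t \mid w_{t-n+1}, \dots, w_{t-1})
```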
Example: 4-gram language model
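A count-based 4-gram model can be sketched in a few lines. It estimates the next-word distribution as count(context, word) / count(context), where the context is the previous three words. The toy corpus and function names below are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_ngram(tokens, n=4):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def next_word_distribution(counts, context):
    """Maximum-likelihood estimate: count(context, w) / count(context)."""
    followers = counts.get(tuple(context))
    if not followers:
        return {}  # unseen context: the plain model has nothing to say
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()}


corpus = ("students opened their books . students opened their laptops . "
          "students opened their books .").split()
model = train_ngram(corpus, n=4)
dist = next_word_distribution(model, ["students", "opened", "their"])
```

Here `dist` assigns 2/3 to "books" and 1/3 to "laptops". Any context that never occurred in the corpus yields an empty distribution, which is the sparsity problem that smoothing addresses.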
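● Long-range dependencies are lost: an n-gram model conditions only on the previous n-1 words
● Some n-grams never appear in the training corpus, so their estimated probability is zero
● Solutions (smoothing techniques):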
○ Linear interpolation
○ Back-off models
○ Discounting
● Linear interpolation
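Linear interpolation mixes estimates of different orders so that an unseen higher-order n-gram still receives probability mass from the lower orders. A minimal trigram sketch follows. The weights (0.5, 0.3, 0.2) are arbitrary assumptions that sum to 1. In practice they are usually tuned on held-out data.

```python
from collections import Counter


def mle_models(tokens):
    """Unigram, bigram and trigram counts from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return unigrams, bigrams, trigrams


def interpolated_prob(w, u, v, models, lambdas=(0.5, 0.3, 0.2)):
    """P(w | u, v) = l3 * P_tri(w | u, v) + l2 * P_bi(w | v) + l1 * P_uni(w)."""
    unigrams, bigrams, trigrams = models
    l3, l2, l1 = lambdas
    p_uni = unigrams[w] / sum(unigrams.values())
    p_bi = bigrams[(v, w)] / unigrams[v] if unigrams[v] else 0.0
    p_tri = trigrams[(u, v, w)] / bigrams[(u, v)] if bigrams[(u, v)] else 0.0
    return l3 * p_tri + l2 * p_bi + l1 * p_uni


tokens = "the cat sat on the mat the cat ate".split()  # toy corpus
models = mle_models(tokens)
p_seen = interpolated_prob("sat", "the", "cat", models)    # trigram was observed
p_unseen = interpolated_prob("mat", "the", "cat", models)  # trigram never observed
```

The unseen trigram "the cat mat" gets a small but non-zero probability through the unigram term, whereas the plain maximum-likelihood trigram estimate would be zero.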
Neural-based language modeling
● Use a neural network to learn the probability of word sequences
● The network has two tasks:
○ learn a representation for each word (embedding)
○ learn the probability function from these embeddings
● Introduced by Bengio et al. (2003)
● Feed-forward NN
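The forward pass of a feed-forward neural language model of this kind can be sketched as follows: look up the embeddings of a fixed window of previous words, concatenate them, apply a hidden layer, and normalise the output scores with a softmax. This is a simplified, untrained sketch with random weights. The sizes and parameter names are assumptions, and the optional direct connections from embeddings to the output layer are omitted.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ffnn_lm(context_ids, p):
    """Next-word distribution from a fixed window of previous word ids."""
    # 1) Look up and concatenate the embeddings of the context words.
    x = [value for tok in context_ids for value in p["E"][tok]]
    # 2) Hidden layer with a tanh non-linearity.
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(p["W_h"], p["b_h"])]
    # 3) Output layer: one score per vocabulary word, normalised by softmax.
    scores = [sum(w * v for w, v in zip(row, h)) + b
              for row, b in zip(p["W_o"], p["b_o"])]
    return softmax(scores)


rng = random.Random(0)
V, D, H, N_CONTEXT = 6, 3, 4, 3  # vocabulary, embedding, hidden, window sizes


def rand(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


p = {"E": rand(V, D), "W_h": rand(H, N_CONTEXT * D), "b_h": [0.0] * H,
     "W_o": rand(V, H), "b_o": [0.0] * V}
probs = ffnn_lm([2, 4, 1], p)  # distribution over the 6 vocabulary words
```

Training would adjust the embedding table `E` together with the other weights, so word representations and the probability function are learned jointly.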
● Mikolov et al. (2011)
● Language modeling is often used as an auxiliary task to obtain contextual embeddings
Large Language Models
● Jurafsky, D. & Martin, J.H. (2009). “Speech and language processing: an introduction to natural language processing”. http://www.cs.colorado.edu/~martin/slp.html (2nd ed.); https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf (free draft of the 3rd ed.)
● Goldberg, Y. & Hirst, G. (2017). “Neural Network Methods in Natural Language Processing”. Morgan & Claypool Publishers.
● Manning, C.D., & Schütze, H. (1999). “Foundations of Statistical Natural Language Processing”.
Articles
● Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P.P. (2011). “Natural Language Processing (almost) from Scratch”. Journal of Machine Learning Research, 12, 2493-2537.
State-of-the-art
● OpenAI GPT-3
● Google Bard
Questions?