3 Sequence and Language Modeling

This document provides an overview of sequence modeling and language modeling techniques in natural language processing. It begins with examples of sequences in NLP, such as characters, words, and sentences. It then discusses statistical and neural approaches to sequence modeling using recurrent neural networks, convolutional neural networks, and sequence-to-sequence models. For language modeling, it explains statistical n-gram models and neural language models using feedforward networks and word embeddings. Finally, it briefly mentions recent large pre-trained language models.


Sequence and Language Modeling

Magdalena Biesialska
[email protected]

PhD Candidate
Universitat Politècnica de Catalunya
Technical University of Catalonia
Outline

● Key concepts and examples of Sequence Modeling
● Neural-based Sequence Modeling
● Statistical Language Modeling
● Neural-based Language Modeling
Examples of sequences

[slides 3-12: images illustrating examples of sequences]
What is a sequence in NLP?
● A sequence is an ordered collection of items
● There are many examples of sequences in real life
● There is plenty of sequential data in NLP:
○ characters in words
○ words in sentences
○ sentences in discourse
Sequence modeling
● In NLP, a textual sequence can be an input, an output, or both
● Sequences may have different lengths
● Language exhibits long-distance dependencies, e.g.:
○ The lecture has lasted as long as students have had questions to the teacher.
○ The bicycle would not fit in the elevator because it was too bulky.
Sequence modeling using RNNs

[image: recurrent neural network processing a sequence]

image source: Graham Neubig

Sequence modeling using RNNs
A simple RNN, and the same RNN unrolled in time: at each step t, the hidden state h_t is computed from the current input x_t and the previous hidden state h_{t-1}, and produces an output y_t.

image source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture06-rnnlm.pdf
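The unrolled recurrence can be sketched in a few lines of code. The dimensions, random weights, and variable names below are illustrative placeholders, not the lecture's actual model:

```python
# A minimal "vanilla" RNN step, unrolled over a toy sequence.
# h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hid, d_out = 4, 8, 3                   # illustrative sizes
W_xh = rng.normal(size=(d_hid, d_in)) * 0.1    # input-to-hidden weights
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1   # recurrent weights
W_hy = rng.normal(size=(d_out, d_hid)) * 0.1   # hidden-to-output weights
b_h = np.zeros(d_hid)
b_y = np.zeros(d_out)

def rnn_step(x_t, h_prev):
    """One time step of the recurrence."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Unroll over a toy sequence of 5 input vectors.
xs = rng.normal(size=(5, d_in))
h = np.zeros(d_hid)
outputs = []
for x_t in xs:
    h, y_t = rnn_step(x_t, h)
    outputs.append(y_t)
```

Note that the same three weight matrices are reused at every time step; unrolling changes only how the computation is drawn, not the parameters.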


Sequence modeling using RNNs
Tagging with RNNs: Named Entity Recognition, POS tagging

source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture06-rnnlm.pdf

Sequence modeling using RNNs
Sentence Classification with RNNs: Sentiment Classification

source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture06-rnnlm.pdf
Sequence modeling using RNNs
● RNNs are great for sequence modeling, but they suffer from vanishing gradients
● Other RNN variants address this and related limitations:
○ LSTM
○ GRU
○ bidirectional RNNs
○ multi-layer (stacked) RNNs
Sequence modeling using CNNs
● Introduced by Collobert et al. (2011)
● The first layer extracts features for each word
● The second layer extracts features from the whole sentence, treating it as a sequence with local and global structure

Sequence modeling using CNNs
Sentence Classification with CNNs: Sentiment Analysis

Kim (2014)
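The core operation behind Kim (2014)-style sentence classification can be sketched as a convolution over word embeddings followed by max-over-time pooling. The embeddings and filter weights below are random placeholders standing in for learned parameters:

```python
# Slide a filter of width k over word embeddings, apply ReLU, then
# max-pool over time to get one feature per filter.
import numpy as np

rng = np.random.default_rng(2)
sent_len, d_emb, k, n_filters = 7, 6, 3, 4

X = rng.normal(size=(sent_len, d_emb))            # one embedding per word
filters = rng.normal(size=(n_filters, k, d_emb))  # n_filters, each of width k

def conv_max_pool(X, filters):
    T, _ = X.shape
    n_f, k, _ = filters.shape
    feats = np.empty(n_f)
    for f in range(n_f):
        # dot each window of k consecutive word vectors with the filter, ReLU
        acts = [np.maximum(0.0, np.sum(X[t:t + k] * filters[f]))
                for t in range(T - k + 1)]
        feats[f] = max(acts)  # max-over-time pooling
    return feats

sentence_features = conv_max_pool(X, filters)  # fixed-size sentence vector
```

Max-over-time pooling is what turns variable-length sentences into a fixed-size feature vector that a classifier can consume.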
Sequence-to-sequence modeling
● Seq2Seq modeling takes a sequence as input and outputs another sequence
● Based on the encoder-decoder architecture
● Examples:
○ Machine Translation
○ Text Summarization
○ Chatbots

image source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture08-nmt.pdf
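The encoder-decoder control flow can be sketched without any trained network: encode the source once, then decode token by token until an end-of-sequence marker. `TOY_DECODER` below is a hypothetical lookup table standing in for a learned decoder:

```python
# Toy sketch of the Seq2Seq loop: encode once, decode greedily step by step.

def encode(src_tokens):
    """Compress the source sequence into a single context "state" (here, a tuple)."""
    return tuple(src_tokens)

# Hypothetical lookup standing in for a trained decoder:
# maps (context, previous token) -> next token.
TOY_DECODER = {
    (("hola", "mundo"), "<s>"): "hello",
    (("hola", "mundo"), "hello"): "world",
    (("hola", "mundo"), "world"): "</s>",
}

def greedy_decode(src_tokens, max_len=10):
    state = encode(src_tokens)
    out, prev = [], "<s>"
    for _ in range(max_len):
        nxt = TOY_DECODER.get((state, prev), "</s>")  # one decoder step
        if nxt == "</s>":                             # stop at end-of-sequence
            break
        out.append(nxt)
        prev = nxt
    return out

translation = greedy_decode(["hola", "mundo"])
```

In a real system the decoder step is a neural network producing a distribution over the vocabulary, and greedy decoding is often replaced by beam search.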


Language modeling
● Language Modeling is the task of predicting the next token, e.g.:

I want to ______ (learn? eat? work? sleep?)
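The prediction task above can be illustrated with made-up counts: given the context "I want to", rank candidate next words and normalize their counts into probabilities. The numbers are invented for illustration:

```python
# Toy next-token prediction: normalize counts of observed continuations
# of the context "I want to" into a probability distribution.

context_counts = {"learn": 40, "eat": 30, "work": 20, "sleep": 10}
total = sum(context_counts.values())
next_word_probs = {w: c / total for w, c in context_counts.items()}

# The model's prediction is the most probable continuation.
best = max(next_word_probs, key=next_word_probs.get)
```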
Examples of Language Models

image source: https://support.apple.com/en-us/HT207525


Statistical language modeling
● Given a sequence of words, compute the probability distribution of the next word
● Chain rule of probability:

P(w_1, ..., w_n) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1, ..., w_{n-1}) = ∏_{i=1}^{n} P(w_i | w_1, ..., w_{i-1})

● The probability of a sentence is the product of the conditional probabilities of each word w_i given the previous ones
● Independence (Markov) assumption: in an n-gram model, the probability of the word w_i is conditioned only on the previous n-1 words:

P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-n+1}, ..., w_{i-1})
Statistical language modeling
Example: 4-gram language model

[slides 31-42: worked example built up step by step]
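A count-based 4-gram model of the kind the example describes can be estimated by maximum likelihood: P(w | h) = count(h followed by w) / count(h), where h is the 3-word history. The corpus below is a toy stand-in, not the slides' data:

```python
# Maximum-likelihood n-gram language model (n = 4) from raw counts.
from collections import Counter

def train_ngram(corpus_sentences, n=4):
    ngram_counts, history_counts = Counter(), Counter()
    for sent in corpus_sentences:
        # pad with start symbols so every word has a full history
        tokens = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            history = tuple(tokens[i:i + n - 1])
            ngram_counts[history + (tokens[i + n - 1],)] += 1
            history_counts[history] += 1
    return ngram_counts, history_counts

def prob(word, history, ngram_counts, history_counts):
    if history_counts[history] == 0:
        return 0.0  # unseen history: no smoothing in this sketch
    return ngram_counts[history + (word,)] / history_counts[history]

corpus = ["the cat sat on the mat", "the cat sat on the chair"]
ng, hist = train_ngram(corpus)

# "sat on the" was seen twice, once followed by "mat" -> P = 0.5
p_mat = prob("mat", ("sat", "on", "the"), ng, hist)
```

Note the failure mode the next slide addresses: any word never seen after a given history gets probability exactly zero.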
Statistical language modeling
● Long-range dependencies are lost
● Some n-grams don't appear in the corpus
● Solutions - smoothing techniques:
○ Linear interpolation
○ Back-off models
○ Discounting

Statistical language modeling
● Linear interpolation - mix n-gram orders with weights λ_k that sum to 1, e.g. for trigrams:

P_interp(w_i | w_{i-2}, w_{i-1}) = λ_3 P(w_i | w_{i-2}, w_{i-1}) + λ_2 P(w_i | w_{i-1}) + λ_1 P(w_i)

● Back-off models - if an n-gram is not present, back off to the (n-1)-gram
● Discounting - keep part of the probability mass for unseen words
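Linear interpolation can be sketched directly from the formula above. The unigram/bigram/trigram tables and λ weights below are toy values chosen for illustration:

```python
# Linearly interpolated trigram estimate:
# P_interp(w | w1 w2) = l1*P(w) + l2*P(w | w2) + l3*P(w | w1 w2)

def interp_prob(w, w1, w2, p_uni, p_bi, p_tri, lambdas=(0.2, 0.3, 0.5)):
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9  # weights must sum to 1
    return (l1 * p_uni.get(w, 0.0)
            + l2 * p_bi.get((w2, w), 0.0)
            + l3 * p_tri.get((w1, w2, w), 0.0))

# Toy distributions: the trigram "sat on mat" was never seen, but the
# lower-order models still give "mat" some probability mass.
p_uni = {"mat": 0.1}
p_bi = {("on", "mat"): 0.4}
p_tri = {}

p = interp_prob("mat", "sat", "on", p_uni, p_bi, p_tri)  # 0.2*0.1 + 0.3*0.4
```

This is exactly how interpolation fixes the zero-probability problem: even with an unseen trigram, the estimate stays positive.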
Neural-based language modeling
● Learn the probability of sequences of words with a neural network
● The network's tasks:
○ learn a representation for each word (embedding)
○ learn the probability function from these embeddings

Neural-based language modeling
● Introduced by Bengio et al. (2003)
● Feed-forward NN: the embeddings of the previous words are concatenated and fed through a hidden layer to a softmax over the vocabulary
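The forward pass of a feed-forward neural LM in the spirit of Bengio et al. (2003) can be sketched as follows. The vocabulary, dimensions, and random weights are illustrative, not the paper's:

```python
# Feed-forward neural LM forward pass: embed the context words,
# concatenate, apply a tanh hidden layer, then softmax over the vocabulary.
import numpy as np

rng = np.random.default_rng(1)
vocab = ["<s>", "I", "want", "to", "learn", "eat"]
V, d_emb, context, d_hid = len(vocab), 5, 3, 16

E = rng.normal(size=(V, d_emb)) * 0.1                  # embedding matrix
W_h = rng.normal(size=(d_hid, context * d_emb)) * 0.1  # hidden layer
W_o = rng.normal(size=(V, d_hid)) * 0.1                # output layer

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_word_distribution(context_ids):
    x = np.concatenate([E[i] for i in context_ids])  # concatenated embeddings
    h = np.tanh(W_h @ x)
    return softmax(W_o @ h)  # probability of every word in the vocabulary

probs = next_word_distribution([vocab.index(w) for w in ["I", "want", "to"]])
```

Training would adjust E, W_h, and W_o jointly by gradient descent on the log-likelihood, which is what lets the model learn embeddings and the probability function at the same time.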
Neural-based language modeling
● Mikolov et al. (2011)

[slides 47-52: figures built up step by step]
Neural-based language modeling
● Language models are often used as an auxiliary task to obtain contextual embeddings
Large Language Models

image sources: https://blog.tensorflow.org/2020/05/how-hugging-face-achieved-2x-performance-boost-question-answering.html
https://www.cs.toronto.edu/~raeidsaqur/csc401/lectures/13_LLM.pdf
Further reading
Textbooks

● Jurafsky, D. & Martin, J.H. (2009). "Speech and Language Processing: An Introduction to Natural Language Processing". http://www.cs.colorado.edu/~martin/slp.html (2nd ed.); https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf (3rd ed. draft, free)
● Goldberg, Y. & Hirst, G. (2017). "Neural Network Methods in Natural Language Processing". Morgan & Claypool Publishers.
● Manning, C.D. & Schütze, H. (1999). "Foundations of Statistical Natural Language Processing".

Articles

● Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). "A Neural Probabilistic Language Model". Journal of Machine Learning Research, 3, 1137-1155.
● Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P.P. (2011). "Natural Language Processing (almost) from Scratch". Journal of Machine Learning Research, 12, 2493-2537.
● Kim, Y. (2014). "Convolutional Neural Networks for Sentence Classification". https://arxiv.org/pdf/1408.5882.pdf

State-of-the-art

● OpenAI GPT-3
● Google Bard
Questions?
