Practical Deep Learning for NLP
Maarten Versteegh
NLP Research Engineer
Overview
● Deep Learning Recap
● Text classification:
– Convnet with word embeddings
● Sentiment analysis:
– ResNet
● Tips and tricks
What is this deep learning thing again?
[Diagram: feed-forward network with input, hidden, and output layers; activations flow forward, errors propagate back]
Rectified Linear Units
Backpropagation involves repeated multiplication with the derivative of the activation function
→ Problem: if that derivative is always smaller than 1, the gradient vanishes in deep networks!
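A quick numeric illustration (not from the deck): the sigmoid's derivative peaks at 0.25, so a chain of many such factors shrinks toward zero, while the ReLU derivative is exactly 1 for positive inputs.

import numpy as np

x = np.linspace(-5, 5, 11)
sigmoid = 1 / (1 + np.exp(-x))
d_sigmoid = sigmoid * (1 - sigmoid)  # at most 0.25
d_relu = (x > 0).astype(float)       # exactly 1 for positive inputs

print(0.25 ** 20)  # ~9.1e-13: gradient vanishes over 20 layers
print(1.0 ** 20)   # 1.0: gradient survives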
Text Classification
Traditional approach: BOW + TFIDF
“The car might also need a front end alignment”
"alignment" (0.323) "also need" (0.343)
"also" (0.137) "car might" (0.358)
"car" (0.110) "end alignment" (0.358)
"end" (0.182) "front end" (0.296)
"front" (0.167) "might also" (0.358)
"might" (0.178) "need front" (0.358)
"need" (0.157) "the car" (0.161)
"the" (0.053)
20 newsgroups performance
F1-Score*
BOW+TFIDF+SVM Some number
(*) Scores removed
Deep Learning 1: Replace Classifier
[Architecture: BOW features (x 1000) → Hidden (x 512) → Hidden (x 256) → Output]
from keras.layers import Input, Dense
from keras.models import Model
from keras.utils.np_utils import to_categorical

# 1000-dimensional BOW feature vectors in, one softmax unit per class out
input_layer = Input(shape=(1000,))
fc_1 = Dense(512, activation='relu')(input_layer)
fc_2 = Dense(256, activation='relu')(fc_1)
output_layer = Dense(10, activation='softmax')(fc_2)

model = Model(input=input_layer, output=output_layer)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# categorical_crossentropy expects one-hot targets
model.fit(bow, to_categorical(newsgroups.target))
predictions = model.predict(features).argmax(axis=1)
20 newsgroups performance
F1-Score*
BOW+TFIDF+SVM Some number
BOW+TFIDF+SVD + 2-layer NN Some slightly higher number
(*) Scores removed
What about the deep learning promise?
Convolutional Networks
Source: Andrej Karpathy
Pooling layer
Source: Andrej Karpathy
Convolutional networks
Source: Y. Kim (2014) Convolutional Neural Networks for Sentence Classification
Word embedding
from keras.layers import Input, Embedding

# embedding_matrix: ndarray(vocab_size, embedding_dim),
# e.g. pre-trained word2vec or GloVe vectors
input_layer = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
layer = Embedding(
    embedding_matrix.shape[0],   # vocabulary size
    embedding_matrix.shape[1],   # embedding dimension
    weights=[embedding_matrix],  # initialize with the pre-trained vectors
    input_length=MAX_SEQUENCE_LENGTH,
    trainable=False              # keep the embeddings fixed during training
)(input_layer)
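The Embedding layer expects padded integer sequences; a minimal sketch of producing them with the Keras 1.x preprocessing utilities (texts and vocab_size are placeholders):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(nb_words=vocab_size)  # keep the top vocab_size words
tokenizer.fit_on_texts(texts)               # texts: list of raw strings
sequences = tokenizer.texts_to_sequences(texts)
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)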
from keras.layers import Convolution1D, MaxPooling1D, \
    BatchNormalization, Activation

layer = Embedding(...)(input_layer)
layer = Convolution1D(
    128,  # number of filters
    5,    # filter size
    activation='relu',
)(layer)
layer = MaxPooling1D(5)(layer)  # pool over windows of 5 steps
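The deck stops at the pooling layer; a minimal sketch of one way to finish the classifier (the flatten-plus-softmax head is an assumption, not the deck's exact architecture):

from keras.layers import Flatten, Dense
from keras.models import Model

layer = Flatten()(layer)  # (steps, filters) -> flat feature vector
output_layer = Dense(20, activation='softmax')(layer)  # one unit per class
model = Model(input=input_layer, output=output_layer)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])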
Performance
F1-Score*
BOW+TFIDF+SVM Some number
BOW+TFIDF+SVD + NN Some slightly higher number
ConvNet (3 layers) Quite a bit higher now
ConvNet (6 layers) Look mom, even higher!
(*) Scores removed
Sentiment Analysis
Data Set
Facebook posts from media organizations:
– CNN, MSNBC, NYTimes, The Guardian, Buzzfeed, Breitbart, Politico, The Wall Street Journal, Washington Post, Baltimore Sun
Measure sentiment as “reactions”
Title | Org | Like | Love | Wow | Haha | Sad | Angry
Poll: Clinton up big on Trump in Virginia | CNN | 4176 | 601 | 17 | 211 | 11 | 83
It's a fact: Trump has tiny hands. Will this be the one that sinks him? | Guardian | 595 | 17 | 17 | 225 | 2 | 8
Donald Trump Explains His Obama-Founded-ISIS Claim as ‘Sarcasm’ | NYTimes | 2059 | 32 | 284 | 1214 | 80 | 2167
Can hipsters stomach the unpalatable truth about avocado toast? | Guardian | 3655 | 0 | 396 | 44 | 773 | 69
Tim Kaine skewers Donald Trump's military policy | MSNBC | 1094 | 111 | 6 | 12 | 2 | 26
Top 5 Most Antisemitic Things Hillary Clinton Has Done | Breitbart | 1067 | 7 | 134 | 35 | 22 | 372
17 Hilarious Tweets About Donald Trump Explaining Movies | Buzzfeed | 11390 | 375 | 16 | 4121 | 4 | 5
Go deeper: ResNet
Convolutional layers with shortcut connections
Source: He et al. (2015) Deep Residual Learning for Image Recognition
Go deeper: ResNet
from keras.layers import merge

# border_mode='same' keeps the sequence length constant, and the
# block input must already have 128 channels, so that the shortcut
# sum below matches shapes
input_layer = ...
layer = Convolution1D(128, 5, activation='linear',
                      border_mode='same')(input_layer)
layer = BatchNormalization()(layer)
layer = Activation('relu')(layer)
layer = Convolution1D(128, 5, activation='linear',
                      border_mode='same')(layer)
layer = BatchNormalization()(layer)
# shortcut: add the block's input to its output, then the final ReLU
block_output = merge([layer, input_layer], mode='sum')
block_output = Activation('relu')(block_output)
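To stack the ten blocks shown in the architecture sketch that follows, one might wrap this pattern in a helper (resnet_block and embedding_layer are hypothetical names, not from the deck):

def resnet_block(x, n_filters=128, filter_size=5):
    # same conv-BN-relu-conv-BN-sum-relu pattern as above
    y = Convolution1D(n_filters, filter_size, activation='linear',
                      border_mode='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Convolution1D(n_filters, filter_size, activation='linear',
                      border_mode='same')(y)
    y = BatchNormalization()(y)
    y = merge([y, x], mode='sum')  # shortcut connection
    return Activation('relu')(y)

# a plain conv up front gives the input 128 channels for the shortcuts
layer = Convolution1D(128, 5, border_mode='same')(embedding_layer)
for _ in range(10):  # "Conv (128) x 10"
    layer = resnet_block(layer)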
[Architecture diagram: Title + Message (e.g. "It's a fact: Trump has tiny hands.") → word embeddings (EMBEDDING_DIM=300) → 10 stacked ResNet blocks with Conv (128) → MaxPooling → Dense; the news org (e.g. The Guardian), 1-of-K encoded, enters through its own Dense layer; output is the predicted reaction distribution (%)]
Cherry-picked predicted response distribution*

Sentence | Org | Love | Haha | Wow | Sad | Angry
Trump wins the election | Guardian | 3% | 9% | 7% | 32% | 49%
Trump wins the election | Breitbart | 58% | 30% | 8% | 1% | 3%

*Your mileage may vary. By a lot. I mean it.
Tips and Tricks
Initialization
● Break symmetry:
– Never ever initialize all your weights to the same value
● Let initialization depend on the activation function (see the sketch below):
– ReLU/PReLU → He normal
– sigmoid/tanh → Glorot normal
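In the Keras 1.x API used throughout this deck, the initializer is the init argument; a minimal sketch (layer sizes are illustrative):

from keras.layers import Dense

# He normal for ReLU layers, Glorot normal for sigmoid/tanh layers
relu_layer = Dense(512, activation='relu', init='he_normal')
tanh_layer = Dense(256, activation='tanh', init='glorot_normal')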
Choose an adaptive optimizer
Source: Alec Radford
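Keras takes either an optimizer name or an instance; passing an instance lets you set the learning rate (the choice of Adam and the rate here are illustrative):

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=1e-3),
              loss='categorical_crossentropy',
              metrics=['accuracy'])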
Choose the right model size
● Start small and keep adding layers
– Check if test error keeps going down
● Cross-validate over the number of units
● You want to be able to overfit
Y. Bengio (2012) Practical recommendations for gradient-based training of deep architectures
Don't be scared of overfitting
● If your model can't overfit, it also can't learn enough
● So, check that your model can overfit:
– If not, make it bigger
– If so, get more data and/or regularize
Source: Wikipedia
Regularization
● Norm penalties on hidden layer weights, never on the first and last
● Dropout
● Early stopping
(a sketch of all three follows below)
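A minimal sketch of all three in Keras 1.x (the penalty strength, dropout rate, and patience are illustrative, not the deck's values):

from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

# L2 norm penalty on a hidden layer's weights (W_regularizer in Keras 1.x)
layer = Dense(512, activation='relu', W_regularizer=l2(0.01))(layer)
layer = Dropout(0.5)(layer)  # randomly drop half the activations in training

# early stopping: halt when the validation loss stops improving
model.fit(x_train, y_train, validation_split=0.1,
          callbacks=[EarlyStopping(monitor='val_loss', patience=3)])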
Size of data set
● Just get more data already
● Augment data:
– Textual replacements
– Word vector perturbation (sketched below)
– Noise Contrastive Estimation
● Semi-supervised learning:
– Adapt word embeddings to your domain
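One reading of "word vector perturbation" is adding small Gaussian noise to the embedded inputs; a hypothetical numpy sketch (perturb and the noise scale are assumptions to tune):

import numpy as np

def perturb(embedded_batch, scale=0.01):
    # add Gaussian noise to a batch of embedded sequences
    noise = np.random.normal(0.0, scale, size=embedded_batch.shape)
    return embedded_batch + noise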
Monitor your model
Training loss:
– Does the model converge?
– Is the learning rate too low or too high?
Training loss and learning rate
Source: Andrej Karpathy
Monitor your model
Training and validation accuracy
– Is there a large gap?
– Does the training accuracy increase
while the validation accuracy
decreases?
Training and validation accuracy
Source: Andrej Karpathy
Monitor your model
● Ratio of weights to updates (see the callback sketch below)
● Distribution of activations and gradients (per layer)
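A rough sketch of the first check as a custom Keras callback (UpdateRatioLogger is a hypothetical helper; a healthy update-to-weight ratio is often quoted around 1e-3):

import numpy as np
from keras.callbacks import Callback

class UpdateRatioLogger(Callback):
    # log the per-layer ratio of update magnitude to weight magnitude
    def on_epoch_begin(self, epoch, logs=None):
        self.prev = [w.copy() for w in self.model.get_weights()]

    def on_epoch_end(self, epoch, logs=None):
        for prev, curr in zip(self.prev, self.model.get_weights()):
            ratio = np.linalg.norm(curr - prev) / (np.linalg.norm(prev) + 1e-8)
            print('update/weight ratio: %.2e' % ratio)

model.fit(x_train, y_train, callbacks=[UpdateRatioLogger()])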
Hyperparameter optimization
After network architecture, continue with:
– Regularization strength
– Initial learning rate
– Optimization strategy (and LR decay
schedule)
Friends don't let friends do a full grid search!
– Use a smart strategy like Bayesian optimization or Particle Swarm Optimization (Spearmint, SMAC, Hyperopt, Optunity)
– Even random search often beats grid search (sketched below)
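A minimal random-search sketch in plain Python (build_model and evaluate are placeholders for your own training and validation code):

import random

def sample_config():
    return {
        'lr': 10 ** random.uniform(-4, -2),          # log-uniform learning rate
        'n_filters': random.choice([64, 128, 256]),
        'dropout': random.uniform(0.2, 0.6),
    }

best_score, best_config = float('-inf'), None
for _ in range(20):  # 20 random trials
    config = sample_config()
    score = evaluate(build_model(config))  # e.g. validation F1
    if score > best_score:
        best_score, best_config = score, config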
Keep up to date: arxiv-sanity.com
We are hiring!
DevOps & Front-end
NLP engineers
Full-stack Python engineers
www.textkernel.com/jobs
Questions?
Source: http://visualqa.org/