3 - Deep Learning

The document provides an overview of natural language processing and deep learning. It discusses neural networks, text processing for deep learning models, and various applications of sequence modeling for NLP tasks like sentiment analysis, search queries, machine translation, chatbots, and text summarization.

Uploaded by Ansruta Mohanty

Deep Learning and NLP

Natural Language Processing


Session 4

Madhuri Prabhala
Overview of the class

❑ Neural networks
o Weights and biases
o Forward propagation
o Backward propagation
o Activation function
o Gradient descent

❑ Use cases in text data

❑ Text processing for input into deep learning architectures


Understanding Neural network models

[Diagram: a model takes monthly income as input and answers "Offer home loan or not?". The applicant's salary is compared against Rs. 45,000: low salary → home loan not approved (No), high salary → approved (Yes).]
Steps in the model

1. Take the salary as an input, similar to a biological neuron taking in inputs and giving a reaction.

2. Check if it is greater than Rs. 45,000.

3. If it is, then output "Loan approved".

[Diagram: Monthly income (X) → Is salary > 45,000? → Is home loan approved? (Y)]
Steps in the model

[Diagram: spouse's salary (X1), applicant's salary (X2) and father's salary (X3) feed into the decision "Home loan approved?" (Y). The threshold of 45,000 is based on total household income.]
Steps in the model

The threshold of 45,000 is based on total household income, so the decision rule is:

X1 + X2 + X3 > Threshold

=> X1 + X2 + X3 – Threshold > 0

=> X1 + X2 + X3 + Bias > 0, where Bias = –Threshold

If X1 + X2 + X3 + Bias > 0, then the output should be 1 (Approved)
If X1 + X2 + X3 + Bias < 0, then the output should be 0 (Not approved)
The step function

Z = X1 + X2 + X3 + bias

Step function:

Output = 1 (for Z > 0)
Output = 0 (for Z < 0)

Here, the step function is the "Activation function".

If the Activation function is a step function, the model is called a Perceptron.
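The step activation can be sketched in Python (a minimal illustration; the function name is my own):

```python
def step(z):
    # Step activation: output 1 when Z > 0, otherwise 0
    return 1 if z > 0 else 0

print(step(2.5), step(-3.0))  # 1 0
```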


Weights in the Perceptron

[Diagram: spouse's salary (10,000), applicant's salary (15,000) and father's salary (7,000) each enter with weight 1; bias = –45,000; output Y: home loan approved?]

(10,000 × 1) + (15,000 × 1) + (7,000 × 1) = 32,000

So, what is the value of Z?

Z = X1 + X2 + X3 + bias = 32,000 – 45,000 = –13,000

So, should the loan be approved?

Answer: No (the step function returns 0 for values < 0)
Weights in the Perceptron

[Diagram: the same three inputs, now with weights 2, 3 and 1 respectively; bias = –45,000.]

(10,000 × 2) + (15,000 × 3) + (7,000 × 1) = 72,000

So, what is the value of Z?

Z = (2 × X1) + (3 × X2) + (1 × X3) + bias = 72,000 – 45,000 = 27,000

So, should the loan be approved?

Answer: Yes (the step function returns 1 for values > 0)
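The two weight settings above can be sketched as a weighted perceptron (a minimal illustration; the function name is my own):

```python
def weighted_perceptron(inputs, weights, bias):
    # Z = w1*X1 + w2*X2 + w3*X3 + bias, followed by a step activation
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

incomes = [10000, 15000, 7000]
print(weighted_perceptron(incomes, [1, 1, 1], -45000))  # 0: Z = -13,000, not approved
print(weighted_perceptron(incomes, [2, 3, 1], -45000))  # 1: Z =  27,000, approved
```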
Deep Learning Models
❑ Typically used in the case of unstructured data. Use of training data to improve prediction accuracy.
❑ Multi-layer perceptron
o Number of hidden layers
o Number of neurons in the hidden layers
o Input layers
o Number of neurons in the output layers

❑ Use of newly created features to create more features and other hidden layers

❑ TensorFlow Playground:

http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.68420&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

❑ Forward and Backward propagation


❑ Epoch (1 cycle of Forward and backward propagation)
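An epoch, one cycle of forward and backward propagation followed by a weight update, can be sketched for a single sigmoid neuron (a toy illustration with invented numbers, not the playground's exact setup):

```python
import math

# Toy data: one input x and one target y (values invented for illustration)
x, y = 2.0, 1.0
w, b = 0.1, 0.0   # initial weight and bias
lr = 0.5          # learning rate

for epoch in range(20):
    # Forward propagation
    z = w * x + b
    a = 1 / (1 + math.exp(-z))   # sigmoid activation
    loss = (a - y) ** 2          # squared error

    # Backward propagation (chain rule)
    grad = 2 * (a - y) * a * (1 - a)
    w -= lr * grad * x
    b -= lr * grad

print(round(loss, 4))  # loss shrinks toward 0 as epochs accumulate
```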
Text data – Sequential models
Sequential data – Text data

job data the scientist a is of excellent (scrambled)

the job of a data scientist is excellent (original order)

Text data has a sequence.

If the sequence is disturbed, it stops making sense.

Therefore, Text data falls under the category of sequence data.


Sequence modeling in text – Use case – Sentiment classification

o The job of a data scientist is excellent

o I truly dislike my job

Sentiment Classification
Sequence modeling in text – Use case – Search queries

"Is Amazon a better e-commerce site compared to Flipkart?" → Sequence Models

Search engine queries


Sequence modeling in text – Use case – Machine Translation

"Oppenheimer is a great movie" → Sequence Models → "Oppenheimer es una gran película."

Language translation
Sequence modeling in text – Use case – Chat bots

"Please suggest a suitable insurance policy." → Sequence Models → "Sure, I will help you choose a suitable policy."

Dialogue systems
Sequence modeling in text – Use case – Text summarization

"A good data scientist needs to have five important skills. Anyone who wants to start a career as a data scientist must gain these skills. They are: 1… 2… 3… 4… 5…" → Sequence Models → "The five skills required to start a career as a data scientist are: …"

Text summarization
NLP for Deep Learning
Preparing text for deep learning models

1. Text pre-processing

2. Convert the text into arrays

3. Feed the arrays into the deep learning models


Text pre-processing for Deep learning

Text cleaning → Remove text noise: unwanted or useless information in the text
• URLs, punctuation marks, numbers, special characters
• Slang – bro, dope, etc.
• Spelling mistakes – cntrl, defntly
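The cleaning step above can be sketched with regular expressions (a minimal example; the function name and patterns are my own):

```python
import re

def clean_text(text):
    # Remove URLs, numbers, punctuation and extra spaces; lower-case the rest
    text = re.sub(r"http\S+", " ", text)       # URLs
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # punctuation, numbers, special characters
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_text("Check https://example.com NOW!! 100% legit..."))
# -> "check now legit"
```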

Text pre-processing

Text representation → Encoding: the mapping between text characters and computer memory
• ASCII (English), West European (Latin), Big5 (Chinese)
Text encoding

❑ ASCII codes are numerical representations of all the characters

❑ Another encoding can have some other characteristics to represent the English characters

❑ It is important to have a standard encoding for all kinds of text before any modeling or analysis on text

❑ UTF-8 – universally accepted encoding for most languages
o All text data should be available in UTF-8 to avoid any discrepancy
o It is preferred to convert all text to lower-case; e.g. "Pen" and "pen" are treated differently by the computer

English   ASCII Code
a         097
b         098
c         099
d         100
e         101
f         102
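The code points in the table can be checked directly in Python:

```python
# ASCII code points match the table: 'a' -> 97, 'b' -> 98, ...
print(ord("a"), ord("b"))      # 97 98

# UTF-8 encodes English characters with the same bytes as ASCII
print("abc".encode("utf-8"))   # b'abc'

# Lower-casing so that "Pen" and "pen" map to the same token
print("Pen".lower() == "pen")  # True
```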
Representing text data numerically

❑ Machine learning algorithms do not accept raw text as input.

❑ Therefore, text needs to be converted to numbers

❑ Two ways of converting text to numbers are:


o One-hot encoding
o Word embeddings
Text representation – One hot encoding
o The length of the vector is fixed
o It is equal to the number of unique words in the vocabulary.
o In actual data, the size of the vocabulary, and therefore the
size of the vectors is huge.
Example: "I have a rose garden". Each word gets a vector with a single 1 at its own vocabulary position:

I      1 0 0 0 0
have   0 1 0 0 0
a      0 0 1 0 0
rose   0 0 0 1 0
garden 0 0 0 0 1
Text representation – One hot encoding – Steps

o Clean the text

o Create tokens from the text

o From the created tokens, prepare a vocabulary

o Prepare the one-hot encoders
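The four steps above can be sketched in plain Python (a minimal illustration; helper names are my own):

```python
sentences = ["The school is nearby.", "The tennis class is fun.", "Give me the book."]

# 1. Clean and 2. tokenize
tokens = [s.lower().replace(".", "").split() for s in sentences]

# 3. Build the vocabulary from the tokens (first-occurrence order)
vocab = []
for sent in tokens:
    for word in sent:
        if word not in vocab:
            vocab.append(word)

# 4. One-hot encode: a 1 at the word's vocabulary index, 0 elsewhere
def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(vocab)              # ['the', 'school', 'is', 'nearby', 'tennis', 'class', 'fun', 'give', 'me', 'book']
print(one_hot("tennis"))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```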


Steps in One hot encoding

Original text → Cleaned text → Tokens

The school is nearby. → the school is nearby → the, school, is, nearby
The tennis class is fun. → the tennis class is fun → the, tennis, class, is, fun
Give me the book. → give me the book → give, me, the, book

Vocabulary (unique tokens): the, school, is, nearby, tennis, class, fun, give, me, book
Steps in one hot encoding – Creating the one hot vector

The size of the vector is equal to the size of the vocabulary.

Vocabulary: the, school, is, nearby, tennis, class, fun, give, me, book

In this case the size of the vocabulary is 10, so the size of each one-hot vector is 10. With the words listed alphabetically:

book   1 0 0 0 0 0 0 0 0 0
class  0 1 0 0 0 0 0 0 0 0
fun    0 0 1 0 0 0 0 0 0 0
give   0 0 0 1 0 0 0 0 0 0
…
tennis 0 0 0 0 0 0 0 0 1 0
the    0 0 0 0 0 0 0 0 0 1
Limitation of one hot encoding

❑ Look at these sentences:

o The shop is nearby.
o The college is nearby.

The ______ is nearby. The context is identical.

❑ Assuming a vector size of 25:

shop    0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
college 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

o Is this identical context reflected in the one-hot encoding?
o No. One-hot encoding does not capture the context.
o This challenge is overcome by word embeddings.
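The limitation can be verified numerically: any two distinct one-hot vectors have a dot product of zero, so identical contexts produce no similarity (the vectors below are hypothetical):

```python
# Hypothetical one-hot vectors for "shop" and "college" in a 6-word vocabulary
shop    = [0, 0, 1, 0, 0, 0]
college = [0, 0, 0, 0, 1, 0]

# The dot product (and hence cosine similarity) of two distinct one-hot vectors is 0
similarity = sum(a * b for a, b in zip(shop, college))
print(similarity)  # 0 -> one-hot encoding sees no relationship between the words
```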
Text representation – Word embeddings

❑ Another approach to numerical representation of text

❑ Embeddings do not depend on the vocabulary size

o Even with a vocabulary size of 40,000, the embedding vectors can have just 300 to 400 dimensions

❑ They capture the context around the word

o Word embeddings represent the position of the word in each context

❑ They are obtained by training a special neural network architecture


Obtaining Word embeddings

vector of mango ≈ vector of guava
vector of cheetah ≈ vector of leopard

[Diagram: in the embedding space, Mango lies close to Guava, and Cheetah lies close to Leopard.]


Obtaining Word embeddings - Context

Rooster – Male + Female = ?? → Hen

=> The gender relationship is preserved by the word embeddings.

[Diagram: the direction from Male to Female matches the direction from Rooster to Hen.]
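The analogy can be illustrated with toy vectors (the numbers below are invented for illustration and are not real embedding values):

```python
# Toy 2-d "embeddings" (invented; real embeddings have hundreds of dimensions)
vectors = {
    "rooster": [0.9, 0.7],   # dim 1 ~ "bird-ness", dim 2 ~ "male-ness" (assumed)
    "hen":     [0.9, 0.1],
    "male":    [0.1, 0.8],
    "female":  [0.1, 0.2],
}

# rooster - male + female should land near hen
result = [round(r - m + f, 2) for r, m, f in
          zip(vectors["rooster"], vectors["male"], vectors["female"])]
print(result)  # [0.9, 0.1] -> matches the vector for "hen"
```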
Obtaining Word embeddings - Approaches

Obtaining word embeddings

Training word embedding representations from scratch:
✓ A huge text corpus is fed to a NN architecture
✓ The network is trained to give out word embeddings
✓ The bigger the text corpus, the better the word embeddings
✓ E.g., Wikipedia, thousands of news articles, etc.

Pre-trained word embeddings:
✓ word2vec (Google)
✓ GloVe (Stanford)
✓ Can be downloaded from the internet
