Lab 2 RNN and Seq2Seq With Attention

SHINE-MING WU SCHOOL OF INTELLIGENT ENGINEERING & SCHOOL OF FUTURE TECHNOLOGY
Autumn 2024

Task 1 RNN

Prerequisites
1. You need to install keras and tensorflow packages:
pip install tensorflow

2. Make sure you have the following file(s): lab2_task1_RNN, including:


lab2_task1_RNN
    lab2_task1_skeleton.py
    data_helper.py
    data
        ptb.test.txt
        ptb.train.txt
        ptb.valid.txt

1. Assignment
For this task, you are required to develop an RNN model. You may either devise a new
architectural design or choose an existing model, as outlined in the tutorial.

Hint: you could refer to lab2_task1_skeleton.py

2. Tutorial for RNN


Please explore these blogs for further understanding of RNNs:

• Introduction: https://zhuanlan.zhihu.com/p/32755043

• Code: https://zhuanlan.zhihu.com/p/32881783

In this assignment, you can add an RNN layer with SimpleRNN(...); please read https://keras.io/api/layers/recurrent_layers/simple_rnn/.
Example

Figure 1: Illustration of an RNN architecture. Image Source: https://www.jianshu.com/p/b9cd38804ac6/

inputs: a 3D tensor denoting a batch of sentences, with shape [batch, timesteps, feature], e.g., [32, 10, 8], i.e., the batch contains 32 sentences, each sentence contains 10 words, and each word is represented as an 8-dimensional vector.

SimpleRNN(units=10, activation='relu',
          use_bias=True, return_sequences=True)(inputs)
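As a complement to the snippet above, here is a minimal, self-contained sketch of the same SimpleRNN call applied to a random batch (the layer arguments mirror the example; the model and variable names are illustrative only, not part of the lab skeleton):

import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Input, SimpleRNN

# A batch of 32 sentences, 10 timesteps each, every word an 8-dimensional vector
inputs = Input(shape=(10, 8))

# return_sequences=True yields one 10-dimensional output per timestep,
# so the layer output has shape (batch, 10, 10) instead of (batch, 10)
outputs = SimpleRNN(units=10, activation='relu',
                    use_bias=True, return_sequences=True)(inputs)

model = keras.Model(inputs, outputs)
x = np.random.random((32, 10, 8)).astype('float32')
print(model(x).shape)  # (32, 10, 10)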

Advanced Reading Materials (RNN)

• Keras: https://keras.io/.
• RNN: http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
• LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
• The state of the art: https://arxiv.org/pdf/1707.05589.pdf

Task 2 Seq2Seq With Attention

Prerequisites
1. You need to install keras and tensorflow packages:
pip install tensorflow

2. Install the nltk package:
pip install nltk

3. Make sure you have the following file(s): lab2_task2_seq2seq, including:
lab2_task2_seq2seq
    lab2_task2_skeleton.py
    data_helper.py
    data
        fra_cleaned.txt

1. Assignment: Neural Machine Translation


In this task, we will use the Seq2Seq model to train a translation system from French to English.
Just set the parameter of load_data to translation. This function will load sentence pairs from the Tatoeba project.
Hint: you could refer to lab2_task2_skeleton.py; the functions mainly needed in this task are hyperlinked in this document.

2. Tutorial for GRU

What is the GRU (Gated Recurrent Unit)?

Figure 2: Illustration of the GRU. Image Source: https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be

The GRU (Gated Recurrent Unit) is a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The details are as follows:

r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})    (1)
z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})    (2)
n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))    (3)
h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}    (4)

where h_t is the hidden state at time t, x_t is the input at time t, h_{t-1} is the hidden state at time t-1 (or the initial hidden state at time 0), r_t, z_t, n_t are the reset, update, and new gates, respectively, \sigma is the sigmoid function, and \odot denotes the element-wise product.
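To make the gating concrete, below is a minimal NumPy sketch of a single GRU step that implements equations (1)-(4) directly; the function, weight names, and dimensions are illustrative and not part of the lab skeleton:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step for a single example.
    x_t: (input_dim,), h_prev: (hidden_dim,), p: dict of weight matrices and biases."""
    r_t = sigmoid(p["W_ir"] @ x_t + p["b_ir"] + p["W_hr"] @ h_prev + p["b_hr"])          # eq. (1), reset gate
    z_t = sigmoid(p["W_iz"] @ x_t + p["b_iz"] + p["W_hz"] @ h_prev + p["b_hz"])          # eq. (2), update gate
    n_t = np.tanh(p["W_in"] @ x_t + p["b_in"] + r_t * (p["W_hn"] @ h_prev + p["b_hn"]))  # eq. (3), new gate
    return (1.0 - z_t) * n_t + z_t * h_prev                                              # eq. (4), new hidden state

# Toy usage: input_dim = 8, hidden_dim = 4, random weights
rng = np.random.default_rng(0)
p = {}
for g in ("r", "z", "n"):
    p[f"W_i{g}"] = rng.standard_normal((4, 8))
    p[f"W_h{g}"] = rng.standard_normal((4, 4))
    p[f"b_i{g}"] = np.zeros(4)
    p[f"b_h{g}"] = np.zeros(4)
h_t = gru_step(rng.standard_normal(8), np.zeros(4), p)
print(h_t.shape)  # (4,)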

Why GRU?

• The GRU unit controls the flow of information like the LSTM unit, but without having
to use a memory unit. It just exposes the full hidden content without any control.

• GRU is relatively new; its performance is generally on par with LSTM, but it is computationally more efficient thanks to its less complex structure.

3. Tutorial for Sequence to Sequence


Sequence to Sequence (Seq2Seq) is about training models to convert sequences from one
domain (e.g. sentences in French) to sequences in another domain (e.g. the same sentences
translated to English), introduced in 2014 by Sutskever et al.

Typically, a Seq2Seq model consists of 2 parts, the encoder part and the decoder part.
The encoder encodes the information of one sequence to one vector or multiple vectors, and
then the decoder decodes the information provided by the encoder into the target sequence.

Simple Seq2Seq

The simple version of Seq2Seq uses one encoder RNN and one decoder RNN, and the only information passed from the encoder is the last hidden state of the encoder RNN. In this lab, we use GRUs to encode and decode sequence information.

The network includes

• 1 layer of Embedding, which is shared by the encoder and the decoder for convenience.

• the encoder part (assuming N layer(s) of Bidirectional GRU)

– N − 1 layer(s) of Bidirectional GRU, which return(s) sequences only.


– 1 layer of Bidirectional GRU, which returns sequences and the last state.
– Dropout layers between GRU layers if applicable.

Figure 3: Illustration of the Seq2Seq.

• the decoder part (assuming M layer(s) of Unidirectional GRU)


– 1 layer of Unidirectional GRU, whose input is the ground truth and whose initial hidden state is the encoder's last state.
– M − 1 layer(s) of Unidirectional GRU.
– Dropout layers between GRU layers if applicable.
– 1 Dense layer with softmax activation, wrapped in TimeDistributed.

The loss function is sparse categorical crossentropy, and the Adam optimizer is used.
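Putting the pieces together, here is a minimal Keras sketch of the simple Seq2Seq model described above with N = M = 1; the vocabulary size, sequence lengths, and layer sizes are made-up placeholders, and the real values should come from data_helper.py and the skeleton:

from tensorflow import keras
from tensorflow.keras import layers

# Illustrative sizes only
vocab_size, emb_dim, units = 8000, 128, 256
src_len, tgt_len = 20, 20

# Embedding shared by the encoder and the decoder
embedding = layers.Embedding(vocab_size, emb_dim)

# Encoder: 1 Bidirectional GRU returning sequences and the last states
enc_inputs = keras.Input(shape=(src_len,), name="source_tokens")
enc_outputs, fwd_state, bwd_state = layers.Bidirectional(
    layers.GRU(units, return_sequences=True, return_state=True))(embedding(enc_inputs))
enc_state = layers.Concatenate()([fwd_state, bwd_state])  # (batch, 2 * units)

# Decoder: 1 Unidirectional GRU fed with the ground truth (teacher forcing),
# initialised with the encoder's last state
dec_inputs = keras.Input(shape=(tgt_len,), name="target_tokens")
dec_outputs = layers.GRU(2 * units, return_sequences=True)(
    embedding(dec_inputs), initial_state=enc_state)

# Per-timestep softmax over the target vocabulary
probs = layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax"))(dec_outputs)

model = keras.Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

Extra GRU layers for N, M > 1, and Dropout layers between them, can be stacked in the same way.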

Seq2Seq with Attention

Encoding of the source sentence needs to capture all information about the source sentence, but using only the last hidden state creates an information bottleneck!
Core idea: in each step of the decoder, focus on a particular part of the source sequence.

Compared with the simple version, the encoder remains unchanged, but the decoder needs more information about the source sequence.
There are many ways to fuse this information. One way is to concatenate the attention context with the current decoder hidden state before feeding it to the decoder RNN. A faster and easier way is to concatenate the attention context with the output of the decoder RNN before making predictions.
To make this lab easier, we choose the latter method.
Figure 4: Illustration of the Seq2Seq with Attention.

The decoder with attention (see Figure 4) includes

− 1 layer of Unidirectional GRU, whose input is the ground truth and whose initial hidden state is the encoder's last state.

− M − 1 layer(s) of Unidirectional GRU.

− Dropout layers between GRU layers if applicable.

− Necessary layers (e.g. Dense layers) and computation operations (e.g. dot product).

− The concatenate operation and a Dense layer to fuse the attention with the output of the decoder RNN.

− 1 Dense layer with softmax activation, wrapped in TimeDistributed.
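Following the same conventions as the previous sketch, a minimal sketch of the attention decoder (dot-product attention fused with the decoder output, i.e. the latter method chosen above) could look like this; again, all sizes and names are placeholders rather than the skeleton's actual API:

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, emb_dim, units = 8000, 128, 256
src_len, tgt_len = 20, 20
embedding = layers.Embedding(vocab_size, emb_dim)

# Encoder: unchanged from the simple version
enc_inputs = keras.Input(shape=(src_len,), name="source_tokens")
enc_outputs, fwd_state, bwd_state = layers.Bidirectional(
    layers.GRU(units, return_sequences=True, return_state=True))(embedding(enc_inputs))
enc_state = layers.Concatenate()([fwd_state, bwd_state])

# Decoder GRU (teacher forcing), initialised with the encoder's last state
dec_inputs = keras.Input(shape=(tgt_len,), name="target_tokens")
dec_outputs = layers.GRU(2 * units, return_sequences=True)(
    embedding(dec_inputs), initial_state=enc_state)

# Dot-product attention: a score between every decoder step and every encoder step
scores = layers.Dot(axes=(2, 2))([dec_outputs, enc_outputs])   # (batch, tgt_len, src_len)
weights = layers.Activation("softmax")(scores)                  # attention distribution over source steps
context = layers.Dot(axes=(2, 1))([weights, enc_outputs])       # (batch, tgt_len, 2 * units)

# Fuse the attention context with the decoder output, then predict per timestep
fused = layers.Concatenate()([context, dec_outputs])
fused = layers.Dense(2 * units, activation="tanh")(fused)
probs = layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax"))(fused)

model = keras.Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Compared with feeding attention into the decoder RNN input, this fusion only needs operations on whole output tensors, which is why it is the faster and easier option in Keras.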

Submission

Please submit the files, including program output and Python scripts, to the Teaching Assistant's email address: ZT N LP 2024 [email protected]. After you finish the assignments, make sure you include the header information at the beginning of your code:
# author: Your name
# student_id: Your student ID

Copy all the program output into a text file named StudentID_StudentName_lab2_output.txt, and submit a zipped Python script solution named StudentID_StudentName_lab2.zip containing all the Python scripts and the aforementioned answer file to the TA's email address.

If you want onsite grading during the lab, you can ask the TA to grade your lab submission by showing your code and outputs. It should be noted that, even with onsite grading, you still need to submit the files to Blackboard to keep an electronic record of your assignment.

Deadline for Submission: 20:00, November 3, 2024.
