NLP Lab2
Task 1 RNN
Prerequisites
1. You need to install keras and tensorflow packages:
pip install tensorflow
Assignment
For this task, you are required to develop an RNN model. You may either devise a new
architectural design or choose an existing model, as outlined in the tutorial.
• Introduction: https://zhuanlan.zhihu.com/p/32755043
• Code: https://zhuanlan.zhihu.com/p/32881783
In this assignment, you can add an RNN layer with SimpleRNN(...); please read https://keras.io/api/layers/recurrent_layers/simple_rnn/.
Example
Figure 1: Illustration of an RNN architecture. Image source: https://www.jianshu.com/p/b9cd38804ac6/
inputs: a 3D tensor representing a batch of sentences, with shape [batch, timesteps, feature], e.g. [32, 10, 8], i.e., the batch contains 32 sentences, each sentence contains 10 words, and each word is represented as an 8-dimensional vector.
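The following minimal example (adapted from the SimpleRNN example in the Keras documentation; the unit count of 4 is arbitrary) shows these shapes in practice:

import numpy as np
import tensorflow as tf

# A batch of 32 "sentences", each with 10 timesteps (words),
# each word represented as an 8-dimensional vector.
inputs = np.random.random((32, 10, 8)).astype(np.float32)

# SimpleRNN with 4 units; by default it returns only the last
# hidden state, with shape (batch, units) = (32, 4).
rnn = tf.keras.layers.SimpleRNN(4)
print(rnn(inputs).shape)  # (32, 4)

# With return_sequences=True it returns the hidden state at every
# timestep instead, with shape (32, 10, 4).
rnn_seq = tf.keras.layers.SimpleRNN(4, return_sequences=True)
print(rnn_seq(inputs).shape)  # (32, 10, 4)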
• Keras: https://keras.io/.
• RNN: http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/
• LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
• The state of the art: https://arxiv.org/pdf/1707.05589.pdf
Task 2 Seq2Seq
Prerequisites
1. You need to install keras and tensorflow packages:
pip install tensorflow
3. Make sure you have the following file(s): lab2_task2_seq2seq, including:
lab2_task2_seq2seq
    lab2_task2_skeleton.py
    data_helper.py
    data
        fra_cleaned.txt
The GRU (Gated Recurrent Unit) is a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. The details are as follows:

r_t = σ(W_ir x_t + b_ir + W_hr h_{t−1} + b_hr)
z_t = σ(W_iz x_t + b_iz + W_hz h_{t−1} + b_hz)
n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_{t−1} + b_hn))
h_t = (1 − z_t) ⊙ n_t + z_t ⊙ h_{t−1}

where h_t is the hidden state at time t, x_t is the input at time t, h_{t−1} is the hidden state of the previous layer at time t − 1 or the initial hidden state at time 0, and r_t, z_t, n_t are the reset, update, and new gates, respectively; σ is the sigmoid function and ⊙ denotes element-wise multiplication.
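To make the gating concrete, here is a minimal NumPy sketch of a single GRU step, directly transcribing the equations above; the parameter names and sizes are illustrative and not part of the lab skeleton.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    # One GRU step; x_t: (input_dim,), h_prev: (hidden_dim,),
    # p: dict of weight matrices W_* and bias vectors b_*.
    r_t = sigmoid(p["W_ir"] @ x_t + p["b_ir"] + p["W_hr"] @ h_prev + p["b_hr"])  # reset gate
    z_t = sigmoid(p["W_iz"] @ x_t + p["b_iz"] + p["W_hz"] @ h_prev + p["b_hz"])  # update gate
    n_t = np.tanh(p["W_in"] @ x_t + p["b_in"] + r_t * (p["W_hn"] @ h_prev + p["b_hn"]))  # new gate
    return (1.0 - z_t) * n_t + z_t * h_prev  # new hidden state h_t

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 4
p = {}
for g in ("r", "z", "n"):
    p[f"W_i{g}"] = rng.normal(size=(hidden_dim, input_dim))
    p[f"W_h{g}"] = rng.normal(size=(hidden_dim, hidden_dim))
    p[f"b_i{g}"] = np.zeros(hidden_dim)
    p[f"b_h{g}"] = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x in rng.normal(size=(10, input_dim)):  # run over a 10-step sequence
    h = gru_step(x, h, p)
print(h.shape)  # (4,)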
Why GRU?
• The GRU unit controls the flow of information like the LSTM unit, but without a separate memory cell; it simply exposes the full hidden state without any output control.
• GRU is relatively new; its performance is generally on par with LSTM, but it is computationally more efficient thanks to its simpler structure.
Typically, a Seq2Seq model consists of two parts: an encoder and a decoder. The encoder encodes the information of one sequence into one or more vectors, and the decoder then decodes the information provided by the encoder into the target sequence.
Simple Seq2Seq
The simple version of Seq2Seq uses one encoder RNN and one decoder RNN, and the only information passed from the encoder is its last hidden state. In this lab, we use GRUs to encode and decode sequence information. The model includes (a sketch follows Figure 3 below):
• 1 layer of Embedding, which is shared by the encoder and the decoder for convenience.
Figure 3: Illustration of the Seq2Seq.
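As a rough sketch of this architecture (the actual interface is defined in lab2_task2_skeleton.py; vocab_size, embed_dim, and hidden_dim below are illustrative placeholders), the simple GRU Seq2Seq with a shared embedding could be wired up in Keras like this:

import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes; the real values depend on the lab data.
vocab_size, embed_dim, hidden_dim = 10000, 128, 256

# One Embedding layer, shared by the encoder and the decoder.
embedding = layers.Embedding(vocab_size, embed_dim)

# Encoder: a GRU whose last hidden state summarizes the source sentence.
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_state = layers.GRU(hidden_dim)(embedding(enc_inputs))

# Decoder: a GRU fed the ground-truth target sequence (teacher forcing)
# and initialized with the encoder's last hidden state.
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_seq = layers.GRU(hidden_dim, return_sequences=True)(
    embedding(dec_inputs), initial_state=enc_state)

# Project each decoder step onto the target vocabulary.
logits = layers.Dense(vocab_size)(dec_seq)
model = tf.keras.Model([enc_inputs, dec_inputs], logits)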
Seq2Seq with Attention
The encoding of the source sentence needs to capture all information about the source sentence, but using only the last hidden state causes an information bottleneck!
Core idea: in each step of the decoder, focus on a particular part of the source sequence.
Compared with the simple version, the encoder remains unchanged, but the decoder needs more information about the source sequence.
There are many ways to fuse information. One way is to concatenate the attention context with the current decoder hidden state before feeding it into the decoder RNN. A faster and easier way is to concatenate the attention context with the output of the decoder RNN before making predictions. To make this lab easier, we choose the latter method (a sketch follows the component list below).
Figure 4: Illustration of the Seq2Seq with Attention.
The decoder with attention includes:
− 1 layer of Unidirectional GRU, whose input is the ground truth and whose initial hidden state is the last hidden state of the encoder.
− Necessary layers (e.g. Dense layers) and computation operations (e.g. dot product).
− The concatenate operation and a Dense layer to fuse the attention with the output of the decoder RNN.
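Continuing the previous sketch (same illustrative sizes, not the lab skeleton's actual interface), the decoder with dot-product attention and concatenation-based fusion could look like this; keras.layers.Attention implements the dot-product scoring step:

import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes, as in the previous sketch.
vocab_size, embed_dim, hidden_dim = 10000, 128, 256
embedding = layers.Embedding(vocab_size, embed_dim)

# Encoder: return the hidden state at every timestep (needed by the
# attention) as well as the last state (to initialize the decoder).
enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
enc_seq, enc_state = layers.GRU(
    hidden_dim, return_sequences=True, return_state=True)(embedding(enc_inputs))

# Decoder GRU over the ground-truth targets, initialized from the encoder.
dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
dec_seq = layers.GRU(hidden_dim, return_sequences=True)(
    embedding(dec_inputs), initial_state=enc_state)

# Dot-product attention: each decoder step attends over all encoder steps.
# layers.Attention takes [query, value]; here the query is the decoder output.
context = layers.Attention()([dec_seq, enc_seq])  # (batch, T_dec, hidden_dim)

# Fuse: concatenate the attention context with the decoder output,
# pass it through a Dense layer, then project onto the vocabulary.
fused = layers.Dense(hidden_dim, activation="tanh")(
    layers.Concatenate()([context, dec_seq]))
logits = layers.Dense(vocab_size)(fused)
model = tf.keras.Model([enc_inputs, dec_inputs], logits)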
Submission
Please submit the files, including program output and Python scripts, to the Teaching Assistant's email address: [email protected]. After you finish the assignments, make sure you include the following header information at the beginning of your code:
# author: Your name
# student_id: Your student ID
Copy all the program output into a text file named StudentID_StudentName_lab2_output.txt, and submit a zipped archive named StudentID_StudentName_lab2.zip containing all the Python scripts and the aforementioned output file to the TA's email address.
If you want onsite grading during the lab, you can ask the TA to grade your lab submission by showing your code and outputs. It should be noted that, even with onsite grading, you still need to submit the files to Blackboard to keep an electronic record of your assignment.