Deep Learning Project
Submitted by:
Submitted to:
Department of CSE/IT
Jaypee Institute of Information Technology, Noida
December 2021
ABSTRACT
This project report is entitled “Chitter-Chatter (a bot that speaks like you)”. The main objective of this
project is to build a chatbot that chats just like one would chat with friends over any social chat
medium.
The main tools used for this project are Python and a number of its libraries: various utilities to clean
and filter the data (data pre-processing), TensorFlow and NumPy with seq2seq modeling and an
attention mechanism to create the chatbot, and Flask to provide a GUI for chatting over the web.
List of Tables
5.1 Configurations
5.2 Training time
5.3 Testing
TABLE OF CONTENTS
1. INTRODUCTION
2. BACKGROUND STUDY
   2.1. THE ENCODER-DECODER MODEL
   2.2. RECURRENT NEURAL NETWORKS
   2.3. BIDIRECTIONAL RNN
   2.4. SEQ TO SEQ MODEL
   2.5. BAHDANAU ATTENTION: ADDITIVE ATTENTION
3. REQUIREMENT ANALYSIS
   3.1. SOFTWARE REQUIREMENTS
   3.2. HARDWARE REQUIREMENTS
4. DETAILED DESIGN
5. IMPLEMENTATION
   5.1. DATASET
   5.2. DATA PREPROCESSING
   5.3. TRAINING
   5.4. TESTING
6. EXPERIMENTAL RESULT AND ANALYSIS
   6.1. CONFIG 1
   6.2. CONFIG 2
   6.3. CONFIG 3
7. CONCLUSION AND FUTURE SCOPE
   7.1. CONCLUSION
   7.2. FUTURE SCOPE
References
1. INTRODUCTION
Chatbots are “computer programs which conduct conversation through auditory or textual methods”.
Apple’s Siri, Microsoft’s Cortana, Google Assistant, and Amazon’s Alexa are four of the most popular
conversational agents today. They can help you get directions, check the scores of sports games, call
people in your address book, and can accidentally make you order a $170 dollhouse. These products
all have auditory interfaces, where the agent converses with you through audio messages. Chatbots
have been around for a decent amount of time (Siri was released in 2011), but only recently has deep
learning become the go-to approach for creating realistic and effective chatbot interactions.
From a high level, the job of a chatbot is to be able to determine the best response for any given
message that it receives. This “best” response should either (1) answer the sender’s question, (2) give
the sender relevant information, (3) ask follow-up questions, or (4) continue the conversation in a
realistic way. This is a pretty tall order. The chatbot needs to be able to understand the intentions of
the sender’s message, determine what type of response message (a follow-up question, direct
response, etc.) is required, and follow correct grammatical and lexical rules while forming the
response. It's safe to say that modern chatbots have trouble accomplishing all these tasks. For all the
progress we have made in the field, chatbots are too often unable to understand our intentions, have
trouble getting us the correct information, and are sometimes just exasperatingly difficult to deal with.
This project focuses solely on the textual front. Here we have designed a deep learning model to train
a chatbot on our past social media conversations in the hope of getting the chatbot to respond to
messages the way that we would.
2. BACKGROUND STUDY
Fig 2.1: Unfolding of an RNN over 3 time-steps. Here x is the input sequence, o is the output sequence, s is the
sequence of hidden states, and U, W and V are the weights of the network.
Source: https://www.researchgate.net/figure/A-recurrent-neural-network-and-the-unfolding-in-time-of-the-computation-involved-in-its_fig1_324680970
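For reference, the recurrence the figure depicts can be written directly in numpy using the same symbols: s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t). A minimal sketch; the dimensions below are illustrative assumptions, not values from this project.

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    """One unfolded time-step: s_t = tanh(U x_t + W s_{t-1}), o_t = softmax(V s_t)."""
    s_t = np.tanh(U @ x_t + W @ s_prev)                     # new hidden state
    z = V @ s_t
    o_t = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # output distribution
    return s_t, o_t

# Illustrative sizes: 10-dim inputs, 16-dim hidden state, 10-dim outputs.
rng = np.random.default_rng(0)
U = rng.normal(size=(16, 10))
W = rng.normal(size=(16, 16))
V = rng.normal(size=(10, 16))
s = np.zeros(16)
for x_t in rng.normal(size=(3, 10)):                        # unfold over 3 time-steps
    s, o = rnn_step(x_t, s, U, W, V)
```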
2.3 BIDIRECTIONAL RNN
Bidirectional Recurrent Neural Networks (BRNNs) connect two hidden layers running in opposite
directions to the same output. With this structure, the output layer can get information from past
(backward) and future (forward) states simultaneously, as shown in figure 2.2.
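In TensorFlow/Keras, this wiring is a wrapper around any recurrent layer. A minimal sketch follows; the layer sizes are illustrative assumptions, not this project's configuration.

```python
import tensorflow as tf

# Two LSTMs read the sequence in opposite directions; their per-step outputs
# are concatenated, so every position sees both past and future context.
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=5000, output_dim=64),   # token ids -> vectors
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True)),      # 256-dim output per step
])
```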
There are a few challenges in using this model. The most serious is that the model cannot handle
variable-length sequences, even though almost all sequence-to-sequence applications involve them.
The next is vocabulary size: the decoder has to run a softmax over a large vocabulary of, say, 20,000
words for each word in the output, which slows down training even if your hardware can handle it.
The representation of words is also of great importance. How do you represent the words in the
sequence? Using one-hot vectors means dealing with large sparse vectors due to the large vocabulary,
and one-hot encodings carry no semantic meaning about the words.
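A learned embedding is the usual answer to the representation question: instead of a 20,000-entry sparse one-hot vector, each word id maps to a small dense trainable vector. A minimal sketch contrasting the two; the sizes are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

vocab_size = 20000                        # the large vocabulary discussed above

# One-hot: 20,000 entries, all zero but one, with no notion of word similarity.
one_hot = np.zeros(vocab_size)
one_hot[42] = 1.0

# A trainable embedding maps the same word id to a small dense vector whose
# coordinates are learned, so related words can end up close together.
embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=128)
dense = embedding(tf.constant([42]))      # shape (1, 128) instead of (20000,)
```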
Attention addresses the problem of aligning the decoder with the relevant parts of the input sentence.
The proposed approach also provides an intuitive way to inspect the (soft-)alignment between the
words in a generated translation and those in the source sentence, by visualizing the annotation
weights. Each row of a matrix in each plot indicates the weights associated with the annotations;
from this we can see which positions in the source sentence were considered more important when
generating each target word, as shown in figure 2.4.
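A minimal matplotlib sketch of such an alignment plot, assuming a small matrix of annotation weights with one row per generated target word; the tokens and weights here are stand-ins, not model output.

```python
import numpy as np
import matplotlib.pyplot as plt

source = ["how", "are", "you", "?"]     # hypothetical source tokens
target = ["i", "am", "fine"]            # hypothetical generated tokens
# Stand-in annotation weights: one row per generated word, rows sum to 1.
attn = np.random.dirichlet(np.ones(len(source)), size=len(target))

fig, ax = plt.subplots()
ax.imshow(attn, cmap="gray")            # brighter cell = more attention
ax.set_xticks(range(len(source)))
ax.set_xticklabels(source)
ax.set_yticks(range(len(target)))
ax.set_yticklabels(target)
ax.set_xlabel("source word")
ax.set_ylabel("generated word")
plt.show()
```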
3. REQUIREMENT ANALYSIS
● Python libraries:
○ tensorflow
○ numpy
○ matplotlib
4. DETAILED DESIGN
ALGORITHM
Algorithm used: Deep Neural Network (DNN), Recurrent Neural Network (RNN)
WEB APPLICATION
Frontend and backend technology used: HTML, CSS, JavaScript, Python Flask app
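The web application serves the trained model behind a small Flask backend with an HTML/CSS/JavaScript chat page. A minimal sketch of such an app follows; the route names, the template path, and the reply stub are illustrative assumptions, not the project's actual code.

```python
from flask import Flask, render_template, request

app = Flask(__name__)

def reply(message: str) -> str:
    # Stub: in the real app this would run the trained seq2seq model.
    return "echo: " + message

@app.route("/")
def index():
    # Serves the chat page, assumed to live at templates/index.html.
    return render_template("index.html")

@app.route("/get")
def get_bot_response():
    # The page's JavaScript sends the user's message as the "msg" query parameter.
    return reply(request.args.get("msg", ""))

if __name__ == "__main__":
    app.run()
```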
5. IMPLEMENTATION
5.1 DATASET:
The Cornell movie dialog corpus [1] is used for training. This corpus contains a metadata-rich
collection of fictional conversations extracted from raw movie scripts, with movie metadata including:
- genres
- release year
- IMDB rating
3. Get a list of all conversations as questions and answers, as shown in figure 5.1.
4. Clean the text:
● text to lowercase
5. Filter out the questions and answers which are too long or too short.
6. Get each word and its count from the filtered questions and answers into a vocab dictionary, and
create vocabulary indexes from the words appearing more than two times in the vocab dictionary.
7. Add the codes <EOS> (end-of-sentence), <PAD> (padding), <UNK> (unknown), and <GO>
(start-of-decoding) to the vocabulary.
8. Add an <EOS> tag at the end of each answer.
9. Again filter out words by comparing the words in the filtered questions with the words in the
vocabulary.
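A condensed sketch of steps 4 to 8 above. The cleaning rule, length cut-offs, and the exact set of special codes are illustrative assumptions; the project's actual thresholds may differ.

```python
import re
from collections import Counter

MIN_LEN, MAX_LEN, MIN_COUNT = 2, 5, 2            # illustrative thresholds

def clean(text):
    text = text.lower()                          # step 4: text to lowercase
    return re.sub(r"[^a-z0-9' ]", "", text)

def preprocess(questions, answers):
    pairs = [(clean(q), clean(a)) for q, a in zip(questions, answers)]
    # Step 5: drop pairs that are too long or too short.
    pairs = [(q, a) for q, a in pairs
             if MIN_LEN <= len(q.split()) <= MAX_LEN
             and MIN_LEN <= len(a.split()) <= MAX_LEN]
    # Step 6: count words, keep those appearing more than MIN_COUNT times.
    counts = Counter(w for q, a in pairs for w in (q + " " + a).split())
    vocab = [w for w, c in counts.items() if c > MIN_COUNT]
    # Step 7: special codes (assumed set); step 8: append <EOS> to each answer.
    word2idx = {w: i for i, w in enumerate(["<PAD>", "<EOS>", "<UNK>", "<GO>"] + vocab)}
    pairs = [(q, a + " <EOS>") for q, a in pairs]
    return pairs, word2idx
```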
5.3 TRAINING:
From the filtered question and answer lists, we created the training data (32,542 pairs). A
bidirectional LSTM is used on the encoder side and an attention mechanism on the decoder side to
improve model performance.
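A minimal sketch of this encoder-decoder wiring in Keras. The hyperparameters, the use of Keras' AdditiveAttention layer as the Bahdanau-style attention, and teacher-forced decoder inputs are assumptions; the project's actual training code may differ.

```python
import tensorflow as tf

VOCAB, EMB, UNITS, MAX_LEN = 8000, 128, 256, 5   # illustrative hyperparameters

# Encoder: bidirectional LSTM over the question (UNITS // 2 per direction,
# so the concatenated encoder states match the decoder width).
enc_in = tf.keras.Input(shape=(MAX_LEN,), name="question")
enc_emb = tf.keras.layers.Embedding(VOCAB, EMB)(enc_in)
enc_out = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(UNITS // 2, return_sequences=True))(enc_emb)

# Decoder: LSTM over the (teacher-forced) answer; its outputs query the
# encoder states through additive (Bahdanau-style) attention.
dec_in = tf.keras.Input(shape=(MAX_LEN,), name="answer_shifted")
dec_emb = tf.keras.layers.Embedding(VOCAB, EMB)(dec_in)
dec_out = tf.keras.layers.LSTM(UNITS, return_sequences=True)(dec_emb)
context = tf.keras.layers.AdditiveAttention()([dec_out, enc_out])

logits = tf.keras.layers.Dense(VOCAB)(
    tf.keras.layers.Concatenate()([dec_out, context]))

model = tf.keras.Model([enc_in, dec_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
```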
The following table 5.1 shows the different configurations of our model.

Table 5.1: Configurations
Parameter      Config 1   Config 2   Config 3
max_length     5          5          6
epochs         500        80         60
The following table 5.2 shows the time taken to train each configuration on our local computer with
an Intel Core i5 (8th gen) CPU and 8 GB of RAM.
5.4 TESTING
Table 5.3 shows the model's responses to test inputs under each configuration.
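Responses like those in table 5.3 can be produced with a greedy decoding loop over a trained model. A sketch under the assumptions of the earlier sketches (the model, word2idx, and the <GO>/<PAD>/<EOS> codes); the question is assumed already converted to ids and padded to max_len.

```python
import numpy as np

def respond(question_ids, model, word2idx, idx2word, max_len=5):
    """Greedy decoding: start from <GO>, repeatedly append the most likely next word."""
    dec = [word2idx["<GO>"]]
    for _ in range(max_len):
        dec_padded = dec + [word2idx["<PAD>"]] * (max_len - len(dec))
        logits = model.predict([np.array([question_ids]),
                                np.array([dec_padded])], verbose=0)
        next_id = int(logits[0, len(dec) - 1].argmax())   # most likely next word
        if idx2word[next_id] == "<EOS>":
            break
        dec.append(next_id)
    return " ".join(idx2word[i] for i in dec[1:])         # drop the <GO> code
```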
6. EXPERIMENTAL RESULT AND ANALYSIS
6.1 CONFIG 1
Table 6.1 shows the plots of accuracy and loss over training for config 1.
6.2 CONFIG 2
Table 6.2 shows the plots of accuracy and loss over training for config 2.
6.3 CONFIG 3
Table 6.3 shows the plots of accuracy and loss over training for config 3.
7. CONCLUSION AND FUTURE SCOPE
7.1 CONCLUSION
Various techniques and architectures that have been proposed to augment the encoder-decoder model
and to make conversational agents more natural and human-like were discussed. Criticism was also
presented regarding some properties of current chatbot models, and it was shown how and why
several of the techniques currently employed are inappropriate for the task of modeling conversations.
The performance of the training was analyzed with the help of automatic evaluation metrics and by
comparing output responses for a set of source utterances. The Cornell data must be improved further
to get better results in the future. We can also try combinations of hyperparameters other than the
configurations discussed in this report. Training on the Cornell movie dialog corpus produced results
that need further improvement and more attention to training parameters. We can try different
attention mechanisms, such as Luong attention. We can also judge the quality of a dataset by using
another similar dataset with the hyperparameters we tried for the Cornell data.
REFERENCES
Dataset
[1] Cornell movie dialog corpus, available at
https://www.kaggle.com/rajathmc/cornell-moviedialog-corpus
Online
[2] Attention Mechanism, https://blog.floydhub.com/attention-mechanism/
Proceedings paper
[5] Conversational AI Chatbot Based on Encoder-Decoder Architectures with Attention Mechanism,
https://www.researchgate.net/publication/338100972_Conversational_AI_Chatbot_Based_on_Encoder-Decoder_Architectures_with_Attention_Mechanism
[6] D. Bahdanau, K. Cho, and Y. Bengio, "Neural Machine Translation by Jointly Learning to Align
and Translate," https://arxiv.org/pdf/1409.0473.pdf