
CHITTER CHATTER

(A bot that speaks like you!)

Deep Learning Project

Submitted by:

PURNIMA SHUKLA (9918103226)

KESHAV BANSAL (9918103219)

PALLAV GUPTA (9918103134)

Submitted to:

DR. SWATI GUPTA

Department of CSE/IT
Jaypee Institute of Information Technology, Noida

December 2021
ABSTRACT
This project report is entitled “Chitter-Chatter (a bot that speaks like you)”. The main objective of this
project is to build a chatbot that chats the way one would chat with friends over any social chat medium.

Dialogue generation, or the development of intelligent conversational agents using Artificial Intelligence
and Machine Learning techniques, is an interesting problem in the field of Natural Language Processing.
Many research and development projects use Artificial Intelligence, Machine Learning algorithms and
Natural Language Processing techniques to develop conversation/dialogue agents. Dialogue agents are
predominantly used by businesses, government organizations and nonprofit organizations. They are
frequently deployed by financial organizations such as banks and credit card companies, and by businesses
such as online retail stores and startups. These virtual agents are adopted by businesses ranging from very
small start-ups to large corporations. Many chatbot development frameworks are available in the market,
both code based and interface based, but they lack the flexibility needed to develop realistic dialogues.
Popular intelligent personal assistants include Amazon’s Alexa, Microsoft’s Cortana and Google’s Google
Assistant. The functioning of these agents is limited: they are retrieval-based agents and are not aimed at
holding conversations that emulate real human interaction. Among current chatbots, many are developed
using rule-based techniques, simple machine learning algorithms or retrieval-based techniques, which do
not generate good results.

The main tools used for this project are Python and various libraries for cleaning and filtering the data
(data preprocessing), TensorFlow and NumPy, a seq2seq model with an attention mechanism to create the
chatbot, and Flask to provide a GUI for chatting over the web.

List of Tables

Table   Title
4.1     Algorithms and web applications
5.1     Configurations
5.2     Training Time
5.3     Testing
6.1     Plot of change in accuracy and change in loss for config 1
6.2     Plot of change in accuracy and change in loss for config 2
6.3     Plot of change in accuracy and change in loss for config 3


List of Figures

Figure  Title
1.1     Weird chat experience with a chatbot
2.1     Unfolding of an RNN over 3 time-steps
2.2     Bidirectional RNN
2.3     Long Short-Term Memory (LSTM) Encoder-Decoder Architecture
2.4     Bahdanau Attention Overview
4.1     Steps involved in the design of the model
5.1     List of all conversations
5.2     Cleaning text
5.3     Filtered questions and answers
5.4     First few words of vocabulary
5.5     Last few words of vocabulary
5.6     Working of the encoder-decoder model using Bahdanau Attention


TABLE OF CONTENTS

1. INTRODUCTION
2. BACKGROUND STUDY
2.1. THE ENCODER-DECODER MODEL
2.2. RECURRENT NEURAL NETWORKS
2.3. BIDIRECTIONAL RNN
2.4. SEQ2SEQ MODEL
2.5. BAHDANAU ATTENTION: ADDITIVE ATTENTION
3. REQUIREMENT ANALYSIS
3.1. SOFTWARE REQUIREMENTS
3.2. HARDWARE REQUIREMENTS
4. DETAILED DESIGN
5. IMPLEMENTATION
5.1. DATASET
5.2. DATA PREPROCESSING
5.3. TRAINING
5.4. TESTING
6. EXPERIMENTAL RESULT AND ANALYSIS
6.1. CONFIG 1
6.2. CONFIG 2
6.3. CONFIG 3
7. CONCLUSION AND FUTURE SCOPE
7.1. CONCLUSION
7.2. FUTURE SCOPE

References

1. INTRODUCTION

Chatbots are “computer programs which conduct conversation through auditory or textual methods”.
Apple’s Siri, Microsoft’s Cortana, Google Assistant and Amazon’s Alexa are four of the most popular
conversational agents today. They can help you get directions, check the scores of sports games, call
people in your address book, and accidentally make you order a $170 dollhouse. These products all have
auditory interfaces where the agent converses with you through audio messages. Chatbots have been
around for a decent amount of time (Siri was released in 2011), but only recently has deep learning become
the go-to approach for creating realistic and effective chatbot interactions.

At a high level, the job of a chatbot is to determine the best response to any message it receives. This
“best” response should either (1) answer the sender’s question, (2) give the sender relevant information,
(3) ask follow-up questions, or (4) continue the conversation in a realistic way. This is a tall order. The
chatbot needs to understand the intentions of the sender’s message, determine what type of response
(a follow-up question, a direct answer, etc.) is required, and follow correct grammatical and lexical rules
while forming the response. It is safe to say that modern chatbots have trouble accomplishing all of these
tasks. For all the progress made in the field, we too often get chatbot experiences like the one in figure 1.1.

Fig 1.1: Weird chat experience with a chatbot


Source: https://twitter.com/Reza_Zadeh/status/765722701465948160/photo/1

Chatbots are too often unable to understand our intentions, have trouble getting us the correct
information, and are sometimes just exasperatingly difficult to deal with.
This project focuses solely on the textual front. Here we design a deep learning model to train a chatbot
on our past social media conversations, in the hope of getting the chatbot to respond to messages the way
that we would.

2. BACKGROUND STUDY

2.1 THE ENCODER-DECODER MODEL


The main concept that differentiates rule-based and neural network based approaches is the presence
of a learning algorithm in the latter case. In the encoder-decoder model, an “encoder” RNN reads the
source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as
the initial hidden state of a “decoder” RNN that generates the target sentence. The same recipe has also
been applied beyond text: for image captioning, the encoder RNN can be replaced by a deep convolutional
neural network (CNN), first pre-trained for an image classification task, whose last hidden layer is used
as input to the RNN decoder that generates sentences.

2.2 RECURRENT NEURAL NETWORKS


A recurrent neural network (RNN) is a neural network that can take as input a variable-length
sequence x = (x_1, ..., x_n) and produce a sequence of hidden states h = (h_1, ..., h_n) by using recurrence.
This is also called the unrolling or unfolding of the network, visualized in figure 2.1. At each step i the
network takes as input x_i and h_{i-1} and generates a hidden state h_i, updated by

    h_i = f(W h_{i-1} + U x_i)

where W and U are matrices containing the weights (parameters) of the network, and f is a nonlinear
activation function, for example the hyperbolic tangent. The vanilla implementation of an RNN is rarely
used, because it suffers from the vanishing gradient problem, which makes it very hard to train. Usually
long short-term memory (LSTM) or gated recurrent unit (GRU) cells are used instead. LSTMs were
developed to combat the problem of long-term dependencies that vanilla RNNs face.
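
As an illustration, the following minimal NumPy sketch unrolls this recurrence over a short sequence; the sizes and the choice of tanh as f are illustrative assumptions, not the project's configuration.

    import numpy as np

    input_size, hidden_size, seq_len = 4, 8, 3
    rng = np.random.default_rng(0)

    W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden weights
    U = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights

    x = rng.normal(size=(seq_len, input_size))  # a (variable-length) input sequence
    h = np.zeros(hidden_size)                   # initial hidden state h_0

    hidden_states = []
    for x_i in x:                               # unrolling the network over the time-steps
        h = np.tanh(W @ h + U @ x_i)            # h_i = f(W h_{i-1} + U x_i)
        hidden_states.append(h)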

Fig 2.1: Unfolding of an RNN over 3 time-steps. Here x is the input sequence, o is the output sequence, s is the
sequence of hidden states, and U, W and V are the weights of the network.
Source: https://www.researchgate.net/figure/A-recurrent-neural-network-and-the-unfolding-in-time-of-the-computation-involved-in-its_fig1_324680970

2.3 BIDIRECTIONAL RNN

Bidirectional recurrent neural networks (BRNNs) connect two hidden layers running in opposite directions
to the same output. With this architecture, the output layer can get information from past (backward) and
future (forward) states simultaneously, as shown in figure 2.2.
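
The following is a minimal tf.keras sketch of such a bidirectional layer; the vocabulary size (assumed here as the 7,713 frequent words plus the 4 special tokens from section 5.2) and the layer sizes are illustrative assumptions rather than the exact values used in this project.

    import tensorflow as tf

    inputs = tf.keras.Input(shape=(None,), dtype="int32")      # token ids, variable length
    embedded = tf.keras.layers.Embedding(7717, 128)(inputs)
    outputs = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True)       # forward and backward passes
    )(embedded)                                                 # each position sees past and future context
    model = tf.keras.Model(inputs, outputs)
    model.summary()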

Fig 2.2: Bidirectional RNN


Source: http://www.chaitjo.com/context-embeddings/

2.4 SEQ2SEQ MODEL


The sequence-to-sequence (seq2seq) model has become the go-to model for dialogue systems and machine
translation. It consists of two RNNs (recurrent neural networks): an encoder and a decoder. As shown in
figure 2.3, the encoder takes a sequence (sentence) as input and processes one symbol (word) at each time
step. Its objective is to convert the sequence of symbols into a fixed-size feature vector that encodes only
the important information in the sequence while losing the unnecessary information. Data flow in the
encoder along the time axis can be visualized as the flow of local information from one end of the sequence
to the other. Each hidden state influences the next hidden state, and the final hidden state can be seen as
the summary of the sequence. This state is called the context or thought vector, as it represents the
intention of the sequence. From the context, the decoder generates another sequence, one symbol (word)
at a time. At each time step, the decoder is influenced by the context and the previously generated symbols.
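
A minimal tf.keras sketch of such an LSTM encoder-decoder is given below; the vocabulary size and layer sizes are illustrative assumptions, not the project's exact configuration.

    import tensorflow as tf

    vocab_size, embed_size, rnn_size = 7717, 128, 128            # illustrative sizes

    # Encoder: reads the source sentence and summarizes it into the context (thought) vector.
    enc_inputs = tf.keras.Input(shape=(None,), dtype="int32")
    enc_embed = tf.keras.layers.Embedding(vocab_size, embed_size)(enc_inputs)
    _, state_h, state_c = tf.keras.layers.LSTM(rnn_size, return_state=True)(enc_embed)

    # Decoder: generates the target sentence one symbol at a time, conditioned on the context.
    dec_inputs = tf.keras.Input(shape=(None,), dtype="int32")
    dec_embed = tf.keras.layers.Embedding(vocab_size, embed_size)(dec_inputs)
    dec_outputs = tf.keras.layers.LSTM(rnn_size, return_sequences=True)(
        dec_embed, initial_state=[state_h, state_c])
    logits = tf.keras.layers.Dense(vocab_size)(dec_outputs)      # scores over the whole vocabulary

    model = tf.keras.Model([enc_inputs, dec_inputs], logits)
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))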

Fig 2.3: Long Short-Term Memory (LSTM) Encoder-Decoder Architecture


Source: http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/

There are a few challenges in using this model. The most limiting one is that the basic model cannot
handle variable-length sequences, even though almost all sequence-to-sequence applications involve them.
The next one is the vocabulary size: the decoder has to run a softmax over a large vocabulary of, say,
20,000 words for each word in the output, which slows down training even if the hardware is capable of
handling it. The representation of words is also of great importance. Using one-hot vectors means dealing
with large sparse vectors due to the large vocabulary, and one-hot vectors carry no semantic meaning.
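
Two common workarounds for these issues are padding/truncating sequences to a fixed maximum length and replacing one-hot vectors with a learned embedding; the short sketch below illustrates both under assumed sizes.

    import tensorflow as tf

    max_length = 5                                    # assumed fixed length (cf. table 5.1)
    questions = [[12, 45, 7], [3, 99, 101, 5, 2, 8]]  # hypothetical token-id sequences
    padded = tf.keras.preprocessing.sequence.pad_sequences(
        questions, maxlen=max_length, padding="post", truncating="post")

    # A learned embedding maps sparse ids to dense vectors that can carry semantic information.
    embedding = tf.keras.layers.Embedding(input_dim=7717, output_dim=128)
    print(embedding(tf.constant(padded)).shape)       # (2, 5, 128)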

2.5 BAHDANAU ATTENTION: ADDITIVE ATTENTION

Bahdanau attention aligns the decoder with the relevant parts of the input sentence. The approach provides
an intuitive way to inspect the (soft-)alignment between the words in a generated translation and those in
the source sentence by visualizing the annotation weights. Each row of the matrix in each plot indicates
the weights associated with the annotations. From this we can see which positions in the source sentence
were considered more important when generating each target word, as shown in figure 2.4.
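
A minimal sketch of Bahdanau (additive) attention as a custom tf.keras layer is shown below, assuming the decoder state (query) and the encoder annotations (values) have already been computed; the layer, batch and sequence sizes are illustrative.

    import tensorflow as tf

    class BahdanauAttention(tf.keras.layers.Layer):
        def __init__(self, units):
            super().__init__()
            self.W1 = tf.keras.layers.Dense(units)   # projects the decoder state (query)
            self.W2 = tf.keras.layers.Dense(units)   # projects the encoder annotations (values)
            self.V = tf.keras.layers.Dense(1)        # scores each source position

        def call(self, query, values):
            # query: (batch, hidden), values: (batch, src_len, hidden)
            query = tf.expand_dims(query, 1)
            score = self.V(tf.nn.tanh(self.W1(query) + self.W2(values)))  # additive score
            weights = tf.nn.softmax(score, axis=1)   # annotation weights over source positions
            context = tf.reduce_sum(weights * values, axis=1)             # weighted sum of annotations
            return context, weights

    # Hypothetical shapes: batch of 2, source length 6, hidden size 128.
    context, weights = BahdanauAttention(128)(tf.random.normal((2, 128)),
                                              tf.random.normal((2, 6, 128)))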

Fig 2.4: Bahdanau Attention Overview


Source: https://blog.floydhub.com/attention-mechanism/

3. REQUIREMENT ANALYSIS

3.1 SOFTWARE REQUIREMENTS:

● Windows or Linux operating system

● Python 3.5 and Anaconda 3 for running Jupyter notebooks

● Python libraries:

○ tensorflow

○ numpy

○ matplotlib

3.2 HARDWARE REQUIREMENTS:

● Laptop or Desktop Computer

4. DETAILED DESIGN

The steps involved in designing the model are shown in figure 4.1:

Fig 4.1: Steps involved in the design of the model

The algorithms and web application technologies used are shown in table 4.1.

Table 4.1 Algorithms and web applications

ALGORITHM
Algorithm:                             Deep Neural Network (DNN), Recurrent Neural Network (RNN)
Main Technique:                        Sequence to Sequence (Seq2Seq) modeling, encoder-decoder
Enhancement Techniques:                Long Short Term Memory (LSTM) based RNN cell, Bidirectional LSTM, Neural Attention Mechanism

WEB APPLICATION
Frontend and Backend Technology Used:  HTML, CSS, Javascript, Python Flask app

5. IMPLEMENTATION

5.1 DATASET:

Cornell Movie-Dialog Corpus

This corpus contains a metadata-rich collection of fictional conversations extracted from raw movie
scripts:

● 220,579 conversational exchanges between 10,292 pairs of movie characters

● 9,035 characters from 617 movies

● 304,713 utterances in total

● movie metadata included:

- genres

- release year

- IMDB rating

- number of IMDB votes

● character metadata included:

- gender (for 3,774 characters)

- position on movie credits (for 3,321 characters)
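
As a reference point, the sketch below shows one way the two corpus files can be read, assuming the corpus's usual " +++$+++ " field separator and ISO-8859-1 encoding; the file paths are placeholders.

    import ast

    SEP = " +++$+++ "                                 # assumed field separator of the corpus files

    # Map each line id to its utterance text from movie_lines.txt.
    id_to_line = {}
    with open("movie_lines.txt", encoding="iso-8859-1") as f:
        for row in f:
            parts = row.split(SEP)
            if len(parts) == 5:
                id_to_line[parts[0]] = parts[4].strip()

    # Each record of movie_conversations.txt ends with a list of line ids, e.g. ['L194', 'L195'].
    conversations = []
    with open("movie_conversations.txt", encoding="iso-8859-1") as f:
        for row in f:
            line_ids = ast.literal_eval(row.split(SEP)[-1])
            conversations.append([id_to_line.get(i, "") for i in line_ids])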

5.2 DATA PREPROCESSING:

Understanding the dataset and the preprocessing steps (a condensed code sketch of these steps follows the list):

1. A conversation pair in movie_conversations.txt is given as:

2. The corresponding movie lines (index and line) for the above conversation:

3. Get the list of all conversations as questions and answers, as shown in figure 5.1.

Fig 5.1: List of all conversations

4. Clean the text, as shown in figure 5.2:

● convert the text to lowercase

● replace certain words (expand contractions) as follows:

Fig 5.2: Cleaning text

5. Filter out the questions and answers that are too long or too short.

Fig 5.3: Filtered questions and answers

6. Get each word and its count from the filtered questions and answers into a vocab dictionary, and
create vocabulary indexes for the words appearing more than two times (7,713 words appear more
than two times).

Fig 5.4: First few words of vocabulary

7. Add the codes <EOS> (end of sentence), <PAD> (padding), <UNK> (unknown) and <GO> (start) to
the vocab dictionary.

Fig 5.5: Last few words of vocabulary

8. Add an <EOS> tag at the end of each answer.

yeah fine => yeah fine <EOS>

9. Filter the words once more by comparing the words in the filtered questions against the vocabulary
index, and do the same for the filtered answers.
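
The condensed sketch below illustrates steps 4-8 above; the contraction list, the length limits and the count threshold are assumptions inferred from the figures rather than the project's exact values.

    import re

    def clean_text(text):
        # Step 4: lowercase and expand a few common contractions (illustrative list).
        text = text.lower()
        for pattern, repl in [("i'm", "i am"), ("he's", "he is"), ("she's", "she is"),
                              ("won't", "will not"), ("can't", "cannot"), ("n't", " not"),
                              ("'ll", " will"), ("'ve", " have"), ("'re", " are"), ("'d", " would")]:
            text = text.replace(pattern, repl)
        return re.sub(r"[^a-z0-9 ]", "", text)        # drop punctuation and special characters

    def filter_pairs(questions, answers, min_len=2, max_len=5):
        # Step 5: keep only pairs whose question and answer are neither too short nor too long.
        kept_q, kept_a = [], []
        for q, a in zip(questions, answers):
            if min_len <= len(q.split()) <= max_len and min_len <= len(a.split()) <= max_len:
                kept_q.append(q)
                kept_a.append(a + " <EOS>")           # step 8: tag the end of every answer
        return kept_q, kept_a

    def build_vocab(sentences, min_count=3):
        # Step 6: count words and keep those appearing more than two times.
        counts = {}
        for sentence in sentences:
            for word in sentence.split():
                counts[word] = counts.get(word, 0) + 1
        vocab = {w: i for i, (w, c) in enumerate(counts.items()) if c >= min_count}
        for token in ("<PAD>", "<EOS>", "<UNK>", "<GO>"):   # step 7: special codes
            vocab.setdefault(token, len(vocab))
        return vocab

    questions, answers = ["Can we make this quick?"], ["Yeah, fine."]
    q, a = filter_pairs([clean_text(s) for s in questions], [clean_text(s) for s in answers])
    vocab = build_vocab(q + a)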

5.3 TRAINING:

From the filtered question and answer lists we created the training data (32,542 question-answer pairs).
A bidirectional LSTM is used on the encoder side and an attention mechanism is used on the decoder side
to improve model performance.
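
The arrangement of each training example under teacher forcing can be sketched as follows, assuming <GO> is prepended to the decoder input and <EOS> ends the decoder target, consistent with the special tokens added in section 5.2; the token ids are hypothetical.

    # Hypothetical tiny vocabulary and token ids, for illustration only.
    vocab = {"<GO>": 0, "<EOS>": 1, "yeah": 2, "fine": 3}

    def make_training_example(question_ids, answer_ids, vocab):
        encoder_input = question_ids                     # fed to the bidirectional LSTM encoder
        decoder_input = [vocab["<GO>"]] + answer_ids     # previous symbols, shifted right by one
        decoder_target = answer_ids + [vocab["<EOS>"]]   # what the decoder should produce
        return encoder_input, decoder_input, decoder_target

    print(make_training_example([5, 6, 7], [vocab["yeah"], vocab["fine"]], vocab))
    # -> ([5, 6, 7], [0, 2, 3], [2, 3, 1])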

Fig 5.6: Working of the encoder-decoder model using Bahdanau Attention

The following table 5.1 shows the different configurations of our model.

Table 5.1 Configurations

PARAMETERS           CONFIG 1   CONFIG 2   CONFIG 3
max_length           5          5          6
batch size           128        512        512
rnn_size             128        512        512
embed_size           128        512        512
learning rate        0.001      0.001      0.001
epochs               500        80         60
learning rate decay  0.99       0.99       0.99
min learning rate    0.0001     0.0001     0.0001
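
The per-epoch learning-rate schedule implied by the last two rows of table 5.1 can be sketched as follows, assuming the decay is applied once per epoch with the minimum value acting as a floor.

    # Assumed schedule: multiply by the decay once per epoch, never going below the minimum.
    learning_rate, decay, min_learning_rate = 0.001, 0.99, 0.0001

    for epoch in range(1, 501):          # config 1 trains for 500 epochs
        # ... run one epoch of training here ...
        learning_rate = max(learning_rate * decay, min_learning_rate)

    print(round(learning_rate, 6))       # settles at the 0.0001 floor well before epoch 500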

The following table 5.2 shows the time taken to train each configuration on our local computer with an
Intel Core i5 8th-gen CPU and 8 GB of RAM.

Table 5.2 Training Time

                      CONFIG 1   CONFIG 2   CONFIG 3
Time per epoch (sec)  65         155        240
No. of epochs         500        80         60
Total time (hrs)      9          3.5        4

5.4 TESTING

Table 5.3 shows the responses of the model to test inputs for the different configurations.

Table 5.3 Testing

CONFIG 1 CONFIG 2 CONFIG 3

6. EXPERIMENTAL RESULT AND ANALYSIS

6.1 CONFIG 1

Table 6.1 shows the plots of the change in accuracy and the change in loss for config 1.

Table 6.1: Plot of change in accuracy and change in loss for config 1

6.2 CONFIG 2

Table 6.2 shows the plots of the change in accuracy and the change in loss for config 2.

Table 6.2: Plot of change in accuracy and change in loss for config 2

6.3 CONFIG 3

Table 6.3 shows the plots of the change in accuracy and the change in loss for config 3.

Table 6.3: Plot of change in accuracy and change in loss for config 3

7. CONCLUSION AND FUTURE SCOPE

7.1 CONCLUSION

Various techniques and architectures that have been proposed to augment the encoder-decoder model and
make conversational agents more natural and human-like were discussed. Criticism was also presented
regarding some of the properties of current chatbot models, and it was shown how and why several of the
techniques currently employed are inappropriate for the task of modeling conversations. The performance
of the training was analyzed with the help of automatic evaluation metrics and by comparing output
responses for a set of source utterances. The training on the Cornell movie dialog corpus produced results
that need further improvement and more attention to the training parameters; the data itself must also be
improved further to get better results in the future. We can try different combinations of hyperparameters
beyond the configurations discussed in this report, as well as different attention mechanisms such as Luong
attention. We can also judge the quality of a dataset by applying the hyperparameters we tried for the
Cornell data to another, similar dataset.

7.2 FUTURE SCOPE


Future work includes using an attention mechanism such as Luong attention, which is also suggested in
many papers, and trying different hyperparameters and evaluation metrics to improve the chatbot's
performance.

REFERENCES

Dataset
[1] Cornell Movie-Dialog Corpus, available at https://www.kaggle.com/rajathmc/cornell-moviedialog-corpus

Online
[2] Attention Mechanism, https://blog.floydhub.com/attention-mechanism/

[3] Sequence 2 Sequence Model with Attention Mechanism, https://towardsdatascience.com/sequence-2-sequence-model-with-attention-mechanism-9e9ca2a613a

[4] Understanding Attention in Neural Network, https://www.kaggle.com/tientd95/understanding-attention-in-neural-network

Proceedings papers
[5] Conversational AI Chatbot Based on Encoder-Decoder Architectures with Attention Mechanism, https://www.researchgate.net/publication/338100972_Conversational_AI_Chatbot_Based_on_Encoder-Decoder_Architectures_with_Attention_Mechanism

[6] Dzmitry Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate, https://arxiv.org/pdf/1409.0473.pdf

[7] Effective Approaches to Attention-based Neural Machine Translation, https://arxiv.org/pdf/1508.04025.pdf
