Image Captioning
KOLLA TEJA
2K19CSUN04013
VAIBHAV SINGH
2K19CSUN04029
BTECH CSE 4 DSML
INTRODUCTION
• All of these captions are relevant to this image, and there may be
others as well. The point is that it is very easy for us, as human
beings, to glance at a picture and describe it in appropriate
language. Even a five-year-old could do this with ease.
• In this example, we extract features from the lower convolutional
layer of InceptionV3, which gives us a feature map of shape (8, 8, 2048).
• This feature map is then passed through the CNN encoder, which
consists of a single fully connected layer.
• The RNN (here a GRU) attends over the encoded image features to
predict the next word.
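The feature-extraction step can be illustrated at the level of shapes. This is a minimal sketch, assuming the (8, 8, 2048) InceptionV3 output is flattened into 64 spatial locations of 2048 features each, so that the decoder can later attend over individual image regions. Random numbers stand in for real CNN activations, so no model is loaded.

```python
import numpy as np

# Stand-in for the InceptionV3 feature map of one image (assumption:
# random values, used only to illustrate the shapes).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 2048)).astype(np.float32)

# Flatten the 8x8 spatial grid into 64 "locations", each described by a
# 2048-dimensional feature vector (row-major: location 9 is grid cell (1, 1)).
features = feature_map.reshape(-1, feature_map.shape[-1])

print(features.shape)  # (64, 2048)
```

Flattening keeps every feature vector intact; it only changes how the grid is indexed.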
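The encoder described above is a single fully connected layer applied to every image location. A minimal NumPy sketch, assuming a ReLU activation after the layer and a hypothetical `embedding_dim` of 256 (the slide specifies neither):

```python
import numpy as np

def cnn_encoder(features, weights, bias):
    """Project each location's 2048-d feature vector to embedding_dim with
    one fully connected layer followed by ReLU (ReLU is an assumption)."""
    return np.maximum(features @ weights + bias, 0.0)

embedding_dim = 256  # hypothetical size, chosen for illustration
rng = np.random.default_rng(0)
features = rng.standard_normal((64, 2048)).astype(np.float32)  # 64 locations
weights = rng.standard_normal((2048, embedding_dim)).astype(np.float32) * 0.01
bias = np.zeros(embedding_dim, dtype=np.float32)

encoded = cnn_encoder(features, weights, bias)
print(encoded.shape)  # (64, 256)
```

The same weights are shared across all 64 locations, so the layer reduces the feature size without mixing information between regions.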
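The slide says only that the GRU "attends over the image". One common choice, assumed here, is additive (Bahdanau-style) attention: score each encoded location against the current decoder hidden state, normalize the scores with a softmax, and take the weighted sum of the locations as a context vector. The matrices `W1`, `W2`, `v` and all sizes below are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bahdanau_attention(encoded, hidden, W1, W2, v):
    """encoded: (L, D) image locations; hidden: (H,) decoder state.
    Returns the context vector (D,) and the attention weights (L,)."""
    # Additive score for each location i: v^T tanh(W1 e_i + W2 h)
    scores = np.tanh(encoded @ W1 + hidden @ W2) @ v  # (L,)
    weights = softmax(scores)                         # sums to 1 over locations
    context = weights @ encoded                       # weighted sum, (D,)
    return context, weights

rng = np.random.default_rng(0)
L, D, H, U = 64, 256, 512, 512  # locations, feature dim, GRU units, attention units
encoded = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
W1 = rng.standard_normal((D, U)) * 0.01
W2 = rng.standard_normal((H, U)) * 0.01
v = rng.standard_normal(U) * 0.01

context, attn = bahdanau_attention(encoded, hidden, W1, W2, v)
print(context.shape, attn.shape)  # (256,) (64,)
```

In a full decoder, the context vector would typically be concatenated with the embedding of the previous word and fed to the GRU, which then produces scores over the vocabulary for the next word; that step is not shown here.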
[Figure: Epoch loss plot (training loss)]
• The evaluate function is similar to the training loop, except that
teacher forcing is not used. At each time step, the decoder's input is
its own previous prediction, along with the hidden state and the
encoder output.
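The difference from training can be shown with a small decoding loop. With teacher forcing, the ground-truth word would be fed in at each step; here the decoder instead receives its own previous prediction together with the hidden state and the encoder output. This sketch assumes greedy decoding (always taking the highest-scoring word; sampling from the predicted distribution is another option), and the toy vocabulary and `toy_decoder_step` are hypothetical stand-ins for a trained model.

```python
def greedy_decode(decoder_step, encoder_output, init_hidden,
                  start_token, end_token, max_length=20):
    """Generate a caption without teacher forcing: each step feeds the
    decoder its own previous prediction instead of the ground-truth word."""
    hidden = init_hidden
    token = start_token
    result = []
    for _ in range(max_length):
        scores, hidden = decoder_step(token, hidden, encoder_output)
        # Greedy choice (assumption): take the highest-scoring word id.
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == end_token:
            break
        result.append(token)
    return result

# Hypothetical toy model: each word deterministically predicts the next one.
VOCAB = ["<start>", "<end>", "a", "dog", "runs"]
NEXT = {0: 2, 2: 3, 3: 4, 4: 1}

def toy_decoder_step(token, hidden, encoder_output):
    scores = [0.0] * len(VOCAB)
    scores[NEXT[token]] = 1.0
    return scores, hidden + 1  # toy "hidden state": a step counter

caption = greedy_decode(toy_decoder_step, encoder_output=None, init_hidden=0,
                        start_token=0, end_token=1)
print(" ".join(VOCAB[i] for i in caption))  # a dog runs
```

The `max_length` cap matters at inference time: without ground-truth captions, nothing guarantees the model will ever predict the end token.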
• https://paperswithcode.com/task/text-generation
• https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/
• https://www.tensorflow.org/tutorials/text/image_captioning