Image Caption Generator
This work investigates the hypothesis that exploiting object-level features can
increase accuracy in image captioning, and that using all object features helps a
model more closely mimic human visual understanding of scenes. This paper
presents a model that makes use of these features through a simple architecture
and evaluates the results.
Researchers have long sought efficient ways to make better predictions, so we
discuss a few methods that achieve good results.
We use deep neural networks and machine learning techniques to build the
model. There are two phases: feature extraction from the image using a
Convolutional Neural Network (CNN), and generation of natural-language
sentences describing the image using a Recurrent Neural Network (RNN).
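As an illustration, the following is a minimal sketch of such a two-phase pipeline in PyTorch. The specific choices, a ResNet-50 backbone as the CNN encoder, an LSTM as the RNN decoder, and the layer sizes, are illustrative assumptions rather than the exact architecture of our model.

```python
# Minimal two-phase captioning sketch: CNN encoder + RNN decoder.
# ResNet-50, LSTM, and all dimensions below are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class CNNEncoder(nn.Module):
    """Phase 1: extract a fixed-length feature vector from the image."""
    def __init__(self, feature_dim=256):
        super().__init__()
        backbone = models.resnet50(weights=None)  # pretrained weights optional
        # Drop the classification head; keep the convolutional feature extractor.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(backbone.fc.in_features, feature_dim)

    def forward(self, images):                  # images: (B, 3, 224, 224)
        features = self.cnn(images).flatten(1)  # (B, 2048)
        return self.fc(features)                # (B, feature_dim)

class RNNDecoder(nn.Module):
    """Phase 2: generate a caption, word by word, conditioned on the image."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        # embed_dim must match the encoder's feature_dim (both 256 here).
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_features, captions):  # captions: (B, T) token ids
        embeddings = self.embed(captions)          # (B, T, embed_dim)
        # Prepend the image feature as the first "word" of the sequence.
        inputs = torch.cat([image_features.unsqueeze(1), embeddings], dim=1)
        hidden, _ = self.lstm(inputs)              # (B, T+1, hidden_dim)
        return self.out(hidden)                    # logits over the vocabulary

# Illustrative forward pass on random data.
encoder, decoder = CNNEncoder(), RNNDecoder(vocab_size=10000)
images = torch.randn(4, 3, 224, 224)
captions = torch.randint(0, 10000, (4, 12))
logits = decoder(encoder(images), captions)        # shape (4, 13, 10000)
```

Feeding the image feature vector to the decoder as if it were the first word of the sequence is a common design in encoder-decoder captioning models.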
While human beings can describe images easily, it takes a strong algorithm and a
great deal of computational power for a computer system to do so. Many attempts
have been made to simplify this problem by breaking it down into simpler
sub-problems such as object detection, image classification, and text generation.
A computer system takes input images as arrays of pixel values (two-dimensional
for grayscale images, with an additional channel axis for colour) and learns a
mapping from images to captions or descriptive sentences.
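For concreteness, here is a small sketch (an illustration, not part of our method) of how an image is read into such an array using Pillow and NumPy; the file name is hypothetical.

```python
# Read an image file into an array of pixel values.
import numpy as np
from PIL import Image

image = Image.open("example.jpg")  # hypothetical file name
pixels = np.asarray(image)
# A grayscale image is a 2-D array (height, width); a colour image carries
# an extra channel axis, e.g. (height, width, 3) for RGB.
print(pixels.shape, pixels.dtype)  # e.g. (224, 224, 3) uint8
```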
In recent years, a great deal of attention has been drawn to the task of
automatically generating captions for images. However, while new datasets often
spur considerable innovation, benchmark datasets also require fast, accurate,
and competitive evaluation metrics to encourage rapid progress.
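As one example of such a metric (the text does not commit to a specific one), BLEU is widely used for caption evaluation; the sketch below scores a candidate caption against a reference with NLTK, using invented word lists for illustration.

```python
# Score a generated caption against a reference caption with BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "on", "the", "beach"]]   # reference caption(s)
candidate = ["a", "dog", "is", "running", "on", "the", "beach"]  # model output
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```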
Automatically describing the content of a picture in properly formed English
sentences is a very challenging task, but it could have a great impact, for
example by helping visually impaired people better understand the content of
images online.
This task is significantly harder than, for instance, the well-studied image
classification or visual perception tasks that have been a main focus of the
computer vision community.