
Image Captioning
KOLLA TEJA
2K19CSUN04013
VAIBHAV SINGH
2K19CSUN04029
BTECH CSE 4 DSML
INTRODUCTION

• What do you see in this picture?

• Well, some of you might say “A giraffe is eating leaves”, some may say “Giraffes roaming around the forest”, and yet others might say “Two tallest living terrestrial animals”.

• All of these captions are relevant for this image, and there may be others as well. The point I want to make is that it is so easy for us, as human beings, to glance at a picture and describe it in appropriate language. Even a 5-year-old could do this with ease.

• But can AI generate captions by taking an image as input?


MOTIVATION

We must first understand how important this problem is in real-world scenarios. Let’s see a few applications where a solution to this problem can be very useful.

Aid to the blind: We can create a product for the blind that guides them while travelling on roads without the support of anyone else. We can do this by first converting the scene into text and then the text into voice. Both are now well-known applications of deep learning.

Self-driving cars: Automatic driving is one of the biggest challenges, and if we can properly caption the scene around the car, it can give a boost to the self-driving system.
DATA COLLECTION
• There are many open-source datasets available for this problem, such as Flickr8k (containing 8k images), Flickr30k (containing 30k images), MS COCO (containing 180k images), etc.

• In this project we used the MS COCO dataset. It consists of images and annotations, and each image has 5 caption annotations associated with it.
EXAMPLE
TRAINING SET
• Group all captions together that have the same image ID.

• Select the first 6,000 image paths from the shuffled set.

• Each image ID has approximately 5 captions associated with it.

• That leads to roughly 30,000 training examples (a sketch of these steps follows below).
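
A minimal sketch of these steps, loosely following the TensorFlow image-captioning tutorial cited in the references; the annotation-file and image-folder paths and the `<start>`/`<end>` markers are assumptions, not part of the slides:

```python
import collections
import json
import random

# Assumed local paths to the MS COCO 2014 captions file and image folder.
annotation_file = 'annotations/captions_train2014.json'
image_folder = 'train2014/'

with open(annotation_file, 'r') as f:
    annotations = json.load(f)

# Group all captions that share the same image ID.
image_path_to_caption = collections.defaultdict(list)
for ann in annotations['annotations']:
    caption = '<start> ' + ann['caption'] + ' <end>'
    image_path = image_folder + 'COCO_train2014_%012d.jpg' % ann['image_id']
    image_path_to_caption[image_path].append(caption)

# Shuffle the image paths and keep the first 6,000.
image_paths = list(image_path_to_caption.keys())
random.shuffle(image_paths)
train_image_paths = image_paths[:6000]

# With ~5 captions per image this yields roughly 30,000 (image, caption) pairs.
train_captions, img_name_vector = [], []
for path in train_image_paths:
    caps = image_path_to_caption[path]
    train_captions.extend(caps)
    img_name_vector.extend([path] * len(caps))
```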
ARCHITECTURE
Preprocess the images using InceptionV3

You will use InceptionV3 (which is pretrained on ImageNet) to encode each image, extracting features from its last convolutional layer.

Initialize InceptionV3 and load the pretrained ImageNet weights

You'll create a tf.keras model where the output layer is the last convolutional layer in the InceptionV3 architecture. The shape of the output of this layer is 8x8x2048 (see the sketch below).
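
A minimal sketch of this step, following the TensorFlow tutorial cited in the references; the 299x299 resize is the input size InceptionV3 expects:

```python
import tensorflow as tf

# Load InceptionV3 pretrained on ImageNet, without its classification head,
# so the model's output is the last convolutional layer
# (8x8x2048 for a 299x299 input).
image_model = tf.keras.applications.InceptionV3(include_top=False,
                                                weights='imagenet')
image_features_extract_model = tf.keras.Model(image_model.input,
                                              image_model.output)

def load_image(image_path):
    """Read an image from disk and preprocess it for InceptionV3."""
    img = tf.io.read_file(image_path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path
```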
Caching the features extracted from InceptionV3

You will pre-process each image with InceptionV3 and cache the output to disk. Caching the output in RAM would be faster but also memory intensive, requiring 8 * 8 * 2048 floats per image (see the sketch below).
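
A hedged sketch of the caching step, reusing `load_image`, `image_features_extract_model`, and `img_name_vector` from the earlier sketches; the batch size of 16 is an arbitrary assumption:

```python
import numpy as np
import tensorflow as tf

# Run each unique training image through InceptionV3 once and cache the
# (64, 2048) feature map to disk as a .npy file next to the image.
encode_train = sorted(set(img_name_vector))

image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(
    load_image, num_parallel_calls=tf.data.AUTOTUNE).batch(16)

for img_batch, path_batch in image_dataset:
    batch_features = image_features_extract_model(img_batch)  # (b, 8, 8, 2048)
    batch_features = tf.reshape(
        batch_features, (batch_features.shape[0], -1, batch_features.shape[3]))
    for feat, path in zip(batch_features, path_batch):
        np.save(path.numpy().decode('utf-8'), feat.numpy())
```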

Preprocess and tokenize the captions

You'll tokenize the captions (for example, by splitting on spaces). This gives us a vocabulary of all the unique words in the data (for example, "surfing", "football", and so on).
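
A minimal sketch of the tokenization step using `tf.keras.preprocessing.text.Tokenizer`, as in the referenced TensorFlow tutorial; the 5,000-word vocabulary cap is an assumption, and `train_captions` comes from the earlier grouping sketch:

```python
import tensorflow as tf

# Keep only the 5,000 most frequent words; everything else maps to <unk>.
top_k = 5000
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=top_k, oov_token='<unk>',
    filters='!"#$%&()*+.,-/:;=?@[\\]^_`{|}~')
tokenizer.fit_on_texts(train_captions)
tokenizer.word_index['<pad>'] = 0
tokenizer.index_word[0] = '<pad>'

# Convert each caption to a sequence of integer IDs and pad to equal length.
train_seqs = tokenizer.texts_to_sequences(train_captions)
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs,
                                                           padding='post')
max_length = max(len(seq) for seq in train_seqs)
```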
Split the data into training and testing
• Create training and validation sets using a random 80-20 split.

• Create a tf.data dataset to use for training our model (sketched below).

• We have to tune the hyperparameters to get better results.
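
A sketch of the split and the tf.data pipeline, assuming the cached .npy features and the `img_name_vector`/`cap_vector` arrays from the earlier sketches; the batch and buffer sizes are assumed hyperparameters, and this simple split is over caption pairs (the tutorial splits by image so the same image never appears in both sets):

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# 80-20 random split into training and validation sets.
img_name_train, img_name_val, cap_train, cap_val = train_test_split(
    img_name_vector, cap_vector, test_size=0.2, random_state=0)

BATCH_SIZE = 64    # assumed hyperparameters; tune for your hardware
BUFFER_SIZE = 1000

def map_func(img_name, cap):
    """Load the cached InceptionV3 features saved earlier as .npy files."""
    img_tensor = np.load(img_name.decode('utf-8') + '.npy')
    return img_tensor, cap

dataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))
dataset = dataset.map(
    lambda item1, item2: tf.numpy_function(
        map_func, [item1, item2], [tf.float32, tf.int32]),
    num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
dataset = dataset.prefetch(tf.data.AUTOTUNE)
```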


MODEL
• We used an attention-based model, which enables us to see which parts of the image the model focuses on as it generates a caption.

• In this example, you extract the features from the last convolutional layer of InceptionV3, giving a tensor of shape (8, 8, 2048).

• You squash that to a shape of (64, 2048).

• This tensor is then passed through the CNN encoder (which consists of a single fully connected layer).

• The RNN (here a GRU) attends over the image to predict the next word (see the sketch below).
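
A sketch of the encoder, attention module, and GRU decoder described above, following the referenced TensorFlow tutorial; `embedding_dim` and `units` are hyperparameters you would choose (e.g. 256 and 512):

```python
import tensorflow as tf

class CNN_Encoder(tf.keras.Model):
    """A single fully connected layer applied to the (64, 2048) image features."""
    def __init__(self, embedding_dim):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        return tf.nn.relu(self.fc(x))          # (batch, 64, embedding_dim)

class BahdanauAttention(tf.keras.Model):
    """Additive attention over the 64 image regions."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        hidden_with_time = tf.expand_dims(hidden, 1)               # (batch, 1, units)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)   # (batch, 64, 1)
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights

class RNN_Decoder(tf.keras.Model):
    """GRU decoder that attends over the encoded image at every step."""
    def __init__(self, embedding_dim, units, vocab_size):
        super().__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(units)

    def call(self, x, features, hidden):
        # Attend over the 64 image regions given the current hidden state.
        context_vector, attention_weights = self.attention(features, hidden)
        x = self.embedding(x)                                      # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        output, state = self.gru(x)
        x = self.fc1(output)
        x = tf.reshape(x, (-1, x.shape[2]))
        x = self.fc2(x)                                            # (batch, vocab_size)
        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))
```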
Checkpoint / epoch loss plot
[Training loss curve shown on the slide]
Caption evaluation
• The evaluate function is similar to the training loop, except you don't use teacher forcing here. The input to the decoder at each time step is its previous prediction, along with the hidden state and the encoder output.

• Stop predicting when the model predicts the end token.

• Store the attention weights for every time step (see the sketch below).
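
A sketch of such an evaluate function, reusing the encoder, decoder, tokenizer, and feature extractor from the earlier sketches; it uses greedy argmax decoding (the referenced tutorial samples from the predicted distribution instead), and `max_length=50` is an assumed cap:

```python
import numpy as np
import tensorflow as tf

def evaluate(image, max_length=50):
    """Generate a caption for one image without teacher forcing."""
    attention_plot = np.zeros((max_length, 64))
    hidden = decoder.reset_state(batch_size=1)

    # Encode the input image exactly as during training.
    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(
        img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))
    features = encoder(img_tensor_val)

    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)
    result = []
    for i in range(max_length):
        predictions, hidden, attention_weights = decoder(dec_input, features, hidden)
        attention_plot[i] = tf.reshape(attention_weights, (-1,)).numpy()

        # Feed the model's own previous prediction back in at the next step.
        predicted_id = int(tf.argmax(predictions[0]))
        word = tokenizer.index_word[predicted_id]
        if word == '<end>':            # stop at the end token
            return result, attention_plot
        result.append(word)
        dec_input = tf.expand_dims([predicted_id], 0)

    return result, attention_plot
```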


Caption generation
[The slide shows an example input image and its generated caption.]
References
• https://arxiv.org/pdf/1502.03044.pdf

• https://paperswithcode.com/task/text-generation

• https://machinelearningmastery.com/develop-a-deep-learning-caption-generation-model-in-python/

• https://www.tensorflow.org/tutorials/text/image_captioning
