
Hybrid Image Captioning Model

2022 OPJU International Technology Conference on Emerging Technologies for Sustainable Development (OTCON) | DOI: 10.1109/OTCON56053.2023.10113957

Lipismita Panigrahi
Department of School of Computer Applications
KIIT University, Bhubaneswar, India
[email protected]

Raghab Ranjan Panigrahi
Department of Computer Science and Engineering
SOA University, Bhubaneswar, India
[email protected]

Saroj Kumar Chandra
Department of Computer Science and Engineering
OPJU University, Raigarh, India
[email protected]

Abstract— Image captioning is implemented using deep learning and NLP (Natural Language Processing), resulting in the production of a description of an image. The proposed model generates a caption for an image using a Convolutional Neural Network (CNN) together with a Recurrent Neural Network (RNN) and areas of attention. Previously, the image names were used as keys to map the images to descriptions. In order to achieve high performance, in the proposed model the image caption is based on the relationship between the areas of a picture (attention model), the words used in the caption, and the state of an RNN language model. The approach of progressive loading is employed for loading the image dataset. Further, for encoding the image dataset into a feature vector, VGG16, a pre-trained CNN, is used. The extracted feature vector is given as input to the RNN model. These image encodings are passed to a specific type of RNN model known as Long Short-Term Memory (LSTM) networks. Subsequently, the LSTM decodes the feature vector and predicts the sequence of words, resulting in the generation of descriptions or captions. The training performance is measured using the BLEU quantitative analysis metric.

Keywords— Convolutional Neural Network, image captions, Recurrent Neural Network, LSTM, attention model, encoder, decoder.

I. INTRODUCTION

Image captioning is one of the most cutting-edge problems in the study of machine learning (ML) and artificial intelligence (AI). The aim of image captioning is to describe a picture using idiomatic language. Image captioning has a variety of uses, such as helping people who are visually impaired, offering suggestions, modifying apps, creating virtual assistants, and retrieving photos more quickly. However, there are also challenges, such as the identification of many items in a picture, the discovery of their associations, the classification of objects, and the combining of words that may not follow standard language modeling. Automatic Image Captioning (AIC) is a field that is still under development. Utilising NLP, the image captioning technique combines object detection and language modeling into appropriate sentences.

Automatic caption creation from photographs has grown to be both a necessary chore and a fascinating study topic as a result of the constantly expanding multimedia content from online social networks. The method of creating an image caption involves first extracting the features of the image and then creating a textual description based on the extracted features. Further, this captioning model is extended with a new technique to attend to various image boundaries as the caption is being constructed phrase by phrase. These models have produced efficient outcomes in the area of caption creation. These outcomes demonstrate that our suggested model outperforms conventional models in terms of image captioning. Finally, qualitative and quantitative performance evaluation is used to score the quality of the information produced and assess the correctness of the produced caption.

When the target image is compared to the training images, this model uses the learned data to produce a respectable description. Using a Convolutional Neural Network (CNN) as the encoder, features are retrieved from the images. Long Short-Term Memory (LSTM) is employed to decode the description of the image. Finally, the BLEU metric is used to score the quality of the information produced and assess the correctness of the produced caption.

II. RELATED WORK

The application of ML and deep learning in image processing has been an area of immense interest for many researchers in the recent era [1-6]. The scene understanding present in the image plays a vital role in developing the caption for the picture and is important in many applications (such as searching using pictures, reading stories from collections, and assisting people who are visually impaired to understand content while browsing the internet). Many different picture captioning models have been developed over the last decades [7-9]. The use of encoder-decoder models for image captioning has recently received a lot of attention [10-13]. In its ordinary form, a CNN converts the input image into a vectorial representation by encoding it and then uses that representation as the beginning input for an RNN. Sequentially, the RNN predicts each word in the caption given the prior word, without the need to limit the temporal dependence to a predetermined order as in techniques based on n-grams. There are various ways to input the CNN image representation into the RNN. While some authors [12, 13] just utilize it to compute the RNN's initial state, others enter it during each iteration of the RNN [11, 15].

G. Sairam et al. [16] proposed a model of caption generation for images using deep neural networks. CNN and LSTM jointly worked to create a model that could generate a caption. The authors use the CNN as the encoder, and features are retrieved from the images. The LSTM is employed to decode the description of the image.

Further, Marco Pedersoli et al. [17] worked on an attention-based paradigm called "Areas of Attention" for image captioning. The proposed method uses three interactions to model the relationships between images, captions, and an RNN. The attention model mainly focuses on particular areas according to the current input while considering the normal network.


However, Steven J. Rennie et al. [18] proposed a model using reinforcement learning for automatic caption generation of images, showing performance improvements when optimizing systems on the MSCOCO task. Self-critical sequence training is used to construct the systems. The model has also been proposed to classify WBC cells into any of the five WBC classes, and the study concluded that there should be further development in the authentication methods, including the factors that affect the authentication process.

Further, Zhilin Yang et al. [19] suggested a review network, a brand-new addition to the encoder-decoder framework. In this paper, RNN decoders are considered together with both CNN and RNN encoders. The proposed network can improve the encoder-decoder model by making the model more flexible. After each review step, the review network generates a thought vector, which is then fed to the attention model in the decoder, a departure from the standard encoder-decoder framework.

Subsequently, in order to make the adaptation of the captioning model easier, W. Zhao et al. [20] describe a cross-modal retrieval aided approach to cross-domain picture captioning that makes use of a cross-modal retrieval model to construct fictitious pairs of images and sentences in the target domain.

Later on, C. Liu et al. [21] proposed a novel transformer-based remote sensing image change captioning (RSICC) model for producing human-like language explanations of the land-cover changes in multi-temporal RS images. This study is threefold: 1) a CNN-based feature extractor creates high-level features of RS image pairs, 2) a dual-branch Transformer encoder (DTE) enhances the feature classification of the alterations, and 3) a caption decoder creates phrases expressing the differences.

III. PROPOSED MODEL

This section outlines the procedures of the hybrid image captioning model, which is shown in Fig. 1. In this research, we create a methodological model for automatically identifying the caption of an image based on the relationship between the areas of a picture (attention model), the words used in the caption, and an RNN language model's current state.

The main objective of this study is image caption generation using natural language expressions. Extracting the features in the image and predicting the priority of captions is a difficult task when compared to other image processing tasks such as identification, segmentation, and localization, as NLP is also involved in the model for identifying various features and describing them in an image. Neural network-driven approaches are the most widely used method for solving the caption generation problem because of recent developments in training neural networks, the availability of GPU computing power, and enormous datasets.

In this study, the Flickr8k images are employed as the training dataset. It contains 8091 images. The objective of this study is twofold: 1) a VGG16 CNN - LSTM RNN encoder-decoder model, and 2) applying attention for prediction and feedback. The steps are described below.

A. VGG16 CNN - LSTM RNN encoder-decoder model

Initially, VGG16 [16], a pre-trained CNN, is utilized as an encoder to extract the features from the image I and compare the target image to the training images. The CNN is a particular type of feed-forward artificial neural network that operates on image data; CNNs are deep learning network models utilized for image data processing and are mostly used for identifying objects in images [16]. Eqs. (1)-(4) regulate the whole network.

$j_t = \sigma(M_{jx} X_t + M_{jw} w_{t-1})$  (1)

Where $j_t$ is the input gate at time t and M stands for the trained parameters. The sigmoid operation, which produces values between 0 and 1 to indicate how much of each component's output should be passed to the following component, uses the variable $w_{t-1}$ to denote the module's output at time t-1 [22].

$fg_t = \sigma(M_{fgx} X_t + M_{fgw} w_{t-1})$  (2)

Where $fg_t$ stands for the forget gate, which represents the value of the forgotten cell.

$og_t = \sigma(M_{ogx} X_t + M_{ogw} w_{t-1})$  (3)

Where $og_t$ stands for the output gate, which decides whether or not to pass the cell's updated value.

Additionally, in order to decode the visual description, the extracted feature vector is fed into the LSTM (a particular type of RNN model). RNNs are a specific kind of artificial network in which coordinated phases are formed by associations between the units. The advantage of using an RNN over standard network types is that the RNN uses its memory to deal with arbitrarily organized sequences of inputs. Although traditional RNNs have this advantage, the lack of consideration for long-term interdependence is one of their limitations [10]. It is possible that in some instances the typical RNN fails due to a significant discrepancy between the relevant data and the locations in which it is required.

To overcome this issue, this study uses the LSTM (shown in Fig. 2) proposed by Hochreiter and Schmidhuber [23]. LSTM is an improved version of the RNN that can store and remember data for a longer term than the RNN and is efficient in deep learning. It works on discrete or continuous data. The predicted output, which predicts the words that will be generated, is shown in Eq. (4).

$OP_{t+1} = \mathrm{Softmax}(w_t)$  (4)

Until an end sequence (.) is encountered, the whole LSTM network is repeated. These predicted words are all included in the description of the input image. The LSTM is created in such a way that it can only predict each word after viewing the entire image and the previously generated words. The sum of the losses evoked by the images I with the proportionate captions $w_t$ is minimized during training, as shown in Eq. (5). The detailed formulas and experiments can be found in [16, 17].
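To make the role of Eqs. (1)-(4) concrete, the listing below performs one decoding step in plain NumPy. It is a schematic sketch only: the weight names (M['jx'], M['jw'], ...), the output projection V, and the cell-state update are illustrative assumptions, since the paper writes out only the three gate equations and the final softmax and refers the remaining details to [16, 17, 23].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lstm_step(x_t, w_prev, c_prev, M, V):
    """One decoding step following Eqs. (1)-(4).

    x_t    : current input vector (word embedding, or the image feature at t = 0)
    w_prev : module output at time t-1 (hidden state)
    c_prev : cell state at t-1 (standard LSTM component, not written out in the paper)
    M      : dict of trained weight matrices, e.g. M['jx'], M['jw'], ... (assumed names)
    V      : output projection onto the vocabulary (assumed)
    """
    j_t  = sigmoid(M['jx'] @ x_t + M['jw'] @ w_prev)    # input gate, Eq. (1)
    fg_t = sigmoid(M['fgx'] @ x_t + M['fgw'] @ w_prev)  # forget gate, Eq. (2)
    og_t = sigmoid(M['ogx'] @ x_t + M['ogw'] @ w_prev)  # output gate, Eq. (3)

    # Standard LSTM cell update from Hochreiter and Schmidhuber [23];
    # the paper leaves these two lines implicit.
    c_tilde = np.tanh(M['cx'] @ x_t + M['cw'] @ w_prev)
    c_t = fg_t * c_prev + j_t * c_tilde
    w_t = og_t * np.tanh(c_t)

    # Word prediction, Eq. (4): a softmax over the vocabulary.
    op_next = softmax(V @ w_t)
    return w_t, c_t, op_next
```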

Where θ refers to all of the CNN and RNN components' parameters collectively. Due to local minima in the loss, this leads to an approximate maximum likelihood estimation.

B. Apply attention for prediction and feedback

In this step, the VGG16 CNN - LSTM RNN encoder-decoder model is extended with a framework to attend to various image regions as the caption is being constructed phase by phase [24]. Subsequently, the caption words and image regions can be connected directly. These connections are generalized from image-level captions during training, rather than through weakly-supervised object detector training. By localising the appropriate regions during testing, these associations help to improve captioning.

The benefit of employing the attention mechanism is that it increases generalisation for identifying recognisable scene elements in novel compositions by linking words to localised picture region appearances rather than global image representations.

To extract the image regions as the caption is generated word by word, the marginal distribution over the regions is used to extract the region descriptions, as shown in Eq. (6). Here, $P_{sh} \in \mathbb{R}^{n_r}$ holds all location probabilities at time t. We generate Eq. (7) by updating Eq. (3) of the VGG16 CNN - LSTM RNN encoder-decoder model with this visual representation, which is concatenated to the generated word in the feedback signal of the state.

Finally, the BLEU metric technique is used to score the quality of the information generated and assess the correctness of the generated caption.
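A rough sketch of the attention step described above is given below. Because the exact forms of Eqs. (6) and (7) are deferred to [16, 17], the bilinear scoring used here (parameters W_r and W_h) is an assumption made purely for illustration; what matters is the shape of the computation: one probability per region, followed by an attention-weighted visual vector that is concatenated into the feedback signal of the state.

```python
import numpy as np

def attend_regions(region_feats, w_prev, W_r, W_h):
    """Schematic region attention at one time step.

    region_feats : (nr, d) matrix of nr localized region descriptors from the CNN
    w_prev       : RNN state / previous output used to score the regions
    W_r, W_h     : assumed scoring parameters (the exact formulas of
                   Eqs. (6)-(7) are deferred to [16, 17])
    """
    scores = region_feats @ (W_r @ (W_h @ w_prev))   # one score per region
    p = np.exp(scores - scores.max())
    p = p / p.sum()                                  # P in R^nr, location probabilities
    context = p @ region_feats                       # attention-weighted visual vector
    return p, context

# The context vector is then concatenated to the embedding of the previously
# generated word and fed back into the LSTM state update (the modified Eq. (3)).
```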

Fig. 1. Proposed Model.

Fig. 2. Architecture of LSTM layer.
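For readers who want to connect Fig. 1 to code, the following Keras sketch builds the base encoder-decoder of Section III-A in the common "merge" form used for Flickr8k: a 4096-dimensional VGG16 feature and a partial caption are combined, and the next word is predicted. The layer sizes, vocabulary size, and maximum caption length are assumed values for illustration and are not reported in the paper; the attention extension of Section III-B is omitted here.

```python
from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add
from tensorflow.keras.models import Model

vocab_size = 7579   # assumed Flickr8k vocabulary size after cleaning
max_length = 34     # assumed maximum caption length in tokens

# Image-feature branch: 4096-d VGG16 fc2 vector -> 256-d representation.
inputs1 = Input(shape=(4096,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)

# Language branch: partial caption -> embedding -> LSTM state.
inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)

# Decoder: merge the two branches and predict the next word over the vocabulary.
decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```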

IV. EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we evaluate the relative value of the various model elements, the efficiency of the various attentional areas, and the impact of jointly adjusting the CNN and RNN elements. The Flickr8k image dataset, which contains 8091 images, is used in the project. The proposed method is implemented in Python with the Keras, TensorFlow, and Matplotlib libraries.

Initially, the drive is mounted for data collection and the images are loaded into a folder in order to make data loading easier while training; the dataset is also verified for accurate contents (shown in Fig. 3). Next, features are extracted from each image. Given a directory name, the function extract_features (shown in Fig. 4) extracts the features of an image by loading the images into the VGG16 architecture. A 1-dimensional vector with 4,096 elements makes up each image's characteristics. The function returns the image features mapped to the image descriptions.

Then the already tokenized description text is cleaned in order to cut down on the number of words in the text. The clean_descriptions function goes through each description and cleans the wording after receiving the dictionary of image identifiers to descriptions. Further, the images are mapped to their respective descriptions. Since the image and description are mapped in a dictionary, the dictionary-mapped format is converted into a list format in order to load the data into the model, and the RNN is fed with the sequence of data created previously; this makes it easier to load data into the model for processing the image datasets with the description text for caption generation. Further, an attention-based model is used for prediction and feedback (all experiments can be found in [17]). Finally, the model is fitted with the training data for 20 epochs, as shown in Fig. 5. The model is trained for 20 epochs with a loss of 20.2% and an accuracy of 85%; the losses and the caption losses calculated by Eq. (5) are represented in Fig. 6. The sum of the losses evoked by the images I (x-axis) with the proportionate captions w_t (y-axis) is minimized during training.
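The cleaning and sequence-building steps described above can be sketched as follows. The function name clean_descriptions mirrors the one mentioned in the text, while create_sequences is an assumed helper; the exact token filters and the shape of the training pairs follow the standard Flickr8k preparation and are not code taken from the paper.

```python
import string
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def clean_descriptions(descriptions):
    """Lower-case, strip punctuation, and drop one-letter / numeric tokens in place."""
    table = str.maketrans('', '', string.punctuation)
    for key, desc_list in descriptions.items():
        for i, desc in enumerate(desc_list):
            tokens = desc.lower().translate(table).split()
            tokens = [w for w in tokens if len(w) > 1 and w.isalpha()]
            desc_list[i] = ' '.join(tokens)

def create_sequences(tokenizer, max_length, desc_list, photo_feature, vocab_size):
    """Turn one image's captions into (photo, partial caption) -> next-word pairs."""
    X1, X2, y = [], [], []
    for desc in desc_list:
        seq = tokenizer.texts_to_sequences([desc])[0]
        for i in range(1, len(seq)):
            in_seq = pad_sequences([seq[:i]], maxlen=max_length)[0]
            out_seq = to_categorical([seq[i]], num_classes=vocab_size)[0]
            X1.append(photo_feature)
            X2.append(in_seq)
            y.append(out_seq)
    return np.array(X1), np.array(X2), np.array(y)
```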

Fig. 3. Loading and verifying the datasets.
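A possible form of the loading and verification step in Fig. 3 is sketched below, assuming the standard Flickr8k layout (a folder of .jpg files and a Flickr8k.token.txt file with tab-separated "image_id#n  caption" lines); the paths are placeholders, and mounting a Google Drive, as the text suggests, would simply precede this snippet.

```python
import os

image_dir = 'Flickr8k_Dataset/Flicker8k_Dataset'      # placeholder path
caption_file = 'Flickr8k_text/Flickr8k.token.txt'      # placeholder path

image_files = [f for f in os.listdir(image_dir) if f.endswith('.jpg')]
print('Images found:', len(image_files))               # expected: 8091

# Build the image-id -> list-of-captions dictionary used in later steps.
descriptions = {}
with open(caption_file, encoding='utf-8') as fh:
    for line in fh:
        image_id, caption = line.strip().split('\t')
        image_id = image_id.split('.')[0]
        descriptions.setdefault(image_id, []).append(caption)
print('Images with captions:', len(descriptions))
```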

Fig. 4. Extracting features from an image.
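The extract_features function of Fig. 4 can be approximated as below: VGG16 is loaded with its classification head, the final 1000-way layer is dropped so that the 4096-dimensional fc2 output is returned, and every image in the directory is mapped to that vector. This is a hedged reconstruction of the described behaviour, not the authors' exact code.

```python
import os
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

def extract_features(directory):
    """Map each image file to a 4096-d feature vector from VGG16's fc2 layer."""
    base = VGG16()
    # Drop the final 1000-way classification layer, keep the 4096-d fc2 output.
    model = Model(inputs=base.inputs, outputs=base.layers[-2].output)
    features = {}
    for name in os.listdir(directory):
        image = load_img(os.path.join(directory, name), target_size=(224, 224))
        image = img_to_array(image)
        image = preprocess_input(np.expand_dims(image, axis=0))
        features[name.split('.')[0]] = model.predict(image, verbose=0)
    return features
```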

Fig. 5. Fitting the model with the training data for 20 epochs.
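A plausible shape for the training step in Fig. 5, with the progressive loading mentioned earlier expressed as a Python generator, is shown below. The per-epoch checkpointing and the use of one image per step are assumptions; descriptions, features, tokenizer, max_length, vocab_size, create_sequences, and model are taken from the earlier sketches.

```python
def data_generator(descriptions, features, tokenizer, max_length, vocab_size):
    """Yield training batches one image at a time (progressive loading)."""
    while True:
        for image_id, desc_list in descriptions.items():
            photo = features[image_id][0]
            X1, X2, y = create_sequences(tokenizer, max_length, desc_list,
                                         photo, vocab_size)
            yield [X1, X2], y

epochs = 20
steps = len(descriptions)            # one image per step (assumed)
for i in range(epochs):
    generator = data_generator(descriptions, features, tokenizer,
                               max_length, vocab_size)
    model.fit(generator, epochs=1, steps_per_epoch=steps, verbose=1)
    model.save('model_' + str(i) + '.h5')   # keep a checkpoint per epoch (assumed)
```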

Fig. 6. (a) The training losses and (b) the caption losses.
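Finally, caption generation and the BLEU scoring used for evaluation can be sketched as follows. The greedy word-by-word loop and the 'startseq'/'endseq' markers are assumptions in line with common Flickr8k practice, and NLTK's corpus_bleu is one possible BLEU implementation; the paper does not state which one it uses.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from nltk.translate.bleu_score import corpus_bleu

def generate_desc(model, tokenizer, photo, max_length):
    """Greedy decoding: predict one word at a time until 'endseq' or max_length.

    photo : a (1, 4096) feature vector produced by extract_features.
    """
    in_text = 'startseq'
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([in_text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        yhat = np.argmax(model.predict([photo, seq], verbose=0))
        word = tokenizer.index_word.get(yhat)
        if word is None:
            break
        in_text += ' ' + word
        if word == 'endseq':
            break
    return in_text

# BLEU over a test split: `actual` holds the tokenized human captions per image,
# `predicted` the tokenized generated captions.
# print('BLEU-1:', corpus_bleu(actual, predicted, weights=(1.0, 0, 0, 0)))
# print('BLEU-4:', corpus_bleu(actual, predicted, weights=(0.25, 0.25, 0.25, 0.25)))
```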

V. CONCLUSION

In this research, a hybrid model based on a CNN and an LSTM with an attention network is developed for automatic image captioning. In the suggested paradigm, an encoder-decoder architecture was adopted. The CNN serves as the encoder in this process, converting the image into a vector feature representation. Then, the corresponding sentence is produced by an LSTM model (selected as the decoder). To enhance performance, the datasets are loaded into the model using the progressive loading technique. In order to improve the accuracy, a method called the "attention model" is used, which increases the accuracy and is also capable of taking video input and generating captions.

REFERENCES

[1] L. Panigrahi, K. Verma, and B. K. Singh, "Hybrid segmentation method based on multi-scale Gaussian kernel fuzzy clustering with spatial bias correction and region-scalable fitting for breast US images," IET Comput. Vis., vol. 12, pp. 1067–1077, 2018.
[2] B. K. Singh, K. Verma, L. Panigrahi, and A. S. Thoke, "Integrating radiologist feedback with computer aided diagnostic systems for breast cancer risk prediction in ultrasonic images: An experimental investigation in machine learning paradigm," Expert Syst. Appl., vol. 90, pp. 209–223, 2017.
[3] L. Panigrahi, K. Verma, and B. K. Singh, "An enhancement in automatic seed selection in breast cancer ultrasound images using texture features," in 2016 Int. Conf. Adv. Comput. Commun. Informatics, IEEE, pp. 1096–1102, 2016.
[4] L. Panigrahi, K. Verma, and B. K. Singh, "Evaluation of Image Features Within and Surrounding Lesion Region for Risk Stratification in Breast Ultrasound Images," IETE J. Res., pp. 1–12, 2019.
[5] L. Panigrahi, K. Verma, and B. K. Singh, "Ultrasound image segmentation using a novel multi-scale Gaussian kernel fuzzy clustering and multi-scale vector field convolution," Expert Systems with Applications, vol. 115, pp. 486–498, 2019.
[6] Y. Bafna, K. Verma, L. Panigrahi, and S. P. Sahu, "Automated boundary detection of breast cancer in ultrasound images using watershed algorithm," in Ambient Communications and Computer Systems, Springer, Singapore, pp. 729–738, 2018.
[7] Z. Wang, S. Shi, Z. Zhai, Y. Wu, and R. Yang, "ArCo: Attention-reinforced transformer with contrastive learning for image captioning," Image and Vision Computing, vol. 128, pp. 104570, 2022.
[8] Z. Zhang, H. Zhang, J. Wang, Z. Sun, and Z. Yang, "Generating news image captions with semantic discourse extraction and contrastive style-coherent learning," Computers and Electrical Engineering, vol. 104, pp. 108429, 2022.
[9] N. Hu, C. Fan, Y. Ming, and F. Feng, "MAENet: A Novel Multi-head Association Attention Enhancement Network for Completing Intra-modal Interaction in Image Captioning," Neurocomputing, 2022.
[10] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, "Scheduled sampling for sequence prediction with recurrent neural networks," in NIPS, 2015.
[11] J. Donahue, L. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in CVPR, 2015.
[12] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in CVPR, 2015.
[13] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in CVPR, 2015.
[14] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in ICML, 2015.
[15] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille, "Deep captioning with multimodal recurrent neural networks (m-RNN)," ICLR, 2015.
[16] G. Sairam, M. Mandha, Prashanth, and P. Swetha, "Image Captioning using CNN and LSTM," in 4th Smart Cities Symposium (SCS 2021), pp. 274–277, 2021.
[17] M. Pedersoli, T. Lucas, C. Schmid, and J. Verbeek, "Areas of attention for image captioning," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1242–1250, 2017.
[18] S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, "Self-critical sequence training for image captioning," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1179–1195, 2017.
[19] Z. Yang, Y. Yuan, Y. Wu, W. W. Cohen, and R. R. Salakhutdinov, "Review networks for caption generation," in NIPS, 2016.
[20] W. Zhao, X. Wu, and J. Luo, "Cross-Domain Image Captioning via Cross-Modal Retrieval and Model Adaptation," IEEE Transactions on Image Processing, vol. 30, pp. 1180–1192, 2021.
[21] C. Liu, R. Zhao, H. Chen, Z. Zou, and Z. Shi, "Remote Sensing Image Change Captioning With Dual-Branch Transformers: A New Method and a Large Scale Dataset," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–20, 2022.
[22] R. Vedantam, C. L. Zitnick, and D. Parikh, "CIDEr: Consensus-based image description evaluation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[23] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[24] V. Tiwari, K. Bapat, K. R. Shrimali, S. K. Singh, B. Tiwari, S. Jain, and H. K. Sharma, "Automatic generation of chest x-ray medical imaging reports using LSTM-CNN," in Proceedings of the International Conference on Data Science, Machine Learning and Artificial Intelligence, pp. 80–85, 2021.

