
Advanced Multimodal Machine Learning

Lecture 12.1: Multimodal Fusion and New Directions
Louis-Philippe Morency

* Original version co-developed with Tadas Baltrusaitis

1
Lecture Objectives

▪ Recap: multimodal fusion
▪ Kernel methods for fusion
  ▪ Multiple Kernel Learning
▪ New directions in multimodal machine learning
  ▪ Representation
  ▪ Alignment
  ▪ Translation
  ▪ Fusion
  ▪ Co-Learning
▪ Recap of multimodal challenges
Quick Recap:
Multimodal Fusion
3
Multimodal fusion

▪ Process of joining information from two or more modalities to perform a prediction
▪ Examples
  ▪ Audio-visual speech recognition
  ▪ Audio-visual emotion recognition
  ▪ Multimodal biometrics
  ▪ Speaker identification and diarization
  ▪ Visual/Media question answering
Multimodal Fusion

▪ Two major types
  ▪ Model-free
    ▪ Early, late, hybrid
  ▪ Model-based
    ▪ Neural networks
    ▪ Graphical models
    ▪ Kernel methods

[Figure: a prediction produced by a "fancy algorithm" fusing Modality 1, Modality 2, and Modality 3]

Graphical Model: Learning Multimodal Structure

▪ Modality-private structure
  • Internal grouping of observations
▪ Modality-shared structure
  • Interaction and synchrony

[Figure: latent-variable model for sentiment y over audio observations x_A1 … x_A5 with hidden states h_A1 … h_A5; example input: "We saw the yellow dog"]

6
Multi-view Latent Variable Discriminative Models

▪ Modality-private structure
  • Internal grouping of observations
▪ Modality-shared structure
  • Interaction and synchrony

[Figure: sentiment y over audio observations x_A1 … x_A5 and visual observations x_V1 … x_V5, with hidden states h_A1 … h_A5 and h_V1 … h_V5; example input: "We saw the yellow dog"]

p(y | x_A, x_V; θ) = Σ_{h_A, h_V} p(y, h_A, h_V | x_A, x_V; θ)

➢ Approximate inference using loopy belief propagation
7
Multimodal Fusion:
Multiple Kernel Learning

8
What is a Kernel function?

▪ A kernel function acts as a similarity metric between data points:

  K(x_i, x_j) = φ(x_i)ᵀ φ(x_j) = ⟨φ(x_i), φ(x_j)⟩, where φ: D → Z

▪ The kernel function performs an inner product in the feature-map space φ
  ▪ The inner product (a generalization of the dot product) is often denoted ⟨·,·⟩ in SVM papers
▪ x ∈ ℝ^D (but not necessarily), while φ(x) can be in any space – the same, higher-, lower-, or even infinite-dimensional
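As a concrete check of this idea (an illustrative sketch, not from the slides; the feature map and data points are my own choices), the short NumPy example below shows that a degree-2 polynomial kernel returns exactly the inner product of an explicit quadratic feature map φ, without constructing φ(x) in the general case:

import numpy as np

def phi(x):
    # Explicit quadratic feature map for 2-D inputs (an illustrative choice):
    # phi(x) = [x1^2, x2^2, sqrt(2)*x1*x2]
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(xi, xj):
    # Homogeneous polynomial kernel of degree 2: K(xi, xj) = (xi . xj)^2
    return np.dot(xi, xj) ** 2

xi, xj = np.array([1.0, 2.0]), np.array([3.0, -1.0])

# The same value computed two ways: explicitly in feature space vs. via the kernel.
print(np.dot(phi(xi), phi(xj)))   # 1.0 (up to floating point)
print(poly_kernel(xi, xj))        # 1.0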
Non-linearly separable data

[Figure: data that is not linearly separable in the original space becomes linearly separable after mapping]

▪ We want to map our data to a linearly separable space
▪ Instead of x, we want φ(x), in a separable space (φ(x) is a feature map)
▪ What if φ(x) is much higher dimensional? We do not want to learn more parameters, and the mapping could become very expensive
Radial Basis Function Kernel (RBF)

▪ Arguably the most popular SVM kernel

  K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )

▪ φ(x) = ?
  ▪ It is infinite-dimensional and fairly involved; there is no easy way to actually perform the mapping to this space, but we know what an inner product looks like in it
▪ σ = ?
  ▪ A hyperparameter
  ▪ With a really low sigma the model becomes close to a KNN approach (potentially very expensive)
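A minimal sketch of the RBF Gram matrix, assuming scikit-learn is available; note that scikit-learn parameterizes the same kernel with gamma = 1/(2σ²):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.RandomState(0).randn(5, 3)          # 5 toy points in R^3
sigma = 1.5

# Gram matrix from the definition: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_manual = np.exp(-sq_dists / (2 * sigma**2))

# Same kernel via scikit-learn, with gamma = 1 / (2 sigma^2)
K_sklearn = rbf_kernel(X, X, gamma=1.0 / (2 * sigma**2))
print(np.allclose(K_manual, K_sklearn))           # True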
Some other kernels

▪ Other kernels exist
  ▪ Histogram intersection kernel – good for histogram features
  ▪ String kernels – specifically for text and sentence features
  ▪ Proximity distribution kernel
  ▪ (Spatial) pyramid matching kernel
Kernel CCA

▪ If we recall CCA, it used only inner products in its definitions when dealing with data, which means we can again use kernels
▪ We can now map into a high-dimensional non-linear space instead

[Lai et al. 2000]


Different properties of different signals

How do we deal with heterogeneous or multimodal data?


▪ The data of interest is not in a joint space, so the appropriate kernels for each modality might be different

Multiple Kernel Learning (MKL) is a way to address this

▪ MKL was popular for image classification and retrieval before deep learning approaches came around (winner of the 2010 VOC challenge and the ImageCLEF 2011 challenge)
▪ MKL fell slightly out of favor when deep learning approaches became popular
▪ It is still useful when large datasets are not available
Multiple Kernel Learning

▪ Instead of providing a single kernel and validating which one works, optimize over a family of kernels (or different families for different modalities)
▪ Works well for unimodal and multimodal data; very little adaptation is needed

[Lanckriet 2004]
MKL in Unimodal Case

▪ Pick a family of kernels and learn which kernels are important for the classification task
  ▪ For example, a set of RBF and polynomial kernels

[Figure: base kernels combined into a single learned kernel K]
MKL in Multimodal/Multiview Case

▪ Pick a family of kernels for each modality and learn which kernels are important for the classification task
▪ The inputs do not need to be different modalities; often we use different views of the same modality (HOG, SIFT, etc.)

[Figure: per-modality base kernels combined into a single learned kernel K]
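True MKL learns the kernel weights jointly with the classifier (e.g., via the optimization of Lanckriet et al.); the sketch below, which is not from the slides, only illustrates the ingredients in the multimodal case: one Gram matrix per modality, combined as a convex combination and fed to an SVM with a precomputed kernel. The features, labels, and weights beta are made up, and beta is fixed by hand rather than learned:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n = 100
X_audio  = rng.randn(n, 20)        # hypothetical audio features
X_visual = rng.randn(n, 50)        # hypothetical visual features
y = rng.randint(0, 2, size=n)      # toy labels

# One (or several) base kernels per modality.
K_audio  = rbf_kernel(X_audio, gamma=0.05)
K_visual = polynomial_kernel(X_visual, degree=2)

# MKL would learn these weights jointly with the SVM; here they are fixed.
beta = np.array([0.6, 0.4])
K_combined = beta[0] * K_audio + beta[1] * K_visual

clf = SVC(kernel="precomputed").fit(K_combined, y)
print(clf.score(K_combined, y))    # training accuracy on the toy data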
New Directions:
Representation
18
Representation 1: Hash Function Learning

▪ We talked about coordinated representations, but mostly enforced “simple” coordination
▪ We can make embeddings more suitable for retrieval
  ▪ Enforce a Hamming space (a binary n-bit space)

[Cao et al. Deep visual-semantic hashing for cross-modal retrieval, KDD 2016]
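This is not the paper's training procedure (deep visual-semantic hashing learns the binary codes end-to-end); the sketch below only shows why a Hamming space helps retrieval: once embeddings are binarized, cross-modal nearest neighbors reduce to cheap bit comparisons. All embeddings here are random placeholders:

import numpy as np

rng = np.random.RandomState(0)
image_emb = rng.randn(1000, 64)    # hypothetical coordinated image embeddings
text_emb  = rng.randn(1, 64)       # one hypothetical text query embedding

# Binarize into a 64-bit Hamming space (a learned hashing layer would do this).
img_codes  = image_emb > 0
query_code = text_emb > 0

# Hamming distance = number of differing bits; in practice an XOR + popcount.
hamming = (img_codes != query_code).sum(axis=1)
top5 = np.argsort(hamming)[:5]
print(top5, hamming[top5])         # indices of the 5 closest images to the query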
Representation 2: Order-Embeddings

▪ We talked about coordinated representations, but mostly enforced “simple” coordination
▪ Can we take it further?
  ▪ Replaces symmetric similarity
  ▪ Enforces approximate structure when training the embedding (see the order-violation sketch below)


[Vendrov et al. Order-embeddings of images and language, ICLR 2016]
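In Vendrov et al.'s formulation, symmetric similarity is replaced by an asymmetric order-violation penalty, E(x, y) = ||max(0, y − x)||², which is zero exactly when x dominates y coordinate-wise. The NumPy sketch below (variable names and numbers are my own, and which modality plays which role is only indicative) shows the penalty and the max-margin loss built on it:

import numpy as np

def order_violation(x, y):
    # E(x, y) = ||max(0, y - x)||^2: zero iff x >= y in every coordinate,
    # so the measure is asymmetric, unlike cosine or Euclidean similarity.
    return np.square(np.maximum(0.0, y - x)).sum()

def margin_loss(x_pos, y_pos, x_neg, y_neg, margin=0.05):
    # Max-margin training loss over a positive pair and a contrastive negative pair.
    return order_violation(x_pos, y_pos) + max(0.0, margin - order_violation(x_neg, y_neg))

# Embeddings are constrained to be non-negative (e.g. via an absolute value).
caption = np.array([0.1, 0.2, 0.0])
image   = np.array([0.3, 0.5, 0.4])       # dominates `caption` coordinate-wise
print(order_violation(image, caption))    # 0.0   -> order satisfied
print(order_violation(caption, image))    # ~0.29 -> order violated in the other direction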
Representation 3: Hierarchical Multimodal LSTM

▪ Attempts to model region-based representations using phrases rather than simply an overview sentence for the image
▪ Uses these region-based phrases to hierarchically build sentences

Niu, Zhenxing, et al. "Hierarchical multimodal lstm for dense visual-semantic embedding." Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017.
Representation 3: Hierarchical Multimodal LSTM

[Figure: the multimodal embedding space and the HM-LSTM architecture]

Niu, Zhenxing, et al. "Hierarchical multimodal lstm for dense visual-semantic embedding." Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017.
Representation 4: Multimodal VAE (MVAE)

▪ Introduces a multimodal variational autoencoder (MVAE) with a new training paradigm that learns a joint distribution and is robust to missing data

[Wu, Mike, and Noah Goodman. “Multimodal Generative Models for Scalable Weakly-Supervised Learning.”, NIPS 2018]

23
Representation 4: Multimodal VAE (MVAE)

▪ Transforms unimodal datasets into “multimodal” problems by treating labels as a second modality

[Figure: samples from the prior z ~ p(z) compared with samples conditioned on the label modality, z ~ p(z | x₂ = 5) and z ~ p(z | x₂ = "ankle boot")]

[Wu, Mike, and Noah Goodman. “Multimodal Generative Models for Scalable Weakly-Supervised Learning.”, NIPS 2018]

24
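The MVAE fuses the per-modality inference networks with a product of Gaussian experts (PoE), which is what makes it usable when modalities are missing. Below is a minimal NumPy sketch of just the PoE combination step; the encoders, decoders, and training objective are omitted, and all numbers are made up:

import numpy as np

def product_of_experts(mus, logvars):
    # A product of Gaussians is Gaussian with precision = sum of precisions
    # and a precision-weighted mean; each modality contributes one expert.
    precisions = np.exp(-np.asarray(logvars))            # 1 / var_i
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (np.asarray(mus) * precisions).sum(axis=0)
    return joint_mu, np.log(joint_var)

# Prior expert N(0, I) plus two hypothetical per-modality encoder outputs.
mus     = [np.zeros(4), np.array([1.0, 0.5, -0.2, 0.0]), np.array([0.8, 0.4, 0.1, -0.3])]
logvars = [np.zeros(4), np.full(4, -1.0),                np.full(4, -0.5)]
print(product_of_experts(mus, logvars))

# A missing modality is handled by simply dropping its expert.
print(product_of_experts(mus[:2], logvars[:2]))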
Representation 5: Multilingual Representations

Goal: map an image and its descriptions (not translations) in both languages close to each other.

[Gella et al. "Image Pivoting for Learning Multilingual Multimodal Representations", ACL 2017]
New Directions:
Alignment
26
Alignment 1: Books to scripts/movies

▪ Aligning very different modalities
  ▪ Books to scripts/movies
▪ Hand-crafted, similarity-based approach

[Tapaswi et al. Book2Movie: Aligning Video Scenes with Book Chapters, CVPR 2015]
Alignment 2: Books to scripts/movies

▪ Aligning very different modalities
  ▪ Books to scripts/movies
▪ Supervision-based approach

[Zhu et al. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, ICCV 2015]
Alignment 3: Spot-The-Diff

▪ ‘Spot-the-diff’: a new task and a dataset for succinctly describing all the differences between two similar images
▪ Proposes a new model that captures visual salience through a latent alignment between clusters of differing pixels and output sentences

[Jhamtani and Berg-Kirkpatrick. Learning to Describe Differences Between Pairs of Similar Images, EMNLP 2018]
Alignment 4: Textual Grounding

[Yeh, Raymond, et al. “Interpretable and globally optimal prediction for textual grounding using image concepts.”, NIPS 2017]
Alignment 4: Textual Grounding

▪ Formulates the bounding-box prediction as an energy minimization
▪ The energy function is defined as a linear combination of a set of “image concepts” φ_c(x, w_r) ∈ ℝ^(W×H)

[Figure: word priors]

[Yeh, Raymond, et al. “Interpretable and globally optimal prediction for textual grounding using image concepts.”, NIPS 2017]

31
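To show only the formulation (a box score that is a linear combination of concept-map mass inside the box), here is a brute-force sketch; the paper instead performs an exact, efficient search, and the maps, weights, and grid step below are arbitrary placeholders:

import numpy as np

def box_energy(concept_maps, weights, box):
    # Energy of a box = negative weighted sum of concept-map responses inside it.
    # concept_maps: (C, H, W) spatial maps; weights: (C,) derived from the query phrase.
    y0, x0, y1, x1 = box
    inside = concept_maps[:, y0:y1, x0:x1].sum(axis=(1, 2))
    return -float(weights @ inside)

def ground_phrase(concept_maps, weights, H, W, step=8):
    # Exhaustive search over axis-aligned boxes for the minimum-energy one.
    best, best_e = None, np.inf
    for y0 in range(0, H, step):
        for x0 in range(0, W, step):
            for y1 in range(y0 + step, H + 1, step):
                for x1 in range(x0 + step, W + 1, step):
                    e = box_energy(concept_maps, weights, (y0, x0, y1, x1))
                    if e < best_e:
                        best, best_e = (y0, x0, y1, x1), e
    return best, best_e

C, H, W = 3, 32, 32
maps = np.random.RandomState(0).rand(C, H, W)   # hypothetical concept maps
w = np.array([1.0, 0.2, -0.5])                  # hypothetical weights for one phrase
print(ground_phrase(maps, w, H, W))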
Alignment 4: Textual Grounding

▪ Word-word relationship: cos(w_s, w_s′), where w_s = [w_{s,1}; w_{s,2}; …; w_{s,|C|}]

[Yeh, Raymond, et al. “Interpretable and globally optimal prediction for textual grounding using image concepts.”, NIPS 2017]

32
Alignment 5: Comprehensive Image Captions

▪ Merging attention from the text and visual modalities for image captioning
▪ Strikes a balance between details (visually driven) and coverage of objects (text/topic driven)

[Liu et al. simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions, 2018]
Alignment 5: Comprehensive Image Captions

▪ Merging attention from the text and visual modalities for image captioning

[Liu et al. simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions, 2018]
New Directions:
Fusion
35
Fusion 1a: Multi-Head Attention for AVSR
[Figure: multi-head attention module fusing the audio and visual streams]

Afouras, Triantafyllos, Joon Son Chung, Andrew Senior, Oriol Vinyals, and Andrew Zisserman. "Deep audio-visual speech recognition." arXiv preprint arXiv:1809.02108 (Sept 2018).
Fusion 1b: Fusion with Multiple Attentions

▪ Modeling human communication – sentiment, emotions, speaker traits

[Figure: language LSTM, vision LSTM, and acoustic LSTM fused with multiple attentions]

[Zadeh et al., Human Communication Decoder Network for Human Communication Comprehension, AAAI 2018]

37
Fusion 2: Memory-Based Fusion

[Zadeh et al., Memory Fusion Network for Multi-view Sequential Learning, AAAI 2018]

38
Fusion 3: Relational Questions

▪ Aims to improve relational reasoning for Visual Question Answering
▪ Current deep learning architectures are unable to capture such reasoning capabilities on their own
▪ Proposes a Relation Network (RN) that augments CNNs for better reasoning

Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems (pp. 4967-4976).
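A Relation Network composes a prediction as RN(O) = f_φ( Σ_{i,j} g_θ(o_i, o_j, q) ): a shared MLP g_θ scores every pair of objects conditioned on the question embedding q, and a second MLP f_φ maps the summed pair representations to answer logits. The sketch below uses random, untrained weights purely to show the wiring; all sizes are arbitrary:

import numpy as np

rng = np.random.RandomState(0)
relu = lambda x: np.maximum(0.0, x)

def make_mlp(sizes):
    # A tiny random MLP (weights only), enough to illustrate the computation.
    Ws = [rng.randn(a, b) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for W in Ws[:-1]:
            x = relu(x @ W)
        return x @ Ws[-1]
    return forward

n_objects, d_obj, d_q = 6, 8, 16
objects = rng.randn(n_objects, d_obj)   # e.g. CNN feature-map cells
q = rng.randn(d_q)                      # e.g. an LSTM question embedding

g_theta = make_mlp([2 * d_obj + d_q, 32, 32])   # scores one (o_i, o_j, q) triple
f_phi   = make_mlp([32, 32, 10])                # aggregated relations -> answer logits

pair_sum = np.zeros(32)
for i in range(n_objects):
    for j in range(n_objects):
        pair_sum += g_theta(np.concatenate([objects[i], objects[j], q]))

print(f_phi(pair_sum).shape)   # (10,) hypothetical answer vocabulary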
Fusion 3: Relational Questions
Fusion 4: Structured Prediction

▪ Scene-graph prediction: the output structure is invariant to specific permutations
▪ The paper describes a model that satisfies the permutation-invariance property and achieves state-of-the-art results on the competitive Visual Genome benchmark

[Herzig et al. Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction, NIPS 2018]
Fusion 5: Recurrent Multimodal Interaction

▪ The same LSTM module strides across the entire image / feature map of the image, acting as a convolutional kernel

[Figure: an LSTM encodes the expression "giraffe on right" while mLSTM modules stride over the input image to produce a segmentation mask]

[Liu et al. Recurrent Multimodal Interaction for Referring Image Segmentation, 2017]
New Directions:
Translation
43
Translation 1: Visually indicated sounds

▪ Sound generation!

[Owens et al. Visually indicated sounds, CVPR, 2016]


Translation 2: The Sound of Pixels

▪ Proposes a system that learns to localize the sound sources in a video and separate the input audio into a set of components coming from each object, by leveraging unlabeled videos

[Zhao, Hang, et al. “The sound of pixels.”, ECCV 2018] https://youtu.be/2eVDLEQlKD0

45
Translation 2: The Sound of Pixels

▪ Trained in a self-supervised manner by learning to separate the sound source of a video from the audio mixture of multiple videos, conditioned on the visual input associated with it

[Zhao, Hang, et al. “The sound of pixels.”, ECCV 2018]

46
Translation 3: Learning-by-asking (LBA)

• An agent interactively learns by asking questions to an oracle
• Standard VQA training has a fixed dataset of questions
• In LBA the agent has the potential to learn more quickly by asking “good” questions (like a bright student in a class)
[Misra et al. "Learning by Asking Questions", CVPR 2018]
Translation 3: Learning-by-asking (LBA)

Training:
• Given the input image, the model decides what questions to ask
• Answers are obtained from a human-supervised oracle

Testing:
• LBA is evaluated exactly like VQA
[Misra et al. "Learning by Asking Questions", CVPR 2018]
Translation 4: Navigation

▪ Goal prediction
  ▪ Highlight the goal location by generating a probability distribution over the environment's panoramic image
▪ Interpretability
  ▪ Explicit goal-prediction modeling makes the approach more interpretable

[Misra et al. Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction, EMNLP 2018]
Translation 4: Navigation

▪ The paper proposes to decompose instruction execution into: 1) goal prediction and 2) action generation

[Misra et al. Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction, EMNLP 2018]
Translation 5: Explanations for VQA and ACT

Pointing and Justification Architecture

▪ Answering model: predicts an answer given the image and the question
▪ Multimodal explanation model: generates visual and textual explanations given the answer, question, and image

[Park et al. "Multimodal Explanations: Justifying Decisions and Pointing to the Evidence", CVPR 2018]
Translation 5: Explanations for VQA and ACT
[Figure: VQA-X and ACT-X examples]

[Park et al. "Multimodal Explanations: Justifying Decisions and Pointing to the Evidence", CVPR 2018]
New Directions:
Co-Learning
53
Co-learning 1: Regularizing with Skeleton Seqs

▪ Better unimodal representation by regularizing using a different modality

Non-parallel data!

[B. Mahasseni and S. Todorovic, “Regularizing Long Short Term Memory with 3D Human-Skeleton Sequences for Action Recognition,” in CVPR, 2016]
Co-Learning 2: Multimodal Cyclic Translation

[Figure: an encoder-decoder translates the verbal modality (spoken language, e.g. "Today was a great day!") into the visual modality; a cyclic co-learning loss shapes the joint representation, which is used to predict sentiment]

[Paul Pu Liang*, Hai Pham*, et al., “Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities”, AAAI 2019]
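A heavily simplified, linear sketch of a cyclic-translation objective in the spirit of this paper: translate the verbal modality into the visual one, translate back, and predict sentiment from the intermediate representation. The real model uses sequence-to-sequence networks; every weight and feature below is a random placeholder:

import numpy as np

rng = np.random.RandomState(0)
d_verbal, d_visual, d_rep = 300, 35, 64

# Toy linear stand-ins for the translation networks (random, untrained weights).
W_enc  = rng.randn(d_verbal, d_rep) * 0.05     # verbal -> joint representation
W_dec  = rng.randn(d_rep, d_visual) * 0.05     # representation -> visual (forward translation)
W_back = rng.randn(d_visual, d_verbal) * 0.05  # visual -> verbal (cyclic reconstruction)
W_pred = rng.randn(d_rep, 1) * 0.05            # representation -> sentiment score

x_verbal = rng.randn(d_verbal)   # e.g. features of "Today was a great day!"
x_visual = rng.randn(d_visual)   # paired visual features (needed at training time only)
y_sent   = np.array([0.7])       # toy sentiment label

z          = x_verbal @ W_enc    # intermediate joint representation
visual_hat = z @ W_dec           # verbal -> visual translation
verbal_hat = visual_hat @ W_back # visual -> verbal, closing the cycle

loss = (np.mean((visual_hat - x_visual) ** 2)     # translation loss
        + np.mean((verbal_hat - x_verbal) ** 2)   # cyclic consistency loss
        + np.mean((z @ W_pred - y_sent) ** 2))    # sentiment prediction loss
print(loss)   # at test time only the verbal input is needed: (x_verbal @ W_enc) @ W_pred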
Co-learning 3: Taskonomy

Zamir, Amir R., et al. "Taskonomy: Disentangling Task Transfer Learning." Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
Co-learning 4: Associative Multichannel Autoencoder

▪ Learning representations through fusion and translation
▪ Uses associated word prediction to address data sparsity

[Wang et al. Associative Multichannel Autoencoder for Multimodal Word Representation, 2018]
Co-learning 5: Grounding Semantics in Olfactory Perception

▪ Grounding language in vision, sound, and smell

[Kiela et al., Grounding Semantics in Olfactory Perception, ACL-IJCNLP 2015]
Multimodal machine
learning recap
59
[Figure: overview of multimodal machine learning – language, visual, and acoustic input modalities, each a sequence t_1 … t_n, feeding a prediction such as the caption "Big dog on the beach"]

60
Taxonomy of Multimodal Research [ https://arxiv.org/abs/1705.09406 ]

Representation
  ▪ Joint
    o Neural networks
    o Graphical models
    o Sequential
  ▪ Coordinated
    o Similarity
    o Structured

Translation
  ▪ Example-based
    o Retrieval
    o Combination
  ▪ Model-based
    o Grammar-based
    o Encoder-decoder
    o Online prediction

Alignment
  ▪ Explicit
    o Unsupervised
    o Supervised
  ▪ Implicit
    o Graphical models
    o Neural networks

Fusion
  ▪ Model agnostic
    o Early fusion
    o Late fusion
    o Hybrid fusion
  ▪ Model-based
    o Kernel-based
    o Graphical models
    o Neural networks

Co-learning
  ▪ Parallel data
    o Co-training
    o Transfer learning
  ▪ Non-parallel data
    o Zero-shot learning
    o Concept grounding
    o Transfer learning
  ▪ Hybrid data
    o Bridging
Tadas Baltrusaitis, Chaitanya Ahuja, and Louis-Philippe Morency, Multimodal Machine Learning: A Survey and Taxonomy
Core Challenge 1: Representation

Definition: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of multiple modalities.

A Joint representations: [Figure: a single representation built on top of Modality 1 and Modality 2]
B Coordinated representations: [Figure: Repres. 1 over Modality 1 and Repres. 2 over Modality 2, coordinated with each other]

62
Core Challenge 2: Alignment

Definition: Identify the direct relations between (sub)elements from two or more different modalities.

A Explicit alignment
  The goal is to directly find correspondences between elements of different modalities

B Implicit alignment
  Uses internally latent alignment of modalities in order to better solve a different problem

[Figure: elements t_1 … t_n of Modality 1 matched by a "fancy algorithm" to elements t_1 … t_n of Modality 2]

63
Core Challenge 3: Fusion

Definition: To join information from two or more modalities to perform a prediction task.

[Figure: a prediction produced by a "fancy algorithm" fusing Modality 1, Modality 2, and Modality 3]

64
Core Challenge 4: Translation

Definition: Process of changing data from one modality to another, where the
translation relationship can often be open-ended or subjective.

A Example-based B Model-driven

65
Challenge 5 – Co-learning

▪ How can one modality help learning in another modality?
  ▪ One modality may have more resources
  ▪ Bootstrapping or domain adaptation
  ▪ Zero-shot learning
▪ How to alternate between modalities during learning?
  ▪ Co-training (term introduced by Avrim Blum and Tom Mitchell from CMU)
  ▪ Transfer learning

[Figure: in each setting, Modality 2 helps Modality 1 during training before a prediction is made]
