Lecture1.1-Introduction
Lecture Objectives
▪ What is Multimodal?
▪ Research-oriented definition
▪ Dimensions of modality heterogeneity
▪ Modality connections and interactions
▪ Core technical and conceptual challenges
▪ Representation, alignment, reasoning, generation, transference and quantification
▪ Course syllabus
What is Multimodal?
Sensory Modalities
Multimodal Behaviors and Signals
What is a Modality?
Modalities range from raw modalities (closest to the sensor) to abstract modalities (farthest from the sensor).
What is Multimodal?
A dictionary definition…
A research-oriented definition…
Heterogeneous Modalities
Dimensions of Heterogeneity
Dimensions along which Modality A and Modality B can differ:
1. Element representations: discrete, continuous, granularity
2. Element distributions: density, frequency
3. Structure: temporal, spatial, latent, explicit
4. Information: abstraction, entropy 𝐻(·)
5. Noise: uncertainty, noise, missing data
6. Relevance: task, context dependence (tasks 𝑦1, 𝑦2)
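The "Information" dimension names entropy as one way modalities differ. As a minimal sketch, assuming the elements of each modality have already been discretized into tokens (the slides do not say how entropy is estimated), Shannon entropy can be computed from empirical frequencies. The function name `empirical_entropy` and the toy sequences are hypothetical and only for illustration.

```python
import math
from collections import Counter

def empirical_entropy(elements):
    """Shannon entropy, in bits, of the empirical distribution of discrete elements."""
    counts = Counter(elements)
    total = sum(counts.values())
    # p * log2(1/p), summed over the distinct elements that were observed
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Made-up element sequences for two hypothetical modalities.
modality_a = ["the", "cat", "sat", "the"]  # text tokens with some variety
modality_b = [0, 0, 0, 0]                  # quantized sensor codes, all identical

h_a = empirical_entropy(modality_a)  # varied elements give higher entropy
h_b = empirical_entropy(modality_b)  # a constant sequence carries no information
```

Comparing `h_a` and `h_b` gives one rough, data-dependent view of how much information each modality's elements carry; it says nothing about the other five dimensions.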
Connected Modalities
[Figure: Modality A and Modality B each carry unique information; their connection ranges from unconnected through weaker to stronger.]
Types of connections: statistical, semantic
Interacting Modalities
[Figure: Modality A and Modality B are combined during inference to produce a response 𝑧.]
Interactions happen during inference!
Taxonomy of Interaction Responses – A Behavioral Science View
[Figure: in multimodal communication, the separate signals a and b and the combined signal a+b each elicit a response through inference; the categories below classify the response to a+b.]
Redundancy
▪ Equivalence
▪ Enhancement
Nonredundancy
▪ Independence
▪ Dominance
▪ Modulation
▪ Emergence
Partan and Marler (2005). Issues in the classification of multimodal communication signals. American Naturalist, 166(2)
What is Multimodal?
Multimodal Machine Learning
What is Multimodal Machine Learning?
Multimodal Machine Learning
Vision
Acoustic
Multimodal Machine Learning
[Figure: Modality A, Modality B and Modality C are fed to a multimodal ML model, which produces a prediction 𝑦̂ or another output.]
❑ Unsupervised,
❑ Self-supervised,
❑ Supervised,
❑ Reinforcement,
…
Multimodal Technical Challenges – Surveys, Tutorials and Courses
2016
Multimodal Machine Learning: A Survey and Taxonomy
Tadas Baltrusaitis, Chaitanya Ahuja and Louis-Philippe Morency
(Arxiv 2017, IEEE TPAMI journal, February 2019)
https://arxiv.org/abs/1705.09406
Tutorials: CVPR 2016, ACL 2016, ICMI 2016, …
Graduate-level courses:
Multimodal Machine Learning (11th edition)
https://cmu-multicomp-lab.github.io/mmml-course/fall2020/
Advanced Topics in Multimodal Machine Learning
https://cmu-multicomp-lab.github.io/adv-mmml-course/spring2022/

2022
Foundations and Recent Trends in Multimodal Machine Learning
Paul Liang, Amir Zadeh and Louis-Philippe Morency
☑ 6 core challenges
☑ 50+ taxonomic classes
☑ 700+ referenced papers
https://arxiv.org/abs/2209.03430
Tutorials: ICML 2023, CVPR 2022, NAACL 2022
Updated graduate-level course:
Multimodal Machine Learning (12th edition)
https://cmu-multicomp-lab.github.io/mmml-course/fall2022/
Challenge 1: Representation
Individual elements:
Challenge 1: Representation
Sub-challenges:
▪ Fusion
▪ Coordination
▪ Fission
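Two of these sub-challenges can be illustrated with toy vectors. This is a minimal sketch, assuming each modality has already been encoded as a fixed-length feature vector; the vectors and function names are made up for illustration and are not from the lecture. Fusion is shown as plain concatenation into one joint representation, and coordination as keeping the representations separate while measuring their cosine similarity. Real systems learn these mappings, and fission is not shown.

```python
import math

def fuse_by_concatenation(x_a, x_b):
    """Fusion (simplest form): one joint vector holding both modalities."""
    return list(x_a) + list(x_b)

def cosine_similarity(x_a, x_b):
    """Coordination (simplest form): vectors stay separate and are compared."""
    dot = sum(a * b for a, b in zip(x_a, x_b))
    norm_a = math.sqrt(sum(a * a for a in x_a))
    norm_b = math.sqrt(sum(b * b for b in x_b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors for two modalities, same dimensionality.
image_vec = [1.0, 0.0, 2.0]
text_vec = [2.0, 0.0, 4.0]

joint = fuse_by_concatenation(image_vec, text_vec)   # length 6 joint vector
similarity = cosine_similarity(image_vec, text_vec)  # close to 1: same direction
```

The design difference is where the modalities meet: fusion reduces the number of representations, while coordination keeps one per modality and constrains how they relate.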
Challenge 2: Alignment
Modality A
Modality B
Spatial Hierarchical
Challenge 2: Alignment
Sub-challenges:
▪ Discrete Alignment
▪ Continuous Alignment
▪ Contextualized Representation
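Discrete alignment can be pictured as matching each element of one modality to an element of another. The sketch below assumes the elements are already embedded in a shared vector space and uses a greedy nearest-neighbour match by dot product; that matching rule, the function name `align_discrete` and the toy embeddings are illustrative assumptions, not a method given in the lecture.

```python
def align_discrete(elements_a, elements_b):
    """For each element of A, return the index of its most similar element in B."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return [
        max(range(len(elements_b)), key=lambda j: dot(a, elements_b[j]))
        for a in elements_a
    ]

# Made-up 2-D embeddings: two words and three image regions.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]

alignment = align_discrete(words, regions)  # one region index per word
```

Each word is matched independently, so two words may pick the same region; a one-to-one matching would need an additional constraint.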
Challenge 3: Reasoning
[Figure: Modality A and Modality B are combined to produce an output or a prediction 𝑦̂.]
Challenge 3: Reasoning
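[Figure: Modality A and Modality B are combined through intermediate steps expressed as words, drawing on external knowledge, to produce an output or a prediction 𝑦̂.]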
Challenge 3: Reasoning
Sub-challenges:
▪ Structure Modeling
▪ Intermediate Concepts
▪ Inference Paradigm
▪ External Knowledge
Challenge 4: Generation
Sub-challenges:
[Figure: transference between Modality A and Modality B yields an enriched Modality A; one modality is only available during training.]
Challenge 5: Transference
Sub-challenges:
▪ Transfer
▪ Co-learning via representation
▪ Co-learning via generation
Challenge 6: Quantification
Sub-challenges:
▪ Heterogeneity
▪ Interactions
▪ Learning
[Figure: learning curve, loss vs. epoch]
Core Multimodal Challenges
▪ Representation
▪ Alignment
▪ Reasoning
▪ Generation
▪ Transference
▪ Quantification
Lecture Schedule
Course Syllabus
Three Course Learning Paradigms
Course Recommendations and Requirements
Course Project Guidelines
Course Project Timeline
Equal Contribution by All Teammates!
Process for Selecting your Course Project
Project Preferences – Due Tuesday 9/6
Course Grades
▪ Project preferences/pre-proposal 2%
▪ First project assignment 10%
▪ Second project assignment 10%
▪ Mid-term project assignment
o Report and presentation 20%
▪ Final project assignment
o Report and presentation 30%
Lecture Highlight Form: Starting Week 2!
Lecture Highlight Form - Segments
First Reading Assignment – Week 2
Late Submissions and Wildcards
Piazza https://piazza.com/cmu/fall2023/11777/info
✓ Announcements
✓ Question/Answers
✓ Reading assignments
✓ Project resources
✓ Course syllabus