Lecture1.1-Introduction

The document outlines the course structure and objectives for a Multimodal Machine Learning class taught by Louis-Philippe Morency and Paul Liang in Fall 2023. It covers the definition of multimodal learning, its challenges, and the course syllabus, which includes topics such as representation, alignment, reasoning, and generation. The course emphasizes the integration of diverse modalities and their interactions in machine learning applications.


Multimodal Machine Learning

Lecture 1.1: Introduction


Louis-Philippe Morency

* Fall 2021, 2022 and 2023 co-lecturer: Paul Liang. Original course co-developed with Tadas Baltrusaitis. Spring 2021 and 2022 editions taught by Yonatan Bisk. Spring 2023 edition taught by Yonatan Bisk and Daniel Fried.

Your Teaching Team This Semester (11-777, Fall 2023)

Louis-Philippe Morency ([email protected]), Course instructor
Paul Liang ([email protected]), Co-lecturer

Teaching assistants:
Syeda Akter ([email protected])
Mehul Agarwal ([email protected])
Aditya Rathod ([email protected])
Soham Dinesh Tiwari ([email protected])
Haofei Yu ([email protected])

2
Lecture Objectives

▪ What is Multimodal?
▪ Research-oriented definition
▪ Dimensions of modality heterogeneity
▪ Modality connections and interactions
▪ Core technical and conceptual challenges
▪ Representation, alignment, reasoning,
generation, transference and quantification
▪ Course syllabus

3
What is Multimodal?

Sensory Modalities

5
Multimodal Behaviors and Signals

Language
▪ Lexicon: words
▪ Syntax: part-of-speech, dependencies
▪ Pragmatics: discourse acts

Acoustic
▪ Prosody: intonation, voice quality
▪ Vocal expressions: laughter, moans

Visual
▪ Gestures: head gestures, eye gestures, arm gestures
▪ Body language: body posture, proxemics
▪ Eye contact: head gaze, eye gaze
▪ Facial expressions: FACS action units, smile, frowning

Touch
▪ Haptics
▪ Motion

Physiological
▪ Skin conductance
▪ Electrocardiogram

Mobile
▪ GPS location
▪ Accelerometer
▪ Light sensors
6
What is a Modality?

Modality

Modality refers to the way in which something is expressed or perceived.

Raw modalities (closest to the sensor) ──► Abstract modalities (farthest from the sensor)

Examples:
  Speech signal → Language → Sentiment intensity
  Image → Detected objects → Object categories

7
What is Multimodal?

A dictionary definition…

Multimodal: with multiple modalities

A research-oriented definition…

Multimodal is the scientific study of heterogeneous and interconnected data (connected + interacting)

8
Heterogeneous Modalities

Information present in different modalities will often show diverse qualities, structures and representations.

Homogeneous modalities (with similar qualities) ◄──► Heterogeneous modalities (with diverse qualities)

Examples: images from 2 cameras (homogeneous); text from 2 different languages; language and vision (heterogeneous)

Abstract modalities are more likely to be homogeneous.

9
Dimensions of Heterogeneity

Information present in different modalities will often show diverse qualities, structures, and representations.

Example: an image and its caption, "A teacup on the right of a laptop in a clean room."

10
Dimensions of Heterogeneity

Information present in different modalities will often show diverse qualities, structures, and representations.

Example: an image and its caption, "A teacup on the right of a laptop in a clean room."

1 Element representations: discrete, continuous, granularity

  Image: a set of continuous visual regions
  Text: {teacup, right, laptop, clean, room} (discrete words)

11
Dimensions of Heterogeneity

1 Element representations: discrete, continuous, granularity
2 Element distributions: density, frequency
3 Structure: temporal, spatial, latent, explicit
4 Information: abstraction, entropy
5 Noise: uncertainty, noise, missing data
6 Relevance: task and context dependence

12
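Dimension 4 (information) can be made concrete: the Shannon entropy of a modality's element distribution gives a rough measure of its information content. Below is a toy, dependency-free sketch (my own illustration with invented values, not course material):

```python
import math
from collections import Counter

def entropy(elements):
    """Shannon entropy (in bits) of the empirical element distribution."""
    counts = Counter(elements)
    n = len(elements)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy comparison: a caption's word tokens vs. coarsely quantized pixels.
caption = "a teacup on the right of a laptop in a clean room".split()
pixels = [17, 17, 18, 240, 241, 17, 18, 240, 15, 16]  # hypothetical values
pixel_bins = [p // 32 for p in pixels]  # quantize intensities into 8 bins

print(entropy(caption))     # language modality
print(entropy(pixel_bins))  # visual modality
```

Here the caption's tokens spread mass over many distinct elements, so their empirical entropy comes out higher than that of the heavily repeated pixel bins; the point is only that the same measure can be applied to heterogeneous modalities.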
Connected Modalities

Connected: Shared information that relates modalities

Connections range from weaker to stronger, and each modality also carries unique, unconnected information.

Statistical connections:
▪ Association (e.g., correlation, co-occurrence)
▪ Dependency (e.g., causal, temporal)

Semantic connections:
▪ Correspondence (e.g., grounding: an image region corresponds to the word "laptop")
▪ Relationship (e.g., function: "used for")

13
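The "association" connection above can be computed directly. A hedged sketch of measuring a statistical connection between features of two modalities via Pearson correlation; the per-clip feature values are invented for illustration:

```python
def pearson_corr(xs, ys):
    """Pearson correlation: a simple statistical association measure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-clip features: smile intensity (visual), pitch (acoustic).
smile = [0.1, 0.4, 0.5, 0.9, 0.8]
pitch = [110, 135, 150, 190, 180]
print(pearson_corr(smile, pitch))  # a strong positive association
```

A high correlation only evidences a statistical connection; a semantic correspondence (grounding) would require knowing what the features refer to.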
Interacting Modalities

Interacting: process affecting each modality, creating a new response

Modality A + Modality B → inference z → response

Interactions happen during inference!

"Inference" examples:
▪ Representation fusion → representation
▪ Prediction task → prediction ŷ
▪ Modality translation → modality C

14
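As a concrete toy instance of "representation fusion" as an inference step, the sketch below concatenates two unimodal feature vectors and applies a single linear unit to produce a response. All features and weights are invented for illustration, not taken from the course:

```python
def fuse_concat(feats_a, feats_b, weights, bias):
    """Concatenate modality features, then apply one linear unit."""
    z = feats_a + feats_b  # list concatenation: the joint input
    return bias + sum(w * x for w, x in zip(weights, z))

visual = [0.2, 0.7]      # e.g., smile intensity, gaze (hypothetical)
acoustic = [0.5]         # e.g., pitch energy (hypothetical)
w = [1.0, 0.5, -0.3]     # one weight per concatenated dimension
response = fuse_concat(visual, acoustic, w, bias=0.1)
print(response)
```

The response depends jointly on both modalities: changing either input vector changes the output, which is what makes this an interaction during inference.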
Taxonomy of Interaction Responses – A Behavioral Science View
signal a + signal b → multimodal response (a+b)

Redundancy (signals carry the same information):
▪ Equivalence: the multimodal response equals each unimodal response
▪ Enhancement: the multimodal response is stronger than each unimodal response

Nonredundancy (signals carry different information):
▪ Independence: both unimodal responses occur
▪ Dominance: the multimodal response equals one modality's response
▪ Modulation: one modality modifies the response to the other
▪ Emergence: a new response, different from either unimodal response

Partan and Marler (2005). Issues in the classification of multimodal communication signals. American Naturalist, 166(2)

15
What is Multimodal?

Multimodal is the scientific study of heterogeneous and interconnected data ☺

16
Multimodal
Machine Learning
What is Multimodal Machine Learning?

Multimodal Machine Learning (ML) is the study of computer algorithms that learn and improve through the use and experience of data from multiple modalities

Multimodal Artificial Intelligence (AI) studies computer agents able to demonstrate intelligence capabilities, such as understanding, reasoning and planning, through multimodal experiences and data

Multimodal AI is a superset of Multimodal ML

18
Multimodal Machine Learning

Example: Language ("I really like this tutorial"), accompanied by vision and acoustic streams

19
Multimodal Machine Learning

Modality A + Modality B + Modality C → Multimodal ML → prediction ŷ or representation

Learning paradigms:
❑ Unsupervised
❑ Self-supervised
❑ Supervised
❑ Reinforcement

20
Multimodal Machine Learning

What are the core multimodal technical challenges, understudied in conventional machine learning?

21
Multimodal Technical Challenges – Surveys, Tutorials and Courses

2016
Multimodal Machine Learning: A Survey and Taxonomy
Tadas Baltrusaitis, Chaitanya Ahuja and Louis-Philippe Morency
(arXiv 2017; IEEE TPAMI journal, February 2019)
https://arxiv.org/abs/1705.09406
Tutorials: CVPR 2016, ACL 2016, ICMI 2016, …
Graduate-level courses:
Multimodal Machine Learning (11th edition)
https://cmu-multicomp-lab.github.io/mmml-course/fall2020/
Advanced Topics in Multimodal Machine Learning
https://cmu-multicomp-lab.github.io/adv-mmml-course/spring2022/

2022
Foundations and Recent Trends in Multimodal Machine Learning
Paul Liang, Amir Zadeh and Louis-Philippe Morency
☑ 6 core challenges  ☑ 50+ taxonomic classes  ☑ 700+ referenced papers
https://arxiv.org/abs/2209.03430
Tutorials: ICML 2023, CVPR 2022, NAACL 2022
Updated graduate-level course:
Multimodal Machine Learning (12th edition)
https://cmu-multicomp-lab.github.io/mmml-course/fall2022/

22
Challenge 1: Representation

Definition: Learning representations that reflect cross-modal interactions between individual elements, across different modalities

This is a core building block for most multimodal modeling problems!

Representations can be learned for individual elements (a "local" representation) or using holistic features over the full modality.

23
Challenge 1: Representation

Definition: Learning representations that reflect cross-modal interactions between individual elements, across different modalities

Sub-challenges:
▪ Fusion: # modalities > # representations
▪ Coordination: # modalities = # representations
▪ Fission: # modalities < # representations

24
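The three sub-challenges differ only in how many representations come out relative to the number of input modalities. A minimal sketch, using plain lists as stand-in "embeddings" (my illustration, not the course's reference code):

```python
def fusion(a, b):
    """2 modalities -> 1 joint representation (here: concatenation)."""
    return a + b

def coordination(a, b):
    """2 modalities -> 2 representations kept in a coordinated space,
    related by a similarity measure (here: cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return a, b, dot / (na * nb)

def fission(a, b):
    """2 modalities -> 3 representations: two modality-specific parts
    plus a (toy) shared part, here the element-wise average."""
    shared = [(x + y) / 2 for x, y in zip(a, b)]
    return a, b, shared

img = [0.0, 1.0]   # hypothetical image embedding
txt = [1.0, 1.0]   # hypothetical text embedding
print(fusion(img, txt))        # one 4-d joint vector
print(coordination(img, txt))  # two vectors plus their similarity
print(fission(img, txt))       # three vectors
```

Real systems replace concatenation, cosine similarity, and averaging with learned networks, but the input/output counts are what define fusion, coordination, and fission.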
Challenge 2: Alignment

Definition: Identifying and modeling cross-modal connections between all elements of multiple modalities, building from the data structure

Most modalities have internal structure with multiple elements

Elements with temporal structure (modality A, modality B); other structured examples: spatial, hierarchical

25
Challenge 2: Alignment

Definition: Identifying and modeling cross-modal connections between all elements of multiple modalities, building from the data structure

Sub-challenges:
▪ Discrete alignment: discrete elements and connections
▪ Continuous alignment: segmentation and continuous warping
▪ Contextualized representation: alignment + representation

26
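Discrete alignment can be sketched in a few lines: connect each word to its best-matching image region by a similarity score. The embeddings below are toy values chosen only to make the matching visible, not the course's method:

```python
def align(word_embs, region_embs):
    """For each word embedding, return the index of the region whose
    dot-product score with it is highest (a hard, discrete alignment)."""
    alignment = []
    for w in word_embs:
        scores = [sum(wi * ri for wi, ri in zip(w, r)) for r in region_embs]
        alignment.append(scores.index(max(scores)))
    return alignment

words = {"teacup": [1.0, 0.0], "laptop": [0.0, 1.0]}
regions = [[0.1, 0.9],   # region 0: hypothetically laptop-like
           [0.8, 0.2]]   # region 1: hypothetically teacup-like
print(align(list(words.values()), regions))  # word -> region indices
```

Replacing the argmax with a softmax over the same scores turns this hard alignment into the soft attention weights used by transformer-style models.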
Challenge 3: Reasoning

Definition: Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure

Modality A + Modality B → multiple inferential steps → ŷ

27
Challenge 3: Reasoning

Definition: Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure

Modality A and modality B are decomposed into elements (e.g., words), which are combined through multiple inferential steps into ŷ, possibly drawing on external knowledge.

28
Challenge 3: Reasoning

Definition: Combining knowledge, usually through multiple inferential steps, exploiting multimodal alignment and problem structure

Sub-challenges:
▪ Structure modeling
▪ Intermediate concepts (e.g., a latent z or discrete words)
▪ Inference paradigm (e.g., logical: ∧ → true)
▪ External knowledge

29
Challenge 4: Generation

Definition: Learning a generative process to produce raw modalities that reflects cross-modal interactions, structure and coherence

Sub-challenges:
▪ Summarization: information reduction (content >)
▪ Translation: information maintenance (content =)
▪ Creation: information expansion (content <)

30
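The three sub-challenges differ in how output information compares to input information. A deliberately tiny sketch, using a list of detected objects as the source modality and made-up outputs (my illustration only; real generation uses learned decoders):

```python
objects = ["teacup", "laptop", "room"]  # hypothetical detected objects

def summarize(objs):
    """Reduction: keep only the most salient element (content >)."""
    return objs[:1]

def translate(objs):
    """Maintenance: same content, expressed in another modality, here
    text via a fixed caption template (content =)."""
    return "a photo of " + ", ".join(objs)

def create(objs):
    """Expansion: invent plausible new details not in the input
    (content <); the added items here are arbitrary placeholders."""
    return objs + ["window", "chair"]

print(summarize(objects))
print(translate(objects))
print(create(objects))
```

The comparison of input vs. output content is the organizing axis; coherence and cross-modal interactions are what make the learned versions of these maps hard.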
Challenge 5: Transference

Definition: Transfer knowledge between modalities, usually to help the target modality, which may be noisy or with limited resources

Modality A + Modality B → Transference (A ← B) → Enriched modality A
(Modality B is only available during training)

31
Challenge 5: Transference

Definition: Transfer knowledge between modalities, usually to help the target modality, which may be noisy or with limited resources

Sub-challenges:
▪ Transfer
▪ Co-learning via representation
▪ Co-learning via generation

32
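Co-learning via representation can be sketched with a toy rule: modality B is seen only during training, where it shapes the encoder that modality A then uses alone at test time. The per-dimension scaling "training" below is an invented stand-in for a learned network, not the course's method:

```python
def train_projection(pairs):
    """Fit a per-dimension scale for modality A so that, on average,
    scaled A features match the co-occurring B features."""
    dims = len(pairs[0][0])
    scale = []
    for d in range(dims):
        num = sum(b[d] for _, b in pairs)
        den = sum(a[d] for a, _ in pairs)
        scale.append(num / den)
    return scale

def encode_a(a, scale):
    """Test-time encoder: only modality A is needed."""
    return [x * s for x, s in zip(a, scale)]

# Paired (A, B) features available only during training (toy values).
train_pairs = [([1.0, 2.0], [2.0, 2.0]), ([3.0, 2.0], [6.0, 2.0])]
scale = train_projection(train_pairs)
print(encode_a([2.0, 4.0], scale))  # a B-informed representation of A alone
```

The key property is in the signatures: `train_projection` consumes pairs, but `encode_a` consumes modality A only, which is what lets the richer modality help without being present at deployment.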
Challenge 6: Quantification

Definition: Empirical and theoretical study to better understand heterogeneity, cross-modal interactions and the multimodal learning process

Sub-challenges:
▪ Heterogeneity
▪ Interactions
▪ Learning (e.g., training dynamics: loss vs. epoch)

33
Core Multimodal Challenges
Representation ▪ Alignment ▪ Reasoning ▪ Generation ▪ Transference ▪ Quantification

34
Lecture Schedule

Week 1 (8/29 & 8/31)
  Tue: Course introduction ● Multimodal core challenges ● Course syllabus
  Thu: Multimodal applications and datasets ● Research tasks and datasets ● Team projects
Week 2 (9/5 & 9/7; reading due 9/9)
  Tue: Unimodal representations ● Dimensions of heterogeneity ● Visual representations
  Thu: Unimodal representations ● Language representations ● Signals, graphs and other modalities
Week 3 (9/12 & 9/14; project due 9/13; reading due 9/16)
  Tue: Multimodal representations ● Cross-modal interactions ● Multimodal fusion
  Thu: Multimodal representations ● Coordinated representations ● Multimodal fission
Week 4 (9/19 & 9/21; project due 9/24)
  Tue: Multimodal alignment and grounding ● Explicit alignment ● Multimodal grounding
  Thu: Alignment and representations ● Self-attention transformer models ● Masking and self-supervised learning
Week 5 (9/26 & 9/28; reading due 9/30)
  Tue: Multimodal transformers ● Multimodal transformers ● Video and graph representations
  Thu: Multimodal reasoning ● Structured and hierarchical models ● Memory models
Week 6 (10/3 & 10/5; project due 10/8)
  Tue: Project hours
  Thu: Multimodal language grounding ● Grounded semantics and pragmatics

35
Lecture Schedule

Week 7 (10/10 & 10/12; reading due 10/14)
  Tue: Multimodal interaction ● Reinforcement learning ● Discrete structure learning
  Thu: Multimodal inference ● Logical and causal inference ● External knowledge
Week 8 (10/17 & 10/19)
  Fall Break – No lectures
Week 9 (10/24 & 10/26; project due 10/29)
  Tue: Multimodal generation ● Translation, summarization, creation ● Generative models: VAEs
  Thu: New generative models ● GANs and diffusion models ● Model evaluation and ethics
Week 10 (10/31 & 11/2)
  Tue & Thu: Project presentations (midterm)
Week 11 (11/7 & 11/9; reading due 11/12)
  Tue: Democracy Day – No class
  Thu: Transference ● Modality transfer and co-learning ● Self-training and multitask learning
Week 12 (11/14 & 11/16; reading due 11/21)
  Tue: Quantification ● Heterogeneity and interactions ● Biases and fairness
  Thu: New research directions ● Recent research in multimodal ML

36
Lecture Schedule

Week 13 (11/21 & 11/23)
  Thanksgiving Week – No class
Week 14 (11/28 & 11/30)
  Tue & Thu: Guest lectures
Week 15 (12/5 & 12/7; project due 12/10)
  Tue & Thu: Project presentations (final)

37
Course Syllabus
Three Course Learning Paradigms

▪ Course lecture participation (16% of your grade)
▪ Reading assignments (12% of your grade)
▪ Course project assignments (72% of your grade)

39
Course Recommendations and Requirements

1 Be ready to read about 6 papers this semester!
  • Curated list of research papers for the 6 reading assignments
  • Summarize one paper and contrast it with other papers

2 Have already taken a machine learning course
  • Strongly recommended for students to have taken an introductory machine learning course
  • 10-401, 10-601, 10-701, 11-663, 11-441, 11-641 or 11-741

3 Be motivated to produce a high-quality course project
  • Projects are designed to enhance state-of-the-art algorithms
  • Four project assignments, to help scaffold the project tasks

40
Course Project Guidelines

▪ Dataset should have at least two modalities:


▪ Natural language and visual/images
▪ Teams of 3, 4 or 5 students
▪ The project should explore algorithmic novelty
▪ Possible venues for your final report:
▪ NAACL 2024, ACL 2024, IJCAI 2024, ICML 2024, ICMI 2024
▪ We will discuss project ideas on Thursday
▪ GPU resources available:
▪ Amazon AWS and Google Cloud Platform

41
Course Project Timeline

Pre-proposal (due Wednesday Sept. 13)


▪ Define your dataset, research task and teammates
First project assignment (due Sunday Sept. 24)
▪ Study related work to your selected research topic
Second project assignment (due Sunday Oct. 8)
▪ Experiment with unimodal representations
Midterm project assignment (due Sunday Oct 29)
▪ Implement and evaluate state-of-the-art model(s)
Final project assignment (due Sunday Dec. 10)
▪ Implement and evaluate new research ideas

42
Equal Contribution by All Teammates!

▪ Each team will be required to create a GitHub repository which will be accessible by the TAs
▪ Each report should include a description of the tasks done by each teammate
▪ Please let us know soon if you have concerns about the participation levels of your teammates

43
Process for Selecting your Course Project

▪ Thursday 8/31: Lecture describing available multimodal datasets and research topics
▪ Tuesday 9/5: Let us know your dataset preferences for the course project
▪ Thursday 9/7: During the later part of the lecture, we will have an interactive period to help with team formation. More details to come
▪ Wednesday 9/13: Pre-proposals are due. You should have selected your teammates, dataset and task

44
Project Preferences – Due Tuesday 9/6

▪ Post your project preferences:


▪ List of your ranked preferred projects
▪ Use alphanumeric code of each dataset
▪ Detailed dataset list in the "Lecture1.2-datasets" slides
▪ Previous unimodal/multimodal experience
▪ Available CPU / GPU resources
▪ For topics or datasets not in the list:
▪ Include a description with links (for other students)

45
Course Grades

▪ Lecture highlights: 16%
▪ Reading assignments: 12%
▪ Project preferences/pre-proposal: 2%
▪ First project assignment: 10%
▪ Second project assignment: 10%
▪ Midterm project assignment (report and presentation): 20%
▪ Final project assignment (report and presentation): 30%

46
Lecture Highlight Form: Starting Week 2!

▪ Similar to note-taking during lectures
▪ For each course segment (30 mins): 2 sentences describing the main points
▪ Helps you summarize the lecture
  ▪ What is the main take-away message from the lecture?
  ▪ Short paragraph (15-40 words)
▪ Ask questions about the lecture
  ▪ Will be answered either online or at the next lecture
▪ Submitted the same day as the lecture (before 8pm)
▪ Students are encouraged to attend lectures in person

47
Lecture Highlight Form - Segments

Segment 1: 9:30am to 10:00am
Segment 2: 10:00am to 10:30am
Segment 3: 10:30am to 10:50am
(9:30am is the scheduled beginning of the lecture; 10:50am the scheduled end)

▪ Segment 1 starts at 9:30am, even if the lecture starts slightly later.
▪ Segment 3 ends whenever the lecture ends.
▪ Slides presented around the segment borders (+/- 5 min of 10:00am and 10:30am) can be included in either neighboring segment.

48
First Reading Assignment – Week 2

▪ Study groups: 9-10 students per group (randomly, in Piazza)


▪ 4 paper options are available
▪ Each student should pick one paper option!
▪ Google Sheets were created to help balance the papers between group members
▪ Then you will create a short summary to help others [1 point]

▪ Discussions with your study group


▪ Read others' summaries. Ask questions!
▪ Write follow-up posts comparing the papers and suggesting ideas [1 point]
▪ At least one follow-up post for every paper you did not read

49
First Reading Assignment – Week 2

Four main steps for the reading assignments


1. Monday 8pm: Official start of the assignment
2. Wednesday 8pm: Select your paper
3. Friday 8pm: Post your summary
4. Monday 8pm: Post your follow-up posts

Detailed instructions posted on Piazza


https://piazza.com/cmu/fall2023/11777/resources

50
Late Submissions and Wildcards

▪ Each student has 6 late submission wildcards


▪ For lecture highlight forms or reading assignments
▪ Each project team has 2 late submission wildcards
▪ For any of the project assignments
▪ Total number of wildcards: 8 (6 individual and 2 team-level)
▪ Each wildcard gives 24-hour extension
▪ No partial credits for the wildcards
▪ Automatically calculated (no need to contact us a priori)

See details about late submission policy in syllabus


https://piazza.com/cmu/fall2023/11777/resources

51
Piazza https://piazza.com/cmu/fall2023/11777/info

✓ Announcements
✓ Question/Answers
✓ Reading assignments
✓ Project resources
✓ Course syllabus
52