
From Machine Learning to Autonomous Intelligence
Lecture 1
Yann LeCun
NYU - Courant Institute & Center for Data Science
Meta - Fundamental AI Research
http://yann.lecun.com
Summer School on Statistical Physics & Machine Learning
Les Houches, 2022-07-[20-22]

Plan

Applications of AI / ML / DL today
Largely rely on supervised Deep Learning. A few on Deep RL.
Increasingly rely on Self-Supervised pre-training.
Current ML/DL sucks compared to humans and animals
Humans and animals learn models of the world
Self-Supervised Learning
Main problem: representing uncertainty, learning abstractions.
Energy-Based Models
Sample-contrastive learning methods
Non-contrastive learning methods

Main Messages
Deep SSL is the enabling element for the next AI revolution
I’ll try to convince you to:
Give up on supervised and reinforcement learning
well, not completely, but as much as possible.
Give up on probabilistic modeling
use the energy-based framework instead
Give up on generative models
use joint-embedding architectures instead
Use hierarchical latent-variable energy-based models
to enable machines to reason and plan.
See position paper: “A Path Towards Autonomous Machine Intelligence”
https://openreview.net/forum?id=BZ5a1r-kVsf
AI can do pretty amazing things today

Deep Learning: Protecting Lives and the Environment

Transportation
Driving assistance / autonomous driving
On-line Safety / Security
Filtering harmful/hateful content
Filtering dangerous misinformation
Environmental monitoring
Medicine
Medical imaging
Diagnostic aid
Patient care
Drug discovery

Deep Learning Connects People to Knowledge & to Each Other

Meta (FB, Instagram), Google, YouTube, Amazon are built around Deep Learning
Take Deep Learning out of them, and they crumble.
DL helps us deal with the information deluge
Search, retrieval, ranking, question-answering
Requires machines to understand content
Translation / transcription / accessibility
language ↔ language; text ↔ speech; image → text
People speak thousands of different languages
3 billion people can’t use technology today.
800 million are illiterate, 300 million are visually impaired

Deep Learning for On-Line Content Moderation

Filtering out objectionable content
What constitutes acceptable or objectionable content?
Meta doesn’t see itself as having the legitimacy to decide
But in the absence of regulations, it has to do it.
Types of objectionable content on Facebook
(with % taken down preemptively & prevalence, Q1 2022)
Hate Speech (95.6%, 0.02%), up from 30-40% in 2018
Violence incitement (98.1%, 0.03%), Violence (99.5%, 0.04%),
Bullying/Harassment (67%, 0.09%), Child endangerment (96.4%),
Suicide/Self-Injury (98.8%), Nudity (96.7%, 0.04%),
Taken down (Q1’22): Terrorism (16M), Fake accounts (1.5B), Spam (1.8B)
https://transparency.fb.com/data/community-standards-enforcement

Image understanding

FastMRI: 4x speed up for MRI acquisition (NYU Radiology + FAIR)

MRI images subsampled in k-space by 4x and 8x
U-Net architecture
4-fold acceleration
[Zbontar et al., arXiv:1811.08839]
[Figure: k-space masks]

FastMRI (NYU Radiology + FAIR): 4x speed up for MRI acquisition

Radiologists could not tell the difference between clinical standard and 4x accelerated/restored images
They often preferred the accelerated/restored images
[Recht et al., American Journal of Roentgenology 2020]
Similar systems are now integrated in new MRI machines.

Why produce an image at all? [S. Chopra’s group, NYU]

Why not go directly from raw data to diagnosis / screening?
Humans need 2D image slices displayed on a monitor
DL systems can accept grossly undersampled (10-20x) or low-field raw data representing the entire volume.
They can be trained to directly produce a screening result

AI accelerates progress of biomedical sciences

Neuroscience
Neural nets as models of the brain
Models of vision, audition, & speech
understanding
Genomics
Identifying gene regulation networks
Curing genetic diseases?
Biology / biochemistry
Predicting protein structure and function
Designing proteins
Drug discovery
[DeepMind, AlphaFold]

AI accelerates the progress of physical sciences

Physics
Analyzing particle physics experiments
Accelerating complex simulations: fluids, aerodynamics, atmosphere, oceans, …
Astrophysics: enabling universe-wide simulations, classifying galaxies, discovering exoplanets, …
Chemistry
Finding new compounds
Material science
Predicting new material properties
Designing new meta-materials
[He 2019]

Open Catalyst Project: open competition

Want to solve climate change?
Discovering new materials to enable large-scale energy storage
Efficient & scalable extraction of hydrogen from water through electrolysis
Sponsored by FAIR & CMU
[Zitnick, https://arxiv.org/abs/2010.09435]

Make-A-Scene: making art with the help of AI

1. Type a text description
2. Draw a sketch

“A colorful sculpture of a cat”

Playing with Make-A-Scene

Drinking a glass of Burgundy by the sea
Painting of a physicist on a mountain path watching the sunset, in the style of Van Gogh
Current ML Sucks!

Where are my self-driving car, virtual assistant, domestic robot?

Requirements for Future ML/AI Systems

Understand the world, understand humans, have common sense

Level-5 autonomous cars
That learn to drive like humans, in about 20h of practice
Virtual assistants that can help us in our daily lives
Manage the information deluge (content filtering/selection)
Understand our intents, take care of simple things
Real-time speech understanding & translation [“Her” (2013)]
Overlay information in our AR glasses.
Domestic robots
Take care of all the chores
For this, we need machines with near-human-level AI
Machines that understand how the world works

Machine Learning sucks! (compared to humans and animals)

Supervised learning (SL) requires large numbers of labeled samples.
Reinforcement learning (RL) requires insane amounts of trials.
SL/RL-trained ML systems:
are specialized and brittle
make “stupid” mistakes
Machines don’t have common sense

Animals and humans:
Can learn new tasks very quickly.
Understand how the world works
Humans and animals have common sense

Machine Learning sucks! (plain ML/DL, at least)

Machine Learning systems (most of them, anyway)
Have a constant number of computational steps between input and output.
Do not reason.
Cannot plan.

Humans and some animals
Understand how the world works.
Can predict the consequences of their actions.
Can perform chains of reasoning with an unlimited number of steps.
Can plan complex tasks by decomposing them into sequences of subtasks

Three challenges for AI & Machine Learning


1. Learning representations and predictive models of the world
Supervised and reinforcement learning require too many samples/trials
Self-supervised learning / learning dependencies / to fill in the blanks
learning to represent the world in a non task-specific way
Learning predictive models for planning and control
2. Learning to reason, like Daniel Kahneman’s “System 2”
Beyond feed-forward, System 1 subconscious computation.
Making reasoning compatible with learning.
Reasoning and planning as energy minimization.
3. Learning to plan complex action sequences
Learning hierarchical representations of action plans
How do humans and animals learn so quickly?
Not supervised. Not reinforced.
How could machines learn like animals and humans?

How do babies learn how the world works?
How can teenagers learn to drive with about 20h of practice?
[Chart courtesy of Emmanuel Dupoux: ages (in months, 0-14) at which infants acquire basic concepts. Perception: biological motion, face tracking. Production: proto-imitation, emotional contagion, crawling, walking. Physics: gravity, inertia, conservation of momentum, stability, support. Objects: object permanence, solidity, rigidity, shape constancy, natural kind categories. Social/Communication: rational goal-directed actions, pointing, helping vs. hindering, false perceptual beliefs.]

How do Human and Animal Babies Learn?

How do they learn how the world works?
Largely by observation, with remarkably little interaction (initially).
They accumulate enormous amounts of background knowledge
About the structure of the world, like intuitive physics.
Perhaps common sense emerges from this knowledge?

[Photos courtesy of Emmanuel Dupoux]

Common sense is a collection of models of the world
[Jitendra Malik]

Architecture of Autonomous AI

Modular Architecture for Autonomous AI

Configurator: configures the other modules for the task
Perception: estimates the state of the world
World Model: predicts future world states
Cost (Intrinsic Cost + Critic): computes the “discomfort”
Actor: finds optimal action sequences
Short-Term Memory: stores state-cost episodes
[Diagram: the percept feeds Perception; the Configurator sits above the other modules; Perception, World Model, Cost, and Short-Term Memory inform the Actor, which emits the action.]
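To make the data flow concrete, here is a minimal Python skeleton of the six modules. This is my own illustrative sketch; the class and method names are hypothetical, not from LeCun’s position paper.

# Hypothetical skeleton of the modular architecture; names are illustrative.
class Perception:
    def __call__(self, percept):      # percept -> estimated world state s
        ...

class WorldModel:
    def predict(self, s, a):          # (state, action) -> predicted next state
        ...

class Cost:
    def __call__(self, s):            # state -> scalar "discomfort" (intrinsic cost + critic)
        ...

class Actor:
    def propose(self, s):             # state -> candidate action sequence
        ...

class ShortTermMemory:
    def store(self, s, c):            # remember (state, cost) episodes
        ...

class Configurator:
    def configure(self, task, modules):  # set up the other modules for the task at hand
        ...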

Mode-2 Perception-Planning-Action Cycle


Akin to Model-Predictive Control (MPC) in optimal control.
Actor proposes an action sequence
World Model imagines predicted outcomes
Actor optimizes action sequence to minimize cost
e.g. using gradient descent, dynamic programming, MC tree search…
Actor sends first action(s) to effectors

[Diagram: the world model is unrolled over the horizon: s[t+1] = Pred(s[t], a[t]); each predicted state incurs a cost C(s[t]); the Actor adjusts the action sequence a[0], …, a[T-1] to minimize the total cost.]
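A minimal PyTorch sketch of this cycle, with toy stand-ins for the learned world model Pred and the cost C (both hypothetical; in the real architecture they are trained modules):

import torch

T = 10                                      # planning horizon
s0 = torch.tensor([2.0, -1.0])              # current state estimate from perception

def Pred(s, a):                             # toy differentiable world model
    return s + a

def C(s):                                   # toy cost: "discomfort" grows with distance from origin
    return (s ** 2).sum()

a = torch.zeros(T, 2, requires_grad=True)   # action sequence to be optimized
opt = torch.optim.Adam([a], lr=0.1)

for _ in range(300):                        # the Actor optimizes by gradient descent
    opt.zero_grad()
    s, total = s0, torch.tensor(0.0)
    for t in range(T):                      # unroll the world model: s[t+1] = Pred(s[t], a[t])
        s = Pred(s, a[t])
        total = total + C(s)                # accumulate C(s[t]) over the horizon
    total.backward()
    opt.step()

print(a[0].detach())                        # send the first action(s) to the effectors

Dynamic programming or tree search could replace the gradient loop; the structure of the cycle stays the same.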
Training the World Model with Self-Supervised Learning

Capturing dependencies between inputs.
Representing uncertainty.

Self-Supervised Learning = Learning to Fill in the Blanks

Reconstruct the input or predict missing parts of the input.
time or space →

This is a [...] of text extracted [...] a large set of [...] articles

Self-Supervised Learning = Learning to Fill in the Blanks

Reconstruct the input or predict missing parts of the input.
time or space →

This is a piece of text extracted from a large set of news articles
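A toy PyTorch sketch of this objective (my illustration, not any specific model): corrupt some tokens, then score the reconstruction only at the corrupted positions.

import torch
import torch.nn.functional as F

vocab = {"this": 0, "is": 1, "a": 2, "piece": 3, "of": 4, "text": 5, "[MASK]": 6}
tokens = torch.tensor([0, 1, 2, 3, 4, 5])        # "this is a piece of text"

mask = torch.rand(tokens.shape) < 0.15           # blank out ~15% of positions at random
mask[3] = True                                   # ensure at least one masked position
corrupted = tokens.clone()
corrupted[mask] = vocab["[MASK]"]

# A trained model would map `corrupted` to per-position logits over the vocabulary;
# random logits stand in for it here.
logits = torch.randn(len(tokens), len(vocab), requires_grad=True)
loss = F.cross_entropy(logits[mask], tokens[mask])   # loss only on the blanks
loss.backward()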

Two Uses for Self-Supervised Learning

1. Learning hierarchical representations of the world
SSL pre-training precedes a supervised or RL phase

2. Learning predictive (forward) models of the world
Learning models for Model-Predictive Control, policy learning for control, or model-based RL.

Question: how to represent uncertainty & multi-modality in the prediction?

Learning Paradigms: information content per sample

“Pure” Reinforcement Learning (cherry)
The machine predicts a scalar reward given once in a while.
A few bits for some samples

Supervised Learning (icing)
The machine predicts a category or a few numbers for each input
Predicting human-supplied data
10 → 10,000 bits per sample

Self-Supervised Learning (cake génoise)
The machine predicts any part of its input for any observed part.
Predicts future frames in videos
Millions of bits per sample

The world is stochastic

Training a system to make a single prediction makes it predict the average of all plausible predictions
Blurry predictions!

[Diagram: a deterministic function ȳ = G(x) is trained by comparing its single prediction ȳ to the observed y with a divergence measure C(y, ȳ).]
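The averaging effect is easy to reproduce; a toy sketch: train a deterministic predictor with squared error on a target that is +1 or -1 with equal probability, and it converges to their mean, 0, a prediction that matches neither outcome.

import torch

x = torch.zeros(1000, 1)                         # one constant input...
coin = torch.rand(1000, 1) < 0.5                 # ...with two equally plausible outcomes
y = torch.where(coin, torch.tensor(1.0), torch.tensor(-1.0))

G = torch.nn.Linear(1, 1)                        # deterministic predictor
opt = torch.optim.SGD(G.parameters(), lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = ((G(x) - y) ** 2).mean()              # single-prediction squared-error training
    loss.backward()
    opt.step()

print(G(torch.zeros(1, 1)).item())               # ~0.0: the average, i.e. a "blurry" prediction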

The world is unpredictable. Output must be multimodal.

Training a system to make a single prediction makes it predict the average of all plausible predictions
Blurry predictions!

How do we represent uncertainty in the predictions?

The world is only partially predictable
How can a predictive model represent multiple predictions?
Probabilistic models are intractable in continuous domains.
Generative models must predict every detail of the world
My solution: Joint-Embedding Predictive Architecture

Energy-Based Models

Capture dependencies through an energy function.
See “A Tutorial on Energy-Based Learning” [LeCun et al. 2006]

Energy-Based Models: Implicit function

Gives low energy for compatible pairs of x and y
Gives higher energy for incompatible pairs
[Diagram: an energy function F(x,y) scores the pair; in the example, x and y are two segments of the same signal along time or space.]
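As a minimal concrete instance (my choice of energy, not from the lecture), take F(x, y) = ||y − g(x)||²: it is low exactly when y agrees with what some function g extracts from x.

import torch

g = torch.nn.Linear(2, 2)                # any function of x; untrained here

def F(x, y):
    # Implicit scoring function: low energy = compatible (x, y) pair
    return ((y - g(x)) ** 2).sum(dim=-1)

x = torch.randn(5, 2)
print(F(x, g(x)))                        # compatible pairs: energy exactly 0
print(F(x, g(x) + 1.0))                  # perturbed y: strictly higher energy

Note that this particular F has a single minimum in y for each x; the point of the EBM framework is that F may have several, as the next slides emphasize.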

Energy-Based Models

Feed-forward nets use a finite number of steps to produce a single output.
[Diagram: feed-forward architecture ȳ = G(x), trained against y with a divergence measure C(y, ȳ).]
What if…
The problem requires a complex computation to produce its output? (complex inference)
There are multiple possible outputs for a single input? (e.g. predicting future video frames)

Inference through constraint satisfaction
Finding an output that satisfies constraints: e.g. a linguistically correct translation or speech transcription.
Maximum likelihood inference in graphical models
[Diagram: the energy function F(x,y) encodes a set of constraints that y must satisfy.]

Energy-Based Models (EBM)

Energy function F(x,y), scalar-valued.
Takes low values when y is compatible with x and higher values when y is less compatible with x
Inference: find values of y that make F(x,y) small.
There may be multiple solutions

Note: the energy is used for inference, not for learning

[Example: blue dots are data points in the (x, y) plane; the energy is low near them.]
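If F is differentiable in y, inference can literally be gradient descent on y with x held fixed. A sketch with a toy energy that has two minima (y = ±x), so different initializations find different, equally valid solutions:

import torch

def F(x, y):
    return ((y ** 2 - x ** 2) ** 2).sum()    # toy energy, minimized at y = +x and y = -x

x = torch.tensor([2.0])
y = torch.tensor([0.5], requires_grad=True)  # initial guess; try -0.5 for the other mode
opt = torch.optim.SGD([y], lr=0.01)

for _ in range(200):                         # inference: minimize F over y, x fixed
    opt.zero_grad()
    F(x, y).backward()
    opt.step()

print(y.detach())                            # ~ +2.0 (a negative init would give ~ -2.0)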

Energy-Based Model: implicit function

Energy function that captures the x,y dependencies:
Low energy near the data points. Higher energy everywhere else.
If y is continuous, F should be smooth and differentiable, so we can use gradient-based inference algorithms.

[Diagram: the energy function F(x,y) viewed as a landscape over the (x, y) plane, with low-energy valleys along the data.]

Energy-Based Model: unconditional version

Conditional EBM: F(x,y)
Unconditional EBM: F(y)
measures the compatibility between the components of y
Useful if we don’t know in advance which part of y is known and which part is unknown

[Diagram: energy landscape F(y) over the (y1, y2) plane. Dark = low energy (good), bright = high energy (bad); purple = data manifold.]

Energy-Based Models vs Probabilistic Models

Probabilistic models are a special case of EBM
Energies are like un-normalized negative log probabilities

Why use EBM instead of probabilistic models?
EBMs give more flexibility in the choice of the scoring function.
More flexibility in the choice of objective function for learning

From energy to probability: Gibbs-Boltzmann distribution
P(y|x) = exp(−β F(x,y)) / ∫ exp(−β F(x,y′)) dy′
where β is a positive constant
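Numerically, this is a softmax over energies, which is why it becomes intractable for continuous, high-dimensional y: the normalizing integral must be approximated over the whole space. A one-dimensional sketch on a grid:

import torch

beta = 1.0
x = torch.tensor(1.0)

def F(x, y):
    return (y ** 2 - x ** 2) ** 2            # toy energy with minima at y = +1 and y = -1

y_grid = torch.linspace(-3.0, 3.0, 601)      # discretize y to approximate the integral
p = torch.softmax(-beta * F(x, y_grid), dim=0)   # Gibbs-Boltzmann, normalized on the grid
print(y_grid[p.argmax()].item())             # probability mass concentrates near y = ±1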

Latent-Variable EBM

Latent variable z:
Captures the information in y that is not available in x
Computed by minimization:
F(x,y) = min_z E(x,y,z)
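A sketch of that inner minimization with a toy E of my own choosing (the quadratic penalty on z is a crude stand-in for limiting its information capacity; the real methods for this come later):

import torch

def E(x, y, z):
    # Toy energy: z must account for whatever in y is not explained by x
    return ((y - x - z) ** 2).sum() + 0.1 * (z ** 2).sum()

def F(x, y, steps=100, lr=0.1):
    # F(x,y) = min_z E(x,y,z), approximated by gradient descent on z
    z = torch.zeros_like(y, requires_grad=True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        E(x, y, z).backward()
        opt.step()
    return E(x, y, z.detach()).item()

x, y = torch.tensor([0.0]), torch.tensor([1.0])
print(F(x, y))   # small residual energy: z absorbs the x -> y discrepancy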

Latent-Variable Generative EBM Architecture

Latent variables:
parameterize the set of predictions
Ideally, the latent variable represents independent explanatory factors of variation of the prediction.
The information capacity of the latent variable must be minimized.
Otherwise all the information for the prediction will go into it.

[Diagram: the observation x feeds Pred(x), producing a representation h; the decoder Dec(z,h) combines h with the latent variable z to produce the prediction ȳ, which is compared to the desired prediction y by the cost C(y, ȳ).]
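A forward-pass sketch of this architecture (the module shapes and the crude random search over z are my illustrative choices, not the lecture’s):

import torch

Pred = torch.nn.Linear(4, 8)              # encodes the observation: h = Pred(x)
Dec = torch.nn.Linear(8 + 2, 4)           # decodes representation + latent into a prediction

def C(y, y_hat):
    return ((y - y_hat) ** 2).sum(dim=-1) # divergence between prediction and target

x, y = torch.randn(1, 4), torch.randn(1, 4)
h = Pred(x)
energies = []
for _ in range(16):                       # each latent sample parameterizes one prediction
    z = torch.randn(1, 2)
    y_hat = Dec(torch.cat([z, h], dim=-1))
    energies.append(C(y, y_hat))
print(torch.stack(energies).min().item()) # crude F(x,y): best energy over the sampled z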
