Lec 01 Introduction Compressed
Lecture 1 – Introduction
1.1 Organization
1.2 Introduction
1.1 Organization
Team
Exercises
Lecture Notes
Frameworks / IDEs:
- Visual Studio Code
  https://code.visualstudio.com/
- Google Colab
  https://colab.research.google.com
Course Materials
Courses:
- Gkioulekas (CMU): Computer Vision
  http://www.cs.cmu.edu/~16385/
Prerequisites
Linear Algebra:
- Vectors: $x, y \in \mathbb{R}^n$
- Matrices: $A, B \in \mathbb{R}^{m \times n}$
- Operations: $A^\top$, $A^{-1}$, $\mathrm{Tr}(A)$, $\det(A)$, $A + B$, $AB$, $Ax$, $x^\top y$
- Norms: $\|x\|_1$, $\|x\|_2$, $\|x\|_\infty$, $\|A\|_F$
- SVD: $A = UDV^\top$
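As a quick self-check of this notation, a minimal sketch of the listed operations, norms and the SVD (NumPy is an assumption here; the course materials do not prescribe a library, and the example values are arbitrary):

```python
import numpy as np

# Vectors x, y in R^n and a square matrix A (square so that A^{-1} and det(A) exist)
x = np.array([3.0, -4.0])
y = np.array([1.0, 2.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# Operations: transpose, inverse, trace, determinant, inner product x^T y
A_T = A.T
A_inv = np.linalg.inv(A)
trace = np.trace(A)
det = np.linalg.det(A)
inner = x @ y

# Norms: ||x||_1, ||x||_2, ||x||_inf and the Frobenius norm ||A||_F
l1 = np.linalg.norm(x, 1)
l2 = np.linalg.norm(x, 2)
linf = np.linalg.norm(x, np.inf)
fro = np.linalg.norm(A, "fro")

# SVD: A = U D V^T (NumPy returns V^T directly and D as a 1-D array)
U, d, Vt = np.linalg.svd(A)
A_rec = U @ np.diag(d) @ Vt
```

Because `np.linalg.svd` returns the singular values as a vector, `np.diag` is needed to rebuild $D$ before multiplying the factors back together.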
Deep Learning:
- Machine learning basics, linear and logistic regression
- Computation graphs, backpropagation algorithm
- Activation and loss functions, initialization
- Regularization and optimization of deep neural networks
- Convolutional neural networks
- Recurrent neural networks
- Graph neural networks
- Autoencoders and generative adversarial networks
Time Management
1.2 Introduction
Artificial Intelligence
“An attempt will be made to find how to make machines use language, form
abstractions and concepts, solve kinds of problems now reserved for humans, and
improve themselves.” [John McCarthy]
- Machine Learning
- Computer Vision
- Computer Graphics
- Natural Language Processing
- Robotics & Control
- Art, Industry 4.0, Education, ...
Computer Vision
Over 50% of the processing in the human brain is dedicated to visual information.
Computer Vision vs. Computer Graphics
[Figure: Graphics maps a 3D scene (objects, material, shape/geometry, motion, semantics, 3D pose) to a 2D image, i.e., a matrix of pixel intensities; vision performs the inverse mapping from image to scene.]
Computer Vision is an ill-posed inverse problem:
- Many 3D scenes yield the same 2D image
- Additional constraints (knowledge about the world) are required
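A minimal pinhole-camera sketch makes the ambiguity concrete: every 3D point on the same viewing ray through the camera center projects to the same pixel, so depth cannot be recovered from a single image alone. The function name `project` and the focal length `f = 1` are illustrative assumptions.

```python
import numpy as np

def project(X, f=1.0):
    """Pinhole projection of 3D points X (N x 3, camera coordinates, Z > 0) to 2D."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # Perspective division: (X, Y, Z) -> f * (X / Z, Y / Z)
    return f * X[:, :2] / X[:, 2:3]

P = np.array([[0.5, -0.2, 2.0]])  # a 3D point at depth Z = 2
for s in (0.5, 1.0, 3.0):
    # Scaling the point along its viewing ray changes its depth but not its image
    print(f"depth {s * P[0, 2]:.1f} -> pixel {project(s * P)[0]}")
```

Every depth printed maps to the same pixel; a second view or prior knowledge about the world is needed to tell these scenes apart.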
Computer Vision vs. Image Processing
Deng, Dong, Socher, Li, Li and Li: ImageNet: A large-scale hierarchical image database. CVPR, 2009.
The Deep Learning Revolution
Why is Visual Perception hard?
Challenges: Images are 2D Projections of the 3D World
Adelson and Pentland: The perception of shading and reflectance. Perception as Bayesian inference, 1996.
Ames Room Illusion
Perspective Illusion
Challenges: Viewpoint Variation
Michelangelo (1475-1564)
Challenges: Deformation
Xu Beihong (1943)
Challenges: Occlusion
Challenges: Perception vs. Measurement
http://persci.mit.edu/gallery/checkershadow
Image Credits: Edward H. Adelson
Challenges: Local Ambiguities
http://www.homeworkshop.com/
Image Credits: Antonio Torralba
Challenges: Number of Object Categories
Steven Seitz (Univ. of Washington): 3D Computer Vision: Past, Present, and Future
- http://www.youtube.com/watch?v=kyIzMr917Rc
- http://www.cs.washington.edu/homes/seitz/talks/3Dhistory.pdf
Pre-History
1510: Perspectograph
1839: Daguerreotype
1802-1871: Great Trigonometrical Survey
Overview
Waves of development:
- 1960-1970: Blocks World, Edges and Model Fitting
- 1970-1981: Low-level vision: stereo, flow, shape-from-shading
- 1985-1988: Neural networks, backpropagation, self-driving
- 1990-2000: Dense stereo and multi-view stereo, MRFs
- 2000-2010: Features, descriptors, large-scale structure-from-motion
- 2010-now: Deep learning, large datasets, quick growth, commercialization
1957: Stereo
- Gilbert Hobrough demonstrated an analog implementation of stereo image correlation
- This led to the creation of the Raytheon-Wild B8 Stereomat
- Used to create elevation maps (photogrammetry, since 1840)
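Hobrough's device was analog, but the idea of stereo image correlation can be sketched digitally: for each pixel of the left image, slide a small window along the same row of the right image and keep the horizontal shift (disparity) with the highest normalized cross-correlation. This minimal sketch assumes rectified images; the helper names `ncc` and `block_matching`, the window radius and the disparity range are illustrative choices.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def block_matching(left, right, max_disp=8, r=2):
    """Per-pixel disparity of a rectified pair by window correlation along rows."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=int)
    for y in range(r, H - r):
        for x in range(r, W - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            best_score, best_d = -np.inf, 0
            # A point at column x in the left image appears at column x - d on the right
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                score = ncc(patch, cand)
                if score > best_score:
                    best_score, best_d = score, d
            disp[y, x] = best_d
    return disp

# Synthetic pair: random texture, left view shifted by a constant disparity of 3 px
rng = np.random.default_rng(0)
right = rng.random((20, 40))
left = np.roll(right, 3, axis=1)  # left[:, x] == right[:, x - 3] away from the wrap-around
disp = block_matching(left, right)
```

With a calibrated rig, depth then follows from disparity as $Z = fB/d$ (focal length $f$, baseline $B$). Pixels near the left border are unreliable here because `np.roll` wraps around.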
Summer Project
Minsky/Papert
SfS (shape-from-shading)
Intrinsic Images
- … shadows and lighting …
Photometric Stereo
- … relaxed subsequently
Essential Matrix
- Key ideas known for 100 years
Longuet-Higgins: A computer algorithm for reconstructing a scene from two projections. Nature, 1981.
A Brief History of Computer Vision
Stereo
Baker and Binford: Depth from Edge and Intensity Based Stereo. IJCAI, 1981.
Optical Flow
- Horn-Schunck algorithm
Horn and Schunck: Determining Optical Flow. Artificial Intelligence, 1981.
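The Horn-Schunck method finds a dense flow field $(u, v)$ minimizing $\sum (I_x u + I_y v + I_t)^2 + \alpha^2 (\|\nabla u\|^2 + \|\nabla v\|^2)$, which leads to the Jacobi-style update $u \leftarrow \bar{u} - I_x \frac{I_x \bar{u} + I_y \bar{v} + I_t}{\alpha^2 + I_x^2 + I_y^2}$ (analogously for $v$), where $\bar{u}, \bar{v}$ are local averages. A minimal NumPy sketch; the central-difference derivatives, the plain 4-neighbour average, the function names and the default parameters are simplifying assumptions, not the exact discretization of the 1981 paper.

```python
import numpy as np

def neighbour_mean(f):
    """Mean of the 4-neighbourhood, replicating values at the image border."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(I1, I2, alpha=0.5, n_iter=300):
    """Dense optical flow (u, v) from frame I1 to frame I2 (2-D arrays)."""
    I1 = np.asarray(I1, dtype=float)
    I2 = np.asarray(I2, dtype=float)
    # np.gradient returns the derivative along axis 0 (y) first, then axis 1 (x)
    Iy, Ix = np.gradient(0.5 * (I1 + I2))
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar, v_bar = neighbour_mean(u), neighbour_mean(v)
        # Normalized residual of the linearized brightness-constancy constraint
        r = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * r
        v = v_bar - Iy * r
    return u, v

# Smooth synthetic pattern that moves 0.5 px to the right between the frames
yy, xx = np.mgrid[0:48, 0:48]
I1 = np.sin(0.3 * xx) + np.cos(0.2 * yy)
I2 = np.sin(0.3 * (xx - 0.5)) + np.cos(0.2 * yy)
u, v = horn_schunck(I1, I2)
```

The smoothness term is what lets the method fill in flow where the image gradient is weak or only constrains one flow component (the aperture problem).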
MRFs
Geman and Geman: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. TPAMI, 1984.
Parts
Backpropagation
- Remains the main workhorse today
Rumelhart, Hinton and Williams: Learning representations by back-propagating errors. Nature, 1986.
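Backpropagation applies the chain rule backwards through a computation graph, reusing values cached during the forward pass. A minimal sketch for a hypothetical two-layer network $y = W_2 \tanh(W_1 x)$ with squared-error loss; the layer sizes and the helper names `forward_backward` and `numeric_grad` are illustrative, and the analytic gradients are checked against central finite differences.

```python
import numpy as np

def forward_backward(W1, W2, x, t):
    """Loss L = 0.5 * ||W2 tanh(W1 x) - t||^2 and its gradients w.r.t. W1 and W2."""
    # Forward pass, caching intermediate values
    a = W1 @ x
    h = np.tanh(a)
    y = W2 @ h
    loss = 0.5 * np.sum((y - t) ** 2)
    # Backward pass: propagate dL/d(.) from the output towards the input
    dy = y - t                    # dL/dy
    dW2 = np.outer(dy, h)         # dL/dW2
    dh = W2.T @ dy                # dL/dh
    da = dh * (1.0 - h ** 2)      # tanh'(a) = 1 - tanh(a)^2
    dW1 = np.outer(da, x)         # dL/dW1
    return loss, dW1, dW2

def numeric_grad(f, W, idx, eps=1e-6):
    """Central finite difference of the scalar f(W) with respect to entry W[idx]."""
    Wp, Wm = W.copy(), W.copy()
    Wp[idx] += eps
    Wm[idx] -= eps
    return (f(Wp) - f(Wm)) / (2 * eps)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x, t = rng.normal(size=3), rng.normal(size=2)
loss, dW1, dW2 = forward_backward(W1, W2, x, t)

num_dW1 = numeric_grad(lambda A: forward_backward(A, W2, x, t)[0], W1, (1, 2))
num_dW2 = numeric_grad(lambda B: forward_backward(W1, B, x, t)[0], W2, (0, 1))
```

One backward pass yields all partial derivatives at roughly the cost of a forward pass, whereas finite differences need two forward passes per parameter; this is why the check is only used for debugging.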
VaMoRs
ALVINN
Pomerleau: ALVINN: An Autonomous Land Vehicle in a Neural Network. NIPS, 1988.
1992: Structure-from-Motion
- Estimating 3D structures from 2D image sequences of static scenes
- Requires only a single camera
- Tomasi-Kanade factorization provides a closed-form (SVD-based) solution for the orthographic case
- Today: non-linear least squares
Tomasi and Kanade: Shape and motion from image streams under orthography: a factorization method. IJCV, 1992.
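Under orthography, stacking the tracked 2D points of $F$ frames and $P$ points into a $2F \times P$ measurement matrix and subtracting each row's mean gives a matrix of rank at most 3, so a truncated SVD factors it into motion ($2F \times 3$) and shape ($3 \times P$). A minimal sketch on synthetic data; the helper name `factorize` is hypothetical, and it recovers shape only up to an invertible $3 \times 3$ (affine) transform, omitting the metric upgrade step of the full method.

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a 2F x P measurement matrix (affine reconstruction).

    Returns motion M (2F x 3), shape S (3 x P) and the row-centred matrix,
    with W_centered ~= M @ S.
    """
    # Subtracting each row's mean removes the per-frame image translation
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, d, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # Keep the three largest singular values and split them evenly between M and S
    sqrt_d = np.sqrt(d[:3])
    M = U[:, :3] * sqrt_d
    S = sqrt_d[:, None] * Vt[:3]
    return M, S, W_centered

# Synthetic data: P random 3D points observed by F orthographic cameras
rng = np.random.default_rng(1)
F, P = 5, 30
X = rng.normal(size=(3, P))
rows = []
for _ in range(F):
    R = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # random orthonormal matrix
    t = rng.normal(size=(2, 1))
    rows.append(R[:2] @ X + t)                    # orthographic projection + 2D shift
W = np.vstack(rows)                               # 2F x P measurement matrix
M, S, W_centered = factorize(W)

# The true (centred) shape equals an invertible 3x3 transform A of the recovered S
X_centered = X - X.mean(axis=1, keepdims=True)
A = np.linalg.lstsq(S.T, X_centered.T, rcond=None)[0].T
```

With noisy tracks the fourth and higher singular values are no longer zero, and truncating them gives the best rank-3 approximation in the least-squares sense.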
ICP
Besl and McKay: A Method for Registration of 3-D Shapes. PAMI, 1992.
Volumetric Fusion
Curless and Levoy: A Volumetric Method for Building Complex Models from Range Images. SIGGRAPH, 1996.
MVS
Faugeras and Keriven: Complete Dense Stereovision Using Level Set Methods. ECCV, 1998.
Graph Cuts
- … scanline stereo …
Boykov, Veksler and Zabih: Markov Random Fields with Efficient Approximations. CVPR, 1998.
ConvNet
LeCun, Bottou, Bengio and Haffner: Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
Morphable Models
Blanz and Vetter: A Morphable Model for the Synthesis of 3D Faces. SIGGRAPH, 1999.
1999: SIFT
- Scale Invariant Feature Transform
- Detection and description of salient local features in an image
- Enabled many applications (e.g., image stitching, reconstruction, motion estimation)
Lowe: Object Recognition from Local Scale-Invariant Features. ICCV, 1999.
Photo Tourism
Snavely, Seitz and Szeliski: Photo tourism: exploring photo collections in 3D. SIGGRAPH, 2006.
2007: PMVS
- Patch-based Multi-View Stereo
- Robust reconstruction of various small and large objects
- Performance of 3D reconstruction techniques continues to increase
Furukawa and Ponce: Accurate, Dense, and Robust Multi-View Stereopsis. CVPR, 2007.
Rome in a Day
Agarwal, Snavely, Simon, Seitz and Szeliski: Building Rome in a day. ICCV, 2009.
2011: Kinect
- Active light 3D sensing
- ML for 3D pose estimation
- Multiple hardware generations
- Early versions failed to commercialize but were heavily used for robotics and vision research
Shotton et al.: Real-time human pose recognition in parts from single depth images. CVPR, 2011.
ImageNet/AlexNet
- … via GPU training, deep models, data
- Sparked the deep learning revolution
Krizhevsky, Sutskever and Hinton: ImageNet classification with deep convolutional neural networks. NIPS, 2012.
Datasets
Geiger, Lenz and Urtasun: Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. CVPR, 2012.
Synthetic
Dosovitskiy et al.: FlowNet: Learning Optical Flow with Convolutional Networks. ICCV, 2015.
2014: Visualization
- Goal: provide insights into what the network (a black box) has learned
- Visualized the image regions that most strongly activate various neurons at different layers of the network
- Found that higher layers capture more abstract semantic information
Zeiler and Fergus: Visualizing and Understanding Convolutional Networks. ECCV, 2014.
Adversarial Examples
Szegedy et al.: Intriguing properties of neural networks. ICLR, 2014.
GANs
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville and Bengio: Generative Adversarial Networks. NIPS, 2014.
2014: DeepFace
- Combination of model-based alignment with deep learning for face recognition
- First model to approach human-level face verification performance
Taigman, Yang, Ranzato and Wolf: DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR, 2014.
3D Scenes
Geiger, Lauer, Wojek, Stiller and Urtasun: 3D Traffic Scene Understanding From Movable Platforms. PAMI, 2014.
2014: 3D Scanning
- 3D scanning techniques allow for creating accurate replicas
- Debevec's team scanned President Obama
- Exhibited at the Smithsonian
- https://dpo.si.edu/blog/
Metallo et al.: Scanning and printing a 3D portrait of president Barack Obama. SIGGRAPH Studio, 2015.
Deep RL
Mnih et al.: Human-level control through deep reinforcement learning. Nature, 2015.
Style Transfer
Gatys, Ecker and Bethge: Image Style Transfer Using Convolutional Neural Networks. CVPR, 2016.
Semantics
Kundu, Vineet and Koltun: Feature Space Optimization for Semantic Video Segmentation. CVPR, 2016.
Mask R-CNN
He, Gkioxari, Dollár and Girshick: Mask R-CNN. ICCV, 2017.
Captioning
Karpathy and Fei-Fei: Deep Visual-Semantic Alignments for Generating Image Descriptions. PAMI, 2017.
Humans
Kanazawa, Black, Jacobs and Malik: End-to-End Recovery of Human Shape and Pose. CVPR, 2018.
3D DL
Niemeyer, Mescheder, Oechsle and Geiger: Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. CVPR, 2020.
[Figure: Google Portrait Mode, Skydio 2 drone, self-driving cars, Microsoft HoloLens, iris recognition]
Current Challenges
- Un-/Self-Supervised Learning
- Interactive learning
- Accuracy (e.g., self-driving)
- Robustness and generalization
- Inductive biases
- Understanding and mathematics
- Memory and compute
- Ethics and legal questions