Lecture 01


Computer Vision
CS-E4850, 5 study credits

Juho Kannala
Aalto University

Plan for today

• Background
  • What is computer vision?
  • Why study computer vision?
• Overview of the course
• Lecture 1: Image formation

Credits: Material for the slides is borrowed from Victor Prisacariu, Andrew Zisserman, Esa Rahtu, James Hays, Derek Hoiem, Svetlana Lazebnik, Steve Seitz, David Forsyth, and others.

Course personnel

• Lecturer:
  Juho Kannala
  [email protected]
• Main course assistant:
  Xiaotian Li
  [email protected]

A few words about me

Juho Kannala
Assistant Professor of Computer Vision
• PhD, University of Oulu, 2010
• Professor at Aalto since 2016
• Working with computer vision since 2000
• Recent projects and other info available on my homepage: https://users.aalto.fi/~kannalj1/


Motivation - what is computer vision?
Make computers understand images

• What kind of scene?
• Where are the cars?
• How far are the buildings?
• Where are the cars going?
• …

Many data modalities

• 2D or 3D still images
• Video frames
• X-ray images
• Ultrasound
• Microscopy
• …

What kind of information can be extracted?

• Semantic information
• Geometric information


What do we have here?

… seems pretty easy…


Wrong! It is a very hard big-data problem…

• Hardware perspective:
  • RGB stereo images at 30 frames per second -> a data stream of hundreds of MB/s
  • Non-trivial processing is needed for each byte
  • Massive image collections
• Mathematical perspective:
  • Information is highly implicit, or lost in the perspective projection
  • The 2D -> 3D mapping is ill-posed and ill-conditioned -> constraints are needed

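To make the "hundreds of MB/s" figure concrete, here is the back-of-the-envelope arithmetic. The resolution (1920×1080), 8-bit RGB pixels and the absence of compression are assumptions chosen for illustration; the slide does not specify them, and the function name is hypothetical.

```python
def raw_stream_bytes_per_second(width, height, channels=3, bytes_per_channel=1,
                                fps=30, num_cameras=2):
    """Uncompressed data rate of a synchronized multi-camera video stream."""
    bytes_per_frame = width * height * channels * bytes_per_channel
    return bytes_per_frame * fps * num_cameras

# Assumed setup: a stereo pair of 1080p, 8-bit RGB cameras at 30 frames per second.
rate = raw_stream_bytes_per_second(1920, 1080)
print(f"{rate / 1e6:.0f} MB/s")  # 373 MB/s
```

Even this modest setup produces over a third of a gigabyte every second, before any processing happens.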
• Artificial intelligence perspective:
  • Images have uneven information content
  • Computational visual semantics is hard (what exactly does visual content mean?)
  • If we have limited time, which visual content is important right now?

It is still a massive challenge if we want genuine autonomy.


Natural vision

• Humans see effortlessly, but… it is very hard work for our brains!
• There are billions of neurons in the human brain
• Millions of years of evolution have produced hardwired priors

So why bother?
What are the advantages?

Why does computer vision matter?

• Engineering point of view: computer vision helps to solve many practical problems, which gives it business potential
• Scientific point of view: a human-like visual system is one of the grand challenges of artificial intelligence (AI)
• AI itself is a grand challenge of computing

Why does computer vision matter?

• Safety
• Health
• Security
• Fun
• Access
• …

Computer vision is already here

• You are surrounded by devices using computer vision
• Imagine what can be done with already installed cameras!

Motivation - Success stories
Recognizing “simple” patterns
Face recognition
Object detection and recognition
Reconstruction: 3D from photo collections

The Visual Turing Test for Scene Reconstruction. Shan, Adams, Curless, Furukawa, Seitz, 3DV 2013. YouTube video.

A recent commercial 3D reconstruction system (YouTube video)

Robotics

• NASA’s Mars Rover (see “Computer Vision on Mars”)
• RoboCup (see www.robocup.org)
• STAIR at Stanford (Saxena et al. 2008)

Self-driving cars (Nvidia @ CES 2016)
Visual odometry and SLAM
Augmented Reality (AR) and Virtual Reality (VR)
Image generation

A style-based generator architecture for generative adversarial networks. Karras, Laine, Aila. CVPR 2019.

Current state of affairs

• Many of the previous examples are less than 5 years old!
• Many new applications will appear in the next 5 years
• Strong open-source culture:
  • Many recent state-of-the-art methods are freely available
  • See papers from top conferences like CVPR, ECCV, ICCV, and NeurIPS

Rapidly growing area

[Figure: attendees and submissions to the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) over the years; the chart labels the value 5160 for 2019]

Rapidly growing area

Ref. Google Scholar top publications.

Rapidly growing area - substantial commercial interest

[Figure: CVPR 2018 sponsors]


Plenty of job opportunities

• Companies are looking for computer vision and deep learning experts
• Big Internet players are investing heavily (Apple, Google, Facebook, Microsoft, Baidu, Tencent, …), as is the car industry (Tesla, BMW, …)
• There is a strong imaging ecosystem in Finland too

Specifics of this course

Course textbooks

• Szeliski: Computer Vision
  • Full copy freely available
• Hartley & Zisserman: Multiple View Geometry in Computer Vision
  • Available as an e-book via the library
• Forsyth & Ponce: Computer Vision
  • Full copy freely available

What will you learn on this course?

• Course content (numbers refer to chapters in Szeliski’s book, 1st edition):
  • Image formation and processing (2, 3)
  • Feature detection and matching (4)
  • Feature-based alignment and image stitching (6, 9)
  • Optical flow and tracking (8)
  • Basics of image classification and convolutional neural networks
  • Object recognition and detection (14)
  • Structure from motion, stereo and 3D reconstruction (7, 11, 12)

What will you NOT learn on this course?

• Software packages
  • PyTorch, TensorFlow, Keras, Caffe, etc.
  • We do have simple exercises with Python/Matlab, though
• In-depth deep learning
  • Tweaking architectures, loss functions, etc.
  • Note that there is a separate deep learning course (CS-E4890)
• All the bells and whistles of state-of-the-art systems
  • We concentrate on the basic concepts (get them right and the rest is easier for you)

Organization

• Lectures on Mondays at 8-10 (12 lectures)
• Exercises on Fridays at 12-14 (12 sessions)
  • Solutions to the weekly homework assignments should be returned before the session
  • The solutions are presented in the session
• Guidance available if needed
  • Slack and guidance sessions on Thursdays (see MyCourses)
• Attendance is not rewarded; only returned homework and the exam count

Requirements

• Get more than 0 points from at least 8 exercise rounds (i.e. solve at least 1 task from 8 different weekly rounds)
• Pass the exam

Hints

• Doing homework takes time but is often a good way to learn in depth
• Try to do more than the minimum: homework points are taken into account in grading (weighted exercise points are added to exam points)
• Note that the amount of work and bonus points varies a bit between weeks; exercises are published early so that you can do them in advance if needed

Questions at this point?

Lecture 1: Camera model

Relevant reading

• Chapters 2, 3, and 6 in [Hartley & Zisserman]
  • Comprehensive presentation of the core content
• Chapter 2 in [Szeliski]
  • Broader overview of image formation

This is (a picture of) a cat

Credits: Victor Prisacariu

The cat lives in a 3D world

The point X in world space projects to the point x in image space.

Going from X in 3D to x in 2D

If a film were simply exposed to the cat, the output would be blurry: every point on the film would receive light from many points on the cat.

Pinhole camera

All rays pass through a single point (the center of projection).

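A minimal sketch of the ideal pinhole projection, assuming the center of projection is at the origin, the camera looks along the positive Z axis, and a virtual image plane sits at distance f in front of it (a physical pinhole image is the same picture, only inverted). The function name project_pinhole is hypothetical.

```python
def project_pinhole(point, f):
    """Project a 3D point (X, Y, Z) in camera coordinates through an ideal pinhole.

    Similar triangles give x = f * X / Z and y = f * Y / Z on a virtual image
    plane at distance f in front of the center of projection.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)

# Doubling the depth halves the image offset from the center.
print(project_pinhole((1.0, 0.5, 2.0), f=1.0))  # (0.5, 0.25)
print(project_pinhole((1.0, 0.5, 4.0), f=1.0))  # (0.25, 0.125)
```

The division by Z is what makes distant objects appear smaller.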
What happens in the projection?

• Projection from 3D to 2D -> information is lost
• What properties are preserved?
  • Straight lines
  • Incidence
• What properties are not preserved?
  • Angles
  • Lengths

Projective geometry - what is lost?
Length is not preserved
Angles are not preserved
Straight lines are still straight

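The preserved and lost properties can be checked numerically. The sketch below, with illustrative helper names and a hypothetical 3D line, projects three equally spaced collinear points through an ideal pinhole (f = 1): the images stay collinear, but the two equal 3D segments get different image lengths.

```python
import math

def project(point, f=1.0):
    """Ideal pinhole projection of a 3D point in camera coordinates."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

def collinear(a, b, c, tol=1e-12):
    """Twice the signed area of triangle abc is zero iff the points are collinear."""
    area2 = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(area2) < tol

# Three equally spaced points on a 3D line that recedes from the camera.
A, B, C = (0.0, 1.0, 2.0), (1.0, 1.0, 4.0), (2.0, 1.0, 6.0)
a, b, c = project(A), project(B), project(C)

print(collinear(a, b, c))                # True: the line stays straight
print(math.dist(A, B), math.dist(B, C))  # equal segment lengths in 3D
print(math.dist(a, b), math.dist(b, c))  # unequal lengths in the image
```

The nearer segment comes out three times longer than the farther one in the image, even though both have the same length in 3D.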
Vanishing points and lines

• Parallel lines in the world intersect in the image at a “vanishing point”

Constructing the vanishing point of a line

All lines parallel to one another have the same vanishing point.
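Why do parallel lines share a vanishing point? A point X0 + t·d on a 3D line projects to (f(X0 + t·dX)/(Z0 + t·dZ), …), which tends to (f·dX/dZ, f·dY/dZ) as t grows. The limit depends only on the direction d, not on the starting point X0. The sketch below checks this for two hypothetical parallel lines, assuming the ideal pinhole (f = 1) and a direction with dZ ≠ 0 (lines parallel to the image plane have their vanishing point at infinity). The helper names are illustrative.

```python
def project(point, f=1.0):
    """Ideal pinhole projection of a 3D point in camera coordinates."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

def vanishing_point(direction, f=1.0):
    """Image of the point at infinity in a 3D direction (requires d_Z != 0)."""
    dx, dy, dz = direction
    return (f * dx / dz, f * dy / dz)

def point_on_line(origin, direction, t):
    """The point origin + t * direction."""
    return tuple(o + t * d for o, d in zip(origin, direction))

d = (1.0, 0.0, 2.0)              # shared direction of two parallel 3D lines
line1_origin = (0.0, -1.0, 1.0)
line2_origin = (3.0, 1.0, 2.0)

for t in (1.0, 10.0, 1e6):
    p1 = project(point_on_line(line1_origin, d, t))
    p2 = project(point_on_line(line2_origin, d, t))
    print(t, p1, p2)             # both approach the same image point as t grows

print(vanishing_point(d))        # (0.5, 0.0)
```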


Homogeneous coordinates

• The projection $x_1 = f X_1 / X_3$ is nonlinear (it divides by the depth $X_3$)!
• It can be made linear using homogeneous coordinates
• Homogeneous coordinates allow transforms to be concatenated easily

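Conversion to homogeneous coordinates:
$(x, y) \mapsto (x, y, 1)^\top, \quad (X, Y, Z) \mapsto (X, Y, Z, 1)^\top$

Conversion from homogeneous coordinates:
$(x, y, w)^\top \mapsto (x/w,\ y/w), \quad (X, Y, Z, W)^\top \mapsto (X/W,\ Y/W,\ Z/W)$

Invariance to scaling:
$k\,(x, y, w)^\top \sim (x, y, w)^\top$ for any $k \neq 0$

E.g. [1, 2, 3] is the same as [3, 6, 9], and both represent the same inhomogeneous point [1/3, 2/3] ≈ [0.33, 0.67].
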
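The two conversions and the scale invariance fit in a few lines. A sketch in plain Python; the function names are illustrative, and same_point only handles finite points (w ≠ 0).

```python
def to_homogeneous(p):
    """Append a 1: (x, y) -> (x, y, 1), (X, Y, Z) -> (X, Y, Z, 1)."""
    return tuple(p) + (1.0,)

def from_homogeneous(ph):
    """Divide by the last coordinate: (x, y, w) -> (x / w, y / w)."""
    *coords, w = ph
    if w == 0:
        raise ValueError("w = 0: a point at infinity has no inhomogeneous equivalent")
    return tuple(c / w for c in coords)

def same_point(p, q, tol=1e-12):
    """Finite homogeneous vectors are equal if they agree after dehomogenization."""
    return all(abs(a - b) < tol for a, b in zip(from_homogeneous(p), from_homogeneous(q)))

print(from_homogeneous((1, 2, 3)))       # (0.333..., 0.666...)
print(same_point((1, 2, 3), (3, 6, 9)))  # True
print(to_homogeneous((0.5, 0.25)))       # (0.5, 0.25, 1.0)
```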
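Basic geometry in homogeneous coordinates

• Line equation: $ax + by + c = 0$, written as $\mathbf{l}^\top \mathbf{p} = 0$ with $\mathbf{l} = (a, b, c)^\top$
• A pixel p in homogeneous coordinates: $\mathbf{p} = (x, y, 1)^\top$
• The line through two points is given by their cross product: $\mathbf{l} = \mathbf{p}_1 \times \mathbf{p}_2$
• The intersection of two lines is given by the cross product of the lines: $\mathbf{p} = \mathbf{l}_1 \times \mathbf{l}_2$
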
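The cross-product rules can be tried directly. The sketch below, with illustrative function names and example lines, builds the line y = x and the line y = 2 from pairs of pixels, intersects them, and shows that two parallel lines meet at a point with w = 0, i.e. a point at infinity.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_through(p, q):
    """Line l = (a, b, c), with a x + b y + c = 0, through homogeneous points p and q."""
    return cross(p, q)

def intersection(l1, l2):
    """Homogeneous intersection point of two lines (w = 0 if they are parallel)."""
    return cross(l1, l2)

l1 = line_through((0, 0, 1), (1, 1, 1))   # the diagonal y = x
l2 = line_through((0, 2, 1), (5, 2, 1))   # the horizontal line y = 2
x, y, w = intersection(l1, l2)
print((x / w, y / w))                     # (2.0, 2.0)

l3 = line_through((0, 1, 1), (1, 2, 1))   # y = x + 1, parallel to l1
print(intersection(l1, l3))               # (-1, -1, 0): meets l1 only at infinity
```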
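3D Euclidean transformation

• The cat moves through 3D space
• The movement of the nose can be described using a Euclidean transform, i.e. a rotation $R$ followed by a translation $\mathbf{t}$: $\mathbf{X}' = R\mathbf{X} + \mathbf{t}$
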
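Building the 3D rotation matrix R

• R can be built from various representations (Euler angles, quaternions, angle-axis); the latter two are recommended
• Euler angles represent the rotation using three parameters, one for each axis:

$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$

$R = R_z(\gamma)\, R_y(\beta)\, R_x(\alpha)$ (the order of the factors is a convention that varies between sources)
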
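A sketch of composing a rotation from Euler angles, assuming the convention R = Rz(γ) Ry(β) Rx(α) (rotate about x first, then y, then z, all about fixed axes). This is one common choice, not the only one. Plain lists stand in for a matrix library, and the helper names are illustrative. The check confirms the result is a proper rotation: orthonormal with determinant +1.

```python
import math

def matmul3(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotation_from_euler(alpha, beta, gamma):
    """R = Rz(gamma) Ry(beta) Rx(alpha): about x first, then y, then z (fixed axes)."""
    return matmul3(rot_z(gamma), matmul3(rot_y(beta), rot_x(alpha)))

def is_rotation(R, tol=1e-12):
    """Check that R^T R = I and det(R) = +1."""
    RtR = matmul3([list(r) for r in zip(*R)], R)
    orthonormal = all(abs(RtR[i][j] - (i == j)) < tol for i in range(3) for j in range(3))
    det = (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
           - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
           + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))
    return orthonormal and abs(det - 1) < tol

R = rotation_from_euler(0.1, -0.4, 1.2)
print(is_rotation(R))  # True
```

Swapping the multiplication order gives a different, equally valid rotation, which is why the convention must always be stated.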
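3D Euclidean transformation

• Concatenation of successive transforms is a mess:
  $\mathbf{X}'' = R_2(R_1\mathbf{X} + \mathbf{t}_1) + \mathbf{t}_2 = R_2R_1\mathbf{X} + R_2\mathbf{t}_1 + \mathbf{t}_2$

Homogeneous coordinates save the day!

• Replace 3D points with their homogeneous versions $\tilde{\mathbf{X}} = (X, Y, Z, 1)^\top$
• The Euclidean transform becomes
  $\tilde{\mathbf{X}}' = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \tilde{\mathbf{X}}$
• Transformations can now be concatenated by matrix multiplication:
  $\tilde{\mathbf{X}}'' = \begin{bmatrix} R_2 & \mathbf{t}_2 \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} R_1 & \mathbf{t}_1 \\ \mathbf{0}^\top & 1 \end{bmatrix} \tilde{\mathbf{X}}$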
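To see that concatenation really reduces to matrix multiplication, the sketch below applies two hypothetical motions of the cat's nose step by step and then as a single 4×4 product; both give the same point. Plain lists stand in for a matrix library, and the helper names are illustrative.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def euclidean_4x4(R, t):
    """Stack a 3x3 rotation R and a translation t into [[R, t], [0 0 0 1]]."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply(E, X):
    """Apply a 4x4 transform to an inhomogeneous 3D point via (X, Y, Z, 1)."""
    x, y, z, w = (row[0] for row in matmul(E, [[X[0]], [X[1]], [X[2]], [1.0]]))
    return (x / w, y / w, z / w)

c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
Rz90 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]   # 90 degrees about z
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

E1 = euclidean_4x4(Rz90, (1.0, 0.0, 0.0))  # rotate, then move 1 along x
E2 = euclidean_4x4(I3, (0.0, 0.0, 5.0))    # then move 5 along z

nose = (1.0, 2.0, 3.0)
step_by_step = apply(E2, apply(E1, nose))
combined = apply(matmul(E2, E1), nose)     # one 4x4 matrix for the whole chain
print(step_by_step, combined)              # both approximately (-1.0, 1.0, 8.0)
```

The combined matrix can be computed once and reused for every point, which is the practical payoff.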


More 3D-3D and 2D-2D transformations

Examples of 2D-2D transforms

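In homogeneous coordinates every 2D-2D transform in the standard hierarchy (translation, Euclidean, similarity, affine, projective) is a 3×3 matrix applied to (x, y, 1). The sketch below maps a unit square with each type. The parameter values are arbitrary illustrations and the helper name apply_h is hypothetical. An affine map keeps the last row (0, 0, 1), so the square stays a parallelogram. A projective map does not, so parallel sides generally stop being parallel.

```python
import math

def apply_h(H, p):
    """Map a 2D point with a 3x3 matrix using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

theta, scale, tx, ty = math.radians(30), 2.0, 1.0, -1.0
c, s = math.cos(theta), math.sin(theta)

transforms = {
    "translation": [[1, 0, tx], [0, 1, ty], [0, 0, 1]],
    "euclidean":   [[c, -s, tx], [s, c, ty], [0, 0, 1]],          # preserves lengths
    "similarity":  [[scale * c, -scale * s, tx],
                    [scale * s, scale * c, ty], [0, 0, 1]],        # preserves angles
    "affine":      [[1.0, 0.5, tx], [0.2, 1.5, ty], [0, 0, 1]],    # preserves parallelism
    "projective":  [[1.0, 0.5, tx], [0.2, 1.5, ty], [0.001, 0.002, 1]],  # straight lines only
}

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for name, H in transforms.items():
    print(name, [tuple(round(v, 3) for v in apply_h(H, p)) for p in square])
```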
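Perspective transformation (3D-2D)

$x_1 = f X_1 / X_3, \quad x_2 = f X_2 / X_3$

Perspective using homogeneous coordinates

$\begin{bmatrix} f X_1 \\ f X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ 1 \end{bmatrix} = \operatorname{diag}(f, f, 1)\,[I \mid \mathbf{0}] \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ 1 \end{bmatrix}$

Dividing by the third coordinate gives back $(x_1, x_2)$.
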
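With homogeneous coordinates the projection becomes one linear matrix-vector product, and all the nonlinearity is confined to the final division by the last coordinate. A sketch in plain Python, assuming the camera-centred setup of these slides (camera at the origin, looking along X_3); the function names are illustrative.

```python
def perspective_matrix(f):
    """3x4 matrix diag(f, f, 1) [I | 0] of the ideal perspective projection."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]

def project_homogeneous(P, X):
    """Linear step x~ = P (X, 1)^T, then division by the last coordinate."""
    Xh = (*X, 1.0)
    x1, x2, x3 = (sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3))
    return (x1 / x3, x2 / x3)

X = (1.0, 0.5, 2.0)
f = 0.8
print(project_homogeneous(perspective_matrix(f), X))  # equals (f*X1/X3, f*X2/X3)
```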
Wait! Our setup has several assumptions

• Camera at the world origin
• Camera aligned with the world coordinates
• Ideal pinhole camera

Removing the initial assumptions

• It is useful to split the overall projection matrix into three parts:
  • A part that depends on the internals of the camera (intrinsic)
  • A vanilla projection matrix
  • A Euclidean transformation between the world and camera frames (extrinsic)
• Assume first that the world is aligned with the camera coordinates -> the extrinsic camera matrix is the identity

More realistic setting - camera pose

• Assume the camera is translated and rotated with respect to the world

The camera pose

• The non-ideal camera pose can be taken into account by first rotating and translating points from the world frame to the camera frame: $\mathbf{X}_c = R\mathbf{X}_w + \mathbf{t}$

The intrinsic parameters

• Transform metric units into pixel units
• Describe the hardware properties of a real camera:
  • The image plane might be skewed
  • The pixels might not be square

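A minimal sketch of what the intrinsic parameters do, assuming the usual upper-triangular form of K with focal lengths fx, fy in pixels, a skew term and a principal point (cx, cy). The numbers are hypothetical: fx ≠ fy models non-square pixels, and a non-zero skew would model a skewed image plane. Function names are illustrative.

```python
def intrinsic_matrix(fx, fy, cx, cy, skew=0.0):
    """3x3 intrinsic matrix K (focal lengths and principal point in pixels)."""
    return [[fx, skew, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

def to_pixels(K, x_ideal):
    """Map an ideal image point (x, y) = (X/Z, Y/Z) to pixel coordinates (u, v).

    K[1][0] is zero and the last row of K is (0, 0, 1), so the homogeneous
    result already has w = 1 and no division is needed.
    """
    x, y = x_ideal
    u = K[0][0] * x + K[0][1] * y + K[0][2]
    v = K[1][1] * y + K[1][2]
    return (u, v)

# Hypothetical camera: non-square pixels (fx != fy), zero skew,
# principal point at the centre of a 640 x 480 image.
K = intrinsic_matrix(fx=800.0, fy=820.0, cx=320.0, cy=240.0)
print(to_pixels(K, (0.125, -0.25)))  # (420.0, 35.0)
print(to_pixels(K, (0.0, 0.0)))      # optical axis hits the principal point (320.0, 240.0)
```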
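Summary of steps from scene to image

• Move the scene point $(\mathbf{X}_w, 1)^\top$ into the camera coordinate system by a 4×4 (extrinsic) Euclidean transformation:
  $\begin{bmatrix} \mathbf{X}_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathbf{X}_w \\ 1 \end{bmatrix}$
• Project into the ideal camera via the vanilla perspective transformation:
  $\tilde{\mathbf{x}}' = [I \mid \mathbf{0}] \begin{bmatrix} \mathbf{X}_c \\ 1 \end{bmatrix}$
• Map the ideal image into the real image using the intrinsic matrix:
  $\tilde{\mathbf{x}} = K \tilde{\mathbf{x}}', \quad K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
  with focal lengths $f_x, f_y$ in pixels, skew $s$, and principal point $(c_x, c_y)$

Camera projection matrix P

$\tilde{\mathbf{x}} = P \begin{bmatrix} \mathbf{X}_w \\ 1 \end{bmatrix}, \quad P = K\,[I \mid \mathbf{0}] \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} = K\,[R \mid \mathbf{t}]$
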
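The whole chain can be checked numerically. The sketch below builds the 3×4 matrix from the three parts (intrinsic K, vanilla [I | 0], extrinsic 4×4) for a made-up camera and projects the world origin. Because the translation puts the origin on the optical axis, it lands exactly on the principal point. K, R and t are hypothetical values, and the helper names are illustrative.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def camera_matrix(K, R, t):
    """P = K [I | 0] [[R, t], [0, 1]], which equals K [R | t] (a 3x4 matrix)."""
    extrinsic = [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
    vanilla = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    return matmul(K, matmul(vanilla, extrinsic))

def project(P, Xw):
    """Pixel coordinates of a world point: x~ = P (Xw, 1)^T, then divide by w."""
    u, v, w = (row[0] for row in matmul(P, [[Xw[0]], [Xw[1]], [Xw[2]], [1.0]]))
    return (u / w, v / w)

# Hypothetical camera: 800 px focal length, principal point (320, 240),
# rotated 10 degrees about the y axis, translation t = (0, 0, 5).
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
a = math.radians(10)
R = [[math.cos(a), 0.0, math.sin(a)], [0.0, 1.0, 0.0], [-math.sin(a), 0.0, math.cos(a)]]
t = (0.0, 0.0, 5.0)

P = camera_matrix(K, R, t)
print(project(P, (0.0, 0.0, 0.0)))  # (320.0, 240.0): the principal point
print(project(P, (1.0, 0.5, 2.0)))
```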
Beyond pinholes: Radial distortion

• Common in wide-angle lenses
• Creates non-linear terms in the projection
• Usually handled by solving for the non-linear terms and then correcting the image

[Figure: original (distorted) image and corrected image]

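The slide does not say which distortion model is used. As an assumption for illustration, the sketch below uses a common two-coefficient polynomial radial model applied to normalized image coordinates (before K): x_d = x_u (1 + k1 r² + k2 r⁴). It inverts the model with a simple fixed-point iteration, which converges for moderate distortion like this example but is not guaranteed in general. The coefficient values and function names are hypothetical.

```python
def distort(p, k1, k2):
    """Polynomial radial model: x_d = x_u * (1 + k1 r^2 + k2 r^4), r^2 = x_u^2 + y_u^2."""
    x, y = p
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return (x * factor, y * factor)

def undistort(pd, k1, k2, iterations=20):
    """Invert the model by fixed-point iteration: x_u = x_d / (1 + k1 r_u^2 + k2 r_u^4)."""
    xd, yd = pd
    x, y = xd, yd                  # initial guess: no distortion
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return (x, y)

k1, k2 = -0.2, 0.05                # hypothetical barrel-distortion coefficients
p = (0.4, -0.3)                    # normalized image coordinates (before K is applied)
pd = distort(p, k1, k2)
print(pd, undistort(pd, k1, k2))   # the second pair should be very close to p
```

A negative k1 pulls points toward the image centre (barrel distortion), which is the typical case for wide-angle lenses.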
Things to remember

• Pinhole camera model
• Homogeneous coordinates
• Camera projection matrix

The end
