
01 Introduction 2023

This document provides information about an upcoming computer vision course at Shanghai Jiao Tong University. It introduces the instructor, Professor Xu Zhao, and provides details about the course including its motivation, information platform, grading policy, syllabus, and reference materials. The course aims to cover fundamental concepts in computer vision as well as open problems through lectures and assignments over 15 weeks. Students will complete problem sets, a final project, and be evaluated based on attendance, assignments, and their project.

Uploaded by

evelyn

Shanghai Jiao Tong University

Instructor: Xu Zhao

Computer Vision

Class No.: AU7005


Spring 2023

About Me

❖ Xu Zhao (赵旭), Professor at the Department of Automation, SEIEE, SJTU
❖ Research direction: Computer Vision, Image Processing,
Machine Learning, Pattern Recognition
❖ Email: [email protected]
❖ Office: 2-431 SEIEE Buildings

About this course

❖ Motivation
❖ Application: a promising and significant direction
toward practical artificial intelligence.
❖ Research: in the CV field, many open problems remain
to be solved by inventing diverse methodologies drawn
from different research domains.

Course information
❖ All in Canvas: https://oc.sjtu.edu.cn/login/canvas

๏ Slides
๏ Video
๏ Assignments
๏ Discussion
๏ Announcement
❖ Keeping in touch: WeChat group
❖ Office hours:

๏ Time: 16:00-17:00, Thursday

๏ Location: 2-431, SEIEE Buildings


Grading policy

❖ 3 Problem sets: 15% × 3 = 45 %

❖ Attendance: 10 %

❖ Final project: 45 %

๏ Teamwork: 4 people per team


๏ Proposal: introduce your idea with 1-2 pages
๏ Final report: CVPR paper format

Syllabus
Event type      Contents                                    Hours   Week No.
Unit 0          Introduction                                3       1
Lecture 1       Introduction                                3       1
Unit I          Image formation                             6       2-3
Lecture 2       Camera and geometric fundamentals           3       2
Lecture 3       Light and color                             3       3
Unit II         Visual Representation                       9       4-6
Lecture 4       Local image features                        3       4
Lecture 5       Motion features: optical flow and beyond    3       5
Lecture 6       Feature learning: CNN                       3       6
Unit III        Visual Reconstruction                       9       7-9
Lecture 7       Calibration                                 3       7
Lecture 8       Stereopsis                                  3       8
Lecture 9       Structure from motion                       3       9
Unit IV         Grouping and Fitting                        9       10-12
Lecture 10      Segmentation                                3       10
Lecture 11      Fitting                                     3       11
Lecture 12      Registration                                3       12
Unit V          Recognition: high-level vision              9       13-15
Lecture 13      Learning based recognition                  3       13
Lecture 14      BoW model                                   3       14
Lecture 15      Object detection                            3       15
Final project   Project presentation and evaluation         3       16
Reference
Other resources

❖ Conference: CVPR, ICCV, ECCV, ACCV, AAAI,…


❖ Journal: IEEE TPAMI, IJCV, IEEE TIP,…
❖ arXiv

Lecture 1: Introduction
Contents
❖ What is vision?
๏ Psychological perspective

๏ David Marr’s theory

❖ What is computer vision?


๏ Motivation

๏ History

๏ Current state: applications and challenges

❖ Course information

What is vision?
❖ What does it mean, to see?
❖ Philosophical perspective: “To know what is where by looking” (the plain
man’s answer). Vision is the process of discovering from images what is
present in the world and where it is, and furthermore, what actions
are taking place. [David Marr, Vision]
❖ Biological perspective: The special sense by which the qualities of
an object (such as color, luminosity, shape, and size) constituting its
appearance are perceived through a process in which light rays
entering the eye are transformed by the retina into electrical signals
that are transmitted to the brain via the optic nerve. [Merriam-Webster]

Human visual system


❖ The eye’s spatial resolution is about 0.01° over a 150° field of view
(not evenly spaced: there is a fovea and a peripheral region).
❖ Intensity resolution is about 11 bits/element; spectral range is 400–700 nm.
❖ Temporal resolution is about 100 ms (10 Hz).
❖ The retina measures about 5 × 5 cm and contains 10^8 sampling elements
(rods and cones).
❖ Two eyes give a data rate of about 3 GBytes/s!
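
One plausible way to arrive at the quoted data rate is to multiply the figures on this slide: roughly 10^8 (a hundred million) sampling elements per retina, 11 bits per element, 10 samples per second, and two eyes. The sketch below does that arithmetic. It assumes every element is read out at the full intensity and temporal resolution and that 1 GByte means 10^9 bytes; the function name is illustrative only.

```python
# Figures taken from the slide; the way they are combined is an assumption.
SAMPLING_ELEMENTS = 10**8   # rods and cones per retina
BITS_PER_ELEMENT = 11       # intensity resolution
SAMPLE_RATE_HZ = 10         # 100 ms temporal resolution
NUM_EYES = 2


def visual_data_rate_bytes_per_s(
    elements: int = SAMPLING_ELEMENTS,
    bits: int = BITS_PER_ELEMENT,
    rate_hz: int = SAMPLE_RATE_HZ,
    eyes: int = NUM_EYES,
) -> float:
    """Raw data rate in bytes per second, assuming no compression."""
    bits_per_second = elements * bits * rate_hz * eyes
    return bits_per_second / 8


rate = visual_data_rate_bytes_per_s()
print(f"{rate / 1e9:.2f} GBytes/s")  # 2.75 GBytes/s
```

The result, 2.75 GBytes/s, rounds to the "about 3 GBytes/s" stated on the slide.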

❖ Vision is the most powerful of our own senses.


❖ Around 1/3 of our brain is devoted to processing the signals from our eyes.
❖ The visual cortex has around O(10^11) neurons.

Visual perception

❖ Sensation: Provides the basic facts in the field of view by
encoding physical energy (the optical signal) as neural
signals that the brain can recognize.
❖ Perception: A series of processes that organize and
interpret sensory information from external objects and
events.
๏ Perceptual organization

๏ Identification and recognition


Visual information processing


[Figure: stages of visual information processing, with an example.
Environmental stimulus: distal stimulus → proximal stimulus.
Sensation (sensory process): retinal image.
Perceptual organization: organization, continuance, depth. Example: a rectangle.
Identification/recognition (analysis). Example: recognized as a kind of “graph”.
Bottom-up processing is data-driven; top-down processing is concept-driven and
draws on psychological processes: expectation, memory, language, motivation,
belief, knowledge.]



Perceptual organization

❖ Forms a description of the objects
❖ Size, shape, motion, distance, orientation…
❖ A computing process that integrates past experience and
current input
❖ Integration: from parts to whole
❖ Fast, and involves no cognitive process

Challenges: ambiguity
Challenges: illusion

Zöllner illusion, Müller-Lyer illusion, Ebbinghaus illusion


Ames room
Psychological approaches
❖ Hermann von Helmholtz’s classical theory (1866):
unconscious inference, the role of experience
❖ Gestalt psychology (1920s): whole, structure, nature. The
guiding principle behind the Gestalt movement was that
the whole is greater than the sum of its parts.
❖ Gibson’s theory of ecological optics (1960s-1970s): focuses
on the attributes of the stimulus from the real world;
perception is a kind of active exploration of a stable
environment.

Examples of the Gestalt laws

Verywell / JR Bee 
Perceptual constancy

Size Lightness
Perceptual constancy

Shape
Orientation
Identification and recognition
❖ Top-down and bottom-up process
Identification and recognition

❖ Context plays critical role


David Marr’s theory
❖ David Marr (1945-1980): Pioneer
scientist of computer vision.
David Courtenay Marr was a British
neuroscientist and physiologist.
Marr integrated results from
psychology, artificial intelligence,
and neurophysiology into new
models of visual processing. His
work was very influential in
computational neuroscience and
led to a resurgence of interest in
the discipline. [Wikipedia]
David Marr’s theory

❖ Vision is a complex information processing task.


❖ A representation is a formal system for making explicit certain
entities or types of information, together with a specification of
how the system does this.
❖ “And I shall call the result of using a representation to describe a
given entity a description of the entity in that representation” -
Marr and Nishihara, 1978
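
Numeral systems are a familiar illustration of this definition: decimal, binary, and Roman numerals are three representations, and each yields a different description of the same number. The sketch below is illustrative only; all function names are hypothetical. It also shows that each representation makes different information explicit: a multiple of ten is obvious from a decimal description, a power of two from a binary one.

```python
# Value/symbol pairs for Roman numerals, largest first.
ROMAN_SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def describe_decimal(n: int) -> str:
    """Description of n in the decimal (Arabic) representation."""
    return str(n)


def describe_binary(n: int) -> str:
    """Description of n in the binary representation."""
    return format(n, "b")


def describe_roman(n: int) -> str:
    """Description of n in the Roman numeral representation (1..3999)."""
    if not 0 < n < 4000:
        raise ValueError("Roman numerals here cover 1..3999 only")
    parts = []
    for value, symbol in ROMAN_SYMBOLS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def is_multiple_of_ten(decimal_description: str) -> bool:
    """Explicit in decimal: just look at the last digit."""
    return decimal_description.endswith("0")


def is_power_of_two(binary_description: str) -> bool:
    """Explicit in binary: exactly one digit is 1."""
    return binary_description.count("1") == 1


n = 37
print(describe_decimal(n), describe_binary(n), describe_roman(n))
# 37 100101 XXXVII
```

The same question (for example, "is this a power of two?") is trivial to answer from one description and awkward from another, which is why the choice of representation matters.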

David Marr’s theory

❖ The three levels at which any machine carrying out an information-
processing task must be understood [Vision, David Marr]
๏ Computational theory: What is the goal of the computation, why is
it appropriate, and what is the logic of the strategy by which it can
be carried out?
๏ Representation and algorithm: How can this computational theory
be implemented? In particular, what is the representation for the
input and output, and what is the algorithm for the transformation?
๏ Hardware implementation: How can the representation and
algorithm be realized physically?

Representational framework
Each representation is listed by name, with its purpose and its primitives.

❖ Images
๏ Purpose: Represent intensity.
๏ Primitives: Intensity value at each point in the image.
❖ Primal sketch
๏ Purpose: Makes explicit important information about the 2D image,
primarily the intensity changes there and their geometrical
distribution and organization.
๏ Primitives: Zero-crossings; Blobs; Terminations and discontinuities;
Edge segments; Virtual lines; Groups; Curvilinear organization;
Boundaries.
❖ 2.5 D sketch
๏ Purpose: Makes explicit the orientation and rough depth of the visible
surfaces and contours of discontinuities in these quantities in a
viewer-centered coordinate frame.
๏ Primitives: Local surface orientation (the “needles” primitives);
Distance from viewer; Discontinuities in depth; Discontinuities in
surface orientation.
❖ 3-D model representation
๏ Purpose: Describes shapes and their spatial organization in an
object-centered coordinate frame, using a modular hierarchical
representation that includes volumetric primitives as well as surface
primitives.
๏ Primitives: 3-D models arranged hierarchically, each one based on a
spatial configuration of a few sticks or axes, to which volumetric or
surface shape primitives are attached.
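
Zero-crossings, the first primal-sketch primitive named on this slide, can be illustrated in one dimension. The sketch below is a simplified, assumption-laden example, not the full 2D operator: it filters a 1D intensity profile with a sampled second derivative of a Gaussian and reports where the response changes sign. The function names, the choice of sigma, the kernel radius of 3 sigma, and the threshold `eps` are all illustrative choices.

```python
import math


def second_derivative_of_gaussian(sigma: float, radius: int) -> list[float]:
    """Sampled d^2/dx^2 of a Gaussian, shifted to sum to zero so that
    constant regions of the signal give a (near-)zero response."""
    raw = [
        ((x * x - sigma**2) / sigma**4) * math.exp(-x * x / (2 * sigma**2))
        for x in range(-radius, radius + 1)
    ]
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]


def filter_valid(signal: list[float], kernel: list[float]) -> list[float]:
    """Correlate signal with kernel at fully overlapping positions only.
    The kernel is symmetric, so this equals convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def zero_crossings(signal: list[float], sigma: float = 2.0,
                   eps: float = 1e-9) -> list[float]:
    """Positions (in signal coordinates, between two samples) where the
    filtered response changes sign by more than eps on both sides."""
    radius = int(3 * sigma)
    response = filter_valid(signal, second_derivative_of_gaussian(sigma, radius))
    edges = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if (a > eps and b < -eps) or (a < -eps and b > eps):
            edges.append(i + radius + 0.5)
    return edges


# A step edge between samples 19 and 20.
step = [0.0] * 20 + [100.0] * 20
print(zero_crossings(step))  # [19.5]
```

The intensity change is recovered as a single zero-crossing located halfway between the last dark sample and the first bright one.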

Discussions

❖ Why is human vision an inspiration for designing CV
algorithms?
❖ In your opinion, what are the pros and cons of data-driven
and concept-driven methods?

What is computer vision?


❖ Vision is about discovering from images what is present
in the scene and where it is.
❖ In Computer Vision a camera (or several cameras) is
linked to a computer. The computer interprets images
of a real scene to obtain information useful for tasks
such as navigation, manipulation and recognition.
❖ Computer vision aims to recover useful information
about a (3D) scene from its 2D projections (images).

Motivation

❖ Replicate human vision to allow a machine to see


๏ Central to the problem of Artificial Intelligence
๏ Many industrial applications
❖ Gain insight into how we see
๏ Vision is explored extensively by neuroscientists to
gain an understanding of how the brain operates

What is it related to?

Fei-Fei Li & Justin Johnson & Serena Yeung


History
❖ 1970s: Early thriving

๏ 1966: Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to “spend
the summer linking a camera to a computer and getting the computer to describe what it saw”

๏ 1970s: Scene understanding by finding edges and then inferring the 3D structure

๏ 1970s: Three-dimensional modeling of non-polyhedral objects, generalized cylinder

๏ 1973: pictorial structures for object recognition, Fischler and Elschlager

๏ 1980s: qualitative approach to understanding intensities and shading variations. Shape from
shading.

๏ 1970s-1980s: Stereo correspondence algorithms

๏ 1970s-1980s: Simultaneously recovering 3D structure and camera motion

๏ 1970s-1980s: Intensity-based optical flow

๏ 1970s: David Marr: three-level visual information processing system





History
❖ 1980s: Utilization of sophisticated mathematical techniques

๏ Motion and structure from feature correspondences

๏ Image pyramids and wavelets for multi-resolution image processing

๏ Shape-from-X (stereo, texture, shading, focus)

๏ Canny for edge detection (1986)

๏ Active contours (snake, 1988)

๏ MRF applied into CV (1984)

๏ Variational optimization problems and regularization



History
❖ 1990s: Existing techniques continued to be explored; learning-based methods appeared

๏ Projective invariants for recognition, structure from motion

๏ Factorization techniques

๏ Bundle adjustment

๏ Physics-based vision: physical models of radiance transport and color image formation

๏ Dense stereo correspondence algorithms, multi-view stereo algorithms

๏ Tracking: particle filter

๏ Image segmentation: mean-shift, normalized cut

๏ Statistical learning-based methods

๏ Interaction with computer graphics




History
❖ 2000s: Sophisticated machine learning techniques

๏ Projective feature-based techniques (combined with learning) for object recognition

๏ Pictorial structure: Felzenszwalb and Huttenlocher

๏ Interest point features

๏ Complex global optimization problems

๏ Probabilistic graphical models

๏ Loopy belief propagation (LBP)

๏ Graph cut

๏ Dimensionality reduction



History

❖ 2010s: Deep learning in full bloom
๏ Big-data-driven learning for CV tasks
๏ Deep learning and CNN
๏ GPU

Discussions

❖ Why has machine learning gradually become the main
methodology of CV?
❖ In what direction might deep learning evolve in the
future?

Applications: OCR
❖ Technology to convert images of text into text

Applications: Vision-based biometrics
❖ Face, Fingerprint, Iris, Palm, Gait…
Facial login without a password…

Liang et al. 2014

Applications: Human shape capture

http://gl.ict.usc.edu/Research/presidentialportrait/

Applications: Motion capture


Applications: Human computer interaction
Applications: Autonomous driving
Applications: Industrial robots

Vision-guided robots position nut runners on wheels


Applications: Mobile robot

Logistic robots
STAIR at Stanford, Saxena et al. 2008
Robotics from Boston Dynamics

Applications: Medical

3D imaging
MRI, CT

Da Vinci surgical robotics


Image guided surgery
Grimson et al., MIT

Applications: Vision in space

NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.

Vision systems used for several tasks


❖ Panorama stitching
❖ 3D terrain modeling
❖ Obstacle detection, position tracking
❖ For more, read “Computer Vision on Mars” by Matthies et al.


Applications: Sports

Free viewpoint video. Canon 2017


Applications: Augmented Reality and Virtual Reality

MS HoloLens, Oculus, Magic Leap, ARCore / ARKit, Google Glass…

Applications: Automatic retail


Applications: 3D reconstruction

Building Rome in a Day: Agarwal et al. 2009; Pollefeys et al.


Applications: Advanced photo search


Discussion

❖ How could CV techniques help people fight the
coronavirus?
It’s a challenging task
❖ Computer vision is an inverse problem: the available
information is insufficient to recover some of the unknowns.
❖ Considerable effort is needed to disambiguate
between potential solutions.
❖ The real visual world is highly complex, and it
can be viewed from countless viewpoints…
❖ Image formation is a complex function that takes many
variables as input and maps the 3D world to a 2D image.

From [Sinha and Adelson 1993]
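
The loss of information in going from 3D to 2D can be made concrete with a small sketch. It assumes an ideal pinhole camera with focal length `f`, which is a simplification not stated on this slide, and the function name is illustrative. Two different 3D points on the same viewing ray project to the same image point, so depth cannot be recovered from a single projection alone.

```python
def project(point: tuple[float, float, float],
            f: float = 1.0) -> tuple[float, float]:
    """Ideal pinhole projection of a camera-frame point (X, Y, Z)
    onto the image plane: x = f * X / Z, y = f * Y / Z."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)


near = (1.0, 2.0, 4.0)
far = tuple(3.0 * c for c in near)  # same ray, three times farther away

print(project(near), project(far))  # (0.25, 0.5) (0.25, 0.5)
```

Any point of the form (kX, kY, kZ) with k > 0 gives the same image coordinates, which is one reason extra constraints or prior knowledge are needed to disambiguate between candidate 3D interpretations.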

Challenges: Viewpoint variation

Michelangelo 1475-1564

Challenges: Occlusion

Magritte, 1957

Challenges: Background clutter

Klimt, 1913

Challenges: Object intra-class variation

Slide by Fei-Fei, Fergus & Torralba


Challenges: Measurement vs. perception
Challenges: Illumination
Challenges: Scale
Challenges: Deformation

Mr. Bean
Challenges: Local appearance ambiguity

Slide by Fei-Fei, Fergus & Torralba


Discussions

❖ How do data-driven approaches solve the inverse
problem?
❖ Is there any chance that CV will fully surpass human vision
in the future?

Assignment

❖ Reading and thinking: Read the book chapter
Part-1-Vision-David Marr.pdf and the first chapter
of SzeliskiBook_20100903_draft.pdf. Then think
about the questions posted in the related discussions.
