01 Introduction 2023

This document provides information about an upcoming computer vision course at Shanghai Jiao Tong University. It introduces the instructor, Professor Xu Zhao, and provides details about the course including its motivation, information platform, grading policy, syllabus, and reference materials. The course aims to cover fundamental concepts in computer vision as well as open problems through lectures and assignments over 15 weeks. Students will complete problem sets, a final project, and be evaluated based on attendance, assignments, and their project.

Uploaded by

evelyn

Shanghai Jiao Tong University

Instructor: Xu Zhao

Computer Vision Class No.: AU7005

Spring 2023

About Me

❖ Xu Zhao (赵旭), Professor at the Department of Automation, SEIEE, SJTU
❖ Research direction: Computer Vision, Image Processing,
Machine Learning, Pattern Recognition
❖ Email: [email protected]
❖ Office: 2-431 SEIEE Buildings

About this course

❖ Motivation
๏ Application: a promising and significant direction toward practical artificial intelligence.
๏ Research: in the CV field, many open problems remain to be solved by inventing diverse methodologies from different research domains.

Course information
❖ All in Canvas https://oc.sjtu.edu.cn/login/canvas

๏ Slides
๏ Video
๏ Assignments
๏ Discussion
๏ Announcement
❖ Keeping in touch: WeChat group
❖ Office hours:

๏ Time: 16:00-17:00, Thursday

๏ Location: 2-431, SEIEE Buildings

Grading policy

❖ 3 Problem sets: 15% × 3 = 45 %

❖ Attendance: 10 %

❖ Final project: 45 %

๏ Teamwork: 4 people per team

๏ Proposal: introduce your idea with 1-2 pages
๏ Final report: CVPR paper format

Syllabus
Event type     Contents                                   Hours  Week No.
Unit 0         Introduction                               3      1
Lecture 1      Introduction                               3      1
Unit I         Image formation                            6      2-3
Lecture 2      Camera and geometric fundamentals          3      2
Lecture 3      Light and color                            3      3
Unit II        Visual Representation                      9      4-6
Lecture 4      Local image features                       3      4
Lecture 5      Motion features: optical flow and beyond   3      5
Lecture 6      Feature learning: CNN                      3      6
Unit III       Visual Reconstruction                      9      7-9
Lecture 7      Calibration                                3      7
Lecture 8      Stereopsis                                 3      8
Lecture 9      Structure from motion                      3      9
Unit IV        Grouping and Fitting                       9      10-12
Lecture 10     Segmentation                               3      10
Lecture 11     Fitting                                    3      11
Lecture 12     Registration                               3      12
Unit V         Recognition: high-level vision             9      13-15
Lecture 13     Learning based recognition                 3      13
Lecture 14     BoW model                                  3      14
Lecture 15     Object detection                           3      15
Final project  Project presentation and evaluation        3      16
Reference
Other resources

❖ Conferences: CVPR, ICCV, ECCV, ACCV, AAAI, …

❖ Journals: IEEE TPAMI, IJCV, IEEE TIP, …
❖ arXiv

Lecture 1: Introduction
Contents
❖ What is vision?
๏ Psychological perspective

๏ David Marr’s theory

❖ What is computer vision?

๏ Motivation

๏ History

๏ Current state: applications and challenges

❖ Course information

What is vision?
❖ What does it mean, to see?
❖ Philosophical perspective: “To know what is where by looking” (the plain
man’s answer). Vision is the process of discovering from images what is
present in the world and where it is, and furthermore what actions
are taking place. [David Marr, Vision]
❖ Biological perspective: The special sense by which the qualities of
an object (such as color, luminosity, shape, and size) constituting its
appearance are perceived through a process in which light rays
entering the eye are transformed by the retina into electrical signals
that are transmitted to the brain via the optic nerve. [Merriam-Webster]

Human visual system

❖ The eye’s spatial resolution is about 0.01° over a 150° field of view (not evenly spaced: there is a fovea and a peripheral region).
❖ Intensity resolution is about 11 bits/element; spectral range is 400–700 nm.
❖ Temporal resolution is about 100 ms (10 Hz).
❖ Retina measures about 5 × 5 cm and contains 10^8 sampling elements (rods and cones).
❖ Two eyes give a data rate of about 3 GBytes/s!

❖ Vision is the most powerful of our own senses.

❖ Around 1/3 of our brain is devoted to processing the signals from our eyes.
❖ The visual cortex has around O(10^11) neurons.
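
As a rough check, the quoted data rate of about 3 GBytes/s can be reproduced from the other figures on this slide. The sketch below is back-of-the-envelope arithmetic only. It assumes that every sampling element delivers one 11-bit value per 100 ms interval and that the 0.01° resolution holds uniformly across the field of view; both are simplifications made for this sketch, not claims from the slide, and the function names are illustrative.

```python
# Back-of-the-envelope check of the "about 3 GBytes/s" figure.
# Assumption (ours, not the slide's): each sampling element yields
# one 11-bit sample every 100 ms.

SAMPLING_ELEMENTS = 10**8   # rods and cones per retina
BITS_PER_ELEMENT = 11       # intensity resolution
SAMPLES_PER_SECOND = 10     # 100 ms temporal resolution -> 10 Hz
EYES = 2


def visual_data_rate_bytes_per_s() -> float:
    """Raw data rate of both eyes in bytes per second."""
    bits_per_s = SAMPLING_ELEMENTS * BITS_PER_ELEMENT * SAMPLES_PER_SECOND * EYES
    return bits_per_s / 8


def samples_across_field(fov_deg: float = 150.0, resolution_deg: float = 0.01) -> int:
    """Number of resolvable steps across the field of view,
    assuming (unrealistically) uniform resolution."""
    return round(fov_deg / resolution_deg)


rate = visual_data_rate_bytes_per_s()
print(f"{rate / 1e9:.2f} GB/s")   # close to the quoted 3 GBytes/s
print(samples_across_field())      # resolvable steps across 150 degrees
```

Because resolution actually falls off away from the fovea, the uniform-resolution count is an upper bound rather than a measurement.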

Visual perception

❖ Sensation: provides the basic facts in the field of view by
encoding physical energy (the optical signal) as neural
signals that the brain can recognize.
❖ Perception: A series of processes that organize and
interpret sensory information from external objects and
events.
๏ Perceptual organization

๏ Identification and recognition

Visual information processing

[Figure: stages of visual information processing, with an example at each stage. Environmental stimulus (distal stimulus, proximal stimulus) → Sensation (sensory process; retinal image) → Perceptual organization (organization; continuity, depth; example: rectangle) → Identification/Recognition (analysis; example: recognized as a kind of “graph”). Bottom-up processing is data driven. Top-down processing is concept driven and draws on psychological processes: expectation, memory, language, motivation, belief, knowledge.]

Perceptual organization

❖ Forms a description of the objects

❖ Size, shape, motion, distance, orientation…
❖ A computing process that integrates past experience and
current input
❖ Integration: from parts to whole
❖ Fast, and involves no cognitive process

Challenges: ambiguity
Challenges: illusion

Zöllner illusion, Müller-Lyer illusion, Ebbinghaus illusion

Ames room
Psychological approaches
❖ Hermann von Helmholtz’s classical theory (1866):
unconscious inference, the role of experience
❖ Gestalt psychology (1920s): whole, structure, nature. The
guiding principle behind the Gestalt movement was that
the whole is greater than the sum of its parts.
❖ Gibson’s theory of ecological optics (1960s-1970s): focuses
on the attributes of the stimulus from the real world;
perception is an active exploration of a stable
environment.

Examples of the Gestalt laws

Verywell / JR Bee
Perceptual constancy

❖ Size, lightness, shape, orientation
Identification and recognition
❖ Top-down and bottom-up processes
❖ Context plays a critical role

David Marr’s theory
❖ David Marr (1945-1980): pioneering scientist of computer vision.
David Courtenay Marr was a British neuroscientist and physiologist.
Marr integrated results from psychology, artificial intelligence,
and neurophysiology into new models of visual processing. His
work was very influential in computational neuroscience and
led to a resurgence of interest in the discipline. [Wikipedia]
David Marr’s theory

❖ Vision is a complex information processing task.

❖ A representation is a formal system for making explicit certain
entities or types of information, together with a specification of
how the system does this.
❖ “And I shall call the result of using a representation to describe a
given entity a description of the entity in that representation” -
Marr and Nishihara, 1978

David Marr’s theory

❖ The three levels at which any machine carrying out an information-processing
task must be understood [Vision, David Marr]
๏ Computational theory: What is the goal of the computation, why is
it appropriate, and what is the logic of the strategy by which it can
be carried out?
๏ Representation and algorithm: How can this computational theory
be implemented? In particular, what is the representation for the
input and output, and what is the algorithm for the transformation?
๏ Hardware implementation: How can the representation and
algorithm be realized physically?

Representational framework
❖ Image
๏ Purpose: Represents intensity.
๏ Primitives: Intensity value at each point in the image.

❖ Primal sketch
๏ Purpose: Makes explicit important information about the 2D image, primarily the intensity changes there and their geometrical distribution and organization.
๏ Primitives: Zero-crossings; blobs; terminations and discontinuities; edge segments; virtual lines; groups; curvilinear organization; boundaries.

❖ 2.5-D sketch
๏ Purpose: Makes explicit the orientation and rough depth of the visible surfaces, and contours of discontinuities in these quantities, in a viewer-centered coordinate frame.
๏ Primitives: Local surface orientation (the “needles” primitives); distance from viewer; discontinuities in depth; discontinuities in surface orientation.

❖ 3-D model representation
๏ Purpose: Describes shapes and their spatial organization in an object-centered coordinate frame, using a modular hierarchical representation that includes volumetric primitives as well as surface primitives.
๏ Primitives: 3-D models arranged hierarchically, each one based on a spatial configuration of a few sticks or axes, to which volumetric or surface shape primitives are attached.
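
To make the "zero-crossings" primitive of the primal sketch more concrete, here is a minimal 1-D sketch. It assumes the Marr-Hildreth view of edges as sign changes of the second derivative of a Gaussian-smoothed intensity signal; the framework above does not spell out this procedure, and the helper names (`gaussian_kernel`, `zero_crossings_1d`), the value of `sigma`, and the tolerance `tol` are all illustrative choices, not part of the lecture.

```python
import numpy as np


def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian sampled at integer offsets -radius..radius."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def zero_crossings_1d(signal, sigma: float = 2.0, tol: float = 1e-6) -> list:
    """Return indices i such that the second derivative of the smoothed
    signal changes sign between samples i and i + 1."""
    signal = np.asarray(signal, dtype=float)
    radius = int(3 * sigma)
    # Replicate the border values so smoothing adds no artificial edges.
    padded = np.pad(signal, radius, mode="edge")
    smoothed = np.convolve(padded, gaussian_kernel(sigma, radius), mode="valid")
    # Discrete second derivative; second[j] is centred on sample j + 1.
    second = np.diff(smoothed, n=2)
    # Ignore sign flips caused by floating-point noise in flat regions.
    strong = np.abs(second) > tol
    sign_change = second[:-1] * second[1:] < 0
    hits = np.nonzero(sign_change & strong[:-1] & strong[1:])[0]
    return [int(j) + 1 for j in hits]


# A synthetic intensity profile with one step edge between samples 49 and 50.
step = np.concatenate([np.zeros(50), np.ones(50)])
print(zero_crossings_1d(step))
```

On this step profile the function should report a single crossing located at the intensity change. A 2-D version would apply the same idea with a Laplacian-of-Gaussian filter.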

Discussions

❖ Why is human vision inspiring for designing CV

algorithms?
❖ In your opinion, what are the pros and cons of data-driven
and concept-driven methods?

What is computer vision?

❖ Vision is about discovering from images what is present
in the scene and where it is.
❖ In Computer Vision a camera (or several cameras) is
linked to a computer. The computer interprets images
of a real scene to obtain information useful for tasks
such as navigation, manipulation and recognition.
❖ Computer vision aims to recover useful information
about a (3D) scene from its 2D projections (images).

Motivation

❖ Replicate human vision to allow a machine to see

๏ Central to the problem of artificial intelligence
๏ Many industrial applications
❖ Gain insight into how we see
๏ Vision is explored extensively by neuroscientists to
gain an understanding of how the brain operates

What is it related to?

Fei-Fei Li & Justin Johnson & Serena Yeung

History
❖ 1970s: Early thriving

๏ 1966: Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to “spend
the summer linking a camera to a computer and getting the computer to describe what it saw”

๏ 1970s: Scene understanding by finding edges and then inferring the 3D structure

๏ 1970s: Three-dimensional modeling of non-polyhedral objects, generalized cylinder

๏ 1973: pictorial structures for object recognition, Fischler and Elschlager

๏ 1980s: qualitative approach to understanding intensities and shading variations. Shape from
shading.

๏ 1970s-1980s: Stereo correspondence algorithms.

๏ 1970s-1980s: simultaneously recovering 3D structure and camera motion

๏ 1970s-1980s: Intensity based optical flow

๏ 1970s: David Marr: three level visual information processing system


History
❖ 1980s: Utilization of sophisticated mathematical techniques

๏ Motion and structure from feature correspondences

๏ Image pyramids and wavelets for multi-resolution image processing

๏ Shape-from-X (stereo, texture, shading, focus)

๏ Canny for edge detection (1986)

๏ Active contours (snake, 1988)

๏ MRF applied to CV (1984)

๏ Variational optimization problems and regularization


History
❖ 1990s: Existing techniques continued to be explored; learning-based methods appeared

๏ Projective invariants for recognition, structure from motion

๏ Factorization techniques

๏ Bundle adjustment

๏ Physics-based vision, physical models of radiance transport and color image formation

๏ Dense stereo correspondence algorithms, multi-view stereo algorithms

๏ Tracking: particle filter

๏ Image segmentation: mean-shift, normalized cut

๏ Statistical learning-based methods

๏ Interaction with computer graphics


History
❖ 2000s: Sophisticated machine learning techniques

๏ Projective feature-based techniques (combined with learning) for object recognition

๏ Pictorial structure: Felzenszwalb and Huttenlocher

๏ Interest point features

๏ Complex global optimization problems

๏ Probabilistic graphical models

๏ Loopy belief propagation (LBP)

๏ Graph cut

๏ Dimensionality reduction


History

❖ 2010s: Full prosperity of deep learning

๏ Big-data-driven learning for CV tasks
๏ Deep learning and CNN
๏ GPU

Discussions

❖ Why has machine learning gradually become the main

methodology of CV?
❖ In what direction might deep learning evolve in the
future?

Applications: OCR
❖ Technology to convert images of text into text
Applications: Vision-based biometrics
❖ Face, Fingerprint, Iris, Palm, Gait…
Facial login without a password…

Liang et al. 2014

Applications: Human shape capture

http://gl.ict.usc.edu/Research/presidentialportrait/

Applications: Motion capture

Applications: Human computer interaction
Applications: Autonomous driving
Applications: Industrial robots

Vision-guided robots position nut runners on wheels

Applications: Mobile robot

Logistics robots

STAIR at Stanford (Saxena et al. 2008)

Robotics from Boston Dynamics

Applications: Medical

3D imaging
MRI, CT

Da Vinci surgical robotics

Image guided surgery
Grimson et al., MIT

Applications: Vision in space

NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.

Vision systems used for several tasks

❖ Panorama stitching
❖ 3D terrain modeling
❖ Obstacle detection, position tracking
❖ For more, read “Computer Vision on Mars” by Matthies et al.

 
Applications: Sports

Free viewpoint video. Canon 2017

Applications: Augmented Reality and Virtual Reality

MS HoloLens, Oculus, Magic Leap, ARCore / ARKit, Google Glass…

Applications: Automatic retail

Applications: 3D reconstruction

Building Rome in a Day: Agarwal et al. 2009; Pollefeys et al.

Applications: Advanced photo search

Discussion

❖ How could CV techniques help people fight the

coronavirus?
It’s a challenging task
❖ Computer vision is an inverse problem: there is insufficient
information to recover some of the unknowns.
❖ Much effort is needed to disambiguate
between potential solutions.
❖ The real visual world is highly complex, and it
can be viewed from countless viewpoints…
❖ Image formation is a complex function that takes many
variables as input and maps a 3D scene to a 2D image.
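
A small numerical sketch can illustrate why the inverse problem is underdetermined. It assumes an ideal pinhole camera at the origin looking along the Z axis; the slide itself does not fix a camera model, and the function name `project_pinhole` and the focal length value are hypothetical. Every 3D point on the same viewing ray lands on the same image point, so depth cannot be recovered from a single projection without further assumptions.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def project_pinhole(point: Point3D, focal_length: float = 1.0) -> Tuple[float, float]:
    """Perspective projection of a 3D point onto the image plane of an
    ideal pinhole camera (an assumed model, used here for illustration)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


near = (1.0, 2.0, 4.0)
# Same viewing ray, four times further from the camera.
far = tuple(4.0 * c for c in near)

print(project_pinhole(near))
print(project_pinhole(far))   # identical image coordinates: depth is lost
```

Disambiguating between `near` and `far` requires extra information, such as a second view, known object size, or learned priors.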

From [Sinha and Adelson 1993]

Challenges: Viewpoint variation

Michelangelo 1475-1564

Challenges: Occlusion

Magritte, 1957

Challenges: Background clutter

Klimt, 1913

Challenges: Object intra-class variation

Slide by Fei-Fei, Fergus & Torralba

Challenges: Measurement vs. perception
Challenges: Illumination
Challenges: Scale
Challenges: Deformation

Mr. Bean
Challenges: Local appearance ambiguity

Slide by Fei-Fei, Fergus & Torralba

Discussions

❖ How do data-driven approaches solve the inverse

problem?
❖ Is there any chance for CV to fully surpass human vision
in the future?

Assignment

❖ Reading and thinking: Read the book chapter Part-1-Vision-David Marr.pdf
and the first chapter of SzeliskiBook_20100903_draft.pdf. Then think
about the questions posted in the related discussions.
