Lec00 Intro For Web

CS5670: Intro to Computer Vision (Cornell Tech)

Instructor: Noah Snavely


Instructor
• Noah Snavely ([email protected])

• Research interests:
– Computer vision and graphics
– 3D reconstruction and visualization of Internet photo collections
– Deep learning for computer graphics
Noah’s work
• Automatic 3D reconstruction from Internet photo
collections
“Statue of Liberty” “Half Dome, Yosemite” “Colosseum, Rome”

Flickr photos

3D model
City-scale 3D reconstruction

Reconstruction of Dubrovnik, Croatia, from ~40,000 images


Depth from a single image
Visualizing scenes from tourist photos
Reconstructing dynamic 3D scenes

DynIBaR: Neural Dynamic Image-Based Rendering [https://dynibar.github.io/]


Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely
CVPR 2023
Teaching assistants

Michelle Shu Yen-Yu Chang


[email protected] [email protected]

• Please check course webpage for office hours


https://www.cs.cornell.edu/courses/cs5670/2024sp/
Important information
• Textbook:
Rick Szeliski, Computer Vision: Algorithms and Applications, online at:
http://szeliski.org/Book/

• Course webpage:
http://www.cs.cornell.edu/courses/cs5670/2024sp/

• Canvas Page:
https://canvas.cornell.edu/courses/61359

• Announcements/discussion via Ed Discussions (via Canvas)

• Assignment turnin via GitHub Classroom and CMSX:
https://cmsx.cs.cornell.edu
Today
1. What is computer vision?

2. Why study computer vision?

3. Course overview

4. Images & image filtering [time permitting]


Today
• Readings
– Szeliski, Chapter 1 (Introduction)
Every image tells a story
• Goal of computer vision:
perceive the “story” behind
the picture
• Compute properties of the
world
– 3D shape
– Names of people or objects
– What happened?
The goal of computer vision
Can computers match human
perception?
• Yes and no (mainly no)
– computers can be better at
“easy” things
– humans are better at “hard”
things

• But huge progress


– Accelerating in the last five
years due to deep learning
– What is considered “hard”
keeps changing
Human perception has its shortcomings

https://twitter.com/pickover/status/1460275132958662657/
But humans can tell a lot about a scene from a
little information…

Source: “80 million tiny images” by Torralba, et al.


The goal of computer vision
• Compute the 3D shape of the world

ZED 2i Camera
The goal of computer vision
• Recognize objects and people

Terminator 2, 1991
slide credit: Fei-Fei, Fergus & Torralba
Labeled street scene: sky, building, flag, face, banner, wall, street lamp, bus, cars
slide credit: Fei-Fei, Fergus & Torralba


The goal of computer vision
• “Enhance” images
The goal of computer vision
• Forensics

Source: Nayar and Nishino, “Eyes for Relighting”


The goal of computer vision
• Improve photos (“Computational Photography”)

Super-resolution (source: 2d3)

Depth of field on cell phone camera


(source: Google Research Blog)
Removing objects
(Google Magic Eraser)
Low-light photography
(credit: Hasinoff et al., SIGGRAPH ASIA 2016)
April 10, 2019
Why study computer vision?
• Billions of images/videos captured per day

• Huge number of potential applications


• The next slides show the current state of the art
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software

Digit recognition, AT&T labs (1990s)
http://yann.lecun.com/exdb/lenet/

License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Sudoku grabber
http://sudokugrab.blogspot.com/

Automatic check processing


Face detection

• Nearly all cameras detect faces in real time
– (Why?)
Face analysis and recognition
Vision-based biometrics

Who is she? Source: S. Seitz


Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns” Read the story

Source: S. Seitz
Login without a password

Fingerprint scanners on many new smartphones and other devices

Face unlock on Apple iPhone X
See also http://www.sensiblevision.com/
New York Times, Jan. 18, 2020
by Kashmir Hill
Bird identification

Merlin Bird ID (based on Cornell Tech technology!)


Special effects: shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC


Source: S. Seitz
Special effects: motion capture

Pirates of the Caribbean, Industrial Light and Magic
Source: S. Seitz


3D face tracking w/ consumer cameras

Snapchat Lenses

Face2Face system (Thies et al.)


Image synthesis

Karras, et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR 2018
Which face is real?

https://www.whichfaceisreal.com/
Image synthesis

“An astronaut riding a horse in a photorealistic style” – DALL-E 2

“A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat” – Imagen
Sports

Sportvision first down line


Explanation on www.howstuffworks.com

Source: S. Seitz
Smart cars

• Mobileye
• Tesla Autopilot
• Safety features in many cars
Self-driving cars

Waymo
Robotics

NASA’s Mars Curiosity Rover
https://en.wikipedia.org/wiki/Curiosity_(rover)

Amazon Picking Challenge
http://www.robocup2016.org/en/events/amazon-picking-challenge/

Amazon Prime Air
Amazon Scout


Medical imaging

3D imaging (MRI, CT)

Skin cancer classification with deep learning
https://cs.stanford.edu/people/esteva/nature/
Virtual & Augmented Reality

6DoF head tracking
Hand & body tracking
3D scene understanding
3D-360 video capture


Current state of the art
• You just saw many examples of current systems.
– Many of these are less than 5 years old

• Computer vision is an active research area, and rapidly changing


– Many new apps in the next 5 years
– Deep learning and generative methods powering many modern
applications

• Many startups across a dizzying array of areas


– Generative AI, robotics, autonomous vehicles, medical imaging,
construction, inspection, VR/AR, …
Why is computer vision difficult?

Viewpoint variation

Credit: Flickr user michaelpaul

Scale
Illumination
Why is computer vision difficult?

Motion (Source: S. Lazebnik)


Intra-class variation

Background clutter

Occlusion


Challenges: local ambiguity

slide credit: Fei-Fei, Fergus & Torralba


But there are lots of visual cues we can use…

Source: S. Lazebnik
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a given 2D
image

Artist Julian Beever with his anamorphic Coke bottle

– We often must use prior knowledge about the world’s structure


Image source: F. Durand
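
The ambiguity above can be illustrated with a small sketch. It assumes an idealized pinhole camera at the origin looking down the +Z axis; the function name `project` and the `focal_length` parameter are illustrative choices, not something defined in these slides. Every 3D point along the same ray through the camera center lands on the same 2D image point, so a single image cannot tell those points apart.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (X, Y, Z) onto the image plane.

    Assumes an ideal pinhole camera at the origin looking along +Z
    (an illustrative model, not a calibrated real camera).
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


# Two different 3D points on the same ray through the camera center.
near_point = (1.0, 2.0, 4.0)
far_point = tuple(3.0 * c for c in near_point)  # three times farther away

# Both project to the same 2D location, so depth is lost in the image.
same_pixel = project(near_point) == project(far_point)
```

This is why prior knowledge about the world’s structure is needed: the image alone is consistent with a whole family of scenes.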
CS5670: Introduction to Computer Vision
• Project-based course whose goal is to teach you the
basics of computer vision – image processing, geometry,
recognition – in a hands-on way
Course requirements
• Prerequisites
– Data structures
– Good working knowledge of Python programming
– Linear algebra
– Vector calculus

• Course does not assume prior imaging experience


– computer vision, image processing, graphics, etc.
Course overview (tentative)
1. Low-level vision
– image processing, edge detection,
feature detection, cameras, image
formation

2. Geometry & appearance


– projective geometry, stereo, structure
from motion, optimization, lighting &
materials

3. Recognition & generative models
– object classification, deep learning,
diffusion models
1. Low-level vision
• Basic image processing and image formation

Filtering, edge detection

Feature extraction

Image formation


Project: Hybrid images
Project: Feature detection and matching
2. Geometry & appearance

Image credit: IDS Imaging

Projective geometry
Stereo vision
Multi-view stereo
Structure from motion


Project: Creating panoramas
Project: 3D reconstruction
3. Recognition, Deep Learning & Generative Models

“dog”

Image classification
Convolutional Neural Networks

“a class watching a computer vision lecture at Cornell Tech”

Image generation
Project: Neural Radiance Fields (NeRFs)
Lectures
• Lectures will be held in person in Bloomberg 131
• If there is an instance where you need to attend lecture
remotely, please reach out to the instructor for approval
Grading
• Approximately weekly short quizzes (typically at the
beginning of class on Thursdays)
• One midterm (take-home), one final exam (in class)

• Grade breakdown (subject to minor tweaks):


– Quizzes: 5% (lowest quiz grade dropped)
– Midterm: 16%
– Programming projects: 63%
– Final exam: 16%
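
As a rough illustration of how the breakdown above combines, here is a minimal sketch. It assumes every score is on a 0 to 100 scale and that all programming projects count equally; neither assumption is stated in the slides, the weights are "subject to minor tweaks", and the names `WEIGHTS`, `quiz_average`, and `course_grade` are hypothetical.

```python
# Weights taken from the grade breakdown above (subject to minor tweaks).
WEIGHTS = {"quizzes": 0.05, "midterm": 0.16, "projects": 0.63, "final": 0.16}


def quiz_average(quiz_scores):
    """Average quiz score after dropping the single lowest quiz."""
    if not quiz_scores:
        return 0.0
    if len(quiz_scores) == 1:
        return float(quiz_scores[0])  # nothing sensible to drop
    kept = sorted(quiz_scores)[1:]    # discard the lowest score
    return sum(kept) / len(kept)


def course_grade(quiz_scores, midterm, project_scores, final):
    """Weighted course grade; assumes equal-weight projects (an assumption)."""
    projects = sum(project_scores) / len(project_scores)
    return (
        WEIGHTS["quizzes"] * quiz_average(quiz_scores)
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["projects"] * projects
        + WEIGHTS["final"] * final
    )


# Example: one missed quiz (0) is dropped and does not affect the result.
example = course_grade([100, 0, 100], midterm=80, project_scores=[90, 100], final=70)
```

With 63% of the grade coming from programming projects, project scores dominate the final result in this sketch.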
Late policy

• Four free “slip days” will be available for the semester

• A late project will be penalized by 10% for each day it is late (excepting slip days), and no extra credit will be awarded
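
One possible reading of this policy, sketched under stated assumptions: slip days are consumed automatically before any penalty applies, and each remaining late day removes 10% of the earned score, floored at zero. The slides do not specify either detail, so the function `apply_late_policy` and its parameters are hypothetical.

```python
def apply_late_policy(raw_score, days_late, slip_days_left, penalty_per_day=0.10):
    """Return (penalized_score, slip_days_remaining).

    Assumptions (not confirmed by the slides): slip days are used first,
    and the penalty is 10% of the earned score per remaining late day.
    """
    slip_used = min(days_late, slip_days_left)
    penalized_days = days_late - slip_used
    factor = max(0.0, 1.0 - penalty_per_day * penalized_days)
    return raw_score * factor, slip_days_left - slip_used


# A project scoring 90, submitted 3 days late with 2 slip days remaining:
# 2 days are covered by slip days, 1 day is penalized.
score, remaining = apply_late_policy(90.0, days_late=3, slip_days_left=2)
```

Check the course webpage for the actual rules on how slip days are applied.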
Academic Integrity
• Assignments will be done solo or in pairs (we’ll let you know
for each project)
• Please do not leave any code public on GitHub (or the like) at
the end of the semester!
• We will follow the Cornell Code of Academic Integrity
(http://cuinfo.cornell.edu/aic.cfm)
• If you use ChatGPT (or CoPilot, or similar) on coding
assignments, you must disclose that with your submission
– BUT: We advise you to do all coding yourself, unassisted. You will
learn less, and become less capable experts in vision, if you rely on
LLMs.
Questions?
