
Computer_vision_part1

Computer Vision

IFT6758 - Data Science

Sources:
http://www.cs.cmu.edu/~16385/
http://cs231n.stanford.edu/2018/syllabus.html
http://www.cse.psu.edu/~rtc12/CSE486/
What is Computer vision?

Vision is the act of knowing what is
where by looking.

--Aristotle

• Computer vision is a field of study focused on the problem of helping computers to see.

!2
Computer vision vs. Image processing

• Computer vision is distinct from image processing.

• Image processing is the process of creating a new image from an existing image, typically simplifying or enhancing the content in some way.

• Computer vision is concerned with understanding the content of an image.

!3
CV tasks (4 Rs)

1. Reconstruction: Multiview Geometry, 3D Vision, Shape-from-X

2. Registration: Tracking, Alignment, Optical Flow, Correspondence

3. Reorganization: Clustering, Unsupervised Learning, Segmentation, Perceptual Organization

4. Recognition: Verification, Identification, Detection

!8
Why study Computer Vision?

• Images and movies are everywhere

• Fast-growing collection of useful applications

• building representations of the 3D world from pictures

• automated surveillance (who’s doing what)

• movie post-processing

• face finding

• Greater understanding of human vision

!9
Earth view (3D modelling)

Image from Microsoft’s Virtual Earth (see also: Google Earth)

!10
Optical character recognition (OCR)

Technology to convert scanned docs to text


• If you have a scanner, it probably came with OCR software

Digit recognition, AT&T labs: http://www.research.att.com/~yann/

License plate readers: http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

!11
Face and smile detection

Many new digital cameras now detect faces


• Canon, Sony, Fuji, …

Who is she?

Sony Cyber-shot® T70 Digital Still Camera


!12
Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns”

Fingerprint scanners on many new laptops, other devices

Face recognition systems now beginning to appear more widely: http://www.sensiblevision.com/
!13
Object recognition

This is becoming real:


• Microsoft Research
• Point & Find, Nokia
LaneHawk by EvolutionRobotics: “A smart camera is flush-mounted in the checkout lane, continuously watching for items…”

!14
Sports and games

Digimask: put your face on a 3D avatar.

Nintendo Wii has camera-based IR tracking built in.

Sportvision first down line. Nice explanation on www.howstuffworks.com

!15
Robotics

NASA’s Mars Spirit Rover: http://en.wikipedia.org/wiki/Spirit_rover

http://www.robocup.org/

!16
Medical imaging

3D imaging: MRI, CT

Image guided surgery: Grimson et al., MIT

!17
CV Challenges

!18
How do machines see an image?

• Machines see and process everything using numbers, including images and
text. How do you convert images to numbers?

!19
Image
• Every number represents the pixel intensity at that particular location. In a grayscale image, for example, every pixel contains only one value: its brightness, conventionally from 0 (black) up to 255 (white) for 8-bit images.
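A minimal sketch of a grayscale image as a grid of numbers. The pixel values are made up for illustration, and the helper names `pixel` and `mean_intensity` are hypothetical. Plain Python lists are used so the sketch runs without NumPy or OpenCV.

```python
# A tiny 3x4 grayscale "image": each entry is one pixel intensity.
# Assumed convention: 0 is black, 255 is white (8-bit image).
image = [
    [0,  64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]

height = len(image)      # number of rows
width = len(image[0])    # number of columns


def pixel(img, row, col):
    """Return the intensity stored at (row, col)."""
    return img[row][col]


def mean_intensity(img):
    """Average brightness over all pixels."""
    values = [v for row in img for v in row]
    return sum(values) / len(values)
```

Here `pixel(image, 0, 3)` reads the brightest pixel in the top row, and `mean_intensity(image)` summarizes the whole image with a single number.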

!20
What is an image?
• Color images will have multiple values for a single pixel. These values represent the
intensity of respective channels – Red, Green and Blue channels for RGB images,
for instance.

!21
Challenges of recognition

!22
Challenge: Viewpoint variation

!23
Challenge: Illumination

Object appearance changes with respect to lighting magnitude and direction.

!25
What is Color?

[Figure: illumination, surface reflection, and sensor response]

!26
Color
• Color percepts are a composition of three factors (illumination, surface
reflectance, sensor response)

• We can’t easily factor the color we see in the image to infer illumination and
material (even if sensor properties are fixed and known).

!27
Is The Dress Blue and Black or White and Gold?

The photo of this dress drew more than 670,000 people to Buzzfeed at the same time and prompted 900,000 visitors to take a poll.

!28
Challenge: Color Constancy

!29
Challenge: Deformation

!30
Challenge: Occlusion

!31
Challenge: Variation

!32
Distance Metric to compare images

!33
Distance metrics on pixels
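The slide images for this part are not recoverable, so the sketch below is an assumption about what they showed: the two most common pixel-wise metrics, L1 (sum of absolute differences) and L2 (Euclidean distance). Function names and pixel values are illustrative only.

```python
import math


def l1_distance(a, b):
    """Sum of absolute pixel differences between two same-size images."""
    return sum(
        abs(p - q)
        for row_a, row_b in zip(a, b)
        for p, q in zip(row_a, row_b)
    )


def l2_distance(a, b):
    """Square root of the sum of squared pixel differences."""
    return math.sqrt(sum(
        (p - q) ** 2
        for row_a, row_b in zip(a, b)
        for p, q in zip(row_a, row_b)
    ))


# Two 2x2 grayscale images with made-up values.
img_a = [[56, 32], [90, 23]]
img_b = [[10, 20], [24, 17]]

d1 = l1_distance(img_a, img_b)
d2 = l2_distance(img_a, img_b)
```

Both metrics are zero only when the two images are identical pixel for pixel, which is why they are sensitive to the challenges listed earlier (viewpoint, illumination, occlusion).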

!34
CV main Operations

!35
CV pipeline

!36
CV main Operations

!37
CV Pipeline

!40
Images as functions

!41
Input Image

• By default, the imread function reads color images in BGR (Blue-Green-Red) channel order. We can read images in different formats by passing extra flags to the imread function:

– cv2.IMREAD_COLOR: default flag; loads a color image
– cv2.IMREAD_GRAYSCALE: loads the image in grayscale format
– cv2.IMREAD_UNCHANGED: loads the image in its given format, including the alpha channel. The alpha channel stores the transparency information: the higher the alpha value, the more opaque the pixel.
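Because most other tools expect RGB order, BGR data usually needs its channels reordered before display. The sketch below shows the idea on plain Python tuples so it runs without OpenCV. The helper `bgr_to_rgb` is hypothetical and assumes exactly three channels per pixel. With OpenCV itself, the usual route is `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel: (B, G, R) -> (R, G, B)."""
    return [[px[::-1] for px in row] for row in image]


# One row holding a pure-blue and a pure-red pixel, stored in BGR order.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]

rgb_image = bgr_to_rgb(bgr_image)
```

Applying the same reversal twice returns the original data, since the operation only swaps the first and last channels.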

!43
CV Pipeline

!44
Image Augmentation

• Data augmentation uses the available data samples to produce new ones by applying image operations like rotation, scaling, translation, etc. This makes our model more robust to changes in the input and leads to better generalization.
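A minimal sketch of augmentation on a tiny grayscale image, assuming two simple operations (horizontal flip and a one-pixel shift). The function names are illustrative and not taken from any library.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def shift_right(image, dx, fill=0):
    """Translate the image dx >= 0 pixels to the right, padding with `fill`."""
    width = len(image[0])
    return [([fill] * dx + row)[:width] for row in image]


def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, flip_horizontal(image), shift_right(image, 1)]


sample = [[1, 2, 3],
          [4, 5, 6]]

augmented = augment(sample)
```

One labeled sample becomes three training samples that share the same label.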

!45
Image transformations

!46
Transformation: Warping

!49
Forward Warping

!50
Forward Warping
(resizing)

!51
Backward Warping

!52
Where do the pixels go?

p = (x,y) p’ = (x’,y’)

• Transformation T is a coordinate-changing machine: p’ = T(p)
• What does it mean that T is global?
– It is the same for any point p
– It can be described by just a few numbers (parameters)
• Let’s consider linear transformations (can be represented by a 2×2 matrix):

!54
Linear Transformation

• Uniform scaling by s:


What is the inverse?

!55
Linear Transformation

• Rotation by angle θ (about the origin)


What is the inverse?


For rotations: $R^{-1} = R^{T}$
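The inverse questions on the scaling and rotation slides can be checked numerically. This is a sketch under the usual conventions (counter-clockwise rotation about the origin, column-vector points). The helper names are illustrative.

```python
import math


def scaling(s):
    """Uniform scaling by s about the origin."""
    return [[s, 0.0], [0.0, s]]


def rotation(theta):
    """Counter-clockwise rotation by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, p):
    """Apply a 2x2 matrix to the point p = (x, y)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


R = rotation(math.radians(30))
rotation_check = matmul(transpose(R), R)          # should be the identity
scaling_check = matmul(scaling(2.0), scaling(0.5))  # scaling by 1/s undoes s
```

Both products come out as the identity matrix (up to floating-point rounding), so scaling by 1/s inverts scaling by s, and the transpose inverts a rotation.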

!56
Transformation with 2x2 Matrices
• What types of transformations can be represented with a
2x2 matrix?
2D mirror about Y axis?

2D mirror across line y = x?
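One possible answer to the two mirror questions, as a sketch: both are linear maps, so each has a 2×2 matrix. The constant names below are illustrative. The last check shows why translation is not on the list: every 2×2 matrix maps the origin to itself.

```python
# Mirror about the Y axis: (x, y) -> (-x, y)
MIRROR_ABOUT_Y_AXIS = [[-1, 0],
                       [0, 1]]

# Mirror across the line y = x: (x, y) -> (y, x)
MIRROR_ACROSS_Y_EQUALS_X = [[0, 1],
                            [1, 0]]


def apply(m, p):
    """Apply a 2x2 matrix to the point p = (x, y)."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


mirrored_y = apply(MIRROR_ABOUT_Y_AXIS, (2, 3))
mirrored_diag = apply(MIRROR_ACROSS_Y_EQUALS_X, (2, 3))
origin_image = apply(MIRROR_ABOUT_Y_AXIS, (0, 0))
```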

!57
Affine Transformation

• Affine transformations are combinations of …
– Linear transformations, and
– Translations

$$\begin{bmatrix} x' \\ y' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

• Properties of affine transformations:
– Origin does not necessarily map to origin
– Lines map to lines
– Parallel lines remain parallel
– Ratios are preserved
– Closed under composition

!58
Affine Transformation

Any transformation with last row [ 0 0 1 ] we call an affine transformation.

!59
Basic transformation

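Translate:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Scale:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

2D in-plane rotation:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Shear:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & sh_x & 0 \\ sh_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$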
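The four basic matrices can be built and composed in a few lines. This is a sketch with illustrative function names and plain nested lists for the 3×3 matrices. It also shows that a product of affine matrices keeps the last row [0 0 1], which is what "closed under composition" means.

```python
import math


def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def shear(shx, shy):
    return [[1, shx, 0], [shy, 1, 0], [0, 0, 1]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, point):
    """Apply a 3x3 matrix to (x, y) using homogeneous coordinates."""
    x, y = point
    xh, yh, w = (row[0] * x + row[1] * y + row[2] for row in m)
    return (xh / w, yh / w)


# Scale first, then translate: the rightmost matrix acts on the point first.
combined = matmul(translate(10, 5), scale(2, 3))
moved = apply(combined, (1, 1))
```

The order of the factors matters: swapping the two matrices would translate first and then scale the translated point.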

!60
Projective transformation

What happens when we mess with this row (the last row, [ 0 0 1 ], of an affine transformation)?

!61
Projective transformation

Called a homography
(or planar perspective map)

!62
Projective transformation

• Projective transformations …
– Affine transformations, and
– Projective warps

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

• Properties of projective transformations:
– Origin does not necessarily map to origin
– Lines map to lines
– Parallel lines do not necessarily remain parallel
– Ratios are not preserved
– Closed under composition
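A small sketch of applying a homography to 2D points. The matrix `H` holds made-up values chosen only so that the last row differs from [0 0 1], and the function name is illustrative. The key step is the division by w’ after the matrix product.

```python
def apply_homography(h, point):
    """Map (x, y) through a 3x3 homography, then divide by w'."""
    x, y = point
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return (xh / w, yh / w)


# Hypothetical homography: identity except for one entry in the last row.
H = [[1, 0, 0],
     [0, 1, 0],
     [0.5, 0, 1]]

# Two parallel horizontal segments, y = 0 and y = 1.
bottom = [apply_homography(H, p) for p in [(0, 0), (2, 0)]]
top = [apply_homography(H, p) for p in [(0, 1), (2, 1)]]
```

After the warp the bottom segment is still horizontal, while the top one runs from (0, 1) to (1, 0.5), so the two are no longer parallel. Both are still straight lines.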
!63
Backward Warping

!64
Bilinear interpolation

Bilinear interpolation: the output pixel value is a weighted average of the pixels in the nearest 2-by-2 neighborhood.
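The weighted average can be sketched as two linear interpolations along x followed by one along y. The function below is illustrative and assumes the sample point lies inside the image (0 ≤ x ≤ width − 1, 0 ≤ y ≤ height − 1), with x as the column coordinate and y as the row coordinate.

```python
import math


def bilinear(image, x, y):
    """Sample a grayscale image at a real-valued location (x, y)."""
    height, width = len(image), len(image[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the far neighbours so samples on the last row/column stay in range.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0  # fractional offsets inside the 2x2 cell

    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


image = [[10, 20],
         [30, 40]]

center = bilinear(image, 0.5, 0.5)
```

At the exact centre of the 2×2 cell all four pixels get weight 1/4, and at integer coordinates the function returns the stored pixel value.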

!65
Linear Interpolation
(recall)

!66
Bilinear Interpolation

!68
Image resizing

• Most machine learning models work with a fixed-size input, and the same applies to computer vision models: the images we use for training our model must all be the same size.

• Images can be easily scaled up and down.

• OpenCV supports several interpolation and downsampling methods. OpenCV’s resize function uses bilinear interpolation by default.
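Resizing can be sketched as backward warping: for each output pixel, compute where it comes from in the source image and read a value there. To keep the sketch short it uses nearest-neighbour sampling, which is a simplification and not the bilinear default mentioned above. The function name is illustrative.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a grayscale image by backward mapping with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(new_height):
        # Backward mapping: which source row does output row r come from?
        src_r = min(int(r * height / new_height), height - 1)
        row = []
        for c in range(new_width):
            src_c = min(int(c * width / new_width), width - 1)
            row.append(image[src_r][src_c])
        out.append(row)
    return out


small = [[1, 2],
         [3, 4]]

big = resize_nearest(small, 4, 4)      # upscale 2x
restored = resize_nearest(big, 2, 2)   # downscale back
```

Because every output pixel looks up its own source location, the result has no holes, which is the usual argument for backward over forward warping.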

!69
Resizing with OpenCV

!70
Rotate

• Suppose we are building an image classification model for identifying the animal present in an image. So, both the images shown below should be classified as ‘dog’:

!71
Rotation with OpenCV

!72
Shifting

!73
CV Pipeline

!74
Image transformations

!75
Image filtering

1D

2D

!76
Point processing

!77
Point processing
(How to implement them?)

!78
Filtering

!79
Enhancing Examples

!80
Image filtering

!81
2D discrete-space systems
(filters)

!82
Filter example:
Moving average
• Also known as a box filter

• The set of weights is called the mask or weight kernel

• 2D moving average over a 3×3 neighborhood window
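A minimal sketch of the 3×3 moving average. Border handling is an assumption made here for brevity: border pixels are copied unchanged, whereas real libraries usually pad the image instead. The function name is illustrative.

```python
def box_filter_3x3(image):
    """Replace each interior pixel by the mean of its 3x3 neighbourhood."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels stay as they are
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            total = sum(
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            out[r][c] = total / 9  # every weight in the kernel is 1/9
    return out


# A single bright pixel on a dark background.
spike = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]

smoothed = box_filter_3x3(spike)
```

The isolated bright value of 90 is averaged with its eight dark neighbours and drops to 10, which is the smoothing effect in miniature.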

!83
Filter example:
Moving average

Achieve smoothing effect (remove sharp features)


!88
Filter example:
Image Segmentation
• Non-contextual: grouping pixels with similar global features.

• Contextual: grouping pixels with similar features and in close locations.
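Global thresholding is probably the simplest non-contextual example: each pixel is labeled from its own intensity alone, with no regard to where it sits in the image. The threshold value and function name below are illustrative assumptions.

```python
def threshold_segment(image, threshold):
    """Label a pixel 1 if its intensity exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


image = [[10, 200],
         [220, 30]]

labels = threshold_segment(image, 128)
```

The two bright pixels end up in the same group even though they are not adjacent. A contextual method would also take pixel locations into account.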

!90
Filtering properties

!91
Shift invariant
• Filter replaces each pixel by a linear combination of its
neighbors (and possibly itself). The combination is
determined by the filter’s kernel.

!92
Shift invariant
Is the moving average system shift invariant?

!93
Linear filtering

Linear filtering means linear combination of neighboring pixel values.

• Is the moving average system a linear system?

• Is thresholding a linear system?
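The two questions can be probed numerically on a 1D signal. A linear system must satisfy system(a + b) = system(a) + system(b). The check below tests that for a single pair of inputs, so a pass is only evidence while a failure is a genuine counterexample. All names, the zero padding, and the threshold of 100 are illustrative assumptions.

```python
def moving_average(signal):
    """3-tap moving average with zero padding at both ends."""
    padded = [0] + list(signal) + [0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]


def threshold(signal, t=100):
    """Map values above t to 255 and everything else to 0."""
    return [255 if v > t else 0 for v in signal]


def is_additive(system, a, b):
    """Check system(a + b) == system(a) + system(b) for one pair of inputs."""
    summed = [x + y for x, y in zip(a, b)]
    left = system(summed)
    right = [x + y for x, y in zip(system(a), system(b))]
    return all(abs(l - r) < 1e-9 for l, r in zip(left, right))


a = [60, 60, 60, 60]
b = [60, 30, 90, 60]

average_is_additive = is_additive(moving_average, a, b)
threshold_is_additive = is_additive(threshold, a, b)
```

The moving average passes, while thresholding fails: neither input exceeds 100 on its own, but their sum does.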

!95
Convolution

• Any linear, shift-invariant operator can be represented as a convolution!
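A minimal sketch of 2D convolution, assuming "valid" output (the kernel is only placed where it fits entirely inside the image). The kernel is flipped in both directions before sliding, which is what distinguishes convolution from correlation. The function name is illustrative.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, then slide it over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(flipped[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

box = [[1 / 9] * 3 for _ in range(3)]  # the 3x3 moving-average kernel
impulse = [[1, 0],
           [0, 0]]

averaged = convolve2d(image, box)
shifted = convolve2d(image, impulse)
```

With the box kernel the single valid output is the mean of all nine pixels. With the impulse kernel the output is a cropped copy of the input, which is why the impulse acts as the identity for convolution.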

!96
