
Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
AD8703
BASICS OF COMPUTER
VISION
UNIT IV
Department: AI&DS

Batch/Year : 2020 - 2024 /IV

Created by : Dr. V. Seethalakshmi

Date : 12.08.2023
Table of Contents

1. Contents
2. Course Objectives
3. Pre Requisites (Course Names with Code)
4. Syllabus (With Subject Code, Name, LTPC details)
5. Course Outcomes
6. CO-PO/PSO Mapping
7. Lecture Plan
8. Activity Based Learning
9. Lecture Notes
   Lecture Slides
   Lecture Videos
10. Assignments
11. Part A (Q & A)
12. Part B Qs
13. Supportive Online Certification Courses
14. Real time Applications in day to day life and to Industry
15. Contents Beyond the Syllabus
16. Assessment Schedule
17. Prescribed Text Books & Reference Books
18. Mini Project Suggestions


Course Objectives
AD8703 BASICS OF COMPUTER VISION

COURSE OBJECTIVES
To review image processing techniques for computer vision.
To understand various features and recognition techniques.
To learn about histogram and binary vision.
To apply three-dimensional image analysis techniques.
To study real world applications of computer vision algorithms.
Prerequisite
PREREQUISITE

NIL
Syllabus
AD8703 -BASICS OF COMPUTER VISION

SYLLABUS (L T P C : 3 0 0 3)

UNIT I INTRODUCTION
Image Processing, Computer Vision, What is Computer Vision - Low-level, Mid-level, High-level, Fundamentals of Image Formation, Transformation: Orthogonal, Euclidean, Affine, Projective, Fourier Transform, Convolution and Filtering, Image Enhancement, Restoration, Histogram Processing.

UNIT II FEATURE EXTRACTION AND FEATURE SEGMENTATION
Feature Extraction - Edges - Canny, LOG, DOG; Line detectors (Hough Transform), Corners - Harris and Hessian Affine, Orientation Histogram, SIFT, SURF, HOG, GLOH, Scale-Space Analysis - Image Pyramids and Gaussian derivative filters, Gabor Filters and DWT. Image Segmentation - Region Growing, Edge Based approaches to segmentation, Graph-Cut, Mean-Shift, MRFs, Texture Segmentation.

UNIT III IMAGES, HISTOGRAMS, BINARY VISION
Simple pinhole camera model – Sampling – Quantisation – Colour images – Noise – Smoothing – 1D and 3D histograms – Histogram/Image Equalisation – Histogram Comparison – Back-projection – k-means Clustering – Thresholding – Threshold Detection Methods – Variations on Thresholding – Mathematical Morphology – Connectivity.

UNIT IV 3D VISION AND MOTION
Methods for 3D vision – projection schemes – shape from shading – photometric stereo – shape from texture – shape from focus – active range finding – surface representations – point-based representation – volumetric representations – 3D object recognition – 3D reconstruction – introduction to motion – triangulation – bundle adjustment – translational alignment – parametric motion – spline-based motion – optical flow – layered motion.

UNIT V APPLICATIONS
Overview of Diverse Computer Vision Applications: Document Image Analysis, Biometrics, Object Recognition, Tracking, Medical Image Analysis, Content-Based Image Retrieval, Video Data Processing, Virtual Reality and Augmented Reality.
Course Outcomes
COURSE OUTCOMES

CO1: Recognise and describe how mathematical and scientific concepts are applied in computer vision.

CO2: Identify and interpret appropriate sources of information relating to computer vision.

CO3: Apply knowledge of computer vision to real life scenarios.

CO4: Reflect on the relevance of current and future computer vision applications.

CO5: Discuss principles of computer vision using appropriate language and terminology. Implement various I/O and file management techniques.
CO – PO/ PSO Mapping
CO-PO MAPPING

COs   PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1    3   2   2   2   2   -   -   -   2   -    -    2    2    -    -
CO2    3   3   2   2   2   -   -   -   2   -    -    2    2    -    -
CO3    2   2   1   1   1   -   -   -   1   -    -    1    2    -    -
CO4    3   3   1   1   1   -   -   -   1   -    -    1    2    -    -
CO5    3   3   1   1   1   -   -   -   1   -    -    1    3    1    -

1 – Low, 2 – Medium, 3 – Strong


Lecture Plan
LECTURE PLAN

S No | Topic | No of periods | Proposed date | Pertaining CO | Taxonomy level | Mode of delivery
1 | Methods for 3D vision – projection schemes | 1 | 11.10.2023 | CO4 | K3 | Lecture
2 | Shape from shading – photometric stereo | 1 | 11.10.2023 | CO4 | K4 | Lecture
3 | Shape from texture – shape from focus | 1 | 12.10.2023 | CO4 | K3 | Lecture
4 | Active range finding – surface representations | 1 | 12.10.2023 | CO4 | K4 | Lecture
5 | Point-based representation – volumetric representations | 1 | 16.10.2023 | CO4 | K4 | Lecture
6 | 3D object recognition – 3D reconstruction | 1 | 16.10.2023 | CO4 | K4 | Lecture
7 | Introduction to motion – triangulation | 1 | 18.10.2023 | CO4 | K3 | Lecture
8 | Bundle adjustment – translational alignment – parametric motion | 1 | 18.10.2023 | CO4 | K3 | Lecture
9 | Spline-based motion – optical flow – layered motion | 1 | 19.10.2023 | CO4 | K4 | Lecture
Activity Based Learning
ACTIVITY BASED LEARNING

UiPath.CV.Activities.CVGetTextWithDescriptor
https://scholarworks.calstate.edu/downloads/hh63sx58j
Lecture Notes
3D VISION AND MOTION

4. Methods for 3D vision


What is 3D Machine Vision?

3D vision is becoming more popular and more mainstream within machine vision circles. Why? Because it is a powerful technology capable of providing more accuracy for localisation, recognition, and inspection tasks that traditional 2D machine vision systems cannot reliably or repeatably succeed at.

As machine vision applications grow more complex, more creative solutions are required to solve more difficult problems. 3D machine vision comprises an alternative set of technologies to 2D machine vision which aims to address these problems in greater depth and provide solutions to difficulties that 2D systems cannot solve.
3D machine vision systems utilise 4 main forms of technology to
generate 3-dimensional images of an object: Stereo Vision, Time of
Flight (ToF), Laser Triangulation (3D Profiling), and Structured Light.
A 3D vision system furthers the analogy of machine vision as the ‘eyes’
of a computer system, as the addition of accurate depth perception
functions more similarly to human eyes.

Stereo vision, for example, utilises two side-by-side cameras, calibrated


and focused on the same object to provide full field of view 3D
measurements in an unstructured and dynamic environment, based on
triangulation of rays from multiple perspectives.

Laser triangulation, by contrast, measures the alteration of a laser beam


when projected onto the object using a camera perpendicular to the
beam. Where stereo vision can be used to capture stationary objects,
laser triangulation requires a continuous linear motion, which can be
achieved with a conveyor belt for example. This constraint is resolved in
other ways, however, as laser triangulation can provide a spectacularly
detailed point cloud map of the object.
Types of 3D Machine Vision

3D machine vision techniques can be broken down into three broad


categories, which have been listed below:
1. Laser Triangulation
In the laser triangulation technique, the object under observation is
usually probed by a line laser which forms a line of light through which
the object is passed. A camera at a known angle captures the images on
the laser line.
Many profiles are generated from which a three-dimensional image is created.
One important requisite for laser triangulation is that the object moves relative to
the camera.
2. 3D Stereo Vision
Fundamentally, the geometric process of 3D stereo vision is based on the use of
two cameras. These two cameras are analogous to a pair of human eyes that
capture 2D images of the objects. Then, the images are superimposed to form a
3D image using specialized algorithms. 3D stereo vision allows for the movement
of the object during the recording. One con of 3D stereo vision is that it requires
two cameras, raising the costs. 3D stereo vision is prevalent in robotics and
surveillance applications.
3. Time of Flight
These 3D vision systems measure distances using the time-of-flight (ToF) principle. The basic idea behind this technique: a light pulse illuminates the desired area. A camera then measures the time which the light takes to reach the object and return. That round-trip time, in turn, gives the depth, thus providing data for the third dimension.
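
To make the arithmetic concrete, here is a minimal sketch of the time-of-flight depth calculation (an illustration written for these notes; the round-trip time is an assumed example value, not sensor data):

```python
# Minimal sketch of the time-of-flight depth calculation (illustrative only).
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def tof_depth(round_trip_time_s: float) -> float:
    """Depth = (speed of light * round-trip time) / 2."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# Example: a pulse that returns after 10 nanoseconds
print(tof_depth(10e-9))  # ~1.5 metres
```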

4.1 Projection schemes
Projection is the process of converting a 3D object into a 2D object, i.e. mapping or transforming the object onto a projection plane (view plane). The view plane is the display surface. More generally, representing an n-dimensional object in n-1 dimensions is known as projection; here a 3D object is represented on a 2D plane, {(x,y,z) -> (x,y)}. When geometric objects are formed by the intersection of lines with a plane, the plane is called the projection plane and the lines are called projections.
Types of Projections:
 Parallel projections
 Perspective projections

4.1.1 Center of Projection:

It is an arbitrary point from which the lines are drawn through each point of an object.
• If the COP is located at a finite point in 3D space, a perspective projection is the result.
• If the COP is located at infinity, all the lines are parallel and the result is a parallel projection.
 Parallel Projection:

A parallel projection is formed by extending parallel lines from each vertex of the object until they intersect the plane of the screen. Parallel projection transforms the object to the view plane along parallel lines. A projection is said to be parallel if the center of projection is at an infinite distance from the projection plane. A parallel projection preserves the relative proportions of objects, so accurate views of the various sides of an object are obtained; it is therefore used in drafting to produce scale drawings of 3D objects. The projection lines are parallel to each other, extend from the object and intersect the view plane; the point of intersection is the projection of the vertex. It is not a realistic representation.

Parallel projection is divided into two types, and these two types are subdivided further.
 Orthographic Projections:
In orthographic projection the direction of projection is normal to the projection plane. The projection lines are parallel to each other, making an angle of 90 degrees with the view plane. Orthographic parallel projections are done by projecting points along parallel lines that are perpendicular to the projection plane.
Orthographic projections are most often used to produce the front, side, and top views of an object, which are called elevations. Engineering and architectural drawings commonly employ these orthographic projections. Transformation equations for an orthographic parallel projection are straightforward. Some special orthographic parallel projections involve plan views and side elevations. We can also perform orthographic projections that display more than one face of an object; such views are called axonometric orthographic projections.

 Oblique Projections:
Oblique projections are obtained by projectors along parallel lines that are
not perpendicular to the projection plane. An oblique projection shows the
front and top surfaces that include the three dimensions of height, width
and depth. The front or principal surface of an object is parallel to the plane
of projection. Effective in pictorial representation.
 Isometric Projections: Orthographic projections that show more than one side of an object are called axonometric orthographic projections. The most common axonometric projection is an isometric projection. In this projection, parallelism of lines is preserved but angles are not preserved.

 Dimetric projections: The direction of projection makes equal angles with two of the principal axes.

 Trimetric projections: The direction of projection makes unequal angles with the three principal axes.

 Cavalier Projections:
All lines perpendicular to the projection plane are projected with no change in length. The projectors make an angle of 45 degrees with the projection plane, so the lengths of lines perpendicular to the plane do not change.

 Cabinet Projections:
All lines perpendicular to the projection plane are projected to one half of their length. This gives a more realistic appearance of the object. The projectors make an angle of 63.4 degrees with the projection plane, so lines perpendicular to the viewing surface are projected at half their actual length.

 Perspective Projections:
 A perspective projection is produced by straight lines radiating from a common point and passing through points on the object to the plane of projection.
 Perspective projection is a geometric technique used to produce a three-dimensional graphic image on a plane, corresponding to what a person sees.
 Any set of parallel lines of the object that are not parallel to the projection plane are projected into converging lines. A different set of parallel lines will have a separate vanishing point.

• Coordinate positions are transferred to the view plane along lines that converge to a point called the projection reference point.

• Distances and angles are not preserved, and parallel lines do not remain parallel. Instead, they all converge at a single point called the center of projection. There are three types of perspective projections.

Two characteristics of perspective projection are the vanishing point and perspective foreshortening. Due to foreshortening, objects and lengths appear smaller the farther they are from the center of projection. The projection lines are not parallel, and we specify a center of projection (COP).

Different types of perspective projections:

1. One point perspective projection: one principal axis has a finite vanishing point. It is the simplest to draw.
2. Two point perspective projection: exactly two principal axes have vanishing points. It gives a better impression of depth.
3. Three point perspective projection: all three principal axes have finite vanishing points. It is the most difficult to draw.

 Perspective foreshortening:

The size of the perspective projection of an object varies inversely with the distance of the object from the center of projection.
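
To make the two projection families concrete, the following minimal sketch (written for these notes, with an assumed focal length and sample point) projects a 3D point both orthographically and perspectively; note how the perspective result scales with 1/z, which is exactly the foreshortening described above:

```python
import numpy as np

def orthographic_project(point_3d):
    """Parallel (orthographic) projection: simply drop the z coordinate."""
    x, y, _z = point_3d
    return np.array([x, y])

def perspective_project(point_3d, f=1.0):
    """Pinhole perspective projection with focal length f and the COP at the origin.
    Image coordinates shrink as the point moves away from the camera (1/z scaling)."""
    x, y, z = point_3d
    return np.array([f * x / z, f * y / z])

p = np.array([2.0, 1.0, 4.0])           # illustrative 3D point
print(orthographic_project(p))           # [2. 1.]   -- size independent of depth
print(perspective_project(p, f=1.0))     # [0.5 0.25] -- scaled by 1/z (foreshortening)
```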
4.3 Photometric stereo

Photometric stereo is a technique in computer vision for estimating the surface normals of an object by observing that object under different lighting conditions. Having these surface normals, we can recreate the shape of the object in 3D. Theoretically, a minimum of three light directions is required to fully recover the surface normals of the 3D shape, but because of noise, using more than three light sources improves the accuracy of the recovery. There are some limitations to this method:

1. The light source needs to be far away so that it can be approximated as a point (directional) light source.

2. Specular (bright reflective) spots or dark spots do not give accurate results because the camera sensor clips the signal.

3. This model does not take shadows into account.

The image brightness received by the camera at each pixel, in each colour channel v = R, G, B, is given by

    I_v = a_v (n · L)        (1)

where a_v is the albedo of the surface at that pixel, n is the surface normal and L is the direction of the light source.

Calibration

The first step before doing any 3D surface recovery is calibration. To do this, we use a chrome sphere, a camera with fixed position and several light sources with fixed positions as well. We then use images taken with one light source turned on at a time, i.e. we turn each light on, capture the chrome sphere, and produce a mask image.

Using equation (1) and the two images, we can figure out this one light source's direction. This is because we know the surface normal of the chrome sphere at any one pixel of the image, the surface albedo of the chrome sphere (~1 in grayscale) and the light intensity (given by the pixel brightness).

After figuring out several light source directions, we can now apply the lights to different objects. We first find the surface normal using a grayscale image, because this simplifies the albedo to a single unknown.

We use the minimum squared error (least-squares) method,

    E = sum_k ( I_k - a (n · L_k) )^2        (2)

where k runs over the light sources. We take the derivative of the squared error with respect to the surface normal and set the derivative equal to 0 (to find the minimum). We then end up with the following equation:

    ( sum_k L_k L_k^T ) (a n) = sum_k I_k L_k        (3)

Here a is just a scale factor for the surface normal (because the image is grayscale), so we can simply ignore it, as we are only interested in the unit vector of the surface normal. Equation (3) is essentially in matrix form, so we can use the least-squares method to solve for the surface normal. Similarly, we can solve for the RGB albedo after the light source directions have been calculated.
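
A minimal sketch of this least-squares step, assuming the light directions L (K×3) and the per-pixel intensities I (K values) are already known from calibration; the arrays shown are illustrative, not measured data:

```python
import numpy as np

def photometric_stereo_pixel(L, I):
    """Solve equation (3) for one pixel: L is (K x 3) light directions, I is (K,) intensities.
    Returns the unit surface normal and the grayscale albedo (the scale factor a)."""
    # Least-squares solution of L @ g = I, where g = albedo * normal
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    normal = g / albedo if albedo > 0 else g
    return normal, albedo

# Illustrative example with three assumed light directions and intensities
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.7],
              [0.0, 0.7, 0.7]])
I = np.array([0.9, 0.8, 0.6])
n, a = photometric_stereo_pixel(L, I)
print(n, a)
```

In practice this solve is repeated independently for every pixel inside the object mask.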

 Results

Sample input images (only one input image per object is shown here, but there are actually 11 different images per object, each with the light source in a different direction):
Sample output:
4.5 Shape from Texture

Shape from texture is a computer vision technique where a 3D object is


reconstructed from a 2D image. Although human perception is capable to
realize patterns, estimate depth and

recognize objects in an image by using texture as a cue, the creation of a


system able to mimic that behavior is far from trivial.

Although texture as a meaning is difficult to describe in our case we mean


the repetition of an element or the appearance of a specific template over
a surface. Such element or surface is

called texel (TEXture ELement). Various textures can be seen in figure 1.

The first person who proposed that shape can be perceived from texture was Gibson in 1950 [2]. Gibson used the term texture gradient to denote that areas of a surface which have similar texture to neighbouring areas are nevertheless perceived differently by the observer, due to differences in the orientation of the surfaces and their distance from the observer.

In order to measure the orientation of the texels in a texture, we need to find the slant and tilt angles. Slant denotes the amount and tilt denotes the direction of the slope of the planar surface projected onto the image plane. In figure 2 [3] the angle ρ between the surface normal and the viewing direction is the slant angle, while the angle τ between the image axis and the projection of the surface normal onto the image plane is the tilt angle.

We will present a shape from texture technique that is based on [4]. In [4] they first try to find the frontal texel that will lead them to identify the surface with the best consistency measure. Their aim is to identify the transformation matrices that lead from that texel to all the other ones. Since in the beginning they do not know the appearance of the frontal texel, they define the transformation matrices as the product of the transformations from a randomly chosen texel to all the other texels, multiplied by the transformation matrix from the frontal texel to the randomly selected one. It is then possible to calculate the gradients for the patch of the frontal texel. Then, by using the Fundamental Theorem of Line Integrals, they calculate the cost term, and with the Levenberg-Marquardt method [1] they find the most consistent surface. Finally, having determined the frontal texel, they use it to calculate the surface shape. This is done by solving the transformation:

In Figure 3 we can see the result from their algorithm. In the left image we can see the texture and the needles in each texel, which show their orientation; the second image is the estimated height of the surface of the texture; and finally the right image shows the estimated surface seen from a side view.
4.6 Shape from focus
Shape from focus or shape from defocus is a method of 3D reconstruction which
consists of the use of information about the focus of an optical system to provide
a means of measurement for 3D information. One of the simplest forms of this
method can be found in most autofocus cameras today. In its most simple form,
the methods analyze an image based upon overall contrast from a histogram, the
width of edges, or more commonly, the frequency spectrum derived from a fast
Fourier transform of the image. That information might be used to drive a servo
mechanism in the lens, focusing it until the quantity measured on one of the
earlier parameters is optimized.
The shape-from-focus method is based on the observations made in the previous sections. At the facet-level magnification, rough surfaces produce images that are rich in texture. A defocused optical system plays the role of a low-pass filter. Fig. 4 shows a rough surface of unknown shape placed on a translational stage. The reference plane shown corresponds to the initial position of the stage. The configuration of the optics and sensor defines a single plane, the "focused plane," that is perfectly focused onto the sensor plane. The distance d_f between the focused and reference planes, and the displacement d of the stage with respect to the reference plane, are always known by measurement. Let us focus our attention on the surface element, s, that lies on the unknown surface, S. If the stage is moved towards the focused plane, the image will gradually increase in its degree of focus (high-frequency content) and will be perfectly focused when s lies on the focused plane. Further movement of the element s will again increase the defocusing of its image. If we observe the image area corresponding to s and record the stage displacement d = z at the instant of maximum focus, we can compute the height d_s of s with respect to the stage as d_s = d_f - z. In fact, we can use this value to determine the distance of s with respect to the focused plane, sensor plane, or any other coordinate system defined with respect to the imaging system. This approach may be applied independently to all surface elements to obtain the shape of the entire surface S.

To automatically detect the instant of "best" focus, we develop an image focus measure. In the above discussion, the stage motion and image acquisition were assumed to be continuous processes. In practice, however, it is not feasible to acquire and process such a large number of images in a reasonable amount of time. Therefore, we obtain only a finite number of images; the stage is moved in increments of Δd, and an image is obtained at each stage position (d = n·Δd). By studying the behavior of the focus measure, we develop an interpolation method that uses a small number of focus measures to compute accurate depth estimates. An important feature of the method is its local nature; the depth estimate at an image point is computed only from focus measures recorded at that point. Consequently, the method can adapt well to variations in texture type and content over the object surface.
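
As an illustration of such a focus measure, the sketch below uses one common choice (variance of the Laplacian, not necessarily the measure developed in the original work) to score sharpness and pick the best-focused frame of a focus stack; the file names are placeholders, and in practice the measure is evaluated over small windows around each pixel rather than the whole image:

```python
import cv2
import numpy as np

def focus_measure(gray):
    """Variance of the Laplacian: higher means more high-frequency content (sharper)."""
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def best_focus_index(images):
    """Return the index of the sharpest image in a focus stack."""
    scores = [focus_measure(cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)) for im in images]
    return int(np.argmax(scores))

# Usage sketch: 'stack_n.png' are hypothetical images taken at stage positions d = n*Δd
stack = [cv2.imread(f"stack_{n}.png") for n in range(10)]
n_best = best_focus_index(stack)
print("Maximum focus at stage position index:", n_best)
```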
4.13 Introduction To Motion
 Motion in computer vision is a sub-field within computer vision dealing with
estimation and analysis of information related to motion in image sequences
or in the scene depicted by a camera. Common problems in this field relate to
 Estimating the motion field in an image sequence.
 Estimating the 3D motion of points in the scene based on measurements on
image data.
 Estimation of Ego-motion, the 3D motion of the camera relative to the scene.
 “When objects move at equal speed, those more remote seem to move more
slowly.”
 - Euclid, 300 BC

4.13.1 Simplest idea for video processing: Image Differences

Given images I(u,v,t) and I(u,v,t+δt), compute the difference I(u,v,t+δt) - I(u,v,t).

• This approximates the temporal partial derivative ∂I/∂t.

• At object boundaries the difference is large, which is a cue for segmentation.

• It does not indicate which way objects are moving.
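
A minimal frame-differencing sketch of this idea (the video path and the threshold value are assumptions):

```python
import cv2

cap = cv2.VideoCapture("video.mp4")   # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)                          # |I(t+δt) - I(t)|
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)    # large differences = motion cue
    cv2.imshow("motion mask", mask)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
```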


 Motion field
• The motion field is the projection of the 3D scene motion
into the image
What causes a motion field?
1. Camera moves (translates, rotates)
2. Objects in scene move rigidly
3. Objects articulate (pliers, humans, animals)
4. Objects bend and deform (fish)
5. Blowing smoke, clouds
6. Multiple movements
4.14 Triangulation
 Triangulation refers to the process of determining a point in 3D
space given its projections onto two, or more, images. In order to
solve this problem it is necessary to know the parameters of the
camera projection function from 3D to 2D for the cameras involved,
in the simplest case represented by the camera matrices.
Triangulation is sometimes also referred to as reconstruction or
intersection.

 The triangulation problem is in principle trivial. Since each point in


an image corresponds to a line in 3D space, all points on the line in
3D are projected to the point in the image. If a pair of
corresponding points in two, or more images, can be found it must
be the case that they are the projection of a common 3D point x.
The set of lines generated by the image points must intersect at x
(3D point) and the algebraic formulation of the coordinates of x (3D
point) can be computed in a variety of ways, as is presented below.

 In practice, however, the coordinates of image points cannot be


measured with arbitrary accuracy. Instead, various types of noise,
such as geometric noise from lens distortion or interest point
detection error, lead to inaccuracies in the measured image
coordinates. As a consequence, the lines generated by the
corresponding image points do not always intersect in 3D space.
The problem, then, is to find a 3D point which optimally fits the
measured image points.
In the literature there are multiple proposals for how to define
optimality and how to find the optimal 3D point. Since they are based
on different optimality criteria, the various methods produce different
estimates of the 3D point x when noise is involved.
Triangulation is performed on corresponding image points from two views generated by pinhole cameras. Generalizations of these assumptions are discussed in the literature.

The ideal case of epipolar geometry. A 3D point x is projected onto


two camera images through lines (green) which intersect with each
camera's focal point, O1 and O2. The resulting image points are y1
and y2. The green lines intersect at x.
In practice, the image points y1 and y2 cannot be measured with
arbitrary accuracy. Instead points y'1 and y'2 are detected and
used for the triangulation. The corresponding projection lines
(blue) do not, in general, intersect in 3D space and may also not
intersect with point x

The image to the left illustrates the epipolar geometry of a pair of stereo cameras of the pinhole model. A point x in 3D space is projected onto each image plane along a line (green) which goes through the camera's focal point, O1 and O2 respectively, resulting in the two corresponding image points y1 and y2. If y1 and y2 are given and the geometry of the two cameras is known, the two projection lines (green lines) can be determined, and it must be the case that they intersect at the 3D point x. Using basic linear algebra, that intersection point can be determined in a straightforward way.
The image to the right shows the real case. The positions of the image points y1 and y2 cannot be measured exactly. The reason is a combination of factors such as
•Geometric distortion, for example lens distortion, which means that
the 3D to 2D mapping of the camera deviates from the pinhole
camera model. To some extent these errors can be compensated for,
leaving a residual geometric error.
•A single ray of light from x (3D point) is dispersed in the lens
system of the cameras according to a point spread function. The
recovery of the corresponding image point from measurements of
the dispersed intensity function in the images gives errors.
•In a digital camera, the image intensity function is only measured in discrete sensor elements. Inexact interpolation of the discrete intensity function has to be used to recover the true one.
•The image points y1' and y2' used for triangulation are often found
using various types of feature extractors, for example of corners or
interest points in general. There is an inherent localization error for
any type of feature extraction based on neighborhood operations.
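
As a sketch of one common algebraic formulation (the linear/DLT method, one of the "variety of ways" mentioned above), the 3D point can be recovered from two camera projection matrices and the two image points; P1, P2, y1 and y2 below are illustrative placeholders, not values from the figures:

```python
import numpy as np

def triangulate_dlt(P1, P2, y1, y2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 camera projection matrices; y1, y2: (x, y) image points."""
    A = np.vstack([
        y1[0] * P1[2] - P1[0],
        y1[1] * P1[2] - P1[1],
        y2[0] * P2[2] - P2[0],
        y2[1] * P2[2] - P2[1],
    ])
    # The homogeneous 3D point is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # de-homogenise

# Illustrative example: two cameras separated along the x axis, both looking down z
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0, 1.0])
y1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
y2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, y1, y2))   # ≈ [0.5 0.2 4.0]
```

With noisy image points the projection rays no longer meet, and this linear solution gives the algebraic best fit; other optimality criteria (e.g. minimising reprojection error) lead to the other methods mentioned above.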
4.15 Bundle adjustment
 In photogrammetry and computer stereo vision, bundle adjustment
is simultaneous refining of the 3D coordinates describing the scene
geometry, the parameters of the relative motion, and the optical
characteristics of the camera(s) employed to acquire the images,
given a set of images depicting a number of 3D points from different
viewpoints. Its name refers to the geometrical bundles of light rays
originating from each 3D feature and converging on each camera's
optical center, which are adjusted optimally according to an
optimality criterion involving the corresponding image projections
of all points.
 Bundle adjustment is almost always used as the last step of feature-based 3D reconstruction algorithms. It amounts to an optimization problem on the 3D structure and viewing parameters (i.e., camera pose and possibly intrinsic calibration and radial distortion), to obtain a reconstruction which is optimal under certain assumptions regarding the noise pertaining to the observed image features: if the image error is zero-mean Gaussian, then bundle adjustment is the Maximum Likelihood Estimator. Bundle adjustment was originally conceived in the field of photogrammetry during the 1950s and has increasingly been used by computer vision researchers in recent years.
 Bundle adjustment boils down to minimizing the reprojection
error between the image locations of observed and predicted image
points, which is expressed as the sum of squares of a large number
of nonlinear, real-valued functions.
Thus, the minimization is achieved using nonlinear least-
squares algorithms. Of these, Levenberg–Marquardt has proven to be
one of the most successful due to its ease of implementation and its
use of an effective damping strategy that lends it the ability to
converge quickly from a wide range of initial guesses. By iteratively
linearizing the function to be minimized in the neighborhood of the
current estimate, the Levenberg–Marquardt algorithm involves the
solution of linear systems termed the normal equations. When solving
the minimization problems arising in the framework of bundle
adjustment, the normal equations have a sparse block structure owing
to the lack of interaction among parameters for different 3D points and
cameras. This can be exploited to gain tremendous computational
benefits by employing a sparse variant of the Levenberg–Marquardt
algorithm which explicitly takes advantage of the normal equations
zeros pattern, avoiding storing and operating on zero-elements.
A sparse matrix obtained when solving a modestly sized bundle
adjustment problem. This is the arrowhead sparsity pattern of a
992×992 normal-equation (i.e. approximate Hessian) matrix.
Black regions correspond to nonzero blocks
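
A minimal sketch of the reprojection-error objective that bundle adjustment minimizes, for a single camera and a handful of 3D points. This is a toy illustration using scipy's generic nonlinear least-squares solver (which supports Levenberg–Marquardt), not a full sparse bundle adjuster: only the camera translation is refined, rotation and intrinsics are held fixed by assumption, and all arrays are made-up example data.

```python
import numpy as np
from scipy.optimize import least_squares

def project(points_3d, t):
    """Toy pinhole projection with identity rotation (assumption): translate, then divide by depth."""
    p = points_3d + t                # camera-frame coordinates
    return p[:, :2] / p[:, 2:3]      # perspective divide (focal length = 1)

def reprojection_residuals(params, points_3d, observed_2d):
    """Residuals = predicted image points - observed image points, flattened."""
    return (project(points_3d, params) - observed_2d).ravel()

# Assumed example data: known 3D points and noisy 2D observations from a "true" camera
points_3d = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 6.0], [-1.0, 0.5, 4.0]])
true_t = np.array([0.2, -0.1, 0.0])
observed_2d = project(points_3d, true_t) + 0.001 * np.random.randn(3, 2)

# Refine the camera translation by minimizing the reprojection error (Levenberg-Marquardt)
result = least_squares(reprojection_residuals, x0=np.zeros(3), method="lm",
                       args=(points_3d, observed_2d))
print("Estimated translation:", result.x)
```

In real bundle adjustment the parameter vector also contains the 3D points and the parameters of every camera, which is what produces the sparse arrowhead structure shown in the figure above.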
4.16 Parametric Motion
Parametric model-based motion segmentation algorithms classify the
independent motion in the scene based on the fact that they are instances of
some underlying parametric motion models. Hence, an efficient model selection
criterion would have a vital role in such algorithms if the true underlying motion
model is to be detected.
Motion segmentation is a complicated process involving several dilemmas. That
is, to estimate the motion one needs to know the boundaries of motion and to
locate the motion boundaries one should have a preliminary estimate of motion.
In addition, to reject the outliers and estimate motion parameters, knowing the
true underlying motion model is essential. However, determining the true motion
model requires a clean (outlier-free) region, to which a model selection criterion
can be applied.
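
For instance, a commonly used parametric motion model is the 2D affine model, where each pixel's displacement is a linear function of its coordinates. The sketch below (with made-up parameter values, purely for illustration) shows how such a model maps pixel coordinates:

```python
import numpy as np

def affine_motion(points, params):
    """2D affine motion model: [u, v]^T = A @ [x, y]^T + b, params = (a11, a12, a21, a22, b1, b2)."""
    a11, a12, a21, a22, b1, b2 = params
    A = np.array([[a11, a12], [a21, a22]])
    b = np.array([b1, b2])
    return points @ A.T + b

# Made-up parameters: slight rotation/scale plus a translation of (2, 1) pixels
params = (1.01, -0.02, 0.02, 1.01, 2.0, 1.0)
pts = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
print(affine_motion(pts, params))   # pixel positions under this parametric motion model
```

Model selection, as discussed above, amounts to deciding whether such a 6-parameter model (or a simpler translational one, or a more general homography) best explains the observed motion of a region.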
4.17 Spline-based Motion
4.19 Optical Flow
The most general version of motion estimation is to compute an independent estimate of motion at each pixel, which is generally known as optical (or optic) flow.

 After each iteration of optic flow estimation in a coarse-to-fine pyramid, one of the images is re-warped so that only incremental flow estimates are computed.
 When overlapping patches are used, an efficient implementation is to first compute the outer products of the gradients and intensity errors at every pixel and then perform the overlapping window sums using a moving average filter.
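
A small sketch of sparse flow estimation in practice, using OpenCV's pyramidal Lucas–Kanade tracker (which follows the coarse-to-fine idea described above); the frame file names are placeholders:

```python
import cv2
import numpy as np

# Hypothetical consecutive frames from a video
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Pick good features to track in the first frame (Shi-Tomasi corners)
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: coarse-to-fine estimation of the per-feature flow
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(21, 21), maxLevel=3)

# Keep only the points that were tracked successfully and inspect their flow vectors
good_new, good_old = p1[status == 1], p0[status == 1]
flow = good_new - good_old
print("Mean flow (pixels):", flow.mean(axis=0))
```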
4.20 Layered Motion
In many situations, visual motion is caused by the movement of a small number of objects at different depths in the scene. In such situations, the pixel motions can be described more succinctly (and estimated more reliably) if pixels are grouped into appropriate objects or layers.

Layered motion representations not only lead to compact representations, but they also exploit the information available in multiple video frames, as well as accurately modeling the appearance of pixels near motion discontinuities. A typical pipeline (one step of which is sketched in code after this list) is:
 First estimate affine motion models over a collection of non-overlapping patches.
 Then cluster these estimates using k-means.
 Then alternate between assigning pixels to layers and recomputing motion estimates for each layer using the assigned pixels.
 Layers are constructed by warping and merging the various layer pieces from all of the frames together.

 The motion of each layer can be described using a 3D plane equation plus per-pixel residual depth offsets; rigid planar motions (homographies) are then used instead of affine motion models.
 The final model refinement re-optimizes the layer pixels by minimizing the discrepancy between the re-synthesized and observed motion sequences; this requires a rough initial assignment of pixels to layers.
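
A minimal sketch of the clustering step in this pipeline: affine motion parameters estimated per patch (here random placeholder values, not real estimates) are grouped into layers with k-means; the number of layers and the data are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder: one 6-parameter affine motion estimate per image patch
rng = np.random.default_rng(0)
patch_motions = np.vstack([
    rng.normal([1, 0, 0, 1, 5, 0], 0.05, size=(40, 6)),   # patches translating roughly (5, 0) px
    rng.normal([1, 0, 0, 1, 0, -3], 0.05, size=(40, 6)),  # patches translating roughly (0, -3) px
])

# Cluster the per-patch motion estimates into layers (assumed 2 layers here)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(patch_motions)
print("Patches per layer:", np.bincount(labels))
```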
Lecture Slides

Lecture Videos

Assignments
1. Gaussian filtering. Gradient magnitude. Canny edge detection.
2. Detecting interest points. Simple matching of features.
3. Stereo correspondence analysis.
4. Photometric Stereo.
Each assignment contains:
1. A paper discussing the theory, task, methods and results.
2. A src folder containing:
   - the code
   - a README file with instructions on how to run the code
Part A Q & A
PART -A

1. What is the purpose of "Shape from Focus" in 3D computer vision?


The purpose of "Shape from Focus" in 3D computer vision is to estimate
the depth or three-dimensional information of a scene or object by
analyzing the focus or defocus of images taken at different focal
lengths.
2: What does "Bundle Adjustment" aim to improve in 3D computer
vision?
"Bundle Adjustment" in 3D computer vision aims to improve the
accuracy of camera parameters (pose and calibration) and the 3D
structure of a scene by minimizing errors, resulting in a more precise 3D
reconstruction of the environment.
3. What is the primary objective of "Bundle Adjustment" in the field of
3D computer vision?
The primary objective of "Bundle Adjustment" in 3D computer vision is
to refine and optimize the parameters of cameras and the 3D structure
of a scene simultaneously to improve the accuracy of 3D reconstruction.
4. What is the fundamental goal of "Triangulation" in 3D computer
vision?
The fundamental goal of "Triangulation" in 3D computer vision is to
calculate the 3D position of a point in space by intersecting rays from
multiple cameras or sensors that observe the same point in 2D.
5. What is the primary purpose of "Shape from Shading" in 3D
computer vision?
The primary purpose of "Shape from Shading" is to estimate the three-
dimensional shape of an object by analyzing how light and shadows
interact with its surface.

6. How does "Photometric Stereo" work in 3D computer vision?


"Photometric Stereo" estimates 3D shape by analyzing how the
appearance of an object changes in multiple images taken under
different lighting conditions.
7. What is the goal of "Shape from Texture" in computer vision?
The goal of "Shape from Texture" is to infer the three-dimensional
shape of an object by analyzing the texture patterns on its surface.
8. How does "Shape from Focus" estimate depth information in 3D
computer vision?
"Shape from Focus" estimates depth information by analyzing the focus
or defocus of images taken at different focal lengths.
9. What is the purpose of "Active Range Finding" in 3D computer
vision?
The purpose of "Active Range Finding" is to actively project patterns or
laser beams onto a scene to measure distances and create 3D models.
10. What are "Point-Based Representations" in the context of 3D
vision?
"Point-Based Representations" involve representing a 3D surface as a
collection of 3D points.
11. How are "Volumetric Representations" used in 3D computer vision?
"Volumetric Representations" represent a 3D object as a volume, often
used in medical imaging and other fields.


12. What is the key objective of "3D Object Recognition" in computer


vision?
The key objective of "3D Object Recognition" is to identify and classify
objects based on their 3D structure.
13. Define "3D Reconstruction" in computer vision.
"3D Reconstruction" is the process of creating a 3D model or scene
from 2D images or sensor data.
14. How does "Triangulation" work in 3D computer vision?
"Triangulation" estimates 3D points by intersecting rays from multiple
cameras or sensors observing the same point in 2D.
15. What is the primary purpose of "Translational Alignment" in 3D
computer vision?
The primary purpose of "Translational Alignment" is to align objects or
scenes based on translation or movement.
16. How are "Parametric Motion" models used in the context of motion
analysis?
"Parametric Motion" models represent motion using mathematical
parameters, allowing for the description and prediction of object
movements.
17. What is the significance of "Spline-Based Motion" in motion
analysis?
"Spline-Based Motion" involves using spline curves to model and
interpolate motion paths, providing smoother and more realistic motion
representations.


18. How does "Optical Flow" contribute to motion analysis in computer


vision?
"Optical Flow" is used to estimate the motion of objects within an
image sequence, helping to track object movement.
19. What is meant by "Layered Motion" in the context of motion
analysis?
"Layered Motion" deals with multiple moving objects in a scene and
often involves motion segmentation, separating the movements of
different objects.

Part B Q

PART-B
1. What are the key methods used for 3D vision in computer vision?
2. How do projection schemes play a role in 3D computer vision?
3. Explain the concept of "Shape from Shading" in 3D vision.
4. What is the purpose of "Photometric Stereo" in 3D computer
vision?
5. How does "Shape from Texture" contribute to 3D scene
understanding?
6. What is "Shape from Focus" and how is it applied in 3D vision?
7. Describe the significance of "Active Range Finding" in 3D computer
vision.
8. What are the different types of surface representations used in 3D
computer vision?
9. Explain the concept of "Point-Based Representation" in surface
modeling.
10. How are "Volumetric Representations" employed in 3D object
modeling?
11. What is the main goal of 3D object recognition in computer vision?
12. Define "3D Reconstruction" in the context of computer vision.
13. How does "Triangulation" play a role in 3D reconstruction?
14. Explain the purpose and process of "Bundle Adjustment" in 3D
reconstruction.
15. What are the key challenges in 3D object recognition and
reconstruction?

16. What is the significance of "Optical Flow" in motion


analysis?
17. How are parametric motion models used to describe
object movements?
18. Describe the concept of "Spline-Based Motion" in motion
analysis.
19. What does "Layered Motion" refer to in the study of
object motion?
20. What challenges are associated with motion analysis in
computer vision?
Supportive Online Certification Courses

SUPPORTIVE ONLINE COURSES

S No | Course provider | Course title | Link
1 | Udemy | Computer vision applies machine learning | https://www.udemy.com/topic/computer-vision/
2 | Udacity | Introduction to Computer Vision | https://www.udacity.com/course/introduction-to-computer-vision--ud810
3 | Coursera | Advanced Computer Vision with TensorFlow | https://www.coursera.org/learn/advanced-computer-vision-with-tensorflow
4 | edX | Computer Vision and Image Processing Fundamentals | https://www.edx.org/learn/computer-programming/ibm-computer-vision-and-image-processing-fundamentals
Real life Applications in day to day life and to Industry

REAL TIME APPLICATIONS IN DAY TO DAY LIFE

AND TO INDUSTRY

1. Explain the role of computer vision applications in the most prominent industries, including agriculture, healthcare, transportation, manufacturing, and retail. (K4, CO2)
Content beyond Syllabus

Contents beyond the Syllabus

Basics of Computer Vision

Reference Video – Content Beyond Syllabus

https://www.youtube.com/watch?v=2w8XIskzdFw
Assessment Schedule
ASSESSMENT SCHEDULE

FIAT – Proposed date: 10.10.2023
Prescribed Text Books & Reference Books
PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS

TEXT BOOKS
1. D. A. Forsyth, J. Ponce, "Computer Vision: A Modern Approach", Pearson Education, 2003.
2. Richard Szeliski, "Computer Vision: Algorithms and Applications", Springer Verlag London Limited, 2011.

REFERENCE BOOKS
1. B. K. P. Horn, "Robot Vision", McGraw-Hill.
2. Simon J. D. Prince, "Computer Vision: Models, Learning, and Inference", Cambridge University Press, 2012.
3. Mark Nixon and Alberto S. Aquado, "Feature Extraction & Image Processing for Computer Vision", Third Edition, Academic Press, 2012.
4. E. R. Davies, "Computer & Machine Vision", Fourth Edition, Academic Press, 2012.
5. Reinhard Klette, "Concise Computer Vision: An Introduction into Theory and Algorithms", 2014.
Mini Project Suggestions
MINI PROJECT SUGGESTIONS

1. Real-Time Edge Detection using OpenCV


Thank you

Disclaimer:

This document is confidential and intended solely for the educational purpose of RMK Group of
Educational Institutions. If you have received this document through email in error, please notify the
system manager. This document contains proprietary information and is intended only to the
respective group / learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender immediately by e-mail if you
have received this document by mistake and delete this document from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.
