AD8703 BCV Unit IV 2023
This document is confidential and intended solely for the educational purposes of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community. If you are not the addressee, you should not disseminate,
distribute, or copy this e-mail. Please notify the sender immediately by e-mail if
you have received this document by mistake and delete it from your system. If
you are not the intended recipient, you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this information
is strictly prohibited.
AD8703
BASICS OF COMPUTER
VISION
UNIT IV
Department: AI&DS
Date : 12.08.2023
Table of Contents

1. Contents
2. Course Objectives
3. Pre Requisites (Course Names with Code)
5. Course Outcomes
7. Lecture Plan
9. Lecture Notes
   Lecture Slides
   Lecture Videos
10. Assignments
11. Part A (Q & A)
12. Part B Qs
16. Assessment Schedule
COURSE OBJECTIVES
To review image processing techniques for computer vision.
To understand various features and recognition techniques.
To learn about histogram and binary vision.
To apply three-dimensional image analysis techniques.
To study real-world applications of computer vision algorithms.
PREREQUISITE
NIL
Syllabus
AD8703 - BASICS OF COMPUTER VISION
SYLLABUS (L T P C: 3 0 0 3)
UNIT I INTRODUCTION
Feature Extraction - Edges - Canny, LoG, DoG; Line detectors (Hough Transform),
Corners - Harris and Hessian Affine, Orientation Histogram, SIFT, SURF, HOG, GLOH,
Scale-Space Analysis - Image Pyramids and Gaussian derivative filters, Gabor Filters
and DWT. Image Segmentation - Region Growing, Edge Based approaches to
segmentation, Graph-Cut, Mean-Shift, MRFs, Texture Segmentation.
UNIT V APPLICATIONS
COURSE OUTCOMES
CO1: Recognise and describe how mathematical and scientific concepts are
applied in computer vision.
CO-PO/PSO Mapping

COs | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 | PO12 | PSO1 | PSO2 | PSO3
CO1 |  3  |  2  |  2  |  2  |  2  |  -  |  -  |  -  |  2  |  -   |  -   |  2   |  2   |  -   |  -
CO2 |  3  |  3  |  2  |  2  |  2  |  -  |  -  |  -  |  2  |  -   |  -   |  2   |  2   |  -   |  -
CO3 |  2  |  2  |  1  |  1  |  1  |  -  |  -  |  -  |  1  |  -   |  -   |  1   |  2   |  -   |  -
CO4 |  3  |  3  |  1  |  1  |  1  |  -  |  -  |  -  |  1  |  -   |  -   |  1   |  2   |  -   |  -
CO5 |  3  |  3  |  1  |  1  |  1  |  -  |  -  |  -  |  1  |  -   |  -   |  1   |  3   |  1   |  -
Lecture Plan

S No | Topics | No of periods | Proposed date | Actual lecture date | Pertaining CO | Taxonomy level | Mode of delivery
1 | Methods for 3D vision – projection schemes | 1 | 11.10.2023 | | CO4 | K3 | Lecture
3 | Shape from texture – shape from focus | 1 | 12.10.2023 | | CO4 | K3 | Lecture
4 | Active range finding – surface representations | 1 | 12.10.2023 | | CO4 | K4 | Lecture
5 | Point-based representation – volumetric representations | 1 | 16.10.2023 | | CO4 | K4 | Lecture
7 | Introduction to motion – triangulation | 1 | 18.10.2023 | | CO4 | K3 | Lecture
8 | Bundle adjustment – translational alignment – parametric motion | 1 | 18.10.2023 | | CO4 | K3 | Lecture
Lecture Notes
3D VISION AND MOTION
4.1 Projection Schemes
Projection is the process of representing an n-dimensional object in (n-1)
dimensions. In 3D vision it converts a 3D object into a 2D one: each object
point (x, y, z) is mapped to a point (x, y) on a projection plane, also called
the view plane, which is the displayed surface. When geometric objects are
formed by the intersection of lines with a plane, the plane is called the
projection plane and the lines are called projectors.
Types of Projections:
Parallel projections
Perspective projections

Parallel projection is divided into two types, orthographic and oblique, and
each of these is further subdivided.
Orthographic Projections:
In an orthographic projection, the direction of projection is normal to the
projection plane: the projectors are parallel to one another and make an angle
of 90 degrees with the view plane. Orthographic parallel projections are
produced by projecting points along parallel lines that are perpendicular to
the projection plane.

Orthographic projections are most often used to produce the front, side, and
top views of an object, which are called elevations. Engineering and
architectural drawings commonly employ these orthographic projections. The
transformation equations for an orthographic parallel projection are
straightforward. Special orthographic parallel projections include the plan
view and the side elevations. We can also form orthographic projections that
display more than one face of an object; such views are called axonometric
orthographic projections.
Oblique Projections:
Oblique projections are obtained by projecting along parallel lines that are
not perpendicular to the projection plane. An oblique projection shows the
front and top surfaces, covering the three dimensions of height, width, and
depth. The front, or principal, surface of the object is parallel to the plane
of projection. Oblique projections are effective for pictorial representation.
Isometric Projections: Orthographic projections that show more than one side
of an object are called axonometric orthographic projections. The most common
axonometric projection is the isometric projection, in which the direction of
projection makes equal angles with all three principal axes. In this
projection, parallelism of lines is preserved but angles are not.

Dimetric Projections: In a dimetric projection, the direction of projection
makes equal angles with exactly two of the principal axes.
Cavalier Projections: All lines perpendicular to the projection plane are
projected with no change in length. The projectors make an angle of 45 degrees
with the projection plane, so the length of a line perpendicular to the plane
is preserved in the projection.

Cabinet Projections: All lines perpendicular to the projection plane are
projected at one half of their length, which gives a more realistic appearance
of the object. The projectors make an angle of 63.4 degrees with the
projection plane, so lines perpendicular to the viewing surface are projected
at half their actual length.
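The lengths quoted above follow directly from the geometry of the projectors.
The sketch below is an added illustration (not part of the original notes): it
implements the standard oblique-projection shear, with L = 1/tan(alpha)
controlling how much of the depth coordinate is carried into the image; the
45-degree direction chosen for the receding axis is a common convention, not
something the notes specify.

import numpy as np

def oblique_project(points, alpha_deg, phi_deg=45.0):
    """Oblique parallel projection onto the z = 0 view plane.
    The projectors meet the plane at angle alpha; phi is the image-plane
    direction of the receding axis. L = 1/tan(alpha) is the factor by
    which one unit of depth appears in the image: alpha = 45 degrees
    gives L = 1 (cavalier: depth lines keep full length), while
    alpha = 63.4 degrees gives L = 0.5 (cabinet: half length)."""
    L = 1.0 / np.tan(np.radians(alpha_deg))
    phi = np.radians(phi_deg)
    x, y, z = points.T
    return np.stack([x + z * L * np.cos(phi),
                     y + z * L * np.sin(phi)], axis=1)

# A unit-length depth edge from (0,0,0) to (0,0,1):
edge = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(oblique_project(edge, 45.0))   # projected length 1 (cavalier)
print(oblique_project(edge, 63.4))   # projected length ~0.5 (cabinet)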
Perspective Projections:
A perspective projection is produced by straight lines (projectors) radiating
from a common point, the center of projection, and passing through points on
the object to the plane of projection.

Perspective projection is a geometric technique used to produce a
three-dimensional graphic image on a plane, corresponding to what a person
sees.

Any set of parallel lines of the object that are not parallel to the
projection plane is projected into converging lines. A different set of
parallel lines will have a separate vanishing point.

Perspective projections are classified as one-point, two-point, or three-point
according to the number of principal axes that have finite vanishing points.
In a three-point perspective projection, all three principal axes have finite
vanishing points; three-point perspective is the most difficult to draw.

The size of the perspective projection of an object varies inversely with the
distance of the object from the center of projection.
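To make the contrast between the parallel and perspective families concrete,
here is a minimal NumPy sketch (an added illustration, not from the notes):
the orthographic projection simply drops the depth coordinate, while the
pinhole perspective projection divides by depth, which is why projected size
varies inversely with distance from the center of projection.

import numpy as np

def orthographic_project(points):
    """Orthographic projection onto the z = 0 view plane: drop z.
    Projected size does not depend on depth."""
    return points[:, :2]

def perspective_project(points, f=1.0):
    """Pinhole perspective projection with the center of projection at
    the origin and the view plane at z = f: x' = f*x/z, y' = f*y/z."""
    z = points[:, 2:3]
    return f * points[:, :2] / z

# A unit square at depth 2, and the same square at depth 4.
near = np.array([[0, 0, 2], [1, 0, 2], [1, 1, 2], [0, 1, 2]], float)
far = near + [0.0, 0.0, 2.0]

print(orthographic_project(near))  # same size as the far square
print(orthographic_project(far))   # parallel projection ignores depth
print(perspective_project(near))   # twice the size of the far square
print(perspective_project(far))    # size falls off as 1/z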
4.3 Photometric stereo
Calibration
We turn on one light source, capture an image of a chrome sphere, and produce
a mask image of the sphere.

Using equation (1) and the two images, we can work out this one light source's
direction. This is because, at any one pixel of the image, we know the surface
normal of the chrome sphere, the surface albedo of the chrome sphere (~1 in
grayscale), and the light intensity (given by the pixel brightness).
After figuring out several light source directions, we can apply those lights
to different objects. We first recover the surface normal from grayscale
images, because in grayscale the albedo reduces to a single unknown.

Here k is the number of light sources. We take the derivative of the squared
error with respect to the surface normal and set the derivative equal to zero
(to find the minimum). We then end up with equation (3).

The albedo a is now just a scale factor for the surface normal (because we
work in grayscale), so we can ignore it: we are only interested in the unit
vector of the surface normal. Equation (3) is a linear system in matrix form,
so we can use the least-squares method to solve for the surface normal.
Similarly, we can solve for the RGB albedo after the light source directions
have been calculated.
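Equations (1) and (3) appear as figures in the source and are not reproduced
above. Under the standard Lambertian model they amount to I_j = a (l_j . n)
for each light source j, which stacks into the linear system I = L g with
g = a n. The following is a minimal sketch of the resulting per-pixel
least-squares solve (the function name and array layout are our assumptions,
not from the notes):

import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Per-pixel Lambertian photometric stereo.

    intensities: (k, h, w) grayscale images, one per light source
    light_dirs:  (k, 3) unit direction of each light source

    Stacking I_j = a * dot(l_j, n) over the k lights gives I = L @ g
    with g = a * n, solved per pixel by least squares; the albedo a is
    the norm of g and the unit normal is g / a."""
    k, h, w = intensities.shape
    I = intensities.reshape(k, -1)                        # (k, h*w)
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)    # (3, h*w)
    albedo = np.linalg.norm(g, axis=0)                    # scale factor a
    normals = g / np.maximum(albedo, 1e-8)                # unit normals
    return normals.reshape(3, h, w), albedo.reshape(h, w)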
Results
[Figures not reproduced: sample input images (11 per object, each lit from a
different direction) and the corresponding sample outputs.]
4.5 Shape from Texture
The first person who proposed that a shape can be perceived from a texture was G
ibson in 1950[2]. Gibson used the term texture gradient in order to denote that are
as of a surface that
have similar texture, with other neighbor areas, are perceived differently from the
observer due to differences in orientation of the surfaces and the distance from the
observer.
To measure the orientation of the texels in a texture, we need to find the
slant and tilt angles. Slant denotes the amount, and tilt the direction, of
the slope of the planar surface projected onto the image plane. In Figure 2
[3], the slant angle ρ is the angle between the surface normal and the viewing
direction, while the tilt angle τ is the angle between a reference direction
in the image plane and the projection of the surface normal onto the image
plane.
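As a small added illustration of these definitions (assuming camera
coordinates with the optical axis along z and the unit normal oriented toward
the camera; this sketch is not part of the source):

import numpy as np

def slant_tilt(normal):
    """Slant and tilt of a planar patch. Slant is the angle between the
    surface normal and the optical axis (the amount of slope); tilt is
    the direction of the normal's projection onto the image plane (the
    direction of slope)."""
    n = np.asarray(normal, float)
    n = n / np.linalg.norm(n)
    slant = np.arccos(n[2])          # rho in the text
    tilt = np.arctan2(n[1], n[0])    # tau in the text
    return slant, tilt

print(slant_tilt([0.0, 0.0, 1.0]))  # frontal patch: slant 0
print(slant_tilt([1.0, 0.0, 1.0]))  # slant 45 degrees, tilt 0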
We will present a shape-from-texture technique based on [4]. The authors first
try to find the frontal texel, which leads them to the surface with the best
consistency measure. Their aim is to identify the transformation matrices that
lead from that texel to all the other ones. Since they do not initially know
the appearance of the frontal texel, they define the transformation matrices
as the product of the transformations from a randomly chosen texel to all the
other texels, multiplied by the transformation matrix from the frontal texel
to the randomly selected one.
It is then possible to calculate the gradients for each patch of the frontal
texel. Using the Fundamental Theorem of Line Integrals, they calculate the
cost term, and with the Levenberg-Marquardt method [1] they find the most
consistent surface. Finally, having determined the frontal texel, they use it
to calculate the surface shape by solving the corresponding transformation.

In Figure 3 we can see the results of their algorithm. The left image shows
the texture, with a needle in each texel indicating its orientation; the
middle image is the estimated height of the textured surface; and the right
image shows the estimated surface seen from a side view.
4.6 Shape from focus
Shape from focus (or shape from defocus) is a method of 3D reconstruction that
uses information about the focus of an optical system as a means of measuring
3D information. One of the simplest forms of this method can be found in most
autofocus cameras today. In its simplest form, the method analyzes an image
based upon overall contrast from a histogram, the width of edges, or, more
commonly, the frequency spectrum derived from a fast Fourier transform of the
image. That information can be used to drive a servo mechanism in the lens,
focusing it until the measured quantity is optimized.
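A minimal sketch of such a focus measure follows (an added illustration; the
variance of the image Laplacian is one common choice, and the notes do not
prescribe a specific measure):

import numpy as np

def focus_measure(img):
    """Contrast-based focus measure: variance of the image Laplacian.
    A well-focused image has strong high-frequency content, so the
    Laplacian response spreads out and its variance grows."""
    img = img.astype(float)
    # 4-neighbour discrete Laplacian computed via shifted copies.
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()

# An autofocus loop would step the lens, evaluate focus_measure on each
# frame, and stop (or reverse) once the measure passes its peak.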
The shape-from-focus method is based on the observations made in the previous
sections. At the facet level of magnification, rough surfaces produce images
that are rich in texture, and a defocused optical system plays the role of a
low-pass filter. Fig. 4 shows a rough surface of unknown shape placed on a
translational stage. The reference plane shown corresponds to the initial
position of the stage. The configuration of the optics and sensor defines a
single plane, the "focused plane," that is perfectly focused onto the sensor
plane. The distance d_f between the focused and reference planes, and the
displacement d of the stage with respect to the reference plane, are always
known by measurement. Let us focus our attention on a surface element s that
lies on the unknown surface S. If the stage is moved towards the focused
plane, the image of s will gradually increase in its degree of focus
(high-frequency content) and will be perfectly focused when s lies on the
focused plane. Further movement of the element s will again increase the
defocusing of its image. If we observe the image area corresponding to s and
record the stage displacement d = z̄ at the instant of maximum focus, we can
compute the height d_s of s with respect to the stage as d_s = d_f − z̄. In
fact, we can use the value of z̄ to determine the distance of s with respect
to the focused plane, the sensor plane, or any other coordinate system defined
with respect to the imaging system. This approach may be applied independently
to all surface elements to obtain the shape of the entire surface S.
To automatically detect the instant of "best" focus, we will develop an image
focus measure. In the above discussion, the stage motion and image acquisition
were assumed to be continuous processes. In practice, however, it is not
feasible to acquire and process such a large number of images in a reasonable
amount of time. Therefore, we obtain only a finite number of images: the stage
is moved in increments of Δd, and an image is obtained at each stage position
(d = n·Δd). By studying the behavior of the focus measure, we develop an
interpolation method that uses a small number of focus measures to compute
accurate depth estimates. An important feature of the method is its local
nature: the depth estimate at an image point is computed only from focus
measures recorded at that point. Consequently, the method can adapt well to
variations in texture type and content over the object surface.
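Putting the pieces together, here is a compact sketch of depth from focus over
an image stack captured at stage positions d = n·Δd. The squared-Laplacian
focus measure and the three-point Gaussian (log-parabolic) peak interpolation
are one standard realization of the ideas described above, not necessarily the
exact choices of the original method:

import numpy as np

def depth_from_focus(stack, delta_d):
    """stack: (n, h, w) grayscale images, slice k taken at d = k*delta_d.
    Returns a per-pixel depth estimate at the interior pixels."""
    stack = stack.astype(float)
    # Per-pixel, per-slice focus measure: squared 4-neighbour Laplacian.
    fm = (-4.0 * stack[:, 1:-1, 1:-1]
          + stack[:, :-2, 1:-1] + stack[:, 2:, 1:-1]
          + stack[:, 1:-1, :-2] + stack[:, 1:-1, 2:]) ** 2 + 1e-12
    n = fm.shape[0]
    k = np.clip(fm.argmax(axis=0), 1, n - 2)   # coarse peak index
    rows, cols = np.indices(k.shape)
    lo = np.log(fm[k - 1, rows, cols])
    mid = np.log(fm[k, rows, cols])
    hi = np.log(fm[k + 1, rows, cols])
    # Vertex of the parabola through the three log-focus values gives a
    # sub-step correction to the coarse peak position.
    denom = lo - 2.0 * mid + hi
    safe = np.where(denom == 0.0, 1.0, denom)
    offset = np.where(denom < 0.0, 0.5 * (lo - hi) / safe, 0.0)
    return (k + offset) * delta_d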
4.13 Introduction To Motion
Motion in computer vision is a sub-field within computer vision dealing with
estimation and analysis of information related to motion in image sequences
or in the scene depicted by a camera. Common problems in this field relate to
Estimating the motion field in an image sequence.
Estimating the 3D motion of points in the scene based on measurements on
image data.
Estimation of Ego-motion, the 3D motion of the camera relative to the scene.
“When objects move at equal speed, those more remote seem to move more
slowly.”
- Euclid, 300 BC
Given image I(u,v,t) and I(u,v, t+δt), compute I(u,v, t+δt) - I(u,v,t)
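A minimal sketch of this temporal difference used as a motion cue (an added
illustration; the threshold is an arbitrary choice):

import numpy as np

def frame_difference(frame_t, frame_t1, threshold=25):
    """Simplest motion cue: I(u,v,t+dt) - I(u,v,t). Pixels whose
    brightness changes by more than `threshold` are flagged as
    candidate moving pixels."""
    diff = frame_t1.astype(int) - frame_t.astype(int)
    return np.abs(diff) > threshold   # boolean motion mask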
In layer-based motion estimation, the motion of each layer is described using
a 3D plane equation plus per-pixel residual depth offsets, so that rigid
planar motions (homographies) are used instead of affine motion models. The
final model refinement re-optimizes the layer pixels by minimizing the
discrepancy between the re-synthesized and observed motion sequences; the
approach requires a rough initial assignment of pixels to layers.
Lecture Slides
Lecture Videos
Assignments
1. Gaussian filtering, gradient magnitude, Canny edge detection.
2. Detecting interest points; simple matching of features.
3. Stereo correspondence analysis.
4. Photometric stereo.

Each assignment contains:
1. A paper that discusses the theory, the task, the methods, and the results.
2. An src folder with:
   the code
   a README file with instructions on how to run the code
Part A Q & A
PART - A
Part B Qs
PART - B
1. What are the key methods used for 3D vision in computer vision?
2. How do projection schemes play a role in 3D computer vision?
3. Explain the concept of "Shape from Shading" in 3D vision.
4. What is the purpose of "Photometric Stereo" in 3D computer
vision?
5. How does "Shape from Texture" contribute to 3D scene
understanding?
6. What is "Shape from Focus" and how is it applied in 3D vision?
7. Describe the significance of "Active Range Finding" in 3D computer
vision.
8. What are the different types of surface representations used in 3D
computer vision?
9. Explain the concept of "Point-Based Representation" in surface
modeling.
10. How are "Volumetric Representations" employed in 3D object
modeling?
11. What is the main goal of 3D object recognition in computer vision?
12. Define "3D Reconstruction" in the context of computer vision.
13. How does "Triangulation" play a role in 3D reconstruction?
14. Explain the purpose and process of "Bundle Adjustment" in 3D
reconstruction.
15. What are the key challenges in 3D object recognition and
reconstruction?
S No | Course title | Course provider | Link
1 | Computer vision applies machine learning | Udemy | https://www.udemy.com/topic/computer-vision/
2 | | Udacity | https://www.udacity.co
3 | Advanced Computer Vision with TensorFlow | Coursera | https://www.coursera.org/learn/advanced-computer-vision-with-tensorflow
4 | Computer Vision and Image Processing Fundamentals | edX | https://www.edx.org/learn/computer-programming/ibm-computer-vision-and-image-processing-fundamentals?webview=false&campaign=Computer+Vision+and+Image+Processing+Fundamentals&source=edx&product_category=course&placement_url=https%3A%2F%2Fwww.edx.org%2Flearn%2Fcomputer-vision
Real Life Applications in Day-to-Day Life and to Industry

1. Explain the role of computer vision applications in the most prominent
industries, including agriculture, healthcare, transportation, manufacturing,
and retail. (K4, CO2)
Content Beyond Syllabus
https://www.youtube.com/watch?v=2w8XIskzdFw
Assessment Schedule
ASSESSMENT SCHEDULE
FIAT
Proposed date: 10.10.2023
Prescribed Text Books & Reference Books
PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS

TEXT BOOKS
1. D. A. Forsyth, J. Ponce, "Computer Vision: A Modern Approach", Pearson
Education, 2003.
2. Richard Szeliski, "Computer Vision: Algorithms and Applications",
Springer-Verlag London Limited, 2011.

REFERENCE BOOKS
1. B. K. P. Horn, "Robot Vision", McGraw-Hill.
2. Simon J. D. Prince, "Computer Vision: Models, Learning, and Inference",
Cambridge University Press, 2012.
3. Mark Nixon and Alberto S. Aguado, "Feature Extraction & Image Processing
for Computer Vision", Third Edition, Academic Press, 2012.
4. E. R. Davies, "Computer & Machine Vision", Fourth Edition, Academic Press,
2012.
5. Reinhard Klette, "Concise Computer Vision: An Introduction into Theory and
Algorithms", Springer, 2014.
Mini Project Suggestions
MINI PROJECT SUGGESTIONS