Computer Vision LAB 8 SEM
COMPUTER VISION
LAB
SEMESTER 8TH
Prepared By
Prof. Vijaya Chaturvedi Prof. Jharna Chopra
DOs:
▪ Take help from the Manual / Work Book for preparation of the experiment.
▪ For any abnormal working of the machine consult the Faculty In-charge/ Lab
Assistant.
▪ Shut down the machine and switch off the power supply after performing the
experiment.
DON’Ts :
▪ Do not tamper with the instruments in the Lab and do not disturb their settings.
LIST OF EXPERIMENTS
Course Objectives:
1. To be able to use Python for Image handling and processing.
2. To perform Geometric transformations and compute the homography matrix in Python.
3. To be able to perform perspective transformation, edge detection, line detection and
corner detection.
4. To be able to implement SIFT, SURF and HOG in Python.
Recommended Books:
1. Programming Computer Vision with Python, Jan Erik Solem, O'Reilly Media, ISBN:
9781449316549.
2. Practical Machine Learning for Computer Vision: End-to-End Machine
Learning for Images, Valliappa Lakshmanan, O'Reilly Media, ISBN:
9391043836.
EXPERIMENT-1
• Opening Images
• Cropping Images
• Using Filters
• Adding Borders
• Resizing Images
As you can see, Pillow can be used for many types of image processing. The images used in these
examples are some that the author has taken himself; they are included with the code examples on
GitHub.
Installing Pillow
Installing Pillow is easy to do with pip. Here is how you would do it after opening a terminal or
console window:
python -m pip install pillow
Now that Pillow is installed, you are ready to start using it!
Opening Images
Pillow lets you open and view many different file types. For a full listing of the image file types that
Pillow supports, see the following:
• https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html
You can use Pillow to open and view any of the file types mentioned in the “fully supported formats”
section at the link above. The viewer is made with Tkinter and works in much the same way as
Matplotlib does when it shows a graph.
To see how this works, create a new file named open_image.py and enter the following code:
# open_image.py
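The listing itself is not reproduced here; a minimal sketch that opens an image and shows it in your default viewer (the file name is an assumption) would be:

from PIL import Image

def main(path):
    image = Image.open(path)
    image.show()

if __name__ == '__main__':
    main('jellyfish.jpg')  # hypothetical example image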
This is pretty handy because now you can view your images with Python without writing an entire
graphical user interface. You can use Pillow to learn more about your images as well. Create a new
file named get_image_info.py and add this code to it:
# get_image_info.py
from PIL import Image

def get_image_info(path):
    image = Image.open(path)
    print(f'The image size is: {image.size}')
    exif = image._getexif()
    print(exif)

if __name__ == '__main__':
    get_image_info('ducks.jpg')
Here you get the width and height of the image using the image object. Then you use
the _getexif() method to get metadata about your image. EXIF stands for “Exchangeable image file
format” and is a standard that specifies the formats for images, sound, and ancillary tags used by
digital cameras. The output is pretty verbose, but you can learn from that data that this particular
photo was taken with a Sony 6300 camera with the following settings: “E 18-200mm F3.5-6.3 OSS
LE”. The timestamp for the photo is also in the Exif information.
However, the Exif data can be altered if you use photo editing software to crop, apply filters or do
other types of image manipulation. This can remove part or all of the Exif data. Try running this
function on some of your own photos and see what kinds of information you can extract!
Another fun bit of information that you can extract from the image is its histogram data. The
histogram of an image is a graphical representation of its tonal values. It shows you the brightness of
the photo as a list of values that you could graph. Let’s use this image as an example:
To get the histogram from this image you will use the image’s histogram() method. Then you will use
Matplotlib to graph it out. To see one way that you could do that, create a new file
named get_histogram.py and add this code to it:
# get_histogram.py
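The listing is likewise not reproduced here; a minimal sketch that pulls the histogram data and graphs it with Matplotlib (the file name is an assumption):

from PIL import Image
import matplotlib.pyplot as plt

def get_histogram(path):
    image = Image.open(path)
    # histogram() returns a flat list of pixel counts, one entry per tonal value per band
    histogram = image.histogram()
    plt.plot(histogram)
    plt.show()

if __name__ == '__main__':
    get_histogram('butterfly.jpg')  # hypothetical example image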
This graph shows you the tonal values in the image that were mentioned earlier. You can try passing
in some of the other images included on Github to see different graphs or swap in some of your own
images to see their histograms.
Now let’s discover how you can use Pillow to crop images!
Cropping Images
When you are taking photographs, all too often the subject of the photo will move or you didn’t
zoom in far enough. This results in a photo where the focus of the image isn’t really front-and-center.
To fix this issue, you can crop the image to that part of the image that you want to highlight.
Pillow has this functionality built-in. To see how it works, create a file named cropping.py and add the
following code to it:
# cropping.py
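The listing is not reproduced here; a minimal sketch of the cropping step (the file name and crop box are assumptions) would be:

from PIL import Image

def crop(path, cropped_path, box):
    image = Image.open(path)
    # box is a 4-tuple of pixel coordinates: (left, upper, right, lower)
    cropped_image = image.crop(box)
    cropped_image.save(cropped_path)

if __name__ == '__main__':
    # Example crop box; adjust the coordinates for your own photo
    crop('ducks.jpg', 'ducks_cropped.jpg', box=(600, 300, 1600, 1000))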
Now when you run the code against this, you will end up with the following cropped image:
The coordinates you use to crop with will vary with the photo. In fact, you should probably change
this code so that it accepts the crop coordinates as arguments. You can do that yourself as a little
homework. It takes some trial and error to figure out the crop bounding box to use. You can use a
tool like Gimp to help you by drawing a bounding box with Gimp and noting the coordinates it gives
you to try with Pillow.
Now let’s move on and learn about applying filters to your images!
Using Filters
The Pillow package has several filters that you can apply to your images. These are the current filters
that are supported:
• BLUR
• CONTOUR
• DETAIL
• EDGE_ENHANCE
• EDGE_ENHANCE_MORE
• EMBOSS
• FIND_EDGES
• SHARPEN
• SMOOTH
• SMOOTH_MORE
Let’s use the butterfly image from earlier to test out a couple of these filters. Here is the image you
will be using:
Now that you have an image to use, go ahead and create a new file named blur.py and add this code to
it to try out Pillow’s BLUR filter:
# blur.py
from PIL import Image, ImageFilter

def blur(path, modified_photo):
    image = Image.open(path)
    blurred_image = image.filter(ImageFilter.BLUR)
    blurred_image.save(modified_photo)

if __name__ == '__main__':
    blur('butterfly.jpg', 'butterfly_blurred.jpg')
To actually use a filter in Pillow, you need to import ImageFilter. Then you pass in the specific filter
that you want to use to the filter() method. When you call filter(), it will return a new image object. You
then save that file to disk.
This is the image that you will get when you run the code:
That looks kind of blurry, so you can count this as a success! If you want it to be even blurrier, you
could run the blurry photo back through your script a second time.
Of course, sometimes you take photos that are slightly blurry and you want to sharpen them up a bit.
Pillow includes that as a filter you can apply as well. Create a new file named sharpen.py and add this
code:
# sharpen.py
from PIL import Image, ImageFilter

def sharpen(path, modified_photo):
    image = Image.open(path)
    sharpened_image = image.filter(ImageFilter.SHARPEN)
    sharpened_image.save(modified_photo)

if __name__ == '__main__':
    sharpen('butterfly.jpg', 'butterfly_sharper.jpg')
Here you take the original butterfly photo and apply the SHARPEN filter to it before saving it off.
When you run this code, your result will look like this:
Depending on your eyesight and your monitor’s quality, you may or may not see much difference
here. However, you can rest assured that it is slightly sharper.
Now let’s find out how you can add borders to your images!
Adding Borders
One way to make your photos look more professional is to add borders to them. Pillow makes this
pretty easy to do via their ImageOps module. But before you can do any borders, you need an image.
Here is the one you’ll be using:
Now that you have a nice image to play around with, go ahead and create a file named border.py and
put this code into it:
# border.py
from PIL import Image, ImageOps

def add_border(input_image, output_image, border):
    img = Image.open(input_image)
    if isinstance(border, (int, tuple)):
        bimg = ImageOps.expand(img, border=border)
    else:
        raise RuntimeError('Border is not an integer or tuple!')
    bimg.save(output_image)

if __name__ == '__main__':
    in_img = 'butterfly_grey.jpg'
    add_border(in_img, output_image='butterfly_border.jpg',
               border=100)
Isn’t that nice? If you want to get really fancy, you can pass in different values for all four sides of
the image. But there probably aren’t very many use-cases where that makes sense.
Having a black border is nice and all, but sometimes you’ll want to add a little pizazz to your picture.
You can change that border color by passing in the fill argument to expand(). This argument takes in a
named color or an RGB color.
Create a new file named colored_border.py and add this code to it:
# colored_border.py
from PIL import Image, ImageOps

def add_border(input_image, output_image, border, color=0):
    img = Image.open(input_image)
    if isinstance(border, (int, tuple)):
        bimg = ImageOps.expand(img,
                               border=border,
                               fill=color)
    else:
        msg = 'Border is not an integer or tuple!'
        raise RuntimeError(msg)
    bimg.save(output_image)

if __name__ == '__main__':
    in_img = 'butterfly_grey.jpg'
    add_border(in_img,
               output_image='butterfly_border_red.jpg',
               border=100,
               color='indianred')
Now your add_border() function takes in a color argument, which you pass on to the expand() method.
When you run this code, you’ll see this for your result:
That looks pretty nice. You can experiment around with different colors or apply your own favorite
color as the border.
The next item on your Pillow tour is to learn how to resize images!
Resizing Images
Resizing images with Pillow is fairly simple. You will be using the resize() method which takes in a
tuple of integers that are used to resize the image. To see how this works, you’ll be using this lovely
shot of a lizard:
Now that you have a photo, go ahead and create a new file named resize_image.py and put this code in
it:
# resize_image.py
from PIL import Image

def resize_image(input_image_path, output_image_path, size):
    original_image = Image.open(input_image_path)
    width, height = original_image.size
    print(f'The original image size is {width} wide x {height} high')

    resized_image = original_image.resize(size)
    width, height = resized_image.size
    print(f'The resized image size is {width} wide x {height} high')
    resized_image.show()
    resized_image.save(output_image_path)

if __name__ == '__main__':
    resize_image(
        input_image_path='lizard.jpg',
        output_image_path='lizard_small.jpg',
        size=(800, 400),
    )
Here you pass in the lizard photo and tell Pillow to resize it to 800 x 400. When you run this code,
the output will tell you that the original photo was 1191 x 1141 pixels before it resizes it for you.
The result of running this code looks like this:
Well, that looks a bit odd! Pillow doesn’t actually do any scaling when it resizes the image. Instead,
Pillow will stretch or contort your image to fit the values you tell it to use.
What you want to do is scale the image. To make that work, you need to create a new file
named scale_image.py and add some new code to it. Here’s the code you need:
# scale_image.py
from PIL import Image

def scale_image(
    input_image_path,
    output_image_path,
    width=None,
    height=None
):
    original_image = Image.open(input_image_path)
    w, h = original_image.size
    print(f'The original image size is {w} wide x {h} high')

    if width and height:
        max_size = (width, height)
    elif width:
        max_size = (width, h)
    elif height:
        max_size = (w, height)
    else:
        # No width or height specified
        raise RuntimeError('Width or height required!')

    original_image.thumbnail(max_size, Image.ANTIALIAS)
    original_image.save(output_image_path)

    scaled_image = Image.open(output_image_path)
    width, height = scaled_image.size
    print(f'The scaled image size is {width} wide x {height} high')

if __name__ == '__main__':
    scale_image(
        input_image_path='lizard.jpg',
        output_image_path='lizard_scaled.jpg',
        width=800,
    )
This time around, you let the user specify both the width and height. If the user specifies a width, a
height, or both, then the conditional statement uses that information to create a max_size. Once it has
the max_size value calculated, you pass that to thumbnail() and save the result. If the user specifies both
values, thumbnail() will maintain the aspect ratio correctly when resizing.
When you run this code, you will find that the result is a smaller version of the original image and
that it now maintains its aspect ratio.
Wrapping Up
Pillow is very useful for working with images using Python. In this article, you learned how to do the
following:
•Open Images
•Crop Images
•Use Filters
•Add Borders
•Resize Images
You can do much more with Pillow than what is shown here. For example, you can do various image
enhancements, like changing the contrast or brightness of an image. Or you could composite multiple
images together.
PIL is the Python Imaging Library which provides the python interpreter with image editing capabilities. It
was developed by Fredrik Lundh and several other contributors. Pillow is the friendly PIL fork and an easy to
use library developed by Alex Clark and other contributors. We’ll be working with Pillow.
Installation:
• Linux: On linux terminal type the following:
pip install Pillow
Installing pip via terminal:
sudo apt-get update
sudo apt-get install python-pip
• Windows: Download the appropriate Pillow package according to your python version. Make sure to
download according to the python version you have.
We’ll be working with the Image Module here which provides a class of the same name and provides a lot of
functions to work on our images. To import the Image module, our code should begin with the following line:
from PIL import Image
Operations with Images:
• Open a particular image from a path:
#img = Image.open(path)
try:
    img = Image.open(path)
except IOError:
    pass
• Retrieve size of image: The instances of the Image class have many attributes; one useful attribute is size, which gives the width and height of the image.
    width, height = img.size
• Save changes in an image: To save any changes made to the image object, use the save() method, passing the destination path and, optionally, the format:
    img.save(path, format)
• Rotating an Image: The image rotation needs angle as parameter to get the image rotated.
def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        #Angle given
        img = img.rotate(180)
        img.save("rotated_picture.jpg")
    except IOError:
        pass

if __name__ == "__main__":
    main()
Note: There is an optional expand flag available as one of the argument of the rotate method, which if set
true, expands the output image to make it large enough to hold the full rotated image.
As seen in the above code snippet, I have used a relative path where my image is located in the same
directory as my python code file, an absolute path can be used as well.
• Cropping an Image: Image.crop(box) takes a 4-tuple (left, upper, right, lower) pixel coordinate, and
returns a rectangular region from the used image.
def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        # Crop box (left, upper, right, lower); example values, adjust as needed
        area = (0, 0, 400, 400)
        img = img.crop(area)
        #Saved in the same relative location
        img.save("cropped_picture.jpg")
    except IOError:
        pass

if __name__ == "__main__":
    main()
• Resizing an Image: Image.resize(size)- Here size is provided as a 2-tuple width and height.
def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        # New size as a 2-tuple (width, height); example values
        img = img.resize((250, 250))
        img.save("resized_picture.jpg")
    except IOError:
        pass

if __name__ == "__main__":
    main()
• Pasting an image on another image: The second argument can be a 2-tuple (specifying the top left
corner), or a 4-tuple (left, upper, right, lower) – in this case the size of pasted image must match the size of
this box region, or None which is equivalent to (0, 0).
def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        #Relative Path
        img2 = Image.open("picture2.jpg")
        # Paste img2 onto img with its top-left corner at (50, 50); example position
        img.paste(img2, (50, 50))
        img.save("pasted_picture.jpg")
    except IOError:
        pass

if __name__ == "__main__":
    main()
Note: An additional argument for an optional image mask is also available.
• Getting a Histogram of an Image: This will return a histogram of the image as a list of pixel counts, one
for each pixel in the image. (A histogram of an image is a graphical representation of the tonal distribution
in a digital image. It contains what all the brightness values contained in an image are. It plots the number
of pixels for each brightness value. It helps in doing the exposure settings.)
from PIL import Image

def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        print(img.histogram())
    except IOError:
        pass

if __name__ == "__main__":
    main()
• Transposing an Image: This feature gives us the mirror image of an image
def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        #transposing image
        transposed_img = img.transpose(Image.FLIP_LEFT_RIGHT)
        transposed_img.save("transposed.jpg")
    except IOError:
        pass

if __name__ == "__main__":
    main()
• Split an image into individual bands: Splitting an image in RGB mode, creates three new images each
containing a copy of the original individual bands.
def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        print(img.split())
    except IOError:
        pass

if __name__ == "__main__":
    main()
• tobitmap: Converting an image to an X11 bitmap (A plain text binary image format). It returns a string
containing an X11 bitmap, it can only be used for mode “1” images, i.e. 1 bit pixel black and white images.
from PIL import Image

def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        # tobitmap() only works on mode "1" (1-bit black and white) images
        img = img.convert("1")
        print(img.mode)
        print(img.tobitmap())
        print(type(img.tobitmap()))
    except IOError:
        pass

if __name__ == "__main__":
    main()
• Creating a thumbnail: This method creates a thumbnail of the image that is opened. It does not return a
new image object, it makes in-place modification to the currently opened image object itself. If you do not
want to change the original image object, create a copy and then apply this method. This method also
evaluates the appropriate size to maintain the aspect ratio of the image according to the size passed.
from PIL import Image

def main():
    try:
        #Relative Path
        img = Image.open("picture.jpg")
        #In-place modification
        img.thumbnail((200, 200))
        img.save("thumb.jpg")
    except IOError:
        pass

if __name__ == "__main__":
    main()
EXPERIMENT-2
AIM:Geometric transformations of Image in Python
Geometric transformations of images are used to transform the image by changing
its size, position or orientation. It has many applications in the fields of Machine
Learning and Image Processing.
For instance, consider a Machine Learning based project of detecting emotions such
as anger, sadness, happy from a given set of images. The database consists of
images present at different scales and orientations. But the model needs a uniform
set of images. Therefore, it is necessary to apply geometric transformations to
images to transform them into a consistent format.
• Rotation
• Scaling
• Translation.
• we will also learn how to combine these transformations together to perform
composite transformations of the image.
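Opening the Image
Before applying any transformation we open the image with Pillow and display it with Matplotlib. A minimal sketch (the file name is an assumption):

from PIL import Image
import matplotlib.pyplot as plt

image = Image.open('image.png')   # hypothetical input file
plt.imshow(image)
plt.show()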
Output:
Getting the size and mode of Image
The properties of the above-created image object such as size and mode are used
to get the size and color model of the given image. We get the size in terms of width
and height. The color model, in this case, is RGB. RGB stands for red, green, and
blue channels of the given image.
size = image.size
mode = image.mode
print(f"The size of Image is: {size}")
print(f"The mode of Image is: {mode}")
Output:
The size of Image is: (220, 220)
The mode of Image is: RGB
Rotation of Image
For rotating an image, we are initially taking angle as a user input to determine the
angle with which the image should be rotated. Then we use the rotate() function to
rotate the image by the specified angle in degrees (Pillow's rotate() turns the image
counterclockwise). We then plot the rotated image as an output. In the below-mentioned
code, we have rotated the image by 90 degrees.
angle=int(input("Enter angle:"))
image = image.rotate(angle)
plt.imshow(image)
Output:
Scaling of Image
For scaling an image, we try to increase or decrease the size of the image. To scale
an image we make use of resize() function in Python. The resize function takes a
tuple containing the width and height of the image as parameters. The image is then
resized to this newly mentioned width and height. In the below-mentioned code, we
have doubled the width and height of the image.
(width,height)=(image.width*2,image.height*2)
img_resize = image.resize((width,height))
plt.imshow(img_resize)
Output:
Translation of Image
Image translation is changing the position of an image by a specified shift in x and y
directions. To translate an image we make use of the transform() function in Python.
The syntax of the transform function is: image.transform(size, method, data).
In the below-mentioned code, the method used for transformation is AFFINE. Affine
Transformation is used to transform the image while preserving parallel lines in input
and output images. The input data to the affine method is a six-element tuple
(a,b,c,d,e,f) which represents an affine transformation matrix
Initially, we take the values x and y as input which represents the x and y-axis shifts
respectively. The method will calculate the value as (ax+by+c, dx+ey+f) for every
(x,y) value given as input to the c and f variables.
plt.imshow(image)
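Only the final plotting call is shown above; a minimal sketch of the full translation step, assuming the image object opened earlier and user-supplied shift values, would be:

import matplotlib.pyplot as plt
from PIL import Image

x = int(input("Enter x shift: "))
y = int(input("Enter y shift: "))
# The six-element tuple (a, b, c, d, e, f) is the affine matrix;
# the x and y shifts are supplied through c and f.
translated = image.transform(image.size, Image.AFFINE, (1, 0, x, 0, 1, y))
plt.imshow(translated)
plt.show()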
Output
Composite transformation of Image
We can also combine the above transformations. In the code below, the image is first scaled down to half of its original size and then rotated by 50 degrees (the negative angle rotates it clockwise).
(width, height) = (round(image.width/2), round(image.height/2))
img_resize = image.resize((width, height))
im1 = img_resize.rotate(-50)
plt.imshow(im1)
Output:
EXPERIMENT-3
AIM: Compute Homography Matrix.
What is Homography ?
Consider two images of a plane (top of the book) shown in Figure 1. The red
dot represents the same physical point in the two images. In computer vision
jargon we call these corresponding points. Figure 1. shows four corresponding
points in four different colors — red, green, yellow and orange.
A Homography is a transformation ( a 3×3 matrix ) that maps the points in
one image to the corresponding points in the other image.
Figure 2 :
One image of a 3D plane can be aligned with another image of the same
plane using Homography
But what about points that are not on the plane ? Well, they will NOT be
aligned by a homography as you can see in Figure 2. But wait, what if there
are two planes in the image ? Well, then you have two homographies — one
for each plane.
Homography examples using OpenCV –
Panorama
In the previous section, we learned that if a homography between two images
is known, we can warp one image onto the other. However, there was one big
caveat. The images had to contain a plane ( the top of a book ), and only the
planar part was aligned properly. It turns out that if you take a picture of any
scene ( not just a plane ) and then take a second picture by rotating the
camera, the two images are related by a homography!
In other words you can mount your camera on a tripod and take a picture.
Next, pan it about the vertical axis and take another picture. The two images
you just took of a completely arbitrary 3D scene are related by a homography.
The two images will share some common regions that can be aligned and
stitched and bingo you have a panorama of two images. Is it really that easy ?
Nope! (sorry to disappoint) A lot more goes into creating a good panorama,
but the basic principle is to align using a homography and stitch intelligently so
that you do not see the seams. Creating panoramas will definitely be part of a
future post.
Images in Figure 2. can also be generated using the following Python code.
The code below shows how to take four corresponding points in two images
and warp image onto the other.
#!/usr/bin/env python

import cv2
import numpy as np

if __name__ == '__main__' :

    # Read source image.
    im_src = cv2.imread('book2.jpg')
    # Four corners of the book in source image (example coordinates;
    # replace with the corner locations in your own image).
    pts_src = np.array([[141, 131], [480, 159], [493, 630], [64, 601]])

    # Read destination image.
    im_dst = cv2.imread('book1.jpg')
    # Four corners of the book in destination image (example coordinates).
    pts_dst = np.array([[318, 256], [534, 372], [316, 670], [73, 473]])

    # Calculate Homography
    h, status = cv2.findHomography(pts_src, pts_dst)

    # Warp source image to destination based on homography
    im_out = cv2.warpPerspective(im_src, h, (im_dst.shape[1], im_dst.shape[0]))

    # Display images
    cv2.imshow("Source Image", im_src)
    cv2.imshow("Destination Image", im_dst)
    cv2.imshow("Warped Source Image", im_out)

    cv2.waitKey(0)
Applications of Homography
The most interesting application of Homography is undoubtedly making
panoramas ( a.k.a image mosaicing and image stitching ). Panoramas will be
the subject of a later post. Let us see some other interesting applications.
EXPERIMENT-4
cv2.getPerspectiveTransform method
Syntax: cv2.getPerspectiveTransform(src, dst)
Parameters:
• src: Coordinates of quadrangle vertices in the source image.
• dst: Coordinates of the corresponding quadrangle vertices in the destination image.
cv2.warpPerspective method
Syntax: cv2.warpPerspective(src, M, dsize)
Parameters:
• src: Source image.
• M: 3×3 perspective transformation matrix.
• dsize: Size (width, height) of the output image.
It returns the output image, which has the size dsize and the same type as src.
Python code explaining Perspective Transformation:
import cv2
import numpy as np

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    # The perspective transform is applied to each frame here
    # (see the sketch below for the transform step itself)
    if cv2.waitKey(24) == 27:
        break

cap.release()
cv2.destroyAllWindows()
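The transform step inside the loop is not shown above; a minimal sketch of what it might look like, with assumed corner coordinates and output size, is:

import cv2
import numpy as np

def warp_frame(frame):
    # Assumed corner coordinates of the region of interest
    # (top-left, top-right, bottom-left, bottom-right); adjust to your scene.
    pts1 = np.float32([[0, 260], [640, 260], [0, 400], [640, 400]])
    # Where those corners should land in the output image.
    pts2 = np.float32([[0, 0], [400, 0], [0, 640], [400, 640]])
    matrix = cv2.getPerspectiveTransform(pts1, pts2)
    # Output size is (width, height) and must match the destination points.
    return cv2.warpPerspective(frame, matrix, (400, 640))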
Output:
How to apply Perspective Transformations
on an image using OpenCV Python?
In Perspective Transformation, the straight lines remain straight even after the transformation. To apply a
perspective transformation, we need a 3×3 perspective transformation matrix. We need four points on the
input image and corresponding four points on the output image.
We apply the cv2.getPerspectiveTransform() method to find the transformation matrix. Its syntax is as
follows −
M = cv2.getPerspectiveTransform(pts1,pts2)
where,
• pts1 − An array of four points on the input image and
• pts2 − An array of corresponding four points on the output image.
The Perspective Transformation matrix M is a numpy array. We pass M to the cv2.warpPerspective() function as
an argument to compute the perspective transformation. Its syntax is −
cv2.warpPerspective(img,M,(cols,rows))
Where,
• img − The image to be transformed.
• M − Perspective transformation matrix defined above.
• (cols, rows) − Width and height of the image after transformation.
To apply Perspective Transformation on an image, you can follow the steps given below −
Steps
Import the required library. In all the following Python examples, the required Python library is OpenCV.
Make sure you have already installed it.
import cv2
Read the input image using cv2.imread() function. Pass the full path of the input image.
img = cv2.imread('warning_wall.jpg')
Define pts1 and pts2. pts1 is an array of four points on the input image and pts2 is an array of
corresponding four points on the output image.
pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[100,50],[300,0],[0,300],[300,300]])
Compute the perspective transform matrix M using cv2.getPerspectiveTransform(pts1, pts2) function.
M = cv2.getPerspectiveTransform(pts1,pts2)
Transform the image using cv2.warpPerspective() method passing the perspective transform matrix as
argument. cols and rows are the desired width and height of the image after transformation.
dst = cv2.warpPerspective(img,M,(cols,rows))
Display the transformed image.
cv2.imshow("Transformed Image", dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
Let's look at some examples for a better understanding of how it is done.
We will use this image as the input file for the following examples.
Example 1
In this example, we perform Perspective Transformation on the input image. We set the width and height
of the output image the same as the input image.
# import required libraries
import cv2
import numpy as np

# read the input image
img = cv2.imread('warning_wall.jpg')
rows, cols, ch = img.shape

pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[100,50],[300,0],[0,300],[300,300]])

M = cv2.getPerspectiveTransform(pts1, pts2)
dst = cv2.warpPerspective(img, M, (cols, rows))

cv2.imshow("Transformed Image", dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
Output
On execution, this Python program will produce the following output window −
The above output image is obtained after the Perspective Transformation on the input image.
Example 2
In this example, we perform Perspective Transform on the input image. We set the width and height of the
output image as (600, 350). It is different from the width and height of the input image.
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('warning_wall.jpg',0)
rows,cols = img.shape
pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])
M = cv2.getPerspectiveTransform(pts1,pts2)
dst = cv2.warpPerspective(img,M,(600,350))
plt.subplot(121),plt.imshow(img, cmap='gray'),plt.title('Input')
plt.subplot(122),plt.imshow(dst, cmap='gray'),plt.title('Output')
plt.show()
Output
On execution, it will produce the following output window −
The left image is the input image and the right image is the output image after Perspective Transformation.
EXPERIMENT-5
Required libraries:
• OpenCV library in python is a computer vision library, mostly used for image processing, video
processing, and analysis, facial recognition and detection, etc.
• Numpy is a general-purpose array-processing package. It provides a high-performance
multidimensional array object and tools for working with these arrays.
Camera Calibration can be done in a step-by-step approach:
• Step 1: First define real world coordinates of 3D points using known size of checkerboard
pattern.
• Step 2: Different viewpoints of the checkerboard image are captured.
• Step 3: findChessboardCorners() is a method in OpenCV and used to find pixel
coordinates (u, v) for each 3D point in different images
• Step 4: Then calibrateCamera() method is used to find camera parameters.
twodpoints.append(corners2)
cv2.imshow('img', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
h, w = image.shape[:2]
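Only fragments of the calibration script survive above. A minimal, self-contained sketch of the whole procedure is given below; the checkerboard size, file pattern and termination criteria are assumptions and should be adapted to your own captured images.

import cv2
import numpy as np
import glob

# Assumed inner-corner count of the checkerboard (columns, rows).
CHECKERBOARD = (6, 9)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

threedpoints = []   # 3D points in real-world space
twodpoints = []     # 2D points in the image plane

# Real-world coordinates of the checkerboard corners (lying in the z = 0 plane).
objectp3d = np.zeros((1, CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objectp3d[0, :, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)

# Assumed file pattern for the captured checkerboard images.
for filename in glob.glob('*.jpg'):
    image = cv2.imread(filename)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    ret, corners = cv2.findChessboardCorners(
        gray, CHECKERBOARD,
        cv2.CALIB_CB_ADAPTIVE_THRESH + cv2.CALIB_CB_FAST_CHECK + cv2.CALIB_CB_NORMALIZE_IMAGE)
    if ret:
        threedpoints.append(objectp3d)
        # Refine corner locations to sub-pixel accuracy.
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        twodpoints.append(corners2)
        image = cv2.drawChessboardCorners(image, CHECKERBOARD, corners2, ret)

h, w = image.shape[:2]

# Calibrate: returns camera matrix, distortion coefficients, rotation and translation vectors.
ret, matrix, distortion, r_vecs, t_vecs = cv2.calibrateCamera(
    threedpoints, twodpoints, gray.shape[::-1], None, None)

print("Camera matrix:\n", matrix)
print("\nDistortion coefficient:\n", distortion)
print("\nRotation Vectors:\n", r_vecs)
print("\nTranslation Vectors:\n", t_vecs)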
Input:
Output:
Camera matrix:
[[ 36.26378216 0. 125.68539168]
[ 0. 36.76607372 142.49821147]
[ 0. 0. 1. ]]
Distortion coefficient:
[[-1.25491812e-03 9.89269357e-05 -2.89077718e-03 4.52760939e-04
-3.29964245e-06]]
Rotation Vectors:
[array([[-0.05767492],
[ 0.03549497],
[ 1.50906953]]), array([[-0.09301982],
[-0.01034321],
[ 3.07733805]]), array([[-0.02175332],
[ 0.05611105],
[-0.07308161]])]
Translation Vectors:
[array([[ 4.63047351],
[-3.74281386],
[ 1.64238108]]), array([[2.31648737],
[3.98801521],
[1.64584622]]), array([[-3.17548808],
[-3.46022466],
[ 1.68200157]])]
EXPERIMENT-6
When the two image planes are parallel, then the epipoles e and e’ are located at infinity. Notice
that the epipolar lines are parallel to u axis of each image plane.
An interesting case of epipolar geometry is shown in Figure 4, which occurs when the image
planes are parallel to each other. When the image planes are parallel to each other, then the
epipoles e and e’ will be located at infinity since the baseline joining the centers O1, O2 is parallel
to the image planes. Another important byproduct of this case is that the epipolar lines are
parallel to an axis of each image plane. This case is especially useful and will be covered in
greater detail in the subsequent section on image rectification.
In real-world situations, however, we are not given the exact location of the 3D location P, but can
determine its projection in one of the image planes p. We also should be able to know the
camera’s locations, orientations, and camera matrices. What can we do with this knowledge?
With the knowledge of camera locations O1, O2 and the image point p, we can define the
epipolar plane. With this epipolar plane, we can then determine the epipolar lines. By definition,
P’s projection into the second image p′ must be located on the epipolar line of the second image.
Thus, a basic understanding of epipolar geometry allows us to create a strong constraint between
image pairs without knowing the 3D structure of the scene.
The setup for determining the essential and fundamental matrices, which help map points and
epipolar lines across views.
We will now try to develop seamless ways to map points and epipolar lines across views. If we
take the setup given in the original epipolar geometry framework (Figure 5), then we shall further
define M and M′ to be the camera projection matrices that map 3D points into their respective 2D
image plane locations. Let us assume that the world reference system is associated with the first
camera, with the second camera offset first by a rotation R and then by a translation T. This specifies
the camera projection matrices to be:
M = K[I 0] M' = K'[R T]
Now we find the Fundamental Matrix (F) and the Essential Matrix (E). The Essential Matrix contains
the information about the translation and rotation that describe the location of the second camera
relative to the first in global coordinates.
The Fundamental Matrix contains the same information as the Essential Matrix, plus information
about the intrinsics of both cameras, so that we can relate the two cameras in pixel
coordinates. (If we are using rectified images and normalize the points by dividing by the focal
lengths, F = E.) In simple words, the Fundamental Matrix F maps a point in one image to a line
(epiline) in the other image. It is calculated from matching points between the two images. A
minimum of 8 such points is required to find the fundamental matrix (when using the 8-point
algorithm). More points are preferred, and RANSAC is used to get a more robust result.
So first we need to find as many possible matches between two images to find the fundamental
matrix. For this, we use SIFT descriptors with FLANN based matcher and ratio test.
import numpy as np
import cv2
from matplotlib import pyplot as plt

# Load the left and right images in gray scale
imgLeft = cv2.imread('image_l.png', 0)
imgRight = cv2.imread('image_r.png', 0)

# Detect the SIFT key points and compute the descriptors for the two images
sift = cv2.xfeatures2d.SIFT_create()
keyPointsLeft, descriptorsLeft = sift.detectAndCompute(imgLeft, None)
keyPointsRight, descriptorsRight = sift.detectAndCompute(imgRight, None)

# Create a FLANN based matcher
FLANN_INDEX_KDTREE = 0
indexParams = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
searchParams = dict(checks=50)
flann = cv2.FlannBasedMatcher(indexParams, searchParams)
matches = flann.knnMatch(descriptorsLeft, descriptorsRight, k=2)

# Apply the ratio test and keep only the good matches
goodMatches = []
ptsLeft = []
ptsRight = []

for m, n in matches:
    if m.distance < 0.8 * n.distance:
        goodMatches.append([m])
        ptsLeft.append(keyPointsLeft[m.queryIdx].pt)
        ptsRight.append(keyPointsRight[m.trainIdx].pt)
Image Left
Image Right
ptsLeft = np.int32(ptsLeft)
ptsRight = np.int32(ptsRight)
F, mask = cv2.findFundamentalMat(ptsLeft,
ptsRight,
cv2.FM_LMEDS)
ptsLeft = ptsLeft[mask.ravel() == 1]
ptsRight = ptsRight[mask.ravel() == 1]
Next, we find the epilines. Epilines corresponding to the points in the first image are drawn on the
second image, so passing the correct images is important here. We get an array of lines, so we
define a new function to draw these lines on the images.

def drawlines(img1, img2, lines, pts1, pts2):
    # img1 - image on which we draw the epilines for the points in img2
    r, c = img1.shape
    img1 = cv2.cvtColor(img1, cv2.COLOR_GRAY2BGR)
    img2 = cv2.cvtColor(img2, cv2.COLOR_GRAY2BGR)
    for line, pt1, pt2 in zip(lines, pts1, pts2):
        color = tuple(np.random.randint(0, 255, 3).tolist())
        # Two end points of the epiline line = (a, b, c): ax + by + c = 0
        x0, y0 = map(int, [0, -line[2] / line[1]])
        x1, y1 = map(int, [c, -(line[2] + line[0] * c) / line[1]])
        img1 = cv2.line(img1, (x0, y0), (x1, y1), color, 1)
        img1 = cv2.circle(img1, tuple(pt1), 5, color, -1)
        img2 = cv2.circle(img2, tuple(pt2), 5, color, -1)
    return img1, img2
Now we find the epilines in both the images and draw them.
# Find epilines corresponding to points in the right image
# and draw them on the left image
linesLeft = cv2.computeCorrespondEpilines(ptsRight.reshape(-1, 1, 2),
                                          2, F)
linesLeft = linesLeft.reshape(-1, 3)
img5, img6 = drawlines(imgLeft, imgRight,
                       linesLeft, ptsLeft,
                       ptsRight)

# Find epilines corresponding to points in the left image
# and draw them on the right image
linesRight = cv2.computeCorrespondEpilines(ptsLeft.reshape(-1, 1, 2),
                                           1, F)
linesRight = linesRight.reshape(-1, 3)
img3, img4 = drawlines(imgRight, imgLeft,
                       linesRight, ptsRight,
                       ptsLeft)

plt.subplot(121), plt.imshow(img5)
plt.subplot(122), plt.imshow(img3)
plt.show()
Output:
EXPERIMENT-7
There are two ways in which we would be implementing Edge detection on our images. In the first
method we would be using an inbuilt method provided in the pillow library
(ImageFilter.FIND_EDGES) for edge detection. In the second one we would be creating a
Laplacian Filter using PIL.ImageFilter.Kernel(), and then would use that filter for edge detection.
LAPLACIAN KERNEL:-
SAMPLE IMAGE:-
Method 1:
from PIL import Image, ImageFilter

image = Image.open(r"Sample.png")
# The edge detector expects a grayscale ("L" mode) image
image = image.convert("L")
image = image.filter(ImageFilter.FIND_EDGES)
image.save(r"Edge_Sample.png")
Output (Edge_Sample.png):
Explanation:-
Firstly we create an image object of our image using Image.open(). Then we convert the Image
color mode to grayscale, as the input to the Laplacian operator is in grayscale mode (in general).
Then we pass the image onto Image.filter() function by specifying ImageFilter.FIND_EDGES
argument, which in turns runs a edge detection kernel on top of the image. The output of the
above function results in an image with high intensity changes (edges) in shades of white, and
rest of the image in black color.
Method 2:
from PIL import Image, ImageFilter

img = Image.open(r"sample.png")
img = img.convert("L")

# Applying a 3x3 Laplacian kernel for edge detection
final = img.filter(ImageFilter.Kernel((3, 3), (-1, -1, -1, -1, 8, -1, -1, -1, -1), 1, 0))
final.save("EDGE_sample.png")
Output (EDGE_sample.png):
Explanation:-
Firstly we create an image object of our image using Image.open(). Then we convert the Image
color mode to grayscale, as the input to the Laplacian operator is in grayscale mode (in general).
Then we pass the image onto Image.filter() function by specifying our operator/Kernel inside the
function as an argument. The Kernel is specified by using ImageFilter.Kernel((3, 3), (-1, -1, -1, -1,
8, -1, -1, -1, -1), 1, 0), which creates a 3 x 3 Kernel (3 pixels wide and 3 pixels long) with the
values (-1, -1, -1, -1, 8, -1, -1, -1, -1) (as stated in the Laplacian Kernel image). The 1 argument
(after the kernel) stands for the Scale value, which divides the final value after each kernel
operation; we set that value to 1 as we don't want any division of the final value. The 0
argument (after the Scale value) is the offset, which is added after the division by the Scale value. We
have set that value to 0 as we don't want any increment to the final intensity value after the
Kernel Convolution. The output of the above function results in an image with high intensity
changes (edges) in shades of white, and the rest of the image in black.
Addendum –
Both the programs yielded the same result. The reason is that the inbuilt function
ImageFilter.FIND_EDGES uses a 3 x 3 Laplacian Kernel/Operator internally, so we ended up with
identical results. The benefit of using a Kernel instead of relying on inbuilt functions is that we can
define kernels according to our needs, which may or may not be in the library, such as kernels for
Blurring, Sharpening, Edge detection (using other Kernels), etc. The Laplacian was chosen
intentionally so that we can maintain consistency in results.
Benefits of using Laplacian:- Fast and decent results. Other common edge detectors like Sobel
(first order derivative) are more expensive computationally, as they require finding gradients in
two directions and then normalizing the results.
Drawbacks of using Laplacian:- Convolving with a Laplacian Kernel leads to a lot of noise in the
output. This issue is resolved by other edge detection methods such as the Sobel and Prewitt
operators, as they have a built-in Gaussian Blur kernel in them, which reduces the noise obtained
from the input image. They also lead to more accurate edge detection, due to the higher
computation involved in finding them.
• Run the following command on your terminal to install OpenCV from the Ubuntu or Debian
repository:
bash install-opencv.sh
• Type your sudo password when prompted, and you will have installed OpenCV.
• The first thing we are going to do is find the gradient of the grayscale image, allowing us to
find edge-like regions in the x and y direction. The gradient is a multi-variable generalization of
the derivative. While a derivative can be defined on functions of a single variable, for functions
of several variables, the gradient takes its place.
import cv2
import numpy as np
cap = cv2.VideoCapture(0)

while(1):
    # Take each frame
    _, frame = cap.read()

    # Calculation of Sobelx
    sobelx = cv2.Sobel(frame, cv2.CV_64F, 1, 0, ksize=5)
    # Calculation of Sobely
    sobely = cv2.Sobel(frame, cv2.CV_64F, 0, 1, ksize=5)
    # Calculation of Laplacian
    laplacian = cv2.Laplacian(frame, cv2.CV_64F)

    cv2.imshow('sobelx', sobelx)
    cv2.imshow('sobely', sobely)
    cv2.imshow('laplacian', laplacian)

    # Exit when the Esc key is pressed
    k = cv2.waitKey(5) & 0xFF
    if k == 27:
        break

cv2.destroyAllWindows()
cap.release()
plt.imshow(img), plt.show()
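Only the final plotting call survives above; a minimal sketch of the corner-detection step it belongs to, using OpenCV's Harris corner detector (the file name and parameter values are assumptions), is:

import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread('chessboard.png')   # hypothetical input image
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# Harris corner response: neighbourhood size 2, Sobel aperture 3, k = 0.04
dst = cv2.cornerHarris(gray, 2, 3, 0.04)
dst = cv2.dilate(dst, None)          # dilate to make the corner marks visible

# Mark strong corners in red on the original image
img[dst > 0.01 * dst.max()] = [0, 0, 255]

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)), plt.show()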
Image after corner detection –
• First it creates a 2D array or accumulator (to hold values of two parameters) and it is set to
zero initially.
• Let rows denote the r and columns denote the (θ)theta.
• Size of array depends on the accuracy you need. Suppose you want the accuracy of
angles to be 1 degree, you need 180 columns(Maximum degree for a straight line is 180).
• For r, the maximum distance possible is the diagonal length of the image. So taking one
pixel accuracy, number of rows can be diagonal length of the image.
Example:
Consider a 100×100 image with a horizontal line at the middle. Take the first point of the line. You
know its (x,y) values. Now in the line equation, put the values θ = 0, 1, 2, …, 180 and check
the r you get. For every (r, θ) pair, you increment the value by one in the accumulator in its
corresponding (r, θ) cell. So now in the accumulator, the cell (50, 90) = 1 along with some other
cells.
Now take the second point on the line. Do the same as above. Increment the values in the cells
corresponding to the (r, θ) you got. This time, the cell (50, 90) = 2. We are actually voting the (r, θ)
values. You continue this process for every point on the line. At each point, the cell (50, 90) will be
incremented or voted up, while other cells may or may not be voted up. This way, at the end, the
cell (50, 90) will have the maximum votes. So if you search the accumulator for the maximum votes,
you get the value (50, 90), which says there is a line in this image at a distance of 50 from the origin
and at an angle of 90 degrees.
Everything explained above is encapsulated in the OpenCV function cv2.HoughLines(). It simply
returns an array of (r, θ) values, where r is measured in pixels and θ is measured in radians.
import cv2
import numpy as np

# Reading the image on which line detection is to be done
img = cv2.imread('image.jpg')

# Convert to gray-scale and detect edges before the Hough transform
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)

# r accuracy of 1 pixel, theta accuracy of 1 degree (np.pi/180), vote threshold of 200
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

for line in lines:
    r, theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a * r
    y0 = b * r
    # x1, y1 and x2, y2 are two points far apart on the detected line
    x1 = int(x0 + 1000 * (-b))
    y1 = int(y0 + 1000 * (a))
    x2 = int(x0 - 1000 * (-b))
    y2 = int(y0 - 1000 * (a))
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

cv2.imwrite('linesDetected.jpg', img)
1. First parameter: the input image should be a binary image, so apply thresholding or edge detection
before applying the Hough transform.
2. Second and third parameters are r and θ(theta) accuracies respectively.
3. Fourth argument is the threshold, which means minimum vote it should get for it to be
considered as a line.
4. Remember, number of votes depend upon number of points on the line. So it represents the
minimum length of line that should be detected.
Alternate simpler method for directly extracting points:
import cv2
import numpy as np

# Read image
image = cv2.imread('path/to/image.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
lines_list = []

# Probabilistic Hough transform: returns the end points of line segments directly
lines = cv2.HoughLinesP(
    edges,              # input edge image
    1,                  # distance resolution in pixels
    np.pi / 180,        # angle resolution in radians
    threshold=100,      # minimum number of votes
    minLineLength=5,    # minimum allowed length of a line
    maxLineGap=10       # maximum allowed gap between points on the same line
)

for points in lines:
    x1, y1, x2, y2 = points[0]
    cv2.line(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    lines_list.append([(x1, y1), (x2, y2)])

cv2.imwrite('detectedLines.png', image)
• In an image analysis context, the coordinates of the point(s) of edge segments (i.e. X,Y ) in
the image are known and therefore serve as constants in the parametric line equation, while
R(rho) and Theta(θ) are the unknown variables we seek.
• If we plot the possible (r) values defined by each (theta), points in cartesian image space map
to curves (i.e. sinusoids) in the polar Hough parameter space. This point-to-curve
transformation is the Hough transformation for straight lines.
• The transform is implemented by quantizing the Hough parameter space into finite intervals or
accumulator cells. As the algorithm runs, each (X,Y) is transformed into a discretized (r, θ)
curve and the accumulator (2D array) cells which lie along this curve are incremented.
• Resulting peaks in the accumulator array represent strong evidence that a corresponding
straight line exists in the image.
Applications of Hough transform:
1. It is used to isolate features of a particular shape within an image.
2. Tolerant of gaps in feature boundary descriptions and is relatively unaffected by image noise.
3. Used extensively in barcode scanning, verification and recognition
EXPERIMENT-8
AIM:Introduction to SIFT( Scale Invariant Feature Transform)
SIFT stands for Scale-Invariant Feature Transform and was first presented in
2004, by D. Lowe, University of British Columbia. SIFT is invariant to image
scale and rotation. This algorithm is patented, so it is included in
the Non-free module in OpenCV.
Major advantages of SIFT are
• Locality: features are local, so robust to occlusion and clutter (no prior
segmentation)
The algorithm
SIFT is quite an involved algorithm. There are mainly four steps involved in the
SIFT algorithm, followed by a matching stage. We will see them one-by-one.
• Scale-space peak selection
• Keypoint Localization
• Orientation Assignment
• Keypoint descriptor
• Keypoint Matching
Scale-space
Real world objects are meaningful only at a certain scale. You might see a sugar
cube perfectly on a table. But if looking at the entire milky way, then it simply
does not exist. This multi-scale nature of objects is quite common in nature.
And a scale space attempts to replicate this concept on digital images.
The scale space of an image is a function L(x,y,σ) that is produced from the
convolution of a Gaussian kernel(Blurring) at different scales with the input
image. Scale-space is separated into octaves and the number of octaves and
scale depends on the size of the original image. So we generate several octaves
of the original image. Each octave’s image size is half the previous one.
Blurring
Within an octave, images are progressively blurred using the Gaussian Blur
operator. Mathematically, “blurring” is referred to as the convolution of the
Gaussian operator and the image. Gaussian blur has a particular expression or
“operator” that is applied to each pixel. What results is the blurred image.
Blurred image
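Written out, the relation the figure above refers to is the standard scale-space definition (stated here for completeness, with * denoting convolution):

L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2} + y^{2})/(2\sigma^{2})}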
G is the Gaussian Blur operator and I is an image. While x,y are the location
coordinates and σ is the “scale” parameter. Think of it as the amount of blur.
Greater the value, greater the blur.
Now we use those blurred images to generate another set of images, the
Difference of Gaussians (DoG). These DoG images are great for finding out
interesting keypoints in the image. The difference of Gaussian is obtained as
the difference of Gaussian blurring of an image with two different σ, let it be σ
and kσ. This process is done for different octaves of the image in the Gaussian
Pyramid. It is represented in below image:
Finding keypoints
Up till now, we have generated a scale space and used the scale space to
calculate the Difference of Gaussians. Those are then used to calculate
Laplacian of Gaussian approximations that are scale invariant.
Keypoint Localization
Keypoint detection in the previous step produces a lot of keypoints. Some of
them lie along an edge, or they don’t have enough contrast. In both cases, they
are not as useful as features. So we get rid of them. The approach is similar to
the one used in the Harris Corner Detector for removing edge features. For low
contrast features, we simply check their intensities.
They used Taylor series expansion of scale space to get a more accurate
location of extrema, and if the intensity at this extrema is less than a threshold
value (0.03 as per the paper), it is rejected. DoG has a higher response for
edges, so edges also need to be removed. They used a 2x2 Hessian matrix (H)
to compute the principal curvature.
Orientation Assignment
Keypoint descriptor
Keypoint Matching
Implementation
I was able to implement sift using OpenCV(3.4). Here’s how I did it:
plots[0].set_title("Training Image")
plots[0].imshow(training_image)
plots[1].set_title("Testing Image")
plots[1].imshow(test_image)
Out[1]:
<matplotlib.image.AxesImage at 0x7fa8c84ed390>
keypoints_without_size = np.copy(training_image)
keypoints_with_size = np.copy(training_image)
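The notebook cells that load the two images, set up the subplot axes and detect the keypoints are not reproduced above; a minimal sketch of those steps (the file names are assumptions) is:

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the training and test images (assumed file names), convert to RGB and grayscale
training_image = cv2.cvtColor(cv2.imread('training_image.jpg'), cv2.COLOR_BGR2RGB)
test_image = cv2.cvtColor(cv2.imread('test_image.jpg'), cv2.COLOR_BGR2RGB)
training_gray = cv2.cvtColor(training_image, cv2.COLOR_RGB2GRAY)
test_gray = cv2.cvtColor(test_image, cv2.COLOR_RGB2GRAY)

# Detect SIFT keypoints and compute their descriptors
sift = cv2.xfeatures2d.SIFT_create()
train_keypoints, train_descriptor = sift.detectAndCompute(training_gray, None)
test_keypoints, test_descriptor = sift.detectAndCompute(test_gray, None)

# Draw the keypoints on copies of the training image, with and without size/orientation
keypoints_without_size = np.copy(training_image)
keypoints_with_size = np.copy(training_image)
cv2.drawKeypoints(training_image, train_keypoints, keypoints_without_size,
                  color=(0, 255, 0))
cv2.drawKeypoints(training_image, train_keypoints, keypoints_with_size,
                  flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)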
Matching Keypoints
In [5]:
# Create a Brute Force Matcher object.
bf = cv2.BFMatcher(cv2.NORM_L1, crossCheck = False)
# Perform the matching between the SIFT descriptors of the training image and the test image
matches = bf.match(train_descriptor, test_descriptor)

# Print the total number of matching points between the training and query images
print("\nNumber of Matching Keypoints Between The Training and Query Images: ", len(matches))
Number of Matching Keypoints Between The Training and Query Images: 302
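To visualise the result, the matches can be sorted by distance and drawn with cv2.drawMatches(); a short sketch continuing from the objects above:

# Sort the matches so that the best ones (smallest distance) come first
matches = sorted(matches, key=lambda x: x.distance)

# Draw the first 100 matches between the training and test images
result = cv2.drawMatches(training_image, train_keypoints,
                         test_image, test_keypoints,
                         matches[:100], test_image, flags=2)

plt.figure(figsize=(20, 10))
plt.imshow(result)
plt.show()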
EXPERIMENT-9
In 2005, Dalal and Triggs published a research paper named Histograms of Oriented
Gradients for Human Detection. After the release of this paper, HOG is used in a lot of
object detection applications.
• HOG focuses on the structure of the object. It extracts the information of the
edges magnitude as well as the orientation of the edges.
• It uses a detection window of 64x128 pixels, so the image is first converted
into (64, 128) shape.
• The image is then further divided into small parts, and the gradient and orientation of each part
are calculated. The 64x128 window is divided into 8x16 = 128 cells of 8x8 pixels each; the cells
are grouped into blocks of 2x2 cells with 50% overlap, so there are going to be 7x15 = 105 blocks
in total.
• We take the 64 gradient vectors of each cell (an 8x8 pixel patch) and put them into
a 9-bin histogram.
As mentioned previously, if you have a wide image, then crop the image to the specific
part in which you want to apply HOG feature extraction, and then resize it to the
appropriate shape.
Calculating Gradients
Now after resizing, we need to calculate the gradient in the x and y directions. The
gradient is simply the small change in the x and y directions; to compute it, we convolve two
simple filters with the image.
from skimage.io import imread
from skimage.transform import resize
import matplotlib.pyplot as plt

# reading the input image (file name is an example) and resizing it
img = imread('person.jpg')
resized_img = resize(img, (128*4, 64*4))
plt.axis("off")
plt.imshow(resized_img)
print(resized_img.shape)
• image: The target image you want to apply HOG feature extraction.
• orientations: Number of bins in the histogram we want to create, the original
research paper used 9 bins so we will pass 9 as orientations.
• pixels_per_cell: Determines the size of the cell, as we mentioned earlier, it
is 8x8.
• cells_per_block: Number of cells per block, will be 2x2 as mentioned
previously.
• visualize: A boolean whether to return the image of the HOG, we set it
to True so we can show the image.
• multichannel: We set it to True to tell the function that the last dimension is
considered as a color channel, instead of spatial.
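Putting these parameters together, a minimal sketch of the feature-extraction call with scikit-image (continuing from the resized_img above) might be:

from skimage.feature import hog

# Compute the HOG feature vector and the visualisation image
# using the parameters described above.
fd, hog_image = hog(resized_img,
                    orientations=9,
                    pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2),
                    visualize=True,
                    multichannel=True)

plt.axis("off")
plt.imshow(hog_image, cmap="gray")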
What is Computer Vision?
Computer vision is a field of study within artificial intelligence (AI) that focuses
on enabling computers to interpret and extract information from images and
videos, in a manner similar to human vision. It involves developing algorithms
and techniques to extract meaningful information from visual inputs and make
sense of the visual world.