Report on Computer Vision
Submitted By: Sushmita Mallick, Dept-IT, IEM
Roll No: 10400313178
Registration No: 131040110437
Submitted To: Prof. Samapika Das Biswas, Project Guide
Date: 19-05-2015
CERTIFICATE
To Whom It May Concern
This is to certify that the project report entitled “COMPUTER VISION”, submitted by a student of the Institute of Engineering & Management in partial fulfilment of the requirements for the degree of Bachelor of Information Technology, is a bona fide work carried out under the supervision and guidance of Prof. Samapika Das Biswas during the 4th semester of the academic session 2014-2015. The content of this report has not been submitted to any other university or institute for the award of any other degree.
I am glad to inform you that the work is entirely original and its performance is found to be quite satisfactory.
ACKNOWLEDGEMENTS
Firstly, I would like to thank my project guide, Prof. Samapika Das Biswas, for her guidance and support. Secondly, I would also like to thank my parents and friends, who helped me a great deal in finalizing this project within the limited time frame.
CONTENTS
1. Title Page
2. Certificate
3. Acknowledgements
4. Table of Contents
5. Preface
6. List of Illustrations
7. Abstract
8. Introduction
9. Classification of Images
10. Image Processing and Computer Vision
11. Basic Concepts
12. Image Representation
13. Colour Spaces
14. Typical Tasks of Computer Vision
15. Computer Vision System Methods
16. Applications and Future Prospects
17. Research Areas in Computer Vision
18. Conclusion
19. References
PREFACE
The goal of computer vision is to compute properties of the three-
dimensional world from digital images. Problems in this field include
reconstructing the 3D shape of an environment, determining how
things are moving, and recognizing people and objects and their
activities, all through analysis of images and videos.
LIST OF ILLUSTRATIONS
1. Raster Image
2. Fields Related to Computer Vision
3. Image Processing vs Computer Vision
4. Binary
5. Grayscale
6. Colour
7. RGB Additive Nature
8. HSV
9. Object Tracking
ABSTRACT
INTRODUCTION
A robot cannot perceive the image above the way we do. All the robot perceives is some voltage levels and an array of binary or decimal data; it has absolutely no idea what the image is. It is we, the intelligent humans, who program the robot in such a way that it identifies different objects.
Classification of Images
Some of the basic concepts related to images and image
processing like types of images, pixel, channel, depth are
discussed here.
Image – An image is an artefact that depicts or records visual perception. Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices such as cameras, mirrors, lenses, telescopes and microscopes, or by natural objects and phenomena, such as the human eye or water.
Raster Image – Images composed of a rectangular grid of pixels are called raster images. Most software supports raster images only.
Fig.1
Vector Image – Images inspired by the concepts of mathematical geometry (vectors) are called vector images. Following the idea of a vector, each element of the image has a direction and a length. Such images are quite complicated to understand and process; they are supported by relatively little software, and comparatively little work has been done in this field.
Image Processing and Computer Vision
The scope of Computer Vision is much larger, as can be seen in the following figure —
Fig.2
For example, medical imaging includes substantial work on the analysis of image data in medical applications.
Fig.3
Basic Concepts
Pixels and Resolution
Pixels are the tiny dots that make up an image. They are the smallest visual elements an image can be divided into, and each pixel has a fixed physical location in a raster image. When an image is stored, the image file contains the following information:
Pixel Location
Pixel Intensity
Aspect Ratio
Aspect Ratio is simply the ratio of an image's width to its height (Width:Height). For instance, a 256×256 image has an aspect ratio of 1:1. You must have come across this in several contexts; while watching movies or TV shows, for example, you may have noticed different aspect ratio standards. There are three common standards —
Academy Standard – 4:3
US Digital Standard – 16:9
Anamorphic Scope Standard – 21:9
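As a small illustrative sketch (not part of the original report), the reduced Width:Height ratio can be computed by dividing both dimensions by their greatest common divisor:

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce width:height to its simplest whole-number ratio."""
    d = gcd(width, height)
    return width // d, height // d

print(aspect_ratio(256, 256))    # the 1:1 example above
print(aspect_ratio(1920, 1080))  # the 16:9 US Digital Standard
```

Applied to a 640×480 frame, the same function yields 4:3, the Academy Standard.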
Image Representation
Images can be represented in three ways —
Black and White Images (i.e. Binary Images)
Grayscale Images
Color Images
Fig.4
Grayscale Images
In a grayscale image, pixels are represented by several shades ranging between black and white. Black is usually represented as 0 and white as 1, but unlike binary images, intermediate values between 0 and 1 are also possible, resulting in different shades of gray.
Fig.5
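Converting between these representations takes only a few lines of NumPy; the sketch below is illustrative, and the BT.601 luma weights it uses are a common convention rather than something the report specifies:

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB array to one shade per pixel (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(float) @ weights).round().astype(np.uint8)

def to_binary(gray, threshold=128):
    """Reduce a grayscale image to pure black (0) and white (1)."""
    return (gray >= threshold).astype(np.uint8)
```

A binary image is thus just a grayscale image pushed to its two extremes by a threshold.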
Color Images
In a color image, each pixel typically carries three primary color components: red, green and blue. Each of these colors has its own plane of pixel intensities in the form of a separate channel. The channels correspond to different color spaces as well.
Fig.6
Depth
Depth (or bit depth) refers to the number of bits used to represent the intensity of each pixel; an 8-bit grayscale image, for example, can distinguish 256 shades.
Color Spaces
Fig.7
This picture contains several bands of colors. The top band represents pure red; below it, the band shows how red fades into white. The same is repeated with pure green and pure blue, followed by yellow, cyan and magenta (each a combination of two of the primary colors), and then bands of white and black.
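The additive nature of RGB can be checked directly. This is a small sketch, not code from the report; it simply adds color vectors and clips to the 8-bit range:

```python
import numpy as np

def mix(c1, c2):
    """Additively mix two RGB colors, clipping to the 8-bit range."""
    return np.clip(c1.astype(int) + c2.astype(int), 0, 255).astype(np.uint8)

red = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue = np.array([0, 0, 255], dtype=np.uint8)

yellow = mix(red, green)   # a secondary color is the sum of two primaries
white = mix(yellow, blue)  # all three primaries together give white
```

Cyan and magenta fall out the same way, as sums of the other primary pairs.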
HSV
Fig.8
Suppose we want to extract the yellow region of a ball. In this case, there is a lot of variation in color intensity due to the ambient lighting: the top portion of the ball is very bright, whereas the bottom portion is darker than the other regions. This is where the RGB color model fails. Due to such a wide range of intensity and color mix, there is no particular range of RGB values that can be used for extraction. This is where the HSV color model comes in. Just like the RGB model, the HSV model has three parameters.
Hue: In simple terms, this represents the “color”. For example, red is a color; green is a color; pink is a color. Light red and dark red both refer to the same color, red; light and dark green both refer to the same color, green. Thus, to extract the yellow ball from the image above, we target the yellow hue, since light and dark yellow are both still yellow.
Saturation: This represents the “amount” of a particular color. For example, a fully saturated red has the maximum value (255), while pale red has a lower value (say, 106).
Value: Sometimes called intensity, this differentiates between the light and dark variations of a color. For example, light yellow and dark yellow can be told apart using this parameter.
This makes the HSV color space largely independent of illumination and makes processing images easier. But it isn't very intuitive, and some people may have difficulty understanding its concepts. Each color has a separate hue value, and different shades of a color share the same hue. Saturation refers to the amount of color, which is why you see such variation in the saturation channel. The value (or intensity) channel is uniform in a computer-generated image; in real images, there is variation in the intensity channel as well.
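The claim that light and dark shades share a hue can be verified with Python's standard colorsys module (a sketch, not the report's code; colorsys works on floats in [0, 1], and a hue of 1/6 corresponds to yellow):

```python
import colorsys

# Three variants of yellow: pure, dark (low value), and pale (low saturation)
pure = colorsys.rgb_to_hsv(1.0, 1.0, 0.0)
dark = colorsys.rgb_to_hsv(0.4, 0.4, 0.0)
pale = colorsys.rgb_to_hsv(1.0, 1.0, 0.6)

# All three share the same hue; only saturation and value differ
print(pure, dark, pale)
```

This is exactly why thresholding on hue extracts the whole ball, bright top and shadowed bottom alike.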
The converse is equally true: when you combine three grayscale images, the result is a color image. And this is no magic either; the three grayscale images are simply stacked and interpreted as the channels of a color image.
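Stacking three single-channel planes into one color array, and splitting it back, is a one-liner each in NumPy. The values below are made up for illustration:

```python
import numpy as np

h, w = 4, 4
r = np.full((h, w), 200, dtype=np.uint8)  # one grayscale plane per channel
g = np.full((h, w), 120, dtype=np.uint8)
b = np.full((h, w), 40, dtype=np.uint8)

color = np.stack([r, g, b], axis=-1)  # combine: shape becomes (h, w, 3)
r_again = color[..., 0]               # split: indexing the last axis recovers a plane
```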
Channels
Typical tasks of computer vision
Recognition
A central task in computer vision, image processing and machine vision is determining whether the image data contains some specific object, feature or activity.
Object recognition –
One or several pre-specified or learned objects or object
classes can be recognized, usually together with their 2D
positions in the image or 3D poses in the scene.
Identification –
An individual instance of an object is recognized.
Examples include identification of a specific person's face
or fingerprint, identification of handwritten digits, or
identification of a specific vehicle.
Detection –
The image data are scanned for a specific condition.
Examples include detection of possible abnormal cells or
tissues in medical images or detection of a vehicle in an
automatic road toll system. Detection based on relatively
simple and fast computations is sometimes used for finding
smaller regions of interesting image data which can be
further analyzed by more computationally demanding
techniques to produce a correct interpretation.
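One of the simplest detectors of this "relatively simple and fast" kind is a brightness threshold followed by a bounding box. The function below is an illustrative sketch, not a method from the report:

```python
import numpy as np

def detect_bright_region(gray, threshold=200):
    """Return the bounding box (top, bottom, left, right) of pixels at or
    above the threshold, or None if no pixel qualifies."""
    ys, xs = np.nonzero(gray >= threshold)
    if ys.size == 0:
        return None
    return ys.min(), ys.max(), xs.min(), xs.max()
```

The returned box marks a smaller region of interesting image data that can then be handed to a more computationally demanding analysis step, as the text describes.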
Motion analysis
Several tasks relate to motion estimation, where an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene, or even of the camera that produces the images. Examples of such tasks are:
Egomotion – determining the 3D rigid motion (rotation and
translation) of the camera from an image sequence
produced by the camera.
Tracking – following the movements of a (usually) smaller
set of interest points or objects (e.g., vehicles or humans) in
the image sequence.
Optical flow – to determine, for each point in the image,
how that point is moving relative to the image plane, i.e.,
its apparent motion. This motion is a result both of how the
corresponding 3D point is moving in the scene and how the
camera is moving relative to the scene.
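A classic, if naive, way to estimate the motion just described is exhaustive block matching: for each block of the first frame, search a small neighbourhood of the second frame for the best match. The sketch below (sum-of-absolute-differences matching, with assumed block and search sizes) is illustrative only:

```python
import numpy as np

def block_match_flow(prev, curr, block=8, search=4):
    """Estimate one motion vector per block by exhaustive search,
    minimizing the sum of absolute differences (SAD)."""
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = prev[y:y + block, x:x + block].astype(float)
            best_err, best_dv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate window falls outside the frame
                    cand = curr[yy:yy + block, xx:xx + block].astype(float)
                    err = np.abs(ref - cand).sum()
                    if err < best_err:
                        best_err, best_dv = err, (dy, dx)
            flow[by, bx] = best_dv
    return flow
```

Real optical-flow methods are far more sophisticated, but this captures the core idea of searching for where each patch went.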
Scene reconstruction
Given one or more images of a scene, or a video, scene
reconstruction aims at computing a 3D model of the scene. In
the simplest case the model can be a set of 3D points. More
sophisticated methods produce a complete 3D surface model.
The advent of 3D imaging not requiring motion or scanning, and of related processing algorithms, is enabling rapid advances in this field. Grid-based 3D sensing can be used to acquire 3D images from multiple angles, and algorithms are now available to stitch multiple 3D images together into point clouds and 3D models.
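For the simplest case, a set of 3D points, a point can be recovered from two views by linear triangulation. The sketch below uses the standard direct linear transform (DLT); the camera matrices in the accompanying test are made up for illustration:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from its projections x1, x2 in two cameras
    with 3x4 projection matrices P1, P2 (direct linear transform)."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]               # null vector of A, in homogeneous coordinates
    return X[:3] / X[3]
```

Repeating this over many matched points yields the point-cloud model mentioned above.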
Image restoration
The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images. The simplest approaches to noise removal are various types of filters, such as low-pass filters or median filters. More sophisticated methods assume a model of what the local image structures look like, a model that distinguishes them from noise. By first analysing the image data in terms of local structures such as lines or edges, and then controlling the filtering based on local information from the analysis step, a better level of noise removal is usually obtained than with the simpler approaches.
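A median filter of the kind mentioned above is easy to sketch with NumPy; edge handling by border replication is an assumption made here, not something the report specifies:

```python
import numpy as np

def median_filter(img, k=3):
    """Replace each pixel with the median of its k x k neighbourhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')  # replicate borders
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out
```

A single "salt" pixel of impulse noise vanishes entirely under this filter, which is why median filters handle impulse noise better than low-pass averaging does.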
Computer Vision System Methods
Image acquisition – A digital image is produced by one or several image sensors. Depending on the type of sensor, the resulting image data is an ordinary 2D image, a 3D volume, or an image sequence. The pixel values typically correspond to light intensity in one or several spectral bands (gray images or colour images), but can also be related to various physical measures, such as depth, absorption or reflectance of sonic or electromagnetic waves, or nuclear magnetic resonance.
Pre-processing – Before a computer vision method can be applied to image data, the data is usually processed in order to assure that it satisfies certain assumptions. Examples are:
Re-sampling in order to assure that the image coordinate system is correct.
Noise reduction in order to assure that sensor noise does not introduce false information.
Contrast enhancement in order to assure that relevant information can be detected.
Scale space representation to enhance image structures at locally appropriate scales.
Detection/segmentation – At some point in the processing, a decision is made about which image points or regions of the image are relevant for further processing. Examples are:
Selection of a specific set of interest points.
Segmentation of one or multiple image regions that contain a specific object of interest, or partitioning of the image into different categories.
Image registration – comparing and combining two different views of the same object or scene.
Applications and Future Prospects
Fig.9
Apart from all these applications, there are many others that haven't been covered here, such as scene reconstruction, image restoration and robotic control.
Research Areas in Computer Vision
Understanding Images
Image understanding with tens of layers, millions of classes, and billions of images, e.g. Common Objects in Context (Microsoft COCO).
Understanding Humans
Since so much of computer vision is ultimately for humans, images of humans are an important special case, e.g. human body pose estimation for Kinect.
Learning and Optimization
Computer vision often requires the solution of especially large or difficult problems in machine learning and nonlinear optimization, and it drives innovation in these domains.
Conclusion
There are many kinds of computer vision systems; nevertheless, all of them contain these basic elements: a power source, at least one image acquisition device (a camera, CCD, etc.), a processor, and control and communication cables or some kind of wireless interconnection mechanism. In addition, a practical vision system contains software, as well as a display in order to monitor the system. Vision systems for indoor spaces, such as most industrial ones, contain an illumination system and may be placed in a controlled environment. Furthermore, a complete system includes many accessories, such as camera supports, cables and connectors. Computer vision is an emerging field of research which, if implemented properly, can be used in several regular operations as well as in disaster management. Computer vision is a building block of artificial intelligence and can be implemented in critical work like diagnosis and evolutionary robotics.
References
http://moodle.epfl.ch/mod/resource/view.php?id=12423
http://maxembedded.com/2012/12/basic-concepts-of-computer-vision/
http://en.wikipedia.org/wiki/Computer_vision
http://ieeexplore.ieee.org/xpl/topAccessedArticles.jsp?punumber=2200