Computer Vision
Computer vision gives computers the ability to see the world much as humans do.
It is a domain of Artificial Intelligence that enables computers to see, observe and
understand digital images and other visual data. It does so by acquiring, screening,
analysing, identifying and extracting information with machine learning and neural
network algorithms.
Applications of Computer Vision
• Image classification and object detection
• Banking
• Agriculture
• Autonomous Cars
• Retail Business
• Warehouse Automation
• Damage Analysis
• Medical Field
Difference between Computer Vision and Human Vision
Computer Vision Tasks
Computer vision applications are built on a small number of tasks performed on an input
image to produce the desired output, which can then be used for prediction or data analysis.
For Single Objects: Classification, Classification + localisation
For Multiple Objects: Object detection, Instance segmentation
Single Objects
Here the input image given to the computer vision application contains a single object.
Classification:
Classification is the process of finding the class / category of an input image.
These predefined categories are learned by the computer from a set of sample images.
The most popular architecture used for image classification is the Convolutional
Neural Network (CNN).
Eg: Identifying an image of a monument as India Gate.
Classification + Localisation:
• Localisation means finding where the object is in the image. This task processes
the input image to identify the object's category along with the location of the
object in the image. Eg: Identify the monument as India Gate and mark the region of
the image that it occupies.
Multiple Objects
Here the input image given to the computer vision application contains multiple objects.
Object detection:
It is the process of identifying / detecting instances of real-world objects such as cars,
bicycles, buses, animals, humans or anything else on which the detection model has been trained.
This kind of system uses an object detection algorithm to extract features of objects and
match them against the sample images already fed into the system.
Instance Segmentation:
• It is the process of dividing an image into smaller regions, so that the machine can separate
an object from the background or from the other objects present along with it in the input
image.
• A segmentation algorithm takes an image as input and outputs a collection of regions /
segments.
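The "image in, collection of regions out" idea can be illustrated with a toy sketch. This is not how a trained instance segmentation model works; it is a simple stand-in that assumes a greyscale image stored as a list of rows, treats every pixel at or above a chosen brightness as foreground, and groups touching foreground pixels into one region. The function name `segment` and the `threshold` parameter are hypothetical names used only for this example.

```python
from collections import deque

def segment(image, threshold=128):
    """Split a greyscale image (list of rows of 0-255 values) into regions.

    Assumption for this sketch: pixels >= threshold are foreground, and
    foreground pixels that touch horizontally or vertically belong together.
    Returns a list of regions, each a list of (row, col) positions.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood fill outwards from this unvisited foreground pixel.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and image[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions

# Two bright objects on a dark background.
image = [
    [0, 200, 200, 0,   0],
    [0, 200,   0, 0, 255],
    [0,   0,   0, 0, 255],
]
regions = segment(image)
print(len(regions))  # 2
```

Each returned region is one segment; everything below the threshold is treated as background.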
PIXELS
o Pixel stands for “picture element”.
o It is the smallest unit of information in a digital image.
o Pixels are arranged in a two-dimensional grid to form a complete
image, video, text, or any other visible thing on a digital display.
o A pixel can have only one colour at a time.
o The number of colours a pixel can take is determined by the number of bits
used to represent it.
o Screen resolution is expressed as the number of pixels displayed
horizontally by the number of pixels displayed vertically.
o For example: a Full HD screen has a resolution of 1080p, which
means 1920 pixels wide by 1080 pixels tall.
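The arithmetic behind these points can be sketched in a few lines. The helper names `total_pixels` and `colours_for_bit_depth` are hypothetical, chosen for this example, and the second one assumes every bit pattern maps to a distinct colour.

```python
def total_pixels(width, height):
    """Number of pixels in a grid that is `width` wide and `height` tall."""
    return width * height

def colours_for_bit_depth(bits):
    """Distinct colours available when each pixel is stored in `bits` bits.

    Assumes every bit pattern maps to a different colour.
    """
    return 2 ** bits

print(total_pixels(1920, 1080))   # 2073600 pixels on a Full HD screen
print(colours_for_bit_depth(1))   # 2
print(colours_for_bit_depth(8))   # 256
print(colours_for_bit_depth(24))  # 16777216
```

So a Full HD screen shows a little over two million pixels, and an 8-bit pixel can take 256 different values.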
Basics of Images
An image is a visual representation of an object.
Here the term ‘image’ means a picture that has been created or stored in electronic form.
It can be described in terms of vector graphics or raster graphics.
A raster image consists of a rectangular array of dots known as pixels.
The size of an image is specified as width x height, in number of pixels.
Greyscale Images
• A greyscale image is one in which each pixel holds a single value.
• It is often called black & white, although it contains shades of grey.
• The shades range from black at the weakest intensity to white at the
strongest.
• Each pixel of a greyscale image is 1 byte in size, and the image is a
single plane: one 2D array of pixels.
• Pixel values range from 0 to 255, i.e. from pure black to pure
white.
RGB Images
• All coloured images around us are made up of three primary colours:
Red, Green and Blue.
• Every coloured image is stored in the form of three different channels:
R-Channel, G-Channel & B-Channel.
• Each channel has pixel values ranging from 0 to 255.
• In a coloured image, a single pixel contains its red, green & blue
values as a triplet.
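A minimal sketch of how the two kinds of image might be held in memory, using plain Python lists rather than any imaging library. The sample pixel values and the helper names `image_size`, `split_channels` and `storage_bytes` are assumptions made for this illustration.

```python
# Greyscale: one value (0-255) per pixel, stored as a single 2D grid.
grey = [
    [0, 128, 255],
    [64, 192, 32],
]

# RGB: each pixel is an (R, G, B) triplet, every value in 0-255.
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
]

def image_size(image):
    """Return (width, height) in pixels."""
    return len(image[0]), len(image)

def split_channels(image):
    """Split an RGB image into separate R, G and B 2D grids."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue

def storage_bytes(image, channels):
    """Uncompressed size, assuming 1 byte per channel per pixel."""
    width, height = image_size(image)
    return width * height * channels

red, green, blue = split_channels(rgb)
print(image_size(grey))        # (3, 2)
print(red)                     # [[255, 0, 0], [255, 0, 128]]
print(storage_bytes(grey, 1))  # 6
print(storage_bytes(rgb, 3))   # 18
```

Each channel returned by `split_channels` has the same shape as a greyscale image, which is why a colour image of the same width and height needs three times the storage.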