Computer Vision Part 2

Class 10 AI - 5

Uploaded by

banani1776

Chapter : Computer Vision

How Do Humans See?

1) Our process of seeing things is possible primarily because of the presence of light.
2) Humans see an object when light bounces off the object and enters the eye through the
cornea.
3) The light then falls on the retina, which contains the cells involved in color vision; the
retina sends signals to the brain through the optic nerve.
4) Once an object is seen by the eyes, it needs to be understood by the brain.
5) Our brain then identifies and understands other information like color, shape, movement
and other details about the image.

Tasks in Computer Vision Applications

Computer vision applications are based on a certain number of tasks performed on an input
image to get the desired output, which can then be used for prediction or analysis of data.

The tasks on which computer vision applications are based are:

1) Single Object
i) This means giving one image as input to the computer vision application.
ii) It can further be divided into 2 categories
a) Classification
• Classification is the process of finding out the class/category of the input
image.
• The input image is processed using a machine learning algorithm and
classified into predefined categories.
• The most popular architecture used for image classification is the
Convolutional Neural Network (CNN).
b) Classification + Localization
• Localization means finding where the object is in the image.
• The combined task of classification and localization means processing the
input image to identify its category along with the location of the object in
the image.
2) Multiple Objects
i) This means giving an image that contains multiple objects as input to the computer
vision application.
ii) It can further be divided into 2 categories
a) Object Detection
• Object detection is the process of identifying or detecting instances of
real-world objects like cars, bicycles, buses, animals, humans or anything
on which the detection model has been trained.
• Object detection draws a bounding box around each object in the image.
• Object detection algorithms extract the features of the object, and then
machine learning algorithms recognize instances of an object category by
matching them with sample images already fed into the system.
b) Image Segmentation
• It is the computer vision task that involves identifying and separating
individual objects within an image, including detecting the boundaries of
each object and assigning a unique label to each object.
• A segmentation model takes an image as input and outputs a collection of
regions (segments).

What are Pixels?

i) Pixel stands for ‘Picture Element’.
ii) It is the smallest unit of information in a digital image.
iii) These pixels are arranged in a 2-dimensional grid to form a complete image, video, text
or any visible thing on a digital platform.
iv) The most common pixel format is the byte image, where the value is stored as an 8-bit
integer ranging from 0-255, where 0 means no color/black and 255 represents full
intensity/white.
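The byte-image format described above can be seen directly in a tiny numpy array (a minimal sketch; the 2x2 pixel values are made up for illustration):

```python
import numpy as np

# A tiny 2x2 byte image: each pixel is one 8-bit integer in the range 0-255.
img = np.array([[0, 64],
                [128, 255]], dtype=np.uint8)

print(img.dtype)             # uint8 -> one byte per pixel
print(img.min(), img.max())  # 0 (black) and 255 (full intensity / white)
```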

What is Resolution?

i) Resolution is basically the dimension through which we can measure how many pixels
are on a screen.
ii) Screen resolution is expressed as the number of pixels displayed horizontally by the
number of pixels displayed vertically.
iii) For example, a Full HD screen has a resolution of 1080p, which means 1920 pixels
wide by 1080 pixels tall.
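Given a resolution, the total pixel count follows directly; a quick sketch for Full HD:

```python
# Full HD: 1920 pixels wide by 1080 pixels tall.
width, height = 1920, 1080

total_pixels = width * height
print(total_pixels)  # 2073600, i.e. about 2.07 megapixels
```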

Basics of Image

1) Greyscale Image
i) A greyscale image is one in which each pixel has a single value, i.e. it carries
only intensity information.
ii) It is also known as a black and white image.
iii) These images contain only shades of grey, varying from black at the
weakest intensity to white at the strongest.
iv) A greyscale image has pixels of size 1 byte, forming a single plane of a 2D array of
pixels.
v) In greyscale images, the shade range starts at 0 and ends at 255, i.e. it starts
with pure black and ends with pure white.
2) RGB Images
i) All colored images around us are made of the 3 primary colors: Red, Green and Blue.
ii) All colors are made by mixing these 3 basic RGB colors in varying intensities.
iii) Every colored image, when split, is stored in the form of 3 different channels: the R
channel, the G channel and the B channel.
iv) Each channel has pixel values varying from 0-255.
v) In a colored image, a single pixel contains red, green and blue values as a triplet.
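The difference between the two image types above shows up directly in array shapes; a minimal numpy sketch with made-up sizes:

```python
import numpy as np

# Greyscale: a single 2D plane, one intensity value (0-255) per pixel.
grey = np.zeros((4, 4), dtype=np.uint8)

# RGB: the same grid, but each pixel is an (R, G, B) triplet of 0-255 values.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]  # a pure red pixel

print(grey.shape)  # (4, 4)    -> one plane
print(rgb.shape)   # (4, 4, 3) -> three channels
```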

Understanding Convolution Operator

1) Technically, convolution is defined as a simple mathematical operation that multiplies
two numeric arrays of the same dimensionality, but generally of different sizes, to produce
a third numeric array of the same dimensionality.
2) The filters that we use to improve image quality or edit pixel values in Photoshop
actually change the pixel values throughout the image with the help of convolution and
the convolution operator.
3) In simple terms, convolution is passing a ‘kernel’ matrix over the whole ‘image’ matrix
to give the convolved matrix, i.e. the filtered image.

Note: An image is made up of pixels and these pixels are arranged in a 2D matrix to form a digital
image.
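The kernel-sliding idea above can be sketched in a few lines of numpy (a minimal illustration with made-up matrices; strictly speaking this computes cross-correlation, which in image processing is commonly also called convolution):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, 'valid' mode)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height
    ow = image.shape[1] - kw + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel with the image patch under it, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 3, 0],
                  [4, 5, 6, 1],
                  [7, 8, 9, 2],
                  [0, 1, 2, 3]], dtype=float)

# The identity kernel leaves each pixel unchanged, so the output is just
# the central region of the image.
identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=float)

result = convolve2d(image, identity)
print(result)  # [[5. 6.]
               #  [8. 9.]]
```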

Kernel

1) A kernel, also known as a convolution matrix or mask, helps in image processing by
creating a wide range of effects like sharpening, blurring, masking, etc.
2) The kernel is slid across the image and multiplied with the input image matrix to
generate an output image with the desired effect.
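A few commonly used 3x3 kernels illustrate the effects listed above (these are representative forms; exact values vary between tools):

```python
import numpy as np

# Box blur: average each pixel with its 8 neighbours.
box_blur = np.full((3, 3), 1 / 9)

# Sharpen: boost the centre pixel relative to its neighbours.
sharpen = np.array([[ 0., -1.,  0.],
                    [-1.,  5., -1.],
                    [ 0., -1.,  0.]])

# Edge detection: flat regions cancel out, edges stand out.
edge = np.array([[-1., -1., -1.],
                 [-1.,  8., -1.],
                 [-1., -1., -1.]])

# Blur and sharpen kernels sum to 1, preserving overall brightness;
# the edge kernel sums to 0, so uniform areas map to zero (black).
print(round(box_blur.sum(), 6), sharpen.sum(), edge.sum())
```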

What is a Neural Network?

1) Neural networks are a series of algorithms used to recognize hidden patterns in raw data,
cluster and classify it, and continuously learn and improve.
2) The main advantage is that the data features can be extracted automatically by the
machine without the input from the developer.
3) Neural networks are primarily used for solving problems with large datasets like images.
4) A neural network is divided into multiple layers and each layer is further divided into
several blocks called nodes.
5) First is the input layer, which receives the input in several different formats provided
by the programmer and feeds it to the neural network; no processing occurs in the input
layer. In between are the hidden layers, where the actual processing takes place. The
output layer predicts our final output. The output at each node is called its activation or
node value.
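The layer/node structure described above can be sketched as one forward pass through a tiny network (a minimal numpy illustration; the weights are made up, not trained):

```python
import numpy as np

x = np.array([1.0, 0.5, -1.0])        # input layer: 3 values, no processing

# Hidden layer: 2 nodes, each computing a weighted sum of the inputs
# plus a bias, followed by a simple activation (negatives clipped to zero).
W1 = np.array([[0.2, -0.4,  0.1],
               [0.6,  0.3, -0.2]])
b1 = np.array([0.1, -0.1])
hidden = np.maximum(0, W1 @ x + b1)   # the nodes' activations

# Output layer: one node predicting the final output.
W2 = np.array([0.5, -0.7])
output = W2 @ hidden
print(output)
```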

What is a Convolutional Neural Network (CNN)?

1) A Convolutional Neural Network is a type of Artificial Neural Network and is made up of
neurons that help in image recognition and image processing.
2) It uses a deep learning algorithm that takes an input image and processes it by assigning
learnable weights and biases to various aspects/objects in the image, which helps the
system differentiate one image from another with maximum accuracy.

Layers of a Convolutional Neural Network

A Convolutional Neural Network is made up of multiple layers of artificial neurons. It uses
mathematical functions to calculate the weighted sum of multiple inputs and generate the
desired outputs.

1) Convolution Layer
i) Convolution is the first layer of a CNN and is also known as the Feature Extractor
Layer.
ii) The main purpose of this layer is to extract features from the input image by applying
filters, performing operations such as edge detection, blurring and sharpening.
iii) This layer deals with the convolution process of handling an image with several
types of kernels to provide features to the whole system.
iv) Each convolution kernel is used to generate a feature map based on the input provided.
v) Feature maps have multiple uses:
a) The output of the filter applied to the previous layer is captured by the feature
map.
b) It helps in reducing the size of the image so that it can be processed easily.
c) It helps in focusing on the important features of the image, like eyes, nose, etc.,
so that it can be processed efficiently.
2) Rectified Linear Unit (ReLU)
i) This layer comes next, after the convolution layer.
ii) It takes the feature maps of the convolution layer and generates the activation map
by discarding all the negative numbers in the feature maps. All positive numbers pass
through to the system unchanged, while all negative numbers become zero; this makes
ReLU a non-linear function whose output contains only non-negative values.
3) Pooling Layer
i) This layer reduces the dimensions of the input image while still retaining the
important features.
ii) This helps make the network more resistant to small transformations, distortions and
translations in the input image.
iii) All this is done to reduce the number of parameters and computation in the
network thus making it more manageable and improving the efficiency of the whole
system.
iv) There are two types of pooling
a) Max Pooling: Max Pooling is the most commonly used method that selects the
maximum value of the current image view and helps preserve the maximum
detected features.
b) Average Pooling: Average Pooling finds out the average value of the current
image view and thus down samples the feature map.
4) Fully Connected Layer
i) This is the final layer of the convolutional neural network.
ii) After the features of the input image are extracted by the convolution layers and
downsampled by the pooling layers, their output is a 3-dimensional matrix which is
flattened into a vector of values.
iii) These values of the single vector represent a specific feature of a specific label and
are redirected to fully connected layers to predict the final outputs of the network.
iv) This helps in classifying an image into a specific label based on the probability of the
input being in a specific class.
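The four layers described above can be chained on a toy feature map; a minimal end-to-end numpy sketch (illustrative values and weights, not a trained network):

```python
import numpy as np

# Toy 4x4 feature map, as if produced by a convolution layer.
feature_map = np.array([[ 1., -3.,  2.,  4.],
                        [-5.,  6.,  1., -2.],
                        [ 7., -2.,  9.,  0.],
                        [ 1.,  4., -3.,  8.]])

# 1) ReLU layer: negatives become zero, positives pass through unchanged.
activation_map = np.maximum(0, feature_map)

# 2) Max-pooling layer: take the maximum of each 2x2 window, halving
#    the dimensions while keeping the strongest detected features.
windows = activation_map.reshape(2, 2, 2, 2).swapaxes(1, 2)
pooled = windows.max(axis=(2, 3))

# 3) Flatten the pooled output into a single vector of values.
flat = pooled.flatten()

# 4) Fully connected layer: one score per class, then softmax to turn
#    the scores into probabilities that sum to 1.
W = np.array([[0.1, -0.2,  0.3, 0.0],
              [0.2,  0.1, -0.1, 0.4]])
scores = W @ flat
probs = np.exp(scores) / np.exp(scores).sum()

print(pooled)  # [[6. 4.]
               #  [7. 9.]]
print(probs)   # two class probabilities summing to 1
```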
