DIP Project
A
PROJECT REPORT
On
Detection and Counting of Objects of Different Kinds of Shapes
Submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology
In
By
Aryan Garg (181300016)
Ashish Bhardwaj (181300017)
Jahnvi Sharma (181300040)
May, 2021
DECLARATION
We hereby declare that this submission is our own work and that, to the best of our knowledge
and belief, it contains no material previously published or written by another person nor material
which to a substantial extent has been accepted for the award of any other degree or diploma of
the university or other institute of higher learning, except where due acknowledgment has been
made in the text.
Signature
Date: 20-June-2021
ABSTRACT
Human beings have a strong capability to detect and recognize objects in order to understand the
natural environment. But for a machine, understanding a natural scene as humans do is a challenging
task, complicated by factors such as surrounding objects. However, a proper learning procedure for
the machine, using the objects' shape, size, color, texture and other related properties, may produce
satisfactory detection and classification results. Most existing systems may not be able to detect
objects properly when multiple objects belong to a single frame. The proposed system will be able to
detect multiple objects in an image, count the number of detected objects, and separate these objects
into individual images through greyscaling, thresholding, edge detection, finding the objects' corner
points, cropping, etc. Finally, the detected objects are recognized as geometrical shapes such as
triangles, rectangles, and circles.
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1  INTRODUCTION
CHAPTER 2  OBJECTIVE
CHAPTER 3  METHODOLOGY
CHAPTER 4  EXPERIMENT AND SIMULATION RESULT
CHAPTER 5  APPENDIX
CHAPTER 6  CONCLUSION
REFERENCES
LIST OF FIGURES
Fig 3.4 Gradient with Prewitt Operator of grayscale image of a brick wall and bike rack
CHAPTER 1
INTRODUCTION
Images contain different types of objects and structures which may convey information.
Counting involves estimating the number of objects in an image, while detection involves
determining the presence and location of objects in an image. Counting arises in many real-time
applications, such as counting grains in the agriculture industry, counting cells in microscopic
images, counting diamonds in industry, etc. Existing methods for counting involve either a large
amount of hardware, which adds to the cost, or manual counting, which is time consuming and may
give erroneous results. Counting can now be done with a technique involving a digital camera and
simple image processing methods in MATLAB, and hence counting can be performed with ease.
Object detection is inextricably linked to other similar computer vision techniques like
image recognition and image segmentation, in that it helps us understand and analyze scenes in
images or video.
But there are important differences. Image recognition only outputs a class label for identified
objects, and image segmentation creates a pixel-level understanding of a scene's elements. What
separates object detection from these other tasks is its unique ability to locate objects within an
image. This then allows us to count and then track those objects.
1.2 APPLICATIONS
Object detection is breaking into a wide range of industries, with use cases ranging from personal
security to productivity in the workplace. Object detection and recognition is applied in many
areas of computer vision, including image retrieval, security, surveillance, automated vehicle
systems and machine inspection. Significant challenges remain in the field of object recognition.
The possibilities are endless when it comes to future use cases for object detection. Here are
some current and future applications in detail:
1.2.2 SELF DRIVING CARS: One of the best examples of why object detection is needed is
autonomous driving. In order for a car to decide what to do next, whether to accelerate, apply the
brakes or turn, it needs to know where all the objects around the car are and what those objects
are. That requires object detection, and we would essentially train the car to detect a known set
of objects such as cars, pedestrians, traffic lights, road signs, bicycles, motorcycles, etc.
1.2.3 TRACKING OBJECTS: Object detection systems are also used for tracking objects, for
example tracking a ball during a football match, tracking the movement of a cricket bat, or tracking
a person in a video. Object tracking has a variety of uses, some of which are surveillance and
security, traffic monitoring, video communication, robot vision and animation.
1.2.4 FACE DETECTION AND FACE RECOGNITION: Face detection and face recognition
are widely used computer vision tasks. We have all noticed how Facebook detects our faces when we
upload a photo; this is a simple application of object detection that we see in our daily life. Face
detection can be regarded as a specific case of object-class detection. In object-class detection, the
task is to find the locations and sizes of all objects in an image that belong to a given class.
Examples include upper torsos, pedestrians, and cars. Face detection is a computer technology, used
in a variety of applications, that identifies human faces in digital images. Face recognition describes
a biometric technology that goes way beyond recognizing when a human face is present; it actually
attempts to establish whose face it is. Face-detection algorithms focus on the detection of frontal
human faces. It is analogous to image detection, in which the image of a person is matched bit by
bit against images stored in a database. Any change to the facial features in the database will
invalidate the matching process. There are many applications of face recognition. Face recognition
is already being used to unlock phones and specific applications. Face recognition is also used for
biometric surveillance: banks, retail stores, stadiums, airports and other facilities use facial
recognition to reduce crime and prevent violence.
1.2.5 SMILE DETECTION: Facial expression analysis plays a key role in analyzing emotions
and human behaviors. Smile detection is a special task in facial expression analysis with
various potential applications such as photo selection, user experience analysis and patient
monitoring.
1.2.6 ACTIVITY RECOGNITION: Activity recognition aims to recognize the actions and
goals of one or more agents from a series of observations of the agents' actions and the
environmental conditions. This research field has captured the attention of several computer
science communities due to its strength in providing personalized support for many different
applications and its connection to many different fields of study such as human-computer
interaction, or sociology.
1.2.7 MEDICAL IMAGING: Medical image processing tools are playing an increasingly
important role in assisting clinicians in diagnosis, therapy planning and image-guided
interventions. Accurate, robust and fast tracking of deformable anatomical objects, such as the
heart, is a crucial task in medical image analysis.
1.2.9 BALL TRACKING IN SPORTS: The increase in the number of sports lovers in games like
football, cricket, etc. has created a need for mining, analyzing and presenting more and more
multidimensional information to them. Different classes of people require different kinds of
information, and this expands the space and scale of the required information. Tracking the ball's
movement is of utmost importance for extracting any information from ball-based sports video
sequences, and the video frame can be recorded automatically according to the movement of the
ball.
1.2.11 AUTOMATIC IMAGE ANNOTATION: Automatic image annotation (also known as automatic image tagging
or linguistic indexing) is the process by which a computer system automatically assigns metadata in
the form of captioning or keywords to a digital image. This application of computer vision techniques
is used in image retrieval systems to organize and locate images of interest from a database. This
method can be regarded as a type of multi-class image classification with a very large
number of classes - as large as the vocabulary size. Typically, image analysis in the form of
extracted feature vectors and the training annotation words are used by machine learning
techniques to attempt to automatically apply annotations to new images. The first methods
learned the correlations between image features and training annotations, then techniques
were developed using machine translation to try to translate the textual vocabulary with the
'visual vocabulary', or clustered regions known as blobs. Work following these efforts has
included classification approaches, relevance models and so on.
1.2.12 ROBOTICS: Autonomous assistive robots must be provided with the ability to process
visual data in real time so that they can react adequately for quickly adapting to changes in
the environment. Reliable object detection and recognition is usually a necessary early step to
achieve this goal.
1.2.13 PEOPLE COUNTING: Object detection can also be used for people counting; it is used
for analyzing store performance or crowd statistics during festivals. These tend to be more
difficult tasks, as people move out of the frame quickly (and also because people are non-rigid
objects).
CHAPTER 2
OBJECTIVE:
The objective of this project is to detect and count objects of different kinds of shapes.
The goal of object detection is to detect all instances of objects from a known class, such as
people, cars or faces in an image. Typically only a small number of instances of the object are
present in the image, but there are a very large number of possible locations and scales at which
they can occur and that need to somehow be explored. Each detection is reported with some form
of pose information. This could be as simple as the location of the object, a location and scale, or
the extent of the object defined in terms of a bounding box. In other situations the pose
information is more detailed and contains the parameters of a linear or non-linear transformation.
Our goal is to accurately estimate the count. However, we evade the hard task of learning to
detect and localize individual object instances.
CHAPTER 3
METHODOLOGY:
3.1 Input image: In this process we take the input image on which we want to apply suitable
operation for our desired output.
3.2 Preprocessing: Pre-processing is a common name for operations on images at the
lowest level of abstraction; both input and output are intensity images. These iconic images
are of the same kind as the original data captured by the sensor, with an intensity image
usually represented by a matrix of image function values (brightnesses). The aim of pre-
processing is an improvement of the image data that suppresses unwanted distortions or
enhances some image features important for further processing. Geometric transformations
of images (e.g. rotation, scaling, and translation) are also classified among pre-processing
methods here, since similar techniques are used.
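As a brief illustration, here is a minimal MATLAB preprocessing sketch (the sample image 'peppers.png' ships with MATLAB, and the Gaussian sigma of 2 is an arbitrary illustrative choice):
% Typical low-level preprocessing: grayscale conversion and smoothing.
I = imread('peppers.png');   % built-in RGB sample image
G = rgb2gray(I);             % intensity (grayscale) image
S = imgaussfilt(G, 2);       % Gaussian smoothing with sigma = 2
imshowpair(G, S, 'montage'); % compare original and smoothed images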
3.3 Image segmentation: Segmentation partitions an image into regions that are meaningful
for the task at hand; several common techniques are described below.
3.3.1 Threshold based segmentation: This is the simplest method of image segmentation,
where each pixel value is compared with a threshold value. If the pixel value is smaller than
the threshold, it is set to 0; otherwise, it is set to a maximum value (generally 255). The
threshold value can be chosen arbitrarily. This algorithm is applied when we have to separate
the foreground from the background. Its drawback is that it will always segment the image
into exactly two categories.
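As an illustration, a minimal MATLAB sketch of global thresholding (the threshold value of 128 and the sample image are arbitrary choices):
% Global thresholding: pixels below the threshold become 0,
% pixels at or above it become the maximum value (255).
grey_img = rgb2gray(imread('peppers.png')); % built-in sample image
T = 128;                                    % arbitrary threshold value
seg_img = uint8(grey_img >= T) * 255;       % 0 = background, 255 = foreground
imshow(seg_img);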
3.3.2 Edge based segmentation: With this technique, detected edges in an image are
assumed to represent object boundaries and are used to identify these objects. The Sobel and
Canny edge detection algorithms are examples of edge-based segmentation techniques.
3.3.4 Graph based segmentation techniques: Graph-based approaches treat each pixel as a
node in a graph. Edge weights between two nodes are proportional to the similarity between
neighbouring pixels. Pixels are grouped together to form segments (also known as superpixels)
by partitioning this graph, for example by cutting low-weight edges so that each remaining
connected component becomes a segment.
3.3.5 Clustering based segmentation techniques: Starting from a rough initial clustering
of pixels, gradient ascent methods iteratively refine the clusters until some convergence
criterion is met to form image segments or superpixels. These types of algorithms aim to
minimise the distance between the cluster centre and each pixel in the image. This distance is
defined differently for each algorithm, but it depends on the spatial distance between the
pixel and the centre, the color distance between each pixel and the centre, or both.
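A minimal clustering-based segmentation sketch in MATLAB (this assumes the Image Processing Toolbox function imsegkmeans is available; the cluster count of 3 is arbitrary):
% K-means clustering segmentation of an RGB image.
I = imread('peppers.png'); % built-in sample image
k = 3;                     % arbitrary number of clusters
L = imsegkmeans(I, k);     % label matrix with values 1..k
B = labeloverlay(I, L);    % overlay the cluster labels on the image
imshow(B);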
3.3.6 Probabilistic image segmentation technique: In theory there are two types of
clustering based segmentation: soft clustering and hard clustering. In hard clustering, which
is discussed in point 3.3.5 above, each pixel is assigned to exactly one cluster (either cluster
1, 2, ..., or k). In soft clustering, each pixel or data point is assigned to every cluster with
some probability; hence soft clustering is a probabilistic type of clustering. Soft clustering
helps in those situations where there is an overlap between the clusters, so that the data
points/pixels in the overlap region have some probability of belonging to more than one cluster.
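As a sketch of soft clustering, the following uses fuzzy c-means on pixel intensities; note that fcm belongs to the Fuzzy Logic Toolbox, so its availability is an assumption:
% Soft (fuzzy c-means) clustering of pixel intensities.
I = im2double(imread('coins.png')); % built-in grayscale sample image
data = I(:);                        % one intensity feature per pixel
[centers, U] = fcm(data, 2);        % 2 clusters; U holds membership degrees
[~, idx] = max(U);                  % hard-assign each pixel to its likeliest cluster
seg = reshape(idx, size(I));        % label image
imshow(label2rgb(seg));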
3.4 Edge Detection: In an image, an edge is a curve that follows a path of rapid change in
image intensity. Edges are often associated with the boundaries of objects in a scene. Edge
detection is used to identify the edges in an image. To find edges, you can use the edge
function. This function looks for places in the image where the intensity changes rapidly,
using one of these two criteria: (1). Places where the first derivative of the intensity is larger
in magnitude than some threshold. (2). Places where the second derivative of the intensity
has a zero crossing. The edge function provides several derivative estimators, each of which
implements one of these definitions. For some of these estimators, you can specify whether the
operation should be sensitive to horizontal edges, vertical edges, or both. edge returns a binary image
containing 1's where edges are found and 0's elsewhere. The most powerful edge-detection
method that edge provides is the Canny method. The Canny method differs from the other
edge-detection methods in that it uses two different thresholds (to detect strong and weak
edges), and includes the weak edges in the output only if they are connected to strong edges.
This method is therefore less likely than the others to be affected by noise, and more likely to
detect true weak edges. Common edge detection algorithms include Sobel, Canny, Prewitt,
Roberts, and fuzzy logic methods.
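A minimal sketch of the edge function described above, using MATLAB's built-in 'coins.png' sample image:
% Derivative estimators provided by the edge function.
I = imread('coins.png');                       % grayscale sample image
BW_sobel = edge(I, 'sobel');                   % first-derivative (gradient) method
BW_horz  = edge(I, 'sobel', [], 'horizontal'); % sensitive to horizontal edges only
BW_canny = edge(I, 'canny');                   % double thresholds with hysteresis
imshowpair(BW_sobel, BW_canny, 'montage');     % compare Sobel and Canny edge maps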
3.4.1 Edge properties: The edges extracted from a two-dimensional image of a three-
dimensional scene can be classified as either viewpoint dependent or viewpoint
independent. A viewpoint independent edge typically reflects inherent properties of the
three-dimensional objects, such as surface markings and surface shape. A viewpoint
dependent edge may change as the viewpoint changes, and typically reflects the geometry
of the scene, such as objects occluding one another. A typical edge might for instance be
the border between a block of red color and a block of yellow. In contrast a line (as can
be extracted by a ridge detector) can be a small number of pixels of a different color on
an otherwise unchanging background. For a line, there may therefore usually be one edge
on each side of the line.
3.4.2 Approaches: There are many methods for edge detection, but most of them can
be grouped into two categories, search-based and zero-crossing based. The search-based
methods detect edges by first computing a measure of edge strength, usually a first-order
derivative expression such as the gradient magnitude, and then searching for local
directional maxima of the gradient magnitude using a computed estimate of the local
orientation of the edge, usually the gradient direction. The zero-crossing based methods
search for zero crossings in a second-order derivative expression computed from the
image in order to find edges, usually the zero-crossings of the Laplacian or the zero
crossings of a non-linear differential expression. As a pre-processing step to edge
detection, a smoothing stage, typically Gaussian smoothing, is almost always applied (see
also noise reduction).
The edge detection methods that have been published mainly differ in the types of
smoothing filters that are applied and the way the measures of edge strength are
computed. As many edge detection methods rely on the computation of image gradients,
they also differ in the types of filters used for computing gradient estimates in the x-
and y-directions. A survey of a number of different edge detection methods can be found
in (Ziou and Tabbone 1998);[6] see also the encyclopedia articles on edge detection
in Encyclopedia of Mathematics[3] and Encyclopedia of Computer Science and
Engineering.
Figure 3.1
1. Canny Edge Detection: The process of the Canny edge detection algorithm can be broken
down into five steps (a MATLAB sketch follows the list):
1. Apply Gaussian filter to smooth the image in order to remove the noise
2. Find the intensity gradients of the image
3. Apply gradient magnitude thresholding or lower bound cut-off suppression to get rid of
spurious response to edge detection
4. Apply double threshold to determine potential edges
5. Track edge by hysteresis: Finalize the detection of edges by suppressing all the other
edges that are weak and not connected to strong edges.
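In MATLAB these five steps are bundled inside the built-in edge function; a minimal sketch (the hysteresis thresholds are arbitrary illustrative values):
% Canny edge detection: Gaussian smoothing, gradient computation,
% non-maximum suppression, double thresholding, and hysteresis tracking
% are all performed internally by edge(..., 'canny').
I = imread('coins.png');          % built-in grayscale sample image
BW = edge(I, 'canny', [0.1 0.3]); % [low high] hysteresis thresholds
imshow(BW);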
Figure 3.2
2. Prewitt Edge Detection: Prewitt edge detection was proposed by Prewitt in 1970
(Rafael Gonzalez [1]). Prewitt is a correct way to estimate the magnitude and orientation of an
edge. While differential gradient edge detection needs a rather time-consuming calculation to
estimate the orientation from the magnitudes in the x and y directions, compass edge detection
obtains the orientation directly from the kernel with the highest response. It is limited to 8
possible orientations; however, experience shows that most direct orientation estimates are not
much more accurate. This gradient-based edge detector is estimated in a 3x3 neighbourhood for
eight directions. All eight convolution masks are calculated, and the mask with the largest
response is then selected. It detects two types of edges: horizontal edges and vertical edges.
Advantages: The masks are simple and computationally inexpensive, and the operator detects
both horizontal and vertical edges.
Limitations: The fixed 3x3 coefficients make it sensitive to noise, and the gradient estimate is
coarser and generally less accurate than the Sobel operator's.
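A minimal Prewitt sketch using explicit convolution masks (mask orientation conventions vary between texts; this is one common choice). The same gradient computation, followed by thresholding, is what edge(I, 'prewitt') performs:
% Prewitt gradient magnitude via 3x3 convolution masks.
I  = im2double(imread('coins.png')); % built-in grayscale sample image
px = [-1 0 1; -1 0 1; -1 0 1];       % responds to vertical edges (x-gradient)
py = px';                            % responds to horizontal edges (y-gradient)
Gx = imfilter(I, px, 'replicate');   % replicate border pixels at the image edge
Gy = imfilter(I, py, 'replicate');
G  = sqrt(Gx.^2 + Gy.^2);            % gradient magnitude
imshow(G, []);                       % [] scales the display range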
Figure 3.3
Figure 3.4
3. Roberts Cross operator: Roberts edge detection was introduced by Lawrence Roberts (1965). It
performs a simple, quick-to-compute, 2-D spatial gradient measurement on an image. This method
emphasizes regions of high spatial frequency, which often correspond to edges. In the most common
usage, the input to the operator is a grayscale image, as is the output. Pixel values at each point in
the output represent the estimated absolute magnitude of the spatial gradient of the input image at
that point. The main reason for using the Roberts Cross operator is that it is very quick to compute.
Only four input pixels need to be examined to determine the value of each output pixel, and only
subtractions and additions are used in the calculation. In addition, there are no parameters to set. Its
main disadvantages are that, since it uses such a small kernel, it is very sensitive to noise. It also
produces very weak responses to genuine edges unless they are very sharp. The Sobel operator
performs much better in this respect.
Gx = [ -1  0 ;  0  +1 ]        Gy = [  0  -1 ; +1  0 ]
Advantages: Very quick to compute; only four input pixels are examined per output pixel, only
additions and subtractions are used, and there are no parameters to set.
Limitations: The small 2x2 kernel makes it very sensitive to noise, and it produces weak responses
to genuine edges unless they are very sharp.
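A minimal Roberts Cross sketch using the 2x2 masks above:
% Roberts Cross gradient magnitude.
I  = im2double(imread('coins.png')); % built-in grayscale sample image
gx = [-1 0; 0 1];                    % Gx mask
gy = [0 -1; 1 0];                    % Gy mask
Gx = imfilter(I, gx);                % diagonal difference one way
Gy = imfilter(I, gy);                % diagonal difference the other way
G  = sqrt(Gx.^2 + Gy.^2);            % estimated gradient magnitude
imshow(G, []);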
4. Sobel Edge Detection: The Sobel edge detection method was introduced by Sobel in 1970
(Rafael González (2004)). The Sobel method of edge detection for image segmentation finds
edges using the Sobel approximation to the derivative. It finds edges at those points where the
gradient is highest. The Sobel technique performs a 2-D spatial gradient measurement on an
image and so highlights regions of high spatial frequency that correspond to edges. In general it
is used to find the estimated absolute gradient magnitude at each point in an input grayscale
image. In theory at least, the operator consists of a pair of 3x3 convolution kernels, as given in
the table below. One kernel is simply the other rotated by 90 degrees. This is very similar to the
Roberts Cross operator. The Sobel operator is slower to compute than the Roberts Cross operator,
but its larger convolution kernel smooths the input image to a greater extent and so makes the
operator less sensitive to noise. The operator also generally produces considerably higher output
values for similar edges, compared with the Roberts Cross. As with the Roberts Cross operator,
output values from the operator can easily overflow the maximum allowed pixel value for image
types that only support smallish integer pixel values (e.g. 8-bit integer images). When this
happens, the standard practice is simply to set overflowing output pixels to the maximum allowed
value. The problem can be avoided by using an image type that supports pixel values with a larger
range. The operator uses two 3x3 kernels or masks, which are convolved with the input image to
calculate the vertical and horizontal derivative approximations respectively:
Gx = [ -1  0  +1 ; -2  0  +2 ; -1  0  +1 ]
Gy = [ -1  -2  -1 ;  0  0  0 ; +1  +2  +1 ]
Advantages: The larger 3x3 kernel smooths the input image, making the operator less sensitive to
noise than the Roberts Cross, and it produces considerably higher output values for similar edges.
Limitations: It is slower to compute than the Roberts Cross operator, and its output can overflow
image types that only support small integer pixel values.
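A minimal Sobel sketch using the two masks above (MATLAB's edge(I, 'sobel') applies the same masks and then thresholds the result):
% Sobel gradient magnitude from the Gx and Gy masks.
I  = im2double(imread('coins.png')); % built-in grayscale sample image
sx = [-1 0 1; -2 0 2; -1 0 1];       % Gx mask (vertical edges)
sy = [-1 -2 -1; 0 0 0; 1 2 1];       % Gy mask (horizontal edges)
Gx = imfilter(I, sx, 'replicate');
Gy = imfilter(I, sy, 'replicate');
G  = sqrt(Gx.^2 + Gy.^2);            % estimated gradient magnitude
imshow(G, []);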
Figure 3.7
Figure 3.8
Flowchart of the proposed method (Start → Input image → Segmentation → Edge detection → Fill holes → Determining connected objects → Stop).
CHAPTER 4
EXPERIMENT AND SIMULATION RESULT
In this chapter we present experimental results to demonstrate how objects can be recognized and
counted using this technique. We have taken an image consisting of several geometric object
shapes. The technique first segments those objects and then finds their edges in order to recognize
and count the objects.
Example 1:
Example 2:
Example 3:
CHAPTER 5
APPENDIX
5.1 MATLAB Code for Detection and Counting of Objects of Different Kinds of Shapes
clc;        % clear the command window
clear;      % remove variables from the workspace
close all;  % close all open figure windows

% Read the input RGB image (update the path for your machine).
real_img = imread('C:\Users\Admin\Desktop\pp.png');

% Convert to grayscale and display.
grey_img = rgb2gray(real_img);
figure;
imshow(grey_img);

% Complement the image so the objects appear bright on a dark background.
com_img = 255 - real_img;
figure;
imshow(com_img);

% Binarize the complemented image with a global threshold
% (imbinarize is the newer equivalent of im2bw).
bw_img = im2bw(com_img);
figure;
imshow(bw_img);

% Fill holes inside the binary objects.
filled_img = imfill(bw_img, 'holes');
figure;
imshow(filled_img);

% Label 8-connected components; num is the number of objects found.
[l, num] = bwlabel(filled_img, 8);
c = label2rgb(l); % color each labeled object for visualization

% Show the original image and the labeled objects side by side.
subplot(1, 2, 1);
imshow(real_img);
title('original image');
subplot(1, 2, 2);
imshow(c);
title(['objects counted :', num2str(num)]);
5.2 Function Descriptions
5.2.1 imread(filename with path)
It reads the image from the file specified by filename into an array; the image format is
inferred from the file contents or extension.
5.2.2 imshow(I)
It displays an image I in a handle graphics figure where I is a grayscale, RGB, or
binary image.
5.2.3 rgb2gray(I)
It converts an RGB (color) image I into a grayscale intensity image. It eliminates the hue
and saturation information from the RGB image while retaining the luminance.
5.2.4 figure
Figure creates figure objects. Figure objects are the individual windows on the
screen in which the MATLAB software displays the graphical output.
5.2.5 imfill(BW,'holes')
This function fills image regions and holes; this syntax fills holes in the input binary image
BW, where a hole is a set of background pixels that cannot be reached by filling in the
background from the edge of the image.
5.2.6 bwlabel(BW)
It returns a label matrix for the connected components found in the binary image BW. With
a second output argument, as in [L,num] = bwlabel(BW,n), it also returns the number of
connected objects.
5.2.7 label2rgb(L)
This function converts label matrix into RGB image. label2rgb(L) converts a label
image, L into an RGB color image for the purpose of visualizing the labeled
regions. The label2rgb function determines the color to assign to each object
based on the number of objects in the label matrix. The label2rgb function picks
colors from the entire range of the color map.
5.2.8 edge(I,method)
It returns a binary image containing 1's where edges are found in the grayscale image I,
using the specified detection method (e.g. 'sobel', 'prewitt', 'roberts' or 'canny').
5.2.9 imbinarize(I)
It converts the grayscale image I into a binary image by thresholding; by default the global
threshold is computed from the image using Otsu's method.
CHAPTER 6
CONCLUSION
Based on the above experimental results, we are able to detect objects more precisely and identify
them individually, with the exact location of each object in the image.
Using this process for object counting, we found that it may indeed be used for object counting,
although proper interpretation of the results produced by the process is needed.
Image processing is a very active field that needs extensive research and hard work. Possible
directions for future work are:
1. Merge this technique with some other technique to get better results.
2. Use better image segmentation techniques and functions to detect objects with improper
shapes and boundaries.
REFERENCES