
PROJECT GUIDED BY -

Mr. PRADIPTA ROY, Scientist-E


EOTS
DRDO-ITR, Chandipur

PROJECT REPORT ON
HAND GESTURE DETECTION
AND RECOGNITION

PROJECT DONE BY -

ARNAB DAS
ELECTRONICS AND COMMUNICATION
ENGINEERING DEPARTMENT
JALPAIGURI GOVERNMENT ENGINEERING COLLEGE
3rd YEAR (6th SEMESTER)
2016

CONTENT
1. ABSTRACT
2. INTRODUCTION
3. THE PROPOSED METHOD

2-Dimensional Filter for Noise Removal

YCbCr Color Space for Skin Color detection

Global Thresholding

Morphological Operations and Edge Detection

Curvature Algorithm

4. FLOW CHART
5. MATLAB CODE WITH OUTPUT
6. LIMITATIONS
7. CONCLUSION AND FUTURE WORK
8. REFERENCES
9. ACKNOWLEDGEMENTS

1. ABSTRACT
In vision-based hand gesture human-computer interaction, extracting the moving hand gesture accurately and quickly is the premise and key to improving the speed and accuracy of hand gesture recognition. The image is acquired randomly and undergoes several processing stages before a meaningful recognition can be achieved. These stages include skin detection, which captures only the skin region of the hand, creating a segmented hand image differentiated from the background; median and smoothing filters, which remove noise; a morphological operation, which extracts the outline of the hand; and a curvature algorithm, which determines the finger count. Experimental results demonstrate that our system can successfully recognize hand gestures.

2. INTRODUCTION
Motivation. The terms gesture and gesture recognition are encountered heavily in human-computer interaction. Gestures are motions of the body, or physical actions formed by the user, intended to convey meaningful information. Gesture recognition is the process by which a gesture made by the user is made known to the system. Through the use of computer vision, or the machine eye, there is great emphasis on using hand gestures as a new input modality in a broad range of applications. With the development and realization of virtual environments, current user-machine interaction tools and methods, including the mouse, joystick, keyboard, and electronic pen, are not sufficient. Hand gestures have the natural ability to represent ideas and actions very easily; thus different hand shapes, identified by a gesture recognition system and interpreted to generate corresponding events, have the potential to provide a more natural interface to the computer system. This type of natural interaction is the core of immersive virtual environments. If we ignore the world of computers for a while and consider interaction among human beings, we can easily see that we use a wide range of gestures in our daily personal communication. In fact, it has been shown that people gesticulate more when talking on the telephone, where they cannot see each other, than in face-to-face communication. Gestures vary greatly among cultures and contexts, yet they are intimately used in communication. The significant use of gestures in our daily life as a mode of interaction motivates gestural interfaces and their employment in a wide range of computer vision applications.
Related Work. In the past few decades, gesture recognition has become a very influential term. Many gesture recognition techniques have been developed for tracking and recognizing various hand gestures, each with its own pros and cons. The older approach is wired technology, in which users need to tether themselves with wires in order to interface with the computer system. With wired technology the user cannot move freely in the room, being connected to the computer by a wire and limited by its length. Major drawbacks of wearable gloves are their cost and the need for sophisticated sensors and hardware. Several different methods have been proposed for hand gesture recognition using vision systems.

Compared to data-glove-based gestures and many other human-computer interaction methods, vision-based hand gestures have the advantage of being intuitive, friendly, and easy to use. Vision-based hand gesture recognition has been developed extensively in recent years, and several methods have been proposed for hand gesture model reconstruction. Two types of methods, 3D modeling and appearance modeling, are usually used to identify hand gestures. However, 3D methods cannot be implemented easily because of their high computational cost; furthermore, model parameter estimation is unreliable when the parameters are extracted through a number of similar processes. An appearance-based model is used here for hand gesture model reconstruction; M. Bicego et al. propose a new appearance-based 3D object classification method. This report focuses on the hand gesture feature extraction problem, which is a major factor affecting the quality of hand gesture recognition. We propose a hand gesture recognition method using a combination of features. First, the hand gesture is segmented from the video sequences: the hand silhouette is segmented using the skin color feature in the YCbCr color space, which helps classify the image pixels into skin color and non-skin color clusters. The image is also preprocessed with filters to remove noise and enhance contrast. Second, hand gesture features are extracted. Lastly, a curvature algorithm is used to recognize the hand gesture.

3. THE PROPOSED METHOD


2-Dimensional Filter for Noise Removal:
A smoothing filter is used for blurring and noise reduction. Blurring is used in preprocessing steps such as the removal of small details from an image prior to (large) object extraction and the bridging of small gaps in lines or curves. Noise reduction can be accomplished by blurring with a linear filter and also by non-linear filtering.
The output of a smoothing linear spatial filter is simply the average of the pixels contained in the neighborhood of the filter mask. These filters are sometimes called averaging filters; they are also referred to as lowpass filters. Grayscale image smoothing can be viewed as a spatial filtering operation in which the coefficients of the mask are all 1s. As the mask is slid across the image to be smoothed, each pixel is replaced by the average of the pixels under the mask. The concept is easily extended to the processing of full-color images; the principal difference is that instead of scalar gray-level values we must deal with component vectors.
Let S_{xy} denote the set of coordinates defining a neighborhood centered at (x, y) in an RGB color image, containing K pixels. The average of the component vectors in this neighborhood is

\bar{c}(x, y) = \frac{1}{K} \sum_{(s,t) \in S_{xy}} c(s, t)

which, written out per component, is

\bar{c}(x, y) = \begin{bmatrix} \frac{1}{K} \sum_{(s,t) \in S_{xy}} R(s, t) \\ \frac{1}{K} \sum_{(s,t) \in S_{xy}} G(s, t) \\ \frac{1}{K} \sum_{(s,t) \in S_{xy}} B(s, t) \end{bmatrix}

MATLAB supports a number of predefined 2-D linear spatial filters, obtained using the function fspecial, which generates a filter mask w with the syntax

w = fspecial(type, parameters)

where type specifies the filter type and parameters further define the specified filter. The spatial filters supported by fspecial are summarized in the table below, including the applicable parameters for each filter.

TYPE — SYNTAX AND PARAMETERS

average — fspecial('average', [r c]). A rectangular averaging filter of size r*c. The default is 3*3. A single number instead of [r c] specifies a square filter.

disk — fspecial('disk', r). A circular averaging filter (within a square of size 2r+1) with radius r. The default radius is 5.

gaussian — fspecial('gaussian', [r c], sig). A lowpass Gaussian filter of size r*c and standard deviation sig (positive). The defaults are 3*3 and 0.5. A single number instead of [r c] specifies a square filter.

laplacian — fspecial('laplacian', alpha). A 3*3 Laplacian filter whose shape is specified by alpha, a number in the range [0, 1]. The default value of alpha is 0.5.

log — fspecial('log', [r c], sig). Laplacian of Gaussian (LoG) filter of size r*c and standard deviation sig (positive). The defaults are 5*5 and 0.5. A single number instead of [r c] specifies a square filter.

motion — fspecial('motion', len, theta). Outputs a filter that, when convolved with an image, approximates linear motion (of a camera with respect to the image) of len pixels. The direction of motion is theta, measured in degrees counterclockwise from the horizontal. The defaults are 9 and 0, which represent a motion of 9 pixels in the horizontal direction.

prewitt — fspecial('prewitt'). Outputs a 3*3 Prewitt mask, wv, that approximates a vertical gradient. A mask for the horizontal gradient is obtained by transposing the result: wh = wv'.

sobel — fspecial('sobel'). Outputs a 3*3 Sobel mask, sv, that approximates a vertical gradient. A mask for the horizontal gradient is obtained by transposing the result: sh = sv'.

unsharp — fspecial('unsharp', alpha). Outputs a 3*3 unsharp filter. Parameter alpha controls the shape; it must be greater than 0 and less than or equal to 1.0; the default is 0.2.

We use a disk-type mask for smoothing the image, for noise removal and blurring:

h = fspecial('disk', radius)

returns a circular averaging filter (pillbox) within a square matrix of side 2*radius+1. The default radius is 5.
Example: >> h = fspecial('disk', 4)

The output of the image filtering is shown in the figure below.
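As a concrete illustration, here is a minimal sketch of this blurring step (the filename 'hand.jpg' is hypothetical, used only for the example):

a = imread('hand.jpg');           % hypothetical RGB input image
h = fspecial('disk', 8);          % disk (pillbox) mask of radius 8, as in Section 5
b = imfilter(a, h, 'replicate');  % 'replicate' repeats border pixels to avoid dark edges
figure, imshow(b);                % display the blurred result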

YCbCr Color Space and Skin Color Detection :


The RGB color space is an additive color model in which the primary colors red,
green, and blue light are added together in various ways to reproduce a broad
array of colors. The name comes from the initials of the three colors Red,
Green, and Blue. The RGB color model is shown in the Figure 1.

Fig. 1 RGB Color Model


The main purpose of the RGB color model is for sensing, representation, and
display of images in electronic systems, such as televisions and computers.
A color in the RGB model is described by indicating how much of each of red, green, and blue is included; each component can vary from zero to a defined maximum value, which depends on the application. In computing, the component values are often stored as integers in the range 0 to 255.

The YCbCr Color Space


The YCbCr color space is widely used in digital video, image processing, etc. In this format, luminance information is represented by a single component, Y, and color information is stored as two color-difference components, Cb and Cr. Component Cb is the difference between the blue component and a reference value, and component Cr is the difference between the red component and a reference value. The YCbCr color model was developed as part of ITU-R BT.601 during the development of a worldwide digital component video standard. YCbCr is a scaled and offset version of the YUV color model. Y is the luma component, defined to have a nominal 8-bit range of 16-235; Cb and Cr are the blue-difference and red-difference chroma components respectively, defined to have a nominal range of 16-240.

The transformation used to convert from the RGB to the YCbCr color space is shown in equation (1):

\begin{bmatrix} Y \\ Cb \\ Cr \end{bmatrix} = \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} + \frac{1}{255} \begin{bmatrix} 65.481 & 128.553 & 24.966 \\ -37.797 & -74.203 & 112 \\ 112 & -93.786 & -18.214 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \quad (1)

with R, G, B in the range [0, 255]. As a quick check, a pure white pixel (R = G = B = 255) maps to Y = 16 + 219 = 235 and Cb = Cr = 128, i.e. the top of the nominal luma range and the neutral chroma value.
18.214

In contrast to RGB, the YCbCr color space is luma-independent, resulting in better performance. The corresponding skin cluster is given as [2]:

Y > 80, 85 < Cb < 135, 135 < Cr < 180, where Y, Cb, Cr ∈ [0, 255] (2)
Chai and Ngan have developed an algorithm that exploits the spatial characteristics of human skin color. A skin color map is derived and used on the chrominance components of the input image to detect pixels that appear to be skin. Working in this color space, Chai and Ngan found that the ranges of Cb and Cr most representative for the skin-color reference map are:

77 ≤ Cb ≤ 127 and 133 ≤ Cr ≤ 173

However, because our purpose is to find human skin across different races, the thresholds given above work well only for Caucasian skin: the first threshold finds only people with white skin, and the second threshold segments people from different parts of the world but also marks some pixels as skin that really are not. For this reason a new skin threshold is proposed to segment people within the image regardless of skin color; after exhaustive image histogram analysis, the optimal threshold range found was:

80 ≤ Cb ≤ 120 and 133 ≤ Cr ≤ 173

Some examples of hand shape segmentation using this skin color algorithm are shown in the figure below.
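A minimal MATLAB sketch of this skin segmentation, using the proposed ranges above (note that the full code in Section 5 uses slightly different Cb bounds):

k  = rgb2ycbcr(im);               % im: RGB image (uint8)
Y  = k(:,:,1); Cb = k(:,:,2); Cr = k(:,:,3);
mask = Y > 80 & Cb >= 80 & Cb <= 120 & Cr >= 133 & Cr <= 173;
imshow(mask);                     % white where skin is detected

The vectorized comparison produces the same binary map as the pixel-by-pixel loop in Section 5, only faster.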

Global Thresholding:
One way to choose a threshold is by visual inspection of the image histogram. The threshold value T distinguishes the object from the background. Another method of choosing T is by trial and error, picking different thresholds until one is found that produces a good result as judged by the observer. A more systematic approach is the following iterative procedure (see the sketch after the figure below):
1. Select an initial estimate for T (generally the midpoint between the minimum and maximum intensity values in the image).
2. Segment the image using T. This produces two groups of pixels: G1, consisting of all pixels with intensity values >= T, and G2, consisting of pixels with values < T.
3. Compute the average intensity values u1 and u2 for the pixels in regions G1 and G2.
4. Compute a new threshold value:

T = \frac{1}{2}(u_1 + u_2)

5. Repeat steps 2 through 4 until the difference in T in successive iterations is smaller than a predefined parameter T0.

Alternatively, the normalized histogram can be treated as a discrete probability density function:

p(r_q) = \frac{n_q}{n}, \quad q = 0, 1, \ldots, L - 1

where n is the total number of pixels in the image, n_q is the number of pixels that have intensity level r_q, and L is the total number of possible intensity levels in the image. Now suppose that a threshold k is chosen such that C_0 is the set of pixels with levels [0, 1, ..., k-1] and C_1 is the set of pixels with levels [k, k+1, ..., L-1]. Otsu's method chooses the threshold value k that maximizes the between-class variance, defined as

\sigma_B^2 = \omega_0 (\mu_0 - \mu_T)^2 + \omega_1 (\mu_1 - \mu_T)^2

where \omega_0 = \sum_{q=0}^{k-1} p(r_q) and \omega_1 = \sum_{q=k}^{L-1} p(r_q) are the class probabilities, \mu_0 and \mu_1 are the mean levels of classes C_0 and C_1, and \mu_T is the mean level of the whole image.

Fig: Example of Global Thresholding
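A minimal MATLAB sketch of the iterative procedure above; the code in Section 5 calls a helper named threshold_algo that is not listed in this report, and a plausible implementation along these lines would be:

function bw = threshold_algo(img)
% Iterative global thresholding (steps 1-5 above); returns a binary image.
img = double(img);
T  = (min(img(:)) + max(img(:)))/2;   % step 1: initial estimate
T0 = 0.5;                             % predefined convergence parameter
done = false;
while ~done
    u1 = mean(img(img >= T));         % steps 2-3: average of group G1
    u2 = mean(img(img <  T));         % average of group G2
    Tnew = (u1 + u2)/2;               % step 4: new threshold
    done = abs(Tnew - T) < T0;        % step 5: convergence test
    T = Tnew;
end
bw = img >= T;
end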

Morphological Operation and Edge Detection :


The word morphology commonly denotes a branch of biology that deals with the form and structure of animals and plants. We use the same word here in the context of mathematical morphology as a tool for extracting image components that are useful in the representation and description of region shape, such as boundaries, skeletons, and the convex hull. We are also interested in morphological techniques for pre- or post-processing, such as morphological filtering, thinning, and pruning.
The field of mathematical morphology contributes a wide range of operators to image processing, all based around a few simple mathematical concepts from set theory. The operators are particularly useful for the analysis of binary images, and common usages include edge detection, noise removal, image enhancement, and image segmentation.
Morphological techniques typically probe an image with a small shape or template known as a structuring element. The structuring element is positioned at all possible locations in the image and compared with the corresponding neighborhood of pixels. Morphological operations differ in how they carry out this comparison.

The structuring element is really just a set of point coordinates (although it is often represented as a binary image). It differs from the input image coordinate set in that it is normally much smaller, and its coordinate origin is often not in a corner, so some coordinate elements will have negative values. Note that in many implementations of morphological operators, the structuring element is assumed to be a particular shape (e.g. a 3*3 square) and so is hardwired into the algorithm.
The two most common structuring elements (given a Cartesian grid) are the 4-connected and 8-connected sets, N4 and N8, illustrated in the figure.

Fundamental Definitions: Erosion and Dilation

From the two Minkowski set operations we define the fundamental mathematical morphology operations, dilation and erosion:

A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \emptyset \} \quad \text{(dilation)}

A \ominus B = \{ z \mid (B)_z \subseteq A \} \quad \text{(erosion)}

where (B)_z denotes the translation of B by the vector z and \hat{B} denotes the reflection of B about its origin. While either set A or B can be thought of as an "image", A is usually considered the image and B is called the structuring element. The structuring element is to mathematical morphology what the convolution kernel is to linear filter theory.

Dilation, in general, causes objects to dilate or grow in size.

Erosion causes objects to shrink.
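A small sketch of the two operations in MATLAB (circles.png is a binary demo image that ships with the Image Processing Toolbox; the 3*3 square structuring element matches the one used later):

A = imread('circles.png');   % binary demo image
B = ones(3);                 % 3x3 square structuring element
D = imdilate(A, B);          % dilation: objects grow
E = imerode(A, B);           % erosion: objects shrink
figure, imshow(D), title('dilated');
figure, imshow(E), title('eroded');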

Boundary Extraction or Edge Detection :

Let A be an N*M binary image array with the background represented by the value 0. The goal of boundary extraction is to find the pixels that are on the boundary of the object, i.e., a simple edge detection procedure. Let B be a structuring element. The boundary extraction procedure takes two steps: first, A is eroded by B; then, the eroded image is subtracted from the original image A, as shown in the figure below:

\beta(A) = A - (A \ominus B)

Fig: (a) A simple binary image, (b) Boundary extraction result using mathematical morphology
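This is exactly the step the code in Section 5 performs with b3 = b4 - imerode(b4, ones(3)). As a standalone sketch (bw is assumed to be the segmented binary hand image):

A = double(bw);                  % binary hand image, 0 = background
B = ones(3);                     % 3x3 structuring element
boundary = A - imerode(A, B);    % A minus its erosion: one-pixel outline
imshow(boundary);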

Curvature Algorithm :
The notion of curvature measures how sharply a curve bends. We would expect the curvature to be 0 for a straight line, to be very small for curves which bend very little, and to be large for curves which bend sharply. If we move along a curve, the direction of the tangent vector does not change as long as the curve is flat; its direction changes when the curve bends. The more the curve bends, the more the direction of the tangent vector changes. So it makes sense to study how the tangent vector changes as we move along a curve.
But because we are only interested in the direction of the tangent vector, not its magnitude, we consider the unit tangent vector. Curvature is defined as follows:
Let C be a smooth curve with position vector r(s), where s is the arc length parameter. The curvature of C is defined to be

\kappa = \left| \frac{dT}{ds} \right|

where T is the unit tangent vector and the curvature is denoted by the Greek letter kappa, \kappa.

In our project we recognize the hand gesture using the concept of curvature. We first find the edge of the object, along with the set of sequentially connected points lying on the edge. Then we take two points separated by a specified interval along the edge and compute the ratio of the sum of the distances from the first point to every point between them to the straight-line distance between the two points. For a nearly straight segment this ratio stays small; around a fingertip, where the edge bends sharply, it becomes large. We calculate this ratio at every point of the edge point set, giving the corresponding curvature profile for different objects (hands).
An example of this curvature measure is shown in the figure below.
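To make the measure explicit (the symbols p_i and B are introduced here for illustration only): for sequential edge points p_1, p_2, \ldots and a fixed interval B (bp = 45 in the code of Section 5), the quantity computed at each point is

t(i) = \frac{\sum_{k=1}^{B} \lVert p_{i+k} - p_i \rVert}{\lVert p_{i+B} - p_i \rVert}

Along a nearly straight stretch of edge, both numerator and denominator grow steadily and t stays near a baseline value; around a fingertip the contour folds back, p_{i+B} ends up close to p_i, the denominator shrinks, and t spikes. The Section 5 code thresholds t at 30 and derives the finger count from the number of rising edges of the thresholded signal.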

4. FLOW CHART :

START
  |
BLURRING THE IMAGE USING 2-D FILTER
  |
SEGMENT THE HAND REGION USING SKIN COLOR DETECTION
  |
CROP THE HAND REGION FROM THE BACKGROUND
  |
RESIZE THE IMAGE FOR SIZE INVARIANCE
  |
FIND THE SEQUENTIAL EDGE POINTS OF THE HAND
  |
FIND THE CURVATURE
  |
DISPLAY THE FINAL COUNT WITH THE IMAGE
  |
END

5. MATLAB CODE WITH OUTPUT :


clear all
clc
% image reading ...
a=imread('3.jpg');
a1= imresize(a, [400 300]);
% image blurring ...
H = fspecial('disk',8);
I1 = imfilter(a1,H,'replicate');
% skin color detection ...
k=rgb2ycbcr(I1);
[w, h]=size(k(:,:,1));   % w = number of rows, h = number of columns
k1=1;
for i=1:w
for j=1:h
% skin-pixel test: 135<Cr<180, 58<Cb<135 and Y>80
if 135<k(i,j,3) && k(i,j,3)<180 && 58<k(i,j,2) && k(i,j,2)<135 ...
        && 80<k(i,j,1)
b2(i,j)=255;
mx(k1)=j;
my(k1)=i;
k1=k1+1;
else
b2(i,j)=0;
end
end
end
% crop the areas of hand ...
b2=imcrop(b2,[min(mx) min(my) max(mx)-min(mx) max(my)-min(my)]);
b4= imresize(b2, [180 120]);
b4=threshold_algo(b4);       % iterative global thresholding (helper sketched in Section 3)
% clear a one-pixel border so the edge trace stays inside the array
b4(1,:)=0;
b4(:,1)=0;
b4(size(b4,1),:)=0;
b4(:,size(b4,2))=0;
b4=double(b4);
b3=b4-imerode(b4,ones(3));   % boundary extraction: A - (A eroded by B)
[r4, c4]=size(b3);
e=1;
% collect edge pixels, scanning from the bottom row upward so that
% r(:,1) holds a bottom-most edge point where the trace will start
for i=r4:-1:1
for j=1:c4
if(b3(i,j)==1)
r(1,e)=i;
r(2,e)=j;
e=e+1;
end
end
end
x=r(1,1);
y=r(2,1);
% set of sequential connected edge points ...
r1(1,1)=x;
r1(2,1)=y;

e=1;
win=[-1 0;-1 -1;0 -1;1 -1;1 0;1 1;0 1;-1 1]; % the 8 neighbour (row,col) offsets used for boundary tracing
se=5;    % index into win at which the neighbour search resumes
flag=0;
while flag==0
if(se>1)
for i=1:se-1
u=x+win(i,1);
v=y+win(i,2);
if(b3(u,v)==1)
x1=u;
y1=v;
flag=1;
if(i>=1&&i<=4)
se=i+4;
else
se=i+4-8;
end
end
end
end
if((se<8)&&(flag==0))
for i=se+1:8
u=x+win(i,1);
v=y+win(i,2);
if(b3(u,v)==1)
x1=u;
y1=v;
flag=1;
if(i>=1&&i<=4)
se=i+4;
else
se=i+4-8;
end
end
end
end
if(flag==1)
% stop once the trace returns to the starting point (flag stays 1)
if ~(x1==r(1,1) && y1==r(2,1))
e=e+1;
r1(1,e)=x1;
r1(2,e)=y1;
x=x1;
y=y1;
flag=0;
end
end
end
% curvature finding ...
bp=45;   % interval between the two compared edge points
j=1;
for i=1:size(r1,2)-bp
d=sqrt((r1(1,i+bp)-r1(1,i))^2+(r1(2,i+bp)-r1(2,i))^2);
u=0;
for k=1:bp
u=u+sqrt((r1(1,i+k)-r1(1,i))^2+(r1(2,i+k)-r1(2,i))^2);

end
t(j)=u/d;
j=j+1;
end
t1=t>=30;    % a high curvature ratio marks a sharp bend (fingertip region)
% finger counting: count the rising edges of the thresholded curvature
cnt=0;
for i=2:size(t1,2)
if((t1(i)-t1(i-1))==1)
cnt= cnt+1;
end
end
count=(cnt+1)/2;
if(count==0.5)
count=0;
else
count=round(count);
end
% result display ...
imshow(a);
h=msgbox(sprintf('the count is = %d',count));

Outputs :

[Figures: program outputs for counts 1 through 5 - in each case the input image is shown together with a message box reporting the detected finger count]

6. LIMITATIONS :
Our program has some limitations. If the background is skin-colored, the background pixels also become part of the object pixels, which is not desirable.
Another limitation is that the hand cannot be placed pointing vertically downward in the image; otherwise, problems arise in the finger counting.

7. CONCLUSION AND FUTURE WORK :


Throughout this project we have seen how finger counting can be recognized effectively and efficiently. We have treated finger counting as a basic method of hand gesture recognition, made certain assumptions, and carried out our work, which has produced the desired outputs. There is still a need to build a globally accepted program for hand gesture recognition.
A lot of future work in this area remains to be done, both to improve the results in question and to develop better algorithms that provide accurate end values. We conclude that, building on this basic idea, these problems can be solved, driving the work toward further advancement.

8. REFERENCES :
1. R. C. Gonzalez and R. E. Woods, "Digital Image Processing".
2. R. C. Gonzalez, R. E. Woods, and S. L. Eddins, "Digital Image Processing Using MATLAB".
3. MATLAB 7.11.0 (R2010b) Documentation.
4. M. Basilio and G. A. Torres, "Explicit Image Detection using YCbCr Space Color Model as Skin Detection", in Applications of Mathematics and Computer Engineering.
5. https://www.stackoverflow.com

9. ACKNOWLEDGEMENTS :
I am thankful to my guides, Manvendra Sir, Palash Sir, and Jayashree Ma'am, and to my Project Head, Pradipta Sir, for guiding me throughout the project and helping me achieve the desired results. I also thank ITR, DRDO for providing me the opportunity to carry out a project of my choice.
