Hand Gesture Mouse Using Matlab
BACHELOR OF TECHNOLOGY
Electrical Engineering
Submitted by
Jaswant Singh Chanda (2019PUSETBEEX07641)
Rajat Singh (2019PUSETBEEX07265)
Supervised by
Dr. Nand Kishor Gupta
(Session 2022-23)
Poornima University
POORNIMA UNIVERSITY
CERTIFICATE
This is to certify that the Final Year Minor Project Report entitled “IMPLEMENTATION & ANALYSIS of
VIRTUAL MOUSE with HAND GESTURE CONTROL” has been presented by “JASWANT SINGH
CHANDA (2019PUSETBEEX07641) and RAJAT SINGH (2019PUSETBEEX07295)” in partial
fulfilment of the requirements for the Degree of Bachelor of Technology, submitted to the Department of
Electrical & Electronics Engineering, Faculty of Engineering & Technology, Poornima University.
It is found satisfactory and approved for submission.
ACKNOWLEDGEMENT
We have undergone a Minor Project which was meticulously planned and guided at every stage, so that it
became a lifetime experience for us. This could not have been realized without help from numerous sources
and people in Poornima University and the Electrical & Electronics Engineering Department.
We would like to take this opportunity to express our gratitude to Dr. Nand Kishor Gupta (Supervisor),
Department of Electrical & Electronics Engineering, Poornima University, who helped us in the successful
completion of our Minor Project. He has been a guide, motivator and source of inspiration for us while
carrying out the work needed to complete this report and its related activities.
We are also very grateful to our HOD, Mr. Ashish Raj Sinha, for his kind support and guidance.
We are also privileged to have faculty members who provided us with valuable facilities, without which
this work could not have been completed.
We would also like to express our heartfelt appreciation to all of our friends, whose direct or indirect
suggestions helped us develop this project, and to the entire team for their valuable suggestions.
Lastly, we thank all faculty members of the Department of Electrical & Electronics Engineering for their
moral support and guidance.
JASWANT SINGH CHANDA (2019PUSETBEEX07641)
RAJAT SINGH (2019PUSETBEEX07295)
ABSTRACT
As technology has grown immensely in recent years, certain improvements need to be made. People
are tired of using the mouse and touchpad, and this project offers something better: a virtual mouse
that lets us operate the system without touching it.
The project is implemented using the OpenCV library. OpenCV is a huge open-source library for
computer vision, machine learning, and image processing. It supports a wide variety of
programming languages such as Python, C++ and Java, and it can process images and videos to
identify objects, faces, or even human handwriting.
In this project, the live feed from the webcam is used to create a virtual mouse with full
functionality. First the hand landmarks are detected, then clicks are tracked based on these points,
and smoothing techniques are applied to make the system more usable. All the functions of a
mouse are performed through hand gestures, hence the name virtual mouse.
INDEX
Certificate
Acknowledgement
Abstract
Introduction
Motivation
Problem Description
Issues and Challenges
Project Description
Project Result & Discussion
Project Specifications
Conclusion
Further Work
Reference
TABLE INDEX
Table 1: Platforms to run the code
Table 2: Mouse functions performed by hand gestures
INTRODUCTION
1.1 Introduction-
Computer technology has grown tremendously over the past decade and has become a
necessary part of everyday life. The primary computer accessory for Human Computer
Interaction (HCI) is the mouse, but the mouse is not suitable for HCI in some real-life
situations, such as Human Robot Interaction (HRI). There has been much research into
alternative methods to the computer mouse for HCI. The most natural and intuitive technique
for HCI, and a viable replacement for the computer mouse, is the use of hand gestures.
This project is therefore aimed at investigating and developing a Computer Control (CC)
system using hand gestures.
Most laptops today are equipped with webcams, which have recently been used in security
applications utilizing face recognition. To harness the full potential of a webcam, it
can be used for vision-based CC, which would effectively eliminate the need for a computer
mouse or mouse pad. The usefulness of a webcam can also be extended to other HCI
applications, such as a sign language database or a motion controller. Over the past decades there
have been significant advancements in HCI technologies for gaming, such as the
Microsoft Kinect and the Nintendo Wii. These gaming technologies provide a more natural and
interactive means of playing video games. Motion control is widely seen as the future of gaming
and has tremendously boosted the sales of video games; the Nintendo Wii, for example, sold over
50 million consoles within a year of its release. Using hand gestures is very intuitive and
effective for one's interaction with computers, and it provides a Natural User Interface (NUI).
There has been extensive research into novel devices and techniques for cursor control
using hand gestures. Besides HCI, hand gesture recognition is also used in sign language
recognition, which makes it even more significant.
Computer vision techniques are used for gesture recognition. OpenCV provides a
"VideoCapture" interface, which is used to capture frames from a live video stream.
1.2 Basic Introduction to the Python Programming Language
Python is a widely used general-purpose, high-level programming language. It was initially
designed by Guido van Rossum, first released in 1991, and is developed by the Python Software
Foundation. It was mainly designed with an emphasis on code readability, and its syntax allows
programmers to express concepts in fewer lines of code. Python lets you work quickly and
integrate systems more efficiently.
Python is a high-level, interpreted, interactive and object-oriented scripting language, designed
to be highly readable. It uses English keywords frequently where other languages use
punctuation, and it has fewer syntactical constructions than other languages.
1. Python is interpreted − Python is processed at runtime by the interpreter, so you do not need
to compile your program before executing it. This is similar to Perl and PHP.
2. Python is interactive − you can sit at a Python prompt and interact with the
interpreter directly to write your programs.
3. Python is object-oriented − Python supports an object-oriented style of
programming that encapsulates code within objects.
1.2.1 History of Python
Python was developed by Guido van Rossum in the late eighties and early nineties at the National
Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
Smalltalk, and the Unix shell and other scripting languages.
Python is copyrighted; like Perl, its source code is available under an open-source licence (the
GPL-compatible Python Software Foundation License).
Python is now maintained by a core development team, although Guido van Rossum long held
a vital role in directing its progress.
1.2.2 Python Features
Fig 1: Features of Python
Generally, we have seen that the Python programming language is extensively used for web
development, application development, system administration, game development, etc.
But there are also emerging technologies that rely on Python; in fact, Python has become the
core language as far as the success of these technologies is concerned. Let's dive into the
technologies which use Python as a core element for research, production and further
development.
The Python programming language is undoubtedly dominating other languages when future
technologies like Artificial Intelligence (AI) come into play.
There are plenty of Python frameworks, libraries, and tools that are specifically developed for
Artificial Intelligence, to reduce human effort with increased accuracy and efficiency for
various development purposes.
Artificial Intelligence has made it possible to develop speech recognition systems, autonomous
cars, and systems that interpret data such as images and videos.
Some of the Python libraries and tools used in various Artificial Intelligence branches are
shown below.
Machine Learning- PyML, PyBrain, scikit-learn, MDP Toolkit, GraphLab Create, MIPy etc.
General AI- pyDatalog, AIMA, EasyAI, SimpleAI etc.
Neural Networks- PyAnn, pyrenn, ffnet, neurolab etc.
Natural Language & Text Processing- Quepy, NLTK, gensim
Big Data
The future scope of the Python programming language can also be predicted from the way it
has helped big data technology grow. Python has been successfully contributing to the
analysis of large data sets across computer clusters through its high-performance toolkits
and libraries.
Let’s have a look at the python libraries and toolkits used for Data analysis and handling
other big data issues.
Pandas, Scikit-Learn, NumPy, SciPy, GraphLab Create, IPython, Bokeh, Agate, PySpark,
Dask.
Networking
Networking is another field in which Python has a bright future. The Python
programming language is used to read, write and configure routers and switches and to perform
other network automation tasks in a cost-effective and secure manner.
For these purposes, there are many libraries and tools that are built on the top of the python
language. Here we have listed some of these python libraries and tools especially used by
network engineers for network automation.
Ansible, Netmiko, NAPALM (Network Automation and Programmability Abstraction Layer
with Multivendor Support), Pyeapi, Junos PyEZ, PySNMP, Paramiko SSH
1.2.5 Why was the Python Programming Language preferred for this Project?
Python has been voted the most favoured programming language, beating C, C++ and Java.
Python is an open-source programming language and is used to develop almost every kind of
application.
Python is used worldwide for a wide range of application and system development. Big brands
and search engine giants such as Google, Yahoo, Quora and Facebook use Python to solve their
complex programming problems.
Python is also used to write test scripts and to test mobile device performance. It is one of the
most versatile languages these days; Python programmers are in high demand in the IT
industry and are paid more compared to programmers of other languages.
There are also many IDEs (integrated development environments) available for Python. A
brief list is shown below:
Name: Anaconda
Description: A free and open-source distribution of the Python and R programming
languages for data science and machine learning related applications, which aims to
simplify package management and deployment
Licence: Free and Open Source
Platforms: Windows, Mac OS, Linux
Download: https://www.anaconda.com/download/
Table 1: Platforms to run the code
MOTIVATION
We chose this project out of an interest in the direct interaction of humans with
consumer electronic devices, which takes the user experience to a whole new level. Gesture control
technology would reduce our dependence on age-old peripheral devices and hence reduce the
overall complexity of the system. Initially, this technology was applied in the field of gaming
(e.g. the Xbox Kinect), but the application of motion/gesture control technology would be far more
diverse if we applied it to our other electronics like computers, televisions, etc., for day-to-day
purposes like scrolling, selecting and clicking.
Our primary objective in this project was to build a device inspired by the Leap Motion: a
device that recognizes hand gestures and can be used to virtually control a computer. In short, it
provides a virtual screen through which we can interact with the computer. However, the hardware
required for such a device was not feasible within the budget and time frame provided.
PROBLEM DESCRIPTION
There are generally two approaches for hand gesture recognition, which are hardware based,
where the user must wear a device, and the other is vision based which uses image processing
techniques with inputs from a camera.
The proposed system is vision based, using image processing techniques and inputs from a
computer webcam. Vision-based gesture recognition systems are generally broken down into
four stages: skin detection, hand contour extraction, hand tracking and gesture recognition.
The input frame would be captured from the webcam, and the hand's skin region would be
detected using skin detection.
The hand contour would then be found and used for hand tracking and gesture recognition. Hand
tracking would be used to navigate the computer cursor, and hand gestures would be used to
perform mouse functions such as click, scroll up and scroll down. The scope of this Python-based
virtual mouse project would be to design a vision-based CC system which can perform the mouse
functions previously stated.
ISSUES AND CHALLENGES
The first challenge was to correctly detect the hand with a webcam. We needed a
computer vision library for this purpose. Many are available, but we decided to go ahead
with OpenCV, as it is the most popular, has been ported to many languages, is supported on
many operating systems from Android to Windows, and has a good collection of standard
image-processing functions.
Then we had to set up OpenCV in our IDE (Visual Studio) and learn some basic usage of
OpenCV, for which we referred to many tutorials on the web.
After learning OpenCV, we had to learn about skin detection and image
processing techniques such as background subtraction, image smoothing, and noise
removal and reduction.
Now, after detecting the hand correctly and mapping the gestures, we had to learn to use
the Windows API to tune the software to the Metro UI. To learn it, we built some
basic applications based on it. So, in short, there was a steep learning curve.
As high-end cameras and sensors are very costly, we decided to go with a simple webcam,
and to optimize our software and its functionality in order to overcome the drawbacks of
using a simple webcam.
PROJECT DESCRIPTION
In this section the strategies and methods used in the design and development of the
vision-based CC system will be explained. The algorithm for the entire system is shown
in the Figure below.
In order to reduce the effects of illumination, the image can be converted to chrominance
colour space which is less sensitive to illumination changes.
The HSV colour space was chosen since it was found to be the best colour space for skin
detection. The next step would be to use a method that would differentiate skin pixels
from non-skin pixels in the image (skin detection). Background subtraction was then
performed to remove the face and other skin colour objects in the background.
A morphological opening operation (erosion followed by dilation) was then applied to
remove noise efficiently. A Gaussian filter was applied to smooth the image and give
better edge detection. Edge detection was then performed to get the hand contour in the
frame. Using the hand contour, the tip of the index finger was found and used for hand
tracking and controlling the mouse movements. The contour of the hand was also used
for gesture recognition. The system can be broken down into four main components, thus
in the Methodology, the method used in each component of the system will be explained
separately.
1. SKIN DETECTION
In the proposed method, the HSV colour space was used with the histogram-based skin
detection method. The HSV colour space has three channels: Hue (H), Saturation (S), and
Value (V). The H and S channels hold the colour information, while the V channel holds
the intensity information.
The input image from the webcam is in the RGB colour space, so it must first be
converted to the HSV colour space using the conversion formulae. The proposed
histogram-based skin detection method uses 32-bin H and S histograms to achieve skin
detection. Using a small skin region, the colour of this region is converted to a chrominance
colour space.
A 32-bin histogram for the region is then found and used as the histogram model. Each
pixel in the image is then evaluated on how much probability it has to a histogram model.
This method is also called Histogram Back Projection.
Back projection can be defined as recording how well pixels or patches of pixels fit the
distribution of pixels in a histogram model. The result is a grayscale image (the
back-projected image), where the intensity indicates the likelihood that the pixel is a
skin-coloured pixel. This method is adaptive, since the histogram model is obtained from
the user's own skin under the pre-set lighting conditions.
Fig 4. Algorithm for Skin Detection
2. HAND CONTOUR EXTRACTION
There are several edge detection methods, such as Laplacian edge detection, Canny
edge detection, and border following. The OpenCV function "cvFindContours()" uses a
border-following edge detection method to find the contours in the image.
The major advantage of the border-finding edge detection method is that all the
contours found in the image are stored in an array. This means that we can analyse
each contour in the image individually, to determine the hand contour. The Canny and
Laplacian edge detectors can find the contours in the image, but do not give us access
to each contour. For this reason, the border-finding edge detection method was used
in the proposed design.
In the contour extraction process, we are interested in extracting the hand contour so
that shape analysis can be done on it to determine the hand gesture. The figure below
shows the result when edge detection was applied to the skin-segmented binary
image. It can be seen that besides the hand contour, there are lots of small contours in
the image. These small contours can be considered noise and must be ignored. The
assumption was made that the hand contour is the largest contour thereby ignoring all
the noise contours in the image. This assumption can be void if the face contour is
larger than the hand contour. To solve this problem, the face region must be
eliminated from the frame. The assumption was made that the hand is the only moving
object in the image and the face remains relatively stationary compared to the hand.
This means that background subtraction can be applied to remove the stationary pixels
in the image, including the face region. This is implemented by the OpenCV class
"BackgroundSubtractorMOG2".
3. HAND TRACKING
The movement of the cursor was controlled by the tip of the index finger. In order to
identify the tip of the index finger, the centre of the palm must first be found. The
method used for finding the hand centre was adopted from the literature and has the
advantage of being simple and easy to implement; its algorithm is shown in the flow
chart of the Figure below. The shortest distance from each point inside the inscribed
circle to the contour was measured, and the point with the largest distance was recorded
as the hand centre. The distance between the hand centre and the hand contour was taken
as the radius of the hand. The hand centre was calculated for each successive frame, and
using it, the tip of the index finger was identified and used for hand tracking. The method
used for identifying the index and the other fingers is described in the following
subsection. The results for hand tracking are demonstrated in the Figure in the Results
and Analysis section.
4. GESTURE RECOGNITION
The gesture recognition method used in the proposed design is a combination of two
methods, the method proposed by Yeo and the method proposed by Balazs. The
algorithm for the proposed gesture recognition method is described in the flow chart in
the Figure below, from which it can be seen that the convexity defects of the hand
contour must first be calculated. The convexity defects were calculated using the OpenCV
built-in function "cvConvexityDefects". The parameters of the convexity defects (start
point, end point, and depth point) are stored in a sequence of arrays. After the convexity
defects are obtained, there are two main steps in gesture recognition:
1. Fingertip Identification
2. Counting the Number of Fingers
5. CURSOR CONTROL
Once the hand gestures are recognized, it is a simple matter of mapping different hand
gestures to specific mouse functions. It turns out that controlling the computer cursor in
the C/C++ programming language is relatively easy: by linking the User32 library into
the program, the "SendInput" function allows control of the computer cursor.
Instructions on how to properly use this function were obtained from the Microsoft
Developer Network (MSDN) website. This function is only available on the Windows
2000 Professional operating system or later, which introduces a limitation: the system
can only be used on newer versions of the Windows operating system.
The algorithm for the cursor control is shown in the Figure below.
The following table shows the Operations Performed depending on the number of fingers
detected:
Number of Fingertips Detected Operations Performed
One Move Cursor
Two Left Click
Three Right Click
Four Start Button
Five My Computer
Table 2: Operations performed for each number of fingertips detected
Starting with the position of the index fingertip, the cursor is moved to the fingertip
position. This is done using the "SendInput" function to control the cursor movement.
The next step is to determine whether a hand gesture was performed; if so, the
"SendInput" function is again used to trigger the corresponding cursor function. If
there is no change in fingertip position, the loop is exited, and it is restarted when a
change in fingertip position is detected.
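The table's mapping, and the webcam-to-screen scaling used when moving the cursor, can be sketched in Python (the project's Windows implementation used "SendInput"; the function names and action labels here are ours):

```python
def gesture_to_action(finger_count):
    """Map a fingertip count to the mouse operation of Table 2."""
    actions = {1: "move_cursor", 2: "left_click", 3: "right_click",
               4: "start_button", 5: "my_computer"}
    return actions.get(finger_count, "none")

def map_to_screen(x, y, frame_size, screen_size):
    """Scale a fingertip position from webcam coordinates to screen coordinates."""
    fw, fh = frame_size
    sw, sh = screen_size
    return int(x * sw / fw), int(y * sh / fh)
```

The caller would then invoke the matching cursor call, e.g. pyautogui.moveTo for "move_cursor" or pyautogui.click for "left_click".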
PROJECT RESULT & DISCUSSION
Hand tracking is based on colour recognition. The program is therefore initialized by
sampling colour from the hand. The hand is then extracted from the background by
thresholding with the sampled colour profile: each colour in the profile produces a binary
image, and these are all summed together.
A nonlinear median filter is then applied to get a smooth, noise-free binary
representation of the hand.
The program used to run this project is given below:
import cv2
import mediapipe as mp
import pyautogui

cap = cv2.VideoCapture(0)
hand_detector = mp.solutions.hands.Hands()
drawing_utils = mp.solutions.drawing_utils
screen_width, screen_height = pyautogui.size()
index_y = 0

while True:
    _, frame = cap.read()
    frame = cv2.flip(frame, 1)
    frame_height, frame_width, _ = frame.shape
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = hand_detector.process(rgb_frame)
    hands = output.multi_hand_landmarks
    if hands:
        for hand in hands:
            drawing_utils.draw_landmarks(frame, hand)
            landmarks = hand.landmark
            for id, landmark in enumerate(landmarks):
                x = int(landmark.x * frame_width)
                y = int(landmark.y * frame_height)
                if id == 8:  # landmark 8: tip of the index finger
                    cv2.circle(img=frame, center=(x, y), radius=10,
                               color=(0, 255, 255))
                    # map webcam coordinates to screen coordinates
                    index_x = screen_width / frame_width * x
                    index_y = screen_height / frame_height * y
                    pyautogui.moveTo(index_x, index_y)
                if id == 4:  # landmark 4: tip of the thumb
                    cv2.circle(img=frame, center=(x, y), radius=10,
                               color=(0, 255, 255))
                    thumb_x = screen_width / frame_width * x
                    thumb_y = screen_height / frame_height * y
                    print('outside', abs(index_y - thumb_y))  # debug output
                    # a pinch (index tip close to thumb tip) triggers a click
                    if abs(index_y - thumb_y) < 20:
                        pyautogui.click()
                        pyautogui.sleep(1)
    cv2.imshow('Virtual Mouse', frame)
    cv2.waitKey(1)
The different steps of the project are illustrated below.
Once the binary representation is generated, the hand is processed further. The analysis
results in data that can be of further use in gesture recognition:
1. Fingertip positions
2. Number of fingers
3. Number of hands
4. Area of hands
The Process of Working of the Project
STEP 1 – In this step we keep our palm in front of the web camera; it detects the landmarks
on the palm and draws a rectangle around it.
STEP 2 – In this step it detects the five fingers of the hand and analyses them for
movement. Now, let's see how it tracks our palm and detects our fingers:
STEP 3
STEP 4
STEP 5
STEP 6
PROJECT SPECIFICATIONS
Hardware Specifications:
System: Intel Core i5
Processor: Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz
Installed RAM: 8.00 GB
System type: 64-bit OS, x64-based processor
Webcam: 720p HD Webcam
Resolution: 1920 x 1080
Environment Specifications:
CONCLUSION
There are different methods to implement a virtual mouse operated by hand gestures.
The histogram-based and explicit-threshold skin detection methods were
evaluated and, based on the results, the histogram method was deemed more
accurate.
The vision-based cursor control using a hand gesture system was developed in the
Python language, using the OpenCV library and additional library packages such as
PyAutoGUI and MediaPipe.
The system was able to control the movement of the cursor by tracking the user's hand,
and cursor functions were performed using different hand gestures. The system has
the potential to be a viable replacement for the computer mouse; however, due to
the constraints encountered, it cannot completely replace it yet.
The major constraint of the system is that it must be operated in a well-lit room. This
is the main reason why the system cannot completely replace the computer mouse,
since it is very common for computers to be used in outdoor environments with poor
lighting conditions. The accuracy of the Python-based hand gesture recognition could
also have been improved: a Template Matching gesture recognition method could have
been used with a machine learning classifier. This would have taken much longer to
implement, but the accuracy of the gesture recognition could have been improved.
It was very difficult to achieve precise cursor movements, since the cursor was very
unstable. The stability of the cursor control could have been improved if a Kalman
filter had been incorporated into the design. The Kalman filter, however, requires a
considerable amount of time to implement and, due to time constraints, it was not
implemented. All of the operations that were intended to be performed using various
gestures were completed with satisfactory results.
FURTHER WORK
We would like to improve the performance of the software, especially hand tracking,
in the near future. We also want to decrease the response time of the software for cursor
movement so that it can completely replace the conventional mouse. We are also
planning to design a hardware implementation of the same, in order to improve accuracy
and extend the functionality to various domains, such as a gaming controller or a
general-purpose computer controller.
Another advanced implementation would have the hand gesture recognition stage use
the Template Matching method to distinguish hand gestures. This method requires a
machine learning classifier, which takes a considerably long time to train and develop.
However, it would allow the use of many more hand gestures, which in turn would
allow more mouse functions such as zoom in and zoom out. Once the classifier is
well trained, the accuracy of the Template Matching method is expected to be better
than that of the method used in the proposed design.
Another novel application of this technology would be to use the computer to train the
visually or hearing impaired.
REFERENCE
[1] Abhik Banerjee, Abhirup Ghosh, Koustuvmoni Bharadwaj, "Mouse Control using a Web
Camera based on Color Detection", IJCTT, vol. 9, Mar 2014.
[2] Angel, Neethu P.S., "Real Time Static & Dynamic Hand Gesture Recognition",
International Journal of Scientific & Engineering Research, vol. 4, issue 3, March 2013.
[3] Chen-Chiung Hsieh and Dung-Hua Liou, "A Real Time Hand Gesture Recognition
System Using Motion History Image", ICSPS, 2010.
[4] Hojoon Park, "A Method for Controlling Mouse Movement using a Real Time Camera",
Department of Computer Science, Brown University, Providence, RI, USA, 2008.
[6] Angel, Neethu P.S., "Real-Time Static and Dynamic Hand Gesture Recognition". The
hand tracking has to be specifically adapted for each user.
[7] Abhirup Ghosh, Abhik Banerjee, "Mouse Control using a Web Camera based on Color
Detection".
[8] J.L. Raheja, A. Chaudhary, K. Singal, proposed an HSV-based algorithm, but it uses a
special sensor to capture and process the image, so the user has to spend more money on
the sensor.
[9] https://en.wikipedia.org/wiki/Python_(programming_language)