Chapter 1
1.1 Introduction
Considering that an estimated 83% of the sensory information we receive is derived from visual perception, it is
clear that vision plays a crucial role in human physiology [1].
According to 2011 WHO statistics, around 285 million people worldwide are visually impaired;
of these, 39 million are blind and 246 million have low vision [2].
Object recognition glasses can enhance social inclusion for blind individuals by allowing greater
participation in daily activities. These devices reduce reliance on others, boosting confidence and
independence [3].
To help individuals with visual impairments recognize objects, real-time detection frameworks such as YOLO or
SSD can provide immediate feedback, Mask R-CNN can support scene description, and combining Optical
Character Recognition (OCR) with Text-to-Speech (TTS) enables text reading. LiDAR can further enhance spatial
awareness, and gesture or voice commands can improve user interaction. Compliance with the GDPR and
privacy-preserving AI measures are also essential. Candidate recognition methodologies include feature-based
techniques (such as SIFT and SURF), deep learning models (such as CNNs and Vision Transformers), and hybrid
approaches for greater accuracy; transfer learning can additionally adapt models to personalized environments.
Deep learning is the predominant method in current practice [4].
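As a purely illustrative sketch (not the system proposed here), the following Python loop shows the detection-to-speech pattern recommended above. It assumes a webcam, the ultralytics YOLO package, and the pyttsx3 offline TTS engine; the model file and confidence threshold are arbitrary placeholder choices.

```python
# Illustrative sketch only: real-time object detection with spoken feedback.
# Assumes: pip install ultralytics pyttsx3 opencv-python, a connected webcam,
# and the pretrained "yolov8n.pt" weights (an assumption, not the project's model).
import cv2
import pyttsx3
from ultralytics import YOLO

model = YOLO("yolov8n.pt")    # small pretrained COCO model (placeholder choice)
tts = pyttsx3.init()          # offline text-to-speech engine
cap = cv2.VideoCapture(0)     # default camera

last_spoken = None
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]
    # Keep class names of detections above an arbitrary 0.5 confidence threshold.
    labels = {model.names[int(b.cls)] for b in results.boxes if float(b.conf) > 0.5}
    if labels and labels != last_spoken:
        tts.say(", ".join(sorted(labels)))   # announce what is in front of the user
        tts.runAndWait()
        last_spoken = labels

cap.release()
```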
Various studies using deep learning have produced technologies to assist the visually impaired. For
example, one smart-glasses study used a Raspberry Pi 4, an ESP32-CAM, and YOLOv4, achieving 69.2%
precision with a 3–4 second response time. Another employed DA-Multi-DCGAN on mixed datasets,
achieving 80.21% accuracy. A study combining YOLOv3 with a Raspberry Pi 3 B+ reported 85–95% accuracy,
reaching 100% for specific objects in 50 ms. Another used a Raspberry Pi 3 and CNN models with
PASCAL VOC, achieving 90% accuracy for 20 object classes in 29 ms. Lastly, one study introduced CATNet on
a Raspberry Pi 2, achieving high accuracy for small targets [5].
Detection techniques face challenges in low-light conditions and with overlapping or small objects, and
they consume substantial computing resources. Recognition systems lose accuracy under varying viewing
angles, small changes in lighting or facial expression, unbalanced datasets, and the risk of reverse
attacks. In optical character recognition (OCR), handwritten, small, or distorted text is problematic, as is
handling multiple languages, particularly Arabic. Voice recognition likewise struggles in noisy
environments, with diverse dialects, and with the possibility of voice manipulation, leading to slow
response times and reduced accuracy for quiet speech.
1.2 Research problem
1.3 Goals of the project
1. Develop an affordable and efficient object recognition system using a CNN and the ESP32 to assist blind
individuals in Yemen (a transfer-learning sketch follows this list).
2. Enhance independence and improve daily navigation by enabling real-time identification of objects.
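As a concrete but purely illustrative reading of goal 1, the sketch below shows how transfer learning (mentioned in the introduction) could adapt a pretrained CNN offline to a small, locally collected dataset before deployment. Keras is assumed, and the dataset path, model choice, and hyperparameters are placeholders, not the project's final design.

```python
# Illustrative transfer-learning sketch (assumed approach, not the final design):
# adapt a pretrained MobileNetV2 to a small, locally collected object dataset.
# Assumes TensorFlow/Keras and images sorted into per-class folders under "dataset/".
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/", image_size=(224, 224), batch_size=16)   # placeholder path
num_classes = len(train_ds.class_names)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False                                  # freeze pretrained features

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)                           # hyperparameters are placeholders
```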
1.4 Methodology
1.5 Limitations of the project
The project faces challenges with data availability and quality, as limited datasets may not cover all
relevant objects and environments, and real-world variations in lighting and angles make training difficult.
1.6 Components of the project
1. ESP32
The ESP32 is a versatile, low-power, dual-core microcontroller developed by Espressif Systems,
designed for a wide range of applications, particularly in IoT (Internet of Things). It integrates Wi-Fi and
Bluetooth (both Classic and BLE) connectivity, making it ideal for wireless communication in embedded
systems. The ESP32 is widely used in devices such as smart home systems, wearables, and
automation products[6].
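As a minimal sketch only, the MicroPython snippet below (one of several ways to program the ESP32) shows the board joining a Wi-Fi network, the connectivity feature highlighted above. The SSID and password are placeholders, and MicroPython firmware on the board is an assumption.

```python
# Minimal MicroPython sketch: join a Wi-Fi network so the ESP32 can exchange
# data with a recognition server. Credentials below are placeholders.
import network
import time

wlan = network.WLAN(network.STA_IF)    # station (client) mode
wlan.active(True)
wlan.connect("HOME_SSID", "PASSWORD")  # placeholder credentials

while not wlan.isconnected():          # wait until an IP address is assigned
    time.sleep(0.5)
print("connected, IP:", wlan.ifconfig()[0])
```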
2. ESP32-CAM
The ESP32-CAM is a compact camera module featuring the ESP32-S chip and the OV2640 camera,
costing about $10. It includes a microSD card slot for storing images and files. Measuring
27 × 40.5 × 4.5 mm, with a deep-sleep current of 6 mA, it can operate independently. It suits a variety of IoT
applications, including smart home devices, industrial control, and wireless monitoring [7].
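As a hedged illustration of how a host computer might pull frames from the module, the sketch below assumes the ESP32-CAM runs a web-server firmware (such as the stock CameraWebServer example) exposing a still-capture endpoint; the IP address is a placeholder.

```python
# Illustrative host-side sketch: grab a still image from an ESP32-CAM over Wi-Fi.
# Assumes the board runs a web-server firmware (e.g. the stock CameraWebServer
# example) exposing a /capture endpoint; the IP address is a placeholder.
import requests

ESP32_CAM_URL = "http://192.168.1.50/capture"   # placeholder address

resp = requests.get(ESP32_CAM_URL, timeout=5)
resp.raise_for_status()
with open("frame.jpg", "wb") as f:
    f.write(resp.content)                        # raw JPEG bytes from the OV2640
print("saved", len(resp.content), "bytes")
```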
3. Headphones
Headphones are a pair of small loudspeaker drivers worn on or around the head over a user's ears.
They are electroacoustic transducers, which convert an electrical signal into a corresponding sound [8].
4. Ultrasonic sensor
Ultrasonic sensors measure distance by emitting sound pulses and timing the reflected echoes; in UAVs,
for example, they are used to measure ground distance for altitude control.
They have a range of up to four meters and are generally unaffected by environmental factors, although
noise and airflow can influence them. Despite these limitations, larger versions could help helicopters
detect obstacles such as wires [9].
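The following is a minimal MicroPython sketch of the pulse-and-echo principle described above, assuming an HC-SR04-style sensor and the placeholder pin wiring shown; the distance is half the round-trip echo time multiplied by the speed of sound.

```python
# Minimal MicroPython sketch for an HC-SR04-style ultrasonic sensor on an ESP32.
# Pin numbers and the sensor model are assumptions; distance follows from
# echo time: distance = (time_of_flight * speed_of_sound) / 2.
from machine import Pin, time_pulse_us
import time

trig = Pin(5, Pin.OUT)    # trigger pin (placeholder wiring)
echo = Pin(18, Pin.IN)    # echo pin (placeholder wiring)

def distance_cm():
    trig.off()
    time.sleep_us(2)
    trig.on()                           # 10 us trigger pulse starts a measurement
    time.sleep_us(10)
    trig.off()
    t = time_pulse_us(echo, 1, 30000)   # echo high time in us (30 ms timeout)
    if t < 0:
        return None                     # out of range / timeout
    return (t * 0.0343) / 2             # sound travels ~0.0343 cm/us, out and back

while True:
    print("distance:", distance_cm(), "cm")
    time.sleep(0.2)
```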
Chapter 2
The project on "Smart Glasses" for blind people aimed to assist visually impaired individuals in
education and daily life. The system could scan printed text, convert it to audio, and translate English
to Arabic using the Google Translate API. It also used RFID technology to help users locate specific
places like classrooms, along with ultrasonic sensors for better image capture. However, the project
faced numerous issues. The initial design was bulky and impractical, requiring a complete redesign
for better usability. There were compatibility issues between the camera, NOOBS operating system,
and Raspberry Pi model B+, forcing the team to switch to different hardware. Additionally, the RFID
sensor had a limited range, reducing its effectiveness. Time constraints also hindered the
implementation of all planned features, leaving the project incomplete in certain aspects.
The AI-powered smart glasses for the blind and visually impaired aimed to improve navigation and
social interaction by using deep learning techniques, specifically the Faster R-CNN, for object and face
recognition. The system provided voice-based assistance, enabling users to recognize their
surroundings. However, the project had several shortcomings. It focused only on object and face
recognition, overlooking crucial features like text reading and complex navigation. Moreover, the
absence of user testing raised concerns about its real-world effectiveness. The lack of technical
details on implementation and performance metrics made it difficult to assess its practicality.
Additionally, the project’s oversimplified presentation failed to acknowledge the complexities of
training deep learning models and processing real-time data, which are essential for such a system to
function effectively.
Another project, "My Eyes—Smart Glasses for Blind People," provided a cost-effective wearable
solution for visually impaired individuals. It integrated Raspberry Pi, a camera, and earpieces to assist
users in reading tasks and navigating their environment. The glasses combined text-to-speech
conversion, obstacle detection, and face recognition to enhance accessibility. While the project
showed promise, it had several limitations. The bulky design made prolonged use uncomfortable, and
its reliance on an internet connection posed challenges in areas with poor connectivity. The accuracy
of object recognition was not fully reliable in complex environments, and the system had a learning
curve that might discourage some users. Additionally, the glasses had limited battery life, potential
technical malfunctions, and privacy concerns that were not addressed in the study.
A similar project, "AI-Based Smart Glasses for Visually Impaired Individuals," focused on enhancing
accessibility in shopping environments. It used a Raspberry Pi 4, a camera module, and YOLOv5 deep
learning algorithms for real-time object classification and text recognition. The system provided audio
output in multiple languages to cater to user preferences. However, it had several drawbacks.
Language support was limited to English and Tamil, restricting its usability for a broader audience.
The project was dependent on specific hardware, the EPSON BT-300 smart goggles, reducing its
adaptability. There was also a noticeable delay in speech output, affecting user experience.
Additionally, the system only focused on object recognition and could not assist in navigation or
product comparison. It struggled in dynamic environments, had difficulties in indoor settings due to
poor GPS coverage, and lacked integration with environmental sensors, limiting its overall
effectiveness.
The smart glasses project for visually impaired individuals aimed to provide assistance in various daily
tasks, including text reading. It was designed as a cost-effective, wearable solution using a Raspberry Pi
2. The system offered audio feedback and demonstrated good text recognition accuracy, particularly
with larger fonts. However, it had notable limitations. Despite being designed for multiple tasks, it only
implemented a single reading mode, limiting its practical applications. The text recognition accuracy
was highly dependent on font size, style, and image clarity, making it less effective for general use. The
system lacked user testing, which is crucial for refining usability. Additionally, the project used Matlab
and Simulink for model design but relied on C++ for implementation on Raspberry Pi, increasing
complexity and the risk of errors. The Raspberry Pi 2’s limited processing power further constrained
performance, reducing the feasibility of adding advanced features.
Another project, "IoT-Based Smart Glasses with Facial Recognition for People with Visual
Impairments," proposed a low-cost assistive technology that used a Raspberry Pi 4, a camera module,
and an ultrasonic sensor for facial recognition and obstacle detection. It provided real-time assistance
by identifying people and measuring distances to avoid obstacles. The project had several advantages,
including affordability and the use of widely available technology. However, it was limited in
functionality, supporting only facial recognition and distance detection, without navigation features.
The small size of the Raspberry Pi’s SD card posed challenges in expanding capabilities. There were also
concerns regarding power management, as the project did not detail how it would sustain long-term
use. Additionally, the study lacked information on the accuracy of the recognition and distance
measurement systems, raising questions about its reliability. The absence of a defined user interface
also made it unclear how users would interact with the system effectively.
Lastly, the "AI-Powered Smart Glasses" project aimed to enhance mobility for blind individuals by
integrating computer vision, deep learning, and speech processing. The system could detect obstacles,
recognize faces, and read text using Optical Character Recognition (OCR). It provided voice feedback to
guide users, improving independence and accessibility. The project had several advantages, including
enhanced safety through early obstacle detection, increased mobility, and a user-friendly voice-based
interface. However, it faced some challenges. Language support was limited, restricting usability for
non-English speakers. The accuracy of the ultrasonic sensor was not ideal for detecting objects at short
distances, and the system’s performance was highly dependent on lighting conditions. Additionally, the
Raspberry Pi’s processing power was limited, making it difficult to handle complex tasks efficiently.
Despite these shortcomings, the project showed promise in developing assistive technology for visually
impaired individuals, with potential for future improvements.
Overall, while each of these projects contributed to the advancement of smart glasses for the visually
impaired, they all had notable limitations. Some struggled with hardware constraints, while others
lacked key functionalities such as navigation, user testing, or real-world adaptability. Addressing these
issues in future iterations could significantly enhance their effectiveness and accessibility.
The core of that system is a Faster Region-based Convolutional Neural Network (Faster R-CNN) for
object and face recognition: captured images are analyzed, and the results are converted into audio
for the user. The authors acknowledge that the system is still at the prototype stage but consider it
promising for future development [16].
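Since the cited paper does not publish implementation details, the following sketch only approximates the described recognize-then-speak pipeline, using torchvision's pretrained Faster R-CNN as a stand-in detector and pyttsx3 for the audio output; the input file name and score threshold are placeholders.

```python
# Approximate sketch of the pipeline in [16]: detect objects in a captured image
# with a pretrained Faster R-CNN, then speak the result. Stand-in components only.
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype
import pyttsx3

weights = torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=weights)
model.eval()

img = convert_image_dtype(read_image("frame.jpg"), torch.float)  # placeholder image
with torch.no_grad():
    out = model([img])[0]     # dict with "boxes", "labels", "scores"

names = weights.meta["categories"]   # COCO class names bundled with the weights
found = {names[int(l)] for l, s in zip(out["labels"], out["scores"])
         if float(s) > 0.7}          # arbitrary score threshold

message = "I can see " + ", ".join(sorted(found)) if found else "nothing recognized"
tts = pyttsx3.init()
tts.say(message)
tts.runAndWait()
```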
[1] A. Raj, M. Kannaujiya, I. Bhardwaj, A. Bharti, and R. Prasad, "Model for object detection using computer vision and machine learning for decision making," International Journal of Computer Applications, vol. 181, no. 43, Mar. 2019, doi: 10.5120/ijca2019918516.
[2] World Health Organization, "Visual impairment and blindness," Fact Sheet No. 282. http://www.who.int/mediacentre/factsheets/fs282/en. Accessed Oct. 2015.
[3] G. Douglas, C. Corcoran, and S. Pavey, "The role of assistive technology in the lives of blind and partially sighted people," Visual Impairment Research, 2006.
[4] Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection," 2016; Liu et al., "SSD: Single Shot MultiBox Detector," 2016; He et al., "Mask R-CNN," 2017; Smith, "An Overview of the Tesseract OCR Engine," 2007; Apple's iPhone 12 Pro with LiDAR scanner; gesture recognition (Leap Motion, Kinect) and voice commands (Google Assistant, Siri, Alexa); the General Data Protection Regulation (GDPR); McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data," 2017; Yosinski et al., "How transferable are features in deep neural networks?," 2014.
[6] Espressif Systems, "ESP32." https://www.espressif.com/en/products/socs/esp32
[7] SunFounder, "ESP32-CAM." https://docs.sunfounder.com/projects/galaxyrvr/en/latest/hardware/cpn_esp_32_cam.html
[8] "Headphones," Wikipedia. https://en.m.wikipedia.org/wiki/Headphones
[9] "Ultrasonic sensor," ScienceDirect Topics. https://www.sciencedirect.com/topics/engineering/ultrasonic-sensor
[10] J. Saiteja, "Ultrasonic Smart Goggles for Blind People," Sathyabama Institute of Science and Technology, 2022.
[11] S. G. Gollagi, "An innovative smart glasses for blind people using artificial intelligence," Indonesian Journal of Electrical Engineering and Computer Science, 2023.
[12] E. A. Hassan, "Smart Glasses for the Visually Impaired People," Universiti Teknologi PETRONAS, 2016.
[13] S. Choudhary, "IoT Based Smart Glasses with Facial Recognition for People with Visual Impairments," SSRG International Journal of Electrical and Electronics Engineering, 2023.
[14] R. Sweatha and S. Sathiya Priya, "YOLOv5 driven smart glasses for visually impaired," International Journal of Science and Research Archive, 2024.
[15] S. Jha, N. Shetty, and N. Shinde, "My Eyes - Smart Glasses for Blind People," Electronics Department, Atharva College of Engineering, 2024.
[16] M. Ananthi, R. Bharathi, M. Gayathri, G. Gokul, M. Sivakumar, and B. Vaishnavi, "AI-Powered Smart Glasses for the Blind and Visually Impaired," International Journal of Innovative Technology and Exploring Engineering, vol. 12, no. 9, pp. 4007–4012, 2023.