Enhancing Model Accuracy for Keypoint-Based Sign Language Recognition using Optimized Neural Network Architectures
Abstract: Sign language is a vital means of communication for millions, yet technological barriers still limit accessibility.
To address this, we analyzed the existing deep learning model and identified key areas for enhancement. We expanded the
neural network to improve learning capacity, replaced ReLU with LeakyReLU to avoid inactive neurons, and added batch
normalization to maintain gradient stability throughout training. To reduce overfitting while preserving performance, we
fine-tuned the dropout layer. We also improved preprocessing to filter out background noise, enhancing the system’s
ability to accurately track gestures. The training process was accelerated using early stopping and model checkpointing in
order to save the best-performing version without incurring unnecessary computation. Finally, the model was
converted to TensorFlow Lite so that it runs efficiently on mobile and edge devices, making real-world deployment
practical. The results were clear: substantially improved accuracy, greater stability, and solid real-time performance,
with confusion matrices and ROC curves confirming that the improvement is measurable. More importantly, this project
is about inclusivity and about making technology serve the deaf and hard-of-hearing community more fully.
How to Cite: Kailash Kumar Bharaskar; Dharmendra Gupta; Vivek Kumar Gupta; Rachit Pandya; Rachit Jain (2025). Enhancing
Model Accuracy for Keypoint-Based Sign Language Recognition using Optimized Neural Network Architectures. International
Journal of Innovative Science and Research Technology, 10(4), 4049-4055. https://doi.org/10.38124/ijisrt/25apr2374
I. INTRODUCTION

Sign language allows the deaf and hard-of-hearing community to convey complex thoughts, feelings, and concepts. However, limited familiarity with sign language creates obstacles to communication in everyday life. Automated Sign Language Recognition (SLR) is a significant development that utilizes machine learning to convert visual gestures into text or speech, facilitating smoother communication. The adoption of deep learning-based approaches has further augmented SLR systems, drastically enhancing their precision, speed, and overall robustness. Deep learning models are very good at identifying complicated patterns. Traditional vision-based SLR typically leverages CNNs and RNNs, but there is an alternative: keypoint-based models. Rather than analyzing entire images, they follow hand joint movements, which makes them lightweight and quick. They require less computing power and are better equipped to handle background noise, making them ideal for real-time usage. That said, they present challenges such as overfitting and poor generalization. Our research addresses these issues directly; we observed that targeted architectural improvements would yield models that are not only faster but also more accurate and reliable in the real world. This research mainly aims to improve the accuracy and robustness of a keypoint-based sign language recognition model.

The Following Research Questions Guide Our Study:
How can neural network optimization techniques improve the accuracy of keypoint-based SLR models?
What impact do modifications such as batch normalization, LeakyReLU activations, and adjusted dropout rates have on the model's stability and generalization capabilities?

II. LITERATURE REVIEW

In the last 10 years, sign language recognition (SLR) has evolved significantly. Researchers progressed from traditional methods, in which features such as hand shape, movement, and timing were manually extracted from the data, to deep learning techniques, in which patterns in the data are learned automatically. Early systems had their moments, but they struggled with the complexity and variety of sign language as it is used in the real world.
Sign language recognition (SLR) has really grown over the years. It started with simple machine learning models that used handcrafted features and basic classifiers like SVMs. They got the job done, sort of, but struggled with the complexity of real-world gestures. Then deep learning came along and changed the game. Early CNNs helped improve accuracy, but what really pushed things forward were techniques like dropout, batch normalization, and smarter activation functions. These made models not just better but more stable, and actually usable outside the lab. In our work, we have built on all of that. We did not just copy what came before; we dug into it, looked at what worked and what did not, and then focused on fine-tuning key parts of the model, such as choosing the right activation functions, applying better normalization, and adjusting dropout levels. The result is a leaner, smarter network that learns faster and holds up better in the real world.

III. RELATED WORKS

Data Labelling and Structuring
The first version of the sign language recognition system was built using TensorFlow and Keras, with a focus on keeping things simple and beginner-friendly. It worked by tracking 21 key points on the hand (42 values for x and y coordinates) and running them through a few dense layers. ReLU was used for activation, and dropout helped reduce overfitting. The final softmax layer handled the actual gesture classification, sorting each input into a specific sign language category. Training was done using sparse categorical cross-entropy with the Adam optimizer. It did the job, but just barely: the model struggled with more complex patterns, overfitted easily, and was not particularly efficient when it came to training time. It was a good starting point, but clearly had room for improvement. To provide performance improvements, several important enhancements were made (a combined code sketch follows the Training subsection below):

Increased Learning Power: More neurons in the dense layers allow the model to learn more comprehensive features of each gesture.
LeakyReLU Activation: Traditional ReLU activation was swapped for LeakyReLU to mitigate "dying neurons", encourage smoother gradient flow, and boost stability during training.
Batch Normalization: Normalizing the inputs to each layer allows the training process to converge faster and more reliably.
Optimized Dropout: Tuning the dropout rate produced results that clearly measured the balance between overfitting and learning.
Enhanced Data Processing: A more flexible data-loading approach handles inconsistencies better, and feature standardization conforms the features before learning.
Lean Deployment: The model was converted to TensorFlow Lite, decreasing its size without sacrificing accuracy and allowing it to run on mobile and embedded devices.

Training
Optimizer & Learning Rate: Adam optimizer (LR = 0.001), which adapts learning to the data.
Loss Function: Sparse categorical cross-entropy for multi-class classification.
Regularization: Dropout (35%) to prevent overfitting; batch normalization to reduce instability during backpropagation.
Early Stopping: Used to avoid unnecessary computation time.
Tracked Metrics: Loss, accuracy, and gradient flow through TensorBoard.
Training Workflow: Initialization: He initialization for stable weight distribution. Forward Pass:
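To make the enhancements and training configuration above concrete, the following is a minimal Keras sketch, not the authors' exact implementation, of a keypoint classifier that combines the changes described in this section: wider dense layers with He initialization, LeakyReLU activations, batch normalization, 35% dropout, a softmax output, Adam with a 0.001 learning rate, sparse categorical cross-entropy, and early stopping with checkpointing and TensorBoard logging. Layer widths, the number of gesture classes, and file names are illustrative assumptions rather than values reported in the paper.

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

NUM_KEYPOINT_VALUES = 42   # 21 hand landmarks x (x, y) coordinates, as described in the text
NUM_CLASSES = 26           # assumed number of sign classes (placeholder)

def build_enhanced_model():
    model = models.Sequential([
        layers.Input(shape=(NUM_KEYPOINT_VALUES,)),
        # Wider dense layers for increased learning capacity,
        # with He initialization for stable weight distribution.
        layers.Dense(128, kernel_initializer="he_normal"),
        layers.BatchNormalization(),   # normalize layer inputs for stable gradients
        layers.LeakyReLU(),            # avoid "dying" neurons seen with plain ReLU
        layers.Dropout(0.35),          # tuned dropout rate from the Training subsection
        layers.Dense(64, kernel_initializer="he_normal"),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.Dropout(0.35),
        layers.Dense(NUM_CLASSES, activation="softmax"),  # gesture classification
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Early stopping and checkpointing retain the best-performing weights without
# unnecessary training time; TensorBoard tracks loss and accuracy.
training_callbacks = [
    callbacks.EarlyStopping(patience=10, restore_best_weights=True),
    callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
    callbacks.TensorBoard(log_dir="logs"),
]

In this sketch, a call such as model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=training_callbacks) would train the network; the dataset split and epoch count are not specified in the text.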
Workflow Diagram
The diagram below shows how the enhanced model works (see Fig. 7).
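As a rough illustration of the Lean Deployment step mentioned above, the snippet below shows one standard way to convert a trained Keras model to TensorFlow Lite. The default optimization flag and the file names are assumptions, not settings reported by the authors.

import tensorflow as tf

# Load the best checkpoint saved during training (file name assumed above).
model = tf.keras.models.load_model("best_model.keras")

# Convert to TensorFlow Lite so the classifier can run on mobile and edge devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization (assumed)
tflite_model = converter.convert()

with open("sign_model.tflite", "wb") as f:
    f.write(tflite_model)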