MDG Season of Code 2023
AirSwipe Reader
Siddhant Munjamkar | MMED
Sanya Jain | EED
Vishal Bokhare | MMED
Overview
• Problem: Whether preparing for exams, working on a project, or exploring a new
field, we all depend on online book readers and PDF viewers to read material, but
they all share a problem: every time we finish a page we must reach for the keyboard
to scroll down, and features such as adding a note or highlighting text are either
not available in a single PDF viewer or are inconvenient to use. There is therefore a
need for a touchless interface to ease the process of studying.
• Solution: Computer Vision and Artificial Intelligence (AI) are distinct but closely
related fields that are reshaping how we perceive and interact with the digital
world. Using these, we propose to build an online book reader that, given access to
the computer's webcam, lets the user scroll the page up/down, zoom in/out, and so on
through hand gestures. It will also let the user highlight lines, add notes,
screenshots, or links in between, and save everything for later review, easing
revision during study sessions or at the last moment before exams.
• Impact: The aim of this project is to ease reading and scrolling by minimising
dependency on computer hardware and making human-computer interaction as touchless
as possible. It will also prove a boon for lazy readers like us, who can read
comfortably while lying in bed without reaching for the keyboard to scroll again and
again.
Project Specification
These are the details of the project, as follows:
1. User Flow:
Hand gesture: swiping upwards/downwards scrolls the page up/down.
Hand gesture: pinching outwards zooms in.
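The two gestures above can be classified from tracked hand-landmark positions across recent frames. A minimal sketch of that logic (function names and thresholds are our assumptions, not part of the spec):

```python
def classify_swipe(y_positions, threshold=0.15):
    """Classify a vertical swipe from normalised fingertip y-coordinates
    (0 = top of frame, 1 = bottom) sampled over recent frames."""
    if len(y_positions) < 2:
        return "none"
    dy = y_positions[-1] - y_positions[0]
    if dy <= -threshold:   # fingertip moved up -> scroll up
        return "swipe_up"
    if dy >= threshold:    # fingertip moved down -> scroll down
        return "swipe_down"
    return "none"


def classify_pinch(distances, threshold=0.05):
    """Classify a pinch from thumb-to-index fingertip distances over frames."""
    if len(distances) < 2:
        return "none"
    dd = distances[-1] - distances[0]
    if dd >= threshold:    # fingers moving apart -> zoom in
        return "zoom_in"
    if dd <= -threshold:   # fingers moving together -> zoom out
        return "zoom_out"
    return "none"
```

In practice the y-coordinates and fingertip distances would come from the hand-landmark model's per-frame output.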
2. Description:
• Login/register: This page lets the user log in or register on the website,
followed by a few questions about personal information such as age and gender,
which will be optional.
• Home screen:
1. The home screen will contain an import-PDF option at the top, used to add a
PDF to the web page for reading and editing.
2. Below it, a list of previously imported and edited PDFs will be displayed,
any of which can be opened for reading or editing at any time.
3. Each entry will also show the date on which the PDF was last edited.
• The reading page:
1. When the user clicks a PDF from the list, a window will ask for permission
to enable the camera for gesture controls and will display the gestures
needed to perform each task. The camera will then be turned on.
2. A screen will open with the chosen PDF in the centre, occupying the
majority of the screen.
3. A toolbar on top will contain options for highlighting text, adding notes,
bookmarking, etc.
4. A save button will store the PDF with all edits, so they can be revisited
later.
5. Most functions on this page, including scrolling and zooming, will be
gesture controlled, interpreted by a trained model.
3. Tech Stack: It will be a web application.
Frontend:
• Framework: React.js or Vue.js (for a component-based UI)
• State Management: Redux for React
• Styling: CSS-in-JS (e.g., Styled Components for React)
• Responsive Design: CSS Grid or Flexbox
Backend:
• Framework: Node.js with Express (for a JavaScript/TypeScript full-stack)
• Database: PostgreSQL (Relational Database) or MongoDB (NoSQL) depending on
data structure
• ORM/ODM: Sequelize for PostgreSQL, Mongoose for MongoDB
Model training will be done in Python using MediaPipe and YOLOv8. With the help of
PyAutoGUI, we can write a Python program that mimics mouse actions (left click,
right click, scroll, etc.) and keyboard actions (key press, text entry, multiple key
presses, etc.) without performing them physically.
We plan to use a dataset similar to https://www.kaggle.com/datasets/gti-
upm/leapgestrecog
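Mapping recognised gestures to PyAutoGUI actions can be kept testable by routing through a dispatch table rather than calling PyAutoGUI directly. A sketch (the gesture names, scroll step, and zoom shortcut are our assumptions):

```python
# Each gesture maps to a PyAutoGUI method name and its arguments.
GESTURE_ACTIONS = {
    "swipe_up":   ("scroll", (300,)),        # scroll up by 300 "clicks"
    "swipe_down": ("scroll", (-300,)),       # scroll down
    "zoom_in":    ("hotkey", ("ctrl", "+")), # assumes the standard browser zoom binding
    "zoom_out":   ("hotkey", ("ctrl", "-")),
}


def perform(gesture, controller=None):
    """Execute the action for a recognised gesture.  `controller` defaults
    to the real pyautogui module; a fake can be injected for testing."""
    if controller is None:
        import pyautogui  # imported lazily so tests need no display
        controller = pyautogui
    entry = GESTURE_ACTIONS.get(gesture)
    if entry is None:
        return None  # unknown gesture: do nothing
    method, args = entry
    getattr(controller, method)(*args)
    return entry
```

`scroll` and `hotkey` are real PyAutoGUI calls; injecting a fake controller lets the gesture-to-action mapping be unit-tested without moving the real mouse.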
4. A web application was chosen because it provides a well-managed platform for the
user and can be accessed easily from both Android and iOS. Python was chosen for
model training because it has a large number of libraries that ease training AI
models.
5. APIs: for webcam access in the browser, we plan to use the standard MediaDevices
API (navigator.mediaDevices.getUserMedia). Any further APIs required will be
identified after preliminary research.
6. Model for the database:
• User Table:
user_id (Primary Key): Integer
username: String
email: String
password_hash: String
created_at: DateTime
• Content Uploaded Table:
content_id (Primary Key): Integer
user_id (Foreign Key): Integer (references User table)
name: String
description: Text
created_at: DateTime
• Annotation Table:
annotation_id (Primary Key): Integer
content_id (Foreign Key): Integer (references Content Uploaded table)
user_id (Foreign Key): Integer (references User table)
type: String (highlight / note / bookmark)
data: Text
created_at: DateTime
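The User and Content tables above can be sketched in SQL via Python's built-in sqlite3 (the user_id foreign key on Content is our assumption, linking each upload to its owner; the real schema would live in Sequelize or Mongoose as per the tech stack):

```python
import sqlite3

# In-memory database purely for illustration of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL,
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE content (
    content_id  INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES user(user_id),  -- assumed FK
    name        TEXT NOT NULL,
    description TEXT,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
conn.execute(
    "INSERT INTO user (username, email, password_hash) VALUES (?, ?, ?)",
    ("demo", "demo@example.com", "hash"),
)
conn.execute("INSERT INTO content (user_id, name) VALUES (?, ?)", (1, "notes.pdf"))
```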
7. Edge cases of our solution: precisely selecting text mid-line for highlighting is
hard to do with gestures. Gesture recognition may be challenging in low-light
environments. Different devices have varying camera capabilities, which affects
gesture recognition.
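The low-light edge case can be mitigated by warning the user when frame brightness is too low for reliable recognition. A sketch over grayscale pixel values (the 40/255 threshold is an assumption to be tuned):

```python
def is_too_dark(gray_pixels, threshold=40):
    """Return True if the mean brightness of grayscale pixel values
    (0-255) falls below `threshold`, i.e. gesture recognition is
    likely to be unreliable and the user should be warned."""
    if not gray_pixels:
        return True  # no frame data: treat as unusable
    mean = sum(gray_pixels) / len(gray_pixels)
    return mean < threshold
```

With OpenCV, `gray_pixels` would come from converting the captured frame to grayscale and flattening it.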
Milestones
For each week during our time at SoC, the deliverables we plan to present to our
mentor for discussion are listed below:
Week I
• Complete resources for learning the tech stack required for the project.
• Work on the hand-gesture-detecting AI model, which includes training the
model with useful datasets and generating the variables.
• Data Collection and Preprocessing: to train a custom model, collect and
preprocess a dataset of hand gesture images.
• Develop the user interface of our web application.
• Implement the webcam access and real-time video processing.
• Integrate the hand gesture recognition model into the application.
• Implement any additional features, such as user feedback or gesture-
specific actions.
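Per-frame predictions from a gesture model are noisy, so the real-time processing above benefits from smoothing: emit a gesture only when it wins a majority vote over a sliding window of recent frames. A sketch (the window size is an assumption):

```python
from collections import Counter, deque


class GestureSmoother:
    """Debounce per-frame gesture predictions: emit a gesture only when
    it wins a strict majority of the last `window` frames."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, prediction):
        self.history.append(prediction)
        label, count = Counter(self.history).most_common(1)[0]
        if count > len(self.history) // 2:
            return label
        return "none"
```

Each camera frame's raw prediction is fed to `update`, and only the smoothed output is forwarded to the action dispatcher.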
Week II
• Fix project architecture and initialize the project on GitHub
• Implement the trained model on the computer and test its accuracy and
functioning.
• Train our hand gesture recognition model using a machine learning
framework.
• Optimize and fine-tune the model for better performance.
Week III
• Test the entire system thoroughly, focusing on different hand gestures and
scenarios.
• Address any bugs, performance issues, or usability concerns
• Optimize the performance of the hand gesture recognition system.
Week IV
• Make final and essential changes
• Deploy the application to a web server or cloud platform
About Me
Vishal Bokhare: I am a passionate and results-driven professional with a deep interest and
beginner-level experience at the intersection of Artificial Intelligence (AI), Machine
Learning (ML), and Computer Vision (OpenCV). I am always open to exploring new
opportunities, collaborating on exciting projects, and connecting with fellow
professionals in the field. I know C++ and Python and have some knowledge of SQL. I did
my schooling in Betul, MP and am currently pursuing a B.Tech in Metallurgy and Materials
Engineering.
Sanya Jain:
I completed my initial education in Bareilly, UP and am currently pursuing a B.Tech in
Electrical Engineering at the Indian Institute of Technology Roorkee. I know the
programming languages C++, Python, and Java, and am exploring and gaining experience in
the field of AI/ML and computer vision.
Siddhant Munjamkar:
I'm a dedicated and creative web developer with a passion for crafting robust and user-
friendly digital experiences. I take pride in translating ideas into functional and
visually appealing websites. My expertise includes front-end and back-end development.
I did my schooling in Nagpur, MH and am currently pursuing a B.Tech in Metallurgy and
Materials Engineering.
Contact Details
Member 1
• Vishal Bokhare
• 22124049
• [email protected]
• 9340549668
Member 2
• Sanya Jain
• 22115138
• [email protected]
• 9389051680
Member 3
• Siddhant Munjamkar
• 22118074
• [email protected]
• 9403982378