DAYANANDA SAGAR COLLEGE OF ENGINEERING

(An Autonomous Institute affiliated to Visvesvaraya Technological University (VTU), Belagavi,


Approved by AICTE and UGC, Accredited by NAAC with ‘A’ grade & ISO 9001 – 2015 Certified Institution)
Shavige Malleshwara Hills, Kumaraswamy Layout, Bengaluru-560 111, India

DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING


(Accredited by NBA Tier 1: 2022-2025)

Project Report on

OBJECT DETECTION USING YOLOv10 ML MODEL DEPLOYED USING AWS

Submitted in partial fulfillment for the award of the degree of

Bachelor of Engineering
in
Information Science and Engineering

Submitted by
TUNEER SAHA 1DS21ET099
YESHWANTH REDDY 1DS21ET111
SHIVKUMAR KHOT 1DS22ET418
PRAKASH R 1DS22ET412

Under the Guidance of


Ms. Ambika Naik Y
Assistant Professor
Department of Electronics and Telecommunication Engineering
DSCE, Bengaluru

VISVESVARAYA TECHNOLOGICAL UNIVERSITY


JNANASANGAMA, BELAGAVI-590018, KARNATAKA, INDIA
2024-25

CERTIFICATE

Certified that the project report entitled “OBJECT DETECTION USING YOLOv10 ML MODEL
DEPLOYED USING AWS” was carried out by TUNEER SAHA (1DS21ET099), YESHWANTH
REDDY (1DS21ET111), SHIVKUMAR KHOT (1DS22ET414) and PRAKASH R (1DS22ET412),
bona fide students of DAYANANDA SAGAR COLLEGE OF ENGINEERING, an autonomous
institution affiliated to VTU, Belagavi, in partial fulfillment for the award of the Degree of Bachelor of
Electronics and Telecommunication Engineering during the year 2024-2025. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in the report deposited
in the departmental library. The project report has been approved as it satisfies the academic requirements
with respect to the work prescribed for the said Degree.

Signature of the Guide Signature of the HOD Signature of the Principal


Name Dr. Annapurna P Patil Dr. B G Prasad
Designation Dean Academics, Prof & Head Principal
Dept. of ISE, DSCE Dept. of ISE, DSCE, Bengaluru DSCE, Bengaluru
Bengaluru
Name of the Examiners Signature with date

1. ........................................... ..........................................

2. ........................................... ..........................................

DECLARATION

We, Tuneer Saha (1DS21ET099), Yeshwanth Reddy (1DS21ET111), Shivkumar Khot
(1DS22ET418) and Prakash R (1DS22ET412), hereby declare that the project work entitled
“OBJECT DETECTION USING YOLOv10 ML MODEL DEPLOYED USING AWS” has been independently
done by us under the guidance of Ms. Ambika Naik Y, Assistant Professor, Department of Electronics
and Telecommunication Engineering, and submitted in partial fulfillment of the requirement for the
award of the degree of Bachelor of Electronics and Telecommunication Engineering at Dayananda
Sagar College of Engineering, an autonomous institution affiliated to VTU, Belagavi, during the academic
year 2024-2025.

We further declare that we have not submitted this report either in part or in full to any other university
for the award of any degree.

NAME OF THE CANDIDATE USN


NAME OF THE CANDIDATE USN
NAME OF THE CANDIDATE USN
NAME OF THE CANDIDATE USN

PLACE:
DATE:
ACKNOWLEDGEMENT

The satisfaction and euphoria accompanying the successful completion of any task would be incomplete
without mention of the people who made it possible and whose constant guidance and encouragement
helped us complete it. We sincerely thank the Management of Dayananda Sagar College of
Engineering, Bengaluru.

We express our sincere regards and thanks to Dr. B G Prasad, Principal, Dayananda Sagar College
of Engineering, Bengaluru. His constant encouragement, guidance, and valuable support have been an
immense help in realizing this project.

We express our sincere regards and thanks to Dr. Annapurna P Patil, Professor & Head, Department
of Information Science and Engineering, Dayananda Sagar College of Engineering, Bengaluru. Her
incessant encouragement, guidance, and valuable technical support have been an immense help in realizing
this project. Her guidance gave us the environment to enhance our knowledge and skills and to reach the
pinnacle with sheer determination, dedication, and hard work.

We would like to express our profound gratitude to our guide, Ms. Ambika Naik Y, Assistant Professor,
Department of Electronics and Telecommunication Engineering, Dayananda Sagar College of Engineering,
Bengaluru, who has encouraged us throughout the project. Her moral support enabled us to complete our
work successfully.

We express our sincere thanks to the Project Coordinators, Dr. Vaidehi M, Assoc. Prof., and Dr. Bhavani K,
Asst. Prof., of the Department of Information Science and Engineering, for their continuous support
and guidance. We thank all teaching and non-teaching staff of the Department of Information Science
and Engineering for their kind and constant support throughout the academic journey.

NAME OF THE CANDIDATE USN


NAME OF THE CANDIDATE USN
NAME OF THE CANDIDATE USN
NAME OF THE CANDIDATE USN
ABSTRACT

Object detection is a cornerstone of computer vision, enabling systems to identify and localize objects
within images or video streams. YOLOv10, a recent iteration of the "You Only Look Once"
framework, is celebrated for its balance of speed and accuracy, making it suitable for real-time
applications. This project presents the practical implementation and deployment of a YOLOv10-based
object detection system on Amazon Web Services (AWS), with the specific outcome of streaming
video from the server to users' phones, computers, and laptops via a provided address.

The deployment involves hosting the application on an AWS EC2 instance, where a Flask backend is
deployed using Gunicorn for handling API requests. CloudFront is integrated as a Content Delivery
Network (CDN) to enhance performance and reduce latency, while AWS ACM provides SSL
certificates for secure communication. Route 53 manages DNS hosting to ensure reliable domain
resolution. The backend leverages the Ultralytics library and OpenCV (cv2) for model inference, while
Flask and jsonify facilitate API responses.

This architecture enables seamless video streaming to multiple devices, showcasing YOLOv10's
effectiveness in real-world scenarios. Applications range from surveillance and security systems to e-
commerce and interactive experiences. Future work suggests exploring edge computing to improve
real-time responsiveness further, ensuring scalability and performance in diverse environments.

Keywords: YOLOv10, Object Detection, AWS EC2, CloudFront CDN, AWS Route 53, Scalable Cloud
Infrastructure
Table of Contents

ABSTRACT
ACKNOWLEDGMENT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS AND SYMBOLS
1. INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Objectives
1.4 Motivation
2. LITERATURE SURVEY
3. PROBLEM ANALYSIS & DESIGN
3.1 Analysis
3.2 Hardware Requirements
3.3 Software Requirements
3.4 System Architecture Diagram
3.5 Data flow Diagram
3.6 Use Case Diagram
3.7 Sequence Diagram
4. IMPLEMENTATION
4.1 Overview of System Implementation
4.2 Module Description
4.3 Algorithms
4.4 Code Snippets
5. TESTING
5.1 Unit Test Cases
5.2 Integration Test Cases
6. RESULTS
6.1 Results and Analysis
7. CONCLUSION AND FUTURE SCOPE
7.1 Conclusion
7.2 Future Scope
REFERENCES
PUBLICATION DETAILS
PLAGIARISM REPORT
APPENDIX (IF ANY)
LIST OF FIGURES

Fig. No. Fig. Caption Page No.


LIST OF TABLES

Table No. Table Caption Page No.


LIST OF ABBREVIATIONS

Abbreviation Full Description


OBJECT DETECTION USING YOLOv10 ML MODEL DEPLOYED
USING AWS

INTRODUCTION
Real-time object detection plays a significant role in domains such as video surveillance,
computer vision, autonomous driving, and robotics. Object detection is widely
used to count objects in a scene, track their precise locations, and label them accurately. It
seeks to answer two questions: where is the object, and what is the object?

Image localization refers to the process of finding a single object in an image, while
object detection refers to the process of finding several objects in an image. The objective of this
project is to detect objects using the You Only Look Once (YOLO) approach and deploy the system
on AWS. This method has several advantages over other object detection algorithms.

Region-based algorithms such as R-CNN (Region-based Convolutional Neural Network) and Fast
R-CNN do not look at the complete image at once; they examine candidate regions in multiple
stages. YOLO instead looks at the complete image in a single pass, using one convolutional network
to predict the bounding boxes and the class probabilities for those boxes, and therefore detects
objects faster than these algorithms.

The YOLO algorithm has emerged as a popular solution for real-time object detection because it
detects objects in a single pass through the neural network. Object detection underpins tasks
such as recognition and localization and is widely applicable in real-world scenarios, which
makes it a crucial subfield of computer vision. The accuracy of the YOLO model allows the
objects in each frame to be identified reliably.

For the project involving YOLOv10-based object detection deployed on AWS, the integration
of various technologies plays a crucial role in ensuring performance, scalability, and ease of use.
The object detection model is hosted on an AWS EC2 instance, which provides the necessary
compute power to run the YOLOv10 model efficiently. Flask is used as the backend framework
to handle API requests, facilitating communication between the frontend and the model. To
manage incoming traffic and improve response times, Gunicorn is deployed as a WSGI server,
handling multiple requests concurrently.

For global content delivery and to minimize latency, Amazon CloudFront serves as a Content
Delivery Network (CDN), caching the video stream and reducing the load on the server. The
jsonify function in Flask ensures that data is easily returned in JSON format, making it suitable
for API-based responses. To manage the application's domain and route user traffic reliably,
AWS Route 53 is used for DNS management, ensuring users can access the application through
a custom domain with low latency.

A brief introduction to some of the technologies used in the project:

Flask: Flask is a lightweight, Python-based web framework widely used for developing web
applications and APIs. Its simplicity and flexibility make it ideal for projects requiring rapid
development and easy scalability. Flask supports extensions for advanced functionalities like
database integration, authentication, and API handling. It is particularly effective for deploying
machine learning models, where a WSGI server such as Gunicorn can be used to handle high-traffic
requests. Flask’s seamless integration with libraries like OpenCV and Ultralytics allows for
efficient implementation of computer vision tasks, such as YOLOv10-based object detection.
This makes Flask a popular choice for real-time applications requiring secure, scalable, and low-
latency solutions.

AWS: Amazon Web Services (AWS) is a comprehensive cloud platform that provides on-
demand computing resources, storage, and services to build and deploy applications at scale.
Among its services, Amazon EC2 (Elastic Compute Cloud) stands out as a virtual server
solution that offers scalable compute capacity in the cloud. EC2 instances allow users to run
applications with customized configurations, choosing from various operating systems, storage
options, and instance types tailored to specific performance needs. With features like auto-
scaling, load balancing, and secure networking, EC2 is ideal for hosting applications such as web
servers, databases, or machine learning models. Its flexibility and pay-as-you-go pricing make it
a key component for deploying robust, scalable, and efficient cloud-based solutions.

CloudFront: Amazon CloudFront is a Content Delivery Network (CDN) service that distributes content
globally with low latency and high transfer speeds. By caching content at edge locations
worldwide, it accelerates the delivery of static and dynamic web content, such as HTML, images,
videos, and API responses. CloudFront can be integrated with other AWS services like S3 for
storage or EC2 for serving dynamic content, improving performance and reducing latency for
users regardless of their geographical location.

Gunicorn: Gunicorn (Green Unicorn) is a Python WSGI (Web Server Gateway Interface) HTTP
server that is widely used to serve Python web applications, particularly those built with
frameworks like Flask and Django. It acts as an intermediary between the web server (like
Nginx) and the application, handling incoming HTTP requests and passing them to the Python
application for processing. Gunicorn is known for its performance and scalability, supporting
multiple worker processes to handle multiple requests simultaneously. This makes it ideal for
production environments where handling a high volume of requests with minimal latency is
crucial.
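
As a rough illustration of the multi-worker setup described above, a Gunicorn configuration file might look like the following sketch. The file name `gunicorn.conf.py`, the port, and the timeout value are assumptions made for illustration; the report does not state the settings actually used. The worker count follows the rule of thumb given in the Gunicorn documentation.

```python
# gunicorn.conf.py: hypothetical configuration sketch, not the project's actual file.
import multiprocessing

# Address and port Gunicorn listens on (assumed value).
bind = "0.0.0.0:8000"

# Rule of thumb from the Gunicorn documentation: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is restarted. This is an assumed
# value, set above Gunicorn's 30-second default because inference can be slow.
timeout = 120
```

Such a file could be used with a command like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is an assumed module and application name. Each worker process would typically load its own copy of the model, so memory use grows with the worker count.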

jsonify: In Flask, jsonify is a function used to easily convert Python objects (like dictionaries or
lists) into JSON format. It sets the correct MIME type (application/json) and makes it convenient
to return JSON responses from Flask routes, which is particularly useful for building APIs. It
simplifies the process of structuring data for client-side applications, making it ideal for web
applications that need to send structured data between the frontend and backend.
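
The behaviour described above can be sketched with a minimal Flask route. The route name `/detections` and the field names in the payload are hypothetical and chosen only for illustration; they are not taken from the project's code.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/detections")  # hypothetical route name, for illustration only
def detections():
    # A plain Python structure; the field names are assumptions.
    payload = {
        "detections": [
            {"class_name": "person", "confidence": 0.91, "box": [34, 50, 210, 380]},
        ]
    }
    # jsonify serializes the dict and sets the application/json MIME type.
    return jsonify(payload)


if __name__ == "__main__":
    # Exercise the route with Flask's built-in test client, without starting a server.
    with app.test_client() as client:
        response = client.get("/detections")
        print(response.mimetype)
        print(response.get_json())
```

Returning the dictionary through `jsonify` rather than as a hand-built string keeps the response header and body consistent, which is what a JavaScript client parsing the reply relies on.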

Python: Python is a high-level, interpreted programming language known for its readability,
simplicity, and versatility. It supports multiple programming paradigms, including procedural,
object-oriented, and functional programming. Python has a rich standard library and a large
ecosystem of third-party libraries, making it ideal for a wide range of applications, from web
development and data analysis to machine learning and automation. Python's ease of use and
community support have made it one of the most popular programming languages globally.

NumPy: NumPy (Numerical Python) is a fundamental package for scientific computing in
Python. It provides support for large, multi-dimensional arrays and matrices, along with a
collection of high-level mathematical functions to operate on these arrays. NumPy is optimized
for performance and offers capabilities such as element-wise operations, linear algebra, and
statistical analysis. It is widely used in fields like data science, machine learning, and numerical
simulations. NumPy serves as the foundation for many other libraries, including pandas and SciPy,
and interoperates with frameworks such as TensorFlow.
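
To make the idea of an image as a multi-dimensional array concrete, the sketch below builds a blank image and marks a rectangle on it using array slicing. This is an illustration only: the project draws boxes with OpenCV, and the helper `draw_box`, the image size, and the coordinates here are assumptions.

```python
import numpy as np

# A blank 480x640 colour image: height x width x 3 channels, 8-bit values.
image = np.zeros((480, 640, 3), dtype=np.uint8)


def draw_box(img, x1, y1, x2, y2, color=(0, 255, 0), thickness=2):
    """Draw a rectangle outline by writing to array slices (illustrative helper)."""
    t = thickness
    img[y1:y1 + t, x1:x2] = color  # top edge
    img[y2 - t:y2, x1:x2] = color  # bottom edge
    img[y1:y2, x1:x1 + t] = color  # left edge
    img[y1:y2, x2 - t:x2] = color  # right edge
    return img


draw_box(image, x1=100, y1=50, x2=300, y2=200)

# Element-wise operation: brighten every pixel by 40, clipped to the uint8 range.
brighter = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)
```

The cast to a wider integer type before adding avoids the wrap-around that 8-bit arithmetic would otherwise produce for bright pixels.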

1.1 Overview

Object detection has become an essential technology in computer vision, enabling machines
to identify and locate objects within images or videos. This project aims to develop a robust,
scalable, and real-time object detection application using the YOLOv10 (You Only Look
Once) model. By leveraging state-of-the-art machine learning techniques and cloud
infrastructure provided by AWS, the system offers efficient processing and seamless
deployment.

The application is designed to detect objects in uploaded images, annotate them with
bounding boxes, and return results via a web interface. Technologies like Flask, AWS, and
Gunicorn are integrated to ensure high performance and scalability.

Key Features

1. Real-Time Object Detection:
o The system processes uploaded images and returns results in real time, making it
suitable for applications like surveillance and inventory management.
2. Dynamic Class Handling:
o The YOLOv10 model dynamically detects objects and associates them with class
names and confidence scores.
3. Cloud Deployment:
o Hosted on AWS, the system benefits from scalability, reliability, and global
accessibility.
4. Interactive Web Interface:
o A user-friendly interface allows non-technical users to interact with the
application effortlessly.
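
The association of detections with class names and confidence scores can be sketched in plain Python. The `Detection` record, the small `CLASS_NAMES` table, and the 0.5 confidence threshold are all assumptions made for illustration; a real model supplies its own class table and raw outputs.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    class_id: int
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


# Hypothetical id-to-name table; a real model provides its own.
CLASS_NAMES = {0: "person", 2: "car"}


def label_detections(detections, names, min_confidence=0.5):
    """Keep detections at or above the threshold and attach readable class names."""
    labelled = []
    for det in detections:
        if det.confidence >= min_confidence:
            labelled.append({
                "class_name": names.get(det.class_id, f"class_{det.class_id}"),
                "confidence": round(det.confidence, 2),
                "box": list(det.box),
            })
    return labelled


raw = [
    Detection(0, 0.91, (34, 50, 210, 380)),
    Detection(2, 0.32, (300, 120, 520, 260)),  # below the threshold, so dropped
]
results = label_detections(raw, CLASS_NAMES)
```

Filtering by confidence before display is a common way to trade a few missed objects for fewer false positives; the right threshold depends on the application.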

Technologies Used

1. YOLOv10 Model:
o A pre-trained YOLOv10 model is used for its speed and accuracy in object
detection tasks. It processes the image in a single pass, delivering high
performance.
2. Flask:
o A lightweight Python web framework handles the backend server and manages
API routes.
3. AWS (Amazon Web Services):
o EC2 Instance: Hosts the application and provides compute resources.
o CloudFront (optional): Enhances content delivery speed using a Content
Delivery Network (CDN).
4. Gunicorn:
o A production-ready WSGI server that ensures efficient handling of concurrent
user requests.
5. Python Libraries:
o OpenCV: For image decoding, preprocessing, and annotation.
o NumPy: For numerical computations and image array manipulations.
o Ultralytics YOLO API: For easy integration and inference using YOLOv10.
6. Frontend Technologies:
o HTML/CSS/JavaScript: Power the user interface, enabling image uploads and
result display.
o AJAX: Enables asynchronous communication between the frontend and backend.

Application Workflow

1. Image Upload:
o Users upload an image through the web interface. JavaScript converts the image
into a Base64-encoded string for transmission.
2. Backend Processing:
o Flask receives the encoded image via the /process_frame API endpoint.
o The image is decoded and passed to the YOLOv10 model for inference.
3. Object Detection:
o YOLOv10 detects objects, their bounding boxes, and confidence scores.
o Results are drawn as bounding boxes on the image using OpenCV.
4. Result Transmission:
o The processed image is re-encoded into Base64 format.
o A JSON response containing the image and detection details is sent back to
the frontend.
5. Frontend Display:
o The processed image is displayed with bounding boxes and detection details.
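
The transport steps of the workflow above (Base64 encoding on the client, decoding on the server, and packaging the reply as JSON) can be sketched with the standard library alone. The function names, the JSON field names, and the stand-in image bytes are assumptions; the decoding into an image array, the YOLOv10 inference, and the OpenCV drawing are deliberately left out.

```python
import base64
import json


def encode_image(image_bytes: bytes) -> str:
    """Client side: turn raw image bytes into a Base64 text string."""
    return base64.b64encode(image_bytes).decode("ascii")


def decode_image(encoded: str) -> bytes:
    """Server side: recover the raw bytes, tolerating a data-URL prefix."""
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)


def build_response(processed_bytes: bytes, detections: list) -> str:
    """Server side: package the annotated image and detection details as JSON."""
    return json.dumps({
        "image": encode_image(processed_bytes),
        "detections": detections,
    })


# Stand-in bytes; a real request would carry JPEG or PNG data.
original = b"\x89PNG placeholder image bytes"
request_body = json.dumps({"image": "data:image/png;base64," + encode_image(original)})

received = decode_image(json.loads(request_body)["image"])
reply = json.loads(build_response(received, [{"class_name": "person", "confidence": 0.91}]))
```

Base64 makes binary image data safe to embed in a JSON body, at the cost of roughly a one-third increase in payload size.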

Deployment on AWS

The application is deployed on AWS to ensure high availability, scalability, and performance.
Key components of the deployment include:

• EC2 Instance:
o Hosts the Flask application and the YOLOv10 model.
• CloudFront:
o Optimizes content delivery for global users.
• Gunicorn: For efficient request handling.
• HTTPS: For secure data transmission.

1.2 Problem Statement

Object detection is a critical aspect of modern computer vision, enabling systems to identify and
locate objects within images or videos. This technology has applications in diverse fields,
including surveillance, retail, healthcare, and autonomous vehicles. Despite its potential,
implementing an effective and scalable object detection system remains a significant challenge.
Key issues include the need for high accuracy, low latency, ease of deployment, and scalability
across diverse use cases.

This project focuses on addressing these challenges by developing a real-time object detection
system using the YOLOv10 (You Only Look Once) model, integrated with a web application
and deployed on AWS infrastructure. The system is designed to deliver accurate and efficient
detection results while ensuring ease of use for non-technical users through an intuitive interface.

Challenges in Object Detection

The project aims to tackle several critical challenges associated with object detection systems:

1. Accuracy and Efficiency:
o Object detection algorithms must strike a balance between accuracy and speed.
While high accuracy ensures reliable detection, excessive processing time can
hinder real-time applications.
2. Scalability:
o Real-world object detection systems must handle large volumes of data and
simultaneous requests, especially when deployed for applications like surveillance
or retail analytics.
3. Integration with Web Applications:
o Many detection systems lack seamless integration with user-friendly interfaces,
limiting accessibility for end-users without technical expertise.
4. Resource Constraints:
o Deployment on cloud platforms requires optimization of compute resources to
minimize costs while maintaining performance.
5. Global Accessibility:
o Object detection systems need to be accessible across geographies with minimal
latency, necessitating efficient deployment architectures.
6. Dynamic Object Classes:
o In many applications, the objects to be detected may vary or evolve over time,
requiring a flexible detection system capable of adapting to changing
requirements.

Proposed Solution

This project proposes a real-time object detection system that addresses the above challenges
through the following features and technologies:

1. Use of YOLOv10 Model:
o The YOLOv10 model is employed for its speed and accuracy. Its lightweight
architecture ensures efficient inference without compromising detection
performance.
2. Integration with Flask Framework:
o Flask serves as the backend framework, facilitating communication between the
frontend and the object detection pipeline. It ensures low-latency processing of
uploaded images.
3. Scalable Deployment on AWS:
o The application is hosted on AWS EC2 instances, providing scalable compute
resources. Optional services like CloudFront ensure low-latency global content
delivery.
4. User-Friendly Web Interface:
o The system includes a web-based frontend where users can upload images and
view detection results. This eliminates the need for specialized software or
technical expertise.
5. Asynchronous Communication:
o The use of JavaScript and AJAX ensures real-time interaction between the
frontend and backend, allowing seamless image uploads and result retrieval
without page reloads.
6. Optimized Resource Usage:
o Gunicorn is used as the production server to handle concurrent requests
efficiently. The Flask application and YOLO model are optimized to minimize
resource usage while maintaining high performance.

Scope and Goals

The scope of this project includes:

• Developing a functional real-time object detection application.
• Ensuring scalability through deployment on AWS.
• Providing an intuitive interface for non-technical users.
• Achieving high accuracy and low latency in detection tasks.

Goals:

• Accuracy: Achieve reliable detection with a focus on minimizing false positives and
false negatives.
• Scalability: Ensure the system can handle high traffic and large datasets.
• Accessibility: Develop an easy-to-use web application that requires minimal technical
knowledge.
• Performance Optimization: Optimize the application for speed and cost-efficiency in
cloud environments.

1.3 Objectives
The primary objective of this project is to design, develop, and deploy a real-time object
detection system capable of accurately identifying and localizing objects within images. By
leveraging the YOLOv10 model, modern web technologies, and cloud infrastructure, the
project seeks to create an efficient, scalable, and user-friendly solution suitable for various
real-world applications. The system aims to empower users with an accessible and robust
tool that delivers precise results in real-time.

Specific Objectives

1. Real-Time Object Detection:
o Implement a system that processes uploaded images and returns detection results,
including object classes, confidence scores, and bounding boxes, in real time.
2. Accuracy and Performance Optimization:
o Utilize the YOLOv10 model to achieve high accuracy in object detection while
maintaining a low inference time for seamless user interaction.
3. Scalable Cloud Deployment:
o Deploy the application on AWS infrastructure to ensure reliability, scalability,
and global accessibility. Utilize services like EC2 and optionally CloudFront to
optimize performance for users across different regions.
4. User-Friendly Interface:
o Develop an intuitive web interface where users can easily upload images, view
detection results, and interact with the application without requiring technical
expertise.
5. Dynamic Handling of Object Classes:
o Ensure the system can adapt to detect a variety of object classes and provide
detailed information about each detected object, including its confidence score
and precise location.
6. Efficient Backend Processing:
o Integrate Flask as the backend framework to handle API requests, process images,
and return detection results in a structured JSON format. Use Gunicorn to enable
the application to manage multiple concurrent requests efficiently.
7. Interactive Frontend Design:
o Use JavaScript and AJAX to create an interactive and responsive user experience.
Enable real-time communication between the frontend and backend for uploading
images and retrieving results dynamically.
8. Optimized Resource Utilization:
o Optimize compute resource usage on AWS by selecting an appropriate instance
type, fine-tuning the YOLOv10 model, and minimizing latency and resource
consumption.
9. Visualization of Detection Results:
o Provide visual feedback to the user by overlaying bounding boxes, class names,
and confidence scores on the uploaded image, making the results easy to interpret.
10. Security and Reliability:
o Implement secure communication protocols (e.g., HTTPS) to protect user data.
Ensure the application is reliable and capable of handling large-scale deployment
scenarios.

1.4 Motivation

The motivation for this project stems from the need to address these challenges and provide a
robust object detection solution that can serve diverse real-world applications. Leveraging state-
of-the-art technology like the YOLOv10 model, combined with scalable cloud deployment and
an intuitive user interface, this project aims to bridge the gap between advanced AI capabilities
and practical usability.

Key Motivation

1. Advancing Real-Time Object Detection:

• Many existing object detection systems struggle to achieve a balance between accuracy
and speed. Real-time processing is essential for applications such as surveillance,
autonomous vehicles, and industrial automation. The YOLOv10 model, known for its
high-speed inference and accuracy, serves as a foundation to meet this need. This project
aims to optimize the model's performance further, ensuring it can handle real-world
scenarios effectively.

2. Addressing Deployment Challenges:

• Deploying object detection systems often requires significant technical expertise and
resources, making them inaccessible to smaller organizations or individuals. By utilizing
AWS cloud infrastructure, this project demonstrates how such systems can be deployed
efficiently, ensuring global accessibility, scalability, and cost-effectiveness.

3. Improving Accessibility for Non-Technical Users:

• Most AI-powered tools require specialized knowledge to operate, limiting their usability
for non-technical users. This project aims to build an intuitive web application that allows
users to upload images and retrieve object detection results effortlessly, lowering the
barrier to entry for AI technologies.

4. Enhancing Industrial Applications:

• Object detection has applications in various industries:
o Surveillance and Security: Detecting suspicious activities or objects in real time.
o Retail: Automating inventory tracking and monitoring customer behavior.
o Healthcare: Assisting in medical imaging by identifying anomalies.
o Transportation: Enabling autonomous vehicles to navigate safely.

By providing a flexible and scalable solution, this project seeks to meet the demands of these
industries and demonstrate the potential of object detection technology.

Literature Survey
Object detection has undergone remarkable advancements, transitioning from traditional
methods relying on handcrafted features and algorithms like SIFT, HOG, and Viola-Jones, to
modern deep learning frameworks that leverage neural networks. Early approaches were limited
in robustness and scalability, but the introduction of Convolutional Neural Networks (CNNs)
revolutionized the field with models like R-CNN, Fast R-CNN, and YOLO, which brought
significant improvements in speed and accuracy. Frameworks like SSD and RetinaNet further
enhanced real-time performance and detection of smaller objects. More recent iterations, such as
YOLOv5 and YOLOv10, leverage architectural innovations, data augmentation, and pre-trained
models to push the boundaries of precision and computational efficiency, making object
detection integral to applications like surveillance, autonomous vehicles, and healthcare.

1. The paper “You Only Look Once: Unified, Real-Time Object Detection” by Joseph Redmon,
Santosh Divvala, Ross Girshick, and Ali Farhadi is a cornerstone of modern object detection and directly
relevant to this project. Introduced at CVPR 2016, the YOLO algorithm revolutionized object
detection by framing it as a single regression problem, eliminating the need for multi-stage
processing seen in earlier methods like R-CNN and Fast R-CNN. YOLO uses a unified
convolutional neural network to process an entire image in a single pass, predicting bounding
boxes and class probabilities simultaneously. This approach enables real-time detection,
achieving speeds of up to 45 frames per second, making it suitable for applications requiring
high throughput. The algorithm divides the image into a grid, with each cell responsible for
detecting objects and predicting their locations, classes, and confidence scores. By leveraging the
global context of the image, YOLO achieves competitive accuracy while being significantly
faster than traditional methods. This innovation laid the groundwork for subsequent YOLO
iterations, including YOLOv10 used in this project, further enhancing speed and precision for
real-time object detection tasks.
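
To make the grid formulation concrete, the following minimal Python sketch shows how the center of an object selects the responsible grid cell and how large the prediction tensor becomes. The values S = 7, B = 2, and C = 20 are the settings reported in the original YOLO paper for PASCAL VOC; the function names are hypothetical and do not belong to any YOLO library.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the grid cell containing the box center (cx, cy) in pixels."""
    col = min(int(cx / img_w * s), s - 1)  # clamp so a center on the right edge stays in range
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def output_shape(s=7, b=2, c=20):
    """Each cell predicts b boxes (x, y, w, h, confidence) plus c class probabilities."""
    return (s, s, b * 5 + c)


print(responsible_cell(320, 240, 640, 480))  # center of a 640x480 image
print(output_shape())
```

With these settings the network emits a 7 x 7 x 30 tensor in one forward pass, which is why no separate region proposal stage is needed.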

2. Juan Du, “Understanding of Object Detection Based on CNN Family and YOLO”, New Research
and Development Center of Hisense, Qingdao 266071, China : The paper “Understanding of Object
Detection Based on CNN Family and YOLO” by Juan Du provides a comprehensive comparison
of object detection frameworks, focusing on the evolution from traditional Convolutional Neural
Networks (CNNs) to advanced methods like R-CNN and YOLO. It highlights the strengths and
limitations of these approaches, emphasizing YOLO's efficiency and real-time capabilities.
Unlike R-CNN, which involves multiple stages of region proposal and classification, YOLO
simplifies object detection by treating it as a single regression problem, significantly reducing
computation time while maintaining accuracy. The paper underscores how YOLO's grid-based
detection and unified architecture address the inefficiencies of earlier methods, making it a
transformative solution for real-time applications. This broader analysis contextualizes YOLO’s
development within the CNN family, demonstrating its role as a milestone in improving
detection speed and performance.

3. “Rapid Object Detection using a Boosted Cascade of Simple Features” (2001) : In the early
stages of object detection, spanning the 1990s to the early 2000s, methods primarily relied on
handcrafted features and classical machine learning algorithms. A notable approach was the Haar
cascades method introduced by Viola and Jones in their seminal paper “Rapid Object Detection
using a Boosted Cascade of Simple Features” (2001). This method revolutionized face detection
with its ability to perform real-time detection using simple Haar-like features combined with
AdaBoost for feature selection. Haar cascades were computationally efficient and suitable for
real-time applications on limited hardware, making them a landmark in object detection.

However, these approaches heavily relied on feature engineering, requiring domain-specific
expertise to manually design features such as edges, textures, or shapes to distinguish objects.
While effective for specific, simple tasks, their reliance on fixed feature sets constrained their
adaptability to complex and diverse datasets. For instance, detecting objects in cluttered or varied
environments often resulted in poor accuracy, as handcrafted features struggled to capture the
richness and variability of real-world data. Despite these limitations, early methods like Haar
cascades laid the foundation for modern object detection by introducing the concept of cascaded
classifiers and paving the way for feature extraction methods later automated by deep learning.
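
As an illustration of why Haar-like features are cheap to evaluate, the sketch below computes a two-rectangle feature using an integral image (summed-area table), the data structure the Viola-Jones paper relies on for constant-time rectangle sums. It assumes NumPy is available; the helper names and the toy patch are hypothetical and chosen only for this example.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a zero row/column prepended for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)


# Toy 4x4 patch: bright left half, dark right half.
patch = np.array([[9, 9, 1, 1]] * 4)
ii = integral_image(patch)
print(two_rect_feature(ii, 0, 0, 4, 4))
```

A large response indicates a strong vertical intensity transition inside the window; AdaBoost then selects the few features of this kind that separate faces from background best.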

5. Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”,
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR), 2005. : The paper “Histograms of Oriented Gradients for Human
Detection” by Navneet Dalal and Bill Triggs introduced a robust feature descriptor, HOG, that
significantly advanced object detection, particularly for identifying humans in images. The
method focuses on capturing the distribution of intensity gradients and edge orientations, which
are critical for recognizing shapes and appearances. HOG divides an image into small cells,
calculates gradient orientations in each cell, and compiles this information into histograms.
These histograms are normalized across overlapping regions, enhancing invariance to lighting
and contrast changes.

By pairing HOG with a Support Vector Machine (SVM) classifier, Dalal and Triggs
demonstrated superior detection rates, especially on the INRIA Person Dataset, outperforming
earlier methods like Haar cascades. The descriptor's ability to extract detailed structural features
made it robust to partial occlusions and background clutter, addressing limitations in previous
feature-based methods. This work laid the foundation for modern feature extraction and
influenced the transition to automated, deep learning-based approaches like YOLO, which
further enhanced efficiency and accuracy.
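
The cell-level step of HOG can be sketched as follows. This is a simplified version under stated assumptions: it uses NumPy's `np.gradient` for the derivatives and hard bin assignment, whereas the original descriptor interpolates votes between neighboring bins. The 8x8 cell size and 9 unsigned orientation bins follow the configuration reported by Dalal and Triggs; the function names are hypothetical.

```python
import numpy as np


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    cell = cell.astype(float)
    gy, gx = np.gradient(cell)              # finite-difference gradients along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    return hist


def l2_normalize(block, eps=1e-6):
    """Normalize concatenated cell histograms of a block to reduce lighting effects."""
    return block / np.sqrt(np.sum(block ** 2) + eps ** 2)


# Toy 8x8 cell containing a vertical edge: gradients point horizontally (0 degrees).
cell = np.zeros((8, 8))
cell[:, 4:] = 255
hist = cell_histogram(cell)
print(hist.argmax())
```

In a full detector, the normalized histograms of all blocks in a detection window are concatenated into one feature vector and passed to the linear SVM.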

6. Krizhevsky, A., Sutskever, I., & Hinton, G. E., “ImageNet Classification with Deep
Convolutional Neural Networks”, Advances in Neural Information Processing Systems
(NeurIPS), 2012 : AlexNet revolutionized the field of computer vision by demonstrating the
power of deep learning with Convolutional Neural Networks (CNNs). This paper introduced a
deep CNN architecture that significantly outperformed traditional machine learning models on
the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The network, consisting of
five convolutional layers and three fully connected layers, employed techniques such as
Rectified Linear Units (ReLU) for non-linearity, dropout for regularization, and data
augmentation to reduce overfitting. The success of AlexNet proved the viability of deep learning
for large-scale image classification, setting the stage for its application to other computer
vision tasks, including object detection, and providing the foundation for modern approaches like
YOLO.

7. Girshick, R., Donahue, J., Darrell, T., & Malik, J., “Rich Feature Hierarchies for Accurate
Object Detection and Semantic Segmentation”, Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2014. : The Region-based Convolutional
Neural Network (R-CNN) introduced by Girshick et al. in 2014 significantly advanced object
detection by combining the power of CNNs with region proposal techniques. R-CNN first
generates candidate object regions using selective search, then extracts CNN-based features from
these regions for classification using a support vector machine (SVM). This two-stage pipeline
resulted in substantial improvements in detection accuracy over previous methods. However, R-
CNN was computationally expensive because it required running the CNN on each proposed
region independently, making it slow and less practical for real-time applications. Despite these
drawbacks, R-CNN set the foundation for subsequent improvements in object detection.

8. Girshick, R., “Fast R-CNN”, Proceedings of the IEEE International Conference on Computer
Vision (ICCV), 2015. : Fast R-CNN was an optimization of the original R-CNN, introduced by
Girshick in 2015. Unlike R-CNN, which extracted features from each region proposal separately,
Fast R-CNN extracted features from the entire image in a single pass through the CNN. It then
used a Region of Interest (RoI) pooling layer to crop the feature maps corresponding to each
proposed region, improving both speed and accuracy. Fast R-CNN also replaced the SVM
classifier with a softmax layer, simplifying the training process. This approach significantly
reduced computation time, making the model more suitable for practical applications, though it
still required external region proposal methods, such as selective search, which hindered its real-
time performance.
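
The RoI pooling step can be illustrated with a short sketch. It is a simplified, single-channel version written with NumPy, assuming the region is at least as large as the output grid in each dimension; the function name and the toy feature map are hypothetical and not taken from the Fast R-CNN code.

```python
import numpy as np


def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a 2-D feature map to a fixed out_size x out_size grid.

    roi is (top, left, bottom, right) in feature-map coordinates, end-exclusive.
    """
    top, left, bottom, right = roi
    region = feature_map[top:bottom, left:right]
    # Split rows and columns into out_size roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], out_size + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_size + 1).astype(int)
    pooled = np.empty((out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            pooled[i, j] = region[row_edges[i]:row_edges[i + 1],
                                  col_edges[j]:col_edges[j + 1]].max()
    return pooled


fmap = np.arange(36).reshape(6, 6)   # stand-in for one channel of a CNN feature map
print(roi_max_pool(fmap, (1, 1, 5, 5)))
```

Because every proposal is reduced to the same fixed grid, regions of any size can share one feature map and feed the same fully connected layers, which is where the speed-up over R-CNN comes from.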

9. Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan, “Object
Detection with Discriminatively Trained Part-Based Models”, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2010. : The paper “Object Detection with Discriminatively
Trained Part-Based Models” by Felzenszwalb et al., published in 2010, introduced a significant
advancement in object detection through the use of part-based models. This approach focused on
detecting objects by modeling them as a collection of parts, each of which could be detected
individually and then combined to form the full object. The authors proposed a discriminative
training approach, where the model was trained to distinguish between positive object instances
and negative background examples. The key innovation was the use of a deformable part model
(DPM), which allowed for variations in object appearance and pose while maintaining a robust
detection framework. By incorporating the spatial arrangement of parts and using efficient
algorithms for part-based detection, DPMs achieved higher accuracy and robustness compared to
earlier methods. While the methods from this paper are not directly used in this project, the
concepts introduced in DPMs, particularly the idea of part-based representation and
discriminative training, influenced later object detection methods, including those integrated in
YOLO. These developments laid the groundwork for more efficient and flexible models that can
detect objects with varied appearances and in different configurations.

10. AWS Documentation, “Amazon EC2, S3, and Elastic Beanstalk Services” : The AWS
documentation on “Amazon EC2, S3, and Elastic Beanstalk Services” provides essential
guidance for deploying scalable and efficient applications in the cloud, making it highly relevant
to the project. Amazon EC2 (Elastic Compute Cloud) offers resizable compute capacity, which
is crucial for hosting the YOLOv10-based object detection model and handling the
computational demands of real-time inference. Amazon S3 (Simple Storage Service) is used for
storing and retrieving large datasets, such as the model weights, input images, and processed
outputs, with high durability and availability. Elastic Beanstalk, a Platform-as-a-Service (PaaS),
simplifies the deployment and management of the Flask-based web application by automating
provisioning, load balancing, and scaling, thereby reducing the overhead of managing
infrastructure. Together, these services enable seamless deployment and scalability of the
project, ensuring robust performance even under varying workloads. The integration of these
AWS services not only supports the computational needs of the project but also ensures cost-
effective and secure hosting, vital for real-world applications.

11. Bochkovskiy, A., Wang, C., & Liao, H., “YOLOv4: Optimal Speed and Accuracy of Object Detection”,
arXiv preprint, 2020. The paper “YOLOv4: Optimal Speed and Accuracy of Object Detection” by
Bochkovskiy, Wang, and Liao (2020) introduced significant advancements to the YOLO (You Only Look
Once) series, making it a pivotal reference for modern object detection models like YOLOv10 used in the
project. YOLOv4 focused on enhancing both accuracy and efficiency, achieving a balance crucial for real-
time applications. The authors incorporated innovative techniques such as Cross-Stage Partial (CSP)
connections to reduce computational complexity and improve gradient flow, Mish activation for
smoother optimization, and mosaic data augmentation to diversify the training data without additional
annotation efforts. Additionally, YOLOv4 leveraged advancements in training strategies like SAT (Self-
Adversarial Training) and CIoU (Complete Intersection over Union) loss to improve object localization
and detection accuracy. These enhancements enabled YOLOv4 to deliver superior performance on
standard benchmarks while maintaining high inference speed, making it suitable for deployment in real-
world scenarios. The foundational concepts and architectural optimizations introduced in YOLOv4
directly influenced the iterative improvements leading to YOLOv10, which integrates these principles
alongside newer technologies such as attention mechanisms for even greater accuracy and versatility.
This lineage underscores YOLOv4's critical role in shaping the advancements employed in the project.
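
For reference, the CIoU loss mentioned above can be sketched in plain Python. This follows the commonly stated definition (1 minus IoU, plus a normalized center-distance term, plus a weighted aspect-ratio term) and assumes boxes in (x1, y1, x2, y2) format with positive width and height. It is an illustrative sketch, not the YOLOv4 training code.

```python
import math


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target):
    """CIoU loss = 1 - IoU + center-distance penalty + aspect-ratio penalty."""
    i = iou(pred, target)
    # Squared distance between box centers.
    rho2 = ((pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2) ** 2 + \
           ((pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(pred[2], target[2]) - min(pred[0], target[0])) ** 2 + \
         (max(pred[3], target[3]) - min(pred[1], target[1])) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan((target[2] - target[0]) / (target[3] - target[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))
    ) ** 2
    alpha = v / ((1 - i) + v) if v > 0 else 0.0
    return 1 - i + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU, the distance term still provides a useful signal when the predicted and target boxes do not overlap at all.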

12. Ultralytics, “YOLOv5 Documentation”, 2020. The “YOLOv5 Documentation” by Ultralytics (2020)
marks a significant milestone in the evolution of the YOLO object detection family, emphasizing
usability, efficiency, and deployment readiness. Although not a peer-reviewed paper, YOLOv5
introduced various practical innovations that directly influenced its adoption and subsequent versions
like YOLOv10, which is used in the project. YOLOv5 featured a lightweight and modular architecture,
enabling faster training and inference compared to its predecessors. The inclusion of techniques such as
adaptive anchor generation and auto-learning of input dimensions further streamlined the model’s
adaptability to diverse datasets and applications. YOLOv5 also enhanced the ease of use with its
PyTorch-based implementation, pre-trained weights, and compatibility with deployment tools like ONNX
and TensorRT. These advancements not only improved accuracy and speed but also simplified real-
world deployment, making YOLOv5 an industry favorite for object detection tasks. The focus on
efficiency and deployment aligns closely with the goals of the project, where real-time inference and
scalability are critical. YOLOv5's contributions laid the groundwork for subsequent iterations like
YOLOv10, which integrates these usability improvements alongside newer techniques such as
self-attention modules derived from transformer-based architectures.

13. Vaswani, A., et al., “Attention Is All You Need”, Advances in Neural Information Processing Systems
(NeurIPS), 2017 : The paper “Attention Is All You Need” by Vaswani et al., published in 2017, introduced
the transformer architecture, revolutionizing deep learning by shifting the focus from convolutional and
recurrent models to self-attention mechanisms. This groundbreaking work proposed a model that relied
entirely on attention mechanisms to process input sequences, eliminating the need for recurrence while
achieving state-of-the-art performance in tasks like machine translation. The core innovation, the self-
attention mechanism, allowed the model to weigh the importance of different parts of the input
dynamically, enabling it to capture long-range dependencies more effectively than previous
architectures. Transformers’ scalability and parallelization capabilities further enhanced computational
efficiency. While the paper is not directly about object detection, its principles have profoundly
influenced modern object detection architectures, including YOLOv10, as used in the project. In
YOLOv10, attention mechanisms derived from transformer-based models are employed to enhance
feature extraction and improve detection accuracy, particularly for objects in complex or cluttered
scenes. This integration of attention into object detection frameworks underscores the transformative
impact of Vaswani et al.'s work on a wide range of deep learning applications, including those
implemented in this project.
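
The self-attention operation described above can be written in a few lines. The sketch below implements the generic scaled dot-product attention from the paper, softmax(QK^T / sqrt(d_k))V, using NumPy. It is not the specific attention module used inside YOLOv10, and the token count and feature size are arbitrary values chosen for illustration.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays of shape (tokens, features)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # 4 tokens (or image patches), 8 features each
out, w = scaled_dot_product_attention(x, x, x)
print(out.shape, w.sum(axis=-1))
```

Each output row is a weighted mix of all input rows, which is how a position in the feature map can draw on distant context in a single step.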

Product Analysis and Design

Object detection is a fundamental task in computer vision with applications in security,
autonomous systems, healthcare, and more. The project aims to build a real-time object detection
system using the state-of-the-art YOLOv10 (You Only Look Once) model. The primary
challenge is deploying this system in a scalable and efficient manner, leveraging the power of
cloud computing through AWS. The goal is to create a web application where users can upload
images, and the system processes them to identify and localize objects, returning annotated
images and detailed detection information.

The solution involves integrating YOLOv10 with a Flask-based API, deploying it on an AWS
EC2 instance, and optimizing performance using tools like Gunicorn and CloudFront. Secure
communication is ensured using SSL certificates, and the user interface is designed with HTML,
CSS, and JavaScript for an interactive experience. This project focuses on delivering accuracy,
speed, and user accessibility while addressing challenges such as processing high-resolution
images and managing concurrent user requests.

3.1 Analysis

The project involves developing a real-time object detection system using the YOLOv10 (You
Only Look Once) model, deployed on Amazon Web Services (AWS). YOLOv10 is selected for
its fast inference capabilities, making it ideal for use cases requiring immediate processing of
images or video frames. The system is designed to detect and classify objects in images,
returning both the detected object labels and their locations as bounding boxes, all in real-time.
This system will be deployed as a web application to ensure accessibility from a variety of
devices, including desktops and smartphones.

The core objective of this project is to create a scalable and responsive application that can
handle incoming requests efficiently while maintaining high accuracy in object detection.
Scalability is a critical factor, as the system must support multiple concurrent users from
different geographical locations, ensuring low latency and fast response times. By deploying the
application on AWS EC2 instances, the system can leverage cloud computing resources,
enabling automatic scaling based on demand. AWS CloudFront further enhances performance by
distributing content globally, reducing the latency for users located far from the server.

One of the primary challenges is ensuring the system can effectively process high-resolution
images while maintaining real-time performance. YOLOv10 requires significant computational
power for inference, and to meet this demand, the application will be hosted on an EC2 instance,
possibly utilizing GPUs to speed up the processing time. However, as the model is deployed in a
cloud environment, managing computational costs and resource allocation becomes crucial.
Efficient use of AWS resources, combined with optimized code, will be key to providing real-
time object detection even with larger image sizes.

Another challenge lies in maintaining secure and reliable communication between the front-
end and back-end. Since the application handles image data, which could be sensitive depending
on the use case (e.g., security surveillance), ensuring encrypted data transmission is essential.
This will be achieved by using SSL certificates, which ensure secure HTTPS communication.
Moreover, the backend API will be designed using Flask, and all requests will be routed through
secure, production-ready servers such as Gunicorn.

The potential use cases for this application are diverse. In security surveillance, the system can
automatically detect intruders or monitor restricted areas, providing alerts based on detected
objects. In autonomous vehicles, the application can be used to identify pedestrians, other
vehicles, traffic signs, and obstacles, ensuring safety and compliance. Similarly, in industrial
monitoring, the system can help identify defective products on assembly lines or monitor
equipment status by detecting issues in real time.

Finally, the user experience is paramount. The system needs to provide fast, accurate feedback to
users, presenting the processed image and detection results in a user-friendly interface. The
front-end will be developed using HTML, CSS, and JavaScript, while AJAX will be used for
asynchronous communication between the user interface and the server. This ensures smooth
interaction without the need for full-page reloads. The success of this project will depend on
overcoming challenges related to performance, scalability, security, and efficient user
interaction.

3.2 Hardware Requirements

The hardware requirements for the object detection system can be divided into two categories:
server-side (for deployment on AWS EC2) and client-side (for the user interacting with the
application). These hardware requirements ensure that the system runs efficiently and provides
optimal performance for both processing and user interaction.

Server-Side (AWS EC2 Instance)

For the server-side deployment, the hardware specifications need to meet the computational
demands of running the YOLOv10 model in real time, particularly when handling high-
resolution images. The AWS EC2 instance will serve as the backbone of the application, hosting
both the model and the web API that interacts with users.

• Processor: The EC2 instance should be equipped with a quad-core processor, such as
an Intel Xeon or an equivalent CPU, to ensure sufficient computing power for processing
multiple requests concurrently. This is crucial for maintaining low-latency responses
when handling multiple user interactions simultaneously.
• Memory: A minimum of 8 GB of RAM is recommended. While the YOLOv10 model
itself does not require excessive memory, the additional load from handling concurrent
requests, serving the web interface, and processing images can benefit from ample RAM
to maintain smooth performance and avoid memory-related bottlenecks.
• GPU (Optional): While not mandatory, a GPU, such as the NVIDIA T4 or an
equivalent model, is highly recommended for faster inference times. YOLOv10 is a deep
learning model that benefits significantly from GPU acceleration, which can dramatically
reduce the time required to process each image. This is especially important for real-time
applications, where low latency is critical.
• Storage: A minimum of 20 GB of SSD storage is required to store the YOLOv10 model
weights, application logs, temporary files, and other system data. An SSD is preferred
over traditional HDD storage to ensure faster read and write speeds, which is essential for
maintaining quick access to the model and efficient data handling.

• Network: A high-speed internet connection is essential for handling the incoming
requests, particularly if the application serves multiple users concurrently. The server
must have sufficient bandwidth to receive large image uploads and return
processed results with minimal delay.

Client-Side

On the client side, the hardware requirements are focused more on ensuring that the user can
interact with the application through a web interface, without being burdened by performance
constraints.

• Device: The client-side device can be a laptop, desktop, or smartphone with sufficient
processing power to handle basic web browsing tasks. The device should have enough
resources to support modern web applications without experiencing significant lag during
image upload or interaction.
• Web Browser: The client needs to use a modern web browser (e.g., Chrome, Firefox,
Safari, or Edge) that supports HTML5, JavaScript, and AJAX. These technologies are
necessary to ensure the dynamic interaction between the user interface and the backend
server. AJAX allows for asynchronous communication with the server, enabling seamless
updates to the webpage without the need for full page reloads.
• Internet Connection: A stable internet connection is essential to ensure that users can
upload images and receive processed results without interruption. A slower internet
connection may result in longer upload and download times, affecting the overall user
experience, especially when working with high-resolution images.

3.3 Software Requirements

The software stack for the real-time object detection system using YOLOv10 and deployed on
AWS consists of a combination of operating systems, programming languages, frameworks,
libraries, and cloud services. This stack ensures that the application operates efficiently, is
secure, and delivers high-quality performance for real-time image processing and object
detection. Below is a detailed overview of the software requirements for both the server-side and
client-side of the project.

Operating System:

• Server-side: The backend application will run on Ubuntu 20.04 or later. Ubuntu is a
popular Linux distribution that is known for its stability, ease of use, and strong
community support, making it a great choice for server deployments. Ubuntu provides a
reliable environment for installing and running software such as Flask, Gunicorn, and
other dependencies.
• Client-side: The client-side can run on Windows, macOS, or Linux. These operating
systems provide the necessary platforms to access the web interface through modern web
browsers like Chrome, Firefox, Safari, or Edge. The client’s device does not require
specialized software beyond a browser, allowing users to interact with the object
detection system seamlessly.

Programming Languages:

• Python 3.8+: The backend of the system will be developed using Python, as it is widely
used in machine learning and web development. Python 3.8 or later will be used for
server-side programming, leveraging libraries such as OpenCV, NumPy, and the
Ultralytics YOLO API for object detection tasks.
• HTML, CSS, JavaScript: These web technologies will be used on the frontend to create
an interactive user interface. HTML provides the structure of the webpage, CSS handles
styling, and JavaScript is used to implement dynamic functionalities, such as updating
the page with detection results in real-time.

Frameworks and Libraries:

• Flask: The Flask web framework will be used to handle API requests on the server-side.
Flask is lightweight and flexible, making it a great choice for developing web
applications with RESTful APIs. It will serve as the backbone of the application,
enabling communication between the frontend (browser) and the backend (server-side
processing).
• Ultralytics YOLO API: The Ultralytics YOLO API will be used for object detection.
YOLOv10, implemented through this API, allows for real-time inference on images,
detecting objects and providing bounding box coordinates. The model will be pre-trained
and deployed, ready to process images sent by the client.
• OpenCV and NumPy: OpenCV (Open Source Computer Vision Library) is used for
image processing tasks such as decoding the base64-decoded image bytes into pixel
arrays, drawing bounding boxes around detected objects, and encoding the processed
images so they can be sent back as base64 for transmission. NumPy provides support for
handling and manipulating image data as arrays, enabling efficient processing.
• Gunicorn: Gunicorn (Green Unicorn) will serve as the production-ready WSGI server
for hosting the Flask application. It is a high-performance server used to handle
concurrent HTTP requests in a production environment, ensuring the application can
scale effectively under load.
• AJAX: AJAX (Asynchronous JavaScript and XML) will be used for frontend
communication with the server. AJAX allows the webpage to send requests to the server
and receive responses without refreshing the page. This is crucial for delivering a smooth,
real-time user experience where image detection results are updated dynamically without
reloading the entire page.
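
Gunicorn reads its settings from a Python file, so the server configuration can be sketched as below. The setting names `bind`, `workers`, and `timeout` are standard Gunicorn settings, but every value shown is an assumption for illustration rather than the configuration used in this project.

```python
# gunicorn.conf.py: illustrative values, not the project's actual configuration
import multiprocessing

bind = "127.0.0.1:8000"                           # assumed local address behind a reverse proxy
workers = multiprocessing.cpu_count() * 2 + 1     # common starting point suggested in the Gunicorn docs
timeout = 60                                      # seconds; assumed allowance for slow inference
```

Such a file would typically be used as `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and variable name for the Flask application. Each worker process usually loads its own copy of the model, so the worker count may need to be reduced on instances with limited memory.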

Cloud Services:

• AWS EC2: The application will be hosted on AWS EC2 (Elastic Compute Cloud)
instances. EC2 instances provide scalable computing power, which is crucial for running
the YOLOv10 model efficiently. EC2 enables dynamic scaling, so resources can be
adjusted based on traffic and computational needs, ensuring that the system can handle
varying loads.
• AWS CloudFront: CloudFront will be used for content delivery across global regions.
It is a Content Delivery Network (CDN) that caches content at edge locations, reducing
latency and providing faster delivery of images and results to users around the world.
CloudFront will also ensure the application remains responsive and accessible regardless
of user location.

Security Tools:

• SSL Certificate: To ensure secure communication between the client and the server, an
SSL certificate will be implemented, enabling HTTPS. HTTPS ensures that all data
transmitted between the client (browser) and server is encrypted, protecting sensitive
information such as images and detection results.
• Certbot: Certbot will be used to automate the installation and renewal of SSL
certificates. Certbot is an open-source tool that simplifies the process of obtaining a
trusted SSL certificate from a Certificate Authority (CA), ensuring the application
remains secure without manual intervention.

3.4 System Architecture Design

The system architecture for the real-time object detection application using YOLOv10 deployed
on AWS is designed for scalability, efficiency, and low-latency processing. The architecture is
composed of key components that interact with each other to enable seamless object detection
from the user’s device to the backend, providing quick and accurate results. Below is a detailed
description of each component in the system architecture.

1. User (Frontend)

The user is the entry point of the system. Through a modern web browser on their laptop,
desktop, or smartphone, the user captures an image or uploads a stream of images to the
application. The frontend captures image data, which is then sent to the backend for processing.
This interaction is handled by AJAX, allowing for seamless communication without the need to
refresh the entire page. The frontend displays the processed image along with the detected
objects in real-time.

Key Responsibilities:

• Capture image data from the user interface.
• Send image data to the backend using AJAX for processing.
• Display results, including detected objects and bounding boxes, on the user interface.

2. Flask Server (API)

The Flask server is the backbone of the application, handling incoming HTTP requests from the
frontend. It receives the image data, processes it by passing it to the YOLOv10 model, and
returns the results to the user. Flask serves as the API layer, ensuring communication between
the user’s browser and the object detection model. It also coordinates various aspects of the
system, such as error handling, data formatting, and ensuring that the correct response is sent to
the client.

Key Responsibilities:

• Handle API requests and responses.
• Integrate with the YOLOv10 model for inference.
• Return processed images and detection results (e.g., bounding boxes, class labels) to the
frontend.
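
The request-handling responsibilities listed above can be sketched without committing to a specific framework. The function below is a minimal illustration using only the Python standard library: the payload key `image`, the 10 MB limit, the status codes, and the stand-in `fake_detector` are all assumptions and not the project's actual API. In a Flask route, the payload would typically come from `request.get_json()` and the returned body would be passed to `jsonify`.

```python
import base64
import binascii

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed upload limit (10 MB)


def handle_detect(payload, detector):
    """Validate a JSON-like payload, run the detector, and return (body, status)."""
    encoded = payload.get("image") if isinstance(payload, dict) else None
    if not isinstance(encoded, str) or not encoded:
        return {"error": "missing 'image' field"}, 400
    try:
        image_bytes = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return {"error": "image is not valid base64"}, 400
    if len(image_bytes) > MAX_IMAGE_BYTES:
        return {"error": "image too large"}, 413
    try:
        detections = detector(image_bytes)
    except Exception:
        return {"error": "inference failed"}, 500
    return {"detections": detections}, 200


def fake_detector(image_bytes):
    """Stand-in for the YOLOv10 call; returns one hard-coded detection."""
    return [{"label": "person", "confidence": 0.91, "box": [10, 20, 110, 220]}]


request_body = {"image": base64.b64encode(b"not-a-real-image").decode("ascii")}
print(handle_detect(request_body, fake_detector))
print(handle_detect({}, fake_detector))
```

Keeping validation and error mapping in a plain function like this makes the API layer easy to test without starting a server or loading the model.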

3. YOLOv10 Model

The YOLOv10 model performs the core function of object detection. It analyzes the image sent
by the frontend, runs inference to detect objects, and provides outputs such as bounding box
coordinates, class labels, and confidence scores for each detected object. YOLOv10 (You Only
Look Once) is a state-of-the-art deep learning model designed for real-time object detection,
which enables fast processing and high accuracy in identifying multiple objects in a single
image.

Key Responsibilities:

 Perform object detection on the provided image.


 Return detection results, including bounding boxes and class labels.
 Generate the processed image with overlayed bounding boxes and labels.
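
The outputs listed above (bounding box, class label, confidence score) can be modeled as a
small record type. This is a sketch under stated assumptions: the Detection class, the helper
names, and the 0.5 default threshold are illustrative choices, not values taken from the
project or from the YOLOv10 library.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Detection:
    """One detected object; the box is given by its top-left and bottom-right corners."""
    label: str
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        """Box area in square pixels; a degenerate box has area 0."""
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def filter_by_confidence(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep detections at or above the threshold, most confident first."""
    kept = [d for d in detections if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


def to_json_ready(detections: List[Detection]) -> List[dict]:
    """Convert detections to plain dictionaries that json.dumps can serialize."""
    return [asdict(d) for d in detections]
```

Filtering on the confidence score before returning results is a common way to suppress weak
detections; a suitable threshold depends on the application.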

4. AWS EC2 Instance

Amazon EC2 (Elastic Compute Cloud) serves as the hosting platform for the Flask application.
The EC2 instance runs the Flask server, processes the incoming requests, and hosts the
environment needed to run the YOLOv10 model. EC2 provides resizable compute capacity, so the
instance type can be changed, or further instances added, as the volume of incoming requests
grows.

Key Responsibilities:

• Host the Flask application and YOLOv10 model.
• Handle the computational load of object detection inference.
• Provide scalable infrastructure to accommodate varying levels of traffic.

5. CloudFront CDN

Amazon CloudFront, a Content Delivery Network (CDN), improves the speed of the application by
serving content through edge locations distributed around the world. It caches frequently
accessed content, such as the frontend’s static files, and delivers it from the edge location
closest to the user, which reduces latency. Detection requests are unique to each uploaded
image and are therefore still processed by the EC2 instance, but CloudFront helps keep page
loads and asset delivery fast for users regardless of their location.

Key Responsibilities:

• Cache and deliver content globally to reduce latency.
• Provide faster access to images and results by serving content from nearby edge
locations.
• Improve scalability and availability of the application.
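
One way an origin server can tell a CDN what to cache is the Cache-Control response header.
The sketch below is an assumption about how this could be arranged, not a description of the
project's configuration: the /static/ prefix, the one-day lifetime, and the function name are
hypothetical.

```python
def cache_control_for(path: str) -> str:
    """Choose a Cache-Control header value for a response, based on the request path.

    Assumed layout: static frontend assets live under /static/, and every other
    path returns per-request data such as detection results.
    """
    if path.startswith("/static/"):
        # Static assets change rarely, so edge locations may keep them for a day.
        return "public, max-age=86400"
    # Detection responses depend on the uploaded image and should not be reused.
    return "no-store"
```

With a rule of this kind, repeated page loads are served from the edge while each detection
request still reaches the backend.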

6. Database/Storage (Optional for Logs)

Although not a mandatory part of the system, a database or storage service may be used for
logging and storing images or results. For instance, Amazon S3 could be used to store large
image files or detection logs for later analysis or review. This component is optional but
useful for monitoring system performance, tracking user interactions, and keeping historical
detection results.

Key Responsibilities:

• Store image files or logs for auditing purposes.
• Archive detection results for future analysis or troubleshooting.
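
As an illustration of what an archived detection log might look like, the sketch below builds
a date-partitioned object key and a JSON body for one processed image. The key layout
(logs/YYYY/MM/DD/...), the field names, and the function name are assumptions. The upload
itself is left out; with the boto3 S3 client, the pair could be passed to put_object as the
Key and Body arguments, together with a deployment-specific bucket name.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple


def build_log_record(image_name: str, detections: list,
                     when: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (object_key, json_body) describing one detection run.

    The key is partitioned by UTC date so that logs for a given day share a prefix.
    """
    when = when or datetime.now(timezone.utc)
    key = "logs/{:%Y/%m/%d}/{}.json".format(when, image_name)
    body = json.dumps(
        {"image": image_name, "timestamp": when.isoformat(), "detections": detections},
        sort_keys=True,
    )
    return key, body
```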

3.5 Data Flow Diagram

Data Flow Diagram (DFD) for Real-Time Object Detection System Using YOLOv10

The Data Flow Diagram provides a more detailed view of the steps involved in the real-time
object detection process, from image upload to detection results being returned to the user. It
consists of several key interactions between the components of the system, as follows:

1. User uploads image (frontend): The process starts with the user uploading an image
through the frontend interface. This is done by selecting or dragging an image file into
the web interface. The frontend is typically a web page running on a browser, allowing
the user to interact with the system. The image is prepared for transmission to the server.
2. AJAX sends image data to Flask API: Once the image is selected, the frontend uses
AJAX (Asynchronous JavaScript and XML) to send the image data to the Flask API
running on the backend server. Because the request is asynchronous, the page does not
reload during transmission, which gives the user a seamless, real-time experience. The
image is sent to the backend as base64-encoded data.
3. Flask processes image and runs YOLO inference: Upon receiving the image, the Flask
API processes the data and prepares it for object detection. The server decodes the base64
image data into a format that can be processed by the YOLOv10 model. The server then
uses the YOLOv10 model, which is a pre-trained deep learning object detection model, to
run inference on the image. The YOLO model performs object detection by identifying
objects in the image and drawing bounding boxes around them.
4. YOLO model detects objects and annotates the image: The YOLOv10 model analyzes
the image and detects various objects based on its training. Each detected object is
associated with a class label (e.g., car, person, dog) and a confidence score. The image is
annotated by drawing a bounding box around each detected object and labeling it with
its class name and confidence score. These annotations give the user visual confirmation
of what was detected and where.
5. Flask sends the processed image and detection details to the user: Once the YOLO
model has completed the object detection and image annotation, the Flask API sends the
processed image back to the frontend. Along with the image, detection details such as the
bounding boxes, object classes, and confidence scores are sent to the frontend. This data
is typically formatted as JSON and is used by the frontend to display the results in real
time. The annotated image is shown to the user, allowing them to view the detected
objects.
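
The base64 round trip in steps 2 and 3 can be sketched as follows. This is an illustration
under assumptions: the "image" field name and both function names are hypothetical, and the
optional "data:<mime>;base64," prefix is handled because browsers produce it when an image is
read as a data URL.

```python
import base64
import json


def encode_image_for_upload(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Build a JSON request body carrying the image as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": "data:{};base64,{}".format(mime, b64)})


def decode_uploaded_image(request_body: str) -> bytes:
    """Recover the raw image bytes from the JSON request body."""
    data = json.loads(request_body)["image"]
    if data.startswith("data:"):
        # Drop the "data:<mime>;base64," prefix and keep only the payload.
        data = data.split(",", 1)[1]
    return base64.b64decode(data)
```

Base64 represents every 3 bytes of input as 4 text characters, so the encoded image is about
one third larger than the original file. That overhead is the price of carrying binary data
inside a text format such as JSON.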

3.6 Use Case Diagram

A Use Case Diagram is a useful way to visually represent the interactions between system
components and users, showing how the system performs tasks from the user’s perspective. In
the context of a real-time object detection system using YOLOv10, the Use Case Diagram
defines the roles of various actors and the processes they interact with within the system. The
diagram primarily focuses on the user and the system’s functionality.

Actors:

1. User: The user is the primary actor in this system. They interact with the frontend
interface of the application. The user can upload images and view the results of object
detection. Their main responsibility is to provide input (an image) and receive output (the
processed image with detected objects).
2. System: The system, typically the backend portion of the application, includes the Flask
API, YOLOv10 model, and the server hosting the application. The system is responsible
for processing the image, running YOLOv10 for object detection, and sending back the
results to the user.

Use Cases:

1. Upload Image:
o Description: This use case represents the user's action of uploading an image to
the system. It is initiated by the user through the frontend interface, where they
select an image file. The image is sent to the backend using AJAX, ensuring that
the page does not reload, which facilitates a smooth and uninterrupted user
experience.
o Flow: The user selects the image, and it is transmitted as base64-encoded data to
the server, where it is prepared for further processing.
2. Run Object Detection:
o Description: After the image is uploaded, the backend (Flask server) receives the
image and initiates the object detection process using the YOLOv10 model. The
model performs inference on the uploaded image, identifying objects within it and
drawing bounding boxes around the detected items. The YOLO model returns the
detection results, including the class labels and confidence scores for each
detected object.
o Flow: The system processes the image and runs YOLOv10 inference, then
annotates the image with detection boxes. This process is crucial to transforming
the raw image into a meaningful output.
3. Display Detection Results:
o Description: This use case represents the final step of the process, where the
system sends the processed image back to the frontend along with the detection
details. The user can view the image with annotated bounding boxes, class labels,
and confidence scores. The system also sends any additional metadata related to
the detection for display purposes.
o Flow: The processed image, along with detection details (e.g., class names and
bounding box coordinates), is sent back to the user. The frontend interface
displays these results in real-time, allowing the user to view the annotated image
with all detected objects.

Relationships:

• The User interacts with the Upload Image use case to provide the image input. The
System is responsible for processing this input, running the YOLOv10 model for object
detection, and displaying the detection results back to the user.
• The System links the Run Object Detection and Display Detection Results use cases to
ensure that object detection and result display occur smoothly and efficiently.

Diagram Components:

• Actor (User): Interacts with the system by uploading images and viewing results.
• Use Case (Upload Image): Represents the process where the user uploads an image to
the system.
• Use Case (Run Object Detection): The system processes the uploaded image and
performs object detection using YOLOv10.
• Use Case (Display Detection Results): The system returns the processed image and
detection results to the user.
• System (Backend): The backend system processes the user’s input (image) and returns
detection outputs.

3.7 Sequence Diagram

A sequence diagram outlines the flow of interactions between the user, system components, and
the YOLOv10 model as a step-by-step sequence. It captures the dynamic behavior of the system,
showing the order in which components exchange messages to achieve object detection and
result delivery.

Actors and Components:

1. Actor (User):
o The user is the initiator of the object detection process. Their primary role is to
upload an image and view the results after detection.
o They interact with the frontend interface to initiate the sequence.
2. Frontend:
o The user interface captures the image uploaded by the user.
o It encodes the image into base64 format and sends the data asynchronously to the
backend using AJAX.
o Once the processed image and detection results are received from the backend,
the frontend decodes and displays them to the user.
3. Backend (Flask):
o Acts as the intermediary between the frontend and the YOLOv10 model.
o Receives the encoded image data from the frontend, decodes it into a usable
format, and passes it to the YOLO model for inference.
o Processes the detection results, annotates the image with bounding boxes, and
encodes the processed image for return to the frontend.
4. YOLOv10 Model:
o The core component that performs object detection.
o Processes the image provided by the backend, identifies objects, and returns
detection outputs such as class labels, confidence scores, and bounding box
coordinates.

Sequence of Interactions:

1. User Uploads an Image:
o The sequence begins when the user selects an image through the frontend
interface and initiates the upload.
2. Frontend Encodes and Sends Image:
o The frontend encodes the image in base64 format so that the binary image data can be
carried in a text-based request body.
o Using AJAX, the encoded image is sent to the backend Flask API without
refreshing the page, ensuring a smooth user experience.
3. Backend Decodes and Processes Image:
o The Flask API decodes the base64-encoded image data into its original format.
o It prepares the image for object detection by passing it to the YOLOv10 model.
4. YOLOv10 Model Runs Inference:
o The YOLO model processes the image to detect objects within it.
o It returns a list of detected objects, including details such as:
▪ Class labels of objects (e.g., "Person," "Car").
▪ Confidence scores for each detection.
▪ Coordinates of bounding boxes surrounding the detected objects.
5. Backend Annotates and Encodes Image:
o The backend uses the detection outputs to annotate the image with bounding
boxes, class labels, and confidence scores.
o It then encodes the processed image into base64 format to prepare it for
transmission back to the frontend.
6. Frontend Decodes and Displays Results:
o The frontend receives the processed image and detection details from the
backend.
o It decodes the base64-encoded image and displays the annotated image along with
detection metadata to the user.
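
The annotation in step 5 amounts to writing a rectangle outline into the image's pixel data.
A deployed system would normally use an image library for this. The sketch below uses a plain
nested list as a stand-in for a grayscale image so that it has no dependencies; the function
name and the (x1, y1, x2, y2) box convention are assumptions.

```python
from typing import List, Tuple


def draw_box(pixels: List[List[int]], box: Tuple[int, int, int, int], value: int = 255) -> None:
    """Draw a one-pixel rectangle outline in place.

    `pixels` is a row-major grid (pixels[y][x]); `box` is (x1, y1, x2, y2) with
    inclusive corners. Coordinates are clipped to the image, and a box that lies
    entirely outside the image is ignored.
    """
    height, width = len(pixels), len(pixels[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width - 1, x2)
    y1, y2 = max(0, y1), min(height - 1, y2)
    if x1 > x2 or y1 > y2:
        return
    for x in range(x1, x2 + 1):  # top and bottom edges
        pixels[y1][x] = value
        pixels[y2][x] = value
    for y in range(y1, y2 + 1):  # left and right edges
        pixels[y][x1] = value
        pixels[y][x2] = value
```

Calling draw_box once per detection, using the coordinates returned by the model, produces the
annotated image that is then encoded and sent back to the frontend.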
