PMRF Annual Review
Deep-Learning assisted Sports Video Analytics
By
Vipul Baghel
Department of Electrical Engineering
Indian Institute of Technology Gandhinagar
Gandhinagar, Gujarat, India
Thesis Advisor:
Dr. Ravi Sadananda Hegde
Roll No: 22350005 (Electrical Engineering)
PMRF ID: 1703280
Presentation Outline
❖ Journey
❖ Introduction
◆ Motion Understanding
◆ Background and Motivation
❖ Work Done
❖ Future Work
❖ References
Timeline
● Coursework (Sem 1 - Sem 9): Artificial Intelligence; Optimization in Machine Learning; Transformer and GNN; Writing; Nature Inspired Computing; Machine Learning; Digital Image Processing; Probabilistic Machine Learning; two HSS courses; Independent Project - Seminar
● Milestones: programme start (Aug 2022); Qualifiers-I (30 Oct - 4 Nov); Qualifiers-II (Jan); Proposal defense (May); Pre-synopsis seminar and Thesis submission
● Research thrusts: CV-based analytics using SOTA HPE with long-term tracking; fine-grained, highly dynamic motion analytics; design of a sports-specific S&S module; detection & classification of fine-grained sports-specific motion on wild videos
● Course credits completed: 43
● Thesis credits completed: 28
Deep-Learning assisted Motion Understanding
● Definition:
Motion understanding in computer vision refers to the process of analyzing, interpreting, and modeling dynamic changes in visual scenes over time, with a focus on extracting meaningful patterns of movement from sequences of images or videos. It aims to identify, localize, classify, and understand temporal events and actions performed by objects—particularly humans—in a scene.
This process involves multiple sub-tasks such as:
● Motion Estimation: e.g., optical flow, trajectory prediction.
● Action Recognition: identifying specific human actions.
● Temporal Segmentation: detecting the start and end points of motion events.
● Pose Estimation & Tracking: localizing body joints over time.
● Motion Pattern Modeling: learning spatio-temporal representations of dynamic behavior.
Temporal Action Localization (TAL)
Work Done
Objective 1
● Overview[1]:
● A novel, well-annotated dataset is collected from 20 YouTube boxing sparring/practice videos featuring 18 athletes (11 male, 7 female), with a total duration of approximately 4 hours.
○ The dataset includes key body joints, punch start/end times, and labels for six fine-grained punch categories (Cross, Jab, Lead Hook, Lead Uppercut, Rear Hook, and Rear Uppercut), capturing diverse techniques.
● Overall, there are 6915 detected, demarcated, and labeled boxing action subclips in the dataset, with an average duration of 1 second.
Work Done
Objective 1
● Continued:
● A hierarchical framework is proposed for punch detection, demarcation, extraction, and classification from raw combat-sports videos. The two-step approach first detects boundaries and extracts fixed-length subclips, then classifies the punches into six categories.
● The task of combat-activity detection is framed as a regression problem rather than a classification problem: each rolling window is scored to help identify subclip frame boundaries.
● The utility of the proposed methodology is demonstrated as a home-training tool for boxers using cost-effective, consumer-grade video capture.
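The rolling-window regression framing above can be sketched as follows. This is a minimal illustration, not the thesis pipeline: the learned window regressor is replaced by a simple motion-energy proxy over per-frame pose features, and the window size, stride, and threshold are all illustrative.

```python
import numpy as np

def score_windows(features, window=16, stride=4):
    """Score each rolling window over per-frame pose features.
    The learned regressor from the paper is replaced here by a
    simple motion-energy proxy: the mean frame-to-frame feature
    change inside the window."""
    scores = []
    for start in range(0, len(features) - window + 1, stride):
        chunk = features[start:start + window]
        diffs = np.linalg.norm(np.diff(chunk, axis=0), axis=1)
        scores.append((start, float(diffs.mean())))
    return scores

def pick_boundaries(scores, threshold):
    """Window starts whose score crosses the threshold become
    candidate subclip boundaries for the downstream classifier."""
    return [start for start, s in scores if s >= threshold]

# Toy sequence: 100 frames of 34-D pose features, quiet except for a
# burst of motion (a simulated punch) between frames 40 and 60.
rng = np.random.default_rng(0)
feats = np.zeros((100, 34))
feats[40:60] = np.cumsum(rng.normal(0.0, 1.0, (20, 34)), axis=0)
scores = score_windows(feats)
boundaries = pick_boundaries(scores, threshold=1.0)
```

On this toy input, only windows overlapping the motion burst score above the threshold, so the surviving window starts bracket the simulated punch.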
Work Done
Objective 1
● Results:
Classification performance of the proposed pipeline on the testing dataset
Use case: punch classes across multiple subjects
Work Done
Objective 2
● Overview[2]:
● Unsupervised Skeleton-Based Learning Framework:
We introduce an Attention-based Spatio-Temporal Graph Convolutional Network (ASTGCN) encoder, pretrained via blockwise pose-sequence learning. This approach captures blockwise motion dynamics, and scene transitions are identified from changes in the curvature of the motion-dynamics sequence, introduced as the Action Dynamics Sequence (ADM).
● Empirical Validation:
Our model achieves performance comparable to supervised methods on the DSV Diving dataset. Additionally, we demonstrate the generalization capability of our approach on out-of-distribution, in-the-wild diving videos.
● Theoretical and Visual Interpretability:
We provide a graphical representation of the learned embeddings as a measure of pose dynamics and transitions. Furthermore, we provide an analytical proof demonstrating that inflection points correspond to action-transition states.
Work Done
Objective 2
● Results:
Comparison of our model with DiveNet in detecting pose transitions.
Performance comparison with a varying number of ASTGCN blocks
Performance comparison with varying Chebyshev filter sizes
Work Done
Objective 2
● ADM Interpretation:
Work Done
Objective 3
● Overview[3]:
● A completely unsupervised, graph-spectrum-based temporal action localization method. It includes pre-training of the ASTGCN model with reconstruction as a pretext task; spectral clustering is then performed to obtain the micro-level segmentations.
● Our approach is evaluated on the validation split of the largest available 3D pose-sequence dataset with frame-level annotations, i.e., the BABEL dataset. We achieved a mean average precision (mAP) almost 25 points higher (with zero fine-tuning) than other unsupervised works that use fine-tuning.
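The spectral-clustering step can be sketched in plain numpy. This is a bare-bones illustration only: the actual pipeline clusters ASTGCN-encoded embeddings, whereas here the RBF affinity, the tiny k-means loop, and the toy two-regime embeddings are all assumptions.

```python
import numpy as np

def spectral_segments(emb, k, sigma=1.0):
    """Cluster per-frame embeddings with bare-bones spectral
    clustering: RBF affinity -> normalized Laplacian -> k smallest
    eigenvectors -> tiny k-means with deterministic farthest-point
    initialization. Frames sharing a cluster form one micro-segment."""
    d2 = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))           # RBF affinity
    deg = W.sum(1)
    L = np.eye(len(W)) - W / np.sqrt(deg[:, None] * deg[None, :])
    _, vecs = np.linalg.eigh(L)                    # ascending eigenvalues
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    idx = [0]                                      # farthest-point init
    for _ in range(1, k):
        dmin = ((U[:, None] - U[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(np.argmax(dmin)))
    centers = U[idx].copy()
    for _ in range(20):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(0)
    return labels

# Toy embeddings: two well-separated motion regimes of 30 frames each.
rng = np.random.default_rng(1)
emb = np.concatenate([rng.normal(0.0, 0.1, (30, 8)),
                      rng.normal(3.0, 0.1, (30, 8))])
labels = spectral_segments(emb, k=2)
```

The two regimes fall cleanly into two clusters, i.e., two micro-level segments with a transition at the regime boundary.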
Work Done
Objective 3
● Continued:
● The optimal number of distinct actions that could be present in a pose sequence is determined using an ensemble estimator built from three cluster-quality score methods.
● We perform ablation studies over various backbone and clustering methods. Furthermore, we provide an analytical proof that the spectral clusters derived from the low-dimensional, reduced ASTGCN-encoded embeddings are segments belonging to the same pose dynamics; hence, the method yields pose transitions on the basis of clusters.
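The slide does not name the three cluster-quality scores, so the sketch below uses silhouette, Calinski-Harabasz, and Davies-Bouldin as plausible stand-ins, each voting for its preferred cluster count; the majority vote is the ensemble estimate. All of it is illustrative, down to the toy three-cluster data.

```python
import numpy as np

def kmeans(X, k, iters=30):
    """Plain k-means with deterministic farthest-point initialization."""
    idx = [0]
    for _ in range(1, k):
        dmin = ((X[:, None] - X[idx][None]) ** 2).sum(-1).min(1)
        idx.append(int(np.argmax(dmin)))
    centers = X[idx].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers

def calinski_harabasz(X, labels, centers):
    n, k = len(X), len(centers)
    mean = X.mean(0)
    between = sum((labels == j).sum() * ((centers[j] - mean) ** 2).sum()
                  for j in range(k))
    within = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(X, labels, centers):
    k = len(centers)
    s = [np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
         for j in range(k)]
    worst = [max((s[i] + s[j]) / np.linalg.norm(centers[i] - centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))

def silhouette(X, labels):
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)
    ks = np.unique(labels)
    vals = []
    for i in range(len(X)):
        own = labels == labels[i]
        a = D[i, own].sum() / max(own.sum() - 1, 1)
        b = min(D[i, labels == c].mean() for c in ks if c != labels[i])
        vals.append((b - a) / max(a, b))
    return float(np.mean(vals))

def estimate_k(X, k_range=range(2, 7)):
    """Each score votes for its preferred k; the ensemble answer is the
    majority vote (ties resolved arbitrarily in this sketch)."""
    ch, db, sil = {}, {}, {}
    for k in k_range:
        labels, centers = kmeans(X, k)
        ch[k] = calinski_harabasz(X, labels, centers)
        db[k] = davies_bouldin(X, labels, centers)
        sil[k] = silhouette(X, labels)
    votes = [max(ch, key=ch.get), min(db, key=db.get), max(sil, key=sil.get)]
    return max(set(votes), key=votes.count), votes

# Toy data: three well-separated action clusters of 40 points each.
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(c, 0.2, (40, 2))
                    for c in ((0, 0), (5, 0), (0, 5))])
k_hat, votes = estimate_k(X)
```

On well-separated data all three scores agree, so the vote is unanimous; the ensemble matters precisely when the individual scores disagree on noisier embeddings.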
Work Done
Objective 3
● Results:
Detection mAP@IoU (%) across thresholds from 0.1 to 0.5 on the BABEL dataset
Next Work
Objective
● Tasks:
● 2D-domain extension of the TAL problem
● Fine-grained skeleton-based pseudo-labelling
● Dataset Creation:
○ Fine-grained action detection
■ Laboratory data collection using 3D camera
■ Apply bootstrapping to label YouTube videos
● Physics Prior:
● Human Oriented Transformation
● View-Invariant Transformation
● Rotational-Invariant Transformation
● Relative Coordinate Conversion
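The listed physics-prior transformations could be prototyped along these lines for 2-D skeletons. The joint indices and the hip-alignment convention are assumptions for illustration, not the thesis's actual design.

```python
import numpy as np

def to_relative(joints, root=0):
    """Relative-coordinate conversion: express every joint relative to a
    root joint so the sequence becomes translation-invariant.
    `joints` has shape (frames, joints, 2)."""
    return joints - joints[:, root:root + 1, :]

def align_hips(joints, left_hip=1, right_hip=2):
    """Rotate each frame so the left-hip -> right-hip line lies on the
    x-axis: a simple stand-in for a view/rotation-invariant transform."""
    out = np.empty_like(joints)
    for t, frame in enumerate(joints):
        v = frame[right_hip] - frame[left_hip]
        theta = np.arctan2(v[1], v[0])
        c, s = np.cos(-theta), np.sin(-theta)
        R = np.array([[c, -s], [s, c]])    # rotation by -theta
        out[t] = frame @ R.T
    return out

# One toy frame: root, left hip, right hip, head (indices are assumptions).
frame = np.array([[0.0, 0.0], [-0.5, 0.1], [0.5, -0.1], [0.0, 1.0]])
seq = np.stack([frame])
aligned = align_hips(to_relative(seq))
```

After the two transforms the root sits at the origin and the hip line is horizontal, so the same pose captured from a shifted or rotated camera maps to the same coordinates.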
References
[1] Rahul Kumar†, Vipul Baghel†, Sudhanshu Singh, Shivam Yadav, Babji Srinivasan, and Ravi Hegde, "Real-Time Combat Training Analytics: Skeleton-based Temporal Action Localization in Unstructured Video," 2025 10th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG). (Under Review)
[2] Bikash Kumar Badatya†, Vipul Baghel†, and Ravi Hegde, "Precise Motion Transitions Detection in Untrimmed Sports Videos Using Spatio-Temporal Graph Embeddings," 2025 IEEE International Conference on Image Processing (ICIP). (Under Review)
[3] Vipul Baghel and Ravi Hegde, "Label-Free Temporal Action Localization based on Spectral Analysis of Graph-Encoded Motion Embeddings," IEEE Transactions on Pattern Analysis and Machine Intelligence. (Submitted)
[4] P. Wang, F. Zeng, and Y. Qian, "A survey on deep learning-based spatio-temporal action detection," International Journal of Wavelets, Multiresolution and Information Processing, 2024.
[5] E. Vahdani and Y. Tian, "Deep learning-based action detection in untrimmed videos: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[6] B. Wang, Y. Zhao, L. Yang, T. Long, and X. Li, "Temporal action localization in the deep learning era: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[7] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, "Attention based spatial-temporal graph convolutional networks for traffic flow forecasting," Proceedings of the AAAI Conference on Artificial Intelligence, 2019.