Crime Detection DL Model ConvLSTM2D Analysis and Results
Learning
Hunaina Ehsan
School of Electrical Engineering and Computer Sciences
NUST, H-12
Islamabad, Pakistan
[email protected]

Abdul Rafay Khan
School of Electrical Engineering and Computer Sciences
NUST, H-12
Islamabad, Pakistan
[email protected]
Abstract—This project explores the use of deep learning techniques for the detection and classification of crimes from video footage. We developed two models: a binary classifier to detect the presence of criminal activities and a multiclass classifier to identify specific types of crimes, including explosion, assault, abuse, arson, and fighting, alongside normal activities. The binary classifier, trained with binary cross-entropy loss and the Adam optimizer, achieved a test accuracy of 62.5%. The multiclass classifier, using categorical cross-entropy loss, attained a high training accuracy of 98% but only 48% validation accuracy and 58% test accuracy, indicating overfitting. Both models used a sequence length of 5 frames and normalized video frames to manage computational constraints. Despite the data and processing limitations, this project demonstrates the potential of integrating ConvLSTM2D networks for effective crime detection and classification. The inference time of 0.79 seconds per video suggests the feasibility of real-time deployment, though further refinement and larger datasets are required for improved robustness and accuracy.

I. INTRODUCTION

The rise in surveillance infrastructure across public and private spaces presents an opportunity to leverage technology for enhanced public safety. Traditional surveillance methods, which rely heavily on human monitoring, are not only labor-intensive but also prone to oversight due to fatigue and the sheer volume of data. The integration of deep learning in surveillance systems can automate the detection of criminal activities, providing timely alerts and aiding law enforcement agencies in rapid response.

Motivated by the critical need for efficient crime detection systems, this project aims to harness the power of deep learning to develop models capable of detecting and classifying criminal activities from video footage. By addressing both the detection of crime presence (binary classification) and the identification of specific types of crimes (multiclass classification), we aim to provide a comprehensive solution that enhances the capabilities of existing surveillance systems.

The novelty of our approach lies in the use of ConvLSTM networks, which combine the spatial feature extraction capabilities of convolutional neural networks (CNNs) with the temporal sequence learning of long short-term memory (LSTM) networks. This combination allows for the effective analysis of video data, capturing both frame-level details and temporal dependencies. Our project not only demonstrates the feasibility of such an approach but also highlights the challenges associated with real-world data constraints, emphasizing the need for further research and development in this area.

Through rigorous experimentation, we evaluate the performance of our models and provide insights into their strengths and limitations. Our findings contribute to the growing body of research in automated surveillance and pave the way for future advancements in crime detection technologies.

II. NOVELTY OF OUR APPROACH

Our approach to crime detection and classification in video surveillance leverages the capabilities of Convolutional Long Short-Term Memory (ConvLSTM) networks to address the limitations of existing solutions, such as the SlowFast Networks. While SlowFast Networks represent a state-of-the-art method in video action recognition, they come with significant drawbacks in terms of resource requirements and complexity. Our approach offers several novel aspects that mitigate these issues:

A. Resource Efficiency:

SlowFast Networks employ a dual-pathway architecture to capture both slow and fast dynamics in video frames, which requires substantial computational resources and memory. This can be a significant barrier to deployment in resource-constrained environments. In contrast, our ConvLSTM-based model processes video sequences efficiently by integrating spatial and temporal features within a single pathway. This reduces the computational load and makes our approach suitable for real-time applications on standard hardware, such as embedded systems and mobile devices.

B. Simplified Architecture:

The complexity of training SlowFast Networks stems from the need to balance and optimize two separate pathways with different frame rates. This not only increases the training time but also demands extensive hyperparameter tuning. Our ConvLSTM model simplifies this by using a unified architecture that processes sequences of frames in a coherent manner, reducing the overall complexity of the model. This simplification facilitates quicker training and easier deployment without compromising performance.

C. Temporal Memory:

ConvLSTM networks excel at capturing temporal dependencies within video data, maintaining a memory of sequential events. This is crucial for understanding the
D. Multiclass Classification

Objective: Classify the type of crime into one of six categories.
• Arson
• Assault
• Abuse
• Explosion
• Fighting
• Normal

Model Architecture:
• 2 ConvLSTM2D layers with ReLU activation.
• MaxPooling3D layers to reduce dimensionality.
• Dropout layers for regularization.
• Flatten layer followed by a Dense layer with a softmax activation function.
• Optimizer: Adam
• Loss Function: Categorical Crossentropy
• Metrics: Accuracy

Training Setup:
• Batch Size: 31
• Epochs: 10
• Validation Split: 10%

Results:
• Accuracy plot analysis

• Normal: The model shows high precision and recall for the 'Normal' class.
• Crime Classes: The precision and recall for crime classes such as abuse, arson, assault, explosion, and fighting are lower, indicating difficulty in distinguishing between these classes accurately.

The analysis suggests that the 50 videos available for each crime class were insufficient compared with the 150 videos of normal activity. More data per crime class is needed to reduce overfitting and improve results.

E. Performance Observations

• Overfitting: The high training accuracy and significantly lower validation/test accuracy in the multiclass classification indicate overfitting. The model performs well on training data but struggles to generalize to unseen data.
• Resource Constraints: Due to RAM limitations, the sequence length was fixed at 5 frames. More extensive architectures or dynamic sequence lengths led to crashes.
• Model Complexity: The binary classification model used more ConvLSTM2D layers (5) than the multiclass model (2) because of resource constraints.
• Inference Time: The inference time of 0.79 seconds per video suggests that real-time processing is feasible on the given hardware setup.
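
The fixed sequence length of 5 frames and the frame normalization mentioned above can be illustrated with a short sketch. The text does not state how the five frames were chosen from each video or which normalization was applied, so the even spacing and the division by 255 below are assumptions, and the function names `sample_frame_indices` and `normalize_frame` are hypothetical.

```python
def sample_frame_indices(num_frames: int, seq_len: int = 5) -> list[int]:
    """Pick seq_len evenly spaced frame indices from a clip (assumed strategy)."""
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if seq_len == 1:
        return [0]
    # Integer arithmetic keeps the first and last frames and spreads the
    # rest evenly; clips shorter than seq_len simply repeat some frames.
    return [i * (num_frames - 1) // (seq_len - 1) for i in range(seq_len)]


def normalize_frame(frame: list[list[int]]) -> list[list[float]]:
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0] (assumed scheme)."""
    return [[pixel / 255.0 for pixel in row] for row in frame]


if __name__ == "__main__":
    print(sample_frame_indices(100))          # indices for a 100-frame clip
    print(normalize_frame([[0, 255], [51, 102]]))
```

Keeping the sequence length constant means every clip yields an input of the same shape, which bounds memory use per batch regardless of the original video length.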
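
The class imbalance reported in Section D (50 videos per crime class against 150 normal videos) can be quantified with inverse-frequency class weights. This is not something the reported experiments used; it is a common heuristic shown here only to make the imbalance concrete. The video counts come from the text, while the function name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by total / (n_classes * class_count).

    A perfectly balanced dataset would give every class a weight of 1.0;
    under-represented classes receive weights above 1.0.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Video counts as stated in the text: 50 per crime class, 150 normal.
video_counts = {
    "abuse": 50,
    "arson": 50,
    "assault": 50,
    "explosion": 50,
    "fighting": 50,
    "normal": 150,
}

weights = balanced_class_weights(video_counts)

if __name__ == "__main__":
    for name, weight in weights.items():
        print(f"{name}: {weight:.3f}")
```

Under this heuristic each crime class is weighted three times as heavily as the normal class, mirroring the 1:3 ratio of available videos. Weighting would not replace the additional data the analysis calls for.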