Crime Detection DL Model ConvLSTM2D Analysis and Results

A report on a ConvLSTM2D deep learning model that detects five classes of crime (assault, abuse, explosion, fighting, and arson) as well as normal videos.

Uploaded by

neelathotacuso4
Copyright
© All Rights Reserved

Crime Detection and Classification using Deep Learning

Hunaina Ehsan
School of Electrical Engineering and Computer Sciences
NUST, H-12
Islamabad, Pakistan
[email protected]

Abdul Rafay Khan
School of Electrical Engineering and Computer Sciences
NUST, H-12
Islamabad, Pakistan
[email protected]

Abstract—This project explores the use of deep learning techniques for the detection and classification of crimes from video footage. We developed two models: a binary classifier to detect the presence of criminal activities and a multiclass classifier to identify specific types of crimes, including explosion, assault, abuse, arson, and fighting, alongside normal activities. The binary classifier, trained with binary cross-entropy loss and the Adam optimizer, achieved a test accuracy of 62.5%. The multiclass classifier, using categorical cross-entropy loss, attained a high training accuracy of 98% but only 48% validation accuracy and 58% test accuracy, indicating overfitting. Both models used a sequence length of 5 frames and normalized video frames to manage computational constraints. Despite the data and processing limitations, this project demonstrates the potential of integrating ConvLSTM2D networks for effective crime detection and classification. The inference time of 0.79 seconds per video suggests the feasibility of real-time deployment, though further refinement and larger datasets are required for improved robustness and accuracy.

I. INTRODUCTION

The rise in surveillance infrastructure across public and private spaces presents an opportunity to leverage technology for enhanced public safety. Traditional surveillance methods, which rely heavily on human monitoring, are not only labor-intensive but also prone to oversight due to fatigue and the sheer volume of data. The integration of deep learning in surveillance systems can automate the detection of criminal activities, providing timely alerts and aiding law enforcement agencies in rapid response.

Motivated by the critical need for efficient crime detection systems, this project aims to harness the power of deep learning to develop models capable of detecting and classifying criminal activities from video footage. By addressing both the detection of crime presence (binary classification) and the identification of specific types of crimes (multiclass classification), we aim to provide a comprehensive solution that enhances the capabilities of existing surveillance systems.

The novelty of our approach lies in the use of ConvLSTM networks, which combine the spatial feature extraction capabilities of convolutional neural networks (CNNs) with the temporal sequence learning of long short-term memory (LSTM) networks. This combination allows for the effective analysis of video data, capturing both frame-level details and temporal dependencies. Our project not only demonstrates the feasibility of such an approach but also highlights the challenges associated with real-world data constraints, emphasizing the need for further research and development in this area.

Through rigorous experimentation, we evaluate the performance of our models and provide insights into their strengths and limitations. Our findings contribute to the growing body of research in automated surveillance and pave the way for future advancements in crime detection technologies.

II. NOVELTY OF OUR APPROACH

Our approach to crime detection and classification in video surveillance leverages the capabilities of Convolutional Long Short-Term Memory (ConvLSTM) networks to address the limitations of existing solutions, such as the SlowFast Networks. While SlowFast Networks represent a state-of-the-art method in video action recognition, they come with significant drawbacks in terms of resource requirements and complexity. Our approach offers several novel aspects that mitigate these issues:

A. Resource Efficiency:

SlowFast Networks employ a dual-pathway architecture to capture both slow and fast dynamics in video frames, which requires substantial computational resources and memory. This can be a significant barrier to deployment in resource-constrained environments. In contrast, our ConvLSTM-based model processes video sequences efficiently by integrating spatial and temporal features within a single pathway. This reduces the computational load and makes our approach suitable for real-time applications on standard hardware, such as embedded systems and mobile devices.

B. Simplified Architecture:

The complexity of training SlowFast Networks stems from the need to balance and optimize two separate pathways with different frame rates. This not only increases the training time but also demands extensive hyperparameter tuning. Our ConvLSTM model simplifies this by using a unified architecture that processes sequences of frames in a coherent manner, reducing the overall complexity of the model. This simplification facilitates quicker training and easier deployment without compromising performance.

C. Temporal Memory:

ConvLSTM networks excel at capturing temporal dependencies within video data, maintaining a memory of sequential events. This is crucial for understanding the
context of actions over time, which is essential for accurate crime detection and classification. While SlowFast Networks capture both fast and slow motions, they do not inherently possess the temporal memory capabilities of LSTMs. Our approach ensures that the model retains information about past frames, improving the detection of subtle and prolonged anomalous behaviors that are characteristic of crimes.

D. Versatility and Adaptability:

Our model is designed to handle both binary classification (detecting the presence of any crime) and multiclass classification (identifying specific types of crimes such as explosion, assault, abuse, arson, fighting, and normal activities). This dual functionality enhances the system's versatility, making it applicable to a broader range of surveillance scenarios. Additionally, the model's design allows for easy adjustments to sequence length and frame preprocessing, enabling it to be adapted to various hardware constraints and specific use cases.

E. Regularization Techniques:

To combat overfitting and ensure better generalization, our approach incorporates dropout regularization and early stopping. These techniques are crucial for maintaining model performance across different datasets and preventing the model from becoming too tailored to the training data. While SlowFast Networks can also use these techniques, the inherent complexity and dual-pathway nature make them less straightforward to implement effectively.

F. Scalability:

Given its resource-efficient design, our ConvLSTM-based approach is scalable across different hardware platforms. This scalability ensures that the model can be deployed in diverse environments, from high-performance computing systems to more constrained devices like surveillance cameras and portable security systems. SlowFast Networks, on the other hand, are more challenging to scale due to their intensive computational requirements.

III. DETAILED EXPERIMENTAL SETUP AND THOROUGH ANALYSIS OF RESULTS

A. Dataset

The dataset used for our experiments is the Crime UCF Dataset, available on Kaggle (https://www.kaggle.com/datasets/mission-ai/crimeucfdataset). It consists of 35GB of video data, categorized into normal and various types of crime activities such as explosion, assault, abuse, arson, and fighting.

B. Experimental Setup

Hardware and Platform:
• Platform: Kaggle Notebooks
• Disk Space: 73.1GB (Max)
• RAM: 30GB (Max)
• CPU
• GPU: No accelerator

Data Preprocessing:
• Frame Extraction: Each video was divided into frames.
• Normalization: Frames were normalized by dividing pixel values by 255.
• Sequence Length: A fixed sequence length of 5 frames was used due to RAM constraints. Dynamic sequence length adjustments were attempted but led to crashes on Kaggle.

C. Binary Classification

Objective: Detect whether a crime is happening or not.

Model Architecture:
• 5 ConvLSTM2D layers with ReLU activation.
• MaxPooling3D layers to reduce dimensionality.
• Dropout layers for regularization.
• Flatten layer followed by a Dense layer with a sigmoid activation function.
• Optimizer: Adam
• Loss Function: Binary Crossentropy
• Metrics: Accuracy
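The architecture listed above could be assembled in Keras roughly as follows. This is a minimal sketch, not the authors' code: the filter counts, kernel size, pooling size, dropout rate, and 64×64 RGB frame size are assumptions chosen for illustration. Only the layer types, the ReLU and sigmoid activations, the 5-frame sequence length, and the Adam, binary cross-entropy, and accuracy settings come from the description.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_binary_convlstm(seq_len=5, height=64, width=64, channels=3,
                          filters=(4, 8, 8, 16, 16), dropout_rate=0.2):
    """Stack of ConvLSTM2D blocks ending in a single sigmoid unit.

    filters, dropout_rate and the frame size are illustrative assumptions.
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(seq_len, height, width, channels)))
    for n_filters in filters:  # one ConvLSTM2D block per entry (five by default)
        model.add(layers.ConvLSTM2D(n_filters, kernel_size=(3, 3),
                                    activation="relu", padding="same",
                                    return_sequences=True))
        # Pool only the spatial axes so the time axis keeps its length.
        model.add(layers.MaxPooling3D(pool_size=(1, 2, 2), padding="same"))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))  # crime vs. no crime
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Keeping `return_sequences=True` in every block is one possible design choice; it lets the final Flatten layer see features from all five frames rather than only the last time step.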
Training Setup:
• Batch Size: 31
• Epochs: 20
• Validation Split: 10%
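To make the training setup concrete, the sketch below normalizes frames by 255, groups them into fixed 5-frame sequences, holds out 10% of the samples for validation, and counts batches of 31. It is a hedged illustration only: the helper names, the non-overlapping grouping of frames, and the choice to hold out the last 10% of samples (the convention Keras follows for `validation_split`) are assumptions, since the report does not say how the sequences or the split were built.

```python
import math

import numpy as np


def frames_to_sequences(frames, seq_len=5):
    """Scale uint8 frames to [0, 1] and group them into seq_len-frame clips.

    frames: array of shape (num_frames, H, W, C). Trailing frames that do
    not fill a whole clip are dropped (an assumption of this sketch).
    """
    frames = frames.astype("float32") / 255.0
    n_clips = len(frames) // seq_len
    return frames[: n_clips * seq_len].reshape(n_clips, seq_len, *frames.shape[1:])


def split_and_count_batches(x, y, val_fraction=0.10, batch_size=31):
    """Hold out the last val_fraction of samples and count training batches."""
    n_val = int(len(x) * val_fraction)
    n_train = len(x) - n_val
    train = (x[:n_train], y[:n_train])
    val = (x[n_train:], y[n_train:])
    return train, val, math.ceil(n_train / batch_size)
```

Because the held-out samples are simply the last ones in this sketch, the data would need to be shuffled beforehand so that the validation set does not end up containing a single class.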
Training Process:
• Early Stopping: Implemented to prevent overfitting.
• Callbacks: Early stopping based on validation loss with
patience of 5 epochs.
• Training and Validation: The models were trained for
an initial 10 epochs, with the potential to continue
training based on existing weights without affecting the
previous history.
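The early-stopping rule described above (monitor validation loss, patience of 5 epochs) can be illustrated without any framework. The following is a simplified sketch of the stopping logic only. The function name and the example loss values are hypothetical, and a real run would typically rely on the framework's own callback, such as Keras `EarlyStopping`.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs; if that never happens,
    all epochs are run.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses for a 20-epoch budget.
losses = [0.70, 0.66, 0.64, 0.65, 0.66, 0.67, 0.69, 0.70] + [0.71] * 12
stop_epoch = early_stopping_epoch(losses, patience=5)
```

With these made-up losses the best value occurs at epoch 3, so the run ends after five further epochs without improvement instead of using the full 20-epoch budget.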
Results:
• Accuracy plot: [Figure: accuracy plot]
• Training Accuracy: 62.5%
• Test Accuracy: 62.5%

D. Multiclass Classification

Objective: Classify the type of crime into one of six categories.
• Arson
• Assault
• Abuse
• Explosion
• Fighting
• Normal

Model Architecture:
• 2 ConvLSTM2D layers with ReLU activation.
• MaxPooling3D layers to reduce dimensionality.
• Dropout layers for regularization.
• Flatten layer followed by a Dense layer with a softmax activation function.
• Optimizer: Adam
• Loss Function: Categorical Crossentropy
• Metrics: Accuracy

Training Setup:
• Batch Size: 31
• Epochs: 10
• Validation Split: 10%

Results:
• Accuracy plot analysis: [Figure: accuracy plot]
• Training Accuracy: 98%
• Validation Accuracy: 48%
• Test Accuracy: 58%
• Inference Time: 0.79 seconds per video

This analysis shows that the model overfitted, possibly because of a suboptimal learning rate and limited preprocessing of the dataset.

PR plot: [Figure: precision-recall plot]
• Normal: The model shows high precision and recall for the 'Normal' class.
• Crime Classes: The precision and recall for crime classes like abuse, arson, assault, explosion, and fighting are lower, indicating difficulty in distinguishing between these classes accurately.

The analysis shows that the data available for each crime class (50 videos) was insufficient compared with the normal class (150 videos). More data is needed to avoid overfitting and obtain good results.

E. Performance Observations
• Overfitting: The high training accuracy and significantly lower validation/test accuracy in the multiclass classification indicate overfitting. The model performs well on training data but struggles to generalize to unseen data.
• Resource Constraints: Due to RAM limitations, the sequence length was fixed at 5 frames. More extensive architectures or dynamic sequence lengths led to crashes.
• Model Complexity: The binary classification model utilized more ConvLSTM2D layers (5) compared to the multiclass model (2) due to resource constraints.
• Inference Time: The inference time of 0.79 seconds per video indicates that the model is capable of real-time processing on the given hardware setup.

IV. LIMITATIONS OF OUR APPROACH

• Hardware Constraints:
The primary limitation encountered was the hardware constraints, particularly the RAM limitations on Kaggle and Google Colab. The dataset size of 35GB and the requirement for high computational power for training ConvLSTM2D models led to frequent crashes and restricted the complexity of the models that could be trained. The inability to use more sophisticated architectures like SlowFast networks, which are more resource-intensive, limited the potential performance improvements.

• Model Complexity and Overfitting:
The binary classification model with five ConvLSTM2D layers and the multi-class model with two ConvLSTM2D layers showed signs of overfitting, as seen from the high training accuracy (98%) but significantly lower validation
(48%) and test accuracy (58%) for the multi-class problem. This discrepancy suggests that the model learned the training data well but struggled to generalize to new data.

• Learning Rate:
The learning rate might have been too high, causing the model to converge quickly but possibly to suboptimal weights. A lower learning rate could have resulted in better generalization but would require more epochs and thus more computational resources.

• Sequence Length and Preprocessing:
The fixed sequence length of 5 frames, due to RAM constraints, might not capture enough temporal information for the model to accurately understand and classify the complex actions in the videos. Additionally, the preprocessing step of merely normalizing the frames without augmentation could have led to a less robust model.

V. RECOMMENDATIONS FOR FUTURE WORK:

• Enhanced Hardware Resources:
Utilizing more powerful hardware with higher RAM and GPU capabilities would allow training more complex models like SlowFast networks, which could improve performance significantly. Cloud platforms with advanced GPUs or TPUs could be considered for this purpose.

• Model Optimization:
Experimenting with different model architectures that balance complexity and performance, such as using fewer but deeper ConvLSTM2D layers or hybrid models combining ConvLSTM2D with traditional LSTM layers, could improve generalization.
Implementing regularization techniques like L2 regularization and dropout more extensively could help prevent overfitting.

• Learning Rate Adjustments:
Conducting a thorough learning rate tuning process, potentially using learning rate schedules or adaptive learning rate optimizers like AdamW or Ranger, could help achieve better convergence and generalization.

• Data Augmentation and Preprocessing:
Incorporating data augmentation techniques such as random cropping, rotation, and temporal augmentation to create more diverse training data could improve the model's robustness.
Experimenting with dynamic sequence lengths and using padding or truncation strategies to handle variable-length video sequences might capture more relevant temporal information.

• Advanced Temporal Models:
Investigating other temporal modeling techniques, such as Transformers or attention mechanisms, which have shown promise in capturing long-range dependencies in sequential data, could enhance the model's ability to understand complex video dynamics.

• Cross-Validation:
Implementing cross-validation techniques would give a better assessment of the model's performance across different subsets of the data and ensure that the results are not dependent on a single train-test split.

By addressing these limitations and exploring the recommended approaches, future work can significantly improve the performance and reliability of crime detection models in surveillance videos.

VI. CONCLUSION

In this project, we explored the challenging task of crime detection and classification in surveillance videos using deep learning techniques. Two models were developed: a binary classifier to detect the presence of a crime and a multi-class classifier to categorize different types of crimes.

The binary classifier achieved a test accuracy of 62.5% using a ConvLSTM2D architecture with five layers and was trained for 20 epochs. The multi-class classifier, designed to identify specific crimes such as explosion, assault, abuse, arson, fighting, and normal activities, achieved a training accuracy of 98%, validation accuracy of 48%, and test accuracy of 58% over 10 epochs. The preprocessing steps included sequence length adjustment to 5 frames and normalization of the video frames, constrained by the hardware limitations.

The results indicate that while the binary classifier performs moderately well, the multi-class classifier faces significant challenges, particularly in generalization, as evidenced by the substantial drop in accuracy from the training to the validation and test sets. The high training accuracy coupled with lower validation and test accuracy suggests overfitting, likely due to the complex nature of the task and limitations in hardware resources preventing the use of more advanced architectures or larger sequence lengths.

Moreover, the precision-recall curves reveal that the model struggles to distinguish between certain types of crimes, indicating a need for further model optimization and more sophisticated preprocessing techniques.

Overall, this project highlights the potential of ConvLSTM2D architectures in crime detection from video data but also underscores the need for enhanced computational resources, better data augmentation, and more advanced modeling techniques to improve performance and generalization in real-world applications.
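As an illustration of the padding or truncation strategy mentioned in the recommendations for future work, the sketch below brings variable-length clips to a common length. It is an example under stated assumptions: zero-padding at the end and keeping the first frames are choices made here for simplicity, not something the report specifies, and the function name is hypothetical.

```python
import numpy as np


def pad_or_truncate(clip, target_len):
    """Return a clip with exactly target_len frames.

    clip: array of shape (num_frames, H, W, C). Longer clips keep their
    first target_len frames; shorter clips are zero-padded at the end.
    """
    num_frames = clip.shape[0]
    if num_frames >= target_len:
        return clip[:target_len]
    padding = np.zeros((target_len - num_frames, *clip.shape[1:]), dtype=clip.dtype)
    return np.concatenate([clip, padding], axis=0)
```

If zero-padded frames are used, a masking step or a per-clip length input may be needed so that the model does not treat the padding as real video content.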
