Othniel Project Work
CHAPTER ONE
INTRODUCTION
1.5 Limitation
UAVs can cover a large area. However, they pose three problems: (1) the camera is moving, (2) the
crowd may move during the capture time, and (3) viewpoints vary, which requires extensive
additional training and testing. Another limitation is that the UAV cannot fly above a certain
altitude limit, a restriction that results from both technical and legal factors.
1.6 Significance
The crowd counting system is able to monitor and manage large crowds. It provides surveillance
and crowd control, and responds to the situation at an event through a framework of three
consecutive steps: sensing, alerting and action (SAA). Sensing is the capture of images of crowd
areas using a camera (digital, infrared or multi-spectral) mounted on board a moving platform,
such as an unmanned aerial vehicle (UAV). The captured images of the crowd area are then
transmitted via communication infrastructure for further processing. In the second step, the
images are segmented and classified to estimate crowd density. The final step is to take action
based on the information provided by the image segmentation step, for example by giving
individuals instructions on the best directions to follow so that exit is safe and movement
remains controllable.
CHAPTER TWO
LITERATURE REVIEW
2.1 Overview
Crowd counting is one of the most challenging problems facing the computer vision community in
the area of safety and security through surveillance systems, according to Kalyani et al. (2021).
Its applications include disaster management, surveillance event detection, intelligence
gathering and analysis, public safety control, traffic monitoring, public space design, anomaly
detection, and military use. The authors introduced Feature Pyramid Networks into deep
convolutional networks to count the number of people in a crowd. On three typical crowd counting
datasets, the technique outperformed well-known networks.
Crowd management is an important duty in maintaining the safety and smooth running of any
event, according to Elharrouss et al. (2021). Crowd management has become easier with
technologies such as surveillance cameras, drones, and communication systems between security
agents. For feature extraction and crowd density estimation, their method uses dilated and scaled
neural networks, and the VisDrone2020 dataset was used for training and testing. The experiments
show that the proposed model is more efficient for crowd counting than the other methods
implemented for comparison. Some of those methods also produce reasonably accurate results in
terms of predicted crowd numbers and density map quality. The proposed model was then tested on
non-drone datasets, including UCF-QNRF, UCF_CC_50, and ShanghaiTech (Parts A and B), all of
which yielded satisfactory results. The method was also evaluated on noisy images, with Gaussian
and salt & pepper noise applied to all of the dataset's images at a noise density of 0.02. On
these noisy images, the quality of the density map and the accuracy of the crowd count estimate
remained superior to those of the alternative methods evaluated without noise.
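The noise test described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that a noise density of 0.02 means roughly 2% of pixels are replaced by black or white values for salt & pepper noise; the Gaussian standard deviation `sigma`, the function names, and the list-of-rows image format are hypothetical choices made for illustration.

```python
import random


def add_salt_and_pepper(image, density=0.02, seed=None):
    """Return a copy of a grayscale image (list of rows, values 0-255)
    in which about `density` of the pixels are forced to 0 or 255."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # copy so the input is left untouched
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < density:
                # Half of the corrupted pixels become "salt", half "pepper".
                row[c] = 255 if rng.random() < 0.5 else 0
    return noisy


def add_gaussian(image, sigma=10.0, seed=None):
    """Return a copy with zero-mean Gaussian noise added to every pixel.
    `sigma` is an assumed value; the source does not state one."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


if __name__ == "__main__":
    clean = [[128] * 100 for _ in range(100)]  # flat gray test image
    noisy = add_salt_and_pepper(clean, density=0.02, seed=1)
    changed = sum(p != 128 for row in noisy for p in row)
    print("fraction of corrupted pixels:", changed / 10000)
```

A counting model would then be run on both the clean and the noisy copies of each test image so that the two sets of estimates can be compared.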
Alubankudi and Ogunti (2021) investigated the approaches that can be used to address the problem
of crowd counting. More specifically, they sought to improve a crowd counting model based on a
convolutional neural network by removing its strongly coupled layers and then assessing the
model's performance. The system used photographs as the dataset, and the authors report results
that demonstrate the efficacy of their technique.
Giordan et al. (2020) provided an overview of unmanned aerial vehicles (UAVs) and their
prospective uses in engineering geology. UAVs are used as a standard research instrument for
acquiring images and other information on demand over an area of interest. Their use requires a
good background in data processing and good piloting ability to manage the flight mission,
particularly in a complex environment. UAVs are a low-cost and quick way to obtain detailed
images of a specific region of interest on demand, and to create detailed 3D models and
orthophotos.
Al-Sheary and Almagbile (2017) noted that the Kingdom of Saudi Arabia hosts millions of pilgrims
each year during the Hajj and Umrah seasons, and that stampedes have occurred repeatedly. This
recurrence emphasizes the importance of studying and dealing with crowd dynamics in a more
scientific manner. Understanding the safety aspects of crowd dynamics in large crowds associated
with sports, religious, and cultural events is critical, particularly for crowd risk analysis and
crowd safety. In this regard, effective crowd monitoring and other safe crowd management
techniques, such as real-time crowd monitoring with an unmanned aerial vehicle (UAV), have been
used to reduce the risks associated with large crowds. In their system, data was acquired from
real-time photographs taken by UAVs, and crowd density was assessed using image segmentation.
The system is useful in that it allows quick decisions based on accurate data. The results show
that the image segmentation technique is capable of mapping crowd density with an accuracy of
80%.
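The step from a segmented image to a crowd density map can be sketched as follows. This is an illustrative sketch under stated assumptions, not the segmentation procedure of Al-Sheary and Almagbile: it assumes a binary mask has already been produced (1 for a crowd pixel, 0 for background), and the cell size, the function names, and the `low` and `high` thresholds are hypothetical.

```python
def density_grid(mask, cell):
    """Split a binary mask into cell x cell blocks and return, for each
    block, the fraction of its pixels that are marked as crowd."""
    rows, cols = len(mask), len(mask[0])
    grid = []
    for r0 in range(0, rows, cell):
        grid_row = []
        for c0 in range(0, cols, cell):
            block = [
                mask[r][c]
                for r in range(r0, min(r0 + cell, rows))
                for c in range(c0, min(c0 + cell, cols))
            ]
            grid_row.append(sum(block) / len(block))
        grid.append(grid_row)
    return grid


def classify(fraction, low=0.3, high=0.7):
    """Map an occupancy fraction to a density level.
    The thresholds are placeholders, not values from the source."""
    if fraction >= high:
        return "high"
    if fraction >= low:
        return "medium"
    return "low"


if __name__ == "__main__":
    # 4 x 4 mask whose top-left quarter is fully occupied by crowd pixels.
    mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    for grid_row in density_grid(mask, cell=2):
        print([classify(f) for f in grid_row])
```

In a sensing, alerting and action framework, cells classified as high density would be the ones that trigger an alert.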
In real-world applications such as video surveillance, public safety, urban planning, and traffic
monitoring, estimating the number of people in unconstrained settings is a critical but difficult
task. Cluttered environments, heavy occlusion, scale variation, and variations in camera
perspective are only a few of the challenges that crowd counting faces. As a result, enormous
research effort has been devoted to crowd counting in recent years, and various strategies have
been proposed. Gouiaa et al. (2021) examined methods for estimating the number of people that
use deep convolutional neural networks (CNNs) and public crowd counting datasets, both of which
are largely responsible for the progress in crowd counting in recent years. Techniques based on
detection, regression, and classic density estimation were compared. The authors also noted that
methods established for estimating the number of people can be adapted to comparable tasks in
other domains, including plant counting, vehicle counting, and cell microscopy.
Deep learning has recently shown exceptional results on a wide variety of robotic tasks in the
domains of perception, planning, localization, and control, according to Carrio et al. (2017).
Its superior ability to learn representations from complex data acquired in real environments
makes it well suited to a wide range of autonomous robotic applications. At the same time,
unmanned aerial vehicles (UAVs) are being widely used for a variety of civilian tasks, ranging
from security, surveillance, and disaster relief to parcel delivery and warehouse management.
The authors also discuss the main challenges in applying deep learning to UAVs.
Wen et al. (2021) constructed a benchmark with a new large-scale drone-captured dataset of 112
video clips with 33,600 HD frames in various scenarios, to promote the development of object
detection, tracking and counting algorithms for drone-captured videos. They also designed the
Space-Time Neighbor-Aware Network (STNNet) as a strong baseline that solves object detection,
tracking and counting jointly in dense crowds. STNNet consists of a feature extraction module
followed by density map estimation heads and localization and association subnets. A neighboring
context loss was designed to guide the training of the association subnet by exploiting the
context information of neighboring objects. This loss enforces consistent relative positions of
nearby objects in the temporal domain.
Timely estimation of rapeseed stand count at early growth stages provides useful information for
precision fertilization, irrigation, and yield prediction. No field study had been reported on
estimating rapeseed stand count from the number of leaves recognized with convolutional neural
networks (CNNs) in unmanned aerial vehicle (UAV) imagery. Hence, Zhang et al. (2020) developed a
CNN model to recognize leaves in UAV-based imagery and estimated rapeseed stand count from the
number of recognized leaves. The study provides a case for counting rapeseed stands using
existing knowledge of the number of leaves per plant, and sought to determine the optimal timing
for counting after rapeseed emergence, at leaf development stages with one to seven leaves. Leaf
detection performance was compared using sample sizes of 16, 24, 32, 40 and 48 pixels. The
results show that the CNN-based leaf count achieved the best performance at the four- to six-leaf
stage, with F-scores greater than 90% after calibration with the overcounting rate. This
confirmed that rapeseed stand count can be estimated automatically, rapidly, and accurately.
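The two quantities in this summary, the F-score and a stand count derived from a leaf count, can be made concrete with a short sketch. The F-score is the standard harmonic mean of precision and recall. The stand count function is only a hypothetical illustration of the idea: it assumes the overcounting rate is defined so that recognized leaves equal true leaves times (1 + rate), and it assumes a known average number of leaves per plant. Zhang et al. may have calibrated differently.

```python
def f_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall for detected leaves."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def estimate_stand_count(recognized_leaves, leaves_per_plant, overcount_rate=0.0):
    """Hypothetical calibration: remove the assumed overcount, then divide
    the corrected leaf total by the average number of leaves per plant."""
    corrected_leaves = recognized_leaves / (1.0 + overcount_rate)
    return corrected_leaves / leaves_per_plant


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print("F-score:", f_score(true_pos=90, false_pos=10, false_neg=10))
    print("plants:", estimate_stand_count(550, leaves_per_plant=5, overcount_rate=0.1))
```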
Almagbile (2019) carried out research in the context of rapid development in platform and sensor
technology, particularly digital cameras and video recording. Crowd monitoring has attracted
considerable attention in many disciplines, such as psychology, sociology, engineering, and
computer vision, and it is essential for enhancing safety and controlling movement in order to
minimize risk, particularly in highly crowded incidents. Unmanned aerial vehicles (UAVs) are a
platform that has been extensively employed in crowd monitoring, since they can acquire fast,
cost-effective, high-resolution and real-time images over crowd areas. The system is based on the
features from accelerated segment test (FAST) algorithm, which detects crowd features in UAV
images taken from different camera orientations and positions. A pixel is accepted as a feature
when a contiguous run of 9 (for FAST-9) or 12 (for FAST-12) pixels on the circle around it
differs sufficiently in intensity from the center pixel. Accuracy assessment in terms of
completeness and correctness was used to assess the performance of the procedure before and
after filtering the crowd features. The results show that the proposed algorithm was able to
extract crowd features from different UAV images. Overall, completeness values ranged from 55 to
70%, whereas correctness values ranged from 91 to 94%.
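The segment test behind FAST-9 and FAST-12 can be sketched in plain Python. This is a minimal illustration of the general FAST criterion, not Almagbile's implementation: the intensity threshold of 20 and the function names are assumptions, and no non-maximum suppression or filtering is performed. Completeness and correctness are computed with their usual definitions, TP / (TP + FN) and TP / (TP + FP).

```python
# (row, col) offsets of the 16-pixel circle of radius 3, in clockwise order.
CIRCLE = [
    (-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
    (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1),
]


def is_fast_corner(image, r, c, n=9, threshold=20):
    """Segment test at pixel (r, c), which must lie at least 3 pixels from
    the image border. Returns True when n contiguous circle pixels are all
    brighter than center + threshold or all darker than center - threshold."""
    center = image[r][c]
    ring = [image[r + dr][c + dc] for dr, dc in CIRCLE]
    for sign in (1, -1):  # 1 tests "brighter", -1 tests "darker"
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        # Appending the first n - 1 flags lets a run wrap around the circle.
        for flag in flags + flags[: n - 1]:
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


def completeness(true_pos, false_neg):
    """Share of reference features that were detected."""
    return true_pos / (true_pos + false_neg)


def correctness(true_pos, false_pos):
    """Share of detected features that are correct."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    spot = [[10] * 7 for _ in range(7)]
    spot[3][3] = 200  # one bright pixel on a dark background
    print(is_fast_corner(spot, 3, 3, n=9), is_fast_corner(spot, 3, 3, n=12))
```

Setting `n=12` gives the stricter FAST-12 test, which accepts fewer pixels than FAST-9 on the same image.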
Crowd counting is a challenging problem because of scene complexity and scale variation,
according to Chen et al. (2020). Existing methods are prone to error because they often mistake
other objects for people, which makes the count inaccurate. The authors used a Crowd Attention
Convolutional Neural Network (CAT-CNN) to count crowds. The network encodes a confidence map
that locates human heads, and obtains the crowd count by integrating the final density map. The
method outperformed many other state-of-the-art methods.
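Counting by integrating a density map, the last step mentioned above, is shared by most density-based methods and can be shown with a small sketch. It is not CAT-CNN: there is no network and no confidence map here. It assumes head positions are already annotated, places a normalized Gaussian of assumed width `sigma` at each head, and sums the resulting map. All names are hypothetical.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a density map from (row, col) head positions inside the image.
    Each head contributes a truncated Gaussian normalized to sum to 1."""
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for pr, pc in points:
        cells = []
        for r in range(max(0, pr - radius), min(height, pr + radius + 1)):
            for c in range(max(0, pc - radius), min(width, pc + radius + 1)):
                dist_sq = (r - pr) ** 2 + (c - pc) ** 2
                cells.append((r, c, math.exp(-dist_sq / (2 * sigma ** 2))))
        total = sum(weight for _, _, weight in cells)
        for r, c, weight in cells:
            dmap[r][c] += weight / total  # normalization keeps one head = 1
    return dmap


def crowd_count(dmap):
    """Integrate (sum) the density map to obtain the estimated count."""
    return sum(sum(row) for row in dmap)


if __name__ == "__main__":
    heads = [(5, 5), (10, 12), (0, 0)]  # illustrative annotations
    print(round(crowd_count(density_map(heads, height=20, width=20)), 6))
```

A counting network is trained to predict such a map directly from the image, so that summing its output gives the estimated number of people.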
Crowd counting in computer vision aims to determine the number of people present in an image or
video, according to Jiang et al. (2020). It has many applications, including video surveillance,
public safety, traffic control, agriculture monitoring, and cell counting. Their method uses a
Density Attention Network (DANet) and an Attention Scaling Network (ASNet), together with an
Adaptive Pyramid Loss (APLoss) that calculates the estimation loss and thereby improves the
ability of the counting network. In experiments on four challenging datasets, the method proved
superior to others.
Liu et al. (2019) conducted research on crowd counting using a novel Deep Structured Scale
Integration Network (DSSINet). The research is important because it can be applied in fields
such as video surveillance, traffic management, and traffic forecasting. The method addresses
scale variation in crowds through structured feature representation learning and hierarchically
structured loss function optimization.
W. Liu et al. (2019) used an end-to-end method of crowd counting built on a trainable deep
architecture that combines features obtained using multiple receptive field sizes. The
architecture adaptively encodes multi-level contextual information into the features it
produces, and these scale-aware features are used to regress the final density map. The method
works with both calibrated and uncalibrated cameras and outperforms other crowd counting
methods.
Wang et al. (2019) compared and analyzed the three main methods for crowd counting: the
Multi-Column Parallel Convolutional Neural Network (MCNN), the Switch-Convolutional Neural
Network (Switch-CNN), and the Congested Scene Recognition Network (CSRNet). The three methods
were compared in terms of network structure and experimental performance. Of the three, CSRNet
performed best.
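Comparisons of experimental performance in crowd counting are usually expressed as errors between predicted and true counts. The summary above does not state which measures Wang et al. used, so the sketch below simply shows the two most commonly reported ones, mean absolute error (MAE) and root mean squared error (RMSE), on made-up counts that are not results from any of the cited papers.

```python
import math


def mae(true_counts, predicted_counts):
    """Mean absolute error over a set of test images."""
    errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    return sum(errors) / len(errors)


def rmse(true_counts, predicted_counts):
    """Root mean squared error; penalizes large misses more than MAE."""
    squared = [(t - p) ** 2 for t, p in zip(true_counts, predicted_counts)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Illustrative counts only.
    truth = [120, 300, 45]
    model_a = [110, 320, 50]
    model_b = [118, 250, 44]
    for name, pred in (("model_a", model_a), ("model_b", model_b)):
        print(name, "MAE:", round(mae(truth, pred), 2), "RMSE:", round(rmse(truth, pred), 2))
```

Reporting both is useful because a model with a lower MAE can still have a higher RMSE when it occasionally misses by a wide margin.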
Density-based crowd counting and modeling at different gatherings has sparked new interest in
the visual surveillance research community, according to Pandey et al. (2020). Their method
models crowd counting in densely populated images. An orthographic projection of the crowd is
captured using a camera attached to a drone, which reduces the effects of occlusion and scaling.
The prepared data is then used to train a CNN model. The authors report that the method is simple
and gives better results. Future work would extend the method to low illumination at night, as
most shows happen at night.
Jiang et al. (2020) proposed two networks to alleviate the differences in counting performance
that arise with CNN-based methods: the Density Attention Network (DANet) and the Attention
Scaling Network (ASNet). ASNet is responsible for generating scaling factors and outputting
attention-based density maps that focus only on their corresponding attention regions. Extensive
experiments show that the method is superior to other state-of-the-art methods.
Crowd counting presents enormous challenges in the form of large variation in scale within
images and across the dataset, according to Sindagi & Patel (2019). Their method uses a network
with multi-level bottom-top and top-bottom fusion, which combines information from shallower to
deeper layers and vice versa at multiple levels. It also uses Scale Complementary Feature
Extraction Blocks (SCFB), which involve cross-scale residual functions that explicitly enable the
flow of complementary features from adjacent conv layers along the fusion paths. This results in
a more effective fusion of features from multiple layers of the backbone network. The method
achieves significant improvements when evaluated on three popular crowd counting datasets.
Tian et al. (2020) proposed a method named pan-density crowd counting. The Pan-Density Network
(PaDNet) is composed of three critical components: the Density-Aware Network (DAN), which
contains multiple subnetworks pretrained on scenarios with different densities; the Feature
Enhancement Layer (FEL), which captures the global and local contextual features and generates a
weight for each density-specific feature; and the Feature Fusion Network (FFN), which embeds
spatial context and fuses these density-specific features. The method obtained the lowest
prediction errors and high robustness in crowd density counting.
Crowd counting is one of the most challenging issues in the computer vision community for safety
and security through surveillance systems, according to Kalyani et al. (2021). Crowd counting
approaches still encounter problems such as non-uniform density distribution, partial occlusion,
and discrepancies in scale and point of view. To address these problems, Feature Pyramid Networks
are introduced into deep convolutional networks for counting the individuals in a crowd.
ResNeXtFP counts the individuals in a medium- or high-density crowd visible in a still image. The
convolutions in the background network are used to extract multi-scale features, creating
density maps with unaltered resolution. The authors report state-of-the-art performance.
References
Al-Sheary, A., & Almagbile, A. (2017). Crowd Monitoring System Using Unmanned Aerial
Vehicle (UAV). Journal of Civil Engineering and Architecture, 11(2017), 1014-1024.
https://doi.org/10.17265/1934-7359/2017.11.004
Almagbile, A. (2019). Estimation of crowd density from UAVs images based on corner detection
procedures and clustering analysis. Geo-Spatial Information Science, 22(1), 23–34.
Alubankudi, O. T., & Ogunti, E. O. (2021). Development of Crowd Counting and Density
Estimation Model Using CNN. Journal of Multidisciplinary Engineering Science Studies
(JMESS), 7(4), 3802-3808.
Carrio, A., Sampedro, C., Rodriguez-Ramos, A., & Campoy, P. (2017). A Review of Deep
Learning Methods and Applications for Unmanned Aerial Vehicles. Journal of Sensors,
2017, 1-14.
Elharrouss, O., Almaadeed, N., Abualsaud, K., Al-Ali, A., Mohamed, A., Khattab, T., &
Al-Maadeed, S. (2021). IEEE Transactions on Aerospace and Electronic Systems, (2021), 1-14.
Giordan, D., Adams, M. S., Aicardi, I., Alicandro, M., Allasia, P., Baldo, M., Berardinis, P. D.,
Dominici, D., Godone, D., Hobbs, P., Lechner, V., Niedzielski, T., Piras, M., Rotilio, M.,
Salvini, R., Segor, V., Sotier, B., & Troilo, F. (2020). The use of unmanned aerial vehicles
(UAVs) for engineering geology applications. Bulletin of Engineering Geology and the
Environment, 79(2020), 3437–3481.
Gouiaa, R., Akhloufi, M. A., & Shahbazi, M. (2021). Advances in Convolution Neural Networks
Based Crowd Counting and Density Estimation. Big Data and Cognitive Computing, 5(50),
1-21.
Kalyani, G., Janakiramaiah, B., Narasimha Prasad, L. V., Karuna, A., & Babu, A. M. (2021).
Efficient Crowd Counting Model using Feature Pyramid Network and ResNeXt. Research
Square, 1-23.
Wen, L., Du, D., Zhu, P., Hu, Q., Wang, Q., Bo, L., & Lyu, S. (2021). Detection, Tracking, and
Counting Meets Drones in Crowds: A Benchmark. Applied Basic Research Program of
Qinghai (2021), 1-10.
Zhang, J., Zhao, B., Yang, C., Shi, Y., Liao, Q., Zhou, G., Wang, C., Xie, T., Jiang, Z., Zhang, D.,
Yang, W., Huang, W., & Xie, J. (2020). Rapeseed Stand Count Estimation at Leaf
Development Stages With UAV Imagery and Convolutional Neural Networks. Frontiers in
Plant Science, 11(2020), 1-16.