Deep Learning-based Anti-Drone System using
Radar Technology
Ann Janeth Garcia, Jae Min Lee, Dong-Seong Kim
Networked Systems Laboratory, Department of IT Convergence Engineering,
Kumoh National Institute of Technology, Gumi, South Korea.
{ajg.garcia, ljmpaul, dskim}@kumoh.ac.kr
Abstract—This paper proposes a drone detection system based on a radar detection scheme. The radar used to collect the data is a frequency-modulated continuous-wave (FMCW) radar operating in a frequency band centered at 8.75 GHz with a maximum bandwidth (BWmax) of 500 MHz. The number of filters in the deep neural network (DNN) is varied, and the configuration that gives the most accurate result is selected and compared with other machine learning models: ResNet-18, SqueezeNet and Support Vector Machine (SVM).
Index Terms—Deep neural network, drone detection, radar.

I. INTRODUCTION
In recent decades, numerous surveillance technologies have been studied for drone detection because of the serious threats posed by drones. In [1], a radar-based system is used to secure an area against approaching unwanted drones by tracking them and jamming the signal used by the controller. The authors' goal in this paper is to develop a drone detection system that provides long-term surveillance. Automatic classification techniques with quality labelled data are necessary to improve the efficiency of this system. The system is based on the Real Doppler RAD-DAR (Radar with Digital Array Receiver) database.
A neural network (NN) is a computing system composed of many simple processing units working simultaneously, or in parallel, to learn experiential knowledge from a dataset. It has an input layer, an output layer and hidden layers in between. The complexity of the model depends on the number of hidden layers as well as the number of nodes in each layer. NNs are commonly used in pattern recognition applications. A deep NN (DNN) is a supervised neural network that has numerous hidden layers between the input and output layers. It is applied in processing high-dimensional data and in learning progressively more complicated models; however, it is more difficult to train and requires more computing resources. The neurons in each layer train a feature representation based on the output of the previous layer, which is called a feature hierarchy. This makes DNNs efficient in handling very large, high-dimensional data sets. A DNN provides much improved performance compared with other machine learning algorithms because of its multiple-level feature representation learning.
The RAD-DAR dataset is used; it is a quality labelled database produced after an extensive controlled trial test campaign. This novel dataset was created for the sole purpose of the paper [2] by Roldan et al. It has more than 17,000 samples of cars, people and drones obtained in real outdoor scenarios. The radar used to capture the data is a persistent radar system created by the Microwave and Radar Group, called RAD-DAR. It is an FMCW radar operating in a frequency band centered at 8.75 GHz with a BWmax of 500 MHz. After digital signal processing, a 4092x512 matrix is acquired for every scene. The distance cells are in rows and the Doppler frequencies are in columns, with all values in dBm. These matrices have been reduced to 11x61 matrices.

II. NETWORK STRUCTURE
The input to our network is an 11x61 Doppler matrix. The stride is the step size with which the filter moves; it is set to (2,2) in the pooling layers and (1,1) in the convolutional layers. Since there are three series of convolutional and pooling layers in parallel with each other, the filter sizes are set to (3,3), (5,5) and (7,7). The filter size determines how much neighboring information a neuron can see when processing the current layer. When the filter size is (3,3), each neuron sees the 8 neighboring values around it.
Connected to the input layer are the hidden layers, also called the feature detection layers. Each performs one of three types of operation on the data: convolution, pooling, or activation.
• As shown in Fig. 1, a convolution layer is chosen because it passes the input signal through a set of convolutional filters, each of which activates specific features from the frames.
• A MaxPooling layer is selected as the pooling technique and is connected after the convolution layer because it simplifies the output through non-linear downsampling, which decreases the number of parameters or features that the network needs to learn.
• The activation layer used is a Rectified Linear Unit (ReLU). This layer allows rapid and efficient training by mapping negative values to zero and retaining positive values.
• The combination layer used is an addition layer. This layer combines all the outputs of the activation layers into one single output, which then becomes the input of the next mid-block or output block.
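The three parallel branches described above can be sketched in plain Python to show how an 11x61 input moves through convolution, max pooling, ReLU and addition. This is a minimal illustration, not the authors' MATLAB implementation. It assumes zero ("same") padding in the convolution so that the three branches produce equally sized outputs that can be added, a (2,2) pooling window, a single filter per branch, and random weights. None of these are stated in the paper, and the function names are hypothetical.

```python
import random


def conv2d_same(x, kernel):
    """Stride-(1,1) 2-D convolution with zero padding (assumed), so the
    output has the same size as the input for any odd kernel size."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    p = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - p, j + dj - p
                    if 0 <= r < h and 0 <= c < w:
                        acc += x[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out


def max_pool(x, size=2, stride=2):
    """Max pooling with stride (2,2); the (2,2) window is an assumption."""
    h, w = len(x), len(x[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    return [
        [
            max(
                x[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x):
    """Map negative values to zero and keep positive values."""
    return [[max(0.0, v) for v in row] for row in x]


def add(*maps):
    """Element-wise sum of equally sized feature maps (addition layer)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def mid_block(x, kernels):
    """One branch per kernel: convolution -> max pooling -> ReLU, then add."""
    branches = [relu(max_pool(conv2d_same(x, k))) for k in kernels]
    return add(*branches)


rng = random.Random(0)
# Stand-in for one 11x61 Doppler matrix; real frames hold dBm values.
frame = [[rng.uniform(-1.0, 1.0) for _ in range(61)] for _ in range(11)]
# One random filter per branch with sizes (3,3), (5,5) and (7,7).
kernels = [
    [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(k)]
    for k in (3, 5, 7)
]
out = mid_block(frame, kernels)
print(len(out), len(out[0]))  # 5 30
```

Under these assumptions each branch reduces the 11x61 input to 5x30, which is what allows the addition layer to sum the three outputs element by element.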
Fig. 1. Proposed network structure design implemented in MATLAB using Deep Network Designer, with the input layer, output layer and hidden layers in between.

The input block is composed of the input layer and a normalization layer; this is where the signal enters. The mid-block is the feature detection layer, composed of convolution, pooling, and an activation function, which is ReLU. The last layer in the mid-block is the combination layer, which combines the outputs of the activation layers into one single output.
Lastly, the output block comes after the hidden layers. It is composed of the classification layers of the network. The third-to-last layer is a fully connected (FC) layer that outputs a k-dimensional vector, where k is the number of output classes that the network is able to detect. This vector includes the probabilities for each class of the frame being categorized, and the final layer in the network is a SoftMax layer that provides the classification output, which in this case is drone, person or car.

III. RESULTS
The number of filters set in the input layer is varied among 16, 24, 32 and 48, and the results are compared to check which number of filters gives the most accurate result.

TABLE I
VARYING THE NUMBER OF FILTERS OF THE PROPOSED NETWORK
No. of filters    Accuracy    Time consumption    Learnable parameters
16                95.85%      0.0032 s            3,715
24                96.00%      0.0032 s            5,571
32                96.40%      0.0033 s            7,427
48                96.54%      0.0034 s            11,139

Table I shows that the 48-filter network gives a higher accuracy than the other numbers of filters. In Table II, the results obtained using the RAD-DAR database with the proposed network are compared with ResNet-18, SqueezeNet and Support Vector Machine (SVM). ResNet-18 and SqueezeNet are convolutional neural networks pretrained on a million images from the ImageNet database and are readily available in MATLAB. The input and output blocks of these networks were modified to match the database used.

TABLE II
COMPARISON OF RESULTS WITH OTHER NETWORKS
Network             Accuracy    Time consumption
Proposed network    96.54%      0.0034 s
ResNet-18           93.47%      0.0059 s
SqueezeNet          92.44%      0.0064 s
SVM                 95.34%      0.0065 s

IV. CONCLUSION
The proposed network with 48 filters gave the highest accuracy, and it also exceeds the accuracy of ResNet-18, SqueezeNet and SVM. Although the accuracies of these networks, as shown in Table II, do not differ much, the proposed network would still be the best choice because of its time consumption, which is the time the network takes to process one frame. A lower processing time therefore indicates a faster network.
The proposed network is a promising model that can be applied to a radar-based anti-drone system. Despite outperforming the other models considered, the network should be optimized prior to implementation and validated in more scenarios. In future work, the pooling technique and the combination layer can be varied to check whether this further improves the system.

ACKNOWLEDGEMENT
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center support program (IITP-2020-2020-0-01612) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation).

REFERENCES
[1] Multerer, T., Ganis, A., Prechtel, U., Miralles, E., Meusling, A., Mietzner, J., ... & Ziegler, V. (2017, October). Low-cost jamming system against small drones using a 3D MIMO radar based tracking. In 2017 European Radar Conference (EURAD) (pp. 299-302). IEEE.
[2] Roldan, I., del-Blanco, C. R., de Quevedo, Á. D., Urzaiz, F. I., Menoyo, J. G., López, A. A., ... & García, N. (2020). DopplerNet: a convolutional neural network for recognising targets in real scenarios using a persistent range–Doppler radar. IET Radar, Sonar & Navigation, 14(4), 593-600.