
Small Boat Detection for Radar Image Datasets with YOLO V3 Network


Guanqing Li∗, Zhiyong Song, Qiang Fu
College of Electronic Science, National University of Defense Technology, Changsha, China
∗ Email address: ([email protected])

Abstract—Small boat detection under the influence of sea clutter is usually difficult, especially for low-resolution pulse Doppler radar, as the amplitude of the boat is covered by sea clutter in the time domain, and the spectra of the boat and the sea clutter overlap in the frequency domain. This paper proposes a new method for the detection task based on time-frequency analysis and the YOLO V3 network. The method automatically extracts the characteristics of sea clutter and boat in time-frequency images and completes the classification task. The classification accuracy is 94.89 percent, an improvement of 14.39 percentage points over LeNet-5. Measured data verify the method.

Keywords—YOLO V3, Time-Frequency Analysis, Small Boat Detection, Radar, Deep Learning

Fig. 1. The horizontal axis represents 96 range bins, each of which is 15 meters wide. The vertical axis represents the pulses, 57,334 in total. The black solid line represents the GPS trajectory of the boat. The target echo is completely annihilated in the strong sea clutter.

I. INTRODUCTION

China has more than 3 million square kilometers of sea area and 18,000 kilometers of coastline, so ensuring marine safety is
an important aspect of maintaining national security. Radar can detect and track targets regardless of the time of day or the weather, making it an indispensable part of maritime defense. However, a sea-observation radar encounters severe interference from the radar echo of the sea surface itself while it is working; this interference signal is generally called sea clutter. Sea clutter is affected by the radar parameters, wave height, wind direction, ocean currents, rainfall, and the dielectric constant of seawater [1], and it is much more complicated than ground clutter. In addition, if the target to be observed is small, the target echo signal is often submerged in the strong sea clutter, making the detection problem even more difficult.
Su et al. [2] used a deep CNN to classify different sea states and polarizations on the IPIX measured data. However, for low-resolution pulse Doppler radar, where the target and the sea clutter are mixed in both the time domain and the frequency domain, how to accomplish the small-target detection task is still an urgent problem to be solved.

The experimental data are from the publicly available Fynmeet dynamic radar data recorded by the South African Council for Scientific and Industrial Research (CSIR) at Overberg. The CSIR radar is vertically polarized at a frequency of 9 GHz, with a pulse repetition rate of 5 kHz and a range resolution of 15 m; the boat is 3-5 m in size. The images used here are from the CSIR Fynmeet datasets TFC15 009 and TFC15 011. The first 132 seconds of return data are shown in Figure 1, which represents the amplitude of the radar echo from the periodic waves and targets off the Overberg coast near Cape Town, South Africa.

Fig. 2. Time-frequency images obtained by applying the short-time Fourier transform to the original echo data. The horizontal axis represents time, 0-13.46 s. The vertical axis represents the Doppler shift, from -800 Hz at the bottom to 800 Hz at the top. The first row contains the target; the second row does not.

We take a column of the original echo data for the short-time Fourier transform, and some of the resulting time-frequency images are shown in Figure 2. As can be seen from Figure 2(a) and 2(b), the sea clutter and the Doppler shift of the boat are sometimes mixed, and the Doppler frequency shift of the sea clutter is very strong.

In this paper, we propose a new method for the detection task based on time-frequency analysis and the YOLO V3 network. The measured data show that our method can effectively complete the classification task. At the end of the article, our experimental results are compared with the classical convolutional neural network LeNet-5 [3].
II. METHOD

A. Overall Architecture

The overall architecture of our method is shown in Figure 3. The raw data is a two-dimensional complex-valued matrix: the horizontal axis has been processed by pulse compression and represents the range bins, while the vertical axis represents the pulse sequence. The images generated by the STFT are saved as three-channel PNG images via pseudo-color processing.
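As a minimal sketch of this dataset-generation stage, the snippet below turns one range bin of a complex echo matrix into a pseudo-color time-frequency image. The Hann window, the 'jet' colormap, and the file name are our own assumptions; the paper does not specify them.

    # Sketch: STFT of one range bin -> pseudo-color PNG (assumed details).
    import numpy as np
    from scipy.signal import stft
    import matplotlib.pyplot as plt

    PRF = 5000                # pulse repetition frequency of the radar, Hz
    WIN = int(0.5 * PRF)      # 0.5 s window (see Section III-A) -> 2500 pulses

    def save_tf_image(echo_column, out_path):
        # echo_column: complex-valued slow-time samples of a single range bin
        f, t, Z = stft(echo_column, fs=PRF, window="hann",
                       nperseg=WIN, return_onesided=False)
        # Center zero Doppler and convert magnitude to dB before coloring.
        power_db = 20 * np.log10(np.abs(np.fft.fftshift(Z, axes=0)) + 1e-12)
        plt.imsave(out_path, power_db, cmap="jet", origin="lower")

    # Example with simulated clutter-like noise (57,334 pulses, as in Fig. 1).
    echo = np.random.randn(57334) + 1j * np.random.randn(57334)
    save_tf_image(echo, "tf_image.png")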
[Figure 3 shows the pipeline: Raw Data -> STFT -> Format Conversion and Labeling -> Normalization -> Datasets -> Training -> Model -> Testing, with the results datasets labeled 'Boat+Clutter' and 'Clutter'; a Pascal VOC branch produces the initial CNN network weights, and the candidate CNN networks include Yolo Tiny, Yolo V2 Tiny, Yolo V3 Tiny, and Yolo V3.]

Fig. 3. The architecture of our approach. The left and right parts of the figure show the procedure for generating the time-frequency image datasets and the procedure for detecting small boats with convolutional neural networks, respectively. The upper right part of the figure is a flow chart of the transfer learning on the Pascal VOC dataset.

During the forward training process, the images from the training datasets are put into the CNN, which comprises a series of convolutional layers, max-pooling layers, and fully connected layers that produce the suggested features. We obtain the initial weights of the network by training on the Pascal VOC dataset. During the back-propagation process, we calculate the loss. We then put each testing image into the saved model to predict whether the image is 'sea clutter' or 'boat + sea clutter'.

B. YOLO V3 Network

The network architecture of YOLO V3 is shown in Figure 4. It improves on YOLO [4] and YOLO V2 [5] mainly in two aspects: one is multi-scale prediction similar to FPN; the other is a better basic classification network and classifier similar to ResNet. The figure shows the structure of the YOLO V3 network [6], consisting of 75 convolutional layers, 20 shortcut layers, and 2 upsample layers. The improvements of YOLO V3 are mainly reflected in the following two aspects.

1) Multi-scale prediction: Low-level features carry relatively little semantic information but accurate target location information, whereas high-level features carry rich semantic information but relatively coarse location information. In YOLO V3, three boxes are predicted at each scale, and the anchor design still uses clustering to obtain 9 cluster centers, which are divided among the three scales according to their size. YOLO V3 adopts upsampling to combine the three scales (13×13, 26×26 and 52×52) and performs detection independently on the fused feature maps of the multiple scales, which finally improves the detection of the target.

2) Basic network Darknet-53: Unlike Darknet-19, YOLO V3 uses a 53-layer convolutional network built by stacking residual units. There are two main changes in the Darknet-53 convolutional layers. One is to replace the original block with a simplified residual block. The second is to use concatenated convolution to increase the number of channels. The basic network comparison of YOLO V2 and V3 is shown in TABLE I.

TABLE I
THE BASIC NETWORK COMPARISON OF YOLO V2 AND V3.

Darknet-19 (YOLO V2):
Type   Filter  Size  Out
Conv   32      3     224
Maxp           2     112
Conv   64      3     112
Maxp           2     56
Conv   128     3     56
Conv   64      1     56
Conv   128     3     56
Maxp           2     28
Conv   256     3     28
Conv   128     1     28
Conv   256     3     28
Maxp           2     14
Conv   512     3     14
Conv   256     1     14
Conv   512     3     14
Conv   256     1     14
Conv   512     3     14
Maxp           2     7
Conv   1024    3     7
Conv   512     1     7
Conv   1024    3     7
Conv   512     1     7
Conv   1024    3     7
Conv   1000    1     7
AvgP           Glo
SofM

Darknet-53 (YOLO V3):
Times  Type   Filter  Size  Out
       Conv   32      3     256
       Conv   64      3     128
1x     Conv   32      1
       Conv   64      3
       Resid                128
       Conv   128     3     64
2x     Conv   64      1
       Conv   128     3
       Resid                64
       Conv   256     3     32
8x     Conv   128     1
       Conv   256     3
       Resid                32
       Conv   512     3     16
8x     Conv   256     1
       Conv   512     3
       Resid                16
       Conv   1024    3     8
4x     Conv   512     1
       Conv   1024    3
       Resid                8
       AvgP           Glo
       FC                   1000
       SofM

In TABLE I, 'Conv', 'Maxp', 'AvgP', 'SofM', 'Out', 'Times', 'Resid', 'Glo', and 'FC' denote convolutional layer, max-pooling layer, average-pooling layer, soft-max layer, output size, number of repetitions of the block, residual layer, global, and fully connected layer, respectively.
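To make the residual unit concrete, the following is a minimal sketch of one Darknet-53 building block: a 1×1 bottleneck convolution followed by a 3×3 convolution, added back to the input. The framework (TensorFlow/Keras), the BN + LeakyReLU ordering, and the channel choices are our assumptions, not the authors' implementation.

    # Sketch of one Darknet-53 residual unit (assumed structure, TF/Keras).
    import tensorflow as tf
    from tensorflow.keras import layers

    def darknet_residual(x, channels):
        # 1x1 conv halves the channels, 3x3 conv restores them, then a shortcut add.
        shortcut = x
        y = layers.Conv2D(channels // 2, 1, padding="same", use_bias=False)(x)
        y = layers.BatchNormalization()(y)
        y = layers.LeakyReLU(0.1)(y)
        y = layers.Conv2D(channels, 3, padding="same", use_bias=False)(y)
        y = layers.BatchNormalization()(y)
        y = layers.LeakyReLU(0.1)(y)
        return layers.Add()([shortcut, y])

    # Example: the "8x" stage at 32x32 resolution with 256 channels.
    inp = tf.keras.Input(shape=(32, 32, 256))
    out = inp
    for _ in range(8):
        out = darknet_residual(out, 256)
    model = tf.keras.Model(inp, out)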

Fig. 4. YOLO V3 network structure.

C. Time Complexity of the Network

The time complexity of the network is analyzed from the sizes of the input and output of each layer; only the multiplication operations are counted. The amount of computation can be expressed by the following formula:

O(N, F, n) = Σ_{i=1}^{75} N_i^out · F_i · (n_i · n_i · F_{i-1})    (1)

where N_i^out is the output image size of the i-th layer, F_i is the number of convolution kernels of the i-th layer (the number of channels), and n_i is the size of the i-th layer's convolution kernel. The total computation of the YOLO V3 network obtained in this way is 65.290 BFLOPS.
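As a rough worked example of formula (1), the sketch below accumulates the multiply counts of a few convolutional layers. The layer list is illustrative only, not the actual 75-layer configuration that yields 65.290 BFLOPS.

    # Worked example of formula (1). N_out is taken as the number of output
    # pixels (H*W) of layer i; F is the number of kernels (output channels);
    # n is the kernel side length. The layer list is illustrative.
    def conv_multiplies(n_out_pixels, f_out, kernel, f_in):
        # Each output activation costs kernel*kernel*f_in multiplications,
        # and there are n_out_pixels * f_out output activations.
        return n_out_pixels * f_out * (kernel * kernel * f_in)

    # (H*W of output, out channels, kernel size, in channels), first few layers
    example_layers = [
        (416 * 416, 32, 3, 3),    # Conv 32, 3x3, on a 416x416 RGB input
        (208 * 208, 64, 3, 32),   # downsampling Conv 64, 3x3
        (208 * 208, 32, 1, 64),   # 1x1 bottleneck of the first residual unit
        (208 * 208, 64, 3, 32),   # 3x3 conv of the residual unit
    ]
    total = sum(conv_multiplies(*layer) for layer in example_layers)
    print(f"{total / 1e9:.3f} billion multiplications")  # ~1.8 BFLOPs already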
D. Non-maximum suppression algorithm

Non-maximum suppression (NMS) is widely used in object detection algorithms. Its purpose is to eliminate redundant candidate boxes and find the best detection location. Suppose there is a set of candidate boxes B with a corresponding set of scores S. First, we find the box M with the highest score; second, we remove M from B and add it to the output set D; third, every remaining box in B whose overlap with M is larger than the threshold Nt is deleted from B; finally, the above steps are repeated until B is empty. The pseudocode is given in TABLE II.

TABLE II
THE NON-MAXIMUM SUPPRESSION ALGORITHM.

NMS Algorithm for Object Detection
1: Input: B = {b1, b2, ..., bN}, S = {s1, s2, ..., sN}, Nt
   B is the list of initial detection boxes
   S contains the corresponding detection scores
   Nt is the NMS threshold
2: begin
3:   D <- {}
4:   while B != {} do
5:     m <- argmax S
6:     M <- b_m
7:     D <- D ∪ {M}; B <- B - {M}
8:     for b_i in B do
9:       if iou(M, b_i) >= Nt then
10:        B <- B - {b_i}; S <- S - {s_i}
11:      end
12:    end
13:  end
14:  return D, S
15: end
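The pseudocode in TABLE II translates directly into a few lines of array code. Below is a minimal NumPy sketch of the same greedy loop; the corner-format boxes and the iou helper are our assumptions rather than part of the paper.

    # Minimal NumPy sketch of the greedy NMS loop in TABLE II.
    # Boxes are assumed to be [x1, y1, x2, y2]; scores is a parallel 1-D array.
    import numpy as np

    def iou(box, boxes):
        # IoU of one box against an array of boxes.
        x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
        x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
        return inter / (area(box) + area(boxes) - inter)

    def nms(boxes, scores, nt):
        keep = []
        order = np.argsort(scores)[::-1]      # highest score first
        while order.size > 0:
            m = order[0]                      # box M with the highest score
            keep.append(m)                    # D <- D ∪ {M}
            rest = order[1:]                  # B <- B - {M}
            overlaps = iou(boxes[m], boxes[rest])
            order = rest[overlaps < nt]       # drop boxes with IoU >= Nt
        return keep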
E. Evaluation index

After the classification, it is necessary to evaluate the classification results. In addition to the commonly used accuracy, the evaluation criteria include precision, recall, the false alarm rate, and the missed alarm rate.

Accuracy indicates the proportion of samples that are correctly classified. The formula is as follows:

P_a = (N_TP + N_TN) / (N_TP + N_TN + N_FP + N_FN)    (2)

where N_TP is the number of positive-class samples correctly classified, N_TN is the number of negative-class samples correctly classified, N_FP is the number of negative-class samples classified as positive, and N_FN is the number of positive-class samples classified as negative.

III. EXPERIMENTS

In this section, we describe the training process, the testing process, the training parameters after tuning, and the experimental results of the test.

A. Training
Training the radar dataset with the YOLO V3 network is a
step-by-step process:
Step 1: A short-time Fourier transform is performed on the original echo data; the width of the window function is 0.5 s, which balances time resolution against frequency resolution.
Step 2: The time-frequency images are labeled using the GPS information of the radar dataset.
Step 3: The YOLO V3 network is pre-trained on Pascal VOC 2012 to obtain the network weight file yolov3.weights.
Step 4: Load the weights and tune the parameters, including the learning rate, momentum, batch size, and number of epochs.
There are 1212 images in total; 70% of them were used for training and the rest for testing. Here, we randomly selected 848 pictures as the training set and the remaining 364 pictures as the test set, and then loaded the weight file and tuned the parameters. The results of the tuning are given in TABLE III, together with the corresponding settings for LeNet-5 and YOLO V2.
TABLE III
THE RESULTS OF THE TUNING.

Parameter       LeNet-5   YOLO V2   YOLO V3
Learning Rate   1e-3      1e-4      1e-4
Momentum        0.9       0.9       0.9
Batch Size      64        16        8
Epoch           2000      1000      2000
Training Time   30 min    2 h       4 h
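For concreteness, the tuned YOLO V3 settings from TABLE III can be collected into a small configuration; the field names below are our own, not taken from the paper's code.

    # Tuned training settings for YOLO V3 (values from TABLE III).
    YOLO_V3_TRAIN = {
        "pretrained_weights": "yolov3.weights",  # from Pascal VOC pre-training
        "learning_rate": 1e-4,
        "momentum": 0.9,
        "batch_size": 8,
        "epochs": 2000,
    }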
We recorded and visualized the loss and accuracy values of YOLO V3 over the 2000 training epochs. Figure 5 and Figure 6 show the loss and accuracy curves during training, respectively.
Fig. 5. The loss curve during training; owing to the use of transfer learning, the initial loss value is relatively small.

Fig. 6. The accuracy curves during training; the red solid line represents the training accuracy and the blue solid line represents the test accuracy.

B. Testing

The following steps make up the process of testing a radar time-frequency image.
Step 1: Load the model saved during training, i.e., the yolov3.pb and yolov3.meta files, and feed the image into the model.
Step 2: Use the non-maximum suppression algorithm to discard the extra bounding boxes, leaving at most one bounding box.
Step 3: If there is a prediction box, the prediction result is 'sea clutter + boat'; otherwise it is 'sea clutter'.
One point about step 2 needs to be explained. After training, each picture yields (13 × 13 + 26 × 26 + 52 × 52) × 3 = 10647 predicted bounding boxes, and each bounding box contains six parameters: the horizontal and vertical coordinates of the upper-left corner, the horizontal and vertical coordinates of the lower-right corner, the confidence that the box contains a target, and the probability that the target is a boat. There are far too many prediction boxes, so the non-maximum suppression algorithm is needed to remove the extra ones. In addition, in the boat detection task there is only one type of target in the labels, the 'boat', so at most one prediction box needs to be kept.
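As a sanity check of the box bookkeeping above, the sketch below reproduces the box count and a minimal version of the final decision rule; the array layout and the 0.5 confidence threshold are our assumptions.

    # Box count across the three scales, and a sketch of the decision rule.
    import numpy as np

    n_boxes = (13 * 13 + 26 * 26 + 52 * 52) * 3
    print(n_boxes)  # 10647 candidate boxes per image

    def classify(pred, conf_thresh=0.5):
        # pred: (10647, 6) array of [x1, y1, x2, y2, confidence, p_boat].
        # If any box clears the threshold (after NMS at most one survives),
        # the image is labeled 'sea clutter + boat'.
        score = pred[:, 4] * pred[:, 5]   # objectness times boat probability
        return "sea clutter + boat" if np.any(score >= conf_thresh) else "sea clutter"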

The test system is Linux (Ubuntu 16.04); the CPU is a 4-core 3.4 GHz Intel i5-7500, the same as in the training system. The GPU is not used during testing.

The 364 test pictures comprise 182 pictures containing the boat and 182 pictures without it. The recognition accuracy is 94.89%, and the evaluation indices are listed in TABLE IV.

From TABLE IV we see that the YOLO V3 indicators perform best. Compared with LeNet-5, which classifies directly with a convolutional neural network, YOLO V3 improves accuracy and precision by 14.39 and 15.38 percentage points, respectively, while its missed alarm rate drops by 15.38 percentage points and its false alarm rate by 7.14 percentage points. Moreover, the former localizes the target at the same time as it classifies, while the latter can only be used for classification.

Fig. 7. More detailed results. The first and second rows are the test results of YOLO V3 and V2, respectively.

TABLE IV
THE TESTING RESULTS.

Index              LeNet-5   YOLO V2   YOLO V3
Accuracy Pa        0.8050    0.8267    0.9489
Precision Pp       0.8022    0.8352    0.9560
Missing Alarm Pma  0.1978    0.1648    0.0440
Recall Pr          0.8077    0.8182    0.8791
False Alarm Pfa    0.1923    0.1818    0.1209
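For reference, the indices of TABLE IV can be computed from a confusion matrix as in the sketch below. The counts are hypothetical, and the Pma/Pfa definitions follow the arithmetic of TABLE IV, where they are the complements of precision and recall.

    # Evaluation indices from a confusion matrix (hypothetical counts).
    def indices(tp, tn, fp, fn):
        return {
            "Accuracy Pa":       (tp + tn) / (tp + tn + fp + fn),  # formula (2)
            "Precision Pp":      tp / (tp + fp),
            "Missing Alarm Pma": fp / (tp + fp),   # 1 - precision, as in TABLE IV
            "Recall Pr":         tp / (tp + fn),
            "False Alarm Pfa":   fn / (tp + fn),   # 1 - recall, as in TABLE IV
        }

    print(indices(tp=160, tn=185, fp=8, fn=11))  # a hypothetical 364-image test set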


In Figure 7, we further demonstrate the specific results of the test. Figures 7(a)-7(d) are test results of YOLO V3, and Figures 7(e)-7(h) are test results of YOLO V2. For some simple cases, both YOLO V3 and V2 detect the boat, but the localization of V3 is more accurate, as shown in Figure 7(a) and Figure 7(e). For the cases of Figure 2(a) and Figure 2(b), YOLO V3 correctly detected the boat, but V2 did not.

IV. CONCLUSION

Based on a large number of experiments, this paper proposes a sea-surface small boat detection method for radar datasets built on the YOLO V3 network. The results show that this method can distinguish the spectra of the small boat and the sea clutter, and it achieves high prediction accuracy.

REFERENCES

[1] P. Shui, D. Li and S. Xu, "Tri-feature-based detection of floating small targets in sea clutter," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 2, pp. 1416-1430, 2014.
[2] N. Su, X. Chen, J. Guan and Y. Li, "Deep CNN-based radar detection for real maritime target under different sea states and polarizations," in International Conference on Cognitive Systems and Information Processing, IEEE, 2018.
[3] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[4] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, real-time object detection," available online at http://arxiv.org/abs/1506.02640, 2015.
[5] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," available online at http://arxiv.org/abs/1612.08242, 2016.
[6] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," available online at http://arxiv.org/abs/1804.02767, 2018.
