Small Boat Detection For Radar Image Datasets With YOLO V3 Network
We take a column of the original echo data for the short-time Fourier transform, and some of the resulting time-frequency images are shown in Figure 2. As can be seen from Figures 2(a) and 2(b), the sea clutter and the Doppler shift of a boat are sometimes mixed, and the Doppler frequency shift of the sea clutter is very strong.
In this paper, we propose a new method for the detection task based on time-frequency analysis and the YOLO V3 network. Experiments on the measured data show that our method can effectively complete the classification task. At the end of the article, the results of our experiments are compared with the classical convolutional neural network LeNet-5 [3].
II. METHOD
A. Overall Architecture
The overall architecture of our method is shown in Figure 3. The raw data is a two-dimensional complex-valued matrix. The horizontal axis has been processed by pulse compression and represents the distance range, while the vertical axis represents the pulse sequence. The images generated by STFT are saved as three-channel PNG images by pseudo-color processing; a minimal sketch of this step follows.
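For concreteness, the following is a minimal Python sketch of this dataset-generation step, assuming SciPy and Matplotlib are available. The sample rate, the random signal, and the file name are illustrative placeholders rather than values from the paper; only the 0.5 s window width (see Section III-A) is taken from the text.

import numpy as np
from scipy.signal import stft
import matplotlib.pyplot as plt

def save_tf_image(echo_column, fs, out_path="tf_image.png"):
    # STFT of the slow-time (pulse) sequence for one range cell.
    nperseg = int(0.5 * fs)                  # 0.5 s window width
    f, t, Z = stft(echo_column, fs=fs, nperseg=nperseg,
                   return_onesided=False)    # complex input: two-sided
    # Log-magnitude spectrogram with zero Doppler centered.
    power_db = 20 * np.log10(np.abs(np.fft.fftshift(Z, axes=0)) + 1e-12)
    # Pseudo-color rendering to a PNG via a Matplotlib colormap.
    plt.imsave(out_path, power_db, cmap="jet", origin="lower")

# Example: one range cell, 4096 pulses at a notional 1 kHz pulse rate.
fs = 1000.0
echo_column = np.random.randn(4096) + 1j * np.random.randn(4096)
save_tf_image(echo_column, fs)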
Fig. 3. The architecture of our approach. The left and right parts of the picture show the procedure of generating time-frequency image datasets and how to detect small boats through convolutional neural networks, respectively. The upper right part of the figure is a flow chart for transfer learning on the Pascal VOC dataset.

During the forward training process, the images from the training datasets are put into the CNN, which consists of a series of convolutional layers, max-pooling layers, and fully connected layers that produce suggested features. We obtain the initial weights of the network by training on the Pascal VOC dataset. During the back-propagation process, we calculate the loss. We put the testing images into the saved model to predict whether each image is 'sea clutter' or 'boat + sea clutter'.
B. YOLO V3 Network

The network architecture of YOLO V3 is shown in Figure 4 [6]; it consists of 75 convolutional layers, 20 shortcut layers, and 2 upsample layers. YOLO V3 improves on YOLO [4] and YOLO V2 [5] mainly in two aspects: one is multi-scale prediction similar to FPN; the other is a better basic classification network and classifier similar to ResNet. These two improvements are detailed below.
1) Multi-scale prediction: Low-level features carry relatively weak semantic information but accurate target location information, while high-level features carry rich semantic information but relatively rough location information. In YOLO V3, three boxes are predicted at each scale, and the anchor design still uses clustering, obtaining 9 cluster centers that are divided among the three scales according to their size (a sketch of this clustering step follows). YOLO V3 adopts upsampling to combine three scales (13×13, 26×26, and 52×52), performs detection independently on the fused feature maps of the multiple scales, and thereby improves detection of the target.
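As a concrete illustration of the anchor-design step, the sketch below clusters ground-truth box sizes into 9 anchors with k-means under the 1 − IoU distance used in the YOLO papers. The input boxes here are random stand-ins for real label statistics; only the number of clusters and the three-scale split are taken from the text.

import numpy as np

def iou_wh(boxes, anchors):
    # IoU of (width, height) pairs with both boxes centered at the origin.
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None] +
             (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # min 1 - IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    # Sorted by area, the 9 anchors split into three groups of three,
    # one group per prediction scale (52x52, 26x26, 13x13).
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

boxes = np.abs(np.random.default_rng(1).normal(60, 25, size=(500, 2)))
print(kmeans_anchors(boxes))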
2) Basic network Darknet-53: Unlike Darknet-19, YOLO V3 uses a 53-layer convolutional network built by stacking residual units. There are two main changes to the Darknet-53 convolutional layers: one is to replace the original block with a simplified residual block (sketched below); the other is to use concatenated convolutions to increase the number of channels. The basic network comparison of YOLO V2 and V3 is shown in TABLE I.
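The simplified residual block can be pictured as a 1×1 bottleneck convolution followed by a 3×3 convolution, joined by a shortcut connection. The PyTorch rendering below is an illustrative sketch, not the authors' code; the batch-normalization and leaky-ReLU details are the standard Darknet-53 design rather than anything stated in the text.

import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    # One 'Conv 1x1 -> Conv 3x3 -> Resid' unit from TABLE I.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, half, kernel_size=1, bias=False),
            nn.BatchNorm2d(half),
            nn.LeakyReLU(0.1),
            nn.Conv2d(half, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)   # the shortcut layer

# The '2x [Conv 64 1; Conv 128 3; Resid]' stage of TABLE I:
stage = nn.Sequential(DarknetResidual(128), DarknetResidual(128))
print(stage(torch.randn(1, 128, 64, 64)).shape)  # torch.Size([1, 128, 64, 64])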
TABLE I
THE BASIC NETWORK COMPARISON OF YOLO V2 AND V3.

Darknet-19 (YOLO V2):
Type   Filter  Size  Out
Conv   32      3     224
Maxp           2     112
Conv   64      3     112
Maxp           2     56
Conv   128     3     56
Conv   64      1     56
Conv   128     3     56
Maxp           2     28
Conv   256     3     28
Conv   128     1     28
Conv   256     3     28
Maxp           2     14
Conv   512     3     14
Conv   256     1     14
Conv   512     3     14
Conv   256     1     14
Conv   512     3     14
Maxp           2     7
Conv   1024    3     7
Conv   512     1     7
Conv   1024    3     7
Conv   512     1     7
Conv   1024    3     7
Conv   1000    1     7
AvgP           Glo   1k
SofM

Darknet-53 (YOLO V3):
Tis  Type   Filter  Size  Out
     Conv   32      3     256
     Conv   64      3     128
1×   Conv   32      1
     Conv   64      3
     Resid                128
     Conv   128     3     64
2×   Conv   64      1
     Conv   128     3
     Resid                64
     Conv   256     3     32
8×   Conv   128     1
     Conv   256     3
     Resid                32
     Conv   512     3     16
8×   Conv   256     1
     Conv   512     3
     Resid                16
     Conv   1024    3     8
4×   Conv   512     1
     Conv   1024    3
     Resid                8
     AvgP                 Glo
     FC                   1k
     SofM

In TABLE I, 'Conv', 'Maxp', 'AvgP', 'SofM', 'Out', 'Tis', 'Resid', 'Glo', and 'FC' denote convolutional layer, max pooling layer, average pooling layer, soft-max layer, output size, number of repetitions, residual layer, global, and fully connected layer, respectively.
Fig. 4. YOLO V3 network structure.
C. Time Complexity of the Network

The time complexity of the network is analyzed from the size of the input and output of each layer; only the number of multiplication operations is counted. The amount of computation can be expressed by the following formula:

O(N, F, n) = \sum_{i=1}^{75} N_i^{out} \cdot F_i \cdot (n_i \cdot n_i \cdot F_{i-1})    (1)

where N_i^{out} is the output image size of the i-th layer, F_i is the number of convolution kernels in the i-th layer (the number of channels), and n_i is the size of the i-th layer's convolution kernel. The total computation of the YOLO V3 network obtained this way is 65.290 BFLOPS; a worked instance is sketched below.
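As a worked instance of Eq. (1), the snippet below counts the multiplications of the first few Darknet-53 layers from TABLE I, reading N_i^{out} as the number of output pixels (the square of the side length in the 'Out' column). The layer selection and the 256×256 input size are for illustration only.

# (output side length, filters F_i, kernel n_i, input channels F_{i-1})
layers = [
    (256, 32, 3, 3),
    (128, 64, 3, 32),
    (128, 32, 1, 64),
    (128, 64, 3, 32),
]

total = sum(side * side * f * (n * n * f_prev)
            for (side, f, n, f_prev) in layers)
print(f"{total / 1e9:.3f} billion multiplications in the first four layers")
# Applying the same sum over all 75 convolutional layers is how the
# 65.290 BFLOPS total quoted above is obtained.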
D. Non-maximum suppression algorithm

Non-maximum suppression (NMS) is widely used in target detection algorithms. Its purpose is to eliminate redundant candidate boxes and find the best object detection location. Suppose there is a set of candidate boxes B and a corresponding set of scores S. First, we find the index m with the highest score and take the corresponding box M; then we remove M from B and add it to the set D; next, every remaining box in B whose overlap with M is larger than the threshold Nt is deleted from B; finally, these steps are repeated until B is empty. The pseudocode is in TABLE II, and a runnable rendering follows it.
TABLE II
THE NON-MAXIMUM SUPPRESSION ALGORITHM

NMS Algorithm for Object Detection
1: Input: B = {b1, b2, ..., bN}, S = {s1, s2, ..., sN}, Nt
     B is the list of initial detection boxes
     S contains the corresponding detection scores
     Nt is the NMS threshold
2: begin
3:   D ← {}
4:   while B ≠ ∅ do
5:     m ← argmax S
6:     M ← bm
7:     D ← D ∪ {M}; B ← B − {M}
8:     for bi in B do
9:       if iou(M, bi) ≥ Nt then
10:        B ← B − {bi}; S ← S − {si}
11:      end
12:    end
13:  end
14:  return D, S
15: end
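For reference, here is a runnable NumPy rendering of the TABLE II pseudocode. The corner-format boxes (x1, y1, x2, y2), the iou helper, and the sample inputs are illustrative assumptions; TABLE II itself does not fix a box representation.

import numpy as np

def iou(box, boxes):
    # Intersection-over-union of one box against an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(np.asarray(box)) + area(boxes) - inter)

def nms(B, S, Nt=0.5):
    B, S = np.asarray(B, float), np.asarray(S, float)
    D, D_scores = [], []
    while len(B) > 0:                        # line 4: while B != {}
        m = int(np.argmax(S))                # line 5: highest score
        M = B[m]                             # line 6
        D.append(M); D_scores.append(S[m])   # line 7: D <- D u {M}
        B = np.delete(B, m, axis=0); S = np.delete(S, m)
        keep = iou(M, B) < Nt if len(B) else np.array([], bool)
        B, S = B[keep], S[keep]              # lines 8-11: drop overlaps
    return np.array(D), np.array(D_scores)

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, Nt=0.5))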
E. Evaluation index

After the classification, it is necessary to evaluate the classification results. In addition to the commonly used correct rate (accuracy), the evaluation criteria include precision, the false alarm rate, and the missed alarm rate.

Accuracy indicates the proportion of samples that are correctly classified. The formula is as follows:

P_a = \frac{N_{TP} + N_{TN}}{N_{TP} + N_{TN} + N_{FP} + N_{FN}}    (2)

where N_{TP} is the number of positive-class samples correctly classified, N_{TN} is the number of negative-class samples correctly classified, N_{FP} is the number of negative-class samples classified as positive, and N_{FN} is the number of positive-class samples classified as negative. These indices can be computed directly, as sketched below.
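A direct transcription of Eq. (2), together with standard definitions of the other indices (which the text names but does not write out), might look as follows; the confusion counts are made-up illustrative numbers, not the paper's results.

def evaluation_indices(n_tp, n_tn, n_fp, n_fn):
    accuracy = (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)   # Eq. (2)
    precision = n_tp / (n_tp + n_fp)                  # assumed standard
    false_alarm_rate = n_fp / (n_fp + n_tn)           # assumed standard
    missed_alarm_rate = n_fn / (n_fn + n_tp)          # assumed standard
    return accuracy, precision, false_alarm_rate, missed_alarm_rate

print(evaluation_indices(n_tp=170, n_tn=175, n_fp=7, n_fn=12))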
III. EXPERIMENTS

In this section, we describe the training process, the testing process, the training parameters after tuning, and the experimental results of the test.
A. Training
Training the radar dataset with the YOLO V3 network is a
step-by-step process:
Step 1: A short-time Fourier transform is performed on the original echo data; the width of the window function is 0.5 s, which balances time resolution against frequency resolution.
Step 2: The time-frequency images are labeled using the GPS information of the radar data set.
Step 3: The YOLO V3 network is pre-trained on Pascal VOC 2012 to obtain the network weight file yolov3.weights.
Step 4: The weights are loaded and the parameters are adjusted, including the learning rate, momentum, batch size, and number of epochs.
There are 1212 images in total; 70% of them were used for training and the rest for testing. Specifically, we randomly extract 848 pictures as the training set and the remaining 364 pictures as the test set, and then load the weight file and tune the parameters. The results of the tuning are given in TABLE III, together with those for LeNet-5 and YOLO V2.
TABLE III
THE RESULTS OF THE ADJUSTMENT.

Parameter       LeNet-5   YOLO V2   YOLO V3
Learning Rate   1e-3      1e-4      1e-4
Momentum        0.9       0.9       0.9
Batch Size      64        16        8
Epoch           2000      1000      2000
Training Time   30 min    2 h       4 h
We recorded and visualized the loss and accuracy values of YOLO V3 over the 2000 training epochs. Figure 5 and Figure 6 show the curves of loss and accuracy during training, respectively.

Fig. 5. The curve of loss during training; due to the use of transfer learning, the initial loss value is relatively small.

Fig. 6. Accuracy curve during training; the red solid line represents training accuracy, and the blue solid line represents test accuracy.
B. Testing

The following steps describe the process of testing the radar time-frequency images.
Step 1: Load the model saved during training, i.e., the yolov3.pb and yolov3.meta files, and feed the image into the model.
Step 2: Use the non-maximum suppression algorithm to discard the extra bounding boxes, leaving at most one bounding box.
Step 3: If a prediction box remains, the prediction result is 'sea clutter + boat'; otherwise it is 'sea clutter'.

In Step 2, one point needs to be explained. After training, each picture yields (13 × 13 + 26 × 26 + 52 × 52) × 3 = 10647 predicted bounding boxes, and each bounding box contains six parameters: the horizontal and vertical coordinates of the upper-left corner, the horizontal and vertical coordinates of the lower-right corner, the confidence that the box contains a target, and the probability that the target is a boat. There are far too many prediction boxes, so the non-maximum suppression algorithm is needed to remove the extra ones. In addition, in the boat detection task there is only one type of target in the labels, the 'boat', so we only need to keep at most one prediction box. A sketch of this decision rule is given below.

The test system is Linux, specifically Ubuntu 16.04, and the CPU is a 4-core 3.4 GHz Intel i5-7500, the same as the training system. The GPU is not used during testing.

The 364 test pictures comprise 182 pictures with a ship and 182 pictures without a ship. The recognition accuracy is 94.89%, and the evaluation indices are shown in TABLE IV.

From TABLE IV we see that the YOLO V3 indicators have the best performance. Compared to LeNet-5, which classifies directly with a convolutional neural network, YOLO V3 improves accuracy and precision by 14.39 and 15.38 percentage points, respectively; its false alarm rate drops by 6.45 percentage points and its missed alarm rate by 7.14 percentage points. Moreover, YOLO V3 performs localization at the same time as classification, whereas LeNet-5 only performs classification.
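A minimal sketch of this decision rule is given below. Since at most one box is kept, thresholding followed by a top-1 selection is enough; the (10647, 6) prediction array, the 0.5 score threshold, and the random input are illustrative assumptions rather than the authors' settings.

import numpy as np

def classify_image(raw_preds, conf_thresh=0.5):
    # Columns: x1, y1, x2, y2, objectness confidence, P(boat).
    scores = raw_preds[:, 4] * raw_preds[:, 5]   # box confidence * class prob
    m = int(np.argmax(scores))
    if scores[m] <= conf_thresh:
        return "sea clutter", None               # no prediction box survives
    return "sea clutter + boat", raw_preds[m, :4]

label, box = classify_image(np.random.rand(10647, 6))  # placeholder output
print(label, box)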
Fig. 7. More detailed results. The first and second rows are the test results of YOLO V3 and YOLO V2, respectively.
IV. CONCLUSION

Through a large number of experiments, this paper proposes a small-boat detection method for sea-surface radar data sets based on the YOLO V3 network. The results show that the method can distinguish the spectrum of a small boat from that of sea clutter and achieves high prediction accuracy.
REFERENCES
[1] P. Shui, D. Li, and S. Xu, "Tri-feature-based detection of floating small targets in sea clutter," IEEE Transactions on Aerospace and Electronic Systems, vol. 50, no. 2, pp. 1416-1430, 2014.
[2] N. Su, X. Chen, J. Guan, and Y. Li, "Deep CNN-based radar detection for real maritime target under different sea states and polarizations," in International Conference on Cognitive Systems and Information Processing. IEEE, 2018.