SiamRCR: Reciprocal Classification and Regression for Visual Object Tracking
[Figure 2 schematic; the annotated tensor sizes include 256×31×31, 256×25×25 and 4×25×25, and the regression branch is trained with an IoU loss.]
Figure 2: Our proposed siamese framework for Reciprocal Classification and Regression (SiamRCR). It consists of a feature extractor, a feature combination module, and a three-branch siamese head structure with novel reciprocal links over the individual losses. Note that the three links between the branches are only used for loss calculation during training and do not exist during inference.
severe since the independent optimization issue of classification and regression remains unsolved.

In this paper, we propose a novel solution to alleviate the misalignment, which builds a reciprocal relationship between classification and regression so that they can be optimized in a synchronized way and generate accuracy-consistent outputs. Since the reciprocal relationship is the key to its success, we name our model Siamese Network based Reciprocal Classification and Regression, with SiamRCR as its abbreviation. The overall framework of SiamRCR is shown in Figure 2. Besides the commonly used classification branch and regression branch, we add two links (the classification assistance link and the regression assistance link) to build the reciprocal relationship between them during model training. Classification assists regression by weighting the regression loss with the classification confidence, so that regression can focus more on highly confident positions for more precise localization. Regression assists classification by weighting the classification loss with the localization score derived from the regressed bounding box and the ground-truth box, forcing the classification score to be more consistent with the regression accuracy. Since no such localization score is available during testing/inference (the ground-truth bounding box is unknown), a localization branch is added to predict a localization score at each position, so that the prediction can serve as the localization score's approximation and keep inference consistent with the trained model. Therefore, the multiplication of the classification confidence and the localization prediction confidence generates a new tracking score/confidence map during testing, which ensures consistency with the training process.

Besides the key idea of reciprocal classification and regression, two other designs also contribute to the effectiveness and superiority of our model. One is that we choose to build on the anchor-free tracking mechanism, so that the whole model can be one-stage, clean and efficient, with fewer hyper-parameters. The other is that our model predicts the center offset and width/height of the target, which is more straightforward and efficient than other VOT methods.

The main contributions of this work are as follows:
1. We propose a novel tracking model that solves the long-standing classification and regression misalignment problem with new simple, intuitive and efficient designs.
2. It presents a new way to link the losses of multiple branches and to make the training and inference processes more consistent, which may provide inspiration for other tasks.
3. Our SiamRCR achieves state-of-the-art performance on six public benchmarks, including GOT-10k, TrackingNet, LaSOT, OTB-2015, VOT-2018 and VOT-2019. The framework is built on an anchor-free mechanism with a more direct center offset and width/height prediction, running at 65 FPS.

2 Related Works

2.1 Siamese Network based Framework
Compared with traditional correlation filter tracking methods, recent siamese network based methods have achieved superior performance since the pioneering work SiamFC [Bertinetto et al., 2016]. More recent studies [Li et al., 2018; Li et al., 2019a] introduce progress from object detection into object tracking for more accurate location prediction. Though these works have explored several important aspects, the accuracy misalignment problem between classification and regression has been overlooked. Ocean [Zhang et al., 2020] partially addresses a similar issue and presents a feature alignment module to alleviate it by utilizing the prediction of the regression branch to refine the classification branch. However, this cannot eliminate the misalignment problem, as the alignment is monodirectional. Differently, our SiamRCR focuses on the misalignment problem and proposes a simple, intuitive and more thorough solution with bidirectional, reciprocal links and a novel complementary branch that makes training and inference consistent.

2.2 Anchor-Free Tracking Mechanism
Anchor-free methods have recently attracted widespread attention in the object detection field [Law and Deng, 2018; Duan et al., 2019; Tian et al., 2019; Zhou et al., 2019] due to their simplicity and efficiency.
Naturally, the anchor-free mechanism has also been introduced to the tracking field [Xu et al., 2020; Chen et al., 2020; Zhang et al., 2020]. Multiple object tracking (MOT) is an area related to VOT [Peng et al., 2020a; Peng et al., 2020c]. In MOT, based on CenterNet [Zhou et al., 2019], CenterTrack [Zhou et al., 2020] obtains high performance by predicting the center point, width/height and center offset of each object. To the best of our knowledge, SiamRCR is the first VOT method to predict the center offset and width/height of the target, which is more straightforward and efficient than previous designs.
2.3 Dynamic Sample Re-weighting
Existing trackers [Li et al., 2018; Li et al., 2019a; Xu et al., 2020; Peng et al., 2020b] directly use heuristic rules, e.g., the Focal Loss [Lin et al., 2017], to define the labels of samples and their weights. PrDiMP [Danelljan et al., 2020] models the uncertainty of the labels. Such predefined static weights lead to the accuracy misalignment problem between classification and regression, which harms the final tracking accuracy. In our SiamRCR, by contrast, the sample weights for each loss become dynamic, as they are conditioned on the other branch's outputs, which keep changing during the interaction. This dynamic sample re-weighting mechanism is novel and critical to the effectiveness of our model.
2.4 Localization Prediction Strategy
In the object detection area, IoU-Net [Jiang et al., 2018] predicts the IoU between each detected box and the matched ground-truth to guide the box regression; it is class-specific and thus not directly suitable for VOT. ATOM [Danelljan et al., 2019] trains a target-specific IoU prediction network offline, and SiamFC++ [Xu et al., 2020] estimates the bounding box quality based on centerness [Tian et al., 2019]. However, both the purpose and the implementation of the localization branch in our SiamRCR are different. Our localization branch is a natural auxiliary of the reciprocal classification and regression structure, which is itself a better solution than existing works, whereas the IoU network in those works is the main component. Moreover, our localization branch is simple and lightweight, which ensures the effectiveness and efficiency of the algorithm simultaneously.
3 Proposed Method

3.1 Overview
The proposed siamese tracking framework is shown in Figure 2. Different from previous anchor-based methods [Li et al., 2018; Li et al., 2019a], which rely on pre-defined anchor sizes and scales, our method is anchor-free. It operates as follows. First, the target template and the current frame are both fed into the shared feature extractor (using the backbone of [He et al., 2016]) to generate their corresponding features. Then, these features are combined through a depth-wise cross-correlation operation to create correlated feature maps, which are further fed into the corresponding classification and regression branches of the anchor-free tracking head. The built-in reciprocal links dynamically re-weight the samples for computing each loss of the two branches. A new localization branch grows from the regression branch to predict the localization accuracy. Its output serves as the approximation of the localization score during inference, generating a more accurate tracking score together with the classification confidence. The key components are detailed below.
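The depth-wise cross-correlation step can be implemented as a grouped convolution. Below is a minimal PyTorch sketch (our own illustration, not the authors' released code); the 7×7 template size is an assumption inferred from the 256×31×31 search feature and 256×25×25 response sizes annotated in Figure 2.

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search_feat, template_feat):
    """Depth-wise cross-correlation: the template feature acts as a
    per-channel convolution kernel slid over the search feature."""
    b, c, h, w = search_feat.shape
    # Fold the batch into the channel dimension and use a grouped conv
    # so each (sample, channel) pair is correlated independently.
    x = search_feat.reshape(1, b * c, h, w)
    kernel = template_feat.reshape(b * c, 1, *template_feat.shape[2:])
    out = F.conv2d(x, kernel, groups=b * c)
    return out.reshape(b, c, out.shape[2], out.shape[3])

# A 256x31x31 search feature correlated with an (assumed) 256x7x7
# template yields a 256x25x25 response map, matching Figure 2.
search = torch.randn(1, 256, 31, 31)
template = torch.randn(1, 256, 7, 7)
print(depthwise_xcorr(search, template).shape)  # torch.Size([1, 256, 25, 25])
```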
3.2 Anchor-Free Tracking with Box Regression
For the $i$-th input pair from the training set, let $F_i \in \mathbb{R}^{C \times H \times W}$ denote the feature map of the classification branch and $s$ be the total stride. The ground-truth bounding box for the current frame is defined as $B^*_{x,y} = (x^*_0, y^*_0, x^*_1, y^*_1)$, i.e., the coordinates of the bounding box. For each location $(x, y)$ on the feature map $F_i$, we can map it back onto the input frame to get the corresponding image coordinates $(\lfloor s/2 \rfloor + xs, \lfloor s/2 \rfloor + ys)$. Different from anchor-based trackers, which consider each location on the input frame as the center of anchor boxes and regress the target bounding boxes w.r.t. the anchor boxes, we directly regress the target box's width and height values and the center offsets at the location. In this way, our tracker views locations as training samples instead of anchor boxes, following the paradigm of FCNs [Long et al., 2015] for semantic segmentation.

Specifically, the sample at location $(x, y)$ is considered positive if it falls within a radius $r$ of the ground-truth box center, where the radius is a hyper-parameter of the proposed method. Otherwise, it is a negative sample (background). Besides the label (denoted by $c^*_{x,y}$) for foreground-background classification, we also have a 4D real vector $t^*_{x,y} = (w^*, h^*, \Delta x^*, \Delta y^*)$ indicating the regression target for the localization. Here, $w^*$ and $h^*$ are the width and height of the target ground-truth bounding box, while $\Delta x^*$ and $\Delta y^*$ are the center offsets between the current location and the ground-truth box. Formally, if location $(x, y)$ is associated with the ground-truth box $B^*_{x,y}$, which has width $w^*$ and height $h^*$, then we have
$$w^* = x^*_1 - x^*_0, \quad h^* = y^*_1 - y^*_0, \quad \Delta x^* = (x^*_0 + x^*_1)/2 - x, \quad \Delta y^* = (y^*_0 + y^*_1)/2 - y. \quad (1)$$

Corresponding to the training target, SiamRCR predicts a classification confidence score $p^{cls}_{x,y}$, a regressed 4D vector $t_{x,y} = (w, h, \Delta x, \Delta y)$ for the bounding box, and a localization confidence score $p^{loc}_{x,y}$ denoting the predicted localization accuracy. It is worth noting that SiamRCR has 5× fewer network parameters than the popular anchor-based tracker SiamRPN [Li et al., 2018], which uses 5 anchor boxes per location.
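As a concrete illustration, the sketch below (our own paraphrase, not the released implementation) builds the classification labels and the regression targets of Eq. (1) on a feature-map grid. The offsets are taken in the mapped image coordinates, and since the paper does not specify the distance metric for the radius test, the Chebyshev distance is used here as an assumption; `stride` and `radius` mirror $s$ and $r$ above.

```python
import numpy as np

def build_targets(gt_box, feat_size, stride=8, radius=2):
    """gt_box = (x0, y0, x1, y1) in image coordinates. Returns per-location
    labels c* and regression targets t* = (w*, h*, dx*, dy*)."""
    h, w = feat_size
    # Image coordinates of each feature-map location: floor(s/2) + k*s.
    xs = stride // 2 + np.arange(w) * stride
    ys = stride // 2 + np.arange(h) * stride
    grid_x, grid_y = np.meshgrid(xs, ys)

    x0, y0, x1, y1 = gt_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2

    # Eq. (1): width/height and center offsets w.r.t. each location.
    targets = np.stack([
        np.full((h, w), x1 - x0),  # w*
        np.full((h, w), y1 - y0),  # h*
        cx - grid_x,               # dx*
        cy - grid_y,               # dy*
    ], axis=-1)

    # A location is positive if it lies within `radius` grid cells of the
    # box center; radius * stride pixels on the frame, i.e. R = 8r for
    # stride 8 (distance metric assumed Chebyshev for illustration).
    dist = np.maximum(np.abs(grid_x - cx), np.abs(grid_y - cy))
    labels = (dist <= radius * stride).astype(np.float32)
    return labels, targets

labels, targets = build_targets((60.0, 48.0, 188.0, 176.0), feat_size=(25, 25))
print(labels.sum(), targets.shape)  # number of positives, (25, 25, 4)
```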
3.3 Reciprocal Classification and Regression
In existing siamese network tracking models, the classification and regression branches operate in parallel and are optimized independently with their own losses, which aggravates the accuracy misalignment of their results. In fact, when a regressed bounding box has low accuracy, the corresponding classification score should not be high, because if that position becomes the winner in classification confidence, the bad localization will lead to bad tracking performance. And when a bounding box has a low classification score, there is no point in the regression trying hard to reach a high localization accuracy, for it will not be the winner anyway. Therefore, these two branches need to talk to each other to align their accuracy.
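As a rough illustration of how such reciprocal links can be wired, here is a minimal sketch, not the paper's exact loss formulation: binary cross-entropy and a (1 − IoU) regression loss are stand-ins, and `iou_fn` is an assumed helper returning the IoU of each box pair. At test time, the product of the classification and localization scores then stands in for the weights used in training.

```python
import torch
import torch.nn.functional as F

def reciprocal_losses(p_cls, p_loc, boxes, gt_boxes, labels, iou_fn):
    """p_cls, p_loc: per-sample scores in (0, 1); boxes, gt_boxes: (N, 4);
    labels: (N,) with 1 for positive locations. A sketch of the two
    assistance links realized as dynamic per-sample loss weights."""
    pos = labels > 0
    # Localization score of each positive sample: IoU between its
    # regressed box and the matched ground-truth box.
    iou = iou_fn(boxes[pos], gt_boxes[pos])

    # Regression assistance link: weight the classification loss with the
    # localization score, so p_cls is pushed to match real box accuracy.
    bce = F.binary_cross_entropy(p_cls, labels.float(), reduction="none")
    cls_w = torch.ones_like(bce)
    cls_w[pos] = iou.detach()
    loss_cls = (cls_w * bce).mean()

    # Classification assistance link: weight the regression (IoU) loss
    # with the classification confidence at each positive location.
    loss_reg = (p_cls[pos].detach() * (1.0 - iou)).mean()

    # The localization branch learns to predict the IoU, so that at test
    # time p_cls * p_loc can stand in for the unavailable ground-truth IoU.
    loss_loc = F.binary_cross_entropy(p_loc[pos], iou.detach())
    return loss_cls + loss_reg + loss_loc
```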
Table 1: Ablation study on the key components of SiamRCR (Localization Branch, Reciprocal Links), reporting AO↑ on GOT-10k.

r    R     AO↑     SR0.5↑   SR0.75↑
3    24    0.619   0.747    0.459
4    32    0.612   0.743    0.446
5    40    0.611   0.740    0.474

Table 2 (partial): Comparative results for different radius r on GOT-10k; R = 8r is the corresponding radius in the input frame.
Figure 4: The correlation between IoU scores and the tracking score, together with the Pearson correlation coefficient R. (a) Baseline model, where the tracking score is the classification score. (b) Using the centerness proposed in FCOS [Tian et al., 2019] as the tracking score. (c) Baseline + localization branch. (d) SiamRCR.
Testing Phase. We utilize the same offline testing strategy as [Xu et al., 2020]. The ground-truth after augmentation in the first frame is used as the exemplar image, and we keep it unchanged during the whole testing phase. A cosine window [Bertinetto et al., 2016] is multiplied on the confidence map. We adopt a linear interpolation updating strategy on the scale prediction to make the final box change smoothly over time. We evaluate SiamRCR on six public benchmarks following their corresponding protocols: GOT-10k [Huang et al., 2019], TrackingNet [Müller et al., 2018], LaSOT [Fan et al., 2019], OTB-2015 [Wu et al., 2015], VOT-2018 [Kristan et al., 2018] and VOT-2019 [Kristan et al., 2019].
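A hedged sketch of this test-time post-processing follows; the additive window combination, the window weight 0.3 and the scale learning rate 0.3 are illustrative placeholders, not the paper's tuned values.

```python
import numpy as np

def select_box(score_map, boxes, prev_wh, window_weight=0.3, scale_lr=0.3):
    """score_map: (H, W) tracking scores (p_cls * p_loc); boxes: (H, W, 4)
    as (w, h, dx, dy). Applies a cosine window, then smooths the scale."""
    h, w = score_map.shape
    hann = np.outer(np.hanning(h), np.hanning(w))  # cosine window
    score = (1 - window_weight) * score_map + window_weight * hann

    y, x = np.unravel_index(np.argmax(score), score.shape)
    bw, bh, dx, dy = boxes[y, x]

    # Linear interpolation on the scale prediction for temporal smoothness.
    bw = (1 - scale_lr) * prev_wh[0] + scale_lr * bw
    bh = (1 - scale_lr) * prev_wh[1] + scale_lr * bh
    return (x, y, dx, dy), (bw, bh)
```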
4.2 Ablation Study
Component. The ablation study results on the key components of SiamRCR are presented in Table 1. The baseline (I), without the localization branch and reciprocal links, obtains an AO (Average Overlap) of 0.594. With the localization branch, SiamRCR can predict the localization score of the regressed bounding box, making the final tracking score more consistent with the real IoU than the classification score. Multiplying the localization score alone (II) improves the performance by 3.54% relative to the baseline, showing the significance of the accuracy misalignment between classification and regression. Building the reciprocal assistance links alone (III) also gains a relative improvement of 2.86% over the baseline, proving that the misalignment between classification and regression can be alleviated. When these two components are both adopted, the relative improvement is more remarkable: 5.05%, nearly equal to the direct sum of both performance gains. This confirms that the localization branch is consistent with the reciprocal links, serving well as the replacement of the regression assistance link for inference. To better demonstrate how well our SiamRCR alleviates the accuracy misalignment problem, we illustrate the correlation between the IoU of the regressed bounding box (w.r.t. the matched ground-truth) and the tracking score in Figure 4. As shown in Figure 4(a), the Pearson correlation coefficient between IoU and tracking score is only 0.38, showing that the classification score is indeed not consistent with the real localization accuracy. Figures 4(c) and 4(d) show that both the localization branch and the reciprocal links are effective and necessary, and that they collaborate well with each other.
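The correlation analysis in Figure 4 can be reproduced in a few lines; this is a generic sketch where `ious` and `tracking_scores` are placeholder arrays standing in for values collected from the tracker's outputs.

```python
import numpy as np

# ious: IoU of each regressed box with its matched ground-truth.
# tracking_scores: the corresponding tracking scores (e.g., p_cls * p_loc).
ious = np.random.rand(1000)             # placeholder data
tracking_scores = np.random.rand(1000)  # placeholder data

# Pearson correlation coefficient R between the two quantities.
r = np.corrcoef(ious, tracking_scores)[0, 1]
print(f"Pearson R = {r:.2f}")
```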
Predicted IoU vs. Centerness. Centerness is a pre-defined label that indicates the distance between candidates and the target center. Some object detection [Tian et al., 2019] and object tracking [Xu et al., 2020] algorithms utilize centerness to assist localization. In our SiamRCR, we discard this kind of fixed prior and use the predicted IoU as dynamic, supervised localization information. Thus, our localization branch can estimate the localization confidence more accurately. As shown in Figure 4(b) and (c), our localization prediction mechanism alleviates the misalignment between classification and regression better than centerness does.

Radius. The radius r is a significant hyper-parameter in our proposed anchor-free framework: it decides the division between positive and negative samples during training. We conduct a comparative experiment in terms of r, with the results shown in Table 2, where R is the corresponding radius of r in the original input video frame, i.e., 8 times r. When r = 1, the performance on GOT-10k is poor since the number of positive samples is too small. When r = 2, our SiamRCR achieves the best performance. When r = 4 or r = 5, the positive samples are redundant, since some candidates far from the target center are labeled positive; therefore, the performance drops compared with r = 2 or r = 3.

4.3 Comparison with the State-of-the-Art
We compare our SiamRCR with 18 state-of-the-art trackers. The datasets and experimental settings are detailed below. Due to space limitations, the experiments on OTB-2015 and VOT-2018 are presented in the supplementary.
Tracker                              Succ.↑   Prec.↑   N-Prec.↑
SiamFC [Bertinetto et al., 2016]     0.559    0.518    0.652
ECO [Danelljan et al., 2017]         0.554    0.492    0.618
UPDT [Zhang et al., 2019]            0.611    0.557    0.702
ATOM [Danelljan et al., 2019]        0.703    0.648    0.771
SiamRPN++ [Li et al., 2019a]         0.733    0.694    0.800
DiMP50 [Bhat et al., 2019]           0.740    0.687    0.801
KYS [Bhat et al., 2020]              0.740    0.688    0.800
SiamAttn [Yu et al., 2020]           0.752    0.715    0.817
SiamFC++ [Xu et al., 2020]           0.754    0.705    0.800
SiamRCR (ours)                       0.764    0.716    0.818

Table 3: Comparison of tracking results on the TrackingNet benchmark. Red and blue fonts indicate the best and second-best results respectively.

Tracker                              EAO↑     A↑       R↓
SPM [Wang et al., 2019a]             0.275    0.577    0.507
SiamRPN++ [Li et al., 2019a]         0.285    0.599    0.482
SiamMask [Wang et al., 2019b]        0.287    0.594    0.461
SiamBAN [Chen et al., 2020]          0.327    0.602    0.396
Ocean [Zhang et al., 2020]           0.327    0.590    0.376
SiamRCR (ours)                       0.336    0.602    0.386

Table 5: Comparison of tracking results on the VOT-2019 benchmark (EAO, Accuracy and Robustness).
GOT-10k. The evaluation follows the protocols in [Huang et al., 2019]. For a fair comparison, we train SiamRCR only on the train subset, which consists of about 10,000 sequences, and test it on the test subset of 180 sequences. As shown in Figure 5, our SiamRCR achieves an AO of 0.624, the best among the evaluated trackers (including the online-updating tracker DiMP). The slightly inferior performance at large overlap thresholds might be due to SiamRCR's strategy of predicting the center offsets and width/height rather than the bounding box coordinate offsets (e.g., SiamFC++), as larger value ranges can lead to less precision. However, our strategy better solves the misalignment problem.

TrackingNet. Its test subset contains 511 sequences and 70 object classes. We also train our model only on the TrackingNet train subset. There are three metrics in TrackingNet: Success (Succ.), Precision (Prec.) and Normalized Precision (N-Prec.). We report the results in Table 3. SiamRCR surpasses the other state-of-the-art trackers on all three evaluation metrics. In particular, SiamRCR obtains 0.764 Succ., 0.716 Prec. and 0.818 N-Prec., which further demonstrates its superior tracking performance.

LaSOT. LaSOT is a large-scale long-term tracking benchmark. It contains 1,400 sequences and more than 3.5 million frames. We train our model only on the LaSOT train subset and conduct the evaluation following protocol II in [Fan et al., 2019]. As shown in Table 4, our SiamRCR achieves 0.575 Succ. and 0.599 Prec., outperforming the recent SOTA tracker Ocean by 8.5% and 13.9% in terms of Success and Precision respectively. It also achieves better performance than other localization-aware trackers (ATOM and SiamFC++), proving that our reciprocal links with the localization branch are better.

VOT-2019. With challenging factors such as occlusion, fast motion and illumination changes in its 60 test sequences, VOT-2019 provides a comprehensive evaluation platform for VOT. Its commonly used metrics are Expected Average Overlap (EAO), Accuracy and Robustness; EAO takes both Accuracy and Robustness into account to assess the overall tracking performance. We report the experimental results on VOT-2019 in Table 5. Our SiamRCR achieves the best EAO score, the best Accuracy score and the second-best Robustness score. Ocean performs slightly better in Robustness thanks to its multi-feature combination strategy. As our SiamRCR only uses a single conv-feature for estimation, it is faster than Ocean. Overall, it demonstrates superior effectiveness and efficiency.

5 Conclusion
In this paper, we have proposed a novel anchor-free object tracking framework that is efficient and effective. It addresses the long-standing accuracy misalignment problem of Siamese network based models. Elaborate ablation studies have shown the effectiveness of the whole proposed model and its key components. Without bells and whistles, the proposed method achieves state-of-the-art performance on six tracking benchmarks, with a running speed of 65 FPS.
References
[Bertinetto et al., 2016] Luca Bertinetto, Jack Valmadre, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. Fully-convolutional siamese networks for object tracking. In ECCV, 2016.
[Bhat et al., 2019] Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Learning discriminative model prediction for tracking. In ICCV, 2019.
[Bhat et al., 2020] Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Know your surroundings: Exploiting scene information for object tracking. In ECCV, 2020.
[Chen et al., 2020] Zedu Chen, Bineng Zhong, Guorong Li, Shengping Zhang, and Rongrong Ji. Siamese box adaptive network for visual tracking. In CVPR, 2020.
[Danelljan et al., 2017] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ECO: Efficient convolution operators for tracking. In CVPR, 2017.
[Danelljan et al., 2019] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ATOM: Accurate tracking by overlap maximization. In CVPR, 2019.
[Danelljan et al., 2020] Martin Danelljan, Luc Van Gool, and Radu Timofte. Probabilistic regression for visual tracking. In CVPR, 2020.
[Duan et al., 2019] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. CenterNet: Keypoint triplets for object detection. In ICCV, 2019.
[Fan et al., 2019] Heng Fan, Liting Lin, Fan Yang, et al. LaSOT: A high-quality benchmark for large-scale single object tracking. In CVPR, 2019.
[He et al., 2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[Huang et al., 2019] Lianghua Huang, Xin Zhao, and Kaiqi Huang. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. TPAMI, 2019.
[Jiang et al., 2018] Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, and Yuning Jiang. Acquisition of localization confidence for accurate object detection. In ECCV, 2018.
[Kristan et al., 2018] Matej Kristan, Ales Leonardis, Jiri Matas, et al. The sixth visual object tracking VOT2018 challenge results. In ECCVW, 2018.
[Kristan et al., 2019] Matej Kristan, Jiri Matas, Ales Leonardis, et al. The seventh visual object tracking VOT2019 challenge results. In ICCVW, 2019.
[Law and Deng, 2018] Hei Law and Jia Deng. CornerNet: Detecting objects as paired keypoints. In ECCV, 2018.
[Li et al., 2018] Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High performance visual tracking with siamese region proposal network. In CVPR, 2018.
[Li et al., 2019a] Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, and Junjie Yan. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In CVPR, 2019.
[Li et al., 2019b] Peixia Li, Boyu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, and Huchuan Lu. GradNet: Gradient-guided network for visual object tracking. In ICCV, 2019.
[Lin et al., 2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, et al. Microsoft COCO: Common objects in context. In ECCV, 2014.
[Lin et al., 2017] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
[Long et al., 2015] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[Müller et al., 2018] Matthias Müller, Adel Bibi, Silvio Giancola, et al. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In ECCV, 2018.
[Nam and Han, 2016] Hyeonseob Nam and Bohyung Han. Learning multi-domain convolutional neural networks for visual tracking. In CVPR, 2016.
[Peng et al., 2020a] Jinlong Peng, Yueyang Gu, Yabiao Wang, Chengjie Wang, Jilin Li, and Feiyue Huang. Dense scene multiple object tracking with box-plane matching. In ACM MM, 2020.
[Peng et al., 2020b] Jinlong Peng, Changan Wang, Fangbin Wan, Yang Wu, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, and Yanwei Fu. Chained-Tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In ECCV, 2020.
[Peng et al., 2020c] Jinlong Peng, Tao Wang, Weiyao Lin, Jian Wang, John See, Shilei Wen, and Erui Ding. TPM: Multiple object tracking with tracklet-plane matching. PR, 2020.
[Russakovsky et al., 2015] Olga Russakovsky, Jia Deng, Hao Su, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
[Tian et al., 2019] Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. In ICCV, 2019.
[Wang et al., 2019a] Guangting Wang, Chong Luo, Zhiwei Xiong, and Wenjun Zeng. SPM-Tracker: Series-parallel matching for real-time visual object tracking. In CVPR, 2019.
[Wang et al., 2019b] Qiang Wang, Li Zhang, Luca Bertinetto, Weiming Hu, and Philip H. S. Torr. Fast online object tracking and segmentation: A unifying approach. In CVPR, 2019.
[Wu et al., 2015] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object tracking benchmark. TPAMI, 2015.
[Xu et al., 2020] Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, and Gang Yu. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In AAAI, 2020.
[Yang et al., 2020] Tianyu Yang, Pengfei Xu, Runbo Hu, Hua Chai, and Antoni Chan. ROAM: Recurrently optimizing tracking model. In CVPR, 2020.
[Yu et al., 2016] Jiahui Yu, Yuning Jiang, Zhangyang Wang, Zhimin Cao, and Thomas Huang. UnitBox: An advanced object detection network. In ACM MM, 2016.
[Yu et al., 2020] Yuechen Yu, Yilei Xiong, Weilin Huang, and Matthew R. Scott. Deformable siamese attention networks for visual object tracking. In CVPR, 2020.
[Zhang et al., 2019] Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, and Fahad Shahbaz Khan. Learning the model update for siamese trackers. In ICCV, 2019.
[Zhang et al., 2020] Zhipeng Zhang, Houwen Peng, Jianlong Fu, Bing Li, and Weiming Hu. Ocean: Object-aware anchor-free tracking. In ECCV, 2020.
[Zhou et al., 2019] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. arXiv:1904.07850, 2019.
[Zhou et al., 2020] Xingyi Zhou, Vladlen Koltun, and Philipp Krähenbühl. Tracking objects as points. In ECCV, 2020.