Distill-DBDGAN
Defocus blur detection (DBD) aims to segment the blurred regions of a given image affected by defocus
blur. It is a crucial pre-processing step for various computer vision tasks. With the increasing popularity of
small mobile devices, there is a need for a computationally efficient method to detect defocus blur accurately.
We propose an efficient defocus blur detection method that estimates the probability of each pixel being
focused or blurred on resource-constrained devices. Despite remarkable advances made by recent deep
learning-based methods, they still suffer from several challenges, such as background clutter, scale sensitivity,
low-contrast focused regions that are indistinguishable from out-of-focus blur, and especially high computational
cost and memory requirements. To address the first three challenges, we develop a novel deep network that
efficiently detects the blur map from the input blurred image. Specifically, we integrate multi-scale features in
the deep network to resolve scale ambiguities and simultaneously model the non-local structural correlations
in the high-level blur features. To handle the last two issues, we further frame our DBD algorithm
to perform knowledge distillation by transferring information from the larger teacher network to a compact
student network. All the networks are adversarially trained in an end-to-end manner to enforce higher-order
consistencies between the output and the target distributions. Experimental results demonstrate the state-of-
the-art performance of the larger teacher network, while our proposed lightweight DBD model imitates the
output of the teacher network without significant loss in accuracy. The code, pre-trained model weights, and
the results will be made publicly available.
Additional Key Words and Phrases: Defocus blur detection, knowledge distillation, adversarial learning
1 INTRODUCTION
Defocus blur occurs commonly, and sometimes intentionally, in everyday photography when light rays from
scene points that are not located at the camera's focus distance converge in front of or behind
the image plane.

Fig. 1. A challenging example for DBD. We show the blur detection results for a blurry input image taken
from the CUHK dataset [29]. The algorithms in (b)–(g) introduce large incorrectly detected regions. Our
proposed method predicts masks closest to the ground truth.

Defocus blur detection (DBD) aims at pixelwise identification of the out-of-
focus regions from an image. DBD has been an active area of research over the past few decades
due to its wide range of potential applications in several vision problems. Automatic detection
of the commonly encountered non-uniform blur with spatially varying point spread function is
a very complicated and challenging task. Common challenges involved are (i) precise detection
of the boundary between visually indistinguishable blurry smooth regions and in-focus smooth
regions in a partially blurred image, (ii) susceptibility of the degree of blur to image scales, (iii) low
detection accuracy, and (iv) runtime detection/inference speed.
Most of the existing blur region detection/classification techniques [6, 25] rely on traditional
handcrafted features that are often based on low-level defocus blur cues, such as gradient, fre-
quency, and contrast. They often fail to detect blur at large homogeneous or low-contrast regions.
Recently, deep convolutional neural network (DCNN)-based DBD methods [4, 33, 34, 37,
45–47] have successfully overcome several limitations of the traditional methods. Hence, several
algorithms have been proposed in this direction, starting from References [18, 25, 45] up to the current
state-of-the-art (SOTA) methods [4, 33, 42–44, 47]. Nevertheless, the detection results of
several recent notable works contain a lot of falsely labeled regions. The blur maps obtained using
the handcrafted feature-based methods discriminative blur detection features (DBDF) [29], local
binary patterns (LBP) [39], and high-frequency multi-scale fusion and sort transform of gradient
magnitudes (HiFST) [6] are shown in Figure 1(b)–(d), respectively. In Figure 1(e)–(g), we
show the defocus detection results generated from deep learning-based DBD methods CRLNet
[46], DeFusionNet [37], and depth distillation (DD) [4], respectively. Note that the results ob-
tained using both the handcrafted feature-based methods in Figure 1(b)–(d) and the CNN models
in Figure 1(e)–(g) suffer from poor quality boundaries of in-focus objects and erroneous detections.
Thus, despite the superior performance of DCNN-based methods for the DBD problem in recent
years, they still encounter several challenges concerning accurate localization of ramified blurred
structures and low-variance homogeneous focal regions. Moreover, the increasingly deep and complex
models employed in previous works require high computational and storage resources.
However, the abundance and popularity of small mobile devices in recent times demand more
memory- and computation-efficient methods. Motivated by the above observations, in this work, we
Fig. 2. The overall pipeline of our approach. It consists of a larger teacher network T and a smaller student
network S along with multiple discriminators D1, D2, and D3. The idea is to transfer knowledge from the teacher
network T to the student network S to improve the blur detection performance of the student network S.
Both networks are trained using content losses together with adversarial losses (orange and purple).
Information is distilled across both the feature and output spaces (blue). The solid lines are the forward
paths; the dotted lines denote the backward paths. The pipeline allows a significant reduction in memory cost
for the task of blur detection during inference.
datasets CUHK [29], DUT [45], and SZU-Blur detection (SZU-BD) [31] along with the ablation
results. Finally, we provide a few applications of our proposed DBD method.
2 RELATED WORKS
Based on the image features, blur map segmentation techniques are broadly divided into two cat-
egories: methods based on handcrafted features and learning-based algorithms.
designed an end-to-end convolutional neural network (HANUN) with channel attention and small
U-shaped networks embedded into the decoders for blur detection.
In contrast to the existing works, we formulate the blur detection problem using knowledge
distillation within an adversarial learning framework, with the aim of creating high-performance
models with fewer parameters and high inference speed.
3 PROPOSED METHODOLOGY
In this work, we design a lightweight blur detection student network that is trained in an adversar-
ial manner and produces comparable results to the state-of-the-art DCNN-based DBD methods.
of the S network. In line with previous works [13, 22], we used batch normalization in the gener-
ators and spectral normalization in the discriminators for stable training. Spectral normalization
was introduced to stabilize GAN training [17] and was shown to outperform other regularization
techniques. In addition, the use of LeakyReLU instead of ReLU in the discriminators facilitates
a stronger backward flow of gradients for negative values from the discriminators to the corre-
sponding generators [26]. Our GAN training scheme with a two-timescale update rule [9] could
successfully avoid mode collapse using only a 1:1 balanced update interval between the generators
and the corresponding discriminators. In general, the optimization problem solved by our LSGAN
can be formulated as follows:
$$
\begin{aligned}
\min_{D_1} V_{\mathrm{LSGAN}}(D_1) &= \frac{1}{2}\,\mathbb{E}_{r \sim p_{\mathrm{data}}(r)}\!\left[(D_1(r)-1)^2\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_1(G_1(z)))^2\right],\\
\min_{G_1} V_{\mathrm{LSGAN}}(G_1) &= \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_1(G_1(z))-1)^2\right],
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
\min_{D_2, D_3} V_{\mathrm{LSGAN}}(D_2, D_3) &= \frac{1}{2}\,\mathbb{E}_{r \sim p_{\mathrm{data}}(r)}\!\left[(D_2(r)-1)^2\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_2(G_2(z)))^2\right]\\
&\quad + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_3(G_1(z))-1)^2\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_3(G_2(z)))^2\right],\\
\min_{G_2} V_{\mathrm{LSGAN}}(G_2) &= \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_2(G_2(z))-1)^2\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\!\left[(D_3(G_2(z))-1)^2\right].
\end{aligned}
\tag{2}
$$
Here r and z represent the real data variable and the generator input variable, respectively. Simi-
larly, pdata (r ) and pz (z) denote the real data distribution and the input distribution of the generator,
respectively. The goal is to seek generators that generate samples as close as possible to the real
data distribution pdata (r ) from the given input distribution pz (z). Please refer to Sections 3.5.1
and 3.5.2 for details on the training objectives of the generators. The discriminators are discarded
during testing, and only the generators are employed for defocus blur detection. The training ob-
jectives of the discriminators are provided in Appendix A.
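A minimal PyTorch-style sketch of these least-squares objectives is given below; it is an illustration rather than the released implementation. The pairing of generators and discriminators follows Equations (1) and (2), and the conditioning of D1 and D2 on the input image x (via concatenation) follows Appendix A.

```python
import torch
import torch.nn.functional as F

def lsgan_d_loss(d_real, d_fake):
    """LSGAN discriminator loss: push real scores toward 1 and fake scores toward 0."""
    return 0.5 * F.mse_loss(d_real, torch.ones_like(d_real)) + \
           0.5 * F.mse_loss(d_fake, torch.zeros_like(d_fake))

def lsgan_g_loss(d_fake):
    """LSGAN generator loss: push scores of generated samples toward 1."""
    return 0.5 * F.mse_loss(d_fake, torch.ones_like(d_fake))

# Teacher side (Eq. (1)): D1 scores the ground-truth mask y (conditioned on x) as real
# and the teacher prediction y_t_hat as fake, e.g.:
#   loss_d1 = lsgan_d_loss(D1(torch.cat([y, x], 1)), D1(torch.cat([y_t_hat.detach(), x], 1)))
#   loss_g1 = lsgan_g_loss(D1(torch.cat([y_t_hat, x], 1)))
#
# Student side (Eq. (2)): D2 uses the ground truth as real, D3 uses the teacher prediction as real:
#   loss_d2 = lsgan_d_loss(D2(torch.cat([y, x], 1)), D2(torch.cat([y_s_hat.detach(), x], 1)))
#   loss_d3 = lsgan_d_loss(D3(y_t_hat.detach()), D3(y_s_hat.detach()))
#   loss_g2 = lsgan_g_loss(D2(torch.cat([y_s_hat, x], 1))) + lsgan_g_loss(D3(y_s_hat))
```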
Fig. 3. The proposed architecture of the teacher network for defocus blur detection. The following notations
are used: C(d) = Dilated Conv2D with atrous rate d, C = Conv2D, T = Transposed Conv2D, SN = Spectral
Norm, BN = BatchNorm, LR = Leaky ReLU, and R = ReLU. We deploy a pre-trained SE-ResNeXt-101 [10] as the
backbone of the teacher network. The backbone encoder of the blur detection map generator extracts the
feature representations from the given input blurry image x. The extracted features are further enhanced by
using DenseASPP [38] and self-attention modules at the intermediate layer. A decoder decodes these latent
high-level feature maps into a pixelwise blur detection map ŷ^(t). A discriminator distinguishes the generator
prediction ŷ^(t) from the ground truth y. Channel dimensions are shown in each block of the encoder, decoder,
and the discriminator.
dilation rates (sparse sampling rates) leads to an exceedingly large field of view that densely aggregates
diverse multi-scale structural information. Moreover, capturing the long-range non-local
dependencies of a given blur pixel helps coordinate that pixel's spatial information with that of all
similar pixels within the entire image. We therefore propose to use a parallel intra-attention or
self-attention (SA) module at the bottleneck layer to fully exploit the non-local structural correlations
in the high-level blur features. The key decoder part consists of a combination of vanilla
convolutions, batch normalization, and upsampling layers. Transposed convolutional layers
gradually upsample the spatial resolution of the extracted features until the input resolution
is attained. To further improve localization accuracy, aid the detection of small regions,
and avoid the vanishing gradient problem, we use skip connections [27] between the encoder
and the decoder modules. Hence, the decoder module recovers boundaries by integrating low-level
information from the encoder with high-level features from the decoder.
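The SA module at the bottleneck can be realized as a standard non-local attention block; the sketch below is one such minimal realization (the channel-reduction ratio and the learned residual weight are assumptions), and the DenseASPP branch [38] running in parallel is omitted for brevity.

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Minimal non-local self-attention over the spatial positions of a feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                     # B x C' x HW
        attn = torch.softmax(torch.bmm(q, k), dim=-1)  # B x HW x HW pairwise affinities
        v = self.value(x).flatten(2)                   # B x C x HW
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection to the input features
```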
The second module in the proposed teacher framework is the discriminator network D 1 . It con-
sists of seven convolutional blocks, each comprising a vanilla convolution of kernel size 4 × 4 and
stride 2, followed by spectral normalization [22]. LeakyReLU is used as an activation function in
all the convolutional blocks of D1. The numbers of output channels of the consecutive convolution
layers are 64, 128, 256, 256, 256, 256, and 256, respectively.
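For clarity, the discriminator described above can be sketched as follows; the padding of 1 and the final 1 × 1 scoring convolution are assumptions, since they are not specified in the text.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def conv_block(in_ch, out_ch):
    """4x4 strided convolution with spectral normalization followed by LeakyReLU."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Discriminator(nn.Module):
    """Seven spectrally normalized conv blocks, as described for D1 (and reused for D2, D3)."""
    def __init__(self, in_channels):
        super().__init__()
        channels = [64, 128, 256, 256, 256, 256, 256]
        layers, prev = [], in_channels
        for ch in channels:
            layers.append(conv_block(prev, ch))
            prev = ch
        self.features = nn.Sequential(*layers)
        self.score = nn.Conv2d(prev, 1, kernel_size=1)  # assumed scoring head

    def forward(self, x):
        return self.score(self.features(x))
```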
3.4.2 Student Network. For the generator G 2 of the student network, we design a lightweight
encoder-decoder architecture for blur detection. We utilize EfficientNetB3 [32], pre-trained on Ima-
geNet data [5], to initialize the weights of the encoder structure of the generator G 2 . EfficientNetB3
belongs to the recently introduced EfficientNet family [32] of fast, lightweight networks that have
set benchmarks in image classification. The suffix B3 indicates the network size, which scales up
with every increment in the series from B0 to B7; the corresponding increase in processing cost
also leads to higher accuracy. On ImageNet classification, EfficientNet-B3 achieves a top-1 accuracy
of 81.6% with 12M parameters, compared to the 77.1% top-1 accuracy of EfficientNet-B0 with 5.3M
parameters and the 84.3% top-1 accuracy of EfficientNet-B7 with 66M parameters. We chose
EfficientNet-B3 to achieve a tradeoff between network complexity and
performance. Mobile inverted bottleneck convolution [28] forms the basic building block of the
network. Our UNet decoder consists of five transposed convolution layers for upsampling, each
followed by two convolution blocks. The transposed convolution layers in the decoder use 2 ×
2 kernels with stride 2, and the subsequent convolution blocks use 3 × 3 kernels with stride 1. A
sigmoid function at the last layer outputs the blur detection score.
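A sketch of one decoder stage of G2 is shown below. The channel widths, the batch-normalization/ReLU placement inside the convolution blocks, and the skip handling are assumptions; the 2 × 2 transposed convolutions with stride 2, the 3 × 3 convolutions with stride 1, and the final sigmoid follow the description above.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One decoder stage: 2x2 transposed conv (stride 2) followed by two 3x3 conv blocks."""
    def __init__(self, in_ch, out_ch, skip_ch=0):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip=None):
        x = self.up(x)
        if skip is not None:
            x = torch.cat([x, skip], dim=1)  # UNet-style skip connection from the encoder
        return self.conv(x)

# The full student generator G2 stacks an ImageNet-pretrained EfficientNet-B3 encoder
# (e.g., obtained via timm with features_only=True) with five UpBlocks and a final head:
#   head = nn.Sequential(nn.Conv2d(16, 1, kernel_size=1), nn.Sigmoid())  # channel count assumed
```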
The architectures of the discriminator networks D2 and D3 of the student network are similar
to that of D1 of the teacher network (see Section 3.4.1).
We capitalize on the knowledge of the teacher network to train the student network by matching
the output distributions and the intermediate features of the teacher and student networks, which
is formulated through the knowledge distillation loss terms described in Section 3.5.
Consistency Loss: The consistency loss Lcons ensures consistency between the outputs of the T
and S networks. It is defined as follows:
$$\mathcal{L}_{\mathrm{cons}} = \frac{1}{W \times H} \sum_{j=1}^{W \times H} \left| \hat{y}^{(t)}_j - \hat{y}^{(s)}_j \right|. \tag{6}$$
Feature Affinity Loss: Our aim is to effectively transfer the short- and long-range correlations among
spatial locations of the rich feature maps $F^{(t)} \in \mathbb{R}^{C \times W_f \times H_f}$ of the teacher network to the feature
maps $F^{(s)} \in \mathbb{R}^{C' \times W_f \times H_f}$ of the student network. Here $C$ and $C'$ are the feature depths, and $W_f \times H_f$ is
the spatial dimension. Note that $C$ does not necessarily equal $C'$, as the dimensions of the computed
adjacency matrices do not depend on the feature channel dimensions. The feature maps $F^{(t)}$ and
$F^{(s)}$ are obtained from the mid-level layers of both networks. We build affinity graphs such
that the affinity functions defined on the nodes of the graph encode pairwise affinities between
the nodes and form the edges of the graph. First, we apply a max pooling operation on $F^{(t)}$ and
$F^{(s)}$ (resized by bilinear interpolation to match spatial dimensions) to obtain feature maps
$\check{F}^{(t)} \in \mathbb{R}^{C \times \check{W}_f \times \check{H}_f}$ and $\check{F}^{(s)} \in \mathbb{R}^{C' \times \check{W}_f \times \check{H}_f}$ ($\check{F}^{(n)} = \{\check{f}^{(n)}_{cj}\}\ \forall c \in C \text{ or } C'$; $\check{W}_f < W_f$, $\check{H}_f < H_f$) comprising the
most activated pixels. The activations at each spatial position are independently normalized across
the channels to capture the structural information [15], resulting in feature maps $\tilde{F}^{(n)} = \{\tilde{f}^{(n)}_j\}$ as
$$\tilde{f}^{(n)}_j = \frac{\check{f}^{(n)}_j}{\sqrt{\sum_{c} \big(\check{f}^{(n)}_{cj}\big)^2 + \varepsilon}}, \tag{7}$$
where $\varepsilon$ is a small stability constant (e.g., $\varepsilon = 10^{-6}$) to avoid division by zero; $c \in C$ when $n = t$,
and $c \in C'$ otherwise.

If we assume that there are $\check{W}_f \cdot \check{H}_f$ entities, then we can define graph adjacency matrices $A^{(t)}$ and
$A^{(s)}$ ($A^{(n)} = \{a^{(n)}_{jk}\}$) for the affinity graphs, where the affinity weight $a^{(n)}_{jk}$ between a pair of
entities $j$ and $k$ is computed using an affinity function $O$ as
$$a^{(n)}_{jk} = O\big(\tilde{f}^{(n)}_j, \tilde{f}^{(n)}_k\big). \tag{8}$$
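The consistency and feature-affinity terms can be sketched in code as follows. The affinity function O and the penalty between the teacher and student adjacency matrices are not reproduced above, so a dot-product affinity and an L1 penalty are assumed here; the pooled size is likewise an illustrative choice.

```python
import torch
import torch.nn.functional as F

def consistency_loss(y_t, y_s):
    """Eq. (6): mean absolute difference between teacher and student blur maps."""
    return F.l1_loss(y_s, y_t.detach())

def affinity_matrix(feat, pool_size=(16, 16), eps=1e-6):
    """Build a pairwise affinity matrix from a feature map (Eqs. (7)-(8)).
    Assumes max pooling to `pool_size` and a dot-product affinity function O."""
    f = F.adaptive_max_pool2d(feat, pool_size)                    # keep the most activated responses
    f = f.flatten(2)                                              # B x C x N, with N = pooled H*W
    f = f / torch.sqrt((f ** 2).sum(dim=1, keepdim=True) + eps)   # normalize across channels (Eq. (7))
    return torch.bmm(f.transpose(1, 2), f)                        # B x N x N affinities (Eq. (8))

def feature_affinity_loss(feat_t, feat_s):
    """Match teacher and student affinity graphs; an L1 penalty between A^(t) and A^(s) is assumed."""
    a_t = affinity_matrix(feat_t.detach())
    a_s = affinity_matrix(feat_s)
    return F.l1_loss(a_s, a_t)
```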
3.5.4 Joint Loss. The overall objective functions for the proposed blur detection algorithm are
formulated as follows:

Teacher network:
$$\mathcal{L}^{(t)} = \lambda_1 \mathcal{L}^{(t)}_{\mathrm{con}} + \lambda_2 \mathcal{L}^{(t)}_{\mathrm{LSGAN}}. \tag{12}$$

Student network:
$$\mathcal{L}^{(s)} = \gamma_1 \mathcal{L}^{(s)}_{\mathrm{con}} + \gamma_2 \mathcal{L}^{(s)}_{\mathrm{LSGAN}} + \gamma_3 \mathcal{L}_{KD}, \tag{13}$$

where $\lambda_1$, $\lambda_2$, $\gamma_1$, $\gamma_2$, and $\gamma_3$ are the weighting parameters for the respective loss terms.
A sensitivity analysis of the adversarial hyperparameters is provided in Appendix B.
4 IMPLEMENTATION DETAILS
We implemented the entire pipeline on a machine with an Nvidia Tesla K80 GPU, using a
mini-batch size of 4. The two decoder networks and the three discriminator networks are initialized
by sampling from a zero-mean normal distribution with a standard deviation of 0.2, and all
biases are set to 0. We use Adam optimizers [14] with initial learning rates of 0.0002, 0.0001, 0.0005,
0.00005, and 0.00005 for G1, D1, G2, D2, and D3, respectively. In Equation (5), α1, α2,
and α3 are set to 1.0, 0.1, and 0.1, respectively. The parameters λ1 and λ2 in Equation (12) are set to
1.0 and 0.1, respectively. In Equation (13), γ1, γ2, and γ3 are set to 1.0, 0.1, and 1.0, respectively. We
resize all images to 320 × 320 during training and evaluation, following the previous algorithms.
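The training configuration above can be summarized in code as follows; the network objects are placeholders, and only the learning rates, weight initialization, loss weights, batch size, and image size are taken from the text.

```python
import torch
import torch.nn as nn

# Placeholder modules; in practice these are the generators and discriminators of Section 3.4.
G1, D1, G2, D2, D3 = (nn.Conv2d(3, 1, 3, padding=1) for _ in range(5))

def init_weights(module):
    """Zero-mean normal (std 0.2) for weights, zeros for biases (applied to decoders/discriminators)."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.2)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

for net in (D1, D2, D3):
    net.apply(init_weights)

optimizers = {
    "G1": torch.optim.Adam(G1.parameters(), lr=2e-4),
    "D1": torch.optim.Adam(D1.parameters(), lr=1e-4),
    "G2": torch.optim.Adam(G2.parameters(), lr=5e-4),
    "D2": torch.optim.Adam(D2.parameters(), lr=5e-5),
    "D3": torch.optim.Adam(D3.parameters(), lr=5e-5),
}

# Loss weights from Equations (5), (12), and (13).
alpha = (1.0, 0.1, 0.1)   # alpha1, alpha2, alpha3
lam = (1.0, 0.1)          # lambda1, lambda2 (teacher)
gamma = (1.0, 0.1, 1.0)   # gamma1, gamma2, gamma3 (student)

batch_size, image_size = 4, (320, 320)
```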
5 EXPERIMENTAL RESULTS
5.1 Datasets
We evaluate the proposed framework on publicly available datasets, namely CUHK [29], DUT [45],
and SZU-BD datasets [31].
5.1.1 CUHK Dataset [29]. The database proposed in Reference [29] is a publicly available
dataset consisting of both out-of-focus and motion blurred images for benchmarking the perfor-
mance of blur segmentation algorithms. It contains 1, 000 partially blurred images, of which 704
images contain out-of-focus blur, and the remaining 296 images are motion blurred. The ground
truth binary blur maps corresponding to all 1, 000 images are labeled by humans. To provide a fair
comparison with the SOTA methods [4, 45–47], we used the same training–testing split as in the
above methods. The training set consists of 604 blurred images, and the remaining 100 images are
used for evaluation.
5.1.2 DUT Dataset [45]. Recently, the authors in Reference [45] proposed a dataset for eval-
uating the performance of contemporary blur segmentation methods. It consists of 500 partially
blurred images containing defocus blur. The dataset comes with ground truth binary blur maps
corresponding to all the 500 blurred images. Since the authors in References [4, 45, 47] had not
utilized this dataset for training, we have used the DUT dataset for evaluation purposes only. Al-
though the authors in Reference [46] proposed DUT training dataset consisting of 600 training
images, we have not considered DUT data for training our model.
5.1.3 SZU-BD Dataset [31]. The SZU-BD dataset is a relatively new benchmark proposed in
Reference [31] and consists of 784 blurred images, along with the corresponding pixelwise annotated
ground truth blur masks. Of these 784 images, 709 contain defocus blur, while the remaining 75
contain motion blur only. Part of the images were collected from the DUT blur dataset [45] and
the MSRA10K salient object detection dataset [3].
Table 1. Quantitative Comparisons with 14 DBD Methods on CUHK [29] and DUT [45] Datasets in
Terms of F-measure ↑ and MAE ↓

Method          CUHK F-measure  CUHK MAE  DUT F-measure  DUT MAE
DBDF [29]       0.548           0.309     0.497          0.383
SS [36]         0.649           0.259     0.629          0.292
KSFV [24]       0.534           0.301     0.576          0.276
DHCF [25]       0.202           0.498     0.252          0.510
HiFST [6]       0.553           0.221     0.503          0.249
LBP [39]        0.681           0.183     0.687          0.191
BTBNet [45]     0.808           0.106     0.701          0.193
CRLNet [46]     0.871           0.083     0.804          0.141
CENet [47]      0.906           0.061     0.816          0.137
DD [4]          0.879           0.057     0.828          0.107
HRFRNet [41]    0.921           0.114     0.945          0.103
DAD [44]        0.884           0.079     0.794          0.153
HANUN [8]       0.970           0.036     0.920          0.107
AENet [43]      0.910           0.056     0.831          0.114
EFENet [43]     0.914           0.053     0.854          0.094
Ours (T)        0.926           0.041     0.901          0.068
Ours (S)        0.918           0.044     0.894          0.071

Best in bold. Second best is underlined.
The SZU-BD dataset was created mainly for testing purposes, with image resolutions varying from
275 × 218 pixels to 500 × 468 pixels.
HRFRNet [41] achieves the best F-measure score of 0.945 on the DUT dataset [45], but its large
hierarchical refinement network entails a considerably larger number of model parameters, which
limits its applicability. Our proposed T and S models achieve F-measure scores close to that of the
second-best method, HANUN [8], on the DUT dataset [45]. Also, our method achieves the best MAE
values on the DUT dataset [45]. HANUN [8] achieves the best performance on the CUHK dataset [29]
on both metrics, while our proposed T model achieves the second-best performance. The proposed
S model produces comparable results with fewer parameters. Beyond the number of model parameters,
the embedding of multiple nested attention blocks and a nested U-shaped network in the decoders
of Reference [8] inevitably requires several computationally intensive operations
across the network. We also present the visual comparisons with defocus blur detection methods
[4, 6, 18, 24, 31, 36, 39, 43, 44] on the SZU-BD dataset [31] in Figure 6. Our method provides satis-
factory results in Figure 6 (k) and (l) for complex scenes with cluttered backgrounds (second row)
or scenes where the difference between the focused and blurred regions is not very pronounced
(fourth row). Our method also produces more precise boundaries (sixth row). We provide the quan-
titative comparisons in Table 2 for only the 709 defocused images in the SZU-BD dataset [31]. DD
[4] achieves the best F-measure and MAE values even though our method produces more visually
plausible results, as can be seen from the examples in Figure 6 (sixth and eighth rows). This can
be attributed to the sample types present in the dataset (e.g., Figure 6 (last row)), where the results
of DD are closer to the annotated ground truths. Our method yields the second-best scores. Note
that the results reported in Figures 4, 5, and 6, and Tables 1 and 2 are obtained directly from the
proposed network output. No postprocessing or refinement steps have been used for the results
shown in the above mentioned figures and tables. The overall good performance of our models is
due to a combination of factors, such as the architectural choices and the training strategies that
include the knowledge distillation and adversarial learning schemes (described in detail in the
ablation study). We also provide comparisons of the inference time in seconds (s) and the model
parameters in Tables 3 and 4, respectively. Our proposed S model is ∼1.5× faster than the
second-fastest model [44]. Also, our proposed student model has ∼4×, ∼1.6×, ∼11×, ∼1.4×
and ∼2.5× fewer parameters than SOTA methods DD [4], DAD [44], HRFRNet [41], HANUN [8],
and EFENet [43], respectively. In Tables 3 and 4, we report the inference time and model parameter
count for methods whose codes along with pre-trained model weights are publicly accessible or
whose computation speed/parameter values are reported in their papers. PR curves and F-measure
values of the SOTA algorithms and the proposed T and S models are shown in Figure 7. From the
PR curves and F-measure values in Figure 7, one can see that the proposed T and S models achieved
comparable performance over all the three datasets on all the evaluation metrics.
5.4.2 Effectiveness of Knowledge Distillation. We show the effectiveness of KD for the problem
of DBD in Table 5 (right: second and fifth rows). By comparing the performances of the smaller
networks S trained with and without (w/o) KD, we can see that distillation improves the perfor-
mance of the proposed student network (Distill-DBDGAN) by transferring knowledge from the
larger teacher network. KD lowers the MAE values from 0.057 to 0.044 on CUHK dataset and from
Fig. 4. Visual comparison results of DBD maps on the CUHK dataset [29]. The GT maps are shown in the
last column. It can be seen that our methods consistently produce DBD maps closest to the ground truth
maps. Additional results on the CUHK dataset [29] are available in Appendix C.
Fig. 5. Visual comparison results of DBD maps on DUT dataset [45]. Our method produces the most visually
plausible DBD maps similar to the GT maps.
0.090 to 0.071 on the DUT dataset. Our aim is to empower a lightweight network with the ability to
perform on par with a larger network for the DBD task. Several recent works [4, 43] have made gradual
improvements over existing deep learning techniques [46, 47] for the DBD task; however, complicated
deep networks with longer inference times were employed to achieve them. In our case,
Fig. 6. Visual comparison results of DBD maps on SZU-BD dataset [31] (GT is ground truth). Our mod-
els show good generalization capacity with similar or better performance than other state-of-the-art DBD
methods.
Table 2. Quantitative Comparisons with Nine DBD Methods on SZU Defocus Blur Detection
Dataset [31] in Terms of F-measure ↑ and MAE ↓

Method        F-measure  MAE
KSFV [24]     0.841      0.273
SS [36]       0.877      0.224
LBP [39]      0.912      0.160
HiFST [6]     0.899      0.217
EHS [18]      0.939      0.126
DPN [31]      0.958      0.078
DD [4]        0.972      0.055
DAD [44]      0.916      0.172
EFENet [43]   0.968      0.073
Ours (T)      0.969      0.064
Ours (S)      0.968      0.065

Best in bold. Second best is underlined.
Table 3. Inference Time in Seconds (s) of Different Methods for Image Size 320 × 320

Method        Time (s)
DBDF [29]     45.45
SS [36]       0.7142
DHCF [25]     11.76
HiFST [6]     47.61
LBP [39]      9.00
BTBNet [45]   25.02
CRLNet [46]   12.04
DD [4]        0.107
AENet [43]    0.054
EFENet [43]   0.044
DAD [44]      0.035
Ours (T)      0.036
Ours (S)      0.023

Table 4. Number of Model Parameters of Different Methods

Method        # Params
DD [4]        84.47M
DAD [44]      34.89M
HRFRNet [41]  231.2M
HANUN [8]     29.75M
EFENet [43]   53.13M
Ours (T)      119.41M
Ours (S)      21.43M

Best in bold. Second best is underlined.
Fig. 7. Precision, Recall, and F-measure plots on CUHK, DUT, and SZU-BD datasets.
Table 5. Ablation Analysis on CUHK [29] and DUT [45] Datasets Using F-measure ↑ and MAE ↓

Teacher ablation (left):
Model                                  CUHK F-measure  CUHK MAE  DUT F-measure  DUT MAE
Proposed T net                         0.926           0.041     0.901          0.068
T net (w/o D1)                         0.917           0.043     0.889          0.074
T net (w/o D1, w/o SA, w/o DASPP)      0.915           0.047     0.882          0.087

Student ablation (right):
Model                                                           CUHK F-measure  CUHK MAE  DUT F-measure  DUT MAE
Proposed Distill-DBDGAN (w/ KD: Lcons, LKD-LSGAN, Lfa, w/ D2)   0.918           0.044     0.894          0.071
S net (w/ KD: Lcons, LKD-LSGAN, w/ D2)                          0.914           0.047     0.891          0.081
S net (w/ KD: Lcons, w/ D2)                                     0.911           0.049     0.887          0.084
S net (w/o KD, w/ D2)                                           0.902           0.057     0.882          0.090
S net (w/o KD, w/o D2)                                          0.887           0.066     0.858          0.106

We study the impact of different components in the proposed method. Best in bold.
and thereby results in better predictions for both the boundaries and the blurred regions, as shown in
Figure 8(e). We have also observed that using an aggressively complex network for the DBD task in a
limited-data regime does not bring significant improvements during the testing phase.
5.4.4 Model Performance vs. Model Size Tradeoff. We show the performances of the deep
teacher model and the lightweight smaller models along with the corresponding model
Fig. 8. DBD results for two test samples with cluttered backgrounds, taken from DUT dataset [45]. T−sa−da
stands for the proposed teacher network T without SA and DASPP modules. GT denotes ground truth. Back-
ground clutters are marked with red circles and ellipses. The green rectangles show erroneous detections
(w.r.t. GT).
Table 6. Ablation Analysis on CUHK [29] and DUT [45] Datasets Using F-measure ↑,
MAE ↓, Number of Model Parameters ↓, and Inference Time ↓
parameters and inference speed in Table 6. To reduce training computations, the lightweight
models, shown in Table 6, were trained without KD for comparison purposes. We notice that
MobileNetV2-UNet, with 9.46M parameters, is the most compute-friendly model among the tested
models and has the lowest inference time of 8.66 ms. However, its performance is low, as observed from
the reported F-measure and MAE values. EfficientNetB0-UNet has a lower model size (14.11M)
than EfficientNetB3-UNet (21.43M), but its performance drops on both the
CUHK and DUT datasets compared to that of EfficientNetB3-UNet. EfficientNetB7-
UNet achieves the highest accuracy among the tested lightweight models, but it also has the
largest parameter size and inference time of 77.01M and 46.40 ms, respectively. Hence, we have
employed the EfficientNetB3-UNet-based framework for the student model, which exhibits satisfactory
performance in terms of MAE and F-measure values (shown in Table 1). Our proposed student
model has a reasonable parameter size of 21.43M and an average inference time of 23.25 ms.
6 APPLICATIONS
Here we explore some of the possible applications that benefit from blur detection: blur magnifi-
cation, focal boundary detection, and foreground-background segmentation.
Fig. 9. Demonstration of blur magnification. First row: (a) Partially blurred input image taken from CUHK
test dataset [29], images with magnified defocus blur in the background region obtained with the aid of blur
maps generated using (b) DD [4] and the proposed (c) T and (d) S networks. Second row: Zoomed regions
corresponding to the pink patches. Our approach preserves the in-focus pixels of the thumb region (zoomed
left patch) in the vicinity of the magnified blurred background. Pixels at the lower edge of the wristwatch
(zoomed right patch) are also more plausibly restored by our method.
Fig. 10. Focal boundary detection results. (a) Input defocus blurred image taken from CUHK test dataset
[29]. Results obtained by (b) Canny operator [23], (c) Sobel operator [12], (d) DD [4], proposed (e) T network,
and (f) S network.
in an image. We show examples of blur magnification in Figure 9 by using the detected blur maps
obtained by the proposed method and state-of-the-art DD technique [4]. In Figure 9(a), we show a
defocused image selected from CUHK test dataset [29]. We compare blur magnification achieved
using blur maps obtained by DD [4] and our method. Here we compute blur detection maps cor-
responding to Figure 9(a). The images with blur magnified in the background, obtained with the
aid of detected blur maps from DD [4], proposed teacher (T ), and student (S) models, are shown in
Figure 9(b), (c), and (d), respectively. The magnification result obtained by using the blur detection
output of DD [4] suffers from erroneous boundaries, as shown in the highlighted regions in the
second row. Looking at the highlighted regions, we can notice that our method achieves better de-
focus magnification where the focused pixels are kept intact at the boundaries of the foreground
object.
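One simple way to realize blur magnification is to composite a re-blurred copy of the image with the original, using the detected blur map as a per-pixel mixing weight. The sketch below illustrates this idea; the Gaussian re-blur, its kernel size, and the convention that the map gives the probability of a pixel being defocused (invert it if the opposite convention is used) are assumptions, and this is not the exact procedure used to produce Figure 9.

```python
import cv2
import numpy as np

def magnify_blur(image, blur_map, kernel_size=21, sigma=8.0):
    """Amplify defocus blur in regions flagged as blurred by the detected blur map.

    image:    H x W x 3 uint8 input image.
    blur_map: H x W float map in [0, 1]; 1 is assumed to indicate defocused pixels.
    """
    reblurred = cv2.GaussianBlur(image, (kernel_size, kernel_size), sigma)
    alpha = blur_map[..., None].astype(np.float32)  # per-pixel mixing weight
    out = alpha * reblurred.astype(np.float32) + (1.0 - alpha) * image.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```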
Fig. 11. Foreground-background segmentation results. (a) Input defocus blurred image taken from CUHK
test dataset [29]. Results obtained by (b) DD [4], proposed (c) T network, and (d) S network.
predicted incorrect mask for the input blurry image. The superior quality of the defocus masks
produced by our proposed method helps to better capture the shape information near the focused
object boundaries.
7 FAILURE CASES
We identify some failure modes for our trained models across different experimental scenarios.
Our trained models may not always yield accurate results for transparent objects. Figure 12(a) il-
lustrates one such scenario. Since the glass object inherits multiple abstract textures and colors
from the blurry image background, it exhibits a similar appearance to its defocused surroundings
and is falsely labeled as blur. We also show a failure example for a complex scene of focused
water drops with multi-colored reflections in Figure 12(b). The blur mask in this scene is hard to
detect because of the complex light paths with reflections. Besides, our method is also prone to
failure when strong specular highlights are present in the image, as shown in Figure 12(c). Considering
the physical constraints, we have not separately modeled the reflection characteristics for
specular region detection and removal in our proposed framework, which is beyond the scope of this
article. In Figure 12(d), our model fails to distinguish the thin focused structure/strip from the
corresponding blurry background. The detection accuracy of our method suffers slightly in
such cases, especially when there is a high resemblance in appearance between the two.
Fig. 12. Failure examples. The input images in panels (a) and (b) and panels (c) and (d) are taken from the
DUT and CUHK datasets, respectively. Our trained networks may fail in areas marked by the rectangles in (a)
transparent glass object, (b) falling water drops with complex light paths, (c) object with specular highlights,
and (d) thin focused structure with defocused background.
8 CONCLUSION
In this article, we proposed a KD strategy and an adversarial learning-based framework for the
challenging task of defocus blur detection. To the best of our knowledge, the proposed model is the first
application of a KD scheme within an adversarial framework for robust detection of defocus blur regions
from a single image. We leverage deep multi-scale and attention-guided features to accurately detect blur
in ambiguous blur regions. We provide qualitative and quantitative comparisons
with state-of-the-art defocus blur detection algorithms. In future studies, we plan to investigate
automatic segmentation and tracking of blurred regions in videos.
APPENDICES
A OBJECTIVE FUNCTIONS OF THE ADVERSARIAL DISCRIMINATORS
We minimize the following objective functions to train the discriminators D1, D2, and D3:
$$\mathcal{L}_{D_1} = \frac{1}{2(W \times H)} \sum_{j=1}^{W \times H} \big[D_1\big(\hat{y}^{(t)}_j \oplus x_j\big)\big]^2 + \frac{1}{2(W \times H)} \sum_{j=1}^{W \times H} \big[D_1\big(y_j \oplus x_j\big) - 1\big]^2, \tag{14}$$
$$\mathcal{L}_{D_2} = \frac{1}{2(W \times H)} \sum_{j=1}^{W \times H} \big[D_2\big(\hat{y}^{(s)}_j \oplus x_j\big)\big]^2 + \frac{1}{2(W \times H)} \sum_{j=1}^{W \times H} \big[D_2\big(y_j \oplus x_j\big) - 1\big]^2, \tag{15}$$
$$\mathcal{L}_{D_3} = \frac{1}{2(W \times H)} \sum_{j=1}^{W \times H} \big[D_3\big(\hat{y}^{(s)}_j\big)\big]^2 + \frac{1}{2(W \times H)} \sum_{j=1}^{W \times H} \big[D_3\big(\hat{y}^{(t)}_j\big) - 1\big]^2, \tag{16}$$
where $x$ is the input blurred image with spatial dimension $W \times H$; $y$ is the ground truth blur
mask; $\hat{y}^{(t)}$ and $\hat{y}^{(s)}$ are the outputs of the teacher model $G_1$ and the student model $G_2$, respectively;
$j$ is the pixel index of the image; and $\oplus$ is the concatenation operator.
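In code, the discriminator objectives of Equations (14)–(16) reduce to least-squares losses on (optionally concatenated) inputs, as in the following sketch; the tensors are assumed to be 4-D (batch, channel, height, width), and Equation (15) follows the same pattern as Equation (14) with the student prediction in place of the teacher prediction.

```python
import torch

def d1_loss(D1, x, y, y_t_hat):
    """Eq. (14): D1 scores (prediction, image) pairs as fake and (ground truth, image) pairs as real."""
    fake = D1(torch.cat([y_t_hat.detach(), x], dim=1))
    real = D1(torch.cat([y, x], dim=1))
    return 0.5 * (fake ** 2).mean() + 0.5 * ((real - 1) ** 2).mean()

def d3_loss(D3, y_t_hat, y_s_hat):
    """Eq. (16): D3 treats the teacher prediction as real and the student prediction as fake."""
    fake = D3(y_s_hat.detach())
    real = D3(y_t_hat.detach())
    return 0.5 * (fake ** 2).mean() + 0.5 * ((real - 1) ** 2).mean()

# Eq. (15) for D2 is identical to d1_loss with the student prediction y_s_hat in place of y_t_hat.
```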
B SENSITIVITY ANALYSIS OF ADVERSARIAL HYPERPARAMETERS
Considering the presence of several hyperparameters in our proposed algorithm and the long training
hours, we adopted a systematic approach to tune the sensitive hyperparameters in our method. Since
the hyperparameters related to the KD losses have a well-understood effect on the model and are not
as sensitive as the adversarial hyperparameters, we keep their values constant and perform a
sensitivity analysis of the adversarial hyperparameters only.

Fig. 13. Effect of adversarial training hyperparameters on network performances when evaluated on the
CUHK and DUT datasets. We show plots of average MAE versus (a) λ2, (b) γ2, and (c) α3 for
the CUHK test dataset and MAE versus (d) λ2, (e) γ2, and (f) α3 for the DUT test dataset.
We have objectively selected a set of values within a range that would presumably demonstrate
a distinctive influence on the model performances. Figure 13 shows plots of the effect of change
of hyperparameters λ 2 (Equation (12)), γ 2 (Equation (13)), and α 3 (Equation (5)) on network per-
formances, when trained on the CUHK dataset and evaluated on the CUHK and DUT datasets.
The training parameter λ 2 controls the contribution of the adversarial loss for the generator of T
network, whereas γ 2 and α 3 control the same for the generator of S network. Each of the MAE
values in Figure 13 is calculated by averaging over 30.2K iterations for each of the hyperparam-
eter settings. In Figure 13(a), we observe that as λ 2 increases from 0.1 to 1.0, the average MAE
changes negligibly. We therefore reckon the MAE to be less sensitive to λ2 in the range [0.1, 1.0]
for the CUHK dataset; it reaches its lowest value (lower is better) at λ2 = 10. For the
DUT dataset in Figure 13(b), the MAE is lowest at λ2 = 0.1 and then increases with increasing
λ2. We then analyze the influence of the parameter γ2 on the smaller S network by setting
γ 3 = 0 (Equation (13)) and keeping γ 1 (Equation (13)) constant at 1.0 (as mentioned in Section 4).
We can observe from Figure 13(c) and (d) that with the increasing value of γ 2 , there is a decline
in average MAE values across both the datasets until it achieves the best results at γ 2 = 10.0 and
γ 2 = 0.1 for the CUHK and DUT datasets, respectively. We conjecture that evaluation on DUT
test dataset, with several obscure real blur images and 5× the size of CUHK test dataset, ensures
a more unbiased analysis and provides greater confidence in model performances. Moreover, the
DUT dataset, having not been used during training, is not directly related to the learning set (e.g.,
different acquisition settings) and therefore provides a better assessment of the model’s general-
ization abilities. Hence, we opted to choose both λ 2 and γ 2 as 0.1 for the final training session. To
analyze the effect of change of α 3 , we vary only the value of α 3 while keeping the other parame-
ters γ 1 , γ 3 (Equation (13)) and α 1 , α 2 (Equation (5)) constant (mentioned in Section 4), with γ 2 = 0.1
being chosen. In Figure 13(e) and (f), we notice nearly similar behavior of the effect of change of α 3
on the performance of the student network as that shown in Figure 13(c) and (d), respectively. In
conformity with the initial conjecture, we choose to adopt α 3 as 0.1 for training the final student
network within the knowledge distillation scheme.
We observe that the optimal parameters for the CUHK dataset tend to be higher than those for
the DUT dataset, and there is a larger drop in MAE for the DUT dataset at a hyperparameter
value of 0.1 for all three hyperparameters considered.
C ADDITIONAL RESULTS
We provide additional results of our method along with other defocus blur detection methods
[4, 6, 8, 36, 39, 43–47] on the CUHK dataset [29] in Figure 14.
Fig. 14. Additional results of DBD maps on the CUHK dataset [29]. The GT maps are shown in the last
column.
Fig. 15. Visual results of DBD maps on images captured using a smartphone camera (first and second rows)
and DSLR camera (third and fourth rows). The images captured using a DSLR camera are taken from the
DFD dataset [2]. The ((a) and (d)) input images are captured from similar scenes (along the same row) with
varying degrees of blur. Note that the image in (d) is more blurred than the image in (a) along the same row.
DBD maps detected by our proposed ((b) and (e)) teacher network and ((c) and (f)) student networks.
REFERENCES
[1] Soonmin Bae and Frédo Durand. 2007. Defocus magnification. In Computer Graphics Forum, Vol. 26. 571–579.
[2] Marcela Carvalho, Bertrand Le Saux, Pauline Trouvé-Peloux, Andrés Almansa, and Frédéric Champagnat. 2018. Deep
depth from defocus: How can defocus blur improve 3D estimation using dense neural networks? In Proceedings of the
European Conference on Computer Vision (ECCV’18) Workshops.
[3] Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip H. S. Torr, and Shi-Min Hu. 2014. Global contrast based salient
region detection. IEEE Trans. Pattern Anal. Mach. Intell. 37, 3 (2014), 569–582.
[4] X. Cun and C. M. Pun. 2020. Defocus blur detection via depth distillation. In Proceedings of the European Conference
on Computer Vision (ECCV’20), Vol. 12358. 747–763.
[5] J. Deng, W. Dong, R. Socher, L. J. Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’09). 248–255.
[6] S. A. Golestaneh and L. J. Karam. 2017. Spatially-varying blur detection based on multiscale fused and sorted transform
coefficients of gradient magnitudes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR’17). 596–605.
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014.
Generative adversarial nets. In Proceedings of the Conference on Advances in Neural Information Processing Systems
(NeurIPS’14). 2672–2680.
[8] Wenliang Guo, Xiao Xiao, Yilong Hui, Wenming Yang, and Amir Sadovnik. 2021. Heterogeneous attention nested
U-shaped network for blur detection. IEEE Signal Process. Lett. 29 (2021), 140–144.
[9] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by
a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the Advances in Neural Information
Processing Systems (NeurIPS’17), Vol. 30.
[10] J. Hu, L. Shen, and G. Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR’18). 7132–7141.
[11] R. Huang, W. Feng, M. Fan, L. Wan, and J. Sun. 2018. Multiscale blur detection by learning discriminative deep features.
Neurocomputing 285 (2018), 154–166.
[12] Zhang Jin-Yu, Chen Yan, and Huang Xian-Xiang. 2009. Edge detection of images based on improved Sobel operator
and genetic algorithms. In Proceedings of the International Conference on Image Analysis and Signal Processing. 31–35.
[13] Alexia Jolicoeur-Martineau. 2019. The relativistic discriminator: A key element missing from standard GAN. In Pro-
ceedings of the International Conference on Learning Representations (ICLR’19).
[14] Diederick P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the Interna-
tional Conference on Learning Representations (ICLR’15).
[15] Boyi Li, Felix Wu, Kilian Q. Weinberger, and Serge J. Belongie. 2019. Positional normalization. In Proceedings of the
Advances in Neural Information Processing Systems (NeurIPS’19). 1620–1632.
[16] Jinxing Li, Dandan Fan, Lingxiao Yang, Shuhang Gu, Guangming Lu, Yong Xu, and David Zhang. 2021. Layer-output
guided complementary attention learning for image defocus blur detection. IEEE Trans. Image Process. 30 (2021), 3748–
3763.
[17] Zinan Lin, Vyas Sekar, and Giulia Fanti. 2020. Why spectral normalization stabilizes GANs: Analysis and improve-
ments. Advances in Neural Information Processing Systems (NeurIPS) 34 (2020), 9652–9638.
[18] K. Ma, H. Fu, T. Liu, Z. Wang, and D. Tao. 2018. Deep blur mapping: Exploiting high-level semantics by deep neural
networks. IEEE Trans. Image Process. 27, 10 (2018), 5155–5166.
[19] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley. 2017. Least squares generative adversarial networks. In
Proceedings of the IEEE International Conference on Computer Vision (ICCV’17). 2813–2821.
[20] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley. 2019. On the effectiveness of least squares generative
adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 41, 12 (2019), 2947–2960.
[21] Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv:1411.1784. Retrieved from
https://arxiv.org/abs/1411.1784.
[22] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018. Spectral normalization for generative
adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR’18).
[23] Farzin Mokhtarian and Riku Suomela. 1998. Robust image corner detection through curvature scale space. IEEE Trans.
Pattern Anal. Mach. Intell. 20, 12 (1998), 1376–1381.
[24] Y. Pang, H. Zhu, X. Li, and X. Li. 2016. Classifying discriminative features for blur detection. IEEE Trans. Cybern. 46,
10 (2016), 2220–2227.
[25] J. Park, Y. W. Tai, D. Cho, and I. S. Kweon. 2017. A unified approach of multi-scale deep and hand-crafted features
for defocus estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’17).
2760–2769.
[26] Alec Radford, Luke Metz, and Soumith Chintala. 2016. Unsupervised representation learning with deep convolutional
generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR’16),
Yoshua Bengio and Yann LeCun (Eds.).
[27] O. Ronneberger, P. Fischer, and T. Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In
Proceedings of the Annual Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI’15).
234–241.
[28] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: In-
verted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion (CVPR’18).
[29] J. Shi, L. Xu, and J. Jia. 2014. Discriminative blur detection features. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR’14). 2965–2972.
[30] J. Shi, L. Xu, and J. Jia. 2015. Just noticeable defocus blur detection and estimation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR’15). 657–665.
[31] Xiaoli Sun, Xiujun Zhang, Mingqing Xiao, and Chen Xu. 2020. Blur detection via deep pyramid network with recurrent
distinction enhanced modules. Neurocomputing 414 (2020), 278–290.
[32] M. Tan and Q. V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings
of the International Conference on Machine Learning (ICML’19).
[33] C. Tang, X. Liu, S. An, and P. Wang. 2021. BR2 Net: Defocus blur detection via a bidirectional channel attention residual
refining network. IEEE Trans. Multimedia 23 (2021), 624–635.
[34] C. Tang, X. Liu, X. Zheng, W. Li, J. Xiong, L. Wang, A. Zomaya, and A. Longo. 2022. DeFusionNET: Defocus blur
detection via recurrently fusing and refining discriminative multi-scale deep features. IEEE Trans. Pattern Anal. Mach.
Intell. 44, 2 (2022), 955–968.
[35] Chang Tang, Xinwang Liu, Xinzhong Zhu, En Zhu, Kun Sun, Pichao Wang, Lizhe Wang, and Albert Zomaya. 2020.
R2 MRF: Defocus blur detection via recurrently refining multi-scale residual features. In Proceedings of the AAAI Con-
ference on Artificial Intelligence, Vol. 34. 12063–12070.
[36] C. Tang, J. Wu, Y. Hou, P. Wang, and W. Li. 2016. A spectral and spatial approach of coarse-to-fine blurred image
region detection. IEEE Sign. Process. Lett. 23, 11 (2016), 1652–1656.
[37] Chang Tang, Xinzhong Zhu, Xinwang Liu, Lizhe Wang, and Albert Zomaya. 2019. DeFusionNET: Defocus blur detec-
tion via recurrently fusing and refining multi-scale deep features. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR’19).
[38] M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang. 2018. DenseASPP for semantic segmentation in street scenes. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’18). 3684–3692.
[39] X. Yi and M. Eramian. 2016. LBP-Based segmentation of defocus blur. IEEE Trans. Image Process. 25, 4 (2016), 1626–1638.
[40] K. Zeng, Y. Wang, J. Mao, J. Liu, W. Peng, and N. Chen. 2019. A local metric for defocus blur detection based on CNN
feature learning. IEEE Trans. Image Process. 28, 5 (2019), 2107–2115.
[41] Yongping Zhai, Junhua Wang, Jinsheng Deng, Guanghui Yue, Wei Zhang, and Chang Tang. 2021. Global context
guided hierarchically residual feature refinement network for defocus blur detection. Sign. Process. 183 (2021), 107996.
[42] Ning Zhang and Junchi Yan. 2020. Rethinking the defocus blur detection problem and a real-time deep DBD model.
In Proceedings of the European Conference on Computer Vision (ECCV’20). 617–632.
[43] Wenda Zhao, Xueqing Hou, You He, and Huchuan Lu. 2021. Defocus blur detection via boosting diversity of deep
ensemble networks. IEEE Trans. Image Process. 30 (2021), 5426–5438.
[44] Wenda Zhao, Cai Shang, and Huchuan Lu. 2021. Self-generated defocus blur detection via dual adversarial discrimi-
nators. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’21). 6933–6942.
[45] Wenda Zhao, Fan Zhao, Dong Wang, and Huchuan Lu. 2018. Defocus blur detection via multi-stream bottom-top-
bottom fully convolutional network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR’18).
[46] W. Zhao, F. Zhao, D. Wang, and H. Lu. 2019. Defocus blur detection via multi-stream bottom-top-bottom network.
IEEE Trans. Pattern Anal. Mach. Intell. 42, 8 (2019), 1884–1897.
[47] Wenda Zhao, Bowen Zheng, Qiuhua Lin, and Huchuan Lu. 2019. Enhancing diversity of defocus blur detectors via
cross-ensemble network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’19).