Image Denoising With Block-Matching and 3D Filtering
Article in Proceedings of SPIE - The International Society for Optical Engineering · February 2006
DOI: 10.1117/12.643267
ABSTRACT
We present a novel approach to still image denoising based on effective filtering in 3D transform domain by
combining sliding-window transform processing with block-matching. We process blocks within the image in a
sliding manner and utilize the block-matching concept by searching for blocks which are similar to the currently
processed one. The matched blocks are stacked together to form a 3D array and, due to the similarity between
them, the data in the array exhibit a high level of correlation. We exploit this correlation by applying a 3D
decorrelating unitary transform and effectively attenuate the noise by shrinkage of the transform coefficients.
The subsequent inverse 3D transform yields estimates of all matched blocks. After repeating this procedure for
all image blocks in a sliding manner, the final estimate is computed as a weighted average of all overlapping
block estimates. A fast and efficient algorithm implementing the proposed approach is developed. The
experimental results show that the proposed method delivers state-of-the-art denoising performance, both in
terms of objective criteria and visual quality.
Keywords: image denoising, block-matching, 3D transforms
1. INTRODUCTION
Much of the recent research on image denoising has been focused on methods that reduce noise in transform
domain. Starting with the milestone work of Donoho,1, 2 many of the later techniques3-7 performed denoising in
the wavelet transform domain. Of these methods, the most successful proved to be the ones4, 5, 7 based on rather
sophisticated modeling of the noise impact on the transform coefficients of overcomplete multiscale decompositions.
Not limited to multiscale techniques, the overcomplete representations have traditionally played a significant role
in improving the restoration abilities of even the most basic transform-based methods. This is manifested by
the sliding-window transform denoising,8, 9 where the basic idea is to successively denoise overlapping blocks
by coefficient shrinkage in local 2D transform domain (e.g. DCT, DFT, etc.). Although the transform-based
approaches deliver very good overall performance in terms of objective criteria, they fail to preserve details which
are not suitably represented by the used transform and often introduce artifacts that are characteristic of this
transform.
A different denoising strategy based on non-local estimation appeared recently,10, 11 where a pixel of the true
image is estimated from regions which are found similar to the region centered at the estimated pixel. These
methods, unlike the transform-based ones, introduce very few artifacts in the estimates but often oversmooth
image details. Based on an elaborate adaptive weighting scheme, the exemplar-based denoising10 appears to be
the best of them and achieves results competitive to the ones produced by the best transform-based techniques.
The concept of employing similar data patches from different locations is popular in the video processing field
under the term of “block-matching”, where it is used to improve the coding efficiency by exploiting similarity
among blocks which follow the motion of objects in consecutive frames. Traditionally, block-matching has
found successful application in conjunction with transform-based techniques. Such applications include video
compression (MPEG standards) and also video denoising,12 where noise is attenuated in 3D DCT domain.
We propose an original image denoising method based on effective filtering in 3D transform domain by
combining sliding-window transform processing with block-matching. We apply the block-matching concept
to a single noisy image: as we process image blocks in a sliding manner, we search for blocks that exhibit similarity
to the currently-processed one. The matched blocks are stacked together to form a 3D array. In this manner,
d(Z_{x_1}, Z_{x_2}) = N_1^{-1} \left\| \Upsilon\!\left( T_{2D}(Z_{x_1}),\, \lambda_{thr2D}\,\sigma\sqrt{2\log(N_1^2)} \right) - \Upsilon\!\left( T_{2D}(Z_{x_2}),\, \lambda_{thr2D}\,\sigma\sqrt{2\log(N_1^2)} \right) \right\|_2 , \quad (1)

where x1, x2 ∈ X, T2D is a 2D linear unitary transform operator (e.g. DCT, DFT, etc.), Υ is a hard-threshold
operator, λthr2D is a fixed threshold parameter, and ‖·‖2 denotes the L2-norm. Naturally, Υ is defined elementwise as

\Upsilon(\gamma, \lambda) = \begin{cases} \gamma, & \text{if } |\gamma| > \lambda, \\ 0, & \text{otherwise.} \end{cases}
The result of the block-matching is a set SxR ⊆ X of the coordinates of the blocks that are similar to ZxR
according to our d-distance (1); thus, SxR is defined as

S_{x_R} = \left\{ x \in X : d(Z_{x_R}, Z_x) \le \tau_{match} \right\}, \quad (2)

where τmatch is the maximum d-distance for which two blocks are considered similar. Obviously d(ZxR, ZxR) = 0,
which implies that |SxR| ≥ 1, where |SxR| denotes the cardinality of SxR.

Figure 1. Fragments of Lena, House, Boats and Barbara corrupted by AWGN of σ = 15. For each fragment,
block-matching is illustrated by showing a reference block marked with 'R' and a few of its matched ones.
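To make the matching step concrete, here is a minimal Python sketch of the d-distance (1) and of the set SxR in (2). It assumes the 2D DCT for T2D (one of the transforms the paper suggests) and an illustrative threshold factor λthr2D = 0.5, which is not a value reported in this excerpt; also, the exhaustive full-image search shown here is restricted to a local NS × NS neighborhood in the practical algorithm of Section 4.

```python
import numpy as np
from scipy.fft import dctn

def hard_threshold(coeffs, lam):
    """Υ: zero every coefficient whose magnitude does not exceed lam."""
    out = coeffs.copy()
    out[np.abs(out) <= lam] = 0.0
    return out

def d_distance(z1, z2, sigma, lam_thr2d=0.5):
    """Block distance (1): N1^-1 times the L2 norm of the difference of the
    hard-thresholded 2D-DCT spectra of the two noisy blocks."""
    n1 = z1.shape[0]
    lam = lam_thr2d * sigma * np.sqrt(2.0 * np.log(n1 ** 2))
    t1 = hard_threshold(dctn(z1, norm='ortho'), lam)
    t2 = hard_threshold(dctn(z2, norm='ortho'), lam)
    return np.linalg.norm(t1 - t2) / n1

def block_matching(image, x_ref, n1, sigma, tau_match):
    """S_xR of (2): coordinates of all n1-by-n1 blocks whose d-distance to
    the reference block at x_ref is at most tau_match (exhaustive search)."""
    z_ref = image[x_ref[0]:x_ref[0] + n1, x_ref[1]:x_ref[1] + n1]
    matches = []
    for i in range(image.shape[0] - n1 + 1):
        for j in range(image.shape[1] - n1 + 1):
            z = image[i:i + n1, j:j + n1]
            if d_distance(z_ref, z, sigma) <= tau_match:
                matches.append((i, j))
    return matches
```

Because d(ZxR, ZxR) = 0, the returned set always contains the reference coordinate itself, so |SxR| ≥ 1.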
The matching procedure in the presence of noise is demonstrated in Figure 1, where we show a few reference
blocks and the blocks matched as similar to them.
2.1.2. Denoising in 3D transform domain
We stack the matched noisy blocks Zx, x ∈ SxR (ordering them by increasing d-distance to ZxR), to form a 3D array
of size N1 × N1 × |SxR|, which is denoted by ZSxR. We apply a unitary 3D transform T3D on ZSxR in order
to attain a sparse representation of the true signal. The noise is attenuated by hard-thresholding the transform
coefficients. Subsequently, the inverse transform operator T3D^{-1} yields a 3D array of reconstructed estimates

Y_{S_{x_R}} = T_{3D}^{-1}\!\left( \Upsilon\!\left( T_{3D}(Z_{S_{x_R}}),\, \lambda_{thr3D}\,\sigma\sqrt{2\log(N_1^2)} \right) \right), \quad (3)
where λthr3D is a fixed threshold parameter. The array YSxR comprises |SxR| stacked local block estimates
Yx^xR of the true image blocks located at x ∈ SxR. We define a weight for these local estimates as

\omega_{x_R} = \begin{cases} \dfrac{1}{N_{har}}, & \text{if } N_{har} \ge 1, \\ 1, & \text{otherwise}, \end{cases} \quad (4)

where Nhar is the number of non-zero transform coefficients after hard-thresholding. Observe that σ²Nhar is
equal* to the total variance of YSxR. Thus, sparser decompositions of ZSxR result in less noisy estimates, which
are awarded greater weights by (4).
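The hard-thresholding step (3) and the weight (4) can be sketched as follows, assuming a separable 3D DCT for T3D and an illustrative threshold factor λthr3D = 2.7 (the parameter values used to produce the reported results are not given in this excerpt).

```python
import numpy as np
from scipy.fft import dctn, idctn

def denoise_stack_ht(z_stack, sigma, lam_thr3d=2.7):
    """Hard-thresholding in 3D transform domain, eqs. (3)-(4).

    z_stack: matched noisy blocks stacked into shape (N1, N1, |S_xR|).
    Returns the stack of block estimates and the weight ω_xR.
    """
    n1 = z_stack.shape[0]
    lam = lam_thr3d * sigma * np.sqrt(2.0 * np.log(n1 ** 2))
    coeffs = dctn(z_stack, norm='ortho')         # unitary separable 3D transform
    coeffs[np.abs(coeffs) <= lam] = 0.0          # Υ(·, λ): hard-thresholding
    n_har = int(np.count_nonzero(coeffs))        # retained (non-zero) coefficients
    y_stack = idctn(coeffs, norm='ortho')        # inverse 3D transform, eq. (3)
    weight = 1.0 / n_har if n_har >= 1 else 1.0  # ω_xR as in eq. (4)
    return y_stack, weight
```

A sparser thresholded spectrum (smaller Nhar) yields a larger weight, exactly as (4) prescribes.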
where ĒxR and Ēx are the mean values of the blocks ExR and Ex, respectively. The mean subtraction allows for
improved matching of blocks with similar structures but different mean values.
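Since the defining equations (5)-(6) are not reproduced in this excerpt, the following is only an illustrative sketch of a mean-subtracted block distance; the exact normalization used by the authors may differ.

```python
import numpy as np

def mean_subtracted_distance(e1, e2):
    """Normalized L2 distance between mean-subtracted estimate blocks, so
    blocks with the same structure but different means match closely."""
    n1 = e1.shape[0]
    return np.linalg.norm((e1 - e1.mean()) - (e2 - e2.mean())) / n1
```

Two blocks differing only by a constant offset have distance zero under this measure.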
where ESxR is a 3D array built by stacking the matched blocks Ex, x ∈ SxR (in the same manner as ZSxR is built by
stacking Zx, x ∈ SxR). We filter the 3D array of noisy observations ZSxR in the T3D-transform domain by an elementwise
multiplication with WSxR. The subsequent inverse transform gives
Y_{S_{x_R}} = T_{3D}^{-1}\!\left( W_{S_{x_R}} \, T_{3D}(Z_{S_{x_R}}) \right), \quad (7)

where YSxR comprises stacked local block estimates Yx^xR of the true image blocks located at the matched
locations x ∈ SxR. As in (4), the weight assigned to the estimates is inversely proportional to the total variance
of YSxR and is defined as

\omega_{x_R} = \left( \sum_{i=1}^{N_1} \sum_{j=1}^{N_1} \sum_{t=1}^{|S_{x_R}|} W_{S_{x_R}}(i, j, t)^2 \right)^{-1}. \quad (8)
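The attenuation coefficients WSxR are defined by an equation not reproduced in this excerpt; the sketch below therefore assumes the standard empirical Wiener attenuation computed from the transformed initial-estimate stack, W = Θ² / (Θ² + σ²) with Θ = T3D(ESxR), and again uses a separable 3D DCT for T3D.

```python
import numpy as np
from scipy.fft import dctn, idctn

def denoise_stack_wiener(z_stack, e_stack, sigma):
    """Wiener filtering in 3D transform domain, eqs. (7)-(8).

    z_stack: matched noisy blocks; e_stack: the co-located blocks taken
    from the initial (hard-thresholding) estimate, stacked identically.
    """
    spec_e = dctn(e_stack, norm='ortho')
    w = spec_e ** 2 / (spec_e ** 2 + sigma ** 2)  # empirical Wiener attenuation (assumed form)
    coeffs = w * dctn(z_stack, norm='ortho')      # elementwise multiplication in transform domain
    y_stack = idctn(coeffs, norm='ortho')         # inverse transform, eq. (7)
    weight = 1.0 / np.sum(w ** 2)                 # ω_xR as in eq. (8)
    return y_stack, weight
```

A nearly noise-free initial estimate drives the attenuation coefficients toward 1 on signal-carrying coefficients and toward 0 elsewhere.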
4. ALGORITHM
We present an algorithm which employs the hard-thresholding approach (from Section 2) to deliver an initial
estimate for the Wiener filtering part (from Section 3) that produces the final estimate. A straightforward
implementation of this general approach is computationally demanding. Thus, in order to realize a practical
and efficient algorithm, we impose constraints and exploit certain expedients. In this section we introduce these
aspects and develop an efficient implementation of the proposed approach.
The choice of the transforms T2D and T3D is governed by their energy compaction (sparsity) ability for noise-
free image blocks (2D) and stacked blocks (3D), respectively. It is often assumed that neighboring pixels in small
blocks extracted from natural images exhibit high correlation; thus, such blocks can be sparsely represented
by well-established decorrelating transforms, such as the DCT, the DFT, wavelets, etc. From a computational-efficiency
point of view, however, separability and the availability of fast algorithms are very important
characteristics. Hence, the most natural choice for T2D and T3D is a fast separable transform which allows for a sparse
representation of the true-image signal in each dimension of the input array.
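The separability property can be verified directly: a separable 3D transform such as the DCT factors into 1D transforms applied successively along each dimension of the stacked array, which is what enables fast implementations. A small check in Python:

```python
import numpy as np
from scipy.fft import dct, dctn

# A stack of four 8x8 blocks, as produced by block-matching.
stack = np.random.default_rng(0).normal(size=(8, 8, 4))

# Full 3D DCT versus three successive 1D DCTs, one along each axis.
full_3d = dctn(stack, norm='ortho')
separable = dct(dct(dct(stack, axis=0, norm='ortho'),
                    axis=1, norm='ortho'), axis=2, norm='ortho')
assert np.allclose(full_3d, separable)
```

The factorization lets each dimension be transformed with its own fast 1D algorithm.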
(a) Block-matching. Compute SxR as given in Equation (2) but restrict the search to a local neighborhood
of fixed size NS × NS centered about xR . If |SxR | > N2 , then let only the coordinates of the N2 blocks
with smallest d-distance to ZxR remain in SxR and exclude the others.
(b) Denoising by hard-thresholding in local 3D transform domain. Compute the local estimate blocks
Yx^xR, x ∈ SxR, and their corresponding weight ωxR as given in (3) and (4), respectively.
(c) Aggregation. Scale each reconstructed local block estimate Yx^xR, where x ∈ SxR, by a block of weights
W(xR) = ωxR Wwin2D and accumulate it to the estimate buffer: ebuff(x) = ebuff(x) + W(xR) Yx^xR, for all
x ∈ SxR. Accordingly, the weight block is accumulated to the same locations as the estimates but in the
weights buffer: wbuff(x) = wbuff(x) + W(xR), for all x ∈ SxR.
3. Intermediate estimate. Produce the intermediate estimate e(x) = ebuff(x)/wbuff(x) for all x ∈ X, which is to be
used as the initial estimate for the Wiener counterpart.
4. Local Wiener filtering estimates. Use e as the initial estimate. The buffers are re-initialized: ebuff(x) = 0
and wbuff(x) = 0, for all x ∈ X. For each xR ∈ XR, do the following sub-steps.
(a) Block-matching. Compute SxR as given in (6) but restrict the search to a local neighborhood of fixed
size NS × NS centered about xR . If |SxR | > N2 , then let only the coordinates of the N2 blocks with
smallest distance (as defined in Subsection 3.1) to ExR remain in SxR and exclude the others.
(b) Denoising by Wiener filtering in local 3D transform domain. The local block estimates Yx^xR, x ∈ SxR, and
their weight ωxR are computed as given in (7) and (8), respectively.
(c) Aggregation. It is identical to step 2c.
5. Final estimate. The final estimate is given by ŷ(x) = ebuff(x)/wbuff(x), for all x ∈ X.
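The aggregation sub-step and the buffer ratios of steps 3 and 5 can be sketched as below; here win2d stands in for the window Wwin2D, whose exact form is not specified in this excerpt (a block of ones is the trivial choice).

```python
import numpy as np

def aggregate(ebuff, wbuff, y_stack, coords, omega, win2d):
    """Step (c): accumulate weighted block estimates into ebuff and the
    corresponding weight blocks W(xR) = omega * win2d into wbuff."""
    n1 = y_stack.shape[0]
    w_block = omega * win2d
    for t, (i, j) in enumerate(coords):
        ebuff[i:i + n1, j:j + n1] += w_block * y_stack[:, :, t]
        wbuff[i:i + n1, j:j + n1] += w_block

def buffer_ratio(ebuff, wbuff):
    """Steps 3 and 5: pointwise ratio of the two buffers (pixels that were
    never covered by any block keep the value 0)."""
    return ebuff / np.maximum(wbuff, np.finfo(float).tiny)
```

Running the hard-thresholding pass over all reference blocks, aggregating, and taking the buffer ratio yields the intermediate estimate e; repeating with the Wiener pass yields the final estimate ŷ.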
4.2. Complexity
The time complexity order of the algorithm as a function of its parameters is given by
where the first two addends are due to block-matching and the third is due to T3D used for denoising and where
OT2D (N1 , N1 ) and OT3D (N1 , N1 , N2 ) denote the complexity orders of the transforms T2D and T3D , respectively.
Both OT2D and OT3D depend on properties of the adopted transforms such as separability and availability of
fast algorithms. For example, the DFT has an efficient implementation by means of fast Fourier transform
(FFT). The 2D FFT, in particular, has complexity O(N1 N2 log(N1 N2)), as opposed to O(N1² N2²) for a custom
non-separable transform. Moreover, an effective trade-off between complexity and denoising performance can be
achieved by varying Nstep .
Image:          Lena      Barbara   House     Peppers   Boats     Couple    Hill
σ / PSNR        512×512   512×512   256×256   256×256   512×512   512×512   512×512
  5 / 34.15     38.63     38.18     39.54     37.84     37.20     37.40     37.11
 10 / 28.13     35.83     34.87     36.37     34.38     33.79     33.88     33.57
 15 / 24.61     34.21     33.08     34.75     32.31     31.96     31.93     31.79
 20 / 22.11     33.03     31.77     33.54     30.87     30.65     30.58     30.60
 25 / 20.17     32.08     30.75     32.67     29.80     29.68     29.57     29.74
 30 / 18.59     31.29     29.90     31.95     28.97     28.90     28.75     29.04
 35 / 17.25     30.61     29.13     31.21     28.14     28.20     28.03     28.46
 50 / 14.16     29.08     27.51     29.65     26.46     26.71     26.46     27.21
100 /  8.13     26.04     24.14     25.92     23.11     24.00     23.60     24.77
\mathrm{PSNR} = 10 \log_{10} \left( \frac{255^2}{|X|^{-1} \sum_{x \in X} \left( y(x) - \hat{y}(x) \right)^2} \right),

where ŷ denotes the final estimate of the true image y.
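For reference, the criterion above in code, assuming an 8-bit intensity range (peak value 255):

```python
import numpy as np

def psnr(y_true, y_est):
    """PSNR in dB between the true image and its estimate."""
    mse = np.mean((np.asarray(y_true, float) - np.asarray(y_est, float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

A uniform error of 10 gray levels, for example, gives a PSNR of about 28.13 dB.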
At http://www.cs.tut.fi/~foi/3D-DFT, we provide a collection of the original and denoised test images that
were used in our experiments, together with the algorithm implementation (as C++ and MATLAB functions)
which produced all reported results. With the mentioned parameters, the execution time of the whole algorithm
is less than 9 seconds for an input image of size 256 × 256 on a 3 GHz Pentium machine.
In Figure 3, we compare the output PSNR of our method with the reported results of three state-of-the-art
techniques6, 7, 10 known to the authors as the best. However, for standard deviations 30 and 35 we could neither
find nor reproduce the results of the FSP+TUP7 and exemplar-based10 techniques; they are therefore omitted.
In Figure 4, we show noisy (σ = 35) House image and the corresponding denoised one. For this test
image, similarity among neighboring blocks is easy to perceive in the uniform regions and in the regular-shaped
structures. Hence, those details are well-preserved in our estimate. It is worth referring to Figure 1, where
block-matching is illustrated for a fragment of House.
Pairs of noisy (σ = 35) and denoised Lena and Hill images are shown in Figures 5 and 6, respectively. The
enlarged fragments in each figure help to demonstrate the good quality of the denoised images in terms of faithful
detail preservation (stripes on the hat in Lena and the pattern on the roof in Hill).
We show fragments of noisy (σ = 50) and denoised Lena, Barbara, Couple, and Boats images in Figure 7. For
this relatively high level of noise, there are very few disturbing artifacts and the proposed technique attains good
preservation of: sharp details (the table legs in Barbara and the poles in Boats), smooth regions (the cheeks of
Lena and the suit of the man in Couple), and oscillatory patterns (the table cover in Barbara). A fragment of
Couple corrupted by noise of various standard deviations is presented in Figure 8.
In order to demonstrate the capability of the proposed method to preserve textures, we show fragments of
heavily noisy (σ = 100) and denoised Barbara in Figure 9. Although the true signal is almost completely buried
under noise, the stripes on the clothes are faithfully restored in the estimate.
Figure 3. Output PSNR as a function of the standard deviation for Barbara (a), Lena (b), Peppers (c), and House (d).
The notation is: proposed method (squares), FSP+TUP 7 (circles), BLS-GSM 6 (stars), and exemplar-based10 (triangles).
We conclude by remarking that the proposed method outperforms, in terms of objective criteria, all techniques
known to us. Moreover, our estimates retain good visual quality even for relatively high levels of noise.
Our current research extends the presented approach by the adoption of variable-sized blocks and shape-
adaptive transforms,13 thus further improving the adaptivity to the structures of the underlying image. Also,
application of the technique to more general restoration problems is being considered.
REFERENCES
1. D. L. Donoho and I. M. Johnstone, "Adapting to unknown smoothness via wavelet shrinkage," J. Amer.
Stat. Assoc., vol. 90, pp. 1200-1224, 1995.
2. D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, pp. 613-627, 1995.
3. S. G. Chang, B. Yu, and M. Vetterli, "Adaptive wavelet thresholding for image denoising and compression,"
IEEE Trans. Image Processing, vol. 9, pp. 1532-1546, 2000.
4. A. Pizurica, W. Philips, I. Lemahieu, and M. Acheroy, "A joint inter- and intrascale statistical model for
Bayesian wavelet based image denoising," IEEE Trans. Image Processing, vol. 11, pp. 545-557, 2002.
5. L. Sendur and I. W. Selesnick, "Bivariate shrinkage with local variance estimation," IEEE Signal Processing
Letters, vol. 9, pp. 438-441, 2002.
6. J. Portilla, V. Strela, M. Wainwright, and E. P. Simoncelli, "Image denoising using scale mixtures of
Gaussians in the wavelet domain," IEEE Trans. Image Processing, vol. 12, pp. 1338-1351, 2003.
7. J. A. Guerrero-Colon and J. Portilla, "Two-level adaptive denoising using Gaussian scale mixtures in
overcomplete oriented pyramids," in Proc. of IEEE Int'l Conf. on Image Processing, Genoa, Italy, September
2005.
Figure 5. On the left are noisy (σ = 35) Lena and two enlarged fragments from it; on the right are the denoised image
(PSNR 30.61 dB) and the corresponding fragments.
Figure 6. On the left are noisy (σ = 35) Hill and two fragments from it; on the right are the denoised image (PSNR
28.46 dB) and the corresponding fragments from it.
(c) Couple (σ = 50, PSNR 26.46 dB) (d) Boats (σ = 50, PSNR 26.71 dB)
8. L. Yaroslavsky, K. Egiazarian, and J. Astola, "Transform domain image restoration methods: review,
comparison and interpretation," in Nonlinear Image Processing and Pattern Analysis XII, Proc. SPIE 4304,
pp. 155-169, 2001.
9. R. Öktem, L. Yaroslavsky, and K. Egiazarian, "Signal and image denoising in transform domain and wavelet
shrinkage: a comparative study," in Proc. of EUSIPCO'98, Rhodes, Greece, September 1998.
10. C. Kervrann and J. Boulanger, "Local adaptivity to variable smoothness for exemplar-based image denoising
and representation," Research Report INRIA, RR-5624, July 2005.
11. A. Buades, B. Coll, and J. M. Morel, "A review of image denoising algorithms, with a new one," Multiscale
Model. Simul., vol. 4, pp. 490-530, 2005.
12. D. Rusanovskyy and K. Egiazarian, "Video denoising algorithm in sliding 3D DCT domain," in Proc. of
ACIVS'05, Antwerp, Belgium, September 2005.
13. A. Foi, K. Dabov, V. Katkovnik, and K. Egiazarian, "Shape-adaptive DCT for denoising and image
reconstruction," in Electronic Imaging'06, Proc. SPIE 6064, no. 6064A-18, San Jose, California, USA, 2006.
Figure 8. Pairs of fragments of noisy and denoised Couple for standard deviations: 25 (a), 50 (b), 75 (c), and 100 (d).
Figure 9. Fragments of noisy (σ = 100) and denoised (PSNR 24.14 dB) Barbara.