2016 IEEE 16th International Conference on Data Mining Workshops

Medical image denoising using convolutional denoising autoencoders

Lovedeep Gondara
Department of Computer Science
Simon Fraser University
[email protected]

Abstract—Image denoising is an important pre-processing step in medical image analysis. Different algorithms have been proposed over the past three decades with varying denoising performance. More recently, having outperformed all conventional methods, deep learning based models have shown great promise. These methods are, however, limited by their requirement of large training sample sizes and high computational cost. In this paper we show that, using small sample sizes, denoising autoencoders constructed using convolutional layers can be used for efficient denoising of medical images. Heterogeneous images can be combined to boost sample size for increased denoising performance. The simplest of networks can reconstruct images with corruption levels so high that noise and signal are not differentiable to the human eye.

Keywords—Image denoising, denoising autoencoder, convolutional autoencoder

I. INTRODUCTION

Medical imaging modalities including X-rays, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), ultrasound etc. are susceptible to noise [21]. Reasons vary from the use of different image acquisition techniques to attempts at decreasing patients' exposure to radiation. As the amount of radiation is decreased, noise increases [1]. Denoising is often required for proper image analysis, both by humans and machines.

Image denoising, being a classical problem in computer vision, has been studied in detail. Various methods exist, ranging from models based on partial differential equations (PDEs) [18], [20], [22], domain transformations such as wavelets [6], DCT [29] and BLS-GSM [19], and non-local techniques including NL-means [30], [3], to combinations of non-local means and domain transformations such as BM3D [7], and a family of models exploiting sparse coding techniques [17], [9], [15]. All methods share a common goal, expressed as

z = x + y    (1)

where z is the noisy image produced as the sum of the original image x and some noise y. Most methods try to approximate x using z as closely as possible. In most cases, y is assumed to be generated by a well defined process.

With recent developments in deep learning [14], [11], [23], [2], [10], results from models based on deep architectures have been promising. Autoencoders have been used for image denoising [24], [25], [28], [5]. They easily outperform conventional denoising methods and are less restrictive in the specification of noise generative processes. Denoising autoencoders constructed using convolutional layers give better image denoising performance owing to their ability to exploit strong spatial correlations.

In this paper we present empirical evidence that stacked denoising autoencoders built using convolutional layers work well for small sample sizes, which are typical of medical image databases. This is contrary to the belief that very large training datasets are needed for the optimal performance of models based on deep architectures. We also show that these methods can recover signal even when noise levels are extremely high, to the point where most other denoising methods would fail.

The rest of this paper is organized as follows: the next section discusses related work in image denoising using deep architectures, Section III introduces autoencoders and their variants, Section IV explains our experimental set-up and details our empirical evaluation, and Section V presents our conclusions and directions for future work.

II. RELATED WORK

Although BM3D [7] is considered the state of the art in image denoising and is a very well engineered method, Burger et al. [4] showed that a plain multi-layer perceptron (MLP) can achieve similar denoising performance.

Denoising autoencoders are a recent addition to the image denoising literature. Used as a building block for deep networks, they were introduced by Vincent et al. [24] as an extension to classic autoencoders. It was shown that denoising autoencoders can be stacked [25] to form a deep network by feeding the output of one denoising autoencoder to the one below it.

Jain et al. [12] proposed image denoising using convolutional neural networks. It was observed that, using a small sample of training images, performance at par with or better than the state of the art based on wavelets and Markov random fields can be achieved. Xie et al. [28] used stacked sparse autoencoders for image denoising and inpainting, which performed at par with K-SVD. Agostinelli et al. [1] experimented with adaptive multi-column deep neural networks for image denoising. Built using a combination of stacked sparse autoencoders, this system was shown to be robust to different noise types.

III. PRELIMINARIES

A. Autoencoders

An autoencoder is a type of neural network that tries to learn an approximation to the identity function using backpropagation, i.e. given a set of unlabeled training inputs x^(1), x^(2), ..., x^(n), it uses

z^(i) = x^(i)    (2)

An autoencoder first takes an input x ∈ [0, 1]^d and maps (encodes) it to a hidden representation y ∈ [0, 1]^d' using a deterministic mapping such as

y = s(Wx + b)    (3)

where s can be any non-linear function. The latent representation y is then mapped back (decoded) into a reconstruction z, which is of the same shape as x, using a similar mapping

z = s(W'y + b')    (4)

In (4), the prime symbol is not a matrix transpose. Model parameters (W, W', b, b') are optimized to minimize the reconstruction error, which can be assessed using different loss functions such as squared error or cross-entropy.

The basic architecture of an autoencoder is shown in Fig. 1 [32].

Fig. 1. A basic autoencoder

Here layer L1 is the input layer, which is encoded in layer L2 using a latent representation, and the input is reconstructed at L3.

Using a number of hidden units lower than the number of inputs forces the autoencoder to learn a compressed approximation. Mostly an autoencoder learns a low dimensional representation very similar to Principal Component Analysis (PCA). Having more hidden units than inputs can still discover useful insights by imposing certain sparsity constraints.

1) Denoising Autoencoders: A denoising autoencoder is a stochastic extension of the classic autoencoder [24]; that is, we force the model to learn the reconstruction of an input given its noisy version. A stochastic corruption process randomly sets some of the inputs to zero, forcing the denoising autoencoder to predict missing (corrupted) values for randomly selected subsets of missing patterns.

The basic architecture of a denoising autoencoder is shown in Fig. 2.

Fig. 2. Denoising autoencoder, some inputs are set to missing

Denoising autoencoders can be stacked to create a deep network (stacked denoising autoencoder) [25], shown in Fig. 3 [33].

Fig. 3. A stacked denoising autoencoder

Output from the layer below is fed to the current layer and training is done layer wise.

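To make the encode-decode mapping of (3)-(4) and the masking corruption concrete, the sketch below trains a minimal dense denoising autoencoder in Keras on randomly zeroed inputs. It is an illustrative toy, not the network used later in the paper: the flattened 64 × 64 input, the hidden layer size and the corruption level are assumptions.

```python
import numpy as np
from tensorflow.keras import layers, models

def mask_inputs(x, corruption_level=0.3, seed=0):
    """Stochastic corruption: randomly set a fraction of the inputs to zero."""
    rng = np.random.default_rng(seed)
    keep = rng.random(x.shape) > corruption_level
    return x * keep

# x_clean: training images flattened to rows and scaled to [0, 1] (hypothetical)
d = 64 * 64
inputs = layers.Input(shape=(d,))
encoded = layers.Dense(256, activation="relu")(inputs)     # y = s(Wx + b)
decoded = layers.Dense(d, activation="sigmoid")(encoded)   # z = s(W'y + b')

dae = models.Model(inputs, decoded)
dae.compile(optimizer="adam", loss="binary_crossentropy")

# Trained to recover the clean input from its corrupted version; stacking
# repeats this, feeding the encoded output into the next autoencoder.
# dae.fit(mask_inputs(x_clean), x_clean, epochs=100, batch_size=10)
```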
2) Convolutional autoencoder: Convolutional autoencoders [16] are based on the standard autoencoder architecture with convolutional encoding and decoding layers. Compared to classic autoencoders, convolutional autoencoders are better suited for image processing as they utilize the full capability of convolutional neural networks to exploit image structure.

In convolutional autoencoders, weights are shared among all input locations, which helps preserve local spatiality. For a mono channel input x, the representation of the ith feature map is given as

h^i = s(x ∗ W^i + b^i)    (5)

where the bias is broadcast to the whole map, ∗ denotes (2D) convolution and s is an activation. A single bias per latent map is used and the reconstruction is obtained as

y = s(∑_{i∈H} h^i ∗ W̃^i + c)    (6)

where c is the bias per input channel, H is the group of latent feature maps and W̃ is the flip operation over both weight dimensions. Backpropagation is used to compute the gradient of the error function with respect to the parameters.

IV. EVALUATION

A. Data

We used two datasets, the mini-MIAS database of mammograms (MMM) [13] and a dental radiography database (DX) [26]. MMM has 322 images of 1024 × 1024 resolution and DX has 400 cephalometric X-ray images collected from 400 patients with a resolution of 1935 × 2400. Random images from both datasets are shown in Fig. 4.

Fig. 4. Random sample of medical images from datasets MMM and DX, rows 1 and 2 show X-ray images from DX and row 3 shows mammograms from MMM

B. Experimental setup

All images were processed prior to modelling. Pre-processing consisted of resizing all images to 64 × 64 for computational resource reasons. The different parameters detailed in Table I were used for corruption.

TABLE I. DATASET PERTURBATIONS

Noise type | Corruption parameters
Gaussian   | p = 0.1, μ = 0, σ = 1
Gaussian   | p = 0.5, μ = 0, σ = 1
Gaussian   | p = 0.2, μ = 0, σ = 2
Gaussian   | p = 0.2, μ = 0, σ = 5
Poisson    | p = 0.2, λ = 1
Poisson    | p = 0.2, λ = 5

p is the proportion of noise introduced, σ and μ are the standard deviation and mean of the normal distribution, and λ is the mean of the Poisson distribution.

Instead of corrupting a single image at a time, the flattened dataset with each row representing an image was corrupted, hence simultaneously perturbing all images. The corrupted datasets were then used for modelling. A relatively simple architecture was used for the convolutional denoising autoencoder (CNN DAE), shown in Fig. 5.

Fig. 5. Architecture of CNN DAE used

Keras [31] was used for modelling, running on an Acer Aspire M5 notebook (Intel Core i5-4200U, 10 GB RAM, no GPU).

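The corruption step just described can be sketched as follows. Table I does not fully specify how the proportion p interacts with the noise draw, so the routine below is one plausible reading (noise added at a randomly chosen proportion p of pixel positions, then clipped back to [0, 1]); the function and parameter names are illustrative, not the author's code.

```python
import numpy as np

def corrupt(data, p, noise="gaussian", mu=0.0, sigma=1.0, lam=1.0, seed=0):
    """Perturb a proportion p of the entries of a flattened image matrix.

    data: array of shape (n_images, n_pixels) with values scaled to [0, 1].
    """
    rng = np.random.default_rng(seed)
    noisy = data.copy()
    mask = rng.random(data.shape) < p            # pixels selected for corruption
    if noise == "gaussian":
        values = rng.normal(mu, sigma, size=data.shape)
    else:                                        # "poisson"
        values = rng.poisson(lam, size=data.shape)
    noisy[mask] += values[mask]
    return np.clip(noisy, 0.0, 1.0)              # keep the pixel range valid

# Example: the lowest noise level used later for the baseline comparison.
# x_noisy = corrupt(x_flat, p=0.1, noise="gaussian", mu=0.0, sigma=1.0)
```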
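A convolutional denoising autoencoder of this kind takes only a few lines of Keras. The sketch below does not reproduce the exact layer configuration of Fig. 5, which is given only graphically; the filter counts and kernel sizes are placeholders, and the current tf.keras API is used rather than the 2016 Keras release [31]. The commented training call mirrors the settings reported below (100 epochs, batch size of 10).

```python
from tensorflow.keras import layers, models

def build_cnn_dae(input_shape=(64, 64, 1)):
    """Small convolutional denoising autoencoder (illustrative sizes only)."""
    inputs = layers.Input(shape=input_shape)
    # Encoder: convolutional feature maps, h_i = s(x * W_i + b_i)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D((2, 2), padding="same")(x)
    # Decoder: mirror of the encoder, reconstructing the clean image
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
    x = layers.UpSampling2D((2, 2))(x)
    outputs = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# cnn_dae = build_cnn_dae()
# cnn_dae.fit(x_noisy, x_clean, epochs=100, batch_size=10, validation_split=0.1)
```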
Images were compared using the structural similarity index measure (SSIM) instead of the peak signal to noise ratio (PSNR), for its consistency and accuracy [27]. A composite index of three measures, SSIM estimates the visual effects of shifts in image luminance, contrast and other remaining errors, collectively called structural changes. For original and coded signals x and y, SSIM is given as

SSIM(x, y) = [l(x, y)]^α [c(x, y)]^β [s(x, y)]^γ    (7)

where α, β and γ > 0 control the relative significance of each of the three terms in SSIM, and l, c and s are the luminance, contrast and structural components, calculated as

l(x, y) = (2 μx μy + C1) / (μx^2 + μy^2 + C1)    (8)

c(x, y) = (2 σx σy + C2) / (σx^2 + σy^2 + C2)    (9)

s(x, y) = (2 σxy + C3) / (σx σy + C3)    (10)

where μx and μy represent the means of the original and coded images, σx and σy are their standard deviations, and σxy is the covariance of the two images.

Basic settings were kept constant, with 100 epochs and a batch size of 10. No fine-tuning was performed, in order to get comparison results on a basic architecture that should be easy to implement even by a naive user. The mean of the SSIM scores over the set of test images is reported for comparison.

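The mean-SSIM evaluation can be reproduced with an off-the-shelf SSIM implementation; the paper does not say which one was used, so the scikit-image call below is only one option, and the array names refer to the earlier (hypothetical) sketches.

```python
import numpy as np
from skimage.metrics import structural_similarity

def mean_ssim(originals, restored):
    """Mean SSIM over a test set; both arrays have shape (n_images, 64, 64)."""
    scores = [structural_similarity(a, b, data_range=1.0)  # pixels in [0, 1]
              for a, b in zip(originals, restored)]
    return float(np.mean(scores))

# Example (names from the sketches above):
# denoised = cnn_dae.predict(x_test_noisy)[..., 0]
# print(mean_ssim(x_test, denoised))
```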
C. Empirical evaluation

For a baseline comparison, images corrupted with the lowest noise level (μ = 0, σ = 1, p = 0.1) were used. To keep a similar sample size for training, we used 300 images from each of the datasets, leaving us with 22 for testing in MMM and 100 in DX.

Using a batch size of 10 and 100 epochs, denoising results are presented in Fig. 6 and Table II.

Fig. 6. Denoising results on both datasets, top row shows real images with second row showing the noisier version (μ = 0, σ = 1, p = 0.1), third row shows images denoised using CNN DAE and fourth row shows results of applying a median filter

TABLE II. MEAN SSIM SCORES FOR TEST IMAGES FROM MMM AND DX DATASETS

Image type    | MMM  | DX
Noisy         | 0.45 | 0.62
CNN DAE       | 0.81 | 0.88
Median filter | 0.73 | 0.86

The results show an increased denoising performance using this simple architecture on small datasets over the use of the median filter, which is most often used for this type of noise.

The model converged nicely for the given noise levels and sample size, as shown in Fig. 7. It can be seen that even using 50 epochs, reducing training time by half, we would have got similar results.

Fig. 7. Training and validation loss from 100 epochs using a batch size of 10

To test whether an increased sample size obtained by combining heterogeneous data sources would have an impact on denoising performance, we combined both datasets, with 721 images for training and 100 for testing.

Denoising results on three randomly chosen test images from the combined dataset are shown in Fig. 8 and Table III. Table III shows that CNN DAE performs better than NL means and the median filter. Increasing the sample size marginally enhanced the denoising performance.

Fig. 8. Denoising performance of CNN DAE on combined dataset, top row shows real images, second row is the noisier version with minimal noise, third row is the denoising result of NL means, fourth row shows results of the median filter, fifth row is the result of using the smaller dataset (300 training samples) with CNN DAE, sixth row is the result of CNN DAE on the larger combined dataset

TABLE III. COMPARING MEAN SSIM SCORES USING DIFFERENT DENOISING FILTERS

Image type    | SSIM
Noisy         | 0.63
NL means      | 0.62
Median filter | 0.80
CNN DAE (a)   | 0.89
CNN DAE (b)   | 0.90

CNN DAE (a) is the denoising performance using the smaller dataset and CNN DAE (b) is the denoising performance on the same images using the combined dataset.

To test the limits of CNN DAE's denoising performance, we used the rest of the noisy datasets, with varying noise generative patterns and noise levels. Images with high corruption levels are barely visible to the human eye, so denoising performance on these is of interest. Denoising results, along with the noisy and noiseless images, for varying levels of Gaussian noise are shown in Fig. 9.

Fig. 9. Denoising performance of CNN DAE on different Gaussian noise patterns. Top row shows original images, second row is noisy images with noise levels of μ = 0, σ = 1, p = 0.5, third row shows denoising results, fourth row shows corruption with p = 0.2, σ = 5, fifth row is denoised images using CNN DAE, sixth and seventh rows show noisy and denoised images corrupted with p = 0.2, σ = 10.

It can be seen that as the noise level increases, this simple network has trouble reconstructing the original signal. However, even when the image is not visible to the human eye, the network is successful in partially generating the real images. Using a more complex, deeper model, or increasing the number of training samples and the number of epochs, might help.

The performance of CNN DAE was also tested on images corrupted using Poisson noise with p = 0.2, λ = 1 and λ = 5. Denoising results are shown in Fig. 10.

Fig. 10. CNN DAE performance on Poisson corrupted images. Top row shows images corrupted with p = 0.2, λ = 1 with second row showing denoised results using CNN DAE. Third and fourth rows show noisy and denoised images corrupted with p = 0.2, λ = 5.

Table IV shows a comparison of CNN DAE with the median filter and NL means for denoising performance on varying noise levels and types. It is clear that CNN DAE outperforms both denoising methods by a wide margin, which increases as the noise level increases.

TABLE IV. COMPARISON USING MEAN SSIM FOR DIFFERENT NOISE PATTERNS AND LEVELS

Image type    | p = 0.5 | sd = 5 | sd = 10 | Poisson, λ = 5
Noisy         | 0.10    | 0.03   | 0.01    | 0.33
NL means      | 0.25    | 0.03   | 0.01    | 0.15
Median filter | 0.28    | 0.11   | 0.03    | 0.17
CNN DAE       | 0.70    | 0.55   | 0.39    | 0.85

p = 0.5 represents 50% corrupted images with μ = 0, σ = 1; sd = 5 are images corrupted with p = 0.2, μ = 0, σ = 5; sd = 10 are corrupted with p = 0.2, μ = 0, σ = 10; and Poisson, λ = 5 are corrupted with Poisson noise using λ = 5.

Also, as the noise level is increased, the network has trouble converging. Fig. 11 shows the loss curves for Gaussian noise with μ = 0, p = 0.2, σ = 10. Even using 100 epochs, the model has not converged.

Fig. 11. Model having trouble converging at higher noise levels, no decrease in validation errors can be seen with increasing number of epochs.

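The two convergence observations above (50 epochs would have sufficed at low noise, 100 epochs are not enough at high noise) suggest monitoring the validation loss rather than fixing the epoch count. The paper deliberately avoids such tuning, so the callback below is only a hedged suggestion using a standard Keras facility, with names carried over from the earlier hypothetical sketches.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 10 epochs and roll back
# to the best weights seen so far (the patience value is arbitrary).
early_stop = EarlyStopping(monitor="val_loss", patience=10,
                           restore_best_weights=True)

# cnn_dae.fit(x_noisy, x_clean, epochs=100, batch_size=10,
#             validation_split=0.1, callbacks=[early_stop])
```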
V. CONCLUSION

We have shown that denoising autoencoders constructed using convolutional layers can be used for efficient denoising of medical images. Contrary to common belief, we have shown that good denoising performance can be achieved using small training datasets; as few as 300 training samples are enough for good performance.

Our future work will focus on finding an optimal architecture for small-sample denoising. We would like to investigate similar architectures on high resolution images, and the use of other image denoising methods such as singular value decomposition (SVD) and median filters for image pre-processing before using CNN DAE. It would also be of interest whether, given only a few images, we can combine them with other readily available datasets such as ImageNet [8] for better denoising performance by increasing the training sample size.

REFERENCES

[1] Agostinelli, Forest, Michael R. Anderson, and Honglak Lee. "Adaptive multi-column deep neural networks with application to robust image denoising." Advances in Neural Information Processing Systems. 2013.
[2] Bengio, Yoshua, et al. "Greedy layer-wise training of deep networks." Advances in Neural Information Processing Systems 19 (2007): 153.
[3] Buades, Antoni, Bartomeu Coll, and Jean-Michel Morel. "A review of image denoising algorithms, with a new one." Multiscale Modeling and Simulation 4.2 (2005): 490-530.
[4] Burger, Harold C., Christian J. Schuler, and Stefan Harmeling. "Image denoising: Can plain neural networks compete with BM3D?" Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[5] Cho, Kyunghyun. "Boltzmann machines and denoising autoencoders for image denoising." arXiv preprint arXiv:1301.3468 (2013).
[6] Coifman, Ronald R., and David L. Donoho. Translation-Invariant De-Noising. Springer New York, 1995.
[7] Dabov, Kostadin, et al. "Image denoising by sparse 3-D transform-domain collaborative filtering." IEEE Transactions on Image Processing 16.8 (2007): 2080-2095.
[8] Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
[9] Elad, Michael, and Michal Aharon. "Image denoising via sparse and redundant representations over learned dictionaries." IEEE Transactions on Image Processing 15.12 (2006): 3736-3745.
[10] Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Deep sparse rectifier neural networks." AISTATS. Vol. 15. No. 106. 2011.
[11] Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
[12] Jain, Viren, and Sebastian Seung. "Natural image denoising with convolutional networks." Advances in Neural Information Processing Systems. 2009.
[13] Suckling, J., et al. "The Mammographic Image Analysis Society digital mammogram database." Excerpta Medica, International Congress Series 1069 (1994): 375-378.
[14] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[15] Mairal, Julien, et al. "Online dictionary learning for sparse coding." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.
[16] Masci, Jonathan, et al. "Stacked convolutional auto-encoders for hierarchical feature extraction." International Conference on Artificial Neural Networks. Springer Berlin Heidelberg, 2011.
[17] Olshausen, Bruno A., and David J. Field. "Sparse coding with an overcomplete basis set: A strategy employed by V1?" Vision Research 37.23 (1997): 3311-3325.
[18] Perona, Pietro, and Jitendra Malik. "Scale-space and edge detection using anisotropic diffusion." IEEE Transactions on Pattern Analysis and Machine Intelligence 12.7 (1990): 629-639.
[19] Portilla, Javier, et al. "Image denoising using scale mixtures of Gaussians in the wavelet domain." IEEE Transactions on Image Processing 12.11 (2003): 1338-1351.
[20] Rudin, Leonid I., and Stanley Osher. "Total variation based image restoration with free local constraints." Image Processing, 1994. Proceedings. ICIP-94., IEEE International Conference. Vol. 1. IEEE, 1994.
[21] Sanches, João M., Jacinto C. Nascimento, and Jorge S. Marques. "Medical image noise reduction using the Sylvester-Lyapunov equation." IEEE Transactions on Image Processing 17.9 (2008): 1522-1539.
[22] Subakan, Ozlem, et al. "Feature preserving image smoothing using a continuous mixture of tensors." 2007 IEEE 11th International Conference on Computer Vision. IEEE, 2007.
[23] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems. 2014.
[24] Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
[25] Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." Journal of Machine Learning Research 11.Dec (2010): 3371-3408.
[26] Wang, Ching-Wei, et al. "A benchmark for comparison of dental radiography analysis algorithms." Medical Image Analysis 31 (2016): 63-76.
[27] Wang, Zhou, et al. "Image quality assessment: from error visibility to structural similarity." IEEE Transactions on Image Processing 13.4 (2004): 600-612.
[28] Xie, Junyuan, Linli Xu, and Enhong Chen. "Image denoising and inpainting with deep neural networks." Advances in Neural Information Processing Systems. 2012.
[29] Yaroslavsky, Leonid P., Karen O. Egiazarian, and Jaakko T. Astola. "Transform domain image restoration methods: review, comparison, and interpretation." Photonics West 2001 - Electronic Imaging. International Society for Optics and Photonics, 2001.
[30] Zhang, Dapeng, and Zhou Wang. "Image information restoration based on long-range correlation." IEEE Transactions on Circuits and Systems for Video Technology 12.5 (2002): 331-341.
[31] Chollet, François. Keras. GitHub repository, 2015. https://github.com/fchollet/keras
[32] Deep Learning Tutorial, Stanford University. "Autoencoders." Available: http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
[33] Introduction Auto-Encoder, wikidocs. "Stacked Denoising Auto-Encoder (SdA)." Available: https://wikidocs.net/3413
