Figure 1: Capture and reconstruction of color images on single-sensor cameras. A color filter array (CFA) selectively allows scene photons with certain wavelengths to reach portions of a monochromatic sensor. A color image is then reconstructed from the filtered samples (multispectral image mosaic) using a demosaicing algorithm. We model this process as an autoencoder: the CFA projection encodes color information onto the monochromatic sensor, which is later decoded by the color-reconstruction method. The joint design of CFA patterns and demosaicing produces high-quality color reconstruction, outperforming existing techniques.
Abstract
We present a convolutional neural network architecture for performing joint design of color filter array (CFA) patterns and demosaicing. Our generic model allows the training of CFAs of arbitrary sizes, optimizing each color filter over the entire RGB color space. The patterns and algorithms produced by our method provide high-quality color reconstructions. We demonstrate the effectiveness of our approach by showing that its results achieve higher PSNR than the ones obtained with state-of-the-art techniques on all standard demosaicing datasets, both for noise-free and noisy scenarios. Our method can also be used to obtain demosaicing strategies for pre-defined CFAs, such as the Bayer pattern, for which our results surpass even demosaicing algorithms designed specifically for that pattern.
CCS Concepts
•Computing methodologies → Computational photography; Neural networks;
• A method for the joint design of CFA pattern and demosaicing that minimizes color-reconstruction errors (Section 3). Our model is the first to optimize CFA colors over the entire RGB color space, while jointly optimizing demosaicing. The results produced by our system outperform existing solutions in terms of PSNR for both noise-free and noisy data (Section 4);
• An autoencoder architecture that models the color-image capture process on single monochromatic sensors. Our architecture achieves fast training convergence on image patches, and works with CFAs of different sizes, including existing ones (Section 3.1).

2. Related work

2.1. CFA Design

Following the work of Bayer [Bay76], several color filter arrays have been proposed over the years. Lukac and Plataniotis [LP05] analysed the performance of ten RGB CFAs. More recently, re-

2.2. Demosaicing

Demosaicing is a well-studied problem, with many surveys on existing methods [LGZ08, MC11, KB15]. Demosaicing algorithms have been proposed for the frequency domain [ASH05, Dub05, LCTZ07], and based on hard-coded heuristics for interpolation [JAJ∗14, Li05, ZW05], self-similarities [ZWBL11a, BCMS09], optimization schemes [CM12, HST∗14, KHKP16], and compressive sensing [MBP∗09, MAKR13, DVM16]. Next, we discuss demosaicing strategies based on neural networks.

Kapah and Hel-Or [KHO00] and Go et al. [GSL00] were the first to use neural networks for demosaicing. Long and Huang [LH06] later proposed an adaptive scheme to improve Go et al.'s method. Heinze et al. [HvLP12] proposed multi-frame demosaicing using a neural network for estimating the pixel color based on its surroundings. Wang [Wan14] used 4 × 4 patches to train a multilayer neural network while minimizing a suitable objective function. Gharbi et al. constructed a dataset with hard cases, which were used to train a CNN for joint demosaicing and denoising [GCPD16]. All these methods were specifically designed to reconstruct Bayer-filtered images.
Figure 3: Example of an encoder mimicking the Bayer pattern. Each color filter is represented by weights $\vec{w}_i = [w_{i_r}, w_{i_g}, w_{i_b}, w_{i_t}]$ and a mask ($\mathrm{mask}_i$), generating a submosaic. For the Bayer pattern, the bias term $w_{i_t} = 0$ for all three color filters $\vec{w}_i$. The $I_{\mathrm{input}} \otimes \vec{w}_i$, $\mathrm{submosaic}_i$, and $I_{\mathrm{mosaic}}$ images are colored just for illustration, as they are single-channel images.
2.3. Convolutional Neural Networks and Autoencoders

Convolutional neural networks [LBBH98] have been extensively used for image classification [KSH12], and more recently have been applied to a large variety of visual tasks. An autoencoder is a variation of a neural network that tries to learn a representation of its input in a lower-dimensional space and then reproduce the original information from such a sparse representation. Introduced in the CNN literature as a data-driven compression method [KW13], the autoencoder concept has already been used for image denoising [VLL∗10], data visualization [vdMH08], superresolution [ZYW∗15], and to learn priors used for image reconstruction [CJN∗17].

Residual architectures (with skip connections) have been used in several image-processing tasks such as denoising [ZZC∗17], superresolution [TAG∗17], and others [JMFU17]. Our architecture similarly makes use of skip connections, but instead of merging branches by summing feature maps (as in the original ResNet [HZRS16]), we do so by concatenating the skipped and original feature maps. Such a choice has already been used in recent works [SLJ∗15, SVI∗15].

3. Joint Design of CFAs and Demosaicing

Our joint design of CFA pattern and demosaicing is expressed as the training of an autoencoder. Given a set of training images, the encoding procedure consists of projecting the corresponding input RGB information on the (trainable) color filter array pattern (Section 3.1). Such a projection generates a single-channel multispectral image mosaic (Fig. 1 (center)), which serves as input for the decoding (color reconstruction) step.

The training process minimizes a loss function defined as the mean squared error (MSE) between the provided ground truth and the reconstructed color images. The trainable parameters are the colors of the CFA pattern (encoder) and the CNN weights for demosaicing (decoder). Fig. 1 illustrates the concept. When designing the encoder, we are restricted by physical limitations imposed by the construction of actual CFAs, which preclude the use of non-linear activation functions and stacked layers. The decoder, however, may use as many layers as desired, as it is computed after the sampling process. The architecture of our network is detailed in Section 3.1.

Using only convolutional, ReLU, and batch-normalization layers [IS15], our CNN architecture supports images of different sizes. By avoiding the use of fully-connected layers, the network can be trained using small image patches (with 128 × 128 pixels) and still reconstruct images of variable resolution (without the need of breaking the image into patches). This gives our CNN great flexibility and significantly reduces training time.

3.1. Our Autoencoder Architecture

Encoding: Our architecture optimizes colors over the entire RGB color space. Each component (i.e., color filter) of the CFA pattern is represented by a four-dimensional vector $\vec{w} = [w_r, w_g, w_b, w_t]$. The weights $[w_r, w_g, w_b] \in \mathbb{R}^3_{\geq 0}$ are RGB coefficients that represent the actual color filter (Fig. 2), and $w_t \in \mathbb{R}$ is a bias term. Appendix A shows the full vectors $\vec{w}_i$ associated with our CFAs. Given a pixel from an input image with RGB coefficients $\vec{p} = [p_r, p_g, p_b] \in \mathbb{R}^3_{\geq 0}$, we model the encoding of $\vec{p}$ by the color filter $\vec{w}$ using an affine functional $\otimes$ defined as:

$$\vec{p} \otimes \vec{w} := p_r w_r + p_g w_g + p_b w_b + w_t. \qquad (1)$$

Eq. (1) is modeled as a convolution of the input image with a 1 × 1 × 3 kernel plus a bias term. Note that since all coefficients of $\vec{w}$ are trainable, we must enforce non-negative weights (i.e., $w_r, w_g, w_b \geq 0$) to guarantee that the color filter is physically realizable. For this, after each update, all negative weights are clamped to zero. We do not constrain the value of the bias parameter $w_t$ since it is added after image capture. Similarly, we do not constrain the maximum value of the weights $w_r$, $w_g$, and $w_b$, to allow for a wider range of admissible parameters during training (any constant rescaling may be performed after image capture as well). While such restrictions could be included in the architecture, having them could hamper training convergence (by reducing the optimizer's search space). In practice, however, the weights produced by our method seem to always fall in the [0, 1] interval (see Fig. 2).
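To make the encoder concrete, here is a minimal sketch of a single trainable color filter, implementing Eq. (1) as a 1 × 1 convolution with bias. This is our illustration, not the authors' code: the paper used Keras on Theano, while the sketch below uses tf.keras, whose NonNeg constraint clamps negative kernel weights to zero after each update, matching the clamping described above (the bias is left unconstrained).

```python
# Sketch (ours) of Eq. (1): one CFA color filter as a 1x1x3 convolution
# plus bias, with the RGB weights kept non-negative after each update.
import tensorflow as tf

def make_color_filter():
    # Maps each RGB pixel p to p (x) w = p_r*w_r + p_g*w_g + p_b*w_b + w_t.
    return tf.keras.layers.Conv2D(
        filters=1, kernel_size=1, use_bias=True,          # 1x1x3 kernel + bias w_t
        kernel_constraint=tf.keras.constraints.NonNeg())  # clamp w_r, w_g, w_b >= 0

rgb_patch = tf.random.uniform((1, 128, 128, 3))  # one 128x128 RGB training patch
response = make_color_filter()(rgb_patch)        # single-channel filter response
print(response.shape)                            # (1, 128, 128, 1)
```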
Each color filter $\vec{w}_i$ has an associated binary mask ($\mathrm{mask}_i$) that specifies the pixels projected through it. Thus, simulating a CFA containing $N$ distinct color filters requires the use of $N$ distinct functionals and $N$ disjoint binary masks. The projected submosaic generated by the $i$-th color filter is then defined as

$$\mathrm{submosaic}_i = \left( I_{\mathrm{input}} \otimes \vec{w}_i \right) \times \mathrm{mask}_i, \qquad (2)$$

where $I_{\mathrm{input}}$ is an RGB input image, $\vec{w}_i = [w_{i_r}, w_{i_g}, w_{i_b}, w_{i_t}]$, $\mathrm{mask}_i$ is the binary mask corresponding to the $i$-th CFA color, and both the functional $\otimes$ and the product $\times$ are evaluated pixelwise. Thus, the multispectral mosaic generated by a CFA with $N$ color filters is defined as

$$I_{\mathrm{mosaic}} = \sum_{i=1}^{N} \mathrm{submosaic}_i. \qquad (3)$$

Note that one can define CFAs of arbitrary sizes while enforcing how the pattern should repeat by using the disjoint masks. During training, the weights $\vec{w}_i$ are optimized to minimize color-reconstruction error, while all masks remain fixed. Appendix B describes the construction of the binary masks used in our experiments. Fig. 3 shows an example of an encoder mimicking the Bayer pattern, for which the bias $w_{i_t} = 0$ for all $i$. Note that the mask corresponding to the green component is twice as dense as the others.
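The following NumPy sketch (our illustration; the helper name `encode` is ours) walks through Eqs. (1)–(3) for a Bayer-like 2 × 2 pattern with three pure-color filters and zero bias:

```python
# Sketch (ours) of Eqs. (1)-(3): project the RGB image through each
# color filter, gate the result with that filter's binary mask, and sum
# the submosaics into the single-channel multispectral mosaic.
import numpy as np

def encode(img, filters, masks):
    """img: (H, W, 3) RGB; filters: list of (w_r, w_g, w_b, w_t) tuples;
    masks: list of (H, W) binary arrays, disjoint and covering the image."""
    mosaic = np.zeros(img.shape[:2])
    for (wr, wg, wb, wt), mask in zip(filters, masks):
        projected = img @ np.array([wr, wg, wb]) + wt  # Eq. (1), pixelwise
        mosaic += projected * mask                     # Eq. (2), summed per Eq. (3)
    return mosaic

# Bayer-like 2x2 example: pure G, R, B filters with zero bias (cf. Fig. 3).
rows, cols = np.indices((4, 4))
mask_g = ((rows + cols) % 2 == 0).astype(float)             # green, twice as dense
mask_r = ((rows % 2 == 0) & (cols % 2 == 1)).astype(float)
mask_b = ((rows % 2 == 1) & (cols % 2 == 0)).astype(float)
img = np.random.rand(4, 4, 3)
mosaic = encode(img, [(0, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0)],
                [mask_g, mask_r, mask_b])
print(mosaic.shape)  # (4, 4)
```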
Decoding: The decoding step receives as input the multispectral image mosaic $I_{\mathrm{mosaic}}$ produced by the encoder (Eq. (3)) and tries to reconstruct the original RGB image $I_{\mathrm{input}}$. The decoder architecture consists of stacked convolutional layers, each one followed by a batch-normalization layer [IS15] and by the ReLU activation function [NH10]. All convolutional layers use 3×3 kernels, using padding to ensure the same xy-dimensions for all receptive fields. We trained multiple architectures using different numbers of skip connections. The results of these experiments suggest that the use of a single skip connection (from the beginning of the decoder to its end) results in better performance, both in terms of faster convergence and better reconstruction. All results presented in Section 4 and in the supplemental materials were obtained with such an architecture, which is shown in Fig. 4.
In addition to the monochromatic image mosaic, we provide the following additional inputs to the decoder: the submosaics of each color filter $\vec{w}_i$, and linearly-interpolated versions of each submosaic. Although the submosaics themselves do not add new information to the decoding sub-network, they save the effort of learning how to separate individual channels, thus reducing training time. The linearly-interpolated versions of the submosaics, in turn, lead to better results and faster convergence, since such an initial guess for the color-reconstructed image is much closer to the target image. In our CNN architecture, linear interpolation is achieved by convolving the submosaics with a tent kernel. Since this kernel is separable, the 2D interpolation kernel $k_{n,m}$ is defined as the outer product $k_n k_m^T$, where

$$k_c = \left[ \tfrac{1}{c} \;\; \tfrac{2}{c} \;\; \dots \;\; \tfrac{c-1}{c} \;\; 1 \;\; \tfrac{c-1}{c} \;\; \dots \;\; \tfrac{2}{c} \;\; \tfrac{1}{c} \right]^T.$$

The vector $k_c \in \mathbb{R}^{2c-1}$ defines a 1D tent kernel that interpolates a mosaic generated by a 1D CFA pattern containing $c$ colors. For example, $k_2 = [\,1/2 \;\; 1 \;\; 1/2\,]$ and, for a 4×4 CFA pattern with 16 distinct color filters, the required tent kernel is

$$k_{4,4} = \frac{1}{16}
\begin{bmatrix}
1 & 2 & 3 & 4 & 3 & 2 & 1 \\
2 & 4 & 6 & 8 & 6 & 4 & 2 \\
3 & 6 & 9 & 12 & 9 & 6 & 3 \\
4 & 8 & 12 & 16 & 12 & 8 & 4 \\
3 & 6 & 9 & 12 & 9 & 6 & 3 \\
2 & 4 & 6 & 8 & 6 & 4 & 2 \\
1 & 2 & 3 & 4 & 3 & 2 & 1
\end{bmatrix}.$$

This interpolation is performed by a standard convolutional layer with fixed kernel weights. Although these weights could also be learned, we have found that fixing them (i.e., not updating them during training) produces better results.
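As a sanity check, the separable construction is easy to reproduce; the short sketch below (ours) builds $k_c$ and $k_{n,m}$ and recovers the $k_{4,4}$ kernel above:

```python
# Sketch (ours) of the separable tent kernel: k_c interpolates a 1D
# mosaic with c colors; k_{n,m} = outer(k_n, k_m) is its 2D version.
import numpy as np

def tent_1d(c):
    # [1/c, 2/c, ..., (c-1)/c, 1, (c-1)/c, ..., 2/c, 1/c], length 2c - 1
    ramp = np.arange(1, c + 1) / c
    return np.concatenate([ramp, ramp[-2::-1]])

def tent_2d(n, m):
    return np.outer(tent_1d(n), tent_1d(m))  # k_{n,m} = k_n k_m^T

print(tent_1d(2))          # [0.5 1.  0.5]  (= k_2)
print(tent_2d(4, 4) * 16)  # the 7x7 integer matrix of k_{4,4}, times 16
```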
The complete architecture of our autoencoder is depicted in Fig. 4, while Section 4 provides implementation details. Besides the skip connection from the beginning of the decoder to its end (for improving color reconstruction and training convergence), indicated by the dotted orange arrow, we also use skip connections from each submosaic and corresponding linearly-interpolated versions to the decoder's input. Such connections are indicated by the dotted green arrows and proved to speed up the training, providing a path for gradient backpropagation.

Training: Given the number of color filters for the CFA and the topology of the decoding network (number of convolutional layers with their inner parameters – Fig. 4), the autoencoder is trained by iteratively feeding patches to the network. Such patches are used to update the colors of the CFA and the weights of the demosaicing method, trying to minimize the MSE between the input color patches and their reconstructions.

4. Demosaicing Results and Evaluation

Our particular instantiation of the autoencoder architecture described in Section 3.1 and illustrated in Fig. 4 includes a decoder consisting of a stack of 12 convolutional layers. The numbers of 3 × 3 kernels used in these 12 layers are [64, 64, 64, 64, 64, 64, 128, 128, 128, 128, 128, 128], respectively. This setup presents a good trade-off between network expressiveness and training time. We show that the quality of our reconstructions surpasses the ones generated by existing methods [Cha16, MBP∗09, GCPD16].

We implemented our network using Keras [Cho15], running on top of Theano [Tea16], using MSE as the loss function and the Adam optimizer [KB14] ($lr = \alpha = 0.001$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$), with a batch size of 32. Training was performed using two datasets provided by Gharbi et al. [GCPD16]: vdp and moiré. We used the same images as Gharbi et al. for training, consisting of 2,590,186 128 × 128-pixel patches. In addition, we used horizontal and vertical flips, as well as random 90° rotations, for data augmentation. Our model for reconstructing noise-free images (as well as our demosaicing method for the Bayer pattern) was trained for 3 epochs, which corresponds to approximately 5 days of training time on a GeForce GTX TITAN X GPU. Our model for reconstructing noisy data was trained for 6 epochs.
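For illustration, a hedged tf.keras sketch of this decoder instantiation follows. It is ours, not the paper's Theano-backed code, and the final 3-channel convolution that maps features back to RGB is our assumption about how the reconstructed patch is produced; Fig. 4 defines the actual wiring.

```python
# Sketch (ours) of the 12-layer decoder: Conv(3x3) -> BatchNorm -> ReLU
# blocks with the kernel counts stated above, a (2N+1)-channel input
# (mosaic + N submosaics + N interpolations), and a concatenating
# beginning-to-end skip connection. Trained with MSE and Adam as above.
import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(h, w, n_filters, widths=(64,) * 6 + (128,) * 6):
    inp = layers.Input(shape=(h, w, 2 * n_filters + 1))
    x = inp
    for width in widths:                          # 12 Conv-BN-ReLU blocks
        x = layers.Conv2D(width, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    x = layers.Concatenate()([x, inp])            # decoder-input-to-end skip
    out = layers.Conv2D(3, 3, padding='same')(x)  # RGB output (our assumption)
    return tf.keras.Model(inp, out)

decoder = build_decoder(128, 128, n_filters=16)   # 4x4 CFA -> N = 16
decoder.compile(loss='mse',
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.001,
                                                   beta_1=0.9, beta_2=0.999,
                                                   epsilon=1e-8))
```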
Figure 4: Our autoencoder architecture. The encoding step projects the colored image patches through the trainable CFA, generating a multispectral image mosaic. Skip connections (green arrows) contribute submosaics consisting of the separate color channels as well as per-channel interpolated images, which are stacked with the multispectral image mosaic, forming a deeper representation (a total of 2N + 1 channels for N color filters). The decoding component is based on residual blocks. This autoencoder can produce CFA patterns of different sizes and works with networks of distinct depths. The beginning-to-end decoder skip connection (orange arrow) improves color reconstruction and training convergence. Parameter details are given in Section 3.1.
Demosaic (Bayer CFA)       Kodak    McMaster   vdp      moiré
Bilinear                   29.51    32.32      24.97    27.39
[ZWBL11a]                  35.66    29.87      24.85    28.04
[Get11a]                   35.98    35.87      29.88    31.70
[BCMS09]                   36.62    35.24      29.34    31.30
[LKV10]                    37.17    32.22      27.67    28.70
[CM12]                     38.51    33.29      29.03    30.96
[JAJ∗14]                   38.71    36.84      30.27    31.75
[HST∗14]                   38.83    38.30      30.93    34.61
[KMTO16]                   38.84    36.86      30.52    31.90
[JD13]                     40.03    33.78      29.34    31.33
[Get11b]                   40.13    34.17      30.06    32.25
[MBP∗09]                   41.23    36.13      30.94    33.16
[GCPD16]                   41.79    39.14      33.96    36.64
Our 2x2 Bayer              41.86    39.51      34.28    36.33

Demosaic (non-Bayer CFA)   Kodak    McMaster   vdp      moiré
[Cha16]                    31.52    28.05      23.97    23.97
[CFZ14]                    33.51    30.94      25.91    28.77
[Con11]                    38.10    32.90      28.65    30.45
[HLLD11]                   39.42†   —          —        —
[BLLY16]                   40.24†   —          —        —
[HW08]                     40.36†   —          —        —
[LBLY17]                   41.59†   —          —        —
Our 4x4 noise-free         43.13    40.18      35.17    37.70

Table 1: Comparison of our 4 × 4 noise-free CFA and demosaicing technique against existing methods. The numbers show the average PSNR values of reconstructions for four datasets. All results were generated using code provided by the authors, except the ones marked with †, whose numbers were taken from the corresponding publications. Our 4 × 4 noise-free CFA and demosaicing outperform all other techniques in all four datasets (best results in bold). Our demosaicing network for the Bayer pattern also outperforms all previous techniques for the Kodak, McMaster, and vdp datasets, and comes very close to Gharbi et al.'s on the moiré dataset.

4.1. Comparisons to Other Approaches

Table 1 compares the PSNR of the results obtained with our 4 × 4 noise-free CFA (Fig. 2) with the ones produced by the most successful demosaicing techniques [MBP∗09, Get11b, BLLY16, HLLD11, Con11, Wan14, Cha16, GCPD16]. For all comparisons, we have used either source or executable code provided by the authors. All images were saved to disk to guarantee similar color quantization and later compared based on PSNR (error averaged over pixels and color channels before computing the logarithm). In addition, all measurements were performed on full-resolution images (borders included). We did not use any of the test images in our training phase. Table 1 shows the average PSNR for the traditional Kodak [Fra99] and McMaster [ZWBL11b] datasets, as well as for the two datasets of Gharbi et al. (vdp and moiré) [GCPD16]. The techniques in the top portion of Table 1 perform demosaicing for the Bayer pattern, while the ones in the bottom portion perform demosaicing for non-Bayer CFAs. Our 4 × 4 noise-free CFA and demosaicing solution (last row of Table 1) surpasses all existing methods on all four datasets (even for Kodak and McMaster, from which no images were used for training). We also trained our architecture using the Bayer pattern, i.e., only optimizing the decoder (Fig. 3). Our demosaicing network for the Bayer pattern also outperforms all previous techniques for the Kodak, McMaster, and vdp datasets, and comes very close to Gharbi et al.'s [GCPD16] on the moiré dataset. These results clearly demonstrate the effectiveness of our autoencoder architecture and the advantage of jointly optimizing CFA design and demosaicing.
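For reference, the sketch below (ours) shows the PSNR measure used throughout our evaluation, with the squared error averaged over all pixels and color channels of the full-resolution image before taking the logarithm:

```python
# Sketch (ours) of the PSNR protocol described above, for 8-bit images
# (peak value 255), averaging over pixels and channels before the log.
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    err = reference.astype(np.float64) - reconstruction.astype(np.float64)
    return 10.0 * np.log10(peak ** 2 / np.mean(err ** 2))

ref = np.random.randint(0, 256, (512, 768, 3))                 # Kodak-sized image
rec = np.clip(ref + 2.0 * np.random.randn(*ref.shape), 0, 255)
print(f"{psnr(ref, rec):.2f} dB")
```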
Fig. 5 compares the reconstruction quality of our 4 × 4 noise-free model with the state-of-the-art demosaicing techniques [Cha16, MBP∗09, HST∗14, GCPD16]. For this comparison, we have used code provided by the authors for their noise-free trained models. Chakrabarti's method [Cha16] does not reconstruct the full patch, so we measure the PSNR only for the reconstructed area. The examples in Fig. 5 are from Gharbi et al.'s datasets. Note how our method better handles high-frequency information, being less susceptible to aliasing artifacts than other techniques (see the striped shirt example in Fig. 5). Additional examples can be found in the supplementary materials.

Chakrabarti [Cha16] trained and tested his model using the dataset of Shi and Funt [SF10]. We also tested our 4 × 4 noise-free CFA on the same test images. Chakrabarti [Cha16] achieves an average PSNR of 41.50, while our model achieves 48.96, a significant improvement in reconstruction quality, even though no images from Shi and Funt's dataset were used for training our CFA.

Figure 5: Comparison of reconstruction quality of our 4×4 noise-free method and the state-of-the-art techniques (best PSNR values shown in bold). Patches from Gharbi et al.'s datasets [GCPD16]. Better visualized in the digital version.

Table 2 shows the average running times of the state-of-the-art techniques for reconstructing images from the Kodak dataset (resolution of 768 × 512). All measurements were made on an Intel Core i7-2660 CPU and a GeForce GTX TITAN X GPU. Our technique is slightly slower than other GPU-based implementations, but still much faster than methods based on compressive sensing [MBP∗09] or optimization schemes [HST∗14].

Running Times (seconds)
Chakrabarti (GPU) [Cha16]       0.09
Gharbi et al. (GPU) [GCPD16]    0.16
Our 4x4 (GPU)                   0.34
Mairal et al. (CPU) [MBP∗09]    565.23
Heide et al. (CPU) [HST∗14]     652.63

Table 2: Average running times of state-of-the-art techniques for reconstructing images from the Kodak dataset.
4.2. Reconstruction in the presence of noise

The idea of jointly performing demosaicing and denoising has been explored by many works [Con11, CM12, HvLP12, KHKP16, GCPD16, Cha16]. Enhancing our autoencoder with denoising capabilities only requires feeding the network with patches corrupted by (artificial) noise during the training phase, while comparing the network's output to the noise-free images. To demonstrate the flexibility of our architecture, we have trained the same network structure from scratch, using the same datasets used for training our 4 × 4 noise-free CFA. This time, however, each input patch was corrupted by additive Gaussian noise. Although in linear space camera noise should be modeled as a combination of Poissonian and Gaussian noise [FTKE08], according to Jeon and Dubois [JD13], for white-balanced, gamma-corrected images (such as the vdp and moiré datasets [GCPD16] used for training) one can model noise as signal-independent white Gaussian noise. By corrupting the images with Gaussian noise with a standard deviation randomly picked from the set {0, 4, 8, 12, 16, 20}, we avoid the need for specialized networks for each noise level (such as in Chakrabarti [Cha16]), training a single model that handles a large range of noise variances.
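A minimal sketch of this augmentation follows (ours; the clipping to [0, 255] is our assumption):

```python
# Sketch (ours) of the noise augmentation: corrupt each training patch
# with white Gaussian noise whose standard deviation is drawn from
# {0, 4, 8, 12, 16, 20} (RGB [0, 255] units); the loss still targets the
# clean patch, so one model covers the whole noise range.
import numpy as np

NOISE_LEVELS = (0, 4, 8, 12, 16, 20)

def corrupt(patch, rng=np.random.default_rng()):
    """patch: (H, W, 3) float array in [0, 255]. Returns (input, target)."""
    sigma = rng.choice(NOISE_LEVELS)
    noisy = patch + rng.normal(0.0, sigma, patch.shape)
    return np.clip(noisy, 0, 255), patch   # clipping is our assumption

noisy_in, clean_target = corrupt(np.random.rand(128, 128, 3) * 255)
```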
Table 3 compares the PSNR of our 4 × 4 CFA for noisy data (Fig. 2 (right)) against existing techniques. The quality of our reconstructions surpasses previous approaches in all datasets, for all noise levels. Moreover, unlike Gharbi et al.'s approach [GCPD16], ours does not require an estimate of the noise level, and thus the quality of our denoising results is not dependent on the accuracy of any noise-estimation step.

Fig. 6 compares our results to the state-of-the-art techniques. Note that our method performs an optimization that jointly improves CFA design, demosaicing, and denoising. As a result, it can reduce noise without oversmoothing the images, generating higher-quality reconstructions. Our CFA pattern for noisy datasets can be seen in Fig. 2 (right). Additional examples can be found in the supplementary materials.

Noise σ = 4      Kodak    McMaster   vdp      moiré
[Cha16]          28.59    26.32      21.96    21.72
[CFZ14]          30.70    28.34      25.34    27.71
[Con11]          34.15    31.19      27.86    29.30
[CM12]           34.43    31.53      28.19    29.69
[GCPD16]         36.90    36.02      31.61    33.31
Our 4x4 noise    38.01    36.59      32.83    34.54

Noise σ = 8      Kodak    McMaster   vdp      moiré
[Cha16]          26.63    25.16      20.79    20.52
[CFZ14]          27.26    25.56      23.75    25.39
[Con11]          29.83    28.50      26.26    27.20
[CM12]           30.14    28.84      26.56    27.52
[GCPD16]         34.19    33.97      29.87    31.34
Our 4x4 noise    35.08    34.39      31.16    32.50

Noise σ = 12     Kodak    McMaster   vdp      moiré
[Cha16]          25.59    24.32      20.22    20.04
[CFZ14]          24.62    23.41      22.20    23.34
[Con11]          26.74    26.16      24.59    25.19
[CM12]           27.07    26.50      24.87    25.49
[GCPD16]         32.40    32.41      28.39    29.87
Our 4x4 noise    33.31    32.90      29.73    31.02

Noise σ = 16     Kodak    McMaster   vdp      moiré
[Cha16]          25.28    23.96      20.16    20.26
[CFZ14]          22.60    21.68      20.81    21.65
[Con11]          24.44    24.23      23.06    23.45
[CM12]           24.76    24.56      23.32    23.74
[GCPD16]         31.07    31.19      27.18    28.73
Our 4x4 noise    32.17    31.81      28.56    29.88

Noise σ = 20     Kodak    McMaster   vdp      moiré
[Cha16]          24.60    23.40      19.98    20.31
[CFZ14]          20.93    20.24      19.60    20.23
[Con11]          22.61    22.62      21.70    21.96
[CM12]           22.93    22.94      21.95    22.23
[GCPD16]         30.00    30.15      26.17    27.80
Our 4x4 noise    31.20    30.87      27.57    28.93

Table 3: Comparison of our model with existing methods for joint denoising and demosaicing. The numbers show the average PSNR values of reconstructions for four datasets corrupted by noise of different intensities. Our model outperforms all other techniques in all four datasets and for all noise intensities.

5. Discussion

We evaluated several network architectures beyond the ones presented; some observations are worth mentioning:
Additional input: By providing additional information to the demosaicing (decoder) scheme, our network is capable of learning faster and achieving better reconstructions. In addition to the mosaic image, we have also provided the CFA submosaics and their linearly interpolated versions. This simplifies the training.

Figure 6: Comparison of the reconstructions obtained by our method and state-of-the-art techniques that jointly perform denoising and demosaicing (best PSNR in bold). The input images were corrupted with Gaussian noise (left column), whose level is indicated by the standard deviation σ (in RGB [0, 255] units). Images from the Kodak dataset. Better visualized in the digital version.
Figure 7: Images with extreme high-frequencies are challenging for all demosaicing methods, whose results exhibit aliasing artifacts. These patches are from the moiré dataset [GCPD16].
CFA pattern size: Our architecture can train CFAs with an arbitrary number of color filters, and we have opted to train designs larger than the traditional 2 × 2 patterns. Training bigger-sized CFA patterns has several advantages. First, they contain a larger number of distinct colors and thus provide more coverage during sampling of the color space, allowing the CNN to make better use of correlations among colors. Second, smaller CFAs are more susceptible to aliasing due to pattern repetition, while bigger CFAs allow the learning of more stochastic patterns. Third, from an optimization perspective, the 2×2 search space is a subspace of the 4×4 search space, meaning that a 4×4 CFA can learn a 2×2 pattern if it is advantageous. For instance, a careful inspection of Fig. 2 reveals that each of our 4 × 4 CFAs actually consists of two side-by-side copies of a 4×2 pattern. The differences among the corresponding RGB coefficients in the 4×2 patterns in each CFA are fairly small, and are likely to be reduced with longer training. This suggests that 4 × 2 patterns are the most efficient tileable representations for CFAs achievable with a 4 × 4 pattern.
Manufacturing our CFAs: Our encoder optimizes the CFA colors over the full RGB space. Although our 4 × 4 CFAs do not use the standard Bayer color filters, the colors used in our patterns are a linear combination of these standard filters, which should simplify the manufacturing process. Alternatively, Chiulli [Chi89] has patented a technique for creating color filters from any combination of red, green, blue, cyan, yellow, and magenta dyes. More recently, SILIOS Technologies has developed a manufacturing technique called COLOR SHADES® for producing band-pass filters [SIL17]. This technology combines thin-film deposition and micro/nano-etching processes onto a silica substrate [LWTG14]. COLOR SHADES® provides band-pass filters in the visible range from 400 nm to 700 nm (as well as in the IR range). Lapray et al. [LWTG14] describe the construction of a multispectral CFA using eight optical filter bands produced with COLOR SHADES®, and compare the simulated and measured responses of the individual filters. This technology could be used to produce our CFAs.

Demosaicing of extremely high-frequency content is challenging for all demosaicing methods. Fig. 7 shows examples of two image patches for which all techniques, including ours, are unable to obtain high-quality image reconstructions. Such problems are due to aliasing, when color high-frequency details cannot be appropriately sampled by the CFA [GCPD16]. Note that the artifacts in the reconstructions by the techniques of Mairal et al. and Gharbi et al. are similar. Both use the same Bayer CFA, indicating that such moiré artifacts are due to CFA color subsampling, rather than to the demosaicing method itself.

Before arriving at the described architecture, we systematically tried many alternatives. Such exploration included the use of L1 and L2 regularizers, different optimizers (Adadelta and Adam), dropout [SHK∗14], various configurations of skip connections, different sizes of CFA patterns (including 6 × 6 and 8 × 8), and trainable/fixed interpolation layers. While testing all combinations of these elements is unfeasible, we performed extensive experimentation. The results of these tests indicated that the architecture for the 4 × 4 patterns (both for noise-free and noisy patterns) achieved the overall best PSNR results. Note that the CFA designs learned for the noise-free and noisy cases are similar, one being a shifted version of the other. This indicates that those colors were not found by chance, and that they indeed provide lower reconstruction errors.

6. Conclusion

We have presented a convolutional neural network architecture for performing joint design of color filter arrays, demosaicing, and denoising. By expressing the CFA projection and linear interpolation as convolutional layers, our network finds the filter pattern and corresponding demosaicing method that jointly minimize image reconstruction error. The patterns and algorithms produced by our method provide high-quality color reconstructions, surpassing the state-of-the-art techniques on all standard demosaicing datasets.

Our approach can also reconstruct high-quality images from noisy data, outperforming existing techniques for all noise levels, without requiring any information about the noise level in the input data. In addition, it can be used to obtain effective demosaicing strategies for existing CFA patterns. Given its flexibility, our architecture might possibly be adapted for the design of CFAs for capturing HDR content with a single shot.
Acknowledgements

We would like to thank the reviewers for their insightful comments, and NVIDIA for donating the GeForce GTX Titan X GPU used for this research. This work was sponsored by CNPq-Brazil (fellowships and grants 306196/2014-0, 423673/2016-5).

References

[ASH05] Alleysson D., Susstrunk S., Herault J.: Linear demosaicing inspired by the human visual system. IEEE TIP 14, 4 (2005), 439–449.
[Bay76] Bayer B.: Color imaging array, 1976. US Patent 3,971,065. URL: https://www.google.com/patents/US3971065.
[BCMS09] Buades A., Coll B., Morel J. M., Sbert C.: Self-similarity driven color demosaicking. IEEE TIP 18, 6 (2009), 1192–1202.
[BLLY16] Bai C., Li J., Lin Z., Yu J.: Automatic design of color filter arrays in the frequency domain. IEEE TIP 25, 4 (2016), 1793–1807.
[CFZ14] Chakrabarti A., Freeman W. T., Zickler T.: Rethinking color cameras. In ICCP (2014), pp. 1–8.
[Cha16] Chakrabarti A.: Learning sensor multiplexing design through back-propagation. CoRR abs/1605.07078 (2016).
[Chi89] Chiulli C.: Method for manufacturing an optical filter, Feb. 28 1989. US Patent 4,808,501.
[Cho15] Chollet F.: Keras: Deep learning library for Theano and TensorFlow, 2015. https://keras.io/.
[CJN∗17] Choi I., Jeon D. S., Nam G., Gutierrez D., Kim M. H.: High-quality hyperspectral reconstruction using a spectral prior. ACM TOG 36, 6 (Nov. 2017), 218:1–218:13.
[CM12] Condat L., Mosaddegh S.: Joint demosaicking and denoising by total variation minimization. In IEEE ICIP (2012), pp. 2781–2784.
[Con09] Condat L.: A new random color filter array with good spectral properties. In ICIP (2009), pp. 1613–1616.
[Con10] Condat L.: Color filter array design using random patterns with blue noise chromatic spectra. Image and Vision Computing 28, 8 (2010), 1196–1202.
[Con11] Condat L.: A new color filter array with optimal properties for noiseless and noisy color image acquisition. IEEE TIP 20, 8 (2011), 2200–2210.
[Dub05] Dubois E.: Frequency-domain methods for demosaicking of Bayer sampled color images. IEEE Signal Processing Letters (2005), 847–850.
[DVM16] Dave A., Vadathya A. K., Mitra K.: Compressive image recovery using recurrent generative model. CoRR abs/1612.04229 (2016). URL: http://arxiv.org/abs/1612.04229.
[Fra99] Franzen R.: Kodak lossless true color image suite, 1999. http://r0k.us/graphics/kodak/.
[FTKE08] Foi A., Trimeche M., Katkovnik V., Egiazarian K.: Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data. IEEE TIP 17, 10 (2008), 1737–1754.
[GCPD16] Gharbi M., Chaurasia G., Paris S., Durand F.: Deep joint demosaicking and denoising. ACM ToG 35, 6 (2016), 191:1–191:12.
[Get11a] Getreuer P.: Color demosaicing with contour stencils. In ICDSP (2011), pp. 1–6.
[Get11b] Getreuer P.: Zhang-Wu directional LMMSE image demosaicking. Image Processing On Line 1 (2011).
[GSL00] Go J., Sohn K., Lee C.: Interpolation using neural networks for digital still cameras. IEEE TCE 46, 3 (2000), 610–616.
[HLLD11] Hao P., Li Y., Lin Z., Dubois E.: A geometric method for optimal design of color filter arrays. IEEE TIP 20, 3 (2011), 709–722.
[HST∗14] Heide F., Steinberger M., Tsai Y.-T., Rouf M., Pajak D., Reddy D., Gallo O., Liu J., Heidrich W., Egiazarian K., Kautz J., Pulli K.: FlexISP: A flexible camera image processing framework. ACM ToG 33, 6 (2014).
[HvLP12] Heinze T., von Löwis M., Polze A.: Joint multi-frame demosaicing and super-resolution with artificial neural networks. In IWSSIP (2012), pp. 540–543.
[HW08] Hirakawa K., Wolfe P. J.: Spatio-spectral color filter array design for optimal image recovery. IEEE TIP 17, 10 (2008), 1876–1890.
[HZRS16] He K., Zhang X., Ren S., Sun J.: Deep residual learning for image recognition. In CVPR (2016).
[IS15] Ioffe S., Szegedy C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML (2015), pp. 448–456.
[JAJ∗14] Jaiswal S. P., Au O. C., Jakhetiya V., Yuan Y., Yang H.: Exploitation of inter-color correlation for color image demosaicking. In IEEE ICIP (2014), pp. 1812–1816.
[JD13] Jeon G., Dubois E.: Demosaicking of noisy Bayer-sampled color images with least-squares luma-chroma demultiplexing and noise level estimation. IEEE TIP 22, 1 (2013), 146–156.
[JMFU17] Jin K. H., McCann M. T., Froustey E., Unser M.: Deep convolutional neural network for inverse problems in imaging. IEEE TIP 26, 9 (2017), 4509–4522.
[KB14] Kingma D. P., Ba J.: Adam: A method for stochastic optimization. CoRR abs/1412.6980 (2014).
[KB15] Kaur S., Banga V. K.: A survey of demosaicing: Issues and challenges. International Journal of Science, Engineering and Technologies 2, 1 (2015).
[KHKP16] Klatzer T., Hammernik K., Knobelreiter P., Pock T.: Learning joint demosaicing and denoising based on sequential energy minimization. In IEEE ICCP (2016), pp. 1–11.
[KHO00] Kapah O., Hel-Or H. Z.: Demosaicking using artificial neural networks. In Proc. SPIE (2000), vol. 3962, pp. 112–120.
[KMTO16] Kiku D., Monno Y., Tanaka M., Okutomi M.: Beyond color difference: Residual interpolation for color image demosaicking. IEEE TIP 25, 3 (2016), 1288–1300.
[KSH12] Krizhevsky A., Sutskever I., Hinton G. E.: ImageNet classification with deep convolutional neural networks. In NIPS 25. 2012, pp. 1097–1105.
[KW13] Kingma D. P., Welling M.: Auto-encoding variational Bayes. CoRR abs/1312.6114 (2013).
[LBBH98] LeCun Y., Bottou L., Bengio Y., Haffner P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998), 2278–2324.
[LBLY17] Li J., Bai C., Lin Z., Yu J.: Optimized color filter arrays for sparse representation-based demosaicking. IEEE TIP 26, 5 (2017), 2381–2393.
[LCTZ07] Lian N. X., Chang L., Tan Y. P., Zagorodnov V.: Adaptive filtering for color filter array demosaicking. IEEE TIP 16, 10 (2007), 2515–2525.
[LGZ08] Li X., Gunturk B., Zhang L.: Image demosaicing: a systematic survey. In Proc. SPIE (2008), vol. 6822, pp. 1–15.
[LH06] Long Y., Huang Y.: Adaptive demosaicking using multiple neural networks. In IEEE Signal Processing Society Workshop on MLSP (2006), pp. 353–357.
[Li05] Li X.: Demosaicing by successive approximation. IEEE TIP 14, 3 (2005), 370–379.
[LKV10] Lu Y., Karzand M., Vetterli M.: Demosaicking by alternating projections: Theory and fast one-step implementation. IEEE TIP 19, 8 (2010), 2085–2098.
[LP05] Lukac R., Plataniotis K. N.: Color filter arrays: design and performance analysis. IEEE TCE 51, 4 (2005), 1260–1267.
[LV09] Lu Y. M., Vetterli M.: Optimal color filter array design: quantitative conditions and an efficient search procedure. In Digital Photography (2009), vol. 7250, SPIE, pp. 1–9.
[LWTG14] Lapray P.-J., Wang X., Thomas J.-B., Gouton P.: Multispectral filter arrays: Recent advances and practical implementation. Sensors 14, 11 (2014), 21626–21659.
[MAKR13] Moghadam A. A., Aghagolzadeh M., Kumar M., Radha H.: Compressive framework for demosaicing of natural images. IEEE TIP 22, 6 (2013), 2356–2371.
[MBP∗09] Mairal J., Bach F., Ponce J., Sapiro G., Zisserman A.: Non-local sparse models for image restoration. In ICCV (2009), pp. 2272–2279.
[MC11] Menon D., Calvagno G.: Color image demosaicking: An overview. Image Commun. 26, 8–9 (2011), 518–533.
[NH10] Nair V., Hinton G. E.: Rectified linear units improve restricted Boltzmann machines. In ICML (2010), pp. 807–814.
[SF10] Shi L., Funt B.: Re-processed version of the Gehler color constancy dataset of 568 images, 2010. http://www.cs.sfu.ca/~colour/data/shi_gehler/.
[SHK∗14] Srivastava N., Hinton G., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: A simple way to prevent neural networks from overfitting. JMLR 15, 1 (2014), 1929–1958.
[SIL17] SILIOS Technologies: COLOR SHADES, 2017. URL: https://www.silios.com/color-shades.
[SLJ∗15] Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., Rabinovich A.: Going deeper with convolutions. In CVPR (2015), pp. 1–9.
[SVI∗15] Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z.: Rethinking the inception architecture for computer vision. CoRR abs/1512.00567 (2015).
[TAG∗17] Timofte R., Agustsson E., Gool L. V., Yang M. H., Zhang L., Lim B., ..., Guo Q.: NTIRE 2017 challenge on single image super-resolution: Methods and results. In IEEE CVPRW (2017), pp. 1110–1121.
[Tea16] Theano Development Team: Theano: A Python framework for fast computation of mathematical expressions. CoRR abs/1605.02688 (2016).
[vdMH08] van der Maaten L., Hinton G. E.: Visualizing high-dimensional data using t-SNE. JMLR 9 (2008), 2579–2605.
[VLL∗10] Vincent P., Larochelle H., Lajoie I., Bengio Y., Manzagol P.-A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. JMLR 11 (2010), 3371–3408.
[Wan14] Wang Y. Q.: A multilayer neural network for image demosaicking. In ICIP (2014), pp. 1852–1856.
[ZK16] Zagoruyko S., Komodakis N.: Wide residual networks. CoRR abs/1605.07146 (2016).
[ZW05] Zhang L., Wu X.: Color demosaicking via directional linear minimum mean square-error estimation. IEEE TIP 14, 12 (2005), 2167–2178.
[ZWBL11a] Zhang L., Wu X., Buades A., Li X.: Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. J. Electronic Imaging 20, 2 (2011), 023016.
[ZWBL11b] Zhang L., Wu X., Buades A., Li X.: McMaster dataset, 2011. http://www4.comp.polyu.edu.hk/~cslzhang/CDM_Dataset.htm.
[ZYW∗15] Zeng K., Yu J., Wang R., Li C., Tao D.: Coupled deep autoencoder for single image super-resolution. IEEE Transactions on Cybernetics PP, 99 (2015), 1–11.
[ZZC∗17] Zhang K., Zuo W., Chen Y., Meng D., Zhang L.: Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE TIP 26, 7 (2017), 3142–3155.

Appendix A: RGB Weights and Bias Terms

Fig. 8 shows the RGB and bias weights ($\vec{w}_i = [w_{i_r}, w_{i_g}, w_{i_b}, w_{i_t}]$) for our 4 × 4 CFAs for noise-free and noisy data. The values of $w_{i_r}$, $w_{i_g}$, $w_{i_b}$, and $w_{i_t}$ are shown from top to bottom inside each color filter cell.

Figure 8: Our 4 × 4 patterns for noise-free and for noisy data. The color filters are expressed using four coefficients: R, G, B, and a bias term, respectively (shown inside each cell).

Appendix B: Construction of Binary Masks

A CFA is a periodic structure, with the CFA pattern corresponding to one period. Given an M × N CFA pattern, this will result in MN (M times N) distinct M × N binary masks for the CFA pattern (i.e., one binary mask for each CFA element). Each such binary mask has a single non-zero element (with value 1), at the position corresponding to the given CFA element. Thus, in an M × N CFA, the CFA element at position (i, j), 1 ≤ i ≤ M, 1 ≤ j ≤ N, has a corresponding binary mask containing zeros everywhere, except at mask position (i, j), which contains the value 1. Similar to a complete CFA, the actual binary masks cover the entire image, being obtained by tiling the corresponding CFA-element binary masks.
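A short sketch of this construction follows (ours; the helper name cfa_masks is hypothetical):

```python
# Sketch (ours) of Appendix B: for an M x N CFA pattern, build the M*N
# one-hot binary masks and tile each across the full image.
import numpy as np

def cfa_masks(M, N, height, width):
    """Returns an (M*N, height, width) array of tiled binary masks."""
    masks = []
    for i in range(M):
        for j in range(N):
            cell = np.zeros((M, N))
            cell[i, j] = 1.0                          # one-hot at element (i, j)
            reps = (-(-height // M), -(-width // N))  # ceil division
            masks.append(np.tile(cell, reps)[:height, :width])
    return np.stack(masks)

masks = cfa_masks(4, 4, 128, 128)
assert masks.sum(axis=0).min() == 1.0  # disjoint masks covering every pixel
```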