Removing Motion Blur With Space–Time Processing

Hiroyuki Takeda, Member, IEEE, and Peyman Milanfar, Fellow, IEEE

Abstract—Although spatial deblurring is relatively well understood when the blur kernel is assumed to be shift invariant, motion blur is not so well understood when we attempt to deconvolve on a frame-by-frame basis: this is because, in general, videos include complex, multilayer transitions. Indeed, we face an exceedingly difficult problem in motion deblurring of a single frame when the scene contains motion occlusions. Instead of deblurring video frames individually, a fully 3-D deblurring method is proposed in this paper to reduce motion blur from a single motion-blurred video and produce a high-resolution video in both space and time. Unlike other existing approaches, the proposed deblurring kernel is free from knowledge of the local motions. Most importantly, due to its inherent locally adaptive nature, the 3-D deblurring is capable of automatically deblurring those portions of the sequence which are motion blurred, without segmentation and without adversely affecting the rest of the spatiotemporal domain, where such blur is not present. Our method is a two-step approach: first we upscale the input video in space and time without explicit estimates of local motions, and then perform 3-D deblurring to obtain the restored sequence.

Index Terms—Inverse filtering, sharpening and deblurring.

I. INTRODUCTION

The practical solutions to blind motion deblurring available so far largely treat only the case where the blur is a result of global motions due to camera displacements [3], [4], rather than motion of the objects in the scene. When the motion blur is not global, it would seem that segmentation information is needed in order to identify which parts of the image suffer from motion blur (typically due to fast-moving objects). Consequently, the problem of deblurring moving objects in the scene is quite complex because it requires 1) segmentation of moving objects from the background, 2) estimation of a spatial motion PSF for each moving object, 3) deconvolution of the moving objects one by one with the corresponding PSFs, and finally 4) putting the deblurred objects back together into a coherent and artifact-free image or sequence [5]–[8]. In order to perform the first two steps (segmentation and PSF estimation), one would need to carry out global/local motion estimation [9]–[12]. Thus, the deblurring performance strongly depends on the accuracy of motion estimation and segmentation of moving objects. However, errors in both are in general unavoidable, particularly in the presence of multiple motions, occlusions, or nonrigid motions, i.e., when there are any motions that violate parametric models or the standard optical flow brightness constancy constraint.

In this paper, we present a motion deblurring approach for videos that is free of both explicit motion estimation and segmentation. Briefly speaking, we point out and exploit what in hindsight seems obvious, though apparently not exploited so far in the literature: that motion blur is by nature a temporal blur, which is caused by relative displacements
Fig. 1. Motion (temporal) deblurring example of the Cup sequence (130 × 165, 16 frames), in which a cup moves upward. (a) Two frames of the ground truth at times t = 6 to 7. (b) Blurred video frames generated by taking the average of five consecutive frames (the corresponding PSF is 1 × 1 × 5 uniform) [PSNR: 23.76 dB (top), 23.68 dB (bottom); structural similarity (SSIM): 0.76 (top), 0.75 (bottom)]. (c)–(e) Deblurred frames by Fergus's method [3] [PSNR: 22.58 dB (top), 22.44 dB (bottom); SSIM: 0.69 (top), 0.68 (bottom)], Shan's method [4] [PSNR: 18.51 dB (top), 10.75 dB (bottom); SSIM: 0.57 (top), 0.16 (bottom)], and the proposed 3-D total variation (TV) method (13) [PSNR: 32.57 dB (top), 31.55 dB (bottom); SSIM: 0.98 (top), 0.97 (bottom)], respectively. (f)–(j) Selected regions of the video frames (a)–(e) at time t = 6, respectively. (a) Ground truth. (b) Blurred frames. (c) Fergus et al. [3]. (d) Shan et al. [4]. (e) Proposed method (13). (f) Ground truth. (g) Blurred frames. (h) Fergus et al. [3]. (i) Shan et al. [4]. (j) Proposed method (13).
detail in Section III, and a few more examples are also available at our website.² Although the blind methods are capable of estimating complex blur kernels, they no longer work when the blur is spatially nonuniform. We briefly summarize some existing methods for the motion deblurring problem in the next section.

²http://users.soe.ucsc.edu/~htakeda/VideoDeblurring/VideoDeblurring.htm

II. MOTION DEBLURRING IN 2-D AND 3-D

A. Existing Methods

Ben-Ezra and Nayar [5], Tai et al. [6], and Cho et al. [7] proposed deblurring methods where the spatial motion PSF is obtained from estimated motions. Ben-Ezra and Nayar [5] and Tai et al. [6] used two different cameras, a low-speed high-resolution camera and a high-speed low-resolution camera, and captured two videos of the same scene at the same time. They then estimate motions using the high-speed low-resolution video, so that detailed local motion trajectories can be estimated, and the estimated local motions yield a spatial motion PSF for each moving object. On the other hand, Cho et al. [7] took a pair of images with one camera with some time delay, or with two cameras with no time delay but some spatial displacement. The image pair enables the separation of the moving objects and the foreground from the background. Each part of the images is often blurred with a different PSF. The separation is helpful in estimating the different PSFs individually, and the estimation process of the PSFs becomes more stable.

Whereas the deblurring methods in [5]–[7] obtain the spatial motion PSF based on the global/local motion information, Fergus et al. proposed a blind motion deblurring method using a relationship between the distribution of gradients and the degree of blur [3]. With this in hand, the method estimates a spatial motion PSF for each segmented object.

Later, inspired by Fergus' blind motion deblurring method, Levin [8] and Shan et al. [4] proposed blind deblurring methods for a single blurred image caused by a shaking camera. Although their methods are limited to global motion blur, using the relationship between the distribution of derivatives and the degree of blur proposed by Fergus et al., they estimated a shift-invariant PSF without parametrization.

Ji and Liu [13] and Dai and Wu [14] also proposed derivative-based methods. Ji and Liu estimated the spatial motion PSF by a spectral analysis of the image gradients, and Dai and Wu obtained the PSF by studying how blurry the local edges are, as indicated by local gradients.

Recently, another blind motion deblurring method was proposed by Chen et al. [15] for the reduction of global motion blur. They claimed that the PSF estimation is more stable with two images of the same scene degraded by different PSFs, and also used a robust estimation technique to stabilize the PSF estimation process further.

With the advancement of computational algorithms, as mentioned earlier, the data-acquisition process has also been studied. Using multiple cameras [5]–[7] is one simple way to make the identification of the underlying motion-blur kernel easier. Another technique, called coded exposure, improves the estimation of both blur kernels and images [16]. The idea of coded exposure is to preserve some high-frequency components by repeatedly opening and closing the shutter while the camera is capturing a single image. Although this makes the SNR worse, the high-frequency components are helpful not only in finding the blur kernel, but also in estimating the underlying image with higher quality. When the blur is spatially variant, scene segmentation is necessary [17].
Fig. 2. Motion deblurring example of a rotating pepper sequence (179 × 179, 90 frames). (a) One of the frames from a simulated sequence, which we generate by rotating the pepper image counterclockwise 1° per frame. (b) Blurred frame generated by taking the average of eight consecutive frames (the corresponding PSF is a 1 × 1 × 8 shift-invariant uniform PSF) and adding white Gaussian noise with standard deviation σ = 2 (PSNR = 27.10 dB; SSIM = 0.82). (c) and (d) Deblurred frames by Fergus' method [3] (PSNR = 23.23 dB; SSIM = 0.61) and Shan's method [4] (PSNR = 25.12 dB; SSIM = 0.81), respectively. (e) Deblurred frame by the proposed method (13) (PSNR = 33.12 dB; SSIM = 0.90). The images in the second column show magnifications of the upper right portions of the images in the first column. (a) Ground truth. (b) Blurred frame. (c) Fergus et al. [3]. (d) Shan et al. [4]. (e) Proposed method (13).
Fig. 3. Schematic representation of the exposure time τ_e and the frame interval τ_f. (a) Standard camera. (b) Multiple videos taken by multiple cameras with a slight time delay are fused to produce a high frame rate video. (c) Original frames with estimated intermediate frames (frame rate upconversion). (d) Temporally deblurred output frames.
B. Path Ahead

All the methods mentioned earlier are similar in that they aim at removing motion blur by spatial (2-D) processing. In the presence of multiple motions, the existing methods would have to estimate a shift-variant PSF and segment the blurred images by local motions (or depth maps). However, occlusions make the deblurring problem more difficult because pixel values around motion occlusions are a mixture of multiple objects moving in independent directions. In this paper, we reduce the motion blur effect in videos by introducing the space–time (3-D) deblurring model. Since this data model is more reflective of the actual data-acquisition process, even in the presence of motion occlusions, deblurring with a 3-D blur kernel can effectively remove both global and local motion blur without segmentation or reliance on explicit motion information.

Practically speaking, for videos, it is not always preferable to remove all the motion blur from the video frames. In particular, for videos with relatively low frame rates (e.g., 10–20 frames per second), motion blur (temporal blur) is often intentionally added in order to show a smooth trajectory of moving objects. Thus, when removing (or more precisely "reducing") the motion blur from videos, we would need to increase the temporal resolution of the video. This operation can be thought of as the familiar frame rate up-conversion, with the following caveat: in our context, the intermediate frames are not the end results of interest but, as we will explain shortly, rather a means to obtain a deblurred sequence, at possibly the original frame rate. It is worth noting that the temporal blur reduction is equivalent to shortening the exposure time of video frames. Typically, the exposure time τ_e is less than the time interval between the frames τ_f (i.e., τ_e < τ_f), as shown in Fig. 3(a). Many commercial cameras set τ_e to less than 0.5τ_f (see, for instance, [18]). Borissoff in [18] pointed out that τ_e should ideally depend on the speed of moving objects. Specifically, the exposure time should be half of the time it takes for a moving object to run through the scene width, or else temporal aliasing would be visible. In [19], Shechtman et al. presented a space–time super-resolution (SR) algorithm, where multiple cameras capture the same scene at once with slight spatial and temporal displacements. Then, multiple videos of low resolution in space and time are fused to obtain a spatiotemporally super-resolved sequence. As a postprocessing step, they spatiotemporally deblur the super-resolved video so that the exposure time τ_e nearly equals the frame interval τ_f. Recently, Agrawal et al. proposed a temporal coded sampling technique for temporal video SR in [20], where multiple cameras simultaneously capture the same scene with different frame rates, exposure times, and temporal sampling positions. Their method carefully optimizes those frame sampling conditions so that the space–time SR can achieve higher quality results. By contrast, in this paper, we demonstrate that the problem of motion blur restoration can be solved using a single, possibly low frame rate, video sequence.
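To make the point that motion blur is a temporal blur concrete, the following minimal numpy sketch emulates a long exposure by averaging consecutive frames of a sharp, high frame rate sequence; this mirrors how the blurred inputs of the experiments in Section III are synthesized (averages of five and eight consecutive frames, i.e., 1 × 1 × 5 and 1 × 1 × 8 uniform temporal PSFs). The function name and the (T, H, W) array layout are our own illustrative choices and are not part of the paper.

import numpy as np

def emulate_long_exposure(frames, support=5):
    # frames : sharp sequence as a (T, H, W) float array
    # support: temporal extent of the exposure in frames
    #          (i.e., a 1 x 1 x support uniform temporal PSF)
    T = frames.shape[0]
    return np.stack([frames[t:t + support].mean(axis=0)
                     for t in range(T - support + 1)])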
Fig. 4. Forward model addressed in this paper. We estimate the desired video u by a two-step approach: 1) space–time upscaling, and 2) space–time deblurring.
To summarize, frame-rate up-conversion is necessary in order to avoid temporal aliasing. Furthermore, unlike motion deblurring algorithms that address the problem purely in the spatial domain [3]–[7], [13]–[15], we deblur with a shift-invariant 3-D PSF, which is effective for any type of motion blur. Examples were illustrated in Figs. 1 and 2, and more will be shown later in Section III. The following are the assumptions and the limitations of our 3-D deblurring approach.

Assumptions
1) The camera settings are fixed: The aperture size, the focal length, the exposure time, and the frame interval are all fixed. The photosensitivity of the image sensor array is uniform and unchanged.
2) One camera captures one frame at a time: In our approach, only one video is available, and the video is shot by a single camera, which captures one frame at a time. Also, all the pixels of one frame are sampled at the same time (without time delay).
3) The aperture size is small: We currently assume that the aperture size is so small that the out-of-focus blur is almost homogeneous.
4) The spatial and temporal PSFs are known: In the current presentation, our primary focus is to show that a simple deblurring with a space–time (3-D) shift-invariant PSF can effectively reduce the complicated, nonuniform motion blur effects of a sequence of images.

Limitations
1) The performance of our motion deblurring depends on the performance of the space–time interpolator: The space–time interpolator needs to generate the missing intermediate blurry frames while preserving the spatial and temporal blur effects.
2) The temporal upscaling factor affects our motion deblurring: To remove the motion blur completely, the temporal upscaling factor of the space–time interpolator must be set so large that the motion speed slows down to less than 1 pixel per frame. For instance, when the temporal upscaling factor is not large enough and an object in the upscaled video moves 3 pixels per frame, the moving object would still be blurry along its motion trajectory in a 3-pixel-wide window even after we deblur. However, as discussed in this section, motion blur is sometimes necessary for very fast moving objects in order to preserve a smooth motion trajectory.

C. Video Deblurring in 3-D

Next, we extend the single image (2-D) deblurring technique with total variation (TV) regularization to space–time (3-D) motion deblurring for videos. Ringing suppression is of importance because the ringing effect in time creates significant visual distortion in the output videos.

1) Data Model: The exposure time τ_e of videos taken with a standard camera is always shorter than the frame interval τ_f, as illustrated in Fig. 3(a). It is generally not possible to reduce motion blur by temporal deblurring when τ_e < τ_f (i.e., when the temporal support of the PSF is shorter than the frame interval τ_f). This is because the standard camera captures one frame at a time: the camera reads a frame out of the photosensitive array, and the array is reset to capture the next frame.³ Unlike the spatial sampling rate, the temporal sampling rate is always below the Nyquist rate. This is an electromechanical limitation of the standard video camera. One way to obtain a high-speed video with τ_e > τ_f is to fuse multiple videos captured by multiple cameras at the same time with slight time delays, as shown in Fig. 3(b). As we mentioned earlier, this technique is referred to as space–time SR [19] or high-speed videography [21]. After the fusion of multiple videos into a high-speed video, the frame interval becomes shorter than the exposure time and we can carry out the temporal deblurring to reduce the motion blur effect.

An alternative to using multiple cameras is to generate intermediate frames, which may be obtained by frame interpolation (e.g., [22] and [1]), so that the new frame interval τ̃_f is now smaller than τ_e, as illustrated in Fig. 3(c). Once we have the video sequence with τ_e > τ̃_f, the temporal deblurring reduces τ_e to be nearly equal to τ̃_f, and the video shown in Fig. 3(d) is our desired output. It is worth noting that, in the most general setting, generation/interpolation of temporally intermediate frames is indeed a very challenging problem. However, since our interest lies mainly in the removal of motion blur, the temporal interpolation problem is not quite as complex as the general setting. In the most general case, the space–time SR method [19] employing multiple cameras may be the only practical solution. Of course, it is possible to apply frame interpolation to the space–time super-resolved video to generate an even higher speed video. However, in this paper, we focus on the case where only a single video is available and show that our frame interpolation method (3-D SKR [1]) enables motion deblurring. We note that the performance of the motion deblurring, therefore, depends on how well we interpolate the intermediate frames. As long as the interpolator successfully generates intermediate (upscaled) frames, the 3-D deblurring can reduce the motion blur effects. Since, typically, the exposure time of the frames is relatively short even at low frame rates (10–20 frames per second), we assume that local motion trajectories between frames are smooth enough that the 3-D SKR method interpolates

³Most commercial charge-coupled device (CCD) cameras nowadays use the interline CCD technique, where the charged electrons of the frame are first transferred from the photosensitive sensor array to the temporal storage array and the photosensitive array is reset. Then, the camera reads the frame out of the temporal storage array while the photosensitive array is capturing the next frame.
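As an illustrative calculation (the 30 frames-per-second rate is an assumption on our part; the object speed and the factor r_t = 8 correspond to the Book experiment in Section III): for an input at 30 frames per second, τ_f ≈ 33 ms, with an object moving about 8 pixels per frame, choosing a temporal upscaling factor r_t = 8 yields an upscaled frame interval τ̃_f = τ_f/r_t ≈ 4.2 ms and an apparent motion of roughly 1 pixel per upscaled frame, which is the regime required by Limitation 2 above.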
Fig. 5. Overall PSF kernel in video (3-D) is given by the convolution of the spatial and temporal PSF kernels.

If the sizes of the spatial and temporal PSF kernels are N × N × 1 and 1 × 1 × N_t, respectively, then the overall PSF kernel has size N × N × N_t, as illustrated in Fig. 5. We will discuss how to select the 3-D PSF for deblurring later in Section II-C. While the data model (1) resembles the one introduced by Irani and Peleg [23], we note that ours is a 3-D data model. More specifically, we consider an image sequence (a video) as one data set and consider the case where only a single video is available. The PSF and the downsampling operations are also all in 3-D.

In this paper, we split the data model (1) into
Fig. 6. Schematic representation of the registration of the low-resolution video onto a high-resolution grid. In the illustration, a low-resolution video (3 × 3, 3 frames) is upsampled with the spatial upsampling factor r_s = 2 and the temporal upsampling factor r_t = 3.
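As a concrete illustration of Fig. 5, the sketch below builds the overall 3-D PSF by convolving an N × N × 1 uniform spatial kernel with a 1 × 1 × N_t uniform temporal kernel; for such separable uniform kernels the 3-D convolution reduces to a broadcast product. This is an illustrative sketch only: the function name and the default sizes are our own choices, not the authors' code.

import numpy as np

def overall_psf(N=1, N_t=5):
    # N x N x 1 uniform spatial PSF and 1 x 1 x N_t uniform temporal PSF;
    # their 3-D convolution is the N x N x N_t separable product below.
    spatial = np.ones((N, N, 1)) / (N * N)
    temporal = np.ones((1, 1, N_t)) / N_t
    return spatial * temporal

# Example: the 1 x 1 x 5 uniform PSF used for the Cup sequence of Fig. 1.
psf = overall_psf(N=1, N_t=5)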
the unknown pixel value z(x_j) and its neighboring sample y_i by Taylor series as follows:

y_i = z(x_i) + ε_i
    = z(x_j) + {∇z(x_j)}^T (x_i − x_j) + (1/2)(x_i − x_j)^T {Hz(x_j)} (x_i − x_j) + ··· + ε_i
    = β_0 + β_1^T (x_i − x_j) + β_2^T vech{(x_i − x_j)(x_i − x_j)^T} + ··· + ε_i        (6)

with the Gaussian kernel (weight) function

K(x_i − x_j) = √|C_i| exp{ −(x_i − x_j)^T C_i (x_i − x_j) / (2h²) }        (9)

where h is the global smoothing parameter. This is the formulation of kernel regression [24] in 3-D. We set h = 0.7 for all the experiments, and C_i is the smoothing (3 × 3) matrix for the sample y_i, which dictates the "footprint" of the kernel function; we will explain how we obtain it shortly. The minimization (8) yields a pointwise estimator of the blurry signal z(x_j) with the order of local signal representation N. Here, p is the index of the sample positions around the ith sample (y_i) in the local analysis cubicle ω_i, and z_{x1}(x_j), z_{x2}(x_j), and z_t(x_j) are the gradients along the vertical (x_1), horizontal (x_2), and time (t) axes, respectively.
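For illustration, the Gaussian steering kernel weight of (9) can be evaluated with a few lines of numpy; h = 0.7 is the value used in the paper, while the function name, the identity choice of C_i (also used below for the pilot gradient estimate), and the example offset are our own illustrative choices.

import numpy as np

def kernel_weight(dx, C_i, h=0.7):
    # dx : 3-vector x_i - x_j (vertical, horizontal, temporal offsets)
    # C_i: 3 x 3 smoothing ("steering") matrix of sample y_i
    return np.sqrt(np.linalg.det(C_i)) * np.exp(-(dx @ C_i @ dx) / (2.0 * h**2))

# Example: weight of a sample one pixel away in space and one frame away in
# time, with C_i = I (the choice used for the pilot gradient estimation).
w = kernel_weight(np.array([1.0, 0.0, 1.0]), np.eye(3))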
In this paper, we first estimate the gradients (β_1 = [z_{x1}(x_p), z_{x2}(x_p), z_t(x_p)]^T) using (8) with C_i = I and with ω_i set to a 5 × 5 × 5 cubicle in the grid of the low-resolution video y; then, plugging the estimated gradients into (12), we obtain the locally adaptive smoothing matrix C_i for each y_i. With C_i given by (12), the kernel function faithfully reflects the local signal structure in space–time (we call it the steering kernel function); i.e., when we estimate a pixel on an edge, the kernel function gives larger weights to the samples (y_i) located on the same edge. On the other hand, if there is no local structure, all the nearby samples have similar weights. Hence, the estimator (10) preserves local object structures while suppressing the noise effects in flat regions. We refer the interested reader to [24] for further details. Once all the pixels of interest have been estimated using (10), we fill them into the matrix z (5) and deblur the resulting 3-D data set at once, as explained in the following section.

3) Space–Time (3-D) Deblurring: Assuming that, at the space–time upscaling stage, noise is effectively suppressed [1], the important issue that we need to treat carefully in the deblurring stage is the suppression of ringing artifacts, particularly across time. The ringing effect in time may cause undesirable flicker when we play the output video. Therefore, the deblurring approach should smooth the output pixels across not only space but also time. To this end, using the data model (5), we propose a 3-D deblurring method with the 3-D version of TV to recover the pixels across space and time

û = arg min_u ‖z − Gu‖₂² + λ‖Γu‖₁        (13)

where λ is the regularization parameter and Γ is a high-pass filter. The joint use of the L2- and L1-norms is fairly standard [25]–[27]: the first term (L2-norm) is used to enforce the fidelity of the reconstruction to the data (in a mean-squared sense), and the second term (L1-norm) is used to promote sparsity in the gradient domain, leading to sharp edges in space and time and avoiding ringing artifacts. Specifically, we implement the TV regularization as follows:

‖Γu‖₁ ⇒ Σ_{l=−1}^{1} Σ_{m=−1}^{1} Σ_{t=−1}^{1} ‖u − S_{x1}^{l} S_{x2}^{m} S_{t}^{t} u‖₁        (14)

where S_{x1}^{l}, S_{x2}^{m}, and S_{t}^{t} are the shift operators that shift the video u in the x1-, x2-, and t-directions by l, m, and t pixels, respectively. We iteratively minimize the cost C(u) = ‖z − Gu‖₂² + λ‖Γu‖₁ in (13) with (14) to find the deblurred sequence û using the steepest descent method

û^(ℓ+1) = û^(ℓ) − μ (∂C(u)/∂u)|_{u=û^(ℓ)}        (15)

where μ is the step size and

∂C(u)/∂u = −G^T (z − Gu) + λ Σ_{l=−1}^{1} Σ_{m=−1}^{1} Σ_{t=−1}^{1} (I − S_{x1}^{−l} S_{x2}^{−m} S_{t}^{−t}) sign(u − S_{x1}^{l} S_{x2}^{m} S_{t}^{t} u).        (16)

We initialize û^(ℓ) with the output of the space–time upscaling (i.e., û^(0) = z) and manually select a reasonable 3-D PSF (G) for the experiments with real blurry sequences.

In this paper, we select the 3-D PSF based on the exposure time τ_e and the frame interval τ_f of the input videos (which are generally available from the camera settings) and on the user-defined spatial and temporal upscaling factors r_s and r_t. Specifically, we select the spatial PSF as an r_s × r_s uniform PSF. Currently, we ignore the out-of-focus blur, and we obtain the temporal support size of the temporal PSF as

N_t = (τ_e/τ_f) r_t        (17)

where r_t is the user-defined temporal upscaling factor. Convolving the spatial PSF and the temporal PSF as shown in Fig. 5, we have a 3-D (r_s × r_s × N_t) PSF for the deblurring (13). Our deblurring method with the r_s × r_s × N_t PSF reduces the effective exposure time of the upscaled video. Specifically, after the deblurring, the effective exposure time of the output video is given by

τ̃_e = τ_e/N_t = τ_f/r_t.        (18)

Therefore, when the temporal upscaling factor r_t is not high, the exposure time τ̃_e is not shortened by very much, and some motion blur effects may be seen in the output video. For example, if an object moves 3 pixels per frame in the spatiotemporally upscaled video, the moving object would still be blurry along its motion trajectory in a 3-pixel-wide window even after we deblur.

III. EXPERIMENTS

We illustrate the performance of our proposed technique on both real and simulated sequences. To begin, we first illustrate motion deblurring performance on the Cup sequence, with simulated motion blur.⁵ The Cup example is the one we briefly showed in Section I. This sequence contains relatively simple transitions, i.e., the cup moves upward. Fig. 1(a) shows the ground-truth frames, and Fig. 1(b) shows the motion-blurred frames generated by taking the average of five consecutive frames, i.e., the corresponding PSF in 3-D is 1 × 1 × 5 uniform. The deblurred images of the Cup sequence by Fergus' method [3], Shan's method⁶ [4], and our approach (13) with (λ, μ) = (0.75, 0.04) are shown in Fig. 1(c)–(e), respectively. Fig. 1(f)–(j) shows the selected regions of the video frames in Fig. 1(a)–(e) at time t = 6, respectively. The corresponding PSNR⁷ and SSIM⁸ values are indicated in the figure captions. It is worth noting here again that, although motion occlusions are present in the sequence, the proposed 3-D deblurring requires neither segmentation nor motion estimation. We also note that, in a sense, one could regard a 1 × 1 × N PSF as a 1-D PSF. However, in our paper, a 1 × N × 1 PSF and a 1 × 1 × N PSF are, for example, completely different: the 1 × N × 1 PSF blurs along the horizontal (x_2) axis, while the 1 × 1 × N PSF blurs along the time axis.

⁵In order to examine how well the motion blur can be removed, we do not take the spatial blur into account in these experiments.
⁶The software is available at http://w1.cse.cuhk.edu.hk/~leojia/programs/deblurring/deblurring.htm. We set the parameter "noiseStr" to 0.05 and used the default settings for the other parameters in all the examples.
⁷PSNR = 10 log₁₀(255²/mean squared error), in decibels.
⁸The software for the Structural SIMilarity (SSIM) index is available at http://www.ece.uwaterloo.ca/~z70wang/research/ssim/.
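For reference, the iteration (13)–(16), with G taken as the r_s × r_s × N_t uniform PSF of (17) and the shift operators realized as circular shifts, can be sketched in a few lines of numpy/scipy. This is only an illustrative sketch: the function and variable names, the boundary handling, the iteration count, and the default parameters (borrowed from the Cup example, (λ, μ) = (0.75, 0.04)) are our own choices, not the authors' implementation.

import numpy as np
from scipy.ndimage import convolve

def deblur_3d_tv(z, psf, lam=0.75, mu=0.04, n_iter=50):
    # z   : upscaled blurry video as an (x1, x2, t) float array
    # psf : 3-D blur kernel G, e.g., an r_s x r_s x N_t uniform kernel
    psf = psf / psf.sum()
    psf_adj = psf[::-1, ::-1, ::-1]               # kernel of the adjoint G^T
    u = z.astype(float).copy()                    # u^(0) = z
    shifts = [(l, m, t) for l in (-1, 0, 1)
                        for m in (-1, 0, 1)
                        for t in (-1, 0, 1)]
    for _ in range(n_iter):
        Gu = convolve(u, psf, mode='wrap')
        grad = -convolve(z - Gu, psf_adj, mode='wrap')    # -G^T (z - Gu)
        for (l, m, t) in shifts:                          # TV term of (16)
            d = np.sign(u - np.roll(u, (l, m, t), axis=(0, 1, 2)))
            grad += lam * (d - np.roll(d, (-l, -m, -t), axis=(0, 1, 2)))
        u -= mu * grad                                    # steepest descent (15)
    return u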
Fig. 7. Motion (temporal) deblurring example of the Book sequence (380 × 510, 10 frames) with real motion blur. (a) Frame of the ground truth at time t = 6. (b) and (c) Deblurred frames by Fergus's [3] and Shan's [4] methods. (d) and (f) Deblurred frames at t = 6 and 6.5 by the proposed 3-D TV method (13) using a 1 × 1 × 8 uniform PSF. (e) One of the estimated intermediate frames at t = 6.5 by the 3-D SKR (10).
The second example, in Fig. 2, is also a simulated motion deblurring. In this case, the motion blur is caused by the camera rotating about its optical axis. We generated a video by rotating the pepper image counterclockwise 1° per frame for 90 frames; this is equivalent to rotating the camera clockwise 1° per frame. The sequence of rotated pepper images is the ground-truth video in this example. Then, we blurred the video with a 1 × 1 × 8 uniform PSF (this is equivalent to taking the average of eight consecutive frames) and added white Gaussian noise (standard deviation σ = 2). Fig. 2(a) and (b) shows one frame from the ground-truth video and the noisy blurred video. When the camera rotates, the pixels rotate at different speeds in proportion to their distance from the center of rotation. Consequently, the motion blur is spatially variant. Even though the (temporal) PSF is independent of the scene contents or the camera motion, the shift-invariant 3-D PSF causes spatially variant motion blur effects. Using the blurred video as the output of a space–time interpolator, we deblurred it with Fergus' and Shan's blind methods. One deblurred frame from each blind method is shown in Fig. 2(c) and (d), respectively. Our deblurring result is shown in Fig. 2(e). We used the 1 × 1 × 8 shift-invariant PSF for our deblurring (13) with (λ, μ) = (0.5, 0.15).

The next experiment, shown in Fig. 7, is a realistic example in which we deblur a low temporal resolution sequence degraded by real motion blur. The cropped sequence consists of ten frames, and the sixth frame (at time t = 6) is shown in Fig. 7(a). Motion blur can be seen in the foreground (the book in front moves toward the right about 8 pixels per frame). As in the previous experiment, we first deblurred these frames individually by Fergus' and Shan's methods [3], [4]. Their deblurred results are shown in Fig. 7(b) and (c), respectively. For our method, temporal upscaling is necessary before deblurring.
Fig. 8. 3-D (spatiotemporal) deblurring example of the Foreman sequence in CIF format. (a) Cropped frame at time t = 6. (b) and (c) Deblurred results of the upscaled frame shown in (e) by Fergus' [3] and Shan's [4] methods. (d) Deblurred frames by the proposed 3-D TV method (13) using a 2 × 2 × 2 uniform PSF. (e) Upscaled frames by the 3-D SKR [1] at times t = 6 and 6.5 in both space and time with spatial and temporal upscaling factors of r_s = 2 and r_t = 8, respectively. The figures (f)–(i) and (j)–(n) are the selected regions of the frames shown in (a)–(e) at t = 6 and 6.5.
Here, it is indeed the case that the exposure time is shorter than the frame interval (τ_e < τ_f), as shown in Fig. 3(a). Using the 3-D SKR method (10), we upscaled the sequence with the upscaling factors r_s = 1 and r_t = 8 in order to generate intermediate frames and obtain a sequence like the one illustrated in Fig. 3(c). We chose r_t = 8 to slow the motion of the book down to about 1 pixel per frame so that the motion blur of the book would be almost completely removed. One of the estimated intermediate frames, at t = 6.5, is shown in Fig. 7(e). Then, we deblurred the upscaled video with a 1 × 1 × 8 uniform PSF by the proposed method (13) with (λ, μ) = (0.75, 0.06). We took the book video in dim light, and the exposure time is nearly equal to the frame interval. Selected deblurred frames⁹ are shown in Fig. 7(d) and (f).

⁹We must note that, in case severe occlusions are present in the scene, the deblurred results for the interpolated frames contain most of the errors/artifacts; this issue is one of our important future works.

The last example is another real example. This time we used the Foreman sequence in CIF format. Fig. 8(a) shows one frame of the
Fig. 9. Deblurring performance comparisons using absolute residuals (the absolute difference between the deblurred frames shown in Fig. 8(b)–(d) and the estimated frames shown in Fig. 8(e)). (a) Fergus' method [3]. (b) Shan's method [4]. (c) Our proposed method (13).
[17] Y. Tai, N. Kong, S. Lin, and S. Shin, "Coded exposure imaging for projective motion deblurring," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, Jun. 2010, pp. 2408–2415.
[18] E. Borissoff, "Optimal temporal sampling aperture for HDTV varispeed acquisition," SMPTE Motion Imag. J., vol. 113, no. 4, pp. 104–109, 2004.
[19] E. Shechtman, Y. Caspi, and M. Irani, "Space-time super-resolution," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 4, pp. 531–545, Apr. 2005.
[20] A. Agrawal, M. Gupta, A. Veeraraghavan, and S. G. Narasimhan, "Optimal coded sampling for temporal super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., San Francisco, CA, 2010, pp. 599–606.
[21] B. Wilburn, N. Joshi, V. Vaish, M. Levoy, and M. Horowitz, "High-speed videography using a dense camera array," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Washington, DC, 2004, pp. 294–301.
[22] A. Huang and T. Nguyen, "Correlation-based motion vector processing with adaptive interpolation scheme for motion-compensated frame interpolation," IEEE Trans. Image Process., vol. 18, no. 4, pp. 740–752, Apr. 2009.
[23] M. Irani and S. Peleg, "Improving resolution by image registration," CVGIP: Graph. Models Image Process., vol. 53, no. 3, pp. 231–239, May 1991.
[24] H. Takeda, S. Farsiu, and P. Milanfar, "Kernel regression for image processing and reconstruction," IEEE Trans. Image Process., vol. 16, no. 2, pp. 349–366, Feb. 2007.
[25] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D, vol. 60, pp. 259–268, Nov. 1992.
[26] C. Vogel and M. Oman, "Iterative methods for total variation denoising," SIAM J. Sci. Comput., vol. 17, pp. 227–238, 1996.
[27] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, "An iterative regularization method for total variation-based image restoration," SIAM J. Multiscale Model. Simul., vol. 4, pp. 460–489, 2005.