
Learning to Compress Images and Videos

Li Cheng
[email protected]
S.V.N. Vishwanathan
[email protected]
Statistical Machine Learning, National ICT Australia
Research School of Information Sciences & Engineering, Australian National University, Canberra ACT 0200

Abstract

We present an intuitive scheme for lossy color-image compression: Use the color information from a few representative pixels to learn a model which predicts color on the rest of the pixels. Storing the representative pixels and the image in grayscale then suffices to recover the original image. A similar scheme is also applicable for compressing videos, where a single model can be used to predict color on many consecutive frames, leading to better compression. Existing algorithms for colorization (the process of adding color to a grayscale image or video sequence) are tedious, and require intensive human intervention. We bypass these limitations by using a graph-based inductive semi-supervised learning module for colorization, and a simple active learning strategy to choose the representative pixels. Experiments on a wide variety of images and video sequences demonstrate the efficacy of our algorithm.

1. Introduction
The explosive growth of the Internet, as witnessed by the popularity of sites like YouTube and Google Images, has exponentially increased the amount of images and movies available for download. As more and more visual data is being exchanged, there is an ever-increasing demand for better compression techniques which will reduce network traffic. Typical compression algorithms for images work in the frequency domain, and use sophisticated techniques like wavelets. In the case of video clips, these algorithms not only compress each frame, but also use compression across frames in order to reduce storage requirements. For instance, frames within a scene are likely to be very similar, and hence it is sufficient to encode the differences between consecutive frames. Motion prediction, optical flow, and other tools are also used to further improve performance.

(Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).)
In this paper, we take a slightly different approach.
Instead of performing a frequency transformation, we store a grayscale version of the image and the color labels of a few representative pixels. Using the stored information, we learn a model which predicts the color for
the rest of the pixels. Turning to video, essentially the
same idea works, but now we only need to store color
information sampled from a single frame and use the
same model to predict on all closely related frames.
Two key questions are: a) How does one learn the model? b) How does one choose the representative pixels automatically? In what follows, we will answer both these questions systematically.
1.1. Paper Outline
In section 2 we will briefly discuss semi-supervised
learning algorithms with particular emphasis on graph
based methods. In section 3 we will describe the colorization algorithm of Levin et al. (2004), and formally show it is a transductive semi-supervised learning method. We then extend their formulation to an
inductive setting by adapting a graph-Laplacian based
manifold regularization algorithm due to Belkin et al.
(2006). In section 4 we show that the task of choosing representative pixels can be automated by using a simple active learning approach. Section 5 discusses implementation details. In section 6 we present experiments on image and video data to demonstrate the effectiveness of our algorithm. The paper concludes with an outlook and discussion.

2. Semi-Supervised Learning
Semi-supervised learning refers to the problem of
learning from labeled and unlabeled data. It has


attracted considerable attention in recent years (see Zhu (2005) for a comprehensive survey). Of particular interest to us are graph based methods, examples of which include Smola & Kondor (2003), Belkin & Niyogi (2003), and Belkin et al. (2006). In this section we will briefly survey graph-based semi-supervised learning algorithms.
2.1. Notation
A graph G consists of an ordered and finite set of n
vertices V denoted by {v1 , v2 , . . . , vn }, and a finite set
of edges E V V . A vertex vi is said to be a
neighbor of another vertex vj if they are connected by
an edge. G is said to be undirected if (vi , vj ) E
(vj , vi ) E for all edges. The adjacency matrix of G
is an nn real matrix W with Wij = 1 if (vi , vj ) E,
and 0 otherwise. If G is weighted then W can contain
non-negative entries other than zeros and ones, i.e.,
Wij (0, ) if (vi , vj ) E and zero otherwise. Let
D, the degree matrix,
P be an n n diagonal matrix
with entries Dii = j Wij . The graph Laplacian is
the matrix L = D W while the normalized graph
Laplacian := D1/2 LD1/2 . In what follows, by
default, we can either use the graph Laplacian or the
normalized graph Laplacian interchangeably.
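To make the notation concrete, here is a minimal sketch (ours, not from the paper) that builds W, D, L, and the normalized Laplacian for a small weighted undirected graph:

```python
import numpy as np

# Weighted adjacency matrix of a small undirected graph; W must be
# symmetric, with W_ij > 0 iff (v_i, v_j) is an edge.
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])

# Degree matrix: D_ii = sum_j W_ij.
D = np.diag(W.sum(axis=1))

# Graph Laplacian L = D - W.
L = D - W

# Normalized graph Laplacian D^{-1/2} L D^{-1/2}.
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt
```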

2.2. Goal of Semi-Supervised Learning

Let X be the space of observations, and Y the space of labels. We assume that Y is a finite subset of R and use |Y| to denote its size. The semi-supervised learning problem can be formally stated as follows: Given a sequence {(x_i, y_i)}_{i=1}^{m} of labeled examples drawn from X \times Y, a sequence {x_i}_{i=m+1}^{n} of unlabeled examples drawn from X, and a loss function l : X \times Y \times H \to R, learn a function f \in H which minimizes the loss on the labeled examples and also generalizes well to unseen examples.

Our compression problem fits perfectly in this framework. In the case of images, given a set of color pixels (labeled examples) and a set of grayscale pixels (unlabeled examples) we want to learn a function which will predict color (labels) on the grayscale pixels. In the case of video, our function also needs to generalize well to predict on unseen (but closely related) frames.

Clearly, semi-supervised learning is meaningful only in situations where the true underlying distribution of examples, which the unlabeled data will help elucidate, is relevant for the classification problem. Therefore, certain smoothness assumptions, e.g. if two observations are close then their corresponding labels should be similar, are often made. In our application, these assumptions are natural: If two pixels have similar intensity values and are spatially close to each other, then it is very likely that they have similar color values.

2.3. Graph Based Methods

Graph-based semi-supervised methods construct a problem graph, G, whose nodes are the examples (both labeled and unlabeled), and whose edges encode nearest neighbor relationships. Often, the edges are weighted by a kernel function to reflect the similarity between neighboring examples. The semi-supervised learning problem can now be posed as that of estimating a smooth function that respects neighborhood relations on the graph. Following Smola & Kondor (2003), Belkin & Niyogi (2003), Belkin et al. (2006), and others, we minimize the following regularized risk:

J(f) = c \|f\|_H^2 + \frac{\lambda}{n^2} \|f\|_G^2 + \frac{1}{m} \sum_{i=1}^{m} l(x_i, y_i, f).   (1)

Here H is a Reproducing Kernel Hilbert Space (RKHS) of functions f : X \to R. Its defining kernel is denoted by k : X \times X \to R, which satisfies \langle f, k(x, \cdot) \rangle_H = f(x) for all f \in H. c and \lambda are trade-off parameters for the regularizers. The regularizer \|f\|_G^2 is defined as

\|f\|_G^2 = \bar{f}^\top \hat{G} \bar{f},   (2)

where \bar{f} denotes the vector [f(x_1), ..., f(x_m), ..., f(x_n)]^\top, and \hat{G} \in R^{n \times n} is a function of G which determines the specific form of regularization imposed. Two choices are particularly relevant to us:

\|f\|_G^2 = \bar{f}^\top \tilde{L} \bar{f},   (3)

and

\|f\|_G^2 = \bar{f}^\top L^2 \bar{f} = \|L \bar{f}\|^2.   (4)

Finally, we make the assumption that l depends on f only via its evaluations f(x_i), and that l is piecewise differentiable. Specifically, when we use the square loss,

l(x_i, y_i, f) = (f(x_i) - y_i)^2,   (5)

we obtain the so-called Laplacian Regularized Least Squares (LapRLS) algorithm, which generalizes the Regularized Least Squares algorithm (Belkin et al., 2006). As a consequence of the representer theorem (Scholkopf & Smola, 2002), there exist coefficients \alpha_i such that f can be expressed as

f(\cdot) = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot).   (6)

Furthermore, there is a closed form solution:

\alpha = \Big( I_m K + c m I + \frac{\lambda m}{n^2} \hat{G} K \Big)^{-1} \bar{y},   (7)

where \alpha denotes the vector [\alpha_1, ..., \alpha_m, ..., \alpha_n]^\top, I_m \in R^{n \times n} contains the identity matrix of size m \times m in the top left corner and zeros elsewhere, I is the identity matrix, K is the Gram matrix with K_{ij} = k(x_i, x_j), and \bar{y} denotes the vector [y_1, ..., y_m, 0, ..., 0]^\top. We will use this formulation in all our experiments.
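As an illustration, the closed form (7) translates directly into code. The following is a minimal sketch (ours, not the authors' implementation); the Gaussian kernel and the function name are our choices:

```python
import numpy as np

def lap_rls(X, y_labeled, c, lam, G_hat, bandwidth=1.0):
    """Closed-form LapRLS, eq. (7). The first m rows of X are labeled."""
    n, m = X.shape[0], len(y_labeled)
    # Gram matrix of a Gaussian kernel: K_ij = k(x_i, x_j).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * bandwidth ** 2))
    # I_m: identity of size m in the top-left corner, zeros elsewhere.
    I_m = np.zeros((n, n))
    I_m[:m, :m] = np.eye(m)
    y_bar = np.concatenate([y_labeled, np.zeros(n - m)])
    # alpha = (I_m K + c m I + (lam m / n^2) G_hat K)^{-1} y_bar.
    A = I_m @ K + c * m * np.eye(n) + (lam * m / n ** 2) * (G_hat @ K)
    return np.linalg.solve(A, y_bar), K

# Predictions at the n graph nodes are K @ alpha; for a new point x,
# f(x) = sum_i alpha_i k(x_i, x).
```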
2.4. Transductive vs Inductive


In transductive learning one is given a (labeled) training set and an (unlabeled) test set. The idea of transduction is to perform predictions only for the given
test points (Chapelle et al., 2006). This is in contrast to inductive learning, where the goal is to output
a prediction function which is defined on the entire
space (Chapelle et al., 2006). In our context, this is a
key difference. While an inductive algorithm can easily be used to predict labels on closely related images,
transductive algorithms are unsuitable. This limits the
applicability of transductive algorithms to video compression.
All the algorithms we discussed above are inductive, but it is easy to turn them into transductive algorithms. Let X denote the set of labeled and unlabeled points; then, work with functions f : X \to R, drop the regularization term \|f\|_H^2 from the objective function (1), and minimize (Zhu, 2005; Belkin et al., 2006):

J(f) = \frac{\lambda}{n^2} \|f\|_G^2 + \frac{1}{m} \sum_{i=1}^{m} l(x_i, y_i, f).   (8)
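For the square loss (5), this transductive problem has a direct solution: setting the gradient of (8) with respect to \bar{f} to zero gives the linear system ((\lambda m / n^2) \hat{G} + I_m) \bar{f} = \bar{y}, with I_m and \bar{y} as in (7), so \bar{f} is obtained by a single matrix solve. (We spell this step out here; it is only implicit in the cited references.)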

We note in passing that if we drop the regularization term \|f\|_H^2 from the objective function but continue to work with f \in H, i.e., f : X \to R, we get an inductive algorithm whose prediction function is required to be smooth only on the observed examples (both labeled and unlabeled). In applications (like ours) where f is only used to predict on examples that are very similar to the observed examples, this suffices.

3. Colorization by Semi-Supervised Learning

Colorization, the process of adding color to a grayscale image or video sequence, has attracted considerable research interest recently. Unfortunately, existing algorithms are tedious and labor-intensive. For our purposes, a particularly relevant algorithm is due to Levin et al. (2004), which we now present using notation that makes it easy to see connections to semi-supervised learning.

Given a grayscale image with a few color patches, we enforce the constraint that two neighboring pixels should have similar colors if their intensities are similar. Formally, let X = {x_i}_{i=1}^{n} denote the set of all pixels in an image, and {(x_i, y_i)}_{i=1}^{m} denote the set of pixels for which color information, y_i, is available. We minimize the following objective function:

\sum_{i=1}^{n} \Big( f(x_i) - \sum_{j : j \sim i} \omega_{ij} f(x_j) \Big)^2 + \sum_{i=1}^{m} \ell_\infty(f(x_i), y_i).   (9)

Here, f : X \to R is a function that assigns color values to pixels, j \sim i implies that pixel x_j is a neighbor of pixel x_i, and for each i the weights \omega_{ij} are non-negative and sum to one. The predictor f is forced to take on user-specified values on all pixels where color information is available by the loss function \ell_\infty(f(x_i), y_i), which is 0 if f(x_i) = y_i and \infty otherwise. The weights are computed using a normalized radial basis function or a second-order polynomial, and take into account the similarities in intensity values.

To show that the above algorithm is a graph-based transductive semi-supervised learning algorithm, we begin by constructing a weighted adjacency matrix W such that W_{ij} = \omega_{ij}. Since \sum_j \omega_{ij} = 1, the degree matrix D = I, and the graph Laplacian can be written as L = I - W. It is now easy to verify that

\bar{f}^\top L^2 \bar{f} = \sum_{i=1}^{n} \Big( f(x_i) - \sum_{j : j \sim i} \omega_{ij} f(x_j) \Big)^2,

and hence, modulo some scaling factors, the objective function of Levin et al. (2004) is identical to (8).
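This identity is easy to check numerically. The following small sketch (ours) uses random row-normalized weights and compares \|L \bar{f}\|^2, which coincides with \bar{f}^\top L^2 \bar{f} when W is symmetric, against the right-hand side:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random non-negative weights with zero diagonal, rows normalized so
# that sum_j w_ij = 1, as required by the colorization objective (9).
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

L = np.eye(n) - W   # since D = I, the graph Laplacian is L = I - W
f = rng.random(n)

lhs = np.linalg.norm(L @ f) ** 2                       # ||L f||^2
rhs = sum((f[i] - W[i] @ f) ** 2 for i in range(n))    # data term of (9)
assert np.isclose(lhs, rhs)
```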
It is worthwhile mentioning here that Levin et al.
(2004) also use their algorithm to perform colorization on a video sequence. Now, the notion of a neighbor also takes into account temporal information, that
is, two pixels are deemed neighbors if either they are
close to each other on a single frame or if they appear at the same position on two consecutive frames.
For our application, this approach suffers from several
drawbacks. First, the size of the optimization problem
grows with the number of related frames thus making
it unsuitable for real-time compression. Second, the
algorithm propagates color information from frame to
frame. A better approach is to learn how to predict
color on a single frame and reuse this model to predict
on all closely related frames. Third, when streaming
data on the Internet, one might need to compress on
demand since all the frames might not be available a priori. Our algorithm, which we describe next, addresses all these issues.


3.1. Our Algorithm


We extend the formulation of Levin et al. (2004) in many ways. First, we work with a graph-based inductive algorithm, LapRLS (see section 2.3). This has the advantage that we can learn a model for one frame of video and reuse the same model to predict on related frames, thereby addressing the shortcomings discussed above. Second, we use the square loss (5) instead of the \ell_\infty loss. Besides being analytically tractable, our loss favors smoother functions whose color predictions might differ slightly from user-supplied values. Third, we use the normalized graph Laplacian, \tilde{L}, for regularization instead of the L^2 regularization favored by Levin et al. (2004). Recent studies (e.g. Zhang & Ando, 2005) have shown that using the normalized graph Laplacian is both theoretically and practically more appealing. Finally, we extract features out of each pixel and construct our nearest neighbor graph in feature space. Instead of just respecting spatial proximity, our features also respect local texture. We discuss implementation details in section 5.


4. Active Learning for Compression


Active learning is a framework that allows the learner
to ask for informative examples. The goal is as usual
to construct an accurate classifier, but the labels of the
data points are initially hidden and there is a charge
for each label you want revealed. The hope is that by
intelligent adaptive querying, one can get away with
significantly fewer labels than one would need in a regular supervised learning framework.
Recall that we want to colorize an image or a video by building a model which takes as input the color information of a few representative pixels. The key question is: How does one choose these representative pixels? We solve this by casting our question as a transductive active-learning problem. Each pixel we query for a label (color information) increases our cost (the amount of storage needed to reconstruct the image). Therefore, it is advantageous to query as few pixels as possible.
While sophisticated algorithms with guaranteed theoretical bounds exist for active learning, we use a simple strategy. The learner starts off with a few randomly chosen labeled pixels and learns a model. The prediction of the model is then evaluated on the image, and high-error areas are identified and clustered. The algorithm then chooses a representative from each cluster, queries it for label information, adds it to its label set, and the process repeats.

5. Implementation Details

In this section we describe various tricks of the trade we employed to apply our algorithm to images and videos.

Following Levin et al. (2004), we work in YUV space, where Y is the intensity (luminance) channel, and U and V are the chrominance channels that encode the color. We predict the U and V values independently. As noted in section 3.1, our algorithm maps each pixel using a feature map. Our feature maps encode both the spatial location of the pixel in the image grid as well as the local texture information obtained by sampling a 5 x 5 local grid around each pixel. Similar to Levin et al. (2004), we construct a 4-nearest neighbor graph in feature space. While only spatially adjacent pixels are connected in their graph, our graph also takes local texture into account when connecting pixels. For the kernel we choose the stock-standard Gaussian kernel (Scholkopf & Smola, 2002), with the variance tuned separately for each problem.
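For concreteness, one possible reading of this feature map (the 5 x 5 patch and the (row, col) location are from the paper; the padding, spatial weighting, and neighbor search are our assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def pixel_features(gray, i, j, spatial_weight=1.0):
    """Feature vector for pixel (i, j): spatial location plus the
    intensities of the surrounding 5 x 5 patch (edge-padded)."""
    padded = np.pad(gray, 2, mode="edge")
    patch = padded[i:i + 5, j:j + 5].ravel()           # 25 local intensities
    loc = spatial_weight * np.array([i, j], dtype=float)
    return np.concatenate([loc, patch])

# A 4-nearest-neighbor graph in feature space can then be built with a
# kd-tree: dists, nbrs = cKDTree(F).query(F, k=5)  (self plus 4 neighbors).
```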
A key issue to be addressed is the computation of \alpha, which involves inverting a dense n \times n matrix, where n is the number of pixels in the image (see (7)). Advances in parallel computing and matrix algebra notwithstanding, it is still computationally challenging to invert a large dense matrix with tens of thousands of rows and columns. To reduce our computational burden, we first observe that there is a lot of redundancy amongst spatially nearby pixels, since they tend to be spectrally homogeneous. Therefore, we can preprocess input images or frames to obtain an over-segmented representation, also called a super-pixel representation by Ren & Malik (2003), and pick pixels randomly from these segments. Typically, after quantization, the number of segments ranges between 1000 and 5000 depending on the complexity of the input image.
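A sketch of this preprocessing step, using scikit-image's SLIC (version >= 0.19 API) as a stand-in for the over-segmentation of Ren & Malik (2003) (our substitution; the paper does not use SLIC):

```python
import numpy as np
from skimage.segmentation import slic

def representative_pixels(gray, n_segments=2000, seed=0):
    """Over-segment the image and pick one random (row, col) per segment,
    reducing n from all pixels to roughly n_segments."""
    rng = np.random.default_rng(seed)
    labels = slic(gray, n_segments=n_segments, channel_axis=None)
    picks = []
    for s in np.unique(labels):
        rows, cols = np.nonzero(labels == s)
        k = rng.integers(len(rows))
        picks.append((int(rows[k]), int(cols[k])))
    return picks
```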
We now address the issue of measuring the quality of our solution. We employ the Peak Signal to Noise Ratio (PSNR) score, a standard scheme for measuring the quality of image and video compression. It measures fidelity on the logarithmic decibel scale as

PSNR = 20 \log_{10} \frac{255}{\sqrt{MSE}},   (10)

where the empirical Mean Square Error (MSE) between two images I and I' of size n \times n is

MSE = \frac{1}{n^2} \sum_{i,j=1}^{n} (I_{ij} - I'_{ij})^2.   (11)

If the image contains more than one channel (e.g. R, G, B or Y, U, V), then the MSE is the average of the MSE measured on each channel.
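Equations (10) and (11) translate directly into code; a minimal version (ours) for 8-bit, possibly multi-channel images:

```python
import numpy as np

def psnr(img, ref):
    """Peak Signal to Noise Ratio, eqs. (10)-(11); higher is better.
    Taking the mean over all entries averages the per-channel MSEs."""
    diff = img.astype(np.float64) - ref.astype(np.float64)
    mse = float((diff ** 2).mean())
    return 20.0 * np.log10(255.0 / np.sqrt(mse))
```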
Now we turn our attention to the stopping criterion
employed by the active learning algorithm. We need
to balance between two conflicting requirements. On
one hand, we want to reduce the number of labeled
examples. On the other, we want to have a high PSNR.
In our experiments, we stop the algorithm by default when either a PSNR of 38 is achieved or a maximum of 5000 pixels have been queried for color information.

6. Experiments
Our algorithm can work in two different modes: A
human-assisted mode, where the active learning module is switched off, and a completely automatic mode
which requires an oracle supplying ground truth. The
human-assisted mode is useful in situations where
ground truth is either not available or it is expensive to
label pixels. On the other hand, the automated mode
can be used for compression. We experiment with both
images and video and report our results below.¹

¹ The results can be found at a dedicated webpage: sml.nicta.com.au/licheng/LearnCompressImgVid/LearnCompressImgVid.html
Human-Assisted Image Colorization  Given a gray-scale image whose color version is not known a priori, we can label a few pixels with color and hand them to our algorithm, which learns the predictor f. The color image is then revealed by applying f to the whole image. Figure 1 presents an experiment on an image (panel (a)), where, with the aid of the labeled pixels (panel (a), in color), our semi-supervised algorithm is able to produce a visually appealing color image (panel (b)).

Figure 1. An image colorization example. When presented with this gray-scale image of size 683 x 512, as well as the labeled pixels (panel (a)), our algorithm is able to obtain a visually appealing colorized image (panel (b)).
Image Compression  To test the efficacy of our image compression scheme we perform two experiments. The aim of the first experiment is to show that our active learning approach outperforms humans in choosing pixels for labeling. Here we work with the color image of a colony of bees on hive frames. The image is of size 640 x 853 and is depicted in panel (a) of Figure 2. The corresponding grayscale image is depicted in panel (b). On this image we asked a human volunteer to label certain pixels with color. The pixels

chosen by the human are depicted in panel (c), and the colorized image is depicted in panel (d). Notice that the predicted image is not visually pleasing. For instance, there are certain patches which are marked with a bluish tinge because sufficient labels were not available to learn those textures. This shows that for images with rich texture, human-assisted colorization might be tedious and time-consuming. In contrast, panel (e) depicts the pixels chosen by our active learning approach, and the corresponding colorized image is depicted in panel (f). Observe that this image is visually indistinguishable from the ground truth. Not only does the active learning approach produce more visually pleasing results (a PSNR of 27.00 for the human labeling vs. 31.49 for the active learning approach), but it also requires far fewer labeled pixels (8558 vs. 2534).
Recall that our algorithm works in iterations and chooses pixels to query for labels. We plot the evolution of the PSNR score as the iterations proceed in panel (a) of Figure 3. It can be seen that the PSNR curve plateaus after 4 to 6 iterations.
In our second experiment (see Figure 4) we work with the image of a girl. The aim of this experiment is to show that the active learning approach outperforms labeling randomly chosen pixels. The original image, of size 512 x 683, is depicted in panel (a) of Figure 4. The corresponding grayscale image is depicted in panel
(b). The random pixels chosen for labeling are depicted in panel (c), and the colorized image is depicted
in panel (d). Notice again that the predicted image
exhibits some artifacts (e.g. whitish color around the
forehead area). In contrast, panel (e) depicts the pixels chosen by our active learning approach, and the
corresponding colorized image is depicted in panel (f).
Our predicted image is visually indistinguishable from the ground truth. The PSNR values are 38.41 and 40.95, and the numbers of pixels chosen are 2976 and 2766, for the random and active learning approaches respectively. The evolution of the PSNR score with the number of iterations is shown in panel (b) of Figure 3.


Finally, we test the compression ratios achieved by our approach on both images. In the case of the bees, the original JPEG image occupies 595845 bytes on disk, while the grayscale version occupies 439303 bytes. In addition, we have to store 2534 color pixels. Each pixel for which we store color information requires 4 bytes of additional storage: 2 bytes for the color information (the luminance channel is already present in the grayscale image), and 2 bytes to encode its location. This adds a modest 10136 bytes (approx 10KB) of extra storage, leading to a compression ratio of 0.754.
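For concreteness, the ratio appears to be computed as (grayscale bytes + color-pixel bytes) / original color bytes; here, (439303 + 2534 x 4) / 595845 = 449439 / 595845 ≈ 0.754.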

For the girl image, the figures are: 220641 bytes for
the color JPEG, 161136 bytes for the grayscale, and
2766 colored pixels, leading to a compression ratio of
0.781.
Human-Assisted Video Colorization The aim of
this experiment is to show that the color predictor
learnt from a single frame can be successfully deployed
to predict color on many successive frames, without
any visible distortions.


Figure 2. Image compression example. See text for details.


Figure 3. The PSNR scores for (a) bees and (b) girl vs the number of iterations of the active learning algorithm.

We work with a grayscale image sequence (146 frames of size 240 x 130) of a baby holding a milk bottle, and manually scribble 1542 color labels on the first frame. We then learn a predictor using the labeled and unlabeled pixels of the first frame and use it to predict on successive frames. Figure 5 shows our results: Panel (a) depicts the first frame of the sequence, panel (b) the grayscale version with the manually annotated labels, panel (c) the prediction results of our algorithm on the first frame, panel (d) the 32nd frame of the video sequence, and panel (e) the prediction of our algorithm.


Video Compression  The aim of this experiment is to explore the utility of our method for compressing color videos. We experiment with a video stream that contains 302 frames, each of size 240 x 130, of a call center employee.


Figure 5. Human-assisted video compression example. See text for details.

Figure 4. Image compression example. See text for details.

Figure 6 shows our results: Panel
(a) depicts the first frame of the sequence, panel (b)
the grayscale version, panel (c) the color pixels chosen by our active learning approach, and panel (d) the
prediction results of our algorithm. We use our learnt
model to predict frames 1 to 49. In the same figure,
panel (e) depicts ground truth for frame number 50
and panel (f) the prediction of our algorithm. Notice that the raised hand in frame 50 generates heavy color distortion when using a predictor learnt from information in frame 1 only. Since our previously learnt model fails to predict well, we update the model with additional color pixels from the current frame. Panel (g) depicts the color pixels chosen and panel (h) depicts the prediction of the new model.

Our active learning approach selectively queries labels for pixels along the boundaries where color changes occur. For instance, in panel (g) notice that it queries for color information around the fingers, since this is a difficult-to-learn region. Over the whole sequence, the final model learnt requires 7005 labeled pixels.
We compressed the original color video into an H.264 format movie (using QuickTime Professional with default settings), and the resultant file size was 816419 bytes. On the other hand, the grayscale movie compressed with the same codec occupied 698865 bytes. Our method also needs to store color information for 7005 pixels. Each pixel for which we store color information requires 4 bytes of additional storage: 2 bytes for the color information (the luminance channel is already present in the grayscale image) and 2 bytes to encode its location. This adds a modest 35025 bytes (approx 34KB) of extra storage, leading to a compression ratio of 0.899.

Figure 6. Video compression example. See text for details.


7. Outlook and Discussion

JPEG and H.264 are widely considered state-of-the-art compression techniques for images and video, respectively. In this paper, we presented a machine learning approach which is able to compress images and video better than these algorithms, oftentimes achieving competitive compression ratios.

The observation that the colorization algorithm of Levin et al. (2004) is a transductive graph-based semi-supervised learning algorithm led to the research presented in this paper. We enhanced and extended the original algorithm in many different ways, which are well motivated from a machine learning viewpoint, and applied it to a novel application.

In the standard active learning paradigm, there is a cost associated with querying for a label, but there is no reward for forgetting labels. Our algorithm iteratively queries for labels, but never forgets previously queried labels. It is possible, however, that we might be able to achieve the same PSNR values with far fewer pixels. Extending our algorithm to forget labels is part of our future research. Proving performance bounds for our algorithm and addressing non-stationary video sequences are also fertile areas of future research.

Acknowledgements

NICTA is funded by the Australian Government's Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Center of Excellence program. This work is supported by the IST Program of the European Community, under the Pascal Network of Excellence, IST-2002-506778.

References

Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373-1396.

Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399-2434.

Chapelle, O., Scholkopf, B., & Zien, A., eds. (2006). Semi-Supervised Learning. Cambridge, MA: MIT Press.

Levin, A., Lischinski, D., & Weiss, Y. (2004). Colorization using optimization. In SIGGRAPH '04: ACM SIGGRAPH 2004 Papers, 689-694. New York, NY, USA: ACM Press.

Ren, X., & Malik, J. (2003). Learning a classification model for segmentation. In Proc. 9th Intl. Conf. Computer Vision, vol. 1, 10-17.

Scholkopf, B., & Smola, A. (2002). Learning with Kernels. Cambridge, MA: MIT Press.

Smola, A. J., & Kondor, I. R. (2003). Kernels and regularization on graphs. In B. Scholkopf & M. K. Warmuth, eds., Proc. Annual Conf. Computational Learning Theory, Lecture Notes in Comput. Sci., 144-158. Heidelberg, Germany: Springer-Verlag.

Zhang, T., & Ando, R. K. (2005). Graph based semi-supervised learning and spectral kernel design. Tech. Rep. RC23713, IBM T.J. Watson Research Center.

Zhu, X. (2005). Semi-supervised learning literature survey. Tech. Rep. 1530, Computer Sciences, University of Wisconsin-Madison. http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
