A Class-Incremental Learning Method For Multi-Class Support Vector Machines in Text Classification
Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, 13-16 August 2006
model of the classifier must be relearned. In such situations, it is necessary to update an existing classifier in an incremental fashion to accommodate new data without compromising classification performance on previous data. On the other hand, many applications involving massive data sets are emerging. In such settings, SVMs suffer from large memory requirements and long CPU times when trained in batch mode, and batch algorithms cannot deal with such data sets effectively. Incremental learning techniques are one possible solution in both settings.

In this paper, we first present a new strategy for updating SVM classifiers in an incremental manner when new classes are added. When a new class arrives, the proposed method takes all the original classes together as one new negative super class, and applies the basic binary training method to the super class and the new class to obtain a higher-level classifier. The feature selection step is adjusted accordingly; that is, a different subset of the features is used at each level. The method offers a flexible choice for the originally trained classes and effectively reduces the training time when new classes are added, without significantly decreasing classification accuracy.

The rest of the paper is organized as follows. In Section 2, we describe how SVM classifiers are trained incrementally when new samples are added and how they are used in text classification. Section 3 presents the principles and algorithms of our class-incremental training framework. Experimental results on real web text data are given in Section 4. Finally, some conclusions are drawn in Section 5.
2. Related work

In text classification, a feature selection phase is needed before a text d is represented by a vector [1]:

x = [w_{d1}, w_{d2}, \ldots, w_{dn}],

where w_{di} (i = 1, 2, \ldots, n) are the weights of the document features in text d and n is the number of document features. The Lagrangian dual of soft-margin support vector learning for M examples can be formulated as:

\max_{\alpha} \; \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)

\text{s.t.} \quad 0 \le \alpha_i \le C \; (i = 1, \ldots, M), \qquad \sum_{i=1}^{M} \alpha_i y_i = 0,

where K is the kernel function and y_i = \pm 1 are the labels of the positive and negative samples respectively [2]. The decision function of the binary SVM is:

f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i y_i K(x, x_i) + b \right).
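As an illustration of this formulation, the following sketch trains a soft-margin binary SVM on TF-IDF document vectors and applies its decision function. It uses scikit-learn (whose SVC is built on LIBSVM [9]); the toy corpus, the linear kernel, and C = 1.0 are assumptions made only for this example, not settings from the paper.

# Minimal sketch: binary soft-margin SVM for text, assuming scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Toy corpus with labels y_i = +1 / -1 (illustrative only).
docs = ["stock market rises", "team wins the match",
        "shares fall sharply", "player scores a goal"]
labels = [1, -1, 1, -1]

vectorizer = TfidfVectorizer()            # builds x = [w_d1, ..., w_dn]
X = vectorizer.fit_transform(docs)

clf = SVC(kernel="linear", C=1.0)         # C bounds the dual variables alpha_i
clf.fit(X, labels)

x_new = vectorizer.transform(["goal scored in the final match"])
print(clf.decision_function(x_new))       # sum_i alpha_i y_i K(x, x_i) + b
print(clf.predict(x_new))                 # sgn of the value above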
Because SVM learning poses a global optimization problem, which implies re-learning the whole machine, only a few works have tackled incremental training of SVMs when samples are gradually added. Incremental techniques have been developed to facilitate batch SVM learning over the very large data sets and stream data sets in widespread use in the SVM community. A procedure for exact "adiabatic" incremental learning of SVM classifiers, in a single pass through the data, was introduced in [4] and extended to larger-set increments in [5]. Incremental learning with support vector machines, proposed in [6], consists of learning new data after discarding all past examples except the support vectors; this framework relies on the property that the support vectors summarize the data well, so it may give only an approximate solution. A way to incrementally solve the global optimization problem and obtain the exact solution was proposed in [4]; furthermore, it allows "decremental" unlearning and efficient computation of leave-one-out estimates. Local incremental learning of SVM, or LISVM, proposed in [7], exploits the "locality" of Radial Basis Function kernels to update the current machine by considering only a subset of support-vector candidates in the neighborhood of the input, determining a relevant neighborhood size at each learning step.

However, little attention has been paid to updating multi-class SVM classifiers when the number of classes increases. When a new class is added to the classification system, the above methods cannot fully accommodate the situation and the old models become useless; as a result, retraining the whole model is unavoidable. In this case, class-incrementability, as well as the reuse of old models, becomes an important issue.

3. Class-Incremental Learning for Multi-Class SVM

For a label set {1, 2, \ldots, N} of N classes, a set containing N-1 binary SVM classifiers, C_N = {S_2, \ldots, S_{N-1}, S_N}, can be constructed accordingly, in which S_i is the binary classifier that tells the instances belonging to any class of the set {i-1, \ldots, 1} apart from those of class i. When a newly arriving class N+1, unknown to the current classification system, is considered, we need not train a classifier for all N+1 classes from the beginning; we only construct a new binary classifier that takes all the samples of the known classes as negative and those of the new class as positive. In this way, which we call class-incremental learning (CIL), a new class or label is added to the old classification system incrementally. Throughout this paper, where no confusion arises, we use the class label i to denote either the class label itself or all the samples belonging to the class with label i, which we also call class i.
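The construction of C_N described above can be sketched in code. This is a schematic rendering of the definition, not the authors' implementation: the helper train_binary_svm, the dict-of-arrays data layout, and the SVC settings are all assumptions made for illustration.

import numpy as np
from sklearn.svm import SVC

def train_binary_svm(X_pos, X_neg):
    """Train one binary sub-classifier: positive class vs. negative samples."""
    X = np.vstack([X_pos, X_neg])
    y = np.hstack([np.ones(len(X_pos)), -np.ones(len(X_neg))])
    return SVC(kernel="linear", C=1.0).fit(X, y)

def build_cascade(samples):
    """Build C_N = {S_2, ..., S_N}, where S_i separates class i from the
    classes {1, ..., i-1}. `samples` maps class label -> sample matrix."""
    labels = sorted(samples)
    cascade = {}
    for i in labels[1:]:
        negatives = np.vstack([samples[j] for j in labels if j < i])
        cascade[i] = train_binary_svm(samples[i], negatives)
    return cascade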
3.1. Incremental Updating Method of Classifiers

The basic idea of CIL is to reuse the old models as far as possible. When a new class and its samples are added to the system, the ability of the old classifier to distinguish between the instances of the old classes is not influenced at all. All that needs to be done is to tell the instances of the new class apart from those of the old classes.

Assume that we now have N-1 classes whose classification system is called classifier C_{N-1}. When we take a distinct N-th class into account, we only need to train a binary sub-classifier S_N to separate the instances of class N from the instances of all the other N-1 classes (referred to as the super class SC_{N-1}). Consequently, a positive decision on an instance in the testing phase indicates that it belongs to class N, while rejected instances are passed down to the classifier C_{N-1}. The classifier C_{N-1} and the sub-classifier S_N are then combined and referred to as classifier C_N. When class N+1 is considered, the same process is repeated. Figure 1 illustrates the relations between the data sets and sub-classifiers mentioned above in the training process.

Figure 1. Relations between data sets and sub-classifiers in the process of incremental updating of the classifier (labels shown: S_N; class N-1; class N; nested super classes SC_{N-2}, SC_{N-1}, SC_N).
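Continuing the sketch from Section 3 (same assumed helpers; not the authors' code), the incremental update for a new class N then reduces to a single binary training run against the super class:

def add_class(cascade, samples, new_label, new_samples):
    """Train S_N: class N (positive) vs. super class SC_{N-1} (negative),
    then stack it on top of the existing classifier C_{N-1}."""
    negatives = np.vstack([samples[j] for j in sorted(samples)])
    cascade[new_label] = train_binary_svm(new_samples, negatives)
    samples[new_label] = new_samples    # the super class grows to SC_N
    return cascade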
3.2. Incremental Feature Selection

The vector space of classifier C_{N-1}, called V_{N-1}, is a sub-space of the vector space of C_N, called V_N. We train the binary SVM sub-classifier in the space V_N. In the testing phase, when a vector x in V_N is passed down to the classifier C_{N-1}, we use its projection onto V_{N-1}, denoted by x|V_{N-1}, in classifier C_{N-1}.
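If each feature set F_k is stored as the list of vocabulary indices it selects, the projection x|V_{N-1} is simply a coordinate restriction. A minimal sketch, assuming this index bookkeeping (which the paper does not specify):

def project(x, child_indices, parent_indices):
    """Restrict a vector x expressed in V_N (columns child_indices) to the
    sub-space V_{N-1} (columns parent_indices), given F_{N-1} ⊆ F_N."""
    pos = {f: k for k, f in enumerate(child_indices)}
    return x[[pos[f] for f in parent_indices]]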
We can now summarize the whole learning process as follows:

Step 0: Apply an appropriate feature selection phase to get the feature set F_m, and train a multi-class SVM classifier C_m in vector space V_m for the m original classes by any batch method.
For each new class N (N > m):
Step 1: Apply the incremental feature selection phase to get the feature set F_N;
Step 2: Train a binary classifier for the super class SC_{N-1} and class N;
Step 3: Combine this binary classifier with the classifier C_{N-1} to get classifier C_N.
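Step 1 can be concretized as below. The chi-square score matches the metric the paper reports using; the growth rule that keeps F_{N-1} ⊆ F_N (so that V_{N-1} remains a sub-space of V_N) and the helper's signature are assumptions made for illustration.

from sklearn.feature_selection import chi2
import numpy as np

def select_features(X, y, old_indices, k_new):
    """Step 1: keep the previous feature set F_{N-1} and add the k_new
    highest-scoring chi-square features, so that F_{N-1} ⊆ F_N.
    X is a non-negative full-vocabulary matrix (e.g. TF-IDF) of the new
    class and super-class samples; y holds the binary labels."""
    scores, _ = chi2(X, y)
    old = set(old_indices)
    ranked = [i for i in np.argsort(scores)[::-1] if i not in old]
    return list(old_indices) + ranked[:k_new]

# Steps 2-3 then reuse add_class() from Section 3.1, with all sample
# matrices restricted to the columns returned here (the space V_N).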
3.3. Testing

The process of the testing phase is shown in Figure 2. According to the discussion above, classifier C_N consists of a binary sub-classifier S_N and a lower-level classifier C_{N-1}. Every instance x is represented in the vector space V_N, and at each level k its projection x|V_k is taken as a vector in the space V_k. A positive decision after applying S_N to x|V_N implies that x belongs to class N; otherwise x is passed further down to the lower-level classifier S_{N-1}, iteratively, until a label is returned. We summarize the testing phase as follows:

For classifier C_N and an instance x, until k = m:
Step 1: Apply classifier C_k to x|V_k;
Step 2: If x is assigned to a class, return the label of that class; else set k = k-1, pass x|V_k down to C_k, and go to Step 1.
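The testing cascade might be implemented as in the following sketch, which builds on the helpers assumed earlier (project, the cascade dict, and a stand-in base_classifier for the batch-trained C_m); none of these names come from the paper.

def classify(x, cascade, feats, base_classifier, base_feats):
    """Walk down from S_N to S_{m+1}; fall back to the batch classifier C_m.
    `cascade` maps level k -> S_k and `feats` maps level k -> the index
    list F_k, with x expressed in the top-level space V_N."""
    top = max(feats)
    for k in sorted(cascade, reverse=True):        # k = N, N-1, ..., m+1
        x_k = project(x, feats[top], feats[k])     # x|V_k
        if cascade[k].predict(x_k.reshape(1, -1))[0] == 1:
            return k                               # positive: x is in class k
    x_m = project(x, feats[top], base_feats)       # every level rejected x
    return base_classifier.predict(x_m.reshape(1, -1))[0]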
… the three batch learning methods, and almost competed with the best performances of all the others.

Since CIL trains a binary classifier for class N against the super class SC_{N-1}, there will sometimes be far more negative samples than positive ones, especially when N is large. This poses the class imbalance problem [10]. In our method, the positives are prone to being submerged by the negatives; this is one possible reason for the decrease in effectiveness of CIL and the other methods as N increases. Our future work is to take the class imbalance problem into account and build a classifier that adapts to the class distributions.
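One standard mitigation along this direction, shown here purely as an illustration and not evaluated in the paper, is to scale the soft-margin constant C per class so that errors on the rare positive class are penalized more heavily:

import numpy as np
from sklearn.svm import SVC

# Toy imbalanced data: 5 positives vs. 95 negatives (illustrative only).
X = np.vstack([np.random.randn(5, 3) + 2.0, np.random.randn(95, 3)])
y = np.hstack([np.ones(5), -np.ones(95)])

# class_weight="balanced" multiplies C for each class by
# n_samples / (n_classes * n_samples_in_class), boosting the minority class.
clf = SVC(kernel="linear", C=1.0, class_weight="balanced").fit(X, y)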
Another direction is to study the influence of different feature selection algorithms, and of the sizes of the feature sets, on the performance of the CIL method, because we have only adopted Chi-Square to choose a given number of features in this paper. Other feature selection metrics may be more suitable for CIL.

Acknowledgements

This paper is supported by the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20049998027, and the Natural Science Foundation of China under Grant No. 90604006.
References

[1] Sebastiani F., "Machine learning in automated text categorization", ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47, 2002.
[2] Joachims T., "Text categorization with support vector machines: learning with many relevant features", Proceedings of ECML-98, Chemnitz, Germany, pp. 137-142, 1998.
[3] Vural V., Dy J. G., "A hierarchical method for multi-class support vector machines", Proceedings of ICML-04, Banff, pp. 105-112, 2004.
[4] Cauwenberghs G., Poggio T., "Incremental and decremental support vector machine learning", NIPS 2000, pp. 409-415, MIT Press, 2000.
[5] Fine S., Scheinberg K., "Incremental learning and selective sampling via parametric optimization framework for SVM", Advances in Neural Information Processing Systems 14, pp. 705-711, MIT Press, 2002.
[6] Syed N. A., Liu H., Sung K., "Incremental learning with support vector machines", Proceedings of IJCAI-99, Stockholm, Sweden, 1999.
[7] Ralaivola L., d'Alché-Buc F., "Incremental support vector machine learning: a local approach", in Dorffner G., Bischof H., Hornik K. (Eds.), Lecture Notes in Computer Science 2130, pp. 322-330, Springer, Vienna, Austria, 2001.
[8] Lewis D. D., "Reuters-21578 text categorization test collection, Distribution 1.0", available at http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt
[9] Chang C., Lin C., "LIBSVM: a library for support vector machines", software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[10] Castillo M. D., Serrano J. I., "A multi-strategy approach for digital text categorization from imbalanced documents", ACM SIGKDD Explorations Newsletter, Vol. 6, No. 1, pp. 70-79, 2004.