Conference Paper · May 2016
Semi-Unsupervised Clustering Using Reinforcement Learning
Sourabh Bose and Manfred Huber
Department of Computer Science and Engineering
The University of Texas at Arlington
Arlington, TX USA
[Link]@[Link] huber@[Link]
Abstract

Clusters defined over a dataset by unsupervised clustering often present groupings which differ from the expected solution. This is primarily the case when some scarce knowledge of the problem exists beforehand that partially identifies desired characteristics of the clusters. However, conventional clustering algorithms are not designed to expect any supervision from the external world, as they are supposed to be completely unsupervised. As a result, they cannot benefit from, or effectively take into account, available information about the use or properties of the clusters. In this paper we propose a reinforcement learning approach to address this problem, in which existing, unmodified unsupervised clustering algorithms are augmented so that the available sparse information is utilized to achieve more appropriate clusters. Our model works with any clustering algorithm, but the input to the algorithm, instead of being the original dataset, is a scaled version of it, where the scaling factors are determined by the reinforcement learning algorithm.

Introduction

Clustering of data into groups is an important task to perform dimensionality reduction and to identify important properties of a data set. A wide range of clustering algorithms have been devised that all use some similarity/distance measure built into the algorithm to establish groupings of the data with a range of properties (Xu, Wunsch, and others 2005). These clustering algorithms group the data, attempting to maximize the distance between the clusters while minimizing the variance within the individual clusters. However, traditional clustering algorithms do not provide an efficient mechanism to fine-tune the final clustering solution given some sparse information about the desired properties of the grouping of the data.

The need for semi-unsupervised clustering arises, for example, in data sets with large numbers of attributes where most of the attributes are not semantically relevant but will dominate any distance metric used by traditional clustering algorithms (due to their number). In these cases, sparse information regarding the quality of clusters or regarding relations between a small number of data points might be available which could be used to guide the cluster formation.

Semi-unsupervised clustering defines pairwise constraints on the input data in order to direct the clustering algorithm towards an answer which satisfies the given constraints. There are two possible types of constraints: same cluster or must-link constraints, which indicate that points should be in the same cluster, and different cluster or must-not-link constraints, which indicate that points should be in different clusters. Given the input samples, it is often not possible to cluster the data according to the constraints in their original feature space using unmodified distance measures as indications of similarity. Thus we have to modify the feature space, usually by scaling the dimensions, so that an unmodified clustering algorithm is able to cluster based on its own distance and variance criteria. To solve this problem, this paper presents a novel approach which first learns a policy to compute the scaling factors using reinforcement learning from a set of training problems, and subsequently applies the learned policy to compute the scaling factors for new problems. The goal is that, by working on the scaled dimensions, the traditional clustering algorithm can yield results that satisfy the constraints.

Existing Methodologies

Existing methods to solve the problem usually include a regularization term in the cluster optimization criterion, thereby redefining the goal of the clustering algorithm to include the satisfaction of the given constraints. Using this regularization term (Zeng and Cheung 2012), these approaches modify traditional clustering algorithms (Rangapuram and Hein 2015) in order to solve for both constraint satisfaction and similarity optimization. The methods proposed in this area differ in the way the information is utilized, either by adapting the similarity measure or by modifying the search for appropriate clusters (Grira, Crucianu, and Boujemaa 2004). Additionally, constraints often contain misleading information, which results in the miscalculation of the final clusters. To address this problem, a subset of constraints can be selected from the entire set, as shown in (Zhao et al. 2012), to achieve the required grouping.

Approach
Copyright © 2016, Association for the Advancement of Artificial Intelligence ([Link]). All rights reserved.

Consider the set of sample data shown in Figure 1(a), generated from four distinct distributions {A, B, C, D}, which we want to divide into two clusters. Furthermore, an additional set of constraints is posed on the dataset, providing evidence that the required clusters are the groupings of the sets {A, C} and {B, D}, respectively. A clustering algorithm with a Cartesian distance metric applied to the data groups the points into {A, B} and {C, D}, as shown in Figure 1(b). For the unmodified clustering algorithm to produce the desired result, the data is scaled up along the x-axis by a factor of 5, thus increasing the distance between points along the x-axis while keeping the y-axis variance the same. The clustering algorithm, in order to satisfy its own distance and variance optimization criteria, now additionally satisfies the point constraints, achieving the desired grouping, as shown in Figure 1(c).

Figure 1: (a) Original dataset and preferred final grouping; (b) solution of a traditional algorithm without dimension scaling; (c) solution of the same traditional algorithm with the x-axis scaled by a factor of 5.

For a same cluster type constraint, the dimensions are to be scaled in such a way that the points in the locality of the constraint pair are closer in the new feature space. This is achieved by shrinking the dimensions along which they are farthest apart. Similarly, for a different cluster type constraint, the dimensions along which the points are closest are expanded, pushing them farther apart in the new feature space.

If the dimensions along which scaling is performed are the same as those of the original inputs, the number and forms of the potential final clusters are limited. For example, in a two-dimensional scenario a final grouping into two clusters can only be separated along the axes. To address this problem, this paper presents a novel approach where the original input dimensions of the dataset are appended with a kernel space containing a similarity metric to each point of the pairs in the set of constraints. This provides the scaling process with more degrees of freedom, as it can scale along a much higher number of dimensions, resulting in a more complicated and nonlinear separation surface. Performing clustering in this reduced kernel space keeps the computational complexity lower compared to the complete kernel space over all data points. A radial basis function (RBF) is used as the kernel function (Zhang et al. 2007), since we need the kernel function to have a local property where points closer to each other behave similarly. Thus, for input data with m samples and d dimensions, we convert the data samples into a (d + 2c)-dimensional space, where c is the number of constraints, each specifying a pair of points over which the constraint is defined. In general, this is a much smaller number than the number of samples m. For example, given 10-dimensional data of 100 points and eight constraints, the resultant space after conversion would have 26 dimensions, in which the 10 features of the original input are concatenated with the 16 newly computed features; the similarity metric of the kernel is only computed with respect to the points in the constraints.

Upon mapping the dataset onto the new feature space, the new dimensions are scaled using a Markov Decision Process (MDP) policy learned by the reinforcement learning algorithm, and subsequently a traditional clustering algorithm is applied to the new feature space, which now achieves the desired clusters.

Similarity Function Learning

To apply reinforcement learning to the learning of the effective scaling of the feature dimensions, an MDP is defined. An MDP (Puterman 1990), <S, A, T, R>, defines a stochastic controllable process, where the state s ∈ S describes all information necessary to predict the transition probabilities T : P(S'|S, a) for every possible action a ∈ A. Thus policies Π defining the actions to be taken only require knowledge of the current state. The reward R : r(s, a) defines the intermediate reward (payoff) obtained for taking action a in a given state. The goal of solving an MDP is to determine the policy Π(S, a) = P(a|S) that maximizes the accumulated reward obtained. Reinforcement learning is a method to learn this policy.

In this work, the MDP uses two state variables, namely the accuracy for same cluster constraints and the accuracy for different cluster constraints. Since the variables are continuous, we discretize the state space into bins of size B. It should be noted that, as with such discretizations, the choice of discretization step is a trade-off between the fine-grained accuracy of the current state representation and the computational complexity.

State Space of the MDP

To compute the state attributes, constraints are divided into two groups: those that are satisfied in the current clusters and those that are not. For the constraints that are not satisfied, a degree of satisfaction is calculated.
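To make the (d + 2c)-dimensional construction from the Approach section concrete, the following is a minimal sketch; the function names and the kernel width gamma are illustrative assumptions, not taken from the paper:

```python
import math

def rbf_constraint_features(X, constraints, gamma=1.0):
    """Append, for each of the c constraint pairs (j, k), the RBF similarity
    of every point to both constraint endpoints, mapping m points in d
    dimensions into d + 2c dimensions."""
    def rbf(p, q):
        # RBF kernel: exp(-gamma * squared Euclidean distance)
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))

    mapped = []
    for x in X:
        extra = []
        for j, k in constraints:
            extra.append(rbf(x, X[j]))  # similarity to the first endpoint
            extra.append(rbf(x, X[k]))  # similarity to the second endpoint
        mapped.append(list(x) + extra)
    return mapped

# The paper's example: 100 points, 10 dimensions, 8 constraints -> 26 dimensions.
X = [[((31 * i + 7 * c) % 13) / 13.0 for c in range(10)] for i in range(100)]
constraints = [(i, i + 50) for i in range(8)]
Z = rbf_constraint_features(X, constraints)
print(len(Z), len(Z[0]))  # prints: 100 26
```

The learned policy would then scale these d + 2c columns, and the unmodified clustering algorithm is run on the scaled features.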
This degree is used as a metric to identify how close a constraint is to being satisfied. Using a probabilistic view of cluster membership, the probability of a point, x, being in a cluster, K_i, is defined as

P(K_i | x) = distance(x, K_i) / Σ_j distance(x, K_j)

The degree to which a same cluster constraint, C_{j,k}, is satisfied is a function of the probabilities of the two member points, x_j and x_k, in the constraint. For each cluster K_i, the likelihood of both points lying in that cluster is defined as

P(K_i | C_{j,k}, type(C_{j,k}) = same cluster) = P(K_i | x_j) · P(K_i | x_k)

and the degree of a same cluster type constraint being fulfilled is then calculated as the maximum over all clusters of the product of the two points' probabilities. Thus the degree of a constraint being satisfied is the maximum likelihood, for any cluster, that both points are in that cluster. The maximum (rather than the sum) is used here to achieve a stronger tie across multiple constraints containing the same data point, since it forces a single cluster to be picked for each constraint. Similarly, for a different cluster type constraint, C_{j,k}, the probability of one point belonging to cluster K_i while the other does not is defined as

P(K_i | C_{j,k}, type(C_{j,k}) = different cluster) = P(K_i | x_j) · (1 − P(K_i | x_k))

and the degree of the constraint being satisfied is again calculated as the maximum of this probability across all clusters.

The accuracy of a constraint type as a state variable is defined in Equations (1) and (2):

P_t(satisfied) = #satisfied constraints of type t / #constraints of type t    (1)

Accuracy_t = P_t(satisfied) + (1 − P_t(satisfied)) · E_{i ∈ unsatisfied of type t}[P(K_i | C_{j,k}, type(C_{j,k}) = t)]    (2)

This results in two state attributes, Accuracy_same cluster and Accuracy_different cluster, which are both in the range [0..1]. We then discretize each attribute into bins of size B in order to reduce the complexity, since state values close to each other are related and can thus be grouped into a single state, with the size of the group determined by the bin size B.

Action Space of the MDP

Given the degrees of satisfaction of the individual constraints of each type, the unsatisfied constraints closest to satisfaction and farthest from satisfaction are identified, i.e., for each constraint type the unsatisfied constraints with the highest and the lowest probabilities are selected. There are four actions the agent can take in every state in order to satisfy the constraints:

• For the same cluster constraint type, fix the unsatisfied constraint farthest from satisfaction
• For the same cluster constraint type, fix the unsatisfied constraint closest to satisfaction
• For the different cluster constraint type, fix the unsatisfied constraint farthest from satisfaction
• For the different cluster constraint type, fix the unsatisfied constraint closest to satisfaction

Upon choosing the specific constraint to fix, the dimensions to scale are chosen. For the same cluster constraint type, a subset of the dimensions with the highest distance is selected and scaled down, bringing the points closer in the new feature space. Similarly, for the different cluster type, the dimensions with the lowest distance are selected and scaled up, pushing the points farther apart.

Reward Function

The reward function is defined as the improvement in the satisfaction of the constraints, measured as the sum of the differences between each of the two accuracy variables of the new state and the current state.

Goal State or End Criteria

The goal state is defined as the state in which all constraints are satisfied. However, it is often not possible to reach this state due to factors like human error in constraint selection or otherwise contradicting constraints. Thus, in addition to the above goal criterion, we append another condition which states that the search ends when the state does not change for a given fixed number of steps.

Experimental results

Problems for training, validation and testing were pseudo-randomly generated as multivariate Gaussian distributions. Parameters controlling the individual cluster size, the number of distinct clusters, the mean and covariance of each cluster, and the number of input dimensions were seeded randomly, with a range of 400 to 800 samples in each problem. Furthermore, a set of constraints for each problem was randomly chosen from the data points. Reinforcement learning was used to learn 5 different policies over a set of 70 training problems. Subsequently, the policies were used to compute the dimension scaling factors for 30 test problems similar to their respective training problems and unseen during the policy learning phase.

For the experiments, a bin size of 0.05 was chosen, i.e., state variable values are discretized into the accuracy intervals [0..0.05), [0.05..0.1), ..., [0.95..1]. For the actions, scaling steps of 0.7 and 1.3 were chosen: for same cluster constraints, the top third of the dimensions with the highest distance are scaled down by multiplying them by 0.7, and for different cluster constraints, the bottom third of the dimensions are scaled up by multiplying them by 1.3. The algorithms used were K-means clustering (Hartigan and Wong 1979) for the clustering process and SARSA (Rummery and Niranjan 1994) as the reinforcement learning algorithm (Kaelbling, Littman, and Moore 1996) for training the MDP. The policies were trained with problems similar to each other with respect to the number of dimensions in the original dataset and the number of proposed constraints on the problem.
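The accuracy state variables of Equations (1) and (2), together with the bin discretization (B = 0.05), can be sketched as follows; the constraint record layout and the toy membership probabilities are illustrative assumptions, not the authors' code:

```python
def degree_satisfied(p_j, p_k, ctype):
    """Degree to which constraint C_{j,k} is satisfied: the maximum over
    clusters of P(K_i|x_j) * P(K_i|x_k) for a same cluster constraint, or
    P(K_i|x_j) * (1 - P(K_i|x_k)) for a different cluster constraint."""
    if ctype == "same":
        return max(pj * pk for pj, pk in zip(p_j, p_k))
    return max(pj * (1.0 - pk) for pj, pk in zip(p_j, p_k))

def accuracy(constraints, ctype):
    """Equations (1)-(2): the fraction of satisfied constraints of type t,
    plus the mean degree of the unsatisfied ones weighted by the
    unsatisfied fraction."""
    of_type = [c for c in constraints if c["type"] == ctype]
    sat_frac = sum(c["satisfied"] for c in of_type) / len(of_type)
    unsat = [c for c in of_type if not c["satisfied"]]
    if not unsat:
        return sat_frac
    mean_degree = sum(degree_satisfied(c["p_j"], c["p_k"], ctype)
                      for c in unsat) / len(unsat)
    return sat_frac + (1.0 - sat_frac) * mean_degree

def discretize(value, bin_size=0.05):
    # Bin the continuous accuracy in [0, 1]; the top bin absorbs 1.0.
    return min(int(value / bin_size), int(round(1.0 / bin_size)) - 1)

# Toy example: four same cluster constraints, two currently satisfied, with
# hypothetical per-cluster membership probabilities for each constrained pair.
cons = [
    {"type": "same", "satisfied": True,  "p_j": [0.9, 0.1], "p_k": [0.8, 0.2]},
    {"type": "same", "satisfied": True,  "p_j": [0.7, 0.3], "p_k": [0.9, 0.1]},
    {"type": "same", "satisfied": False, "p_j": [0.6, 0.4], "p_k": [0.2, 0.8]},
    {"type": "same", "satisfied": False, "p_j": [0.5, 0.5], "p_k": [0.4, 0.6]},
]
acc = accuracy(cons, "same")
print(round(acc, 3), discretize(acc))
```

Running this once per constraint type yields the pair (Accuracy_same cluster, Accuracy_different cluster) whose bin indices identify the discretized MDP state.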
Finally, a policy was also trained with a broad range of problems, for comparison with the performance of the other, specialized policies.

Figure 2 shows the clustering output of K-means on the original dataset and the output after scaling the dimensions using the learned policies, with red lines indicating different cluster type constraints and green lines indicating same cluster type constraints. Figure 2(a) is a solution to a 2-dimensional problem using policy 1, while Figure 2(b) is a 20-dimensional problem, the solution to which was mapped to 2 dimensions using t-SNE (Van der Maaten and Hinton 2008) for visualization. Table 1 shows the performance of the individual policies on 30 test problems and the ranges of dimensions and constraints they were learned on. Policies 1-4 are learned on specific problem types, while policy 5 is a generic policy.

Figure 2: K-means clusters. (a) Solutions for a 2-dimensional dataset: without dimension scaling (left) and with dimension scaling (right). (b) Solutions for a 20-dimensional dataset mapped to 2 dimensions using t-SNE for visualization: without dimension scaling (left) and with dimension scaling (right).

policy   # dimensions   # constraints   % satisfied
1        2-7            4-10            81.2%
2        5-10           20-25           84.8%
3        7-15           20-25           72%
4        15-20          10-15           73.33%
5        2-20           10-25           40%

Table 1: Performance analysis of individual policies

Since the constraints were randomly generated, it is often impossible to satisfy all of them because of conflicting constraints. However, the solutions computed from the policies above satisfy the highest possible number of constraints while maximizing the accuracy of the unsatisfied constraints as defined in Equation (2). Table 1 also shows that policies learned for a specific type of problem (policies 1 to 4) perform much better than a generic policy (policy 5).

Conclusion

This paper shows that semi-unsupervised clustering is possible by re-mapping and scaling the original input dimensions, without modifying the clustering algorithm. Thus, the method can be applied to any traditional clustering algorithm. The paper also shows that the performance of a generic policy is poor compared to policies learned for specific types of problems.

References

Grira, N.; Crucianu, M.; and Boujemaa, N. 2004. Unsupervised and semi-supervised clustering: a brief survey. A review of machine learning techniques for processing multimedia content, Report of the MUSCLE European Network of Excellence (FP6).

Hartigan, J. A., and Wong, M. A. 1979. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1):100–108.

Kaelbling, L. P.; Littman, M. L.; and Moore, A. W. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 237–285.

Puterman, M. L. 1990. Markov decision processes. Handbooks in Operations Research and Management Science 2:331–434.

Rangapuram, S. S., and Hein, M. 2015. Constrained 1-spectral clustering. arXiv preprint arXiv:1505.06485.

Rummery, G. A., and Niranjan, M. 1994. On-line Q-learning using connectionist systems.

Van der Maaten, L., and Hinton, G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9:2579–2605.

Xu, R.; Wunsch, D.; et al. 2005. Survey of clustering algorithms. IEEE Transactions on Neural Networks 16(3):645–678.

Zeng, H., and Cheung, Y.-M. 2012. Semi-supervised maximum margin clustering with pairwise constraints. IEEE Transactions on Knowledge and Data Engineering 24(5):926–939.

Zhang, J.; Marszałek, M.; Lazebnik, S.; and Schmid, C. 2007. Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision 73(2):213–238.

Zhao, W.; He, Q.; Ma, H.; and Shi, Z. 2012. Effective semi-supervised document clustering via active learning with instance-level constraints. Knowledge and Information Systems 30(3):569–587.