Anomalous Topic Discovery in High Dimensional Discrete Data
ABSTRACT
With the rapid development of Web 2.0 and the Online To Offline (O2O)
marketing model, online event-based social networks (EBSNs) are becoming
increasingly popular. An important task of EBSNs is to facilitate the most
satisfactory event-participant arrangement for both sides: events enroll more
participants, and participants are assigned to events that interest them. Existing approaches
usually focus on the arrangement of each single event to a set of potential users, or
ignore the conflicts between different events, which leads to infeasible or
redundant arrangements. In this paper, to address the shortcomings of existing
approaches, we first identify a more general and useful event-participant
arrangement problem, called the Global Event-participant Arrangement with Conflict
and Capacity (GEACC) problem, focusing on the conflicts of different events and
making event-participant arrangements in a global view. We find that the GEACC
problem is NP-hard due to the conflicts among events. Thus, we design two
approximation algorithms with provable approximation ratios and an exact
algorithm with a pruning technique to address this problem. In addition, we propose
an online setting of GEACC, called OnlineGEACC, which is also practical in
real-world scenarios. We further design an online algorithm with a provable
performance guarantee. Finally, we verify the effectiveness and efficiency of the
proposed methods through extensive experiments on real and synthetic datasets.
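
To make the conflict and capacity constraints described above concrete, here is a minimal sketch of a simple greedy heuristic. It is an illustration only, not one of the algorithms proposed in the paper. The function name `greedy_arrangement`, the interest scores, and the example data are all hypothetical, and the assumed objective (prefer higher-interest pairs) is a guess at what the abstract leaves unstated.

```python
def greedy_arrangement(scores, event_capacity, user_capacity, conflicts):
    """Greedily pick (event, user) pairs by descending interest score.

    scores:         {(event, user): interest score} (assumed input format)
    event_capacity: {event: max number of participants}
    user_capacity:  {user: max number of events}
    conflicts:      set of frozenset({e1, e2}) for events that cannot both
                    be assigned to the same user (e.g. overlapping times)
    """
    remaining_event = dict(event_capacity)
    remaining_user = dict(user_capacity)
    assigned = {}   # user -> events already assigned to that user
    result = []
    # Highest score first; ties broken by the pair itself for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (event, user), score in ranked:
        if score <= 0:
            continue  # no interest, nothing to gain
        if remaining_event.get(event, 0) <= 0:
            continue  # event is full
        if remaining_user.get(user, 0) <= 0:
            continue  # user cannot attend more events
        if any(frozenset((event, other)) in conflicts
               for other in assigned.get(user, [])):
            continue  # clashes with an event this user already has
        result.append((event, user))
        assigned.setdefault(user, []).append(event)
        remaining_event[event] -= 1
        remaining_user[user] -= 1
    return result


# Hypothetical example: e1 and e2 conflict with each other.
scores = {
    ("e1", "u1"): 0.9, ("e2", "u1"): 0.8, ("e3", "u1"): 0.5,
    ("e1", "u2"): 0.7, ("e2", "u2"): 0.6,
}
arrangement = greedy_arrangement(
    scores,
    event_capacity={"e1": 1, "e2": 2, "e3": 1},
    user_capacity={"u1": 2, "u2": 1},
    conflicts={frozenset(("e1", "e2"))},
)
print(arrangement)
```

In this example, u1 is denied e2 because it conflicts with e1, and u2 is denied e1 because e1 is already full. Both capacity and conflict checks therefore shape the result.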
Incremental and Decremental Max-flow for Online Semi-supervised Learning
ABSTRACT
Max-flow has been adopted for semi-supervised data modelling, yet existing
algorithms were derived only for learning from static data. This paper proposes
an online max-flow algorithm for semi-supervised learning from data streams.
Consider a graph learned from labelled and unlabelled data that is updated
dynamically to accommodate the online addition and retirement of data. In
learning from the resulting non-stationary graph, we augment and de-augment
paths to update the max-flow, with a theoretical guarantee that the updated max-flow
equals that obtained from batch retraining. For classification, we compute the min-cut over
the current max-flow, so that a minimal number of similar sample pairs are classified
into distinct classes. Empirical evaluation on real-world data reveals that our
algorithm outperforms state-of-the-art stream classification algorithms.
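
The classification step described above (a min-cut taken over a max-flow) can be illustrated with a small batch computation. The sketch below uses the standard Edmonds-Karp algorithm and is not the paper's incremental/decremental update. It only shows the batch result that the online algorithm is stated to reproduce. The graph layout (terminals `s` and `t`, labelled samples tied to them with large capacities, similarity weights as edge capacities) and all names are assumptions made for illustration.

```python
from collections import defaultdict, deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, source side of a min-cut)."""
    # residual[u][v] is the remaining capacity on the directed edge u -> v.
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual[u][v] += c
            residual[v][u] += 0  # ensure the reverse edge exists
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes still reachable from the source form the source side of a min-cut.
    return flow, set(parent)


def add_similarity(capacity, u, v, weight):
    """Add an undirected similarity edge as two directed capacities."""
    capacity.setdefault(u, {})[v] = weight
    capacity.setdefault(v, {})[u] = weight


# Hypothetical data: "pos" and "neg" are labelled, "x1" and "x2" are unlabelled.
capacity = {}
add_similarity(capacity, "s", "pos", 100)   # tie positive label to the source
add_similarity(capacity, "neg", "t", 100)   # tie negative label to the sink
add_similarity(capacity, "pos", "x1", 5)
add_similarity(capacity, "x1", "x2", 1)     # weakest similarity link
add_similarity(capacity, "x2", "neg", 4)

flow, source_side = max_flow_min_cut(capacity, "s", "t")
labels = {n: ("+" if n in source_side else "-") for n in ("x1", "x2")}
print(flow, labels)
```

Here the cut separates x1 from x2 across the weakest similarity edge, so x1 takes the positive label and x2 the negative one. An online variant would update the flow as samples are added or retired rather than recomputing it from scratch as this sketch does.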
ABSTRACT