Anomalous Topic Discovery in High Dimensional Discrete Data

The document proposes an algorithm for detecting anomalous clusters in high dimensional discrete data by finding groups of anomalies that collectively exhibit abnormal patterns in a small subset of features. The algorithm is applied to detecting anomalous topics in text documents. Experimental results on synthetic and real-world text corpora show the method can accurately detect anomalous topics and salient words.

Uploaded by

Brightworld Projects

Anomalous Topic Discovery in High Dimensional Discrete Data

ABSTRACT

We propose an algorithm for detecting patterns exhibited by anomalous
clusters in high dimensional discrete data. Unlike most anomaly detection (AD)
methods, which detect individual anomalies, our proposed method detects groups
(clusters) of anomalies, i.e., sets of points which collectively exhibit abnormal
patterns. In many applications this can lead to a better understanding of the
nature of the atypical behavior and to identifying the sources of the anomalies.
Moreover, we consider the case where the atypical patterns are exhibited on only
a small (salient) subset of the very high dimensional feature space. Individual AD
techniques and techniques that detect anomalies using all the features typically
fail to detect such anomalies, but our method can detect such instances
collectively, discover the shared anomalous patterns exhibited by them, and
identify the subsets of salient features. In this paper, we focus on detecting
anomalous topics in a batch of text documents, developing our algorithm based on
topic models. Results of our experiments show that our method can accurately
detect anomalous topics and salient features (words) under each such topic in a
synthetic data set and two real-world text corpora, and that it achieves better
performance than both standard group AD and individual AD techniques.

Conflict-Aware Event-Participant Arrangement and Its Variant for Online Setting

ABSTRACT

With the rapid development of Web 2.0 and the Online-to-Offline (O2O)
marketing model, various online event-based social networks (EBSNs) are becoming
popular. An important task of EBSNs is to facilitate the most satisfactory
event-participant arrangement for both sides, i.e., events enroll more
participants and participants are assigned to events that interest them
personally. Existing approaches usually focus on arranging each single event for
a set of potential users, or ignore the conflicts between different events, which
leads to infeasible or redundant arrangements. In this paper, to address the
shortcomings of existing approaches, we first identify a more general and useful
event-participant arrangement problem, called the Global Event-participant
Arrangement with Conflict and Capacity (GEACC) problem, which focuses on the
conflicts between different events and makes event-participant arrangements from
a global view. We find that the GEACC problem is NP-hard due to the conflicts
among events. Thus, we design two approximation algorithms with provable
approximation ratios and an exact algorithm with a pruning technique to address
this problem. In addition, we propose an online setting of GEACC, called
OnlineGEACC, which is also practical in real-world scenarios. We further design
an online algorithm with a provable performance guarantee. Finally, we verify the
effectiveness and efficiency of the proposed methods through extensive
experiments on real and synthetic datasets.
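
To make the conflict and capacity constraints concrete, the following is a minimal greedy sketch in Python. It is not claimed to be one of the paper's algorithms, and it carries no approximation guarantee here. The interest scores, the per-user capacity, and every name (greedy_arrangement, interest, event_capacity, user_capacity, conflicts) are assumptions introduced for illustration only.

```python
def greedy_arrangement(interest, event_capacity, user_capacity, conflicts):
    """Greedily assign users to events, highest interest first.

    interest:       {(event, user): score}  (hypothetical interest scores)
    event_capacity: {event: max participants}
    user_capacity:  {user: max events}      (an assumption of this sketch)
    conflicts:      set of frozenset({event_a, event_b}) that cannot share a user
    """
    assigned = {user: set() for user in user_capacity}
    remaining = dict(event_capacity)

    # Visit candidate pairs from the most to the least interesting.
    for (event, user), score in sorted(interest.items(), key=lambda kv: -kv[1]):
        if score <= 0 or remaining[event] == 0:
            continue  # nothing to gain, or the event is full
        if len(assigned[user]) >= user_capacity[user]:
            continue  # the user cannot attend more events
        if any(frozenset((event, other)) in conflicts for other in assigned[user]):
            continue  # would clash with an event the user already attends
        assigned[user].add(event)
        remaining[event] -= 1
    return assigned


# Toy instance: e1 and e2 conflict (for example, they overlap in time).
interest = {
    ("e1", "u1"): 0.9, ("e2", "u1"): 0.8, ("e3", "u1"): 0.5,
    ("e1", "u2"): 0.7, ("e2", "u2"): 0.6,
}
arrangement = greedy_arrangement(
    interest,
    event_capacity={"e1": 1, "e2": 1, "e3": 1},
    user_capacity={"u1": 2, "u2": 2},
    conflicts={frozenset(("e1", "e2"))},
)
print(arrangement)
```

In the toy instance, u1 takes e1, is refused e2 because of the conflict, and later takes e3, while u2 ends up with e2 because e1 is already full.
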

Incremental and Decremental Max-flow for Online Semi-supervised Learning

ABSTRACT

Max-flow has been adopted for semi-supervised data modelling, yet existing
algorithms were derived only for learning from static data. This paper proposes
an online max-flow algorithm for semi-supervised learning from data streams.
Consider a graph learned from labelled and unlabelled data, which is updated
dynamically to accommodate the online addition and retirement of data. In
learning from the resulting nonstationary graph, we augment and de-augment
paths to update the max-flow, with a theoretical guarantee that the updated
max-flow equals that obtained from batch retraining. For classification, we
compute the min-cut over the current max-flow, so that a minimal number of
similar sample pairs are classified into distinct classes. Empirical evaluation
on real-world data reveals that our algorithm outperforms state-of-the-art
stream classification algorithms.
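
As a point of reference for the min-cut classification step, here is a small batch sketch in Python. It uses the standard Edmonds-Karp algorithm on a static graph, so it corresponds to the batch retraining baseline and not to the incremental and decremental updates the paper proposes. The graph layout (a source "s" tied to a positively labelled sample, a sink "t" tied to a negatively labelled one, similarity weights as capacities) and all names are assumptions made for this illustration.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, nodes on the source side of a min-cut)."""
    # Residual graph: copy the capacities and add missing reverse edges with capacity 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    def bfs_parents():
        # Breadth-first search along edges that still have residual capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path is left: the reachable nodes form the source side.
            return flow, set(parents)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Samples a..d in a chain; "a" is labelled positive, "d" is labelled negative.
# Similarities are symmetric; the weak b-c link is where the cut should fall.
capacity = {
    "s": {"a": 100},
    "a": {"b": 3},
    "b": {"a": 3, "c": 1},
    "c": {"b": 1, "d": 3},
    "d": {"c": 3, "t": 100},
}
flow, positive_side = max_flow_min_cut(capacity, "s", "t")
print(flow, sorted(positive_side - {"s"}))
```

The unlabelled sample b falls on the source side and c on the sink side, so only the weakest similarity link (b-c) is cut.
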

Trust-but-Verify: Verifying Result Correctness of Outsourced Frequent Itemset Mining in Data-mining-as-a-service Paradigm

ABSTRACT

Cloud computing is popularizing the computing paradigm in which data is
outsourced to a third-party service provider (server) for data mining. Outsourcing,
however, raises a serious security issue: how can a client with weak
computational power verify that the server returned the correct mining result? In
this paper, we focus on the specific task of frequent itemset mining. We consider
a server that is potentially untrusted and tries to escape verification by using
its prior knowledge of the outsourced data. We propose efficient probabilistic and
deterministic verification approaches to check whether the server has returned
correct and complete frequent itemsets. Our probabilistic approach can catch
incorrect results with high probability, while our deterministic approach measures
the result correctness with 100% certainty. We also design efficient verification
methods for the cases in which the data and the mining setup are updated. We
demonstrate the effectiveness and efficiency of our methods using an extensive set
of empirical results on real datasets.
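
The two properties being verified, correctness and completeness, can be stated precisely with a brute-force check. The Python sketch below is only a definition-level illustration: it costs as much as mining the data from scratch, which is exactly what a weak client cannot afford, and it is not the paper's probabilistic or deterministic method. The function names and the toy transactions are assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def naive_verify(returned, transactions, min_support):
    """Return (correct, complete) for a server's claimed frequent itemsets."""
    returned = {frozenset(s) for s in returned}

    # Correctness: every returned itemset really is frequent.
    correct = all(support(s, transactions) >= min_support for s in returned)

    # Completeness: no frequent itemset is missing (exhaustive over all candidates).
    items = sorted(set().union(*transactions))
    complete = True
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = frozenset(candidate)
            if support(s, transactions) >= min_support and s not in returned:
                complete = False
    return correct, complete


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
honest = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]
cheating = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"}]  # drops {a, c}, adds {b, c}

print(naive_verify(honest, transactions, min_support=2))
print(naive_verify(cheating, transactions, min_support=2))
```

With a minimum support of 2, the honest answer passes both checks, while the altered answer fails both: {b, c} has support 1, and the frequent itemset {a, c} is missing.
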

User Preference Learning for Online Social Recommendation

ABSTRACT

Social recommendation systems have attracted a lot of attention recently in the
research communities of information retrieval, machine learning and data mining.
Traditional social recommendation algorithms are often based on batch machine
learning methods, which suffer from several critical limitations, e.g., extremely
expensive model retraining whenever new user ratings arrive and an inability to
capture changes in user preferences over time. Therefore, it is important to make
social recommendation systems suitable for real-world online applications where
data often arrives sequentially and user preferences may change dynamically and
rapidly. In this paper, we present a new framework of online social
recommendation from the viewpoint of online graph regularized user preference
learning (OGRPL), which incorporates both collaborative user-item relationships
and item content features into a unified preference learning process. We
further develop an efficient iterative procedure, OGRPL-FW, which utilizes the
Frank-Wolfe algorithm, to solve the proposed online optimization problem. We
conduct extensive experiments on several large-scale datasets, in which the
encouraging results demonstrate that the proposed algorithms obtain significantly
lower errors (in terms of both RMSE and MAE) than the state-of-the-art online
recommendation methods when receiving the same amount of training data in the
online learning process.
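
For readers unfamiliar with the Frank-Wolfe algorithm mentioned above, the following Python sketch shows the generic iteration on a toy problem: minimizing 0.5 * ||x - b||^2 over the probability simplex with the classic 2 / (t + 2) step size. The objective, the constraint set, and all names are assumptions chosen for illustration. The abstract does not specify the OGRPL objective or its constraint set, so this is not OGRPL-FW itself.

```python
def frank_wolfe_simplex(grad, dim, steps=2000):
    """Generic Frank-Wolfe over the probability simplex.

    grad: function returning the gradient (a list) at a point x.
    Each step moves toward the simplex vertex that minimizes the
    linearized objective, so no projection is ever needed.
    """
    x = [1.0 / dim] * dim  # start at the centre of the simplex
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle: on the simplex the best point is a vertex,
        # namely the coordinate with the smallest gradient entry.
        best = min(range(dim), key=lambda k: g[k])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = [(1.0 - gamma) * xk for xk in x]
        x[best] += gamma
    return x


# Toy objective: f(x) = 0.5 * ||x - b||^2, whose gradient is x - b.
b = [0.7, 0.2, 0.1]  # already inside the simplex, so it is the minimizer
solution = frank_wolfe_simplex(lambda x: [xk - bk for xk, bk in zip(x, b)], dim=3)
print([round(v, 3) for v in solution])
```

Every iterate is a convex combination of simplex vertices, so it stays feasible by construction. That projection-free property is the usual reason for choosing Frank-Wolfe when projecting onto the constraint set would be expensive.
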
