Paper For Project

The document proposes a hybrid model for outlier detection in graphs that utilizes generative adversarial networks and graph neural networks. It discusses different existing outlier detection techniques and their limitations. The presented approach aims to overcome these limitations by leveraging the power of GANs to generate realistic data distributions and the expressiveness of GNNs to capture complex graph structures.
A Hybrid Model for Outlier Detection in Graphs using GAN
Aryan Arya, Manav Motiani, Aayush Kumar Patwari
Department of Computer Science and Engineering
Netaji Subhas University of Technology
[email protected], [email protected], [email protected]

Dr. Vandana Bhatia
Assistant Professor
Department of Computer Science Engineering
Netaji Subhas University of Technology
[email protected]

Abstract - Graphs are widely used to model complex systems, and detecting anomalies in a graph is an important task in the analysis of such systems. Graph anomalies are patterns in a graph that do not conform to the normal patterns expected of the attributes and/or structures of the graph. The demand for outlier detection in social networks has been increasing in recent years due to its use in applications such as discovery of criminal activities in electronic commerce, spam detection, intrusion detection, transaction fraud detection, and many others. The use of neural network based methods for outlier detection has been growing, and in recent times there has been particular growth in the use of GAN (Generative Adversarial Network) models for detecting outliers. A GAN comprises a generator and a discriminator. The generator produces fake data samples, while the discriminator predicts whether a sample is real or fake. To enhance our model further, we include a GNN-based encoder that produces latent representations of the generated data by efficiently capturing the structural connections in our graph network. Our model generates an outlier score for every node in the network. The higher the outlier score of a node, the more likely it is an outlier. Lastly, we assess the performance of our model using various evaluation metrics.

I. INTRODUCTION

Outliers in social networks are nodes whose properties do not match the expected behaviour of nodes. In other words, outliers (also called anomalies or impurities) are instances that exhibit different properties from the rest. Outliers typically correspond to structural changes in large-scale networks like the Web, social or communication networks. Structural changes may be modeled in terms of communities, shortest paths, or other local structural properties.

Outlier detection is the procedure of finding instances in a dataset whose properties differ from those of the majority of the data instances. There are many different ways in which outliers could be found. It is also important to define outliers from an application-specific perspective, because no uniform definition exists and a specific application might provide better guidance. For example, in a spam detection application, the degree distributions of nodes can provide insights about outliers.

To handle the outlier detection problem, many techniques such as clustering methods, auto-encoder methods, iForest, and DNODA have been developed in the past decades, but each of these algorithms has its own disadvantages.

In recent times, there has been a surge in the use of neural networks for outlier detection. Neural networks (NNs) are a class of machine learning models inspired by the structure and function of biological neural networks in the human brain. They consist of interconnected layers of artificial neurons (nodes) that process and transform input data to produce output predictions. NNs have gained popularity due to their ability to learn complex patterns and relationships from data, making them suitable for a wide range of tasks including classification, regression, clustering, and more. There are different types of neural networks: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), Autoencoders, etc.

In this research, we present a novel approach to outlier detection utilizing Generative Adversarial Networks (GANs) in conjunction with Graph Neural Network (GNN) based encoders. By leveraging the power of GANs for generating realistic data distributions and the expressiveness of GNNs for capturing complex graph structures, our proposed method aims to overcome the shortcomings of traditional outlier detection techniques. Through empirical evaluation, we demonstrate the effectiveness and robustness of our approach in accurately identifying outliers within large-scale graph networks.
Generative Adversarial Networks (GANs) are a type of neural network framework composed of two main components: a generator and a discriminator. The generator synthesizes data samples from random noise, aiming to produce outputs that resemble real data, while the discriminator learns to distinguish between real and fake samples. Through an adversarial training process, where the generator and discriminator compete against each other, GANs can generate highly realistic data samples across various domains, such as images, text, and audio. This competitive training dynamic allows GANs to capture complex data distributions and produce outputs that exhibit intricate patterns and characteristics similar to those found in real data, making them a powerful tool for tasks like image generation, data augmentation, and image-to-image translation.

Figure 1 shows outlier (anomaly) detection in graphs.

A Graph Neural Network (GNN) is a type of neural network model designed to operate on graph-structured data. GNNs gather information from neighboring nodes and edges, enabling them to capture both local and global graph structures. They learn node-level features and graph-level representations, making them useful for tasks such as node classification, link prediction, and graph classification. GNNs update node-level features based on the information from surrounding nodes, allowing them to handle dynamic graph structures. Overall, GNNs are a powerful tool for understanding and analyzing graph data, with applications across various domains such as social network analysis, recommendation systems, and biological network analysis.

Objective - The essential objectives of this study are to build a Generative Adversarial Network (GAN) model for outlier detection, to improve this model by introducing a Graph Neural Network (GNN) as an encoder that produces latent representations from generated data by capturing the structural relationships of graph data, and finally to evaluate our model using performance metrics such as accuracy, precision, and recall.

II. RELATED WORK

Mean Shift Outlier Detection and Filtering [3]:

Mean shift and medoid shift are proposed as preprocessing techniques before conducting analysis such as clustering and outlier detection. When used before clustering, the reported results show that they improve the performance of both k-means and random swap clustering algorithms. The proposed approach surpasses five existing outlier removal methods on this task: LOF, ODIN, NC, IFOREST, and ABOD. A key benefit of the proposed approach is that the number of outliers does not need to be specified beforehand.

For outlier detection specifically, the mean-shift outlier detector (MOD) is slightly more effective than the medoid-shift outlier detector (DOD). Experiments demonstrate that the proposed approaches outperform eleven cutting-edge outlier detection methods, including LOF, NC, KNN, ODIN, MCD, IFOREST, OCSVM, PCAD, and ABOD. A major advantage of the proposed approach is its strong performance even when the number of outliers is large. The method can also handle non-numeric data like strings by using edit distance.

Direct Neighbour Outlier Detection Algorithm (DNODA) [4]:

DNODA considers the direct neighbours u ∈ N(v) of a given node v and computes an outlier score for v. Intuitively, this score is directly proportional to the distance of v from its direct neighbours. Hence an aberrant variation of any node indicates an anomaly.

Community Neighbor Algorithm (CNA) [4]:

Another outlier detection approach is the Community Neighbor Algorithm (CNA). In this approach, the graph is partitioned into K communities using clustering algorithms, and outliers are defined as the objects whose values differ significantly from those of the other objects in the same community.

Isolation Forest (iForest) [4]:

The idea of the Isolation Forest algorithm is isolation (separating an instance from the rest of the instances), i.e. it uses the property of anomalies that they are rare and different. Most anomaly detection methods, like one-class SVM, try to build a model of "normal" points and then find points that do not follow this model. The crucial advantages of Isolation Forest over one-class SVM are its speed and its more complex decision boundary. The intuition of this method is that anomalies are isolated closer to the root of the tree, whereas normal points are isolated at the deeper end of the tree, so the path length within a tree until a point is isolated can be used as an anomaly score.

Generative Adversarial Network [5]:

In the proposed framework, a generator is trained to reconstruct node attributes while the discriminator determines whether the embedding pair encoded by an encoder comes from the original input or the generator output. The anomaly score is determined by a combination of reconstruction loss and discriminator loss. Experiments on real-world datasets achieve state-of-the-art performance, demonstrating the effectiveness of the proposed method.
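As a rough illustration of how a reconstruction loss and a discriminator loss can be combined into a per-node anomaly score, the following Python sketch adds the two terms with equal weight. The function names, the use of the Euclidean distance for the reconstruction term, the averaging of cross-entropy over a node's edges, and the toy numbers are all assumptions made for illustration; they are not taken from [5].

```python
import math

def reconstruction_loss(x, x_rec):
    """Euclidean distance between a node's attributes and their reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_rec)))

def discriminator_loss(edge_probs):
    """Mean cross-entropy against the 'real' label over a node's edges.

    edge_probs holds the discriminator's probability that each edge of the
    node looks real (hypothetical input, assumed to be non-empty).
    """
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(p, eps)) for p in edge_probs) / len(edge_probs)

def anomaly_score(x, x_rec, edge_probs):
    """Assumed combination: unweighted sum of the two losses."""
    return reconstruction_loss(x, x_rec) + discriminator_loss(edge_probs)

# Toy nodes: one is reconstructed well and looks real, the other does not.
normal = anomaly_score([1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.9, 0.8])
suspect = anomaly_score([1.0, 0.0, 1.0], [0.2, 0.9, 0.1], [0.3, 0.2])
print(normal < suspect)  # prints True
```

Under these assumptions, a node whose attributes are poorly reconstructed and whose edges the discriminator considers unlikely to be real receives the larger score.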
GANAD [7]:

GANAD is a Generative Adversarial Network (GAN) based approach specifically designed for detecting anomalies. The detection is based on a novel training strategy, which can better learn the minority abnormal distribution from normal data patterns. In addition, an encoder is used to map data samples to the latent space so that the generator loss computation is optimized. This approach reduces the training cost and time consumption.

III. PROPOSED MODEL

In this section, we present the proposed outlier detection framework in detail. We denote an undirected attributed network as $G = (V, A, X)$, where $V = \{v_1, \ldots, v_n\}$ denotes the $n$ graph nodes and $X \in \mathbb{R}^{n \times m}$ denotes the node attribute matrix. The node connections of $G$ are represented by an adjacency matrix $A$, where $A_{i,j} = 1$ if there is an edge between $v_i$ and $v_j$, and $A_{i,j} = 0$ otherwise. Given an attributed network $G$, anomaly detection aims to detect the nodes whose patterns differ significantly from the majority of reference instances in both attribute and structure. The architecture of the deep model is illustrated in Figure 1. The proposed framework is a Generative Adversarial Network (GAN) model with two components: a Generator and a Discriminator. To improve the efficiency of our GAN model for anomaly detection, an Encoder is introduced to map the raw attributes to a latent representation.

Generator

Using a low-dimensional Gaussian prior distribution, the generator approximates the distribution of the original attributes $X$. We employ an MLP (multi-layer perceptron) as the generator. The MLP learns a layer-wise representation using a linear transformation followed by a nonlinear mapping:

$H_G^{(l+1)} = f\left(W_G^{(l)} H_G^{(l)} + b_G^{(l)}\right)$

where $H_G^{(l)}$ is the input of the $l$-th perceptron layer and $H_G^{(l+1)}$ is the output of this layer. Gaussian random noise in $d$ dimensions, where $d \ll m$, is taken as $H_G^{(0)}$. $W_G^{(l)}$ is the layer parameter matrix and $b_G^{(l)}$ is the corresponding bias. $f$ is set to be the ReLU activation function.

Encoder

The encoder E maps the generator's output $X'$ and the original node attributes $X$ into a low-dimensional latent space whose dimension matches that of the generator's prior distribution. A simple encoder proved insufficient for capturing the structural information inherent in graph data. Consequently, we replaced it with a Graph Neural Network (GNN) encoder. This decision was motivated by the GNN's ability to effectively capture the intricate structural details of graph data. After implementing the GNN encoder in our GAN model and evaluating its performance metrics, we observed significantly more promising results than with the simple encoder. This underscores the importance of leveraging advanced techniques like GNNs for outlier detection tasks, especially when dealing with complex graph-structured data.

Discriminator

We use graph embeddings to estimate the adjacency matrix $A$ in order to extract the graph's structural information. The estimate is obtained by applying an entry-wise sigmoid function to the dot product of the embedding output: $\hat{A} = \mathrm{sigmoid}(Z Z^{T})$ or $\hat{A}' = \mathrm{sigmoid}(Z' Z'^{T})$, where $Z$ and $Z'$ are the embeddings encoded from the original node attributes $X$ and the generator output $X'$, respectively. The likelihood of a link occurring between nodes $i$ and $j$ is indicated by $\hat{A}_{ij}$ or $\hat{A}'_{ij}$. The discriminator D is trained to determine whether the dot product of embeddings comes from $\hat{A}$ (real) or $\hat{A}'$ (fake) for each node pair $\langle v_i, v_j \rangle$ where $A_{ij} > 0$. To train the GAN model, we minimize the cross-entropy cost of this binary classifier.

Anomaly Detection

Following model training, the anomaly score of each node $v_i$ is determined using a structure discriminator loss ($L_D$) and a context reconstruction loss ($L_G$):

$\mathrm{score}(v_i) = L_G(v_i) + L_D(v_i)$

where $L_G(v_i) = \|x_i - x'_i\|_2$ and $L_D(v_i)$ is defined as

$L_D = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log D(x_i) + (1 - y_i) \log\left(1 - D(G(z_i))\right) \right]$

Here $L_D(v_i)$ is the discriminator loss, $y_i$ are the corresponding labels (1 for real samples and 0 for fake samples), $m$ is the number of samples, $x_i$ are real samples, $z_i$ are random noise vectors fed into the generator, $G(z_i)$ are the generated samples, $D(x_i)$ is the discriminator's output probability for real samples, and $D(G(z_i))$ is the discriminator's output probability for fake samples.

IV. DATASETS

We evaluated the proposed method using three real-world datasets that have been widely used in previous research [14]:

1. BlogCatalog: BlogCatalog is a platform for sharing blogs, where bloggers who follow each other form a social network. Users and their blogs contribute to the node attributes.

2. Flickr: Flickr is a website for hosting and sharing images, where users who follow each other form a social network. The interests of users contribute to the node attributes.

3. Cora: The Cora dataset comprises 2708 scientific publications categorized into seven classes. The citation network consists of 5429 links. Each publication in the dataset is represented by a binary word vector (0/1) indicating the absence/presence of specific words from the dictionary. The dictionary consists of 1433 unique words.
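The binary word-vector representation described for Cora can be illustrated with a short Python sketch. The function name, the four-word dictionary, and the sample document are hypothetical and serve only to show the encoding; they are not part of the dataset.

```python
def binary_word_vector(document_words, dictionary):
    """Return a 0/1 list marking which dictionary words occur in the document."""
    present = set(document_words)
    return [1 if word in present else 0 for word in dictionary]

# Hypothetical four-word dictionary; the Cora dictionary has 1433 entries.
dictionary = ["graph", "neural", "outlier", "protein"]
paper = ["outlier", "detection", "in", "graph", "data"]
vector = binary_word_vector(paper, dictionary)
print(vector)  # prints [1, 0, 1, 0]
```

With the full dictionary, each publication would yield one such vector of length 1433, giving one row of the node attribute matrix.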
It is important to note that there is no ground truth for anomalies in the aforementioned datasets. Therefore, we rely on the method described in [11] to generate anomalies. By perturbing the graph structure and node attributes, we create a combined set of anomalies for each dataset. The statistical information of these three attributed network datasets is summarized in Table 1.

V. RESULTS

We have used the following metrics to evaluate our model:

(1) Accuracy: Accuracy measures how often the model's predictions are correct across all classes. It is calculated by dividing the number of correct predictions by the total number of predictions made.

(2) Precision: Precision is defined as the ratio of correctly classified positive samples (true positives) to the total number of samples classified as positive (either correctly or incorrectly). As each anomaly detection method outputs a ranking list according to the anomaly scores of different nodes, we use Precision to measure the proportion of true anomalies that a specific detection method discovered in its top K ranked nodes.

(3) Recall: Recall is calculated as the ratio of the number of positive samples correctly classified as positive to the total number of positive samples. This metric measures the proportion of the ground truth anomalies that a specific detection method discovered.

(4) F1 Score: The F1 score, or F-measure, is the harmonic mean of the precision and recall of a classification model.

Table 1 shows performance of our model by using various metrics:

Dataset       Precision  Recall  Accuracy  F1 Score
BlogCatalog   0.987      0.987   0.997     0.987
Flickr        0.954      0.954   0.995     0.954
Cora          0.993      0.993   0.999     0.993

We evaluated our GAN based model on three real-world datasets. The performance measures of our model on the three datasets are shown in the table above. Our model gives better results than other GAN based approaches.
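The four metrics listed in this section, together with precision over the top K ranked nodes, can be sketched in plain Python as follows. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions chosen for illustration; they do not reproduce the evaluation code or the figures reported in the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def precision_at_k(scores, y_true, k):
    """Fraction of true anomalies among the k highest-scoring nodes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:k]) / k

# Toy example: 8 nodes, 3 true anomalies, hypothetical anomaly scores.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4, 0.05]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # assumed threshold of 0.5

results = classification_metrics(y_true, y_pred)
print(results["accuracy"])                # prints 0.75
print(precision_at_k(scores, y_true, 2))  # prints 1.0
```

In this toy example two of the three flagged nodes are true anomalies and one true anomaly is missed, so precision, recall, and F1 all equal 2/3.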
Fig. 3. Line Plot of Number of Outliers vs Accuracy Values for Different Datasets.

Fig. 4. Line Plot of Number of Outliers vs Precision Values for Different Datasets.

VI. CONCLUSION

This paper introduces a novel approach for detecting network anomalies using an adversarial attributed network method. The method involves training a generator to reconstruct node attributes, while a discriminator determines whether the embedding pair encoded by a graph neural network-based encoder is from the original input or the generator output. The anomaly score is calculated based on a combination of reconstruction loss and discriminator loss. In experiments on real-world datasets, our method achieved strong performance, demonstrating its effectiveness.

VII. FUTURE SCOPE

Future work involves reducing the computation time of our model, since Graph Neural Network based encoders are more computationally expensive than simple MLP encoders. We will also examine whether the suggested deep model is susceptible to data poisoning attacks, as sophisticated attackers may insert harmful samples to evade the detection of anomalies. Finally, we will explore methods to create resilient anomaly detectors in the face of adversarial attacks.

VIII. REFERENCES

[1] H. C. Mandhare and S. R. Idate, “A comparative study of cluster based outlier detection, distance based outlier detection and density based outlier detection techniques,” in 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), 2017, pp. 931–935.

[2] D. Pahuja and R. Yadav, “Outlier detection for different applications: Review,” International Journal of Engineering Research Technology (IJERT), vol. 02, 03 2013.

[3] J. Yang, S. Rahardja, and P. Fränti, “Mean-shift outlier detection and filtering,” Pattern Recognition, vol. 115, p. 107874, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0031320321000613

[4] D. Vengertsev and H. Thakkar, “Anomaly detection in graph: Unsupervised learning, graph-based features and deep architecture,” 2015. [Online]. Available: https://api.semanticscholar.org/CorpusID:16413497

[5] Z. Chen, B. Liu, M. Wang, P. Dai, J. Lv, and L. Bo, “Generative adversarial attributed network anomaly detection,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ser. CIKM ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 1989–1992. [Online]. Available: https://doi.org/10.1145/3340531.3412070

[6] X. Du, J. Chen, J. Yu, S. Li, and Q. Tan, “Generative adversarial nets for unsupervised outlier detection,” Expert Systems with Applications, vol. 236, p. 121161, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417423016639

[7] J. Fu, L. Wang, J. Ke, K. Yang, and R. Yu, “GANAD: A GAN-based method for network anomaly detection,” 09 2022.

[8] W. Jiang, Y. Hong, B. Zhou, X. He, and C. Cheng, “A GAN-based anomaly detection approach for imbalanced industrial time series,” IEEE Access, vol. PP, pp. 1–1, 09 2019.

[9] T. Kumarage, S. Ranathunga, C. Kuruppu, N. D. Silva, and M. Ranawaka, “Generative adversarial networks (GAN) based anomaly detection in industrial software systems,” in 2019 Moratuwa Engineering Research Conference (MERCon), 2019, pp. 43–48.

[10] B. Khemani, S. Patil, K. Kotecha, and S. Tanwar, “A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions,” Journal of Big Data, vol. 11, no. 1, p. 18, Jan 2024. [Online]. Available: https://doi.org/10.1186/s40537-023-00876-4

[11] K. Ding, J. Li, R. Bhanushali, and H. Liu, “Deep anomaly detection on attributed networks,” pp. 594–602. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/1.9781611975673.67

[12] A. Chaudhary, H. Mittal, and A. Arora, “Anomaly detection using graph neural networks,” 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), pp. 346–350, 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:204230429

[13] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” AI Open, vol. 1, pp. 57–81, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666651021000012

[14] [Online]. Available: https://github.com/EdisonLeeeee/GraphData
