
International Journal of Database Management Systems (IJDMS) Vol.4, No.6, December 2012

FAST-ON*: AN EXTENDED ALGORITHM FOR GRAPH ISOMORPHISM PROBLEM AND GRAPH QUERY PROCESSING
Mosab Hassaan and Karam Gouda
Faculty of Computers and Informatics, Benha University, Egypt
{mosab.hassaan, karam.gouda}@fci.bu.edu.eg

ABSTRACT
Graphs are widely used to model complicated data semantics in many applications. In our previous paper [8], we proposed Fast-ON, an efficient algorithm for the subgraph isomorphism problem. In this paper, we develop an efficient algorithm called Fast-ON* that extends Fast-ON to handle two other problems, namely, the graph isomorphism problem and graph query processing. Our performance study shows that Fast-ON* outperforms previously proposed algorithms for both problems by a wide margin.

KEYWORDS
graph isomorphism, graph query processing

1. INTRODUCTION
As a popular data structure, graphs have been used to model many complex data objects and their relationships in the real world, such as chemical compounds [20], entities in images [16], and social networks [2]. For example, in a social network, a person $i$ corresponds to a vertex $v_i$ in the graph $G$, and another person $j$ corresponds to a vertex $v_j$ in $G$. If persons $i$ and $j$ are acquaintances or have a business relation, then an edge $(v_i, v_j)$ exists that connects $v_i$ and $v_j$. Similarly, in chemistry, a set of atoms combined with designated bonds is used to describe chemical molecules.

The Graph Isomorphism Problem (GIP) tests whether two given graphs are isomorphic. In other words, it asks whether there is a one-to-one mapping between the vertices of the graphs that preserves the edges. This problem has been studied for decades by mathematicians, chemists and computer scientists, and is considered interesting from both the theoretical and the practical point of view, since it has applications in many fields, ranging from pattern recognition and computer vision [7] to information retrieval [1], data mining [19], and chemistry [14]. For example, in data mining, one of the main challenges in frequent subgraph mining is to systematically generate candidate subgraphs in a non-redundant manner, so that the same graph is not generated more than once. This requires graph isomorphism checking to make sure that duplicate graphs are removed. Given the two graphs $q$ and $q'$ in Figure 1, $q$ is isomorphic to $q'$ ($u_1$ is mapped to $u_3'$, $u_2$ is mapped to $u_1'$, and $u_3$ is mapped to $u_2'$). So far, it has not been possible to prove that GIP is either in the complexity class P or NP-complete. Many algorithms have been proposed, such as Ullman [18], Nauty [15], and Vflib [4].

DOI: 10.5121/ijdms.2012.4602


Graph query processing [6, 22, 3, 21, 10, 23, 9, 26, 17, 29, 11, 24, 28, 27, 25] has attracted much attention in recent years thanks to the increasing popularity of graph databases in various application domains. Existing research on graph query processing is conducted mainly on two types of graph databases as follows.

Figure 1: Running Example

The first one is a large graph such as a social network. Query processing on a large graph (Subgraph over Large Graph (SLG)) can be described as follows. Given a query graph $q$ and a large graph $G$, we want to retrieve as output the set of subgraphs of $G$, each of which is isomorphic to $q$. For example, biologists may want to recognize groups of proteins that match a particular pattern in a large protein-protein interaction network. Given the graph $G$ and the query graph $q$ in Figure 1, $q$ is isomorphic to two subgraphs of $G$ (in the first one, $u_1$ is mapped to $v_1$, $u_2$ is mapped to $v_3$, and $u_3$ is mapped to $v_4$; in the second one, $u_1$ is mapped to $v_2$, $u_2$ is mapped to $v_4$, and $u_3$ is mapped to $v_5$). Many algorithms have been proposed for SLG, such as [24, 28, 27, 25].

The second one is a transaction graph database that consists of a set of relatively small graphs. Transaction graph databases are prevalent in scientific domains such as chemistry and bioinformatics. Query processing on transaction graph databases (Subgraph Search Problem (SSP)) can be described as follows. Given a query graph $q$ and a graph database $D = \{g_1, g_2, g_3, \ldots, g_n\}$, we want to retrieve as output all graphs $g_i \in D$ such that $q$ is a subgraph of $g_i$. For example, given a large chemical compound database, a chemist may want to find all chemical compounds having a particular substructure. Given the database $D = \{G, G'\}$ and the query graph $q$ in Figure 1, graph $G$ should be returned as the result since $G$ contains $q$. Many algorithms have been proposed for the subgraph search problem, such as [6, 22, 3, 21, 10, 23, 9, 26, 17, 29, 11].

In this paper, we propose an efficient algorithm called Fast-ON* for testing GIP, SLG, and SSP. Fast-ON* is an extension of Fast-ON, a previously published algorithm developed by us [8]. We evaluate Fast-ON* and compare it with Ullman and Vflib on real and synthetic datasets for GIP, with SMS [27] and GADDI [24] for SLG, and with CT-index [11] and FG-index [3] for SSP.

Organization. This paper is organized as follows. Section 2 introduces the preliminary concepts. Section 3 presents the related work. The Fast-ON* algorithm is introduced in Section 4. Section 5 reports the experimental results. We conclude in Section 6 by giving a summary and directions for future work. In Section 7 (Appendix), we give more details about the Ullman and Fast-ON algorithms.


2. PRELIMINARY CONCEPTS
As a general data structure, the labeled graph is used to model complex structured and schema-less data. In a labeled graph, vertices and edges represent entities and relationships, respectively. The attributes associated with entities and relationships are called labels. This paper focuses on simple undirected graphs with vertex and edge labels or with vertex labels only. Below, the terminology used throughout the paper is introduced.

Definition 2.1 Labeled Graph. A labeled graph $G$ is defined as a 4-tuple $\langle V_G, E_G, L_G, l_G \rangle$, where $V_G$ is the set of vertices, $E_G$ is the set of edges, $L_G$ is the set of labels, and $l_G$ is a labeling function that maps each vertex or edge to a label in $L_G$.

Definition 2.2 Vertex Neighborhood. Given a graph $G$, the neighborhood of $u \in V_G$ is the set $N_G(u) = \{v \in V_G \mid (u, v) \in E_G\}$. The degree of a vertex $v \in V_G$ is defined as $deg(v) = |N_G(v)|$ for simple graphs.

Definition 2.3 Graph Isomorphism Problem (GIP). Given two graphs $H = \langle V_H, E_H, L_H, l_H \rangle$ and $G = \langle V_G, E_G, L_G, l_G \rangle$, a graph isomorphism from $H$ to $G$ is a bijection $f : V_H \to V_G$ such that: (1) for any edge $(u, v) \in E_H$, there is an edge $(f(u), f(v)) \in E_G$, (2) $l_H(u) = l_G(f(u))$ and $l_H(v) = l_G(f(v))$, and (3) $l_H((u, v)) = l_G((f(u), f(v)))$.
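The three conditions of Definition 2.3 can be checked mechanically for a given mapping. The following Python sketch is illustrative only: the graph representation (`make_graph`), the function name `is_graph_isomorphism`, and the example labels are assumptions that do not appear in the paper. The final edge-count comparison is also an addition beyond the text of the definition; it makes sure that $G$ has no edge without a counterpart in $H$, which is how isomorphism is usually understood.

```python
def make_graph(vertex_labels, edges):
    """Build a labeled undirected graph as (vertex labels, adjacency map).

    vertex_labels: dict mapping vertex -> label
    edges: iterable of (u, v, edge_label) triples
    """
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def is_graph_isomorphism(h, g, f):
    """Return True if the dict f is a graph isomorphism from H to G."""
    h_labels, h_adj = h
    g_labels, g_adj = g
    # f must be a bijection from V_H onto V_G.
    if set(f) != set(h_labels) or set(f.values()) != set(g_labels):
        return False
    if len(f) != len(g_labels):
        return False
    for u in h_labels:
        if h_labels[u] != g_labels[f[u]]:          # condition (2): vertex labels
            return False
        for v, edge_label in h_adj[u].items():
            if f[v] not in g_adj[f[u]]:            # condition (1): edge preserved
                return False
            if g_adj[f[u]][f[v]] != edge_label:    # condition (3): edge labels
                return False
    # Assumption beyond Definition 2.3: both graphs have the same number of edges.
    return sum(map(len, h_adj.values())) == sum(map(len, g_adj.values()))


# Hypothetical labels; the mapping mirrors the one quoted for q and q' in Section 1.
q = make_graph({"u1": "A", "u2": "B", "u3": "C"},
               [("u1", "u2", "x"), ("u2", "u3", "x")])
q_prime = make_graph({"u1'": "B", "u2'": "C", "u3'": "A"},
                     [("u3'", "u1'", "x"), ("u1'", "u2'", "x")])
mapping = {"u1": "u3'", "u2": "u1'", "u3": "u2'"}
print(is_graph_isomorphism(q, q_prime, mapping))
```

Checking a single mapping is cheap; the difficulty of GIP lies in finding such a mapping among the many possible bijections.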

The concept of the subgraph isomorphism problem can be defined analogously by using an injection instead of a bijection. A graph $H$ is called a subgraph of another graph $G$ (or $G$ is a supergraph of $H$), denoted as $H \subseteq G$ (or $G \supseteq H$), if there exists a subgraph isomorphism from $H$ to $G$.

Definition 2.4 Vertex Labeled Neighborhood [8]. Given a graph $G$ and a vertex $u \in V_G$, the labeled neighborhood of $u$ is given as $NL_G(u) = \{(l_G(v), l_G((u, v))) : v \in V_G \text{ and } (u, v) \in E_G\}$.

The following theorem [8] presents the necessary condition required to map a vertex $u \in V_q$ to a vertex $v \in V_G$.

Theorem 2.1 Given two graphs $q$ and $G$ such that $q$ is subgraph isomorphic to $G$ under an injective function $f$. If $u \in V_q$ is mapped to $v \in V_G$, then $NL_q(u) \subseteq NL_G(v)$.
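As a small illustration of Definition 2.4 and Theorem 2.1, the sketch below computes labeled neighborhoods as Python sets and uses set inclusion as a necessary (not sufficient) test for mapping a query vertex to a data vertex. The graph representation, the helper names, and the example graphs are assumptions made for this sketch; they are not taken from the paper.

```python
def make_graph(vertex_labels, edges):
    """Labeled undirected graph as (vertex labels, adjacency map with edge labels)."""
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def labeled_neighborhood(graph, u):
    """NL(u): the set of (neighbor label, edge label) pairs around u."""
    labels, adjacency = graph
    return frozenset((labels[v], edge_label) for v, edge_label in adjacency[u].items())


def may_map(q, g, u, v):
    """Necessary condition of Theorem 2.1: NL_q(u) must be a subset of NL_G(v)."""
    return labeled_neighborhood(q, u) <= labeled_neighborhood(g, v)


# Hypothetical example: the query is a path A - B - C.
q = make_graph({"u1": "A", "u2": "B", "u3": "C"},
               [("u1", "u2", "x"), ("u2", "u3", "x")])
g = make_graph({"v1": "A", "v2": "B", "v3": "C", "v4": "B"},
               [("v1", "v2", "x"), ("v2", "v3", "x"), ("v1", "v4", "x")])

# v2 and v4 both carry label B, but only v2 has neighbors labeled A and C.
print(may_map(q, g, "u2", "v2"), may_map(q, g, "u2", "v4"))
```

The test rules out `v4` without any search, which is the kind of pruning the theorem is used for.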


Definition 2.5 Subgraph Over Large Graph (SLG). Given a large data graph $G$ and a query graph $q$, where $|V_q| \ll |V_G|$, the problem of subgraph over large graph is to find all matches of $q$ in $G$, where a match is a subgraph isomorphism from $q$ to $G$ (see Definition 2.3).

Definition 2.6 Subgraph Search Problem (SSP). Given a graph database $D = \{g_1, g_2, g_3, \ldots, g_n\}$ and a query graph $q$, the subgraph search problem is to find the set of graphs $D_q$ in $D$ that contain $q$, that is, $D_q = \{g \mid g \in D \text{ and } q \subseteq g\}$.

3. RELATED WORK
Ullman [18] and Vflib [4] are two well-known algorithms for the (sub)graph isomorphism problem, and Nauty [15] is a well-known algorithm for the graph isomorphism problem. The Ullman algorithm is based on the branch and bound paradigm [13]. It is prohibitively expensive for querying against a very large data graph.

The Vflib algorithm is another important algorithm for the subgraph isomorphism problem. It uses an optimized serial version of the Ullman algorithm. The algorithm proceeds by creating and modifying a match state. The match state contains a matched-set, which is a set of vertex pairs that match between the query graph $q$ and the data graph $G$. If the matched-set contains all vertices of the query graph $q$, then the algorithm is successful and returns. Otherwise, the algorithm attempts to add a new pair. It does this by tracking the in-set and out-set of each graph, which are the sets of vertices immediately adjacent to the matched-set. These two sets define the potential vertices that can be added to a given state. The only pairs that can be added are either in the in-set of both graphs or in the out-set of both graphs. The algorithm uses backtracking search to either find a successful match state or return a failure.

Nauty is a backtracking algorithm that traverses a search tree looking for a canonical labeling and, in the process, builds the automorphism group of the graph. Nauty starts with an initial classification of the vertices by their degree, which defines a partition of the vertices. From this partition, it performs successive refinements based on the adjacencies of the vertices of a cell of the partition with the vertices in all the cells of the partition.

For subgraph over large graph (SLG), Ullman [18] and Vflib [4] do not work well on large graphs, and many other algorithms have been proposed. GADDI [24] is one of them. The authors of GADDI proposed an index based on neighborhood discriminative substructures.
It counts the number of small substructures in the induced intersection graph between the neighborhoods of two vertices. Nova [28] is another algorithm for SLG. Nova utilizes a novel index called nIndex. It pre-orders the query vertices in such a way that more computational cost can be shared. It also employs an eager pruning strategy that can determine that the current enumeration state cannot lead to a successful mapping, so that the enumeration process can exit early. SPath [25] has also been proposed for this problem. SPath maintains for each vertex of the network (large graph) a neighborhood signature, a compact indexing structure comprising decomposed shortest path information within the vertex's vicinity. It changes graph query processing from vertex-at-a-time to path-at-a-time. Another algorithm for SLG is SMS [27]. It designs a vertex code based on the information of each vertex and its neighbors. The authors of SMS also proposed partitioning the large graph to improve query performance.

For the subgraph search problem (SSP), Ullman [18] and Vflib [4] also do not work well, since the subgraph isomorphism problem is NP-complete [5] and we need to perform subgraph

isomorphism checking of $q$ against each graph $g_i \in D$. The major challenge in this scenario is to reduce the number of pairwise subgraph isomorphism checks. A number of graph indexing techniques have been proposed to address this challenge [22, 3, 23, 26, 17, 11, 10, 29]. There are two categories of graph indexing techniques: feature-based indexes and non-feature-based indexes.

In a feature-based index [22, 3, 23, 26, 17, 11], some subgraphs are chosen as index features, and an inverted list is built for each feature. Generally, query processing follows the filtering-and-verification framework. For example, CT-index [11] has been proposed for SSP. The authors of CT-index proposed a new approach based on the filtering-and-verification framework, using a new hash-key fingerprint technique with a combination of tree and cycle features for filtering and a new subgraph isomorphism test for verification. In another indexing technique, FG-index [3], both frequent subgraphs and infrequent edges are chosen as the feature set, and it supports a verification-free strategy.

In the category of non-feature-based indexes, Closure-tree [10] (a clustering-based index) has been proposed to support both the subgraph search problem and similarity queries. The authors of Closure-tree proposed pseudo subgraph isomorphism, which checks for the existence of a semi-perfect matching between the query graph and the database graphs. GCoding [29] has also been proposed. Based on GCoding, the structure of a graph can be encoded into a numerical space, and a two-step filtering method is presented to search the graph database.

Since Fast-ON* is an extended version of Fast-ON [8], we give more details about the Fast-ON algorithm, as well as the Ullman algorithm [18], in Section 7 (Appendix).

4. FAST-ON* ALGORITHM
In this section, we present an algorithm called Fast-ON* that extends Fast-ON (see Subsection 7.2 in the Appendix) to handle two other problems, namely, the graph isomorphism problem and graph query processing.

4.1. Graph Isomorphism Problem (GIP).

To handle graph isomorphism instead of subgraph isomorphism, we apply the following changes. First, we test that the query graph and the data graph have the same number of vertices. If $|V_q|$ is not equal to $|V_G|$, we are sure that $q$ is not isomorphic to $G$, so the process can exit early. Then, we modify the bit matrix $M_{DLN}$ in Algorithm 5 (Subsection 7.2) as follows: $m(i, j) = 1$ if $DLN_q[i] = DLN_G[j]$, otherwise $m(i, j) = 0$. The Fast-ON* algorithm for GIP is presented in the following algorithm.

------------------------------------------------------------------------
Algorithm 1: Fast-ON*$(q, G)$ for graph isomorphism problem (GIP)
------------------------------------------------------------------------
Input: two graphs $q$ and $G$.
Output: Boolean: $q$ is isomorphic to $G$.
Boolean Test = FALSE; /* Global Variable */
1: if ($|V_q| = |V_G|$) then
2:    return Fast-ON$(q, G)$
3: else
4:    return FALSE
------------------------------------------------------------------------
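The two changes described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' C++ implementation: the graph representation and all function names are hypothetical, the labeled neighborhoods are compared directly instead of through the cached bit matrix, and the query vertices are visited in input order. One further assumption is labeled in the code: because labeled neighborhoods are sets, equal neighborhoods do not imply equal degrees, so the sketch also compares the number of edges.

```python
def make_graph(vertex_labels, edges):
    """Labeled undirected graph as (vertex labels, adjacency map with edge labels)."""
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def labeled_neighborhood(graph, u):
    labels, adjacency = graph
    return frozenset((labels[v], lbl) for v, lbl in adjacency[u].items())


def fast_on_star_gip(q, g):
    """Sketch of Algorithm 1: size test, then search with the equality rule."""
    q_labels, q_adj = q
    g_labels, g_adj = g
    if len(q_labels) != len(g_labels):          # lines 1, 3 and 4 of Algorithm 1
        return False
    # Assumption added for this sketch: equal edge counts rule out extra edges in G.
    if sum(map(len, q_adj.values())) != sum(map(len, g_adj.values())):
        return False
    # GIP rule: m(i, j) = 1 only if the labeled neighborhoods are equal.
    candidates = {
        u: [v for v in g_labels
            if q_labels[u] == g_labels[v]
            and labeled_neighborhood(q, u) == labeled_neighborhood(g, v)]
        for u in q_labels
    }
    order = list(q_labels)
    mapping, used = {}, set()

    def search(i):
        if i == len(order):
            return True
        u = order[i]
        for v in candidates[u]:
            if v in used:
                continue
            # Edges from u to already mapped query vertices need counterparts in G.
            if any(w in mapping and g_adj[v].get(mapping[w], object()) != lbl
                   for w, lbl in q_adj[u].items()):
                continue
            mapping[u] = v
            used.add(v)
            if search(i + 1):
                return True
            del mapping[u]
            used.discard(v)
        return False

    return search(0)


# Hypothetical examples.
path = make_graph({"a": "A", "b": "A", "c": "A"}, [("a", "b", "x"), ("b", "c", "x")])
triangle = make_graph({"a": "A", "b": "A", "c": "A"},
                      [("a", "b", "x"), ("b", "c", "x"), ("a", "c", "x")])
relabeled_path = make_graph({"p": "A", "r": "A", "s": "A"},
                            [("r", "p", "x"), ("p", "s", "x")])
print(fast_on_star_gip(path, relabeled_path), fast_on_star_gip(path, triangle))
```

The path and the triangle have identical labeled neighborhoods as sets, which is why the sketch needs the extra edge-count test to tell them apart.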


4.2. Graph Query Processing.

As discussed in Section 1, graph query processing consists of the following two problems.

4.2.1. Subgraph over Large Graph (SLG).

Recall that SLG can be described as follows. Given a query graph $q$ and a large graph $G$, we want to retrieve as output the set of subgraphs of $G$, each of which is isomorphic to $q$. The output of the Fast-ON* algorithm for SLG is the set of all subgraph isomorphism mappings $f$ of $q$ against $G$. This is achieved by the following changes to the Fast-ON algorithm (Algorithm 5). First, remove the statement Boolean Test = FALSE. Then, replace lines 7-9 in Procedure Recursive_Search$(u_i)$ by the single line "Output a mapping $f$". In some cases, we do not use the distinct neighborhoods strategy (see the second optimization in Subsection 7.2), since a large graph may have a large number of distinct neighborhoods.

4.2.2. Subgraph Search Problem (SSP).

Recall that SSP can be described as follows. Given a query graph $q$ and a graph database $D = \{g_1, g_2, g_3, \ldots, g_n\}$, we want to retrieve as output all graphs $g_i \in D$ such that $q$ is a subgraph of $g_i$. The Fast-ON* algorithm for SSP is presented in the following algorithm.

------------------------------------------------------------------------
Algorithm 2: Fast-ON*$(q, D)$ for subgraph search problem (SSP)
------------------------------------------------------------------------
Input: $q$: a query graph and $D$: a graph database.
Output: $D_q$: the answer set.
1: $D_q = \emptyset$
2: for each $G \in D$ do
3:    Boolean Test = FALSE;
4:    if Fast-ON$(q, G)$
5:       $D_q = D_q \cup \{G\}$
------------------------------------------------------------------------
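The loop of Algorithm 2 is independent of how the containment test is implemented. The Python sketch below makes that explicit by taking the test as a parameter. The brute-force `is_subgraph` function is only a stand-in for Fast-ON, used so that the example runs on its own; the graph representation, the function names, and the example database are assumptions of this sketch.

```python
from itertools import permutations


def make_graph(vertex_labels, edges):
    """Labeled undirected graph as (vertex labels, adjacency map with edge labels)."""
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def is_subgraph(q, g):
    """Brute-force stand-in for Fast-ON(q, G): try every injective mapping."""
    q_labels, q_adj = q
    g_labels, g_adj = g
    q_vertices = list(q_labels)
    for image in permutations(g_labels, len(q_vertices)):
        f = dict(zip(q_vertices, image))
        labels_ok = all(q_labels[u] == g_labels[f[u]] for u in q_vertices)
        edges_ok = all(f[w] in g_adj[f[u]] and g_adj[f[u]][f[w]] == lbl
                       for u in q_vertices for w, lbl in q_adj[u].items())
        if labels_ok and edges_ok:
            return True
    return False


def subgraph_search(q, database, contains=is_subgraph):
    """Sketch of Algorithm 2: collect every database graph that contains q."""
    answer = []                          # D_q, line 1
    for name, graph in database.items():  # line 2
        if contains(q, graph):           # line 4
            answer.append(name)          # line 5
    return answer


# Hypothetical database of two small graphs.
q = make_graph({"u1": "A", "u2": "B", "u3": "C"},
               [("u1", "u2", "x"), ("u2", "u3", "x")])
database = {
    "G": make_graph({"v1": "A", "v2": "B", "v3": "C", "v4": "B"},
                    [("v1", "v2", "x"), ("v2", "v3", "x"), ("v1", "v4", "x")]),
    "G2": make_graph({"w1": "A", "w2": "B"}, [("w1", "w2", "x")]),
}
print(subgraph_search(q, database))
```

Because no index is built, the cost of this approach is one containment test per database graph, so the speed of the test itself is what matters.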

5. EXPERIMENTAL EVALUATION
In this section, we evaluate the performance of Fast-ON* on real and synthetic graphs. Fast-ON* is implemented in standard C++ with STL library support and compiled with GNU GCC. Experiments were run on a PC with an Intel 3GHz dual-core CPU and 4GB of memory running Linux. In the experiments, we consider simple undirected graphs with vertex labels and with or without edge labels.

5.1. Datasets
Experimental evaluation is performed on a group of real and synthetic datasets as follows.

Real Datasets. The first real dataset, referred to as AIDS_10K, consists of 10,000 graphs that are randomly drawn from the AIDS Antiviral screen database¹. These graphs have 25 vertices and 27 edges on average. There are 62 distinct vertex labels in the dataset, but the majority of these labels are C, O and N. The total number of distinct edge labels is 3. The second real dataset, referred to as Chem_1M, is a subset of the PubChem database² and consists of one million graphs. Chem_1M has 23.98 vertices and 25.76 edges on average. The numbers of distinct vertex and edge labels are 81 and 3, respectively. For this study, we derive subsets from Chem_1M, each of which consists of N graphs and is called the Chem_N dataset. The third real dataset, referred to as HRPD, is a human protein interaction network (one large graph). It consists of 9,460 vertices, 37,000 edges and 307 generated vertex labels with GO term descriptions. In this dataset, the edge labels and directions are ignored in our experiment.

Synthetic Datasets. The synthetic graph dataset is generated as follows. First, a set of $S$ seed fragments (small subgraphs) is generated randomly, whose size is determined by a Poisson distribution with mean $I$. The size of each graph is a Poisson random variable with mean $T$. Seed fragments are then randomly selected and inserted into a graph one by one until the graph reaches its size. More details about the synthetic data generator are available in [12]. A typical dataset may have the following setting: it has 10,000 graphs and uses 100 seed fragments ($S = 100$) with $L_V = 3$ distinct vertex labels and $L_E = 2$ distinct edge labels. On average, each graph has 50 edges ($T = 50$) and each seed fragment has 15 edges ($I = 15$). This dataset is denoted by Syn_10K.

Query Sets. For the three datasets AIDS_10K, Chem_1M and Syn_10K, there are six query sets Q4, Q8, Q12, Q16, Q20 and Q24. Each set Qi consists of 1000 query graphs with $i$ edges. For AIDS_10K, we adopt the query set from [22]. To generate query sets for the Chem_1M and Syn_10K datasets, a set of 1000 graphs whose size is larger than or equal to 24 is randomly selected from the dataset. Then, edges are removed from the graphs such that the remaining graphs are still connected. These graphs constitute Qi when all graphs are of size $i$. For the HRPD dataset, we generate only three queries, namely, q4, q8, and q12, with size 4, 8, and 12, respectively.

¹ http://dtp.nci.gov/

5.2 Performance Study

In this section, we evaluate the performance of Fast-ON* on various datasets for the two problems, namely, the graph isomorphism problem and graph query processing.

5.2.1. Performance Study for Graph Isomorphism Problem (GIP)

In this subsection, we evaluate the performance of Fast-ON* for GIP on the query sets of the three datasets, namely, AIDS_10K, Chem_1M, and Syn_10K, against themselves, i.e., we run Qi against Qi where $i \in \{4, 8, 12, 16, 20, 24\}$. Fast-ON* is compared with the two algorithms Ullman and Vflib. The total response time in msec for each query set is recorded and shown in Figure 2. As the figure shows, Fast-ON* significantly outperforms both Ullman and Vflib, and its performance gain grows with increasing query size.

5.2.2. Performance Study for Graph Query Processing

In this subsection, we evaluate the performance of Fast-ON* for the two problems of graph query processing on real and synthetic graphs as follows.

1. Performance Study for Subgraph over Large Graph (SLG).
² ftp://ftp.ncbi.nlm.nih.gov/pubchem/

For SLG, we evaluate the performance of Fast-ON* on the HRPD dataset by comparing it with the two algorithms GADDI and SMS. The total response time in msec for the queries (q4, q8, and q12) is recorded and shown in Figure 3. As the figure shows, Fast-ON* significantly outperforms GADDI and SMS, by 7 orders of magnitude and by a factor of up to 2, respectively.

2. Performance Study for Subgraph Search Problem (SSP).

For SSP, we evaluate the performance of Fast-ON* on the AIDS_40K and Chem_200K datasets by comparing it with the two algorithms FG-index and CT-index. Note that the total response time of FG-index (or CT-index) is equal to its index construction time plus its query processing time. The total response time in sec for each query set is recorded and shown in Figure 4. In Figure 4(a), Fast-ON* significantly outperforms FG-index and CT-index, by 5 orders of magnitude and 2 orders of magnitude, respectively. In Figure 4(b), Fast-ON* significantly outperforms FG-index by a factor of up to 4. Note that CT-index is not shown in Figure 4(b), since it failed to run on our machine.

6. CONCLUSION
In this paper, we developed an efficient algorithm called Fast-ON* that extends Fast-ON to handle two other problems, namely, the graph isomorphism problem and graph query processing. The experimental results demonstrate that, on various datasets, Fast-ON* outperforms the state-of-the-art algorithms for the two problems studied in this paper. Possible directions for future work include approximate graph query processing.


Figure 2: Performance Study for Graph Isomorphism Problem (GIP)


Figure 3: Performance Study for Subgraph over Large Graph (SLG)

Figure 4: Performance Study for Subgraph Search Problem (SSP)


7. APPENDIX
In this section, we discuss the Ullman algorithm and the Fast-ON algorithm in more detail.

7.1. Ullman Algorithm [18]

Given a query graph $q$ and a data graph $G$, Ullman's basic approach to check whether $q$ is a subgraph of $G$ is to enumerate all possible mappings of vertices in $V_q$ to those in $V_G$ using a depth-first tree-search algorithm. Figure 5 shows a part of the search tree generated from testing the two graphs $G$ and $q$ (query graph) in Figure 1. At level $i$ of the search tree, a vertex $u_i$ in $V_q$ is mapped to some vertex in $V_G$ (the number $j$ inside each node of the search tree means that this node represents the vertex $v_j \in V_G$). The root node of the search tree represents the starting point of the search, inner nodes of the search tree correspond to partial mappings, and nodes at level $|V_q|$ represent complete, but not necessarily sub-isomorphic, mappings. If there exists a complete mapping that preserves adjacency in both $q$ and $G$, then $q$ is subgraph isomorphic to $G$; otherwise $q$ is not subgraph isomorphic to $G$. The bold path in Figure 5 ($u_1$ is mapped to $v_1$, $u_2$ is mapped to $v_3$, and $u_3$ is mapped to $v_4$) is a complete mapping that preserves adjacency in $q$ and $G$, thus $q$ is subgraph isomorphic to $G$.

Unfortunately, the number of complete mappings is exponential in the number of vertices of the involved graphs. This means that the running time may be huge even for reasonably small graphs. To cope with the subgraph isomorphism problem efficiently, Ullman uses a refinement procedure to prune the search space. This procedure is based on the following two conditions:
1. Label and degree condition. A vertex $u \in V_q$ can be mapped to $v \in V_G$ under an injective mapping $f$, i.e., $v = f(u)$, if (i) $l_q(u) = l_G(v)$, and (ii) $deg(u) \le deg(v)$.
2. Neighbor condition. By this condition, the Ullman algorithm examines the feasibility of mapping $u \in V_q$ to $v \in V_G$ by considering the preservation of structural connectivity. If there exist edges connecting $u$ with previously explored vertices of $q$ but there are no counterpart edges in $G$, the mapping test simply fails.
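The two pruning conditions can be written as two small predicates. The Python sketch below is illustrative only: the graph representation, the function names, and the example graphs are assumptions, and the refinement procedure of the original Ullman algorithm is not reproduced. Following the wording of the neighbor condition, only the existence of counterpart edges is tested here, not their labels.

```python
def make_graph(vertex_labels, edges):
    """Labeled undirected graph as (vertex labels, adjacency map with edge labels)."""
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def candidate_sets(q, g):
    """Condition 1: same label and deg(u) <= deg(v)."""
    q_labels, q_adj = q
    g_labels, g_adj = g
    return {
        u: [v for v in g_labels
            if q_labels[u] == g_labels[v] and len(q_adj[u]) <= len(g_adj[v])]
        for u in q_labels
    }


def matchable(q, g, f, u, v):
    """Condition 2: every edge from u to an already mapped vertex exists in G."""
    _, q_adj = q
    _, g_adj = g
    return all(f[w] in g_adj[v] for w in q_adj[u] if w in f)


# Hypothetical example: the query is a path A - B - C.
q = make_graph({"u1": "A", "u2": "B", "u3": "C"},
               [("u1", "u2", "x"), ("u2", "u3", "x")])
g = make_graph({"v1": "A", "v2": "B", "v3": "C", "v4": "B"},
               [("v1", "v2", "x"), ("v2", "v3", "x"), ("v1", "v4", "x")])

print(candidate_sets(q, g))
print(matchable(q, g, {"u3": "v3"}, "u2", "v4"))
```

In this example the degree test already removes `v4` as a candidate for `u2`; the neighbor condition would reject the same pair later, once `u3` has been mapped.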

Figure 5: A Part of the Search Tree of Ullman Algorithm


7.2. Fast-ON Algorithm [8]

Our algorithm Fast-ON is based on the Ullman algorithm and improves it with two effective optimizations. It reduces the search space as much as possible by following a novel ordering strategy for the query's vertices (first optimization, Opt1), and by utilizing the label information of each vertex's neighborhood (second optimization, Opt2). Compared to Ullman [18] and Vflib [4], Fast-ON achieves a speedup of 1 to 3 orders of magnitude. The two optimizations are explained as follows.

The first optimization is based on the observation that the search order in the Ullman algorithm is arbitrary: the Ullman algorithm depends on the order of the query vertices as given in the input. This default ordering of $V_q$ can result in a search order that seriously slows down the algorithm. The adopted approach to order $V_q$ is to require the query vertex currently being processed to have high connectivity with the previously explored ones. That is, if $u_i \in V_q$ is the vertex currently being processed, then $u_i$ should have the highest connectivity with $u_1, u_2, \ldots, u_{i-1}$ among the remaining vertices, whereas $u_1$ is the vertex with maximum degree. This ordering forces unsuccessful mappings to be discarded as early as possible during the search, thus saving much of the time that the Ullman algorithm may spend on long partial mappings that fail. Algorithm 4 outlines this idea.

------------------------------------------------------------------------
Algorithm 4: Order_Vertices$(V_q)$
------------------------------------------------------------------------
Input: $V_q = \{u_1, u_2, \ldots, u_{|V_q|}\}$;
Output: An order of $V_q$, $V_q' = \{u_1', u_2', \ldots, u_{|V_q|}'\}$;
1: $V_q' = \emptyset$;
2: for each $u \in V_q$ do calculate $deg(u)$;
3: $u_1' = u_k$, $k = \arg\max_{u \in V_q} deg(u)$;
4: Add $u_1'$ to $V_q'$ and remove $u_k$ from $V_q$;
5: for $i = 2 \ldots |V_q|$
6:    $u_i' = u_k$, $k = \arg\max_{u \in V_q} |\{(u, u') \in E_q : u' \in V_q'\}|$;
7:    Add $u_i'$ to $V_q'$ and remove $u_k$ from $V_q$;
8: return $V_q'$;
------------------------------------------------------------------------

For the second optimization, Fast-ON uses Theorem 2.1 when mapping a vertex $u \in V_q$ to a vertex $v \in V_G$, which strengthens condition 1 of the Ullman algorithm. To reduce the cost of the containment checks, Fast-ON caches most of the repeated computation, as in the following steps:
1. Find the sets of distinct labeled neighborhoods of the two graphs $q$ and $G$, denoted as $DLN_q$ and $DLN_G$, respectively.
2. Construct a bit matrix $M_{DLN} = [m(i, j)]$ of size $|DLN_q| \times |DLN_G|$ to maintain the inclusion relationship between the distinct neighborhoods of $q$ and $G$, that is, $m(i, j) = 1$ if $DLN_q[i] \subseteq DLN_G[j]$, otherwise $m(i, j) = 0$.
3. For a graph $g$ ($g$ is $q$ or $G$), construct an array of pointers $P_g$ of size $|V_g|$, called the position array, where each slot $u$ holds the index of the labeled neighborhood of vertex $u$ in $DLN_g$.

Algorithm 5 outlines the Fast-ON algorithm. Line 1 applies the first optimization Opt1, whereas lines 2-5 outline the second optimization Opt2. In line 5, for each query vertex $u \in V_q$, the data graph vertices $v \in V_G$ that satisfy the modified first condition are collected into a set called the candidate set $C(u)$. The procedure Recursive_Search matches $u_i$ over $C(u_i)$ (line 5) and proceeds step by step by recursively matching the subsequent vertex $u_{i+1}$ over $C(u_{i+1})$ (lines 6-7), or sets Test to TRUE and returns if every vertex of $q$ has a counterpart in $G$ (line 9). If $u_i$ exhausts all vertices in $C(u_i)$ and still cannot find a match, Recursive_Search backtracks to the previous state for further exploration (line 11). The procedure Matchable applies the second condition in [18]. This condition examines the feasibility of mapping $u \in V_q$ to $v \in V_G$ by considering the preservation of structural connectivity. If there exist edges connecting $u$ with previously explored vertices of $q$ but there are no counterpart edges in $G$, the mapping test simply fails.
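The three caching steps listed above (distinct labeled neighborhoods, bit matrix, position arrays) can be sketched in Python as follows. All names and the example graphs are assumptions made for illustration; the paper's implementation is in C++ and its data layout is not described at this level. The `exact` flag is a hypothetical parameter that switches the matrix from the inclusion rule used for subgraph isomorphism to the equality rule that Subsection 4.1 describes for GIP.

```python
def make_graph(vertex_labels, edges):
    """Labeled undirected graph as (vertex labels, adjacency map with edge labels)."""
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def build_dln_index(graph):
    """Steps 1 and 3: distinct labeled neighborhoods and the position array."""
    labels, adjacency = graph
    dln, position, index_of = [], {}, {}
    for u in labels:
        nl = frozenset((labels[v], lbl) for v, lbl in adjacency[u].items())
        if nl not in index_of:          # first time this neighborhood is seen
            index_of[nl] = len(dln)
            dln.append(nl)
        position[u] = index_of[nl]      # slot u points into dln
    return dln, position


def build_bit_matrix(dln_q, dln_g, exact=False):
    """Step 2: m[i][j] = 1 if DLN_q[i] is included in (or equal to) DLN_G[j]."""
    if exact:
        return [[int(a == b) for b in dln_g] for a in dln_q]
    return [[int(a <= b) for b in dln_g] for a in dln_q]


# Hypothetical example graphs.
q = make_graph({"u1": "A", "u2": "B"}, [("u1", "u2", "x")])
g = make_graph({"v1": "A", "v2": "B", "v3": "B", "v4": "C"},
               [("v1", "v2", "x"), ("v1", "v3", "x"), ("v2", "v4", "x")])

dln_q, pos_q = build_dln_index(q)
dln_g, pos_g = build_dln_index(g)
m = build_bit_matrix(dln_q, dln_g)

# The containment check for a pair (u, v) is now a single matrix lookup.
print(m[pos_q["u2"]][pos_g["v2"]], m[pos_q["u2"]][pos_g["v1"]])
```

Each set comparison is done once per pair of distinct neighborhoods rather than once per pair of vertices, which pays off when many vertices share the same labeled neighborhood.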

------------------------------------------------------------------------
Algorithm 5: Fast-ON$(q, G)$
------------------------------------------------------------------------
Input: $q$: a query graph and $G$: a data graph.
Output: Boolean: $q$ is a subgraph of $G$.
Boolean Test = FALSE; /* Global Variable */
1: $V_q' = $ Order_Vertices$(V_q)$; /* Opt1 */
2: Construct $DLN_G$, $DLN_q$ and $M_{DLN}$;
3: Construct both $P_q$ and $P_G$;
4: for each $u \in V_q'$ do
5:    $C(u) = \{v : v \in V_G,\ l_q(u) = l_G(v), \text{ and } m(P_q(u), P_G(v)) = 1\}$; /* Opt2 (Cond. 1 of Ullman after changing) */
6: Recursive_Search$(u_1)$;
7: return Test;
------------------------------------------------------------------------


------------------------------------------------------------------------
Algorithm 5: Fast-ON$(q, G)$ [continued]
------------------------------------------------------------------------
Procedure Recursive_Search$(u_i)$
1: if NOT Test then
2:    for $v \in C(u_i)$ and $v$ is unmatched do
3:       if NOT Matchable$(u_i, v)$ then continue;
4:       $f(u_i) = v$; $v$ = matched;
5:       if $i < |V_q'|$ then
6:          Recursive_Search$(u_{i+1})$;
7:       else
8:          Test = TRUE;
9:          return;
10:      $f(u_i)$ = NULL; $v$ = unmatched; /* Backtrack */

Function Matchable$(u_i, v)$ /* Cond. 2 of Ullman */
1: for each $j < i$ do
2:    if $(u_i, u_j) \in E_q$ and $(v, f(u_j)) \notin E_G$ then return FALSE;
3: return TRUE;
------------------------------------------------------------------------
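To make the interplay of Algorithm 4 and Algorithm 5 concrete, the following Python sketch combines the vertex ordering (Opt1), the neighborhood-based candidate sets (Opt2), and the recursive search with backtracking. It is a simplified reading of the pseudocode under stated assumptions, not the authors' implementation: the graph representation and all names are hypothetical, labeled neighborhoods are compared directly instead of through $M_{DLN}$, ties in the ordering are broken by input order, `matchable` also compares edge labels, and the function returns the mapping found (or `None`) instead of setting a global flag.

```python
def make_graph(vertex_labels, edges):
    """Labeled undirected graph as (vertex labels, adjacency map with edge labels)."""
    adjacency = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adjacency[u][v] = edge_label
        adjacency[v][u] = edge_label
    return vertex_labels, adjacency


def order_vertices(q):
    """Opt1: start at a maximum-degree vertex, then pick the vertex with the
    most neighbors among the vertices already ordered."""
    labels, adj = q
    if not labels:
        return []
    ordered = [max(labels, key=lambda u: len(adj[u]))]
    chosen = set(ordered)
    while len(ordered) < len(labels):
        nxt = max((u for u in labels if u not in chosen),
                  key=lambda u: sum(w in chosen for w in adj[u]))
        ordered.append(nxt)
        chosen.add(nxt)
    return ordered


def fast_on(q, g):
    """Return a subgraph isomorphism mapping from q into g, or None."""
    q_labels, q_adj = q
    g_labels, g_adj = g

    def nl(labels, adj, u):
        return frozenset((labels[v], lbl) for v, lbl in adj[u].items())

    order = order_vertices(q)
    # Opt2: same label and NL_q(u) included in NL_G(v) (Theorem 2.1).
    candidates = {
        u: [v for v in g_labels
            if q_labels[u] == g_labels[v]
            and nl(q_labels, q_adj, u) <= nl(g_labels, g_adj, v)]
        for u in order
    }
    f, matched = {}, set()

    def matchable(u, v):
        # Edges from u to already mapped query vertices need counterparts in G.
        return all(f[w] in g_adj[v] and g_adj[v][f[w]] == lbl
                   for w, lbl in q_adj[u].items() if w in f)

    def search(i):
        if i == len(order):
            return True
        u = order[i]
        for v in candidates[u]:
            if v in matched or not matchable(u, v):
                continue
            f[u] = v
            matched.add(v)
            if search(i + 1):
                return True
            del f[u]                    # backtrack
            matched.discard(v)
        return False

    return dict(f) if search(0) else None


# Hypothetical example graphs.
q = make_graph({"u1": "A", "u2": "B", "u3": "C"},
               [("u1", "u2", "x"), ("u2", "u3", "x")])
g = make_graph({"v1": "A", "v2": "B", "v3": "C", "v4": "B"},
               [("v1", "v2", "x"), ("v2", "v3", "x"), ("v1", "v4", "x")])
print(order_vertices(q))
print(fast_on(q, g))
```

Replacing the early `return True` with code that records a copy of `f` and keeps searching would enumerate all mappings, which is the change Subsection 4.2.1 describes for SLG.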

REFERENCES
[1] A. T. Berztiss. A backtrack procedure for isomorphism of directed graphs. Journal of the ACM, Vol. 20, No. 3, pp. 365-377, 1973.
[2] D. Cai, Z. Shao, X. He, X. Yan, and J. Han. Community mining from multi-relational networks. Proc. of PKDD, 2005.
[3] J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. SIGMOD, pp. 857-872, 2007.
[4] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 10, pp. 1367-1372, 2004.
[5] M. R. Garey and D. S. Johnson. Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., 1990.
[6] R. Giugno and D. Shasha. Graphgrep: a fast and universal method for querying graphs. Proc. of the 16th International Conference on Pattern Recognition, pp. 112-115, 2002.
[7] M. Gori, M. Maggini, and L. Sarti. Graph matching using random walks. Proc. of the 17th International Conference on Pattern Recognition, pp. 394-397, 2004.
[8] K. Gouda and M. Hassaan. A fast algorithm for subgraph search problem. Proc. of the 8th International Conference on Informatics and Systems, pp. DE53-DE59, 2012.
[9] W.-S. Han, J. Lee, M.-D. Pham, and J. X. Yu. igraph: a framework for comparisons of disk-based graph indexing techniques. PVLDB, pp. 449-459, 2010.
[10] H. He and A. K. Singh. Closure-tree: An index structure for graph queries. ICDE, pp. 38-49, 2006.
[11] K. Klein, N. Kriege, and P. Mutzel. Ct-index: Fingerprint-based graph indexing combining cycles and trees. ICDE, pp. 1115-1126, 2011.
[12] M. Kuramochi and G. Karypis. Frequent subgraph discovery. Proc. of ICDM, pp. 313-320, 2001.
[13] A. H. Land and A. G. Doig. An automatic method of solving discrete programming problems. Econometrica, Vol. 28, No. 3, pp. 497-520.
[14] X. Liu and D. J. Klein. The graph isomorphism problem. Journal of Computational Chemistry, Vol. 12, No. 10, pp. 1243-1251, 1991.
[15] B. D. McKay. Practical graph isomorphism. Congressus Numerantium, Vol. 30, pp. 45-87, 1981.
[16] E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 3, 1997.
[17] H. Shang, Y. Zhang, and X. Lin. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, pp. 364-375, 2008.
[18] J. R. Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM, Vol. 23, No. 1, pp. 31-42, 1976.
[19] T. Washio and H. Motoda. State of the art of graph-based data mining. ACM SIGKDD Explorations Newsletter, Vol. 5, No. 1, pp. 59-68, 2003.
[20] P. Willett. Chemical similarity searching. J. Chem. Inf. Computer Science, Vol. 38, No. 6, 1998.
[21] D. W. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. ICDE, pp. 976-985, 2007.
[22] X. Yan, S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. SIGMOD, pp. 335-346, 2004.
[23] S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. ICDE, pp. 966-975, 2007.
[24] S. Zhang, S. Li, and J. X. Yu. Gaddi: distance index based subgraph matching in biological networks. EDBT, 2009.
[25] P. Zhao and J. Han. On graph query optimization in large networks. VLDB, 2010.
[26] P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: tree + delta <= graph. VLDB, pp. 938-949, 2007.
[27] W. Zheng, L. Zou, and D. Zhao. Answering subgraph queries over large graphs. WAIM, pp. 390-402, 2011.
[28] K. Zhu, Y. Zhang, X. Zhu, and W. Wang. Nova: A novel and efficient framework for finding subgraph isomorphism mappings in large graphs. DASFAA, pp. 140-154, 2010.
[29] L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. EDBT, pp. 181-192, 2008.

