GPT4Graph
Figure 2: Illustration of self-prompting. The first request asks the LLM to automatically generate context for the input graph (with or without respect to the question); we may ask the LLM multiple context-related questions. After the new context is generated (such as a context summary or a format explanation), it is combined with the original input and sent to the LLM to produce the final output.
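The two-stage flow illustrated here can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: the helper names and the `call_llm` stub are illustrative, not the paper's actual implementation.

```python
# Sketch of the two-stage self-prompting flow (Figure 2).
# `call_llm` is a stand-in for any chat-completion API.

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call."""
    raise NotImplementedError

def build_context_request(graph_text: str) -> str:
    # Stage 1: ask the model to generate new context (here, a summary).
    return (
        "Here is a graph:\n" + graph_text +
        "\nSummarize its key nodes, edges, and subgraphs."
    )

def build_final_prompt(graph_text: str, new_context: str, question: str) -> str:
    # Stage 2: combine the generated context with the original input.
    return (
        question + "\n" +
        "Context: " + new_context + "\n" +
        "Graph:\n" + graph_text
    )
```

In use, the output of `call_llm(build_context_request(g))` would be fed back in as `new_context` for the second call.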
generate answers to the user. During reasoning, the LLM may generate intermediate output that should be handled by the prompt handler to form new input to the LLM. Here we elaborate on the prompt handler to show how to make the LLM better understand graph data.

3.1 Manual Prompting

Manual prompting for graph-based problems involves utilizing familiar graph representations to prompt a large language model (LLM) for the desired outputs. The novelty of this approach lies in the shift it requires from traditional text-based inputs to graph representations; these graph formats are discussed in Section 2.2. By employing these graph formats as input, we can provide more comprehensive and context-rich information about the graph to the LLM. Other manual prompting techniques include adding a format explanation to help the LLM understand the format, and adding role prompting to help the LLM understand the specific task. Besides, we can also change the input order between the question and the external input, and add examples to exploit the in-context learning ability of LLMs (Wei et al., 2021).

Moreover, recently developed chain-of-thought promptings (Kojima et al., 2022; Yao et al., 2023) can also be applied to enhance the reasoning ability of LLMs, since many tasks require multiple steps of reasoning (e.g., computing the clustering coefficient).

3.2 Self-Prompting

Sometimes the given graph context contains little useful, or much redundant, information for solving the task. We therefore need the LLM to perform self-prompting to obtain more context or to eliminate irrelevant information from the given input. It can be challenging for an LLM to generate effective prompts for graph-based tasks, as graphs have complex structures and relationships that need to be accurately captured in the prompt. However, several strategies can be employed for self-prompting on graph-based tasks.

Context Summarization: The LLM can generate a summary of the given graph by extracting key features, such as important nodes, edges, or subgraphs. The generated summary can serve as a prompt for subsequent graph-related questions or tasks. Besides, for important elements such as nodes and edges, we can use the LLM to summarize their context (neighborhood) information to form neighborhood-aware text features.

Format Explanation: It is sometimes hard for a human to give a complete description of the input graph format. To give the LLM more context about the input graph, we can have the LLM generate a format explanation by itself.

By leveraging these self-prompting strategies, the LLM can actively engage in the understanding and manipulation of graphs, facilitating graph-based reasoning and learning.

4 Graph Understanding Benchmark

4.1 Structure Understanding Tasks

Graph Size Detection. This task evaluates the capability of a large language model (LLM) to discern the size of a provided graph. In this context, size refers to the count of nodes and edges present in the graph. The LLM is expected to accurately determine these metrics, even when user-provided designs and accompanying data, such as descriptions, statements, or queries, augment the graph. Despite the inherent challenge this poses for language models, a precise

[Figure: Illustration of the Size Detection, Degree Detection, and Edge Detection tasks on a small movie knowledge graph (Forest Gump, is starred by Tom Hanks, country U.S.), with the example query "The director who directs Forest Gump also direct what?" answered in natural language ("Back to the Future") and, under the instruction "Use Cypher to answer", as the query MATCH (m1)-[is directed by]->(d)-[direct]->(m2) RETURN m2.]

Edge Detection. Building on degree detection, this task further explores the LLM's understanding of
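The structure-understanding tasks above (and the clustering coefficient mentioned earlier) have exact ground-truth answers that can be computed directly from an undirected edge list. A minimal plain-Python sketch, with no graph library assumed:

```python
# Ground-truth answers for the structure-understanding tasks,
# computed from an undirected edge set.

from itertools import combinations

edges = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}

def neighbors(v, edges):
    return {u for e in edges for u in e if v in e and u != v}

def graph_size(edges):
    # Size Detection: number of nodes and number of edges.
    nodes = {v for e in edges for v in e}
    return len(nodes), len(edges)

def degree(v, edges):
    # Degree Detection.
    return len(neighbors(v, edges))

def has_edge(u, v, edges):
    # Edge Detection (undirected: check both orientations).
    return (u, v) in edges or (v, u) in edges

def clustering_coefficient(v, edges):
    # Local clustering: fraction of neighbor pairs that are themselves linked.
    nbrs = neighbors(v, edges)
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if has_edge(u, w, edges))
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))
```

Such exact computations are what the LLM's free-text answers are scored against.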
| Format | Input Design | Size Detection ACC (∆) | Degree Detection ACC (∆) | Edge Detection ACC (∆) | Attribute Retrieval ACC (∆) | Diameter ACC (∆) | Clustering ACC (∆) |
|---|---|---|---|---|---|---|---|
| Adjacency List | 1-shot | 35.50 (0.00) | 15.21 (0.00) | 65.45 (0.00) | - | 28.00 (0.00) | 5.42 (0.00) |
| Adjacency List | 1-shot-cot | 44.00 (+8.50) | 14.58 (-0.63) | 65.25 (-0.20) | - | 24.00 (-4.00) | 1.85 (-3.57) |
| Adjacency List | w/o format explanation | 33.00 (-0.25) | 16.34 (+1.13) | 57.50 (-8.25) | - | 18.00 (-10.00) | 5.19 (+3.43) |
| Adjacency List | w/o role prompting | 36.60 (+1.10) | 15.70 (+0.49) | 55.00 (-10.45) | - | 20.00 (-8.00) | 4.71 (-0.23) |
| Adjacency List | w/o change order | 14.00 (-21.50) | 26.28 (+11.07) | 51.20 (-14.25) | - | 30.00 (+2.00) | 14.92 (-9.50) |
| Adjacency List | w/o 1-shot | 33.00 (-2.50) | 17.18 (+1.97) | 71.90 (-6.45) | - | 22.00 (-6.00) | 7.85 (+2.43) |
| Edge List | 1-shot | 22.50 (0.00) | 44.87 (0.00) | 74.60 (0.00) | - | 43.00 (0.00) | 13.31 (0.00) |
| Edge List | 1-shot-cot | 27.00 (+4.50) | 48.65 (+3.78) | 74.70 (+0.10) | - | 41.00 (-2.00) | 11.33 (-1.98) |
| Edge List | w/o format explanation | 25.00 (+2.50) | 47.86 (+2.99) | 71.55 (-3.05) | - | 36.00 (-7.00) | 18.11 (+4.80) |
| Edge List | w/o role prompting | 18.00 (-4.50) | 47.64 (+2.57) | 71.70 (-2.90) | - | 39.00 (-4.00) | 13.63 (+0.35) |
| Edge List | w/o change order | 9.00 (-13.50) | 20.48 (-23.39) | 79.60 (+5.00) | - | 10.00 (-33.00) | 20.06 (+7.05) |
| Edge List | w/o 1-shot | 23.00 (+0.50) | 49.34 (+4.47) | 80.95 (+6.35) | - | 34.00 (-9.00) | 19.16 (+5.84) |
| GML | 1-shot | 54.50 (0.00) | 20.91 (0.00) | 50.45 (0.00) | 83.40 (0.00) | 37.00 (0.00) | 4.36 (0.00) |
| GML | 1-shot-cot | 55.50 (+1.00) | 20.76 (-0.15) | 50.10 (-0.35) | 83.30 (-0.10) | 28.00 (-9.00) | 0.95 (-3.41) |
| GML | w/o format explanation | 55.00 (-0.50) | 29.06 (+8.15) | 50.00 (-0.45) | 85.97 (+2.57) | 41.00 (+4.00) | 12.71 (+8.35) |
| GML | w/o role prompting | 54.50 (-0.50) | 29.79 (+8.88) | 50.00 (-0.45) | 84.50 (+0.10) | 35.00 (-2.00) | 6.96 (+2.60) |
| GML | w/o change order | 51.50 (-3.00) | 21.16 (+0.24) | 55.65 (+5.20) | 83.56 (+0.16) | 39.00 (+2.00) | 5.25 (+0.89) |
| GML | w/o 1-shot | 54.00 (-0.50) | 19.85 (-1.06) | 50.25 (+0.20) | 83.22 (-0.18) | 42.00 (+5.00) | 5.39 (+1.03) |
| GraphML | 1-shot | 25.00 (0.00) | 40.20 (0.00) | 62.05 (0.00) | 83.87 (0.00) | 34.00 (0.00) | 9.74 (0.00) |
| GraphML | 1-shot-cot | 22.50 (-2.50) | 40.02 (-0.18) | 62.30 (+0.25) | 83.75 (-0.12) | 32.00 (-2.00) | 7.29 (-2.45) |
| GraphML | w/o format explanation | 19.00 (-6.00) | 46.90 (+5.88) | 53.75 (-8.40) | 85.37 (+1.50) | 38.00 (+4.00) | 22.75 (+13.01) |
| GraphML | w/o role prompting | 15.50 (-9.50) | 49.89 (+9.87) | 56.10 (-5.95) | 87.63 (+3.76) | 31.00 (-3.00) | 14.52 (+4.78) |
| GraphML | w/o change order | 8.50 (-16.50) | 30.60 (-9.60) | 65.35 (+3.30) | 79.76 (-4.11) | 43.00 (+9.00) | 8.00 (-1.74) |
| GraphML | 0-shot | 24.50 (-0.50) | 39.59 (-0.61) | 73.95 (+11.90) | 82.90 (-0.97) | 30.00 (-4.00) | 14.32 (+4.58) |
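To make the input formats compared above concrete, here is a sketch of serializing one toy graph into an edge list, GML, and GraphML. These minimal string emitters are our own illustration; real experiments would typically rely on library writers (e.g., networkx's `write_gml`/`write_graphml`).

```python
# Minimal sketch: serializing a toy graph into three of the input formats.

nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]

def to_edge_list(edges):
    # One "source target" pair per line.
    return "\n".join(f"{u} {v}" for u, v in edges)

def to_gml(nodes, edges):
    # GML identifies nodes by integer id and keeps the label as an attribute.
    idx = {v: i for i, v in enumerate(nodes)}
    parts = ["graph ["]
    parts += [f'  node [ id {idx[v]} label "{v}" ]' for v in nodes]
    parts += [f"  edge [ source {idx[u]} target {idx[v]} ]" for u, v in edges]
    parts.append("]")
    return "\n".join(parts)

def to_graphml(nodes, edges):
    # GraphML is XML; node ids can be the labels themselves.
    lines = ['<graphml><graph edgedefault="undirected">']
    lines += [f'  <node id="{v}"/>' for v in nodes]
    lines += [f'  <edge source="{u}" target="{v}"/>' for u, v in edges]
    lines.append("</graph></graphml>")
    return "\n".join(lines)
```

The same graph thus yields prompts of very different lengths and verbosity, which is one plausible source of the format-dependent accuracies in the table.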
lead to improved performance and more accurate structural understanding.

Role Prompting Generally Improves Performance. Our findings indicate that incorporating role-prompting techniques generally enhances the model's performance on the structure understanding tasks. By explicitly guiding the model to focus on specific roles or relationships within the graph, we enable it to extract more meaningful insights and make more accurate predictions. Role prompting serves as an effective mechanism for capturing the nuances of the graph's structure and leveraging that information for improved understanding.

Examples Have Impacts on Graph Understanding. In line with previous research suggesting the utility of examples in large language models (LLMs), we find that examples also have some positive effect in graph understanding scenarios. However, omitting explicit examples and relying on zero-shot approaches sometimes yielded stronger results. This can be attributed to the rich information inherent in the graph itself, which allows the model to grasp the complexities of the structure without explicit examples. Examples can, in some cases, introduce noise, biases, or incomplete information, hindering the model's overall understanding.

The Position of External Knowledge Matters. We investigated the impact of external knowledge, such as questions, statements, and examples, on graph understanding. Comparing the placement of external knowledge before or after the graph input, we observed that positioning external knowledge before the graph generally led to better performance. Placing external knowledge first provides additional context, enabling the model to better comprehend the specific graph it needs to handle. Conversely, positioning the graph before the external knowledge may hinder the model's ability to effectively utilize the relevant information, potentially degrading performance.

These findings highlight the importance of thoughtful input design, the potential benefits of role-prompting techniques, the limited impact of examples in graph understanding, and the significance of the placement of external knowledge for optimal performance. Understanding these factors can guide future research and inform the development of more effective models for structure understanding tasks.

6.3 Results for Semantic Understanding Task

The results for the semantic understanding tasks are shown in Tables 2, 3, and 4. We have the following discoveries:

Results for KGQA and GQL generation. The results for KGQA and GQL generation are shown in Table 2. It is noticeable that current SOTA models consistently show higher performance across
all datasets, with scores ranging from 94.80 on MetaQA-3hop to 98.80 on MetaQA-2hop. However, the LLM showed comparable performance on certain tasks when equipped with prompt strategies. Specifically, the 'zero-shot+graph' method performed exceptionally well on the Wiki dataset, achieving an accuracy of 56.38, the highest among our proposed models. Similarly, the 'zero-shot-cot+graph+change-order' model performs best on MetaQA-1hop, scoring 95.87. When we compare zero-shot models with their 'zero-shot-cot' counterparts, we observe a general trend that the graph ('+graph') and change-order ('+change-order') enhancements improve model performance. For the 'one-shot Cypher' method, an impressive performance of 99.00 is achieved on MetaQA-1hop, surpassing the state of the art and all other models in our study.

Table 2: Performance on KGQA and GQL Generation

Table 3: Performance of Node Classification on OGBN-ARXIV. "self" denotes using only the text feature of the target node; "1-hop" denotes using the text features of direct neighbors; "2-hop" denotes using the text features within 2-hop neighbors.

| Method | Context | ACC |
|---|---|---|
| zero-shot | self | 48.00 |
| zero-shot | 1-hop | 53.00 |
| zero-shot | 2-hop | 57.00 |
| zero-shot-cot | self | 40.00 |
| zero-shot-cot | 1-hop | 40.00 |
| zero-shot-cot | 2-hop | 56.00 |
| one-shot | self | 50.00 |
| one-shot | 1-hop | 54.00 |
| one-shot | 2-hop | 60.00 |
| one-shot-cot | self | 43.00 |
| one-shot-cot | 1-hop | 55.00 |
| one-shot-cot | 2-hop | 59.00 |

Results for Node Classification. For node classification on OGBN-ARXIV (Table 3), the 'one-shot + 2-hop neighborhood context summarization' model has the highest accuracy, 60.00, among all the variants. Interestingly, models augmented with 2-hop neighborhood context summarization ('2-hop') outperform their 1-hop counterparts, showing that expanding the context range provides valuable information. The one-shot model also performs better than its chain-of-thought (cot) counterpart, suggesting that the cot strategy might not be as effective for this task. These results indicate potential areas for improvement, particularly for the 'zero-shot-cot' and 'change-order' strategies, which do not consistently improve performance. Nonetheless, the experiments provide valuable insights into the performance of different strategies on the node classification task.

Results for Graph Classification. The results for the graph classification task are shown in Table 4. From the results, we find that self-augmentation is effective in improving the performance of graph classification: self-augmentation such as self-format explanation and self-summarization can enrich the context of the original graph and make the task easier for the LLM to complete.

Table 4: Performance on Graph Classification

| Method | MOLHIV (GML) | MOLHIV (GraphML) | MOLPCBA (GML) | MOLPCBA (GraphML) |
|---|---|---|---|---|
| 1-shot-tot | 66.87 | 63.25 | 57.18 | 57.45 |
| 1-shot-cot | 67.65 | 64.71 | 59.26 | 57.32 |
| w/o self-format explanation | 64.71 | 64.71 | 58.73 | 56.24 |
| w/o self-summarization | 61.76 | 61.77 | 57.64 | 56.67 |
| 0-shot-cot | 58.82 | 59.76 | 55.57 | 55.32 |

7 Discussion

Our findings suggest several promising directions for future work on structure understanding tasks with LLMs. First, more research is needed to understand how different input designs and role-prompting techniques can further enhance performance. Second, we encourage researchers to investigate why examples are less effective for graph understanding and to explore alternative strategies for leveraging the rich information embedded in graphs. Third, the role of external knowledge placement merits further exploration. Finally, new approaches for graph augmentation could be developed to improve performance on semantic understanding tasks.

In addition, our experiments have revealed the potential of LLMs in various tasks beyond pure natural language processing. We believe that more effort should be dedicated to integrating graph-based information into LLMs, exploring different types of graph structures, and applying LLMs to other areas such as graph theory, network science, and complex systems. In the future, we may also consider using LLMs to control the use of external tools to better handle graph-structured data (Schick et al., 2023; Zhang, 2023).

8 Related Works

8.1 Language Model for Structural Data Understanding

Language models are being extended to understand and work with structural data, such as graphs, tables, and trees. One approach uses graph neural networks (GNNs) to encode structural information, capturing dependencies and relationships between elements (Qasim et al., 2019). Incorporating GNNs into language models enables them to generate contextually aware outputs that consider the structural characteristics of the data. Another approach incorporates attention mechanisms into language models for structural data (Chen et al., 2022; Eisenschlos et al., 2021). Attention allows the model to focus on relevant parts, improving understanding of complex dependencies and enhancing performance on tasks such as graph completion and table understanding. Language models can also benefit from combining knowledge graph embeddings with textual information, leveraging both textual and structural data to make informed predictions.

8.2 Graph Machine Learning

Graph machine learning develops models and algorithms for data structured as graphs, representing complex relationships in various domains. Traditional machine learning struggles with graph-structured data, but graph machine learning methods utilize the graph structure to extract meaningful features and make predictions. Graph convolutional networks (GCNs) extend convolutional neural networks to operate on graph-structured data, capturing local and global structural patterns and excelling at tasks such as node classification and graph-level classification (Kipf and Welling, 2016). Graph attention networks (GATs) incorporate attention mechanisms, allowing adaptive aggregation of information from relevant nodes (Velickovic et al., 2017); they perform well on tasks such as node classification and graph-level representation learning. Graph generative models generate new graphs that capture the structural characteristics and properties of the input data, benefiting tasks such as molecule generation (Walters and Barzilay, 2020) and graph-based data augmentation (Zhao et al., 2021). Graph machine learning techniques enable effective analysis and extraction of insights from graph-structured data, advancing fields that rely on understanding complex relationships and dependencies.

9 Conclusion

In this work, we analyze the ability of large language models to understand graph-structured data. Our findings indicate that LLMs still have a long way to go in understanding graph data. Future research should focus on developing and refining methods for encoding graph-structured information into a format that a large language model can comprehend and manipulate effectively. This is a complex challenge given the inherent differences between sequential text data and graph data, which is intrinsically multi-dimensional and relational.

References

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, 33:22118–22133.
Federico Errica, Marco Podda, Davide Bacciu, and Alessio Micheli. 2019. A fair comparison of graph neural networks for graph classification. arXiv preprint arXiv:1912.09893.

Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu, and Ting Liu. 2020. TableGPT: Few-shot table-to-text generation with table structure reconstruction and content matching. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1978–1988.

Michael Himsolt. 1997. GML: Graph modelling language. University of Passau.

Kazuya Okamoto, Wei Chen, and Xiang-Yang Li. 2008. Ranking of closeness centrality for large-scale social networks. Lecture Notes in Computer Science, 5059:186–195.

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.

Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. 2019. Rethinking table recognition using graph neural networks. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 142–147. IEEE.

Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. 2019. DropEdge: Towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903.

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.

Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, and Dongmei Zhang. 2023. Evaluating and enhancing structural understanding capabilities of large language models on tables via input designs. arXiv preprint arXiv:2305.13062.

Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 990–998.

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, Yoshua Bengio, et al. 2017. Graph attention networks. stat, 1050(20):10–48550.

W. Patrick Walters and Regina Barzilay. 2020. Applications of deep learning in molecule generation and molecular property prediction. Accounts of Chemical Research, 54(2):263–270.

Stanley Wasserman and Katherine Faust. 1994. Social network analysis: Methods and applications.

Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. 2021. Finetuned language models are zero-shot learners. In International Conference on Learning Representations.

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.

Jiawei Zhang. 2023. Graph-ToolFormer: To empower LLMs with graph reasoning ability via prompt augmented by ChatGPT. arXiv preprint arXiv:2304.11116.

Junlong Zhang and Yu Luo. 2017. Degree centrality, betweenness centrality, and closeness centrality in social network. In 2017 2nd International Conference on Modelling, Simulation and Applied Mathematics (MSAM2017), pages 300–303. Atlantis Press.

Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander Smola, and Le Song. 2018. Variational reasoning for question answering with knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Tong Zhao, Yozen Liu, Leonardo Neves, Oliver Woodford, Meng Jiang, and Neil Shah. 2021. Data augmentation for graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11015–11023.

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223.

A Detailed Description of Datasets

A.1 OGBN-ARXIV

The Open Graph Benchmark (OGB) is a collection of diverse, large-scale, and challenging datasets and benchmarking tasks for graph machine learning research. OGBN-ARXIV is part of the OGB Node Property Prediction track. The dataset comprises academic papers from the arXiv website, represented as nodes in a citation graph; the edges denote the citation relationships between the papers. Each paper is associated with a 128-dimensional word2vec feature vector derived from its title and abstract. The task associated with this dataset is to predict the subject area of each paper, making it a multi-class classification problem. We sample a subset of 100 nodes with multi-hop neighbors for testing.

A.2 OGBG-MOLX

OGBG-MOLX is part of the Graph Property Prediction track in OGB and comprises two datasets: MOLHIV and MOLPCBA. The MOLHIV dataset contains molecular graphs where the task is to predict whether a molecule inhibits HIV replication, making it a binary classification problem. The MOLPCBA dataset contains molecular graphs with the task of predicting bioactivity against various protein targets, which is a multi-label classification problem. In both datasets, nodes represent atoms and edges represent bonds between atoms. Node and edge features include atom type, atom degree, bond type, and whether the bond is in a ring. We sample 100 graphs with the same number of positive and negative samples for testing.
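The multi-hop neighborhood sampling used for testing (and the 1-hop/2-hop context settings of Table 3) amounts to a bounded breadth-first traversal. The BFS helper below is our own illustrative sketch over an adjacency dictionary, not the paper's code:

```python
# Sketch: collect all nodes within k hops of a start node (BFS).

from collections import deque

def k_hop_neighborhood(adj, start, k):
    """Return all nodes within k hops of `start` (including `start`)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen
```

The text features of the returned node set would then be concatenated or summarized to form the "1-hop"/"2-hop" context for the target node.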
A.3 Wiki

Table 5: Input Design for Different Tasks.

C Cypher Introduction