LLM Multi-Agent Systems: Challenges and Open Problems
Shanshan Han 1 Qifan Zhang 1 Yuhang Yao 2 Weizhao Jin 3 Zhaozhuo Xu 4 Chaoyang He 5
complete some tasks on behalf of the user in the blockchain network.

2. Overview

2.1. Structure of Multi-agent Systems

The structure of multi-agent systems can be categorized into various types based on each agent's functionality and their interactions.

Equi-Level Structure. LLM agents in an equi-level system operate at the same hierarchical level, where each agent has its own role and strategy but neither holds a hierarchical advantage over the other, e.g., DMAS (Chen et al., 2023); see Figure 1(a). The agents in such systems can have the same, neutral, or opposing objectives. Agents with the same goals collaborate towards a common goal without centralized leadership; the emphasis is on collective decision-making and shared responsibilities (Li et al., 2019). With opposing objectives, the agents negotiate or debate to convince the others or to reach a final solution (Terekhov et al., 2023; Du et al., 2023; Liang et al., 2023; Chan et al., 2023).

Hierarchical Structure. Hierarchical structures (Gronauer & Diepold, 2022; Ahilan & Dayan, 2019) typically consist of a leader and one or multiple followers; see Figure 1(b). The leader's role is to guide or plan, while the followers respond or execute based on the leader's instructions. Hierarchical structures are prevalent in scenarios where coordinated efforts directed by a central authority are essential. Multi-agent systems that explore Stackelberg games (Von Stackelberg, 2010; Conitzer & Sandholm, 2006) fall into this category (Harris et al., 2023). This type of game is distinguished by the leader-follower dynamic and the sequential nature of decision-making: the leader first generates an output (e.g., instructions), and the followers then take actions based on the leader's instruction.

Nested Structure. Nested structures, or hybrid structures, combine equi-level and/or hierarchical sub-structures within the same multi-agent system (Chan et al., 2023); see Figure 1(c). The "big picture" of the system can be either equi-level or hierarchical; however, when some agents have to handle complex tasks, they break those tasks into smaller ones, construct a sub-system, either equi-level or hierarchical, and "invite" several agents to help with them. In such systems, the interplay between different levels of hierarchy and peer-to-peer interaction contributes to complexity. The interaction among these different structures can also lead to intricate dynamics, where strategies and responses become complicated due to various influencing factors, including external elements such as context or environment.

Dynamic Structure. Dynamic structures mean that the state of the multi-agent system, e.g., the roles of agents, their relations, and the number of agents in the system, may change over time (Talebirad & Nadiri, 2023). As an example, (Talebirad & Nadiri, 2023) enables the addition and removal of agents so that the system suits the tasks at hand. A multi-agent system may also be contextually adaptive, with the interaction patterns inside the system being modified based on internal system states or external factors, such as contexts. Agents in such systems can dynamically reconfigure their roles and relationships in response to changing conditions.
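To make the hierarchical (and, by composition, nested) structure concrete, the following is a minimal sketch of one leader-follower round. The LLMAgent class, its respond method, and the prompts are hypothetical placeholders for whatever LLM backend a system uses, not a prescribed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LLMAgent:
    """Hypothetical wrapper around an LLM-backed agent with a fixed role."""
    name: str
    role: str  # e.g., "leader" or a follower specialization
    history: list = field(default_factory=list)

    def respond(self, prompt: str) -> str:
        # Placeholder for an actual LLM call; here we only record the exchange.
        self.history.append(prompt)
        return f"[{self.name}/{self.role}] response to: {prompt[:40]}..."

def hierarchical_round(leader: LLMAgent, followers: list[LLMAgent], task: str) -> str:
    """One leader-follower round: the leader plans, followers execute, leader aggregates."""
    plan = leader.respond(f"Decompose the task and assign sub-tasks: {task}")
    results = [f.respond(f"Execute your assigned part of the plan:\n{plan}") for f in followers]
    return leader.respond("Aggregate the follower outputs:\n" + "\n".join(results))

# Usage: a leader coordinating two specialized followers on a single task.
leader = LLMAgent("L0", "leader")
followers = [LLMAgent("F1", "retriever"), LLMAgent("F2", "writer")]
print(hierarchical_round(leader, followers, "summarize recent on-chain activity"))
```

A nested structure would simply let one of the followers act as the leader of its own sub-round.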
2.2. Overview of Challenges in Multi-Agent Systems

This paper surveys various components of multi-agent systems and discusses the challenges they face compared with single-agent systems. We discuss planning and memory management, as well as potential applications of multi-agent systems in distributed systems, e.g., blockchain systems.

Planning. In a single-agent system, planning involves the LLM agent breaking a large task down into a sequence of small, manageable tasks to achieve specific goals efficiently while enhancing interpretability, controllability, and flexibility (Li et al., 2024; Zhang et al., 2023b; Nye et al., 2021; Wei et al., 2022). The agent can also learn to call external APIs for extra information that is missing from the model weights (which are often hard to change after pre-training), or connect LLMs with websites, software, and tools (Patil et al., 2023; Zhou et al., 2023; Cai et al., 2023) to aid reasoning and improve performance. While agents in a multi-agent system have the same capabilities as in single-agent systems, they encounter challenges inherited from the workflow of multi-agent systems. In §3, we discuss partitioning the workflow and allocating the sub-tasks to agents; we name this process "global planning"; see §3.1. We then discuss task decomposition within each single agent. Different from planning in single-agent systems, agents in multi-agent systems must deal with more sophisticated contexts to reach alignment inside the multi-agent system and, further, achieve consistency towards the overall objective; see §3.2.

Memory management. Memory management in single-agent systems includes short-term memory during a conversation, long-term memory that stores historical conversations, and, if any, external data storage that serves as a complementary information source for inference, e.g., RAG (Lewis et al., 2020). Memory management in multi-agent systems must handle complex context data and sophisticated interaction and history information, and thus requires advanced memory designs. We classify the memories involved in multi-agent systems in §4.1 and then discuss potential challenges posed by the sophisticated structure of memory in §4.2.
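As a rough illustration of the three memory tiers listed above, the sketch below keeps a bounded short-term buffer, an append-only long-term log, and a pluggable external store. The class and the keyword-overlap retrieval are simplified assumptions for illustration, not the design of any particular system.

```python
from collections import deque

class AgentMemory:
    """Toy three-tier memory: short-term buffer, long-term log, external store."""

    def __init__(self, short_term_size: int = 8, external_docs: list[str] | None = None):
        self.short_term = deque(maxlen=short_term_size)  # recent turns only
        self.long_term: list[str] = []                    # full conversation history
        self.external_docs = external_docs or []          # e.g., a RAG corpus

    def record(self, message: str) -> None:
        self.short_term.append(message)
        self.long_term.append(message)

    def retrieve_external(self, query: str, k: int = 2) -> list[str]:
        # Naive keyword-overlap retrieval standing in for a vector store.
        q = set(query.lower().split())
        scored = sorted(self.external_docs,
                        key=lambda d: len(q & set(d.lower().split())),
                        reverse=True)
        return scored[:k]

    def build_context(self, query: str) -> str:
        return "\n".join(["Recent turns:", *self.short_term,
                          "Retrieved:", *self.retrieve_external(query)])

memory = AgentMemory(external_docs=["gas fees spiked last week", "validator set changed"])
memory.record("user: monitor the contract")
print(memory.build_context("what happened to gas fees?"))
```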
Application. We discuss applications of multi-agent systems in blockchain, a distributed system that involves sophisticated design of layers and applications. Multi-agent systems can serve as a tool thanks to their ability to handle sophisticated tasks in blockchain; see §5.1. Blockchain can also be integrated with multi-agent systems due to their shared distributed nature, where an intelligent agent can be allocated to a blockchain node to perform sophisticated actions, such as negotiations, on behalf of that node; see §5.2.

3. Planning

Planning in multi-agent systems involves understanding the overall task and designing the workflow among agents based on their roles and specializations (i.e., global planning), and breaking down each agent's task into small, manageable sub-tasks (i.e., local planning). This process must account for the functionalities of the agents, the dynamic interactions among them, and a more complex context compared with single-agent systems. This complexity introduces unique challenges and opportunities in multi-agent systems.

3.1. Global Planning

Global planning refers to understanding the overall task, splitting it into smaller ones, and coordinating the sub-tasks across the agents. It requires careful consideration of task decomposition and agent coordination. Below we discuss the unique challenges of global planning in multi-agent systems.

Designing an effective workflow based on the agents' specializations. Partitioning responsibilities and designing an effective workflow for the agents is crucial for ensuring that each agent's tasks are executable, meaningful, and contribute directly to the overall objective. The biggest challenges lie in the following perspectives: 1) the partition of the workflow should maximize the utilization of each agent's unique capabilities, i.e., each agent handles the part of the task that matches its capabilities and expertise; 2) each agent's tasks must align with the overall goal; and 3) the design must understand and consider the context of the overall task as well as of each agent. This requires a deep understanding of the task at hand and of the specific strengths and limitations of each agent in the system.
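A minimal sketch of such specialization-based assignment is given below; the capability tags, the agent registry, and the overlap-based scoring rule are illustrative assumptions rather than a prescribed planner.

```python
# Toy global planner: route each sub-task to the agent whose declared
# capabilities overlap most with the sub-task's requirements.
AGENTS = {
    "analyst": {"code", "security"},
    "writer": {"summarization", "documentation"},
    "retriever": {"search", "documentation"},
}

def assign(subtasks: dict[str, set[str]]) -> dict[str, str]:
    """Map sub-task name -> agent name by maximum capability overlap."""
    plan = {}
    for task, required in subtasks.items():
        best = max(AGENTS, key=lambda a: len(AGENTS[a] & required))
        plan[task] = best
    return plan

subtasks = {
    "audit contract": {"code", "security"},
    "write report": {"summarization", "documentation"},
}
print(assign(subtasks))  # {'audit contract': 'analyst', 'write report': 'writer'}
```

A real planner would also have to check point 2) above, i.e., verify that each assignment still serves the overall goal, which this sketch does not attempt.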
Introducing loops for a subset of agents to enhance intermediate results. Multi-agent systems can incorporate loops inside one or multiple subsets of agents to improve the quality of intermediate results, i.e., locally optimal answers. In such loops, agents debate or discuss until they reach a result that is accepted by all agents in the loop. The iterative process can refine the intermediate results, leading to a deeper exploration of the task. The agents in the loop can adjust their reasoning processes and plans during the loop, and are thus better at handling the uncertainties of the task.
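The refinement loop described above can be sketched as follows. The acceptance convention (an agent replies "ACCEPT" when satisfied) and the critique/revise prompts are hypothetical stand-ins for an actual debate protocol.

```python
from typing import Callable

def refine_in_loop(agents: list[Callable[[str], str]], draft: str, max_rounds: int = 3) -> str:
    """Iteratively revise a draft until every agent in the loop accepts it."""
    for _ in range(max_rounds):
        critiques = [agent(f"Critique this draft:\n{draft}") for agent in agents]
        # Assumed convention: an agent replies starting with "ACCEPT" when satisfied.
        if all(c.strip().upper().startswith("ACCEPT") for c in critiques):
            break
        # The first agent acts as the editor that merges the critiques.
        draft = agents[0]("Revise the draft to address:\n" + "\n".join(critiques))
    return draft

# Trivial stand-ins: the editor accepts any draft; the reviewer objects once, then accepts.
state = {"round": 0}
def reviewer(prompt: str) -> str:
    state["round"] += 1
    return "Tighten the argument." if state["round"] == 1 else "ACCEPT"
editor = lambda prompt: "ACCEPT" if prompt.startswith("Critique") else "revised draft v2"
print(refine_in_loop([editor, reviewer], "draft v1"))  # -> "revised draft v2"
```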
sions. A crucial concept in game theory is equilibrium, e.g.,
3.1. Global Planning Nash Equilibrium (Kreps, 1989) and Stackelberg Equilib-
rium (Von Stackelberg, 2010; Conitzer & Sandholm, 2006),
Global planning refers to understanding the overall task that describes a state where, given the strategies of others,
and split the task into smaller ones and coordinate the sub- no agent benefits from unilaterally changing their strategy.
tasks to the agents. It requires careful consideration of task Game theory has been applied in multi-agent systems, espe-
decomposition and agent coordination. Below we discuss cially Stackelberg equilibrium (Gerstgrasser & Parkes, 2023;
the unique challenges in global planning in multi-agent Harris et al., 2023), as the structure of Stackelberg equilib-
systems. rium contains is a leader agent and multiple follower agents,
Designing effective work flow based on the agents’ spe- and such hierarchical architectures are wildely considered
cializations. Partitioning responsibilities and designing an in multi-agent systems. As an example, (Gerstgrasser &
effective work flow for agents is crucial for ensuring that the Parkes, 2023) designs a general multi-agent framework to
tasks for each agent are executable while meaningful and identify Stackelberg Equilibrium in Markov games, and
directly contributes to the overall objective in multi-agent (Harris et al., 2023) extend the Stackelberg model to al-
systems. The biggest challenge lies in the following per- low agents to consider external context information, such
spectives: 1) the partition of work flow should maximize as traffic and weather, etc. However, some problems are
the utilization of each agent’s unique capabilities, i.e., each still challenging in multi-agent systems, such as defining an
agent can handle a part of the task that matches its capabili- appropriate payoff structure for both the collective strategy
ties and expertise; 2) each agent’s tasks must align with the and individual agents based on the context of the overall
overall goal; and 3) the design must understand and con- tasks, and efficiently achieving equilibrium states. These
3
(Ongoing Update) LLM Multi-Agent Systems: Challenges and Open Problems
unresolved issues highlight the ongoing need for refinement Aligning Overall Context. Alignment of goals among
in the application of game theory to complex multi-agent different agents is crucial in multi-agent systems. Each LLM
scenarios. agent must have a clear understanding of its role and how it
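As a concrete illustration of the leader-follower logic behind a Stackelberg equilibrium, the toy example below enumerates pure leader commitments in a small bimatrix game and lets the follower best-respond. The payoff numbers are made up for illustration; realistic settings, such as mixed commitments or the Markov games studied in the cited work, are considerably harder.

```python
# Rows: leader actions, columns: follower actions.
# Each cell holds (leader_payoff, follower_payoff); the numbers are illustrative.
PAYOFFS = [
    [(2, 1), (4, 0)],   # leader action 0
    [(1, 0), (3, 2)],   # leader action 1
]

def follower_best_response(leader_action: int) -> int:
    row = PAYOFFS[leader_action]
    return max(range(len(row)), key=lambda j: row[j][1])

def stackelberg_pure_commitment():
    """Leader commits first, anticipating the follower's best response."""
    best = max(range(len(PAYOFFS)),
               key=lambda i: PAYOFFS[i][follower_best_response(i)][0])
    return best, follower_best_response(best)

leader_a, follower_a = stackelberg_pure_commitment()
print(leader_a, follower_a, PAYOFFS[leader_a][follower_a])
# Leader action 1 induces follower action 1, yielding payoffs (3, 2);
# committing to action 0 would push the follower to action 0 and only (2, 1).
```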
3.2. Single-Agent Task Decomposition

Task decomposition in a single agent involves generating a series of intermediate reasoning steps to complete a task or arrive at an answer. This process can be represented as transforming direct ⟨input → output⟩ mappings into ⟨input → rationale → output⟩ mappings (Wei et al., 2022; Zhang et al., 2023b). Task decomposition can take different formats, as follows.

1) Chain-of-Thought (CoT) (Wei et al., 2022), which transforms big tasks into step-by-step manageable tasks to represent an interpretation of the agent's reasoning (or thinking) process.

2) Multiple CoTs (Wang et al., 2022a), which explores multiple independent CoT reasoning paths and returns the one with the best output, e.g., the final answer most consistent across the sampled paths (a minimal sketch of this sampling-and-voting idea follows after this list).

3) Program-of-Thoughts (PoT) (Chen et al., 2022), which uses language models to generate text and programming language statements, and finally an answer.

4) Table-of-Thoughts (Tab-CoT) (Ziqi & Lu, 2023), which utilizes a tabular format for reasoning, enabling the complex reasoning process to be explicitly modelled in a highly structured manner.

5) Tree-of-Thoughts (ToT) (Yao et al., 2023; Long, 2023), which extends CoT by formulating a tree structure to explore multiple reasoning possibilities at each step. It enables generating new thoughts from a given arbitrary thought and possibly backtracking from it.

6) Graph-of-Thoughts-Rationale (GoT-Rationale) (Besta et al., 2023), which explores an arbitrary graph to enable aggregating arbitrary thoughts into a new one and enhancing thoughts using loops.

7) Rationale-Augmented Ensembles (Wang et al., 2022b), which automatically aggregate across diverse rationales to overcome the brittleness of performance caused by sub-optimal rationales.
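Below is a minimal sketch of the sampling-and-voting idea behind Multiple CoTs, referenced from item 2 above. The sample_reasoning_path function is a hypothetical stand-in for one stochastic CoT generation, and the extraction of the final answer is deliberately simplified.

```python
import random
from collections import Counter

def sample_reasoning_path(question: str) -> str:
    """Hypothetical stand-in for one stochastic chain-of-thought generation."""
    # A real system would sample an LLM with temperature > 0; here we fake
    # three possible chains that end in a final answer after 'Answer:'.
    return random.choice([
        "Step 1 ... Step 2 ... Answer: 42",
        "Different reasoning ... Answer: 42",
        "A flawed chain ... Answer: 41",
    ])

def self_consistent_answer(question: str, n_paths: int = 9) -> str:
    """Sample several independent CoT paths and majority-vote the final answers."""
    answers = [sample_reasoning_path(question).rsplit("Answer:", 1)[-1].strip()
               for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("toy question"))  # most often '42'
```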
In multi-agent systems, task decomposition for a single agent becomes more intricate. Each agent must understand layered and sophisticated contexts, including 1) the overall task, 2) the specific context of the agent's individual tasks, and 3) the contextual information provided by other agents in the multi-agent system. Moreover, the agents must align these complex, multi-dimensional contexts with their decomposed tasks to ensure coherent and effective functioning within the overall task. We summarize the challenges for single-agent planning as follows.

Aligning Overall Context. Alignment of goals among different agents is crucial in multi-agent systems. Each LLM agent must have a clear understanding of its role and how it fits into the overall task, so that the agents can perform their functions effectively. Beyond individual roles, agents need to recognize how their tasks fit into the bigger picture, such that their outputs harmonize with the outputs of other agents and, further, all efforts are directed towards the common goal.

Aligning Context Between Agents. Agents in multi-agent systems process tasks collectively, and each agent must understand and integrate the contextual information provided by other agents within the system to ensure that this information is fully utilized.

Aligning Context for Decomposed Tasks. When the tasks of each agent are broken down into smaller, more manageable sub-tasks, aligning the complex context in multi-agent systems becomes challenging. Each agent's decomposed tasks must fit its individual tasks and the overall goal while integrating with the contexts of other agents. Agents must adapt and update their understanding of the task in response to context provided by other agents and, further, plan the decomposed tasks accordingly.

Consistency in Objectives. In multi-agent systems, consistency in objectives must be maintained across various levels, i.e., from the overall goals down to individual agent tasks and their decomposed tasks. Each agent must understand and effectively utilize the layered contexts while ensuring that its task and its decomposed sub-tasks remain aligned with the overall goals. (Harris et al., 2023) extends the Stackelberg model (Von Stackelberg, 2010; Conitzer & Sandholm, 2006) to enable agents to incorporate external context information, such as context (or insights) provided by other agents. However, aligning the complex context with the decomposed tasks during reasoning remains an unresolved issue.

4. Agent Memory and Information Retrieval

The memory in a single-LLM-agent system refers to the agent's ability to record, manage, and utilize data, such as past historical queries and external data sources, to help inference and enhance decision-making and reasoning (Yao et al., 2023; Park et al., 2023; Li & Qiu, 2023; Wang et al., 2023; Guo et al., 2023). While the memory in a single-LLM-agent system primarily focuses on internal data management and utilization, a multi-agent system requires the agents to work collaboratively to complete tasks, necessitating not only the individual memory capabilities of each agent but also a sophisticated mechanism for sharing, integrating, and managing information across the different agents; this poses challenges to memory management and information retrieval.
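One way to picture such a cross-agent sharing mechanism is a shared blackboard-style store in which every entry is tagged with its producing agent, as in the rough sketch below; the structure and the tag-based filtering are assumptions for illustration, not a design surveyed here.

```python
from dataclasses import dataclass
import time

@dataclass
class MemoryEntry:
    agent: str      # which agent produced this entry
    content: str
    timestamp: float

class SharedMemory:
    """Blackboard-style store that agents write to and selectively read from."""

    def __init__(self):
        self.entries: list[MemoryEntry] = []

    def write(self, agent: str, content: str) -> None:
        self.entries.append(MemoryEntry(agent, content, time.time()))

    def read(self, exclude_agent: str | None = None, last_k: int = 5) -> list[str]:
        # An agent typically wants the recent contributions of *other* agents.
        relevant = [e for e in self.entries if e.agent != exclude_agent]
        return [f"{e.agent}: {e.content}" for e in relevant[-last_k:]]

board = SharedMemory()
board.write("planner", "sub-task A assigned to analyst")
board.write("analyst", "sub-task A done: report drafted")
print(board.read(exclude_agent="analyst"))  # only the planner's note
```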
sophisticated tools for various tasks on blockchain and Web3 systems. Also, blockchain nodes can be viewed as agents with specific roles and capabilities (Ankile et al., 2023). Given that both blockchain systems and multi-agent systems are inherently distributed, blockchain networks can be integrated with multi-agent systems seamlessly. By assigning a dedicated agent to each blockchain node, it is possible to enhance data analysis and processing while bolstering security and privacy on the chain.

5.1. Multi-Agent Systems As a Tool

To cast a brick to attract jade, we give some potential directions in which multi-agent systems can act as tools to benefit blockchain systems.

Smart Contract Analysis. Smart contracts are programs stored on a blockchain that run when predetermined conditions are met. Multiple agents can work together to analyze and audit smart contracts. The agents can have different specializations, such as identifying security vulnerabilities, checking legal compliance, and optimizing contract efficiency. Their collaborative analysis can provide a more comprehensive review than a single agent could achieve alone.
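The sketch below illustrates how such a collaborative review might be orchestrated: several reviewer roles each return findings on the same contract source, and a coordinator merges them into one report. The roles, the pattern lists, and the review callable are assumptions for illustration, not an existing auditing tool.

```python
from typing import Callable

ReviewFn = Callable[[str, str], list[str]]  # (role, contract_source) -> findings

def naive_reviewer(role: str, source: str) -> list[str]:
    """Stand-in for an LLM reviewer; real agents would reason over the code."""
    checks = {
        "security": ["call.value", "tx.origin"],   # common Solidity red flags
        "efficiency": ["for (", "storage"],        # rough gas-cost heuristics
        "compliance": ["selfdestruct"],
    }
    return [f"{role}: found '{p}'" for p in checks.get(role, []) if p in source]

def audit(source: str, roles: list[str], review: ReviewFn = naive_reviewer) -> str:
    findings = [f for role in roles for f in review(role, source)]
    return "\n".join(findings) if findings else "No issues flagged."

contract = "function pay() public { msg.sender.call.value(balance)(); }"
print(audit(contract, ["security", "efficiency", "compliance"]))
# -> security: found 'call.value'
```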
Consensus Mechanism Enhancement. Consensus mechanisms such as Proof of Work (PoW) (Gervais et al., 2016) or Proof of Stake (PoS) (Saleh, 2021) are critical for validating transactions and maintaining network integrity. Multi-agent systems can collaborate to monitor network activities, analyze transaction patterns, and identify potential security threats. By working together, these agents can propose enhancements to the consensus mechanism, making the blockchain more secure and efficient.

Fraud Detection. Fraud detection is one of the most important tasks in financial monitoring. As an example, (Ankile et al., 2023) studies fraud detection from the perspective of an external observer who detects price manipulation by analyzing the transaction sequences or the price movements of a specific asset. Multi-agent systems can benefit fraud detection in blockchain as well. Agents can be deployed with different roles, such as monitoring transactions for fraudulent activities and analyzing user behaviors. Each agent could also focus on different behavior patterns to improve the accuracy and efficiency of the fraud detection process.

5.2. Blockchain Nodes as Agents

(Ankile et al., 2023) identifies blockchain nodes as agents and studies fraud detection in the chain from the perspective of an external observer. However, as powerful LLM agents with analyzing and reasoning capabilities, there is much more that such agents can do, especially when combined with game theory to enable the agents to negotiate and debate. Below we provide some perspectives.

Smart Contract Management and Optimization. Smart contracts are programs that execute the terms of a contract between a buyer and a seller in a blockchain system. The code is fixed and self-executes when predetermined conditions are met. Multi-agent systems can automate and optimize the execution of smart contracts with more flexible terms and even dynamic external information from users. Agents can negotiate contract terms on behalf of their users, manage contract execution, and even optimize gas fees (in the context of Ethereum (Wood et al., 2014)). The agents can analyze context information, such as past actions and pre-defined criteria, and utilize the information with more flexibility. Such negotiations can also leverage game theory, such as the Stackelberg equilibrium (Von Stackelberg, 2010; Conitzer & Sandholm, 2006) when there is a leader negotiator, and the Nash equilibrium (Kreps, 1989) when there is no leader.

6. Conclusion

The exploration of multi-agent systems in this paper underscores their significant potential in advancing the capabilities of LLM agents beyond the confines of single-agent paradigms. By leveraging the specialized abilities and collaborative dynamics among agents, multi-agent systems can tackle complex tasks with enhanced efficiency and innovation. Our study has illuminated challenges that need to be addressed to better harness the power of multi-agent systems, including optimizing task planning, managing complex context information, and improving memory management. Furthermore, the potential applications of multi-agent systems in blockchain technologies reveal new avenues for development, suggesting a promising future for these systems in distributed computing environments.
References

Ahilan, S. and Dayan, P. Feudal multi-agent hierarchies for cooperative reinforcement learning. arXiv preprint arXiv:1901.08492, 2019.

Ankile, L., Ferreira, M. X., and Parkes, D. I see you! Robust measurement of adversarial behavior. In Multi-Agent Security Workshop @ NeurIPS'23, 2023.

Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., Lehmann, T., Podstawski, M., Niewiadomski, H., Nyczyk, P., et al. Graph of thoughts: Solving elaborate problems with large language models. arXiv preprint arXiv:2308.09687, 2023.

Cai, T., Wang, X., Ma, T., Chen, X., and Zhou, D. Large language models as tool makers. arXiv preprint arXiv:2305.17126, 2023.

Chan, C.-M., Chen, W., Su, Y., Yu, J., Xue, W., Zhang, S., Fu, J., and Liu, Z. ChatEval: Towards better LLM-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201, 2023.

Chen, W., Ma, X., Wang, X., and Cohen, W. W. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588, 2022.

Chen, Y., Arkin, J., Zhang, Y., Roy, N., and Fan, C. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? arXiv preprint arXiv:2309.15943, 2023.

Conitzer, V. and Sandholm, T. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce, pp. 82–90, 2006.

Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., and Mordatch, I. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023.

Gerstgrasser, M. and Parkes, D. C. Oracles & followers: Stackelberg equilibria in deep multi-agent reinforcement learning. In International Conference on Machine Learning, pp. 11213–11236. PMLR, 2023.

Gervais, A., Karame, G. O., Wüst, K., Glykantzis, V., Ritzdorf, H., and Capkun, S. On the security and performance of proof of work blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 3–16, 2016.

Gronauer, S. and Diepold, K. Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review, pp. 1–49, 2022.

Guo, Z., Cheng, S., Wang, Y., Li, P., and Liu, Y. Prompt-guided retrieval augmentation for non-knowledge-intensive tasks. arXiv preprint arXiv:2305.17653, 2023.

Harris, K., Wu, S., and Balcan, M. F. Stackelberg games with side information. In Multi-Agent Security Workshop @ NeurIPS'23, 2023.

Jinxin, S., Jiabao, Z., Yilei, W., Xingjiao, W., Jiawen, L., and Liang, H. CGMI: Configurable general multi-agent interaction framework. arXiv preprint arXiv:2308.12503, 2023.

Kreps, D. M. Nash equilibrium. In Game Theory, pp. 167–177. Springer, 1989.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.

Li, G., Hammoud, H. A. A. K., Itani, H., Khizbullin, D., and Ghanem, B. CAMEL: Communicative agents for "mind" exploration of large scale language model society. arXiv preprint arXiv:2303.17760, 2023.

Li, X. and Qiu, X. MoT: Memory-of-thought enables ChatGPT to self-improve. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 6354–6374, 2023.

Li, X., Sun, M., and Li, P. Multi-agent discussion mechanism for natural language generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pp. 6096–6103, 2019.

Li, Y., Wen, H., Wang, W., Li, X., Yuan, Y., Liu, G., Liu, J., Xu, W., Wang, X., Sun, Y., et al. Personal LLM agents: Insights and survey about the capability, efficiency and security. arXiv preprint arXiv:2401.05459, 2024.

Liang, T., He, Z., Jiao, W., Wang, X., Wang, Y., Wang, R., Yang, Y., Tu, Z., and Shi, S. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118, 2023.

Long, J. Large language model guided tree-of-thought. arXiv preprint arXiv:2305.08291, 2023.

Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., et al. Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114, 2021.

Park, J. S., O'Brien, J., Cai, C. J., Morris, M. R., Liang, P., and Bernstein, M. S. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pp. 1–22, 2023.

Patil, S. G., Zhang, T., Wang, X., and Gonzalez, J. E. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023.

Saleh, F. Blockchain without waste: Proof-of-stake. The Review of Financial Studies, 34(3):1156–1190, 2021.

Talebirad, Y. and Nadiri, A. Multi-agent collaboration: Harnessing the power of intelligent LLM agents. arXiv preprint arXiv:2306.03314, 2023.

Terekhov, M., Graux, R., Neville, E., Rosset, D., and Kolly, G. Second-order jailbreaks: Generative agents successfully manipulate through an intermediary. In Multi-Agent Security Workshop @ NeurIPS'23, 2023.

Von Stackelberg, H. Market Structure and Equilibrium. Springer Science & Business Media, 2010.
Wang, W., Dong, L., Cheng, H., Liu, X., Yan, X., Gao, J., and Wei, F. Augmenting language models with long-term memory. arXiv preprint arXiv:2306.07174, 2023.

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022a.

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., and Zhou, D. Rationale-augmented ensembles in language models. arXiv preprint arXiv:2207.00747, 2022b.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.

Wood, G. et al. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper, 151(2014):1–32, 2014.

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., and Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023.