

From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape

Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Senior Member, IEEE, and Malka N. Halgamuge, Senior Member, IEEE

arXiv:2312.10868v1 [cs.AI] 18 Dec 2023

Manuscript received December 19, 2023. (Corresponding author: Timothy R. McIntosh.)
Timothy McIntosh is with Academies Australasia Polytechnic, Melbourne, VIC 3000, Australia (e-mail: [email protected]).
Teo Susnjak and Tong Liu are with Massey University, Auckland 0632, New Zealand (e-mail: [email protected]; [email protected]).
Paul Watters is with Cyberstronomy Pty Ltd, Ballarat, VIC 3350, Australia (e-mail: [email protected]).
Malka N. Halgamuge is with RMIT University, Melbourne, VIC 3000, Australia (e-mail: [email protected]).

Abstract—This comprehensive survey explored the evolving landscape of generative Artificial Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts (MoE), multimodal learning, and the speculated advancements towards Artificial General Intelligence (AGI). It critically examined the current state and future trajectory of generative Artificial Intelligence (AI), exploring how innovations like Google's Gemini and the anticipated OpenAI Q* project are reshaping research priorities and applications across various domains, including an impact analysis on the generative AI research taxonomy. It assessed the computational challenges, scalability, and real-world implications of these technologies while highlighting their potential in driving significant progress in fields like healthcare, finance, and education. It also addressed the emerging academic challenges posed by the proliferation of both AI-themed and AI-generated preprints, examining their impact on the peer-review process and scholarly communication. The study highlighted the importance of incorporating ethical and human-centric methods in AI development, ensuring alignment with societal norms and welfare, and outlined a strategy for future AI research that focuses on a balanced and conscientious use of MoE, multimodality, and AGI in generative AI.

Index Terms—AI Ethics, Artificial General Intelligence (AGI), Artificial Intelligence (AI), Gemini, Generative AI, Mixture of Experts (MoE), Multimodality, Q* (Q-star), Research Impact Analysis.

I. INTRODUCTION

THE historical context of AI, tracing back to Alan Turing's "Imitation Game" [1], early computational theories [2], [3], and the development of the first neural networks and machine learning [4], [5], [6], has set the foundation for today's advanced models. This evolution, accentuated by crucial moments such as the rise of deep learning and reinforcement learning, has been vital in shaping the contemporary trends in AI, including the sophisticated Mixture of Experts (MoE) models and multimodal AI systems, illustrating the field's dynamic and continuously evolving character. These advancements are a testament to the dynamic and ever-evolving nature of AI technology. The evolution of Artificial Intelligence (AI) has witnessed a crucial turn with the advent of Large Language Models (LLMs), notably ChatGPT, developed by OpenAI, and the recent unveiling of Google's Gemini [7], [8]. This technology has not only revolutionized the industry and academia, but has also reignited critical discussions concerning AI consciousness and its potential threats to humanity [9], [10], [11]. The development of such advanced AI systems, including notable competitors like Anthropic's Claude, and now Gemini, which demonstrates several advances over previous models like GPT-3 and Google's own LaMDA, has reshaped the research landscape. Gemini's ability to learn from two-way conversations and its "spike-and-slab" attention method, which allows it to focus on relevant parts of the context during multi-turn conversations, represents a significant leap in developing models that are better equipped for multidomain conversational applications1. These innovations in LLMs, including the mixture-of-experts methods employed by Gemini, signal a move towards models that can handle a diversity of inputs and foster multimodal approaches. Amidst this backdrop, speculations of an OpenAI project known as Q* (Q-Star) have surfaced, allegedly combining the power of LLMs with sophisticated algorithms such as Q-learning and A* (A-Star algorithm), further contributing to the dynamic research environment2.

1 https://deepmind.google/technologies/gemini/
2 https://www.forbes.com/sites/lanceeliot/2023/11/26/about-that-mysterious-ai-breakthrough-known-as-q-by-openai-that-allegedly-attains-true-ai-or-is-on-the-path-toward-artificial-general-intelligence-agi

A. Changing AI Research Popularity

As the field of LLMs continues to evolve, exemplified by innovations such as Gemini and Q*, a multitude of studies have surfaced with the aim of charting future research paths, which have varied from identifying emerging trends to highlighting areas poised for swift progress. The dichotomy of established methods and early adoption is evident, with "hot topics" in LLM research increasingly shifting towards multimodal capabilities and conversation-driven learning, as demonstrated by Gemini. The propagation of preprints has expedited knowledge sharing, but also brings the risk of reduced academic scrutiny. Issues like inherent biases, noted by Retraction Watch, along with concerns about plagiarism and forgery, present substantial hurdles [12]. The academic world, therefore, stands at an intersection, necessitating a unified drive
to refine research directions in light of the fast-paced evolution of the field, which appears to be partly traced through the changing popularity of various research keywords over time. The release of generative models like GPT and the widespread commercial success of ChatGPT have been influential. As depicted in Figure 1, the rise and fall of certain keywords appear to have correlated with significant industry milestones, such as the release of the "Transformer" model in 2017 [13], the GPT model in 2018 [14], and the commercial ChatGPT-3.5 in December 2022. For instance, the spike in searches related to "Deep Learning" coincides with the breakthroughs in neural network applications, while the interest in "Natural Language Processing" surges as models like GPT and LLaMA redefine what's possible in language understanding and generation. The enduring attention to "Ethics / Ethical" in AI research, despite some fluctuations, reflects the continuous and deep-rooted concern for the moral dimensions of AI, underscoring that ethical considerations are not merely a reactionary measure, but an integral and persistent dialogue within the AI discussion [15].

It is academically intriguing to postulate whether these trends signify a causal relationship, where technological advancements drive research focus, or if the burgeoning research itself propels technological development. This paper also explores the profound societal and economic impacts of AI advancements. We examine how AI technologies are reshaping various industries, altering employment landscapes, and influencing socio-economic structures. This analysis highlights both the opportunities and challenges posed by AI in the modern world, emphasizing its role in driving innovation and economic growth, while also considering the ethical implications and potential for societal disruption. Future studies could yield more definitive insights, yet the synchronous interplay between innovation and academic curiosity remains a hallmark of AI's progress.

Meanwhile, the exponential increase in the number of preprints posted on arXiv under the Computer Science > Artificial Intelligence (cs.AI) category, as illustrated in Figure 2, appears to signify a paradigm shift in research dissemination within the AI community. While the rapid distribution of findings enables swift knowledge exchange, it also raises concerns regarding the validation of information. The surge in preprints may lead to the propagation of unvalidated or biased information, as these studies do not undergo the rigorous scrutiny and potential retraction typical of peer-reviewed publications [16], [17]. This trend underlines the need for careful consideration and critique in the academic community, especially given the potential for such unvetted studies to be cited and their findings propagated.

B. Objectives

The impetus for this investigation is the official unveiling of Gemini and the speculative discourse surrounding the Q* project, which prompts a timely examination of the prevailing currents in generative AI research. This paper specifically contributes to the understanding of how MoE, multimodality, and Artificial General Intelligence (AGI) are impacting generative AI models, offering detailed analysis and future directions for each of these three key areas. This study does not aim to perpetuate conjecture about the unrevealed Q-Star initiative, but rather to critically appraise the potential for obsolescence or insignificance in extant research themes, whilst concurrently delving into burgeoning prospects within the rapidly transforming LLM panorama. This inquiry is reminiscent of the obsolete nature of encryption-centric or file-entropy-based ransomware detection methodologies, which have been eclipsed by the transition of ransomware collectives towards data theft strategies utilizing varied attack vectors, relegating contemporary studies on crypto-ransomware to the status of latecomers [18], [19]. Advances in AI are anticipated to not only enhance capabilities in language analysis and knowledge synthesis but also to pioneer in areas like Mixture of Experts (MoE) [20], [21], [22], [23], [24], [25], multimodality [26], [27], [28], [29], [30], and Artificial General Intelligence (AGI) [31], [32], [10], [11], and have already heralded the obsolescence of conventional, statistics-driven natural language processing techniques in many domains [8]. Nonetheless, the perennial imperative for AI to align with human ethics and values persists as a fundamental tenet [33], [34], [35], and the conjectural Q-Star initiative offers an unprecedented opportunity to instigate discourse on how such advancements might reconfigure the LLM research topography. Within this milieu, insights from Dr. Jim Fan (senior research scientist & lead of AI agents at NVIDIA) on Q*, particularly concerning the amalgamation of learning and search algorithms, furnish an invaluable perspective on the prospective technical construct and proficiencies of such an undertaking4. Our research methodology involved a structured literature search using key terms like 'Large Language Models' and 'Generative AI'. We utilized filters across several academic databases such as IEEE Xplore, Scopus, ACM Digital Library, ScienceDirect, Web of Science, and ProQuest Central, tailored to identify relevant articles published in the timeframe from 2017 (the release of the "Transformer" model) to 2023 (the writing time of this manuscript). This paper aspires to dissect the technical ramifications of Gemini and Q*, probing how they (and similar technologies whose emergence is now inevitable) may transfigure research trajectories and disclose new vistas in the domain of AI. In doing so, we have pinpointed three nascent research domains—MoE, multimodality, and AGI—that stand to reshape the generative AI research landscape profoundly. This investigation adopts a survey-style approach, systematically mapping out a research roadmap that synthesizes and analyzes the current and emergent trends in generative AI.

4 https://twitter.com/DrJimFan/status/1728100123862004105

The major contributions of this study are as follows:
1) Detailed examination of the evolving landscape in generative AI, emphasizing the advancements and innovations in technologies like Gemini and Q*, and their wide-ranging implications within the AI domain.
2) Analysis of the transformative effect of advanced generative AI systems on academic research, exploring how these developments are altering research methodologies, setting new trends, and potentially leading to the obsolescence of traditional approaches.
3) Thorough assessment of the ethical, societal, and technical challenges arising from the integration of generative AI in academia, underscoring the crucial need for aligning these technologies with ethical norms, ensuring data privacy, and developing comprehensive governance frameworks.

The rest of this paper is organized as follows: Section II explores the historical development of Generative AI. Section III presents a taxonomy of current Generative AI research. Section IV explores the Mixture of Experts (MoE) model architecture, its innovative features, and its impact on transformer-based language models. Section V discusses the speculated capabilities of the Q* project. Section VI discusses the projected capabilities of AGI. Section VII examines the impact of recent advancements on the Generative AI research taxonomy. Section VIII identifies emerging research priorities in Generative AI. Section X discusses the academic challenges of the rapid surge of preprints in AI. The paper concludes in Section XI, summarizing the overall effects of these developments in generative AI.

[Figure 1: Number of search results on Google Scholar with different keywords by year3. Legend keywords: Deep Learning; Transfer Learning; Supervised Learning; Convolutional Neural Network(s); Explainable AI; Natural Language Processing; Unsupervised Learning; Reinforcement Learning; Generative Adversarial Networks; Fine(-)tuning; Ethics / Ethical; Language Model(s). Y-axis: number of search results (100k to 700k); x-axis: year (2011 to 2023).]

3 The legend entries correspond to the keywords used in the search query, which is constructed as: "(AI OR artificial OR (machine learning) OR (neural network) OR computer OR software) AND ([specific keyword])".

[Figure 2: Annual number of preprints posted under the cs.AI category on arXiv.org (y-axis: number of preprints, up to 25,000; x-axis: year).]

II. BACKGROUND: EVOLUTION OF GENERATIVE AI

The ascent of Generative AI has been marked by significant milestones, with each new model paving the way for the next evolutionary leap. From single-purpose algorithms to LLMs like OpenAI's ChatGPT and the latest multimodal systems, the AI landscape has been transformed, while countless other fields have been disrupted.

A. The Evolution of Language Models

Language models have undergone a transformative journey (Fig. 3), evolving from rudimentary statistical methods to the
complex neural network architectures that underpin today's LLMs [36], [37]. This evolution has been driven by a relentless quest for models that more accurately reflect the nuances of human language, as well as the desire to push the boundaries of what machines can understand and generate [36], [38], [37]. However, this rapid advancement has not been without its challenges. As language models have grown in capability, so too have the ethical and safety concerns surrounding their use, prompting a reevaluation of how these models are developed and the purposes for which they are employed [36], [39], [40].

[Figure 3: Timeline of Key Developments in Language Model Evolution. Entries: 1980s: Statistical Models (n-grams); 1990s: Adoption in NLP, n-gram Usage; 1997: Introduction of LSTMs; 2000s: LSTMs in Text/Voice Processing; 2010s: Deep Learning Era, GPT, BERT; 2020s: LLaMA, Gemini; ChatGPT Launch.]

1) Language Models as Precursors: The inception of language modeling can be traced to the statistical approaches of the late 1980s, a period marked by a transition from rule-based to machine learning algorithms in Natural Language Processing (NLP) [41], [42], [43], [44], [45]. Early models, primarily n-gram based, calculated the probability of word sequences in a corpus, thus providing a rudimentary understanding of language structure [41]. Those models, simplistic yet groundbreaking, laid the groundwork for future advances in language understanding. With the increase of computational power, the late 1980s witnessed a revolution in NLP, pivoting towards statistical models capable of 'soft' probabilistic decisions, as opposed to the rigid, 'handwritten' rule-based systems that dominated early NLP systems [43]. IBM's development of complicated statistical models throughout this period signified the growing importance and success of these approaches.

In the subsequent decade, the popularity and applicability of statistical models surged, proving invaluable in managing the flourishing flow of digital text. The 1990s saw statistical methods firmly established in NLP research, with n-grams becoming instrumental in numerically capturing linguistic patterns. The introduction of Long Short-Term Memory (LSTM) networks in 1997 [46], and their application to voice and text processing a decade later [47], [48], [49], marked a significant milestone, leading to the current era where neural network models represent the cutting edge of NLP research and development.
2) Large Language Models: Technical Advancement and Commercial Success: The advent of deep learning has revolutionized the field of NLP, leading to the development of LLMs like GPT, BERT, and notably, OpenAI's ChatGPT. Recent models such as GPT-4 and LLaMA have pushed the boundaries by integrating sophisticated techniques like transformer architectures and advanced natural language understanding, illustrating the rapid evolution in this field [37]. These models represent a significant leap in NLP capabilities, leveraging vast computational resources and extensive datasets to achieve new heights in language understanding and generation [37], [50]. ChatGPT has shown impressive conversational skills and contextual understanding with a broad spectrum of functional uses in many areas, as evidenced by its technical and commercial success, including rapid adoption by over 100 million users shortly after launch, which underscores a robust market demand for natural language AI and has catalyzed interdisciplinary research into its applications in sectors like education, healthcare, and commerce [8], [50], [51], [52], [53]. In education, ChatGPT offers innovative approaches to personalized learning and interactive teaching [54], [51], [55], [56], while in commerce, it revolutionizes customer service and content creation [57], [58]. The widespread use of ChatGPT, Google Bard, Anthropic Claude and similar commercial LLMs has reignited important debates in the field of AI, particularly concerning AI consciousness and safety, as its human-like interaction capabilities raise significant ethical questions and highlight the need for robust governance and safety measures in AI development [59], [31], [32], [11]. Such influence appears to extend beyond its technical achievements, shaping cultural and societal discussions about the role and future of AI in our world.

The advancements in LLMs, including the development of models like GPT and BERT, have paved the way for the conceptualization of Q*. Specifically, the scalable architecture and extensive training data that characterize these models are foundational to the proposed capabilities of Q*. The success of ChatGPT in contextual understanding and conversational AI, for example, informs the design principles of Q*, suggesting a trajectory towards more sophisticated, context-aware, and adaptive language processing capabilities. Similarly, the emergence of multimodal systems like Gemini, capable of integrating text, images, audio, and video, reflects an evolutionary path that Q* could extend, combining the versatility of LLMs with advanced learning and pathfinding algorithms for a more holistic AI solution.

3) Fine-tuning, Hallucination Reduction, and Alignment in LLMs: The advancement of LLMs has underlined the significance of fine-tuning [60], [61], [62], [63], hallucination reduction [64], [65], [66], [67], and alignment [68], [69], [70], [71], [72]. These aspects are crucial in enhancing the functionality and reliability of LLMs. Fine-tuning, which involves adapting pre-trained models to specific tasks, has seen significant progress: techniques like prompt-based and few-shot learning [73], [74], [75], [76], alongside supervised fine-tuning on specialized datasets [60], [77], [78], [79], have enhanced the adaptability of LLMs in various contexts, but challenges remain, particularly in bias mitigation and the
generalization of models across diverse tasks [60], [80], [72]. Hallucination reduction is a persistent challenge in LLMs, characterized by the generation of confident but factually incorrect information [36]. Strategies such as confidence penalty regularization during fine-tuning have been implemented to mitigate overconfidence and improve accuracy [81], [82], [83]. Despite these efforts, the complexity of human language and the breadth of topics make completely eradicating hallucinations a daunting task, especially in culturally sensitive contexts [36], [9]. Alignment, ensuring LLM outputs are congruent with human values and ethics, is an area of ongoing research. Innovative approaches, from constrained optimization [84], [85], [86], [87], [88], to different types of reward modeling [89], [90], [91], [92], aim to embed human preferences within AI systems. While advancements in fine-tuning, hallucination reduction, and alignment have propelled LLMs forward, these areas still present considerable challenges. The complexity of aligning AI with the diverse spectrum of human ethics and the persistence of hallucinations, particularly on culturally sensitive topics, highlight the need for continued interdisciplinary research in the development and application of LLMs [9].
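As one hedged illustration of the confidence-penalty idea mentioned above, the sketch below adds a negative-entropy term to a standard cross-entropy loss so that over-confident output distributions are penalized during fine-tuning. It is a generic PyTorch-style example under our own assumptions (the beta weight is a hypothetical hyperparameter), not the exact formulation of any cited work.

# Sketch of a confidence penalty: the usual cross-entropy loss minus a small
# entropy bonus, which discourages over-confident output distributions.
import torch
import torch.nn.functional as F

def loss_with_confidence_penalty(logits, targets, beta=0.1):
    ce = F.cross_entropy(logits, targets)             # standard token/label loss
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy                        # higher entropy => lower loss

logits = torch.randn(4, 10, requires_grad=True)       # batch of 4 examples, 10 classes
targets = torch.tensor([1, 3, 5, 7])
loss_with_confidence_penalty(logits, targets).backward()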
4) Mixture of Experts: A Paradigm Shift: The adoption of the MoE architecture in LLMs marks a critical evolution in AI technology. This innovative approach, exemplified by advanced models like Google's Switch Transformer5 and MistralAI's Mixtral-8x7B6, leverages multiple transformer-based expert modules for dynamic token routing, enhancing modeling efficiency and scalability. The primary advantage of MoE lies in its ability to handle vast parameter scales, reducing memory footprint and computational costs significantly [93], [94], [95], [96], [97]. This is achieved through model parallelism across specialized experts, allowing the training of models with trillions of parameters, and its specialization in handling diverse data distributions enhances its capability in few-shot learning and other complex tasks [94], [95]. To illustrate the practicality of MoE, consider its application in healthcare. For example, an MoE-based system could be used for personalized medicine, where different 'expert' modules specialize in various aspects of patient data analysis, including genomics, medical imaging, and electronic health records. This approach could significantly enhance diagnostic accuracy and treatment personalization. Similarly, in finance, MoE models can be deployed for risk assessment, where experts analyze distinct financial indicators, market trends, and regulatory compliance factors.

5 https://huggingface.co/google/switch-c-2048
6 https://huggingface.co/mistralai/Mixtral-8x7B-v0.1

Despite its benefits, MoE confronts challenges in dynamic routing complexity [98], [99], [100], [101], [102], expert imbalance [103], [104], [105], [106], and probability dilution [107], and such technical hurdles demand sophisticated solutions to fully harness MoE's potential. Moreover, while MoE may offer performance gains, it does not inherently solve ethical alignment issues in AI [108], [109], [110]. The complexity and specialization of MoE models can obscure the decision-making processes, complicating efforts to ensure ethical compliance and alignment with human values [108], [111]. Although the paradigm shift to MoE signifies a major leap in LLM development, offering significant scalability and specialization advantages, ensuring the safety, ethical alignment, and transparency of these models remains a paramount concern. The MoE architecture, while technologically advanced, entails continued interdisciplinary research and governance to align AI with broader societal values and ethical standards.
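The following is a deliberately simplified, single-device sketch of the dynamic token routing described above: a learned gate scores the experts and each token is processed by its top-k experts, with the outputs mixed by the gate weights. Production MoE systems such as Switch Transformer add expert-capacity limits, load-balancing auxiliary losses, and cross-device model parallelism, all of which are omitted here; the layer sizes are arbitrary assumptions.

# Simplified sketch of MoE token routing: a learned gate picks top-k experts
# per token and mixes their outputs. Omits capacity limits, load-balancing
# losses, and the cross-device parallelism used by production systems.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                                # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)         # routing probabilities per token
        topv, topi = scores.topk(self.k, dim=-1)         # k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx, w = topi[:, slot], topv[:, slot:slot + 1]
            for e, expert in enumerate(self.experts):    # dense loop for clarity; real
                mask = idx == e                          # systems dispatch tokens in parallel
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(8, 64)).shape)  # torch.Size([8, 64])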
B. Multimodal AI and the Future of Interaction

The advent of multimodal AI marks a transformative era in AI development, revolutionizing how machines interpret and interact with a diverse array of human sensory inputs and contextual data.

1) Gemini: Redefining Benchmarks in Multimodality: Gemini, a pioneering multimodal conversational system, marks a significant shift in AI technology by surpassing traditional text-based LLMs like GPT-3 and even its multimodal counterpart, ChatGPT-4. Gemini's architecture has been designed to incorporate the processing of diverse data types such as text, images, audio, and video, a feat facilitated by its unique multimodal encoder, cross-modal attention network, and multimodal decoder [112]. The architectural core of Gemini is its dual-encoder structure, with separate encoders for visual and textual data, enabling sophisticated multimodal contextualization [112]. This architecture is believed to surpass the capabilities of single-encoder systems, allowing Gemini to associate textual concepts with image regions and achieve a compositional understanding of scenes [112]. Furthermore, Gemini integrates structured knowledge and employs specialized training paradigms for cross-modal intelligence, setting new benchmarks in AI [112]. In [112], Google has claimed and demonstrated that Gemini distinguishes itself from ChatGPT-4 through several key features:
• Breadth of Modalities: Unlike ChatGPT-4, which primarily focuses on text, documents, images, and code, Gemini handles a wider range of modalities including audio and video. This extensive range allows Gemini to tackle complex tasks and understand real-world contexts more effectively.
• Performance: Gemini Ultra excels in key multimodality benchmarks, notably in massive multitask language understanding (MMLU), which encompasses a diverse array of domains like science, law, and medicine, outperforming ChatGPT-4.
• Scalability and Accessibility: Gemini is available in three tailored versions – Ultra, Pro, and Nano – catering to a range of applications from data centers to on-device tasks, a level of flexibility not yet seen in ChatGPT-4.
• Code Generation: Gemini's proficiency in understanding and generating code across various programming languages is more advanced, offering practical applications beyond ChatGPT-4's capabilities.
• Transparency and Explainability: A focus on explainability sets Gemini apart, as it provides justifications for its outputs, enhancing user trust and understanding of the AI's reasoning process.

Despite these advancements, Gemini's real-world performance in complex reasoning tasks that require integration
of commonsense knowledge across modalities remains to be thoroughly evaluated.
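Gemini's internals are not public beyond the high-level description in [112], so the following is only a generic cross-modal attention block of the kind commonly used to let text tokens attend to image-patch features from a separate vision encoder. It illustrates the "cross-modal attention network" idea rather than Gemini's actual design; the dimensions and the stand-in inputs are arbitrary assumptions.

# Generic cross-modal attention sketch: text tokens (queries) attend to image
# patch features (keys/values). Textbook-style block, NOT Gemini's architecture.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_tokens, image_patches):
        # text_tokens: (batch, n_text, d_model); image_patches: (batch, n_patches, d_model)
        attended, _ = self.attn(query=text_tokens, key=image_patches, value=image_patches)
        return self.norm(text_tokens + attended)   # residual connection over the text stream

text = torch.randn(2, 16, 256)    # stand-in output of a text encoder (assumed)
image = torch.randn(2, 49, 256)   # stand-in 7x7 patch grid from a vision encoder (assumed)
print(CrossModalBlock()(text, image).shape)  # torch.Size([2, 16, 256])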
2) Technical Challenges in Multimodal Systems: The development of multimodal AI systems faces several technical hurdles, including creating robust and diverse datasets, managing scalability, and enhancing user trust and system interpretability [113], [114], [115]. Challenges like data skew and bias are prevalent due to data acquisition and annotation issues, which requires effective dataset management by employing strategies such as data augmentation, active learning, and transfer learning [113], [116], [80], [115]. A significant challenge is the computational demands of processing various data streams simultaneously, requiring powerful hardware and optimized model architectures for multiple encoders [117], [118]. Advanced algorithms and multimodal attention mechanisms are needed to balance attention across different input media and resolve conflicts between modalities, especially when they provide contradictory information [119], [120], [118]. Scalability issues, due to the extensive computational resources needed, are exacerbated by limited high-performance hardware availability [121], [122]. There is also a pressing need for calibrated multimodal encoders for compositional scene understanding and data integration [120]. Refining evaluation metrics for these systems is necessary to accurately assess performance in real-world tasks, calling for comprehensive datasets and unified benchmarks, and for enhancing user trust and system interpretability through explainable AI in multimodal contexts. Addressing these challenges is vital for the advancement of multimodal AI systems, enabling seamless and intelligent interaction aligned with human expectations.

3) Multimodal AI: Beyond Text in Ethical and Social Contexts: The expansion of multimodal AI systems introduces both benefits and complex ethical and social challenges that extend beyond those faced by text-based AI. In commerce, multimodal AI can transform customer engagement by integrating visual, textual, and auditory data [123], [124], [125]. For autonomous vehicles, multimodality can enhance safety and navigation by synthesizing data from various sensors, including visual, radar, and Light Detection and Ranging (LIDAR) [126], [125], [127]. Still, DeepFake technology's ability to generate convincingly realistic videos, audio, and images is a critical concern in multimodality, as it poses risks of misinformation and manipulation that significantly impact public opinion, political landscapes, and personal reputations, thereby compromising the authenticity of digital media and raising issues in social engineering and digital forensics where distinguishing genuine from AI-generated content becomes increasingly challenging [128], [129]. Privacy concerns are amplified in multimodal AI due to its ability to process and correlate diverse data sources, potentially leading to intrusive surveillance and profiling, which raises questions about the consent and rights of individuals, especially when personal media is used without permission for AI training or content creation [113], [130], [131]. Moreover, multimodal AI can propagate and amplify biases and stereotypes across different modalities, and if unchecked, this can perpetuate discrimination and social inequities, making it imperative to address algorithmic bias effectively [132], [133], [134]. The ethical development of multimodal AI systems requires robust governance frameworks focusing on transparency, consent, data handling protocols, and public awareness, while ethical guidelines must evolve to address the unique challenges posed by these technologies, including setting standards for data usage and safeguarding against the nonconsensual exploitation of personal information [135], [136]. Additionally, the development of AI literacy programs will be crucial in helping society understand and responsibly interact with multimodal AI technologies [113], [135]. As the field progresses, interdisciplinary collaboration will be key in ensuring these systems are developed and deployed in a manner that aligns with societal values and ethical principles [113].

C. Speculative Advances and Chronological Trends

In the dynamic landscape of AI, the speculative capabilities of the Q* project, blending LLMs, Q-learning, and A* (A-Star algorithm), embody a significant leap forward. This section explores the evolutionary trajectory from game-centric AI systems to the broad applications anticipated with Q*.

1) From AlphaGo's Groundtruth to Q-Star's Exploration: The journey from AlphaGo, a game-centric AI, to the conceptual Q-Star project represents a significant paradigm shift in AI. AlphaGo's mastery in the game of Go highlighted the effectiveness of deep learning and tree search algorithms within well-defined rule-based environments, underscoring the potential of AI in complex strategy and decision-making [137], [138]. Q-Star, however, is speculated to move beyond these confines, aiming to amalgamate the strengths of reinforcement learning (as seen in AlphaGo), with the knowledge, NLG, creativity and versatility of LLMs, and the strategic efficiency of pathfinding algorithms like A*. This blend, merging pathfinding algorithms and LLMs, could enable AI systems to transcend board game confines and, with Q-Star's natural language processing, interact with human language, enabling nuanced interactions and marking a leap towards AI adept in both structured tasks and complex human-like communication and reasoning. Moreover, the incorporation of Q-learning and A* algorithms would enable Q-Star to optimize decision paths and learn from its interactions, making it more adaptable and intelligent over time. The combination of these technologies could lead to AI that is not only more efficient in problem-solving but also creative and insightful in its approach. This speculative advancement from the game-focused power of AlphaGo to the comprehensive potential of Q-Star illustrates the dynamic and ever-evolving nature of AI research, and opens up possibilities for AI applications that are more integrated with human life and capable of handling a broader range of tasks with greater autonomy and sophistication.
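Because Q* remains unconfirmed, any code can only illustrate the general "learning plus search" pattern the section describes, not an actual system. The toy sketch below runs A* over a tiny hand-made graph in which the heuristic is supplied by a learned value table (the kind of estimate Q-learning, or an LLM-based scorer, might provide); the graph, values, and names are entirely hypothetical.

# Purely illustrative toy: A* search over a small graph where the heuristic is
# a learned value table. This only concretizes the "search + learning" idea
# discussed above; it does not describe any actual OpenAI system.
import heapq

graph = {  # node -> list of (neighbor, step_cost); a made-up example
    "start": [("a", 1), ("b", 4)],
    "a": [("goal", 5)],
    "b": [("goal", 1)],
    "goal": [],
}
learned_value = {"start": 3, "a": 5, "b": 1, "goal": 0}  # stand-in for a learned heuristic

def a_star(start, goal):
    frontier = [(learned_value[start], 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in graph[node]:
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + learned_value[nxt], ng, nxt, path + [nxt]))
    return None, float("inf")

print(a_star("start", "goal"))  # (['start', 'b', 'goal'], 5)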
2) Bridging Structured Learning with Creativity: The anticipated Q* project, blending Q-learning and A* algorithms with the creativity of LLMs, embodies a groundbreaking step in AI, potentially surpassing recent innovations like Gemini. The fusion suggested in Q* points to an integration of structured, goal-oriented learning with generative, creative capabilities, a combination that could transcend the existing achievements of Gemini. While Gemini represents a significant leap in multimodal AI, combining various forms of data inputs such
as text, images, audio, and video, Q* is speculated to bring a more profound integration of creative reasoning and structured problem-solving. This would be achieved by merging the precision and efficiency of algorithms like A* with the learning adaptability of Q-learning, and the complex understanding of human language and context offered by LLMs. Such an integration could enable AI systems to not only process and analyze complex multimodal data but also to autonomously navigate through structured tasks while engaging in creative problem-solving and knowledge generation, mirroring the multifaceted nature of human cognition. The implications of this potential advancement are vast, suggesting applications that span beyond the capabilities of current multimodal systems like Gemini. By aligning the deterministic aspects of traditional AI algorithms with the creative and generative potential of LLMs, Q* could offer a more holistic approach to AI development. This could bridge the gap between the logical, rule-based processing of AI and the creative, abstract thinking characteristic of human intelligence. The anticipated unveiling of Q*, merging structured learning techniques and creative problem-solving in a singular, advanced framework, holds the promise of not only extending but also significantly surpassing the multimodal capabilities of systems like Gemini, thus heralding another game-changing era in the domain of generative AI, showcasing its potential as a crucial development eagerly awaited in the ongoing evolution of AI.

III. THE CURRENT GENERATIVE AI RESEARCH TAXONOMY

The field of Generative AI is evolving rapidly, which necessitates a comprehensive taxonomy that encompasses the breadth and depth of research within this domain. Detailed in Table I, this taxonomy categorizes the key areas of inquiry and innovation in generative AI, and serves as a foundational framework to understand the current state of the field, guiding through the complexities of evolving model architectures, advanced training methodologies, diverse application domains, ethical implications, and the frontiers of emerging technologies.

A. Model Architectures

Generative AI model architectures have seen significant developments, with four key domains standing out:
• Transformer Models: Transformer models have significantly revolutionized the field of AI, especially in NLP, due to their higher efficiency and scalability [139], [140], [141]. They employ advanced attention mechanisms to achieve enhanced contextual processing, allowing for more subtle understanding and interaction [142], [143], [144] (a minimal attention sketch follows this list). These models have also made notable strides in computer vision, as evidenced by the development of vision transformers like EfficientViT [145], [146] and YOLOv8 [147], [148], [149]. These innovations symbolize the extended capabilities of transformer models in areas such as object detection, offering not only improved performance but also increased computational efficiency.
• Recurrent Neural Networks (RNNs): RNNs excel in the realm of sequence modeling, making them particularly effective for tasks involving language and temporal data, as their architecture is specifically designed to process sequences of data, such as text, enabling them to capture the context and order of the input effectively [150], [151], [152], [153], [154]. This proficiency in handling sequential information renders them indispensable in applications that require a deep understanding of the temporal dynamics within data, such as natural language tasks and time-series analysis [155], [156]. RNNs' ability to maintain a sense of continuity over sequences is a critical asset in the broader field of AI, especially in scenarios where context and historical data play crucial roles [157].
• Mixture of Experts (MoE): MoE models can significantly enhance efficiency by deploying model parallelism across multiple specialized expert modules, which enables these models to leverage transformer-based modules for dynamic token routing, and to scale to trillions of parameters, thereby reducing both memory footprint and computational costs [94], [98]. MoE models stand out for their ability to divide computational loads among various experts, each specializing in different aspects of the data, which allows for handling vast scales of parameters more effectively, leading to a more efficient and specialized handling of complex tasks [94], [21].
• Multimodal Models: Multimodal models, which integrate a variety of sensory inputs such as text, vision, and audio, are crucial in achieving a comprehensive understanding of complex data sets, particularly transformative in fields like medical imaging [113], [112], [115]. These models facilitate accurate and data-efficient analysis by employing multi-view pipelines and cross-attention blocks [158], [159]. This integration of diverse sensory inputs allows for a more nuanced and detailed interpretation of data, enhancing the model's ability to accurately analyze and understand various types of information [160]. The combination of different data types, processed concurrently, enables these models to provide a holistic view, making them especially effective in applications that require a deep and multifaceted understanding of complex scenarios [113], [161], [162], [160].
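As referenced in the Transformer Models item above, the core of the attention mechanism can be shown in a few lines. The sketch below is the generic single-head scaled dot-product form, without masking or multi-head projections; the shapes are arbitrary assumptions, and it is a textbook illustration rather than the implementation of any specific model.

# Minimal scaled dot-product attention, the mechanism referenced in the
# Transformer item above. Generic single-head form; shapes are assumptions.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q: (n_queries, d); k, v: (n_keys, d) -> output: (n_queries, d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)   # each query's distribution over keys
    return weights @ v

q = torch.randn(4, 16)
k = torch.randn(10, 16)
v = torch.randn(10, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([4, 16])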
B. Training Techniques

The training of generative AI models leverages four key techniques, each contributing uniquely to the field:
• Supervised Learning: Supervised learning, a foundational approach in AI, uses labeled datasets to guide models towards accurate predictions, and it has been integral to various applications, including image recognition and NLP [163], [164], [165]. Recent advancements have focused on developing sophisticated loss functions and regularization techniques, aimed at enhancing the performance and generalization capabilities of supervised learning models, ensuring they remain robust and effective across a wide range of tasks and data types [166], [167], [168].

Table I: Comprehensive Taxonomy of Current Generative AI and LLM Research

Domain | Subdomain | Key Focus | Description
Model Architecture | Transformer Models | Efficiency, Scalability | Optimizing network structures for faster processing and larger datasets.
Model Architecture | Recurrent Neural Networks | Sequence Processing | Handling sequences of data, like text, for improved contextual understanding.
Model Architecture | Mixture of Experts | Specialization, Efficiency | Leveraging multiple expert modules for enhanced efficiency and task-specific performance.
Model Architecture | Multimodal Models | Sensory Integration | Integrating text, vision, and audio inputs for comprehensive understanding.
Training Techniques | Supervised Learning | Data Labeling, Accuracy | Using labeled datasets to train models for precise predictions.
Training Techniques | Unsupervised Learning | Pattern Discovery | Finding patterns and structures from unlabeled data.
Training Techniques | Reinforcement Learning | Adaptability, Optimization | Training models through feedback mechanisms for optimal decision-making.
Training Techniques | Transfer Learning | Versatility, Generalization | Applying knowledge gained in one task to different but related tasks.
Application Domains | Natural Language Understanding | Comprehension, Contextualization | Enhancing the ability to understand and interpret human language in context.
Application Domains | Natural Language Generation | Creativity, Coherence | Generating coherent and contextually relevant text responses.
Application Domains | Conversational AI | Interaction, Naturalness | Developing systems for natural and contextually relevant human-computer conversations.
Application Domains | Creative AI | Innovation, Artistic Generation | Generating creative content, including text, art, and music.
Compliance and Ethical Considerations | Bias Mitigation | Fairness, Representation | Addressing and reducing biases in AI outputs.
Compliance and Ethical Considerations | Data Security | Data Protection, Confidentiality | Ensuring data confidentiality, integrity and availability security in AI models and outputs.
Compliance and Ethical Considerations | AI Ethics | Fairness, Accountability | Addressing ethical issues such as bias, fairness, and accountability in AI systems.
Compliance and Ethical Considerations | Privacy Preservation | Privacy Compliance, Anonymization | Protecting data privacy in model training and outputs.
Advanced Learning | Self-supervised Learning | Autonomy, Efficiency | Utilizing unlabeled data for model training, enhancing learning efficiency.
Advanced Learning | Meta-learning | Rapid Adaptation | Enabling AI models to quickly adapt to new tasks with minimal data.
Advanced Learning | Fine Tuning | Domain-Specific Tuning, Personalization | Adapting models to specific domains or user preferences for enhanced relevance and accuracy.
Advanced Learning | Human Value Alignment | Ethical Integration, Societal Alignment | Aligning AI outputs with human ethics and societal norms, ensuring decisions are ethically and socially responsible.
Emerging Trends | Multimodal Learning | Integration with Vision, Audio | Combining language models with other sensory data types for richer understanding.
Emerging Trends | Interactive and Cooperative AI | Collaboration, Human-AI Interaction | Enhancing AI's ability to work alongside humans in collaborative tasks.
Emerging Trends | AGI Development | Holistic Understanding | Pursuing the development of AI systems with comprehensive, human-like understanding.
Emerging Trends | AGI Containment | Safety Protocols, Control Mechanisms | Developing methods to contain and control AGI systems to prevent unintended consequences.

• Unsupervised Learning: Unsupervised learning is essential in AI for uncovering patterns within unlabeled data, a process central to tasks like feature learning and clustering [169], [170]. This method has seen significant advancements with the introduction of autoencoders [171], [172] and Generative Adversarial Networks (GANs) [173], [174], [175], which have notably expanded unsupervised learning's applicability, enabling more sophisticated data generation and representation learning capabilities. Such innovations are crucial for understanding and leveraging the complex structures often inherent in unstructured datasets, highlighting the growing versatility and depth of unsupervised learning techniques.
• Reinforcement Learning: Reinforcement learning, characterized by its adaptability and optimization capabilities, has become increasingly vital in decision-making and autonomous systems [176], [177]. This training technique has undergone significant advancements, particularly with the development of Deep Q-Networks (DQN) [178], [179], [180] and Proximal Policy Optimization (PPO) algorithms [181], [182], [183]. These enhancements have been crucial in improving the efficacy and applicability of reinforcement learning, especially in complex and dynamic environments. By optimizing decisions and policies through interactive feedback loops, reinforcement learning has established itself as a crucial tool for training AI systems in scenarios that demand a high degree of adaptability and precision in decision-making [184], [185] (a minimal illustration of the underlying update rule is sketched after this list).
• Transfer Learning: Transfer learning emphasizes versatility and efficiency in AI training, allowing models to apply knowledge acquired from one task to different
yet related tasks, which significantly reduces the need for large labeled datasets [186], [187]. Transfer learning, through the use of pre-trained networks, streamlines the training process by allowing models to be efficiently fine-tuned for specific applications, thereby enhancing adaptability and performance across diverse tasks, and proving particularly beneficial in scenarios where acquiring extensive labeled data is impractical or unfeasible [188], [189].
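As referenced in the reinforcement-learning item above, the core feedback-driven update behind methods such as DQN can be shown in its classical tabular form. The sketch below is the generic textbook Q-learning update with made-up states and actions; deep variants such as DQN replace the table with a neural network.

# Minimal tabular Q-learning update, illustrating the feedback-driven
# optimization described in the reinforcement-learning item above.
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    td_target = reward + gamma * best_next        # bootstrapped target from the next state
    Q[state][action] += alpha * (td_target - Q[state][action])

Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 0.0, "right": 0.0}}
q_update(Q, "s0", "right", reward=1.0, next_state="s1")
print(Q["s0"]["right"])  # 0.1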
C. Application Domains

The application domains of Generative AI are remarkably diverse and evolving, encompassing both established and emerging areas of research and application. These domains have been significantly influenced by recent advancements in AI technology and the expanding scope of AI applications.
• Natural Language Understanding (NLU): NLU is central to enhancing the comprehension and contextualization of human language in AI systems, and involves key capabilities such as semantic analysis, named entity recognition, sentiment analysis, textual entailment, and machine reading comprehension [190], [191], [192], [193]. Advances in NLU have been crucial in improving AI's proficiency in interpreting and analyzing language across a spectrum of contexts, ranging from straightforward conversational exchanges to intricate textual data [190], [192], [193]. NLU is fundamental in applications like sentiment analysis, language translation, information extraction, and more [194], [195], [196]. Recent advancements have prominently featured large transformer-based models like BERT and GPT-3, which have significantly advanced the field by enabling a deeper and more complex understanding of language subtleties [197], [198].
• Natural Language Generation (NLG): NLG emphasizes the training of models to generate coherent, contextually-relevant, and creative text responses, a critical component in chatbots, virtual assistants, and automated content creation tools [199], [36], [200], [201]. NLG encompasses challenges such as topic modeling, discourse planning, concept-to-text generation, style transfer, and controllable text generation [36], [202]. The recent surge in NLG capabilities, exemplified by advanced models like GPT-3, has significantly enhanced the sophistication and nuance of text generation, which enable AI systems to produce text that closely mirrors human writing styles, thereby broadening the scope and applicability of NLG in various interactive and creative contexts [203], [55], [51].
• Conversational AI: This subdomain is dedicated to developing AI systems capable of smooth, natural, and context-aware human-computer interactions, by focusing on dialogue modeling, question answering, user intent recognition, and multi-turn context tracking [204], [205], [206], [207]. In finance and cybersecurity, AI's predictive analytics have transformed risk assessment and fraud detection, leading to more secure and efficient operations [205], [19]. The advancements in this area, demonstrated by large pre-trained models like Meena7 and BlenderBot8, have significantly enhanced the empathetic and responsive capabilities of AI interactions. These systems not only improve user engagement and satisfaction, but also maintain the flow of conversation over multiple turns, providing coherent, contextually relevant, and engaging experiences [208], [209].
• Creative AI: This emerging subdomain spans across text, art, music, and more, pushing the boundaries of AI's creative and innovative potential across various modalities including images, audio, and video, by engaging in the generation of artistic content, encompassing applications in idea generation, storytelling, poetry, music composition, visual arts, and creative writing, and has resulted in commercial success like MidJourney and DALL-E [210], [211], [212]. The challenges in this field involve finding suitable data representations, algorithms, and evaluation metrics to effectively assess and foster creativity [212], [213]. Creative AI serves not only as a tool for automating and enhancing artistic processes, but also as a medium for exploring new forms of artistic expression, enabling the creation of novel and diverse creative outputs [212]. This domain represents a significant leap in AI's capability to engage in and contribute to creative endeavors, redefining the intersection of technology and art.

7 https://neptune.ai/blog/transformer-nlp-models-meena-lamda-chatbots
8 https://blenderbot.ai

D. Compliance and Ethical Considerations

As AI technologies rapidly evolve and become more integrated into various sectors, ethical considerations and legal compliance have become increasingly crucial, which requires a focus on developing 'Ethical AI Frameworks', a new category in our taxonomy reflecting the trend towards responsible AI development in generative AI [214], [215], [15], [216], [217]. Such frameworks are crucial in ensuring AI systems are built with a core emphasis on ethical considerations, fairness, and transparency, as they address critical aspects such as bias mitigation for fairness, privacy and security concerns for data protection, and AI ethics for accountability, thus responding to the evolving landscape where accountability in AI is of paramount importance [214], [15]. The need for rigorous approaches to uphold ethical integrity and legal conformity has never been more pressing, reflecting the complexity and multifaceted challenges introduced by the adoption of these technologies [15].
• Bias Mitigation: Bias Mitigation in AI systems is a critical endeavor to ensure fairness and representation, which involves not only balanced data collection to avoid skewed perspectives but also the implementation of algorithmic adjustments and regularization techniques to minimize biases [218], [219]. Continuous monitoring and bias testing are essential to identify and address any biases that may emerge from AI's predictive patterns [220], [219]. A significant challenge in this area is dealing with intersectional biases [221], [222], [223] and
understanding the causal interactions that may contribute to these biases [224], [225], [226], [227].
• Data Security: In AI data security, key requirements and challenges include ensuring data confidentiality, adhering to consent norms, and safeguarding against vulnerabilities like membership inference attacks [228], [229]. Compliance with stringent legal standards within applicable jurisdictions, such as the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA), is essential, necessitating purpose limitation and data minimization [230], [231], [232]. Additionally, issues of data sovereignty and copyright emphasize the need for robust encryption, access control, and continuous security assessments [233], [234]. These efforts are critical for maintaining the integrity of AI systems and protecting user privacy in an evolving digital landscape.
• AI Ethics: The field of AI ethics focuses on fairness, accountability, and societal impact, addresses the surge in ethical challenges posed by AI's increasing complexity and potential misalignment with human values, and requires ethical governance frameworks, multidisciplinary collaborations, and technological solutions [214], [235], [15], [236]. Furthermore, AI Ethics involves ensuring traceability, auditability, and transparency throughout the model development lifecycle, employing practices such as algorithmic auditing, establishing ethics boards, and adhering to documentation standards and model cards [237], [236]. However, the adoption of these initiatives remains uneven, highlighting the ongoing need for comprehensive and consistent ethical practices in AI development and deployment [214].
• Privacy Preservation: This domain focuses on maintaining data confidentiality and integrity, employing strategies like anonymization and federated learning to minimize direct data exposure, especially when the rise of generative AI poses risks of user profiling [238], [239]. Despite these efforts, challenges such as achieving true anonymity against correlation attacks highlight the complexities in effectively protecting against intrusive surveillance [240], [241]. Ensuring compliance with privacy laws and implementing secure data handling practices are crucial in this context, demonstrating the continuous need for robust privacy preservation mechanisms (a minimal federated-averaging sketch follows this list).
is becoming crucial in developing AI systems that are
trusted and accepted by society [89], [267].
E. Advanced Learning
Advanced learning techniques, including self-supervised
learning, meta-learning, and fine-tuning, are at the forefront F. Emerging Trends
of AI research, enhancing the autonomy, efficiency, and ver- Emerging trends in generative AI research are shaping the
satility of AI models. future of technology and human interaction, and they indicate
• Self-supervised Learning: This method emphasizes au- a dynamic shift towards more integrated, interactive, and
tonomous model training using unlabeled data, reducing intelligent AI systems, driving forward the boundaries of what
manual labeling efforts and model biases [242], [165], is possible in the realm of AI. Key developments in this area
[243]. It incorporates generative models like autoencoders include:
and GANs for data distribution learning and original • Multimodal Learning: Multimodal Learning in AI, a
input reconstruction [244], [245], [246], and also includes rapidly evolving subdomain, focuses on combining lan-
contrastive methods such as SimCLR [247] and MoCo guage understanding with computer vision and audio
[248], designed to differentiate between positive and processing to achieve a richer, multi-sensory context
awareness [114], [268]. Recent developments like the Gemini model have set new benchmarks by demonstrating state-of-the-art performance in various multimodal tasks, including natural image, audio, and video understanding, and mathematical reasoning [112]. Gemini's inherently multimodal design exemplifies the seamless integration and operation across different information types [112]. Despite the advancements, the field of multimodal learning still confronts ongoing challenges, such as refining the architectures to handle diverse data types more effectively [269], [270], developing comprehensive datasets that accurately represent multifaceted information [269], [271], and establishing benchmarks for evaluating the performance of these complex systems [272], [273].

• Interactive and Cooperative AI: This subdomain aims to enhance the capabilities of AI models to collaborate effectively with humans in complex tasks [274], [35]. This trend focuses on developing AI systems that can work alongside humans, thereby improving user experience and efficiency across various applications, including productivity and healthcare [275], [276], [277]. Core aspects of this subdomain involve advancing AI in areas such as explainability [278], understanding human intentions and behavior (theory of mind) [279], [280], and scalable coordination between AI systems and humans, a collaborative approach crucial in creating more intuitive and interactive AI systems, capable of assisting and augmenting human capabilities in diverse contexts [281], [35].

• AGI Development: AGI, representing the visionary goal of crafting AI systems that emulate the comprehensive and multifaceted aspects of human cognition, is a subdomain focused on developing AI with the capability for holistic understanding and complex reasoning that closely aligns with the depth and breadth of human cognitive abilities [282], [283], [32]. AGI is not just about replicating human intelligence, but also involves crafting systems that can autonomously perform a variety of tasks, demonstrating adaptability and learning capabilities akin to those of humans [282], [283]. The pursuit of AGI is a long-term aspiration, continually pushing the boundaries of AI research and development.

• AGI Containment: AGI Safety and Containment acknowledges the potential risks associated with highly advanced AI systems, focused on ensuring that these advanced systems are not only technically proficient but also ethically aligned with human values and societal norms [15], [32], [11]. As we progress towards developing superintelligent systems, it becomes crucial to establish rigorous safety protocols and control mechanisms [11]. Key areas of concern include mitigating representational biases, addressing distribution shifts, and correcting spurious correlations within AI models [11], [284]. The objective is to prevent unintended societal consequences by aligning AI development with responsible and ethical standards.

[Figure 4: Conceptual Diagram of MoE's Innovation; node labels: Core Concept, Training Efficiency, Load Balancing, Parallelism Techniques, Future Directions]

IV. INNOVATIVE HORIZON OF MOE

The MoE model architecture represents a pioneering advancement in transformer-based language models, offering unparalleled scalability and efficiency (Fig. 4). As evidenced by recent models like the 1.6 trillion parameter Switch Transformer [285] and the 8x7B parameter Mixtral [286], MoE-based designs are rapidly redefining the frontiers of model scale and performance across diverse language tasks.

A. Core Concept and Structure

MoE models represent a significant innovation in neural network design, offering enhanced scalability and efficiency in training and inference [287], [288], [110]. At their core, MoE models utilize a sparsity-driven architecture by replacing dense layers with sparse MoE layers comprising multiple expert networks, where each expert is dedicated to a specific subset of the training data or task, and a trainable gating mechanism dynamically allocates input tokens to these experts, thereby optimizing computational resources and effectively adapting to the task's complexity [94], [21], [110]. MoE models demonstrate a substantial advantage in terms of pretraining speed, outperforming dense models [94], [287]. However, they face challenges in fine-tuning and require substantial memory for inference due to the necessity of loading all experts into Video Random Access Memory (VRAM) [289], [290], [110]. The structure of MoE involves alternating transformer layers with router layers containing gating networks for expert routing, leading to an architecture that allows significant parameter scaling and advanced specialization in problem-solving [291], [21].

A distinguishing characteristic of MoE models is their flexibility in managing large datasets, capable of amplifying model capacity by over a thousand times while only experiencing minor reductions in computational efficiency [289], [292]. The Sparsely-Gated Mixture-of-Experts Layer, a key component of these models, comprises numerous simple feed-forward expert networks and a trainable gating network responsible for expert selection, which can facilitate the dynamic and sparse activation of experts for each input instance, maintaining high computational efficiency [293], [294], [110].

Recent advancements in MoE models, such as those in the Switch Transformer, have highlighted the significant benefits of intelligent routing, when the router's ability to intelligently
route tokens to appropriate experts confers considerable advantages to MoE models, allowing them to scale up model sizes while keeping compute time constant [295], [296], [297]. Experimental evidence suggests that routers learn to route inputs according to data clusters, demonstrating their potential in real-world applications [295], [289]. The core concept and structure of MoE models lie in their dynamic routing and specialization capabilities, offering promising avenues for scaling up neural networks and enhancing their efficiency and adaptability in various tasks, but the robustness of the router must be protected against adversarial attacks [289], [298].

B. Training and Inference Efficiency

MoE models, notably Mixtral 8x7B, are renowned for their superior pretraining speed compared to dense models, yet they face hurdles in fine-tuning and demand considerable VRAM for inference, owing to the requirement of loading all experts [289], [290], [110]. Recent advancements in MoE architecture have resulted in notable training cost efficiencies, especially in encoder-decoder models, with evidence showing cost savings of up to fivefold in certain contexts when compared to dense models [21], [289], [298], [287]. Innovations like DeepSpeed-MoE [287] offered new architectural designs and model compression, decreasing the MoE model size by approximately 3.7x and optimizing inference to achieve up to 7.3x better latency and cost efficiency. The progression in distributed MoE training and inference, notably with innovations like Lina [299], has effectively tackled the all-to-all communication bottleneck by enhancing tensor partitioning, which not only improves all-to-all communication and training step time, but also optimizes resource scheduling during inference, leading to a substantial reduction in training step time by up to 1.73 times and lowering the 95th percentile inference time by an average of 1.63 times compared to existing systems. These developments have marked a crucial shift in the large model landscape, from dense to sparse MoE models, expanding the potential applications of AI by training higher-quality models with fewer resources.

C. Load Balancing and Router Optimization

Effective load balancing is essential in MoE models to guarantee a uniform distribution of computational load among experts, with the router network in MoE layers, responsible for selecting the appropriate experts for processing specific tokens, playing a pivotal role in achieving this balance, which is fundamental to the stability and overall performance of MoE models [293], [289], [288], [300], [110]. Developments in router Z-loss regularization techniques play a crucial role in addressing expert imbalance in MoE models by fine-tuning the gating mechanism, ensuring a more equitable workload distribution across experts and fostering a stable training environment, thereby enhancing model performance and reducing training time and computational overhead [301], [302]. Concurrently, the integration of expert capacity management strategies emerges as a crucial approach in MoE models to regulate the processing abilities of individual experts by setting thresholds on the number of tokens each can handle, effectively averting bottlenecks and ensuring a more efficient and streamlined model operation, leading to improved training processes and heightened performance during complex computational tasks [293], [303], [289].

D. Parallelism and Serving Techniques

Recent developments in MoE models have highlighted their efficiency in parallelism and serving techniques, significantly influencing large-scale neural networks. DeepSpeed-MoE, for instance, introduces advanced parallelism modes like data parallelism, tensor-slicing for non-expert parameters, and expert parallelism for expert parameters, enhancing model efficiency, as their approach optimizes both latency and throughput in MoE model inference, offering scalable solutions in production environments using multiple Graphics Processing Unit (GPU) devices [287]. MoE models, versatile in applications like multilingual tasks and coding, demonstrated impressive capabilities in handling complex tasks due to their ensemble-like structure within a single framework [304], [305], [306]. Notably, models like Mixtral and the Switch Transformer, with over 1.6 trillion parameters, achieved computational efficiency equivalent to a 10 billion-parameter dense model, because they benefited from the sublinear scaling of MoE compute versus model size, leading to substantial accuracy gains within fixed compute budgets [21], [289], [287], [110]. Moreover, DeepSpeed-MoE included model compression techniques, reducing model size by up to 3.7x while maintaining accuracy, and an end-to-end MoE training and inference solution, part of the DeepSpeed library, which was instrumental in serving large-scale MoE models with enhanced speed and cost-efficiency [287]. These innovations open new directions in AI, shifting from dense to sparse MoE models, where training and deploying higher-quality models with fewer resources become more widely achievable.

E. Future Directions and Applications

Emerging research on MoE architectures could focus on advancing sparse fine-tuning techniques, exploring instruction tuning methods, and improving routing algorithms to fully utilize performance and efficiency gains. As models scale over one billion parameters, MoE represents a paradigm shift for vastly expanding capabilities across scientific, medical, creative, and real-world applications. Frontier work could also aim to refine auto-tuning of hyperparameters during fine-tuning to optimize accuracy, calibration, and safety. MoE research continues to push model scale limits while maintaining specialization for transfer learning. Adaptive sparse access allows coordinating thousands of experts to cooperate on tasks ranging from reasoning to open domain dialogue. Continued analysis of routing mechanisms seeks to balance load across experts and minimize redundant computation. As the AI community further investigates MoE methods at scale, these models hold promise for new breakthroughs in language, code generation, reasoning, and multimodal applications. There is great interest in evaluating implications across education, healthcare, financial analysis, and other fields. Outcomes may yield insights not only into model optimization but also for understanding principles behind combinatorial generalization.
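To make the mechanisms discussed in Sections IV-A and IV-C concrete, the following minimal sketch shows a sparsely gated MoE layer in which a trainable router selects the top-k experts per token, together with one common form of auxiliary load-balancing loss and a router z-loss term. The layer sizes, the top-k choice, and the loss coefficients are illustrative assumptions only; this is not the implementation used by the Switch Transformer, Mixtral, or DeepSpeed-MoE.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        # Each expert is a simple position-wise feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        # The router (gating network) produces one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):
        # x: (batch, seq_len, d_model); flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                      # (n_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)   # sparse expert choice per token
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)   # renormalise the k routing weights

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e in range(self.num_experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += top_p[mask, slot:slot + 1] * self.experts[e](tokens[mask])

        # Auxiliary load-balancing loss: fraction of tokens whose first choice is each
        # expert times the mean router probability per expert (a simplification that
        # only counts the top-1 assignment), encouraging a uniform load; plus a router
        # z-loss that penalises large router logits to stabilise training.
        token_frac = F.one_hot(top_idx[:, 0], self.num_experts).float().mean(dim=0)
        prob_frac = probs.mean(dim=0)
        load_balance_loss = self.num_experts * torch.sum(token_frac * prob_frac)
        z_loss = torch.logsumexp(logits, dim=-1).pow(2).mean()

        return out.reshape_as(x), load_balance_loss, z_loss

# Usage sketch: the auxiliary terms are added to the task loss with small coefficients.
layer = SparseMoELayer()
y, lb_loss, z_loss = layer(torch.randn(2, 16, 512))
total_aux = 0.01 * lb_loss + 0.001 * z_loss

In practice, production systems dispatch tokens to experts in parallel rather than looping over experts as done here for readability, which is precisely where the expert-parallelism and expert-capacity limits discussed above come into play.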
[Figure 5: Conceptual Diagram of Speculated Q* Capabilities; node labels: General Intelligence, Self-Learning and Exploration, Human-Level Understanding, Common Sense Reasoning, Real-World Knowledge Integration]

V. SPECULATED CAPABILITIES OF Q*

In the burgeoning realm of AI, the anticipated Q* project stands as a beacon of potential breakthroughs, heralding advancements that could redefine the landscape of AI capabilities (Fig. 5).

A. Enhanced General Intelligence

Q*'s development in the arena of general intelligence represents a paradigm shift from specialized to holistic AI, indicating a broadening of the model's cognitive abilities akin to human intelligence. This advanced form of general intelligence involves integrating diverse neural network architectures and machine learning techniques, enabling the AI to process and synthesize multifaceted information seamlessly. The universal adapter approach, mirroring models like T0, could endow Q* with the capability to rapidly assimilate knowledge from various domains. This method allows Q* to learn adaptable module plugins, enhancing its ability to tackle new data types while preserving existing skills, leading to an AI model that combines narrow specializations into a comprehensive, adaptive, and versatile reasoning system. The corresponding quasi-mathematical formulation can be expressed as:

EGI(Q*) = ⊕_{i=1}^{n} (NN_i ⊙ MLT_i)    (1)

Where:
• EGI: "Enhanced General Intelligence"
• NN_i: a diverse set of neural network architectures.
• MLT_i: various machine learning techniques.
• ⊕: the integration of these components.
• ⊙: a functional interaction between neural networks and machine learning techniques.

Such advancements in AI suggest the emergence of an intelligence that not only parallels but potentially exceeds human cognitive flexibility, with far-reaching implications in facilitating cross-disciplinary innovations and complex problem-solving. The speculated capabilities of Q* bring forth complex ethical implications and governance challenges. As AI systems approach higher levels of autonomy and decision-making, it is crucial to establish robust ethical frameworks and governance structures to ensure responsible and transparent AI development. This involves mitigating potential risks associated with advanced AI capabilities, emphasizing the need for comprehensive and dynamic ethical guidelines that evolve in tandem with AI advancements.

B. Advanced Self-Learning and Exploration

In the realm of advanced AI development, Q* is anticipated to represent a significant evolution in self-learning and exploration capabilities. It is speculated to utilize sophisticated Policy Neural Networks (NNs), similar to those in AlphaGo, but with substantial enhancements to handle the complexities of language and reasoning tasks. These networks are expected to employ advanced reinforcement learning techniques like Proximal Policy Optimization (PPO), which stabilizes policy updates and improves sample efficiency, a crucial factor in autonomous learning. The integration of these NNs with cutting-edge search algorithms, potentially including novel iterations of Tree or Graph of Thought, is predicted to enable Q* to autonomously navigate and assimilate complex information. This approach might be augmented with graph neural networks to bolster meta-learning capacities, allowing Q* to rapidly adapt to new tasks and environments while retaining previously acquired knowledge. The corresponding quasi-mathematical formulation can be represented as:

ASLE(Q*) = RL(PNN, SA) × GNN    (2)

Where:
• ASLE: "Advanced Self-Learning and Exploration"
• RL: reinforcement learning algorithms, particularly Proximal Policy Optimization (PPO).
• PNN: Policy Neural Networks, adapted for language and reasoning tasks.
• SA: sophisticated search algorithms, like Tree or Graph of Thought.
• GNN: the incorporation of Graph Neural Networks for meta-learning.
• ×: the cross-functional enhancement of RL with GNN.

Such capabilities indicate a model not limited to understanding existing data but equipped to actively seek and synthesize new knowledge, effectively adapting to evolving scenarios without the need for frequent retraining. This signifies a leap beyond current AI models, embedding a level of autonomy and efficiency previously unattained.

C. Superior Human-Level Understanding

Q*'s aspiration to achieve superior human-level understanding is speculated to hinge on an advanced integration of multiple neural networks, including a Value Neural Network (VNN), paralleling the evaluative components found in systems like AlphaGo. This network would extend beyond assessing accuracy and relevance in language and reasoning processes, delving into the subtleties of human communication. The model's deep comprehension capabilities may be enhanced by advanced natural language processing algorithms and techniques, such as those found in transformer architectures like DeBERTa. These algorithms would empower Q* to interpret not just the text but also the nuanced socio-emotional aspects such as intent, emotion, and underlying meanings.
Incorporating sentiment analysis and natural language inference, Q* could navigate layers of socio-emotional insights, including empathy, sarcasm, and attitude. The corresponding quasi-mathematical formulation can be expressed as:

SHLU(Q*) = Σ_{alg ∈ NLP} (VNN ⊕ alg)    (3)

Where:
• SHLU: "Superior Human-Level Understanding".
• VNN: the Value Neural Network, similar to evaluative components in systems like AlphaGo.
• NLP: a set of advanced NLP algorithms.
• ⊕: the combination of VNN evaluation with NLP algorithms.
• alg: individual algorithms within the NLP set.

This level of understanding, surpassing current language models, would position Q* to excel in empathetic, context-aware interactions, thus enabling a new echelon of personalization and user engagement in AI applications.

D. Advanced Common Sense Reasoning

Q*'s anticipated development in advanced common sense reasoning is predicted to integrate sophisticated logic and decision-making algorithms, potentially combining elements of symbolic AI and probabilistic reasoning. This integration aims to endow Q* with an intuitive grasp of everyday logic and an understanding akin to human common sense, thus bridging a significant gap between artificial and natural intelligence. Enhancements in Q*'s reasoning abilities might involve graph-structured world knowledge, incorporating physics and social engines similar to those in models like CogSKR. This approach, grounded in physical reality, is expected to capture and interpret the everyday logic often absent in contemporary AI systems. By leveraging large-scale knowledge bases and semantic networks, Q* could effectively navigate and respond to complex social and practical scenarios, aligning its inferences and decisions more closely with human experiences and expectations. The corresponding quasi-mathematical formulation can be represented as:

ACSR(Q*) = LogicAI ⊙ ProbAI ⊙ WorldK    (4)

Where:
• ACSR: "Advanced Common Sense Reasoning".
• LogicAI and ProbAI: symbolic AI and probabilistic reasoning components, respectively.
• WorldK: the integration of graph-structured world knowledge.
• ⊙: the integrated operation of these elements for common sense reasoning.

E. Extensive Real-World Knowledge Integration

Q*'s approach to integrating extensive real-world knowledge is speculated to involve the use of advanced formal verification systems, which would provide a robust basis for validating its logical and factual reasoning. This method, when coupled with sophisticated neural network architectures and dynamic learning algorithms, would enable Q* to engage deeply with the complexities of the real world, transcending conventional AI limitations. Additionally, Q* might employ mathematical theorem proving techniques for validation, ensuring that its reasoning and outputs are not only accurate but also ethically grounded. The incorporation of Ethics classifiers in this process further strengthens its capacity to deliver reliable and responsible understanding and interaction with real-world scenarios. The corresponding quasi-mathematical formulation can be represented as:

ERWKI(Q*) = FVS ⊗ NN ⊗ LTP ⊗ EC    (5)

Where:
• ERWKI: "Extensive Real-World Knowledge Integration".
• FVS: Formal Verification Systems.
• NN: neural network architectures.
• LTP: mathematical theorem proving for logical and factual validation.
• EC: the incorporation of Ethics classifiers.
• ⊗: the comprehensive integration for knowledge synthesis and ethical alignment.

Furthermore, the speculated capabilities of Q* have the potential to significantly reshape the job market and labor dynamics. With its advanced functionalities, Q* could automate complex tasks, leading to a shift in job requirements and the emergence of new skill demands. This necessitates a re-evaluation of workforce strategies and educational paradigms, aligning them with the evolving technological landscape and ensuring that the workforce is equipped to interact with and complement these advanced AI systems.

[Figure 6: Conceptual Diagram of Projected AGI Capabilities; node labels: Cognitive Abilities, Autonomous Learning, Understanding and Interaction, Common Sense Reasoning, Knowledge Integration]

VI. PROJECTED CAPABILITIES OF AGI

AGI stands as a transformative leap in AI, endeavoring to mirror human cognitive abilities in a software paradigm (Fig. 6). AGI's evolution is marked by advanced self-learning capabilities, utilizing policy neural networks and sophisticated reinforcement learning techniques for autonomous adaptation. The integration of algorithms like Tree/Graph of Thought with these networks suggests a future where AGI can independently acquire and apply knowledge across diverse domains.
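Since both the speculated Q* design (Section V-B) and the self-learning trajectory projected for AGI lean on Proximal Policy Optimization, it is worth recalling the clipped surrogate objective that PPO maximizes; the formulation below is the generic one from the reinforcement learning literature, not a detail disclosed about Q* or any particular system:

L_CLIP(θ) = E_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t ) ],  with  r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t)

Here Â_t is an estimate of the advantage at step t, and ε is the clipping range that bounds how far a single update may move the new policy π_θ away from the old policy π_θ_old; this bound is what "stabilizing policy updates" refers to in practice.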
A. Revolution in Autonomous Learning

AGI is anticipated to revolutionize self-learning and exploration [282], [307], [283], [32]. By incorporating methods like PPO, AGI models are positioned to achieve a level of autonomous learning and problem-solving that exceeds the current AI models' dependence on training data, indicating a potential paradigm shift towards reducing the need for frequent retraining and facilitating dynamic adaptation in response to evolving scenarios [181], [308].

B. Broadening of Cognitive Abilities

Envisaged to integrate various architectures, AGI could promise a level of general intelligence that replicates the multifaceted nature of human cognition [282], [309]. The universal adapter approach, mirroring models like GPT and BERT, could facilitate rapid assimilation of diverse information, positioning AGI as a system capable of performing tasks across multiple domains with an adaptability akin to human intellect [282], [310]. While AGI's full capabilities remain speculative, current trends suggest its potential application in advanced healthcare diagnostics, which is evidenced by recent breakthroughs in AI-driven predictive medicine models, indicating AGI's potential to revolutionize medical diagnosis and treatment.

C. Elevating Understanding and Interaction

AGI is projected to achieve an unparalleled understanding of human language and socio-emotional subtleties, leveraging algorithms like those in transformer architectures, which would enable AGI to engage in complex, empathetic, and contextually aware interactions, suggesting potential applications that revolutionize how AI systems communicate and interact [282], [307], [311].

D. Advanced Common Sense Reasoning

Symbolic AI and probabilistic reasoning, integrated into AGI, could imbue these systems with an innate grasp of common sense, to bridge the gap between artificial and natural intelligence, enabling AGI to navigate and respond effectively to real-world scenarios with reasoning aligned closely with human thought processes [282], [312], [313].

E. Holistic Integration of Knowledge

AGI's potential in integrating extensive real-world knowledge, guided by formal verification systems, hints at future capabilities where AGI's outputs are not only accurate but ethically grounded, suggesting AGI's ability for responsible interaction with real-world complexities [282], [311]. The projected capabilities of AGI extend to addressing significant global challenges, such as climate change, in which AGI's advanced data analysis and predictive modeling can play an increasingly crucial role in environmental monitoring, forecasting climate patterns, and devising sustainable solutions, contributing significantly to global ecological efforts [282], [283], [32].

F. Challenges and Opportunities in AGI Development

The development of AGI encompasses both challenges and opportunities. While AGI promises productivity boosts in creative fields and innovations in cross-modal generation techniques, substantial challenges like data bias, computational efficiency, and ethical implications persist [15], [32]. These challenges necessitate a balanced approach in AGI development, focusing on data curation, efficient systems, and societal impacts [309].

In the context of AGI development, experts from various domains caution against overestimating current AI capabilities and highlight the gap between the theoretical framework of AGI and the practical realities of today's AI [314], [32]. The envisioned autonomy and cognitive abilities of AGI separate it from current AI models, suggesting a future where AI systems could perform tasks across various domains without human intervention [282]. This development trajectory underscores the importance of ethical considerations and technological breakthroughs in AGI's journey towards becoming a transformative force in society [15], [32]. While projecting the timeline for achieving true AGI remains speculative, recognizing potential roadblocks is crucial, such as the current limitations in computational power, and the complexity of replicating human-like cognitive abilities. These emphasize the need for sustained research and ethical considerations in the pursuit of AGI, ensuring responsible and conscientious development.

VII. IMPACT ANALYSIS ON GENERATIVE AI RESEARCH TAXONOMY

With the advent of advanced AI developments such as MoE, multimodality, and AGI, the landscape of Generative AI research is undergoing a significant transformation. This section analyzes how these developments are reshaping the research taxonomy in generative AI.

A. Criteria for Impact Analysis

The continuously evolving landscape of Generative AI, which instigates transformative changes across various research domains, necessitates a systematic evaluation of these advancements' influence, for which we have established a set of criteria detailed in Table II, serving as analytical lenses to quantify and categorize the impact, deeply rooted in the dynamic interplay between technological progress and the evolving paradigms of research focus areas. Our analysis framework has been constructed on a gradient scale ranging from emergent to obsolete, reflecting the extent to which areas of Generative AI research are being reshaped. The categorization into five distinct classes allows for a complex assessment, acknowledging that not all areas will be uniformly affected. This multi-tiered approach is informed by historical patterns of technological disruption and the adaptability of scientific inquiry.

At the apex of our evaluative hierarchy, 'Emerging Direction' encapsulates the advent of uncharted research vistas, propelled by ongoing AI breakthroughs, which is predicated not on conjecture, but on a historical continuum of AI evolution, where each surge in technological power unfurls new
scientific enigmas and avenues [315], [316]. 'Areas Requiring Redirection' denote research spheres that, though established, find themselves at an inflection point, necessitating a strategic pivot to assimilate emergent AI paradigms and an overhaul of traditional methodologies, akin to the transition from rule-based expert systems to adaptive machine learning frameworks [315], [317]. The 'Still Relevant' classification affirms the tenacity of select research domains that, by addressing persistent scientific inquiries or through their inherent malleability, remain impervious to the tides of AI innovation [317]. In contrast, domains categorized as 'Likely to Become Redundant' confront potential obsolescence, inviting strategic foresight and resource reallocation to forestall scientific stagnation [318]. Lastly, 'Inherently Unresolvable' challenges serve as a sobering reminder of the perpetual dilemmas within AI research that defy resolution, rooted in the complex web of human ethics and cultural diversity, thus anchoring the pursuit of AI within the intractable tapestry of human values and societal imperatives [319], [320].

Table II: Criteria for Analyzing Impact on Generative AI Research
Symbol | Criteria | Score | Definition | Justification
↗ | Emerging Direction | 5 | New research areas expected to arise as a direct consequence of AI advancements. | Emphasizes novel research domains emerging from AI breakthroughs [315], [316].
↪ | Requiring Redirection | 4 | Areas that need to shift focus or methodology to stay relevant with new AI developments. | Technological shifts necessitate reevaluation and redirection in AI research [315], [317].
↔ | Still Relevant | 3 | Areas where the advancements have minimal or no impact, maintaining their current status and methodologies. | Observes the persistence of certain AI research areas despite technological advancements [317].
↘ | Likely to Become Redundant | 2 | Areas that may lose relevance or become obsolete with the advent of new AI technologies. | Discusses rapid obsolescence in AI methodologies due to new technologies [318].
△ | Inherently Unresolvable | 1 | Challenges that may remain unresolved due to complexities like subjective human perspectives and diverse cultural values. | Inherent difficulties in issues such as aligning AI with diverse human values and ethics [319], [320].

B. Overview of Impact Analysis

This subsection offers a detailed overview of the impact analysis carried out on the research taxonomy within the realm of generative AI, with a specific focus on recent progress in MoE, multimodality, and AGI, aiming to evaluate the impact of these innovative developments on various facets of generative AI research, ranging from model architecture to sophisticated learning methodologies. It includes both quantitative and qualitative assessments across a multitude of domains and subdomains in LLM research, shedding light on the extent to which each area is influenced by these technological advancements. This evaluation considered factors such as the emergence of new research directions, the necessity for redirection in existing research areas, the continued relevance of certain methodologies, and the potential redundancy of others, and is encapsulated in Table III.

1) Impact On Model Architecture: Transformer Models have been scored with a redirection requirement (↪) of 4 in both MoE and AGI, and a relevance (↔) of 3 in multimodality, leading to an overall score of 11. These models, forming the backbone of many current AI architectures, continue to be relevant for handling complex input sequences. However, the emergence of MoE and AGI indicates a shift towards more dynamic and specialized architectures. While transformers remain essential, there is a need for them to evolve and integrate with these advanced systems for enhanced performance and adaptability.

Recurrent Neural Networks (RNNs) are facing a potential decline in relevance, as indicated by their scores: likely to become redundant (↘) 2 in both MoE and AGI contexts and still relevant (↔) 3 in multimodality, totaling a score of 7. Although effective for sequence processing, RNNs are challenged by their limitations in handling long-range dependencies and lower efficiency compared to newer models like transformers. They may retain some relevance in multimodal tasks involving sequential data but are generally overshadowed by more advanced architectures.

The MoE models have scored a consistent relevance (↔) of 3 in their own development and a score of 5 (↗) in multimodality, combined with a redirection score (↪) of 4 in the context of AGI, amounting to an overall score of 12. MoE models are at the forefront of emerging research in multimodality due to their ability to handle diverse data types. For AGI, these models will require adjustments to effectively integrate into systems exhibiting general intelligence, especially in areas beyond their initial specialization.

Multimodal Models have received high scores for emerging research directions (↗) of 5 in both MoE and AGI contexts, alongside a score of 3 (↔) for current relevance in multimodality, culminating in an overall score of 13. The integration of MoE and the pursuit of AGI are opening new pathways for research in multimodal models. These developments are crucial for enhancing the ability to process and synthesize information from multiple modalities, a key aspect for both specialized and generalized AI systems.

2) Impact On Training Techniques: Supervised Learning has been assigned a redirection score (↪) of 4 in MoE, a relevance score (↔) of 3 in multimodality, and a score indicating potential redundancy (↘) of 2 in the context of AGI, culminating in an overall score of 9. While supervised learning requires adaptation to fit the MoE framework, it remains relevant for multimodal AI models that depend on labeled data. However, with the shift towards more autonomous learning methods in AGI, the dependence on extensive labeled datasets typically associated with supervised learning may diminish, leading to its potential decrease in significance.
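For clarity, the Overall Score column in Table III below is simply the sum of the three per-dimension scores defined in Table II. The short sketch below makes that arithmetic explicit for a few rows; the values are taken from Table III, while the helper function and dictionary names are purely illustrative.

# Overall score = MoE score + multimodality score + AGI score (per Tables II and III).
TABLE_III_ROWS = {
    "Transformer Models":        (4, 3, 4),   # -> 11
    "Recurrent Neural Networks": (2, 3, 2),   # -> 7
    "Mixture of Experts":        (3, 5, 4),   # -> 12
    "Multimodal Models":         (5, 3, 5),   # -> 13
}

def overall_score(moe, multimodality, agi):
    return moe + multimodality + agi

for subdomain, scores in TABLE_III_ROWS.items():
    print(f"{subdomain}: {overall_score(*scores)}")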
Table III: Impact of MoE, Multimodality, and AGI on Generative AI Research
Domain | Subdomain | MoE | Multimodality | AGI | Overall Score
Model Architecture | Transformer Models | ↪ (4) | ↔ (3) | ↪ (4) | 11
Model Architecture | Recurrent Neural Networks | ↘ (2) | ↔ (3) | ↘ (2) | 7
Model Architecture | Mixture of Experts | ↔ (3) | ↗ (5) | ↪ (4) | 12
Model Architecture | Multimodal Models | ↗ (5) | ↔ (3) | ↗ (5) | 13
Training Techniques | Supervised Learning | ↪ (4) | ↔ (3) | ↘ (2) | 9
Training Techniques | Unsupervised Learning | ↪ (4) | ↔ (3) | ↪ (4) | 11
Training Techniques | Reinforcement Learning | ↔ (3) | ↪ (4) | ↗ (5) | 12
Training Techniques | Transfer Learning | ↔ (3) | ↗ (5) | ↪ (4) | 12
Application Domains | Natural Language Understanding | ↔ (3) | ↔ (3) | ↗ (5) | 11
Application Domains | Natural Language Generation | ↔ (3) | ↪ (4) | ↗ (5) | 12
Application Domains | Conversational AI | ↪ (4) | ↗ (5) | ↗ (5) | 14
Application Domains | Creative AI | ↪ (4) | ↗ (5) | ↗ (5) | 14
Compliance and Ethical Considerations | Bias Mitigation | ↪ (4) | ↪ (4) | ↗ (5) | 13
Compliance and Ethical Considerations | Data Security | ↔ (3) | ↔ (3) | ↔ (3) | 9
Compliance and Ethical Considerations | AI Ethics | ↪ (4) | ↪ (4) | △ (1) | 9
Compliance and Ethical Considerations | Privacy Preservation | ↪ (4) | ↪ (4) | ↪ (4) | 12
Advanced Learning | Self-supervised Learning | ↪ (4) | ↗ (5) | ↔ (3) | 12
Advanced Learning | Meta-learning | ↔ (3) | ↔ (3) | ↗ (5) | 11
Advanced Learning | Fine Tuning | ↔ (3) | ↔ (3) | ↘ (2) | 8
Advanced Learning | Human Value Alignment | △ (1) | △ (1) | △ (1) | 3
Emerging Trends | Multimodal Learning | ↗ (5) | ↔ (3) | ↗ (5) | 13
Emerging Trends | Interactive and Cooperative AI | ↪ (4) | ↔ (3) | ↗ (5) | 12
Emerging Trends | AGI Development | ↪ (4) | ↪ (4) | ↔ (3) | 11
Emerging Trends | AGI Containment | △ (1) | △ (1) | ↗ (5) | 7

Unsupervised Learning scores a redirection requirement (↪) of 4 in both MoE and AGI contexts and maintains its relevance (↔) with a score of 3 in multimodality, resulting in a total score of 11. In the MoE architecture, unsupervised learning methods may need adjustments, particularly in managing dynamic task allocation. It remains crucial for understanding unlabeled data across various modalities. In AGI, unsupervised learning is expected to evolve beyond traditional techniques, focusing on more advanced self-discovery and intrinsic learning mechanisms.

Reinforcement Learning is rated as still relevant (↔) with a score of 3 in MoE, requiring redirection (↪) with a score of 4 in multimodality, and identified as an emerging research area (↗) with a score of 5 in AGI, giving it a total score of 12. This technique continues to play a significant role in optimizing MoE model structures. In the realm of multimodality, it necessitates a strategic shift to effectively manage complex interactions between different modalities. As for AGI, reinforcement learning is emerging as a crucial area, particularly in the development of autonomous systems that learn from their environment.

Transfer Learning receives a consistent relevance score (↔) of 3 in MoE, a high score for emerging research directions (↗) of 5 in multimodality, and a redirection requirement (↪) of 4 in AGI, accumulating to an overall score of 12. It remains important in the MoE framework for leveraging knowledge across different experts. In multimodal contexts, transfer learning is becoming increasingly crucial as it facilitates the transfer of learning between different modalities. With the evolution of AGI, this technique is expected to undergo significant changes to cater to broader and more generalized knowledge applications.

3) Impact On Application Domains: Natural Language Understanding holds steady relevance (↔) with a score of 3 in both MoE and multimodality, and an emerging direction (↗) score of 5 in AGI, totaling an overall score of 11. MoE models support the relevance of NLU by enhancing its precision and depth through their ability to handle large, diverse datasets. In multimodal AI, NLU remains a critical component for comprehending language in diverse data formats. With AGI's progress, NLU is expected to undergo significant expansion, moving towards more advanced, human-like comprehension and interpretation capabilities.

Natural Language Generation maintains relevance (↔) with a score of 3 in MoE, requires redirection (↪) with a score of 4 in multimodality, and is identified as an emerging research area (↗) with a score of 5 in AGI, resulting in a total score of 12. MoE's scalability is crucial for enhancing NLG, while in multimodal contexts, NLG may need strategic adjustments to align effectively with other modalities. As AGI evolves, NLG is anticipated to venture into new research domains, especially in creating content that reflects human-like creativity and adaptability.

Conversational AI is marked for redirection (↪) with a score of 4 in MoE, emerging research directions (↗) with a score of 5 in both multimodality and AGI, accumulating an overall score of 14. While MoE enhances conversational AI, it may require strategic changes to fully utilize MoE's distributed expertise. The integration of multiple modalities opens new avenues for conversational AI, expanding its scope to include various sensory data. The development of AGI is set to bring revolutionary advancements in this domain, paving the way for more autonomous, context-aware, and human-like interactions.

Creative AI scores a redirection requirement (↪) of 4 in
MoE, and high scores for emerging research directions (↗) of 5 in both multimodality and AGI, leading to a total score of 14. In the context of MoE, Creative AI may need to be realigned to capitalize on MoE's capacity for generating novel content. The combination of different modalities in creative AI presents exciting new research opportunities, enabling the creation of more intricate and diverse outputs. As AGI progresses, it is expected to significantly broaden the capabilities of creative AI, potentially surpassing existing boundaries and exploring new realms of creativity.

4) Impact On Compliance and Ethical Considerations: Bias Mitigation in the context of MoE, multimodality, and AGI scores a redirection requirement (↪) of 4 in both MoE and multimodality, and an emerging research direction (↗) with a score of 5 in AGI, resulting in an overall score of 13. MoE architectures demand a new approach in bias mitigation due to the diversity of expert networks, which could otherwise amplify biases. In multimodal systems, bias mitigation requires novel strategies to address biases in various data types, including non-textual forms like images and audio. With AGI's broad cognitive capabilities, a comprehensive approach towards understanding and addressing biases across diverse domains is emerging as a critical research area.

Data Security maintains a consistent relevance (↔) with a score of 3 across MoE, multimodality, and AGI, leading to a total score of 9. The fundamental principles of data security remain crucial despite the advancements in MoE, which may necessitate tailored strategies for its distributed nature. In multimodal AI, the secure handling of diverse data types continues to be of paramount importance. The core tenets of data security are sustained even with the advancement of AGI, though the complexity and scope of security measures are likely to increase.

AI Ethics is marked for redirection (↪) with a score of 4 in both MoE and multimodality, and faces inherently unresolvable challenges (△) with a score of 1 in AGI, accumulating a total score of 9. The decision-making processes and transparency of MoE models necessitate a reevaluation of ethical considerations. In multimodal AI, ethical concerns, particularly in the interpretation and use of multimodal data, require new approaches. The ethical challenges in AGI are expected to be complex and involve deep philosophical and societal implications that might be difficult to fully resolve.

Privacy Preservation scores a redirection need (↪) of 4 across MoE, multimodality, and AGI, leading to an overall score of 12. The distributed nature of MoE systems requires a reassessment of privacy preservation techniques to handle data processed by multiple experts. Multimodal AI systems, especially those handling sensitive data such as images and sounds, necessitate tailored privacy strategies. With the extensive data processing capabilities of AGI, advanced and potentially new approaches to privacy preservation are called for.

5) Impact On Advanced Learning: In the context of MoE, self-supervised learning requires redirection (↪) with a score of 4, signaling the need to adapt to the evolving architecture. Emerging research directions (↗) with a score of 5 are identified in multimodality, suggesting the integration of various autonomous data types like text, image, and audio. For AGI, self-supervised learning remains relevant (↔) with a score of 3, contributing to the system's autonomy and adaptability, though likely to be integrated with more complex strategies. The overall impact score is 12.

Meta-learning maintains consistent relevance (↔) with a score of 3 across MoE and multimodality, aligning well with the dynamic nature of MoE and aiding quick adaptation to varying data types and tasks in multimodal contexts. In AGI, it is marked as an emerging research direction (↗) with a score of 5, suggesting novel research in achieving human-like adaptability and learning efficiency. The total score for meta-learning is 11.

Fine tuning continues to be relevant (↔) with a score of 3 in both MoE and multimodality, being essential for adapting pre-trained models to specific tasks and tailoring multimodal models. However, in AGI, it is likely to become redundant (↘) with a score of 2, as AGI aims to develop systems that autonomously understand and learn across a broad range of domains, reducing the need for traditional fine-tuning processes. The overall impact score for fine tuning is 8.

Aligning AI with human values poses inherently unresolvable challenges (△) in all contexts (MoE, multimodality, and AGI), with a score of 1. This reflects the complexity and diversity of tasks MoE models handle, the integration of various data types in multimodal AI, and the broad range of cognitive abilities encompassed by AGI. These factors contribute to the significant ongoing challenges in aligning AI with human values, resulting in a total score of 3.

6) Impact On Emerging Trends: Multimodal learning is marked as an emerging research direction (↗) with a score of 5 in both MoE and AGI contexts, reflecting its capacity to integrate various data types such as text, images, and audio. This integration is crucial for specialized tasks in MoE and processing diverse forms of data in AGI. In the realm of multimodality, it remains a core aspect (↔) with a score of 3, being essential for ongoing multimodal AI development. The overall impact score is 13.

Interactive and Cooperative AI requires redirection (↪) in MoE with a score of 4, as MoE models adapt to include more interactive elements for broader applications. In multimodality, interaction and cooperation continue to be central (↔) with a score of 3, especially in fields like robotics and virtual assistants. AGI's evolution includes significant advancements in interactive AI, marking it as an emerging research area (↗) with a score of 5. The total score for this trend is 12.

The development of AGI necessitates redirection (↪) in both MoE and multimodality, each with a score of 4, indicating the need for more integrated and complex systems. AGI remains at the forefront of its own field (↔) with a score of 3, with each breakthrough directly influencing its progress. The overall impact score for AGI development is 11.

AGI containment is identified as a challenge not required to be solved (△) in both MoE and multimodality, with a score of 1, as these areas are not expected to reach the levels of autonomy and complexity associated with AGI. However, as AGI progresses, the emerging need for effective containment strategies is marked (↗) with a score of 5, highlighting the
importance of ensuring safe and controlled AI deployment. The total impact score is 7.

VIII. EMERGENT RESEARCH PRIORITIES IN GENERATIVE AI

As we are likely to approach the precipice of a new era marked by the advent of Q*, nudging us closer to the realization of usable AGI, the research landscape in generative AI is undergoing a crucial transformation.

A. Emergent Research Priorities in MoE

The MoE domain is increasingly focusing on two critical areas:

• Multimodal Models in Model Architecture: The integration of MoE and AGI is opening new pathways for research in multimodal models. These developments are enhancing the capability to process and synthesize information from multiple modalities, which is crucial for both specialized and generalized AI systems.

• Multimodal Learning in Emerging Trends: MoE is at the forefront of multimodal learning, integrating diverse data types like text, images, and audio for specialized tasks. This trend is directly impacting the enhancement of the field.

Furthermore, an analysis of funding trends and investment patterns in AI research could indicate a substantial shift towards areas like multimodal models in MoE. This trend, characterized by increased capital flow into fields involving complex data processing and autonomous systems, is shaping the direction of future research priorities. It underscores the growing interest and investment in the potential of generative AI, influencing both academic and industry-led initiatives.

B. Emergent Research Priorities in Multimodality

In the realm of multimodality, several areas are identified as emerging research priorities:

• MoE in Model Architecture: MoE models are becoming increasingly relevant for handling diverse data types in multimodal contexts.

• Transfer Learning in Training Techniques: Transfer learning is emerging as a key research direction, especially for learning between different modalities.

• Conversational AI and Creative AI in Application Domains: Both conversational AI and creative AI are expanding in multimodal contexts, encompassing visual, auditory, and other sensory data integration.

• Self-Supervised Learning in Advanced Learning: New research directions in self-supervised learning are emerging, focusing on the integration of various data types autonomously.

Additionally, the rise of generative AI, particularly in multimodal contexts, can significantly impact educational curricula and skill development. There is a growing need to update academic programs to include comprehensive AI literacy, with a focus on multimodal AI technologies. This evolution in education is aimed at preparing future professionals to effectively engage with and leverage the advancements in AI, equipping them with the necessary skills to navigate its complexities and innovations.

C. Emergent Research Priorities in AGI

The AGI domain is witnessing a surge in research priorities across multiple areas:

• Multimodal Models in Model Architecture: Similar to MoE, multimodal models are crucial in AGI, enabling deeper and more nuanced understanding.

• Reinforcement Learning in Training Techniques: Emerging as a key area in AGI, reinforcement learning focuses on developing autonomous systems learning from their environment.

• Application Domains: AGI is extending the boundaries of natural language understanding and generation, conversational AI, and creative AI, with a focus on human-like comprehension and creativity.

• Bias Mitigation in Compliance and Ethical Considerations: New directions in bias mitigation are focusing on a comprehensive approach to addressing biases across diverse domains in AGI.

• Meta-Learning in Advanced Learning: AGI's pursuit of human-like adaptability is leading to novel research in meta-learning.

• Emerging Trends: Multimodal learning, interactive and cooperative AI, and AGI containment strategies are becoming crucial research areas as AGI progresses.

In line with these developments in AGI, a noticeable trend in AI research funding and investment patterns is evident. There is a significant inclination towards supporting projects and studies in AGI, particularly in areas such as natural language understanding and generation, and autonomous systems. This funding trend not only mirrors the escalating interest in the capabilities of AGI but also directs the trajectory of future research, shaping both academic exploration and industry-driven projects.

IX. PRACTICAL IMPLICATIONS AND LIMITATIONS OF GENERATIVE AI TECHNOLOGIES

Generative AI technologies, encompassing MoE, multimodality, and AGI, present unique computational challenges. This section explores the processing power requirements, memory usage, and scalability concerns inherent in these advanced AI models.

A. Computational Complexity and Real-world Applications of Generative AI Technologies

1) Computational Complexity: The computational demands of these advanced models fall into three areas: processing power, memory usage, and scalability.

• Processing Power Requirements: Advanced generative AI models, including MoE architectures and AGI systems, require significant processing power [321]. The
JOURNAL OF LATEX CLASS FILES, VOL. 1, NO. 1, DECEMBER 2023 20

demand for GPUs and TPUs is accentuated, particularly 2) Existing Industry Solutions: Generative AI is reshaping
when handling complex computations and large datasets various industries by offering innovative solutions and altering
typical in multimodal AI applications. market dynamics.
• Memory Usage in AI Modeling: A critical challenge in • Sector-Wise Deployment: The diverse applications of
training and deploying large-scale AI models, particularly generative AI, from digital content creation to process
in multimodal and AGI systems executed on GPUs, streamlining, also raise questions about originality and
lies in the substantial GPU and VRAM requirements. intellectual property rights.
Unlike computer RAM, VRAM often cannot be expanded • Impact on Market Dynamics: The effect of AI solutions
easily on many platforms, posing significant constraints. on traditional industry structures and the introduction of
Developing strategies for GPU and VRAM optimization novel business models are significant considerations.
and efficient model scaling is thus crucial for the practical • Challenges and Constraints: Addressing limitations
deployment of these AI technologies. such as scalability, data management complexity, privacy
• Scalability and Efficiency in AI Deployment: Address- concerns, and ethical implications is essential for robust
ing scalability challenges in generative AI, especially in governance frameworks.
MoE and AGI contexts, involves optimizing load man-
agement and parallel processing techniques. This is vital
for their practical application in fields like healthcare, C. Limitations and Future Directions in Generative AI Tech-
finance, and education. nologies
2) Real-world Application Examples of Generative AI Technologies: The application of generative AI models in real-world scenarios demonstrates their transformative potential and challenges in various sectors.
• Healthcare: In healthcare, generative AI facilitates advancements in diagnostic imaging and personalized medicine, but also raises significant concerns regarding data privacy and the potential for misuse of sensitive health information [322].
• Finance: The use of AI for fraud detection and algorithmic trading in finance underlines its efficiency and accuracy, while at the same time it raises ethical concerns, particularly in automated decision-making processes, which may lack transparency and accountability [323].
• Education: Generative AI's role in creating personalized learning experiences offers immense benefits in terms of educational accessibility and tailored instruction. However, it poses challenges in equitable access to technology, potential biases in AI-Generated Content (AIGC), and could reduce demand for human educators. Additionally, there is growing concern among educators who oppose the use of AIGC, fearing it may undermine traditional teaching methodologies and the role of educators.
B. Commercial Viability and Industry Solutions in Generative AI Technologies

1) Market Readiness: Assessing the market readiness of generative AI technologies involves analyzing cost, accessibility, deployment challenges, and user adoption trends.
• Cost Analysis: The financial aspects of deploying generative AI, including MoE, multimodality, and AGI, are crucial for market adoption.
• Accessibility and Deployment: Integration of these technologies into existing systems and the technical expertise required are key factors influencing their adoption.
• User Adoption Trends: Understanding current adoption patterns provides insights into market acceptance and the role of user trust and perceived benefits.

2) Existing Industry Solutions: Generative AI is reshaping various industries by offering innovative solutions and altering market dynamics.
• Sector-Wise Deployment: The diverse applications of generative AI, from digital content creation to process streamlining, also raise questions about originality and intellectual property rights.
• Impact on Market Dynamics: The effect of AI solutions on traditional industry structures and the introduction of novel business models are significant considerations.
• Challenges and Constraints: Addressing limitations such as scalability, data management complexity, privacy concerns, and ethical implications is essential for robust governance frameworks.

C. Limitations and Future Directions in Generative AI Technologies

1) Technical Limitations: Identifying and addressing technical limitations in generative AI models is crucial for their advancement and reliability.
• Contextual Understanding: Enhancing AI's ability to understand and interpret context, especially in natural language processing and image recognition, is a key area for improvement.
• Handling Ambiguous Data: Developing better algorithms for processing ambiguous or incomplete data sets is essential for decision-making accuracy and reliability.
• Navigating Human Judgment: Despite generative AI's accuracy in interpreting policies and procedures, its capacity to replace human judgment remains limited. This is especially true in legal and political contexts, where decision-makers might selectively use AIGC, leading to biased outcomes. Thus, the effectiveness of generative AI in such scenarios should be realistically assessed.

2) Future Research Directions to Enhance the Practicality of Generative AI: Future research in generative AI should focus on addressing current limitations and expanding its practical applications.
• Improved Contextual Understanding: Research should aim at developing models with better contextual awareness, particularly in complex natural language and image processing tasks.
• Robust Handling of Ambiguous Data: Investigating techniques for effective processing of ambiguous data is vital for advancing the decision-making capabilities of AI models; a minimal abstention-based sketch follows this list.
• Ethical Integration of AIGC in Legal and Political Arenas: Future research should focus on the ethical integration of AI-generated content into legal and political decision-making processes, which involves developing frameworks that utilize AIGC in a supportive role, ensuring it enhances human judgment and contributes to transparency and fairness [324]. Importantly, researchers should consider the biases and limitations inherent in AI [324], alongside the potential for human fallibility, ethical complexities, and possible corruption in these domains.
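As a concrete illustration of the "robust handling of ambiguous data" direction above, the sketch below wraps a model's predictive distribution in a selective-prediction rule that abstains, deferring to a human, whenever the entropy of the prediction is too high. The entropy threshold, class labels, and probabilities are assumptions chosen for illustration, not values drawn from the surveyed literature.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def predict_or_abstain(probs, labels, max_entropy=0.5):
    """Return the arg-max label, or None to signal deferral to a human.

    probs       : model's class probabilities for one input (assumed given)
    labels      : class names aligned with probs
    max_entropy : uncertainty budget (assumed value); higher admits more guesses
    """
    if entropy(probs) > max_entropy:
        return None  # evidence too ambiguous or incomplete: abstain instead of guessing
    return max(zip(probs, labels))[1]


# Toy inputs with assumed probabilities: one confident case, one ambiguous case.
print(predict_or_abstain([0.93, 0.05, 0.02], ["approve", "reject", "escalate"]))  # approve
print(predict_or_abstain([0.40, 0.35, 0.25], ["approve", "reject", "escalate"]))  # None
```

Abstention of this kind does not resolve ambiguity, but it turns it into an explicit signal that a decision should be routed back to human judgment, in line with the limitation noted above.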
[Plot: annual number of preprints for the arXiv categories cs, cs.AI, physics, math, and stat; y-axis "Number of Preprints" (0 to 100k), x-axis "Year".]
Figure 7: Annual preprint submissions to different categories on arXiv.org

X. IMPACT OF GENERATIVE AI ON PREPRINTS ACROSS DISCIPLINES
The challenges detailed in this section are not directly related to the knowledge domains within generative AI, but are fueled by the success of Generative AI, particularly the commercialization of ChatGPT. The proliferation of preprints in the field of AI (Fig. 7), especially in the cs.AI category on platforms like arXiv, has introduced a set of academic challenges that merit careful consideration and strategic response. The rapid commercialization and adoption of tools such as ChatGPT, as evidenced by over 55,700 entries on Google Scholar mentioning "ChatGPT" within just one year of its commercialization, exemplify the accelerated pace at which the field is advancing. This rapid development is not mirrored in the traditional peer-review process, which is considerably slower. The peer-review process now appears to be overwhelmed with manuscripts that are either generated with ChatGPT (or other LLMs), or whose writing processes have been significantly accelerated by such LLMs, contributing to a bottleneck in scholarly communication [325], [326]. This situation is further compounded by the fact that many journals in disciplines outside of computer science are also experiencing longer review times and higher rates of desk rejections. Additionally, the flourishing trend of manuscripts and preprints, either generated by or significantly expedited using tools like ChatGPT, extends beyond computer science into diverse academic disciplines. This trend presents a looming challenge, potentially overwhelming both the traditional peer-review process and the flourishing preprint ecosystem with a volume of work that may not always adhere to established academic standards.
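Trends such as those in Fig. 7 can be approximated directly from public metadata. The sketch below queries the public arXiv API for yearly cs.AI submission counts; the endpoint, the submittedDate range syntax, and the pause between requests follow our reading of the API documentation and may need adjusting, and this is not the script used to produce the figure.

```python
"""Approximate yearly cs.AI preprint counts from the public arXiv API (illustrative sketch)."""
import re
import time
import urllib.parse
import urllib.request

API = "http://export.arxiv.org/api/query"


def yearly_count(category: str, year: int) -> int:
    """Best-effort count of submissions in `category` during `year`."""
    query = f"cat:{category} AND submittedDate:[{year}01010000 TO {year}12312359]"
    url = f"{API}?{urllib.parse.urlencode({'search_query': query, 'max_results': 1})}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        feed = resp.read().decode("utf-8")
    match = re.search(r"<opensearch:totalResults[^>]*>(\d+)<", feed)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    for year in range(2018, 2024):
        print(year, yearly_count("cs.AI", year))
        time.sleep(3)  # the API asks clients to space out successive requests
```

Repeating such queries per category and per year yields growth curves of the kind summarised in the figure.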
The sheer volume of preprints has made the task of selecting and scrutinizing research exceedingly demanding. In the current research era, the exploration of scientific literature has become increasingly complex, as knowledge has continued to expand and disseminate exponentially, while concurrently, integrative research efforts, attempting to distill this vast literature, aim to identify and understand a smaller set of core contributions [327]. Thus, the rapid expansion of academic literature across various fields presents a significant challenge for researchers seeking to perform evidence syntheses over the increasingly vast body of available knowledge [328]. Furthermore, this explosion in publication volume poses a distinct challenge for literature reviews and surveys, where the human capacity for manually selecting, understanding, and critically evaluating articles is increasingly strained, potentially leading to gaps in synthesizing comprehensive knowledge landscapes. Although reproduction of results is a theoretical possibility, practical constraints such as the lack of technical expertise, computational resources, or access to proprietary datasets hinder rigorous evaluation. This is concerning, as the inability to thoroughly assess preprint research undermines the foundation of scientific reliability and validity. Furthermore, the peer-review system, a cornerstone of academic rigour, is under the threat of being further overwhelmed [325], [329]. The potential consequences are significant, with unvetted preprints possibly perpetuating biases or errors within the scientific community and beyond. The absence of established retraction mechanisms for preprints, akin to those for published articles, exacerbates the risk of persistent dissemination of flawed research.

The academic community is at a crossroads, necessitating an urgent and thoughtful discourse on navigating this emerging "mess", a situation that risks spiraling out of control if left unaddressed. In this context, the role of peer review becomes increasingly crucial, as it serves as a critical checkpoint for quality and validity, ensuring that the rapid production of AI research is rigorously studied for scientific accuracy and relevance. However, the current modus operandi of traditional peer review does not appear to be sustainable, primarily due to its inability to keep pace with the exponential growth in AI-themed research and Generative-AI-accelerated research submissions, and the increasingly specialized nature of emerging AI topics [325], [326]. This situation is compounded by a finite pool of qualified reviewers, leading to delays, potential biases, and a burden on the scholarly community. This reality demands an exploration of new paradigms for peer review and dissemination of research that can keep pace with swift advancements in AI. Innovative models for community-driven vetting processes, enhanced reproducibility checks, and dynamic frameworks for post-publication scrutiny and correction may be necessary. Efforts to incorporate automated tools and AI-assisted review processes could also be explored to alleviate the strain on human reviewers.
[Diagram: Preprint Submission → Community-Based Review (rapid feedback, similar to product review sites) → Initial Validation → Formal Peer Review (in-depth assessment) → Final Publication (academic rigor and quality assurance).]
Figure 8: Possible Convergence Between Traditional Peer Review and the Preprint Ecosystem
In this rapidly evolving landscape, we envision a convergence between the traditional peer review system and the flourishing preprint ecosystem, which could involve creating hybrid models (Fig. 8), where preprints undergo a preliminary community-based review, harnessing the collective expertise and rapid feedback of the academic community, similar to product review websites and Twitter [330]. This approach could provide an initial layer of validation, offering additional insights on issues that may be overlooked by a limited number of peer reviewers. The Editors-in-Chief (EICs) could consider the major criticisms and suggestions of an article from the community-based review, ensuring a more thorough and diverse evaluation. Subsequent, more formal peer review processes could then refine and endorse these preprints for academic rigor and quality assurance. This hybrid model would require robust technological support, possibly leveraging AI and machine learning tools to assist in initial screening and identification of suitable reviewers. The aim would be to establish a seamless continuum from rapid dissemination to validated publication, ensuring both the speed of preprints and the credibility of peer-reviewed research. A balanced approach must be struck to harness the benefits of preprints, such as rapid dissemination of findings and open access, while mitigating their drawbacks. The development of new infrastructure and norms could be instrumental in steering the academic community towards a sustainable model that upholds the integrity and trustworthiness of scientific research in the age of Generative AI.
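To make the hybrid pipeline of Fig. 8 more tangible, the sketch below models it as a small state machine in which community feedback is aggregated before an editorial gate decides whether a preprint advances to formal peer review. The stages mirror the figure, while the scoring scale, thresholds, and function names are our own illustrative assumptions rather than an implemented system.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from statistics import mean


class Stage(Enum):
    SUBMITTED = auto()         # preprint posted for rapid community feedback
    COMMUNITY_REVIEW = auto()  # lightweight, product-review-style scores and comments
    FORMAL_REVIEW = auto()     # traditional in-depth peer review
    PUBLISHED = auto()         # final publication after formal review
    RETURNED = auto()          # sent back to the authors with the community's criticisms


@dataclass
class Preprint:
    title: str
    stage: Stage = Stage.SUBMITTED
    scores: list = field(default_factory=list)
    criticisms: list = field(default_factory=list)


def community_review(p: Preprint, scores, criticisms) -> None:
    """Collect rapid community feedback (assumed 1-5 scores) on a submitted preprint."""
    p.scores.extend(scores)
    p.criticisms.extend(criticisms)
    p.stage = Stage.COMMUNITY_REVIEW


def editorial_gate(p: Preprint, min_reviews: int = 5, min_score: float = 3.5) -> Stage:
    """EIC-style decision: enough community input and a high enough average score
    advances the preprint to formal peer review; otherwise it is returned."""
    advance = len(p.scores) >= min_reviews and mean(p.scores) >= min_score
    p.stage = Stage.FORMAL_REVIEW if advance else Stage.RETURNED
    return p.stage


# Illustrative run with assumed community scores.
paper = Preprint("A hypothetical generative AI survey")
community_review(paper, [4.5, 4.0, 3.0, 4.0, 5.0], ["clarify the evaluation protocol"])
print(editorial_gate(paper))  # Stage.FORMAL_REVIEW
```

Even in this toy form, the design choice is visible: the community stage produces structured signals (scores and criticisms) that an EIC can weigh, rather than replacing the formal review itself.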

XI. CONCLUSIONS

This roadmap survey has embarked on an exploration of the transformative trends in generative AI research, particularly focusing on speculated advancements like Q* and the progressive strides towards AGI. Our analysis highlights a crucial paradigm shift, driven by innovations such as MoE, multimodal learning, and the pursuit of AGI. These advancements signal a future where AI systems could significantly extend their capabilities in reasoning, contextual understanding, and creative problem-solving. This study reflects on AI's dual potential to either contribute to or impede global equity and justice. The equitable distribution of AI benefits and its role in decision-making processes raise crucial questions about fairness and inclusivity. It is imperative to thoughtfully integrate AI into societal structures to enhance justice and reduce disparities. Despite these advancements, several open questions and research gaps remain. These include ensuring the ethical alignment of advanced AI systems with human values and societal norms, a challenge compounded by their increasing autonomy. The safety and robustness of AGI systems in diverse environments also remain a significant research gap. Addressing these challenges requires a multidisciplinary approach, incorporating ethical, social, and philosophical perspectives.

Our survey has highlighted key areas for future interdisciplinary research in AI, emphasizing the integration of ethical, sociological, and technical perspectives. This approach will foster collaborative research, bridging the gap between technological advancement and societal needs, ensuring that AI development is aligned with human values and global welfare. The roles of MoE, multimodal learning, and AGI in reshaping generative AI have been identified as significant, as their advancements can enhance model performance and versatility, and pave the way for future research in areas like ethical AI alignment and AGI. As we forge ahead, the balance between AI advancements and human creativity is not just a goal but a necessity, ensuring AI's role as a complementary force that amplifies our capacity to innovate and solve complex challenges. Our responsibility is to guide these advancements towards enriching the human experience, aligning technological progress with ethical standards and societal well-being.

DISCLAIMER

The authors hereby declare no conflict of interest.

ABBREVIATIONS

AGI Artificial General Intelligence
AI Artificial Intelligence
AIGC AI-Generated Content
BERT Bidirectional Encoder Representations from Transformers
CCPA California Consumer Privacy Act
DQN Deep Q-Networks
EU European Union
GAN Generative Adversarial Network
GDPR General Data Protection Regulation
GPT Generative Pre-trained Transformers
GPU Graphics Processing Unit
LIDAR Light Detection and Ranging
LLM Large Language Model
LSTM Long Short-Term Memory
MCTS Monte Carlo Tree Search
ML Machine Learning
MoE Mixture of Experts
NLG Natural Language Generation
NLP Natural Language Processing
NLU Natural Language Understanding
NN Neural Network
PPO Proximal Policy Optimization
RNNs Recurrent Neural Networks
VNN Value Neural Network
VRAM Video Random Access Memory

REFERENCES

[1] A. Turing, "Computing machinery and intelligence," Mind, vol. 59, no. 236, p. 433, 1950.
[2] D. McDermott, "Artificial intelligence meets natural stupidity," ACM Sigart Bulletin, no. 57, pp. 4–9, 1976.
[3] M. Minsky, "Steps toward artificial intelligence," Proceedings of the IRE, vol. 49, no. 1, pp. 8–30, 1961.
[4] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[5] M. Minsky and S. Papert, "An introduction to computational geometry," Cambridge tiass., HIT, vol. 479, no. 480, p. 104, 1969.
[6] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[7] G.-G. Lee, L. Shi, E. Latif, Y. Gao, A. Bewersdorf, M. Nyaaba, S. Guo, objects in rgb-thermal images,” Neurocomputing, vol. 527, pp. 119–
Z. Wu, Z. Liu, H. Wang et al., “Multimodality of ai for education: To- 129, 2023.
wards artificial general intelligence,” arXiv preprint arXiv:2312.06037, [30] Q. Ye, H. Xu, G. Xu, J. Ye, M. Yan, Y. Zhou, J. Wang, A. Hu, P. Shi,
2023. Y. Shi et al., “mplug-owl: Modularization empowers large language
[8] P. Maddigan and T. Susnjak, “Chat2vis: Generating data visualisations models with multimodality,” arXiv preprint arXiv:2304.14178, 2023.
via natural language using chatgpt, codex and gpt-3 large language [31] K. LaGrandeur, “How safe is our reliance on ai, and should we regulate
models,” IEEE Access, 2023. it?” AI and Ethics, vol. 1, pp. 93–99, 2021.
[9] T. R. McIntosh, T. Liu, T. Susnjak, P. Watters, A. Ng, and M. N. [32] S. McLean, G. J. Read, J. Thompson, C. Baber, N. A. Stanton, and
Halgamuge, “A culturally sensitive test to evaluate nuanced gpt hallu- P. M. Salmon, “The risks associated with artificial general intelligence:
cination,” IEEE Transactions on Artificial Intelligence, vol. 1, no. 01, A systematic review,” Journal of Experimental & Theoretical Artificial
pp. 1–13, 2023. Intelligence, vol. 35, no. 5, pp. 649–663, 2023.
[10] M. R. Morris, J. Sohl-dickstein, N. Fiedel, T. Warkentin, A. Dafoe, [33] Y. K. Dwivedi, L. Hughes, E. Ismagilova, G. Aarts, C. Coombs,
A. Faust, C. Farabet, and S. Legg, “Levels of agi: Operationalizing T. Crick, Y. Duan, R. Dwivedi, J. Edwards, A. Eirug, V. Galanos,
progress on the path to agi,” arXiv preprint arXiv:2311.02462, 2023. P. V. Ilavarasan, M. Janssen, P. Jones, A. K. Kar, H. Kizgin, B. Kro-
[11] J. Schuett, N. Dreksler, M. Anderljung, D. McCaffary, L. Heim, nemann, B. Lal, B. Lucini, R. Medaglia, K. Le Meunier-FitzHugh,
E. Bluemke, and B. Garfinkel, “Towards best practices in agi L. C. Le Meunier-FitzHugh, S. Misra, E. Mogaji, S. K. Sharma,
safety and governance: A survey of expert opinion,” arXiv preprint J. B. Singh, V. Raghavan, R. Raman, N. P. Rana, S. Samothrakis,
arXiv:2305.07153, 2023. J. Spencer, K. Tamilmani, A. Tubadji, P. Walton, and M. D. Williams,
[12] X. Shuai, J. Rollins, I. Moulinier, T. Custis, M. Edmunds, and “Artificial intelligence (ai): Multidisciplinary perspectives on emerging
F. Schilder, “A multidimensional investigation of the effects of pub- challenges, opportunities, and agenda for research, practice and policy,”
lication retraction on scholarly impact,” Journal of the Association for International Journal of Information Management, vol. 57, p. 101994,
Information Science and Technology, vol. 68, no. 9, pp. 2225–2236, 2021.
2017. [34] I. Gabriel, “Artificial intelligence, values, and alignment,” Minds and
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Machines, vol. 30, pp. 411–437, 2020.
Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” [35] A. Shaban-Nejad, M. Michalowski, S. Bianco, J. S. Brownstein,
Advances in neural information processing systems, vol. 30, 2017. D. L. Buckeridge, and R. L. Davis, “Applied artificial intelligence in
[14] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., “Improving healthcare: Listening to the winds of change in a post-covid-19 world,”
language understanding by generative pre-training,” 2018. pp. 1969–1971, 2022.
[15] C. Huang, Z. Zhang, B. Mao, and X. Yao, “An overview of artificial [36] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang,
intelligence ethics,” IEEE Transactions on Artificial Intelligence, 2022. A. Madotto, and P. Fung, “Survey of hallucination in natural language
[16] L. Besançon, N. Peiffer-Smadja, C. Segalas, H. Jiang, P. Masuzzo, generation,” ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
C. Smout, E. Billy, M. Deforet, and C. Leyrat, “Open science saves [37] B. Min, H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz,
lives: lessons from the covid-19 pandemic,” BMC Medical Research E. Agirre, I. Heintz, and D. Roth, “Recent advances in natural language
Methodology, vol. 21, no. 1, pp. 1–18, 2021. processing via large pre-trained language models: A survey,” ACM
[17] C. R. Triggle, R. MacDonald, D. J. Triggle, and D. Grierson, “Requiem Computing Surveys, vol. 56, no. 2, pp. 1–40, 2023.
for impact factors and high publication charges,” Accountability in [38] J. Li, X. Cheng, W. X. Zhao, J.-Y. Nie, and J.-R. Wen, “Halueval:
Research, vol. 29, no. 3, pp. 133–164, 2022. A large-scale hallucination evaluation benchmark for large language
[18] T. McIntosh, A. Kayes, Y.-P. P. Chen, A. Ng, and P. Watters, “Ran- models,” in Proceedings of the 2023 Conference on Empirical Methods
somware mitigation in the modern era: A comprehensive review, in Natural Language Processing, 2023, pp. 6449–6464.
research challenges, and future directions,” ACM Computing Surveys [39] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang,
(CSUR), vol. 54, no. 9, pp. 1–36, 2021. M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh et al., “Ethical and social
[19] T. McIntosh, T. Liu, T. Susnjak, H. Alavizadeh, A. Ng, R. Nowrozy, risks of harm from language models,” arXiv preprint arXiv:2112.04359,
and P. Watters, “Harnessing gpt-4 for generation of cybersecurity grc 2021.
policies: A focus on ransomware attack mitigation,” Computers & [40] X. Zhiheng, Z. Rui, and G. Tao, “Safety and ethical concerns of large
Security, vol. 134, p. 103424, 2023. language models,” in Proceedings of the 22nd Chinese National Con-
[20] H. Bao, W. Wang, L. Dong, Q. Liu, O. K. Mohammed, K. Aggarwal, ference on Computational Linguistics (Volume 4: Tutorial Abstracts),
S. Som, S. Piao, and F. Wei, “Vlmo: Unified vision-language pre- 2023, pp. 9–16.
training with mixture-of-modality-experts,” Advances in Neural Infor- [41] P. F. Brown, V. J. Della Pietra, P. V. Desouza, J. C. Lai, and R. L. Mer-
mation Processing Systems, vol. 35, pp. 32 897–32 912, 2022. cer, “Class-based n-gram models of natural language,” Computational
[21] N. Du, Y. Huang, A. M. Dai, S. Tong, D. Lepikhin, Y. Xu, M. Krikun, linguistics, vol. 18, no. 4, pp. 467–480, 1992.
Y. Zhou, A. W. Yu, O. Firat et al., “Glam: Efficient scaling of [42] S. Katz, “Estimation of probabilities from sparse data for the language
language models with mixture-of-experts,” in International Conference model component of a speech recognizer,” IEEE transactions on
on Machine Learning. PMLR, 2022, pp. 5547–5569. acoustics, speech, and signal processing, vol. 35, no. 3, pp. 400–401,
[22] S. Masoudnia and R. Ebrahimpour, “Mixture of experts: a literature 1987.
survey,” Artificial Intelligence Review, vol. 42, pp. 275–293, 2014. [43] R. Kneser and H. Ney, “Improved backing-off for m-gram language
[23] C. Riquelme, J. Puigcerver, B. Mustafa, M. Neumann, R. Jenatton, modeling,” in 1995 international conference on acoustics, speech, and
A. Susano Pinto, D. Keysers, and N. Houlsby, “Scaling vision with signal processing, vol. 1. IEEE, 1995, pp. 181–184.
sparse mixture of experts,” Advances in Neural Information Processing [44] R. Kuhn and R. De Mori, “A cache-based natural language model
Systems, vol. 34, pp. 8583–8595, 2021. for speech recognition,” IEEE transactions on pattern analysis and
[24] S. E. Yuksel, J. N. Wilson, and P. D. Gader, “Twenty years of mixture of machine intelligence, vol. 12, no. 6, pp. 570–583, 1990.
experts,” IEEE transactions on neural networks and learning systems, [45] H. Ney, U. Essen, and R. Kneser, “On structuring probabilistic de-
vol. 23, no. 8, pp. 1177–1193, 2012. pendences in stochastic language modelling,” Computer Speech &
[25] L. Zhang, S. Huang, W. Liu, and D. Tao, “Learning a mixture of Language, vol. 8, no. 1, pp. 1–38, 1994.
granularity-specific experts for fine-grained categorization,” in Proceed- [46] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
ings of the IEEE/CVF International Conference on Computer Vision, computation, vol. 9, no. 8, pp. 1735–1780, 1997.
2019, pp. 8331–8340. [47] M. K. Nammous and K. Saeed, “Natural language processing: speaker,
[26] D. Martin, S. Malpica, D. Gutierrez, B. Masia, and A. Serrano, language, and gender identification with lstm,” Advanced Computing
“Multimodality in vr: A survey,” ACM Computing Surveys (CSUR), and Systems for Security: Volume Eight, pp. 143–156, 2019.
vol. 54, no. 10s, pp. 1–36, 2022. [48] D. Wei, B. Wang, G. Lin, D. Liu, Z. Dong, H. Liu, and Y. Liu, “Re-
[27] Q. Sun, Q. Yu, Y. Cui, F. Zhang, X. Zhang, Y. Wang, H. Gao, J. Liu, search on unstructured text data mining and fault classification based on
T. Huang, and X. Wang, “Generative pretraining in multimodality,” rnn-lstm with malfunction inspection report,” Energies, vol. 10, no. 3,
arXiv preprint arXiv:2307.05222, 2023. p. 406, 2017.
[28] L. Wei, L. Xie, W. Zhou, H. Li, and Q. Tian, “Mvp: Multimodality- [49] L. Yao and Y. Guan, “An improved lstm structure for natural language
guided visual pre-training,” in European Conference on Computer processing,” in 2018 IEEE International Conference of Safety Produce
Vision. Springer, 2022, pp. 337–353. Informatization (IICSPI). IEEE, 2018, pp. 565–569.
[29] J. Wu, W. Zhou, X. Qian, J. Lei, L. Yu, and T. Luo, “Menet: [50] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin,
Lightweight multimodality enhancement network for detecting salient C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language
models to follow instructions with human feedback,” Advances in [72] Y. Wolf, N. Wies, Y. Levine, and A. Shashua, “Fundamental lim-
Neural Information Processing Systems, vol. 35, pp. 27 730–27 744, itations of alignment in large language models,” arXiv preprint
2022. arXiv:2304.11082, 2023.
[51] T. Susnjak, “Beyond predictive learning analytics modelling and onto [73] H. Dang, L. Mecke, F. Lehmann, S. Goller, and D. Buschek, “How
explainable artificial intelligence with prescriptive analytics and chat- to prompt? opportunities and challenges of zero-and few-shot learning
gpt,” International Journal of Artificial Intelligence in Education, pp. for human-ai interaction in creative applications of generative models,”
1–31, 2023. arXiv preprint arXiv:2209.01390, 2022.
[52] T. Susnjak, E. Griffin, M. McCutcheon, and K. Potter, “Towards clinical [74] R. Ma, X. Zhou, T. Gui, Y. Tan, L. Li, Q. Zhang, and X. Huang,
prediction with transparency: An explainable ai approach to survival “Template-free prompt tuning for few-shot ner,” arXiv preprint
modelling in residential aged care,” arXiv preprint arXiv:2312.00271, arXiv:2109.13532, 2021.
2023. [75] C. Qin and S. Joty, “Lfpt5: A unified framework for lifelong few-
[53] R. Yang, T. F. Tan, W. Lu, A. J. Thirunavukarasu, D. S. W. Ting, shot language learning based on prompt tuning of t5,” arXiv preprint
and N. Liu, “Large language models in health care: Development, arXiv:2110.07298, 2021.
applications, and challenges,” Health Care Science, vol. 2, no. 4, pp. [76] S. Wang, L. Tang, A. Majety, J. F. Rousseau, G. Shih, Y. Ding,
255–263, 2023. and Y. Peng, “Trustworthy assertion classification through prompting,”
[54] D. Baidoo-Anu and L. O. Ansah, “Education in the era of generative Journal of biomedical informatics, vol. 132, p. 104139, 2022.
artificial intelligence (ai): Understanding the potential benefits of chat- [77] Y. Fan, F. Jiang, P. Li, and H. Li, “Grammargpt: Exploring open-source
gpt in promoting teaching and learning,” Journal of AI, vol. 7, no. 1, llms for native chinese grammatical error correction with supervised
pp. 52–62, 2023. fine-tuning,” in CCF International Conference on Natural Language
[55] T. Susnjak, “Chatgpt: The end of online exam integrity?” arXiv preprint Processing and Chinese Computing. Springer, 2023, pp. 69–80.
arXiv:2212.09292, 2022. [78] D. Liga and L. Robaldo, “Fine-tuning gpt-3 for legal rule classifica-
[56] A. Tlili, B. Shehata, M. A. Adarkwah, A. Bozkurt, D. T. Hickey, tion,” Computer Law & Security Review, vol. 51, p. 105864, 2023.
R. Huang, and B. Agyemang, “What if the devil is my guardian angel: [79] Y. Liu, A. Singh, C. D. Freeman, J. D. Co-Reyes, and P. J. Liu, “Im-
Chatgpt as a case study of using chatbots in education,” Smart Learning proving large language model fine-tuning for solving math problems,”
Environments, vol. 10, no. 1, p. 15, 2023. arXiv preprint arXiv:2310.10047, 2023.
[57] M. A. AlAfnan, S. Dishari, M. Jovic, and K. Lomidze, “Chatgpt as an [80] Z. Talat, A. Névéol, S. Biderman, M. Clinciu, M. Dey, S. Longpre,
educational tool: Opportunities, challenges, and recommendations for S. Luccioni, M. Masoud, M. Mitchell, D. Radev et al., “You reap
communication, business writing, and composition courses,” Journal of what you sow: On the challenges of bias evaluation under multilingual
Artificial Intelligence and Technology, vol. 3, no. 2, pp. 60–68, 2023. settings,” in Proceedings of BigScience Episode# 5–Workshop on
[58] A. S. George and A. H. George, “A review of chatgpt ai’s impact on Challenges & Perspectives in Creating Large Language Models, 2022,
several business sectors,” Partners Universal International Innovation pp. 26–41.
Journal, vol. 1, no. 1, pp. 9–23, 2023. [81] Y. Liu, S. Yu, and T. Lin, “Hessian regularization of deep neural
[59] G. K. Hadfield and J. Clark, “Regulatory markets: The future of ai networks: A novel approach based on stochastic estimators of hessian
governance,” arXiv preprint arXiv:2304.04914, 2023. trace,” Neurocomputing, vol. 536, pp. 13–20, 2023.
[60] M. Bakker, M. Chadwick, H. Sheahan, M. Tessler, L. Campbell- [82] Y. Lu, Y. Bo, and W. He, “Confidence adaptive regularization for deep
Gillingham, J. Balaguer, N. McAleese, A. Glaese, J. Aslanides, learning with noisy labels,” arXiv preprint arXiv:2108.08212, 2021.
M. Botvinick et al., “Fine-tuning language models to find agreement [83] G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton, “Regu-
among humans with diverse preferences,” Advances in Neural Infor- larizing neural networks by penalizing confident output distributions,”
mation Processing Systems, vol. 35, pp. 38 176–38 189, 2022. arXiv preprint arXiv:1701.06548, 2017.
[61] Z. Hu, Y. Lan, L. Wang, W. Xu, E.-P. Lim, R. K.-W. Lee, L. Bing, and [84] E. Chen, Z.-W. Hong, J. Pajarinen, and P. Agrawal, “Redeeming
S. Poria, “Llm-adapters: An adapter family for parameter-efficient fine- intrinsic rewards via constrained optimization,” Advances in Neural
tuning of large language models,” arXiv preprint arXiv:2304.01933, Information Processing Systems, vol. 35, pp. 4996–5008, 2022.
2023. [85] Y. Jiang, Z. Li, M. Tan, S. Wei, G. Zhang, Z. Guan, and B. Han,
[62] H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. A. “A stable block adjustment method without ground control points
Raffel, “Few-shot parameter-efficient fine-tuning is better and cheaper using bound constrained optimization,” International Journal of Remote
than in-context learning,” Advances in Neural Information Processing Sensing, vol. 43, no. 12, pp. 4708–4722, 2022.
Systems, vol. 35, pp. 1950–1965, 2022. [86] M. Kachuee and S. Lee, “Constrained policy optimization for con-
[63] H. Zheng, L. Shen, A. Tang, Y. Luo, H. Hu, B. Du, and D. Tao, trolled self-learning in conversational ai systems,” arXiv preprint
“Learn from model beyond fine-tuning: A survey,” arXiv preprint arXiv:2209.08429, 2022.
arXiv:2310.08184, 2023. [87] Z. Song, H. Wang, and Y. Jin, “A surrogate-assisted evolutionary
[64] P. Manakul, A. Liusie, and M. J. Gales, “Selfcheckgpt: Zero-resource framework with regions of interests-based data selection for expensive
black-box hallucination detection for generative large language mod- constrained optimization,” IEEE Transactions on Systems, Man, and
els,” arXiv preprint arXiv:2303.08896, 2023. Cybernetics: Systems, 2023.
[65] A. Martino, M. Iannelli, and C. Truong, “Knowledge injection to [88] J. Yu, T. Xu, Y. Rong, J. Huang, and R. He, “Structure-aware condi-
counter large language model (llm) hallucination,” in European Se- tional variational auto-encoder for constrained molecule optimization,”
mantic Web Conference. Springer, 2023, pp. 182–185. Pattern Recognition, vol. 126, p. 108581, 2022.
[66] J.-Y. Yao, K.-P. Ning, Z.-H. Liu, M.-N. Ning, and L. Yuan, “Llm [89] P. Butlin, “Ai alignment and human reward,” in Proceedings of the 2021
lies: Hallucinations are not bugs, but features as adversarial examples,” AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 437–445.
arXiv preprint arXiv:2310.01469, 2023. [90] F. Faal, K. Schmitt, and J. Y. Yu, “Reward modeling for mitigating
[67] Y. Zhang, Y. Li, L. Cui, D. Cai, L. Liu, T. Fu, X. Huang, E. Zhao, toxicity in transformer-based language models,” Applied Intelligence,
Y. Zhang, Y. Chen et al., “Siren’s song in the ai ocean: A survey on hal- vol. 53, no. 7, pp. 8421–8435, 2023.
lucination in large language models,” arXiv preprint arXiv:2309.01219, [91] J. Leike, D. Krueger, T. Everitt, M. Martic, V. Maini, and S. Legg,
2023. “Scalable agent alignment via reward modeling: a research direction,”
[68] J. Ji, M. Liu, J. Dai, X. Pan, C. Zhang, C. Bian, R. Sun, Y. Wang, and arXiv preprint arXiv:1811.07871, 2018.
Y. Yang, “Beavertails: Towards improved safety alignment of llm via [92] L. Li, Y. Chai, S. Wang, Y. Sun, H. Tian, N. Zhang, and H. Wu, “Tool-
a human-preference dataset,” arXiv preprint arXiv:2307.04657, 2023. augmented reward modeling,” arXiv preprint arXiv:2310.01045, 2023.
[69] Y. Liu, Y. Yao, J.-F. Ton, X. Zhang, R. G. H. Cheng, Y. Klochkov, [93] F. Barreto, L. Moharkar, M. Shirodkar, V. Sarode, S. Gonsalves,
M. F. Taufiq, and H. Li, “Trustworthy llms: a survey and guideline and A. Johns, “Generative artificial intelligence: Opportunities and
for evaluating large language models’ alignment,” arXiv preprint challenges of large language models,” in International Conference on
arXiv:2308.05374, 2023. Intelligent Computing and Networking. Springer, 2023, pp. 545–553.
[70] Y. Wang, W. Zhong, L. Li, F. Mi, X. Zeng, W. Huang, L. Shang, [94] Z. Chen, Z. Wang, Z. Wang, H. Liu, Z. Yin, S. Liu, L. Sheng,
X. Jiang, and Q. Liu, “Aligning large language models with human: A W. Ouyang, Y. Qiao, and J. Shao, “Octavius: Mitigating task inter-
survey,” arXiv preprint arXiv:2307.12966, 2023. ference in mllms via moe,” arXiv preprint arXiv:2311.02684, 2023.
[71] Z. Sun, Y. Shen, Q. Zhou, H. Zhang, Z. Chen, D. Cox, Y. Yang, [95] C. Dun, M. D. C. H. Garcia, G. Zheng, A. H. Awadallah, A. Kyrillidis,
and C. Gan, “Principle-driven self-alignment of language mod- and R. Sim, “Sweeping heterogeneity with smart mops: Mixture of
els from scratch with minimal human supervision,” arXiv preprint prompts for llm task adaptation,” arXiv preprint arXiv:2310.02842,
arXiv:2305.03047, 2023. 2023.
[96] H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, ai techniques,” International Journal of High Performance Systems
N. Barnes, and A. Mian, “A comprehensive overview of large language Architecture, vol. 10, no. 3-4, pp. 185–196, 2021.
models,” arXiv preprint arXiv:2307.06435, 2023. [118] C. Zhang, Z. Yang, X. He, and L. Deng, “Multimodal intelligence:
[97] F. Xue, Y. Fu, W. Zhou, Z. Zheng, and Y. You, “To repeat or not to Representation learning, information fusion, and applications,” IEEE
repeat: Insights from scaling llm under token-crisis,” arXiv preprint Journal of Selected Topics in Signal Processing, vol. 14, no. 3, pp.
arXiv:2305.13230, 2023. 478–493, 2020.
[98] M. Nowaz Rabbani Chowdhury, S. Zhang, M. Wang, S. Liu, and P.-Y. [119] H. Qiao, V. Liu, and L. Chilton, “Initial images: using image prompts
Chen, “Patch-level routing in mixture-of-experts is provably sample- to improve subject representation in multimodal ai generated art,” in
efficient for convolutional neural networks,” arXiv e-prints, pp. arXiv– Proceedings of the 14th Conference on Creativity and Cognition, 2022,
2306, 2023. pp. 15–28.
[99] J. Peng, K. Zhou, R. Zhou, T. Hartvigsen, Y. Zhang, Z. Wang, and [120] A. E. Stewart, Z. Keirn, and S. K. D’Mello, “Multimodal modeling
T. Chen, “Sparse moe as a new treatment: Addressing forgetting, fitting, of collaborative problem-solving facets in triads,” User Modeling and
learning issues in multi-modal multi-task learning,” in Conference on User-Adapted Interaction, pp. 1–39, 2021.
Parsimony and Learning (Recent Spotlight Track), 2023. [121] L. Xue, N. Yu, S. Zhang, J. Li, R. Martı́n-Martı́n, J. Wu, C. Xiong,
[100] C. N. d. Santos, J. Lee-Thorp, I. Noble, C.-C. Chang, and D. Uthus, R. Xu, J. C. Niebles, and S. Savarese, “Ulip-2: Towards scal-
“Memory augmented language models through mixture of word ex- able multimodal pre-training for 3d understanding,” arXiv preprint
perts,” arXiv preprint arXiv:2311.10768, 2023. arXiv:2305.08275, 2023.
[101] W. Wang, G. Ma, Y. Li, and B. Du, “Language-routing mixture of [122] L. Yan, L. Zhao, D. Gasevic, and R. Martinez-Maldonado, “Scalability,
experts for multilingual and code-switching speech recognition,” arXiv sustainability, and ethicality of multimodal learning analytics,” in
preprint arXiv:2307.05956, 2023. LAK22: 12th international learning analytics and knowledge confer-
[102] X. Zhao, X. Chen, Y. Cheng, and T. Chen, “Sparse moe with language ence, 2022, pp. 13–23.
guided routing for multilingual machine translation,” in Conference on [123] Y. Liu-Thompkins, S. Okazaki, and H. Li, “Artificial empathy in
Parsimony and Learning (Recent Spotlight Track), 2023. marketing interactions: Bridging the human-ai gap in affective and
[103] W. Huang, H. Zhang, P. Peng, and H. Wang, “Multi-gate mixture- social customer experience,” Journal of the Academy of Marketing
of-expert combined with synthetic minority over-sampling technique Science, vol. 50, no. 6, pp. 1198–1218, 2022.
for multimode imbalanced fault diagnosis,” in 2023 26th International [124] M. S. Rahman, S. Bag, M. A. Hossain, F. A. M. A. Fattah, M. O.
Conference on Computer Supported Cooperative Work in Design Gani, and N. P. Rana, “The new wave of ai-powered luxury brands
(CSCWD). IEEE, 2023, pp. 456–461. online shopping experience: The role of digital multisensory cues and
[104] B. Liu, L. Ding, L. Shen, K. Peng, Y. Cao, D. Cheng, and D. Tao, customers’ engagement,” Journal of Retailing and Consumer Services,
“Diversifying the mixture-of-experts representation for language mod- vol. 72, p. 103273, 2023.
els with orthogonal optimizer,” arXiv preprint arXiv:2310.09762, 2023. [125] E. Sachdeva, N. Agarwal, S. Chundi, S. Roelofs, J. Li, B. Dariush,
[105] W. Wang, Z. Lai, S. Li, W. Liu, K. Ge, Y. Liu, A. Shen, and D. Li, C. Choi, and M. Kochenderfer, “Rank2tell: A multimodal driving
“Prophet: Fine-grained load balancing for parallel training of large- dataset for joint importance ranking and reasoning,” arXiv preprint
scale moe models,” in 2023 IEEE International Conference on Cluster arXiv:2309.06597, 2023.
Computing (CLUSTER). IEEE, 2023, pp. 82–94. [126] C. Cui, Y. Ma, X. Cao, W. Ye, Y. Zhou, K. Liang, J. Chen, J. Lu,
[106] X. Yao, S. Liang, S. Han, and H. Huang, “Enhancing molecular Z. Yang, K.-D. Liao et al., “A survey on multimodal large language
property prediction via mixture of collaborative experts,” arXiv preprint models for autonomous driving,” arXiv preprint arXiv:2311.12320,
arXiv:2312.03292, 2023. 2023.
[107] Z. Xiao, Y. Jiang, G. Tang, L. Liu, S. Xu, Y. Xiao, and W. Yan, “Ad- [127] A. B. Temsamani, A. K. Chavali, W. Vervoort, T. Tuytelaars, G. Rade-
versarial mixture of experts with category hierarchy soft constraint,” vski, H. Van Hamme, K. Mets, M. Hutsebaut-Buysse, T. De Schepper,
in 2021 IEEE 37th International Conference on Data Engineering and S. Latré, “A multimodal ai approach for intuitively instructable
(ICDE). IEEE, 2021, pp. 2453–2463. autonomous systems: a case study of an autonomous off-highway
[108] M. Agbese, R. Mohanani, A. Khan, and P. Abrahamsson, “Implement- vehicle,” in The Eighteenth International Conference on Autonomic
ing ai ethics: Making sense of the ethical requirements,” in Proceedings and Autonomous Systems, ICAS 2022, May 22-26, 2022, Venice, Italy,
of the 27th International Conference on Evaluation and Assessment in 2022, pp. 31–39.
Software Engineering, 2023, pp. 62–71. [128] J. Lee and S. Y. Shin, “Something that they never said: Multimodal
[109] Z. Chen, Y. Deng, Y. Wu, Q. Gu, and Y. Li, “Towards understanding disinformation and source vividness in understanding the power of ai-
the mixture-of-experts layer in deep learning,” Advances in neural enabled deepfake news,” Media Psychology, vol. 25, no. 4, pp. 531–
information processing systems, vol. 35, pp. 23 049–23 062, 2022. 546, 2022.
[110] Y. Zhou, T. Lei, H. Liu, N. Du, Y. Huang, V. Zhao, A. M. Dai, Q. V. [129] S. Muppalla, S. Jia, and S. Lyu, “Integrating audio-visual features
Le, J. Laudon et al., “Mixture-of-experts with expert choice routing,” for multimodal deepfake detection,” arXiv preprint arXiv:2310.03827,
Advances in Neural Information Processing Systems, vol. 35, pp. 7103– 2023.
7114, 2022. [130] S. Kumar, M. K. Chaube, S. N. Nenavath, S. K. Gupta, and S. K.
[111] N. Guha, C. Lawrence, L. A. Gailmard, K. Rodolfa, F. Surani, Tetarave, “Privacy preservation and security challenges: a new frontier
R. Bommasani, I. Raji, M.-F. Cuéllar, C. Honigsberg, P. Liang et al., multimodal machine learning research,” International Journal of Sensor
“Ai regulation has its own alignment problem: The technical and insti- Networks, vol. 39, no. 4, pp. 227–245, 2022.
tutional feasibility of disclosure, registration, licensing, and auditing,” [131] J. Marchang and A. Di Nuovo, “Assistive multimodal robotic system
George Washington Law Review, Forthcoming, 2023. (amrsys): security and privacy issues, challenges, and possible solu-
[112] Gemini Team, Google, “Gemini: A family tions,” Applied Sciences, vol. 12, no. 4, p. 2174, 2022.
of highly capable multimodal models,” 2023, [132] A. Peña, I. Serna, A. Morales, J. Fierrez, A. Ortega, A. Herrarte, M. Al-
accessed: 17 December 2023. [Online]. Available: cantara, and J. Ortega-Garcia, “Human-centric multimodal machine
https://storage.googleapis.com/deepmind-media/gemini/gemini 1 report.pdf learning: Recent advances and testbed on ai-based recruitment,” SN
[113] J. N. Acosta, G. J. Falcone, P. Rajpurkar, and E. J. Topol, “Multimodal Computer Science, vol. 4, no. 5, p. 434, 2023.
biomedical ai,” Nature Medicine, vol. 28, no. 9, pp. 1773–1784, 2022. [133] R. Wolfe and A. Caliskan, “American== white in multimodal language-
[114] S. Qi, Z. Cao, J. Rao, L. Wang, J. Xiao, and X. Wang, “What is and-image ai,” in Proceedings of the 2022 AAAI/ACM Conference on
the limitation of multimodal llms? a deeper look into multimodal AI, Ethics, and Society, 2022, pp. 800–812.
llms through prompt probing,” Information Processing & Management, [134] R. Wolfe, Y. Yang, B. Howe, and A. Caliskan, “Contrastive language-
vol. 60, no. 6, p. 103510, 2023. vision ai models pretrained on web-scraped multimodal data exhibit
[115] B. Xu, D. Kocyigit, R. Grimm, B. P. Griffin, and F. Cheng, “Applica- sexual objectification bias,” in Proceedings of the 2023 ACM Confer-
tions of artificial intelligence in multimodality cardiovascular imaging: ence on Fairness, Accountability, and Transparency, 2023, pp. 1174–
a state-of-the-art review,” Progress in cardiovascular diseases, vol. 63, 1185.
no. 3, pp. 367–376, 2020. [135] M. Afshar, B. Sharma, D. Dligach, M. Oguss, R. Brown, N. Chhabra,
[116] A. Birhane, V. U. Prabhu, and E. Kahembwe, “Multimodal datasets: H. M. Thompson, T. Markossian, C. Joyce, M. M. Churpek et al.,
misogyny, pornography, and malignant stereotypes,” arXiv preprint “Development and multimodal validation of a substance misuse algo-
arXiv:2110.01963, 2021. rithm for referral to treatment using artificial intelligence (smart-ai): a
[117] Y. Li, W. Li, N. Li, X. Qiu, and K. B. Manokaran, “Multimodal infor- retrospective deep learning study,” The Lancet Digital Health, vol. 4,
mation interaction and fusion for the parallel computing system using no. 6, pp. e426–e435, 2022.
[136] H. Alwahaby, M. Cukurova, Z. Papamitsiou, and M. Giannakos, “The [156] Z. Wei, X. Zhang, and M. Sun, “Extracting weighted finite automata
evidence of impact and ethical considerations of multimodal learning from recurrent neural networks for natural languages,” in International
analytics: A systematic literature review,” The Multimodal Learning Conference on Formal Engineering Methods. Springer, 2022, pp.
Analytics Handbook, pp. 289–325, 2022. 370–385.
[137] Q. Miao, W. Zheng, Y. Lv, M. Huang, W. Ding, and F.-Y. Wang, [157] F. Bonassi, M. Farina, J. Xie, and R. Scattolini, “On recurrent neural
“Dao to hanoi via desci: Ai paradigm shifts from alphago to chatgpt,” networks for learning-based control: recent results and ideas for future
IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 4, pp. 877–897, developments,” Journal of Process Control, vol. 114, pp. 92–104, 2022.
2023. [158] Z. Guo, Y. Tang, R. Zhang, D. Wang, Z. Wang, B. Zhao, and
[138] Y. Rong, “Roadmap of alphago to alphastar: Problems and challenges,” X. Li, “Viewrefer: Grasp the multi-view knowledge for 3d visual
in 2nd International Conference on Artificial Intelligence, Automation, grounding,” in Proceedings of the IEEE/CVF International Conference
and High-Performance Computing (AIAHPC 2022), vol. 12348. SPIE, on Computer Vision, 2023, pp. 15 372–15 383.
2022, pp. 904–914. [159] C. Pan, Y. He, J. Peng, Q. Zhang, W. Sui, and Z. Zhang, “Baeformer:
[139] Y. Gao, M. Zhou, D. Liu, Z. Yan, S. Zhang, and D. N. Metaxas, “A Bi-directional and early interaction transformers for bird’s eye view
data-scalable transformer for medical image segmentation: architecture, semantic segmentation,” in Proceedings of the IEEE/CVF Conference
model efficiency, and benchmark,” arXiv preprint arXiv:2203.00131, on Computer Vision and Pattern Recognition, 2023, pp. 9590–9599.
2022. [160] P. Xu, X. Zhu, and D. A. Clifton, “Multimodal learning with transform-
[140] W. Peebles and S. Xie, “Scalable diffusion models with transformers,” ers: A survey,” IEEE Transactions on Pattern Analysis and Machine
in Proceedings of the IEEE/CVF International Conference on Com- Intelligence, 2023.
puter Vision, 2023, pp. 4195–4205. [161] I. Molenaar, S. de Mooij, R. Azevedo, M. Bannert, S. Järvelä, and
[141] R. Pope, S. Douglas, A. Chowdhery, J. Devlin, J. Bradbury, J. Heek, D. Gašević, “Measuring self-regulated learning and the role of ai: Five
K. Xiao, S. Agrawal, and J. Dean, “Efficiently scaling transformer years of research using multimodal multichannel data,” Computers in
inference,” Proceedings of Machine Learning and Systems, vol. 5, Human Behavior, vol. 139, p. 107540, 2023.
2023. [162] S. Steyaert, M. Pizurica, D. Nagaraj, P. Khandelwal, T. Hernandez-
[142] Y. Ding and M. Jia, “Convolutional transformer: An enhanced atten- Boussard, A. J. Gentles, and O. Gevaert, “Multimodal data fusion
tion mechanism architecture for remaining useful life estimation of for cancer biomarker discovery with deep learning,” Nature Machine
bearings,” IEEE Transactions on Instrumentation and Measurement, Intelligence, vol. 5, no. 4, pp. 351–362, 2023.
vol. 71, pp. 1–10, 2022. [163] V. Rani, S. T. Nabi, M. Kumar, A. Mittal, and K. Kumar, “Self-
[143] Y. Ding, M. Jia, Q. Miao, and Y. Cao, “A novel time–frequency supervised learning: A succinct review,” Archives of Computational
transformer based on self–attention mechanism and its application in Methods in Engineering, vol. 30, no. 4, pp. 2761–2775, 2023.
fault diagnosis of rolling bearings,” Mechanical Systems and Signal [164] M. C. Schiappa, Y. S. Rawat, and M. Shah, “Self-supervised learning
Processing, vol. 168, p. 108616, 2022. for videos: A survey,” ACM Computing Surveys, vol. 55, no. 13s, pp.
[144] G. Wang, Y. Zhao, C. Tang, C. Luo, and W. Zeng, “When shift 1–37, 2023.
operation meets vision transformer: An extremely simple alternative [165] J. Yu, H. Yin, X. Xia, T. Chen, J. Li, and Z. Huang, “Self-supervised
to attention mechanism,” in Proceedings of the AAAI Conference on learning for recommender systems: A survey,” IEEE Transactions on
Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2423–2430. Knowledge and Data Engineering, 2023.
[145] H. Cai, J. Li, M. Hu, C. Gan, and S. Han, “Efficientvit: Lightweight [166] V. Bharti, A. Kumar, V. Purohit, R. Singh, A. K. Singh, and S. K.
multi-scale attention for high-resolution dense prediction,” in Proceed- Singh, “A label efficient semi self-supervised learning framework for
ings of the IEEE/CVF International Conference on Computer Vision, iot devices in industrial process,” IEEE Transactions on Industrial
2023, pp. 17 302–17 313. Informatics, 2023.
[146] X. Liu, H. Peng, N. Zheng, Y. Yang, H. Hu, and Y. Yuan, “Efficientvit: [167] D. Sam and J. Z. Kolter, “Losses over labels: Weakly supervised
Memory efficient vision transformer with cascaded group attention,” learning via direct loss construction,” in Proceedings of the AAAI
in Proceedings of the IEEE/CVF Conference on Computer Vision and Conference on Artificial Intelligence, vol. 37, no. 8, 2023, pp. 9695–
Pattern Recognition, 2023, pp. 14 420–14 430. 9703.
[147] Y. Li, Q. Fan, H. Huang, Z. Han, and Q. Gu, “A modified yolov8 [168] M. Wang, P. Xie, Y. Du, and X. Hu, “T5-based model for abstractive
detection network for uav aerial image recognition,” Drones, vol. 7, summarization: A semi-supervised learning approach with consistency
no. 5, p. 304, 2023. loss functions,” Applied Sciences, vol. 13, no. 12, p. 7111, 2023.
[148] F. M. Talaat and H. ZainEldin, “An improved fire detection approach [169] Q. Li, X. Peng, Y. Qiao, and Q. Hao, “Unsupervised person re-
based on yolo-v8 for smart cities,” Neural Computing and Applications, identification with multi-label learning guided self-paced clustering,”
vol. 35, no. 28, pp. 20 939–20 954, 2023. Pattern Recognition, vol. 125, p. 108521, 2022.
[149] S. Tamang, B. Sen, A. Pradhan, K. Sharma, and V. K. Singh, “Enhanc- [170] P. Nancy, H. Pallathadka, M. Naved, K. Kaliyaperumal, K. Arumugam,
ing covid-19 safety: Exploring yolov8 object detection for accurate face and V. Garchar, “Deep learning and machine learning based efficient
mask classification,” International Journal of Intelligent Systems and framework for image based plant disease classification and detection,”
Applications in Engineering, vol. 11, no. 2, pp. 892–897, 2023. in 2022 International Conference on Advanced Computing Technolo-
[150] J. Lu, R. Xiong, J. Tian, C. Wang, C.-W. Hsu, N.-T. Tsou, F. Sun, gies and Applications (ICACTA). IEEE, 2022, pp. 1–6.
and J. Li, “Battery degradation prediction against uncertain future con- [171] P. An, Z. Wang, and C. Zhang, “Ensemble unsupervised autoencoders
ditions with recurrent neural network enabled deep learning,” Energy and gaussian mixture model for cyberattack detection,” Information
Storage Materials, vol. 50, pp. 139–151, 2022. Processing & Management, vol. 59, no. 2, p. 102844, 2022.
[151] A. Onan, “Bidirectional convolutional recurrent neural network archi- [172] S. Yan, H. Shao, Y. Xiao, B. Liu, and J. Wan, “Hybrid robust convo-
tecture with group-wise enhancement mechanism for text sentiment lutional autoencoder for unsupervised anomaly detection of machine
classification,” Journal of King Saud University-Computer and Infor- tools under noises,” Robotics and Computer-Integrated Manufacturing,
mation Sciences, vol. 34, no. 5, pp. 2098–2117, 2022. vol. 79, p. 102441, 2023.
[152] F. Shan, X. He, D. J. Armaghani, P. Zhang, and D. Sheng, “Success [173] E. Ayanoglu, K. Davaslioglu, and Y. E. Sagduyu, “Machine learning
and challenges in predicting tbm penetration rate using recurrent neural in nextg networks via generative adversarial networks,” IEEE Trans-
networks,” Tunnelling and Underground Space Technology, vol. 130, actions on Cognitive Communications and Networking, vol. 8, no. 2,
p. 104728, 2022. pp. 480–501, 2022.
[153] C. Sridhar, P. K. Pareek, R. Kalidoss, S. S. Jamal, P. K. Shukla, S. J. [174] K. Yan, X. Chen, X. Zhou, Z. Yan, and J. Ma, “Physical model
Nuagah et al., “Optimal medical image size reduction model creation informed fault detection and diagnosis of air handling units based
using recurrent neural network and genpsowvq,” Journal of Healthcare on transformer generative adversarial network,” IEEE Transactions on
Engineering, vol. 2022, 2022. Industrial Informatics, vol. 19, no. 2, pp. 2192–2199, 2022.
[154] J. Zhu, Q. Jiang, Y. Shen, C. Qian, F. Xu, and Q. Zhu, “Application [175] N.-R. Zhou, T.-F. Zhang, X.-W. Xie, and J.-Y. Wu, “Hybrid quantum–
of recurrent neural network to mechanical fault diagnosis: A review,” classical generative adversarial networks for image generation via
Journal of Mechanical Science and Technology, vol. 36, no. 2, pp. learning discrete distribution,” Signal Processing: Image Communica-
527–542, 2022. tion, vol. 110, p. 116891, 2023.
[155] S. Lin, W. Lin, W. Wu, F. Zhao, R. Mo, and H. Zhang, “Segrnn: Seg- [176] P. Ladosz, L. Weng, M. Kim, and H. Oh, “Exploration in deep
ment recurrent neural network for long-term time series forecasting,” reinforcement learning: A survey,” Information Fusion, vol. 85, pp.
arXiv preprint arXiv:2308.11200, 2023. 1–22, 2022.
[177] Y. Matsuo, Y. LeCun, M. Sahani, D. Precup, D. Silver, M. Sugiyama, [198] W. Peng, D. Xu, T. Xu, J. Zhang, and E. Chen, “Are gpt embeddings
E. Uchibe, and J. Morimoto, “Deep learning, reinforcement learning, useful for ads and recommendation?” in International Conference on
and world models,” Neural Networks, vol. 152, pp. 267–275, 2022. Knowledge Science, Engineering and Management. Springer, 2023,
[178] D. Bertoin, A. Zouitine, M. Zouitine, and E. Rachelson, “Look where pp. 151–162.
you look! saliency-guided q-networks for generalization in visual [199] E. Erdem, M. Kuyu, S. Yagcioglu, A. Frank, L. Parcalabescu, B. Plank,
reinforcement learning,” Advances in Neural Information Processing A. Babii, O. Turuta, A. Erdem, I. Calixto et al., “Neural natural
Systems, vol. 35, pp. 30 693–30 706, 2022. language generation: A survey on multilinguality, multimodality, con-
[179] A. Hafiz, “A survey of deep q-networks used for reinforcement trollability and learning,” Journal of Artificial Intelligence Research,
learning: State of the art,” Intelligent Communication Technologies and vol. 73, pp. 1131–1207, 2022.
Virtual Mobile Networks: Proceedings of ICICV 2022, pp. 393–402, [200] J. Qian, L. Dong, Y. Shen, F. Wei, and W. Chen, “Controllable
2022. natural language generation with contrastive prefixes,” arXiv preprint
[180] A. Hafiz, M. Hassaballah, A. Alqahtani, S. Alsubai, and M. A. Hameed, arXiv:2202.13257, 2022.