From Google Gemini To OpenAI

Abstract—This comprehensive survey explored the evolving landscape of generative Artificial Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts (MoE), multimodal learning, and the speculated advancements towards Artificial General Intelligence (AGI). It critically examined the current state and future trajectory of generative AI, exploring how innovations like Google's Gemini and the anticipated OpenAI Q* project are reshaping research priorities and applications across various domains, including an impact analysis on the generative AI research taxonomy. It assessed the computational challenges, scalability, and real-world implications of these technologies while highlighting their potential in driving significant progress in fields like healthcare, finance, and education. It also addressed the emerging academic challenges posed by the proliferation of both AI-themed and AI-generated preprints, examining their impact on the peer-review process and scholarly communication. The study highlighted the importance of incorporating ethical and human-centric methods in AI development, ensuring alignment with societal norms and welfare, and outlined a strategy for future AI research that focuses on a balanced and conscientious use of MoE, multimodality, and AGI in generative AI.

Index Terms—AI Ethics, Artificial General Intelligence (AGI), Artificial Intelligence (AI), Gemini, Generative AI, Mixture of Experts (MoE), Multimodality, Q* (Q-star), Research Impact Analysis.

I. INTRODUCTION

Artificial Intelligence (AI) has witnessed a crucial turn with the advent of Large Language Models (LLMs), notably ChatGPT, developed by OpenAI, and the recent unveiling of Google's Gemini [7], [8]. This technology has not only revolutionized industry and academia, but has also reignited critical discussions concerning AI consciousness and its potential threats to humanity [9], [10], [11]. The development of such advanced AI systems, including notable competitors like Anthropic's Claude, and now Gemini, which demonstrates several advances over previous models like GPT-3 and Google's own LaMDA, has reshaped the research landscape. Gemini's ability to learn from two-way conversations and its "spike-and-slab" attention method, which allows it to focus on relevant parts of the context during multi-turn conversations, represents a significant leap in developing models that are better equipped for multidomain conversational applications. These innovations in LLMs, including the mixture-of-experts methods employed by Gemini, signal a move towards models that can handle a diversity of inputs and foster multimodal approaches. Amidst this backdrop, speculations of an OpenAI project known as Q* (Q-Star) have surfaced, allegedly combining the power of LLMs with sophisticated algorithms such as Q-learning and A* (A-Star algorithm), further contributing to the dynamic research environment.
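The ingredients named in this speculation are well-established algorithms even though their rumored combination in Q* is unconfirmed. As a purely illustrative sketch of the first ingredient, and not a description of any OpenAI system, the textbook tabular Q-learning update on a toy environment looks like this (the environment and all hyperparameters are our own assumptions for illustration):

```python
import random

# Illustrative tabular Q-learning on a toy 5-state chain (move left/right,
# reward 1.0 on reaching the rightmost state). This only demonstrates the
# textbook update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a));
# it says nothing about the unconfirmed Q* project itself.
N_STATES = 5
ACTIONS = (-1, +1)
ALPHA, GAMMA = 0.5, 0.9

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic toy environment: clamp the position to [0, N_STATES-1]."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

random.seed(0)
for _ in range(300):                  # episodes
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)    # off-policy: uniform random behavior
        s2, r = step(s, a)
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy moves right from every non-terminal state.
policy = [max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

The speculated appeal of pairing such value learning with A*-style search is that learned values can serve as the search heuristic, which is the "learning plus search" combination discussed later in this paper.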
to refine research directions in light of the fast-paced evolution of the field, which appears to be partly traced through the changing popularity of various research keywords over time. The release of generative models like GPT and the widespread commercial success of ChatGPT have been influential. As depicted in Figure 1, the rise and fall of certain keywords appear to have correlated with significant industry milestones, such as the release of the "Transformer" model in 2017 [13], the GPT model in 2018 [14], and the commercial ChatGPT-3.5 in December 2022. For instance, the spike in searches related to "Deep Learning" coincides with the breakthroughs in neural network applications, while the interest in "Natural Language Processing" surges as models like GPT and LLaMA redefine what's possible in language understanding and generation. The enduring attention to "Ethics / Ethical" in AI research, despite some fluctuations, reflects the continuous and deep-rooted concern for the moral dimensions of AI, underscoring that ethical considerations are not merely a reactionary measure, but an integral and persistent dialogue within the AI discussion [15].

It is academically intriguing to postulate whether these trends signify a causal relationship, where technological advancements drive research focus, or if the burgeoning research itself propels technological development. This paper also explores the profound societal and economic impacts of AI advancements. We examine how AI technologies are reshaping various industries, altering employment landscapes, and influencing socio-economic structures. This analysis highlights both the opportunities and challenges posed by AI in the modern world, emphasizing its role in driving innovation and economic growth, while also considering the ethical implications and potential for societal disruption. Future studies could yield more definitive insights, yet the synchronous interplay between innovation and academic curiosity remains a hallmark of AI's progress.

Meanwhile, the exponential increase in the number of preprints posted on arXiv under the Computer Science > Artificial Intelligence (cs.AI) category, as illustrated in Figure 2, appears to signify a paradigm shift in research dissemination within the AI community. While the rapid distribution of findings enables swift knowledge exchange, it also raises concerns regarding the validation of information. The surge in preprints may lead to the propagation of unvalidated or biased information, as these studies do not undergo the rigorous scrutiny and potential retraction typical of peer-reviewed publications [16], [17]. This trend underlines the need for careful consideration and critique in the academic community, especially given the potential for such unvetted studies to be cited and their findings propagated.

B. Objectives

The impetus for this investigation is the official unveiling of Gemini and the speculative discourse surrounding the Q* project, which prompts a timely examination of the prevailing currents in generative AI research. This paper specifically contributes to the understanding of how MoE, multimodality, and Artificial General Intelligence (AGI) are impacting generative AI models, offering detailed analysis and future directions for each of these three key areas. This study does not aim to perpetuate conjecture about the unrevealed Q-Star initiative, but rather to critically appraise the potential for obsolescence or insignificance in extant research themes, whilst concurrently delving into burgeoning prospects within the rapidly transforming LLM panorama. This inquiry is reminiscent of the obsolete nature of encryption-centric or file-entropy-based ransomware detection methodologies, which have been eclipsed by the transition of ransomware collectives towards data theft strategies utilizing varied attack vectors, relegating contemporary studies on crypto-ransomware to the status of latecomers [18], [19]. Advances in AI are anticipated not only to enhance capabilities in language analysis and knowledge synthesis but also to pioneer in areas like Mixture of Experts (MoE) [20], [21], [22], [23], [24], [25], multimodality [26], [27], [28], [29], [30], and Artificial General Intelligence (AGI) [31], [32], [10], [11], and they have already heralded the obsolescence of conventional, statistics-driven natural language processing techniques in many domains [8]. Nonetheless, the perennial imperative for AI to align with human ethics and values persists as a fundamental tenet [33], [34], [35], and the conjectural Q-Star initiative offers an unprecedented opportunity to instigate discourse on how such advancements might reconfigure the LLM research topography. Within this milieu, insights from Dr. Jim Fan (senior research scientist & lead of AI agents at NVIDIA) on Q*, particularly concerning the amalgamation of learning and search algorithms, furnish an invaluable perspective on the prospective technical construct and proficiencies of such an undertaking4. Our research methodology involved a structured literature search using key terms like 'Large Language Models' and 'Generative AI'. We utilized filters across several academic databases such as IEEE Xplore, Scopus, ACM Digital Library, ScienceDirect, Web of Science, and ProQuest Central, tailored to identify relevant articles published in the timeframe from 2017 (the release of the "Transformer" model) to 2023 (the writing time of this manuscript). This paper aspires to dissect the technical ramifications of Gemini and Q*, probing how they (and similar technologies whose emergence is now inevitable) may transfigure research trajectories and disclose new vistas in the domain of AI. In doing so, we have pinpointed three nascent research domains—MoE, multimodality, and AGI—that stand to reshape the generative AI research landscape profoundly. This investigation adopts a survey-style approach, systematically mapping out a research roadmap that synthesizes and analyzes the current and emergent trends in generative AI. The major contributions of this study are as follows:

1) Detailed examination of the evolving landscape in generative AI, emphasizing the advancements and innovations in technologies like Gemini and Q*, and their wide-ranging implications within the AI domain.
3 The legend entries correspond to the keywords used in the search query, which is constructed as: "(AI OR artificial OR (machine learning) OR (neural network) OR computer OR software) AND ([specific keyword])".
4 https://twitter.com/DrJimFan/status/1728100123862004105
JOURNAL OF LATEX CLASS FILES, VOL. 1, NO. 1, DECEMBER 2023 3
Figure 1: Number of search results on Google Scholar with different keywords by year. (Chart: y-axis, number of search results, 100k–700k; x-axis, years 2011–2023. Legend keywords3: Deep Learning, Transfer Learning, Supervised Learning, Convolutional Neural Network(s), Explainable AI, Natural Language Processing, Unsupervised Learning, Reinforcement Learning, Generative Adversarial Networks, Fine(-)tuning, Ethics / Ethical, Language Model(s).)
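The queries behind this figure follow the template given in footnote 3, combining a fixed AI-scoping clause with one legend keyword at a time. Generating the twelve query strings can be sketched as follows (the helper name `build_query` is ours, not the paper's; the clause and keyword list are taken verbatim from the footnote and legend):

```python
# Build the Google Scholar query strings described in footnote 3:
# "(AI OR artificial OR (machine learning) OR (neural network) OR computer
#   OR software) AND ([specific keyword])"
BASE = ('(AI OR artificial OR (machine learning) OR (neural network) '
        'OR computer OR software)')

KEYWORDS = [
    "Deep Learning", "Transfer Learning", "Supervised Learning",
    "Convolutional Neural Network(s)", "Explainable AI",
    "Natural Language Processing", "Unsupervised Learning",
    "Reinforcement Learning", "Generative Adversarial Networks",
    "Fine(-)tuning", "Ethics / Ethical", "Language Model(s)",
]

def build_query(keyword: str) -> str:
    """Combine the fixed AI-scoping clause with one legend keyword."""
    return f"{BASE} AND ({keyword})"

queries = [build_query(k) for k in KEYWORDS]
print(len(queries))  # 12
```

Each string would then be submitted for one publication year at a time to produce the per-year counts plotted above.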
1980s: Statistical Models (n-grams)
1990s: Adoption in NLP, n-gram Usage
1997: Introduction of LSTMs
2000s: LSTMs in Text/Voice Processing
2010s: Deep Learning Era, GPT, BERT
2020s: LLaMA, Gemini; ChatGPT Launch

Figure 3: Timeline of Key Developments in Language Model Evolution

complex neural network architectures that underpin today's LLMs [36], [37]. This evolution has been driven by a relentless quest for models that more accurately reflect the nuances of human language, as well as the desire to push the boundaries of what machines can understand and generate [36], [38], [37]. However, this rapid advancement has not been without its challenges. As language models have grown in capability, so too have the ethical and safety concerns surrounding their use, prompting a reevaluation of how these models are developed and the purposes for which they are employed [36], [39], [40].

1) Language Models as Precursors: The inception of language modeling can be traced to the statistical approaches of the late 1980s, a period marked by a transition from rule-based to machine learning algorithms in Natural Language Processing (NLP) [41], [42], [43], [44], [45]. Early models, primarily n-gram based, calculated the probability of word sequences in a corpus, thus providing a rudimentary understanding of language structure [41]. Those models, simplistic yet groundbreaking, laid the groundwork for future advances in language understanding. With the increase in computational power, the late 1980s witnessed a revolution in NLP, pivoting towards statistical models capable of 'soft' probabilistic decisions, as opposed to the rigid, 'handwritten' rule-based systems that dominated early NLP systems [43]. IBM's development of sophisticated statistical models throughout this period signified the growing importance and success of these approaches. In the subsequent decade, the popularity and applicability of statistical models surged, proving invaluable in managing the flourishing flow of digital text. The 1990s saw statistical methods firmly established in NLP research, with n-grams becoming instrumental in numerically capturing linguistic patterns. The introduction of Long Short-Term Memory (LSTM) networks in 1997 [46], and their application to voice and text processing a decade later [47], [48], [49], marked a significant milestone, leading to the current era where neural network models represent the cutting edge of NLP research and development.

2) Large Language Models: Technical Advancement and Commercial Success: The advent of deep learning has revolutionized the field of NLP, leading to the development of LLMs like GPT, BERT, and notably, OpenAI's ChatGPT. Recent models such as GPT-4 and LLaMA have pushed the boundaries by integrating sophisticated techniques like transformer architectures and advanced natural language understanding, illustrating the rapid evolution in this field [37]. These models represent a significant leap in NLP capabilities, leveraging vast computational resources and extensive datasets to achieve new heights in language understanding and generation [37], [50]. ChatGPT has shown impressive conversational skills and contextual understanding with a broad spectrum of functional uses in many areas, as evidenced by its technical and commercial success, including rapid adoption by over 100 million users shortly after launch, which underscores a robust market demand for natural language AI and has catalyzed interdisciplinary research into its applications in sectors like education, healthcare, and commerce [8], [50], [51], [52], [53]. In education, ChatGPT offers innovative approaches to personalized learning and interactive teaching [54], [51], [55], [56], while in commerce, it revolutionizes customer service and content creation [57], [58]. The widespread use of ChatGPT, Google Bard, Anthropic Claude, and similar commercial LLMs has reignited important debates in the field of AI, particularly concerning AI consciousness and safety, as their human-like interaction capabilities raise significant ethical questions and highlight the need for robust governance and safety measures in AI development [59], [31], [32], [11]. Such influence appears to extend beyond technical achievements, shaping cultural and societal discussions about the role and future of AI in our world.

The advancements in LLMs, including the development of models like GPT and BERT, have paved the way for the conceptualization of Q*. Specifically, the scalable architecture and extensive training data that characterize these models are foundational to the proposed capabilities of Q*. The success of ChatGPT in contextual understanding and conversational AI, for example, informs the design principles of Q*, suggesting a trajectory towards more sophisticated, context-aware, and adaptive language processing capabilities. Similarly, the emergence of multimodal systems like Gemini, capable of integrating text, images, audio, and video, reflects an evolutionary path that Q* could extend, combining the versatility of LLMs with advanced learning and pathfinding algorithms for a more holistic AI solution.

3) Fine-tuning, Hallucination Reduction, and Alignment in LLMs: The advancement of LLMs has underlined the significance of fine-tuning [60], [61], [62], [63], hallucination reduction [64], [65], [66], [67], and alignment [68], [69], [70], [71], [72]. These aspects are crucial in enhancing the functionality and reliability of LLMs. Fine-tuning, which involves adapting pre-trained models to specific tasks, has seen significant progress: techniques like prompt-based and few-shot learning [73], [74], [75], [76], alongside supervised fine-tuning on specialized datasets [60], [77], [78], [79], have enhanced the adaptability of LLMs in various contexts, but challenges remain, particularly in bias mitigation and the
generalization of models across diverse tasks [60], [80], [72]. Hallucination reduction is a persistent challenge in LLMs, characterized by the generation of confident but factually incorrect information [36]. Strategies such as confidence penalty regularization during fine-tuning have been implemented to mitigate overconfidence and improve accuracy [81], [82], [83]. Despite these efforts, the complexity of human language and the breadth of topics make completely eradicating hallucinations a daunting task, especially in culturally sensitive contexts [36], [9]. Alignment, ensuring LLM outputs are congruent with human values and ethics, is an area of ongoing research. Innovative approaches, from constrained optimization [84], [85], [86], [87], [88], to different types of reward modeling [89], [90], [91], [92], aim to embed human preferences within AI systems. While advancements in fine-tuning, hallucination reduction, and alignment have propelled LLMs forward, these areas still present considerable challenges. The complexity of aligning AI with the diverse spectrum of human ethics and the persistence of hallucinations, particularly on culturally sensitive topics, highlight the need for continued interdisciplinary research in the development and application of LLMs [9].

4) Mixture of Experts: A Paradigm Shift: The adoption of the MoE architecture in LLMs marks a critical evolution in AI technology. This innovative approach, exemplified by advanced models like Google's Switch Transformer5 and MistralAI's Mixtral-8x7B6, leverages multiple transformer-based expert modules for dynamic token routing, enhancing modeling efficiency and scalability. The primary advantage of MoE lies in its ability to handle vast parameter scales, reducing memory footprint and computational costs significantly [93], [94], [95], [96], [97]. This is achieved through model parallelism across specialized experts, allowing the training of models with trillions of parameters, and its specialization in handling diverse data distributions enhances its capability in few-shot learning and other complex tasks [94], [95]. To illustrate the practicality of MoE, consider its application in healthcare. For example, an MoE-based system could be used for personalized medicine, where different 'expert' modules specialize in various aspects of patient data analysis, including genomics, medical imaging, and electronic health records. This approach could significantly enhance diagnostic accuracy and treatment personalization. Similarly, in finance, MoE models can be deployed for risk assessment, where experts analyze distinct financial indicators, market trends, and regulatory compliance factors.

Despite its benefits, MoE confronts challenges in dynamic routing complexity [98], [99], [100], [101], [102], expert imbalance [103], [104], [105], [106], and probability dilution [107], and such technical hurdles demand sophisticated solutions to fully harness MoE's potential. Moreover, while MoE may offer performance gains, it does not inherently solve ethical alignment issues in AI [108], [109], [110]. The complexity and specialization of MoE models can obscure the decision-making processes, complicating efforts to ensure ethical compliance and alignment with human values [108], [111].

Although the paradigm shift to MoE signifies a major leap in LLM development, offering significant scalability and specialization advantages, ensuring the safety, ethical alignment, and transparency of these models remains a paramount concern. The MoE architecture, while technologically advanced, entails continued interdisciplinary research and governance to align AI with broader societal values and ethical standards.

B. Multimodal AI and the Future of Interaction

The advent of multimodal AI marks a transformative era in AI development, revolutionizing how machines interpret and interact with a diverse array of human sensory inputs and contextual data.

1) Gemini: Redefining Benchmarks in Multimodality: Gemini, a pioneering multimodal conversational system, marks a significant shift in AI technology by surpassing traditional text-based LLMs like GPT-3 and even its multimodal counterpart, ChatGPT-4. Gemini's architecture has been designed to incorporate the processing of diverse data types such as text, images, audio, and video, a feat facilitated by its unique multimodal encoder, cross-modal attention network, and multimodal decoder [112]. The architectural core of Gemini is its dual-encoder structure, with separate encoders for visual and textual data, enabling sophisticated multimodal contextualization [112]. This architecture is believed to surpass the capabilities of single-encoder systems, allowing Gemini to associate textual concepts with image regions and achieve a compositional understanding of scenes [112]. Furthermore, Gemini integrates structured knowledge and employs specialized training paradigms for cross-modal intelligence, setting new benchmarks in AI [112]. In [112], Google has claimed and demonstrated that Gemini distinguishes itself from ChatGPT-4 through several key features:

• Breadth of Modalities: Unlike ChatGPT-4, which primarily focuses on text, documents, images, and code, Gemini handles a wider range of modalities, including audio and video. This extensive range allows Gemini to tackle complex tasks and understand real-world contexts more effectively.
• Performance: Gemini Ultra excels in key multimodality benchmarks, notably in massive multitask language understanding (MMLU), which encompasses a diverse array of domains like science, law, and medicine, outperforming ChatGPT-4.
• Scalability and Accessibility: Gemini is available in three tailored versions – Ultra, Pro, and Nano – catering to a range of applications from data centers to on-device tasks, a level of flexibility not yet seen in ChatGPT-4.
• Code Generation: Gemini's proficiency in understanding and generating code across various programming languages is more advanced, offering practical applications beyond ChatGPT-4's capabilities.
• Transparency and Explainability: A focus on explainability sets Gemini apart, as it provides justifications for its outputs, enhancing user trust and understanding of the AI's reasoning process.
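Gemini's internal design is not public beyond the high-level description above, so the following is only a generic sketch of the dual-encoder-with-cross-modal-attention pattern that description names. All dimensions, the single attention head, and the random weights are illustrative assumptions, not Gemini's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # shared embedding width (illustrative)

def encode(tokens, W):
    """Stand-in for a modality-specific encoder: project, then L2-normalize."""
    h = tokens @ W
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head cross-modal attention: text queries attend over image features."""
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over image regions
    return weights @ keys_values          # text tokens re-expressed via image regions

# Dual encoders: separate projection weights for textual and visual inputs.
W_text = rng.normal(size=(32, d))
W_image = rng.normal(size=(64, d))
text_tokens = rng.normal(size=(7, 32))    # 7 text-token features
image_regions = rng.normal(size=(12, 64)) # 12 image-region features

h_text = encode(text_tokens, W_text)
h_image = encode(image_regions, W_image)
fused = cross_attention(h_text, h_image)  # one image-informed vector per text token
print(fused.shape)  # (7, 16)
```

The point of the sketch is the claimed advantage over single-encoder systems: because each text token's output is a weighted mixture of image-region features, textual concepts can be softly associated with specific image regions.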
5 https://huggingface.co/google/switch-c-2048
6 https://huggingface.co/mistralai/Mixtral-8x7B-v0.1

Despite these advancements, Gemini's real-world performance in complex reasoning tasks that require integration
of commonsense knowledge across modalities remains to be thoroughly evaluated.

2) Technical Challenges in Multimodal Systems: The development of multimodal AI systems faces several technical hurdles, including creating robust and diverse datasets, managing scalability, and enhancing user trust and system interpretability [113], [114], [115]. Challenges like data skew and bias are prevalent due to data acquisition and annotation issues, which require effective dataset management through strategies such as data augmentation, active learning, and transfer learning [113], [116], [80], [115]. A significant challenge is the computational demand of processing various data streams simultaneously, requiring powerful hardware and optimized model architectures for multiple encoders [117], [118]. Advanced algorithms and multimodal attention mechanisms are needed to balance attention across different input media and resolve conflicts between modalities, especially when they provide contradictory information [119], [120], [118]. Scalability issues, due to the extensive computational resources needed, are exacerbated by limited high-performance hardware availability [121], [122]. There is also a pressing need for calibrated multimodal encoders for compositional scene understanding and data integration [120]. Refining evaluation metrics for these systems is necessary to accurately assess performance in real-world tasks, calling for comprehensive datasets and unified benchmarks, and for enhancing user trust and system interpretability through explainable AI in multimodal contexts. Addressing these challenges is vital for the advancement of multimodal AI systems, enabling seamless and intelligent interaction aligned with human expectations.

3) Multimodal AI: Beyond Text in Ethical and Social Contexts: The expansion of multimodal AI systems introduces both benefits and complex ethical and social challenges that extend beyond those faced by text-based AI. In commerce, multimodal AI can transform customer engagement by integrating visual, textual, and auditory data [123], [124], [125]. For autonomous vehicles, multimodality can enhance safety and navigation by synthesizing data from various sensors, including visual, radar, and Light Detection and Ranging (LIDAR) [126], [125], [127]. Still, DeepFake technology's ability to generate convincingly realistic videos, audio, and images is a critical concern in multimodality, as it poses risks of misinformation and manipulation that significantly impact public opinion, political landscapes, and personal reputations, thereby compromising the authenticity of digital media and raising issues in social engineering and digital forensics, where distinguishing genuine from AI-generated content becomes increasingly challenging [128], [129]. Privacy concerns are amplified in multimodal AI due to its ability to process and correlate diverse data sources, potentially leading to intrusive surveillance and profiling, which raises questions about the consent and rights of individuals, especially when personal media is used without permission for AI training or content creation [113], [130], [131]. Moreover, multimodal AI can propagate and amplify biases and stereotypes across different modalities, and if unchecked, this can perpetuate discrimination and social inequities, making it imperative to address algorithmic bias effectively [132], [133], [134]. The ethical development of multimodal AI systems requires robust governance frameworks focusing on transparency, consent, data handling protocols, and public awareness, while ethical guidelines must evolve to address the unique challenges posed by these technologies, including setting standards for data usage and safeguarding against the nonconsensual exploitation of personal information [135], [136]. Additionally, the development of AI literacy programs will be crucial in helping society understand and responsibly interact with multimodal AI technologies [113], [135]. As the field progresses, interdisciplinary collaboration will be key in ensuring these systems are developed and deployed in a manner that aligns with societal values and ethical principles [113].

C. Speculative Advances and Chronological Trends

In the dynamic landscape of AI, the speculative capabilities of the Q* project, blending LLMs, Q-learning, and A* (A-Star algorithm), embody a significant leap forward. This section explores the evolutionary trajectory from game-centric AI systems to the broad applications anticipated with Q*.

1) From AlphaGo's Groundtruth to Q-Star's Exploration: The journey from AlphaGo, a game-centric AI, to the conceptual Q-Star project represents a significant paradigm shift in AI. AlphaGo's mastery in the game of Go highlighted the effectiveness of deep learning and tree search algorithms within well-defined rule-based environments, underscoring the potential of AI in complex strategy and decision-making [137], [138]. Q-Star, however, is speculated to move beyond these confines, aiming to amalgamate the strengths of reinforcement learning (as seen in AlphaGo) with the knowledge, NLG, creativity, and versatility of LLMs, and the strategic efficiency of pathfinding algorithms like A*. This blend, merging pathfinding algorithms and LLMs, could enable AI systems to transcend board-game confines and, with Q-Star's natural language processing, interact with human language, enabling nuanced interactions and marking a leap towards AI adept in both structured tasks and complex human-like communication and reasoning. Moreover, the incorporation of Q-learning and A* algorithms would enable Q-Star to optimize decision paths and learn from its interactions, making it more adaptable and intelligent over time. The combination of these technologies could lead to AI that is not only more efficient in problem-solving but also creative and insightful in its approach. This speculative advancement from the game-focused power of AlphaGo to the comprehensive potential of Q-Star illustrates the dynamic and ever-evolving nature of AI research, and opens up possibilities for AI applications that are more integrated with human life and capable of handling a broader range of tasks with greater autonomy and sophistication.

2) Bridging Structured Learning with Creativity: The anticipated Q* project, blending Q-learning and A* algorithms with the creativity of LLMs, embodies a groundbreaking step in AI, potentially surpassing recent innovations like Gemini. The fusion suggested in Q* points to an integration of structured, goal-oriented learning with generative, creative capabilities, a combination that could transcend the existing achievements of Gemini. While Gemini represents a significant leap in
multimodal AI, combining various forms of data inputs such as text, images, audio, and video, Q* is speculated to bring a more profound integration of creative reasoning and structured problem-solving. This would be achieved by merging the precision and efficiency of algorithms like A* with the learning adaptability of Q-learning, and the complex understanding of human language and context offered by LLMs. Such an integration could enable AI systems to not only process and analyze complex multimodal data but also to autonomously navigate through structured tasks while engaging in creative problem-solving and knowledge generation, mirroring the multifaceted nature of human cognition. The implications of this potential advancement are vast, suggesting applications that span beyond the capabilities of current multimodal systems like Gemini. By aligning the deterministic aspects of traditional AI algorithms with the creative and generative potential of LLMs, Q* could offer a more holistic approach to AI development. This could bridge the gap between the logical, rule-based processing of AI and the creative, abstract thinking characteristic of human intelligence. The anticipated unveiling of Q*, merging structured learning techniques and creative problem-solving in a singular, advanced framework, holds the promise of not only extending but also significantly surpassing the multimodal capabilities of systems like Gemini, thus heralding another game-changing era in the domain of generative AI, showcasing its potential as a crucial development eagerly awaited in the ongoing evolution of AI.

III. THE CURRENT GENERATIVE AI RESEARCH TAXONOMY

The field of Generative AI is evolving rapidly, which necessitates a comprehensive taxonomy that encompasses the breadth and depth of research within this domain. Detailed in Table I, this taxonomy categorizes the key areas of inquiry and innovation in generative AI, and serves as a foundational framework to understand the current state of the field, guiding through the complexities of evolving model architectures, advanced training methodologies, diverse application domains, ethical implications, and the frontiers of emerging technologies.

• Recurrent Neural Networks (RNNs): RNNs excel in the realm of sequence modeling, making them particularly effective for tasks involving language and temporal data, as their architecture is specifically designed to process sequences of data, such as text, enabling them to capture the context and order of the input effectively [150], [151], [152], [153], [154]. This proficiency in handling sequential information renders them indispensable in applications that require a deep understanding of the temporal dynamics within data, such as natural language tasks and time-series analysis [155], [156]. RNNs' ability to maintain a sense of continuity over sequences is a critical asset in the broader field of AI, especially in scenarios where context and historical data play crucial roles [157].
• Mixture of Experts (MoE): MoE models can significantly enhance efficiency by deploying model parallelism across multiple specialized expert modules, which enables these models to leverage transformer-based modules for dynamic token routing, and to scale to trillions of parameters, thereby reducing both memory footprint and computational costs [94], [98]. MoE models stand out for their ability to divide computational loads among various experts, each specializing in different aspects of the data, which allows for handling vast scales of parameters more effectively, leading to a more efficient and specialized handling of complex tasks [94], [21].
• Multimodal Models: Multimodal models, which integrate a variety of sensory inputs such as text, vision, and audio, are crucial in achieving a comprehensive understanding of complex data sets, particularly transformative in fields like medical imaging [113], [112], [115]. These models facilitate accurate and data-efficient analysis by employing multi-view pipelines and cross-attention blocks [158], [159]. This integration of diverse sensory inputs allows for a more nuanced and detailed interpretation of data, enhancing the model's ability to accurately analyze and understand various types of information [160]. The combination of different data types, processed concurrently, enables these models to provide a
holistic view, making them especially effective in applica-
tions that require a deep and multifaceted understanding
A. Model Architectures of complex scenarios [113], [161], [162], [160].
Generative AI model architectures have seen significant
developments, with four key domains standing out: B. Training Techniques
• Transformer Models: Transformer models have signifi- The training of generative AI models leverages four key
cantly revolutionized the field of AI, especially in NLP, techniques, each contributing uniquely to the field:
due to their higher efficiency and scalability [139], [140], • Supervised Learning: Supervised learning, a founda-
[141]. They employ advanced attention mechanisms to tional approach in AI, uses labeled datasets to guide
achieve enhanced contextual processing, allowing for models towards accurate predictions, and it has been
more subtle understanding and interaction [142], [143], integral to various applications, including image recogni-
[144]. These models have also made notable strides in tion and NLP [163], [164], [165]. Recent advancements
computer vision, as evidenced by the development of have focused on developing sophisticated loss functions
vision transformers like EfficientViT [145], [146] and and regularization techniques, aimed at enhancing the
YOLOv8 [147], [148], [149]. These innovations symbol- performance and generalization capabilities of supervised
ize the extended capabilities of transformer models in learning models, ensuring they remain robust and effec-
areas such as object detection, offering not only improved tive across a wide range of tasks and data types [166],
performance but also increased computational efficiency. [167], [168].
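As a minimal sketch of the regularized supervised objectives mentioned above, the snippet below fits an L2-regularized (ridge) linear model in closed form; the data and penalty strength are arbitrary stand-ins for illustration, not drawn from any cited work.

```python
import numpy as np

def ridge_regression(X, y, lam=0.1):
    """Closed-form L2-regularized least squares: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy labeled dataset: noisy observations of a known linear map.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = ridge_regression(X, y, lam=0.1)
print(np.round(w, 2))
```

The L2 penalty shrinks the weights slightly toward zero, trading a little bias for better generalization when data are scarce or noisy.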
JOURNAL OF LATEX CLASS FILES, VOL. 1, NO. 1, DECEMBER 2023 8
• Unsupervised Learning: Unsupervised learning is essential in AI for uncovering patterns within unlabeled data, a process central to tasks like feature learning and clustering [169], [170]. This method has seen significant advancements with the introduction of autoencoders [171], [172] and Generative Adversarial Networks (GANs) [173], [174], [175], which have notably expanded unsupervised learning's applicability, enabling more sophisticated data generation and representation learning capabilities. Such innovations are crucial for understanding and leveraging the complex structures often inherent in unstructured datasets, highlighting the growing versatility and depth of unsupervised learning techniques.

• Reinforcement Learning: Reinforcement learning, characterized by its adaptability and optimization capabilities, has become increasingly vital in decision-making and autonomous systems [176], [177]. This training technique has undergone significant advancements, particularly with the development of Deep Q-Networks (DQN) [178], [179], [180] and Proximal Policy Optimization (PPO) algorithms [181], [182], [183]. These enhancements have been crucial in improving the efficacy and applicability of reinforcement learning, especially in complex and dynamic environments. By optimizing decisions and policies through interactive feedback loops, reinforcement learning has established itself as a crucial tool for training AI systems in scenarios that demand a high degree of adaptability and precision in decision-making [184], [185].

• Transfer Learning: Transfer learning emphasizes versatility and efficiency in AI training, allowing models to apply knowledge acquired from one task to different
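The Q-learning update at the heart of DQN-style methods can be sketched in its simplest tabular form; the five-state chain environment below is a hypothetical toy, not an environment used by the cited works.

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: the agent earns a reward of 1
# for reaching the rightmost (terminal) state.
n_states, n_actions = 5, 2  # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection (the interactive feedback loop).
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

print(np.argmax(Q[:4], axis=1))  # greedy policy for the non-terminal states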
JOURNAL OF LATEX CLASS FILES, VOL. 1, NO. 1, DECEMBER 2023 9
yet related tasks, which significantly reduces the need for large labeled datasets [186], [187]. Through the use of pre-trained networks, transfer learning streamlines the training process by allowing models to be efficiently fine-tuned for specific applications, thereby enhancing adaptability and performance across diverse tasks, and proving particularly beneficial in scenarios where acquiring extensive labeled data is impractical [188], [189].

C. Application Domains

The application domains of Generative AI are remarkably diverse and evolving, encompassing both established and emerging areas of research and application. These domains have been significantly influenced by recent advancements in AI technology and the expanding scope of AI applications.

• Natural Language Understanding (NLU): NLU is central to enhancing the comprehension and contextualization of human language in AI systems, and involves key capabilities such as semantic analysis, named entity recognition, sentiment analysis, textual entailment, and machine reading comprehension [190], [191], [192], [193]. Advances in NLU have been crucial in improving AI's proficiency in interpreting and analyzing language across a spectrum of contexts, ranging from straightforward conversational exchanges to intricate textual data [190], [192], [193]. NLU is fundamental in applications like sentiment analysis, language translation, information extraction, and more [194], [195], [196]. Recent advancements have prominently featured large transformer-based models like BERT and GPT-3, which have significantly advanced the field by enabling a deeper and more complex understanding of language subtleties [197], [198].

• Natural Language Generation (NLG): NLG emphasizes the training of models to generate coherent, contextually relevant, and creative text responses, a critical component in chatbots, virtual assistants, and automated content creation tools [199], [36], [200], [201]. NLG encompasses challenges such as topic modeling, discourse planning, concept-to-text generation, style transfer, and controllable text generation [36], [202]. The recent surge in NLG capabilities, exemplified by advanced models like GPT-3, has significantly enhanced the sophistication and nuance of text generation, enabling AI systems to produce text that closely mirrors human writing styles and thereby broadening the scope and applicability of NLG in various interactive and creative contexts [203], [55], [51].

• Conversational AI: This subdomain is dedicated to developing AI systems capable of smooth, natural, and context-aware human-computer interactions, focusing on dialogue modeling, question answering, user intent recognition, and multi-turn context tracking [204], [205], [206], [207]. In finance and cybersecurity, AI's predictive analytics have transformed risk assessment and fraud detection, leading to more secure and efficient operations. Advancements driven by large pre-trained models like Meena7 and BlenderBot8 have significantly enhanced the empathetic and responsive capabilities of AI interactions. These systems not only improve user engagement and satisfaction, but also maintain the flow of conversation over multiple turns, providing coherent, contextually relevant, and engaging experiences [208], [209].

Footnote 7: https://neptune.ai/blog/transformer-nlp-models-meena-lamda-chatbots

• Creative AI: This emerging subdomain spans text, art, music, and more, pushing the boundaries of AI's creative and innovative potential across modalities including images, audio, and video. It engages in the generation of artistic content, encompassing applications in idea generation, storytelling, poetry, music composition, visual arts, and creative writing, and has produced commercial successes like MidJourney and DALL-E [210], [211], [212]. The challenges in this field involve finding suitable data representations, algorithms, and evaluation metrics to effectively assess and foster creativity [212], [213]. Creative AI serves not only as a tool for automating and enhancing artistic processes, but also as a medium for exploring new forms of artistic expression, enabling the creation of novel and diverse creative outputs [212]. This domain represents a significant leap in AI's capability to engage in and contribute to creative endeavors, redefining the intersection of technology and art.

D. Compliance and Ethical Considerations

As AI technologies rapidly evolve and become more integrated into various sectors, ethical considerations and legal compliance have become increasingly crucial, requiring a focus on developing 'Ethical AI Frameworks', a new category in our taxonomy reflecting the trend towards responsible AI development in generative AI [214], [215], [15], [216], [217]. Such frameworks are crucial in ensuring AI systems are built with a core emphasis on ethical considerations, fairness, and transparency, as they address critical aspects such as bias mitigation for fairness, privacy and security concerns for data protection, and AI ethics for accountability, responding to an evolving landscape where accountability in AI is of paramount importance [214], [15]. The need for rigorous approaches to uphold ethical integrity and legal conformity has never been more pressing, reflecting the complexity and multifaceted challenges introduced by the adoption of these technologies [15].

• Bias Mitigation: Bias mitigation in AI systems is a critical endeavor to ensure fairness and representation, involving not only balanced data collection to avoid skewed perspectives but also algorithmic adjustments and regularization techniques to minimize biases [218], [219]. Continuous monitoring and bias testing are essential to identify and address any biases that may emerge from AI's predictive patterns [220], [219]. A significant challenge in this area is dealing with intersectional biases [221], [222], [223] and
understanding the causal interactions that may contribute to these biases [224], [225], [226], [227].

• Data Security: In AI data security, key requirements and challenges include ensuring data confidentiality, adhering to consent norms, and safeguarding against vulnerabilities like membership inference attacks [228], [229]. Compliance with stringent legal standards within applicable jurisdictions, such as the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA), is essential, necessitating purpose limitation and data minimization [230], [231], [232]. Additionally, issues of data sovereignty and copyright emphasize the need for robust encryption, access control, and continuous security assessments [233], [234]. These efforts are critical for maintaining the integrity of AI systems and protecting user privacy in an evolving digital landscape.

• AI Ethics: The field of AI ethics focuses on fairness, accountability, and societal impact; it addresses the surge in ethical challenges posed by AI's increasing complexity and potential misalignment with human values, and requires ethical governance frameworks, multidisciplinary collaborations, and technological solutions [214], [235], [15], [236]. Furthermore, AI ethics involves ensuring traceability, auditability, and transparency throughout the model development lifecycle, employing practices such as algorithmic auditing, establishing ethics boards, and adhering to documentation standards and model cards [237], [236]. However, the adoption of these initiatives remains uneven, highlighting the ongoing need for comprehensive and consistent ethical practices in AI development and deployment [214].

• Privacy Preservation: This domain focuses on maintaining data confidentiality and integrity, employing strategies like anonymization and federated learning to minimize direct data exposure, especially as the rise of generative AI poses risks of user profiling [238], [239]. Despite these efforts, challenges such as achieving true anonymity against correlation attacks highlight the complexities in effectively protecting against intrusive surveillance [240], [241]. Ensuring compliance with privacy laws and implementing secure data handling practices are crucial in this context, demonstrating the continuous need for robust privacy preservation mechanisms.

E. Advanced Learning

Advanced learning techniques, including self-supervised learning, meta-learning, and fine-tuning, are at the forefront of AI research, enhancing the autonomy, efficiency, and versatility of AI models.

• Self-supervised Learning: This method emphasizes autonomous model training using unlabeled data, reducing manual labeling efforts and model biases [242], [165], [243]. It incorporates generative models like autoencoders and GANs for data distribution learning and original input reconstruction [244], [245], [246], and also includes contrastive methods such as SimCLR [247] and MoCo [248], designed to differentiate between positive and negative sample pairs. Further, it employs self-prediction strategies, inspired by NLP, using techniques like masking for input reconstruction, significantly enhanced by recent Vision Transformer developments [249], [250], [165]. This integration of varied methods highlights self-supervised learning's role in advancing AI's autonomous training capabilities.

• Meta-learning: Meta-learning, or 'learning to learn', centers on equipping AI models with the ability to rapidly adapt to new tasks and domains using limited data samples [251], [252]. This technique involves mastering the optimization process and is critical in situations with limited data availability, ensuring models can quickly adapt and perform across diverse tasks, which is essential in the current data-driven landscape [253], [254]. It focuses on few-shot generalization, enabling AI to handle a wide range of tasks with minimal data, underlining its importance in developing versatile and adaptable AI systems [255], [256], [254], [257].

• Fine Tuning: Fine-tuning involves customizing pre-trained models to specific domains or user preferences, enhancing accuracy and relevance for niche applications [60], [258], [259]. Its two primary approaches are end-to-end fine-tuning, which adjusts all weights of the encoder and classifier [260], [261], and feature-extraction fine-tuning, where the encoder weights are frozen to extract features for a downstream classifier [262], [263], [264]. This technique ensures that generative models are more effectively adapted to specific user needs or domain requirements, making them more versatile and applicable across various contexts.

• Human Value Alignment: This emerging aspect concentrates on harmonizing AI models with human ethics and values to ensure that their decisions and actions mirror societal norms and ethical standards, involving the integration of ethical decision-making processes and the adaptation of AI outputs to conform with human moral values [265], [89], [266]. This is increasingly important in scenarios where AI interacts closely with humans, such as in healthcare, finance, and personal assistants, to ensure that AI systems make decisions that are not only technically sound but also ethically and socially responsible; human value alignment is thus becoming crucial in developing AI systems that are trusted and accepted by society [89], [267].

F. Emerging Trends

Emerging trends in generative AI research are shaping the future of technology and human interaction; they indicate a dynamic shift towards more integrated, interactive, and intelligent AI systems, driving forward the boundaries of what is possible in the realm of AI. Key developments in this area include:

• Multimodal Learning: Multimodal learning in AI, a rapidly evolving subdomain, focuses on combining language understanding with computer vision and audio processing to achieve a richer, multi-sensory context.
The router's learned ability to route tokens to appropriate experts confers considerable advantages to MoE models, allowing them to scale up model sizes while keeping compute time constant [295], [296], [297]. Experimental evidence suggests that routers learn to route inputs according to data clusters, demonstrating their potential in real-world applications [295], [289]. The core concept and structure of MoE models lie in their dynamic routing and specialization capabilities, offering promising avenues for scaling up neural networks and enhancing their efficiency and adaptability in various tasks, though the robustness of the router must be protected against adversarial attacks [289], [298].

B. Training and Inference Efficiency

MoE models, notably Mixtral 8x7B, are renowned for their superior pretraining speed compared to dense models, yet they face hurdles in fine-tuning and demand considerable VRAM for inference, owing to the requirement of loading all experts [289], [290], [110]. Recent advancements in MoE architecture have resulted in notable training cost efficiencies, especially in encoder-decoder models, with evidence showing cost savings of up to fivefold in certain contexts when compared to dense models [21], [289], [298], [287]. Innovations like DeepSpeed-MoE [287] offered new architectural designs and model compression, decreasing MoE model size by approximately 3.7x and optimizing inference to achieve up to 7.3x better latency and cost efficiency. The progression in distributed MoE training and inference, notably with innovations like Lina [299], has effectively tackled the all-to-all communication bottleneck by enhancing tensor partitioning, which not only improves all-to-all communication and training step time but also optimizes resource scheduling during inference, reducing training step time by up to 1.73x and lowering the 95th-percentile inference time by an average of 1.63x compared to existing systems. These developments have marked a crucial shift in the large model landscape, from dense to sparse MoE models, expanding the potential applications of AI by training higher-quality models with fewer resources.

C. Load Balancing and Router Optimization

Effective load balancing is essential in MoE models to guarantee a uniform distribution of computational load among experts; the router network in MoE layers, responsible for selecting the appropriate experts for processing specific tokens, plays a pivotal role in achieving this balance, which is fundamental to the stability and overall performance of MoE models [293], [289], [288], [300], [110]. Developments in router z-loss regularization techniques play a crucial role in addressing expert imbalance in MoE models by fine-tuning the gating mechanism, ensuring a more equitable workload distribution across experts and fostering a stable training environment, thereby enhancing model performance and reducing training time and computational overhead [301], [302]. Concurrently, expert capacity management has emerged as a crucial strategy in MoE models to regulate the processing abilities of individual experts by setting thresholds on the number of tokens each can handle, effectively averting bottlenecks and ensuring a more efficient and streamlined model operation, leading to improved training processes and heightened performance during complex computational tasks [293], [303], [289].

D. Parallelism and Serving Techniques

Recent developments in MoE models have highlighted their efficiency in parallelism and serving techniques, significantly influencing large-scale neural networks. DeepSpeed-MoE, for instance, introduces advanced parallelism modes like data parallelism, tensor-slicing for non-expert parameters, and expert parallelism for expert parameters, enhancing model efficiency; its approach optimizes both latency and throughput in MoE model inference, offering scalable solutions in production environments using multiple Graphics Processing Unit (GPU) devices [287]. MoE models, versatile in applications like multilingual tasks and coding, have demonstrated impressive capabilities in handling complex tasks due to their ensemble-like structure within a single framework [304], [305], [306]. Notably, models like Mixtral and the Switch Transformer, the latter with over 1.6 trillion parameters, achieved computational efficiency equivalent to a 10-billion-parameter dense model, because they benefited from the sublinear scaling of MoE compute versus model size, leading to substantial accuracy gains within fixed compute budgets [21], [289], [287], [110]. Moreover, DeepSpeed-MoE included model compression techniques, reducing model size by up to 3.7x while maintaining accuracy, and an end-to-end MoE training and inference solution, part of the DeepSpeed library, which was instrumental in serving large-scale MoE models with enhanced speed and cost-efficiency [287]. These innovations open new directions in AI, shifting from dense to sparse MoE models, where training and deploying higher-quality models with fewer resources become more widely achievable.

E. Future Directions and Applications

Emerging research on MoE architectures could focus on advancing sparse fine-tuning techniques, exploring instruction tuning methods, and improving routing algorithms to fully utilize performance and efficiency gains. As models scale over one billion parameters, MoE represents a paradigm shift for vastly expanding capabilities across scientific, medical, creative, and real-world applications. Frontier work could also aim to refine auto-tuning of hyperparameters during fine-tuning to optimize accuracy, calibration, and safety. MoE research continues to push model scale limits while maintaining specialization for transfer learning. Adaptive sparse access allows coordinating thousands of experts to cooperate on tasks ranging from reasoning to open-domain dialogue. Continued analysis of routing mechanisms seeks to balance load across experts and minimize redundant computation. As the AI community further investigates MoE methods at scale, these models hold promise for new breakthroughs in language, code generation, reasoning, and multimodal applications. There is great interest in evaluating implications across education, healthcare, financial analysis, and other fields. Outcomes may yield insights not only into model optimization but also into the principles behind combinatorial generalization.
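The routing, capacity, and z-loss mechanisms discussed above can be sketched in a few lines of NumPy. This is a toy illustration, not the DeepSpeed-MoE or production implementation: the top-k selection, per-expert capacity cap, and squared log-sum-exp penalty (in the spirit of the router z-loss the section cites [301]) merely mirror the mechanisms described in the text.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route_tokens(token_logits, k=2, capacity=4):
    """Top-k token-to-expert assignment with a per-expert capacity cap,
    plus a router z-loss that penalizes large router logits for stability."""
    n_tokens, n_experts = token_logits.shape
    probs = softmax(token_logits)
    topk = np.argsort(-probs, axis=1)[:, :k]       # k experts per token
    load = np.zeros(n_experts, dtype=int)
    assignments = []                               # (token, expert) pairs
    for t in range(n_tokens):
        for e in topk[t]:
            if load[e] < capacity:                 # tokens over capacity are dropped
                load[e] += 1
                assignments.append((t, int(e)))
    # Router z-loss: mean squared log-sum-exp of the router logits.
    z = np.log(np.exp(token_logits).sum(axis=1))
    z_loss = float((z ** 2).mean())
    return assignments, load, z_loss

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))                   # 8 tokens, 4 experts
assignments, load, z_loss = route_tokens(logits)
print(load, round(z_loss, 3))
```

The capacity threshold is exactly the expert capacity management idea above: it bounds each expert's load per batch, trading dropped tokens for balanced, predictable compute.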
EGI(Q*) = ⊕_{i=1}^{n} (NN_i ⊙ MLT_i)    (1)

Where:
• EGI: "Enhanced General Intelligence".
• NN_i: a diverse set of neural network architectures.
• MLT_i: various machine learning techniques.
• ⊕: the integration of these components.
• ⊙: a functional interaction between neural networks and machine learning techniques.

Such advancements in AI suggest the emergence of an intelligence that not only parallels but potentially exceeds human cognitive flexibility, with far-reaching implications for facilitating cross-disciplinary innovations and complex problem-solving. The speculated capabilities of Q* bring forth complex ethical implications and governance challenges. As AI systems approach higher levels of autonomy and decision-making, it is crucial to establish robust ethical frameworks and governance structures to ensure responsible and transparent AI development. This involves mitigating potential risks associated with advanced AI capabilities.

Such capabilities indicate a model not limited to understanding existing data but equipped to actively seek and synthesize new knowledge, effectively adapting to evolving scenarios without the need for frequent retraining. This signifies a leap beyond current AI models, embedding a level of autonomy and efficiency previously unattained.

C. Superior Human-Level Understanding

Q*'s aspiration to achieve superior human-level understanding is speculated to hinge on an advanced integration of multiple neural networks, including a Value Neural Network (VNN), paralleling the evaluative components found in systems like AlphaGo. This network would extend beyond assessing accuracy and relevance in language and reasoning processes, delving into the subtleties of human communication. The model's deep comprehension capabilities may be enhanced by advanced natural language processing algorithms and techniques, such as those found in transformer architectures like DeBERTa. These algorithms would empower Q* to interpret not just the text but also nuanced socio-emotional aspects such as intent, emotion, and underlying meanings.
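The quasi-mathematical notation of Eq. (1) can be read, purely illustratively, as applying a learning technique to an architecture (⊙) and aggregating the results (⊕). The lambda "components" below are hypothetical stand-ins for this reading and carry no claim about Q*'s actual design.

```python
# Illustrative reading of EGI(Q*) = ⊕_i (NN_i ⊙ MLT_i):
# ⊙ is modeled as function composition, ⊕ as an ensemble average.
def apply(technique, architecture, x):
    return technique(architecture(x))        # ⊙ : functional interaction

def integrate(components, x):                # ⊕ : integration of components
    outputs = [apply(t, a, x) for a, t in components]
    return sum(outputs) / len(outputs)

# Hypothetical stand-ins: two "architectures" paired with two "techniques".
components = [
    (lambda x: 2 * x, lambda h: h + 1),      # NN_1, MLT_1
    (lambda x: x ** 2, lambda h: h - 1),     # NN_2, MLT_2
]
print(integrate(components, 3.0))            # (2*3+1 and 3**2-1, averaged)
```

The point of the sketch is only structural: each expert pathway transforms the same input independently, and the outer operator decides how the pathways are combined.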
Where:
• SHLU: "Superior Human-Level Understanding".
• VNN: the Value Neural Network, similar to evaluative components in systems like AlphaGo.
• NLP: a set of advanced NLP algorithms.
• ⊕: the combination of VNN evaluation with NLP algorithms.
• alg: individual algorithms within the NLP set.

Figure 6: Conceptual Diagram of Projected AGI Capabilities (diagram nodes include "Knowledge Integration").

This level of understanding, surpassing current language models, would position Q* to excel in empathetic, context-aware interactions, enabling a new echelon of personalization and user engagement in AI applications.

D. Advanced Common Sense Reasoning

Q*'s anticipated development in advanced common sense reasoning is predicted to integrate sophisticated logic and decision-making algorithms, potentially combining elements of symbolic AI and probabilistic reasoning. This integration aims to endow Q* with an intuitive grasp of everyday logic and an understanding akin to human common sense, bridging a significant gap between artificial and natural intelligence. Enhancements in Q*'s reasoning abilities might involve graph-structured world knowledge, incorporating physics and social engines similar to those in models like CogSKR. This approach, coupled with sophisticated neural network architectures and dynamic learning algorithms, would enable Q* to engage deeply with the complexities of the real world, transcending conventional AI limitations. Additionally, Q* might employ mathematical theorem proving techniques for validation, ensuring that its reasoning and outputs are not only accurate but also ethically grounded. The incorporation of ethics classifiers in this process further strengthens its capacity to deliver reliable and responsible understanding and interaction with real-world scenarios. The corresponding quasi-mathematical formulation can be represented as:

ERWKI(Q*) = FVS ⊗ NN ⊗ LTP ⊗ EC    (5)

Where:
• ERWKI: "Extensive Real-World Knowledge Integration".
• FVS: Formal Verification Systems.
scientific enigmas and avenues [315], [316]. ‘Areas Requiring dynamic and specialized architectures. While transformers re-
Redirection’ denote research spheres that, though established, main essential, there is a need for them to evolve and integrate
find themselves at an inflection point, necessitating a strategic with these advanced systems for enhanced performance and
pivot to assimilate emergent AI paradigms and an overhaul adaptability.
of traditional methodologies, akin to the transition from rule- Recurrent Neural Networks (RNNs) are facing a potential
based expert systems to adaptive machine learning frameworks decline in relevance, as indicated by their scores: likely to
[315], [317]. The ‘Still Relevant’ classification affirms the become redundant (ց) 2 in both MoE and AGI contexts
tenacity of select research domains that, by addressing persis- and still relevant (↔) 3 in multimodality, totaling a score
tent scientific inquiries or through their inherent malleability, of 7. Although effective for sequence processing, RNNs are
remain impervious to the tides of AI innovation [317]. In challenged by their limitations in handling long-range depen-
contrast, domains categorized as ‘Likely to Become Redun- dencies and lower efficiency compared to newer models like
dant’ confront potential obsolescence, inviting strategic fore- transformers. They may retain some relevance in multimodal
sight and resource reallocation to forestall scientific stagnation tasks involving sequential data but are generally overshadowed
[318]. Lastly, ‘Inherently Unresolvable’ challenges serve as a sobering reminder of the perpetual dilemmas within AI research that defy resolution, rooted in the complex web of human ethics and cultural diversity, thus anchoring the pursuit of AI within the intractable tapestry of human values and societal imperatives [319], [320].

B. Overview of Impact Analysis

This subsection offers a detailed overview of the impact analysis carried out on the research taxonomy within the realm of generative AI, with a specific focus on recent progress in MoE, multimodality, and AGI. It evaluates the impact of these innovative developments on various facets of generative AI research, ranging from model architecture to sophisticated learning methodologies, and includes both quantitative and qualitative assessments across a multitude of domains and subdomains in LLM research, shedding light on the extent to which each area is influenced by these technological advancements. The evaluation considered factors such as the emergence of new research directions, the necessity for redirection in existing research areas, the continued relevance of certain methodologies, and the potential redundancy of others; the results are encapsulated in Table III.

1) Impact On Model Architecture: Transformer Models have been scored with a redirection requirement (֒→) of 4 in both MoE and AGI, and a relevance (↔) of 3 in multimodality, leading to an overall score of 11. These models, forming the backbone of many current AI architectures, continue to be relevant for handling complex input sequences. However, the emergence of MoE and AGI indicates a shift towards more specialized designs, suggesting that transformers may eventually be complemented or replaced by more advanced architectures.

The MoE models have scored a consistent relevance (↔) of 3 in their own development and a score of 5 (ր) in multimodality, combined with a redirection score (֒→) of 4 in the context of AGI, amounting to an overall score of 12. MoE models are at the forefront of emerging research in multimodality due to their ability to handle diverse data types. For AGI, these models will require adjustments to effectively integrate into systems exhibiting general intelligence, especially in areas beyond their initial specialization.

Multimodal Models have received high scores for emerging research directions (ր) of 5 in both MoE and AGI contexts, alongside a score of 3 (↔) for current relevance in multimodality, culminating in an overall score of 13. The integration of MoE and the pursuit of AGI are opening new pathways for research in multimodal models. These developments are crucial for enhancing the ability to process and synthesize information from multiple modalities, a key aspect for both specialized and generalized AI systems.

2) Impact On Training Techniques: Supervised Learning has been assigned a redirection score (֒→) of 4 in MoE, a relevance score (↔) of 3 in multimodality, and a score indicating potential redundancy (ց) of 2 in the context of AGI, culminating in an overall score of 9. While supervised learning requires adaptation to fit the MoE framework, it remains relevant for multimodal AI models that depend on labeled data. However, with the shift towards more autonomous learning methods in AGI, the dependence on extensive labeled datasets typically associated with supervised learning may diminish, leading to its potential decrease in significance.
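The arithmetic behind these overall scores is straightforward: each symbol maps to a fixed weight, and the three per-context ratings are summed. The following is an illustrative sketch only; the symbol names and the function are our own, not notation from Table III.

```python
# Illustrative sketch of the scoring scheme described above: each
# taxonomy area receives one symbol per context (MoE, multimodality,
# AGI), each symbol maps to a fixed score, and the three scores are
# summed into the overall impact score.
SYMBOL_SCORES = {
    "emerging": 5,      # (ր) emerging research direction
    "redirection": 4,   # (֒→) requires redirection
    "relevant": 3,      # (↔) remains relevant
    "redundant": 2,     # (ց) potentially redundant
    "unresolvable": 1,  # (△) inherently unresolvable / not required
}

def overall_score(ratings):
    """Sum the per-context symbol scores for one taxonomy area."""
    return sum(SYMBOL_SCORES[symbol] for symbol in ratings.values())

# Transformer Models: redirection in MoE and AGI, relevance in multimodality.
transformers = {"MoE": "redirection", "multimodality": "relevant", "AGI": "redirection"}
print(overall_score(transformers))  # 11, matching the total reported above
```

Under this mapping, Supervised Learning's ratings (redirection, relevant, redundant) likewise sum to the reported 9.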
Unsupervised Learning scores a redirection requirement (֒→) of 4 in both MoE and AGI contexts and maintains its relevance (↔) with a score of 3 in multimodality, resulting in a total score of 11. In the MoE architecture, unsupervised learning methods may need adjustments, particularly in managing dynamic task allocation. It remains crucial for understanding unlabeled data across various modalities. In AGI, unsupervised learning is expected to evolve beyond traditional techniques, focusing on more advanced self-discovery and intrinsic learning mechanisms.

Reinforcement Learning is rated as still relevant (↔) with a score of 3 in MoE, requiring redirection (֒→) with a score of 4 in multimodality, and identified as an emerging research area (ր) with a score of 5 in AGI, giving it a total score of 12. This technique continues to play a significant role in optimizing MoE model structures. In the realm of multimodality, it necessitates a strategic shift to effectively manage complex interactions between different modalities. As for AGI, reinforcement learning is emerging as a crucial area, particularly in the development of autonomous systems that learn from their environment.

Transfer Learning receives a consistent relevance score (↔) of 3 in MoE, a high score for emerging research directions (ր) of 5 in multimodality, and a redirection requirement (֒→) of 4 in AGI, accumulating to an overall score of 12. It remains important in the MoE framework for leveraging knowledge across different experts. In multimodal contexts, transfer learning is becoming increasingly crucial as it facilitates the transfer of learning between different modalities. With the evolution of AGI, this technique is expected to undergo significant changes to cater to broader and more generalized knowledge applications.

3) Impact On Application Domains: Natural Language Understanding holds steady relevance (↔) with a score of 3 in both MoE and multimodality, and an emerging direction (ր) score of 5 in AGI, totaling an overall score of 11. MoE models support the relevance of NLU by enhancing its precision and depth through their ability to handle large, diverse datasets. In multimodal AI, NLU remains a critical component for comprehending language in diverse data formats. With AGI's progress, NLU is expected to undergo significant expansion, moving towards more advanced, human-like comprehension and interpretation capabilities.

Natural Language Generation maintains relevance (↔) with a score of 3 in MoE, requires redirection (֒→) with a score of 4 in multimodality, and is identified as an emerging research area (ր) with a score of 5 in AGI, resulting in a total score of 12. MoE's scalability is crucial for enhancing NLG, while in multimodal contexts, NLG may need strategic adjustments to align effectively with other modalities. As AGI evolves, NLG is anticipated to venture into new research domains, especially in creating content that reflects human-like creativity and adaptability.

Conversational AI is marked for redirection (֒→) with a score of 4 in MoE, and for emerging research directions (ր) with a score of 5 in both multimodality and AGI, accumulating an overall score of 14. While MoE enhances conversational AI, it may require strategic changes to fully utilize MoE's distributed expertise. The integration of multiple modalities opens new avenues for conversational AI, expanding its scope to include various sensory data. The development of AGI is set to bring revolutionary advancements in this domain, paving the way for more autonomous, context-aware, and human-like interactions.

Creative AI scores a redirection requirement (֒→) of 4 in
MoE, and high scores for emerging research directions (ր) of 5 in both multimodality and AGI, leading to a total score of 14. In the context of MoE, Creative AI may need to be realigned to capitalize on MoE's capacity for generating novel content. The combination of different modalities in creative AI presents exciting new research opportunities, enabling the creation of more intricate and diverse outputs. As AGI progresses, it is expected to significantly broaden the capabilities of creative AI, potentially surpassing existing boundaries and exploring new realms of creativity.

4) Impact On Compliance and Ethical Considerations: Bias Mitigation in the context of MoE, multimodality, and AGI scores a redirection requirement (֒→) of 4 in both MoE and multimodality, and an emerging research direction (ր) with a score of 5 in AGI, resulting in an overall score of 13. MoE architectures demand a new approach in bias mitigation due to the diversity of expert networks, which could otherwise amplify biases. In multimodal systems, bias mitigation requires novel strategies to address biases in various data types, including non-textual forms like images and audio. With AGI's broad cognitive capabilities, a comprehensive approach towards understanding and addressing biases across diverse domains is emerging as a critical research area.

Data Security maintains a consistent relevance (↔) with a score of 3 across MoE, multimodality, and AGI, leading to a total score of 9. The fundamental principles of data security remain crucial despite the advancements in MoE, which may necessitate tailored strategies for its distributed nature. In multimodal AI, the secure handling of diverse data types continues to be of paramount importance. The core tenets of data security are sustained even with the advancement of AGI, though the complexity and scope of security measures are likely to increase.

AI Ethics is marked for redirection (֒→) with a score of 4 in both MoE and multimodality, and faces inherently unresolvable challenges (△) with a score of 1 in AGI, accumulating a total score of 9. The decision-making processes and transparency of MoE models necessitate a reevaluation of ethical considerations. In multimodal AI, ethical concerns, particularly in the interpretation and use of multimodal data, require new approaches. The ethical challenges in AGI are expected to be complex and involve deep philosophical and societal implications that might be difficult to fully resolve.

Privacy Preservation scores a redirection need (֒→) of 4 across MoE, multimodality, and AGI, leading to an overall score of 12. The distributed nature of MoE systems requires a reassessment of privacy preservation techniques to handle data processed by multiple experts. Multimodal AI systems, especially those handling sensitive data such as images and sounds, necessitate tailored privacy strategies. With the extensive data processing capabilities of AGI, advanced and potentially new approaches to privacy preservation are called for.

5) Impact On Advanced Learning: In the context of MoE, self-supervised learning requires redirection (֒→) with a score of 4, signaling the need to adapt to the evolving architecture. Emerging research directions (ր) with a score of 5 are identified in multimodality, suggesting the integration of various data types like text, image, and audio. For AGI, self-supervised learning remains relevant (↔) with a score of 3, contributing to the system's autonomy and adaptability, though likely to be integrated with more complex strategies. The overall impact score is 12.

Meta-learning maintains consistent relevance (↔) with a score of 3 across MoE and multimodality, aligning well with the dynamic nature of MoE and aiding quick adaptation to varying data types and tasks in multimodal contexts. In AGI, it is marked as an emerging research direction (ր) with a score of 5, suggesting novel research in achieving human-like adaptability and learning efficiency. The total score for meta-learning is 11.

Fine tuning continues to be relevant (↔) with a score of 3 in both MoE and multimodality, being essential for adapting pre-trained models to specific tasks and tailoring multimodal models. However, in AGI, it is likely to become redundant (ց) with a score of 2, as AGI aims to develop systems that autonomously understand and learn across a broad range of domains, reducing the need for traditional fine-tuning processes. The overall impact score for fine tuning is 8.

Aligning AI with human values poses inherently unresolvable challenges (△) in all contexts—MoE, multimodality, and AGI—with a score of 1. This reflects the complexity and diversity of tasks MoE models handle, the integration of various data types in multimodal AI, and the broad range of cognitive abilities encompassed by AGI. These factors contribute to the significant ongoing challenges in aligning AI with human values, resulting in a total score of 3.

6) Impact On Emerging Trends: Multimodal learning is marked as an emerging research direction (ր) with a score of 5 in both MoE and AGI contexts, reflecting its capacity to integrate various data types such as text, images, and audio. This integration is crucial for specialized tasks in MoE and processing diverse forms of data in AGI. In the realm of multimodality, it remains a core aspect (↔) with a score of 3, being essential for ongoing multimodal AI development. The overall impact score is 13.

Interactive and Cooperative AI requires redirection (֒→) in MoE with a score of 4, as MoE models adapt to include more interactive elements for broader applications. In multimodality, interaction and cooperation continue to be central (↔) with a score of 3, especially in fields like robotics and virtual assistants. AGI's evolution includes significant advancements in interactive AI, marking it as an emerging research area (ր) with a score of 5. The total score for this trend is 12.

The development of AGI necessitates redirection (֒→) in both MoE and multimodality, each with a score of 4, indicating the need for more integrated and complex systems. AGI remains at the forefront of its own field (↔) with a score of 3, with each breakthrough directly influencing its progress. The overall impact score for AGI development is 11.

AGI containment is identified as a challenge not required to be solved (△) in both MoE and multimodality, with a score of 1, as these areas are not expected to reach the levels of autonomy and complexity associated with AGI. However, as AGI progresses, the emerging need for effective containment strategies is marked (ր) with a score of 5, highlighting the
importance of ensuring safe and controlled AI deployment. The total impact score is 7.

VIII. EMERGENT RESEARCH PRIORITIES IN GENERATIVE AI

As we likely approach the precipice of a new era marked by the advent of Q*, nudging us closer to the realization of usable AGI, the research landscape in generative AI is undergoing a crucial transformation.

A. Emergent Research Priorities in MoE

The MoE domain is increasingly focusing on two critical areas:
• Multimodal Models in Model Architecture: The integration of MoE and AGI is opening new pathways for research in multimodal models. These developments are enhancing the capability to process and synthesize information from multiple modalities, which is crucial for both specialized and generalized AI systems.
• Multimodal Learning in Emerging Trends: MoE is at the forefront of multimodal learning, integrating diverse data types like text, images, and audio for specialized tasks. This trend is directly impacting the enhancement

effectively engage with and leverage the advancements in AI, equipping them with the necessary skills to navigate its complexities and innovations.

C. Emergent Research Priorities in AGI

The AGI domain is witnessing a surge in research priorities across multiple areas:
• Multimodal Models in Model Architecture: Similar to MoE, multimodal models are crucial in AGI, enabling deeper and more nuanced understanding.
• Reinforcement Learning in Training Techniques: Emerging as a key area in AGI, reinforcement learning focuses on developing autonomous systems learning from their environment.
• Application Domains: AGI is extending the boundaries of natural language understanding and generation, conversational AI, and creative AI, with a focus on human-like comprehension and creativity.
• Bias Mitigation in Compliance and Ethical Considerations: New directions in bias mitigation are focusing on a comprehensive approach to addressing biases across diverse domains in AGI.
• Meta-Learning in Advanced Learning: AGI's pursuit
demand for GPUs and TPUs is accentuated, particularly when handling complex computations and large datasets typical in multimodal AI applications.
• Memory Usage in AI Modeling: A critical challenge in training and deploying large-scale AI models, particularly in multimodal and AGI systems executed on GPUs, lies in the substantial GPU and VRAM requirements. Unlike computer RAM, VRAM often cannot be expanded easily on many platforms, posing significant constraints. Developing strategies for GPU and VRAM optimization and efficient model scaling is thus crucial for the practical deployment of these AI technologies.
• Scalability and Efficiency in AI Deployment: Addressing scalability challenges in generative AI, especially in MoE and AGI contexts, involves optimizing load management and parallel processing techniques. This is vital for their practical application in fields like healthcare, finance, and education.

2) Real-world Application Examples of Generative AI Technologies: The application of generative AI models in real-world scenarios demonstrates their transformative potential and challenges in various sectors.
• Healthcare: In healthcare, generative AI facilitates advancements in diagnostic imaging and personalized medicine, but also raises significant concerns regarding data privacy and the potential for misuse of sensitive health information [322].
• Finance: The use of AI for fraud detection and algorithmic trading in finance underlines its efficiency and accuracy, while at the same time, it raises ethical concerns, particularly in automated decision-making processes, which may lack transparency and accountability [323].
• Education: Generative AI's role in creating personalized learning experiences offers immense benefits in terms of educational accessibility and tailored instruction. However, it poses challenges in equitable access to technology, potential biases in AI-Generated Content (AIGC), and could reduce demand for human educators. Additionally, there is a growing concern among educators who oppose the use of AIGC, fearing it may undermine traditional teaching methodologies and the role of educators.

B. Commercial Viability and Industry Solutions in Generative AI Technologies

1) Market Readiness: Assessing the market readiness of generative AI technologies involves analyzing cost, accessibility, deployment challenges, and user adoption trends.
• Cost Analysis: The financial aspects of deploying generative AI, including MoE, multimodality, and AGI, are crucial for market adoption.
• Accessibility and Deployment: Integration of these technologies into existing systems and the technical expertise required are key factors influencing their adoption.
• User Adoption Trends: Understanding current adoption patterns provides insights into market acceptance and the role of user trust and perceived benefits.

2) Existing Industry Solutions: Generative AI is reshaping various industries by offering innovative solutions and altering market dynamics.
• Sector-Wise Deployment: The diverse applications of generative AI, from digital content creation to process streamlining, also raise questions about originality and intellectual property rights.
• Impact on Market Dynamics: The effect of AI solutions on traditional industry structures and the introduction of novel business models are significant considerations.
• Challenges and Constraints: Addressing limitations such as scalability, data management complexity, privacy concerns, and ethical implications is essential for robust governance frameworks.

C. Limitations and Future Directions in Generative AI Technologies

1) Technical Limitations: Identifying and addressing technical limitations in generative AI models is crucial for their advancement and reliability.
• Contextual Understanding: Enhancing AI's ability to understand and interpret context, especially in natural language processing and image recognition, is a key area for improvement.
• Handling Ambiguous Data: Developing better algorithms for processing ambiguous or incomplete data sets is essential for decision-making accuracy and reliability.
• Navigating Human Judgment: Despite generative AI's accuracy in interpreting policies and procedures, its impact is limited in replacing human judgment. This is especially true in legal and political contexts where decision-makers might selectively use AIGC, leading to biased outcomes. Thus, the effectiveness of generative AI in such scenarios should be realistically assessed.

2) Future Research Directions to Enhance the Practicality of Generative AI: Future research in generative AI should focus on addressing current limitations and expanding its practical applications.
• Improved Contextual Understanding: Research should aim at developing models with better contextual awareness, particularly in complex natural language and image processing tasks.
• Robust Handling of Ambiguous Data: Investigating techniques for effective processing of ambiguous data is vital for advancing the decision-making capabilities of AI models.
• Ethical Integration of AIGC in Legal and Political Arenas: Future research should focus on the ethical integration of AI-generated content into legal and political decision-making processes, which involves developing frameworks that utilize AIGC in a supportive role, ensuring it enhances human judgment and contributes to transparency and fairness [324]. Importantly, researchers should consider the biases and limitations inherent in AI [324], alongside the potential for human fallibility, ethical complexities, and possible corruption in these domains.
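The VRAM constraint noted above under "Memory Usage in AI Modeling" can be made concrete with a back-of-the-envelope estimate: model weights occupy roughly parameter count times bytes per parameter, inflated by a runtime overhead for activations and buffers. The function and the 1.2 overhead factor below are illustrative assumptions, not figures from this survey.

```python
# Hypothetical back-of-the-envelope estimate of inference-time GPU
# memory: weights (parameters x bytes per parameter) scaled by an
# assumed runtime overhead for activations and buffers.
def estimated_vram_gb(n_params, bytes_per_param=2, overhead=1.2):
    """Rough VRAM in GiB for holding a model at the given precision."""
    return n_params * bytes_per_param * overhead / 1024**3

# A 7-billion-parameter model held in 16-bit precision:
print(round(estimated_vram_gb(7e9), 1))  # 15.6 GiB, beyond many consumer GPUs
```

Because VRAM, unlike system RAM, usually cannot be expanded after purchase, even such rough estimates explain why quantization and model-scaling strategies matter for practical deployment.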
thoroughly assess preprint research undermines the foundation
Figure 8: Possible Convergence Between Traditional Peer Review and the Preprint Ecosystem
thorough and diverse evaluation. Subsequent, more formal peer review processes could then refine and endorse these preprints for academic rigor and quality assurance. This hybrid model would require robust technological support, possibly leveraging AI and machine learning tools to assist in initial screening and identification of suitable reviewers. The aim would be to establish a seamless continuum from rapid dissemination to validated publication, ensuring both the speed of preprints and the credibility of peer-reviewed research. A balanced approach must be struck to harness the benefits of preprints—such as rapid dissemination of findings and open access—while mitigating their drawbacks. The development of new infrastructure and norms could be instrumental in steering the academic community towards a sustainable model that upholds the integrity and trustworthiness of scientific research in the age of Generative AI.

generative AI have been identified as significant, as their advancements can enhance model performance and versatility, and pave the way for future research in areas like ethical AI alignment and AGI. As we forge ahead, the balance between AI advancements and human creativity is not just a goal but a necessity, ensuring AI's role as a complementary force that amplifies our capacity to innovate and solve complex challenges. Our responsibility is to guide these advancements towards enriching the human experience, aligning technological progress with ethical standards and societal well-being.

DISCLAIMER

The authors hereby declare no conflict of interest.

ABBREVIATIONS
[7] G.-G. Lee, L. Shi, E. Latif, Y. Gao, A. Bewersdorf, M. Nyaaba, S. Guo, Z. Wu, Z. Liu, H. Wang et al., "Multimodality of ai for education: Towards artificial general intelligence," arXiv preprint arXiv:2312.06037, 2023.
[8] P. Maddigan and T. Susnjak, "Chat2vis: Generating data visualisations via natural language using chatgpt, codex and gpt-3 large language models," IEEE Access, 2023.
[9] T. R. McIntosh, T. Liu, T. Susnjak, P. Watters, A. Ng, and M. N. Halgamuge, "A culturally sensitive test to evaluate nuanced gpt hallucination," IEEE Transactions on Artificial Intelligence, vol. 1, no. 01, pp. 1–13, 2023.
[10] M. R. Morris, J. Sohl-dickstein, N. Fiedel, T. Warkentin, A. Dafoe, A. Faust, C. Farabet, and S. Legg, "Levels of agi: Operationalizing progress on the path to agi," arXiv preprint arXiv:2311.02462, 2023.
[11] J. Schuett, N. Dreksler, M. Anderljung, D. McCaffary, L. Heim, E. Bluemke, and B. Garfinkel, "Towards best practices in agi safety and governance: A survey of expert opinion," arXiv preprint arXiv:2305.07153, 2023.
[12] X. Shuai, J. Rollins, I. Moulinier, T. Custis, M. Edmunds, and F. Schilder, "A multidimensional investigation of the effects of publication retraction on scholarly impact," Journal of the Association for Information Science and Technology, vol. 68, no. 9, pp. 2225–2236, 2017.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[14] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al., "Improving language understanding by generative pre-training," 2018.
[15] C. Huang, Z. Zhang, B. Mao, and X. Yao, "An overview of artificial intelligence ethics," IEEE Transactions on Artificial Intelligence, 2022.
[16] L. Besançon, N. Peiffer-Smadja, C. Segalas, H. Jiang, P. Masuzzo, C. Smout, E. Billy, M. Deforet, and C. Leyrat, "Open science saves lives: lessons from the covid-19 pandemic," BMC Medical Research Methodology, vol. 21, no. 1, pp. 1–18, 2021.
[17] C. R. Triggle, R. MacDonald, D. J. Triggle, and D. Grierson, "Requiem for impact factors and high publication charges," Accountability in Research, vol. 29, no. 3, pp. 133–164, 2022.
[18] T. McIntosh, A. Kayes, Y.-P. P. Chen, A. Ng, and P. Watters, "Ransomware mitigation in the modern era: A comprehensive review, research challenges, and future directions," ACM Computing Surveys (CSUR), vol. 54, no. 9, pp. 1–36, 2021.
[19] T. McIntosh, T. Liu, T. Susnjak, H. Alavizadeh, A. Ng, R. Nowrozy, and P. Watters, "Harnessing gpt-4 for generation of cybersecurity grc policies: A focus on ransomware attack mitigation," Computers & Security, vol. 134, p. 103424, 2023.
[20] H. Bao, W. Wang, L. Dong, Q. Liu, O. K. Mohammed, K. Aggarwal, S. Som, S. Piao, and F. Wei, "Vlmo: Unified vision-language pre-training with mixture-of-modality-experts," Advances in Neural Information Processing Systems, vol. 35, pp. 32897–32912, 2022.
[21] N. Du, Y. Huang, A. M. Dai, S. Tong, D. Lepikhin, Y. Xu, M. Krikun, Y. Zhou, A. W. Yu, O. Firat et al., "Glam: Efficient scaling of language models with mixture-of-experts," in International Conference on Machine Learning. PMLR, 2022, pp. 5547–5569.
[22] S. Masoudnia and R. Ebrahimpour, "Mixture of experts: a literature survey," Artificial Intelligence Review, vol. 42, pp. 275–293, 2014.
[23] C. Riquelme, J. Puigcerver, B. Mustafa, M. Neumann, R. Jenatton, A. Susano Pinto, D. Keysers, and N. Houlsby, "Scaling vision with sparse mixture of experts," Advances in Neural Information Processing Systems, vol. 34, pp. 8583–8595, 2021.
[24] S. E. Yuksel, J. N. Wilson, and P. D. Gader, "Twenty years of mixture of experts," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 8, pp. 1177–1193, 2012.
[25] L. Zhang, S. Huang, W. Liu, and D. Tao, "Learning a mixture of granularity-specific experts for fine-grained categorization," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 8331–8340.
[26] D. Martin, S. Malpica, D. Gutierrez, B. Masia, and A. Serrano, "Multimodality in vr: A survey," ACM Computing Surveys (CSUR), vol. 54, no. 10s, pp. 1–36, 2022.
[27] Q. Sun, Q. Yu, Y. Cui, F. Zhang, X. Zhang, Y. Wang, H. Gao, J. Liu, T. Huang, and X. Wang, "Generative pretraining in multimodality," arXiv preprint arXiv:2307.05222, 2023.
[28] L. Wei, L. Xie, W. Zhou, H. Li, and Q. Tian, "Mvp: Multimodality-guided visual pre-training," in European Conference on Computer Vision. Springer, 2022, pp. 337–353.
[29] J. Wu, W. Zhou, X. Qian, J. Lei, L. Yu, and T. Luo, "Menet: Lightweight multimodality enhancement network for detecting salient objects in rgb-thermal images," Neurocomputing, vol. 527, pp. 119–129, 2023.
[30] Q. Ye, H. Xu, G. Xu, J. Ye, M. Yan, Y. Zhou, J. Wang, A. Hu, P. Shi, Y. Shi et al., "mplug-owl: Modularization empowers large language models with multimodality," arXiv preprint arXiv:2304.14178, 2023.
[31] K. LaGrandeur, "How safe is our reliance on ai, and should we regulate it?" AI and Ethics, vol. 1, pp. 93–99, 2021.
[32] S. McLean, G. J. Read, J. Thompson, C. Baber, N. A. Stanton, and P. M. Salmon, "The risks associated with artificial general intelligence: A systematic review," Journal of Experimental & Theoretical Artificial Intelligence, vol. 35, no. 5, pp. 649–663, 2023.
[33] Y. K. Dwivedi, L. Hughes, E. Ismagilova, G. Aarts, C. Coombs, T. Crick, Y. Duan, R. Dwivedi, J. Edwards, A. Eirug, V. Galanos, P. V. Ilavarasan, M. Janssen, P. Jones, A. K. Kar, H. Kizgin, B. Kronemann, B. Lal, B. Lucini, R. Medaglia, K. Le Meunier-FitzHugh, L. C. Le Meunier-FitzHugh, S. Misra, E. Mogaji, S. K. Sharma, J. B. Singh, V. Raghavan, R. Raman, N. P. Rana, S. Samothrakis, J. Spencer, K. Tamilmani, A. Tubadji, P. Walton, and M. D. Williams, "Artificial intelligence (ai): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy," International Journal of Information Management, vol. 57, p. 101994, 2021.
[34] I. Gabriel, "Artificial intelligence, values, and alignment," Minds and Machines, vol. 30, pp. 411–437, 2020.
[35] A. Shaban-Nejad, M. Michalowski, S. Bianco, J. S. Brownstein, D. L. Buckeridge, and R. L. Davis, "Applied artificial intelligence in healthcare: Listening to the winds of change in a post-covid-19 world," pp. 1969–1971, 2022.
[36] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung, "Survey of hallucination in natural language generation," ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
[37] B. Min, H. Ross, E. Sulem, A. P. B. Veyseh, T. H. Nguyen, O. Sainz, E. Agirre, I. Heintz, and D. Roth, "Recent advances in natural language processing via large pre-trained language models: A survey," ACM Computing Surveys, vol. 56, no. 2, pp. 1–40, 2023.
[38] J. Li, X. Cheng, W. X. Zhao, J.-Y. Nie, and J.-R. Wen, "Halueval: A large-scale hallucination evaluation benchmark for large language models," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 6449–6464.
[39] L. Weidinger, J. Mellor, M. Rauh, C. Griffin, J. Uesato, P.-S. Huang, M. Cheng, M. Glaese, B. Balle, A. Kasirzadeh et al., "Ethical and social risks of harm from language models," arXiv preprint arXiv:2112.04359, 2021.
[40] X. Zhiheng, Z. Rui, and G. Tao, "Safety and ethical concerns of large language models," in Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 4: Tutorial Abstracts), 2023, pp. 9–16.
[41] P. F. Brown, V. J. Della Pietra, P. V. Desouza, J. C. Lai, and R. L. Mercer, "Class-based n-gram models of natural language," Computational Linguistics, vol. 18, no. 4, pp. 467–480, 1992.
[42] S. Katz, "Estimation of probabilities from sparse data for the language model component of a speech recognizer," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400–401, 1987.
[43] R. Kneser and H. Ney, "Improved backing-off for m-gram language modeling," in 1995 International Conference on Acoustics, Speech, and Signal Processing, vol. 1. IEEE, 1995, pp. 181–184.
[44] R. Kuhn and R. De Mori, "A cache-based natural language model for speech recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 6, pp. 570–583, 1990.
[45] H. Ney, U. Essen, and R. Kneser, "On structuring probabilistic dependences in stochastic language modelling," Computer Speech & Language, vol. 8, no. 1, pp. 1–38, 1994.
[46] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[47] M. K. Nammous and K. Saeed, "Natural language processing: speaker, language, and gender identification with lstm," Advanced Computing and Systems for Security: Volume Eight, pp. 143–156, 2019.
[48] D. Wei, B. Wang, G. Lin, D. Liu, Z. Dong, H. Liu, and Y. Liu, "Research on unstructured text data mining and fault classification based on rnn-lstm with malfunction inspection report," Energies, vol. 10, no. 3, p. 406, 2017.
[49] L. Yao and Y. Guan, "An improved lstm structure for natural language processing," in 2018 IEEE International Conference of Safety Produce Informatization (IICSPI). IEEE, 2018, pp. 565–569.
[50] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., "Training language
models to follow instructions with human feedback," Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.
[51] T. Susnjak, "Beyond predictive learning analytics modelling and onto explainable artificial intelligence with prescriptive analytics and chatgpt," International Journal of Artificial Intelligence in Education, pp. 1–31, 2023.
[52] T. Susnjak, E. Griffin, M. McCutcheon, and K. Potter, "Towards clinical prediction with transparency: An explainable ai approach to survival modelling in residential aged care," arXiv preprint arXiv:2312.00271, 2023.
[53] R. Yang, T. F. Tan, W. Lu, A. J. Thirunavukarasu, D. S. W. Ting, and N. Liu, "Large language models in health care: Development, applications, and challenges," Health Care Science, vol. 2, no. 4, pp. 255–263, 2023.
[54] D. Baidoo-Anu and L. O. Ansah, "Education in the era of generative artificial intelligence (ai): Understanding the potential benefits of chatgpt in promoting teaching and learning," Journal of AI, vol. 7, no. 1, pp. 52–62, 2023.
[55] T. Susnjak, "Chatgpt: The end of online exam integrity?" arXiv preprint arXiv:2212.09292, 2022.
[56] A. Tlili, B. Shehata, M. A. Adarkwah, A. Bozkurt, D. T. Hickey, R. Huang, and B. Agyemang, "What if the devil is my guardian angel: Chatgpt as a case study of using chatbots in education," Smart Learning Environments, vol. 10, no. 1, p. 15, 2023.
[57] M. A. AlAfnan, S. Dishari, M. Jovic, and K. Lomidze, "Chatgpt as an educational tool: Opportunities, challenges, and recommendations for
[72] Y. Wolf, N. Wies, Y. Levine, and A. Shashua, "Fundamental limitations of alignment in large language models," arXiv preprint arXiv:2304.11082, 2023.
[73] H. Dang, L. Mecke, F. Lehmann, S. Goller, and D. Buschek, "How to prompt? opportunities and challenges of zero- and few-shot learning for human-ai interaction in creative applications of generative models," arXiv preprint arXiv:2209.01390, 2022.
[74] R. Ma, X. Zhou, T. Gui, Y. Tan, L. Li, Q. Zhang, and X. Huang, "Template-free prompt tuning for few-shot ner," arXiv preprint arXiv:2109.13532, 2021.
[75] C. Qin and S. Joty, "Lfpt5: A unified framework for lifelong few-shot language learning based on prompt tuning of t5," arXiv preprint arXiv:2110.07298, 2021.
[76] S. Wang, L. Tang, A. Majety, J. F. Rousseau, G. Shih, Y. Ding, and Y. Peng, "Trustworthy assertion classification through prompting," Journal of Biomedical Informatics, vol. 132, p. 104139, 2022.
[77] Y. Fan, F. Jiang, P. Li, and H. Li, "Grammargpt: Exploring open-source llms for native chinese grammatical error correction with supervised fine-tuning," in CCF International Conference on Natural Language Processing and Chinese Computing. Springer, 2023, pp. 69–80.
[78] D. Liga and L. Robaldo, "Fine-tuning gpt-3 for legal rule classification," Computer Law & Security Review, vol. 51, p. 105864, 2023.
[79] Y. Liu, A. Singh, C. D. Freeman, J. D. Co-Reyes, and P. J. Liu, "Improving large language model fine-tuning for solving math problems," arXiv preprint arXiv:2310.10047, 2023.
[80] Z. Talat, A. Névéol, S. Biderman, M. Clinciu, M. Dey, S. Longpre, S. Luccioni, M. Masoud, M. Mitchell, D. Radev et al., "You reap
communication, business writing, and composition courses,” Journal of what you sow: On the challenges of bias evaluation under multilingual
Artificial Intelligence and Technology, vol. 3, no. 2, pp. 60–68, 2023. settings,” in Proceedings of BigScience Episode# 5–Workshop on
[58] A. S. George and A. H. George, “A review of chatgpt ai’s impact on Challenges & Perspectives in Creating Large Language Models, 2022,
several business sectors,” Partners Universal International Innovation pp. 26–41.
Journal, vol. 1, no. 1, pp. 9–23, 2023. [81] Y. Liu, S. Yu, and T. Lin, “Hessian regularization of deep neural
[59] G. K. Hadfield and J. Clark, “Regulatory markets: The future of ai networks: A novel approach based on stochastic estimators of hessian
governance,” arXiv preprint arXiv:2304.04914, 2023. trace,” Neurocomputing, vol. 536, pp. 13–20, 2023.
[60] M. Bakker, M. Chadwick, H. Sheahan, M. Tessler, L. Campbell- [82] Y. Lu, Y. Bo, and W. He, “Confidence adaptive regularization for deep
Gillingham, J. Balaguer, N. McAleese, A. Glaese, J. Aslanides, learning with noisy labels,” arXiv preprint arXiv:2108.08212, 2021.
M. Botvinick et al., “Fine-tuning language models to find agreement [83] G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton, “Regu-
among humans with diverse preferences,” Advances in Neural Infor- larizing neural networks by penalizing confident output distributions,”
mation Processing Systems, vol. 35, pp. 38 176–38 189, 2022. arXiv preprint arXiv:1701.06548, 2017.
[61] Z. Hu, Y. Lan, L. Wang, W. Xu, E.-P. Lim, R. K.-W. Lee, L. Bing, and [84] E. Chen, Z.-W. Hong, J. Pajarinen, and P. Agrawal, “Redeeming
S. Poria, “Llm-adapters: An adapter family for parameter-efficient fine- intrinsic rewards via constrained optimization,” Advances in Neural
tuning of large language models,” arXiv preprint arXiv:2304.01933, Information Processing Systems, vol. 35, pp. 4996–5008, 2022.
2023. [85] Y. Jiang, Z. Li, M. Tan, S. Wei, G. Zhang, Z. Guan, and B. Han,
[62] H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. A. “A stable block adjustment method without ground control points
Raffel, “Few-shot parameter-efficient fine-tuning is better and cheaper using bound constrained optimization,” International Journal of Remote
than in-context learning,” Advances in Neural Information Processing Sensing, vol. 43, no. 12, pp. 4708–4722, 2022.
Systems, vol. 35, pp. 1950–1965, 2022. [86] M. Kachuee and S. Lee, “Constrained policy optimization for con-
[63] H. Zheng, L. Shen, A. Tang, Y. Luo, H. Hu, B. Du, and D. Tao, trolled self-learning in conversational ai systems,” arXiv preprint
“Learn from model beyond fine-tuning: A survey,” arXiv preprint arXiv:2209.08429, 2022.
arXiv:2310.08184, 2023. [87] Z. Song, H. Wang, and Y. Jin, “A surrogate-assisted evolutionary
[64] P. Manakul, A. Liusie, and M. J. Gales, “Selfcheckgpt: Zero-resource framework with regions of interests-based data selection for expensive
black-box hallucination detection for generative large language mod- constrained optimization,” IEEE Transactions on Systems, Man, and
els,” arXiv preprint arXiv:2303.08896, 2023. Cybernetics: Systems, 2023.
[65] A. Martino, M. Iannelli, and C. Truong, “Knowledge injection to [88] J. Yu, T. Xu, Y. Rong, J. Huang, and R. He, “Structure-aware condi-
counter large language model (llm) hallucination,” in European Se- tional variational auto-encoder for constrained molecule optimization,”
mantic Web Conference. Springer, 2023, pp. 182–185. Pattern Recognition, vol. 126, p. 108581, 2022.
[66] J.-Y. Yao, K.-P. Ning, Z.-H. Liu, M.-N. Ning, and L. Yuan, “Llm [89] P. Butlin, “Ai alignment and human reward,” in Proceedings of the 2021
lies: Hallucinations are not bugs, but features as adversarial examples,” AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 437–445.
arXiv preprint arXiv:2310.01469, 2023. [90] F. Faal, K. Schmitt, and J. Y. Yu, “Reward modeling for mitigating
[67] Y. Zhang, Y. Li, L. Cui, D. Cai, L. Liu, T. Fu, X. Huang, E. Zhao, toxicity in transformer-based language models,” Applied Intelligence,
Y. Zhang, Y. Chen et al., “Siren’s song in the ai ocean: A survey on hal- vol. 53, no. 7, pp. 8421–8435, 2023.
lucination in large language models,” arXiv preprint arXiv:2309.01219, [91] J. Leike, D. Krueger, T. Everitt, M. Martic, V. Maini, and S. Legg,
2023. “Scalable agent alignment via reward modeling: a research direction,”
[68] J. Ji, M. Liu, J. Dai, X. Pan, C. Zhang, C. Bian, R. Sun, Y. Wang, and arXiv preprint arXiv:1811.07871, 2018.
Y. Yang, “Beavertails: Towards improved safety alignment of llm via [92] L. Li, Y. Chai, S. Wang, Y. Sun, H. Tian, N. Zhang, and H. Wu, “Tool-
a human-preference dataset,” arXiv preprint arXiv:2307.04657, 2023. augmented reward modeling,” arXiv preprint arXiv:2310.01045, 2023.
[69] Y. Liu, Y. Yao, J.-F. Ton, X. Zhang, R. G. H. Cheng, Y. Klochkov, [93] F. Barreto, L. Moharkar, M. Shirodkar, V. Sarode, S. Gonsalves,
M. F. Taufiq, and H. Li, “Trustworthy llms: a survey and guideline and A. Johns, “Generative artificial intelligence: Opportunities and
for evaluating large language models’ alignment,” arXiv preprint challenges of large language models,” in International Conference on
arXiv:2308.05374, 2023. Intelligent Computing and Networking. Springer, 2023, pp. 545–553.
[70] Y. Wang, W. Zhong, L. Li, F. Mi, X. Zeng, W. Huang, L. Shang, [94] Z. Chen, Z. Wang, Z. Wang, H. Liu, Z. Yin, S. Liu, L. Sheng,
X. Jiang, and Q. Liu, “Aligning large language models with human: A W. Ouyang, Y. Qiao, and J. Shao, “Octavius: Mitigating task inter-
survey,” arXiv preprint arXiv:2307.12966, 2023. ference in mllms via moe,” arXiv preprint arXiv:2311.02684, 2023.
[71] Z. Sun, Y. Shen, Q. Zhou, H. Zhang, Z. Chen, D. Cox, Y. Yang, [95] C. Dun, M. D. C. H. Garcia, G. Zheng, A. H. Awadallah, A. Kyrillidis,
and C. Gan, “Principle-driven self-alignment of language mod- and R. Sim, “Sweeping heterogeneity with smart mops: Mixture of
els from scratch with minimal human supervision,” arXiv preprint prompts for llm task adaptation,” arXiv preprint arXiv:2310.02842,
arXiv:2305.03047, 2023. 2023.
[96] H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Barnes, and A. Mian, "A comprehensive overview of large language models," arXiv preprint arXiv:2307.06435, 2023.
[97] F. Xue, Y. Fu, W. Zhou, Z. Zheng, and Y. You, "To repeat or not to repeat: Insights from scaling llm under token-crisis," arXiv preprint arXiv:2305.13230, 2023.
[98] M. Nowaz Rabbani Chowdhury, S. Zhang, M. Wang, S. Liu, and P.-Y. Chen, "Patch-level routing in mixture-of-experts is provably sample-efficient for convolutional neural networks," arXiv e-prints, 2023.
[99] J. Peng, K. Zhou, R. Zhou, T. Hartvigsen, Y. Zhang, Z. Wang, and T. Chen, "Sparse moe as a new treatment: Addressing forgetting, fitting, learning issues in multi-modal multi-task learning," in Conference on Parsimony and Learning (Recent Spotlight Track), 2023.
[100] C. N. d. Santos, J. Lee-Thorp, I. Noble, C.-C. Chang, and D. Uthus, "Memory augmented language models through mixture of word experts," arXiv preprint arXiv:2311.10768, 2023.
[101] W. Wang, G. Ma, Y. Li, and B. Du, "Language-routing mixture of experts for multilingual and code-switching speech recognition," arXiv preprint arXiv:2307.05956, 2023.
[102] X. Zhao, X. Chen, Y. Cheng, and T. Chen, "Sparse moe with language guided routing for multilingual machine translation," in Conference on Parsimony and Learning (Recent Spotlight Track), 2023.
[103] W. Huang, H. Zhang, P. Peng, and H. Wang, "Multi-gate mixture-of-expert combined with synthetic minority over-sampling technique for multimode imbalanced fault diagnosis," in 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2023, pp. 456–461.
[104] B. Liu, L. Ding, L. Shen, K. Peng, Y. Cao, D. Cheng, and D. Tao, "Diversifying the mixture-of-experts representation for language models with orthogonal optimizer," arXiv preprint arXiv:2310.09762, 2023.
[105] W. Wang, Z. Lai, S. Li, W. Liu, K. Ge, Y. Liu, A. Shen, and D. Li, "Prophet: Fine-grained load balancing for parallel training of large-scale moe models," in 2023 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2023, pp. 82–94.
[106] X. Yao, S. Liang, S. Han, and H. Huang, "Enhancing molecular property prediction via mixture of collaborative experts," arXiv preprint arXiv:2312.03292, 2023.
[107] Z. Xiao, Y. Jiang, G. Tang, L. Liu, S. Xu, Y. Xiao, and W. Yan, "Adversarial mixture of experts with category hierarchy soft constraint," in 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 2021, pp. 2453–2463.
[108] M. Agbese, R. Mohanani, A. Khan, and P. Abrahamsson, "Implementing ai ethics: Making sense of the ethical requirements," in Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering, 2023, pp. 62–71.
[109] Z. Chen, Y. Deng, Y. Wu, Q. Gu, and Y. Li, "Towards understanding the mixture-of-experts layer in deep learning," Advances in Neural Information Processing Systems, vol. 35, pp. 23049–23062, 2022.
[110] Y. Zhou, T. Lei, H. Liu, N. Du, Y. Huang, V. Zhao, A. M. Dai, Q. V. Le, J. Laudon et al., "Mixture-of-experts with expert choice routing," Advances in Neural Information Processing Systems, vol. 35, pp. 7103–7114, 2022.
[111] N. Guha, C. Lawrence, L. A. Gailmard, K. Rodolfa, F. Surani, R. Bommasani, I. Raji, M.-F. Cuéllar, C. Honigsberg, P. Liang et al., "Ai regulation has its own alignment problem: The technical and institutional feasibility of disclosure, registration, licensing, and auditing," George Washington Law Review, Forthcoming, 2023.
[112] Gemini Team, Google, "Gemini: A family of highly capable multimodal models," 2023, accessed: 17 December 2023. [Online]. Available: https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
[113] J. N. Acosta, G. J. Falcone, P. Rajpurkar, and E. J. Topol, "Multimodal biomedical ai," Nature Medicine, vol. 28, no. 9, pp. 1773–1784, 2022.
[114] S. Qi, Z. Cao, J. Rao, L. Wang, J. Xiao, and X. Wang, "What is the limitation of multimodal llms? a deeper look into multimodal llms through prompt probing," Information Processing & Management, vol. 60, no. 6, p. 103510, 2023.
[115] B. Xu, D. Kocyigit, R. Grimm, B. P. Griffin, and F. Cheng, "Applications of artificial intelligence in multimodality cardiovascular imaging: a state-of-the-art review," Progress in Cardiovascular Diseases, vol. 63, no. 3, pp. 367–376, 2020.
[116] A. Birhane, V. U. Prabhu, and E. Kahembwe, "Multimodal datasets: misogyny, pornography, and malignant stereotypes," arXiv preprint arXiv:2110.01963, 2021.
[117] Y. Li, W. Li, N. Li, X. Qiu, and K. B. Manokaran, "Multimodal information interaction and fusion for the parallel computing system using ai techniques," International Journal of High Performance Systems Architecture, vol. 10, no. 3-4, pp. 185–196, 2021.
[118] C. Zhang, Z. Yang, X. He, and L. Deng, "Multimodal intelligence: Representation learning, information fusion, and applications," IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 3, pp. 478–493, 2020.
[119] H. Qiao, V. Liu, and L. Chilton, "Initial images: using image prompts to improve subject representation in multimodal ai generated art," in Proceedings of the 14th Conference on Creativity and Cognition, 2022, pp. 15–28.
[120] A. E. Stewart, Z. Keirn, and S. K. D'Mello, "Multimodal modeling of collaborative problem-solving facets in triads," User Modeling and User-Adapted Interaction, pp. 1–39, 2021.
[121] L. Xue, N. Yu, S. Zhang, J. Li, R. Martín-Martín, J. Wu, C. Xiong, R. Xu, J. C. Niebles, and S. Savarese, "Ulip-2: Towards scalable multimodal pre-training for 3d understanding," arXiv preprint arXiv:2305.08275, 2023.
[122] L. Yan, L. Zhao, D. Gasevic, and R. Martinez-Maldonado, "Scalability, sustainability, and ethicality of multimodal learning analytics," in LAK22: 12th International Learning Analytics and Knowledge Conference, 2022, pp. 13–23.
[123] Y. Liu-Thompkins, S. Okazaki, and H. Li, "Artificial empathy in marketing interactions: Bridging the human-ai gap in affective and social customer experience," Journal of the Academy of Marketing Science, vol. 50, no. 6, pp. 1198–1218, 2022.
[124] M. S. Rahman, S. Bag, M. A. Hossain, F. A. M. A. Fattah, M. O. Gani, and N. P. Rana, "The new wave of ai-powered luxury brands online shopping experience: The role of digital multisensory cues and customers' engagement," Journal of Retailing and Consumer Services, vol. 72, p. 103273, 2023.
[125] E. Sachdeva, N. Agarwal, S. Chundi, S. Roelofs, J. Li, B. Dariush, C. Choi, and M. Kochenderfer, "Rank2tell: A multimodal driving dataset for joint importance ranking and reasoning," arXiv preprint arXiv:2309.06597, 2023.
[126] C. Cui, Y. Ma, X. Cao, W. Ye, Y. Zhou, K. Liang, J. Chen, J. Lu, Z. Yang, K.-D. Liao et al., "A survey on multimodal large language models for autonomous driving," arXiv preprint arXiv:2311.12320, 2023.
[127] A. B. Temsamani, A. K. Chavali, W. Vervoort, T. Tuytelaars, G. Radevski, H. Van Hamme, K. Mets, M. Hutsebaut-Buysse, T. De Schepper, and S. Latré, "A multimodal ai approach for intuitively instructable autonomous systems: a case study of an autonomous off-highway vehicle," in The Eighteenth International Conference on Autonomic and Autonomous Systems, ICAS 2022, May 22-26, 2022, Venice, Italy, 2022, pp. 31–39.
[128] J. Lee and S. Y. Shin, "Something that they never said: Multimodal disinformation and source vividness in understanding the power of ai-enabled deepfake news," Media Psychology, vol. 25, no. 4, pp. 531–546, 2022.
[129] S. Muppalla, S. Jia, and S. Lyu, "Integrating audio-visual features for multimodal deepfake detection," arXiv preprint arXiv:2310.03827, 2023.
[130] S. Kumar, M. K. Chaube, S. N. Nenavath, S. K. Gupta, and S. K. Tetarave, "Privacy preservation and security challenges: a new frontier multimodal machine learning research," International Journal of Sensor Networks, vol. 39, no. 4, pp. 227–245, 2022.
[131] J. Marchang and A. Di Nuovo, "Assistive multimodal robotic system (amrsys): security and privacy issues, challenges, and possible solutions," Applied Sciences, vol. 12, no. 4, p. 2174, 2022.
[132] A. Peña, I. Serna, A. Morales, J. Fierrez, A. Ortega, A. Herrarte, M. Alcantara, and J. Ortega-Garcia, "Human-centric multimodal machine learning: Recent advances and testbed on ai-based recruitment," SN Computer Science, vol. 4, no. 5, p. 434, 2023.
[133] R. Wolfe and A. Caliskan, "American == white in multimodal language-and-image ai," in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022, pp. 800–812.
[134] R. Wolfe, Y. Yang, B. Howe, and A. Caliskan, "Contrastive language-vision ai models pretrained on web-scraped multimodal data exhibit sexual objectification bias," in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023, pp. 1174–1185.
[135] M. Afshar, B. Sharma, D. Dligach, M. Oguss, R. Brown, N. Chhabra, H. M. Thompson, T. Markossian, C. Joyce, M. M. Churpek et al., "Development and multimodal validation of a substance misuse algorithm for referral to treatment using artificial intelligence (smart-ai): a retrospective deep learning study," The Lancet Digital Health, vol. 4, no. 6, pp. e426–e435, 2022.
[136] H. Alwahaby, M. Cukurova, Z. Papamitsiou, and M. Giannakos, "The evidence of impact and ethical considerations of multimodal learning analytics: A systematic literature review," The Multimodal Learning Analytics Handbook, pp. 289–325, 2022.
[137] Q. Miao, W. Zheng, Y. Lv, M. Huang, W. Ding, and F.-Y. Wang, "Dao to hanoi via desci: Ai paradigm shifts from alphago to chatgpt," IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 4, pp. 877–897, 2023.
[138] Y. Rong, "Roadmap of alphago to alphastar: Problems and challenges," in 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), vol. 12348. SPIE, 2022, pp. 904–914.
[139] Y. Gao, M. Zhou, D. Liu, Z. Yan, S. Zhang, and D. N. Metaxas, "A data-scalable transformer for medical image segmentation: architecture, model efficiency, and benchmark," arXiv preprint arXiv:2203.00131, 2022.
[140] W. Peebles and S. Xie, "Scalable diffusion models with transformers," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4195–4205.
[141] R. Pope, S. Douglas, A. Chowdhery, J. Devlin, J. Bradbury, J. Heek, K. Xiao, S. Agrawal, and J. Dean, "Efficiently scaling transformer inference," Proceedings of Machine Learning and Systems, vol. 5, 2023.
[142] Y. Ding and M. Jia, "Convolutional transformer: An enhanced attention mechanism architecture for remaining useful life estimation of bearings," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–10, 2022.
[143] Y. Ding, M. Jia, Q. Miao, and Y. Cao, "A novel time–frequency transformer based on self–attention mechanism and its application in fault diagnosis of rolling bearings," Mechanical Systems and Signal Processing, vol. 168, p. 108616, 2022.
[144] G. Wang, Y. Zhao, C. Tang, C. Luo, and W. Zeng, "When shift operation meets vision transformer: An extremely simple alternative to attention mechanism," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2423–2430.
[145] H. Cai, J. Li, M. Hu, C. Gan, and S. Han, "Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17302–17313.
[146] X. Liu, H. Peng, N. Zheng, Y. Yang, H. Hu, and Y. Yuan, "Efficientvit: Memory efficient vision transformer with cascaded group attention," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14420–14430.
[147] Y. Li, Q. Fan, H. Huang, Z. Han, and Q. Gu, "A modified yolov8 detection network for uav aerial image recognition," Drones, vol. 7, no. 5, p. 304, 2023.
[148] F. M. Talaat and H. ZainEldin, "An improved fire detection approach based on yolo-v8 for smart cities," Neural Computing and Applications, vol. 35, no. 28, pp. 20939–20954, 2023.
[149] S. Tamang, B. Sen, A. Pradhan, K. Sharma, and V. K. Singh, "Enhancing covid-19 safety: Exploring yolov8 object detection for accurate face mask classification," International Journal of Intelligent Systems and Applications in Engineering, vol. 11, no. 2, pp. 892–897, 2023.
[150] J. Lu, R. Xiong, J. Tian, C. Wang, C.-W. Hsu, N.-T. Tsou, F. Sun, and J. Li, "Battery degradation prediction against uncertain future conditions with recurrent neural network enabled deep learning," Energy Storage Materials, vol. 50, pp. 139–151, 2022.
[151] A. Onan, "Bidirectional convolutional recurrent neural network architecture with group-wise enhancement mechanism for text sentiment classification," Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 5, pp. 2098–2117, 2022.
[152] F. Shan, X. He, D. J. Armaghani, P. Zhang, and D. Sheng, "Success and challenges in predicting tbm penetration rate using recurrent neural networks," Tunnelling and Underground Space Technology, vol. 130, p. 104728, 2022.
[153] C. Sridhar, P. K. Pareek, R. Kalidoss, S. S. Jamal, P. K. Shukla, S. J. Nuagah et al., "Optimal medical image size reduction model creation using recurrent neural network and genpsowvq," Journal of Healthcare Engineering, vol. 2022, 2022.
[154] J. Zhu, Q. Jiang, Y. Shen, C. Qian, F. Xu, and Q. Zhu, "Application of recurrent neural network to mechanical fault diagnosis: A review," Journal of Mechanical Science and Technology, vol. 36, no. 2, pp. 527–542, 2022.
[155] S. Lin, W. Lin, W. Wu, F. Zhao, R. Mo, and H. Zhang, "Segrnn: Segment recurrent neural network for long-term time series forecasting," arXiv preprint arXiv:2308.11200, 2023.
[156] Z. Wei, X. Zhang, and M. Sun, "Extracting weighted finite automata from recurrent neural networks for natural languages," in International Conference on Formal Engineering Methods. Springer, 2022, pp. 370–385.
[157] F. Bonassi, M. Farina, J. Xie, and R. Scattolini, "On recurrent neural networks for learning-based control: recent results and ideas for future developments," Journal of Process Control, vol. 114, pp. 92–104, 2022.
[158] Z. Guo, Y. Tang, R. Zhang, D. Wang, Z. Wang, B. Zhao, and X. Li, "Viewrefer: Grasp the multi-view knowledge for 3d visual grounding," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 15372–15383.
[159] C. Pan, Y. He, J. Peng, Q. Zhang, W. Sui, and Z. Zhang, "Baeformer: Bi-directional and early interaction transformers for bird's eye view semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9590–9599.
[160] P. Xu, X. Zhu, and D. A. Clifton, "Multimodal learning with transformers: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[161] I. Molenaar, S. de Mooij, R. Azevedo, M. Bannert, S. Järvelä, and D. Gašević, "Measuring self-regulated learning and the role of ai: Five years of research using multimodal multichannel data," Computers in Human Behavior, vol. 139, p. 107540, 2023.
[162] S. Steyaert, M. Pizurica, D. Nagaraj, P. Khandelwal, T. Hernandez-Boussard, A. J. Gentles, and O. Gevaert, "Multimodal data fusion for cancer biomarker discovery with deep learning," Nature Machine Intelligence, vol. 5, no. 4, pp. 351–362, 2023.
[163] V. Rani, S. T. Nabi, M. Kumar, A. Mittal, and K. Kumar, "Self-supervised learning: A succinct review," Archives of Computational Methods in Engineering, vol. 30, no. 4, pp. 2761–2775, 2023.
[164] M. C. Schiappa, Y. S. Rawat, and M. Shah, "Self-supervised learning for videos: A survey," ACM Computing Surveys, vol. 55, no. 13s, pp. 1–37, 2023.
[165] J. Yu, H. Yin, X. Xia, T. Chen, J. Li, and Z. Huang, "Self-supervised learning for recommender systems: A survey," IEEE Transactions on Knowledge and Data Engineering, 2023.
[166] V. Bharti, A. Kumar, V. Purohit, R. Singh, A. K. Singh, and S. K. Singh, "A label efficient semi self-supervised learning framework for iot devices in industrial process," IEEE Transactions on Industrial Informatics, 2023.
[167] D. Sam and J. Z. Kolter, "Losses over labels: Weakly supervised learning via direct loss construction," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 8, 2023, pp. 9695–9703.
[168] M. Wang, P. Xie, Y. Du, and X. Hu, "T5-based model for abstractive summarization: A semi-supervised learning approach with consistency loss functions," Applied Sciences, vol. 13, no. 12, p. 7111, 2023.
[169] Q. Li, X. Peng, Y. Qiao, and Q. Hao, "Unsupervised person re-identification with multi-label learning guided self-paced clustering," Pattern Recognition, vol. 125, p. 108521, 2022.
[170] P. Nancy, H. Pallathadka, M. Naved, K. Kaliyaperumal, K. Arumugam, and V. Garchar, "Deep learning and machine learning based efficient framework for image based plant disease classification and detection," in 2022 International Conference on Advanced Computing Technologies and Applications (ICACTA). IEEE, 2022, pp. 1–6.
[171] P. An, Z. Wang, and C. Zhang, "Ensemble unsupervised autoencoders and gaussian mixture model for cyberattack detection," Information Processing & Management, vol. 59, no. 2, p. 102844, 2022.
[172] S. Yan, H. Shao, Y. Xiao, B. Liu, and J. Wan, "Hybrid robust convolutional autoencoder for unsupervised anomaly detection of machine tools under noises," Robotics and Computer-Integrated Manufacturing, vol. 79, p. 102441, 2023.
[173] E. Ayanoglu, K. Davaslioglu, and Y. E. Sagduyu, "Machine learning in nextg networks via generative adversarial networks," IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 480–501, 2022.
[174] K. Yan, X. Chen, X. Zhou, Z. Yan, and J. Ma, "Physical model informed fault detection and diagnosis of air handling units based on transformer generative adversarial network," IEEE Transactions on Industrial Informatics, vol. 19, no. 2, pp. 2192–2199, 2022.
[175] N.-R. Zhou, T.-F. Zhang, X.-W. Xie, and J.-Y. Wu, "Hybrid quantum–classical generative adversarial networks for image generation via learning discrete distribution," Signal Processing: Image Communication, vol. 110, p. 116891, 2023.
[176] P. Ladosz, L. Weng, M. Kim, and H. Oh, "Exploration in deep reinforcement learning: A survey," Information Fusion, vol. 85, pp. 1–22, 2022.
[177] Y. Matsuo, Y. LeCun, M. Sahani, D. Precup, D. Silver, M. Sugiyama, E. Uchibe, and J. Morimoto, "Deep learning, reinforcement learning, and world models," Neural Networks, vol. 152, pp. 267–275, 2022.
[178] D. Bertoin, A. Zouitine, M. Zouitine, and E. Rachelson, "Look where you look! saliency-guided q-networks for generalization in visual reinforcement learning," Advances in Neural Information Processing Systems, vol. 35, pp. 30693–30706, 2022.
[179] A. Hafiz, "A survey of deep q-networks used for reinforcement learning: State of the art," Intelligent Communication Technologies and Virtual Mobile Networks: Proceedings of ICICV 2022, pp. 393–402, 2022.
[180] A. Hafiz, M. Hassaballah, A. Alqahtani, S. Alsubai, and M. A. Hameed, "Reinforcement learning with an ensemble of binary action deep q-networks," Computer Systems Science & Engineering, vol. 46, no. 3, 2023.
[181] A. Alagha, S. Singh, R. Mizouni, J. Bentahar, and H. Otrok, "Target localization using multi-agent deep reinforcement learning with proximal policy optimization," Future Generation Computer Systems, vol. 136, pp. 342–357, 2022.
[182] S. S. Hassan, Y. M. Park, Y. K. Tun, W. Saad, Z. Han, and C. S. Hong, "3to: Thz-enabled throughput and trajectory optimization of uavs in 6g networks by proximal policy optimization deep reinforcement learning," in ICC 2022 - IEEE International Conference on Communications. IEEE, 2022, pp. 5712–5718.
[183] A. K. Jayant and S. Bhatnagar, "Model-based safe deep reinforcement learning via a constrained proximal policy optimization algorithm," Advances in Neural Information Processing Systems, vol. 35, pp. 24432–24445, 2022.
[184] B. Lin, "Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook," Expert Systems with Applications, p. 122254, 2023.
[185] B. Luo, Z. Wu, F. Zhou, and B.-C. Wang, "Human-in-the-loop reinforcement learning in continuous-action space," IEEE Transactions on Neural Networks and Learning Systems, 2023.
[186] A. Raza, K. P. Tran, L. Koehl, and S. Li, "Designing ecg monitoring healthcare system with federated transfer learning and explainable ai," Knowledge-Based Systems, vol. 236, p. 107763, 2022.
[187] S. Siahpour, X. Li, and J. Lee, "A novel transfer learning approach in remaining useful life prediction for incomplete dataset," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–11, 2022.
[188] Z. Guo, K. Lin, X. Chen, and C.-Y. Chit, "Transfer learning for angle of arrivals estimation in massive mimo system," in 2022 IEEE/CIC International Conference on Communications in China (ICCC). IEEE, 2022, pp. 506–511.
[189] S. Liu, Y. Lu, P. Zheng, H. Shen, and J. Bao, "Adaptive reconstruction of digital twins for machining systems: A transfer learning approach," Robotics and Computer-Integrated Manufacturing, vol. 78, p. 102390, 2022.
[190] H. Liu, J. Liu, L. Cui, Z. Teng, N. Duan, M. Zhou, and Y. Zhang, "Logiqa 2.0—an improved dataset for logical reasoning in natural language understanding," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
[191] Y. Meng, J. Huang, Y. Zhang, and J. Han, "Generating training data with language models: Towards zero-shot language understanding," Advances in Neural Information Processing Systems, vol. 35, pp. 462–477, 2022.
[192] R. M. Samant, M. R. Bachute, S. Gite, and K. Kotecha, "Framework for deep learning-based language models using multi-task learning in natural language understanding: A systematic literature review and future directions," IEEE Access, vol. 10, pp. 17078–17097, 2022.
[193] H. Weld, X. Huang, S. Long, J. Poon, and S. C. Han, "A survey
[198] W. Peng, D. Xu, T. Xu, J. Zhang, and E. Chen, "Are gpt embeddings useful for ads and recommendation?" in International Conference on Knowledge Science, Engineering and Management. Springer, 2023, pp. 151–162.
[199] E. Erdem, M. Kuyu, S. Yagcioglu, A. Frank, L. Parcalabescu, B. Plank, A. Babii, O. Turuta, A. Erdem, I. Calixto et al., "Neural natural language generation: A survey on multilinguality, multimodality, controllability and learning," Journal of Artificial Intelligence Research, vol. 73, pp. 1131–1207, 2022.
[200] J. Qian, L. Dong, Y. Shen, F. Wei, and W. Chen, "Controllable natural language generation with contrastive prefixes," arXiv preprint arXiv:2202.13257, 2022.
[201] H. Rashkin, V. Nikolaev, M. Lamm, L. Aroyo, M. Collins, D. Das, S. Petrov, G. S. Tomar, I. Turc, and D. Reitter, "Measuring attribution in natural language generation models," Computational Linguistics, pp. 1–64, 2023.
[202] A. K. Pandey and S. S. Roy, "Natural language generation using sequential models: A survey," Neural Processing Letters, pp. 1–34, 2023.
[203] J. Y. Khan and G. Uddin, "Automatic code documentation generation using gpt-3," in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022, pp. 1–6.
[204] Y. K. Dwivedi, N. Kshetri, L. Hughes, E. L. Slade, A. Jeyaraj, A. K. Kar, A. M. Baabdullah, A. Koohang, V. Raghavan, M. Ahuja et al., ""so what if chatgpt wrote it?" multidisciplinary perspectives on opportunities, challenges and implications of generative conversational ai for research, practice and policy," International Journal of Information Management, vol. 71, p. 102642, 2023.
[205] T. Fu, S. Gao, X. Zhao, J.-r. Wen, and R. Yan, "Learning towards conversational ai: A survey," AI Open, vol. 3, pp. 14–28, 2022.
[206] H. Ji, I. Han, and Y. Ko, "A systematic review of conversational ai in language education: Focusing on the collaboration with human teachers," Journal of Research on Technology in Education, vol. 55, no. 1, pp. 48–63, 2023.
[207] Y. Wan, W. Wang, P. He, J. Gu, H. Bai, and M. R. Lyu, "Biasasker: Measuring the bias in conversational ai system," in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023, pp. 515–527.
[208] S. Kusal, S. Patil, J. Choudrie, K. Kotecha, S. Mishra, and A. Abraham, "Ai-based conversational agents: A scoping review from technologies to future directions," IEEE Access, 2022.
[209] Z. Xiao, "Seeing us through machines: designing and building conversational ai to understand humans," Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2023.
[210] H.-K. Ko, G. Park, H. Jeon, J. Jo, J. Kim, and J. Seo, "Large-scale text-to-image generation models for visual artists' creative works," in Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023, pp. 919–933.
[211] A. Pearson, "The rise of crealtives: Using ai to enable and speed up the creative process," Journal of AI, Robotics & Workplace Automation, vol. 2, no. 2, pp. 101–114, 2023.
[212] J. Rezwana and M. L. Maher, "Designing creative ai partners with cofi: A framework for modeling interaction in human-ai co-creative systems," ACM Transactions on Computer-Human Interaction, vol. 30, no. 5, pp. 1–28, 2023.
[213] S. Sharma and S. Bvuma, "Generative adversarial networks (gans) for creative applications: Exploring art and music generation," International Journal of Multidisciplinary Innovation and Research Methodology, ISSN: 2960-2068, vol. 2, no. 4, pp. 29–33, 2023.
[214] B. Attard-Frost, A. De los Ríos, and D. R. Walters, "The ethics of ai business practices: a review of 47 ai ethics guidelines," AI and Ethics,
of joint intent detection and slot filling models in natural language vol. 3, no. 2, pp. 389–406, 2023.
understanding,” ACM Computing Surveys, vol. 55, no. 8, pp. 1–38, [215] A. Gardner, A. L. Smith, A. Steventon, E. Coughlan, and M. Oldfield,
2022. “Ethical funding for trustworthy ai: proposals to address the respon-
[194] S. Ajmal, A. A. I. Ahmed, and C. Jalota, “Natural language process- sibilities of funders to ensure that projects adhere to trustworthy ai
ing in improving information retrieval and knowledge discovery in practice,” AI and Ethics, pp. 1–15, 2022.
healthcare conversational agents,” Journal of Artificial Intelligence and [216] J. Schuett, “Three lines of defense against risks from ai,” AI &
Machine Learning in Management, vol. 7, no. 1, pp. 34–47, 2023. SOCIETY, pp. 1–15, 2023.
[195] A. Montejo-Ráez and S. M. Jiménez-Zafra, “Current approaches and [217] M. Sloane and J. Zakrzewski, “German ai start-ups and “ai ethics”:
applications in natural language processing,” Applied Sciences, vol. 12, Using a social practice lens for assessing and implementing socio-
no. 10, p. 4859, 2022. technical innovation,” in Proceedings of the 2022 ACM Conference on
[196] K. Vijayan, O. Anand, and A. Sahaj, “Language-agnostic text process- Fairness, Accountability, and Transparency, 2022, pp. 935–947.
ing for information extraction,” in CS & IT Conference Proceedings, [218] M. Vasconcelos, C. Cardonha, and B. Gonçalves, “Modeling epistemo-
vol. 12, no. 23. CS & IT Conference Proceedings, 2022. logical principles for bias mitigation in ai systems: an illustration in
[197] C. D. Manning, “Human language understanding & reasoning,” hiring decisions,” in Proceedings of the 2018 AAAI/ACM Conference
Daedalus, vol. 151, no. 2, pp. 127–138, 2022. on AI, Ethics, and Society, 2018, pp. 323–329.
[219] Y. Yang, A. Gupta, J. Feng, P. Singhal, V. Yadav, Y. Wu, P. Natarajan, V. Hedau, and J. Joo, “Enhancing fairness in face detection in computer vision systems by demographic bias mitigation,” in Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022, pp. 813–822.
[220] R. Schwartz, A. Vassilev, K. Greene, L. Perine, A. Burt, P. Hall et al., “Towards a standard for identifying and managing bias in artificial intelligence,” NIST Special Publication, vol. 1270, no. 10.6028, 2022.
[221] W. Guo and A. Caliskan, “Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases,” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 122–133.
[222] Y. Kong, “Are ‘intersectionally fair’ AI algorithms really fair to women of color? A philosophical analysis,” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 485–494.
[223] Y. C. Tan and L. E. Celis, “Assessing social and intersectional biases in contextualized word representations,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[224] L. Cheng, A. Mosallanezhad, P. Sheth, and H. Liu, “Causal learning for socially responsible AI,” arXiv preprint arXiv:2104.12278, 2021.
[225] J. D. Correa, J. Tian, and E. Bareinboim, “Identification of causal effects in the presence of selection bias,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 2744–2751.
[226] B. Ghai and K. Mueller, “D-BIAS: A causality-based human-in-the-loop system for tackling algorithmic bias,” IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 1, pp. 473–482, 2022.
[227] J. N. Yan, Z. Gu, H. Lin, and J. M. Rzeszotarski, “Silva: Interactively assessing machine learning fairness using causality,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.
[228] E. Bertino, M. Kantarcioglu, C. G. Akcora, S. Samtani, S. Mittal, and M. Gupta, “AI for security and security for AI,” in Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy, 2021, pp. 333–334.
[229] H. Susanto, L. F. Yie, D. Rosiyadi, A. I. Basuki, and D. Setiana, “Data security for connected governments and organisations: Managing automation and artificial intelligence,” in Web 2.0 and Cloud Technologies for Implementing Connected Government. IGI Global, 2021, pp. 229–251.
[230] S. Dilmaghani, M. R. Brust, G. Danoy, N. Cassagnes, J. Pecero, and P. Bouvry, “Privacy and security of big data in AI systems: A research and standards perspective,” in 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 5737–5743.
[231] T. McIntosh, “Intercepting ransomware attacks with staged event-driven access control,” Ph.D. dissertation, La Trobe, 2022.
[232] T. McIntosh, A. Kayes, Y.-P. P. Chen, A. Ng, and P. Watters, “Applying staged event-driven access control to combat ransomware,” Computers & Security, vol. 128, p. 103160, 2023.
[233] P. Hummel, M. Braun, M. Tretter, and P. Dabrock, “Data sovereignty: A review,” Big Data & Society, vol. 8, no. 1, p. 2053951720982012, 2021.
[234] M. Lukings and A. Habibi Lashkari, “Data sovereignty,” in Understanding Cybersecurity Law in Data Sovereignty and Digital Governance: An Overview from a Legal Perspective. Springer, 2022, pp. 1–38.
[235] M. Hickok, “Lessons learned from AI ethics principles for future actions,” AI and Ethics, vol. 1, no. 1, pp. 41–47, 2021.
[236] J. Zhou and F. Chen, “AI ethics: From principles to practice,” AI & SOCIETY, pp. 1–11, 2022.
[237] J. A. Kroll, “Outlining traceability: A principle for operationalizing accountability in computing systems,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, pp. 758–771.
[238] A. Oseni, N. Moustafa, H. Janicke, P. Liu, Z. Tari, and A. Vasilakos, “Security and privacy for artificial intelligence: Opportunities and challenges,” arXiv preprint arXiv:2102.04661, 2021.
[239] B. C. Stahl and D. Wright, “Ethics and privacy in AI and big data: Implementing responsible research and innovation,” IEEE Security & Privacy, vol. 16, no. 3, pp. 26–33, 2018.
[240] C. Ma, J. Li, K. Wei, B. Liu, M. Ding, L. Yuan, Z. Han, and H. V. Poor, “Trusted AI in multiagent systems: An overview of privacy and security for distributed learning,” Proceedings of the IEEE, vol. 111, no. 9, pp. 1097–1132, 2023.
[241] M. Song, Z. Wang, Z. Zhang, Y. Song, Q. Wang, J. Ren, and H. Qi, “Analyzing user-level privacy attack against federated learning,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 10, pp. 2430–2444, 2020.
[242] I. Misra and L. van der Maaten, “Self-supervised learning of pretext-invariant representations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 6707–6717.
[243] X. Zhai, A. Oliver, A. Kolesnikov, and L. Beyer, “S4L: Self-supervised semi-supervised learning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1476–1485.
[244] T. Chen, X. Zhai, M. Ritter, M. Lucic, and N. Houlsby, “Self-supervised GANs via auxiliary rotation loss,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12 154–12 163.
[245] S. Jenni and P. Favaro, “Self-supervised feature learning by learning to spot artifacts,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2733–2742.
[246] P. Patel, N. Kumari, M. Singh, and B. Krishnamurthy, “LT-GAN: Self-supervised GAN with latent transformation detection,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 3189–3198.
[247] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning. PMLR, 2020, pp. 1597–1607.
[248] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
[249] A. T. Liu, S.-W. Li, and H.-Y. Lee, “TERA: Self-supervised learning of transformer encoder representation for speech,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2351–2366, 2021.
[250] Y. Pang, W. Wang, F. E. Tay, W. Liu, Y. Tian, and L. Yuan, “Masked autoencoders for point cloud self-supervised learning,” in European Conference on Computer Vision. Springer, 2022, pp. 604–621.
[251] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta-learning in neural networks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149–5169, 2021.
[252] R. Vilalta and Y. Drissi, “A perspective view and survey of meta-learning,” Artificial Intelligence Review, vol. 18, pp. 77–95, 2002.
[253] M. Al-Shedivat, L. Li, E. Xing, and A. Talwalkar, “On data efficiency of meta-learning,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 1369–1377.
[254] Y. Hu, R. Liu, X. Li, D. Chen, and Q. Hu, “Task-sequencing meta learning for intelligent few-shot fault diagnosis with limited data,” IEEE Transactions on Industrial Informatics, vol. 18, no. 6, pp. 3894–3904, 2021.
[255] S. Baik, J. Choi, H. Kim, D. Cho, J. Min, and K. M. Lee, “Meta-learning with task-adaptive loss function for few-shot learning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9465–9474.
[256] Y. Chen, Z. Liu, H. Xu, T. Darrell, and X. Wang, “Meta-baseline: Exploring simple meta-learning for few-shot learning,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9062–9071.
[257] M. A. Jamal and G.-J. Qi, “Task agnostic meta-learning for few-shot learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 719–11 727.
[258] R. Behnia, M. R. Ebrahimi, J. Pacheco, and B. Padmanabhan, “EW-Tune: A framework for privately fine-tuning large language models with differential privacy,” in 2022 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2022, pp. 560–566.
[259] J. Wei, M. Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V. Le, “Finetuned language models are zero-shot learners,” arXiv preprint arXiv:2109.01652, 2021.
[260] W. Kuang, B. Qian, Z. Li, D. Chen, D. Gao, X. Pan, Y. Xie, Y. Li, B. Ding, and J. Zhou, “FederatedScope-LLM: A comprehensive package for fine-tuning large language models in federated learning,” arXiv preprint arXiv:2309.00363, 2023.
[261] M. Nguyen, K. Kishan, T. Nguyen, A. Chadha, and T. Vu, “Efficient fine-tuning large language models for knowledge-aware response planning,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2023, pp. 593–611.
[262] M. Engelbach, D. Klau, F. Scheerer, J. Drawehn, and M. Kintz, “Fine-tuning and aligning question answering models for complex information extraction tasks,” arXiv preprint arXiv:2309.14805, 2023.
[263] T. T. Nguyen, C. Wilson, and J. Dalins, “Fine-tuning Llama 2 large language models for detecting online sexual predatory chats and abusive texts,” arXiv preprint arXiv:2308.14683, 2023.
[264] Q. Zhou, C. Yu, S. Zhang, S. Wu, Z. Wang, and F. Wang, “RegionBLIP: A unified multi-modal pre-training framework for holistic and regional comprehension,” arXiv preprint arXiv:2308.02299, 2023.
[265] T. Arnold and D. Kasenberg, “Value alignment or misalignment - what will keep systems accountable?” in AAAI Workshop on AI, Ethics, and Society, 2017.
[266] I. Gabriel and V. Ghazavi, “The challenge of value alignment: From fairer algorithms to AI safety,” arXiv preprint arXiv:2101.06060, 2021.
[267] S. Nyholm, “Responsibility gaps, value alignment, and meaningful human control over artificial intelligence,” in Risk and Responsibility in Context. Routledge, 2023, pp. 191–213.
[268] S. Wu, H. Fei, L. Qu, W. Ji, and T.-S. Chua, “NExT-GPT: Any-to-any multimodal LLM,” arXiv preprint arXiv:2309.05519, 2023.
[269] K. Bayoudh, R. Knani, F. Hamdaoui, and A. Mtibaa, “A survey on deep multimodal learning for computer vision: Advances, trends, applications, and datasets,” The Visual Computer, pp. 1–32, 2021.
[270] P. Hu, L. Zhen, D. Peng, and P. Liu, “Scalable deep multimodal learning for cross-modal retrieval,” in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 635–644.
[271] A. Rahate, R. Walambe, S. Ramanna, and K. Kotecha, “Multimodal co-learning: Challenges, applications with datasets, recent advances and future directions,” Information Fusion, vol. 81, pp. 203–239, 2022.
[272] L. Che, J. Wang, Y. Zhou, and F. Ma, “Multimodal federated learning: A survey,” Sensors, vol. 23, no. 15, p. 6986, 2023.
[273] P. P. Liang, Y. Lyu, X. Fan, Z. Wu, Y. Cheng, J. Wu, L. Chen, P. Wu, M. A. Lee, Y. Zhu et al., “MultiBench: Multiscale benchmarks for multimodal representation learning,” arXiv preprint arXiv:2107.07502, 2021.
[274] Z. Ashktorab, Q. V. Liao, C. Dugan, J. Johnson, Q. Pan, W. Zhang, S. Kumaravel, and M. Campbell, “Human-AI collaboration in a cooperative game setting: Measuring social perception and outcomes,” Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW2, pp. 1–20, 2020.
[275] P. Esmaeilzadeh, T. Mirzaei, and S. Dharanikota, “Patients’ perceptions toward human–artificial intelligence interaction in health care: Experimental study,” Journal of Medical Internet Research, vol. 23, no. 11, p. e25856, 2021.
[276] M. Nazar, M. M. Alam, E. Yafi, and M. M. Su’ud, “A systematic review of human–computer interaction and explainable artificial intelligence in healthcare with artificial intelligence techniques,” IEEE Access, vol. 9, pp. 153 316–153 348, 2021.
[277] A. S. Rajawat, R. Rawat, K. Barhanpurkar, R. N. Shaw, and A. Ghosh, “Robotic process automation with increasing productivity and improving product quality using artificial intelligence and machine learning,” in Artificial Intelligence for Future Generation Robotics. Elsevier, 2021, pp. 1–13.
[278] S. Mohseni, N. Zarei, and E. D. Ragan, “A multidisciplinary survey and framework for design and evaluation of explainable AI systems,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 11, no. 3-4, pp. 1–45, 2021.
[279] M. C. Buehler and T. H. Weisswange, “Theory of mind based communication for human agent cooperation,” in 2020 IEEE International Conference on Human-Machine Systems (ICHMS). IEEE, 2020, pp. 1–6.
[280] M. M. Çelikok, T. Peltola, P. Daee, and S. Kaski, “Interactive AI with a theory of mind,” arXiv preprint arXiv:1912.05284, 2019.
[281] A. Dafoe, E. Hughes, Y. Bachrach, T. Collins, K. R. McKee, J. Z. Leibo, K. Larson, and T. Graepel, “Open problems in cooperative AI,” arXiv preprint arXiv:2012.08630, 2020.
[282] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg et al., “Sparks of artificial general intelligence: Early experiments with GPT-4,” arXiv preprint arXiv:2303.12712, 2023.
[283] N. Fei, Z. Lu, Y. Gao, G. Yang, Y. Huo, J. Wen, H. Lu, R. Song, X. Gao, T. Xiang et al., “Towards artificial general intelligence via a multimodal foundation model,” Nature Communications, vol. 13, no. 1, p. 3094, 2022.
[284] R. Williams and R. Yampolskiy, “Understanding and avoiding AI failures: A practical guide,” Philosophies, vol. 6, no. 3, p. 53, 2021.
[285] W. Fedus, B. Zoph, and N. Shazeer, “Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,” The Journal of Machine Learning Research, vol. 23, no. 1, pp. 5232–5270, 2022.
[286] S. Shen, L. Hou, Y. Zhou, N. Du, S. Longpre, J. Wei, H. W. Chung, B. Zoph, W. Fedus, X. Chen et al., “Mixture-of-experts meets instruction tuning: A winning combination for large language models,” arXiv preprint arXiv:2305.14705, 2023.
[287] S. Rajbhandari, C. Li, Z. Yao, M. Zhang, R. Y. Aminabadi, A. A. Awan, J. Rasley, and Y. He, “DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale,” in International Conference on Machine Learning. PMLR, 2022, pp. 18 332–18 346.
[288] L. Shen, Z. Wu, W. Gong, H. Hao, Y. Bai, H. Wu, X. Wu, J. Bian, H. Xiong, D. Yu et al., “SE-MoE: A scalable and efficient mixture-of-experts distributed training and inference system,” arXiv preprint arXiv:2205.10034, 2022.
[289] C. Hwang, W. Cui, Y. Xiong, Z. Yang, Z. Liu, H. Hu, Z. Wang, R. Salas, J. Jose, P. Ram et al., “Tutel: Adaptive mixture-of-experts at scale,” Proceedings of Machine Learning and Systems, vol. 5, 2023.
[290] Y. Wang, S. Mukherjee, X. Liu, J. Gao, A. H. Awadallah, and J. Gao, “AdaMix: Mixture-of-adapter for parameter-efficient tuning of large language models,” arXiv preprint arXiv:2205.12410, vol. 1, no. 2, p. 4, 2022.
[291] T. Chen, Z. Zhang, A. Jaiswal, S. Liu, and Z. Wang, “Sparse MoE as the new dropout: Scaling dense and self-slimmable transformers,” arXiv preprint arXiv:2303.01610, 2023.
[292] H. Zhu, B. He, and X. Zhang, “Multi-gate mixture-of-experts stacked autoencoders for quality prediction in blast furnace ironmaking,” ACS Omega, vol. 7, no. 45, pp. 41 296–41 303, 2022.
[293] Z. Chi, L. Dong, S. Huang, D. Dai, S. Ma, B. Patra, S. Singhal, P. Bajaj, X. Song, X.-L. Mao et al., “On the representation collapse of sparse mixture of experts,” Advances in Neural Information Processing Systems, vol. 35, pp. 34 600–34 613, 2022.
[294] S. Gupta, S. Mukherjee, K. Subudhi, E. Gonzalez, D. Jose, A. H. Awadallah, and J. Gao, “Sparsely activated mixture-of-experts are robust multi-task learners,” arXiv preprint arXiv:2204.07689, 2022.
[295] N. Dikkala, N. Ghosh, R. Meka, R. Panigrahy, N. Vyas, and X. Wang, “On the benefits of learning to route in mixture-of-experts models,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 9376–9396.
[296] N. Dryden and T. Hoefler, “Spatial mixture-of-experts,” Advances in Neural Information Processing Systems, vol. 35, pp. 11 697–11 713, 2022.
[297] Z. You, S. Feng, D. Su, and D. Yu, “SpeechMoE2: Mixture-of-experts model with improved routing,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 7217–7221.
[298] J. Puigcerver, R. Jenatton, C. Riquelme, P. Awasthi, and S. Bhojanapalli, “On the adversarial robustness of mixture of experts,” Advances in Neural Information Processing Systems, vol. 35, pp. 9660–9671, 2022.
[299] J. Li, Y. Jiang, Y. Zhu, C. Wang, and H. Xu, “Accelerating distributed MoE training and inference with Lina,” in 2023 USENIX Annual Technical Conference (USENIX ATC 23), 2023, pp. 945–959.
[300] L. Wu, M. Liu, Y. Chen, D. Chen, X. Dai, and L. Yuan, “Residual mixture of experts,” arXiv preprint arXiv:2204.09636, 2022.
[301] B. Zoph, I. Bello, S. Kumar, N. Du, Y. Huang, J. Dean, N. Shazeer, and W. Fedus, “Designing effective sparse expert models,” arXiv preprint arXiv:2202.08906, vol. 2, 2022.
[302] ——, “ST-MoE: Designing stable and transferable sparse expert models,” arXiv preprint arXiv:2202.08906, 2022.
[303] Y. Chow, A. Tulepbergenov, O. Nachum, M. Ryu, M. Ghavamzadeh, and C. Boutilier, “A mixture-of-expert approach to RL-based dialogue management,” arXiv preprint arXiv:2206.00059, 2022.
[304] Z. Fan, R. Sarkar, Z. Jiang, T. Chen, K. Zou, Y. Cheng, C. Hao, Z. Wang et al., “M³ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design,” Advances in Neural Information Processing Systems, vol. 35, pp. 28 441–28 457, 2022.
[305] T. Zadouri, A. Üstün, A. Ahmadian, B. Ermiş, A. Locatelli, and S. Hooker, “Pushing mixture of experts to the limit: Extremely parameter efficient MoE for instruction tuning,” arXiv preprint arXiv:2309.05444, 2023.
[306] J. Zhu, X. Zhu, W. Wang, X. Wang, H. Li, X. Wang, and J. Dai, “Uni-Perceiver-MoE: Learning sparse generalist models with conditional MoEs,” Advances in Neural Information Processing Systems, vol. 35, pp. 2664–2678, 2022.
[307] F. Dou, J. Ye, G. Yuan, Q. Lu, W. Niu, H. Sun, L. Guan, G. Lu, G. Mai, N. Liu et al., “Towards artificial general intelligence (AGI) in the Internet of Things (IoT): Opportunities and challenges,” arXiv preprint arXiv:2309.07438, 2023.
[308] Z. Jia, X. Li, Z. Ling, S. Liu, Y. Wu, and H. Su, “Improving policy optimization with generalist-specialist learning,” in International Conference on Machine Learning. PMLR, 2022, pp. 10 104–10 119.
[309] M. Simeone, “Unknown future, repeated present: A narrative-centered analysis of long-term AI discourse,” Humanist Studies & the Digital Age, vol. 7, no. 1, 2022.
[310] A. Nair and F. Banaei-Kashani, “Bridging the gap between artificial intelligence and artificial general intelligence: A ten commandment framework for human-like intelligence,” arXiv preprint arXiv:2210.09366, 2022.
[311] M. H. Jarrahi, D. Askay, A. Eshraghi, and P. Smith, “Artificial intelligence and knowledge management: A partnership between human and AI,” Business Horizons, vol. 66, no. 1, pp. 87–99, 2023.
[312] D. J. Edwards, C. McEnteggart, and Y. Barnes-Holmes, “A functional contextual account of background knowledge in categorization: Implications for artificial general intelligence and cognitive accounts of general knowledge,” Frontiers in Psychology, vol. 13, p. 745306, 2022.
[313] J. McCarthy, “Artificial intelligence, logic, and formalising common sense,” Machine Learning and the City: Applications in Architecture and Urban Design, pp. 69–90, 2022.
[314] S. Friederich, “Symbiosis, not alignment, as the goal for liberal democracies in the transition to artificial general intelligence,” AI and Ethics, pp. 1–10, 2023.
[315] S. Makridakis, “The forthcoming artificial intelligence (AI) revolution: Its impact on society and firms,” Futures, vol. 90, pp. 46–60, 2017.
[316] S. Pal, K. Kumari, S. Kadam, and A. Saha, “The AI revolution,” IARA Publication, 2023.
[317] S. Verma, R. Sharma, S. Deb, and D. Maitra, “Artificial intelligence in marketing: Systematic review and future research direction,” International Journal of Information Management Data Insights, vol. 1, no. 1, p. 100002, 2021.
[318] P. Budhwar, S. Chowdhury, G. Wood, H. Aguinis, G. J. Bamber, J. R. Beltran, P. Boselie, F. Lee Cooke, S. Decker, A. DeNisi et al., “Human resource management in the age of generative artificial intelligence: Perspectives and research directions on ChatGPT,” Human Resource Management Journal, vol. 33, no. 3, pp. 606–659, 2023.
[319] J. B. Telkamp and M. H. Anderson, “The implications of diverse human moral foundations for assessing the ethicality of artificial intelligence,” Journal of Business Ethics, vol. 178, no. 4, pp. 961–976, 2022.
[320] X. Zhou, C. Liu, L. Zhai, Z. Jia, C. Guan, and Y. Liu, “Interpretable and robust AI in EEG systems: A survey,” arXiv preprint arXiv:2304.10755, 2023.
[321] C. Zhang, C. Zhang, C. Li, Y. Qiao, S. Zheng, S. K. Dam, M. Zhang, J. U. Kim, S. T. Kim, J. Choi et al., “One small step for generative AI, one giant leap for AGI: A complete survey on ChatGPT in AIGC era,” arXiv preprint arXiv:2304.06488, 2023.
[322] K. Singhal, T. Tu, J. Gottweis, R. Sayres, E. Wulczyn, L. Hou, K. Clark, S. Pfohl, H. Cole-Lewis, D. Neal et al., “Towards expert-level medical question answering with large language models,” arXiv preprint arXiv:2305.09617, 2023.
[323] S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann, P. Kambadur, D. Rosenberg, and G. Mann, “BloombergGPT: A large language model for finance,” arXiv preprint arXiv:2303.17564, 2023.
[324] P. Henderson, K. Sinha, N. Angelard-Gontier, N. R. Ke, G. Fried, R. Lowe, and J. Pineau, “Ethical challenges in data-driven dialogue systems,” in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 2018, pp. 123–129.
[325] S. A. Bin-Nashwan, M. Sadallah, and M. Bouteraa, “Use of ChatGPT in academia: Academic integrity hangs in the balance,” Technology in Society, vol. 75, p. 102370, 2023.
[326] N. Liu, A. Brown et al., “AI increases the pressure to overhaul the scientific peer review process. Comment on ‘Artificial intelligence can generate fraudulent but authentic-looking scientific medical articles: Pandora’s box has been opened’,” Journal of Medical Internet Research, vol. 25, p. e50591, 2023.
[327] A. P. Siddaway, A. M. Wood, and L. V. Hedges, “How to do a systematic review: A best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses,” Annual Review of Psychology, vol. 70, pp. 747–770, 2019.
[328] E. Landhuis, “Scientific literature: Information overload,” Nature, vol. 535, no. 7612, pp. 457–458, 2016.
[329] G. D. Chloros, V. P. Giannoudis, and P. V. Giannoudis, “Peer-reviewing in surgical journals: Revolutionize or perish?” Annals of Surgery, vol. 275, no. 1, pp. e82–e90, 2022.
[330] K.-A. Allen, J. Reardon, Y. Lu, D. V. Smith, E. Rainsford, and L. Walsh, “Towards improving peer review: Crowd-sourced insights from Twitter,” Journal of University Teaching & Learning Practice, vol. 19, no. 3, p. 02, 2022.