Review
* Correspondence: [email protected]
Abstract: The rapid development of specific-purpose Large Language Models (LLMs), such as Med-PaLM,
MEDITRON-70B, and Med-Gemini, has significantly impacted healthcare, offering unprecedented capabilities
in clinical decision support, diagnostics, and personalized health monitoring. This paper reviews the
advancements in medicine-specific LLMs, the integration of Retrieval-Augmented Generation (RAG) and
prompt engineering, and their applications in improving diagnostic accuracy and educational utility. Despite
the potential, these technologies present challenges, including bias, hallucinations, and the need for robust
safety protocols. The paper also discusses the regulatory and ethical considerations necessary for integrating
these models into mainstream healthcare. By examining current studies and developments, this paper aims to
provide a comprehensive overview of the state of LLMs in medicine and highlight the future directions for
research and application. The study concludes that while LLMs hold immense potential, their safe and effective
integration into clinical practice requires rigorous testing, ongoing evaluation, and continuous collaboration
among stakeholders.
Keywords: generative AI; large language models; AI chatbots; ChatGPT; artificial intelligence;
retrieval-augmented generation; medicine; healthcare; human health; AI regulation
1. Introduction
In July 2024, twenty months after the initial public launch of OpenAI’s ChatGPT in November
2022, Gartner, the firm behind the well-known Hype Cycle methodology, declared in a new research
report that generative AI has passed the ‘peak of inflated expectations’ and is moving into the ‘trough
of disillusionment.’ This phase is expected to lead to the ‘slope of enlightenment,’ ultimately resulting
in a ‘plateau of productivity’ as the technology matures, becomes mainstream, and its real-world
benefits begin to materialize [1]. Applications utilizing generative AI and large language models
(LLMs) in patient management, such as diagnosis, are considered SaMD/AIaMD (software as a
medical device/AI as a medical device) and fall under established MDR (medical device regulation)
provisions. However, as of August 2024, no application of this nature has been approved by
regulatory bodies like the FDA (Food and Drug Administration, US), MHRA (Medicines and
Healthcare products Regulatory Agency, UK), EMA (European Medicines Agency, EU), or
corresponding agencies worldwide.
While these tools show promise in certain scenarios, such as assisting in difficult diagnoses [2],
the current generation of generative AI and LLMs, including medically trained models such as
Google's Med-PaLM 2 [3], are not yet ready for mainstream clinical use. Passing a medical licensing
exam with high scores [4]—something that these models can accomplish—does not equate to
readiness for safe use in routine patient care [5–7]. This is due to several key limitations inherent in
the technology.
One of the most significant issues is the phenomenon of AI "hallucinations," where the models
generate plausible-sounding but factually incorrect or nonsensical information. A recent study by
Aljamaan et al. developed a Reference Hallucination Score specifically for medical AI chatbots,
highlighting the significance of accurately detecting and mitigating hallucinations to ensure the
reliability and safety of these tools in clinical environments [8]. This issue of hallucinations, combined
with the often inconsistent, unpredictable, and fluctuating (stochastic) performance of these models,
and their proneness to bias, underscores their lack of real human-like intelligence. These models
operate on vast amounts of data but lack the nuanced understanding and contextual awareness that
human practitioners possess. As a result, their unregulated use in critical medical settings could lead
to dangerous oversights or errors.
For these tools to be safely and effectively integrated into mainstream healthcare, substantial
technological advancements are necessary. These advancements, a number of which will be briefly
presented later in this article, would need to address the current limitations and ensure that AI
models can reliably support clinical decision-making without introducing undue risk.
Despite these challenges and the current lack of regulatory approvals, some early clinician
adopters are already using these tools in practice. This premature adoption is particularly concerning
given the potential for critical information oversight—such as missing a patient's drug allergies due
to an AI error [7]. The dangers of such oversights have led to growing calls from within the medical
community for stringent regulations to govern the use of AI in healthcare. For instance, there have
been increasing demands for clear guidelines and rules to prevent medical mistakes caused by AI,
highlighting the urgent need for a regulatory framework that ensures patient safety while enabling
innovation [9,10].
This article builds on our previous review [11] and highlights advancements made since
August 2023. By examining the potential advantages, challenges, and ethical considerations of
applying generative AI models in medicine and healthcare, this study aims to contribute to the
ongoing dialogue on harnessing AI’s capabilities responsibly for the betterment of medical practice
and patient well-being.
There are many general-purpose models on offer today, including Llama 2 [17], GPT-4o (omni
with voice mode) [18], Claude [20], and Mistral [21], among others, alongside LLM application
frameworks such as LangChain [19]. Generative AI’s
capabilities extend beyond text to produce synthetic content such as images, videos, and audio. Tools
such as Image FX [22] and DALL-E [23] use language prompts to generate images (Figure 1). GPT-4V
can handle multimodal inputs combining text and images for tasks such as visual
description and object localization [24]. These models can create visuals to aid in conveying
information, which can make them valuable in clinical and public health scenarios requiring rapid
and clear communication.
Prompt engineering is crucial for optimizing the performance of generative AI models. It
involves designing and refining input prompts to elicit the most accurate and relevant responses
from the models. Effective prompt engineering can significantly enhance the usability and reliability
of AI systems in healthcare settings [25]. Retrieval-Augmented Generation (RAG) represents another
advancement in AI, enhancing the capabilities of LLMs by integrating external knowledge retrieval
mechanisms [26]. RAG combines retrieval-based models, which fetch relevant documents or data,
with generative models like GPT-3 and GPT-4, which generate coherent responses based on retrieved
information. This approach addresses some limitations of pure generative models, such as
hallucinations, by grounding responses in real, retrieved data [27]. Techniques like semantic entropy
analysis also help detect and reduce hallucinations, increasing the reliability of LLMs in clinical
applications [28]. Additionally, the recently introduced "Thermometer" method offers a novel
approach to preventing AI models from becoming overly confident in incorrect answers by
continuously evaluating their confidence levels across different predictions. This method helps in
reducing overconfidence-related errors, making AI models more reliable and safer for clinical use
[29].
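To make the RAG pattern concrete, below is a minimal Python sketch of the idea. Everything in it is a hypothetical placeholder: the two-document corpus, the naive keyword-overlap retriever (a real system would use dense embeddings or a learning-to-rank model over a vetted clinical knowledge base), and the abstract llm_generate callable standing in for any generative model.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# Corpus, scoring, and the llm_generate callable are hypothetical
# placeholders, not any specific vendor's API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Document:
    source: str  # provenance is kept so answers can cite verified material
    text: str

CORPUS = [
    Document("Formulary 2024", "Amoxicillin is contraindicated in patients with penicillin allergy."),
    Document("Guideline NG136", "For stage 1 hypertension, offer lifestyle advice before drug treatment."),
]

def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(terms & set(d.text.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, docs: list[Document]) -> str:
    """Ground the model by prepending retrieved, attributed evidence."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return ("Answer using ONLY the sources below; reply 'not found' otherwise.\n"
            f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:")

def answer(query: str, llm_generate: Callable[[str], str]) -> str:
    return llm_generate(build_prompt(query, retrieve(query, CORPUS)))
```

Because each retrieved passage carries its source label, the generated answer can cite verifiable material, which is precisely the grounding property that mitigates hallucinations.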
The rapid advancement of GPT models and the integration of technologies like multimodal
content generation, prompt engineering, and RAG have the potential to reshape healthcare
communication and decision-making. However, the integration of AI in healthcare raises ethical and
safety considerations. Chatbots, while potentially beneficial for patient communication, must be
carefully managed to ensure they contribute positively to medical practice and patient well-being,
and prevent any harmful outcomes. For instance, there is a risk that chatbots could mislead
individuals with depression, potentially exacerbating their condition [30]. Ensuring the ethical and
safe deployment of AI involves rigorous evaluation, transparent communication of limitations, and
continuous monitoring to prevent misuse and harm. These issues are discussed in detail later in this
paper.
Figure 1. AI-generated image and text description in response to the prompt “generate an image
illustrating the different generative AI and LLM uses and applications in medicine and healthcare.”
Note the malformed and misspelled text towards the top right part of the image (it was probably
meant to read “personalized medicine”). This is a common observation with current models.
Generator: OpenAI's DALL·E 3, September 1, 2024; Requestor: Maged N. Kamel Boulos; License:
Public Domain (CC0).
It should be remembered that current regulation focuses on claims, efficacy, and safety: principles that do not
and should never change across AI generations. Any new clinical generative AI regulation should
not attempt to “dilute” existing quality and safety principles and standards in any way, but should
rather evolve to maintain or improve upon the high standards that SaMD/AIaMD regulation adheres
to today. Reform is inevitable and more agile regulatory changes are coming [37] that can better
address the complexities of generative AI and the fast pace of developments in this field, protect
patients, and encourage responsible innovation.
Blumenthal and Patel [40] are joined by Derraz et al., who similarly argue that current regulatory
frameworks are a “de facto blocker to AI-based personalized medicine” and that new regulatory
thinking and approaches are necessary to overcome this situation [44]. In the same vein, Freyer et al.
call for a new regulatory framework that recognizes the capabilities and limitations of generative AI
applications, but stress that this should be done while also enforcing existing regulations [45].
In August 2024, UNESCO published its “Consultation Paper on AI Regulation - Emerging
Approaches Across the World” in which it described nine non-mutually-exclusive regulatory
approaches [46]. Medical AI regulatory frameworks (current and future) in different countries will
often combine two or more of these approaches:
1. Principles-based approach: Offers core principles guiding the ethical and responsible creation
and use of AI systems, emphasizing human-centered processes and respect for human rights.
2. Standards-based approach: Transfers state regulatory authority to organizations that develop
technical standards to interpret and enforce mandatory rules.
3. Agile and experimentalist approach: Creates adaptable regulatory frameworks, like sandboxes,
that allow businesses to test new AI models and tools under flexible regulations with
government oversight.
4. Facilitating and enabling approach: Fosters an environment that promotes the development and
use of ethical and human rights-compliant AI by all stakeholders.
5. Adapting existing laws approach: Updates sector-specific and general laws to improve the
current regulatory system for AI.
6. Access to information and transparency mandates approach: Mandates transparency measures
to ensure public access to basic information about AI systems.
7. Risk-based approach: Implements requirements based on the risk levels associated with using
AI in various contexts.
8. Rights-based approach: Sets obligations to protect individuals’ rights and freedoms when using
AI.
9. Liability approach: Establishes accountability and penalties for the misuse of AI systems.
Schmidt et al. compiled a collection of 141 binding policies applicable to AI in healthcare and
population health in the EU and 10 European countries. They concluded that specific AI regulation
is still nascent, and that the existing combination of data, technology, innovation, and health and
human rights policies already provides a baseline regulatory framework for AI in health, though
additional work is needed to address specific regulatory challenges [47].
The European Union’s AI Act, which came into force on 1st August 2024 [48], introduces an
additional regulatory layer requiring manufacturers to address AI-specific risks and ethical
considerations in any medical application or device incorporating AI or machine learning
functionalities. This act underscores the importance of aligning with both standard medical device
regulations (MDR) and the new AI Act-specific requirements to ensure the safe and effective
deployment of AI technologies in healthcare [49,50]. It is worth noting that the EU AI Act was the
subject of strong lobbying efforts by big tech companies and EU member states to weaken much of
its power through an overreliance on self-regulation and self-certification among other things [51].
The evaluation of AI-based clinical interventions is closely related to their regulation. Evaluation
provides much of the evidence required by the governing regulatory frameworks to secure
regulatory body approval for the mainstream use of these interventions. However, executing the
ideal clinical trial for an AI-based intervention has always proved challenging, and because of this,
we have hundreds of medical algorithms (non-generative-AI-based) that received approval on the
basis of limited clinical data, which is far from ideal. The testing and evaluation of such interventions
is inherently tricky; for example, a perfectly good algorithm can fail if clinicians (or patients) ignore
its suggestions. Other evaluation challenges include AI bias and informed patient consent [52].
Generative AI algorithms are no exception.
This situation has led Coiera and Fraile-Navarro to propose a shift in the focus of generative AI
evaluation from reliance on ‘premarket assessment’ to ‘real-world postmarket surveillance’. They
argue that traditional scientific methods may not be sufficient to evaluate generative AI, and that
viewing it as a cybersocial ecosystem rather than as a specific technology may help with its global
performance analysis, such as evaluating resilience and sustainability under changing conditions or
tasks [53].
With the fast pace of generative AI and LLM developments, continual mapping of ongoing
research on their applications in medicine and healthcare has become necessary to inform and
regularly update the corresponding ethical frameworks, ensuring that these technologies are adopted
responsibly and effectively. Ong et al. discussed the benefits of LLMs in medical research, education,
and clinical tasks, emphasizing the associated challenges and ethical concerns, including data
privacy, cognitive and automation biases, and accountability. They went on to propose a bioethical
framework based on the principles of beneficence, nonmaleficence, autonomy, and justice to ensure
responsible use of LLMs in medicine. They highlighted the importance of human oversight,
transparency, and the need for regulations to mitigate risks while harnessing the benefits of these
advanced AI technologies [54,55].
Haltaufderheide and Ranisch conducted a systematic review on the ethics of ChatGPT and other
LLMs in medicine, identifying four broad LLM application categories (covering health professionals
and researchers, patient support, clinical applications, and public health) and a number of recurring
ethical issues related to epistemic values (reliability, transparency, hallucinations), therapeutic
relationships, and privacy, among others. They also noted the recurrent calls for ethical guidance and
human oversight in this area, and proposed shifting the focus of the ethical guidance discussion
towards establishing clear definitions of acceptable human oversight in various applications, taking
into account the diversity of settings, risks, and performance standards that are involved [56].
Kim et al. evaluated biases in responses from AI chatbots (ChatGPT-4 and Bard) and clinicians.
Their study showed that both AI chatbots and clinicians displayed biases based on patient
demographics, and that ChatGPT and Bard varied in treatment recommendations, sometimes
aligning with or diverging from clinician biases. Specific vignettes showed discrepancies in diagnoses
and treatments, indicating biases in both AI and human decision-making. They concluded that while
AI chatbots can assist in medical decision-making, their responses are not bias-free, necessitating
further research to prevent perpetuating health disparities [60].
3.1.4. Limitations
The integration of LLMs in clinical decision-making is not without its limitations. Hager et al.
evaluated the performance of several Meta Llama 2 LLM derivatives, both generalist (Llama 2
Chat, Open Assistant (OASST), and WizardLM) and medical-domain-aligned (Clinical Camel and
MEDITRON; more on these in subsection 3.4). Using a curated dataset of 2,400 patient cases,
they identified several key issues [6] (a schematic of how such benchmarks score models follows this list):
• Diagnostic accuracy: LLMs performed worse than clinicians in diagnosing diseases and
frequently failed to follow established diagnostic guidelines. This limitation is critical as accurate
diagnosis is the cornerstone of effective medical treatment and patient care.
• Interpretation of data: LLMs struggled with interpreting lab results, following instructions
accurately, and handling varying quantities and orders of information. This issue is particularly
concerning in clinical settings where precise and context-aware interpretation of data is
essential.
• Autonomous decision-making: The current capabilities of LLMs are insufficient for autonomous
clinical decision-making without extensive human supervision. This limitation suggests that
while LLMs can assist clinicians, they are not yet ready to replace human decision-making in
critical healthcare environments.
• Integration into clinical workflows: Improvements in fine-tuning and evaluation frameworks
are needed to better integrate LLMs into clinical workflows. This includes developing more
robust training data, improving model transparency, and ensuring that AI-generated
recommendations can be easily understood and validated by human clinicians.
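As a schematic of how such benchmarks score models, consider the sketch below; the two toy cases and the exact-substring matching rule are our own simplified placeholders, not Hager et al.'s dataset or adjudication protocol.

```python
# Toy diagnostic-accuracy harness in the spirit of benchmark studies such
# as Hager et al. [6]; cases and matching rule are simplified placeholders.
from typing import Callable

CASES = [
    {"vignette": "RLQ pain, fever, rebound tenderness", "gold": "appendicitis"},
    {"vignette": "Epigastric pain radiating to the back, elevated lipase",
     "gold": "pancreatitis"},
]

def diagnostic_accuracy(model: Callable[[str], str], cases: list[dict]) -> float:
    hits = 0
    for case in cases:
        prediction = model(f"Most likely diagnosis: {case['vignette']}")
        # Substring matching is a crude stand-in for expert adjudication.
        hits += case["gold"].lower() in prediction.lower()
    return hits / len(cases)
```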
Ando et al. compared the quality of ChatGPT responses to anesthesia-related medical questions
in English and Japanese. They found that English LLM responses were superior in quality to Japanese
responses when assessed by bilingual anesthesia experts [64]. Indeed, generative AI models do not
process or understand natural language the way humans do [65]. Tokenization, the process of
breaking raw text into smaller units (tokens) for processing by an LLM, is a major reason
behind some of the strange LLM outputs and limitations observed today, including weaker
performance in non-English languages. Unless there is a significant breakthrough in tokenization,
new model architectures appear to be the key to resolving this issue [66].
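This disparity is easy to observe directly. The short sketch below uses OpenAI's open-source tiktoken tokenizer (the cl100k_base encoding used by GPT-4-era models); the sentences are our own illustrative examples, and the Japanese one will typically consume several times more tokens per character than the English one.

```python
# Comparing token counts across languages with OpenAI's tiktoken library
# (pip install tiktoken); cl100k_base is the GPT-4-era encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The patient reports chest pain radiating to the left arm.",
    "Japanese": "患者は左腕に放散する胸痛を訴えています。",
}

for language, text in samples.items():
    tokens = enc.encode(text)
    # More tokens per character means less effective context per word and,
    # in practice, often weaker model performance in that language.
    print(f"{language}: {len(tokens)} tokens for {len(text)} characters")
```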
In summary, while generative AI and LLMs hold immense potential to transform healthcare
through enhanced clinical decision-making, improved diagnostic accuracy, and reduced physician
burnout, significant challenges remain. Concerns about bias, variability, and ethical implications
necessitate robust regulatory frameworks and comprehensive standards. Explainability and
performance of AI models must be enhanced through innovative methodologies, and the limitations
of current models need to be addressed through continuous improvements in training and
evaluation. The safe and effective integration of these advanced technologies in healthcare requires
ongoing research, collaboration among stakeholders, and stringent oversight to ensure they
contribute positively to patient care and clinical practice.
Prompt strategies can help guide AI models to produce more accurate and relevant responses, which
is vital for clinical decision-making [35].
A recent study by Patel et al. assessed how different prompt engineering techniques influence
GPT-3.5's ability to answer medical questions. The study compared direct prompts, Chain of Thought
(CoT), and modified CoT approaches using 1,000 questions generated by GPT-4 and 95 real USMLE
Step 1 questions. The analysis revealed no significant differences in accuracy among the prompt
types, with success rates of approximately 61.7% for direct prompts, 62.8% for CoT, and 57.4% for
modified CoT on USMLE questions. This finding suggested that while prompt engineering
techniques like CoT are designed to enhance reasoning, they do not significantly impact the model's
performance on medical calculations or clinical scenarios. Consequently, simpler prompting methods
can be as effective as more complex ones, potentially simplifying the integration of AI tools like
ChatGPT into medical education and clinical practice [68].
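For illustration, the three prompting styles compared in the study might look like the sketch below; the dose-calculation question and the exact wording are our own hypothetical examples, not the study's prompts.

```python
# Hypothetical examples of the three prompt styles compared by Patel et al. [68].
QUESTION = ("A 70 kg patient is prescribed a drug at 5 mg/kg/day in two "
            "divided doses. How many mg is each dose?")  # 70 * 5 / 2 = 175 mg

# 1. Direct prompt: ask for the answer outright.
direct = f"{QUESTION}\nAnswer:"

# 2. Chain of Thought (CoT): invite explicit step-by-step reasoning.
cot = f"{QUESTION}\nLet's think step by step before giving the final answer."

# 3. Modified CoT: impose a structured reasoning template.
modified_cot = (f"{QUESTION}\nFirst restate the givens, then show each "
                "calculation step, then state the final answer on its own line.")
```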
Ye demonstrated the effectiveness of RAG models in extracting and explaining complex medical information, such as cancer
progression diagnoses, highlighting the potential of user-tailored learning-to-rank methods to
support clinical practice and improve the reliability of AI-generated medical insights [72].
While RAG technology can reduce a model’s hallucinations by grounding the generated content
in retrieved, verified data, it is not a complete solution to the hallucination problem [73]: the
generative component can still synthesize incorrect or misleading content even when the retrieval
mechanism supplies accurate information. Therefore, while RAG enhances the reliability of LLMs,
continuous improvements in both retrieval methods and generative accuracy are necessary for these
tools to become fully reliable in clinical applications.
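One pragmatic mitigation is to verify RAG output after generation. The sketch below is a crude keyword-overlap heuristic of our own devising (real systems use entailment models or semantic-entropy estimates [28]); it flags weakly grounded answers for human review instead of displaying them.

```python
# Post-generation groundedness check for RAG output: a naive heuristic
# illustrating the idea that generated answers still need verification
# against the retrieved evidence before clinical use.

def grounded_fraction(answer: str, evidence: str) -> float:
    """Fraction of the answer's content words that appear in the evidence."""
    answer_terms = {w for w in answer.lower().split() if len(w) > 3}
    evidence_terms = set(evidence.lower().split())
    return len(answer_terms & evidence_terms) / len(answer_terms) if answer_terms else 1.0

def needs_human_review(answer: str, evidence: str, threshold: float = 0.6) -> bool:
    # Below the threshold, route the answer to a clinician rather than display it.
    return grounded_fraction(answer, evidence) < threshold
```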
In June 2024, Google researchers introduced the Personal Health Large Language Model (PH-LLM), a version of Gemini fine-tuned on case studies in sleep and fitness to demonstrate its capabilities [77]. The model performed comparably to human experts in
fitness and showed significant improvements in sleep insights after fine-tuning, achieving 79% on
sleep medicine exams and 88% on fitness exams, surpassing human expert average scores. This
underscores the potential of integrating wearable device data into AI models for personalized health
recommendations and monitoring, although further development and evaluation are necessary for
application in safety-critical personal health domains.
Around the same time in June 2024, the Personal Health Insights Agent (PHIA) was introduced,
also by Google researchers [78]. PHIA is an agent system leveraging LLM capabilities for analyzing
and interpreting wearable health data to generate personalized health insights. Utilizing iterative
reasoning, code generation, and web search, PHIA addresses personal health queries effectively. This
model was evaluated using two benchmark datasets comprising over 4,000 health insights questions
for objective and open-ended evaluation. Human and expert evaluations demonstrated PHIA’s
accuracy in addressing factual and open-ended health queries, outperforming standard LLM
baselines. It accurately addressed over 84% of factual numerical questions and more than 83% of
crowdsourced open-ended questions, providing personalized health insights that can serve as an aid
to individuals in interpreting their wearable data and potentially improving their health behaviors.
The results highlight the potential for LLM agents to advance behavioral health, making personalized
wellness regimens more accessible.
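PHIA's internals are described only at a high level in [78], but agents of this kind typically interleave LLM reasoning with tool calls (code execution, web search) until an answer is produced. The sketch below shows that general reason-act pattern; the tool protocol, prompt format, and stopping rule are hypothetical.

```python
# Schematic reason-act agent loop in the spirit of LLM agents such as
# PHIA [78]; the tool protocol and stopping rule here are hypothetical.
from typing import Callable

def agent_loop(query: str,
               llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    scratchpad = f"Question: {query}\n"
    for _ in range(max_steps):
        step = llm(scratchpad +
                   "Respond with 'tool_name: input' to act, or 'FINAL: answer'.")
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        name, _, tool_input = step.partition(":")
        tool = tools.get(name.strip(), lambda _: "unknown tool")
        # Append the observation so the next reasoning step can use it.
        scratchpad += f"{step}\nObservation: {tool(tool_input.strip())}\n"
    return "Unable to answer within the step budget."

# Example wiring (stubs): tools = {"search": web_search, "python": run_code}
```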
In a randomized study assessing the differential diagnosis (DDx) accuracy for 302 NEJM (New
England Journal of Medicine) Clinicopathological Conference (CPC) series case reports, a specialized
LLM (Google Med-PaLM 2) was compared with 20 physicians (US board-certified internists with a
median of 9 years of experience). Each case report was assessed by two clinicians randomized to one
of two assistive arms: search engines and standard medical resources, or LLM assistance in addition
to these tools. Clinicians in both groups also provided a baseline, unassisted DDx prior to using their
assigned tools. The study revealed that Med-PaLM 2 performed better on its own than unassisted
clinicians. Additionally, when comparing the two groups of assisted clinicians, those supported by
Med-PaLM 2 achieved a higher DDx quality score, highlighting the growing potential of LLMs in
supporting complex medical decision-making tasks [79].
4. Discussion
The development of medicine-specific Large Language Models (LLMs) such as Med-PaLM,
MEDITRON-70B, Med-Gemini, PH-LLM, and PHIA showcases the rapid advancements and
potential of these models to enhance healthcare delivery and personalized health monitoring. These
models demonstrate significant improvements in medical reasoning, diagnostic accuracy, and
practical applications, offering new possibilities for clinical decision support, patient communication,
and educational tools.
The integration of Retrieval-Augmented Generation (RAG) and prompt engineering in medical
applications, as demonstrated by tools like Aeyeconsult [69] and the custom dermatology AI chatbot
built using NVIDIA’s ‘Chat with RTX’ [70], further enhances the capabilities of AI models by
combining generative AI with domain-specific knowledge bases. These advancements improve
diagnostic accuracy, educational utility, and user experience by providing precise, contextually
relevant answers and interactive learning tools.
However, despite these promising developments, significant challenges remain. The risks
associated with bias, hallucinations, and ethical concerns necessitate ongoing research and the
establishment of robust safety protocols to ensure the responsible use of AI in medicine. Additionally,
the regulatory landscape must evolve to keep pace with these advancements, ensuring that AI tools
meet the high standards required for clinical use while safeguarding patient safety and privacy.
To sum up, while generative AI and LLMs hold immense potential to transform healthcare, their
successful integration into clinical practice and personal health management will require continuous
collaboration among stakeholders, including regulatory bodies, patients/patient advocates,
healthcare professionals, industry representatives, academia, government partners, and
international organizations (e.g., IMDRF, the International Medical Device Regulators Forum, and
ICH, the International Council for Harmonization of Technical Requirements for Pharmaceuticals for
Human Use) [80]. Success will also be dependent on rigorous testing and a commitment to addressing
the ethical and regulatory challenges that lie ahead. The future of AI in medicine is bright, but will
require careful stewardship to realize its full potential in improving patient care and healthcare
outcomes.
It should be noted that, while this paper, as an incremental update to our 2023 review on the
same subject [11], attempted to cover as much topic breadth (a bird's-eye view) as possible, there are
additional innovative applications of generative AI and LLM technology in healthcare that could not
be discussed without making this paper unduly long. A few notable applications deserve a brief
mention before concluding this review. They include
Huma’s cloud generative AI builder, which automates or semi-automates the generation and coding
of new healthcare applications from text inputs, demonstrating the potential of AI to streamline app
development processes [81]. Another application worth noting is the integration of Brain-Computer
Interface (BCI) technology with ChatGPT by Synchron, which offers a glimpse into the future of
assistive technologies, where generative AI could help users with neurological impairments
communicate and control devices more effectively [82–84]. Finally, generative AI is also being used
in the development of health digital twins [85], as well as in drug discovery [86] and drug
repurposing [87].
5. Conclusion
The applications of generative AI in medicine and healthcare have come a long way since the
initial public launch in November 2022 of OpenAI’s ChatGPT, a general-purpose LLM. ChatGPT and
other general-purpose LLMs have improved greatly and rapidly since then, and continue to do so,
with OpenAI reportedly working on a new approach and LLM (codenamed ‘Strawberry’ and ‘Orion’,
respectively [88]) intended to perform more complex reasoning and context-sensitive
problem-solving rather than mere pattern recognition and word prediction. Furthermore, the medical and
healthcare arena witnessed the release of specific-purpose and narrow-focus (medically-trained)
LLMs (such as Google Med-PaLM), and the adoption of promising methods such as retrieval-
augmented generation and more robust applications involving clusters of multiple specialized LLMs
working together (as seen in Hippocratic AI [89]), all of which are intended to improve the efficacy
and reliability of generative AI applications in medicine.
As we move beyond the initial hype surrounding generative AI in medicine and healthcare (the
‘peak of inflated expectations’), we realize there is still a long way to go to properly regulate these
applications and better mitigate or overcome their current limitations in order to make them ready
for safe mainstream clinical use and ultimately reach their ‘plateau of productivity’.
To supplement this article, given the fast pace of developments in this area, we have made
publicly available a Web page at [90] featuring a regularly updated archive of Web pointers to
handpicked news, posts and articles about generative AI in medicine and healthcare.
Authors’ Contributions: M.N.K.B. conceived the manuscript idea, set its scope and direction, conducted its core
literature review, wrote and edited its draft and final versions, and invited P.Z. and J.S. to contribute as co-
authors. P.Z. and J.S. contributed equally to additional parts of the literature review and to the writing of the
initial draft versions of the paper. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is not applicable
to this article.
References
1. Gartner Research. Hype Cycle for Generative AI, 2024. 31 July 2024. Available online:
https://www.gartner.com/en/documents/5636791 (accessed on 30 August 2024).
2. Holohan, M. A boy saw 17 doctors over 3 years for chronic pain. ChatGPT found the diagnosis. Today, 11 September 2023. Available online:
https://www.today.com/health/mom-chatgpt-diagnosis-pain-rcna101843 (accessed on 30 August 2024).
3. Google Cloud. Sharing Google Med-PaLM 2: Medical Large Language Model. Available online:
https://cloud.google.com/blog/topics/healthcare-life-sciences/sharing-google-med-palm-2-medical-large-
language-model (accessed on 30 August 2024).
4. Gottlieb, S.; Benezra, S. Op-ed: How well can AI chatbots mimic doctors in a treatment setting? CNBC,
Published: 18 July 2024. Available online: https://www.cnbc.com/2024/07/18/op-ed-how-well-can-ai-
chatbots-mimic-doctors.html (accessed on 30 August 2024).
5. Kim, W. No, you cannot gauge large language models (LLMs) "for their medical proficiency" using
multiple-choice questions alone. LinkedIn Commentary, 2024. Available online:
https://www.linkedin.com/posts/woojinkim_genai-chatgpt-gpt4-activity-7225200801898487809-QRxW
(accessed on 30 August 2024).
6. Hager, P.; Jungmann, F.; Holland, R.; Bhagat, K.; Hubrecht, I.; Knauer, M.; Vielhauer, J.; Makowski, M.;
Braren, R.; Kaissis, G.; Rueckert, D. Evaluation and mitigation of the limitations of large language models
in clinical decision-making. Nat. Med. 2024. https://doi.org/10.1038/s41591-024-03097-1.
7. Alba, D.; Swetlitz, I. Google Taps AI to Revamp Costly Health-Care Push Marred by Flops. BNN
Bloomberg, 2024. Available online:
https://www.bnnbloomberg.ca/business/technology/2024/07/30/google-taps-ai-to-revamp-costly-health-
care-push-marred-by-flops/ (accessed on 30 August 2024).
8. Aljamaan, F.; Temsah, M.H.; Altamimi, I.; Al-Eyadhy, A.; Jamal, A.; Alhasan, K.; Mesallam, T.A.; Farahat,
M.; Malki, K.H. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development
and Usability Study. JMIR Med. Inform. 2024, 12, e54345. https://doi.org/10.2196/54345.
9. Dudley-Nicholson, J. Doctors Call for AI Rules to Prevent Medical Mistakes. The Standard (Australia), 17
July 2024. Available online: https://www.standard.net.au/story/8698797/doctors-call-for-ai-rules-to-
prevent-medical-mistakes/ (accessed on 30 August 2024).
10. Meskó, B.; Topol, E.J. The imperative for regulatory oversight of large language models (or generative AI)
in healthcare. NPJ Digit. Med. 2023, 6, 120. https://doi.org/10.1038/s41746-023-00873-0.
11. Zhang, P.; Kamel Boulos, M.N. Generative AI in medicine and healthcare: promises, opportunities and
challenges. Future Internet 2023, 15(9), 286. https://doi.org/10.3390/fi15090286.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I.
Attention is All You Need. In Advances in Neural Information Processing Systems; Curran Associates, Inc.:
2017; pp. 5998-6008.
13. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by
Generative Pre-Training. 2018. Available online: https://cdn.openai.com/research-covers/language-
unsupervised/language_understanding_paper.pdf
14. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised
Multitask Learners. 2019. Available online: https://cdn.openai.com/better-language-
models/language_models_are_unsupervised_multitask_learners.pdf
15. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.;
Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. 2020. Available online:
https://arxiv.org/abs/2005.14165
16. OpenAI. GPT-4 Technical Report. 2023. Available online: https://cdn.openai.com/papers/gpt-4.pdf
17. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava,
P.; Bhosale, S.; Bikel, D. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023,
arXiv:2307.09288.
18. Hello GPT-4o. Available online: https://openai.com/index/hello-gpt-4o (accessed on 30 August 2024).
19. Topsakal, O.; Akinci, T.C. Creating large language model applications utilizing langchain: A primer on
developing LLM apps fast. In International Conference on Applied Engineering and Natural Sciences; 2023;
Volume 1, Issue 1, pp. 1050–1056.
20. Meet Claude [Internet]. Available online: https://www.anthropic.com/claude (accessed on 30 August 2024).
21. Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; Casas, D.D.; Bressand, F.; Lengyel, G.;
Lample, G.; Saulnier, L.; Lavaud, L.R. Mistral 7B. arXiv 2023, arXiv:2310.06825.
22. Google AI Test Kitchen. Image FX. Available online: https://aitestkitchen.withgoogle.com/tools/image-fx (accessed on 30 August 2024).
23. DALL-E 3. Available online: https://openai.com/index/dall-e-3/ (accessed on 30 August 2024).
24. Yang, Z.; Li, L.; Lin, K.; Wang, J.; Lin, C.C.; Liu, Z.; Wang, L. The dawn of LMMs: Preliminary explorations
with GPT-4V(ision). arXiv 2023, arXiv:2309.17421.
25. Meskó, B. Prompt engineering as an important emerging skill for medical professionals: tutorial. J. Med.
Internet Res. 2023, 25, e50638.
26. Lewis, P.; Perez, E.; Kiela, D.; Cho, K.; Stenetorp, P. Retrieval-Augmented Generation for Knowledge-
Intensive NLP Tasks. 2020. Available online: https://arxiv.org/abs/2005.11401
27. Shuster, K.; Poff, S.; Chen, M.; Kiela, D.; Weston, J. Retrieval augmentation reduces hallucination in
conversation. arXiv 2021, arXiv:2104.07567.
28. Farquhar, S.; Kossen, J.; Kuhn, L.; Gal, Y. Detecting hallucinations in large language models using semantic
entropy. Nature 2024, 630, 625–630. https://doi.org/10.1038/s41586-024-07421-0.
29. Zewe, A. Method prevents an AI model from being overconfident about wrong answers. MIT News,
Massachusetts Institute of Technology, 2024. Available online: https://news.mit.edu/2024/thermometer-
prevents-ai-model-overconfidence-about-wrong-answers-0731 (accessed on 30 August 2024).
30. Williamson, S.M.; Prybutok, V. The Era of Artificial Intelligence Deception: Unraveling the Complexities
of False Realities and Emerging Threats of Misinformation. Information 2024, 15(6), 299.
31. Burford, K.G.; Itzkowitz, N.G.; Ortega, A.G.; Teitler, J.O.; Rundle, A.G. Use of Generative AI to Identify
Helmet Status Among Patients With Micromobility-Related Injuries From Unstructured Clinical Notes.
JAMA Netw Open 2024 Aug 1;7(8):e2425981. doi: 10.1001/jamanetworkopen.2024.25981.
32. Rumale Vishwanath, P.; Tiwari, S.; Naik, T.G.; Gupta, S.; Thai, D.N.; Zhao, W.; Kwon, S.; Ardulov, V.;
Tarabishy, K.; McCallum, A.; Salloum, W. Faithfulness Hallucination Detection in Healthcare AI. In
Proceedings of KDD-AIDSH 2024, August 26, 2024, Barcelona, Spain. Available online:
https://openreview.net/pdf?id=6eMIzKFOpJ
33. McDonald, K. RACGP issues guidance on AI scribes in practice. Pulse IT, 2024, August 20. Available online:
https://www.pulseit.news/australian-digital-health/racgp-issues-guidance-on-ai-scribes-in-
practice/?goal=0_b39f06f53f-9a4da8fc00-413088949 (accessed on 30 August 2024).
34. Australian Health Practitioner Regulation Agency (Ahpra) and National Boards. Meeting your professional
obligations when using Artificial Intelligence in healthcare. Available online:
https://www.ahpra.gov.au/Resources/Artificial-Intelligence-in-healthcare.aspx (accessed on 30 August
2024).
35. Goodman, K.E.; Paul, H.Y.; Morgan, D.J. AI-Generated Clinical Summaries Require More Than Accuracy.
JAMA 2024, February 27.
36. Bharel, M.; Auerbach, J.; Nguyen, V.; DeSalvo, K.B. Transforming Public Health Practice With Generative
Artificial Intelligence: Article examines how generative artificial intelligence could be used to transform
public health practice in the US. Health Aff. 2024, 43(6), 776–782.
37. UK MHRA. Software and Artificial Intelligence (AI) as a Medical Device. Guidance, Updated 13 June 2024.
Available online: https://www.gov.uk/government/publications/software-and-artificial-intelligence-ai-as-
a-medical-device/software-and-artificial-intelligence-ai-as-a-medical-device (accessed on 30 August 2024).
38. Matias, Y.; Gupta, A. MedLM: generative AI fine-tuned for the healthcare industry. Google Cloud Blog,
2023, December 13. Available online: https://cloud.google.com/blog/topics/healthcare-life-
sciences/introducing-medlm-for-the-healthcare-industry (accessed on 30 August 2024).
39. Harvey, H.; Pogose, M. How to get ChatGPT regulatory approved as a medical device. Hardian Health,
2024. Available online: https://www.hardianhealth.com/insights/how-to-get-regulatory-approval-for-
medical-large-language-models (accessed on 30 August 2024).
40. Blumenthal, D.; Patel, B. The Regulation of Clinical Artificial Intelligence. NEJM AI 2024, 1(8), AIpc2400545.
https://doi.org/10.1056/AIpc2400545.
41. Howell, M.D.; Corrado, G.S.; DeSalvo, K.B. Three Epochs of Artificial Intelligence in Health Care. JAMA
2024, 331(3), 242–244. https://doi.org/10.1001/jama.2023.25057.
42. US FDA. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. Available online:
https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-
learning-aiml-enabled-medical-devices (accessed on 30 August 2024).
43. Amazon Web Services (AWS). Generative AI for Healthcare (White Paper). 2024 June. Available online:
https://pages.awscloud.com/rs/112-TZM-766/images/AWS-GenAI-for-HCLS-Whitepaper_062024.pdf
(accessed on 30 August 2024).
44. Derraz, B.; Breda, G.; Kaempf, C.; Baenke, F.; Cotte, F.; Reiche, K.; Köhl, U.; Kather, J.N.; Eskenazy, D.;
Gilbert, S. New regulatory thinking is needed for AI-based personalised drug and cell therapies in precision
oncology. NPJ Precis Oncol 2024 Jan 30;8(1):23. doi: 10.1038/s41698-024-00517-w.
45. Freyer, O.; Wiest, I.C.; Kather, J.N.; Gilbert, S. A future role for health applications of large language models
depends on regulators enforcing safety standards. Lancet Digit Health 2024 Sep;6(9):e662-e672. doi:
10.1016/S2589-7500(24)00124-9.
46. Gutiérrez, J.D. Consultation paper on AI regulation: emerging approaches across the world. UNESCO:
Paris, France, 2024 August. Available online: https://unesdoc.unesco.org/ark:/48223/pf0000390979
(accessed on 30 August 2024).
47. Schmidt, J.; Schutte, N.M.; Buttigieg, S.; Novillo-Ortiz, D.; Sutherland, E.; Anderson, M.; de Witte, B.;
Peolsson, M.; Unim, B.; Pavlova, M.; Stern, A.D.; Mossialos, E.; van Kessel, R. Mapping the regulatory
landscape for artificial intelligence in health within the European Union. NPJ Digit Med 2024 Aug
27;7(1):229. doi: 10.1038/s41746-024-01221-6.
48. European Commission. European Artificial Intelligence Act comes into force. Press Release, 1 August 2024.
Available online: https://ec.europa.eu/commission/presscorner/detail/en/ip_24_4123 (accessed on 12
August 2024).
49. van Rooijen SB. The EU AI Act's Impact on Medical Devices and MDR Certification. LinkedIn post, July
2024. https://www.linkedin.com/posts/sigridberge_eu-ai-act-impact-on-medical-devices-activity-
7222488338342006784-T45R/ (accessed on 30 August 2024).
50. van Rooijen SB. EU AI Act - Healthcare. LinkedIn post, August 2024.
https://www.linkedin.com/posts/sigridberge_eu-ai-act-has-come-into-effect-what-does-activity-
7224662682841325570-6Zx3/ (accessed on 30 August 2024).
51. Wachter, S. Limitations and Loopholes in the EU AI Act and AI Liability Directives: What This Means for
the European Union, the United States, and Beyond. Yale Journal of Law & Technology 2024;26(3):671-718.
https://yjolt.org/limitations-and-loopholes-eu-ai-act-and-ai-liability-directives-what-means-european-
union-united.
52. Lenharo, M. The testing of AI in medicine is a mess. Here's how it should be done. Nature 2024
Aug;632(8026):722-724. doi: 10.1038/d41586-024-02675-0.
53. Coiera, E.; Fraile-Navarro, D. AI as an Ecosystem — Ensuring Generative AI Is Safe and Effective. NEJM
AI 2024;1(9). doi: 10.1056/AIp2400611.
54. Ong, J.C.; Chang, S.Y.; William, W.; Butte, A.J.; Shah, N.H.; Chew, L.S.; Liu, N.; Doshi-Velez, F.; Lu, W.;
Savulescu, J.; Ting, D.S. Medical Ethics of Large Language Models in Medicine. NEJM AI 2024, AIra2400038.
https://doi.org/10.1056/AIra2400038.
55. Ong, J.C.; Chang, S.Y.; William, W.; Butte, A.J.; Shah, N.H.; Chew, L.S.; Liu, N.; Doshi-Velez, F.; Lu, W.;
Savulescu, J.; Ting, D.S. Ethical and Regulatory Challenges of Large Language Models in Medicine. Lancet
Digit. Health 2024, 6(6), e428–e432. https://doi.org/10.1016/S2589-7500(24)00061-X.
56. Haltaufderheide, J.; Ranisch, R. The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review
on Large Language Models (LLMs). NPJ Digit. Med. 2024, 7(1), 183. https://doi.org/10.1038/s41746-024-
01157-x.
57. Levkovich, I.; Elyoseph, Z. Identifying depression and its determinants upon initiating treatment: ChatGPT
versus primary care physicians. Fam. Med. Community Health 2023, 11(4), e002391.
https://doi.org/10.1136/fmch-2023-002391.
58. Omiye, J.A.; Lester, J.C.; Spichak, S.; et al. Large language models propagate race-based medicine. NPJ
Digit. Med. 2023, 6, 195. https://doi.org/10.1038/s41746-023-00939-z.
59. Singhal, K.; Tu, T.; Gottweis, J.; Sayres, R.; Wulczyn, E.; Hou, L.; Clark, K.; Pfohl, S.; Cole-Lewis, H.; Neal,
D.; Schaekermann, M. Towards expert-level medical question answering with large language models. arXiv
2023, arXiv:2305.09617.
60. Kim, J.; Cai, Z.R.; Chen, M.L.; Simard, J.F.; Linos, E. Assessing Biases in Medical Decisions via Clinician
and AI Chatbot Responses to Patient Vignettes. JAMA Netw. Open 2023, 6(10), e2338050.
https://doi.org/10.1001/jamanetworkopen.2023.38050.
61. Lang, O.; Yaya-Stupp, D.; Traynis, I.; Cole-Lewis, H.; Bennett, C.R.; Lyles, C.R.; Lau, C.; Irani, M.; Semturs,
C.; Webster, D.R.; Corrado, G.S. Using generative AI to investigate medical imagery models and datasets.
EBioMedicine 2024, 102. https://doi.org/10.1016/j.ebiom.2024.105075.
62. Painter A, et al. Explaining Explainable AI (for healthcare). YouTube, 2024, August 15. Available online:
https://www.youtube.com/watch?v=d5ZMVIgQ0jM (accessed on 30 August 2024).
63. Masanneck, L.; Schmidt, L.; Seifert, A.; Kölsche, T.; Huntemann, N.; Jansen, R.; Mehsin, M.; Bernhard, M.;
Meuth, S.; Böhm, L.; Pawlitzki, M. Triage Performance Across Large Language Models, ChatGPT, and
Untrained Doctors in Emergency Medicine: Comparative Study. J. Med. Internet Res. 2024, 26, e53297.
https://doi.org/10.2196/53297.
64. Ando, K.; Sato, M.; Wakatsuki, S.; Nagai, R.; Chino, K.; Kai, H.; Sasaki, T.; Kato, R.; Nguyen, T.P.; Guo, N.;
Sultan, P. A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical
questions. BJA Open 2024 Jun 14;10:100296. doi: 10.1016/j.bjao.2024.100296.
65. Greefhorst, A. The ‘Artificial Stubbornness’ of ChatGPT when Solving a Simple Puzzle: The farmer with
his wolf, goat, and cabbage. International Policy Digest, 2024, May 28. Available online:
https://intpolicydigest.org/the-artificial-stubbornness-of-chatgpt-when-solving-a-simple-puzzle/
(accessed on 12 August 2024).
66. Wiggers, K. Tokens are a big reason today’s generative AI falls short. TechCrunch, 2024, July 6. Available
online: https://techcrunch.com/2024/07/06/tokens-are-a-big-reason-todays-generative-ai-falls-short/
(accessed on 9 July 2024).
67. Eliot, L. Doctors relying on generative AI to summarize medical notes might unknowingly be taking big
risks. Forbes, 2024, February 5. Available online:
https://www.forbes.com/sites/lanceeliot/2024/02/05/doctors-relying-on-generative-ai-to-summarize-
medical-notes-might-unknowingly-be-taking-big-risks/ (accessed on 30 August 2024).
68. Patel, D.; Raut, G.; Zimlichman, E.; et al. Evaluating prompt engineering on GPT-3.5’s performance in
USMLE-style medical calculations and clinical scenarios generated by GPT-4. Sci. Rep. 2024, 14, 17341.
https://doi.org/10.1038/s41598-024-66933-x.
69. Singer, M.B.; Fu, J.J.; Chow, J.; Teng, C.C. Development and evaluation of Aeyeconsult: a novel
ophthalmology chatbot leveraging verified textbook knowledge and GPT-4. J. Surg. Educ. 2024, 81(3), 438–
443. https://doi.org/10.1016/j.jsurg.2023.11.019.
70. Kamel Boulos, M.N.; Dellavalle, R. NVIDIA’s ‘Chat with RTX’ custom Large Language Model and
Personalized AI Chatbot Augments the Value of Electronic Dermatology Reference Material. JMIR
Dermatol. 2024, 7, e58396. https://doi.org/10.2196/58396.
71. Ge, J.; Sun, S.; Owens, J.; Galvez, V.; Gologorskaya, O.; Lai, J.; Pletcher, M.; Lai, K. Development of a liver
disease-specific large language model chat interface using retrieval-augmented generation. Hepatology.
2024 Mar 07. doi: 10.1097/HEP.0000000000000834.
72. Ye, C. Exploring a learning-to-rank approach to enhance the Retrieval Augmented Generation (RAG)-
based electronic medical records search engines. Informatics and Health 2024, 1(2), 93–99.
https://doi.org/10.1016/j.infoh.2024.07.001.
73. Wiggers, K. Why RAG won’t solve Generative AI’s hallucination problem [Internet]. TechCrunch, 2024,
May 4. Available online: https://techcrunch.com/2024/05/04/why-rag-wont-solve-generative-ais-
hallucination-problem/ (accessed on 30 August 2024).
74. Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.;
Pfohl, S.; Payne, P. Large language models encode clinical knowledge. Nature 2023, 620(7972), 172–180.
75. Chen, Z.; Cano, A.H.; Romanou, A.; Bonnet, A.; Matoba, K.; Salvi, F.; Pagliardini, M.; Fan, S.; Köpf, A.;
Mohtashami, A.; Sallinen, A. Meditron-70b: Scaling medical pretraining for large language models. arXiv
2023, arXiv:2311.16079.
76. Saab, K.; Tu, T.; Weng, W.H.; Tanno, R.; Stutz, D.; Wulczyn, E.; Zhang, F.; Strother, T.; Park, C.; Vedadi, E.;
Chaves, J.Z. Capabilities of Gemini models in medicine. arXiv 2024, arXiv:2404.18416.
77. Cosentino, J.; Belyaeva, A.; Liu, X.; Furlotte, N.A.; Yang, Z.; Lee, C.; Schenck, E.; Patel, Y.; Cui, J.; Schneider,
L.D.; Bryant, R. Towards a Personal Health Large Language Model. arXiv 2024, arXiv:2406.06474.
78. Merrill, M.A.; Paruchuri, A.; Rezaei, N.; Kovacs, G.; Perez, J.; Liu, Y.; Schenck, E.; Hammerquist, N.;
Sunshine, J.; Tailor, S.; Ayush, K. Transforming wearable data into health insights using large language
model agents. arXiv 2024, arXiv:2406.06464.
79. McDuff, D.; Schaekermann, M.; Tu, T.; Palepu, A.; Wang, A.; Garrison, J.; Singhal, K.; Sharma, Y.; Azizi, S.;
Kulkarni, K.; Hou, L. Towards accurate differential diagnosis with large language models. arXiv 2023,
arXiv:2312.00164.
80. World Health Organization. Regulatory considerations on artificial intelligence for health. World Health
Organization, 2023 October. ISBN 978-92-4-007887-1 (electronic version).
https://iris.who.int/handle/10665/373421.
81. Sharma, S. Huma raises $80M to turn text into healthcare apps with gen AI. 2024. Available online:
https://venturebeat.com/ai/huma-raises-80m-to-turn-text-into-healthcare-apps-with-gen-ai/ (accessed on
30 August 2024).
82. Synchron announces brain computer interface chat feature powered by OpenAI. 2024. Available online:
https://www.businesswire.com/news/home/20240711493318/en/Synchron-Announces-Brain-Computer-
Interface-Chat-Feature-Powered-by-OpenAI (accessed on 31 August 2024).
83. Orrall, J. How this brain implant is using ChatGPT. 2024. Available online:
https://www.cnet.com/tech/computing/how-this-brain-implant-is-using-chatgpt/ (accessed on 31 August
2024).
84. What it’s like using a brain implant with ChatGPT - Video. CNET, 2024. Available online:
https://www.cnet.com/videos/what-its-like-using-a-brain-implant-with-chatgpt/ (accessed on 31 August
2024).
85. Makarov, N.; Bordukova, M.; Rodriguez-Esteban, R.; Schmich, F.; Menden, M.P. Large Language Models
forecast Patient Health Trajectories enabling Digital Twins. medRxiv 2024.07.05.24309957; doi:
10.1101/2024.07.05.24309957.
86. Gangwal, A., Lavecchia, A. Unleashing the power of generative AI in drug discovery. Drug Discov Today
2024 Jun;29(6):103992. doi: 10.1016/j.drudis.2024.103992.
87. Ghandikota, S.K.; Jegga, A.G. Application of artificial intelligence and machine learning in drug
repurposing. Prog Mol Biol Transl Sci 2024;205:171-211. doi: 10.1016/bs.pmbts.2024.03.030.
88. Caswell, A. OpenAI to launch new advanced "Strawberry" AI product this fall — what we know so far.
Tom's Guide, 2024, August 28. Available online: https://www.tomsguide.com/ai/openai-to-launch-new-
advanced-strawberry-ai-product-this-fall-what-we-know-so-far (accessed on 30 August 2024).
89. Hippocratic AI. Foundation Model. Available online: https://www.hippocraticai.com/foundationmodel
(accessed on 30 August 2024).
90. Kamel Boulos, M.N. Generative AI in medicine and health/care: handpicked news, posts and articles from
around the Web. Available online: https://healthcybermap.org/HDTs/genai-med.html (accessed on 30
August 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those
of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s)
disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or
products referred to in the content.