Digital Discovery: Perspective
This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence.
Cite this: Digital Discovery, 2024, 3, 23
Seoin Back,*a Alán Aspuru-Guzik,†bc Michele Ceriotti,d Ganna Gryn'ova,ef Bartosz Grzybowski,ghi Geun Ho Gu,j Jason Hein,k Kedar Hippalgaonkar,lm Rodrigo Hormázabal,n Yousung Jung,†op Seonah Kim,q Woo Youn Kim,r Seyed Mohamad Moosavi,s Juhwan Noh,t Changyoung Park,n Joshua Schrier,u Philippe Schwaller,v Koji Tsuda,wxy Tejs Vegge,z O. Anatole von Lilienfeld †c,aa,ab and Aron Walsh ac,ad
In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of ‘Accelerated Chemical Science with AI’ at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: ‘Data’, ‘New applications’, ‘Machine learning algorithms’, and ‘Education’. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.
Received 25th October 2023
Accepted 6th December 2023
DOI: 10.1039/d3dd00213f
rsc.li/digitaldiscovery
a Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University, Seoul, Republic of Korea. E-mail: [email protected]
b Departments of Chemistry, Computer Science, University of Toronto, St. George Campus, Toronto, ON, Canada
c Acceleration Consortium and Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
d Laboratory of Computational Science and Modeling (COSMO), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
e Heidelberg Institute for Theoretical Studies (HITS gGmbH), 69118, Heidelberg, Germany
f Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120, Heidelberg, Germany
g Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, Republic of Korea
h Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
i Department of Chemistry, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
j Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH), Naju, 58330, Republic of Korea
k Department of Chemistry, University of British Columbia, Vancouver, BC, V6T 1Z1, Canada
l School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
m Institute of Materials Research and Engineering, Agency for Science Technology and Research, 2 Fusionopolis Way, 08-03, Singapore 138634, Singapore
n LG AI Research, Seoul, Republic of Korea
o Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, Republic of Korea
p School of Chemical and Biological Engineering, Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
q Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO 80523, USA
r Department of Chemistry, KAIST, Daejeon, Republic of Korea
s Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
t Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology, Daejeon, 34114, Republic of Korea
u Department of Chemistry, Fordham University, The Bronx, NY 10458, USA
v Laboratory of Artificial Chemical Intelligence (LIAC) & National Centre of Competence in Research (NCCR) Catalysis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
w Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
x Center for Basic Research on Materials, National Institute for Materials Science, Tsukuba, Ibaraki 305-0044, Japan
y RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
z Department of Energy Conversion and Storage, Technical University of Denmark, 301 Anker Engelunds vej, Kongens Lyngby, Copenhagen, 2800, Denmark
aa Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St George Campus, Toronto, ON, Canada
ab Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data, 10587, Berlin, Germany
ac Department of Materials, Imperial College London, London SW7 2AZ, UK
ad Department of Physics, Ewha Women's University, Seoul, Republic of Korea
† The symposium was organized by Yousung Jung, Alán Aspuru-Guzik, and O. Anatole von Lilienfeld. The authors are listed in alphabetical order, except for the first author who took charge of organizing the initial draft written by all co-authors who contributed to different sections.
© 2024 The Author(s). Published by the Royal Society of Chemistry Digital Discovery, 2024, 3, 23–33 | 23
Open Access Article. Published on 06 December 2023. Downloaded on 5/31/2024 9:44:07 AM.
development of easy-to-use, web browser-based interfaces for predictive models is of great importance.42 At the same time, the systematic management of meta-information remains important to ensure the reliability of the constructed database. For example, tools such as AiiDA43 and NoMaD44,45 record comprehensive data provenance for ‘static’ materials simulations.
Finally, it is important to distinguish between multiple dataset categories: smaller, more accurate, and computationally challenging ones that serve specific practical purposes, and datasets specifically designed for benchmarking ML models. This differentiation helps avoid situations where research solely focuses on improving model performance to surpass benchmarks without effectively translating those advancements into practical applications (overfitting). In this context, dynamic management of databases within the relevant research community proves to be fruitful, as discussed below.
Dynamic community database
For ML algorithms to effectively capture the true complexity of the chemical and materials compound space, it is crucial to overcome biases present in existing databases. This requires a collaborative effort within the community to enable true discovery. To facilitate this goal, the successful implementation of the Common Task Framework (CTF) in the protein community, in conjunction with the Protein Data Bank, has served as a model. The following list outlines key components in datasets that could help to facilitate and foster collaborations between non-experts and experts in solving such problems:
(1) Tasks: clearly defined tasks with precise mathematical interpretation, physical meaning, and chemical purpose.
(2) Accessibility: availability of easily accessible gold-standard datasets in a standardized format, publicly accessible and ready for use.
(3) Metrics: specification of one or more proposed quantitative metrics for each task to measure success.
(4) Evaluation: continuously updated leaderboards that rank state-of-the-art methods and/or data-splits that allow us to better track the model improvements and generalization to out-of-domain (OOD) data.
(5) Discovery: ability to generate new data as needed, by “Augmenting with chemical knowledge.”
Discussions specific to organic reactions databases
While significant progress has been made in the past decade with the emergence of deep learning, the effectiveness of purely data-driven approaches in organic synthesis planning remains to be determined.46–51
Large databases of reactions, such as USPTO,52 Pistachio,53 Reaxys,54 and SciFinder,55 do exist. However, the knowledge contained within these databases falls short regarding quality, diversity, and accessibility. For instance, while USPTO offers open access, its quality may be lower compared to the limited, paid access but higher-quality Reaxys. Reproducibility has also become a point of concern. Additionally, despite the vast number of experimental data available in these reaction databases, only a limited number of reaction types have sufficiently large numbers of examples, typically a few hundred or more, which hinders the development of practical/useful/general AI models.56 Efforts such as the Open Reaction Database are notable for trying to address these limitations,57 but remain populated with data from USPTO, with only a few hundred brand-new entries; this raises the question of how best to incentivize synthetic chemists to deposit their results (both positive and negative) into such databases.
Correspondingly, purely data-driven approaches in organic synthesis planning would greatly benefit from maximal training data efficiency when learning. Potential solutions to enhance efficiency include Delta-learning and transfer learning,58 multi-level learning,36,37 and few-shot learning techniques.59 However, the challenge of sparse data becomes particularly pronounced when attempting to identify the scope of “impossible” reactions. If a certain reaction is not listed in a database, one often assumes it cannot happen. But this assumption is mostly true for the types of reactions that happen often. As mentioned earlier, such classes are relatively limited in number and occurrence.60
When high-quality datasets are lacking, an alternative, albeit more labor-intensive, approach is expert coding within programs like Chematica or AllChemy. These programs can perform advanced-level synthesis planning, even for complex natural products.61
One conclusion reached with broad consensus is the ever-increasing need for improved quality and open databases in all AI-related efforts, not only for reaction data but also for describing rules of chemical reactivity, or the properties of experimentally-available and virtual ligands to find new catalysts.62,63 Moreover, new featurization schemes may be necessary, particularly ones that consider stereochemical, steric hindrance, and long-range interaction aspects of reactions on complex scaffolds.
Publisher's role
The consensus among many participants was that funding bodies and scientific journals should adopt stringent requirements to foster the open availability, completeness, curation, and standardized formatting of published data. However, determining the specific standards and formats for data remains an ongoing question.
Similarly, it was emphasized that the codes utilized to generate the data should be accessible, unless licensed, and well-documented. Such practices align with the increasing adoption of FAIR (Findable, Accessible, Interoperable, and Reusable) policies in the scientific community.64,65 Another related challenge is facilitating broader access to proprietary data and/or establishing new repositories where researchers can deposit results of both successful and, importantly, unsuccessful experiments they have conducted.
On the former issue, the panelists agreed that professional non-profit organizations, such as the American Chemical Society (ACS), should consider opening up their extensive repositories or, at the very least, enabling broader academic access. Currently, the SciFinder dataset contains approximately 100 million reactions, yet it remains completely inaccessible for
downloads, severely limiting systematic data analyses. Given its status as a non-profit organization, the ACS is seen to have an ethical obligation to share the datasets it accumulates. While the CAS Common Registry initiative66 is appreciated, restricted licensing hinders research progress. Thinking more broadly, policies that require disseminating a complete set of data and code as a requirement for publication will help accelerate progress in this field. The ACS has started defining research data policy recommendations to achieve this goal.67 An excellent example of this is RSC's new journal Digital Discovery,68 which has a dedicated data and code reviewer to assess submitted materials for documentation and reproducibility.
III. New applications
Non-equilibrium states
Particular emphasis should be given to developing benchmark training sets that extend beyond equilibrium structures.69 Such sets, e.g., Transition1x, should enable advancing methods capable of describing dynamics, activated processes, and chemical reaction networks/pathways.69,70
Utilizing experimental data
Computational data has played a significant role in AI-driven materials discovery. However, specific critical properties remain inaccessible to these computational approaches regarding real-world applications. To enhance the impact of computational discoveries, it becomes crucial to develop AI methods that can predict the synthesizability of materials.71 The panel emphasized the importance of establishing an efficient two-way communication channel between theoreticians and experimentalists, as well as the need for integrated autonomous workflows that bridge both domains.72–75
Simultaneously, the experimental literature tends to exhibit bias towards “success stories” while failed experiments often go unreported.76,77 This bias can arise from various factors, such as the superior performance or ease of synthesis and characterization of certain materials for unrelated applications. Consequently, the available data on chemical space for exploration with AI becomes limited, impeding the discovery of genuinely novel systems. From a modeling perspective, a data point perceived as a “failure” in experimental terms can be just as valuable for training models as a data point from a “successful” experiment. Although the concept of a “Journal of Failed Research” remains elusive, the panel suggested that well-documented and openly available metadata from experiments, regardless of outcomes, could address this limitation by providing theoreticians with more extensive and diverse training sets in terms of structure and composition. Moreover, it was highlighted that the context of an experiment matters in defining what constitutes a “failed experiment”. For instance, a seemingly failed experiment in one context may actually lead to successful outcomes or the discovery of new compounds in a different context.
During the discussions, the topic of how AI empowers creativity in chemistry was addressed. It was acknowledged that AI is ultimately a tool that accelerates technological advancements and scientific discoveries. The progress made in this field has undeniably expedited the pace of invention. It can also be argued that AI enhances the occurrence of “eureka moments” by facilitating new insights and understanding. This aspect is intricately linked to the exploration of new concepts and the perception of reality. As a creative discipline, chemistry is driven by scientists motivated to uncover novel phenomena, unencumbered by pre-established physical laws. For example, this could involve stabilizing challenging structures, creating unconventional solvation environments, or discovering previously unknown and aesthetically pleasing spin states. Therefore, by leveraging AI to comprehend the existing knowledge and venture into unexplored territories, creative pursuits in chemistry can be truly enhanced. In particular, the question of what it means for AI to gain scientific understanding from data is highly relevant given the advent of large language models (LLMs) and their applications to chemistry.78–81 In this context, philosophical and conceptual frameworks like the one proposed by Krenn et al. are needed.82
Addressing the multi-scale nature of materials
An example discussed was the need to provide detailed descriptions of the operating conditions of functional materials at their relevant scales,83–85 and under intended operating conditions.86 This information is crucial for facilitating inverse design. Much of the work in the field currently follows a bottom-up approach, focusing on the development of machine learning potentials to extend the accessible time- and length scales in atomic-scale simulations. This is necessary to ensure sufficient statistical sampling for retaining predictive accuracy.87,88 Different materials exhibit limiting processes and reactions at various scales. For instance, catalysts' activity and selectivity89 and the performance of thermoelectric materials90,91 are governed at the atomic scale, while durability and reliability involve processes at the meso- to micro-scale or beyond.
The concept of self-driving labs was also discussed,9 with considerations given to the expenses associated with building, maintaining, and operating such facilities, especially when tailored for testing various optimization algorithms. The idea of “virtual labs” emerged as an alternative, where multi-level modeling is utilized to mimic real-world experiments. For example, in the context of batteries, simulations running on materials could be linked to single-cell and battery-pack configurations to understand the key influences from microstructure to system performance.
There is also a need to approach data dynamically. Building data in a multi-modal capacity to capture different scales or incorporating new experiments and calculations is critical for aiding chemical discovery. It is crucial to emphasize the importance of top-down approaches, starting from the meso/micro-scale phase-field60 and seamlessly coupling them with ML potentials92 for autonomous parameterization. Additionally, to enable more meaningful AI-driven discoveries, it is highly desirable to restrict the search to compounds that are easy to synthesize and provide synthesis recipes.
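As an illustration of the closing suggestion above, restricting an AI-driven search to compounds that are easy to synthesize can be sketched as a simple pre-filter applied before ranking. The following is a minimal Python sketch under stated assumptions, not a method presented by the panel: the `Candidate` fields, the `synth_score` scale (0 to 1), the threshold `min_synth`, and all numerical values are hypothetical placeholders. In practice the score would have to come from a synthesizability model or from expert assessment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one proposed compound."""
    name: str
    predicted_property: float  # model-predicted figure of merit (higher is better)
    synth_score: float         # assumed synthesizability estimate in [0, 1]


def shortlist(candidates: List[Candidate],
              min_synth: float = 0.5,
              top_k: int = 3) -> List[Candidate]:
    """Drop candidates below the synthesizability threshold, then rank by property."""
    feasible = [c for c in candidates if c.synth_score >= min_synth]
    ranked = sorted(feasible, key=lambda c: c.predicted_property, reverse=True)
    return ranked[:top_k]


# Illustrative, made-up values only.
pool = [
    Candidate("A", predicted_property=0.92, synth_score=0.10),
    Candidate("B", predicted_property=0.85, synth_score=0.70),
    Candidate("C", predicted_property=0.60, synth_score=0.95),
    Candidate("D", predicted_property=0.88, synth_score=0.55),
]

selected = shortlist(pool, min_synth=0.5, top_k=2)
print([c.name for c in selected])
```

In this toy pool, candidate A has the best predicted property but is excluded because its assumed synthesizability falls below the threshold. A softer alternative to the hard cut-off would be to treat the score as one objective in a multi-objective ranking.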
IV. Machine learning algorithms
Given these considerations, the natural question also arises: what other foundational AI advancements, explicitly addressing
Encoding algorithms for science
Identifying AI algorithm development specific to the sciences (chemistry, physics, materials) that has been driven by clearly defined needs is an important consideration. One notable example is the effect of differentiation and the loss function in the case of organic molecules, as observed in the QM9 dataset. In this dataset, which provides quantum chemical properties for a comprehensive chemical space of small organic molecules, the use of different loss functions for the training and testing sets was necessary to discover new motifs with desired functionality.35 This requirement arises due to the unique challenge of extrapolating from known molecules to identifying motifs and properties that differ from the original set encountered by the algorithm. This specific example highlights the demand for novel machine learning techniques tailored to the field of chemistry.
An additional example of algorithmic developments, partially inspired by chemical applications, involves the construction of models that incorporate physical symmetries into their structure. In the case of interatomic potentials, since the early stages of this field, the crucial insight has been the requirement for models to be exactly invariant to rotations, translations, and atom index permutations.93 More recently, these ideas have been expanded to create physics-inspired models that build upon covariant features/representations, an extension motivated by the widespread presence of vectorial and tensorial targets in quantum chemistry.94 It is noteworthy that these developments have progressed independently and in parallel with similar efforts in computer science,95 albeit formulated using different terminology and with less mathematical generality.
During the panel discussions, intriguing questions were raised regarding the potential integration of data-centered and expert methods and the extent to which this integration could be achieved.96,97 Hybrid approaches were proposed as a means to leverage the encoded knowledge of experts while maintaining the flexibility and adaptability of data-driven approaches. It was also observed that the raw reaction rules derived from either of these approaches can be significantly enhanced through further refinement using quantum mechanical (QM) or molecular mechanical (MM) calculations. For instance, MM methods can be employed to calculate strains and estimate the applicability of reaction rules to cyclization reactions.98
Another notable example of a hybrid approach involves breaking down the barriers between different methodologies. This includes merging electronic structure theory and machine learning99 or creating a unified framework that combines simulations and experimental data.100 Such models have the potential to learn by effectively integrating diverse sources of information.
accurately predicting data from domains outside the training data distribution remains a challenge.101 One intriguing and challenging topic discussed was the development of AI techniques that consider the minimum amount of information necessary to learn everything from the system. Additionally, there was a significant focus on the necessity and development of multi-objective optimizations for new materials discovery.102,103
Considering these fundamental AI advancements for enabling chemical discovery, it was noted that most multi-objective, multi-fidelity constrained problems addressed in self-driving labs today tend to prioritize higher performance based on predefined objectives. However, to advance chemistry knowledge, algorithms need to be further tailored for interpretability, extrapolation to learn new science, and hypothesis testing, which fundamentally require different approaches. A recent example involves dedicated exploration of the Pareto front, allowing the extraction of local correlations with near-optimal performance to aid in result understanding.104,105
The subsequent topic of discussion revolved around using the acceleration and discovery of new molecules/materials successfully validated in the lab as metrics of success in applying machine learning in chemistry. However, going beyond the speed of material development, true discovery of new concepts,82 such as topological materials, remains elusive. This led to the question of exploring deeper paths in AI to unlock such possibilities.106 One potential avenue is considering an automatic system that generates novel questions, although formulating the problems is typically within the domain of human experts. In scientific discovery, anomalies or outliers often lead to new findings. Optimization algorithms are already designed to find regions of high uncertainty in the parameter space, which are often unexplored. Rewarding data points in those regions, even if only a small percentage results in actual discoveries, can lead to the real discovery of new phenomena. Additionally, digitizing existing knowledge in chemistry and creating a comprehensive corpus of our current understanding can help define a concept of “known unknowns” for AI, making the idea less vague and facilitating exploration beyond what is already known. An example was shared regarding an automated robotic system developed by David MacMillan's group at Princeton University, which achieved “accelerated serendipity” by assembling molecules with no known history of interactions and rewarding accidental reactivity.107 This approach resulted in discovering new reactions or improved methods for existing reactions. Furthermore, emphasizing the uncertainty quantification of AI models was
highlighted as a critical step, as rewarding areas of large uncertainty in active learning frameworks necessitates the quantification and understanding of the epistemic and aleatoric uncertainty of the models,38,108,109 and the errors at each step.
V. Education
All participants unanimously agreed on the importance of
There are both advantages and disadvantages to having this course taught by a computer science department, considering university politics and topical relevance to students. On the one hand, departments may be protective of their specific areas of study, and other departments may lack the staffing necessary to support the teaching of new classes. On the other hand, students often benefit from direct applications of programming to their primary coursework, which may be lacking in broader service courses. Regardless of how it is offered, it is crucial that students learn elementary programming as early as possible, as
Open Access Article. Published on 06 December 2023. Downloaded on 5/31/2024 9:44:07 AM.
The literature also offers examples of mixed-reality enhancements in teaching microfluidics.47
Another potential application for training is using “body cam” footage or similar technologies to provide mentorship in the laboratory. The COVID-19 pandemic, with its need for remote work and limited laboratory occupancy, presented opportunities for pilot projects exploring augmented reality. In these projects, a trainer could supervise trainees from a remote location and provide relevant information directly into the trainee's field of view.
prestigious than careers in science. Factors such as higher salaries, early exposure to computers compared to chemistry sets, and negative perceptions of chemistry as ‘polluting’ or ‘bad’ may contribute to this disparity. Despite being the architects of matter, chemists often remain in the background in many applications. The general public may need to be made aware of the significant role chemists and materials scientists play in scientific advancements, such as space exploration, where chemical expertise is essential for activities like analyzing samples and developing chemical processes. Promoting green
Open Access Article. Published on 06 December 2023. Downloaded on 5/31/2024 9:44:07 AM.
chemical education. The importance of cooperation among researchers, educators, associations, publishers, and companies was emphasized in all panel discussions to facilitate AI in chemical science. The authors anticipate the continuation of efforts from various fields, expecting that such endeavors will eventually lead to critical innovations in the field of chemistry.
Data availability
This Perspective is derived from the ASLLA symposium panel discussion, and as such, no data or codes are available for sharing.
Author contributions
Writing - original draft, Alán Aspuru-Guzik, Seoin Back, Michele Ceriotti, Ganna Gryn'ova, Bartosz Grzybowski, Geun Ho Gu, Jason Hein, Kedar Hippalgaonkar, Rodrigo Hormázabal, Yousung Jung, Seonah Kim, Woo Youn Kim, Seyed Mohamad Moosavi, Juhwan Noh, Changyoung Park, Joshua Schrier, Philippe Schwaller, Koji Tsuda, Tejs Vegge, O. Anatole von Lilienfeld, and Aron Walsh. Writing - review & editing, Seoin Back. Funding acquisitions, Yousung Jung, Alán Aspuru-Guzik, and O. Anatole von Lilienfeld.
Conflicts of interest
There are no conflicts to declare.
Acknowledgements
The symposium organizers (YJ, AAG, and AVL) are grateful to KIST for generous financial support to organize the symposium. YJ acknowledges support from IITP Korea (No. 2021-0-01343, Artificial Intelligence Graduate School Program for Seoul National University & No. 2021-0-02068, Artificial Intelligence Innovation Hub) and NRF of Korea funded by Ministry of Science and ICT (RS-2023-00283902). PS acknowledges support from the NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation. A. A.-G. acknowledges support from the Acceleration Consortium, a Canada First Research Excellence Fund at the University of Toronto as well as Anders G. Frøseth.
References
1 J. Yano, K. J. Gaffney, J. Gregoire, L. Hung, A. Ourmazd, J. Schrier, J. A. Sethian and F. M. Toma, Nat. Rev. Chem., 2022, 6, 357–370.
2 K. Jorner, A. Tomberg, C. Bauer, C. Sköld and P.-O. Norrby, Nat. Rev. Chem., 2021, 5, 240–255.
3 M. Meuwly, Chem. Rev., 2021, 121, 10218–10239.
4 P. Schwaller, A. C. Vaucher, R. Laplaza, C. Bunne, A. Krause, C. Corminboeuf and T. Laino, Wiley Interdiscip. Rev.: Comput. Mol. Sci., 2022, 12, e1604.
5 F. Strieth-Kalthoff, F. Sandfort, M. H. Segler and F. Glorius, Chem. Soc. Rev., 2020, 49, 6154–6168.
6 B. Huang and O. A. von Lilienfeld, Chem. Rev., 2021, 121, 10001–10036.
7 I. Poltavsky and A. Tkatchenko, J. Phys. Chem. Lett., 2021, 12, 6551–6564.
8 T. Zubatiuk and O. Isayev, Acc. Chem. Res., 2021, 54, 1575–1585.
9 E. Stach, B. DeCost, A. G. Kusne, J. Hattrick-Simpers, K. A. Brown, K. G. Reyes, J. Schrier, S. Billinge, T. Buonassisi and I. Foster, Matter, 2021, 4, 2702–2726.
10 J. H. Montoya, M. Aykol, A. Anapolsky, C. B. Gopal, P. K. Herring, J. S. Hummelshøj, L. Hung, H.-K. Kwon, D. Schweigert and S. Sun, Appl. Phys. Rev., 2022, 9, 011405.
11 K. Hippalgaonkar, Q. Li, X. Wang, J. W. Fisher III, J. Kirkpatrick and T. Buonassisi, Nat. Rev. Mater., 2023, 8, 241–260.
12 B. Sanchez-Lengeling and A. Aspuru-Guzik, Science, 2018, 361, 360–365.
13 D. P. Tabor, L. M. Roch, S. K. Saikin, C. Kreisbeck, D. Sheberla, J. H. Montoya, S. Dwaraknath, M. Aykol, C. Ortiz and H. Tribukait, Nat. Rev. Mater., 2018, 3, 5–20.
14 Z. Yao, Y. Lum, A. Johnston, L. M. Mejia-Mendoza, X. Zhou, Y. Wen, A. Aspuru-Guzik, E. H. Sargent and Z. W. Seh, Nat. Rev. Mater., 2023, 8, 202–215.
15 R. Pollice, G. dos Passos Gomes, M. Aldeghi, R. J. Hickman, M. Krenn, C. Lavigne, M. Lindner-D’Addario, A. Nigam, C. T. Ser and Z. Yao, Acc. Chem. Res., 2021, 54, 849–860.
16 Z. Yao, B. Sánchez-Lengeling, N. S. Bobbitt, B. J. Bucior, S. G. H. Kumar, S. P. Collins, T. Burns, T. K. Woo, O. K. Farha and R. Q. Snurr, Nat. Mach. Intell., 2021, 3, 76–86.
17 N. S. Eyke, B. A. Koscher and K. F. Jensen, Trends Chem., 2021, 3, 120–132.
18 B. A. Grzybowski, T. Badowski, K. Molga and S. Szymkuć, Wiley Interdiscip. Rev.: Comput. Mol. Sci., 2023, 13, e1630.
19 M. Seifrid, R. Pollice, A. Aguilar-Granda, Z. Morgan Chan, K. Hotta, C. T. Ser, J. Vestfrid, T. C. Wu and A. Aspuru-Guzik, Acc. Chem. Res., 2022, 55, 2454–2466.
20 N. J. Szymanski, Y. Zeng, H. Huo, C. J. Bartel, H. Kim and G. Ceder, Mater. Horiz., 2021, 8, 2169–2198.
21 S. M. Moosavi, K. M. Jablonka and B. Smit, J. Am. Chem. Soc., 2020, 142, 20273–20287.
22 J. A. Bennett and M. Abolhasani, Curr. Opin. Chem. Eng., 2022, 36, 100831.
23 H. Tao, T. Wu, M. Aldeghi, T. C. Wu, A. Aspuru-Guzik and E. Kumacheva, Nat. Rev. Mater., 2021, 6, 701–716.
24 Z. Bao, J. Bufton, R. J. Hickman, A. Aspuru-Guzik, P. Bannigan and C. Allen, Adv. Drug Delivery Rev., 2023, 115108.
25 Y. Ivanenkov, B. Zagribelnyy, A. Malyshev, S. Evteev, V. Terentiev, P. Kamya, D. Bezrukov, A. Aliper, F. Ren and A. Zhavoronkov, ACS Med. Chem. Lett., 2023, 14, 901–915.
26 A. L. Ferguson and K. A. Brown, Annu. Rev. Chem. Biomol. Eng., 2022, 13, 25–44.
27 F. Häse, L. M. Roch and A. Aspuru-Guzik, Trends Chem., 2019, 1, 282–291.
28 R. J. Hickman, P. Bannigan, Z. Bao, A. Aspuru-Guzik and C. Allen, Matter, 2023, 6, 1071–1081.
29 S. Lo, S. Baird, J. Schrier, B. Blaiszik, S. Kalinin, H. Tran, T. Sparks and A. Aspuru-Guzik, ChemRxiv, 2023, DOI: 10.26434/chemrxiv-2023-6z9mq.
30 N. Artrith, K. T. Butler, F.-X. Coudert, S. Han, O. Isayev, A. Jain and A. Walsh, Nat. Chem., 2021, 13, 505–508.
31 G. Vishwakarma, A. Sonpal and J. Hachmann, Trends Chem., 2021, 3, 146–156.
32 A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey and I. Sutskever, arXiv preprint arXiv:2212.04356, 2022.
33 EXAONE, https://www.lgresearch.ai/exaone, accessed 19th
53 J. Mayfield, D. Lowe and R. Sayle, Pistachio, https://www.nextmovesoftware.com/pistachio.html, accessed 19th Sep, 2023.
54 Reaxys, https://www.reaxys.com, accessed 19th Sep, 2023.
55 SciFinder, https://scifinder.cas.org, accessed 19th Sep, 2023.
56 S. Szymkuć, T. Badowski and B. A. Grzybowski, Angew. Chem., 2021, 133, 26430–26436.
57 S. M. Kearnes, M. R. Maser, M. Wleklinski, A. Kast, A. G. Doyle, S. D. Dreher, J. M. Hawkins, K. F. Jensen and
S. Clark, Matter, 2023, 6, 2647–2665.
76 S. M. Moosavi, A. Chidambaram, L. Talirz, M. Haranczyk, K. C. Stylianou and B. Smit, Nat. Commun., 2019, 10, 539.
77 X. Jia, A. Lynch, Y. Huang, M. Danielson, I. Lang’at, A. Milder, A. E. Ruby, H. Wang, S. A. Friedler and A. J. Norquist, Nature, 2019, 573, 251–255.
78 M. Skreta, N. Yoshikawa, S. Arellano-Rubach, Z. Ji, L. B. Kristensen, K. Darvish, A. Aspuru-Guzik, F. Shkurti and A. Garg, arXiv, 2023, preprint arXiv:2303.14100, DOI: 10.48550/arXiv.2303.14100.
79 A. M. Bran, S. Cox, A. D. White and P. Schwaller, arXiv, 2023, preprint arXiv:2304.05376, DOI: 10.48550/arXiv.2304.05376.
80 D. A. Boiko, R. MacKnight and G. Gomes, arXiv, 2023, preprint arXiv:2304.05332, DOI: 10.48550/arXiv.2304.05332.
81 G. M. Hocky and A. D. White, Digital Discovery, 2022, 1, 79–83.
82 M. Krenn, R. Pollice, S. Y. Guo, M. Aldeghi, A. Cervera-Lierta, P. Friederich, G. dos Passos Gomes, F. Häse, A. Jinich and A. Nigam, Nat. Rev. Phys., 2022, 4, 761–769.
83 V. L. Deringer, N. Bernstein, G. Csányi, C. Ben Mahmoud, M. Ceriotti, M. Wilson, D. A. Drabold and S. R. Elliott, Nature, 2021, 589, 59–64.
84 S. Han, G. Barcaro, A. Fortunelli, S. Lysgaard, T. Vegge and H. A. Hansen, npj Comput. Mater., 2022, 8, 121.
85 G. H. Gu, J. Lim, C. Wan, T. Cheng, H. Pu, S. Kim, J. Noh, C. Choi, J. Kim and W. A. Goddard III, J. Am. Chem. Soc., 2021, 143, 5355–5363.
86 S. Han, S. Lysgaard, T. Vegge and H. A. Hansen, npj Comput. Mater., 2023, 9, 139.
87 M. Ceriotti, C. Clementi and O. Anatole von Lilienfeld, Chem. Rev., 2021, 121, 9719–9721.
88 A. E. Mikkelsen, H. H. Kristoffersen, J. Schiøtz, T. Vegge, H. A. Hansen and K. W. Jacobsen, Phys. Chem. Chem. Phys., 2022, 24, 9885–9890.
89 Q. Wang, J. Pan, J. Guo, H. A. Hansen, H. Xie, L. Jiang, L. Hua, H. Li, Y. Guan and P. Wang, Nat. Catal., 2021, 4, 959–967.
90 Y. Zhang, Y. Zheng, K. Rui, H. H. Hng, K. Hippalgaonkar, J. Xu, W. Sun, J. Zhu, Q. Yan and W. Huang, Small, 2017, 13, 1700661.
91 D. Bash, Y. Cai, V. Chellappan, S. L. Wong, X. Yang, P. Kumar, J. D. Tan, A. Abutaha, J. J. Cheng and Y. F. Lim, Adv. Funct. Mater., 2021, 31, 2102606.
92 P. Friederich, F. Häse, J. Proppe and A. Aspuru-Guzik, Nat. Mater., 2021, 20, 750–761.
93 F. Musil, A. Grisafi, A. P. Bartók, C. Ortner, G. Csányi and M. Ceriotti, Chem. Rev., 2021, 121, 9759–9815.
94 A. Grisafi, D. M. Wilkins, G. Csányi and M. Ceriotti, Phys. Rev. Lett., 2018, 120, 036002.
95 T. Cohen and M. Welling, Group Equivariant Convolutional Networks, Proceedings of The 33rd International Conference
H. S. Chae, M. Einzinger, D.-G. Ha and T. Wu, Nat. Mater., 2016, 15, 1120–1127.
97 S. Nagasawa, E. Al-Naamani and A. Saeki, J. Phys. Chem. Lett., 2018, 9, 2639–2646.
98 K. Molga, E. P. Gajewska, S. Szymkuć and B. A. Grzybowski, React. Chem. Eng., 2019, 4, 1506–1521.
99 M. Ceriotti, MRS Bull., 2022, 47, 1045–1053.
100 J. Weinreich, N. J. Browning and O. A. von Lilienfeld, J. Chem. Phys., 2021, 154, 134113.
101 J. Schrier, A. J. Norquist, T. Buonassisi and J. Brgoch, J. Am. Chem. Soc., 2023, 145, 21699–21716.
102 F. Häse, L. M. Roch and A. Aspuru-Guzik, Chem. Sci., 2018, 9, 7642–7655.
103 R. Hickman, M. Sim, S. Pablo-García, I. Woolhouse, H. Hao, Z. Bao, P. Bannigan, C. Allen, M. Aldeghi and A. Aspuru-Guzik, ChemRxiv, 2023, DOI: 10.26434/chemrxiv-2023-8nrxx.
104 C. J. Taylor, A. Pomberger, K. C. Felton, R. Grainger, M. Barecka, T. W. Chamberlain, R. A. Bourne, C. N. Johnson and A. A. Lapkin, Chem. Rev., 2023, 123, 3089–3126.
105 J. C. Fromer and C. W. Coley, Patterns, 2023, 4, 100678.
106 J. Schrier, A. J. Norquist, T. Buonassisi and J. Brgoch, J. Am. Chem. Soc., 2023, 145, 21699–21716.
107 A. McNally, C. K. Prier and D. W. MacMillan, Science, 2011, 334, 1114–1117.
108 J. Busk, M. Schmidt, O. Winther, T. Vegge and P. B. Jørgensen, Phys. Chem. Chem. Phys., 2023, 25, 25828–25837.
109 S. Chen and Y. Jung, Nat. Mach. Intell., 2022, 4, 772–780.
110 S. Vargas, S. Zamirpour, S. Menon, A. Rothman, F. Häse, T. Tamayo-Mendoza, J. Romero, S. Sim, T. Menke and A. Aspuru-Guzik, J. Chem. Educ., 2020, 97, 689–694.
111 L. Saar, H. Liang, A. Wang, A. McDannald, E. Rodriguez, I. Takeuchi and A. G. Kusne, MRS Bull., 2022, 47, 881–885.
112 A. K. Sharma, J. Comput. Sci. Educ., 2021, 12, 8–15.
113 E. S. Thrall, S. E. Lee, J. Schrier and Y. Zhao, J. Chem. Educ., 2021, 98, 3269–3276.
114 D. Revignas and V. Amendola, J. Chem. Educ., 2022, 99, 2112–2120.
115 D. Lafuente, B. Cohen, G. Fiorini, A. A. García, M. Bringas, E. Morzan and D. Onna, J. Chem. Educ., 2021, 98, 2892–2898.
116 A. G. St James, L. Hand, T. Mills, L. Song, A. S. J. Brunt, P. E. Bergstrom Mann, A. F. Worrall, M. I. Stewart and C. Vallance, J. Chem. Educ., 2023, 100, 1343–1350.
117 R. C. Cachichi, G. Girotto Junior, E. Galembeck, J. A. M. Schewinsky Junior, D. Ferreira Gomes and J. d. A. Simoni, J. Chem. Educ., 2020, 97, 3667–3672.
118 S. Lo, S. Baird, J. Schrier, B. Blaiszik, S. Kalinin, H. Tran, T. Sparks and A. Aspuru-Guzik, 2023, DOI: 10.26434/chemrxiv-2023-6z9mq-v2.
119 J. Vanderplas, Statistics for hackers, Portland, Oregon, 2016.
120 M. Abdinejad, B. Talaie, H. S. Qorbani and S. Dalili, J. Sci. Educ. Technol., 2021, 30, 87–96.
121 R. van Dinther, L. de Putter and B. Pepin, J. Chem. Educ., 2023, 100, 1537–1546.
122 S. G. Baird and T. D. Sparks, Matter, 2022, 5, 4170–4178.
123 R. Keesey, R. LeSuer and J. Schrier, HardwareX, 2022, 12, e00319.
124 L. C. Gerber, A. Calasanz-Kaiser, L. Hyman, K. Voitiuk, U. Patil and I. H. Riedel-Kruse, PLoS Biol., 2017, 15, e2001413.
125 E. Li, A. T. Lam, T. Fuhrmann, L. Erikson, M. Wirth, M. L. Miller, P. Blikstein and I. H. Riedel-Kruse, PLoS One, 2022, 17, e0275688.
126 L. B. Armstrong, M. C. Rivas, Z. Zhou, L. M. Irie, G. A. Kerstiens, M. T. Robak, M. C. Douskey and A. M. Baranger, J. Chem. Educ., 2019, 96, 2410–2419.
127 Y. Liu, J. Chem. Educ., 2022, 99, 2588–2596.
128 G. N. Quam, J. Chem. Educ., 1940, 17, 363.
129 R. M. Baker, M. E. Leonard and B. H. Milosavljevic, J. Chem. Educ., 2020, 97, 3097–3101.