0% found this document useful (0 votes)

6 views

Advancing Material Property Prediction Using Physics-Informed Machine Learning Models for Viscosity

Uploaded by

Danh Nguyen

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views

Advancing Material Property Prediction Using Physics-Informed Machine Learning Models for Viscosity

Uploaded by

Danh Nguyen

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

Chew et al.

Journal of Cheminformatics (2024) 16:31 Journal of Cheminformatics

https://doi.org/10.1186/s13321-024-00820-5

RESEARCH Open Access

Advancing material property prediction:

using physics‑informed machine learning
models for viscosity
Alex K. Chew1, Matthew Sender2, Zachary Kaplan1, Anand Chandrasekaran1, Jackson Chief Elk2,
Andrea R. Browning2, H. Shaun Kwak2, Mathew D. Halls3 and Mohammad Atif Faiz Afzal2*

Abstract
In materials science, accurately computing properties like viscosity, melting point, and glass transition temperatures
solely through physics-based models is challenging. Data-driven machine learning (ML) also poses challenges in con-
structing ML models, especially in the material science domain where data is limited. To address this, we integrate
physics-informed descriptors from molecular dynamics (MD) simulations to enhance the accuracy and interpretability
of ML models. Our current study focuses on accurately predicting viscosity in liquid systems using MD descriptors.
In this work, we curated a comprehensive dataset of over 4000 small organic molecules’ viscosities from scientific
literature, publications, and online databases. This dataset enabled us to develop quantitative structure–property
relationships (QSPR) consisting of descriptor-based and graph neural network models to predict temperature-
dependent viscosities for a wide range of viscosities. The QSPR models reveal that including MD descriptors improves
the prediction of experimental viscosities, particularly at the small data set scale of fewer than a thousand data
points. Furthermore, feature importance tools reveal that intermolecular interactions captured by MD descriptors
are most important for viscosity predictions. Finally, the QSPR models can accurately capture the inverse relationship
between viscosity and temperature for six battery-relevant solvents, some of which were not included in the original
data set. Our research highlights the effectiveness of incorporating MD descriptors into QSPR models, which leads
to improved accuracy for properties that are difficult to predict when using physics-based models alone or when lim-
ited data is available.
Keywords Classical molecular dynamics simulations, Organic molecules, Physical properties, Viscosity, Quantitative
structure–property relationships, Machine learning

*Correspondence:
Mohammad Atif Faiz Afzal
[email protected]
Full list of author information is available at the end of the article

© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativeco
mmons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 2 of 14

Graphical Abstract

Introduction can substantially reduce cost and time by learning the

Dynamic viscosity, referred to in this work as the vis- underlying connection between molecular structure
cosity, is an important material property that measures to bulk properties, such as viscosity, from a large data-
“stickiness”, or a fluid’s resistance to flow when an external set. Fortunately, a substantial amount of experimental
force is applied. Viscosity stems from the friction in the viscosity data can be found online or through literature
bulk caused by adjacent layers of fluid moving at differ- [2, 8], which enables the training of ML models. A vari-
ent relative velocities; hence, on the molecular level, vis- ety of ML methods have been used previously to predict
cosity is dictated by intermolecular interactions between viscosity, namely group-contribution-based methods
particles that lead to internal friction upon fluid flow [1, and artificial neural networks (NN) [1–3, 9]. In par-
2]. Given that viscosity is a fundamental property of flu- ticular, quantitative structure–property relationships
ids, it is often measured in a wide range of applications, (QSPR)—which correlates molecular-level features to a
such as electronics, pharmaceuticals, and cosmetics [1, desired property—have shown great promise in devel-
3]. Viscosity is also an important parameter for battery oping accurate viscosity models. For instance, Goussard
and energy storage research because it dictates the per- and others recently developed a ML model that predicts
formance of the electrolyte solution within lithium-ion viscosity of pure liquids using a dataset of 300 molecules
batteries [4, 5]. Thus, accurately and rapidly measuring at a temperature of 25 ◦ C [2]. While this model is useful
viscosity is of pivotal importance for the design of new for predicting viscosities at room temperature, develop-
materials. ing a ML model that can predict the viscosity of mol-
Experimentally, the viscosity of a fluid can be meas- ecules across a span of temperatures would broaden its
ured using devices such as rheometers or viscometers utility. For example, temperature effects on the viscosity
[2]. However, measuring a large number of experimental of gasoline has a significant impact over fuel efficiency,
viscosities is challenging, costly, and limited based on the emphasizing the importance of a viscosity prediction
availability of compounds. Alternative to experiments, model as a function of temperature [10]. Based on past
much effort has been invested in obtaining viscosity empirical relationships, such as the Vogel equation [11],
using physics-based modeling, such as molecular dynam- the viscosity is expected to be inversely proportional to
ics (MD) simulations [4, 6, 7]. Despite advancements in temperature; hence, increase in temperature results in
simulation procedures, estimating viscosities from MD is possibly orders of magnitude decrease in viscosity (see
especially challenging for highly viscous systems greater examples in Fig. 1A). An ideal ML model should capture
than ∼5 cP and is computationally expensive, making the inverse relationship between viscosity and tempera-
MD simulations challenging to use for the high-through- ture, which would be useful in various applications, such
put screening of viscosities. Thus, developing computa- as consumer packaged goods [2], battery technology [5],
tionally efficient and accurate models that can predict or the automobile industry [10].
the viscosity of molecules is necessary to reduce trial- The recent emergence of deep learning methods have
and-error experimentation or expensive physics-based revolutionized how QSPR models are developed. QSPR
calculations. development was predominantly a traditional chemin-
In contrast to experimental or physics-based meth- formatic task that correlated expert-defined descriptors
ods, data-driven machine learning (ML) approaches or fingerprints to a property of interest [12]. The current
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 3 of 14

Fig. 1 Distribution of the curated viscosity dataset. A Log-scale viscosity (µ) in centipoise as a function of temperature of three example
battery-relevant structures. Chemical structures are drawn within the plot, and linear dashed lines are included as visual guides. Histogram of B
log-scale µ and C temperature in Kelvins of the final viscosity dataset consisting of 4440 examples

state-of-the-art deep learning approach is graph neural improve the model accuracy. Finally, we employ feature
networks (GNN), specifically graph convolutional net- importance analysis tools to evaluate the influence of
works, which uses convolution operators that learn fea- molecular-based and physics-informed descriptors on
tures directly from a graph representation of a molecule QSPR performance. We demonstrate that the developed
(i.e. representing atoms as nodes and bonds as edges) models are highly accurate and can be used for quick
[13]. GNNs are a promising approach to autonomously estimation of viscosity of new molecules, which enables
create structure–property relationships without hav- these models to be used for the high-throughput screen-
ing to pre-define descriptors based on expert domain ing of viscosities.
knowledge [14]. However, it is still unclear whether
GNNs outperform the descriptor-based models, where
the prediction accuracy of both approaches is depend- Methods
ent on the type and size of the data [12]. Furthermore, Viscosity dataset
it is unclear how the inclusion of external features (such We extracted viscosities, temperatures, and structures
as temperature) might impact the prediction accuracy of from the relevant literature and online databases [2, 16–
either descriptor-based descriptors or GNN approaches. 28]. Details of the literature sources are included in Addi-
Finally, developing accurate QSPR models requires a tional file 1: Table S1. All structures were represented as
large, curated viscosity dataset that could broadly gener- simplified molecular-input line entry-system (SMILES)
alize viscosity values across a wide range of temperatures. strings. We curated an initial dataset of 5356 viscosity
Some recent work has explored the use of data-driven entries, covering a wide range of temperatures and vis-
methods to predict viscosities, such as group contribu- cosities. Then, we filtered the dataset using the follow-
tion methods for n-alkanes and iso-alkanes [9] or GNNs ing steps: (1) filtered for single, organic structures with
for single and binary liquid mixtures [15]. However, the atomic elements of {H, C, N, O, F, Si, P, S, Cl, Br, and I};
comparison between descriptor-based and graph-based (2) since high experimental errors were observed for high
approaches, as well as the inclusion of physics-informed and low extremums of the viscosity and temperature val-
descriptors, has not been well-explored. ues, the dataset was filtered using the box-and-whisker
In this work, we have extracted and cleaned a large plot method, where viscosities and temperatures that
dataset of over 4000 experimental viscosities of small fall outside of 1.5 times their corresponding interquar-
molecules at various temperatures from multiple litera- tile range are removed as outliers; (3) since the viscosity
ture sources. We use this viscosity dataset to build and values are expected to be inversely proportional to tem-
benchmark machine learning models that can predict perature for bulk liquids, data points that have a posi-
viscosity as a function of temperature. We constructed tive deviation of viscosity with respect to temperature
both descriptor-based and GNN-based QSPR models to greater than 0.02 cP were removed as outliers (positive
evaluate whether learned features from graphs could out- deviations often arise from different literature sources).
perform hand-crafted features in predicting viscosities. After applying the data filtration process, we used a total
Additionally, we incorporate information obtained from of 4440 viscosity entries for ML model development.
physics-based simulations into the ML models to further This dataset consists of 1005 unique structures, with
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 4 of 14

viscosities ranging from 0.10 cP to 26.52 cP, and tempera- Descriptor‑based QSPR models
tures ranging from 227 to 404 K. Since only 136 of the The general workflow for developing descriptor-based
1005 unique structures have stereoisomers, we did not models is summarized in Fig. 2A. All molecules were
account for the impact of isomerism in this work. We featurized with 209 RDKit descriptors, 1000 Morgan
apply log transform of viscosity to ameliorate the skewed fingerprints, and 132 Matminer descriptors. Featuri-
distribution of viscosity values; thus, all viscosities will be zation for RDKit and Morgan fingerprints were imple-
presented in the log-scale as log µ, where µ has units of mented using the rdkit package (Version 2021.09.4)
centipoise. [29], whereas Matminer descriptors were imple-
Figure 1A shows the log-scale viscosity as a function mented using the matminer package (Version 0.6.3)
of temperature for three representative small molecules [30]. Based on the Vogel equation of viscosity [11],
(methyl acetate, ethyl acetate, and methyl butyrate), we expect that log µ is proportional to the inverse
which are electrolytes relevant to the designing of Li-ion of temperature; hence, we input the inverse of tem-
batteries [5]. Figure 1A highlights the inverse proportion- perature for all ML models. External features, such as
ality expected between viscosity and temperature, where experimental inverse temperature or physics-based
higher temperature values yield lower viscosities. Fig- descriptors, were included as an additional descrip-
ure 1B and C shows the histogram of log-scale viscosity tor into the models; hence, a total of 1341 + Next fea-
and temperatures for the 4,440 entries respectively. Both tures were passed into ML model development, where
Fig. 1B and C shows a right-skewed normal distribution Next is the number of external features. All features
for both log-scale viscosity and temperatures, which were preprocessed with the following procedure: (1)
means that data is more spread apart at larger viscos- correlated features with Pearson’s r greater than or
ity and temperature values. We used the 4,440 viscosity equal to 0.90 were removed; (2) constant features with
entries to train and evaluate all QSPR models. variance of zero were removed; and, (3) features were
standardized by subtracting the mean and dividing by

Fig. 2 Descriptor-based QSPR approaches for predicting viscosity. A Workflow of the descriptor-based approaches using methyl acetate
as an example. Methyl acetate is featurized with RDKit, Morgan fingerprint, and Matminer descriptors. A total of 1341 + Next (external features)
features were passed into machine learning model development. The inverse temperature is included in model development to incorporate
temperature effects. B Five-fold cross validation and test set RMSE for QSPR models. The average RMSE is reported across five out-of-sample
train-test splits and the RMSE uncertainty is estimated by computing the standard deviation across the splits. C Parity plot between predicted
and actual log-viscosity showing the validation set predictions across 5-CV on the training set for a single train/test split when using the LGBM
model, which had the highest model score based on Eq. 1. Each color indicates the different validation sets for each of the five folds. The number
of examples used (N), R2, and RMSE for 5-CV are reported within the plot. D Parity plot between predicted and actual log viscosity for a single
80:20 train:test split for the LGBM model. The total number of examples used (N) and statistics (i.e. R2 and RMSE) for train and test sets are reported
within the plot. For all parity plots, a dashed diagonal y = x line is drawn as a guide to indicate which predictions are in agreement with the actual
values
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 5 of 14

the standard deviation. On average, 876 of the descrip- the Additional file 1: Table S2. For all descriptor-based
tors remained after feature preprocessing, which were QSPR models, we used a bagging regressor approach
passed as inputs into ML algorithms. Eight differ- to allow for estimation of prediction errors, where 20
ent ML algorithms were tested: multilayer perceptron estimators for each ML algorithm were independently
(MLP), support vector regression (SVR), random forest trained by randomly sampling the training set with
(RF), gradient boosting regression (GBR), light gradi- replacement. Prediction values are reported by com-
ent-boosting machine (LGBM) [31], extreme gradient puting the average prediction of the 20 estimators, and
boosting (XGB) [32], least absolute shrinkage and selec- prediction uncertainties are computed using the 90%
tion operator (LASSO), and partial least squares (PLS). confidence interval of the prediction values.
All models were implemented with the scikit-learn
package (Version 1.0.2) [33], except LGBM (lightgbm GNN QSPR models
package, Version 3.2.1) and XGB (xgboost package, GNN models were built using DeepAutoQSAR,
Version 1.5.1). We selected these ML algorithms based Schrödinger’s automated molecular property predic-
on the current state-of-the-art in the literature to iden- tion engine [34, 35]. For GNNs, molecules are treated
tify the best ML algorithm to predict liquid viscosity as molecular graphs with atoms as nodes and bonds as
[12]. For LASSO models, sparsity or reduction of fea- edges, which is illustrated in Fig. 3A. A total of 75 fea-
ture space was applied by modifying the “alpha” param- tures + Next (external features) were used to featur-
eter in the sklearn module, which dictates the extent of ize each heavy atom. Atomic featurizations include
L1 regularization on the coefficients of a linear regres- one-hot encodings of atomic number, implicit valence,
sion. For SVR models, we used the default radial basis formal charge, atomic degree, number of radial elec-
function kernel type in the sklearn module. Hyperpa- trons, hybridization, and aromaticity [35]. External fea-
rameters for descriptor-based models are described in tures were standardized by subtracting the mean and

Fig. 3 Graph neural network QSPR approaches for predicting viscosity. A Workflow of the graph neural network (GNN) based approaches
using methyl acetate as an example. Methyl acetate is represented as a molecular graph (G) with atoms as nodes (V) and bonds as edges (E).
B Five-fold cross validation and test set RMSE for QSPR models. The average RMSE is reported across five random train-test splits and the RMSE
uncertainty is estimated by computing the standard deviation across the splits. LGBM is included in this plot as a comparison between the best
descriptor-based QSPR model against GNN QSPR models. Only the top five performing GNNs are shown for brevity, which were selected
based on Eq. 1. C Parity plot between predicted and actual log-viscosity showing the validation set predictions across 5-CV on the training set
for a single train/test split when using the EdgePool model, which had the highest model score based on 5-CV and test set R2. Each color indicates
the different validation sets for each of the five folds. The number of examples used (N), R2, and RMSE for 5-CV are reported within the plot. D
Parity plot between predicted and actual log viscosity for a single 80:20 train:test split for the EdgePool model. The total number of examples used
(N) and statistics (i.e. R2 and RMSE) for train and test sets are reported within the plot. For all parity plots, a dashed diagonal y = x line is drawn
as a guide to indicate which predictions are in agreement with the actual values
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 6 of 14

dividing by the standard deviation before being passed percentage free volume (MD_FV), radius of gyration of
into GNNs. For each atom, GNNs aggregate informa- the molecule (MD_Rg), Hansen solubility parameters
tion from its neighboring atoms and update a new atomic (MD_SP, MD_SP_E, and MD_SP_V), heat of vaporiza-
vector based on message passing across the molecular tion (MD_HV), and root-mean-square displacement
graph. The final learned atomic features outputted by the (MD_RMSD) (see Additional file 1: Section S2.1 for
readout phase are then inputted into a fully connected details). MD descriptors were computed by taking the
layer to predict log viscosities. Ten graph-based model ensemble-average over the last 10 ns simulation of the
approaches were evaluated: Graph Convolution Neural production run, and these descriptors show convergence
Network (GCN) [36], Pytorch version of GCN (Torch- for both low and high viscosity examples (see Additional
GraphConv) [37], TopK [38], GraphSAGE [39], Graph file 1: Figs. S3 and S4). Averaging MD descriptors using
Isomorphism Network (GIN) [40], Self-Attention Graph multiple replicas of MD simulations may yield better
Pooling (SAGPool) [41], EdgePool [42], GlobalAtten- monotonic trends as a function of temperature, but their
tion [40], Set2Set [43], and SortPool [44]. Different GNN values do not significantly differ as compared to descrip-
models differ slightly by how they aggregate information tors from a single MD simulation (see Additional file 1:
based on successes from previous literature [40, 42]. All Fig. S9). Therefore, we use MD descriptors from a sin-
graph-based models were trained with PyTorch (Version gle simulation. These MD descriptors were inputted as
1.9.0) [45] for 500 epochs, a learning rate of 0.01, and a external features into the ML models to evaluate whether
dropout ratio of 0.25. Hyperparameters for GNNs are they could improve the prediction accuracy of viscosi-
described in the Additional file 1: Table S3. ties. While MD simulations can yield highly informative
descriptors, they also incur additional simulation costs.
Classical molecular dynamics simulations The estimated computational cost is around one hour
We performed MD simulations for all the structures at per structure and temperature, assuming the use of a
each experimental temperatures in the viscosity data- computer with a GPU similar to the NVIDIA Tesla T4.
set to evaluate whether the inclusion of MD descrip- However, this cost could be mitigated by employing more
tors would improve ML models. For all simulations, we efficient GPUs.
used the Schrödinger’s Materials Science Suite (MSS)
[46], which leverages the Desmond MD engine to rap- QSPR model training and evaluation
idly speed up MD computations through GPU accelera- The workflow used to evaluate QSPR models is shown
tion [7, 47, 48]. All molecules were parameterized with in Additional file 1: Fig. S5. To alleviate the effect of ran-
the OPLS4 force field [49]. For each system, we first domness in data splitting, five independent runs with
constructed an amorphous simulation cell with approxi- different random seeds were performed with an 80:20
mately 8000 atoms. The initial density of the system in train:test split. Previous literature has used multiple
the amorphous cell structure was 0.5 g/cm3. train/test splits to better assess the accuracy of machine
The equilibration procedure consisted of Brownian learning models [12]. While the average model perfor-
minimization of 150 ps, 0.5 ns NVT ensemble (Number mance of multiple train/test splits is similar to the model
of atoms, Volume, and Temperature are conserved) with performance when using a single train/test split for pre-
2 fs time step at temperature of 500 K and pressure of 1 dicting viscosity (see Fig. S6 in the Additional file 1), we
atm, 1 ns NPT ensemble (Number of atoms, Pressure, only report the average model performance of the multi-
and Temperature are conserved) with 2 fs time step at ple train/test splits to avoid possible bias in data splitting.
temperature of 400 K and pressure of 1000 bar, 2 ns NPT Since the viscosity dataset contains multiple entries with
ensemble with 2 fs time step at temperature of 300 K and the same molecule at different temperature and viscosity
pressure of 1 atm, 5 ns NPT ensemble with 2 fs time step values, we implement an out-of-sampling approach for
at the temperature (Texp) where experimental viscosity is data splitting, where unique compounds are iteratively
reported K and pressure of 1 atm, 10 ns NPT ensemble introduced to the training set until it reaches 80% of the
with 2 fs time step at Texp and pressure of 1 atm. After dataset and the remaining 20% of the data is placed in
this equilibration protocol, we take the average cell size the testing set. Previous studies have observed that out-
of the last 20% of the previous step and subsequently of-sampling splitting is a better approach to measure
perform 1 ns NVT ensemble with 2 fs time step at Texp. model accuracy as compared to random splitting from
The final production run consists of 20 ns NVT ensemble an application standpoint because model performance
with 2 fs time step at Texp with saving a frame at every from random splitting may lead to over-optimistic model
100 ps interval. performance for datasets with repeated molecules where
We extracted eight MD descriptors from the final pro- the same molecule could appear in both train and test
duction MD simulation: packing density (MD_density), sets [50]. Therefore, all train/test splits in this work uses
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 7 of 14

the out-of-sampling approach such that the test set has the sign of the importance is determined by computing
unique compounds from the training set. the Pearson’s r correlation coefficient between the Shap-
For each train/test split, a five-fold cross validation pro- ley and descriptor values. Positive Pearson’s r between
cedure (5-CV) was implemented on the training set for Shapley and descriptor values indicate that the feature
hyperparameter tuning and evaluating model generaliz- positively contributes to the output property, whereas
ability across the training set. In 5-CV, the training set negative Pearson’s r indicates the converse. Additional
is partitioned into five separate sets, whereby for each details about the SHAP method could be found in previ-
of the five folds, one set is left-out as the validation set ous literature [12, 55, 56].
using the out-of-sample data splitting approach and the
remaining sets are used to train the model; this proce- Results and discussion
dure is repeated five times until all of the data instances Performance of descriptor‑based QSPR models
are within the left-out set exactly once. In this work, we We first sought to develop QSPR models using the
report the 5-CV coefficient of determination ( R2) and descriptor-based approach, where hand-crafted two-
root-mean-square error (RMSE) of the left-out sets only, dimensional (2D) descriptors and fingerprints are used
which measures the model performance on new com- as inputs into the machine learning model. Figure 2A
pounds. After selecting the best hyperparameters from shows the general workflow for inputting hand-crafted
5-CV, the model is re-trained with the entire training set descriptors and external descriptors, such as inverse tem-
and used to predict the test set. The models are evalu- perature, into QSPR models to predict log viscosities (see
ated based on their ability to accurately generalize across Methods for more details). Figure 2B shows the 5-CV and
the training set using the 5-CV approach and predict test set RMSE for the eight ML algorithms when using
the testing set, which is summarized by a model score five random, out-of-sample 80:20 train:test splits across
(ScoreM ) in Eq. 1. the viscosity dataset. ML algorithms were rank-ordered
based on their model scores as described in Eq. 1. From
2 2 2
ScoreM = Rtest × 1 − R5−CV − Rtest (1) Fig. 2B, we observe that tree-based ML models, such
as LGBM, XGB, and GBR, were the top performers in
2
Rtest and R5−CV
2 is the coefficient of determination for predicting log viscosities, followed by other non-linear
the test and 5-CV of the train set, respectively. ScoreM approaches such as SVR and MLP. Linear models like
rewards models that exhibit high generalizability for both LASSO and PLS perform the worst, suggesting that a
the training and testing sets. ScoreM penalizes mod- non-linear relationship between the 2D descriptors and
els where the accuracy is low for both sets or when the log viscosities may be necessary for an accurate model.
accuracies between the two sets are very distinct, which For all models, 5-CV and test set RMSEs are very similar,
may be indicative of overfitting or poor generalization. which shows that the models’ ability to generalize across
ScoreM is similar to previous model scoring functions the training set is correlative to its ability to generalize to
in the literature that automatically select good models for unseen examples.
structure–property relationships [51]. We primarily use Since LGBM had the highest model score, we further
ScoreM to rank-order QSPR models based on accuracy investigated its accuracy in the 5-CV of the training set
on 5-CV and test set prediction accuracy. All QSPR mod- and in test set predictions. Figure 2C shows the parity
els were implemented using Python (Version 3.8.15). plot between predicted versus actual log viscosities when
performing 5-CV across the training set when using the
Model interpretation LGBM algorithm; only predictions on the left-out vali-
Feature importance was evaluated using the SHapley dation set are shown for each of the five cross validation
Additive exPLanations (SHAP) approach (shap pack- folds. The 5-CV parity plot shows that the majority of the
age, Version 0.41.0), which is a game theory approach to points lie along the diagonal y = x line, suggesting that
quantify the contributions of single players in a collabo- the LGBM model generalizes well across the training
rative game [52, 53]. Shapley values measure the impact set with a 5-CV R2 of 0.88 and RMSE of 0.16. Figure 2D
of a descriptor to an output property by including or shows a parity plot of predicted versus actual log viscosi-
excluding the descriptor across a set of instances. SHAP ties for the training set and testing set when performing
is a local model-agnostic method for explaining individ- an 80:20 train:test split and using the LGBM algorithm.
ual predictions. SHAP can also be used as a global inter- The LGBM model learned the training set well with a
pretation method by aggregations of Shapley values [54]. train R2 of 0.99 and RMSE of 0.04 and predicted the
For all SHAP calculations, we use the test set instances to left-out test set with lower accuracy (i.e. test R2 of 0.91
measure descriptor importance. The average magnitude and RMSE of 0.13). The parity plots in Fig. 2C and D
of Shapley values is reported (i.e. Mean SHAP ), and show minimal outliers in the LGBM model predictions,
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 8 of 14

which suggests the model is accurately capturing trends may slightly outperform graph-based approaches for this
between structure, temperature, and viscosities. viscosity dataset. Figure 3C shows the parity plot between
predicted versus actual log viscosities when performing
Performance of GNN QSPR models 5-CV across the training set when using the EdgePool
We next evaluated whether GNNs might outperform the model. EdgePool achieves a 5-CV R2 of 0.84 and RMSE
descriptor-based approaches in predicting temperature- of 0.18, which is slightly poorer compared to LGBM (see
dependent viscosities. Figure 3A shows the general work- Fig. 2C). Figure 3D shows a parity plot between predicted
flow of using GNNs to predict viscosities using methyl versus actual log viscosities for an 80:20 train:test split
acetate as an example (see Methods section for details). using the EdgePool algorithm. In comparison to LGBM
Figure 3B shows the 5-CV and test set R2 for the top five (Fig. 2D), EdgePool achieves a slightly poorer test set R2
GNN models ranked based on model score and the top of 0.89 and RMSE of 0.15. Overall, these results show
descriptor-based LGBM model as a comparison. While that GNNs could be used to predict viscosities; however,
the EdgePool model had the highest model score, the descriptor-based approaches perform slightly better for
overall 5-CV and test set R2 is comparable between the this dataset.
different GNN approaches, which suggests that varying
GNN architectures did not yield higher accuracy in vis- Impact of molecular simulation derived descriptors
cosity predictions. The GNN models have slightly lower on QSPR models for viscosity
5-CV and test set R2 as compared to the descriptor-based We next investigated whether the inclusion of physics-
LGBM model (performance is drawn as a vertical dashed based descriptors computed from molecular dynamics
line), which suggests that descriptor-based approaches simulations could help improve the QSPR accuracy of

Fig. 4 Impact of MD descriptors in QSPR models for viscosity predictions. A Simulation snapshot of methyl acetate at T = 298 K, which was used
to compute eight MD descriptors. B Test set root-mean-square error (RMSE) for descriptor-based LGBM model and GNN-based EdgePool model
when including two-dimensional descriptors (2D), molecular dynamics (MD) descriptors, or combinations of 2D and MD (2D and MD) into the QSPR
models. The average RMSE is reported across five random, out-of-sample train-test splits and the RMSE uncertainty is estimated by computing
the standard deviation across the splits. C Log-scale learning curve showing test set RMSE versus train set size when using 20% of the dataset as test
set and re-training the models with increasing training set sizes. These curves are plotted for LGBM and EdgePool models with and without MD
descriptors. Twenty train-test splits were implemented to obtain accurate measurements of test RMSE, where the mean test set RMSE is reported
and the uncertainty is estimated by the standard deviation of the test set RMSEs
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 9 of 14

temperature-dependent viscosities. We hypothesized including MD descriptors becomes statistically insig-

that since viscosity is dictated by intermolecular interac- nificant when comparing against models without MD
tions during fluid flow, MD-derived features that capture descriptors. We further quantified the percent change in
these interactions may improve QSPR models for viscos- test RMSE with and without MD descriptors in the Addi-
ity. As an example, Fig. 4A shows a production simula- tional file 1: Fig. S8. We found that both LGBM and Edge-
tion snapshot of methyl acetate at T = 298 K, which was Pool models achieve at least 15% reduction in test RMSE
used to generate eight MD descriptors for QSPR model at 500 training examples and plateauing at 10% reduc-
development (see the Methods section for details). We tion in test RMSE at 3,500 examples. These results sug-
evaluated the inclusion of MD descriptors for both gest that MD descriptors are particularly advantageous
descriptor-based and GNN-based QSPR models, spe- for viscosity predictions at the low-data regions, but the
cifically the LGBM algorithm for the descriptor-based usefulness of MD descriptors are diminished at the high-
model and the EdgePool algorithm for the graph-based data regions since the ML models may better correlate
model since these models obtained the highest model non-linear trends between lower dimensional features
score (see Fig. 3B). For LGBM models, inclusion of 2D (e.g. 2D descriptors) and viscosities.
and MD descriptors would result in 1350 descriptors Figure 4C highlights the surprisingly good performance
that consist of 1341 2D descriptors, one inverse tempera- of LGBM models when using only eight MD descrip-
ture descriptor, and eight MD descriptors. For EdgePool tors, which outperformed the same model when using
models, inclusion of MD descriptors would result in a more than hundreds of 2D descriptors at the small data
total of nine external features consisting of one inverse scale. The improved model performance suggests that
temperature descriptor and eight MD descriptors. All MD descriptors are informative to viscosity predictions,
descriptors are preprocessed by correlated and constant which is further supported by feature importance analy-
feature removal as described in the Methods section. sis in the next section. Furthermore, while MD simula-
Fig. 4B compares the test set RMSE of LGBM and Edge- tions struggle to directly measure high viscosities that are
Pool when using either 2D descriptors alone, 2D and MD greater than five cP, MD excels in accurately predicting
descriptors together, and MD descriptors alone to pre- certain properties, such as system density, heat of vapori-
dict the log viscosities. We observe that LGBM with 2D zation, and solubility parameters, which shows a high
and MD descriptors has a slightly lower test set RMSE degree of correlation with experimental data [48, 49, 57].
as compared to LGBM trained with 2D and MD descrip- Thus, we can reliably use these MD descriptors in our
tors separately. Similarly, inclusion of MD descriptors for ML models even for molecules with high viscosities.
EdgePool slightly decreases test set RMSE as compared
to EdgePool alone. Model interpretability for descriptors‑based QSPR models
While MD descriptors did not significantly improve One advantage of descriptor-based QSPR models is the
test set RMSEs when performing an 80:20 train:test split ability to interpret which features are most relevant to
across the viscosity dataset, we hypothesized that MD predicting viscosity, which remains an active area of
descriptors may be more useful in the low-data region research for graph-based QSPR models that are gener-
where highly informative descriptors are expected to ally more challenging to interpret [58–61]. Given that
improve prediction accuracy for viscosity. Figure 4C LGBM models with permutations of 2D and MD descrip-
shows the learning curve for LGBM and EdgePool mod- tors performed similarly in predicting viscosities (see
els with the inclusion of 2D descriptors alone, 2D and Fig. 4B), we sought to analyze the underlying connec-
MD descriptors, and MD descriptors alone. The learning tions between the descriptors and viscosities to see if
curve measures the effectiveness of these models and var- there are any similarities when varying the featurization
ying featurization schemes to predict an unseen test set spaces. We use the SHAP approach to quantify feature
consisting of 20% of the viscosity dataset when increasing importance by measuring the impact of each descrip-
the number of training examples inputted to the models. tor to viscosity predictions by including or excluding
For LGBM models, we observe that using the combina- the descriptor across a set of instances (see Methods for
tion of 2D and MD descriptors or MD descriptors alone details). The SHAP method is advantageous because it is
outperform using 2D descriptors alone at the small train- a model-agnostic approach that is capable of quantifying
ing sizes (∼100 data points) in predicting the test set feature importance even for “black box” models, such as
RMSE. We observe a similar pattern when training Edge- deep neural networks [62]. Figure 5 shows the top five
Pool with and without MD descriptors, where inclusion features measured by the average magnitude of Shapley
of MD descriptors lowers test set RMSE between ∼100 to values when using the LGBM model with 2D descriptors
∼1000 training sizes. As the training size increases above only, 2D and MD descriptors, and MD descriptors only.
∼1000 examples, the prediction accuracy gained from
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 10 of 14

When using 2D descriptors only (see Fig. 5A), “MOE-

like” charge van der Waal’s surface area descriptors
(RD_PEOE_VSA1), inverse temperature (Inv. Temp.),
graph-like descriptors (Ipc) [63], molecular weight (RD_
MolWt), and EState VSA Descriptor 3 (RD_VSA_Estate3)
are the top descriptors that contribute to predictions of
viscosity. We expect experimental temperature to be an
important parameter for temperature-dependent viscos-
ity predictions, hence it is no surprise that Inv. Temp. is
one of the top descriptors. The other descriptors suggest
that molecular size and charge distribution contributes to
viscosity, which is consistent with our understanding that
larger molecules result in more intermolecular attrac-
tions that lead to higher viscosities and charges influence
attractiveness between molecules.
When combining 2D and MD descriptors (see Fig. 5B),
the top descriptors when using 2D descriptors alone are
replaced with two MD descriptors: heat of vaporiza-
tion (MD_HV) and free volume (MD_FV). MD_HV is
computed from nonbonded interactions and is the top
descriptor that contributes to viscosity, which agrees
with our hypothesis that intermolecular interactions cap-
tured from MD simulations may be more informative for
a QSPR model as compared to 2D descriptors. Interest-
ingly, experimental heat of vaporization has been previ-
ously used as a parameter to correlate with viscosity [64],
which is in agreement with the top MD_HV descriptor
identified by the LGBM model. MD_FV captures the
voids between molecules in solution, which has been
observed in the literature to be related to viscosity [9,
65]. MD_FV is also negatively correlated to viscosity (see
Fig. S2 in the Additional file 1), which means that smaller
voids in solution results in favorable interactions between
molecules and, hence, higher internal friction and viscos-
ity. The results in Fig. 5B highlights that MD descriptors
are important for viscosity despite being in the presence
of more than hundreds of 2D descriptors.
When using MD descriptors only (Fig. 5C), MD_HV
Fig. 5 Feature importance of descriptor-based LGBM models. remains to be the top descriptor relevant to viscosity pre-
Top 5 important features measured as the average magnitude dictions consistent when using both 2D and MD descrip-
of SHapley Additive exPLanations (SHAP) values (i.e. Mean |SHAP|) tors. Information about molecular size, such as radius
for LGBM models trained with A 2D descriptors only, B 2D and MD of gyration (MD_Rg) and density (MD_density), are the
descriptors, and C MD descriptors only. Positive Mean |SHAP|
indicates that the descriptor positively contributes to viscosity,
next top descriptors when using MD descriptors, which
whereas negative Mean |SHAP| indicates the converse. Descriptors is similar with the top descriptors observed when using
with prefixes of “RD” and “MD” refer to RDKit and MD descriptors, 2D descriptors only. Interestingly, experimental inverse
respectively. The average Mean |SHAP| of twenty LGBM estimators temperature is the least important of the top five features
is reported and the uncertainty is estimated by the computing when using MD descriptors only, which may be because
standard deviation of the Mean |SHAP| values. The number of features
correlated to the top features based on a Pearson’s r correlation
MD descriptors capture temperature effects during the
coefficient cutoff greater or equal to 0.90 are shown in brackets simulation or use temperature as part of the calculations.
and summarized here (parenthesis is Pearson’s r correlation to the top Altogether, Fig. 5 suggests that descriptors from MD
feature): a RD_HeavyAtomMolWt (0.99), RD_ExactMolWt (1.00), RD_ simulations that capture nonbonded interactions, such as
Chi0v (0.93), RD_LabuteASA (0.93); b MD_SP (0.93); c MD_RMSD (0.95) the heat of vaporization, are useful to accurately predict
viscosities.
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 11 of 14

Temperature‑dependent viscosity predictions the viscosity dataset used in this work. We investigate
for battery‑relevant solvents whether the QSPR models in this work could predict the
We next evaluated whether the QSPR models can cap- experimental viscosity trends measured from Ref. [5].
ture the temperature dependence of log viscosities. To eliminate the effect of data splitting, we re-trained
We focused on six pure solvents previously studied by the QSPR models using the entire viscosity dataset in
Logan and coworkers, which were used to potentially this work. Figure 6 shows the log viscosities versus tem-
improve lithium ion battery electrolytes: methyl acetate perature predictions for the six solvents using descriptor-
(MA), ethyl acetate (EA), methyl butyrate (MB), methyl based LGBM and GNN-based EdgePool models with
propionate (MP), dimethyl carbonate (DMC), and ethyl varying featurization inputs (2D descriptors only, 2D
methyl carbonate (EMC) [5]. These pure solvents could and MD descriptors, and MD descriptors only). MA, EA,
be added as co-solvents for lithium ion batteries to lower MB, and MP are structures within the viscosity dataset
viscosities and increase electric conductivity; hence, (i.e. training set) and encompass the same range of tem-
these solvents can improve how fast a battery can charge peratures as experimentally measured in Ref. [5]. Hence,
or discharge. The authors report experimental tempera- across all QSPR models and featurization schemes,
ture-dependent viscosities for these six solvents, which the experimental points shown as orange triangles are
were not used in the original data curation of viscosities well-captured for MA, EA, MB, and MP (see Fig. 6A–
in this work. However, some of these solvents have been D). These results show that the QSPR models capture
observed in other databases (e.g. PubChem [20]), so there experimental trends from Ref. [5] for structures and
is some overlap between the viscosities from Ref. [5] and temperatures already seen in the training set, suggesting

Fig. 6 QSPR performance on six battery-relevant solvents. Predictions of descriptor-based LGBM model and GNN-based EdgePool model
when using two-dimensional descriptors (2D), molecular dynamics (MD) descriptors, or combinations of 2D and MD (2D and MD) in the QSPR
models for six battery electrolytes: A methyl acetate (MA); B ethyl acetate (EA); C methyl butyrate (MB); D methyl propionate (MP); E dimethyl
carbonate (DMC); and F ethyl methyl carbonate (EMC). Orange triangles represent experimental viscosities extracted from Ref [5]. MA, EA, MB,
and MP are in the training set and contain the temperature ranges that encompass those found in Ref [5]. DMC is partially in the training set such
that only two temperatures are provided to the models at T = 293.15 and 298.15 K. EMC is not within the training set at all. Molecular structures are
drawn in the upper right of each plot
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 12 of 14

consistency between the viscosity values from Ref. [5] to using two-dimensional descriptors alone, suggesting
and the viscosity dataset in this work. that using features that capture intermolecular interac-
For DMC (see Fig. 6E), the solvent is partially within tions can help improve predictions of viscosities. The
the training set such that only two temperatures at T improvement in prediction accuracy upon inclusion
= 293.15 K and 298.15 K have been seen by the model. of MD descriptors is most pronounced when training
Hence, the QSPR models would be extrapolating across a viscosity models using small datasets of less than 1000
wider range of temperatures between 280 K to 323 K that examples. Analyzing the top features related to viscos-
were experimentally varied in Ref. [5]. We observe that ity for the LGBM model reveal that MD descriptors
EdgePool with (cyan line) and without MD descriptors become most important to predicting viscosity, specifi-
(green line), as well as LGBM with 2D and MD descrip- cally the heat of vaporization that captures nonbonded
tors (blue line), can accurately capture the experimental interactions between molecules. Finally, the QSPR
viscosities. Interestingly, predictions from LGBM models models can accurately capture the inverse relationship
with 2D descriptors or MD descriptors alone have the between temperature and viscosity for six battery-rele-
largest deviation from the experimental viscosities, which vant solvents.
suggests that combining 2D and MD descriptors helped These results demonstrate that regardless of descrip-
improve generalizations across temperature. For EMC tor-based or graph-based models, the inclusion of MD
(see Fig. 6F), the solvent is not within the training set; descriptors that capture intermolecular interactions is
hence, QSPR models would be predicting on a new mole- useful for prediction of viscosities, especially at small
cule. We observe similar trends as in Fig. 6E, where Edge- data sizes. The usefulness of MD descriptors may be
Pool with and without MD descriptors accurately capture even more relevant for mixture systems, where MD
experimental viscosity trends. LGBM with 2D and MD descriptors could more broadly generalize since they
descriptors outperform models trained with 2D or MD are not single-molecule-dependent as compared to
descriptors alone in capturing experimental trends. two-dimensional structural descriptors. However, one
Altogether, the predictions on the six battery-relevant of the drawbacks of using MD descriptors is the com-
solvents show that these QSPR models can: (1) capture putational cost to generate them. The improvement in
the inverse relationship between log viscosity and tem- accuracy from using MD at the small data scale, gener-
perature, (2) predict temperature-dependent viscosities alizability of MD descriptors to heterogeneous systems,
of new structures, and (3) improve in generalizability by and generating automated computational workflows
inclusion of MD descriptors for descriptor-based LGBM may help outweigh the cost of computing these
models. Given that the EdgePool model without the descriptors. Future work will investigate the utility of
inclusion of MD descriptors performed well on battery MD descriptors in predicting viscosities for mixture
solvents shown in Figure 6, we use this model to predict systems, such as binary mixtures explored in a recent
log viscosities for other solvents related to battery elec- work [15] (Additional file 3).
trolyte design for lithium metal anodes from Ref. [66].
Viscosity predictions for 50 solvents at temperature
ranges between 270 and 330 K are available in the Addi- Scientific contribution
tional file 1: Section S4.3. Future work will focus on using
these models to screen new compounds to identify mate- • Curated a viscosity dataset of more than 4000 exam-
rials with promising viscosities. ples consisting of small organic molecules and
trained quantitative structure property relationships
Conclusion (QSPR) models to accurately predict viscosity as a
In this work, we developed quantitative structure– function of temperature.
property relationships (QSPR) to predict temperature- • Encoding molecular dynamics (MD) simulation-
dependent viscosities of small organic molecules using derived descriptors that capture intermolecular
a curated dataset of over 4000 experimental viscosities. interactions improve viscosity prediction, especially
Both descriptor-based and graph-based models were in small data scenarios.
benchmarked to identify the best machine learning • Feature importance analysis reveal that MD-derived
algorithms that could accurately predict experimen- heat of vaporization is found to be the most useful
tal viscosities, which were the light gradient-boosting descriptor relevant to viscosity even in the presence
machine (LGBM) algorithm and EdgePool algorithms of hundreds of two-dimensional descriptors.
for descriptor-based and graph-based approaches,
respectively. Including molecular dynamics (MD)
descriptors slightly improved QSPR models compared
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 13 of 14

Supplementary Information 5. Logan ER, Tonita EM, Gering KL, Li J, Ma X, Beaulieu LY, Dahn JR (2018) A
study of the physical properties of li-ion battery electrolytes containing
The online version contains supplementary material available at https://doi.
esters. J Electrochem Soc 165(2):A21
org/10.1186/s13321-024-00820-5.
6. Santak P, Conduit G (2020) Enhancing NEMD with automatic shear rate
sampling to model viscosity and correction of systematic errors in mod-
Additional file 1. This file contains details of the curated viscosity dataset, eling density: application to linear and light branched alkanes. J Chem
how molecular dynamics descriptors are computed, the correlation Phys 153(1):014102
between top descriptors and viscosity, the stability of molecular dynamics 7. Mohanty S, Stevenson J, Browning AR, Jacobson L, Leswing K, Halls MD,
descriptors, and hyperparameters for QSPR models. Afzal MAF (2023) Development of scalable and generalizable machine
Additional file 2. Viscosity dataset used for generating ML models. learned force field for polymers. Sci Rep 13(1):17251
8. Reid RC, Prausnitz JM, Poling BE (1987) The properties of gases and
Additional file 3. Viscosity predictions for 50 battery-relevant solvents at liquids, 4th edn. McGraw-Hill, New York
temperature ranges between 270 K - 330 K. 9. Jovanović JD, Grozdanić ND, Radović IR, Kijevčanin ML (2023) A new
group contribution model for prediction liquid hydrocarbon viscosity
based on free-volume theory. J Mol Liq 376:121452
Acknowledgements 10. Zhu Ling, Chen Jiaqing, Liu Yan, Geng Rongmei, Junjie Yu (2012)
We are grateful to the data team at Schrödinger for their assistance in the Experimental analysis of the evaporation process for gasoline. J Loss Prev
data curation of literature viscosity values, namely Asela Chandrasinge, Sophia Process Ind 25(6):916–922
Newman, and Mohammed Sulaiman. 11. Poling BE, Prausnitz JM, O’Connell JP (2000) The properties of gases and
liquids, 5th edn. McGraw Hill professional, McGraw Hill LLC, New York
Author contributions 12. Jiang D, Zhenxing W, Hsieh C-Y, Chen G, Liao B, Wang Z, Shen C, Cao D,
M.A.F.A. conceived the idea; A.K.C. and M.S. worked on the initial idea and Jian W, Hou T (2021) Could graph neural networks learn better molecular
report; J.C.E. helped in cleaning the initial data and conversion to SMILES; representation for drug discovery? A comparison study of descriptor-
A.K.C. extended the work by adding new data, adding advanced machine based and graph-based models. J Chem 13(1):1–23
learning algorithms, and implementing feature importance; A.K.C. wrote the 13. Reiser Patrick, Neubert Marlen, Eberhard André, Torresi Luca, Zhou Chen,
manuscript; M.A.F.A. supervised the work; all authors modified and approved Shao Chen, Metni Houssam, van Hoesel Clint, Schopmans Henrik, Som-
the manuscript. mer Timo et al (2022) Graph neural networks for materials science and
chemistry. Commun Mater 3(1):93
Data availibility 14. Zhenqin W, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS,
Only 3582 of 4440 examples of the viscosity dataset are made available due to Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular
copyright restrictions as described in the Additional file 2. The subset viscosity machine learning. Chem Sci 9(2):513–530
dataset and a pre-trained LGBM model using the subset dataset are provided 15. Bilodeau C, Kazakov A, Mukhopadhyay S, Emerson J, Kalantar T, Muzny C,
under the Creative Commons Non-Commercial 4.0 International (CC-BY-NC Jensen K (2023) Machine learning for predicting the viscosity of binary
4.0) Attribution License. This license allows for the use of the data and the liquid mixtures. Chem Eng J 464:142454
creation of adaptations, exclusively for non-commercial purposes, provided 16. Saldana DA, Starck L, Mougin P, Rousseau B, Ferrando N, Creton B (2012)
that appropriate credit is given. The additional file contains details of the Prediction of density and viscosity of biofuel compounds using machine
curated viscosity dataset, how molecular dynamics descriptors are computed, learning methods. Energy Fuels 26(4):2416–2426
correlation between top descriptors and viscosity, stability of molecular 17. Viswanath DS, Ghosh TK, Prasad DHL, Dutt NVK, Rani KY, Viswanath DS,
dynamics descriptors, hyperparameters for QSPR models, and availability of Ghosh TK, Prasad DHL, Dutt NVK, Rani KY (2007) Correlations and estima-
the viscosity dataset and model. tion of pure liquid viscosity. In: Viscosity of liquids: theory, estimation,
experiment, and data, pp 135–405
Declarations 18. Cocchi Marina, Benedetti Pier Giuseppe De, Seeber Renato, Tassi Lorenzo,
Ulrici Alessandro (1999) Development of quantitative structure- property
Competing interests relationships using calculated descriptors for the prediction of the phys-
The authors declare no competing interests. icochemical properties (n d, ρ , bp, ε , η) of a series of organic solvents. J
Chem Inform Comput Sci 39(6):1190–1203
Author details 19. Kauffman Gregory W, Jurs Peter C (2001) Prediction of surface tension,
1
Schrödinger, Inc., New York 10036, USA. 2 Schrödinger, Inc., Portland, OR viscosity, and thermal conductivity for common organic solvents using
97204, USA. 3 Schrödinger, Inc., San Diego, CA 92121, USA. quantitative structure- property relationships. J Chem Inform Comput Sci
41(2):408–418
Received: 13 July 2023 Accepted: 27 February 2024 20. Kim Sunghwan, Thiessen Paul A, Cheng Tiejun, Zhang Jian, Gindulyte
Asta, Bolton Evan E (2019) Pug-view: programmatic access to chemical
annotations integrated in PubChem. J Cheminform 11(1):1–11
21. Dean JA et al (1999) Lange’s handbook of chemistry, 5th edn. Universitas
Of Tennese Knoxville, Mc. Graw Hill Inc, New York
22. Wasburn WE (2003) International critical tables of numerical data, physics,
References
chemistry and technology, 1st edn. Knovel, Norwich
1. Conte E, Martinho A, Matos HA, Gani R (2008) Combined group-contri-
23. Rumble John R (2022) CRC handbook of chemistry and physics, 103rd
bution and atom connectivity index-based methods for estimation of
edn. CRC Press, Boca Raton
surface tension and viscosity. Ind Eng Chem Res 47(20):7940–7954
24. Manivannan RG, Mohammad S, McCarley K, Cai T, Aichele C (2019) A
2. Goussard V, Duprat F, Ploix J-L, Dreyfus G, Nardello-Rataj V, Aubry J-M
new test system for distillation efficiency experiments at elevated liquid
(2020) A new machine-learning tool for fast estimation of liquid viscosity:
viscosities: vapor-liquid equilibrium and liquid viscosity data for cyclo-
application to cosmetic oils. J Chem Inf Model 60(4):2012–2023
pentanol+ cyclohexanol. J Chem Eng Data 64(2):696–705
3. Chen Y, Peng B, Kontogeorgis GM, Liang X (2022) Machine learning
25. Chen X, Jin S, Dai Y, Jianzhou W, Guo Y, Lei Q, Fang W (2019) Densities
for the prediction of viscosity of ionic liquid-water mixtures. J Mol Liq
and viscosities for the ternary system of decalin+ methylcyclohexane+
350:118546
cyclopentanol and corresponding binaries at t= 293.15 to 343.15 k. J
4. Dajnowicz S, Agarwal G, Stevenson JM, Jacobson LD, Ramezanghorbani
Chem Eng Data 64(4):1414–1424
F, Leswing K, Friesner RA, Halls MD, Abel R (2022) High-dimensional
26. Burk V, Pollak S, Quinones-Cisneros SE, Schmidt KAG (2021) Complemen-
neural network potential for liquid electrolyte simulations. J Phys Chem B
tary experimental data and extended density and viscosity reference
126(33):6271–6280
models for squalane. J Chem Eng Data 66(5):1992–2005
Chew et al. Journal of Cheminformatics (2024) 16:31 Page 14 of 14

27. Bright Norman FH, Hutchison H, Smith D (1946) The viscosity and density 49. Chao L, Chuanjie W, Ghoreishi D, Chen W, Wang L, Damm W, Ross GA,
of sulphuric acid and oleum. J Soc Chem Ind 65(12):385–388 Dahlgren MK, Russell E, Von Bargen CD et al (2021) Opls4: improving force
28. Segur JB, Oberstar HE (1951) Viscosity of glycerol and its aqueous solu- field accuracy on challenging regimes of chemical space. J Chem Theor
tions. Ind Eng Chem 43(9):2117–2120 Comput 17(7):4291–4300
29. Landrum G et al. (2010) Rdkit. Q2.https://www.rdkit.org/. Accessed Jan – 50. Zahrt AF, Henle JJ, Denmark SE (2020) Cautionary guidelines for
Apr 2023 machine learning studies with combinatorial datasets. ACS Comb Sci
30. Ward L, Dunn A, Faghaninia A, Zimmermann NE, Bajaj S, Wang Q, Mon- 22(11):586–591
toya J, Chen J, Bystrom K, Dylla M et al (2018) Matminer: an open source 51. Dixon SL, Duan J, Smith E, Von Bargen CD, Sherman W, Repasky MP
toolkit for materials data mining. Comput Mater Sci 152:60–69 (2016) AutoQSAR: an automated machine learning tool for best-practice
31. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) quantitative structure-activity relationship modeling. Future Med Chem
Lightgbm: a highly efficient gradient boosting decision tree. In: Guyon 8(15):1825–1839
I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R 52. Lundberg SM, Lee S-I (2017) A unified approach to interpreting model
(eds) Advances in neural information processing systems, vol 30. Curran predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R,
Associates Inc, New York Vishwanathan S, Garnett R (eds) Advances in Neural Information Process-
32. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: ing Systems vol. 30. Curran Associates, Inc., pp 4765–4774. http://papers.
Proceedings of the 22nd ACM SIGKDD International Conference on nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predi
Knowledge Discovery and Data Mining, KDD’16, ACM, New York. pp ctions.pdf. Accessed Jan – Apr 2023
785–794. https://doi.org/10.1145/2939672.2939785 53. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz
33. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, R, Himmelfarb J, Bansal N, Lee S-I (2020) From local explanations to
Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, global understanding with explainable ai for trees. Nat Mach Intell
Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: 2(1):2522–5839
machine learning in Python. J Mach Learn Res 12:2825–2830 54. Molnar C (2022) Interpretable machine learning. 2nd edn. https://chris
34. Yang Y, Yao K, Repasky MP, Leswing K, Abel R, Shoichet BK, Jerome SV tophm.github.io/interpretable-ml-book. Accessed Jan – Apr 2023
(2021) Efficient exploration of chemical space with docking and deep 55. Rodríguez-Pérez R, Bajorath J (2019) Interpretation of compound activity
learning. J Chem Theor Comput 17(11):7106–7119 predictions from complex machine learning models using local approxi-
35. Benchmark study of deepautoqsar, chemprop, and deeppurpose on the mations and shapley values. J Med Chem 63(16):8761–8777
admet subset of the therapeutic data commons (2022) https://www. 56. Bannigan P, Bao Z, Hickman RJ, Aldeghi M, Häse F, Aspuru-Guzik A, Allen
schrodinger.com/sites/default/files/22_086_machine_learning_white_ C (2023) Machine learning models to accelerate the design of polymeric
paper_r4-1.pdf. Accessed 4 May 2024 long-acting injectables. Nat Commun 14(1):35
36. Kipf TN, Welling M (2016) Semi-supervised classification with graph 57. Afzal MAF, Sonpal A, Haghighatlari M, Schultz AJ, Hachmann J (2019)
convolutional networks. arXiv preprint arXiv:1609.02907 A deep neural network model for packing density predictions and its
37. Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru- application in the study of 1.5 million organic molecules. Chem Sci
Guzik A, Adams RP (2015) Convolutional networks on graphs for learning 10(36):8374–8383
molecular fingerprints. In: Advances in neural information processing 58. Wellawatte GP, Gandhi HA, Seshadri A, White AD (2022) A perspective
systems, p 28 on explanations of molecular prediction models. J Chem Theor Comput.
38. Knyazev B, Taylor GW, Amer M (2019) Understanding attention and gen- https://doi.org/10.1021/acs.jctc.2c01235
eralization in graph neural networks. In: Advances in neural information 59. Sanchez-Lengeling B, Wei J, Lee B, Reif E, Wang P, Qian W, McCloskey
processing systems, p 32 K, Colwell L, Wiltschko A (2020) Evaluating attribution for graph neural
39. Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning networks. Adv Neural Inf Process Syst 33:5898–5910
on large graphs. In: Advances in neural information processing systems, p 60. Huang Q, Yamada M, Tian Y, Singh D, Chang Y (2022) Graphlime: local
30 interpretable model explanations for graph neural networks. IEEE Trans
40. Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural Knowl Data Eng
networks? arXiv preprint arXiv:1810.00826, 61. Weber JK, Morrone JA, Bagchi S, Estrada JD, Pabon SK, Zhang L, Cornell
41. Lee J, Lee I, Kang J (2019) Self-attention graph pooling. In: International WD (2022) Simplified, interpretable graph convolutional neural networks
conference on machine learning, PMLR. pp 3734–3743 for small molecule activity prediction. J Comput-Aided Mol Des. https://
42. Diehl F (2019) Edge contraction pooling for graph neural networks. arXiv doi.org/10.1007/s10822-021-00421-6
preprint arXiv:1905.10990 62. Rodríguez-Pérez R, Bajorath J (2020) Interpretation of machine learning
43. Vinyals O, Bengio S, Kudlur M (2015) Order matters: sequence to models using shapley values: application to compound potency and
sequence for sets. arXiv preprint arXiv:1511.06391 multi-target activity predictions. J Comput-Aided Mol Des 34:1013–1026
44. Zhang M, Cui Z, Neumann M, Chen Y (2018) An end-to-end deep learn- 63. Bonchev D, Trinajstić N (1977) Information theory, distance matrix, and
ing architecture for graph classification. In: Proceedings of the AAAI molecular branching. J Chem Phys 67(10):4517–4533
conference on artificial intelligence, vol. 32 64. Qun-Fang L, Yu-Chun H, Rui-Sen L (1997) Correlation of viscosities of pure
45. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, liquids in a wide temperature range. Fluid Ph Equilib 140(1–2):221–231
Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, 65. Miller AA (1963) “Free volume’’ and the viscosity of liquid water. J Chem
Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: Phys 38(7):1568–1571
an imperative style, high-performance deep learning library. In Advances 66. Kim SC, Oyakhire ST, Athanitis C, Wang J, Zhang Z, Zhang W, Boyle DT,
in Neural Information Processing Systems vol. 32. Curran Associates, Inc., Kim MS, Yu Z, Gao X et al (2023) Data-driven electrolyte design for lithium
pp 8024–8035. http://papers.neurips.cc/paper/9015-pytorch-an-imper metal anodes. Proc Natl Acad Sci 120(10):e2214357120
ative-style-high-performance-deep-learning-library.pdf. Accessed Jan –
Apr 2023
46. Version 2022–2 Materials Science Suite (2022) Schrödinger, llc, New York. Publisher’s Note
https://www.schrodinger.com/platform/materials-science. Accessed Jan Springer Nature remains neutral with regard to jurisdictional claims in pub-
– Apr 2023 lished maps and institutional affiliations.
47. Bowers KJ, Chow E, Xu H, Dror RO, Eastwood MP, Gregersen BA, Klepeis JL,
Kolossvary I, Moraes MA, Sacerdoti FD, et al (2006) Scalable algorithms for
molecular dynamics simulations on commodity clusters. In: Proceedings
of the 2006 ACM/IEEE Conference on Supercomputing, p. 84
48. Afzal MAF, Browning AR, Goldberg A, Halls MD, Gavartin JL, Morisato
T, Hughes TF, Giesen DJ, Goose JE (2020) High-throughput molecular
dynamics simulations and validation of thermophysical properties of
polymers for various applications. ACS Appl Polym Mater 3(2):620–630

Abusag Bridge Project: Structural Calculation For The Proposed Sheet Pile
No ratings yet
Abusag Bridge Project: Structural Calculation For The Proposed Sheet Pile
20 pages
Chemie Ingenieur Technik - 2022 - Micale - Computational Fluid Dynamics of Reacting Flows at Surfaces Methodologies and
No ratings yet
Chemie Ingenieur Technik - 2022 - Micale - Computational Fluid Dynamics of Reacting Flows at Surfaces Methodologies and
18 pages
Chemie Ingenieur Technik - 2022 - Micale - Computational Fluid Dynamics of Reacting Flows at Surfaces Methodologies and
No ratings yet
Chemie Ingenieur Technik - 2022 - Micale - Computational Fluid Dynamics of Reacting Flows at Surfaces Methodologies and
18 pages
s13321-024-00899-w
No ratings yet
s13321-024-00899-w
11 pages
1-s2.0-S0921883124000955-main
No ratings yet
1-s2.0-S0921883124000955-main
13 pages
A Self-Attention Based Message Passing Neural Netw
No ratings yet
A Self-Attention Based Message Passing Neural Netw
10 pages
AIChEJ-Kumar2024-在气固流的CFD模拟中捕获介观结构
No ratings yet
AIChEJ-Kumar2024-在气固流的CFD模拟中捕获介观结构
12 pages
Validation Metrics For Turbulent Plasma Transport
No ratings yet
Validation Metrics For Turbulent Plasma Transport
32 pages
Physical Review Research, 043210 (2022) : 10.1103/physrevresearch.4.043210
No ratings yet
Physical Review Research, 043210 (2022) : 10.1103/physrevresearch.4.043210
49 pages
2022 - A review of molecular representation in the age of machine learning
No ratings yet
2022 - A review of molecular representation in the age of machine learning
19 pages
26 Augmented Hill-Climb Increases Reinforcement Learning Efficiency For Language-Based de Novo Molecule Generation.
No ratings yet
26 Augmented Hill-Climb Increases Reinforcement Learning Efficiency For Language-Based de Novo Molecule Generation.
22 pages
2307.03811v3
No ratings yet
2307.03811v3
37 pages
machine learn
No ratings yet
machine learn
6 pages
TRAN_MASA_202402_P1_ARTBAS_CSCD
No ratings yet
TRAN_MASA_202402_P1_ARTBAS_CSCD
22 pages
GNN ML For Materials ACS 2019
No ratings yet
GNN ML For Materials ACS 2019
31 pages
Comparative Analysis of Nucleotide Translocation T
No ratings yet
Comparative Analysis of Nucleotide Translocation T
11 pages
Comfa and Related Approaches
No ratings yet
Comfa and Related Approaches
23 pages
Master in Maritime Operation
No ratings yet
Master in Maritime Operation
17 pages
AIChE Journal - 2015 - Zhang
No ratings yet
AIChE Journal - 2015 - Zhang
15 pages
42-51CFDMLReview Updated1 (1)
No ratings yet
42-51CFDMLReview Updated1 (1)
11 pages
Providing a Photovoltaic Performance Enhancement Relationship from Binary to Ternary Polymer Solar Cells via Machine Learning
No ratings yet
Providing a Photovoltaic Performance Enhancement Relationship from Binary to Ternary Polymer Solar Cells via Machine Learning
16 pages
Dong2021 PDF
No ratings yet
Dong2021 PDF
55 pages
1-s2.0-S0921883123003473-main
No ratings yet
1-s2.0-S0921883123003473-main
14 pages
Song 2002
No ratings yet
Song 2002
17 pages
Loosely Coupled Method
No ratings yet
Loosely Coupled Method
18 pages
Polymers 10 00103 PDF
No ratings yet
Polymers 10 00103 PDF
16 pages
Prediction_of_Geopolymer_Concrete_Compressive_Stre
No ratings yet
Prediction_of_Geopolymer_Concrete_Compressive_Stre
32 pages
Scalable Quantum Simulation of Molecular Energies: Doi: Subject Areas: Condensed Matter Physics, Quantum Information
No ratings yet
Scalable Quantum Simulation of Molecular Energies: Doi: Subject Areas: Condensed Matter Physics, Quantum Information
13 pages
Processes-11-03325 Machine Learning
No ratings yet
Processes-11-03325 Machine Learning
40 pages
Prediction of Organic Compound Aqueous Solubility Using Machine Learning: A Comparison Study of Descriptor-Based and Fingerprints-Based Models
No ratings yet
Prediction of Organic Compound Aqueous Solubility Using Machine Learning: A Comparison Study of Descriptor-Based and Fingerprints-Based Models
16 pages
Generalized Diffusion RUL
No ratings yet
Generalized Diffusion RUL
21 pages
Modeling and Simulation of Photobioreactors With Computational Fluid Dynamics-A Comprehensive Review
No ratings yet
Modeling and Simulation of Photobioreactors With Computational Fluid Dynamics-A Comprehensive Review
63 pages
Asian Journal - 2022 - Machine Learning Applications For Chemical Reactions
No ratings yet
Asian Journal - 2022 - Machine Learning Applications For Chemical Reactions
16 pages
Version Publicada Multiscale Failure
No ratings yet
Version Publicada Multiscale Failure
27 pages
ML for composites
No ratings yet
ML for composites
11 pages
Malpica Preprint PyCSP 2022
No ratings yet
Malpica Preprint PyCSP 2022
34 pages
lamps - cpc
No ratings yet
lamps - cpc
34 pages
Artificial Intelligence and Design of Experiments for Res 2024 Energy Strate
No ratings yet
Artificial Intelligence and Design of Experiments for Res 2024 Energy Strate
29 pages
Fatigue Fract Eng Mat Struct - 2021 - Silva - Machine Learning and Finite Element Analysis An Integrated Approach For
No ratings yet
Fatigue Fract Eng Mat Struct - 2021 - Silva - Machine Learning and Finite Element Analysis An Integrated Approach For
15 pages
Bilodeau-Generative Models For Molecular Discovery-Recent Advances and challenges-article-2022-NA
No ratings yet
Bilodeau-Generative Models For Molecular Discovery-Recent Advances and challenges-article-2022-NA
17 pages
Fast Design of Catalyst Layer With Optimal Electrical-Thermal-Water Performance For Proton Exchange Membrane Fuel Cells
No ratings yet
Fast Design of Catalyst Layer With Optimal Electrical-Thermal-Water Performance For Proton Exchange Membrane Fuel Cells
14 pages
Efficient Machine Learning Force Field For Large-Scale Molecular Simulations of Organic Systems
No ratings yet
Efficient Machine Learning Force Field For Large-Scale Molecular Simulations of Organic Systems
23 pages
s13321-025-00974-w
No ratings yet
s13321-025-00974-w
11 pages
0350 820X1803371V
No ratings yet
0350 820X1803371V
15 pages
Coupling Building Simu With CFD
No ratings yet
Coupling Building Simu With CFD
44 pages
10.1515_jib-2022-0034
No ratings yet
10.1515_jib-2022-0034
15 pages
Advancing Vapor Pressure Prediction A Machine Learning Approach With Directed Message Passing Neural Networks
No ratings yet
Advancing Vapor Pressure Prediction A Machine Learning Approach With Directed Message Passing Neural Networks
31 pages
Mcclean 2016
No ratings yet
Mcclean 2016
23 pages
EL Copy
No ratings yet
EL Copy
16 pages
Biomass paper
No ratings yet
Biomass paper
11 pages
1-s2.0-S1000936124004242-main
No ratings yet
1-s2.0-S1000936124004242-main
14 pages
Quantile Based Optimization Under Uncertainties for Comple 2025 Chinese Jour
No ratings yet
Quantile Based Optimization Under Uncertainties for Comple 2025 Chinese Jour
13 pages
Qsar With Python
No ratings yet
Qsar With Python
18 pages
(Mechanical Engineering Series) Hamid Arastoopour, Dimitri Gidaspow, Emad Abbasi (Auth.) - Computational Transport Phenomena of Fluid-Particle Systems-Springer International Publishing (2017)
No ratings yet
(Mechanical Engineering Series) Hamid Arastoopour, Dimitri Gidaspow, Emad Abbasi (Auth.) - Computational Transport Phenomena of Fluid-Particle Systems-Springer International Publishing (2017)
114 pages
Polymers 16 01425
No ratings yet
Polymers 16 01425
18 pages
Data Driven Models For The Design of Rocket Injector Elements
No ratings yet
Data Driven Models For The Design of Rocket Injector Elements
30 pages
GROMACS in The Cloud - A Global Supercomputer To Speed Up Alchemical Drug Design
No ratings yet
GROMACS in The Cloud - A Global Supercomputer To Speed Up Alchemical Drug Design
59 pages
Processes 09 00950 v2
No ratings yet
Processes 09 00950 v2
16 pages
Application of Machine-Learning Algorithms To Predict The Transport Properties of Mie Fluids
No ratings yet
Application of Machine-Learning Algorithms To Predict The Transport Properties of Mie Fluids
12 pages
Mechanical Properties of Nanostructured Materials: Quantum Mechanics and Molecular Dynamics Insights
From Everand
Mechanical Properties of Nanostructured Materials: Quantum Mechanics and Molecular Dynamics Insights
Abdolhossein Fereidoon
No ratings yet
Connectivity Prediction in Mobile Ad Hoc Networks for Real-Time Control
From Everand
Connectivity Prediction in Mobile Ad Hoc Networks for Real-Time Control
Sebastian Thelen
5/5 (1)
Gr6 Maths INTEGERS WS - Answerkey
No ratings yet
Gr6 Maths INTEGERS WS - Answerkey
10 pages
R. K. Educational School: Annual Exam (2023-24) Syllabus & Date Sheet For Class XI
No ratings yet
R. K. Educational School: Annual Exam (2023-24) Syllabus & Date Sheet For Class XI
1 page
Single-Chip IC: For Home Stereo IC With Electronic Tuning Support
No ratings yet
Single-Chip IC: For Home Stereo IC With Electronic Tuning Support
9 pages
HW3-Forces and motion-5 (学生版) - 022532
No ratings yet
HW3-Forces and motion-5 (学生版) - 022532
16 pages
Complete Heat Transfer Calculations 1st Edition Myer Kutz (Ed) PDF For All Chapters
100% (10)
Complete Heat Transfer Calculations 1st Edition Myer Kutz (Ed) PDF For All Chapters
70 pages
Plasticstrain
No ratings yet
Plasticstrain
3 pages
Minimum Spectral Light Requirements and Maximum Light Levels For Long Term Germling Growth of Several Red Algae From Different Water Depths and A
No ratings yet
Minimum Spectral Light Requirements and Maximum Light Levels For Long Term Germling Growth of Several Red Algae From Different Water Depths and A
11 pages
21-22 - S5 - Math - Mid-Yr - Exam - P2 Solutions
No ratings yet
21-22 - S5 - Math - Mid-Yr - Exam - P2 Solutions
13 pages
01 STB-S19-00 - Datenblatt Nettowaage EN v03
No ratings yet
01 STB-S19-00 - Datenblatt Nettowaage EN v03
2 pages
Omron G7L 2A BUBJ CB AC24 Datasheet
No ratings yet
Omron G7L 2A BUBJ CB AC24 Datasheet
14 pages
Mechanisms
No ratings yet
Mechanisms
10 pages
Anchor Rod
No ratings yet
Anchor Rod
17 pages
9709_w13_qp_41
No ratings yet
9709_w13_qp_41
4 pages
Earth Solar System: Unit 7: Beyond Module 22: Our
No ratings yet
Earth Solar System: Unit 7: Beyond Module 22: Our
25 pages
IX4340
No ratings yet
IX4340
10 pages
4 - Laced and Battened Columns
No ratings yet
4 - Laced and Battened Columns
37 pages
MagneticShapes Puzzles ButterflyFields
No ratings yet
MagneticShapes Puzzles ButterflyFields
82 pages
Strength of Material Question Baks
No ratings yet
Strength of Material Question Baks
33 pages
Agss Plant Iom
No ratings yet
Agss Plant Iom
31 pages
CL Xii Phy Vlab E6
No ratings yet
CL Xii Phy Vlab E6
5 pages
Math Project Complete Database
No ratings yet
Math Project Complete Database
11 pages
Chemical Bonding
No ratings yet
Chemical Bonding
43 pages
Self Evaluation 1
No ratings yet
Self Evaluation 1
3 pages
Assign 1 2025 KM RM
No ratings yet
Assign 1 2025 KM RM
13 pages
Power Quality in Power Systems, Electrical Machines, and Power-Electronic Drives 3rd Edition Fuchs all chapter instant download
100% (1)
Power Quality in Power Systems, Electrical Machines, and Power-Electronic Drives 3rd Edition Fuchs all chapter instant download
50 pages
Download Full Polymer Structure Characterisation From Nano to Macro Morphological its molecular origins Issues in Environmental Scienc 1st Edition R. Pethrick PDF All Chapters
100% (7)
Download Full Polymer Structure Characterisation From Nano to Macro Morphological its molecular origins Issues in Environmental Scienc 1st Edition R. Pethrick PDF All Chapters
82 pages
Open Source Plans For FreeEnergy Devices
90% (20)
Open Source Plans For FreeEnergy Devices
12 pages
Ultrasonic Interferometer
No ratings yet
Ultrasonic Interferometer
12 pages
Pavement Engineering 4300:565 Assignment #1 Spring 2014: Submitted by Mir Shahnewaz Arefin Student Id: 2824475
No ratings yet
Pavement Engineering 4300:565 Assignment #1 Spring 2014: Submitted by Mir Shahnewaz Arefin Student Id: 2824475
8 pages

Advancing Material Property Prediction Using Physics-Informed Machine Learning Models for Viscosity

Uploaded by

Advancing Material Property Prediction Using Physics-Informed Machine Learning Models for Viscosity

Uploaded by

Chew et al.

Journal of Cheminformatics (2024) 16:31 Journal of Cheminformatics

RESEARCH Open Access

Advancing material property prediction:

Introduction can substantially reduce cost and time by learning the

temperature-dependent viscosities. We hypothesized including MD descriptors becomes statistically insig-

When using 2D descriptors only (see Fig. 5A), “MOE-

You might also like