
pubs.acs.org/crt Review

Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models

Marcus W. H. Wang, Jonathan M. Goodman,* and Timothy E. H. Allen

Cite This: https://dx.doi.org/10.1021/acs.chemrestox.0c00316


ABSTRACT: In recent times, machine learning has become increasingly prominent in predictive toxicology as the field has shifted from in vivo studies toward in silico studies. Currently, in vitro methods together with other computational methods, such as quantitative structure−activity relationship modeling and absorption, distribution, metabolism, and excretion calculations, are being used. An overview of machine learning and its applications in predictive toxicology is presented here, including support vector machines (SVMs), random forest (RF) and decision trees (DTs), neural networks, regression models, naïve Bayes, k-nearest neighbors, and ensemble learning. The recent successes of these machine learning methods in predictive toxicology are summarized, and a comparison of some models used in predictive toxicology is presented. In predictive toxicology, SVMs, RF, and DTs are the dominant machine learning methods due to the characteristics of the data available. Lastly, this review describes the current challenges facing the use of machine learning in predictive toxicology and offers insights into possible areas of improvement in the field.

■ INTRODUCTION
Machine learning is a recent field that has advanced computational chemistry, with numerous applications such as drug discovery, cheminformatics, and predictive toxicology.1−21 Machine learning generally involves building a model, training the model, performing validation, repeating training and validation until a suitable model is obtained, and finally testing the model on data not previously exposed to it (Figure 1). The goal of a machine learning model is to pick out patterns from the input data, or generalize these data, and apply the results to unknown test data.3−5,15−19,21

Figure 1. Outline of machine learning.

These models can be used for predictive toxicology, a field that revolves around in silico predictions of in vivo toxicological effects, for example for drug candidates or drugs,5,9,10,15,22−26 consumer products,27 agrochemicals,28 and foods.29,30 Historically, the toxicity of new chemicals was determined through in vivo studies, preclinical trials, and clinical trials.22−26,31,32 The toxicity of a drug or drug candidate should be determined before it reaches the market or before any clinical trials are performed, where there is a risk of causing severe adverse effects to humans.22,23,25,26,31−33 However, determining the toxicity of new compounds is challenging due to a wide variety of potential metabolic products,33,34 idiosyncratic effects which occur only in small parts of the population,23,26,32,33 and the general complexity of in vivo systems. These difficulties in determining toxicity have resulted in drug withdrawals or drug candidates being terminated at the preclinical or clinical phase.22−26,31−36 Generally, the drug attrition rate is reported to be about 90%−95% for phase I trials.30,37 Such a high failure rate only contributes to the high cost of drug development, which totalled $2.59 billion in the United States in 2014 and is expected to continue increasing in the future.38 For these reasons, and due to ethical concerns and regulatory advances,39−41 there has been a shift toward the use of in silico methods for predicting the toxicity of compounds.30,42−46

Special Issue: Computational Toxicology
Received: July 31, 2020
In silico methods for predictive toxicology include algorithms such as machine learning, quantitative structure−activity relationship (QSAR) modeling, expert-based systems, and read-across.15,16,19,47−49 We will focus on machine learning in predictive toxicology and cover a variety of toxicity end points, of which the main types are hepatotoxicity, carcinogenicity/genotoxicity/mutagenicity, and cardiotoxicity. We also concentrate on classification algorithms, which treat toxicity as an active/inactive question, rather than on regression models, which attempt to make quantitative predictions of toxicity. To predict toxicity for drugs and drug candidates, several software packages have been developed, including Derek and the OECD Toolbox.50−53 Machine learning models are generally treated as black boxes, and the decisions made by them are sometimes unclear, or at least unexplained. In contrast, some mechanistic QSAR models and expert systems can be understood and interpreted, although this is not true for all of them. Machine learning also has the advantage that it is often able to handle complex problems more effectively, and it scales well to many different tasks, as evidenced by the successes so far in predictive toxicology.5,16−18,20,21,54−56

Some consider machine learning algorithms to be a type of QSAR, and some consider them to be different. In general, QSAR modeling refers to using a structure−activity relationship to model a quantitative prediction of a label. Machine learning, on the other hand, refers to using a statistical technique to generalize the data and obtain predictions based on the model. In machine learning, structure−activity relationships can be used to model the data, which might give rise to confusion between the two types of modeling techniques. However, in machine learning other features, which need not be the structure, can also be used to model the data. Similarly, the activity of the compound need not be the predicted label, and other labels, such as toxicity end points, can be used.

In predictive toxicology, common databases used to build data sets for machine learning models include ChEMBL, ToxCast, and PubChem.57−64 These databases contain data for different groups of compounds. For example, ChEMBL has data for drug-like compounds, while ToxCast focuses more on industrial chemicals.59−63 Other sources of data include results provided by pharmaceutical companies and data extracted from published papers or the public domain.65−71 The data are subsequently processed for suitability as model input. For example, missing, invalid, or unnecessary data would usually be removed. Another part of data processing is working with imbalanced data, which is common in machine learning.70−81 Several methods have been employed to balance the classes in the data set to train good-quality models.70−81 These methods, which usually involve oversampling, undersampling, or a combination of both, are described in the literature.70−81 The checking and processing of the raw data, as well as checking the quality of the end-point data, are essential steps that are often overlooked; neglecting them can result in poor models being developed.

Once the data set has been prepared, the next step is to build a machine learning model. Many papers have been published on the models used in machine learning and its use in predictive toxicology.3,10,13,15−21,47,54,82−85 In general, these models can be classified into three categories: supervised learning, unsupervised learning, and semisupervised learning (Figure 2).

Figure 2. Different categories of machine learning models.

Supervised learning refers to models that are trained on a data set containing known inputs (features) and outputs (labels).86−90 The model predicts the labels, and the accuracy is defined by the difference between the predicted labels and the experimental labels. These models are usually used for classification purposes. Unsupervised learning refers to models that are trained on a data set containing known features but unknown labels; it is commonly used for clustering or pattern recognition.8,88,91−93 Lastly, semisupervised learning aims to improve model performance compared to the former two categories by making use of both labeled and unlabeled data.90,94−101

In predictive toxicology, supervised learning is commonly used, as it can classify the input data into different classes (binary or multiclass classification) or labels (multilabel classification).19,46,54,90,102,103 For example, a supervised learning model can be used to predict reproductive toxicity given the fingerprint of a compound.104 Supervised learning can also be used for regression-based tasks, such as predicting the quantitative value associated with compound toxicities.105 In contrast, unsupervised learning and semisupervised learning are less commonly used in predictive toxicology.

Here, we will focus on supervised machine learning in predictive toxicology and analyze several model types that have been used in the field. These model types will be introduced in order of increasing complexity:
(1) regression models
(2) k-nearest neighbors (kNNs)
(3) decision trees (DTs)
(4) naïve Bayes (NB)
(5) support vector machines (SVMs)
(6) random forest (RF) and ensemble learning
(7) neural networks (artificial neural networks (ANNs), deep neural networks (DNNs), and convolutional neural networks (CNNs))
Unlike related reviews on machine learning models in predictive toxicology,12,18,56 this review aims to give a new perspective on predictive toxicology by giving an overall picture of the situation as well as a more focused analysis of some toxicity end points.

In predictive toxicology, chemical structures are usually represented by features that can be processed by machine learning methods.54 These can take the form of molecular descriptors, molecular fingerprints, or both.54 Molecular descriptors include features such as atom count, logP, solubility, etc., and are commonly obtained using cheminformatics toolkits.54 Molecular fingerprints represent the molecule as a binary string, with each bit in the string corresponding to fragments or substructures in the molecule.54
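The class-balancing step mentioned above (oversampling, undersampling, or a combination) can be sketched in a few lines of standard Python. This is a minimal illustration, not code from the review: the function name and toy labels are invented here, and a real pipeline would more likely use a dedicated library such as imbalanced-learn.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Balance a binary data set by duplicating random minority-class examples
    until both classes are the same size (naive random oversampling)."""
    rng = random.Random(seed)
    pos = [(s, l) for s, l in zip(samples, labels) if l == 1]
    neg = [(s, l) for s, l in zip(samples, labels) if l == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    # Draw random duplicates from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = pos + neg + extra
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]

X = ["c1", "c2", "c3", "c4", "c5", "c6"]   # placeholder compound IDs
y = [1, 0, 0, 0, 0, 0]                     # 1 toxic vs 5 nontoxic: imbalanced
Xb, yb = random_oversample(X, y)           # classes now 5 vs 5
```

Random undersampling is the mirror image (discard random majority examples); both distort the class prior, which is why validation should always use the untouched test set.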

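The bit-per-fragment idea behind such fingerprints can be illustrated with a toy structural-key scheme. This is only a sketch: real MACCS keys or ECFPs are computed from the molecular graph with a cheminformatics toolkit such as RDKit, not by the SMILES substring matching used here, and the fragment list below is invented for illustration.

```python
# Toy structural-key fingerprint: each bit records the presence of one
# predefined fragment pattern. Substring matching on SMILES is chemically
# naive and used here only to show how a molecule maps to a bit string.
FRAGMENTS = ["c1ccccc1",  # aromatic benzene ring
             "C(=O)O",    # carboxylic acid / ester motif
             "N",         # nitrogen atom
             "Cl"]        # chlorine atom

def fingerprint(smiles: str) -> list[int]:
    return [1 if frag in smiles else 0 for frag in FRAGMENTS]

aspirin = "CC(=O)Oc1ccccc1C(=O)O"
bits = fingerprint(aspirin)   # ring and acid bits set, N and Cl bits clear
```

Each compound thus becomes a fixed-length binary vector, which is exactly the input shape the classifiers discussed below expect.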
More detailed explanations of this subject are available.17,20,54,78,92,106−123 Commonly used molecular fingerprints include Molecular ACCess System (MACCS) keys and extended-connectivity fingerprints (ECFPs) such as Morgan fingerprints.107,124−126

Finally, to determine the reliability of the results and the quality of these models, validation methods such as hold-out validation, k-fold cross-validation, or LOO cross-validation are used.5,12,18,119−121,127−133 Using validation methods to test the models allows for the assessment of the models' robustness and the reliability of the results obtained. Performance measures and metrics have also been reviewed recently.20,54,72,102,122,134,135

■ VALIDATION METHODS
Hold-out Validation. Hold-out validation refers to a method where the data set is split into a training set and a test set (Figure 3A).119,121 The data partitioned into the training set are used for training the model, and the model is validated by calculating performance statistics on the test set. In the case where the test set is independent of the data used for building the model, this is known as external or independent validation. Generally, the test set contains around 20% of the total data set. In some cases, the data are split into three partitions to prevent hyperparameter bias: the training set is used for training, the validation set for hyperparameter tuning, and the test set for the final performance review.

Figure 3. Common validation methods used in machine learning: (A) hold-out validation, (B) k-fold cross-validation, (C) LOO cross-validation.

k-Fold Cross-Validation. k-Fold cross-validation refers to a method where the data set is split into k groups (Figure 3B), with k being chosen based on the data set.120,121,127,130,133 Commonly used values of k include 5 and 10. One of the groups is set aside as the test set, while the model is trained on the remaining groups. This process is repeated iteratively until every group has been the test set once. Model performance is taken to be the average of the test set performance over all groups.

Leave-One-Out Cross-Validation. Leave-one-out (LOO) cross-validation is a special case of k-fold cross-validation where the number of groups equals the number of data points (Figure 3C).106,112,131,132 The model training process and evaluation follow the procedure described above for k-fold cross-validation, and model performance is similarly taken to be the average across all runs. Generally, LOO cross-validation is the most computationally expensive method due to the large number of training cycles required, and the model also tends to overestimate performance. LOO cross-validation is therefore best used for small data sets, which offset these disadvantages.

Leave-Many-Out Cross-Validation. Leave-many-out cross-validation (LMO-CV), also known as Monte Carlo resampling or bootstrapping, is another cross-validation technique used in the field of machine learning.136,137 It involves leaving out all possible subsets of m examples of the data.137 In a similar fashion to LOO cross-validation, in LMO-CV the training data are split into two subsets: a subset of m examples used for validation and a subset of (n − m) examples used for training the model.137 In total, there are $C_n^m = n!/(m!(n-m)!)$ splits that can be carried out on the training examples.137 LMO-CV carries out this procedure on all possible $C_n^m$ cases, resulting in an exhaustive procedure that is computationally expensive.137

Monte Carlo Cross-Validation. Another validation technique used in machine learning is Monte Carlo cross-validation (MCCV). This method randomly splits the samples into two parts, $S_c(i)$ (of size $n_c$) and $S_v(i)$ (of size $n_v$), and this procedure is repeated N times (i = 1, 2, ..., N), where $n_c$ and $n_v$ are the numbers of samples in the calibration set and validation set, respectively.138 This is defined in eq 1:138

$$\mathrm{MCCV}_{n_v}(k) = \frac{1}{N n_v} \sum_{i=1}^{N} \left\| y_{S_v(i)} - \hat{y}_{S_v(i)} \right\|^2 \qquad (1)$$
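Equation 1 can be read as "average squared prediction error over N random calibration/validation splits." The sketch below makes that concrete in plain Python; the `predict` callable and the toy mean-value model are hypothetical stand-ins for a trained model, not anything from the review.

```python
import random

def mccv(y_true, predict, n_v, N, seed=0):
    """Monte Carlo cross-validation score in the spirit of eq 1: the mean
    squared validation error over N random calibration/validation splits.
    `predict(cal, val)` is a hypothetical callable that fits on the
    calibration indices and returns predictions for the validation indices."""
    rng = random.Random(seed)
    n = len(y_true)
    total = 0.0
    for _ in range(N):
        idx = list(range(n))
        rng.shuffle(idx)
        val, cal = idx[:n_v], idx[n_v:]        # S_v(i) and S_c(i)
        y_hat = predict(cal, val)
        total += sum((y_true[j] - p) ** 2 for j, p in zip(val, y_hat))
    return total / (N * n_v)                   # the 1/(N n_v) prefactor of eq 1

# Toy "model": always predict the mean of the calibration labels.
y = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
mean_model = lambda cal, val: [sum(y[j] for j in cal) / len(cal)] * len(val)
score = mccv(y, mean_model, n_v=2, N=20)
```

Because the splits are drawn randomly rather than exhaustively, N can be kept far smaller than the $C_n^m$ splits of LMO-CV, which is the computational saving described in the text.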

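The hold-out and k-fold procedures described above reduce to bookkeeping over index sets, with LOO as the k = n special case. A minimal stdlib sketch (the function names are illustrative; scikit-learn's `train_test_split` and `KFold` provide production versions):

```python
import random

def holdout_split(n, test_frac=0.2, seed=0):
    """Hold-out validation: shuffle indices 0..n-1 and reserve a test fraction
    (around 20% of the data, as is typical)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]          # (train indices, test indices)

def kfold_splits(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.
    Each index serves as test data exactly once; k = n gives LOO.
    Round-robin fold assignment is used here for brevity."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

train, test = holdout_split(10)        # 8 training and 2 test examples
splits = list(kfold_splits(10, 5))     # 5 folds, each with 2 test examples
```

The model is then fit on each training index set, scored on the matching test set, and (for k-fold) the scores are averaged, exactly as the text describes.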
Table 1. Comparison of Literature Searches in Google Scholar and Web of Knowledge for the Year Range 2010−2020

machine learning method   | total papers from both databases^a | found in Google Scholar search | found in Web of Knowledge search | identical papers found in both searches | Google Scholar papers found in Web of Knowledge
regression models         |  2 |  2 |  1 |  1 |  2
kNNs                      |  8 |  6 |  4 |  2 |  6
decision trees            |  5 |  2 |  4 |  1 |  1
NB                        | 14 |  9 |  5 |  0 |  9
SVMs                      | 15 | 10 |  8 |  3 | 10
ensemble learning and RF  | 13 |  9 |  9 |  5 |  9
neural networks           | 13 | 10 |  5 |  2 |  9
total                     | 70 | 48 | 36 | 14 | 46

^a Papers with multiple machine learning methods are counted as belonging to every group in which they appear when searched. This means that papers can be counted multiple times, and the overall total in the table includes this effect.


Table 2. Comparison of Literature Searches in Google Scholar and Web of Knowledge for the Year Range 2000−2009

machine learning method   | total papers from both databases^a | found in Google Scholar search | found in Web of Knowledge search | identical papers found in both searches | Google Scholar papers found in Web of Knowledge
regression models         |  6 |  5 |  1 |  0 |  5
kNNs                      |  3 |  2 |  1 |  0 |  2
decision trees            |  3 |  3 |  1 |  1 |  3
NB                        |  3 |  2 |  1 |  0 |  2
SVMs                      |  9 |  5 |  4 |  0 |  4
ensemble learning and RF  |  3 |  2 |  1 |  0 |  2
neural networks           |  8 |  6 |  3 |  1 |  5
total                     | 35 | 25 | 12 |  2 | 23

^a Papers with multiple machine learning methods are counted as belonging to every group in which they appear when searched. This means that papers can be counted multiple times, and the overall total in the table includes this effect.

MCCV reduces the computational complexity drastically because of the reduction in the number of splits.138 In general, N = n² is sufficient for MCCV with validation-set size $n_v$ to perform as well as exhaustive cross-validation with the same $n_v$.138 Compared to LOO cross-validation, MCCV has a larger probability of choosing the correct number of components in a model.138 To obtain this larger probability with MCCV, fewer samples are required for validation.138

■ SEARCH PROTOCOL
All references for the machine learning models (Table 3) were obtained by searching the first 50 pages of results in Google Scholar using the model type, the keywords "machine learning" and "toxicity", and the default settings, with all results sorted by relevance. Each hit was manually checked to ensure relevance to the topic. The year range was restricted to 2010−2020 to obtain the most recent developments. QSARs, expert-based systems, read-across methods, and preprints were excluded from the results. The most recent search was carried out in March 2020.

A similar search was performed with the Web of Knowledge (Web of Science Core Collection) using the same keywords and the default settings. The results from the Web of Knowledge were refined with the same criteria used with Google Scholar. There were fewer hits from the Web of Knowledge than from Google Scholar, with the Web of Knowledge hits generally being included in the Google Scholar results (Table 1). Some of the Web of Knowledge hits are not included in the hits from the first 50 pages of Google Scholar, perhaps because those references lack the keywords specified during the search, or because the ordering of the hits in Google Scholar placed some of them outside the first 50 pages. Additionally, the references found using Google Scholar were also searched in the Web of Knowledge using the title as the search criterion, and the results are recorded in Table 1. In order to give a more complete picture of the developments in the field, the hits from the Web of Knowledge are also included in the review. This search protocol generated a total of 43 references, which are listed in Table 3.9,17,46,139−178

To find out whether there has been a shift in the situation since 2000−2009, papers were searched using the same criteria as in Table 1 while restricting the year range to 2000−2009, with the results presented in Table 2. However, to speed up the search, only the titles and the abstracts of the papers were considered during the searching process, unlike the results in Table 1, where the results were verified manually. This is in line with our intention to estimate the situation for the year range 2000−2009.

It was also generally observed that the total number of results from Google Scholar for 2000−2009 is significantly lower than for 2010−2020 (e.g., ca. <1000 compared to >10,000 for SVMs, and 1610 vs 15,600 for neural networks (NNs)). It can therefore be concluded that there has been a general increase in the number of papers published in the field of predictive toxicology over the year range 2010−2020. This can also be seen in Figure 4, which shows the changes in the number of papers over the year ranges used for the search criteria.

Figure 4. Comparison of results obtained using the search protocol over the year ranges specified.

However, compiling an exhaustive list of relevant articles on this topic with these search criteria has limitations. This is because articles can list the names of the machine learning methods used without explicitly indicating that they represent machine learning, and vice versa. For example, it was observed during the search process that the titles of papers generally do not include the machine learning method used; that is, they just specify "machine learning".

Table 3. A Summary of the Results of All Machine Learning Methods

no.  machine learning method  toxicity end point  accuracy (%)  AUC (%)  SE (%)  SP (%)  validation type  data set size  ref
1 regression (linear) cardiotoxicity − 75 − − 10-fold 1917 159
2 regression (ridge) cardiotoxicity − 77 − − 10-fold 1917 159
3 regression (partial least-squares) molecular toxicity 82 − − − hold-out 2849 170
4 kNN aquatic toxicity 84 92 84 85 10-fold 1005 174
5 kNN carcinogenicity − − 84 84 external 661 175
6 kNN cardiotoxicity 82 78 82 57 external 206 176
7 kNN genotoxicity 86 93 80 90 5-fold 576 177
8 kNN hepatotoxicity 62 52 91 20 external 978 178
9 kNN hepatotoxicity 78 − 79 76 10-fold 1274 139
10 kNN hepatotoxicity − − 62 ± 20 92 ± 14 10-fold 288 140
11 kNN organ toxicity − − 92 ± 8 78 ± 6 5-fold - 141
12 DT food-related toxicity 0.23b − − − 10-fold 94 172
13 DT genotoxicity 82 81 75 87 5-fold 576 177
14 DT hepatotoxicity 89a − − − 10-fold 575 173
15 DT hepatotoxicity − − 74 ± 16 94 ± 5 10-fold 288 140
16 DT organ toxicity − − 89 ± 12 76 ± 7 5-fold - 141
17 NB aquatic toxicity 77 81 70 84 10-fold 1005 174
18 NB carcinogenicity 68 ±2 − 60 ± 8 75 ± 10 5-fold 834 142
19 NB developmental toxicity 83 − 90 67 5-fold 232 143
20 NB genotoxicity 85 90 89 81 5-fold 576 177
21 NB hepatotoxicity − − 73 73 5-fold 336 144
22 NB hepatotoxicity − − 70 ± 15 85 ± 7 10-fold 288 140
23 NB immunotoxicity − 78 73 70 hold-out 44615 145
24 NB mitochondrial toxicity 81 ±1 − 88 ± 4 77 ± 4 5-fold 226 148
25 NB mutagenicity 90.9 ± 0.3 − 39 ± 4 95 ± 1 5-fold 5159 146
26 NB mutagenicity 65 − − − 10-fold 3903 147
27 NB myelotoxicity 82 ±3 − 76 ± 6 84 ± 3 external 727 149
28 NB nephrotoxicity − − 62 78 stratified 3-fold 27 150
29 NB respiratory toxicity 84 − 84 85 5-fold 993 151
30 NB urinary tract toxicity 84 − 84 85 5-fold 173 152
31 SVM acute oral toxicity 90 − − − external 8102 46
32 SVM acute toxicity 70 72 85 59 external 321 153
33 SVM aquatic toxicity 89 94 89 89 10-fold 1005 174
34 SVM carcinogenicity − − 73 79 external 499 9
35 SVM carcinogenicity 68 ±3 73 ±3 65 ± 5 72 ± 5 5-fold 802 154
36 SVM carcinogenicity 78 − 84 74 external 661 175
37 SVM cardiotoxicity 86 72 88 29 external 206 176
38 SVM cardiotoxicity 87 − 90 74 10-fold 1501 155
39 SVM genotoxicity 89 95 92 84 5-fold 576 177
40 SVM hepatotoxicity − − 58 ± 16 99 ± 6 10-fold 288 140
41 SVM hepatotoxicity 75 61 93 38 external 978 178
42 SVM hepatotoxicity 83 89 93 68 external 1731 156
43 SVM mutagenicity 72 − 69 74 10-fold 1696 157
44 SVM nephrotoxicity − − 79 84 stratified 3-fold 27 150
45 SVM ototoxicity 85 − 82 92 external 536 158
46 ensemble learning/RF aquatic toxicity 86 93 83 89 10-fold 1005 174
47 ensemble learning/RF carcinogenicity 70 ±3 77 ±3 67 ± 5 73 ± 4 5-fold 802 154
48 ensemble learning/RF carcinogenicity 74 − 65 80 hold-out 661 175
49 ensemble learning/RF carcinogenicity 68 ±3 74 ±3 64 ± 5 73 ± 4 5-fold 802 154
50 ensemble learning/RF cardiotoxicity 82 94 65 98 10-fold 2901 160
51 ensemble learning/RF cardiotoxicity 97 − − − external 522 161
52 ensemble learning/RF genotoxicity 94 96 95 93 external 576 177
53 ensemble learning/RF hepatotoxicity 69 ±3 75 ±3 76 ± 3 62 ± 5 5-fold 993 163
54 ensemble learning/RF hepatotoxicity 74 79 − − 10-fold 281 162
55 ensemble learning/RF hepatotoxicity − − 58 ± 15 97 ± 5 10-fold 288 140
56 ensemble learning/RF hepatotoxicity 71 ±3 76 ±2 80 ± 4 60 ± 5 5-fold 1117 163
57 ensemble learning/RF hepatotoxicity 73 − 77 66 10-fold 1274 139
58 ensemble learning/RF nephrotoxicity − − 89 75 10-fold 30 164
59 NN (ANN) acute toxicity − 70 100 53 external 321 153
60 NN (ANN) aquatic toxicity 88 94 87 89 10-fold 1005 174

61 NN (ANN) genotoxicity 87 94 91 82 5-fold 576 177
62 NN (ANN) hepatotoxicity 82 − 71 98 external 475 166
63 NN (ANN) mutagenicity 60 − 40 81 10-fold 1696 157
64 NN (ANN) mutagenicity 80 87 84 75 5-fold 6094 167
65 NN (DNN) cardiotoxicity 93 97 93 91 hold-out 3954 165
66 NN (DNN) cardiotoxicity 98 − − − external 522 161
67 NN (DNN) general − 84 − − external 11,764 17
68 NN (DNN) hepatotoxicity 81 − 82 80 external 475 166
69 NN (Graph CNN) cardiotoxicity − 96 − − − 3954 165
70 NN (CNN) general − 78 − 100 − 10,588 168
71 NN (CNN) general − 85 − − − 7438 169
72 NN (CNN) hepatotoxicity 63 62 64 62 − 7630 171
^a Corrected classification rate used instead of accuracy. ^b Error between the predicted value and the actual value used instead of accuracy.

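Table 3 reports accuracy (Q), AUC, sensitivity (SE), and specificity (SP). For a binary classifier, Q, SE, and SP follow directly from the confusion-matrix counts; a minimal sketch (the function name and toy labels are illustrative, and AUC is omitted because it additionally requires ranked prediction scores):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy (Q), sensitivity (SE), and specificity (SP) for binary labels
    (1 = toxic/active, 0 = nontoxic/inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, predicted toxic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nontoxic, predicted nontoxic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed toxicant
    return {"Q":  (tp + tn) / len(pairs),  # fraction of all predictions correct
            "SE": tp / (tp + fn),          # fraction of toxic compounds found
            "SP": tn / (tn + fp)}          # fraction of nontoxic compounds found

m = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
# Q, SE, and SP all equal 2/3 for this toy example.
```

Reporting SE and SP alongside Q matters for the imbalanced data sets common in toxicology, where a high accuracy can coexist with a very low sensitivity (compare, e.g., entry 25 of Table 3).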
The machine learning method used is usually specified in the abstract of the paper. A similar situation also applies to the toxicity end points and terms. Such limitations would result in papers being missed or counted wrongly if they are not checked manually for relevance, which is time consuming. Therefore, to supplement the results obtained with the search criteria in Tables 1 and 2, we have chosen to highlight several papers, obtained via a manual search, that have had a significant impact on the field of predictive toxicology.

One example is the Tox21 program, which was developed in accordance with the National Research Council's vision of testing toxicity in the 21st century.179,180 The large amount of publicly accessible data generated through the collaboration of many institutions has promoted the development and use of high-throughput screening assays.179,180 In this review, Table 3 shows the results obtained by the search, which generally involves predicting the toxicity of drug or drug-like compounds; in other words, pharmaceuticals. Compared to pharmaceuticals, which have been the recent focus (Table 3), Tox21/ToxCast is different because its focus is on environmental chemicals.179,180 Several successful machine learning models have also been built using Tox21/ToxCast data, which have demonstrated the potential of these data in predictive toxicology.140,181−185

ToxPrint is a widely used set of structural features from molecules in toxicity databases, which can be represented as chemical fingerprints.141,186 It was developed by Yang et al. in 2015 and is based on various toxicity prediction models and safety assessment guidelines from several institutions, including the Food and Drug Administration.141,186 ToxPrint has also begun to be used with machine learning models, including a recent study predicting estrogen binding.187

Another paper to be highlighted involves XGBoost. XGBoost is a scalable tree boosting method used in machine learning and has received recognition in numerous data mining and machine learning challenges.188 XGBoost is highly scalable in all scenarios, which has contributed to its success in machine learning.188 Further details can be found in the work by Chen et al.188

■ MODEL TYPES
Regression Models. A regression task involves predicting a numerical response variable from several predictor variables by learning a model that minimizes a loss function.189 First, we distinguish regression models from support vector regression (SVR), which is an application of support vector machines (SVMs) and will be covered in a later section. Also, in the literature, kernel functions are sometimes referred to as regression models. In this review, we make a distinction between the two: the term kernel functions will be reserved for the SVMs, while regression models will be discussed in this section.

Regression models, which are statistical models, can be broadly classified into linear regression and nonlinear regression. These include linear, multivariate linear, polynomial, stepwise, and ridge regression, and the least absolute shrinkage and selection operator.190−205 These regression models are used for quantitative predictions, unlike the standard classification models. Examples of quantitative characteristics of toxicity include LD50, LC50, IC50, and EC50.

While linear regression models have low computational cost, their linear nature limits their ability to model complex problems, unlike nonlinear regression models.84,206,207 However, using nonlinear regression models increases the computational cost.

In predictive toxicology, regression functions have been employed in several works. These models, as well as all machine learning methods in this review, will be measured by performance metrics that include accuracy (Q), sensitivity (SE), specificity (SP), and area under the receiver operating characteristic curve (AUC). Even though accuracy was chosen as the performance metric for regression model types, it is acknowledged that R² and the root-mean-square error are more adequate metrics to gauge the quality of such models. However, in line with our intention to give an estimate of the performance of regression models relative to the other machine learning methods, accuracy, a common performance metric in machine learning, was used. Table 3, entries 1−3, summarizes the recent performance of several regression models across different toxicity end points.

k-Nearest Neighbors (kNNs). kNN is a nonparametric classifier in which the test sample is assigned a class label based on the most frequently occurring class label among its k nearest neighbors.208−210 A proximity measure, such as the Euclidean distance or the Manhattan distance, is used to define the kNNs of each test sample.208−210 All samples are represented by points in an n-dimensional feature space, while the neighbors are taken from a set of objects for which the correct classification or value is known.209

More formally, the kNN algorithm is defined as follows: Given a collection of incomplete/unlabeled test data {(x_i, y_i), i = n + 1, ..., n + m}, the problem amounts to predicting the class
Chemical Research in Toxicology pubs.acs.org/crt Review

labels for y* = {yn+1, ..., yn+m} with corresponding feature vectors x* = {xn+1, ..., xn+m}.208,211 Thus, the kNN algorithm amounts to classifying an unlabeled yn+1 as the most common class among the kNNs of xn+1 in the training set {(xi, yi), i = 1, ..., n}.208,209,211 In the algorithm, the value of k is typically a positive integer, usually small (such as k = 1), or is chosen based on LOO cross-validation.208,209,211,212 If the value of k chosen is too small, it might result in overfitting, while if the value of k is too large, it might result in misclassification.209 While kNN is easy to implement and often gives good performance, it is heavily dependent on the classification accuracy of the test class labels as well as the value of k.211 Samsudin et al. have also outlined several improvements to kNN in the literature.210 Table 3, entries 4−11, summarizes some of the results achieved by kNN models.

Decision Trees. A decision tree (DT) is a tree-structured classifier consisting of a root, nodes, and leaves, where each node has only one unique path from the root. The decision that the classifier makes at each node is based on decision rules, which depend on the features of the data used. Several papers have explained DTs in great detail, so the details will not be reproduced here.213−215 Furthermore, a later section on RF, which is an ensemble of decision trees, will also introduce the basics of decision trees.

DTs have the advantage of model interpretability because the decision rules can be retrieved from the model for each result, unlike complex models like NNs, where each node depends on all of the nodes in the previous layer. However, due to the design of the tree, it is easier for errors to accumulate at each level, and thus a compromise between accuracy and efficiency has to be reached.

In predictive toxicology, decision trees are not commonly used, as evidenced by the data from Tables 1 and 2. This could be because the toxicity end points are complex, and thus a simple model cannot generalize all the patterns in the data. Table 3, entries 12−16, summarizes some of the results achieved by DT models.

Naïve Bayes. In Bayesian classification, the given data are hypothesized to belong to a particular class.216 The probability that the hypothesis is true is then calculated.216 Another way to describe the Bayesian classifier is shown in eq 2, where the Bayesian classifier is defined as obtaining the posterior probability P(Ci|A1, ..., An) of each class Ci, using Bayes' rule:217,218

P(Ci|A1, ..., An) = P(Ci)P(A1|Ci)...P(An|Ci)/P(A) (2)

This equation makes the simplifying assumption that, given the class, the attributes, A, are independent, and thus the likelihood can be obtained as the product of the individual conditional probabilities of each attribute.217,218 This is called a naïve Bayes (NB) classifier.

NB models are efficient, generally robust, and often highly accurate.217,218 However, NB classification accuracy decreases when the attributes are not independent.217 Also, NB models cannot deal with nonparametric continuous attributes.217 Some improvements, such as feature selection, have been carried out to tackle these issues.219 Additional details about these improvements can be found in the literature.217,219−222 Table 3, entries 17−30, shows the results obtained for NB models in predictive toxicology.

Support Vector Machines. The next model type to be introduced is the SVM, which was introduced by Vapnik et al.223 It is based on the structural risk minimization principle, which was developed from statistical learning theory.223−227 Vapnik et al. state that the basis of the principle is to control the Vapnik−Chervonenkis dimension (VC-dimension) to minimize the guaranteed risk, which is the sum of the empirical risk and the confidence interval.224 This, however, involves a trade-off, as the minimum empirical risk decreases while the confidence interval increases as the VC-dimension increases.224 Yan et al. describe this in another way: SVMs, which are maximum margin classifiers, simultaneously minimize the empirical classification error and maximize the geometric margin.225

According to Vapnik et al., an SVM maps the input vectors into some high-dimensional feature space Z through some nonlinear mapping chosen a priori.223 This allows a hyperplane to be constructed between the data points, where there is a margin of separation between the two classes.223 If the training data cannot be separated without error, the algorithm separates the training set with a minimal number of errors, which results in a soft margin SVM.223

SVMs usually operate in a feature space where the data are linearly separable, even when the initial feature space is nonlinear. In the case of a linear classifier, the feature space is separated by a hyperplane given by eq 3, where w is the weight vector, x is the input vector, and b is the bias (Figure 5).20,223,228−230

wT·x + b = 0 (3)

Figure 5. A representation of an SVM with linearly separable classes.

The geometric margin is thus represented by the constraints shown in eq 4, and this is shown in Figure 5 as the boundary lines running parallel to the hyperplane.20,223,228−230

wT·x + b ≥ +1 when yi = +1 and wT·x + b ≤ −1 when yi = −1 (4)

To construct a linear SVM classifier for a nonlinear feature space, kernels such as Gaussian, radial basis function, or polynomial types are used.9,157,226,231−236 These kernels are functions that work by mapping the input data, which are linearly nonseparable, into a higher dimensional space where the data are linearly separable.9,231 A hyperplane can thus be constructed that separates the two classes, resulting in a situation similar to a linear classifier.
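To make the decision rule concrete, the hyperplane of eq 3 and the margin constraints of eq 4 can be sketched in a few lines of Python. The weight vector, bias, and kernel width below are illustrative values chosen for the example, not a trained model; in practice, w and b come from solving the SVM optimization problem.

```python
import math

# Hedged sketch of the linear SVM decision rule (eqs 3 and 4).
# w and b are illustrative, hand-picked values, not fitted parameters.

def decision_value(w, x, b):
    """Eq 3: the hyperplane is the set of points with w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Assign +1 or -1 depending on the side of the hyperplane (eq 4)."""
    return +1 if decision_value(w, x, b) >= 0 else -1

def rbf_kernel(x, z, gamma=1.0):
    """A radial basis function kernel, one way to handle nonlinear data."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

w, b = [1.0, -1.0], 0.0            # hypothetical separating hyperplane
print(classify(w, [2.0, 0.5], b))  # x1 > x2 side: +1
print(classify(w, [0.5, 2.0], b))  # x1 < x2 side: -1
```

Kernel methods replace the inner products in the SVM's dual optimization problem with calls like `rbf_kernel`, which is what lets a linear hyperplane in the mapped space act as a nonlinear boundary in the original feature space.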
It is known that SVMs are good at pattern recognition, can generalize well, and can handle high-dimensional data.223,226,228,237 However, the drawbacks of SVMs include difficulty in choosing an appropriate kernel and being time-consuming for large data sets.231,233,237 Improvements have been made to counter these drawbacks. For example, SVMs such as the generalized eigenvalue proximal SVM and the twin SVM have been developed to reduce the time consumed.238−240 These SVMs work by constructing two nonparallel hyperplanes instead of a single hyperplane, effectively reducing the quadratic programming problem (QPP) required to generate the hyperplane(s) from a single large QPP to two smaller QPPs.238−240 Also, multiple kernel functions have been developed, which can handle complex classification problems better by adapting to the characteristics of the data.241−244 In recent years, several SVMs have been used in predictive toxicology. These results are tabulated in Table 3, entries 31−45.

Random Forest and Ensemble Learning. A recent review by Sagi et al. provides a comprehensive survey of ensemble learning.245 Some of the ideas by Sagi et al. will be introduced here as a general introduction to this machine learning method. There are also other reviews on ensemble learning.246−252 A general overview of ensemble learning will be given before focusing on random forest (RF), as it is a popular machine learning method used in predictive toxicology.

Ensemble learning refers to methods that combine multiple inducers to make a decision, typically in supervised machine learning tasks.245 An inducer, or base-learner, is an algorithm that takes a set of labeled examples as input and produces a model that generalizes these examples.245 By combining multiple models, the error of a single inducer is likely to be compensated by the other inducers, so the ensemble typically performs at least as well as its individual members.245

More formally, the ensemble learning method can be represented as follows: Given a data set of n examples and m features, D = {(xi, yi)} (|D| = n, xi ∈ Rm, yi ∈ R), an ensemble learning model φ uses an aggregation function G that aggregates K inducers, {f1, f2, ..., fK}, toward predicting a single output.245 This is represented by eq 5:245

ŷi = φ(xi) = G(f1, f2, ..., fK) (5)

In an ensemble method, there exist two main types of frameworks.245 The dependent framework is one where an inducer's output affects the construction of the next inducer.245 In contrast, in the independent framework, each inducer is built independently from the other inducers.245 For instance, the popular ensemble methods AdaBoost, bagging, and RF are examples of dependent, independent, and independent frameworks, respectively.245

Ensemble learning methods are generally able to handle class imbalance, concept drift, and the curse of dimensionality better than their single-model machine learning counterparts.245 Also, these methods tend to avoid overfitting, as different hypotheses are averaged to give the final result.245 Moreover, ensemble learning methods decrease the risk of settling in a local minimum while also giving a better representation of the data.245

However, even with these numerous advantages, there are some considerations when building an ensemble learning model. These include the individual method's suitability toward the data, difficulty in interpreting the output of the ensemble learning model, the software availability and usability, and, sometimes, computational cost.245 The key considerations are the first two points. Elaborating slightly on the first point, each machine learning method is more suited to certain types of data or settings; this will be explained in more detail in the section covering the overall analysis. Next, as ensemble learning methods are built up of multiple models, the total computational cost will be at least equivalent to running each model separately. It is thus likely that significant computational resources need to be allocated to train an ensemble learning method for results to be obtained within a reasonable time frame. Improvements to ensemble learning methods are also covered in the review by Sagi et al.245

Even though ensemble learning uses a general approach, for example, bagging, we intend to give an estimate of the performance of models that use these general approaches as compared to concrete machine learning methods like RF. Ensemble learning can be interpreted as a group of algorithms (for example, bagging) combined with a machine learning method, or as a group of machine learning models (for example, NB and NNs). Thus, care should be taken when comparing performance metrics of ensemble learning methods to those of other machine learning methods. It should also be noted that RF is a special case because it is made up of an ensemble of decision trees and is treated as a concrete machine learning method.

Next, RF will be covered in more detail, while details on the other two methods can be found in the literature.252−257 Breiman introduced RF in 2001 as a classifier that is made up of multiple tree-structured classifiers (decision trees) {h(x, θk), k = 1, ...}, where the {θk} are independent identically distributed random vectors, and each tree casts a unit vote for the most popular class at input x.111,258 Each RF consists of a root, nodes, and leaves, with each split representing two branches at each node (Figure 6). Each node in an RF represents the decision made by the model, which is based on a subset of the available features, while the leaves represent the outputs of the RF.111,258−260 These outputs, which are the predictions of each tree, are combined by taking the most common prediction across all trees.109,258,259

Figure 6. An RF with two decision trees. The two colors represent the positive or negative decision made at each split.

To build an RF for machine learning, an algorithm has to be chosen. One common algorithm used in the literature was developed by Breiman and uses the bagging principle together with random feature selection.111,258 Each tree in the RF is grown by randomly drawing N samples from the original training set with replacement, also known as bootstrapping,
which is then used to build the tree.111,258 For each node of the
tree, a portion of features from the original feature space is
randomly drawn without replacement, among which the best
split is selected.111,258 Throughout the process, no pruning of
the tree is performed.111,258
Building an RF also requires that one consider pruning as well as the number of trees. Pruning refers to removing nodes in the RF based on a criterion, which simplifies the RF.260,261 The number of trees in an RF, on the other hand, affects the generalization of the model.111,258 However, it was found that adding more trees does not improve performance beyond about 10,000 trees when using AdaBoost, or beyond about 100 trees for a well-chosen number of random features preselected in the splitting process with the Forest-RI method.111,258
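The bootstrap-and-vote procedure described above can be sketched as follows. This is a deliberately minimal illustration: one-feature threshold "stumps" stand in for fully grown trees, and the per-node random feature selection of Forest-RI is omitted, so treat it as a sketch of the aggregation idea only.

```python
import random

# Minimal sketch of bagging with majority voting, the aggregation scheme
# used by RF. One-feature threshold "stumps" stand in for full decision
# trees; Forest-RI's per-node random feature selection is omitted.

def bootstrap(data, rng):
    """Draw len(data) samples with replacement (the bootstrapping step)."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if len({y for _, y in sample}) > 1:  # sketch-only: keep both classes
            return sample

def train_stump(sample):
    """A stand-in 'tree': threshold feature 0 midway between class means."""
    pos = [x[0] for x, y in sample if y == 1]
    neg = [x[0] for x, y in sample if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] >= threshold else 0

def forest_predict(stumps, x):
    """Each tree casts a unit vote; the most common prediction wins."""
    votes = sum(stump(x) for stump in stumps)
    return 1 if 2 * votes >= len(stumps) else 0

rng = random.Random(0)  # fixed seed for a reproducible sketch
data = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.8], 1), ([0.9], 1), ([1.0], 1)]
stumps = [train_stump(bootstrap(data, rng)) for _ in range(25)]
print(forest_predict(stumps, [0.95]))  # clearly on the positive side: 1
print(forest_predict(stumps, [0.15]))  # clearly on the negative side: 0
```

Because each bootstrap draw sees a different resampling of the training set, the individual stumps disagree slightly on the threshold, and the majority vote smooths out those individual errors, which is the intuition behind the overfitting resistance discussed below.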
RFs have several advantages: they can minimize the issue of overfitting, are resistant to noise with some algorithms, can handle high-dimensional data well, have the ability to ignore irrelevant descriptors, and are interpretable in terms of the decision rules made.109,262,263 RFs are also known to keep the benefits afforded by DTs while achieving better results most of the time.263 On the contrary, the disadvantages of RF include being susceptible to bias when there are dominant features as well as placing more emphasis on the correlation among smaller groups of features.264

A review by Fawagreh et al. in 2014 covered some of the recent advancements in RF.265 They mention in the review that the performance of the RF can be improved through the use of different voting methods or by implementing a weighting scheme for the features or for discarding trees.265 Another review, by Rokach in 2016, on decision forests (including RFs) introduces the models and the methods to construct them and surveys the state-of-the-art methods in the field.246 Table 3, entries 46−58, summarizes the results of RFs and ensemble learning used in predictive toxicology.

Neural Networks (Artificial Neural Networks, Deep Neural Networks, Convolutional Neural Networks). Neural networks (NNs) can generally be divided into three groups, namely, artificial neural networks (ANNs), deep neural networks (DNNs), and convolutional neural networks (CNNs). In this review, ANNs and DNNs will be elaborated on first, as they are similar, following which CNNs will be explained in more detail.

In recent times, ANNs, which excel at pattern recognition and classification, have been successfully applied in multiple fields such as novelty detection, renewable energy systems, and image processing.266−268 The most widely used types of ANNs include feedforward and recurrent neural networks, of which feedforward neural networks are generally enough for most binary classification tasks.269 In this review, the focus will be on feedforward neural networks. Recurrent neural networks, such as long short-term memory networks, are not covered in this review; details can be found in the literature.270−274 In a typical feedforward NN, the network is made up of layers, namely, the input layer, hidden layers, and the output layer. Each layer is in turn made up of independent nodes or neurons, with each node connected to all nodes in the subsequent layer. This is illustrated in Figure 7. In the literature, several strategies for determining the number of nodes as well as hidden layers when building a NN have been described.275−277

Figure 7. Graphical representation of a feedforward ANN.

In addition to the network architecture, one must choose an activation function, which is a function that transforms the activation level of a unit (neuron) into an output signal.278 Examples of activation functions include linear, sigmoid, Gaussian, and Elliot.278 A popular and recent activation function called the rectified linear unit (ReLU) has been shown to converge quickly for NNs.279−281 The ReLU activation function is represented by eq 6:

f(u) = max(0, u) (6)

Overall, the output in a feedforward NN can be described by eq 7, where oi represents the output, σi is the output function, bi is the bias input of the ith hidden node, wij is the weight of the connection from the jth input, uj, to the ith hidden node, and M is the total number of inputs:282

oi = σi(bi + ∑j=1,...,M (wij × uj)) (7)

The starting weights are usually randomized, with the final weights commonly determined by backpropagation. Backpropagation is a method to calculate the gradient of the error with respect to the weights, and several algorithms have been developed, which include first-order and second-order algorithms.283,284 One of the commonly used algorithms is the Levenberg−Marquardt algorithm, which is a second-order algorithm.285 Wilamowski et al. describe these backpropagation algorithms in more detail.285

Recently, DNNs have gained interest due to deep learning, on which there have been several reviews published in the literature.1,84,91,269,284,286−290 DNNs are NNs with a deep network architecture, where the number of hidden layers is more than one.291,292 By increasing the number of hidden layers, DNNs can handle more complex problems, with successful examples such as skin cancer classification, image classification, and synthesis route planning.291−293

Despite the growing popularity of DNNs, one must acknowledge their limitations. These include long network training times due to the increase in the number of hidden layers and parameters. Improvements have been made to rectify these issues. For example, graphics processing units (GPUs) have been employed to train NNs, where the higher processing capabilities of GPUs help reduce training times.294−297

On the other hand, ANNs have their own set of limitations, which include their limited expressivity, as they are unable to express certain functions that DNNs are capable of, as well as having lower approximation capability as compared to DNNs.298 Unfortunately, as these are issues intrinsic to the model type, there are no good solutions for them.
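Eqs 6 and 7 above amount to only a few lines of code for a single node, and stacking such layers gives the feedforward networks of Figure 7. The weights and biases below are arbitrary illustrative numbers, not the output of backpropagation training.

```python
# Forward pass of one node in a feedforward NN, following eqs 6 and 7.
# All weights and biases are illustrative, untrained values.

def relu(u):
    """Eq 6: the rectified linear unit, f(u) = max(0, u)."""
    return max(0.0, u)

def node_output(bias, weights, inputs, activation=relu):
    """Eq 7: o_i = sigma_i(b_i + sum over j of w_ij * u_j)."""
    return activation(bias + sum(w * u for w, u in zip(weights, inputs)))

def layer_output(biases, weight_rows, inputs):
    """A whole hidden layer is just eq 7 applied once per node."""
    return [node_output(b, w, inputs) for b, w in zip(biases, weight_rows)]

inputs = [0.5, -1.0, 2.0]
hidden = layer_output([0.1, -2.0], [[0.4, 0.3, 0.2], [0.1, 0.0, 0.1]], inputs)
print(hidden)  # first node fires (about 0.4); second is clipped to 0.0 by ReLU
```

Feeding `hidden` into a further `layer_output` call with its own weights would give a two-hidden-layer network, i.e., a DNN in the sense used above.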
In the field of predictive toxicology, DNNs have been shown to be successful in numerous examples (Table 3). Several papers have also covered the topic of NNs or DNNs in the field of predictive toxicology,10,16,102 in particular, the review by Tang et al. in 2018.5 Hence, this review will not focus on the recent advances of DNNs but rather on the model performance of DNNs in predictive toxicology. Given the increased performance of DNNs as compared to their ANN counterparts, it is predicted that the use of DNNs in predictive toxicology will become more prevalent. The complexity provided by DNNs could also give these models an edge when predicting complex toxicity end points, which simpler models might not be able to model well.

CNNs are similar to ANNs and DNNs in the sense that they also consist of layers and are feedforward networks. However, unlike their counterparts, a CNN typically consists of convolutional and pooling layers stacked on top of each other.299−301 The fully connected layers that follow these layers interpret the feature representations and perform the function of high-level reasoning, such as classification.300

Convolutional layers serve as feature extractors, where the neurons are arranged into feature maps.302,303 Each neuron has a receptive field, which is connected to the neurons in the previous layer via a set of trainable weights, while all neurons within a feature map have weights that are constrained to be equal.302,303 Eq 8 shows the representation of the kth output feature map Yk, where x represents the input image, Wk the convolutional filter, the multiplication sign the two-dimensional (2D) convolutional operator used to calculate the inner product of the filter model at each location of the input image, and f(·) the nonlinear activation function:300

Yk = f(Wk × x) (8)

In contrast, the pooling layers reduce the spatial resolution of the feature maps, which reduces the number of parameters (controlling overfitting) and achieves spatial invariance to input distortions and translations.300 These pooling operations play a role in producing downstream representations that are more robust to the effects of variations in the data while still preserving important motifs.300,302 Lee et al. and Rawat et al. describe these pooling operations in more detail.300,302 In toxicology, molecules are represented as images, grids, or graphs, which are then fed into the CNN for training.165,168,169,171

Similar to ANNs and DNNs, CNNs also use the backpropagation algorithm during training.300 A typical CNN with both convolutional and pooling layers is shown in Figure 8.

Figure 8. A CNN with convolutional and pooling layers.

CNNs have several advantages over traditional ANNs, such as requiring fewer free parameters and being able to deal with the variability of 2D shapes.300 However, CNNs require a large amount of training data, which increases computational cost and lengthens training time.300 In recent years, new CNNs have been developed, such as deep CNNs and graph CNNs, which outperform traditional CNNs. A recent review by Rawat et al. covers the topic of deep CNNs more extensively.300 On the other hand, graph CNNs are the most recent CNNs to be developed, which operate by embedding the nodes of a graph in Euclidean space.303 More information about graphs and graph CNNs can be found in the literature.303−307

Although CNNs have made advances in recent times, they are uncommonly used in the field of predictive toxicology. This is because CNNs use images as inputs, while in predictive toxicology, molecular structures, fingerprints, or descriptors are more common as inputs. An example is the work by Jimenez-Carretero et al.,308 who developed a CNN using cell-based images as features. This is a situation where machine learning is used to generalize biological properties rather than the chemical properties on which predictive toxicology mainly focuses. In this review, we have chosen to focus on molecular structure in predictive toxicology, as thus far, CNNs have not shown themselves to be as useful in this field. Table 3, entries 59−72, summarizes some of the results achieved by NN models.

OVERALL ANALYSIS

The popularity of SVMs and RFs could be attributed to their advantages, such as being efficient and easy to use as well as being able to generalize the data well. Furthermore, numerous successful models have been built for these two model types, thus contributing to their well-established reputation. On the other hand, recently developed machine learning models, such as DNNs or ensemble learning, are less popular, possibly due to their high computational cost or complexity, or because they are not as well-established as SVMs and RFs. Machine learning methods perform differently on different data sets, and these differences arise from the diverse characteristics of the data, such as the data set size, class distribution, and the distribution of the data in the feature space.

It is also observed that the simplest machine learning methods, regression models, are not the most popular models of choice in predictive toxicology. The performance of regression models also falls behind that of their classification counterparts. However, caution should be taken when comparing the performance of regression models with classification models, as the two are inherently different. This illustrates the complexity of the problem of predicting the toxicity of chemicals, where usually a more complex machine learning model is required to model the problem or the data more effectively. A more complex model might also produce results that one would otherwise miss if a simpler model were used instead, simply because the model does not overgeneralize the data. Another possible reason is the familiarity or ease of use of certain machine learning methods, where one is inclined to use a method that is more familiar to raise the chances of building a successful model. Since the SVM has been well established in the literature, it continues to be a popular machine learning method for predictive toxicology.

The statistical performance results of all of the machine learning methods in predictive toxicology covered, including the values for SE, SP, accuracy, AUC, and the validation type, are summarized in Table 3. It is common to see different studies use different performance metrics to measure their model's performance. While the best performance metric to use is still up for debate, such diversity in the performance metrics makes it difficult to compare across different models. Moreover, as the models generally do not use a benchmark data set, it is
once again difficult to compare the performance of different models.

Figure 9. A graph showing model performance against training data set size across (A) different model types and (B) different end points.

Generally, the machine learning methods in Table 3 have an accuracy or AUC of 75% or above. Those models that reported a lower performance than expected could have experienced issues with the data set, such as the data set not being able to generalize well to another test set. Most of the time, it is more likely for there to be a problem with the data or the deployment of the algorithm, rather than the machine learning method of choice, which is generally well established.

Based on the data and the search criteria outlined in Table 3, hepatotoxicity, carcinogenicity/mutagenicity/genotoxicity, and cardiotoxicity are the most common types of toxicity that have been investigated. Hepatotoxicity is important in predictive toxicology because most toxicity originates from the liver, which is the main site of metabolism for drugs; that is, hepatotoxic compounds would have adverse side effects in vivo and thus are unsuitable to be drugs.139,156 Hence, determining the hepatotoxicity potential of a drug candidate allows for the quick screening of potential drug candidates from all of the compounds. Cardiotoxicity is important because side effects such as cardiac arrest are highly undesirable. Lastly, tests for carcinogenicity and mutagenicity during drug screening are common, as cancer is known to be a leading cause of death in the developed world.225 The Ames test is used as part of a battery of in vitro methods that covers different mechanisms that may lead to carcinogenicity.309,310 In silico predictions of Ames test mutagenicity have also been investigated by Xu et al. and Hillebrecht et al., highlighting the need for faster and more accurate predictions of a key test in toxicology.167,311 Hillebrecht et al. demonstrated that the expert-based system Derek performs the best for predicting Ames test mutagenicity, though they believe that the fusion of expert-based systems and QSAR techniques would lead to improvements in the predictive power of the in silico models.311 Therefore, by screening for these major types of toxicity, the number of potential drug candidates can be reduced to a smaller number from a larger range of drug-like compounds.

The distribution of all models in Table 3 is shown in Figure 9A. Entries without accuracy/AUC values or with missing data were omitted. Also, for entries that report both values, accuracy was chosen to represent the model performance, as it is a common performance metric. Larger data set sizes do not correspond to higher model performance, but some model types do appear to have been more prominently reported for some data set sizes. Ensemble learning/RF, nearest neighbor algorithms, SVMs, and NB models are more common where training data sets are under 1000 data points. NNs of all types see more use in data sets over 2000 data points.
When models with accuracy or AUC ≥90% are considered, ensemble learning/RF is more prominent among smaller data sets, while NNs, SVMs, and NB are more common among larger data sets. NNs are the most represented algorithm type in this category, with three models scoring above 90%.

Looking at the highest performing models for each of the toxicity end points most represented in Table 3, some suggestions can be provided for those new to machine learning model construction. The highest performing models for carcinogenicity, genotoxicity, and mutagenicity are RFs and SVMs using MACCS keys and PubChem fingerprints as inputs and relatively small data sets.177 High-performing models on larger data sets use molecular descriptors and ECFPs in an NB model.146 For cardiotoxicity, the highest performing model is a deep NN using fingerprints on a small data set.161 An SVM using MACCS fingerprints on a small data set provides the highest performing model for hepatotoxicity.156 These are summarized in Figure 10 and serve as suggestions to be considered when constructing new models, as the data type, distribution, and modelability also affect which models will perform best and how high the model performance statistics will be.

Figure 10. A summary of the highest performing methods and inputs found, separated by data set size.

It has to be acknowledged that it is hard to compare algorithm performance over different toxicity end points due to the differences in complexity and data available. Hence, a subplot of Figure 9A was generated for the three common toxicity end points, namely, hepatotoxicity, carcinogenicity/mutagenicity/genotoxicity, and cardiotoxicity. This subplot is shown in Figure 9B. There are more models predicting cardiotoxicity with an accuracy/AUC of above 90%, followed by carcinogenicity/mutagenicity/genotoxicity, and last hepatotoxicity. The lower overall performance of models predicting hepatotoxicity could possibly arise from a lack of data used for building the models. It is also observed that most of the models have 2000 or fewer training data points.

To study model performance based on human vs animal and in vivo vs in vitro data, the studies in Table 3 were differentiated based on the data used where possible and plotted alongside each other (Supporting Information). No clear trend or pattern was observed for the generated plots. This indicates that the model performance is dependent on the quality and quantity of the data.

It was found that in vitro assay data-based models performed better when predicting in vivo toxicity end points for humans than models using in vivo data from animals to predict in vivo end points for humans.181 While more high-quality data for human toxicity and in vivo studies are required to better assess the performance of these in vitro data-based models in the prediction of in vivo toxicity, this result demonstrates that in vitro assay data offer a promising alternative to expensive and low-throughput methods of obtaining in vivo data and that extrapolation from in vitro data to human in vivo end points is more reliable than extrapolation from animal to human in vivo data.181

In another study, by Novotarskyi et al. in 2016, the top model for the ToxCast Environmental Protection Agency challenge was reported.312 The aim of the challenge was to develop a model to predict the lowest effect level concentration (in vivo toxicity) based on in vitro measurements and calculated in silico descriptors.312 A recent study by Xu et al. in 2020 also investigated in vivo toxicity. In their work, predictive models for human organ toxicity based on in vitro bioactivity data and chemical structure were developed.313 The models could be used to hazard screen large sets of chemicals for potential human toxicity as well as to provide insights into toxicity mechanisms.313

In this review, we look at both in vivo and in vitro assay data, but we expect that the importance of in vitro data is likely to increase in future studies. A related subject is the extrapolation of in vitro assay data toward in vivo data.314 This is an important area, to which machine learning has yet to be applied, and one to watch for the future.

The recent successes of machine learning in predictive toxicology have demonstrated that machine learning methods can generalize the data as well as predict the toxicity potential of compounds accurately. While the results in Table 3 generally use the single task classification method, multitask classification/learning is another method used in predictive toxicology. By learning tasks in parallel, multitask learning has the potential to improve the generalization of the model, provided that sufficient data are available, longer training times and more complex architectures are practical, and the distinct data sets are sufficiently closely linked for the models to be related.315 The assignment of more than one label to each instance might improve model performance by increasing model complexity. Several papers have explained multitask classification, and this could be an alternative approach to take instead of varying the machine learning method used.19,315−317 For example, the work by Mayr et al. in 2016 used multitask learning and found that it enhances the model performance for 10 out of 12 assays.17 In another study, Wu et al. used multitask learning on four different quantitative toxicity data sets.10 Generally, it was observed that the multitask models performed better than single task models when suitable data were available.10 Hence, by using machine learning in predictive toxicology, the advantages of the various machine learning methods can be applied to the databases of drugs and drug-like compounds. This would lead to even more efficient and accurate predictions for drug toxicity.
the data used rather than the type of data (e.g., animal vs However, there are some considerations when using machine
human). Caution must also be taken when interpreting these learning to predict toxicity and even for using machine learning
results as a whole because these studies use different data sets. in general. First, care must be taken when processing the data set
The lack of high-quality in vivo data could also have contributed for input into the model. This is because the results obtained by
to this outcome. machine learning are highly dependent on the characteristics of
In 2016, Huang et al. built predictive models for 72 in vivo the input data. For example, a data set containing a significant
toxicity end points using in vitro data from the Tox21 10 K majority of nontoxic compounds will likely result in a model that
https://dx.doi.org/10.1021/acs.chemrestox.0c00316
Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
is skewed toward predicting nontoxicity. This would also affect the model's ability to predict toxicity on unseen data. This is part of the imbalanced data problem in predictive toxicology: most of the available data concern toxic compounds, while the number of nontoxic compounds is significantly smaller. Other than the class distribution in the data set, the size of the data set also needs to be considered. Generally, machine learning methods perform better and generalize better as the size of the data set increases, provided that the quality of the data does not diminish.

Another issue to take note of in machine learning is overfitting when training the model on the data set. Overfitting means that the model learns the training data too well; that is, the model has memorized the training data. This affects the results when predicting new, unseen test data, on which the model normally performs more poorly than expected, while the results for the training data score very well across most common performance metrics. During model training, overfitting can be identified as the region after the point at which the loss function reaches its minimum and is signaled by an increase in the loss function after that minimum. Another indicator of overfitting is a large difference between the training and test accuracy, or a gap between these two metrics that increases during model training.

Some methods to tackle overfitting include the sampling techniques for imbalanced data mentioned in an earlier section. In addition, regularization is commonly used during training to handle the overfitting problem. Regularization aims to minimize the loss function subject to a regularization condition on the model parameters, where the regularization parameter is represented by λ.318−320 The first regularization method is early stopping, which, as its name suggests, stops the training of the model before the overfitting region.321,322 However, this can only be carried out if the overfitting region can be clearly identified. Two other common regularization methods are L1 and L2 regularization. L1 regularization uses a penalty term which encourages the sum of the absolute values of the parameters to be small, while L2 regularization encourages the sum of the squares of the parameters to be small.323 More details about regularization can be found in the literature.318−327

Therefore, overfitting is an issue of immense importance in the development of all computational models, and complex machine learning algorithms are often considered to be in the greatest danger of overfitting. Modellers often use regularization or external validation to limit the effects of overfitting, but these do not establish how applicable a model is to an incoming novel chemical. For this, an applicability domain is appropriate; such domains are common practice in the QSAR field and have been identified as a key element in in silico toxicology modeling.328 The use of applicability domains does not yet seem to be commonplace in the development of machine learning algorithms. For additional acceptance in the field of toxicology, particularly at a regulatory level, machine learning models should aim to meet these requirements.

With the vast quantity of data available, how should one then choose a machine learning model for a given data set? In our opinion, there is perhaps no best method that can generalize to all data sets, but rather only a most suitable method. For data with a strong correlation between features, regression models, kNNs, NB models, or SVMs seem to be the most suitable due to their characteristics. DTs or RFs can be considered for noisy data, as they are generally more resistant to noise while being able to output results efficiently. To model complex problems which typical machine learning methods are unable to handle, deep learning (for example, DNNs) as well as ensemble learning seem to be the most suitable choices due to their more complex model architectures. Images are handled by specialized CNNs, while ANNs are a general machine learning method that can be used to model many kinds of data. Therefore, understanding the data well is the first step to building a successful model in machine learning.

■ FUTURE OUTLOOK AND CONCLUSION

Thus far, machine learning has been discussed, and the recent results of machine learning in predictive toxicology have been summarized and analyzed. In the future, it is expected that more models will be developed to predict toxicity, especially with technological advances which help lower computational costs and the continual development of new data sources. However, much needs to be done to address the main bottleneck facing machine learning in predictive toxicology, which is the quality and quantity of the data available to create data sets. While collaborations with pharmaceutical companies help mitigate part of this issue, as do publicly accessible online databases, there are some gene or protein targets, and even toxicological end points, which cannot be reliably predicted due to the lack of data. However, if a complete computational model, or more likely a collection of computational models, encompassing all of human toxicology is to be built, these gaps in data need to be addressed. Moreover, more must be done to solve the issue of imbalanced data in predictive toxicology; one example is to collect and disseminate negative experimental results for compounds.

Nevertheless, there have been many machine learning models with high performance metrics for predicting drug toxicity, which demonstrates the applicability of machine learning in predictive toxicology. However, even the best performing model type has its own set of limitations, which has been covered here and needs to be addressed and improved on for machine learning in predictive toxicology to make further advances. While several common types of machine learning methods have been discussed in this review, other machine learning methods are being developed and may become influential as their efficacy is established.

Recent research in predictive toxicology heavily focuses on hepatotoxicity, carcinogenicity, cardiotoxicity, and mutagenicity, while other types of toxicity are relatively less explored. While detailed models of these toxicities have been developed, this leaves a large amount of human toxicology unexplored. By understanding more about all types of toxicity (and not just the main types), the community as a whole can move closer to the dream of eliminating in vivo toxicity testing by replacing it with in silico means.

Mechanistic understanding is also key to the future of toxicology; hence, another topic on the rise in predictive toxicology deals with adverse outcome pathways (AOPs) and molecular initiating events (MIEs).57,58,329−334 While structure−activity relationships have been well established and researched in predictive toxicology, AOPs and MIEs are relatively less researched. The lack of data on AOPs and MIEs proves to be detrimental when trying to predict toxicological end points from compound structure. Although there have been some improvements in this respect, such as the establishment of a publicly accessible AOP database (AOP-Wiki: https://aopwiki.
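The class-imbalance problem described above is often tackled by resampling the training data. As a minimal sketch (plain Python, with an invented toy data set; in practice a library such as imbalanced-learn would typically be used), random undersampling of the majority class looks like this:

```python
import random
from collections import Counter

def undersample(records, labels, seed=0):
    """Randomly drop majority-class records until every class
    is the same size as the smallest (minority) class."""
    rng = random.Random(seed)
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    n_min = min(len(recs) for recs in by_class.values())
    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        for rec in rng.sample(recs, n_min):  # keep a random subset of each class
            out_records.append(rec)
            out_labels.append(lab)
    return out_records, out_labels

# Toy set: six "nontoxic" (0) compounds and two "toxic" (1) compounds.
X = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = undersample(X, y)
print(Counter(y_bal))  # both classes now contribute two compounds each
```

Oversampling the minority class (or synthetic approaches such as SMOTE) is the mirror-image option when discarding scarce experimental data is too costly.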
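The overfitting indicators just described (a validation loss rising past its minimum, or a widening gap between training and validation metrics) can be monitored automatically. Below is a minimal, framework-free sketch of a patience-based check on a validation-loss history; the loss values are invented for illustration, and deep learning libraries provide equivalent early-stopping callbacks.

```python
def best_epoch_before_overfit(val_losses, patience=2):
    """Return the epoch index with the lowest validation loss, stopping the
    scan once the loss has failed to improve for `patience` epochs."""
    best_epoch, best_loss, stalled = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, stalled = epoch, loss, 0
        else:
            stalled += 1
            if stalled >= patience:  # loss has kept rising: overfitting region
                break
    return best_epoch

# Validation loss falls to a minimum at epoch 3, then rises as the model overfits.
history = [0.90, 0.70, 0.60, 0.55, 0.60, 0.70, 0.80]
print(best_epoch_before_overfit(history))  # 3
```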
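In equation form, the penalties above amount to minimizing L(θ) + λ·Σ|θi| (L1) or L(θ) + λ·Σθi² (L2). The sketch below shows how the penalty term changes the training objective; the parameter values and λ are arbitrary illustrations.

```python
def regularized_loss(data_loss, params, lam, norm="l2"):
    """Add an L1 or L2 penalty, weighted by the regularization
    parameter lam (λ), to the unregularized data loss."""
    if norm == "l1":
        penalty = sum(abs(w) for w in params)   # sum of absolute values
    elif norm == "l2":
        penalty = sum(w * w for w in params)    # sum of squares
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    return data_loss + lam * penalty

weights = [0.5, -2.0, 1.0]
print(regularized_loss(1.0, weights, lam=0.1, norm="l1"))  # ~1.35  (1.0 + 0.1 * 3.5)
print(regularized_loss(1.0, weights, lam=0.1, norm="l2"))  # ~1.525 (1.0 + 0.1 * 5.25)
```

Because the L1 penalty grows linearly near zero, it tends to drive some parameters exactly to zero (sparse models), while the L2 penalty merely shrinks them.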
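One simple way to realize an applicability domain of the kind described above is a distance-to-training-set test: a query compound is flagged as outside the domain when its descriptor vector is too far from every training compound. The sketch below uses Euclidean distance over invented two-dimensional descriptors; published applicability domain formulations (range-, leverage-, or density-based) are more sophisticated.

```python
def in_applicability_domain(query, training_set, threshold):
    """Return True if `query` lies within `threshold` Euclidean
    distance of at least one training descriptor vector."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(distance(query, t) for t in training_set) <= threshold

# Invented 2-D descriptor vectors for the training compounds.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
print(in_applicability_domain((1.1, 0.9), train, threshold=0.5))  # True: near (1.0, 1.0)
print(in_applicability_domain((9.0, 9.0), train, threshold=0.5))  # False: a novel region
```

A prediction for a compound falling outside the domain would then be reported with lower confidence or withheld.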
org/), much still has to be done if the accurate prediction of toxicity is to become a reality.

Nevertheless, in silico methods have been increasingly employed in predictive toxicology, particularly during the screening of new chemicals for safety decision making. The use of in silico methods such as machine learning to complement in vitro methods is the current status quo in the industry. While examples tend to be commercially sensitive and these data are rarely shared openly, there are some recent papers that contain information about the use of in silico methods in industry.335−337 As the amount of available data increases, machine learning methods will likely become more attractive than expert-based systems due to their scalability. In particular, if more data can be generated or made publicly available, machine learning is also expected to perform better even if there are no improvements in the current algorithms. Perhaps this indicates a possible direction of development for future in silico methods, where the focus will be on generating new data.

Even as the need for new data is highlighted, regulatory acceptance of such in silico methods is also key to their widespread use in industry and acceptance by regulators. Regulations protect consumers, ensuring products have been through a rigorous safety protocol and allowing them to use these products while feeling at ease. Protocols to establish the quality and reliability of in silico methods have to be created, which should also ensure that reproducible results can be obtained. In silico and in vitro approaches must be demonstrated to produce risk assessments as rigorous as those of traditional methods, with clear mechanistic understanding throughout. To help bridge these gaps, machine learning algorithms should be combined with more traditional computational approaches such as read-across and with experimental in vitro studies as part of a weight-of-evidence approach. Case studies will need to be constructed and presented to regulators to gain confidence. More effort thus has to be put into developing such studies and encouraging their usage throughout industry and academia if there is to be further progress in the use of in silico methods.

Currently, machine learning algorithms are significantly useful, as highlighted by the recent successes of such methods in predictive toxicology. However, these methods still have much potential to be unlocked, which can only be done if the issues of insufficient high-quality data and regulatory acceptance are resolved. Nevertheless, the future of machine learning applications in predictive toxicology is bright, and we envision that in silico methods, in particular machine learning algorithms, will be increasingly used in industry and academia to complement the use of in vitro methods.

■ ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.chemrestox.0c00316.
Further data and discussion on in vitro and in vivo data and between animal and human studies (PDF)

■ AUTHOR INFORMATION

Corresponding Author
Jonathan M. Goodman − Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom; orcid.org/0000-0002-8693-9136; Phone: 01223 336434; Email: jmg11@cam.ac.uk

Authors
Marcus W. H. Wang − Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
Timothy E. H. Allen − Centre for Molecular Informatics, Department of Chemistry and MRC Toxicology Unit, University of Cambridge, Cambridge CB2 1EW, United Kingdom; orcid.org/0000-0001-7369-0901

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.chemrestox.0c00316

Funding
The authors acknowledge the financial support from Unilever.

Notes
The authors declare no competing financial interest.

Biographies
Marcus Wang is a second year Ph.D. student studying Chemistry at the University of Cambridge and is currently working in the research group of Jonathan Goodman. His current research focuses on creating machine learning models that are trained on molecular fingerprints. These models will be used subsequently to predict toxicological end points such as hepatotoxicity and carcinogenicity. He is also investigating the use of techniques such as dimensionality reduction methods in order to better understand the data used for the machine learning models.

Tim Allen is a Research Associate at the MRC Toxicology Unit, University of Cambridge. He completed his Ph.D. in 2016 on molecular
dimensional quantitative structure−activity relationships and state-of-the-art machine learning approaches to model MIEs. Tim has served as a member of ILSI Europe's expert group on the application of adverse outcome pathways in food ingredient risk assessment and was the recipient of the 2020 European Society of Toxicology In Vitro Early Career Award.

Jonathan Goodman is a Professor of Chemistry at the University of Cambridge. His research focuses on computational organic chemistry and chemical informatics. As well as focusing on computational toxicology, his research group also analyzes reaction mechanisms and analytical data and works with IUPAC to develop the InChI identifier. In 2013, he won the RSC's Bader Award.

■ ABBREVIATIONS

ADME, absorption, distribution, metabolism, and excretion; ANN, artificial neural network; AOP, adverse outcome pathway; AUC, area under the receiver operating characteristic curve; CNN, convolutional neural network; DNN, deep neural network; DT, decision tree; ECFP, extended connectivity fingerprint; GPU, graphical processing unit; kNN, k-nearest neighbor; LMO-CV, leave-many-out cross-validation; LOO, leave one out; MACCS, Molecular ACCess System; MCCV, Monte Carlo cross-validation; MIE, molecular initiating event; NB, naïve Bayes; NN, neural network; QPP, quadratic programming problem; QSAR, quantitative structure−activity relationship; ReLU, rectified linear unit; RF, random forest; SE, sensitivity; SP, specificity; SVM, support vector machine; SVR, support vector regression

■ REFERENCES

(1) Angermueller, C., Pärnamaa, T., Parts, L., and Stegle, O. (2016) Deep Learning for Computational Biology. Mol. Syst. Biol. 12 (7), 878.
(2) Dana, D., Gadhiya, S., St. Surin, L., Li, D., Naaz, F., Ali, Q., Paka, L., Yamin, M., Narayan, M., Goldberg, I., and Narayan, P. (2018) Deep Learning in Drug Discovery and Medicine; Scratching the Surface. Molecules 23 (9), 2384.
(3) Chen, J., Tang, Y. Y., Fang, B., and Guo, C. (2012) In Silico Prediction of Toxic Action Mechanisms of Phenols for Imbalanced Data with Random Forest Learner. J. Mol. Graphics Modell. 35, 21−27.
(4) Merlot, C. (2010) Computational Toxicology-a Tool for Early Safety Evaluation. Drug Discovery Today 15 (1−2), 16−22.
(5) Tang, W., Chen, J., Wang, Z., Xie, H., and Hong, H. (2018) Deep Learning for Predicting Toxicity of Chemicals: A Mini Review. J. Environ. Sci. Heal. - Part C Environ. Carcinog. Ecotoxicol. Rev. 36 (4), 252−271.
(6) Koutsoukas, A., St. Amand, J., Mishra, M., and Huan, J. (2016) Predictive Toxicology: Modeling Chemical Induced Toxicological Response Combining Circular Fingerprints with Random Forest and Support Vector Machine. Front. Environ. Sci. 4, 11.
(7) Cao, D. S., Zhao, J. C., Yang, Y. N., Zhao, C. X., Yan, J., Liu, S., Hu, Q. N., Xu, Q. S., and Liang, Y. Z. (2012) In Silico Toxicity Prediction by Support Vector Machine and SMILES Representation-Based String Kernel. SAR QSAR Environ. Res. 23 (1−2), 141−153.
(8) Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V., and Fotiadis, D. I. (2015) Machine Learning Applications in Cancer Prognosis and Prediction. Comput. Struct. Biotechnol. J. 13, 8−17.
(9) Cao, D.-S., Dong, J., Wang, N.-N., Wen, M., Deng, B.-C., Zeng, W.-B., Xu, Q.-S., Liang, Y.-Z., Lu, A.-P., and Chen, A. F. (2015) In Silico Toxicity Prediction of Chemicals from EPA Toxicity Database by Kernel Fusion-Based Support Vector Machines. Chemom. Intell. Lab. Syst. 146, 494−502.
(10) Wu, K., and Wei, G. W. (2018) Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks. J. Chem. Inf. Model. 58 (2), 520−531.
(11) Lavecchia, A. (2015) Machine-Learning Approaches in Drug Discovery: Methods and Applications. Drug Discovery Today 20 (3), 318−331.
(12) Mitchell, J. B. O. (2014) Machine Learning Methods in Chemoinformatics. Wiley Interdiscip. Rev. Comput. Mol. Sci. 4 (5), 468−481.
(13) Gao, M., Igata, H., Takeuchi, A., Sato, K., and Ikegaya, Y. (2017) Machine Learning-Based Prediction of Adverse Drug Effects: An Example of Seizure-Inducing Compounds. J. Pharmacol. Sci. 133 (2), 70−78.
(14) Lo, Y.-C., Rensi, S. E., Torng, W., and Altman, R. B. (2018) Machine Learning in Chemoinformatics and Drug Discovery. Drug Discovery Today 23 (8), 1538−1546.
(15) Wu, Y., and Wang, G. (2018) Machine Learning Based Toxicity Prediction: From Chemical Structural Description to Transcriptome Analysis. Int. J. Mol. Sci. 19 (8), 2358.
(16) Idakwo, G., Luttrell, J., Chen, M., Hong, H., Zhou, Z., Gong, P., and Zhang, C. (2018) A Review on Machine Learning Methods for in Silico Toxicity Prediction. J. Environ. Sci. Heal. - Part C Environ. Carcinog. Ecotoxicol. Rev. 36 (4), 169−191.
(17) Mayr, A., Klambauer, G., Unterthiner, T., and Hochreiter, S. (2016) DeepTox: Toxicity Prediction Using Deep Learning. Front. Environ. Sci. 3 (FEB), 80.
(18) Zhang, L., Zhang, H., Ai, H., Hu, H., Li, S., Zhao, J., and Liu, H. (2018) Applications of Machine Learning Methods in Drug Toxicity Prediction. Curr. Top. Med. Chem. 18 (12), 987−997.
(19) Raies, A. B., and Bajic, V. B. (2018) In Silico Toxicology: Comprehensive Benchmarking of Multi-Label Classification Methods Applied to Chemical Toxicity Data. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 8 (3), e1352.
(20) Pu, L., Naderi, M., Liu, T., Wu, H. C., Mukhopadhyay, S., and Brylinski, M. (2019) EToxPred: A Machine Learning-Based Approach to Estimate the Toxicity of Drug Candidates. BMC Pharmacol. Toxicol. 20 (1), 2.
(21) Lysenko, A., Sharma, A., Boroevich, K. A., and Tsunoda, T. (2018) An Integrative Machine Learning Approach for Prediction of Toxicity-Related Drug Safety. Life Sci. Alliance 1 (6), e201800098.
(22) Wenlock, M. C., Austin, R. P., Barton, P., Davis, A. M., and Leeson, P. D. (2003) A Comparison of Physiochemical Property Profiles of Development and Marketed Oral Drugs. J. Med. Chem. 46 (7), 1250−1256.
(23) Frank, C., Himmelstein, D. U., Woolhandler, S., Bor, D. H., Wolfe, S. M., Heymann, O., Zallman, L., and Lasser, K. E. (2014) Era Of Faster FDA Drug Approval Has Also Seen Increased Black-Box Warnings And Market Withdrawals. Health Aff. 33 (8), 1453−1459.
(24) Foufelle, F., and Fromenty, B. (2016) Role of Endoplasmic Reticulum Stress in Drug-induced Toxicity. Pharmacol. Res. Perspect. 4 (1), e00211.
(25) DiMasi, J. A. (2001) Risks in New Drug Development: Approval Success Rates for Investigational Drugs. Clin. Pharmacol. Ther. 69, 297−307.
(26) Segall, M. D., and Barber, C. (2014) Addressing Toxicity Risk When Designing and Selecting Compounds in Early Drug Discovery. Drug Discovery Today 19 (5), 688−693.
(27) Fröhlich, E., and Roblegg, E. (2012) Models for Oral Uptake of Nanoparticles in Consumer Products. Toxicology 291, 10−17.
(28) Mantovani, A., Maranghi, F., La Rocca, C., Tiboni, G. M., and Clementi, M. (2008) The Role of Toxicology to Characterize Biomarkers for Agrochemicals with Potential Endocrine Activities. Reprod. Toxicol. 26 (1), 1−7.
(29) Smith, M.-C., Madec, S., Coton, E., and Hymery, N. (2016) Natural Co-Occurrence of Mycotoxins in Foods and Feeds and Their in Vitro Combined Toxicological Effects. Toxins 8 (4), 94.
(30) Rovida, C., Asakura, S., Daneshian, M., Hofman-Huether, H., Leist, M., Meunier, L., Reif, D., Rossi, A., Schmutz, M., Valentin, J. P., Zurlo, J., and Hartung, T. (2015) Food for Thought...: Toxicity Testing in the 21st Century beyond Environmental Chemicals. ALTEX 32 (3), 171−181.
(31) Van Norman, G. A. (2016) Drugs, Devices, and the FDA: Part 1: An Overview of Approval Processes for Drugs. JACC Basic to Transl. Sci. 1 (3), 170−179.
(32) Lee, K. H., Baik, S. Y., Lee, S. Y., Park, C. H., Park, P. J., and Kim, J. H. (2016) Genome Sequence Variability Predicts Drug Precautions and Withdrawals from the Market. PLoS One 11 (9), e0162135.
(33) McNaughton, R., Huet, G., and Shakir, S. (2014) An Investigation into Drug Products Withdrawn from the EU Market between 2002 and 2011 for Safety Reasons and the Evidence Used to Support the Decision-Making. BMJ Open 4 (1), e004221.
(34) Park, B. K., Boobis, A., Clarke, S., Goldring, C. E. P., Jones, D., Kenna, J. G., Lambert, C., Laverty, H. G., Naisbitt, D. J., Nelson, S., Nicoll-Griffith, D. A., Obach, R. S., Routledge, P., Smith, D. A., Tweedie, D. J., Vermeulen, N., Williams, D. P., Wilson, I. D., and Baillie, T. A. (2011) Managing the Challenge of Chemically Reactive Metabolites in Drug Development. Nat. Rev. Drug Discovery 10 (4), 292−306.
(35) O'Brien, P. J., Siraki, A. G., and Shangari, N. (2005) Aldehyde Sources, Metabolism, Molecular Toxicity Mechanisms, and Possible Effects on Human Health. Crit. Rev. Toxicol. 35 (7), 609−662.
(36) Mak, I. W. Y., Evaniew, N., and Ghert, M. (2014) Lost in Translation: Animal Models and Clinical Trials in Cancer Treatment. Am. J. Transl. Res. 6 (2), 114.
(37) Smietana, K., Siatkowski, M., and Møller, M. (2016) Trends in Clinical Success Rates. Nat. Rev. Drug Discovery 15 (6), 379−380.
(38) Evens, R. P. (2016) Pharma Success in Product Development Does Biotechnology Change the Paradigm in Product Development and Attrition. AAPS J. 18 (1), 281−285.
(39) Festing, S., and Wilkinson, R. (2007) The Ethics of Animal Research. EMBO Rep. 8 (6), 526−530.
(40) Varga, O. E., Hansen, A. K., Sandøe, P., and Olsson, I. A. S. (2010) Validating Animal Models for Preclinical Research: A Scientific and Ethical Discussion. ATLA, Altern. Lab. Anim. 38 (3), 245−248.
(41) Adler, S., Basketter, D., Creton, S., Pelkonen, O., Van Benthem, J., Zuang, V., Andersen, K. E., Angers-Loustau, A., Aptula, A., Bal-Price, A., Benfenati, E., Bernauer, U., Bessems, J., Bois, F. Y., Boobis, A., Brandon, E., Bremer, S., Broschard, T., Casati, S., Coecke, S., Corvi, R., Cronin, M., Daston, G., Dekant, W., Felter, S., Grignard, E., Gundert-Remy, U., Heinonen, T., Kimber, I., Kleinjans, J., Komulainen, H., Kreiling, R., Kreysa, J., Leite, S. B., Loizou, G., Maxwell, G., Mazzatorta, P., Munn, S., Pfuhler, S., Phrakonkham, P., Piersma, A., Poth, A., Prieto, P., Repetto, G., Rogiers, V., Schoeters, G., Schwarz, M., Serafimova, R., Tähti, H., Testai, E., Van Delft, J., Van Loveren, H., Vinken, M., Worth, A., and Zaldivar, J. M. (2011) Alternative (Non-Animal) Methods for Cosmetics Testing: Current Status and Future Prospects-2010. Arch. Toxicol. 85 (5), 367−485.
(42) Burden, N., Mahony, C., Müller, B. P., Terry, C., Westmoreland, C., and Kimber, I. (2015) Aligning the 3Rs with New Paradigms in the Safety Assessment of Chemicals. Toxicology 330, 62−66.
(43) Sullivan, K. M., Manuppello, J. R., and Willett, C. E. (2014) Building on a Solid Foundation: SAR and QSAR as a Fundamental Strategy to Reduce Animal Testing. SAR QSAR Environ. Res. 25 (5), 357−365.
(44) Chapman, K. L., Holzgrefe, H., Black, L. E., Brown, M., Chellman, G., Copeman, C., Couch, J., Creton, S., Gehen, S., Hoberman, A., Kinter, L. B., Madden, S., Mattis, C., Stemple, H. A., and Wilson, S. (2013) Pharmaceutical Toxicology: Designing Studies to Reduce Animal Use, While Maximizing Human Translation. Regul. Toxicol. Pharmacol. 66 (1), 88−103.
(45) Knudsen, T. B., Keller, D. A., Sander, M., Carney, E. W., Doerrer, N. G., Eaton, D. L., Fitzpatrick, S. C., Hastings, K. L., Mendrick, D. L., Tice, R. R., Watkins, P. B., and Whelan, M. (2015) FutureTox II: In Vitro Data and In Silico Models for Predictive Toxicology. Toxicol. Sci. 143 (2), 256−267.
(46) Li, X., Chen, L., Cheng, F., Wu, Z., Bian, H., Xu, C., Li, W., Liu, G., Shen, X., and Tang, Y. (2014) In Silico Prediction of Chemical Acute Oral Toxicity Using Multi-Classification Methods. J. Chem. Inf. Model. 54 (4), 1061−1069.
(47) Goh, G. B., Hodas, N. O., and Vishnu, A. (2017) Deep Learning for Computational Chemistry. J. Comput. Chem. 38 (16), 1291−1307.
(48) Raies, A. B., and Bajic, V. B. (2016) In Silico Toxicology: Computational Methods for the Prediction of Chemical Toxicity. Wiley Interdiscip. Rev. Comput. Mol. Sci. 6 (2), 147−172.
(49) Raunio, H. (2011) In Silico Toxicology − Non-Testing Methods. Front. Pharmacol. 2, 33.
(50) Greene, N., Judson, P. N., Langowski, J. J., and Marchant, C. A. (1999) Knowledge-Based Expert Systems for Toxicity and Metabolism Prediction: DEREK, StAR and METEOR. SAR QSAR Environ. Res. 10 (2−3), 299−314.
(51) Marchant, C. A., Briggs, K. A., and Long, A. (2008) In Silico Tools for Sharing Data and Knowledge on Toxicity and Metabolism: Derek for Windows, Meteor, and Vitic. Toxicol. Mech. Methods 18 (2−3), 177−187.
(52) Mombelli, E., and Devillers, J. (2010) Evaluation of the OECD (Q)SAR Application Toolbox and Toxtree for Predicting and Profiling the Carcinogenic Potential of Chemicals. SAR QSAR Environ. Res. 21 (7−8), 731−752.
(53) Dimitrov, S. D., Diderich, R., Sobanski, T., Pavlov, T. S., Chankov, G. V., Chapkanov, A. S., Karakolev, Y. H., Temelkov, S. G., Vasilev, R. A., Gerova, K. D., Kuseva, C. D., Todorova, N. D., Mehmed, A. M., Rasenberg, M., and Mekenyan, O. G. (2016) QSAR Toolbox − Workflow and Major Functionalities. SAR QSAR Environ. Res. 27 (3), 203−219.
(54) Yang, H., Sun, L., Li, W., Liu, G., and Tang, Y. (2018) In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts. Front. Chem. 6, 30.
(55) Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., Ferran, E., Lee, G., Li, B., Madabhushi, A., Shah, P., Spitzer, M., and Zhao, S. (2019) Applications of Machine Learning in Drug Discovery and Development. Nat. Rev. Drug Discovery 18 (6), 463−477.
(56) Baskin, I. I. (2018) Machine Learning Methods in Computational Toxicology. In Computational Toxicology, pp 119−139, Humana Press, New York, NY.
(57) Villeneuve, D. L., Crump, D., Garcia-Reyero, N., Hecker, M., Hutchinson, T. H., LaLone, C. A., Landesmann, B., Lettieri, T., Munn, S., Nepelska, M., Ottinger, M. A., Vergauwen, L., and Whelan, M. (2014) Adverse Outcome Pathway (AOP) Development I: Strategies and Principles. Toxicol. Sci. 142 (2), 312−320.
(58) Knapen, D., Vergauwen, L., Villeneuve, D. L., and Ankley, G. T. (2015) The Potential of AOP Networks for Reproductive and Developmental Toxicity Assay Development. Reprod. Toxicol. 56, 52−55.
(59) Richard, A. M., Judson, R. S., Houck, K. A., Grulke, C. M., Volarath, P., Thillainadarajah, I., Yang, C., Rathman, J., Martin, M. T., Wambaugh, J. F., Knudsen, T. B., Kancherla, J., Mansouri, K., Patlewicz, G., Williams, A. J., Little, S. B., Crofton, K. M., and Thomas, R. S. (2016) ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem. Res. Toxicol. 29 (8), 1225−1251.
(60) Dix, D. J., Houck, K. A., Martin, M. T., Richard, A. M., Setzer, R. W., and Kavlock, R. J. (2007) The ToxCast Program for Prioritizing Toxicity Testing of Environmental Chemicals. Toxicol. Sci. 95 (1), 5−12.
(61) Gaulton, A., Hersey, A., Nowotka, M., Bento, A. P., Chambers, J., Mendez, D., Mutowo, P., Atkinson, F., Bellis, L. J., Cibrián-Uhalte, E.,
Davies, M., Dedman, N., Karlsson, A., Magariños, M. P., Overington, J. P., Papadatos, G., Smit, I., and Leach, A. R. (2017) The ChEMBL Database in 2017. Nucleic Acids Res. 45 (D1), D945−D954.
(62) Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., Light, Y., McGlinchey, S., Michalovich, D., Al-Lazikani, B., and Overington, J. P. (2012) ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 40 (D1), D1100−D1107.
(63) Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., Krüger, F. A., Light, Y., Mak, L., McGlinchey, S., Nowotka, M., Papadatos, G., Santos, R., and Overington, J. P. (2014) The ChEMBL Bioactivity Database: An Update. Nucleic Acids Res. 42 (D1), D1083−D1090.
(64) Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B. A., Thiessen, P. A., Yu, B., Zaslavsky, L., Zhang, J., and Bolton, E. E. (2019) PubChem 2019 Update: Improved Access to Chemical Data. Nucleic Acids Res. 47, D1102.
(65) Bowes, J., Brown, A. J., Hamon, J., Jarolimek, W., Sridhar, A., Waldron, G., and Whitebread, S. (2012) Reducing Safety-Related Drug Attrition: The Use of in Vitro Pharmacological Profiling. Nat. Rev. Drug Discovery 11 (12), 909−922.
(66) Schulz, M., Schmoldt, A., and Schulz, M. (2003) Therapeutic and Toxic Blood Concentrations of More than 800 Drugs and Other Xenobiotics. Pharmazie 58 (7), 447−474.
(76) Lin, W. C., Tsai, C. F., Hu, Y. H., and Jhang, J. S. (2017) Clustering-Based Undersampling in Class-Imbalanced Data. Inf. Sci. (N. Y.) 409−410, 17−26.
(77) Boughorbel, S., Jarray, F., and El-Anbari, M. (2017) Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric. PLoS One 12 (6), e0177678.
(78) Yan, Y., Chen, M., Shyu, M. L., and Chen, S. C. (2015) Deep Learning for Imbalanced Multimedia Data Classification. IEEE International Symposium on Multimedia, 483−488.
(79) Akbani, R., Kwek, S., and Japkowicz, N. (2004) Applying Support Vector Machines to Imbalanced Datasets. In European Conference on Machine Learning, pp 39−50, Springer, Berlin.
(80) Batista, G. E. A. P. A., Prati, R. C., and Monard, M. C. (2004) A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explor. Newsl. 6 (1), 20−29.
(81) Lemaître, G., Nogueira, F., and Aridas, C. K. (2017) Imbalanced-Learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 18 (1), 559−563.
(82) Erickson, B. J., Korfiatis, P., Akkus, Z., and Kline, T. L. (2017) Machine Learning for Medical Imaging. Radiographics 37 (2), 505−515.
(83) Beam, A. L., and Kohane, I. S. (2018) Big Data and Machine Learning in Health Care. JAMA 319 (13), 1317−1318.
(84) Gawehn, E., Hiss, J. A., and Schneider, G. (2016) Deep Learning in Drug Discovery. Mol. Inf. 35 (1), 3−14.
(67) Sagar, S., Kaur, M., Radovanovic, A., and Bajic, V. B. (2013) (85) Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., and Blaschke,
Dragon Exploration System on Marine Sponge Compounds Inter- T. (2018) The Rise of Deep Learning in Drug Discovery. Drug
actions. J. Cheminf. 5 (1), 11. Discovery Today 23 (6), 1241−1250.
(68) Cases, M., Pastor, M., and Sanz, F. (2013) The ETOX Library of (86) Nichols, J. A., Herbert Chan, H. W., and Baker, M. A. B. (2019)
Public Resources for in Silico Toxicity Prediction. Mol. Inf. 32 (1), 24− Machine Learning: Applications of Artificial Intelligence to Imaging
35. and Diagnosis. Biophys. Rev. 11 (1), 111−118.
(69) Wishart, D. S., Knox, C., Guo, A. C., Eisner, R., Young, N., (87) Wang, X., Wang, X., and Wilkes, D. M. (2020) Supervised
Gautam, B., Hau, D. D., Psychogios, N., Dong, E., Bouatra, S., Mandal, Learning for Data Classification Based Object Recognition. Machine
R., Sinelnikov, I., Xia, J., Jia, L., Cruz, J. A., Lim, E., Sobsey, C. A., Learning-based Natural Scene Recognition for Mobile Robot Localization
Shrivastava, S., Huang, P., Liu, P., Fang, L., Peng, J., Fradette, R., Cheng, in An Unknown Environment, 179−194.
D., Tzur, D., Clements, M., Lewis, A., De Souza, A., Zuniga, A., Dawe, (88) Jordan, M. I., and Mitchell, T. M. (2015) Machine Learning:
M., Xiong, Y., Clive, D., Greiner, R., Nazyrova, A., Shaykhutdinov, R., Trends, Perspectives, and Prospects. Science 349 (6245), 255−260.
Li, L., Vogel, H. J., and Forsythe, I. (2009) HMDB: A Knowledgebase (89) Schrider, D. R., and Kern, A. D. (2018) Supervised Machine
for the Human Metabolome. Nucleic Acids Res. 37, D603−D610. Learning for Population Genetics: A New Paradigm. Trends Genet. 34
(70) Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, (4), 301−312.
Y., Djoumbou, Y., Mandal, R., Aziat, F., Dong, E., Bouatra, S., (90) Libbrecht, M. W., and Noble, W. S. (2015) Machine Learning
Sinelnikov, I., Arndt, D., Xia, J., Liu, P., Yallou, F., Bjorndahl, T., Perez- Applications in Genetics and Genomics. Nat. Rev. Genet. 16 (6), 321−
Pineiro, R., Eisner, R., Allen, F., Neveu, V., Greiner, R., and Scalbert, A. 332.
(2012) HMDB 3.0The Human Metabolome Database in 2013. (91) Längkvist, M., Karlsson, L., and Loutfi, A. (2014) A Review of
Nucleic Acids Res. 41 (D1), D801−D807. Unsupervised Feature Learning and Deep Learning for Time-Series
(71) Wishart, D. S., Tzur, D., Knox, C., Eisner, R., Guo, A. C., Young, Modeling. Pattern Recognit. Lett. 42 (1), 11−24.
N., Cheng, D., Jewell, K., Arndt, D., Sawhney, S., Fung, C., Nikolai, L., (92) Le, Q. V. Building High-Level Features Using Large Scale
Lewis, M., Coutouly, M.-A., Forsythe, I., Tang, P., Shrivastava, S., Unsupervised Learning. (2013) Proceedings from the IEEE Interna-
Jeroncic, K., Stothard, P., Amegbey, G., Block, D., Hau, D. D., Wagner, tional Conference on Acoustics, Speech and Signal Processing, May 26−31,
2013, Vancouver, BC, Canada, pp 8595−8598, IEEE, New York.
J., Miniaci, J., Clements, M., Gebremedhin, M., Guo, N., Zhang, Y.,
(93) Raina, R., Madhavan, A., and Ng, A. Y. Large-Scale Deep
Duggan, G. E., MacInnis, G. D., Weljie, A. M., Dowlatabadi, R.,
Unsupervised Learning Using Graphics Processors. (2009) Proceed-
Bamforth, F., Clive, D., Greiner, R., Li, L., Marrie, T., Sykes, B. D.,
ings of the 26th Annual International Conference on Machine Learning,
Vogel, H. J., and Querengesser, L. (2007) HMDB: The Human June, 2009, Montreal, Quebec, Canada, pp 873−880, ACM, New York.
Metabolome Database. Nucleic Acids Res. 35, D521−D526. (94) van Engelen, J. E., and Hoos, H. H. (2020) A Survey on Semi-
(72) Han, H., Wang, W. Y., and Mao, B. H. Borderline-SMOTE: A Supervised Learning. Mach. Learn. 109 (2), 373−440.
New over-Sampling Method in Imbalanced Data Sets Learning. (2005) (95) Guo, B., Tao, H., Hou, C., and Yi, D. (2020) Semi-Supervised
In International conference on intelligent computing, pp 878−887, Multi-Label Feature Learning via Label Enlarged Discriminant
Springer, Berlin. Analysis. Knowl. Inf. Syst. 62, 2383−2417.
(73) Chen, J. J., Tsai, C. A., Young, J. F., and Kodell, R. L. (2005) (96) Kostopoulos, G., Kotsiantis, S., Fazakis, N., Koutsonikos, G., and
Classification Ensembles for Unbalanced Class Sizes in Predictive Pierrakeas, C. (2019) A Semi-Supervised Regression Algorithm for
Toxicology. SAR QSAR Environ. Res. 16 (6), 517−529. Grade Prediction of Students in Distance Learning Courses. Int. J. Artif.
(74) Fernández, A., García, S., Herrera, F., and Chawla, N. V. (2018) Intell. Tools 28 (4), 1940001.
SMOTE for Learning from Imbalanced Data: Progress and Challenges, (97) Zhou, L., Liu, Z., Tan, H., and Xie, X. (2019) Semisupervised
Marking the 15-Year Anniversary. Journal of Artificial Intelligence Learning with Adversarial Training among Joint Distributions. J.
Research 61, 863−905. Electron. Imaging 28 (05), 1.
(75) Buda, M., Maki, A., and Mazurowski, M. A. (2018) A Systematic (98) Wang, J., Zuo, R., and Xiong, Y. (2020) Mapping Mineral
Study of the Class Imbalance Problem in Convolutional Neural Prospectivity via Semi-Supervised Random Forest. Nat. Resour. Res. 29
Networks. Neural Networks 106, 249−259. (1), 189−202.

Q https://dx.doi.org/10.1021/acs.chemrestox.0c00316
Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
Chemical Research in Toxicology pubs.acs.org/crt Review

(99) Huang, G., Song, S., Gupta, J. N. D., and Wu, C. (2014) Semi- (120) Rao, R. B., Fung, G., and Rosales, R. On the Dangers of Cross-
Supervised and Unsupervised Extreme Learning Machines. IEEE Trans. Validation. An Experimental Evaluation. (2008) Proceedings from the
Cybern. 44 (12), 2405−2417. 2008 SIAM international conference on data mining, April 24−26, 2008,
(100) Tanha, J., van Someren, M., and Afsarmanesh, H. (2017) Semi- Atlanta, GA, pp 588−596, SIAM, Philadelphia, PA.
Supervised Self-Training for Decision Tree Classifiers. Int. J. Mach. (121) Tan, N. X., Li, P., Rao, H. B., Li, Z. R., and Li, X. Y. (2010)
Learn. Cybern. 8 (1), 355−370. Prediction of the Acute Toxicity of Chemical Compounds to the
(101) Zhou, Z.-H. (2018) A Brief Introduction to Weakly Supervised Fathead Minnow by Machine Learning Approaches. Chemom. Intell.
Learning. Natl. Sci. Rev. 5 (1), 44−53. Lab. Syst. 100 (1), 66−73.
(102) Idakwo, G., Thangapandian, S., Luttrell, J., Zhou, Z., Zhang, C., (122) Wen, M., Zhang, Z., Niu, S., Sha, H., Yang, R., Yun, Y., and Lu,
and Gong, P. (2019) Deep Learning-Based Structure-Activity H. (2017) Deep-Learning-Based Drug-Target Interaction Prediction. J.
Relationship Modeling for Multi-Category Toxicity Classification: A Proteome Res. 16 (4), 1401−1409.
Case Study of 10K Tox21 Chemicals With High-Throughput Cell- (123) Gayvert, K. M., Madhukar, N. S., and Elemento, O. (2016) A
Based Androgen Receptor Bioassay Data. Front. Physiol. 10, 1044. Data-Driven Approach to Predicting Successes and Failures of Clinical
(103) Sun, L., Yang, H., Cai, Y., Li, W., Liu, G., and Tang, Y. (2019) In Trials. Cell Chem. Biol. 23 (10), 1294−1301.
Silico Prediction of Endocrine Disrupting Chemicals Using Single- (124) Kearnes, S., McCloskey, K., Berndl, M., Pande, V., and Riley, P.
Label and Multilabel Models. J. Chem. Inf. Model. 59 (3), 973−982. (2016) Molecular Graph Convolutions: Moving beyond Fingerprints.
(104) Jiang, C., Yang, H., Di, P., Li, W., Tang, Y., and Liu, G. (2019) In J. Comput.-Aided Mol. Des. 30 (8), 595−608.
Silico Prediction of Chemical Reproductive Toxicity Using Machine (125) Cereto-Massagué, A., Ojeda, M. J., Valls, C., Mulero, M., Garcia-
Learning. J. Appl. Toxicol. 39 (6), 844−854. Vallvé, S., and Pujadas, G. (2015) Molecular Fingerprint Similarity
(105) Xu, Y., Pei, J., and Lai, L. (2017) Deep Learning Based Search in Virtual Screening. Methods 71, 58−63.
Regression and Multiclass Models for Acute Oral Toxicity Prediction (126) Durant, J. L., Leland, B. A., Henry, D. R., and Nourse, J. G.
with Automatic Chemical Feature Extraction. J. Chem. Inf. Model. 57 (2002) Reoptimization of MDL Keys for Use in Drug Discovery. J.
(11), 2672−2685. Chem. Inf. Comput. Sci. 42 (6), 1273−1280.
(106) Pereira, J. C., Caffarena, E. R., and Dos Santos, C. N. (2016) (127) Schaffer, C. (1993) Selecting a Classification Method by Cross-
Boosting Docking-Based Virtual Screening with Deep Learning. J. Validation. Mach. Learn. 13 (1), 135−143.
Chem. Inf. Model. 56 (12), 2495−2506. (128) Kerns, S. L., Kundu, S., Oh, J. H., Singhal, S. K., Janelsins, M.,
(107) Rogers, D., and Hahn, M. (2010) Extended-Connectivity Travis, L. B., Deasy, J. O., Janssens, A. C. J. E., Ostrer, H., Parliament,
Fingerprints. J. Chem. Inf. Model. 50 (5), 742−754. M., Usmani, N., and Rosenstein, B. S. (2015) The Prediction of
(108) Cai, Y., Yang, H., Li, W., Liu, G., Lee, P. W., and Tang, Y. (2019) Radiotherapy Toxicity Using Single Nucleotide Polymorphism-Based
Computational Prediction of Site of Metabolism for UGT-Catalyzed Models: A Step Toward Prevention. Seminars in radiation oncology 25
Reactions. J. Chem. Inf. Model. 59 (3), 1085−1095. (4), 281−291.
(109) Wang, Y., Guo, Y., Kuang, Q., Pu, X., Ji, Y., Zhang, Z., and Li, M. (129) Linden, A., Yarnold, P. R., and Nallamothu, B. K. (2016) Using
(2015) A Comparative Study of Family-Specific Protein-Ligand Machine Learning to Model Dose-Response Relationships. J. Eval. Clin.
Complex Affinity Prediction Based on Random Forest Approach. J. Pract. 22 (6), 860−867.
Comput.-Aided Mol. Des. 29 (4), 349−360. (130) Zhong, E., Fan, W., Yang, Q., Verscheure, O., and Ren, J. Cross
(110) Bartlett, M. S., Littlewort, G., Frank, M., Lainscsek, C., Fasel, I., Validation Framework to Choose amongst Models and Datasets for
and Movellan, J. Recognizing Facial Expression: Machine Learning and Transfer Learning. (2010) Proceedings from the Joint European
Application to Spontaneous Behavior. (2005) Proceedings from the Conference on Machine Learning and Knowledge Discovery in Databases,
IEEE Computer Society Conference on Computer Vision and Pattern September 20−24, 2010, Barcelona, Spain, pp 547−562, Springer,
Recognition, June 20−25, 2005, San Diego, CA, Vol. 2, pp 568−573, Berlin.
IEEE, New York. (131) Cawley, G. C., and Talbot, N. L. C. (2003) Efficient Leave-One-
(111) Breiman, L. (2001) Random Forests. Mach. Learn. 45 (1), 5− out Cross-Validation of Kernel Fisher Discriminant Classifiers. Pattern
32. Recognit. 36 (11), 2585−2592.
(112) Kumari, P., Nath, A., and Chaube, R. (2015) Identification of (132) Cawley, G. C., and Talbot, N. L. C. (2004) Fast Exact Leave-
Human Drug Targets Using Machine-Learning Algorithms. Comput. One-out Cross-Validation of Sparse Least-Squares Support Vector
Biol. Med. 56, 175−181. Machines. Neural Networks 17 (10), 1467−1475.
(113) Fooshee, D., Mood, A., Gutman, E., Tavakoli, M., Urban, G., (133) Chen, T., Cao, Y., Zhang, Y., Liu, J., Bao, Y., Wang, C., Jia, W.,
Liu, F., Huynh, N., Van Vranken, D., and Baldi, P. (2018) Deep and Zhao, A. (2013) Random Forest in Clinical Metabolomics for
Learning for Chemical Reaction Prediction. Mol. Syst. Des. Eng. 3 (3), Phenotypic Discrimination and Biomarker Selection. J. Evidence-Based
442−452. Complementary Altern. Med. 2013, 298183.
(114) Patil, S. R., and Suralkar, S. R. Neural Network Based (134) He, H., Bai, Y., Garcia, E. A., and Li, S. ADASYN: Adaptive
Fingerprint Classification. Int. J. Sci. Res. 2013, 2 (1).5862 Synthetic Sampling Approach for Imbalanced Learning. (2008)
(115) Jha, D., Ward, L., Paul, A., Liao, W., keng, Choudhary, A., Proceedings from the IEEE International Joint Conference on Neural
Wolverton, C., and Agrawal, A. (2018) ElemNet: Deep Learning the Networks, June 1−8, 2008, Hong Kong, China, pp 1322−1328, IEEE,
Chemistry of Materials From Only Elemental Composition. Sci. Rep. 8 New York.
(1), 17593. (135) Graser, J., Kauwe, S. K., and Sparks, T. D. (2018) Machine
(116) Chandrashekar, G., and Sahin, F. (2014) A Survey on Feature Learning and Energy Minimization Approaches for Crystal Structure
Selection Methods. Comput. Electr. Eng. 40 (1), 16−28. Predictions: A Review and New Horizons. Chem. Mater. 30 (11),
(117) Blum, A. L., and Langley, P. (1997) Selection of Relevant 3601−3612.
Features and Examples in Machine Learning. Artif. Intell. 97 (1−2), (136) An, S., Liu, W., and Venkatesh, S. (2007) Fast Cross-Validation
245−271. Algorithms for Least Squares Support Vector Machine and Kernel
(118) Korkmaz, S., Zararsiz, G., and Goksuluk, D. (2014) Drug/ Ridge Regression. Pattern Recognit. 40 (8), 2154−2162.
Nondrug Classification Using Support Vector Machines with Various (137) Rácz, A., Bajusz, D., and Héberger, K. (2018) SAR and QSAR in
Feature Selection Strategies. Comput. Methods Programs Biomed. 117 Environmental Research Modelling Methods and Cross-Validation
(2), 51−60. Variants in QSAR: A Multi-Level Analysis. SAR QSAR Environ. Res. 29
(119) Zhang, P., Wang, F., Hu, J., and Sorrentino, R. (2015) Label (9), 661−674.
Propagation Prediction of Drug-Drug Interactions Based on Clinical (138) Xu, Q. S., and Liang, Y. Z. (2001) Monte Carlo Cross
Side Effects. Sci. Rep. 5, 12339. Validation. Chemom. Intell. Lab. Syst. 56 (1), 1−11.

R https://dx.doi.org/10.1021/acs.chemrestox.0c00316
Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
Chemical Research in Toxicology pubs.acs.org/crt Review

(139) He, S., Ye, T., Wang, R., Zhang, C., Zhang, X., Sun, G., and Sun, Artificial Neural Network and Bayesian Classifier for Mutagenicity
X. (2019) An In Silico Model for Predicting Drug-Induced Prediction. Interdiscip. Sci.: Comput. Life Sci. 3 (3), 232−239.
Hepatotoxicity. Int. J. Mol. Sci. 20 (8), 1897. (158) Zhou, S., Li, G. B., Huang, L. Y., Xie, H. Z., Zhao, Y. L., Chen, Y.
(140) Liu, J., Mansouri, K., Judson, R. S., Martin, M. T., Hong, H., Z., Li, L. L., and Yang, S. Y. (2014) A Prediction Model of Drug-
Chen, M., Xu, X., Thomas, R. S., and Shah, I. (2015) Predicting Induced Ototoxicity Developed by an Optimal Support Vector
Hepatotoxicity Using ToxCast in Vitro Bioactivity and Chemical Machine (SVM) Method. Comput. Biol. Med. 51, 122−127.
Structure. Chem. Res. Toxicol. 28 (4), 738−751. (159) Lee, H.-M., Yu, M.-S., Kazmi, S. R., Oh, S. Y., Rhee, K.-H., Bae,
(141) Liu, J., Patlewicz, G., Williams, A. J., Thomas, R. S., and Shah, I. M.-A., Lee, B. H., Shin, D.-S., Oh, K.-S., Ceong, H., Lee, D., and Na, D.
(2017) Predicting Organ Toxicity Using in Vitro Bioactivity Data and (2019) Computational Determination of HERG-Related Cardiotox-
Chemical Structure. Chem. Res. Toxicol. 30 (11), 2046−2059. icity of Drug Candidates. BMC Bioinf. 20 (S10), 250.
(142) Zhang, H., Cao, Z. X., Li, M., Li, Y. Z., and Peng, C. (2016) (160) Siramshetty, V. B., Chen, Q., Devarakonda, P., and Preissner, R.
Novel Naive ̈ Bayes Classification Models for Predicting the (2018) The Catch-22 of Predicting HERG Blockade Using Publicly
Carcinogenicity of Chemicals. Food Chem. Toxicol. 97, 141−149. Accessible Bioactivity Data. J. Chem. Inf. Model. 58 (6), 1224−1233.
(143) Zhang, H., Ren, J. X., Kang, Y. L., Bo, P., Liang, J. Y., Ding, L., (161) Zhang, Y., Zhao, J., Wang, Y., Fan, Y., Zhu, L., Yang, Y., Chen,
Kong, W. B., and Zhang, J. (2017) Development of Novel in Silico X., Lu, T., Chen, Y., and Liu, H. (2019) Prediction of HERG K+
Model for Developmental Toxicity Assessment by Using Naive ̈ Bayes Channel Blockage Using Deep Neural Networks. Chem. Biol. Drug Des.
Classifier Method. Reprod. Toxicol. 71, 8−15. 94 (5), 1973−1985.
(144) Zhang, H., Ding, L., Zou, Y., Hu, S. Q., Huang, H. G., Kong, W. (162) Kim, E., and Nam, H. (2017) Prediction Models for Drug-
B., and Zhang, J. (2016) Predicting Drug-Induced Liver Injury in Induced Hepatotoxicity by Using Weighted Molecular Fingerprints.
Human with Naive ̈ Bayes Classifier Approach. J. Comput.-Aided Mol. BMC Bioinf. 18 (S7), 227.
Des. 30 (10), 889−898. (163) Ai, H., Chen, W., Zhang, L., Huang, L., Yin, Z., Hu, H., Zhao, Q.,
(145) Schrey, A. K., Nickel-Seeber, J., Drwal, M. N., Zwicker, P., Zhao, J., and Liu, H. (2018) Predicting Drug-Induced Liver Injury
Schultze, N., Haertel, B., and Preissner, R. (2017) Computational Using Ensemble Learning Methods and Molecular Fingerprints.
Prediction of Immune Cell Cytotoxicity. Food Chem. Toxicol. 107, 150− Toxicol. Sci. 165 (1), 100−107.
166. (164) Kandasamy, K., Chuah, J. K. C., Su, R., Huang, P., Eng, K. G.,
(146) Zhang, H., Kang, Y. L., Zhu, Y. Y., Zhao, K. X., Liang, J. Y., Ding, Xiong, S., Li, Y., Chia, C. S., Loo, L. H., and Zink, D. (2015) Prediction
L., Zhang, T. G., and Zhang, J. (2017) Novel Naive ̈ Bayes Classification of Drug-Induced Nephrotoxicity and Injury Mechanisms with Human
Models for Predicting the Chemical Ames Mutagenicity. Toxicol. In Induced Pluripotent Stem Cell-Derived Cells and Machine Learning
Vitro 41, 56−63. Methods. Sci. Rep. 5, 12337.
(147) Seal, A., Passi, A., Abdul Jaleel, U. C., and Wild, D. J. (2012) In- (165) Cai, C., Guo, P., Zhou, Y., Zhou, J., Wang, Q., Zhang, F., Fang, J.,
Silico Predictive Mutagenicity Model Generation Using Supervised and Cheng, F. (2019) Deep Learning-Based Prediction of Drug-
Learning Approaches. J. Cheminf. 4 (1), 10. Induced Cardiotoxicity. J. Chem. Inf. Model. 59 (3), 1073−1084.
(148) Zhang, H., Yu, P., Ren, J. X., Li, X. B., Wang, H. L., Ding, L., and (166) Xu, Y., Dai, Z., Chen, F., Gao, S., Pei, J., and Lai, L. (2015) Deep
Kong, W. B. (2017) Development of Novel Prediction Model for Drug- Learning for Drug-Induced Liver Injury. J. Chem. Inf. Model. 55 (10),
Induced Mitochondrial Toxicity by Using Naive ̈ Bayes Classifier 2085−2093.
Method. Food Chem. Toxicol. 110, 122−129. (167) Xu, C., Cheng, F., Chen, L., Du, Z., Li, W., Liu, G., Lee, P. W.,
(149) Zhang, H., Yu, P., Zhang, T. G., Kang, Y. L., Zhao, X., Li, Y. Y., and Tang, Y. (2012) In Silico Prediction of Chemical Ames
He, J. H., and Zhang, J. (2015) In Silico Prediction of Drug-Induced Mutagenicity. J. Chem. Inf. Model. 52 (11), 2840−2847.
Myelotoxicity by Using Naive ̈ Bayes Method. Mol. Diversity 19 (4), (168) Fernandez, M., Ban, F., Woo, G., Hsing, M., Yamazaki, T.,
945−953. LeBlanc, E., Rennie, P. S., Welch, W. J., and Cherkasov, A. (2018) Toxic
(150) Su, R., Li, Y., Zink, D., and Loo, L.-H. (2014) Supervised Colors: The Use of Deep Learning for Predicting Toxicity of
Prediction of Drug-Induced Nephrotoxicity Based on Interleukin-6 and Compounds Merely from Their Graphic Images. J. Chem. Inf. Model.
−8 Expression Levels. BMC Bioinf. 15 (S16), S16. 58 (8), 1533−1543.
(151) Zhang, H., Ma, J. X., Liu, C. T., Ren, J. X., and Ding, L. (2018) (169) Yuan, Q., Wei, Z., Guan, X., Jiang, M., Wang, S., Zhang, S., and
Development and Evaluation of in Silico Prediction Model for Drug- Li, Z. (2019) Toxicity Prediction Method Based on Multi-Channel
Induced Respiratory Toxicity by Using Naive ̈ Bayes Classifier Method. Convolutional Neural Network. Molecules 24 (18), 3383.
Food Chem. Toxicol. 121, 593−603. (170) Sharma, A. K., Srivastava, G. N., Roy, A., and Sharma, V. K.
(152) Zhang, H., Ren, J. X., Ma, J. X., and Ding, L. (2019) (2017) ToxiM: A Toxicity Prediction Tool for Small Molecules
Development of an in Silico Prediction Model for Chemical-Induced Developed Using Machine Learning and Chemoinformatics Ap-
Urinary Tract Toxicity by Using Naive ̈ Bayes Classifier. Mol. Diversity proaches. Front. Pharmacol. 8, 880.
23 (2), 381−392. (171) Su, R., Wu, H., Liu, X., and Wei, L. (2019) Predicting Drug-
(153) Pella, A., Cambria, R., Riboldi, M., Jereczek-Fossa, B. A., Fodor, Induced Hepatotoxicity Based on Biological Feature Maps and Diverse
C., Zerini, D., Torshabi, A. E., Cattani, F., Garibaldi, C., Pedroli, G., Classification Strategies. Briefings Bioinf., bbz165.
Baroni, G., and Orecchia, R. (2011) Use of Machine Learning Methods (172) Yajima, D., Ohkawa, T., Muroi, K., and Imaishi, H. (2014)
for Prediction of Acute Toxicity in Organs at Risk Following Prostate Predicting Toxicity of Food-Related Compounds Using Fuzzy Decision
Radiotherapy. Med. Phys. 38, 2859−2867. Trees. Int. J. Biosci., Biochem. Bioinf. 4 (1), 33.
(154) Zhang, L., Ai, H., Chen, W., Yin, Z., Hu, H., Zhu, J., Zhao, J., (173) Hammann, F., Schöning, V., and Drewe, J. (2019) Prediction of
Zhao, Q., and Liu, H. (2017) CarcinoPred-EL: Novel Models for Clinically Relevant Drug-Induced Liver Injury from Structure Using
Predicting the Carcinogenicity of Chemicals Using Molecular Finger- Machine Learning. J. Appl. Toxicol. 39 (3), 412−419.
prints and Ensemble Learning Methods. Sci. Rep. 7 (1), 2118. (174) Li, F., Fan, D., Wang, H., Yang, H., Li, W., Tang, Y., and Liu, G.
(155) Shen, M., Su, B.-H., Esposito, E. X., Hopfinger, A. J., and Tseng, (2017) In Silico Prediction of Pesticide Aquatic Toxicity with Chemical
Y. J. (2011) A Comprehensive Support Vector Machine Binary HERG Category Approaches. Toxicol. Res. (Cambridge, U. K.) 6 (6), 831−842.
Classification Model Based on Extensive but Biased End Point HERG (175) Li, X., Du, Z., Wang, J., Wu, Z., Li, W., Liu, G., Shen, X., and
Data Sets. Chem. Res. Toxicol. 24 (6), 934−949. Tang, Y. (2015) In Silico Estimation of Chemical Carcinogenicity with
(156) Li, X., Chen, Y., Song, X., Zhang, Y., Li, H., and Zhao, Y. (2018) Binary and Ternary Classification Methods. Mol. Inf. 34 (4), 228−235.
The Development and Application of: In Silico Models for Drug (176) Zhang, C., Zhou, Y., Gu, S., Wu, Z., Wu, W., Liu, C., Wang, K.,
Induced Liver Injury. RSC Adv. 8 (15), 8101−8111. Liu, G., Li, W., Lee, P. W., and Tang, Y. (2016) In Silico Prediction of
(157) Sharma, A., Kumar, R., Varadwaj, P. K., Ahmad, A., and Ashraf, HERG Potassium Channel Blockage by Chemical Category Ap-
G. M. (2011) A Comparative Study of Support Vector Machine, proaches. Toxicol. Res. (Cambridge, U. K.) 5 (2), 570−582.

S https://dx.doi.org/10.1021/acs.chemrestox.0c00316
Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
Chemical Research in Toxicology pubs.acs.org/crt Review

(177) Fan, D., Yang, H., Li, F., Sun, L., Di, P., Li, W., Tang, Y., and Liu, Regression and Neural Networks. Chem. Biol. Drug Des. 70 (5), 424−
G. (2018) In Silico Prediction of Chemical Genotoxicity Using 436.
Machine Learning Methods and Structural Alerts. Toxicol. Res. (193) Das, R. N., and Roy, K. (2012) Development of Classification
(Cambridge, U. K.) 7 (2), 211−220. and Regression Models for Vibrio Fischeri Toxicity of Ionic Liquids:
(178) Zhang, C., Cheng, F., Li, W., Liu, G., Lee, P. W., and Tang, Y. Green Solvents for the Future. Toxicol. Res. (Cambridge, U. K.) 1 (3),
(2016) In Silico Prediction of Drug Induced Liver Toxicity Using 186−195.
Substructure Pattern Recognition Method. Mol. Inf. 35 (3−4), 136− (194) Borchert, D. M., Walgenbach, J. F., Kennedy, G. G., and Long, J.
144. W. (2004) Toxicity and Residual Activity of Methoxyfenozide and
(179) Richard, A. M., Huang, R., Waidyanatha, S., Shinn, P., Collins, Tebufenozide to Codling Moth (Lepidoptera: Tortricidae) and
B. J., Thillainadarajah, I., Grulke, C. M., Williams, A. J., Lougee, R. R., Oriental Fruit Moth (Lepidoptera: Tortricidae). J. Econ. Entomol. 97
Judson, R. S., Houck, K. A., Shobair, M., Yang, C., Rathman, J. F., (4), 1342−1352.
Yasgar, A., Fitzpatrick, S. C., Simeonov, A., Thomas, R. S., Crofton, K. (195) Tibshirani, R. (2011) Regression Shrinkage and Selection via
M., Paules, R. S., Bucher, J. R., Austin, C. P., Kavlock, R. J., and Tice, R. the Lasso: A Retrospective. J. R. Stat. Soc. Ser. B (Statistical Methodol.)
R. (2020) The Tox21 10K Compound Library: Collaborative 73 (3), 273−282.
Chemistry Advancing Toxicology. Chem. Res. Toxicol., (196) Micevska, T., Warne, M. S. J., Pablo, F., and Patra, R. (2006)
DOI: 10.1021/acs.chemrestox.0c00264. Variation in, and Causes of, Toxicity of Cigarette Butts to a Cladoceran
(180) Hsieh, J.-H., Smith-Roe, S. L., Huang, R., Sedykh, A., Shockley, and Microtox. Arch. Environ. Contam. Toxicol. 50 (2), 205−212.
K. R., Auerbach, S. S., Merrick, B. A., Xia, M., Tice, R. R., and Witt, K. L. (197) Osborne, M. R., Presnell, B., and Turlach, B. A. (2000) On the
(2019) Identifying Compounds with Genotoxicity Potential Using LASSO and Its Dual. J. Comput. Graph. Stat. 9 (2), 319−337.
Tox21 High-Throughput Screening Assays. Chem. Res. Toxicol. 32 (7), (198) Isbister, G. K., O’Regan, L., Sibbritt, D., and Whyte, I. M.
1384−1401. (2004) Alprazolam Is Relatively More Toxic than Other Benzodiaze-
(181) Huang, R., Xia, M., Sakamuru, S., Zhao, J., Shahane, S. A., pines in Overdose. Br. J. Clin. Pharmacol. 58 (1), 88−95.
Attene-Ramos, M., Zhao, T., Austin, C. P., and Simeonov, A. (2016) (199) Hawkins, D. M., Basak, S. C., and Mills, D. (2004) QSARs for
Modelling the Tox21 10 K Chemical Profiles for in Vivo Toxicity Chemical Mutagens from Structure: Ridge Regression Fitting and
Prediction and Mechanism Characterization. Nat. Commun. 7 (1), Diagnostics. Environ. Toxicol. Pharmacol. 16 (1−2), 37−44.
10425. (200) Loganayagam, A., Arenas Hernandez, M., Corrigan, A.,
(182) Capuzzi, S. J., Politi, R., Isayev, O., Farag, S., and Tropsha, A. Fairbanks, L., Lewis, C. M., Harper, P., Maisey, N., Ross, P.,
(2016) QSAR Modeling of Tox21 Challenge Stress Response and Sanderson, J. D., and Marinaki, A. M. (2013) Pharmacogenetic
Variants in the DPYD, TYMS, CDA and MTHFR Genes Are Clinically
Nuclear Receptor Signaling Toxicity Assays. Front. Environ. Sci. 4, 3.
(183) Judson, R., Houck, K., Martin, M., Knudsen, T., Thomas, R. S., Significant Predictors of Fluoropyrimidine Toxicity. Br. J. Cancer 108
(12), 2505−2515.
Sipes, N., Shah, I., Wambaugh, J., and Crofton, K. (2014) In Vitro and
(201) Cawley, G. C., and Talbot, N. L. C. (2002) Reduced Rank
Modelling Approaches to Risk Assessment from the U.S. Environ-
Kernel Ridge Regression. Neural Process. Lett. 16 (3), 293−302.
mental Protection Agency ToxCast Programme. Basic Clin. Pharmacol.
(202) Roy, K., and Ghosh, G. (2007) QSTR with Extended
Toxicol. 115 (1), 69−76.
Topochemical Atom (ETA) Indices. 9. Comparative QSAR for the
(184) Norinder, U., and Boyer, S. (2016) Conformal Prediction
Toxicity of Diverse Functional Organic Compounds to Chlorella
Classification of a Large Data Set of EnRvironmental Chemicals from
Vulgaris Using Chemometric Tools. Chemosphere 70 (1), 1−12.
ToxCast and Tox21 Estrogen Receptor Assays. Chem. Res. Toxicol. 29 (203) Agarwal, V., Gribok, A. V., Koschan, A., and Abidi, M. A.
(6), 1003−1010. Estimating Illumination Chromaticity via Kernel Regression. (2006)
(185) Sipes, N. S., Martin, M. T., Reif, D. M., Kleinstreuer, N. C., Proceedings from the International Conference on Image Processing,
Judson, R. S., Singh, A. V., Chandler, K. J., Dix, D. J., Kavlock, R. J., and October 8−11, 2006, Atlanta, GA, pp 981−984, IEEE, New York.
Knudsen, T. B. (2011) Predictive Models of Prenatal Developmental (204) Kar, S., and Roy, K. (2010) QSAR Modeling of Toxicity of
Toxicity from Toxcast High-Throughput Screening Data. Toxicol. Sci. Diverse Organic Chemicals to Daphnia Magna Using 2D and 3D
124 (1), 109−127. Descriptors. J. Hazard. Mater. 177 (1−3), 344−351.
(186) Yang, C., Tarkhov, A., Marusczyk, J., Bienfait, B., Gasteiger, J., (205) Duchowicz, P. (2018) Linear Regression QSAR Models for
Kleinoeder, T., Magdziarz, T., Sacher, O., Schwab, C. H., Schwoebel, J., Polo-Like Kinase-1 Inhibitors. Cells 7 (2), 13.
Terfloth, L., Arvidson, K., Richard, A., Worth, A., and Rathman, J. (206) Pan, Y., Jiang, J., Wang, R., and Cao, H. (2008) Advantages of
(2015) New Publicly Available Chemical Query Language, CSRML, Support Vector Machine in QSPR Studies for Predicting Auto-Ignition
To Support Chemotype Representations for Application to Data Temperatures of Organic Compounds. Chemom. Intell. Lab. Syst. 92
Mining and Modeling. J. Chem. Inf. Model. 55 (3), 510. (2), 169−178.
(187) Russo, D. P., Zorn, K. M., Clark, A. M., Zhu, H., and Ekins, S. (207) Razi, M. A., and Athappilly, K. (2005) A Comparative
(2018) Comparing Multiple Machine Learning Algorithms and Metrics Predictive Analysis of Neural Networks (NNs), Nonlinear Regression
for Estrogen Receptor Binding Prediction. Mol. Pharmaceutics 15 (10), and Classification and Regression Tree (CART) Models. Expert Syst.
4361−4370. Appl. 29 (1), 65−74.
(188) Chen, T., and Guestrin, C. (2016) XGBoost: A Scalable Tree (208) Bermejo, S., and Cabestany, J. (2001) Learning with Nearest
Boosting System. Proc. 22nd acm sigkdd Int. Conf. Knowl. Discovery data Neighbour Classifiers. Neural Process. Lett. 13 (2), 159−181.
Min., 785−794. (209) Neelamegam, S., and Ramaraj, E. (2013) Classification
(189) Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., and Algorithm in Data Mining: An Overview. Int. J. P2P Netw. Trends
Li, B. Manipulating Machine Learning: Poisoning Attacks and Technol. 4 (8), 369−374.
Countermeasures for Regression Learning. (2018) Proceedings from (210) Samsudin, N. A., and Bradley, A. P. (2010) Nearest Neighbour
the IEEE Symposium on Security and Privacy, May 21−23, 2018, San Group-Based Classification. Pattern Recognit. 43 (10), 3458−3467.
Francisco, CA, pp 19−35, IEEE, New York. (211) Friel, N., and Pettitt, A. N. (2011) Classification Using Distance
(190) Stulp, F., and Sigaud, O. (2015) Many Regression Algorithms, Nearest Neighbours. Stat. Comput. 21 (3), 431−437.
One Unified Model: A Review. Neural Networks 69, 60−79. (212) Lakshmi, S. V., and Prabakaran, T. E. (2014) Application of K-
(191) Bezerra de Menezes, L. M., Volpato, M. C., Rosalen, P. L., and Nearest Neighbour Classification Method for Intrusion Detection in
Cury, J. A. (2003) Bone as a Biomarker of Acute Fluoride Toxicity. Network Data. Int. J. Comput. Appl. 97 (7), 34.
Forensic Sci. Int. 137 (2−3), 209−214. (213) Safavian, S. R., and Landgrebe, D. (1991) A Survey of Decision
(192) Nandi, S., Vracko, M., and Bagchi, M. C. (2007) Anticancer Tree Classifier Methodology. IEEE Trans. Syst. Man. Cybern. 21 (3),
Activity of Selected Phenolic Compounds: QSAR Studies Using Ridge 660−674.

https://dx.doi.org/10.1021/acs.chemrestox.0c00316
Chem. Res. Toxicol. XXXX, XXX, XXX−XXX