Drug Target Interaction Prediction Using Machine Learning Techniques

This review paper discusses the use of machine learning (ML) techniques for predicting Drug Target Interactions (DTIs) in the drug discovery process, highlighting the importance of identifying interactions between drugs and their protein targets. It explores various ML methods, including docking-based, ligand-based, and chemogenomics-based approaches, while emphasizing the need for improved classifiers and the challenges posed by the lack of true negative drug-target pairs. The paper aims to provide insights into state-of-the-art ML techniques to enhance DTI prediction and facilitate future research in drug development.


International Journal of Interactive Multimedia and Artificial Intelligence, Vol. 8, Nº6

Drug Target Interaction Prediction Using Machine Learning Techniques – A Review
A. Suruliandi¹, T. Idhaya¹, S. P. Raja² *

¹ Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Abhishekapatti, Tirunelveli, Tamil Nadu (India)
² School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu (India)

* Corresponding author. [email protected] (A. Suruliandi), [email protected] (T. Idhaya), [email protected] (S. P. Raja).

Received 13 August 2021 | Accepted 4 January 2022 | Early Access 10 November 2022

Abstract

Drug discovery is a key process, given the rising and ubiquitous demand for medication to stay in good shape right through the course of one’s life. Drugs are small molecules that inhibit or activate the function of a protein, offering patients a host of therapeutic benefits. Drug design is the inventive process of finding new medication, based on targets or proteins. Identifying new drugs is a process that involves time and money. This is where computer-aided drug design helps cut time and costs. Drug design needs a drug target, which is a protein, and a drug compound, between which an interaction is established. Interaction, in this context, refers to the process of discovering protein binding sites, which are protein pockets that bind with drugs. Pockets are regions on a protein macromolecule that bind to drug molecules. Researchers have been at work trying to determine new Drug Target Interactions (DTI), that is, to predict whether or not a given drug molecule will bind to a target. Machine learning (ML) techniques help establish the interaction between drugs and their targets, using computer-aided drug design. This paper aims to explore ML techniques for DTI prediction in greater depth and to boost future research. Qualitative and quantitative analyses of ML techniques show that several have been applied to predict DTIs, employing a range of classifiers. Though DTI prediction improves with negative drug target pairs (DTP), the lack of true negative DTPs has led to the use of a particular dataset of drugs and targets. Using dynamic DTPs improves DTI prediction. Little attention has so far been paid to developing a new classifier for DTI classification, and there is, unquestionably, a need for better ones.

Keywords

Chemogenomics, Drug Databases, Drug Discovery, Drug Target Interactions, Machine Learning, Targets, Target Databases.

DOI: 10.9781/ijimai.2022.11.002

I. Introduction

Discovering new drugs is critical and driven by the need for medication in daily life, partly brought on by changing environmental conditions. Nevertheless, drug discovery is not easy: it demands time as well as money, and the drug success rate is usually low. Computer-Aided Drug Design (CADD) is considered a computational discipline that aims to discover, design, and develop therapeutic chemical targets. There are three phases in drug design: discovery, development, and registry.

In the first phase, discovery, the focus is on identifying a new drug and its targets, based on binding sites. The second phase, development, involves pre-clinical research, where the drug is tested on animals for safety. Successful research means that human trials are set in motion. In the third phase, registry, the Food and Drug Administration (FDA) thoroughly reviews all the submitted drug-related data and decides on its approval or otherwise. Initiating an efficient computational model that finds potential Drug Target Interaction (DTI) from biological data helps understand the biological process, recognize novel drugs, and offer improved therapeutic medicine for illnesses of all sorts. Drug development has three trial phases, each more expensive than the one before. As of today, the cost of drug development rises from US$3.4 million for phase I to US$8.6 million for phase II and US$21.4 million for phase III trials [1]. A new drug could fail to pass the test in any of the three drug development trial phases, notwithstanding the expense, effort and time involved.

II. State of the Art Methods

DTI is the process of finding new drugs and targets for drug development. Drug and target molecules are discovered through their interactions. Drug discovery methods are ligand-based, docking-based and chemogenomics-based, and involve parameters like biomarker identification, structure unavailability, physique and condition, and environmental factors. Current research is focused on maximizing interactions so the drugs formulated can successfully treat disease.

Please cite this article as:

A. Suruliandi, T. Idhaya, S. P. Raja, “Drug Target Interaction Prediction Using Machine Learning Techniques – A Review”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 8, no. 6, pp. 86-100, 2024, http://dx.doi.org/10.9781/ijimai.2022.11.002


The new drugs developed today, though based on knowledge of existing ones, could still have adverse side effects. Incidentally, a drug developed for a particular disease may be used, quite unexpectedly, to treat another disease with no side effects whatsoever, a process referred to as drug repurposing [2], [3]. It is essential in drug discovery to establish the interaction between a drug and a target gene. The docking-based method needs a 3D structure of the target protein or gene for the process to work. The success of a newly developed drug depends on how well it fares in the market, particularly in terms of whether the purpose for which it was originally designed is being fulfilled. The possibility of successfully identifying DTI is enhanced by working on binding factors or interacting sites. This is a difficult process, given the limited information on drugs and targets. Bioinformaticians have tried to draw information from factors driving drugs and targets. The automated tools employed to improve the DTI success rate by discovering more interactions or binding sites between drugs and their targets are intended to actively assist doctors and bioinformaticians. Scientists today work in drug development using ML predictive analysis techniques to understand drugs and targets, thus boosting DTI success prediction.

A. Drug Developing Procedure
Drugs are synthesized chemicals that control, prevent, cure and diagnose illnesses. Disease diagnosis is carried out through reading the body’s reactions to drug molecules in the form of positive biological responses. In pharmacological terms, the biomolecule whose function and activity are modified by a specific drug is termed the drug target. Biomolecules can be proteins, nucleic acids, receptors, enzymes, and ion channels. The DTI process binds the drug molecule to the active biomolecule site with the same structural or functional properties as the drug molecule, culminating in the creation of a new product as in Fig. 1. The human body assimilates the product, resulting in a cure.

[Fig. 1. Drug Developing Procedure. Panels: the biomolecule binds with the compound at the binding site; the compound integrates in the active site; a new product is created.]

Drugs are developed in three phases. In the first phase, a drug and its target are discovered by means of the interacting or binding site, using substrate on the active site of the protein. In the second phase, the drug is subjected to animal testing for safety’s sake. In the third phase, the drug has human trials, following which it is marketed.

B. In-Silico Approaches in Drug Discovery
In-vitro is a technique where the process of drug discovery takes place in a controlled environment but not within a living organism. Here a pool of potential compounds is identified and narrowed down to find the most reliable compound for treatment. In-vivo is a technique where the process of drug discovery takes place within a living organism, by taking the reliable compounds to human trials. The data collected from both in-vitro and in-vivo studies are given as input features to the in-silico methods for drug prediction, which are computational. The computational DTI prediction method is categorized into three approaches [4].

1. Docking-Based Approach
A docking-based approach in DTI prediction requires a 3D structure for simulation. Consequently, it is not applicable where a large number of proteins is involved, as in, for instance, the G-Protein Coupled Receptor (GPCR) and ion channel, whose structures are far too complex to be obtained. The simulation is significant in regard to the time taken and its overall efficiency.

2. Ligand-Based Approach
A ligand-based approach works on the premise that a drug can be predicted without the 3-Dimensional structure of targets and with the existing knowledge of drugs and their targets.

3. Chemogenomics-Based Approach
A chemogenomics-based approach integrates both the chemical space of drugs and the genomic space of targets into a single pharmacological space. The challenge here is that there are too few known interaction pairs and too many unknown interaction pairs.

C. Motivation and Justification
The in-vitro prediction of DTI from biological data calls for a lot of effort in the search for new drugs and targets. Identifying potential drugs and targets is a painstaking step in initiating drug discovery. Despite the plethora of research on DTI prediction in the recent past, prediction is still material-intensive and protracted. Predicting interaction between DTPs continues to challenge researchers. The motivation for this review is to help researchers in the drug development domain access state-of-the-art methods used in ML for DTI predictions, and so enhance the quality of research. To this end, several insightful articles on DTI procedures and methods that help discover new drugs and targets differently are reviewed. The machine learning (ML) techniques used to predict DTIs are studied, each with its strengths and limitations. The research is categorized, based on the ML techniques used in the prediction. Thereafter, it is qualitatively and quantitatively analyzed to understand ML and DTI better so the latter can be improved.

The contributions of this paper are as follows. Articles related to ML and DTI in drug development are studied in detail and categorized, based on the machine learning techniques deployed, as in Section III. The feature selection techniques used in DTI prediction suggest the best features for use. Articles on DTI prediction using ML techniques have described how ML manages datasets from miscellaneous databases, balances imbalanced data, and handles large-scale datasets and features; the ML algorithms used in DTI prediction are examined at length. The articles are qualitatively analysed in Section V, based on ML techniques, to understand their strengths and weaknesses. A quantitative analysis in Section VI follows to find the most appropriate classifiers for DTI predictions.

D. Organization of the Paper
The paper is organized as follows. Section II provides an overview of state of the art methods involved in DTI prediction using ML techniques. In Section III, machine learning techniques used for DTI prediction are summarized. In Section IV, databases used for DTI prediction are discussed. In Section V, a qualitative analysis of the ML techniques used for DTI is presented. In Section VI, a quantitative analysis of DTI prediction methods is offered. Section VII discusses DTI prediction. Section VIII concludes the study and offers new directions for future research.

III. Machine Learning (ML) Techniques Used for DTI Prediction

Computational models use ML techniques for prediction because they optimize data better and perform better as well. ML techniques, which learn from data without relying on previously defined formulas, are grouped into two: supervised and unsupervised learning. Supervised learning predictions are based on observed existing knowledge from known data, while unsupervised learning makes predictions without such prior knowledge. Predictions are guesses based on existing knowledge from the data at hand. On the other hand, classification refers to the process of differentiating between known and unknown labels.

The objective of this paper is to explore ML techniques involved in improving DTP identification to find DTIs. The identification of a new drug involves the drug and its target. Because both drugs and targets have a large number of features, extracting them manually would be time-consuming, so researchers use tools like ChemCPP, EDragon, CDK, Open Babel, RDKit and PaDEL for extracting features from drugs, and Protr, SPICE, Propy, ProtDcal and ProtParam for extracting features from targets. Drug and target features are extracted and concatenated with each other to form DTPs. The pairs are analyzed for interaction prediction; specifically, to observe whether or not the DTPs interact. The ML techniques analyzed are explained qualitatively and quantitatively and the classifier used for DTI prediction is found. The DTI prediction here mainly uses a static database. Prediction can be improved when there are more targets and drugs with the interaction between them yet to be ascertained. In recent times, CADD has been used to develop drugs for immunodeficiency syndrome, influenza virus infection, glaucoma and lung cancer [5]. CADD helps in pharmacological, pharmacodynamic and in-silico toxicity prediction, which identifies or filters inactive or toxic molecules [6] and naturally gets ML involved in DTI prediction strategies [7]-[10]. Thus, to improve drug development, various methods based on drugs and targets are developed using ML techniques. Fig. 2 shows DTI prediction through ML techniques with targets and drugs taken from diverse databases. Drug and target features are extracted using a slew of tools or web servers. Subsequently, the most influential features alone are selected and used for DTI prediction with several ML classifiers to complete the process.

[Fig. 2. Flow of DTI Prediction. Stages: targets and drugs, feature extraction, feature selection, drug target pairs, training set and testing set, classifier, prediction of DTI.]

In-silico methods include machine learning, data mining, network analysis and data analysis tools, Quantitative Structure-Activity Relationship (QSAR), pharmacophores and homology modeling. Here, the machine learning technique is more feasible than all other methods for working with drug discovery data for analysis. The trending research in drug discovery is “Identification of screening hits (compounds)”, which helps in finding the particular compounds that act on a target with more potency at different levels, such as binding, reducing side effects and efficiency, and also increases the life of patients by changing the function of the biomolecule.

A. Chemogenomics-Based Machine Learning (ML) Techniques for DTI Prediction
In the chemogenomics-based approach, interactions are predicted computationally using ML-based, graph-based or network-based methods. The ML-based methods are shown in Fig. 3.

[Fig. 3. Chemogenomics based ML Techniques: similarity based, matrix based, feature based, network based and deep learning based methods.]

1. Similarity-Based Methods
The most commonly used DTI prediction methods use drug and target similarity measures in tandem with the distance between each pair of drugs and its targets [11]-[18]. These methods use the drug, target and drug-target interaction similarity scores based on prior knowledge of their interaction similarity. The similarity is obtained using a distance function like the Euclidean. For instance, if the following function is employed for the nearest neighbor algorithm, assuming two vectors $x_1$ and $x_2$, the distance between the vectors is found using equation (1) as $D(x_1, x_2)$, where

$$D(x_1, x_2) = \lVert x_1 - x_2 \rVert = \sqrt{\langle x_1 - x_2,\; x_1 - x_2 \rangle} \tag{1}$$

and the vectors have the same dimension, with the distance calculated using the Euclidean norm and the inner product. The similarity between a drug and a target is given through the pharmacological similarity of the drug, the genomic similarity of the protein sequence, and the topological properties of a multipartite network of previously known drug-target interaction knowledge. The disadvantage of these methods is that they use knowledge drawn from a small quantum of labelled data, while there exist large quanta of unlabeled data.

2. Matrix-Based Methods
Several studies [19]-[24] have shown that matrix-based methods outperform the rest in DTI prediction. The interaction matrix is

(2)

for i = 1:m and j = 1:n.

The first move in DTI prediction is to break down the matrix $X_{m \times n}$ into two matrices, $Y_{m \times k}$ and $Z_{n \times k}$, where $X \approx YZ^T$ with $k < m, n$, and where $Z^T$ denotes the transpose of $Z$. This process of factorizing matrices into a lower order makes it easier for the matrix-based approach to deal with missing data. With these methods, however, the distance between the drug and target appears to be the same and establishes the strength of the interaction between them, embedding them in a low-dimensional matrix. The reliability of these methods is affected when the drug and target data increase in volume, impacting the capacity to find their interaction.

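To make the distance used by similarity-based methods concrete, the following is a minimal sketch of the Euclidean distance written as the norm of the difference vector, computed through an inner product, and used for a one-nearest-neighbor lookup. The drug names and descriptor values are made up for illustration and do not come from any database discussed here.

```python
import math


def inner(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def euclidean(x1, x2):
    """D(x1, x2) = ||x1 - x2|| = sqrt(<x1 - x2, x1 - x2>)."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    diff = [a - b for a, b in zip(x1, x2)]
    return math.sqrt(inner(diff, diff))


def nearest_neighbor(query, known):
    """Return the key of the vector in `known` that is closest to `query`."""
    return min(known, key=lambda name: euclidean(query, known[name]))


# Hypothetical descriptor vectors (illustrative values only).
known_drugs = {
    "drug_A": [0.0, 0.0, 1.0],
    "drug_B": [1.0, 1.0, 0.0],
    "drug_C": [0.9, 0.2, 0.4],
}
query = [1.0, 0.9, 0.1]
closest = nearest_neighbor(query, known_drugs)
print(closest, round(euclidean(query, known_drugs[closest]), 3))
```

In a similarity-based predictor, the known targets of the closest drug would then serve as candidate targets for the query drug; how the cited methods weight those neighbors differs from paper to paper.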

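The factorization X ≈ YZᵀ described for matrix-based methods can be sketched with plain gradient descent. This is an illustration under stated assumptions, not the algorithm of any cited study: the interaction matrix is assumed to be binary (1 for a known interaction, 0 otherwise), the toy matrix is invented, and the learning rate, regularization weight and step count are arbitrary choices.

```python
import random


def predict(Y, Z, i, j):
    """Predicted score for drug i and target j: row i of Y dotted with row j of Z."""
    return sum(y * z for y, z in zip(Y[i], Z[j]))


def squared_error(X, Y, Z):
    """Sum of squared differences between X and Y Z^T."""
    return sum(
        (X[i][j] - predict(Y, Z, i, j)) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )


def factorize(X, k, steps=3000, lr=0.02, reg=0.01, seed=0):
    """Approximate X (m x n) by Y (m x k) times the transpose of Z (n x k)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    Y = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(m)]
    Z = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        # Residual E = X - Y Z^T, then one full-batch gradient step on Y and Z.
        E = [[X[i][j] - predict(Y, Z, i, j) for j in range(n)] for i in range(m)]
        new_Y = [
            [Y[i][f] + lr * (2 * sum(E[i][j] * Z[j][f] for j in range(n))
                             - 2 * reg * Y[i][f]) for f in range(k)]
            for i in range(m)
        ]
        new_Z = [
            [Z[j][f] + lr * (2 * sum(E[i][j] * Y[i][f] for i in range(m))
                             - 2 * reg * Z[j][f]) for f in range(k)]
            for j in range(n)
        ]
        Y, Z = new_Y, new_Z
    return Y, Z


# Toy 5-drug x 4-target matrix (invented; assumed binary interaction labels).
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
Y0, Z0 = factorize(X, k=2, steps=0)   # untrained factors, same seed
Y, Z = factorize(X, k=2)
error_before = squared_error(X, Y0, Z0)
error_after = squared_error(X, Y, Z)
print(error_before, error_after)
```

After training, `predict(Y, Z, i, j)` gives a score for any drug-target pair, including pairs whose entry in X was unknown, which is how low-rank factors are typically used to rank candidate interactions.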
3. Feature-Based Methods
Feature-based prediction methods largely use the support vector machine to find drug-target interaction [25]-[33]. Any pair of targets and drugs may be represented with features, leading to binary classification or two-class clustering with positive or negative interactions. Features are represented as F:

$$F = \{d + t\}, \quad d = d_1, d_2, d_3, \dots, d_a \quad \text{and} \quad t = t_1, t_2, t_3, \dots, t_b \tag{3}$$

where d denotes the drug features of length a and t the target features of length b, respectively.

4. Network-Based Methods
Network-based methods [34]-[40], which use graph-based techniques to predict DTI, are considered simple and reliable interaction prediction methods. Here, the drug-drug similarity, target-target similarity and known drug-target interactions are integrated into a heterogeneous network, operating on the simple logical principle that similar drugs interact with similar targets.

5. Deep Learning-Based Methods
Deep learning-based approaches can reduce the loss of feature information in predicting DTIs. However, they need adequate information to predict interaction and drug repurposing [41]-[45]. The two steps of deep learning include generating feature vectors and predicting interaction. The target property and drug property generate a feature matrix for prediction.

IV. Databases Used in DTI Prediction

Interaction prediction demands the twin data items of drugs and targets, and a working knowledge of their interaction. The popular databases used in this study fall into two categories, drug-centered and target-centered. More than 20 databases associated with interaction prediction are not directly involved in DTI prediction, though the data contained therein may be used as input for prediction. The popular database, KEGG, used here for prediction, is divided into the sub-databases of KEGG BRITE [46] and KEGG DRUG [47], incorporating a mass of biological data from genes and proteins.

A. Chemical European Molecular Biology Laboratory (ChEMBLdb)
ChEMBLdb is a chemical database of bioactive molecules [48] collected from numerous literature studies. With millions of chemical compounds, 10,000 drugs and 12,000 targets, the ChEMBLdb was established by the EMBL – European Bioinformatics Institute in 2002.

B. Chemical – Protein Annotation Resource (ChemProt)
ChemProt [49] holds chemical-protein interaction data, integrated from multiple databases of chemical-protein annotations. It comprises data from the PDSP, DrugBank, PharmGKB, PubChem and STITCH databases. ChemProt also integrates therapeutic effects, adverse drug reactions and chemical-biological disease data.

C. Drug Gene Interaction Database (DGIdb)
This database has information on druggable targets with their effects and drug-gene interaction data [50].

D. DrugBank
DrugBank is one of the most well-known databases in DTI study, with details about drug-like compounds, their different forms, target genes and side effects brought on by drug intake. The DTI data in this database, collected from an array of literature studies, have extensive commercial uses [51].

E. Kyoto Encyclopedia of Genes and Genomes (KEGG)
KEGG is an out-of-the-box database with exhaustive details of genes and genome sequences [52]. The KEGG databases are divided into four categories. The first has three databases: KEGG-BRITE, PATHWAY and MODULE. The second has four databases that carry genomic information: KEGG-GENOME, KEGG-GENE, KEGG-SSDB and KEGG ORTHOLOGY. The third has five databases with chemical information: KEGG-COMPOUNDS, KEGG-REACTION, KEGG-RCLASS, KEGG-ENZYME and KEGG-GLYCAN. The fourth has four databases carrying health information, among them KEGG-DISEASE and KEGG-DRUG. The comprehensive KEGG has a wealth of DTI information and outclasses others.

F. Library of Integrated Network-Based Cellular Signatures (LINCS)
This database holds information on the KINOME scan, a set of kinase small molecule-binding assays that help study the interaction between drug compounds for testing purposes. The database consists of 398 datasets on fluorescence imaging, ELISA and ATAC-sequence data [53].

G. PROMISCUOUS
The database has network-based drug repositioning data with information on drugs, proteins and the side effects of every drug. The information on proteins is from the UniProt database, while details on drugs and side effects are from the SuperDrug and SIDER databases, incorporated into the LINCS [54].

H. Search Tool for Interacting Chemicals (STITCH)
STITCH has information on target or protein interaction with small molecules, collected from PubChem databases and literature studies [55].

I. SuperTarget
SuperTarget is a web resource that carries information on DTIs, drug metabolic rate, pathways, and Gene Ontology (GO) terms, as well as on adverse medical side effects. The DTI information is sourced from PubMed, DrugBank, KEGG, PDB and TTD, and potential drug-target relationships are extracted from Medline [56].

J. Therapeutic Target Database (TTD)
The Therapeutic Target Database has therapeutic information on proteins and nucleic acids, assimilated from literature studies and miscellaneous databases with DTI data [57].

K. BRENDA - The Comprehensive Enzyme Information System (BRENDA)
This is an enzyme database with information on enzyme-ligand interaction. The data collected is drawn from literature studies based on enzyme nomenclature [58].

L. Drug Central
Drug Central is a database of Food and Drug Administration (FDA)-approved drugs. The database incorporates relevant information on drugs in the form of structure, bioactivity and regulatory records, which are categorized as small molecule active ingredients and biological active ingredients [59].

M. Protein Drug Interaction Database (PDID)
The Protein Drug Interaction Database (PDID) has DTIs for the entire structural human proteome, with predictions made using the ILbind, SMAP and eFindSite software [60].

N. Pharos
Pharos is the user interface to the knowledge management center of the Illuminating the Druggable Genome (IDG) program, covering three protein families: GPCRs, ion channels and kinases [61].


O. PubChem
PubChem [62] has information about chemical substances and their biological activity. The PubChem database incorporates three databases: Substances, Compounds and BioAssay. The first stores data on chemical information, the second has exclusive chemical structures obtained from substances, while the third holds biological information on the extracted substances.

P. Super Drug
Super Drug [63] offers information on all drug features collected from several databases and incorporated here. The database has 2-Dimensional and 3-Dimensional structure information on small molecule drugs, side effects and drug pharmacokinetic specifications.

Q. FDA Adverse Event Reporting System (FAERS)
The FDA Adverse Event Reporting System (FAERS) is a database with information obtained from adverse events and medication error reports submitted to the FDA on side effects, as well as keywords for drugs [64].

R. SIDe Effect Resource (SIDER)
SIDER is a database [65] that holds data on marketed medicines and their side effects, including the frequency of side effects and the classification of drugs and side effects.

S. International Union of Basic and Clinical Pharmacology (IUPHAR) / British Pharmacological Society (BPS) - The IUPHAR/BPS Guide to Pharmacology
The IUPHAR/BPS Guide to Pharmacology [66] is an open access knowledge website that provides information on licensed drugs and their targets and holds information on small molecule drugs.

T. Cancer Drug Resistance Database (CancerDR)
CancerDR offers elaborate information on anti-cancer drugs and their pharmacological profiling. CancerDR helps in effective personalized cancer therapies and identifies gene-encoding drug targets, based on genetic and residual resistance [67].

U. Binding Database (DB)
Binding DB is a binding database that holds the DTI of small molecules as well as all the interaction data collected from an array of literature studies. This is an extensive database for protein ligand binding affinity [68].

V. ZINC is not Commercial (ZINC)
ZINC is the largest database [69] comprising every drug needed for new ligand discovery. Information on drugs and the targets they can interact with is collected here. ZINC is a major database for researchers looking for the chemical composition of their biological targets.

W. Psychoactive Drug Screening Program (PDSP)
The Psychoactive Drug Screening Program (PDSP) [70] screens compounds with previous reports of pharmacological, biochemical and behavioural activity. It is chiefly used to identify novel targets in the treatment of mental disorders.

X. A Summary of Databases
Table I summarizes the general statistical information on every database.

TABLE I. Databases Involved in DTI Prediction

| S. No | Databases | No. of Targets | No. of Drugs | No. of Interactions |
|---|---|---|---|---|
| 1 | ChEMBL | 12482 | 1879206 | 15504603 |
| 2 | ChemProt | 20000 | 170000 | - |
| 3 | DGI db | 41100 | 9495 | 29783 |
| 4 | DrugBank | 5175 | 13338 | 26932 |
| 5 | KEGG | 19711 | 4948 | 260000 |
| 6 | LINCS | 1469 | 41847 | - |
| 7 | PROMISCUOUS | 6548 | 5258 | 23702 |
| 8 | STITCH | 9600000 | 430000 | - |
| 9 | SuperTarget | 6000 | 196000 | 330000 |
| 10 | TTD | 3101 | 34019 | - |
| 11 | BRENDA | 84000 | 20500 | - |
| 12 | Drug Central | - | 4543 | - |
| 13 | PDID | 3746 | 5100 | - |
| 14 | Pharos | 20244 | 130166 | - |
| 15 | PubChem | 79622 | 96157016 | - |
| 16 | Super Drug | 4456 | 4605 | - |
| 17 | FAERS | - | 24842 | - |
| 18 | SIDER | 1430 | 140064 | - |
| 19 | IUPHAR/BPS | 1396 | 1105 | 443 |
| 20 | Cancer Dr | - | 148 | - |
| 21 | Binding DB | 7020 | 489416 | 1132739 |
| 22 | Zinc | - | 20 million | - |
| 23 | PDSP | 738 | 7449 | - |

TABLE II. Dataset Used in DTI Prediction

| Dataset | Targets | Drugs | DTI |
|---|---|---|---|
| Enzyme | 664 | 445 | 2926 |
| Ion Channel | 204 | 210 | 1476 |
| GPCR | 95 | 223 | 635 |
| Nuclear Receptor | 26 | 54 | 90 |

GPCR - G-Protein Coupled Receptor

V. Qualitative Analysis of Machine Learning Techniques for DTI Prediction

Qualitative analysis helps in understanding the ML techniques involved in DTI predictions, based on the quality and characteristics of the methods used. Qualitative analysis outcomes are descriptive, and inferences are drawn easily from the data obtained. The analysis of DTI prediction is shown in Tables III-VII.

The Yamanishi et al. [71] Bench Mark (BM) dataset has been the only one used by many researchers for the purpose, because it incorporates diverse drug and target data to create a new DTI dataset. The BM dataset is shown in Table II.

A. Review of Literature for Similarity-Based Methods
Similarity-based methods consider similarities between drugs and targets to identify DTIs. Perlman et al. [11] proposed a scheme that incorporates multiple drug and target similarities to predict DTI using the logistic regression SITAR (Similarity-based Inference of drug-TARgets) framework. Mei et al. [12] proposed a bipartite local model (BLM)-based method to handle the candidate problem of the baseline BLM, named BLM-NII (BLM with Neighbor-based Interaction profile Inferring). Van Laarhoven and Marchiori [13] developed a weighted nearest neighbor (WNN) algorithm that directly uses the GIP (Gaussian interaction profile) kernel by drawing up a profile of the interaction score for a new drug (WNN-GIP). Shi et al. [14] proposed a method to handle missing interactions using a cluster of similar targets that is Super


TABLE III. Qualitative Analysis of the Articles Using Similarity-Based Methods

| Source | ML Tech | Dataset | Pre processing / Feature Extraction | Feature Selection | Validation | Strength | Weakness | Outcome |
|---|---|---|---|---|---|---|---|---|
| Reference [11] (2011) | Logistic Regression (LR) | 250 Proteins, 315 Drugs | - | Wrapper Feature Selection | 10 Fold CV | Lists the selected features | Only 10 features are considered | Targets of 307 drugs are predicted |
| Reference [12] (2012) | Bipartite Local Model-NII (BLM-NII) | BM Dataset | - | - | LOOCV and 10 Fold CV | NII procedure for finding drugs and targets | Whenever new drug or target is given as input it is not considered as there is no training data | 57 % of DTI has been predicted |
| Reference [13] (2013) | Weighted Nearest Neighbor (WNN) | BM Dataset | - | - | LOOCV and 5 Fold CV | Uses regularized least square algorithm to find the new drug based on the old drugs | No difference between indirect and direct DTI targets. These are not measured to interact with drugs. | Prediction of interaction which show top 5 prediction for each dataset. |
| Reference [14] (2015) | Super Target Clustering (STC) | BM Dataset | - | - | 5 Fold CV | Finds missing interaction using cluster of targets. | Considers only about missing interaction not more about existing DTI | Finds new drugs and targets and potential interaction |
| Reference [15] (2016) | K-Nearest Neighbor (KNN) | BM Dataset | Finger print extraction for drugs | - | 5 Fold CV, LOOCV | Hubness awareness and ensemble size gives high accuracy | LOOCV over fits and then shifted to 5 Fold CV | Improved prediction of DTIs around 12 prediction is found |
| Reference [16] (2017) | LPLNI | BM Dataset | - | - | LOOCV | Integrating similarities of different features | Considers only fingerprint as features for drugs | A promising tool for DTI prediction |
| Reference [17] (2017) | Multi-View DTI | 1253 drugs, 887 targets | - | - | 20 trials of 5 Fold CV | Enrichment analyzes of drugs and targets | No details of experiments | 56 newly identified clusters |
| Reference [18] (2018) | K-Nearest Neighbor (KNN) | BM Dataset | - | - | 5 trials of 10 Fold CV | Calculating probability based weight and similarity based weight for targets | Considers only Ranking of top several integrations of drug and targets | 34 % better prediction than previous methods |

BM Dataset - Bench Mark dataset, CV - Cross Validation, LOOCV - Leave One Out Cross Validation.

TABLE IV. Qualitative Analysis of the Articles Using Matrix-Based Methods

| Source | ML Tech | Dataset | Preprocessing/Feature Extraction | Feature Selection | Validation | Strength | Weakness | Outcome |
|---|---|---|---|---|---|---|---|---|
| Reference [19] (2009) | BRDTI | BM Dataset | - | - | 5 Trials of 10 Fold CV | Incorporates target bias and context alignment for drug and target similarities | More survey based on DTI is to be done for better prediction. | DTI leads to Drug repurposing and adverse drug reaction prediction |
| Reference [20] (2012) | KBMF | BM Dataset | - | - | 5 Fold CV | Interaction score is generated using factorization methods | Better for only 12 low dimensional projection | Similarity based DTIs. |
| Reference [21] (2017) | MLRE | 608 protein, 326 drugs, 114 interactions | Structural view and chemical view of drug are extracted | - | 5 Fold CV | Preserving the point wise linear regression | Noisy observation leads to disagreement data | Predict interaction based on chemical view with SVM and graph based methods |
| Reference [22] (2017) | VB-MK-LMF | BM Dataset | - | - | 5 Trials of 10 Fold CV | DTI matrices are linked to weighted common observations | Works well for mid-sized datasets | DTI predicted |
| Reference [23] (2018) | Pseudo SMR | BM Dataset | Extraction of Pseudo AAC | - | 5 Fold CV | Uses extremely randomized tree methods and it is computationally more efficient | Uses only Pseudo AAC Descriptors. | Predicted 15 Potential DTIs. |

BM Dataset - Bench Mark Dataset, CV - Cross Validation, AAC - Amino Acid Composition.


TABLE V. Qualitative Analysis of the Articles Using Feature-Based Methods

| Source | ML Tech | Dataset | Preprocessing/Feature Extraction | Feature Selection | Validation | Strength | Weakness | Outcome |
|---|---|---|---|---|---|---|---|---|
| Reference [25] (2011) | Regularized Least Square | BM Dataset | - | - | LOO CV and 5 Trials of 10 Fold CV | Combining GIP with target kernel and drug kernel | Increase kernel with more information about DTI | 15 known interaction was predicted |
| Reference [26] (2016) | Krons-Regularized Least Square | BM Dataset | Replaced missing values with mean of data | - | 5 Fold CV | Incorporates both known and unknown interaction and make a general purpose learner | Balancing the data is not considered | Prediction of interval as measure of confidence |
| Reference [27] (2016) | Weighted SVM | BM Dataset | Structural similarity, Gene Function similarity was extracted | - | 5 Fold CV | Finds some unlabeled sample as negative sample and also considers positive samples beneath unlabeled samples | Asks for using structure but we cannot get structure for all the targets | Predicts Interaction and listed 3 top known interaction |
| Reference [28] (2016) | Ensemble learning | 5877 Drugs, 3348 Targets, 12674 DTI | PROFEAT for Target and Rcpi for Drug | - | 5 Fold CV | Ensemble learning to address issues of class imbalance | Oversampling is done which increases noise | Predicted more than 20 Known DTI |
| Reference [29] (2017) | Discriminate Vector Machine | BM Dataset | AAC feature were Extracted | Principal Component Analysis (PCA) | 5 Fold CV | Uses LBP histogram vectors which retains evolutionary information of amino acid | Only AAC information is used for prediction | Not listed the predicted DTI |
| Reference [30] (2017) | Support Vector Machine | BM Dataset | - | - | 10 Fold CV | Multiple Kernel combination is used for prediction | GIP based prediction | Compound-Protein-Interaction |
| Reference [31] (2017) | REP Tree Algorithm | 2719 E, 1372 IC, 630 GPCR, 86 NR | - | - | 10 Fold CV | Considers different families of proteins by using various learning rate | No cross validation is done | DTI prediction |
| Reference [32] (2017) | Adaboost | BM Dataset | PSSM for target and SMILE for drug were extracted | Sequential Forward Feature Selection (SFFS) | 5 Fold CV | Balanced Data using RUS and CUS techniques | Not considered domain features | Listed top 10 known interaction |
| Reference [33] (2018) | Bagging based ensemble | 5877 Drugs, 3348 Targets, 12674 DTI | PROFEAT for Target and Rcpi for Drug | - | 10 Fold CV | Considered class imbalance and used Neighbourhood balanced bagging for balancing the data and active learning strategy is used | Not discussed about Features | 14 out of 16 known interactions have been detected. |

BM dataset - Bench Mark dataset, CV - Cross Validation, LOOCV - Leave One Out Cross Validation, PROFEAT - PROtein FEATures, AAC - Amino Acid Composition, Rcpi - R package for extracting features for compound protein interaction.

Target Clustering (STC). Buza [15] proposed a K-nearest neighbor (KNN)-based method with hubness-aware classification and error correction to reduce the detrimental effect of bad hubs (EcKNN, KNN with error correction). Zhang et al. [16] proposed a framework that builds a drug-drug linear neighbourhood, calculates the similarities, and predicts drug-target interaction profiles by label propagation (LPLNI, Label Propagation with Linear Neighbourhood Information). Zhang et al. [17] developed a clustering algorithm that incorporates drug and target data from structural and chemical viewpoints together with existing knowledge of interactions (MDTI, Multiview DTI). Shi and Li [18] advanced an improved Bayesian ranking DTI method that adds weights for unknown drugs and targets using weighted neighboring drugs and targets (WBRDTI, Weighted Bayesian Ranking DTI).

B. Review of Literature for Matrix-Based Methods
Matrix-based methods use matrix similarity for DTI prediction. Rendle et al. [19] proposed an algorithm based on Bayesian Personalized Ranking (BPR) matrix factorization, which incorporates drug and target similarities to predict DTIs (BPRDTI). Gönen [20] proposed kernelized Bayesian matrix factorization (KBMF), which factorizes the matrices together with the interaction score matrix so as to find new drugs and targets and determine their interactions. Li et al. [21] introduced a low-rank representation embedding (LRE) technique that fixes errors in pointwise linear reconstruction. This was done to obtain a different view of the structural and chemical features of drugs and targets as Single view


TABLE VI. Qualitative Analysis of the Articles Using Network-Based Methods

| Source | ML Tech | Dataset | Preprocessing/Feature Extraction | Feature Selection | Validation | Strength | Weakness | Outcome |
|---|---|---|---|---|---|---|---|---|
| Reference [34] (2012) | Network based Inference (NBI) | BM Dataset | - | - | 10 Fold CV | Used a bipartite graph for prediction | Imbalanced data is used | 5 new DTI were predicted |
| Reference [35] (2012) | Network-based Random Walk with Restart on the Heterogeneous network (NRWRH) | BM Dataset | - | - | LOOCV | Used RWR to get potential DTI using bipartite graph network | Leaves the target which has no drug it is considered as zero matrix | 29 new DTI were predicted |
| Reference [36] (2013) | Network-Consistency-based Prediction Method (Net CBP) | BM Dataset | - | - | Not discussed properly | DTI predicted using bipartite graph network | Considered as zero matrix | Listed out several DTI |
| Reference [37] (2015) | Normalized Multi information Fusion | BM Dataset | - | - | Not discussed properly | Integrates robust PCA with biological information | In order to improve performance more negative dataset to be built to find the interactions. | Predicts interaction |
| Reference [38] (2015) | Random Walk Restart (RWR) | 467 Targets, 544 Drugs | - | - | - | RWR on heterogeneous network using chemical features | Considered only fingerprints features for drugs | 110 drugs predicted for 3419 targets |
| Reference [39] (2018) | IN - Random Walk with Restart (RWR) | 12015 Drug, 1895445 Target | - | Principal Component Analysis (PCA) | 5 Fold CV | Used both labelled and unlabeled data for prediction | Data is imbalanced | Predicts interaction between drug and targets |
| Reference [40] (2019) | Neighbourhood Regularized Logistic Matrix Factorization (NRLMF) | BM Dataset | Calculates similarities of drugs and targets | - | 10 Fold CV | Improved using rescoring matrix | Not more parameters are considered | Predicts interaction but not listed |

BM Dataset - Bench Mark Dataset, CV - Cross Validation, LOOCV - Leave One Out Cross Validation.

LRE and Multiview LRE, respectively (LRE). Bolgár et al. [22] developed a method integrating multiple kernels, weights, and graphs, all regularized to model the probability of DTI prediction (VB-MK-LMF). Huang et al. [23] proposed an extension of structure-activity relationship classification that applies the extremely randomized tree (ERT) to the pseudo substitution matrix representation (SMR) of the target (Pseudo-SMR). Caro-Martínez et al. [24] proposed a local model-agnostic explanation approach for interaction prediction.

C. Review of Literature for Feature-Based Methods
Feature-based methods consider drug and target features for DTI prediction. Van Laarhoven et al. [25] proposed an algorithm that integrates the DTI network information with the Gaussian Interaction Profile kernel using the Regularized Least Square (RLS). Ezzat et al. [26] developed a framework for DTI prediction using the voting of the decision tree, random forest, STACK and Laplacian Eigen base classifiers, and also considered imbalanced classes for prediction. Nascimento et al. [27] advanced a method that incorporates both known and unknown interaction data using the RLS. Lan et al. [28] developed a framework for DTI prediction by taking unlabeled samples using the weighted SVM (PUDT, Positively Unlabeled Drug Targets). Li et al. [29] proposed a method that treats DTI prediction as a structure-activity relationship (SAR) classification problem, combining principal component analysis (PCA) with the Discriminative Vector Machine (DVM). Ohue et al. [30] proposed an approach that uses virtual screening and the Pairwise Kernel Method (PKM). Zhang et al. [31] proposed an ensemble-based approach built as a random projection ensemble (RPE) of the REP tree algorithm (Drug RPE). Rayhan et al. [32] developed a model that represents targets as a matrix (position-specific scoring matrix, PSSM) and combines it with drug molecule features for DTI prediction using the AdaBoost classifier (iDTI-EsBoost). Sharma and Rani [33] proposed a bagging-based ensemble model that uses an active learning methodology to predict DTIs (BE-DTI).

D. Review of Literature for Network-Based Methods
These methods use networks of similar drugs and targets for DTI prediction. Cheng et al. [34] proposed a bipartite Network Based Inference (NBI) method for DTI prediction. Chen et al. [35] developed a random walk with restart (RWR) framework to obtain potential DTIs using a bipartite graph network (NRWRH, Network-based Random Walk with Restart on the Heterogeneous network). Chen et al. [36] used this method for DTI prediction on both labelled and unlabeled data (NetCBP, Network Consistency-based Prediction). Peng et al. [37] proposed a method that uses PCA to reduce dimensions and integrates data from multiple drug and target sources for DTI prediction (NMIF, Normalized Multi-Information Fusion). Seal et al. [38] proposed a model that requires a matrix inversion and computes a relevance score between two nodes in a weighted graph of DTIs (RWR, Random Walk with Restart). Huang et al. [39] proposed a two-network rank algorithm that involves the random walk and the bipartite graph (IN-RWR, intra-network Random Walk with Restart). Ban et al. [40] developed a method that improves the NRLMF algorithm by calculating the NRLMF scores as the expected beta distribution values.


TABLE VII. Qualitative Analysis of the Articles Using Deep Learning-Based Methods

| Source | ML Tech | Dataset | Preprocessing/Feature Extraction | Feature Selection | Validation | Strength | Weakness | Outcome |
|---|---|---|---|---|---|---|---|---|
| Reference [41] (2017) | Deep DTI | 1520 Targets, 1412 Drugs, 12524 samples | - | - | 10 Fold CV | Uses DBN and Fine tune RBM in greedy way. | Only known interaction are used | DTI probability which are useful for drug repurposing |
| Reference [42] (2018) | Deep DTA | 442 Targets, 68 Drugs, 30056 DTI | - | - | Concordance index | Creating CNN blocks of targets, drugs | Predefined features are considered for CNN blocks of protein | Predicts binding affinity |
| Reference [43] (2018) | AUTO DNP | BM Dataset | PSSM for Target and PubChem fingerprint has taken for drugs | - | 5 Fold CV | Uses Auto encoder blocks to create Deep NN | Only CTD descriptors are considered. | Predicts interaction |
| Reference [44] (2019) | LASSO-DNN | 3546 Proteins, 5834 Drugs, 14792 DTI | - | - | 10 Fold CV | Considers Tripeptide composition feature of proteins | More number of functions are used. | Diseases treated by drug and its association with breast cancer is listed |
| Reference [45] (2019) | Deep Convolution-DTI | 3675 Targets, 11950 Drugs, 32,568 DTI | - | t-distributed stochastic neighbor embedding (t-SNE) | 5 Fold CV | Similarity acts as a informative descriptors | Considers only CTD descriptors of targets | Predicts interaction |

BM Dataset - Bench Mark Dataset, CV - Cross Validation, CTD - Composition, Transition and Distribution, PSSM - Position Specific Scoring Matrix, PubChem - PubChem is a Chemical Information database.

Beta distribution value is calculated using the interaction information and NRLMF score (NRLMF-beta).

E. Review of Literature for Deep Learning-Based Methods
Deep learning-based methods use the drug and target features for DTI prediction. Wen et al. [41] proposed a method that feeds raw target and drug features to a deep belief network (DBN) and predicts DTIs for drugs approved by the Food and Drug Administration (DeepDTIs). Ozturk et al. [42] proposed a model that uses target sequences and drug molecules to predict drug-target binding affinity (DeepDTA). Wang et al. [43] developed a computational model using a stacked auto encoder for DTI prediction (AUTO-DNP). You et al. [44] presented a method based on protein and drug features that uses a LASSO regression model in tandem with a deep neural network (DNN) to predict DTIs (LASSO-DNN). Lee et al. [45] proposed a DTI prediction model using local protein residue patterns in DTI (DeepConv-DTI).

VI. Quantitative Analysis of Machine Learning Techniques in DTI Prediction
Quantitative analysis is applied to determine the best-performing prediction method, using different ML techniques with appropriate metrics. The prediction method must deal with the steps of data preprocessing and feature selection, as well as drug and target integration. The best machine learning prediction method includes the hyperparameters and association index for DTI prediction. Of the various ML techniques [11]-[44] available, the best is chosen for prediction. Tables X-XIV depict the quantitative analysis of the results of several ML methods in DTI prediction that help enhance performance.

A. Performance Metrics
A confusion matrix is used to calculate performance measures from test set values in terms of true positives, true negatives, false positives and false negatives among classes that are to be classified as integrates or non integrates. Table VIII shows the confusion matrix for DTI and Table IX the performance metrics used. Integrates here refers to drugs that produce a positive DTI prediction result, that is, the integrating drug can be used to treat a target it integrates with. Conversely, non integrates refers to drugs that produce a negative DTI prediction result, that is, the non integrating drug cannot be used to treat a target it does not integrate with.

TABLE VIII. Confusion Matrix

| | Integrates | Non Integrates |
|---|---|---|
| Integrates | True Positive | False Positive |
| Non integrates | False Negative | True Negative |

TABLE IX. Performance Metrics Used in DTI Prediction

| S. No | Metrics Used | Formula | Metrics Description |
|---|---|---|---|
| 1. | Accuracy | (TP+TN)/(TP+TN+FP+FN) | Accuracy is the ratio of correct predictions out of the total number of predictions |
| 2. | Sensitivity/Recall | TP/(TP+FN) | Measure of quantity |
| 3. | Precision | TP/(TP+FP) | Measure of quality |
| 4. | AUC | True Positive Rate vs. False Positive Rate | Curve shows the relation between the False Positive Rate and the True Positive Rate |
| 5. | AUPR | Precision vs. Recall | Curve shows the relationship between Precision and Recall |
| 6. | MCC | (TP×TN − FP×FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)) | Matthews Correlation Coefficient |
| 7. | F1 Score | TP/(TP + 1/2 (FP+FN)) | Harmonic average of Precision and Recall |
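
The threshold-based metrics of Table IX follow directly from the four confusion-matrix counts. The sketch below is illustrative only: the function name `dti_metrics` and the example counts are assumptions, and MCC uses the standard Matthews correlation coefficient definition.

```python
import math


def dti_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision, F1 and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 written as TP / (TP + (FP + FN) / 2), the harmonic mean of precision and recall.
    f1 = tp / (tp + 0.5 * (fp + fn)) if (tp + fp + fn) else 0.0
    # Standard MCC definition; defined as 0 here when any marginal total is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for 100 drug-target pairs.
example_metrics = dti_metrics(tp=40, fp=10, fn=20, tn=30)
```

AUC and AUPR are not included because they are computed from ranked prediction scores across all thresholds rather than from a single confusion matrix.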


TABLE X. Quantitative Analysis of the Similarity-Based Methods Used in DTI Prediction

Each cell lists values in the order E / IC / G / N; a single value means the figure is not broken down by class.

| S. No | ML Tech. | Accuracy | Sensitivity/Recall | Precision/nDCG | AUC | AUPR/MAP |
|---|---|---|---|---|---|---|
| 1. | LR | - | - | - | 92.2 / 92.7 / 94.6 / 86.3 | 87.7 / 88.9 / 93.9 / 85.1 |
| 2. | BLM-NII | - | - | - | 98.8 / 99.0 / 98.4 / 98.1 | 92.9 / 95.0 / 86.5 / 86.6 |
| 3. | WNN | - | - | - | 81.9 / 75.5 / 84.8 / 78.8 | 29.9 / 24.9 / 30.8 / 43.4 |
| 4. | STC | - | - | - | 81.2 / 81.1 / 87.5 / 87.1 | 38.5 / 36.7 / 41.4 / 53.3 |
| 5. | KNN | - | - | - | 95.4 / 97.2 / 97.2 / - | 83.7 / 85.5 / 62.8 / - |
| 6. | LPLNI | - | - | - | 97.0 / 97.6 / 99.4 / 99.1 | 90.6 / 94.6 / 96.8 / 94.9 |
| 7. | Multi-view DTI | - | - | - | 86.9 | - |
| 8. | KNN | - | - | nDCG: 90.8 / 95.9 / 94.0 / 94.5 | 98.3 / 98.4 / 96.2 / 94.8 | MAP: 88.0 / 94.2 / 91.5 / 92.7 |

E - Enzyme, IC - Ion Channel, G - G-Protein Coupled Receptor (GPCR), N - Nuclear Receptor, AUC - Area Under Curve, AUPR - Area Under Precision Recall, nDCG - normalized Discounted Cumulative Gain, PPV - Positive Predicted Values, MCC - Matthews Correlation Coefficient, MAP - Mean Average Precision.

TABLE XI. Quantitative Analysis of the Matrix-Based Methods Used in DTI Prediction

Each cell lists values in the order E / IC / G / N.

| S. No | ML Tech. | Accuracy | Sensitivity/Recall | Precision/nDCG | AUC | AUPR/MCC |
|---|---|---|---|---|---|---|
| 1. | BRDTI | - | - | nDCG: 89.7 / 95.3 / 92.9 / 94.8 | 98.1 / 98.2 / 95.5 / 92.3 | - |
| 2. | KBMF | - | - | - | 83.2 / 79.9 / 85.7 / 82.4 | - |
| 3. | MVLRE | - | - | - | 65.0 / 51.4 / 61.7 / - | - |
| 4. | VB-MK LMF | - | - | - | 98.7 / 98.9 / 97.6 / 95.7 | 89.0 / 91.0 / 80.0 / 77.0 |
| 5. | Pseudo SMR | 89.4 / 87.8 / 82.9 / 83.3 | 89.5 / 87.9 / 82.1 / 95.2 | 90.2 / 87.8 / 82.1 / 76.3 | 96.0 / 93.8 / 90.5 / 96.3 | MCC: 81.8 / 78.7 / 71.8 / 71.6 |

E - Enzyme, IC - Ion Channel, G - G-Protein Coupled Receptor (GPCR), N - Nuclear Receptor, AUC - Area Under Curve, AUPR - Area Under Precision Recall, PPV - Positive Predicted Values, MCC - Matthews Correlation Coefficient, nDCG - normalized Discounted Cumulative Gain.

TABLE XII. Quantitative Analysis of the Feature-Based Methods Used in DTI Prediction

Each cell lists values in the order E / IC / G / N; a single value means the figure is not broken down by class.

| S. No | ML Tech. | Accuracy/PPV/MCC/F1 Score | Sensitivity/Recall | Precision | AUC | AUPR/MCC/F1 Score |
|---|---|---|---|---|---|---|
| 1. | RLS | - | - | - | 98.2 / 98.5 / 94.5 / 88.7 | 88.1 / 91.8 / 70.0 / 60.4 |
| 2. | Krons-RLS | - | - | - | 97.9 / 98.7 / 95.1 / 92.4 | - |
| 3. | Weighted SVM | PPV: 36.0 / 74.0 / 58.0 / 64.0 | 24.0 / 14.0 / 16.0 / 7.0 | 99.0 / 99.0 / 94.0 / 97.0 | 88.4 / 83.1 / 87.8 / 88.5 | - |
| 4. | Ensemble Learning | - | - | - | 90.0 | - |
| 5. | DVM | 93.1 / 91.7 / 89.3 / 92.2 | 92.9 / 92.6 / 89.2 / 96.6 | 93.1 / 90.9 / 89.4 / 88.6 | 92.8 / 91.7 / 88.56 / 93.00 | MCC: 86.3 / 83.4 / 78.77 / 84.80 |
| 6. | REP Tree | 94.0 / 91.0 / 88.0 / 88.0 | 92.0 / 89.0 / 81.0 / 87.0 | 90.0 / 86.0 / 83.0 / 79.0 | 98.0 / 97.0 / 94.0 / 93.0 | F1 Score: 91.0 / 88.0 / 82.0 / 83.0 |
| 7. | Adaboost | MCC: 18.0 / 29.0 / 26.0 / 22.0; F1 Score: 10.0 / 20.0 / 19.0 / 24.0 | 85.0 / 84.0 / 84.0 / 87.0 | 85.0 / 78.0 / 80.0 / 92.0 | 96.0 / 93.0 / 93.0 / 92.0 | 68.0 / 48.0 / 50.0 / 79.0 |
| 8. | BE-DTI | - | 88.0 | - | 92.7 | 88.6 |


TABLE XIII. Quantitative Analysis of the Network-Based Methods Used in DTI Prediction

Each cell lists values in the order E / IC / G / N; a single value means the figure is not broken down by class.

| S. No | ML Tech. | Accuracy | Sensitivity/Recall | Precision | AUC | AUPR |
|---|---|---|---|---|---|---|
| 1. | NBI | - | 93.5 / 98.1 / 94.8 / 85.1 | 97.5 / 97.6 / 94.6 / 83.8 | - | - |
| 2. | NRWRH | - | 85.0 / - / - / - | 99.0 / - / - / - | - | - |
| 3. | Net CBP | - | - | - | 82.5 / 80.3 / 82.3 / 83.9 | - |
| 4. | NMIF | - | - | - | 83.0 / 82.0 / 82.0 / 80.0 | 81.0 / 78.0 / 74.0 / 71.0 |
| 5. | RWR | - | - | - | 70.9 | - |
| 6. | IN-RWR/Corank | 82.2 | - | - | 95.1 | - |
| 7. | NRLMF-beta | - | - | - | 99.0 / 99.0 / 97.5 / 96.4 | 89.7 / 91.3 / 75.5 / 75.5 |

E - Enzyme, IC - Ion Channel, G - G-Protein Coupled Receptor (GPCR), N - Nuclear Receptor, AUC - Area Under Curve, AUPR - Area Under Precision Recall, PPV - Positive Predicted Values, MCC - Matthews Correlation Coefficient.

TABLE XIV. Quantitative Analysis of the Deep Learning-Based Methods Used in DTI Prediction

Each cell lists values in the order E / IC / G / N; a single value means the figure is not broken down by class.

| S. No | ML Tech. | Accuracy | Sensitivity/Recall | Precision | AUC | AUPR/MCC |
|---|---|---|---|---|---|---|
| 1. | Deep DTI | 85.8 | 82.2 | - | 91.5 | - |
| 2. | Deep DTA | - | - | - | - | 71.4 |
| 3. | AUTO DNP | 94.1 / 91.1 / 86.6 / 80.5 | 95.5 / 95.6 / 81.6 / 76.2 | 92.9 / 87.7 / 91.0 / 84.1 | 94.2 / 91.0 / 87.4 / 81.7 | MCC: 88.3 / 82.7 / 73.9 / 61.8 |
| 4. | LASSO-DNN | 81.0 | - | - | 89.0 | - |
| 5. | Deep Convolution DTI | 75.0 | 85.0 | 70.0 | 80.0 | - |

E - Enzyme, IC - Ion Channel, G - G-Protein Coupled Receptor (GPCR), N - Nuclear Receptor, AUC - Area Under Curve, AUPR - Area Under Precision Recall, PPV - Positive Predicted Values, MCC - Matthews Correlation Coefficient.

VII. Discussion
The analysis shows that the chemogenomics-based approach is well suited to DTI prediction. A review of the qualitative and quantitative analyses offers an overview of the datasets, preprocessing, feature selection techniques, validation and ML classification techniques used in DTI prediction, all of which are discussed in this section.

A. The Dataset
The benchmark dataset of Yamanishi et al. [71] is the one most commonly used in DTI prediction, with its four classes, enzyme (E), ion channel (IC), G-protein coupled receptor (GPCR) and nuclear receptor (NR), and the DTI positive pairs of each class. Other datasets are used as well [11], [17], [21], [26], [31]. Deep learning-based prediction works with more dynamic data. An attempt has been made in [44] to construct a negative DTI dataset, which is significant in that it facilitates the assimilation of targets not taken into the prediction process. The number of instances used, which ranges from 250 to 5500, may be increased or decreased, depending on the purpose of the research.

B. Preprocessing and Balancing Techniques
Major issues in DTI prediction are brought on by data obtained from miscellaneous sources, which may have different ranges of values or no values at all. Missing values are inferred from the observed values in the data. Preprocessing techniques are, generally speaking, not applied to the data, because the data are curated when collected from different sources. When the data are combined, however, values may go missing or be replaced, and there is thus a need for preprocessing. The preprocessing employed in [26] replaces missing values with the mean values of the data. Employing preprocessing techniques like data cleaning enhances the quality of the data for further processing.
From the qualitative analysis in Tables III-VII, it is found that the dataset used in the prediction process is unbalanced, which may affect the performance of the classifiers. Balancing techniques include oversampling [26], [32], [33], though it increases negative outcomes. For DTI prediction, undersampling can be suggested to improve the positive outcomes.

C. Feature Extraction Methods
Feature extraction reduces the dimensionality of the input by deriving a new set of features from the original ones that retains the important information in the data. The lower dimensionality speeds up learning and improves the generalization of machine learning models. Various tools are available for this purpose. In drug discovery, the tools currently in common use are PROFEAT and Protr for protein feature extraction, and Rcpi and PaDEL-Descriptor for drug feature extraction. The studies that use these tools for feature extraction are [28], [33].
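
The two data-preparation steps discussed under Preprocessing and Balancing Techniques, replacing missing values with the mean and undersampling the majority class, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of any cited study: the helper names `impute_column_means` and `random_undersample`, the use of `None` to mark a missing value, and the toy data are all hypothetical.

```python
import random


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 for a column with no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def random_undersample(features, labels, seed=0):
    """Randomly drop majority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [features[i] for i in kept], [labels[i] for i in kept]


# Hypothetical feature rows with missing entries.
raw_rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
imputed_rows = impute_column_means(raw_rows)

# Hypothetical imbalanced labels: 2 interacting pairs, 4 non-interacting pairs.
pair_features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
pair_labels = [1, 0, 0, 0, 0, 1]
balanced_features, balanced_labels = random_undersample(pair_features, pair_labels)
```

Undersampling discards data, so on the small benchmark classes it trades information for balance. The fixed `seed` only makes the sketch reproducible.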


D. Feature Selection Methods
Feature selection is of fundamental importance, because the extracted features increase data dimensions and result in problems with overfitting. Feature selection techniques reduce the number of features by selecting the most important ones from the given input. It is clear from the analysis that target features can be categorized into three groups: structural, evolutionary and sequence. While the drug feature is structural, the number of target features considered varies from 1080 to 1498. Likewise, drug features vary, depending on whether they are 1D or 2D and on the fingerprint of the drugs selected. The similarity-based methods [11]-[18] in Table III consider only drug-drug, target-target and drug-target similarities for DTI prediction, on the assumption that similar drugs interact with similar targets. So in similarity-based methods, drug-based and target-based features are considered unimportant for DTI prediction. Further, similarity-based methods do not handle large-scale datasets. Matrix-based methods [19]-[23] consider only drug and target similarities, and no other features are taken for prediction. Also, matrix-based methods only handle small-scale datasets. Of the feature-based methods used in [25]-[33], the Sequential Forward Feature Selection (SFFS) technique is applied in [33], where the different feature sets considered are added sequentially, one by one, to evaluate the dataset. It is observed that the structural feature, which is one of the most influential target features, plays a significant role in DTI prediction, and may vary with the dataset taken. Finding the most influential features is important to feature selection. The network-based methods in [34]-[40] take different sets of features and handle them appropriately by selecting the most important drug and target features. Compact feature learning is undertaken in [39] by applying the Diffusion Component Analysis (DCA), which constructs a low-dimensional vector representation for each drug and target using diffusion distribution. It helps find the best interpretable features. The deep learning-based methods discussed in [41]-[45] use the t-distributed Stochastic Neighbor Embedding (t-SNE) technique to reduce input feature dimensionality. Deep learning-based methods consider dynamic data and dynamic features. The Convolutional Neural Network (CNN) used in [45] handles features with ease and finds the most potent ones. Given that deep learning-based methods deal well with large-scale datasets, future research that applies deep learning is likely to execute DTI prediction better.

E. Validation Methods
The qualitative analysis shows that 10-fold Cross-Validation (CV) and 5-fold CV offer better results than other CV techniques like the Leave-One-Out CV (LOOCV) and jackknife. Approaches using the LOOCV have problems with overfitting. DTI predictions are evaluated using AUC and AUPR values. The AUC values of the classifiers show better results when the 10-fold CV is used to validate the methods. AUC is chosen because it distinguishes between classes and validates the model’s capacity even when the dataset is imbalanced.

F. ML Techniques Engaged in DTI Prediction
The qualitative analysis in Tables III-VII covers the various classifiers used, some of which outclass the rest at DTI prediction. Ranking algorithms like Bayesian ranking are used to rank DTI [20]. The SVM [19], [22] classifier handles target and drug features by calculating them separately, which reduces prediction complexity, but it cannot determine the relationship between the features and may produce a large number of false positives. The KNN [18], [20] falls short, performance-wise, in its inability to handle features and large-scale datasets. Ensemble learning [27] handles large-scale and high-dimensional data. The Adaboost classifier separates the data and classifies them to get the most appropriate features [32]. The decision tree manages missing data thoroughly and uses diversity to learn features based on instances for improved accuracy [33]. Logistic regression [11], [16] operates data integration strategies effectively. The DVM [29] is robust in its handling of outliers. As far as feature-based methods are concerned, the random forest outperforms the rest, while the regularized least square (RLS) performs well in tandem with more influential features. In terms of performance, the WBR-DTI, VB-MK-LMF, NRLMF-beta and CNN find the best features for DTI prediction.
From the quantitative analysis in Tables X-XIV, the progress made is evaluated using AUC values, with marked improvements in the SVM from 61.7% [19] to 96.34% [22], the KNN from 92.3% [18] to 95.4% [20], and LR from 85.1% [11] to 95.32% [16]. Among the classifiers used in DTI prediction, the SVM gives the best prediction results, with an improvement of 34.64 percentage points. The random forest and decision tree used in ensemble learning give an AUC value of 90%. Adaptive Boosting and RLS give AUC values of 88.7% and 97%, respectively. The WBR-DTI and VB-MK-LMF give an AUC value of 98%, while the NRLMF-beta gives 96%.
However, the results depend on the data given as input. A newly developed model may perform poorly with imbalanced data and missing values. The qualitative analysis in Tables III-VII shows that the dataset has more negative than positive predictions, owing to the nature of the dataset used for DTI prediction. The quantitative analysis in Tables X-XIV shows that matrix factorization-based methods perform best for DTI prediction, though deep learning-based methods handle large-scale data and find the most influential features. Some of the papers also shed light on other processes, such as detecting adverse drug reactions [72]. This review has thus laid out a thorough understanding of datasets, feature selection methods and validations, as well as a comparison of the classifiers used for DTI prediction.

VIII. Conclusion and Future Scope
It is concluded from the review that much research has focused chiefly on chemogenomics, because DTIs based on drug and target features and similarities may be found without their structures. The method works well by finding the most influential features using a range of classifiers for DTI prediction. The classifiers use only known static interactions for training the model, given that the interaction data is static. Though static data has largely been used as a benchmark dataset for interaction prediction, dynamic data may be considered so that the problem of new DTIs is resolved. Several studies have only considered target features (like the AAC, CTD and pseudo AAC) and the PubChem fingerprint for drugs. There are, therefore, plenty of research opportunities to predict drugs using the influence of all the features. Influential features may vary from one technique to another. There is, however, a delay in finding influential features, since one feature may not be as important for prediction as another. More data are to be considered for finding the most influential features, which is possible with the introduction of big data for prediction. The ML techniques used by the deep learning-based and matrix-based methods were found to predict DTI better than others. It is recommended, considering the above, that future researchers focus on building a negative dataset for interaction prediction. Feature scaling or feature engineering techniques may be applied to enhance the dataset. New databases can be created by collecting data from numerous sources and incorporating appropriate parameters or influential features for future research. Further, future models developed for DTI prediction must consider every feature for drug prediction. The model developed, based on ML techniques, should be able to update information on drugs and targets constantly for new interaction prediction. Thus, the model must be able to predict interaction, based on prior knowledge, without having to be trained on every occasion. Such a model is likely to offer the best interaction prediction.


References

[1] L. Martin, M. Hutchens, C. Hawkins, A. Radnov, “How much do clinical trials cost?,” Nature Reviews-Drug Discovery, vol. 16, no. 6, pp. 381–382, June 2017, DOI: 10.1038/nrd.2017.70.
[2] S.J. Swamidass, “Mining small-molecule screens to repurpose drugs,” Briefings in Bioinformatics, vol. 12, no. 4, pp. 327–335, 2011, DOI: 10.1093/bib/bbr028.
[3] F. Moriaud, S.B. Richard, S.A. Adcock, L. Chanas-Martin, J.S. Surgand, M. Ben Jelloul, F. Delfaud, “Identify drug repurposing candidates by mining the protein data bank,” Briefings in Bioinformatics, vol. 12, no. 4, pp. 336–340, Jul 2011, DOI: 10.1093/bib/bbr017.
[4] R. Chen, X. Liu, S. Jin, J. Lin and J. Liu, “Machine learning for drug-target interaction prediction,” Molecules, vol. 23, no. 9, pp. 2208, 2018, DOI: 10.3390/molecules23092208.
[5] T.T. Talele, S.A. Khedkar and A.C. Rigby, “Successful Applications of Computer Aided Drug Discovery: Moving Drugs from Concept to the Clinic,” Current Topics in Medicinal Chemistry, vol. 10, no. 10, pp. 127, 2010, DOI: 10.2174/156802610790232251.
[6] T. Usha, D. Shanmugarajan, A.K. Goyal, C.S. Kumar and S.K. Middha, “Recent updates on computer-aided drug discovery: time for a paradigm shift,” Current Topics in Medicinal Chemistry, vol. 17, no. 30, pp. 3296–3307, 2017, DOI: 10.2174/1568026618666180101163651.
[7] L. Jacob, J-P. Vert, “Protein-ligand interaction prediction: an improved chemogenomics approach,” Bioinformatics, vol. 24, no. 19, pp. 2149–2156, 2008, DOI: 10.1093/bioinformatics/btn409.
[8] D. Rognan, “Chemogenomic approaches to rational drug design,” British Journal of Pharmacology, vol. 152, no. 1, pp. 38–52, 2007, DOI: 10.1038/sj.bjp.0707307.
[9] A. Nath, P. Kumari, R. Chaube, “Prediction of human drug targets and their interactions using machine learning methods: current and future perspectives,” Methods in Molecular Biology, Springer, NY, USA, vol. 1762, pp. 21–30, 2018, DOI: 10.1007/9781493977567_2.
[10] L. Lü, T. Zhou, “Link prediction in complex networks: a survey,” Physica A, vol. 390, pp. 1150–1170, 2011, DOI: 10.1016/j.physa.2010.11.027.
[11] L. Perlman, A. Gottlieb, N. Atias, E. Ruppin, R. Sharan, “Combining drug and gene similarity measures for drug-target elucidation,” Journal of Computational Biology: a journal of computational molecular cell biology, vol. 18, no. 2, pp. 133–145, 2011, DOI: 10.1089/cmb.2010.0213.
[12] J.-P. Mei, C.-K. Kwoh, P. Yang, X.L. Li, J. Zheng, “Drug–target interaction prediction by learning from local information and neighbours,” Bioinformatics, vol. 29, no. 2, pp. 238–245, 2012, DOI: 10.1093/bioinformatics/bts670.
[13] T. Van Laarhoven, E. Marchiori, “Predicting drug–target interactions for new drug compounds using a weighted nearest neighbor profile,” PloS One, vol. 8, no. 6, pp. e66952, 2013, DOI: 10.1371/journal.pone.0066952.
[14] J.-Y. Shi, S.-M. Yiu, Y. Li, H.C. Leung, F.Y. Chin, “Predicting drug–target interaction for new drugs using enhanced similarity measures and super-target clustering,” Methods, vol. 83, pp. 98–104, 2015, DOI: 10.1016/j.ymeth.2015.04.036.
[15] K. Buza, “Drug–target interaction prediction with hubness aware machine learning,” In: 2016 IEEE 11th International Symposium on Applied Computational Intelligence and Informatics (SACI), IEEE, New York, USA, 2016, pp. 37–40, DOI: 10.1109/SACI.2016.7507416.
[16] W. Zhang, Y. Chen, D. Li, “Drug–target interaction prediction through label propagation with linear neighborhood information,” Molecules, vol. 22, no. 12, pp. 2056, 2017, DOI: 10.3390/molecules22122056.
[17] X. Zhang, L. Li, M.K. Ng, S. Zhang, “Drug–target interaction prediction by integrating multiview network data,” Computational Biology and Chemistry, vol. 69, pp. 185–193, 2017, DOI: 10.1016/j.compbiolchem.2017.03.011.
[18] Z. Shi, J. Li, “Drug–target interaction prediction with weighted Bayesian ranking,” In: Proceedings of the 2nd International Conference on Biomedical Engineering and Bioinformatics, ACM, London, United Kingdom, 2018, pp. 19–24.
[19] S. Rendle, C. Freudenthaler, Z. Gantner, Z. Gartner, L. Schmidt-Thieme, “BPR: Bayesian Personalized Ranking from implicit feedback,” In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, AUAI Press, McGill, Canada, 2009, pp. 452–461, DOI: 10.1145/3278198.3278210.
[20] M. Gönen, “Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization,” Bioinformatics, vol. 28, no. 18, pp. 2304–2310, 2012, DOI: 10.1093/bioinformatics/bts360.
[21] L. Li, M. Cai, “Drug target prediction by multi-view low rank embedding,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 16, no. 5, pp. 1712–1721, 1 Sep-Oct 2019, DOI: 10.1109/TCBB.2017.2706267.
[22] B. Bolgár, P. Antal, “VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization,” BMC Bioinformatics, vol. 18, no. 1, pp. 440, 2017, DOI: 10.1186/s12859-017-1845-z.
[23] Y.A. Huang, Z.H. You, X. Chen, “A Systematic Prediction of Drug-Target Interactions Using Molecular Fingerprints and Protein Sequences,” Current Protein & Peptide Science, vol. 19, no. 5, pp. 468–478, 2018, DOI: 10.2174/1389203718666161122103057.
[24] M. Caro-Martínez, G. Jiménez-Díaz, J. A. Recio-García, “Local Model-Agnostic Explanations for Black-box Recommender Systems Using Interaction Graphs and Link Prediction Technique,” International Journal of Interactive Multimedia and Artificial Intelligence, 2021, DOI: 10.9781/ijimai.2021.12.001.
[25] T. Van Laarhoven, S.B. Nabuurs, E. Marchiori, “Gaussian interaction profile kernels for predicting drug–target interaction,” Bioinformatics, vol. 27, no. 21, pp. 3036–3043, 2011, DOI: 10.1093/bioinformatics/btr500.
[26] A. Ezzat, M. Wu, X.-L. Li, C.-K. Kwoh, “Drug–target interaction prediction via class imbalance-aware ensemble learning,” BMC Bioinformatics, vol. 17, no. 19, pp. 509, 2016, DOI: 10.1186/s12859-016-1377-y.
[27] A.C. Nascimento, R.B. Prudêncio, I.G. Costa, “A multiple kernel learning algorithm for drug–target interaction prediction,” BMC Bioinformatics, vol. 17, pp. 46, 2016, DOI: 10.1186/s12859-016-0890-3.
[28] W. Lan, J. Wang, M. Li, J. Liu, Y. Li, et al., “Predicting drug–target interaction using positive-unlabeled learning,” Neurocomputing, vol. 206, pp. 50–57, 2016, DOI: 10.1016/j.neucom.2016.03.080.
[29] Z. Li, P. Han, Z.-H. You, X. Li, Y. Zhang, H. Yu, et al., “In silico prediction of drug-target interaction networks based on drug chemical structure and protein sequences,” Scientific Reports, vol. 7, no. 1, pp. 11174, 2017, DOI: 10.1038/s41598-017-10724-0.
[30] M. Ohue, T. Yamazaki, T. Ban, Y. Akiyama, “Link mining for kernel based compound–protein interaction predictions using a chemogenomics approach,” In: International Conference on Intelligent Computing, Springer, Cham, Switzerland, 2017, pp. 549–558, DOI: 10.1007/978-3-319-63312-1_48.
[31] J. Zhang, M. Zhu, P. Chen, B. Wang, “DrugRPE: random projection ensemble approach to drug–target interaction prediction,” Neurocomputing, vol. 228, pp. 256–262, 2017, DOI: 10.1016/j.neucom.2016.10.039.
[32] F. Rayhan, S. Ahmed, S. Shatabda, et al., “iDTI-ESBoost: identification of drug target interaction using evolutionary and structural features with boosting,” Scientific Reports, vol. 7, no. 1, pp. 17731, 2017, DOI: 10.1038/s41598-017-18025-2.
[33] A. Sharma, R. Rani, “BE-DTI: ensemble framework for drug target interaction prediction using dimensionality reduction and active learning,” Computer Methods and Programs in Biomedicine, vol. 165, pp. 151–162, 2018, DOI: 10.1016/j.cmpb.2018.08.011.
[34] F. Cheng, C. Liu, J. Jiang, et al., “Prediction of drug–target interactions and drug repositioning via network-based inference,” PLOS Computational Biology, vol. 8, no. 5, pp. e1002503, 2012, DOI: 10.1371/journal.pcbi.1002503.
[35] X. Chen, M.-X. Liu, G.-Y. Yan, “Drug–Target Interaction prediction by random walk on the heterogeneous network,” Molecular Biosystems, vol. 8, no. 7, pp. 1970–1978, 2012, DOI: 10.1039/C2M00002D.
[36] H. Chen, Z. Zhang, “A semi-supervised method for drug–target interaction prediction with consistency in networks,” PloS One, vol. 8, no. 5, pp. e62975, 2013, DOI: 10.1371/journal.pone.0062975.
[37] L. Peng, B. Liao, W. Zhu, Z. Li, K. Li, “Predicting drug–target interactions with multi-information fusion,” IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 2, pp. 561–572, 2015, DOI: 10.1109/JBHI.2015.2513200.
[38] A. Seal, Y.-Y. Ahn, D.J. Wild, “Optimizing drug–target interaction prediction based on random walk on heterogeneous networks,” Journal of Cheminformatics, vol. 7, no. 1, pp. 40, 2015, DOI: 10.1186/s13321-015-0089-z.


[39] Y. Huang, L. Zhu, H. Tan, et al., “Predicting drug-target on heterogeneous network with co-rank,” In: International Conference on Computer Engineering and Networks, Springer, Cham, Switzerland, 2018, pp. 571–581, DOI: 10.1007/978-3-030-14680-1_63.
[40] T. Ban, M. Ohue, Y. Akiyama, “NRLMFβ: beta-distribution rescored neighborhood regularized logistic matrix factorization for improving the performance of drug–target interaction prediction,” Biochemistry and Biophysics Reports, vol. 18, pp. 100615, 2019, DOI: 10.1016/j.bbrep.2019.01.008.
[41] M. Wen, Z. Zhang, S. Niu, et al., “Deep-learning-based drug–target interaction prediction,” Journal of Proteome Research, vol. 16, no. 4, pp. 1401–1409, 2017, DOI: 10.1186/s12911-020-1052-0.
[42] H. Öztürk, A. Özgür, E. Ozkirimli, “DeepDTA: deep drug–target binding affinity prediction,” Bioinformatics, vol. 34, no. 17, pp. i821–i829, 2018, DOI: 10.1093/bioinformatics/bty593.
[43] L. Wang, Z.-H. You, X. Chen, et al., “A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network,” Journal of Computational Biology, vol. 25, no. 3, pp. 361–373, 2018, DOI: 10.1089/cmb.2017.0135.
[44] J. You, R.D. McLeod, P. Hu, “Predicting drug–target interaction network using deep learning model,” Computational Biology and Chemistry, vol. 80, pp. 90–101, 2019, DOI: 10.1016/j.compbiolchem.2019.03.016.
[45] I. Lee, J. Keum, H. Nam, “DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences,” PLoS Computational Biology, vol. 15, no. 6, pp. e1007129, 2019, DOI: 10.1371/journal.pcbi.1007129.
[46] M. Kanehisa, M. Araki, S. Goto, et al., “KEGG for linking genomes to life and the environment,” Nucleic Acids Research, vol. 36, pp. D480–484, 2007, DOI: 10.1093/nar/gkm882.
[47] M. Kanehisa, S. Goto, M. Hattori, M. Araki, M. Hirakawa, “From genomics to chemical genomics: new developments in KEGG,” Nucleic Acids Research, vol. 34, pp. D354–D357, 2006, DOI: 10.1093/nar/gkj102.
[48] A. Gaulton, A. Hersey, M. Nowotka, et al., “The ChEMBL database in 2017,” Nucleic Acids Research, vol. 45, no. D1, pp. D945–954, 2016, DOI: 10.1093/nar/gkw1074.
[49] J. Kringelum, S.K. Kjaerulff, S. Brunak, et al., “ChemProt-3.0: a global chemical biology diseases mapping,” Database: the journal of biology databases and curation, vol. 2016, pp. bav123, 2016, DOI: 10.1093/database/bav123.
[50] A.H. Wagner, A.C. Coffman, B.J. Ainscough, et al., “DGIdb 2.0: mining clinically relevant drug–gene interactions,” Nucleic Acids Research, vol. 44, no. D1, pp. D1036–1044, 2016, DOI: 10.1093/nar/gkv1165.
[51] D.S. Wishart, Y.D. Feunang, A.C. Guo, et al., “Drugbank 5.0: a major update to the drugbank database for 2018,” Nucleic Acids Research, vol. 46, no. D1, pp. D1074–1082, 2017, DOI: 10.1093/nar/gkx1037.
[52] M. Kanehisa, M. Furumichi, M. Tanabe, et al., “KEGG: new perspectives on genomes, pathways, diseases and drugs,” Nucleic Acids Research, vol. 45, no. D1, pp. D353–361, 2016, DOI: 10.1093/nar/gkw1092.
[53] HMS_LINCS: LINCS Pilot Phase Joint Project: Sensitivity measures of six breast cancer cell lines to a library of small molecule kinase inhibitors (drug combination treatments). Dataset 2 of 2: Mean cell count and mean normalized growth rate inhibition values across technical replicates, 2016.
[54] J. Von Eichborn, M.S. Murgueitio, M. Dunkel, S. Koerner, P.E. Bourne, R. Preissner, “PROMISCUOUS: a database for network-based drug-repositioning,” Nucleic Acids Research, vol. 36, Jan 2011, DOI: 10.1093/nar/gkq1037.
[55] D. Szklarczyk, A. Santos, C. Von Mering, et al., “STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data,” Nucleic Acids Research, vol. 44, no. D1, pp. D380–384, 2015, DOI: 10.1093/nar/gkv1277.
[56] S. Günther, M. Kuhn, M. Dunkel, et al., “Supertarget and matador: resources for exploring drug-target relationships,” Nucleic Acids Research, vol. 36, pp. D919–922, 2008, DOI: 10.1093/nar/gkm862.
[57] X. Chen, Z.L. Ji, Y.Z. Chen, “TTD: therapeutic target database,” Nucleic Acids Research, vol. 30, no. 1, pp. 412–415, 2002, DOI: 10.1093/nar/30.1.412.
[58] L. Jeske, S. Placzek, I. Schomburg, et al., “Brenda in 2019: a European ELIXIR core data resource,” Nucleic Acids Research, vol. 47, no. D1, pp. D542–549, 2019, DOI: 10.1093/nar/gky1048.
[59] O. Ursu, J. Holmes, C.G. Bologa, et al., “DrugCentral 2018: an update,” Nucleic Acids Research, vol. 47, no. D1, pp. D963–970, 2018, DOI: 10.1093/nar/gky963.
[60] C. Wang, G. Hu, K. Wang, et al., “PDID: database of molecular level putative protein–drug interactions in the structural human proteome,” Bioinformatics, vol. 32, no. 4, pp. 579–586, 2016, DOI: 10.1093/bioinformatics/btv597.
[61] D.-T. Nguyen, S. Mathias, C. Bologa, et al., “Pharos: collating protein information to shed light on the druggable genome,” Nucleic Acids Research, vol. 45, no. D1, pp. D995–D1002, 2017, DOI: 10.1093/nar/gkw1072.
[62] S. Kim, P.A. Thiessen, E.E. Bolton, et al., “PubChem substance and compound databases,” Nucleic Acids Research, vol. 44, no. D1, pp. D1202–1213, 2016, DOI: 10.1093/nar/gkv951.
[63] V.B. Siramshetty, O.A. Eckert, B.-O. Gohlke, et al., “SuperDRUG2: a one stop resource for approved/marketed drugs,” Nucleic Acids Research, vol. 46, no. D1, pp. D1137–1143, 2018, DOI: 10.1093/nar/gkx1088.
[64] H. Fang, Z. Su, Y. Wang, A. Miller, Z. Liu, P. C. Howard, W. Tong, S. M. Lin, “Exploring the FDA adverse event reporting system to generate hypotheses for monitoring of disease characteristics,” Clinical Pharmacology and Therapeutics, vol. 95, no. 5, pp. 496–498, 2014, DOI: 10.1038/clpt.2014.17.
[65] M. Kuhn, I. Letunic, L.J. Jensen, P. Bork, “The SIDER database of drugs and side effects,” Nucleic Acids Research, vol. 44, no. D1, pp. D1075–1079, 2015, DOI: 10.1093/nar/gkv1075.
[66] A.J. Pawson, J.L. Sharman, H.E. Benson, et al., “The IUPHAR/BPS guide to pharmacology: an expert-driven knowledgebase of drug targets and their ligands,” Nucleic Acids Research, vol. 42, no. D1, pp. D1098–1106, 2013, DOI: 10.1093/nar/gkt1143.
[67] R. Kumar, K. Chaudhary, S. Gupta, et al., “CancerDR: Cancer Drug Resistance Database,” Scientific Reports, vol. 3, pp. 1445, 2013, DOI: 10.1038/srep01445.
[68] M.K. Gilson, T. Liu, M. Baitaluk, et al., “BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology,” Nucleic Acids Research, vol. 44, no. D1, pp. D1045–1053, 2016, DOI: 10.1093/nar/gkv1072.
[69] T. Sterling and J.J. Irwin, “ZINC- Ligand Discovery for Everyone,” Journal of Chemical Information and Modeling, vol. 55, no. 11, pp. 2324–2337, 2015, DOI: 10.1021/acs.jcim.5b00559.
[70] B.L. Roth, W.K. Kroeze, S. Patel, E. Lopez, “PDSP Ki -The Multiplicity of Serotonin Receptors: Uselessly diverse molecules or an embarrassment of riches?,” The Neuroscientist, vol. 6, pp. 252–262, 2000, DOI: 10.1177/107385840000600408.
[71] Y. Yamanishi, M. Kotera, M. Kanehisa, S. Goto, “Drug–target interaction prediction from chemical, genomic and pharmacological data in an integrated framework,” Bioinformatics, vol. 26, no. 12, pp. i246–i254, 2010, DOI: 10.1093/bioinformatics/btq176.
[72] R. San-Miguel Carrasco, “Detection of Adverse Reaction to Drugs in Elderly Patients through Predictive Modeling,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 3, no. 6, pp. 52–56, 2016, DOI: 10.9781/ijimai.2016.368.

A. Suruliandi
Dr. A. Suruliandi received the B.E. degree in electronics and communication engineering from the Coimbatore Institute of Technology, Coimbatore, India, in 1987. He completed his M.E. degree in computer science and engineering at the Government College of Engineering, Tirunelveli, India, in 2000. He obtained his Ph.D. degree from Manonmaniam Sundaranar University, Tirunelveli, in 2009. He is currently working as a professor with the Department of Computer Science and Engineering, Manonmaniam Sundaranar University. He has more than 29 years of experience in teaching. He has authored 50 articles in international journals, 23 articles in IEEE Xplore publications, 33 in national conferences, and 13 in international conferences. His research interests are remote sensing, image processing, and pattern recognition.


T. Idhaya
T. Idhaya received the M.Sc. degree in Computer Science from St. Xavier’s College (Autonomous), Tirunelveli, India, in 2016. She completed her M.Phil. degree at Manonmaniam Sundaranar University, Tirunelveli, India, in 2017. She is currently pursuing her Ph.D. degree at Manonmaniam Sundaranar University, Tirunelveli, India. Her areas of interest are image processing, machine learning, and big data.

S. P. Raja
S. P. Raja was born in Sathankulam, Tuticorin District, Tamilnadu, India. He completed his schooling at Sacred Heart Higher Secondary School, Sathankulam, Tuticorin, Tamilnadu, India. He completed his B.Tech. in Information Technology in 2007 at Dr. Sivanthi Aditanar College of Engineering, Tiruchendur. He completed his M.E. in Computer Science and Engineering in 2010 at Manonmaniam Sundaranar University, Tirunelveli. He completed his Ph.D. in 2016, in the area of image processing, at Manonmaniam Sundaranar University, Tirunelveli. He is currently working as an Associate Professor in the School of Computer Science and Engineering at Vellore Institute of Technology, Vellore, Tamilnadu, India. He has published 75 papers in international journals, 24 in international conferences, and 12 in national conferences. Dr. Raja is an Associate Editor of the Journal of Circuits, Systems and Computers, Computing and Informatics, International Journal of Interactive Multimedia and Artificial Intelligence, Brazilian Archives of Biology and Technology, International Journal of Image and Graphics, and International Journal of Biometrics.

