
SSRN 4541252

This systematic review discusses the challenges and advancements in protein structure prediction, focusing on conventional and AI methods. It categorizes various strategies, including ab initio, de novo, and comparative modeling, while emphasizing the role of deep learning in enhancing prediction accuracy. The review also highlights the importance of empirical data and knowledge-based approaches in improving the understanding of protein structures and their functions.


A SYSTEMATIC REVIEW ON PROTEIN STRUCTURE PREDICTION – CONVENTIONAL AND AI METHODS

Swati Subhash Jadhav,
Assistant Professor,
E&TC Dept., D. Y. Patil College of Engineering, Akurdi (SPPU), Pune, India
[email protected]

Arati J. Vyavahare,
H.O.D., ECE Dept.,
PES's Modern Engineering College, Pune (SPPU), Pune, India
[email protected]

Manish Sharma,
Associate Professor,
E&TC Dept., D. Y. Patil College of Engineering, Akurdi (SPPU), Pune, India
[email protected]

Abstract: Proteins are vital for survival, and understanding a protein's structure helps determine its function. Extensive experimental effort has established the structures of approximately 100,000 different proteins, yet this represents only a small fraction of the billions of known protein sequences. Structural coverage is limited by the months to years of painstaking effort needed to determine a single protein structure. This review highlights the key problems of (i) ab initio protein structure determination, (ii) de novo protein design, (iii) comparative modeling, and (iv) optimization. Current progress in protein folding is reviewed by categorizing comparative modeling strategies into those that use database information and those that do not. Current advances in ab initio prediction and de novo protein design are then discussed, with an emphasis on template flexibility, in silico sequence selection, and effective peptide and protein design. The use of deep learning approaches to construct protein backbone structure from amino acid sequence has led to the latest innovations in ab initio protein structure prediction. This review discusses notable strategies for template-based modeling (TBM) and template-free modeling (FM), as well as a few tools developed for each strategy.

Keywords: Protein Structure Prediction (PSP), Deep Learning, Machine Learning, Secondary Protein Structure.

1. INTRODUCTION
Proteins are linear chains of amino acids that take on a distinct three-dimensional architecture in their natural environment. This native structure is what allows a protein to perform its biological function. Among the many geometrically possible forms, an amino acid sequence folds into its native functional architecture. According to Anfinsen's thermodynamic hypothesis, proteins are not assembled into their native shapes by a biological process such as protein synthesis; rather, folding is a purely physical process, determined solely by the protein's amino acid sequence and the surrounding solvent [1]. It follows from Anfinsen's hypothesis that protein structure can, in principle, be predicted if two conditions hold: a free energy model exists, and the global minimum of this function can be identified. This formulation captures the protein structure prediction challenge because it allows the macroscopic structure of a protein to be inferred from the microscopic interactions between the protein's elements [2]. Protein structure prediction is nevertheless extremely difficult, since even short amino acid sequences can adopt a large number of geometric structures, each of which would have to be evaluated to find the free energy minimum.

Electronic copy available at: https://ssrn.com/abstract=4541252

A protein has several structural levels. The unique amino acid sequence defines the primary structure of a protein, which by convention is written starting at the amino-terminal (N) end and ending at the carboxyl-terminal (C) end. The primary structure can be sequenced directly or inferred from the DNA sequence. Secondary structures are patterns of local bonding; α-helices and β-sheets are the two most prevalent types, and loop regions connect these secondary structure elements.
The tertiary structure is the final three-dimensional arrangement of these components after the protein folds into its native state [3]. Figure 1 illustrates a protein structure. The problem of predicting protein structure is addressed in a variety of fields. Chemists have been interested in the structure prediction problem because solving it is a prerequisite for successful de novo protein design [4]. The ultimate aim of de novo protein design is to develop amino acid sequences that fold into proteins performing desired functions. De novo protein design can thus be viewed as a product design problem at the molecular scale.

Figure 1: Protein Structure Prediction

Over the last decade, many first-principles approaches to computational protein structure prediction have been established, several of which are based on Anfinsen's thermodynamic hypothesis [5]. First-principles methods have also been used to predict the binding sites and binding energies of different ligands and how these are altered by mutations; the HierDock approach, for example, was developed to generate such predictions from first principles. However, first-principles computational structure prediction is not the only way to determine protein structure. The number of experimentally determined protein structures continues to climb rapidly [6]. The availability of these experimental data has spurred the development of knowledge-based, rather than physics-based, strategies for computational structure prediction.
Knowledge-based strategies, in contrast to methods that minimize free energy and deduce the structure from first principles [7], mine databases of known structures to infer information about an amino acid sequence whose three-dimensional structure is unknown. Although knowledge-based methods have been criticized for failing to provide a fundamental understanding of the principles that govern structure formation, they can often successfully predict unknown three-dimensional structures.
The Critical Assessment of Protein Structure Prediction (CASP) experiments are held every two years to assess progress across all computational protein structure prediction techniques. Although CASP offers only a limited number of amino acid sequences as targets, the experiments provide a useful means of evaluating methods and progress throughout the field in an unbiased way. As evidenced by the biennial CASP experiments [8], scientific interest has driven much of the research into protein structure prediction techniques. The de novo protein design problem, which is the inverse of the protein folding problem, begins with the structure rather than the sequence and searches for sequences that will fold into that structure. This article addresses the latest innovations in ab initio prediction, de novo protein design, comparative modeling, and three-dimensional protein structure and sequence analysis, focusing specifically on alignments.
The remainder of this review is organized as follows. Section 2 surveys protein structure prediction and is divided into ab initio protein structure prediction, de novo protein structure prediction, comparative modeling in protein structure prediction, multi-objective differential function, and protein structure prediction using machine learning and deep learning. Section 3 provides a summary and discussion, Section 4 gives recommendations for future work, and Section 5 concludes the paper.

2. LITERATURE SURVEY
Protein structure prediction can be performed in a variety of ways. The methods for predicting structure can be divided into five categories: (i) ab initio protein structure prediction, (ii) de novo protein structure prediction, (iii) comparative modeling, (iv) multi-objective differential solution, and (v) protein structure prediction using AI techniques.

2.1 Ab Initio Protein Structure Prediction


Despite years of research, predicting the structure of a protein from its amino acid sequence remains a major challenge. If the query protein has an identified structural homolog, the task is fairly simple, and high-resolution models can generally be created by copying and refining the framework of the solved structure. If structural homologs do not exist, or exist but cannot be recognized, models must be built from scratch. This approach, referred to as ab initio modeling, is essential for a complete solution to the protein structure prediction problem; it can also help in understanding the physicochemical principles that govern how proteins fold in nature.


Figure 2: Structure of the Ab initio protein structure prediction [36]

Although sequence-based contact prediction has shown promise in improving the modeling of non-homologous structures, obtaining correct folds often requires a large number of homologous sequences and a substantial number of correct contacts [9]. That study presents C-QUARK, a method that guides replica-exchange Monte Carlo fragment assembly simulations with multiple deep-learning and coevolution-based contact maps. On 247 non-redundant proteins, C-QUARK correctly folded 75 percent of the cases, with TM-scores (template-modeling scores) of at least 0.5, which was 2.6 times more than QUARK. In the 59 cases where contact accuracy was low or few homologous sequences were available, C-QUARK successfully folded 6 times more proteins than other contact-based folding methods.
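
To make the 0.5 threshold concrete, the following is a minimal Python sketch of the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L − 15)^(1/3) − 1.8. It assumes the model has already been superimposed on the native structure and that the per-residue distances are supplied by the caller; the full TM-score also searches over superpositions, which is not reproduced here. The function names and the fallback value of d0 for very short chains are illustrative assumptions, not part of the reviewed work.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    # Assumption: very short chains fall back to a fixed d0 of 0.5,
    # which also avoids a fractional power of a negative number.
    if target_length <= 21:
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def is_correct_fold(score, threshold=0.5):
    """Apply the TM-score >= 0.5 convention used in the text above."""
    return score >= threshold
```

A model in which every residue deviates from the native position by exactly d0 scores 0.5, which sits on the boundary of the convention used above.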
Prediction of protein secondary structure (alpha-helix, beta-strand, and coil) is a crucial step in predicting protein inter-residue interactions and ab initio tertiary structure [10]. In that study, several advanced deep learning architectures (DNSS2) were developed to improve secondary structure prediction. DNSS2 predicts secondary structure through (i) the design and integration of six one-dimensional deep networks (convolutional, recurrent, residual, memory, fractal, and inception networks) and (ii) the use of more sensitive profile features derived from hidden Markov models (HMM) and multiple sequence alignments (MSA). Most of these deep learning architectures are novel for protein secondary structure prediction. DNSS2 was extensively benchmarked against eight state-of-the-art tools on two independent test datasets and consistently ranked among the top methods.
For many years, amide hydrogen-deuterium exchange (HDX) has been used to determine local flexibility and binding sites in proteins, but the data alone are insufficient for complete structural characterization [11]. Compared with X-ray crystallography, cryo-EM, or a full suite of NMR experiments for structure determination, experiments that measure HDX rates, including HDX-NMR, have much higher throughput. HDX-NMR data encode information about protein structure, making them a good candidate for combination with computational protein structure prediction. A methodology for incorporating HDX-NMR data into ab initio protein structure prediction was developed within the Rosetta software framework, predicting structures based on agreement with experimental data. Further research could focus on adapting the scoring function to HDX-MS for monomeric structure prediction as well as protein complex structure prediction, both of which are pivotal to the vast majority of biological activities.
Deep contextual learning techniques can be used to create a high-quality fragment library. The DeepFragLib strategy uses bidirectional long short-term memory recurrent neural networks with knowledge distillation for initial fragment classification, followed by an aggregated residual transformation network with cyclically dilated convolution for recognizing near-native fragments [12]. DeepFragLib improves the position-averaged fraction of near-native fragments by 12.2 percent over existing methods and, when combined with Rosetta, leads to improved near-native structures for 72.0 percent of the free-modeling domain targets tested. DeepFragLib is a fully parallelized structure prediction framework that can be used in combination with other applications. Its hierarchical architecture is designed to reduce computational complexity while improving computational efficiency.
GalaxyTongDock is a web server for ab initio protein-protein docking that, like ZDOCK, performs rigid-body docking but with improved energy parameters [13]. The energy parameters were trained by iterative docking and parameter search so that more native-like structures are selected as top-ranked models. GalaxyTongDock can perform symmetric docking of homo-oligomeric proteins with Cn and Dn symmetries (GalaxyTongDock_C and GalaxyTongDock_D) as well as asymmetric docking of two different proteins (GalaxyTongDock_A).
The most successful ab initio protein structure prediction methods, namely fragment-assembly-based methods, require the generation of a large number of decoys to produce accurate predictions [14]. Methods that build models faster and are more sensitive to residue contacts are needed to realize the promise of ab initio protein structure prediction raised by recent advances in contact prediction. When the predicted contacts are accurate, the CONFOLD method can produce high-quality secondary structures (with beta-strands correctly paired into beta-sheets) as well as valid tertiary structures.
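
Since several of the methods above depend on residue-residue contacts, a short sketch of what a contact map is may help. The Python code below derives a native contact set from three-dimensional coordinates and measures the precision of a predicted contact list against it. The 8 Å cutoff and the minimum sequence separation are common conventions adopted here as assumptions; they are not taken from the CONFOLD or C-QUARK papers, and the function names are illustrative.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) tuple per residue (for example, C-beta atoms).
    threshold: distance cutoff in angstroms (assumed convention).
    min_separation: minimum |i - j| so trivially close neighbours are skipped.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def contact_precision(predicted, native):
    """Fraction of predicted contacts that are present in the native set."""
    predicted = list(predicted)
    if not predicted:
        return 0.0
    hits = sum(1 for pair in predicted if pair in native)
    return hits / len(predicted)
```

Precision of this kind is what "accurate contacts" refers to in practice: the higher the fraction of predicted pairs that are true contacts, the more useful they are as folding restraints.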
Ab initio protein tertiary structure prediction is a long-standing problem in structural bioinformatics [15]. The accuracy of ab initio structure prediction can be improved by using residue-residue contact and secondary structure prediction data. That paper proposes SCDE, an enhanced differential evolution algorithm based on secondary structure and residue-residue contact information. SCDE defines two score models based on secondary structure and contact information, as well as two selection strategies for guiding the conformational space search: a secondary structure-based selection strategy and a contact-based selection strategy.
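
For readers unfamiliar with differential evolution, the sketch below shows the classic DE/rand/1/bin scheme applied to a vector of dihedral angles. It is not the SCDE algorithm: the energy function is a hypothetical stand-in that simply rewards closeness to a set of target angles, whereas SCDE scores conformations with secondary structure and contact information. All names and parameter values are illustrative assumptions.

```python
import math
import random


def toy_energy(angles, target):
    """Hypothetical energy: zero when every angle matches its target."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, target))


def wrap(angle):
    """Map an angle into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def differential_evolution(energy, dim, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimise `energy` over `dim` angles with DE/rand/1/bin."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(dim)]
           for _ in range(pop_size)]
    scores = [energy(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    trial.append(wrap(pop[a][j] + f * (pop[b][j] - pop[c][j])))
                else:
                    trial.append(pop[i][j])
            trial_score = energy(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

In a realistic setting the selection step is where secondary structure or contact scores would enter, which is the part of the search that SCDE's two selection strategies are described as guiding.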
Ab initio protein-protein docking algorithms frequently use experimental data to determine the most likely complex structure [16]. The proposed strategy combined protein-protein docking with chemical cross-linking information, followed by chromatographic techniques. Cross-links were incorporated as distance restraints based on either Euclidean or void-volume distance. In such cases, the use of symmetry information improved complex structure prediction performance even further.
Determining T-cell epitope structure is a difficult immunoinformatics task in the design of epitope-based vaccines [17]. Antigenic peptides, also widely known as epitopes, are short sequences of amino acids that bind to Major Histocompatibility Complex (MHC) molecules. Using a Genetic Algorithm for Predicting Epitope Structure (GAPES), that work presents a model for predicting the structure of MHC class II epitopes from their sequence. The proposed elitist-based genetic algorithm determines the tertiary structure of the epitope using the ab initio Empirical Conformational Energy Program for Peptides (ECEPP) force field model. A support vector machine (SVM) classifier is used to assess prediction accuracy.
An efficient conformation search strategy and an effective energy function are needed to solve the protein structure prediction (PSP) problem [18]. That study formulates PSP as a multi-objective optimization problem and develops AIMOES, a three-objective evolutionary algorithm. The three physical energy terms used by AIMOES are bond energy, non-bond energy, and solvent-accessible surface area. The results showed that the proposed strategy can achieve comparable or better performance. Notably, the proposed strategy appears more efficient than methods that require long evolution times. Future work includes further attention to the energy function and the conformation search approach.
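
The idea of treating several energy terms as separate objectives can be illustrated with a Pareto filter. The Python sketch below keeps only those candidate conformations that are not dominated on all three objectives named above (bond energy, non-bond energy, solvent-accessible surface area), assuming each is to be minimised. It shows the general multi-objective principle, not the AIMOES archive mechanism, and the numeric values in the usage note are made up for illustration.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere.

    All objectives are assumed to be minimised.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated objective vectors, preserving input order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

For example, with hypothetical (bond, non-bond, SASA) triples `[(1, 5, 3), (2, 2, 2), (3, 3, 3), (1, 5, 4)]`, the filter keeps the first two: `(3, 3, 3)` is dominated by `(2, 2, 2)` and `(1, 5, 4)` by `(1, 5, 3)`. A multi-objective evolutionary algorithm maintains such a set of trade-off solutions instead of collapsing the terms into one weighted sum.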

Table 1: Structure of the Ab Initio Protein Structure Prediction

| Reference No | Technique | Dataset | Advantages | Disadvantages / Future work |
|---|---|---|---|---|
| [9] | C-QUARK | CASP dataset | Using multiple predictors helps improve the overall accuracy of the contact map. | Prediction is weaker when the Nf value is low. |
| [10] | Multiple advanced deep learning architectures (DNSS2) | CASP13 | The architecture is unaffected by the size of the input. The CNN learns both local and global characteristics. | Requires a large amount of data to perform better than other techniques. |
| [11] | HDX-NMR | CASP13 | Reduces the computational power needed for structure prediction. | NMR investigations are often impractical for proteins larger than 50 kDa unless specialized sampling is used, which has its own drawbacks; MS studies are less constrained by protein size. |
| [12] | Bidirectional long short-term memory recurrent neural networks | SCOP350 and CASP datasets | Reduces computational complexity while improving computational efficiency. | RNNs are quite difficult to train. |
| [13] | GalaxyTongDock | CASP | Prediction power is high in both asymmetric and symmetric docking. | The knowledge-based scoring function remains the challenge. |
| [14] | CONFOLD2 | PSICOV contact prediction dataset | CONFOLD2 generates only a few hundred model decoys to explore the fold space, so it is relatively quick. | Accuracy is lower. |
| [15] | SCDE | CASP12 | The proposed SCDE is both effective and efficient. | Prediction accuracy is to be improved in future work. |
| [16] | Protein-protein ZDOCK docking with chemical cross-linking data | Protein Data Bank | Improved complex structure prediction performance. | Requires sequences of the two component proteins from a large number of species. |
| [17] | Genetic Algorithm for Predicting Epitope Structure (GAPES) | Immune Epitope Database (IEDB) | GAPES was reliable and accurate. | GAPES needs further development to improve its prediction response time. |
| [18] | Archive information assisted multi-objective evolutionary strategy (AIMOES) | Protein Data Bank | The developed method gives higher performance. | Future work will address the energy function and the conformation search approach. |

The survey above covers ab initio protein structure prediction techniques. Evolutionary approaches with single- and multi-objective optimization, including the genetic algorithm, immune algorithm, and differential evolution, were examined. We presented an overview of several studies, covering specific aspects of problem modeling as well as the methods employed. Numerical results were reported for the proteins most often examined in the literature. Despite advances in problem modeling and computational approaches, PSP remains a challenging problem. Adaptation, local search, and parallelism are three techniques for solving the ab initio PSP problem that have yet to be fully investigated.

2.2 De Novo Protein Structure Prediction


An energy function is said to be a true effective energy function if it accounts for every type of interaction present in a detailed atomic model of a protein. Such true effective energy functions can be obtained at the atomic level of a protein by applying the basic laws of physics. However, because atomistic modeling requires evaluating the energetics between all pairs of atoms, and the number of pairs grows quickly as the chain length grows, ever more computational effort is needed.
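
The growth in cost can be made explicit: a system of N atoms has N(N − 1)/2 distinct pairs, so doubling the number of atoms roughly quadruples the number of pairwise terms. The small Python sketch below computes this count; the function name is illustrative, and real force-field implementations usually reduce the work with distance cutoffs, which are not modelled here.

```python
def atom_pair_count(n_atoms):
    """Number of distinct unordered atom pairs among n_atoms atoms."""
    if n_atoms < 0:
        raise ValueError("n_atoms must be non-negative")
    return n_atoms * (n_atoms - 1) // 2
```

For instance, 1,000 atoms give 499,500 pairs, while 2,000 atoms give 1,999,000.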

Figure 3: De Novo Protein Structure Prediction [19]

As deep-learning-based inter-residue contact and distance prediction advances, the discrete space formed by fragment assembly can no longer satisfy the distance constraints well [20]. As a result, the optimal solution in the continuous space may not be reached. Improving the performance of distance-assisted fragment assembly therefore requires an efficient closed-loop continuous dihedral angle optimization strategy that complements discrete fragment assembly. IPTDFold, a residue-level distance deviation optimization method, substantially improves structure prediction performance in this way. With the rapid progress of model quality assessment methods, incorporating model assessment into the folding procedure to form a feedback loop should help improve protein structure prediction accuracy further.
Combining multiple contact maps into one, known as meta contact, is a widely used approach for improving contact prediction accuracy and effectively reducing the noise of any single contact map [21]. However, protein structure prediction using meta contact cannot fully exploit the information carried by the original contact maps. MultiCFold, built on an evolutionary algorithm framework, offers a multi-contact-based folding approach in which the populations use detailed information from several contact maps to guide protein structure folding.
Protein structure prediction from sequence has been researched extensively for decades owing to the problem's significance and its well-defined physical and computational basis [22]. While progress has historically ebbed and flowed, the last two years have seen dramatic advances driven by the neuralization of structure prediction pipelines, with neural networks replacing computations originally based on energy models and sampling procedures. Neural networks are now being used for the extraction of physical contacts from the evolutionary record, the distillation of patterns from known structures, the incorporation of templates from homologs in the Protein Data Bank, and the refinement of coarsely predicted structures into finely resolved ones.
Computational de novo protein design is increasingly used in biomedicine and biological engineering to solve a variety of problems [23]. Over several decades, advances in design principles and methods have driven success in a growing range of applications. That review surveys recent breakthroughs in major aspects of de novo protein design and how principles of protein architecture and interactions derived from the large collection of structures in the Protein Data Bank have shaped these breakthroughs. De novo generation of designable backbone architectures, sequence optimization, scoring function modeling, and the design of function are all discussed. The advances highlight not only design goals that can be achieved now but also the challenges and opportunities that the field faces in the future.
Because of the inaccuracy of energy force fields, the mathematically optimal solution in computational protein folding does not always correspond to the native structure [24]. More diverse suboptimal solutions must therefore be explored to find states close to the native one. MMpred, a distance-assisted multimodal optimization sampling algorithm, was developed to improve the conformation sampling efficiency and model accuracy of de novo protein structure prediction. Integrating local abstract convex underestimation into modal exploration is a promising way to increase the algorithm's efficiency.
De novo protein structure prediction is a difficult problem that requires an accurate energy function and a fast conformation sampling method [25]. That paper proposes CoDiFold, a de novo structure prediction method. To improve the accuracy of the energy function, contact and distance profiles are organically combined with the Rosetta low-resolution energy function in CoDiFold. As a consequence, the correlation between energy and root mean square deviation (RMSD) becomes stronger. To improve PSP accuracy, CoDiFold uses an optimized energy function that incorporates contact and distance profiles together with a multi-mutation strategy.
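
The energy-RMSD correlation mentioned above can be sketched with two small helpers: one that computes the RMSD between a model and the native structure, and one that computes the Pearson correlation between the energies and RMSDs of a set of decoys. The code below is a simplified Python illustration that assumes the structures are already superimposed (no optimal alignment step is performed) and that the energy values are supplied by the caller; it does not reproduce the CoDiFold energy function.

```python
import math


def rmsd(model, native):
    """Root mean square deviation between two equally long coordinate lists.

    Assumes the two structures are already superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(squared / len(model))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A well-behaved energy function should give a clearly positive correlation over a decoy set, meaning that lower energy tends to accompany lower RMSD from the native structure.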
Targeting the interaction between the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein and the human angiotensin-converting enzyme 2 (ACE2) receptor is a potential therapeutic strategy [26]. Two de novo design approaches were used to create inhibitors: computer-generated scaffolds were either built around an ACE2 helix that interacts with the spike receptor-binding domain (RBD) or docked against the RBD to identify new binding modes, and their amino acid sequences were designed to optimize target binding, folding, and stability. It is impossible to plan for unknown future pandemics, but such a design capability may be a vital part of a broader response strategy.
De novo protein structure prediction can be viewed as a conformational space optimization problem guided by an energy function [27]. However, constructing an accurate energy function whose low-energy conformations lie close to native structures is difficult. That study describes a two-stage distance feature-based optimization approach (TDFO) for de novo protein structure prediction within the framework of evolutionary algorithms. First, a correlation model-based selection strategy is designed to reduce the influence of energy function inaccuracy on the results. In addition, supplementary optimization strategies, including multi-objective and multimodal optimization techniques, have been used to improve prediction performance further.
Protein structure prediction can be regarded as a multimodal optimization problem of sampling the protein conformational space over a highly complex energy landscape [28]. MDE, a conformational space sampling method based on multi-differential evolution, has been proposed to address this problem. Finding a way to reconcile low energy with high plausibility of conformations is a promising direction for future research.
EvoDesign is a computational method for rapidly generating novel protein sequences that are compatible with specified protein structures [29]. It can therefore be used to improve protein stability, reshape the protein surface to reduce undesired protein-protein interactions, and strengthen protein-protein binding. Because they draw on evolutionary information, EvoDesign sequences have native-like folding and binding characteristics not found in earlier physics-based design approaches. EvoDesign can be used to redesign proteins, with the emphasis so far on computation; experimental studies will be needed to verify the designs.

Table 2: De Novo Protein Structure Prediction

| Reference No | Technique | Dataset | Advantages | Disadvantages / Future work |
|---|---|---|---|---|
| [20] | IPTDFold | CASP-13, CASP-14 | Improves structure prediction accuracy significantly. | A feedback loop would help improve protein structure prediction accuracy. |
| [21] | MultiCFold | CASP-13 | Accuracy is high. | Possible risk due to high voltage |
| [22] | Machine learning | CASP-13 | Increases the accuracy. | (i) MSA-free protein structure prediction, which would be beneficial for de novo designed proteins, rapidly evolving viral proteins, and evolutionarily young mammalian proteins; (ii) ultra-high accuracy prediction (0.5 Å), which is useful for drug discovery and enzymology; and (iii) predictions sensitive to small sequence variations that result in large structural changes, which is useful for understanding the molecular basis of heritable diseases. |
| [23] | De novo scoring | PDB | High accuracy. | De novo protein design remains an unsolved problem. Since proteins have such a wide range of forms and activities, the challenges of designing them vary greatly. |
| [24] | MMpred | CASP-13 | High prediction accuracy. | In the future, integrating local abstract convex underestimation into modal exploration is a promising way to increase the algorithm's efficiency. |
| [25] | CoDiFold | CASP-13 | Multi-mutation technique to increase PSP accuracy. | Advances in energy functions and conformational search are still required. |
| [26] | Human angiotensin-converting enzyme 2 (ACE2) receptor | Protein Data Bank (PDB) | Improves target binding and folding. | It is impossible to plan for unknown future pandemics, but such a capacity may be a vital part of a broader response strategy. |
| [27] | Two-stage distance feature-dependent optimization technique (TDFO) | CASP-11 | High-energy conformations | In the future, additional optimization approaches, including multi-objective and multimodal optimization approaches. |
| [28] | MDE | CASP-11 | Energy is reduced. | One promising path is to develop a methodology for reconciling low energy with high plausibility of conformations. |
| [29] | EvoDesign | CASP-11 | Improves protein stability, reshapes the protein surface to reduce unwanted protein-protein interactions, and improves protein-protein binding. | Computational as well as experimental strategies should be used to verify the design. |

There are multiple areas in the domain of computational de novo protein design that
require notable advancement. To make massive sequence optimization problems
computationally tractable, scoring functions use numerous approximations, including
implicit solvation models and pairwise decomposable energy terms. Improving scoring
accuracy and speed is therefore a target. Although many de novo protein functions have been
established, many of them cannot be generated routinely. Recent advances in the design of
basic functions such as ligand binding, protein-protein interaction, membrane tracking, and
induced switching allow researchers to foresee the creation of far more complicated,
composite functions, including artificial cellular signaling systems, motors, and
controllable molecular machinery built from de novo designed elementary components.
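
The pairwise decomposable approximation mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed tools: the function names, the two-letter alphabet, and every energy value are hypothetical and chosen only for illustration. The total score is a sum of one-body terms per position plus two-body terms per pair of positions, which is what allows a search procedure to re-evaluate only the terms touched by a mutation.

```python
from itertools import combinations, product

# Hypothetical one-body energies, keyed by (position, amino acid).
SELF_ENERGY = {
    (0, "A"): -1.0, (0, "L"): -0.5,
    (1, "A"): 0.2, (1, "L"): -0.8,
    (2, "A"): -0.3, (2, "L"): -0.1,
}

# Hypothetical two-body energies, keyed by (i, aa_i, j, aa_j) with i < j.
# Pairs that are not listed contribute zero.
PAIR_ENERGY = {
    (0, "L", 1, "L"): -1.5,
    (0, "A", 2, "L"): 0.4,
}


def total_energy(sequence, self_energy, pair_energy):
    """Pairwise decomposable score: one-body terms plus two-body terms."""
    energy = sum(self_energy[(pos, aa)] for pos, aa in enumerate(sequence))
    for i, j in combinations(range(len(sequence)), 2):
        energy += pair_energy.get((i, sequence[i], j, sequence[j]), 0.0)
    return energy


def best_sequence(alphabet, length, self_energy, pair_energy):
    """Brute-force search; only feasible for toy problems like this one."""
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    return min(candidates, key=lambda s: total_energy(s, self_energy, pair_energy))
```

For example, `best_sequence("AL", 3, SELF_ENERGY, PAIR_ENERGY)` enumerates all eight candidate sequences. Real design problems have far too many candidates for enumeration, which is why the approximations described above matter.
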

2.3 Comparative Modeling in Protein Structure Prediction


Comparative modeling predicts the structure of a protein by comparing its amino acid
sequence to sequences whose native three-dimensional structure is already known. In
comparative modeling, structural similarity is inferred from sequence similarity. The degree
of sequence similarity, however, has a critical influence on the prediction accuracy of
comparative modeling. A recent analysis of advances in homology modeling and threading
observed that the gap between average and best homology predictions for easy targets is only
marginal. Because homology modeling techniques do not differ significantly in template
selection and alignment, this suggests that refinement steps are not as vital for simple
targets.
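
Because prediction accuracy depends so strongly on sequence similarity, a first practical step in comparative modeling is to measure the identity between the target and each candidate template. The following is a minimal Python sketch under stated assumptions: the two sequences are already aligned (gaps written as `-`), identity is computed over aligned, non-gap columns only (other denominators are also in use), and the function name is hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this definition
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

A template ranking could then simply sort candidate alignments by this value, although real pipelines also weigh coverage and alignment quality.
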
A subtilisin-like alkaline serine protease (ASP) from Bacillus halodurans C-125 has been
heterologously expressed, enzymatically characterized, and structurally modeled by homology
[30]. The coding gene was successfully obtained by polymerase chain reaction (PCR), cloned
into the pMA0911 shuttle vector, and expressed extracellularly under the control of the
strong HpaII promoter. In a B. subtilis WB800 strain lacking eight extracellular proteases,
the ASP enzyme was efficiently expressed and secreted into the culture medium.
Tuberculosis (TB), which is caused by Mycobacterium tuberculosis and has killed numerous
people, is a severe worldwide health threat [31]. This creates an urgent need to find and
produce more potent medications to combat the rapidly growing number of extensively
drug-resistant (XDR) and multi-drug-resistant (MDR) M. tuberculosis strains. A homology
model of the QcrB protein in M. tuberculosis was built using the SWISS-MODEL online
workspace, with the X-ray structure of QcrB in M. smegmatis as the template. The study's
findings set the stage for structure-based drug design and identified a few potential hits
for future anti-tubercular drug development. The study recommended that the predicted
protein target be validated further through experimental investigation, characterization,
and applications.
Deep neural networks (DNNs) for spatial restraint prediction and end-to-end model training
have vastly improved protein structure prediction accuracy, effectively solving the problem
at the fold level for single-domain proteins [32]. Substantial progress has also been
achieved in the protein design field, with notable examples demonstrating that information
stored in neural network models can be leveraged to advance the design of functional
proteins. Consequently, incorporating deep learning techniques into diverse aspects of
protein folding and design strategies is a highly promising area that is expected to have a
revolutionary influence on both fields.
The number of known protein sequences has exploded as a result of genome sequencing
initiatives [33]. Only about one-hundredth of these sequences have been characterized at
atomic resolution using experimental structure determination methods. This
sequence-structure gap can be bridged using computational protein structure modeling
approaches. The cited chapter demonstrates the use of MODELLER to build a comparative model
for a protein whose structure is unknown. By automating a similar protocol, models of useful
accuracy have been produced for domains in more than half of all known protein sequences.
A model has been developed for protein secondary structure prediction that combines a
1D-ConvNet with a modified recurrent neural network trained with a modified continuous coin
betting (COCOB) optimizer [34]. The modified COCOB approach estimates the likelihood of a
head or a tail by tossing the coin twice to determine the outcome of a coin flip. It
demonstrated a major improvement in gradient estimation. To our knowledge, it is the first
strategy in deep learning that uses modified COCOB optimization to predict protein secondary
structure.
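
To give a feel for the coin-betting idea behind COCOB, the sketch below implements the plain Krichevsky-Trofimov (KT) betting scheme for a one-dimensional convex problem. It is an illustration only and is not the modified optimizer of [34], whose details are not given here. The function names, the toy loss |x - 3|, and the step count are assumptions made for this example; the subgradients are assumed to lie in [-1, 1].

```python
def kt_coin_betting(subgradient, steps, initial_wealth=1.0):
    """Learning-rate-free 1-D minimization via Krichevsky-Trofimov betting.

    Each negative subgradient is treated as the outcome of a coin flip, and the
    iterate is the amount of current wealth bet on the next outcome.
    Returns the average iterate, which is the usual output for convex problems.
    """
    wealth = initial_wealth
    coin_sum = 0.0      # sum of past coin outcomes
    iterate_sum = 0.0
    for t in range(1, steps + 1):
        bet_fraction = coin_sum / t     # KT estimator, always in (-1, 1)
        x = bet_fraction * wealth       # current iterate (signed bet)
        coin = -subgradient(x)          # coin outcome in [-1, 1]
        wealth += coin * x              # win or lose the amount that was bet
        coin_sum += coin
        iterate_sum += x
    return iterate_sum / steps


def abs_loss_subgradient(x, target=3.0):
    """Subgradient of the toy loss |x - target| (an assumption for this demo)."""
    return (x > target) - (x < target)
```

Calling `kt_coin_betting(abs_loss_subgradient, 5000)` returns an average iterate near the minimizer 3.0 without any learning rate being tuned, which is the property that makes coin-betting optimizers attractive for training networks.
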
For over two decades, computational methods have been used to predict protein structures,
clearing the path for more focused studies and for advances in comparative modeling, ab
initio modeling, and structure refinement protocols [35]. Template-based modeling techniques
have had considerable success, but template-free modeling strategies still lag behind,
especially for larger proteins (> 150 amino acids). The use of deep learning to derive the
protein backbone structure from the amino acid sequence is reflected in the latest advances
in ab initio protein structure prediction. The cited paper outlines the main template-free
protein structure modeling methodologies and explores a few tools created for each strategy.
In structural bioinformatics, protein structure prediction is a major concern [36]. The
cited review comprehensively explores the problem of protein structure prediction and the
foundations of deep learning (such as CNNs, RNNs, and basic feed-forward neural networks),
before tracing the development of prediction methods for one-dimensional and
two-dimensional protein structure labels, from simple statistical analyses to
computation-intensive techniques. Protein misfolding prediction is a new challenge for
current prediction approaches, with ML strategies progressing at a diminishing rate.
A database of protein Pentafragments has been created based on ideas about the protein's
molecular vector machine, together with methods for predicting the secondary structure of a
protein from its primary structure and for formulating a primary structure that adopts a
given secondary structure [37]. Using the Pentafragments database and these methods, a full
software suite has been created. For the proteins used to construct the Pentafragments
database, very high accuracy (almost 100 percent) in predicting secondary structure has been
demonstrated, along with promising prospects for its use in creating secondary protein
structures.

Table 3: Comparative modeling Protein Structure Prediction

| Reference No | Technique | Dataset | Advantages | Disadvantages / Future work |
|---|---|---|---|---|
| [30] | Modeling the structure of a subtilisin-like alkaline serine protease (ASP) from Bacillus halodurans C-125 | Protein Data Bank | ASP is a strong serine protease that can be easily incorporated into a variety of industrial applications; the aim is to build a low-cost, reliable, and efficient system. | There are still a few issues to work out in the production mechanisms for high-demand proteases with industrial uses, such as the need to optimize regulatory elements. |
| [31] | QcrB protein homology model in M. tuberculosis | Protein Data Bank | Increases the efficiency | Predicted protein targets should be validated further through experimental investigation, characterization, and applications. |
| [32] | Deep learning techniques | Genome database | Less time consuming | Combining protein folding and design approaches is an interesting future area that is expected to have a revolutionary influence on both fields. |
| [33] | MODELLER | Protein Data Bank | Increases the accuracy | Computational protein structure modeling approaches |
| [34] | 1D-ConvNet model and 1D-ConvNet with a bidirectional long short-term memory cell | CB513 and CullPDB datasets | High efficiency | High cost |
| [35] | Template-based prediction | UniProtKB/TrEMBL database | Quicker than the other methods | Redundancy of perforation |
| [36] | Deep learning methods | Protein Data Bank | Less time consuming; increases the accuracy | Computationally intensive, highly sophisticated |
| [37] | Secondary protein structure | Improved database of Pentafragments | Great accuracy | Promising prospects for its use in creating secondary protein structures |

Applying the described prediction correction approach to groups of proteins with similar
structures but derived from different species is convenient and relevant (as in cases with
myoglobin and other heme-containing proteins). A global database which can be utilized to
accurately anticipate the sequence of any protein would be ideal. A significant increase in the
count of Pentafragments in the database, on the other hand, considerably increases the
number of different alternatives for secondary structure prediction. This, in turn, causes a
significant reduction in software performance and a decrease in prediction quality. Homology
models give sufficient information on the spatial arrangement of key residues in a protein,
and they are frequently employed in drug development to screen enormous libraries using
molecular docking techniques. There is still more work to be done in this field, but the results
appear to be quite promising.

2.4 Multi-Objective Differential Function


Predicting protein structure from sequence is a central challenge of biochemistry [38].
Although co-evolutionary methods show promise, an explicit sequence-to-structure map has yet
to be identified. Advances in deep learning that replace complicated, human-designed
pipelines with differentiable models optimized end to end suggest that structure prediction
might benefit from a comparable reformulation. The cited paper presents an end-to-end
differentiable model for learning protein structure. The model couples local and global
protein structure using geometric units that optimize global geometry without violating
local covalent chemistry. The authors tested the method by asking it to predict novel folds
without using co-evolutionary information and known folds without using structural
templates.
Existing algorithms and approaches have two major flaws that make them unsuitable for
protein-peptide docking problems [39]. First, existing methods for weighing the non-bonded
forces between a protein and a peptide need to be changed and redesigned. Second, they do
not use state-of-the-art search techniques to determine a peptide's 3D pose relative to a
protein. To overcome these limitations, the cited research proposes a multi-objective
algorithm that generates several alternative 3D peptide poses before improving them using
its operators. Multi-Objective Pareto Front (MOPF) optimization principles are then used to
further analyze the candidate solutions.
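
The Pareto front principle used to compare candidate poses can be sketched in a few lines of Python. This is a generic illustration rather than the MOPF algorithm of [39]: the two objectives (an interaction energy and a clash score, both minimized), the example values, and the function names are assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical peptide poses scored as (interaction energy, clash score).
POSES = [(-8.0, 3.0), (-7.0, 1.0), (-6.0, 2.0), (-9.0, 5.0), (-5.0, 0.5)]
```

Here `pareto_front(POSES)` drops only the pose `(-6.0, 2.0)`, because `(-7.0, 1.0)` is better on both objectives; the remaining poses represent different trade-offs, and none can be preferred without further criteria.
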
Understanding the biological activity of proteins requires structural analysis [40].
Determining the structural properties of such molecules experimentally, however, is
time-consuming and costly. The cited study takes a computational approach by analyzing and
assessing three multi-objective algorithms: the Non-dominated Sorting Genetic Algorithm in
its second version (NSGA-II), Generalized Differential Evolution in its third version
(GDE3), and Differential Evolution Multi-Objective (DEMO). As future work, the research
group may investigate how to determine β-sheets effectively, which has yet to be undertaken.
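
Two of the three algorithms compared in [40] build on differential evolution. The sketch below shows the single-objective building block (the common DE/rand/1/bin variant) on a toy sphere function. It is not GDE3 or DEMO themselves, which add Pareto-based selection on top of this step. The parameter values, the in-place replacement of parents, and the function names are assumptions made for illustration.

```python
import random


def sphere(x):
    """Toy objective standing in for an energy function."""
    return sum(v * v for v in x)


def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           f=0.6, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin with greedy one-to-one selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    lo, hi = bounds[d]
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    trial.append(min(max(value, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][d])
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # keep the better of parent and trial
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]
```

In a multi-objective variant, the greedy comparison in the last step would be replaced by a dominance test between the parent and the trial vector.
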
Small-molecule binding modes and relative affinities can be predicted using protein
structure-based methods [41]. Using a single rigid protein conformation, high-throughput
docking of approximately 10^6 small compounds, followed by scoring based on an implicit
solvent force field, can be used to identify micromolar binders. Molecular dynamics with
explicit solvent is a low-throughput technique for studying flexible binding sites and
determining binding pathways, kinetics, and thermodynamics.
The PSP problem is difficult to solve because of the inaccuracy of existing protein energy
functions and the huge conformational search space [42]. In the cited study, the PSP problem
is formulated as a multi-objective optimization problem. The three-objective energy function
is composed of a physics-based energy function and an expert energy function. The
conformational space search is carried out using an enhanced multi-objective particle swarm
optimization in conjunction with two archives. The energy function and the search engine
could both be improved further. Furthermore, the proposed system uses only very basic prior
knowledge of protein structure, which is thought to limit the performance of the approach.
Tackling this complicated problem requires continued effort in various areas.
The suggested technique divides the initial large-scale swarm into numerous smaller
subpopulations that are evolved simultaneously on separate computing units, thereby
improving execution efficiency and population diversity [43]. A disorganized initialization
tactic is used in the evolutionary phase to improve the quality of the initial population,
while a viable spatial identification technique and a modified dominance strategy were
developed to improve the feasibility of the solutions and the convergence rate of
individuals.
The cited study introduces a multi-objective evolutionary strategy [44] for tackling the
difficult problem of predicting a protein's three-dimensional (3-D) structure from its
one-dimensional sequence. It decomposes the force-field energy function of the protein into
bonded and non-bonded energies as the first and second objectives, respectively. To account
for the influence of the solvent, the solvent-accessible surface area is used as the third
objective. The study plans to test this approach on more proteins in the future to see how
well it works. Theoretical investigations of the algorithm's selection and mutation
operators should also be carried out, since these operators contribute to the method's
overall improvement.

Table 4: Multi Objective Differential Function

| Reference No | Technique | Dataset | Advantages | Disadvantages / Future work |
|---|---|---|---|---|
| [38] | End-to-end differentiable model | ProteinNet dataset | More data-efficient model structures | A restriction is the use of single-scale atomic or residue-level representations. |
| [39] | Multi-Objective Pareto Front (MOPF) | LEADS-PEP dataset | Reduces the error value | Time-consuming process |
| [40] | Differential Evolution Multi-Objective | Protein Data Bank | Increases the capacity | Determining β-sheets, which has yet to be investigated |
| [41] | Ligand optimization | PDB | High throughput of docking | Insufficient sampling |
| [42] | The multi-objective optimization problem (MOOP) | Protein Data Bank | Increases the accuracy | Test this approach on more proteins in the future to see how well it works |
| [43] | Parallel multi-objective genetic algorithm (PMOGA) | Synthetic dataset | Boosts the population's performance | In addition, three effective measures have been implemented to increase population diversity. |
| [44] | Enhanced multi-objective particle swarm optimization | Protein Data Bank | Potential energy function | The potential energy function, as well as the search approach, will be enhanced in future research. |

Since its inception, the PSP problem has mostly been handled as a single-objective
optimization problem. Formulating PSP as a multi-objective optimization problem has recently
gained popularity. Protein structure determination is a critical step in understanding
biological function. However, determining the structural properties of such molecules is a
costly and time-consuming operation. Computational approaches, notwithstanding their
complexity, may be an attractive strategy for reducing these costs. In this vein, the study
discussed above compares and evaluates three multi-objective algorithms: the Non-dominated
Sorting Genetic Algorithm in its second version, Generalized Differential Evolution in its
third version, and Differential Evolution Multi-Objective. These approaches should be
evaluated on more proteins, and theoretical investigations of the algorithms' selection and
mutation operators should be carried out, since these operators contribute to the methods'
overall improvement.

2.5 Protein Structure Prediction Using Machine Learning and Deep Learning
Protein structure prediction is the task of predicting a protein's 3-D shape from its amino
acid sequence. This is a vital challenge since the structure of a protein largely determines
its function. Protein structures, however, are notoriously hard to determine experimentally.
Using genetic data has lately led to significant improvement. It is possible to infer which
amino acid residues are in contact by analyzing covariation in homologous sequences, which
assists protein structure prediction. It has been shown that a neural network can be trained
to predict distances between pairs of residues, which convey more structural information
than contact predictions.
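
The idea of inferring contacts from covariation in homologous sequences can be illustrated with mutual information between columns of a multiple sequence alignment (MSA). The Python sketch below is a deliberately simple stand-in: real pipelines use more elaborate statistics and corrections, and the toy alignment and function names here are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy alignment: columns 0 and 1 mutate together,
# column 2 is fully conserved, column 3 varies independently.
MSA = ["AKLE", "AKLD", "GRLE", "GRLD", "AKLE", "GRLD"]


def column(msa, index):
    """Return the residues found at one alignment position."""
    return [sequence[index] for sequence in msa]


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi
```

Column pairs with high mutual information, such as columns 0 and 1 in the toy alignment, are candidates for residues that are in contact, whereas a conserved column carries no covariation signal at all.
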

Figure 4: Deep Learning-based protein structure prediction [45]

Metabolic engineering requires comprehensive insight into cellular metabolism, including
metabolic reactions and enzymes, to create industrial strains capable of overproducing
bioproducts [46]. Nevertheless, the metabolic pathways and enzymes involved in many of the
compounds of interest are still unknown, which poses a substantial barrier to their
biological synthesis. This problem can be partially solved by using enzyme and pathway
design to create new biosynthetic pathways. As biological big data grows, data-driven
methods using artificial intelligence (AI) tools enable quite sophisticated protein and
pathway models. Current work on AI-assisted protein design and engineering has centered on
directed evolution, which uses AI to quickly create mutant libraries.
Machine learning (ML) strategies are often used in bioinformatics, computational biology,
and systems biology. The advancement of ML strategies for protein structure prediction has
vital implications for systems biology and bioinformatics [47]. Although protein structure
prediction is a difficult problem, it is often partitioned and addressed at four distinct
levels: 1-D prediction of structural properties along the primary amino acid sequence; 2-D
prediction of spatial relationships among amino acids; 3-D prediction of a protein's
tertiary structure; and 4-D prediction of a multi-protein complex's quaternary structure.
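
The 2-D level in this scheme is usually expressed as a residue-residue distance matrix or a binary contact map. The following Python sketch derives both from 3-D coordinates. The 8 Å cutoff and the minimum sequence separation of three residues are common conventions but are assumptions here, as are the toy coordinates and function names.

```python
import math

# Hypothetical C-alpha-like coordinates (in angstroms) for a five-residue hairpin.
COORDS = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]


def distance_matrix(coords):
    """Pairwise Euclidean distances between residues."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Binary map: 1 if two residues are close in space but apart in sequence."""
    distances = distance_matrix(coords)
    n = len(coords)
    return [
        [
            1 if abs(i - j) >= min_separation and distances[i][j] <= cutoff else 0
            for j in range(n)
        ]
        for i in range(n)
    ]
```

A 2-D predictor is trained to output such a matrix directly from sequence features, and a 3-D method can then search for coordinates that are consistent with it.
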
Protein structure prediction from sequence has been extensively researched for decades [48]
because of the problem's evident significance and its well-defined physical and
computational basis. While progress has ebbed and flowed in the past, the "neuralization" of
structure prediction pipelines has brought substantial gains in the last two years, with
computations formerly reliant on energy models and sampling procedures being supplanted by
neural networks. A Deep Neural Network (DNN) methodology has been established for
identifying protein-ligand interactions with a drug of interest [48]. The DNN detects
protein-ligand interactions for a given drug and evaluates which drug produces the most
effective virus-fighting interaction.
Using a limited set of genome sequences from Indian patients submitted to the GISAID
database, the DNN successfully reveals protein-ligand interactions for a specific drug [49].
The research might be extended in the coming years to include a combination of reinforcement
learning and deep learning methodologies to assess more extensively the peptide structure
resulting from protein-ligand interactions.
Despite their prominence, clustering-based strategies are unable to find good, near-native
decoys in datasets in which near-native decoys are severely under-sampled by protein
structure prediction methods [50]. However, extracting basins from the energy landscape with
such approaches takes a long time. The cited article proposes a new decoy selection
technique based on non-negative matrix factorization (NMF). It shows that the proposed
strategy outperforms the method based on energy landscapes. The proposed methodology
successfully recognizes near-native decoys for both simple and challenging protein targets,
addressing both the time cost and the difficulty of finding good decoys in such sparse
datasets.
The massive collection of 3D structure information in the PDB has fueled substantial
advances in our understanding of protein design, culminating in the latest advances in
protein structure prediction assisted by AI technology and deep or machine learning
strategies [51]. As the amount of public-domain protein structure information increases,
accuracy improves and the influence of AI breakthroughs keeps growing. The most pressing
challenge facing structural biology is the proper management and long-term preservation of
data produced by structural studies of ever-larger systems, ranging from molecular machines
to organelles to whole cells, using integrative or hybrid methods.
Historically, the structures of proteins and their complexes were determined using
individual or combined experimental techniques such as X-ray crystallography, NMR, or
cryo-electron microscopy [52]. Meanwhile, computational approaches for predicting protein
structure have been steadily improving, culminating in the breakthrough achievement of
AlphaFold2, whose monomeric protein models are often as accurate as experimental structures.
As a consequence, deep learning strategies might be trained to replace current X-ray
crystallography and EM model-building pipelines in a single step.
Natural language processing (NLP) is a branch of computer science concerned with the
automatic analysis of text and language [53]. Proteins and language have conceptual
parallels and differences, and a variety of protein-related tasks can be automated. A review
of older ideas such as bag-of-words, k-mers/n-grams, and text search, as well as newer
approaches such as word embedding, contextualized embedding, deep learning, and neural
language models, provides techniques for representing protein information as text and
analyzing it using NLP methods.
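
The simplest of these text-style representations, k-mers combined with a bag-of-words count, can be sketched in Python as follows. The choice of k = 3, the example sequence, and the function names are assumptions made for illustration.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping 'words' of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, k=3):
    """Bag-of-words representation: how often each k-mer occurs."""
    return Counter(kmers(sequence, k))
```

For instance, `kmers("MKVLA")` yields the three words `MKV`, `KVL`, and `VLA`. Such counts can feed a classical classifier directly, while the newer embedding approaches replace each word with a learned vector.
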

Table 5: Protein Structure Prediction Using Machine Learning and Deep Learning

| Reference No | Technique | Dataset | Advantages | Disadvantages / Future work |
|---|---|---|---|---|
| [46] | Artificial Intelligence (AI) | LASER database | AI is enabling more sophisticated protein and pathway design. | AI-directed enzyme and pathway design is expected to be used increasingly to broaden the range of synthetic pathways and enzymes for the manufacture of numerous compounds. |
| [47] | Machine learning | Protein Data Bank | Easier to implement | Accuracy and speed considerations are required |
| [48] | AlphaFold2 | CASP-13 | Reduces the computation in the network | Ultra-high accuracy prediction is required |
| [49] | Deep Neural Network (DNN) | GISAID database | Most efficient | A combination of reinforcement learning and deep learning algorithms is to be introduced |
| [50] | NMF-Rank and NMF-MAD | CASP dataset | Successfully recognizes near-native decoys for both simple and challenging protein targets | Time and cost issue |
| [51] | AI and deep or machine learning approaches | Protein Data Bank | Accuracy will improve | The most pressing issue confronting the structural biology field is proper data management |
| [52] | AlphaFold2 | CASP-13 and CASP-14 | Increases the accuracy | Deep learning algorithms might be trained to replace present X-ray crystallography and EM model development |
| [53] | Natural Language Processing (NLP) | Mass-spectrometry proteomic databases | Less memory and computational time | Problematic due to sparser and more biased data |

In the coming years, machine learning and deep learning methodologies will continue to play
a role in protein structure prediction and in many of its facets. The rapid growth of
available training datasets, as well as the disparity between the number of sequences and
the number of solved structures, remain strong motivators for further progress. Moreover, ML
algorithms are frequently faster than other strategies. Much of the time that machine
learning techniques require is devoted to learning, which can be done offline. In
"production" mode, a trained feedforward neural network, for example, can produce
predictions rapidly. As genomic, proteomic, and protein engineering efforts continue to pose
substantial challenges, both accuracy and speed will probably become more critical.

3. SUMMARY AND DISCUSSIONS


Protein structure prediction is one of the most vital and demanding problems in
computational structural biology. As stated previously, various techniques to address it
have been developed. In this study, we review protein structure prediction strategies based
on evolutionary algorithms. We examined ab initio and de novo protein prediction, single-
and multi-objective optimization, differential evolution, and other evolutionary strategies.
We present an overview of several studies, highlighting specific aspects of problem modeling
and the techniques employed.
● Proteins are involved in a broad range of biological processes. Protein structure
prediction is a tough problem in bioinformatics that has been classified as NP-hard
(non-deterministic polynomial-time hard). In the ab initio approach, it can be stated as a
minimization problem: find the global minimum of a function that quantifies a protein
structure's free energy.
● There are numerous areas in the field of computational de novo protein design that
require substantial advancement. To make the enormous sequence optimization problem
computationally tractable, scoring functions make several approximations, including
implicit solvation models and pairwise decomposable energy terms. Improving scoring
accuracy and speed will be a future priority.
● In the coming years, ML algorithms will continue to play a role in protein structure
prediction and its many facets. The growing size of available training datasets, as well as
the imbalance between the number of sequences and the number of solved structures, remain
critical motivators for future progress. Moreover, machine learning methods are frequently
faster than other strategies. Most of the time ML algorithms spend is on learning, which can
be done offline.
However, high-level formulation requires considering the energetics among all pairs of
residues, and the number of pairs grows quickly as chain length increases, demanding
more computational effort.
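
The growth in the number of pairs can be made concrete with a short calculation. The sketch assumes that the pairs in question are unordered residue pairs, so a chain of N residues has N(N - 1)/2 of them; the function name is hypothetical.

```python
def residue_pair_count(chain_length):
    """Number of unordered residue pairs in a chain: N * (N - 1) / 2."""
    return chain_length * (chain_length - 1) // 2
```

A chain of 150 residues has 11,175 pairs, while a chain of 300 residues has 44,850, so doubling the chain length roughly quadruples the number of pairwise terms that must be evaluated.
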

3.1 Future Work


Further development of ML algorithms for in-cell structural biology would help by enabling
the structures of protein complexes to be predicted with high accuracy; these could then be
used as templates to mine tomography data. Experiments on such small complexes, however,
have highlighted their transient nature and plasticity, demonstrated that their integrity
often depends on nucleic acids and small-molecule cofactors, and revealed that
protein-protein interfaces are frequently quite small. As a result, reliable protein complex
prediction will probably remain a difficult challenge for the foreseeable future, relying on
improved platforms to incorporate data from multiple sources.
● Recent progress in designing basic functions such as ligand binding, protein-protein
interaction, membrane tracking, and induced switching has opened the path toward much more
complicated, composite functions such as artificial cellular signaling systems, motors, and
controllable molecular machines built from de novo designed elementary components.
● Finding a method to reconcile low energy with high rationality of conformations is a
promising direction for future research.
● Improving the potential energy function and the search approach is another goal for
future research.
● In the future, a combination of reinforcement learning and deep learning algorithms could
be used to examine thoroughly the peptide structure resulting from protein-ligand
interactions.
The development of AlphaFold2, which achieves near-angstrom accuracy for single-domain apo
structure prediction when sufficiently deep MSAs are available, is certainly a breakthrough
in the multi-decade history of PSP. Unfortunately, the latest PSP methods remain unverified
and untested for several important use cases. These include: (i) MSA-free structure
prediction, which is useful for de novo designed proteins, rapidly emerging viral proteins,
and evolutionarily young mammalian proteins; (ii) ultra-high-accuracy prediction,
which would benefit drug discovery and enzymology; and (iii) prediction that is
sensitive to minor sequence changes that lead to large structural changes, which would
help explain the molecular basis of genetic diseases.
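Because the accuracy discussed above depends on MSA depth, it can help to see how depth is commonly quantified. The following is a minimal sketch, assuming a pre-aligned MSA given as equal-length strings with "-" for gaps; it computes an effective number of sequences (often written Neff) by down-weighting near-duplicate sequences. The function names and the 0.8 identity threshold are assumptions made for illustration, not values taken from AlphaFold2 or from this review.

```python
from typing import Sequence


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns with identical residues, ignoring gapped columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def effective_depth(msa: Sequence[str], threshold: float = 0.8) -> float:
    """Effective number of sequences in an MSA.

    Each sequence contributes 1 / (number of sequences, itself included,
    whose identity to it is at least `threshold`).
    """
    total = 0.0
    for s in msa:
        neighbours = sum(sequence_identity(s, t) >= threshold for t in msa)
        total += 1.0 / max(neighbours, 1)  # guard against all-gap rows
    return total


if __name__ == "__main__":
    toy_msa = ["ACDE", "ACDE", "ACDF", "WYWY"]  # hypothetical toy alignment
    print(effective_depth(toy_msa))
```

A single-sequence input gives an effective depth of 1, which corresponds to the MSA-free setting listed under use case (i).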

4. CONCLUSION
Developing methods that broaden the range of tunable backbones will significantly increase
the number of functions that can be achieved. Although numerous de novo protein functions
have been created, a significant proportion of functions still cannot be designed on a
routine basis. Methodological advances are required to design the complex geometries of
protein functional sites with increasing accuracy, so that subsequent experimental
optimization can be minimized. Such advances are needed for finely tuned and regulated
conformational changes, as well as for highly polar functional sites. Applying design
protocols to varied situations and systematically evaluating the methods may help to
detect and overcome their limits.
In this relatively new field, emerging artificial intelligence technologies present both
opportunities and challenges. AI algorithms can not only synthesize existing information
into statistical models that generate new proteins, but can also incorporate experimental
data iteratively to drive protein design. The ultimate goal of computational protein design
is to give a novel protein specific functions or properties in addition to generating the
required structure. In the latter case, multi-faceted computational protein design has
often been used to produce proteins that fold faster than the original sequences, to
enhance protein stability, to confer new metal-binding centers on proteins that initially
lacked them, and to predict sequence mutations that constrain proteins in specific
conformations.
In general, template-based prediction is faster than experimental methods, at least for
determining a protein's preliminary spatial configuration. One significant drawback of such
strategies is that they depend on building models from existing structures, which means
that no novel folds or families can be discovered. Furthermore, when sequence or structural
similarity to known templates is low, such strategies fail to produce a reliable structure
for the target sequence.
REFERENCES
[1] Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, et al., Highly accurate
protein structure prediction with AlphaFold. Nature 2021;596(7873):583-589.
[2] Tunyasuvunakool, Kathryn, Jonas Adler, Zachary Wu, et al., Highly accurate protein
structure prediction for the human proteome. Nature 2021;596(7873):590-596.
[3] Du, Zongyang, Hong Su, Wenkai Wang, Lisha Ye, et al., The trRosetta server for fast and
accurate protein structure prediction. Nature Protocols 2021;16(12):5634-5651.
[4] Pereira, Joana, Adam J. Simpkin, Marcus D. Hartmann, et al., High-accuracy protein
structure prediction in CASP14. Proteins: Structure, Function, and
Bioinformatics 2021;89(12):1687-1699.
[5] Pearce, Robin and Yang Zhang. Deep learning techniques have significantly impacted
protein structure prediction and protein design. Current Opinion in Structural
Biology 2021;68:194-207.
[6] Biehn, Sarah E. and Steffen Lindert. Accurate protein structure prediction with hydroxyl
radical protein footprinting data. Nature Communications 2021;12(1):1-10.
[7] Chowdhury, Ratul, Nazim Bouatta, Surojit Biswas, Charlotte Rochereau, George M.
Church, Peter Karl Sorger and Mohammed N. AlQuraishi. Single-sequence protein
structure prediction using language models from deep learning. bioRxiv 2021.
[8] Flohil, J.A., Vriend, G. and Berendsen, H.J.C. Completion and refinement of 3-D homology
models with restricted molecular dynamics: application to targets 47, 58, and 111 in the
CASP modeling competition and posterior analysis. Proteins: Structure, Function, and
Bioinformatics 2002;48(4):593-604.
[9] Mortuza, S.M., Wei Zheng, Chengxin Zhang, Yang Li, Robin Pearce and Yang Zhang.
Improving fragment-based ab initio protein structure assembly using low-accuracy
contact-map predictions. Nature Communications 2021;12(1):1-12.
[10] Guo, Zhiye, Jie Hou and Jianlin Cheng. DNSS2: improved ab initio protein secondary
structure prediction using advanced deep learning architectures. Proteins: Structure,
Function, and Bioinformatics 2021;89(2):207-217.
[11] Hou, Jie, Tianqi Wu, Zhiye Guo, Farhan Quadir and Jianlin Cheng. The multicom protein
structure prediction server empowered by deep learning and contact distance prediction.
In Protein Structure Prediction. Humana, New York, NY 2020;13-26.
[12] Wang, Tong, Yanhua Qiao, Wenze Ding, Wenzhi Mao, Yaoqi Zhou and Haipeng Gong.
Improved fragment sampling for ab initio protein structure prediction using deep neural
networks. Nature Machine Intelligence 2019;1(8):347-355.
[13] Park, Taeyong, Minkyung Baek, Hasup Lee and Chaok Seok. GalaxyTongDock:
Symmetric and asymmetric ab initio protein-protein docking web server with improved
energy parameters. Journal of Computational Chemistry 2019;40(27):2413-2417.
[14] Adhikari, Badri and Jianlin Cheng. CONFOLD2: improved contact-driven ab initio
protein structure modeling. BMC Bioinformatics 2018;19(1):1-5.
[15] Zhang, Gui-Jun, Lai-Fa Ma, Xiao-Qi Wang and Xiao-Gen Zhou. Secondary structure
and contact guided differential evolution for protein structure prediction. IEEE/ACM
Transactions on Computational Biology and Bioinformatics 2018;17(3):1068-1081.
[16] Vreven, Thom, Devin K. Schweppe, Juan D. Chavez, Chad R. Weisbrod, Sayaka
Shibata, Chunxiang Zheng, James E. Bruce and Zhiping Weng. Integrating cross-linking
experiments with ab initio protein-protein docking. Journal of Molecular
Biology 2018;430(12):1814-1828.
[17] Moghram, Basem Ameen, Emad Nabil and Amr Badr. Ab-initio conformational epitope
structure prediction using genetic algorithm and SVM for vaccine design. Computer
Methods and Programs in Biomedicine 2018;153:161-170.
[18] Song, Shuangbao, Shangce Gao, Xingqian Chen, Dongbao Jia, Xiaoxiao Qian and Yuki
Todo. AIMOES: Archive information assisted multi-objective evolutionary strategy for
ab initio protein structure prediction. Knowledge-Based Systems 2018;146:58-72.
[19] Floudas, C.A., Fung, H.K., McAllister, S.R., Mönnigmann, M. and Rajgaria, R. Advances in
protein structure prediction and de novo protein design: A review. Chemical Engineering
Science 2006;61(3):966-988.
[20] Liu, Jun, Kai-Long Zhao, Guang-Xing He, Liu-Jing Wang, Xiao-Gen Zhou and Gui-Jun
Zhang. A de novo protein structure prediction by iterative partition sampling, topology
adjustment and residue-level distance deviation optimization.
Bioinformatics 2022;38(1):99-107.
[21] Hou, Minghua, Chunxiang Peng, Xiaogen Zhou, Biao Zhang and Guijun Zhang. Multi
contact-based folding method for de novo protein structure prediction. Briefings in
Bioinformatics 2022;23(1):bbab463.
[22] AlQuraishi, Mohammed. Machine learning in protein structure prediction. Current
Opinion in Chemical Biology 2021;65:1-8.
[23] Pan, Xingjie and Tanja Kortemme. Recent advances in de novo protein design:
Principles, methods, and applications. Journal of Biological Chemistry 2021;100558.
[24] Zhao, Kailong, Jun Liu, Xiaogen Zhou, Jianzhong Su, Yang Zhang and Guijun Zhang.
MMpred: a distance-assisted multimodal conformation sampling for de novo protein
structure prediction. bioRxiv 2021.
[25] Peng, Chunxiang, Xiaogen Zhou and Guijun Zhang. De novo Protein Structure
Prediction by Coupling Contact with Distance Profile. IEEE/ACM Transactions on
Computational Biology and Bioinformatics 2020.
[26] Cao, Longxing, Inna Goreshnik, Brian Coventry, James Brett Case, Lauren Miller, Lisa
Kozodoy, Rita E. Chen et al. De novo design of picomolar SARS-CoV-2 miniprotein
inhibitors. Science 2020;370(6515):426-43.
[27] Zhang, Gui-Jun, Xiao-Qi Wang, Lai-Fa Ma, Liu-Jing Wang, Jun Hu, and Xiao-Gen
Zhou. Two-stage distance feature-based optimization algorithm for de novo protein
structure prediction. IEEE/ACM Transactions on Computational Biology and
Bioinformatics 2019;17(6):2119-2130.
[28] Hao, Xiao-Hu, Gui-Jun Zhang and Xiao-Gen Zhou. Conformational space sampling
method using multi-subpopulation differential evolution for de novo protein structure
prediction. IEEE Transactions on NanoBioscience 2017;16(7):618-633.
[29] Brender, R. Jeffrey, David Shultis, Naureen Aslam Khattak and Yang Zhang. An
evolution-based approach to de novo protein design. In Computational Protein Design.
Humana Press, New York, NY 2017;243-264.
[30] Tekin, Aşkın, Ugur Uzuner and Kazım Sezen. Homology modeling and heterologous
expression of highly alkaline subtilisin-like serine protease from Bacillus halodurans
C-125. Biotechnology Letters 2021;43(2):479-494.
[31] Abdullahi, Mustapha, Shola Elijah Adeniji, David Ebuka Arthur and
Abdurrashid Haruna. Homology modeling and molecular docking simulation of some
novel imidazo[1,2-a]pyridine-3-carboxamide (IPA) series as inhibitors of
Mycobacterium tuberculosis. Journal of Genetic Engineering and
Biotechnology 2021;19(1):1-13.
[32] Pearce, Robin and Yang Zhang. Deep learning techniques have significantly impacted
protein structure prediction and protein design. Current Opinion in Structural
Biology 2021;68:194-207.
[33] Webb, Benjamin and Andrej Sali. Protein structure modeling with MODELLER.
In Structural Genomics. Humana, New York, NY 2021;239-255.
[34] Sonsare, Pravinkumar M. and Gunavathi, C. Cascading 1D-Convnet Bidirectional Long
Short Term Memory Network with Modified COCOB Optimizer: A Novel Approach for
Protein Secondary Structure Prediction. Chaos, Solitons & Fractals 2021;153:111446.
[35] Dhingra, Surbhi, Ramanathan Sowdhamini, Frédéric Cadet and Bernard Offmann. A
glance into the evolution of template-free protein structure prediction methodologies.
Biochimie 2020;175:85-92.
[36] Torrisi, Mirko, Gianluca Pollastri and Quan Le. Deep learning methods in protein
structure prediction. Computational and Structural Biotechnology Journal
2020;18:1301-1310.
[37] Karasev, Vladimir. Data on the application of the molecular vector machine model: A
database of protein pentafragments and computer software for predicting and designing
secondary protein structures. Data in Brief 2020;28:104815.
[38] AlQuraishi, Mohammed. End-to-end differentiable learning of protein structure. Cell
Systems 2019;8(4):292-301.
[39] Masoudi-Sobhanzadeh, Yosef, Behzad Jafari, Sepideh Parvizpour, Mohammad M.
Pourseif and Yadollah Omidi. A novel multi-objective metaheuristic algorithm for
protein-peptide docking and benchmarking on the LEADS-PEP dataset. Computers in
Biology and Medicine 2021;138:104896.
[40] Narloch, Pedro Henrique, Mathias J. Krause and Márcio Dorn. Multi-Objective
Differential Evolution Algorithms for the Protein Structure Prediction Problem. In 2020
IEEE Congress on Evolutionary Computation (CEC). IEEE 2020;1-8.
[41] Śledź, Paweł and Amedeo Caflisch. Protein structure-based drug design: from docking to
molecular dynamics. Current Opinion in Structural Biology 2018;48:93-102.
[42] Song, Shuangbao, Junkai Ji, Xingqian Chen, Shangce Gao, Zheng Tang and Yuki Todo.
Adoption of an improved PSO to explore a compound multi-objective energy function in
protein structure prediction. Applied Soft Computing 2018;72:539-551.
[43] Feng, Zhong-kai, Wen-jing Niu and Chun-tian Cheng. Optimization of hydropower
reservoirs operation balancing generation benefit and ecological requirement with a
parallel multi-objective genetic algorithm. Energy 2018;153:706-718.
[44] Gao, Shangce, Shuangbao Song, Jiujun Cheng, Yuki Todo and Mengchu Zhou.
Incorporation of solvent effect into a multi-objective evolutionary algorithm for improved
protein structure prediction. IEEE/ACM Transactions on Computational Biology and
Bioinformatics 2017;15(4):1365-1378.
[45] Senior, W. Andrew, Richard Evans, John Jumper, James Kirkpatrick, Laurent Sifre, Tim
Green, Chongli Qin, et al. Improved protein structure prediction using potentials from
deep learning. Nature 2020;577(7792):706-710.
[46] Jang, Woo Dae, Gi Bae Kim, Yeji Kim and Sang Yup Lee. Applications of artificial
intelligence to enzyme and pathway design for metabolic engineering. Current Opinion in
Biotechnology 2022;73:101-107.
[47] Cheng, Jianlin, Allison N. Tegge and Pierre Baldi. Machine learning methods for protein
structure prediction. IEEE Reviews in Biomedical Engineering 2008;1:41-49.
[48] AlQuraishi, Mohammed. Machine learning in protein structure prediction. Current
Opinion in Chemical Biology 2021;65:1-8.
[49] Yuvaraj, Natarajan, Kannan Srihari, Selvaraj Chandragandhi, Rajan Arshath Raja, Gaurav
Dhiman and Amandeep Kaur. Analysis of protein-ligand interactions of SARS-CoV-2
against selective drugs using deep neural networks. Big Data Mining and
Analytics 2021;4(2):76-83.
[50] Akhter, Nasrin, Kazi Lutful Kabir, Gopinath Chennupati, Raviteja Vangara,
Boian Alexandrov, Hristo N. Djidjev and Amarda Shehu. Improved protein decoy
selection via non-negative matrix factorization. IEEE/ACM Transactions on
Computational Biology and Bioinformatics 2021.
[51] Burley, Stephen K. and Helen M. Berman. Open-access data: A cornerstone for artificial
intelligence approaches to protein structure prediction. Structure 2021.
[52] Masrati, Gal, Meytal Landau, Nir Ben-Tal, Andrei Lupas, Mickey Kosloff and Jan
Kosinski. Integrative structural biology in the era of accurate structure prediction. Journal
of Molecular Biology 2021;433(20):167127.
[53] Ofer, Dan, Nadav Brandes and Michal Linial. The language of proteins: NLP, machine
learning & protein sequences. Computational and Structural Biotechnology Journal 2021.