Genomic and protein structure modelling analysis depicts the origin and
pathogenicity of 2019-nCoV, a new coronavirus which caused a pneumonia
outbreak in Wuhan, China

Article in F1000 Research · February 2020


DOI: 10.12688/f1000research.22357.1



F1000Research 2020, 9:121 Last updated: 07 APR 2020

RESEARCH ARTICLE
Genomic and protein structure modelling analysis depicts the
origin and pathogenicity of 2019-nCoV, a new coronavirus which
caused a pneumonia outbreak in Wuhan, China [version 1; peer
review: awaiting peer review]
Ning Dong1*, Xuemei Yang1*, Lianwei Ye1*, Kaichao Chen1*, Edward Wai-Chi Chan2, Mengsu Yang3, Sheng Chen1

1 Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Kowloon, Hong Kong
2 State Key Lab of Chirosciences, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
3 Department of Biomedical Science, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Kowloon, Hong Kong

* Equal contributors

First published: 18 Feb 2020, 9:121 (https://doi.org/10.12688/f1000research.22357.1)
Latest published: 09 Mar 2020, 9:121 (https://doi.org/10.12688/f1000research.22357.2)

Open Peer Review
Reviewer Status: AWAITING PEER REVIEW
Any reports and responses or comments on the article can be found at the end of the article.

Abstract
Background: A pandemic outbreak caused by a novel coronavirus, 2019-nCoV, originated in Wuhan, China and has spread to many countries around the world. The outbreak has led to around 45 thousand cases and over one thousand deaths so far.
Methods: Phylogenetic analysis and sequence alignment were used to compare the whole genome sequence of 2019-nCoV with over 200 other coronavirus sequences in order to predict the origin of this novel virus. In addition, protein modeling and analysis were performed to assess the potential binding of the spike protein of 2019-nCoV to the human cell receptor, angiotensin-converting enzyme 2 (ACE2).
Results: Detailed genomic and structure-based analysis of the new coronavirus, 2019-nCoV, showed that it is a new type of bat coronavirus and is genetically fairly distant from the human SARS coronavirus. Structure analysis of the spike (S) protein of this new virus showed that its S protein binds much more weakly to the ACE2 receptor on human cells, whereas the human SARS coronavirus exhibits strong affinity to the ACE2 receptor.
Conclusions: These findings suggest that the new virus should theoretically not be able to cause very serious human infection when compared to the human SARS virus. However, the lower pathogenicity of this new virus may lead to a longer incubation time and better adaptation to humans, which may favor its efficient transmission in humans. These data are important to guide the design of infection control policy and to inform the public on the nature of the threat imposed by 2019-nCoV. Most importantly, using the analysis platform that we have developed, we should be able to predict in a very short time whether new mutations could increase the infectivity of the mutated virus.

Keywords
2019-nCoV, Genomics, Protein modelling, Origin, Pathogenicity, Wuhan

This article is included in the Disease Outbreaks gateway.

Corresponding author: Sheng Chen ([email protected])


Author roles: Dong N: Formal Analysis, Methodology; Yang X: Formal Analysis, Methodology; Ye L: Formal Analysis, Methodology; Chen K:
Formal Analysis, Methodology; Chan EWC: Formal Analysis, Writing – Review & Editing; Yang M: Conceptualization, Project Administration;
Chen S: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Project Administration, Supervision, Writing –
Original Draft Preparation, Writing – Review & Editing
Competing interests: No competing interests were disclosed.
Grant information: The author(s) declared that no grants were involved in supporting this work.
Copyright: © 2020 Dong N et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to cite this article: Dong N, Yang X, Ye L et al. Genomic and protein structure modelling analysis depicts the origin and pathogenicity of 2019-nCoV, a new coronavirus which caused a pneumonia outbreak in Wuhan, China [version 1; peer review: awaiting peer review] F1000Research 2020, 9:121 (https://doi.org/10.12688/f1000research.22357.1)
First published: 18 Feb 2020, 9:121 (https://doi.org/10.12688/f1000research.22357.1)


Introduction
A cluster of pneumonia cases of unknown cause was reported in Wuhan, the capital city of Hubei Province, China, in December 2019. On 18th January 2020, a total of 44 such cases had been documented; two patients had died, five were in critical condition, and six had been discharged from hospital. Most patients had visited or worked in a seafood wholesale market in Wuhan. In the early stage of the outbreak, there was no strong evidence that the unknown agent was highly infectious, as no health care personnel in the hospitals where the patients were admitted had been infected. However, the situation changed dramatically: newly infected cases increased rapidly, and human-to-human transmission was confirmed when several health care personnel were found to be infected. As of Jan 28, 2020, the number of cases had risen sharply over the preceding few days to a total of 4500, with over 100 confirmed deaths. Sporadic new cases were reported in more and more provinces in China, mainly due to travelers from Wuhan. Many confirmed cases were also reported in other countries; these patients were known to have come from or visited Wuhan. These epidemiological data differed markedly from the data reported at the beginning and may suggest that the new virus has undergone adaptation/evolution in the human host, becoming better adapted and transmitting more efficiently from human to human.

As of February 11, 2020, a total of 42,714 confirmed infections and 21,675 suspected cases had been reported worldwide, with 31,728 confirmed cases in Hubei Province, according to data from the National Health Commission of China. A total of 1,017 deaths had been reported, 974 of them in Hubei Province. The mortality rate was high within Hubei Province, reaching 3.07% (974/31,728), while the mortality rate outside Hubei Province was about 0.39% (43/10,986). These data suggested that 2019-nCoV is very infectious, while its pathogenicity seemed to be lower than that of the SARS virus. It is urgent to find scientific evidence that supports the epidemiological data and guides the development of further control measures.

Methods
Phylogenetic analysis and sequence alignment
A total of 211 genome sequences of viruses in the Coronaviridae family, including the 2019-nCoV Wuhan virus, were downloaded from GenBank (last accessed on 11 January 2020; the list of downloaded sequences is provided as extended data [22]). Circular proteomic trees were computed using ViPTreeGen v1.1.2 [1]. The sequence of a Breda virus (accession: NC_007447) was used as an outgroup. Alignment of sequences from different viral genomes was conducted using the alignment function of ViPTree [1]. The phylogenetic tree and sequence alignment products were manually edited using Inkscape v0.91 [2]. Ten spike protein sequences similar to that of 2019-nCoV were downloaded from NCBI. SNP analysis was performed using MEGA X [3], and the alignment was carried out using ClustalW [4]. The aligned sequences were edited and viewed in CLC Genomics Workbench 20.

Protein structure prediction and contacts between the human ACE2 and spike receptor-binding domains
The spike receptor-binding domains (RBDs) of 2019-nCoV and four bat-originated coronaviruses were predicted by aligning their spike protein sequences to the spike RBD of SARS coronavirus [5] using Clustal Omega [6]. Homology modeling of the spike proteins and the related RBDs of 2019-nCoV and the four bat-originated coronaviruses was performed on the Swiss-Model workspace [8], using the SARS-CoV spike glycoprotein as template (PDB ID: 5X58) [7]. The structure assessment results are presented below. The models were visualized with PyMol. The contacts between human ACE2 and the spike RBDs were predicted by aligning the models to the structure of the RBD in complex with the human receptor ACE2 (PDB ID: 2AJF) [5].

Open source alternatives
Sequence alignment performed with CLC Genomics Workbench can be replicated using open source alternatives such as ClustalW.

An open source version of PyMol has been made freely available by the developers on GitHub (https://github.com/schrodinger/pymol-open-source).

An earlier version of this article can be found on bioRxiv (https://doi.org/10.1101/2020.01.20.913368).

Results and discussion
On 6th January 2020, the Chinese authorities released the sequence (accession: MN908947) of a novel coronavirus, designated 2019-nCoV, which was isolated from one of the pneumonia patients and confirmed to be the causative agent of this outbreak [9]. Coronaviruses are a large family of viruses, most of which cause mild infections such as the common cold, but some, such as the SARS and MERS (Middle East Respiratory Syndrome) viruses, cause severe and potentially fatal respiratory tract infections [10,11]. Some coronaviruses are transmitted easily between humans, while others are not. Based on currently available information, 2019-nCoV belongs to a category that can cause severe illness in some patients but does not transmit readily between people [12–14]. It is necessary to investigate the genetic and functional data of this new virus and compare them with those of other coronaviruses, so as to guide future research and the design of appropriate infection control policy to prevent widespread dissemination of another potentially deadly coronavirus following the emergence of the SARS and MERS viruses. In this study, we performed in-depth genetic analysis of 2019-nCoV and generated data that provide timely and valuable insight into the potential origin of this virus, its ability to cause human infection, and its genetic relatedness to SARS and MERS.

Phylogenetic analysis of genomic sequences of coronaviruses deposited in GenBank revealed that 2019-nCoV belonged to the betacoronaviruses and exhibited the closest linkage with two SARS-like coronaviruses from bat (bat-SL-CoVZX45 and
bat-SL-CoVZX21) (Figure 1a). According to the phylogenetic tree, human SARS viruses were closest to bat SARS-like viruses, related to a lesser degree to bat coronaviruses, and least related to other coronaviruses. 2019-nCoV occupies a position between the bat SARS-like viruses and the bat coronaviruses, suggesting that it is less related to the human SARS virus than other bat SARS-like viruses are, and is likely a new type of bat coronavirus. Nevertheless, all coronaviruses that exhibit close linkage with 2019-nCoV originated from bat, strongly suggesting that this new coronavirus also originated from bat (Figure 1a). Coronaviruses of other species, including the murine coronavirus, are genetically distant from this new coronavirus, indicating that 2019-nCoV did not originate from other animal hosts. As bats are not sold in the Wuhan market, the animals that serve as the transmission vehicle remain to be identified [15,16].

The sequence of 2019-nCoV was annotated and aligned with several representative coronaviruses selected according to the degree of genetic relatedness depicted by the phylogenetic tree (Figure 1b). These included two highly homologous human SARS coronaviruses, SARS CoV P2 (FJ882963) and SARS CoV ZJ02 (EU371559); one bat SARS virus that exhibits high homology with the human SARS virus and a similar potential to infect humans, bat SARS CoV WIV1 (KF367457) [17]; two bat SARS-like viruses that are not able to infect humans (bat-SL-CoVZX45 and Rp Shaanxi2011); and two unrelated coronaviruses, the MERS virus MERS CoV (NC019843) and the avian infectious bronchitis virus IBV CoV (AY646283). The annotation of 2019-nCoV differed slightly from that of the human SARS virus and other coronaviruses, but the functionally important ORFs, ORF1a and ORF1b, and the major structural proteins, including the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins, are well annotated (Figure 1b). Consistent with the phylogenetic tree data, 2019-nCoV did not align well with the MERS and IBV viruses (Figure S1, Extended data [22]). Among the SARS viruses, bat coronaviruses and 2019-nCoV, the non-structural proteins generally aligned well, but variations were observable in the major structural proteins and some small ORFs (Figure 1b). Detailed sequence alignment showed that 2019-nCoV exhibited significant sequence variation from the human SARS coronavirus in several regions, including the N-terminal region of ORF1a and the S protein, ORF3, E, ORF6, 7 and 8, and the middle part of the N protein. Bat SARS-like virus WIV1 aligned well with human SARS virus P2, with some variations at ORF8 and an insertion between ORF6 and 7. 2019-nCoV aligned best with the bat SARS-like virus bat-SL-CoVZX45, with the majority of genetic variations seen in the N-terminal part of the S protein. The bat-SL-CoVZX45 virus itself exhibited a high degree of variation from another bat SARS-like CoV, Rp Shaanxi2011, at ORF1a, the N-terminus of the S protein and other structural proteins. However, bat SARS-like CoV Rp Shaanxi2011 exhibited high homology with human SARS virus ZJ02, with variation seen at the N-terminus of the S protein and the middle part of the N protein (Figure 1b). To check whether 2019-nCoV is closer to the bat SARS-like viruses or to the bat coronaviruses, we selected two adjacent viruses, one from each group, bat SARS CoV Rf1/2004 and bat coronavirus BM48-31/BGR/2008, to perform alignment with the new virus; the results showed that 2019-nCoV was similarly divergent from these two strains (Figure S1b, Extended data [22]). When 2019-nCoV was aligned with bat CoV HKU9-1, a more distant bat coronavirus, it proved to be very different from this virus (Figure S1c, Extended data [22]). These sequence alignment data were consistent with the results of the phylogenetic tree analysis and indicated that 2019-nCoV lies between the bat SARS-like viruses and the bat coronaviruses, but is genetically distant from the human SARS virus. It should be considered a new type of bat coronavirus.

Since the S protein exhibits the highest degree of genetic variation among different coronaviruses, we performed phylogenetic analysis of the S protein of different coronaviruses (Figure S2, Extended data [22]). Our data showed that the S protein of 2019-nCoV exhibited high homology with those of bat SARS-like coronaviruses such as bat-SL-CoVZX45 and bat-SL-CoVZX21, human SARS virus and bat coronaviruses. The homology of the S protein of 2019-nCoV with those of representative human SARS virus, bat SARS-like viruses and bat coronaviruses is shown in Table 2. The S protein of 2019-nCoV showed about 76% homology to that of human SARS virus P2 and high homology to those of bat SARS-like viruses, while it showed 72% homology to that of the closest bat coronavirus, BM48-31, and even lower homology to those of other bat coronaviruses. These data further suggested that 2019-nCoV is more likely a new type of bat coronavirus with only loose linkage with the SARS virus. Interestingly, different regions of the S proteins exhibited different levels of homology among the known coronaviruses (Figure 1b). Amino acid sequence alignment consistently showed that the N-terminal regions were far more diverse than the C-terminus, which seemed to be highly conserved (Figure 2). Aligning these regions to the structure of the S protein indicated that the structurally conserved C-terminal region aligned well with the transmembrane domain, which consists of a double helix, whereas the most variable region aligned with the N-terminal domain; the receptor binding domain exhibited an intermediate level of sequence variation (Figure 2).

Due to the high amino acid sequence homology between the S proteins of 2019-nCoV and the SARS virus, which can cause severe human infection, we analyzed the structural similarity of this protein in various viruses. Protein structure modeling was performed to obtain high quality structures of the S proteins from different coronaviruses (Table 1). The high level of similarity observable between the structures of the S proteins from different viruses implied that the S proteins of 2019-nCoV and bat coronaviruses would most likely use the same human cell receptor as the SARS virus (Table 2). Angiotensin-converting enzyme 2 (ACE2) has been shown to be the cellular receptor of the SARS virus S protein [18]. The structure of the complex of ACE2 with the receptor binding domain (RBD) of the S protein of SARS virus has been resolved and demonstrated tight interaction between these two proteins at the interaction interface [5]. Current studies have confirmed that 2019-nCoV indeed binds to ACE2. These data prompted us to determine the level of interaction between the S protein of 2019-nCoV and its potential


Figure 1. Phylogenetic analysis and sequence alignment of coronaviruses of different species. (a) Phylogenetic tree of coronaviruses from different species. The type of coronavirus and the host are labelled. The virus labelled in red is the newly discovered coronavirus 2019-nCoV. (b) Sequence alignment of the new 2019-nCoV with representative bat SARS-like coronaviruses and bat coronaviruses. These included two highly homologous human SARS coronaviruses: SARS CoV P2 (FJ882963) and SARS CoV ZJ02 (EU371559); one bat SARS virus showing high homology with the human SARS virus and a similar potential to infect humans: bat SARS CoV WIV1 (KF367457); two bat SARS-like viruses that are not able to infect humans (bat-SL-CoVZX45 and Rp Shaanxi2011); and the newly discovered 2019-nCoV from Wuhan.


Figure 2. Amino acid sequence alignment of coronaviruses from bat and human. The upper panel shows the amino acid sequence alignment and the lower panel the structure of the S protein. The three domains of the S protein are coloured red, blue and green, matching the colours used for the aligned sequences. The sequences that exhibit the highest degree of variation are in the N-terminal domain, followed by those in the RBD. The C-terminal (green) domain is the most conserved among the tested viruses.
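
The per-domain comparison summarised in this caption can be illustrated with a small sketch. It is not the authors' pipeline: the three sequences, the region names `NTD`, `RBD` and `CTD`, and their coordinates are hypothetical toy values chosen only to show the idea, and each region is scored simply as the fraction of alignment columns in which every sequence carries the same residue.

```python
from typing import Dict, List, Tuple


def conserved_columns(alignment: List[str]) -> List[bool]:
    """Return one flag per alignment column: True if all sequences agree."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # A column counts as conserved only if it holds one residue and no gap.
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*alignment)]


def region_conservation(
    alignment: List[str], regions: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Fraction of fully conserved columns in each half-open region [start, end)."""
    flags = conserved_columns(alignment)
    return {
        name: sum(flags[start:end]) / (end - start)
        for name, (start, end) in regions.items()
    }


# Hypothetical toy alignment (three made-up sequences of 12 columns each).
TOY_ALIGNMENT = [
    "MKTAQLVNGGHH",
    "MRSAQLINGGHH",
    "LKSAQMVNGGHH",
]

# Hypothetical region boundaries, for illustration only.
TOY_REGIONS = {"NTD": (0, 4), "RBD": (4, 8), "CTD": (8, 12)}

if __name__ == "__main__":
    for name, score in region_conservation(TOY_ALIGNMENT, TOY_REGIONS).items():
        print(f"{name}: {score:.2f} of columns identical")
```

With these toy inputs the N-terminal region scores lowest and the C-terminal region highest, mirroring the ordering described in the caption; on real data the alignment would come from a tool such as ClustalW and the boundaries from the structural annotation.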

cellular receptor ACE2. Using the modeled S protein structures and further structure-based alignment, we obtained the complex structures of the RBD of the S protein of 2019-nCoV and several bat coronaviruses with human ACE2, using the complex structure of the RBD of the S protein from human SARS virus and ACE2 (2AJF) as reference [5]. Structural analysis of potential interactions between the RBD of the S protein from human SARS virus and the ACE2 protein depicted several interaction points, including four hydrophobic interactions: ACE2(Y41)/RBD(Y484), ACE2(L45)/RBD(Y484), ACE2(L79, M82)/RBD(L472), ACE2(Y83)/RBD(Y475),


Table 1. Assessment of the quality of modeled structures of spike protein and its receptor binding domain (RBD) of different coronavirus using protein structures of the human SARS virus as templates.

| S protein | GenBank | Residues | MolProbity score (1) | Ramachandran favoured (1) | GMQE (2) | QMEAN (2) |
|---|---|---|---|---|---|---|
| 2019-nCoV | MN908947 | 1273 | 1.64 | 89.4% | 0.73 | -4.46 |
| BatSARS-likeCoV-WIV1 | KC881007 | 1256 | 1.36 | 90.02% | 0.74 | -3.45 |
| BatSARS-likeCoV | MG772934 | 1245 | 1.64 | 88.96% | 0.73 | -3.91 |
| BatSARSr-CoV | DQ022305 | 1242 | 1.34 | 89.49% | 0.73 | -3.96 |
| BatCoV | JX993987 | 1240 | 1.4 | 90.02% | 0.73 | -3.90 |

1, Structure assessment. MolProbity is all-atom contact analysis based only on properties of the predicted model. Lower numbers indicate better models. Ramachandran favoured indicates energetically favoured regions for backbone dihedral angles against amino acid residues in protein structure. Larger numbers indicate better models.
2, Model evaluation. GMQE (Global Model Quality Estimation) is a quality estimation which combines properties of the target–template alignment and the template search method. The resulting GMQE score is expressed as a number between 0 and 1, larger numbers indicate higher reliability. The QMEAN Z-score provides an estimate of the “degree of nativeness” of the structural features observed in the model on a global scale. QMEAN Z-scores around zero indicate good agreement between the model structure and experimental structures of similar size.
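
As a reading aid for Table 1, the following sketch loads the reported scores and orders the models by how close their QMEAN Z-score is to zero, following the direction of each metric given in the table notes. The `ModelQuality` class and the helper names are hypothetical, and the cut-off of -4.0 is an assumption taken from general SWISS-MODEL guidance rather than a criterion stated in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelQuality:
    name: str
    molprobity: float             # lower is better
    ramachandran_favoured: float  # percent, higher is better
    gmqe: float                   # between 0 and 1, higher is better
    qmean: float                  # Z-score, closer to 0 is better


# Values copied from Table 1.
TABLE_1 = [
    ModelQuality("2019-nCoV", 1.64, 89.4, 0.73, -4.46),
    ModelQuality("BatSARS-likeCoV-WIV1", 1.36, 90.02, 0.74, -3.45),
    ModelQuality("BatSARS-likeCoV", 1.64, 88.96, 0.73, -3.91),
    ModelQuality("BatSARSr-CoV", 1.34, 89.49, 0.73, -3.96),
    ModelQuality("BatCoV", 1.4, 90.02, 0.73, -3.90),
]

# Assumption: illustrative cut-off, not a threshold used by the authors.
QMEAN_CUTOFF = -4.0


def rank_by_qmean(models: List[ModelQuality]) -> List[ModelQuality]:
    """Order models from the QMEAN Z-score closest to zero to the furthest."""
    return sorted(models, key=lambda m: abs(m.qmean))


def at_or_below_cutoff(
    models: List[ModelQuality], cutoff: float = QMEAN_CUTOFF
) -> List[str]:
    """Names of models whose QMEAN Z-score is at or below the chosen cut-off."""
    return [m.name for m in models if m.qmean <= cutoff]


if __name__ == "__main__":
    for model in rank_by_qmean(TABLE_1):
        print(f"{model.name:22s} QMEAN {model.qmean:6.2f}  GMQE {model.gmqe:.2f}")
    print("At or below cut-off:", at_or_below_cutoff(TABLE_1))
```

A different cut-off can be passed as the second argument of `at_or_below_cutoff` if another convention is preferred.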

Table 2. Homology of S protein between different coronaviruses.

| | 2019-nCoV | SL-CoVZXC21 | CoV-HKU3-1 | CoV-WIV1 | SARS_P2 | Shaanxi-2011 | Bat-BM48-31 | Bat-zj2013 | CoV-HKU5-1 | CoV-HKU4-1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2019-nCoV | 100% | 81.35% | 77.93% | 77.7% | 76.58% | 76.56% | 72.5% | 41.06% | 31.21% | 30.58% |
| SL-CoVZXC21 | 81.35% | 100% | 83.09% | 78.15% | 77.5% | 82.05% | 71.8% | 40.59% | 31.62% | 31.46% |
| CoV-HKU3-1 | 77.93% | 83.09% | 100% | 80.11% | 79.55% | 88.38% | 75.1% | 40.79% | 32.27% | 31.87% |
| CoV-WIV1 | 77.7% | 78.15% | 80.11% | 100% | 92.11% | 81.06% | 75.66% | 41.35% | 31.31% | 30.8% |
| SARS_P2 | 76.58% | 77.5% | 79.55% | 92.11% | 100% | 81.14% | 75.64% | 41.14% | 31.42% | 30.99% |
| Shaanxi-2011 | 76.56% | 82.05% | 88.38% | 81.06% | 81.14% | 100% | 75.06% | 41.6% | 31.91% | 31.67% |
| Bat-BM48-31 | 72.5% | 71.8% | 75.1% | 75.66% | 75.64% | 75.06% | 100% | 40.21% | 32.12% | 32.05% |
| Bat-zj2013 | 41.06% | 40.59% | 40.79% | 41.35% | 41.14% | 41.6% | 40.21% | 100% | 28.77% | 28.9% |
| CoV-HKU5-1 | 31.21% | 31.62% | 32.27% | 31.31% | 31.42% | 31.91% | 32.12% | 28.77% | 100% | 69.23% |
| CoV-HKU4-1 | 30.58% | 31.46% | 31.87% | 30.8% | 30.99% | 31.67% | 32.05% | 28.9% | 69.23% | 100% |
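
The matrix in Table 2 can be queried programmatically. The sketch below copies the reported percentages, checks that the matrix is symmetric, and lists the viruses ordered by S protein homology to a chosen query. The helper names `is_symmetric` and `neighbours` are hypothetical and serve only as an illustration of how the table can be read.

```python
from typing import List, Tuple

LABELS = [
    "2019-nCoV", "SL-CoVZXC21", "CoV-HKU3-1", "CoV-WIV1", "SARS_P2",
    "Shaanxi-2011", "Bat-BM48-31", "Bat-zj2013", "CoV-HKU5-1", "CoV-HKU4-1",
]

# Percent homology values copied from Table 2 (rows and columns follow LABELS).
IDENTITY = [
    [100, 81.35, 77.93, 77.7, 76.58, 76.56, 72.5, 41.06, 31.21, 30.58],
    [81.35, 100, 83.09, 78.15, 77.5, 82.05, 71.8, 40.59, 31.62, 31.46],
    [77.93, 83.09, 100, 80.11, 79.55, 88.38, 75.1, 40.79, 32.27, 31.87],
    [77.7, 78.15, 80.11, 100, 92.11, 81.06, 75.66, 41.35, 31.31, 30.8],
    [76.58, 77.5, 79.55, 92.11, 100, 81.14, 75.64, 41.14, 31.42, 30.99],
    [76.56, 82.05, 88.38, 81.06, 81.14, 100, 75.06, 41.6, 31.91, 31.67],
    [72.5, 71.8, 75.1, 75.66, 75.64, 75.06, 100, 40.21, 32.12, 32.05],
    [41.06, 40.59, 40.79, 41.35, 41.14, 41.6, 40.21, 100, 28.77, 28.9],
    [31.21, 31.62, 32.27, 31.31, 31.42, 31.91, 32.12, 28.77, 100, 69.23],
    [30.58, 31.46, 31.87, 30.8, 30.99, 31.67, 32.05, 28.9, 69.23, 100],
]


def is_symmetric(matrix: List[List[float]], tol: float = 1e-9) -> bool:
    """Check that matrix[i][j] equals matrix[j][i] for every pair."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


def neighbours(label: str) -> List[Tuple[str, float]]:
    """Other viruses ordered from highest to lowest homology to `label`."""
    row = IDENTITY[LABELS.index(label)]
    others = [(name, value) for name, value in zip(LABELS, row) if name != label]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print("symmetric:", is_symmetric(IDENTITY))
    for name, value in neighbours("2019-nCoV"):
        print(f"{name:14s} {value:6.2f}%")
```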

one salt-bridge: ACE2(E329)/RBD(R426) and one cation-π interaction: ACE2(K353)/RBD(Y491) (Figure 3a–e). However, examination of the interaction between the RBD of 2019-nCoV and human ACE2 depicted only one potential hydrophobic interaction, between ACE2(L79, M82) and RBD(F486), and one cation-π interaction, ACE2(K353)/RBD(Y492). Further examination of the interactions of the RBDs from the bat SARS-like coronavirus bat-SL-CoVZX45 and the bat coronavirus HKU3-1, which do not infect humans, showed only one cation-π interaction, ACE2(K353)/RBD(Y481) [20,21]. Another bat-originated coronavirus, bat SARS CoV WIV1, which displayed strong binding to ACE2 and exhibited potential to cause human infection, was also included for analysis [17]. The binding of the RBD from this virus to ACE2 was as tight as that of the human SARS virus, involving four hydrophobic interactions: ACE2(Y41)/RBD(Y485), ACE2(L45)/RBD(Y485), ACE2(L79, M82)/RBD(F473), ACE2(Y83)/RBD(Y476), one salt-bridge: ACE2(E329)/RBD(R427) and one cation-π interaction: ACE2(K353)/RBD(Y492). These data suggested that a higher binding affinity of the coronavirus RBD to ACE2 confers higher infectivity and pathogenicity on the virus. The fact that the RBD of 2019-nCoV exhibited much lower affinity to ACE2 implies that the virulence potential of 2019-nCoV should be much lower than that of the human SARS virus, but nevertheless higher than that of viruses that do not cause human infection [20,21]. This finding is also consistent with the current epidemiological data, in that 2019-nCoV has only caused severe pneumonia in patients with weaker immune systems, such as the elderly and people with underlying diseases. The weaker binding affinity of 2019-nCoV to human cells might also explain the limited human-to-human transmission potential of this virus observed to date.
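
The predicted contacts listed above can be tabulated to make the comparison between viruses explicit. The sketch below is only an illustration: the data structure and function names are hypothetical, and a count of predicted contacts is a crude proxy for interface similarity, not a measured or computed binding affinity.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (interaction type, ACE2 residue(s), RBD residue), as listed in the text.
Contact = Tuple[str, str, str]

CONTACTS: Dict[str, List[Contact]] = {
    "human SARS virus": [
        ("hydrophobic", "Y41", "Y484"),
        ("hydrophobic", "L45", "Y484"),
        ("hydrophobic", "L79/M82", "L472"),
        ("hydrophobic", "Y83", "Y475"),
        ("salt bridge", "E329", "R426"),
        ("cation-pi", "K353", "Y491"),
    ],
    "bat SARS CoV WIV1": [
        ("hydrophobic", "Y41", "Y485"),
        ("hydrophobic", "L45", "Y485"),
        ("hydrophobic", "L79/M82", "F473"),
        ("hydrophobic", "Y83", "Y476"),
        ("salt bridge", "E329", "R427"),
        ("cation-pi", "K353", "Y492"),
    ],
    "2019-nCoV": [
        ("hydrophobic", "L79/M82", "F486"),
        ("cation-pi", "K353", "Y492"),
    ],
    # The text reports the same single contact for both of these viruses.
    "bat-SL-CoVZX45 / HKU3-1": [
        ("cation-pi", "K353", "Y481"),
    ],
}


def contacts_by_type(contacts: Dict[str, List[Contact]]) -> Dict[str, Counter]:
    """Count predicted contacts of each interaction type per virus."""
    return {
        virus: Counter(kind for kind, _, _ in items)
        for virus, items in contacts.items()
    }


def total_contacts(contacts: Dict[str, List[Contact]]) -> Dict[str, int]:
    """Total number of predicted contacts per virus."""
    return {virus: len(items) for virus, items in contacts.items()}


if __name__ == "__main__":
    summary = contacts_by_type(CONTACTS)
    for virus, total in total_contacts(CONTACTS).items():
        print(f"{virus:26s} total={total}  {dict(summary[virus])}")
```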


Figure 3. Potential interactions between the receptor binding domain (RBD) of S proteins from different coronaviruses and the human cell receptor ACE2. Interactions of ACE2 with the RBD of human SARS virus (a), the highly similar bat SARS-like coronavirus CoV-WIV1, which can infect humans (b), the new coronavirus 2019-nCoV (c), and the bat SARS-like coronavirus SL-CoVZXC21 and bat coronavirus HKU3-1 (d) are shown. The detailed amino acid interaction sites between the two proteins are shown in (e). Arrows indicate the areas containing interacting residues from both proteins. Amino acids highlighted in different colours indicate the potential interacting residues of the different proteins.

Discussion
In this study, we used the whole genome sequence of the newly discovered coronavirus, 2019-nCoV, which caused an outbreak of pneumonia in Wuhan, China, to perform comparative genetic and functional analysis with the human SARS virus and coronaviruses recovered from different animals. Phylogenetic analysis of coronaviruses of different species indicated that 2019-nCoV might have originated from bat, but the intermediate transmission vehicle is not known at this stage. Genetic linkage analysis showed that 2019-nCoV lies at the interface between bat SARS-like coronaviruses and bat coronaviruses and should belong to a novel type of bat coronavirus, owing to its high degree of variation from the human SARS virus. Analysis of the potential interaction of the RBD of 2019-nCoV with the human ACE2 receptor protein indicated that its affinity to human cells is much lower than that of the human SARS virus due to the loss of several important interaction sites, implying that the infectivity and pathogenicity of this new virus should be much lower than those of the human SARS virus. These data are the most comprehensive scientific data supporting the origin of this new virus, which is important for subsequent research to identify the intermediate transmission vehicles, most likely wild animals. Most importantly, our data support the view that the pathogenicity of 2019-nCoV is much lower than that of the SARS virus. These data are very important for the current prevention and treatment of infections caused by this new virus. First, the lower pathogenicity of this new virus may suggest that the infectious dose should be large and the incubation time longer, which implies that 1) human-to-human transmission should mainly happen through close contact within a certain period of time; 2) the longer incubation time should favor the transmission of this virus in humans, which suggests that patients with no or mild symptoms should also be isolated to prevent further transmission; and 3) supportive treatment should be enough to save the lives of severe cases. Second, the lower pathogenicity of the virus may suggest that most patients should have mild symptoms but are nevertheless infectious. Appropriate measures should be designed to isolate these patients without occupying a large amount of hospital resources. Third, the lower pathogenicity of the new virus may suggest that it adapts to humans much better, which implies that the virus may not disappear after a few generations of transmission and that a long-term battle with this virus should be prepared for. Lastly, the epidemiology of the outbreak in the early stage is very different from the current dramatic situation. It cannot be excluded that the virus has undergone human adaptation and mutation in the past month. Using the analysis platform that we have developed above, we should be able to predict in a very short time whether new mutations could increase the infectivity of the mutated virus.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.


Extended data
Figshare: Extended data for 2019-nCoV. https://doi.org/10.6084/m9.figshare.11848224 [22]

This project contains the following extended data:

• Extended+data_F1000Research.docx (Document containing supplementary table and figures)
• all-virus-accessions.xlsx (Excel spreadsheet of virus accession numbers used for analysis)
• spike-protein-accessions.xlsx (Excel spreadsheet of spike protein accession numbers used for analysis)

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

Acknowledgments
We acknowledge the use of the genome sequence of 2019-nCoV (deposited in GenBank with the accession number MN908947) and other coronavirus sequences that we have used in this study in GenBank. An earlier version of this article can be found on bioRxiv (doi: https://doi.org/10.1101/2020.01.20.913368).

References

1. Nishimura Y, Yoshida T, Kuronishi M, et al.: ViPTree: the viral proteomic tree server. Bioinformatics. 2017; 33(15): 2379–2380.
2. Bah T: Inkscape: guide to a vector drawing program[M]. Upper Saddle River, NJ USA: Prentice Hall. 2011.
3. Kumar S, Stecher G, Li M, et al.: MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018; 35(6): 1547–1549.
4. Larkin MA, Blackshields G, Brown NP, et al.: Clustal W and Clustal X version 2.0. Bioinformatics. 2007; 23(21): 2947–2948.
5. Li F, Li W, Farzan M, et al.: Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science. 2005; 309(5742): 1864–1868.
6. Madeira F, Park YM, Lee J, et al.: The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019; 47(W1): W636–W641.
7. Yuan Y, Cao D, Zhang Y, et al.: Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains. Nat Commun. 2017; 8: 15092.
8. Waterhouse A, Bertoni M, Bienert S, et al.: SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018; 46(W1): W296–W303.
9. Lu R, Zhao X, Li J, et al.: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020; pii: S0140-6736(20)30251-8.
10. Ji JS: Origins of MERS-CoV, and lessons for 2019-nCoV. Lancet Planet Health. 2020; pii: S2542-5196(20)30032-2.
11. Lee PI, Hsueh PR: Emerging threats from zoonotic coronaviruses-from SARS and MERS to 2019-nCoV. J Microbiol Immunol Infect. 2020; pii: S1684-1182(20)30011-6.
12. Li Q, Guan X, Wu P, et al.: Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med. 2020.
13. Rothe C, Schunk M, Sothmann P, et al.: Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany. N Engl J Med. 2020.
14. Zhu N, Zhang D, Wang W, et al.: A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020.
15. Lau SK, Woo PC, Li KS, et al.: Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats. Proc Natl Acad Sci U S A. 2005; 102(39): 14040–14045.
16. To KK, Hung IF, Chan JF, et al.: From SARS coronavirus to novel animal and human coronaviruses. J Thorac Dis. 2013; 5 Suppl 2: S103–108.
17. Ge XY, Li JL, Yang XL, et al.: Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature. 2013; 503(7477): 535–538.
18. Li W, Moore MJ, Vasilieva N, et al.: Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature. 2003; 426(6965): 450–454.
19. Zhou P, Yang XL, Wang XG, et al.: A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020.
20. Dominguez SR, Shrivastava S, Berglund A, et al.: Isolation, propagation, genome analysis and epidemiology of HKU1 betacoronaviruses. J Gen Virol. 2014; 95(Pt 4): 836–848.
21. Hu D, Zhu C, Ai L, et al.: Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats. Emerg Microbes Infect. 2018; 7(1): 154.
22. Chen S: Extended data for 2019-nCoV. figshare. Dataset. 2020. http://www.doi.org/10.6084/m9.figshare.11848224.v1
