RESEARCH ARTICLE
Genomic and protein structure modelling analysis depicts the
origin and pathogenicity of 2019-nCoV, a new coronavirus which
caused a pneumonia outbreak in Wuhan, China [version 1; peer
review: awaiting peer review]
Ning Dong1*, Xuemei Yang1*, Lianwei Ye1*, Kaichao Chen1*,
1Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong
* Equal contributors
F1000Research 2020, 9:121 Last updated: 07 APR 2020
Keywords
2019-nCoV, Genomics, Protein modelling, Origin, Pathogenicity, Wuhan
bat-SL-CoVZX21) (Figure 1a). According to the phylogenetic tree, human SARS viruses were closest to bat SARS-like viruses, less closely related to bat coronaviruses, and least related to other coronaviruses. 2019-nCoV sits between the bat SARS-like viruses and the bat coronaviruses, suggesting that it is less related to the human SARS virus than other bat SARS-like viruses are, and that it is likely a new type of bat coronavirus. Nevertheless, all coronaviruses that exhibit close linkage with 2019-nCoV originated from bats, strongly suggesting that this new coronavirus also originated from bats (Figure 1a). Coronaviruses of other species, including the murine coronavirus, are genetically distant from this new coronavirus, indicating that 2019-nCoV did not originate from other animal hosts. As bats are not sold in the Wuhan market, the animals that served as the transmission vehicle remain to be identified [15,16].

The sequence of 2019-nCoV was annotated and aligned with several representative coronaviruses selected according to the degree of genetic relatedness depicted by the phylogenetic tree (Figure 1b). These included two highly homologous human SARS coronaviruses, SARS CoV P2 (FJ882963) and SARS CoV ZJ02 (EU371559); one bat SARS virus that exhibits high homology with the human SARS virus and a similar potential to infect humans, bat SARS CoV W1V1 (KF367457) [17]; two bat SARS-like viruses that were not able to infect humans, bat-SL-CoVZX45 and Rp Shaanxi2011; and two unrelated coronaviruses, the MERS virus MERS CoV (NC019843) and the avian infectious bronchitis virus IBV CoV (AY646283). The annotation of 2019-nCoV differed slightly from that of the human SARS virus and other coronaviruses, but the functionally important ORFs, ORF1a and ORF1b, and the major structural proteins, including the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins, were well annotated (Figure 1b). Consistent with the phylogenetic tree data, 2019-nCoV did not align well with the MERS and IBV viruses (Figure S1, Extended data [22]). Among the SARS viruses, bat coronaviruses and 2019-nCoV, the non-structural proteins generally aligned well, but variations were observable in the major structural proteins and some small ORFs (Figure 1b). Detailed sequence alignment showed that 2019-nCoV exhibited significant sequence variation from the human SARS coronavirus in several regions, including the N-terminal region of ORF1a and the S protein, ORF3, E, ORF6, 7 and 8, and the middle part of the N protein. The bat SARS-like virus W1V1 aligned well with human SARS virus P2, with some variations at ORF8 and an insertion between ORF6 and 7. 2019-nCoV aligned best with the bat SARS-like virus bat-SL-CoVZX45, with the majority of genetic variations seen in the N-terminal part of the S protein. The bat-SL-CoVZX45 virus itself exhibited a high degree of variation from another bat SARS-like CoV, Rp Shaanxi2011, at ORF1a, the N-terminus of the S protein and other structural proteins. However, bat SARS-like CoV Rp Shaanxi2011 exhibited high homology with human SARS virus ZJ02, with variation seen at the N-terminus of the S protein and the middle part of the N protein (Figure 1b). To check whether 2019-nCoV is closer to the bat SARS-like viruses or to the bat coronaviruses, we selected two adjacent viruses, one from each group, bat SARS CoV Rf1/2004 and bat coronavirus BM48-31/BGR/2008, and aligned them with the new virus. The results showed that 2019-nCoV exhibited a similar degree of diversity from these two strains (Figure S1b, Extended data [22]). When 2019-nCoV was aligned with bat CoV HKU9-1, a more distantly related bat coronavirus, it was very different from this virus (Figure S1c, Extended data [22]). These sequence alignment data were consistent with the results of the phylogenetic tree analysis and indicated that 2019-nCoV lies between the bat SARS-like viruses and the bat coronaviruses but is genetically distant from the human SARS virus. It should be considered a new type of bat coronavirus.

Since the S protein exhibits the highest degree of genetic variation among different coronaviruses, we performed phylogenetic analysis of the S protein of different coronaviruses (Figure S2, Extended data [22]). Our data showed that the S protein of 2019-nCoV exhibited high homology with those of bat SARS-like coronaviruses such as bat-SL-CoVZX45 and bat-SL-CoVZX21, the human SARS virus and bat coronaviruses. The homology of the S protein of 2019-nCoV with those of representative human SARS viruses, bat SARS-like viruses and bat coronaviruses is shown in Table 2. The S protein of 2019-nCoV showed about 76% homology to that of human SARS virus P2 and high homology to those of bat SARS-like viruses, while it showed 72% homology to that of the closest bat coronavirus, BM48-31, and even lower homology to those of other bat coronaviruses. These data further suggested that 2019-nCoV is more likely a new type of bat coronavirus with only loose linkage to the SARS virus. Interestingly, different regions of the S protein exhibited different levels of homology among the known coronaviruses (Figure 1b). Amino acid sequence alignment consistently showed that the N-terminal regions were far more diverse than the C-terminus, which appeared to be highly conserved (Figure 2). Mapping these regions onto the structure of the S protein indicated that the conserved C-terminal region aligned well with the transmembrane domain, which consists of a double helix, whereas the most variable region aligned with the N-terminal domain; the receptor binding domain exhibited an intermediate level of sequence variation (Figure 2).

Because of the high amino acid sequence homology between the S protein of 2019-nCoV and that of the SARS virus, which can cause severe human infection, we analyzed the structural similarity of this protein in various viruses. Protein structure modeling was performed to obtain high-quality structures of the S proteins of different coronaviruses (Table 1). The high level of similarity observed between the structures of the S proteins from different viruses implied that the S proteins of 2019-nCoV and bat coronaviruses would most likely use the same human cell receptor as the SARS virus (Table 2). Angiotensin-converting enzyme 2 (ACE2) has been shown to be the cellular receptor of the SARS virus S protein [18]. The structure of ACE2 in complex with the receptor binding domain (RBD) of the SARS virus S protein has been resolved and demonstrates tight interaction between these two proteins at the interaction interface [5]. Current studies have confirmed that 2019-nCoV does bind to ACE2. These data prompted us to determine the level of interaction between the S protein of 2019-nCoV and its potential
Figure 1. Phylogenetic analysis and sequence alignment of coronaviruses of different species. (a) Phylogenetic tree of coronaviruses from different species. The type of coronavirus and the host are labelled. The virus labelled in red is the newly discovered coronavirus 2019-nCoV. (b) Sequence alignment of the new 2019-nCoV with representative bat SARS-like coronaviruses and bat coronaviruses. These included two highly homologous human SARS coronaviruses: SARS CoV P2 (FJ882963) and SARS CoV ZJ02 (EU371559); one bat SARS virus showing high homology with the human SARS virus and a similar potential to infect humans: bat SARS CoV W1V1 (KF367457); two bat SARS-like viruses that are not able to infect humans (bat-SL-CoVZX45 and Rp Shaanxi2011); and the newly discovered 2019-nCoV from Wuhan.
Figure 2. Amino acid sequence alignment of coronaviruses from bat and human. The upper panel is the amino acid sequence alignment and the lower panel is the structure of the S protein. The three domains of the S protein are labelled in red, blue and green, matching the colors of the aligned sequences. The sequences that exhibit the highest degree of variation are in the N-terminal domain, followed by those in the RBD. The C-terminal green domain is the most conserved among the tested viruses.
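The domain-by-domain comparison described for the S protein can be illustrated with a minimal Python sketch that computes percent identity over a pairwise alignment, both overall and per region. This is an illustration under stated assumptions, not the procedure used in the study: the two sequences are short made-up fragments, the region boundaries are hypothetical, and the function names `percent_identity` and `identity_by_region` are introduced here only for the example. Gapped columns are skipped, which is one of several common conventions.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of compared alignment columns with identical residues.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_by_region(
    a: str, b: str, regions: Dict[str, Tuple[int, int]]
) -> Dict[str, float]:
    """Percent identity inside each named [start, end) slice of the alignment."""
    return {
        name: percent_identity(a[start:end], b[start:end])
        for name, (start, end) in regions.items()
    }


# Hypothetical aligned fragments (NOT real spike protein sequences).
seq_a = "MKTLLVAGCDEFGHIKLMNPQRST"
seq_b = "MRSL-IAGCDQFGHIKLMNPQRST"

# Hypothetical region boundaries in alignment coordinates.
regions = {"N-terminal": (0, 8), "middle": (8, 16), "C-terminal": (16, 24)}

overall = percent_identity(seq_a, seq_b)
per_region = identity_by_region(seq_a, seq_b, regions)

print(f"overall: {overall:.1f}%")
for name, value in per_region.items():
    print(f"{name}: {value:.1f}%")
```

In this toy input the first region is the most variable and the last is fully conserved, which mirrors the qualitative pattern reported for the N-terminal domain, the RBD and the C-terminal domain.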
cellular receptor ACE2. Using the modeled S protein structures and further performing structure-based alignment, we obtained the complex structures of the RBD of the S protein of 2019-nCoV and of several bat coronaviruses with human ACE2, using the complex structure of the RBD of the S protein from the human SARS virus and ACE2 (2ajf) as reference [5]. Structural analysis of potential interactions between the RBD of the S protein from the human SARS virus and the ACE2 protein depicted several interaction points, including four hydrophobic interactions: ACE2(Y41)/RBD(Y484), ACE2(L45)/RBD(Y484), ACE2(L79, M82)/RBD(L472), ACE2(Y83)/RBD(Y475),
Table 1. Assessment of the quality of modeled structures of the spike protein and its receptor binding domain (RBD) of different coronaviruses, using protein structures of the human SARS virus as templates.
one salt bridge: ACE2(E329)/RBD(R426), and one cation-π interaction: ACE2(K353)/RBD(Y491) (Figure 3a–e). However, examination of the interaction between the RBD of 2019-nCoV and human ACE2 depicted only one potential hydrophobic interaction, between ACE2(L79, M82) and RBD(F486), and one cation-π interaction, ACE2(K353)/RBD(Y492). Further examination of the interactions of the RBDs from the bat SARS-like coronavirus bat-SL-CoVZX45 and bat coronavirus HKU3-1, which do not infect humans, showed only one cation-π interaction, ACE2(K353)/RBD(Y481) [20,21]. Another bat-originated coronavirus, bat SARS CoV W1V1, which displayed strong binding to ACE2 and exhibited the potential to cause human infection, was also included in the analysis [17]. The binding of the RBD from this virus to ACE2 was as tight as that of the human SARS virus, involving four hydrophobic interactions: ACE2(Y41)/RBD(Y485), ACE2(L45)/RBD(Y485), ACE2(L79, M82)/RBD(F473), ACE2(Y83)/RBD(Y476); one salt bridge: ACE2(E329)/RBD(R427); and one cation-π interaction: ACE2(K353)/RBD(Y492). These data suggested that a higher binding affinity of the RBD of a coronavirus for ACE2 confers higher infectivity and pathogenicity on the virus. The fact that the RBD of 2019-nCoV exhibited much lower affinity for ACE2 implies that the virulence potential of 2019-nCoV should be much lower than that of the human SARS virus, but nevertheless higher than that of viruses that do not cause human infection [20,21]; this finding is also consistent with the current epidemiological data in that 2019-nCoV only caused severe pneumonia in patients with weaker immune systems, such as the elderly and people with underlying diseases. The weaker binding affinity of 2019-nCoV for human cells might also explain the limited human-to-human transmission potential of this virus observed to date.
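The comparison above rests on how many ACE2/RBD contacts of each type were identified per virus. As a minimal sketch, the contacts listed in the text can be transcribed into a small Python table and tallied by interaction type. The residue pairs are copied from the paragraph above; the data layout and the names `CONTACTS`, `count_by_type` and `summarize` are assumptions made for this example, and the tally only counts listed contacts, it does not compute a binding energy.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (interaction type, ACE2 residue(s), RBD residue), transcribed from the text.
Contact = Tuple[str, str, str]

CONTACTS: Dict[str, List[Contact]] = {
    "human SARS virus": [
        ("hydrophobic", "Y41", "Y484"),
        ("hydrophobic", "L45", "Y484"),
        ("hydrophobic", "L79,M82", "L472"),
        ("hydrophobic", "Y83", "Y475"),
        ("salt bridge", "E329", "R426"),
        ("cation-pi", "K353", "Y491"),
    ],
    "bat SARS CoV W1V1": [
        ("hydrophobic", "Y41", "Y485"),
        ("hydrophobic", "L45", "Y485"),
        ("hydrophobic", "L79,M82", "F473"),
        ("hydrophobic", "Y83", "Y476"),
        ("salt bridge", "E329", "R427"),
        ("cation-pi", "K353", "Y492"),
    ],
    "2019-nCoV": [
        ("hydrophobic", "L79,M82", "F486"),
        ("cation-pi", "K353", "Y492"),
    ],
    # The text reports the same single contact for these two viruses.
    "bat-SL-CoVZX45 / HKU3-1": [
        ("cation-pi", "K353", "Y481"),
    ],
}


def count_by_type(contacts: List[Contact]) -> Counter:
    """Number of listed contacts per interaction type."""
    return Counter(kind for kind, _ace2, _rbd in contacts)


def summarize(table: Dict[str, List[Contact]]) -> Dict[str, Counter]:
    """Per-virus tally of contacts by interaction type."""
    return {virus: count_by_type(contacts) for virus, contacts in table.items()}


summary = summarize(CONTACTS)
for virus, counts in summary.items():
    total = sum(counts.values())
    detail = ", ".join(f"{kind}: {n}" for kind, n in sorted(counts.items()))
    print(f"{virus}: {total} contacts ({detail})")
```

Keeping the residue pairs as data rather than prose makes it easy to add another virus, or to revise a contact list, and re-run the same comparison.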
Figure 3. Potential interactions between the receptor binding domain (RBD) of S proteins from different coronaviruses and the human cell receptor ACE2. Interactions of ACE2 with the RBD of the human SARS virus (a), the highly similar bat SARS-like coronavirus CoV-W1V1, which can infect humans (b), the new type of coronavirus, 2019-nCoV (c), and the bat SARS-like coronavirus CoV SL-CoVZXC21 and bat coronavirus HKU3-1 (d) are shown. The detailed amino acid interaction sites between these two proteins are shown in (e). Arrows indicate the areas containing interacting residues from both proteins. Amino acids highlighted in different colors indicate the potential interacting residues of the different proteins.
Discussion
In this study, we used the whole genome sequence of the newly discovered coronavirus, 2019-nCoV, which caused an outbreak of pneumonia in Wuhan, China, to perform comparative genetic and functional analysis with the human SARS virus and coronaviruses recovered from different animals. Phylogenetic analysis of coronaviruses of different species indicated that 2019-nCoV might have originated from bats, but the intermediate transmission vehicle is not known at this stage. Genetic linkage analysis showed that 2019-nCoV lies at the interface between the bat SARS-like coronaviruses and the bat coronaviruses and should belong to a novel type of bat coronavirus, owing to its high degree of variation from the human SARS virus. Analysis of the potential interaction of the RBD of 2019-nCoV with the human ACE2 receptor protein indicated that its affinity for human cells is much lower than that of the human SARS virus because of the loss of several important interaction sites, implying that the infectivity and pathogenicity of this new virus should be much lower than those of the human SARS virus. These data are the most comprehensive scientific data supporting the origin of this new virus, which is important for subsequent research to identify the intermediate transmission vehicles, most likely wild animals. Most importantly, our data support the view that the pathogenicity of 2019-nCoV is much lower than that of the SARS virus. These data are very important for the current prevention and treatment of infections caused by this new virus. First, the lower pathogenicity of this new virus may suggest that the infectious dose should be large and the incubation time should be longer, which implies that 1) human-to-human transmission should mainly happen through close contact within a certain period of time; 2) the longer incubation time should favor the transmission of this virus among humans, which suggests that patients with no or mild symptoms should also be isolated to prevent further transmission; and 3) supportive treatment should be enough to save the lives of severe cases. Second, the lower pathogenicity of the virus may suggest that most patients should have mild symptoms but are nevertheless infectious. Appropriate measures should be designed to isolate these patients without occupying a large amount of hospital resources. Third, the lower pathogenicity of the new virus may suggest that it adapts to humans much better, which implies that the virus may not disappear after a few generations of transmission and that a long-term battle with this virus should be prepared for. Lastly, the epidemiology of the outbreak in its early stage was very different from the current dramatic situation. This does not exclude the possibility that the virus has undergone human adaptation and mutation in the past month. Using the analysis platform developed above, we should be able to predict in a very short time whether new mutations could increase the infectivity of the mutated virus.

Data availability
Underlying data
All data underlying the results are available as part of the article and no additional source data are required.
Extended data
Figshare: Extended data for 2019-nCoV. https://doi.org/10.6084/m9.figshare.11848224 [22]
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
References
1. Nishimura Y, Yoshida T, Kuronishi M, et al.: ViPTree: the viral proteomic tree server. Bioinformatics. 2017; 33(15): 2379–2380.
2. Bah T: Inkscape: guide to a vector drawing program[M]. Upper Saddle River, NJ USA: Prentice Hall. 2011.
3. Kumar S, Stecher G, Li M, et al.: MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018; 35(6): 1547–1549.
4. Larkin MA, Blackshields G, Brown NP, et al.: Clustal W and Clustal X version 2.0. Bioinformatics. 2007; 23(21): 2947–2948.
5. Li F, Li W, Farzan M, et al.: Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science. 2005; 309(5742): 1864–1868.
6. Madeira F, Park YM, Lee J, et al.: The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019; 47(W1): W636–W641.
7. Yuan Y, Cao D, Zhang Y, et al.: Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains. Nat Commun. 2017; 8: 15092.
8. Waterhouse A, Bertoni M, Bienert S, et al.: SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018; 46(W1): W296–W303.
9. Lu R, Zhao X, Li J, et al.: Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. 2020; pii: S0140-6736(20)30251-8.
10. Ji JS: Origins of MERS-CoV, and lessons for 2019-nCoV. Lancet Planet Health. 2020; pii: S2542-5196(20)30032-2.
11. Lee PI, Hsueh PR: Emerging threats from zoonotic coronaviruses-from SARS and MERS to 2019-nCoV. J Microbiol Immunol Infect. 2020; pii: S1684-1182(20)30011-6.
12. Li Q, Guan X, Wu P, et al.: Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med. 2020.
13. Rothe C, Schunk M, Sothmann P, et al.: Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany. N Engl J Med. 2020.
14. Zhu N, Zhang D, Wang W, et al.: A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020.
15. Lau SK, Woo PC, Li KS, et al.: Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats. Proc Natl Acad Sci U S A. 2005; 102(39): 14040–14045.
16. To KK, Hung IF, Chan JF, et al.: From SARS coronavirus to novel animal and human coronaviruses. J Thorac Dis. 2013; 5 Suppl 2: S103–108.
17. Ge XY, Li JL, Yang XL, et al.: Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature. 2013; 503(7477): 535–538.
18. Li W, Moore MJ, Vasilieva N, et al.: Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature. 2003; 426(6965): 450–454.
19. Zhou P, Yang XL, Wang XG, et al.: A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020.
20. Dominguez SR, Shrivastava S, Berglund A, et al.: Isolation, propagation, genome analysis and epidemiology of HKU1 betacoronaviruses. J Gen Virol. 2014; 95(Pt 4): 836–848.
21. Hu D, Zhu C, Ai L, et al.: Genomic characterization and infectivity of a novel SARS-like coronavirus in Chinese bats. Emerg Microbes Infect. 2018; 7(1): 154.
22. Chen S: Extended data for 2019-nCoV. figshare. Dataset. 2020. http://www.doi.org/10.6084/m9.figshare.11848224.v1