Subgenomic RNA identification in SARS-CoV-2 genomic sequencing data
- Matthew D. Parker1,2,3,14,
- Benjamin B. Lindsey4,5,14,
- Shay Leary6,
- Silvana Gaudieri6,7,8,
- Abha Chopra6,
- Matthew Wyles2,
- Adrienn Angyal5,
- Luke R. Green5,
- Paul Parsons9,
- Rachel M. Tucker9,
- Rebecca Brown5,
- Danielle Groves5,
- Katie Johnson4,
- Laura Carrilero9,
- Joe Heffer10,
- David G. Partridge4,5,
- Cariad Evans4,
- Mohammad Raza4,
- Alexander J. Keeley4,5,
- Nikki Smith5,
- Ana Da Silva Filipe11,
- James G. Shepherd11,
- Chris Davis11,
- Sahan Bennett11,
- Vattipally B. Sreenu11,
- Alain Kohl11,
- Elihu Aranday-Cortes11,
- Lily Tong11,
- Jenna Nichols11,
- Emma C. Thomson11,
- The COVID-19 Genomics UK (COG-UK) Consortium12,15,
- Dennis Wang1,2,3,13,
- Simon Mallal6,7 and
- Thushan I. de Silva4,5
- 1Sheffield Bioinformatics Core, The University of Sheffield, Sheffield S10 2HQ, United Kingdom;
- 2Sheffield Institute for Translational Neuroscience, The University of Sheffield, Sheffield S10 2HQ, United Kingdom;
- 3Sheffield Biomedical Research Centre, The University of Sheffield, Sheffield S10 2JF, United Kingdom;
- 4Sheffield Teaching Hospitals NHS Foundation Trust, Department of Virology/Microbiology, Sheffield S10 2JF, United Kingdom;
- 5The Florey Institute for Host-Pathogen Interactions and Department of Infection, Immunity and Cardiovascular Disease, Medical School, University of Sheffield, Sheffield S10 2TN, United Kingdom;
- 6Institute for Immunology and Infectious Diseases, Murdoch University, Murdoch WA 6150, Western Australia, Australia;
- 7Division of Infectious Diseases, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA;
- 8School of Human Sciences, University of Western Australia, Crawley WA 6009, Western Australia, Australia;
- 9Department of Animal and Plant Sciences, The University of Sheffield, Sheffield S10 2TN, United Kingdom;
- 10IT Services, The University of Sheffield, Sheffield S10 2FN, United Kingdom;
- 11Centre for Virus Research, The University of Glasgow, Glasgow G61 1QH, United Kingdom;
- 12https://www.cogconsortium.uk;
- 13Department of Computer Science, The University of Sheffield, Sheffield S1 4DP, United Kingdom
- 14 These authors contributed equally to this work.
Abstract
We have developed periscope, a tool for the detection and quantification of subgenomic RNA (sgRNA) in SARS-CoV-2 genomic sequence data. The translation of the SARS-CoV-2 RNA genome for most open reading frames (ORFs) occurs via RNA intermediates termed “subgenomic RNAs.” sgRNAs are produced through discontinuous transcription, which relies on homology between the body transcription regulatory sequences (TRS-B) upstream of the ORF start codons and the leader transcription regulatory sequence (TRS-L), which is located in the 5′ UTR. TRS-L is immediately preceded by a leader sequence, which is therefore found at the 5′ end of all sgRNAs. We applied periscope to 1155 SARS-CoV-2 genomes from Sheffield, United Kingdom, and validated our findings using orthogonal data sets and in vitro cell systems. By using a simple local alignment to detect reads that contain the leader sequence, we were able to identify and quantify reads arising from canonical and noncanonical sgRNAs. We detected all canonical sgRNAs at the expected abundances, with the exception of ORF10, and we also detected a number of recurrent noncanonical sgRNAs. We show that the results are reproducible using technical replicates and determine the optimum number of reads for sgRNA analysis. In VeroE6 ACE2+/− cell lines, periscope can detect the changes in the kinetics of sgRNA in orthogonal sequencing data sets. Finally, variants found in genomic RNA are transmitted to sgRNAs with high fidelity in most cases. This tool can be applied to all sequenced COVID-19 samples worldwide to provide comprehensive analysis of SARS-CoV-2 sgRNA.
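The leader-detection idea described in the abstract can be illustrated with a minimal sketch. It is not periscope's implementation: the abstract does not state the scoring scheme, thresholds, or normalization the tool uses, so the function names, the scoring parameters, the 80% cutoff, and the `TOY_LEADER` string below are all assumptions made for illustration. In particular, `TOY_LEADER` is an invented placeholder, not the real SARS-CoV-2 leader sequence.

```python
from collections import Counter

# Placeholder only: an invented 25-nt string, NOT the real SARS-CoV-2 leader.
TOY_LEADER = "ACGGTTCAGCTAAGTCCGATTGACA"


def local_alignment_score(query, target, match=2, mismatch=-2, gap=-3):
    """Return the best Smith-Waterman local alignment score of query vs target.

    Scoring values are illustrative assumptions, not periscope's settings.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: scores are floored at zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def classify_read(read, leader, min_fraction=0.8, match=2):
    """Label a read 'sgRNA' if it contains a good local match to the leader.

    min_fraction is a hypothetical cutoff relative to a perfect leader match.
    """
    perfect = match * len(leader)
    score = local_alignment_score(leader, read, match=match)
    return "sgRNA" if score >= min_fraction * perfect else "gRNA"


def count_read_classes(reads, leader):
    """Tally how many reads carry the leader and how many do not."""
    return Counter(classify_read(r, leader) for r in reads)


if __name__ == "__main__":
    reads = [
        TOY_LEADER + "ATGTCTGATAATGGACCC",   # leader joined to an ORF start
        "GGGCCCTTTAAAGGGCCCTTTAAA",           # no leader present
    ]
    print(count_read_classes(reads, TOY_LEADER))
```

A real analysis would work from aligned reads and would also need to map the leader-body junction to an ORF in order to separate canonical from noncanonical sgRNAs; the sketch covers only the leader-detection step.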
Footnotes
- 15 Full lists of Consortium authors and affiliations are located in the Supplemental Material.
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.268110.120.
- Freely available online through the Genome Research Open Access option.
- Received July 1, 2020.
- Accepted February 2, 2021.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.