bioinformatics software
https://doi.org/10.1093/bib/bbab601
Problem Solving Protocol
Abstract
Circular RNAs (circRNAs), transcripts generated by backsplicing, are particularly stable and pleiotropic molecules, whose dysregulation
drives human diseases and cancer by modulating gene expression and signaling pathways. CircRNAs can regulate cellular processes
by different mechanisms, including interaction with microRNAs (miRNAs) and RNA-binding proteins (RBP), and encoding specific
peptides. The prediction of circRNA functions is instrumental for interpreting their impact in diseases, and for prioritizing circRNAs
for functional investigation. Currently, circRNA functional predictions are provided by web databases that do not allow custom
analyses, while self-standing circRNA prediction tools are mostly limited to predicting only one type of function, mainly focusing on
the miRNA sponge activity of circRNAs. To solve these issues, we developed CRAFT (CircRNA Function prediction Tool), a freely
available computational pipeline that predicts circRNA sequence and molecular interactions with miRNAs and RBP, along with their
coding potential. Analysis of a set of circRNAs with known functions has been used to appraise CRAFT predictions and to optimize its
settings. CRAFT provides a comprehensive graphical visualization of the results, links to several knowledge databases, and extensive
functional enrichment analysis. Moreover, it combines, in an original way, the predictions for different circRNAs. CRAFT is a useful tool to help
the user explore the potential regulatory networks involving the circRNAs of interest and generate hypotheses about the cooperation
of circRNAs in the modulation of biological processes.
Introduction
Circular RNAs (circRNAs) are RNA molecules in which, in a process called backsplicing, a downstream 5′ splice site
is covalently linked to an upstream 3′ splice site giving rise to a circle. The backsplice most frequently, although
not always, joins canonical exons, with respect to linear transcript annotation [1].
These peculiar RNAs often exhibit cell type- and tissue-specific expression [2, 3] and can regulate several
biological processes with different mechanisms. Moreover, circRNAs are emerging as key oncogenic or tumor
suppressor molecules whose expression is dysregulated in disease, cancer and when genomic translocations
occur [4–6]. They represent stable diagnostic and prognostic markers [7, 8] and, most interesting, the
identification of new disease mechanisms involving circRNAs has the potential to indicate novel therapeutic
targets.
CircRNA functions mostly involve sequence-specific binding with other nucleic acids or proteins, or specific
coding potential. One prominent mechanism whereby circRNAs function is by sponging microRNAs (miRNAs),
through one or multiple miRNA-response elements (MRE). CircRNAs thus act as competitive endogenous
Anna Dal Molin is a post-doc at the Computational Genomics Laboratory at the Department of Molecular Medicine, University of Padova. Her research interests
include bioinformatics and transcriptomics, circular RNAs in leukemias and development of computational methods for circular RNA function prediction.
Enrico Gaffo is a post-doc at the Computational Genomics Laboratory at the Department of Molecular Medicine, University of Padova. His research interests include
circular RNA, microRNA, advanced methods for RNA-seq data analysis and bioinformatics applied to cancer research.
Valeria Difilippo graduated in Industrial Biotechnologies at the University of Padova working on circRNA prediction for her master thesis.
Alessia Buratin is a PhD student in Biosciences (curriculum Genetics, Genomics and Bioinformatics) of the University of Padova. Her main interests are biostatistics
and bioinformatics, transcriptomics of hematologic malignancies and circular RNA biogenesis.
Caterina Tretti Parenzan is a PhD student in Pediatric Oncoematology of the University of Padua. Her research is mostly focused on the study of the oncogenic
molecular mechanisms involving circRNAs in pediatric acute lymphoblastic leukemia.
Silvia Bresolin is assistant professor of Molecular Biology at the Department of Women and Child Health Department of the University of Padova. Her field of
interest is focused on pediatric leukemia transcriptomics and genomics alterations, as well as functional genomics and development of novel disease models.
Stefania Bortoluzzi is associate professor of Applied Biology at the Department of Molecular Medicine of the University of Padova, where she leads the
Computational Genomics Laboratory. Her research interests include cancer genomics and transcriptomics, bioinformatics, systems biology, noncoding RNAs,
circular RNAs, exosomal RNAs and hematologic malignancies.
Received: November 10, 2021. Revised: December 10, 2021. Accepted: December 26, 2021
© The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/),
which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial
re-use, please contact [email protected]
2 | Dal Molin et al.
RNAs (ceRNA) and regulate the expression of miRNA-target genes (TG) [9]. Several key oncogenic circRNA-miRNA-gene
axes have been described, impacting all cancer hallmarks [10]. With this mechanism, circRNAs
also regulate key epigenetic ‘writers’ [11] controlling DNA methylation and histone modifications.
CircRNAs can also modulate the activity of RNA-binding proteins (RBP), a large class of molecules
involved in most biological processes, since they regulate gene expression at the transcriptional and
post-transcriptional levels [12–14]. The protein-circRNA interaction is sequence-specific and RBP-response elements
(RRE) can be predicted using pattern recognition [15].
Instead, CircRNAprofiler [36] and CircCode [37] are both dedicated to the coding potential of circRNAs, and the
second needs ribosome profiling (Ribo-seq) data as input.
In summary, all of the available methods allow the prediction of a single function of circRNAs and there
is a high need for software tools to comprehensively predict circRNA functions. The circRNA sequence is
a prerequisite for functional predictions, but most methods to detect circRNA from RNA-seq data return
only the backsplice junction coordinates. FcircSEC R package [38] retrieves circRNA sequence using the longer
transcript of the gene as reference.
To fill this gap, we developed CRAFT (CircRNA Function
file with AGO2 binding data can be provided by the user in BED format.
Validated miRNA-TG were retrieved from three different databases (miRecords, miRTarBase and TarBase)
using the multiMiR 1.8.0 R package [43], selecting the ‘strong’ validation category of each database.
RBP binding site prediction
RBP binding sites were predicted with the beRBP tool [44], modified to output the ‘voteFrac’ score for all
predictions, and run in ‘general mode’ searching for all the 143 RBP and 175 Position Weight Matrices of the tool
Assessment and sample analyses
For all the presented analyses, known circRNA backsplice coordinates were retrieved from CircBase. CircRNA
binding with miRNAs or RBP, and the coding potential information, were collected from literature (Table 1).
CircRNA annotation and genomic sequence are based on the Ensembl GRCh38 v.93 human genome. CRAFT was run
with default parameters.
CRAFT sensitivity was calculated as true positive predictions (TP, known elements correctly predicted)
over the total number of known elements (TP + FN, where FN stands for false negatives). The prediction ratio was
calculated as the mean number of predictions at a
Table 1. CircRNAs used for the CRAFT assessment analysis, with previously validated functional elements and biological or pathogenetic roles
CircRNA | Backsplice | Validated functional element | Validated interaction/peptide (aa) | CircRNA binding validation | Tissue | Function | Reference
circBCRC3 | 2:231075511–231087181:+ | MRE | miR-182-5p | Biotin-labeled RNA pulldown assay, FISH, transfection | Bladder cancer | Tumor suppressor | [54]
circHIPK3∗ | 11:33286413-33287511:+ | MRE | miR-124-3p | Luciferase reporter assay, | Liver cancer | Oncogenic | [55]
Figure 1. CRAFT workflow, schematizing the pipeline input, the main analysis steps and the produced output.
different strategies and settings can recover the known functional elements in the test set (sensitivity), and how
the prediction ratio (Methods section) was affected.
The results based on the analysis of the known MRE are presented in Figure 2A. The sensitivity of predictions
obtained by miRanda, PITA and intersection thereof was consistently high (≥0.9), and the large number of
predictions provided by single methods can be controlled by their intersection, as shown by the line plot. In our
tests, the addition of information about experimentally determined AGO2 binding sites was useful. Particularly in
combination with the intersection of miRanda and PITA predictions, it retained good sensitivity (≥0.82), while
reducing the amount of predictions to only 6% of the total set of predictions obtained from the union of
miRanda and PITA (Figure 2A). The starting number of predictions to be intersected and combined with AGO2
binding sites is obviously reduced when more stringent thresholds for the four parameters provided by miRanda
and PITA are set, as shown in Supplementary Figures 1
and 2. The scatterplot of prediction scores provided by miRanda (Supplementary Figure 1) and PITA
(Supplementary Figure 2), with indications of the score pairs associated with known MRE, can help the user to
understand the effect of stringency on MRE detection. For instance, keeping only the predictions with both
scores over the first quartile for both methods, 7/11 known MRE were correctly detected (sensitivity 0.71).
Using the median as threshold for the same four scores, the number of predictions decreased (prediction ratio
19%), still identifying 4/11 positives (sensitivity 0.36).
In summary, considering that the intersection of miRanda and PITA predictions and the overlapping
by CRAFT when only the exons from the canonical transcript (Ensembl id: ENST00000261769.10) were used to
predict the circRNA sequence.
The CircExtractor module
The circRNA nucleotide sequence is necessary to perform functional predictions. Therefore, we included in
our pipeline a module that takes as input the circRNA backsplice coordinates, the genome sequence and the
gene annotation, and reconstructs the circRNA sequence (Figure 1). If circRNA sequences are available to the user,
this step can be skipped.
The CRAFT CircExtractor module reconstructs the
Figure 2. Assessment analysis of circRNAs with known functions. (A) The upper panel shows the number of validated circRNA-miRNA interactions
detected by different methods and combinations thereof for each of the considered circRNAs, whereas the line plot below reports the sensitivity and
the prediction ratio in the same conditions (M, miRanda; P, PITA; M | P, union of M and P; M & P, intersection of M and P; the same with AGO2, predicted
sites overlapping with experimentally determined AGO2 binding sites); the right axis shows the mean number of predictions for circRNA; (B) the upper
panel shows the number of validated circRNA-RBP interactions detected by different beRBP voteFrac thresholds, whereas the line plot below reports
the sensitivity and the prediction ratio in the same conditions; the right axis shows the mean number of predictions for circRNA; (C) distribution of RBP
binding site prediction scores for each circRNA (known binding sites are shown as orange points, the name is shown for the known site with the highest
score; the percentage on the right indicates the prediction ratio of the corresponding voteFrac threshold); the notch boxplot on the right represents the
distribution of the number of predicted RBP per circRNA at a certain voteFrac score; (D) barplot of the amino acid (aa) length of known (in green) and
predicted (in red) peptides for each circRNA.
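The combinations named in the Figure 2 legend (M | P, M & P, and the same restricted to AGO2 sites) can be illustrated with a small sketch. The Python example below is hypothetical and is not CRAFT code: it assumes that each prediction is a (miRNA, start, end) tuple on the circRNA sequence, that two tools agree only when these tuples match exactly, that AGO2 sites are half-open (start, end) intervals, and that sensitivity is TP / (TP + FN) counted on miRNA names. All function names and toy values are invented for illustration.

```python
def overlaps(site, intervals):
    """Return True if the half-open site (start, end) overlaps any interval."""
    start, end = site
    return any(start < i_end and i_start < end for i_start, i_end in intervals)


def combine(m_preds, p_preds, ago2=None, mode="intersection"):
    """Combine two collections of (mirna, start, end) predictions.

    mode="union" mimics 'M | P'; mode="intersection" mimics 'M & P'.
    If ago2 intervals are given, only sites overlapping one of them are kept.
    """
    m, p = set(m_preds), set(p_preds)
    combined = m | p if mode == "union" else m & p
    if ago2 is not None:
        combined = {x for x in combined if overlaps((x[1], x[2]), ago2)}
    return combined


def sensitivity(predicted, known_mirnas):
    """TP / (TP + FN), where positives are the known miRNA names."""
    found = {mirna for mirna, _, _ in predicted}
    tp = len(found & set(known_mirnas))
    return tp / len(known_mirnas)


# Toy data (hypothetical miRNA names and coordinates)
miranda = {("miR-a", 10, 32), ("miR-b", 50, 72), ("miR-c", 90, 112)}
pita = {("miR-a", 10, 32), ("miR-c", 90, 112), ("miR-d", 130, 152)}
ago2_sites = [(0, 40)]
known = ["miR-a", "miR-b"]

union = combine(miranda, pita, mode="union")
both = combine(miranda, pita)
both_ago2 = combine(miranda, pita, ago2=ago2_sites)

for label, preds in [("M | P", union), ("M & P", both), ("M & P + AGO2", both_ago2)]:
    # Fraction of the union that survives each filter, next to the sensitivity
    print(label, len(preds) / len(union), sensitivity(preds, known))
```

In this toy case each filtering step shrinks the prediction set, and sensitivity drops when a known site is lost, which is the trade-off the line plots in Figure 2A and 2B describe.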
is provided for each miRNA, along with different summarization tables, including the number of TG associated
with each miRNA (Figure 4C), the list of miRNAs associated with each TG, and miRNA-TG interactions
associated with drugs or diseases. Functional enrichment results on validated TG are displayed, including
disease-associated genes (Figure 4D), Gene Ontology, KEGG analysis, Reactome pathways (Figure 4E) and MeSH
terms. Of note, miR-15a-5p and miR-15b-5p binding sites were detected for circZNF609, in line with literature data
[17, 61, 62].
The RBP section, after a circular plot with RBP binding positions and the corresponding interactive table
(Figure 5A), displays results of functional enrichments performed on RBP, considering Gene Ontology, KEGG
analysis, Reactome pathways (Figure 5B) and RBP-disease association (Figure 5C). Additional tables provide
protein expression, functional and miscellaneous information associated with RBP.
In the last section, the longest predicted ORF overlapping the backsplice junction is shown by a circular plot
(Figure 5D), while a table reports, for all the predicted ORF, the start and end positions, the distance from the
backsplice junction, and the predicted coding and peptide sequences in FASTA format (Figure 5E).
A default set of high-scoring predictions is reported and visualized in the output HTML pages (see Methods).
The user can tune the filter stringency and regenerate the HTML pages with updated tables and figures
with minimal computation time. Two filtering modes are allowed: (i) a ‘direct mode’, by directly inputting each
prediction tool threshold parameter; (ii) a ‘quantile mode’, to filter a subset of predictions based on a selected
quantile (e.g. display and analyze only the top 25% high-scoring predictions for each tool).
In principle, different circRNAs can act as sponge for the same miRNA, reinforcing the derepression of the
miRNA targets. Multiple circRNAs can regulate the same gene also through different miRNAs (Figure 1). Thus, if
more than one circRNA is given in input, CRAFT integrates the predictions obtained for the different
circRNAs. The common predictions are displayed through tables and figures, to provide valuable hints
about the possible cooperation of circRNAs in the same regulatory axes. The 13 miRNAs with binding sites in the
top 45% high-scoring predictions for both tools and common to multiple circRNAs are shown in Figure 6A.
Moreover, in the CRAFT output, circRNAs are connected to miRNA-validated TG through their predicted interactions
with miRNAs. In addition to tables with TG potentially shared by circRNAs, CRAFT provides a network
visualization of predicted circRNA-miRNA-TG axes in which multiple circRNAs are connected to the same TG through
the same or different miRNAs. For instance, circHIPK3
CRAFT tool for circular RNA function prediction | 9
and circZNF609 can potentially control the expression of a group of eleven genes by sponging miR-1207-5p
(circHIPK3) or miR-5581-3p and miR-611 (circZNF609) (Figure 6B).
Similarly, CRAFT displays RBP associated with RRE predicted in multiple circRNAs. Considering all the
predictions obtained with CRAFT default stringency, 48 RBP are commonly associated with all the three circRNAs.
Of note, an interaction with the FUS RBP is predicted by CRAFT for circHIPK3, in line with literature data [78], as
well as for circSMARCA5. The RBP with binding sites in the top 10% high-scoring predictions and common to
more than one of the three circRNAs considered in our sample analysis are shown in Figure 6C. All the considered
circRNAs have strongly predicted binding sites for SRSF1, including circSMARCA5 whose interaction with SRSF1
has been previously demonstrated [69].
Finally, a barplot (Figure 6D) and a detailed table present the number of predicted ORF for each circRNA,
distinguishing between ORF with and without a stop codon.
Discussion
Robust evidence on circRNA functions, involvement in biological processes, and oncogenic potential make these
molecules extremely attractive for both fundamental and cancer research [79]. CircRNAs can be detected and
quantified from RNA-seq data using appropriate software tools, including [80, 81]. Next, the challenge faced
by studies aiming at defining circRNA roles in disease and cancer is the identification of circRNA-involving
mechanisms. CircRNA screening by massive silencing or overexpression studies and experimental investigation can
be of great help. Nevertheless, the prediction of circRNA potential interactions and functions is a precondition for
the prioritization of circRNAs for functional experimental studies, for the interpretation of experiment results
and the design of targeted mechanistic investigations. Indeed, even once a circRNA has been demonstrated
to significantly impact cell features, such as proliferation, apoptosis, metabolism or differentiation, there is
the need to disclose the involved molecules and regulatory axes.
CRAFT has been developed to overcome the limitations of the currently available circRNA function prediction
methods. It allows the user to explore putative regulatory networks involving one or more circRNAs of
interest, facilitating the interpretation of their biological and pathogenetic role. Moreover, thanks to the
containerization of the software, CRAFT is highly portable and easy to use, and it requires minimal input data.
CRAFT presents four main advantages over the existing tools. First, it provides the putative circRNA sequence.
The sequence extraction is based on annotated exons, following the observation that linear splicing patterns
of circRNAs mainly join canonical exons [82]. The CircExtractor module in CRAFT merges all the overlapping
exons within the circRNA coordinates. This works perfectly well for most cases, as indirectly demonstrated
by our test on circRNAs proven to encode peptides.
Of note, CRAFT correctly detected the known ORF of circSHPRH, which is formed by the circularization of
four exons of the SHPRH gene. In specific instances, the sequence reconstructed by CircExtractor can be
different from the sequence produced by the splicing of the exons included in the canonical transcript. This
is exemplified by circ-E-Cad, whose backsplice ends include a region overlapped by 11 different transcripts
that use alternative 5′- and 3′-exon ends in the linear splicing, while the validated circ-E-Cad follows the
linear splicing pattern of the canonical longer coding transcript of the gene. In general, the uncertainty
associated with circRNA sequence reconstruction based on exon annotations tends to increase with the size
of the genomic region within the backsplice ends and, in particular, with the complexity of the alternative
linear splicing pattern in the region. In this regard, our recommendation is to use the CRAFT sequence
reconstruction module if the circRNA sequences are not known, or for massive analyses of several circRNAs.
Nevertheless, in projects focusing on the functional experimental investigation of one or a few specific
circRNAs, we advise using experimentally determined circRNA sequences, which provide a firmer ground to build
predictions on.
The second main CRAFT asset is that it allows prediction of the three most important circRNA functions,
whose biological relevance is supported by robust literature [9–14, 16, 18–20]. The most investigated
function of circRNAs is the ability to bind miRNAs. One limitation of miRNA binding site prediction algorithms
is the low specificity of the predictions. To face this problem, we used a dual strategy: (i) the combination
of two algorithms, miRanda [40, 83] and PITA [41], and (ii) the integration of predictions with experimentally
determined AGO2 binding sites. The assessment of the CRAFT predictions on a sizable set of circRNAs with
known functions showed that this approach can control the number of predictions, while retaining good sensitivity.
The interaction of circRNAs with RBP can also be associated with important functions [23]. In principle,
circRNAs acting as decoys can counteract RBP functions, whereas in other cases circRNAs can serve as scaffolds
for the assembly of molecular complexes. CRAFT allows inspecting the possible interactions between circRNAs
and RBP, and also the identification of the RBP possibly bound by multiple circRNAs. Of note, beyond the
beRBP database of position weight matrices embedded in CRAFT, custom matrices can be provided by the user,
extending the analysis.
CRAFT predicts the ORF in the circularized sequence, allowing the user to obtain the peptides potentially
encoded by the circRNA and not by the linear transcript. To what extent circRNAs are translated is still a
matter of debate [84], but CEP have been identified in different studies and, in several cases, were also
proven to be biologically active [85–87]. CRAFT aims
at the prediction of circRNA-specific peptides, thus it focuses on those ORF overlapping the backsplice site,
which are not contained in linear transcripts. To undergo translation without a 5′ cap, circRNAs can require
specific sequence elements (internal ribosome entry sites (IRES) [88] and N6-methyladenosine modification
[18, 89]) whose prediction can be envisaged as an interesting CRAFT future development.
The third CRAFT advantage for the user is that it provides a rich output with graphical visualizations,
leveraging functional enrichments and results linking to several knowledge databases. The output can be explored,
filtered and reanalyzed as long as the user prefers to gener-
of interest. The generation of new hypotheses about the potential impact of circRNAs in biological processes,
pathways and diseases can help circRNA prioritization for further study and the interpretation of their biological
and pathogenetic role.
Key Points
• CRAFT is a self-standing tool for comprehensive circRNA function prediction.
• CRAFT functions include circRNA sequence reconstruction, microRNA and RNA-binding protein response
elements and coding potential prediction.
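Two of the functions listed in the Key Points, sequence reconstruction and coding potential prediction, can be illustrated with a minimal sketch. The Python code below is not the CircExtractor or ORF prediction code of CRAFT; it only mirrors the ideas stated in the text (merging overlapping annotated exons within the backsplice coordinates, and keeping ORF that cross the backsplice junction). It assumes a plus-strand circRNA, 0-based half-open coordinates, ATG as the only start codon and a cap of two laps around the circle; all names and the toy genome are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def merge_exons(exons, bs_start, bs_end):
    """Clip exons to the backsplice interval and merge overlapping ones."""
    clipped = sorted(
        (max(s, bs_start), min(e, bs_end))
        for s, e in exons
        if s < bs_end and e > bs_start  # keep exons touching the interval
    )
    merged = []
    for s, e in clipped:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def circ_sequence(genome, exons, bs_start, bs_end):
    """Concatenate the merged exon sequences (plus strand assumed)."""
    return "".join(genome[s:e] for s, e in merge_exons(exons, bs_start, bs_end))


def codon(seq, pos):
    """Read one codon at pos, wrapping around the circular sequence."""
    n = len(seq)
    return seq[pos % n] + seq[(pos + 1) % n] + seq[(pos + 2) % n]


def junction_orfs(seq, max_laps=2):
    """List (start, length, has_stop) for ORF crossing the backsplice junction.

    The junction lies between the last and the first nucleotide of seq;
    length includes the stop codon when one is found.
    """
    n = len(seq)
    orfs = []
    for start in range(n):
        if codon(seq, start) != "ATG":
            continue
        length, has_stop = 3, False
        while length + 3 <= max_laps * n:
            nxt = codon(seq, start + length)
            length += 3
            if nxt in STOPS:
                has_stop = True
                break
        if start + length > n:  # reading passed from the last to the first base
            orfs.append((start, length, has_stop))
    return orfs


# Toy genome: two exons separated by an intron, with redundant annotations
genome = "NNNN" + "GCCTAA" + "GTTTAG" + "CCCATG" + "NNNN"
exons = [(4, 10), (6, 10), (16, 22), (16, 25), (0, 3)]
seq = circ_sequence(genome, exons, bs_start=4, bs_end=22)
print(seq)
print(junction_orfs(seq))
```

In the toy example the only start codon sits at the end of the second exon, so its ORF exists only because the backsplice joins it to the first exon, which is the kind of circRNA-specific ORF the Discussion refers to.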
2. Maass PG, Glažar P, Memczak S, et al. A map of human circular RNAs in clinically relevant tissues. J Mol Med 2017;95:1179–89.
3. Gaffo E, Boldrin E, Dal Molin A, et al. Circular RNA differential expression in blood cell populations and exploration of circRNA deregulation in pediatric acute lymphoblastic leukemia. Sci Rep 2019;9:14670.
4. Dal Molin A, Hofmans M, Gaffo E, et al. CircRNAs dysregulated in juvenile myelomonocytic leukemia: CircMCTP1 stands out. Front Cell Dev Biol 2020;8:613540.
5. Buratin A, Paganin M, Gaffo E, et al. Large-scale circular RNA deregulation in T-ALL: unlocking unique ectopic expression of molecular subtypes. Blood Adv 2020;4:5902–14.
6. Dal Molin A, Bresolin S, Gaffo E, et al. CircRNAs are here to stay: a perspective on the MLL recombinome. Front Genet 2019;10.
25. Glažar P, Papavasileiou P, Rajewsky N. circBase: a database for circular RNAs. RNA 2014;20:1666–70.
26. Yao D, Zhang L, Zheng M, et al. Circ2Disease: a manually curated database of experimentally validated circRNAs in human disease. Sci Rep 2018;8:11018.
27. Ghosal S, Das S, Sen R, et al. Circ2Traits: a comprehensive database for circular RNA potentially associated with disease and traits. Front Genet 2013;4:283.
28. Liu YC, Li JR, Sun CH, Andrews E, et al. CircNet: a database of circular RNAs derived from transcriptome sequencing data. Nucleic Acids Res 2016;44(Database issue):D209–D215.
29. Lyu Y, Caudron-Herger M, Diederichs S. circ2GO: a database linking circular RNAs to gene function. Cancer 2020;12.
30. Dudekula DB, Panda AC, Grammatikakis I, et al. CircInteractome:
49. Soudy M, Anwar AM, Ahmed EA, et al. UniprotR: retrieving and visualizing protein sequence and functional information from universal protein resource (UniProt knowledgebase). J Proteomics 2020;213:103613.
50. Wickham H. ggplot2: Elegant Graphics for Data Analysis, 2009.
51. Wickham H, Averick M, Bryan J, et al. Welcome to the Tidyverse. J Open Source Software 2019;4:1686.
52. Fox J, Weisberg S. An R Companion to Applied Regression, 2018.
53. Yu G. enrichplot: Visualization of Functional Enrichment Result. R package version 1.14.1, 2021.
54. Xie F, Li Y, Wang M, et al. Circular RNA BCRC-3 suppresses bladder cancer proliferation through miR-182-5p/p27 axis. Mol Cancer 2018;17:144.
55. Zheng Q, Bao C, Guo W, et al. Circular RNA profiling reveals
70. Feng Y, Yang Y, Zhao X, et al. Circular RNA circ0005276 promotes the proliferation and migration of prostate cancer cells by interacting with FUS to transcriptionally activate XIAP. Cell Death Dis 2019;10:792.
71. Zhu Y-J, Zheng B, Luo G-J, et al. Circular RNAs negatively regulate cancer stem cells by physically binding FMRP against CCAR1 complex in hepatocellular carcinoma. Theranostics 2019;9:3526–40.
72. Gao X, Xia X, Li F, et al. Circular RNA-encoded oncogenic E-cadherin variant promotes glioblastoma tumorigenicity through activation of EGFR-STAT3 signalling. Nat Cell Biol 2021;23:278–91.
73. Pan Z, Cai J, Lin J, et al. A novel protein encoded by circFNDC3B inhibits tumor progression and EMT through regulating snail in