R for NGS Analysis and Bioconductor

The document discusses the application of R and Bioconductor in next-generation sequencing (NGS) analysis, highlighting various tools and packages for RNA-seq, ChIP-seq, and SNP-seq. It emphasizes the importance of R as a programming language in bioinformatics and outlines the capabilities of Bioconductor for genomic data analysis. Key topics include data import/export, differential expression analysis, and the integration of biological metadata.

Uploaded by

azhagar_ss

R & NGS

Dr. G. Ramesh Kumar, PhD.,

AU-KBC Research Centre,
MIT, Anna University,
Chromepet, Chennai-44.
UNIT III
• Application of R in NGS analysis:
• 5 TOPICS
• Introduction to Bioconductor GR
• Reading of RNA-seq data (ShortRead,
Rsamtools, GenomicRanges),
• annotation (biomaRt, genomeIntervals),
• reads coverage and assign counts (IRanges,
GenomicFeatures),
• differential expression (DESeq).
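The "reads coverage and assign counts" topic can be illustrated with a small, language-neutral sketch. It is written in Python rather than R and does not use or reproduce the IRanges or GenomicFeatures API. The function names, the 1-based closed interval convention, and the toy reads and features are all assumptions made for illustration.

```python
def coverage(reads, length):
    """Per-base read depth over positions 1..length.

    Reads are (start, end) pairs, 1-based and inclusive, and are assumed
    to lie entirely within 1..length.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 2)
    for start, end in reads:
        diff[start] += 1
        diff[end + 1] -= 1
    depth = []
    running = 0
    for pos in range(1, length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def count_overlaps(features, reads):
    """Number of reads overlapping each feature by at least one base."""
    counts = {}
    for name, (f_start, f_end) in features.items():
        counts[name] = sum(
            1 for r_start, r_end in reads
            if r_start <= f_end and r_end >= f_start
        )
    return counts


# Toy data (hypothetical): three reads on a 10-base reference, two features.
reads = [(1, 4), (3, 6), (5, 8)]
features = {"geneA": (1, 3), "geneB": (7, 10)}
print(coverage(reads, 10))
print(count_overlaps(features, reads))
```

The resulting per-feature counts are the kind of table that a differential expression step takes as input.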
REF

• [Link]
home/ht-seq#R_BACK
Application of R in NGS analysis
• R and Bioconductor tools are central to many applications in:
• Genome annotation and
• NGS analysis areas, such as
• RNA-Seq,
• ChIP-Seq and
• SNP-Seq.
Application of R in NGS analysis

• Seq2pathway: an R/Bioconductor package for pathway analysis of next-generation sequencing data
R
• In recent years the R language has become the lingua franca of data-intensive research, and is now by far the most widely used data analysis programming language in bioinformatics.
• One of the outstanding strengths of the R
language is the ease of programming extensions
to automate the analysis and mining of almost
any data type.
R

• The following topics will be introduced:

• (1) conditional executions,
• (2) loops,
• (3) writing custom functions,
• (4) calling external software,
• (5) running and debugging R programs, and
• (6) building custom R packages.
R
• R ([Link]) is a versatile data analysis environment that has a broad application spectrum in all experimental and quantitative scientific areas.
• The associated Bioconductor project provides access to over 700 R extension packages for the analysis of modern biological and biomedical data sets, such as next-generation sequencing, comparative genomics, network modeling and statistical analysis.
R
• The R software is free and runs on all common operating
systems.

• The following topics will be covered:

• (1) command syntax,
• (2) basic functions,
• (3) data import/export,
• (4) data/object types,
• (5) graphical display,
• (6) usage of R packages/libraries (e.g. Bioconductor) and
• (7) using R for basic data analysis operations.
Bioconductor
• [Link]
• Bioconductor is a free, open source and open
development software project for the analysis and
comprehension of genomic data generated by wet
lab experiments in molecular biology.
• [Link]
• Bioconductor provides tools for the analysis and
comprehension of high-throughput genomic data.
• Bioconductor uses the R statistical programming
language, and is open source and open
development.
Why Open Source
• so that you can find out what algorithm is being
used, and how it is being used
• so that you can modify these algorithms to try
out new ideas or to accommodate local
conditions or needs
• so you can read the code, find bugs, suggest
improvements etc.
• so that they can be used as components (potentially modified) in other people's software
Overview
• biology is a computational science
• problems of data analysis, data generation,
reproducibility require computational support and
computational solutions
• we value code reuse
– many of the tasks have already been solved
– if we use those solutions we can put effort into new
research
• well designed, self-describing data structures help us
deal with complex data
Goals
• Provide access to powerful statistical and graphical methods
for the analysis of genomic data.
• Facilitate the integration of biological metadata (GenBank,
GO, Entrez Gene, PubMed) in the analysis of experimental
data.
• Allow the rapid development of extensible, interoperable, and
scalable software.
• Promote high-quality documentation and reproducible
research.
• Provide training in computational and statistical methods.
Bioconductor packages
Release 2.10, 554 Software Packages!
• General infrastructure: Biobase, Biostrings, biocViews
• Annotation: annotate, annaffy, biomaRt, AnnotationDbi, data packages
• Graphics/GUIs: geneplotter, hexbin, limmaGUI, exploRase
• Pre-processing: affy, affycomp, oligo, makecdfenv, vsn, gcrma, limma
• Differential gene expression: genefilter, limma, ROC, siggenes, EBarrays, factDesign
• GSEA/Hypergeometric testing: GSEABase, Category, GOstats, topGO
• Graphs and networks: graph, RBGL, Rgraphviz
• Flow cytometry: flowCore, flowViz, flowUtils
• Protein interactions: ppiData, ppiStats, ScISI, Rintact
• Sequence data: Biostrings, ShortRead, rtracklayer, IRanges, GenomicFeatures, VariantAnnotation
• Other data: xcms, DNAcopy, PROcess, aCGH, rsbml, SBMLR, Rdisop
Component software

• interesting problems will require the coordinated application of many different techniques
• thus we need integrated interoperable software
• of primary importance is well designed and shared data structures
Data complexity
• Dimensionality.
• Dynamic/evolving data: e.g., gene annotation, sequence,
literature.
• Multiple data sources and locations: in-house, WWW.
• Multiple data types: numeric, textual, graphical.
No longer just a single n × p data matrix X!
We distinguish between biological metadata and
experimental metadata.
Experimental metadata

• when were the samples processed and how
• what arrays were used/what kits
• if size selection of some sort (e.g. fractionation for proteomics experiments) was used
• date the samples were run
• lane or chip information
• treatments
Biological metadata
• Biological attributes that can be applied to the
experimental data.
• E.g. for genes
– chromosomal location;
– gene annotation (Entrez Gene, GO);
– gene models
– relevant literature (PubMed)
• Biological metadata sets are large, evolving rapidly, and
typically distributed via the WWW.
• Tools: the annotate, biomaRt, AnnotationDbi, and GenomicFeatures packages, plus annotation data packages.
Annotation packages
annotate, annaffy, biomaRt, and AnnotationDbi
• Assemble and process genomic annotation data from public repositories.
• Build annotation data packages.
• Associate experimental data in real time to biological metadata from web databases such as GenBank, GO, KEGG, Entrez Gene, and PubMed.
• Process and store query results: e.g., search PubMed abstracts.
• Generate HTML reports of analyses.
• Metadata package hgu95av2: mappings between different gene IDs for this chip. Example for AffyID 41046_s_at:
– GENENAME: zinc finger protein 261
– ENTREZID: 9203
– ACCNUM: X95808
– MAP: Xq13.1
– SYMBOL: ZNF261
– PMID: 10486218, 9205841, 8817323
– GO: GO:0003677, GO:0007275, GO:0016021
– + many other mappings
Sequence Annotation
• for a given gene:
– gene models
– sequence
– exon/intron boundaries
– location
– conservation
• often in the form of tracks
• it is important to keep track of the reference
genome being used
Vignettes
• Bioconductor developed a new documentation
paradigm, the vignette.
• A vignette is an executable document consisting of a
collection of documentation text and code chunks.
• Vignettes form dynamic, integrated, and reproducible
statistical documents that can be automatically
updated if either data or analyses are changed.
• Vignettes can be generated using the Sweave function, which ships with R (in the utils package in current releases).
Bioconductor Software

• concentrate development resources on a few important aspects
• Biobase: core classes and definitions that allow for succinct description and handling of the data
• annotate: generic functions for annotation that can be specialized
• genefilter/limma/DESeq/DEXSeq: differential expression
• ShortRead/IRanges/GenomicFeatures/VariantAnnotation: string manipulations, sequence analysis
Quality Assessment
• ensuring that the data are of sufficient quality
is an essential first step
• arrayQualityMetrics: comprehensive quality assessment of microarrays (one-color or two-color)
– modifications are coming to make it more suitable
for sequence data
• ShortRead: tools for QA of short reads,
primarily Illumina
Biobase:ExpressionSet
• software should help organize and manipulate your
data
• the data need to be assembled correctly once, and
then they can be processed, subset etc without
worrying about them
• we developed the ExpressionSet class
• SummarizedExperiment class is the next iteration in
this process (in the GenomicRanges package)
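The idea of assembling the data correctly once and then subsetting without worrying can be sketched as a small container that validates its parts on construction and keeps values and sample metadata aligned when subset. This is a language-neutral illustration in Python, not the ExpressionSet or SummarizedExperiment class. The class name, field names, and toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpressionTable:
    """Toy container: features x samples values plus per-sample metadata."""
    features: list      # row names
    samples: list       # column names
    values: list        # values[i][j] is feature i measured in sample j
    sample_info: dict   # sample name -> dict of experimental metadata

    def __post_init__(self):
        # Validate once, at assembly time, so later code can trust the object.
        if len(self.values) != len(self.features):
            raise ValueError("one row of values is required per feature")
        for row in self.values:
            if len(row) != len(self.samples):
                raise ValueError("one value is required per sample in every row")
        missing = [s for s in self.samples if s not in self.sample_info]
        if missing:
            raise ValueError(f"no metadata for samples: {missing}")

    def subset_samples(self, keep):
        """Return a new table restricted to the samples in `keep`, in that order."""
        cols = [self.samples.index(s) for s in keep]
        return ExpressionTable(
            features=list(self.features),
            samples=list(keep),
            values=[[row[j] for j in cols] for row in self.values],
            sample_info={s: self.sample_info[s] for s in keep},
        )


# Toy data (hypothetical): two features measured in three samples.
table = ExpressionTable(
    features=["g1", "g2"],
    samples=["s1", "s2", "s3"],
    values=[[1, 2, 3], [4, 5, 6]],
    sample_info={
        "s1": {"treatment": "control"},
        "s2": {"treatment": "drug"},
        "s3": {"treatment": "drug"},
    },
)
treated = table.subset_samples(["s2", "s3"])
print(treated.values, treated.sample_info)
```

Because the subset goes through the same constructor, the values and the metadata cannot drift out of step.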
Microarray data analysis
• Pre-processing: CEL, CDF files via affy, vsn; .gpr, .Spot files via marray, limma, vsn; the result is an ExpressionSet
• Differential expression: edd, genefilter, limma, multtest, ROC, + CRAN
• Graphs & networks: graph, RBGL, Rgraphviz
• Cluster analysis (CRAN): class, cluster, MASS, mva
• Prediction (CRAN): class, e1071, ipred, LogitBoost, MASS, nnet, randomForest, rpart
• Annotation: annotate, annaffy, biomaRt, + metadata packages
• Graphics: geneplotter, hexbin, + CRAN
Differential Expression
• limma: provides a linear models interface for
DE
– uses a moderated variance
– a variety of p-value correction methods are
provided
• DESeq and edgeR: for sequence data
– similar approach to limma
– model count data with a negative binomial distribution
• DEXSeq for exon level differential expression
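One widely used p-value correction is the Benjamini-Hochberg adjustment, which controls the false discovery rate across many tests. The slides do not say which corrections each package offers, so this choice is an assumption. The sketch below is plain Python, not limma, DESeq or edgeR code. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, scaling by n / rank and
    # enforcing that adjusted values never increase as p decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Genes whose adjusted value falls below a chosen threshold (for example 0.05) would then be reported as differentially expressed at that false discovery rate.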
Machine Learning
• Software for machine learning has been written by many
different people
– the calling sequences and return values are unique to each
method
• MLInterfaces provides uniform calling sequences and return values for all machine learning algorithms
• MLearn is the main wrapper function
– methods, e.g. knnI, are passed to the wrapper
• return values are of class MLOutput
• see the MLInterfaces vignette for more details
Publications
• Bioconductor: Open software development for
computational biology and bioinformatics, Genome
Biology 2004, 5:R80,
[Link]
• Bioinformatics and Computational Biology Solutions
using R and Bioconductor, Springer, 2005, R.
Gentleman, V. Carey, W. Huber, R. Irizarry, S. Dudoit
eds.
• Bioconductor Case Studies, Springer
• R Programming for Bioinformatics, Chapman Hall
Comprehensive R Archive Network
• CRAN is a network of ftp and web servers around the world that store identical, up-to-date versions of code and documentation for R.
• [Link]
