
Biomarker Discovery Tutorial

This tutorial guides users through the process of discovering gene biomarkers from RNA-seq data using R, specifically with a breast cancer dataset (GSE5364). It covers the installation of necessary R packages, loading data, performing differential expression analysis with DESeq2, visualizing results with EnhancedVolcano, and conducting GO enrichment analysis. The tutorial emphasizes the importance of these methods in biomarker research and clinical diagnostics.

Uploaded by

Mazi Sopuru

Beginner Tutorial: Computational Biomarker Discovery Using RNA-seq

1. Introduction to Biomarker Discovery

Biomarker discovery involves identifying biological molecules (genes, proteins, metabolites) that are indicators of a particular disease state or therapeutic response. In this tutorial, you'll learn how to identify gene biomarkers from RNA-seq data using R. We'll work with a real dataset: GSE5364 (Breast Cancer dataset from the GEO database).

2. Requirements

You need to install R and RStudio. Then install the required R packages by running the following code:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("DESeq2", "GEOquery", "pheatmap", "ggplot2",
                       "EnhancedVolcano", "org.Hs.eg.db", "clusterProfiler"))

3. Loading RNA-seq Data

We'll download the breast cancer gene expression data from GEO:

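library(GEOquery)
library(Biobase)  # provides exprs() and pData()
# getGEO() returns a list of ExpressionSet objects; the first element is used here
gse <- getGEO("GSE5364", GSEMatrix = TRUE)
exprSet <- exprs(gse[[1]])      # expression matrix: rows are features, columns are samples
phenoData <- pData(gse[[1]])    # sample annotations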
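
Before moving on, it can help to confirm what the downloaded matrix actually contains. The sketch below uses a small made-up matrix in place of exprSet, and the helper name looks_like_counts is hypothetical (it is not part of GEOquery). It checks whether every value is a non-negative whole number, which is what count-based tools such as DESeq2 require.

```r
# Hypothetical helper: TRUE if a matrix could hold raw read counts
looks_like_counts <- function(mat) {
  vals <- mat[!is.na(mat)]
  all(vals >= 0) && all(vals == round(vals))
}

# Toy stand-ins for exprSet (values are invented for illustration)
raw_counts <- matrix(c(0, 12, 530, 7, 88, 1024), nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
normalized <- log2(raw_counts + 1.5)

dim(raw_counts)                 # genes x samples
looks_like_counts(raw_counts)   # TRUE
looks_like_counts(normalized)   # FALSE
```

Running the same check on the real exprSet would show whether the matrix is suitable for a count-based method or holds already-normalized values.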

4. Setting Labels

We define sample groups (Cancer vs Normal):

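group <- ifelse(grepl("normal", phenoData$title, ignore.case = TRUE), "Normal", "Cancer")
# List "Normal" first so it is the reference level; fold changes then read as Cancer vs Normal
group <- factor(group, levels = c("Normal", "Cancer"))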
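
The labeling rule is easy to check on a few titles before trusting it on the full dataset. This is a minimal sketch with invented sample titles standing in for phenoData$title. The levels are set explicitly on the assumption that Normal should be the reference group. Note that the pattern would also match a title containing a word such as "abnormal", so inspecting the resulting table is worthwhile.

```r
# Invented sample titles standing in for phenoData$title
titles <- c("Breast tumor 1", "Normal breast 1", "breast tumor 2", "NORMAL breast 2")

# Same rule as in this step: any title containing "normal" (case-insensitive) is Normal
group <- ifelse(grepl("normal", titles, ignore.case = TRUE), "Normal", "Cancer")
group <- factor(group, levels = c("Normal", "Cancer"))

table(group)        # number of samples per group
levels(group)[1]    # reference level
```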

5. Differential Expression Analysis

We use DESeq2 to find differentially expressed genes:

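library(DESeq2)
# DESeq2 requires raw, non-negative integer counts; it stops with an error on normalized or fractional values
dds <- DESeqDataSetFromMatrix(countData = exprSet,
                              colData = data.frame(group, row.names = colnames(exprSet)),
                              design = ~ group)
dds <- DESeq(dds)
# Name the comparison explicitly so positive fold changes mean higher expression in Cancer
res <- results(dds, contrast = c("group", "Cancer", "Normal"))
# Show the genes with the smallest p-values
head(res[order(res$pvalue), ])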
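
Besides the raw p-value, the results table carries an adjusted p-value (padj), which by default uses the Benjamini-Hochberg method. The sketch below reproduces that adjustment with base R on an invented table whose column names mirror those of the results. DESeq2 also applies independent filtering before adjusting, so its padj values will not in general equal a plain p.adjust() over every gene. Treat this as an illustration of the idea only.

```r
# Invented results table; column names mirror those returned by results()
res_toy <- data.frame(
  log2FoldChange = c(2.5, -1.8, 0.3, 1.2, -0.4),
  pvalue         = c(0.001, 0.004, 0.03, 0.02, 0.6),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Benjamini-Hochberg adjustment across all tested genes
res_toy$padj <- p.adjust(res_toy$pvalue, method = "BH")

# Keep genes passing both the significance and effect-size thresholds
hits <- res_toy[res_toy$padj < 0.05 & abs(res_toy$log2FoldChange) > 1, ]
rownames(hits)
```

Filtering on padj rather than pvalue controls the false discovery rate across the thousands of genes tested at once.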

6. Volcano Plot Visualization

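Use EnhancedVolcano to highlight genes that pass both the p-value and fold-change cutoffs:

library(EnhancedVolcano)
EnhancedVolcano(res,
                lab = rownames(res),
                x = "log2FoldChange",
                y = "pvalue",      # raw p-values; pCutoff below applies to this column
                pCutoff = 0.05,
                FCcutoff = 1)      # |log2 fold change| > 1, i.e. at least a 2-fold change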
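
To see what the two cutoffs mean for individual genes, the following sketch classifies a few invented genes with the same thresholds. The helper classify_gene is hypothetical and gives a simplified three-way split. It is not EnhancedVolcano's own colouring logic. The last line computes the y-axis coordinate a volcano plot uses.

```r
# Hypothetical helper mirroring the two cutoffs used in the plot
classify_gene <- function(log2fc, p, p_cutoff = 0.05, fc_cutoff = 1) {
  ifelse(p < p_cutoff & log2fc >  fc_cutoff, "up",
  ifelse(p < p_cutoff & log2fc < -fc_cutoff, "down", "not significant"))
}

# Invented values for five genes
log2fc <- c(2.5, -1.8, 0.3, 1.2, -3.0)
p      <- c(0.001, 0.004, 0.03, 0.2, 0.0001)

status <- classify_gene(log2fc, p)
table(status)

# Volcano plots show effect size on x and -log10(p) on y, so small p-values sit high
y <- -log10(p)
```

A gene needs both a small p-value and a large fold change to be flagged. The third gene is significant but changes too little, and the fourth changes enough but is not significant.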

7. GO Enrichment Analysis

Convert gene symbols to Entrez IDs and test for enriched biological processes:

library(clusterProfiler)
library(org.Hs.eg.db)
# Genes passing both the adjusted p-value and fold-change thresholds
sig_genes <- rownames(res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ])
# keytype = "SYMBOL" assumes the row names are gene symbols; use a different keytype if they are other identifiers
entrez_ids <- mapIds(org.Hs.eg.db, keys = sig_genes, column = "ENTREZID",
                     keytype = "SYMBOL", multiVals = "first")
go_results <- enrichGO(gene = na.omit(entrez_ids),
                       OrgDb = org.Hs.eg.db,
                       ont = "BP",
                       pAdjustMethod = "BH")
barplot(go_results, showCategory = 10)
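
GO enrichment of this kind is an over-representation test: for each term, it asks whether the significant genes contain more members of that term than random sampling would give. The sketch below works through one term with a hypergeometric test in base R. All four numbers are invented for illustration, and the variable names are not taken from clusterProfiler.

```r
# Invented numbers for one GO term
N <- 10000  # background genes with any annotation
M <- 200    # background genes annotated to the term
n <- 100    # significant genes submitted
k <- 10     # significant genes annotated to the term

# P(X >= k) under the hypergeometric null of random draws
p_enrich <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Expected overlap by chance, for comparison with k
expected <- n * M / N
```

Here the observed overlap (10) is well above the chance expectation (2), so the term would count as enriched. In a real analysis this p-value is computed for every term and then adjusted for multiple testing, which is what pAdjustMethod = "BH" requests.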

8. Conclusion

In this case study, we identified potential gene biomarkers for breast cancer using DESeq2 and visualized them with volcano plots. We then explored their biological roles using GO enrichment. This process is foundational for biomarker research and clinical diagnostics.
