Biomarker Discovery Tutorial
Biomarker discovery involves identifying biological molecules (genes, proteins, metabolites) that are
indicators of a particular disease state or therapeutic response. In this tutorial, you'll learn how to identify
gene biomarkers from RNA-seq data using R. We'll work with a real dataset: GSE5364 (Breast Cancer
2. Requirements
You need to install R and RStudio. Then install the required R packages by running the following code:
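# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c("DESeq2", "GEOquery", "pheatmap", "ggplot2",
                       "EnhancedVolcano", "org.Hs.eg.db", "clusterProfiler"))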
We'll download the breast cancer gene expression data from GEO:
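library(GEOquery)
library(Biobase)  # provides exprs() and pData()
# Download the series matrix; getGEO() returns a list of ExpressionSet objects
gse <- getGEO("GSE5364", GSEMatrix = TRUE)
exprSet <- exprs(gse[[1]])    # expression matrix: features in rows, samples in columns
phenoData <- pData(gse[[1]])  # sample annotations: one row per sample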
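DESeq2 models raw, non-negative integer counts, whereas many GEO series matrices store normalized or log-transformed values. The sketch below shows one way to check which kind of matrix you have before going further. The helper `looks_like_counts` and the two toy matrices are hypothetical, introduced only for illustration; in practice you would pass `exprSet` to the helper.

```r
# Hypothetical helper: TRUE if every non-missing value is a non-negative whole number
looks_like_counts <- function(mat) {
  vals <- mat[!is.na(mat)]
  length(vals) > 0 && all(vals >= 0) && all(vals == round(vals))
}

# Toy matrices standing in for exprSet (genes in rows, samples in columns)
raw_counts <- matrix(c(0, 12, 250, 3, 48, 1020), nrow = 3,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
log_values <- log2(raw_counts + 1)

looks_like_counts(raw_counts)  # TRUE
looks_like_counts(log_values)  # FALSE
```

If the check returns FALSE for your matrix, a count-based method is probably not the right fit for it, and a linear-model approach designed for normalized intensities (such as limma) is usually the better choice.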
4. Setting Labels
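Group labels are usually derived from the sample annotations. A minimal sketch, assuming the tumor or normal status can be read from a free-text description column: the data frame `pheno_toy` and its `title` column are made up for illustration, and the column that actually separates the groups in `phenoData` has to be found by inspecting it (for example with `colnames(phenoData)`).

```r
# Made-up sample annotations standing in for phenoData
pheno_toy <- data.frame(
  title = c("Breast tumor 1", "Breast normal 1",
            "Breast tumor 2", "Breast normal 2"),
  row.names = c("S1", "S2", "S3", "S4")
)

# Flag each sample whose description contains the word "normal"
is_normal <- grepl("normal", pheno_toy$title, ignore.case = TRUE)

# List "normal" first so it becomes the reference level in later comparisons
condition <- factor(ifelse(is_normal, "normal", "tumor"),
                    levels = c("normal", "tumor"))

table(condition)
```

Setting the reference level explicitly means a positive log2 fold change can be read as higher expression in tumor than in normal tissue.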
library(DESeq2)
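The later steps use a results table called `res`. As a rough sketch of how such a table is typically produced with DESeq2, the example below runs the standard workflow on simulated counts from `makeExampleDESeqDataSet()`, not on GSE5364. The simulated data come with a `condition` factor whose levels are `A` and `B`; these stand in for normal and tumor and are assumptions of this example only.

```r
library(DESeq2)

set.seed(1)
# Simulated count data: 1000 genes, 12 samples, two groups (A and B)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# Estimate size factors and dispersions, fit the model, and test each gene
dds <- DESeq(dds)

# Log2 fold changes, p-values and adjusted p-values for B relative to A
res <- results(dds, contrast = c("condition", "B", "A"))

# Show the genes with the smallest adjusted p-values
head(res[order(res$padj), ])
```

With real count data, the object would instead be built with `DESeqDataSetFromMatrix()`, passing the count matrix, the sample annotations, and a design formula such as `~ condition`.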
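library(EnhancedVolcano)
# Volcano plot: effect size on the x-axis, statistical significance on the y-axis
EnhancedVolcano(res,
                lab = rownames(res),   # label points with the row names of res
                x = "log2FoldChange",
                y = "pvalue",
                pCutoff = 0.05,        # significance threshold
                FCcutoff = 1)          # |log2 fold change| > 1, i.e. at least two-fold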
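The two cutoffs split genes into groups: only genes that pass both the p-value and the fold-change threshold count as candidates. A small sketch of that classification on a made-up results table (`toy_res` and all of its values are invented for illustration):

```r
# Invented results for five genes
toy_res <- data.frame(
  log2FoldChange = c(2.3, -1.8, 0.4, 1.5, -0.2),
  pvalue         = c(0.001, 0.01, 0.0005, 0.20, 0.60),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

p_cutoff  <- 0.05  # same value as pCutoff
fc_cutoff <- 1     # same value as FCcutoff

# A gene is "up" or "down" only if it passes both thresholds
significant <- toy_res$pvalue < p_cutoff
status <- ifelse(significant & toy_res$log2FoldChange >  fc_cutoff, "up",
          ifelse(significant & toy_res$log2FoldChange < -fc_cutoff, "down",
                 "not significant"))
names(status) <- rownames(toy_res)

status
```

Here `geneC` has the smallest p-value but a small effect, and `geneD` has a large effect but a weak p-value, so neither is selected.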
7. GO Enrichment Analysis
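library(clusterProfiler)
library(org.Hs.eg.db)
# Keep genes that pass both the adjusted p-value and the fold-change threshold
sig_genes <- rownames(res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ])
# Convert gene symbols to Entrez IDs (assumes the row names of res are gene symbols)
entrez_ids <- mapIds(org.Hs.eg.db, keys = sig_genes, column = "ENTREZID",
                     keytype = "SYMBOL", multiVals = "first")
# Test for over-represented Biological Process terms, dropping unmapped genes
go_results <- enrichGO(gene = na.omit(entrez_ids),
                       OrgDb = org.Hs.eg.db,
                       ont = "BP",
                       pAdjustMethod = "BH")
barplot(go_results, showCategory = 10)  # plot the top 10 enriched terms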
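Over-representation analysis of this kind rests on a hypergeometric test for each GO term: given how many background genes carry the term, is the overlap with the significant gene list larger than expected by chance? A worked sketch in base R, with all numbers invented for illustration:

```r
N <- 20000  # genes in the background
M <- 400    # background genes annotated with the term
n <- 300    # genes in the significant list
k <- 20     # significant genes annotated with the term

# Overlap expected by chance alone
expected <- n * M / N  # 6

# Probability of seeing k or more annotated genes in a random list of size n
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across several terms
# (the other two p-values are invented as well)
p_adjusted <- p.adjust(c(p_value, 0.04, 0.5), method = "BH")
```

Because one test is run per term, the adjusted p-values rather than the raw ones are the appropriate basis for deciding which terms to report.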
8. Conclusion
In this case study, we identified potential gene biomarkers for breast cancer using DESeq2 and visualized
them with volcano plots. We then explored their biological roles using GO enrichment. This process is