
Transcriptome analysis and genetic diseases

Presented by
Tareque Mohmud Chowdhury
(144701)

Supervisor
Md Abu Raihan Mostafa Kamal, PhD
Professor, Department of Computer Science and Engineering (CSE)
Islamic University of Technology (IUT)

Contents
➢ Introduction
➢ Motivation
➢ Objectives
➢ Literature Review
• Gene and gene expression
• Gene products
• Gene regulation
➢ Outline of methodology
➢ References

Introduction
➢ Cancer is the most common human genetic disease. The transition from a normal cell to a cancerous cell is driven by changes to the cell's DNA, known as mutations.

➢ Genetic mutations can occur randomly during cell division, can be caused by extreme environmental stress, or can be inherited from parents.

➢ Cancer arises from the transformation of normal cells into tumor cells in a multi-stage process that generally progresses from a pre-cancerous lesion to a malignant tumor.

Introduction
➢ A transcriptome is the full range of messenger RNA, or mRNA, molecules expressed by an organism in a cell, in a tissue or in the entire body.

➢ Transcriptome profile analysis can help researchers understand genetic diseases and the progression of a genetic disease from one state to another over time.

Motivation
➢ Cancer is one of the leading causes of death worldwide, accounting for nearly 10 million deaths in 2020, or nearly one in every six deaths. [1]

➢ Early diagnosis offers the best chance of successful cancer treatment. Transcriptome profile analysis may help to diagnose cancer at early stages.

➢ Transcriptome profile analysis also helps to understand the root causes behind cancer development and progression.

➢ Recent reviews [10, 11] have shown that relatively little work has applied Deep Learning (DL) to transcriptome profile analysis, despite DL's capability to discover hidden features in big datasets.

Objective
➢ The objective of this study is to build a deep learning based classifier model that detects cancer types and cancer stages from transcriptome data (gene products) and/or DNA methylation data.
➢ Using the model, identify the set of gene products and/or DNA methylation features that contribute significantly to each cancer type and to cancer progression.
➢ Finally, validate the identified gene products and/or DNA methylation features through pathway analysis.
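
As a rough illustration of the first objective, the sketch below trains a one-hidden-layer neural network with a softmax output in plain NumPy on a samples × genes matrix. The layer size, learning rate, epoch count and the synthetic data are illustrative assumptions; this is not the architecture the study will use.

```python
import numpy as np

def train_mlp(X, y, n_classes, hidden=16, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer ReLU network with softmax output, trained by full-batch gradient descent.

    X: (n_samples, n_genes) expression matrix; y: integer class labels (e.g. cancer types).
    Layer size, learning rate and epoch count are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))  # He initialisation
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), size=(hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                          # one-hot targets
    for _ in range(epochs):
        H = np.maximum(0.0, X @ W1 + b1)              # hidden activations (ReLU)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # softmax class probabilities
        G = (P - Y) / n                               # gradient of mean cross-entropy w.r.t. logits
        dH = (G @ W2.T) * (H > 0)                     # backpropagate through the ReLU
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    H = np.maximum(0.0, X @ W1 + b1)
    return np.argmax(H @ W2 + b2, axis=1)

# Synthetic data: 3 "cancer types", 50 "genes"; each type raises a different block of 10 genes.
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=150)
X = rng.normal(size=(150, 50))
for c in range(3):
    X[y == c, c * 10:(c + 1) * 10] += 2.0
params = train_mlp(X, y, n_classes=3)
accuracy = np.mean(predict(params, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

A real TCGA matrix has tens of thousands of genes, so in practice a framework with mini-batches, regularization and a held-out validation split would replace this hand-written loop.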

Challenges (1)
➢ The architecture of a Deep Learning Network (DLN) defines its working parameters, such as the number, size, and type of layers of the network. The best architecture varies greatly with the dataset, and it has mostly been fine-tuned by intuitive trial and error to achieve a higher validation accuracy on a dataset from a given domain.
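
The trial-and-error tuning described above can at least be made systematic as a grid search over candidate architectures scored by validation accuracy. In this hedged sketch, `train_and_score` is a hypothetical callback standing in for whatever training routine is used, and `fake_score` is a made-up stand-in that only exists so the example runs.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Architecture = Tuple[int, ...]  # hidden-layer sizes, e.g. (64, 32)

def candidate_architectures(depths: List[int], widths: List[int]) -> List[Architecture]:
    """Enumerate architectures with `depth` hidden layers whose widths never increase."""
    archs = []
    for depth in depths:
        for sizes in product(widths, repeat=depth):
            # keep only non-increasing widths (a common "funnel" heuristic)
            if all(a >= b for a, b in zip(sizes, sizes[1:])):
                archs.append(tuple(sizes))
    return archs

def grid_search(archs: List[Architecture],
                train_and_score: Callable[[Architecture], float]
                ) -> Tuple[Architecture, Dict[Architecture, float]]:
    """Score every candidate with the (hypothetical) callback and return the best one."""
    scores = {arch: train_and_score(arch) for arch in archs}
    best = max(scores, key=scores.get)
    return best, scores

def fake_score(arch: Architecture) -> float:
    """Illustration only: pretends that 48 hidden units in total is ideal."""
    return 1.0 / (1.0 + abs(sum(arch) - 48))

archs = candidate_architectures(depths=[1, 2], widths=[16, 32, 64])
best, scores = grid_search(archs, fake_score)
print(len(archs), best)
```

Exhaustive grids grow quickly with depth and width options; random search or Bayesian optimization are common alternatives when each training run is expensive.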

Challenges (2)
➢ Biological data related to cancer genomics comes in a wide variety of categories. Examples include DNA methylation data, mRNA nucleotide sequences, mRNA expression data, miRNA nucleotide sequences, miRNA expression data, and expression data for other RNAs. Choosing the right category to feed to the prepared DLN is not easy.

➢ Moreover, because all categories of data refer to the same cancer samples, all of them or a subset can be fed to the prepared DLN for a better outcome. In this case, the DLN needs to be updated to accept such an array of diverse input data.
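
One simple way to feed several data categories to a single network is early fusion: align the categories on shared sample IDs, scale each block, and concatenate the features. The sketch below assumes each category arrives as a dict of sample ID → feature vector; the category names, the values and the per-block z-scoring are illustrative choices, and intermediate or late fusion (a separate input branch per category) are common alternatives.

```python
import numpy as np

def early_fusion(blocks):
    """Concatenate per-sample features from several data categories.

    blocks: dict mapping a category name (e.g. "mRNA", "methylation") to
            a dict {sample_id: 1-D feature vector}.
    Returns (sample_ids, fused_matrix, column_slices_per_category).
    """
    # Only samples measured in every category can be fused.
    shared = sorted(set.intersection(*(set(b) for b in blocks.values())))
    parts, slices, start = [], {}, 0
    for name, block in blocks.items():
        M = np.array([block[s] for s in shared], dtype=float)
        # z-score each feature within its block so no category dominates by scale
        std = M.std(axis=0)
        std[std == 0] = 1.0
        M = (M - M.mean(axis=0)) / std
        parts.append(M)
        slices[name] = slice(start, start + M.shape[1])
        start += M.shape[1]
    return shared, np.hstack(parts), slices

# Hypothetical values for three samples; S3 has no methylation data and is dropped.
mrna = {"S1": [5.0, 1.0, 0.0], "S2": [3.0, 2.0, 1.0], "S3": [4.0, 0.5, 2.0]}
methylation = {"S1": [0.8, 0.1], "S2": [0.2, 0.9]}
ids, X, cols = early_fusion({"mRNA": mrna, "methylation": methylation})
print(ids, X.shape)  # ['S1', 'S2'] (2, 5)
```

The returned column slices keep track of which inputs came from which category, which is useful later when attributing a prediction back to mRNA or methylation features.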

Gene and Gene Expression

➢ Genes are the coded regions (instructions) of DNA which can be transcribed into various types of RNAs.

➢ Gene expression is the process by which the instructions (genes) in DNA are converted into functional products, such as RNAs or proteins.

➢ mRNA is the only type of RNA that can be translated into proteins. Other RNAs participate in various cellular activities.
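
The conversion of a gene into a product can be illustrated on a toy sequence: transcription copies the DNA coding strand into mRNA (T becomes U), and translation reads the mRNA one codon (three bases) at a time from the start codon AUG until a stop codon. Only a handful of codons from the standard genetic code are included to keep the sketch short; a real implementation would use the full 64-codon table.

```python
# Subset of the standard genetic code (mRNA codon -> amino acid); "*" marks stop codons.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGC": "G", "GGA": "G",
    "AAA": "K", "GAA": "E", "UGG": "W", "UAA": "*", "UAG": "*", "UGA": "*",
}

def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (same sequence, with U in place of T)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon.

    Raises KeyError for codons missing from the abbreviated table above.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ccATGTTTGGCAAATGGTAAgc")
print(mrna, translate(mrna))  # CCAUGUUUGGCAAAUGGUAAGC MFGKW
```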

Gene Products
Transcription + Translation = Central Dogma

DNA/Gene → (transcription) → RNA
• mRNA → (translation) → Protein
• ncRNA
− lncRNA
− miRNA
− snRNA
− circRNA
− tRNA
− pseudogene RNA
− Other RNA

(Level 1: DNA/Gene; Level 2: RNA; Level 3: Protein)

Hierarchy of Gene Products
[Figure: Level 1: genes along the DNA. Level 2: RNAs transcribed from the genes (mRNA, miRNA (~22 nt), lncRNA, circRNA, snRNA, pseudogene RNA, other RNAs). Level 3: proteins, translated from mRNA.]

Gene Regulation
➢ Gene regulation is the process used to control the timing, location and amount of gene expression.
➢ Gene regulation is carried out by a variety of mechanisms, including regulatory proteins/RNAs and chemical modification of DNA (methylation).
➢ A typical animal gene is regulated by its adjacent promoter plus several enhancers that can be located in the 5' and 3' regulatory regions, as well as within introns. Enhancers are, on average, 500 base pairs in length and contain as many as 10 binding sites for multiple transcription factors. [2]
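
Because transcription factors recognize short DNA sequence motifs in enhancers and promoters, a first-pass way to locate candidate binding sites is to scan both strands of a sequence for a consensus motif written in IUPAC codes. The motif `GGRCAT` and the sequence below are hypothetical; real analyses usually score position weight matrices from a motif database rather than exact consensus matches.

```python
import re

# IUPAC nucleotide codes -> regex character classes (subset)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq: str, motif: str):
    """Return sorted (start, strand) pairs for every motif match on either strand.

    Start positions are 0-based coordinates on the forward strand.
    """
    seq = seq.upper()
    pattern = "".join(IUPAC[base] for base in motif.upper())
    length = len(motif)
    hits = []
    # the zero-width lookahead also reports overlapping matches
    for m in re.finditer(f"(?={pattern})", seq):
        hits.append((m.start(), "+"))
    # a match on the reverse complement maps back to forward coordinates
    for m in re.finditer(f"(?={pattern})", reverse_complement(seq)):
        hits.append((len(seq) - m.start() - length, "-"))
    return sorted(hits)

print(find_sites("ttGGACATccATGTCCaa", "GGRCAT"))  # [(2, '+'), (10, '-')]
```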

Gene Regulation

Figure 1. Regulation of gene expression by transcription factors. Transcriptional activators and repressors bind to specific DNA sequences
present in enhancers and promoters. The regulation of gene expression occurs by enhancing or inhibiting recruitment [3]

Datasets (1)
➢ Two major sources of gene and gene expression data related to cancer are:

• The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types; its data are available at https://portal.gdc.cancer.gov

• Gene Expression Omnibus (GEO) is an international repository for high-throughput functional genomic data, which can be downloaded in several formats using a variety of mechanisms at https://www.ncbi.nlm.nih.gov/geo/info/download.html

➢ A number of other databases have been listed in a review paper [10] by Yong-Kui Ma et al., but most are based on specific types of cancer with a limited number and type of samples.
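
TCGA files can also be listed programmatically through the GDC REST API instead of the portal. The sketch below only builds a query URL for the `files` endpoint and does not send it; the endpoint URL, the field names and the value "Gene Expression Quantification" reflect our understanding of the GDC API and are assumptions to verify against the current GDC API documentation before use.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; check it against the GDC API documentation.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

def build_filters(project_id):
    """GDC-style nested filter: RNA-seq expression quantification files from one project."""
    return {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type", "value": ["Gene Expression Quantification"]}},
            {"op": "in", "content": {"field": "experimental_strategy", "value": ["RNA-Seq"]}},
        ],
    }

def build_files_query(project_id, size=100):
    """Return a GET URL for the files endpoint (the request itself is not sent here)."""
    params = {
        "filters": json.dumps(build_filters(project_id)),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"

url = build_files_query("TCGA-BRCA")
print(url)
```

The URL could then be fetched with `urllib.request.urlopen`, and the returned file IDs used to download the expression files for each cancer type.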

Datasets (2)
➢ For our study, we will download gene expression data from the TCGA database for our DL architecture, and after preparing the model we will cross-validate it with GEO datasets.
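
Validating a TCGA-trained model on GEO data requires both datasets to describe the same genes, in the same column order, on comparable scales, and GEO series come from different platforms (microarrays as well as RNA-seq). A minimal sketch, assuming both matrices are samples × genes with gene identifiers already mapped to a common namespace: keep the shared genes, optionally apply log2(x + 1), and z-score each gene within its own dataset. The expression values are made up, and per-dataset standardization is only one simple option; dedicated batch-correction methods also exist.

```python
import numpy as np

def _standardize(M):
    """z-score each gene (column) within one dataset."""
    std = M.std(axis=0)
    std[std == 0] = 1.0
    return (M - M.mean(axis=0)) / std

def align_datasets(X_train, genes_train, X_test, genes_test,
                   log_train=True, log_test=True):
    """Restrict two samples x genes matrices to their shared genes, in the same order.

    The log_* flags apply log2(x + 1); set them to False for data that is already
    log-scaled (often the case for microarray series).
    """
    pos_train = {g: i for i, g in enumerate(genes_train)}
    pos_test = {g: i for i, g in enumerate(genes_test)}
    shared = sorted(pos_train.keys() & pos_test.keys())
    A = np.asarray(X_train, dtype=float)[:, [pos_train[g] for g in shared]]
    B = np.asarray(X_test, dtype=float)[:, [pos_test[g] for g in shared]]
    if log_train:
        A = np.log2(A + 1)
    if log_test:
        B = np.log2(B + 1)
    return shared, _standardize(A), _standardize(B)

# Made-up values: raw counts for the training set, log-scaled values for the test set.
X_tcga = [[10, 200, 5], [30, 100, 7], [20, 150, 6]]
X_geo = [[8.1, 3.2], [7.5, 4.0]]
shared, A, B = align_datasets(X_tcga, ["TP53", "GAPDH", "BRCA1"],
                              X_geo, ["BRCA1", "TP53"], log_test=False)
print(shared, A.shape, B.shape)  # ['BRCA1', 'TP53'] (3, 2) (2, 2)
```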

Gene Expression Analysis: Literature Review
➢ To identify a set of genes related to a specific state (normal or disease), or genes that differ between two states (normal vs. disease, or between disease states), the following methods are widely used:

• DESeq2 [7]
• WGCNA: Weighted Correlation Network Analysis [8]
• ceRNA: Competing endogenous RNA network [9]

➢ Pathways are then explored from the set of identified genes using the Gene Ontology (GO) database.
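
A common way to go from a gene list to GO terms or pathways is an over-representation test: for each term, ask whether the identified genes contain more members of that term than expected by chance, using the hypergeometric tail probability P(X ≥ k) with N background genes, K genes annotated to the term, n identified genes and overlap k. The gene sets below are hypothetical; in practice they come from GO annotations, and p-values across many terms need multiple-testing correction.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def go_enrichment(selected, term_genes, background):
    """(overlap, p-value) per term; no multiple-testing correction is applied here."""
    background = set(background)
    selected = set(selected) & background
    N, n = len(background), len(selected)
    results = {}
    for term, genes in term_genes.items():
        genes = set(genes) & background
        k = len(selected & genes)
        results[term] = (k, hypergeom_sf(k, N, len(genes), n))
    return results

# Hypothetical example: 100 background genes, 10 identified genes, two terms.
background = [f"G{i}" for i in range(100)]
selected = [f"G{i}" for i in range(10)]
terms = {
    "term_A": [f"G{i}" for i in range(8)] + ["G50", "G51"],
    "term_B": [f"G{i}" for i in range(40, 60)],
}
results = go_enrichment(selected, terms, background)
print(results)
```

Here `term_A` shares 8 of its 10 genes with the identified list and receives a very small p-value, while `term_B` shares none and gets p = 1.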

Literature Review: DESeq2

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Authors: Love, M.I., Huber, W. & Anders, S.

➢ DESeq2 offers a comprehensive and general solution for gene-level analysis of RNA-seq data.

➢ DESeq2 is a method for differential analysis of count data that uses shrinkage estimation for dispersions and fold changes to improve the stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength, rather than the mere presence, of differential expression.

➢ DESeq2 identifies differentially expressed genes between two or more conditions.
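
DESeq2 itself is an R/Bioconductor package, so the following is not its API. It is a small NumPy sketch of one step DESeq2 uses by default, median-of-ratios size-factor normalization (each sample's factor is the median, over genes expressed in every sample, of count / geometric mean of that gene's counts), followed by a naive log2 fold change on the normalized counts. It deliberately omits the negative binomial model and the dispersion and fold-change shrinkage described above, so its fold changes are unshrunken; the pseudocount is an illustrative assumption.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors; counts is genes x samples (raw counts)."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have no usable geometric mean, so they are skipped.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # log of per-gene geometric mean
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

def log2_fold_change(counts, is_case, pseudocount=0.5):
    """Naive log2(mean normalized count in cases / mean in controls), one value per gene."""
    counts = np.asarray(counts, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    normalized = counts / size_factors(counts)  # divide each sample (column) by its factor
    case_mean = normalized[:, is_case].mean(axis=1)
    ctrl_mean = normalized[:, ~is_case].mean(axis=1)
    return np.log2((case_mean + pseudocount) / (ctrl_mean + pseudocount))

# Toy counts: samples 2 and 4 were sequenced twice as deeply; gene 3 is 4x higher in cases.
counts = [[10, 20, 10, 20],
          [100, 200, 100, 200],
          [50, 100, 200, 400]]
is_case = [False, False, True, True]
sf = size_factors(counts)
lfc = log2_fold_change(counts, is_case)
print(np.round(sf, 3), np.round(lfc, 2))
```

Without the size factors, the deeper samples would make every gene look differentially expressed; after normalization the first two genes have a fold change of about zero and the third about 2 (slightly less, because of the pseudocount).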

Literature Review: WGCNA

WGCNA: an R package for weighted correlation network analysis

Authors: Langfelder, P., Horvath, S.

➢ Weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across gene expression data.
➢ WGCNA can be used for finding clusters (modules) of highly correlated genes.
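
WGCNA's core quantities can be stated compactly: the unsigned adjacency between genes i and j is a_ij = |cor(x_i, x_j)|^β (soft thresholding with power β), and modules are found by clustering genes on the topological overlap TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu. The NumPy sketch below computes both for a small simulated matrix; it is not the WGCNA R package, and β = 6 is only an illustrative choice (WGCNA selects β with a scale-free topology criterion).

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |Pearson correlation|^beta between genes; expr is samples x genes."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # self-connections are excluded from the TOM sums
    return adj

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), for adj with a zero diagonal."""
    shared = adj @ adj                   # l_ij = sum over u of a_iu * a_uj
    k = adj.sum(axis=1)                  # connectivity k_i
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom

# Simulated data: genes 0-2 follow a common driver, genes 3-4 are independent noise.
rng = np.random.default_rng(0)
driver = rng.normal(size=40)
expr = np.column_stack([driver + 0.1 * rng.normal(size=40) for _ in range(3)]
                       + [rng.normal(size=40) for _ in range(2)])
tom = topological_overlap(soft_threshold_adjacency(expr))
print(np.round(tom, 2))
```

Modules would then be obtained by hierarchical clustering on the dissimilarity 1 - TOM.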

Literature Review: ceRNA

Competing endogenous RNAs (ceRNAs): new entrants to the intricacies of gene regulation
Author: Subramanian Subbaya

➢ miRNA is a short (~22 nt) RNA that binds to complementary sites, mainly in the 3′ untranslated region (3′ UTR) of a target mRNA, and silences that mRNA by repressing its translation and/or promoting its degradation.

➢ There are around 2,000 miRNAs in the human transcriptome, and the relationship between miRNAs and mRNAs is many-to-many: one type of miRNA can silence multiple types of mRNA, and one type of mRNA can be silenced by multiple types of miRNA.

➢ Moreover, some other types of RNA (lncRNA, circRNA, etc.) can act as sponges for miRNAs; that is, they bind miRNAs and thereby sequester them away from their mRNA targets.
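
The many-to-many miRNA-mRNA relationship can be stored as a bipartite map from each RNA to the set of miRNAs that bind it. A simplified ceRNA screen, in the spirit of common ceRNA-network pipelines, then keeps an lncRNA-mRNA pair if the two share enough miRNAs and their expression is positively correlated (a sponge that soaks up shared miRNAs should let both transcripts rise together). All names, values and both thresholds below are hypothetical; published pipelines typically also test the shared-miRNA count statistically.

```python
from itertools import product
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cerna_pairs(targets, expression, lncrnas, mrnas, min_shared=2, min_corr=0.5):
    """Return (lncRNA, mRNA, shared miRNAs, r) for candidate ceRNA pairs.

    targets: RNA name -> set of miRNAs that bind it (the bipartite miRNA-RNA network).
    expression: RNA name -> expression values across the same samples.
    """
    pairs = []
    for lnc, mrna in product(lncrnas, mrnas):
        shared = targets[lnc] & targets[mrna]
        if len(shared) < min_shared:
            continue
        r = pearson(expression[lnc], expression[mrna])
        if r >= min_corr:
            pairs.append((lnc, mrna, sorted(shared), round(r, 3)))
    return pairs

# Hypothetical network and expression values
targets = {"lncX": {"miR-a", "miR-b", "miR-c"},
           "geneA": {"miR-a", "miR-b"},
           "geneB": {"miR-c"}}
expression = {"lncX": [1, 2, 3, 4], "geneA": [2, 4, 5, 8], "geneB": [4, 3, 2, 1]}
pairs = cerna_pairs(targets, expression, ["lncX"], ["geneA", "geneB"])
print(pairs)  # [('lncX', 'geneA', ['miR-a', 'miR-b'], 0.981)]
```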

Literature Review: Deep Learning on Gene Expression Data
➢ In a review paper [11], Koumakis L. lists several limitations of applying DL to genomic data:
• Model interpretation (the black box)
• The curse of dimensionality
• Imbalanced classes
• Heterogeneity of data
➢ A review paper [10] titled “Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions” by Yong-Kui Ma et al. suggests that DL has so far been applied to only a few types of genomic data, and that much work remains in applying DL to other types of genomic data.
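
Two of the listed limitations have simple, widely used first-line mitigations that can be sketched directly: for the curse of dimensionality, keep only the most variable genes before training; for imbalanced classes, weight each class's loss inversely to its frequency using w_c = n_samples / (n_classes × count_c), the same formula as scikit-learn's "balanced" class weights. The number of genes kept is an illustrative assumption, and neither step addresses interpretability or data heterogeneity.

```python
import numpy as np

def top_variance_genes(X, n_keep):
    """Indices (ascending) of the n_keep genes (columns) with the highest variance."""
    order = np.argsort(X.var(axis=0))[::-1]
    return np.sort(order[:n_keep])

def balanced_class_weights(y):
    """w_c = n_samples / (n_classes * count_c); rarer classes get larger weights."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

X = np.array([[1.0, 5.0, 0.0],
              [1.0, 9.0, 0.1],
              [1.0, 1.0, 0.2],
              [1.0, 7.0, 0.0]])
print(top_variance_genes(X, 2))              # [1 2]
print(balanced_class_weights([0, 0, 0, 1]))  # {0: 0.6666666666666666, 1: 2.0}
```

With these weights, each class contributes equally to the total loss, so a classifier cannot reach a low loss simply by always predicting the majority cancer type.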

Outline of methodology
➢ Download TCGA datasets for each cancer type that has a significant number of samples.
➢ Prepare a DL architecture to classify cancer types and/or stages.
➢ Process the downloaded TCGA datasets into an intermediate/augmented form to feed into the prepared DL architecture.
➢ Identify the features (genes, gene products) that contribute most to the classification.
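
For the last step, one model-agnostic way to rank genes by their contribution is permutation importance: shuffle one feature's values across samples and measure how much the trained model's accuracy drops. The sketch assumes only that the model exposes a `predict(X)` function; `toy_predict` is a hypothetical stand-in for the trained DL classifier. Gradient-based attribution methods are common alternatives for deep networks.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature (column) is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # shuffling column j breaks the link between gene j and the labels
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances

def toy_predict(data):
    """Hypothetical stand-in for a trained classifier: uses only gene 0."""
    return (data[:, 0] > 0).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
importance = permutation_importance(toy_predict, X, y)
print(np.round(importance, 2), np.argmax(importance))
```

Because only gene 0 drives the toy model, shuffling it drops accuracy to roughly chance level while the other genes score exactly zero; with a real model, the top-ranked genes would be the candidates passed on to pathway analysis.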

References
1. World Health Organization (WHO), https://www.who.int/news-room/fact-sheets/detail/cancer

2. Sunil Lakhani et al., The landscape of cancer genes and mutational processes in breast cancer. Nature 486 (7403) 400-U133. https://doi.org/10.1038/nature11017

3. Johann Gregor Mendel, "Versuche über Pflanzen-Hybriden", Verhandlungen des naturforschenden Vereines in Brünn 4, 1865, [3]-47

4. Adams, J. (2008) The complexity of gene expression, protein interaction, and cell differentiation. Nature Education 1(1):110

5. Duygu Koca, Characterization of novel histone methyltransferases and their roles in cancer, Doctoral thesis, Advisor: Roland Schüle, January 2019

6. Preetha Anand, Ajaikumar B. Kunnumakara, Chitra Sundaram, Kuzhuvelil B. Harikumar, Sheeja T. Tharakan, Oiki S. Lai, Bokyung Sung & Bharat B. Aggarwal, Cancer is a Preventable Disease that Requires Major Lifestyle Changes, Pharmaceutical Research, Vol. 25, No. 9, September 2008. https://doi.org/10.1007/s11095-008-9661-9

7. Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014). https://doi.org/10.1186/s13059-014-0550-8

8. Langfelder, P., Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). https://doi.org/10.1186/1471-2105-9-559

9. Subramanian Subbaya, Competing endogenous RNAs (ceRNAs): new entrants to the intricacies of gene regulation, Frontiers in Genetics, Vol. 5, 2014. https://doi.org/10.3389/fgene.2014.00008

10. Yong-Kui Ma, Ahsan Bin Tufail, Mohammed K. A. Kaabar, Francisco Martínez, A. R. Junejo, Inam Ullah, Rahim Khan, "Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions", Computational and Mathematical Methods in Medicine, vol. 2021, Article ID 9025470, 28 pages, 2021. https://doi.org/10.1155/2021/9025470

11. Koumakis L. Deep learning models in genomics; are we there yet? Comput Struct Biotechnol J. 2020 Jun 17;18:1466-1473. https://doi.org/10.1016/j.csbj.2020.06.017. PMID: 32637044; PMCID: PMC7327302.

Thank You All

Questions?