This document contains supplementary tables and figures for a research article. Supplementary Table 1 lists the Roadmap Epigenomics Project (REP) biological samples used, including tissue type and technical replicate counts. Supplementary Table 2 gives counts of differentially expressed and ME-Class interpolated genes from REP. The remaining supplementary figures provide additional evaluation and validation of the methods described in the research article.

Supplementary Tables and Figures

Supplementary Table 1. Roadmap Epigenomics Project (REP) biological and technical replicate counts.

ID   | Tissue                      | Tissue Short Name | Sources | WGBS Tech. Reps | RNA-seq Tech. Reps
-----|-----------------------------|-------------------|---------|-----------------|-------------------
E058 | Penis Foreskin Keratinocyte | Keratin.          | 1       | 2               | 3
E065 | Aorta                       | Aorta             | 1       | 6               | 2
E066 | Adult Liver                 | Liver             | 3       | 2               | 2
E071 | Brain Hippocampus Middle    | Hippo.            | 2       | 3               | 2
E079 | Esophagus                   | Esoph.            | 1       | 2               | 2
E094 | Gastric                     | Gastric           | 1       | 5               | 3
E095 | Left Ventricle              | L. Ventr.         | 1       | 4               | 2
E096 | Lung                        | Lung              | 1       | 2               | 2
E097 | Ovary                       | Ovary             | 1       | 2               | 1
E098 | Pancreas                    | Pancreas          | 1       | 2               | 2
E100 | Psoas Muscle                | Psoas             | 2       | 3               | 3
E104 | Right Atrium                | R. Atrium         | 1       | 3               | 1
E105 | Right Ventricle             | R. Ventr.         | 2       | 5               | 2
E106 | Sigmoid Colon               | Colon             | 2       | 2               | 3
E109 | Small Intestine             | Intest.           | 2       | 4               | 3
E112 | Thymus                      | Thymus            | 1       | 2               | 1
E113 | Spleen                      | Spleen            | 1       | 3               | 3

Supplementary Table 2. Differentially expressed and ME-Class interpolated gene counts from the Roadmap Epigenomics Project (REP).

Supplementary Figure 1. Increasing the number of RF estimators from 100 to 1000 for the ROI classifier does not substantially improve performance, as evaluated by: a) accuracy versus 1-reject rate, b) precision versus recall (PR AUC; 100 estimators: 0.70, 1000 estimators: 0.71), and c) ROC curve (ROC AUC; 100 estimators: 0.72, 1000 estimators: 0.73).
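
The "accuracy versus 1-reject rate" curves used in this and later figures can be illustrated with a short sketch. It assumes each gene has a predicted probability p of being upregulated and that genes are rejected in order of least confidence, measured here as |p - 0.5|; the function name `accuracy_vs_retained` and this rejection rule are illustrative assumptions, not necessarily the exact procedure used in the article.

```python
import numpy as np

def accuracy_vs_retained(y_true, p_up, fractions):
    """Accuracy on the most confident predictions for each retained fraction (1 - reject rate).

    Assumptions (illustrative only): label 1 = upregulated, 0 = downregulated;
    confidence is |p_up - 0.5|; the least confident genes are rejected first.
    """
    y_true = np.asarray(y_true)
    p_up = np.asarray(p_up, dtype=float)
    y_pred = (p_up >= 0.5).astype(int)                       # call each gene up (1) or down (0)
    order = np.argsort(-np.abs(p_up - 0.5), kind="stable")   # most confident genes first
    correct = (y_pred == y_true)[order]
    accuracies = []
    for f in fractions:
        n_keep = max(1, int(round(f * len(order))))          # always keep at least one gene
        accuracies.append(correct[:n_keep].mean())
    return np.array(accuracies)
```

Calling it with, for example, `np.linspace(0.1, 1.0, 10)` as the retained fractions gives one accuracy value per point on a curve of the same kind as panel (a).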

Supplementary Figure 2. Alternative full-gene methylation representations do not outperform TSS-centric representations. a) Heat map of methylation status at individual CpG sites (red, fully methylated; blue, fully unmethylated) for an example gene in two samples (Methyl. 1 and Methyl. 2). Individual points below indicate differential DNA methylation (Methyl. 2 − Methyl. 1) at individual CpG sites across the example gene. b) Whole Scaled Gene (WSG), c) Whole Gene (WG) and d) Uniform Gene Features (UGF) representations of the gene in (a). See the Materials and Methods for additional description of each method. Performance of TSS, WSG, WG, and UGF as reported by: e) accuracy versus 1-reject rate, f) ROC curve (ROC AUC; TSS: 0.76, WSG: 0.75, WG: 0.70, UGF: 0.65), and g) precision versus recall (PR AUC; TSS: 0.75, WSG: 0.74, WG: 0.69, UGF: 0.65). CGI = CpG island.

Supplementary Figure 3. Additional evaluation metrics for each method using differential samples from the 17 REP tissues: a) positive predictive value (PPV) versus 1-reject rate, b) negative predictive value (NPV) versus 1-reject rate, c) precision versus recall (PR AUC; ME-Class: 0.75, ROI: 0.70, DMR: 0.63, SW: 0.66), and d) accuracy versus the classifier's probability of prediction.
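
The PPV and NPV curves in panels (a) and (b) can be sketched as follows, assuming upregulated genes (label 1) form the positive class and that the least confident predictions, with confidence measured as |p - 0.5|, are rejected first. The function name `ppv_npv_at_retained` is hypothetical, and the article's exact rejection rule may differ.

```python
import numpy as np

def ppv_npv_at_retained(y_true, p_up, fraction):
    """PPV and NPV among the most confident `fraction` of predictions.

    Assumptions: upregulated (label 1) is the positive class, confidence is
    |p_up - 0.5|, and the least confident genes are rejected first.
    """
    y_true = np.asarray(y_true)
    p_up = np.asarray(p_up, dtype=float)
    order = np.argsort(-np.abs(p_up - 0.5), kind="stable")  # most confident first
    n_keep = max(1, int(round(fraction * len(order))))
    keep = order[:n_keep]
    y, pred = y_true[keep], (p_up[keep] >= 0.5).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    tn = np.sum((pred == 0) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    ppv = tp / (tp + fp) if tp + fp else float("nan")  # undefined if nothing called "up"
    npv = tn / (tn + fn) if tn + fn else float("nan")  # undefined if nothing called "down"
    return ppv, npv
```

Sweeping `fraction` from small values up to 1.0 traces PPV and NPV against 1-reject rate.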

Supplementary Figure 4. Metagene plots of genes identified by ME-Class in REP data at different probabilities of prediction p. Blue curves represent the average Z-score normalized methylation difference between the two samples of each comparison for downregulated genes, while red curves represent the average for upregulated genes.
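
A metagene curve of this kind can be sketched as below. It assumes each gene contributes a TSS-centred profile of methylation differences on a common grid, and that Z-scores are computed over all genes and positions pooled together; the article may normalize differently (for example, per gene), and `metagene_curves` is a hypothetical name. To produce one panel per probability p, the function would be called on the subset of genes predicted at that probability.

```python
import numpy as np

def metagene_curves(profiles, labels, eps=1e-12):
    """Average Z-scored methylation-difference profile for down- and upregulated genes.

    profiles: (n_genes, n_positions) methylation differences around the TSS.
    labels:   1 for upregulated genes, 0 for downregulated genes.
    Assumption: the Z-score uses the mean and SD pooled over all genes and positions.
    """
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    z = (profiles - profiles.mean()) / (profiles.std() + eps)  # eps guards a flat input
    down = z[labels == 0].mean(axis=0)  # blue curve: downregulated genes
    up = z[labels == 1].mean(axis=0)    # red curve: upregulated genes
    return down, up
```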

Supplementary Figure 5. ME-Class outperforms classifiers that use REP data from only the most important methylation features, [+0.5kb, +2.5kb] around the TSS, as evaluated by: a) accuracy versus 1-reject rate and b) ROC curve (ROC AUC; ME-Class: 0.76, RF-Most Imp. Feat.: 0.74, SW: 0.67, SW-Most Imp. Feat.: 0.64). RF-Most Imp. Feat. is an ME-Class-like classifier built using features from only the region [+0.5kb, +2.5kb] around the TSS. SW-Most Imp. Feat. is similar to the SW approach but uses only methylation from [+0.5kb, +2.5kb] around the TSS.
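
Restricting a classifier to the [+0.5kb, +2.5kb] window amounts to keeping only the feature bins whose positions fall in that range. A minimal sketch, assuming profiles are already oriented in the gene's transcriptional direction; `window_features` is a hypothetical helper, and the ±5kb grid at 20bp resolution in the usage note is an assumed example layout, not the article's documented one.

```python
import numpy as np

def window_features(profile, bin_centers, start=500, end=2500):
    """Keep only features whose bin centre lies in [start, end] bp relative to the TSS.

    Assumptions: `profile` (1-D, or 2-D with genes in rows) is already oriented 5'->3'
    for each gene, and `bin_centers` gives each feature's offset from the TSS in bp
    (negative = upstream).
    """
    profile = np.asarray(profile, dtype=float)
    bin_centers = np.asarray(bin_centers)
    mask = (bin_centers >= start) & (bin_centers <= end)
    return profile[..., mask]  # works for a single profile or a genes x bins matrix
```

On a hypothetical grid `np.arange(-5000, 5001, 20)`, this keeps the 101 bins from +500bp to +2500bp inclusive.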

Supplementary Figure 6. The addition of methylated CpG density and gene body features (GF) does not increase ME-Class performance. a) Performance plots of ME-Class altered to use either mCG/CG, mCG/bp, or CpG density (200bp resolution, CG/bp) as input (PR AUC; mCG/CG: 0.75, mCG/bp: 0.75, CG/bp: 0.50; ROC AUC; mCG/CG: 0.75, mCG/bp: 0.75, CG/bp: 0.50). b) Performance plots of ME-Class with and without added gene body features (GF) from the ROI classifier, including average internal exon, intron, and downstream features. ROI features are shown in Fig. 1d (PR AUC; ME-Class: 0.75, ME-Class+GF: 0.76, ROI: 0.70; ROC AUC; ME-Class: 0.76, ME-Class+GF: 0.76, ROI: 0.72).
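
The three input encodings in panel (a) can be computed per 200bp bin from CpG-level calls. This sketch assumes per-CpG methylation levels between 0 and 1 and takes mCG in a bin to be the sum of those levels; both that convention and the helper name `bin_methylation_features` are assumptions.

```python
import numpy as np

def bin_methylation_features(cpg_pos, meth_level, region_start, region_end, bin_size=200):
    """Per-bin mCG/CG, mCG/bp and CG/bp for one region.

    cpg_pos:    CpG coordinates (bp).
    meth_level: fraction methylated at each CpG (0..1).
    Assumptions: mCG is the sum of per-CpG methylation levels in a bin, and the
    region length is a multiple of bin_size.
    """
    cpg_pos = np.asarray(cpg_pos)
    meth_level = np.asarray(meth_level, dtype=float)
    edges = np.arange(region_start, region_end + bin_size, bin_size)
    cg_count, _ = np.histogram(cpg_pos, bins=edges)                      # CpGs per bin
    mcg_sum, _ = np.histogram(cpg_pos, bins=edges, weights=meth_level)   # methylated CpGs per bin
    with np.errstate(invalid="ignore", divide="ignore"):
        mcg_per_cg = mcg_sum / cg_count   # NaN for bins without any CpG
    mcg_per_bp = mcg_sum / bin_size
    cg_per_bp = cg_count / bin_size
    return mcg_per_cg, mcg_per_bp, cg_per_bp
```

Note that CG/bp depends only on sequence, not on methylation state, which is consistent with it carrying no information about expression changes between two samples of the same genome.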

Supplementary Figure 7. CpG-poor genes are more predictive of expression classification. ME-Class performance for genes that do or do not overlap a CpG island (CGI) by ≥1bp is reported as: a) accuracy versus 1-reject rate and b) ROC curve analysis (ROC AUC; No-CGI: 0.79, CGI: 0.75). c) Histogram of all genes with complete start and stop annotation according to RefSeq (n=19,175). Low CpG density genes comprise 26.0% (4,977 genes), while high CpG density genes comprise 74.0% (14,198 genes). d) Histogram of differentially expressed RefSeq genes (n=12,064), where low CpG density genes comprise 18.8% (2,265 genes) and high CpG density genes comprise 81.2% (9,799 genes). e) Histogram of differentially expressed, interpolated RefSeq genes (n=10,524) after applying our filtering parameters (see Materials and Methods). Low CpG density genes comprise 17.5% (1,842 genes), while high CpG density genes comprise 82.5% (8,681 genes). The cutoff between low and high CpG density genes is 0.35 observed/expected normalized CpGs within ±1500bp of the TSS. ME-Class performance for CpG-poor and CpG-rich genes is reported as: f) accuracy versus 1-reject rate and g) ROC curve analysis (ROC AUC; CpG-poor: 0.8, CpG-rich: 0.75). ME-Class performance with or without the added feature of observed/expected normalized CpG density within ±1500bp of the TSS is reported as: h) accuracy versus 1-reject rate and i) ROC curve analysis (ROC AUC; ME-Class: 0.76, ME-Class + CpG density: 0.76).
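
The observed/expected CpG ratio can be sketched with the widely used Gardiner-Garden and Frommer formula, (CpG count × sequence length) / (C count × G count), applied to the sequence within ±1500bp of the TSS. Whether the article normalizes in exactly this way, and whether the 0.35 cutoff is applied as a strict inequality, are assumptions here; the function names are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so str.count is exact
    if c == 0 or g == 0:
        return 0.0         # no C or no G: no CpG is possible
    return cpg * n / (c * g)

def is_cpg_poor(seq_around_tss, cutoff=0.35):
    """Low CpG density if the O/E ratio of the TSS +/-1500bp sequence is below cutoff."""
    return cpg_obs_exp(seq_around_tss) < cutoff
```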

Supplementary Figure 8. The Random Forest classifier performs similarly to or better than alternative classifiers, as measured by a) ROC curve (ROC AUC; RF: 0.76, LR: 0.76, GBCT: 0.76, DTW-kNN: 0.73, L2-kNN: 0.73, Naïve Bayes: 0.71) and b) accuracy versus 1-reject rate. LR = Logistic Regression, GBCT = Gradient Boosted Classification Trees, DTW-kNN = Dynamic Time Warping based k-Nearest Neighbor, L2-kNN = Euclidean distance (L2) based k-Nearest Neighbor.
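
DTW-kNN differs from L2-kNN only in the distance used to compare methylation profiles. Below is a minimal sketch of classic dynamic time warping with an absolute-difference local cost and a majority-vote kNN; k, the cost function, and the helper names are illustrative assumptions rather than the settings used in the article.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance with an absolute-difference local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_predict(train_X, train_y, x, k=1, dist=dtw_distance):
    """Majority vote (non-negative integer labels) among the k nearest training profiles."""
    d = np.array([dist(t, x) for t in train_X])
    nearest = np.argsort(d, kind="stable")[:k]
    votes = np.asarray(train_y)[nearest]
    return int(np.bincount(votes).argmax())
```

Passing `dist=lambda a, b: float(np.linalg.norm(np.subtract(a, b)))` to `knn_predict` gives the L2-kNN variant for equal-length profiles. DTW tolerates small positional shifts between profiles, at quadratic cost per comparison.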

Supplementary Figure 9. Effect of the tuning parameters for smoothing, bin resolution, and interpolation on ME-Class performance. Performance is reported as accuracy versus 1-reject rate, precision versus recall, and ROC curve. a) Relationship between ME-Class performance and the sigma of the Gaussian smoothing at a constant bin resolution of 20bp. b) Relationship between ME-Class performance and the bin resolution at a constant sigma of 50bp. c) ME-Class performance with alternative interpolation methods (PR AUC; PCHIP: 0.76, Linear: 0.76; ROC AUC; PCHIP: 0.76, Linear: 0.76).
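
The parameters in panels (a) to (c) can be illustrated by resampling CpG-level methylation differences onto a regular grid and smoothing them. This sketch uses SciPy's `PchipInterpolator` (or `numpy.interp` for the linear alternative) and `gaussian_filter1d`, converts sigma from bp to bins by dividing by the bin size (50bp / 20bp = 2.5 bins), and assumes a hypothetical ±5kb window with constant extension beyond the outermost CpGs. The step order and these choices are assumptions, not the documented ME-Class pipeline.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.ndimage import gaussian_filter1d

def interpolate_and_smooth(cpg_offsets, meth_diff, half_window=5000, bin_bp=20,
                           sigma_bp=50, method="pchip"):
    """Resample a CpG-level methylation-difference signal onto a regular grid around the TSS.

    Assumptions: offsets are relative to the TSS (bp), sorted and unique; values beyond
    the outermost CpGs are held constant; sigma is converted from bp to bins.
    """
    x = np.asarray(cpg_offsets, dtype=float)
    y = np.asarray(meth_diff, dtype=float)
    grid = np.arange(-half_window, half_window + bin_bp, bin_bp)
    clipped = np.clip(grid, x[0], x[-1])  # hold end values instead of extrapolating
    if method == "pchip":
        values = PchipInterpolator(x, y)(clipped)
    elif method == "linear":
        values = np.interp(clipped, x, y)
    else:
        raise ValueError(f"unknown interpolation method: {method}")
    smoothed = gaussian_filter1d(values, sigma=sigma_bp / bin_bp, mode="nearest")
    return grid, smoothed
```

PCHIP avoids the overshoot that cubic splines can introduce between sparse CpGs, which may be one reason it and linear interpolation perform alike here.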

Supplementary Figure 10. The numbers of training samples and training genes determine ME-Class performance. a) Testing ROC AUC as a function of the number of training samples. Fraction of correctly identified genes using ME-Class with 9 evaluation samples as a function of b) the number of training samples and c) the total number of training genes from the training samples. In (a) and (b), each point indicates the performance across all genes in an individual sample comparison. In (c), each point indicates the number of training genes and the fraction of genes returned for an individual sample comparison and set of training samples. Training genes in (c) are summed across all training samples; a gene can be counted multiple times if it appears in multiple samples, although it will likely have different methylation profiles and expression values in each comparison. Permuted sets of all differential training samples (n=8) and a fixed set of differential evaluation samples (n=9) are randomly chosen from the REP dataset.
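
The way training genes are tallied in panel (c) can be sketched as follows: random subsets of the training pool are drawn at each size, and the gene counts of the chosen samples are simply summed, so a gene shared by several samples is counted once per sample. The helper name `training_subsets`, the number of draws, and the seed are hypothetical, and the fixed evaluation set is assumed to be held out separately.

```python
import random

def training_subsets(train_pool, gene_counts, n_draws=3, seed=0):
    """Draw random training subsets of every size and report their summed gene counts.

    train_pool:  list of differential training sample identifiers (8 in the caption).
    gene_counts: identifier -> number of training genes contributed by that sample.
    A gene present in several samples is counted once per sample, as in the caption.
    """
    rng = random.Random(seed)  # seeded for reproducible draws
    results = []
    for k in range(1, len(train_pool) + 1):
        for _ in range(n_draws):
            subset = rng.sample(train_pool, k)
            total = sum(gene_counts[s] for s in subset)
            results.append((k, sorted(subset), total))
    return results
```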

Supplementary Figure 11. Performance on Blueprint neutrophil samples in comparison to other hematopoietic cell types. ME-Class is trained on the full REP dataset, and performance is reported as: a) accuracy versus 1-reject rate and b) ROC curve analysis (ROC AUC; Lymphoid: 0.65, Megakaryocyte: 0.63, Erythroblast: 0.62, Other Myeloid: 0.55).

Supplementary Figure 12. Performance on Blueprint Epigenome samples using a leave-one-out differential sample cross-validation framework similar to that used for the REP data (see Fig. 1e). The performance of ME-Class trained and evaluated solely on Blueprint samples is similar to that of an ME-Class model trained on the REP dataset. Shown are ROC AUCs of differential comparisons of randomly chosen single samples of each of the 14 cell types from the Blueprint dataset.
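
A leave-one-out differential sample split can be sketched as below, assuming every pair of samples forms one differential comparison and that all comparisons involving the held-out sample are excluded from training. This is an illustrative reading of the framework, not the article's code, and `leave_one_sample_out` is a hypothetical name.

```python
from itertools import combinations

def leave_one_sample_out(samples):
    """Yield (held_out, train_pairs, test_pairs) for pairwise differential comparisons.

    Assumption: every unordered pair of samples is one differential comparison, and
    no comparison involving the held-out sample is used for training.
    """
    pairs = list(combinations(samples, 2))
    for s in samples:
        test = [p for p in pairs if s in p]
        train = [p for p in pairs if s not in p]
        yield s, train, test
```

With one sample from each of the 14 Blueprint cell types there are 91 pairs, so each fold would test on 13 comparisons and train on the remaining 78.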
