New Tools For Functional Genomic Analysis
Author Manuscript
Drug Discov Today. Author manuscript; available in PMC 2010 August 1.
Published in final edited form as:
NIH-PA Author Manuscript
Abstract
For the past decade, the development of genomic technology has revolutionized modern biological
research and drug discovery. Functional genomic analyses enable biologists to analyze
genetic events on a global scale, and they have been widely used in gene discovery, biomarker
determination, disease classification, and drug target identification. In this article, we provide an
overview of the current and emerging tools involved in genomic studies, including expression arrays,
microRNA arrays, array CGH, ChIP-on-chip, methylation arrays, mutation analysis, genome-wide
association studies, proteomic analysis, integrated functional genomic analysis and related
bioinformatic and biostatistical analyses. Using human liver cancer as an example, we further
illustrate how these genomic approaches can be applied in cancer research.
Keywords
Functional genomics; arrays; cancer
Genomic analyses include a variety of tools that address the global changes of specific
biological parameters. Genomic analyses that examine DNA, RNA, or protein levels provide
powerful tools to characterize gene function and regulation and to facilitate disease classification,
biomarker identification, risk factor stratification and drug discovery (Fig. 1).
followed by the data analysis process. Two platforms that are commonly in use are: cDNA
microarrays and oligonucleotide microarrays [1]. The cDNA microarrays contain a collection
of probes generated by PCR amplification from cDNA libraries, expressed sequence tag clones
Global expression profiles enable a better understanding of the molecular signature of human
diseases, including liver cancers [2,3] (Fig. 2). For instance, we and others have reported
genome-wide expression profiles of liver cancer and their clinico-pathological implications
[4–7]. We have observed specific gene expression patterns that distinguish tumor from non-tumor
tissue [4], that are associated with p53 status [4], and that delineate clonality in multinodular tumors [8].
Gene expression profiles have also been reported in association with tumor metastasis [6] and
patient outcome [5,7]. Differentially expressed genes have demonstrated the potential to serve as
prognostic biomarkers [9] and therapeutic targets [10].
Array-based CGH for Genome-wide Analysis of DNA Copy Number Variation
Chromosomal imbalances, including deletions and amplifications, are common in human
tumors. Comparative genomic hybridization (CGH) has been widely used for the
global analysis of DNA copy number since it was first reported in the early 1990s. Because
conventional CGH hybridizes to metaphase chromosomes, its resolution is limited to
approximately 10 megabases, a span that could contain hundreds of genes. Microarray-based CGH,
which combines microarray technology with the CGH approach, has been developed to overcome this limit [18,19].
Defined DNA fragments (BACs, cDNAs or oligonucleotides) replace the metaphase
chromosomes as hybridization targets, which results in higher resolution. Microarray-based CGH allows precise
mapping of regions of genetic aberration [20,21], including in human liver cancers [22].
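As a minimal illustration of how array CGH signals are commonly summarized, the sketch below converts paired tumor and reference intensities into log2 ratios and labels each probe as a gain, a loss, or neutral. The function names, the example intensities and the ±0.3 cutoffs are assumptions chosen for illustration only; they are not taken from the studies cited above, and practical analyses usually add normalization and segmentation steps.

```python
import math

def log2_ratios(tumor, reference):
    """Per-probe log2(tumor / reference) intensity ratios."""
    return [math.log2(t / r) for t, r in zip(tumor, reference)]

def call_copy_number(ratios, gain=0.3, loss=-0.3):
    """Label each probe 'gain', 'loss' or 'neutral' using fixed cutoffs.

    The cutoffs are illustrative assumptions, not recommended values.
    """
    calls = []
    for value in ratios:
        if value >= gain:
            calls.append("gain")
        elif value <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Hypothetical probe intensities ordered along one chromosome arm.
tumor_signal = [1000.0, 2000.0, 500.0, 1050.0]
reference_signal = [1000.0, 1000.0, 1000.0, 1000.0]

ratios = log2_ratios(tumor_signal, reference_signal)
calls = call_copy_number(ratios)
print(list(zip(ratios, calls)))
```

A doubling of signal gives a log2 ratio of 1 and a halving gives -1, so the ratios are symmetric around zero for gains and losses.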
divided into two categories: array based or non-array based. Several companies provide
commercially-available arrays for methylation analysis [24]. The arrays are designed to
analyze bisulfite-converted DNA (for example, bead arrays from Illumina), or use the
restriction enzyme-based methylation analysis (for example, oligo arrays from NimbleGen or
Agilent). Array hybridization and analysis are similar to those described for expression
or CGH arrays. Non-array-based experimental designs include Restriction Landmark
Genome Scanning (RLGS), methylation-specific digital karyotyping (MSDK), and
high-throughput sequencing after bisulfite conversion [23,24].
Genome-wide association studies have become possible due to several recent technological
advances. Improvements in DNA microarray technology have rapidly reduced the cost of
genotyping SNPs, allowing for the testing of up to one million SNPs using a single microarray.
At the same time, the HapMap Project validated nearly four million SNPs in multiple diverse
populations, and determined the extent of linkage disequilibrium (LD) between SNPs [31]. LD
refers to the non-random association of SNPs, typically those that are closest together. The
presence of LD allows for SNPs on genotyping platforms to serve as a proxy for other nearby
SNPs [32]. As a result, current DNA microarrays can assay most common SNPs in the HapMap.
In this way, LD reduces both the genotyping costs of genome-wide association studies and the
multiple testing burden (see the Biostatistical Analysis section below).
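The proxy relationship described above is often quantified with the squared correlation r² between two SNPs. The sketch below computes r² from phased haplotypes under simplifying assumptions: both SNPs are biallelic, alleles are coded 0 or 1, and phase is known. The function name and the example haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele_at_snp_a, allele_at_snp_b), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom

# Alleles always travel together: one SNP is a perfect proxy for the other.
perfect_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Alleles are independent: genotyping one SNP says nothing about the other.
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(perfect_ld), ld_r_squared(no_ld))
```

An r² close to 1 means that a genotyped SNP captures nearly all of the information about an ungenotyped neighbor, which is why a platform can cover most common variation with a subset of SNPs.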
Genome-wide association studies are better suited to detecting associations between disease and
common variants, typically defined as those with a minor allele
frequency of greater than 5%, than associations with rare variants [33]. Since strongly deleterious
alleles are likely to face selection pressure, variants with large effects tend to be rare, whereas common
variants tend to have more modest effects on gene function. Most variants associated with disease
in genome-wide association studies are common and have modest or small effects. Because of
this, individual variants will not serve as strong predictors of disease risk [34]. They may,
however, explain a large share of the risk of disease in the population, that is, a large population
attributable fraction. For this reason, interventions that developed to counteract these risk
Proteomics Analysis
Expression profiling (mRNA-based for expression level changes) and genomic profiling
(DNA-based for copy number or sequence variation) cannot provide a complete picture of
the heterogeneity of complex diseased tissue. Changes at the mRNA or DNA level do not
always correspond to changes at the protein level, nor do they capture post-translational modifications, e.g.
phosphorylation, which are critical in regulating protein activity. Indeed, a number of
targeted therapeutic agents are designed to inhibit the activity of a protein, e.g., a tyrosine kinase.
Therefore, protein profiling is essential in providing the protein molecular signatures for
bioassay and therapeutic development. The two-dimensional polyacrylamide gel electrophoresis
(2D–PAGE) approach is commonly employed to study protein profiles [41,42]. Proteins
are separated according to their size (molecular mass) and charge (isoelectric point),
and their abundance is then determined. The difficulty in elucidating the
identity of the protein spots remains, however, the major obstacle to clinical application. Since
the late 1980s, matrix-assisted laser desorption/ionization (MALDI) mass spectrometry (MS)
has advanced to allow rapid measurement of the molecular weights of different
proteins with time-of-flight (TOF) MS. The mass resolution and accuracy of MALDI-TOF MS are,
however, limited for identifying peptides with high confidence. Alternative approaches have
been used to provide a more reliable determination of peptide sequence, including collision-
induced dissociation (CID) with a tandem MS, electron capture dissociation (ECD), infrared
multiphoton dissociation (IRMPD), and electron transfer dissociation (ETD) [43].
Furthermore, protein arrays, also known as antibody arrays, are an emerging technology that
provides parallel analysis of multiple proteins [44]. In addition, protein arrays can be applied
to profile specific protein post-translational modifications, such as phosphorylation or
neddylation, and to measure enzyme activities and protein cell-surface expression. Proteomic
approaches have been widely used for biomarker discovery in human tumors [45]. Protein
profiling of blood samples has been a focus in recent years because it allows repeated
measurements (especially important in monitoring treatment response) without the need
to obtain tumor tissues. Blood samples, prepared as serum or plasma fractions, have been used
for biomarker discovery [46–48].
values in the expression matrix. Robust biostatistical analyses are required to obtain biologically
relevant interpretations of the genomic data [49,50].
Specific statistical tools need to be applied to specific genomic studies. Investigators therefore have
to choose the biostatistical software that is best suited to their experiments and to the
questions that they are trying to address. In general, statistical analyses of genomic data can
be divided into two major categories: supervised and unsupervised methods [49,50].
Supervised approaches try to identify the genetic events that fit a predetermined pattern. For
example, supervised analysis is used to identify genes that are differentially expressed between
groups of samples, as well as to find genes that can be used to accurately predict the
characteristics of groups. In contrast, unsupervised approaches
characterize genomic data without prior input or knowledge of a predetermined pattern.
Unsupervised analysis is used to identify internal structure in the genomic data set. The most
commonly used unsupervised analysis tools are hierarchical clustering and principal
components analysis (PCA).
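To make the unsupervised case concrete, the following sketch groups samples by average-linkage hierarchical clustering on a correlation-based distance (1 minus the Pearson correlation), without using the sample labels. It is a simplified illustration written with the Python standard library only; the sample names, expression values and function names are hypothetical, and PCA is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering on 1 - Pearson correlation."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical log-scale expression of four genes in four samples.
expression = {
    "tumor_1": [8.0, 7.5, 2.0, 1.5],
    "tumor_2": [7.8, 7.9, 2.2, 1.0],
    "normal_1": [2.0, 1.8, 7.5, 8.1],
    "normal_2": [1.5, 2.2, 8.0, 7.7],
}
clusters = hierarchical_clusters(expression, n_clusters=2)
print(clusters)
```

The labels in the sample names are never used by the algorithm; the tumor and non-tumor samples separate only because their expression profiles are correlated within each group.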
Because genomic studies examine thousands or millions of data points, stringent significance
criteria are applied to the association results. One method is to undertake a Bonferroni
correction by dividing the significance threshold by the number of tests being conducted. For
example, correcting for a million tested common SNPs in a genome-wide association study,
an association would need a p-value below 5 × 10⁻⁸ (0.05 divided by 1,000,000) to be considered
“genome-wide significant.” It is for this reason that genome-wide association studies typically
involve thousands of subjects to achieve sufficient statistical power. Other multiple testing
adjustments that are less conservative than a Bonferroni correction, such as permutation
derived p-values and false discovery rates, are often employed to maintain statistical power
and to clarify the strength of a reported finding in light of the genomic scale of the experiment
[51,52].
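The difference between a Bonferroni correction and a false discovery rate adjustment can be sketched in a few lines. The example below uses the Benjamini-Hochberg step-up procedure as one common way of controlling the false discovery rate; this choice, the function names and the p-values are illustrative assumptions rather than the specific methods of the cited references.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its own threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# One million SNPs tested at an overall alpha of 0.05.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical p-values from five tests.
p_values = [0.6, 0.001, 0.025, 0.2, 0.012]
cutoff = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= cutoff]
fdr_hits = benjamini_hochberg(p_values, fdr=0.05)
print(gwas_threshold, bonferroni_hits, fdr_hits)
```

With these example values the Bonferroni correction retains one test while the false discovery rate procedure retains three, which illustrates why the latter is described as less conservative.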
Bioinformatics Analysis
Genomic studies generally generate large amounts of data. Even after statistical analysis, one
may identify a large number of deregulated genes, for example, genes which are methylated or
mutated in tumor samples. Bioinformatics analysis tools have been developed to assist
scientists to extract meaningful data and interpret the genomic data in a functional manner.
One of the most commonly used methods to annotate the gene function is through Gene
Ontology (GO, http://www.geneontology.org/) [53,54]. GO classifies gene function according
to three organizing principles: molecular function, biological process and cellular component.
When certain GO terms are statistically enriched in a cluster, it may suggest possible functional
significance of the cluster of genes.
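One common way to test whether a GO term is statistically enriched in a cluster is a one-sided hypergeometric test, sketched below. This is an illustrative assumption about the test used, not a description of any particular GO tool; the function name and the gene counts are hypothetical, and in practice the resulting p-values would also be adjusted for the number of GO terms tested.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_cluster, n_overlap):
    """P(overlap >= n_overlap) when n_cluster genes are drawn at random.

    n_background: genes on the array; n_annotated: genes carrying the GO term;
    n_cluster: genes in the cluster; n_overlap: cluster genes carrying the term.
    """
    total = comb(n_background, n_cluster)
    upper = min(n_annotated, n_cluster)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_cluster - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 genes measured, 5 annotated with the term,
# a cluster of 5 genes of which 3 carry the annotation.
p_enriched = hypergeom_enrichment_p(20, 5, 5, 3)
print(round(p_enriched, 4))
```

A small p-value indicates that the observed overlap is larger than would be expected if the cluster had been drawn at random from the background genes.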
Another commonly used bioinformatic analysis tool is Gene Module Analysis [55,56]. Just as
GO term analysis builds on pre-existing knowledge for the interpretation of microarray data,
one can interrogate the global gene expression profile with respect to known sets of genes by
gene module analysis. In brief, gene module analysis asks whether the genes whose level of
expression changes in an experiment are similar to those which have been observed in another
setting. The gene modules may be defined by function (e.g. GO terms or other annotations),
the presence of specific cis- or trans-regulatory motifs for transcription factor or miRNA
binding, or known responsiveness to specific signaling pathways or drugs. Gene Set
Enrichment Analysis (GSEA) is the most popular modular analysis method that is publicly
available (http://www.broad.mit.edu/gsea/) [57]. Ingenuity Pathway Analysis software
(http://www.ingenuity.com/) is a popular commercially available modular analysis tool.
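The idea behind a gene set enrichment score can be illustrated with a simplified, unweighted running-sum statistic: walking down a ranked gene list, the sum increases at each member of the gene set and decreases otherwise, and the score is the maximum deviation from zero. The sketch below is not the GSEA software itself; it omits the correlation weighting and the permutation-based significance estimate, assumes that each gene appears once in the ranking, and uses hypothetical gene names.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (maximum deviation from zero)."""
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some but not all ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up at a gene set member, step down at any other gene.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking from most up-regulated to most down-regulated.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
score_top = enrichment_score(ranking, {"g1", "g2"})
score_bottom = enrichment_score(ranking, {"g5", "g6"})
print(score_top, score_bottom)
```

A score near +1 means that the gene set is concentrated at the top of the ranking, and a score near -1 means that it is concentrated at the bottom.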
associated with DNA copy number changes [58]. In another example, ChIP-chip experiments
can be integrated with gene expression analysis to delineate how specific transcription factors
regulate global gene expression. In a recent study, Acevedo et al. performed ChIP-chip
experiments to assay binding of RNA polymerase II, H3me3K27, and H3me3K9 and DNA
methylation in 25,000 promoter regions in normal liver and liver tumor samples [59]. The
experiments successfully identified changes in active and silenced regions of the genome in
liver tumor cells, and in so doing identified novel molecular mechanisms that mediate tumor
specific changes in gene expression in the liver. In addition, by combining genomic analysis
and functional screenings, such as siRNA mediated gene silencing, we can rapidly identify
potential driver genetic events. For example, in a recent study, Zender et al. identified small
regions of recurrent deletions in human liver cancer by genomic analyses [60]. Using
microRNA based short-hairpin RNA libraries, targeting genes within these deleted regions,
the group conducted in vivo RNAi screening to identify genes that, when silenced, cooperate
with Myc to promote liver cancer development. The study successfully identified and validated
13 tumor suppressor genes for liver cancer [60].
Conclusion
In summary, as these tools for genomic analysis become widely used in
biomedical research over the next decade, one can foresee the emergence of large amounts of data addressing different
biological questions. Functional genomic analyses will likely have multiple implications for
drug discovery and development. For example, integrated functional genomic studies will
likely identify driver mutations or genes on which tumor cells depend for growth and
metastasis. These genes can be used as targets for drug development, leading to drugs
directed at specific genetic events, which are likely to be more effective and less toxic for cancer
treatment. Genomic analyses will also identify genetic signatures, such as gene expression
profiles or specific mutation status, which can be used to predict drug responsiveness. These
biomarkers will increase the power and efficiency of clinical trials by selecting the
appropriate patient populations and may lead to successful clinical drug development.
Altogether, the application of genomic analysis will make drug
discovery and development more efficient.
ACKNOWLEDGEMENT
This work is supported by NIH K01CA096774 and R21CA131625 to X.C as well as RGC and NSFC (HKU 7560/06M
References
1. Kimmel, A.; Oliver, B. DNA microarrays. Academic Press; 2006.
2. Chung CH, et al. Molecular portraits and the family tree of cancer. Nat Genet 2002;32:533–540.
[PubMed: 12454650]
3. Liu ET. Functional genomics of cancer. Curr Opin Genet Dev 2008;18(3):251–256. [PubMed:
18691651]
4. Chen X, et al. Gene expression patterns in human liver cancers. Mol Biol Cell 2002;13(6):1929–1939.
[PubMed: 12058060]
5. Iizuka N, et al. Oligonucleotide microarray for prediction of early intrahepatic recurrence of
hepatocellular carcinoma after curative resection. Lancet 2003;361(9361):923–929. [PubMed:
12648972]
6. Ye QH, et al. Predicting hepatitis B virus-positive metastatic hepatocellular carcinomas using gene
expression profiling and supervised machine learning. Nat Med 2003;9(4):416–423. [PubMed:
12640447]
7. Lee JS, et al. A novel prognostic subtype of human hepatocellular carcinoma derived from hepatic
progenitor cells. Nat Med 2006;12(4):410–416. [PubMed: 16532004]
8. Cheung ST, et al. Identify metastasis-associated genes in hepatocellular carcinoma through clonality
delineation for multinodular tumor. Cancer Res 2002;62(16):4711–4721. [PubMed: 12183430]
9. Cheung ST, et al. Claudin-10 expression level is associated with recurrence of primary hepatocellular
carcinoma. Clin Cancer Res 2005;11(2 Pt 1):551–556. [PubMed: 15701840]
10. Ho JC, et al. Granulin-epithelin precursor as a therapeutic target for hepatocellular carcinoma.
Hepatology 2008;47(5):1524–1532. [PubMed: 18393387]
11. Ruvkun G. The perfect storm of tiny RNAs. Nat Med 2008;14(10):1041–1045. [PubMed: 18841145]
12. Stefani G, Slack FJ. Small non-coding RNAs in animal development. Nat Rev Mol Cell Biol 2008;9
(3):219–230. [PubMed: 18270516]
13. Esquela-Kerscher A, Slack FJ. Oncomirs - microRNAs with a role in cancer. Nat Rev Cancer 2006;6
(4):259–269. [PubMed: 16557279]
14. Fabbri M, et al. MicroRNAs. Cancer J 2008;14(1):1–6. [PubMed: 18303474]
15. Blenkiron C, Miska EA. miRNAs in cancer: approaches, aetiology, diagnostics and therapy. Hum
Mol Genet 2007;16(Spec No 1):R106–R113. [PubMed: 17613543]
16. Kong W, et al. Strategies for profiling microRNA expression. J Cell Physiol 2009;218(1):22–25.
[PubMed: 18767038]
17. Yin JQ, et al. Profiling microRNA expression with microarrays. Trends Biotechnol 2008;26(2):70–
34. Hardy J, Singleton A. Genomewide association studies and human disease. N Engl J Med 2009;360
(17):1759–1768. [PubMed: 19369657]
35. Weir B, et al. Somatic alterations in the human cancer genome. Cancer Cell 2004;6(5):433–438.
[PubMed: 15542426]
36. Parmigiani G, et al. Design and analysis issues in genome-wide somatic mutation studies of cancer.
Genomics 2009;93(1):17–21. [PubMed: 18692126]
37. Sjoblom T. Systematic analyses of the cancer genome: lessons learned from sequencing most of the
annotated human protein-coding genes. Curr Opin Oncol 2008;20(1):66–71. [PubMed: 18043258]
38. Simpson AJ. Sequence-based advances in the definition of cancer-associated gene mutations. Curr
Opin Oncol 2009;21(1):47–52. [PubMed: 19125018]
39. Ley TJ, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature
2008;456(7218):66–72. [PubMed: 18987736]
40. Ding L, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008;455
(7216):1069–1075. [PubMed: 18948947]
41. Issaq H, Veenstra T. Two-dimensional polyacrylamide gel electrophoresis (2D–PAGE): advances
and perspectives. Biotechniques 2008;44(5):697–698, 700. [PubMed: 18474047]
42. Carrette O, et al. State-of-the-art two-dimensional gel electrophoresis: a key tool of proteomics
research. Nat Protoc 2006;1(2):812–823. [PubMed: 17406312]
43. Chen CH. Review of a current role of mass spectrometry for proteome research. Anal Chim Acta
2008;624(1):16–36. [PubMed: 18706308]
44. Haab BB. Applications of antibody array platforms. Curr Opin Biotechnol 2006;17(4):415–421.
[PubMed: 16837184]
45. Sun S, et al. Oncoproteomics of hepatocellular carcinoma: from cancer markers' discovery to
functional pathways. Liver Int 2007;27(8):1021–1038. [PubMed: 17845530]
46. Steel LF, et al. A strategy for the comparative analysis of serum proteomes for the discovery of
biomarkers for hepatocellular carcinoma. Proteomics 2003;3(5):601–609. [PubMed: 12748940]
47. Smalley DM, Ley K. Plasma-derived microparticles for biomarker discovery. Clin Lab 2008;54(3–
4):67–79. [PubMed: 18630736]
48. Zolla L. Proteomics studies reveal important information on small molecule therapeutics: a case study
on plasma proteins. Drug Discov Today 2008;13(23–24):1042–1051. [PubMed: 18973825]
49. Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nat Genet
2002;32:502–508. [PubMed: 12454645]
50. Allison DB, et al. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev
Genet 2006;7(1):55–65. [PubMed: 16369572]
51. Browning BL. PRESTO: rapid calculation of order statistic distributions and multiple-testing adjusted
P-values via permutation for one and two-stage genetic association studies. BMC Bioinformatics
2008;9:309. [PubMed: 18620604]
52. Farcomeni A. A review of modern multiple hypothesis testing, with particular attention to the false
discovery proportion. Stat Methods Med Res 2008;17(4):347–388. [PubMed: 17698936]
53. Rhee SY, et al. Use and misuse of the gene ontology annotations. Nat Rev Genet 2008;9(7):509–515.
[PubMed: 18475267]
54. Harris MA, et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res
2004;32(Database issue):D258–D261. [PubMed: 14681407]
55. Segal E, et al. From signatures to models: understanding cancer using microarrays. Nat Genet
2005;37:S38–S45. [PubMed: 15920529]
56. Wang X, et al. Gene module level analysis: identification to networks and dynamics. Curr Opin
Biotechnol 2008;19(5):482–491. [PubMed: 18725293]
57. Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting
genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102(43):15545–15550. [PubMed:
16199517]
58. Lee SA, et al. Integration of genomic analysis and in vivo transfection to identify sprouty 2 as a
candidate tumor suppressor in liver cancer. Hepatology 2008;47(4):1200–1210. [PubMed:
18214995]
59. Acevedo LG, et al. Analysis of the mechanisms mediating tumor-specific changes in gene expression in
human liver tumors. Cancer Res 2008;68(8):2641–2651. [PubMed: 18413731]
60. Zender L, et al. An oncogenomics-based in vivo RNAi screen identifies tumor suppressors in liver
cancer. Cell 2008;135(5):852–864. [PubMed: 19012953]
Figure 1.
Schematic illustration of new tools for functional genomic analysis
Figure 2.
Functional genomic approach in liver cancer studies using gene expression profiling. Liver
tissues and liver cancer tissues were collected with patient consent. The purified nucleic acid
samples were labeled with fluorescent dyes and hybridized to the arrays. The fluorescent
signal intensities were further analyzed by biostatistical and bioinformatic methods [4]. The
clinical applications of functional genomics include expression fingerprinting to delineate
the clonal relationship of multinodular tumors [8] and the identification of novel biomarkers for
prognostication [9] and therapeutic development [10].