Introduction To Bioinformatics: Course 341 Department of Computing Imperial College, London Moustafa Ghanem
Learn basic data analysis methods and how to apply them in the
analysis of gene expression data
Data Clustering
Data Classification
Statistical Analysis
Recommended Texts
Lecture Notes
Handouts
Data Mining
Microarray Technology
Lecture Overview
Background
Functional Genomics
Functional Genomics:
Background
The Drug Discovery Pipeline
Drug Discovery is a lengthy process that takes years and requires the use
of bioinformatics, chemoinformatics and clinical-informatics tools.
Target Identification -> Target Validation -> Lead Identification -> Lead Optimization -> Preclinical Trials -> Clinical Trials
Background
Drug Discovery
Target Identification:
Target Validation:
Compound Screening:
Prioritise studies
Cell
Nucleus
Chromosome
Background
Protein
Gene (mRNA),
single strand
Gene (DNA)
Cells are of many different types (blood, skin, nerve), but all
arose from a single cell (the fertilized egg)
What is?
DNA sequence (split into genes) -> codes for -> amino acid sequence -> folds into -> protein -> has -> 3D structure -> dictates -> protein function -> determines -> cell activity
Gene Expression:
Microarrays:
Background
Gene Expression
A Dynamic View
Environment
Metabolites
DNA
Growth rate
RNA
Protein
Expression
A Dynamic View
metabolites
protein
mRNA
time
event
Microarray Technology
Quantitative Measurement of Gene Expression
Applications of Microarray
Technology
Microarrays
Basic Idea
technology (GeneChip)
http://www.affymetrix.com/
Basic Idea
Background
DNA/RNA Hybridization
DNA molecules:
DNA-RNA hybridization:
When a mixture of DNA and RNA
is heated to denaturation
temperatures to form single
strands and then cooled, RNA can
hybridize (form a double helix) with
DNA that has a complementary
nucleotide sequence.
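The complementarity rule behind hybridisation can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example sequences are hypothetical, and it assumes exact Watson-Crick pairing (A-U, T-A, G-C, C-G) between antiparallel strands, ignoring temperature and partial matches.

```python
# Watson-Crick pairing between a DNA probe base and the RNA base it binds.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complementary_rna(dna_probe: str) -> str:
    """Return the RNA sequence (5'->3') that pairs with a DNA probe (5'->3')."""
    # The two strands run antiparallel, so pair each base and reverse the order.
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(dna_probe.upper()))


def can_hybridise(dna_probe: str, rna_target: str) -> bool:
    """True if the RNA target is the exact reverse complement of the probe."""
    return rna_target.upper() == complementary_rna(dna_probe)


probe = "ATGCGT"  # made-up probe sequence
print(complementary_rna(probe))         # ACGCAU
print(can_hybridise(probe, "ACGCAU"))   # True
print(can_hybridise(probe, "ACGCAA"))   # False
```

Real hybridisation tolerates some mismatches, which is why cross-hybridisation has to be accounted for later in the lecture.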
The Array
Applying a Labelled
Sample
After the sample is applied, a laser light source is applied to the array.
The fluorescent label enables the detection of which probes have hybridised
(presence) via the light emitted from the probe.
If a gene is highly expressed, more mRNA exists and thus more mRNA
hybridises to the probe molecules; this abundance is measured via the
intensity of the light emitted.
Chemistry Basics:
Surface Chemistry is used to attach the probe molecules
to the glass substrate.
The Process
Labelled targets
in solution
Heteroduplexes
Probes on array
Hybridisation
The array
Types of Microarrays
Output type
The probes need to generate an output that is easy to read: spots should lie
in defined positions and be of regular size, shape and spacing.
The probes must be sensitive enough to detect the mRNA, and the intensity
of each spot must be distinguishable from background noise.
The intensity of a spot also needs to correlate with the abundance
of the target molecule in the sample.
Probe Types
In the first case (cDNA), highly parallel PCR is used to amplify DNA
from a clone library, and the amplified DNA is then purified. The clones are
typically long sequences (complete genes or ESTs).
Spotting Process
In-situ Synthesis
Affymetrix
Advantages
Advantages
Limitations
Limitations
Most laboratories use fluorescent labelling, with the two dyes Cy3 (excited by a
green laser) and Cy5 (excited by a red laser).
In Dual label experiments, two samples are hybridised to the arrays, one
labelled with each dye; this allows the simultaneous measurement of two
samples (e.g. for differential analysis)
In Single label experiments, only one sample, labelled with one dye, is hybridised
to each array (in which case the control needs to be measured using a separate
chip).
Choice between single and dual label is governed by array technology and
underlying chemistry.
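For dual label experiments, a common way to compare the two samples at each spot is the base-2 logarithm of the ratio of the two channel intensities. The sketch below assumes that convention (the slides do not prescribe it); the function name, gene names and intensity values are made up for illustration.

```python
import math


def log2_ratio(cy5_intensity: float, cy3_intensity: float) -> float:
    """Log2 ratio of the Cy5 channel over the Cy3 channel for one spot."""
    if cy5_intensity <= 0 or cy3_intensity <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return math.log2(cy5_intensity / cy3_intensity)


# Hypothetical (Cy5, Cy3) intensities for three spots.
spots = {
    "gene_a": (2000.0, 500.0),   # higher in the Cy5-labelled sample
    "gene_b": (300.0, 1200.0),   # higher in the Cy3-labelled sample
    "gene_c": (800.0, 800.0),    # no difference
}

for gene, (cy5, cy3) in spots.items():
    print(gene, log2_ratio(cy5, cy3))
```

A positive value means the gene is more abundant in the Cy5-labelled sample, a negative value the reverse, and zero means equal intensity in both channels.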
+ Red label
+ Green label
RNA sample 2
RNA sample 1
Slide
Affymetrix GeneChip
RNA is labeled and scanned in a single color: one sample per chip
These probe pairs, called the Perfect Match probe (PM) and the Mismatch
probe (MM), allow the quantitation and subtraction of signals caused by
non-specific cross-hybridization.
PM to maximize hybridization
MM to ascertain the degree of cross-hybridization
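The idea of subtracting the mismatch signal from the perfect match signal can be sketched as a simple average of PM minus MM differences across the probe pairs for one gene. This is a simplified illustration under stated assumptions, not the vendor's proprietary algorithm; the function name and the intensity values are hypothetical.

```python
def average_difference(pm_intensities, mm_intensities):
    """Mean of (PM - MM) over all probe pairs belonging to one gene."""
    if len(pm_intensities) != len(mm_intensities) or not pm_intensities:
        raise ValueError("need the same non-zero number of PM and MM values")
    differences = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(differences) / len(differences)


# Made-up intensities for four probe pairs targeting the same gene.
pm = [1200.0, 1500.0, 900.0, 1100.0]
mm = [200.0, 300.0, 250.0, 150.0]

print(average_difference(pm, mm))  # 950.0
```

Here each MM value estimates the non-specific signal at its paired PM probe, so the difference approximates the specific hybridisation signal.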
Affymetrix GeneChips
Various Image processing techniques may be applied to read and interpret the
outputs of Microarrays
Commercial Microarray (e.g. Affymetrix) systems use proprietary software
Image Analysis software packages exist for the analysis of the output of custom made
chips (e.g. GenePix Pro, Array Vision, TIGR Spot Finder, etc)
Typical Problems of Raw Output
Uneven grid positions
Curves within a grid
Variable Spot size or shape
Variable Distance between spots
Intermediate data
Array scans
Images
Samples
Spots
Genes
Raw data
Spot/Image quantitations
Gene expression levels
In spot quantitation matrices, rows typically represent all the measurements made from
individual spots on the array. These can include mean and median pixel intensities of the spot
and local background, etc.
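As an illustration of how the measurements in one row of a spot quantitation matrix might be combined, the sketch below subtracts the median local background from the median spot intensity. This is one common choice, assumed here for illustration; the function name, the `floor` parameter and the pixel values are hypothetical.

```python
from statistics import median


def background_corrected(spot_pixels, background_pixels, floor=0.0):
    """Median spot intensity minus median local background, never below `floor`."""
    signal = median(spot_pixels) - median(background_pixels)
    # A spot dimmer than its surroundings carries no usable signal.
    return max(signal, floor)


# Made-up pixel intensities for one spot and its local background.
spot = [950, 1000, 1020, 980, 1005]
background = [100, 110, 95, 105, 90]

print(background_corrected(spot, background))  # 900
```

Medians are used rather than means in this sketch because they are less affected by a few saturated or dusty pixels.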
An experiment typically consists of one or more spot quantitation matrices representing all
arrays used in the study.
In the gene expression matrix, rows represent genes (as opposed to features/spots on the array)
and columns represent measurements from different experimental conditions measured on
individual arrays.
An example is each column representing measurements at different time points (t0, t1, t2, ...) in time
course experiments
A second example is each column representing a different tissue type
A third is each column representing a different individual
A fourth is having groups of columns representing measurements from diseased cells, and other groups
representing measurements from healthy cells, etc.
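The step from per-spot values to a gene expression matrix can be sketched as follows. The sketch assumes each array contributes one column and that replicate spots for the same gene are averaged; both the function name and the input layout (condition name mapped to a list of gene/value pairs) are hypothetical choices made for this example.

```python
from collections import defaultdict


def build_expression_matrix(arrays):
    """Build (genes, conditions, matrix) with genes as rows, conditions as columns.

    `arrays` maps a condition name to a list of (gene, spot_value) pairs.
    Replicate spots of a gene on the same array are averaged; a gene that is
    missing from an array gets None in that column.
    """
    conditions = list(arrays)
    per_gene = defaultdict(lambda: defaultdict(list))
    for condition, spots in arrays.items():
        for gene, value in spots:
            per_gene[gene][condition].append(value)

    genes = sorted(per_gene)
    matrix = []
    for gene in genes:
        row = []
        for condition in conditions:
            values = per_gene[gene][condition]
            row.append(sum(values) / len(values) if values else None)
        matrix.append(row)
    return genes, conditions, matrix


# Made-up spot values from a two-point time course.
arrays = {
    "t0": [("geneA", 100.0), ("geneA", 120.0), ("geneB", 50.0)],
    "t1": [("geneA", 200.0), ("geneB", 40.0), ("geneB", 60.0)],
}

genes, conditions, matrix = build_expression_matrix(arrays)
print(conditions)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

The same layout works when the columns are tissue types, individuals, or diseased versus healthy groups: only the condition names change.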
Each of the above matrices requires the application of data normalisation techniques, as
discussed in the next lecture.
Summary
Microarrays
Basic Concept
Sources of errors
Image processing is required
Images are converted into gene expression matrices for further analysis