Module-1: Genomics and Proteomics Notes
MODULE-1
Genome definition: A genome is the complete set of genetic information in
an organism. It provides all of the information the organism requires to
function. In living organisms, the genome is stored in long molecules of DNA
called chromosomes.
The study and analysis of genomes is called genomics. Genomic studies are
characterized by simultaneous analysis of a large number of genes using
automated data-gathering tools. A major focus is identifying the variants
associated with disease, response to treatment, or future patient prognosis.
Genomic study can be tentatively divided into structural genomics and
functional genomics. Structural genomics (Architecture + Sequencing) refers
to the initial phase of genome analysis, which includes construction of genetic
and physical maps of a genome, identification of genes, annotation of gene
features, and comparison of genome structures. Functional genomics (Gene
expression and Regulation) refers to the analysis of global gene expression and
gene functions in a genome.
EPIGENOMICS focuses on genome-wide characterization of reversible
modifications of DNA or DNA-associated proteins, such as DNA methylation or
histone acetylation.
TRANSCRIPTOMICS examines RNA levels genome wide, both qualitatively
(which transcripts are present, identification of novel splice sites, RNA editing
sites) and quantitatively (how much of each transcript is expressed).
PROTEOMICS is used to quantify peptide abundance, modification, and
interaction. It has been adapted for high-throughput analysis of thousands of
proteins in cells or body fluids.
METABOLOMICS simultaneously quantifies multiple small molecules, amino
acids, fatty acids, carbohydrates or other products of cellular metabolic
functions.
MICROBIOMICS: all the microorganisms of a given community are investigated
together, including the bacteria, viruses and fungi that colonize human skin,
mucosal surfaces and the gut.
TRADITIONAL BIOLOGY
Small team working on a specialised topic.
Well-defined experiment to answer practical questions.
NEW “HIGH THROUGHPUT” BIOLOGY
Large international and integrated teams using cutting-edge technology
for biological products.
Results are released as raw data to the scientific community without any
underlying hypothesis.
GENOME ORGANIZATION
A genome is one full complement of the genetic material in the organism:
For Haploid organisms: entire DNA content
For Diploid organisms: ½ of the DNA content
Various levels of genome comparison:
1. Genome structure – Circular/Linear
2. Genome size – small/large
3. Genome packaging – minimal/extensive
4. Gene length, gene density, repeat content
5. Gene structure – continuous/split
6. Gene organization – operon/non-operon
GENOME STRUCTURE
Viruses are highly streamlined: they need only a minimal genome to survive
and replicate in a host under suitable conditions.
If human beings are considered to be at the top of the evolutionary scale, why
do we carry about 95% non-coding DNA?
Most prokaryotes – genome is circular, usually a single diffused chromosome in
the nucleoid, usually without introns, relatively high gene density.
Extrachromosomal DNA as plasmids (usually circular) in prokaryotes.
Most eukaryotes – genome is linear, usually as compact chromosomes in the
nucleus, relatively low gene density.
The contour length of the DNA from a single human cell is about 2 meters.
Human chromosomes vary in length over a 25-fold range. Eukaryotes carry
organelle genomes as well.
Extrachromosomal circular DNA in mitochondria and chloroplast in eukaryotes.
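The 2-meter contour length quoted above can be checked with simple arithmetic. The sketch below assumes B-form DNA with a rise of about 0.34 nm per base pair and a haploid human genome of roughly 3.2 billion base pairs; both constants are approximate textbook values and are not taken from these notes.

```python
# Approximate constants (assumptions, not stated in the notes).
RISE_PER_BP_NM = 0.34      # axial rise per base pair in B-form DNA, in nanometres
HAPLOID_GENOME_BP = 3.2e9  # approximate human haploid genome size, in base pairs


def contour_length_m(base_pairs, rise_nm=RISE_PER_BP_NM):
    """Return the stretched-out (contour) length of a DNA duplex in metres."""
    return base_pairs * rise_nm * 1e-9  # convert nanometres to metres


# A diploid cell carries two haploid complements of nuclear DNA.
diploid_bp = 2 * HAPLOID_GENOME_BP
print(f"Diploid nuclear DNA: about {contour_length_m(diploid_bp):.1f} m")
```

With these assumed values the result comes out close to 2 m, which is why extensive packaging is needed to fit the DNA into a nucleus only a few micrometres across.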
GENOME SIZE AND DENSITY
The gene density is a measure of the number of genes per million base pairs
(Mb); prokaryotic genomes have much higher gene densities than eukaryotic
genomes.
Gene density decreases drastically as one moves to higher taxonomic orders –
this is attributed to disproportionate increase in genome size as compared to
gene number.
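As a rough illustration of gene density, the sketch below divides gene number by genome size in megabases for one prokaryote and one eukaryote. The gene counts and genome sizes are rounded, commonly quoted approximations (assumptions for illustration, not values from these notes), and the function name is hypothetical.

```python
def gene_density_per_mb(gene_count, genome_size_bp):
    """Return the number of genes per million base pairs (Mb)."""
    return gene_count / (genome_size_bp / 1e6)


# Rounded, approximate figures used only for illustration.
examples = {
    "Escherichia coli K-12": (4_300, 4.6e6),  # genes, genome size in bp
    "Homo sapiens": (20_000, 3.1e9),          # protein-coding genes, haploid size
}

densities = {
    name: gene_density_per_mb(genes, size) for name, (genes, size) in examples.items()
}

for name, density in densities.items():
    print(f"{name}: about {density:.0f} genes per Mb")
```

Under these assumptions the bacterial genome is more than a hundred times denser in genes, because the human genome is several hundred times larger while holding only about five times as many genes.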
The genome size (C value) is defined as the amount of DNA in a haploid
genome. The C value can be estimated by dividing the mass of DNA in a sample
by the number of genome copies in it, measured as the copy number of a target
gene. In prokaryotes, genome size and gene number are strongly correlated, but
in eukaryotes the vast majority of nuclear DNA is non-coding.
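The C-value calculation described above can be sketched in a few lines. The example assumes the DNA mass is given in picograms and uses the widely quoted conversion of about 0.978 billion base pairs per picogram of DNA; that factor, the function names, and the sample numbers are assumptions for illustration and do not come from these notes.

```python
# Widely used conversion factor (an assumption here): 1 pg of DNA ~ 0.978e9 bp.
BP_PER_PG = 0.978e9


def c_value_pg(dna_mass_pg, genome_copies):
    """Mass of one haploid genome: total DNA mass divided by genome copies."""
    return dna_mass_pg / genome_copies


def pg_to_bp(mass_pg):
    """Convert a DNA mass in picograms to an approximate number of base pairs."""
    return mass_pg * BP_PER_PG


# Hypothetical sample: 7000 pg of DNA containing 2000 copies of a
# single-copy target gene, i.e. 2000 haploid genome equivalents.
c_pg = c_value_pg(7000, 2000)
print(f"C value: {c_pg} pg, roughly {pg_to_bp(c_pg) / 1e9:.2f} Gb")
```

The estimate is only as good as the copy-number measurement, and it assumes the target gene is present exactly once per haploid genome.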
Number of genes increases as one moves to higher taxonomic orders.
GENOME PACKAGING
Prokaryotes:
Some proteins are known to be involved in the supercoiling; other proteins and
enzymes such as DNA gyrase help in maintaining the supercoiled structure.
DNA supercoiling refers to the over- or under-winding of a DNA strand, and is
an expression of the strain on that strand. Supercoiling is important in a number
of biological processes, such as compacting DNA. Additionally, certain
enzymes such as topoisomerases are able to change DNA topology to facilitate
functions such as DNA replication or transcription.
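Over- and under-winding is commonly quantified with the linking number (Lk) and the superhelical density, sigma = (Lk - Lk0) / Lk0, where Lk0 is the linking number of the relaxed molecule. The sketch below assumes about 10.5 bp per helical turn for relaxed B-form DNA; the plasmid size and linking number are hypothetical, and none of these symbols appear in the notes themselves.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-form DNA


def relaxed_linking_number(length_bp):
    """Lk0: number of helical turns in a relaxed closed circle of this length."""
    return length_bp / BP_PER_TURN


def superhelical_density(linking_number, length_bp):
    """sigma = (Lk - Lk0) / Lk0; negative means underwound, positive overwound."""
    lk0 = relaxed_linking_number(length_bp)
    return (linking_number - lk0) / lk0


# Hypothetical 4200 bp plasmid with 24 fewer turns than its relaxed form.
sigma = superhelical_density(376, 4200)
print(f"sigma = {sigma:.2f}")
```

A value near -0.06, as in this made-up example, is in the range usually quoted for bacterial DNA during normal growth.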
Prokaryotes compress their DNA into a smaller space through supercoiling.
Genomes can be negatively supercoiled, meaning that the DNA is twisted in the
direction opposite to the turns of the double helix (underwound). Most
bacterial genomes are negatively supercoiled during normal growth. Most
prokaryotes do not have histones (archaea are the exception). HU, the most
abundant protein in the nucleoid, is structurally very different from
eukaryotic histones but acts in a similar way, forming a tetramer around which
approximately 60 bp of DNA becomes wound. Other proteins, including
integration host factor (IHF), can bind to specific sequences within the
genome and introduce additional bends. Once the prokaryotic genome has been
condensed, DNA topoisomerase I, DNA gyrase and other proteins help maintain
the supercoiling.
Because there is no nuclear membrane to separate prokaryotic DNA from the
ribosomes within the cytoplasm, transcription and translation occur
simultaneously in these organisms.
ELEMENTS OF GENOME ORGANISATION
While bacterial genomes are immensely fluid in terms of gene repertoires, they
are extremely conservative in terms of chromosome organization.
Eukaryotes:
The arrangement of DNA on chromosome through nucleosome assembly is
known as DNA packaging.
DNA folding begins when proteins called histones interact with DNA.
Eukaryotic DNA packaging is organized into 3 major structures:
1. Nucleosome assembly
2. 30nm fiber
3. Chromatid
Histones are positively charged proteins that interact with the negatively
charged phosphate backbone of DNA and wrap it tightly. The H1 histone is not
part of the nucleosome core. The remaining four histones (H2A, H2B, H3 and H4)
are therefore called the core histones.
DNA wraps around an octamer made of two copies of each of the four core
histones, and this creates the nucleosome core particle.
Nuclease enzymes cut DNA into small pieces and are used to study nucleosome
properties. In particular, these nucleases cut only the DNA that links two
nucleosomes. They cannot cut the tightly wrapped DNA, so the nucleosome itself
remains intact. The short stretch of DNA that connects two nucleosomes is
called linker DNA.
30nm fiber
The 30nm fiber is organized according to one of two proposed models: the
solenoid model and the zigzag model.
In the zigzag model of the 30nm fiber, the linker DNA passes through the
central axis of the fiber and creates a zigzag arrangement.
This next level of compaction depends on the length of the linker DNA, which
is variable. If the linker DNA is long enough to pass through the axis, the
fiber arranges in a zigzag manner; otherwise it arranges as a solenoid.
Chromatid
Finally, the loops form a chromatid (not chromatin). The chromatid is
attached at a centromere and becomes a chromosome. After replication, each
chromosome has two sister chromatids.
During replication, polymerase accesses only DNA that is not bound by
histones. Histone acetylation, methylation, phosphorylation and other
modifications change how tightly DNA is bound, and in this way allow DNA
replication and transcription.
GENE STRUCTURE:
Prokaryotic structural genes are continuous, without any non-coding region,
while eukaryotic structural genes are divided into exons (coding regions) and
introns (non-coding regions).
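The difference between a continuous gene and a split gene can be illustrated with a toy sequence. In the sketch below the sequence, the exon coordinates, and the function name are all made up for illustration; exons are written in upper case and the intron in lower case only to make the example easier to read.

```python
def splice(gene_seq, exons):
    """Join exon segments in order, dropping the introns between them.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive.
    """
    return "".join(gene_seq[start:end] for start, end in exons)


# Toy split gene: exon 1 + intron + exon 2 (hypothetical sequence).
split_gene = "ATGGCC" + "gtaagtttag" + "GATTAA"
split_exons = [(0, 6), (16, 22)]
mature = splice(split_gene, split_exons)

# A continuous (prokaryote-like) gene is a single uninterrupted coding segment.
continuous_gene = "ATGGCCGATTAA"
unchanged = splice(continuous_gene, [(0, len(continuous_gene))])

print(mature)
print(unchanged)
```

Both calls yield the same coding sequence here, which shows that the split gene carries extra non-coding DNA that must be removed before translation, while the continuous gene does not.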
GENE ORGANIZATION:
Three genomic paradoxes:
K-value paradox: complexity does not correlate with the chromosome number.
C-value paradox: the range in C values does not correlate well with the
complexity of the organism.
G-value paradox: the G value is the number of genes found in the haploid
genome (the count includes predicted genes and ORFs), and it likewise does not
correlate well with the complexity of the organism.