Intro - MolBio

The document provides an introduction to molecular biology, covering topics such as cells, DNA, RNA, proteins, and the central dogma of molecular biology. It describes the basic components of cells, DNA structure and replication, RNA transcription and processing, and protein synthesis.

Introduction to molecular biology

Summary

• Cells
• Chromosomes
• DNA
• RNA
• Amino acids
• Proteins
• Genomics
• Transcriptomics
• Proteomics

Cells

All living beings are composed of cells, which
are the basic unit of life. Every cell derives from
another cell.

Prokaryotes
No nucleus or internal membranes.
Eukaryotes
• Nucleus.
• Internal membranes.
• Organelles inside the cell that play
different and specific roles.

Organisms can be:


Unicellular
• Prokaryotes: bacteria, archaea.
• Eukaryotes: baker's yeast.
Multicellular
• Eukaryotes: animals, plants, fungi…

Human beings: on the order of 10^13 cells (tens of trillions), 320 different types

Cells

Composition

70% Water
7% Small molecules:
• Salts
• Lipids
• Aminoacids
• Nucleotides
23% Macromolecules:
• Proteins
• Polysaccharides

Cell functions:

A cell contains all the information necessary to
replicate itself (a virus does not!). Processes
carried out by cells include:
• Metabolic pathways
• Translation of RNA into proteins

Chromosomes

• The nucleus of eukaryotes contains
one or more double-stranded DNA
molecules. Each of these
supermolecules is called a
chromosome.
• For example, human beings have
22 pairs of autosomes and 1 pair of
sex chromosomes.

Cells

• Almost all the cells in an organism have the same
genome (sometimes there are slight differences).
• The DNA holds all the information needed by the
cell to perform its functions.

Three basic macromolecules for life

• DNA
– It contains all the information needed by the cell (the “hard drive”)
– Actually, since almost all the cells in an organism share the same
genome, it contains all the information needed by ANY cell to perform
its functions.
– It stays (almost) always in the nucleus.
• RNA
– RNA has two main functions:
• It mimics the information in DNA (located in the nucleus) and migrates to
other parts of the cell where this information is used (messenger RNA,
mRNA)
• It has a crucial role in protein synthesis (transfer RNA, tRNA).
• Proteins
– Many different functions (signalling, structural, enzymes,
regulation…). They are the key constituents of the organism.

Central dogma of molecular biology

• It is not a DOGMA
– A dogma is an important part of a faith that must be believed.
– The researcher who coined this term eventually admitted: “I did not know what dogma meant”.
– There is strong evidence supporting it, so it is not a dogma (or at least there are other fields of knowledge that deserve this term much more ☺).

DNA vs RNA

DNA: code of life

DeoxyriboNucleic Acid

There are four different nucleotides in the DNA of all living beings: Adenine (A), Guanine
(G), Cytosine (C) and Thymine (T). They form two complementary pairs: A-T and
C-G.

DNA structure

DNA replication

Structure of a nucleotide

• Purines: Adenine (A) and Guanine (G).
Double ring.
• Pyrimidines: Thymine (T), Cytosine (C) and
Uracil (U). Thymine is replaced by Uracil
in RNA. Single ring.

One “nucleotide” is a compound formed by one base (A, C, T or G), one sugar
molecule and a phosphate group.

How to read a DNA sequence?

Every nucleotide has two
binding positions: 5’ and 3’. The number is
the position of the carbon atom
in the sugar molecule.
Consecutive nucleotides are linked by a
phosphodiester bond.

The DNA molecule is formed when bonds between the 5’
and 3’ positions of consecutive nucleotides are set. DNA is always read
from 5’ to 3’. Reading it the other way leads to funny mistakes…

Sequence: TGACT

DNA
Code:

• It can be seen as a code with only 4 letters instead of 2 (binary coding).
• How many letters? 16 to describe the different possibilities.

Symbol   Meaning             Origin of the name
G        G                   Guanine
A        A                   Adenine
T        T                   Thymine
C        C                   Cytosine
R        G or A              puRine
Y        T or C              pYrimidine
M        A or C              aMino
K        G or T              Keto
S        G or C              Strong interaction (3 H bonds)
W        A or T              Weak interaction (2 H bonds)
H        A or C or T         not-G, H follows G in the alphabet
B        G or T or C         not-A, B follows A
V        G or C or A         not-T (not-U), V follows U
D        G or A or T         not-C, D follows C
N        G or A or T or C    aNy
-        ---                 None (gap)
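
The ambiguity symbols in the table can be read as small sets of bases. As a minimal sketch (the dictionary name `IUPAC` and the helper `expand` are illustrative names, not part of the original slides), the following Python snippet lists every concrete sequence that a pattern with ambiguity symbols stands for:

```python
from itertools import product

# Ambiguity symbols from the table above, mapped to the bases they stand for.
# The gap symbol "-" is left out because it represents no base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT",
    "N": "GATC",
}

def expand(pattern):
    """Return every concrete DNA sequence matched by a pattern of IUPAC symbols."""
    choices = [IUPAC[symbol] for symbol in pattern.upper()]
    return ["".join(bases) for bases in product(*choices)]

print(expand("ART"))      # R stands for G or A
print(len(expand("NN")))  # two fully ambiguous positions
```

Together with the gap symbol, the 15 entries of the dictionary give the 16 symbols mentioned above.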

DNA is double stranded

• Hydrogen bonds hold the nucleotide pairs together.
• DNA is not symmetric!! Each strand has a direction and is
always read from 5’ to 3’.
• Both strands are complementary: A-T and C-G
– Forward strand
– Reverse strand
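
Because both strands are complementary and each is read from 5’ to 3’, the reverse strand can be derived from the forward strand by complementing every base and reversing the result. A minimal Python sketch (the function name `reverse_complement` is an illustrative choice), applied to the example sequence TGACT used earlier:

```python
# Pairing rules A-T and C-G expressed as a translation table.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse strand, written 5' to 3', for a forward strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]

forward = "TGACT"
print(forward, "->", reverse_complement(forward))  # TGACT -> AGTCA
```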

Mitochondrial DNA

Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria. All
mtDNA is inherited from the mother (since the mitochondria of the zygote are provided by the egg cell).
Mitochondria are sometimes described as "cellular power plants" because they generate
ATP, used as a source of chemical energy.

RNA (RiboNucleic Acid)

• Protein synthesis occurs in the ribosomes
• Organelles located in the cytoplasm, outside the nucleus.
• DNA is in the nucleus.
• RNA transports the information from the nucleus to the ribosomes.

• The mechanism that creates RNA complementary to DNA is called
Transcription.

DNA vs RNA:
• Thymine (T) is substituted by uracil (U).
• RNA is single stranded. It can fold back on itself and form double-stranded stretches where the sequence is palindromic (compare the text palindrome “Sit on a potato pan, Otis”).

Messenger RNA (mRNA)

• Part of the DNA is
transcribed into RNA (the RNA is
a copy of that DNA).

• RNA goes to the cytoplasm
and, in the ribosomes, mRNA
is used to build proteins.

• RNA itself is the message
from the nucleus to the
cytoplasm.

Transcription process

Initiation
In the first stage, RNA polymerase binds to a region of DNA
called the promoter. The enzyme opens the
DNA and allows the creation of an RNA molecule whose
sequence is complementary to the DNA.

Elongation
RNA polymerase moves along the template strand and
RNA nucleotides are added to the new RNA molecule.

Termination
Termination is a complex process (it involves
palindromes, i.e. hairpins, in prokaryotes and more complex
mechanisms in eukaryotes). Once it has finished, the DNA
closes again and the RNA moves from the nucleus to the
cytoplasm.
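
The outcome of transcription can be illustrated in a few lines of Python. This is only a sketch of the sequence relationship, not of the enzymatic process: it assumes both strands are written 5’ to 3’, and the function names are illustrative. The RNA is complementary to the template strand, so it carries the same sequence as the other (coding) strand with U in place of T.

```python
# DNA base on the template strand -> RNA base paired with it.
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")

def transcribe_template(template_5to3):
    """RNA (5'->3') complementary to a template strand given 5'->3'."""
    return template_5to3.upper().translate(DNA_TO_RNA)[::-1]

def transcribe_coding(coding_5to3):
    """Same RNA, obtained from the coding strand by replacing T with U."""
    return coding_5to3.upper().replace("T", "U")

coding = "ATGGAA"     # coding strand, 5'->3'
template = "TTCCAT"   # its reverse complement, 5'->3'
print(transcribe_template(template))  # AUGGAA
print(transcribe_coding(coding))      # AUGGAA
```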

Transcription in action

RNA maturation

• In eukaryotes, the sequence that appears in the genome is not exactly the one
translated.

• RNA undergoes a maturation process:
• Intermediate sequences called introns are removed.
• The exons are joined together (splicing, carried out by the spliceosome).

• A single gene (DNA) can give rise to several variants (using different exons). This process is
called Alternative Splicing.
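
Splicing and alternative splicing can be mimicked on a toy sequence. In the Python sketch below, the pre-mRNA, its exon coordinates and the helper `splice` are all made up for illustration; real exon boundaries come from genome annotation.

```python
def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) taken from a pre-mRNA."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: exon 1, intron 1, exon 2, intron 2, exon 3.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUUU" + "GUAAGUCCCCAG" + "GGGUAA"
exon1, exon2, exon3 = (0, 6), (18, 24), (36, 42)

# Two mature variants from the same gene: all exons, or exon 2 skipped.
print(splice(pre_mrna, [exon1, exon2, exon3]))  # AUGGCCAAAUUUGGGUAA
print(splice(pre_mrna, [exon1, exon3]))         # AUGGCCGGGUAA
```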

What is a gene?

• Promoter region. It contains the sequences necessary to activate or deactivate
the gene. Its limits are fuzzy and depend on the gene. The proximal promoter is
considered to be 1000-50000 bp upstream of the TSS (transcription start site).

• Exons: coding regions of the gene (they are translated into protein).
1 to 178 exons/gene (average: 8.8)
8 bp to 17 kb/exon (average: 145 bp)

• Introns: non-coding regions flanked by two exons.
Size (average): 1 kb – 50 kb/intron (much larger than exons)

• Size of a gene: the largest is 2.4 Mb (Dystrophin). Average: 27 kb.

PROTEINS

Amino acids

Amino acids are the basic structural building units of proteins. An amino acid
is a molecule that contains both amine and carboxyl functional groups
with the general formula H2NCHRCOOH, where R is an organic
substituent.

They form polymer chains: short ones are called peptides, large ones polypeptides or proteins.
Translation
Process to form the protein according to the mRNA template.

As both the amine and carboxylic acid groups of amino acids can react to form amide bonds, one amino
acid molecule can react with another and become joined through an amide linkage. This polymerization
of amino acids is what creates proteins.

Amino acids

• 20 standard amino acids
• Bricks to build proteins.

• 10 essential amino acids
• Cannot be synthesized by
the human body.
• They must therefore be
obtained from food.
• Plants synthesize all of
them.

Amino acids

Amino acid      3-Letter  1-Letter  Polarity  Acidity  Hydrophobicity
Alanine         Ala       A         nonpolar  neutral   1.8
Arginine        Arg       R         polar     basic    -4.5
Asparagine      Asn       N         polar     neutral  -3.5
Aspartic acid   Asp       D         polar     acidic   -3.5
Cysteine        Cys       C         polar     neutral   2.5
Glutamic acid   Glu       E         polar     acidic   -3.5
Glutamine       Gln       Q         polar     neutral  -3.5
Glycine         Gly       G         nonpolar  neutral  -0.4
Histidine       His       H         polar     basic    -3.2
Isoleucine      Ile       I         nonpolar  neutral   4.5
Leucine         Leu       L         nonpolar  neutral   3.8
Lysine          Lys       K         polar     basic    -3.9
Methionine      Met       M         nonpolar  neutral   1.9
Phenylalanine   Phe       F         nonpolar  neutral   2.8
Proline         Pro       P         nonpolar  neutral  -1.6
Serine          Ser       S         polar     neutral  -0.8
Threonine       Thr       T         polar     neutral  -0.7
Tryptophan      Trp       W         nonpolar  neutral  -0.9
Tyrosine        Tyr       Y         polar     neutral  -1.3
Valine          Val       V         nonpolar  neutral   4.2
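
One simple use of the last column is to average it over a peptide to get a rough idea of how hydrophobic the peptide is as a whole. The Python sketch below copies the values from the table; the helper name `mean_hydrophobicity` is illustrative, and the peptide MEVFKAPPIGI is the one used in the translation example of these slides.

```python
# Hydrophobicity values copied from the table above, keyed by 1-letter code.
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydrophobicity(peptide):
    """Average of the per-residue values; positive means mostly hydrophobic."""
    values = [HYDROPHOBICITY[residue] for residue in peptide.upper()]
    return sum(values) / len(values)

print(round(mean_hydrophobicity("MEVFKAPPIGI"), 2))  # slightly hydrophobic overall
```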

Proteins

• Proteins are large molecules composed of amino acids.
• Their 3D structure is complex
• It is not a double helix like DNA: the shape is different for each
protein.
• Proteins fold. This folding plays a crucial role in their function
• For example, mad cow disease is caused by the abnormal folding
of a protein.

Protein structure
• Protein structure is crucial in
determining their chemical
properties and even their
function.
• The 3D structure determines which
amino acids are on the
surface.
• There are 4 levels at which
structure can be studied:
1. Amino acid sequence
2. Polypeptide folding
3. Protein shape
4. Protein interactions (which include
changes in the positions of the
atoms).

Central Dogma (once again)

• Transcription brings the data from
DNA to RNA
• RNA travels from the nucleus to the
ribosomes
• Translation builds the protein according
to the genetic code and the
corresponding mRNA
– tRNA is used as a lorry to carry the
amino acids, as we will see.

Translation

• Translation is the second step in the central
dogma.
• mRNA is decoded using the genetic code
• Amino acids follow the sequence given
by the mRNA.
• This process takes place in the cytoplasm.
• tRNA is used as a “lorry” to carry the
amino acids.
• Ribosomes are the factories that build the
proteins.

Transfer RNA (tRNA)

tRNA is an RNA that carries amino acids to the ribosomes in order to build the proteins.
tRNA is more abundant than mRNA (roughly 15% vs 5% of cellular RNA).
Most RNA in the cell is rRNA (roughly 80%).
tRNA recognizes the mRNA codon and transfers the corresponding amino acid to the protein being
created.

Genetic code
Codon: a sequence of 3 nucleotides that codes for an amino acid according to
this table.

AUG codes for methionine and is also the start codon. The first AUG in the mRNA is where translation starts.

Some exceptions:

Genetic code is almost universal.

Other considerations…

• A codon is a sequence of 3 nucleotides (DNA or RNA) that
codes for a particular amino acid.
– There are 4 possible bases (RNA): A, C, G and U
– 3 bases per codon
• Therefore, there are 4 * 4 * 4 = 64 possible codons
• Special codons:
– Start codon: AUG. Translation starts at this codon. It also codes
for an amino acid (methionine)
– Stop codons (three flavours): UAA, UAG, UGA
• That leaves 61 codons (AUG included) to code for the 20 amino acids
– The genetic code is redundant: the same amino acid may be coded
by several codons.
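
The counts above can be checked by enumerating the codons. The Python sketch below builds the standard genetic code from a compact 64-letter string (one amino acid per codon, with the bases ordered U, C, A, G and "*" marking a stop codon); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
# Number of codons per amino acid (the redundancy of the code).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))            # 64 possible codons
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
print(sum(degeneracy.values()))    # 61 codons code for amino acids
print(len(degeneracy))             # 20 amino acids
print(degeneracy["L"], degeneracy["M"])  # leucine has 6 codons, methionine 1
```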

Translation again:

Anticodon: a sequence of 3
nucleotides in tRNA that recognizes
the corresponding codon in mRNA.

Using the anticodon, the amino acid to
include in the protein is selected.

tRNA carries the “free” amino acid and,
in the ribosome, it is joined to the
polypeptide chain that is being
created.

For example, the tRNA with anticodon
UAC corresponds to the AUG codon
which, in turn, codes for methionine.

ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG…

ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G…
M   E   V   F   K   A   P   P   I   G   I   stop
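
The example above can be reproduced with a short Python sketch. It assumes the standard genetic code written with DNA letters (T instead of U) and reads the sequence codon by codon from its first base until a stop codon; the function name `translate` is an illustrative choice.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding sequence from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":   # stop codon: the protein is finished
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI
```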

Translation in action

In brief:

• Proteins are coded in the genes, in DNA located in the nucleus. DNA
always stays in the nucleus.

• Ribosomes are factories that build proteins, located in the cytoplasm.
mRNA carries the message from the nucleus to the ribosomes.
There is an intermediate step called mRNA maturation in which
introns are excluded and exons are retained.

• Ribosomes build what the mRNA codes for, using amino acids that, in turn,
are carried by tRNA.
– Ribosomes are composed of proteins and rRNA (a third class of RNA…
and there are even more!!)

Some important
Definitions in
BIOINFORMATICS

ORF (Open Reading Frame):

• Coding from DNA to protein is done by codons. There are three ways to read a strand (starting
with the first, the second or the third nucleotide of the sequence), and we can use one
strand (forward) or the other (reverse). Each of these six possibilities is
called a reading frame. Only one of them is valid. For example, on one strand this sequence has
the following possibilities:
[ATG]CC (M)   A[TGC]C (C)   AT[GCC] (A)

• A sequence flanked by a start codon and a stop codon is called an Open Reading
Frame (ORF).

[Figure: an open reading frame in a genomic sequence, running from ATG to TGA]
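
The six reading frames and the ORF definition translate directly into a small search procedure. The Python sketch below is a simplified illustration, not a gene finder: it scans each frame of both strands for a stretch that starts at ATG and ends at the first in-frame stop codon. The function names and the `min_codons` parameter are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse strand of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=1):
    """List (strand, frame, orf) for ATG...stop stretches in all six frames.

    min_codons is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        found.append((strand, frame, s[start:i + 3]))
                    start = None                   # close it at the stop codon
    return found

# First codon of each forward frame for the example ATGCC.
print(["ATGCC"[f:f + 3] for f in range(3)])  # ['ATG', 'TGC', 'GCC']
print(find_orfs("CCATGGCATAAGG"))            # [('+', 2, 'ATGGCATAA')]
```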

ORFs as gene candidates

• An open reading frame that begins with a start codon (ATG) is a gene candidate
• Most prokaryotic genes code for proteins that are 60 or more
amino acids in length
• The probability that a random sequence of n codons
has no stop codons is (61/64)^n
• When n is 50, there is a probability of about 91% that the random
sequence contains a stop codon
• When n is 100, this probability exceeds 99%
– A long sequence without stop codons is probably coding for a protein.
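
The figures above follow from the fact that 3 of the 64 codons are stop codons. A short Python check, assuming that all 64 codons are equally likely and independent in a random sequence (the function name `p_contains_stop` is illustrative):

```python
def p_contains_stop(n_codons):
    """Probability that n random codons include at least one stop codon."""
    p_no_stop = (61 / 64) ** n_codons   # every codon is one of the 61 non-stop codons
    return 1 - p_no_stop

for n in (50, 60, 100):
    print(n, round(p_contains_stop(n), 3))
```

Under this assumption the probability is about 0.91 for 50 codons and above 0.99 for 100 codons.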

Definitions:

• Nucleic acids are composed of nucleotides; their length is counted in bases or base
pairs
• Short form: nt (nucleotides), bp (base pairs).
• 400-nt means 400 nucleotide positions (in double-stranded DNA that is
800 nucleotides!)
• 400-bp means 400 base pairs
• 1000000-bp = 1000-kb = 1-Mb

Genomic analysis

How to build a whole genome in four steps:

– Cut it!:
• Restriction enzymes break the DNA at specific sites.
• It is divided into short pieces.
– Copy it!:
• It is easy to copy DNA (it was designed for that!).
• We get several clones of each DNA sequence using the Polymerase Chain
Reaction (PCR).
– With each cycle of PCR, the concentration of DNA is doubled
» 20 cycles of PCR increase the concentration by a factor of 2^20…
» (about 1 million times)
– Read it!:
• Electrophoresis to read small fragments.
– Assemble it!:
• The fragments contain overlapping sequences that can be used to
perform the assembly (just like building a puzzle).
• This puzzle has “large sky regions” that are difficult to build: large parts of
genomes are quite repetitive (and they are also important).
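
The assembly step can be illustrated with a toy greedy procedure: repeatedly merge the two fragments that share the longest overlap. This is a deliberately simplified Python sketch with made-up fragments and illustrative function names; real assemblers must also cope with sequencing errors, both strands and the repetitive regions mentioned above.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments):
    """Merge fragments pairwise, always picking the longest overlap first."""
    frags = list(dict.fromkeys(fragments))   # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j and overlap(a, b) > best_k:
                    best_k, best_i, best_j = overlap(a, b), i, j
        if best_k == 0:                      # nothing overlaps any more
            break
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

reads = ["ATGGAAGT", "AAGTATTT", "ATTTAAAGCG"]
print(greedy_assemble(reads))  # ['ATGGAAGTATTTAAAGCG']
```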

Genomic analysis and bioinformatics

Once we have the sequence, we can find genes (using
statistical properties of the intragenic regions).
It is also important to measure gene expression and predict
gene function.

Gene hunting

DNA sequence analysis
2001: First draft version of the human genome.
2003: Human genome curated. First “release version”. Mouse genome completed.

Protein sequence analysis
Protein function can be inferred from their sequence and structure. Structure analysis gives better results.

Bioinformatics analysis:

• DNA
– Useful for genomic diseases
• Single gene (mendelian), chromosomal.
• Multifactorial or complex diseases.
→ Predisposition to develop a disease
– Does not change if the organism has an acquired disease condition
→ Not valid as a marker of an acquired disease
• RNA
– Easy to measure
– RNA concentration changes with disease state
→ Early marker for different diseases
• Proteins
– It is difficult to perform a whole-proteome analysis.
– They ultimately explain most of the disease targets
→ Closer to the biological fact
→ Most reasonable drug targets

Genomes:

ORGANISM                           CHROMOSOMES   SIZE (bp)       GENE NUMBER
Homo sapiens (Humans)              23            3,200,000,000   ~30,000
Mus musculus (Mouse)               20            2,600,000,000   ~30,000
Drosophila melanogaster (Fruit Fly) 4            180,000,000     ~18,000
Saccharomyces cerevisiae (Yeast)   16            14,000,000      ~6,000
Zea mays (Corn)                    10            2,400,000,000   ???

Transcriptome:

• Different mRNAs (including splice forms) for a particular organism.
– About 30,000 genes
– About 250,000 splice variants
• Other RNA functions related to the regulation of gene expression
– miRNA: small pieces of RNA that silence a gene by blocking the translation of its mRNA

Proteome

• The complete collection of proteins in an organism
– Nobody knows how many… At least several million.
– For each splice variant, post-translational modifications
can yield different proteins (with different functions).
• One gene → Several splice forms → Several proteins
– Proteins are modified by other molecules that are attached to them
• Phosphate, acyl, methyl, sugars, lipids, etc.
• They radically change the activity of the protein
– There are many proteins with two forms: idle and active. The
transition is made by adding a phosphate group.
• Many disparate biological activities can be assigned to a single gene.

Questions?

Problem:

Genetic code

AUG codes methionine, and it is also the start codon.
