Course M1OSBIntroductionProteoIF-TC-2023
Proteomics
Encompasses all the tools and strategies used to study the proteome, i.e. to identify and characterize proteins.
N.B. The term was introduced in 1997 by P. James in his publication "Protein identification in the post-genome era: the rapid rise of proteomics", Quarterly Reviews of Biophysics.
The complexity of the Proteome
A highly dynamic system giving rise to huge complexity
The proteome is highly dynamic compared with the genome
A single genome, but different proteomes:
• Human genome: 2.9 billion bp, 20,000-25,000 genes
• Human transcriptome: 10,000-12,000 gene products
• About 75,000 reference proteins
• + Post-translational modifications (PTMs), e.g. phosphorylation, glycosylation…
• + Isoforms and truncated proteins
• Not all expressed at the same time or in the same cells
The PTMs
Chemical Modifications Fundamental to Cell Signaling
e.g. Phosphorylation
Phosphorylation is an important cellular regulatory mechanism: many enzymes and receptors are activated or deactivated by phosphorylation and dephosphorylation events, catalyzed by kinases and phosphatases respectively. In particular, protein kinases are responsible for cellular signal transduction and for its hyperactivity.
Grabbing the Proteome
What for?
Large-Scale Identification of Proteins
MODERN PROTEOMICS
A combination of biological questions, cutting-edge bioanalytical technologies and bioinformatics
Protein Databases
A Mandatory Step to the Proteome Identification
• TrEMBL:
- Complementary database to Swiss-Prot giving access to new protein sequences
- All sequences translated from DDBJ/EMBL/GenBank, as well as sequences from publications entered by users
Swiss-Prot and TrEMBL together make up UniProtKB
Protein Databases
UniProtKB - https://www.uniprot.org/
• The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional
information on proteins, with accurate, consistent and rich annotation.
• In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino
acid sequence, protein name or description, taxonomic data and citation information), as
much annotation information as possible is added.
neXtProt
The Human Proteome Project
The Hidden Proteome
Alternative Proteins: A Novel Class of Proteins
Death of a dogma: eukaryotic mRNAs can code for more than one protein
The Proteome Dynamic Range
An additional difficulty in grasping the proteome's complexity
The proteome vs. the transcriptome
• The dynamic range of the proteome is wide
• mRNA: 3-4 orders of magnitude of dynamic range
• Proteins: span more than 7 orders of magnitude
• Abundances from 1 to 10,000,000 copies per cell
Mass Spectrometry (MS)
Mass Spectrometry
A versatile & robust technology
• Gives access to the molecular weight (MW) of compounds with high accuracy, through the measurement of their mass-to-charge ratio (m/z)
• Provides structural information: for peptides and proteins, access to the amino acid sequence
• Separates molecules, and can therefore cope with analyzing mixtures
Instruments: Orbitrap MS, FT-ICR MS, timsTOF MS
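As a minimal sketch (assuming positive-mode ions of the form [M + zH]z+ and a proton mass of 1.007276 Da), the m/z value measured by the instrument relates to the neutral mass M and the charge z as m/z = (M + z × 1.007276) / z. The 1050.52 Da peptide below is a hypothetical value chosen for illustration.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da (positive-mode protonation)


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """Return the m/z of an [M + zH]z+ ion from its neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Invert mz_from_mass: recover the neutral mass from an observed m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS


if __name__ == "__main__":
    # Hypothetical peptide of neutral mass 1050.52 Da seen at several charge states
    for z in (1, 2, 3):
        print(f"z={z}: m/z = {mz_from_mass(1050.52, z):.4f}")
```

At z = 2 this hypothetical peptide would appear at m/z ≈ 526.27: the same molecule gives one peak per charge state, which is why the charge must be known (or inferred from the isotope spacing) before the mass can be recovered.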
• Structural information is obtained by gas-phase fragmentation, so-called tandem MS or MS/MS
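To illustrate how MS/MS yields sequence information, the sketch below computes the singly charged b- and y-ion ladders produced by backbone fragmentation of a peptide. It assumes standard monoisotopic residue masses, singly protonated fragments, no modifications, and a hypothetical peptide sequence. Consecutive peaks in a ladder differ by exactly one residue mass, which is what de novo or database-assisted sequencing reads out.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, Da
PROTON = 1.007276   # H+, Da


def b_ions(peptide: str) -> list[float]:
    """Singly charged b ions (N-terminal fragments b1..b(n-1))."""
    ions, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y ions (C-terminal fragments y1..y(n-1)); they keep the C-terminal water."""
    ions, total = [], WATER
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total + PROTON)
    return ions


if __name__ == "__main__":
    peptide = "PEPTIDEK"  # hypothetical peptide
    for i, (b, y) in enumerate(zip(b_ions(peptide), y_ions(peptide)), start=1):
        print(f"b{i} = {b:9.4f}   y{i} = {y:9.4f}")
```

For a peptide of n residues, b_k + y_(n-k) always equals the neutral peptide mass plus two protons, a property often used to pair complementary fragments in a spectrum.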
Modern MS instrumentation covers up to four orders of magnitude of dynamic range in untargeted mode, a stark mismatch with the dynamic range of the proteome.
Important improvements have been made over the last decade in the speed and depth of proteome analysis.
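The mismatch can be put into numbers with a short sketch, taking the figures quoted in these slides at face value (1 to 10,000,000 protein copies per cell, and up to four orders of magnitude for untargeted MS). Dynamic range is expressed here as log10(highest / lowest).

```python
import math


def orders_of_magnitude(lowest: float, highest: float) -> float:
    """Dynamic range expressed as log10(highest / lowest)."""
    if lowest <= 0 or highest < lowest:
        raise ValueError("expect 0 < lowest <= highest")
    return math.log10(highest / lowest)


# Figures quoted on the slides (protein copies per cell; untargeted MS coverage)
proteome_range = orders_of_magnitude(1, 10_000_000)  # 7 orders
ms_untargeted_range = 4.0                             # "up to four orders of magnitude"
uncovered = proteome_range - ms_untargeted_range

if __name__ == "__main__":
    print(f"Proteome: {proteome_range:.1f} orders; MS: {ms_untargeted_range:.1f}; "
          f"gap: {uncovered:.1f} orders (a factor of ~{10 ** uncovered:,.0f})")
```

Under these figures, roughly three orders of magnitude (a ~1,000-fold span) of protein abundance lie outside what a single untargeted run can cover, one reason why separation and enrichment steps are part of large-scale workflows.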
Large-Scale MS Based Strategies
Untargeted Proteome-Wide Analysis
I. Samples: tissues, cells in culture
II. Extraction: according to physico-chemical properties, e.g. large hydrophobic proteins (membrane proteins) vs. small hydrophilic ones (e.g. cytokines)
III. Separation: the proteome complexity is far too high for direct analysis by MS
IV. MS Analysis: identification, relative quantification
V. Purification/production
Large-Scale MS Based Strategies
The Historic Gel Based Workflow
The conventional 2D gel separation pipeline: protein separation (MW, pI) → spot excision → in-gel enzymatic digestion → MS analysis (MALDI, ESI) → "Peptide Mass Fingerprint" (PMF) → low-confidence ID
[Figure: PMF spectrum, intensity vs. m/z]
Large-Scale MS Based Strategies
The Historic Gel Based Workflow
The relative quantification is obtained from the 2D gel spots
Large-Scale MS Based Strategies
The Historic Gel Based Workflow
2D-DIGE (Difference Gel Electrophoresis) Proteomics
Large-Scale MS Based Strategies
Protein Identification Through Database Interrogation
Protein identification is obtained by comparing in silico (theoretical) values with experimental measurements
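A minimal sketch of this comparison for peptide mass fingerprinting (PMF): each database protein is digested in silico with trypsin (cleavage after K or R, except before P), the [M+H]+ masses of the resulting peptides are computed, and proteins are ranked by how many experimental peaks they explain within a mass tolerance. The two protein sequences, the 20 ppm tolerance and the simple match-count score are hypothetical; real search engines use more elaborate, probability-based scores.

```python
import re

# Monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def trypsin_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(protein: str, peaks: list[float], tol_ppm: float = 20.0) -> int:
    """Count experimental peaks matching at least one theoretical peptide mass."""
    theoretical = [mh_plus(p) for p in trypsin_digest(protein)]
    return sum(
        1 for obs in peaks
        if any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


# Hypothetical two-entry "database" and a peak list simulated from PROT_A
DATABASE = {
    "PROT_A": "AGSEKLLPVRDFHWKTYNEMR",
    "PROT_B": "GGDEKPAVRSTNQLMHWFYCIK",
}
PEAKS = [mh_plus(p) for p in trypsin_digest(DATABASE["PROT_A"])]

if __name__ == "__main__":
    ranking = sorted(DATABASE, key=lambda acc: pmf_score(DATABASE[acc], PEAKS), reverse=True)
    for acc in ranking:
        print(acc, pmf_score(DATABASE[acc], PEAKS), "matched peaks")
```

Because several unrelated proteins can share peptide masses by chance, a match count alone gives low-confidence identifications, which is consistent with the limitation noted for the gel-based PMF workflow.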
Multilayer organization of the cell
Secondary structure: alpha helix and beta sheet
Tertiary structure: 3D structure in space
[Figure: MS spectrum (MS1) over m/z, with a peak at 526.27 of unknown identity]
Large-Scale MS Based Strategies
Protein Identification Through Database Interrogation
M2 Proteomics
Search parameters: protein database, species, digestion (enzyme, digestion efficiency)
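The "digestion efficiency" parameter is usually expressed as the number of missed cleavages the search tolerates. As a sketch (trypsin rules assumed: cleavage after K or R, except before P; the sequence and the `max_missed` parameter name are hypothetical), allowing missed cleavages enlarges the list of candidate peptides each protein contributes to the search.

```python
import re


def digest(protein: str, max_missed: int = 0, min_length: int = 1) -> list[str]:
    """In silico trypsin digest allowing up to `max_missed` missed cleavages."""
    # Fully cleaved fragments: split after K/R unless followed by P
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to max_missed + 1 consecutive fragments
        for end in range(start, min(start + max_missed + 1, len(fragments))):
            peptide = "".join(fragments[start:end + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    protein = "MAKGGRLLKPEQR"  # hypothetical sequence
    for n in (0, 1, 2):
        print(n, digest(protein, max_missed=n))
```

Each additional missed cleavage enlarges the search space, and with it the chance of random matches, so this parameter is typically kept small.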
Identification by MS
Secondary structure: alpha helix and beta sheet
Tertiary structure: 3D structure in space
[Figure: MS spectrum (MS1) over m/z, with a peak at 526.27 of unknown identity]
Large-Scale MS Based Strategies
Bottom-Up vs. Top-Down Strategies
• Bottom-up: protein separation → protein digestion → MS on digestion products → protein ID by PMF or by MS2 to MSn of peptides → AA sequences of peptides
• Shotgun: protein separation or not → digestion in bulk → MS on all protein digestion products → protein ID by MS2 to MSn of peptides → AA sequences of peptides of all proteins
• Top-down: MS of native proteins → protein ID by MS2 on native proteins (PST, peptide sequence tag) → partial AA sequences of proteins (<60 AA)
In shotgun proteomics, the dynamic range of the signal intensities of the peptides resulting from the proteome's digestion is at least an order of magnitude larger than that of the original proteome, which makes protein identification even more difficult.
Large-Scale MS Based Strategies
Label-Free vs. Label-Based Methods for Relative Quantification
Large-Scale MS Based Strategies
Processing Tools for Identification
Large-Scale MS Based Strategies
Statistical Tools for Data Analysis
GO terms (Gene Ontology) are searched for each cluster
Subnetwork enrichment analysis, e.g. Pathway Studio v10.0 (Elsevier)
Careful: network analysis is based on bibliographic data (here, >15,000 references included)
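Over-representation of a GO term in a cluster is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; it is not necessarily the exact statistic used by PANTHER or Pathway Studio, and the counts in the example are hypothetical. With N proteins in the background, K of them annotated with the term, and a cluster of n proteins of which k carry the term, the p-value is P(X ≥ k).

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: background proteins, K: background proteins annotated with the GO term,
    n: proteins in the cluster, k: cluster proteins annotated with the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k, k+1, ..., upper annotated proteins
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


if __name__ == "__main__":
    # Hypothetical: 1000 background proteins, 50 annotated "signal transduction";
    # a cluster of 40 proteins contains 10 of them (2 would be expected by chance).
    print(f"p = {enrichment_pvalue(10, 40, 50, 1000):.2e}")
```

Since many GO terms are tested for each cluster, these p-values are then corrected for multiple testing.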
Shotgun Proteomics
From Relative Quantification to Signaling Pathways
http://www.pantherdb.org/
[Figure: GO-term annotation of genes, e.g. cellular process, signal transduction]
Database of interaction partners: https://string-db.org/
• Adds proteins known to interact with the targets
• Gives access to signaling pathways, e.g. by biological process or molecular function
Large-Scale MS Based Strategies
PTMs Identification Strategies
Identification of PTMs requires an enrichment step because of the low abundance of modified proteins and their transient nature
Phosphoprotein enrichment
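Once enriched, phosphopeptides are recognized in MS by the mass of the added phosphate group (HPO3, +79.96633 Da monoisotopic per site), which shifts the observed m/z by 79.96633 / z. A minimal sketch, assuming [M + zH]z+ ions and a hypothetical unmodified peptide mass:

```python
PROTON = 1.007276      # Da
PHOSPHO = 79.966331    # HPO3 added per phosphorylation site, Da (monoisotopic)


def precursor_mz(neutral_mass: float, charge: int, n_phospho: int = 0) -> float:
    """m/z of an [M + zH]z+ ion carrying n_phospho phosphate groups."""
    if charge < 1 or n_phospho < 0:
        raise ValueError("charge must be >= 1 and n_phospho >= 0")
    return (neutral_mass + n_phospho * PHOSPHO + charge * PROTON) / charge


if __name__ == "__main__":
    mass = 1500.70  # hypothetical unmodified peptide, Da
    for z in (2, 3):
        unmod = precursor_mz(mass, z)
        for n in (1, 2):
            shift = precursor_mz(mass, z, n) - unmod
            print(f"z={z}, {n} site(s): +{shift:.4f} m/z")
```

Searching for this fixed shift on S, T or Y residues is how a phosphorylation site is proposed; the enrichment step only makes the modified peptides abundant enough to be observed.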
Large-Scale MS Based Strategies
The Shotgun Approach
Sequencing by LC-MS/MS
Top-Down requires higher-performance MS instruments:
• Intact protein fragmentation is more difficult
• Dedicated databases are needed
Application of Large-Scale Proteomics
Cancer Research
Studying precancerous lesions from at-risk patients (BRCA1-mutated) who had undergone prophylactic ovariectomy
Lesions, from benign to carcinoma:
• SCOUT: Secretory Cell OUTgrowth (PAX2-, Bcl2)
• STIL: Serous Tubal Intraepithelial Lesion
• TILT: Tubal Intraepithelial Lesion in Transition (p53 & Ki67+)
• STIC: Serous Tubal Intraepithelial Carcinoma (p53+ & Ki67+++)
• Carcinoma
Sample preparation: IHC for p53 or Ki67 (p53 signature), cover slide removal, antigen retrieval
Application of Large-Scale Proteomics
Cancer Research
325 out of 1,242 proteins show significant variations after ANOVA (FDR 0.01)
[Figure: percent of genes per group; groups: Normal-p53, Normal-p53-STIL, Normal-p53-STIC, Carcinoma]
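The slide does not state which FDR procedure was applied after the ANOVA; a common choice is the Benjamini-Hochberg step-up procedure, sketched below under that assumption. Each protein's ANOVA p-value is converted into an adjusted p-value, and proteins with an adjusted value ≤ 0.01 are called significant. The p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotonic and <= 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(pvalues: list[float], fdr: float = 0.01) -> list[int]:
    """Indices of tests (e.g. proteins) whose adjusted p-value is <= fdr."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr]


if __name__ == "__main__":
    # Hypothetical ANOVA p-values for six proteins
    p = [0.0001, 0.004, 0.03, 0.0009, 0.2, 0.011]
    print(significant(p, fdr=0.01))
```

Controlling the FDR rather than applying a fixed p-value cut-off matters here because about 1,200 proteins are tested at once, so uncorrected thresholds would let many false positives through.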
Application of Large-Scale Proteomics
Agri-food
Application of Large-Scale Proteomics
Paleoproteomics
Application of Large-Scale Proteomics
Space
Thanks for Your Attention