GATK Pipeline Presentation: From FASTQ Data to High-Confidence Variants


Uploaded by Sampreeth Reddy


DATASET USED
● The dataset used was NA12878.
● The reads were generated on an Illumina HiSeq.
● The whole workflow focused on chromosome 20.
WORKFLOW
● Three protocols were followed:
○ FASTQ to BAM
○ Calling variants
○ Filtering variants

In addition, there are a few other protocols, grouped as Support Protocols and Alternate Protocols.
SOFTWARE AND FILES
● Software
○ BWA
○ GATK
○ SAMtools
○ Picard Tools
● Files
○ Raw reads in FASTQ format
○ Reference genome in FASTA format
○ Database of known variants in VCF format
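The FASTQ files listed above store each read as four lines: a header starting with `@`, the bases, a `+` separator, and one quality character per base. As an illustration only, a minimal Python reader might look like the following. It assumes Phred+33 quality encoding and unwrapped four-line records; `FastqRecord` and `parse_fastq` are hypothetical helpers, not part of any of the tools above.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str
    sequence: str
    qualities: List[int]  # Phred scores decoded from the quality line


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record.

    Assumes Phred+33 encoding (offset=33) and no line wrapping.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(
            name=header[1:].split()[0],  # read name without the description
            sequence=sequence,
            qualities=[ord(ch) - offset for ch in quality],
        )


if __name__ == "__main__":
    example = "@read1 lane1\nACGT\n+\nII#5\n"
    for record in parse_fastq(io.StringIO(example)):
        print(record.name, record.sequence, record.qualities)  # read1 ACGT [40, 40, 2, 20]
```

These per-base quality scores are what base quality score recalibration later adjusts.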
Preparing the Reference Sequence
● GATK uses two files to access the reference genome safely and efficiently:
○ a sequence dictionary (.dict) of contig names and sizes;
○ an index file (.fai) that allows efficient random access to reference bases.
● The index file is created with SAMtools (samtools faidx).
○ BWA (bwa index) is used separately to create its own index files for aligning reads.
● The dictionary file is created with Picard Tools (CreateSequenceDictionary).
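To make the role of these two files concrete, here is a rough Python sketch of what they record. It mirrors the five columns of a `samtools faidx` `.fai` file (name, length, byte offset of the first base, bases per line, bytes per line), shows how those columns give direct access to any base, and emits simplified `@SQ` dictionary lines. The helper names are hypothetical, and the real Picard dictionary also carries `M5` (MD5) and `UR` fields that are omitted here.

```python
from typing import List, Tuple

# (name, length, offset, linebases, linewidth): the five .fai columns
FaiEntry = Tuple[str, int, int, int, int]


def index_fasta(data: bytes) -> List[FaiEntry]:
    """Compute .fai-style entries for an in-memory FASTA file.

    Assumes Unix line endings and that all sequence lines of a contig,
    except possibly the last, have the same length.
    """
    entries: List[FaiEntry] = []
    pos = 0  # byte position of the current line in the file
    name = None
    length = offset = linebases = linewidth = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            if name is not None:
                entries.append((name, length, offset, linebases, linewidth))
            name = line[1:].split()[0].decode()
            length = linebases = linewidth = 0
            offset = pos + len(line)  # first base follows the header line
        else:
            bases = len(line.rstrip(b"\r\n"))
            if linebases == 0:  # first sequence line sets the layout
                linebases, linewidth = bases, len(line)
            length += bases
        pos += len(line)
    if name is not None:
        entries.append((name, length, offset, linebases, linewidth))
    return entries


def base_at(data: bytes, entry: FaiEntry, pos: int) -> str:
    """Fetch the base at 0-based `pos` by computing its byte offset directly."""
    _, length, offset, linebases, linewidth = entry
    if not 0 <= pos < length:
        raise IndexError(pos)
    byte = offset + (pos // linebases) * linewidth + pos % linebases
    return chr(data[byte])


def dict_lines(entries: List[FaiEntry]) -> List[str]:
    """Simplified @SQ lines: contig name (SN) and length (LN) only."""
    return [f"@SQ\tSN:{name}\tLN:{length}" for name, length, *_ in entries]


if __name__ == "__main__":
    fasta = b">chr1\nACGT\nAC\n>chr2 test\nGGGG\n"
    entries = index_fasta(fasta)
    print(entries)  # [('chr1', 6, 6, 4, 5), ('chr2', 4, 25, 4, 5)]
    print(base_at(fasta, entries[0], 4))  # A
    print(dict_lines(entries))
```

The offset arithmetic in `base_at` is the reason a fixed line width matters: without it, reaching a base deep in chromosome 20 would require scanning every preceding line.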
FASTQ to BAM
● The reads are aligned to the reference genome using BWA.
● Duplicate reads among the aligned reads are marked using Picard Tools. Duplicates (for example, copies of the same DNA fragment produced by PCR amplification) do not provide any additional information, so downstream tools ignore them.
● This produces a BAM file with duplicate reads marked.
● Next, the BAM file and a set of known indels are used to generate a list of target regions for realignment.
● Reads in these target regions are then realigned locally to improve the alignment around indels.
● These two steps are carried out using GATK.
● Finally, Base Quality Score Recalibration (BQSR) is done, again using GATK.
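The duplicate-marking idea can be sketched as follows for single-end reads. This is a simplification and not Picard's implementation: it assumes each read has already been reduced to its contig, unclipped 5' position and strand, treats reads sharing all three as copies of one fragment, and keeps the copy with the highest sum of base qualities (similar in spirit to Picard's default scoring). Paired-end handling, optical duplicates and the actual BAM flag are omitted; `AlignedRead` and `mark_duplicates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AlignedRead:
    name: str
    contig: str
    unclipped_5prime: int  # 5' end position, ignoring soft clipping
    reverse: bool          # True if aligned to the reverse strand
    qualities: List[int]
    is_duplicate: bool = False


def mark_duplicates(reads: List[AlignedRead]) -> List[AlignedRead]:
    """Flag duplicates among single-end reads (simplified sketch).

    Reads sharing contig, unclipped 5' position and strand are treated as
    copies of one fragment; the one with the highest summed base quality
    stays unflagged and the rest get is_duplicate=True. Reads are flagged,
    not deleted.
    """
    groups: Dict[Tuple[str, int, bool], List[AlignedRead]] = {}
    for read in reads:
        key = (read.contig, read.unclipped_5prime, read.reverse)
        groups.setdefault(key, []).append(read)
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.qualities))
        for read in group:
            read.is_duplicate = read is not best
    return reads


if __name__ == "__main__":
    reads = [
        AlignedRead("r1", "chr20", 100, False, [30, 30, 30]),
        AlignedRead("r2", "chr20", 100, False, [40, 40, 40]),
        AlignedRead("r3", "chr20", 100, True, [20, 20, 20]),
    ]
    for read in mark_duplicates(reads):
        print(read.name, read.is_duplicate)  # r1 True, r2 False, r3 False
```

Flagging rather than deleting keeps every read in the BAM file while letting GATK tools skip the ones carrying the duplicate flag.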
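Base quality score recalibration rests on comparing the quality each base was reported with against how often such bases actually mismatch the reference. A deliberately simplified Python sketch is shown below. It assumes a single covariate (the reported quality), identifies sites by a bare integer position, and uses add-one smoothing; GATK's actual model also uses covariates such as machine cycle and sequence context. Known variant sites (the VCF database of known variants) are skipped so that true variation is not counted as sequencing error. `empirical_qualities` is a hypothetical helper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One observed base: (reported Phred quality, reference position, mismatches reference?)
Observation = Tuple[int, int, bool]


def empirical_qualities(
    observations: Iterable[Observation], known_sites: Set[int]
) -> Dict[int, float]:
    """Map each reported quality score to the quality actually observed.

    Simplified sketch: one covariate (reported quality) and add-one
    smoothing of the error rate. Bases at known variant sites are skipped.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # Q -> [mismatches, total]
    for reported_q, ref_pos, mismatch in observations:
        if ref_pos in known_sites:
            continue  # likely a real variant, not a sequencing error
        counts[reported_q][0] += int(mismatch)
        counts[reported_q][1] += 1
    result: Dict[int, float] = {}
    for reported_q, (mismatches, total) in counts.items():
        error_rate = (mismatches + 1) / (total + 2)
        result[reported_q] = -10 * math.log10(error_rate)  # back to Phred scale
    return result


if __name__ == "__main__":
    # 98 bases reported at Q30 with no mismatches, plus one mismatch at a
    # known variant site that is ignored. With so few observations the
    # smoothed estimate is far below 30.
    obs = [(30, pos, False) for pos in range(98)] + [(30, 500, True)]
    print(empirical_qualities(obs, known_sites={500}))
```

With millions of observed bases, as in a real chromosome 20 BAM, the smoothing term becomes negligible and the recalibrated value tracks the observed error rate.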
HaplotypeCaller vs. UnifiedGenotyper
● HaplotypeCaller is capable of calling SNPs and indels simultaneously via local de novo assembly of haplotypes in an active region. This allows HaplotypeCaller to be more accurate in regions that are traditionally difficult to call, for example when they contain different types of variants close to each other.
● UnifiedGenotyper calls SNPs and indels separately, considering each variant locus independently. The model it uses has been generalized to work with data from organisms of any ploidy.
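The per-locus approach can be illustrated with the standard read-based genotype likelihood. The sketch below is a simplification under stated assumptions, not GATK code: reads are independent, a sequencing error turns the true base into each of the other three bases with equal probability, and each read comes from one of the `ploidy` chromosome copies with equal probability, which is how the same formula extends beyond diploid organisms. The function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

Pileup = List[Tuple[str, int]]  # (observed base, Phred base quality)


def base_likelihood(base: str, allele: str, qual: int) -> float:
    """P(observed base | true allele) under a symmetric error model."""
    error = 10 ** (-qual / 10)
    return 1 - error if base == allele else error / 3


def genotype_log10_likelihoods(
    pileup: Pileup, alleles: Tuple[str, ...], ploidy: int = 2
) -> Dict[Tuple[str, ...], float]:
    """log10 P(reads | genotype) for every genotype at a single locus.

    A genotype is a multiset of `ploidy` alleles; each read is averaged
    over the chromosome copies it could have come from.
    """
    result: Dict[Tuple[str, ...], float] = {}
    for genotype in combinations_with_replacement(alleles, ploidy):
        log_lik = 0.0
        for base, qual in pileup:
            p = sum(base_likelihood(base, a, qual) for a in genotype) / ploidy
            log_lik += math.log10(p)
        result[genotype] = log_lik
    return result


if __name__ == "__main__":
    pileup = [("A", 30)] * 5 + [("G", 30)] * 5
    likelihoods = genotype_log10_likelihoods(pileup, ("A", "G"))
    best = max(likelihoods, key=likelihoods.get)
    print(best)  # ('A', 'G'): a balanced pileup favors the heterozygote
```

Because every locus is scored on its own pileup, nearby variants cannot inform each other here; HaplotypeCaller's local assembly is designed to address exactly that limitation.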
RAW VARIANTS TO ANALYSIS-READY VARIANTS
● Because HaplotypeCaller calls SNPs and indels together, the raw call set contains two classes of variants: SNPs and indels.
● Variant Quality Score Recalibration (VQSR) is done on SNPs and indels separately.
● Specify the call sets that should be used to build the recalibration models for SNPs and indels.
● Specify which annotations should be used to evaluate the likelihood of SNPs and indels being real.
● The recalibration models are built separately using GATK, and the desired sensitivity level (tranche) is applied to separate likely true SNPs and indels from probable artifacts.
● The output is annotated with recalibrated quality scores (VQSLOD).
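Separate per-class filtering at a chosen sensitivity can be sketched as follows. The code assumes each call already carries a VQSLOD score and a flag saying whether it overlaps the truth resource; for each class (SNP, indel) it picks the lowest VQSLOD that still retains the requested fraction of truth-set calls and marks everything below it as filtered. This only loosely mirrors the idea behind VQSR tranches: the classification rule, the `FILTERED` label and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    ref: str
    alt: str
    vqslod: float
    in_truth_set: bool  # overlaps the truth/training resource


def variant_type(call: Call) -> str:
    # Simplification: single-base REF and ALT is a SNP, anything else an indel.
    return "SNP" if len(call.ref) == 1 and len(call.alt) == 1 else "INDEL"


def vqslod_cutoff(calls: List[Call], sensitivity: float) -> float:
    """Lowest VQSLOD that still retains `sensitivity` of the truth-set calls."""
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    truth = sorted((c.vqslod for c in calls if c.in_truth_set), reverse=True)
    if not truth:
        raise ValueError("no truth-set calls to calibrate against")
    keep = math.ceil(sensitivity * len(truth))  # number of truth calls to retain
    return truth[keep - 1]


def apply_tranches(calls: List[Call], sensitivity: float) -> Dict[int, str]:
    """FILTER value per call index, with SNPs and indels calibrated separately."""
    filters: Dict[int, str] = {}
    for mode in ("SNP", "INDEL"):
        idx = [i for i, c in enumerate(calls) if variant_type(c) == mode]
        subset = [calls[i] for i in idx]
        if not any(c.in_truth_set for c in subset):
            continue  # nothing to calibrate this class against
        cutoff = vqslod_cutoff(subset, sensitivity)
        for i in idx:
            filters[i] = "PASS" if calls[i].vqslod >= cutoff else "FILTERED"
    return filters


if __name__ == "__main__":
    calls = [
        Call("A", "G", 5.0, True),
        Call("C", "T", 3.0, True),
        Call("G", "A", 1.0, True),
        Call("T", "C", -2.0, True),
        Call("A", "C", 2.0, False),
        Call("AT", "A", 0.5, True),
        Call("A", "AT", -1.0, False),
    ]
    print(apply_tranches(calls, sensitivity=0.5))
```

Calibrating each class against its own truth calls is what allows SNPs and indels, whose annotation profiles differ, to be held to the same target sensitivity.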