
Amazon Writing Sample

What is the Most Inventive or Innovative Thing You’ve Done?

Russ Weeks
Sept. 27, 2016
One of the most innovative and personally fulfilling projects that I’ve led is the
design and implementation of a distributed database optimized for the storage and
indexing of genomic data.
I am by no means a bioinformatician, but for the purposes of this project it was
sufficient to understand a few terms. Your genome is the totality of your genetic
material. It consists of very long sequences of nucleotide bases, represented by
the characters G, T, A, and C, organized across a set of chromosomes. Some
chromosomes are identified by numbers (1-22); others by letters (X, Y).
Physicians and researchers are interested in the ways in which a person’s genome
differs from a reference human genome – these are called variant observations or
variants. There are 4-5 million variants in your genome. The vast majority of
these variants are meaningless – in fact, many are unique to you and have never been
observed before. These variants are the noise in your genomic signal. Other variants
are important or clinically significant. Some of these variants confer benefits: for
example, the deletion of 32 base pairs around position 46,000,000 on chromosome 3
may provide immunity to HIV. Unfortunately, many significant variants are
pathological: they may lead directly to a disease like muscular dystrophy or cystic
fibrosis, or they may put the carrier at increased risk of conditions like Alzheimer’s
or obesity. Variant metadata such as clinical significance and allele frequency is
available through a collection of public reference datasets.
Physicians and researchers study clinically significant variants from different
perspectives: a physician wants to understand the variants in a patient’s genome in
order to plan the best course of treatment, and a researcher wants to understand
what variants are present in an existing cohort of subjects. The technical challenge
I faced was to design and build a genomic database that could satisfy both these
access patterns while scaling horizontally to hundreds of thousands of patients.
The first design question I faced in this project was the selection of a distributed
database, since the dataset was clearly too large for a centralized solution. The
database needed to provide sub-second responses to narrowly-scoped queries such
as “does this patient have a variant at chromosome Y, position 11134340” and
interactive (1s - 10 min) responses to more broadly-scoped queries such as “show
me all variants in the FOXRED1 gene found in cancerous lung tissue”. I needed to
be able to integrate the database with the Spark distributed computing framework
and I also needed to enforce strict access control rules due to the confidential nature
of the data. The Apache Accumulo distributed key/value store was a great fit
for these requirements. Accumulo is an open-source implementation of Google’s
Bigtable design. It features a flexible and powerful distributed processing
stack that allowed me to prune results for many queries on the server side, and it
has a very mature and robust cell-level security model.
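
To make the fit concrete, here is a minimal sketch of the kind of narrowly-scoped,
authorization-aware lookup described above, written against the modern Accumulo
2.x client API (newer than what we ran in 2016). The instance details, table name,
visibility labels, and patient-first row key are illustrative assumptions, not our
production schema.

    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.Accumulo;
    import org.apache.accumulo.core.client.AccumuloClient;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;

    public class VariantLookup {
        public static void main(String[] args) throws Exception {
            // Connection details are placeholders.
            try (AccumuloClient client = Accumulo.newClient()
                    .to("genomics-instance", "zk1:2181,zk2:2181")
                    .as("clinician", "secret")
                    .build()) {

                // Cell-level security: the scan returns only those entries
                // whose column visibility is satisfied by these authorizations.
                Scanner scan = client.createScanner("variants",
                        new Authorizations("PHI", "CLINICAL"));

                // Hypothetical patient-first row key: "does this patient
                // have a variant at chromosome Y, position 11134340?"
                scan.setRange(Range.exact("patient_042:chrY:000011134340"));

                for (Entry<Key, Value> e : scan) {
                    System.out.println(e.getKey() + " -> " + e.getValue());
                }
            }
        }
    }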
Having settled on a key/value store, the next design challenge was to determine
an optimal key schema. It was clear that the keys would represent variant
observations; it was also clear that a variant could be identified by the tuple
of values (patient_id, chromosome, position, reference_allele, alternate_allele).
What wasn’t clear was the order of these components within the key. Conventional
genomic databases, which are oriented towards the researcher’s workflow, effectively
put patient_id at the end of the key. This optimizes for research-oriented queries
like “show me all patients with a known variant” but makes it nearly impossible to
answer patient-oriented queries like “show me all clinically significant variants in
this patient’s CCR5 gene. I can’t disclose the solution we arrived at, but through
a combination of schema design and server-side processing we were able to satisfy
both the researcher and physician query patterns.
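
Although I can’t show the real schema, a toy example illustrates the trade-off: in
a sorted key/value store, whichever components lead the key determine which queries
collapse into cheap, contiguous range scans. Both layouts below are hypothetical,
and the zero-padded position is a standard trick to make lexicographic key order
agree with numeric order.

    public class KeySchemas {
        // Researcher-oriented: all observations of one variant sort
        // together, so "show me all patients with this variant" is a
        // single range scan, but enumerating one patient's variants
        // touches the whole table.
        static String researcherKey(String chrom, long pos, String ref,
                                    String alt, String patientId) {
            return String.format("%s:%012d:%s:%s:%s",
                    chrom, pos, ref, alt, patientId);
        }

        // Patient-oriented: one patient's variants sort together, so
        // clinical queries like "all variants in this patient's CCR5
        // gene" become a range scan, at the cost of the researcher
        // query above.
        static String patientKey(String patientId, String chrom, long pos,
                                 String ref, String alt) {
            return String.format("%s:%s:%012d:%s:%s",
                    patientId, chrom, pos, ref, alt);
        }
    }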
Another challenge I faced was related to annotating variant observations. For
example, the dbSNP dataset consists of 160M “known” variants that have been identified
and catalogued by researchers around the world. When a new patient genome is
ingested by the system, all 4-5M variants should be annotated with metadata from
known public datasets. I implemented this denormalization because (a) our use
of commodity hardware means that storage is relatively cheap, (b) genomic data
is immutable, which means that the cost of processing once at write-time can be
amortized across many reads, and (c) the alternative, which is an asymmetric join
at read-time between two large datasets, is prohibitively expensive at this scale.
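
As a sketch of what this write-once denormalization might look like (the column
family, qualifiers, and visibility label are placeholders of my own, not ours),
each variant row carries its reference metadata from the moment it is written:

    import org.apache.accumulo.core.client.AccumuloClient;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.security.ColumnVisibility;

    public class IngestSketch {
        // Annotate a variant once, at write time, so reads never pay
        // for a join against the reference datasets.
        static void ingestVariant(AccumuloClient client, String rowKey,
                String clinicalSignificance, double alleleFrequency)
                throws Exception {
            try (BatchWriter writer = client.createBatchWriter("variants",
                    new BatchWriterConfig())) {
                Mutation m = new Mutation(rowKey);
                ColumnVisibility vis = new ColumnVisibility("PHI");
                // Storing metadata alongside the observation is cheap on
                // commodity hardware, and since genomic data is immutable
                // the one-time write cost is amortized over many reads.
                m.put("anno", "clinical_significance", vis,
                        clinicalSignificance);
                m.put("anno", "allele_frequency", vis,
                        Double.toString(alleleFrequency));
                writer.addMutation(m);
            }
        }
    }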
One especially frustrating aspect of the annotation process is that an inner join
between a patient genome and a reference dataset produces very few results. Cross-
referencing a full set of ~5M variant observations against the 160M catalogued
observations in dbSNP may yield only ~1K annotations. Since the dbSNP database
itself is so large that it must be distributed, every lookup involved an RPC and
the vast majority of the lookups produced no match. I mitigated this problem and
improved ingest performance by 28x by condensing the dbSNP dataset into a Bloom
filter. The Bloom filter was small enough to be kept in RAM on all the worker
nodes during the ingest workflow and avoided 99.99% of useless network RPCs.
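
A minimal sketch of that pre-filter, with Guava’s BloomFilter standing in for the
library we actually used: at 160M entries and a 0.01% false-positive rate
(consistent with avoiding 99.99% of the useless RPCs), the filter needs roughly
400 MB, small enough to hold in memory on each ingest worker.

    import java.nio.charset.StandardCharsets;
    import java.util.Map;

    import com.google.common.hash.BloomFilter;
    import com.google.common.hash.Funnels;

    public class AnnotationFilter {
        private final BloomFilter<String> knownVariants;

        // Built once from the reference dataset and shipped to every
        // ingest worker; variant IDs here are illustrative.
        public AnnotationFilter(Iterable<String> dbsnpVariantIds) {
            knownVariants = BloomFilter.create(
                    Funnels.stringFunnel(StandardCharsets.UTF_8),
                    160_000_000L, 0.0001);
            dbsnpVariantIds.forEach(knownVariants::put);
        }

        // A "definitely absent" answer skips the network entirely; only
        // possible matches (true hits plus ~0.01% false positives) pay
        // for an RPC against the distributed dbSNP table.
        public Map<String, String> annotate(String variantId) {
            if (!knownVariants.mightContain(variantId)) {
                return Map.of();
            }
            return remoteLookup(variantId);
        }

        // Placeholder for the RPC to the distributed reference table.
        private Map<String, String> remoteLookup(String variantId) {
            return Map.of("source", "dbSNP");
        }
    }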
This project was technically innovative due to the sheer volume of the data being
processed as well as the variety of access patterns that we needed to support. More
than that, it was meaningful to me because it was an opportunity to contribute to
the important work that bioinformaticians, researchers and primary caregivers are
doing all over the world to understand the effects of our genome on our physical
and mental well-being.
There were many more interesting and challenging aspects of this project; for more
information, please check out my talk at this year’s Accumulo Summit.
