Applied Scientist Candidate Companion
Applied Scientists solve a broad range of practical problems that dramatically improve customer
experience, reduce costs, and drive speed and automation. Amazon has the eagerness of a fresh startup to absorb
Machine Learning solutions and the scale of a mature firm to support their development.
Amazon has a rich data environment for Applied Scientists to develop new models and algorithms, and use those
to have an impact on the lives of millions of customers. Applied Science is highly experimental in nature, and
needs to be supported through strong theoretical analysis and associated process innovations. At Amazon, Applied
Scientists work closely with Software Engineers to put algorithms into practice. We encourage rigorous customer-
impacting work that involves careful consideration of modeling assumptions, a thorough review of ML literature,
experimentation using state-of-the-art methods, and error-free scalable implementations.
Annexure
• Answering the Machine Learning questions
• Why Amazon
• Technical topics to review
• Depth and Breadth of Knowledge
• Problem Solving and Coding
• Leadership Principles
• Scope of work
Why Amazon
Please reflect on how you think a career with Amazon would
be mutually beneficial and be prepared to speak to it. Although
“Why Amazon?” is a standard type of question, it’s not a check-
the-box type of formality for us.
We genuinely want to understand how working together with
you would be great, so we get a better sense of who you are.
Our interviewers also appreciate an opportunity to share their
thoughts and experiences, so take a moment to prepare a
couple of questions for the interviewer.
Technical topics to review
Domain Expertise: We are curious about your area of expertise, whether it is Automatic Speech Recognition (ASR),
Natural Language Understanding/Processing (NLU/NLP), Computer Vision (CV), Deep Learning (DL), Machine
Learning or Statistical modeling. You will be expected to demonstrate broad and deep knowledge of your
research area and its literature, a deep understanding of the field’s classical methods and your prior work, the
pros and cons of modeling approaches, data sources, and practical experience in applying those research ideas to
modeling problems.
Machine Learning Problem Solving/Application: Of course, we expect you to understand the basic machine
learning methods and algorithms. It is important that you revisit your favorite Machine Learning textbooks and
go through them. However, it is also important to apply those methods to real-world problems. For example, given
a problem definition, you should be able to formulate it as a machine learning problem and propose a solution,
including ideas for data sources, annotation, modeling approach, and evaluation, and be able to discuss potential
pitfalls and tradeoffs.
Supervised Learning: Linear & Logistic regression, Naive Bayes classifier, Bagging & Boosting, K-nearest
neighbors, Trees, Neural Networks, Support Vector Machines, Random Forests, Gradient Boosted trees, kernel
methods, Stochastic Gradient Descent (SGD), Sequence Modeling, Bayesian linear regression, Gaussian
Processes, concepts of overfitting and underfitting, Regularization, and evaluation metrics for classification and
regression problems
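As one illustration of how several of these topics fit together, the sketch below trains a logistic regression classifier with Stochastic Gradient Descent and an L2 regularization penalty, using only the Python standard library. The function names, hyperparameter defaults, and toy data are assumptions made for this example, not part of any Amazon interview material.

```python
import math
import random


def train_logistic_sgd(X, y, lr=0.1, l2=0.01, epochs=200, seed=0):
    """Fit weights and bias for binary logistic regression with plain SGD.

    X is a list of feature lists, y a list of 0/1 labels.
    lr, l2, epochs and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - y[i]  # gradient of the log loss with respect to z
            # The L2 term shrinks each weight toward zero (regularization).
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is non-negative, otherwise 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


# Toy one-feature data set: small values are class 0, large values class 1.
weights, bias = train_logistic_sgd([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```

A larger l2 value pulls the weights closer to zero, which trades some training accuracy for reduced overfitting. That is the kind of tradeoff worth being able to explain.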
Unsupervised Learning: Clustering algorithms, k-Means clustering, Anomaly detection, Markov methods,
DBSCAN, Self-organizing maps, Deep Belief Nets, Expectation Maximization (EM), Gaussian Mixture Models
(GMM) and evaluation metrics for clustering problems
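For k-Means in particular, it helps to be able to write the assign-then-update loop from memory. The following is a minimal sketch on one-dimensional points, assuming caller-supplied initial centers. The function names and sample data are hypothetical and chosen only for illustration.

```python
def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: abs(p - centers[j]))
        for p in points
    ]


def k_means_1d(points, initial_centers, max_iter=100):
    """Run Lloyd's algorithm on 1-D data until the centers stop moving."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, assign(points, centers)


centers, labels = k_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
# centers == [2.0, 11.0], labels == [0, 0, 0, 1, 1, 1]
```

The result depends on the initial centers, so be ready to discuss initialization strategies and how you would evaluate the resulting clusters.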
Probabilistic graphical models: Bayesian Networks, Markov Networks, Variational inference, Markov chain
Monte Carlo (MCMC) methods, Latent Dirichlet Allocation (LDA), inference methods such as Belief Propagation and
Gibbs Sampling
Dimensionality reduction: Autoencoders, t-SNE, Principal Component Analysis (PCA), Singular Value
Decomposition (SVD), Spectral Clustering and Matrix Factorization
Sequential models: Hidden Markov Models (HMM), Conditional Random Fields (CRF), Recurrent Neural Networks
(RNN), Natural Language Processing applications such as Named Entity Recognition (NER) and Part-of-Speech
(POS) tagging
Deep Neural Networks / Deep Learning: Feed-forward Neural Networks, Convolutional Neural Networks,
Backpropagation, Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Generative
Adversarial Networks (GANs), Attention, Dropout, Vanishing gradients, Activation functions
Natural Language processing: Statistical Language Modelling, Latent Dirichlet allocation (LDA), Named Entity
Recognition (NER), Word Embedding, Word2Vec, Sentiment Analysis, BERT, ULMFiT
Image and Computer Vision: Object Detection, Image recognition, Pattern recognition, FaceNet, CNN, YOLO
Training and Optimization:
Adaptive gradient approaches, Regularization and overfitting, loss functions, Bayesian vs. maximum likelihood
estimation, dealing with class imbalance, K-fold cross validation, bias and variance
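K-fold cross validation is easy to describe but worth being able to write out. The sketch below splits sample indices into k contiguous folds without shuffling. The function name and the no-shuffle choice are assumptions made for this example. In practice you would usually shuffle first, or stratify by label when classes are imbalanced.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs.

    Every sample appears in exactly one validation fold. Fold sizes differ
    by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, validation))
        start = stop
    return folds


for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), val_idx)
```

Averaging a metric over the k validation folds gives a lower-variance estimate of generalization error than a single train/test split, at the cost of training the model k times.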
Evaluation metrics:
Accuracy, Precision, Recall, Area under ROC, R-squared, Mean average precision (MAP), Mean reciprocal rank,
Equal Error rate, A/B testing fundamentals
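To make a few of these metrics concrete, here is a small sketch that computes precision, recall, and mean reciprocal rank from scratch. The function names and the input conventions (0/1 labels, and one list of 0/1 relevance flags per query in ranked order) are assumptions for this example.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_reciprocal_rank(ranked_relevance):
    """Average 1/rank of the first relevant result over all queries.

    A query with no relevant result contributes 0.
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)


p, r = precision_recall([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])  # 2 TP, 1 FP, 1 FN
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]])  # (1/2 + 1 + 0) / 3
```

Which metric matters depends on the problem: precision and recall suit imbalanced classification, while mean reciprocal rank suits ranking tasks where only the first relevant result counts.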
Interviewer’s expectation:
The interviewer expects you to solve the problem with minimal hints. The interviewer will be interested in your
thinking process and in hearing your explanations before you start solving the question on the spot. You could
start with a brute-force approach and later try to optimize it. Think out loud and be vocal in your communication
with your interviewer. Also, remember to ask clarifying questions.
Key points to keep in mind
• You should be able to convert your thoughts into code and cover all edge cases
• Understanding time and space complexity
• Code should be bug free, readable and modular
• Exposure to dynamic programming and coding conventions
Use Suitable Data Structures:
Wherever applicable, pick suitable data structures to solve the problem. For example, using a Stack for
Parentheses Validation solves the problem more easily than using a Heap. During preparation, try to learn as
many data structures as possible.
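The parentheses example can be sketched in a few lines. This is one possible solution, not a reference answer. The function name and the choice to ignore non-bracket characters are assumptions for this example.

```python
def is_balanced(text):
    """Return True if every bracket in `text` is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack = []  # a Python list used as a stack of unmatched opening brackets
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean the input is unbalanced


print(is_balanced("([]{})"))  # True
print(is_balanced("(]"))      # False
print(is_balanced("(("))      # False
```

The stack fits because the most recently opened bracket must be the first one closed (last in, first out). The solution runs in O(n) time and uses O(n) extra space in the worst case. The edge cases worth mentioning out loud are the empty string, a closer with no opener, and unclosed openers at the end.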
Keep in mind that Amazon is a data-driven company. When you answer questions, focus on the question asked,
make sure your answer is well structured, and provide examples using metrics or data where applicable.
Reference recent situations whenever possible.
Scope of work
Our work includes, but is not limited to, Recommendation engines, eCommerce fraud detection, Large-scale
optimization, Automated pricing, Demand forecasting, Predicting ad click probabilities, Ranking product search
results, Matching and Classifying products, Search optimizations, Information extraction, Sentiment analysis,
Alexa ASR, Robotics, Natural language understanding, Question answering, Delivery Routes and Conversational
systems.
Amazon Press
• Fortune: Bezos article, ‘I’ve made billions of dollars of failures’
• Day One: The Amazon Blog
• How Amazon Hires
• Amazon Science page
• Publications
• Conferences-and-events
• Select-awards-and-recognition