Amazon Data Analyst Interview Questions -1
Amazon Data Analyst Interview Questions -1
Interview Questions
0-3 YOE
22-25 LPA
SQL Questions
Example:
FROM Employees
• The subquery (SELECT MAX(Salary) FROM Employees) finds the maximum salary.
• The outer query finds the maximum salary that is less than the highest salary.
Example:
WITH DuplicateCTE AS (
SELECT *,
FROM Employees
Example:
SELECT
EmployeeID,
Name,
Salary,
FROM Employees;
• The RANK() function assigns a rank to each row based on salary in descending order.
• If two employees have the same salary, they will have the same rank, and the next
rank will be skipped.
Example:
SELECT CustomerID,
SUM(SalesAmount) AS Total_Sales
FROM Sales
GROUP BY CustomerID
• The result is ordered in descending order, and the LIMIT clause restricts the result to
the top N customers.
Example:
-- Using WHERE
SELECT *
FROM Employees
-- Using HAVING
GROUP BY Department
If you need further assistance or more questions solved, feel free to ask!
2. Define Churn: Clearly define what constitutes churn, such as no transactions in the
last 6 months.
3. Feature Engineering: Create features like average transaction value, days since the
last transaction, tenure, etc.
7. KPIs: Monitor churn rate, retention rate, customer lifetime value (CLV), etc.
CustomerID,
FROM CustomerTransactions
GROUP BY CustomerID
This query identifies customers who have not made any transactions in the last 6 months.
2. Data Preprocessing: Handle missing values, remove outliers, and create time
series data.
6. Forecasting: Predict sales for the next quarter and adjust strategies accordingly.
result = model.fit()
print(forecast)
1. Define Objective: Identify the metric to optimize, like click-through rate (CTR) or
conversion rate.
2. Formulate Hypothesis: State the null hypothesis (no difference) and the alternative
hypothesis (there is a difference).
3. Sample Selection: Randomly divide the audience into two groups — Group A
(control) and Group B (test).
4. Experiment Design: Implement the changes in the test group while keeping the
control group unchanged.
6. Analyze Results: If the p-value < 0.05, reject the null hypothesis — the change has
a significant impact.
Example:
SELECT
Version,
COUNT(*) AS Users,
SUM(Conversion) AS Conversions,
GROUP BY Version;
• The query helps compare the conversion rates for versions A and B.
• Customer Lifetime Value (CLV): Estimated revenue from a subscriber during their
lifetime.
SELECT
COUNT(*) AS Total_Subscribers,
FROM PrimeSubscriptions;
3. Seasonal ARIMA (SARIMA): Time series model that accounts for seasonality.
result.plot();
• The decomposition shows the observed, trend, seasonal, and residual components.
Example:
In Power BI, use a line chart to analyze monthly sales trends and a bar chart to compare
regional performance.
• Drill-Down: Navigate from a summary level to a detailed view within the same
visual.
• Drill-Through: Navigate from one page to another with a focus on specific data.
o Example: Right-click a region in a report to view detailed sales data on a
separate page.
2. Add a drill-through field like Product Category to view detailed sales by category.
13. What are parameters in Tableau, and how do you use them?
Solution: Parameters in Tableau are dynamic input values used for filtering, calculations,
and control of visualizations.
Example:
Creating a parameter to switch between Sales and Profit:
1. Create a Parameter:
ELSE [Profit]
END
3. Use the calculated field in the visualization, and apply the parameter control.
In Power BI:
YoY Growth =
CALCULATE(
[Total Sales],
SAMEPERIODLASTYEAR(Sales[Order Date])
YoY Growth % =
DIVIDE(
[YoY Growth],
o X-axis: Year
In Tableau:
• Use Table Calculations:
1. Data Modeling:
4. Query Optimization:
5. Visual Optimization:
Key Properties:
• 99.7% of data lies within ±3σ from the mean (Empirical Rule)
Importance:
• It is the foundation for various statistical tests like t-tests and confidence
intervals.
Example:
If the average IQ score is 100 with a standard deviation of 15:
• Low p-value (< 0.05): Strong evidence against H₀ → Reject the null hypothesis.
• High p-value (> 0.05): Weak evidence against H₀ → Fail to reject the null hypothesis.
Interpreting p-value:
Example:
In an A/B test, suppose p = 0.03 when comparing two marketing strategies.
• Since p < 0.05, there is significant evidence to reject the null hypothesis that both
strategies perform the same.
• Right (Positive) Skew: Tail extends to the right; Mean > Median > Mode
• Left (Negative) Skew: Tail extends to the left; Mean < Median < Mode
• Skewness Coefficient:
o Pearson’s Skewness: Skewness=3(Mean−Median) / Standard Deviation
import pandas as pd
Example:
A dataset of house prices typically shows right skew because a few expensive houses
inflate the mean.
• Correlation: Measures the strength and direction of the relationship between two
variables. It does not imply causation.
o 0: No correlation
Example:
• Correlation: Ice cream sales and drowning incidents are positively correlated.
However, ice cream consumption does not cause drowning — it's due to summer
temperatures.
• Causation: Increasing the number of study hours leads to better exam scores.
20. Explain the Central Limit Theorem (CLT) in simple terms.
Solution:
The Central Limit Theorem (CLT) states that the distribution of sample means approaches
a normal distribution as the sample size increases, regardless of the original data
distribution.
Key Points:
• The mean of the sample means equals the population mean (µ).
• The standard deviation of sample means is called the Standard Error (SE):
SE=σ/n
where σ is the population standard deviation and n is the sample size.
Importance:
Example:
If we take the average height of random samples of 30 students from a population, the
distribution of those sample means will approximate a normal distribution, regardless of
the population's height distribution.
Methods:
• Forward/Backward Fill:
import pandas as pd
return data.rolling(window=window).mean()
# Example
print(moving_average(data, window=3))
Output:
0 NaN
1 NaN
2 20.0
3 30.0
4 40.0
5 50.0
dtype: float64
• The first two values are NaN because the window size is 3.
Use Cases:
• agg(): Applies aggregation functions like sum, mean, max, etc., on grouped data.
Example:
import pandas as pd
print(grouped)
Output:
Salary
mean sum
Department
HR 52500 105000
IT 62500 125000
Explanation:
Use Cases:
Merge Methods:
import pandas as pd
print(merged_df)
Output:
ID Name Age
0 2 Bob 25
1 3 Charlie 30
• Outer Join: All data from both, filling NaN for non-matches.
Examples:
• apply() on DataFrame:
import pandas as pd
print(df)
Output:
A B Sum
01 4 5
12 5 7
23 6 9
• map() on Series:
df['Mapped_A'] = df['A'].map(mapping)
print(df)
Output:
A B Sum Mapped_A
01 4 5 One
12 5 7 Two
2 3 6 9 Three
When to Use: