DDDM_Lecture3_ExperimentBasics_Dec11
Shuyang Yang
Dec 11th
Agenda
Lecture 1: A Peep into DDDM
I. Best Practices for Conducting Rigorous A/B Tests
• Hypothesis testing
• Test Statistics
• MDE (Minimum Detectable Effect)
• Iterative improvement: small, incremental tests compound over time to create significant gains
• Measuring real-world impact: evaluate changes directly with real users/customers in real-world conditions
"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong." -- Richard Feynman
• The advantages of A/B testing only hold if the test is conducted rigorously and interpreted correctly
• A poorly designed or executed A/B test can lead to misleading conclusions, wasted resources, and poor decisions
• Measurable: clearly defines the variables involved and the expected outcome, and includes metrics or criteria to evaluate success
• Target population
• Traffic requirement (sample size per variant; see the sample-size sketch below)
https://www.evanmiller.org/ab-testing/sample-size.html
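A minimal sketch of the calculation behind such calculators, assuming an illustrative 10% baseline conversion rate and a 2-point absolute MDE (neither value comes from the lecture), using statsmodels:

```python
# Required sample size per variant for a two-proportion test.
# baseline, mde_abs, alpha, and power below are illustrative assumptions.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10          # assumed baseline conversion rate
mde_abs = 0.02           # assumed absolute minimum detectable effect
alpha, power = 0.05, 0.80

# Cohen's h effect size between baseline + MDE and baseline
effect = proportion_effectsize(baseline + mde_abs, baseline)

# Solve for the per-group sample size of a two-sided z-test
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=alpha, power=power, alternative="two-sided"
)
print(f"Need roughly {n_per_variant:.0f} users per variant")
```

Dividing the required total sample by expected daily traffic gives the test duration discussed next.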
Best Practices: Set Up the Experiment
Traffic Requirement – Establish Test Duration
• Traffic allocation:
• Ensure that participants are correctly allocated to control and variation groups
• No significant discrepancies in sample size (see the allocation-check sketch after this list)
• Early indicators of results:
• Monitoring trends can help identify potential issues early
• Data integrity audits:
• Detect and resolve data issues before analysis
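As noted in the allocation bullet above, a minimal sketch of a sample-ratio-mismatch (SRM) check, assuming an intended 50/50 split and illustrative bucket counts:

```python
# Chi-square goodness-of-fit test comparing observed bucket sizes to the
# intended allocation; counts and the 50/50 split are illustrative.
from scipy.stats import chisquare

observed = [50_321, 49_425]            # users actually bucketed into control / variation
total = sum(observed)
expected = [total * 0.5, total * 0.5]  # intended 50/50 allocation

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.001:  # a common strict alert threshold for SRM checks
    print(f"Possible sample ratio mismatch (p = {p_value:.4g}); investigate before analysis")
else:
    print(f"No significant discrepancy in sample sizes (p = {p_value:.3f})")
```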
4. Compare the p-value to 𝛼
• Reject the null hypothesis if p < 𝛼; otherwise, fail to reject (a z-test sketch follows below)
• Statistical power: 1 − 𝛽
• The test's ability to detect a true effect when it exists (probability of avoiding a Type II error)
• Common desired level: 80%
• Ensures a test is sensitive enough to detect a meaningful effect
The lower the significance level (𝛼), the more likely a Type II error occurs; reserve a stricter 𝛼 for cases where false positives are critical (e.g., clinical trials)
• Common standard: set 𝛼 = 0.05, 𝛽 = 0.2
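A minimal sketch of the full comparison as a two-proportion z-test with the standard 𝛼 = 0.05; the conversion counts are illustrative:

```python
# Two-proportion z-test: compute the test statistic and p-value, then
# compare the p-value to alpha. Counts below are illustrative.
from statsmodels.stats.proportion import proportions_ztest

conversions = [1_230, 1_118]   # successes in variation, control
samples = [10_000, 10_000]     # users per group
alpha = 0.05

z_stat, p_value = proportions_ztest(count=conversions, nobs=samples,
                                     alternative="two-sided")
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```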
• Use case:
• Analysis unit is different from experiment unit
• e.g. average clicks per content
• Observations at the group level are NOT independent – the naive variance estimate is incorrect
• Ratio metrics measure the ratio of two metrics
• e.g. page-level CTR = # clicks / # page visits
  = (# clicks / # users) / (# page visits / # users)
  = (user-level mean) / (user-level mean): a ratio metric built from i.i.d. (numerator, denominator) pairs per user
• Delta Method: first-order Taylor expansion
  $\mathrm{CTR} = \dfrac{\bar{X}}{\bar{Y}} = \dfrac{\sum_i \mathrm{clicks}_i}{\sum_i \mathrm{visits}_i} = f(\bar{X}, \bar{Y})$, where $\bar{X}$ and $\bar{Y}$ are the user-level means of clicks and page visits
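A minimal sketch of the first-order delta-method variance for the page-level CTR above, computed from illustrative user-level (clicks, visits) pairs:

```python
# Delta method for a ratio of means: CTR = mean(clicks) / mean(visits).
# The per-user arrays are illustrative.
import numpy as np

clicks = np.array([3, 0, 5, 2, 1, 4, 0, 2])    # clicks per user
visits = np.array([10, 4, 12, 8, 5, 9, 3, 7])  # page visits per user
n = len(clicks)

x_bar, y_bar = clicks.mean(), visits.mean()
var_x, var_y = clicks.var(ddof=1), visits.var(ddof=1)
cov_xy = np.cov(clicks, visits, ddof=1)[0, 1]

ctr = x_bar / y_bar
# Var(X̄/Ȳ) ≈ [Var(X)/Ȳ² − 2·X̄·Cov(X,Y)/Ȳ³ + X̄²·Var(Y)/Ȳ⁴] / n
var_ctr = (var_x / y_bar**2
           - 2 * x_bar * cov_xy / y_bar**3
           + x_bar**2 * var_y / y_bar**4) / n
print(f"CTR = {ctr:.3f}, SE(CTR) = {np.sqrt(var_ctr):.4f}")
```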
• Minimum detectable effect (MDE): the smallest effect size that a statistical test can reliably detect given the sample size, significance level, and power
• Represents the smallest change in a metric (e.g., conversion rate or revenue) that you want to detect with a specified level of confidence → determine prior to the test based on the business hypothesis
." 0 /
/ .#
1 − 𝛽 = Pr( > 𝑍"02/4 |𝑝1, 𝑝2) =0.8
%$" "& %$" %$ "& %$#
1 #
'" '#