A/B Testing
To test the causal effect of a treatment T on an outcome Y, estimated as P(Y | T, X)
Table of Contents
• Aim
• Objectives
• Self Assessments
• Activities
• Did You Know
• Summary
• Terminal Questions
Aim
Objectives
A/B testing
A/B testing, also known as split testing, is a method used to compare two versions of a webpage,
app, email, or other digital assets to determine which one performs better in terms of a specific
metric, such as conversion rate, click-through rate, or user engagement.
How A/B Testing Works:
1. Create Variations: You create two versions of the element you want to test. Version A is the
control, or original version, and Version B is the variation with some change (e.g., different color
buttons, headlines, images, etc.).
2. Split Traffic: The traffic to your site or app is randomly split between the two versions. Half of the
users see Version A, and the other half see Version B.
3. Measure Results: You track the performance of each version based on the metric you’re
interested in. For example, if you’re testing a landing page, you might measure how many users
sign up for a newsletter.
4. Analyze Data: After running the test for a sufficient period, you analyze the data to see which
version performed better. Statistical significance is often calculated to ensure the results are not
due to chance.
5. Implement Changes: If Version B outperforms Version A, you might implement the changes
from Version B across your site or app. If there’s no significant difference, you might try testing
other variations.
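As a rough illustration of steps 2 and 3, here is a minimal Python sketch that randomly splits hypothetical users between two versions and measures a conversion metric for each; the user counts and conversion rates below are made-up numbers, not data from a real test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Step 2: randomly split 10,000 hypothetical users between control (A) and variant (B)
users = pd.DataFrame({"user_id": np.arange(10_000)})
users["variant"] = rng.choice(["A", "B"], size=len(users))

# Simulated outcomes; in a real test, 'converted' would come from tracking, not simulation
assumed_rates = {"A": 0.10, "B": 0.12}            # illustrative conversion rates
users["converted"] = rng.random(len(users)) < users["variant"].map(assumed_rates)

# Step 3: measure the metric for each version
summary = users.groupby("variant")["converted"].agg(["sum", "count", "mean"])
print(summary)   # 'mean' is the observed conversion rate for A and B
```

Steps 4 and 5 (analysis and the launch decision) then operate on the observed rates per variant, as covered in the sections that follow.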
Applications of A/B Testing:
1. Website Optimization: Testing different layouts, call-to-action buttons, images, and content
to increase user engagement or conversions.
2. Email Campaigns: Testing subject lines, content, or send times to improve open rates and
click-through rates.
3. Product Features: Testing new features or changes in an app to determine how users
respond.
4. Advertising: Testing different ad creatives, headlines, and targeting options to improve ad
performance.
Goal of the feature/idea (the treatment):
1. What do you hope to achieve with this feature incorporation/update?
2. Why this specific feature update for that goal, and not any other feature?
3. Has this been experimented with before? Are other product lines also following suit?
4. Is the hypothesis supported by previous experiment data, industry insights, reports, or other
evidence?
5. Is this feature for a specific user group or for all user groups?
The test metric - the metric the feature is expected to impact, and the data available.
1. List the candidate metrics and finalize the metric we want to test for significance.
• Metrics - In this case study we can focus on the primary impacts of the feature:
more user engagement, daily active users, etc.
• User Engagement can be defined as the % of active users who have engaged with
Facebook in some way (likes, comments, saves, reactions).
• Daily Active Users - the # of unique users who have logged on to Facebook each day.
We expect this metric to increase with this new feature.
2. Come up with north star metrics, supporting metrics (if applicable) and guard rail metrics.
• North Star Metric - the most crucial parameter to focus on, e.g. % of users with
engagement
• Supporting Metric - Daily Active Users
• Guard Rail Metric (the metric that should not degrade in pursuit of a new feature)
- % of media content (assuming media content provides more value, we don't want this
% to decrease because of this feature). Or, because this feature takes up so much
space, are we seeing a lower number of posts on average that people interact with?
(A sketch of how these metrics could be computed from an events log follows this list.)
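The sketch below shows one way these metrics might be computed, assuming a hypothetical events log with user_id, date, and action columns; the schema and the set of actions counted as "engagement" are assumptions for illustration, not a real Facebook data model.

```python
import pandas as pd

# Hypothetical events log: one row per user action per day (schema is an assumption)
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "date":    pd.to_datetime(["2024-01-01"] * 4 + ["2024-01-02"] * 2),
    "action":  ["like", "view", "comment", "view", "save", "reaction"],
})

ENGAGEMENT_ACTIONS = {"like", "comment", "save", "reaction"}   # assumed definition of "engaged"

# Supporting metric: daily active users (unique users with any activity that day)
daily = events.groupby("date").agg(dau=("user_id", "nunique"))

# North star metric: % of active users who engaged (liked, commented, saved, reacted)
engaged = (
    events[events["action"].isin(ENGAGEMENT_ACTIONS)]
    .groupby("date")["user_id"]
    .nunique()
    .rename("engaged_users")
)
daily = daily.join(engaged).fillna(0)
daily["pct_engaged"] = daily["engaged_users"] / daily["dau"]
print(daily)
```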
Steps - A/B testing
1. Set up hypothesis (State the null hypothesis & alternate hypothesis) - What would be the
null and alternative hypothesis in the case?
a. Null Hypothesis (H0) - There is no significant difference in user
engagement between the treatment and control groups.
b. Alternative Hypothesis (H1) - There is a significant difference in user
engagement between the treatment and control groups.
2. Choice of test - Since we are comparing two proportions (engaged vs. not engaged) across
two groups, we can use the two-proportion Z-test.
a. A Z-test (or T-test) is a statistical test used to determine whether there is a significant
difference between the means (or proportions) of two groups, or between a sample
mean and a population mean.
3. Choosing experiment control & treatment subjects
a. Who is the experiment being run on?
b. Are we targeting all users on the platform, or should we pick a segment of users for
whom we feel this test is particularly well suited?
4. Sample Size Calculation
a. Baseline metrics
i. Let’s say that before this feature launch, the user engagement is around 45%
b. Minimum detectable effect - what change is considered meaningful enough for you to take an action
i. Let’s say that the business stakeholders are hoping for a 1% increase in user engagement in the
treatment group
c. Significance level (usually 5%, corresponding to a 95% confidence level).
d. Power (usually 80%): a function of the significance level, sample size, and the size of the effect
being detected. Power close to 100% means the test is good at detecting a false null hypothesis.
Increasing the significance level increases the power of the test.
e. With the above numbers, we would need roughly 40K users in each group to design this
experiment in a statistically significant manner (see the sketch after these steps).
5. Experiment Duration - based on the estimated sample size and the approximate traffic:
● Divide the total sample size by the daily traffic.
i. We need a total sample size of 80K (40K in each group) based on the above
calculation.
ii. Assume FB gets a traffic of 5K users every day.
iii. Experiment duration = 80K / 5K = 16 days.
6. Significance testing - once we have reached the required sample size, run the significance
test on the north star metric.
7. Continue monitoring supporting metrics and guard rail metrics.
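The sample-size, duration, and significance-testing steps above can be sketched in a few lines of Python with statsmodels. The baseline, minimum detectable effect, daily traffic, and the engaged-user counts fed into the z-test are the illustrative numbers from the steps, not real data.

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize, proportions_ztest

# --- Step 4: sample size ------------------------------------------------------
baseline = 0.45      # current user engagement
mde      = 0.01      # minimum detectable effect: 45% -> 46%
alpha    = 0.05      # 5% significance level (95% confidence)
power    = 0.80

effect_size = proportion_effectsize(baseline + mde, baseline)      # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, ratio=1.0, alternative="two-sided"
)
print(f"Sample size per group: {n_per_group:,.0f}")                # roughly 39K, i.e. ~40K

# --- Step 5: experiment duration ----------------------------------------------
daily_traffic = 5_000                                              # assumed users per day
duration = math.ceil(2 * n_per_group / daily_traffic)
print(f"Experiment duration: {duration} days")                     # ~16 days

# --- Step 6: significance test on the north star metric ------------------------
# Made-up engaged-user counts once each group has reached ~40K users
engaged = [18_000, 18_600]          # control: 45.0%, treatment: 46.5%
n_obs   = [40_000, 40_000]
z_stat, p_value = proportions_ztest(count=engaged, nobs=n_obs, alternative="two-sided")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the engagement difference is statistically significant.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

The power calculation returns roughly 39K users per group, which is where the ~40K figure in step 4 comes from, and the z-test in step 6 compares the engaged-user proportions once both groups have reached that size.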
Testing Pitfalls - A/B testing
Experimental Design Bias
1. Novelty / Primacy Effect
a. Primacy Effect - When changes happen, some people who are used to how things work
may feel reluctant to change.
● Some users in the treatment group are reluctant to try out the new feature because they
were used to the older status UI, so they post status updates on FB less often.
● So user engagement for the first 2 weeks is low: Week 1 = 45% and Week 2 = 48%.
● But as these reluctant users see more users engaging with the colored status button,
they slowly start using this feature more.
● So from Week 3 onwards, the user engagement stabilizes at 62%.
● It's important not to take the first 2 weeks of low user engagement, caused by the
primacy effect, into consideration when comparing with control.
➢ Here it would have shown that there is no significant difference between the two
groups in the first 2 weeks, even though we subsequently see that this feature
actually gets the users more engaged.
b. Novelty Effect - These users resonate with the new change and use it more frequently.
● Some users in the treatment group get excited about the new feature.
● The excited users use this feature and engage more in the first two weeks, after
which the excitement dies down.
● So user engagement for the first 2 weeks is high: Week 1 = 65% and Week 2 = 68%.
● But from Week 3 onwards, the user engagement stabilizes at 52%.
● It's important not to take the first 2 weeks of high user engagement, caused by the
novelty effect, into consideration when comparing with control.
c. Neither of these is a long-term effect, so it's important that results are not biased by
them. Treatment results may be exaggerated or undermined initially due to these effects.
d. Solutions:
● Run the experiment for longer than required, if possible, to observe any
novelty or primacy effect.
● Conduct the test only on first-time users.
● Compare first-time users with experienced users in the treatment group to get an
estimated impact of the primacy/novelty effect (see the sketch after this list).
2. Group Interference - Interference between variants happens a lot. It's important to select
your sample in such a way that this interaction doesn't bias the results.
a. E.g., the treatment group is seeing a positive effect because of this new FB status
feature.
b. This effect can spill over to the control group (users who do not see the new feature but
make new posts after seeing a friend in the treatment group who is affected by it). This
is called a network effect.
c. In this case, the measured difference underestimates the treatment effect.
d. In reality, the difference may actually be more than 1%, but due to the network effect,
Actual Effect > Measured Effect.
e. Hence the test gives an incorrect result that this new feature did not significantly impact
the north star metric.
3. Outcome Bias
Look out for other design or system issues that could cause the measured treatment effect to
be under- or overestimated relative to the actual effect.
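A rough sketch of the first-time vs. experienced user comparison suggested above for detecting novelty/primacy effects; the weekly engagement rates mirror the primacy-effect example, and the table layout and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical weekly engagement inside the treatment group, split by user tenure
treatment = pd.DataFrame({
    "week":       [1, 1, 2, 2, 3, 3],
    "segment":    ["first_time", "experienced"] * 3,
    "engagement": [0.61, 0.45, 0.62, 0.48, 0.62, 0.62],   # made-up rates
})

pivot = treatment.pivot(index="week", columns="segment", values="engagement")
pivot["gap"] = pivot["first_time"] - pivot["experienced"]
print(pivot)
# A large gap in weeks 1-2 that closes by week 3 suggests the early treatment numbers
# are distorted by a primacy effect among experienced users.
```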
Recommendations - based on experiment results “Launch or not?”
Link results to the goal and business impact
1. Example: What does 1% lift in engagement rate translate to revenue?
● If the 1% lift increases ad revenue by $20M, it might be worth it; however, if it only
increases revenue by $50K, it might not be (based on an estimation of the effort involved).
2. Is it worth it to launch the product given all the costs?
3. While the perfect scenario is that the increase in the success metrics is significant and we
don't see any degradation in the guardrail metrics, give recommendations on what to do in
case of conflicting results.
4. Example: There's an increase in % user engagement among active users, but daily active
users have decreased.
5. Translate this to impact to users and business -
a. Is the increased engagement among existing users bringing increased revenue to
balance out the loss of some daily active users?
b. For example:
● Let's say the daily active users were 5K earlier but now it has come down to 3K.
● However the user engagement has increased from 45% to 65%
● If the increase in user engagement has led to a revenue increase despite the loss
of daily active users, this feature might be worth considering.
● It's also worth thinking through a strategy to retain daily active users as a next step.
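A back-of-envelope sketch of question 5a above, using the example numbers. Treating revenue as scaling with the number of engaged users is a deliberate simplification for illustration; real ad revenue depends on impressions, time spent, and other factors.

```python
# Numbers from the example above
dau_before, engagement_before = 5_000, 0.45
dau_after,  engagement_after  = 3_000, 0.65

engaged_before = dau_before * engagement_before   # 2,250 engaged users/day
engaged_after  = dau_after * engagement_after     # 1,950 engaged users/day

# If ad revenue scales with engaged users, revenue per engaged user would need to
# rise by at least this factor for the launch to break even on revenue:
breakeven_uplift = engaged_before / engaged_after
print(f"Engaged users: {engaged_before:,.0f} -> {engaged_after:,.0f}")
print(f"Revenue per engaged user must rise by ~{(breakeven_uplift - 1) * 100:.0f}% to break even")
```

With these example numbers, engaged users drop from 2,250 to 1,950 per day, so revenue per engaged user would need to rise by roughly 15% for the launch to break even under this simplified model.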
Consider the short-term and long-term impact of the launch -
1. Sometimes a short-term impression increase can conflict with the brand image or company’s
mission in the long run.
2. One reasonable suggestion could be that even with the decrease in daily active users, the
launch of the colored background status could potentially bring more engaged users to the
platform, and in the long term the benefits may outweigh the drawbacks.
Activities
Outcomes:
Terminal Questions