AB Testing With R Programming

A/B testing, also called split testing, is a common method for testing a new feature or product online. The goal is to design an experiment whose results are repeatable and robust enough to support an informed decision about whether or not to launch. Typically, the test compares two versions of a web page, labelled variants A and B. Each version is shown to a similar number of visitors, and their conversion rates are compared. More generally, two or more variations of the same page are shown to real visitors at the same time, and the results determine which one performs better for a given goal. A/B testing is not limited to web pages: it can also be applied to emails, pop-ups, sign-up forms, apps, and more. Let's work through a case study and implement A/B testing in R.

Case Study

Suppose we have the results of an A/B test run on two versions of a hotel booking website (note: the data is not real). First, we analyze the test data; second, we draw conclusions from that analysis; and finally, we make recommendations to the product or management teams.

Data Set Summary

Download the data set from here.

Variant A is the control group, which sees the existing feature or product on the website.
Variant B is the experimental group, which sees the new version of the feature or product. The aim is to find out whether users like it and whether it increases conversions (bookings).
Converted is a logical column with two values: TRUE when the customer completes a booking and FALSE when the customer visits the site but does not make a booking.

Test Hypothesis

Null Hypothesis: Versions A and B have an equal probability of conversion (driving a customer booking). In other words, there is no difference between versions A and B: p_B = p_A, where p_A and p_B are the conversion probabilities of the control and experimental versions.
Alternative Hypothesis: Versions A and B have different probabilities of conversion, so there is a difference between versions A and B: p_B ≠ p_A. The hope is that version B is better than version A at driving bookings, but the test below is two-sided, so it checks for a difference in either direction.

Analysis in R

1. Prepare the dataset and load the tidyverse library, which contains the packages used in the analysis.

# load the library
library(tidyverse)

# set up your own directory
setwd("~egot_\\Projects\\ABTest")

# Using read.csv base import function  
ABTest <- read.csv("Website Results.csv", 
                   header = TRUE)

# save in your own directory
save(ABTest, file = "~rda\\ABTest.rda")

2. Let’s filter conversions for variants A & B and compute their corresponding conversion rates

# Keep only the converted visits for variant_A
conversion_subset_A <- ABTest %>% 
    filter(variant == "A" & converted == TRUE)

# Total Number of Conversions for variant_A
conversions_A <- nrow(conversion_subset_A)

# Number of Visitors for variant_A
visitors_A <- nrow(ABTest %>% 
    filter(variant == "A"))

# Conversion_rate_A
conv_rate_A <- (conversions_A/visitors_A)  
print(conv_rate_A) # 0.02773925

# Let's take a subset of conversions for variant_B
conversion_subset_B <- ABTest %>% 
    filter(variant == "B" & converted == TRUE)

# Number of Conversions for variant_B
conversions_B <- nrow(conversion_subset_B)

# Number of Visitors for variant_B
visitors_B <- nrow(ABTest %>% 
    filter(variant == "B"))

# Conversion_rate_B
conv_rate_B <- (conversions_B/visitors_B)  
print(conv_rate_B) # 0.05068493

Output:

0.02773925
0.05068493
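The filter-and-count steps above can be condensed into a single grouped summary with dplyr (part of the tidyverse). This is a sketch that assumes the data has a character `variant` column and a logical `converted` column. To keep it self-contained, it builds a hypothetical stand-in, `ABTest_toy`, from the counts reported in the conclusions below (20 of 721 for A, 37 of 730 for B); with the real data you would pipe `ABTest` instead.

```r
library(dplyr)

# Hypothetical stand-in for the ABTest data frame, rebuilt from the
# article's counts (A: 20 of 721 converted, B: 37 of 730 converted).
# Only the two columns used in the analysis are reproduced.
ABTest_toy <- data.frame(
  variant   = c(rep("A", 721), rep("B", 730)),
  converted = c(rep(TRUE, 20), rep(FALSE, 701),
                rep(TRUE, 37), rep(FALSE, 693))
)

# One row per variant: visitors, conversions and conversion rate.
# sum() on a logical column counts the TRUE values.
conv_summary <- ABTest_toy %>%
  group_by(variant) %>%
  summarise(
    visitors    = n(),
    conversions = sum(converted),
    conv_rate   = conversions / visitors
  )

print(conv_summary)
```

The `conv_rate` column reproduces the two conversion rates printed above (20/721 and 37/730), and adding more variants needs no extra code.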

3. Let’s compute the relative uplift from the conversion rates of A and B. The uplift is the percentage increase of B's conversion rate over A's.

uplift <- (conv_rate_B - conv_rate_A) / conv_rate_A * 100
uplift # 82.72%

Output:

82.72%

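B's conversion rate is about 83% higher than A's. An uplift this large looks promising, but on its own it is not enough to declare a winner: we still need to check whether the difference is statistically significant.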

4. Let’s compute the pooled sample proportion, the pooled standard error, the margin of error, and the difference in proportions (point estimate) for variants A & B

# Pooled sample proportion for variants A & B
p_pool <- (conversions_A + conversions_B) / (visitors_A +
                                             visitors_B)
print(p_pool) # 0.03928325

# Let's compute Standard error for variants A & B (SE_pool)
SE_pool <- sqrt(p_pool * (1 - p_pool) * ((1 / visitors_A) + 
                                         (1 / visitors_B)))
print(SE_pool) # 0.01020014

# Let's compute the margin of error for the pool
MOE <- SE_pool * qnorm(0.975)
print(MOE) # 0.0199919

# Point Estimate or Difference in proportion
d_hat <- conv_rate_B - conv_rate_A

Output:

0.03928325
0.01020014
0.0199919
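For reference, the quantities computed in this step, together with the z-score in the next step, follow the standard pooled two-proportion z-test formulas. Here x_A and x_B are the conversions and n_A and n_B the visitors (conversions_A, visitors_A, and so on in the code), and z_{0.975} is qnorm(0.975), roughly 1.96.

```latex
\begin{aligned}
\hat p_A &= \frac{x_A}{n_A}, \qquad \hat p_B = \frac{x_B}{n_B} \\
\hat p_{\text{pool}} &= \frac{x_A + x_B}{n_A + n_B} = \frac{20 + 37}{721 + 730} = \frac{57}{1451} \approx 0.0393 \\
SE_{\text{pool}} &= \sqrt{\hat p_{\text{pool}}\,(1 - \hat p_{\text{pool}})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)} \\
\text{MOE} &= z_{0.975}\, SE_{\text{pool}} \\
\hat d &= \hat p_B - \hat p_A, \qquad z = \frac{\hat d}{SE_{\text{pool}}}
\end{aligned}
```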

5. Let’s compute the z-score

# Compute the Z-score so we
# can determine the p-value
z_score <- d_hat / SE_pool
print(z_score) # 2.249546

Output:

2.249546
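The calculations in steps 4 and 5, plus the p-value and interval that follow, can be wrapped in a small reusable function. `two_prop_ztest` is a hypothetical helper written for illustration, not a function from base R or the tidyverse. Like this article, it uses the pooled standard error for both the test and the interval.

```r
# Hypothetical helper (not from any package): two-sided pooled
# two-proportion z-test using only base R.
two_prop_ztest <- function(x_a, n_a, x_b, n_b, conf_level = 0.95) {
  p_a    <- x_a / n_a                        # conversion rate of A
  p_b    <- x_b / n_b                        # conversion rate of B
  p_pool <- (x_a + x_b) / (n_a + n_b)        # pooled proportion under H0
  se     <- sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
  d_hat  <- p_b - p_a                        # point estimate of the difference
  z      <- d_hat / se
  p_val  <- 2 * pnorm(-abs(z))               # two-sided p-value
  moe    <- qnorm(1 - (1 - conf_level) / 2) * se
  list(d_hat = d_hat, se_pool = se, z = z, p_value = p_val,
       ci = c(d_hat - moe, d_hat + moe))
}

# Counts from this article
res <- two_prop_ztest(x_a = 20, n_a = 721, x_b = 37, n_b = 730)
str(res)
```

Called with the article's counts, `res$z` and `res$p_value` should agree with the z-score above and the p-value in the next step (about 2.2495 and 0.0245).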

6. Using this z-score, we can quickly determine the p-value via a look-up table, or using the code below:

# Let's compute p_value 
# using the z_score value
# (two-sided: double the tail area beyond |z|)
p_value <- 2 * pnorm(q = -abs(z_score), 
                     mean = 0, 
                     sd = 1)
print(p_value) # 0.02447777

Output:

0.02447777
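As a cross-check, base R's `prop.test()` runs the equivalent chi-squared test on two proportions. With `correct = FALSE` (no Yates continuity correction), its X-squared statistic equals the square of the pooled z-score, and its p-value matches the two-sided p-value computed above. A sketch using the article's counts:

```r
# Conversions and visitors from this article, ordered (B, A) so that any
# reported difference is p_B - p_A
x <- c(B = 37, A = 20)
n <- c(B = 730, A = 721)

pt <- prop.test(x = x, n = n, correct = FALSE)

# Pooled z-score recomputed for comparison
p_pool  <- sum(x) / sum(n)
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z       <- (x[["B"]] / n[["B"]] - x[["A"]] / n[["A"]]) / se_pool

print(pt$statistic)  # X-squared, equal to z^2
print(z^2)
print(pt$p.value)    # same two-sided p-value as 2 * pnorm(-abs(z))
```

If you leave `correct` at its default of TRUE, `prop.test()` applies a continuity correction and returns a somewhat larger p-value, so the numbers will no longer match exactly.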

7. Let’s compute the 95% confidence interval for the difference in conversion rates (B - A), using the pooled standard error

# Let's compute the confidence interval for the
# difference d_hat using the pre-calculated MOE
ci <- c(d_hat - MOE, d_hat + MOE) 
ci # 0.002953777 0.042937584

# Using the same steps as above,
# let's compute the confidence
# interval for variant A separately
X_hat_A <- conversions_A / visitors_A
se_hat_A <- sqrt(X_hat_A * (1 - X_hat_A) / visitors_A) 
ci_A <- c(X_hat_A - qnorm(0.975) * se_hat_A,
          X_hat_A + qnorm(0.975) * se_hat_A) 
print(ci_A) # 0.01575201 0.03972649

# Using the same steps as above,
# let's compute the confidence
# interval for variant B separately
X_hat_B <- conversions_B / visitors_B
se_hat_B <- sqrt(X_hat_B * (1 - X_hat_B) / visitors_B) 
ci_B <- c(X_hat_B - qnorm(0.975) * se_hat_B, 
          X_hat_B + qnorm(0.975) * se_hat_B) 
print(ci_B) # 0.03477269 0.06659717

Output:

0.002953777 0.042937584
0.01575201 0.03972649
0.03477269 0.06659717
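The interval for the difference above uses the pooled standard error, which is what the hypothesis test assumes under the null hypothesis. For a confidence interval on the difference, a common alternative is the unpooled (Wald) standard error, which is also what `prop.test(..., correct = FALSE)` reports in its `conf.int` component. A sketch comparing the two, with its own variable names:

```r
# Counts from this article
x_a <- 20; n_a <- 721
x_b <- 37; n_b <- 730
p_a <- x_a / n_a
p_b <- x_b / n_b
d_hat <- p_b - p_a

# Unpooled (Wald) standard error of the difference p_B - p_A
se_unpooled <- sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
ci_unpooled <- d_hat + c(-1, 1) * qnorm(0.975) * se_unpooled
print(ci_unpooled)

# Cross-check: prop.test without continuity correction reports the
# same interval (order (B, A) so the difference is p_B - p_A)
pt <- prop.test(x = c(x_b, x_a), n = c(n_b, n_a), correct = FALSE)
print(pt$conf.int)
```

With these counts the pooled and unpooled intervals are close, and both exclude zero.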

8. Let’s summarize the results computed so far in a data frame (table):

vis_result_pool <- data.frame(
  metric = c(
    'Estimated Difference',
    'Relative Uplift(%)',
    'pooled sample proportion',
    'Standard Error of Difference',
    'z_score',
    'p-value',
    'Margin of Error',
    'CI-lower',
    'CI-upper'),
  value = c(
    conv_rate_B - conv_rate_A,
    uplift,
    p_pool,
    SE_pool,
    z_score,
    p_value,
    MOE,
    ci[1],
    ci[2]
  ))
vis_result_pool

Output:

                        metric       value
1         Estimated Difference  0.02294568
2           Relative Uplift(%) 82.71917808
3     pooled sample proportion  0.03928325
4 Standard Error of Difference  0.01020014
5                      z_score  2.24954609
6                      p-value  0.02447777
7              Margin of Error  0.01999190
8                     CI-lower  0.00295378
9                     CI-upper  0.04293758

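The table above lists the numbers; a chart of the per-variant intervals from step 7 can make the comparison easier to read. A minimal ggplot2 sketch that recomputes the Wald intervals from the article's counts (`rates` and its columns are illustrative names):

```r
library(ggplot2)

# Per-variant conversion rates with 95% Wald intervals, recomputed from
# the article's counts (A: 20/721, B: 37/730)
rates <- data.frame(
  variant     = c("A", "B"),
  conversions = c(20, 37),
  visitors    = c(721, 730)
)
rates$rate  <- rates$conversions / rates$visitors
rates$se    <- sqrt(rates$rate * (1 - rates$rate) / rates$visitors)
rates$lower <- rates$rate - qnorm(0.975) * rates$se
rates$upper <- rates$rate + qnorm(0.975) * rates$se

# Point = conversion rate, bar = 95% interval
p <- ggplot(rates, aes(x = variant, y = rate)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(x = "Variant", y = "Conversion rate",
       title = "Conversion rate by variant (95% CI)")
print(p)
```

The two per-variant intervals overlap slightly, yet the test on the difference is significant at the 5% level. Overlapping individual intervals do not by themselves mean the difference is non-significant; the interval for the difference is the one to read.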
Recommendation & Conclusions

Variant A has 20 conversions out of 721 visitors, whereas variant B has 37 conversions out of 730 visitors.
Variant A's conversion rate is 2.77% and variant B's is 5.07%, a relative uplift of 82.72% for B over A.
The p-value for this test is 0.02448, which is below the conventional significance level of 0.05, so the result is statistically significant at the 5% level. The 95% confidence interval for the difference (about 0.003 to 0.043) also excludes zero.
Based on these results, we reject the null hypothesis and can proceed with the launch.
Therefore, accept variant B and roll it out to 100% of users.

For the full analysis and dataset details, see this GitHub link.

Limitations

A/B testing is one tool for conversion optimization, not a standalone solution. It will not fix every conversion problem, it cannot compensate for messy data, and improving conversions usually takes more than a single A/B test.
