You came up with a great idea for a new shopping experience. You made all the estimates, set everything up, and launched the experiment. Success! Now what? How can you get valuable insights from the A/B test results? Let’s dig in.
The use case
In our example, we will add a new section with images to our product page. The new section will be placed below the product details area and will show how real people wear and style the product.
Hypothesis: adding the Bllush content layer will create interest among users. They can imagine how the product looks in real life, which increases the probability that they find a fit and add it to the cart. The estimated relative lift in conversion rate is 5%. Conversion rate is calculated as the share of product page views that lead to a purchase (buy to detail rate, or BTDR).
Current conversion (control): 3.67%
Expected conversion (challenger): 3.86%
Running A/B testing
The key parameter we would like to test is how the content layer affects the overall conversion rate, from the moment the user sees the layer to purchasing the product. Of course, we will also analyze secondary KPIs, but the main goal here is to increase revenue. Using the A/B testing calculator, we will set the following:
- Baseline conversion: 3.67%
- Minimum detectable effect: 5% relative (a challenger conversion rate of 3.86%)
- The sample size needed for a significant result: 222,000 per variation
- Traffic will be split randomly between groups with a 50:50 ratio
- Overall traffic needed: 444,000 visitors
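The calculator's numbers can be approximated with the standard sample size formula for comparing two proportions. The sketch below is not the calculator's actual implementation: the significance level (5%, two-sided) and the power (90%) are assumptions, and the function name is hypothetical. Under these assumptions the results land in the same range as the figures quoted in this post, but they will not match any given calculator exactly.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_lift, alpha=0.05, power=0.90):
    """Approximate visitors needed per variation to detect a relative lift.

    alpha and power are assumed defaults, not values taken from the post.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Baseline BTDR of 3.67% with relative lifts of 5%, 10% and 1%
for lift in (0.05, 0.10, 0.01):
    print(f"{lift:.0%} lift: {sample_size_per_variation(0.0367, lift):,} per variation")
```

The required sample grows roughly with the inverse square of the effect, which is why a 1% lift needs far more traffic than a 5% lift.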
In order to measure the final result, the following data need to be logged:
- Users: unique users that visited at least one product page
- Sessions: non-unique visits that included at least one product page
- Product views (PV): views of the product detail page, including repeats
- Content seen: views when the content section was in the user’s viewport
- Add to cart: items that were added to the cart
- Purchases: orders and items (SKUs) that the user purchased
* Each log should have the timestamp and the variation group it belongs to.
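As a rough illustration of the logging requirements above, the sketch below models one log record and aggregates the two rates used later in the analysis. The `Event` class, its field names, the event type strings, and `rates_by_variation` are all hypothetical, chosen only for this example; a real tracking schema will differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    variation: str   # assumed labels: "control" or "challenger"
    user_id: str
    event_type: str  # assumed: "product_view", "content_seen", "add_to_cart", "purchase"


def rates_by_variation(events):
    """Return BTDR and cart to detail rate for each variation group."""
    events = list(events)
    counts = Counter((e.variation, e.event_type) for e in events)
    result = {}
    for variation in {e.variation for e in events}:
        views = counts[(variation, "product_view")]
        result[variation] = {
            # purchases divided by product views
            "btdr": counts[(variation, "purchase")] / views if views else 0.0,
            # add-to-cart events divided by product views
            "cart_to_detail": counts[(variation, "add_to_cart")] / views if views else 0.0,
        }
    return result
```

Keeping the variation label on every record means each metric can be split by group without joining against a separate assignment table.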
Several outcomes are possible after the test:
Outcome: positive
The uplift estimate was good, and the number of users passed the minimum needed. The buy to detail rate of the challenger group has increased by 5%.
- We have reached the minimum amount of visitors – 222K per variation.
- Splitting traffic randomly between Control and Challenger groups worked well.
- Bllush content layer increased the probability that a user will purchase a product by about 5% – BTDR uplift: ((3.86-3.67) / 3.67) * 100 ≈ 5.2%
- Bllush content layer increased the probability that a user will add a product to cart by about 10% – cart to detail rate uplift: ((8.60-7.81) / 7.81) * 100 ≈ 10.1%
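The uplift figures in the list above follow one formula, sketched here as a small helper. The function name is hypothetical; the input rates are the ones quoted in this post.

```python
def relative_uplift(control_rate, challenger_rate):
    """Relative change of the challenger over the control, in percent."""
    return (challenger_rate - control_rate) / control_rate * 100


btdr_uplift = relative_uplift(3.67, 3.86)   # buy to detail rate
cart_uplift = relative_uplift(7.81, 8.60)   # cart to detail rate
print(f"BTDR uplift: {btdr_uplift:.1f}%")
print(f"Cart to detail uplift: {cart_uplift:.1f}%")
```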
Outcome: negative
The conversion rate of the challenger group has decreased by 10%. Conclusion: there is significant proof, since we only need 56K visitors for each variation to detect an effect of 10% (positive or negative). The content layer doesn’t improve the BTDR; on the contrary, it seems that the content layer discourages users from making a purchase.
Outcome: inconclusive / neutral
Although the test has passed the minimum sample size, there is no clear outcome because the lift is 1%, much less than our goal of 5%. To have significant proof of a 1% increase, we would need 5.5M visitors for each variation.
At this stage, we can’t statistically determine that the content layer has a positive effect on the user. Although there is a potential for a 1% increase, we need far more samples to determine that. Here we could explore why it’s not having the required impact. Perhaps the section is not prominent enough on the page and we need to change the user interface. There are many things to experiment with and debug, but I’ll leave that for another post.
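One way to see why a 1% lift is inconclusive at this sample size is a pooled two-proportion z-test. This is a minimal sketch, not necessarily the method used by the A/B testing calculator. The conversion counts are illustrative, derived from the rates in this post rather than from real experiment data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


n = 222_000                               # visitors per variation
control = round(0.0367 * n)               # illustrative purchases at 3.67%
challenger = round(0.0367 * 1.01 * n)     # illustrative purchases after a 1% lift
z, p_value = two_proportion_z_test(control, n, challenger, n)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

With these illustrative counts the p-value stays well above the usual 0.05 threshold, so a difference this small cannot be told apart from noise at 222K visitors per variation.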
As we can see, inaccurate planning and evaluation can lead to a situation where the experiment is irrelevant and no conclusions can be drawn. A precise analysis of A/B testing results is critical for understanding and improving the current state of the product. The analysis must be done very carefully and accurately, taking into account all the planning data.