
Measuring with A/B Testing

This post explains how to use A/B testing to measure changes to your users' shopping experience in an e-commerce environment.

What is A/B Testing?

Short answer: A/B testing is a method that helps you determine what works and what doesn't.

A more technical answer: A/B testing is an experimentation method in which two or more variants are shown to users simultaneously in order to see which variant performs better.

Remind me why I need this?

Imagine the following: one of your designers created a new add-to-cart button that looks much slicker than the original one. Everyone in the office is excited that this new button will increase the conversion rate (makes sense, right?). So instead of testing your hypothesis with A/B testing, you go ahead and ship the change. Now you start refreshing your reports and praying that the change has an effect on shoppers. And miraculously, the conversion rate increases by 5%. Wow!

But…how can you be sure that the improvement in user behavior is due to your button? You later find out that the marketing team was testing a new coupon code at the same time you switched the button. There are hundreds of other factors that affect every e-commerce metric (day of the week, holidays, a promotion you missed, etc.).

And what if the button actually reduced the conversion rate by 5%, and that drop was masked by a 10% increase that came from the discounts? Keeping the new button would be a mistake that can cost you a lot of money down the line.

How does A/B testing work?

The way to see the actual effect of the new button is to create two versions of the product page that differ only in the element you want to check. The original version keeps the original button, and the other is the same page with only the button changed.

After that, we split the website's users into two groups by randomly assigning each user to a group. Users in group A see the original page and users in group B see the new version. Then we track the conversion rate of each group. Once enough users have entered the test, we will have conclusive evidence on which version works better and can roll out the winner to all users.
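As a rough illustration, here is a minimal Python sketch of this flow. The experiment name, user IDs and traffic numbers are made up; in practice your testing platform handles assignment and tracking for you, and the significance check below is just a standard two-proportion z-test.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str = "add-to-cart-button") -> str:
    """Deterministically assign a user to 'A' (Control) or 'B' (Challenger)."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100        # stable bucket in [0, 100)
    return "A" if bucket < 50 else "B"    # 50/50 split

def two_proportion_z_test(conversions_a, users_a, conversions_b, users_b):
    """Return the z statistic and two-sided p-value for the difference
    in conversion rate between the two groups."""
    p_a, p_b = conversions_a / users_a, conversions_b / users_b
    p_pooled = (conversions_a + conversions_b) / (users_a + users_b)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return z, p_value

# Made-up results: 600 conversions out of 20,000 users in A, 660 out of 20,000 in B
z, p = two_proportion_z_test(600, 20_000, 660, 20_000)
print(f"z = {z:.2f}, p = {p:.3f}")   # here z ≈ 1.72, p ≈ 0.09 – not yet conclusive
```

The deterministic hash ensures a returning user always lands in the same group, which keeps the two experiences consistent for the duration of the test.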

Terminology

  • Variant – a version of your hypothesis that will be included in the A/B testing experiment. A variant can be anything that will change the original user experience (change of button, more shipping options…etc). At least two variants are needed to start a test. It’s possible to add more variants with more advanced methods, which we won’t cover today.
  • Control – the original variant without any changes
  • Challenger – a new variant that challenges the existing Control variant.
  • Champion – the variant that wins the A/B testing experiment with the best conversion performance.

How to split the users into two groups?

In order to get significant results from an A/B test, there are a few things we need to take into consideration: the amount of traffic and the way we split it between the variant groups.

Traffic should be split randomly between the variants based on a predetermined weighting. In a test with two variants, you can split traffic 50/50, 60/40 or even 80/20, depending on how many users you want to expose to the Challenger. It's usually recommended to start with a 50/50 distribution to make sure the experiment is running properly. If you are worried about the potential negative impact of the new variant, you can run the whole experiment on only X% of your store's traffic.

Keep in mind that you have to route a certain amount of traffic through every variant in order to get significant results.
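To make the weighting concrete, here is a minimal sketch of a weighted split. The 20% exposure and 80/20 weights are hypothetical values chosen for illustration; a real testing tool does this for you, but the underlying idea is the same: hash the user into a stable bucket and compare it against cumulative weights.

```python
import hashlib

# Hypothetical configuration: expose only 20% of store traffic to the
# experiment, and split that traffic 80/20 between Control and Challenger.
EXPOSURE_PERCENT = 20
WEIGHTS = {"control": 80, "challenger": 20}

def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def choose_variant(user_id: str) -> str | None:
    """Return the variant for this user, or None if they are not in the experiment."""
    if bucket(user_id, "exposure") >= EXPOSURE_PERCENT:
        return None                      # user stays on the regular site
    roll = bucket(user_id, "variant")
    threshold = 0
    for variant, weight in WEIGHTS.items():
        threshold += weight
        if roll < threshold:
            return variant
    return "control"                     # fallback, should not be reached
```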

How to calculate the amount of traffic?

Our goal in every A/B test is to get significant results and make a conclusive decision. The million-dollar question in an A/B test is:

“How many subjects are needed for a valid A/B test?” or, in other words, “how long do I have to wait?”

It’s the same question, because once you know how many users you need for each variant, you can estimate the experiment's duration based on your store's traffic.

There are a few calculators that help make this estimation. The inputs for this calculation are usually:

  • Baseline conversion rate – the estimated existing conversion rate
  • The minimum conversion rate improvement you want to detect in %

The output is the sample size needed for each variant. The larger the improvement we want to be able to detect, the smaller the sample we need to get significant results. Let's illustrate this with an example:

Hypothesis – changing the “add-to-cart” button on a product detail page (PDP) from a square to a circle will increase the conversion rate by 20%, which is a dramatic improvement.

Since we expect the change to have a very big impact on users, we will need a relatively small number of users to determine whether the conversion rate increases or not.
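As a rough sketch of what those calculators do under the hood, the standard two-proportion sample size formula can be evaluated directly. The 3% baseline conversion rate, 95% confidence level, 80% power and 2,000 users/day of traffic below are assumptions chosen purely for illustration.

```python
from scipy.stats import norm

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)     # e.g. 1.96 for 95% confidence
    z_beta = norm.ppf(power)              # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

# Hypothesis: a 20% relative lift over an assumed 3% baseline conversion rate
n = sample_size_per_variant(0.03, 0.20)
print(f"~{n:,.0f} users per variant")     # roughly 14,000 users per variant

# Detecting a small 2% lift instead would require far more traffic
print(f"~{sample_size_per_variant(0.03, 0.02):,.0f} users per variant")

# Estimated duration, assuming 2,000 users/day reach the tested page
daily_traffic = 2_000
print(f"~{2 * n / daily_traffic:.0f} days for a two-variant test")
```

The second print illustrates the trade-off from the paragraph above: the smaller the lift you want to detect, the more users (and time) the experiment needs.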

There are several free online calculators that can do this estimation for you and get you started.

Pros & Cons

A/B testing is a great tool for getting useful data from users. However, it still leaves a lot unknown.

Pros:

  • Great tool for testing new ideas – in today’s shopping era, A/B testing is a great way to try innovative ideas on real users, and it provides hard evidence about whether an idea is worth large-scale implementation.
  • Cheap to run – software solutions that help you run A/B tests are affordable and easy to manage.

Cons:

  • Time – you’ll need to wait for experiments to finish, which can take weeks or even months in some cases.
  • Proper setup is hard – setting up a test correctly takes significant resources. There are many parameters you need to take into consideration to make sure you get valid, significant results, and you need a team to monitor them.
  • Not suitable for “on-the-go” changes – when you launch an experiment you need to commit to it fully. You can’t change the experiment settings in the middle of the test if something goes wrong or you discover an insight you weren’t aware of; if that happens, the experiment should be restarted.

What can get complicated?

There are some failure points in the A/B testing process; here is what we have learned from experience:

  1. Traffic is not homogeneous – if your store’s traffic is not homogeneous, for example it has a very low percentage of returning users or its characteristics change over time (different demographics, say), it can be difficult to measure a consistent behavioral pattern among your users.
  2. Estimating the right minimum detectable improvement – our initial estimates are usually not very accurate, and it is difficult or even impossible to make changes during the test if you realize the estimate was off and more traffic is needed.
  3. Collecting the wrong traffic as reference data – many search engines run bots and crawlers on your site, which can considerably skew the results; this traffic should be identified and filtered out (see the sketch after this list).
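As one simple example of managing bot traffic, sessions can be filtered by user agent before they are counted toward the experiment. The markers below are a small, hypothetical sample; real testing platforms use much more thorough bot detection (IP ranges, behavior signals, etc.).

```python
# Hypothetical, minimal bot filter based on user-agent substrings.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "headless")

def is_probable_bot(user_agent: str) -> bool:
    """Flag sessions whose user agent looks like a known crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

# Drop bot sessions before computing conversion rates
sessions = [
    {"user_agent": "Mozilla/5.0 (Windows NT 10.0)", "converted": True},
    {"user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)", "converted": False},
]
human_sessions = [s for s in sessions if not is_probable_bot(s["user_agent"])]
```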

To summarize, A/B testing is a valuable tool for testing new ideas within the shopping experience. It’s a valid way to discover what attracts your users and how you can improve their experience. However, you need to plan each experiment carefully and take many factors into consideration.

Want to learn more?

Talk to one of our content experts!

Book a call