
How to validate AB testing platform quality

There are many tools on the market for running A/B tests on your website or product. Large companies like Uber and Netflix have been actively building their own systems. Regardless of whether you use a SaaS platform or a custom-made solution, before launching an experiment you need to make sure you will get reliable results; otherwise, you risk making a wrong decision based on an incorrect experiment analysis.

In this article, we will talk about how to ensure your A/B testing solution works correctly.

1. Launching A/A test


First of all, you should run an A/A test, which is a cheap option in terms of time and resources. In an A/A test both groups receive the identical experience, so any statistically significant difference between them points to a problem with the platform itself. A simple A/A test shows whether something is going wrong and how risky it is to launch a real experiment on that platform.
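As a minimal sketch of this idea, the snippet below simulates an A/A setup: users are assigned 50/50 by an assumed random rule, both buckets get the same true conversion rate, and a chi-square contingency test checks that the metric does not differ. The assignment rule, conversion rate, and user counts here are all illustrative assumptions, not part of any particular platform.

```python
import random
from scipy.stats import chi2_contingency

# Simulated A/A check (assumed setup): split users 50/50 with a random
# assignment rule, give both groups the same experience, then verify
# that a key metric does not differ significantly between them.
random.seed(7)

conversions = {"A": 0, "B": 0}
totals = {"A": 0, "B": 0}
for user_id in range(20_000):
    group = random.choice(["A", "B"])   # same treatment in both groups
    converted = random.random() < 0.10  # same true rate by construction
    totals[group] += 1
    conversions[group] += converted

# 2x2 contingency table: converted vs. not converted per group
table = [
    [conversions["A"], totals["A"] - conversions["A"]],
    [conversions["B"], totals["B"] - conversions["B"]],
]
_, p_value, _, _ = chi2_contingency(table)

# On a healthy platform this p-value is usually above 0.05; repeatedly
# tiny p-values across A/A runs signal a broken split or metric pipeline.
print(f"A/A p-value: {p_value:.3f}")
```

A single A/A run can produce a small p-value by chance (roughly 5% of the time at the 0.05 level), so in practice it helps to repeat the A/A test several times before trusting the platform.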

2. Flickers


Sometimes the tool distributes users incorrectly, and some of them end up switching between the control and treatment groups. For example, a user with client ID c6fca060-cb00-40c2-aaa8-c1a3aaa4155d landed in the control group and, when making the next order, was also exposed to the experiment group. Such users can contaminate the experiment results, so it is important to find them and remove them from the dataset before the analysis; a large share of flickers is a real problem in itself, because it indicates unstable assignment.

Client ID                              Control group   Experiment group
33cac3c2-9754-412a-8701-e2b4f6760b45   1               0
284cb69a-d993-456b-93a2-710a32e72472   0               1
ba681f9e-c573-43f7-a59b-f2becb4eacc4   1               0
c6fca060-cb00-40c2-aaa8-c1a3aaa4155d   1               1
4fd9bde7-c8d6-41af-942c-7401e7e1a2fc   0               1
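A flicker check like the one above can be sketched in a few lines of pandas: count the distinct groups each client was seen in, flag anyone seen in more than one, and drop them before analysis. The assignment log below is a made-up example (the column names `client_id` and `group` are assumptions about your data layout).

```python
import pandas as pd

# Hypothetical assignment log: one row per (client_id, group) exposure.
assignments = pd.DataFrame({
    "client_id": [
        "33cac3c2-9754-412a-8701-e2b4f6760b45",
        "284cb69a-d993-456b-93a2-710a32e72472",
        "c6fca060-cb00-40c2-aaa8-c1a3aaa4155d",
        "c6fca060-cb00-40c2-aaa8-c1a3aaa4155d",
    ],
    "group": ["control", "experiment", "control", "experiment"],
})

# A flicker is a client seen in more than one group.
groups_per_client = assignments.groupby("client_id")["group"].nunique()
flickers = groups_per_client[groups_per_client > 1].index.tolist()

# Exclude contaminated clients before analysing the experiment.
clean = assignments[~assignments["client_id"].isin(flickers)]
print(flickers)  # ['c6fca060-cb00-40c2-aaa8-c1a3aaa4155d']
```

Beyond cleaning, it is worth tracking the flicker rate itself over time: a sudden jump usually means the assignment logic (cookies, caching, ID resolution) has regressed.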

3. Sample size imbalance


If your A/B test is supposed to split groups 50/50, you should expect an almost equal share of users in the control and experiment groups. From time to time the group proportions turn out to be significantly different. This should be carefully monitored, because such a sample size imbalance (often called a sample ratio mismatch) may bias the results.
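A standard way to test for this is a chi-square goodness-of-fit test against the configured split. The sketch below uses assumed observed counts for an experiment configured 50/50; with real data you would plug in your own group sizes and expected shares.

```python
from scipy.stats import chisquare

# Assumed observed group sizes from an experiment configured 50/50.
observed = [50_000, 51_500]
expected_share = [0.5, 0.5]

total = sum(observed)
expected = [total * share for share in expected_share]

# Chi-square goodness-of-fit: do the observed counts match the split?
stat, p_value = chisquare(observed, f_exp=expected)

# A very small p-value (commonly the threshold < 0.001 is used for SRM)
# means the imbalance is unlikely to be an accident of randomization,
# and the assignment or logging pipeline deserves a closer look.
print(f"SRM p-value: {p_value:.6f}")
```

Note that with large samples even a visually small imbalance (here 50,000 vs. 51,500) can be highly significant, which is exactly why eyeballing the proportions is not enough.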

4. Variance distribution


A quick way to validate the quality of user distribution between groups is a homogeneity of variance test, which checks whether our samples come from the same general population. One option is Bartlett's test; Levene's test is a more robust alternative when the data may not be normally distributed.

Our null hypothesis (H0) is that our samples are drawn from the same general population. If we get a p-value at or below 0.05, we can conclude that something is wrong with our A/B testing platform: it splits users into groups with significantly different variance.
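Both tests are available in `scipy.stats`. The sketch below runs Bartlett's and Levene's tests on simulated per-user metric values; as an assumption for illustration, both groups are drawn from the same population, which is what a healthy split should produce.

```python
import numpy as np
from scipy.stats import bartlett, levene

rng = np.random.default_rng(0)

# Assumed per-user metric values (e.g. order value) for each group;
# here both are simulated from the same population on purpose.
control = rng.normal(loc=100, scale=15, size=5_000)
experiment = rng.normal(loc=100, scale=15, size=5_000)

# H0 for both tests: the group variances are equal.
stat_b, p_bartlett = bartlett(control, experiment)
stat_l, p_levene = levene(control, experiment)  # robust to non-normality

# p-values at or below 0.05 would suggest the platform produces groups
# with significantly different spread in the metric.
print(f"Bartlett p={p_bartlett:.3f}, Levene p={p_levene:.3f}")
```

Bartlett's test assumes normally distributed data, so for heavy-tailed business metrics (revenue, order counts) Levene's result is usually the safer one to act on.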


Any tool or platform may hide flaws. It is essential to monitor your A/B testing platform and fix errors as they occur. If you blindly trust the tools and don't validate the results, you risk making wrong decisions that may cause serious losses for the business or even its reputation. We hope these steps broaden your view of the A/B testing process and keep you safe from misinterpreting results.