Permutation tests in A/B testing

Before getting to the essence of permutation tests, let’s review a situation that comes up frequently. You start analyzing an experiment and realize that, with the results collected so far, there isn’t enough statistical power to establish statistical significance. This typically happens because the sample is smaller than the size needed to reach the intended power.

The options

  • Perform the calculations on the data already collected, which is not a good idea because such a small sample may be significantly skewed
  • Apply a bootstrap. However, with a bootstrap the only way to reject or fail to reject the null hypothesis (H0) is through confidence intervals
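To make the bootstrap option concrete, here is a minimal sketch of a percentile confidence interval for the difference in means, using only the Python standard library. The function name `bootstrap_diff_ci`, its parameters, and the sample values are assumptions made for illustration; they do not come from a real experiment.

```python
import random
from statistics import mean


def bootstrap_diff_ci(a, b, n_resamples=5_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    # Cut off alpha/2 of the resampled differences in each tail.
    lower = diffs[int(n_resamples * alpha / 2)]
    upper = diffs[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical metric values for two groups of five users each.
group_a = [12.1, 11.4, 13.0, 12.7, 11.9]
group_b = [10.2, 10.9, 11.1, 10.5, 11.6]

lower, upper = bootstrap_diff_ci(group_a, group_b)
print(f"95% CI for the difference in means: [{lower:.2f}, {upper:.2f}]")
```

If the interval does not contain 0, H0 of equal means would be rejected at the chosen level. With samples this small the percentile interval tends to be too narrow, so treat the result with caution.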

However, what if we want to describe the metric with multiple predictors, or to work with the statistic of a specific test? In these cases the bootstrap is not very useful. This is where permutation tests come in.

Let’s imagine that you decided to analyze your A/B test using Student’s t-test. You assumed that your data is normally distributed with equal variances (the latter you checked, for example, with Bartlett’s test). Next, you would calculate the t-statistic and compare it to the theoretical t-distribution in order to decide whether to reject H0.

A slightly different approach can be used with permutation tests:

  1. Calculating the t-statistic as in the usual approach; let’s call it t0
  2. Pooling all values, for example 10 of them, into one group
  3. Randomly placing 5 values into group A and the other 5 into group B
  4. Calculating a new t-statistic for this split
  5. Repeating steps 3 and 4 a total of 2n times
  6. Sorting the resulting t-statistics in ascending order
  7. If t0 falls outside the middle 95% of this empirical distribution, rejecting the null hypothesis that the means of the 2 samples are equal, at the 5% significance level
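The steps above can be sketched in Python with the standard library alone. This is an illustrative sketch, not a reference implementation: the function names, the default of 5,000 reshuffles, and the sample values are assumptions. Instead of sorting the statistics and checking the middle 95%, it reports a two-sided p-value, a closely related decision rule: a p-value below 0.05 roughly corresponds to t0 falling outside the central 95% of the permutation distribution.

```python
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Student's t-statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def permutation_test(a, b, n_resamples=5_000, seed=0):
    """Return (t0, two-sided p-value) based on random reshuffles of the labels."""
    rng = random.Random(seed)
    t0 = t_statistic(a, b)             # step 1: observed statistic
    pooled = list(a) + list(b)         # step 2: pool all values
    extreme = 0
    for _ in range(n_resamples):       # step 5: repeat many times
        rng.shuffle(pooled)            # step 3: random split into A and B
        t = t_statistic(pooled[:len(a)], pooled[len(a):])  # step 4
        if abs(t) >= abs(t0):
            extreme += 1
    # Adding 1 to both counts treats the observed split as one of the permutations.
    return t0, (extreme + 1) / (n_resamples + 1)


# Hypothetical metric values for two groups of five users each.
group_a = [12.1, 11.4, 13.0, 12.7, 11.9]
group_b = [10.2, 10.9, 11.1, 10.5, 11.6]

t0, p_value = permutation_test(group_a, group_b)
print(f"t0 = {t0:.2f}, permutation p-value = {p_value:.4f}")
```

With only 10 values there are just 252 distinct ways to split them into two groups of 5, so all splits could be enumerated exactly; random reshuffling is used here because it also scales to larger samples.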

This approach may remind you of Bayes. In R, it is implemented in the coin package, but you can also implement it with standard functions or with the boot package for the bootstrap.

Permutation tests are one way to handle a situation where the sample is too small to give enough statistical power to determine the significance of the results. However, remember that no “little trick” can replace an adequate sample size for reaching the intended power of the experiment. Apply this method only when you understand both its applicability and the nature of your data.