Is It Better To A/B Test For 95% Statistical Significance Or 99.95% Statistical Significance?

An excellent question was raised by one of my A/B testing clients at Web Marketing ROI. Adam asked:

“Just curious to know when you prefer to use 95%+ versus 99.95%?”

My Answer:

It’s an excellent question and one that very few A/B testers think about or understand.

When we say “95% statistical significance”, we’re (in simplistic terms) saying there is a 5% chance that the result is a statistical false positive, i.e. a difference produced by random chance rather than by the change being tested.

When we say “99.95% statistical significance”, we’re (in simplistic terms) saying there is a 0.05% chance that the result is a statistical false positive.
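
The arithmetic behind those two statements can be sketched in a few lines of Python. This only illustrates the simplified reading above (false positive rate = 100% minus the significance level); the function name false_positive_odds is made up for this example.

```python
def false_positive_odds(significance):
    """Return (false positive rate, N), where roughly 1 in N results is a false positive.

    Assumes the simplified reading: false positive rate = 1 - significance level.
    """
    alpha = 1.0 - significance
    return alpha, round(1.0 / alpha)


for level in (0.95, 0.9995):
    alpha, one_in = false_positive_odds(level)
    print(f"{level:.2%} significance: {alpha:.2%} false positive rate, about 1 in {one_in}")
```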

Therefore, 99.95% is much more certain than 95% and obviously preferable.

It’s not as simple as that though.

There is also a question of the additional time it may take to achieve 99.95% statistical significance versus 95% statistical significance.

The additional time required to reduce the likelihood of a statistical false positive by 4.95 percentage points (from 5% to 0.05%) has an associated opportunity cost.

In other words: is it worth investing an additional X days (where X can be a large number) in increasing the certainty of this test, or would we be better off ‘playing the odds’ (with the odds in our favor) and using that valuable testing time to find another statistically significant conversion improvement? To be able to answer this intelligently, we need to calculate what X is.

Assuming your current numbers extrapolate out over the next 7 days, it might look as follows:

[Image: Calculating Statistical Significance (MS Excel)]

As you can see from the above: 95%+ statistical significance might be achieved on Day 4, but 99.95% statistical significance is still not achieved at Day 10. You are getting more than 5 conversions per day, so getting to 99.95% is a lot easier for you than for companies with fewer daily conversions.
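
If you don’t have a spreadsheet handy, the same kind of projection can be roughly sketched in Python. This is not the Excel calculation referred to above (its formula isn’t reproduced here). It assumes a one-sided, pooled two-proportion z-test, and the traffic and conversion-rate numbers are made-up placeholders rather than Adam’s data. The function names are also just for illustration.

```python
import math


def confidence(visitors_a, conv_a, visitors_b, conv_b):
    """One-sided confidence that B converts better than A (pooled two-proportion z-test)."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 0.5  # no conversions (or all conversions): nothing to tell apart
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed with the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def first_day_reaching(threshold, daily_visitors, rate_a, rate_b, max_days=365):
    """Extrapolate constant daily numbers; return the first day the threshold is met."""
    for day in range(1, max_days + 1):
        n = daily_visitors * day  # cumulative visitors per variation
        if confidence(n, rate_a * n, n, rate_b * n) >= threshold:
            return day
    return None  # not reached within max_days


# Hypothetical inputs: 500 visitors per variation per day, 4% vs 5% conversion rate
d95 = first_day_reaching(0.95, 500, 0.04, 0.05)
d9995 = first_day_reaching(0.9995, 500, 0.04, 0.05)
print("Days to reach 95%:", d95)
print("Days to reach 99.95%:", d9995)
```

In a test like this, the sample size needed grows with the square of the required z-score, so the stricter threshold takes several times longer to reach at the same traffic level. That gap in days is the X we are trying to put a number on.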

Another way of thinking about it: imagine you were in a casino where the odds were permanently rigged in your favor at 95%. The logical way to make as much money as possible would be to keep making as many bets as you can, and eventually ‘the house’ (i.e. you) would win. Although you could possibly tweak the system to be 99.95% in your favor, you wouldn’t need to, because given a long enough period of time and enough bets (of small amounts relative to your total resources), you’d be a winner.

At 95% statistical significance, 1 in 20 A/B tests might be a false positive. But you’ve still got 19 real winners, and you got them in less time.

At 99.95% statistical significance, only 1 in 2,000 A/B tests might be a false positive, but in the time it takes to reach that level of certainty you might only have had the chance to execute half as many tests.

Given the choice between 20 tests at 95% statistical significance vs 3 to 10 tests at 99.95% statistical significance:

The former is better in terms of real conversion improvement, statistically speaking (‘playing the odds’).
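
To make the ‘playing the odds’ comparison concrete, here is a back-of-the-envelope sketch in Python. It uses the same simplification as the numbers above: every test is assumed to declare a winner, and each declared winner is real with a probability equal to the significance level. Real testing programs won’t be this tidy, and the function name expected_real_winners is only for illustration.

```python
def expected_real_winners(tests_run, significance):
    """Expected number of real (non-false-positive) winners.

    Simplifying assumption: every test declares a winner, and each declared
    winner is real with probability equal to the significance level.
    """
    return tests_run * significance


fast = expected_real_winners(20, 0.95)               # 20 tests at 95%
slow_best_case = expected_real_winners(10, 0.9995)   # 10 tests at 99.95%
slow_worst_case = expected_real_winners(3, 0.9995)   # 3 tests at 99.95%

print(f"20 tests at 95%:     {fast:.2f} expected real winners")
print(f"10 tests at 99.95%:  {slow_best_case:.2f} expected real winners")
print(f"3 tests at 99.95%:   {slow_worst_case:.2f} expected real winners")
```

Under those assumptions, the extra false positives at 95% cost far less than the tests you never got to run at 99.95%.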

Assuming you had an infinite amount of time and no opportunity cost (which isn’t really realistic in business), 99.95% would be better.

 
