As any web geek worth their GitHub repo knows, A/B testing is the smartest way to improve the performance of everything from display ads to web registrations. If you have one of the more awesome A/B testing tools on the market (Optimizely is one of my favorites), significance testing and results are likely automated, which means you don’t have to worry about finding significance yourself. However, if you’re setting up and running your own test (which can be just as effective), you’ll want to make sure you have statistically relevant sample data before you make a long-term change to your website.
Making decisions based on observed data is easier than trying to guess which of two or more options will perform better.
If you’re a gifted mind-reader, this is somewhat irrelevant (feel free to skip the rest of this section). For those of us who can’t hear the thoughts of our friends and family, A/B testing makes it easy to decide which option is best in light of a specific goal.
For instance, consider a call-to-action button on your website. You’re stuck trying to decide whether this button, which leads to a free trial registration, should say “Let’s get started!” or “Free Trial.” It’s a hard decision because both options sound thrilling; we all love getting something for free, but “Let’s get started!” sounds like you could be setting off on an adventure.
Rather than risk offering your potential subscribers an adventure when what they’re looking for is a free lunch, an A/B test will give you real-life information about which phrase encourages more people to sign up for a free trial.
A/B testing is sometimes performed with an exotic-sounding statistical calculation called an N-1 Chi-Square test. Fortunately for those of us who may or may not have slept through a few statistics classes (I admit nothing), this isn’t the only way to get statistically sound results from an A/B test.
Let’s take as an example the two different variations of button text I mentioned above.
We’ll assume that they’ve been displayed randomly to website visitors over time and that our sample data isn’t biased. In order for test results to be statistically sound, we first need to test for significance. Here’s a table of the results we have after our first two days of testing:

| Variation | Sign-ups |
| --------- | -------- |
| Option A  | 13       |
| Option B  | 19       |
If I were to look at performance results at this stage, Option B might look like the obvious winner. It isn’t, though, because the results are not statistically significant. Here’s how you can test significance:
1.) Add the positive results of all tested variations – this value is N.
In the button example above, our positive results are 13 + 19 = 32
2.) Find the difference between the results for the two variations, and divide this number by 2 – this value is X.
In our example, (19-13)/2 = 3
3.) Find the value of X² and compare it to the value of N. If X² > N, your results are statistically significant.
Since X² = 9 in our example, the results are not significant because 9 is not greater than 32.
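The three steps above can be sketched as a tiny Python function. This is a minimal sketch of the quick rule of thumb described here (compare X² to N), not a full chi-square test; the function name and return shape are my own.

```python
def is_significant(conversions_a, conversions_b):
    """Quick significance check: returns (significant, x_squared, n)."""
    # Step 1: add the positive results of both variations -- this is N.
    n = conversions_a + conversions_b
    # Step 2: half the difference between the two results -- this is X.
    x = abs(conversions_a - conversions_b) / 2
    # Step 3: if X squared exceeds N, the result is significant.
    x_squared = x ** 2
    return x_squared > n, x_squared, n

# The button example: 13 sign-ups for Option A vs. 19 for Option B.
significant, x_sq, n = is_significant(13, 19)
print(significant, x_sq, n)  # False 9.0 32 -- not significant yet
```

Keep collecting data until X² actually exceeds N before declaring a winner.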
A/B testing is a great way to make your website a more effective marketing asset. Just make sure you’re making decisions based on relevant sample data.
Sources (for math geeks):
- Wikipedia – Pearson’s Chi Square Test
- Easy Statistics for A/B Testing with Hamsters
- Guide to A/B Testing with the Google Website Optimizer