LISTSERV at Work: Making the Most Out of A/B-Split Testing

By Jani Kumpula
Senior Webmaster/User Interface Designer, L-Soft

One of the more powerful tools that we, as email marketers, have in our arsenal is A/B-split testing. The concept is simple: you send two or more different versions of a mailing to random splits of your subscribers and compare the response rates to see which version performs better. You can then use these insights to optimize future messages and campaigns. However, actually obtaining meaningful data from A/B-split testing isn't quite as straightforward and requires a little more thought and planning. Here are five simple tips that will help you spend your time more wisely and make the most out of your A/B-split testing.

Make sure that you have enough subscribers

Fundamentally, A/B-split testing is based on statistics, probability and sampling. The smaller your sample size is, the larger the likelihood that random statistical noise can influence the data, making it difficult to reach statistically significant conclusions. So the first step before you begin any A/B-split testing is to make sure that you have enough subscribers to actually obtain meaningful data. For example, if your mailing list only has 100 subscribers, hold off on A/B-split testing until you have a larger subscriber base. To illustrate, if you were to send two variants to 50 subscribers each and got a 30 percent open-up rate from the first variant and a 20 percent open-up rate from the second, you might conclude that the first variant was far more successful. However, those percentages are equivalent to just 15 and 10 open-ups respectively – a difference of 5 – making it impossible to state with any confidence that the results were directly attributable to the differences between the two variants rather than just random variations. If this is your situation, focus on growing your list of subscribers before you spend serious time on A/B-split testing.
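
The 50-subscriber example can be checked with a short calculation. The sketch below is an illustration, not something taken from a particular tool. It assumes that both variants have the same true open-up rate of 25 percent (the combined rate of 25 open-ups out of 100 recipients) and computes how often two groups of 50 would still differ by 5 or more open-ups purely by chance. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k open-ups among n recipients at open rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_gap_at_least(gap, n, p):
    """Probability that two independent groups of n recipients, both with
    the same true open rate p, differ by at least `gap` open-ups."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    return sum(
        pmf[x] * pmf[y]
        for x in range(n + 1)
        for y in range(n + 1)
        if abs(x - y) >= gap
    )


# Assumption: identical variants with a 25 percent true open-up rate.
chance = prob_gap_at_least(gap=5, n=50, p=0.25)
print(f"Chance of a gap of 5+ open-ups with identical variants: {chance:.0%}")
```

Under these assumptions, the printed probability is far above the 5 percent threshold that is normally used for statistical significance, which is why a difference of 15 versus 10 open-ups says very little on its own.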

Decide what you want to test

If you think that your subscriber database is large enough for A/B-split testing, the next step is to decide exactly what you want to evaluate. Be sure to test only one aspect of your message at a time, or it will be impossible to isolate exactly which change triggered the improvement in performance. For example, subject lines are commonly tested because they are one of the most important factors that determine whether subscribers will even open the email message. If you fail to pull your subscribers in, whatever compelling content you may offer is of no consequence. Time of delivery is another common test. Subscribers are more likely to open the email message if it arrives on a day and at a time when they are interested and able to act on it. When it comes to content, you can try a different layout, image, copy or call-to-action. The possibilities are endless. Just make sure that whatever variations you test are substantial enough to make a difference. For example, don't expect that simply changing the color of a "Buy" button from blue to green is going to make a difference in response rates. However, omitting vs. including average product reviews next to each item in a message about a time-limited sale on winter boots might be worthwhile to test. Or perhaps try including the original price and percent savings in addition to the sale price next to each item. Try to put yourself in the shoes of the recipient when thinking of what tests are likely to be worthwhile.

Decide how to measure success

Generally, when conducting A/B-split testing with an email marketing campaign, you're measuring campaign success in terms of open-ups, click-throughs or conversions. These three metrics are not the same and can give seemingly contradictory results. For example, let's say that you send an email marketing message to 5000 subscribers on a Tuesday morning with two subject line variants but otherwise identical content:

Variant A: "Winter Boots On Sale. Special Offer Ends Sunday"

Variant B: "Our New Winter Boots Are In. Check Out The Special Offer"

One week later, you check your data and find that Variant A had an open-up rate of 24 percent and a click-through rate of 8 percent, compared to an open-up rate of 28 percent and a click-through rate of 12 percent for Variant B. This would make it seem like Variant B was the clear winner. However, then you dig deeper and find that Variant A had a conversion rate of 5 percent, which means that 5 percent of the recipients made a purchase under the special offer while Variant B only had a conversion rate of 3 percent. Why the discrepancy? Perhaps what was happening is that while the subject line of Variant B was more effective in enticing the recipients to open the email and click through to the website, its lack of a clear end date for the special offer made the need to make an immediate purchase appear less urgent.

So it's important to decide what metric to use for evaluating success. In this case, is Variant B more valuable because more recipients viewed the email and clicked through to the website, which can lead to a larger number of future purchases? Or is Variant A more valuable because it led to a larger number of purchases now, even if fewer total recipients bothered to open the email message?
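
To make the trade-off concrete, the rates from the winter boots example can be turned into counts. This is only a sketch under two stated assumptions that the example itself does not spell out: that the 5000 subscribers are split evenly into 2500 per variant, and that every rate is a share of recipients rather than a share of open-ups.

```python
# Assumption: an even split of the 5000 subscribers, 2500 per variant.
RECIPIENTS_PER_VARIANT = 2500

# Rates from the example, all assumed to be shares of recipients.
rates = {
    "A": {"open": 0.24, "click": 0.08, "conversion": 0.05},
    "B": {"open": 0.28, "click": 0.12, "conversion": 0.03},
}


def to_counts(variant_rates, recipients=RECIPIENTS_PER_VARIANT):
    """Convert each rate into a whole number of recipients."""
    return {
        metric: round(rate * recipients)
        for metric, rate in variant_rates.items()
    }


summary = {name: to_counts(r) for name, r in rates.items()}

for name, counts in summary.items():
    # Share of click-throughs that ended in a purchase.
    purchases_per_click = counts["conversion"] / counts["click"]
    print(
        f"Variant {name}: {counts['open']} open-ups, "
        f"{counts['click']} click-throughs, "
        f"{counts['conversion']} purchases "
        f"({purchases_per_click:.1%} of click-throughs)"
    )
```

Laying the counts side by side shows that the answer depends on which column you treat as the goal: Variant B leads on open-ups and click-throughs, while Variant A leads on purchases.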

Evaluate whether your results are statistically significant

Statistical significance describes whether it's highly likely that the differences in results are real, repeatable and not due to random chance or statistical noise. For the data to be considered statistically significant, you generally want to achieve at least a 95 percent confidence level, which means that if there were no real difference between the two variants, a gap as large as the one you observed would show up by chance less than 5 percent of the time. There are mathematical formulas that can be used to calculate whether certain results are statistically significant and at what confidence level. Many free A/B-split test significance calculators are also available on the Internet. They allow you to simply enter your sample sizes and conversion numbers in a form, which then automatically calculates whether the results are statistically significant.

Let's look at an example:

Variant A was sent to 400 recipients and 40 of them clicked through to the website (10 percent rate).

Variant B was sent to 400 recipients and 52 of them clicked through to the website (13 percent rate).

Are these results statistically significant, meaning that we can conclude that the better results for Variant B are very likely attributable to the differences between the two variants? No, because the numbers fail to meet the 95 percent confidence level.

On the other hand, let's look at another example:

Variant A was sent to 1600 recipients and 160 of them clicked through to the website (10 percent rate).

Variant B was sent to 1600 recipients and 208 of them clicked through to the website (13 percent rate).

As you can see, the click-through rates are the same as in the first example, but because of the substantially larger sample size, the results are now statistically significant not just at 95 percent confidence but also at the higher 99 percent confidence level.
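
Both examples can be reproduced with a few lines of code. The text does not say which formula is used, so the sketch below assumes one common choice: a two-sided, pooled two-proportion z-test. The function name and its parameters are hypothetical. It returns the z statistic and the p-value; a p-value below 0.05 corresponds to the 95 percent confidence level and a p-value below 0.01 to the 99 percent level.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for the difference between two rates.

    conv_a, conv_b: number of click-throughs (or conversions) per variant
    n_a, n_b:       number of recipients per variant
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that there is no real difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


examples = {
    "400 recipients per variant": (40, 400, 52, 400),
    "1600 recipients per variant": (160, 1600, 208, 1600),
}

for label, numbers in examples.items():
    z, p_value = two_proportion_z_test(*numbers)
    print(
        f"{label}: z = {z:.2f}, p = {p_value:.4f}, "
        f"significant at 95%: {p_value < 0.05}, "
        f"at 99%: {p_value < 0.01}"
    )
```

Under this test, the smaller example falls short of the 95 percent level while the larger one clears the 99 percent level, matching the conclusions above. The sketch does not guard against the edge case where neither variant has any conversions, in which the standard error is zero.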

Be realistic with your expectations

Don't expect miracles or magic bullets here. Your testing is unlikely to reveal a simple tweak that leads to an earth-shattering improvement in campaign performance, and a lot of the time the data will be inconclusive. However, don't let this discourage you. Instead, take an incremental approach. A/B-split testing is a long-term commitment and an ongoing process dedicated to learning about your particular – and unique – target audience. Whenever you do find statistically significant results, implement what you learned in your next campaign, then move on to test something else, and build on it.
