What is AB testing: a 2020s view
Numbers Rule Your World 2022-07-12
I don't mean the concept of AB testing. If you've been reading this blog, you already know what it is (for those trained in design of experiments, it's the very first lecture). Here are some prior posts about AB testing (1, 2, 3, 4). I even made a video about AB testing from a business operations perspective here.
All big websites have people whose job is to run AB tests. These tests are supposed to help figure out the optimal page designs.
The devil is in the details. Every AB test has one or more objectives it's trying to optimize. Most AB testers believe that the overarching objective is to make users "happy".
It takes very little time to debunk that myth.
I'll drop an example in this post.
***
Recently, Twitter has been running an AB test. I know because I have been on both arms of this test. They are testing a pop-up that shows up after a small amount of scrolling. (I am not logged in.)
Here is Version A.
Here is Version B.
That's my reconstruction of Version B: I believe they have locked me into a specific test group, or they had already dropped Version B from the test by the time of writing.
Notice that Version A completely blocks access to the feed unless the user logs in or sets up an account, while Version B has an X in the corner that lets the user dismiss the pop-up and continue reading the feed without logging in or signing up.
***
What are some obvious metrics the AB tester would use to evaluate this test?
What is the goal of the test? What is the tester's hypothesis prior to running the test?
If your goal is to make users happy, is this the test you'd design?
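For concreteness, here is what tallying candidate metrics per arm might look like. This is a minimal sketch in Python: the session log, its fields, and all numbers are invented for illustration, and bear no relation to Twitter's actual data or pipeline.

```python
# Minimal sketch: tally per-arm metrics from a hypothetical session log.
# Every record and field name here is invented for illustration.
from statistics import mean

# Each record: (arm, signed_up, logged_in, seconds_on_site, bounced)
sessions = [
    ("A", True,  False, 35.0, False),
    ("A", False, False,  4.0, True),
    ("A", False, True,  60.0, False),
    ("B", False, False, 90.0, False),
    ("B", True,  False, 40.0, False),
    ("B", False, False,  8.0, True),
]

for arm in ("A", "B"):
    rows = [s for s in sessions if s[0] == arm]
    n = len(rows)
    print(
        f"arm {arm}: n={n}, "
        f"sign-up rate={sum(s[1] for s in rows) / n:.2f}, "
        f"log-in rate={sum(s[2] for s in rows) / n:.2f}, "
        f"mean time on site={mean(s[3] for s in rows):.1f}s, "
        f"bounce rate={sum(s[4] for s in rows) / n:.2f}"
    )
```

Notice that the code forces a choice: which of these numbers is the objective, and which are merely things we can count?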
***
Bonus question: AB testing, and randomized experimentation in general, has been heralded as the gold standard for causal inference, the best way to learn the "cause" of observed effects. In all likelihood, Version A outperforms Version B on the objective of generating more sign-ups or log-ins. What is the cause of this effect?
Is it the presence/absence of the "X" in the corner? Is it the decision to block/allow users to work around the pop-up? Is it blocking/allowing users access to the site's content?
Let's say the test shows no effect. What did we learn about causality (or not)?
[The most obvious metrics: number of new sign-ups, number of log-ins, time on site, and bounce rate.]
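To make the bonus question concrete, here is roughly the comparison an AB tester would run on the primary metric: a two-proportion z-test on sign-up conversion. This is a sketch only; the counts are hypothetical, and the test shown is one standard choice, not necessarily what Twitter runs.

```python
# Sketch: two-proportion z-test on sign-up conversion between arms.
# All counts are hypothetical, chosen only to illustrate the mechanics.
from math import erf, sqrt

signups_a, visitors_a = 540, 10_000  # arm A: blocking pop-up (hypothetical)
signups_b, visitors_b = 410, 10_000  # arm B: dismissible pop-up (hypothetical)

p_a = signups_a / visitors_a
p_b = signups_b / visitors_b
# Pooled rate under the null hypothesis that the two arms convert equally.
p_pool = (signups_a + signups_b) / (visitors_a + visitors_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_a - p_b) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided

print(f"conversion A={p_a:.2%}, B={p_b:.2%}, z={z:.2f}, p={p_value:.5f}")
```

Even a decisively significant z only tells you that the bundle of differences between the two versions moved sign-ups. It cannot, by itself, attribute the effect to the X, to the blocking behavior, or to access to the content, because the arms differ on all of these at once. And if the test shows no effect, the same ambiguity applies in reverse: perhaps nothing mattered, or perhaps the test was simply too small to detect what did.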