On the Ethics of A/B Testing

The discussion triggered by Facebook’s mood manipulation experiment has been enlightening and frustrating at the same time. An enlightening aspect is how it has exposed divergent views on a practice called A/B testing, in which a company provides two versions of its service to randomly chosen groups of users, and then measures how the users react. A frustrating aspect has been the often-confusing arguments made about the ethics of A/B testing.
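For readers who have not seen one, the mechanics of an A/B test can be sketched in a few lines of Python. This is only an illustration of the random-assignment idea described above, not any company's actual system; the function names, the `seed` parameter, and the even split are hypothetical choices made for the example.

```python
import random


def assign_variants(user_ids, seed=0):
    """Randomly split users into two equal-sized groups, 'A' and 'B'.

    Hypothetical helper: a fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The first half of the shuffled list sees version A, the rest version B.
    return {uid: ("A" if i < half else "B") for i, uid in enumerate(shuffled)}


def mean_response(assignments, responses, variant):
    """Average the measured response of the users who saw one variant."""
    values = [
        responses[uid]
        for uid, group in assignments.items()
        if group == variant and uid in responses
    ]
    return sum(values) / len(values) if values else None
```

The ethical questions discussed in this post are about what the two versions are and what is measured, not about this assignment step, which is the same for every test.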

One thing that is clear is that the ethics of A/B testing are an important and interesting topic. This post is my first cut at thinking through these ethical questions. Some disclaimers: I am considering A/B testing in general, rather than one specific experiment or only testing done for academic research purposes; I am considering what is ethical rather than what is legal or what is required by somebody’s IRB; I am considering how people should act rather than observing how they do act.

Let’s start with an obvious point: Some uses of A/B testing are clearly ethical. For example, if a company wants to know which shade of blue to use in its user interface, it might use A/B testing to try a few shades and measure users’ responses. This is ethical because no user is harmed, especially if the only result is that the service better serves users.

Here’s another point that should be obvious: Some uses of A/B testing are clearly unethical. Consider a study where a service falsely tells teens that their parents are dead, or a study that tries to see if a service can incite ethnic violence in a war-torn region. Both studies are unethical because they cause significant harm or risk of harm.

So the question is not whether A/B testing is ethical, but rather where we should draw the line between ethical and unethical uses. A consequence of this is that any argument that implies that A/B testing is always ethical or always unethical must be wrong.

Here’s an example argument: Company X does A/B testing all the time; this is just another type of A/B testing; therefore this is ethical. Here’s another: Company X already uses an algorithm to decide what to show to users, and that algorithm changes from time to time; this is just another change to the algorithm; therefore this is ethical. Both arguments are invalid, in the same way that it’s invalid to argue that Chef Bob often cuts things with a knife, therefore it is ethical for him to cut up anything he wants. The ethical status of the act depends on what exactly Chef Bob is cutting, what exact A/B test is being done, or what exact algorithm is being used. (At the risk of stating the obvious: the fact that these sorts of invalid arguments are made on behalf of a practice does not in itself imply that the practice is bad.)

Another argument goes like this: Everybody knows that companies do A/B tests of type X; therefore it is ethical for them to do A/B tests of type X. This is also an invalid argument, because knowledge that an act is occurring does not imply that the act is ethical.

But the “everyone knows” argument is not entirely irrelevant, because we can refine it into a more explicit argument that deserves closer consideration. This is the implied consent argument: User Bob knows that if he uses Service X he will be subject to A/B tests of Type Y; Bob chooses to use Service X; therefore Bob can be deemed to have consented to Service X performing A/B tests of Type Y on him.

Making the argument explicit in this way exposes two potential failures in the argument. First, there must be general knowledge among users that a particular type of testing will happen. “Everyone knows” is not enough, if “everyone” means everyone in the tech blogosphere, or everyone who works in the industry. Whether users understand something to be happening is an empirical question that can be answered with data; or a company can take pains to inform its users—but of course I mean actually informing users, not just providing some minimal the-information-was-available-if-you-looked notification theater.

Second, the consent here is implied rather than explicit. In practice, User Bob might not have much real choice about whether to use a service. If his employer requires him to use the service, then he would have to quit his job to avoid being subject to the A/B test, and the most we can infer from his use of the service is that he dislikes the test less than he would dislike losing his job. Similarly, Bob might feel he needs to use a service to keep tabs on his kids, to participate in a social or religious organization, or for some other reason. The law might allow a legal fiction of implied consent, but what we care about ethically is whether Bob’s act of using the service really does imply that he does not object to being a test subject.

Both of these caveats will apply differently to different users. Some users will know about a company’s practices but others will not. Some users will have a free, unconstrained choice whether to use a service but others will not. Consent can validly be inferred for some users and not others; and in general the service won’t be able to tell for which users it exists. So if a test is run on a randomly selected set of users, it’s likely that consent can be inferred for only a subset of those users.

Where does this leave us? It seems to me that where the risks are minimal, A/B testing without consent is unobjectionable, as in the shades-of-blue example. Where risks are extremely high or there are significant risks to non-participants, as in the ethnic-violence example, the test is unethical even with consent from participants. In between, there is a continuum of risk levels, and the need for consent would vary based on the risk. Higher-risk cases would merit explicit, no-strings-attached consent for a particular test. For lower-risk cases, implied consent would be sufficient, with a higher rate of user knowledge and a higher rate of unconstrained user choice required as the risk level increases.
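The sliding scale in the previous paragraph can be written down as a small decision rule. The sketch below is one hedged way to do that in Python. Every number in it is an assumption: the 0-to-1 risk score, the three thresholds, and the linear rule tying the required rates to risk are hypothetical placeholders, not values proposed in this post. The fallback to explicit consent when implied consent falls short is also an assumption of the sketch.

```python
from enum import Enum


class Consent(Enum):
    NONE_NEEDED = "no consent needed"
    IMPLIED_OK = "implied consent is sufficient"
    EXPLICIT_REQUIRED = "explicit consent required"
    UNETHICAL = "unethical even with consent"


def consent_requirement(
    risk,                   # hypothetical risk score in [0, 1]
    third_party_risk,       # True if non-participants face significant risk
    knowledge_rate,         # fraction of users who know about this type of test
    free_choice_rate,       # fraction of users with an unconstrained choice
    minimal=0.1,            # assumed threshold: at or below this, risk is minimal
    explicit=0.5,           # assumed threshold: at or above this, explicit consent
    extreme=0.9,            # assumed threshold: at or above this, never acceptable
):
    """Map a test's risk profile to the kind of consent it would need."""
    if third_party_risk or risk >= extreme:
        return Consent.UNETHICAL
    if risk <= minimal:
        return Consent.NONE_NEEDED
    if risk >= explicit:
        return Consent.EXPLICIT_REQUIRED
    # Lower-risk band: the required rates of user knowledge and free choice
    # rise with risk (assumed linear from 0 at `minimal` to 1 at `explicit`).
    required = (risk - minimal) / (explicit - minimal)
    if knowledge_rate >= required and free_choice_rate >= required:
        return Consent.IMPLIED_OK
    # Assumption: if implied consent falls short, fall back to explicit consent.
    return Consent.EXPLICIT_REQUIRED
```

Under these made-up thresholds, a shades-of-blue test with `risk=0.05` needs no consent, and any test with `third_party_risk=True` is ruled out regardless of the other inputs. Choosing real thresholds, and measuring the two rates, is the hard part.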

Where exactly to draw these lines, and what processes a company should use to avoid stepping over the lines, are left as exercises for the reader.