To Throw Away Data: Plagiarism as a Statistical Crime

Statistical Modeling, Causal Inference, and Social Science 2013-05-22


I’ve been blogging a lot lately about plagiarism (sorry, Bob!), and one thing that’s been bugging me is why it bothers me so much. Part of the story is simple: much of my reputation comes from the words I write, so I bristle at any attempt to devalue words. I feel the same way about plagiarism that a rich person would feel about counterfeiting: Don’t debase my currency!

But it’s more than that. After discussing this a bit with Thomas Basbøll, I realized that I’m bothered by the way that plagiarism interferes with the transmission of information:

Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work.

In statistics, throwing away data is a no-no. From a classical perspective, inferences are determined by the sampling process: point estimates, confidence intervals and hypothesis tests all require knowledge of (or assumptions about) the probability distribution of the observed data. In a Bayesian analysis, it is necessary to include in the model all variables that are relevant to the data-collection process. In either case, we are generally led to faulty inferences if we are given data from urn A and told they came from urn B.

A statistical perspective on plagiarism might seem relevant only to cases in which raw data are unceremoniously and secretively transferred from one urn to another. But statistical consequences also result from plagiarism of a very different kind of material: stories. To underestimate the importance of contextual information, even when it does not concern numbers, is dangerous.
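The urn point in the excerpt can be made concrete with a small simulation. This is only a sketch and is not from the article: the urn compositions (30% and 60% red), the sample size, and the function names are all hypothetical choices for illustration. It draws balls from urn A, builds a standard normal-approximation interval for the proportion of red balls, and then checks what happens if that same interval is read as a statement about urn B.

```python
import math
import random


def draw_from_urn(p_red, n, rng):
    """Draw n balls with replacement; return how many are red."""
    return sum(rng.random() < p_red for _ in range(n))


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion (z=1.96 ~ 95%)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical urn compositions, chosen only for illustration.
P_RED_A = 0.30
P_RED_B = 0.60
N = 1000

rng = random.Random(0)  # fixed seed so the run is reproducible

# The data really come from urn A ...
reds = draw_from_urn(P_RED_A, N, rng)
p_hat = reds / N
low, high = proportion_ci(reds, N)

# ... but suppose we are told they came from urn B.
covers_a = low <= P_RED_A <= high
covers_b = low <= P_RED_B <= high

print(f"estimate: {p_hat:.3f}, interval: ({low:.3f}, {high:.3f})")
print(f"interval covers urn A's true proportion: {covers_a}")
print(f"interval covers urn B's true proportion: {covers_b}")
```

Nothing is wrong with the arithmetic here. The interval is a reasonable summary of urn A, and it only becomes a faulty inference once the data are relabeled as coming from urn B.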

Here’s our full article (which has just appeared in the American Scientist). It features two of the recurring characters from this blog. Here’s our conclusion:

Scholars in fields ranging from psychology to history to computer science have recognized that stories are part of how people understand the world. As statisticians, we can consider reasoning from stories as a form of approximate inference. From this perspective, statistical principles should provide some approximate guidance about the potential biases and precision of such inferences. One key principle is not to throw away information and, if discarding data is for some reason necessary, to describe as clearly as possible the mechanism by which the relevant information was excluded. Plagiarism violates both these rules and, as such, is a violation of statistical ethics, beyond any other considerations of moral behavior.

P.S. I’m more interested in scientific plagiarism than the legal or literary variety, but this 2004 news article by Daniel Hemel and Lauren Schuker (which I found by googling *laurence tribe plagiarism*) is full of good quotes. Here’s my favorite part:

Tribe’s mea culpa comes just three weeks after another prominent Harvard faculty member—Climenko Professor of Law Charles J. Ogletree—publicly apologized for copying six paragraphs almost word-for-word from a Yale scholar in a recent book, All Deliberate Speed.

Last fall, Frankfurter Professor of Law Alan M. Dershowitz also battled plagiarism charges. And in 2002, Harvard Overseer Doris Kearns Goodwin admitted that she had accidently copied passages from another scholar in her bestseller The Fitzgeralds and the Kennedys.

University President Lawrence H. Summers told The Crimson in an interview last week—before the allegations against Tribe surfaced—that he did not see “a big trend” of plagiarism problems at the Law School as a result of the charges against Ogletree and Dershowitz, but indicated that a third case would change his mind.

“If you had a third one, then I would have said, okay, you get to say this is a special thing, a focused problem at the Law School,” Summers said of the recent academic dishonesty cases.

He declined comment last night.
