Should this paper in Psychological Science be retracted? The data do not conclusively demonstrate the claim, nor do they provide strong evidence in favor. The data are, however, consistent with the claim (as well as being consistent with no effect)
Statistical Modeling, Causal Inference, and Social Science 2016-07-01
Retractions or corrections of published papers are rare. We routinely encounter articles with fatal flaws, but retraction is so rare that it's news when it happens.
Retractions sometimes happen at the request of the author (as in the link above, or in my own two retracted/corrected articles) and other times are achieved only with great difficulty if at all, in the context of scandals involving alleged scientific misconduct (Hauser), plagiarism (Wegman), fabrication (Lacour, Stapel), and plain old sloppiness (Reinhart and Rogoff, maybe Tol falls into this category as well).
And one thing that’s frustrating is that, even when the evidence is overwhelming that a published claim is just plain wrong, authors will fight and fight and refuse to admit even an inadvertent mistake (see the story on pages 51-52 here).
These cases are easy calls from the ethical perspective, whatever political difficulties might arise in trying to actually elicit a retraction in the face of opposition.
Should this paper be retracted?
Now I want to talk about a different example. It’s a published paper not involving any scientific misconduct, not even any p-hacking that I notice, but the statistical analysis is flawed, to the extent that I do not think the data offer any strong support for the researchers’ hypothesis.
Should this paper be retracted/corrected? I see three arguments:
1. Yes. The paper was published as an empirical study that offers strong support for a certain hypothesis. The study offers no such strong support, hence the paper should be flagged so that future researchers do not take it as evidence for something it’s not.
2. No. Although the data are consistent with the researchers’ hypothesis being false, they are also consistent with the researchers’ hypothesis being true. We can’t demonstrate convincingly that the hypothesis is wrong, either, so the paper should stand.
3. No. In practice, retraction and even correction are very strong signals, and these researchers should not be punished for an innocent mistake. It's hard enough to get actual villains to retract their papers, so why pick on these guys?
Argument 3 has some appeal but I’ll set it aside; for the purpose of this post I will suppose that retractions and corrections should be decided based on scientific merit rather than on a comparative principle.
I’ll also set aside the reasonable argument that, if a fatal statistical error is enough of a reason for retraction, then half the content of Psychological Science would be retracted each issue.
Instead I want to focus on the question: To defend against retraction, is it enough to point out that your data are consistent with your theory, even if the evidence is not nearly as strong as was claimed in the published paper?
A study of individual talent and team performance
OK, now for the story, which I learned about through this email from Jeremy Koster:
I [Koster] was reading this article in Scientific American, which led me to the original research article in Psychological Science (a paper that includes a couple of researchers from the management department at Columbia, incidentally).
After looking at Figure 2 for a little while, I [Koster] thought, “Hmm, that’s weird, what soccer teams are comprised entirely of elite players?”
Which led me to their descriptive statistics. Their x-axis ranges to 100%, but the means and SDs are only 7% and 16%, respectively:
They don’t plot the data or report the range, but given that distribution, I’d be surprised if they had many teams comprising 50% elite players.
And yet, their results hinge on the downward turn that their quadratic curve takes at these high values of the predictor. They write, “However, Study 2 also revealed a significant quadratic effect of top talent: Top talent benefited performance only up to a point, after which the marginal benefit of talent decreased and turned negative (Table 2, Model 2; Fig. 2).”
If you’re looking to write a post about the perils of out-of-sample predictions, this would seem to be a fun candidate . . .
For convenience, I’ve displayed the above curve in the range 0 to 50% so you can see that, based on the fitted model, there’s no evidence of any decline in performance:
So, in case you were thinking of getting both Messi and Cristiano Ronaldo on your team: Don’t worry. It looks like your team’s performance will improve.
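To make the extrapolation problem concrete, here's a minimal simulation sketch (hypothetical numbers chosen only to roughly match the reported descriptives, not the authors' actual data): the talent share is drawn with mean around 7% and SD around 16%, the true relationship is increasing everywhere with diminishing returns, and yet the fitted quadratic peaks and turns downward in a region where there are essentially no teams.

```python
# Minimal sketch with simulated data (not the authors' dataset): a quadratic
# fit to data concentrated at low "top talent" values can turn downward in a
# region that the data barely reach.
import numpy as np

rng = np.random.default_rng(0)

# Talent share roughly matching the reported descriptives (mean ~7%, SD ~16%),
# truncated to [0, 1]. Almost no teams end up above 50% top talent.
x = np.clip(rng.normal(0.07, 0.16, size=500), 0, 1)

# True relationship: increasing everywhere, with diminishing returns.
y = np.sqrt(x) + rng.normal(0, 0.1, size=x.size)

# Fit a quadratic in the talent share, analogous to the paper's Model 2.
b2, b1, b0 = np.polyfit(x, y, deg=2)
vertex = -b1 / (2 * b2)  # where the fitted parabola peaks and starts to decline

print(f"share of teams above 50% top talent: {np.mean(x > 0.5):.3f}")
print(f"fitted quadratic peaks at talent share = {vertex:.2f}")
# Beyond the vertex the fitted curve declines, but that region contains
# essentially no observations: the "decline" is pure extrapolation.
```

The downturn comes from the functional form, not from teams that were actually observed at those talent levels.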
Following the links
The news article is by a psychology professor named Cindi May and is titled, “The Surprising Problem of Too Much Talent: A new finding from sports could have implications in business and elsewhere.” The research article is by Roderick Swaab, Michael Schaerer, Eric Anicich, Richard Ronay and Adam Galinsky and is titled, “The Too-Much-Talent Effect: Team Interdependence Determines When More Talent Is Too Much or Not Enough.”
May writes:
Swaab and colleagues compared the amount of individual talent on teams with the teams’ success, and they find striking examples of more talent hurting the team. The researchers looked at three sports: basketball, soccer, and baseball. In each sport, they calculated both the percentage of top talent on each team and the teams’ success over several years. . . .
For both basketball and soccer, they found that top talent did in fact predict team success, but only up to a point. Furthermore, there was not simply a point of diminishing returns with respect to top talent, there was in fact a cost. Basketball and soccer teams with the greatest proportion of elite athletes performed worse than those with more moderate proportions of top level players.
Now that the finding’s been established, it’s story time:
Why is too much talent a bad thing? Think teamwork. In many endeavors, success requires collaborative, cooperative work towards a goal that is beyond the capability of any one individual. . . . When a team roster is flooded with individual talent, pursuit of personal star status may prevent the attainment of team goals. The basketball player chasing a point record, for example, may cost the team by taking risky shots instead of passing to a teammate who is open and ready to score.
Two related findings by Swaab and colleagues indicate that there is in fact a tradeoff between top talent and teamwork. First, Swaab and colleagues found that the percentage of top talent on a team affects intrateam coordination. . . . The second revealing finding is that extreme levels of top talent did not have the same negative effect in baseball, which experts have argued involves much less interdependent play. In the baseball study, increasing numbers of stars on a team never hindered overall performance. . . .
The lessons here extend beyond the ball field to any group or endeavor that must balance competitive and collaborative efforts, including corporate teams, financial research groups, and brainstorming exercises. Indeed, the impact of too much talent is even evident in other animals: When hen colonies have too many dominant, high-producing chickens, conflict and hen mortality rise while egg production drops.
This is all well and good (except the bit about the hen colonies; that seems pretty much irrelevant to me, but then again I'm not an egg farmer so what do I know?), but it all hinges on the general validity of the claims made in the research paper. Without the data, it's just storytelling. And I can tell as good a story as anyone. OK, not really. Stephen King's got me beat. Hell, Jonathan Franzen's got me beat. Salman Rushdie on a good day's got me beat. John Updike or Donald Westlake could probably still out-story me, even though they're both dead. But I can tell stories just as well as the ovulation-and-voting people, or the fat-arms-and-political-attitudes people, or whatsisname who looked at beauty and sex ratio, etc. Stories are cheap. Convincing statistical evidence, that's what's hard to find.
So . . . I was going to look into this. After all, I’m a busy guy, I have lots to do and thus a desperate need to procrastinate. So if some perfect stranger emails me asking me to look into a paper I’ve never heard of on a topic that only mildly interests me (yes, I’m a sports fan but, still, this isn’t the most exciting hypothesis in the world), then, sure, I’m up for it. After all, if the options are blogging or real work, I’ll choose blogging any day of the week.
I contacted one of the authors who’s at Columbia and he reminded me that this paper had been discussed online by Leif Nelson and Uri Simonsohn. And then I remembered that I’d read that post by Nelson and Simonsohn and commented on it myself a year ago.
Swaab et al. responded to Nelson and Simonsohn with a short note, and here are their key graphs:
I think we can all agree on three things:
1. There’s not a lot of data in the “top talent” range as measured by the authors. Thus, to the extent there is a “top talent effect,” it is affecting very few teams.
2. The data are consistent with there being declining performance for the most talented teams.
3. The data are also consistent with there being no decline in performance for the most talented teams. Or, to put it another way: using the measure from the main text of their paper, they found no statistically significant decline at all, and they found such a decline only after switching to a different measure that had appeared only in the supplementary version of their original paper. Had those been the quantitative results reported in the first place, I can't imagine the paper would've been published.
The authors also present results for baseball, which they argue should not show a “too-much-talent effect”:
This looks pretty convincing, but I think the argument falls apart when you look at it too closely. Sure, these linear patterns look pretty good. But, again, these graphs are also consistent with a flat pattern at the high end: just draw a threshold close enough to the right edge of either graph and you'll find no statistically significant pattern beyond the threshold.
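To see roughly why, here's a sketch with simulated data (again hypothetical numbers, not the authors' dataset): even when the relationship is truly linear and increasing everywhere, the slope estimated from only the teams above a high talent threshold is generally too noisy to be statistically significant, simply because so few teams are left.

```python
# Sketch with simulated data (not the authors' dataset): restricting a fit to
# the few teams above a high talent threshold leaves too little data to detect
# a slope that is, by construction, positive everywhere.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

x = np.clip(rng.normal(0.07, 0.16, size=500), 0, 1)  # talent share per team
y = x + rng.normal(0, 0.2, size=x.size)               # truly linear, always increasing

for threshold in (0.0, 0.3, 0.4, 0.5):
    mask = x > threshold
    if mask.sum() < 3:
        print(f"threshold {threshold:.1f}: only {mask.sum()} teams, cannot fit a slope")
        continue
    fit = stats.linregress(x[mask], y[mask])
    print(f"threshold {threshold:.1f}: n = {mask.sum():3d}, "
          f"slope = {fit.slope:5.2f}, p = {fit.pvalue:.2f}")
# With the full data the positive slope is unmistakable; as the threshold
# rises and the sample shrinks, the estimate gets noisy and eventually
# cannot be distinguished from a flat pattern.
```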
In discussing these results, Swaab et al. write, “The finding that the effect of top talent becomes flat (null) at some point is an important finding: Even under the assumption of diminishing marginal returns, the cost-benefit ratio of adding more talent can decline as hiring top talent is often more expensive than hiring average talent.”
Sure, that’s fine, but recall that a key part of their paper was that their empirical findings contradicted naive intuition. In fact, the guesses they reported from naive subjects did show declining marginal return on talent. Look at this, from their Study 1 on “Lay Beliefs About the Relationship Between Top Talent and Performance”:
These lay beliefs seem completely consistent with the empirical data, especially considering that Swaab et al. defined "top talent" in such a way that there are very few, if any, teams in the 90%-100% range of talent.
What to do?
OK, so should the paper be retracted? What do you think?
The authors summarize the reanalysis with the remark:
The results of the new test . . . suggest that the strongest version of our arguments—that more talent can even lead to worse performance—may not be as robust as we initially thought. . . .
Sure, that’s one way of putting it. But “not as robust as we initially thought” could also be rephrased as “The statistical evidence is not as strong as we claimed” or “The data are consistent with no decline in performance” or, even more bluntly, “The effect we claimed to find may not actually exist.”
Or, perhaps this:
We surveyed ordinary people who thought that there would be diminishing returns of top talent on team performance. We, however, boldly proclaimed that, at some point, increasing the level of top talent would decrease team performance. Then we and others looked carefully at the data, and we found that the data are more consistent with ordinary people's common-sense intuition than with our bold, counterintuitive hypothesis. We made a risky hypothesis and it turned out not to be supported by the data.
That’s how things go when you make a risky hypothesis—you often get it wrong. That’s why they call it a risk!