Even though it’s published in a top psychology journal, she still doesn’t believe it

Statistical Modeling, Causal Inference, and Social Science 2015-10-13


Nadia Hassan writes:

I wanted to ask you about this article.

Andrea Meltzer, James McNulty, Saul Miller, and Levi Baker, “A Psychophysiological Mechanism Underlying Women’s Weight-Management Goals: Women Desire and Strive for Greater Weight Loss Near Peak Fertility.” Personality and Social Psychology Bulletin (2015): 0146167215585726.

I [Hassan] find it kind of questionable. Fortunately, the authors use a within-subject sample, but it is 22 women. Effects in evolutionary biology are small. Women’s recall is not terribly accurate. Basically, to use the phrasing you have before, the authors are not necessarily wrong, but it seems as though the evidence is not as strong as they claim.

Here’s the abstract of the paper in question:

Three studies demonstrated that conception risk was associated with increased motivations to manage weight. Consistent with the rationale that this association is due to ovulatory processes, Studies 2 and 3 demonstrated that it was moderated by hormonal contraceptive (HC) use. Consistent with the rationale that this interactive effect should emerge when modern appearance-related concerns regarding weight are salient, Study 3 used a 14-day diary to demonstrate that the interactive effects of conception risk and HC use on daily motivations to restrict eating were further moderated by daily motivations to manage body attractiveness. Finally, providing evidence that this interactive effect has implications for real behavior, daily fluctuations in the desire to restrict eating predicted daily changes in women’s self-reported eating behavior. These findings may help reconcile prior inconsistencies regarding the implications of ovulatory processes by illustrating that such implications can depend on the salience of broader social norms.

Ummm, yeah, sure, whatever.

OK, let’s go thru the paper and see what we find:

This broader study consisted of 39 heterosexual women (the total number of participants was determined by the number of undergraduates who volunteered for this study during a time frame of one academic semester); however, 8 participants failed to respond correctly to quality-control items and 7 participants failed to complete both components of the within-person design and thus could not be used in the within-person analyses. Two additional participants were excluded from analyses: 1 who was over the age of 35 (because women over the age of 35 experience a significant decline in fecundability; Rothman et al., 2013) and 1 who reported a desire to lose an extreme amount of weight relative to the rest of the sample . . .

Fork. Fork. Fork.

We assessed self-esteem at each high- and low-fertility session using the Rosenberg Self-Esteem Scale (Rosenberg, 1965) and controlled for it in a supplemental analysis.

Fork. (The supplemental analysis could’ve been the main analysis.)

Within-person changes in ideal weight remained marginally negatively associated with conception risk . . . suggesting that changes in women’s current weight across their ovulatory cycle did not account for changes in women’s ideal weight across their ovulatory cycle.

The difference between “significant” and “not significant” is not itself statistically significant.
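To see why, here’s a quick simulation (mine, not from the paper): give two subgroups the exact same true effect, estimate each with the usual sampling noise, and see how often one estimate clears p < .05 while the other doesn’t. The effect size of 1.8 standard errors is an assumption chosen to give middling power:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims = 100_000
true_z = 1.8  # same true effect in BOTH subgroups, in standard-error units

z_a = rng.normal(true_z, 1.0, n_sims)  # noisy estimate, subgroup A
z_b = rng.normal(true_z, 1.0, n_sims)  # noisy estimate, subgroup B
sig_a = np.abs(z_a) > 1.96
sig_b = np.abs(z_b) > 1.96

# How often one subgroup "has an effect" and the other "doesn't":
discordant = np.mean(sig_a != sig_b)
print(f"one significant, one not: {discordant:.0%}")
```

Roughly half the time you’d report an effect in one group but not the other, even though the two groups are identical by construction.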

Notably, in this small sample of 22 women, self-esteem was not associated with within-person changes in conception risk . . .

“Not statistically significant” != “no effect.”
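And with n = 22, a real but modest effect will usually fail to reach significance anyway. Here’s a sketch of a paired design of that size, assuming a within-person effect of d = 0.3 (my number, not theirs):

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 22, 20_000
d = 0.3         # assumed modest true within-person effect (Cohen's d)
t_crit = 2.080  # two-sided 5% critical value for df = 21

diffs = rng.normal(d, 1.0, size=(n_sims, n))  # paired differences
t = diffs.mean(axis=1) / (diffs.std(axis=1, ddof=1) / np.sqrt(n))
power = np.mean(np.abs(t) > t_crit)
print(f"power: {power:.0%}")
```

Power comes out around 25–30%: most of the time a true effect of this size would show up as “not significant” in a sample of 22.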

consistent with the idea that desired weight loss is associated with ovulation, only naturally cycling women reported wanting to weigh less near peak fertility.

The difference between “significant” and “not significant” is not itself statistically significant.

One recent study (Durante, Rae, & Griskevicius, 2013) demonstrates that ovulation had very different implications for women’s voting preferences depending on whether those women were single or in committed relationships.

Ha! Excessive credulity. If you believe that classic “power = .06” study, you’ll believe anything.
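In case power = .06 sounds like a mere technicality, here’s what it implies (a simulation of mine, not a reanalysis of that study): set the true effect to 0.3 standard errors, which yields roughly 6% power in a two-sided test, and look only at the estimates that reach significance:

```python
import numpy as np

rng = np.random.default_rng(2)
true_z = 0.3  # true effect in standard-error units; power is about 6%
est = rng.normal(true_z, 1.0, 1_000_000)
sig = np.abs(est) > 1.96

power = sig.mean()
type_s = np.mean(est[sig] < 0)                     # significant, wrong sign
exaggeration = np.mean(np.abs(est[sig])) / true_z  # Type M: inflation factor
print(f"power {power:.2f}, sign errors {type_s:.0%}, "
      f"exaggeration {exaggeration:.1f}x")
```

Conditional on statistical significance, about a fifth of the estimates point in the wrong direction, and the magnitudes overstate the true effect by a factor of nearly 8. That’s what you’re buying when you take a power = .06 result at face value.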

OK, I won’t go through the whole paper.

The point is, I agree with Hassan: this paper shows no strong evidence for anything.

Am I being unfair here?

At this point, you say that I’m being unfair: Why single out these unfortunate researchers just because they happen to have the bad luck to work in a field with low research standards? And what would happen if I treated everybody’s papers with this level of skepticism?

This question comes up a lot, and I have several answers.

First, if you think this sort of evolutionary psychology is important, then you should want to get things right. It’s not enough to just say that evolution is true, therefore this is good stuff. To put it another way: it’s quite likely that, if you got enough data and measured carefully enough, the patterns in the general population could well be in the opposite direction (and, I would assume, much smaller) than what was claimed in the published paper. Does this matter? Do you want to get the direction of the effect right? Do you want to estimate the effect size within an order of magnitude? If the answer to these questions is Yes, then you should be concerned when shaky methods are being used.

Second, remember what happened when that Daryl Bem article on ESP came out? People said that the journal had to publish that paper because the statistical methods Bem used were standard in psychology research. Huh? There’s no good psychology being done anymore so we just have to fill up our top journals with unsubstantiated claims, presented as truths?? Sorry, but I think Personality and Social Psychology Bulletin can do better.

Third, should we care about forking paths and statistical significance and all that? I’d prefer not to. I’d prefer to see an analysis of all the data at once, using Bayesian methods to handle the multiple levels of variation. But if the claims are going to be based on p-values, then forking paths etc are a concern.

What, then?

Finally, the question will arise: What should these researchers do with this project, if not publish it in Personality and Social Psychology Bulletin? They worked hard, they gathered data; surely these data are of some value. They even did some within-person comparisons! It would be a shame to keep these data unpublished.

So here’s my recommendation: they should be able to publish this work in Personality and Social Psychology Bulletin. But it should be published in a way that is of maximum use to the research field (and, ultimately, to society):

– Post all the raw data. All of it.

– Tone down the dramatic claims. Remember Type S errors and Type M errors, and the garden of forking paths, and don’t take your p-values so seriously.

– Present all the relevant comparisons; don’t just navigate through and report the results that are part of your story.

– Finally, theorize all you want. Just recognize that your theories are open-ended and can explain just about any pattern in data (just as Bem could explain whatever interaction happened to show up for him).

And finally, let me emphasize that I’m not saying I think the claims of Meltzer et al. are false, I just think they’re not providing strong empirical evidence for their theory. Remember 50 shades of gray? That can happen to you too.
