Beyond junk science: How to go forward

Statistical Modeling, Causal Inference, and Social Science 2024-08-25

If a research team starts with a speculative idea, whether it be silly (the idea that people will play better if they’re told they have a lucky golf ball), merely speculative (the idea that people will react differently to a male- or female-named hurricane), or borderline ridiculous (the idea that beautiful parents will be more likely to have girl babies), there are a few ways they can go forward:

One way to go, which I like, is to advance from a directional hypothesis to a quantitative hypothesis. This takes some work, and I argue it’s work worth doing, as it leads to closer connection to existing science and motivates a critical reading of the literature. It also then leads to the next step of constructing a generative model and simulating fake data that can be used to design a possible experiment.
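To make that concrete, here is a minimal sketch of what that fake-data step might look like for the hurricane example, assuming a hypothetical lab study in which participants rate the perceived risk of a named storm on a 0–10 scale. The quantitative hypothesis (a 0.1-point difference), the standard deviation, and the sample size are illustrative guesses, not numbers taken from any real study.

```python
# Fake-data simulation sketch: assume a quantitative hypothesis and see what
# a proposed design could actually deliver. All numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
true_effect = 0.1   # assumed difference in mean risk rating (female - male name)
sd = 2.5            # assumed person-to-person variation in ratings
n_per_group = 100   # proposed sample size per condition

def simulate_study():
    """Simulate one fake experiment; return the estimated effect and p-value."""
    male = rng.normal(5.0, sd, n_per_group)                   # ratings, male-named storm
    female = rng.normal(5.0 + true_effect, sd, n_per_group)   # ratings, female-named storm
    estimate = female.mean() - male.mean()
    p_value = stats.ttest_ind(female, male).pvalue
    return estimate, p_value

# Repeat the fake experiment many times to see the design's operating characteristics.
results = np.array([simulate_study() for _ in range(2000)])
estimates, p_values = results[:, 0], results[:, 1]
significant = p_values < 0.05
print(f"assumed true effect: {true_effect}")
print(f"power at alpha = 0.05: {significant.mean():.2f}")
print(f"mean |estimate| among significant results: {np.abs(estimates[significant]).mean():.2f}")
```

With guesses like these, the tradeoffs become visible before any data are collected: if the power comes out low, the design can be revised (more subjects, better measurements, a different outcome) rather than the problem being discovered after the fact.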

Another way to go, which unfortunately seems to still be the standard approach in many areas of psychology research, is to just jump right in and conduct an experiment with no realistic sense of possible effect size and variation, then find something statistically significant and go publish.

That second approach will often take everyday mediocre science and turn it into bad science.

Just for example, here’s an abstract representing mediocre science:

We speculate that people could react consistently differently to hurricanes with male and female names. This could be studied by comparing death rates in historical hurricanes and further understood using laboratory experiments studying people’s gender-based expectations about severity and preparedness to take protective action.

And here’s an abstract representing junk science:

Do people judge hurricane risks in the context of gender-based expectations? We use more than six decades of death rates from US hurricanes to show that feminine-named hurricanes cause significantly more deaths than do masculine-named hurricanes. Laboratory experiments indicate that this is because hurricane names lead to gender-based expectations about severity and this, in turn, guides respondents’ preparedness to take protective action. This finding indicates an unfortunate and unintended consequence of the gendered naming of hurricanes, with important implications for policymakers, media practitioners, and the general public concerning hurricane communication and preparedness.

The latter is the abstract from the published himmicanes paper; the former is my adaptation of what could’ve been written as speculation. The published abstract is, in my opinion, bad science, in that it combines a lack of strong theory with an absence of strong evidence. In contrast, my “mediocre” abstract has no strong theory, but it does not pretend to have evidence. The addition of the strong and unsupported claims made the project worse.

Also, the gap between the mediocre and bad abstracts is instructive, in that it points to what is missing: some sense of effect sizes and variation, which would help in any study design.

Of course, the mediocre abstract would never get published in PPNAS or featured on NPR!

This all came up in a recent blog discussion, following this comment from Dale:

When it comes to research, it is all too easy to label a research paper as good or bad, or conclude it should never have been done, or label the data as too noisy to yield meaningful results. But I think all of these are a continuum – the world is all gray. When is the data “too noisy?” The real question is “too noisy for what?” I think the problem with these studies is not that the data is too noisy for the study to be done, but that it is too noisy to reach any conclusions. . . .

What accounts for the prevalence of bad social science? I would propose that this is the wrong focus – it is the nature of social science that we will differ about what studies are worth undertaking, which are worth reporting, and what conclusions can be reached (or even suggested). I think a better question is what accounts for the failures of our research institutions (including academia, think tanks, granting agencies, etc.) to provide meaningful evaluation of social science research? Until the evaluation improves, I would not expect to see the quality of the work improve.

There are multiple dimensions here. There’s the quality of the science (which is some combination of theory, design of data collection and measurement, and analysis), the interestingness or importance of the question being asked (whether the study is worth doing at all), and the general direction of the research program. The point of the present post is to separate some of these issues by considering how to science better, even if you happen to be studying a silly topic. The same principles should apply to more serious work as well.

Summary

Again, my recommendation is to advance from a directional hypothesis to a quantitative hypothesis. This takes some work, and I argue it’s work worth doing, as it leads to closer connection to existing science and motivates a critical reading of the literature. It also then leads to the next step of constructing a generative model and simulating fake data that can be used to design a possible experiment.

I think this is better than just jumping right in and conducting an experiment with no realistic sense of possible effect size and variation, then finding something statistically significant and publishing it (or highlighting a non-significant difference and claiming to have demonstrated no effect).
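As a coda, here is a small design-analysis sketch, along the lines of the “retrodesign” calculation of Gelman and Carlin (2014), showing why that second path misleads: when the true effect is small relative to the noise, the estimates that happen to reach statistical significance necessarily exaggerate it, sometimes with the wrong sign. The true effect and standard error below are, again, made-up numbers for illustration.

```python
# Design-analysis sketch: given an assumed true effect and standard error,
# how often does the estimate reach significance, and how badly do the
# significant estimates exaggerate the truth? Illustrative numbers only.
import numpy as np
from scipy import stats

def design_analysis(true_effect, se, alpha=0.05, n_sims=100_000, seed=1):
    rng = np.random.default_rng(seed)
    z = stats.norm.ppf(1 - alpha / 2)                  # critical value, two-sided test
    est = rng.normal(true_effect, se, n_sims)          # sampling distribution of the estimate
    signif = np.abs(est) > z * se                      # estimates that reach significance
    power = signif.mean()
    exaggeration = np.abs(est[signif]).mean() / true_effect   # "type M" error
    wrong_sign = np.mean(est[signif] < 0)              # "type S" error (true effect assumed positive)
    return power, exaggeration, wrong_sign

power, exaggeration, wrong_sign = design_analysis(true_effect=0.1, se=0.35)
print(f"power: {power:.2f}")
print(f"exaggeration ratio of significant estimates: {exaggeration:.1f}")
print(f"share of significant estimates with the wrong sign: {wrong_sign:.2f}")
```

Under assumptions like these, “statistically significant and published” is close to a guarantee that the reported effect is a large overestimate, which is exactly how mediocre science becomes bad science.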