“The Unbuilt Bench: Experimental Psychology on the Verge of Science”

Statistical Modeling, Causal Inference, and Social Science 2025-03-29

In his new book, “The Unbuilt Bench,” sociologist David Peterson writes:

Psychological expertise is ubiquitous in Western societies. From schools to prisons, hospitals to city halls, psychologists categorize, enumerate, diagnose, and plan. They are perhaps the professional group most associated with the practices that Foucault called “technologies of the self.” These are the domains of expertise that emerge between subjectivity and liberal governance, encouraging or coercing self-regulation. Under the psychologist’s gaze, behavior, thought, and emotion become scientific topics which are objectified and returned to individuals as scientific facts meant to explain their experience and provide expectations to be met. . . .

Psychological expertise is one of the great and powerful arms of liberal governance because of its ability to bridge the emotional and spiritual needs of the individual with the designs of institutions, thus hastening the dissolution of that boundary. It is the technique by which the depths of human subjectivity are made visible and tractable to meet the operational necessities of increasingly large and interconnected organizations and systems.

Then comes the puzzle:

Yet despite its integration into all aspects of modern life, psychology has always been dogged by status anxiety. Although psychological experts inhabit many domains, they are the uncontested authority in few. And, unlike many scientific fields, psychological research has yielded relatively few technologies that are considered unambiguous successes. . . .

Rather than a history of technological success, the field has thrived on the basis of its promise. The promise of experimental psychology is not a concrete technology, method, or theory. It is not a scientific product. It is, instead, an idea which recurs in different guises throughout the landscape of modern life–the idea that human thought and behavior can and, thus, should be studied using the same experimental methods that have brought technological progress to the natural sciences. . . .

Yet the staggering variety of psychological theories, tools, and methods invites another interpretation—a promise unfulfilled. . . .

Peterson continues:

What if progress in psychology simply looks different? . . . Rather than drive straight at unwieldy questions about psychology’s cultural role or its status as a science, in this book I address the more tangible question of whether psychology is engaged in a different sort of epistemic activity than other experimental sciences.

Peterson’s book recounts his qualitative analysis based on careful observation and reporting from several academic biology and psychology labs, and he associates research success with what he calls “bench-building”: the specific tools and methods that are used in the laboratory (on the laboratory “bench,” as they call it in biology).

A lot of the discussion reminds me of the conversation that Megan Higgs and I had with the biologist Pamela Reinagel, which we recounted a few years ago in the post, Biology as a cumulative science, and the relevance of this idea to replication:

One interesting thing about the psychology replication crisis is that it centers on experimental psychology. An experiment should be easier to replicate than an observational study, and my biologist colleague was surprised when I informed her that various famous claims from experimental psychology were believed to be true, sometimes for decades, before everything changed when the studies failed to replicate. I think I gave her the example of the elderly-priming-slow-walking study.

Pamela thought this was wack: how could a famous study sit there for 20 years with nobody trying to replicate it? I said that there’d been lots of so-called conceptual replications, and that researcher degrees of freedom and forking paths had led to these conceptual replications all appearing to be successes, even though it turned out there was nothing going on (or, to say it more carefully, that any real effects were too small and variable to show up in these crude, nearly theory-free, between-person experiments).

Pamela said this doesn’t happen as often in biology. Why? Because in biology, when one research team publishes something useful, then other labs want to use it too. Important work in biology gets replicated all the time—not because people want to prove it’s right, not because people want to shoot it down, not as part of a “replication study,” but just because they want to use the method. So if there’s something that everybody’s talking about, and it doesn’t replicate, word will get out.

The way she put it is that biology is a cumulative science.

Another way to put it is that in psychology and social science, we do research for other people; biologists do research for themselves.
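To make the forking-paths point from that conversation concrete, here's a toy simulation (all numbers hypothetical): each "study" measures several correlated outcomes under a pure null effect and reports whichever analysis looks best, which is enough to generate a steady stream of apparently successful conceptual replications.

```python
# Toy sketch of researcher degrees of freedom: under a pure null effect,
# letting each study report whichever of several correlated outcomes gives
# the smallest p-value inflates the apparent success rate well above 5%.
# The sample size and number of "forks" are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def forked_study(n=50, n_forks=8):
    group = np.repeat([0, 1], n // 2)          # two groups, no true difference
    shared = rng.normal(0, 1, n)               # makes the outcomes correlated
    pvals = []
    for _ in range(n_forks):
        y = shared + rng.normal(0, 1, n)       # one of several possible outcomes
        _, p = stats.ttest_ind(y[group == 1], y[group == 0])
        pvals.append(p)
    return min(pvals)                          # report the best-looking analysis

sims = np.array([forked_study() for _ in range(2000)])
print("share of null studies reporting p < 0.05:", round((sims < 0.05).mean(), 2))
```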

With that in mind, I found Peterson’s book to be interesting in all its detail about what actually is going on in these labs.

While I was reading the book, various associations came to mind:

p.xi: “Psychological fields are rarely organized around the evolution of technologically dynamic ‘experimental systems.’” This relates to the general question that arises when a new idea comes along: Why did it not come up before? One good answer is “new technology.” If that answer is not available, you may be forced to give a lame story about how the researcher is so brilliant–not always such a plausible assumption!

p.3: “If physicists…” He’s comparing psychology to other natural sciences. But what about comparing to other social and behavioral sciences? Sociology, political science, and economics provide insight, but no “technology” comparable to that developed by physics, chemistry, and biology.

p.8: “Psychology has always faced the problem of demarcating its domain from folk knowledge.” haha

p.8: “Thus, lacking the promise of theoretical unity, psychologists came to unify around methodology in a way that is unusual in many sciences.” Economics is that way too!

p.13: Labs, etc. Compare to experimental economics and political science (“playing the dictator game”) or so-called field experiments.

p.16: “Trivial frustrations and low-key victories play a major role in the coming pages. One of my central arguments is that these unsung struggles at the bench play an important role in the production of technoscience. Unscrew any piece of modern technology, decompose any medicine into its elements, unpack the facts that support any significant scientific theory, and you will find dozens of strands of technoscientific development. . . . Theories of scientific progress are rarely pitched at this scale. Technical frustrations may appear, but as bumps along the road to significant technological or conceptual advances. Yet, whether it leads to path-breaking science or down a fruitless alley, these commonplace struggles are significant.” The fractal nature of scientific revolutions!

This also relates to the idea of “microfoundations” in political science and economics, echoing statistical-mechanical derivations of thermodynamics.

p.19-20: “Developmental psychologists face the burden of working with objects that are both difficult to bring into the lab and difficult to work with when they come. Like trying to erect a high-rise on a swamp, it may seem like an impossible feat trying to build a science with such trying material. Yet developmental psychologists have created a thriving science by embracing a rigid theoretical orientation and adopting a permissive attitude toward experimentation and analysis. Thus, even in the face of big practical challenges, the field has crafted something that looks very much like what Kuhn referred to as ‘normal science.’

“Social psychologists, in contrast, have embraced a different strategy. If developmental psychologists have built normal science by operating within a theoretical framework, social psychologists operate in something of a perpetual scientific revolution. Rather than limit themselves empirically or theoretically, social psychologists have embraced flexibility as a way to produce research that is interesting. In this environment, which operates as a market of ‘interesting’ ideas, the slow accumulation of facts is viewed with disdain as researchers move from one counterintuitive finding to the next.”

p.24, “Fields and benches. . . . The developmental psychology lab looks like a daycare: warm and bright. The molecular biology lab is intimidating and cold. Furniture in the former is soft and safe, while everything in the latter is hard and sterile. The social psych lab looks like an unfinished office.” That’s amusing and insightful, but . . . how to think about a field such as statistics that is parasitic on real science? There’s no such thing as a “statistics lab.” Or what about observational sciences such as most of political science, economics, and sociology?

Indeed, to what extent is experimentation appropriate in psychology itself, given that psychology’s ultimate aims could be said to be observational?

Scientism . . . experiments give a false sense of rigor and causal identification (recall the quote from that Harvard professor, later notorious for his claim of 100% replication rate).

p.33, “indeterminate situation.” Part of a wonderful story about a stuck lock. Professional skeptic Michael Shermer could learn something from this one!

In political science we speak of “puzzles” and “stylized facts.” And a big problem comes when researchers try to explain or understand a stylized fact that isn’t really true (or for which there is no clear evidence). Kanazawa example.

p.35: “For scientists, however, the situation is different.” The thrill of getting stuck! “I think we have a big fish on the line.” Excitement after the solution, but an earlier excitement at the importance of the _problem_. The idea that an important aspect of being a good scientist is the capacity to be upset. Many practicing scientists don’t seem to have that capacity!

p.46: “The difference between diagnostic replication and integrative replication can be illustrated with an example from cooking. At a party, John eats a slice of delicious cake made by Sue, the host. John asks Sue for the recipe. John’s goal is clearly not to ‘verify’ the deliciousness of the cake but, rather, to reproduce the taste. If John follows the directions step-by-step and the cake turns out overly sweet or undercooked, it does not mean that Sue’s cake was not delicious. Instead, John might assume he made a mistake or lacked Sue’s skill, that she left out some necessary instruction, or that her superior baking equipment and professional oven elevated the taste. The point is that John is interested in doing what Sue did, not verifying that Sue did what she said she did. Of course, if nothing John tried allowed him to replicate the taste, he might develop some doubts about the origins of Sue’s cake and might, if he were the paranoid type, demand that Sue make her cake in front of him to verify it. However, this adversarial situation, the typical way that replication is discussed in the science studies literature, is the exception rather than the rule.” This reminds me of three things: (a) it’s funny when researchers see replication as an attack or a threat rather than a compliment, (b) the Reinagel conversation, (c) business vs. academics: in biz, if you have a great idea you’ll keep it secret, but in academia you’ll give it out for free.

How does plagiarism fit into this? John makes the cake and then claims credit for the recipe.

p.47: “In contrast to the flexibility of integrative replication, alterations of method are strictly proscribed in diagnostic replication attempts because it makes their outcomes ambiguous. Failures can be attributed to the changes introduced by replicators and become potentially uninterpretable.” But that’s a mug’s game. There is no exact replication because of the time factor. Recall the ovulation-and-clothing example, where the researchers replicated their own study, did not get the expected result, and then, ridiculously, interpreted the changed result not as a statistically unsurprising consequence of natural variation, but as an interaction with weather (the two studies were conducted during different seasons).
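As a minimal sketch of why an exact repetition of a small-effect study should be expected to come out differently (the effect size and sample size here are hypothetical, not those of the ovulation-and-clothing study):

```python
# Two "identical" studies of the same small effect, differing only in the
# random draw of subjects: sampling variation alone is enough to make the
# estimates look quite different, no weather interaction required.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.1    # hypothetical effect, in standard-deviation units
n = 100              # hypothetical per-group sample size

def one_study():
    control = rng.normal(0, 1, n)
    treated = rng.normal(true_effect, 1, n)
    est = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    return est, se

for label in ("original", "replication"):
    est, se = one_study()
    print(f"{label}: estimate = {est:+.2f} (se = {se:.2f})")
# With a standard error around 0.14 and a true effect of 0.1, sign flips
# and large swings between the two studies are routine.
```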

p.47: “Randall Collins has argued that what separates the social sciences from ‘high consensus, rapid discovery’ sciences was the existence of an exciting frontier of technology that leads researchers to abandon controversies when new technologies open up new avenues of exploration.” That describes the field of statistics!

Peterson continues: “New technologies are not rapidly adopted because they are new. They are adopted because they offer new perceptual and manipulative abilities. They materially alter the site of data collection.” I would add: Or because they represent an appealing theory.

p.48: “Replication occurs as an organic part of maintaining one’s position at the cutting edge of the field.” Yes!

p.49: The concept of “the lab.” That’s different for me. I don’t have a “lab”; I have a network of collaborators. I wanted to say “a loose network of collaborators,” but some of the collaborations are pretty tight. Maybe better to say “overlapping networks of informal collaborations.” But no “lab.” Some statisticians and computer scientists do have “labs”; maybe this depends as much on how the funding is structured as on anything else? I guess that my situation is pretty common among mathematicians, statisticians, political scientists, and economists–even successful ones. In contrast, if you’re a successful academic psychologist or biologist, you’ll definitely have a “lab,” usually named after you.

p.56: interpreting ambiguous graphs: “It was the beginning of an investigation rather than its end. It was a step in a long, iterative process rather than a final product.” Indeed.

p.57: “These data visualizations were not meant for public consumption. Nor were they produced to bludgeon competing scientists into acceptance, which is frequently how science studies scholars have thought about the role of data visualizations. Rather, in the Harden lab, they were used to evaluate manipulations and to strategize for future steps. Because of this, researchers approached the graphs in a completely different way from those that were to be published.” I kinda know what he means here, but I’ve usually found that the same ideas that are relevant to exploration graphics also apply to presentation graphics–and vice-versa. I want my presentation to involve the reader as an active participant.

p.63: “Despite sometimes being denigrated in favor of intellectual work, tacit knowledge remains important in experimental science. As another ethnographer of molecular biology labs has pointed out, ‘What needs to be stressed with regard to molecular biology is that scientists act like ensembles of sense and memory organs and manipulation routines onto which intelligence has been inscribed; they tend to treat themselves more like intelligent materials than silent thinking machines.’” This is true even in statistics. Over my career I’ve built up a mental library of thousands of examples, each of which is a story.

This is also related to my experience that, when working with a student, I can sometimes be most effective not as “the idea man” but as “the research assistant.” The student provides ideas and I make the graphs or whatever.

p.65: “Dr. Harden” and “Alicia.” Hierarchy! In some ways an academic science lab is very equal and “flat”; in other ways it is very hierarchical. Industrial labs, even more so.

p.78: “Attention to detail is key. Remembering the temperature of a solution or the age of a particular mouse can provide the key to explaining an anomaly or suggest some intriguing new direction. Sloppy or inattentive practice is one constraint on bench-building. Unlike other constraints, however, this can often be overcome with sufficient rigor.” Yes, and in statistics too! But you have to care. Many researchers don’t seem to care. Or they don’t know how to care.

p.88: “Professional Grade . . . The Material Frontier.” But what about MRI-style junk science?

Two key issues in psychology (and, I assume, biology) are variation (between people, over time, etc.) and measurement error. Research psychologists sometimes show what seems to me to be a shocking disregard for variation and measurement error. Why? There’s the erroneous but appealing logic of causal identification and statistical significance.

An example is the guy who wanted to test Seth Roberts’s diet: his measurement plan was hopelessly noisy.
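To give a sense of what “hopelessly noisy” means, here is a rough sketch with entirely made-up numbers (not from that proposal): day-to-day fluctuation in weight can easily swamp the trend that a single before/after reading is supposed to detect.

```python
# Hypothetical weight-loss example: a real trend of 0.25 kg/week hidden under
# ~1 kg of day-to-day measurement and physiological noise. A single-day
# comparison is nearly worthless; averaging repeated readings helps.
import numpy as np

rng = np.random.default_rng(2)
days = np.arange(28)                            # four weeks of daily readings
true_weight = 80 - 0.25 * days / 7              # hypothetical true trend
measured = true_weight + rng.normal(0, 1.0, days.size)

print("true change:          ", round(true_weight[-1] - true_weight[0], 2))
print("single-day comparison:", round(measured[-1] - measured[0], 2))
print("weekly-average change:", round(measured[-7:].mean() - measured[:7].mean(), 2))
```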

Lizzie’s example of the process underlying lynx/hare dynamics. The usual model is just one step from “phenomenological,” but it can be hard for people to step back and see what’s missing. Similar issues arise in economics and political science, where the intermediate steps are not clear. In modern biology research, however, it’s my impression that the intermediate steps are the main focus.

p.95: “Although experiments need to be easy enough for the subjects to understand, they also had to be complex enough to keep their attention.” He’s talking about toddlers, but the same issues arise in surveys or Mechanical Turk-style experiments.

p.115: “An op-ed in the New York Times about the dust-up points out that the effect sizes in Bargh’s article were not very large to begin with, and thus failed replications should not be surprising.” No. The claimed effect from Bargh et al. was large, indeed implausibly so (piranha principle).

p.121: Popper quote, “Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or ‘given’ base; and if we stop driving the piles deeper, it is not because we have reached firm ground. We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being.” This reminds me of my saying about much of theoretical statistics, that “the house is stronger than the foundations.” The difficulty is that many researchers take their foundations so seriously! I think that similar issues arise in the arts, where a theory serves a useful role in stimulating creative work–but it would be a mistake to take the theory too seriously!

p.137: “During one meeting, Dr. Collins was asking a post doc about his new experiment. It was not going well. The post doc had run just three subjects and described the reactions of each in detail. One supported the hypothesis, one contradicted it, and one showed no preference for the experimental or control conditions. The professor responded, ‘Well, you can’t tell from just three babies,’ but she gave him advice on how to alter the protocol slightly and instructed him to stop after ten subjects if the study still was not working.” Oh no! Attitudes like this are why Tversky and Kahneman coined the phrase, “Law of small numbers,” to refer to the (erroneous) expectation that an effect should show up in every case. A bit of reflection should make it clear why this won’t be the case–many other things are going on at the same time as in the experiment, and these are people, not iron bars–but even experienced researchers make this error. Recall the example of the Yale study of female athletes.
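A quick back-of-the-envelope version of that point, using a hypothetical per-subject probability: even when there is a real effect, three subjects will rarely all line up, so mixed results like the ones described are exactly what one should expect.

```python
# If a real effect makes each baby show the predicted preference with
# probability 0.7 (a made-up number), here is how three subjects split:
from math import comb

p, n = 0.7, 3
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"P({k} of {n} subjects support the hypothesis) = {prob:.2f}")
# Only about a third of the time do all three agree, even with a real effect.
```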

p.141-142: “Roughly half of the regular lab meetings I attended (e.g., meetings concerned with research issues and not administration, planning, job searches, etc.) were dedicated to the discussion of statistically significant, but ambiguous, findings. . . . when a clear and interesting story could be told about significant findings, the original motivation was often abandoned.” This is from the developmental psychology labs. What happens in the presence of high levels of variation and measurement error?

p.142-143: “A blunt explanation of this strategy was given to me by an advanced graduate student: ‘You want to know how it works? We have a bunch of half-baked ideas. We run a bunch of experiments. Whatever data we get, we pretend that’s what we were looking for.'” Wansink!

p.158: “Experimentation is what separates social psychology from other domains of expertise that seek to establish authority over everyday life.” Interesting.

p.163: “In the first three minutes, the lab leapt between questions of how to define the core concept, how it related to current literature, and how to operationalize it.” That happens in sociology, political science, and economics too!

One way to put it is that, in the social and behavioral sciences, often the underlying object of study is itself a construct. That is, the measurement defines the construct. This is most obviously the case in IQ testing and personality profiling, but also in studies of economic growth or democracy or social ties or all sorts of things.

p.176: a student is quoted as saying, “My goal was, by casting a large net of things to code, I would give myself the best chances of finding effects.” That’s actually not so bad! Ref. my paper with Hill and Yajima. Our recommendation is to resolve the problem of selection by displaying everything.
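Here’s a minimal sketch of the “display everything” idea (simulated data, not the student’s study, and simpler than the multilevel approach in the paper with Hill and Yajima): estimate all the coded outcomes and show every estimate with its uncertainty, rather than reporting only the ones that cross a threshold.

```python
# Twenty hypothetical coded outcomes, most with small true effects: print the
# full table of estimates and standard errors instead of cherry-picking the
# rows with |z| > 2. Partial pooling across outcomes would sharpen this further.
import numpy as np

rng = np.random.default_rng(3)
n_outcomes, n_subjects = 20, 50
true_effects = rng.normal(0, 0.1, n_outcomes)

print("outcome  estimate     se    |z|")
for j, theta in enumerate(true_effects):
    y = rng.normal(theta, 1, n_subjects)
    est = y.mean()
    se = y.std(ddof=1) / np.sqrt(n_subjects)
    print(f"{j:7d}   {est:+.3f}  {se:.3f}  {abs(est / se):.2f}")
```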

p.180: “Conceptual flexibility allows social psychologists to define their research objects in ways that either places them within an ongoing lineage of or as a novel break from existing work.” Your findings are simultaneously unexpected (path-breaking) and completely consistent with theory. Just as is said of Beethoven’s music, that each note is both a surprise and, in retrospect, exactly what it should be.

p.180: Two psychologists are quoted: “As soon as you find yourself surrounded by others, consider seeking out the dangerous freedom of the unexamined. Usually–but not always–this risk is rewarded and can help lay the foundation for a new subfield.” The idea that _any_ research strategy will “usually” be rewarded and “can help lay the foundation for a new subfield.” That’s just nuts. There are thousands of academic psychology researchers out there doing this every day. How many “new subfields” are gonna get created? Also there’s this whole high-risk, high-reward mumbo jumbo. The whole idea of “risk” is that you’re taking a risk; you could lose. “The dangerous freedom,” indeed. This one really makes me mad. It’s so damn smug.

p.188: “Progress in Social Psychology.” I also wonder sometimes about progress in political science. There’s technical progress in polling, etc., but not in our deeper understanding, not in the way that there’s been so much progress in physics, chemistry, biology, etc.–even in statistics!

p.194: “Robert K. Merton famously described a system in which a set of internalized scientific norms resulted in a social order that was beneficial to the field as a whole. It was a virtuous cycle of honest research and open critique all undergirded by transparency (Merton’s ‘communalism’), valuing truth over personal gain (‘disinterestedness’), and the incredulous replication of new claims (‘organized skepticism’). Sloppy or fraudulent claims were said to be punished because sharing data and organized skepticism meant that findings were routinely tested for veracity. Frequent failed replications of one’s work would harm one’s reputation.

“The Mertonian view of science was a central foil for the early field of science and technology studies, and it has been rightly criticized for its confusion of scientific rhetoric and scientific practice. . . . As rhetoric, on the other hand, the Mertonian portrait of science continues to have influence.” Fair enough, but I think the Mertonian ideal describes a lot of the best science. Yes, there are many prominent examples of leading academic scientists who do junk work (Daryl Bem comes to mind), but it’s my impression that some large percentage are trying to do it right, working hard to falsify their theories, etc.

“In the current context of psychology, there is a movement to make this rhetoric a reality.” Remember that quote about how the future is already here, it’s just not evenly distributed? Well, the Mertonian vision is reality for many of us.

p.195: “Replication failures.” From my perspective, much of the value of replication is rhetorical. I didn’t need failed replications to know that the beauty-and-sex-ratio claim was junk; this could be seen on purely statistical grounds–and that’s the case with many of these junk-science studies. Indeed, sometimes I will give a talk criticizing some study or another, and someone in the audience will ask if I think that the study should be replicated, and I’ll say something like, “If it was up to me, I wouldn’t bother. If some researchers want to do a replication study, I won’t try to stop them, but I feel it would be irresponsible to recommend they waste their time and effort on it.”

p.198: “Organizing Skepticism at the Shaw Journal Club.” Also consider the role of social media. Our blog demonstrates an approach to science criticism, provides many examples, and is also a safe space for dissidents. People send me emails, and even people who never contact me can see that there is a place where criticism is accepted, and a place where I and others push back against aspects of the academic science power structure (while in other ways being part of this structure). Everyone’s an outsider in some way.

Also my paper with Nick Brown criticizing the healing study.

p.205: “Mike Carrow’s Book of Poker Tells.” It’s Mike Caro! No author can be well read in everything. Arguably it’s a strength of Peterson’s book that he goes far enough beyond his comfort zone to make such a mistake.

p.213: “Broken Social Ties.” Easier for me because I’m coming from outside of psychology. Within statistics, it has sometimes been painful–I deal with it by avoiding difficult situations.

p.217: “contrary to Merton, skepticism itself is primarily beneficial in reference to a group that is not skeptical.” Why does Peterson say this? I don’t get it. I work in groups that are skeptical, and skepticism is valuable to us!

p.220: “Dr. Myer argued that a bad study does not necessarily mean an idea is wrong: ‘If I don’t believe a study, it just means there’s not great evidence. . . . I work with the ideas.'” I’m concerned about the one-way street fallacy. If there’s no great evidence, why can’t the effect be in the wrong direction? Or, more reasonably, consider that the effect can vary in direction and magnitude based on situational factors. There could be no there there, if “there” refers to a consistent effect.

p.222: If you’re going to quote reactionary barons like Fiske, Gilbert, and Baumeister, it would be good to also quote reformers and revolutionaries like Nick Brown, Simine Vazire, and Dorothy Bishop.

p.223: Progress in psychology vs. biology. Peterson’s a sociologist, and I’d like to see him talk a bit about his own field, as well as whatever subfields within sociology he works in. I’ve collaborated with sociologists and published some papers in the field, and I’ve seen some good work and some terrible work out there. I don’t have a clear sense of what sociology feels like from the inside. For better or worse, it has less prestige than economics or even psychology–I guess that sociology is on par with political science in getting no respect, maybe deservedly so, who am I to say?–so there’s less of a motivation to produce headline-bait. In any case, I’d be interested in Peterson’s take on the “bench-building” that’s done in his field.

p.233: “Described by Kuhn as the ‘essential tension’ within science, the ‘normal’ practice of science as it occurs within a stable and shared package of theory and methods is the necessary backdrop of scientific creativity and radical leaps forward. It is only by developing an intimate familiarity with an established theory that its limitations become manifest.” Posterior predictive checking! The fractal nature of scientific revolutions! A chicken is an egg’s way of producing another egg.

p.234: “Low status fields tend to borrow tools and theories from higher status fields.” It goes both ways! Physics and economics are high-status fields that borrow tools and theories from the low-status field of statistics.

p.236: “a predictable trajectory for social psychological research in which researchers attempted to undermine expectations through surprising findings that would eventually become accepted, expected, and could then be undermined by new, and now surprising, findings.” That’s a funny way to put it! Also it’s the classic model of fundamental physics, from Copernicus, Kepler, and Galileo to Newton to Einstein to today’s string theorists.

How to think about bad critique? For example, the argument that polls are no good because of nonresponse, etc. In our discussion of science criticism and science reform, we don’t have much to say about criticism that is aggressive and misguided. I don’t know what to say about this either, beyond invoking the Chestertonian principle.

p.250: “while I hope this work causes some reflection within psychology regarding their research practices, I do not feel this work justifies the condemnation of psychology as a whole or even the sub-fields I discuss.” I feel the same way! I’ve not tried to do any systematic study of research quality in any field or subfield. Other people have, for example by looking at samples of published work. Such sampling can be done; I just haven’t done it myself, nor has Peterson, so I think it’s fair for him to note this limitation of his (and my) writings on science practices and science reform.

p.251: Is Diederik Stapel really “psychology’s most famous data fabricator”? Maybe that title now should go to Dan Ariely, recognizing that we’ll likely never know what happened with his various retracted studies, so we could more carefully label him as “psychology’s most famous author of studies based on fabricated data.”

p.251: “The righteous ascetic rigor of science reformers does not reflect how science has been done.” I don’t like that “righteous ascetic” rhetoric–I’m reminded of the Javert paradox. Is it so horrible to be “righteous” about doing good work? It seems reasonable to me that, if you think science is important, you can be morally bothered by aspects of science culture that reward and encourage bad work. And if “ascetic” means that we are denying ourselves the simple pleasures of misrepresenting results, sidestepping objections, and publicizing claims that are unsupported by data, well, that’s an asceticism I can get behind.

I’m not saying that science reformers are all on the right track–as with any reform movement, there are people who go too far, or who focus on the symbols rather than underlying issues–nor am I saying that you have to buy into “science reform” attitudes in order to do good science. I’m fine saying that the reformist position is only one of many reasonable ways to do good work in the science ecosystem. I just don’t like the casual sideswipe at “righteous ascetic rigor,” as if the reformers should all just chill out.

The other thing is that, as I’ve said before, “The righteous ascetic rigor of science reformers” does actually reflect how good science is often done. Lots of us really are out there sharing data and research methods, putting our own models to the test, with a willingness to learn from our mistakes. Just cos various high-profile representatives of public social and behavioral science such as the Freakonomics and Nudge teams don’t operate that way, doesn’t mean that “science” writ large doesn’t do it right. I’d argue that the idealistic model of science, while sadly far from descriptive of much of academic and industrial research, does characterize much of the best work, the core research that advances our scientific understanding of the world.

P.S. David Peterson is an assistant professor of sociology at Purdue University. How wonderful to see this kind of work from the new generation of researchers!