“Alphabetical order of surnames may affect grading”

Statistical Modeling, Causal Inference, and Social Science 2024-08-22

A Beatles fan points to this press release:

An analysis by University of Michigan researchers of more than 30 million grading records from U-M finds students with alphabetically lower-ranked names receive lower grades. This is due to sequential grading biases and the default order of students’ submissions in Canvas — the most widely used online learning management system — which is based on alphabetical rank of their surnames. . . .

The researchers collected available historical data of all programs, students and assignments on Canvas from the fall 2014 semester to the summer 2022 semester. They supplemented the Canvas data with university registrar data, which contains detailed information about students’ backgrounds, demographics and learning trajectories at the university. . . .

Their research uncovered a clear pattern of a decline in grading quality as graders evaluate more assignments. Wang said students whose surnames start with A, B, C, D or E received a 0.3-point higher grade out of 100 possible points than compared with when they were graded randomly. Likewise, students with later-in-the-alphabet surnames received a 0.3-point lower grade — creating a 0.6-point gap.

Wang noted that for a small group of graders (about 5%) that grade from Z to A, the grade gap flips as expected: A-E students are worse off, while W-Z students receive higher grades relative to what they would receive when graded randomly. . . .

Here’s the research article, by Zhihan (Helen) Wang, Jiaxin Pei, and Jun Li.

The result seems plausible to me.

What I’d really like to see is some graphs. To start with, a plot showing average grade on the y-axis vs. first letter of surname (from A to Z) on the x-axis, with three sets of dots: red dots for the assignments graded in order of surname initial, black dots for the assignments graded in quasi-random order, and blue dots for the one-third of assignments that were not graded in either of those orders. And then separate graphs for social science, humanities, engineering, science, and medicine. With 30 million observations, there should be more than enough data to make all these plots.
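For concreteness, here is a rough sketch of how such a plot could be put together. This is not the authors' code, and I don't have their data: the record layout (surname, grading order, grade), the order labels, the function names, and the toy records are all made up for illustration. The first function computes the average grade for each combination of grading order and surname initial; the second draws one colored set of dots per grading order and assumes matplotlib is installed.

```python
from collections import defaultdict
import string


def mean_grade_by_initial(records):
    """Average grade for each (grading order, surname initial) cell.

    records: iterable of (surname, grading_order, grade) tuples.
    This layout is hypothetical, not the format of the Canvas data.
    Returns {grading_order: {letter: mean grade}}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per cell
    for surname, order, grade in records:
        cell = sums[(order, surname[0].upper())]
        cell[0] += grade
        cell[1] += 1
    means = defaultdict(dict)
    for (order, letter), (total, n) in sums.items():
        means[order][letter] = total / n
    return dict(means)


def plot_means(means, colors):
    """Dots of average grade vs. surname initial, one color per order."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    letters = string.ascii_uppercase
    fig, ax = plt.subplots()
    for order, color in colors.items():
        by_letter = means.get(order, {})
        # Only plot letters that actually have observations.
        xs = [i for i, letter in enumerate(letters) if letter in by_letter]
        ys = [by_letter[letters[i]] for i in xs]
        ax.scatter(xs, ys, color=color, label=order)
    ax.set_xticks(range(len(letters)))
    ax.set_xticklabels(list(letters))
    ax.set_xlabel("First letter of surname")
    ax.set_ylabel("Average grade (out of 100)")
    ax.legend(title="Grading order")
    return fig


# Toy records, invented purely to show the shape of the input.
toy = [
    ("Adams", "alphabetical", 90.0),
    ("Allen", "alphabetical", 80.0),
    ("Young", "alphabetical", 84.0),
    ("Adams", "quasi-random", 86.0),
    ("Young", "quasi-random", 85.0),
    ("Baker", "other", 88.0),
]
toy_means = mean_grade_by_initial(toy)
```

Calling `plot_means(toy_means, {"alphabetical": "red", "quasi-random": "black", "other": "blue"})` would give the color scheme described above. For the separate graphs by field, one could filter the records to each field before calling the same two functions.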

The regression analyses are fine, sure, whatever, but I wanna see the data. Also, I want to see all 26 letters. For some reason, in their analysis they put the surnames into five bins. I guess the data are probably owned by the University of Michigan and not available for reanalysis.