MOHS was a mistake

Power Overwhelming 2024-01-06

I remember reading a Paul Graham essay about how people can’t think clearly about parts of their identity. In my students, I have never seen this more clearly than when people argue about the difficulty of problems.

Some years ago I published a chart of my ratings of problem difficulty, using a scale called MOHS. When I wrote this I had two goals in mind. One was that I thought the name “MOHS” for a Math Olympiad Hardness Scale was the best pun of all time, because there’s a geological scale of mineral hardness that coincidentally has the same name. The other was that I thought it would be useful for beginner students, and coaches, to help find problems that are suitable for practice.

I think it did accomplish those goals. The problem is that I also inadvertently helped catalyze an endless, incessant stream of students constantly arguing heatedly about whether so-and-so problem is 10 MOHS or 30 MOHS or whatever.

I think there is an inductive chain of failure modes here. To start, it’s hard to reason about the difficulty of a problem, because it threatens your identity as a strong problem-solver if you miss an “easy” problem. Going one step further, if you claim a problem is easier than the consensus, then people might attack you as insensitive, out-of-touch with reality, miscalibrated, elitist, and so on. Since “out-of-touch with reality” is not something most people want to be part of their identity, people also start saying things like “this problem is not as easy as you think” to try and send the relevant tribal signals that they’re not one of the head-in-the-clouds humblebraggers. Which then leads to the pendulum swinging all the way across to “you’re just saying that, students aren’t as dumb as you think”, and so on ad infinitum.

That’s how you can get beautiful Internet phenomena such as flame wars about whether the IMO bronze cutoff is going to be 15 or 16 points.

That particular example illustrates another thing: although problem difficulty is obviously subjective, adjacent questions like “how many people will solve this problem on this exam?” are completely objective and yet generate just as much controversy.

And more importantly, they generate wrong answers. I see so many examples of students who boldly assert “I bet 75% of students solved this problem”, only for statistics to come out a week later showing the prediction was completely in the wrong ballpark. Sometimes there is emotion attached, but other times there isn’t. People will casually try to predict outcomes in passing conversation and still find themselves totally off the mark.

So there is something deeper going on. These students are normally pretty smart (because they’re math olympiad students) and also often under-confident (because they’re math olympiad students). So why would they suddenly be so confidently wrong in their field of expertise?

Perhaps judging problem difficulty is just hard? After thinking about it, I’ve begun to suspect that it’s not actually intrinsically as hard as it’s made out to be; instead, most of the trouble1 is actually just self-imposed by people’s egos being tied up in it.

This leads me to my latest piece of advice: if you are an intermediate-advanced student who doesn’t need help picking practice problems anymore, do not use the MOHS hardness scale. It’s fine to ask questions like, “what is the hardest step of this problem?” or “what makes this problem difficult?”, because that kind of reflection does help you improve.2 But further trying to place that difficulty on a scale from 0 to 50 in multiples of 5 seems to largely be a waste of time, because at that point there is too much emotional baggage attached that isn’t easy to disentangle.


  1. OK, there is one other factor: it’s time-consuming. I think it is true that it’s difficult to judge a problem unless you try it yourself, and olympiad problems take a lot of time. This is an issue for example at the IMO, where the Jury who votes on which problems to use on the exam only gets a few days to work through an entire shortlist of 30+ problems. I’ve felt for a while this is simply not enough time, and this leads to a lot of haphazard decisions. ↩
  2. In my case, when students find a problem harder than I predicted, I’ve sometimes been able to use that to guide my teaching. For a concrete example, see the story about TSTST 2016/4 at the end of this blog post, where lower-than-expected scores gave me an idea for a new lesson plan. ↩