Against exploitable rubrics
Power Overwhelming 2024-04-05
Editorial note: this post was mostly written in February 2023. Any resemblance to contests after that date is therefore coincidental.
Background
A long time ago, rubrics for the IMO and USAMO were fairly strict. Out of seven points, the overall meta-rubric looked something like this:
- 7: Problem solved
- 6: Tiny slip (and contestant could repair)
- 5: Small gap or mistake, but non-central
- 2: Lots of genuine progress
- 1: Significant non-trivial progress
- 0: “Busy work”, special cases, lots of writing
In particular, traditional rubrics were often sublinear. You'd see problems whose solution split into two parts, where solving either part alone gave only 2 points, whereas solving both was worth 7.
I've noticed this is becoming less and less common. At the IMO in particular,^1 there are more and more examples of marking schemes that are completely additive: the 7 points for the problem are divided into chunks, and the score is simply the sum of the chunks earned. The effect is that grading is usually a lot more generous.
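To make the contrast concrete, here is a toy sketch in Python (the point values are my own invention, not any official marking scheme) of how the two styles would score a hypothetical two-part problem:

```python
# Toy illustration only: made-up point values, not an official marking scheme.
# The problem has two halves (say, "the answer is X" and "only X works").

def sublinear_score(part_a: bool, part_b: bool) -> int:
    """Traditional style: either half alone is worth only 2; both together jump to 7."""
    if part_a and part_b:
        return 7
    if part_a or part_b:
        return 2
    return 0

def additive_score(part_a: bool, part_b: bool) -> int:
    """Additive style: the 7 points are split into chunks (here 3 + 4) and summed."""
    return (3 if part_a else 0) + (4 if part_b else 0)

for a, b in [(False, False), (True, False), (False, True), (True, True)]:
    print(f"A={a!s:<5} B={b!s:<5} sublinear={sublinear_score(a, b)} additive={additive_score(a, b)}")
```

Under the sublinear scheme, half a solution is barely distinguishable from scratch work; under the additive scheme, it already earns roughly half the marks.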
So like the grumpy old conservative I am, I want to explain why this annoys me.
Hacking the test
The naive model of rubric design is that you have a bunch of submissions and you need to put a total order on them (from 0 to 7). If that were the right model, it wouldn't matter which numbers you assigned, as long as the ordering came out correct.
But I think this model makes two wrong assumptions. One is that scores are summed across six problems, so the magnitudes matter and not just the per-problem ordering. The larger issue, though, is that this isn't a one-shot game but a repeated one: rubrics in a given year are likely to resemble those of previous years, so precedents matter.
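To see why the first point matters, imagine (with made-up numbers) that Alice fully solves problem 1 and makes no progress on problem 2, while Bob has partial progress on both problems but finishes neither. A strict rubric and a generous additive rubric can agree on the per-problem ordering, Alice ahead on problem 1 and Bob ahead on problem 2, yet disagree about who finishes ahead overall:

```python
# Contrived numbers for illustration, not real data. On each problem, both
# rubrics rank the two students the same way; only the size of the partial
# marks differs, yet that alone flips the overall ranking once scores are summed.
scores = {
    "strict":   {"Alice": [7, 0], "Bob": [2, 2]},   # totals: Alice 7, Bob 4
    "generous": {"Alice": [7, 0], "Bob": [5, 5]},   # totals: Alice 7, Bob 10
}

for rubric, by_student in scores.items():
    totals = {name: sum(marks) for name, marks in by_student.items()}
    print(rubric, totals)
```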
My overall sentiment with rubric design follows something Kent Merryfield wrote on the forums in early 2010 about the entry-level AMC contests:
I want students to stop playing games, and to stop being overwhelmed by the meta-analysis, and just solve math problems.
My feeling is the same. I want to design exams where students win by solving more problems, not by knowing more about how grading works. There are enough hackable tests in the world, and I’d like math olympiads to stay away from that territory (as much as they can anyhow).
When you have clean 0+/7- rubrics, the “winning strategy” is clear: maximize solved problems. Nothing else matters. You’ll get 7 points for each complete solution, and for the remaining problems, you’ll get small partials for work you were already doing anyway (if it’s in the right direction).
We'll say a rubric is exploitable if it fails this property: knowing what the rubric is^2 encourages you to do something other than just try to solve the problems.
Although “exploitable” is not exactly the same as “generous”, I think the two are fairly correlated: the more partial items there are, the more surface area there is to exploit. Examples:
- Knowing that two-part problems (anything with “find all”) are split into two halves, and that one half is likely to have easier pickings, starts to become useful insider information.
- Or a first-time contestant might say, “I’m not sure I will solve anything, so I’m going to look for things that seem like rubric items and try to extract those”, e.g. by solving as many special cases as they can with no intention of actually solving the problem at hand.
- Even experienced contestants might say, “I’m probably not going to solve IMO3, so I should maximize my expected score by breadth-first searching for easy crumbs on the marking scheme instead”.
You also start having a meta-game of “how much time should I spend writing down every easy observation I can think of?”; whereas under stricter rubrics, anything worth more than 0 was usually obviously worth submitting.^3
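As a rough sketch of the incentive (all of these probabilities and point values are made up), here is the kind of expected-value calculation a contestant might run in their head:

```python
# Made-up numbers. Suppose a contestant estimates a 10% chance of actually
# solving IMO3 in the time remaining, versus a near-certain 2 points from
# hunting special cases and easy observations on an additive marking scheme
# (and essentially nothing for that same busywork under a strict scheme).
p_solve = 0.10

ev_attempt_strict = p_solve * 7                        # 0.7: a failed attempt earns ~0
ev_attempt_additive = p_solve * 7 + (1 - p_solve) * 1  # 1.6: a failed attempt still scrapes ~1
ev_crumb_hunt_additive = 0.90 * 2                      # 1.8: crumb-hunting edges ahead

print(ev_attempt_strict, ev_attempt_additive, ev_crumb_hunt_additive)
```

Under the strict scheme the busywork is worth nothing and the honest attempt is the obvious play; under the additive scheme the crumb hunt comes out ahead.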
Transparency
Seven years ago, if you asked me, “should we publish IMO rubrics?”, I would have answered, “Of course! We have nothing to hide, transparency is good.” The students would look at the rubrics, see that there is nothing to see, and then move on with studying math. And I could stop having to answer questions like “do I have to prove the angle bisector theorem on a contest?”. (Answer: no.)
In fact, this is exactly what I do for my own contest, the USEMO. I’m completely comfortable publishing the full rubrics for USEMO because I believe these rubrics are not exploitable.
But today, I would genuinely be hesitant to post IMO rubrics. Because, frankly, when I see rubric items like
“[IMO 2018/2] Proving that is not possible: 1 point”
then the first thing that comes to mind as a coach is “oh, guess I need to tell all my students that items like this can be worth a point”.
Do I want to do this? No. That’s why I’m so annoyed.
Score inflation
The usual argument I hear for being more generous is to be less discouraging to students who don’t solve many problems. This is why I grudgingly allow some exploitation of points on problems 1 and 4.
Nonetheless, I wonder if you can’t just get the same benefit by having a 4-problem test (rather than 3 problems) with two easy problems. Experienced students won’t bat an eye, and less experienced students have twice as many chances to solve something.
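(Back-of-the-envelope, with a made-up number: if a weaker student solves each easy problem independently with probability 0.4, the chance of solving at least one rises from 0.4 to 1 − 0.6² = 0.64.)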
Well, I don’t suppose the IMO is going to change format anytime soon. (If it did, then a lot of my scripts with hard-coded 6’s in them would break…)
1. As far as I know, the IMO rubrics aren’t really available anywhere. (On the other hand, I’ve never been told that rubrics explicitly need to be kept secret, either.)
2. Pedants may point out that “knowing” the rubric doesn’t quite make sense, because you need to know the solution in order to understand the rubric. I don’t think this makes an essential difference to the argument: you can replace it with “having seen 100 past rubrics written by the same person”.
3. This applies to coordinate bashing as well. The IMO tradition is that incomplete analytic approaches don’t get partial credit. Here’s a rationalization of why: imagine two students, Alice and Bob, who both see that a problem could in principle be solved with coordinates. Alice thinks about it for a bit, realizes that the calculation is too ugly for her to finish in the time limit, and abandons it. Bob is more naive and plunges in, giving up five pages later when he realizes the expressions are hopeless. (Instead, he writes, “simplifying, we get the desired result”.) I think in this case we should reward Alice, not Bob. If Bob could get partial credit, coaches would need to re-train geometry students to always write down setups that could work in principle, even if they know they won’t be able to carry them out, and even to dive into the calculation when they don’t think they have a good chance of solving the problem anyway.