IMO 2024 and 2025
Power Overwhelming 2025-07-29
I was a coordinator for last year’s IMO 2024 and this year’s IMO 2025.[1] Here are some thoughts about that, contrasting my IMO 2019 post.
What is coordination?
For those of you that don’t know, coordination is the grading process for IMO. As I describe it in my FAQ:
Basically, the outline of the idea is: before the exam, a marking scheme (rubric) is set for each problem, to cover the typical cases of what progress will be worth what points. Then, the leaders of each country get to see the solutions of their country’s students, while there is a number of coordinators from the IMO host country for each problem. Both the coordinators and the leaders read through these scripts in advance, and then meet at an appointment where they discuss the scores. There are a lot of scripts that are not in English, in which case the leader is expected to do any necessary translation.
The idea is that the leaders try to make the best case they can from their student’s work (while of course still being honest), while the coordinators for the problem try to maintain uniform fairness.
We sometimes say jokingly that it’s an “organized fight” (where leaders are defense attorneys and coordinators are prosecutors), but in practice it’s a lot friendlier than this.
IMO 2024: Turbo the snail
Briefing
The briefing for the coordinators is led again by Imre Leader, who was chief coordinator for IMO 2002 in the UK. Much of this orientation is similar to the one quoted in the 2002 report. Imre reminds us that coordination makes a deep impression on the leaders: even if the buses are delayed, or the food sucks, if the leaders have a pleasant coordination, they’ll say it was a good IMO. Being a good diplomat is important.
The problem: Turbo the snail
I’m assigned to IMO 2024/5 (proposed by Chu Cheuk Hei from Hong Kong), a problem that (although I don’t know it at the time) will make history for shaking up the team rankings.
Turbo the snail is in the top row of a grid with $2024$ rows and $2023$ columns and wants to get to the bottom row. However, there are $2022$ hidden monsters, one in every row except the first and last, with no two monsters in the same column.
Turbo makes a series of attempts to go from the first row to the last row. On each attempt, he chooses to start on any cell in the first row, then repeatedly moves to an orthogonal neighbor. (He is allowed to return to a previously visited cell.) If Turbo reaches a cell with a monster, his attempt ends and he is transported back to the first row to start a new attempt. The monsters do not move between attempts, and Turbo remembers whether or not each cell he has visited contains a monster. If he reaches any cell in the last row, his attempt ends and Turbo wins.
Find the smallest integer $n$ such that Turbo has a strategy which guarantees being able to reach the bottom row in at most $n$ attempts, regardless of how the monsters are placed.
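To make the rules concrete, here is a minimal Python sketch of a single attempt on a small board. The function names (`random_monsters`, `run_attempt`) and the 0-indexed `(row, col)` convention are my own assumptions for illustration, not part of the problem or the marking scheme. It only models the rules; it does not implement any strategy.

```python
import random

def random_monsters(rows, cols, rng):
    """Place one monster in every row except the first and last,
    with no two monsters in the same column.
    Returns a dict mapping row -> column (0-indexed)."""
    middle_rows = range(1, rows - 1)
    columns = rng.sample(range(cols), len(middle_rows))  # distinct columns
    return dict(zip(middle_rows, columns))

def run_attempt(path, monsters, rows):
    """Walk a list of (row, col) cells starting in the first row.
    Returns ('monster', cell) if a monster ends the attempt,
    ('win', cell) if the last row is reached,
    or ('incomplete', cell) if the path stops before either happens."""
    assert path[0][0] == 0, "an attempt must start in the first row"
    for prev, cur in zip(path, path[1:]):
        # each move must go to an orthogonal neighbor
        assert abs(prev[0] - cur[0]) + abs(prev[1] - cur[1]) == 1
    for cell in path:
        r, c = cell
        if monsters.get(r) == c:
            return ("monster", cell)
        if r == rows - 1:
            return ("win", cell)
    return ("incomplete", path[-1])

# A 5x4 toy board: 3 monsters, one in each of rows 1..3.
toy_monsters = {1: 2, 2: 0, 3: 1}
straight_down = lambda col: [(r, col) for r in range(5)]
print(run_attempt(straight_down(0), toy_monsters, 5))
print(run_attempt(straight_down(3), toy_monsters, 5))
```

On the toy board, walking straight down column 0 hits the monster in row 2, while column 3 happens to be monster-free. The actual problem asks how many attempts are needed in the worst case, which this sketch does not address.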
I end up missing the opening ceremony in order to help work on the marking scheme for this problem, which is good, because we really did need the extra time. Getting the first draft of the marking scheme took pretty much all day.
Trying the problem
The process for creating a marking scheme seems to always start the same way: all the coordinators have to try the problem themselves first. After this, we meet and start discussing the possible solutions. There’s two pieces of good news that make our lives a lot easier:
- Most of the correct solutions are pretty close to the official solution.
- Not only that, our team is able to pretty much give a complete classification of all possible strategies with the optimal number of moves, by describing a so-called “happy triangle”. (See my solution notes for that description.)
The happy-triangle classification is pretty essential to our coordination. Usually one of the hardest parts of writing a marking scheme for IMO is you do not get to see student papers beforehand. But if you can mathematically prove that all correct algorithms must fit inside a certain narrow space, it becomes much easier to plan ahead.
Writing the marking scheme
We then discuss how we can assign this to partial items. (I think marking schemes are technically supposed to be internal, so in what follows I’ll purposely be a bit vague.) There’s a bunch of items on the rubric labeled A-K, but for almost all scripts only two of them really mattered, items G and H.
To explain what G and H are: roughly speaking, the official solution has two cases based on whether the first monster encountered is on the edge of the first row or not; or more generally, within what we called the “happy triangle”.
- We had a 1-point item G for the idea used in the easier case: finding a monster with two accessible “shoulders” should be good enough.
- We had a 1-point item H for the idea used in the harder case: finding two monsters, even far apart, where one can cut behind the monster closer to the top.
The issue is that it was pretty difficult to define these items in a way that both covered all possible solutions and was easy to parse. The coordinator team had spent the day all working together thinking about the happy triangle structure, so we had an agreement in our heads about what the items were. We opted for a wording that was a bit clumsy and long-winded but was mathematically airtight and did what we wanted.
I’m largely responsible for the appendix to the marking scheme with the figures.
Reading the papers
The thing about having a Day 2 problem is that the scripts arrive in the afternoon, and coordination starts the next morning. I read scripts continuously from about 3PM to 1AM. It goes pretty smoothly because most solutions are either 0, 1 (from G), or 7. I think by the nature of the problem it’s straightforward to coordinate.
The language barrier (students write in their own language) turns out to be a lot less of an issue than I would have expected, because this is a problem with pictures. You can often tell from just the pictures what’s going on, even if you can’t understand the text itself.
Coordination meetings
I’d heard before that being a coordinator is like being customer service. I can now confirm this is true.
Several of my coordination appointments are mostly spent explaining to the leaders how items G and H are defined. I got pretty good at giving this explanation: like, down to the act of pulling out the appendix to the marking scheme, pointing to certain figures, and repeating the same spiel. It reminded me a bit of when you go to open a new bank account or something, and the banker already has a printout with colorful tables of fee structures and knows exactly what to highlight in an obviously-rehearsed speech.
As expected, there were a couple of tough cases from this year. But overall the coordination is clean across all the problems; Imre Leader says that at no point did we have to consider escalating to the final jury meeting.
IMO 2025: Bonza functions
Briefing
The briefing for the coordinators is led by Ivan Guo, whose comments are a lot like Imre’s last year. Ivan reminds us that coordination is not an adversarial process, and the goal is to collaboratively agree on a consistent score. We talk a bit about general coordination principles much like last year, and then proceed to start working on our assigned problems.
The problem: bonza functions
I’m assigned to IMO 2025/3 (proposed by Lorenzo Sarria from Colombia).
A function $f \colon \mathbb{N} \to \mathbb{N}$ is said to be bonza if $f(a)$ divides $b^a - f(b)^{f(a)}$ for all positive integers $a$ and $b$. Determine the smallest real constant $c$ such that $f(n) \le cn$ for all bonza functions $f$ and all positive integers $n$.
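For readers who want to poke at the definition, here is a minimal Python sketch that tests the bonza condition, namely that f(a) divides b^a − f(b)^{f(a)}, for all a and b up to a finite bound. The helper name `is_bonza_up_to` is my own, and passing a finite check is only a necessary condition, not a proof that a function is bonza.

```python
def is_bonza_up_to(f, bound):
    """Check that f(a) divides b**a - f(b)**f(a) for all 1 <= a, b <= bound.

    f is assumed to map positive integers to positive integers.
    Modular exponentiation keeps the numbers small.
    """
    for a in range(1, bound + 1):
        m = f(a)
        for b in range(1, bound + 1):
            if (pow(b, a, m) - pow(f(b), m, m)) % m != 0:
                return False
    return True

print(is_bonza_up_to(lambda n: n, 30))      # identity: b^a - b^a = 0
print(is_bonza_up_to(lambda n: 1, 30))      # constant 1: everything is divisible by 1
print(is_bonza_up_to(lambda n: 2 * n, 30))  # fails already at a = b = 1
```

The identity and the constant function 1 pass trivially, while n ↦ 2n fails at a = b = 1 because 2 does not divide 1 − 4. The interesting bonza functions are the ones in between.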
Trying the problem
So I have a confession to make. When I fill out the preference form that asks what kind of problems I’d like to coordinate, I usually write “anything that’s not a functional equation”. Well, here we are.
It’s been a long day and I’m still jet-lagged, and the problem statement doesn’t really grab my attention. So I decide to retire to bed early and try the problem in the morning itself. I end up waking up at 5am. With breakfast opening at 7am, I tell myself, “well, try the problem for a couple of hours and get breakfast”.
There are two things that surprise me.
- One is that the problem grows on me a lot. You wouldn’t guess it from the statement, but the solution is actually pretty fun to work out.
- The other is that by 6am I had a solution fully written out, on paper. Normally, I consider myself pretty lucky to solve an IMO 3/6 level problem at all within 4.5 hours.
This was the moment I knew P3 was way easier than the jury expected and the medal cutoffs were going to be high this year.
Writing the marking scheme
I think the marking scheme prepared is fine, although I play less of an active role this year in its development compared to last year. This is also a problem for which a lot of the correct solutions look similar, although it’s not quite as constrained as last year’s Turbo. There are some clear milestones that basically all correct solutions passed through and I think we assigned them reasonable weights.
Still, if I could go back in time with hindsight, there were some cases I wish we could have put in the marking scheme. During the coordinator meeting we spent a lot of time arguing about a certain deduction that I don’t think ever actually appeared. Conversely, here are situations that did come up that I think we could have prepared for better:
- The student conditionally proves […] for large […] (say, assuming there is some […] with […]).
- The value of […] the student gives is accidentally too low because of a small but fixable mistake in […] calculation.
- The student’s construction has only the mistake […] and otherwise works.
Reading the papers
This problem had quite a few equations, which helped with language issues. Still, compared to the pictures from Turbo the Snail last year, it was a bit harder to parse. For example, it matters whether a student states something for some prime $p$, for sufficiently large primes $p$, or for all odd primes $p$.
Languages I got to see this year included Bulgarian, Chinese, Italian, Persian, Portuguese, Japanese, Mongolian, Vietnamese, and I think a couple more I’m forgetting. But the handwriting was often more of an issue than the foreign language.
Overall I think I finished reading all 100-ish scripts at my table with a bit of time to spare. I think Problem 3 was definitely the cleanest one to coordinate this year. Most of the other problems had a really rough time.
Coordination meetings
The meetings themselves were mostly straightforward. We had a harsh marking scheme this year, so there were several times where I needed to call over the problem captain to tell the leader, “yes, it’s really this strict”.
AI comments
Some people have asked me to comment on AI results this year. I am not an AI expert, so I will be brief.
I was generally impressed by how far language models have come since 2024. (In contrast, last year’s AlphaProof result was Lean-based.) I think it’s clear there has been a ton of improvement. I give my congratulations to the researchers who’ve made this much progress.
To split a few hairs: as far as “gold medal” claims, overall I think making human-vs-computer comparisons is more distracting than helpful. (See Terence Tao’s thoughts on that.) Medal comparisons really only matter to the general public, who are unable to even understand the statements of IMO-level problems. If you’re the kind of person who reads my blog, you’d be better off reading the problems and solutions yourself[2] rather than just reading news headlines.
Most importantly: try problem 6! It’s super cool with an unexpected answer. I liked the other five problems this year too, but IMO 2025/6 is the one I’m actually telling all my friends about.
Consider a $2025 \times 2025$
grid of unit squares. Matilda wishes to place on the grid some rectangular tiles, possibly of different sizes, such that each side of every tile lies on a grid line and every unit square is covered by at most one tile.
Determine the minimum number of tiles Matilda needs to place so that each row and each column of the grid has exactly one unit square that is not covered by any tile.
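If you want to experiment with small boards before attacking the real one, a checker like the following may help. This is a minimal Python sketch under my own conventions: the name `is_valid_tiling` and the encoding of a tile as an inclusive, 0-indexed `(r1, c1, r2, c2)` rectangle are assumptions for illustration. It only verifies a candidate placement; it says nothing about the minimum.

```python
def is_valid_tiling(n, tiles):
    """Check a candidate placement on an n x n grid.

    Each tile is (r1, c1, r2, c2), covering rows r1..r2 and columns c1..c2
    inclusive. Valid means: tiles stay on the grid, no unit square is covered
    twice, and every row and column has exactly one uncovered square.
    """
    covered = [[False] * n for _ in range(n)]
    for r1, c1, r2, c2 in tiles:
        if not (0 <= r1 <= r2 < n and 0 <= c1 <= c2 < n):
            return False  # tile sticks out of the grid or is malformed
        for r in range(r1, r2 + 1):
            for c in range(c1, c2 + 1):
                if covered[r][c]:
                    return False  # overlap
                covered[r][c] = True
    rows_ok = all(row.count(False) == 1 for row in covered)
    cols_ok = all(
        sum(not covered[r][c] for r in range(n)) == 1 for c in range(n)
    )
    return rows_ok and cols_ok

# 2 x 2 grid with the diagonal left uncovered, using two 1 x 1 tiles.
print(is_valid_tiling(2, [(0, 1, 0, 1), (1, 0, 1, 0)]))
```

With a checker in hand, you can count tiles in hand-drawn constructions for small n and look for a pattern.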
As far as I know, even finding the construction is still out of reach of existing models. We’ll see how much longer it stays that way.
[1] Before this, I was a coordinator for some virtual IMO during the pandemic too, which is much less fun. And from 2017-2019 I was an observer for the USA.
[2] If you take my advice and actually solve the problems, you will realize problems 1-5 are all surprisingly accessible this year; in Australia, ranks 9-90 all essentially solved 5 problems, and there was a 45-way tie for 35 points. Problem 3 deserves particular comment: in addition to being too easy for its place, in MathArena’s evaluation it was the only one solved consistently by any public model. Meanwhile, Problem 6 was brutal. So I think this year’s IMO had an unusually large chasm in difficulty between P1-P5 and P6. I was interested to see how models fared on problems in between those two extremes, but we’ll have to wait for that.