Teammate Hunt 2025 Author Notes
Power Overwhelming 2025-04-23
Thanks to Olga and Holly for factchecking a draft of this post. Remaining errors are my responsibility, of course.
The 2025 Teammate Hunt just finished, and it went really well; see the link to the wrapup. I was a minor supporting character on the organizing team, mostly just taking care of writing a few puzzles here and there. This post is about the creation stories behind all those puzzles.
(Puzzle links only work if you’re logged in for now; public access is coming later.)
Stories of my puzzles
Black or White and Black or White or Moon or Sun (from the control panel round)
This was a puzzle that worked because of our hunt structure (control panel puzzles come in pairs). Masyu is a common logic puzzle genre, and I was curious whether there was another standard Nikoli genre that also involved a closed loop. That’s how I got the idea of fusing it with Moon or Sun: “one feeder uses Masyu to extract and the other feeder uses Moon or Sun to extract”.

I think designing BWMS taught me a lot about how to design logic puzzles in a way that feels satisfying to me. (Before that, my only experience was helping with the Flooded Caves puzzle from the 2023 Mystery Hunt.) Let me explain some shower thoughts.
High-level decisions
I won’t say I dislike standard logic puzzles. But I have a confession: given a choice between manually doing a normal Sudoku and plugging it into a solver so I can unlock a new hunt puzzle, I’d usually take the latter.
The thing is, I already know exactly how the rules of Sudoku work, and the basic strategy too. So solving a Sudoku doesn’t feel new anymore. I know this is a personal preference (there are definitely people who wouldn’t want to use a solver even during a competitive hunt), but it influences the way I think about design.
Now when I’m trying to design logic puzzles for puzzle hunts, I really want them to carry some oomph instead of just “here is a fully specified satisfiability problem, compute the unique answer”. Here’s how I tried to do it, at a high level.
- Having the solver take part in figuring out the rules. In Black or White, once you know it’s Masyu plus Moon or Sun, it’s not hard to guess what the implied rules are. Most of our testsolvers’ first guesses were right: “fill in the pearl colors and satisfy both constraints”. But I think even that 10% uncertainty is engaging. It creates a natural incentive to try the first minipuzzle. It’s a lot more rewarding to say “I found a unique solution under my conjectured rule set, confirming my hypothesis” than “I found the unique solution that was already guaranteed”.
Of course, you also want to make sure that solvers infer the right rules. It sucks to work on a logic puzzle and realize later you had the wrong rules. The checksum numbers (counts of black pearls / white pearls / moons / suns) were designed to guard against this. I think the numbers are pretty useless for actually doing the logic puzzle. But in BWMS they were especially important to confirm that, yes, you fill in all the pearls, and then everything else is suns and moons.
- Having the solver take part in figuring out high-level strategy. In Black or White, I intentionally set every grid to have the same configuration of cages, so that lessons carried over. Once the solver realizes that there are only two ways for the loop to go, the minipuzzles become a lot easier. Similarly, in BWMS, I continued using cages of the same size, helping carry over what the solver learned about how pearls worked in Black or White. (Spoiler: black pearls are really constrained!) In particular, the “bipartite graph” lesson carried over, where once you know whether a single room is sun or moon, you can globally deduce it for the entire puzzle (a sketch of this propagation appears after this list). That’s an aha that I think feels really cool to figure out.
- Terror. If you worked on BWMS, you might have felt a feeling of dread when you opened the puzzle and realized what you were being asked to do. I want you to know this was totally 100% intentional. In contrast to the friendly small Black or White grids, the first impression you get from BWMS is that there’s no way there could be a unique solution, because the grid looks practically empty.1 That’s exactly what I was going for: something that looks impossible at first, but turns out to be within your reach.
- Have an elegant extraction. The yin-yang, two-valued feeling shared by both genres in the fusion made Braille, itself a binary code, feel thematically coherent for the extraction.
- Keep it short. I don’t want a logic puzzle to overstay its welcome and become a slog, especially in this hunt where the puzzle radius is unusually narrow. For Black or White, the target time for an experienced team (once the rules were fully understood) was 5 minutes per minipuzzle. For BWMS, I hoped an experienced team would take about an hour. (Myself, I went through so many versions that by the end, I could solve each version in about 20 minutes.)
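Here’s roughly what the “bipartite” propagation from the strategy bullet looks like as code. This is a minimal sketch: the room adjacency below is hypothetical, and an edge means the loop passes between two rooms, forcing them to have opposite types.

```python
from collections import deque

# Hypothetical room-adjacency graph: an edge means the loop crosses
# between the two rooms, so their sun/moon types must differ.
adjacent = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

kind = {0: "sun"}  # suppose we've deduced a single room's type
queue = deque([0])
while queue:
    room = queue.popleft()
    flipped = "moon" if kind[room] == "sun" else "sun"
    for nbr in adjacent[room]:
        if nbr not in kind:
            kind[nbr] = flipped  # neighbors take the opposite type
            queue.append(nbr)
        elif kind[nbr] != flipped:
            raise ValueError("contradiction: rooms are not 2-colorable")

print(kind)  # {0: 'sun', 1: 'moon', 2: 'moon', 3: 'sun'}
```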
All in all I think the high-level decisions worked out really well. The rules are simple to figure out, then you get to invent your own strategies for this new fusion genre, and finally carry over those lessons from Black or White to handle the scary-looking BWMS.
Low-level decisions
That covers the general decisions about the structure of the puzzle, but I haven’t even gotten to actually setting the specific grid. For Black or White, where the minipuzzles only take five minutes each and are more about learning the rules, there was leeway in choosing givens. But BWMS was much more sensitive and took many hours of fiddling before I really found a grid that I was happy with.
I started by writing an automated verifier using grilops, which reports, for any given grid, whether it has any solutions and whether the solution is unique. You have to be careful about leaning too much on the verifier, because actual human solvers won’t have one. But I still found it helpful for prototyping: it’s a way to iterate faster when experimenting.
Here’s my rough approach. I’d start with a grid with a known loop and put in lots of givens so that the solution was obviously unique. Then, I’d experiment with taking givens away and seeing which removals caused the solution to stop being unique; grilops will output an alternate solution when one exists, and looking at the difference gave me a sense of which parts of the grid were more troublesome than others. Then, I’d tweak the loop around the problem areas and see how that worked. After a while I’d have a loop and a set of not-too-many givens for which the verifier promised me the solution was still unique.
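To make that loop concrete, here’s a minimal sketch of the grilops solve-and-check-uniqueness pattern. It uses a toy 4×4 Latin square with made-up givens rather than the actual Masyu-plus-Moon-or-Sun constraints (which would need grilops’ loop machinery):

```python
import grilops
from grilops.geometry import Point
from z3 import Distinct

# Toy stand-in for the BWMS verifier: a 4x4 Latin square with a few
# made-up givens. The real verifier encodes loop/pearl/moon/sun rules.
SIZE = 4
GIVENS = {(0, 0): 1, (1, 2): 4, (3, 1): 2}

sym = grilops.make_number_range_symbol_set(1, SIZE)
sg = grilops.SymbolGrid(grilops.get_square_lattice(SIZE), sym)

for (y, x), value in GIVENS.items():
    sg.solver.add(sg.cell_is(Point(y, x), value))
for i in range(SIZE):
    sg.solver.add(Distinct(*[sg.grid[Point(i, x)] for x in range(SIZE)]))
    sg.solver.add(Distinct(*[sg.grid[Point(y, i)] for y in range(SIZE)]))

if not sg.solve():
    print("no solution")
elif sg.is_unique():
    print("unique solution")
else:
    print("not unique; an alternate solution:")
    sg.print()  # diffing this against the first solution shows trouble spots
```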
(The verifier was also responsible for making sure that the Braille letters extracted were the desired ones. You can imagine a nightmare scenario where I have a beautiful grid I’m super happy with, only to realize that one cell doesn’t give the right Braille letter2, and it can’t be easily fixed.)
Then, I’d take that grid and try to solve it by hand. Unsurprisingly the grid was usually bad. But I would find out which parts of the grid were possible to break in on, as well as which parts you’d get stuck on with no way forward besides brute-force casework. I’d adjust those. Or sometimes, I’d run into something that almost worked as a solve path, and if I added one more given, the casework would go away. I’d add that in (and have the verifier double-check), and go on. Conversely, sometimes I’d find unneeded givens, and drop those (and double-check), and go on. Then repeat, many, many times, going back and forth between trying to solve the draft by hand, and adjusting the draft so that the solve path felt smooth.
The whole time I used Git for version control, so from the commit history you can actually see my indecision captured in real time. Here’s a snapshot:
a383c92 [2024-10-12 16:47] [Evan Chen] Working prototype of brutal
02c5d83 [2024-10-12 16:48] [Evan Chen] Remove two X's
39eced9 [2024-10-12 16:51] [Evan Chen] Make harder
e8fb519 [2024-10-12 16:59] [Evan Chen] Remove another moon
dfaf0a7 [2024-10-12 17:02] [Evan Chen] Remove another moon
bf3a21d [2024-10-12 17:03] [Evan Chen] Remove another moon
e7a1d45 [2024-10-12 17:04] [Evan Chen] Keep workshopping
e6bad46 [2024-10-12 17:05] [Evan Chen] Even fewer moons
857f3f6 [2024-10-12 17:06] [Evan Chen] Remove a pearl!
f37bec4 [2024-10-12 17:07] [Evan Chen] Drop yet another pearl
ab18409 [2024-10-12 17:14] [Evan Chen] Two moons for two pearls
22aea74 [2024-10-12 17:15] [Evan Chen] Two moons for one pearl
8650315 [2024-10-12 17:15] [Evan Chen] Drop unused moon
b26987f [2024-10-12 17:16] [Evan Chen] Trade one pearl for one moon
c52494c [2024-10-13 15:12] [Evan Chen] Trade back a pearl for a moon
6b0dd7f [2024-10-13 15:13] [Evan Chen] Trade another pearl for moon
aa1b700 [2024-10-13 15:17] [Evan Chen] One last pearl -> moon trade
d3b607f [2024-10-13 15:26] [Evan Chen] Swap some pearls for moons
772216f [2024-10-13 15:27] [Evan Chen] Remove unneeded pearl
9d1822a [2024-10-13 15:43] [Evan Chen] Swap some pearls for moons
7e62a03 [2024-10-13 15:43] [Evan Chen] Trade white pearl for black
4eb97da [2024-10-13 16:00] [Evan Chen] Move moons all to edge
2e1587d [2024-10-13 16:14] [Evan Chen] Okay last touches
To give you an idea of how sensitive these things can be: one of the pearls I thought a lot about was the one in row 11, column 2. If you remove that particular given pearl, it turns out the logic puzzle still has a unique solution. However, deleting that pearl opens up a plausible-looking branch that takes maybe five cages of lookahead before you can actually disprove it. So that pearl turns out to be super important to keep.
Edits
Testsolving the logic puzzle component went well. But there was a lot of difficulty with BWMS’s extraction at first. It’s easy to get thrown off by only some of the cages in BWMS being valid Braille letters, even with the given necklace at the bottom (which we colored with red rectangles to match the ones in the puzzle). And because of that, we had to be really clear that moons and black pearls were Braille dots while suns and white pearls were Braille non-dots. That’s how the necklace got annotations in the center.
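For concreteness, here’s a tiny sketch of that dot rule. The symbol letters and the cage-reading order (dots 1–3 down the left column, 4–6 down the right, matching standard Braille numbering) are my own illustrative conventions, not necessarily how the puzzle lays out its cages:

```python
# Standard Braille: dots 1-3 run down the left column, 4-6 down the right.
BRAILLE = {
    "1": "A", "12": "B", "14": "C", "145": "D", "15": "E", "124": "F",
    "1245": "G", "125": "H", "24": "I", "245": "J", "13": "K", "123": "L",
    "134": "M", "1345": "N", "135": "O", "1234": "P", "12345": "Q",
    "1235": "R", "234": "S", "2345": "T", "136": "U", "1236": "V",
    "2456": "W", "1346": "X", "13456": "Y", "1356": "Z",
}

def cage_to_letter(cage: str) -> str:
    """cage: six symbols in dot order 1..6, using the made-up encoding
    M = moon, S = sun, B = black pearl, W = white pearl."""
    # Moons and black pearls are dots; suns and white pearls are not.
    dots = "".join(str(i + 1) for i, s in enumerate(cage) if s in "MB")
    return BRAILLE.get(dots, "?")  # '?': this cage isn't a valid letter

print(cage_to_letter("MWSBSS"))  # dots 1 and 4 -> 'C'
```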
Also, a note on the errata. Officially, I don’t think Moon or Sun actually requires the loop to pass through at least one moon in each moon room. I had checked on the Nikoli website to make sure, because in BWMS the loop sometimes only passes through pearls in a room. None of the testsolvers noticed this or brought it up, so I thought it was fine.
Then, early in the hunt, we got a Contact HQ saying it might be nice to clarify this anyway: even though I wasn’t technically wrong, some teams might still get tripped up. At first I was a bit reluctant to change the puzzle mid-hunt, but hours later, other teams started getting tripped up. So then I was like, aw crap, should’ve listened, and sent the “errata”.
I should know better than this. “I was technically correct” just isn’t good enough in puzzle hunt design. Puzzles should be fun, and tripping over technicalities isn’t fun.
Digits (from the toolbench round)

In contrast to BWMS, Digits took just 2 hours and 21 minutes to get the draft out, and the only change from testsolving was the one line at the bottom3. (Of course, you then have to watch both testsolves, write up the full solution, postprod and edit it, etc.) This was by a long shot my shortest construction time. It was also the first puzzle to get factchecked and the first puzzle to be marked “Done”.
I came in with a simple idea4: I wanted a system of equations in A through Z where, besides the all-zero solution, the only nontrivial solution (with values constrained to some set; I chose 0-9) spelled out an answer. Then I wrote some equations that made it work. Then I shipped it. Done.
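The key property to verify is that exactly two solutions survive. Here’s a hedged sketch of how that check could look in z3 (the solver grilops wraps); the three-variable system is a toy stand-in I made up, not the puzzle’s actual equations:

```python
from z3 import And, Ints, Or, Solver, sat

# A made-up three-variable system standing in for the real A-through-Z
# equations: B*B == B forces B into {0, 1}, and the rest follows.
A, B, C = Ints("A B C")
variables = [A, B, C]
equations = [B * B == B, A == 2 * B, C == A + B]

s = Solver()
for v in variables:
    s.add(And(v >= 0, v <= 9))  # every value must be a digit
s.add(*equations)

# Enumerate all solutions by blocking each model as it is found.
solutions = []
while s.check() == sat:
    m = s.model()
    solutions.append(tuple(m[v].as_long() for v in variables))
    s.add(Or([v != m[v] for v in variables]))

print(sorted(solutions))  # [(0, 0, 0), (2, 1, 3)]: all-zero plus one answer
```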
Behold, a Puzzle (from the greenhouse round)
This was a puzzle where we (coauthors and I) took a while to come up with a mechanic that we were happy with. Kaity had sent me the Behold a Square meme from Reddit, and we wanted to build a puzzle around it. But we didn’t have a good idea for how that could work.
We threw around a lot of ideas, many of which involved identifying a picture or shape (SQUARE given a keyhole, TORUS given a coffee mug, …). Nothing in that brainstorm really felt right to me. Identification felt passive, and there wasn’t a set of words that felt right. To me, yet another identify-sort-index-solve puzzle didn’t feel exciting5.
We started making headway when we had the idea that having the solver compute the area of a shape would be a thematic way to get numbers for A1Z26 that we had control over. That resolved a lot of the word-based constraints our initial ideas had.
Starting from the last step being A1Z26 on areas, we came up with the rest of the puzzle basically in reverse order. What shapes should the solver find the area of? Well, we got a six-letter answer assigned, and there are six stereotypical high school quadrilaterals, so we’ll have a cursed version of each quadrilateral. How should these shapes be given? We’ll give the solvers the pieces and have them assemble it. How do we give the cursed definitions? We’ll have some blanks at the top with the definitions that the solvers can fill in with the six words, highlighting USE AREA as an excuse to have them there.
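As a minimal sketch of that final step (the example area is made up, and the 0.01 tolerance echoes the rounding note discussed below):

```python
def area_to_letter(area: float, tol: float = 0.01) -> str:
    """Snap a computed area to the nearest integer, tolerating small
    rounding error, then apply A1Z26."""
    n = round(area)
    assert abs(area - n) <= tol, f"{area} is not within {tol} of an integer"
    assert 1 <= n <= 26, "area must land in the A1Z26 range"
    return chr(ord("A") + n - 1)

print(area_to_letter(18.004))  # -> 'R'
```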
And so we started drafting the puzzle. I have to say making the shapes was a load of fun.

A few other intentional choices:
- We could’ve drawn the given pieces as pretty typeset SVGs rather than the crappy doodles I did on my iPad6, but we intentionally chose not to, because we thought the derpy not-to-scale aesthetic was more suitable for the puzzle.
- The note about rounding error7 (that discrepancies up to 0.01 are insignificant) is actually really important for the solver experience. We found this out the hard way during the first testsolve, before adding that note.
- It’s possible to Nutrimatic the answer pretty early on. We thought about preventing this, but didn’t see an elegant way to do so. So I let it go. The thing is, computing the area of some of the shapes is a bit annoying. I thought it was perfectly fine if someone assembled most of the shapes, found the area of just one or two, and then got the answer. I didn’t really think they were missing out on anything.

Fun aside: we got several hint requests about whether Severance was relevant (we used “innies” and “outies” to describe whether the curved parts go in or out). I’d never heard of this TV show before. Is it good? Should I watch it?
Tetris-Nonogram sequence (from the navigation round)
There’s actually a good story behind how Olga and I came up with the mechanic.
Last-minute deadline panic
The Plot A Course round was one of the last rounds to come together, and I had volunteered to help write one of the three sequences in the meta. We needed to come up with an answerless logic grid puzzle, and later we’d get specifications of which cells should be “safe” or “unsafe”. Sure.
Olga and I were originally planning Slitherlink variants, and had some decent prototypes by mid-February. I wanted to keep things simple because of the tight deadline. We had a verifier up and running (like with BWMS), so we could iterate quickly. And Brian had already postprodded a Slitherlink interface on the site.
But when we got the specification later, it didn’t fit well. To satisfy it, our Slitherlinks would need more than one loop, often several unnatural loops. Everything I came up with felt super forced. Quoting myself in Discord:
i’m getting the sense that the shapes might look pretty bad. i almost wonder which of these is worse:
- a slitherlink with 3 loops but one of which is nested
- a slitherlink with 10 little bite sized loops
We were aiming to have a draft ready for testsolving by Thursday, March 6, and a few days before the deadline I was still unsatisfied. I think by Monday we had basically agreed that, given the meta requirements, Slitherlink was just not the right genre. It was just a question of whether there was enough time to start over. Nonogram had been floated as a possible idea, and on Tuesday morning Olga wrote a joke message in Discord:
Dumb new variant: columns are Tetris pieces, rows are numbers
Then on Tuesday night I finally sat down to prototype it. About three hours later I messaged back:
update: it actually kinda works so that’s what we’re going with. it might require some work to postprod well.
Another three hours later I wrote back again:
i think in principle we have something that could be testsolved. i’m paranoid of errors with uniqueness. but it’s too late at night for me to do much about it because my brain has turned into tofu.
(It turned out we had a nice :tofu: emoji in Discord, courtesy of Ariel, so that made it into the channel name for the rest of the hunt.)
Design
Like with BWMS, I wanted the rules to be gradually figured out by the solver. Roughly, the sequence’s six minipuzzles come in three stages:
- Learn how the Tetris pieces work on the first two 7×7’s.
- Learn how the Akari lights work on the other two 7×7’s.
- Do two large grids using strategies you learned before.
The Akari lights were a way to try to keep things “fresh”. I think it would have gotten repetitive if all six grids used only the Tetris mechanic, and I thought it was kind of cute to have the Akari in “reverse”. (In normal Akari puzzles, you’re given walls and add lights; this time you’re given lights and draw walls.)
One of the other things we really pushed for was keeping the sequence on the easier side. I worried that Plot A Course was becoming long and starting to be a slog, so I wanted my sequence to be brisker. We intentionally included more givens than strictly needed. In fact, we worried a bit because our testsolvers burned through all six minipuzzles in less than half an hour. But we decided it was OK, because the testsolvers said they loved it, and that’s the most important dial to optimize for.
Postprodding
This was a puzzle for which the tech implementation was really important. Despite the short timeline, I worked pretty hard (and did a lot of programming8) to get an interface on the website that I was personally happy with. This isn’t a puzzle that I would want to do in Google Sheets, and if I felt that way as the author, I’m sure the solvers would too.
In the final version, you can have one hand on the mouse and one hand on the keyboard for switching between pieces. There is also a way to mark cells as shaded or empty for convenience. All the pieces are colored differently. It’s synced between team members. And I think it looks pretty, too.

I like teammate’s tech stack a lot. I’m not a programmer by profession, so I was relieved that tph-site handled all the stuff like websockets and state syncing.
I regret I didn’t have the time to make the interface mobile-friendly. But that was the reality given how pressed against the deadline I was.
Closing thoughts
I didn’t realize until I was writing this post that every puzzle I wrote was kind of a logic puzzle.9 Ha. It was really a great experience watching these puzzles come together, and I think my writing style has improved too. I am grateful to the editors, testsolvers, and factcheckers who helped.
I’ve written a lot of words about my puzzles, which I was happy with, but this doesn’t even scratch the surface of what teammate as a whole put together. The other puzzles in the hunt were great too, and that’s just the individual puzzles. The tech, art, and story teams absolutely hit things out of the park, and I saw just how hard many people worked to make this event fly. I’m honored that I could be a small part of writing this hunt. Thank you teammate.
And thanks to the audience, especially those who sent some nice messages to us! We have a channel called #warm-fuzzies where we share kind words we get, and reading through those is a really nice way to celebrate the conclusion of the hunt.
- In fact, an early version of BWMS went so far as to only have dashed pearls, not even given moons, just for this aesthetic effect. I gave up on that because if you do it that way, you end up doing no sun-or-moon logic at all, which is disappointing (for example, you don’t get to use the “bipartite” idea).
- By the way, IIRC the letters E and O actually just can’t be formed at all as extraction letters in BWMS. I was lucky the answer was flexible.
- I had initially hoped that the title Digits would make it clear the numbers should all be digits, but the first testsolve showed that was a bit too optimistic. We added the line to make it clear what was going on, and the testsolve finished smoothly after that.
- My honest thought while writing Digits was “surely someone else has done something like this for Mystery Hunt, right?”. And I’m still suspicious of that.
- And it shouldn’t come as a surprise to people that the first taste test I use for any puzzle is “would I want to work on this?”. If the puzzle isn’t even fun to me, why would it be fun to someone else?
- I did slip up a bit though: if you pay close attention to the arcs, some of the endpoints are blue and some are black. That was just sloppiness on my part; I didn’t notice the inconsistency. That’s the kind of thing you have to be careful with in puzzle design. There were teams that picked up on this and asked if it was relevant.
- We joked briefly about giving exact numbers, but the closed forms of the numbers are not only hideous but also give away which piece maps to what.
- I know everyone’s sick of hearing about AI, so I’ll just make a short acknowledgment that GPT gave a workable first version out-of-the-box to iterate on. (Got to give credit where it’s due.)
- Actually a lot of people have gotten pretty good at guessing my puzzles; I had a few friends doing the hunt who messaged me once they got to Behold, A Puzzle, asking if I wrote it.