Rich Sutton’s bitter lesson of AI

The Endeavour 2025-02-20

This morning I read Alberto Romero’s article Grok 3: Another Win For The Bitter Lesson. I wasn’t clear on what exactly the bitter lesson was, so I followed the link to Romero’s previous article GPT-4: The Bitterer Lesson, which led to Rich Sutton’s original article The Bitter Lesson.

Sutton opens his article by saying

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

Clear enough, except you have to read further to get the “compared to what.” I asked Grok 3 to read the article and tell me exactly what the bitter lesson is. It begins with this explanation:

The “Bitter Lesson” is the principle that, in the field of artificial intelligence, the most effective and lasting progress has historically come from methods that rely on leveraging ever-increasing computational power rather than depending on human ingenuity to craft domain-specific knowledge or shortcuts. …

and concludes with the following pithy summary of its summary:

In essence, the Bitter Lesson is: computation trumps human-crafted specialization in AI, and embracing this, though humbling, is key to future success.

Sutton supports his thesis with examples from chess, Go, speech recognition, and computer vision.

In some sense this is the triumph of statistics over logic. All else being equal, collecting enormous amounts of data and doing statistical analysis (i.e. training an AI model) will beat out rule-based systems. But of course all else is often not equal. Logic is often the way to go, even on big problems.

When you do know what’s going on and can write down a (potentially large) set of logical requirements, exploiting that knowledge via techniques like SAT and SMT solvers is the way to go. See, for example, Wayne’s article on constraint programming. But if you want to build the Everything Machine, software that wants to be all things to all people, statistics is the way to go.
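Here’s a minimal sketch of what the logic-based route looks like, using the z3-solver Python package. That library is my choice for illustration; neither the article nor the solvers it alludes to prescribe a particular tool. You write down the requirements and let the solver search for an assignment that satisfies all of them:

    # Illustrative only: z3-solver is one SMT solver among many.
    # Find integers x, y satisfying a handful of logical/arithmetic constraints.
    from z3 import Int, Solver, sat

    x, y = Int("x"), Int("y")
    s = Solver()
    s.add(x > 0, y > 0)        # both positive
    s.add(x + 2 * y == 20)     # a linear constraint
    s.add(x % 3 == 1)          # a modular constraint

    if s.check() == sat:
        print(s.model())       # one satisfying assignment, e.g. x = 4, y = 8
    else:
        print("no solution")

The point is that the knowledge goes in as explicit constraints rather than being learned from data.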

Sometimes you have a modest amount of relevant data, and you don’t have 200,000 Nvidia H100 GPUs. In that case you’re much better off with classical statistics than AI. Classical models are more robust and easier to interpret. They are also not subject to hastily written AI regulation.
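As a sketch of that classical route, here’s an ordinary least squares fit on a small synthetic data set, using statsmodels (again, an assumed choice of library, not one named above). The interpretability claim is easy to see: the fitted coefficients and their confidence intervals are right there.

    # A classical statistical model on a modest, synthetic data set.
    # statsmodels is an assumed choice; any OLS routine would do.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

    X = sm.add_constant(x)       # add an intercept column
    fit = sm.OLS(y, X).fit()
    print(fit.params)            # interpretable: estimated intercept and slope
    print(fit.conf_int())        # 95% confidence intervals for both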