Intelligence is whatever machines cannot (yet) do

Statistical Modeling, Causal Inference, and Social Science 2024-04-13

I had dinner a few nights ago with Andrew’s former postdoc Aleks Jakulin, who left the green fields of academia for entrepreneurship ages ago. Aleks was telling me he was impressed by the new LLMs, but then asserted that they’re clearly not intelligent. This reminded me of the old saw in AI that “AI is whatever a machine can’t do.”

In the end, the definition of “intelligent” is a matter of semantics. Semantics is defined by conventional usage, not by fiat (the exception seems to be an astronomical organization trying to change the definition of “planet” to make it more astronomically precise). We do this all the time. If you think about what “water” means, it’s incredibly vague. In the simplest case, how many minerals can it contain before we call it “mud” rather than “water”? Does it even have to be made of H2O if we can find a clear liquid on an alternative earth that will nourish us in the same way (this is a common example in philosophy from Hilary Putnam, I believe)? When the word “water” was first introduced into English, let’s just say that our understanding of chemistry was less developed than it is now. The word “intelligent” is no different. We’ve been using the term since before computers, and now we have to rethink what it means. By convention, we could decide as a group of language users to define “intelligent” however we want. Usually such decisions are guided by pragmatic considerations (or at least I’d like to think so—this is the standard position of pragmatist philosophers of language, like Richard Rorty). For instance, we could decide to exclude GPT because (a) it’s not embodied in the same way as a person, (b) it doesn’t have long-term memory, (c) it runs on silicon rather than cells, etc.

It would be convenient for benchmarking if we could fix a definition of “intelligence” to work with. What we do instead is just keep moving the bar on what counts as “intelligent.” I doubt people 50 years ago (1974) would have said you can play chess without being intelligent. But as soon as Deep Blue beat the human chess champion, everyone changed their tune and the chorus became “chess is just a game” and “it’s finite” and “it has well-defined rules, unlike real life.” Then when IBM’s Watson trounced the world champion at Jeopardy!, a language-based game, it was dismissed as a parlor trick. Obviously, the reasoning went, because a machine can play Jeopardy!, it doesn’t require intelligence.

Here’s the first hit on Google I found searching for something like [what machines can’t do]. This one’s in a popular magazine, not the scientific literature. It’s the usual piece in the genre of “ML is amazing, but it’s not intelligent because it can’t do X”.

Let’s go over Toews’s list of AI’s failures circa 2021 (these are direct quotes).

  1. Use “common sense.” A man went to a restaurant. He ordered a steak. He left a big tip. If asked what the man ate in this scenario, a human would have no problem giving the correct answer—a steak. Yet today’s most advanced artificial intelligence struggles with prompts like this.  
  2. Learn continuously and adapt on the fly. Today, the typical AI development process is divided into two distinct phases: training and deployment.  
  3. Understand cause and effect. Today’s machine learning is at its core a correlative tool. It excels at identifying subtle patterns and associations in data. But when it comes to understanding the causal mechanisms—the real-world dynamics—that underlie those patterns, today’s AI is at a loss.  
  4. Reason ethically. … In 2016, Microsoft debuted an AI personality on Twitter named Tay. The idea was for Tay to engage in online conversations with Twitter users as a fun, interactive demonstration of Microsoft’s NLP technology. It did not go well. Within hours, Internet trolls had gotten Tay to tweet a wide range of offensive messages: for instance, “Hitler was right” and “I hate feminists and they should all die and burn in hell.”

(1) ChatGPT-4 gets these common-sense problems mostly right. But it’s not logic. The man may have ordered a steak, gotten it, sent it back, ordered the fish instead, and still left a big tip. This is a problem with a lot of the questions posed to GPT about whether X follows from Y. It’s not a sound inference, just the most likely thing to happen, or as we used to say, the “default.” Older AIs were typically designed around sound inference and weren’t so much trying to emulate human imprecision (having said that, my grad school admissions essay was about default logics, and my postdoc back in the 1980s was funded by a grant on them!).
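Just to make the “default” idea concrete, here’s a minimal toy sketch (my own example, not any particular default-logic formalism): “ate a steak” follows by default from “ordered a steak,” but the conclusion gets withdrawn as soon as a defeating fact shows up.

```python
# Toy sketch of defeasible ("default") inference, not sound deduction.
# The facts and the single rule are illustrative only.

def what_did_he_eat(facts):
    """By default, ordering a steak means eating a steak,
    unless some later fact defeats that conclusion."""
    if "ordered steak" in facts and "sent steak back" not in facts:
        return "steak"
    return "can't say"

print(what_did_he_eat({"ordered steak", "left big tip"}))
# -> steak (the default conclusion)

print(what_did_he_eat({"ordered steak", "sent steak back", "ordered fish"}))
# -> can't say (adding facts withdrew the conclusion, which sound logic never does)
```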

(2) You can do in-context learning with ChatGPT, but it doesn’t retain anything long term without retraining/fine-tuning. It will certainly adapt to its task/listener on the fly throughout a conversation (arguably the current systems like ChatGPT adapt to their interlocutor too much—it’s what they were trained to do via reinforcement learning). Long-term memory is perhaps the biggest technical challenge to overcome, and it’s been interesting to see people going back to LSTM/recursive NN ideas (transformers, the neural net architecture underlying ChatGPT, were introduced in a paper titled “Attention is all you need”, which used long but finite memory).
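For readers who haven’t looked under the hood, here’s a bare-bones sketch of the scaled dot-product attention at the heart of a transformer (toy data and sizes are mine; a real model adds learned projections, multiple heads, positional encodings, and many stacked layers). The point is that each token can only attend over a fixed-length context, which is exactly the “long but finite memory.”

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query mixes the values,
    weighted by softmax similarity to the keys in the context window."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the context
    return weights @ V

rng = np.random.default_rng(0)
context_len, d = 8, 4        # "memory" stops at context_len tokens
Q = rng.normal(size=(context_len, d))
K = rng.normal(size=(context_len, d))
V = rng.normal(size=(context_len, d))
print(attention(Q, K, V).shape)   # (8, 4)
```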

(3) ChatGPT-4 is pretty bad at causal inference. But it’s probably above the bar set by Toews’s complaints. It’ll get simple “causal inference” right the same way people do. In general, humans are pretty bad at causal inference. We are way too prone to jump to causal conclusions based on insufficient evidence. Do we classify baseball announcers as not intelligent when they talk about how a player struggles with high-pressure situations after N = 10 plate appearances in the playoffs? We’re also pretty bad at reasoning about things that go against our preconceptions. Do we think Fisher was not intelligent because he argued that smoking didn’t cause cancer? Do we think all the anthropogenic global warming deniers are not intelligent? Maybe they’re right and it’s just a coincidence that temps have gone up coinciding with industrialization and carbon emissions. Seems like a highly suspicious coincidence, but causation is really hard when you can’t do randomized controlled trials (and even then it’s not so easy because of all the possible mediation).
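On the N = 10 plate appearances point, a quick simulation (my numbers, just for illustration) shows how much “clutch” and “choking” you get from sampling variation alone: give a thousand identical .300 hitters ten playoff plate appearances each, and a sizable fraction will look like they collapse under pressure while another sizable fraction looks heroic.

```python
import numpy as np

# Toy simulation: identical hitters, success probability 0.300,
# N = 10 playoff plate appearances each. Any "pressure" story told
# about the extremes is pure noise by construction.

rng = np.random.default_rng(1)
true_rate, n_appearances, n_players = 0.300, 10, 1_000

successes = rng.binomial(n_appearances, true_rate, size=n_players)
rates = successes / n_appearances

print("fraction below .200 ('chokers'):", np.mean(rates < 0.200))
print("fraction above .400 ('clutch') :", np.mean(rates > 0.400))
```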

(4) How you call this one depends on whether you think the front-line fine-tuning of ChatGPT made a reasonably helpful/harmless/truthful bot or not, and on whether the “ethics” it was trained with are yours. You can certainly jailbreak even ChatGPT-4 to send it spiraling into hate land or fantasy land. You can jailbreak some of my family in the same way, but I wouldn’t go so far as to say they weren’t intelligent. You can find lots of folks who think ChatGPT is too “woke”. This is a running theme on the GPT subreddit. It’s also a running theme among anti-woke billionaires, as reflected in the UK’s Daily Telegraph article title, “ChatGPT may be the next big thing, but it’s a biased woke robot.”

I’ve heard a lot of people say their dog is more intelligent than ChatGPT. I suppose they would argue for a version of intelligence that doesn’t require (1) or (4) and is very tolerant of poor performance in (2) and (3).