The "Letter Equity Task Force"

Language Log 2024-12-05

Previous LLOG coverage: "AI on Rs in 'strawberry'", 8/28/2024; "'The cosmic jam from whence it came'", 9/26/2024.

Current satire: Alberto Romero, "Report: OpenAI Spends Millions a Year Miscounting the R’s in ‘Strawberry’", Medium 11/22/2024.

OpenAI, the most talked-about tech start-up of the decade, convened an emergency company-wide meeting Tuesday to address what executives are calling “the single greatest existential challenge facing artificial intelligence today”: Why can’t their models count the R’s in strawberry?

The controversy began shortly after the release of GPT-4, on March 2023, when users on Reddit and Twitter discovered the model’s inability to count the R’s in strawberry. The responses varied from inaccurate guesses to cryptic replies like, “More R’s than you can handle.” In one particularly unhinged moment, the chatbot signed off with, “Call me Sydney. That’s all you need to know.”

“I kept trying to count the R’s and it just wouldn’t do it,” said one user in a 17-post thread that went viral on Bluesky. “So I made it count other letters — T’s, B’s, you name it. No chance. Then it hit me: this thing is eating my letters. Letters today, kids tomorrow. Do we want that risk? It’s dangerous. It’s discriminatory. It’s terrifying. We want our children to live, don’t we?!”

At OpenAI headquarters, CEO Sam Altman struck a serious tone at the meeting, describing the R-counting debacle as a “crisis of faith” for the AI community. “I also think it’s a stupid question,” Altman admitted. “There are three R’s. I counted them this morning. But our users keep asking, and we are here to serve their revealed preferences. Can we please stop trying to make these things reason and teach them some basic arithmetic?”

Sources inside OpenAI say the company has already allocated significant resources to the issue, including a newly formed independent Letter Equity Task Force (LETF), led by top researchers who previously trained autonomous vehicles to not discriminate between red and green traffic lights. “This is bigger than ChatGPT. Bigger than AlphaFold. This is about trust,” said one LETF member. “Because if we can’t count R’s in strawberry, what’s next? Misidentifying bananas? Calling tomatoes a vegetable?”

The (fictional) Letter Equity Task Force has done its job on strawberry, as of this morning:

However, ChatGPT-4o still has some letter-counting issues. I asked it for the number of instances of the letter 'e' in the first sentence of the Declaration of Independence, and it started by giving me word-by-word counts (though oddly leaving out some words). Three of the first ten word-by-word counts are wrong (becomes, necessary, and dissolve), with six words omitted up to that point:

  • When: 1
  • the: 1
  • Course: 1
  • events: 2
  • becomes: 3
  • necessary: 1
  • one: 1
  • people: 2
  • dissolve: 2
  • the: 1

And there are plenty of other counting errors later in the list, e.g.

  • connected: 3
  • powers: 0
  • separate: 3
  • Nature: 2
  • Nature's: 2
  • causes: 2
  • separation: 2
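Checking claims like these is easy to do mechanically. Here is a minimal Python sketch, assuming that "instances of the letter 'e'" means case-insensitive matches (the sentence has no capital E, so this doesn't change anything). The names claimed, e_count, and mismatches are hypothetical; only the numbers are taken from ChatGPT's answer as quoted above.

```python
# ChatGPT's claimed per-word counts of 'e', as quoted above.
# The two "the" entries collapse to one key here; both were claimed as 1.
claimed = {
    "When": 1, "the": 1, "Course": 1, "events": 2, "becomes": 3,
    "necessary": 1, "one": 1, "people": 2, "dissolve": 2,
    "connected": 3, "powers": 0, "separate": 3, "Nature": 2,
    "Nature's": 2, "causes": 2, "separation": 2,
}


def e_count(word: str) -> int:
    """Count occurrences of 'e' in a word, ignoring case."""
    return word.lower().count("e")


def mismatches(claims: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {word: (claimed, actual)} for every claim that is wrong."""
    return {
        word: (n, e_count(word))
        for word, n in claims.items()
        if n != e_count(word)
    }


if __name__ == "__main__":
    for word, (said, actual) in mismatches(claimed).items():
        print(f"{word}: ChatGPT said {said}, actual {actual}")
```

Run as a script, it flags becomes, necessary, and dissolve among the first ten counts, and all seven of the later examples.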

ChatGPT then offers a sum:

Total: 56 instances of 'e'.

I think that sum is wrong too: unless my program and I miscounted, the (right and wrong) word-level counts that ChatGPT offers actually add up to 55.

The actual number of 'e' letters in that sentence is 50, and oddly enough, when I ask again for a total count, ChatGPT gets it right:

And actually offers counting code, which actually runs and actually works:

```python
# The sentence to analyze
sentence = (
    "When in the Course of human events, it becomes necessary for one people to dissolve the political bands "
    "which have connected them with another, and to assume among the powers of the earth, the separate and "
    "equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions "
    "of mankind requires that they should declare the causes which impel them to the separation."
)

# Counting the occurrences of the letter 'e'
count_e = sentence.lower().count('e')
print(count_e)  # 50
```
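For comparison, here is the word-by-word approach ChatGPT started with, done mechanically. It's a minimal sketch that assumes words are whitespace-separated tokens with surrounding punctuation stripped (so "Nature's" keeps its apostrophe); per_word_counts is a hypothetical helper name, not something from the dialogue. Since no punctuation mark contains an 'e', the per-word tally has to agree with the whole-sentence count.

```python
import string

sentence = (
    "When in the Course of human events, it becomes necessary for one people to dissolve the political bands "
    "which have connected them with another, and to assume among the powers of the earth, the separate and "
    "equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions "
    "of mankind requires that they should declare the causes which impel them to the separation."
)


def per_word_counts(text: str, letter: str = "e") -> list[tuple[str, int]]:
    """Split on whitespace, strip leading/trailing punctuation, count `letter` per word."""
    words = (w.strip(string.punctuation) for w in text.split())
    return [(w, w.lower().count(letter)) for w in words if w]


tally = per_word_counts(sentence)
word_total = sum(n for _, n in tally)      # sum of the per-word counts
direct_total = sentence.lower().count("e")  # count over the whole sentence

print(word_total, direct_total)  # 50 50
```

Both totals come out to 50, so a correct word-by-word tally agrees with the direct count.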

So why it screwed up the first answer so badly remains a puzzle. It seems the Letter Equity Task Force still has some work to do, and people should continue to be careful about relying on ChatGPT for even the simplest forms of data analysis.

The whole dialogue is here.