Trip to the library
West Coast Stat Views (on Observational Epidemiology and more) 2025-08-12
I'm a bit surprised I haven't posted this before.
Emily M. Bender is one of the leading critics, perhaps the leading critic, of LLMs from the theoretical side. (On the business and social impact side I think we'd have to give the title to Ed Zitron.) Though she is best known for coining the term "stochastic parrot," my favorite example of her work is this essay, in which she demonstrates that, even if the algorithms were intelligent, they still couldn't understand what they were saying.
[I've left out some context. If you think you've spotted a flaw in the logic, you should check out the original before weighing in.]
From "Thought experiment in the National Library of Thailand":
To try to bring the difference between form and meaning into focus, I like to lead people through a thought experiment. Think of a language that you do not speak which is furthermore written in a non-ideographic writing system that you don’t read. For many (but by no means all) people reading this post, Thai might fit that description, so I’ll use Thai in this example.
Imagine you are in the National Library of Thailand (Thai wikipedia page). You have access to all the books in that library, except any that have illustrations or any writing not in Thai. You have unlimited time, and your physical needs are catered to, but no people to interact with. Could you learn to understand written Thai? If so, how would you achieve that? (Please ponder for a moment, before reading on.)
I’ve had this conversation with many many people. Some ideas that have come up:
- Look for an illustrated encyclopedia. [Sorry, I removed all books with photos, remember?]
- Find scientific articles which might have English loanwords spelled out in English orthography. [Those are gone too. I was thorough.]
- Patiently collate a list of all strings, locating the most frequent ones, and deduce that those are function words, like the equivalents of and, the, or to, or whichever elements Thai grammaticalizes. [Thai actually doesn’t use white space delimiters for words, so this strategy would be extra challenging. If you succeeded, you’d be succeeding because you were bringing additional knowledge to the situation, something which an LLM doesn’t have. Also, the function words aren’t going to help you much in terms of the actual content.]
- Unlimited time and yummy Thai food? I’d just sit back and enjoy that. [Great! But also, not going to lead to learning Thai.]
- Hunt around until you find something that from its format is obviously a translation of a book you already know well in another language. [Again, bringing in external information.]
- Look at the way the books are organized in the library, and find words (substrings) that appear disproportionate in each section (compared to the others). Deduce that these are the words that have to do with the topic of that section. [That would be an interesting way to partition the vocabulary for sure, but how would you actually figure out what any of the words mean?]
Without any way to relate the texts you are looking at to anything outside language, i.e. to hypotheses about their communicative intent, you can’t get off the ground with this task. Most of the strategies above involve pulling in additional information that would let you make those hypotheses — something beyond the strict form of the language.
...
You could, if you didn’t get fed up, get really good at knowing what a reasonable string of Thai “looks like”. You could maybe even write something that a Thai speaker could make sense of. But this isn’t the same thing as “knowing Thai”. If you wanted to learn from the knowledge stored in that library, you still wouldn’t have access.
...
It doesn’t matter how “intelligent” [ChatGPT] is — it can’t get to meaning if all it has access to is form. But also: it’s not “intelligent”. Our only evidence for its “intelligence” is the apparent coherence of its output. But we’re the ones doing all the meaning making there, as we make sense of it.
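[A side note of my own, not from Bender's essay: the "disproportionate substrings per section" strategy from her list is easy to sketch in code, and doing so makes her objection concrete. The function names, the toy "shelves," and the add-one smoothing below are all my assumptions, chosen only for illustration. The strings are deliberately meaningless, and the code counts raw character pairs because, as the essay notes, there is no white space to split on.]

```python
from collections import Counter


def ngram_counts(text, n=2):
    """Count every substring of length n; no word boundaries are assumed."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def distinctive_ngrams(sections, n=2, top=3):
    """For each section, rank n-grams by how over-represented they are
    compared with all other sections combined (add-one smoothed rate ratio).
    This is a hypothetical scoring rule, not anything from the essay."""
    counts = {name: ngram_counts(text, n) for name, text in sections.items()}
    totals = {name: sum(c.values()) for name, c in counts.items()}
    result = {}
    for name, own in counts.items():
        # Pool the counts from every other section.
        rest = Counter()
        for other, c in counts.items():
            if other != name:
                rest.update(c)
        rest_total = sum(rest.values())

        def score(gram):
            own_rate = (own[gram] + 1) / (totals[name] + 1)
            rest_rate = (rest[gram] + 1) / (rest_total + 1)
            return own_rate / rest_rate

        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(own, key=lambda g: (-score(g), g))
        result[name] = ranked[:top]
    return result


# Toy "library": two shelves of opaque symbols (made-up data).
sections = {
    "shelf_a": "qxqxzzqxab",
    "shelf_b": "mnmnzzmnab",
}
for shelf, grams in distinctive_ngrams(sections).items():
    print(shelf, grams)
```

The output is a list of strings that are characteristic of each shelf, which is exactly the partition of the vocabulary that Bender grants you could get. Nothing in it says what any of those strings mean.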