Name-transcription slop
Language Log 2025-12-21
Friday's On The Media, "Deep Fakes, Data Centers, And AI Slop — Are We Cooked?" has some linguistically-interesting discussion, especially the part about the rise of AI-generated trolling — more on that later. But this post is just a quick note on a widespread symptom of current end-to-end speech-to-text technology, where the text end of the process is letter-sequence tokens of obscure origin, yielding some peculiar spelling errors.
The show signs off like this:
[Audio clip]
…which YouTube's "auto-generated" transcript renders as:
Checking the show's website, we see that a couple of these names are correctly spelled: Molly Rosen and Katya Rogers.
A few others are spelled wrong, but in a more-or-less plausible way: Candice Wang becomes "Candace Wong", Eloise Blondiau becomes "Eloise Blondio", and Micah Loewinger becomes "Michael Owinger".
Rebecca Clark-Callender entirely loses her post-hyphen syllables, to become "Rebecca Clark".
Then Jennifer Munson becomes unpronounceable as "Jennifer Mnson", and to top it all off, Brooke Gladstone becomes "Broo Gladstone"…
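For readers who want to put a rough number on how far each transcribed name drifts from the correct one, here is a minimal sketch using character-level Levenshtein distance. This is my own illustration, not anything YouTube or the show uses, and the function name `edit_distance` and the list `PAIRS` are hypothetical. The name pairs are the ones quoted above. Edit distance only measures the size of a spelling error; it says nothing about which tokens the recognizer actually emitted.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# (correct spelling, auto-generated transcript), as quoted in this post
PAIRS = [
    ("Candice Wang", "Candace Wong"),
    ("Eloise Blondiau", "Eloise Blondio"),
    ("Micah Loewinger", "Michael Owinger"),
    ("Jennifer Munson", "Jennifer Mnson"),
    ("Brooke Gladstone", "Broo Gladstone"),
]

for ref, hyp in PAIRS:
    print(f"{ref!r} -> {hyp!r}: {edit_distance(ref, hyp)} edit(s)")
```

Note that the unpronounceable "Jennifer Mnson" is only one deletion away from the correct spelling, so a small edit distance does not mean a plausible-looking error.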
In the YouTube post-closure closure, Ira Flatow loses his 'l':
[Audio clip]
And I continue to be puzzled about YouTube's failure to even try to do phrase division and speaker diarization — but again, that's a topic for another day…

