Transcription, lenition and allophonic variation

Language Log 2017-01-11

I doubt that many native speakers of American English will recognize this word:

[Audio clip]

But with a little more context, more people will get the message:

[Audio clip]

And if we play the whole pause group, it becomes obvious:

[Audio clip]

The clip comes from Terry Gross's 10/22/2015 interview with Sarah Silverman, at about 12:13 of the download, and of course the transcript is

gee he didn't even wait to get his braces off you know because

The OED gives /ˈdɪdn(t)/ as the US pronunciation of the word "didn't", with this online sound clip:

[Audio clip]

That pronunciation actually has six clear phonetic segments in it (more if we divide stop closures and releases):

And the IPA form of Sarah Silverman's pronunciation is [dɪn], not [dɪdn̩t] or [dɪdn̩] (supposing that the OED's 'n' is meant to be syllabic):

Sarah Silverman is of course not the only person ever to reduce "didn't" to [dɪn] — here's Carrie Brownstein, at about 11:24 of her 10/27/2015 Fresh Air interview:

not true — didn't even mention Buffy is Fluffy in the book

[Audio clip]

On the other hand, "didn't" isn't always reduced to that point — here's Terry Gross at about 9:51 of that same interview:

and so growing up in- in the house that you did with a father who didn't acknowledge to

[Audio clip]

As indicated in the image above, an IPA transcription of that rendition of didn't would be something like

[dɪdn̩ʔ]: [Audio clip]

… or maybe [dɪɾn̩ʔ], though in this case the medial /d/ is arguably a weak voiced stop rather than a tap.

This is the commonest pronunciation of the word "didn't" in even rather formal American English — the second-syllable reduced vowel and the final /t/ are hardly ever seen.

But that division into five phonetic segments obscures important aspects of what's going on, as we can see in a schematic articulatory score:

In the second syllable, the tip of the tongue closes off the oral tract for the [d], and this closure remains in place through the following nasal murmur and glottal stop; the velum opens to create the release of the [d] into the syllabic nasal [n̩], and then remains open through (much of) the glottal stop; the glottis constricts to close off the nasal murmur, and then remains constricted as the velum closes again in preparation for the start of the initial vowel of acknowledge.
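To make the idea of a score concrete, here is a minimal Python sketch of that description as three overlapping gestures on separate tiers. The time values are invented for illustration (arbitrary units, not measured from the recording), and the names Gesture, SCORE and slices are hypothetical.

```python
from typing import NamedTuple


class Gesture(NamedTuple):
    tier: str     # which articulator is involved
    start: float  # onset, in arbitrary illustrative units
    end: float    # offset, in the same units


# Invented timings that follow the prose description only:
# the tongue-tip closure spans [d], the nasal murmur and the glottal stop;
# the velum opens at the release of [d] and stays open into the glottal stop;
# the glottis constricts to end the murmur and stays constricted afterwards.
SCORE = [
    Gesture("tongue-tip closure", 0, 10),
    Gesture("velum open", 3, 8),
    Gesture("glottis constricted", 6, 12),
]


def slices(score):
    """Cut the timeline at every gesture boundary and list the active tiers."""
    bounds = sorted({t for g in score for t in (g.start, g.end)})
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        active = [g.tier for g in score if g.start <= lo and hi <= g.end]
        result.append((lo, hi, active))
    return result


for lo, hi, active in slices(SCORE):
    print(f"{lo:>2}-{hi:<2} {' + '.join(active)}")
```

With these made-up timings, the three symbols [d], [n̩], [ʔ] correspond to five distinct combinations of gestures, and the boundaries between them are set by different articulators at different times.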

It's that articulatory spreading that makes the syllabic nasal a more natural outcome than the vowel+nasal combination. And it's the tendency to weaken non-pre-stress /t/ and /d/ to the point of losing the oral closure that makes it natural to turn the final /t/ into nothing more than a weak glottal stricture, and to weaken the medial /d/ to the point of extinction.

The OED is not the only dictionary to get the pronunciation of this word wrong.

If we look up didn't in the online version of cmudict, an open source pronouncing dictionary widely used in the speech technology field, we get

D IH D AH N T

which is the arpabet version of IPA [dɪdənt].

The full cmudict lists four variants (IPA equivalent added):

DIDN'T 1 D IH D AH N T = [dɪdənt]
DIDN'T 2 D IH D N T = [dɪdnt]
DIDN'T 3 D IH D AH N = [dɪdən]
DIDN'T 4 D IH N T = [dɪnt]

All of these are wrong in one way or another — either they postulate a second-syllable schwa segment, or they have a final [t], or both.
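For readers who want to check this mechanically, here is a small Python sketch that converts the four cmudict variants to IPA and flags the two problems just named. The mapping covers only the five arpabet symbols that occur in these entries, and it treats AH as schwa, which assumes the unstressed vowel (the stress digits are omitted here, as in the listings above). The names to_ipa and problems are hypothetical.

```python
# Partial arpabet-to-IPA table: only the symbols needed for "didn't".
# AH is mapped to schwa on the assumption that it is unstressed.
ARPABET_TO_IPA = {"D": "d", "IH": "ɪ", "AH": "ə", "N": "n", "T": "t"}

VARIANTS = [
    "D IH D AH N T",
    "D IH D N T",
    "D IH D AH N",
    "D IH N T",
]


def to_ipa(arpabet):
    """Convert a space-separated arpabet string to an IPA string."""
    return "".join(ARPABET_TO_IPA[symbol] for symbol in arpabet.split())


def problems(arpabet):
    """List which of the two objections apply to a variant."""
    ipa = to_ipa(arpabet)
    found = []
    if "ə" in ipa:
        found.append("second-syllable schwa")
    if ipa.endswith("t"):
        found.append("final [t]")
    return found


for variant in VARIANTS:
    print(f"{variant} = [{to_ipa(variant)}]: {', '.join(problems(variant))}")
```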

The online Merriam-Webster gives this pronunciation field

ˈdi-dᵊnt, -dᵊn, dial also ˈdit-ᵊn(t) or ˈdint

which (eliding stress and syllabification) I think is meant to correspond to the five IPA variants

[dɪdənt], [dɪdən], [dɪtənt], [dɪtən], [dɪnt]

with this audio: [Audio clip]

And the Wiktionary gives these pronunciations:

IPA: /ˈdɪd(ə)nt/
(General American): [ˈdɪɾn̩(t)], [dɪʔn̩(t)]

with the audio: [Audio clip]
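The parentheses in forms like /ˈdɪd(ə)nt/ and [ˈdɪɾn̩(t)] mark optional material, so each such form abbreviates two or more pronunciations. Here is a minimal Python sketch of that expansion, assuming parentheses are never nested; the function name expand is hypothetical and not part of any dictionary's tooling.

```python
import re


def expand(form):
    """Expand parenthesized optional material into every concrete variant."""
    match = re.search(r"\(([^()]*)\)", form)
    if match is None:
        return [form]
    before = form[:match.start()]
    after = form[match.end():]
    # First the variant without the optional material, then the one with it.
    return expand(before + after) + expand(before + match.group(1) + after)


for form in ["dɪd(ə)nt", "dɪɾn̩(t)", "dɪʔn̩(t)"]:
    print(form, "->", ", ".join(expand(form)))
```

Applied to the three forms above, this yields six variants, only two of which lack both the schwa and the final [t].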

What's the point of this little example?

Well, the first thing is not news: dictionary pronunciations don't give a very good account of how people actually talk.

And the second, related point is not news either: IPA segment sequences — or other ways of segmenting and alphabetizing pronunciation — are not a scientifically very satisfactory representation of speech.

There are at least three reasons for this failure:

Articulatory gestures interact in terms of a multi-layered "score" rather than a single segmental sequence;
The lexical representation of word pronunciation is digital, but there's a complex, context-dependent, and variable analog process of "phonetic realization" between symbols and sounds;
The result of that process is often acoustically similar or identical to the output expected from a different input, e.g. where a gesture is weakened to the point of changing its nature (say from stop to fricative to approximant to nothing), or where two gestures merge to create what might have been originally just one.

As a result, there are three plausible but fundamentally different accounts for a given observation: modification of the lexical pronunciation in the process of phonetic realization; adoption of a different lexical pronunciation; symbolic modification of the lexical pronunciation between the lexicon and the process of phonetic realization.

There's more to say about this, but that's enough for now.