Hot words

Lingua Franca 2018-08-10

It is my solemn duty to call the attention of Language Log readers to a seriously deficient BBC article:

"China's rebel generation and the rise of 'hot words'", by Kerry Allen with additional reporting from Stuart Lau (8/10/18). 

Language Matters is a new column from BBC Capital exploring how evolving language will influence the way we work and live.

Even though the article annoyed me greatly, I probably wouldn't have written a post about it on the basis of the flimsy substance of the last 23 paragraphs were it not for the outrageous first paragraph, which really requires refutation.

Before I dissect the first paragraph, however, I need to point out the erroneous premises built into the capsule description of this new series following the title of the article.

All right, "Language Matters" is cutesy, what with the dual nounal and verbal meanings of the second word, but it's too closely modeled on some current politically sensitive slogans for comfort.  Then I'm troubled by the future tense of "will influence".  Neither the present article nor any other article about "evolving language" that I can imagine will be able to predict the way we live and work in the future.  It's hard enough just to figure out how the current stage of a language reflects the way we are living and working in the present.

Now, moving on to the disastrous first paragraph:

Mandarin Chinese is one of the most complex languages in the world. Opening a Chinese dictionary, you find around 370,000 words. That's more than double the number of words in the Oxford English dictionary, and almost three times those in French and Russian dictionaries.

The initial sentence is incredibly lame. It says nothing.  Flunk.

All languages are complex in one way or another:  phonology, morphology, grammar, syntax… — you name it.  If it's a real language that people rely on for all of their needs and transactions, it's bound to be as complicated as life itself.  To tell the truth, I've always felt that Mandarin is one of the simplest and easiest languages I've ever learned.  See, inter alia, "Difficult languages and easy languages" (3/4/17) — I still owe Language Log readers the results of the survey taken in that post; I have all the data, just need to type them up.

The second sentence is worse.  The number of words in a language is no index of its complexity.  The active spoken vocabulary of most people is unlikely to exceed about 5,000 words, and they might use twice that number in their writing.  Unless their name is William F. Buckley, Jr. or they are someone extremely rare like him, even highly educated people are unlikely to have more than 20,000 words in their spoken vocabulary and 40,000 or so words in their written vocabulary.  Shakespeare used 31,534 different words in his published works and probably knew another 35,000 or so that he didn't use there, making a total of around 65,000 words.

Even if "Chinese" really did have 370,000 words, that wouldn't tell us anything about vocabulary size for individuals.  But "Chinese" doesn't have 370,000 words.  The authors must have gotten their fantastical figure of 370,000 from the number of entries in Hànyǔ Dà Cídiǎn 漢語大詞典 (Unabridged Dictionary of Sinitic) (1986-1994), but that is a dictionary based on historical principles, and most of its entries are no longer current.  It's hugely misleading to speak casually of "Opening a Chinese dictionary" in this instance, since Hànyǔ Dà Cídiǎn 漢語大詞典 (Unabridged Dictionary of Sinitic) isn't just any old, typical "Chinese" dictionary.  For Sinitic, it's the closest thing to an equivalent of the OED, requiring the mobilization of more than a thousand researchers over a period of nearly two decades for its compilation.

The 7th edition (2016) of the Xiàndài Hànyǔ Cídiǎn 现代汉语词典 (Dictionary of Contemporary Sinitic), the standard and most authoritative dictionary of Modern Standard Mandarin (MSM), has around 70,000 entries.  That fits comfortably in the realm of vocabulary size for highly educated individuals that I described above.

Now, to assert that "Chinese" has "more than double the number of words in the Oxford English dictionary" is both bad mathematics and contrary to fact, even if we accept the fictitious claim that MSM has 370,000 words, which it most certainly does not.

"How many words are there in the English language?" from the Oxford Dictionaries website (see especially the last paragraph):

There is no single sensible answer to this question. It's impossible to count the number of words in a language, because it's so hard to decide what actually counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (e.g. dogs = plural noun, dogs = present tense of the verb). Is dog-tired a word, or just two other words joined together? Is hot dog really two words, since it might also be written as hot-dog or even hotdog?

It's also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Teenage slang? Abbreviations?

The Second Edition of the 20-volume Oxford English Dictionary contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. Over half of these words are nouns, about a quarter adjectives, and about a seventh verbs; the rest is made up of exclamations, conjunctions, prepositions, suffixes, etc. And these figures don't take account of entries with senses for different word classes (such as noun and adjective).

This suggests that there are, at the very least, a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED, or words not yet added to the published dictionary, of which perhaps 20 per cent are no longer in current use. If distinct senses were counted, the total would probably approach three quarters of a million.
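For readers who want to check the arithmetic themselves, here is a rough sketch in Python using only the figures quoted in this post.  The variable names are my own, and the choice of which OED count to compare against is an assumption: since the 370,000 figure appears to come from a historical dictionary full of obsolete entries, the like-for-like comparison is with all OED entries, obsolete ones included.

```python
# Figures as quoted in this post; only the sum and the ratios are computed here.
BBC_CHINESE_CLAIM = 370_000     # BBC figure, apparently the entry count of a historical dictionary
XIANDAI_HANYU_CIDIAN = 70_000   # approximate entries, 7th edition (2016)
OED_CURRENT = 171_476           # full entries for words in current use
OED_OBSOLETE = 47_156           # obsolete words
OED_DERIVATIVE = 9_500          # "around 9,500" derivative subentries

# All OED entries, obsolete ones included (assumed to be the fair counterpart
# to a historical Chinese dictionary).
oed_total = OED_CURRENT + OED_OBSOLETE + OED_DERIVATIVE


def ratio(a: int, b: int) -> float:
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)


print(oed_total)                                  # 228132
print(ratio(BBC_CHINESE_CLAIM, oed_total))        # 1.62: historical vs. historical
print(ratio(BBC_CHINESE_CLAIM, OED_CURRENT))      # 2.16: historical vs. current only
print(ratio(XIANDAI_HANYU_CIDIAN, OED_CURRENT))   # 0.41: current vs. current
```

The only way to get "more than double" out of these numbers is to set a historical Chinese dictionary against the OED's current-use entries alone.  Compare historical with historical, or current with current, and the ratio falls to roughly 1.6 or 0.4.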

Enough said on that score.  Now what about the other 23 paragraphs of the article, which is what it's really about — rè cí 热词 ("hot words")?  In a word, it's all very confused and confusing, muddled at best.  I pity anyone unfamiliar with Chinese who slogs through it, because they will be flooded with misinformation, imprecision, and obfuscation about what the language is today and how it works.

The article is a veritable mess.

The authors do offer a fair number of more or less clever paraphrase translations like "freedamn" for Zhōngguó tèsè zìyóu 中国特色自由 ("freedom with Chinese characteristics") and "smilence" for xiào ér bù yǔ 笑而不语 ("laugh without speaking"), though some of these are flops, and one doesn't always know where they come from.  Furthermore, they instance a lot of "hot words" — some of which (such as "niubility") are decidedly cool by now — without translating them or giving an idea of what they mean.

Jonathan Smith, in calling this article to my attention, notes:

…[T]he weird thing is the screwed-up non-translations….

An important point that is not made clear is whether these "hot words" are Chinese with English glosses, English with Chinese glosses or some combination…. I suppose "Chinsumers (zàiwài fēngkuáng gòuwù de Zhōngguó rén 在外疯狂购物的中国人)", "departyment (zhèngfǔ bùmén 政府部门)", "innernet (Zhōngguó hùliánwǎng 中国互联网)", etc., are the latter but with no attempt whatsoever at marking the puns within the Chinese translations… whereas "harmany (Zhōngguó tèsè héxié 中国特色和谐)" is the former with a (bad) attempt at marking the funny in English… etc.

The most valuable aspect of the article is its account of how netizens use rè cí 热词 ("hot words") to circumvent China's ubiquitous internet censors.  But I'm afraid that this point will be lost in the welter of bewilderment that suffuses the entire piece, from beginning to end.