Chinese dialectometry: fundamental flaws

Language Log 2024-04-24

Really happy to announce our new (open access) paper was finally published today in @LinguisticsJ! "Geographic structure of Chinese dialects: A computational dialectometric approach" https://t.co/oyNPabq0CN with He Huang (lead author), Lei Jia and Zhuo Chi A short … pic.twitter.com/ldXwh3FDCU

— Jack Grieve (@JWGrieve) April 23, 2024

This is the cited paper:

"Geographic structure of Chinese dialects: A computational dialectometric approach", by He Huang, Jack Grieve, Lei Jiao, and Zhuo Cai, Linguistics (De Gruyter Mouton [April 23, 2024])

https://doi.org/10.1515/ling-2021-0138

Abstract

Dialect classification is a long-standing issue in Chinese dialectology. Although various theories of Chinese dialect regions have been proposed, most have been limited by similar methodological issues, especially due to their reliance on the subjective analysis of dialect maps both individually and in the aggregate, as well as their focus on phonology over syntax and vocabulary. Consequently, we know relatively little about the geolinguistic underpinnings of Chinese dialect variation. Following a review of previous research in this area, this article presents a theory of Chinese dialect regions based on the first large-scale quantitative analysis of the data from the Linguistic Atlas of Chinese Dialects, which was collected between 2000 and 2008, providing the most up-to-date picture of the full Chinese dialect landscape. We identify and map a hierarchy of 10 major Chinese dialect regions, challenging traditional accounts. In addition, we propose a new theory of Chinese dialect formation to account for our findings.

Conclusions

To conclude, in this article we have presented the first large-scale dialectometric analysis of Chinese dialect survey data, uncovering hidden structure in regional variation in Chinese, including proposing new theories of modern Chinese dialect regions and of the historical formation of Chinese dialect regions. Our results both support and challenge standard views in Chinese dialectology, providing a quantitative basis for future research in Chinese dialectology, as well as for cross-linguistic typological analysis. This study also highlights the importance of adopting a quantitative and data-driven approach to dialectology. Geolinguistic data is voluminous, high-dimensional, and spatially related, and it is therefore challenging to effectively and efficiently detect and understand relationships and patterns in dialect data. Crucially, extending our scientific understanding of geolinguistic phenomenon must generally rely on the discovery, interpretation, and presentation of multivariate spatial patterns. Dialectometry is a powerful tool that integrates computational, visual, and cartographic methods together to detect and visualize multivariate spatial patterns. It bridges our linguistic knowledges with data-driven, quantitative research and provides us a new way to evaluate previous theories and explore new issues objectively, as we have demonstrated for the Chinese language in this study, leading to new and important insights about regional variation in one of the most important languages in the world.

The conceptual defects of this paper are evident from the first paragraph of the Introduction:

Chinese is a group of language varieties that forms the Sinitic branch of the Sino-Tibetan family. It is the mother tongue of 1.2 billion people, approximately 16 % of the World’s population. Understanding the geographic structure of Chinese dialects and the relationships between these dialects is important because it allows us to better understand the history of Sinitic languages, which is crucial for resolving questions about the formation of the linguistic landscape in eastern Eurasia, as well as processes of language variation and change more generally.

Critical observations, questions, and exegesis

Chinese is — note the singular form of the verb

a group — not a single entity

language varieties — what is a "language variety"?  how does it differ from a language?  how does it differ from a dialect?

group… forms [a] branch — is "Chinese" a group or a branch? or both?  in any event, whether a group or a branch, by any linguistically acceptable definition, "Chinese" consists of more than a single language, not just a mass of "dialects"

Sinitic — what is this? how does it relate to Chinese?  the authors say that "Chinese" is a "group of language varieties [i.e., languages]" that "forms the Sinitic branch of the Sino-Tibetan family";  in other words, Chinese is essentially equivalent to Sinitic, but, in their minds, perhaps the Chinese group is not exactly equivalent to the Sinitic branch;  if they are not exactly equal, how do they differ?  it's all very muddy and murky

That's just my critical analysis of the first sentence of the Introduction.  The rest of the Introduction reads like superficial, vapid, AI-generated blather, as does much of the rest of the paper when it is not citing and interpreting data.

Methinks the authors of this paper have been seduced and confused by the compilers of the Linguistic Atlas of Chinese Dialects, the chief source of their data, into thinking that "Chinese" is a single language ("the mother tongue") spoken by 1.2 billion people and that it consists of thousands of mutually intelligible "dialects".

Nothing could be further from the truth, linguistic and otherwise.

My assessment of the paper under review may seem to be unnecessarily harsh.  In actuality, it is not much different from countless other studies in Chinese dialectology that cannot distinguish between family, branch, group, language, dialect, and fāngyán 方言 ("topolect").

P.S.:  This has nothing to do with armies and navies, a topic we've fruitlessly discussed ad nauseam on Language Log countless times in the past.

P.P.S.:  As for the mutual intelligibility of so-called "Chinese dialects", listen to this 4-year-old kid from Tianjin, which is close (70 miles) to Beijing, singing in the local Muttersprache.

P.P.P.S.:  If we can't call all those multitudinous strains of language in China "dialects", what would be a good alternative?  I propose "lect" (see especially the last sentence in the passage below).

In sociolinguistics, a variety, also known as a lect or an isolect, is a specific form of a language or language cluster. This may include languages, dialects, registers, styles, or other forms of language, as well as a standard variety. The use of the word variety to refer to the different forms avoids the use of the term language, which many people associate only with the standard language, and the term dialect, which is often associated with non-standard language forms thought of as less prestigious or "proper" than the standard. Linguists speak of both standard and non-standard (vernacular) varieties as equally complex, valid, and full-fledged forms of language. Lect avoids the problem in ambiguous cases of deciding whether two varieties are distinct languages or dialects of a single language.

(Wikipedia)

 

[Thanks to Hiroshi Kumamoto]