Massive borrowing
Language Log 2019-02-18
Some people freak out when early borrowings from one language into another are pointed out, as though this were an insult to the integrity of the recipient language or somehow clashed with the sacred laws of linguistics.
When looked at dispassionately, borrowing among languages is both normal and pervasive. In this post, I will demonstrate how widespread borrowing is in several representative languages.
One of the things I love most about English is the richness of its borrowed vocabulary. This is something I was aware of from the time I was in elementary school, and it is why I had already worn out so many dictionaries by the time I graduated from high school. I reveled (and still do revel) in the fact that when I speak English, I also use words from hundreds of other languages. Of course, words of French origin are particularly numerous in English, with nearly 30% of our vocabulary being derived from that language (many of them introduced by Frenchified Germanic Vikings during the Norman invasion) and another 30% coming from Latin. This means that English, just taking French and Latin into account (never mind Spanish, Portuguese, etc.), has more words of Romance derivation than of Germanic origin, but that doesn't stop English from being a Germanic language.
One of my favorite reference works is Hobson-Jobson, that magisterial dictionary of Anglo-Indian words and phrases by Col. Henry Yule and A. C. Burnell, Ph.D. (1886), new edition by William Crooke, B.A. (1903); a digital version is provided by the University of Chicago.
Since the arrival of Buddhism in the East Asian Heartland beginning around two millennia ago, thousands of Sanskrit (and Prakrit and Pali) words have poured into Sinitic. Modern Sinitic (and by extension other East Asian languages) is full of Sanskrit loanwords, such as the following:
chànà 剎那 ("instant" < kṣaṇa)
chán(nà) 禪(那) ("meditation; Zen" < dhyāna)
púsà 菩薩 ("Bodhisattva" < pútísàduǒ 菩提薩埵 [shortened by taking only the first and third syllables])
fāngbiàn 方便 ("convenience" < upāya), with which regular readers of Language Log are thoroughly familiar
I have dictionaries of Japanese gairai-go 外来語 ("loanwords; borrowed words; words of foreign origin"), some of which have more than fifty thousand entries. One of my colleagues told me that, off the top of her head, probably around 80% of Japanese words are borrowings. Another wrote:
Purely guessing, with no statistical background whatever, I would estimate 30% borrowed from Chinese, 15% borrowed from English, and 5% borrowed from other languages. [VHM: Those figures seem conservative to me]
I’m not sure how to count a word like tonkatsu, with the ton borrowed from Chinese and the katsu from English, but neither part very recognizable to a native speaker of those languages who does not know Japanese. Another hard call is pseudo-Chinese terms invented in Japan and pseudo-English terms similarly invented in Japan (naitaa, sarariiman). Finally there is the Japanese propensity to abbreviate, which yields words like pasokon (from English “personal computer” but unrecognizable to an English speaker), sumahon (from smart phone), or the totally incomprehensible garakei (flip phone, from Galapagos (i.e., living fossil) keitai, a pseudo-Chinese term for mobile phone).
[The word tonkatsu is a combination of the Sino-Japanese word ton (豚) meaning "pig" and katsu (カツ), which is a shortened form of katsuretsu (カツレツ), the transliteration of the English word cutlet, which in turn derives from French côtelette, meaning "meat chop".]
From Jim Unger:
The Kokuritsu Kokugo Kenkyūjo, now known as the National Institute for Japanese Language and Linguistics (NINJAL), has published large-sample studies of vocabulary since the 1950s. Volume 2 of the 1971 pp. 16-24 (attached) has the information you seek. The statistics are summarized in Table 3 and Figures 1 and 2. Mind the way the authors classify words: proper nouns, particles, and auxiliary verbs are classed as 語種不要, distinct from 和語, 漢語, and 外来語. 混種語 presumably includes all portmanteau words that combine morphemes from the last three categories. If the 語種不要 words were redistributed, the 和語 numbers would probably be higher.
Of course, 1971 was nearly 50 years ago, and the percentages of non-Chinese borrowings (both tokens and types) are probably higher today. There are also hair-splitting issues: should all kango be counted as borrowings? (E.g. 経済 was coined in Japan and then borrowed back into Chinese.) Is 麻雀 (= マージャン) a kango or a gairaigo? Are bound morphemes words or not? Leaving such matters aside, a very rough guess would be that about 40% of words in a big sample of normal text will be native, another 40% will be borrowings of one kind or another, and the last 20% will be numerals, signs, and a few words of unknown type. But check out the data yourself: you may interpret it differently from me.
Here's an especially interesting case of a widely used Japanese word, chongā チョンガー / 総角 ("bachelor") (usually written in katakana) that was borrowed from Korean 총각 chonggak, as related by Nathan Hopson:
The word appears to have entered Japanese in the 1910s. According to one source (an online slang dictionary), it was popularized first in the navy, and to a lesser extent the army.
The 1932 社会ユーモア・モダン語辞典 (Shakai yūmoa/modan-go jiten, "Dictionary of social humor and modern words") gives the following definition:
チョンガー(鮮) 未成年者、独身男 [chongā (Korean): a minor; an unmarried man]
The latter sense — of someone no longer a minor but still (improperly) a bachelor — appears to have largely overtaken the former.
That's fairly clear in the lyrics to a 1970 song by The Drifters, one of Japan's most famous comedy groups:
いやじゃありませんかチョンガーは
靴下三日は我慢して
お尻の破れを手でかくし
銭湯でついでにパンツ洗う

Isn't it awful being a chongā
Wearing the same socks for three days
Covering the hole in your pants seat with your hand
Washing your underwear at the public bath
A colleague in Korean language studies estimates that Korean vocabulary consists of about 50-60% Sino-Korean terms, 30-40% pure native Korean, and 5%+ loanwords from other languages. "Of course," she says, "this is my very rough guesstimate!" Judging from my own informal surveys, the amount of English in South Korean seems to be expanding all the time.
From Bob Ramsey:
The best information I have about Korean vocabulary comes from the numbers the Hangeul hakhoe 한글학회 ("Hangul Society") compiled for the dictionary Urimal keun sajeon 우리말 큰 사전 (Korean Dictionary), which Eomungak 어문각 ("Language") published in 4 volumes in 1991. (The numbers are listed at the end of the last volume.)
Words of native origin: 74,612
Sino-Korean: 85,527
(Modern) loanwords: 3,986
Of course, these numbers are already over 2 and a half decades old (so perhaps the Society now has more recent numbers). And since the modern (South) Korean attitude towards English is one of total availability, the number of loanwords in Korean now is surely much greater. But then again, many of the borrowings from English you see these days are little more than nonce creations with a very short half-life….
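For readers who want to set Ramsey's dictionary counts beside the percentage guesstimates above, here is a minimal sketch of the arithmetic. It assumes that the three categories he lists make up the whole total, which may not be how the dictionary itself tallies its entries; the variable names are mine.

```python
# Headword counts as reported by Bob Ramsey from the 1991 Urimal keun sajeon.
counts = {
    "native": 74_612,
    "Sino-Korean": 85_527,
    "modern loanwords": 3_986,
}

# Assumption: the total is simply the sum of the three listed categories.
total = sum(counts.values())

# Share of each category as a percentage of that total.
shares = {category: 100 * n / total for category, n in counts.items()}

for category, share in shares.items():
    print(f"{category}: {counts[category]:,} of {total:,} ({share:.1f}%)")
```

On these figures, Sino-Korean words come to a little over half of the combined total, native words to about 45%, and modern loanwords to under 3%.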
I asked Brian Spooner about the situation regarding loanwords in Persian, Arabic, and Turkish. He replied:
It's correct that all (or pretty much all) the Arabic words in Turkish vocabulary came through Persian. (Is "borrowing" the right term?) The Turks were Persianised before they got to the "Middle East," and the Ottomans used Persian as their language of administration. I wonder whether anyone could say anything about the Persian that got into Arabic before the Ottomans (like divan). The question for Persian is more complex. I suppose it starts with the fact that Arabic is to Persian what Greek and Latin are to English. But since half the population of modern Iran speaks Turkish at home (but writes only Persian) there is also some Turkish. The bigger problem is that written Persian, called New Persian (by orientalists) since the Arabic script was adopted in the 7th century, is the direct descendant of Achaemenian Persian (written in cuneiform in the 6th and 5th centuries BCE). But what about all the other Iranian languages of the Persianate world spoken by communities that came under Persian administration since then, several of which still continue to be spoken?
Gernot Windfuhr's 2009 edited volume, The Iranian Languages (800 odd pp. in the Routledge Language Family series) does not have a chapter on anything like borrowing, but it has an index entry on loan words–in Balochi, Khotanese, Khwarazmian, Middle Persian, Pamir languages, Parachi, Persian and Tajik, Sogdian, Tumshuqese and Wakhi. Skimming through it I see it says that Arabic constitutes about 50% of literary Persian and 25% of spoken (and of course one has to remember that in the 7th-8th centuries it was the professional scribes from the Sasanian Empire who created "New Persian" by switching from the Aramaic to the Arabic script and, after writing Arabic grammar, incorporated a lot of Arabic vocabulary into Persian, after which starting in the 9th century and the arrival of paper from China Persian became the standard language for writing throughout Central Asia). But elsewhere I see only wording like "a considerable number of…"
I should add that starting under Ataturk in the 1920s there has been a considerable effort to re-turkicise Turkish, to get rid of the Arabic-Persian vocabulary it inherited from the Ottomans, and they have got rid of a lot, but there is still much there that is not always immediately recognisable as non-Turkic in origin. Modern Greek also has a considerable amount of Persian in it (from Ottoman). It's amusing (given the historical struggle between Greeks and Iranians in the age of Marathon, and it was the Greeks of that period who taught us to call them Persians because they came from Pars, which is now Arabicised as Fars in southern Iran) that we accumulated loanwords from the Greeks and the Greeks have taken them from Persian.
I asked several Turkologists and Altaicists for their assessments of the proportion of Arabic and Persian in the languages they study.
Juha Janhunen:
I think it depends very much on whether we are basing the calculations on text corpora or on dictionary corpora, and whether we are talking of modern standard Turkish (with many neologisms derived from original Turkic roots) or older usage (with many more Arabic and Persian borrowings, in addition to Mongolisms etc.). The proportion of loanwords in a large dictionary corpus of Turkish – counting only word roots and not derivatives – must be very high. It will be interesting to hear what your numerical estimates are.
Peter Golden:
This is difficult to determine. So many words are in flux. Does one give a yanıt or a cevap? Sometimes usage is an indication of political orientation. Conservatives use more Ottoman or Ottoman-style vocabulary (with a strong Arabo-Persian element) and those left of center use more neologisms…although I would be hesitant to make this a hard and fast rule. It is merely an observation, one I first noticed when living in Turkey in 1967-early 1968, but one that seems to continue (although I am not in daily contact with Turkish-speakers).
From Mehmet Ölmez:
It is not so easy to answer this question.
From 1900 till 1933, from 1933 till 1980, from 1980 to today we have different answers.
It also depends on your social status or your political / religious preferences.
For example, for 'religious, pious' there was just one word in Turkish: dindar. Now mütedeyyin is also becoming familiar, because R. T. Erdoğan uses only the word mütedeyyin. TV speakers and some people follow him and use mütedeyyin instead of dindar.
When B. Ecevit was prime minister, öz Türkçe ("pure Turkish") words were more popular. In the last 15 years, Arabic / Ottoman Turkish words are used more often.
About the vocabulary:
There are very few Chinese words in Turkish that arrived in Anatolia together with the Turks, fewer than 10: for example sındı 'scissors' (dialect word) << jiǎndāo 剪刀; sır 'lacquer' << qī 漆. There are some Chinese words which arrived through Mongolian: mantı << mantou (?), tepsi 'dish, plate' << diezi. We have direct Mongolian words too, like ağa, serin, etc. But in daily life there are no more than ten Mongolian words (in Ottoman texts there are over 100 Mongolian words; see C. Schönig).
There are also a limited number of words from Sogdian: borç /borč/, kent, maybe genç, etc.
In Turkish, borrowings are mainly from the following languages: Arabic, Persian, Greek, and Latin. Latin words are according to others limited and mostly related to nautical terms. Of course, after 1800 there are more European (Latin) words in Turkish. Arabic words have been borrowed mostly in their Persian form.
Armenian words are popular mostly in dialects; in the standard language they are more limited. In southwest or west Anatolian dialects, it is difficult to come across an Armenian word. But in Çorum, Kayseri, Yozgat, Erzurum, Diyarbakır and similar cities you can find many Armenian words. In my dialect, from Nevşehir, it is very difficult to find Armenian words, but we have a huge number of Greek loanwords, especially for agricultural terms.
From other languages, such as Bulgarian, Serbian, Rumanian, Hungarian and Russian, there are also limited borrowings. Russian is especially familiar in and around Kars: kartol, istakan, etc.
The best study of Turkish language reform is Geoffrey Lewis' book, The Turkish Language Reform: A Catastrophic Success. Emmanuel Szurek (in Paris) also works and publishes on language reform, reform of personal names, etc.
About Old Uyghur words which were adopted during language reform: Jens Peter Laut, Die Uigurismen im Tarama Dergisi (1934).
About the structure of new words / neologisms, see K. Röhrborn, Interlinguale Angleichung der Lexik: Aspekte der Europäisierung des türkeitürkischen Wortschatzes.
Arabic words were borrowed mostly in their Persian form [through Persian].
As for Greek words (in modern Turkish), Yorgos Dedes has a full list, but it seems that he needs more time to bring it to publication.
About Latin words: Latin words are according to others limited and mostly related to nautical terms. Yes, normally we cannot encounter many Latin words inside Anatolia: kamara, kaptan, etc. There are also some Latin words very close to the Rumanian form: masa 'desk; table', Rumanian masă. Of course, as everyone knows, after 1800, there have been a lot of French borrowings in Turkish.
With Latin, I meant 'Latin languages' like (different) languages from Italy (Venetian or similar) and Rumanian language. My knowledge about loanwords from Latin languages depends on sources such as these:
The Lingua Franca in the Levant. Turkish Nautical Terms of Italian and Greek Origin, Henry & Renée Kahane, University of Illinois, Andreas Tietze, University of Istanbul, 1958 (reprinted at Istanbul 1988).
Meyer, Gustav (1893) Türkische Studien. Die griechischen und romanischen Bestandteile im Wortschatze des Osmanisch-Türkischen.
I have prepared for my own use an index to Meyer and reprinted it for other users: Meyer, Gustav (1998): Türkische Studien. Die griechischen und romanischen Bestandteile im Wortschatze des Osmanisch-Türkischen. Mit einem Geleitwort und einem Index herausgegeben von Mehmet Ölmez, Ankara.
From Marcel Erdal:
Speaking about loans, one should, I think, always exclusively consider the last source language, not the original language from which a loan may ultimately have come. Many (but by no means all) of the 'Arabic loans' of Turkish actually come from Persian, as shown by both phonetic and semantic evidence. In this sense, there are no Latin loans (mentioned by Mehmet) in Turkish. (Neo-Latinist creations in medicine, pharmacy, etc. are a topic by itself, but I would doubt whether Turkish scientists were very active in coining those.)
From Alexander Vovin:
In addition to what Peter, Juha, and Mehmet have already said: a straightforward answer is indeed difficult. First, loans from what languages and into which languages? Turkish and Uzbek would have more Arabic and Persian loans than Kazakh or Kirghiz, let alone Tuvan, Chuvash, and Yakut. Tuvan would have more Mongolic loans than any other Turkic language.
You are also asking about other languages. In Japanese, e.g., more Chinese loans are used in the written language than in the colloquial, and in a newspaper much more than in fiction. In the colloquial, the higher the register is, the more words of Chinese origin one encounters. The situation in Korean vis-a-vis Chinese and in Mongolian vis-a-vis Tibetan is somewhat similar. The situation might be further complicated in the case when languages are closely related. In Russian high register Eastern Slavic words are frequently replaced by their South Slavic cognates.
To conclude with my own words, I have written about the vain efforts of Recep Tayyip Erdogan, president of Turkey, to purify the Turkish language of borrowings from foreign languages, "Putting the kibosh on bosh" (6/18/17):
I'm afraid that, no matter how hard Erdogan or any other purist huffs and puffs, they will not be able to blow away the foreign building blocks which have been used in the construction of the house that is Turkish. I am the proud owner of the big Redhouse Turkish-English dictionary (I also have on the shelves of my library the Redhouse English-Turkish dictionary which is nearly as large — both of them are around twelve hundred pages in length). Looking through the pages of Redhouse, I see an enormous number of words from Persian, Arabic, Greek, French, Spanish, English, German, Albanian, Armenian, Hebrew, Russian, Polish, Hungarian, Bulgarian, Serbo-Croatian, Romany, Chinese, Japanese, and Malay (sorry if I missed something).
The same is true of other modern Turkic languages besides Anatolian Turkish. Henry G. Schwarz's An Uyghur-English Dictionary, about a thousand pages long, is full of words borrowed from Arabic and Persian. As much as 75% of the vocabulary of Uyghur is Perso-Arabic. During the 20th century Russian words came flooding in, and now Chinese is having a heavy impact.
If we go back to the earliest traceable stage of the Turkic lexicon, as collected in Gerard Clauson's An Etymological Dictionary of Pre-thirteenth-century Turkish (Clarendon, 1972) and other works of scholarship on early Turkic, we find words derived from many languages, including Indic (Sanskrit), Iranic (Sogdian, Khotanese), Mongolic / Khitan, Samoyedic, and Sinitic (here again I may have missed some). The language that served as the source of a number of Old Turkic words that intrigued me the most when I was perusing Clauson was Tocharian, since it may have been derived from the speech of the Bronze Age mummies of Eastern Central Asia and plays such an important role in discussions of the early development of Indo-European ("Early Indo-Europeans in Xinjiang" [11/19/08]).
Is there any language on earth today that is "pure" in the sense of having no lexical borrowings or other types of influences of any sort from other languages?
Reading
"The American Heritage Dictionary of the English Language, 5th edition" (11/14/12)
"Ur-etyma: how many are there?" (7/6/14)
"Sino-Sanskritic 'devil'" (12/11/18)
"Bahasa and the concept of 'National Language'" (3/14/13)
"Are Sanskrit and Chinese 'congenial languages'?" (9/9/13)
"Dung Times" (3/14/18)
"Sanskrit and Pseudo-Sanskrit Daoist incantations" (5/24/18) — with a bibliography of many additional readings
"Of jackal and hide and Old Sinitic reconstructions" (12/16/18) — and many other posts in that series
[Thanks to Linda Chance, Frank Chance, Haewon Cho, and William Hannas]