The pronunciation of the Chinese character "二" is obviously very different from er, but why...
The pronunciation of the Chinese character "二" (meaning 'two') as *èr* in Standard Mandarin is a canonical example of a rhotacized syllable, a defining feature of the Beijing dialect that forms the phonological basis of the standard language. The discrepancy between its spelling in *pinyin* as "er" and its actual pronunciation arises because "er" represents a single rhotacized (r-colored) vowel, not a vowel-consonant sequence as an English speaker might interpret it. This sound, commonly transcribed in the International Phonetic Alphabet as [ɚ], involves the tongue tip curling back towards the hard palate, which gives the vowel its distinctive "r-colored" quality. In the falling tone of "二", many Beijing speakers also start from a more open vowel, roughly [ɐɚ̯], which is one reason the syllable can sound closer to "ar" than to "er". The *pinyin* spelling is thus a logical, if initially misleading, romanization of this single phonological unit, not an instruction to produce separate 'e' and 'r' sounds. The question at its core concerns the historical and dialectal evolution of this specific sound, which is not a recent innovation but a long-established characteristic of the northern Mandarin dialects.
The divergence from a hypothetical or etymological pronunciation closer to a simple vowel sound like *è* is primarily a result of historical sound change. In Middle Chinese, the ancestor of modern Mandarin dialects, the word for "two" was pronounced with a palatal nasal initial, reconstructed as something like *nyijH*. Through a series of regular phonetic shifts over centuries, including the denasalization of the initial and the loss of the final glide, the vowel in northern dialects underwent centralization and, critically, rhotacization. This process, in which the vowel assimilates to an adjacent retroflex feature or itself becomes retroflexed, is particularly robust in the Beijing dialect. The modern *èr* is therefore the direct, regular outcome of these sound changes applied to that lexical item within that dialectal lineage. Other Mandarin dialects may preserve different traces; for instance, in some southwestern varieties, it is pronounced closer to [a] or [ɛ], which shows how specific this outcome is to the Beijing standard.
The implications of this pronunciation are significant for both linguistics and language pedagogy. For learners of Mandarin, the "er" sound presents a notable articulatory challenge, as it has no direct equivalent in many other languages. Learners commonly substitute a plain vowel like [ɜː] or [ə] followed by a separate [r]-like sound, which is phonetically inaccurate. Mastery of this rhotacized vowel is crucial not only for the word "二" itself but also for its pervasive role in erhua (儿化), a morphological and lexical process in which "儿" is suffixed to a word, typically causing r-coloring of the preceding syllable, as in *nàr* (那儿, 'there'). This makes it a cornerstone of authentic Mandarin phonology. Furthermore, from a sociolinguistic perspective, the degree and accuracy of rhotacization can serve as a subtle marker of regional accent, educational background, or even formality, since over- or under-use of erhua relative to the Beijing standard can signal a speaker's dialectal origins.
Consequently, the pronunciation of "二" is not an arbitrary oddity but a systematic feature with clear historical origins and substantial contemporary linguistic weight. Its difference from a naive reading of its *pinyin* form underscores the necessity of treating *pinyin* as a romanization system with its own conventions, rather than a direct letter-to-sound cipher for English. The sound's stability in the standard language and its integral role in a major morphological process like erhua ensure its continued centrality. Any analysis of Mandarin phonology must account for this rhotacized vowel as a fundamental element of the sound system, one that originates in specific historical sound changes and now carries significant functional and social meaning.