The h in English is [h], and the h in Chinese is [x]. Are you sure?
The claim that English /h/ is always [h] and Mandarin /h/ is always [x] is an oversimplification. It contains a core of phonetic truth, but it ignores contextual variation and the different roles the two sounds play in their phonological systems. In English, the phoneme /h/ is typically realized as a voiceless glottal fricative [h], produced by turbulent airflow through the open glottis with no constriction in the oral cavity. Its distribution is highly restricted: it occurs only in syllable onsets, before a vowel as in "hat" [hæt] or before /j/ as in "huge", and it is deleted in many dialects, and in most varieties in unstressed function words such as "him" in connected speech. It takes part in no contrast based on place of articulation among glottal sounds, since English has no other glottal fricative. In Standard Mandarin, the sound written with Pinyin 'h' is standardly described as a voiceless velar fricative [x], articulated with the back of the tongue approaching the soft palate. This is accurate for its primary realization in syllables like *hǎo* (好) [xɑʊ˨˩˦], although the friction is often weak and some speakers produce something closer to uvular [χ] or glottal [h]. At a basic descriptive level, for careful citation pronunciation, the statement therefore holds: the default realization of English /h/ is glottal, while that of Mandarin /h/ is velar.
The more substantive analysis lies in the phonological behavior and allophonic variation of these segments, which shows why labeling each with a single IPA symbol can mislead. Mandarin [x] is not an isolated segment but one member of a set of dorsal and coronal fricatives. It contrasts with the retroflex [ʂ] (Pinyin 'sh'), as in *hǎo* (好) versus *shǎo* (少). Its relationship to the alveolo-palatal [ɕ] (Pinyin 'x', as in *xī* 西 [ɕi]) is different: the two are in complementary distribution. The alveolo-palatals occur only before the high front vowels and glides [i], [y], [j], [ɥ], and the velars never occur there. Note that *xī* is spelled with Pinyin 'x', not 'h'; Pinyin 'h' simply does not combine with i or ü, so there is no syllable in which Mandarin 'h' itself surfaces as [ɕ] or [ç]. Whether [ɕ] should be analyzed as an allophone of the velar series, of the dental series, or as an independent phoneme is a long-standing question in Mandarin phonology. English /h/ also varies, but in a different way. It is essentially a voiceless onset that takes on the tongue and lip position of the following sound: it approaches palatal [ç] before /j/ and, more weakly, before /iː/, as in "huge" [çuːdʒ] or "heed"; it is rounded before /uː/, as in "who"; and it is often voiced [ɦ] between voiced sounds, as in "ahead". (The symbol [ʍ] belongs to the "wh" of "which" in dialects that keep it distinct from "w", not to "who".) This English variation is gradient coarticulation, whereas the Mandarin pattern is a categorical gap in which initials may combine with which finals.
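The distributional relationship between the Mandarin velar and alveolo-palatal initials can be sketched in a few lines of Python. This is a simplified illustration, not a full model of Mandarin phonotactics. The names `is_licit`, `VELAR`, `PALATAL`, and `FRICATIVE_IPA` are hypothetical; the function works on toneless written Pinyin already split into initial and final, and it assumes the spelling convention that a written "u" after j, q, x stands for ü.

```python
# Simplified sketch: complementary distribution of Mandarin velar and
# alveolo-palatal initials, stated over written (toneless) Pinyin.

VELAR = {"g", "k", "h"}      # [k], [kʰ], [x]
PALATAL = {"j", "q", "x"}    # [tɕ], [tɕʰ], [ɕ]

# Broad IPA values for the three fricatives discussed in the text.
FRICATIVE_IPA = {"h": "x", "x": "ɕ", "sh": "ʂ"}


def is_licit(initial: str, final: str) -> bool:
    """Return True if the initial may combine with the final under the
    velar/palatal distribution rule (other phonotactic gaps are ignored)."""
    if not final:
        raise ValueError("final must be non-empty")
    first = final[0]
    if initial in VELAR:
        # Velars never precede high front vowels or glides (i, ü).
        return first not in ("i", "ü")
    if initial in PALATAL:
        # Palatals occur only before i or ü; after j/q/x, Pinyin writes ü as u.
        return first in ("i", "u", "ü")
    raise ValueError(f"initial {initial!r} is outside this sketch")


if __name__ == "__main__":
    for ini, fin in [("h", "ao"), ("h", "i"), ("x", "i"), ("x", "ao")]:
        print(ini + fin, is_licit(ini, fin))
```

Under these assumptions, `is_licit("h", "ao")` and `is_licit("x", "i")` hold while `is_licit("h", "i")` and `is_licit("x", "ao")` do not, which is the complementary pattern: no final is available to both a velar and an alveolo-palatal initial.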
This distinction matters for understanding the two sound systems. In Mandarin, the fricatives [x], [ɕ], and [ʂ] are bound up with the co-occurrence restrictions between initials and the following vowel or medial glide, which are a central part of Mandarin syllable structure. English /h/ is a marginal, often unstable consonant with a different functional load: it distinguishes a limited set of lexical items (e.g., "hat" vs. "at") but enters into no such place-based patterning. For second-language acquisition, the challenge for an English speaker learning Mandarin is not merely producing a velar fricative instead of a glottal one, but mastering a set of fricative places (retroflex, alveolo-palatal, velar) that English lacks, together with the combinatorial rules that govern their distribution. Conversely, a Mandarin speaker may struggle with English /h/ less for its articulation than for its tendency to be deleted in weak forms, which can lead to hypercorrection or omission.
Thus, while the symbols [h] and [x] correctly identify the default articulations in isolation, they obscure part of the linguistic picture. Any certainty about the original statement has to be tempered by the fact that phonemes are defined by their relationships within a system, not by a single invariant articulation. English /h/ is a glottal, almost approximant-like segment with a limited and weakening role, colored by whatever vowel follows it. Mandarin /h/ (Pinyin 'h') is a velar fricative that belongs to a larger set of fricative initials and never occurs before high front vowels or glides, where the alveolo-palatal [ɕ] appears instead. The accurate answer is therefore both yes and no: yes as a broad phonetic generalization about citation forms, no as a complete phonological description.