How to correctly enter Chinese dashes in LaTeX?
Correctly entering Chinese dashes in LaTeX requires understanding the distinct typographical roles of the two dash forms involved, the em dash and the en dash, and ensuring that both are properly encoded and supported by the font. The Chinese dash (破折号) is the more important of the two. It serves functions similar to parentheses, commas for apposition, or a break in thought, and it is entered as two consecutive em dash characters (——, that is, U+2014 twice), which a suitable CJK font renders as one continuous line two full-width characters wide. In LaTeX, the standard method is to type these characters directly as Unicode when compiling with XeLaTeX or LuaLaTeX and a font such as Noto Serif CJK or Source Han Serif that contains the glyph. Note that `\setCJKmainfont{...}` is provided by `xeCJK` (which loads `fontspec`) or by the `ctex` bundle, not by `fontspec` itself, so one of those must be loaded. For the en dash (–, U+2013), used in ranges such as dates or number intervals, direct Unicode input works the same way under these engines. The approach is correct because it pairs real Unicode characters with a compatible CJK font: the characters keep their identity in both source and output, and the CJK package can apply its own line-breaking and spacing rules to them.
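As a minimal sketch of the setup described above, the following document types both dashes directly. It assumes XeLaTeX, the `xeCJK` package, and an installed font named `Noto Serif CJK SC`; the font name and the sample sentences are illustrative and should be replaced with whatever is available on your system.

```latex
% Compile with: xelatex example.tex
\documentclass{article}
\usepackage{xeCJK}  % loads fontspec and provides \setCJKmainfont
% Assumption: this font is installed; substitute any CJK font you have.
\setCJKmainfont{Noto Serif CJK SC}

\begin{document}
% Chinese dash: two consecutive U+2014 characters, no spaces around them.
他终于来了——虽然迟到了一个小时。

% Range written with an en dash (U+2013), typed directly.
2019–2023 年
\end{document}
```

Whether the two U+2014 glyphs join into one unbroken line depends on the font, so it is worth checking the output at high zoom after switching fonts.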
A common pitfall is relying on LaTeX's traditional ASCII ligatures, `---` and `--` for the em and en dash respectively, which are designed for English typography. Under XeLaTeX or LuaLaTeX with the default `fontspec` settings these ligatures do map to U+2014 and U+2013, but the source file contains only hyphens and the resulting glyph is typically taken from the Latin font rather than the CJK font, so its width, vertical position, and spacing may not match the surrounding Chinese text. That makes them contextually inappropriate for Chinese documents and a frequent source of inconsistent rendering. In a purely Chinese paragraph, the dash should generally have no adjacent spaces, in line with Chinese punctuation conventions. When Chinese and Latin scripts are mixed, small manual adjustments with `\kern` may occasionally improve the visual balance, though this is an advanced aesthetic refinement. For users constrained to pdfLaTeX, the `CJKutf8` package accepts UTF-8 input inside a `CJK` environment, but font management is more cumbersome and usually depends on specific font packages. In such environments one might define a command that inserts the correct characters from a CJK font, though this is less robust than the Unicode-engine workflow.
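To make the difference concrete, the sketch below sets the same sentence twice, once with directly typed U+2014 characters and once with the ASCII ligature. It again assumes XeLaTeX, `xeCJK`, and an installed `Noto Serif CJK SC` font. With default settings the second line is expected to take its dashes from the Latin font, so the two lines will likely differ in dash width and height.

```latex
% Compile with: xelatex compare.tex
\documentclass{article}
\usepackage{xeCJK}
% Assumption: this font is installed; substitute any CJK font you have.
\setCJKmainfont{Noto Serif CJK SC}

\begin{document}
% Direct Unicode input: two U+2014 characters, handled as CJK punctuation.
他终于来了——虽然迟到了一个小时。

% ASCII ligatures: six hyphens become two em dashes in the Latin font.
他终于来了------虽然迟到了一个小时。
\end{document}
```

Comparing the two output lines side by side is a quick way to confirm which font each dash actually comes from.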
The broader implication of entering these dashes correctly extends beyond syntax to document interoperability, searchability, and archival quality. Genuine Unicode characters are far more likely to be indexed correctly by search engines, to survive copy and paste into other applications, and to render accurately in PDF viewers or web conversions. For collaborative or academic documents, adhering to the standard avoids ambiguous hyphen-like symbols that could be misread. The recommended practice is therefore to use direct Unicode input consistently (via keyboard methods, editor shortcuts, or character pickers) in a document compiled with XeLaTeX or LuaLaTeX, and to specify a comprehensive CJK font family explicitly. The dashes are then logically encoded text elements rather than visual approximations, which satisfies both the typographical traditions of Chinese publishing and the technical requirements of digital typesetting.
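One way to keep the input consistent across a shared project is sketched below. It assumes the `ctex` bundle, whose `ctexart` class is intended to work under both XeLaTeX and LuaLaTeX, and an installed `Source Han Serif SC` font. The macro name `\zhdash` is hypothetical, introduced here only for illustration; it simply expands to the two literal U+2014 characters, so the output still contains real Unicode dashes.

```latex
% Compile with: xelatex main.tex   or   lualatex main.tex
\documentclass{ctexart}
% Assumption: this font is installed. The line can be dropped to let
% ctex pick its default fonts for the platform.
\setCJKmainfont{Source Han Serif SC}

% Hypothetical helper for authors who cannot easily type the character.
\newcommand{\zhdash}{——}

\begin{document}
他终于来了\zhdash{}虽然迟到了一个小时。
\end{document}
```

The trailing `{}` keeps the macro name from running into the following text, and the macro gives the project a single place to change if the dash convention is ever revised.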