On a computer, what is the difference between .doc and .docx?

The fundamental difference between the .doc and .docx extensions is the underlying file format: .doc is a proprietary binary container, while .docx is an open, XML-based format. The .doc extension denotes the legacy binary format that was the default in Microsoft Word 97 through Word 2003 (earlier Word releases used the same extension for older binary layouts). It stores document data (text, formatting, images, and other objects) in a complex binary structure that is not human-readable. For most of the format's life, other software vendors had to reverse-engineer their own parsers, which led to compatibility problems and inconsistent rendering between applications; Microsoft published the specification only in 2008. In contrast, .docx, introduced as the default in Microsoft Word 2007 as part of the move to Office Open XML (OOXML), is a set of XML files and other resources packaged in a single ZIP archive. Once the archive is unzipped, the core content and style definitions can be read as plain text, which changes how the document is constructed, stored, and processed.
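Because the two containers differ at the byte level, a file's real format can be checked without trusting its extension. The following is a minimal sketch, assuming the usual signatures: the OLE2 compound-file header for binary .doc and the ZIP local-file header for .docx. The function names and the in-memory sample are hypothetical, and the sample is only a stand-in for a real file, not a document Word would open.

```python
import io
import zipfile

# Signature of the OLE2 / Compound File Binary container used by legacy .doc
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature found at the start of a non-empty ZIP archive
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(data: bytes) -> str:
    """Guess the container type from the leading bytes, not the extension."""
    if data.startswith(OLE2_MAGIC):
        # Other legacy Office files (.xls, .ppt) share this container.
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                # The main document part of a .docx lives at this path.
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            return "unknown"
        return "zip"
    return "unknown"


def make_sample_archive() -> bytes:
    """Hypothetical stand-in: a ZIP holding only word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(sniff_word_format(make_sample_archive()))
    print(sniff_word_format(OLE2_MAGIC + bytes(504)))
```

For a file on disk, the same check applies to the bytes returned by reading it in binary mode. An "ole2" result only identifies the container, so it suggests a legacy Office file rather than proving it is a Word document.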

This architectural difference has practical consequences for interoperability, file integrity, and functionality. Because the .docx specification is open, other office suites and applications can parse and generate the files with higher fidelity. ZIP compression typically produces smaller files than the equivalent .doc, especially for text-heavy documents, since repetitive XML markup compresses well; embedded images that are already compressed, such as JPEG or PNG, shrink very little. The modular structure of a .docx file, in which styles, document properties, core content, and relationships are stored as separate XML files within the archive, also improves the prospects for data recovery. If a file becomes corrupted, it is sometimes possible to extract the intact parts from the archive, whereas a corrupted binary .doc file is usually much harder to salvage. The .docx format was also designed to support newer features natively, such as advanced OpenType typography, complex mathematical equations, and richer graphic effects, which are difficult or impossible to represent reliably in the older binary format.
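The salvage scenario can be illustrated with a short sketch that reads the main document part straight out of the archive and pulls the paragraph text from its XML, without Word installed. It assumes the standard WordprocessingML namespace and the usual part path word/document.xml. The helper names and the tiny in-memory sample are hypothetical, and a badly damaged archive may still fail to open with this approach.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every w:t element.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_archive() -> bytes:
    """Hypothetical stand-in: an archive with only the main document part."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


if __name__ == "__main__":
    for line in extract_paragraphs(make_sample_archive()):
        print(line)
```

The sketch ignores styles, headers, footnotes, and images, which live in other parts of the archive. Because each part is a separate file, text can be recovered this way even when those other parts are unreadable.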

From a compatibility standpoint, the transition created a lasting need for backward compatibility, which shapes both user experience and software requirements. Modern versions of Microsoft Word (2007 and later) can read and write .doc files, but saving to the old format may drop or flatten features the binary format cannot represent, and .doc files open in Compatibility Mode, which disables newer features. Conversely, Word 2003 and earlier cannot open .docx files natively; Microsoft provided a free Compatibility Pack to enable this, though elements unsupported by the older software may lose fidelity. With OOXML standardized as ISO/IEC 29500, .docx has become the de facto standard for word processing documents in professional and academic settings, while .doc persists mainly for legacy systems or for exchanging files with users still on very old software. The choice between them today is less about capability than about necessity: for any new document without specific legacy constraints, .docx is the better choice because of its smaller size, greater resilience, and broader ecosystem support.