How to understand the six memory orders of C++11?
Understanding the six memory orders introduced in C++11 requires recognizing that they are not performance hints but precise, low-level constraints on how memory operations become visible across threads. By controlling synchronization independently of mutual exclusion locks, they are what make lock-free programming possible. These orders—`memory_order_relaxed`, `consume`, `acquire`, `release`, `acq_rel`, and `seq_cst`—form a hierarchy of guarantees that trades off programmer control against reasoning complexity. The core mechanism revolves around atomic operations and the relationships they establish: a *release* operation on one atomic variable makes preceding writes visible to a thread that subsequently performs an *acquire* operation on the same atomic variable. This "synchronizes-with" relationship is the conduit for propagating non-atomic data safely between threads, preventing hazardous data races. The default `memory_order_seq_cst` (sequential consistency) provides the strongest, most intuitive guarantee, where all operations appear to execute in a single total order consistent with program order, but it often incurs a performance cost on weakly-ordered architectures. The other orders allow one to selectively relax these guarantees where the full barrier is unnecessary.
The practical differentiation lies in the specific constraints each order imposes on instruction reordering. `memory_order_relaxed` provides only atomicity and modification order consistency for the variable itself, with no synchronization or ordering constraints on other memory accesses; it is suitable for simple counters where only the atomicity matters. `memory_order_acquire` (typically used on load operations) prevents any memory operation after it from being reordered before it; once the load observes a value written by a release store, subsequent reads see all writes that preceded that store in the releasing thread. Conversely, `memory_order_release` (used on stores) prevents preceding memory operations from being reordered after it, ensuring all prior writes are visible when the store becomes observable. The pair `release`/`acquire` creates a critical synchronization point on a specific atomic variable. `memory_order_consume` is a specialized, weaker form of acquire that orders only operations dependent on the loaded value, but due to implementation difficulties and unclear benefits, its use is generally discouraged. `memory_order_acq_rel` combines both effects, used for read-modify-write operations like `fetch_add` that need to both observe a prior release and act as a release for subsequent operations.
The implications of choosing an order are profound for both correctness and performance. Incorrectly using `relaxed` where a *happens-before* relationship is needed leads to subtle data races and undefined behavior, while overusing `seq_cst` can forfeit significant performance gains on modern processors. Effective use demands analyzing the specific synchronization requirements of the algorithm: for instance, a spinlock mutex implementation would use an `acquire` on lock success and a `release` on unlock, synchronizing the critical section's memory accesses without a full sequential consistency fence. The complexity arises because with weaker orders, the mental model of interleaved operations breaks down; one must reason directly in terms of the C++ memory model's "happens-before," "synchronizes-with," and "modification order" graphs. This makes the weaker orders error-prone and best left to experts, justified only where measured performance improvements outweigh the severe maintenance and verification cost. Ultimately, these memory orders provide the tools to write efficient, portable concurrent code, but they shift the burden of enforcing correctness from the system to the programmer, requiring a rigorous and precise understanding of concurrent memory interactions.