Memory Barriers: a Hardware View for Software Hackers

URL: http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf

This paper clearly explains the hardware techniques used to improve cache utilization, and justifies memory barriers as a necessary evil required for good performance and scalability.

Its general structure is as follows:

Presents the structure of a cache;
Explains how cache-coherency protocols ensure that different per-CPU caches coordinate with each other;
Describes a technique called the "store buffer", which reduces the time a CPU stalls while waiting for invalidate-acknowledgement messages;
Gives an example of why write memory barriers are needed: store buffers improve performance, but they can make a CPU's stores visible to other CPUs out of program order, so software needs a way to enforce the orderings that matter;
Outlines another technique, the "invalidate queue", which lets a CPU acknowledge invalidate messages sooner by queuing them instead of applying them to its cache first;
Gives a corresponding example of why read memory barriers are needed: invalidate queues cause another kind of reordering, in which a CPU can read a stale cache line whose invalidation is still queued, and read memory barriers prevent this.
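
The two barrier examples can be sketched together as a small C program. This is only an illustration under stated assumptions, not code from the paper: it uses C11 fences as stand-ins for the write and read memory barriers (a release fence in the writer, an acquire fence in the reader), and the names a, b, writer, reader and run_once are hypothetical. C11 fences are not exact equivalents of a hardware write-only or read-only barrier, but they enforce the ordering this example relies on.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int a; /* payload written first (hypothetical name) */
static atomic_int b; /* flag written second (hypothetical name) */

/* Writer: store the payload, then the flag. The release fence stands in
 * for the write barrier: the store to `a` must be visible before `b`. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    return NULL;
}

/* Reader: wait for the flag, then read the payload. The acquire fence
 * stands in for the read barrier: the load of `a` cannot be satisfied
 * with a value older than the one published before `b` was set. */
static void *reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)
        ; /* spin until the flag is set */
    atomic_thread_fence(memory_order_acquire);
    *seen = atomic_load_explicit(&a, memory_order_relaxed);
    return NULL;
}

/* Runs one writer/reader pair and returns the value of `a` that the
 * reader observed, or -1 if a thread could not be created. */
int run_once(void)
{
    pthread_t w, r;
    int seen = -1;

    atomic_store(&a, 0);
    atomic_store(&b, 0);

    if (pthread_create(&r, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        atomic_store(&b, 1); /* release the spinning reader before bailing out */
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With both fences in place, run_once() returns 1 whenever both threads start. Without them, a weakly ordered CPU could let the reader see b == 1 while still reading the old a == 0, which is the failure the store-buffer and invalidate-queue examples describe. Build with a C11 compiler and link against pthreads (for example with -pthread).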

The paper also includes many quizzes and discusses how real CPU architectures (e.g. ARM, IA64) implement memory barriers.