This paper gives a clear explanation of techniques for increasing cache utilization, and justifies the existence of memory barriers as a necessary evil required for good performance and scalability.
Its general structure is as follows:
- Presents the structure of a cache;
- Explains how cache-coherency protocols ensure that the different per-CPU caches coordinate with each other;
- Describes a technique called the "store buffer", which eases the performance loss a CPU would otherwise suffer while waiting for invalidate-acknowledge messages;
- Gives an example of why write memory barriers are needed: store buffers effectively reorder stores to achieve better performance, so we need a way to guarantee that certain critical orderings are preserved;
- Outlines another technique, the "invalidate queue", which lets a CPU send invalidate-acknowledge messages more quickly;
- Gives a corresponding example of why read memory barriers are needed: invalidate queues cause another kind of reordering, which read memory barriers prevent.
The paper also poses many quizzes and discusses real implementations (e.g. ARM, IA64).