Memory Barriers: a Hardware View for Software Hackers

URL: http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.07.23a.pdf

This paper gives a clear explanation of techniques for increasing cache utilization, and justifies the existence of memory barriers as a necessary evil required for good performance and scalability.

Its general structure is as follows:

  1. Presents the structure of a cache;
  2. Explains how cache-coherency protocols keep the different per-CPU caches coordinated with each other;
  3. Describes a technique called the "store buffer", which can be used to reduce the performance loss caused by waiting for invalidate-acknowledgement messages;
  4. Gives an example of why write memory barriers are needed -- store buffers can make stores become visible to other CPUs out of program order for better performance, so we need a way to ensure that certain critical orderings are preserved;
  5. Outlines another technique named the "invalidate queue", which lets invalidate-acknowledgement messages be sent more quickly;
  6. Gives a corresponding example of why read memory barriers are needed -- invalidate queues cause another kind of reordering, which read memory barriers can prevent.
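
The write-barrier and read-barrier examples in points 4 and 6 can be sketched roughly as follows. This is only an illustration in portable C++ and not code from the paper: the names foo, bar and run_once are made up for this note, and treating a C++ release fence as the "write barrier" and an acquire fence as the "read barrier" is an assumption about how the paper's hardware-level barriers map onto the C++ memory model.

```cpp
#include <atomic>
#include <thread>

// Two shared variables, as in a typical message-passing example:
// `a` is the payload and `b` is the "ready" flag (hypothetical names).
std::atomic<int> a{0};
std::atomic<int> b{0};

// Writer: publish the payload, then raise the flag.
void foo() {
    a.store(1, std::memory_order_relaxed);
    // Plays the role of a write memory barrier (assumed mapping):
    // the store to `a` is ordered before the store to `b`.
    std::atomic_thread_fence(std::memory_order_release);
    b.store(1, std::memory_order_relaxed);
}

// Reader: wait for the flag, then read the payload.
int bar() {
    while (b.load(std::memory_order_relaxed) == 0) {
        // spin until the writer raises the flag
    }
    // Plays the role of a read memory barrier (assumed mapping):
    // the load of `a` is ordered after the load of `b`.
    std::atomic_thread_fence(std::memory_order_acquire);
    return a.load(std::memory_order_relaxed);
}

// Runs one writer and one reader and returns what the reader saw in `a`.
int run_once() {
    a.store(0);
    b.store(0);
    int seen = -1;
    std::thread reader([&seen] { seen = bar(); });
    std::thread writer(foo);
    writer.join();
    reader.join();
    return seen;
}
```

With both fences in place, the reader is guaranteed to see a == 1 once it has seen b == 1. If either fence is removed, the C++ memory model no longer gives that guarantee, which is the software-visible counterpart of the reorderings that store buffers and invalidate queues introduce.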

The paper also includes many quizzes and discusses real implementations (e.g. ARM, IA64).

Pitfalls of Object Oriented Programming

URL: http://research.scee.net/files/presentations/gcapaustralia09/Pitfalls_of_Object_Oriented_Programming_GCAP_09.pdf

Because a single CPU cycle now takes far less time than a RAM access, making good use of the cache has become both critical and profitable.

In these slides, Tony Albrecht states that with modern hardware, excessive encapsulation in OO is BAD. According to the principles of OOP, an instantiated object will generally contain all the data associated with it. But when a calculation needs only a small portion of its fields, the unused fields are pulled into the cache along with the useful ones, which can cause a lot of avoidable cache misses.
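
A minimal sketch of the problem, assuming a made-up Particle type (it is not taken from the slides): summing one field over an array of whole objects touches every object's memory, while keeping that field in its own contiguous array touches only the data the loop needs. The names Particle, ParticleArrays, total_mass_aos and total_mass_soa are all hypothetical.

```cpp
#include <vector>

// Object-style layout: each object carries all of its fields together.
struct Particle {
    float x, y, z;      // position
    float vx, vy, vz;   // velocity
    float mass;         // the only field the loop below needs
    char  name[36];     // "cold" data, unused in the hot loop
};

// Array-of-structures traversal: each iteration reads 4 bytes of `mass`,
// but the whole object sits between one `mass` and the next, so most of
// every cache line that gets loaded is data the loop never uses.
float total_mass_aos(const std::vector<Particle>& particles) {
    float sum = 0.0f;
    for (const Particle& p : particles) {
        sum += p.mass;
    }
    return sum;
}

// Data-oriented layout: one contiguous array per field.
struct ParticleArrays {
    std::vector<float> mass;
    // positions, velocities, names, ... would live in their own arrays
};

// Structure-of-arrays traversal: the masses are packed back to back,
// so every loaded cache line is filled with values the loop uses.
float total_mass_soa(const ParticleArrays& particles) {
    float sum = 0.0f;
    for (float m : particles.mass) {
        sum += m;
    }
    return sum;
}
```

Both functions compute the same result; only the memory layout differs. How much the second layout actually helps depends on object size, cache-line size and access pattern, so it should be measured rather than assumed.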