Buffer Sequence Points
======================

David Lovemore, Ravenbrook Limited, 2012-09-24


Introduction
------------

This is an analysis of the kind of memory barriers that might be
required for safety by the MPS "buffered allocation" design and the
"allocation point" protocol on modern multi-core CPUs.  It was a
response to [this
request](https://info.ravenbrook.com/mail/2012/09/14/22-13-58/0/) by
Richard Brooksby:

> //info.ravenbrook.com/project/mps/master/code/buffer.c#14 line 791 says:
>
>     /* .improve.memory-barrier: Memory barrier here on the DEC Alpha */
>     /* (and other relaxed memory order architectures). */
>
> Well, I suspect everything's pretty relaxed these days.
>
> Could I ask you to assess the situation for I3 and I6?  Do we need to
> insert some processor sequence points somewhere?


Analysis
--------

Memory barriers are needed at the compiler level and the CPU level.

A memory barrier for the compiler stops the compiler reordering loads
and stores across it.  On Windows these come in three flavours, the
`_ReadBarrier`, `_WriteBarrier` and `_ReadWriteBarrier` intrinsics; see
<http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx>.
Alternatively we can mark the relevant accesses as volatile.  My guess
is that all mutator accesses are relevant, because the objects need to be
written before they are committed, so I don't think marking accesses
volatile is going to work for us.
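
As a rough sketch of what a compiler-level barrier could look like, the
following wraps `_ReadWriteBarrier` in a macro.  The names
`COMPILER_BARRIER` and `fill_and_publish` are hypothetical and do not
appear in the MPS sources, and the GCC/Clang branch (an empty `asm` with a
"memory" clobber) is an assumption about how a non-Microsoft build might
get the same effect.

```c
#include <assert.h>
#include <stddef.h>

/* COMPILER_BARRIER is a hypothetical name, not an MPS macro. */
#if defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_BARRIER() _ReadWriteBarrier()
#elif defined(__GNUC__)
/* Empty asm with a "memory" clobber: emits no instruction, but the */
/* compiler may not move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#error "no compiler barrier known for this compiler"
#endif

/* Write an object, then publish the address just past it.  The barrier */
/* keeps the compiler from sinking the object stores below the store */
/* to *init.  It places no constraint on the CPU. */
void fill_and_publish(char *obj, size_t size, char **init)
{
  size_t i;

  assert(obj != NULL && init != NULL);
  for (i = 0; i < size; ++i)
    obj[i] = (char)i;
  COMPILER_BARRIER();
  *init = obj + size;
}
```

Unlike marking accesses volatile, the barrier covers every ordinary store
before it, so the mutator's writes to the object need no special
qualification.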

A compiler barrier does not stop the CPU reordering reads and writes.
The portable way to have a memory barrier on Windows is to use the
`MemoryBarrier` macro
<http://msdn.microsoft.com/en-us/library/windows/desktop/ms684208.aspx>,
which *both* inserts a memory barrier instruction sequence
*and* prevents the compiler reordering memory accesses across it.
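
For comparison, a full CPU-level barrier might be wrapped as below.  This
is only a sketch: `FULL_BARRIER` and `store_data_then_flag` are
hypothetical names, and using the GCC/Clang builtin `__sync_synchronize`
on non-Windows platforms is an assumption, not something the MPS does.

```c
#include <assert.h>
#include <stddef.h>

/* FULL_BARRIER is a hypothetical name, not an MPS macro. */
#if defined(_WIN32)
#include <windows.h>
#define FULL_BARRIER() MemoryBarrier()
#elif defined(__GNUC__)
/* GCC and Clang builtin that issues a full hardware memory barrier. */
#define FULL_BARRIER() __sync_synchronize()
#else
#error "no full memory barrier known for this platform"
#endif

/* Store the data, then raise the flag.  The barrier asks both the */
/* compiler and the CPU to make the data store visible before the */
/* flag store. */
void store_data_then_flag(int *data, int *flag, int value)
{
  assert(data != NULL && flag != NULL);
  *data = value;
  FULL_BARRIER();
  *flag = 1;
}
```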

Currently we stop all threads before we do the flip, which will force
all reads and writes to be in a consistent state between threads when we
do the synchronization, so I don't think we *currently* need a full CPU
synchronization.

So I don't think we need a memory barrier in `BufferFlip` where it is
marked "Memory Barrier here?" because the mutator is stopped at this
point and these fields aren't changing:

    buffer->initAtFlip = buffer->ap_s.init;
    /* Memory Barrier here? @@@@ */
    buffer->ap_s.limit = (Addr)0;

However in the mutator we ought to have a `_ReadWriteBarrier` just
before testing the limit:

    buffer->ap_s.init = buffer->ap_s.alloc;
    
    /* .improve.memory-barrier: Memory barrier here on the DEC Alpha */
    /* (and other relaxed memory order architectures). */
    /* .commit.after: If a flip occurs at this point, the pool will */
    /* see "initAtFlip" above the object, which is valid, so it will */
    /* be collected.  The commit must succeed when trip is called.  */
    /* The pointer "p" will have been fixed up.  (@@@@ Will it?) */
    /* .commit.trip: Trip the buffer if a flip has occurred. */
    if (buffer->ap_s.limit == 0)
      return BufferTrip(buffer, p, size);

It seems here that a compiler might be tempted to load the limit early,
before the store to init, to avoid having to wait for the load when it
wants to test it.  Again, I don't think we need a full CPU memory barrier
here, as the underlying memory will only get changed when threads are
stopped.
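
A minimal sketch of the commit path with a compiler barrier in place
might look like the following.  `SketchAP`, `sketch_trip` and
`sketch_commit` are simplified, hypothetical stand-ins for the allocation
point structure, `BufferTrip` and the commit code, and
`COMPILER_BARRIER` is an assumed wrapper around `_ReadWriteBarrier`; none
of these names are in the MPS sources.

```c
#include <assert.h>
#include <stddef.h>

typedef char *Addr;

/* Stand-in for the allocation point: only the fields commit touches. */
typedef struct SketchAP {
  Addr init;
  Addr alloc;
  Addr limit;
} SketchAP;

/* Hypothetical compiler-only barrier; emits no CPU instruction. */
#if defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_BARRIER() _ReadWriteBarrier()
#else
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#endif

/* Stand-in for BufferTrip: always reports that the commit failed. */
int sketch_trip(SketchAP *ap, Addr p, size_t size)
{
  (void)ap; (void)p; (void)size;
  return 0;
}

/* Returns 1 if the commit succeeded, otherwise the result of the trip. */
int sketch_commit(SketchAP *ap, Addr p, size_t size)
{
  assert(ap != NULL);
  ap->init = ap->alloc;
  /* Keep the compiler from hoisting the load of limit above the */
  /* store to init. */
  COMPILER_BARRIER();
  if (ap->limit == 0)
    return sketch_trip(ap, p, size);
  return 1;
}
```

The barrier sits between the store to `init` and the load of `limit`,
which is the one ordering the commit protocol relies on at this point.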

Note that we also need to update the macros in `mps.h`.

I guess that we have been getting away without the memory barriers so far.


Document History
----------------

- 2012-10-01  RB  Edited from email into document because it contains an
  important analysis that we need to reference publicly.

---
$Id: //info.ravenbrook.com/project/mps/doc/2012-09-24/buffer-sequence-points/index.txt#2 $