.. mode: -*- rst -*-

Shield
======

:Tag: design.mps.shield
:Author: Richard Kistruck
:Date: 2006-12-19
:Status: incomplete guide
:Revision: $Id: //info.ravenbrook.com/project/mps/master/design/shield.txt#13 $
:Copyright: See `Copyright and License`_.
:Index terms: pair: shield; design


Introduction
------------

_`.intro`: This document contains a guide to the MPS Shield. There is
no historical initial design, but in its place there are some early
ideas and discussions: see `.ideas`_.

_`.readership`: Any MPS developer. Not confidential.


Overview
--------

_`.overview`: The MPS implements incremental garbage collection using
memory barriers implemented by a combination of hardware memory
protection and thread control.  The MPS needs *separate control* of
collector access and mutator (client) access to memory: the collector
must be able to incrementally scan objects, without the mutator being
able to see them yet.

Unfortunately common operating systems do not support different access
levels (protection maps) for different parts of the same process.

The MPS Shield is an abstraction that does extra work to overcome this
limitation, and give the rest of the MPS the illusion that we can
control collector and mutator access separately.


Interface
---------


Mutator access
..............

The shield provides ``ShieldRaise()`` and ``ShieldLower()`` to forbid
or permit the mutator access to object memory segments. Between these
two, a segment is said to have the shield *raised* (`.def.raised`_).

``void ShieldRaise(Arena arena, Seg seg, AccessSet mode)``

    Prevent the mutator accessing the memory segment in the specified
    mode (``AccessREAD``, ``AccessWRITE``, or both).

``void ShieldLower(Arena arena, Seg seg, AccessSet mode)``

    Allow the mutator to access the memory segment in the specified
    mode (``AccessREAD``, ``AccessWRITE``, or both).

If the mutator attempts an access that hits the shield, the MPS gets
an OS-specific hardware protection fault which reaches
``ArenaAccess()``, does whatever work is necessary, then lowers the
shield and returns to the mutator.

``ShieldRaise()`` and ``ShieldLower()`` do *not* nest.


Entering the shield
...................

The MPS can only gain exclusive access from *inside* the shield
(`.def.inside`_). To enter the shield, the MPS must call
``ShieldEnter()``, and to leave it, the MPS must call
``ShieldLeave()``.

``ShieldEnter()`` and ``ShieldLeave()`` are called by ``ArenaEnter()``
and ``ArenaLeave()`` so almost all of the MPS is is inside the
shield.


Collector access to segments
............................

When the MPS wants to access object memory segments from inside the
shield, it must wrap any accesses with a ``ShieldExpose()`` and
``ShieldCover()`` pair. These calls nest. After a call to
``ShieldExpose()`` a segment is said to be *exposed* until the last
nested call to ``ShieldCover()``. The shield arranges that the MPS can
access the memory while it is exposed.

A segment might for example be exposed during:

  - format-scan (when scanning);
  - format-skip (when marking grains in a non-moving fix);
  - format-isMoved and ``AddrCopy()`` (during a copying fix);
  - format-pad (during reclaim).

Note that there is no need to call ``ShieldExpose()`` when accessing
pool management memory such as bit tables. This is not object memory,
is never (legally) accessed by the mutator, and so is never shielded.

Similarly, a pool class that never raises the shield on its segments
need never expose them to gain access.


Collector access to the unprotectable
.....................................

When the MPS wants to access an unprotectable object from inside the
shield, it must wrap any accesses with a ``ShieldHold()`` and
``ShieldRelease()`` pair. This allows access to objects which cannot
be shielded by ``ShieldRaise()``, such as:

  - the stack and registers of mutator threads,
  - lockless allocation point structures,
  - areas of memory that can't be protected by operating system calls,
  - unprotectable roots.

``void ShieldHold(Arena arena)``

    Get exclusive access to the unprotectable.

``void ShieldRelease(Arena arena)``

    Declare that exclusive access is no longer needed.


Mechanism
---------

On common operating systems, the only way to allow the MPS access is
to allow access from the whole process, including the mutator. So
``ShieldExpose()`` will suspend all mutator threads to prevent any
mutator access, and so will ``ShieldRaise()`` on an unexposed segment.
The shield handles suspending and resuming threads, and so the rest of
the MPS does not need to worry about it.

The MPS can make multiple sequential, overlapping, or nested calls to
``ShieldExpose()`` on the same segment, as long as each is balanced by
a corresponding ``ShieldCover()`` before ``ShieldLeave()`` is called.
A usage count is maintained on each segment in ``seg->depth``. When
the usage count reaches zero, there is no longer any reason the
segment should be unprotected, and the shield may reinstate hardware
protection at any time.

However, as a performance-improving hysteresis, the shield defers
re-protection, maintaining a queue of segments that require attention
before mutator threads are resumed (`.impl.delay`_). While a segment
is in the queue, it has ``seg->queued`` set true.

This hysteresis allows the MPS to proceed with garbage collection
during a pause without actually setting hardware protection until it
returns to the mutator. This is particularly important on operating
systems where the protection is expensive and poorly implemented, such
as macOS.

The queue also ensures that no memory protection system calls will be
needed for incremental garbage collection if a complete collection
cycle occurs during one pause.


Implementation
--------------

_`.impl.delay`: The implementation of the shield avoids suspending
threads for as long as possible. When threads are suspended, it
maintains a queue of segments where the desired and actual protection
do not match. This queue is flushed on leaving the shield.


Definitions
...........

_`.def.raised`: A segment has the shield *raised* for an access mode
after a call to ``ShieldRaise()`` and before a call to
``ShieldLower()`` with that mode.

_`.def.exposed`: A segment is *exposed* after a call to
``ShieldExpose()`` and before a call to ``ShieldLower()``.

_`.def.synced`: A segment is *synced* if the prot and shield modes are
the same, and unsynced otherwise.

_`.def.depth`: The *depth* of a segment is defined as:

  | depth ≔ #exposes − #covers, where
  |    #exposes = the number of calls to ``ShieldExpose()`` on the segment
  |    #covers  = the number of calls to ``ShieldCover()`` on the segment

``ShieldCover()`` must not be called without a matching
``ShieldExpose()``, so this figure must always be non-negative.

_`.def.total.depth`: The total depth is the sum of the depth over all
segments.

_`.def.outside`: Being outside the shield is being between calls to
``ShieldLeave()`` and ``ShieldEnter()``, and similarly _`.def.inside`:
being inside the shield is being between calls to ``ShieldEnter()``
and ``ShieldLeave()``. [In a multi-threaded MPS this would be
per-thread. RB 2016-03-18]

_`.def.shielded`: A segment is shielded if the shield mode is
non-zero. [As set by ShieldRaise.]


Properties
..........

_`.prop.outside.running`: The mutator may not be suspended while outside
the shield.

_`.prop.mutator.access`: An attempt by the mutator to access shielded
memory must be pre-empted by a call to ``ArenaAccess()``.

_`.prop.inside.access`: Inside the shield the MPS must be able to access
all unshielded segments and all exposed segments.


Invariants
..........

_`.inv.outside.running`: The mutator is not suspended while outside the
shield.

_`.inv.unsynced.suspended`: If any segment is not synced, the mutator is
suspended.

_`.inv.unsynced.depth`: All unsynced segments have positive depth or are
in the queue.

_`.inv.outside.depth`: The total depth is zero while outside the shield.

_`.inv.prot.shield`: The prot mode is never more than the shield mode.

_`.inv.expose.depth`: An exposed segment's depth is greater than zero.

_`.inv.expose.prot`: An exposed segment is not protected in the mode
it was exposed with.


Proof Hints
...........

Hints at proofs of properties from invariants.

_`.proof.outside`: `.inv.outside.running`_ directly ensures
`.prop.outside.running`_.

_`.proof.sync`: As the depth of a segment cannot be negative

  | total depth = 0
  |   ⇒ for all segments, depth = 0
  |   ⇒ all segments are synced (by `.inv.unsynced.depth`_)

_`.proof.access`: If the mutator is running then all segments must be
synced (`.inv.unsynced.suspended`_). Which means that the hardware
protection (protection mode) must reflect the software protection
(shield mode). Hence all shielded memory will be hardware protected
while the mutator is running. This ensures `.prop.mutator.access`_.

_`.proof.inside`: `.inv.prot.shield`_ and `.inv.expose.prot`_ ensure
`.prop.inside.access`_.


Initial ideas
-------------

_`.ideas`: There never was an initial design document, but
[RB_1995-11-29]_ and [RB_1995-11-30]_ contain some initial ideas.


Improvement Ideas
-----------------


Mass exposure
.............

_`.improv.mass-expose`: If protection calls have a high overhead it might
be good to pre-emptively unprotect large ranges of memory when we
expose one segment.  With the current design this would mean
discovering adjacent shielded segments and adding them to the queue.
The collector should take advantage of this by preferentially scanning
exposed segments during a pause.


Segment independence
....................

_`.improv.noseg`: The shield is implemented in terms of segments, using
fields in the segment structure to represent its state. This forces us
to (for example) flush the shield queue when deleting a segment. The
shield could keep track of protection and shielding independently,
possibly allowing greater coalescing and more efficient and flexible
use of system calls (see `.improv.mass-expose`_).


Concurrent collection
.....................

_`.improv.concurrent`: The MPS currently does not collect
concurrently, however the only thing that makes it not-concurrent is a
critical point in the Shield abstraction where the MPS seeks to gain
privileged access to memory (usually in order to scan it). The
critical point is where ``ShieldExpose()`` in shield.c has to call
``ShieldHold()`` to preserve the shield invariants. This is the only
point in the MPS that prevents concurrency, and the rest of the MPS is
designed to support it.

The restriction could be removed if either:

 * the MPS could use a different set of protections to the mutator
   program

 * the mutator program uses a software barrier

The first one is tricky, and the second one just hasn't come up in any
implementation we've been asked to make yet. Given a VM, it could
happen, and the MPS would be concurrent.

So, I believe there's nothing fundamentally non-concurrent about the
MPS design. It's kind of waiting to happen.

(Originally written at <http://news.ycombinator.com/item?id=4524036>.)


Early Resume
............

_`.improv.resume`: There is a tradeoff between delaying flushing the
shield queue (preventing unnecessary protection and allowing us to
coalesce) and resuming mutator threads. We could resume threads
earlier under some circumstances, such as before reclaim (which does
not need to interact with the mutator). Basically, it might be worth
resuming the mutator early in a pause if we know that we're unlikely
to suspend it again (no more calls to ``ShieldRaise()`` or
``ShieldExpose()`` on shielded segments).


Expose modes
............

_`.improv.expose-modes`: Would it be a good idea for
``ShieldExpose()`` to take an ``AccessSet``? It might be good if we
didn't have to raise a write barrier unless we want to write. When
scanning (for instance), we may not need to write, so when scanning a
segment behind a write barrier we shouldn't have to call
``mprotect()``. That's a bit speculative: how often do we scan a
segment and not write to it. Alternatively, and more speculatively, we
could keep the write barrier up, handle the (possibly nested) trap and
*then* expose the shield. I'm just scraping around for ways to reduce
calls to ``mprotect()``.

Theoretically we can do this, but:

 1. We're mostly a moving collector so we'll almost always want to
    write to segments we scan.  That could change if we do more
    non-moving collection.

 2. The main cost of protection is changing it at all, not whether we
    change just read or write.  On macOS, the main cost seems to be the
    TLB flush, which affects wall-clock time of everything on the
    processor!


References
----------

.. [RB_1995-11-29] "Shield protocol for barriers"; Richard Brooksby;
   Harlequin; 1995-11-29;
   <https://info.ravenbrook.com/project/mps/doc/2002-06-18/obsolete-mminfo/mminfo/idea/shield/index.txt>.

.. [RB_1995-11-30] "Exegesis of Incremental Tracing"; Richard Brooksby;
   Harlequin; 1995-11-30;
   <https://info.ravenbrook.com/project/mps/mail/1995/11/30/15-07/0.txt>.


Document History
----------------

- 2006-12-19 Richard Kistruck. Created: Guide, plus links to initial
  ideas.

- 2007-01-04 Richard Kistruck. Minor text changes for clarity.

- 2007-01-12 Richard Kistruck. ``ShieldEnter()`` and ``ShieldLeave()``
  are called by ``ArenaEnter()`` and ``ArenaLeave()`` respectively.

- 2013-05-24 GDR_ Converted to reStructuredText.

- 2016-03-17 RB_ Updated for dynamic queueing and general code tidying
  that has removed complaints.

- 2016-03-19 RB_ Updated for separate queued flag on segments, changes
  of invariants, cross-references, and ideas for future improvement.

.. _GDR: https://www.ravenbrook.com/consultants/gdr/

.. _RB: https://www.ravenbrook.com/consultants/rb/


Copyright and License
---------------------

Copyright © 2013–2020 `Ravenbrook Limited <https://www.ravenbrook.com/>`_.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.