MPS Wiki: Garbage Collection

This wiki article contains incomplete and informal notes about the MPS, the precursor to more formal documentation. Not confidential. Readership: MPS users and developers.

Notes on getting started with GC -- Garbage Collection.

Introduction

The essential MPS concepts for GC are:

Format↓: lets the MPS ask the client a question about an object;
Root↓: tell the MPS which things (eg. stack, globals) are always alive.

The advanced MPS concepts for GC are:

Allocation Point (see user's guide): reserve memory, build a new object, and commit it to MPS management.
Protection and incremental collection↓: MPS sets operating-system read- or write-barriers on certain pages. When client tries to use this protected memory, the MPS interrupts and does just enough garbage-collection to allow the client to see this page.

Format

The client can choose how to format its objects, but the MPS will sometimes need to ask the client some questions about an object, such as:

how big is it? (so that if it is dead the MPS collects only that object)
what pointers does it contain? (so the MPS can mark referred-to objects as still alive)

For each type of question, the client must provide a function that gives the answer. These functions are called "format methods", and are collected together into a "format" (mps_fmt_t). Usually the client developer writes these functions, calls "mps_fmt_create_..." to collect them into a format, and passes this format to the MPS as an argument to mps_pool_create().

A client's format code is 'special'. It is called at special times (eg. when handling an interrupt), and is quite restricted in what it is allowed to do. Format code is often on the critical path for memory management operations. The full requirement for a format is complicated; see documentation in the Reference Manual. But here's a simplified overview:

No format is required for an object in a non-scanned, non-collectable pool -- the MPS never needs to know the internal details of objects in these pools.

For collectable and/or scannable pools, the client's format code generally needs to support three types of object:

the client's own initialized objects;
on behalf of the MPS, a "forwarding object";
on behalf of the MPS, a "padding object".

A forwarding object is a placeholder that the MPS uses when it has moved the object that used to be there. ('Normal' client code never sees these forwarding objects; only client format code does).

A padding object is a 'dummy' object that the MPS uses when there is not enough room (eg. at the end of a memory page) to put a real client object.

The client must be able to distinguish between these three types of object. To guarantee this, client code MUST initialize memory obtained from the MPS (by mps_alloc or mps_reserve) BEFORE it makes any other call into the MPS from that thread (including the call to mps_commit). Here, "initialize" means at least enough initialization that the client's format code handles the object correctly, including whether it is a client object, a forwarding object, or a padding object. [See also protocol.mps.format.rel.ap. RHSK 2006-06-02]

The most important format methods are:

Indicate the size of a collectable client object:

Skip (see mps_fmt_skip_t): client must tell the MPS how big this object is.

List all references from this object to (other) collectable objects:

Scan (see mps_fmt_scan_t): client must 'scan' the object for pointers, and tell the MPS to 'fix' each pointer it finds;

Move this client object (obsolete, see below):

Copy (obsolete) (see mps_fmt_copy_t): client must copy the object to a new location.

Create/use a forwarding object (used by an incremental moving pool after moving a client object):

Fwd (see mps_fmt_fwd_t): client must create a forwarding object here, and store the supplied forwarding pointer in it;
IsFwd (see mps_fmt_isfwd_t): client must test whether this is a forwarding object (created by the Fwd method), and if so return the forwarding pointer stored in it. (This is how the MPS remembers where a moved object was moved to).

Create a padding object:

Pad (see mps_fmt_pad_t): create a padding object here, please.

Format Methods: the Copy method is obsolete

The Copy method is obsolete, and is never called. The MPS just uses mps_lib_memcpy() to copy the bytes to the new location. It knows how many bytes to copy by using the Skip method.

Historical notes: The original justification for Copy was:

client (knowing the object's data fields) might intelligently copy less than all the bytes;
client might alter the data, eg. fixing-up internal pointers.

... but this didn't happen with actual clients, and the interface to the copy method didn't pass the skip-size (which the MPS has already determined by calling Skip) and so didn't allow an efficient copy method to be written. So it was dropped, in 1998 or earlier. The only place this memcpy currently occurs is in poolamc.c, in AMCFix() (and now AMCHeaderFix() too).

There are notes on this change at design/trace/#fix.nocopy.

Format Variants

There are several different ways to package-up the format methods -- these different ways are called 'variants'. MPS currently supports four format variants. The creation functions for these are:

It's a bit messy -- sorry.

Roots

//info.ravenbrook.com/project/mps/import/2001-09-27/mminfo/doc/impl/h/mps/index.txt#1
   307 /* Root Creation and Destruction
   308  *
   309  * mps_root_create creates the most general type of root: a scanning
   310  * function with two closure elements, p and s.
   311  *
   312  * mps_root_create_table creates a root from a block of memory which
   313  * contains a contiguous array of references.  Pages containing table
   314  * roots may be protected by the MPS if MPS_RM_PROT is set in the
   315  * root mode.
   316  *
   317  * mps_root_create_fmt creates a root from a block of memory which
   318  * contains formatted objects.  Pages containing fmt roots may be
   319  * protected by the MPS if MPS_RM_PROT is set in the root mode.
   320  *
   321  * mps_root_create_reg creates a root which will be used to scan
   322  * the registers of a thread.  (Often it will scan the stack, too.)
   323  */

You must designate the permanent root-points of your object graph as an MPS root, for example with one or more calls to mps_root_create_table. All your MPS objects (in collectable pools) must be reachable from these permanent roots, to prevent them from getting collected.

Usually, you also use mps_root_create_reg to designate your local variables (stack and registers) as an ambiguous root. This ensures the objects you are working on are preserved even if you temporarily remove them from the graph reachable from your permanent roots. Because this is an ambiguous root, it also pins the objects you are working on.

Protection and incremental collection

These notes on Protection and incremental collection were written by RHSK between 2006-06-15 and 2006-06-15, following research and discussion with RB and DRJ.

The MPS can set operating-system read- or write-barriers on certain pages. When client tries to use this protected memory, the MPS interrupts and does just enough garbage-collection to allow the client to see this page.

On Windows, currently, these barriers are implemented using SEH -- Structured Exception Handling.

Some advanced, incomplete notes on protection and foreign function calls:

//info.ravenbrook.com/mail/2006/05/17/11-03-54/0.txt
"chat with DRJ RB RHSK about work for Configura, and other MPS issues"
...
MPS should be part of runtime for a language.
One necessary part is SEH:
   - required once per thread;
   - AND required on foreign function calls inward.
Otherwise,
   1. some foreign C++ might steal our exception;
   2. this might be a thread we've never seen before
(In fact, client need to have a bit standard of code that works out  
if the MPS has seen this thread before.  It must call this code on  
every inbound call from foreign code.)  DRJ: "Dylan does this".
There's a bunch of assumes about compiled code, ABI etc, play nice  
with sighandler etc.
Might be poss for GR to geneerate code that doesn't use SEH if he  
wasn't using MPS.

MPS has a requirement that you use this extremely poorly documented  
SEH interface.
2 improvements:
   1. better documentation for this
   2. try to make the MPS *not* require SEH

job001360 -- SEH

B. Document History

  2006-06-02  RHSK  Created.
  2006-06-02  RHSK  Introduction to MPS Formats
  2006-06-06  RHSK  Formats: clarify explanation of methods, copy method is 
                    obsolete, mention format variants.
  2006-06-14  RHSK  Roots -- User's Guide: bare bones: root variants.
  2006-06-15  RHSK  Brief notes on Protection and incremental collection.
  2006-06-22  RHSK  Move Allocation Points User Guide to its own article; 
                    add note to Root.

C. Copyright and License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Redistributions in any form must be accompanied by information on how to obtain complete source code for the this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.