43. MPS Strategy

43.1. Introduction

.intro This is the design of collection strategy for the MPS.

.readership MPS developers.

43.2. Overview

.overview The MPS uses “strategy” code to make three decisions:

  • when to start a collection trace;

  • what to condemn;

  • how to schedule tracing work.

This document describes the current strategy, identifies some weaknesses in it, and outlines some possible future development directions.

43.3. Requirements

[TODO: source some from req.dylan, or do an up-to-date requirements analysis – NB 2013-03-25]

Garbage collection is a trade-off between time and space: it consumes some [CPU] time in order to save some [memory] space. Strategy shifts the balance point. A better strategy will take less time to produce more space. Examples of good strategy might include:

  • choosing segments to condemn which contain high proportions of dead objects;

  • starting a trace when a large number of objects have just died;

  • doing enough collection soon enough that the client program never suffers low-memory problems;

  • using otherwise-idle CPU resources for tracing.

Conversely, it would be bad strategy to do the reverse of each of these (condemning live objects; tracing when there’s very little garbage; not collecting enough; tracing when the client program is busy).

Abstracting from these notions, requirements on strategy would relate to:

  • Maximum pause time and other utilization metrics (for example, bounded mutator utilization, minimum mutator utilization, total MPS CPU usage);

  • Collecting enough garbage (for example: overall heap size; low-memory requirements).

  • Allowing client control (for example, client recommendations for collection timing or condemnation).

There are other possible strategy considerations which are so far outside the scope of current strategy and MPS design that this document disregards them. For example, either inferring or allowing the client to specify preferred relative object locations (“this object should be kept in the same cache line as that one”), to improve cache locality.

43.4. Generations

The largest part of the current MPS strategy implementation is the support for generational GC. Generations are only fully supported for AMC (and AMCZ) pools. See under “Non-AMC Pools”, below, for more information.

43.4.1. Data Structures

The fundamental structure of generational GC is the Chain, which describes a set of generations. A chain is created by client code calling mps_chain_create(), specifying the “size” and “mortality” for each generation. When creating an AMC pool, the client code must specify the chain which will control collections for that pool. The same chain may be used for multiple pools.

Each generation in a chain has a GenDesc structure, allocated in an array pointed to from the chain. Each AMC pool has a set of PoolGen structures, one per generation. The PoolGens for each generation point to the GenDesc and are linked together in a ring on the GenDesc. These structures are (solely?) used to gather information for strategy decisions.

The arena has a unique GenDesc structure, named topGen and described in comments as “the dynamic generation” (although in fact it is the least dynamic generation). Each AMC pool has one more PoolGen than there are GenDescs in the chain. The extra PoolGen refers to this topGen.

AMC segments have a segment descriptor amcSegStruct which is a GCSegStruct with two additional fields. One field segTypeP is a pointer either to the per-generation per-pool amcGen structure (a subclass of PoolGen), or to a nailboard (which then points to an amcGen). The other field new is a boolean used for keeping track of memory usage for strategy reasons (see below under ‘Accounting’). The amcGen is used for statistics (->segs) and forwarding buffers (->forward).

The AMC pool class only ever allocates a segment in order to fill a buffer: either the buffer for a client Allocation Point, or a forwarding buffer. In order to support generational collection, there is a subclass amcBuf of SegBuf, with a gen field (pointing to a amcGen). So in AMCBufferFill() the generation of the new segment can be determined.

When an AMC pool is created, these amcGen and amcBuf structures are all created, and the amcBuf->gen fields initialized so that the forwarding buffer of each amcGen knows that it belongs to the next “older” amcGen (apart from the “oldest” amcGen - that which refers to the topGen - whose forwarding buffer belongs to itself).

When copying an object in AMCFix(), the object’s current generation is determined (amcSegGen()), and the object is copied to that amcGen’s forwarding buffer, using the buffer protocol. Thus, objects are “promoted” up the chain of generations until they end up in the topGen, which is shared between all chains and all pools.

For statistics and reporting purposes, when STATISTICS is on, each AMC pool has an array of PageRetStruct`s, one per trace.  This structure has many :c:type:`Count fields, and is intended to help to assess AMC page retention code. See job001811.

43.4.2. Zones

All collections in the MPS start with condemnation of a complete ZoneSet. Each generation in each chain has a zoneset associated with it (chain->gen[N].zones); the condemned zoneset is the union of some number of generation’s zonesets. It is condemned by code in the chain system calling TraceCondemnZones(). This is either for all chains (ChainCondemnAll() called for every chain from traceCondemnAll()) or for some number of generations in a single chain (ChainCondemnAuto() called from TracePoll()). Note that the condemnation is of every automatic-pool segment in any zone in the zoneset. It is not limited to the segments actually associated with the condemned generation(s).

An attempt is made to use distinct zonesets for different generations. Whenever a segment is allocated (AMCBufferFill()), a SegPref is created containing the generation number (obtained from amcBuf->gen->pgen->nr) and passed to SegAlloc(). The arena keeps a zoneset for each generation number (up to VMArenaGenCount, defined in arenavm.c to be MPS_WORD_WIDTH/2), and a freeSet. The zoneset for each generation number starts out empty, and the freeSet starts out ZoneSetUNIV. When a segment is allocated with a SegPref with a generation number, an attempt is made to allocate it in the corresponding zoneset (pagesFindFreeInZones()). If the zoneset is empty, an attempt is made to allocate it in the freeSet zoneset. After it is allocated, the zones it occupies are removed from the freeSet and (if there’s a generation SegPref) added to the zoneset for that generation number.

Note that this zone placement code knows nothing of chains, generations, pool classes, etc. It is based solely on the generation number, so generations with the same number from different chains share a zoneset preference for the purpose of placing newly allocated segments. Combined with the fact that condemnation is per-zone, this effectively means that generations in distinct chains are collected together. One consequence of this is that we don’t have a very fine granularity of control over collection: a garbage collection of all chains together is triggered by the most eager chain. There’s no way for a library or other small part of a client program to arrange independent collection of a separate pool or chain.

When AMCBufferFill() gets the allocated segment back, it adds it to the zoneset associated with that generation in the pool’s controlling chain. Note that a chain’s per-generation zonesets, which represent the zones in which segments for that generation in that chain have been placed, are quite distinct from the arena-wide per-generation-number zonesets, which represent the zones in which segments for that generation number in any chain have been placed. The arena-wide per-generation-number zoneset vmArena->genZoneSet[N] is augmented in vmAllocComm(). The per-chain per-generation zoneset chain->gen[N].zones is augmented in PoolGenUpdateZones(). Neither kind of zoneset can ever shrink.

43.4.3. Accounting

  • gen[N].mortality

    • Specified by the client.

    • TODO: fill in how this is used.

  • gen[N].capacity

    • Specified by the client.

    • TODO: fill in how this is used.

  • amcSeg->new

    • TODO: fill this in

  • pgen->totalSize:

  • pgen->newSize:

    • incremented by AMCBufferFill() (when not ramping) and AMCRampEnd();

    • decremented by AMCWhiten(),

    • added up by GenDescNewSize(gen).

  • gen[N].proflow:

    • set to 1.0 by ChainCreate();

    • arena->topGen.proflow set to 0.0 by LocusInit(arena);

    • The value of this field is never used.

  • pgen->newSizeAtCreate:

    • set by traceCopySizes() (that is its purpose);

    • output in the TraceStartPoolGen telemetry event.

43.4.4. Ramps

The intended semantics of ramping are pretty simple. It allows the client to advise us of periods of large short-lived allocation on a particular AP. Stuff allocated using that AP during its “ramp” will probably be dead when the ramp finishes. How the MPS makes use of this advice is up to us, but for instance we might segregate those objects, collect them less enthusiastically during the ramp and then more enthusiastically soon after the ramp finishes. Ramps can nest.

A ramp is entered by calling:

mps_ap_alloc_pattern_begin(ap, mps_alloc_pattern_ramp())

or similar, and left in a similar way.

This is implemented on a per-pool basis, for AMC only (it’s ignored by the other automatic pools). PoolAMC throws away the identity of the AP specified by the client. The implementation is intended to work by changing the generational forwarding behaviour, so that there is a “ramp generation” - one of the regular AMC generations - which forwards to itself if collected during a ramp (instead of promoting to an older generation). It also tweaks the strategy calculation code, in a way with consequences I am documenting elsewhere.

Right now, the code sets this ramp generation to the last generation specified in the pool’s “chain”: it ordinarily forwards to the “after-ramp” generation, which is the “dynamic generation” (i.e. the least dynamic generation, i.e. the arena-wide “top generation”). My recollection, and some mentions in design/poolamc, suggests that the ramp generation used to be chosen differently from this.

So far, it doesn’t sound too ghastly, I guess, although the subversion of the generational system seems a little daft. Read on….

An AMC pool has a rampMode (which is really a state of a state machine), taking one of five values: OUTSIDE, BEGIN, RAMPING, FINISH, and COLLECTING (actually the enum values are called RampX for these X). We initialize in OUTSIDE. The pool also has a rampCount, which is the ramp nesting depth and is used to allow us to ignore ramp transitions other than the outermost. According to design/poolamc, there’s an invariant (in BEGIN or RAMPING, rampCount > 0; in COLLECTING or OUTSIDE, rampCount == 0), but this isn’t checked in AMCCheck() and in fact is false for COLLECTING (see below).

There is a small set of events causing state machine transitions:

  • entering an outermost ramp;

  • leaving an outermost ramp;

  • condemning any segment of a ramp generation (detected in AMCWhiten);

  • reclaiming any AMC segment.

Here’s pseudo-code for all the transition events:

Entering an outermost ramp:

if not FINISH, go to BEGIN.

Leaving an outermost ramp:

if RAMPING, go to FINISH. Otherwise, go to OUTSIDE.

Condemning a ramp generation segment:

If BEGIN, go to RAMPING and make the ramp generation forward to itself (detach the forwarding buffer and reset its generation). If FINISH, go to COLLECTING and make the ramp generation forward to the after-ramp generation.

Reclaiming any AMC segment:
If COLLECTING:

if rampCount > 0, go to BEGIN. Otherwise go to OUTSIDE.

Now, some deductions:

#. When OUTSIDE, the count is always zero, because (a) it starts that way, and the only ways to go OUTSIDE are (b) by leaving an outermost ramp (count goes to zero) or (c) by reclaiming when the count is zero.

#. When BEGIN, the count is never zero (consider the transitions to BEGIN and the transition to zero).

#. When RAMPING, the count is never zero (again consider transitions to RAMPING and the transition to zero).

#. When FINISH, the count can be anything (the transition to FINISH has zero count, but the Enter transition when FINISH can change that and then it can increment to any value).

#. When COLLECTING, the count can be anything (from the previous fact, and the transition to COLLECTING).

#. This is a bug!! The ramp generation is not always reset (to forward to the after-ramp generation). If we get into FINISH and then see another ramp before the next condemnation of the ramp generation, we will Enter followed by Leave. The Enter will keep us in FINISH, and the Leave will take us back to OUTSIDE, skipping the transition to the COLLECTING state which is what resets the ramp generation forwarding buffer. [TODO: check whether I made an issue and/or fixed it; NB 2013-06-04]

The simplest change to fix this is to change the behaviour of the Leave transition, which should only take us OUTSIDE if we are in BEGIN or COLLECTING. We should also update design/poolamc to tell the truth, and check the invariants, which will be these:

OUTSIDE => zero BEGIN => non-zero RAMPING => non-zero

A cleverer change might radically rearrange the state machine (e.g. reduce the number of states to three) but that would require closer design thought and should probably be postponed until we have a clearer overall strategy plan.

While I’m writing pseudo-code versions of ramp-related code, I should mention this other snippet, which is the only other code relating to ramping (these notes are useful when thinking about the broader strategy code):

In AMCBufferFill(), if we’re RAMPING, and filling the forwarding buffer of the ramp generation, and the ramp generation is the forwarding buffer’s generation, set amcSeg->new to FALSE. Otherwise, add the segment size to poolGen.newSize.

And since I’ve now mentioned the amcSeg->new flag, here are the only other uses of that:

  • it initializes as TRUE.

  • When leaving an outermost ramp, go through all the segments in the pool. Any non-white segment in the rampGen with new set to FALSE has its size added to poolGen->newSize and gets new set to TRUE.

  • in AMCWhiten(), if new is TRUE, the segment size is deducted from poolGen.newSize and new is set to FALSE.

43.4.5. Non-AMC Pools

The implementations of AMS, AWL, and LO pool classes are all aware of generations (this is necessary because all tracing is driven by the generational data structures described above), but do not make use of them. For LO and AWL, when a pool is created, a chain with a single generation is also created, with size and mortality parameters hard-wired into the pool-creation function (LOInit, AWLInit). For AMS, a chain is passed as a pool creation parameter into mps_pool_create(), but this chain must also have only a single generation (otherwise ResPARAM is returned).

Note that these chains are separate from any chain used by an AMC pool (except in the trivial case when a single-generation chain is used for both AMC and AMS). Note also that these pools do not use or point to the arena->topGen, which applies only to AMC.

Non-AMC pools have no support for ramps.

43.4.6. Starting a Trace

TODO: Why do we start a trace? How do we choose what to condemn?

43.4.7. Trace Progress

TODO: When do we do some tracing work? How much tracing work do we do?