4. The critical path through the MPS

single: critical path single: path; critical single: Memory Pool System; critical path

4.1. Introduction

The critical path is a key concept in the design of the Memory Pool System. Code on the critical path is usually executed more than any other code in the process. A change of just one instruction on the critical path can make as much as a 1% difference in overall run-time. A lot of the design of the MPS is arranged around making the critical path as short and fast as possible. This document describes the critical path and explains some of that design, with reference to more detailed documents.

4.2. What makes the critical path critical

In order to determine which object can be recycled, the garbage collector has to frequently examine a very large number of pointers in the program’s objects. It does this by scanning memory, both allocated objects and roots (such as the thread stacks).

This means that the scanning functions must loop over pretty much every word in memory sooner or later. The MPS takes great pains to avoid scanning memory which does not need scanning, but to get good performance, scanning must be highly optimised.

What’s more, the scanning functions apply an operation called “fix” to every pointer (or potential pointer) that they find in the objects in memory. Fixing also attempts to eliminate uninteresting pointers as fast as possible, but it has to do some work on every object that is being considered for recycling, and that can be a large proportion of the objects in existence. The path through fixing must also be highly optimised, especially in the early stages.

4.3. How the MPS avoids scanning and fixing

This is just a brief overview of how the MPS is designed to reduce unnecessary scanning and fixing.

Firstly, the MPS must occasionally decide which objects to try to recycle. It does this using various facts it knows about the objects, primarily their age and whether they’ve survived previous attempts at recycling them. It then “condemns” a large number of objects at once, and each of these objects must be “preserved” by fixing references to them.

When the MPS condemns objects it chooses sets of objects in a small set of “zones” in memory (preferably a single zone). The zone of an object can be determined extremely quickly from its address, without looking at the object or any other data structure.

The MPS arranges that objects which will probably die at the same time are in the same zones.

The MPS allocates in “segments”. Each segment is of the order of one “tract” of memory (generally the same as the operating system page size, usually 4KiB or 8KiB) but may be larger if there are large objects inside. The MPS maintains a “summary” of the zones pointed to by all the pointers in a segment from previous scans.

So, once the MPS has decided what to condemn, it can quickly eliminate all segments which definitely do not point to anything in those zones. This avoids a large amount of scanning. It is an implementation of a remembered set, though it is unlike that in most other GCs.

In addition, the fix operation can quickly ignore pointers to the wrong zones. This is called the “zone check” and is a BIBOP technique.

Even if a pointer passes the zone check, it may still not point to a segment containing condemned objects. The next stage of the fix operation is to look up the segment pointed to by the pointer and see if it was condemned. This is a fast lookup.

After that, each pool class must decide whether the pointer is to a condemned object and do something to preserve it. This code is still critical. The MPS will have tried to condemn objects that are dead, but those objects are still likely to be in segments with other objects that must be preserved. The pool class fix method must quickly distinguish between them.

Furthermore, many objects will be preserved at least once in their lifetime, so even the code that preserves an object needs to be highly efficient. (Programs in languages like ML might not preserve 95% of their objects even once, but many other programs will preserve nearly all of theirs many times.)

4.4. Where to find the critical path

Very briefly, the critical path consists of five stages:

  1. The scanner, which iterates over pointers in objects. The MPS has several internal scanners, but the most important ones will be format scanners in client code registered through mps_format_create() functions.

    Note

    There needs to be a chapter in the manual explaining how to write a good scanner. Then that could be linked from here.

  2. The first-stage fix, which filters out pointers inline in the scanner. This is implemented in MPS_FIX() macros in mps.h.

  1. The second-stage fix, which filters out pointers using general information about segments. This is _mps_fix2 in trace.c.

  2. The third-stage fix, which filters out pointers using pool-specific information. Implemented in pool class functions called AMCFix(), LOFix(), etc. in pool*.c.

  3. Preserving the object, which might entail

    • marking it to prevent it being recycled; and/or

    • copying it and updating the original pointer (or just updating the pointer, if the object has previously been copied); and/or

    • adding it to a queue of objects to be scanned later, if it contains pointers.

    Found in or near the pool class fix functions.

4.5. The format scanner

The critical path starts when a format scan method is called. That is a call from the MPS to a client function of type mps_fmt_scan_t registered with one of the mps_format_create() functions in mps.h.

Here is an example of part of a format scanner for scanning contiguous runs of pointers, from fmtdy.c, the scanner for the Open Dylan runtime:

static mps_res_t dylan_scan_contig(mps_ss_t mps_ss,
                                   mps_addr_t *base, mps_addr_t *limit)
{
  mps_res_t res;
  mps_addr_t *p;        /* reference cursor */
  mps_addr_t r;         /* reference to be fixed */

  MPS_SCAN_BEGIN(mps_ss) {
          p = base;
    loop: if(p >= limit) goto out;
          r = *p++;
          if(((mps_word_t)r&3) != 0) /* pointers tagged with 0 */
            goto loop;             /* not a pointer */
          if(!MPS_FIX1(mps_ss, r)) goto loop;
          res = MPS_FIX2(mps_ss, p-1);
          if(res == MPS_RES_OK) goto loop;
          return res;
    out:  assert(p == limit);
  } MPS_SCAN_END(mps_ss);

  return MPS_RES_OK;
}

(To help with understanding optimisation of this code, it’s written in a pseudo-assembler style, with one line roughly corresponding to each instruction of an idealized intermediate code.)

The MPS C interface provides macros to try to help optimise this code. The mps_ss object is a “scan state” and contains data that is used to eliminate uninteresting pointers now, and record information which will be used to reduce scanning in future by maintaining the remembered set.

The macros MPS_SCAN_BEGIN() and MPS_SCAN_END() load key data from the scan state into local variables, and hopefully into processor registers. This avoids aliasing values that we know won’t change when calls are made to _mps_fix2 later, and so allows the compiler to keep the scan loop small and avoid unnecessary memory references.

This scanner knows that words not ending in 0b00 aren’t pointers to objects, so it eliminates them straight away. This is a kind of reference tag chosen by the client for its object representation.

Next, the pointer is tested using MPS_FIX1(). This performs fast tests on the pointer without using any other memory. In particular, it does the “zone check” described in section 3. If a pointer fails these tests, it isn’t interesting and can be skipped. It is very important to proceed to the next pointer as fast as possible in this case.

Having passed these tests, we need to fix the pointer using other data in memory, and possibly call the MPS to preserve the object. This is what MPS_FIX2() does. The important distinction here is that MPS_FIX2() can fail and return an error code, which must be propagated without ado by returning from the scanner. Separating MPS_FIX1() from MPS_FIX2() helps keep the error handling code away from the tight loop with the zone check.

MPS_FIX*, the macro/inline part of the fix operation, are referred to as “fix stage 1” or “the first stage fix” in other documents and comments.

If these inline checks pass, _mps_fix2 is called. If the MPS has been built as a separate object file or library, this is where the function call out of the scan loop happens. Since version 1.110 of the MPS, we encourage clients to compile the MPS in the same translation unit as their format code, so that the compiler can be intelligent about inlining parts of _mps_fix2 in the format scanner. The instructions for doing this are in Building the Memory Pool System, part of the manual.

4.6. The second stage fix in the MPM

If a pointer gets past the first-stage fix filters, it is passed to _mps_fix2, the “second stage fix”. The second stage can filter out yet more pointers using information about segments before it has to consult the pool class.

The first test applied is the “tract test”. The MPS looks up the tract containing the address in the tract table, which is a simple linear table indexed by the address shifted – a kind of flat page table.

Note that if the arena has been extended, the tract table becomes less simple, and this test may involved looking in more than one table. This will cause a considerable slow-down in garbage collection scanning. This is the reason that it’s important to give a good estimate of the amount of address space you will ever occupy with objects when you initialize the arena.

The pointer might not even be in the arena (and so not in any tract). The first stage fix doesn’t guarantee it. So we eliminate any pointers not in the arena at this stage.

If the pointer is in an allocated tract, then the table also contains a cache of the “white set” – the set of garbage collection traces for which the tract is “interesting”. If a tract isn’t interesting, then we know that it contains no condemned objects, and we can filter out the pointer.

If the tract is interesting them it’s part of a segment containing objects that have been condemned. The MPM can’t know anything about the internal layout of the segment, so at this point we dispatch to the third stage fix.

This dispatch is slightly subtle. We have a cache of the function to dispatch to in the scan state, which has recently been looked at and is with luck still in the processor cache. The reason there is a dispatch at all is to allow for a fast changeover to emergency garbage collection, or overriding of garbage collection with extra operations. Those are beyond the scope of this document. Normally, ss->fix points at PoolFix(), and we rely somewhat on modern processor branch target prediction). PoolFix() is passed the pool, which is fetched from the tract table entry, and that should be in the cache.

PoolFix() itself dispatches to the pool class. Normally, a dispatch to a pool class would indirect through the pool class object. That would be a double indirection from the tract, so instead we have a cache of the pool’s fix method in the pool object. This also allows a pool class to vary its fix method per pool instance, a fact that is exploited to optimize fixing in the AMC Pool depending on what kind of object format it is managing.

4.7. The third stage fix in the pool class

The final stage of fixing is entirely dependent on the pool class. The MPM can’t, in general, know how the objects within a pool are arranged, so this is pool class specific code.

Furthermore, the pool class must make decisions based on the “reference rank” of the pointer. If a pointer is ambiguous (RankAMBIG) then it can’t be changed, so even a copying pool class can’t move an object. On the other hand, if the pointer is weak (RankWEAK) then the pool fix method shouldn’t preserve the object at all, even if it’s condemned.

The exact details of the logic that the pool fix must implement in order to co-operate with the MPM and other pools are beyond the scope of this document, which is about the critical path. Since it is on the critical path, it’s important that whatever the pool fix does is simple and fast and returns to scanning as soon as possible.

The first step, though, is to further filter out pointers which aren’t to objects, if that’s its policy. Then, it may preserve the object, according to its policy, and possibly ensure that the object gets scanned at some point in the future, if it contains more pointers.

If the object is moved to preserve it (for instance, if the pool class implements a copying GC), or was already moved when fixing a previous reference to it, the reference being fixed must be updated (this is the origin of the term “fix”).

As a simple example, LOFix() is the pool fix method for the Leaf Only pool class. It implements a marking garbage collector, and does not have to worry about scanning preserved objects because it is used to store objects that don’t contain pointers. (It is used in compiler run-time systems to store binary data such as character strings, thus avoiding any scanning, decoding, or remembered set overhead for them.)

LOFix() filters any ambiguous pointers that aren’t aligned, since they can’t point to objects it allocated. Otherwise it subtracts the segment base address and shifts the result to get an index into a mark bit table. If the object wasn’t marked and the pointer is weak, then it sets the pointer to zero, since the object is about to be recycled. Otherwise, the mark bit is set, which preserves the object from recycling when LOReclaim is called later on. LOFix illustrates about the minimum and most efficient thing a pool fix method can do.

4.8. Other considerations

So far this document has described the ways in which the garbage collector is designed around optimising the critical path. There are a few other things that the MPS does that are important.

Firstly, inlining is very important. The first stage fix is inlined into the format scanner by being implemented in macros in mps.h. And to get even better inlining, we recommend that the whole MPS is compiled in a single translation unit with the client format and that strong global optimisation is applied.

Secondly, we are very careful with code annotations on the critical path. Assertions, statistics, and telemetry are all disabled on the critical path in “hot” (production) builds. (In fact, it’s because the critical path is critical that we can afford to leave annotations switched on elsewhere.)

Last, but by no means least, we pay a lot of brainpower and measurement to the critical path, and are very very careful about changing it. Code review around the critical path is especially vigilant.

And we write long documents about it.

4.9. References

MMRef

“The Memory Management Reference”; <http://www.memorymanagement.org/>.