Collection framework

author Pekka P. Pirinen
date 1998-01-01
index terms pair: collection framework; design
revision //
status incomplete design
tag design.mps.collection


.intro: This document describes the Collection Framework. It's a framework for implementing garbage collection techniques and integrating them into a system of collectors that all cooperate in recycling garbage.


.framework: MPS provides a framework that allows the integration of many different types of GC strategies and provides many of the basic services that those strategies use.

.framework.cover: The framework subsumes most major GC strategies and allows many efficient techniques, like in-line allocation or software barriers.

.framework.overhead: The overhead due to cooperation is low.


But not non-existent. Can we say something useful about it?

.framework.benefits: The ability to combine collectors contributes significantly to the flexibility of the system. The reduction in code duplication contributes to reliability and integrity. The services of the framework make it easier to write new MM strategies and collectors.

.framework.mpm: The Collection Framework is merely a part of the structure of the MPM. See design.mps.architecture and design.mps.arch for the big picture.


Those two documents should be combined into one. Pekka P. Pirinen, 1998-01-15.

Other notable components that the MPM manages to integrate into a single framework are manually-managed memory and finalization services (see design.mps.finalize).


A document describing the design of manually-managed memory is missing. Pekka P. Pirinen, 1998-01-15.

.see-also: This document assumes basic familiarity with the ideas of pool (see design.mps.arch.pools) and segment (see design.mps.seg.over).

Collection abstractions

Colours, scanning and fixing

.state: The framework knows about the three colours of the tri-state abstraction and free blocks. Recording the state of each object is the responsibility of the pool, but the framework gets told about changes in the states and keeps track of colours in each segment. Specifically, it records whether a segment might contain white, grey and black objects with respect to each active trace (see .tracer)


Black not currently implemented. Pekka P. Pirinen, 1998-01-04.

(A segment might contain objects of all colours at once, or none.) This information is approximate, because when an object changes colour, or dies, it usually is too expensive to determine if it was the last object of its former colour.

.state.transitions: The possible state transitions are as follows:

free   ---alloc--> black (or grey) or white or none
none   --condemn-> white
none   --refine--> grey
grey   ---scan---> black
white  ----fix---> grey (or black)
black  --revert--> grey
white  --reclaim-> free
black  --reclaim-> none

.none-is-black: Outside of a trace, objects don't really have colour, but technically, the colour is black. Objects are only allocated grey or white during a trace, and by the time the trace has finished, they are either dead or black, like the other surviving objects. We might then reuse the colour field for another trace, so it's convenient to set the colour to black when allocating outside a trace. This means that refining the foundation (analysis.tracer.phase.condemn.refine), actually turns black segments grey, rather than vice versa, but the principle is the same.

.scan-fix: "Scanning" an object means applying the "fix" function to all references in that object. Fixing is the generic name for the operation that takes a reference to a white object and makes it non-white (usually grey, but black is a possibility, and so is changing the reference as we do for weak references). Typical examples of fix methods are copying the object into to-space or setting its mark bit.

.cooperation: The separation of scanning and fixing is what allows different GC techniques to cooperate. The scanning is done by a method on the pool that the scanned object resides in, and the fixing is done by a method on the pool that the reference points to.

.scan-all: Pools provide a method to scan all the grey objects in a segment.

Reference sets

.refsets: The cost of scanning can be significantly reduced by storing remembered sets. We have chosen a very compact and efficient implementation, called reference sets, or refsets for short (see idea.remember).


design.mps.refset is empty! Perhaps some of this should go there. Pekka P. Pirinen, 1998-02-19.

This makes the cost of maintaining them low, so we maintain them for all references out of all scannable segments.

.refsets.approx: You might describe refsets as summaries of all references out of an area of memory, so they are only approximations of remembered sets. When a refset indicates that an interesting reference might be present in a segment, we still have to scan the segment to find it.

.refsets.scan: The refset information is collected during scanning. The scan state protocol provides a way for the pool and the format scan methods to cooperate in this, and to pass this information to the tracer module which checks it and updates the segment (see design.mps.scan).


Actually, there's very little doc there. Pekka P. Pirinen, 1998-02-17.

.refsets.maintain: The MPS tries to maintain the refset information when it moves or changes object.

.refsets.pollution: Ambiguous references and pointers outside the arena will introduce spurious zones into the refsets. We put up with this to keep the scanning costs down. Consistency checks on refsets have to take this into account.

.refsets.write-barrier: A write-barrier are needed to keep the mutator from invalidating the refsets when writing to a segment. We need one on any scannable segment whose refset is not a superset of the mutator's (and that the mutator can see). If we know what the mutator is writing and whether it's a reference, we can just add that reference to the refset (figuring out whether anything can be removed from the refset is too expensive). If we don't know or if we cannot afford to keep the barrier up, the framework can union the mutator's refset to the segment's refset.

.refset.mutator: The mutator's refset could be computed during root scanning in the usual way, and then kept up to date by using a read-barrier. It's not a problem that the mutator can create new pointers out of nothing behind the read-barrier, as they won't be real references. However, this is probably not cost-effective, since it would cause lots of barrier hits. We'd need a read-barrier on every scannable segment whose refset is not a subset of the mutator's (and that the mutator can see). So instead we approximate the mutator's refset with the universal refset.

The tracer

.tracer: The tracer is an engine for implementing multiple garbage collection processes. Each process (called a "trace") proceeds independently of the others through five phases as described in analysis.tracer. The following sections describe how the action of each phase fits into the framework. See design.mps.trace for details


No, there's not much there, either. Possibly some of this section should go there. Pekka P. Pirinen, 1998-02-18.

.combine: The tracer can also combine several traces for some actions, like scanning a segment or a root. The methods the tracer calls to do the work get an argument that tells them which traces they are expected to act for.


Extend this.

.trace.begin: Traces are started by external request, usually from a client function or an action (see design.mps.action).

.trace.progress: The tracer gets time slices from the arena to work on a given trace.


This is just a provisional arrangement, in lieu of real progress control. Pekka P. Pirinen, 1998-02-18.

In each slice, it selects a small amount of work to do, based on the state of the trace, and does it, using facilities provided by the pools.

.trace.scan: A typical unit of work is to scan a single segment. The tracer can choose to do this for multiple traces at once, provided the segment is grey for more than one trace.

.trace.barrier: Barrier hits might also cause a need to scan :mps:a segment (see .hw-barriers.hit). Again, the tracer can :mps:choose to combine traces, when it does this.

.mutator-colour: The framework keeps track of the colour of the mutator separately for each trace.

The condemn phase

.phase.condemn: The agent that creates the trace (see .trace.begin) determines the condemned set and colours it white. The tracer then examines the refsets on all scannable segments, and if it can deduce some segment cannot refer to the white set, it's immediately coloured black, otherwise the pool is asked to grey any objects in the segment that might need to be scanned (in copying pools, this is typically the whole segment).

.phase.condemn.zones: To get the maximum benefit from the refsets, we try to arrange that the zones are a minimal superset (for example, generations uniquely occupy zones) and a maximal subset (there's nothing else in the zone) of the condemned set. This needs to be arranged at allocation time (or when copying during collection, which is much like allocation)


Soon, this will be handled by segment loci, see

.phase.condemn.mutator: At this point, the mutator might reference any objects, that is, it is grey. Allocation can be in any colour, most commonly white.


More could be said about this.

The grey mutator phase

.phase.grey-mutator: Grey segments are chosen according to some sort of progress control and scanned by the pool to make them black. Eventually, the tracer will decide to flip or it runs out of grey segments, and proceeds to the next phase.


Currently, this phase has not been implemented; all traces flip immediately after condemn. Pekka P. Pirinen, 1998-02-18.

.phase.grey-mutator.copy: At this stage, we don't want to copy condemned objects, because we would need an additional barrier to keep the mutator's view of the heap consistent (see analysis.async-gc.copied.pointers-and-new-copy).

.phase.grey-mutator.ambig: This is a good time to get all ambiguous scanning out of the way, because we usually can't do any after the flip and because it doesn't cause any copying.


Write a detailed explanation of this some day.

The flip phase

.phase.flip: The roots (see design.mps.root) are scanned. This has to be an atomic action as far as the mutator is concerned, so all threads are suspended for the duration.

.phase.flip.mutator: After this, the mutator is black: if we use a strong barrier (analysis.async-gc.strong), this means it cannot refer to white objects. Allocation will be in black (could be grey as well, but there's no point to it).

The black mutator phase Grey segments are chosen according to some sort of progress control and scanned by the pool to make them black. Eventually, the tracer runs out of segments that are grey for this trace, and proceeds to the next phase. At this stage white objects can be relocated, because the mutator cannot see them (as long as a strong barrier is used, as we must do for a copying collection, see analysis.async-gc.copied.pointers).

The reclaim phase

.phase.reclaim: The tracer finds the remaining white segments and asks the pool to reclaim any white objects in them.

.phase.reclaim.barrier: Once a trace has started reclaiming objects, the others shouldn't try to scan any objects that are white for it, because they might have dangling pointers in them.


Needs cross-reference to document that is yet to be written.

Currently, we reclaim atomically, but it could be incremental, or even overlapped with a new trace on the same condemned set. Pekka P. Pirinen, 1997-12-31.



An introduction and a discussion of general principles should go here. This is a completely undesigned area.

Hardware barriers

.hw-barriers: Hardware barrier services cannot, by their very nature, be independently provided to each trace. A segment is either protected or not, and we have to set the protection on a segment if any trace needs a hardware barrier on it.

.hw-barriers.supported: The framework currently supports segment-oriented Appel-Ellis-Li barriers (analysis.async-gc.barrier.appel-ellis-li), and write-barriers for keeping the refsets up-to-date. It would not be hard to add Steele barriers (analysis.async-gc.barrier.steele.scalable).

.hw-barriers.hit: When a barrier hit happens, the arena determines which segment it was on. The segment colour info is used to determine whether it had trace barriers on it, and if so, the appropriate barrier action is performed, using the methods of the owning pool. If the segment was write-protected, its refset is unioned with the refset of the mutator.


In practice this is RefSetUNIV.

.hw-barriers.hit.multiple: Fortunately, if we get a barrier hit on a segment with multiple trace barriers on it, we can scan it for all the traces that it had a barrier for.


Needs link to unwritten section under .combine.

Software barriers


Write something about software barriers.

Document History

  • 1998-01-01 Pekka P. Pirinen. Initial draft based on the current implementation of the MPS, analysis.async-gc, [that note on the independence of collections] and analysis.tracer.
  • 2002-06-07 RB Converted from MMInfo database design document.
  • 2013-03-22 GDR Converted to reStructuredText.