The MPS Locus Manager
This document contains a guide to the MPS locus manager, followed by the historical initial design. References, History, Copyright and License are at the end.
Guide
Readership: any MPS developer. Not confidential.
The MPS manages three main resources:
- storage;
- address-space;
- time.
Management of arena address-space
The locus manager manages address-space at the arena level.
[Tucker was right: mail.ptw.1998-11-02.14-25. RHSK 2007-04-24].
When a pool wants some address-space, it expresses some preferences to the locus manager. The locus manager and the arena (working together) try to honour these preferences, and decide what address-space the pool gets.
Preferences are expressed by the SegPref argument to SegAlloc().
Note that, when they call SegAlloc(), pools are asking for address-space and writeable storage simultaneously, in a single call. There is currently no way for pools to 'reserve' address space without requesting storage.
Why is it important to manage address-space?
Trace differentiability
Carefully chosen addresses are used by reference tracing systems (i.e. automatic pools) to:
- categorise objects into clumps;
- summarise and cheaply find references between clumps.
Different clumps will become worth collecting at different times (the classic example, of course, is generations in a generational collector). For these partial collections to be efficient, it must be cheap to keep these clumps differentiable, cheap to condemn (Whiten) a particular clump, and cheap to find a good conservative approximation to all inward references to a clump (both initially to construct the Grey set, and to make scanning the Grey set efficient).
This is what the MPS zone mechanism is all about.
The locus manager manages the mapping from clumps to zones.
To specify a clump, pools use the "SegPrefGen" tag to SegPrefExpress. [The name is misleading: generations are only one sort of clump. RHSK 2007-04-24]
Prevent address-space fragmentation (within arena)
Address-space is not infinite.
In some use-cases, the MPS is required to remain efficient when using very nearly all available address-space and storage. (For example, with the client arena class, the only address-space available is that of the storage the client provides.)
Even with the VM arena class, typical storage sizes (in the year 2007) can make 32-bit address-space constrained: the problem may be of order 4 GB in size, which leaves little 'spare' address-space.
Address-space fragmentation incurs failure when there is no way to allocate a big block of address-space. The big block may be requested via the MPS (by the client), or by something else in the same process, such as a third-party graphics library, image library, etc.
Address-space fragmentation incurs cost when:
- desired large-block requests (such as for buffering) are denied, causing them to be re-requested as a smaller block, or as several smaller blocks;
- possibly, the operating system incurs costs in maintaining a fragmented mapping.
Prevent storage fragmentation (within tracts and segments)
Storage is not infinite: it is allocated in multiples of a fixed-size tract. Small lonely objects, each retaining a whole tract, cause storage fragmentation.
Non-moving pools manage this fragmentation with placement strategies that use:
- co-located death (in space and time);
- segment merging and splitting.
These pool-level strategies always care about contiguity of object storage. They also often care about the ordering of addresses, because pool code uses an address-ordered search when choosing where to place a new object. For these two reasons, the address chosen (by the locus manager and arena) for new tracts is important.
Certain specialised pools, and/or some client programs that use them, have carefully tuned segment sizes, positioning, and search order. Be careful: seemingly inconsequential changes can catastrophically break this tuning.
Pools can specify a preference for High and Low ends of address-space, which implies a search-order. Pools could also specify clumping, using either SegPrefGen or SegPrefZoneSet.
Discovering the layout
The locus manager is not given advance notice of how much address-space will be required with what preferences. Instead, the locus manager starts with an empty 'layout', and adapts it as more requests come in over time. It is attempting to 'discover' a suitable layout by successive refinement. This is ambitious.
Initial Design
THE DESIGN FOR THE LOCUS MANAGER
design.mps.locus
incomplete design
gavinm 1998-02-27
INTRODUCTION
.intro: The locus manager coordinates between the pools and takes the burden of
having to be clever about tract/group placement away from the pools, preserving
trace differentiability and contiguity where appropriate.
.source: mail.gavinm.1998-02-05.17-52(0), mail.ptw.1998-02-05.19-53(0),
mail.pekka.1998-02-09.13-58(0), and mail.gavinm.1998-02-09.14-05(0).
Document History
.hist.0: Originally written as part of change.dylan.box-turtle.170569. Much
developed since. gavinm 1998-02-27
.hist.1: Pekka wrote the real requirements after some discussion. pekka
1998-10-28
.hist.2: Pekka deleted Gavin's design and wrote a new one. pekka 1998-12-15
DEFINITIONS
.note.cohort: We use the word "cohort" in its usual sense here, but we're
particularly interested in cohorts that have properties relevant to tract
placement. It is such cohorts that the pools will try to organize using the
services of the locus manager. Typical properties would be trace
differentiability or (en masse) death-time
predictability. Typical cohorts would be instances of a non-generational pool,
or generations of a collection strategy.
.def.trace.differentiability: Objects (and hence tracts) that are collected,
may or may not have "trace differentiability" from each other, depending on
their placement in the different zones. Objects (or pointers to them) can also
have trace differentiability (or not) from non-pointers in ambiguous
references; in practice, we will be worried about low integers, that may appear
to be in zones 0 or -1.
REQUIREMENTS
.req.cohort: Tract allocations must specify the cohort they allocate in. These
kind of cohorts will be called loci, and they will have such attributes as are
implied by the other requirements. Critical.
.req.counter.objects: As a counter-requirement, pools are expected to manage
objects. Objects the size of a tract allocation request (segment-sized) are
exceptional. Critical. .req.counter.objects.just: This means the locus
manager is not meant to solve the problems of allocating large objects, and it
isn't required to know what goes on in pools.
.req.contiguity: Must support a high level of contiguity within cohorts when
requested. This means minimizing the number of times a cohort is made aware of
discontiguity. Essential (as we've effectively renegotiated this in SW, down
to a vague hope that certain critical cohorts are not too badly fragmented).
.req.contiguity.just: TSBA.
.req.contiguity.specific: It should be possible to request another allocation
next to a specific tract on either side (or an extension in that direction, as
the case may be). Such a request can fail, if there's no space there. Nice.
[IWBN to have one for "next to the largest free block".]
.req.differentiable: Must support the trace differentiability of segments that
may be condemned separately. Due to the limited number of zones, it must be
possible to place several cohorts into the same zone. Essential.
.req.differentiable.integer: It must be possible to place collectable
allocations so that they are trace-differentiable from small integers.
Essential.
.req.disjoint: Must support the disjointness of pages that have different VM
properties (such as mutable/immutable, read-only/read-write, and different
lifetimes). Optional. [I expect the implementation will simply work at page
or larger granularity, so the problem will not arise, but Tucker insisted on
stating this as a requirement. pekka 1998-10-28]
.req.low-memory: The architecture of the locus manager must not prevent the
design of efficient applications that often use all available memory.
Critical. .req.low-memory.expl: This basically says it must be designed to
perform well in low-memory conditions, but that there can be configurations
where it doesn't do as well, as long as this is documented for the application
programmer. Note that it doesn't say all applications are efficient, only that
if you manage to design an otherwise efficient application, the locus manager
will not sink it.
.req.address: Must conserve address space in VM arenas to a reasonable extent.
Critical.
.req.inter-pool: Must support the association of sets of tracts in different
pools into one cohort. Nice.
.req.ep-style: Must support the existing EP-style of allocation whereby
allocation is from one end of address space either upwards or downwards (or a
close approximation thereto with the same behavior). .req.ep-style.just: We
cannot risk disrupting a policy with well-known properties when this technology
is introduced.
.req.attributes: There should be a way to inform the locus manager about
various attributes of cohorts that might be useful for placement: deathtime,
expected total size, [more in the future]. Optional. [It's a given that the
cohorts must then have these attributes, within the limits set in the contract
of the appropriate interface.] .req.attributes.action: The locus manager
should use the attributes to guide its placement decisions. Nice.
.req.blacklisting: There should be a way of maintaining at least one blacklist
for pages (or some other small unit), that can not/should not be allocated to
collectable pools. [How to do blacklist breaking for ambiguous refs?]
Optional.
.req.hysteresis: There should be a way to indicate which cohorts fluctuate in
size and by how much, to guide the arena hysteresis to hold on to suitable
pages. Optional.
ANALYSIS
.anal.sw: Almost any placement policy would be an improvement on the current SW
one.
.anal.cause-and-effect: The locus manager doesn't usually need to know _why_
things need to be differentiable, disjoint, contiguous, etc. Abstracting the
reason away from the interface makes it more generic, more likely to have
serendipitous new uses. Attributes described by a quantity (deathtime, size,
etc.) are an exception to this, because we can't devise a common measure.
.anal.stable: The strategy must be stable: it must avoid repeated
recomputation, especially the kind that switches between alternatives with a
short period (repeated "bites" out of the same region, or flip-flopping between
two regions).
.anal.fragmentation: There's some call to avoid fragmentation in cohorts that
don't need strict contiguity, but this is not a separate requirement, since
fragmentation is a global condition, and can only be ameliorated if there's a
global strategy that clumps allocations together. .anal.deathtime: Cohorts
with good death-time clumping of their objects could use some locality of tract
allocation, because it increases the chances of creating large holes in the
address space (for other allocation to use). OTOH, many cohorts will not do
multiple frees in short succession, or at least cannot reasonably be predicted
to do so. This locality is not contiguity, nor is it low fragmentation, it's
just the requirement to place the new tracts next to the tract where the last
object was allocated in the cohort. Note that the placement of objects is
under the control of the pool, and the locus manager will not know it,
therefore this requirement should be pursued by requesting allocation next to a
particular tract (which we already have a requirement for).
.anal.asymmetrical: The strategy has to be asymmetrical with respect to cohorts
growing and shrinking. The reason for this asymmetry is that the manager can
choose where to grow, but it cannot choose where to shrink (except in a small
way, by growing with good locality).
INTERFACE
.interface.locus: A cohort will typically reside on multiple tracts (and the
pools will avoid putting objects of other cohorts on them), so there should be
an interface to describe the properties of the cohort, and associate each
allocation request with the cohort. We shall call such an object, created to
represent a cohort, a locus (pl. loci).
.interface.locus.pool: A locus will usually be created by the pool that uses
it. Some of the locus attributes will be inherited from client-specified pool
attributes [this means there will be additional pool attributes].
.interface.detail: This describes the interface in overview; for details, see
the implementation section and code, or the user documentation.
Loci
.function.create: A function to create a locus:
Res LocusCreate(Locus *locusReturn, LocusAttrs attrs, ZoneGroup zg,
LocusAllocDesc adesc)
where adesc contains the information about the allocation sequences in the
locus, zg is used for zone differentiability, and attrs encodes the following:
.locus.contiguity: A locus can be contiguous. This means performing as
required in .req.contiguity; non-contiguous allocations can be freely placed
anywhere (but efficiency dictates that similar allocations are placed close
together and apart from others).
.locus.blacklist: Allocations in the locus will avoid blacklisted pages (for
collectable segments).
.locus.zero: Allocations in the locus are zero-filled.
[Other attributes will be added, I'm sure.]
.interface.zone-group: The locus can be made a member of a zone group. Passing
ZoneGroupNONE means it's not a member of any group (allocations will be placed
without regard to zone, except to keep them out of stripes likely to be needed
for some group). [I propose no mechanism for managing zone groups at this
time, since it's only used internally for one purpose. pekka 2000-01-17]
.interface.size: An allocation descriptor (LocusAllocDesc) contains various
descriptions of how the locus will develop over time (inconsistent
specifications are forbidden, of course):
.interface.size.typical-alloc: Size of a typical allocation in this locus, in
bytes. This will mainly affect the grouping of non-contiguous loci.
.interface.size.large-alloc: Typical large allocation that the manager should
try to allow for (this allows some relief from .req.counter.objects), in
bytes. This will mainly affect the size of gaps that will be allotted
adjoining this locus.
.interface.size.direction: Direction of growth: up/down/none. Only useful if
the locus is contiguous.
.interface.size.lifetime: Some measure of the lifetime of tracts (not
objects) in the cohort. [Don't know the details yet, probably only useful for
placing similar cohorts next to each other, so the details don't actually
matter. pekka 2000-01-17]
.interface.size.deathtime: Some measure of the deathtime of tracts (not
objects) in the cohort. [Ditto. pekka 2000-01-17]
.function.init: LocusInit is like LocusCreate, but without the allocation.
This is the usual interface, since most loci are embedded in a pool or some
other structure.
.function.alloc: ArenaAlloc takes a locus argument. ArenaAllocHere is like it,
but additionally takes a tract and a specification to place the new allocation
immediately above/below the given tract; if that is not possible, it returns
ResFAIL (this will make it useful for realloc functionality).
.function.set-total: A function to tell the arena the expected number of
(non-miscible client) loci, and of zone groups:
ArenaSetTotalLoci(Arena arena, Size nLoci, Size nZoneGroups)
Peaks
.function.peak.create: A function to create a peak:
mps_res_t mps_peak_create(mps_peak_t*, mps_arena_t)
A newly-created peak is open, and will not be used to guide the strategy of the
locus manager.
.function.peak.add: A function to add a description of the state of one pool
into the peak:
mps_res_t mps_peak_describe_pool(mps_peak_t, mps_pool_t, mps_size_desc_t)
Calling this function again for the same peak and pool instance will replace
the earlier description. .function.peak.add.size: The size descriptor contains
a total size in bytes or % of arena size [@@@@is this right?].
.function.peak.add.remove: Specifying a NULL size will remove the pool from the
peak. The client is not allowed to destroy a pool that is mentioned in any
peak; it must be first removed from the peak, or the peak must be destroyed.
This is to ensure that the client adjusts the peaks in a manner that makes
sense to the application; the locus manager can't know how to do that.
.function.peak.close: A function to indicate that all the significant pools
have been added to the peak, and it can now be used to guide the locus manager:
mps_res_t mps_peak_close(mps_peak_t)
For any pool not described in the peak, the locus manager will take its current
size at any given moment as the best prediction of its size at the peak.
.function.peak.close.after: It is legal to add more descriptions to the peak
after closing, but this will reopen the peak, and it will have to be closed
before the locus manager will use it again. The locus manager uses the
previous closed state of the peak, while this is going on.
.function.peak.destroy: A function to destroy a peak:
void mps_peak_destroy(mps_peak_t)
.interface.ep-style: This satisfies .req.ep-style by allowing SW to specify
zero size for most pools (which will cause them to be placed next to other loci
with the same growth direction). [Not sure this is good enough, but we'll try
it first. pekka 2000-01-17]
ARCHITECTURE
Data Objects
.arch.locus: To represent the cohorts, we have locus objects. Usually a locus
is embedded in a pool instance, but generations are separate loci.
.arch.locus.attr: contiguity, blacklist, zg, current region, @@@@
.arch.locus.attr.exceptional: The client can define a typical large allocation
for the locus. Requests substantially larger than that are deemed exceptional.
.arch.zone-group: To satisfy .req.differentiable, we offer zone groups. Each
locus can be a member of a zone group, and the locus manager will attempt to
place allocations in this locus in different zones from all the other zone
groups. A zone group is represented as @@@@.
.arch.page-table: A page table is maintained by the arena, as usual to track
association between tracts, pools and segments, and mapping status for VM
arenas.
.arch.region: All of the address space is divided into disjoint regions,
represented by region objects. These objects store their current limits, and
high and low watermarks of currently allocated tracts (we hope there's usually
a gap of empty space between regions). The limits are actually quite porous
and flexible.
.arch.region.assoc: Each region is associated with one contiguous locus or any
number of non-contiguous loci (or none). We call the first kind of region
"contiguous". .arch.locus.assoc: Each locus remembers all regions where it
currently has tracts, excepting the badly-placed allocations (see below). It is
not our intention that any locus would have very many, or that loci that share
regions would have any reason to stop doing so.
.arch.region.more: Various quantities used by the placement computation are
also stored in the regions and the loci. Regions are created (and destroyed)
by the placement recomputation. Regions are located in stripes (if it's a
zoned region), but they can extend into neighboring stripes if an exceptionally
large tract allocation is requested (to allow for large objects).
.arch.chunk: Arenas may allocate more address space in additional chunks, which
may be disjoint from the existing chunks. Inter-chunk space will be
represented by dummy regions. There are also sentinel regions at both ends of
the address space.
Overview of Strategy
.arch.strategy.delay: The general strategy is to delay placement decisions
until they have to be made, but no later.
.arch.strategy.delay.until: Hence, the locus manager only makes placement
decisions when an allocation is requested (frees and other operations might set
a flag to cause the next allocation to redecide). This also allows the client
to change the peak and pool configuration in complicated ways without causing a
lot of recomputation, by doing all the changes without allocating in the middle
(unless the control pool needs more space because of the changes).
.arch.strategy.normal: While we want the placement to be sophisticated, we do
not believe it is worth the effort to consider all the data at each
allocation. Hence, allocations are usually just placed in one of the regions
used previously (see .arch.alloc) without reconsidering the issues.
.arch.strategy.normal.limit: However, the manager sets precautionary limits on
the regions to ensure that the placement decisions are revisited when an
irrevocable placement is about to be made.
.arch.strategy.create: The manager doesn't create new regions until they are
needed for allocation (but it might compute where they could be placed to
accommodate a peak).
Allocation
.arch.alloc: Normally, each allocation to a locus is placed in its current
region. New regions are only sought when necessary to fulfill an allocation
request or when there is reason to think the situation has changed
significantly (see .arch.significant).
.arch.alloc.same: An allocation is first attempted next to the previous
allocation in the same locus, respecting growth direction. If that is not
possible, a good place in the current region is sought. .arch.alloc.same.hole:
ATM, for finding a good place within a region, we just use the current
algorithm, limited to the region. In future, the placement within regions will
be more clever.
.arch.alloc.extend: If there's no adequate hole in the current region and the
request is not exceptional, the neighboring regions are examined to see if the
region could be extended at one border. (This will basically only be done if
the neighbor has shrunk since the last placement recomputation, because the
limit was set on sophisticated criteria, and should not be changed without
justification.) .arch.alloc.extend.here: When an allocation is requested next
to a specific tract (ArenaAllocHere), we try to extend a little harder [at
least for change_size, perhaps not for locality].
.arch.alloc.other: If no way can be found to allocate in the current region,
other regions used for this locus are considered in the same way, to see if
space can be found there. [Or probably look at other regions before trying to
extend anything?]
.arch.alloc.recompute: When no region of this locus has enough space for the
request, or when otherwise required, region placement is recomputed to find a
new region for the request (which might be the same region, after extension).
.arch.alloc.current: The region where the allocation was placed then becomes
the current region for this locus, except when the request was exceptional, or
when the region chosen was "bad" (see @@@@).
.arch.significant: Significant changes to the parameters affecting placement
are deemed to have happened at certain client calls and when the total
allocation has changed substantially since the last recomputation. Such
conditions set a flag that causes the next allocation to recompute even if its
current region is not full [possibly second-guess the decision to recompute
after some investigation of the current state?].
Deallocation
.arch.free: Deallocation simply updates the counters in the region and the
locus. For some loci, it will make the region of the deallocation the current
region.
.arch.free.remove: If a region becomes entirely empty, it is deleted (and the
neighbors' limits might be adjusted [quite tricky to get right, this]).
Region Placement Recomputation
.arch.gap: When doing placement computations, we view the arena as a sequence
of alternating region cores and gaps (which can be small, even zero-sized).
Initially, we'll take the core of a region to be the area between the high and
low watermark, but in the future we might be more flexible about that. [Edge
determination is actually a worthwhile direction to explore.]
.arch.reach: The gap between two cores could potentially end up being allocated
to either region, if they grow in that direction, or one or neither, if they
don't. The set of states that the region assignment could reach by assigning
the gaps to their neighbors is called the reach of the current configuration.
.arch.placement.object: The object of the recomputation is to find a
configuration of regions that is not too far from the current configuration and
that keeps all the peaks inside its reach; if that is not possible, keep the
nearest ones in the reach and then minimize the total distance from the rest.
.arch.placement.hypothetical: The configurations that are considered will
include hypothetical placements for new regions for loci that cannot fit in
their existing regions at the peak. This is necessary to avoid choosing a bad
alternative.
.arch.placement.interesting: The computation will only consider new regions of
loci that are deemed interesting, i.e., far from their peak state. This will
reduce the computational burden and avoid jittering near a peak.
[details missing]
IMPLEMENTATION
[missing]
NOTES
.idea.change: Even after the first segment, be prepared to change your mind, if
by the second segment a lot of new loci have been created.
.distance: If the current state is far from a peak, there's time to reassign
regions and for free space to appear (in fact, under the steady arena
assumption, enough free space _will_ appear).
.clear-pool: Need to have a function to deallocate all objects in a pool, so
that PoolDestroy won't have to be used for that purpose.
A. References
B. Document History
| 2002-06-07 | RB | Converted from MMInfo database design document. |
| 2007-04-24 | RHSK | add Guide: Manage arena address-space, why, discover layout. |
C. Copyright and License
This document is copyright © 1995-2002, 2007 Ravenbrook Limited. All rights reserved. This is an open source license. Contact Ravenbrook for commercial licensing options.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Redistributions in any form must be accompanied by information on how to obtain complete source code for this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.
This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.