Ravenbrook / Projects / Memory Pool System / Master Product Sources / Design Documents

Memory Pool System Project


                 DESIGN FOR COALESCING BLOCK STRUCTURE
                             design.mps.cbs
                             incomplete doc
                           gavinm 1998-05-01

INTRODUCTION

.intro: This is the design for impl.c.cbs, which implements a data structure 
for the management of non-intersecting memory ranges, with eager coalescence.

.readership: This document is intended for any MM developer.

.source: design.mps.poolmv2, design.mps.poolmvff.

.overview: The "coalescing block structure" is a set of addresses (or a subset 
of address space), with provision for efficient management of contiguous 
ranges, including insertion and deletion, high level communication with the 
client about the size of contiguous ranges, and detection of protocol 
violations.


Document History

.hist.0: This document was derived from the outline in design.mps.poolmv2(2).  
Written by Gavin Matthews 1998-05-01.

.hist.1: Updated by Gavin Matthews 1998-07-22 in response to approval comments 
in change.epcore.anchovy.160040 There is too much fragmentation in trapping 
memory.

.hist.2: Updated by Gavin Matthews (as part of change.epcore.brisling.160158: 
MVFF cannot be instantiated with 4-byte alignment) to document new alignment 
restrictions.


DEFINITIONS

.def.range: A (contiguous) range of addresses is a semi-open interval on 
address space.

.def.isolated: A contiguous range is isolated with respect to some property it 
has, if adjacent elements do not have that property.

.def.interesting: A block is interesting if it is of at least the minimum 
interesting size specified by the client. 


REQUIREMENTS

.req.set: Must maintain a set of addresses.

.req.fast: Common operations must have a low amortized cost.

.req.add: Must be able to add address ranges to the set.

.req.remove: Must be able to remove address ranges from the set.

.req.size: Must report concisely to the client when isolated contiguous ranges 
of at least a certain size appear and disappear.

.req.iterate: Must support the iteration of all isolated contiguous ranges.  
This will not be a common operation.

.req.protocol: Must detect protocol violations.

.req.debug: Must support debugging of client code.

.req.small: Must have a small space overhead for the storage of typical subsets 
of address space and not have abysmal overhead for the storage of any subset of 
address space.

.req.align: Must support an alignment (the alignment of all addresses 
specifying ranges) of down to sizeof(void *) without losing memory.


INTERFACE

.header: CBS is used through impl.h.cbs.


External Types

.type.cbs: CBS is the main data-structure for manipulating a CBS.  It is 
intended that a CBSStruct be embedded in another structure.  No convenience 
functions are provided for the allocation or deallocation of the CBS.
  typedef struct CBSStruct CBSStruct, *CBS;

.type.cbs.block: CBSBlock is the data-structure that represents an isolated 
contiguous range held by the CBS.  It is returned by the new and delete methods 
described below.
  typedef struct CBSBlockStruct CBSBlockStruct, *CBSBlock;

.type.cbs.method: The following methods are provided as callbacks to advise the 
client of certain events.  The implementation of these functions should not 
cause any CBS function to be called on the same CBS.  In this respect, the CBS 
module is not re-entrant.

.type.cbs.change.size.method: CBSChangeSizeMethod is the function pointer type, 
four instances of which are optionally registered via CBSInit.
  typedef void (*CBSChangeSizeMethod)(CBS cbs, CBSBlock block, Size oldSize, 
SizeNewSize);
These callbacks are invoked under CBSInsert, CBSDelete, or CBSSetMinSize in 
certain circumstances.  Unless otherwise stated, oldSize and newSize will both 
be non-zero, and different.  The accessors CBSBlockBase, CBSBlockLimit, and 
CBSBlockSize may be called from within these callbacks, except within the 
delete callback when newSize is zero.  See .impl.callback for implementation 
details.

.type.cbs.iterate.method: CBSIterateMethod is a function pointer type is a 
client method invoked by the CBS module for every isolated contiguous range in 
address order, when passed to the CBSIterate or CBSIterateLarge functions.  The 
function returns a boolean indicating whether to continue with the iteration.
  typedef Bool (*CBSIterateMethod)(CBS cbs, CBSBlock block, void *closureP, 
unsigned long closureS);


External Functions

.function.cbs.init: CBSInit is the function that initialises the CBS 
structure.  It performs allocation in the supplied arena.  Four methods are 
passed in as function pointers (see .type.* above), any of which may be NULL.  
It receives a minimum size, which is used when determining whether to call the 
optional methods.  The mayUseInline boolean indicates whether the CBS may use 
the memory in the ranges as a low-memory fallback (see .impl.low-mem).  The 
alignment indicates the alignment of ranges to be maintained.  An initialised 
CBS contains no ranges.
  Res CBSInit(Arena arena, CBS cbs, CBSChangeSizeMethod new, 
CBSChangeSizeMethod delete, CBSChangeSizeMethod grow, CBSChangeSizeMethod 
shrink, Size minSize, Align alignment, Bool mayUseInline);

.function.cbs.init.may-use-inline: If mayUseInline is set, then alignment must 
be at least sizeof(void *).  In this mode, the CBS will never fail to insert or 
delete ranges, even if memory for control structures becomes short.  Note that, 
in such cases, the CBS may defer notification of new/grow events, but will 
report available blocks in CBSFindFirst and CBSFindLast.  Such low memory 
conditions will be rare and transitory.  See .align for more details.

.function.cbs.finish: CBSFinish is the function that finishes the CBS structure 
and discards any other resources associated with the CBS.
  void CBSFinish(CBS cbs);

.function.cbs.insert: CBSInsert is the function used to add a contiguous range 
specified by [base,limit) to the CBS.  If any part of the range is already in 
the CBS, then ResFAIL is returned, and the CBS is unchanged.  This function may 
cause allocation; if this allocation fails, and any contingency mechanism 
fails, then ResMEMORY is returned, and the CBS is unchanged.
  Res CBSInsert(CBS cbs, Addr base, Addr limit);

.function.cbs.insert.callback: CBSInsert will invoke callbacks as follows:
  new: when a new block is created that is interesting.  oldSize == 0; newSize 
>= minSize.
  new: when an uninteresting block coalesces to become interesting.  0 < 
oldSize < minSize <= newSize.
  delete: when two interesting blocks are coalesced.  grow will also be invoked 
in this case on the larger of the two blocks.  newSize == 0; oldSize >= minSize.
  grow: when an interesting block grows in size.  minSize <= oldSize < newSize.
 
.function.cbs.delete: CBSDelete is the function used to remove a contiguous 
range specified by [base,limit) from the CBS.  If any part of the range is not 
in the CBS, then ResFAIL is returned, and the CBS is unchanged.  This function 
may cause allocation; if this allocation fails, and any contingency mechanism 
fails, then ResMEMORY is returned, and the CBS is unchanged.
  Res CBSDelete(CBS cbs, Addr base, Addr limit);

.function.cbs.delete.callback: CBSDelete will invoke callbacks as follows:
  delete: when an interesting block is entirely removed.  newSize == 0; oldSize 
>= minSize.
  delete: when an interesting block becomes uninteresting. 0 < newSize < 
minSize <= oldSize.
  new: when a block is split into two blocks, both of which are interesting.  
shrink will also be invoked in this case on the larger of the two blocks.  
oldSize == 0; newSize >= minSize.
  shrink: when an interesting block shrinks in size, but remains interesting.  
minSize <= newSize < oldSize.

.function.cbs.iterate: CBSIterate is the function used to iterate all isolated 
contiguous ranges in a CBS.  It receives a pointer, unsigned long closure pair 
to pass on to the iterator method, and an iterator method to invoke on every 
range in address order.  If the iterator method returns FALSE, then the 
iteration is terminated.
  void CBSIterate(CBS cbs, CBSIterateMethod iterate, void *closureP, unsigned 
long closureS);

.function.cbs.iterate.large: CBSIterateLarge is the function used to iterate 
all isolated contiguous ranges of size greater than or equal to the client 
indicated minimum size in a CBS.  It receives a pointer, unsigned long closure 
pair to pass on to the iterator method, and an iterator method to invoke on 
every large range in address order.  If the iterator method returns FALSE, then 
the iteration is terminated.
  void CBSIterateLarge(CBS cbs, CBSIterateMethod iterate, void *closureP, 
unsigned long closureS);

.function.cbs.set.min-size: CBSSetMinSize is the function used to change the 
minimum size of interest in a CBS.  This minimum size is used to determine 
whether to invoke the client callbacks from CBSInsert and CBSDelete.  This 
function will invoke either the new or delete callback for all blocks that are 
(in the semi-open interval) between the old and new values.  oldSize and 
newSize will be the same in these cases.
  void CBSSetMinSize(CBS cbs, Size minSize);

.function.cbs.describe: CBSDescribe is a function that prints a textual 
representation of the CBS to the given stream, indicating the contiguous ranges 
in order, as well as the structure of the underlying splay tree 
implementation.  It is provided for debugging purposes only.
  Res CBSDescribe(CBS cbs, mps_lib_FILE *stream);

.function.cbs.block.base: The CBSBlockBase function returns the base of the 
range represented by the CBSBlock.  This function may not be called from the 
delete callback when the block is being deleted entirely.
  Addr CBSBlockBase(CBSBlock block);
Note that the value of the base of a particular CBSBlock is not guaranteed to 
remain constant across calls to CBSDelete and CBSInsert, regardless of whether 
a callback is invoked.

.function.cbs.block.limit: The CBSBlockLimit function returns the limit of the 
range represented by the CBSBlock.  This function may not be called from the 
delete callback when the block is being deleted entirely.
  Addr CBSBlockLimit(CBSBlock block);
Note that the value of the limit of a particular CBSBlock is not guaranteed to 
remain constant across calls to CBSDelete and CBSInsert, regardless of whether 
a callback is invoked.

.function.cbs.block.size: The CBSBlockSize function returns the size of the 
range represented by the CBSBlock.  This function may not be called from the 
delete callback when the block is being deleted entirely.
  Size CBSBlockSize(CBSBlock block);
Note that the value of the size of a particular CBSBlock is not guaranteed to 
remain constant across calls to CBSDelete and CBSInsert, regardless of whether 
a callback is invoked.

.function.cbs.block.describe: The CBSBlockDescribe function prints a textual 
representation of the CBSBlock to the given stream.  It is provided for 
debugging purposes only.
  Res CBSBlockDescribe(CBSBlock block, mps_lib_FILE *stream);

.function.cbs.find.first: The CBSFindFirst function locates the first block (in 
address order) within the CBS of at least the specified size, and returns its 
range.  If there are no such blocks, it returns FALSE.  It optionally deletes 
the top, bottom, or all of the found range, depending on the findDelete 
argument (this saves a separate call to CBSDelete, and uses the knowledge of 
exactly where we found the range).
  Bool CBSFindFirst(Addr *baseReturn, Addr *limitReturn, CBS cbs, Size size, 
CBSFindDelete findDelete);
  enum {
    CBSFindDeleteNONE,    /* don't delete after finding */
    CBSFindDeleteLOW,     /* delete precise size from low end */
    CBSFindDeleteHIGH,    /* delete precise size from high end */
    CBSFindDeleteENTIRE   /* delete entire range */
  };

.function.cbs.find.last: The CBSFindLast function locates the last block (in 
address order) within the CBS of at least the specified size, and returns its 
range.  If there are no such blocks, it returns FALSE.  Like CBSFindFirst, it 
optionally deletes the range.
  Bool CBSFindLast(Addr *baseReturn, Addr *limitReturn, CBS cbs, Size size, 
CBSFindDelete findDelete);

.function.cbs.find.largest: The CBSFindLargest function locates the largest 
block within the CBS, and returns its range.  If there are no blocks, it 
returns FALSE.  Like CBSFindFirst, it optionally deletes the range (specifying 
CBSFindDeleteLOW or CBSFindDeleteHIGH has the same effect as 
CBSFindDeleteENTIRE).
  Bool CBSFindLargest(Addr *baseReturn, Addr *limitReturn, CBS cbs, 
CBSFindDelete findDelete)


Alignment

.align: When mayUseInline is specified to permit inline data structures and 
hence avoid losing memory in low memory situations, the alignments that the CBS 
supports are constrained by three requirements:
  - The smallest possible range (namely one that is the alignment in size) must 
be large enough to contain a single void * pointer (see 
.impl.low-mem.inline.grain);
  - Any larger range (namely one that is at least twice the alignment in size) 
must be large enough to contain two void * pointers (see 
.impl.low-mem.inline.block);
  - It must be valid on all platforms to access a void * pointer stored at the 
start of an aligned range.

All alignments that meet these requirements are aligned to sizeof(void *), so 
we take that as the minimum alignment.


IMPLEMENTATION


.impl: Note that this section is concerned with describing various aspects of 
the implementation.  It does not form part of the interface definition.


Size Change Callback Protocol

.impl.callback: The size change callback protocol concerns the mechanism for 
informing the client of the appearance and disappearance of interesting 
ranges.  The intention is that each range has an identity (represented by the 
CBSBlock).  When blocks are split, the larger fragment retains the identity.  
When blocks are merged, the new block has the identity of the larger fragment.

.impl.callback.delete: Consider the case when the minimum size is <minSize>, 
and CBSDelete is called to remove a range of size <middle>.  The two (possibly 
non-existant) neighbouring ranges have (possibly zero) sizes <left> and 
<right>.  <middle> is part of the CBSBlock <middleBlock>.

.impl.callback.delete.delete: The delete callback will be called in this case 
if and only if:
  left + middle + right >= minSize  &&  left < minSize  &&  right < minSize
That is, the combined range is interesting, but neither remaining fragment is.  
It will be called with the following parameters:
  block: middleBlock
  oldSize: left + middle + right
  newSize: left >= right ? left : right
 
.impl.callback.delete.new: The new callback will be called in this case if and 
only if:
  left >= minSize  &&  right >= minSize
That is, both remaining fragments are interesting.  It will be called with the 
following parameters:
  block: a new block
  oldSize: 0
  newSize: left >= right ? right : left

.impl.callback.delete.shrink: The shrink callback will be called in this case 
if and only if:
  left + middle + right >= minSize && (left >= minSize || right >= minSize)
That is, at least one of the remaining fragments is still interesting.  It will 
be called with the following parameters:
  block: middleBlock
  oldSize: left + middle + right
  newSize: left >= right ? left : right

.impl.callback.insert: Consider the case when the minimum size is <minSize>, 
and CBSInsert is called to add a range of size <middle>.  The two (possibly 
non-existant) neighbouring blocks are <leftBlock> and <rightBlock>, and have 
(possibly zero) sizes <left> and <right>.

.impl.callback.insert.delete: The delete callback will be called in this case 
if and only if:
  left >= minSize  &&  right >= minSize
That is, both neighbours were interesting.  It will be called with the 
following parameters:
  block: left >= right ? rightBlock : leftBlock
  oldSize: left >= right ? right : left
  newSize: 0

.impl.callback.insert.new: The new callback will be called in this case if and 
only if:
  left + middle + right >= minSize  &&  left < minSize  &&  right < minSize
That is, the combined block is interesting, but neither neighbour was.  It will 
be called with the following parameters:
  block: left >= right ? leftBlock : rightBlock
  oldSize: left >= right ? left : right
  newSize: left + middle + right

.impl.callback.insert.grow: The grow callback will be called in this case if 
and only if:
  left + middle + right >= minSize && (left >= minSize || right >= minSize)
That is, at least one of the neighbours was interesting.  It will be called 
with the following parameters:
  block: left >= right ? leftBlock : rightBlock
  oldSize: left >= right ? left : right
  newSize: left + middle + right


Splay Tree

.impl.splay: The CBS is principally implemented using a splay tree (see 
design.mps.splay).  Each splay tree node is embedded in a CBSBlock that 
represents a semi-open address range.  The key passed for comparison is the 
base of another range.

.impl.splay.fast-find: CBSFindFirst and CBSFindLast use the update/refresh 
facility of splay trees to store, in each CBSBlock, an accurate summary of the 
maximum block size in the tree rooted at the corresponding splay node.  This 
allows rapid location of the first or last suitable block, and very rapid 
failure if there is no suitable block.

.impl.find-largest: CBSFindLargest simply finds out the size of the largest 
block in the CBS from the root of the tree (using SplayRoot), and does 
SplayFindFirst for a block of that size.  This is O(log(n)) in the size of the 
free list, so it's about the best you can do without maintaining a separate 
priority queue, just to do CBSFindLargest.  Except when the emergency lists 
(see .impl.low-mem) are in use, they are also searched.


Low Memory Behaviour

.impl.low-mem: Low memory situations cause problems when the CBS tries to 
allocate a new CBSBlock structure for a new isolated range as a result of 
either CBSInsert or CBSDelete, and there is insufficient memory to allocation 
the CBSBlock structure:

.impl.low-mem.no-inline: If mayUseInline is FALSE, then the range is not added 
to the CBS, and the call to CBSInsert or CBSDelete returns ResMEMORY.

.impl.low-mem.inline: If mayUseInline is TRUE:

.impl.low-mem.inline.block: If the range is large enough to contain an inline 
block descriptor consisting of two pointers, then it is kept on an emergency 
block list.  The CBS will eagerly attempt to add this block back into the splay 
tree during subsequent calls to CBSInsert and CBSDelete.  The CBS will also 
keep its emergency block list in address order, and will coalesce this list 
eagerly.  Some performance degradation will be seen when the emergency block 
list is in use.  Ranges on this emergency block list will not be made available 
to the CBS's client via callbacks.  CBSIterate* will not iterate over ranges on 
this list.

.impl.low-mem.inline.block.structure: The two pointers stored are to the next 
such block (or NULL), and to the limit of the block, in that order.

.impl.low-mem.inline.grain: Otherwise, the range must be large enough to 
contain an inline grain descriptor consisting of one pointer, then it is kept 
on an emergency grain list.  The CBS will eagerly attempt to add this grain 
back into either the splay tree or the emergency block list during subsequent 
calls to CBSInsert and CBSDelete.  The CBS will also keep its emergency grain 
list in address order.  Some performance degradation will be seen when the 
emergency grain list is in use.  Ranges on this emergency grain list will not 
be made available to the CBS's client via callbacks.  CBSIterate* will not 
iterate over ranges on this list.

.impl.low-mem.inline.grain.structure: The pointer stored is to the next such 
grain, or NULL.


The CBS Block

.impl.cbs.block: The block contains a base-limit pair and a splay tree node.  

.impl.cbs.block.special: The base and limit may be equal if the block is 
halfway through being deleted.

.impl.cbs.block.special.just: This conflates values and status, but is 
justified because block size is very important.


TESTING

.test: The following testing will be performed on this module:

.test.cbstest: There is a stress test for this module in impl.c.cbstest.  This 
allocates a large block of memory and then simulates the allocation and 
deallocation of ranges within this block using both a CBS and a BT.  It makes 
both valid and invalid requests, and compares the CBS response to the correct 
behaviour as determined by the BT.  It also iterates the ranges in the CBS, 
comparing them to the BT.  It also invokes the CBS describe method, but makes 
no automatic test of the resulting output.  It does not currently test the 
callbacks.

.test.pool: Several pools (currently MV2 and MVFF) are implemented on top of a 
CBS.  These pool are subject to testing in development, QA, and are/will be 
heavily exercised by customers.


NOTES FOR FUTURE DEVELOPMENT

.future.not-splay: The initial implementation of CBSs is based on splay trees.  
It could be revised to use any other data structure that meets the requirements 
(especially .req.fast).

.future.hybrid: It would be possible to attenuate the problem of .risk.overhead 
(below) by using a single word bit set to represent the membership in a 
(possibly aligned) word-width of grains.  This might be used for block sizes 
less than a word-width of grains, converting them when they reach all free in 
the bit set.  Note that this would make coalescence slightly less eager, by up 
to (word-width - 1).


RISKS

.risk.overhead: Clients should note that the current implementation of CBSs has 
a space overhead proportional to the number of isolated contiguous ranges.  [ 
Four words per range. ]  If the CBS contains every other grain in an area, then 
the overhead will be large compared to the size of that area.  [ Four words per 
two grains. ]  See .future.hybrid for a suggestion to solve this problem.  An 
alternative solution is to use CBSs only for managing long ranges.
---
The following relates to a pending re-design and does not yet relate to any 
working source version.  GavinM 1998-09-25

The CBS system provides its services by combining the services provided by 
three subsidiary CBS modules:

  - CBSST -- Splay Tree: Based on out-of-line splay trees; must allocate to 
insert isolated, which may therefore fail. 

  - CBSBL -- Block List: Based on a singly-linked list of variable sized ranges 
with inline descriptors; ranges must be at least large enough to store the 
inline descriptor.

  - CBSGL -- Grain List: Based on a singly-linked list of fixed size ranges 
with inline descriptors; the ranges must be the alignment of the CBS.

The three sub-modules have a lot in common.  Although their methods are not 
invoked via a dispatcher, they have been given consistent interfaces, and 
consistent internal appearance, to aid maintenance.

Methods supported by sub-modules (not all sub-modules support all methods):

  - MergeRange -- Finds any ranges in the specific CBS adjacent to the supplied 
one.  If there are any, it extends the ranges, possibly deleting one of them.  
This cannot fail, but should return FALSE if there is an intersection between 
the supplied range and a range in the specific CBS.  

  - InsertIsolatedRange -- Adds a range to the specific CBS that is not 
adjacent to any range already in there.  Depending on the specific CBS, this 
may be able to fail for allocation reasons, in which case it should return 
FALSE.  It should AVER if the range is adjacent to or intersects with a range 
already there.

  - RemoveAdjacentRanges -- Finds and removes from the specific CBS any ranges 
that are adjacent to the supplied range.  Should return FALSE if the supplied 
range intersects with any ranges already there.

  - DeleteRange -- Finds and deletes the supplied range from the specific CBS.  
Returns a tri-state result:
    - Success -- The range was successfully deleted.  This may have involved 
the creation of a new range, which should be done via CBSInsertIsolatedRange.
    - ProtocolError -- Either some non-trivial strict subset of the supplied 
range was in the specific CBS, or a range adjacent to the supplied range was in 
the specific CBS.  Either of these indicates a protocol error. 
    - NoIntersection -- The supplied range was not found in the CBS.  This may 
or not be a protocol error, depending on the invocation context.

  - FindFirst -- Returns the first (in address order) range in the specific CBS 
that is at least as large as the supplied size, or FALSE if there is no such 
range.

  - FindFirstBefore -- As FindFirst, but only finds ranges prior to the 
supplied address.

  - FindLast -- As FindFirst, but finds the last such range in address order.

  - FindLastAfter -- FindLast's equivalent of FindFirstBefore.

  - Init -- Initialise the control structure embedded in the CBS.

  - Finish -- Finish the control structure embedded in the CBS.

  - InlineDescriptorSize -- Returns the aligned size of the inline descriptor.

  - Check -- Checks the control structure embedded in the CBS.

The CBS supplies the following utilities:

  - CBSAlignment -- Returns the alignment of the CBS.

  - CBSMayUseInline -- Returns whether the CBS may use the memory in the ranges 
stored.

  - CBSInsertIsolatedRange -- Wrapper for CBS*InsertIsolatedRange.
 
Internally, the CBS* sub-modules each have an internal structure CBS*Block that 
represents an isolated range within the module.  It supports the following 
methods (for sub-module internal use):
  - BlockBase -- Returns the base of the associated range;
  - BlockLimit 
  - BlockRange
  - BlockSize

A. References

B. Document History

2002-06-07 RB Converted from MMInfo database design document.