.. mode: -*- rst -*-

Coalescing block structure
==========================

:Tag: design.mps.cbs
:Author: Gavin Matthews
:Date: 1998-05-01
:Status: complete design
:Revision: $Id: //info.ravenbrook.com/project/mps/version/1.113/design/cbs.txt#1 $
:Copyright: See section `Copyright and License`_.


Introduction
------------

_`.intro`: This is the design for impl.c.cbs, which implements a data
structure for the management of non-intersecting memory ranges, with
eager coalescence.

_`.readership`: This document is intended for any MM developer.

_`.source`: design.mps.poolmv2, design.mps.poolmvff.

_`.overview`: The "coalescing block structure" is a set of addresses
(or a subset of address space), with provision for efficient
management of contiguous ranges, including insertion and deletion,
high level communication with the client about the size of contiguous
ranges, and detection of protocol violations.


Definitions
-----------

_`.def.range`: A (contiguous) *range* of addresses is a semi-open
interval on address space.

_`.def.isolated`: A contiguous range is *isolated* with respect to
some property it has, if adjacent elements do not have that property.


Requirements
------------

_`.req.set`: Must maintain a set of addresses.

_`.req.fast`: Common operations must have a low amortized cost.

_`.req.add`: Must be able to add address ranges to the set.

_`.req.remove`: Must be able to remove address ranges from the set.

_`.req.size`: Must report concisely to the client when isolated
contiguous ranges of at least a certain size appear and disappear.

_`.req.iterate`: Must support the iteration of all isolated
contiguous ranges. This will not be a common operation.

_`.req.protocol`: Must detect protocol violations.

_`.req.debug`: Must support debugging of client code.

_`.req.small`: Must have a small space overhead for the storage of
typical subsets of address space and not have abysmal overhead for the
storage of any subset of address space.

_`.req.align`: Must support an alignment (the alignment of all
addresses specifying ranges) of down to ``sizeof(void *)`` without
losing memory.


Interface
---------

_`.header`: CBS is used through impl.h.cbs.


External types
..............

``typedef struct CBSStruct *CBS``

_`.type.cbs`: ``CBS`` is the main data structure for manipulating a
CBS. It is intended that a ``CBSStruct`` be embedded in another
structure. No convenience functions are provided for the allocation or
deallocation of the CBS.

``typedef Bool (*CBSIterateMethod)(CBS cbs, Range range, void *closureP, Size closureS)``

_`.type.cbs.iterate.method`: Type ``CBSIterateMethod`` is a callback
function that may be passed to ``CBSIterate()``. It is called for
every isolated contiguous range in address order. The function must
returns a ``Bool`` indicating whether to continue with the iteration.


External functions
..................

``Res CBSInit(Arena arena, CBS cbs, void *owner, Align alignment, Bool fastFind, ArgList args)``

_`.function.cbs.init`: ``CBSInit()`` is the function that initialises
the CBS structure. It performs allocation in the supplied arena. The
parameter ``owner`` is passed to ``MeterInit()``, an ``alignment``
indicates the alignment of ranges to be maintained. An initialised CBS
contains no ranges.

``fastFind``, if set, causes the CBS to maintain, for each subtree,
the size of the largest block in that subtree. This must be true if
any of the ``CBSFindFirst()``, ``CBSFindLast()``, or
``CBSFindLargest()`` functions are going to be used on the CBS.

``CBSInit()`` may take one keyword argument:

* ``MPS_KEY_CBS_EXTEND_BY`` (type ``Size``; default 4096) is the size
  of segment that the CBS will request from the arena in which to
  allocate its ``CBSBlock`` structures.

``void CBSFinish(CBS cbs)``

_`.function.cbs.finish`: ``CBSFinish()`` is the function that finishes
the CBS structure and discards any other resources associated with the
CBS.

``Res CBSInsert(Range rangeReturn, CBS cbs, Range range)``

_`.function.cbs.insert`: If any part of ``range`` is already in the
CBS, then leave it unchanged and return ``ResFAIL``. Otherwise,
attempt to insert ``range`` into the CBS. If the insertion succeeds,
then update ``rangeReturn`` to describe the contiguous isolated range
containing the inserted range (this may differ from ``range`` if there
was coalescence on either side) and return ``ResOK``. If the insertion
fails, return a result code indicating allocation failure.

_`.function.cbs.insert.fail`: Insertion of a valid range (that is, one
that does not overlap with any range in the CBS) can only fail if the
new range is isolated and the allocation of the necessary data
structure to represent it failed.


``Res CBSDelete(Range rangeReturn, CBS cbs, Range range)``

_`.function.cbs.delete`: If any part of the range is not in the CBS,
then leave the CBS unchanged and return ``ResFAIL``. Otherwise, update
``rangeReturn`` to describe the contiguous isolated range that
contains ``range`` (this may differ from ``range`` if there are
fragments on either side) and attempt to delete the range from the
CBS. If the deletion succeeds, return ``ResOK``. If the deletion
fails, return a result code indicating allocation failure.

_`.function.cbs.delete.fail`: Deletion of a valid range (that is, one
that is wholly contained in the CBS) can only fail if there are
fragments on both sides and the allocation of the necessary data
structures to represent them fails.

_`.function.cbs.delete.return`: ``CBSDelete()`` returns the contiguous
isolated range that contains ``range`` even if the deletion fails.
This is so that the caller can try deleting the whole block (which is
guaranteed to succeed) and managing the fragments using a fallback
strategy.

``void CBSIterate(CBS cbs, CBSIterateMethod iterate, void *closureP, Size closureS)``

_`.function.cbs.iterate`: ``CBSIterate()`` is the function used to
iterate all isolated contiguous ranges in a CBS. It receives a
pointer, ``Size`` closure pair to pass on to the iterator method,
and an iterator method to invoke on every range in address order. If
the iterator method returns ``FALSE``, then the iteration is
terminated.

``Res CBSDescribe(CBS cbs, mps_lib_FILE *stream)``

_`.function.cbs.describe`: ``CBSDescribe()`` is a function that prints
a textual representation of the CBS to the given stream, indicating
the contiguous ranges in order, as well as the structure of the
underlying splay tree implementation. It is provided for debugging
purposes only.

``Bool CBSFindFirst(Range rangeReturn, Range oldRangeReturn, CBS cbs, Size size, FindDelete findDelete)``

_`.function.cbs.find.first`: Locate the first block (in address order)
within the CBS of at least the specified size, update ``rangeReturn``
to describe that range, and return ``TRUE``. If there is no such
block, it returns ``FALSE``.

In addition, optionally delete the top, bottom, or all of the found
range, depending on the ``findDelete`` argument. This saves a separate
call to ``CBSDelete()``, and uses the knowledge of exactly where we
found the range. The value of ``findDelete`` must come from this
enumeration::

    enum {
        FindDeleteNONE,    /* don't delete after finding */
        FindDeleteLOW,     /* delete size bytes from low end of block */
        FindDeleteHIGH,    /* delete size bytes from high end of block */
        FindDeleteENTIRE   /* delete entire range */
    };

The original contiguous isolated range in which the range was found is
returned via the ``oldRangeReturn`` argument. (If ``findDelete`` is
``FindDeleteNONE`` or ``FindDeleteENTIRE``, then this will be
identical to the range returned via the ``rangeReturn`` argument.)

``CBSFindFirst()`` requires that ``fastFind`` was true when
``CBSInit()`` was called.

``Bool CBSFindLast(Range rangeReturn, Range oldRangeReturn, CBS cbs, Size size, FindDelete findDelete)``

_`.function.cbs.find.last`: Like ``CBSFindFirst()``, except that it
finds the last block in address order.

``Bool CBSFindLargest(Range rangeReturn, Range oldRangeReturn, CBS cbs, Size size, FindDelete findDelete)``

_`.function.cbs.find.largest`: Locate the largest block within the
CBS, and if that block is at least as big as ``size``, return its
range via the ``rangeReturn`` argument, and return ``TRUE``. If there
are no blocks in the CBS at least as large as ``size``, return
``FALSE``. Pass 0 for ``size`` if you want the largest block
unconditionally.

Like ``CBSFindFirst()``, optionally delete the range (specifying
``FindDeleteLOW`` or ``FindDeleteHIGH`` has the same effect as
``FindDeleteENTIRE``). This feature requires that ``fastFind`` was
true when ``CBSInit()`` was called.


Implementation
--------------

_`.impl`: This section is concerned with describing various aspects of
the implementation. It does not form part of the interface definition.



Splay tree
..........

_`.impl.splay`: The CBS is principally implemented using a splay tree
(see design.mps.splay_). Each splay tree node is embedded in a
``CBSBlock`` that represents a semi-open address range. The key passed
for comparison is the base of another range.

.. _design.mps.splay: splay

_`.impl.splay.fast-find`: ``CBSFindFirst()`` and ``CBSFindLast()`` use
the update/refresh facility of splay trees to store, in each
``CBSBlock``, an accurate summary of the maximum block size in the
tree rooted at the corresponding splay node. This allows rapid
location of the first or last suitable block, and very rapid failure
if there is no suitable block.

_`.impl.find-largest`: ``CBSFindLargest()`` simply finds out the size
of the largest block in the CBS from the root of the tree, using
``SplayRoot()``, and does ``SplayFindFirst()`` for a block of that
size. This takes time proportional to the logarithm of the size of the
free list, so it's about the best you can do without maintaining a
separate priority queue, just to do ``CBSFindLargest()``.


Low memory behaviour
....................

_`.impl.low-mem`: When the CBS tries to allocate a new ``CBSBlock``
structure for a new isolated range as a result of either
``CBSInsert()`` or ``CBSDelete()``, and there is insufficient memory
to allocation the ``CBSBlock`` structure, then the range is not added
to the CBS or deleted from it, and the call to ``CBSInsert()`` or
``CBSDelete()`` returns ``ResMEMORY``.


The CBS block
.............

_`.impl.cbs.block`: The block contains a base-limit pair and a splay
tree node.

_`.impl.cbs.block.special`: The base and limit may be equal if the
block is halfway through being deleted.

_`.impl.cbs.block.special.just`: This conflates values and status, but
is justified because block size is very important.


Testing
-------

_`.test`: The following testing will be performed on this module:

_`.test.cbstest`: There is a stress test for this module in
impl.c.cbstest. This allocates a large block of memory and then
simulates the allocation and deallocation of ranges within this block
using both a ``CBS`` and a ``BT``. It makes both valid and invalid
requests, and compares the ``CBS`` response to the correct behaviour
as determined by the ``BT``. It also iterates the ranges in the
``CBS``, comparing them to the ``BT``. It also invokes the
``CBSDescribe()`` method, but makes no automatic test of the resulting
output. It does not currently test the callbacks.

_`.test.pool`: Several pools (currently MVT_ and MVFF_) are implemented
on top of a CBS. These pool are subject to testing in development, QA,
and are/will be heavily exercised by customers.

.. _MVT: poolmvt
.. _MVFF: poolmvff


Notes for future development
----------------------------

_`.future.not-splay`: The initial implementation of CBSs is based on
splay trees. It could be revised to use any other data structure that
meets the requirements (especially `.req.fast`_).

_`.future.hybrid`: It would be possible to attenuate the problem of
`.risk.overhead`_ (below) by using a single word bit set to represent
the membership in a (possibly aligned) word-width of grains. This
might be used for block sizes less than a word-width of grains,
converting them when they reach all free in the bit set. Note that
this would make coalescence slightly less eager, by up to
``(word-width - 1)``.


Risks
-----

_`.risk.overhead`: Clients should note that the current implementation
of CBSs has a space overhead proportional to the number of isolated
contiguous ranges. [Four words per range.] If the CBS contains every
other grain in an area, then the overhead will be large compared to
the size of that area. [Four words per two grains.] The CBS structure
is thus suitable only for managing large enough ranges.



Document History
----------------

- 1998-05-01 Gavin Matthews. This document was derived from the
  outline in design.mps.poolmv2(2).

- 1998-07-22 Gavin Matthews. Updated in response to approval comments
  in change.epcore.anchovy.160040. There is too much fragmentation in
  trapping memory.

- Gavin Matthews. Updated (as part of change.epcore.brisling.160158:
  MVFF cannot be instantiated with 4-byte alignment) to document new
  alignment restrictions.

- 2002-06-07 RB_ Converted from MMInfo database design document.

- 2013-04-14 GDR_ Converted to reStructuredText.

- 2013-05-19 GDR_ Removed the "emergency" free list allocator, the
  design notes on an unimplemented "future hybrid" scheme, the
  callbacks, block interface, and minimum size interface. Updated the
  arguments for ``CBSIterateMethod``, ``CBSInit()``, ``CBSInsert()``,
  and ``CBSDelete()``.

- 2013-05-23 RB_ Removed references to "contingency methods" that were
  talking about the deleted "emergency" free list allocator.
  Documented ``fastFind`` argument to ``CBSInit()``.

.. _RB: http://www.ravenbrook.com/consultants/rb/
.. _GDR: http://www.ravenbrook.com/consultants/gdr/


Copyright and License
---------------------

Copyright © 1998-2014 Ravenbrook Limited. All rights reserved. 
<http://www.ravenbrook.com/>. This is an open source license. Contact
Ravenbrook for commercial licensing options.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

#. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

#. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

#. Redistributions in any form must be accompanied by information on how
   to obtain complete source code for this software and any
   accompanying software that uses this software.  The source code must
   either be included in the distribution or be available for no more than
   the cost of distribution plus a nominal fee, and must be freely
   redistributable under reasonable conditions.  For an executable file,
   complete source code means the source code for all modules it contains.
   It does not include source code for modules or files that typically
   accompany the major components of the operating system on which the
   executable file runs.

**This software is provided by the copyright holders and contributors
"as is" and any express or implied warranties, including, but not
limited to, the implied warranties of merchantability, fitness for a
particular purpose, or non-infringement, are disclaimed.  In no event
shall the copyright holders and contributors be liable for any direct,
indirect, incidental, special, exemplary, or consequential damages
(including, but not limited to, procurement of substitute goods or
services; loss of use, data, or profits; or business interruption)
however caused and on any theory of liability, whether in contract,
strict liability, or tort (including negligence or otherwise) arising in
any way out of the use of this software, even if advised of the
possibility of such damage.**
