MPS Wiki: Unmanaged workspace

This wiki article contains incomplete and informal notes about the MPS, the precursor to more formal documentation. Not confidential. Readership: MPS users and developers.

Warning: As far as I am aware, issues with unmanaged workspace have not been formally addressed in MPS documentation to date. This article is a start. Consequently this article is even more speculative and new than others in the Wiki. RHSK 2006-06-21

Introduction: What is unmanaged workspace?

"Unmanaged↓" means containing references that the MPS cannot see. "Workspace" means stack, registers, and incomplete objects.

We are most familiar with an MPS mode of use where the stack and registers are managed (they are declared as a root and are ambiguously scanned), and the only unmanaged parts of the workspace are one or more incomplete objects in one or more allocation points.

(Several test executables omit the stack scanner, and rely on guarantees that the MPS will not move or collect memory that, I believe, we have not documented).

We should design guaranteed protocols↓ for simple single-threaded clients.

However, these are burdensome for complex clients (where it is hard to determine when writing client function X whether a call to client function Y will end some guaranteed period), and for multi-threaded clients.

So this article also attempts to generalize some concepts from our experience of the allocation point protocol, to define a guarded-protocol↓, and then restates the allocation point protocol in terms of those concepts.

This article has the following sections:

Objects, cells, references, and reference-values
Using unmanaged workspace

Objects, cells, references, and reference-values

[Note: apart from "root" and "reachable", the terminology in this section is new (I think). RHSK 2006-06-16]

object: some memory allocated by the MPS; it may contain cells.
cell: a place that stores one reference; (A cell is most commonly: a register, a local (C-stack) variable, a statically allocated global variable, or a memory word inside an object)
reference: one stored instance of a reference-value; (For example: the stored bit-pattern 0x1200A3B0 in register r4)
reference-value: a value that designates an allocated object; (A reference-value is usually an address, for example: the bit-pattern 0x1200A3B0)

Types of object:

Root object (in MPS terminology): ≡ an object that the mutator has designated as a root.
Reachable object (in MPS terminology): ≡ a root, or an object reachable from a root via scanned references.

Note that the MPS does NOT assume that registers, the stack, etc, are roots. This is a little unusual. This means that "reachable" means "reachable to the collector"; the mutator may be able to access references that are not "reachable".

Types of cell:

Managed: ≡ a scanned cell in a reachable object. If the reference in the cell is TO a reachable object, then the MPS ensures the reference remains fresh. (The MPS must ensure the referent does not get reclaimed. If ambiguous, pins referent. If exact, may update the reference.)
Unmanaged: ≡ a cell that is not managed. A reference stored in an unmanaged cell is not useful unless it is a guaranteed reference or a guarded value. (The MPS cannot see the reference).

The state of a reference:

A reference may be:

Fresh: ≡ a reference that the MPS has kept up-to-date. (The definition of 'up-to-date' depends on context. For example, for a weak reference, 'up-to-date' means it either points to the intended object, or it has been nulled out.)
Stale: ≡ a reference that has not been kept up-to-date, and may now designate an out-of-date copy of the object, some other object, random memory, or unmapped memory.

Using unmanaged workspace

[Note: the terminology in this section is new (I think). RHSK 2006-06-16]

Unmanaged workspace must be used with care because the MPS cannot see the references stored in it, and so cannot keep these references fresh.

Surprisingly, the MPS seems to lack the necessary protocols for using unmanaged workspace in general. The MPS currently supports only one protocol for unmanaged workspace: objects under construction in allocation points.

This section is a preliminary analysis of what protocols are needed (and appropriate) for using unmanaged workspace in general.

Operations

Using unmanaged workspace involves:

copying references from the managed into the unmanaged world;
dereferencing references in the unmanaged world;
copying references from the unmanaged into the managed world;
moving objects (and the references they contain) from the unmanaged to the managed world.

Guaranteed -vs- guarded protocols

guaranteed protocol: Permitted operations are safe and the results are always valid.
guarded protocol: Permitted operations are safe, but the results may be invalid. After performing the operations, the mutator can find out whether the results are valid or not.

guaranteed reference

≡ a value that the MPS guarantees will remain a fresh reference, even when stored in an unmanaged cell, until the end of the guaranteed period. Permitted operations are safe and correct. (The MPS guarantees that the referent does not move or get reclaimed). [Clearly possible, but these protocols are not currently clearly defined. RHSK 2006-06-15]

A guaranteed reference may be copied into a managed cell.

guarded value

≡ a value that may be a fresh reference, or may be stale. (The MPS knows which, but the mutator doesn't). A protocol defines how the value may be used.

unsafe value

≡ a value that is neither guaranteed nor guarded, and even the MPS does not know if it is a fresh reference or not. An unsafe value must not be copied into a managed non-ambiguous cell.

Operations with guarded values

A guarded protocol defines the "guarded period", which has an "operations" phase, followed by an atomic "commit" event that may succeed or fail, followed by an "abort" phase (only if the commit failed).

"commit" succeeds ⇒ the guarded references remained fresh. The result of the operations is valid, and the guarded period is over. A successful commit may atomically change other state too, such as making a previously unmanaged object become managed.

"commit" fails ⇒ the guarded references are stale, the results are invalid, and the guarded period enters an "abort" phase.

The mutator must detect the abort phase, clear up if necessary, and then "close" the abort phase (this also closes the guarded period).

Note: although the "commit" is atomic and succeeds or fails, the mutator will not be able to both commit and test for success or failure atomically.

"false abort": the mutator may be unable to test commit's success or failure accurately: it may be required to make a conservative approximation, erring on the side of failure. This may result in a 'false abort', where the mutator believes it is in the "abort" phase of the guarded period, but actually the commit succeeded and the guarded period is over. Write your mutator abort code with great care: it must be valid and have the same semantics whether run in a real abort phase or a false abort.

A guarded protocol restricts which operations are permitted on guarded values.

[Note: It might be possible to design a protocol where guarded references may be dereferenced. If stale, the mutator would see objects in the stale world that persists until reclaim. Not currently supported. RHSK 2006-06-15]

Example: the allocation point protocol:

[Note: This example attempts to define the allocation point protocol using the general concepts for unmanaged workspace and guarded protocols developed above, and extend the definition of the protocol to discuss storing guarded references in exactly-scanned managed cells. Commit is when the mutator sets I=A. MPS does not currently define abort-close: in practice the MPS code currently does it on the next mps_ap_fill, but we should probably define it as occurring earlier, when the mutator calls mps_ap_trip. RHSK 2006-06-20]

Throughout the guarded period, the new object is unmanaged. On successful commit, the guarded period ends, and if the new object is reachable it becomes managed.

Throughout the guarded period, the pointer value returned by mps_reserve is a guarded value that may be dereferenced to access the new object, and may be copied into a managed cell. On successful commit, copies in the managed world (including copies in the new object, if it is reachable) become fresh references to the new object.

Throughout the guarded period, a reference to a reachable object may be copied from a managed cell into the unmanaged world (including into the new object): it becomes a guarded value that may NOT be dereferenced. The guarded value may be copied into a managed cell. On successful commit, all copies stored in managed cells (including cells in the new object, if it is reachable) are still fresh. All other copies are unsafe.

On failed commit, for the duration of the abort phase, all guarded values remain guarded. After the abort phase is closed, all remaining copies of the guarded value become unsafe.

During its abort, the mutator must therefore remove all guarded values from managed non-ambiguous cells, and (because this may be a false abort) it must not dereference the pointer value returned by mps_reserve.

Other notes

.talk.RB.2006-06-13:

 need two snippets:
  1= with stack being ambig scanned
  
  2. without stack being ambig scanned: means, snippet should 
     show parent being pinned.
      /* Store an ambiguous reference to the new object */
      global->ambigroot = neo;
  [RHSK 2006-06-14: I'm not sure that unscanned reg+stack is possible!]
Also:
  = can't use "new";
  = ? (void**) is not portable.
  - avoid use of "race" -- it conventionally means a bad race.

.talk.RB.2006-06-14:

[Modes of MPS use snipped: now in its own wiki article. RHSK]

MPS could make a promise about when it 
might invalidate unmanaged references.  (We currently only invalidate 
when we flip the mutator for a trace, so each trace flip starts 
a new epoch, and unmanaged references are only valid during 
the epoch in which the unmanaged reference was created by copying
some managed reference).

.think.RHSK.2006-06-14:

For 7: unscanned reg+stack, multi-threaded, moving.

How do we support operations like:
  a->b->c->d = e->f->g->h;
?

If forwarding pointers (broken hearts) didn't over-write the old 
objects, then the long chain would be 'safe' as long as you throw 
away the answer at the end, and as long as you do that before 
the reclaim phase of the trace.  

Hmm, it's easy to write a format where the broken heart doesn't 
obliterate the object's slots -- store the forwarding pointer 
solely in the object's format header, invisible to 'normal' client 
code.  

(By the way, are there any limits on the size of pad object that 
the MPS can request of a format pad method?)

With such a format, and the current behaviour of MPS's only 
moving pool (amc):
  -  after flip, unmanaged references are stale (point  
     to stale copies of objects), but are otherwise still 
     connected as the client expects;
  -  after reclaim, they may be to unmapped memory.  
But the MPS does not currently give a guarantee that it will 
behave be like this.  Perhaps it could.

Allocation Point is two things:
  a. a fast inlined allocation protocol;
  b. a quarantine that allows unmanaged references to be safely 
     written, and either:
       - safely converted into valid managed references 
         (if commit succeeds); or
       - flagged as invalid (if commit fails).

b. is nearly what we want for 7.

We want a defined region of time and space within 
which we can use unmanaged references.  

This can either be guaranteed success, or an attempt 
followed by a 'did we get away with that?' check.  

We could work with this primitive:
  - load word from (ref1 + offset) and store it in (ref2 + offset).
Exact stacks, manually maintained.  But it's painful.  (Okay for a 
language compiler, but not for a human.)

From a pinned object, how do i pin its child?
do {
  Read epoch
  Read o->child into unmanaged ref "child"
} while(! mps_pin(child, epoch);

We can do this already with an allocation point "ap_PinStack".  Each 
mps_pin_t object in the PinStack is simply a managed ambiguous ref:

/* object o MUST be pinned */
int pin_child(object *child_o, object *o)
{
  void *p;
  mps_pin_t *pin;

  do {
    void *child;  /* unmanaged ref: we must not deref it */
    
    if (mps_reserve(&p, ap_PinStack, sizeof(mps_pin_t)) != MPS_RES_OK) {
      goto fail_pin_child;
    }
    pin = p;
    
    child = o->child;

    /* We must not dereference child;
       We must not store child anywhere managed.  
       We must not store child in any quarantine except pin.
       
       We may store child in the pin quarantine.
       We may store child somewhere where it will 
         always remain umnanaged.  
       
       We may do whatever else we like with child!
       (Not much!)
    */
    
    pin->thing = child;
  } while (! mps_commit(ap_PinStack, p, sizeof(mps_pin_t));

  /* Success: we managed to pin the child */
  child_o = pin->thing;
  return TRUE;

fail_pin_child:
  child_o = NULL;
  return FALSE;  /* out of memory, etc */
}

Given pinned object o, can we pin its child simply by telling our 
own format code that the o->child slot is to be RankAMBIG now?

NO!  We may have finished scanning at RankAMBIG, be scanning at 
RankEXACT now, come across another ref to child, and move it.

B. Document History

  2006-06-21  RHSK  Created.
  2006-06-21  RHSK  Moved here from modes of use article; add Intro.

C. Copyright and License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Redistributions in any form must be accompanied by information on how to obtain complete source code for the this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.

This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement, are disclaimed. In no event shall the copyright holders and contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

MPS Wiki: issues with unmanaged workspace