MPS issue job003561

Title: amcssth fails on lii6gc with RES_COMMIT_LIMIT
Status: closed
Priority: essential
Assigned user: Gareth Rees
Organization: Ravenbrook
Description: On lii6gc:

    $ code/lii6gc/hot/amcssth
    code/lii6gc/hot/amcssth: randomize(): choosing initial state (v3): 611924816.
    [...]

    MPS_RESERVE_BLOCK: 7

    Aborted (core dumped)

Result code 7 is MPS_RES_COMMIT_LIMIT.
Analysis: This test case also fails with MPS_RES_RESOURCE (see job003703).

The underlying cause is that the test case waits for a message of type mps_message_type_gc, and then, if the condemned size was large enough:

    if (condemned > (gen1SIZE + gen2SIZE + (size_t)128) * 1024)

it sets the commit limit:

    /* When condemned size is larger than could happen in a gen 2
     * collection (discounting ramps, natch), guess that was a dynamic
     * collection, and reset the commit limit, so it doesn't run out. */
    mps_arena_commit_limit_set(arena, 2 * testArenaSIZE);

This is very delicate even in a single-threaded test case, because (i) the MPS provides no guarantee that collection messages are delivered in a timely fashion; and (ii) these numbers depend in a complicated way on the MPS's internals, such as overheads, allocation policy and so on. And of course it's utterly hopeless in a multi-threaded test case, where other threads can keep allocating (and triggering further collections) between a collection and the reading of its message.
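
To make the dependence on internal sizes concrete, here is a minimal sketch of the heuristic as a standalone predicate. It is not the test's code: the values of gen1SIZE, gen2SIZE and testArenaSIZE below are hypothetical placeholders (not the test's real settings), and the commit limit is modeled as a return value rather than a call to mps_arena_commit_limit_set.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL generation capacities in kilobytes (placeholders only). */
#define gen1SIZE ((size_t)150)
#define gen2SIZE ((size_t)170)
/* HYPOTHETICAL arena size in bytes (placeholder only). */
#define testArenaSIZE ((size_t)32 * 1024 * 1024)

/* Nonzero if `condemned` (bytes) exceeds anything a gen 2 collection
 * could condemn, so the test guesses it saw a "dynamic" collection. */
static int looks_dynamic(size_t condemned)
{
  return condemned > (gen1SIZE + gen2SIZE + (size_t)128) * 1024;
}

/* Model of the test's reaction to one GC message: returns the new
 * commit limit. The real test calls mps_arena_commit_limit_set()
 * instead of returning a value. */
static size_t commit_limit_after_gc(size_t condemned, size_t current_limit)
{
  assert(current_limit > 0);
  return looks_dynamic(condemned) ? 2 * testArenaSIZE : current_limit;
}
```

With these placeholder sizes the threshold is (150 + 170 + 128) * 1024 = 458752 bytes. Any change to the generation sizes, or to how much the MPS chooses to condemn at once, silently moves it.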

Nonetheless, DL points out that there is a real issue here. Setting the commit limit checks that the MPS can live inside a limited amount of memory. Without this limit, the MPS could be as wasteful as it liked and the test suite would still pass. So we shouldn't just turn it off.

GDR 2014-04-10: The following code is responsible for the test failures:

    if (collections == collectionsCOUNT / 2) {
      unsigned long object_count = 0;
      mps_arena_park(arena);
      mps_arena_formatted_objects_walk(arena, test_stepper, &object_count, 0);
      mps_arena_release(arena);
      printf("stepped on %lu objects.\n", object_count);
    }

There are two problems here. First, the condition on "collections" is an exact comparison, but several collections may happen between consecutive visits to this line, so the block is often skipped. This suppresses the error in many runs and explains why we see it so rarely. Second, while the arena is parked the MPS can make no collection progress, yet the other threads are still allocating, so the commit limit is bound to be reached.
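
The first problem can be seen in a small simulation. This is a sketch, not the test's code: the hypothetical array `steps` holds the value of the collection counter each time the thread reaches the check, and a jump of more than one stands for several collections happening between visits. An exact comparison misses a skipped target; a one-shot `>=` check does not. Note that a reliable check would not by itself fix the test: it would just make the second problem (walking a parked arena under a commit limit) show up every time.

```c
#include <assert.h>
#include <stddef.h>

/* Original style: run the walk only when the counter equals `target`.
 * `steps[i]` is the HYPOTHETICAL counter value on the i-th visit.
 * Returns how many times the walk would run. */
static int walks_exact(const unsigned long *steps, size_t n,
                       unsigned long target)
{
  int walks = 0;
  size_t i;
  assert(steps != NULL || n == 0);
  for (i = 0; i < n; ++i)
    if (steps[i] == target)
      ++walks;
  return walks;
}

/* One-shot alternative: run the walk the first time the counter reaches
 * or passes `target`, and never again. */
static int walks_once_at_or_after(const unsigned long *steps, size_t n,
                                  unsigned long target)
{
  int walks = 0;
  int done = 0;
  size_t i;
  assert(steps != NULL || n == 0);
  for (i = 0; i < n; ++i) {
    if (!done && steps[i] >= target) {
      done = 1;
      ++walks;
    }
  }
  return walks;
}
```

For example, with visits seeing the counter at 0, 1, 3, 4, 6 and a target of 2, the exact comparison never fires, while the one-shot check fires once (on the visit that sees 3).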

The solution is to split the test case into two modes. In one mode, we walk the objects but don't set the commit limit (this tests that we can allocate and walk simultaneously). In the other mode, we set the commit limit but don't walk the objects (this tests that the MPS can live within a tight limit even when several threads are allocating).
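
Here is a toy model of why the two features conflict and why splitting them works. It is not the MPS and not the actual fix: `toy_arena`, `toy_alloc`, `run`, `run_mode` and the `test_mode` values are hypothetical names, a collection is modeled as reclaiming everything down to `live`, and parking is modeled simply as preventing that collection.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL names for the two modes of the split test. */
typedef enum {
  MODE_WALK,         /* walk objects (arena parked); no commit limit */
  MODE_COMMIT_LIMIT  /* tight commit limit; never park to walk */
} test_mode;

/* Toy arena, not the MPS: `committed` grows with allocation and a
 * collection brings it back down to `live`, unless the arena is parked. */
typedef struct {
  size_t committed, live, limit;  /* limit == 0 means "no limit" */
  int parked;
} toy_arena;

/* Allocate `size` bytes, collecting first if that would exceed the
 * limit. Returns 0 on a simulated commit-limit failure. */
static int toy_alloc(toy_arena *a, size_t size)
{
  assert(a != NULL);
  if (a->limit != 0 && a->committed + size > a->limit) {
    if (!a->parked)
      a->committed = a->live;     /* a collection reclaims garbage */
    if (a->committed + size > a->limit)
      return 0;                   /* stands in for MPS_RES_COMMIT_LIMIT */
  }
  a->committed += size;
  return 1;
}

/* Perform `steps` allocations of `size` bytes, with the arena parked
 * (as during a walk) for the first `park_steps` of them. Returns 1 if
 * every allocation succeeded. */
static int run(int set_limit, size_t park_steps, size_t steps, size_t size)
{
  toy_arena a = { 0, 0, 0, 0 };
  size_t i;
  if (set_limit)
    a.limit = 10 * size;          /* tight limit: room for ten allocations */
  for (i = 0; i < steps; ++i) {
    a.parked = i < park_steps;
    if (!toy_alloc(&a, size))
      return 0;
  }
  return 1;
}

/* Each mode exercises exactly one of the two incompatible features. */
static int run_mode(test_mode mode, size_t steps, size_t size)
{
  if (mode == MODE_WALK)
    return run(0, steps, steps, size);  /* parked throughout, no limit */
  return run(1, 0, steps, size);        /* limit set, never parked */
}
```

Combining both features, `run(1, 20, 20, 1)` fails on the eleventh allocation, because the parked arena cannot collect under its ten-allocation limit; each mode on its own completes all twenty allocations.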
How found: automated_test
Evidence: None as yet.
Observed in: 1.111.0
Created by: Gareth Rees
Created on: 2013-07-17 17:22:29
Last modified by: Gareth Rees
Last modified on: 2014-04-10 15:06:58
History: 2013-07-17 GDR Created.
2014-04-08 GDR Add analysis from job003703.
2014-04-10 GDR Explain the problem and its solution.

Fixes

Change: 185430
Effect: closed
Date: 2014-04-10 15:06:58
User: Gareth Rees
Description: Fix amcssth -- don't try to test two incompatible features at the same time (see job003561).
Set the commit limit in amcss and amcsshe so that we test that the MPS can live in a tight memory limit.
Don't try to detect when the MPS has made a "dynamic" collection. This is not reliable or maintainable.