MPS issue job001549

TitleAssertion failure !AMS_IS_INVALID_COLOUR
Assigned userRichard Brooksby
DescriptionMPS assert !AMS_IS_INVALID_COLOUR(seg, i) poolams.c AMSFix

Repeatable: yes (seed 23954)
Recurrence: rare <0.1% in amsss with random seeds
Platforms: w3i3mv, xcppgc
Varieties: CI, TI

This may be a bogus assert. See related jobs:
  job000548 "AMS grain colour doesn't agree with design doc" may be related. It says " AMS object can become INVALID (notwhite & !notgrey) ... This doesn't break the pool, and can be regarded as correct".
  job000535 "Interior ambiguous pointers can give PoolAMS a headache" which changed the way the bit tables are used.

RHSK 2006-12-14
version/1.107/...@161223 == release 1.107.0


running ..\tool\ with
  w3i3mv\ci\: amcss, amsss, finalcv, awlut, awluthe, messtest
  each: rebuild and run 10 times
  loop forever

amsss.exe failed near Collection 10:

Randomize seed = 23954, encountered in iteration 29 x 10 = 290 roughly, interspersed with other tests (see list above) at CI, but not including mpsicv.


amsss fails *every time* with Randomize seed = 23954, in CI and TI only. (That is, when CONFIG_DEBUG = DIAGNOSTICS = STATISTICs = METERs are compiled in). This is version/1.107/...@161233 == release/1.107.0, on Duck. To set the seed, simply call like this:
    ./xcppgc/ci/amsss 23954

No failures yet with other seeds.

RHSK 2009-05-29
New failing seed value (seed-values have changed with improvements to rnd() in testlib.c)
.\w3i3m9\ci\amsss.exe 1243611132
asserts at poolams.c 1397

RHSK 2010-02-26
Another: ./xci3gc/ci/amsss 1267201508
poolams.c 1397

GDR 2013-05-06: Also reported by Bruce Mitchener [1].

GDR 2013-06-07: Happens with seed 1435966849 on xci6ll.
AnalysisRHSK 2006-12-14
static Res AMSFix(Pool pool, ScanState ss, Seg seg, Ref *refIO)
  i = AMS_ADDR_INDEX(seg, base);
#define AMS_IS_INVALID_COLOUR(seg, index) \
  (AMS_IS_GREY(seg, index) && !AMS_IS_WHITE(seg, index))

So: this grain is grey, but not white. Awwooga awwooga!

Could this be reading outside the BT arrays? Yes: AMSFix does *not* bounds-check it. [details moved to job001550. RHSK 2006-12-14]. Why would poolams.c start getting this wrong after all this time? Perhaps the pervasive format->headerSize changes? Regardless of culpability here, index i should be bounds-checked in an AVER. See job001550: add better asserts.

Alternatively, this could be a real correctness bug in the tables, possibly related to job0001548.

Hot topics:
  - summary accuracy (most recent work: DRJ (for Configura) on saving summaries);
  - pervasive format->headerSize changes.

I can't check the index in the debugger just yet, because the debugger is busy with job001548.

GDR 2014-04-10: When the option AMS_SUPPORT_AMBIGUOUS is FALSE, ams->shareAllocTable is TRUE: that is, segments save space by only having two bit tables, with amsseg->allocTable equal to amsseg->nonwhiteTable. (See analysis.non-moving-colour.constraint.reclaim.white-free-bit [5] for the design.) Moreover, these tables are not always in use: the implementation has flags allocTableInUse and colourTablesInUse to control access to the tables, which relies on the following invariant (which ought to be in AMSSegCheck, but is not there):

      && amsseg->allocTableInUse
      && amsseg->colourTablesInUse)

That is, if the table is shared, it may either be accessed via amsseg->allocTable, or via amsseg->nonwhiteTable, but not both at the same time. This is because the two uses have subtly different assumptions: code that accesses this table via amsseg->allocTable doesn't worry about maintaining the colour invariant, whereas code that accesses this table via amsseg->nonwhiteTable does (it mustn't let the colour become INVALID, that is, grey and nonwhite).

Now, AMSBufferEmpty() contains the following code:

    if (amsseg->allocTableInUse) {
    } else {
      amsseg->allocTableInUse = TRUE;
      BTSetRange(amsseg->allocTable, 0, amsseg->firstFree);

This uses amsseg->allocTable without checking to see if it is already in use as amsseg->nonwhiteTable. If that's the case then if any grains in the used part of the buffer are grey (as they might be if they were fixed) then the BTSetRange call will turn them INVALID. So we must guard this code with a check on ams->shareAllocTable.

(Why does this bug manifest so rarely? The bug will happen whenever AMS_SUPPORT_AMBIGUOUS is FALSE and there's a BufferDetach of a buffer containing grey grains, which surely wouldn't be quite as rare as it seems to be? Maybe the test case is just not very good at provoking the problem?)

Note that Pekka gave essentially the same analysis and solution in 2002 [6].
How foundautomated_test
Evidenceversion/1.107/...@161223 == release 1.107.0, w3i3mv\ci\amsss
version/1.107/...@161223 == release 1.107.0, xcppgc\ci\amsss 23954 on duck
mps/branch/2009-05-18/vc9exit/...@168128, .\w3i3m9\ci\amsss.exe 1243611132, on PC
[1] <>
[2] <>
[3] <>
[4] <>
[5] <>
[6] <>
Observed in1.107.0
Created byRichard Kistruck
Created on2006-12-14 11:38:32
Last modified byGareth Rees
Last modified on2014-05-16 10:57:55
History2006-12-14 RHSK Created.
2006-12-14 RHSK Move lack of BT index bounds-check into new job001550.
2006-12-14 RHSK Always fails on xcppgc CI,TI with seed 23954.
2006-12-14 RHSK Summarise occurrence.
2006-12-28 RHSK Link to job000548 and job000535.
2009-05-29 RHSK New failing seed value (seed-values have changed with improvements to rnd() in testlib.c)
2010-02-26 RHSK Another.
2013-03-19 GDR Assigned to RB.
2013-06-16 RB Downgraded to "optional" as AMS isn't affecting anyone (see job003499).
2014-04-10 GDR Explain the problem and the solution.


Change Effect Date User Description
185434 closed 2014-04-10 18:16:29 Gareth Rees Don't turn on the allocTable in AMSBufferEmpty when it's shared with nonwhiteTable and the colour tables are in use -- this will turn any grey grains in the segment invalid.
Add more checking to AMS, including the table use invariant.