MPS issue job003772

TitleAWL doesn't provoke collections
Statusclosed
Priorityessential
Assigned userGareth Rees
OrganizationRavenbrook
DescriptionAllocating objects in AWL pools does not provoke opportunistic collections via TracePoll.

There's a test case in finaltest on the 2014-04-22/condemn branch, commented out with a reference to this job.
AnalysisIt's clear why this happens: allocating objects in AWL pools does not contribute to the newSize of the generation, and that means that ChainDeferral never recommends the generation for collection.

Could this be deliberate? If so, there's no indication in the design [2]. But it's possible that this features ensures that rehashing of address-based hash tables (which are likely things to allocate in AWL pools) doesn't cause their chain to be collected (which might move objects and so force another rehash). But is this necessary now that we have made it possible to specify your own generation chain when creating an AWL pool? If you don't want the rehashing of your table to cause the objects it refers to to be moved, then put the table and the objects in different chains.

I had a quick go at implementing this and found it too difficult because it is not clear what invariant needs to be maintained (and indeed, as we'll see below, the existing code is buggy and so provides poor guidance).

Here's an analysis of the operations on the pool generation sizes in the current code:

AMC
1. BufferFill: total += segment; new += segment (unless ramping, or forHashArrays).
2. RampEnd: new += segment [i]
3. Whiten: new -= unused part of buffer [ii]
4. Reclaim: total -= segment
5. BufferEmpty: do nothing [ii]
6. Finish: total -= segments; newSize = 0

AMS
1. BufferFill: total += buffer; new += buffer; seg->newAlloc += alloc
2. Whiten: new += unused part of buffer; new -= seg->newAlloc; seg->newAlloc = unused part of buffer [v]
3. Reclaim: total -= reclaimed
4. BufferEmpty: total -= unused part of buffer [iii]; new -= unused part of buffer; seg->newAlloc -= unused part of buffer
5. Finish: nothing [iv]

LO
1. BufferFill: total += buffer; new += buffer; seg->newAlloc += buffer
2. Whiten: new -= seg->newAlloc; seg->newAlloc = 0
3. Reclaim: total -= reclaimed [vi]
4. BufferEmpty: total -= unused part of buffer [iii]; new -= unused part of buffer; seg->newAlloc -= unused part of buffer

The following points seem dodgy or wrong:
[i] forHashArrays is not taken into account when ending a ramp
[ii] the unused part of a buffer on an AMC segments continues to contribute to the newSize if it's emptied, but not if it's condemned
[iii] the unused part of AMS and LO buffers is subtracted from totalSize even though it still belongs to the pool
[iv] AMS doesn't update poolgen when finishing
[v] The logic here seems correct but needlessly obscure
[vi] reclaimed parts of an LO segment are subtracted from the total even if the segment is not deleted

So in order to be confident that the accounting has been done correctly, there needs to be an invariant that can be checked. What should that invariant consist of? Well, a generation consists of these parts:

total Total space (sum of all segments)
free Unused (free or fragmented) space
old Old allocations
new New allocations
oldDeferred Old allocations (deferred, e.g. when ramping)
newDeferred New allocations (deferred, e.g. when ramping)

Where "old" allocations are prior to the last time this generation was collected, and "new" allocations are since the last time this generation was collected. (Roughly speaking: actually the point of measurement is where the segment is whitened.)

Then we could check that total == free + old + new + oldDeferred + newDeferred, and we can check that the operations preserve the invariant: for example when a ramp finishes we can set old += oldDeferred; oldDeferred = 0; new += newDeferred; newDeferred = 0; which clearly preserves the invariant.

(Not all pools need to maintain all of these values: AMC doesn't have free space, and only AMC has deferred allocation. I guess that's what makes it tempting to implement this in different ways in the different pool classes. But it would be better to get it right consistently.)

Some more general points:
1. Why are the sizes maintained on a per-pool basis rather than a per-generation basis? The current approach means that ChainDeferral has to add up the sizes of all the poolgens each time. The pools don't seem to make any use of this per-pool information at present. (But they could use it to implement hypothetical PoolSize and PoolFreeSize functions -- see job003787).
2. LO and AMS maintain per-segment accounting in addition to the per-PoolGen accounting. Is this really necessary?
3. Why is the generation number stored in the PoolGen and not the pool? What is it used for anyway?
4. It strikes me that a better way to handle the ramping use case would be to "clamp" the chain so that it does not get collected via the poll mechanism. (Analogous to clamping the arena so that it does not start new collections.) But let's save that for another job.

New set of operations on the sizes maintains the invariant:
1. SegCreate — +total, +free
2. BufferFill — −free, +new/+newDeferred
3. Whiten — +old, −new, +oldDeferred, −newDeferred [but see below]
4. Reclaim — +free, −old/-oldDeferred
5. BufferEmpty — −new/-newDeferred, +free
6. SegFree — −total, −free

There's an annoying wrinkle: when you whiten a buffered segment, the unused part of the buffer needs to remain "new". We can handle this by adding an operation 3b: blacken unused buffer which does −old/−oldDeferred +new/+newDeferred.
How foundmanual_test
Evidence[1] <https://info.ravenbrook.com/project/mps/master/code/poolawl.c>
[2] <https://info.ravenbrook.com/project/mps/master/design/poolawl.txt>
Test procedurefinaltest
Created byGareth Rees
Created on2014-04-22 17:34:42
Last modified byGareth Rees
Last modified on2014-10-20 17:34:11
History2014-04-22 GDR Created.

Fixes

Change Effect Date User Description
186409 closed 2014-06-03 14:52:47 Richard Brooksby Merge branch mps/branch/2014-04-23/awl into the master sources