MPS issue job003898

Title: Spare committed memory in the wrong zones prevents allocation
Assigned user: Gareth Rees
Description: From [1]:

amcss failed: log follows
lii6ll/cool/amcss: randomize(): choosing initial state (v3): 2111235085.
Picked scale=32 grainSize=8


MPS_RESERVE_BLOCK: COMMIT_LIMIT: arena commit limit exceeded

Aborted (core dumped)
Analysis: Here's one that is reproducible on OS X when ASLR is disabled:

    $ noaslr xci6ll/cool/amcss 138581164
    Picked scale=8 grainSize=32768
    MPS_RESERVE_BLOCK: COMMIT_LIMIT: arena commit limit exceeded

To debug this (line numbers as of changelevel 187491):

    $ lldb xci6ll/cool/amcss
    (lldb) br set -f amcss.c -l 65 -c 'nCollsStart == 19'
    (lldb) process launch -A 138581164
    (lldb) c
    (lldb) br set -f amcss.c -l 102 -c 'calls == 111308'
    (lldb) c

This puts us at the failed reserve. Here we find that:

    committed = 8183808
    commitLimit = 8192000
    spareCommitted = 7241728
    spareCommitLimit = 10485760
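
The arithmetic behind these numbers can be checked with a small sketch. The function names are hypothetical and are not part of the MPS; the grain size of 32768 bytes comes from the "Picked scale=8 grainSize=32768" line above. The last case assumes that spare committed memory counts toward the committed total, which is what point 2 of the analysis implies.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that can still be committed before reaching the commit limit.
   Hypothetical helper, not an MPS function. */
static size_t commit_headroom(size_t committed, size_t commit_limit)
{
    assert(committed <= commit_limit);
    return commit_limit - committed;
}

/* Nonzero if committing `size` more bytes would exceed the commit limit. */
static int would_exceed_commit_limit(size_t committed, size_t commit_limit,
                                     size_t size)
{
    return size > commit_headroom(committed, commit_limit);
}
```

With committed = 8183808 and commitLimit = 8192000 the headroom is 8192 bytes, so a single 32768-byte grain does not fit. Under the stated assumption, releasing the 7241728 bytes of spare committed memory would leave ample room.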

The reserve will fit in a single arena grain (32768 bytes) but the zone preference is


but the arena's free land only contains the following blocks, none of which have any space in this zone:

    [0000000101020000,0000000101200000) {1966080, 0000000000000000000000000000000011111111111111111111111111111100}
    [0000000101218000,00000001013E0000) {1966080, 0011111111111111111111111111111011111111111111111111111111111100}
    [00000001013F0000,0000000101410000) {1966080, 1011111111111111111111111111111011111111111111111111111111111101}
    [0000000101420000,00000001017D0000) {3866624, 1011111111111111111111111111111111111111111111111111111111111101}
    [00000001017F0000,0000000101810000) {3932160, 1011111111111111111111111111111111111111111111111111111111111101}
    [0000000101820000,0000000101BE0000) {3932160, 1011111111111111111111111111111111111111111111111111111111111101}
    [0000000101BF0000,0000000101C10000) { 131072, 1000000000000000000000000000000000000000000000000000000000000001}
    [0000000101C20000,0000000101FA8000) {3702784, 1011011111111111111111111111111111111111111111111111111111111101}
    [0000000101FC0000,0000000101FE0000) { 131072, 0011000000000000000000000000000000000000000000000000000000000000}
    [0000000101FF0000,0000000102010000) {3932160, 1011111111111111111111111111111111111111111111111111111111111101}
    [0000000102020000,00000001023E0000) {3932160, 1011111111111111111111111111111111111111111111111111111111111101}
    [00000001023F0000,0000000102410000) {3932160, 1011111111111111111111111111111111111111111111111111111111111101}
    [0000000102420000,0000000102780000) {3538944, 0000000011111111111111111111111111111111111111111111111111111100}
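
The claim that no free block overlaps the preferred zones can be checked mechanically. The zone preference itself is missing from the text above, so this sketch takes it as a parameter. It assumes that the leftmost character of each zone set is the highest-numbered zone; the function names are hypothetical and are not MPS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zone sets of the free blocks, copied from the listing above. */
static const char *const free_block_zones[] = {
    "0000000000000000000000000000000011111111111111111111111111111100",
    "0011111111111111111111111111111011111111111111111111111111111100",
    "1011111111111111111111111111111011111111111111111111111111111101",
    "1011111111111111111111111111111111111111111111111111111111111101",
    "1011111111111111111111111111111111111111111111111111111111111101",
    "1011111111111111111111111111111111111111111111111111111111111101",
    "1000000000000000000000000000000000000000000000000000000000000001",
    "1011011111111111111111111111111111111111111111111111111111111101",
    "0011000000000000000000000000000000000000000000000000000000000000",
    "1011111111111111111111111111111111111111111111111111111111111101",
    "1011111111111111111111111111111111111111111111111111111111111101",
    "1011111111111111111111111111111111111111111111111111111111111101",
    "0000000011111111111111111111111111111111111111111111111111111100",
};

/* Parse a string of '0'/'1' characters into a bit set.
   The last character becomes bit 0 (assumed to be zone 0). */
static uint64_t parse_zone_set(const char *bits)
{
    uint64_t zs = 0;
    assert(bits != NULL);
    for (; *bits == '0' || *bits == '1'; ++bits)
        zs = (zs << 1) | (uint64_t)(*bits - '0');
    return zs;
}

/* Union of the zone sets of all free blocks. */
static uint64_t free_zone_union(void)
{
    uint64_t u = 0;
    size_t i;
    size_t n = sizeof free_block_zones / sizeof free_block_zones[0];
    for (i = 0; i < n; ++i)
        u |= parse_zone_set(free_block_zones[i]);
    return u;
}

/* Nonzero if any free block has space in one of the preferred zones. */
static int free_land_has_zone(uint64_t preferred)
{
    return (free_zone_union() & preferred) != 0;
}
```

The complement of `free_zone_union()` gives the zones that appear in no free block; a preference confined to those zones cannot be satisfied from the free land.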

This is due to three interacting problems with arenaAllocPolicy:

1. Even though arenaAllocPolicy can't allocate in the preferred zones, it has fallback code for allocating in other zones (of which there are plenty in this situation). Why isn't this fallback code being used? Because arenaAllocPolicy first tries extending the arena; this fails when it hits the commit limit (see job003899), and the failure causes arenaAllocPolicy to return immediately instead of falling back to the other zones.

If arenaAllocPolicy is changed to continue from a failure to grow the arena, so that the fallback code is exercised, then this test case passes.
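
The control-flow change described in point 1 can be illustrated with a toy model. This is a sketch only, not the code of arenaAllocPolicy: the types, names, and result codes are all hypothetical, and each "plan" is reduced to a flag saying whether it would succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this toy model (not the MPS Res type). */
typedef enum { TOY_OK, TOY_COMMIT_LIMIT, TOY_RESOURCE } ToyRes;

/* Hypothetical summary of the arena state relevant to the policy. */
typedef struct {
    int preferred_zones_have_space; /* free land overlaps the preferred zones */
    int can_extend_arena;           /* a new chunk fits under the commit limit */
    int other_zones_have_space;     /* free land exists in non-preferred zones */
} ToyArena;

static ToyRes toy_alloc_policy(const ToyArena *arena,
                               int continue_after_grow_failure)
{
    assert(arena != NULL);

    /* First choice: allocate in the preferred zones. */
    if (arena->preferred_zones_have_space)
        return TOY_OK;

    /* Second choice: extend the arena and allocate in the new chunk. */
    if (arena->can_extend_arena)
        return TOY_OK;

    /* Behaviour described in the report: a failed extension returns at once. */
    if (!continue_after_grow_failure)
        return TOY_COMMIT_LIMIT;

    /* Proposed behaviour: carry on and fall back to the other zones. */
    if (arena->other_zones_have_space)
        return TOY_OK;

    return TOY_RESOURCE;
}
```

In the situation analysed here (no space in the preferred zones, arena cannot grow, plenty of space elsewhere) the model fails with the flag clear and succeeds with it set.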

2. Unmapping even a small amount of spare committed memory would have enabled us to create a new chunk and so continue allocating. But the limit on spare committed memory is a fixed amount (rather than a proportion of the committed memory) given by ARENA_INIT_SPARE_COMMIT_LIMIT, which is 10 megabytes (the spareCommitLimit of 10485760 bytes shown above). (This problem is noted in a TODO in config.h.)

And sure enough, adding a call to mps_arena_spare_commit_limit_set to the test case, with a sensible proportion of the initial chunk, allows the test case to pass.
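
A proportional limit of this kind could be computed as in the sketch below. The helper name and the percentage are hypothetical; the report does not say what proportion was used in the test case, nor the size of the initial chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Spare commit limit as a percentage of some base size.
   Hypothetical helper; divides first to avoid overflow on large bases,
   so the result is rounded down to a multiple of `percent`. */
static size_t proportional_spare_limit(size_t base, unsigned percent)
{
    assert(percent <= 100);
    return base / 100 * percent;
}

/* In a test case the result would then be passed to the MPS, e.g.
 *
 *     mps_arena_spare_commit_limit_set(arena, limit);
 *
 * which needs mps.h and a live arena, so it is left as a comment here. */
```

For example, 10% of a base of 8192000 bytes is 819200 bytes, far below the fixed 10 megabyte default.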

3. Even with spare committed memory implemented as a proportion rather than a fixed value, we might still end up in a situation where the spare committed memory is preventing us from creating a new chunk. So arenaAllocPolicy could be changed to work like this:

Plan C: try extending the arena, then try A & B again
Plan D: release the spare committed memory, then try C again
Plan E: add every zone that isn't blacklisted

Implemented point 2 in changelist 189362. Implemented point 1 in changelist 189363. Either one in isolation fixes the problem. I was able to reproduce on OS X with "xc/Debug/amcss 99" and observe the changelists fixing the problem in lldb. Point 3 doesn't quite make sense, and is covered by changelist 189363. RB 2016-02-26
How found: automated_test
Evidence: [1] <>
Created by: Gareth Rees
Created on: 2014-11-07 12:22:53
Last modified by: Gareth Rees
Last modified on: 2016-03-13 00:26:14
History: 2014-11-07 GDR Created.


Change  Effect  Date                 User              Description
189397  closed  2016-02-29 12:36:05  Richard Brooksby  Merging branch/2016-02-26/job003898 into master sources.