MPS issue job003893

TitleArena contracted callback gets called with invalid arena
Statusclosed
Priorityessential
Assigned userGareth Rees
OrganizationRavenbrook
DescriptionOn platform ananll (really lii6ll), amsss segfaults with seed 1273421766 [1]. Not reproducible with this seed on xci6ll.
Analysisgdb shows the following backtrace:

#0 0x000000000047b033 in TreeCheck (tree=0x7ffff71fa028) at tree.c:28
#1 0x00000000004065a8 in ArenaCheck (arena=0x6bc000) at arena.c:168
#2 0x000000000040f70d in vmArenaTrivContracted (arena=0x6bc000,
    base=0x7ffff71fa000, size=1048576) at arenavm.c:486
#3 0x000000000040e5db in vmChunkCompact (tree=0x7ffff71fa028,
    closureP=0x6bc000, closureS=1375995373) at arenavm.c:1127
#4 0x000000000047c674 in TreeTraverseAndDelete (treeIO=0x6bc3d8,
    visitor=0x40e3c0 <vmChunkCompact>, closureP=0x6bc000, closureS=1375995373)
    at tree.c:558
#5 0x000000000040d6fe in VMCompact (arena=0x6bc000, trace=0x6bc698)
    at arenavm.c:1150

In vmChunkCompact the code looks like this:

    vmChunkDestroy(tree, UNUSED_POINTER, UNUSED_SIZE);
    vmArena->contracted(arena, base, size);
    return TRUE;

The call to vmChunkDestroy puts the arena into an invalid state (because the tree node was embedded in the Chunk and so in the VM, which has been destroyed) which is corrected immediately after the return (to TreeTraverseAndDelete).

Solution ideas:

1. Call vmArena->contracted before destroying the chunk instead of afterwards. (Simple, robust, and avoids a race condition in which the freed address space gets reused before vmArena->contracted gets called.)

2. Use a message to inform the client program, instead of a callback. (But will this work? The point of the callbacks is so that on Windows, a compiler that is putting executable code into memory managed by the MPS can register this memory with the Microsoft runtime for stack unwinding via RtlInstallFunctionTableCallback [2]. The registration of the chunk needs to be prompt, otherwise if you get an exception in code that's been allocated in a fresh chunk before you get back to the message-handling loop, then Windows won't know how to unwind the stack. Maybe deregistration could be lazy but then the race condition gets more severe.)

3. (Suggested by NB.) Change the VMCompact algorithm so that it does not destroy the chunks inside the TreeTraverseAndDelete. Instead, we could do the operation in two passes: first, remove chunks from the arena's chunkTree (putting them into another structure); second, iterate over the other structure destroying the chunks. (Complicated and error-prone.)
How foundautomated_test
Evidence[1] https://travis-ci.org/Ravenbrook/mps-temporary/jobs/37985784
[2] http://msdn.microsoft.com/en-gb/library/windows/desktop/ms680595.aspx
Test procedureamsss
Created byGareth Rees
Created on2014-10-14 23:11:55
Last modified byGareth Rees
Last modified on2014-10-20 23:07:34
History2014-10-14 GDR Created.
2014-10-20 GDR Analysis, upgrade to essential.

Fixes

Change Effect Date User Description
187294 closed 2014-10-20 16:40:32 Gareth Rees Call the "contracted" callback before destroying the chunk, as the arena is (briefly) invalid afterwards.