|Title||MPS _gc_start messages may cause assert or infinite loop|
|Assigned user||Richard Kistruck|
|Description||MPS _gc_start messages may cause assert or infinite loop|
job001570 "MPS _gc_start messages may change while client reads them, or be silently skipped"
If the client enables _gc_start messages with:
and a collection starts before the client has collected (with
mps_message_get) the _gc_start message from the preceding
collection, then the message MPS queue becomes corrupted. This
causes an assert, or (in varieties without asserts) prevents the
message from being removed from the queue. This unremovable
message is likely to cause the client to loop forever if it
attempts to Get the message, and will cause such a loop if the
client calls mps_arena_destroy.
A typical assert is:
MPS ASSERTION FAILURE: !RingIsSingle(_old)
Mitigation: clients that do not enable _gc_start messages are
unaffected; a client that enables them and then frequently checks
for and collects them can apparently prevent the defect occurring.
Discovered when looking at the code. There is one GCSTART message,
statically allocated, per pre-allocated TraceStruct. If it's been
Posted but not yet Got, then it's on the arena message Ring. When
the trace ends, the TraceStruct is Del'd from arena->busyTraces,
even though the MessageStruct et al is still in use. When a new
trace starts, it re-uses that TraceStruct, treating it as
uninitialised memory, calls TraceCreate, which calls MessageInit,
which calls RingInit, thus corrupting the arena message Ring.
Now it's right and proper that GCSTART messages are pre-allocated,
in accordance with the principle that we avoid allocating when
starting a trace. But it's just wrong to statically allocate them:
they should be dynamically allocated with ControlAlloc, like other
messages. Yes: if the client enables GCSTART and never collects
the messages, then they will consume more & more memory. This is
just not a problem: the client should not fail to collect messages
it has enabled!
Surprisingly, the message queue limps on for a long time after
corruption with a single RingInited GCSTART message. Appending new
messages (eg. finalization messages) works, and heals the
next-direction of the ring, which allows these messages to then be
Got as normal. Re-adding the same GCSTART message, when already
present, also 'works'. But RingRemove simply leaves the ring
unchanged; this causes an infinite loop.
Fixed with a complete redesign of GC message lifecycle.
|Created by||Richard Kistruck|
|Created on||2008‑11‑28 17:24:45|
|Last modified by||Richard Kistruck|
|Last modified on||2008‑12‑19 16:30:14|
|History||2008-11-28 RHSK Created.|
2008-12-19 RHSK Fixed with a complete redesign of GC message lifecycle.
|166995||closed||2008‑12‑19 16:25:20||Richard Kistruck||MPS br/timing design/message-gc: Complete re-write, for new GC message lifecycle. See job001989.|
|166993||closed||2008‑12‑19 14:27:48||Richard Kistruck||MPS br/timing: rename z001989a.c as zmess.c|
|166919||closed||2008‑12‑11 16:08:47||Richard Kistruck||MPS br/timing z001989a.c: oops, re-enable test of delayeed getting
of messages. Demonstrates fix for job001989.
|166918||closed||2008‑12‑11 16:06:14||Richard Kistruck||MPS br/timing TraceIdMessages: Create them at mps_arena_create time,
re-create them immediately after a trace sends its last message.
Store them, one per TraceId, in ArenaStruct. Cope correctly if they
cannot be created (ControlAlloc fails). Destroy them correctly on
mps_arena_destroy. See job001989 and design/message-gc#lifecycle
(not yet written).
|166881||open||2008‑12‑05 22:14:56||Richard Kistruck||MPS br/timing z001989a: get collection begin/end messages.
FAILS: thereby demonstrating defect job001989.