MPS issue job001570

Title: MPS _gc_start messages may change while client reads them, or be silently skipped
Status: closed
Priority: optional
Assigned user: Richard Kistruck
Organization: Ravenbrook
Description: MPS _gc_start messages may change while client reads them, or be silently skipped

Related jobs:
job001989 "MPS _gc_start messages may cause assert or infinite loop"

RHSK 2006-12-27
If the client processes the _gc_start message (via mps_message_get(), mps_message_gc_start_why(), and mps_message_discard()) 'too late', then the collection that the message describes may already have finished and another one begun.

In this case, the MPS will overwrite the data in the first message (such as the "why" reason that the collection started), with data pertaining to the second collection, while the client is still reading the message.

Furthermore, no message will be posted for the second collection start.

This arises because the MPS unsafely reuses a single message structure for all _gc_start messages.

There is no indication that overwriting has occurred or that _gc_start messages have been omitted.
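The failure mode described above can be illustrated with a small stand-alone model. This is a sketch only, not MPS source: the type GcStartMessage and the functions collection_start(), client_get() and client_discard() are hypothetical stand-ins for the originator code and for mps_message_get() and mps_message_discard(). It assumes a single statically allocated message structure that is rewritten on every collection start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not MPS source: one message structure shared by
 * every collection start. */
typedef struct {
    const char *why;  /* reason the collection started */
    int posted;       /* 1 while queued or held by the client */
} GcStartMessage;

static GcStartMessage shared = { NULL, 0 };

/* Originator side: called whenever a collection starts. */
static void collection_start(const char *why)
{
    shared.why = why;       /* overwrites even if the client holds it */
    if (!shared.posted)
        shared.posted = 1;  /* only the first start posts a message */
}

/* Client side: fetch the message (stand-in for mps_message_get()). */
static GcStartMessage *client_get(void)
{
    return shared.posted ? &shared : NULL;
}

/* Client side: release it (stand-in for mps_message_discard()). */
static void client_discard(GcStartMessage *m)
{
    assert(m == &shared);
    m->posted = 0;
}
```

In this model, a client that calls client_get() after the first collection start and is still holding the message when collection_start() runs again will see the "why" field change under it, and after client_discard() there is no second message to get.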

Overwrites and omissions could be avoided by the client processing _gc_start messages 'in time', in other words before a second collection has begun.

However, this is not always possible: MPS collection scheduling may advance the first collection very 'fast' (relative to client speed) by setting a high trace->rate, in other words a very small nPolls, perhaps even as low as 1, thereby preventing the client from processing the _gc_start message.
Analysis: RHSK 2006-12-27
See design/message for the lifecycle of a message. After the client calls Get(), the message is in state "received". It would be clearer to call this state "atclient".

The MPS Message System provides no support for an Originator that needs to know when the message is still "atclient". This support should be added at the Message System level, perhaps with a new function MessageAtClient().

Examine Originator code for both _gc_start and _finalize messages, and consider using MessageAtClient().
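One possible shape for such a check is sketched below. Only the name MessageAtClient() comes from the analysis above; its signature, the MessageState enumeration, and the helper functions originator_post(), client_get() and client_discard() are assumptions made for illustration, and the real Message System structures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifecycle states; "at client" is the state that
 * design/message calls "received". */
typedef enum { MSG_IDLE, MSG_QUEUED, MSG_AT_CLIENT } MessageState;

typedef struct {
    MessageState state;
    const char *why;
} Message;

/* Proposed query: is the client currently holding this message? */
static int MessageAtClient(const Message *m)
{
    return m->state == MSG_AT_CLIENT;
}

/* Originator: fill in and post the message only when that is safe.
 * Returns 1 if posted, 0 if the message was left untouched. */
static int originator_post(Message *m, const char *why)
{
    if (MessageAtClient(m))
        return 0;               /* client is reading it: do not modify */
    if (m->state == MSG_QUEUED)
        return 0;               /* already pending: do not modify */
    m->why = why;
    m->state = MSG_QUEUED;
    return 1;
}

/* Client: take the queued message, if any. */
static Message *client_get(Message *m)
{
    if (m->state != MSG_QUEUED)
        return NULL;
    m->state = MSG_AT_CLIENT;
    return m;
}

/* Client: hand the message back so the originator may reuse it. */
static void client_discard(Message *m)
{
    assert(m->state == MSG_AT_CLIENT);
    m->state = MSG_IDLE;
}
```

With this guard the originator never changes a message the client is reading, but a collection start that arrives in the meantime is still not reported.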

Note that it is prudent to pre-allocate _gc_start messages, as there may not be memory available when a GC starts.

DRJ 2007-06-19

MPS should not change message whilst it is at client (using
MessageAtClient, see above).

MPS should have some way of saying how many messages were "dropped".
For example a field in each message could record how many messages were
dropped prior to this one.
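A dropped-message count of this kind might look like the sketch below. All names (Message, Originator, dropped_before, pending_drops, originator_post(), client_discard()) are hypothetical and chosen for illustration; this is not the interface that was eventually implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *why;
    unsigned dropped_before;  /* messages dropped prior to this one */
    int busy;                 /* 1 while queued or at the client */
} Message;

typedef struct {
    Message msg;
    unsigned pending_drops;   /* drops since the last successful post */
} Originator;

/* Post a message if the structure is free; otherwise count a drop.
 * Returns 1 if posted, 0 if dropped. */
static int originator_post(Originator *o, const char *why)
{
    if (o->msg.busy) {
        o->pending_drops++;
        return 0;
    }
    o->msg.why = why;
    o->msg.dropped_before = o->pending_drops;
    o->pending_drops = 0;
    o->msg.busy = 1;
    return 1;
}

/* Client has finished with the message. */
static void client_discard(Originator *o)
{
    assert(o->msg.busy);
    o->msg.busy = 0;
}
```

A client that reads dropped_before on each message can then tell that collections started without being reported, even though it cannot recover their details.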

RHSK 2008-12-19
Fixed with a complete redesign of GC message lifecycle.
How found: unknown
Evidence: http://info.ravenbrook.com/infosys/cgi/perfbrowse.cgi?@describe+161203
Observed in: 1.107.0
Introduced in: 1.107.0
Created by: Richard Kistruck
Created on: 2006-12-27 16:35:51
Last modified by: Richard Kistruck
Last modified on: 2008-12-19 16:30:38
History: 2006-12-27 RHSK Created.
2007-06-19 DRJ Notes on what would be desirable.
2008-11-28 RHSK Link to job001989
2008-12-19 RHSK Fixed with a complete redesign of GC message lifecycle.

Fixes

Change Effect Date User Description
166995 closed 2008-12-19 16:25:20 Richard Kistruck MPS br/timing design/message-gc: Complete re-write, for new GC message lifecycle. See job001989.
166993 closed 2008-12-19 14:27:48 Richard Kistruck MPS br/timing: rename z001989a.c as zmess.c
166919 closed 2008-12-11 16:08:47 Richard Kistruck MPS br/timing z001989a.c: oops, re-enable test of delayeed getting
of messages. Demonstrates fix for job001989.
166918 closed 2008-12-11 16:06:14 Richard Kistruck MPS br/timing TraceIdMessages: Create them at mps_arena_create time,
re-create them immediately after a trace sends its last message.
Store them, one per TraceId, in ArenaStruct. Cope correctly if they
cannot be created (ControlAlloc fails). Destroy them correctly on
mps_arena_destroy. See job001989 and design/message-gc#lifecycle
(not yet written).
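
The lifecycle described in change 166918 can be sketched roughly as follows. This is a simplified model under stated assumptions, not the MPS implementation: the Arena and Message types, the TRACE_LIMIT value, the alloc_fails test hook, and all function names here are hypothetical, and control_alloc() merely stands in for ControlAlloc.

```c
#include <assert.h>
#include <stdlib.h>

#define TRACE_LIMIT 1  /* assumption: number of TraceId slots */

typedef struct { const char *why; } Message;

typedef struct {
    Message *startMessage[TRACE_LIMIT];  /* NULL if creation failed */
} Arena;

static int alloc_fails = 0;  /* test hook: simulate allocation failure */

static Message *control_alloc(void)
{
    return alloc_fails ? NULL : calloc(1, sizeof(Message));
}

/* Create one message per TraceId at arena creation time. */
static void arena_create(Arena *a)
{
    unsigned ti;
    for (ti = 0; ti < TRACE_LIMIT; ti++)
        a->startMessage[ti] = control_alloc();
}

/* Trace start: hand the pre-allocated message over to the client side.
 * Returns NULL (nothing posted) if no message could be created. */
static Message *trace_post_start(Arena *a, unsigned ti, const char *why)
{
    Message *m;
    assert(ti < TRACE_LIMIT);
    m = a->startMessage[ti];
    if (m == NULL)
        return NULL;
    m->why = why;
    a->startMessage[ti] = NULL;  /* the arena no longer owns it */
    return m;
}

/* After the trace sends its last message: re-create for the next trace. */
static void trace_finished(Arena *a, unsigned ti)
{
    assert(ti < TRACE_LIMIT);
    if (a->startMessage[ti] == NULL)
        a->startMessage[ti] = control_alloc();
}

/* Client is done with a message it received. */
static void client_discard(Message *m) { free(m); }

/* Destroy any messages still held by the arena. */
static void arena_destroy(Arena *a)
{
    unsigned ti;
    for (ti = 0; ti < TRACE_LIMIT; ti++) {
        free(a->startMessage[ti]);
        a->startMessage[ti] = NULL;
    }
}
```

Because each posted message is a separate object owned by the client until it is discarded, a later trace cannot overwrite a message that the client is still reading.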