MPS issue job004099

Title: Monitor not usable for long-running processes
Assigned user: Gareth Rees
Description: The monitor system consumes telemetry events and maintains a model of the internal state of the MPS, along with time series of events in that model. If the telemetry file is larger than about 1 GB, the monitor stops working. We're not completely sure why this is: either the monitor window never comes up, or it comes up and doesn't display anything meaningful. More analysis is required. The telemetry file grows at about 2-3 MB per second of execution of a busy client program if arena/pool/user/trace events are turned on (on the machine called ROBIN, in mid-2018), and at about 20-30 MB per second of execution if 'seg' events are also turned on (these are important for some analysis). So we can't use the monitor to analyse runs of more than about 40 seconds (with seg events) or about 400 seconds (without them). We need to be able to study and diagnose MPS problems in long-running programs (running for hours).
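The run-length limits above follow from dividing the approximate 1 GB ceiling by the growth rates. A rough check, using only the figures quoted in the description (the function name and the 1000 MB rounding are illustrative assumptions, not part of the monitor):

```python
# Back-of-envelope estimate using the figures quoted in the description.
# LIMIT_MB and seconds_until_limit are illustrative names, not monitor code.
LIMIT_MB = 1000  # approximate file size at which the monitor stops working


def seconds_until_limit(rate_mb_per_s, limit_mb=LIMIT_MB):
    """Seconds of execution before the telemetry file reaches the limit."""
    return limit_mb / rate_mb_per_s


# Growth rates in MB per second of execution, as (low, high) ranges.
rates = {
    "without seg events": (2, 3),
    "with seg events": (20, 30),
}

for label, (low, high) in rates.items():
    shortest = seconds_until_limit(high)  # the fastest growth hits the limit first
    longest = seconds_until_limit(low)
    print(f"{label}: roughly {shortest:.0f}-{longest:.0f} s of execution")
```

The midpoints of the two ranges (2.5 and 25 MB per second) give the 400-second and 40-second figures quoted above.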
Analysis:
 1. Although there is a way for the client to turn telemetry output on and off (mps_telemetry_control() etc), the monitor can't cope with a telemetry stream that starts in mid-execution. The telemetry subsystem could be enhanced to dump some MPS state when the telemetry mask changes. Then the client could turn telemetry on when it runs into difficulty with the MPS, and turn it off again afterwards, and the resulting telemetry file would have enough information for a monitor to do something useful.
 2. The monitor should at least *notice* if it gets a partial telemetry stream, and complain. Ideally, it would be able to build and analyse a partial model.
 3. The monitor could have a 'recording mode', which controls the creation of time series. It would maintain its model continuously, but only add to the time series when 'recording mode' is switched on (by a user interface button, or something on the command line, or even a user event). If the monitor's problems with huge telemetry files are caused by the sheer size of the time series objects on the Python heap, such a change would help (although we'd still have telemetry files that are tens of GB in size).
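
Points 2 and 3 could be sketched along the following lines. This is a minimal illustration, not the monitor's actual code: the class name, the event kinds ("pool_create", "alloc", "free", "pool_destroy") and the per-pool byte count are all hypothetical stand-ins for the real telemetry events and model. The idea is that the model is updated on every event, time series are appended to only while recording is on, and an event that refers to an object the monitor never saw created marks the stream as partial.

```python
from collections import defaultdict


class RecordingMonitor:
    """Hypothetical sketch: keep the model current, record series on demand."""

    def __init__(self):
        self.pool_sizes = {}             # model: pool id -> bytes allocated
        self.series = defaultdict(list)  # pool id -> [(time, bytes), ...]
        self.recording = False           # 'recording mode' switch (point 3)
        self.partial = False             # stream started mid-execution (point 2)

    def handle(self, time, kind, pool, size=0):
        """Process one (invented) telemetry event."""
        if kind == "pool_create":
            self.pool_sizes[pool] = 0
        elif pool not in self.pool_sizes:
            # Event for a pool whose creation was never seen: notice the
            # partial stream, then carry on with a best-effort model.
            self.partial = True
            self.pool_sizes[pool] = 0

        if kind == "alloc":
            self.pool_sizes[pool] += size
        elif kind == "free":
            self.pool_sizes[pool] -= size
        elif kind == "pool_destroy":
            del self.pool_sizes[pool]
            return

        # The model is always updated above; the time series grows only
        # while recording is switched on.
        if self.recording:
            self.series[pool].append((time, self.pool_sizes[pool]))


m = RecordingMonitor()
m.handle(0.0, "pool_create", "A")
m.handle(1.0, "alloc", "A", 64)  # model updated, nothing recorded
m.recording = True
m.handle(2.0, "alloc", "A", 32)  # recorded as (2.0, 96)
m.handle(3.0, "alloc", "B", 8)   # pool B never seen created: partial stream
```

In this sketch the sizes recorded for a pool first seen in mid-stream are relative to an assumed starting size of zero, so they are only meaningful as deltas. A state dump from the telemetry subsystem (point 1) would be needed to make them absolute.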
How found: inspection
Evidence: Experimentation with multi-GB telemetry files, as described in 'Description'.
Created by: Nick Barnes
Created on: 2018-07-24 14:31:16
Last modified by: Gareth Rees
Last modified on: 2018-12-13 13:51:24
History: 2018-07-24 NB Created.