MPS issue job001153

Title: The MPS lacks error recovery mechanisms
Status: closed
Priority: essential
Assigned user: Richard Brooksby
Organization: Ravenbrook
Description: Currently, if there's a protocol violation, we assert out. (What exactly happens then depends on the plinth, but we don't allow it to return in this case.) This could be disastrous in a deployment environment. If this happens in a customer environment there might be something better we could do to allow work to continue or data to be recovered.
Analysis: We should first define the requirements. Some things to think about are the recovery tactics that might be needed in the environment, what kind of backtrace information could be included in automated bug reporting in order to diagnose the problem, and friendliness or transparency to the end user: in particular not losing their work.

Not only the mps.h interface, but also the format interface, the plinth interface, etc. [RHSK+NB 2005-10-06]
Currently, internal knowledge that mps_foo() cannot go wrong has led us to define mps_foo() as not returning a status code. This is not future-proof. NB cites pthread's use of EINVAL. [RHSK+NB 2005-10-06]

Classes of protocol violation include: garbage values, incorrect call order or state, ... [RHSK+NB 2005-10-06]
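
As an illustration of the notes above, here is a minimal sketch of an interface in which every entry point returns a status code, so that garbage values and incorrect call order are reported to the caller rather than asserted on. All names here (res_t, pool_t, pool_init and so on) are hypothetical and are not part of the MPS interface; the sketch also assumes the caller zeroes a pool_t before first use.

```c
#include <stddef.h>

/* Hypothetical result codes for illustration; not the MPS's own. */
typedef enum {
    RES_OK = 0,   /* call succeeded */
    RES_PARAM,    /* garbage argument value */
    RES_STATE     /* call made in the wrong order or state */
} res_t;

#define POOL_SIG 0x519B0071u /* arbitrary signature marking a live pool */

typedef struct {
    unsigned sig;     /* POOL_SIG between init and finish, otherwise 0 */
    size_t allocated; /* bytes handed out so far */
} pool_t;

res_t pool_init(pool_t *pool)
{
    if (pool == NULL)
        return RES_PARAM;
    pool->sig = POOL_SIG;
    pool->allocated = 0;
    return RES_OK;
}

res_t pool_alloc(pool_t *pool, size_t size)
{
    if (pool == NULL || size == 0)
        return RES_PARAM;          /* garbage value */
    if (pool->sig != POOL_SIG)
        return RES_STATE;          /* used before init or after finish */
    pool->allocated += size;
    return RES_OK;
}

res_t pool_finish(pool_t *pool)
{
    if (pool == NULL)
        return RES_PARAM;
    if (pool->sig != POOL_SIG)
        return RES_STATE;          /* finished twice, or never initialised */
    pool->sig = 0;
    return RES_OK;
}
```

Even pool_finish, which "cannot go wrong" when used correctly, returns a code here, in the same spirit as the pthread functions that return EINVAL.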

This is especially important if we plan to have CET ship with the "hot" variety with assertions. CET users don't want to lose their work if there's a possibility that the MPS can limp along.

To solve this in the short term, the MPS should not stop the program on assertion in production, but instead log the information somehow for later inspection. In the longer term, the MPS ought to go into recovery mode and get a message to the client program (and perhaps the user) that things are amiss and they should save work and restart as soon as possible; i.e. the MPS should "raise an exception" in some sense.
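
A minimal sketch of the short-term and longer-term ideas in the paragraph above, assuming a hypothetical CHECK macro and helper functions (none of these names come from the MPS): a failed check is recorded for later inspection instead of stopping the program, and the client can poll a flag that tells it to save work and restart.

```c
#include <stdio.h>

#define LOG_MAX 8 /* keep only the first few failures */

typedef struct {
    const char *file;
    unsigned line;
    const char *condition;
} failure_t;

static failure_t failure_log[LOG_MAX];
static unsigned failure_count = 0; /* total failed checks seen so far */

/* Record a failed check and return, rather than aborting. */
void check_failed(const char *file, unsigned line, const char *condition)
{
    if (failure_count < LOG_MAX) {
        failure_log[failure_count].file = file;
        failure_log[failure_count].line = line;
        failure_log[failure_count].condition = condition;
    }
    ++failure_count;
}

/* Hypothetical non-fatal assertion: logs on failure, then continues. */
#define CHECK(cond) \
    ((cond) ? (void)0 : check_failed(__FILE__, __LINE__, #cond))

/* The client polls this; nonzero means "save work and restart soon". */
int recovery_needed(void)
{
    return failure_count > 0;
}

/* Write the recorded failures to a stream; returns how many were written. */
unsigned failure_report(FILE *stream)
{
    unsigned i;
    unsigned n = failure_count < LOG_MAX ? failure_count : LOG_MAX;
    for (i = 0; i < n; ++i)
        fprintf(stream, "%s:%u: check failed: %s\n",
                failure_log[i].file, failure_log[i].line,
                failure_log[i].condition);
    return n;
}
```

A polled flag is only one way to "raise an exception" to the client; a callback installed by the client would serve the same purpose.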
How found: customer
Evidence: Raw notes from Configura workshop <http://info.ravenbrook.com/mail/2005/02/28/12-40-49/0.txt>.

Can CET use the default plinth? <http://localhost:10080/mail/2012/09/20/23-28-08/0/> et seq.
Observed in: 1.105.0
Created by: Richard Brooksby
Created on: 2005-03-09 18:25:22
Last modified by: Gareth Rees
Last modified on: 2014-04-12 21:55:55
History: 2005-03-09 RB Created.
2005-10-06 RHSK More analysis (with NB).
2012-09-21 RB Upgraded to critical for CET.
2013-05-25 RB Can now be closed since CET can implement user warnings through the installable handler.

Fixes

Change Effect Date User Description
181943 closed 2013-05-19 19:12:47 Richard Brooksby Adding section about assertion handling to the Error topic of the manual.

Imported from Git
 Author: Richard Brooksby <rptb1+github@pobox.com> 1368987167 +0100
 Committer: Richard Brooksby <rptb1+github@pobox.com> 1368987167 +0100
 sha1: fac14cc5d1cba94a37d31bdf1497384014140f7a
181942 open 2013-05-19 18:56:52 Richard Brooksby Documenting mps_lib_assert_fail_install in the Plinth section of the manual.

Imported from Git
 Author: Richard Brooksby <rptb1+github@pobox.com> 1368986212 +0100
 Committer: Richard Brooksby <rptb1+github@pobox.com> 1368986212 +0100
 sha1: cca9721a07b2459b5ba424687690dea48bd6d969
181941 open 2013-05-19 18:17:07 Richard Brooksby Adding always-die assertion handler to testlib, and using it in all tests. Verified that the test fails to stop in HOT variety otherwise.

Imported from Git
 Author: Richard Brooksby <rptb1+github@pobox.com> 1368983827 +0100
 Committer: Richard Brooksby <rptb1+github@pobox.com> 1368984925 +0100
 sha1: edc4e2379aed9dbbdb0b06ef9511f14013765923
179654 open 2012-09-24 10:53:45 Richard Brooksby Adding an installable assertion handler so that Configura don't have to have their own plinth.
Making hot varieties continue after assertions by default.
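
The installable-handler pattern described in change 179654 can be sketched as follows. This is a self-contained stand-in with hypothetical names (assert_handler_t, assert_handler_install, warn_user_handler), not the MPS source; the actual interface is mps_lib_assert_fail_install, and its exact signature should be taken from the Plinth section of the manual. The sketch assumes the default handler simply returns, so that execution continues after an assertion.

```c
#include <stddef.h>

/* Hypothetical handler type: source file, line, and failed condition text. */
typedef void (*assert_handler_t)(const char *file, unsigned line,
                                 const char *condition);

/* Assumed default: do nothing and return, so the program continues. */
static void default_handler(const char *file, unsigned line,
                            const char *condition)
{
    (void)file;
    (void)line;
    (void)condition;
}

static assert_handler_t current_handler = default_handler;

/* Install a handler and return the previous one; NULL restores the default. */
assert_handler_t assert_handler_install(assert_handler_t handler)
{
    assert_handler_t previous = current_handler;
    current_handler = (handler != NULL) ? handler : default_handler;
    return previous;
}

/* Called by the library when an assertion fails. */
void assert_fail(const char *file, unsigned line, const char *condition)
{
    current_handler(file, line, condition);
}

/* Example client handler: count failures so the application can warn the
   user to save their work. */
static unsigned warnings = 0;

void warn_user_handler(const char *file, unsigned line, const char *condition)
{
    (void)file;
    (void)line;
    (void)condition;
    ++warnings;
}

unsigned warnings_raised(void)
{
    return warnings;
}
```

Returning the previous handler lets a client chain to it or restore it later, which is how a test harness could install an always-die handler while a product installs a user warning.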