MPS issue job003348

Title: Unclear what happens if a thread dies while registered
Assigned user: Richard Brooksby
Description: It may be hard for the client program to deregister all its threads before the threads exit: threads might be killed or otherwise terminate inconveniently.

The documentation needs to say what the consequences are. Does the MPS go wrong?

(Originally reported by Tony Mann [1].)
Analysis: On all platforms, if the thread dies before calling mps_thread_dereg, there is no way for the client program to deregister it. This results in a failure of the assertion RingIsSingle(&arena->threadRing) when the arena is destroyed.
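One way a client on a POSIX platform might reduce the risk of a thread exiting while still registered is to hang the deregistration on a thread-specific-data destructor, which runs when the thread returns from its start routine, calls pthread_exit, or is cancelled. The following is a minimal sketch under stated assumptions, not MPS code: fake_thread_reg and fake_thread_dereg are hypothetical stand-ins for mps_thread_reg and mps_thread_dereg, and this issue does not establish whether calling the real mps_thread_dereg from such a destructor is safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for mps_thread_reg / mps_thread_dereg, so that
   the sketch is self-contained. They only count live registrations. */
typedef struct registration_s { int unused; } registration_s;

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static int registered_count = 0;

static registration_s *fake_thread_reg(void)
{
    registration_s *reg = malloc(sizeof *reg);
    if (reg == NULL)
        return NULL;
    pthread_mutex_lock(&reg_lock);
    ++registered_count;
    pthread_mutex_unlock(&reg_lock);
    return reg;
}

/* Has the signature of a pthread key destructor. */
static void fake_thread_dereg(void *p)
{
    pthread_mutex_lock(&reg_lock);
    --registered_count;
    pthread_mutex_unlock(&reg_lock);
    free(p);
}

static pthread_key_t reg_key;
static pthread_once_t reg_key_once = PTHREAD_ONCE_INIT;

static void make_reg_key(void)
{
    int res = pthread_key_create(&reg_key, fake_thread_dereg);
    assert(res == 0);
    (void)res;
}

/* Call at the start of each thread. Returns 0 on success. */
static int register_current_thread(void)
{
    registration_s *reg;
    int res;
    pthread_once(&reg_key_once, make_reg_key);
    reg = fake_thread_reg();
    if (reg == NULL)
        return -1;
    res = pthread_setspecific(reg_key, reg);
    if (res != 0)
        fake_thread_dereg(reg);  /* undo the registration on failure */
    return res;
}

static void *worker(void *arg)
{
    (void)arg;
    (void)register_current_thread();
    /* Returns without an explicit deregistration; the key destructor
       deregisters on the way out. */
    return NULL;
}

/* Runs one worker to completion and reports how many registrations
   remain afterwards (or -1 if the thread could not be created). */
static int registered_after_worker_exit(void)
{
    pthread_t t;
    int n;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    pthread_mutex_lock(&reg_lock);
    n = registered_count;
    pthread_mutex_unlock(&reg_lock);
    return n;
}
```

This does not help when the whole process exits or when a thread is destroyed without running its exit path, so it narrows the problem rather than removing it.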

On platforms li and fr, we send signals to the thread with pthread_kill. POSIX [2] specifies that this can return ESRCH ("No thread could be found corresponding to that specified by the given thread ID") or EINVAL ("The value of the sig argument is an invalid or unsupported signal number").
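As an illustration of the two error returns quoted above, here is a minimal sketch of checking the result of pthread_kill. The enum and the helper signal_thread_checked are hypothetical names for this example and are not part of the MPS. Note that pthread_kill returns the error number directly rather than setting errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical result codes for this sketch; not MPS names. */
typedef enum {
    SIGNAL_OK,          /* pthread_kill returned 0 */
    SIGNAL_NO_THREAD,   /* ESRCH: no thread found for that thread ID */
    SIGNAL_BAD_SIGNAL,  /* EINVAL: invalid or unsupported signal number */
    SIGNAL_OTHER_ERROR  /* any other non-zero return */
} signal_result_t;

/* Send sig to thread and classify the return value. */
static signal_result_t signal_thread_checked(pthread_t thread, int sig)
{
    int res = pthread_kill(thread, sig);
    switch (res) {
    case 0:
        return SIGNAL_OK;
    case ESRCH:
        return SIGNAL_NO_THREAD;
    case EINVAL:
        return SIGNAL_BAD_SIGNAL;
    default:
        return SIGNAL_OTHER_ERROR;
    }
}
```

For example, signal_thread_checked(pthread_self(), 0) performs only the error checking (signal 0 sends nothing) and should report SIGNAL_OK, while an out-of-range signal number should report SIGNAL_BAD_SIGNAL.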

On platform xc, we call thread_suspend and thread_resume and AVER(kern_return == KERN_SUCCESS). Experiment shows that if the thread has died, we can get MACH_SEND_INVALID_DEST or KERN_INVALID_ARGUMENT. So a dead thread will cause this assertion to fail.

On platform w3, we call SuspendThread [3] and ResumeThread [4]. The documentation doesn't say what happens if the thread is dead, but a comment by DRJ in thw3.c [5] says that when these functions return -1, GetLastError is 5 (ERROR_ACCESS_DENIED).

It's not clear from the documentation for any of these systems how reliable detecting thread termination is likely to be. Thread ids are likely to be a scarce resource, so that by the time the MPS gets around to attempting to suspend a terminated thread, the id might have been reused for another thread. The MPS might make best efforts, but cannot guarantee to handle any case other than the one where all threads terminate cleanly. The MPS can't be a thread manager for the mutator!

The most we might do is to make a "best effort" to spot dead threads when we attempt to suspend them, moving the thread to some ring of dead threads so that we stop sending it signals in future. This will at least allow a malfunctioning program to limp along. But we ought to assert if this happens, so that we're not covering up errors in the client program.
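The "best effort" idea can be sketched as follows. This is an illustration under stated assumptions, not the MPS implementation: ring_s, thread_s, try_suspend and suspend_all are hypothetical names, and try_suspend merely simulates a platform suspend call (pthread_kill, thread_suspend or SuspendThread) failing for a dead thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real MPS ring and thread types differ. */
typedef struct ring_s {
    struct ring_s *next, *prev;
} ring_s;

typedef struct thread_s {
    ring_s link;  /* must be first, so a ring_s* can be cast to thread_s* */
    int id;
    bool alive;   /* simulates whether the OS thread still exists */
} thread_s;

static void ring_init(ring_s *r) { r->next = r->prev = r; }

static bool ring_is_single(const ring_s *r) { return r->next == r; }

/* Unlink node from whatever ring it is on, leaving it single. */
static void ring_remove(ring_s *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    ring_init(node);
}

/* Append a single node at the tail of the ring headed by head. */
static void ring_append(ring_s *head, ring_s *node)
{
    assert(ring_is_single(node));
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Simulated suspend: fails if the thread has died. */
static bool try_suspend(thread_s *t) { return t->alive; }

/* Try to suspend every thread on the live ring. Threads that cannot be
   suspended are moved to the dead ring so they are skipped in future.
   Returns the number moved, so the caller can assert or report it
   rather than silently covering up a client error. */
static size_t suspend_all(ring_s *live, ring_s *dead)
{
    size_t moved = 0;
    ring_s *node = live->next;
    while (node != live) {
        ring_s *next = node->next;  /* save before possibly unlinking */
        thread_s *t = (thread_s *)node;
        if (!try_suspend(t)) {
            ring_remove(node);
            ring_append(dead, node);
            ++moved;
        }
        node = next;
    }
    return moved;
}
```

Returning the count, rather than hiding it, keeps the option of asserting when a dead thread is found, in line with the point that the MPS should not cover up errors in the client program.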
How found: inspection
Observed in: 1.110.0
Created by: Gareth Rees
Created on: 2012-10-31 15:32:51
Last modified by: Gareth Rees
Last modified on: 2016-03-13 00:30:40
History: 2012-10-31 GDR Created.


Change  Effect  Date                 User              Description
188923  closed  2016-01-19 16:42:27  Richard Brooksby  Merging branch/2014-10-25/thread.