MPS issue job003474

Title: amcssth test failure on lii6gc and fri3gc
Status: closed
Priority: essential
Assigned user: Gareth Rees
Organization: Ravenbrook
Description: Bruce Mitchener [1] reports that lii6gc/cool/amcssth fails on Ubuntu 10.04 (Lucid Lynx) x86-64:

    Running lii6gc/cool/amcssth
    Segmentation fault

I reproduced this on Ubuntu 12.04 x86-64:

    $ make test
    [...]
    Running lii6gc/hot/amcssth
    Segmentation fault (core dumped)

The test log shows that the seed was 341250310.
Analysis: This test case fails with several different symptoms, so there may be more than one race condition. One of them is recorded (and fixed) in job003503.

Because this test is multi-threaded, it does not run deterministically, so it does not fail on every run. I ran it 100 times with this seed:

    for ((X=0;X<100;X++)); do
        code/lii6gc/hot/amcssth 341250310 > /dev/null
    done

Out of the 100 runs, there were three segmentation faults and one assertion failure:

    MPS ASSERTION FAILURE: res == 0
    lockli.c
    139

The manual [2] says that this might be because "The client program has made a re-entrant call into the MPS. Look at the backtrace to see what it was. Common culprits are format methods and stepper functions."

On fri4gc, this test usually fails, and fairly often (roughly one run in five with seed 1263499804) it fails this assertion:

    Assertion failed: (((mps_word_t)w & 3) == 0), function dylan_wrapper_check, file fmtdy.c, line 109.

(It is possible to get it to assert twice in the same run: presumably once from each thread.)

With seed 925071270, I got a segmentation fault on lii6gc. Here's the backtrace for thread 1:

    #0 0x00007f0674079707 in kill () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x0000000000436914 in sigHandle (sig=<optimized out>,
        info=0x7f0674041ab0, context=0x7f0674041980) at protli.c:109
    #2 <signal handler called>
    #3 dylan_check (addr=0x7f0674a69598) at fmtdytst.c:216
    #4 0x0000000000401a14 in churn (ap=0x7f0674affe60) at amcssth.c:173
    #5 0x0000000000401afc in fooey2 (arg=<optimized out>, s=<optimized out>)
        at amcssth.c:290
    #6 0x000000000043680d in ProtTramp (resultReturn=0x7f0674041ec8,
        f=0x401ac3 <fooey2>, p=0x0, s=0) at protix.c:132
    #7 0x000000000040613c in mps_tramp (r_o=0x7f0674041ec8, f=0x401ac3 <fooey2>,
        p=0x0, s=0) at mpsi.c:1378
    #8 0x00000000004017c4 in fooey (childIsFinishedReturn=0x7ffff4e08d7c)
        at amcssth.c:306
    #9 0x00007f0674409e9a in start_thread ()
        from /lib/x86_64-linux-gnu/libpthread.so.0
    #10 0x00007f0674136ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
    #11 0x0000000000000000 in ?? ()

And here for thread 2:

    #0 0x00007f06744109fa in __lll_unlock_wake ()
        from /lib/x86_64-linux-gnu/libpthread.so.0
    #1 0x00007f067440d104 in _L_unlock_644 ()
        from /lib/x86_64-linux-gnu/libpthread.so.0
    #2 0x00007f067440d063 in pthread_mutex_unlock ()
        from /lib/x86_64-linux-gnu/libpthread.so.0
    #3 0x0000000000434b81 in LockReleaseMPM (lock=0x7f0674a7e110) at lockli.c:157
    #4 0x000000000040d06a in arenaLeaveLock (arena=0x7f0674b38000, recursive=0)
        at global.c:586
    #5 0x000000000040d09d in ArenaLeave (arena=<optimized out>) at global.c:565
    #6 0x00000000004051d6 in mps_ap_fill (p_o=0x7ffff4e08c18,
        mps_ap=0x7f0674a7e9e0, size=32) at mpsi.c:1002
    #7 0x000000000040194c in make (ap=0x7f0674a7e9e0) at amcssth.c:97
    #8 0x0000000000401a28 in churn (ap=0x7f0674a7e9e0) at amcssth.c:174
    #9 0x0000000000401e00 in test (arg=<optimized out>, s=<optimized out>)
        at amcssth.c:261
    #10 0x000000000043680d in ProtTramp (resultReturn=0x7ffff4e08d70,
        f=0x401b22 <test>, p=0x7f0674b38000, s=0) at protix.c:132
    #11 0x000000000040613c in mps_tramp (r_o=0x7ffff4e08d70, f=0x401b22 <test>,
        p=0x7f0674b38000, s=0) at mpsi.c:1378
    #12 0x00000000004020f2 in main (argc=<optimized out>, argv=0x7ffff4e08e78)
        at amcssth.c:334
How found: customer
Evidence: [1] <https://info.ravenbrook.com/mail/2013/05/07/01-52-54/0/>
[2] <http://www.ravenbrook.com/project/mps/...html#common-assertions-and-their-causes>
Observed in: 1.111.0
Created by: Gareth Rees
Created on: 2013-05-07 10:07:28
Last modified by: Richard Brooksby
Last modified on: 2013-07-17 16:59:08
History: 2013-05-07 GDR Created

Fixes

Change: 182882
Effect: closed
Date: 2013-07-01 20:04:30
User: Richard Brooksby
Description: amcssth test was broken: didn't register the worker threads as roots, only created one worker thread, registered it twice. Weird.

Imported from Git
 Author: Richard Brooksby <rb@ravenbrook.com> 1372705470 +0100
 Committer: Richard Brooksby <rb@ravenbrook.com> 1372705470 +0100
 sha1: d7e6064d3b7f28e8fa0844be2984db209cebaee4