EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: thread joinable race
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: EPICS core-talk <core-talk at aps.anl.gov>
Date: Sun, 8 Mar 2020 22:46:22 -0700
While following up on a strange CI test failure I noticed during the recent codeathon [1],
I realized I made a mistake when adding epicsThreadMustJoin() [2].  That change
introduced a reference counter to struct epicsThreadOSD.  The bug is that the ref counter
is (conditionally) incremented only after pthread_create() returns.
This allows a short-lived thread which attempts to self-join to race into a double free().
And it happens that epicsThreadTest does exactly this.

I think the fix is straightforward [3].  How severe should this issue be considered?
It's a race which can cause a crash at runtime.  However, the triggering circumstances seem uncommon.
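
To illustrate the ordering issue, here is a minimal sketch, not the actual osdThread.c code
and not necessarily what the commit in [3] does.  The struct thread_info, release_info() and
run_demo() are hypothetical names invented for this example.  It assumes one reference held
by the thread itself and one held by whoever joins, and it takes both before pthread_create()
so that a short-lived, self-joining thread cannot bring the count to zero twice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct epicsThreadOSD; not the real layout. */
typedef struct {
    atomic_int refcnt;          /* references still outstanding */
} thread_info;

static atomic_int free_count;   /* how many times the block was freed */

/* Drop one reference; whoever drops the last one frees the block. */
static void release_info(thread_info *info)
{
    /* atomic_fetch_sub() returns the previous value, so 1 means "last ref" */
    if (atomic_fetch_sub(&info->refcnt, 1) == 1) {
        atomic_fetch_add(&free_count, 1);
        free(info);
    }
}

/* A short-lived thread which "self-joins" and then exits. */
static void *start_routine(void *arg)
{
    thread_info *info = arg;
    release_info(info);         /* self-join: give up the joiner's reference */
    release_info(info);         /* thread exit: give up the thread's own reference */
    return NULL;
}

/* Returns how many times the block was freed, or -1 if thread creation failed. */
static int run_demo(void)
{
    pthread_t tid;
    thread_info *info = malloc(sizeof(*info));
    assert(info != NULL);

    atomic_store(&free_count, 0);
    /* Take BOTH references before the new thread can possibly run.
     * Starting at 1 and incrementing after pthread_create() returns
     * is the ordering which opens the race window. */
    atomic_init(&info->refcnt, 2);

    if (pthread_create(&tid, NULL, start_routine, info) != 0) {
        free(info);
        return -1;
    }
    /* Used here only to wait for the thread before reading the counter;
     * info itself must not be touched after this point. */
    pthread_join(tid, NULL);
    return atomic_load(&free_count);
}
```

Calling run_demo() from a main() is expected to report a single free.  With the count starting
at 1 and the increment placed after pthread_create(), the two release_info() calls in a fast
thread could both run first, which is the double free() described above.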


[1] https://travis-ci.org/mdavidsaver/epics-base/jobs/649447749#L6255-L6261

> Dumping a stack trace of thread '_main_':
> [    0x7f9a9a027ade]: /home/travis/build/mdavidsaver/epics-base/lib/linux-x86_64/libCom.so.3.17.7(epicsStackTrace+0x5e)
> [    0x7f9a9a017d97]: /home/travis/build/mdavidsaver/epics-base/lib/linux-x86_64/libCom.so.3.17.7(cantProceed+0xb7)
> [    0x7f9a9a023273]: /home/travis/build/mdavidsaver/epics-base/lib/linux-x86_64/libCom.so.3.17.7(epicsThreadMustJoin+0x93)
> [          0x401b5a]: ./epicsThreadTest(main+0x40a)
> [    0x7f9a992d1b97]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)
> [          0x40168a]: ./epicsThreadTest(_start+0x2a)

Or sometimes, locally with valgrind:

> Process terminating with default action of signal 11 (SIGSEGV)
>  Access not within mapped region at address 0x8
>    at 0x487C0A7: ellDelete (ellLib.c:81)
>    by 0x489B8E5: free_threadInfo (osdThread.c:217)
>    by 0x489CF06: epicsThreadMustJoin (osdThread.c:656)
>    by 0x10A835: (anonymous namespace)::joinTests(void*) (epicsThreadTest.cpp:118)
>    by 0x489C463: start_routine (osdThread.c:411)
>    by 0x483C8B6: mythread_wrapper (hg_intercepts.c:389)
>    by 0x4E08FA2: start_thread (pthread_create.c:486)
>    by 0x4D394CE: clone (clone.S:95)

Or sometimes, locally with gdb:

> malloc(): unsorted double linked list corrupted

which occurs during a subsequent create_threadInfo() call.


[2] https://code.launchpad.net/~epics-core/epics-base/+git/Com/+merge/361379

[3] https://github.com/mdavidsaver/epics-base/commit/02a24a144d0c062311212c769926c1e2df5a1a52
