Subject: | RE: caget very rarely core dumps in osdThread.c |
From: | "Jeff Hill" <[email protected]> |
To: | "'Shankar, Murali'" <[email protected]>, <[email protected]> |
Date: | Fri, 6 May 2011 16:48:40 -0600 |
Hi Murali,

> We do have symbols and all the core files point here
>
> (gdb) bt
> #3 0x0098f2a6 in start_routine (arg=0xb7d00598) at
>../../../src/libCom/osi/os/posix/osdThread.c:309
> #4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
> #5 0x005a9f0e in clone () from /lib/tls/libc.so.6

Based on frame #3, it appears that the precipitating circumstance is a failure at line 309 in the “start_routine” function.

    status = pthread_setspecific(getpthreadInfo,arg);   <=================== bad status
    checkStatusQuit(status,"pthread_setspecific","start_routine");

It’s interesting that pthread_setspecific is failing. Of course, one suspects a race condition where the “once” function, in the same source file, hasn’t finished running at a time when the “start_thread” function is already running in a new thread, but supposedly pthread_once prevents that.

Could you forward the output from “thread apply all bt” in gdb against the same core file?

Thanks,

Jeff

Message content: TSPA

With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead. -- RFC 1925

From: [email protected] [mailto:[email protected]] On Behalf Of Shankar, Murali

As part of tracking down failures elsewhere, we ran into some core dumps of caget from EPICS base version R3-14-10 on Linux. These were cagets for different PVs and at different points in time.

    Apr 27 07:46 core.21238 - caget GTW4:MEM:CHK
    Apr 27 13:54 core.25030 - caget MCCELOG:MEM:CHK
    Apr 30 14:30 core.26546 - caget PROD03:MEM:CHK
    May 3 13:27 core.12026 - caget DAEMON4:DISK:CHK
    May 5 13:20 core.4340 - caget PROD01:PROC:CHK

We do have symbols and all the core files point here

    (gdb) bt
    #0 0x0098f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
    #1 0x00990064 in epicsThreadGetIdSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:618
    #2 0x009828b2 in cantProceed (msg=0x9a0a8c "start_routine") at ../../../src/libCom/misc/cantProceed.c:63
    #3 0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
    #4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
    #5 0x005a9f0e in clone () from /lib/tls/libc.so.6

The method createImplicit has not changed from R3-14-10 to R3-14-12. This is the appropriate line of code.

    pthreadInfo->osiPriority =
        (param.sched_priority - pcommonAttr->minPriority) * 100.0 /
        (pcommonAttr->maxPriority - pcommonAttr->minPriority + 1);

Is there anything more we can do to figure out the root cause?

Regards,
Murali
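
For readers following the thread, the pattern under discussion (a thread-specific key created under pthread_once, then set from the new thread's start routine) can be sketched roughly as below. This is a minimal illustration and not the EPICS osdThread.c source: the names info_key, once_init, set_thread_info, start_routine_sketch and run_demo are hypothetical. It prints the error string for a non-zero status, which may help tell an invalid key (EINVAL) apart from an allocation failure (ENOMEM), the two errors POSIX lists for pthread_setspecific.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the once control and key in the real source. */
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static pthread_key_t info_key;
static int key_status = -1;   /* result of pthread_key_create, set by once_init */

/* Runs exactly once, no matter how many threads call pthread_once. */
static void once_init(void)
{
    key_status = pthread_key_create(&info_key, NULL);
}

/* Returns 0 on success, otherwise the error number reported by pthreads. */
static int set_thread_info(void *arg)
{
    int status = pthread_once(&once_control, once_init);
    if (status != 0) {
        fprintf(stderr, "pthread_once: %s\n", strerror(status));
        return status;
    }
    if (key_status != 0) {
        fprintf(stderr, "pthread_key_create: %s\n", strerror(key_status));
        return key_status;
    }
    status = pthread_setspecific(info_key, arg);
    if (status != 0) {
        /* Report the cause instead of only aborting. */
        fprintf(stderr, "pthread_setspecific: %s\n", strerror(status));
    }
    return status;
}

/* Thread entry point: store arg as thread-specific data, then read it back. */
static void *start_routine_sketch(void *arg)
{
    if (set_thread_info(arg) != 0) {
        return NULL;
    }
    return pthread_getspecific(info_key);
}

/* Returns 0 if the new thread read back the pointer it stored. */
int run_demo(void)
{
    pthread_t tid;
    int payload = 42;
    void *result = NULL;

    if (pthread_create(&tid, NULL, start_routine_sketch, &payload) != 0) {
        return -1;
    }
    if (pthread_join(tid, &result) != 0) {
        return -1;
    }
    return (result == &payload) ? 0 : 1;
}
```

Calling pthread_once inside the start routine itself, as this sketch does, is one way to guarantee the key exists before pthread_setspecific runs; whether that matches the ordering in the affected release is not established in this thread.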