Hi Murali,
> We do have symbols, and all the core files point here:
>
> (gdb) bt
> #3 0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
> #4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
> #5 0x005a9f0e in clone () from /lib/tls/libc.so.6
Based on line number 309, it appears that the precipitating event is a failure in the “start_routine” function, at this line:
status = pthread_setspecific(getpthreadInfo,arg); <=================== bad status
checkStatusQuit(status,"pthread_setspecific","start_routine");
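The pthread_* functions report failure through their return value, not through errno. POSIX lists ENOMEM (shall fail) and EINVAL (may fail, for an invalid key) as the possible errors from pthread_setspecific, so knowing the exact code would narrow things down. Below is a minimal sketch of a status formatter that names the error before the process gives up. format_pthread_status is a hypothetical helper, not the EPICS checkStatusQuit macro:

```c
#include <assert.h>
#include <errno.h>   /* EINVAL, ENOMEM */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not EPICS code): turn a pthread status code into a
 * readable message. Returns 0 and leaves buf untouched when status is 0;
 * otherwise writes "<where>: <call> failed: <text> (<code>)" into buf and
 * returns the status unchanged. */
static int format_pthread_status(int status, const char *call,
                                 const char *where, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (status == 0)
        return 0;
    /* The status itself is the error number; errno is not consulted. */
    snprintf(buf, len, "%s: %s failed: %s (%d)", where, call,
             strerror(status), status);
    return status;
}
```

Called as format_pthread_status(status, "pthread_setspecific", "start_routine", buf, sizeof buf), it would turn the bare abort at line 309 into a log line that says which error occurred.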
It’s interesting that pthread_setspecific is failing. Of course, one suspects a race condition in which the “once” function, in the same source file, hasn’t finished running by the time the “start_thread” function is already running in a new thread, but pthread_once is supposed to prevent that.
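The ordering guarantee mentioned above is that pthread_once does not return in any caller until the init routine has completed, so a thread that goes through pthread_once cannot reach pthread_setspecific before the key exists. Below is a minimal sketch of that pattern with hypothetical names (init_once, info_key, init_key, thread_entry, run_once_demo). It illustrates the technique and is not the EPICS source. It also shows that a failed pthread_key_create should be reported rather than leaving an invalid key for later calls:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names; this illustrates the pattern, not the EPICS source. */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static pthread_key_t info_key;
static int init_status = -1;    /* result of pthread_key_create */

/* Runs exactly once, no matter how many threads call pthread_once. */
static void init_key(void)
{
    init_status = pthread_key_create(&info_key, NULL);
}

/* Thread entry point. pthread_once does not return in any caller until
 * init_key() has completed, so the key is usable afterwards. */
static void *thread_entry(void *arg)
{
    assert(arg != NULL);
    int status = pthread_once(&init_once, init_key);
    if (status == 0)
        status = init_status;               /* did key creation succeed? */
    if (status == 0)
        status = pthread_setspecific(info_key, arg);
    if (status != 0) {
        fprintf(stderr, "thread_entry: %s\n", strerror(status));
        return NULL;
    }
    return pthread_getspecific(info_key);   /* read back this thread's value */
}

/* Start one thread, join it, and check that it saw the value it stored. */
static int run_once_demo(void)
{
    static int payload = 42;
    pthread_t tid;
    void *result = NULL;
    int status = pthread_create(&tid, NULL, thread_entry, &payload);
    if (status == 0)
        status = pthread_join(tid, &result);
    if (status != 0)
        return status;
    return result == &payload ? 0 : -1;
}
```

run_once_demo() returns 0 when the new thread stored and read back its own pointer.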
Could you forward the output of “thread apply all bt” in gdb, run against the same core file?
Thanks,
Jeff
______________________________________________________
Jeffrey O. Hill
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead. -- RFC 1925
As part of tracking down failures elsewhere, we ran into several core dumps of caget from EPICS base R3-14-10 on Linux. These were cagets of different PVs at different points in time:
Apr 27 07:46 core.21238 - caget GTW4:MEM:CHK
Apr 27 13:54 core.25030 - caget MCCELOG:MEM:CHK
Apr 30 14:30 core.26546 - caget PROD03:MEM:CHK
May 3 13:27 core.12026 - caget DAEMON4:DISK:CHK
May 5 13:20 core.4340 - caget PROD01:PROC:CHK
We do have symbols, and all the core files point here:
(gdb) bt
#0 0x0098f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
#1 0x00990064 in epicsThreadGetIdSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:618
#2 0x009828b2 in cantProceed (msg=0x9a0a8c "start_routine") at ../../../src/libCom/misc/cantProceed.c:63
#3 0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
#4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
#5 0x005a9f0e in clone () from /lib/tls/libc.so.6
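A plausible reading of frames #0 to #2 is this: cantProceed asks for the current thread's identity (frame #1, epicsThreadGetIdSelf), and because the thread-specific value was never stored, the lookup finds nothing and falls back to creating a record implicitly (frame #0). Below is a minimal sketch of such a lookup-with-fallback. struct thread_info, self_key, get_self and create_implicit_info are hypothetical names modeled on the frames, not the EPICS implementation:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread record; the names are illustrative only. */
struct thread_info {
    int implicit;               /* 1 if created lazily by the lookup */
};

static pthread_key_t self_key;
static pthread_once_t self_once = PTHREAD_ONCE_INIT;
static int self_key_status = -1;

static void self_init(void)
{
    /* free() releases a thread's record when that thread exits. */
    self_key_status = pthread_key_create(&self_key, free);
}

/* Fallback for a thread that has no record (compare frame #0). */
static struct thread_info *create_implicit_info(void)
{
    assert(pthread_getspecific(self_key) == NULL);
    struct thread_info *info = calloc(1, sizeof *info);
    if (info == NULL)
        return NULL;
    info->implicit = 1;
    if (pthread_setspecific(self_key, info) != 0) {
        free(info);
        return NULL;
    }
    return info;
}

/* Lookup in the style of frame #1: a NULL thread-specific value means
 * nothing was ever stored for this thread, so build one on the spot. */
static struct thread_info *get_self(void)
{
    if (pthread_once(&self_once, self_init) != 0 || self_key_status != 0)
        return NULL;
    struct thread_info *info = pthread_getspecific(self_key);
    if (info == NULL)
        info = create_implicit_info();
    return info;
}
```

In this sketch the first get_self() call in a thread that never registered creates an implicit record, and later calls return the same one. If the real fallback path touches state that was never initialized, the error handler itself can fault, which would match where these cores stop.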
The function createImplicit has not changed between R3-14-10 and R3-14-12. This is the line of code in question (osdThread.c:466):
pthreadInfo->osiPriority =
(param.sched_priority - pcommonAttr->minPriority) * 100.0 /
(pcommonAttr->maxPriority - pcommonAttr->minPriority + 1);
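The expression rescales a native scheduler priority onto a 0 to 99 scale. Below is a small sketch of the same arithmetic. It uses a hypothetical struct common_attr standing in for pcommonAttr, and it adds explicit checks for the two things that could make this line misbehave: an attribute pointer that was never initialized and an empty priority range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields the quoted line reads. */
struct common_attr {
    int minPriority;
    int maxPriority;
};

/* Same formula as the quoted line, plus guards. Returns -1 instead of
 * dereferencing a NULL attribute pointer or dividing by a non-positive
 * range; otherwise returns the scaled priority, truncated toward zero. */
static int map_priority(const struct common_attr *attr, int sched_priority)
{
    if (attr == NULL)
        return -1;                          /* attributes not set up yet */
    int range = attr->maxPriority - attr->minPriority + 1;
    if (range <= 0)
        return -1;                          /* inconsistent min/max */
    double scaled = (sched_priority - attr->minPriority) * 100.0 / range;
    return (int)scaled;
}
```

On Linux, sched_get_priority_min() and sched_get_priority_max() both return 0 for SCHED_OTHER, so the divisor is 1 in that case. Because the division is done in double precision, a zero divisor would typically produce an infinity rather than a trap. That suggests checking whether pcommonAttr and pthreadInfo are valid pointers in the cores, although this is an inference, not a confirmed cause.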
Is there anything more we can do to figure out the root cause?
Regards,
Murali