>>> Could you forward the output from "thread apply all bt" in gdb against the same core file?
Here's the output of "thread apply all bt" in gdb against core.21238. The others have the same structure: two threads, one in _dl_sysinfo_int80 and the other in createImplicit.
---------------------------
#0 0x0091f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
466 pthreadInfo->osiPriority =
(gdb) thread apply all bt
Thread 2 (process 21238):
#0 0x004c47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00569f34 in _exit () from /lib/tls/libc.so.6
#2 0x00508604 in exit () from /lib/tls/libc.so.6
#3 0x004f2dfd in __libc_start_main () from /lib/tls/libc.so.6
#4 0x08048f09 in _start ()
Thread 1 (process 21250):
#0 0x0091f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
#1 0x00920064 in epicsThreadGetIdSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:618
#2 0x009128b2 in cantProceed (msg=0x930a8c "start_routine") at ../../../src/libCom/misc/cantProceed.c:63
#3 0x0091f2a6 in start_routine (arg=0x9144920) at ../../../src/libCom/osi/os/posix/osdThread.c:309
#4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
#5 0x005a9f0e in clone () from /lib/tls/libc.so.6
--------------------------
Regards,
Murali
-----Original Message-----
From: Jeff Hill [mailto:[email protected]]
Sent: Friday, May 06, 2011 3:49 PM
To: Shankar, Murali; [email protected]
Subject: RE: caget very rarely core dumps in osdThread.c
Hi Murali,
> We do have symbols and all the core files point here
>
> (gdb) bt
> #3 0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
> #4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
> #5 0x005a9f0e in clone () from /lib/tls/libc.so.6
Frame #3 of the backtrace suggests that the precipitating circumstance is an initial failure at line 309 in the "start_routine" function:
status = pthread_setspecific(getpthreadInfo,arg); <=================== bad status
checkStatusQuit(status,"pthread_setspecific","start_routine");
It's interesting that pthread_setspecific is failing. One naturally suspects a race condition in which the "once" function in the same source file hasn't finished running by the time the "start_thread" function is already running in a new thread, but pthread_once is supposed to prevent that.
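For reference, the pattern under discussion can be sketched as follows. This is a minimal illustration under stated assumptions, not the osdThread.c code: once_control, info_key, once_init, thread_main, and run_demo are hypothetical names. It shows the guarantee pthread_once is expected to give (the init function has completed before any caller proceeds) and that pthread_setspecific reports failure through its return value rather than through errno.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; these are not the identifiers used in osdThread.c. */
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static pthread_key_t info_key;
static int key_status = -1;   /* result of pthread_key_create */
static int once_done = 0;     /* set at the end of once_init */

static void once_init(void)
{
    /* Runs exactly once, regardless of how many threads call pthread_once. */
    key_status = pthread_key_create(&info_key, NULL);
    once_done = 1;
}

static void *thread_main(void *arg)
{
    int status;

    /* Returns only after once_init has completed in whichever thread ran it. */
    status = pthread_once(&once_control, once_init);
    if (status != 0 || key_status != 0)
        return NULL;
    assert(once_done);

    /* A nonzero return here is the "bad status" case: the error number
     * comes back as the return value. */
    status = pthread_setspecific(info_key, arg);
    if (status != 0)
        return NULL;

    return pthread_getspecific(info_key);
}

/* Starts one thread with a non-NULL arg and checks that the thread read
 * the same pointer back from its thread-specific slot.
 * Returns 0 on success, a pthread error number or -1 otherwise. */
int run_demo(void *arg)
{
    pthread_t tid;
    void *result = NULL;
    int status = pthread_create(&tid, NULL, thread_main, arg);

    if (status != 0)
        return status;
    status = pthread_join(tid, &result);
    if (status != 0)
        return status;
    return result == arg ? 0 : -1;
}
```

Calling run_demo with any non-NULL pointer from a small main() and building with -pthread should return 0 whenever the once/key setup behaves as POSIX describes.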
Could you forward the output from "thread apply all bt" in gdb against the same core file?
Thanks,
Jeff
______________________________________________________
Jeffrey O. Hill Email [email protected] <mailto:[email protected]>
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead. -- RFC 1925
From: [email protected] [mailto:[email protected]] On Behalf Of Shankar, Murali
Sent: Friday, May 06, 2011 3:04 PM
To: [email protected]
Subject: caget very rarely core dumps in osdThread.c
As part of tracking down failures elsewhere, we ran into some core dumps of caget from EPICS base version R3-14-10 on Linux. These were caget runs for different PVs at different points in time.
Apr 27 07:46 core.21238 - caget GTW4:MEM:CHK
Apr 27 13:54 core.25030 - caget MCCELOG:MEM:CHK
Apr 30 14:30 core.26546 - caget PROD03:MEM:CHK
May 3 13:27 core.12026 - caget DAEMON4:DISK:CHK
May 5 13:20 core.4340 - caget PROD01:PROC:CHK
We do have symbols, and all the core files point here:
(gdb) bt
#0 0x0098f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
#1 0x00990064 in epicsThreadGetIdSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:618
#2 0x009828b2 in cantProceed (msg=0x9a0a8c "start_routine") at ../../../src/libCom/misc/cantProceed.c:63
#3 0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
#4 0x007423cc in start_thread () from /lib/tls/libpthread.so.0
#5 0x005a9f0e in clone () from /lib/tls/libc.so.6
The createImplicit function has not changed from R3-14-10 to R3-14-12. This is the relevant line of code:
pthreadInfo->osiPriority =
(param.sched_priority - pcommonAttr->minPriority) * 100.0 /
(pcommonAttr->maxPriority - pcommonAttr->minPriority + 1);
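To make the arithmetic in that statement concrete, here is a small sketch that isolates it. The helper name osi_priority and its plain int parameters are hypothetical, and truncating the result to an integer is an assumption about how the value is stored. The division is done in floating point and the "+ 1" keeps the divisor nonzero whenever maxPriority >= minPriority, so what remains on that line are the reads through pthreadInfo and pcommonAttr.

```c
#include <assert.h>

/* Hypothetical helper mirroring the arithmetic of the quoted statement:
 * maps a scheduler priority in [min_priority, max_priority] onto a
 * 0..99 scale. Truncation to int is an assumption of this sketch. */
int osi_priority(int sched_priority, int min_priority, int max_priority)
{
    double scaled;

    assert(max_priority >= min_priority);

    /* 100.0 forces floating-point division; the "+ 1" means the divisor
     * is at least 1 for any valid range. */
    scaled = (sched_priority - min_priority) * 100.0 /
             (max_priority - min_priority + 1);

    return (int)scaled;
}
```

With a scheduler range of 1 to 99, for example, the lowest priority maps to 0 and the highest to 98.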
Is there anything more we can do to figure out the root cause?
Regards,
Murali