Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: caget very rarely core dumps in osdThread.c
From: "Shankar, Murali" <mshankar@slac.stanford.edu>
To: Jeff Hill <johill@lanl.gov>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Fri, 6 May 2011 15:58:45 -0700
>>> Could you forward the output from "thread apply all bt" in gdb against the same core file?

Here's the output of "thread apply all bt" in gdb against core.21238. The others have the same structure; two threads, one in _dl_sysinfo_int80 and the other in createImplicit.
---------------------------

#0  0x0091f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
466             pthreadInfo->osiPriority =
(gdb) thread apply all bt

Thread 2 (process 21238):
#0  0x004c47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00569f34 in _exit () from /lib/tls/libc.so.6
#2  0x00508604 in exit () from /lib/tls/libc.so.6
#3  0x004f2dfd in __libc_start_main () from /lib/tls/libc.so.6
#4  0x08048f09 in _start ()

Thread 1 (process 21250):
#0  0x0091f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
#1  0x00920064 in epicsThreadGetIdSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:618
#2  0x009128b2 in cantProceed (msg=0x930a8c "start_routine") at ../../../src/libCom/misc/cantProceed.c:63
#3  0x0091f2a6 in start_routine (arg=0x9144920) at ../../../src/libCom/osi/os/posix/osdThread.c:309
#4  0x007423cc in start_thread () from /lib/tls/libpthread.so.0
#5  0x005a9f0e in clone () from /lib/tls/libc.so.6
--------------------------


Regards,
Murali


-----Original Message-----
From: Jeff Hill [mailto:johill@lanl.gov] 
Sent: Friday, May 06, 2011 3:49 PM
To: Shankar, Murali; tech-talk@aps.anl.gov
Subject: RE: caget very rarely core dumps in osdThread.c

Hi Murali,
 
> We do have symbols and all the core files point here
> 
> (gdb) bt
> #3  0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
> #4  0x007423cc in start_thread () from /lib/tls/libpthread.so.0
> #5  0x005a9f0e in clone () from /lib/tls/libc.so.6
 
Frame #3 points at line 309 of osdThread.c, so the precipitating failure appears to be in the "start_routine" function, at this call:
 
    status = pthread_setspecific(getpthreadInfo,arg); <=== bad status
    checkStatusQuit(status,"pthread_setspecific","start_routine");
 
It's interesting that pthread_setspecific is failing. One naturally suspects a race condition in which the "once" function in the same source file has not finished running by the time the "start_thread" function is already running in a new thread, but pthread_once is supposed to prevent that.
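
The ordering that pthread_once is meant to guarantee can be sketched as below. This is only an illustration under stated assumptions, not the EPICS code: info_key stands in for getpthreadInfo, and once_init, start_routine_sketch and run_sketch are hypothetical names. It assumes the key is created by the once function before any thread is started, so that the pthread_setspecific call in the new thread sees a valid key. POSIX lists EINVAL (invalid key) and ENOMEM as the errors pthread_setspecific may return.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t info_key;                 /* stands in for getpthreadInfo */
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int key_create_status = -1;             /* result of pthread_key_create */

/* Runs exactly once, however many threads call pthread_once. */
static void once_init(void)
{
    key_create_status = pthread_key_create(&info_key, NULL);
}

/* Hypothetical analogue of start_routine: store arg in thread-specific data
   and hand back whatever the key now holds for this thread. */
static void *start_routine_sketch(void *arg)
{
    assert(arg != NULL);
    int status = pthread_setspecific(info_key, arg);
    if (status != 0) {
        return NULL;                           /* the "bad status" case */
    }
    return pthread_getspecific(info_key);
}

/* Hypothetical driver: returns 0 when the new thread saw its own arg. */
int run_sketch(void)
{
    int payload = 42;
    pthread_t tid;
    void *seen = NULL;

    /* The key must exist before the thread that uses it is created. */
    if (pthread_once(&once_control, once_init) != 0 || key_create_status != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, start_routine_sketch, &payload) != 0) {
        return -1;
    }
    if (pthread_join(tid, &seen) != 0) {
        return -1;
    }
    return seen == &payload ? 0 : 1;
}
```

If the once function had not completed before the thread started, the key would be uninitialized and pthread_setspecific could fail, which is the race being ruled out here.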
 
Could you forward the output from "thread apply all bt" in gdb against the same core file?
 
Thanks,
 
Jeff
______________________________________________________
Jeffrey O. Hill           Email        johill@lanl.gov
LANL MS H820              Voice        505 665 1831
Los Alamos NM 87545 USA   FAX          505 665 5107
 
Message content: TSPA
 
With sufficient thrust, pigs fly just fine. However, this is
not necessarily a good idea. It is hard to be sure where they
are going to land, and it could be dangerous sitting under them
as they fly overhead. -- RFC 1925
 
From: tech-talk-bounces@aps.anl.gov [mailto:tech-talk-bounces@aps.anl.gov] On Behalf Of Shankar, Murali
Sent: Friday, May 06, 2011 3:04 PM
To: tech-talk@aps.anl.gov
Subject: caget very rarely core dumps in osdThread.c
 
As part of tracking down failures elsewhere, we ran into some core dumps of caget from EPICS base version R3-14-10 on Linux. These were caget runs for different PVs at different points in time:
Apr 27 07:46 core.21238 - caget GTW4:MEM:CHK
Apr 27 13:54 core.25030 - caget MCCELOG:MEM:CHK
Apr 30 14:30 core.26546 - caget PROD03:MEM:CHK
May  3 13:27 core.12026 - caget DAEMON4:DISK:CHK
May  5 13:20 core.4340 - caget PROD01:PROC:CHK
 
We do have symbols, and all the core files point here:
(gdb) bt
#0  0x0098f9b1 in createImplicit () at ../../../src/libCom/osi/os/posix/osdThread.c:466
#1  0x00990064 in epicsThreadGetIdSelf () at ../../../src/libCom/osi/os/posix/osdThread.c:618
#2  0x009828b2 in cantProceed (msg=0x9a0a8c "start_routine") at ../../../src/libCom/misc/cantProceed.c:63
#3  0x0098f2a6 in start_routine (arg=0xb7d00598) at ../../../src/libCom/osi/os/posix/osdThread.c:309
#4  0x007423cc in start_thread () from /lib/tls/libpthread.so.0
#5  0x005a9f0e in clone () from /lib/tls/libc.so.6
 
 
The function createImplicit has not changed from R3-14-10 to R3-14-12. This is the relevant line of code:
 
        pthreadInfo->osiPriority =
                 (param.sched_priority - pcommonAttr->minPriority) * 100.0 /
                    (pcommonAttr->maxPriority - pcommonAttr->minPriority + 1);
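
To make the arithmetic in that statement concrete, here is a minimal sketch of the same mapping as a standalone function. The name osi_priority_from_native and its parameters are hypothetical, and the unsigned int return type is an assumption; only the expression itself comes from the line above. Because the denominator has "+ 1" added and the division is done in floating point, the arithmetic cannot divide by zero whenever the maximum is at least the minimum.

```c
#include <assert.h>

/* Hypothetical helper, not part of EPICS: maps a native scheduler priority
   in [min_priority, max_priority] onto the 0..99 range using the same
   expression as the quoted statement. The result is truncated toward zero
   when the double is converted to an integer. */
unsigned int osi_priority_from_native(int sched_priority,
                                      int min_priority,
                                      int max_priority)
{
    assert(max_priority >= min_priority);   /* keeps the denominator >= 1 */
    return (unsigned int)((sched_priority - min_priority) * 100.0 /
                          (max_priority - min_priority + 1));
}
```

For example, with a native range of 1 to 99, a native priority of 1 maps to 0 and a native priority of 99 maps to 98.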
 
 
Is there anything more we can do to figure out the root cause?
 
Regards,
Murali
 

