Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: caget very rarely core dumps in osdThread.c
From: Andrew Johnson <anj@aps.anl.gov>
To: tech-talk@aps.anl.gov
Cc: "Shankar, Murali" <mshankar@slac.stanford.edu>
Date: Fri, 6 May 2011 17:17:27 -0500

Hi Murali,

On Friday 06 May 2011 16:04:07 Shankar, Murali wrote:
> As part of tracking down failures elsewhere, we ran into some core dumps of
>  caget from EPICS base version R3-14-10 on linux.  These were caget's for
>  different PV's and at different points in time.
> 
> Apr 27 07:46 core.21238 - caget GTW4:MEM:CHK
> Apr 27 13:54 core.25030 - caget MCCELOG:MEM:CHK
> Apr 30 14:30 core.26546 - caget PROD03:MEM:CHK
> May  3 13:27 core.12026 - caget DAEMON4:DISK:CHK
> May  5 13:20 core.4340 - caget PROD01:PROC:CHK

It's interesting that those are all :CHK records.  Is that because they're the 
only ones you're looking at using caget?

> We do have symbols and all the core files point here

Good.  Do you have any information about messages output when the core dump 
occurred?

> (gdb) bt
> #0  0x0098f9b1 in createImplicit () at
>  ../../../src/libCom/osi/os/posix/osdThread.c:466
> #1  0x00990064 in epicsThreadGetIdSelf () at
>  ../../../src/libCom/osi/os/posix/osdThread.c:618
> #2  0x009828b2 in cantProceed (msg=0x9a0a8c "start_routine") at
>  ../../../src/libCom/misc/cantProceed.c:63
> #3  0x0098f2a6 in start_routine (arg=0xb7d00598) at
>  ../../../src/libCom/osi/os/posix/osdThread.c:309
> #4  0x007423cc in start_thread () from /lib/tls/libpthread.so.0
> #5  0x005a9f0e in clone () from /lib/tls/libc.so.6

Line 309 of src/libCom/osi/os/posix/osdThread.c is checking the return value 
from pthread_setspecific(), which the man page says may fail with EINVAL if 
the key value is invalid.  That suggests some code is trying to start up 
another thread after the process has already run its atexit() routines, in 
particular the myAtExit() at line 115 in that file.
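
A minimal sketch of that failure mode, not taken from osdThread.c: the helper names below are hypothetical. It creates a thread-specific key, optionally deletes it first (as an exit handler might), and then calls pthread_setspecific(). POSIX leaves the use of a deleted key undefined; glibc on Linux reports EINVAL, which is an assumption about the platform rather than a portable guarantee.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: create a key, optionally delete it before use
 * (mimicking cleanup that has already run at exit), then try to store
 * a value under it.
 * Returns the pthread_setspecific() status, or -1 if the key could
 * not be created. */
int set_specific_status(int delete_key_first)
{
    static int value = 42;
    pthread_key_t key;
    int status;

    if (pthread_key_create(&key, NULL) != 0)
        return -1;

    if (delete_key_first)
        pthread_key_delete(key);    /* key is now invalid */

    status = pthread_setspecific(key, &value);

    if (!delete_key_first)
        pthread_key_delete(key);    /* normal cleanup after use */

    return status;
}

/* Hypothetical helper: print a status in readable form. */
void report_status(const char *label, int status)
{
    if (status == 0)
        printf("%s: ok\n", label);
    else if (status > 0)
        printf("%s: %s\n", label, strerror(status));
    else
        printf("%s: key creation failed\n", label);
}
```

Calling report_status("valid key", set_specific_status(0)) and report_status("deleted key", set_specific_status(1)) from main() shows the difference between the two cases on a given platform.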

You could try commenting out the call to pthread_key_delete(getpthreadInfo); 
on line 148, but I think you can go further.  One thing that we have done 
since R3.14.10 is to delete the myAtExit() routine completely, since the 
cleanup it was doing really isn't necessary.  In R3.14.10 you can get the same 
effect by commenting out the call to atexit() on line 294 that registers it.  
Removing that cleanup has eliminated a lot of strange shutdown issues that we 
used to get on Linux.
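
As a rough illustration of that idea, and not the actual osdThread.c code, the sketch below creates a thread-specific key once and deliberately leaves the exit-time cleanup unregistered, so the key stays valid for any thread that starts late in shutdown. All names here (info_key, cleanup_at_exit, attach_thread_info, current_thread_info) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static pthread_key_t info_key;   /* hypothetical per-thread info key */
static int key_status = -1;      /* result of pthread_key_create() */

/* Hypothetical exit handler that would invalidate the key. */
static void cleanup_at_exit(void)
{
    pthread_key_delete(info_key);
}

static void once_init(void)
{
    key_status = pthread_key_create(&info_key, NULL);
    /* Registration intentionally disabled: the process is exiting
     * anyway, and leaving the key alone keeps it usable by threads
     * that start during shutdown. */
    /* atexit(cleanup_at_exit); */
    (void)cleanup_at_exit;       /* silence the unused-function warning */
}

/* Store a per-thread pointer; returns 0 on success or an errno value. */
int attach_thread_info(void *info)
{
    pthread_once(&once_control, once_init);
    if (key_status != 0)
        return key_status;
    return pthread_setspecific(info_key, info);
}

/* Fetch the calling thread's pointer, or NULL if none was stored. */
void *current_thread_info(void)
{
    pthread_once(&once_control, once_init);
    return key_status == 0 ? pthread_getspecific(info_key) : NULL;
}
```

The trade-off is that the key is never released explicitly, which is harmless when the operating system reclaims everything at process exit.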

Alternatively just upgrade to R3.14.12.1...

HTH,

- Andrew
-- 
An error is only a mistake if you don't learn from it.
When you learn something from it, it becomes a lesson.


Replies:
RE: caget very rarely core dumps in osdThread.c Shankar, Murali
References:
caget very rarely core dumps in osdThread.c Shankar, Murali

ANJ, 18 Nov 2013