Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: caget very rarely core dumps in osdThread.c
From: "Shankar, Murali" <mshankar@slac.stanford.edu>
To: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Fri, 6 May 2011 15:37:40 -0700
>> It's interesting that those are all :CHK records.  Is that because they're the 
>> only ones you're looking at using caget?

No, we are looking at several other PVs (7K+) in this utility. 
So far, the only ones that have failed are these CHK records.
I checked with the owner of these PVs, and nothing untoward seems to have happened at these times.

>> Good.  Do you have any information about messages output when the core dump 
>> occurred?
Unfortunately, no. I'll see if I can keep some of the logs around a bit longer.

>> You could try commenting out the call to pthread_key_delete(getpthreadInfo); 
>> on line 148, but actually I think you can go further, as one thing that we
>> ....
This is most likely the cause; I will try these suggestions out and see if that fixes it. 

>> Alternatively just upgrade to R3.14.12.1...
In this particular case, I'm fairly certain there's a way we can do this. 
Let me try this out as well.

Thank you for your assistance.

Regards,
Murali


-----Original Message-----
From: Andrew Johnson [mailto:anj@aps.anl.gov] 
Sent: Friday, May 06, 2011 3:17 PM
To: tech-talk@aps.anl.gov
Cc: Shankar, Murali
Subject: Re: caget very rarely core dumps in osdThread.c

Hi Murali,

On Friday 06 May 2011 16:04:07 Shankar, Murali wrote:
> As part of tracking down failures elsewhere, we ran into some core dumps of
>  caget from EPICS base version R3-14-10 on linux.  These were caget's for
>  different PV's and at different points in time.
> 
> Apr 27 07:46 core.21238 - caget GTW4:MEM:CHK
> Apr 27 13:54 core.25030 - caget MCCELOG:MEM:CHK
> Apr 30 14:30 core.26546 - caget PROD03:MEM:CHK
> May  3 13:27 core.12026 - caget DAEMON4:DISK:CHK
> May  5 13:20 core.4340 - caget PROD01:PROC:CHK

It's interesting that those are all :CHK records.  Is that because they're the 
only ones you're looking at using caget?

> We do have symbols and all the core files point here

Good.  Do you have any information about messages output when the core dump 
occurred?

> (gdb) bt
> #0  0x0098f9b1 in createImplicit () at
>  ../../../src/libCom/osi/os/posix/osdThread.c:466
> #1  0x00990064 in epicsThreadGetIdSelf () at
>  ../../../src/libCom/osi/os/posix/osdThread.c:618
> #2  0x009828b2 in cantProceed (msg=0x9a0a8c "start_routine") at
>  ../../../src/libCom/misc/cantProceed.c:63
> #3  0x0098f2a6 in start_routine (arg=0xb7d00598) at
>  ../../../src/libCom/osi/os/posix/osdThread.c:309
> #4  0x007423cc in start_thread () from /lib/tls/libpthread.so.0
> #5  0x005a9f0e in clone () from /lib/tls/libc.so.6

Line 309 of src/libCom/osi/os/posix/osdThread.c is checking the return value 
from pthread_setspecific(), which the man page says may fail with EINVAL if 
the key value is invalid.  That suggests some code is trying to start up 
another thread after the process has already run its atexit() routines, in 
particular the myAtExit() at line 115 in that file.

You could try commenting out the call to pthread_key_delete(getpthreadInfo); 
on line 148, but actually I think you can go further, as one thing that we 
have done since R3.14.10 is to delete the myAtExit() routine completely.  The 
cleanup it was doing really isn't necessary; you can just comment out the call 
to atexit() on line 294 that registers it.  Doing that has removed a lot of 
strange shutdown issues that we used to get on Linux.

Alternatively just upgrade to R3.14.12.1...

HTH,

- Andrew
-- 
An error is only a mistake if you don't learn from it.
When you learn something from it, it becomes a lesson.

