EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: sequencer SEGV
From: "Jeff Hill" <[email protected]>
To: "'Benjamin Sailer'" <[email protected]>, "'EPICS Tech-Talk'" <[email protected]>
Date: Tue, 18 Nov 2003 09:54:14 -0700
> #0  0x407cd147 in pthread_mutex_lock () from /lib/libpthread.so.0
> #1  0x40866d68 in free () from /lib/libc.so.6
> #2  0x400218c2 in Strdcpy () from
> /usr/local/opt/epics/extensions/lib/linux-x86/libpv.so
> #3  0x400184a4 in seq_pvGet ()

To get a better idea of what might be occurring, build the sequencer for
debugging, leave the code running under the debugger until it fails, and
then move the debugger's context to the Strdcpy frame and possibly also to
the function that calls Strdcpy. In gdb you can change stack frame contexts
by typing "up" and/or "down". Examine the arguments passed to see if they
are reasonable. In gdb you can list all of the running threads by typing
"info threads", and switch thread contexts by typing "thread nnn", where
nnn is the thread number shown in that list. A debugger GUI on Linux such
as ddd makes this much easier, especially if many threads are running.

The fact that free() fails while it is taking a mutex points somewhat
towards some form of memory corruption or, if the bug is in the Linux
run-time support, possibly a race condition. You might try to verify that
the same problem occurs on a newer version of Linux. You might also try
running the code on Solaris, if it's available, to see if the problem occurs
there. Since this looks vaguely like corruption, running the code through
Purify, assuming that you have a system that supports it, might also be
helpful.
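
One way to picture the race hypothesis is the minimal C sketch below. It is
not the libpv source. It assumes, from the name and the free() frame in the
trace only, that a Strdcpy-like helper frees the old destination string and
stores a fresh copy. guarded_string, guarded_string_set and run_demo are
hypothetical names. If two threads ran such a helper on the same destination
without a lock, both could pass the same old pointer to free(), and the
resulting heap damage could crash a later free() much like frame #1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Strdcpy-like helper. Not the libpv code. */
typedef struct {
    pthread_mutex_t lock;   /* serialises every replace on this slot */
    char *text;             /* owned heap string, or NULL */
} guarded_string;

static void guarded_string_init(guarded_string *gs)
{
    pthread_mutex_init(&gs->lock, NULL);
    gs->text = NULL;
}

/* Replace gs->text with a copy of src. Without the lock, two threads
 * could both read the same old pointer and both hand it to free(). */
static int guarded_string_set(guarded_string *gs, const char *src)
{
    size_t len;
    char *copy;

    assert(gs != NULL && src != NULL);
    len = strlen(src) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);

    pthread_mutex_lock(&gs->lock);
    free(gs->text);         /* old value released exactly once */
    gs->text = copy;
    pthread_mutex_unlock(&gs->lock);
    return 0;
}

/* Thread body: overwrite the shared slot many times. */
static void *writer(void *arg)
{
    guarded_string *gs = arg;
    int i;

    for (i = 0; i < 10000; i++)
        if (guarded_string_set(gs, "connected") != 0)
            return (void *)1;   /* non-NULL signals an allocation failure */
    return NULL;
}

/* Returns 0 if both writers finish and the slot holds the expected text. */
static int run_demo(void)
{
    guarded_string gs;
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    int ok;

    guarded_string_init(&gs);
    if (pthread_create(&t1, NULL, writer, &gs) != 0)
        return -1;
    if (pthread_create(&t2, NULL, writer, &gs) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    ok = r1 == NULL && r2 == NULL && gs.text != NULL
         && strcmp(gs.text, "connected") == 0;
    free(gs.text);
    pthread_mutex_destroy(&gs.lock);
    return ok ? 0 : -1;
}
```

Calling run_demo() from a main() and building with thread support exercises
the guarded path. Taking the two lock calls out turns it into the unguarded
case, which is the kind of thing Purify or a similar checker would flag.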

Jeff

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Benjamin
> Sailer
> Sent: Tuesday, November 18, 2003 9:24 AM
> To: EPICS Tech-Talk
> Subject: sequencer SEGV
> 
> Hello EPICS/sequencer-Gurus,
> 
> when trying to bring our Run Control System (based on exCas agents
> on ppc-lynxos, m68k-lynxos, x86-linux, one sequencer on x86-linux,
> one Tcl/Tk-client application on x86-linux) into stable operation,
> I discovered some SEGV's of our sequencer application.
> 
> We're using Linux 2.4.20-18.7smp / RedHat 7.3 as operating system,
> EPICS 3.14.2 as the base release and therefore seq-2.0.4 for the
> snc.  The problem occurs occasionally when at least one of the
> sequencer's threads (possibly more, but I'm not experienced in using
> a debugger with multiple threads) tries to read a bunch of PVs.
> 
> The SNL-code looks like this:
> 
> <snip>
>     state STOP_PHASE_8 {
>         when (delay(5.0)) {
>             if (pvConnectCount() == pvAssignCount()) {
>                 pvGet (allBit2);
>                 pvGet (allBit1);
>                 pvGet (allRuns);
>             }
>         } state STOP_PHASE_8
> </snip>
> 
> (the frequent queries are due to the fact that a monitor event might be
> missed in our shaky network).
> 
> The C-code produced seems to be:
> 
> <snip>
> static void A_daq_STOP_PHASE_8(SS_ID ssId, struct UserVar *pVar, short transNum)
> {
>     switch(transNum)
>     {
>     case 0:
>         {
> # line 1994 "../../runctrl/daq_seq/daqStateSet.st"
>             if (seq_pvConnectCount(ssId) == seq_pvAssignCount(ssId))
>             {
>                 seq_pvGet(ssId, 84 /* allBit2 */, 0);
>                 seq_pvGet(ssId, 68 /* allBit1 */, 0);
>                 seq_pvGet(ssId, 52 /* allRuns */, 0);
>             }
>         }
>         return;
>     case 1:
>         {
> # line 1997 "../../runctrl/daq_seq/daqStateSet.st"
>             daqClearCmd((pVar->allBit2), 0, (pVar->numSubsystems));
> </snip>
> 
> The debugger says the following (looking at the core file):
> <snip>
> #0  0x407cd147 in pthread_mutex_lock () from /lib/libpthread.so.0
> (gdb) where
> #0  0x407cd147 in pthread_mutex_lock () from /lib/libpthread.so.0
> #1  0x40866d68 in free () from /lib/libc.so.6
> #2  0x400218c2 in Strdcpy () from
> /usr/local/opt/epics/extensions/lib/linux-x86/libpv.so
> #3  0x400184a4 in seq_pvGet ()
>    from /usr/local/opt/epics/extensions/lib/linux-x86/libseq.so
> #4  0x08054439 in A_daq_STOP_PHASE_8 (ssId=135043048, pVar=0x80c9578,
> transNum=0)
>     at ../../runctrl/daq_seq/daqStateSet.st:1996
> #5  0x4001747c in ss_entry ()
>    from /usr/local/opt/epics/extensions/lib/linux-x86/libseq.so
> #6  0x40017290 in sequencer ()
>    from /usr/local/opt/epics/extensions/lib/linux-x86/libseq.so
> #7  0x407adb04 in start_routine () from
> /usr/local/opt/epics/base/lib/linux-x86/libCom.so
> #8  0x407cc2ef in pthread_exit () from /lib/libpthread.so.0
> </snip>
> 
> so if I believe the debugger, there is a bug in the thread-safe version
> of our C-library, but I don't dare to make such a statement about the
> million-times-used glibc version 2.2.5 ...
> 
> When commenting out the code of Strdcpy() (hoping this does only some
> error message transport which I don't care too much about), the SEGV's
> vanished from the scene, but I wonder whether there isn't a better
> solution than throwing away possibly needed code ...
> 
> Is this issue already known or addressed in a newer version of the
> sequencer (if it is a problem of the sequencer at all ...)?
> 
> Thanks for all comments
> 
> Benjamin
> 
> --
> *****************************************************************
> Benjamin Sailer
> eMail: [email protected]
> *****************************************************************
> Disclaimer:  This signature has been generated automatically and
> does not reflect my opinion at all.
>                 -- Benjamin Sailer
> 
> I cannot believe that God plays dice with the cosmos.
> 		-- Albert Einstein, on the randomness of quantum mechanics



Replies:
RE: sequencer SEGV Nick Rees
References:
sequencer SEGV Benjamin Sailer
