> #0 0x407cd147 in pthread_mutex_lock () from /lib/libpthread.so.0
> #1 0x40866d68 in free () from /lib/libc.so.6
> #2 0x400218c2 in Strdcpy () from /usr/local/opt/epics/extensions/lib/linux-x86/libpv.so
> #3 0x400184a4 in seq_pvGet ()
To get a better idea of what might be occurring, build the sequencer for
debugging, leave the code running in the debugger until it fails, and then
move the debugger's context to the Strdcpy function and possibly also to the
function that calls Strdcpy. In gdb you can change stack frame contexts by
typing "up" and/or "down". Examine the arguments passed to see if they are
reasonable. In gdb you can see all of the threads that are running by typing
"info threads", and you can switch thread contexts by typing "thread nnn". A
debugger GUI on Linux such as ddd makes this much easier, especially if many
threads are running.
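One argument check worth making in the Strdcpy frame is whether the source string points into the destination string that is about to be freed. Strdcpy's real source and signature are not shown here; the sketch below only assumes, from the free() call in frame #2, that it frees an old destination string and stores a copy of a new one. `copy_string_replace` is a hypothetical stand-in, not a libpv function. It copies before freeing, so aliased arguments are harmless, and it asserts on a NULL destination pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * HYPOTHETICAL helper: copy_string_replace() is not part of libpv or the
 * sequencer. It mimics what Strdcpy is assumed to do (free the old
 * destination string, store a copy of the source) so that the argument
 * checks worth making in the Strdcpy frame can be written down.
 *
 * Returns the new *dst. Returns NULL if src is NULL (the old string is
 * freed and *dst cleared) or if allocation fails (*dst is left as-is).
 */
static char *copy_string_replace(char **dst, const char *src)
{
    char *copy = NULL;

    /* A NULL destination pointer is always a caller bug. */
    assert(dst != NULL);

    if (src != NULL) {
        size_t len = strlen(src) + 1;

        copy = malloc(len);
        if (copy == NULL)
            return NULL;            /* keep the old string on failure */
        memcpy(copy, src, len);
    }

    /*
     * Copy BEFORE freeing: if src points into *dst (e.g. a call like
     * copy_string_replace(&s, s)), freeing first would make the copy
     * read freed memory, which is undefined behaviour.
     */
    free(*dst);
    *dst = copy;
    return copy;
}
```

For example, `copy_string_replace(&msg, msg)` leaves `msg` holding an intact copy of its previous value, whereas a free-first version of the same helper would read freed memory.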
The fact that free() fails while it is taking a mutex points somewhat towards
some form of heap corruption or, if the bug is in the Linux run-time support,
possibly a race condition. You might try to verify whether the same problem
occurs on a newer version of Linux. You might also try running the code on
Solaris, if it's available, to see whether the problem occurs there. Since
this looks vaguely like corruption, running the code through Purify, assuming
that you have a system that supports it, might also be helpful.
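A race of the general kind mentioned above can be illustrated with a small sketch. It assumes, hypothetically, that several threads replace one shared string with a free-then-copy pattern; nothing in this thread shows that libpv's Strdcpy is used that way, and every name below (`shared_msg`, `shared_msg_set`, `run_writers`) is illustrative only. Without locking, two threads can read the same old pointer and both free() it, a double free that damages the malloc arena and may only crash later inside an unrelated free(). The sketch allocates the copy first, swaps pointers under a pthread mutex, and frees the old string afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL shared state: one string updated by several threads.
 * None of these names come from libpv or the sequencer. */
struct shared_msg {
    pthread_mutex_t lock;
    char *text;
};

static int shared_msg_init(struct shared_msg *m)
{
    m->text = NULL;
    return pthread_mutex_init(&m->lock, NULL);
}

static void shared_msg_destroy(struct shared_msg *m)
{
    pthread_mutex_destroy(&m->lock);
    free(m->text);
    m->text = NULL;
}

/* Replace the shared string without letting two threads free the same
 * old pointer. Returns 0 on success, -1 if allocation fails. */
static int shared_msg_set(struct shared_msg *m, const char *src)
{
    assert(m != NULL && src != NULL);

    size_t len = strlen(src) + 1;
    char *copy = malloc(len);        /* allocate outside the lock */
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);

    pthread_mutex_lock(&m->lock);
    char *old = m->text;             /* swap pointers under the lock... */
    m->text = copy;
    pthread_mutex_unlock(&m->lock);

    free(old);                       /* ...so only this thread can free old */
    return 0;
}

struct worker_arg {
    struct shared_msg *m;
    const char *text;
    int iterations;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;

    for (int i = 0; i < a->iterations; i++)
        if (shared_msg_set(a->m, a->text) != 0)
            return (void *)1;        /* signal allocation failure */
    return NULL;
}

/* Start 1..8 writer threads and wait for all of them; 0 means success. */
static int run_writers(struct shared_msg *m, int nthreads, int iterations)
{
    static const char *const texts[] = { "connected", "disconnected" };
    pthread_t tids[8];
    struct worker_arg args[8];
    int created = 0, failed = 0;

    if (nthreads < 1 || nthreads > 8)
        return -1;

    for (; created < nthreads; created++) {
        args[created] = (struct worker_arg){ m, texts[created % 2], iterations };
        if (pthread_create(&tids[created], NULL, worker, &args[created]) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < created; i++) {
        void *ret = NULL;
        if (pthread_join(tids[i], &ret) != 0 || ret != NULL)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

Build with `-pthread`. Removing the lock/unlock pair in `shared_msg_set` reopens the window in which two threads can free the same pointer; a tool such as Purify is the kind of checker that would flag that double free near its source rather than at a later crash.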
Jeff
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Benjamin
> Sailer
> Sent: Tuesday, November 18, 2003 9:24 AM
> To: EPICS Tech-Talk
> Subject: sequencer SEGV
>
> Hello EPICS/sequencer-Gurus,
>
> When trying to bring our Run Control System (based on exCas agents
> on ppc-lynxos, m68k-lynxos and x86-linux, one sequencer on x86-linux,
> and one Tcl/Tk client application on x86-linux) into stable operation,
> I discovered some SEGVs in our sequencer application.
>
> We're using Linux 2.4.20-18.7smp / RedHat 7.3 as the operating system,
> EPICS 3.14.2 as the base release and therefore seq-2.0.4 for the
> snc. The problem occurs occasionally when at least one of the
> sequencer's threads (possibly more, but I'm not experienced in using a
> debugger with multiple threads) tries to read a bunch of PVs.
>
> The SNL-code looks like this:
>
> <snip>
> state STOP_PHASE_8 {
>     when (delay(5.0)) {
>         if (pvConnectCount() == pvAssignCount()) {
>             pvGet (allBit2);
>             pvGet (allBit1);
>             pvGet (allRuns);
>         }
>     } state STOP_PHASE_8
> </snip>
>
> (The frequent queries are there because a monitor event might be
> missed on our shaky network.)
>
> The generated C code seems to be:
>
> <snip>
> static void A_daq_STOP_PHASE_8(SS_ID ssId, struct UserVar *pVar, short transNum)
> {
>     switch(transNum)
>     {
>     case 0:
>         {
> # line 1994 "../../runctrl/daq_seq/daqStateSet.st"
>             if (seq_pvConnectCount(ssId) == seq_pvAssignCount(ssId))
>             {
>                 seq_pvGet(ssId, 84 /* allBit2 */, 0);
>                 seq_pvGet(ssId, 68 /* allBit1 */, 0);
>                 seq_pvGet(ssId, 52 /* allRuns */, 0);
>             }
>         }
>         return;
>     case 1:
>         {
> # line 1997 "../../runctrl/daq_seq/daqStateSet.st"
>             daqClearCmd((pVar->allBit2), 0, (pVar->numSubsystems));
> </snip>
>
> The debugger says the following (looking at the core file):
> <snip>
> #0 0x407cd147 in pthread_mutex_lock () from /lib/libpthread.so.0
> (gdb) where
> #0 0x407cd147 in pthread_mutex_lock () from /lib/libpthread.so.0
> #1 0x40866d68 in free () from /lib/libc.so.6
> #2 0x400218c2 in Strdcpy () from /usr/local/opt/epics/extensions/lib/linux-x86/libpv.so
> #3 0x400184a4 in seq_pvGet ()
> from /usr/local/opt/epics/extensions/lib/linux-x86/libseq.so
> #4 0x08054439 in A_daq_STOP_PHASE_8 (ssId=135043048, pVar=0x80c9578,
> transNum=0)
> at ../../runctrl/daq_seq/daqStateSet.st:1996
> #5 0x4001747c in ss_entry ()
> from /usr/local/opt/epics/extensions/lib/linux-x86/libseq.so
> #6 0x40017290 in sequencer ()
> from /usr/local/opt/epics/extensions/lib/linux-x86/libseq.so
> #7 0x407adb04 in start_routine () from
> /usr/local/opt/epics/base/lib/linux-x86/libCom.so
> #8 0x407cc2ef in pthread_exit () from /lib/libpthread.so.0
> </snip>
>
> So, if I believe the debugger, there is a bug in the thread-safe version
> of our C library, but I don't dare make such a statement about the
> million-times-used glibc version 2.2.5 ...
>
> When I commented out the code of Strdcpy() (hoping it only does some
> error-message transport that I don't care much about), the SEGVs
> vanished, but I wonder whether there isn't a better solution than
> throwing away possibly needed code ...
>
> Is this issue already known, or addressed in a newer version of the
> sequencer (if it is a problem of the sequencer at all ...)?
>
> Thanks for any comments
>
> Benjamin
>
> --
> *****************************************************************
> Benjamin Sailer
> eMail: [email protected]
> *****************************************************************