PS: One can attach to a timer queue either by explicitly creating one or by
asking for a default, so many different pieces of code can end up sharing a
common timer queue.
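
To illustrate how unrelated code can end up on the same queue, here is a minimal standalone sketch. It is not the EPICS implementation: SketchTimerQueue, SketchQueueRegistry and the idea of keying the shared default by thread priority are assumptions made for this example only.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a timer queue; not the EPICS class.
struct SketchTimerQueue {
    explicit SketchTimerQueue(unsigned prio) : priority(prio) {}
    unsigned priority;
};

// Hypothetical registry: hands out either a private queue or a shared
// default queue (assumed here to be one per thread priority).
class SketchQueueRegistry {
public:
    std::shared_ptr<SketchTimerQueue> allocate(bool okToShare, unsigned priority) {
        if (!okToShare) {
            // The caller explicitly creates its own queue.
            return std::make_shared<SketchTimerQueue>(priority);
        }
        std::lock_guard<std::mutex> guard(mutex_);
        std::weak_ptr<SketchTimerQueue> &slot = shared_[priority];
        std::shared_ptr<SketchTimerQueue> queue = slot.lock();
        if (!queue) {
            // First client at this priority creates the shared queue.
            queue = std::make_shared<SketchTimerQueue>(priority);
            slot = queue;
        }
        return queue;
    }

private:
    std::mutex mutex_;
    std::map<unsigned, std::weak_ptr<SketchTimerQueue>> shared_;
};
```

Two clients that both ask for the shared default at the same priority receive the same object, so a fault triggered through one client's timers shows up in a thread that other clients also depend on.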
PPS: One of the recent fixes (after your version) was to add a try/catch
block around the invocation of the timer expiration callback so that an
exception in the timer expiration callback will not cause the timer queue
facility to die. This type of flexibility wrt prioritized damage control is
one of the benefits of managing errors as exceptions (maybe no longer a
topic of discussion in the Java community).
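
The pattern described in the PPS can be sketched roughly as follows. This is a standalone illustration, not the actual fix: runExpired and the use of std::function callbacks are assumptions for the example.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: invokes each expired callback and returns how many
// of them threw. An exception from one callback is contained here, so the
// loop (and the thread running it) carries on with the remaining callbacks.
int runExpired(const std::vector<std::function<void()>> &callbacks) {
    int failures = 0;
    for (const auto &cb : callbacks) {
        try {
            cb();
        } catch (const std::exception &e) {
            ++failures;
            std::cerr << "timer callback threw: " << e.what() << '\n';
        } catch (...) {
            ++failures;
            std::cerr << "timer callback threw a non-standard exception\n";
        }
    }
    return failures;
}
```

The damage is limited to the one misbehaving callback; timers belonging to other clients of the same queue still run.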
PPPS: Maybe I am restating the obvious, but the throwing site is obviously
C++ code, which points to the timer queue or the CA client library, and
maybe a few others. Nevertheless, suspect corruption if the C++ facility
isn't being shut down, as C++ is pretty good about keeping track of when the
event id needs to be destroyed (only in the dtor).
Jeff
______________________________________________________
Jeffrey O. Hill Email [email protected]
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
> -----Original Message-----
> From: Jeff Hill [mailto:[email protected]]
> Sent: Thursday, May 06, 2010 9:48 AM
> To: 'Dirk Zimoch'; 'EPICS'
> Cc: 'ebner'
> Subject: RE: epicsEvent::invalidSemaphore exception in timerQueue
>
> Dirk,
>
> One typically attaches the tornado debugger and requests, for a particular
> thread, that it suspend execution at the point where the exception gets
> thrown instead of where it gets caught (currently in the epicsThread's
> last chance exception handler). This will provide a more useful stack
> trace.
>
> Are you aware of a precipitating set of events that leads to the demise
> of this timer queue? Is there any facility that is being shut down at
> about the same time (just before this occurrence)? New device support?
> Does it happen shortly after the EPICS system starts?
>
> Typically erroneous use of an invalid semaphore id results from ...
> A) memory corruption
> B) shutdown order issues where an object is used after it was destroyed
>
> It _would_ be useful to determine what component of the IOC the timer
> queue belongs to. We should be able to make that determination if you
> send the output from the vxWorks "i" command (hopefully also identifying
> the task id of the culprit). The associated component probably can be
> inferred from the relative execution priorities of the timer queue to the
> spawning component.
>
> I am the author of the timer queue (a new, heavily used, feature in
> R3.14). The timer queue thread typically spends most of its time waiting,
> with timeout, on an event semaphore. The event semaphore gets posted only
> when the timer queue needs to reschedule. That particular event semaphore
> would only become invalid if the timer queue was being shut down, the
> timer queue data structure was corrupted, or if the vxWorks kernel data
> structures were corrupted.
>
> The timer queue class has a show diagnostic member function which is
> typically called by the diagnostic member function of its owner. So if
> you can find out who the timer queue belongs to then you can invoke its
> show function, at increased interest level, to find out if there is
> generalized corruption in its data structures. The tornado debugger can
> also help with quick surveys for corruption.
>
> Also, see Mantis entries 336, 332, 320 which may be unrelated, but
> nevertheless do involve fixes to the timer queue facility after your
> version.
>
> Jeff
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:tech-talk-
> > [email protected]] On Behalf Of Dirk Zimoch
> > Sent: Thursday, May 06, 2010 7:51 AM
> > To: EPICS
> > Cc: ebner
> > Subject: epicsEvent::invalidSemaphore exception in timerQueue
> >
> > Hi all,
> >
> > We are having a strange problem where a TimerQueue task gets suspended
> > because of an invalidSemaphore exception and we can't find out where.
> >
> > We are using EPICS 3.14.8 on vxWorks.
> >
> > Here is the error message:
> >
> > > 0x1fb87960 (timerQueue): Unhandled C++ exception resulted in call to terminate
> > > epicsThread: Unexpected C++ exception "epicsEvent::invalidSemaphore()" with type "Q210epicsEvent16invalidSemaphore" in thread "timerQueue" at THU MAY 06 2010 10:19:21.310783060
> >
> > The dead task:
> > > timerQueue a94c08 1fb87960 148 SUSPEND 23d410 1fb87470 3d0002 0
> >
> > And the stack trace:
> > >> tt 0x1fb87960
> > > 243c24 vxTaskEntry +68 : a94c08 (&epicsThreadCallEntryPoint, 1f9d24fc)
> > > a94c84 epicsThreadOnceOsd+174: a7b230 ()
> > > a7b230 epicsThreadCallEntryPoint+5c8: __cp_exception_info ()
> > > 15e2fc __cp_exception_info+0 : __default_unexpected(void) ()
> > > 15e2b8 set_terminate(void (*)(void))+0 : terminate(void) ()
> > > 15e2a8 __default_unexpected(void)+0 : cplusTerminate(void) ()
> > > 15d068 cplusTerminate(void)+50 : taskSuspend ()
> >
> > I found out that epicsEvent::invalidSemaphore is thrown by
> > epicsEvent::wait() and its variants when semTake fails for reasons
> > other than timeout (which can only be an invalid SEM_ID).
> > Unfortunately the stack trace is not very helpful to find out where the
> > actual error happened (thanks to the C++ exception mechanism).
> >
> > The error happens when (shortly after) the attached SNL program finishes
> > the entry block of state "active". We are using seq version 2.0.10.
> >
> > Any idea how to find out where the problem really is? Who might own the
> > timerQueue? Which epicsEvent might the timerQueue be waiting for, and how
> > might that epicsEvent semaphore have been corrupted?
> >
> > Dirk
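
The behaviour Dirk describes (a timed wait that throws when the underlying semaphore take fails for any reason other than timeout) can be sketched in a standalone form. TakeStatus, InvalidSemaphore and waitWithTimeout are hypothetical names for this illustration; the callable takeFn stands in for the OS-level take and is not the vxWorks or EPICS API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical result of an OS-level semaphore take.
enum class TakeStatus { Ok, Timeout, Error };

// Hypothetical exception type, standing in for epicsEvent::invalidSemaphore.
struct InvalidSemaphore : std::runtime_error {
    InvalidSemaphore() : std::runtime_error("invalid semaphore") {}
};

// Returns true if the event was signalled and false on timeout. Any other
// failure is treated as an invalid handle and reported by throwing.
template <typename TakeFn>
bool waitWithTimeout(TakeFn takeFn, double seconds) {
    switch (takeFn(seconds)) {
    case TakeStatus::Ok:
        return true;
    case TakeStatus::Timeout:
        return false;
    default:
        throw InvalidSemaphore();
    }
}
```

Because the throw happens deep inside the wait call, a handler far up the stack sees only the exception type. That is why the reply above suggests having the debugger stop at the throw site rather than at the catch site.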