Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: workQPanic! Oh no!
From: Tim Mooney <mooney@aps.anl.gov>
To: tech-talk@aps.anl.gov, marki@jlab.org
Date: Wed, 1 Sep 1999 14:13:55 -0500 (CDT)
re...,

> I know that this is probably a VxWorks question, but what exactly does
> 
>   workQPanic: Kernel work queue overflow.
> 
> mean? We note a perfect correlation with IOC crashes ;). Running
> R3.13.0beta11 and VxWorks 5.2 on a MVME162-532A, 16 MB, not obviously near
> the memory limit, not totally CPU limited either. Running lots of
> sequences, for what that's worth.
> 
>   -- Mark
> 
> --
> Mark M. Ito, Thomas Jefferson National Accelerator Facility

Too many interrupts.  The perfect correlation with IOC crashes is not
coincidental.

John Winans got off on a rant on this subject several years ago.  Here's the
part of his reply that was on point, or at least in the vicinity of the point:

>>  ...The only workQPanic stuff I am
>> aware of is based on the fact that VW uses a ring buffer to collect a
>> list of junk to do 'next'.  It basically contains a list of semaphores
>> that need to be 'given' now.  The reason they have this set up this way
>> is because they are idiots and don't know enough to write code that
>> scales properly.  You see, this ring buffer is a fixed size (ingenious
>> eh?) and it can fill up if it decides to defer the giving of these
>> semaphores.
>> 
>> Well, the only time these semaphore 'give' operations are deferred is
>> when a semGive() is in an interrupt handler.  Thus this message means
>> that there were too many semGive()s that happened in a narrow time in
>> one or more interrupt handlers.  Where a "narrow time" is one where not
>> enough time has elapsed to let the cpu run the code to complete all the
>> semGive()s.
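
To make the mechanism above concrete, here is a small stand-alone C model of a fixed-size work queue. It is only a sketch: the names (WorkQ, workq_defer_give, workq_drain, simulate) and the 8-slot capacity are made up for illustration and are not the VxWorks kernel's own code or sizes. Each "interrupt" defers one give into the ring, and the "kernel" completes a limited number of them per round.

```c
#include <assert.h>
#include <stddef.h>

#define WORKQ_SLOTS 8  /* hypothetical capacity, chosen only for this model */

typedef struct {
    int    jobs[WORKQ_SLOTS]; /* each entry stands for one deferred semGive() */
    size_t head;              /* index of the oldest queued entry */
    size_t count;             /* number of entries currently queued */
    int    panicked;          /* set once an enqueue finds the ring full */
} WorkQ;

static void workq_init(WorkQ *q)
{
    q->head = 0;
    q->count = 0;
    q->panicked = 0;
}

/* Simulated interrupt level: defer one give.
 * Returns 0 on success, -1 if the ring is already full (the panic case). */
static int workq_defer_give(WorkQ *q, int sem_id)
{
    assert(q != NULL);
    if (q->count == WORKQ_SLOTS) {
        q->panicked = 1;
        return -1;
    }
    q->jobs[(q->head + q->count) % WORKQ_SLOTS] = sem_id;
    q->count++;
    return 0;
}

/* Simulated kernel time: complete at most max_jobs deferred gives.
 * Returns how many were completed. */
static size_t workq_drain(WorkQ *q, size_t max_jobs)
{
    size_t done = 0;
    while (q->count > 0 && done < max_jobs) {
        q->head = (q->head + 1) % WORKQ_SLOTS;
        q->count--;
        done++;
    }
    return done;
}

/* Each round: irqs_per_round interrupts arrive back to back, then the
 * kernel gets enough CPU to complete drained_per_round gives.
 * Returns 1 if the queue overflowed, 0 otherwise. */
static int simulate(size_t rounds, size_t irqs_per_round,
                    size_t drained_per_round)
{
    WorkQ q;
    workq_init(&q);
    for (size_t r = 0; r < rounds && !q.panicked; r++) {
        for (size_t i = 0; i < irqs_per_round; i++) {
            if (workq_defer_give(&q, (int)i) != 0)
                break;
        }
        workq_drain(&q, drained_per_round);
    }
    return q.panicked;
}
```

In this model, simulate(100, 4, 4) never overflows, while simulate(100, 5, 4) does after a few rounds: a sustained interrupt rate only slightly above what the CPU can retire is enough to fill a fixed-size ring.
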
>> 
>> Personally, I think the boyz at VW are idiots because it is not very
>> hard to replace the code that currently looks like this:
>> 
>> if (ringAroundTheCollarBufferIsFull)
>> 	panic;
>> else
>> 	addMoreCrapTo(ringAroundTheCollarBuffer, crap);
>> 
>> with code that is more sensible and truly useful in the real world that
>> might look more like this:
>> 
>> if (ringAroundTheCollarBufferIsFull)
>> {
>> 	flushTheDamnThing(ringAroundTheCollarBuffer);
>> 	consoleWrite("WARNING: ring buffer overflow\n");
>> }
>> addMoreCrapTo(ringAroundTheCollarBuffer, crap);
>> 
>> Yah, it is a bad thing to do list-processing and large amounts of work
>> at an interrupt level, but it sure beats asserting a panic()!!!!
>> 
>> 
>> Now, the way I deal with this misfeature is to figure out what IRQs might
>> be generated at the time that it dies... it is usually fairly easy
>> because you can normally narrow things down to some single device that
>> is being used or saturated at the panic moment... that driver is the
>> one that is generating too many IRQs.
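
One possible way to do that narrowing down, if the suspect handlers can be edited, is to count interrupts per source and compare two snapshots taken a known interval apart. The following is a minimal sketch in plain C; irq_note, irq_snapshot, irq_busiest and NUM_SOURCES are hypothetical names invented for this example, not part of VxWorks or EPICS.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SOURCES 4  /* hypothetical number of interrupt sources watched */

/* One counter per source; volatile because handlers update it. */
static volatile unsigned long irq_count[NUM_SOURCES];

/* Call at the top of each (hypothetical) interrupt handler. */
static void irq_note(size_t source)
{
    assert(source < NUM_SOURCES);
    irq_count[source]++;
}

/* Copy the current counters into out[]. */
static void irq_snapshot(unsigned long out[NUM_SOURCES])
{
    for (size_t i = 0; i < NUM_SOURCES; i++)
        out[i] = irq_count[i];
}

/* Return the index of the source that fired most often between two
 * snapshots. Unsigned subtraction keeps the difference correct even
 * if a counter wrapped around once in between. */
static size_t irq_busiest(const unsigned long *before,
                          const unsigned long *after)
{
    size_t best = 0;
    unsigned long best_delta = after[0] - before[0];
    for (size_t i = 1; i < NUM_SOURCES; i++) {
        unsigned long delta = after[i] - before[i];
        if (delta > best_delta) {
            best_delta = delta;
            best = i;
        }
    }
    return best;
}
```

Taking one snapshot during normal running and another just before the usual time of death, then calling irq_busiest on the pair, points at the driver worth looking at first.
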
>> 
>> The most common occurrence of this that I have seen is caused by using
>> raw (non-debounced) binary inputs on a card whose driver is configured
>> to generate interrupts on input transitions.  It is pretty much a
>> guarantee to panic the beast.  The second easy way to do it would be to
>> enable IRQ processing of global events on Frank Lenksus's global event
>> boards... and then generating a boat-load of events with the event code
>> numbers of those enabled IRQs... I wrote the code... I know it will
>> fail if not configured properly.  I have caused it to do so while bench
>> testing.
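
For the bouncing binary input case, one software-side mitigation is a hold-off filter in the handler, so that only the first transition in a burst leads to a semGive(). The sketch below is plain C under stated assumptions: Debounce, debounce_init and debounce_accept are hypothetical names, and the tick values are whatever monotonic counter the system provides. Debouncing in hardware, or not interrupting on input transitions at all, attacks the problem more directly, since this filter still takes every interrupt and only limits how many gives get queued.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long holdoff_ticks; /* minimum spacing between accepted edges */
    unsigned long last_accept;   /* tick of the last accepted edge */
    int           armed;         /* 0 until the first edge has been accepted */
} Debounce;

static void debounce_init(Debounce *d, unsigned long holdoff_ticks)
{
    assert(d != NULL);
    d->holdoff_ticks = holdoff_ticks;
    d->last_accept = 0;
    d->armed = 0;
}

/* Call from the input-transition handler with the current tick count.
 * Returns 1 if this edge should be passed on (for example by giving a
 * semaphore), 0 if it falls inside the hold-off window and is treated
 * as contact bounce. */
static int debounce_accept(Debounce *d, unsigned long now)
{
    assert(d != NULL);
    if (d->armed && (now - d->last_accept) < d->holdoff_ticks)
        return 0;
    d->armed = 1;
    d->last_accept = now;
    return 1;
}
```

With a hold-off of 10 ticks, edges at ticks 100, 101, 109, 110 and 115 yield accept, ignore, ignore, accept, ignore, so a burst of bounces costs one queued give instead of one per edge.
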
>> 
>> Groovus?
>> 
>> 
>> --John Winans

Tim Mooney (mooney@aps.anl.gov) (630)252-5417
Beamline Controls & Data Acquisition Group
Advanced Photon Source, Argonne National Lab


