EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: hanging IOCs
From: Jeff Hill <[email protected]>
To: Ric Claus <[email protected]>
Cc: [email protected], [email protected]
Date: Thu, 20 Feb 1997 16:55:56 -0700
Ric Claus wrote:
> 
> We (SLAC's PEP-II RF group) are having a problem with hanging/crashing
> IOCs and wonder if anyone has some suggestions.  We're running EPICS
> R3.13.0.beta4 with dm version 2.3.  The processor is a National
> Instruments VXIcpu-030 with 8 MB of RAM.  When everything is up and
> running normally we see about 2 MB of free memory.  Task stacks are not
> close to the edge.  There is no VXI interrupt activity, although perhaps
> some are being generated by the Allen Bradley scanner module.

Early versions of the NI CPU030 had serious hardware flaws. These
problems were fixed over several board swap iterations here.
We eventually ended up with a version that had no hard faults;
however, the system would not run under heavy Ethernet load
without eventually crashing. NI was blaming WRS and vice versa.
We lived with the problem for a year or so. They eventually gave us
another hardware/software upgrade and our reliability problems
went away.
 
LBL had a similar experience except that they started using
the board after the initial bugs were worked out. They did
experience the reliability problems but didn't wait as long for
them to be fixed because their project started up not long
before NI discovered the cause of the reliability problem.
As I recall, Boeing also saw an improvement after the upgrade.

Apparently the reliability problem was fixed by WRS SPR 4143,
which is intended to fix a system crash resulting from a locked-up
LANCE Ethernet chip. As I recall, a fix for this SPR was
included with REV D6 of the board, which most likely also
included other hardware/BSP changes. I am not sure about this
rev level so you should check with NI. We are checking the
rev levels of our boards but have not so far located where NI
hides the rev sticker.

> When an IOC crashes or hangs, there is generally nothing to see on the
> console port record.  Once, however, I saw tNetTask trying to say
> something, but it never made it out.

> 
> One possibility I wonder about is that CA clients cause the creation
> of "CA client" and "CA event" tasks that consume a lot (800 Mb) of
> memory.  

It's true that memory is allocated for each client. There is also memory
allocated for each channel that is created and also for each monitor
that is created. If you are seeing 800 Mb then the client that
is attaching must be setting up many CA monitors. 
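
As a rough illustration of how these three kinds of allocation add up,
here is a minimal Python sketch. The byte costs are hypothetical
placeholders chosen only to show the shape of the arithmetic; the real
per-client, per-channel, and per-monitor figures depend on the EPICS
release and are not given in this thread.

```python
# Hypothetical per-object costs in bytes (assumptions, not EPICS figures).
BYTES_PER_CLIENT = 16_000
BYTES_PER_CHANNEL = 200
BYTES_PER_MONITOR = 400


def client_footprint(channels: int, monitors: int) -> int:
    """Estimate server memory held on behalf of one CA client."""
    return (BYTES_PER_CLIENT
            + channels * BYTES_PER_CHANNEL
            + monitors * BYTES_PER_MONITOR)


# A small client versus a display that monitors many channels.
small = client_footprint(channels=10, monitors=10)
large = client_footprint(channels=1000, monitors=1000)
print(small, large)  # prints: 22000 616000
```

Under these assumed costs a single display with many monitors holds far
more server memory than a small client does, so the per-client total is
dominated by what the client subscribes to and not by the connection
itself.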

> Perhaps these crashes are due to one too many dm sessions
> being fired up in people's offices. 

I routinely run the IOC out of memory when I am testing channel access.
I have not seen any problems of this sort. If there were enough clients
to get the CA server to its 100k bytes free limit then you would see
messages on the console.

It should be easy to test this by creating a dm screen with many
channels and running it many times.

> Is there a way to limit the
> number of CA clients?  

Since different clients use different amounts of memory, depending on
the number of channels/monitors they create, it is not that useful to
limit only the number of clients.

Should we consider adding a configurable hard limit on the 
total number each of {clients,channels,monitors} in the future?

CA will not allow clients to connect if there is less than 100k free.
The WRS routine that returns the number of free bytes was (last time
I checked) written very inefficiently (it was summing up all of the
blocks on the free list). Therefore, the CA server checks the number
of free bytes periodically (every EPICS_CA_BEACON_PERIOD), and not every 
time that it calls malloc().
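
The polling scheme described above can be sketched in Python as follows.
This is not the CA server's code: the class name, the method names, the
injectable clock, and the reading of "100k" as 100 * 1024 bytes are all
assumptions made for illustration. The point is only that the expensive
free-byte measurement runs once per period and admission decisions use
the cached value in between.

```python
import time

FREE_LIMIT = 100 * 1024  # "100k" from the text; exact byte count is assumed


class FreeMemoryGate:
    """Admit new clients using a periodically refreshed free-byte count."""

    def __init__(self, measure_free, period_s, clock=time.monotonic):
        self._measure_free = measure_free  # expensive, e.g. a free-list walk
        self._period_s = period_s          # stands in for the beacon period
        self._clock = clock
        self._cached_free = measure_free()
        self._last_poll = clock()

    def poll_if_due(self):
        """Re-measure free memory only when a full period has elapsed."""
        now = self._clock()
        if now - self._last_poll >= self._period_s:
            self._cached_free = self._measure_free()
            self._last_poll = now

    def may_accept_client(self):
        """Refuse new clients when the cached free count is under the limit."""
        self.poll_if_due()
        return self._cached_free >= FREE_LIMIT


# Demo with a fake clock and a fake free-byte counter.
fake_now = [0.0]
fake_free = [500_000]
gate = FreeMemoryGate(lambda: fake_free[0], period_s=15.0,
                      clock=lambda: fake_now[0])
print(gate.may_accept_client())  # True: plenty of memory at startup
fake_free[0] = 50_000            # memory drops between polls
print(gate.may_accept_client())  # True: the cached value is stale
fake_now[0] = 15.0               # one period later the gate re-measures
print(gate.may_accept_client())  # False
```

The trade-off shown by the middle call is that connections arriving
between polls are judged against an out-of-date count, so free memory can
dip below the limit for up to one period before new clients are refused.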

The CA gateway could also be used to concentrate several clients
into one effective client (as seen by the IOC).
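
To make the concentration effect concrete, here is a small Python sketch
that counts what the IOC would see with and without a gateway. It assumes
the gateway holds one channel per distinct PV on behalf of all of its
downstream clients. The function names and PV names are hypothetical, and
the model ignores everything except client and channel counts.

```python
def ioc_load_direct(clients):
    """Load seen by the IOC when every client connects to it directly."""
    return {"clients": len(clients),
            "channels": sum(len(pvs) for pvs in clients)}


def ioc_load_via_gateway(clients):
    """Load seen by the IOC when one gateway fronts all of the clients."""
    distinct = set().union(*clients) if clients else set()
    return {"clients": 1 if distinct else 0, "channels": len(distinct)}


# Twenty copies of the same 50-channel display (hypothetical PV names).
screens = [{f"rf:pv{i}" for i in range(50)} for _ in range(20)]
print(ioc_load_direct(screens))       # {'clients': 20, 'channels': 1000}
print(ioc_load_via_gateway(screens))  # {'clients': 1, 'channels': 50}
```

The saving is largest when many clients display the same channels, which
is the situation of one screen being opened in many offices.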

> What do these tasks do with that much memory?  

I suspect that most of the memory in your case is being used for a
monitor queue. The new CA server (which is _not_ the IOC server in 3.13)
is better about memory consumption for the monitor queue.

> When IOC resources
> run out, does CA currently stop allowing new client connections or
> does it let the IOC die? 

It stops allowing new clients when the IOC's free memory drops below 
100k (as polled every EPICS_CA_BEACON_PERIOD).

Jeff

-- 
______________________________________________________________________
Jeffrey O. Hill                 Internet        [email protected]
LANL MS H820                    Voice           505 665 1831
Los Alamos, NM 87545 USA        FAX             505 665 5107

