Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: vxWorks network problems
From: "Steven M. Hartman" <hartmansm@ornl.gov>
To: Dirk Zimoch <dirk.zimoch@psi.ch>
Cc: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Mon, 23 May 2011 12:18:14 -0400
Dirk Zimoch wrote:

> First incident:
> An IOC (vxWorks 5.5) lost CA connectivity. All medm panels went white. Logging in on the IOC over the serial port showed that the database was still running, but there were many messages
> "rsrv: system low on network buffers - send retry in 15 seconds"
> These messages made it a bit tough to debug, because they spill all over the output of any debug tool.

As you saw, mbuf starvation is usually recoverable once the client starts behaving. Jeff's comments about queuing theory show that you cannot protect the server completely, but you can hopefully tune your buffer numbers to deal with most situations your network is likely to experience. Was this a one-time event, and if so, was anything unusual happening at the time? Which EPICS versions are the IOC and the clients running?
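To put a rough number on the tuning point, here is a minimal sketch that treats the buffer pool as a simple loss system and evaluates the Erlang B formula. This is my own illustration, not Jeff's argument: the M/M/N/N model, the function name `erlang_b`, and the offered load of 10 are all assumptions, and real mbuf usage is burstier than this model allows. It only shows the shape of the trade-off: a larger pool makes starvation rarer, but the probability never reaches zero.

```python
def erlang_b(buffers: int, offered_load: float) -> float:
    """Blocking probability of an M/M/N/N loss system (Erlang B).

    buffers      -- number of buffers in the pool (N)
    offered_load -- arrival rate times mean holding time, in erlangs (assumed)
    """
    if buffers < 0 or offered_load < 0:
        raise ValueError("buffers and offered_load must be non-negative")
    blocking = 1.0  # with zero buffers every request is refused
    # Standard numerically stable recursion on the pool size.
    for n in range(1, buffers + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


if __name__ == "__main__":
    # Hypothetical load of 10 erlangs; compare a few pool sizes.
    for pool in (8, 16, 32, 64):
        print(f"{pool:3d} buffers -> P(no buffer) = {erlang_b(pool, 10.0):.3e}")
```

Running it prints the chance of finding no free buffer for each pool size, which falls quickly as the pool grows but stays above zero.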

> Second incident:
> Another IOC lost CA connectivity. This time the error message was different:
> CA beacon (send to "...") error was "ENOBUFS"

This one is similar to what SNS has seen over the years with mv2100 and related boards with the DEC network driver. The precipitating event is a temporary loss of the physical network layer (e.g. an unplugged network cable to the IOC, or an edge network switch powered down). The cause appears to be a buggy network driver that cannot recover from this fault when many UDP packets are queued in the outgoing buffers. This one does not seem to be recoverable except by a reboot, so we have taken steps to reduce the likelihood of losing the physical link. We do not see this with other boards.
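The practical difference between the two incidents is whether ENOBUFS ever clears. A minimal sketch of that distinction follows, in Python for illustration only; it is not the EPICS or vxWorks code. The function name `send_with_retry` and its parameters are hypothetical, and the 15-second default simply mirrors the delay in the rsrv message quoted above. A transient shortage succeeds on a later attempt, while a wedged driver exhausts the attempts and has to be escalated, for example to an operator alarm or a reboot.

```python
import errno
import time


def send_with_retry(sock, payload, address, attempts=3, delay=15.0,
                    sleep=time.sleep):
    """Send one UDP datagram, retrying while the stack reports ENOBUFS.

    Returns True once the datagram is accepted, or False if ENOBUFS
    persisted for every attempt. Other socket errors are re-raised.
    """
    for attempt in range(attempts):
        try:
            sock.sendto(payload, address)
            return True
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise  # not a buffer shortage, so do not mask it
            if attempt + 1 < attempts:
                sleep(delay)  # give the stack time to free buffers
    return False
```

With a real socket this would be called as `send_with_retry(sock, data, (host, port))`; the `sleep` argument exists only so the delay can be replaced in tests.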

--
Steven Hartman
hartmansm@ornl.gov

Replies:
Re: vxWorks network problems Andrew Johnson
Re: vxWorks network problems Dirk Zimoch
References:
vxWorks network problems Dirk Zimoch
