Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: vxWorks network problems
From: Dirk Zimoch <dirk.zimoch@psi.ch>
To: "Steven M. Hartman" <hartmansm@ornl.gov>
Cc: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Tue, 24 May 2011 11:54:37 +0200
Hi Steven,

Steven M. Hartman wrote:
Dirk Zimoch wrote:

First incident:
An IOC (vxWorks 5.5) lost CA connectivity. All medm panels went white. Logging in to the IOC over the serial port showed that the database was still running, but there were many messages
"rsrv: system low on network buffers - send retry in 15 seconds"
These messages made it a bit tough to debug, because they spill all over any output of any debug tool.

As you saw, mbuf starvation is usually recoverable once the client starts behaving. Jeff's comments about queuing theory show that you cannot protect the server completely, but you can hopefully tune your buffer numbers to deal with most situations your network is likely to experience. Was this a one-time event, and if so, was anything unusual happening at the time? Which EPICS version for IOC and the clients?

All clients and IOCs are running EPICS 3.14.8.2.

I tried to tune the network buffers. Once I saw with inetstatShow that the send queues were filling up, I increased their size from the default 8k to 64k. That seems to have been a bad decision, because now I no longer have enough mbufs. But how many do I need? With 20 CA connections, I might need more than a MB of mbufs. So I had better go back to 8k buffers.

Has anyone changed TCP_SND_SIZE_DFLT or TCP_RCV_SIZE_DFLT in vxWorks?

My settings are at the moment:
#define NUM_64          800             /* default 100 */
#define NUM_128         800             /* default 100 */
#define NUM_256         100             /* default 40  */
#define NUM_512         100             /* default 40  */
#define NUM_1024        100             /* default 25  */
#define NUM_2048        100             /* default 25  */

#define NUM_SYS_64      1024            /* default 64  */
#define NUM_SYS_128     1024            /* default 64  */
#define NUM_SYS_256     512             /* default 64  */
#define NUM_SYS_512     512             /* default 64  */

/* TCP queue sizes increased to improve array throughput */
#define TCP_SND_SIZE_DFLT 65536
#define TCP_RCV_SIZE_DFLT 65536


Second incident:
Another IOC lost CA connectivity. This time the error message was different:
CA beacon (send to "...") error was "ENOBUFS"

This one is similar to what SNS has seen over the years with mv2100 and related boards with the DEC network driver. The precipitating event is a temporary loss of the physical network layer (i.e. unplugged network cable to IOC, or edge network switch powered down). It looks like a buggy network driver that cannot recover from this fault when there are lots of UDP packets queuing in the outgoing buffers. This one does not seem to be recoverable except by a reboot, so we have taken steps to reduce the likelihood of losing the physical link. We do not see this with other boards.


It was during user shift, so hopefully nobody had unplugged any cables. Probably ca_search UDP packets are involved, but the time resolution of our network diagnostics is not good enough to tell whether the IOC crashed because of the search broadcasts or the clients broadcast because the IOC crashed.

In my experience, WRS network drivers are always buggy. My dec21x40End.c is from September 2005. Maybe I should ask WRS for a newer driver.


Thanks for your reply

Dirk

Replies:
Re: vxWorks network problems Steven M. Hartman
Re: vxWorks network problems Andrew Johnson
References:
vxWorks network problems Dirk Zimoch
Re: vxWorks network problems Steven M. Hartman
