Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: vxWorks network problems
From: Dirk Zimoch <dirk.zimoch@psi.ch>
To: "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Mon, 23 May 2011 12:06:30 +0200

Hi all,

We had some network problems over the weekend. Maybe someone knows what to do. Here is what I observed:

First incident:
An IOC (vxWorks 5.5) lost CA connectivity. All medm panels went white. Logging in on the IOC over the serial port showed that the database was still running, but there were many messages like this:
"rsrv: system low on network buffers - send retry in 15 seconds"
These messages made it a bit tough to debug, because they spill all over any output of any debug tool.

But what I found was: mbufShow showed 0 free buffers. Where are they?
inetstatShow showed three CA connections with full send queues. Following the foreign address entries and using netstat -tp on the client computers, I found one CA gateway and 2 Tcl/Tk clients. All had large numbers in their receive queues. At least the gateway reported in its log file that it had lost the connection to the IOC.

After restarting the gateway and killing the two clients, the IOC recovered.

This is not the first time this has happened. I have seen all types of clients cause this problem: Tcl, medm, gateway, ...

Some time ago I had increased the queue sizes on vxWorks from the default 8k to 64k. Was that a bad idea?
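As a rough way to reason about the queue size question, the sketch below computes an upper bound on the payload bytes that stalled connections can keep queued on the IOC, assuming each TCP connection can hold up to its full send buffer. The function names are hypothetical, and real mbuf consumption would be somewhat higher because of per-buffer overhead and cluster rounding, which this ignores.

```c
#include <stddef.h>

/* Upper bound on payload bytes held in socket send buffers by
 * connections whose peers have stopped reading. Assumption: each
 * connection can queue at most sndbuf_bytes of data. */
unsigned long stalled_send_bytes(unsigned int connections,
                                 unsigned long sndbuf_bytes)
{
    return (unsigned long)connections * sndbuf_bytes;
}

/* Returns 1 if the stalled connections alone could use up a buffer
 * pool of pool_bytes (a hypothetical figure; take the real one from
 * the IOC's network buffer configuration), otherwise 0. */
int stalled_can_exhaust_pool(unsigned int connections,
                             unsigned long sndbuf_bytes,
                             unsigned long pool_bytes)
{
    return stalled_send_bytes(connections, sndbuf_bytes) >= pool_bytes;
}
```

For the three stalled connections seen in the first incident, this gives 24576 bytes at 8k per socket and 196608 bytes at 64k. Whether that is enough to drain the pool depends on how large the network buffer pool on the IOC is, which is not stated here.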

How should an IOC behave if a CA client that has subscribed to monitor events does not handle its input fast enough? Using up all network resources on vxWorks is not the best thing that can happen.
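One conceivable policy, shown here only as a sketch and not as a description of what rsrv actually does, is to keep at most one pending value per subscription and overwrite it while the client is not reading. Server-side memory per client then stays bounded no matter how slow the client is. All names and the subscription limit below are hypothetical.

```c
#include <stddef.h>

#define MAX_SUBSCRIPTIONS 4   /* hypothetical limit, kept small for the sketch */

struct pending_update {
    int      dirty;        /* 1 if a value is waiting to be sent */
    double   value;        /* only the most recent value is kept */
    unsigned overwritten;  /* updates replaced before they were sent */
};

struct client_queue {
    struct pending_update slot[MAX_SUBSCRIPTIONS];
};

/* Record a new monitor value for subscription sub. If an older value is
 * still pending, it is replaced instead of queued behind.
 * Returns 0 on success, -1 if sub is out of range. */
int post_update(struct client_queue *q, size_t sub, double value)
{
    if (sub >= MAX_SUBSCRIPTIONS)
        return -1;
    if (q->slot[sub].dirty)
        q->slot[sub].overwritten++;
    q->slot[sub].value = value;
    q->slot[sub].dirty = 1;
    return 0;
}

/* Fetch the pending value when the socket can accept data again.
 * Returns 1 and stores the value in *out, or 0 if nothing is pending. */
int take_update(struct client_queue *q, size_t sub, double *out)
{
    if (sub >= MAX_SUBSCRIPTIONS || !q->slot[sub].dirty)
        return 0;
    *out = q->slot[sub].value;
    q->slot[sub].dirty = 0;
    return 1;
}
```

The trade-off is that a slow client misses intermediate values and only sees the latest one; the overwritten counter makes that loss visible.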

What could have stopped the clients from handling their input?


Second incident:
Another IOC lost CA connectivity. This time the error message was different:
CA beacon (send to "...") error was "ENOBUFS"
Again, inetstatShow showed two client connections with quite full send queues. But this time, mbufShow still showed free buffers. And killing the clients did not help!

Using a function I once got from WindRiver, I found that the network interface send queue was full (size: 50 entries).

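/* Print the drop count, current length and maximum length of the
 * send queue of the named network interface. */
void ifQValuesShow (char *name) {
    struct ifnet *ifp;
    ifp = ifunit(name);   /* look up the interface structure by name */
    if (ifp == NULL) {
        printf("Could not find %s interface\n", name);
        return;
        }
    printf("%s drops = %d queue length = %d max_len = %d \n",
        name, ifp->if_snd.ifq_drops,
        ifp->if_snd.ifq_len, ifp->if_snd.ifq_maxlen);
    return;
    }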
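To help read the three counters printed above, here is a small host-side model of how a BSD-style interface send queue behaves, assuming the vxWorks stack follows the classic pattern: a packet is rejected with ENOBUFS and counted as a drop once ifq_len reaches ifq_maxlen. The struct and function names are hypothetical stand-ins, not vxWorks APIs.

```c
#include <errno.h>

/* Simplified stand-in for the queue fields printed by ifQValuesShow. */
struct model_ifqueue {
    int ifq_len;     /* packets currently queued */
    int ifq_maxlen;  /* queue capacity */
    int ifq_drops;   /* packets rejected because the queue was full */
};

/* Try to queue one outgoing packet.
 * Returns 0 on success or ENOBUFS if the queue is already full. */
int model_if_enqueue(struct model_ifqueue *q)
{
    if (q->ifq_len >= q->ifq_maxlen) {
        q->ifq_drops++;
        return ENOBUFS;
    }
    q->ifq_len++;
    return 0;
}

/* Model the driver transmitting one packet.
 * Returns 1 if a packet was removed, 0 if the queue was empty. */
int model_if_dequeue(struct model_ifqueue *q)
{
    if (q->ifq_len == 0)
        return 0;
    q->ifq_len--;
    return 1;
}
```

In this model, a queue that stays at its maximum length means nothing is calling the dequeue side any more, so every later send fails with ENOBUFS regardless of how many buffers are free elsewhere. If that is what happened here, a larger ifq_maxlen would only delay the failure.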

The only way to recover from this problem seems to be a reboot.

Any idea what went wrong here?

I can increase the queue size, but how much? WindRiver never answers questions like "why does the network not recover?".


Dirk



Replies:
RE: vxWorks network problems Jeff Hill
Re: vxWorks network problems Steven M. Hartman
