Subject: Re: vxWorks network problems
From: "Steven M. Hartman" <[email protected]>
To: Dirk Zimoch <[email protected]>
Cc: "[email protected]" <[email protected]>
Date: Mon, 23 May 2011 12:18:14 -0400
Dirk Zimoch wrote:
First incident: An IOC (vxWorks 5.5) lost CA connectivity. All medm panels went white. Logging in on the IOC over the serial port showed that the database was still running, but there were many messages: "rsrv: system low on network buffers - send retry in 15 seconds". These messages made it a bit tough to debug, because they spill all over the output of any debug tool.
As you saw, mbuf starvation is usually recoverable once the client starts behaving. Jeff's comments about queuing theory show that you cannot protect the server completely, but you can hopefully tune your buffer numbers to deal with most situations your network is likely to experience. Was this a one-time event, and if so, was anything unusual happening at the time? Which EPICS version for IOC and the clients?
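To illustrate the trade-off, here is a toy simulation of a fixed buffer pool feeding one client that stops reading for a while. It is only a sketch under made-up assumptions: the function name `simulate`, the tick-based timing, and the produce and drain rates are all hypothetical, and nothing here models the actual vxWorks mbuf pools or the rsrv code. It shows that a larger pool rides out a longer client stall, but any finite pool is exhausted if the stall lasts long enough, and that failures stop once the client drains again.

```python
def simulate(pool_size, stall_start, stall_len, ticks=200, produce=2, drain=3):
    """Return the ticks at which a send failed because no buffer was free.

    All parameters are illustrative, not real vxWorks tunables:
    pool_size   - number of buffers in the (hypothetical) pool
    stall_start - tick at which the client stops reading
    stall_len   - number of ticks the client stays stalled
    produce     - messages the server tries to queue per tick
    drain       - buffers the client frees per tick while reading
    """
    in_use = 0
    failures = []
    for t in range(ticks):
        stalled = stall_start <= t < stall_start + stall_len
        # The client frees buffers only while it is reading.
        if not stalled:
            in_use = max(0, in_use - drain)
        # The server queues messages, one buffer each, until the pool is full.
        for _ in range(produce):
            if in_use < pool_size:
                in_use += 1
            else:
                failures.append(t)  # analogous to a "low on buffers" retry
                break
    return failures


if __name__ == "__main__":
    for pool, stall in [(20, 30), (100, 30), (100, 60)]:
        failed = simulate(pool_size=pool, stall_start=50, stall_len=stall)
        print(f"pool={pool:3d} stall={stall:2d} ticks -> {len(failed)} failed sends")
```

With these numbers the small pool fails during the stall, the larger pool absorbs the same stall without failures, and doubling the stall length exhausts the larger pool as well. In every case the failures end as soon as the client resumes reading.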
Second incident: Another IOC lost CA connectivity. This time the error message was different: CA beacon (send to "...") error was "ENOBUFS"
This one is similar to what SNS has seen over the years with mv2100 and related boards with the DEC network driver. The precipitating event is a temporary loss of the physical network layer (e.g. an unplugged network cable to the IOC, or an edge network switch powered down). It looks like a buggy network driver that cannot recover from this fault when there are lots of UDP packets queuing in the outgoing buffers. This one does not seem to be recoverable except by a reboot, so we have taken steps to reduce the likelihood of losing the physical link. We do not see this with other boards.
-- Steven Hartman [email protected]