Subject: | Solution to a different version of the S_errno_ENOBUFS problem |
From: | "Redman, Russell O." <[email protected]> |
To: | "'[email protected]'" <[email protected]> |
Date: | Fri, 14 Jan 2005 08:17:14 -0500 |
I encountered a somewhat different version of the ENOBUFS problem, but seem to have cured it. I am by no means an expert in these issues, but enough people have encountered ENOBUFS that there may be some interest in my cure.
I am running EPICS R3.13.8 on an MVME2402-3. I build vxWorks using Tornado 2.0.2.
The IOC would refuse to start the Channel Access repeater during iocInit. I am writing this message on a different machine and did not save screen dumps of the boot sequence, so I cannot reproduce the exact error messages (which were different in each test that I made anyway), but the upshot was always that the system could not allocate a socket to start the repeater. There were also occasional complaints about a bind error. The sequence always ended with S_errno_ENOBUFS. I was able to isolate the system from the rest of the network, so I am quite sure that network traffic was not an issue. Only iocInit was running (and any subtasks that it spawned), so higher-priority tasks should not have been in the way. This looked very much like the resource starvation discussed for an MVME162 by "Zoltan Kakucs" on 20 Oct 2003 (see "Re: CA block sem corrupted error and S_errno_ENOBUFS" in the tech-talk archives), but his detailed solution does not apply to my BSP, which uses a very different set of #define's. Jeff Hill also contributed a useful discussion of a related problem on 29 Jan 2003.
casr verified that there were no channels connected. Similarly, inetstatShow verified that there were no active network connections, and netStackDataPoolShow revealed no problems. However, netStackSysPoolShow claimed that the system pool had been drained 3 times, and the number of sockets was 16, the maximum number of MUX bindings allowed. I do open a lot of serial ports using tnetdev to access a pair of networked terminal servers. In Tornado, I therefore went to
- network components
-- basic network initialization
--- network buffer initialization
and arbitrarily doubled the number of system buffers from 64 to 128:
NUM_SYS_128 = 128
NUM_SYS_256 = 128
NUM_SYS_512 = 128
NUM_SYS_64 = 128
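As a rough check on what this change costs in memory, the cluster storage behind these four settings can be estimated by multiplying each count by its cluster size. The sketch below is plain C meant to run on a host, not target code. It assumes that each NUM_SYS_n parameter is the number of n-byte clusters in the system pool, and it ignores any per-cluster bookkeeping overhead, so the totals are best read as a lower bound. The struct, function, and array names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one entry per system-pool cluster size. */
struct sys_pool_entry {
    size_t cluster_bytes;  /* size of each cluster, e.g. 64 for NUM_SYS_64 */
    size_t count;          /* number of clusters, i.e. the NUM_SYS_n value */
};

/*
 * Estimate the cluster memory for a pool configuration by summing
 * cluster_bytes * count over all entries. Bookkeeping overhead is
 * deliberately not modelled (assumption stated in the text above).
 */
size_t sys_pool_cluster_bytes(const struct sys_pool_entry *entries, size_t n)
{
    size_t total = 0;
    size_t i;

    assert(entries != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += entries[i].cluster_bytes * entries[i].count;
    return total;
}

/* Settings before the change (64 of each size) ... */
static const struct sys_pool_entry pool_before[] = {
    {64, 64}, {128, 64}, {256, 64}, {512, 64}
};

/* ... and after doubling (128 of each size). */
static const struct sys_pool_entry pool_after[] = {
    {64, 128}, {128, 128}, {256, 128}, {512, 128}
};
```

Under these assumptions the estimate goes from 61440 bytes (64 x 960) to 122880 bytes (128 x 960), so the doubling costs roughly 60 KB of additional cluster memory.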
Because of the complaints about bindings, and the suspicious equality between the number of sockets used and the MUX_MAX_BINDS, I also doubled the number of BINDS from 16 to 32.
- network components
-- basic network initialization
--- network buffer initialization
MUX_MAX_BINDS = 32
Rebuilding vxWorks and rebooting the IOC, I now find that the repeater starts properly, and that netStackSysPoolShow reports the pool was never drained. I also have 19 sockets open - no wonder that MUX_MAX_BINDS=16 gave trouble.
Hope this is helpful for someone else.
Russell O. Redman