
Subject: Re: vxWorks network problem on MVME2700
From: Maren Purves <[email protected]>
To: Mark Rivers <[email protected]>
Cc: TechTalk EPICS <[email protected]>
Date: Fri, 12 Oct 2007 11:10:35 -1000
Mark,

This reminds me of a problem Ian Smith had at the ATC (Edinburgh) last
year. The solution isn't on the exploder (at least not if I search by
"Smith"), but I found something in my email:

-------------- quote -----------
I think I've eliminated almost everything and still have the problem. I've set up another board, used the vxWorks that you sent me and the problem persists.


I got one reply from the epics group:

"The folks at the SNS had to patch there vxWorks IP kernel against defects related to mbuf starvation IP deadlocks (both in the kernel itself and also in the NIC driver)."
------------- unquote ----------------


the "you" refers to is Craig Walther.

Hope this is of any help at all,
Maren
(Ian has since taken redundancy and may be hard to get hold of)


Mark Rivers wrote:
Folks,

We have been getting network lockups on our MVME2700 boards.  This is
running vxWorks 5.4.2, EPICS 3.14.8.2, and Andrew Johnson's latest BSP
asd9-nodns.

When it happens, it appears that the network is still receiving packets
but not sending any, as seen by two successive ifShow commands:

ioc13ida> ifShow("dc")
dc (unit number 0):
Flags: (0x8063) UP BROADCAST MULTICAST ARP RUNNING Type: ETHERNET_CSMACD
Internet address: 164.54.160.75
Broadcast address: 164.54.160.255
Netmask 0xffff0000 Subnetmask 0xffffff00
Ethernet address is 08:00:3e:2f:39:46
Metric is 0
Maximum Transfer Unit size is 1500
0 octets received
0 octets sent
83065850 packets received
122169709 packets sent
83065850 unicast packets received
122118974 unicast packets sent
0 non-unicast packets received
50735 non-unicast packets sent
0 input discards
0 input unknown protocols
1407 input errors
2834 output errors
0 collisions; 0 dropped
ioc13ida> ifShow("dc")
dc (unit number 0):
Flags: (0x8063) UP BROADCAST MULTICAST ARP RUNNING Type: ETHERNET_CSMACD
Internet address: 164.54.160.75
Broadcast address: 164.54.160.255
Netmask 0xffff0000 Subnetmask 0xffffff00
Ethernet address is 08:00:3e:2f:39:46
Metric is 0
Maximum Transfer Unit size is 1500
0 octets received
0 octets sent
83065862 packets received
122169709 packets sent
83065862 unicast packets received
122118974 unicast packets sent
0 non-unicast packets received
50735 non-unicast packets sent
0 input discards
0 input unknown protocols
1419 input errors
2858 output errors
0 collisions; 0 dropped


The above shows that the number of packets received is increasing, but
the number of packets sent is not.  It also shows that the number of
input and output errors is increasing.
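
A small watchdog task could timestamp these lockups as they happen, instead of catching them after the fact with a manual ifShow. The sketch below is only an illustration, assuming the stock vxWorks 5.x BSD-derived struct ifnet from net/if.h (where if_opackets / if_ipackets are the same counter macros ifShow reads) and the ifunit() name lookup; txStallCheck and everything in it are made-up names, not anything from Andrew's BSP:

    /* txStallCheck - hedged sketch of a Tx-stall watchdog for vxWorks 5.x.
     * Assumes the BSD-derived struct ifnet in net/if.h, where if_opackets
     * and if_ipackets are macros over the if_data counters, and ifunit()
     * to look an interface up by name.  All names here are illustrative. */
    #include <vxWorks.h>
    #include <sysLib.h>
    #include <taskLib.h>
    #include <logLib.h>
    #include <net/if.h>

    extern struct ifnet *ifunit (char *name); /* may not be prototyped by the BSP */

    void txStallCheck (char *ifName, int periodSecs)
    {
        struct ifnet *pIf = ifunit (ifName);      /* e.g. "dc0" */
        unsigned long lastOut, lastIn;

        if (pIf == NULL) {
            logMsg ("txStallCheck: no such interface %s\n",
                    (int)ifName, 0, 0, 0, 0, 0);
            return;
        }
        lastOut = pIf->if_opackets;
        lastIn  = pIf->if_ipackets;

        for (;;) {
            taskDelay (periodSecs * sysClkRateGet ());
            /* Rx advancing while Tx is frozen matches the ifShow symptom above */
            if (pIf->if_opackets == lastOut && pIf->if_ipackets != lastIn)
                logMsg ("txStallCheck: %s sent nothing for %d s while still receiving\n",
                        (int)ifName, periodSecs, 0, 0, 0, 0);
            lastOut = pIf->if_opackets;
            lastIn  = pIf->if_ipackets;
        }
    }

It could be spawned from the startup script with something like
taskSpawn ("tTxChk", 200, 0, 8000, (FUNCPTR) txStallCheck, (int) "dc0", 300, 0,0,0,0,0,0,0,0),
which would at least put a timestamp on each lockup in the console log.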

mbufShow shows no free buffers of size 128 and above.

ioc13ida> mbufShow
type number
--------- ------
FREE : 150
DATA : 581
HEADER : 69
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
TOTAL : 800
number of mbufs: 800
number of times failed to find space: 22579
number of times waited for space: 0
number of times drained protocols for space: 22526
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       125       50        68411923
128      400       0         115759801
256      50        0         45096817
512      25        0         11828199
1024     25        0         8333
2048     25        0         1558940
-------------------------------------------------------------------------------
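
For what it's worth, those cluster counts (400 x 128, 50 x 256, 25 x 512, ...) come from the network-stack build configuration, not anything tunable at run time. On a stock vxWorks 5.4 tree the defaults are the NUM_xxx macros in target/h/netBufLib.h, which are #ifndef-guarded so a BSP's config.h can override them before usrNetwork.c is compiled; the fragment below is only a sketch of that knob, and the macro names should be verified against your own tree:

    /* config.h fragment - hedged sketch only: on a stock 5.4 tree the
     * data-pool cluster counts default to the NUM_xxx macros in
     * netBufLib.h (guarded by #ifndef, so a BSP config.h can override
     * them); verify the names and rebuild procedure for your BSP. */
    #define NUM_128   800     /* pool shows 0 of 400 free under load */
    #define NUM_256   200     /* was 50 */
    #define NUM_512   100     /* was 25 */

Note that if the SNS diagnosis applies (clusters leaked into a deadlocked Tx path), enlarging the pools would only stretch the interval between lockups, not cure them.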


netStackSysPoolShow shows no problems:

ioc13ida> netStackSysPoolShow
type number
--------- ------
FREE : 732
DATA : 0
HEADER : 0
SOCKET : 95
PCB : 116
RTABLE : 75
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 4
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 2
IFMADDR : 0
MRTABLE : 0
TOTAL : 1024
number of mbufs: 1024
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       256       206       65
128      256       154       63893
256      256       211       4103
512      256       161       63886
-------------------------------------------------------------------------------
value = 80 = 0x50 = 'P'



Finally, endPoolShow, a command Andrew added to show the Ethernet driver pool, reports no free clusters:


ioc13ida> endPoolShow
Device name needed, e.g. "ei" or "dc"
value = -1 = 0xffffffff = ipAddrToAsciiEnginePrivate type_info node + 0xfe2f46af
ioc13ida> endPoolShow("dc")
type number
--------- ------
FREE : 432
DATA : 80
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
TOTAL : 512
number of mbufs: 512
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
1520     208       0         205183548
-------------------------------------------------------------------------------
ioc13ida>
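
endPoolShow is not a stock shell command, but something close can probably be assembled from documented vxWorks 5.x calls for anyone without Andrew's BSP: endFindByName() in the END/MUX layer returns the END_OBJ, whose pNetPool member is the driver's private pool, and netPoolShow() prints a pool in the format above. A rough sketch (myEndPoolShow is a made-up name, and the header that declares netPoolShow may differ on your tree):

    #include <vxWorks.h>
    #include <end.h>        /* END_OBJ, endFindByName() */
    #include <netBufLib.h>  /* NET_POOL_ID; netPoolShow() location may vary */

    /* Print an END driver's private buffer pool, e.g. myEndPoolShow ("dc", 0). */
    STATUS myEndPoolShow (char *devName, int unit)
    {
        END_OBJ *pEnd = endFindByName (devName, unit);

        if (pEnd == NULL || pEnd->pNetPool == NULL)
            return ERROR;            /* no such device, or no private pool */
        netPoolShow (pEnd->pNetPool);
        return OK;
    }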


Another command, netQueueShow, shows that there have been 249 dropped
transmissions:

ioc13ida> netQueueShow
IP Rx queue: len = 0, max = 50, drops = 0
ARP Rx queue: len = 0, max = 50, drops = 0
dc0 Tx queue: len = 50, max = 50, drops = 249
value = 0 = 0x0
ioc13ida>
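
That dc0 Tx queue pinned at len = max = 50 with drops climbing says the protocol send queue is full because the driver underneath has stopped draining it, which fits the zero free 1520-byte clusters above. As a purely diagnostic experiment (not a fix; it only buys burst headroom), the send-queue depth can be widened at run time, assuming the stack exposes the usual BSD-style if_snd.ifq_maxlen field:

    #include <vxWorks.h>
    #include <net/if.h>

    extern struct ifnet *ifunit (char *name); /* may not be prototyped by the BSP */

    /* Deepen an interface's send queue; returns the old depth, or ERROR.
     * Experiment only - if the driver is wedged, no queue depth is enough. */
    int txQueueDeepen (char *ifName, int newLen)
    {
        struct ifnet *pIf = ifunit (ifName);   /* e.g. "dc0" */
        int oldLen;

        if (pIf == NULL)
            return ERROR;
        oldLen = pIf->if_snd.ifq_maxlen;       /* default is 50 */
        pIf->if_snd.ifq_maxlen = newLen;
        return oldLen;
    }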


Has anyone else been seeing such problems?  This is happening on at
least 3 IOCs, with a frequency of about once per week per IOC.  That is too
rare for easy debugging, but too frequent to live with.  There does not
seem to be any correlation with anything unusual happening on the IOC.

Mark



Replies:
RE: vxWorks network problem on MVME2700 Thompson, David H.
References:
vxWorks network problem on MVME2700 Mark Rivers
