The dec driver has two distinct problems. First, the driver allocates its loan buffers out of the TX pool, so if the TX pool isn't large, a starved CA client or CA repeater on the IOC can cause E_NOBUFS. Second, when the Ethernet MAC interface cannot get rid of packets because the physical layer is down, UDP packets back up in the buffer pool and eventually break the driver.
WRS has a fix for the latter problem that I have not tested yet.
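
To illustrate the first problem, here is a toy model in Python. It is not driver code: the class and function names are made up for illustration, and the pool size of 208 is simply the cluster count that endPoolShow("dc") reports in the quoted message below. It assumes every received packet borrows one cluster from the same pool that the transmit path draws from, and that a stalled consumer never hands its clusters back.

```python
class ClusterPool:
    """Hypothetical fixed-size pool shared by RX loans and TX."""

    def __init__(self, size):
        self.free = size
        self.failures = 0  # allocation attempts that found no free cluster

    def alloc(self):
        if self.free == 0:
            self.failures += 1
            return False
        self.free -= 1
        return True

    def release(self):
        self.free += 1


def simulate(pool_size, rx_packets, consumer_stalled):
    """Receive rx_packets packets, then attempt a single transmit.

    Each received packet is loaned one cluster. A healthy consumer
    returns the cluster immediately; a stalled one keeps it forever.
    Returns (tx_ok, free_clusters, failed_allocations).
    """
    pool = ClusterPool(pool_size)
    for _ in range(rx_packets):
        if pool.alloc() and not consumer_stalled:
            pool.release()
    tx_ok = pool.alloc()  # the transmit path needs a cluster too
    return tx_ok, pool.free, pool.failures


if __name__ == "__main__":
    print(simulate(208, 1000, consumer_stalled=False))
    print(simulate(208, 1000, consumer_stalled=True))
```

With a healthy consumer the transmit always finds a cluster. With a stalled consumer the pool is empty after 208 packets and the transmit fails, which is the shape of the failure described above.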
________________________________
From: [email protected] on behalf of Maren Purves
Sent: Fri 10/12/2007 5:10 PM
To: Mark Rivers
Cc: TechTalk EPICS
Subject: Re: vxWorks network problem on MVME2700
Mark,
this reminds me of a problem Ian Smith had at the ATC (Edinburgh) last
year. The solution isn't on the exploder (at least not if I search by
"Smith"), but I found something in my email:
-------------- quote -----------
I think I've eliminated almost everything and still have the problem.
I've set up another board, used the vxWorks that you sent me and the
problem persists.
I got one reply from the epics group:
"The folks at the SNS had to patch their vxWorks IP kernel against
defects related to mbuf starvation IP deadlocks (both in the kernel
itself and also in the NIC driver)."
------------- unquote ----------------
The "you" refers to Craig Walther.
Hope this helps any at all,
Maren
(Ian has since taken redundancy and may be hard to get hold of)
Mark Rivers wrote:
> Folks,
>
> We have been getting network lockups on our MVME2700 boards. This is
> running vxWorks 5.4.2, EPICS 3.14.8.2, and Andrew Johnson's latest BSP
> asd9-nodns.
>
> When it happens it appears that the network is still receiving packets,
> but not sending any, as seen by 2 successive ifShow commands:
>
> ioc13ida> ifShow("dc")
> dc (unit number 0):
> Flags: (0x8063) UP BROADCAST MULTICAST ARP RUNNING
> Type: ETHERNET_CSMACD
> Internet address: 164.54.160.75
> Broadcast address: 164.54.160.255
> Netmask 0xffff0000 Subnetmask 0xffffff00
> Ethernet address is 08:00:3e:2f:39:46
> Metric is 0
> Maximum Transfer Unit size is 1500
> 0 octets received
> 0 octets sent
> 83065850 packets received
> 122169709 packets sent
> 83065850 unicast packets received
> 122118974 unicast packets sent
> 0 non-unicast packets received
> 50735 non-unicast packets sent
> 0 input discards
> 0 input unknown protocols
> 1407 input errors
> 2834 output errors
> 0 collisions; 0 dropped
> ioc13ida>
> ioc13ida> ifShow("dc")
> dc (unit number 0):
> Flags: (0x8063) UP BROADCAST MULTICAST ARP RUNNING
> Type: ETHERNET_CSMACD
> Internet address: 164.54.160.75
> Broadcast address: 164.54.160.255
> Netmask 0xffff0000 Subnetmask 0xffffff00
> Ethernet address is 08:00:3e:2f:39:46
> Metric is 0
> Maximum Transfer Unit size is 1500
> 0 octets received
> 0 octets sent
> 83065862 packets received
> 122169709 packets sent
> 83065862 unicast packets received
> 122118974 unicast packets sent
> 0 non-unicast packets received
> 50735 non-unicast packets sent
> 0 input discards
> 0 input unknown protocols
> 1419 input errors
> 2858 output errors
> 0 collisions; 0 dropped
>
> The above shows that the number of packets received is increasing, but
> the number of packets sent is not. It also shows that the number of
> input and output errors is increasing.
>
> mbufShow shows no free buffers of size 128 and above.
>
> ioc13ida> mbufShow
> type number
> --------- ------
> FREE : 150
> DATA : 581
> HEADER : 69
> SOCKET : 0
> PCB : 0
> RTABLE : 0
> HTABLE : 0
> ATABLE : 0
> SONAME : 0
> ZOMBIE : 0
> SOOPTS : 0
> FTABLE : 0
> RIGHTS : 0
> IFADDR : 0
> CONTROL : 0
> OOBDATA : 0
> IPMOPTS : 0
> IPMADDR : 0
> IFMADDR : 0
> MRTABLE : 0
> TOTAL : 800
> number of mbufs: 800
> number of times failed to find space: 22579
> number of times waited for space: 0
> number of times drained protocols for space: 22526
> __________________
> CLUSTER POOL TABLE
> _______________________________________________________________________________
> size     clusters  free      usage
> -------------------------------------------------------------------------------
> 64       125       50        68411923
> 128      400       0         115759801
> 256      50        0         45096817
> 512      25        0         11828199
> 1024     25        0         8333
> 2048     25        0         1558940
> -------------------------------------------------------------------------------
>
> netStackSysPoolShow shows no problems:
>
> ioc13ida> netStackSysPoolShow
> type number
> --------- ------
> FREE : 732
> DATA : 0
> HEADER : 0
> SOCKET : 95
> PCB : 116
> RTABLE : 75
> HTABLE : 0
> ATABLE : 0
> SONAME : 0
> ZOMBIE : 0
> SOOPTS : 0
> FTABLE : 0
> RIGHTS : 0
> IFADDR : 4
> CONTROL : 0
> OOBDATA : 0
> IPMOPTS : 0
> IPMADDR : 2
> IFMADDR : 0
> MRTABLE : 0
> TOTAL : 1024
> number of mbufs: 1024
> number of times failed to find space: 0
> number of times waited for space: 0
> number of times drained protocols for space: 0
> __________________
> CLUSTER POOL TABLE
> _______________________________________________________________________________
> size     clusters  free      usage
> -------------------------------------------------------------------------------
> 64       256       206       65
> 128      256       154       63893
> 256      256       211       4103
> 512      256       161       63886
> -------------------------------------------------------------------------------
> value = 80 = 0x50 = 'P'
>
>
> Finally, a command Andrew added to show the Ethernet driver pool reports
> no free clusters:
>
>
> ioc13ida> endPoolShow
> Device name needed, e.g. "ei" or "dc"
> value = -1 = 0xffffffff = ipAddrToAsciiEnginePrivate type_info node + 0xfe2f46af
> ioc13ida> endPoolShow("dc")
> type number
> --------- ------
> FREE : 432
> DATA : 80
> HEADER : 0
> SOCKET : 0
> PCB : 0
> RTABLE : 0
> HTABLE : 0
> ATABLE : 0
> SONAME : 0
> ZOMBIE : 0
> SOOPTS : 0
> FTABLE : 0
> RIGHTS : 0
> IFADDR : 0
> CONTROL : 0
> OOBDATA : 0
> IPMOPTS : 0
> IPMADDR : 0
> IFMADDR : 0
> MRTABLE : 0
> TOTAL : 512
> number of mbufs: 512
> number of times failed to find space: 0
> number of times waited for space: 0
> number of times drained protocols for space: 0
> __________________
> CLUSTER POOL TABLE
> _______________________________________________________________________________
> size     clusters  free      usage
> -------------------------------------------------------------------------------
> 1520     208       0         205183548
> -------------------------------------------------------------------------------
> ioc13ida>
>
> Another command, netQueueShow, shows that there have been 249 dropped
> transmissions:
>
> ioc13ida> netQueueShow
> IP Rx queue: len = 0, max = 50, drops = 0
> ARP Rx queue: len = 0, max = 50, drops = 0
> dc0 Tx queue: len = 50, max = 50, drops = 249
> value = 0 = 0x0
> ioc13ida>
>
> Has anyone else been seeing such problems? This is happening on at
> least 3 IOCs with a frequency of about once per week per IOC. Too rare
> to do easy debugging, but too frequent to live with. There does not
> seem to be any correlation with something unusual happening on the IOC.
>
> Mark
>
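
The by-eye comparison of the two ifShow snapshots in the quoted message can be automated. The sketch below is a hypothetical helper, not part of vxWorks or EPICS: it assumes the ifShow text has been captured from the console (for example from a terminal log), that counter lines have the form "<number> <label>" as shown above, and that leading "> " mail quote markers may be present.

```python
import re

# A counter line is a number followed by a lowercase label,
# e.g. "83065850 packets received" or "50735 non-unicast packets sent".
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+([a-z][a-z\- ]*?)\s*$")


def parse_ifshow(text):
    """Return {label: value} for each counter line in captured ifShow text."""
    counters = {}
    for line in text.splitlines():
        line = line.lstrip("> ")  # drop mail quote markers and indentation
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def tx_stalled(before, after):
    """True if packets were received between snapshots but none were sent."""
    rx_delta = after["packets received"] - before["packets received"]
    tx_delta = after["packets sent"] - before["packets sent"]
    return rx_delta > 0 and tx_delta == 0


if __name__ == "__main__":
    first = "> 83065850 packets received\n> 122169709 packets sent\n"
    second = "> 83065862 packets received\n> 122169709 packets sent\n"
    print(tx_stalled(parse_ifshow(first), parse_ifshow(second)))
```

Lines that do not fit the simple pattern, such as "0 collisions; 0 dropped", are skipped rather than guessed at.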