Hi Lewis,
> Also, I'm not understanding how this works at all from your office. You said it's a Netgear XS712T, but according to the Product Data Sheet listed at
>
> https://www.netgear.com/support/product/XS712T.aspx#docs
>
> it doesn't support *any* IEEE 802.2 protocol. It lists the following IEEE network protocols as supported:
>
> …
I think that list is the set of protocols the switch supports for managing and controlling the switch itself, not the protocols it handles in normal switching operation. All switches should support SNAP.
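For reference, here is a rough Python sketch of how I understand a SNAP frame to be laid out: an ordinary IEEE 802.3 frame whose two-byte field after the MAC addresses holds a length instead of an EtherType, followed by the LLC and SNAP headers. The function names are mine, and the destination address, OUI, and protocol ID below are placeholders, not the values the AIM protocol actually uses.

```python
import struct

# LLC header that announces a SNAP header: DSAP, SSAP, control (UI frame).
LLC_SNAP = bytes([0xAA, 0xAA, 0x03])


def build_snap_frame(dst, src, oui, pid, payload):
    """Build an 802.3 frame (without the FCS) carrying an LLC/SNAP payload."""
    body = LLC_SNAP + oui + struct.pack("!H", pid) + payload
    # In 802.3 framing the field after the MAC addresses is a length.
    header = dst + src + struct.pack("!H", len(body))
    # Pad to 60 bytes, which becomes the 64-byte minimum once the FCS is added.
    return (header + body).ljust(60, b"\x00")


def parse_type_or_length(frame):
    """Report how the two-byte field after the MAC addresses is interpreted."""
    (value,) = struct.unpack("!H", frame[12:14])
    if value <= 1500:
        return ("length", value)
    if value >= 0x0600:
        return ("ethertype", value)
    return ("undefined", value)


# Placeholder values for illustration only (not the real AIM OUI or PID).
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("b496912e0a76")
frame = build_snap_frame(dst, src, oui=bytes(3), pid=0x1234, payload=b"hello")
print(len(frame), parse_type_or_length(frame))
```

If that is right, a switch forwarding on MAC addresses should not need to look past the first 14 bytes.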
I just did some tests on the office Linux machine with the Netgear 10Gbit switch.
It runs fine if I use the 1 Gbit interface. There are no errors reported by ifconfig or ethtool.
When I use the 10 Gbit interface it does not work, and I see errors, though ethtool labels them differently than it does on the CentOS 7 system.
This is the output of ethtool -i on the two NICs:
10 Gbit NIC:
TahoeU18:/corvette/home/epics/devel/mca/iocBoot/iocLinux> ethtool -i enp23s0f1
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x8000087c
expansion-rom-version:
bus-info: 0000:17:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
1 Gbit NIC:
TahoeU18:/corvette/home/epics/devel/mca/iocBoot/iocLinux> ethtool -i eno1
driver: e1000e
version: 3.2.6-k
firmware-version: 0.1-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
So they are different drivers.
After I run the IOC on the 10 Gbit NIC, ifconfig reports 6 RX errors and 6 frame errors:
enp23s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.0.1 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::b49a:5672:707a:2bd5 prefixlen 64 scopeid 0x20<link>
ether b4:96:91:2e:0a:76 txqueuelen 1000 (Ethernet)
RX packets 7380 bytes 1919282 (1.9 MB)
RX errors 6 dropped 0 overruns 0 frame 6
TX packets 438 bytes 77751 (77.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
This is the output of ethtool -S on that NIC. Note that it lists rx_errors=6 and lsc_int=6. This is different from CentOS 7, which reported rx_errors=0 and rx_length_errors=235.
Each time I start the IOC, rx_errors increases by 6, while lsc_int does not increase; it stays at 6.
I am not sure what lsc_int means, but I think the errors are probably really the same.
TahoeU18:/corvette/home/epics/devel/mca/iocBoot/iocLinux> ethtool -S enp23s0f1
NIC statistics:
rx_packets: 8431
tx_packets: 438
rx_bytes: 2155147
tx_bytes: 77751
rx_pkts_nic: 8431
tx_pkts_nic: 438
rx_bytes_nic: 2188871
tx_bytes_nic: 79988
lsc_int: 6
tx_busy: 0
non_eop_descs: 0
rx_errors: 6
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 91
broadcast: 8324
rx_no_buffer_count: 0
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 0
fdir_miss: 7147
fdir_overflow: 0
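To see which counters actually move when the IOC starts, one could snapshot the ethtool -S output before and after and diff the two. Here is a minimal Python sketch. The function names are hypothetical, and the "after" numbers are invented for illustration (only the rx_errors increase of 6 matches what I observed).

```python
def parse_ethtool_stats(text):
    """Parse 'ethtool -S' text into a dict of counter name -> int."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric line.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters present in both whose value differs."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Abbreviated snapshots; the "after" values are made up for illustration.
before = parse_ethtool_stats("""NIC statistics:
     rx_packets: 8431
     lsc_int: 6
     rx_errors: 6
     rx_crc_errors: 0
""")
after = parse_ethtool_stats("""NIC statistics:
     rx_packets: 9100
     lsc_int: 6
     rx_errors: 12
     rx_crc_errors: 0
""")
print(changed_counters(before, after))
```

In practice the two text blocks would come from saving the command output to files before and after starting the IOC.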
This system is a triple-boot system with Ubuntu 18, CentOS 7, and Windows 10.
I just booted into Windows 10 and ran the EPICS IOC using both the 1 Gbit NIC and the 10 Gbit NIC.
It worked fine with the 1 Gbit card, just as on Linux. It also failed with the 10 Gbit card in exactly the same manner as Linux. Much of the SNAP communication worked.
For example, it was able to multicast a SNAP message asking all Canberra AIM modules on the network to identify themselves. It received all of the responses OK and built this table.
mcaAIMShowModules
Module Type HW rev. FW rev. Owner name Owner ID Status Memory size Free address
NI0006e6 1 0 5 TahoeU18 B4:96:91:2E:0A:76 Reachable 261116 00000000
NI0008d7 1 0 6 ioc13bmc 08:00:3E:2E:63:37 Reachable 261116 00000000
NI0009ce 1 0 6 ioc13bmd 00:01:AF:0A:6B:71 Reachable 261116 00000000
NI00059e 1 0 5 ioc13idd 00:01:AF:0A:6B:5F Reachable 261116 00000000
NI0003ed 1 0 6 Tahoe B4:96:91:2E:0A:76 Reachable 261116 00012000
However, when it tried to ask for the Instrument Control Bus modules attached to module NI0003ed, it received no reply. This is exactly the same behavior as on Linux.
This is the output of a command I found for getting interface statistics in Windows with PowerShell. It does not appear to show any errors, but I think there probably are receive errors of some sort.
PS C:\Users\epics> Get-NetAdapterStatistics -Name "Ethernet 2" | Format-list -Property "*"
ifAlias : Ethernet 2
InterfaceAlias : Ethernet 2
ifDesc : Intel(R) Ethernet 10G 2P X550-t Adapter
Caption : MSFT_NetAdapterStatisticsSettingData 'Intel(R) Ethernet 10G 2P X550-t Adapter'
Description : Intel(R) Ethernet 10G 2P X550-t Adapter
ElementName : Intel(R) Ethernet 10G 2P X550-t Adapter
InstanceID : {7314A851-AA79-4A9C-937F-F8C73BC7BB75}
InterfaceDescription : Intel(R) Ethernet 10G 2P X550-t Adapter
Name : Ethernet 2
Source : 2
SystemName : Tahoe.CARS.APS.ANL.GOV
OutboundDiscardedPackets : 0
OutboundPacketErrors : 0
RdmaStatistics : MSFT_NetAdapter_RdmaStatistics
ReceivedBroadcastBytes : 7899700
ReceivedBroadcastPackets : 54709
ReceivedBytes : 10010388
ReceivedDiscardedPackets : 0
ReceivedMulticastBytes : 2107509
ReceivedMulticastPackets : 5331
ReceivedPacketErrors : 0
ReceivedUnicastBytes : 3179
ReceivedUnicastPackets : 37
RscStatistics : MSFT_NetAdapter_RscStatistics
SentBroadcastBytes : 11814
SentBroadcastPackets : 171
SentBytes : 63550
SentMulticastBytes : 47560
SentMulticastPackets : 235
SentUnicastBytes : 1996
SentUnicastPackets : 15
SupportedStatistics : 4163583
PSComputerName :
CimClass : ROOT/StandardCimv2:MSFT_NetAdapterStatisticsSettingData
CimInstanceProperties : {Caption, Description, ElementName, InstanceID...}
CimSystemProperties : Microsoft.Management.Infrastructure.CimSystemProperties
So I conclude that it is most likely not a bug in the Linux or Windows network driver since the behavior is the same on both.
Mark
-----Original Message-----
From: J. Lewis Muir <[email protected]>
Sent: Tuesday, January 7, 2020 11:14 AM
To: Mark Rivers <[email protected]>
Cc: [email protected]
Subject: Re: Ethernet question
On 01/06, Mark Rivers wrote:
> On 01/06, J. Lewis Muir wrote:
> > Could you show the output of "ethtool -S p5p1" just in case it shows
> > more detail about exactly what it means by RX "frame"?
>
> Here is the current output of ifconfig and ethtool (abbreviated). ifconfig "frame" is 235, which is the same as ethtool "rx_length_errors", so those are the same thing. They are not CRC errors, which is what I think Michael was assuming.
Yes, I think that's a helpful clue. The frame error indicates a malformed packet, and if the packet is damaged, perhaps due to bad network hardware or local collisions, the CRC checksum would be incorrect. But in this case, rx_crc_errors is 0, which means the CRC checksum is correct and the problem is likely that the packet is an invalid size, hence rx_length_errors being 235.
The packet might be too short (e.g., for Ethernet, too short would be less than 64 bytes, but I'm not sure how SNAP might affect this), too long (e.g., greater than the MTU of the network), or some other issue.
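To make the size limits concrete, here is a small Python sketch of the check I have in mind. It assumes untagged frames and counts the 14-byte header and 4-byte FCS in the length. The function name is made up, and I have not verified that the driver's rx_length_errors counter uses exactly these bounds.

```python
ETH_HEADER_LEN = 14  # destination MAC + source MAC + type/length field
ETH_FCS_LEN = 4      # trailing CRC
ETH_MIN_FRAME = 64   # minimum frame size, header and FCS included


def classify_frame_length(frame_len, mtu=1500):
    """Classify an untagged frame's length (header and FCS included)."""
    max_frame = mtu + ETH_HEADER_LEN + ETH_FCS_LEN
    if frame_len < ETH_MIN_FRAME:
        return "too short"
    if frame_len > max_frame:
        return "too long"
    return "ok"


for length in (60, 64, 1518, 1519):
    print(length, classify_frame_length(length))
# The same check with a jumbo-frame MTU, as an example.
print(9018, classify_frame_length(9018, mtu=9000))
```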
This could be the result of bad hardware or perhaps a network stack or driver bug where the network stack or driver corrupts the packet or rejects it as being malformed when it is not.
If there's a firewall running on the Linux machine, can you try disabling it temporarily just to be sure that it's not causing a problem? (I know it works with the 1 GbE NIC, and I know the protocol is SNAP, which the firewall should not even touch assuming it's an IP packet filter, but if there's a bug somewhere, then it seems worth at least checking.)
Also, I'm not understanding how this works at all from your office. You said it's a Netgear XS712T, but according to the Product Data Sheet listed at
https://www.netgear.com/support/product/XS712T.aspx#docs
it doesn't support *any* IEEE 802.2 protocol. It lists the following IEEE network protocols as supported:
* IEEE 802.3 Ethernet
* IEEE 802.3u 100BASE-T (XS712T only)
* IEEE 802.3ab 1000BASE-T
* IEEE 802.1Q VLAN Tagging
* IEEE 802.3x Full-Duplex Flow Control
* IEEE 802.3z Gigabit Ethernet 1000BASE-SX/LX
* IEEE 802.3an 10GBASE-T 10 Gbit/s Ethernet Over Copper Twisted Pair Cable
* IEEE 802.3ae 10-Gigabit Ethernet Over Fiber (10GBASE-SR, 10GBASE-LR,
10GBASE-ER, 10GBASE-LX4)
* IEEE 802.3ad Trunking (LACP)
* IEEE 802.1AB LLDP with ANSI/TIA-1057 (LLDP-MED)
* IEEE 802.1p Class of Service
* IEEE 802.1D Spanning Tree (STP)
* IEEE 802.1s Multiple Spanning Tree (MSTP)
* IEEE 802.1w Rapid Spanning Tree (RSTP)
* IEEE 802.1x RADIUS Network Access Control
* IEEE 802.3az Energy Efficient Ethernet (EEE) Compliant
How does this work at all?
I'm thinking the answer is that it's SNAP, and SNAP is using IEEE 802.3 (Ethernet)? But I know next to nothing about SNAP.
> > Is the NIC driver the same for the 1 GbE and the 10 GbE NICs on Linux?
>
> I'm not sure. How can I tell that?
$ readlink -f /sys/class/net/eno1/device/driver
$ readlink -f /sys/class/net/p5p1/device/driver
or
$ ethtool -i eno1
$ ethtool -i p5p1
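If you would rather script it, a rough Python equivalent of the readlink approach might look like this. The function names and the sysfs_root parameter are my own; the latter only exists so the lookup can be pointed at a test directory.

```python
import os


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if unknown."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.exists(link):
        return None
    # "driver" is a symlink; the basename of its target is the driver name.
    return os.path.basename(os.path.realpath(link))


def same_driver(iface_a, iface_b, sysfs_root="/sys/class/net"):
    """True only if both interfaces resolve to the same known driver."""
    a = nic_driver(iface_a, sysfs_root)
    b = nic_driver(iface_b, sysfs_root)
    return a is not None and a == b


if __name__ == "__main__":
    for name in ("eno1", "p5p1"):
        print(name, nic_driver(name))
    print("same driver:", same_driver("eno1", "p5p1"))
```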
It would also be interesting to know the make and model of the 10 GbE
NIC:
$ lspci -v | grep -i ether
> > Do you have a Windows machine with a 10 GbE NIC that you could try?
>
> Yes, I could try that, but I have not yet.
That might be interesting if there's a bug in the Linux network stack or NIC driver.
Lewis