EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Ethernet question
From: Ryan Pierce via Tech-talk <[email protected]>
To: Mark Rivers <[email protected]>
Cc: "[email protected]" <[email protected]>
Date: Tue, 7 Jan 2020 23:35:59 -0600
It could be NIC hardware / firmware related then. Do all three of your failed Linux boxes use the same 10 Gbit NICs?

I wouldn’t be surprised if NIC manufacturers don’t bother testing SNAP in promiscuous mode. Because the response is what’s dropped / recorded as an error, and promiscuous mode is necessary to get the received packet to the application, I’d be inclined to think something on the NIC is unhappy with those packets.

Ryan

On Jan 7, 2020, at 4:44 PM, Mark Rivers via Tech-talk <[email protected]> wrote:



Hi Lewis,

 

> Also, I'm not understanding how this works at all from your office.  You said it's a Netgear XS712T, but according to the Product Data Sheet listed at
>
> https://www.netgear.com/support/product/XS712T.aspx#docs
>
> it doesn't support *any* IEEE 802.2 protocol.  It lists the following IEEE network protocols as supported:

 

I think that list covers the protocols the switch itself implements for management and control, not the protocols it can carry during normal switching operations.  All switches should support SNAP.

 

I just did some tests on the office Linux machine with the Netgear 10Gbit switch.

 

It runs fine if I use the 1 Gbit interface.  There are no errors reported by ifconfig or ethtool.

 

When I use the 10 Gbit interface it does not work, and I see errors, though ethtool labels them differently than it does on the Centos 7 system.

 

This is the output of ethtool -i on the two NICs:

 

10 Gbit NIC:

TahoeU18:/corvette/home/epics/devel/mca/iocBoot/iocLinux> ethtool -i enp23s0f1
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x8000087c
expansion-rom-version:
bus-info: 0000:17:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

 

1 Gbit NIC:

TahoeU18:/corvette/home/epics/devel/mca/iocBoot/iocLinux> ethtool -i eno1
driver: e1000e
version: 3.2.6-k
firmware-version: 0.1-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

 

So they are different drivers.

 

After I run the IOC on the 10 Gbit NIC, ifconfig reports 6 RX errors and 6 frame errors:

enp23s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 192.168.0.1  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::b49a:5672:707a:2bd5  prefixlen 64  scopeid 0x20<link>
        ether b4:96:91:2e:0a:76  txqueuelen 1000  (Ethernet)
        RX packets 7380  bytes 1919282 (1.9 MB)
        RX errors 6  dropped 0  overruns 0  frame 6
        TX packets 438  bytes 77751 (77.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 

 

This is the output of ethtool -S on that NIC.  Note that it lists rx_errors=6 and lsc_int=6.  This is different from Centos 7, which reported rx_errors=0 and rx_length_errors=235.  Each time I start the IOC, rx_errors increases by 6, while lsc_int does not increase; it stays at 6.

 

I am not sure what lsc_int means.  But I think the errors are probably really the same.

 

TahoeU18:/corvette/home/epics/devel/mca/iocBoot/iocLinux> ethtool -S enp23s0f1
NIC statistics:
     rx_packets: 8431
     tx_packets: 438
     rx_bytes: 2155147
     tx_bytes: 77751
     rx_pkts_nic: 8431
     tx_pkts_nic: 438
     rx_bytes_nic: 2188871
     tx_bytes_nic: 79988
     lsc_int: 6
     tx_busy: 0
     non_eop_descs: 0
     rx_errors: 6
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     multicast: 91
     broadcast: 8324
     rx_no_buffer_count: 0
     collisions: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     hw_rsc_aggregated: 0
     hw_rsc_flushed: 0
     fdir_match: 0
     fdir_miss: 7147
     fdir_overflow: 0
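
The before/after comparison described above (rx_errors rising with each IOC start while lsc_int stays constant) can be automated. The following is a minimal Python sketch, assuming the counters have been captured as text from `ethtool -S`. The function names `parse_ethtool_stats` and `changed_counters` are hypothetical, and the "after" sample values are illustrative rather than taken from a real run.

```python
from typing import Dict


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Parse 'name: value' lines from `ethtool -S` output into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters whose value differs, with the delta."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Sample taken from the output above (abbreviated).
before = parse_ethtool_stats("""NIC statistics:
     rx_packets: 8431
     rx_errors: 6
     lsc_int: 6
""")

# Illustrative values only: what a second capture might look like
# after starting the IOC once more.
after = parse_ethtool_stats("""NIC statistics:
     rx_packets: 8500
     rx_errors: 12
     lsc_int: 6
""")

print(changed_counters(before, after))  # {'rx_packets': 69, 'rx_errors': 6}
```

On a live system the two text captures could come from running `ethtool -S enp23s0f1` before and after starting the IOC, which narrows attention to the counters that actually move.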

    

This system is a triple-boot system with Ubuntu 18, Centos 7, and Windows 10.

 

I just booted into Windows 10 and ran the EPICS IOC using both the 1 Gbit NIC and the 10 Gbit NIC.

 

It worked fine with the 1 Gbit card, just as on Linux.  It also failed with the 10 Gbit card in exactly the same manner as Linux.  Much of the SNAP communication worked.  For example, it was able to multicast a SNAP message asking all Canberra AIM modules on the network to identify themselves.  It received all of the responses OK and built this table.

 

mcaAIMShowModules
Module     Type  HW rev.  FW rev.  Owner name      Owner ID       Status      Memory size Free address
NI0006e6    1       0        5       TahoeU18  B4:96:91:2E:0A:76  Reachable     261116      00000000
NI0008d7    1       0        6       ioc13bmc  08:00:3E:2E:63:37  Reachable     261116      00000000
NI0009ce    1       0        6       ioc13bmd  00:01:AF:0A:6B:71  Reachable     261116      00000000
NI00059e    1       0        5       ioc13idd  00:01:AF:0A:6B:5F  Reachable     261116      00000000
NI0003ed    1       0        6       Tahoe     B4:96:91:2E:0A:76  Reachable     261116      00012000

 

However, when it tried to ask for the Instrument Control Bus modules attached to module NI0003ed, it received no reply.  This is exactly the same behavior as on Linux.  Below is the output of a command I found for getting interface statistics in Windows with PowerShell.  It does not appear to show any errors, but I think there probably are receive errors of some sort.

 

 

PS C:\Users\epics> Get-NetAdapterStatistics -Name "Ethernet 2" | Format-list -Property "*"

ifAlias                  : Ethernet 2
InterfaceAlias           : Ethernet 2
ifDesc                   : Intel(R) Ethernet 10G 2P X550-t Adapter
Caption                  : MSFT_NetAdapterStatisticsSettingData 'Intel(R) Ethernet 10G 2P X550-t Adapter'
Description              : Intel(R) Ethernet 10G 2P X550-t Adapter
ElementName              : Intel(R) Ethernet 10G 2P X550-t Adapter
InstanceID               : {7314A851-AA79-4A9C-937F-F8C73BC7BB75}
InterfaceDescription     : Intel(R) Ethernet 10G 2P X550-t Adapter
Name                     : Ethernet 2
Source                   : 2
SystemName               : Tahoe.CARS.APS.ANL.GOV
OutboundDiscardedPackets : 0
OutboundPacketErrors     : 0
RdmaStatistics           : MSFT_NetAdapter_RdmaStatistics
ReceivedBroadcastBytes   : 7899700
ReceivedBroadcastPackets : 54709
ReceivedBytes            : 10010388
ReceivedDiscardedPackets : 0
ReceivedMulticastBytes   : 2107509
ReceivedMulticastPackets : 5331
ReceivedPacketErrors     : 0
ReceivedUnicastBytes     : 3179
ReceivedUnicastPackets   : 37
RscStatistics            : MSFT_NetAdapter_RscStatistics
SentBroadcastBytes       : 11814
SentBroadcastPackets     : 171
SentBytes                : 63550
SentMulticastBytes       : 47560
SentMulticastPackets     : 235
SentUnicastBytes         : 1996
SentUnicastPackets       : 15
SupportedStatistics      : 4163583
PSComputerName           :
CimClass                 : ROOT/StandardCimv2:MSFT_NetAdapterStatisticsSettingData
CimInstanceProperties    : {Caption, Description, ElementName, InstanceID...}
CimSystemProperties      : Microsoft.Management.Infrastructure.CimSystemProperties

 

 

So I conclude that it is most likely not a bug in the Linux or Windows network driver since the behavior is the same on both. 

 

Mark

 

 

 

-----Original Message-----
From: J. Lewis Muir <[email protected]>
Sent: Tuesday, January 7, 2020 11:14 AM
To: Mark Rivers <[email protected]>
Cc: [email protected]
Subject: Re: Ethernet question

 

On 01/06, Mark Rivers wrote:

> On 01/06, J. Lewis Muir wrote:

> > Could you show the output of "ethtool -S p5p1" just in case it shows

> > more detail about exactly what it means by RX "frame"?

>

> Here is the current output of ifconfig and ethtool (abbreviated).  ifconfig "frame" is 235, which is the same as ethtool "rx_length_errors", so those are the same thing.  They are not CRC errors, which is what I think Michael was assuming.

 

Yes, I think that's a helpful clue.  The frame error indicates a malformed packet, and if the packet is damaged, perhaps due to bad network hardware or local collisions, the CRC checksum would be incorrect.  But in this case, rx_crc_errors is 0 which means the CRC checksum is correct and the problem is likely that the packet is an invalid size, hence rx_length_errors being 235.

 

The packet might be too short (e.g., for Ethernet, too short would be less than 64 bytes, but I'm not sure how SNAP might affect this), too long (e.g., greater than the MTU of the network), or some other issue.

This could be the result of bad hardware or perhaps a network stack or driver bug where the network stack or driver corrupts the packet or rejects it as being malformed when it is not.
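
The size checks described above can be written down as a small classifier. This is only a sketch in Python of the general rules (minimum frame size, MTU, and the IEEE 802.3 length field), not a statement of what the NIC or its driver actually does when it increments rx_length_errors. The function name `classify_frame` and the returned labels are hypothetical, and the frame is assumed to be given without its 4-byte FCS.

```python
import struct

ETH_HEADER_LEN = 14       # destination MAC + source MAC + type/length field
MIN_FRAME_NO_FCS = 60     # 64-byte minimum frame minus the 4-byte FCS
MIN_PAYLOAD = 46          # shorter payloads are padded up to this size
MAX_8023_LENGTH = 1500    # values up to this are an 802.3 length
MIN_ETHERTYPE = 0x0600    # values from 1536 upward are an EtherType


def classify_frame(frame: bytes, mtu: int = 1500) -> str:
    """Classify a raw Ethernet frame (without FCS) by its size and header."""
    if len(frame) < MIN_FRAME_NO_FCS:
        return "runt"
    payload_len = len(frame) - ETH_HEADER_LEN
    if payload_len > mtu:
        return "oversize"
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    if type_or_len >= MIN_ETHERTYPE:
        return "ethernet-ii"
    if type_or_len > MAX_8023_LENGTH:
        return "undefined"
    # 802.3 framing (used by LLC/SNAP): the field is a length. Allow for
    # padding of short payloads up to the 46-byte minimum.
    if payload_len != max(type_or_len, MIN_PAYLOAD):
        return "length-mismatch"
    return "802.3"
```

A frame whose CRC is fine but whose 802.3 length field disagrees with the number of bytes actually received would come out as "length-mismatch" here, which is one plausible reading of a nonzero rx_length_errors alongside rx_crc_errors of 0.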

 

If there's a firewall running on the Linux machine, can you try disabling it temporarily just to be sure that it's not causing a problem?  (I know it works with the 1 GbE NIC, and I know the protocol is SNAP which the firewall should not even touch assuming it's an IP packet filter, but if there's a bug somewhere, then it seems worth at least checking.)

 

Also, I'm not understanding how this works at all from your office.  You said it's a Netgear XS712T, but according to the Product Data Sheet listed at

 

  https://www.netgear.com/support/product/XS712T.aspx#docs

 

it doesn't support *any* IEEE 802.2 protocol.  It lists the following IEEE network protocols as supported:

 

* IEEE 802.3 Ethernet
* IEEE 802.3u 100BASE-T (XS712T only)
* IEEE 802.3ab 1000BASE-T
* IEEE 802.1Q VLAN Tagging
* IEEE 802.3x Full-Duplex Flow Control
* IEEE 802.3z Gigabit Ethernet 1000BASE-SX/LX
* IEEE 802.3an 10GBASE-T 10 Gbit/s Ethernet Over Copper Twisted Pair Cable
* IEEE 802.3ae 10-Gigabit Ethernet Over Fiber (10GBASE-SR, 10GBASE-LR, 10GBASE-ER, 10GBASE-LX4)
* IEEE 802.3ad Trunking (LACP)
* IEEE 802.1AB LLDP with ANSI/TIA-1057 (LLDP-MED)
* IEEE 802.1p Class of Service
* IEEE 802.1D Spanning Tree (STP)
* IEEE 802.1s Multiple Spanning Tree (MSTP)
* IEEE 802.1w Rapid Spanning Tree (RSTP)
* IEEE 802.1x RADIUS Network Access Control
* IEEE 802.3az Energy Efficient Ethernet (EEE) Compliant

 

How does this work at all?

 

I'm thinking the answer is that it's SNAP, and SNAP is using IEEE 802.3 (Ethernet)?  But I know next to nothing about SNAP.
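
For reference, a SNAP frame is an ordinary IEEE 802.3 frame whose payload begins with an 802.2 LLC header (DSAP 0xAA, SSAP 0xAA, control 0x03) followed by a 5-byte SNAP header (3-byte OUI and 2-byte protocol ID). The Python sketch below builds such a frame to show the layout. The function name `build_snap_frame` is hypothetical, and the OUI and protocol ID values are placeholders, since the values the AIM protocol actually uses are not given in this thread.

```python
import struct

# LLC header that announces a SNAP header: DSAP, SSAP, control (UI frame).
LLC_SNAP = bytes([0xAA, 0xAA, 0x03])


def build_snap_frame(dst: bytes, src: bytes, oui: bytes, pid: int,
                     payload: bytes) -> bytes:
    """Build an 802.3 frame with LLC/SNAP encapsulation (no FCS)."""
    if len(dst) != 6 or len(src) != 6 or len(oui) != 3:
        raise ValueError("MAC addresses must be 6 bytes and the OUI 3 bytes")
    llc_pdu = LLC_SNAP + oui + struct.pack("!H", pid) + payload
    if len(llc_pdu) > 1500:
        raise ValueError("LLC PDU too long for an 802.3 length field")
    # In 802.3 framing the 2-byte field after the MACs is a length,
    # not an EtherType.
    frame = dst + src + struct.pack("!H", len(llc_pdu)) + llc_pdu
    # Pad to the 60-byte minimum (64 bytes once the FCS is appended).
    return frame.ljust(60, b"\x00")


frame = build_snap_frame(
    dst=bytes.fromhex("ffffffffffff"),
    src=bytes.fromhex("b496912e0a76"),   # MAC of the 10 Gbit NIC in this thread
    oui=bytes.fromhex("000000"),         # placeholder OUI (assumption)
    pid=0x1234,                          # placeholder protocol ID (assumption)
    payload=b"hello",
)
print(frame[:22].hex())
```

From a switch's point of view the destination and source addresses sit in the same place as in any other Ethernet frame, which would be consistent with a switch forwarding SNAP traffic without listing 802.2 on its data sheet.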

 

> > Is the NIC driver the same for the 1 GbE and the 10 GbE NICs on Linux?

>

> I'm not sure.  How can I tell that?

 

$ readlink -f /sys/class/net/eno1/device/driver

$ readlink -f /sys/class/net/p5p1/device/driver

 

or

 

$ ethtool -i eno1

$ ethtool -i p5p1
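
The sysfs lookup above can also be done from a script. This is a minimal Python sketch that resolves the same `device/driver` symlink that `readlink -f` follows. The function name `nic_driver` and its `sysfs_root` parameter are hypothetical, added so the lookup can be pointed at a different directory tree.

```python
import os


def nic_driver(ifname: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the kernel driver name bound to a network interface."""
    link = os.path.join(sysfs_root, ifname, "device", "driver")
    if not os.path.exists(link):
        raise FileNotFoundError(f"no driver link for interface {ifname!r}")
    # The symlink points at the driver's directory; its last path
    # component is the driver name.
    return os.path.basename(os.path.realpath(link))
```

Calling `nic_driver("eno1")` and `nic_driver("p5p1")` and comparing the results answers the question of whether both NICs use the same driver.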

 

It would also be interesting to know the make and model of the 10 GbE NIC:

 

$ lspci -v | grep -i ether

 

> > Do you have a Windows machine with a 10 GbE NIC that you could try?

>

> Yes, I could try that, but I have not yet.

 

That might be interesting if there's a bug in the Linux network stack or NIC driver.

 

Lewis

