> All of the packets are still going through the same 10 GbE switch, though, right (i.e., the one labeled "10 Gbit switch #1" in the network path diagram you included in a message upthread)?
> So, that switch has been involved in all of the tests conducted so far, so it still could be the problem, right?
No, the two Linux systems go through different switches. The one I showed in a previous message is on the APS experiment hall floor and is our CentOS 7 server, corvette.
The second Linux machine I have tested is an Ubuntu 18 system in my office. Its topology is:
Linux machine (has both 10 Gbit and 1 Gbit NICs)
|
10 Gbit switch #1 (in my office)
| (1 Gbit uplink)
1 Gbit switch #2 (in APS network closet)
| (possibly additional switches in here, I'm not sure)
|
1 Gbit switch
|
Device (10 Mbit AUI)
> Also, are the 10 GbE switches labeled "10 Gbit switch #1" and "10 Gbit switch #2" in the network path diagram identical and running the same OS or firmware version, or are they different?
Different OS and firmware. Switch #1 in the system on the floor is a Dell N1548, while switch #1 in my office is a Netgear X5712T. Switch #2 in both cases is managed by the APS in the network closet. It is definitely not a Dell, but is probably an HP or Cisco, and the two may or may not be the same model.
> Also, are those two switches managed or unmanaged? If managed, can you find out what the switch has set the speed and duplex to for the ports involved to ensure that it set them correctly?
Switches #1 and #2 are managed in both configurations. The link between them is definitely 10 Gbit full-duplex. The final 1 Gbit switch to the device is unmanaged, but it is clearly configured correctly, because it works fine when the Linux NIC is 1 Gbit, and using a different NIC on Linux has no effect on the configuration of the port on the final 1 Gbit switch. I have also tested 2 devices that are connected to 2 different 1 Gbit switches. Both switches are Dell, but they are different models.
> Could you show the output of "ethtool -S p5p1" just in case it shows more detail about exactly what it means by RX "frame"?
Here is the current output of ifconfig and ethtool (abbreviated). The ifconfig "frame" count is 235, which matches the ethtool "rx_length_errors" count, so those appear to be the same counter. They are not CRC errors ("rx_crc_errors" is 0), which is what I think Michael was assuming.
corvette:areaDetector/ADCore/iocBoot>/sbin/ifconfig p5p1
p5p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 164.54.160.82 netmask 255.255.255.0 broadcast 164.54.160.255
inet6 fe80::3efd:feff:fea3:f258 prefixlen 64 scopeid 0x20<link>
ether 3c:fd:fe:a3:f2:58 txqueuelen 1000 (Ethernet)
RX packets 147929456684 bytes 111438844085337 (101.3 TiB)
RX errors 0 dropped 920 overruns 0 frame 235
TX packets 100625271243 bytes 29596808110595 (26.9 TiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
corvette:areaDetector/ADCore/iocBoot>/sbin/ethtool -S p5p1
NIC statistics:
rx_packets: 147929466577
tx_packets: 100625281563
rx_bytes: 111438844918608
tx_bytes: 29596809166293
rx_errors: 0
tx_errors: 0
rx_dropped: 891
tx_dropped: 0
collisions: 0
rx_length_errors: 235
rx_crc_errors: 0
rx_unicast: 147182042495
tx_unicast: 100555363340
rx_multicast: 5115
tx_multicast: 412
rx_broadcast: 747419793
tx_broadcast: 69917765
rx_unknown_protocol: 0
tx_linearize: 2166
tx_force_wb: 0
rx_alloc_fail: 0
rx_pg_alloc_fail: 0
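For anyone who wants to compare these counters across several interfaces or machines, here is a minimal Python sketch that picks the non-zero problem counters out of "ethtool -S" style text. The function names and the keyword list are my own assumptions for illustration, and the sample is just an excerpt of the output above. It only parses text and does not run ethtool itself.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` style output into a dict.

    Lines without an integer value (such as the 'NIC statistics:' header)
    are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_problem_counters(stats,
                             keywords=("error", "drop", "fail", "collision")):
    """Return counters that are non-zero and whose name contains a keyword.

    The keyword list is a guess at what counts as a 'problem' counter;
    counter names differ between NIC drivers.
    """
    return {name: count for name, count in stats.items()
            if count and any(word in name for word in keywords)}


# Excerpt of the p5p1 output quoted in this message.
SAMPLE = """NIC statistics:
rx_errors: 0
tx_errors: 0
rx_dropped: 891
tx_dropped: 0
collisions: 0
rx_length_errors: 235
rx_crc_errors: 0
rx_alloc_fail: 0
rx_pg_alloc_fail: 0
"""

if __name__ == "__main__":
    for name, count in nonzero_problem_counters(
            parse_ethtool_stats(SAMPLE)).items():
        print(name, count)
```

On the excerpt above this reports only rx_dropped and rx_length_errors, which is consistent with the errors being length errors and not CRC errors.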
> Is the NIC driver the same for the 1 GbE and the 10 GbE NICs on Linux?
I'm not sure. How can I tell that?
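One possible way to check, as far as I know, is "ethtool -i <interface>", which reports the driver name. The same information can be read from sysfs, as in the hedged Python sketch below. It assumes the usual Linux layout where /sys/class/net/<interface>/device/driver is a symlink to the driver directory. The function name and the second interface name in the example are hypothetical.

```python
import os


def driver_name(iface, sysfs_net="/sys/class/net"):
    """Return the kernel driver name bound to a network interface, or None.

    Assumes the common sysfs layout in which
    <sysfs_net>/<iface>/device/driver is a symlink to the driver directory.
    Virtual interfaces (for example 'lo') have no such link, so None is
    returned for them.
    """
    link = os.path.join(sysfs_net, iface, "device", "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # "p5p1" is the 10 Gbit interface from this message; "em1" is a
    # hypothetical name standing in for the 1 Gbit interface.
    for iface in ("p5p1", "em1"):
        print(iface, driver_name(iface))
```

If the two interfaces print different names, the 1 Gbit and 10 Gbit NICs are using different drivers.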
> Do you have a Windows machine with a 10 GbE NIC that you could try?
Yes, I could try that, but I have not yet.
> Do you have a Mac with a 10 GbE NIC that you could try?
No.
I should add that I normally communicate with these devices from vxWorks, going through the same switches, and that has been working fine for 20 years (with switch upgrades over the years, of course). I would like to move from vxWorks to Linux, but I have hit this problem with the 10 Gbit NICs.
Thanks,
Mark
-----Original Message-----
From: J. Lewis Muir <[email protected]>
Sent: Monday, January 6, 2020 2:39 PM
To: Mark Rivers <[email protected]>
Cc: [email protected]
Subject: Re: Ethernet question
On 12/23, Mark Rivers via Tech-talk wrote:
> On further investigation I found that the problem only occurs when using 10 Gbit Ethernet adapters. When using a 1 Gbit adapter it works fine. I was able to see this in a single machine that has both 10 Gbit and 1 Gbit adapters.
> If I use the 1 Gbit adapter it works fine, if I use the 10 Gbit adapter it fails (but kind of works as described above). On 3 separate systems 1 Gbit works, and 3 other systems 10 Gbit fails.
Is the NIC driver the same for the 1 GbE and the 10 GbE NICs on Linux?
Do you have a Windows machine with a 10 GbE NIC that you could try?
Do you have a Mac with a 10 GbE NIC that you could try?
Lewis