Over the last few days the network on a few of my VxWorks IOCs has hung.
PVs are white-boxed, and the IOC can neither send nor receive any network
traffic, including ping. Apart from the network, the IOC appears to be
running fine. The only way I have found to restore networking is to
reboot. The affected IOCs are all mv167 boards running VxWorks 5.4.2 with
EPICS 3.13.10. Other IOCs (MVME5110, VxWorks 5.5.1, EPICS 3.14) do not
seem to be affected.
I was able to correlate these events with some rogue UDP traffic on the
network (which has since been eliminated). This traffic was Windows
Messenger spam targeting the Windows RPC messenger service, which
typically listens on UDP ports 1025-1030.
I don't have a full understanding of what is happening, or whether the
problem is in VxWorks or EPICS, but here is what I have. CA_UDP opens a
UDP port on the IOC using the next available port (typically 1027-1029 on
my IOCs) for sending UDP beacons to listeners on UDP 5065 every 15
seconds. When a UDP packet is directed at CA_UDP's server port, however,
something goes wrong. inetstatShow() shows a positive value for Recv-Q
which never goes down, but increases as additional UDP traffic is directed
at the port. The size of this buffer is set in netLib.h
(UDP_RCV_SIZE_DFLT) to 41600, but the hang does not seem to be caused by
the buffer filling up. Rather, it seems to be the frequency of the traffic
that locks up the network interface.
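To illustrate what a growing Recv-Q means, here is a small Python sketch that runs on an ordinary host, not on VxWorks. As I understand it, Recv-Q counts data that has arrived at a socket but has not yet been read by its owner, so a socket that is bound but never read will show exactly this pattern. The function name queue_unread_datagrams is made up for this example, and the sketch only demonstrates the queueing behaviour; it does not reproduce the hang.

```python
import socket


def queue_unread_datagrams(count, payload=b"x" * 16):
    """Send `count` datagrams to a local UDP socket that is not being read,
    then drain the socket and return how many datagrams were waiting."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))          # let the OS pick a free port
    port = receiver.getsockname()[1]

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(count):
        # Nobody calls recv yet, so each datagram sits in the receive queue.
        sender.sendto(payload, ("127.0.0.1", port))
    sender.close()

    # Now drain the queue to see how much had accumulated.
    receiver.settimeout(0.5)
    drained = 0
    try:
        while True:
            receiver.recvfrom(2048)
            drained += 1
    except socket.timeout:
        pass                                  # queue is empty
    finally:
        receiver.close()
    return drained


if __name__ == "__main__":
    print("datagrams that were queued unread:", queue_unread_datagrams(5))
```

On the IOC the equivalent of the drain loop apparently never happens for this port, which would explain why the Recv-Q value only ever goes up.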
VxWorks utilities mbufShow, netStackDataPoolShow, netStackSysPoolShow,
ifShow, etc. don't show any abnormalities. (I have seen input errors in
the ifShow output of a hung IOC, but I think these occur after the fact.)
tNetTask is still running.
I have been able to reproduce this by using nmap
(http://www.insecure.org/nmap/) to send UDP scans at the CA_UDP port. With
CA_UDP using port 1028 on the IOC, looping this nmap scan (as root) will
cause the network to hang:
./nmap -sU -p 1028 testioc
inetstatShow will show:
Active Internet connections (including servers)
PCB      Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
. . .
727480   UDP     1694      0  0.0.0.0.1028       0.0.0.0.0
with the Recv-Q increasing until at some point the IOC stops responding to
all network traffic.
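For anyone who wants to track this while a test is running, here is a rough Python sketch that pulls the Recv-Q value for one port out of captured inetstatShow text (for example from a console log). The function name recv_q_for_port is my own, and the column layout is assumed from the sample output above, so treat it as a starting point only.

```python
def recv_q_for_port(inetstat_text, port, proto="UDP"):
    """Return Recv-Q for the first `proto` socket bound to local `port`,
    or None if no matching line is found.

    Assumes whitespace-separated columns in the order
    PCB, Proto, Recv-Q, Send-Q, Local Address, Foreign Address,
    with the port as the last dot-separated part of the local address.
    """
    for line in inetstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[1].upper() != proto:
            continue                      # header, separator, or other protocol
        local_port = fields[4].rsplit(".", 1)[-1]
        if local_port == str(port) and fields[2].isdigit():
            return int(fields[2])
    return None


sample = """\
Active Internet connections (including servers)
PCB Proto Recv-Q Send-Q Local Address Foreign Address (state)
727480 UDP 1694 0 0.0.0.0.1028 0.0.0.0.0
"""

if __name__ == "__main__":
    print(recv_q_for_port(sample, 1028))  # prints 1694
```

Logging this value between scan iterations should show whether the hang happens at a consistent queue depth or not.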
Interestingly, on the affected mv167 VxWorks 5.4.2 IOCs, nmap reports this
port as "open", but on the unaffected MVME5110 VxWorks 5.5.1 IOCs, the
CA_UDP port is reported as "closed". I don't have any other targets
available to try this out on.
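To check the open/closed difference without nmap, something like the Python sketch below could be used. A UDP scanner generally calls a port "closed" when the target answers with an ICMP port-unreachable message; on a connected UDP socket that usually surfaces as a connection-refused error, although the exact behaviour depends on the operating system of the probing host. The function name probe_udp_port and the three result strings are my own labels, not nmap's.

```python
import socket


def probe_udp_port(host, port, timeout=1.0):
    """Send one empty UDP datagram and classify what comes back.

    Returns "replied"  if any datagram is received from the target,
            "refused"  if the OS reports an ICMP port-unreachable,
            "no-reply" if nothing arrives before the timeout.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))   # connected socket so ICMP errors are reported
        sock.send(b"")               # zero-length datagram
        sock.recv(2048)
        return "replied"
    except (ConnectionRefusedError, ConnectionResetError):
        return "refused"
    except socket.timeout:
        return "no-reply"
    finally:
        sock.close()


if __name__ == "__main__":
    # "testioc" and 1028 are the host and port from the nmap example.
    print(probe_udp_port("testioc", 1028))
```

If the 5.5.1 IOCs come back as "refused" and the 5.4.2 IOCs as "no-reply", that would suggest the two stacks differ in how they treat datagrams arriving at this port.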
Any insights, suggestions, or other tests to run?
Thanks,