Dear Jeff--
Thank you for the detailed reply. Some additional comments inline below.
> One would certainly expect that if messages were sent to this UDP port that
> the high water mark would be found very quickly and that there would be no
> negative impact on the IP kernel other than the network stack's data pool
> being reduced in size by UDP_RCV_SIZE_DFLT bytes.
One would hope so. In my observation, I did not see the Recv-Q ever reach
UDP_RCV_SIZE_DFLT before the network failed. So something else is
happening in the IP kernel. Unfortunately, I do not have enough different
types of VxWorks targets to investigate further, so I do not know how
widespread this problem may be.
> With sockets we can shutdown one side of their full duplex capabilities
> should we not need them. I'm not fully certain what internal impact that
> might have on the IP kernel, but this does appear to be a sensible thing
> to do in this situation. However, since you are reporting that the
> problem appears to be related to the frequency of the rogue traffic then
> it may be such a change will not have a functional impact on robustness.
I have modified my rsrv/online_notify.c so that rsrv_online_notify_task()
does a shutdown(sock,0); so receives are not allowed on this socket.
This seems to have fixed the problem. My test case no longer crashes the
IOC and I am not seeing anything in Recv-Q. Beacons are still working and
I have not seen any negative consequences with IOC functionality.
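For illustration, here is a minimal sketch of the idea. It is not the
actual change to online_notify.c. The helper name open_send_only_udp() and
its parameters are hypothetical, and it assumes a POSIX sockets API where
SHUT_RD has the value 0, so it matches shutdown(sock,0). It uses connect()
so that shutdown() succeeds on platforms that reject it on unconnected
sockets. How the kernel treats datagrams arriving after the shutdown is
platform dependent.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of EPICS base): open a UDP socket that is
 * only ever used for sending, and disable its receive side.
 * Returns the socket descriptor, or -1 on any failure.
 */
int open_send_only_udp(const char *dest_ip, unsigned short dest_port)
{
    struct sockaddr_in dest;
    int sock;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(dest_port);
    if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1) {
        return -1;              /* not a valid dotted-quad address */
    }

    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        return -1;
    }

    /* Fix the destination so plain send() can be used afterwards. */
    if (connect(sock, (struct sockaddr *)&dest, sizeof(dest)) < 0) {
        close(sock);
        return -1;
    }

    /* Equivalent to shutdown(sock, 0): no further receives on this socket. */
    if (shutdown(sock, SHUT_RD) < 0) {
        close(sock);
        return -1;
    }

    return sock;
}
```

The caller keeps using send() on the returned descriptor as before; only
the receive half is disabled.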
> That's interesting, but it's hard to comment further w/o knowing more about
> what criteria nmap uses to decide between a "closed" and "open" status
> report.
For a UDP scan, nmap sends an empty UDP header to the target. If the
target responds with an ICMP type 3, code 3 (port unreachable) error, nmap
reports the port as "closed".
> Sounds like an IP kernel issue, but if we were to shutdown the receive side
> of that UDP socket it might be more robust.
>
> I created a Mantis issue against R3.15. I selected that release, and
> assigned a low priority to the fix, because this sounds primarily like a
> problem with a particular vxWorks IP kernel and its not clear that a fix
> will produce any visible benefit.
As I said above, shutting down the receive side seems to be an adequate
fix and gives sufficient robustness against anomalous traffic on these UDP
ports as far as my testing shows.
I also notice that in 3.14, the rsrv_online_notify_task() function is
changed from 3.13.10; in particular, it uses connect() rather than bind().
> PS: Was the rogue traffic in fact caused by programs such as nmap running
> scans on all ports (and sending potentially invalid protocol)?
The initial traffic was actually Windows Messenger Pop-up spam. Apparently
Windows listens on ports 135 plus 1026 or higher for this. See:
http://www.lurhq.com/popup_spam.html for some info. Recently automated
probes for this have ratcheted up on ports 1027, 1028 and 1029. I have
also read some indications that this traffic may also be associated with
Windows worms exploiting Microsoft's DCOM RPC bugs.
A standard nmap UDP scan (as might be run by network administrators or
someone penetration testing the network) runs quite slowly due to long
timeouts. I don't think that is likely to crash the IOC from my limited
testing (but given enough time it may). (The more typical nmap TCP scans
of an IOC do not cause any problems that I have noticed.) I restricted my
scan to a single port on a single host and looped that in a shell script
to cause the IOC to crash.
But since it is not the payload (nmap sends empty packets), but the
traffic itself which is causing the problem, it should be trivial to write
a program to crash an IOC in this way (open a UDP socket with port 1028 or
whatever the IOC is using and repeatedly sendto the IOC's address.)
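As a rough sketch of such a test sender, for use only against one's own
test IOC: the function name send_empty_datagrams() and its parameters are
hypothetical, the count is deliberately bounded, and I have not measured
what rate or count is needed to reproduce the failure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical test helper: send `count` empty UDP datagrams to
 * dest_ip:dest_port. Returns the number of datagrams handed to the
 * network stack, or -1 if the address is invalid or no socket is available.
 */
int send_empty_datagrams(const char *dest_ip, unsigned short dest_port,
                         int count)
{
    struct sockaddr_in dest;
    int sock;
    int sent = 0;
    int i;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(dest_port);
    if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1) {
        return -1;
    }

    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        return -1;
    }

    for (i = 0; i < count; i++) {
        /* Zero-length payload: only the UDP header goes on the wire. */
        if (sendto(sock, "", 0, 0,
                   (struct sockaddr *)&dest, sizeof(dest)) >= 0) {
            sent++;
        }
    }

    close(sock);
    return sent;
}
```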
In conclusion, given the ease with which an IOC can be brought down, I
would like to see a fix made available through official channels. My
shutdown() seems to be adequate, but someone with a better understanding
of base may have a better solution. In the meantime, I will be running with
my modified online_notify.c.
Thank you for your help,
--
Steve Hartman
[email protected] || 919-660-2650
Duke Free Electron Laser Laboratory
- References:
- RE: UDP to CA_UDP hangs network? Jeff Hill