Dear all,
I'm pulling my hair out over an issue that only appears sometimes, and I can't determine its cause. I have a bridge interface with IP address 10.1.1.1. It is NAT-ed to the outside world, and the host runs DHCP and DNS on it for containers and VMs, which have their interfaces bridged there; in short, it's an internal network on 10.1.1.0/24. Apart from NAT, no other iptables rules are in place.
I would like to restrict EPICS to this internal network. Nothing needs to be done for VMs and containers, as they only see the internal network anyway, and CA between VMs works as expected. The problem is with the IOCs on the host, where I have set the following (this is EPICS 7.0.6.1):
EPICS_CAS_INTF_ADDR_LIST=10.1.1.1
EPICS_CA_AUTO_ADDR_LIST=NO
EPICS_CA_ADDR_LIST=10.1.1.255
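For reference, all three settings follow from the bridge address and its /24 prefix. Here is a minimal Python sketch that derives them, assuming the bridge is 10.1.1.1/24 as described above. The helper name `ca_env_for_interface` is hypothetical and not part of EPICS.

```python
import ipaddress


def ca_env_for_interface(cidr: str) -> dict:
    """Hypothetical helper: CA environment settings confining an IOC to one interface."""
    iface = ipaddress.ip_interface(cidr)
    return {
        # Bind the CA server (TCP and UDP) to the bridge address only.
        "EPICS_CAS_INTF_ADDR_LIST": str(iface.ip),
        # Don't automatically add the broadcast address of every interface.
        "EPICS_CA_AUTO_ADDR_LIST": "NO",
        # Search only on the internal subnet's broadcast address.
        "EPICS_CA_ADDR_LIST": str(iface.network.broadcast_address),
    }


if __name__ == "__main__":
    for name, value in ca_env_for_interface("10.1.1.1/24").items():
        print(f"{name}={value}")
```

Running it prints the same three lines as above.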
This appears to restrict the IOC CA server to the bridge interface, as intended:
epics> casr 5
Channel Access Server V4.13
No clients connected.
CAS-TCP server on 10.1.1.1:5064 with
CAS-UDP unicast name server on 10.1.1.1:5064
Last name requested by 0.0.0.0:0:
User '', V4.0, Priority = 0, 0 Channels
Task Id = 0x2245a30, Socket FD = 7
214.35 secs since last send, 214.35 secs since last receive
Unprocessed request bytes = 0, Undelivered response bytes = 16
State = up
CAS-UDP broadcast name server on 10.1.1.255:5064
Last name requested by 0.0.0.0:0:
User '', V4.0, Priority = 0, 0 Channels
Task Id = 0x2245c80, Socket FD = 7
214.35 secs since last send, 214.35 secs since last receive
Unprocessed request bytes = 0, Undelivered response bytes = 16
State = up
Sending CAS-beacons to 1 address:
10.1.1.255:5065
However, it only works sometimes. Other times, after one or more reboots (and more often than not), this happens:
$ caget test:arg_echo
CAC: Unable to connect because "Connection refused"
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "192.168.11.237:5064"
Source File: ../cac.cpp line 1237
Current Time: Thu Apr 07 2022 20:19:45.727753820
..................................................................
Channel connect timed out: 'test:arg_echo' not found.
The IP address shown here (192.168.11.237) belongs to the WiFi interface, which should not be involved at all. The problem does not originate on the client side. Here is the packet dissection of the CA search and its response, showing only the relevant IP and CA fields:
Search:
{
"ip.src": "10.1.1.1",
"ip.dst": "10.1.1.255"
}
{
"ca.command": "0x00000006",
"ca.size": "16",
"ca.doreply": "0x00000005",
"ca.version": "13",
"ca.cid": "1",
"ca.p2": "0x00000001",
"ca.pv": "test:arg_echo"
}
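To check how the search fields above map onto the wire format, here is a minimal Python sketch that encodes a single CA_PROTO_SEARCH message. It assumes the standard 16-byte big-endian CA header (command, payload size, data type, data count, parameter 1, parameter 2) and a PV name that is NUL-terminated and padded to a multiple of 8 bytes. This explains ca.size = 16 for "test:arg_echo" (13 characters + NUL = 14, padded to 16). The function name `encode_search` is hypothetical. A real client usually precedes the search with a version message in the same datagram, which is omitted here.

```python
import struct

CA_PROTO_SEARCH = 6
DO_REPLY = 10
DONT_REPLY = 5  # matches ca.doreply = 0x00000005 above


def encode_search(pv: str, cid: int, minor_version: int = 13,
                  reply_flag: int = DONT_REPLY) -> bytes:
    """Hypothetical helper: build one CA search request (header + padded PV name)."""
    name = pv.encode("ascii") + b"\0"          # NUL-terminated PV name
    padded_len = (len(name) + 7) // 8 * 8      # round up to a multiple of 8
    payload = name.ljust(padded_len, b"\0")
    # data type = reply flag, data count = client minor version, p1 = p2 = cid
    header = struct.pack(">HHHHII", CA_PROTO_SEARCH, len(payload),
                         reply_flag, minor_version, cid, cid)
    return header + payload


if __name__ == "__main__":
    msg = encode_search("test:arg_echo", cid=1)
    print(len(msg), struct.unpack(">HHHHII", msg[:16]))
```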
Response:
{
"ip.src": "192.168.11.237",
"ip.dst": "10.1.1.1"
}
{
"ca.command": "0x00000006",
"ca.size": "8",
"ca.serv.port": "5064",
"ca.serv.ip": "255.255.255.255",
"ca.cid": "1",
"ca.version": "0"
}
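If I read the CA protocol description correctly, ca.serv.ip = 255.255.255.255 in the reply is not a real address. It tells the client to use the source address of the UDP reply datagram, so whatever source address the host chooses for the reply becomes the TCP target. Here is a minimal Python sketch of that resolution step, assuming the same 16-byte big-endian header layout, with the server port in the data type field, the server address in parameter 1, and the client ID in parameter 2. The function name `resolve_search_reply` is hypothetical and not the actual libca code.

```python
import ipaddress
import struct

CA_PROTO_SEARCH = 6
INADDR_BROADCAST = 0xFFFFFFFF  # 255.255.255.255 means "use the sender's address"


def resolve_search_reply(datagram: bytes, sender_ip: str) -> tuple:
    """Hypothetical helper: return (server_ip, tcp_port, cid) a client would connect to."""
    cmd, _size, port, _count, addr, cid = struct.unpack(">HHHHII", datagram[:16])
    if cmd != CA_PROTO_SEARCH:
        raise ValueError(f"not a search reply: command {cmd}")
    if addr == INADDR_BROADCAST:
        server_ip = sender_ip  # fall back to the UDP source address
    else:
        server_ip = str(ipaddress.IPv4Address(addr))
    return server_ip, port, cid


if __name__ == "__main__":
    # Reply as in the dissection above: port 5064, address 255.255.255.255, cid 1,
    # followed by an 8-byte payload holding a minor version (13 assumed here).
    reply = struct.pack(">HHHHII", 6, 8, 5064, 0, INADDR_BROADCAST, 1)
    reply += struct.pack(">H", 13).ljust(8, b"\0")
    print(resolve_search_reply(reply, "192.168.11.237"))
```

With the reply's source set to the WiFi address, the client ends up targeting 192.168.11.237:5064, which matches the error above.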
And so, the client tries to connect to the WiFi interface, while the IOC is only listening on the bridge. I don't understand why the IOC would respond from an address it is not listening on, or why this behavior is inconsistent. Things start to work if I turn WiFi off. I tried keeping WiFi on and just deleting the default route to see what happens: no change. Likewise, and as I'd expect, adding the WiFi address to EPICS_CAS_IGNORE_ADDR_LIST makes no difference. I hope someone here has an idea before I go dig into the code.
Thanks,
Jure Varlec
Senior Software Developer
Cosylab d.d.
www.cosylab.com