Hi,
At SNS we’ve finally started using a nameserver to reduce broadcast traffic on the network. During testing, we found that a newer CA client will connect to old IOCs (pre 3.14.12) but won’t receive any monitored values. This is true for at least EDM and camonitor with the CA client from EPICS 7.0.8.1, while CSS seems to work fine. camonitor receives an event and displays the timestamp of when the record processed, but no value. EDM displays a random value in the widget, which made testing the nameserver even more challenging.
After debugging with Wireshark, it appears that the problem occurs when a CA V13 client connects to a V13 nameserver which redirects it to a V11 IOC. In particular, Channel Access V13 introduced dynamic resizing of arrays, which means that the event subscribe command requests 0 elements. The V11 IOC interprets such a request as ‘send me events but no data’. This doesn’t happen when a V13 client talks directly to a V11 IOC, only when going through the nameserver. I believe that is because the V11 IOC does not answer the VERSION request that the client sends along with the create channel request, and so the V13 client continues to assume the remote side is V13, which is the version it got from the nameserver. This may sound confusing, so here’s a dump from Wireshark with some of my comments inline. 7.0.8.1 and 3.14.8.2 are the EPICS base versions that the client, nameserver, and IOC are compiled against:
# camonitor (base 7.0.8.1) asking NameServer (base 7.0.8.1,PCAS 4.13.3) on port 5062 about the 'TEST' PV
15 1.296358367 192.168.1.34 192.168.1.34 CA 84 39208->5062 Version(13), Search('TEST',1),
20 1.328361887 192.168.1.34 192.168.1.34 CA 84 39208->5062 Version(13), Search('TEST',1),
# NameServer (base 7.0.8.1) starts the broadcast search
25 1.339764217 192.168.1.34 192.168.1.255 CA 84 50905->5064 Version(13), Search('TEST',1),
# IOC (base 3.14.8.2) replies
26 1.339837224 192.168.1.34 192.168.1.34 CA 84 5064->50905 Version(11), Search Reply(1),
# NameServer (7.0.8.1) establishes the connection and from now on knows the 'TEST' PV
32 1.340686305 192.168.1.34 192.168.1.34 CA 156 58972->5064 Version(13), User('klemen'), Host('homebox'), Create Request('TEST', cid=1),
34 1.340729527 192.168.1.34 192.168.1.34 CA 100 5064->58972 Rights(cid=1, RW), Create Reply(cid=1, sid=16),
# camonitor (7.0.8.1) asking NameServer about the 'TEST' PV again
42 1.392397425 192.168.1.34 192.168.1.34 CA 84 39208->5062 Version(13), Search('TEST',1),
# NameServer (7.0.8.1) replies back pointing it to the IOC
51 1.412974029 192.168.1.34 192.168.1.34 CA 84 5062->39208 Version(13), Search Reply(1),
# camonitor (7.0.8.1) requests 'TEST' PV from IOC (3.14.8.2)
56 1.413850566 192.168.1.34 192.168.1.34 CA 156 58980->5064 Version(13), User('klemen'), Host('homebox'), Create Request('TEST', cid=1),
# IOC (3.14.8.2) replies but doesn't include a Version response
58 1.413913529 192.168.1.34 192.168.1.34 CA 100 5064->58980 Rights(cid=1, RW), Create Reply(cid=1, sid=17),
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.1.34, Dst: 192.168.1.34
Transmission Control Protocol, Src Port: 5064, Dst Port: 58980, Seq: 1, Ack: 89, Len: 32
Channel Access
Command: Rights (0x0016)
Payload Size: 0
Client Channel ID: 1
Rights: RW (0x00000003)
Channel Access
Command: Create Channel (0x0012)
Payload Size: 0
DBR Type: DOUBLE (6)
Data Count: 1
Client Channel ID: 1
Server Channel ID: 17
# camonitor 7.0.8.1 assumes Version(13) from the search reply packet #51, and specifies Data Count: 0
61 1.414229433 192.168.1.34 192.168.1.34 CA 100 58980->5064 Event Add(sid=17, sub=1, mask=5),
Frame 61: 100 bytes on wire (800 bits), 100 bytes captured (800 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.1.34, Dst: 192.168.1.34
Transmission Control Protocol, Src Port: 58980, Dst Port: 5064, Seq: 89, Ack: 33, Len: 32
Channel Access
Command: Event (0x0001)
Payload Size: 16
DBR Type: TIME_DOUBLE (20)
Data Count: 0
Subscription ID: 1
Server Channel ID: 17
Event Mask: 0x0005
No. Time Source Destination Protocol Length Info
# IOC (3.14.8.2) doesn't recognize Data Count: 0 from Version(13) as dynamic array, and hence sends an event with no elements
62 1.414314283 192.168.1.34 192.168.1.34 CA 108 5064->58980 Event(sub=1, value=[]),
Frame 62: 108 bytes on wire (864 bits), 108 bytes captured (864 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 192.168.1.34, Dst: 192.168.1.34
Transmission Control Protocol, Src Port: 5064, Dst Port: 58980, Seq: 33, Ack: 121, Len: 40
Channel Access
Command: Event (0x0001)
Payload Size: 24
DBR Type: TIME_DOUBLE (20)
Data Count: 0
Subscription ID: 1
Status: ECA_NORMAL (0x00000001)
Status: 0
Severity: 0
Timestamp: Sat 08 Mar 2025 10:40:26 AM EST
Padding: 17
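
To make packets 61 and 62 easier to follow, here is a minimal Python sketch (not code from EPICS base) that builds the same Event Add request and models how a V11 versus a V13 server would interpret Data Count: 0. The 16-byte header layout and field order follow my reading of the CA protocol specification; the function names and the elements_sent() model are assumptions made up for illustration only.

```python
import struct

# CA message header: command, payload size, data type, data count, param1, param2
# (all big-endian; layout assumed from the CA protocol specification).
CA_HEADER = struct.Struct(">HHHHII")
CA_PROTO_EVENT_ADD = 1
DBR_TIME_DOUBLE = 20
DYNAMIC_ARRAY_MINOR = 13  # per the post: V13 introduced dynamic array sizing


def build_event_add(sid, subscription_id, dbr_type, count, mask):
    """Build an Event Add request like packet 61 (hypothetical helper)."""
    # Payload: three unused float32 fields, the event mask, and 2 bytes of padding.
    payload = struct.pack(">fffHH", 0.0, 0.0, 0.0, mask, 0)
    header = CA_HEADER.pack(CA_PROTO_EVENT_ADD, len(payload),
                            dbr_type, count, sid, subscription_id)
    return header + payload


def elements_sent(server_minor, requested_count, native_count):
    """Simplified model of how many elements a server puts in each event."""
    if requested_count == 0 and server_minor >= DYNAMIC_ARRAY_MINOR:
        # V13+ reads a count of 0 as "send the current native size".
        return native_count
    # Older servers take the count literally, so 0 means no data at all.
    return min(requested_count, native_count)


packet = build_event_add(sid=17, subscription_id=1,
                         dbr_type=DBR_TIME_DOUBLE, count=0, mask=5)
print(len(packet), "bytes, matching Len: 32 in frame 61")
print("V11 IOC sends", elements_sent(11, 0, 1), "element(s)")
print("V13 IOC sends", elements_sent(13, 0, 1), "element(s)")
```

With the values from the dump (sid=17, sub=1, mask=5, native count 1), the model gives zero elements from the V11 IOC, which is the empty event seen in packet 62.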
For now, we found a workaround: compile the nameserver against base 3.14.8.2. This forces the old protocol version in the search reply and fixes the issue. Furthermore, a V13 server correctly responds to the VERSION command, which allows a V13 client to properly change the version from the V11 it got from the nameserver to the V13 that the IOC is sending, so it can still use the new features. For obvious reasons, we’d like this to be only a temporary solution. Unfortunately, we still have several old IOCs that we’re slowly upgrading.
At a high level, I thought of a few solutions, and if I were to go down the road of trying to fix it, I’d like to get some input from the experts on which one would be best:
- If the server doesn’t respond with a VERSION command when a channel is connected to a new address, do another search first for old clients. Detecting a new address is tricky: going from UDP to TCP the port numbers will likely change, but detecting it only when the IP changes limits the use to a nameserver on a unique IP
- Subscribe with data_count=0 only when absolutely sure that the remote side is V13+, i.e. when the server has properly replied to the VERSION request
- Use the Data Count from the Create Channel reply; maybe that’s what CSS does
- Change the nameserver to return the CA version of the IOC where the PV is found instead of its own. This requires the nameserver to stop using PCAS and the CA client library from base, and instead implement the CA protocol on its own
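
The second and third options could be combined roughly as in the Python sketch below. It is only an illustration of the decision logic, not how the CA client library is structured: the Circuit class, its fields, and subscribe_count() are hypothetical names, and the native count is assumed to be the Data Count from the Create Channel reply.

```python
from dataclasses import dataclass
from typing import Optional

DYNAMIC_ARRAY_MINOR = 13  # per the post: V13 introduced dynamic array sizing


@dataclass
class Circuit:
    """Hypothetical per-connection state kept by a client."""
    search_reply_minor: int          # version seen in the UDP search reply
    tcp_minor: Optional[int] = None  # version from a VERSION message on the TCP circuit, if any

    def confirmed_dynamic(self) -> bool:
        # Trust only what the server itself said on the TCP circuit; the search
        # reply may have come from a nameserver running a different version.
        return self.tcp_minor is not None and self.tcp_minor >= DYNAMIC_ARRAY_MINOR


def subscribe_count(circuit: Circuit, native_count: int) -> int:
    """Pick the Data Count to put in the Event Add request."""
    if circuit.confirmed_dynamic():
        return 0             # server confirmed V13+: ask for dynamic sizing
    return native_count      # otherwise fall back to the Create Channel reply count


# V13 nameserver pointing at an old IOC that never sends VERSION over TCP:
print(subscribe_count(Circuit(search_reply_minor=13), native_count=1))
# V11 nameserver (the current workaround) pointing at a V13 IOC:
print(subscribe_count(Circuit(search_reply_minor=11, tcp_minor=13), native_count=1))
```

The first call models the failing case from the dump and falls back to the native count, while the second still uses data_count=0 against a V13 IOC.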
-- Klemen