Hello Brian,
At CSNS, we have been using LabVIEW IOCs in beam diagnostic devices together with the CA gateway for quite a few years. The CA gateway seems to be sensitive to unexpected data formats: it sometimes crashes, or stops working without crashing. We
plan to replace the non-standard IOCs in the ongoing CSNS upgrade.
The helpful aspect of the CA gateway, or of the CA library in EPICS Base that it depends on, is that they always report the specific badly formatted PVs in the log file, which lets us troubleshoot.
For example, half a year ago we encountered a similar issue: the CA gateway serving LabVIEW IOCs kept restarting until it crashed whenever badly formatted PVs were accessed by other CA clients through the gateway. The CA gateway log file reported it like this:
Oct 03 00:21:54 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/casPVI.cc" line number=253
Bad data type application type "enums" string conversion table for enumerated PV isnt a string type?
We then added one line to EPICS Base to print the problematic PVs (newer EPICS versions print them by default) and reported the issue to colleagues in the beam diagnostics group. They restarted the relevant IOC, which resolved the issue, and it
has not happened again.
So, when an issue occurs in our LabVIEW IOC / CA gateway environment, we usually either restart the CA gateway, or find the problematic IOC and restart it.
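For anyone who wants to scan a gateway log for entries like the one quoted above, here is a minimal Python sketch. It assumes only the layout visible in that excerpt (a `filename="..." line number=N` line followed by one line of message text); the function name `find_errlog_entries` is made up for illustration, and real logs may be laid out differently.

```python
import re

# Assumed layout, taken only from the excerpt quoted in this message:
# a line with filename="..." line number=N, then one line of message text.
LOCATION_RE = re.compile(r'filename="(?P<file>[^"]+)"\s+line number=(?P<line>\d+)')


def find_errlog_entries(log_text):
    """Return (source_file, line_number, message) for each located entry."""
    entries = []
    lines = log_text.splitlines()
    for i, text in enumerate(lines):
        match = LOCATION_RE.search(text)
        if match:
            # Treat the next line, if any, as the message text (an assumption).
            message = lines[i + 1].strip() if i + 1 < len(lines) else ""
            entries.append((match.group("file"), int(match.group("line")), message))
    return entries


SAMPLE = '''Oct 03 00:21:54 !!! Errlog message received (message is above)
filename="../../../../src/cas/generic/casPVI.cc" line number=253
Bad data type application type "enums" string conversion table for enumerated PV isnt a string type?
'''

entries = find_errlog_entries(SAMPLE)
for source_file, line_number, message in entries:
    print(source_file, line_number, message)
```

This only locates the error records. The PV names still have to come from the extra print line mentioned above, or from a newer EPICS version that prints them.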
Regards,
Lin
-----Original Messages-----
From: "Brian Bevins via Tech-talk" <tech-talk at aps.anl.gov>
Send time: Sunday, 03/08/2026 04:50:13
To: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Subject: [SPAM] CA gateway crashes periodically due to badly behaved servers
I've noticed an odd behavior in one of our channel access gateways. We use them in various places to sort of shield our stuff from stuff we don't control. I'm using version 2.1.3.
There is an apparently badly behaved server that I don't control that is seen by the gateway. It's a bit of a black box even to those who do control it. All I know is that it's a PXI crate running a LabVIEW CA server. It disconnects about once every 8-40 seconds
and frequently returns errors about bad resource IDs, apparently in response to CA_PROTO_EVENT_CANCEL commands. For unknown reasons this happens much more often than it did when we were running version 2.0.3.0 of the gateway. I'm still looking into that.
The gateway does a fine job of ignoring the errors and reconnecting, until it has received 100 errors. After that the error logging throttle kicks in and it logs only disconnects for 1 hour. What seems odd to me is that when it does this it uninstalls its own
error handler. When the next serious error comes in it invokes the default client handler and aborts the program. This seems wrong to me. I've worked around it by just not uninstalling the error handler. Messages still stop being logged for an hour, but they
don't abort the gateway.
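The workaround described above (keep the handler installed and only mute the logging) can be illustrated with a small Python sketch. This is not the gateway's C++ code: the class `ThrottledErrorHandler` and its parameters are hypothetical, and only the limit of 100 errors and the one-hour quiet period come from the description above. For simplicity the sketch mutes everything during the quiet period, whereas the gateway still logs disconnects.

```python
class ThrottledErrorHandler:
    """Hypothetical handler: mutes logging after a limit but stays installed."""

    def __init__(self, max_logged=100, quiet_seconds=3600.0):
        self.max_logged = max_logged        # errors logged before muting
        self.quiet_seconds = quiet_seconds  # length of the quiet period
        self.logged = 0
        self.quiet_until = None
        self.messages = []                  # stands in for the log file

    def handle(self, message, now):
        """Handle one error; return True if it was logged, False if muted."""
        # Leave the quiet period once it has expired and start counting again.
        if self.quiet_until is not None and now >= self.quiet_until:
            self.quiet_until = None
            self.logged = 0
        if self.quiet_until is not None:
            # Still handled here, so no default handler is ever invoked.
            return False
        self.messages.append(message)
        self.logged += 1
        if self.logged >= self.max_logged:
            self.quiet_until = now + self.quiet_seconds
        return True


handler = ThrottledErrorHandler()
results = [handler.handle(f"error {i}", now=float(i)) for i in range(150)]
print(sum(results), "errors logged,", len(results) - sum(results), "muted")
```

The point of the design is that muting is a state inside the handler rather than a reason to remove it, so a later serious error still reaches this code instead of a default handler that aborts.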
The gateway always restarts within just a few seconds, but our archiver (Mya) doesn't always reconnect for some reason, leaving occasional gaps of several hours in the archived data. I'm trying to look into this too. Unfortunately our Mya expert just retired.
--Brian Bevins
Thomas Jefferson National Accelerator Facility