Thank you for the suggestion. I have verified that gateway beacons are reaching the archiver machines. Whatever is delaying the reconnection, it doesn't seem to be that.
Hi Brian,
Not sure if you saw my email reply buried below in the body of your email.
I will highlight.
Meanwhile, I will get my mail client fixed up. 🙂
Cheers,
Ernest
From: Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of Williams Jr., Ernest L. via Tech-talk <tech-talk at aps.anl.gov>
Sent: Monday, March 9, 2026 6:48 AM
To: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>; Brian Bevins <bevins at jlab.org>
Subject: Re: CA gateway crashes periodically due to badly behaved servers
Hi Brian
From: Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of Brian Bevins via Tech-talk <tech-talk at aps.anl.gov>
Sent: Saturday, March 7, 2026 12:50 PM
To: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: CA gateway crashes periodically due to badly behaved servers
I've noticed an odd behavior in one of our channel access gateways. We use them in various places to sort of shield our stuff from stuff we don't control. I'm using version 2.1.3.
There is an apparently badly behaved server that I don't control that is seen by the gateway. It's a bit of a black box even to those who do control it. All I know is that it's a PXI crate running a LabVIEW CA server. It disconnects about once every 8-40 seconds and frequently returns errors about bad resource IDs, apparently in response to CA_PROTO_EVENT_CANCEL commands. For unknown reasons this happens much more often than it did when we were running version 2.0.3.0 of the gateway. I'm still looking into that.
The gateway does a fine job of ignoring the errors and reconnecting, until it has received 100 errors. After that the error logging throttle kicks in and it logs only disconnects for 1 hour. What seems odd to me is that when it does this it uninstalls its own error handler. When the next serious error comes in it invokes the default client handler and aborts the program. This seems wrong to me. I've worked around it by just not uninstalling the error handler. Messages still stop being logged for an hour, but they don't abort the gateway.
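The workaround described above (keep the handler installed and only silence its logging) can be illustrated with a small standalone sketch. This is not the gateway's code, which is C++; the class name, method names, and the injectable clock are hypothetical. The limit of 100 errors and the one-hour quiet period come from the message, and the sketch simplifies by silencing all messages rather than still logging disconnects.

```python
import time


class ThrottledErrorHandler:
    """Hypothetical model of a handler that stays installed while its logging is throttled."""

    def __init__(self, limit=100, quiet_seconds=3600.0, clock=time.monotonic):
        self.limit = limit                  # errors logged before the throttle kicks in
        self.quiet_seconds = quiet_seconds  # how long logging stays silent
        self.clock = clock                  # injectable so the behavior can be tested
        self.count = 0
        self.quiet_until = None
        self.logged = []                    # stands in for the log file

    def handle(self, message):
        """Always handle the error; return True only if it was also logged."""
        now = self.clock()
        if self.quiet_until is not None:
            if now < self.quiet_until:
                # Throttled: the error is swallowed silently, but it is still
                # handled here, so no default handler ever sees it.
                return False
            # Quiet period is over: start counting again.
            self.quiet_until = None
            self.count = 0
        self.count += 1
        self.logged.append(message)
        if self.count >= self.limit:
            self.quiet_until = now + self.quiet_seconds
        return True
```

The point of the design is that throttling changes only what gets written to the log, not who receives the error, so a burst of errors from a misbehaving server cannot fall through to a handler that aborts the process.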
The gateway always restarts within just a few seconds, but our archiver (Mya) doesn't always reconnect for some reason, leaving occasional gaps of several hours in the archived data. I'm trying to look into this too. Unfortunately our Mya expert just retired.
===============================================================================
ELW: Did you set the environment variable that sends beacons to your archiver client?
That should wake up the archiver client.
EPICS_CAS_BEACON_ADDR_LIST is the variable used to send beacons to clients.
================================================================================
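As a rough illustration of the suggestion above, the sketch below builds an environment for the gateway process with an archiver host added to EPICS_CAS_BEACON_ADDR_LIST, which takes a space-separated list of addresses. The helper name and the address 192.0.2.10 are hypothetical placeholders, not values from this thread, and how the gateway is actually launched is left out.

```python
import os


def with_beacon_targets(env, targets):
    """Return a copy of env with targets appended to EPICS_CAS_BEACON_ADDR_LIST."""
    new_env = dict(env)
    # Existing entries are kept; the list is space-separated.
    current = new_env.get("EPICS_CAS_BEACON_ADDR_LIST", "").split()
    for target in targets:
        if target not in current:
            current.append(target)
    new_env["EPICS_CAS_BEACON_ADDR_LIST"] = " ".join(current)
    return new_env


# Hypothetical archiver address (documentation range), assumed for illustration.
gateway_env = with_beacon_targets(os.environ, ["192.0.2.10"])
```

The resulting dictionary could be passed as the environment when starting the gateway, so that the setting applies only to that process.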
--Brian Bevins
Thomas Jefferson National Accelerator Facility