Experimental Physics and
Industrial Control System


Subject:	Re: [EXTERNAL] Re: CA gateway crashes periodically due to badly behaved servers
From:	Brian Bevins via Tech-talk <tech-talk at aps.anl.gov>
To:	"Wang, Lin" <wanglin at ihep.ac.cn>, Brian Bevins <bevins at jlab.org>
Cc:	"tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date:	Wed, 25 Mar 2026 00:24:59 +0000

My patch to the gateway that keeps the exception handler installed has completely solved the problem of the gateway restarting over and over again.

I have some new information about the problem with the newer gateway (2.1.3) repeatedly disconnecting from the LabVIEW PXI crate where the old gateway (2.0.6) didn't. It happens about every 10 seconds.

Looking at packet captures, the root problem seems to be that the PXI crate reports its protocol version as 12, but it's not really compliant with version 12. The proximal cause is that the newer gateway uses DBE_PROPERTY masks for some connections, where the old gateway didn't use those at all.

Whenever the gateway tries to subscribe to a PV on the IOC with a mask of 0x08 (DBE_PROPERTY), the IOC responds with an ECA_BADMASK exception. Per the protocol spec, I believe that a version 12 server should understand DBE_PROPERTY, and even one that doesn't should almost certainly ignore unknown mask bits rather than return ECA_BADMASK.
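For reference, the event mask is a bit field. The sketch below uses the DBE_* bit values from EPICS Base (caeventmask.h) and contrasts two hypothetical server policies: one that rejects any bit it does not know, which is what the PXI crate appears to do, and one that silently drops unknown bits. The function names, the LEGACY_KNOWN_BITS constant, and the string results are illustrative assumptions and are not taken from any real server.

```python
# Event mask bits as defined in EPICS Base (caeventmask.h).
DBE_VALUE = 0x01
DBE_ARCHIVE = 0x02   # also known as DBE_LOG
DBE_ALARM = 0x04
DBE_PROPERTY = 0x08

# Assumption: a server that predates DBE_PROPERTY knows only these bits.
LEGACY_KNOWN_BITS = DBE_VALUE | DBE_ARCHIVE | DBE_ALARM


def strict_server(mask, known=LEGACY_KNOWN_BITS):
    """Reject the request if the mask is empty or has any unknown bit set."""
    if mask == 0 or mask & ~known:
        return "ECA_BADMASK"
    return "OK"


def lenient_server(mask, known=LEGACY_KNOWN_BITS):
    """Drop unknown bits and return the mask the server would actually use."""
    return mask & known


print(strict_server(DBE_PROPERTY))
print(strict_server(DBE_VALUE | DBE_ALARM))
print(lenient_server(DBE_VALUE | DBE_PROPERTY))
```

With the lenient policy a subscription for DBE_PROPERTY alone would be accepted with an effective mask of zero, so it would simply never deliver an update, and the client would have no reason to cancel it.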

When the gateway gets the BADMASK exception from the IOC, it attempts to cancel the subscription. The IOC, apparently never having created the subscription, responds with a Bad ID exception. When the gateway gets this exception it closes the virtual circuit and the cycle starts over again.

In contrast, when I use another client like camonitor talking directly to the IOC, it never complains or disconnects even if I use -m p to set the mask to DBE_PROPERTY. I think this is because camonitor is just ignoring the exceptions.

The Labview PXI crate itself is a complete black box to me. Even the owners know very little about it. The only thing they seem to be able to do with it is configure the environment. The purpose of the gateway for my department is to shield our stuff from this kind of behavior. The frequent disconnections result in a lot of beacon anomalies on our side from the gateway that are kind of playing havoc with systems like our archiver, which is expected to be able to archive PVs from this crate, among others.

Does anybody have any suggestions about how to approach this? At this point I'm ready to hack the gateway to detect the BADMASK and NOT cancel the subscriptions. Only the property info would be lost, and our archiver doesn't care about that anyway. Alternatively, I could patch it to not shut down the VC in response to the Bad ID exception, but that seems riskier.
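To make the two behaviours concrete, here is a small self-contained model of the exchange described in this message. It is only a sketch under stated assumptions: QuirkyServer and run_gateway are hypothetical names, "BAD_ID" is a placeholder for the Bad ID exception, and none of this is the gateway's actual code. The cancel_on_badmask flag switches between the current behaviour (cancel after BADMASK, then close the circuit on Bad ID) and the proposed patch (drop the subscription locally and carry on).

```python
from dataclasses import dataclass, field

DBE_VALUE = 0x01
DBE_PROPERTY = 0x08


@dataclass
class QuirkyServer:
    """Hypothetical model of the PXI crate as seen in the packet captures."""
    subscriptions: set = field(default_factory=set)

    def subscribe(self, sub_id, mask):
        if mask & DBE_PROPERTY:
            return "ECA_BADMASK"      # the subscription is never created
        self.subscriptions.add(sub_id)
        return "OK"

    def cancel(self, sub_id):
        if sub_id not in self.subscriptions:
            return "BAD_ID"           # placeholder for the Bad ID exception
        self.subscriptions.remove(sub_id)
        return "OK"


def run_gateway(server, requests, cancel_on_badmask):
    """Return (active_subscription_ids, circuit_closed) for one connection."""
    active = []
    for sub_id, mask in requests:
        reply = server.subscribe(sub_id, mask)
        if reply == "OK":
            active.append(sub_id)
            continue
        if reply == "ECA_BADMASK" and not cancel_on_badmask:
            continue                  # proposed patch: forget it locally
        if server.cancel(sub_id) == "BAD_ID":
            return [], True           # closing the circuit drops everything
    return active, False


requests = [(1, DBE_VALUE), (2, DBE_PROPERTY), (3, DBE_VALUE)]
print(run_gateway(QuirkyServer(), requests, cancel_on_badmask=True))
print(run_gateway(QuirkyServer(), requests, cancel_on_badmask=False))
```

In this model the patched path keeps subscriptions 1 and 3 alive and never sends the cancel that provokes the Bad ID reply, which is why it looks like the less risky of the two options.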

--Brian Bevins

Jefferson Lab

From: Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of Brian Bevins via Tech-talk <tech-talk at aps.anl.gov>
Sent: Thursday, March 12, 2026 12:44 PM
To: Wang, Lin <wanglin at ihep.ac.cn>
Cc: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: Re: [EXTERNAL] Re: CA gateway crashes periodically due to badly behaved servers

Hi Lin,

Thank you for the information. In my case I don't seem to get back the name of the PV in question. The server reports that the exception was generated in response to cancelling a subscription, but the payload doesn't include the PV name.

I can report that the problem does seem to be mitigated now that I've tweaked our gateway to not uninstall its exception handler when the throttle trigger is reached. The gateway just keeps running. It still disconnects from the LabVIEW server much more frequently than our older gateway (2.0.3) did. I'm not sure why yet.

Best,

-Brian

From: Wang, Lin <wanglin at ihep.ac.cn>
Sent: Saturday, March 7, 2026 8:33 PM
To: Brian Bevins <bevins at jlab.org>
Cc: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: [EXTERNAL] Re: CA gateway crashes periodically due to badly behaved servers

Hello Brian,

At CSNS, we have been using LabVIEW IOCs in beam diagnostic devices together with the CA gateway for quite a few years. The CA gateway seems to be sensitive to unexpected data formats, and it does sometimes crash, or stop working without crashing. We plan to replace the non-standard IOCs in the ongoing CSNS upgrade.

But the helpful aspect of the CA gateway, or of the CA library in EPICS Base that it depends on, is that it always reports the specific badly formatted PVs in the log file for us to troubleshoot.

For example, half a year ago we encountered a similar issue: the CA gateway serving LabVIEW IOCs kept restarting until it crashed whenever badly formatted PVs were accessed by other CA clients via the gateway. It was reported like this in the CA gateway log file:

Oct 03 00:21:54 !!! Errlog message received (message is above)

filename="../../../../src/cas/generic/casPVI.cc" line number=253

Bad data type application type "enums" string conversion table for enumerated PV isnt a string type?

We then added one line to EPICS Base to print the problematic PVs (newer EPICS versions print this by default) and reported the issue to colleagues in the beam diagnostics group. They restarted the relevant IOC, which resolved the issue, and it has not happened again.

So, when an issue occurs in our LabVIEW IOC / CA gateway environment, we usually restart the CA gateway, or find the problematic IOC and restart it.

Regards,

Lin

-----Original Messages-----
From: "Brian Bevins via Tech-talk" <tech-talk at aps.anl.gov>
Send time: Sunday, 03/08/2026 04:50:13
To: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Subject: [SPAM] CA gateway crashes periodically due to badly behaved servers

I've noticed an odd behavior in one of our channel access gateways. We use them in various places to sort of shield our stuff from stuff we don't control. I'm using version 2.1.3.

There is an apparently badly behaved server that I don't control that is seen by the gateway. It's a bit of a black box even to those who do control it. All I know is that it's a PXI crate running a LabVIEW CA server. It disconnects about once every 8-40 seconds and frequently returns errors about bad resource IDs, apparently in response to CA_PROTO_EVENT_CANCEL commands. For unknown reasons this happens much more often than it did when we were running version 2.0.3.0 of the gateway. I'm still looking into that.

The gateway does a fine job of ignoring the errors and reconnecting, until it has received 100 errors. After that the error logging throttle kicks in and it logs only disconnects for 1 hour. What seems odd to me is that when it does this it uninstalls its own error handler. When the next serious error comes in it invokes the default client handler and aborts the program. This seems wrong to me. I've worked around it by just not uninstalling the error handler. Messages still stop being logged for an hour, but they don't abort the gateway.
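The difference between the two behaviours can be modelled in a few lines. This is a sketch only, not the gateway's code: ThrottledHandler, GatewayAbort, and default_handler are hypothetical names, the limit of 100 and the one-hour quiet period come from the description above, and the model assumes counting simply restarts once the quiet period ends. It does not try to reproduce whatever the real gateway does to reinstall its handler.

```python
class GatewayAbort(Exception):
    """Stands in for the default client handler aborting the process."""


def default_handler(msg):
    raise GatewayAbort(msg)


class ThrottledHandler:
    def __init__(self, limit=100, quiet_seconds=3600.0, keep_installed=True):
        self.limit = limit
        self.quiet_seconds = quiet_seconds
        self.keep_installed = keep_installed   # True models the workaround
        self.count = 0
        self.quiet_until = None
        self.installed = True
        self.log = []

    def on_exception(self, msg, now):
        if not self.installed:
            default_handler(msg)               # unpatched: next error aborts
        if self.quiet_until is not None and now < self.quiet_until:
            return                             # throttled: swallow silently
        if self.quiet_until is not None:
            self.quiet_until = None            # quiet period over, start again
            self.count = 0
        self.count += 1
        self.log.append(msg)
        if self.count >= self.limit:
            self.quiet_until = now + self.quiet_seconds
            if not self.keep_installed:
                self.installed = False


patched = ThrottledHandler(keep_installed=True)
for i in range(150):
    patched.on_exception("bad resource ID", now=float(i))
print(len(patched.log), patched.installed)
```

With keep_installed=False the same loop raises GatewayAbort on the first exception after the limit is reached, which corresponds to the abort-and-restart cycle described above.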

The gateway always restarts within just a few seconds, but our archiver (Mya) doesn't always reconnect for some reason, leaving occasional gaps of several hours in the archived data. I'm trying to look into this too. Unfortunately our Mya expert just retired.

--Brian Bevins

Thomas Jefferson National Accelerator Facility



ANJ, 25 Mar 2026
