Experimental Physics and Industrial Control System
I don't have packet captures yet, but I think these demonstrate the
issue and agree with Brian's report:
voltctl% caget B:tune:pinger:controlGetWF B:tune:pinger:controlGetWF.NORD
B:tune:pinger:controlGetWF 1024 127927811 1539 128187902 1539 128187902 1539 ...
B:tune:pinger:controlGetWF.NORD 1024

voltctl% camonitor B:tune:pinger:controlGetWF B:tune:pinger:controlGetWF.NORD
B:tune:pinger:controlGetWF 2023-08-22 07:37:48.101901
B:tune:pinger:controlGetWF.NORD 2023-08-22 07:37:48.101901
^C

voltctl% caget -c B:tune:pinger:controlGetWF B:tune:pinger:controlGetWF.NORD
B:tune:pinger:controlGetWF
B:tune:pinger:controlGetWF.NORD

voltctl% caget -c -# 5 B:tune:pinger:controlGetWF B:tune:pinger:controlGetWF.NORD
B:tune:pinger:controlGetWF 5 127927811 1539 128187902 1539 128187902
B:tune:pinger:controlGetWF.NORD 1 1024

voltctl% camonitor -# 5 B:tune:pinger:controlGetWF B:tune:pinger:controlGetWF.NORD
B:tune:pinger:controlGetWF 2023-08-22 07:37:48.101901 5 127927811 1539 128187902 1539 128187902
B:tune:pinger:controlGetWF.NORD 2023-08-22 07:37:48.101901 1 1024
^C
This waveform record is on an RTEMS-uC5282 IOC running 3.14.11. A
normal caget works. camonitor of the same PVs doesn't work
(CA_PROTO_EVENT_ADD), and neither does caget -c
(CA_PROTO_READ_NOTIFY), but specifying the number of elements
to fetch makes both of them work again. The same happens for purely
scalar record types such as longin.
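The pattern in the transcript can be reproduced with a toy model in Python. This is only a sketch, not libca code. It assumes, following Brian's report quoted below, that camonitor and caget -c send an element count of zero unless -# is given, that plain caget asks for the native count, and that the 3.14.11 IOC returns exactly as many elements as it was asked for. All function names here are made up for illustration.

```python
# Toy model of the commands in the transcript above (not real libca code).
NATIVE_COUNT = 1024  # .NORD of the waveform in the transcript


def requested_count(tool, explicit=None):
    """Element count the tool is assumed to put in its request."""
    if explicit is not None:      # "-# N" was given on the command line
        return explicit
    if tool == "caget":           # plain get: assumed to ask for the native count
        return NATIVE_COUNT
    return 0                      # camonitor / caget -c: assumed "0 = all you have"


def old_ioc_reply_count(requested):
    """Assumed pre-V413 behaviour: return what was asked for, even zero."""
    return min(requested, NATIVE_COUNT)


def outcome(tool, explicit=None):
    """Describe what the client ends up displaying for the waveform PV."""
    n = old_ioc_reply_count(requested_count(tool, explicit))
    return "empty" if n == 0 else f"{n} elements"


if __name__ == "__main__":
    cases = [("caget", None), ("camonitor", None), ("caget -c", None),
             ("caget -c", 5), ("camonitor", 5)]
    for tool, explicit in cases:
        print(tool, explicit, "->", outcome(tool, explicit))
```

Under these assumptions only the two requests that carry a zero count come back empty, which matches what the transcript shows.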
We have a similar group of 3.14.11 IOCs that are protected by a
Gateway. Since the Gateway doesn't resolve PV names using a
nameserver, there is no problem accessing the data normally there.
As Ralph suggested, a Gateway is another possible fix here; we'd
just have to generate a set of regexps that match all of the PVs we
need from the unprotected 3.14.11 IOCs.
- Andrew
On 9/19/23 9:49 AM, Michael Davidsaver wrote:
On 9/18/23 14:58, Brian Bevins via Tech-talk wrote:
I recently had to diagnose a problem with
our local CA nameserver. We use a fork of the EPICS CA
nameserver, which exhibits the same problem.
I am not sure I understand what is being described.
Can you provide a packet capture of this situation? (or
situations?)
The symptom was that certain clients would
receive monitor updates with element counts of zero, even for
scalar PVs, when the PVs were resolved through a new build of
the CA nameserver. This only occurred with clients built against
R3.14.12 or newer, a nameserver built against R3.14.12 or newer,
and an ioc built against R3.14.11 or older. (Note that I use the
term "ioc" throughout to refer to any CA server in order to
avoid confusion over which server is being described.) We
identified camonitor and PyEpics as two clients that were
affected. Notably caget was not.
I traced the problem to a change made in CA_V413 that allows
clients to use a zero element count in a CA_PROTO_EVENT_ADD or
CA_PROTO_READ_NOTIFY message. Prior to 413, an element count of
zero in a request was supposed to always trigger an error. 413
allows a request for zero elements, to which the server responds
with the actual number of elements available. Using an element
count of zero seems to have become the default when a client is
communicating with an ioc that supports V413. Everything works
fine when the client and ioc handle name resolution directly:
the channel falls back to the lowest common denominator of CA
minor protocol version.
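The fallback rule described here can be written down as a minimal Python sketch. It is an illustration of the logic only, not the libca implementation. The function names are hypothetical, and the value 13 assumes that CA_V413 corresponds to minor protocol revision 13.

```python
# Sketch of the version fallback and the zero-count default (illustrative only).
CA_V413_MINOR = 13  # assumption: CA_V413 means minor protocol revision >= 13


def channel_minor(client_minor, server_minor):
    """Both ends fall back to the lowest common minor protocol version."""
    return min(client_minor, server_minor)


def subscription_count(client_minor, server_minor, native_count):
    """Element count a client would put in CA_PROTO_EVENT_ADD in this model."""
    if channel_minor(client_minor, server_minor) >= CA_V413_MINOR:
        return 0             # zero means "send however many elements you have"
    return native_count      # older peer: ask for an explicit count


if __name__ == "__main__":
    # New client talking directly to a new ioc, then to an older one (12 is a
    # stand-in for any pre-413 minor version).
    print(subscription_count(13, 13, 1024))
    print(subscription_count(13, 12, 1024))
```

With direct name resolution the client sees the ioc's real version, so the zero count is only ever sent to a server that understands it.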
The problem occurs when a CA nameserver built against a newer
EPICS redirects newer clients to older iocs. The nameserver
always answers the client with a message that correctly gives
the ioc's address but includes the nameserver's own protocol
version. When the client then opens a channel with the ioc, it
falsely believes that the ioc has a later protocol version than
it really does. This never seemed to create a problem for us
until CA_V413, but when a 413 client connects to a <= 412 ioc it
uses the zero element count, which the ioc handles by returning
updates with zero elements. (This itself seems like another bug,
in that for V412 and earlier an element count of zero is
supposed to generate an error.)
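The failure sequence can be traced with a small Python simulation. It is a sketch under stated assumptions rather than real CA code: the SearchReply class, the function names and the address "ioc-old:5064" are all hypothetical, 13 is assumed to be the minor revision behind CA_V413, and the old ioc is assumed to echo back whatever count it was asked for, as observed.

```python
# Simulation of a new nameserver redirecting a new client to an old ioc.
from dataclasses import dataclass


@dataclass
class SearchReply:
    ioc_address: str  # where the PV really lives
    minor: int        # minor protocol version advertised in the reply


def nameserver_reply(ioc_address, nameserver_minor):
    """Behaviour described above: right address, nameserver's own version."""
    return SearchReply(ioc_address, nameserver_minor)


def client_request_count(client_minor, reply, native_count):
    """The client trusts the version in the search reply."""
    believed = min(client_minor, reply.minor)
    return 0 if believed >= 13 else native_count


def old_ioc_update_count(requested, native_count):
    """Pre-413 ioc as observed: a zero-count request yields zero elements."""
    return min(requested, native_count)


if __name__ == "__main__":
    reply = nameserver_reply("ioc-old:5064", nameserver_minor=13)
    requested = client_request_count(13, reply, native_count=1024)
    delivered = old_ioc_update_count(requested, native_count=1024)
    print("requested", requested, "delivered", delivered)
```

The address in the reply is correct throughout; only the advertised version is wrong, and that alone is enough to make the client send a count the ioc cannot interpret.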
My fix was to create patches to libca and libcas and modify the
nameserver to use them. The libca patch adds a
ca_host_minor_protocol(chid) function to the CA client API that
allows a client to request the minor protocol version of a
connected server. The client side of the CA nameserver uses it
to fetch and store the minor protocol version from each ioc. The
libcas patch adds an overload of pvExistReturn() that allows the
server to respond to a client with both a specified network
address and a specified minor protocol version number. The
server side of the CA nameserver uses this to respond to clients
with both the stored address and the stored protocol version.
With these changes in place the problem disappears.
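The idea behind the fix can be sketched in Python as a directory that stores each ioc's minor version next to its address and hands both back in the search reply. This is a model of the approach only. It does not show the libca or libcas patches themselves, and the class, method names and address below are hypothetical.

```python
# Model of a nameserver that remembers and relays each ioc's protocol version.
class NameServerDirectory:
    def __init__(self):
        self._entries = {}  # PV name -> (ioc address, ioc minor version)

    def learn(self, pv, address, ioc_minor):
        """Client side of the nameserver: record what the ioc reported."""
        self._entries[pv] = (address, ioc_minor)

    def search(self, pv):
        """Server side: answer with the stored address and stored version."""
        return self._entries.get(pv)


def client_request_count(client_minor, advertised_minor, native_count):
    """Zero count only if both ends are believed to support it (assumed >= 13)."""
    if min(client_minor, advertised_minor) >= 13:
        return 0
    return native_count


if __name__ == "__main__":
    directory = NameServerDirectory()
    directory.learn("B:tune:pinger:controlGetWF", "ioc-old:5064", ioc_minor=12)
    address, minor = directory.search("B:tune:pinger:controlGetWF")
    print(address, minor, client_request_count(13, minor, native_count=1024))
```

Because the reply now carries the ioc's own version, a newer client falls back to an explicit element count for the older ioc, just as it would with direct name resolution.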
I know that pcas is basically dead, but the CA nameserver seems
to still be in use. I can provide the patches against R3.15.9 if
there is any interest in that.
--Brian Bevins
Jefferson Lab
--
Complexity is free, it's Simplicity that takes work.
- References:
- Problem with CA nameserver and CA_V413 protocol Brian Bevins via Tech-talk
- Re: Problem with CA nameserver and CA_V413 protocol Michael Davidsaver via Tech-talk
ANJ, 19 Sep 2023 |