EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Java CA client beacon/search timing issues
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: "Johnson, Andrew N." <anj at anl.gov>
Cc: EPICS core-talk <core-talk at aps.anl.gov>, "Shroff, Kunal" <shroffk at bnl.gov>, John Sinclair <sinclairjw at ornl.gov>
Date: Sun, 11 Oct 2020 10:38:09 -0700
On 10/9/20 12:50 PM, Johnson, Andrew N. wrote:
> On Oct 9, 2020, at 11:56 AM, Michael Davidsaver via Core-talk <core-talk at aps.anl.gov> wrote:
>>
>> On 10/9/20 8:51 AM, Kasemir, Kay via Core-talk wrote:
>>> For unfortunate reasons, our EPICS_CA_BEACON_PERIOD is set to 2 seconds instead of 15, and EPICS_CA_CONN_TMO is set to 5. The idea was that clients like EDM should show disconnects after 5 seconds instead of displaying stale data for the default 30 seconds, and that IOCs with CA links should likewise consider them disconnected after 5 seconds.
>>
>> This seems excessive. The reduced timeout I can understand,
>> but I'm less sure about reducing the beacon period, and 2 seconds
>> seems very short. Is this left over from the days when UDP beacons
>> were used to time out TCP connections?

> IIRC, on virtual circuits that do not have regular traffic from the client to the server, the C++ client implementation takes about twice the beacon period to recognize that an IOC is no longer responsive. It then sends an “are you there” over TCP, and if that doesn’t get a response within some period it marks those channels as disconnected. Using UDP beacons this way reduces the network traffic (and the corresponding server workload) that would be needed if the server sent periodic beacons over each TCP circuit.
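A minimal Python sketch of the behaviour described above, only to make the timing concrete. The class and method names, the 5-second `echo_timeout`, and treating any received message as resetting the check are assumptions for illustration, not a description of the actual libca code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitMonitor:
    """Hypothetical model of beacon-gated liveness checking on one TCP circuit."""
    beacon_period: float                  # server's beacon period in seconds
    echo_timeout: float = 5.0             # assumed wait for the echo reply
    last_heard: float = 0.0               # time of last beacon or TCP message
    echo_sent_at: Optional[float] = None  # when the "are you there" went out
    connected: bool = True

    def on_beacon_or_message(self, now: float) -> None:
        # Any sign of life (beacon, echo reply, other traffic) resets the check.
        self.last_heard = now
        self.echo_sent_at = None

    def poll(self, now: float) -> str:
        """Return the action the client would take at time `now`."""
        if not self.connected:
            return "disconnected"
        if self.echo_sent_at is not None:
            # Echo outstanding: give up once the reply is overdue.
            if now - self.echo_sent_at >= self.echo_timeout:
                self.connected = False
                return "disconnect"
            return "waiting"
        # About two beacon periods of silence triggers an "are you there".
        if now - self.last_heard >= 2 * self.beacon_period:
            self.echo_sent_at = now
            return "send_echo"
        return "idle"
```

With a 2-second beacon period and nothing heard since t = 0, `poll(4.0)` sends the echo and, with no reply, `poll(9.0)` reports `"disconnect"`.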

I'm aware that this is the case now.  I don't know the details, but I
recall being told that it was not always so.  There is still a reference
hanging around in the CA protocol document.

> While it was done historically, clients SHOULD NOT use Beacons to make
> timeout decisions for TCP Circuits. The CA_PROTO_ECHO message should be
> used instead.
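For reference, a CA message header is 16 bytes: command, payload size, data type, data count and two 32-bit parameters, all in network byte order. A sketch of building an echo request, assuming CA_PROTO_ECHO is command 23 with every other field zero (check this against the CA protocol document before relying on it); `build_echo_request` is a hypothetical helper name:

```python
import struct

CA_PROTO_ECHO = 23  # command code, assumed from the CA protocol document


def build_echo_request() -> bytes:
    """Return a 16-byte CA header for an echo request with no payload."""
    # Fields: command, payload size, data type, data count, param 1, param 2,
    # packed big-endian (network byte order).
    return struct.pack(">HHHHII", CA_PROTO_ECHO, 0, 0, 0, 0, 0)
```

The server is expected to send the same message back on the circuit; it is a missing reply within the client's chosen timeout, rather than missing beacons, that should drive the disconnect decision.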

I was simply wondering if this bit of history played into the decision
to reduce EPICS_CA_BEACON_PERIOD.


> Were you thinking that has changed? Has it?
> 
> 
>>> Just one such IOC tricks the CA client into restarting the name searches for disconnected channels. Add archive setups with 4000 missing channels (why are there so many missing channels? That's another issue...), physics apps that look for "all BPMs" when some are currently offline, ... and you get a lot of broadcasts.
>>>
>>> What to do?
> 
> At APS we have occasional campaigns to look at which clients are searching for names that don’t connect, and we make the client owners clean up their screens or software. If you have any C Gateways with their server side connected to the machine network, they can tell you what your current CA search rate is, which is always worth keeping an eye on.
> 
> I would suggest trying to increase the beacon period on your IOCs to something more reasonable; would 5 seconds be acceptable to your users instead? That should give you 10-15 seconds for disconnect notifications. Maybe now that you know the cost of aiming for 5 seconds, you can persuade management to let you increase it?
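The arithmetic behind those numbers, as a rough sketch. It assumes detection takes about twice the beacon period plus an echo reply timeout (the 5-second value is an assumption), and that every client on the subnet receives one beacon per IOC per period; the IOC count of 300 in the usage lines is hypothetical:

```python
def detection_window(beacon_period: float, echo_timeout: float = 5.0):
    """Rough (lower, upper) seconds before a silent IOC is reported disconnected.

    Assumes ~2 beacon periods of silence trigger an echo, after which the
    client waits up to echo_timeout for the reply.
    """
    lower = 2 * beacon_period
    return lower, lower + echo_timeout


def beacon_rate(n_iocs: int, beacon_period: float) -> float:
    """Steady-state beacons per second seen by each client on the subnet."""
    return n_iocs / beacon_period


print(detection_window(5.0))   # (10.0, 15.0)
print(beacon_rate(300, 2.0))   # 150.0
print(beacon_rate(300, 15.0))  # 20.0
```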
> 
> 
>> I've long thought that this approach of trying to model the timing
>> of beacons was too clever. Maybe a simpler model: if no beacon arrives
>> within 3x the beacon period, or if the beacon count jumps by more than 3,
>> then reset the search timers?
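A sketch of that simpler rule, assuming each beacon carries a sequence number that the server increments by one per beacon (treated here as a 32-bit counter) and that the caller remembers the previous beacon time and number per server; the function name is hypothetical:

```python
def is_beacon_anomaly(prev_time: float, prev_seq: int,
                      now: float, seq: int, period: float) -> bool:
    """Simpler rule: long silence, or the sequence number jumped by more than 3."""
    gap = now - prev_time
    # Treat the sequence number as a 32-bit counter so wrap-around is not an
    # anomaly; a server restart that resets the counter shows up as a huge jump.
    jump = (seq - prev_seq) % 2**32
    return gap > 3 * period or jump > 3
```

Nothing here depends on modelling the spacing of individual beacons, which is the point of the suggestion.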
> 
> The client doesn’t really try to model the timing of the beacons from each server; it just regards a significant change in the measured period as its trigger, although I’m not sure how lenient it is. It does have to adapt to different periods from each server: the PCAS used a different beacon period from the IOC for a long time, and it may still do so.
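One way to picture "a significant change in the measured period" with per-server adaptation. The ±50% tolerance and the policy of re-learning the period after each anomaly are invented for illustration; this is not how the C++ or Java client actually decides:

```python
from typing import Dict


class BeaconTracker:
    """Hypothetical per-server beacon period estimate that flags big changes."""

    def __init__(self, tolerance: float = 0.5):
        self.tolerance = tolerance              # fraction of the estimate (assumed)
        self.last_seen: Dict[str, float] = {}   # server -> time of previous beacon
        self.period: Dict[str, float] = {}      # server -> learned beacon period

    def beacon(self, server: str, now: float) -> bool:
        """Record a beacon; return True if it should count as an anomaly."""
        prev = self.last_seen.get(server)
        self.last_seen[server] = now
        if prev is None:
            return False                        # first beacon from this server
        interval = now - prev
        estimate = self.period.get(server)
        if estimate is None:
            self.period[server] = interval      # learn this server's own period
            return False
        if abs(interval - estimate) > self.tolerance * estimate:
            del self.period[server]             # re-learn after the change
            return True
        return False
```

An IOC beaconing every 15 s and a PCAS server beaconing every 5 s are each judged against their own learned period. A server whose beacon interval keeps changing would trip this on every change, which is where a hold-off after each anomaly would help.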
> 
> 
>> With PVXS, I use linear backoff for search retries instead of exponential
>> in an attempt to mitigate the effects of this sort of situation. I also
>> have a 30-second hold-off after each beacon anomaly before another will
>> be recognized.
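A rough sketch of the two ideas, with made-up step sizes and caps (the actual PVXS values are not given here): linear versus exponential retry spacing, and a hold-off that ignores further anomalies for 30 seconds after one is accepted. All names are hypothetical:

```python
from typing import List, Optional


def linear_delays(n: int, step: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delay before each of n search retries, growing linearly up to cap."""
    return [min(step * (i + 1), cap) for i in range(n)]


def exponential_delays(n: int, base: float = 1.0, cap: float = 300.0) -> List[float]:
    """Delay before each of n search retries, doubling each time up to cap."""
    return [min(base * 2 ** i, cap) for i in range(n)]


class AnomalyHoldoff:
    """Accept a beacon anomaly only if none was accepted in the last `holdoff` s."""

    def __init__(self, holdoff: float = 30.0):
        self.holdoff = holdoff
        self.last_accepted: Optional[float] = None

    def accept(self, now: float) -> bool:
        if self.last_accepted is not None and now - self.last_accepted < self.holdoff:
            return False  # ignored; does not extend the hold-off
        self.last_accepted = now
        return True       # caller would reset search timers here
```

In this sketch, however many anomalies arrive, searches for the whole set of disconnected channels (4000 in the archiver case) are restarted at most once every 30 seconds.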
> 
> The 30-second hold-off certainly sounds like a good idea that might be implementable in the Java client. I’m not sure whether the C++ CA client has anything like that; it may not need it.
> 
> - Andrew
> 
> -- 
> Complexity comes for free, simplicity you have to work for.
> 


References:
Java CA client beacon/search timing issues Kasemir, Kay via Core-talk
Re: Java CA client beacon/search timing issues Michael Davidsaver via Core-talk
Re: Java CA client beacon/search timing issues Johnson, Andrew N. via Core-talk

ANJ, 13 Oct 2020