EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Java CA client beacon/search timing issues
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: "Kasemir, Kay" <kasemirk at ornl.gov>, "core-talk at aps.anl.gov" <core-talk at aps.anl.gov>
Cc: "'Shroff, Kunal'" <shroffk at bnl.gov>, "Sinclair, John" <sinclairjw at ornl.gov>
Date: Fri, 9 Oct 2020 09:56:56 -0700
On 10/9/20 8:51 AM, Kasemir, Kay via Core-talk wrote:
> Hi:
> 
> At SNS, we have a problem with the current Java CA client causing search message broadcast storms. In principle, the client code is correct. Our Channel Access setup is questionable, but it "works" with the C++ CA client, so the Java client needs to be tweaked, maybe with new configuration parameters, to also "work".
> 
> For unfortunate reasons, our EPICS_CA_BEACON_PERIOD is set to 2 instead of 15 seconds, and the EPICS_CA_CONN_TMO=5. The idea was that clients like EDM should show disconnects after 5 seconds instead of looking at stale data for the default 30 seconds, and IOCs with CA links should consider them disconnected after 5 seconds as well.

The reduced timeout I can understand, but I'm less sure about
reducing the beacon period, and 2 seconds seems excessive.  Is this
left over from the days when UDP beacons were used to time out
TCP connections?

> Turns out this isn't practical. Some IOCs do send beacons every 1999 .. 2001 ms, so the client sees very regular beacons and all is fine.
> But many IOCs don't send regular beacons.
> The CA client is not missing any beacons, the sequential beacon ID increments perfectly.
> The consecutive time stamps in the beacon message, however, have an inter-beacon distance of 1992ms, 2930, 2003, 3128, 1989, 2999, 2001, ... So instead of equidistant every 2 seconds, it's more like 2, 3, 2, 3, 2, 3, ... seconds between beacons, both for actual reception and based on the time stamps within the beacons.
> 
> The JCA/CAJ client receives these and decides:
> First received: First time I see this server
> 2sec later:   Period seems to be 2 seconds
> 3sec later:   Looks like beacons are decaying from a recent server start, assume 3 sec average period
> 2sec later:   A beacon that's faster than the average, reset average, restart the searches!
> 3sec later:   First beacon after we reset the average, assume it's a 3 sec average
> 2sec later:   A beacon that's faster than the average, reset average, restart the searches!
> 3sec later:   First beacon after we reset the average, assume it's a 3 sec average
> 2sec later:   A beacon that's faster than the average, reset average, restart the searches!
> 3sec later:   First beacon after we reset the average, assume it's a 3 sec average
> ...
> 
> Just one such IOC tricks the CA client into restarting the name searches for disconnected channels. Add archive setups with 4000 missing channels (why are there so many missing channels? other issue...), physics apps that look for "all BPMs" while some are currently offline, ... and you get a lot of broadcasts.
> 
> What to do?
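
The feedback loop described above can be reproduced in a few lines. The following is a minimal sketch of the decision sequence as narrated in the message, not the actual JCA/CAJ code: the function name and the rule "the latest interval becomes the new average" are assumptions made for illustration. The 0.8 threshold and the interval values are the ones quoted in the message.

```python
FAST_FACTOR = 0.8  # "fast beacon" threshold quoted in the thread


def count_anomalies(intervals_ms, fast_factor=FAST_FACTOR):
    """Count how often a simplified client would restart its searches.

    intervals_ms: times between consecutive beacons of one server.
    Hypothetical model: the first interval after a reset is taken as the
    expected period; a clearly shorter interval counts as an anomaly and
    resets the estimate; anything else replaces the estimate.
    """
    average = None
    anomalies = 0
    for interval in intervals_ms:
        if average is None:
            # First interval after startup or after a reset.
            average = interval
        elif interval <= average * fast_factor:
            # Faster than expected: reset and restart searches.
            anomalies += 1
            average = None
        else:
            # Same or slower: track it as the new expected period.
            average = interval
    return anomalies


# Inter-beacon distances reported in the message (milliseconds).
observed = [1992, 2930, 2003, 3128, 1989, 2999, 2001]
print(count_anomalies(observed))
```

Under these assumptions, every 2-second interval that follows a 3-second interval is flagged, so the searches restart roughly every 5 seconds for as long as the server keeps alternating.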

I've long thought that this approach of trying to model the timing
of beacons was too clever.  Maybe a simpler model with a timeout at
3x the beacon period, or if the beacon count jumps by >3, then reset
search timers?
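
A minimal sketch of what such a simpler rule could look like, assuming the client remembers the arrival time and sequence ID of the previous beacon from each server. The function name, parameter names, and the treatment of the sequence ID as a 32-bit wrapping counter are assumptions for illustration, not existing client code.

```python
SEQ_MODULUS = 2 ** 32  # assumption: beacon ID is a 32-bit wrapping counter


def should_reset_search(prev_time_s, prev_seq, now_s, seq,
                        beacon_period_s=15.0,
                        timeout_factor=3.0,
                        max_seq_jump=3):
    """Return True if the search timers should be reset.

    Two independent triggers, following the suggestion in the text:
    - no beacon was seen for more than timeout_factor * beacon_period_s
    - the beacon counter advanced by more than max_seq_jump
    """
    gap_s = now_s - prev_time_s
    # Modular difference: a counter that restarts from zero shows up
    # as a very large forward jump.
    jump = (seq - prev_seq) % SEQ_MODULUS
    return gap_s > timeout_factor * beacon_period_s or jump > max_seq_jump


# With a 2 s period, beacons arriving 2 s or 3 s apart do not trigger a reset.
print(should_reset_search(0.0, 10, 3.0, 11, beacon_period_s=2.0))
# A 7 s silence (more than 3 x 2 s) does.
print(should_reset_search(0.0, 10, 7.0, 11, beacon_period_s=2.0))
```

Because the rule compares against a fixed multiple of the nominal period rather than a running estimate, jitter between 2 and 3 seconds never looks like an anomaly.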

> One thing that comes to mind is adding a new 'minimum averaging count' configuration parameter to the Java CA client. By default it's 1, giving the current behavior.
> But by setting it to, say, 4, the client would average over at least 4 received beacons to determine the 'average' time.
> In this example, it would find that to be 2.5 secs.
> Another configuration parameter could be the factor for considering a beacon "fast". Right now that's fixed in the code as 0.8, so a beacon with a period of 2.5*0.8=2.0 or less would be considered fast, triggering new search messages. We could set that to 0.5 and ride over those bad beacon periods without broadcast storms.
> 
> Such averaging would of course lengthen the initial time until a client then declares "disconnected", but once the 'minimum averaging count' of beacons have been received, one faster beacon would again trigger searches right away.
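
The effect of the two proposed parameters can be explored with a small model. This is a sketch under stated assumptions, not the JCA/CAJ implementation: the names `min_averaging_count` and `fast_factor` are hypothetical, and averaging over a sliding window of the last `min_averaging_count` intervals is one possible reading of "average over at least 4 received beacons".

```python
from collections import deque


def count_anomalies(intervals_ms, min_averaging_count=1, fast_factor=0.8):
    """Count search restarts for a list of inter-beacon intervals.

    No anomaly is declared until min_averaging_count intervals have been
    collected since the last reset. After that, an interval at or below
    fast_factor times the window average counts as a "fast" beacon.
    """
    window = deque(maxlen=min_averaging_count)
    anomalies = 0
    for interval in intervals_ms:
        if len(window) >= min_averaging_count:
            average = sum(window) / len(window)
            if interval <= average * fast_factor:
                # Fast beacon: restart searches and start averaging afresh.
                anomalies += 1
                window.clear()
                continue
        window.append(interval)
    return anomalies


# Inter-beacon distances reported earlier in the message (milliseconds).
observed = [1992, 2930, 2003, 3128, 1989, 2999, 2001]
for count, factor in [(1, 0.8), (4, 0.8), (4, 0.5)]:
    print(count, factor, count_anomalies(observed, count, factor))
```

In this model, averaging over 4 intervals alone still leaves the 1989 ms interval just under the 0.8 threshold, which matches the observation that 2.5*0.8=2.0 is borderline; lowering the factor to 0.5 removes the false triggers while a genuinely fast beacon after a server restart would still be caught.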
> 
> Other ideas?

With PVXS, I use linear backoff for search retries instead of exponential
backoff, in an attempt to mitigate the effects of this sort of situation.
I also have a 30 second hold-off after each beacon anomaly before another
will be recognized.
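
To illustrate the two ideas in general terms, here is a minimal sketch. It is not taken from the PVXS sources: the class and function names, the 1-second base step, and the exact schedules are assumptions chosen only to show the shape of a hold-off gate and of linear versus exponential retry delays.

```python
HOLD_OFF_S = 30.0  # hold-off duration mentioned in the text


class AnomalyGate:
    """Accept at most one beacon anomaly per hold-off window."""

    def __init__(self, hold_off_s=HOLD_OFF_S):
        self.hold_off_s = hold_off_s
        self._last_accepted_s = None

    def accept(self, now_s):
        """Return True if an anomaly at time now_s should be acted on."""
        if (self._last_accepted_s is not None
                and now_s - self._last_accepted_s < self.hold_off_s):
            return False  # still inside the hold-off window
        self._last_accepted_s = now_s
        return True


def linear_delays(step_s, count):
    """Retry delays that grow by a fixed step: 1, 2, 3, ... times step_s."""
    return [step_s * (i + 1) for i in range(count)]


def exponential_delays(base_s, count):
    """Retry delays that double each time: 1, 2, 4, ... times base_s."""
    return [base_s * 2 ** i for i in range(count)]


# An anomaly every 5 s for one minute is acted on only at 0, 30 and 60 s.
gate = AnomalyGate()
accepted = [t for t in range(0, 61, 5) if gate.accept(float(t))]
print(accepted)
print(linear_delays(1.0, 5), exponential_delays(1.0, 5))
```

With a gate like this, a server whose beacons alternate between 2 and 3 seconds could restart searches at most once per hold-off window instead of on every second beacon.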

Replies:
Re: Java CA client beacon/search timing issues Johnson, Andrew N. via Core-talk
References:
Java CA client beacon/search timing issues Kasemir, Kay via Core-talk
