On 10/9/20 8:51 AM, Kasemir, Kay via Core-talk wrote:
> Hi:
>
> At SNS, we have a problem with the current Java CA client causing search message broadcast storms. In principle, the client code is correct. Our channel access setup is questionable, but it "works" with the C++ CA client, so the java client needs to be tweaked, maybe with new configuration parameters, to also "work".
>
> For unfortunate reasons, our EPICS_CA_BEACON_PERIOD is set to 2 instead of 15 seconds, and the EPICS_CA_CONN_TMO=5. The idea was that clients like EDM should show disconnects after 5 seconds instead of looking at stale data for the default 30 seconds, and IOCs with CA links should consider them disconnected after 5 seconds as well.
The reduced timeout I can understand, but I'm less sure about
reducing the beacon period, and 2 seconds seems excessive. Is this
left over from the days when UDP beacons were used to time out
TCP connections?
> Turns out this isn't practical. Some IOCs do send beacons every 1999 .. 2001 ms, so the client sees very regular beacons and all is fine.
> But many IOCs don't send regular beacons.
> The CA client is not missing any beacons, the sequential beacon ID increments perfectly.
> The consecutive time stamps in the beacon message, however, have an inter-beacon distance of 1992ms, 2930, 2003, 3128, 1989, 2999, 2001, ... So instead of equidistant every 2 seconds, it's more like 2, 3, 2, 3, 2, 3, ... seconds between beacons, both for actual reception and based on the time stamps within the beacons.
>
> The JCA/CAJ client receives these and decides:
> First received: First time I see this server
> 2sec later: Period seems to be 2 seconds
> 3sec later: Looks like beacons are decaying from a recent server start, assume 3 sec average period
> 2sec later: A beacon that's faster than the average, reset average, restart the searches!
> 3sec later: First beacon after we reset the average, assume it's a 3 sec average
> 2sec later: A beacon that's faster than the average, reset average, restart the searches!
> 3sec later: First beacon after we reset the average, assume it's a 3 sec average
> 2sec later: A beacon that's faster than the average, reset average, restart the searches!
> 3sec later: First beacon after we reset the average, assume it's a 3 sec average
> ...
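The sequence above can be reproduced with a small simulation. This is only a sketch of the decision logic as described in this message, not the actual JCA/CAJ code; the function name and the "adopt the latest interval as the estimate" rule are assumptions made for illustration. Only the 0.8 factor and the interval values come from the message.

```python
FAST_FACTOR = 0.8  # factor quoted in the message for a "fast" beacon


def count_search_restarts(intervals_ms, fast_factor=FAST_FACTOR):
    """Count how often a beacon looks 'fast' and would restart searches.

    Hypothetical model: a slower beacon replaces the period estimate,
    a faster one (below fast_factor * estimate) resets the estimate.
    """
    estimate = None  # no period estimate yet, or just reset
    restarts = 0
    for interval in intervals_ms:
        if estimate is None:
            # First interval after start or reset: take it as the period.
            estimate = interval
        elif interval < estimate * fast_factor:
            # Faster than expected: treat as anomaly, restart searches.
            restarts += 1
            estimate = None
        else:
            # Same or slower: assume the period has grown to this value.
            estimate = interval
    return restarts


# Inter-beacon distances reported in the message, in milliseconds.
observed = [1992, 2930, 2003, 3128, 1989, 2999, 2001]
regular = [2000, 1999, 2001, 2000]

print(count_search_restarts(observed))
print(count_search_restarts(regular))
```

Under this model every second beacon of the 2, 3, 2, 3, ... pattern counts as an anomaly, while the regular 2 second sender never does.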
>
> Just one such IOC tricks the CA client into restarting the name searches for disconnected channels. Add archive setups with 4000 missing channels (why are there so many missing channels? other issue...), physics apps that look for "all BPMs" while some are currently offline, ... and you get a lot of broadcasts.
>
> What to do?
I've long thought that this approach of trying to model the timing
of beacons was too clever. Maybe a simpler model with a timeout at
3x the beacon period, or if the beacon count jumps by >3, then reset
search timers?
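
To make the simpler model concrete, here is a rough sketch of one possible reading of it. The function and parameter names are hypothetical, and the 32-bit wrap-around of the beacon counter is an assumption; none of this is taken from an existing client.

```python
SEQ_MODULUS = 2 ** 32  # assumption: beacon counter is an unsigned 32-bit value


def should_reset_search(prev_seq, seq, seconds_since_last, beacon_period,
                        timeout_factor=3, max_seq_jump=3):
    """Return True if the search timers should be reset.

    Two triggers, no modelling of beacon timing:
    - no beacon for more than timeout_factor * beacon_period, or
    - the beacon counter jumped by more than max_seq_jump.
    """
    # Modular difference handles counter wrap-around; a restarted server
    # starting again from a low count shows up as a very large jump.
    seq_jump = (seq - prev_seq) % SEQ_MODULUS
    timed_out = seconds_since_last > timeout_factor * beacon_period
    return timed_out or seq_jump > max_seq_jump


# A 3.1 s gap with a 2 s period and consecutive counters: nothing happens.
print(should_reset_search(10, 11, 3.1, 2.0))
# Counter jumped by 5: beacons were lost or the server restarted.
print(should_reset_search(10, 15, 2.0, 2.0))
```

With this rule the 2, 3, 2, 3, ... second jitter never triggers anything, since no gap exceeds 6 seconds and the counter increments by one each time.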
> One thing that comes to mind is adding a new 'minimum averaging count' configuration parameter to the Java CA client. By default it's 1, giving the current behavior.
> But by setting it to say 4, the client would average over at least 4 received beacons to determine the 'average' time.
> In this example, it would find that to be 2.5 secs.
> Another configuration parameter could be the factor for considering a beacon "fast". Right now that's fixed in the code as 0.8, so a beacon with a period of 2.5*0.8=2.0 or less would be considered fast, triggering new search messages. We could set that to 0.5 and ride over those bad beacon periods without broadcast storms.
>
> Such averaging would of course lengthen the initial time until a client then declares "disconnected", but once the 'minimum averaging count' of beacons have been received, one faster beacon would again trigger searches right away.
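A minimal sketch of the two proposed parameters, to show how they interact. The class and attribute names are made up for illustration, and the unbounded running average is an assumption; the real client may average differently.

```python
class BeaconAverager:
    """Hypothetical tracker with a minimum averaging count and fast factor."""

    def __init__(self, min_count=1, fast_factor=0.8):
        self.min_count = min_count
        self.fast_factor = fast_factor
        self.intervals = []  # intervals seen since the last reset

    def on_interval(self, interval):
        """Feed one inter-beacon interval; return True if it is 'fast'."""
        if len(self.intervals) >= self.min_count:
            average = sum(self.intervals) / len(self.intervals)
            if interval < average * self.fast_factor:
                # Fast beacon: forget the average, caller restarts searches.
                self.intervals.clear()
                return True
        self.intervals.append(interval)
        return False


def count_fast(intervals, **kwargs):
    tracker = BeaconAverager(**kwargs)
    return sum(tracker.on_interval(i) for i in intervals)


observed = [1992, 2930, 2003, 3128, 1989, 2999, 2001]
print(count_fast(observed, min_count=4, fast_factor=0.5))
print(count_fast([2000, 2000, 2000, 2000, 500], min_count=4, fast_factor=0.5))
```

With the observed intervals, the average over the first four is about 2513 ms, so a factor of 0.8 would still flag the following 1989 ms beacon; in this sketch it takes the lower factor as well as the averaging to ride over the jitter. A genuinely fast beacon (500 ms after a steady 2000 ms) is still caught.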
>
> Other ideas?
With PVXS, I use linear backoff for search retries instead of
exponential backoff, in an attempt to mitigate the effects of this sort
of situation. I also have a 30 second hold-off after each beacon
anomaly before another will be recognized.
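
Roughly, the two ideas look like this. This is an illustrative sketch only, not the PVXS source; the names and the 1 second base delay are assumptions, and only the 30 second hold-off and the linear vs. exponential distinction come from the text above.

```python
def exponential_delays(base, count):
    """Retry delays that double each time: base, 2*base, 4*base, ..."""
    return [base * 2 ** i for i in range(count)]


def linear_delays(base, count):
    """Retry delays that grow by a fixed step: base, 2*base, 3*base, ..."""
    return [base * (i + 1) for i in range(count)]


class AnomalyHoldOff:
    """Ignore beacon anomalies that follow an accepted one too closely."""

    def __init__(self, hold_off=30.0):
        self.hold_off = hold_off
        self.last_accepted = None  # time of the last recognized anomaly

    def accept(self, now):
        """Return True if an anomaly at time `now` should be acted on."""
        if (self.last_accepted is not None
                and now - self.last_accepted < self.hold_off):
            return False
        self.last_accepted = now
        return True


print(exponential_delays(1.0, 5))
print(linear_delays(1.0, 5))

# Anomalies arriving every 5 seconds, as in the 2, 3, 2, 3, ... case.
gate = AnomalyHoldOff()
accepted = [t for t in range(0, 65, 5) if gate.accept(float(t))]
print(accepted)
```

With the hold-off in place, an IOC that produces an apparent anomaly every 5 seconds can restart searches at most once every 30 seconds.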