Subject: | Java CA client beacon/search timing issues |
From: | "Kasemir, Kay via Core-talk" <core-talk at aps.anl.gov> |
To: | "core-talk at aps.anl.gov" <core-talk at aps.anl.gov> |
Cc: | "Sinclair, John" <sinclairjw at ornl.gov>, "'Shroff, Kunal'" <shroffk at bnl.gov>, Michael Davidsaver <mdavidsaver at ospreydcs.com> |
Date: | Fri, 9 Oct 2020 15:51:16 +0000 |
Hi:
At SNS, we have a problem with the current Java CA client causing search message broadcast storms. In principle, the client code is correct. Our Channel Access setup is questionable, but it "works" with the C++ CA client, so the Java client needs to be tweaked, maybe with new configuration parameters, to also "work".
For unfortunate reasons, our EPICS_CA_BEACON_PERIOD is set to 2 instead of 15 seconds, and EPICS_CA_CONN_TMO is set to 5. The idea was that clients like EDM should show disconnects after 5 seconds instead of looking at stale data for the default 30 seconds, and IOCs with CA links should consider them disconnected after 5 seconds as well.
Turns out this isn't practical. Some IOCs do send beacons every 1999 .. 2001 ms, so the client sees very regular beacons and all is fine.
But many IOCs don't send regular beacons.
The CA client is not missing any beacons: the sequential beacon ID increments perfectly.
The consecutive time stamps in the beacon messages, however, show inter-beacon intervals of 1992 ms, 2930, 2003, 3128, 1989, 2999, 2001, ... So instead of being equidistant at 2 seconds, it's more like 2, 3, 2, 3, 2, 3, ... seconds between beacons, both for actual reception and based on the time stamps within the beacons.
The JCA/CAJ client receives these and decides:
First received: First time I see this server
2sec later: Period seems to be 2 seconds
3sec later: Looks like beacons are decaying from a recent server start, assume 3 sec average period
2sec later: A beacon that's faster than the average, reset average, restart the searches!
3sec later: First beacon after we reset the average, assume it's a 3 sec average
2sec later: A beacon that's faster than the average, reset average, restart the searches!
3sec later: First beacon after we reset the average, assume it's a 3 sec average
2sec later: A beacon that's faster than the average, reset average, restart the searches!
3sec later: First beacon after we reset the average, assume it's a 3 sec average
...
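The decision sequence listed above can be sketched as a small model. This is a simplification written from the description in this message, not the actual JCA/CAJ code: the class and method names are made up for illustration, and it assumes that a beacon which is not "fast" simply replaces the assumed period, while a "fast" beacon (interval at or below 0.8 times the assumed period) resets it and restarts the searches.

```java
/** Simplified model of the behavior described above; NOT the actual JCA/CAJ code. */
class BeaconPeriodSketch {
    /** A beacon counts as "fast" if its interval is at or below this fraction of the assumed period. */
    static final double FAST_FACTOR = 0.8;

    /** Assumed beacon period in ms, or -1 if none is established since the last reset. */
    private long assumedPeriodMs = -1;

    /** Feed one inter-beacon interval; returns true if searches would be restarted. */
    boolean onInterval(long intervalMs) {
        if (assumedPeriodMs < 0) {
            // First interval after start or reset: take it as the period.
            assumedPeriodMs = intervalMs;
            return false;
        }
        if (intervalMs <= assumedPeriodMs * FAST_FACTOR) {
            // Faster than expected: reset the period and restart the searches.
            assumedPeriodMs = -1;
            return true;
        }
        // Same or slower (assumption: treated like a decaying beacon): adopt it.
        assumedPeriodMs = intervalMs;
        return false;
    }

    /** Counts how many search restarts a sequence of intervals would cause. */
    static int countRestarts(long[] intervalsMs) {
        BeaconPeriodSketch sketch = new BeaconPeriodSketch();
        int restarts = 0;
        for (long interval : intervalsMs) {
            if (sketch.onInterval(interval)) {
                restarts++;
            }
        }
        return restarts;
    }

    public static void main(String[] args) {
        // Intervals quoted earlier in this message, in ms.
        long[] observed = {1992, 2930, 2003, 3128, 1989, 2999, 2001};
        System.out.println("search restarts: " + countRestarts(observed));
    }
}
```

Under these assumptions, the seven intervals quoted above cause three search restarts, one for every "2 sec" beacon that follows a "3 sec" one, while a steady 1999 .. 2001 ms sequence causes none.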
Just one such IOC tricks the CA client into restarting the name searches for all disconnected channels. Add archive setups with 4000 missing channels (why are there so many missing channels? other issue...), physics apps that look for "all BPMs" while some are currently offline, ... and you get a lot of broadcasts.
What to do?
One thing that comes to mind is adding a new 'minimum averaging count' configuration parameter to the Java CA client. By default it's 1, giving the current behavior.
But by setting it to, say, 4, the client would average over at least 4 received beacons to determine the 'average' time.
In this example, it would find that to be 2.5 secs.
Another configuration parameter could be the factor for considering a beacon "fast". Right now that's fixed in the code as 0.8, so a beacon with a period of 2.5*0.8=2.0 or less would be considered fast, triggering new search messages. We could set that
to 0.5 and ride over those bad beacon periods without broadcast storms.
Such averaging would of course lengthen the initial time until a client then declares "disconnected", but once the 'minimum averaging count' of beacons have been received, one faster beacon would again trigger searches right away.
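To make the two proposed parameters concrete, here is a hedged sketch of how they might interact. Nothing in it is existing JCA/CAJ API: the class, the parameter names `minAveragingCount` and `fastFactor`, and the choice of a sliding window over the most recent intervals are all assumptions made for illustration. The "fast" check is only applied once the window holds `minAveragingCount` intervals.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the two proposed parameters; NOT existing JCA/CAJ code. */
class TunableBeaconSketch {
    private final int minAveragingCount;   // hypothetical parameter, 1 = current behavior
    private final double fastFactor;       // hypothetical parameter, 0.8 = current value
    private final Deque<Long> window = new ArrayDeque<>();

    TunableBeaconSketch(int minAveragingCount, double fastFactor) {
        this.minAveragingCount = minAveragingCount;
        this.fastFactor = fastFactor;
    }

    /** Mean of the intervals currently in the window, NaN if it is empty. */
    double averageMs() {
        return window.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
    }

    /** Feed one inter-beacon interval; returns true if searches would be restarted. */
    boolean onInterval(long intervalMs) {
        boolean windowFull = window.size() >= minAveragingCount;
        if (windowFull && intervalMs <= averageMs() * fastFactor) {
            // Fast beacon: forget the average and restart the searches.
            window.clear();
            return true;
        }
        // Otherwise add the interval and keep only the most recent ones.
        window.addLast(intervalMs);
        if (window.size() > minAveragingCount) {
            window.removeFirst();
        }
        return false;
    }

    /** Counts search restarts for a sequence of intervals under the given settings. */
    static int countRestarts(int minAveragingCount, double fastFactor, long[] intervalsMs) {
        TunableBeaconSketch sketch = new TunableBeaconSketch(minAveragingCount, fastFactor);
        int restarts = 0;
        for (long interval : intervalsMs) {
            if (sketch.onInterval(interval)) {
                restarts++;
            }
        }
        return restarts;
    }

    public static void main(String[] args) {
        long[] observed = {1992, 2930, 2003, 3128, 1989, 2999, 2001};
        System.out.println("count=1, factor=0.8: " + countRestarts(1, 0.8, observed));
        System.out.println("count=4, factor=0.8: " + countRestarts(4, 0.8, observed));
        System.out.println("count=4, factor=0.5: " + countRestarts(4, 0.5, observed));
    }
}
```

In this model, the seven intervals quoted earlier give three restarts with a count of 1 and factor 0.8, one restart with a count of 4 and factor 0.8 (the average settles near 2.5 s, so a 1989 ms beacon still falls below the 0.8 threshold), and none with a factor of 0.5. That suggests the averaging count alone may not be enough for this beacon pattern and the factor would need lowering too, though a real implementation could average differently.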
Other ideas?
Thanks,
Kay