EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Archiver Appliance configuration questions
From: "Shankar, Murali via Tech-talk" <tech-talk at aps.anl.gov>
To: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Thu, 4 Nov 2021 15:56:15 +0000
The archivePVWorkflowBatchSize and archivePVWorkflowTickSeconds settings control the rate at which we add new PVs into the archiver, so they are probably not the root cause of your issue.
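As a rough illustration of why these two settings only affect onboarding, the sketch below estimates how long a backlog of new PVs takes to enter the archive-PV workflow. It assumes each tick admits at most archivePVWorkflowBatchSize PVs; the function name and the numbers are hypothetical, not the appliance's implementation or its defaults.

```python
import math


def workflow_drain_time_seconds(num_new_pvs: int, batch_size: int, tick_seconds: float) -> float:
    """Rough time to move num_new_pvs into the archive-PV workflow,
    assuming each tick admits at most batch_size PVs (an assumption,
    not the appliance's actual scheduling code)."""
    if batch_size <= 0 or tick_seconds <= 0:
        raise ValueError("batch_size and tick_seconds must be positive")
    ticks = math.ceil(num_new_pvs / batch_size)  # number of ticks needed
    return ticks * tick_seconds


# Hypothetical numbers, not defaults taken from the appliance.
print(workflow_drain_time_seconds(5000, batch_size=1000, tick_seconds=10))
```

Whatever the exact numbers, this only bounds how fast new PVs come in; it says nothing about reconnecting PVs that are already being archived.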

The commandThreadCount is probably related. This determines the number of CAJContexts that are created; each CAJContext has a separate search thread. In practice, a higher count seems to help with reconnects and reconnect speed. You could try increasing it gradually to see if this helps.
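To picture what commandThreadCount changes, here is a toy model, not the appliance's actual channel assignment logic: unresolved channels are spread round-robin over N contexts, each standing in for a CAJContext with its own search thread, and the longest per-context queue serves as a crude proxy for how long one search thread needs to work through its backlog. The round-robin policy and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def assign_to_contexts(pv_names: Sequence[str], command_thread_count: int) -> Dict[int, List[str]]:
    """Spread PV names round-robin over command_thread_count contexts.
    Each context index stands in for one CAJContext and its search thread."""
    if command_thread_count < 1:
        raise ValueError("command_thread_count must be >= 1")
    queues: Dict[int, List[str]] = defaultdict(list)
    for i, pv in enumerate(pv_names):
        queues[i % command_thread_count].append(pv)
    return dict(queues)


def longest_search_queue(pv_names: Sequence[str], command_thread_count: int) -> int:
    """Length of the busiest search queue in this toy model."""
    queues = assign_to_contexts(pv_names, command_thread_count)
    return max((len(q) for q in queues.values()), default=0)


# Hypothetical PV names.
unresolved = [f"SIM:PV{i}" for i in range(10_000)]
for n in (1, 2, 4, 8):
    print(n, longest_search_queue(unresolved, n))
```

With 10,000 unresolved channels, the busiest queue shrinks from 10,000 with one context to 1,250 with eight. The flip side is that each additional search thread sends its own broadcast searches.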

>>  What precise effects would we observe if this value were not high enough?
Longer reconnect times and higher CPU usage in some of the engine threads. A larger commandThreadCount increases the parallelism somewhat.

>> Why may a value of 1 have been chosen? It is likely that we wanted to throttle the level of channel access search requests.
That is the reason for lowering the value; more search threads definitely mean more broadcast traffic (CA searches are broadcast).


But the root cause is probably that there are a lot of PVs in the search queues of the CAJContexts. These are mainly "PVs requested for archiving that do not exist" and "disconnected PVs that no longer exist". You should abort the former and pause the latter to remove them from the search queues. See the admin guide for hints on maintaining a clean system.
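The abort/pause split can be scripted against the appliance's management business-logic (BPL) web service. The sketch below only builds request URLs. The endpoint names abortArchivingPV and pauseArchivingPV, the `pv` query parameter and the base URL reflect our understanding of a typical installation and should be checked against the mgmt BPL documentation for your version. How you obtain the two PV lists (for example from the appliance's never-connected and currently-disconnected reports plus a liveness check) is left to the caller.

```python
from typing import Iterable, List
from urllib.parse import urlencode

# Assumed base URL of the management BPL (17665 is the single-appliance
# quickstart port); adjust for your installation.
MGMT_BPL = "http://localhost:17665/mgmt/bpl"


def cleanup_urls(never_existed: Iterable[str],
                 gone_after_disconnect: Iterable[str],
                 mgmt_bpl: str = MGMT_BPL) -> List[str]:
    """Build one management URL per PV:
    abort PVs that were requested but never existed, and
    pause PVs that used to archive but no longer exist on the network.
    This only builds URLs; issuing the calls is what would take the PVs
    out of the search queues."""
    urls = [f"{mgmt_bpl}/abortArchivingPV?{urlencode({'pv': pv})}"
            for pv in never_existed]
    urls += [f"{mgmt_bpl}/pauseArchivingPV?{urlencode({'pv': pv})}"
             for pv in gone_after_disconnect]
    return urls


# Hypothetical PV names.
for url in cleanup_urls(["TYPO:PV"], ["OLD:IOC:TEMP"]):
    print(url)
```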

We run into this here occasionally as well. We have a script that checks the liveness of disconnected PVs and sends out an email if a live PV has been disconnected for a long time (getCurrentlyDisconnectedPVs reports the time of disconnect, which helps with this). Pausing and resuming a PV tears down the CA channel and rebuilds it, and this should almost always result in an immediate reconnection. A good strategy seems to be:
  1. Pause all PVs that are live but disconnected.
  2. Resume these PVs in small batches of a few hundred or more.
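The liveness check and the two-step strategy above can be sketched as follows. This is not the script mentioned above; the record keys pvName and disconnectedSinceEpochSecs are placeholders to be mapped from whatever your appliance's getCurrentlyDisconnectedPVs actually returns, and is_live is a caller-supplied check (for instance a short caget). The plan is only a list of actions; sending the pause/resume calls, and waiting between batches, is left out.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def long_disconnected_live_pvs(disconnected: Iterable[dict],
                               is_live: Callable[[str], bool],
                               min_age_s: float,
                               now: Optional[float] = None) -> List[str]:
    """PVs reported as disconnected for at least min_age_s that still
    answer on the network according to the caller-supplied is_live."""
    now = time.time() if now is None else now
    return [rec["pvName"] for rec in disconnected
            if now - rec["disconnectedSinceEpochSecs"] >= min_age_s
            and is_live(rec["pvName"])]


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pause_then_resume_plan(pvs: List[str], batch_size: int) -> List[Tuple[str, List[str]]]:
    """Step 1: pause every selected PV. Step 2: resume them in batches.
    (In the appliance, pausing tears down the CA channel and resuming rebuilds it.)"""
    plan = [("pause", list(pvs))]
    plan.extend(("resume", chunk) for chunk in batches(pvs, batch_size))
    return plan


# Hypothetical data: disconnect times in epoch seconds, liveness from a set.
records = [
    {"pvName": "A", "disconnectedSinceEpochSecs": 0},
    {"pvName": "B", "disconnectedSinceEpochSecs": 9_000},
    {"pvName": "C", "disconnectedSinceEpochSecs": 0},
]
live = {"A", "B"}
stuck = long_disconnected_live_pvs(records, lambda pv: pv in live, 3_600, now=10_000)
for action, pvs in pause_then_resume_plan(stuck, batch_size=300):
    print(action, pvs)
```

Here only A qualifies: B is live but was disconnected too recently, and C is not live at all.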

>> We have determined that we have a large number of PVs requested for archiving that do not exist (around 6 %),
BTW, I recently worked around a bug (in a Java collection class, of all things) that impacted this. In a working system, these PVs should have been kicked out of the system within 24 hours. I am still watching this issue, though; I am not sure if it is completely gone.
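One way to keep an eye on this is to flag never-connected PVs whose archive request is older than the expected 24 hours. A minimal sketch; the record keys pvName and requestTimeEpochSecs are hypothetical and must be mapped from your appliance's own never-connected report.

```python
from typing import Iterable, List

DAY_S = 24 * 60 * 60  # the "kicked out within 24 hours" expectation


def overdue_never_connected(records: Iterable[dict], now: float,
                            max_age_s: float = DAY_S) -> List[str]:
    """Never-connected PVs whose archive request is older than max_age_s.
    On a healthy system this should come back empty."""
    return sorted(r["pvName"] for r in records
                  if now - r["requestTimeEpochSecs"] > max_age_s)


# Hypothetical records: request times in epoch seconds.
pending = [
    {"pvName": "X", "requestTimeEpochSecs": 0},
    {"pvName": "Y", "requestTimeEpochSecs": 90_000},
]
print(overdue_never_connected(pending, now=100_000))
```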

>> We have a collective memory that "a maximum of 80 000 PVs per appliance is optimal."
This is probably just a ballpark figure used for capacity planning. The Engine write thread(s) and Max ETL(%) values on the Metrics page are probably a decent measure of capacity.
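Used as a planning number only, the 80,000 figure turns into a simple appliance count. The helper name is hypothetical; the Metrics page remains the better indicator of whether an appliance is actually near capacity.

```python
import math


def appliances_for(total_pvs: int, pvs_per_appliance: int = 80_000) -> int:
    """Ballpark appliance count from a per-appliance PV budget.
    Only a planning number: the engine write thread(s) and Max ETL(%)
    figures on the Metrics page show whether an appliance is near capacity."""
    if pvs_per_appliance <= 0:
        raise ValueError("pvs_per_appliance must be positive")
    return math.ceil(total_pvs / pvs_per_appliance)


print(appliances_for(200_000))  # 3
```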

Hope this helps.

Regards,
Murali
