
Subject: Archiver Appliance configuration questions
From: "Wilson, Andy \(DLSLtd, RAL, LSCI\) via Tech-talk" <tech-talk at aps.anl.gov>
To: EPICS tech-talk <tech-talk at aps.anl.gov>
Cc: "Williams, Rebecca \(OBS, RAL, LSCI\)" <rebecca.williams at diamond.ac.uk>, "Gaughran, Martin \(DLSLtd, RAL, LSCI\)" <Martin.Gaughran at diamond.ac.uk>
Date: Thu, 4 Nov 2021 11:18:36 +0000
Hello,

I am looking for some advice on the details of configuration for the EPICS Archiver Appliance (AA).

At Diamond, our AA installation was configured in 2017 and the config hasn't changed much since then, although we regularly update the software.

We have recently started seeing a problem where the AA takes unusually long (sometimes days) to establish new connections to PVs, and to re-establish connections that are interrupted, e.g. by IOC restarts.

We have determined that a large number of the PVs requested for archiving do not exist (around 6%), and that this is likely the root cause, so we are currently working to reduce these to a manageable level.
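(For anyone wanting to make the same check: the never-connected PVs can be pulled from the management interface along these lines. This is only a sketch; getNeverConnectedPVs is the mgmt BPL endpoint as we understand it from the AA documentation, and the host name is a placeholder.)

    #!/usr/bin/env python3
    # Sketch: list PVs requested for archiving that have never connected.
    # Assumes the mgmt BPL endpoint "getNeverConnectedPVs"; host is a placeholder.
    import json
    import urllib.request

    MGMT = "http://archiver.example.com:17665/mgmt/bpl"

    with urllib.request.urlopen(f"{MGMT}/getNeverConnectedPVs") as resp:
        never_connected = json.load(resp)

    print(f"{len(never_connected)} PVs have never connected")
    for entry in never_connected[:20]:  # print a sample
        print(entry.get("pvName", entry))  # "pvName" is the usual key in each record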

We would like to rule out any other factors. We have identified two possibilities.

1. Configuration options in archappl.properties

We have archivePVWorkflowBatchSize set to 30,000. archivePVWorkflowTickSeconds is not defined, so it must be using the default. We do not think these two are causing our problem, because we are nowhere near having that many PVs pending. We also see the issue when re-establishing interrupted connections to PVs which are already being archived, so I think it is more likely something at the Channel Access client library level.
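(As a cross-check, the position of an individual PV in the workflow can be queried; again a sketch, assuming the documented getPVStatus endpoint, with a placeholder host and a hypothetical PV name.)

    import json
    import urllib.parse
    import urllib.request

    MGMT = "http://archiver.example.com:17665/mgmt/bpl"
    pv = urllib.parse.quote("SOME:PV:NAME")  # hypothetical PV name

    # Returns a record per PV with a "status" field,
    # e.g. "Being archived" or "Initial sampling".
    with urllib.request.urlopen(f"{MGMT}/getPVStatus?pv={pv}") as resp:
        print(json.load(resp))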

The only parameter that looks like it could be relevant is

org.epics.archiverappliance.engine.epics.commandThreadCount

The default is 10. Our site configuration has it set to 1. I do not have a record of the reason for this value.
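For concreteness, the relevant fragment of our archappl.properties looks roughly like this (a sketch: the first two key names are given in the short forms used above rather than the fully qualified names in the shipped file):

    # PVs picked up per tick of the archive-PV workflow (our site value)
    archivePVWorkflowBatchSize=30000
    # archivePVWorkflowTickSeconds is not set, so the shipped default applies
    # JCACommandThreads, and hence JCAContext/CAJContexts (shipped default: 10)
    org.epics.archiverappliance.engine.epics.commandThreadCount=1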

The comment for this in archappl.properties says:

For faster reconnect times, we may want to use more than one JCAContext/CAJContext. This controls the number of JCACommandThreads and thus the number of JCAContext/CAJContext.

Each JCACommandThread launches approximately 4 threads in all in CAJ: one CAJ search thread (UDP), a couple of TCP threads, and the JCACommandThread that controls them.

Routing all PVs through fewer contexts seems to result in longer reconnect times.
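To put numbers on that for our case: with around 100k PVs on each appliance and commandThreadCount set to 1, every search and every reconnect is funnelled through a single JCAContext/CAJContext and its one UDP search thread, whereas at the default of 10 the PVs would presumably be spread across ten contexts at roughly 10k each.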


What precise effects would we observe if this value were not high enough?
Why might a value of 1 have been chosen? It is likely that we wanted to throttle the rate of Channel Access search requests, since excessive broadcast traffic from the AA has caused problems at DLS in the past. Are there other potential downsides to increasing it?

Are there any other parameters we haven't thought of that may be relevant to this problem?

2. Distribution of PVs between appliances

We have a collective memory that "a maximum of 80,000 PVs per appliance is optimal", but none of us remembers the source; perhaps it was a talk at a collaboration meeting. We now have more PVs than this (around 100k on average on each of our three servers).
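(We read these per-appliance counts from the management interface, roughly as follows; a sketch, assuming the getApplianceMetrics endpoint, a placeholder host, and field names that may vary by release.)

    import json
    import urllib.request

    MGMT = "http://archiver.example.com:17665/mgmt/bpl"

    # One record per appliance in the cluster; "instance" and "pvCount"
    # are the usual keys, but check the raw JSON for your release.
    with urllib.request.urlopen(f"{MGMT}/getApplianceMetrics") as resp:
        for appliance in json.load(resp):
            print(appliance.get("instance"), appliance.get("pvCount"))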

Can anybody explain where this 80k number came from? Does it express a limit on system resources, or some limit in the application itself? We find that system resources such as RAM, CPU and network are all well within capacity on our servers, so I would like to understand whether there is some other factor we are not taking into account.

Many thanks,
Andy

 


