Subject: Archiver Appliance configuration questions
From: "Wilson, Andy (DLSLtd, RAL, LSCI) via Tech-talk" <tech-talk at aps.anl.gov>
To: EPICS tech-talk <tech-talk at aps.anl.gov>
Cc: "Williams, Rebecca (OBS, RAL, LSCI)" <rebecca.williams at diamond.ac.uk>, "Gaughran, Martin (DLSLtd, RAL, LSCI)" <Martin.Gaughran at diamond.ac.uk>
Date: Thu, 4 Nov 2021 11:18:36 +0000
Hello,
I am looking for some advice on the details of configuration for the EPICS Archiver Appliance (AA).
At Diamond, our AA installation was configured in 2017 and the config hasn't changed much since then, although we regularly update the software.
We have recently started seeing a problem where the AA takes unusually long (sometimes days) to establish new connections to PVs and to re-establish connections that are interrupted, e.g. by IOC restarts.
We have determined that a large number of the PVs requested for archiving do not exist (around 6%), and that this is likely the root cause, so we are currently working to reduce these to a manageable level.
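(In case it is useful to others: these can be counted via the management BPL. Below is a minimal sketch assuming the standard /mgmt/bpl/getNeverConnectedPVs endpoint; the hostname is a placeholder, and the exact fields in the response may vary by release.)

#!/usr/bin/env python3
# Minimal sketch: list PVs requested for archiving that never connected.
# Assumes the standard mgmt BPL endpoint getNeverConnectedPVs; the
# MGMT_URL host/port below is a placeholder for your mgmt instance.
import requests

MGMT_URL = "http://archiver.example.org:17665/mgmt/bpl"  # placeholder

resp = requests.get(MGMT_URL + "/getNeverConnectedPVs")
resp.raise_for_status()
never_connected = resp.json()  # one JSON object per never-connected PV

print(len(never_connected), "PVs requested but never connected")
for entry in never_connected[:10]:  # print a small sample
    print(" ", entry.get("pvName"))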
We would like to rule out any other factors. We have identified two possibilities.
1. Configuration options in archappl.properties
We have archivePVWorkflowBatchSize set to 30,000. archivePVWorkflowTickSeconds is not defined, so it must be using the default. We do not think that these two are causing our problem because we are not close to having that many PVs pending.
We also see the issue when re-establishing interrupted connections to PVs that are already being archived, so I think it is more likely something at the channel access client library level.
The only parameter that looks like it could be relevant is
org.epics.archiverappliance.engine.epics.commandThreadCount
The default is 10. Our site configuration has it set to 1. I do not have a record of the reason for this value.
The comment for this in archappl.properties says:
What precise effects would we observe if this value were not high enough?
Why might a value of 1 have been chosen? It is likely that we wanted to throttle the rate of channel access search requests, since excessive broadcast traffic from the AA has caused problems at DLS in the past. Are there other potential downsides to increasing it?
Are there any other parameters we haven't thought of that may be relevant to this problem?
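For context, the kind of check we have in mind for watching the reconnect backlog after an IOC restart is sketched below. It assumes the standard mgmt BPL endpoint getCurrentlyDisconnectedPVs, and the hostname is again a placeholder.

#!/usr/bin/env python3
# Sketch: poll the mgmt BPL and report how many PVs are currently
# disconnected, to watch how quickly the backlog drains after an
# IOC restart. Assumes the standard getCurrentlyDisconnectedPVs
# endpoint; the host/port is a placeholder.
import time
import requests

MGMT_URL = "http://archiver.example.org:17665/mgmt/bpl"  # placeholder

while True:
    pvs = requests.get(MGMT_URL + "/getCurrentlyDisconnectedPVs").json()
    print(time.strftime("%H:%M:%S"), len(pvs), "PVs currently disconnected")
    time.sleep(60)  # poll once a minute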
2. Distribution of PVs between appliances
We have a collective memory that "a maximum of 80,000 PVs per appliance is optimal". The source of this statement is not remembered, however; perhaps it was in a talk at a collaboration meeting. We now have more PVs than this (an average of around 100k on each of our three servers).
Can anybody explain where this 80k number came from? Is it expressing a limit on system resources, or some limit in the application? We find that system resources such as RAM, CPU and network are all well within capacity on our servers. I would like to understand whether there is some other factor we are not taking into account.
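For what it is worth, per-appliance PV counts and rates can be pulled from the mgmt BPL; a sketch assuming the getApplianceMetrics endpoint (the one behind the mgmt UI's metrics page) is below. The host is a placeholder, and field names such as "pvCount" and "eventRate" may differ between releases, so check what your version returns.

#!/usr/bin/env python3
# Sketch: print per-appliance PV counts and event rates for the cluster.
# Assumes the standard mgmt BPL endpoint getApplianceMetrics; host/port
# is a placeholder and the exact field names may vary between releases.
import requests

MGMT_URL = "http://archiver.example.org:17665/mgmt/bpl"  # placeholder

for appliance in requests.get(MGMT_URL + "/getApplianceMetrics").json():
    print(appliance.get("instance"),
          "PVs:", appliance.get("pvCount"),
          "events/sec:", appliance.get("eventRate"))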
Many thanks,
Andy