Experimental Physics and
Industrial Control System


Subject:	Archiver Appliance stuck on initial sampling
From:	Dennis Hilhorst via Tech-talk <tech-talk at aps.anl.gov>
To:	"tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date:	Fri, 9 Jan 2026 10:13:57 +0000

Hi all,

We have been using the Archiver Appliance, but for some reason one deployment is getting stuck adding new PVs. Here’s the situation:

We are running version 2.1.2 of the Archiver Appliance (from the EPNix package https://github.com/epics-extensions/EPNix/blob/master/pkgs/by-name/archiver-appliance/package.nix) with Tomcat 9.0.108 (also through Nix) on NixOS 25.05. Previously we ran version 2.0.10 of the Archiver Appliance, which worked just fine for a while but then also suddenly started hanging.

The services start fine and the web interface and API respond, but the archiver seems to get stuck trying to sample PVs. We submit a list of about 4500 PVs for it to archive; it gets to work and adds some of them after a couple of minutes. After a while it just hangs, and it did not add any more PVs for an entire day. The metrics page of the management interface shows that the number of “PVs pending computation of meta info” is 1000 (which seems to be the maximum).

We then tried rebooting, after which some of the PVs that had gotten through became disconnected from the archiver (“Disconnected PV count” on the metrics page was no longer 0, and we found some variables that indeed were no longer being archived). Some PVs stayed connected and a few more were added, but it got stuck again after a while, once more with 1000 PVs pending computation of meta info.

Other metrics do update, for example the events/sec and MB/sec metrics, and we can see data through the Grafana plugin, so the archiver is definitely doing something. The Tomcat logs don’t make me any wiser; they just repeat:

Jan 09 08:17:26 a3xr-control startup.sh[34080]: INFO Running the archive PV workflow with 1387 requests pending (org.epics.archiverappliance.mgmt.MgmtRuntimeState)

Jan 09 08:17:36 a3xr-control startup.sh[34080]: INFO Appliances that have loaded their PVsappliance0 (org.epics.archiverappliance.config.DefaultConfigService)

Jan 09 08:17:36 a3xr-control startup.sh[34080]: INFO Running the archive PV workflow with 1387 requests pending (org.epics.archiverappliance.mgmt.MgmtRuntimeState)

Jan 09 08:17:46 a3xr-control startup.sh[34080]: INFO Appliances that have loaded their PVsappliance0 (org.epics.archiverappliance.config.DefaultConfigService)

Jan 09 08:17:46 a3xr-control startup.sh[34080]: INFO Running the archive PV workflow with 1387 requests pending (org.epics.archiverappliance.mgmt.MgmtRuntimeState)

Jan 09 08:17:56 a3xr-control startup.sh[34080]: INFO Appliances that have loaded their PVsappliance0 (org.epics.archiverappliance.config.DefaultConfigService)

Jan 09 08:17:56 a3xr-control startup.sh[34080]: INFO Running the archive PV workflow with 1387 requests pending (org.epics.archiverappliance.mgmt.MgmtRuntimeState)
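To quantify the stall rather than eyeball it, the pending count can be pulled out of the journal lines above. This is a minimal sketch in Python; the regular expression is inferred only from the log lines quoted here, and the names pending_counts and is_stalled are made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted above (an assumption, not an
# official log format): syslog-style timestamp, then the workflow message.
PENDING_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) .*"
    r"Running the archive PV workflow with (?P<pending>\d+) requests pending"
)


def pending_counts(lines):
    """Return (timestamp, pending) pairs for every workflow line found."""
    counts = []
    for line in lines:
        match = PENDING_RE.search(line)
        if match:
            counts.append((match.group("stamp"), int(match.group("pending"))))
    return counts


def is_stalled(counts, min_samples=3):
    """True if the last `min_samples` pending counts are identical and non-zero."""
    tail = [pending for _, pending in counts[-min_samples:]]
    return len(tail) == min_samples and len(set(tail)) == 1 and tail[0] > 0
```

Fed the lines above, pending_counts yields 1387 four times in a row and is_stalled returns True. Run over a longer capture, the same pairs would show when the count stopped decreasing.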

I increased the verbosity by adding JAVA_OPTS=-verbose:class and the -v parameter to the startup script. From that I could tell that it struggled to detect DBR types for PVs hosted by the Python softioc, but there should only be about 250 of those, and at some point it would just give up on them (Aborting archive request for pv nbl_a:bpm1:x_avg Reason: (org.epics.archiverappliance.mgmt.archivepv.ArchivePVState)), so I don’t think it is getting stuck on those. I didn’t see anything else indicating that it hangs on specific PVs, and at some point the logs contain only the lines shown above, repeated again and again.

We tried completely wiping the archiver’s sts / mts / lts directories as well as the MySQL and Tomcat environments. This does seem to reset the archiver, but the same problem occurs again right afterwards. The policies.py file we are using is the default one packaged in EPNix (I think it’s the same one that can be found here: https://gitlab.esss.lu.se/julianomurari/epicsarchiver-config/-/blob/master/policies/default_policies.py). I also tried increasing callbackSetQueueSize to 50000 in our IOC, hoping the issue was an excess of monitor callbacks, but to no avail. I am not sure where else to look; there don’t seem to be any more detailed logs for the Archiver Appliance that would show me which PVs it is getting stuck on.
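One way to see which PVs are stuck without more detailed logs is to ask the management BPL for the status of each submitted PV and group the answers. The following is a minimal sketch, assuming the getPVStatus call accepts a glob pattern in its pv parameter and returns a JSON list of objects with pvName and status fields; the base URL is the mgmt_url from appliances.xml, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumption: same mgmt_url as configured in appliances.xml.
MGMT_URL = "http://localhost:8080/mgmt/bpl"


def fetch_pv_status(pv_pattern, mgmt_url=MGMT_URL, timeout=30):
    """Query getPVStatus for a PV name or glob pattern and return the parsed JSON."""
    query = urllib.parse.urlencode({"pv": pv_pattern})
    url = f"{mgmt_url}/getPVStatus?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def group_by_status(entries):
    """Map each reported status string to the list of PV names carrying it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.get("status", "unknown")].append(entry.get("pvName", "?"))
    return dict(groups)
```

For example, group_by_status(fetch_pv_status("nbl_a:*")) would list the PVs under that prefix by whatever status the appliance reports, which makes it easier to see whether the stuck requests share an IOC or naming pattern.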

For completeness, our appliances.xml looks like this:

<appliances>
  <appliance>
    <identity>appliance0</identity>
    <cluster_inetport>localhost:16670</cluster_inetport>
    <mgmt_url>http://localhost:8080/mgmt/bpl</mgmt_url>
    <engine_url>http://localhost:8080/engine/bpl</engine_url>
    <etl_url>http://localhost:8080/etl/bpl</etl_url>
    <retrieval_url>http://localhost:8080/retrieval/bpl</retrieval_url>
    <data_retrieval_url>http://localhost:8080/retrieval</data_retrieval_url>
  </appliance>
</appliances>

And here are the full stats after it hung for the first time after a complete sts / mts / lts and mysql wipe:

Attribute	Detail
Appliance Identity	appliance0
Total PV count	1449
Disconnected PV count	0
Connected PV count	1449
Paused PV count	0
Total channels	5435
Approx pending jobs in engine queue	1
Event Rate (in events/sec)	4.86
Data Rate (in bytes/sec)	58.8
Data Rate in (GB/day)	0
Data Rate in (GB/year)	1.73
Time consumed for writing samplebuffers to STS (in secs)	0
Benchmark - writing at (events/sec)	10,674.23
Benchmark - writing at (MB/sec)	0.12
PVs pending computation of meta info	1000
Total number of CAJ channels	12452
Channels with pending search requests	7000 of 12452
Total number of ETL runs into MTS so far	20
Average time spent in ETL into MTS (s/run)	0.04
Average percentage of time spent in ETL	0
Approximate time taken by last ETL job (s)	0
Estimated weekly usage in ETL (%)	0
Avg time spent by getETLStreams (s/run)	0.01
Avg time spent by free space checks (s/run)	0
Avg time spent by prepareForNewPartition() (s/run)	0
Avg time spent by appendToETLAppendData() (s/run)	0.03
Avg time spent by commitETLAppendData() (s/run)	0
Avg time spent by markForDeletion() in ETL (s/run)	0
Avg time spent by runPostProcessors() in ETL (s/run)	0
Avg time spent by executePostETLTasks() in ETL (s/run)	0
Estimated bytes transferred in ETL (MTS)(MB)	4.54
Number of Retrieval Requests	23
Time of last Retrieval Request	Jan/07/2026 09:25:22 GMT
Number of unique users	2
PVs in archive workflow	3063
Capacity planning last update	Jan/06/2026 13:42:44 GMT
Engine write thread usage	0
Aggregated appliance storage rate (in GB/year)	17.57
Aggregated appliance event rate (in events/sec)	41.32
Aggregated appliance PV count	1,838
Incremental appliance storage rate (in GB/year)	17.57
Incremental appliance event rate (in events/sec)	41.32
Incremental appliance PV count	1,838
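Two figures in this table can be cross-checked with simple arithmetic: 1449 archived PVs plus 3063 PVs in the archive workflow gives 4512, consistent with the roughly 4500 PVs submitted (assuming the two counters do not overlap), and 7000 of 12452 CAJ channels, a bit over half, still have pending search requests. A minimal sketch that does this from a tab-separated copy of the metrics page; the function names are hypothetical.

```python
def parse_metrics(text):
    """Parse 'Attribute<TAB>Detail' lines into a dict of strings."""
    metrics = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("\t", 1)
        metrics[name.strip()] = value.strip()
    return metrics


def pending_search_fraction(metrics):
    """Fraction of CAJ channels whose search request is still pending."""
    pending, total = metrics["Channels with pending search requests"].split(" of ")
    return int(pending) / int(total)


def accounted_pvs(metrics):
    """PVs already archived plus PVs still in the archive workflow."""
    return int(metrics["Total PV count"]) + int(metrics["PVs in archive workflow"])
```

Re-running this on snapshots taken a few minutes apart would show whether the pending-search fraction is shrinking at all while the workflow count stays fixed.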

We haven’t tried a complete reinstall of the system, but of course we would prefer to figure out what is going wrong so that we can prevent it from happening in the future. We would greatly appreciate help trying to debug / fix this issue!

Sincerely,

Dennis Hilhorst


ANJ, 19 Mar 2026
