Unfortunately, the script did not work for some PVs, but this might be a separate issue. The script pauses archiving, consolidates data,
and then resumes archiving. I noticed that for a number of PVs the pause and resume calls fail with HTTP 500 returned from the server. This error is the result of a Java NullPointerException (attached), which is printed in the MGMT catalina.err. After some troubleshooting,
I found that these failed PVs are all aliases. I was not able to figure out why this happened, so I ended up building the master branch, and it is working so far.
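For readers who want to reproduce the pause, consolidate and resume sequence, here is a minimal Python sketch of it. It is not the script used here. The base URL is a hypothetical placeholder, and the `pv` and `storage` query parameters are assumed from the mgmt BPL documentation. A failed pause (for example the HTTP 500 seen on aliases) skips consolidation, and resume is still attempted if consolidation fails.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical base URL; point it at your appliance's mgmt BPL.
MGMT_BPL = "http://localhost:17665/mgmt/bpl"


def bpl_url(action, **params):
    """Build a GET URL for a mgmt BPL action such as pauseArchivingPV."""
    return f"{MGMT_BPL}/{action}?{urllib.parse.urlencode(params)}"


def http_get(url):
    """Fetch a URL and return (status, body); HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def consolidate_pv(pv, storage="LTS", fetch=http_get):
    """Pause, consolidate up to `storage`, then resume; return the failed steps."""
    failed = []
    steps = [
        ("pauseArchivingPV", {"pv": pv}),
        ("consolidateDataForPV", {"pv": pv, "storage": storage}),
        ("resumeArchivingPV", {"pv": pv}),
    ]
    for action, params in steps:
        status, _body = fetch(bpl_url(action, **params))
        if status != 200:
            failed.append((action, status))
            if action == "pauseArchivingPV":
                # Do not consolidate (or resume) a PV that could not be paused.
                return failed
    return failed
```

Looping `consolidate_pv` over a PV list (for example one obtained from the getAllPVs BPL call) and logging any non-empty result makes PVs such as aliases that fail to pause easy to spot. The `fetch` parameter exists only so the HTTP layer can be swapped out for testing.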
Best Regards,
Abdalla Al-Dalleh
Control Engineer
SESAME
From: Abdalla Ahmad
Sent: Monday, January 26, 2026 8:57 AM
To: 'Hu, Yong' <yhu at bnl.gov>; tech-talk at aps.anl.gov
Subject: RE: Investigating Archiver Appliance lost events
Hi Yong
Your observation is valid, and we have faced it before, but it is not the case this time: the fastest PV we have is archived at 10 Hz, and the issue happens with a large number of PVs. We are archiving 18K+ PVs, and
I think there were fewer than 2K PVs in the STS. I am not sure how relevant this is, but I remember the issue happened when I executed a script that fetches the PV name list and consolidates data for all PVs to the LTS (using the mgmt BPL call consolidateDataForPV). Now that
the server is working, I will execute the script again and see how it goes.
Best Regards,
Abdalla Al-Dalleh
Control Engineer
SESAME
I recall that we had similar issues at NSLS-2. The buffer is the one kept for the troublesome PV in the AA, not in the OS. For us, 'buffer full' means the PV updates too fast (maybe someone
has made changes to the PV). I remember the default monitor-based archiving rate is limited to 1 Hz for scalar PVs. I am guessing your problematic PVs update faster than the default rate. You can use camonitor (or caEventRate) to verify the update rate.
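As a rough complement to caEventRate, the update rate can also be estimated from saved camonitor output. This is a minimal Python sketch, assuming camonitor's default line layout of PV name, date, time with fractional seconds, then value. The first line per PV is dropped because camonitor first prints the value at connect time, which can carry an old timestamp.

```python
from datetime import datetime


def event_rates(lines):
    """Estimate updates per second for each PV from camonitor output lines.

    Assumes the layout "<PV> <YYYY-MM-DD> <HH:MM:SS.ffffff> <value> ...".
    """
    stamps = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            ts = datetime.fromisoformat(f"{fields[1]} {fields[2]}")
        except ValueError:
            continue  # e.g. "*** disconnected" or other non-timestamp lines
        stamps.setdefault(fields[0], []).append(ts)

    rates = {}
    for pv, ts in stamps.items():
        # Drop the connect-time value, whose timestamp may be stale.
        ts = ts[1:]
        if len(ts) < 2:
            continue
        span = (max(ts) - min(ts)).total_seconds()
        if span > 0:
            rates[pv] = (len(ts) - 1) / span
    return rates


def print_rates(path):
    """Print the estimated rate of each PV found in a saved camonitor log."""
    with open(path) as f:
        for pv, rate in sorted(event_rates(f).items()):
            print(f"{pv}: {rate:.2f} Hz")
```

Because the estimate uses the IOC timestamps, it reflects how often the record posts new values. A result well above the archived sampling rate would point at the "buffer full" situation described above.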
Hi
This morning we noticed strange behavior on the archiver: it was unable to write PB files for a large number of PVs. We noticed this when doing live plots, where the plotter shows only the first 3 or
4 samples and no archived data at all. During troubleshooting, I noticed that the Details page of one of the problematic PVs showed the following parameters increasing every second (i.e. with every sample received):
- How many events lost because the sample buffer is full so far?
- How many events lost totally so far?
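The two counters above can also be watched without refreshing the Details page. This is a minimal Python sketch, assuming that the mgmt BPL call getPVDetails returns a JSON array of rows with `name` and `value` keys (verify this against your appliance's actual output). The base URL is a hypothetical placeholder. Taking two snapshots a few seconds apart shows which counters are still growing.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; point it at your appliance's mgmt BPL.
MGMT_BPL = "http://localhost:17665/mgmt/bpl"
# Prefix of the row names shown on the PV Details page in this thread.
LOST_PREFIX = "How many events lost"


def fetch_details(pv):
    """GET getPVDetails for one PV and return the parsed JSON.

    Assumption: the response is a JSON array of {"name": ..., "value": ...} rows.
    """
    url = f"{MGMT_BPL}/getPVDetails?" + urllib.parse.urlencode({"pv": pv})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def lost_event_counters(details):
    """Return {row name: numeric value} for the 'events lost' rows."""
    counters = {}
    for row in details:
        name = str(row.get("name", ""))
        if name.startswith(LOST_PREFIX):
            try:
                counters[name] = float(row.get("value"))
            except (TypeError, ValueError):
                pass  # skip values that are not plain numbers
    return counters


def growing_counters(before, after):
    """Names of counters whose value increased between two snapshots."""
    return sorted(n for n, v in after.items() if v > before.get(n, v))
```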
Neither the AA nor the OS logs were helpful, so I ended up rebooting the server itself. My question is: which buffer is the Details page referring to? Is it a buffer within the AA itself or in the OS?
Best Regards,
Abdalla Al-Dalleh
Control Engineer
SESAME
P.O. Box 7, Allan 19252, Jordan
Tel: +96253511348 , ext. 265
Fax: +96253511423
Email : abdalla.ahmad at sesame.org.jo
Website: www.sesame.org.jo
// mgmt/logs/catalina.err contents when doing pause or resume.
Jan 26, 2026 10:03:08 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [BPLServlet] in context with path [/mgmt] threw exception
java.io.IOException: java.lang.NullPointerException
at org.epics.archiverappliance.common.BasicDispatcher.handleBPLAction(BasicDispatcher.java:85)
at org.epics.archiverappliance.common.BasicDispatcher.dispatch(BasicDispatcher.java:50)
at org.epics.archiverappliance.mgmt.BPLServlet.doGet(BPLServlet.java:214)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:529)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:623)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:197)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:142)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:142)
at org.apache.catalina.filters.CorsFilter.handleNonCORS(CorsFilter.java:334)
at org.apache.catalina.filters.CorsFilter.doFilter(CorsFilter.java:161)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:142)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:166)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:88)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:481)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:90)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:653)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:72)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:398)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:935)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1833)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:975)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:493)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
at java.base/java.lang.Thread.run(Thread.java:1474)
Caused by: java.lang.NullPointerException
// mgmt/logs/catalina.out contents when doing pause and resume on a problematic PV.
2026-01-26 10:05:36,880 INFO [http-nio-17665-exec-9] common.BasicDispatcher (BasicDispatcher.java:44) - Servicing /pauseArchivingPV
2026-01-26 10:05:36,890 ERROR [http-nio-17665-exec-9] common.BasicDispatcher (BasicDispatcher.java:84) - null
java.lang.NullPointerException: null
2026-01-26 10:41:21,519 INFO [http-nio-17665-exec-1] common.BasicDispatcher (BasicDispatcher.java:44) - Servicing /resumeArchivingPV
2026-01-26 10:41:21,530 ERROR [http-nio-17665-exec-1] common.BasicDispatcher (BasicDispatcher.java:84) - null
java.lang.NullPointerException: null
2026-01-26 10:41:21,532 WARN [hz.appliance0.event-4] persistence.MySQLPersistence (MySQLPersistence.java:360) - 2 rows changed when updating key SR-PS-GW5-CH11-PS2:getIload in putTypeInfo
// mgmt/logs/catalina.out contents when doing pause and resume on a working PV.
2026-01-26 10:20:10,383 INFO [http-nio-17665-exec-5] common.BasicDispatcher (BasicDispatcher.java:44) - Servicing /pauseArchivingPV
2026-01-26 10:20:10,395 WARN [hz.appliance0.event-5] persistence.MySQLPersistence (MySQLPersistence.java:360) - 2 rows changed when updating key SRC01-VA-IMG1:getPressure in putTypeInfo
2026-01-26 10:20:10,405 INFO [http-nio-17665-exec-2] common.BasicDispatcher (BasicDispatcher.java:44) - Servicing /getPVDetails
2026-01-26 10:20:10,405 INFO [http-nio-17665-exec-2] reports.PVDetails (PVDetails.java:70) - Getting the detailed status for PV SRC01-VA-IMG1:getPressure
2026-01-26 10:20:30,243 INFO [http-nio-17665-exec-6] common.BasicDispatcher (BasicDispatcher.java:44) - Servicing /resumeArchivingPV
2026-01-26 10:20:30,257 WARN [hz.appliance0.event-5] persistence.MySQLPersistence (MySQLPersistence.java:360) - 2 rows changed when updating key SRC01-VA-IMG1:getPressure in putTypeInfo
2026-01-26 10:20:30,268 INFO [http-nio-17665-exec-1] common.BasicDispatcher (BasicDispatcher.java:44) - Servicing /getPVDetails
2026-01-26 10:20:30,268 INFO [http-nio-17665-exec-1] reports.PVDetails (PVDetails.java:70) - Getting the detailed status for PV SRC01-VA-IMG1:getPressure
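To see at a glance which BPL actions failed in a larger catalina.out, each "Servicing" line can be paired with a BasicDispatcher ERROR logged on the same request thread. This is a minimal Python sketch, assuming the log line layout shown in the excerpts above. Since the failing requests do not log the PV name, it only counts failed actions per endpoint.

```python
import re
from collections import Counter

# "<date> <time> <LEVEL> [<thread>] <rest of line>", as in the excerpts above.
LINE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\] (?P<rest>.*)$")
SERVICING = re.compile(r"Servicing (?P<path>/\S+)")


def failed_bpl_calls(lines):
    """Count BPL paths whose request thread then logged a BasicDispatcher ERROR."""
    last_path = {}  # request thread -> most recent BPL path it serviced
    failures = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # stack-trace and exception lines
        level, thread, rest = m["level"], m["thread"], m["rest"]
        if "BasicDispatcher" not in rest:
            continue
        s = SERVICING.search(rest)
        if level == "INFO" and s:
            last_path[thread] = s["path"]
        elif level == "ERROR" and thread in last_path:
            failures[last_path.pop(thread)] += 1
    return failures
```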