EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Problems with Dante (XGLab) Driver
From: Dariush Hampai via Tech-talk <tech-talk at aps.anl.gov>
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Wed, 24 Jun 2026 16:06:03 +0200
Hi Mark,
the IOC doesn’t exit… it simply remains in the “acquiring” state and never leaves it (even the StopAll command doesn’t work).
I tried increasing the poll time (from 0.01 to 0.1) as Claude suggested, and apparently it works.
Tomorrow I’ll start a complete session (so there will be many acquisitions) to see whether the problem remains.
In two days I’ll attach a complete report about the threads…

Dariush

************************************

Dr. Dariush Hampai, PhD

INFN - LNF
X-Lab Frascati
Via E. Fermi, 54 (ex 40)
I-00044 Frascati (RM)
Italy

Mail Address:
XLab-Frascati
LNF-INFN
Casella Postale 13
Frascati (RM)
Italy

Room: +39.06.9403.5248
Lab.: +39.06.9403.2286
Mob.: +39.06.9403.8025
Fax.: +39.06.9403.2597

************************************



On 24 Jun 2026, at 15:47, Mark Rivers <rivers at cars.uchicago.edu> wrote:

Hi Dariush,

I am a bit confused.  In your original message you said:


That led me to understand that the IOC was crashing, i.e. exiting with an error.

I then asked you to run the IOC under gdb and send me the backtrace when it crashed.  You did that, and it showed the error was in the readline library.

But now I am wondering if it really crashed.  What did you do to get to the gdb prompt and type "backtrace"?  Did you type ^C, i.e. did you force it to crash?

You need to be clear on whether it is "crashing" or "hanging".

To make progress on this you need to do the following:

Mark



From: Dariush Hampai <dariush.hampai at lnf.infn.it>
Sent: Wednesday, June 24, 2026 4:10 AM
To: Johnson, Andrew N. <anj at anl.gov>; Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: Re: Problems with Dante (XGLab) Driver
 

Hi Mark,
Hi Andrew,
using Claude I may have found the problem (but not the solution...). In answer to what Claude asked: I use the Dante in 4096-channel mode, not in mapping mode and without a trigger.

Claude answered:

There are two key threads:
Thread 17 "acquisitionTask": 
#2 epicsThreadSleep 
#3 Dante::acquisitionTask (this=0x594560) at ../dante.cpp:1289 
Thread 8 "DANTE1": 
#2 epicsThreadSleep 
#3 Dante::writeInt32 (this=0x594560, value=1) at ../dante.cpp:461 
#4 asynPortDriver::writeInt32 
#5 processCallbackOutput (devAsynInt32.c) 
#6 portThread (asynManager.c)

1) Thread 8 "DANTE1" is the dedicated thread for the asyn port (portThread in asynManager.c) — the one EPICS uses to serialize all read/write requests on that asyn device. It is blocked inside Dante::writeInt32 with value=1, which with very high probability corresponds exactly to the EraseStart command you sent. This function internally calls epicsThreadSleep — so writeInt32 is waiting in a blocking, synchronous manner for something to happen (most likely for the acquisition to signal "done").

2) Thread 17 "acquisitionTask" is the internal thread of the Dante driver that should handle the acquisition loop, also stopped on epicsThreadSleep inside a loop at dante.cpp:1289. This is almost certainly a polling loop that periodically checks the acquisition status (while acquiring → sleep → check status).

The critical point: if acquisitionTask never sees the "acquisition completed" condition — due to a lost event, a race condition, or a status flag not being correctly updated by the firmware/board — it remains stuck in this loop forever. And since writeInt32 (called by the EraseStart command) most likely waits for this very completion before returning, the asyn port thread also remains blocked.
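The polling pattern described above can be sketched as follows. This is a minimal illustration in Python, not the code at dante.cpp:1289; the names poll_until_done, read_status and max_polls are hypothetical, and max_polls exists only so that the demonstration terminates.

```python
import time


def poll_until_done(read_status, poll_time=0.01, max_polls=None):
    """Poll read_status() until it returns True.

    max_polls is a hypothetical limit used only for this demonstration.
    With max_polls=None the loop keeps sleeping and checking until
    completion is seen, which is the behaviour described in the thread.
    Returns True if completion was seen, False if the limit was hit first.
    """
    polls = 0
    while not read_status():
        if max_polls is not None and polls >= max_polls:
            return False
        time.sleep(poll_time)  # sleep, then check the status again
        polls += 1
    return True


if __name__ == "__main__":
    # Status source that reports completion on the third check.
    answers = iter([False, False, True])
    print(poll_until_done(lambda: next(answers), poll_time=0.001))

    # Status source that never reports completion (models a lost event).
    print(poll_until_done(lambda: False, poll_time=0.001, max_polls=5))
```

Without the limit, the second call would never return, which is the situation the backtrace of Thread 17 appears to show.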

Why the IOC appears "frozen" and no commands work:

This is the most important detail: in asyn, a single thread (portThread) serializes all requests on that port. If that thread is blocked inside writeInt32, no other command can be processed on that device — not even StopAll or Reset, because they end up queued behind the blocked request. This is why the system appears completely frozen and unresponsive to anything until you restart the IOC.
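The serialization effect can be reproduced with a toy model. The sketch below is plain Python threading, not asyn; simulate_port and the handler names are hypothetical, and the blocking wait has a timeout only so that the demonstration finishes. It shows that a request queued behind a blocked handler is not processed until that handler returns.

```python
import queue
import threading


def simulate_port(block_seconds=0.2):
    """Run one worker thread that handles queued requests one at a time."""
    log = []
    requests = queue.Queue()
    acquisition_done = threading.Event()  # never set: models the lost "done" event
    write_entered = threading.Event()

    def erase_start():
        log.append("EraseStart: begin")
        write_entered.set()
        # Blocking wait inside the write handler. The timeout is only here
        # to keep the demo finite; without it the worker would wait forever.
        acquisition_done.wait(timeout=block_seconds)
        log.append("EraseStart: end")

    def stop_all():
        log.append("StopAll: processed")

    def port_thread():
        # Single thread serializing every request, like a port thread.
        while True:
            handler = requests.get()
            if handler is None:
                break
            handler()

    worker = threading.Thread(target=port_thread)
    worker.start()
    requests.put(erase_start)
    write_entered.wait()           # EraseStart is now blocked in the worker
    log.append("StopAll: queued")
    requests.put(stop_all)         # queued behind the blocked request
    requests.put(None)             # tell the worker to finish afterwards
    worker.join()
    return log


if __name__ == "__main__":
    for line in simulate_port():
        print(line)
```

In this model StopAll is only processed after EraseStart returns. If the wait had no timeout, StopAll would stay in the queue indefinitely.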

Confirmation from official documentation:

This matches exactly what I found in the Performance page: in mapping mode with 2048 and 1024 channels, above a certain trigger frequency spectra are lost and the EPICS IOC must be restarted. It's very likely you're running into a variant of the same problem: a race condition in the driver where an acquisition completion event is lost, leaving acquisitionTask (and therefore writeInt32/EraseStart) waiting indefinitely.

What you can do:

1) There is no documented clean software reset for this scenario, because the thread that should process any reset command is precisely the one that is blocked. Restarting the IOC remains, according to the documentation itself, the official solution.

2) Preventive mitigation — since the problem seems related to a lost event/state during acquisition, try to:
    a) Slightly increase PollTime (e.g., from 0.01 to 0.02-0.05s) to give the firmware more margin to communicate the status
    b) If you are in mapping mode with 1024/2048 channels and high trigger frequencies, evaluate whether you can switch to 4096 channels, where (according to the Performance table) the limits are much higher
    c) Verify that the Dante board firmware is at the latest version — bugs of this type are often fixed on the firmware side

3) If the problem persists, it's worth reporting it directly to EPICS Community or to Dante/XGLab support — the backtrace you produced is exactly the kind of evidence needed for a bug report, because it precisely identifies the lines (dante.cpp:1289 and dante.cpp:461) where the driver gets stuck.

Dariush

On 23/06/2026 14:51, Dariush Hampai via Tech-talk wrote:

Hi Mark
Hi Andrew
on a careful read of the "(gdb) thread apply all bt" output I found a strange thread almost at the end...

Thread 256 (Thread 0x7fff669a2640 (LWP 1412432) "save_restore"):
#0  0x00007ffff58884da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007ffff588acaf in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007ffff6c8578d in epicsEventWaitWithTimeout (pevent=0x7fff78020330, timeout=<optimized out>) at ../osi/os/posix/osdEvent.c:131
#3  0x00007ffff6c86c27 in myReceive (timeout=<optimized out>, size=512, message=0x200, pmsg=0xa724a0) at ../osi/os/default/osdMessageQueue.cpp:369
#4  epicsMessageQueueReceiveWithTimeout (pmsg=0xa724a0, message=message@entry=0x7fff669a1a40, size=size@entry=512, timeout=1) at ../osi/os/default/osdMessageQueue.cpp:404
#5  0x00007ffff73d5796 in save_restore () at ../save_restore.c:1226
#6  0x00007ffff6c82795 in start_routine (arg=0x3384670) at ../osi/os/posix/osdThread.c:442
#7  0x00007ffff588b4f9 in start_thread () from /lib64/libc.so.6
#8  0x00007ffff59106e0 in clone3 () from /lib64/libc.so.6

Only here does the thread "save_restore" appear, and only here is "epicsMessageQueueReceiveWithTimeout" called.
Could this be the problem?

Dariush

On 23/06/2026 14:05, Dariush Hampai via Tech-talk wrote:

Hi Mark,
Hi Andrew,
any idea?

Dariush

On 22/06/2026 11:25, Dariush Hampai wrote:

Dear Mark
Dear Andrew,

I reproduced the crash and the backtrace looks the same.

#0  0x00007ffff5904b92 in pselect () from /lib64/libc.so.6
#1  0x00007ffff6b143bb in rl_getc () from /lib64/libreadline.so.8
#2  0x00007ffff6b13cd1 in rl_read_key () from /lib64/libreadline.so.8
#3  0x00007ffff6af8497 in readline_internal_char () from /lib64/libreadline.so.8
#4  0x00007ffff6b01535 in readline () from /lib64/libreadline.so.8
#5  0x00007ffff6c85cd2 in osdReadline (context=0x444dc0, prompt=0x7ffff6c9c183 "epics> ") at ../osi/os/default/gnuReadline.c:70
#6  epicsReadline (prompt=0x7ffff6c9c183 "epics> ", context=0x444dc0) at ../osi/epicsReadline.c:68
#7  0x00007ffff6c77aea in iocshBody (pathname=<optimized out>, commandLine=0x0, macros=0x0) at ../iocsh/iocsh.cpp:1143
#8  0x000000000040a616 in main (argc=<optimized out>, argv=<optimized out>) at ../mcaDanteAppMain.cpp:20

I attach the output of "thread apply all backtrace".

Awaiting your replies...

Dariush


On 19/06/2026 17:13, Johnson, Andrew N. wrote:
Hi Dariush,

Please also include any messages output just before and announcing the crash, and instead of just the gdb command "backtrace", first run "set height 0" to disable the pager and then "thread apply all backtrace", which will produce lots of output that may help Mark diagnose the problem.

- Andrew
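
For an IOC that is hanging rather than crashing, the same two gdb commands can also be run against the already running process. The sketch below only builds the command line; it assumes attaching by process ID is acceptable, which is not what was suggested in this thread (there the IOC is started under gdb). The function name gdb_backtrace_command and the PID are hypothetical. The options -batch, -ex and -p are standard gdb options; attaching may require extra permissions on some systems.

```python
import shlex


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to a running process,
    prints the backtrace of every thread, and then exits."""
    return [
        "gdb", "-batch",
        "-ex", "set height 0",                # disable the pager
        "-ex", "thread apply all backtrace",  # backtrace of every thread
        "-p", str(pid),                       # attach to this process ID
    ]


if __name__ == "__main__":
    # 12345 is a placeholder PID; redirect the output to a file to save it.
    print(shlex.join(gdb_backtrace_command(12345)))
```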

-- 
Complexity comes for free, Simplicity you have to work for.

On 6/19/26, 9:50 AM, "Tech-talk" <tech-talk-bounces at aps.anl.gov> wrote:

What did you do that triggered the crash this time?

Please continue to run the IOC using gdb.  Each time it crashes save the output of backtrace.  We need to see if it is always crashing in the readline library.

Mark



From: Dariush Hampai <dariush.hampai at lnf.infn.it>
Sent: Friday, June 19, 2026 9:45 AM
To: Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: Re: Problems with Dante (XGLab) Driver
 
Dear Mark,
I don't know if it is the same for all the previous crashes... however the effects are the same...
Dariush

On 19/06/2026 16:42, Mark Rivers wrote:
Hi Dariush,

The gdb backtrace says that the crash is actually in the Linux readline library.  Was this crash caused by the same sequence of events as previous crashes you observed?

Mark



From: Dariush Hampai <dariush.hampai at lnf.infn.it>
Sent: Friday, June 19, 2026 9:17 AM
To: Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: Re: Problems with Dante (XGLab) Driver
 
 Hi Mark,

Are you using a Dante1 or a Dante8?
I'm using Dante8

Does this happen every time you start, or just occasionally?  If it is occasionally, then how frequently does it happen?
Occasionally, more often when two records are processed in quick succession

Are there any error messages on the IOC?
no, however some records stay in the "acquire" state (such as $(P)$(M).ACQG in the mca window)

Are you running on Linux or Windows? 
Linux (CentOS 9)

If you are running on Linux then please run the IOC in the GNU debugger. You can do that with the following commands from the iocDante1 directory:

gdb ../../bin/linux-x86_64/mcaDanteApp
run st.cmd

When it crashes type this command at the debugger prompt:

backtrace

(gdb) backtrace
#0  0x00007ffff5904b92 in pselect () from /lib64/libc.so.6
#1  0x00007ffff6b143bb in rl_getc () from /lib64/libreadline.so.8
#2  0x00007ffff6b13cd1 in rl_read_key () from /lib64/libreadline.so.8
#3  0x00007ffff6af8497 in readline_internal_char () from /lib64/libreadline.so.8
#4  0x00007ffff6b01535 in readline () from /lib64/libreadline.so.8
#5  0x00007ffff6c85cd2 in osdReadline (context=0x444dc0, prompt=0x7ffff6c9c183 "epics> ") at ../osi/os/default/gnuReadline.c:70
#6  epicsReadline (prompt=0x7ffff6c9c183 "epics> ", context=0x444dc0) at ../osi/epicsReadline.c:68
#7  0x00007ffff6c77aea in iocshBody (pathname=<optimized out>, commandLine=0x0, macros=0x0) at ../iocsh/iocsh.cpp:1143
#8  0x000000000040a616 in main (argc=<optimized out>, argv=<optimized out>) at ../mcaDanteAppMain.cpp:20

Thank you in advance
Dariush and Maurizio

On 16/06/2026 18:18, Mark Rivers wrote:
Hi Dariush,

Are you using a Dante1 or a Dante8?

Does this happen every time you start, or just occasionally?  If it is occasionally, then how frequently does it happen?

Are there any error messages on the IOC?

Are you running on Linux or Windows? 

If you are running on Linux then please run the IOC in the GNU debugger. You can do that with the following commands from the iocDante1 directory:

gdb ../../bin/linux-x86_64/mcaDanteApp
run st.cmd

When it crashes type this command at the debugger prompt:

backtrace

Send me the output.

Mark





From: Dariush Hampai <dariush.hampai at lnf.infn.it>
Sent: Tuesday, June 16, 2026 10:23 AM
To: Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: Problems with Dante (XGLab) Driver

Hi Community,
Hi Mark,

I have almost finished implementing the Dante EPICS driver in our
system, but I have a big problem (maybe a bug?).
When I start a few acquisitions (with the caput command or from Phoebus), the
system seems to crash.
Dante:dante2:ElapsedRealTime stops (before reaching the target). Moreover, in
Phoebus the status text stays in "Collecting" mode, Acquire Busy stays in
"Acquiring" mode, and the IOC does not respond to any command that I send.
Up to now, the only solution has been to stop the IOC and restart it.
What is the problem?
Is there a way to force a reset/reinitialization of the driver without
stopping and restarting the IOC?

awaiting your (precious) help

Dariush



