| Hi Mark, the IOC doesn’t exit; it simply remains in the “acquiring” state without exiting (even the StopAll command doesn’t work). I increased the poll time (from 0.01 to 0.1), as Claude suggested, and it apparently works. Tomorrow I’ll start a complete session (so there will be many acquisitions) to see whether the problem remains. In two days I’ll attach a complete report about the threads…
Dariush
************************************
Dr. Dariush Hampai, PhD
INFN - LNF X-Lab Frascati Via E. Fermi, 54 (ex 40) I-00044 Frascati (RM) Italy
Mail Address: XLab-Frascati LNF-INFN Casella Postale 13 Frascati (RM) Italy Room: +39.06.9403.5248 Lab.: +39.06.9403.2286 Mob.: +39.06.9403.8025 Fax.: +39.06.9403.2597
************************************
On 24 Jun 2026, at 15:47, Mark Rivers <rivers at cars.uchicago.edu> wrote:
Hi Dariush,
I am a bit confused. In your original message you said:
That led me to understand that the IOC was crashing, i.e. exiting with an error.
I then asked you to run the IOC under gdb and send me the backtrace when it crashed. You did that, and it showed the error was in the readline library.
But now I am wondering if it really crashed. What did you do to get to the gdb prompt and type "backtrace"? Did you type ^C, i.e. did you force it to crash?
You need to be clear on whether it is "crashing" or "hanging".
To make progress on this you need to do the following:
- Tell us exactly what versions of each module you are using (base, asyn, mca, dante).
- Send a screenshot of the Dante screen when it crashes so we know how you have configured it.
- Run the IOC under gdb and send the complete output from when you first start gdb to when it crashes and you type "backtrace".
Mark
Hi Mark, Hi Andrew,
using Claude, maybe I found the problem (but not the solution...). In answer to Claude's question, I use Dante in 4096-channel mode, not in mapping mode and without a trigger. Claude answered:
There are two key threads:
Thread 17 "acquisitionTask":
#2 epicsThreadSleep
#3 Dante::acquisitionTask (this=0x594560) at ../dante.cpp:1289
Thread 8 "DANTE1":
#2 epicsThreadSleep
#3 Dante::writeInt32 (this=0x594560, value=1) at ../dante.cpp:461
#4 asynPortDriver::writeInt32
#5 processCallbackOutput (devAsynInt32.c)
#6 portThread (asynManager.c)
1) Thread 8 "DANTE1" is the dedicated thread for the asyn port (portThread in asynManager.c), the one EPICS uses to serialize all read/write requests on that asyn device. It is blocked inside Dante::writeInt32 with value=1, which with very high probability corresponds exactly to the EraseStart command you sent. This function internally calls epicsThreadSleep, so writeInt32 is waiting in a blocking, synchronous manner for something to happen (most likely for the acquisition to signal "done").
2) Thread 17 "acquisitionTask" is the internal thread of the Dante driver that should handle the acquisition loop, also stopped on epicsThreadSleep inside a loop at dante.cpp:1289. This is almost certainly a polling loop that periodically checks the acquisition status (while acquiring → sleep → check status).
The critical point: if acquisitionTask never sees the "acquisition completed" condition — due to a lost event, a race condition, or a status flag not being correctly updated by the firmware/board — it remains stuck in this loop forever. And since writeInt32 (called by the EraseStart command) most likely waits for this very completion before returning, the asyn port thread also remains blocked.
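As an illustration of the polling pattern described above, here is a minimal sketch in Python (not the driver's C++ code). The function name `poll_until_done`, the `read_status` callback, and the `max_wait` parameter are all hypothetical; `max_wait` exists only so the demonstration can terminate, and nothing in this thread says the Dante driver has such a deadline.

```python
import time

def poll_until_done(read_status, poll_time=0.01, max_wait=None):
    """Poll read_status() until it returns True.

    Returns True when completion is seen, False if max_wait seconds pass first.
    With max_wait=None the loop can only end when completion is reported,
    so a lost "done" event keeps it sleeping and re-checking forever.
    """
    start = time.monotonic()
    while not read_status():
        if max_wait is not None and time.monotonic() - start >= max_wait:
            return False
        time.sleep(poll_time)   # analogous to the sleep seen in the backtrace
    return True

if __name__ == "__main__":
    # Hypothetical status source that never reports completion.
    finished = poll_until_done(lambda: False, poll_time=0.01, max_wait=0.1)
    print("completed" if finished else "gave up waiting")
```

Without the deadline, the second call would never return, which is the behaviour the two sleeping threads in the backtrace are consistent with.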
Why the IOC appears "frozen" and no commands work:
This is the most important detail: in asyn, a single thread (portThread) serializes all requests on that port. If that thread is blocked inside writeInt32, no other command can be processed on that device — not even StopAll or Reset, because they end up queued behind the blocked request. This is why the system appears completely frozen and unresponsive to anything until you restart the IOC.
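The serialization effect can be sketched with a toy model in Python: one worker thread takes requests from a queue in order, so a request that blocks delays everything behind it. This is only an analogy for a single port thread, not asyn code; `port_thread`, `run_demo`, and the handler names are hypothetical, and the wait has a timeout purely so the example finishes.

```python
import queue
import threading
import time

def port_thread(requests, log):
    """Process queued requests strictly one at a time."""
    while True:
        name, handler = requests.get()
        if handler is None:              # sentinel: stop the worker
            break
        handler()
        log.append((name, time.monotonic()))

def run_demo(block_seconds=0.3):
    done = threading.Event()             # hypothetical "acquisition done" flag, never set
    requests = queue.Queue()
    log = []

    def erase_start():
        # Blocking write; the timeout only keeps this demonstration finite.
        done.wait(timeout=block_seconds)

    def stop_all():
        pass                             # would return at once if it got to run

    worker = threading.Thread(target=port_thread, args=(requests, log))
    worker.start()
    t0 = time.monotonic()
    requests.put(("EraseStart", erase_start))
    requests.put(("StopAll", stop_all))
    requests.put(("shutdown", None))
    worker.join()
    return [(name, t - t0) for name, t in log]

if __name__ == "__main__":
    for name, elapsed in run_demo():
        print(f"{name} finished after {elapsed:.2f} s")
```

In this model StopAll completes only after the blocking request gives up; if the wait had no timeout, StopAll would never run at all.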
Confirmation from official documentation:
This matches exactly what I found in the Performance page: in mapping mode with 2048 and 1024 channels, above a certain trigger frequency spectra are lost and the EPICS IOC must be restarted. It's very likely you're running into a variant of the same problem: a race condition in the driver where an acquisition completion event is lost, leaving acquisitionTask (and therefore writeInt32/EraseStart) waiting indefinitely.
What you can do:
1) There is no documented clean software reset for this scenario, because the thread that should process any reset command is precisely the one that is blocked. Restarting the IOC remains, according to the documentation itself, the official solution.
2) Preventive mitigation: since the problem seems related to a lost event/state during acquisition, try to:
a) Slightly increase PollTime (e.g., from 0.01 to 0.02-0.05 s) to give the firmware more margin to communicate the status
b) If you are in mapping mode with 1024/2048 channels and high trigger frequencies, evaluate whether you can switch to 4096 channels, where (according to the Performance table) the limits are much higher
c) Verify that the Dante board firmware is at the latest version; bugs of this type are often fixed on the firmware side
3) If the problem persists, it's worth reporting it directly to the EPICS community or to Dante/XGLab support. The backtrace you produced is exactly the kind of evidence needed for a bug report, because it precisely identifies the lines (dante.cpp:1289 and dante.cpp:461) where the driver gets stuck.
Dariush
On 23/06/2026 14:51, Dariush Hampai via Tech-talk wrote:
Hi Mark, Hi Andrew,
after carefully reading the output of "(gdb) thread apply all bt" I found a strange thread almost at the end...
Thread 256 (Thread 0x7fff669a2640 (LWP 1412432) "save_restore"):
#0 0x00007ffff58884da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1 0x00007ffff588acaf in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2 0x00007ffff6c8578d in epicsEventWaitWithTimeout (pevent=0x7fff78020330, timeout=<optimized out>) at ../osi/os/posix/osdEvent.c:131
#3 0x00007ffff6c86c27 in myReceive (timeout=<optimized out>, size=512, message=0x200, pmsg=0xa724a0) at ../osi/os/default/osdMessageQueue.cpp:369
#4 epicsMessageQueueReceiveWithTimeout (pmsg=0xa724a0, message=message@entry=0x7fff669a1a40, size=size@entry=512, timeout=1) at ../osi/os/default/osdMessageQueue.cpp:404
#5 0x00007ffff73d5796 in save_restore () at ../save_restore.c:1226
#6 0x00007ffff6c82795 in start_routine (arg=0x3384670) at ../osi/os/posix/osdThread.c:442
#7 0x00007ffff588b4f9 in start_thread () from /lib64/libc.so.6
#8 0x00007ffff59106e0 in clone3 () from /lib64/libc.so.6
Only here is the thread named "save_restore", and only here does "epicsMessageQueueReceiveWithTimeout" appear. Maybe this is the problem?
Dariush
On 23/06/2026 14:05, Dariush Hampai via Tech-talk wrote:
Hi Mark, Hi Andrew, any idea?
Dariush
On 22/06/2026 11:25, Dariush Hampai wrote:
Dear Mark, Dear Andrew,
I reproduced the crash and it seems the same.
#0 0x00007ffff5904b92 in pselect () from /lib64/libc.so.6
#1 0x00007ffff6b143bb in rl_getc () from /lib64/libreadline.so.8
#2 0x00007ffff6b13cd1 in rl_read_key () from /lib64/libreadline.so.8
#3 0x00007ffff6af8497 in readline_internal_char () from /lib64/libreadline.so.8
#4 0x00007ffff6b01535 in readline () from /lib64/libreadline.so.8
#5 0x00007ffff6c85cd2 in osdReadline (context=0x444dc0, prompt=0x7ffff6c9c183 "epics> ") at ../osi/os/default/gnuReadline.c:70
#6 epicsReadline (prompt=0x7ffff6c9c183 "epics> ", context=0x444dc0) at ../osi/epicsReadline.c:68
#7 0x00007ffff6c77aea in iocshBody (pathname=<optimized out>, commandLine=0x0, macros=0x0) at ../iocsh/iocsh.cpp:1143
#8 0x000000000040a616 in main (argc=<optimized out>, argv=<optimized out>) at ../mcaDanteAppMain.cpp:20
I'll attach the output of "thread apply all backtrace". Awaiting your replies...
Dariush
On 19/06/2026 17:13, Johnson, Andrew N. wrote:
Hi Dariush,
Please also include any messages output just before and announcing the crash. Instead of just the gdb command "backtrace", first run "set height 0" to disable the pager and then "thread apply all backtrace", which will produce lots of output that may help Mark diagnose the problem.
- Andrew
-- Complexity comes for free, Simplicity you have to work for.
What did you do that triggered the crash this time?
Please continue to run the IOC using gdb. Each time it crashes save the output of backtrace. We need to see if it is always crashing in the readline library.
Mark
Dear Mark, I don't know if it is the same for all the previous crashes... however the effects are the same... Dariush
On 19/06/2026 16:42, Mark Rivers wrote:
Hi Dariush,
The gdb backtrace says that the crash is actually in the Linux readline library. Was this crash caused by the same sequence of events as previous crashes you observed?
Mark
Hi Mark,
Are you using a Dante1 or a Dante8?
I'm using a Dante8.
Does this happen every time you start, or just occasionally? If it is occasionally, then how frequently does it happen?
Occasionally, more often when two records are processed in quick succession.
Are there any error messages on the IOC?
No; however, some records remain stuck in the "acquire" state (such as $(P)$(M).ACQG in the mca window).
Are you running on Linux or Windows?
Linux (CentOS 9).
If you are running on Linux then please run the IOC in the GNU debugger. You can do that with the following commands from the iocDante1 directory:
gdb ../../bin/linux-x86_64/mcaDanteApp
run st.cmd
When it crashes type this command at the debugger prompt:
backtrace
(gdb) backtrace
#0 0x00007ffff5904b92 in pselect () from /lib64/libc.so.6
#1 0x00007ffff6b143bb in rl_getc () from /lib64/libreadline.so.8
#2 0x00007ffff6b13cd1 in rl_read_key () from /lib64/libreadline.so.8
#3 0x00007ffff6af8497 in readline_internal_char () from /lib64/libreadline.so.8
#4 0x00007ffff6b01535 in readline () from /lib64/libreadline.so.8
#5 0x00007ffff6c85cd2 in osdReadline (context=0x444dc0, prompt=0x7ffff6c9c183 "epics> ") at ../osi/os/default/gnuReadline.c:70
#6 epicsReadline (prompt=0x7ffff6c9c183 "epics> ", context=0x444dc0) at ../osi/epicsReadline.c:68
#7 0x00007ffff6c77aea in iocshBody (pathname=<optimized out>, commandLine=0x0, macros=0x0) at ../iocsh/iocsh.cpp:1143
#8 0x000000000040a616 in main (argc=<optimized out>, argv=<optimized out>) at ../mcaDanteAppMain.cpp:20
Thank you in advance,
Dariush and Maurizio
On 16/06/2026 18:18, Mark Rivers wrote:
Hi Dariush,
Are you using a Dante1 or a Dante8?
Does this happen every time you start, or just occasionally? If it is occasionally, then how frequently does it happen?
Are there any error messages on the IOC?
Are you running on Linux or Windows?
If you are running on Linux then please run the IOC in the GNU debugger. You can do that with the following commands from the iocDante1 directory:
gdb ../../bin/linux-x86_64/mcaDanteApp
run st.cmd
When it crashes type this command at the debugger prompt:
backtrace
Send me the output.
Mark
Hi Community, Hi Mark,
I have almost finished implementing the Dante EPICS driver in our system, but I have a big problem (maybe a bug?). When I start a few acquisitions (with the caput command or from Phoebus), the system seems to crash. Dante:dante2:ElapsedRealTime stops before reaching the target. Moreover, in Phoebus the status text stays at "Collecting", Acquire Busy stays at "Acquiring", and the IOC does not respond to any command that I send. Up to now, the only solution is to stop the IOC and restart it. What is the problem? Is it possible to force a reset or reinitialization of the driver without stopping and restarting the IOC?
awaiting your (precious) help
Dariush
|