Hi Mark,
Hi Andrew,
using Claude, I may have found the problem (but not the solution...).
In answer to what Claude asked, I use the Dante in 4096-channel mode,
not in mapping mode, and without a trigger.
Claude's answer was:
There are two key threads:
Thread 17 "acquisitionTask":
#2 epicsThreadSleep
#3 Dante::acquisitionTask (this=0x594560) at ../dante.cpp:1289
Thread 8 "DANTE1":
#2 epicsThreadSleep
#3 Dante::writeInt32 (this=0x594560, value=1) at ../dante.cpp:461
#4 asynPortDriver::writeInt32
#5 processCallbackOutput (devAsynInt32.c)
#6 portThread (asynManager.c)
1) Thread 8 "DANTE1" is the dedicated thread for the asyn port
(portThread in asynManager.c) — the one EPICS uses to serialize
all read/write requests on that asyn device. It is blocked inside
Dante::writeInt32 with value=1, which with very high probability
corresponds exactly to the EraseStart command you sent. This
function internally calls epicsThreadSleep — so writeInt32 is
waiting in a blocking, synchronous manner for something to happen
(most likely for the acquisition to signal "done").
2) Thread 17 "acquisitionTask" is the internal thread of the Dante
driver that should handle the acquisition loop, also stopped on
epicsThreadSleep inside a loop at dante.cpp:1289. This is almost
certainly a polling loop that periodically checks the acquisition
status (while acquiring → sleep → check status).
The critical point: if acquisitionTask never sees the "acquisition
completed" condition — due to a lost event, a race condition, or a
status flag not being correctly updated by the firmware/board — it
remains stuck in this loop forever. And since writeInt32 (called
by the EraseStart command) most likely waits for this very
completion before returning, the asyn port thread also remains
blocked.
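The waiting pattern described above can be sketched as follows. This is
only an illustration under stated assumptions, not the code in dante.cpp:
the names waitForCompletion, isDone, pollTime and timeout are hypothetical,
and std::this_thread::sleep_for stands in for epicsThreadSleep. It shows
how a poll-and-sleep wait with a deadline returns an error to its caller
when the completion condition never becomes true, instead of looping
forever.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not taken from dante.cpp.
// Polls isDone() every pollTime until it returns true or timeout elapses.
// Returns true if completion was observed, false if the wait timed out.
bool waitForCompletion(const std::function<bool()>& isDone,
                       std::chrono::milliseconds pollTime,
                       std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!isDone()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up instead of blocking the caller forever
        }
        std::this_thread::sleep_for(pollTime);  // stand-in for epicsThreadSleep
    }
    return true;
}
```

Calling waitForCompletion with a predicate that never returns true (a
simulated lost completion event) returns false once the timeout has
elapsed. The same loop without the deadline check would never return,
which is the behaviour the backtrace suggests.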
Why the IOC appears "frozen" and no commands work:
This is the most important detail: in asyn, a single thread
(portThread) serializes all requests on that port. If that thread
is blocked inside writeInt32, no other command can be processed on
that device — not even StopAll or Reset, because they end up
queued behind the blocked request. This is why the system appears
completely frozen and unresponsive to anything until you restart
the IOC.
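The serialization effect can be reproduced with a small standalone model.
This is a sketch under assumptions, not asynManager code: demoBlockedPort,
BlockedPortResult and the two queued lambdas are hypothetical stand-ins
for the port thread, a write that waits for completion, and a Stop command
queued behind it.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <queue>
#include <thread>

// Hypothetical model of a single port thread; not asynManager code.
struct BlockedPortResult {
    bool stopRanWhileBlocked;  // did "Stop" run while the first request was stuck?
    bool stopRanAfterRelease;  // did "Stop" run once the first request returned?
};

BlockedPortResult demoBlockedPort()
{
    std::atomic<bool> acquisitionDone{false};
    std::atomic<bool> stopRan{false};

    std::queue<std::function<void()>> requests;
    // Request 1: stands in for a write that waits for completion.
    requests.push([&]() {
        while (!acquisitionDone.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    // Request 2: stands in for a Stop command queued behind it.
    requests.push([&]() { stopRan.store(true); });

    // The single port thread runs the requests strictly in FIFO order.
    std::thread portThread([&]() {
        while (!requests.empty()) {
            requests.front()();
            requests.pop();
        }
    });

    // Give the port thread time to reach the blocked request.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    BlockedPortResult result;
    result.stopRanWhileBlocked = stopRan.load();

    // Simulate the completion event finally arriving.
    acquisitionDone.store(true);
    portThread.join();
    result.stopRanAfterRelease = stopRan.load();
    return result;
}
```

In this model stopRanWhileBlocked is always false and stopRanAfterRelease
is always true: the second request cannot start until the first one
returns, no matter how long the caller waits.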
Confirmation from official documentation:
This matches exactly what I found in the Performance page: in
mapping mode with 2048 and 1024 channels, above a certain trigger
frequency spectra are lost and the EPICS IOC must be restarted.
It's very likely you're running into a variant of the same problem
— a race condition in the driver where an acquisition completion
event is lost, leaving acquisitionTask (and therefore
writeInt32/EraseStart) waiting indefinitely.
What you can do:
1) There is no documented clean software reset for this scenario,
because the thread that should process any reset command is
precisely the one that is blocked. Restarting the IOC remains,
according to the documentation itself, the official solution.
2) Preventive mitigation — since the problem seems related to a
lost event/state during acquisition, try to:
a) Slightly increase PollTime (e.g., from 0.01 to 0.02-0.05s)
to give the firmware more margin to communicate the status
b) If you are in mapping mode with 1024/2048 channels and high
trigger frequencies, evaluate whether you can switch to 4096
channels, where (according to the Performance table) the limits
are much higher
c) Verify that the Dante board firmware is at the latest
version — bugs of this type are often fixed on the firmware side
3) If the problem persists, it's worth reporting it directly to
the EPICS community or to Dante/XGLab support. The backtrace you
produced is exactly the kind of evidence needed for a bug report,
because it precisely identifies the lines (dante.cpp:1289 and
dante.cpp:461) where the driver gets stuck.
Dariush
On 23/06/2026 14:51, Dariush Hampai via
Tech-talk wrote:
Hi Mark,
Hi Andrew,
after carefully reading the output of "(gdb) thread apply all bt" I
found a strange thread near the end:
Thread 256 (Thread 0x7fff669a2640 (LWP 1412432) "save_restore"):
#0 0x00007ffff58884da in __futex_abstimed_wait_common () from
/lib64/libc.so.6
#1 0x00007ffff588acaf in pthread_cond_timedwait@@GLIBC_2.3.2
() from /lib64/libc.so.6
#2 0x00007ffff6c8578d in epicsEventWaitWithTimeout
(pevent=0x7fff78020330, timeout=<optimized out>) at
../osi/os/posix/osdEvent.c:131
#3 0x00007ffff6c86c27 in myReceive (timeout=<optimized
out>, size=512, message=0x200, pmsg=0xa724a0) at
../osi/os/default/osdMessageQueue.cpp:369
#4 epicsMessageQueueReceiveWithTimeout (pmsg=0xa724a0,
message=message@entry=0x7fff669a1a40, size=size@entry=512,
timeout=1) at ../osi/os/default/osdMessageQueue.cpp:404
#5 0x00007ffff73d5796 in save_restore () at
../save_restore.c:1226
#6 0x00007ffff6c82795 in start_routine (arg=0x3384670) at
../osi/os/posix/osdThread.c:442
#7 0x00007ffff588b4f9 in start_thread () from /lib64/libc.so.6
#8 0x00007ffff59106e0 in clone3 () from /lib64/libc.so.6
This is the only place where the "save_restore" thread appears, and
the only place where "epicsMessageQueueReceiveWithTimeout" is called.
Could this be the problem?
Dariush
On 23/06/2026 14:05, Dariush Hampai
via Tech-talk wrote:
Hi Mark,
Hi Andrew,
any idea?
Dariush
On 22/06/2026 11:25, Dariush Hampai
wrote:
Dear Mark
Dear Andrew,
I reproduced the crash and the backtrace looks the same.
#0 0x00007ffff5904b92 in pselect () from /lib64/libc.so.6
#1 0x00007ffff6b143bb in rl_getc () from
/lib64/libreadline.so.8
#2 0x00007ffff6b13cd1 in rl_read_key () from
/lib64/libreadline.so.8
#3 0x00007ffff6af8497 in readline_internal_char () from
/lib64/libreadline.so.8
#4 0x00007ffff6b01535 in readline () from
/lib64/libreadline.so.8
#5 0x00007ffff6c85cd2 in osdReadline (context=0x444dc0,
prompt=0x7ffff6c9c183 "epics> ") at
../osi/os/default/gnuReadline.c:70
#6 epicsReadline (prompt=0x7ffff6c9c183 "epics> ",
context=0x444dc0) at ../osi/epicsReadline.c:68
#7 0x00007ffff6c77aea in iocshBody (pathname=<optimized
out>, commandLine=0x0, macros=0x0) at
../iocsh/iocsh.cpp:1143
#8 0x000000000040a616 in main (argc=<optimized out>,
argv=<optimized out>) at ../mcaDanteAppMain.cpp:20
I have attached the output of "thread apply all
backtrace".
I look forward to your replies.
Dariush
On 19/06/2026 17:13, Johnson,
Andrew N. wrote:
Hi Dariush,
Please also include any messages output just before and
announcing the crash. Instead of just the gdb command
backtrace
first run
set height 0
to disable the pager, and then
thread apply all backtrace
which will produce lots of output that may help Mark
diagnose the problem.
- Andrew
--
Complexity comes for free, Simplicity you have to work
for.
What did you do that triggered the crash this time?
Please continue to run the IOC using gdb. Each time it
crashes save the output of backtrace. We need to see if
it is always crashing in the readline library.
Mark
Dear Mark,
I don't know if it is the same for all the previous crashes...
however the effects are the same...
Dariush
On 19/06/2026 16:42, Mark Rivers wrote:
Hi Dariush,
The gdb backtrace says that the crash is actually in
the Linux readline library. Was this crash caused by
the same sequence of events as previous crashes you
observed?
Mark
Hi Mark,
Are you using a Dante1 or a Dante8?
I'm using a Dante8.
Does this happen every time you start, or just
occasionally? If it is occasionally, then how
frequently does it happen?
Occasionally, and more often when two records are
processed in quick succession.
Are there any error messages on the IOC?
No, but some records remain in the "acquire" state (such as
$(P)$(M).ACQG in the mca window).
Are you running on Linux or Windows?
Linux (CentOS 9)
If you are running on Linux then please run the IOC in
the GNU debugger. You can do that with the following
commands from the iocDante1 directory:
gdb ../../bin/linux-x86_64/mcaDanteApp
run st.cmd
When it crashes type this command at the debugger
prompt:
backtrace
(gdb) backtrace
#0 0x00007ffff5904b92 in pselect () from
/lib64/libc.so.6
#1 0x00007ffff6b143bb in rl_getc () from
/lib64/libreadline.so.8
#2 0x00007ffff6b13cd1 in rl_read_key () from
/lib64/libreadline.so.8
#3 0x00007ffff6af8497 in readline_internal_char ()
from /lib64/libreadline.so.8
#4 0x00007ffff6b01535 in readline () from
/lib64/libreadline.so.8
#5 0x00007ffff6c85cd2 in osdReadline
(context=0x444dc0, prompt=0x7ffff6c9c183 "epics> ")
at ../osi/os/default/gnuReadline.c:70
#6 epicsReadline (prompt=0x7ffff6c9c183 "epics> ",
context=0x444dc0) at ../osi/epicsReadline.c:68
#7 0x00007ffff6c77aea in iocshBody
(pathname=<optimized out>, commandLine=0x0,
macros=0x0) at ../iocsh/iocsh.cpp:1143
#8 0x000000000040a616 in main (argc=<optimized
out>, argv=<optimized out>) at
../mcaDanteAppMain.cpp:20
Thank you in advance
Dariush e Maurizio
On 16/06/2026 18:18, Mark Rivers wrote:
Hi Dariush,
Are you using a Dante1 or a Dante8?
Does this happen every time you start, or just
occasionally? If it is occasionally, then how
frequently does it happen?
Are there any error messages on the IOC?
Are you running on Linux or Windows?
If you are running on Linux then please run the IOC
in the GNU debugger. You can do that with the
following commands from the iocDante1 directory:
gdb ../../bin/linux-x86_64/mcaDanteApp
run st.cmd
When it crashes type this command at the debugger
prompt:
backtrace
Send me the output.
Mark
Hi Community,
Hi Mark,
I have almost finished integrating the Dante EPICS driver into our
system, but I have a big problem (maybe a bug?).
When I start a few acquisitions (with the caput command or from
Phoebus), the system seems to crash.
Dante:dante2:ElapsedRealTime stops counting before it reaches the
target. Moreover, in Phoebus the status text stays at "Collecting",
Acquire Busy stays at "Acquiring", and the IOC does not respond to any
command that I send.
So far, the only solution is to stop the IOC and restart it.
What is the problem?
Is it possible to force a reset or reinitialization of the driver
without stopping and restarting the IOC?
I look forward to your (precious) help.
Dariush
--
************************************
Dr. Dariush Hampai, PhD
INFN - LNF
X-Lab Frascati
Via E. Fermi, 54 (ex 40)
I-00044 Frascati (RM)
Italy
Mail Address:
XLab-Frascati
LNF-INFN
Casella Postale 13
Frascati (RM)
Italy
Room: +39.06.9403.5248
Lab.: +39.06.9403.2286
Mob.: +39.06.9403.8025
Fax.: +39.06.9403.2597
************************************