On 6/8/21 12:19 PM, Daykin, Evan wrote:
>> , if you think there is a deadlock, I would suggest attaching GDB to a running IOC process and capture all the stack.
>
> So should I run `bt full` on the cbLow thread?
I should have written "... all of the stacks". So all threads, or "thread apply all backtrace".
> -----Original Message-----
> From: Michael Davidsaver <mdavidsaver at gmail.com>
> Sent: Tuesday, June 8, 2021 2:20 PM
> To: Daykin, Evan <daykin at frib.msu.edu>; Mark Rivers <rivers at cars.uchicago.edu>
> Cc: tech-talk at aps.anl.gov
> Subject: Re: cbLow consumption randomly stopping
>
> [EXTERNAL] This email originated from outside of FRIB
>
> On 6/8/21 11:03 AM, Daykin, Evan via Tech-talk wrote:
>
>>> ...
>
>>> Is the driver or device support using asyn? If so, what version?
>
>> No, it seems to be using epicspci/devlib2 to communicate with hardware.
>
>
>
> I think this driver was one that I did back in 2017.
>
>
>
> @Andrew, Mark hardware access is partly through a Linux driver [1], and partly
>
> through MMIO for a custom register range. The Linux driver is public, but the
>
> EPICS driver was FRIB internal. (it would be nice if this could be public as well)
>
>
>
> @Evan, if you think there is a deadlock, I would suggest attaching GDB to a running
>
> IOC process and capture all the stack.
>
>
>
> Looking back, I also added a per-device logging mask (a la asyn). cf. "debugPico()"
>
> at the IOC shell.
>
>
>
>> All cbLow threads show same backtrace.
>
>
>
> These threads are expected to be in epicsEventWait() when idling.
>
> So this by itself does not indicate a deadlock.
>
>
>
>
>
> [1] https://github.com/CAENels/amc-pico8-driver and/or https://github.com/mdavidsaver/amc-pico8-driver
>
>
>
>
>
>> @Andrew, here is some output from attaching gdb:
>
>> (gdb) info threads
>
>> Id Target Id Frame
>
>> 1 Thread 0x7fd5a60a5c80 (LWP 22198) "st.cmd" 0x00007fd5a9d62151 in __pselect (nfds=1, readfds=0x7fff0eb98db0, writefds=0x0, exceptfds=0x0, timeout=<optimized out>, sigmask=<optimized out>) at ../sysdeps/unix/sysv/linux/pselect.c:69
>
>> 2 Thread 0x7fd5a5899700 (LWP 22200) "errlog" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d2538054) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 3 Thread 0x7fd5a5818700 (LWP 22202) "Reader" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d2612600) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 4 Thread 0x7fd5a5797700 (LWP 22203) "capture" __GI___libc_read (nbytes=176, buf=0x7fd598000b20, fd=9) at ../sysdeps/unix/sysv/linux/read.c:26
>
>> 5 Thread 0x7fd5a5716700 (LWP 22204) "Reader" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d26144d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 6 Thread 0x7fd5a5695700 (LWP 22205) "capture" __GI___libc_read (nbytes=176, buf=0x7fd59c000b20, fd=13) at ../sysdeps/unix/sysv/linux/read.c:26
>
>> 7 Thread 0x7fd5a5614700 (LWP 22206) "Reader" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d26163c0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 8 Thread 0x7fd5a5593700 (LWP 22207) "capture" __GI___libc_read (nbytes=176, buf=0x7fd590000b20, fd=17) at ../sysdeps/unix/sysv/linux/read.c:26
>
>> 9 Thread 0x7fd5a5512700 (LWP 22208) "Reader" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d2618350) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 10 Thread 0x7fd5a5491700 (LWP 22209) "capture" __GI___libc_read (nbytes=176, buf=0x7fd594000b20, fd=21) at ../sysdeps/unix/sysv/linux/read.c:26
>
>> 11 Thread 0x7fd5a5410700 (LWP 22217) "taskwd" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd5a540fcf0, expected=0, futex_word=0x55c1d2e56e00) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 12 Thread 0x7fd5a538f700 (LWP 22218) "timerQueue" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd5a538ecc0, expected=0, futex_word=0x55c1d2619b74) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> * 13 Thread 0x7fd5a609a700 (LWP 22219) "cbLow-0" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 14 Thread 0x7fd5a528e700 (LWP 22220) "cbLow-1" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 15 Thread 0x7fd5a508d700 (LWP 22221) "cbLow-2" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 16 Thread 0x7fd5a4e8c700 (LWP 22222) "cbLow-3" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 17 Thread 0x7fd5a4c8b700 (LWP 22223) "cbLow-4" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 18 Thread 0x7fd5a4a8a700 (LWP 22224) "cbLow-5" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 19 Thread 0x7fd5a4889700 (LWP 22225) "cbLow-6" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 20 Thread 0x7fd5a4688700 (LWP 22226) "cbLow-7" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 21 Thread 0x7fd5a4487700 (LWP 22227) "cbMedium" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d31ed8c0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 22 Thread 0x7fd5a4286700 (LWP 22228) "cbHigh" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d31edac0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 23 Thread 0x7fd567fff700 (LWP 22229) "dbCaLink" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d31edeb4) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 24 Thread 0x7fd567dfe700 (LWP 22230) "timerQueue" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd567dfdcc0, expected=0, futex_word=0x55c1d31eef44) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 25 Thread 0x7fd55095c700 (LWP 22233) "scanOnce" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d3883b80) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 26 Thread 0x7fd55075b700 (LWP 22234) "scan-10" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd55075acd0, expected=0, futex_word=0x55c1d3883360) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 27 Thread 0x7fd55055a700 (LWP 22235) "scan-5" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd550559cd0, expected=0, futex_word=0x55c1d38834a0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 28 Thread 0x7fd550359700 (LWP 22236) "scan-2" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd550358cd0, expected=0, futex_word=0x55c1d38835e0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 29 Thread 0x7fd543fff700 (LWP 22237) "scan-1" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd543ffecd0, expected=0, futex_word=0x55c1d3883720) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 30 Thread 0x7fd543dfe700 (LWP 22238) "scan-0.5" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd543dfdcd0, expected=0, futex_word=0x55c1d3883860) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 31 Thread 0x7fd543bfd700 (LWP 22239) "scan-0.2" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd543bfccd0, expected=0, futex_word=0x55c1d38839a0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 32 Thread 0x7fd5439fc700 (LWP 22240) "scan-0.1" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd5439fbcd0, expected=0, futex_word=0x55c1d3883ae0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 33 Thread 0x7fd5437fb700 (LWP 22241) "asCaTask" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d3927c70) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 34 Thread 0x7fd5435fa700 (LWP 22242) "ipToAsciiProxy" futex_wait_cancelable (private=0, expected=0, futex_word=0x7fd52c00fa34) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 35 Thread 0x7fd56418c700 (LWP 22243) "timerQueue" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd56418bcc0, expected=0, futex_word=0x7fd52c010224) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 36 Thread 0x7fd550158700 (LWP 22244) "CAC-UDP" 0x00007fd5a9d6b742 in __libc_recvfrom (fd=26, buf=0x7fd52c0127d8, len=65551, flags=0, addr=..., addrlen=0x7fd550157ca8) at ../sysdeps/unix/sysv/linux/recvfrom.c:27
>
>> 37 Thread 0x7fd5433f9700 (LWP 22245) "CAC-TCP-recv" __libc_recv (flags=0, len=16384, buf=0x7fd54307b04c, fd=28) at ../sysdeps/unix/sysv/linux/recv.c:28
>
>> 38 Thread 0x7fd5431f8700 (LWP 22246) "CAC-TCP-send" futex_wait_cancelable (private=0, expected=0, futex_word=0x7fd52000e830) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 39 Thread 0x7fd543076700 (LWP 22256) "CAS-TCP" 0x00007fd5a9d6b437 in __libc_accept (fd=29, addr=..., len=0x7fd543075cec) at ../sysdeps/unix/sysv/linux/accept.c:26
>
>> 40 Thread 0x7fd542f75700 (LWP 22257) "CAS-UDP" 0x00007fd5a9d6b742 in __libc_recvfrom (fd=30, buf=0x7fd50c001a50, len=65551, flags=0, addr=..., addrlen=0x7fd542f74ce8) at ../sysdeps/unix/sysv/linux/recvfrom.c:27
>
>> 41 Thread 0x7fd542e74700 (LWP 22258) "CAS-beacon" 0x00007fd5a9d37720 in __GI___nanosleep (requested_time=0x7fd542e73c50, remaining=0x7fd542e73c60) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
>
>> 42 Thread 0x7fd542df3700 (LWP 22259) "CAC-event" futex_wait_cancelable (private=0, expected=0, futex_word=0x7fd56008ccd0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 43 Thread 0x7fd542cf2700 (LWP 22260) "logRestart" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd542cf1d30, expected=0, futex_word=0x55c1d39310d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 44 Thread 0x7fd542c71700 (LWP 22261) "caPutLog" futex_wait_cancelable (private=0, expected=0, futex_word=0x7fd4f8000b90) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 45 Thread 0x7fd542bf0700 (LWP 22262) "logRestart" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd542befd30, expected=0, futex_word=0x55c1d393f270) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 46 Thread 0x7fd542b6f700 (LWP 22263) "save_restore" futex_abstimed_wait_cancelable (private=0, abstime=0x7fd542b6e880, expected=0, futex_word=0x7fd4fc0204e0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
>
>> 47 Thread 0x7fd54296e700 (LWP 22264) "reccaster-etcd" 0x00007fd5a9d37720 in __GI___nanosleep (requested_time=0x7fd54296dc40, remaining=0x7fd54296dc50) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
>
>> 48 Thread 0x7fd5428ed700 (LWP 22265) "recEtcdConsumer" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d61a30d0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 49 Thread 0x7fd54286c700 (LWP 22266) "recEtcdConsumer" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d61a3380) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 50 Thread 0x7fd5427eb700 (LWP 22267) "recEtcdConsumer" futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d61a3630) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> -----
>
>> (gdb) thread 13
>
>> [Switching to thread 13 (Thread 0x7fd5a609a700 (LWP 22219))]
>
>> #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> 88 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
>
>> ------------
>
>> (gdb) bt
>
>> #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55c1d321ba40) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
>> #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x55c1d321b9f0, cond=0x55c1d321ba18) at pthread_cond_wait.c:502
>
>> #2 __pthread_cond_wait (cond=0x55c1d321ba18, mutex=0x55c1d321b9f0) at pthread_cond_wait.c:655
>
>> #3 0x00007fd5a9f3051b in epicsEventWait () from /lib/x86_64-linux-gnu/libCom.so.3.15.8
>
>> #4 0x00007fd5a9f28c99 in epicsEventMustWait () from /lib/x86_64-linux-gnu/libCom.so.3.15.8
>
>> #5 0x00007fd5a9f9c251 in ?? () from /lib/x86_64-linux-gnu/libdbCore.so.3.15.8
>
>> #6 0x00007fd5a9f2da7b in ?? () from /lib/x86_64-linux-gnu/libCom.so.3.15.8
>
>> #7 0x00007fd5a98e9fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
>
>> #8 0x00007fd5a9d6a4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
>>
>
>> All cbLow threads show same backtrace.
>
>>
>
>> Thanks for the help,
>
>> Evan
>
>>
>
>> -----Original Message-----
>
>> From: Mark Rivers <rivers at cars.uchicago.edu>
>
>> Sent: Tuesday, June 8, 2021 1:27 PM
>
>> To: Daykin, Evan <daykin at frib.msu.edu>
>
>> Cc: tech-talk at aps.anl.gov
>
>> Subject: Re: cbLow consumption randomly stopping
>
>>
>
>> [EXTERNAL] This email originated from outside of FRIB One possibility is that the thread the pulls the message out of the queue has locked up for some reason.One possibility is a deadlock. When this happens you could try typingepicsMutexShowAll 1several times. If the same mutexes are locked each time you type that, then there is likely a deadlock.What OS is this?What version of EPICS base?Is the driver or device support using asyn? If so, what version?Mark________________________________From: Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of Daykin, Evan via Tech-talk <tech-talk at aps.anl.gov>Sent: Tuesday, June 8, 2021 12:10 PMTo: EPICS tech-talkSubject: cbLow consumption randomly stoppingHello tech-talk,We have an IOC repeatedly locking up with the infamous "cbLow ring buffer full" message. The IOC consumes data from our CAEN Pico-8 card via an EPICS driver at 100Hz (about 1200-1215 events per second total) , and at some point on the order of hours to days, seems to stop consuming callbacks from the queue. I wrote a script to track this, and it seems to all be happening in one shot. Below is some data I captured from the last time this happened. Before this, everything was happy for about 4 hours, but sometimes it takes up to a week for this to happen.TimeHigh-waterItems in qQ capacity%Oflws6/7/2021 19:22:4013012000006/7/2021 19:22:4113012000006/7/2021 19:22:4213012000006/7/2021 19:22:4413012000006/7/2021 19:22:45571571120004.806/7/2021 19:22:46177817781200014.806/7/2021 19:22:47299129911200024.906/7/2021 19:22:4842044204120003506/7/2021 19:22:4954065406120004506/7/2021 19:22:50661966191200055.206/7/2021 19:22:51783478341200065.306/7/2021 19:22:52904890481200075.406/7/2021 19:22:5310249102491200085.406/7/2021 19:22:5411462114621200095.506/7/2021 19:22:5612000120001200010016/7/2021 19:22:581200012000120001001Has anyone else experienced this? If so, what is the remedy? I can't seem to correlate it to anything unusual happening in the OS/kernel.Thanks,Evan
>
>>
>
>
>
- Replies:
- RE: cbLow consumption randomly stopping Daykin, Evan via Tech-talk
- References:
- cbLow consumption randomly stopping Daykin, Evan via Tech-talk
- Re: cbLow consumption randomly stopping Mark Rivers via Tech-talk
- RE: cbLow consumption randomly stopping Daykin, Evan via Tech-talk
- Re: cbLow consumption randomly stopping Michael Davidsaver via Tech-talk
- RE: cbLow consumption randomly stopping Daykin, Evan via Tech-talk
- Navigate by Date:
- Prev:
Re: cbLow consumption randomly stopping Johnson, Andrew N. via Tech-talk
- Next:
Power Meters device support Luchini, Kristi L. via Tech-talk
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
<2021>
2022
2023
2024
- Navigate by Thread:
- Prev:
RE: cbLow consumption randomly stopping Daykin, Evan via Tech-talk
- Next:
RE: cbLow consumption randomly stopping Daykin, Evan via Tech-talk
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
<2021>
2022
2023
2024
|