EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: EPICS stopped
From: Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov>
To: "Tagger, Jueri" <jtagger at bnl.gov>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Wed, 19 May 2021 12:52:02 -0700
On 5/19/21 12:44 PM, Michael Davidsaver wrote:
> One thread has triggered an assertion failure while holding a record scan lock and an event queue lock.
> There should be something in the IOC log about this.
> 
> https://github.com/epics-base/epics-base/blob/c7e42fab3c10f6b0c7464482f507e8ffac502937/src/ioc/db/dbEvent.c#L781

cf. https://bugs.launchpad.net/epics-base/+bug/541371


>> Thread 61 (Thread 0x7f394bfff700 (LWP 82077)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1  0x00007f3a04a9a47b in epicsEventWait () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #2  0x00007f3a04a96c7d in epicsAssert () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #3  0x00007f3a04f5fedb in db_queue_event_log () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #4  0x00007f3a04f6012d in db_post_single_event () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #5  0x00007f3a04f8a1e1 in event_add_action () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #6  0x00007f3a04f8b72e in camessage () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #7  0x00007f3a04f881e3 in camsgtask () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #8  0x00007f3a04a97d0c in start_routine () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #9  0x00007f3a03c72064 in start_thread (arg=0x7f394bfff700) at pthread_create.c:309
>> #10 0x00007f3a03f6f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
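The Thread 61 trace above shows epicsAssert sitting in epicsEventWait, which suggests the failing thread is parked rather than terminated, so any lock it held at that point stays held. The following is a minimal Python sketch of that pattern only, not EPICS code: `soft_assert`, `scan_lock` and `try_scan` are hypothetical names chosen for illustration.

```python
import threading

scan_lock = threading.Lock()        # stands in for a record scan lock (hypothetical)
never_resumed = threading.Event()   # never set: models a thread that stays suspended


def soft_assert(condition):
    """Hypothetical assert that suspends the caller instead of aborting."""
    if not condition:
        never_resumed.wait()        # blocks forever, like the wait seen in frame #1


def failing_worker(holding):
    with scan_lock:
        holding.set()               # signal that the lock is now held
        soft_assert(False)          # suspends here; the lock is never released


def try_scan(timeout=0.2):
    """Return True if the lock could be taken within the timeout."""
    got = scan_lock.acquire(timeout=timeout)
    if got:
        scan_lock.release()
    return got


holding = threading.Event()
threading.Thread(target=failing_worker, args=(holding,), daemon=True).start()
holding.wait()                      # wait until the worker owns the lock
print("another thread can take the lock:", try_scan())
```

Every later caller of `try_scan` times out in the same way, which is the behaviour the blocked threads in the other traces are consistent with.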
> 
> 
> 
> On 5/19/21 10:16 AM, Tagger, Jueri via Tech-talk wrote:
>> Thanks to all for the good advice!
>>
>> Juri
>>
>>  
>>
>> Kay wrote:
>>
>>  
>>
>> Well, good.
>>
>>  
>>
>> I would post that to tech talk, maybe somebody has an idea what's happening.
>>
>> Or at least there's now a record in case it happens again.
>>
>>  
>>
>> Sure looks like several threads are stuck waiting for a semaphore:
>>
>>  
>>
>>  
>>
>> Thread 84 (Thread 0x7f39b98b8700 (LWP 66431)):
>> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007f3a03c74494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #2  0x00007f3a03c742f6 in __GI___pthread_mutex_lock (mutex=0x7f3978038770) at ../nptl/pthread_mutex_lock.c:114
>> #3  0x00007f3a04a99fb6 in epicsMutexOsdLock () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #4  0x00007f3a04f8887d in casAccessRightsCB () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #5  0x00007f3a04a76f7e in asComputePvt.part.1 () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #6  0x00007f3a04a77127 in asComputeAsgPvt.part.2 () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #7  0x00007f3a04a77868 in asComputeAsg () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #8  0x00007f3a04d10bc0 in oldSubscription::current(epicsGuard<epicsMutex>&, unsigned int, unsigned long, void const*) () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libca.so.3.15.8
>> #9  0x00007f3a04cef017 in cac::eventRespAction(callbackManager&, tcpiiu&, epicsTime const&, caHdrLargeArray const&, void*) () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libca.so.3.15.8
>> #10 0x00007f3a04d07616 in tcpiiu::processIncoming(epicsTime const&, callbackManager&) () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libca.so.3.15.8
>> #11 0x00007f3a04d098aa in tcpRecvThread::run() () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libca.so.3.15.8
>> #12 0x00007f3a04a92089 in epicsThreadCallEntryPoint () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #13 0x00007f3a04a97d0c in start_routine () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #14 0x00007f3a03c72064 in start_thread (arg=0x7f39b98b8700) at pthread_create.c:309
>> #15 0x00007f3a03f6f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>>  
>>
>>  
>>
>> --> Trying to re-compute access security?
>>
>>  
>>
>>  
>>
>> Thread 142 (Thread 0x7f3a0222f700 (LWP 453879)):
>> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007f3a03c74494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #2  0x00007f3a03c742f6 in __GI___pthread_mutex_lock (mutex=0x1771580) at ../nptl/pthread_mutex_lock.c:114
>> #3  0x00007f3a04a99fb6 in epicsMutexOsdLock () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #4  0x00007f3a04f4c01b in dbScanLock () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #5  0x00007f3a04f5cbdc in scanList () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #6  0x00007f3a04f5cd2f in ioscanCallback () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #7  0x00007f3a04f66e05 in callbackTask () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #8  0x00007f3a04a97d0c in start_routine () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #9  0x00007f3a03c72064 in start_thread (arg=0x7f3a0222f700) at pthread_create.c:309
>> #10 0x00007f3a03f6f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>>  
>>
>>  
>>
>> --> Trying to scan some record?
>>
>>  
>>
>>  
>>
>> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007f3a03c74494 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #2  0x00007f3a03c742f6 in __GI___pthread_mutex_lock (mutex=0x1771580) at ../nptl/pthread_mutex_lock.c:114
>> #3  0x00007f3a04a99fb6 in epicsMutexOsdLock () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #4  0x00007f3a04f4c01b in dbScanLock () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #5  0x00007f3a04f647d1 in dbChannel_get_count () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #6  0x00007f3a04f8ad52 in read_notify_action () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #7  0x00007f3a04f8b72e in camessage () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #8  0x00007f3a04f881e3 in camsgtask () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libdbCore.so.3.15.8
>> #9  0x00007f3a04a97d0c in start_routine () from /home/jtagger/EPICS/R3_15_8-x86_64/base-3.15.8/lib/linux-x86_64/libCom.so.3.15.8
>> #10 0x00007f3a03c72064 in start_thread (arg=0x7f394adf3700) at pthread_create.c:309
>> #11 0x00007f3a03f6f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>>  
>>
>> --> Trying to reply to a CA read, about to lock the channel to get the data size (element count)?
>>
>>  
>>
>> -Kay
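Kay's reading of the traces can be partly mechanised. The sketch below groups threads from a gdb `thread apply all bt` style dump by the `mutex=` address they are blocked on. It is only an illustration: the function name `waiters_by_mutex` is hypothetical, the regular expressions assume the frame layout seen in this thread, and the sample keeps only the header and frame #2 of each trace. The third trace in the message has no thread header, so the sample uses a placeholder header (thread number 0, LWP 0) for it.

```python
import re
from collections import defaultdict

# Patterns assume the gdb output layout quoted in this thread.
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\)")
MUTEX_RE = re.compile(r"pthread_mutex_lock \(mutex=(0x[0-9a-f]+)\)")


def waiters_by_mutex(backtrace_text):
    """Map each mutex address to the thread numbers blocked in pthread_mutex_lock on it."""
    waiters = defaultdict(list)
    current = None
    for line in backtrace_text.splitlines():
        header = THREAD_RE.search(line)
        if header:
            current = int(header.group(1))
            continue
        frame = MUTEX_RE.search(line)
        if frame and current is not None:
            waiters[frame.group(1)].append(current)
    return dict(waiters)


SAMPLE = """\
Thread 84 (Thread 0x7f39b98b8700 (LWP 66431)):
#2  0x00007f3a03c742f6 in __GI___pthread_mutex_lock (mutex=0x7f3978038770) at ../nptl/pthread_mutex_lock.c:114
Thread 142 (Thread 0x7f3a0222f700 (LWP 453879)):
#2  0x00007f3a03c742f6 in __GI___pthread_mutex_lock (mutex=0x1771580) at ../nptl/pthread_mutex_lock.c:114
Thread 0 (Thread 0x7f394adf3700 (LWP 0)):
#2  0x00007f3a03c742f6 in __GI___pthread_mutex_lock (mutex=0x1771580) at ../nptl/pthread_mutex_lock.c:114
"""

result = waiters_by_mutex(SAMPLE)
for mutex, threads in result.items():
    print(mutex, "<-", threads)
```

On this sample the scan thread and the CA read thread both end up under mutex 0x1771580, matching the `mutex=` values in the two traces above.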
>>
>>  
>>
>> *From:* Mark Rivers <rivers at cars.uchicago.edu>
>> *Sent:* Wednesday, May 19, 2021 12:19 PM
>> *To:* Tagger, Jueri <jtagger at bnl.gov>
>> *Cc:* tech-talk at aps.anl.gov
>> *Subject:* RE: EPICS stopped
>>
>>  
>>
>> Is this Linux or some other OS?
>>
>>  
>>
>> It looks like you will need to kill the IOC process.
>>
>>  
>>
>> Mark
>>
>>  
>>
>>  
>>
>> *From:* Tagger, Jueri <jtagger at bnl.gov <mailto:jtagger at bnl.gov>>
>> *Sent:* Wednesday, May 19, 2021 11:15 AM
>> *To:* Mark Rivers <rivers at cars.uchicago.edu <mailto:rivers at cars.uchicago.edu>>
>> *Cc:* tech-talk at aps.anl.gov <mailto:tech-talk at aps.anl.gov>
>> *Subject:* RE: EPICS stopped
>>
>>  
>>
>> No effect from ^C whatsoever. I typed epicsMutexShowAll 1 three times at the blank screen: no response.
>>
>>  
>>
>>  
>>
>> *From:* Mark Rivers <rivers at cars.uchicago.edu <mailto:rivers at cars.uchicago.edu>>
>> *Sent:* Wednesday, May 19, 2021 12:12 PM
>> *To:* Tagger, Jueri <jtagger at bnl.gov <mailto:jtagger at bnl.gov>>
>> *Cc:* tech-talk at aps.anl.gov <mailto:tech-talk at aps.anl.gov>
>> *Subject:* RE: EPICS stopped
>>
>>  
>>
>> Can you type ^C to abort the casr command?  It could be a deadlock.  If that works then type
>>
>>  
>>
>> epicsMutexShowAll 1
>>
>>  
>>
>> several times and see if the same mutexes are always locked.   That is the symptom of a deadlock.
>>
>>  
>>
>> Mark
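The check Mark describes amounts to intersecting the sets of locked mutexes across repeated runs. A minimal sketch, assuming the locked mutex IDs have already been copied out of each `epicsMutexShowAll 1` run by hand (the output format is not parsed here, and the IDs below are made-up placeholders, not values from this IOC):

```python
def always_locked(snapshots):
    """Return the mutex IDs that are reported as locked in every snapshot."""
    if not snapshots:
        return set()
    common = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)   # keep only IDs that are locked in this run as well
    return common


# Placeholder IDs for three successive runs (hypothetical data).
runs = [
    {"0xA", "0xB", "0xC"},
    {"0xA", "0xB"},
    {"0xA", "0xB", "0xD"},
]
stuck = always_locked(runs)
print(sorted(stuck))
```

Mutexes that survive the intersection are the candidates for a deadlock; ones that come and go are probably just busy.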
>>
>>  
>>
>>  
>>
>> *From:* Tech-talk <tech-talk-bounces at aps.anl.gov <mailto:tech-talk-bounces at aps.anl.gov>> *On Behalf Of *Tagger, Jueri via Tech-talk
>> *Sent:* Wednesday, May 19, 2021 11:04 AM
>> *To:* tech-talk at aps.anl.gov <mailto:tech-talk at aps.anl.gov>
>> *Subject:* EPICS stopped
>>
>>  
>>
>> Hi,
>>
>> We suddenly have a situation where we can no longer connect to CA channels (while the existing connections are still served).
>>
>> casr 4
>> 181888 bytes allocated
>>        Send Lock:
>>            epicsMutexId 0x7f3978036420 source ../../../src/ioc/rsrv/caservertask.c line 1231
>>        Put Notify Lock:
>>            epicsMutexId 0x7f3978036450 source ../../../src/ioc/rsrv/caservertask.c line 1232
>>        Address Queue Lock:
>>            epicsMutexId 0x7f3978036480 source ../../../src/ioc/rsrv/caservertask.c line 1233
>>        Event Queue Lock:
>>            epicsMutexId 0x7f39780364b0 source ../../../src/ioc/rsrv/caservertask.c line 1234
>>        Block Semaphore:
>>            epicsEvent 0x7f3978010c90: full
>>     pthread_mutex = 0x7f3978010c90, pthread_cond = 0x7f3978010cb8
>>     TCP client at 10.0.153.12:54302 'box64-3':
>>        User 'rose', V4.13, Priority = 0, 75 Channels
>>        Task Id = 0x7f397803c750, Socket FD = 44
>>        2262.05 secs since last send, 2262.04 secs since last receive
>>        Unprocessed request bytes = 424, Undelivered response bytes = 0
>>        State = up
>>
>>  
>>
>> Then the output stopped, and the console no longer responds.
>>
>>  
>>
>> EPICS_BASE=base-3.15.8
>>
>>  
>>
>> Any ideas?
>>
>>  
>>
>> J. Tagger
>>
>> NSLS-II
>>
> 

