Hi Oleg,
I have observed a similar crash today with EPICS 3.14.12.7.
CAS: request from 172.26.120.14:57787 => bad resource ID
CAS: Request from 172.26.120.14:57787 => cmmd=15 cid=0x26f7 type=10
count=1 postsize=0
CAS: Request from 172.26.120.14:57787 => available=0x76635 ^IN=0
paddr=(nil)
CAS: forcing disconnect from 172.26.120.14:57787
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../dbEvent.c line 531.
EPICS Release EPICS R3.14.12.7-DEV-2018-04.
Local time is 2018-05-24 15:45:05.435638450 CEST
Please E-mail this message to the author or to [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0xb5c08258) suspended
Here the crash seems related to a CA failure as well. The same thread
'CAS-client' crashed.
In my case the crash is rare. Many "bad resource ID" errors can be
generated without problem before one error causes the crash.
Meanwhile I suspect the following race condition:
While CAS-client is repeatedly locking mutexes in a loop, another thread
destroys them. This makes epicsMutexMustLock() fail.
At least a test that I wrote creates the same type of error message when
doing exactly that:
creating mutex
locking mutex
creating thread
taking mutex
unlocking and destroying mutex now
destroying mutex done
unblocked
taking mutex
A call to 'assert(status == epicsMutexLockOK)'
by thread 'blocked' failed in ../testmutex.c line 11.
In your case LOCKREC = epicsMutexMustLock((RECPTR)->mlok) fails.
In my case LOCKEVQUE = epicsMutexMustLock((EV_QUE)->writelock) fails.
I had first suspected that epicsMutexDestroy might destroy the mutex
while the other thread is waiting on it, but in that case
pthread_mutex_destroy would fail. (What I am missing in
epicsMutexOsdDestroy() is to wait and retry as long as
pthread_mutex_destroy returns EBUSY.) But that seems not to be the
problem. Indeed if I change the test to destroy the locked mutex I get:
creating mutex
locking mutex
creating thread
taking mutex
destroying mutex now
epicsMutex pthread_mutex_destroy failed: error Device or resource busy
destroying mutex done
Now epicsMutexDestroy prints an error and finishes while the other
thread is stuck in epicsMutexMustLock() and never wakes up. Also not the
best behavior in my opinion. epicsMutexDestroy should wait until the
mutex is free. This needs some more code in epicsMutexOsdDestroy. But
that is not the problem here.
I think the problem is that mutexes used in dbEvent.c may be destroyed
before the CAS-client thread has finished its work.
Maybe I will start digging a bit deeper into the dbEvent.c,
posix/osdMutex.c and CAS-client code...
Dirk
On 09.04.2018 18:19, Oleg A. Makarov wrote:
Hi Jeffrey,
yesterday EPICS PVs updates stalled on one of our IOCs. IOC reboot solve
the problem.
I am posting log messages from IOC console below. EPICS version is 3.15.5.
Should some corrections to the CAS be made, so that erroneous request
does not cause suspended threads?
Oleg
CAS: request from 164.54.103.102:38866 => CAS: Missaligned protocol rejected
CAS: Request from 164.54.103.102:38866 => cmmd=29797 cid=0x73000000 type=29793 count=29813 postsiz
e=29267
CAS: Request from 164.54.103.102:38866 => available=0x0 N=2 paddr=0x0
CAS: invalid (damaged?) UDP request from 164.54.103.102:38866 ?
CAS: message received at 2018-03-22 20:18:21
CAS: request from 164.54.103.102:54641 => CAS: Missaligned protocol rejected
CAS: Request from 164.54.103.102:54641 => cmmd=29797 cid=0x73000000 type=29793 count=29813 postsiz
e=29267
CAS: Request from 164.54.103.102:54641 => available=0x0 N=2 paddr=0x0
CAS: invalid (damaged?) UDP request from 164.54.103.102:54641 ?
CAS: message received at 2018-03-31 13:36:44
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 493.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-07 19:59:10.227314800 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x492c090) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:04:00.242762000 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4c573d0) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:06:45.226095334 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4b810a0) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 493.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:11:10.776095352 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4962b00) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:13:56.288060135 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4b8bf40) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:17:01.421393463 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4bfbbc0) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:19:30.938060149 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x25d37f0) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:22:36.554726791 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4c40d70) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:23:58.421393463 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4c4a810) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:28:55.938060149 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4c85be0) suspended
A call to 'assert(status == epicsMutexLockOK)'
by thread 'CAS-client' failed in ../../../src/ioc/db/dbEvent.c line 477.
EPICS Release EPICS R3.15.5.
Local time is 2018-04-08 08:40:05.300141941 CST/CDT
Please E-mail this message to the author or [email protected]
Calling epicsThreadSuspendSelf()
Thread CAS-client (0x4c90690) suspended
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2>
23i:ioc2> reboot
- References:
- EPICS CAS errors Oleg A. Makarov
- Navigate by Date:
- Prev:
RE: EDM X/Y Plot segfaults Baily, Scott A
- Next:
caQtDM questions Baily, Scott A
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
<2018>
2019
2020
2021
2022
2023
2024
- Navigate by Thread:
- Prev:
Re: EPICS CAS errors Oleg A. Makarov
- Next:
EPICS Meeting Web Site Issues Paul Chu
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
<2018>
2019
2020
2021
2022
2023
2024
|