Subject: [Bug 1830957] [NEW] pcas deadlocks in casEventSys
From: Till Straumann via Core-talk <[email protected]>
To: [email protected]
Date: Wed, 29 May 2019 17:42:00 -0000
Public bug reported:
We observe a deadlock situation in the pcas server.
The indented lines represent the call stack; 1) and 2) are threads:

1) Application calls casPV::postEvent();
     casPVI::postEvent() takes casPVI::.mutex
       ...
       casEventSys::postEvent() takes casEventSys::.mutex

2) server thread runs fileDescriptorManager.process(..)
     ...
     casEventSys::process() takes casEventSys::.mutex
       ...
       casAsyncWriteIOI::cbFuncAsyncIO()
         this->chan.uninstallIO()
           ..
           casPVI::uninstallIO() takes casPVI::.mutex

Thus, we have the classical case of two threads trying to acquire two locks in opposite order.
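
For illustration, the inversion can be reproduced in isolation. The following is a minimal sketch, not pcas code: pvMutex and evSysMutex are hypothetical stand-ins for the casPVI and casEventSys mutexes, and all other names are made up for the example. Each thread takes its first lock, waits until the other thread holds its own first lock, and then probes the second lock with try_lock() rather than lock(), so the demo reports the would-be deadlock instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two pcas locks named in the report.
struct LockOrderDemo {
    std::mutex pvMutex;      // plays the role of the casPVI mutex
    std::mutex evSysMutex;   // plays the role of the casEventSys mutex
    std::atomic<int> holdingFirst{0};  // threads that hold their first lock
    std::atomic<int> doneTrying{0};    // threads that have probed their second lock

    static void waitUntil(const std::atomic<int>& counter, int n) {
        while (counter.load() < n) std::this_thread::yield();
    }

    // Takes `first`, waits until the peer holds its own first lock, then
    // probes `second`. Returns true if a blocking lock() would have waited.
    bool takeThenProbe(std::mutex& first, std::mutex& second) {
        std::lock_guard<std::mutex> guard(first);
        ++holdingFirst;
        waitUntil(holdingFirst, 2);
        assert(holdingFirst.load() == 2);  // both threads now hold one lock each
        bool wouldBlock = true;
        if (second.try_lock()) {
            second.unlock();
            wouldBlock = false;
        }
        ++doneTrying;
        waitUntil(doneTrying, 2);  // keep `first` held until the peer has probed too
        return wouldBlock;
    }
};

// Runs the two threads with opposite lock orders, as in stacks 1) and 2).
bool bothThreadsWouldBlock() {
    LockOrderDemo demo;
    bool appBlocked = false;
    bool serverBlocked = false;
    std::thread app([&] {
        appBlocked = demo.takeThenProbe(demo.pvMutex, demo.evSysMutex);
    });
    std::thread server([&] {
        serverBlocked = demo.takeThenProbe(demo.evSysMutex, demo.pvMutex);
    });
    app.join();
    server.join();
    return appBlocked && serverBlocked;
}
```

With blocking lock() calls in place of try_lock(), both threads would wait on each other indefinitely, which is the situation described above.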

Note that this bug has already been experienced and discussed on tech-talk
(I could not find a Launchpad bug report, though):
https://epics.anl.gov/tech-talk/2016/msg01930.php
https://github.com/paulscherrerinstitute/pcaspy/issues/29
and a "solution" to the particular race condition reported then has been put in place.
This "solution" is, IMHO, a mere hack which works around one particular scenario.
(Another potential race condition is casPVI::updateEnumStringTableAsyncCompletion()
when called from casAsyncReadIOI::cbFuncAsyncIO(), and there may be more.)

The deeper problem is, again IMHO, a design flaw in the event processing loop, which
holds on to the casEventSys::.mutex while working on the callbacks.
It is not unreasonable (and quite common in other event processing systems I have seen)
for an application to post to an asynchronous facility from a guarded code section
and for callbacks to be synchronized using the same (application) lock:

  { guard( myLock );
    POST_TO_ASYNC_FACILITY( somewhere, myCallback );
    other_guarded_business();
  }

and

  myCallback()
  { guard( myLock );
    do_something();
  }

This is not possible with pcas.

-> I believe the casEventSys::process() loop should be reviewed:
   - release casEventSys::.mutex while working on the callback
   - remove the epicsGuard< evSysMutex > & argument from casEvent::cbFunc()
     (this is super ugly anyway; a callback should not have to know about
     the locking semantics of the event loop)
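
As a rough sketch of the first suggestion, an event loop can drop its own mutex around each callback. The code below is an assumption-laden illustration in standard C++, not a patch for casEventSys: EventQueue, post(), process() and the demo function are hypothetical names, and std::mutex stands in for the EPICS mutex types.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical event queue; it only illustrates the locking discipline.
class EventQueue {
public:
    using Callback = std::function<void()>;

    // Queue a callback; the queue mutex is held only for the insertion.
    void post(Callback cb) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push_back(std::move(cb));
    }

    // Run queued callbacks; returns how many were executed.
    int process() {
        int count = 0;
        std::unique_lock<std::mutex> lock(mutex_);
        while (!pending_.empty()) {
            Callback cb = std::move(pending_.front());
            pending_.pop_front();
            lock.unlock();               // release the queue mutex ...
            assert(!lock.owns_lock());
            cb();                        // ... while working on the callback
            ++count;
            lock.lock();                 // re-acquire to inspect the queue again
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::deque<Callback> pending_;
};

// The pattern from the report: post from a guarded section, and let the
// callback synchronize on the same application lock.
int demoGuardedPostAndCallback() {
    EventQueue queue;
    std::mutex myLock;   // the application lock
    int handled = 0;
    {
        std::lock_guard<std::mutex> guard(myLock);
        queue.post([&] {
            std::lock_guard<std::mutex> inner(myLock);
            ++handled;
            // Posting from inside a callback is safe here because process()
            // does not hold the queue mutex while the callback runs.
            queue.post([&] {
                std::lock_guard<std::mutex> again(myLock);
                ++handled;
            });
        });
    }
    queue.process();
    return handled;
}
```

Because the callback never sees the queue mutex, it needs no guard argument either, which matches the second suggestion. A real implementation would still have to decide how events are kept alive while the mutex is released; that question is left out of this sketch.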
** Affects: epics-base
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1830957