One of our programmers consulted me because his device support module
deadlocked the database (i.e. one or two of the scanning tasks). This happened
when his module read a field using dbGetField().
dbGetField() tried to read the value of the very record whose forward link
had triggered the processing of the module's own record:

              FLNK
  Record 1 ---------> Record 2
     ^                    |
     |                    |
     ------------------------
           dbGetField()

Of course, the deadlock occurred because Record 1 had not finished
processing when its FLNK was processed. Record 1 was therefore still
locked when dbGetField() tried to lock it a second time.
(The two records are not necessarily scanned by the same task.)
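
To make the sequence concrete, here is a minimal sketch in plain C with POSIX threads. It is not EPICS code: `struct record`, `read_locked()` and `process_record1()` are hypothetical names, and the record lock is assumed to be a simple non-recursive mutex. The read uses `pthread_mutex_trylock()` so that the sketch reports `EBUSY` at the point where a blocking lock would hang.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a record; not the EPICS record structure. */
struct record {
    pthread_mutex_t lock;   /* models the lock held during processing */
    int value;
};

/* Models the device support read of the source record.
 * Returns 0 on success, or EBUSY if the record is already locked.
 * A blocking lock would wait forever here instead of returning. */
static int read_locked(struct record *src, int *out)
{
    int status = pthread_mutex_trylock(&src->lock);
    if (status != 0)
        return status;
    *out = src->value;
    pthread_mutex_unlock(&src->lock);
    return 0;
}

/* Models Record 1 processing: lock, update the value, then follow the
 * forward link while the lock is still held. Returns the status seen
 * by the read that Record 2's device support performs. */
static int process_record1(struct record *r1, int *seen)
{
    int status;

    pthread_mutex_lock(&r1->lock);
    r1->value += 1;
    status = read_locked(r1, seen);  /* FLNK -> Record 2 -> read of Record 1 */
    pthread_mutex_unlock(&r1->lock);
    return status;
}
```

Calling `process_record1()` on a record initialised with `PTHREAD_MUTEX_INITIALIZER` returns `EBUSY`, while calling `read_locked()` after processing has finished succeeds.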
Nevertheless, I remembered a very similar constellation that works well:

             FLNK
  Record 1 --------> Record 2
     ^                   |
     |                   |
     ---------------------
           INP NPP

Here, Record 2 obtains the value via dbGetLink() and no deadlock occurs.
Studying the source, I was surprised to learn that dbGetLink() (which calls
dbGet()) not only does no record locking, but that EPICS does not seem to
implement anything like a `field read/write access mutex' either. A task
reading a database link (using dbGetLink()) may therefore read an
inconsistent value from a record, according to the following race condition:
- Record A processing (in the context of a low-priority scanning task)
  starts writing a field.
- Record B processing starts (in the context of a high-priority task),
  preempting the processing of record A. Record B processing calls
  dbGetLink(), trying to read the very field A is currently writing.
  Hence B gets only the partially written value!
- Record A completes writing the field and terminates processing.
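
The three steps can be replayed deterministically in a small C sketch. It uses no threads: the preemption is simulated by performing the unlocked read in the middle of the write. `field_a`, `FIELD_SIZE` and `write_with_preemption()` are hypothetical names chosen for this illustration, not EPICS identifiers.

```c
#include <assert.h>
#include <string.h>

#define FIELD_SIZE 40   /* assumed buffer size for this sketch */

/* Hypothetical value field of record A, holding the old value. */
static char field_a[FIELD_SIZE] = "AAAAAAAA";

/* Record A writes new_val in two steps. Between the two steps, the
 * unlocked read by record B is performed, standing in for a preemption
 * by the high-priority task. seen_by_b must hold FIELD_SIZE bytes. */
static void write_with_preemption(const char *new_val, char *seen_by_b)
{
    size_t len = strlen(new_val);
    size_t half = len / 2;

    if (len >= FIELD_SIZE)
        return;                                 /* value does not fit */

    memcpy(field_a, new_val, half);             /* A: first half written */
    memcpy(seen_by_b, field_a, FIELD_SIZE);     /* B: read without locking */
    memcpy(field_a + half, new_val + half,
           len - half + 1);                     /* A: rest, including NUL */
}
```

Writing "BBBBBBBB" over the initial "AAAAAAAA" leaves record B with the mixed string "BBBBAAAA", although record A ends up holding the complete new value.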
It is indeed very simple to observe this race condition. I created two
stringin records, A and B. B is scanned at `.1 second', has its INP field
set to "A NPP" and uses the devSiSoft device support. A is scanned less
frequently and has a device support module which (artificially slowly)
modifies its value field.
Observing B shows that the described race condition does occasionally
occur.
Did I miss something? Wouldn't some finer grained locking than locking
a whole scanLock set make sense to prevent this kind of race condition?
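
As an illustration of what I mean by finer grained locking, here is a minimal sketch of a per-field access mutex in plain C with POSIX threads. It is an assumption about how such a scheme could look, not something EPICS provides: `struct guarded_field`, `field_put()` and `field_get()` are hypothetical names. The mutex is held only while the bytes are copied, never for the duration of record processing.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define FIELD_SIZE 40   /* assumed buffer size for this sketch */

/* Hypothetical field that carries its own read/write access mutex. */
struct guarded_field {
    pthread_mutex_t mutex;
    char value[FIELD_SIZE];
};

/* Copy new_val into the field while holding the field mutex.
 * Returns 0 on success, -1 if new_val does not fit. */
static int field_put(struct guarded_field *f, const char *new_val)
{
    size_t len = strlen(new_val);

    if (len >= FIELD_SIZE)
        return -1;
    pthread_mutex_lock(&f->mutex);
    memcpy(f->value, new_val, len + 1);   /* includes terminating NUL */
    pthread_mutex_unlock(&f->mutex);
    return 0;
}

/* Copy the field into out while holding the field mutex, so a reader
 * never sees a value that field_put() has only partially written. */
static void field_get(struct guarded_field *f, char out[FIELD_SIZE])
{
    pthread_mutex_lock(&f->mutex);
    memcpy(out, f->value, FIELD_SIZE);
    pthread_mutex_unlock(&f->mutex);
}
```

A field declared as `struct guarded_field f = { PTHREAD_MUTEX_INITIALIZER, "old" };` can then be written with `field_put(&f, "new value")` and read with `field_get(&f, out)`. With priority-based scanning tasks such a mutex would presumably also need some protection against priority inversion, which this sketch does not address.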
Best regards.
Till Straumann (PTB/Bessy II, Berlin)