On Wednesday 24 February 2010, Andrew Johnson wrote:
> On Wednesday 24 February 2010 14:43:09 Eric Norum wrote:
> > So, what is the bug?
> > I can certainly reproduce it. I thought that adding 'option +r' at
> > the beginning would fix it, but no such luck.
> >
> > >> Since there is only one place where x gets set to one (in the
> > >> master), and one place where it gets reset to zero (in the slave),
> > >> the assertions in the calls to check() should never fail.
>
> The above statement is false; x can also be set by the sequencer itself
> since it is named in a monitor statement.
Yes.
> I haven't played with this,
> but I suspect the issue is that the state sets don't wait for the CA
> round-trip, and they can get ahead of the server replies.
Not exactly. We verified experimentally that state sets do indeed wait for
the CA round-trip. Nevertheless, it is true that "the state sets ... can get
ahead of the server replies". The reason is that an event generated by a
pvPut (and the subsequent change in the PV) arrives at /both/ state sets, so
both wake up and try to check their when-conditions.
Now, if the state set that issued the pvPut in the first place gets woken
up but is interrupted before it has a chance to check the condition, then
the other state set may continue /and overwrite the value of the state
variable/. When the original state set finally gets the processor back and
checks the condition, things get out of control. It's a classic race
condition.
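The overwrite described above can be modeled in a few lines of plain C. This
is only a sketch under stated assumptions: it is not sequencer code, the
single global x stands in for the variable shared by both state sets, and
deliver_event and master_view_after_preemption are hypothetical names
invented for illustration. The interleaving is hard-coded rather than
produced by real threads.

```c
#include <assert.h>

/* Hypothetical stand-in for the variable shared by both state sets. */
static int x;

/* Models delivery of a monitor event: the event's value is stored in x. */
static void deliver_event(int value) { x = value; }

/* Replays one hard-coded interleaving and returns the value the master
   reads when it finally evaluates when (x==0) in state work. */
int master_view_after_preemption(void)
{
    x = 0;
    x = 1;             /* Master: x = 1; pvPut(x) sends an event carrying 1 */
    deliver_event(1);  /* the event arrives and wakes both state sets */
    /* Master is pre-empted here, before it evaluates its condition. */
    assert(x == 1);    /* Slave: when (x==1), then check_one(x) */
    x = 0;             /* Slave: x = 0; its own event is still in flight */
    return x;          /* Master resumes and sees the slave's value */
}
```

In this model the function returns 0: the master's when (x==0) succeeds
because of the slave's local assignment, not because an event for the
slave's pvPut has arrived.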
For those interested: Götz Pfeiffer and I have analyzed a concrete trace and
reconstructed the execution in detail. The following is a slightly modified
version of the program, annotated (in the comments) with some ad-hoc, made-up
syntax to describe the flow of execution:
program SeqTest

%{
#include <assert.h>

static int check_zero(int x) {
    assert(x==0);
    return 1;
}

static int check_one(int x) {
    assert(x==1);
    return 1;
}
}%

int x;
assign x to "x";
monitor x;

ss Master {
    state idle {
        when (check_zero(x)) {                            /* 3, 24, 34 */
            printf("Master: after check (x=%d)\n",x);     /* 4, 25 */
            x = 1;                                        /* 5, 26 */
            printf("Master: before put (x=%d)\n",x);      /* 6, 27 */
            pvPut(x);                                     /* 7->ev1[1]
                                                             28->ev3[1] */
            printf("Master: after put (x=%d)\n",x);       /* 8, 29 */
        } state work
    }
    state work {                                          /* 9:S
                                                             10<-ev1[1]
                                                             30<-ev2[0] */
        when (x==0) {                                     /* 21, 31 */
            printf("Master: after when (x=%d)\n",x);      /* 1, 22, 32 */
        } state idle                                      /* 2, 23
                                                             33<-ev3[1] */
    }
}

ss Slave {
    state idle {                                          /* 10<-ev1[1]
                                                             20:M
                                                             30<-ev2[0]
                                                             33<-ev3[1] */
        when (x==1) {                                     /* 11 */
            printf("Slave: after when (x=%d)\n",x);       /* 12 */
        } state work
    }
    state work {                                          /* 13 */
        when (check_one(x)) {                             /* 14 */
            printf("Slave: after check (x=%d)\n",x);      /* 15 */
            x = 0;                                        /* 16 */
            printf("Slave: before put (x=%d)\n",x);       /* 17 */
            pvPut(x);                                     /* 18->ev2[0] */
            printf("Slave: after put (x=%d)\n",x);        /* 19 */
        } state idle
    }
}
The annotations (in comments) are to be understood as follows:
timestamp / program counter = integer (1,2,...)
event = ev<number>[<value>] (e.g. ev1[1])
event generation = "->" event
event consumption = "<-" event
task switch = ":S" | ":M" (switch to slave resp. master)
These are the last 14 lines of the output (annotated with "program counters"
and event generation):
1 Master: after when (x=0)
4 Master: after check (x=0)
6 Master: before put (x=1)
7 (pvPut generates ev1[1])
8 Master: after put (x=1)
12 Slave: after when (x=1)
15 Slave: after check (x=1)
17 Slave: before put (x=0)
18 (pvPut generates ev2[0])
19 Slave: after put (x=0)
22 Master: after when (x=0)
25 Master: after check (x=0)
27 Master: before put (x=1)
28 (pvPut generates ev3[1])
29 Master: after put (x=1)
32 Master: after when (x=0)
34 check_zero: Assertion `x==0' failed.
The interesting stuff happens after point 9: control switches to the slave,
and afterwards ev1[1] arrives, causing both state sets to unblock. But only
the slave actually proceeds, overwriting x with 0 and sending ev2[0]. Thus,
when the master finally gets control back at point 20, it sees the "wrong"
(overwritten) value for x and proceeds to send yet another event, ev3[1] (at
point 28), while ev2[0] has not yet been consumed. At point 30, ev2[0]
finally arrives and is consumed by both state sets. The master keeps going
and enters state idle; at the same time ev3[1] arrives and sets x to 1,
invalidating the assertion that x is zero.
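For readers who want to step through the trace, points 3 to 34 can be
replayed deterministically in plain C. This is a sketch, not sequencer code:
the FIFO of in-flight events, the helpers pv_put and deliver, and the
function replay_trace are hypothetical names made up for this illustration,
and task switches are hard-coded in the order given by the annotations.

```c
#include <assert.h>

#define QUEUE_MAX 8

static int x;                 /* the variable shared by both state sets */
static int queue[QUEUE_MAX];  /* in-flight events, oldest first */
static int head, tail;

/* Models pvPut(x): an event carrying the value is queued for later. */
static void pv_put(int value)
{
    assert(tail < QUEUE_MAX);
    queue[tail++] = value;
}

/* Models arrival of the oldest in-flight event: it overwrites x. */
static void deliver(void)
{
    assert(head < tail);
    x = queue[head++];
}

/* Returns 1 if check_zero(x) would pass at point 34, 0 if it would
   fail, and -1 if the replay diverges from the recorded trace. */
int replay_trace(void)
{
    x = 0;
    head = tail = 0;

    if (x != 0) return -1;   /* 3: Master idle, check_zero(x) passes */
    x = 1; pv_put(x);        /* 5, 7: generates ev1[1] */
    deliver();               /* 9, 10: switch to Slave, ev1[1] arrives */
    if (x != 1) return -1;   /* 11, 14: Slave sees x==1 */
    x = 0; pv_put(x);        /* 16, 18: generates ev2[0] */
    if (x != 0) return -1;   /* 20, 21: Master sees the overwritten x */
    if (x != 0) return -1;   /* 24: check_zero(x) passes again */
    x = 1; pv_put(x);        /* 26, 28: generates ev3[1] */
    deliver();               /* 30: ev2[0] arrives, x becomes 0 */
    if (x != 0) return -1;   /* 31: Master work, when (x==0) fires */
    deliver();               /* 33: ev3[1] arrives, x becomes 1 */
    return x == 0;           /* 34: check_zero(x) */
}
```

In this model replay_trace() returns 0, matching the failed assertion in the
last line of the output: ev3[1] sets x back to 1 just before the master
evaluates check_zero(x).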
It remains to draw practical conclusions about how to avoid such race
conditions in SNL code. Should we generally be wary of applying pvPut and
monitor to the same state variable? Could the compiler warn us about that?
Or would that be too restrictive?
Cheers
Ben