Experimental Physics and
Industrial Control System


Subject:	Re: Fwd: Sequencer bug?
From:	Benjamin Franksen <[email protected]>
To:	[email protected]
Date:	Thu, 25 Feb 2010 14:43:51 +0100

On Wednesday 24 February 2010, Andrew Johnson wrote:
> On Wednesday 24 February 2010 14:43:09 Eric Norum wrote:
> > So, what is the bug?
> > I can certainly reproduce it.   I thought that adding 'option +r' at
> > the beginning would fix it, but no such luck.
> >
> > >> Since there is only one place where x gets set to one (in the
> > >> master), and one place where it gets reset to zero (in the slave),
> > >> the assertions in the calls to check() should never fail.
> 
> The above statement is false; x can also be set by the sequencer itself
>  since it is named in a monitor statement.  

Yes.

>  I haven't played with this,
>  but I suspect the issue is that the state sets don't wait for the CA
>  round-trip, and they can get ahead of the server replies.

Not exactly. We verified experimentally that state sets do indeed wait for 
the CA round-trip. Nevertheless, it is true that "the state sets ... 
can get ahead of the server replies". The reason is that an event generated 
by a pvPut (and the subsequent change in the PV) arrives at /both/ state 
sets, so both wake up and try to check their when-conditions.

Now suppose the state set that issued the pvPut in the first place gets 
woken up but is interrupted before it has a chance to check its condition. 
The other state set may then continue /and overwrite the value of the 
state variable/. When the original state set finally gets the processor 
and checks its condition, things get out of control. It's a classic race 
condition.

For those interested, we (that is, Götz Pfeiffer and I) have analyzed a 
concrete trace and reconstructed the execution in detail. The following is a 
slightly modified version of the program, annotated (in the comments) with 
an ad-hoc notation that describes the flow of execution:

program SeqTest

%{
#include <assert.h>

static int check_zero(int x) {
    assert(x==0);
    return 1;
}

static int check_one(int x) {
    assert(x==1);
    return 1;
}
}%

int x;
assign x to "x";
monitor x;

ss Master {
    state idle {
        when (check_zero(x)) {                          /* 3, 24, 34 */
            printf("Master: after check (x=%d)\n",x);   /* 4, 25 */
            x = 1;                                      /* 5, 26 */
            printf("Master: before put (x=%d)\n",x);    /* 6, 27 */
            pvPut(x);                                   /* 7->ev1[1]
                                                           28->ev3[1] */
            printf("Master: after put (x=%d)\n",x);     /* 8, 29 */
        } state work
    }
    state work {                                        /* 9:S
                                                           10<-ev1[1]
                                                           30<-ev2[0] */
        when (x==0) {                                   /* 21, 31 */
            printf("Master: after when (x=%d)\n",x);    /* 1, 22, 32 */
        } state idle                                    /* 2, 23
                                                           33<-ev3[1] */
    }
}

ss Slave {
    state idle {                                        /* 10<-ev1[1]
                                                           20:M
                                                           30<-ev2[0]
                                                           33<-ev3[1] */
        when (x==1) {                                   /* 11 */
            printf("Slave: after when (x=%d)\n",x);     /* 12 */
        } state work
    }
    state work {                                        /* 13 */
        when (check_one(x)) {                           /* 14 */
            printf("Slave: after check (x=%d)\n",x);    /* 15 */
            x = 0;                                      /* 16 */
            printf("Slave: before put (x=%d)\n",x);     /* 17 */
            pvPut(x);                                   /* 18->ev2[0] */
            printf("Slave: after put (x=%d)\n",x);      /* 19 */
        } state idle
    }
}

The annotations (in comments) are to be understood as follows:

timestamp / program counter = integer (1, 2, ...)
event                       = ev<number>[<value>] (e.g. ev1[1])
event generation            = "->" event
event consumption           = "<-" event
task switch                 = ":S" | ":M" (switch to slave or master, respectively)

These are the last 14 lines of the output (annotated with "program counters" 
and event generation):

1   Master: after when (x=0)
4   Master: after check (x=0)
6   Master: before put (x=1)
7       (pvPut generates ev1[1])
8   Master: after put (x=1)
12  Slave: after when (x=1)
15  Slave: after check (x=1)
17  Slave: before put (x=0)
18      (pvPut generates ev2[0])
19  Slave: after put (x=0)
22  Master: after when (x=0)
25  Master: after check (x=0)
27  Master: before put (x=1)
28      (pvPut generates ev3[1])
29  Master: after put (x=1)
32  Master: after when (x=0)
34  check_zero: Assertion `x==0' failed.

The interesting stuff happens after point 9: control switches to the slave, 
and afterwards ev1[1] arrives, causing both state sets to unblock. But only 
the slave actually proceeds, overwriting x with 0 and sending ev2[0]. Thus, 
when the master finally gets control back at point 20, it sees the 
"wrong" (overwritten) value for x and proceeds to send yet another event, 
ev3[1] (at point 28), while ev2[0] has not yet been consumed. At point 30, 
ev2[0] finally arrives and is consumed by both state sets. The master 
keeps going and enters state idle; at the same time ev3[1] arrives, 
invalidating the assertion that x is zero.
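
To make the interleaving concrete, here is a minimal C sketch that replays it 
deterministically. It is a hypothetical model, not sequencer code: the names 
pv_put, deliver and replay_trace are made up for illustration, and the model 
assumes that every pvPut comes back later as a monitor event that overwrites 
the shared variable x when it is delivered.

```c
#include <assert.h>

/* Hypothetical model (not sequencer code): one variable shared by both
 * state sets, plus a FIFO of monitor events that pvPut has generated but
 * that have not been delivered yet. */
static int x;
static int queue[4];
static int head, tail;

/* Model of pvPut: the written value returns later as a monitor event. */
static void pv_put(int value) {
    assert(tail < 4);
    queue[tail++] = value;
}

/* Model of monitor delivery: the oldest pending event overwrites x. */
static void deliver(void) {
    assert(head < tail);
    x = queue[head++];
}

/* Replays the trace; returns x as seen by the master's last check_zero. */
static int replay_trace(void) {
    x = 0;
    head = tail = 0;

    /* 3-8: master in state idle, check_zero passes */
    assert(x == 0);
    x = 1;
    pv_put(x);                          /* 7 -> ev1[1] */

    /* 9-10: switch to slave, ev1[1] is delivered */
    deliver();

    /* 11-19: slave sees x==1, check_one passes, slave resets x */
    assert(x == 1);
    x = 0;
    pv_put(x);                          /* 18 -> ev2[0] */

    /* 20-29: master resumes in state work; x==0 holds only because the
     * slave wrote x locally, while ev2[0] is still pending */
    assert(x == 0 && tail - head == 1);
    x = 1;
    pv_put(x);                          /* 28 -> ev3[1] */

    /* 30-32: ev2[0] arrives, master leaves state work for state idle */
    deliver();
    assert(x == 0);

    /* 33: ev3[1] arrives before the master evaluates check_zero */
    deliver();

    /* 34: value that check_zero gets to see */
    return x;
}
```

In this model replay_trace() returns 1, which is the value that makes 
check_zero fail at point 34. The sketch only fixes one particular schedule; 
it does not model how the sequencer chooses which state set runs next.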

It remains to draw practical conclusions about how to avoid such race 
conditions in SNL code. Should we generally be wary of using pvPut and 
monitor on the same state variable? Could the compiler warn us about that? 
Or is this too restrictive?

Cheers
Ben

Replies:: Re: Fwd: Sequencer bug? Andrew Johnson; Re: Fwd: Sequencer bug? Tim Mooney

References:: Re: Sequencer bug? Benjamin Franksen


ANJ, 02 Sep 2010
