Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: pvGet timeout in sequencer 2.1.12
From: Benjamin Franksen <benjamin.franksen@helmholtz-berlin.de>
To: <tech-talk@aps.anl.gov>
Date: Thu, 4 Jul 2013 21:16:43 +0200
I feel a bit alone at the moment, talking to myself all the time. I guess
all the experts are on vacation.

In the meantime, I have updated the known problems page for the sequencer and
added two patches that might solve the problem or might make it worse,
depending on subtle details of the CA client API semantics.

See http://www-csr.bessy.de/control/SoftDist/sequencer/KnownProblems.html for
details.

Cheers
Ben

PS: I've already been joking with my co-workers about adding two new
runtime options, named --risk-memory-corruption and --risk-memory-leak
respectively, ideally aborting with an error if you forget to choose one or
the other ;-)) See also http://dilbert.com/strips/comic/1994-09-30/

On Wednesday, 3 July 2013, 01:22:41 Benjamin Franksen wrote:
> On Wednesday, 19 June 2013, 18:10:32 Benjamin Franksen wrote:
> > On Wednesday, June 12, 2013 12:38:40 J. Lewis Muir wrote:
> > > On 6/12/13 11:10 AM, Benjamin Franksen wrote:
> > > > On Monday, June 10, 2013 17:22:12 Carl Lionberger wrote:
> > > >> Looking at the code in seq_ca.c and seq_if.c, it seems that if a
> > > >> synchronous pvGet ever fails to get a callback from channel access,
> > > >> the "get completion timeout" message is set in the channel metadata
> > > > >> and all subsequent attempts to do pvGets on that channel will fail
> > > > >> as in the first message.  The latter occurs because there is a get
> > > > >> semaphore for each channel that the callback is supposed to give.
> > > >>
> > > >> My thought is that if the callback doesn't come the semaphore should
> > > >> be released in the same code that sets the "get completion timeout"
> > > >> message.
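A simplified model of the behaviour Carl describes may help here. It is only a sketch: the `channel` type, the `get_result` enum and `sync_get_without_callback` are hypothetical names, not the code in seq_ca.c or seq_if.c, and the semaphore is reduced to a flag. It shows why one missed callback blocks every later get, and what releasing the semaphore in the timeout path would change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one channel; not the sequencer's real types. */
typedef struct {
    bool get_sem_available;  /* models the per-channel "get" semaphore */
    const char *message;     /* models the status message in the metadata */
} channel;

typedef enum { GET_TIMED_OUT, GET_BLOCKED } get_result;

/* Models a synchronous get whose CA callback never arrives.
 * release_on_timeout == false: the semaphore stays taken (reported problem).
 * release_on_timeout == true:  the timeout path gives it back (proposal). */
static get_result sync_get_without_callback(channel *ch, bool release_on_timeout)
{
    if (!ch->get_sem_available)
        return GET_BLOCKED;            /* an earlier get still holds it */

    ch->get_sem_available = false;     /* take the semaphore */
    /* ... the wait for the callback would happen here and time out ... */
    ch->message = "get completion timeout";

    if (release_on_timeout)
        ch->get_sem_available = true;  /* give it back ourselves */
    return GET_TIMED_OUT;
}
```

Note that releasing the semaphore on timeout does not by itself say what should happen if the callback arrives late after all, which is the question the rest of this thread is about.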
> > > >
> > > > I find it very hard to judge what the correct behaviour is, here.
> > >
> > > I think I've run into Carl's problem before too.
> > >
> > > Could you supply the CA callback with a sequence number that would
> > > uniquely identify the callback as being for a particular pvGet?  This
> > > way, you could ignore a callback if it arrives for a pvGet that has
> > > been abandoned.
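The sequence number idea could look roughly like the following. This is a sketch under assumptions: `get_slot`, `start_get`, `abandon_get` and `on_get_callback` are made-up names, and how the number travels with the request (for instance through the user argument of the CA callback) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-variable bookkeeping for one outstanding get. */
typedef struct {
    uint32_t expected_seq;  /* sequence number of the get we still care about */
    bool     waiting;       /* false once the get completed or was abandoned */
    double   value;         /* last accepted value */
} get_slot;

/* Start a new get and return the number that identifies it. */
static uint32_t start_get(get_slot *slot)
{
    slot->expected_seq++;
    slot->waiting = true;
    return slot->expected_seq;
}

/* Called when the caller gives up waiting (timeout). */
static void abandon_get(get_slot *slot)
{
    slot->waiting = false;
}

/* Called from the callback; returns true if the result was accepted. */
static bool on_get_callback(get_slot *slot, uint32_t seq, double value)
{
    if (!slot->waiting || seq != slot->expected_seq)
        return false;       /* belongs to an abandoned get: ignore it */
    slot->value = value;
    slot->waiting = false;
    return true;
}
```

A late callback for an abandoned get then compares unequal to the current number and is dropped instead of being mistaken for the answer to a newer request.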
>
> I just found that fixing this is more subtle than I expected.
>
> The problem is this: in order to be able to associate incoming put and get
> events (callbacks from CA) to where the request originated, I need two
> pieces of information:
>
> (a) the variable
> (b) the state set (~= thread)
>
> Both pieces are identified via pointers. So I had to introduce a data type
> for requests (a struct) that contains just these two pointers. I pass a
> pointer to such a struct to ca_array_get_callback or
> ca_array_put_callback, and CA hands the pointer back as an argument when
> it invokes the callback.
>
> This pointer MUST be valid until I receive the callback that belongs to the
> request! Because if not, then the callback routine accesses invalid memory,
> leading to crashes or worse.
>
> That means I must never call free (actually freeListFree, but that's not
> important here) before the callback arrives. But what if it never arrives?
> Then we get a memory leak. Arrgh!
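To make the lifetime problem concrete, here is a small sketch with a mock in place of the CA library. Everything in it is hypothetical (`variable`, `state_set`, `request`, `mock_pending`, `issue_get`, `deliver`); only the shape, a struct with two pointers handed over as the callback's user argument, follows the description above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical placeholders for the two things a request refers to. */
typedef struct { int updates; } variable;
typedef struct { int events_seen; } state_set;

/* The request struct: just the two pointers. */
typedef struct {
    variable  *var;
    state_set *ss;
} request;

typedef void (*completion_fn)(void *user_arg);

/* Mock of the library side: remembers one pending callback. */
typedef struct {
    completion_fn fn;
    void *user_arg;
} mock_pending;

/* The callback dereferences the request, so it must still be valid here.
 * It is also the only place where the request is freed. */
static void on_complete(void *user_arg)
{
    request *req = user_arg;
    req->var->updates++;
    req->ss->events_seen++;
    free(req);
}

/* Allocate a request and hand it to the mock library. */
static int issue_get(mock_pending *p, variable *var, state_set *ss)
{
    request *req = malloc(sizeof *req);
    if (!req)
        return -1;
    req->var = var;
    req->ss = ss;
    p->fn = on_complete;
    p->user_arg = req;
    return 0;
}

/* The mock library delivering the callback at some later time. */
static void deliver(mock_pending *p)
{
    if (p->fn) {
        p->fn(p->user_arg);
        p->fn = NULL;
        p->user_arg = NULL;
    }
}
```

If the requester freed `req` after a timeout and `deliver` ran afterwards, `on_complete` would touch freed memory. If `deliver` never runs, nothing ever frees `req`. Those are exactly the two risks.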
>
> This leads me to the following
>
>
> Channel Access Question
> =======================
>
> Does the CA client library guarantee that callbacks for
> ca_array_get_callback and ca_array_put_callback are eventually called, no
> matter what, assuming I just wait long enough?
>
> Note that "no matter what" includes lost connections etc.
>
>
> If I can rely on all put/get request callbacks to be called, eventually,
> then I can solve the (original) problem by using the request pointer as my
> unique identifier. I'll have to store the request pointers, (quasi-)statically
> allocating memory for (number of state sets) x (number of assigned
> variables) request pointers, but that's no different from storing the same
> number of semaphores, which I already do. And I simply forget
> about timed out requests, assuming that my callback will eventually be
> called and the memory freed.
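A sketch of this first option, assuming every callback does arrive eventually. The table size and all names (`pending`, `start_request`, `forget_request`, `on_callback`, `live_requests`) are invented for illustration, and the request holds indices rather than pointers to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { NUM_SS = 2, NUM_VARS = 3 };   /* hypothetical program size */

typedef struct { int ss; int var; } request;

/* One slot per (state set, variable): the request we currently wait for. */
static request *pending[NUM_SS][NUM_VARS];
static int live_requests;            /* allocated but not yet freed */

static request *start_request(int ss, int var)
{
    request *req = malloc(sizeof *req);
    if (!req)
        return NULL;
    req->ss = ss;
    req->var = var;
    pending[ss][var] = req;          /* the pointer is the identifier */
    live_requests++;
    return req;
}

/* On timeout: stop waiting, but do NOT free; the callback will do that. */
static void forget_request(int ss, int var)
{
    pending[ss][var] = NULL;
}

/* Callback: returns true if this is the request we are waiting for.
 * The request is freed in either case. */
static bool on_callback(void *user_arg)
{
    request *req = user_arg;
    bool current = (pending[req->ss][req->var] == req);
    if (current)
        pending[req->ss][req->var] = NULL;
    free(req);
    live_requests--;
    return current;
}
```

Because a request is freed only inside its own callback, a stale pointer can never compare equal to a newer request in the same slot. The price is that `live_requests` never returns to zero if some callback is never delivered.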
>
> If I cannot rely on this, then I have to keep track of timed out requests
> (probably chaining them in a list), so that at least the disconnect handler
> can finally free them. (That assumes that I *can* rely on the connection
> handler to be called, eventually.)
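The second option, chaining timed out requests so that a disconnect handler can release them, might be sketched as below. The names are hypothetical, and the sketch assumes that no callback for a request arrives after the disconnect handler has run; otherwise the late callback would see freed memory.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct request {
    struct request *next;   /* link in the channel's timed-out list */
    int timed_out;
} request;

/* Hypothetical per-channel state. */
typedef struct {
    request *timed_out_head;
} channel;

static request *new_request(void)
{
    return calloc(1, sizeof(request));
}

/* On timeout: keep the request alive, but remember it for later cleanup. */
static void mark_timed_out(channel *ch, request *req)
{
    req->timed_out = 1;
    req->next = ch->timed_out_head;
    ch->timed_out_head = req;
}

/* A callback that arrives late after all: unlink and free its request.
 * Returns 1 if the request was found in the list, 0 otherwise. */
static int on_late_callback(channel *ch, request *req)
{
    for (request **pp = &ch->timed_out_head; *pp; pp = &(*pp)->next) {
        if (*pp == req) {
            *pp = req->next;
            free(req);
            return 1;
        }
    }
    return 0;
}

/* Disconnect handler: free whatever is still chained; returns the count. */
static int on_disconnect(channel *ch)
{
    int freed = 0;
    while (ch->timed_out_head) {
        request *req = ch->timed_out_head;
        ch->timed_out_head = req->next;
        free(req);
        freed++;
    }
    return freed;
}
```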
>
> Another solution would be to avoid allocating request structs by packing
> the information I need into an integer and casting that to and from the
> void pointer. This means that on 32-bit systems I have to introduce upper
> limits (2^16) for the number of variables and the number of state sets
> per program. The limits themselves are not unreasonable, but I would
> rather avoid such an ugly hack.
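For completeness, the integer packing could look like this. It is a sketch with made-up names (`pack_request`, `unpack_ss`, `unpack_var`), it identifies state sets and variables by index rather than by pointer, and it relies on integer-to-pointer conversion through `uintptr_t`, which is implementation-defined in C but behaves as expected on the usual 32-bit and 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

#define INDEX_BITS 16u
#define INDEX_MASK 0xFFFFu   /* largest index that fits: 2^16 - 1 */

/* Pack two 16-bit indices into a value that fits a 32-bit void pointer. */
static void *pack_request(unsigned ss_index, unsigned var_index)
{
    assert(ss_index <= INDEX_MASK && var_index <= INDEX_MASK);
    uintptr_t bits = ((uintptr_t)ss_index << INDEX_BITS) | (uintptr_t)var_index;
    return (void *)bits;     /* never dereferenced, only carried around */
}

/* Recover the state set index from the callback's user argument. */
static unsigned unpack_ss(void *user_arg)
{
    return (unsigned)(((uintptr_t)user_arg >> INDEX_BITS) & INDEX_MASK);
}

/* Recover the variable index from the callback's user argument. */
static unsigned unpack_var(void *user_arg)
{
    return (unsigned)((uintptr_t)user_arg & INDEX_MASK);
}
```

Nothing is allocated, so nothing can leak or dangle, but this alone does not distinguish an old request from a new one for the same variable and state set.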
>
> All this headache could easily be avoided if the CA client API offered a
> way to cancel pending requests.
--
"Make it so they have to reboot after every typo." -- Scott Adams

Replies:
Re: pvGet timeout in sequencer 2.1.12 Andrew Johnson
References:
pvGet timeout in sequencer 2.1.12 Carl Lionberger
Re: pvGet timeout in sequencer 2.1.12 Benjamin Franksen
Re: pvGet timeout in sequencer 2.1.12 Benjamin Franksen
