EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: libca bug?
From: Michael Davidsaver via Tech-talk <[email protected]>
To: Matt Newville <[email protected]>
Cc: tech-talk <[email protected]>
Date: Mon, 6 May 2019 18:14:56 -0700
On 5/6/19 4:29 PM, Matt Newville wrote:
> Hi Michael, 
> 
> .. and sorry for the slow response

slow?

> On Mon, May 6, 2019 at 12:31 PM Michael Davidsaver <[email protected]> wrote:
> 
>     On 5/5/19 9:04 AM, Matt Newville wrote:
>     > (conversation seems to have jumped threads... not sure why)
> 
>     Attempting to jump back...
> 
>     > On Sat, May 4, 2019 at 1:35 PM Michael Davidsaver via Tech-talk <[email protected]> wrote:
>     >
>     >     On 5/4/19 8:29 AM, Matt Newville wrote:
>     >     ...
>     >     > OK, thanks -- that might be useful. To be clear, this returns False for me in the "normal" case of actually loading `libca` for the first time from a Python session:
>     >
>     >     This depends on how LoadLibrary() is calling dlopen().  cf. RTLD_LOCAL vs. RTLD_GLOBAL.
>     >     The default is RTLD_LOCAL.  fyi. this is actually controllable with
>     >     sys.setdlopenflags(), though I've not tried changing this myself.
>     >
>     >
>     > I think we want to avoid using RTLD_GLOBAL.   That's probably especially true when embedded in an application that is loading and using libca itself.   
> 
>     I can see why the CPython devs default to RTLD_LOCAL, and the ctypes API makes changing this
>     unappealing.  Personally, I would want to avoid any situation where libca.so were loaded twice
>     into the same process.
> 
>  
> 
> My concern could be due more to my lack of knowledge of what actually happens when two threads load a shared lib (or, specifically what happens when Python loads a shared lib that was already loaded by the process).

Best case is a duplicate symbol error.  Worst case is some kind of crash.
Calling dlopen() is playing with fire.

>  That is to say: if `ca.py` sets RTLD_GLOBAL, would it override the pointers to the shared lib in the outer application (the IOC)?   I don't know how it would work to have a process have two global pointers to a shared library -- I'm way out of my element on this -- but I see dragons here. 

Agreed.

> I believe the IOC and embedded Python process need to agree on the exact libca to use.   I don't know if it makes a difference if that is loaded twice.  Perhaps there is a way to ask a process (even assuming linux) what shared libs it has *actually* loaded (akin to `ldd` only on a running process) - I don't see anything obvious in /proc/PID/.  For now, I think we can probably assume that this can be ensured by setting PYEPICS_LIBCA, but that may be something to try to assure rather than expect the user to have done this correctly.

For the record, the process memory map info contains this and more (cf. /proc/<pid>/maps).
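
A minimal sketch of reading that map from Python on Linux, to see which shared objects a process has actually mapped. The helper names and the sample paths are made up for illustration; only the /proc/<pid>/maps file itself comes from the message above.

```python
import os

def loaded_libraries(maps_text):
    """Return sorted, unique paths of mapped shared objects found in maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if path.startswith('/') and '.so' in os.path.basename(path):
                paths.add(path)
    return sorted(paths)

def libraries_of_process(pid='self'):
    """Read /proc/<pid>/maps (Linux only) and list the mapped shared objects."""
    with open('/proc/%s/maps' % pid) as f:
        return loaded_libraries(f.read())

# Illustrative input only; these addresses and paths are invented.
SAMPLE = (
    "00400000-00452000 r-xp 00000000 08:01 99 /usr/bin/softIoc\n"
    "7f1c2a000000-7f1c2a021000 r-xp 00000000 08:01 131 /usr/lib/libca.so.3.15.6\n"
    "7f1c2a021000-7f1c2a022000 rw-p 00021000 08:01 131 /usr/lib/libca.so.3.15.6\n"
    "7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]\n"
)
sample_libs = loaded_libraries(SAMPLE)
```

If two different paths whose basename starts with libca.so show up in the result of libraries_of_process(), the library has been mapped twice.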

> 
>     >     Libraries loaded during normal process start behave as RTLD_GLOBAL.
>     >
>     >     > $ softIocPy2.7
>     >     > epics> py "from ctypes import cdll"
>     >     > epics> py "print hasattr(cdll.LoadLibrary(None), 'ca_context_create')"
>     >     > True
>     >
>     >
>     > Yes, it appears to return True when the application has that shared lib loaded when the Python interpreter is created (or initialized -- don't know that it matters here but maybe, and maybe that depends on details of how pyDevSup works).  I think that is why it might be useful.   
>     >
>     > FWIW, I don't think we want to use
>     >      getattr(cdll.LoadLibrary(None), 'ca_context_create', None)   
>     >
>     > as the normal way to look up the `ca_context_create` function when `libca.ca_context_create` will do.
> 
>     This was only a way to shorten the example.
> 
> 
> My main concern is trying to avoid (unnecessary) complications in the code for what could be considered an unusual usage.   We could probably abstract that away to a `get_ca_method()` function.  That is, I could definitely come around to embracing using pyepics within an IOC (say as an alternative to SNC), but  I'm just a little slow and want to make sure we don't mess up the use of pyepics as a client on workstations.

LoadLibrary(None) is to the best of my knowledge safe on Linux.
Looking further it isn't clear if it will work at all on Windows.

Fun fact.  epicsFindSymbol() for WIN32 is 'GetProcAddress(0, name)',
but the MS documentation doesn't actually say you can pass 0 as the
first argument.


I don't think that much additional complexity is needed.

Looking at

https://github.com/pyepics/pyepics/blob/87c5560324255b07ee18fee643d779b0344ce6ce/epics/ca.py#L321-L331

Something like:

> if os.name == 'nt':
>     load_dll(find_libCom())
>     libca = load_dll(find_libca())
> else:
>     libca = load_dll(None)
>     if not hasattr(libca, 'ca_context_create'):
>         libca = load_dll(find_libca())
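
For trying the same probe outside pyepics, here is a self-contained sketch using only ctypes. It is not the pyepics implementation: process_exports() and load_libca() are hypothetical names, and ctypes.util.find_library('ca') is only a stand-in for whatever find_libca() does. On Windows it skips the probe, since the thread leaves open whether LoadLibrary(None) works there.

```python
import ctypes
import ctypes.util
import os

def process_exports(symbol):
    """True if `symbol` is already visible in the running process image."""
    if os.name == 'nt':
        return False  # not attempted on Windows in this sketch
    try:
        # CDLL(None) maps to dlopen(NULL): a handle on the main program
        # and everything loaded with global visibility.
        return hasattr(ctypes.CDLL(None), symbol)
    except (OSError, TypeError):
        return False

def load_libca(path=None):
    """Return (handle, reused). Reuse an already-loaded libca when possible."""
    if process_exports('ca_context_create'):
        return ctypes.CDLL(None), True
    # Fallback order is an assumption: explicit path, PYEPICS_LIBCA, then search.
    path = path or os.environ.get('PYEPICS_LIBCA') or ctypes.util.find_library('ca')
    if not path:
        return None, False
    try:
        return ctypes.CDLL(path), False
    except OSError:
        return None, False
```

When load_libca() returns reused=True, the interpreter is presumably embedded in a process that already loaded libca, which is also the case where exit handling needs extra care.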


> 
>     >
>     >     > So that might be a reliable way to answer the question "am I running as an embedded Python within an IOC-like process that has loaded libca?".  That would be good to know.  Registering epicsAtExit() might be useful in all cases, but it seems like we might really need to do that when embedded in an IOC.
>     >
>     >     Since libca is registering its own exit handler pyepics would want to
>     >     do the same (after ca context creation) to avoid racing against this
>     >     cleanup when epicsExit() is called.
>     >
>     >
>     > I am completely willing to believe we would need to do this (or something like it) in order to use pyepics embedded in an IOC.  Apparently it has not been needed when running as a normal Python process.
> 
>     Yup.  I'm not sure of the details, but I think this is related to the way in which
>     IOCs exit, and what pyDevSup is forced to do as a result.  In a normal python
>     process (main() from Modules/main.c), Py_Finalize() happens before exit().
>     So all python atexit hooks have run, and all non-daemon Threads have stopped,
>     before any epicsAtExit() hooks.
> 
>     With pyDevSup, I have to use an epicsAtExit() hook to call Py_Finalize().
>     But mine may not be the first hook to run.  I think cacExitHandler() is
>     being run before cleanupPy().
> 
> 
> 
> OK, that seems consistent with Bruno's message too (Thanks Bruno!).   It seems like we might be able to tell reliably at runtime "am I running in an embedded interpreter in a process that has already loaded libca", and so be able to alter when the `finalize_libca()` routine is run for the two cases.  That seems like it would solve the main issue.

I think I've found a way to make it more difficult to trigger an ordering
issue.   Though it will still be possible.  You may be off the hook on this part.

https://github.com/mdavidsaver/pyDevSup/pull/20

also fyi this will help going forward:

https://code.launchpad.net/~info-martin-konrad/epics-base/+git/epics-base/+merge/366996
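
The ordering concern discussed above can be illustrated with a toy model. This is plain Python, not EPICS code: ExitHooks is a made-up class, and it assumes hooks run in reverse registration order (as Python's own atexit does; whether epicsAtExit() behaves identically is an assumption here). The hook names are borrowed from the thread only as labels.

```python
class ExitHooks:
    """Toy hook list: the last hook registered is the first one run."""

    def __init__(self):
        self._hooks = []

    def register(self, func):
        self._hooks.append(func)

    def run(self):
        while self._hooks:
            self._hooks.pop()()

log = []
hooks = ExitHooks()
# Registered early, when the embedded interpreter is set up.
hooks.register(lambda: log.append('cleanupPy'))
# Registered later, when the CA client context is created.
hooks.register(lambda: log.append('cacExitHandler'))
# Registered by the client right after context creation.
hooks.register(lambda: log.append('finalize_libca'))
hooks.run()
# log is now ['finalize_libca', 'cacExitHandler', 'cleanupPy']
```

Under that assumption, a client hook registered after the CA context exists runs before libca's own cleanup, while interpreter teardown registered at startup runs last.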

>     >  There might be no harm in always registering an exit handler, but it might require some code changes and testing to do so.  We don't want the "normal" non-embedded case worse.   That's why a test of "was libca loaded by an enclosing process" might be useful.
>     >
>     > I've not yet used pyDevSup myself, but trying to make pyepics work in a python process embedded in an IOC sounds like a fine idea to me.  Currently, pyepics is not testing that. I could be wrong, but it looks to me like pyDevSup isn't testing the use of pyepics.  Without such tests, I don't think it is really sensible to expect this combination to work without some real effort.
> 
>     Yup.  Coordinating clean shutdown of a multi-threaded app. isn't a "just works" situation.
>     Adding multiple exit hook mechanisms to the mix doesn't help this.
> 
> 
> Right.  But, it does seem like your suggestion might allow us to reliably infer and distinguish the two cases (embedded vs plain Python program).  I think that would go a long way towards making it "just work" even if it meant "just a bit of harmless brain alteration".

Great, I'll get my drill!

References:
libca bug? Bruno Martins via Tech-talk
Re: libca bug? Matt Newville via Tech-talk
Re: libca bug? Michael Davidsaver via Tech-talk
Re: libca bug? Matt Newville via Tech-talk
Re: libca bug? Michael Davidsaver via Tech-talk
Re: libca bug? Matt Newville via Tech-talk

ANJ, 07 May 2019