EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Can't create mutex semaphore: too many
From: Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov>
To: Till Straumann <till.straumann at psi.ch>, "Luchini, Kristi L." <luchini at slac.stanford.edu>
Cc: Joel Sherrill <joel.sherrill at gmail.com>, Talk EPICS Tech <tech-talk at aps.anl.gov>
Date: Wed, 25 May 2022 14:15:01 -0700
On 5/25/22 08:35, Michael Davidsaver wrote:
On 5/25/22 01:03, Till Straumann wrote:
Hi Kristi.

I'm sorry that I only get to look at this after a long time.

The list of mutexes is indeed very, very long and therefore
the message that no more can be created seems legit.

I believe that your problem could be caused by an algorithm that leaks
mutexes, i.e., a mutex is created, some operation fails and
this process is repeated without destroying the mutex.

Indeed this appears to be the case.  There is an std::map used to track
which PVA servers have sent beacons.  Entries are added, but never removed.
Each entry has a Mutex (this code does very granular locking...).

I can't be certain that this is the cause of what Kristi reports, but it
seems a likely culprit.

https://github.com/epics-base/pvAccessCPP/issues/184


It may be a while before I have time to dig into this.  If someone else wants
to have a go first, my suggested starting point would be to add LRU behavior.
BeaconHandler::beaconNotify() is already passed (but ignores) the current time.
Then remove old entries as new beacons arrive subject to a max. size,
and/or periodically remove entries for servers which haven't sent a beacon
in some arbitrary time (like PVXS does).

https://github.com/mdavidsaver/pvxs/blob/f22ab94458abb9ed58af0582e88cf1b2b40ed52a/src/client.cpp#L1084-L1121
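
A minimal sketch of the suggestion above, assuming a table keyed by server address that records the time of the last beacon. BeaconTable, BeaconEntry, the size cap and the age limit are all hypothetical names and values chosen for illustration; this is not the pvAccessCPP or PVXS code. It only shows the two pruning rules: evict the least recently seen entry when the table is full, and drop entries that have been silent too long.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the per-server state kept by the beacon tracker.
// In the real code each entry owns a Mutex, so unbounded growth of the map
// means unbounded growth in the number of semaphores.
struct BeaconEntry {
    double lastSeen; // seconds; time of the most recent beacon from this server
};

// Illustrative bounded table (assumption: not an existing EPICS class).
class BeaconTable {
public:
    BeaconTable(std::size_t maxEntries, double maxAgeSec)
        : maxEntries_(maxEntries), maxAgeSec_(maxAgeSec)
    {
        assert(maxEntries_ > 0);
    }

    // Record a beacon from 'server' received at time 'now'.
    void beaconNotify(const std::string& server, double now)
    {
        expire(now);
        std::map<std::string, BeaconEntry>::iterator it = entries_.find(server);
        if (it != entries_.end()) {
            it->second.lastSeen = now; // known server: just refresh
            return;
        }
        if (entries_.size() >= maxEntries_)
            evictOldest();             // make room before inserting
        entries_.emplace(server, BeaconEntry{now});
    }

    // Drop entries for servers that have been silent longer than maxAgeSec_.
    void expire(double now)
    {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (now - it->second.lastSeen > maxAgeSec_)
                it = entries_.erase(it);
            else
                ++it;
        }
    }

    std::size_t size() const { return entries_.size(); }
    bool contains(const std::string& server) const
    {
        return entries_.count(server) != 0;
    }

private:
    // Linear scan for the least recently seen entry; fine for a small cap.
    void evictOldest()
    {
        auto oldest = entries_.begin();
        for (auto it = entries_.begin(); it != entries_.end(); ++it)
            if (it->second.lastSeen < oldest->second.lastSeen)
                oldest = it;
        entries_.erase(oldest);
    }

    std::size_t maxEntries_;
    double maxAgeSec_;
    std::map<std::string, BeaconEntry> entries_;
};
```

With a cap of 2, notifying for servers "a", "b" and "c" in that order leaves only "b" and "c" in the table. A later beacon arriving after the age limit has passed clears the stale entries before the new one is added. The linear scan in evictOldest() keeps the sketch short; a real implementation might prefer an ordered-by-time index.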


I was unable to find the code that produces the 'Error on UDP RX xxx -> xxx'.
It is not in any of the RTEMS or EPICS sources that I still have on my
computer.

This particular message is coming from pvAccessCPP.  Those seen earlier in the log
do not.  I'm somewhat surprised that reception of a UDP packet would cause a
mutex/semaphore to be allocated.  However, that code is hard to follow (waaaay too
many virtual function calls for my taste).

https://github.com/epics-base/pvAccessCPP/blob/f72c7e653c607f95daa818d14dfc928d59cfe8ff/src/remote/blockingUDPTransport.cpp#L283-L284


It could be helpful if you could identify the code that generates said message.
If analyzing it does not produce any clues then you might need a bigger hammer
(such as hacking the epicsMutex code to attach a stack trace to every mutex
recording how it was created. Once the problem occurs you look at these
stack traces and then you can see if my hypothesis holds, i.e., there is indeed
some sort of loop/repeated pattern and the traces also point you to the code that
deserves scrutiny).

The first place to start is running "epicsMutexShowAll(0,1)" (e.g. from iocsh).
Hopefully this will give some hint.

The epicsMutex code tries to keep track of the file+line where each mutex was
allocated.  This is why epicsMutexCreate() is a C macro, although the real
call site can be obscured by wrapper code.

#define epicsMutexCreate() epicsMutexOsiCreate(__FILE__,__LINE__)
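
To illustrate how a macro like this helps find a leak, here is a small sketch that counts live mutexes per creating file+line. Everything in it (TrackedMutex, trackedCreate, trackedDestroy, TRACKED_MUTEX_CREATE, worstSite) is hypothetical and only mimics the idea; it is not the epicsMutex implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// (file, line) of the code that created a mutex.
typedef std::pair<std::string, int> CallSite;

// Hypothetical registry: number of live mutexes per creating call site.
inline std::map<CallSite, int>& liveBySite()
{
    static std::map<CallSite, int> counts;
    return counts;
}

// Stand-in for a mutex object; it only remembers where it was created.
struct TrackedMutex {
    CallSite site;
};

inline TrackedMutex* trackedCreate(const char* file, int line)
{
    TrackedMutex* m = new TrackedMutex{CallSite(file, line)};
    ++liveBySite()[m->site];
    return m;
}

inline void trackedDestroy(TrackedMutex* m)
{
    assert(m != nullptr);
    --liveBySite()[m->site];
    delete m;
}

// The macro captures the caller's location, as epicsMutexCreate() does.
#define TRACKED_MUTEX_CREATE() trackedCreate(__FILE__, __LINE__)

// Return the call site holding the most live mutexes, or ("", 0) if none.
inline CallSite worstSite()
{
    CallSite worst("", 0);
    int most = 0;
    for (const auto& kv : liveBySite()) {
        if (kv.second > most) {
            most = kv.second;
            worst = kv.first;
        }
    }
    return worst;
}
```

A loop that calls TRACKED_MUTEX_CREATE() repeatedly without a matching trackedDestroy() shows up as a single call site with a steadily growing count. If the macro is only ever used inside one wrapper function, every mutex is attributed to that wrapper, which is the obscuring effect mentioned above.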


HTH
- Till

On 5/10/22 20:44, Luchini, Kristi L. wrote:

Hi Till,

I should point out that the VME ioc is running RTEMS 4.10.2. I understood from Shantha that the RTEMS changes in this version, or maybe a previous version, were made by someone at SLAC other than you. That version does have an unrelated known bug regarding the 2nd LAN port, and we haven’t been able to move with RF due to that error. So I don’t know if this is a newly introduced problem, but I don’t recall this occurring in the RTEMS 4.9.4 version. I’ll double check with Sonya since this is her IOC. It’s odd because this ioc doesn’t do much. It is loading the BsaCore epics module, and there was a bug that caused a crash when your ioc didn’t have any BSA pvs or an EVR. This ioc doesn’t have BSA pvs but it does have an EVR. In any case this binary includes the bug fix for BSA core.

I’ve attached the screenlog.0 file with the rtemsMonitor() command you recommend.

Regards,

  * Kristi

*From:* Till Straumann <till.straumann at psi.ch>
*Sent:* Monday, May 9, 2022 11:45 PM
*To:* Joel Sherrill <joel.sherrill at gmail.com>; Michael Davidsaver <mdavidsaver at gmail.com>; Luchini, Kristi L. <luchini at slac.stanford.edu>
*Cc:* Talk EPICS Tech <tech-talk at aps.anl.gov>
*Subject:* Re: Can't create mutex semaphore: too many

The GeSys application is configured for an unlimited number of semaphores and, in
the case of beatnik/mvme6100, 15MB of CONFIGURE_EXECUTIVE_RAM_SIZE.

(https://github.com/till-s/rtems-gesys/blob/master/config.c, for beatnik MEMORY_HUGE is defined).

If the rtems monitor is loaded (make sure the 'monitor.obj' module is loaded before you
run into the problem as the monitor creates at least one semaphore for its own use; IIRC
the standard startup scripts at SLAC do in fact load the monitor)
then you can invoke the monitor from the cexp shell and dump info about semaphores

cexpsh> rtemsMonitor()
monitor> sema

(note that the monitor has  its own interpreter with syntax that differs from cexpsh's; leave
the monitor by typing 'exit')

monitor> exit
cexpsh>

Strange - in my many years at SLAC I have never seen this problem...

HTH
- Till

On 5/10/22 04:14, Joel Sherrill via Tech-talk wrote:

    On Mon, May 9, 2022, 7:41 PM Michael Davidsaver <mdavidsaver at gmail.com> wrote:

        On 5/9/22 16:42, Joel Sherrill wrote:
        >
        >
        > On Mon, May 9, 2022, 5:45 PM Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov> wrote:
        >
        >     On 5/9/22 15:08, Luchini, Kristi L. via Tech-talk wrote:
        >      > Hello,
        >      >
        >      > I have an MVME6100 ioc running RTEMS that appears to be having a resource issue.  Does anyone recognize the error messages below and happen to know of a function call that I can add to the st.cmd to increase that resource? Do I need to increase the number of file descriptors?
        >      >
        >      > If I reboot the ioc the message goes away and the ioc works fine, but after some period of time, a month or so, the message reappears and you can’t access pvs over CA, which I assume is because the ioc is too busy spewing error messages.
        >      >
        >      > Any help would be appreciated.
        >
        >     Sounds like there is a resource leak somewhere, at least of RTEMS semaphores.
        >     I'm not sure if the mention of "file descriptor" is real, or a case of
        >     overloading the meaning of an errno code.
        >
        >     Which drivers and other code are being loaded?
        >
        >     Are any of these creating worker threads after iocInit() ?
        >
        >     Are there any earlier error messages being logged?
        >
        >     You could try running "epicsThreadShowAll()" to see if some driver threads are
        >     hanging around when they shouldn't.
        >
        >     Unfortunately, there is no "epicsMutexShowAll" implemented for RTEMS 4.X, and
        >     I don't know if your RTEMS builds include the RTEMS shell and its associated
        >     diagnostics functions.
        >
        >
        > I don't know what RTEMS configure options are used but there is an option for unlimited and another for unified workspace. If they both are configured, you have one pool of memory to allocate everything from. This includes RTEMS objects and things you allocate via malloc() and new. In this configuration, any leak would be bad and could lead to this. It would just be a matter of which create or allocate failed. Malloc() will return NULL in this situation and that often isn't error checked.

        @joel, I'm 90% certain that Kristi is using Till's GESYS application,
        which has a different RTEMS configuration.  So the following isn't
        relevant for @SLAC.

        Still.  fyi.  The current 7.0 default RTEMS configurations for EPICS applications are:

        for RTEMS  <= 4.x which uses separate pools.

        https://github.com/epics-base/epics-base/blob/cbae8d37b3da486ac8a68ad6ef9d2028cd98cca0/modules/libcom/RTEMS/score/rtems_config.c#L29-L38

    I think that for 4.10, the cpp logic is setting unified workspace. If I remember correctly that's the first version with that feature.



        for RTEMS >= 5.x which sets CONFIGURE_UNIFIED_WORK_AREAS

        https://github.com/epics-base/epics-base/blob/cbae8d37b3da486ac8a68ad6ef9d2028cd98cca0/modules/libcom/RTEMS/posix/rtems_config.c#L64-L72

    This uses 64 as the maximum file descriptors while the previous had a much larger number. But 64 is still a lot. You shouldn't run out. :)

    Kristi, what are the configure settings for maximums/unlimited and unified workspace?

    I am pretty sure unified workspace was first an option around 4.10.

    --joel




        > --joel
        >
        >
        >
        >      > Thanks,
        >      >
        >      > Kristi
        >      >
        >      > Message from RTEMS ioc:
        >      >
        >      > Can't create mutex semaphore: too many
        >      >
        >      > Can't create mutex semaphore: too many
        >      >
        >      > Can't create mutex semaphore: too many
        >      >
        >      > Can't create mutex semaphore: too many
        >      >
        >      > Cexp@ioc-in10-mp01>
        >      >
        >      > Cexp@ioc-in10-mp01>Error on UDP RX 172.27.72.43:47591 -> 172.27.75.255:5076 at 46 : Can't create mutex semaphore: too many
        >      >
        >      > epicsMutex::mutexCreateFailed()
        >      >
        >      > 0x00 ca024000 27000000 d5b37962 00000000 ..@. '... ..yb ....
        >      >
        >      > 0x10 c84f9922 00740000 00000000 00000000  .O." .t.. .... ....
        >      >
        >      > 0x20 0000ffff 00000000 d3130374 6370ff    .... .... ...t cp.
        >      >
        >      > CAS: Client accept error: Too many open files in system (16)
        >      >
        >      > Error on UDP RX 172.27.72.99:37311 -> 172.27.75.255:5076 at 46 : Can't create mutex semaphore: too many
        >      >
        >      > epicsMutex::mutexCreateFailed()
        >      >
        >      > 0x00 ca024000 27000000 a5a77862 00000000 ..@. '... ..xb ....
        >      >
        >      > 0x10 ea850c38 004f0000 00000000 00000000  ...8 .O.. .... ....
        >      >
        >      > 0x20 0000ffff 00000000 d3130374 6370ff    .... .... ...t cp.
        >      >
        >      >   * Kristi
        >      >
        >




