
Experimental Physics and Industrial Control System


 

Subject: Re: Mutex error on recent Redhat systems
From: Andrew Johnson <[email protected]>
To: Sergey Stepanov <[email protected]>
Cc: EPICS core-talk <[email protected]>
Date: Fri, 18 May 2007 11:45:22 -0500

Hi Sergey,

Sergey Stepanov wrote:
> Hi Andrew,
> we wonder if you have seen this: we are often observing
> the following error when using caget and caput (the EPICS
> base programs) on Redhat EL5 and Fedora-6:

I have not seen this yet, as I haven't had either of those OSs available 
before now. I've been asking IT to upgrade my PC to Fedora-6 for some 
time, and it finally looks like they're able to do it, so I will 
investigate this problem as soon as I can.

Has anyone else on core-talk seen this problem?

> gmca@bl1ws3:~ 43> caget 23i:GO:Om:Start.DESC
> 23i:GO:Om:Start.DESC           Gonio Omega
> epicsThreadOnceOsd epicsMutexLock failed.
> 
> 
> gmca@bl1ws3:~ 47> caput 23i:GO:Om:Start.DESC "Gonio Omega"
> Old : 23i:GO:Om:Start.DESC           Gonio Omega
> New : 23i:GO:Om:Start.DESC           Gonio Omega
> epicsThreadOnceOsd epicsMutexLock failed.
> 
> This is happening at least with EPICS BASE-3.14.9 and BASE-3.14.8.2.
> At the same time the error does not show up on Redhat EL3 and
> Redhat EL4 (with the same versions of EPICS BASE). The rate of
> the error is about 5-10%.
> 
> Does it ring the bell for you? We did not find any note on this
> problem in the EPICS Mantis Bug Reports list. Also we do not
> have an estimate how harmful it is. In case of the error the caput
> program still does what it is asked to do (i.e. sets the new value),
> but we are not sure how the other EPICS software would behave.
> 
> We also tried caget and caput from EPICS extensions and got the
> same problem.

The error message is from src/libCom/osi/os/posix/osdThread.c at the top 
of the routine epicsThreadOnceOsd(), and occurs if mutexLock(&onceLock) 
returns a non-zero status.  The mutexLock() routine is one of ours that 
calls pthread_mutex_lock() and handles retrying in the event that it 
returns EINTR (which violates the SUSv3).
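
To make the control flow concrete, here is a minimal sketch of that 
pattern. It is not the code from osdThread.c: the names 
lock_retry_eintr(), run_once(), once_lock and count_call() are 
hypothetical stand-ins, and the sketch returns the bad status to the 
caller where the real routine prints its message and exits.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the wrapper described above: keep calling
 * pthread_mutex_lock() while it reports EINTR, and hand any other
 * status (0 on success) back to the caller. */
static int lock_retry_eintr(pthread_mutex_t *m)
{
    int status;
    do {
        status = pthread_mutex_lock(m);
    } while (status == EINTR);
    return status;
}

/* Hypothetical global lock guarding all "once" flags. */
static pthread_mutex_t once_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run init(arg) the first time this is called for a given flag.
 * A non-zero lock status is reported and returned; this is the point
 * at which the routine discussed above prints its error and exits. */
static int run_once(int *done, void (*init)(void *), void *arg)
{
    int status;

    assert(done != NULL && init != NULL);
    status = lock_retry_eintr(&once_lock);
    if (status != 0) {
        fprintf(stderr, "run_once: lock failed: %s\n", strerror(status));
        return status;
    }
    if (!*done) {
        init(arg);      /* runs with once_lock held in this sketch */
        *done = 1;
    }
    pthread_mutex_unlock(&once_lock);
    return 0;
}

/* Example init routine: counts how many times it has been invoked. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

Calling run_once(&flag, count_call, &n) twice with the same flag should 
invoke count_call() only once. A lock failure in this sketch can only 
come from pthread_mutex_lock() itself, for example when the mutex is no 
longer valid.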

The result of a bad status (other than EINTR) from pthread_mutex_lock() 
is for epicsThreadOnceOsd() to call exit(1).  However, since Sergey's 
caput and caget operations seem to be completing normally, it looks like 
this may be some kind of shutdown issue.  The onceLock mutex does get 
destroyed after myAtExit() has run epicsExitCallAtExits() and called 
pthread_cancel() on all threads other than itself and main, and this 
might explain the issue -- if a thread other than main invokes exit() 
and is used to perform the atExit() cleanups itself, the main thread 
might still try to run something that checks the onceLock before the 
process breathes its very last breath.

I'll add some more instrumentation and try to replicate this on Fedora-6 
but any assistance is welcome...

- Andrew
-- 
The right to be heard does not automatically include
the right to be taken seriously. -- Hubert H. Humphrey
_______________________________________________
Core-talk mailing list
[email protected]
http://www.aps.anl.gov/mailman/listinfo/core-talk
