Experimental Physics and Industrial Control System
Hi Sergey,
Sergey Stepanov wrote:
> Hi Andrew,
> we wonder if you have seen this: we are often observing
> the following error when using caget and caput (the EPICS
> base programs) on Redhat EL5 and Fedora-6:
I have not seen this yet as I haven't had either of those OSs available
before now -- I've been asking IT to upgrade my PC to Fedora-6 for some
time, and it finally looks like they're able to do it so I will
investigate this problem as soon as I can.
Has anyone else on core-talk seen this problem?
> gmca@bl1ws3:~ 43> caget 23i:GO:Om:Start.DESC
> 23i:GO:Om:Start.DESC Gonio Omega
> epicsThreadOnceOsd epicsMutexLock failed.
>
>
> gmca@bl1ws3:~ 47> caput 23i:GO:Om:Start.DESC "Gonio Omega"
> Old : 23i:GO:Om:Start.DESC Gonio Omega
> New : 23i:GO:Om:Start.DESC Gonio Omega
> epicsThreadOnceOsd epicsMutexLock failed.
>
> This is happening at least with EPICS BASE-3.14.9 and BASE-3.14.8.2.
> At the same time the error does not show up on Redhat EL3 and
> Redhat EL4 (with the same versions of EPICS BASE). The rate of
> the error is about 5-10%.
>
> Does it ring the bell for you? We did not find any note on this
> problem in the EPICS Mantis Bug Reports list. Also we do not
> have an estimate how harmful it is. In case of the error the caput
> program still does what it is asked to do (i.e. sets the new value),
> but we are not sure how the other EPICS software would behave.
>
> We also tried caget and caput from EPICS extensions and got the
> same problem.
The error message is from src/libCom/osi/os/posix/osdThread.c at the top
of the routine epicsThreadOnceOsd(), and occurs if mutexLock(&onceLock)
returns a non-zero status. The mutexLock() routine is one of ours that
calls pthread_mutex_lock() and retries whenever it returns EINTR (a
return value that SUSv3 says pthread_mutex_lock() must never produce).
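For anyone following along without the source to hand, the pattern described above looks roughly like the sketch below. This is an illustration only, not the code from osdThread.c: the names lock_retry_eintr(), run_once(), once_lock and bump() are made up for this example, and the real routine's bookkeeping may differ.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mutexLock() helper described above:
 * call pthread_mutex_lock() and retry only if it reports EINTR. */
static int lock_retry_eintr(pthread_mutex_t *m)
{
    int status;
    do {
        status = pthread_mutex_lock(m);
    } while (status == EINTR);
    return status;          /* 0 on success, otherwise an errno value */
}

/* Hypothetical process-wide lock guarding all "once" initializations. */
static pthread_mutex_t once_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run func(arg) only on the first call for a given *done flag.
 * Any lock failure is reported and treated as fatal, which is the
 * behaviour that produces the message quoted in this thread. */
static void run_once(int *done, void (*func)(void *), void *arg)
{
    int status;

    assert(done != NULL && func != NULL);
    status = lock_retry_eintr(&once_lock);
    if (status != 0) {
        fprintf(stderr, "run_once: mutex lock failed (status %d)\n", status);
        exit(1);
    }
    if (!*done) {
        *done = 1;          /* mark before calling so func runs only once */
        func(arg);
    }
    pthread_mutex_unlock(&once_lock);
}

/* Example initializer: increments the int that arg points to. */
static void bump(void *arg)
{
    ++*(int *)arg;
}
```

Calling run_once(&done, bump, &count) twice with the same done flag should leave count at 1. The point of the sketch is that the failure path is reached only when the lock call itself returns an error other than EINTR, so the message says something about the state of the mutex, not about the initializer.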
The result of a bad status (other than EINTR) from pthread_mutex_lock()
is for epicsThreadOnceOsd() to call exit(1). However since Sergey's
caput and caget operations seem to be completing normally, it looks like
this may be some kind of shutdown issue. The onceLock mutex does get
destroyed after myAtExit() has run epicsExitCallAtExits() and called
pthread_cancel() on all threads other than itself and main, and this
might explain the issue -- if a thread other than main invokes exit()
and is used to perform the atExit() cleanups itself, the main thread
might still try to run something that checks the onceLock before the
process breathes its very last breath.
I'll add some more instrumentation and try to replicate this on Fedora-6
but any assistance is welcome...
- Andrew
--
The right to be heard does not automatically include
the right to be taken seriously. -- Hubert H. Humphrey