Hi Nick,
Rees, NP (Nick) wrote:
We are occasionally having Channel Access problems on some R3.14.8.2
vxWorks IOCs. It seems that one of the database semaphores isn't being
released for some reason and this is screwing everything up. No task is
suspended, and there are no inverted priorities indicating a deadlock
(but this is no guarantee). Has anyone else seen this?
The details follow.
The simplest symptom is caget fails as follows:
[npr78@i06-ws002 ~]$ caget BL06I-AL-SLITS-01:YA
Read operation timed out: some PV data was not read.
BL06I-AL-SLITS-01:YA 0
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "op=0, channel=BL06I-AL-SLITS-01:YA, type=DBR_TIME_DOUBLE,
count=1, ctx="BL06I-MO-IOC-01.diamond.ac.uk:5064""
Source File: ../getCopy.cpp line 82
Current Time: Fri Nov 03 2006 16:30:00.426411000
..................................................................
If I try a dbgf on the IOC to try and get the same parameter it hangs
and the stack trace emitted after Ctrl-C is as follows:
BL06I-MO-IOC-01 -> dbgf "BL06I-AL-SLITS-01:YA"
231f7c vxTaskEntry +68 : shell ()
1f81e0 shell +190: 1f820c ()
1f840c shell +3bc: execute ()
1f8590 execute +d8 : yyparse ()
2122d0 yyparse +71c: 210668 ()
2107ec yystart +96c: dbgf ()
1e752e18 dbgf +15c: dbGetField ()
1e73f3fc dbGetField +68 : dbScanLock ()
1e73c70c dbScanLock +1b4: epicsMutexLock ()
1e8089c0 epicsMutexLock +24 : semTake ()
2283ac semTake +13c: semMTake ()
tShell restarted.
So the shell is hanging because it can't get a lock in dbScanLock.
Address 0x1e73c70c is somewhere in the middle of dbScanLock - the next
routine is dbScanUnlock, and the addresses of the two routines are:
BL06I-MO-IOC-01 -> lkup "dbScanLock"
dbScanLock 0x1e73c558 text (BL06I-MO-IOC-01.munch)
BL06I-MO-IOC-01 -> lkup "dbScanUnlock"
dbScanUnlock 0x1e73c8b8 text (BL06I-MO-IOC-01.munch)
In the middle of dbScanLock there are various statements of the form:
epicsMutexMustLock(plockSet->lock);
epicsMutexMustLock(lockSetModifyLock);
... and so the problem is presumably in one of these semaphores.
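As a quick check of the reasoning above, the offset arithmetic can be
sketched in C. The helper names (offset_in_routine, addr_in_routine) are
hypothetical and are not part of EPICS or vxWorks; only the three
addresses come from the stack trace and the lkup output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of a return address from the start of the routine containing it.
 * Hypothetical helper for illustration only. */
uint32_t offset_in_routine(uint32_t addr, uint32_t routine_start)
{
    assert(addr >= routine_start);
    return addr - routine_start;
}

/* True when addr lies in [routine_start, next_routine_start), i.e. before
 * the first instruction of the routine that follows in memory. */
bool addr_in_routine(uint32_t addr, uint32_t routine_start,
                     uint32_t next_routine_start)
{
    return addr >= routine_start && addr < next_routine_start;
}
```

With the values above, 0x1e73c70c - 0x1e73c558 = 0x1b4, which matches the
"+1b4" shown in the stack trace and lies below the start of dbScanUnlock
at 0x1e73c8b8.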
Has anyone seen something similar? Does anyone have suggestions of what
I should do next time for diagnostics?
Unlike dbgf, dbpr doesn't try to lock the lockset, so I would definitely
recommend that you use that instead next time. There are also a couple
of tools that you should run: dblsr, which can optionally take the name
of a record and an interest level, and dbLockShowLocked, which takes an
interest level.
What is the record type and device type of the BL06I-AL-SLITS-01:YA
record? From your symptoms it does appear that the record's lockset is
locked, but this could easily be due to a device support issue of some
kind, which you would have to investigate. Device support is
responsible for locking a record before reprocessing it at asynchronous
I/O completion time, and it's possible that it might not have unlocked it
again afterwards in some circumstances. Check the state of the tasks
associated with all devices that are connected with the record.
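To illustrate the kind of device support bug described above, here is a
minimal sketch in plain C. It is not EPICS code: toy_record, toy_lock,
toy_unlock and both completion functions are hypothetical stand-ins, with
toy_lock/toy_unlock playing the role that dbScanLock/dbScanUnlock play in
a real completion callback. It only shows how an early return on an error
path can leave a lock held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a record and its lock; NOT the EPICS implementation. */
typedef struct {
    int held;       /* 1 while a caller holds the lock, 0 otherwise */
    int processed;  /* number of times the record has been processed */
} toy_record;

void toy_lock(toy_record *r)
{
    r->held = 1;
}

void toy_unlock(toy_record *r)
{
    assert(r->held == 1);  /* unlocking a lock that is not held is a bug */
    r->held = 0;
}

/* Buggy completion callback: the error path returns with the lock held,
 * so every later attempt to take the lock would block. */
int completion_buggy(toy_record *r, bool io_failed)
{
    toy_lock(r);
    if (io_failed) {
        return -1;  /* BUG: lock is never released on this path */
    }
    r->processed++;
    toy_unlock(r);
    return 0;
}

/* Corrected callback: a single exit point releases the lock on every path. */
int completion_fixed(toy_record *r, bool io_failed)
{
    int status = 0;

    toy_lock(r);
    if (io_failed) {
        status = -1;
    } else {
        r->processed++;
    }
    toy_unlock(r);
    return status;
}
```

After completion_buggy(&rec, true) the held flag stays at 1, which is the
toy equivalent of a lockset that stays locked; completion_fixed leaves it
at 0 on both the success and the failure path.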
This record might not have been the one that actually locked the
lockset, so you'll have to bear that in mind during your
investigations and look at any other device supports related to that
lockset.
HTH,
- Andrew
--
There is considerable overlap between the intelligence of the smartest
bears and the dumbest tourists -- Yosemite National Park Ranger
- References:
- Database hanging Rees, NP (Nick)