We are occasionally having Channel Access problems on some R3.14.8.2
vxWorks IOCs. It seems that one of the database semaphores isn't being
released for some reason, and this is screwing everything up. No task is
suspended, and there are no inverted priorities to indicate a deadlock
(although that is no guarantee there isn't one). Has anyone else seen
this?
The details follow.
The simplest symptom is that caget fails as follows:
[npr78@i06-ws002 ~]$ caget BL06I-AL-SLITS-01:YA
Read operation timed out: some PV data was not read.
BL06I-AL-SLITS-01:YA 0
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "op=0, channel=BL06I-AL-SLITS-01:YA, type=DBR_TIME_DOUBLE,
count=1, ctx="BL06I-MO-IOC-01.diamond.ac.uk:5064""
Source File: ../getCopy.cpp line 82
Current Time: Fri Nov 03 2006 16:30:00.426411000
..................................................................
If I run dbgf on the IOC to read the same PV, it hangs; the stack trace
printed after Ctrl-C is as follows:
BL06I-MO-IOC-01 -> dbgf "BL06I-AL-SLITS-01:YA"
231f7c vxTaskEntry +68 : shell ()
1f81e0 shell +190: 1f820c ()
1f840c shell +3bc: execute ()
1f8590 execute +d8 : yyparse ()
2122d0 yyparse +71c: 210668 ()
2107ec yystart +96c: dbgf ()
1e752e18 dbgf +15c: dbGetField ()
1e73f3fc dbGetField +68 : dbScanLock ()
1e73c70c dbScanLock +1b4: epicsMutexLock ()
1e8089c0 epicsMutexLock +24 : semTake ()
2283ac semTake +13c: semMTake ()
tShell restarted.
So the shell is hanging because it can't get a lock in dbScanLock.
Address 0x1e73c70c is somewhere in the middle of dbScanLock; the next
routine is dbScanUnlock, and the addresses of the two routines are:
BL06I-MO-IOC-01 -> lkup "dbScanLock"
dbScanLock 0x1e73c558 text (BL06I-MO-IOC-01.munch)
BL06I-MO-IOC-01 -> lkup "dbScanUnlock"
dbScanUnlock 0x1e73c8b8 text (BL06I-MO-IOC-01.munch)
In the middle of dbScanLock there are various statements of the form:
epicsMutexMustLock(plockSet->lock);
epicsMutexMustLock(lockSetModifyLock);
... and so the problem is presumably in one of these semaphores.
Has anyone seen something similar? Does anyone have suggestions for what
diagnostics I should run next time?