I will test it next week.
Jeff Hill wrote:
DB CA link does not connect through the gateway
I committed a PCAS change this evening to the R3.14 branch which is
hopefully a fix for this issue. See mantis entry 360. This fix, and a number
of other fixes to PCAS will be in R3.14.11 RC1. I also installed an upgrade
into the gateway so that it can now use ca_put for put requests and
ca_put_callback for put callback requests.
PS: These changes impact the CA gateway, but not IOCs.
Jeff
______________________________________________________
Jeffrey O. Hill Email [email protected]
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
-----Original Message-----
From: Shepherd, Emma (DLSLtd,RAL,DIA) [mailto:[email protected]]
Sent: Tuesday, August 11, 2009 3:00 AM
To: Jeff Hill; Rees, Nick (DLSLtd,RAL,DIA); Dirk Zimoch
Cc: tech-talk
Subject: RE: CA reconnection problem
Hi all,
I've gone back through my notes as well to see what I can remember about
this. I think part of the problem is that we have had several different
issues at Diamond with similar symptoms:
1) The CAC-TCP-recv task gets suspended - I don't know what causes it but
things are well and truly broken when this happens and we have to reboot
the IOC. I haven't seen this in quite a while though.
2) A timerQueue task responsible for scheduling UDP requests gets
suspended. In one investigation the task trace showed that an exception
had been thrown and not handled properly - Mantis entry 336 was created
for this and I believe you committed a fix Jeff.
3) Some CA links not working but nothing obvious wrong, no suspended
tasks. Like in Dirk's case, the link is connecting through a gateway and
re-writing the record name to the link can fix that particular link.
I hope that helps to explain our situation at the moment, let me know if
there's any more investigation I can do here.
Emma
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Jeff Hill
Sent: 11 August 2009 02:02
To: Rees, Nick (DLSLtd,RAL,DIA); 'Dirk Zimoch'
Cc: 'tech-talk'
Subject: RE: CA reconnection problem
I had another look at this issue this evening and suspect now
that the timer queue entries with negative delays reflect
only the time spent (on the order of 20 seconds) holding the
lock while dumping "dbcar" at high intrest levels to the slow
9600 baud serial port.
So perhaps its best to step back a bit and look at what
Diamond and SLS have in common. It seems that both sites have
a db ca link connecting through the gateway when this occurs.
Dirk mentioned that R3.13 CA links connect fine at SLS, but
R3.14 ones don't. That can probably be explained by the fact
that the R3.13 ca client library disconnects both the circuit
and the application when a circuit is unresponsive. In
contrast, the R3.14 ca client library disconnects only the
application, and not the circuit when the circuit is
unresponsive relying instead on TCP to disconnect broken circuits.
So what probably is occurring is an issue with the gateway
where PCAS isn't properly scheduling itself to interact with
the R3.14 ca client's circuit, but the circuit remains
connected at TCP's level. In Dirk's situation I seem to
recall that a power outage was the precipitating event. A
precipitating event at Diamond maybe isn't identified?
There are an assortment of PCAS, and also gateway
improvements, coming soon in R3.14.11 RC1 but I'm not certain
that this one has been fixed. I will see if I can reproduce it.
Jeff
______________________________________________________
Jeffrey O. Hill Email [email protected]
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of Jeff Hill
Sent: Monday, August 10, 2009 11:05 AM
To: 'Rees, NP (Nick)'; 'Dirk Zimoch'
Cc: 'tech-talk'
Subject: RE: CA reconnection problem
Nick, Dirk,
I think this is possibly the same problem we described in the
thread:
"Inter-IOC link problems"
http://www.aps.anl.gov/epics/tech-talk/2008/msg01103.php
I vaguely recall that there was a mantis entry for the
above, but have
had no success finding it. I did find 164 which is somewhat
close, but
probably not the same issue. Also 330 (which will probably soon be
assigned not-a- bug
status) might be of some interest if you are seeing
unexpectedly long
reconnect times with R3.14.
Nevertheless, working only from my memory, it's not clear that Emma
Shepherd's and Dirk's issues have the same symptoms. In Dirk's
situation dbcar (at high interest levels) shows entries in the CA
client library's timer queue with negative expiration which, as I
recall, was not identified as a symptom with Emma Shepherd's
situation? The tech-talk thread does indicate some close
examination
of the dbcar output occurred, which probably
would have immediately identified the negative expiration
times on the
timer
queue as a problem?
In any case you now have a common symptom to look for with
dbcar the
next time that it happens.
Stack traces from _all_ of the dbca, CA client library, and timer
threads involved will be the information I need to
determine a cause.
When deciding on which threads traces to send, look at the threads
operating at close to the same scheduling priority as dbca in the
vxWorks "i" output. Emphasis on
all because if there is a deadlock issue they can be tricky
to identify
w/o
clear understanding of the lock hierarchy interactions
between the two or
more threads involved. Of course, if we have a deadlock
situation, then
any
threads that are continuously blocking on a mutex are of particular
interest.
Thanks for your help,
Jeff
______________________________________________________
Jeffrey O. Hill Email [email protected]
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
-----Original Message-----
From: Rees, NP (Nick) [mailto:[email protected]]
Sent: Thursday, August 06, 2009 6:29 AM
To: Dirk Zimoch; Jeff Hill
Cc: tech-talk
Subject: RE: CA reconnection problem
Hi Dirk, Jeff,
I think this is possibly the same problem we described in the
thread:
"Inter-IOC link problems"
http://www.aps.anl.gov/epics/tech-talk/2008/msg01103.php
That thread seemed to peter out because we basically developed
reboot workarounds that meant it didn't bother us as
much. However,
I don't think it was ever solved. I don't know whether there is a
Mantis entry for it.
Cheers,
Nick Rees
Principal Software Engineer Phone: +44 (0)1235-778430
Diamond Light Source Fax: +44 (0)1235-446713
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dirk Zimoch
Sent: 06 August 2009 10:22
To: Jeff Hill
Cc: 'tech-talk'
Subject: Re: CA reconnection problem
Hi Jeff,
Time advances normally on the IOC.
Bug 299 is not related because I have no suspended
tasks. Bug 330
is also not related because my channels never reconnect, even
after days.
The IOC connects to the gateway with normal broadcasts on the
local network. I can see beacons from all IOCs and
gateway and I
can see the beacon anomaly when
I start a softioc. Still the IOC does not send any search
requests to reconnect
the links. I can see the search requests from 3.13 IOCs,
3.14.8 medms and the
gateway after a beacon anomaly, but not from 3.14.8 IOCs. It
is definitely not
the case that the 3.14.8 IOC never stops searching.
I do not want to restart the gateway at the moment
because we have
user operation. I also think it's not necessary because
I can see
the channels
through the gateway with caget. I doubt that it is a gateway
problem at all.
Rebooting the IOC helps. Writing the record name to the
link again
helps too, but only for this link. All other links on
the same IOC
to the same target
record stay disconnected. I do not understand this.
UDP package size is not an issue because its only 8 broken CA
links to 2 records. So it should easily fit into one
package. But
as I said, I see no
package at all (except when I rewnew the link).
I would send a stack trace of the CAC timer thread if I
could find
it. What's the name?
Here is the full dbcar output, mybe it helps:
X04SA-VME-ID2 > dbcar "",100
connected X04SA-VME-ID2 X04SA-ID-LMD:PMD.INPB
X04SA-ID-LMD:SCALER.S3
connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-INJ.INPB
X04SA-ID-LMD:PMD
connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-NOINJ.INPB
X04SA-ID-LMD:PMD
connected X04SA-VME-ID2 X04SA-ID-LMD:PMU.INPB
X04SA-ID-LMD:SCALER.S2
connected X04SA-VME-ID2 X04SA-ID-LMU:PMD.INPB
X04SA-ID-LMU:SCALER.S3
connected X04SA-VME-ID2 X04SA-ID-LMU:PMU.INPB
X04SA-ID-LMU:SCALER.S2
connected 129.129.104.31:5064 X04SA-ID-FBPMD:ALARM.INPD
X04SA-ID-GAP:ISOPEN not_connected X04SA-ID-FBPMD:X-COMP.INPE
ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMD:X-COMP.INPI
ARIDI-PCT:CURRENT not_connected X04SA-ID-FBPMD:Y-COMP.INPE
ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMD:Y-COMP.INPI
ARIDI-PCT:CURRENT
connected 129.129.104.31:5064 X04SA-ID-FBPMU:ALARM.INPD
X04SA-ID-GAP:ISOPEN not_connected X04SA-ID-FBPMU:X-COMP.INPE
ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMU:X-COMP.INPI
ARIDI-PCT:CURRENT not_connected X04SA-ID-FBPMU:Y-COMP.INPE
ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMU:Y-COMP.INPI
ARIDI-PCT:CURRENT
connected 129.129.104.31:5064 X04SA-ID:FBPM-ILK.OUT
X04SA-ID:ILK-BITS.B1
connected X04SA-VME-ID2 X04SA-VME-ID2:HBTSYNC.INPA
X04SA-VME-ID2:HBT.VAL
connected X04SA-VME-ID2 X04SA-VME-ID2:MEMALLOC.INPA
X04SA-VME-ID2:MEMALLOCRA W
connected X04SA-VME-ID2 X04SA-VME-ID2:MEMFREE.INPA
X04SA-VME-ID2:MEMFREE-RAW
connected X04SA-VME-ID2 X04SA-VME-ID2:MEMMAX.INPA
X04SA-VME-ID2:MEMMAX-RAW
ncalinks 21 not connected 8 no_read_access 0 no_write_access 0
nDisconnect 0 nNoWrite 0
ca_client_context at 0x7ca6c90 pndRecvCnt=0 ioSeqNo=0
dbContext at 0x7ca6a80
event call back cache location 0x7bac130, and
its size 24
dbContextReadNotifyCache
dbContextReadNotifyCacheAlocator
count 1 and size 88
epicsMutexId 0x7ca6c20 source
../../../src/libCom/osi/epicsMutex.cpp line 197
Semaphore Id : 0x7ca6c50
Semaphore Type : MUTEX
Task Queuing : PRIORITY
Pended Tasks : 3
Owner : 0x7a331e0 (tShell)
Options : 0xd SEM_Q_PRIORITY
SEM_DELETE_SAFE
SEM_INVERSION_SAFE
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
timerQueue 7bf5d20 148 0
CAC-event 7bb5a90 148 0
CAC-TCP-recv 7a5e3b0 150 0
Channel Access Client Context at 0x7bf8cc0 for user ioc
revision "@(#) EPICS 3.14.8.2, CA Client
Library Nov 29
2006" Hash table with 512 buckets and 2 items of type 6tcpiiu
installed Virtual circuit to "129.129.104.31:5064" at
version V4.8 state 1
current data cache pointer = 0x7b88c20 current
data cache
size = 16384
contiguous receive message count=0, busy
detect bool=0,
flow control boo l=0
eceive thread is busy=0
virtual circuit socket identifier 20
send thread flush signal:
Semaphore Id : 0x7b88bc0
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 1
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7b8fba0 148 0
send thread:
recv thread:
echo pending bool = 0
IO identifier hash table:
Connected channels
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t ype DBF_ENUM, native
element count
1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 392
search retry number=1
name length=20
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t ype DBF_ENUM, native
element count
1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 393
search retry number=1
name length=20
Channel "X04SA-ID:ILK-BITS.B1", connected to server
129.129.104.31:5064, native type DBF_CHAR, native
element count 1,
read access, write access
network IO pointer = 0x7b96cd0
server identifier 394
search retry number=1
name length=21
Virtual circuit to "129.129.104.14:5064" at version
V4.11 state 1
current data cache pointer = 0x7af62b0 current
data cache
size = 16384
contiguous receive message count=0, busy
detect bool=0,
flow control boo l=0
eceive thread is busy=0
virtual circuit socket identifier 22
send thread flush signal:
Semaphore Id : 0x7a1f670
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 1
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7a61590 148 0
send thread:
recv thread:
echo pending bool = 0
IO identifier hash table:
Unresponsive circuit channels
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 1
search retry number=1
name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 2
search retry number=1
name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 3
search retry number=1
name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 4
search retry number=1
name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 5
search retry number=1
name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 6
search retry number=1
name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 7
search retry number=1
name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 8
search retry number=1
name length=19
Virtual circuit to "129.129.104.31:5064" at version V4.8 state 1
current data cache pointer = 0x7b88c20 current
data cache
size = 16384
contiguous receive message count=0, busy
detect bool=0,
flow control boo l=0
eceive thread is busy=0
virtual circuit socket identifier 20
send thread flush signal:
Semaphore Id : 0x7b88bc0
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 1
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7b8fba0 148 0
send thread:
recv thread:
echo pending bool = 0
IO identifier hash table:
Connected channels
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t ype DBF_ENUM, native
element count
1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 392
search retry number=1
name length=20
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t ype DBF_ENUM, native
element count
1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 393
search retry number=1
name length=20
Channel "X04SA-ID:ILK-BITS.B1", connected to server
129.129.104.31:5064, native type DBF_CHAR, native
element count 1,
read access, write access
network IO pointer = 0x7b96cd0
server identifier 394
search retry number=1
name length=21
Virtual circuit to "129.129.104.14:5064" at version
V4.11 state 1
current data cache pointer = 0x7af62b0 current
data cache
size = 16384
contiguous receive message count=0, busy
detect bool=0,
flow control boo l=0
eceive thread is busy=0
virtual circuit socket identifier 22
send thread flush signal:
Semaphore Id : 0x7a1f670
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 1
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
CAC-TCP-send 7a61590 148 0
send thread:
recv thread:
echo pending bool = 0
IO identifier hash table:
Unresponsive circuit channels
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 1
search retry number=1
name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 2
search retry number=1
name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 3
search retry number=1
name length=18
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 4
search retry number=1
name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 5
search retry number=1
name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 6
search retry number=1
name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 7
search retry number=1
name length=19
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 8
search retry number=1
name length=19
entries per bucket: mean = 0.003906 std dev = 0.062378 max = 1
connection time out watchdog period 30.000000
Datagram IO
circuit (and disconnected channel repository)
repeater port 5065
default server port 5064
Channel Access Address List
129.129.104.255:5064
socket identifier 18
bytes in xmit buffer 16
shut down command bool 0
recv thread exit signal:
repeater subscribe timer: attempts=2 registered=1 once=0
disconnect governor timer: search timer delay 0.032000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.064000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.128000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.256000
0 channels with search request pending
0 channels with search response pending
search timer delay 0.512000
0 channels with search request pending
0 channels with search response pending
search timer delay 1.024000
0 channels with search request pending
0 channels with search response pending
search timer delay 2.048000
0 channels with search request pending
0 channels with search response pending
search timer delay 4.096000
0 channels with search request pending
0 channels with search response pending
search timer delay 8.192000
0 channels with search request pending
0 channels with search response pending
search timer delay 16.384000
0 channels with search request pending
0 channels with search response pending
search timer delay 32.768000
0 channels with search request pending
0 channels with search response pending
search timer delay 65.536000
0 channels with search request pending
0 channels with search response pending
search timer delay 131.072000
0 channels with search request pending
0 channels with search response pending
search timer delay 262.144000
0 channels with search request pending
0 channels with search response pending
Program begin time:
epicsTime: MON MAY 11 2009 16:53:29.047623310
epicsTime: revision "@(#) EPICS 3.14.8.2, Common Utilities
Library Nov 29 2006"
Channel identifier hash table:
Hash table with 512 buckets and 11 items of type 4nciu installed
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 392
search retry number=1
name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 5
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 1
search retry number=1
name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 6
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 2
search retry number=1
name length=18
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 393
search retry number=1
name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 7
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 3
search retry number=1
name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 8
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 4
search retry number=1
name length=18
Channel "X04SA-ID:ILK-BITS.B1", connected to server
129.129.104.31:5064, native
type DBF_CHAR, native element count 1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 394
search retry number=1
name length=21
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 392
search retry number=1
name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 5
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 1
search retry number=1
name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 6
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 2
search retry number=1
name length=18
Channel "X04SA-ID-GAP:ISOPEN", connected to server
129.129.104.31:5064, native t
ype DBF_ENUM, native element count 1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 393
search retry number=1
name length=20
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 7
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 3
search retry number=1
name length=18
Channel "ACOAU-ACCU:OP-MODE" is disconnected
network IO pointer = 0x7b96f58
server identifier 8
search retry number=1
name length=19
Channel "ARIDI-PCT:CURRENT" is disconnected
network IO pointer = 0x7b96f58
server identifier 4
search retry number=1
name length=18
Channel "X04SA-ID:ILK-BITS.B1", connected to server
129.129.104.31:5064, native
type DBF_CHAR, native element count 1, read access, write access
network IO pointer = 0x7b96cd0
server identifier 394
search retry number=1
name length=21
entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1
IO identifier hash table:
Hash table with 512 buckets and 10 items of type
8baseNMIU installed
event subscription IO at 0x7ae2ef0, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2f1c, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2f48, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2f74, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2fa0, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2fcc, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2ff8, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae3024, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae3050, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae307c, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2ef0, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2f1c, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2f48, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2f74, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2fa0, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2fcc, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae2ff8, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae3024, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae3050, type DBF_invalid, element
count 1, mask 5
event subscription IO at 0x7ae307c, type DBF_invalid, element
count 1, mask 5
entries per bucket: mean = 0.019531 std dev = 0.138383 max = 1
Beacon source identifier hash table:
Hash table with 512 buckets and 11 items of type 3bhe installed
CA beacon hash entry for 129.129.104.13:32771 with period
estimate 15.049622
beacon number 718291, on MON JUN 29 2009 13:59:03
CA beacon hash entry for 129.129.104.33:5064 with period
estimate 14.849221
beacon number 0, on THU AUG 06 2009 10:56:40
CA beacon hash entry for 129.129.104.34:5064 with period
estimate 14.891577
beacon number 0, on THU AUG 06 2009 10:56:30
CA beacon hash entry for 129.129.104.36:5064 with period
estimate 14.936137
beacon number 0, on THU AUG 06 2009 10:56:36
CA beacon hash entry for 129.129.104.37:5064 with period
estimate 14.915017
beacon number 26678, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.39:5064 with period
estimate 14.838267
beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.13:5064 with period
estimate 20.991435
beacon number 51, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.14:5064 with period
estimate 14.935088
beacon number 218331, on THU AUG 06 2009 10:56:37
CA beacon hash entry for 129.129.104.15:5064 with period
estimate 14.881638
beacon number 218058, on THU AUG 06 2009 10:56:29
CA beacon hash entry for 129.129.104.30:5064 with period
estimate 14.953183
beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.31:5064 with period
estimate 14.925085
beacon number 0, on THU AUG 06 2009 10:56:34
CA beacon hash entry for 129.129.104.13:32771 with period
estimate 15.049622
beacon number 718291, on MON JUN 29 2009 13:59:03
CA beacon hash entry for 129.129.104.33:5064 with period
estimate 14.849221
beacon number 0, on THU AUG 06 2009 10:56:40
CA beacon hash entry for 129.129.104.34:5064 with period
estimate 14.891577
beacon number 0, on THU AUG 06 2009 10:56:30
CA beacon hash entry for 129.129.104.36:5064 with period
estimate 14.936137
beacon number 0, on THU AUG 06 2009 10:56:36
CA beacon hash entry for 129.129.104.37:5064 with period
estimate 14.915017
beacon number 26678, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.39:5064 with period
estimate 14.838267
beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.13:5064 with period
estimate 20.991435
beacon number 51, on THU AUG 06 2009 10:56:33
CA beacon hash entry for 129.129.104.14:5064 with period
estimate 14.935088
beacon number 218331, on THU AUG 06 2009 10:56:37
CA beacon hash entry for 129.129.104.15:5064 with period
estimate 14.881638
beacon number 218058, on THU AUG 06 2009 10:56:29
CA beacon hash entry for 129.129.104.30:5064 with period
estimate 14.953183
beacon number 0, on THU AUG 06 2009 10:56:39
CA beacon hash entry for 129.129.104.31:5064 with period
estimate 14.925085
beacon number 0, on THU AUG 06 2009 10:56:34
entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1
Timer queue:
EPICS threaded timer queue at 0x7bf6240
epicsTimerQueue with 15 items pending
11searchTimer, state = pending, delay = -19.894333
search timer delay 0.064000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.000999
search timer delay 1.024000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.163666
search timer delay 0.128000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.235666
search timer delay 0.256000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -20.296333
search timer delay 0.512000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -18.926999
search timer delay 2.048000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -17.095666
search timer delay 4.096000
0 channels with search request pending
0 channels with search response pending
23disconnectGovernorTimer, state = pending, delay = -16.191666
disconnect governor timer:
11searchTimer, state = pending, delay = -13.316332
search timer delay 8.192000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -5.424332
search timer delay 16.384000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = -5.590332
search timer delay 32.768000
0 channels with search request pending
0 channels with search response pending
15tcpRecvWatchdog, state = pending, delay = 1.191668
Receive virtual circuit watchdog at 0x7b96df0, period 30.000000
11searchTimer, state = pending, delay = 6.361002
search timer delay 65.536000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = 51.530337
search timer delay 131.072000
0 channels with search request pending
0 channels with search response pending
11searchTimer, state = pending, delay = 236.852334
search timer delay 262.144000
0 channels with search request pending
0 channels with search response pending
reschedule event
Semaphore Id : 0x7bf6070
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 0
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
exit event
Semaphore Id : 0x7bf6030
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 0
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
exitFlag = F, terminateFlag = F
IP address to name conversion engine:
ipAddrToAsciiEngine at 0x7bf86c0 with 0 requests pending
mutex:
epicsMutexId 0x7bf8650 source
../../../src/libCom/osi/epicsMutex.cpp line 197
Semaphore Id : 0x7bf8680
Semaphore Type : MUTEX
Task Queuing : PRIORITY
Pended Tasks : 0
Owner : 0x7a331e0 (tShell)
Options : 0xd SEM_Q_PRIORITY
SEM_DELETE_SAFE
SEM_INVERSION_SAFE
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
laborEvent:
Semaphore Id : 0x7bf8610
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 1
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
ipToAsciiProxy 7bf82c0 189 0
exitFlag boolean = 0
exit event:
Default mutex:
epicsMutexId 0x7ca6c20 source
../../../src/libCom/osi/epicsMutex.cpp line 197
Semaphore Id : 0x7ca6c50
Semaphore Type : MUTEX
Task Queuing : PRIORITY
Pended Tasks : 5
Owner : 0x7a331e0 (tShell)
Options : 0xd SEM_Q_PRIORITY
SEM_DELETE_SAFE
SEM_INVERSION_SAFE
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
scan10 7c2d530 136 0
CAC-UDP 7be23d0 147 0
timerQueue 7bf5d20 148 0
CAC-event 7bb5a90 148 0
CAC-TCP-recv 7a5e3b0 150 0
mutex:
epicsMutexId 0x7ca6c20 source
../../../src/libCom/osi/epicsMutex.cpp line 197
Semaphore Id : 0x7ca6c50
Semaphore Type : MUTEX
Task Queuing : PRIORITY
Pended Tasks : 5
Owner : 0x7a331e0 (tShell)
Options : 0xd SEM_Q_PRIORITY
SEM_DELETE_SAFE
SEM_INVERSION_SAFE
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Pended Tasks
------------
NAME TID PRI TIMEOUT
---------- -------- --- -------
scan10 7c2d530 136 0
CAC-UDP 7be23d0 147 0
timerQueue 7bf5d20 148 0
CAC-event 7bb5a90 148 0
CAC-TCP-recv 7a5e3b0 150 0
preemptive callback is enabled
there are 0 unsatisfied IO operations blocking
ca_pend_io()
the current io sequence number is 0
IO done event:
Semaphore Id : 0x7ca6b70
Semaphore Type : BINARY
Task Queuing : FIFO
Pended Tasks : 0
State : EMPTY
Options : 0x0 SEM_Q_FIFO
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
Synchronous group identifier hash table:
Hash table with 0 buckets and 0 items of type 4CASG installed
value = 0 = 0x0 X04SA-VME-ID2 >
Jeff Hill wrote:
Dirk,
The R3.14 design is more conservative from a perspective of
avoiding positive congestion feedback. Here are the primary
differences.
O When a circuit times out the CAC library disconnects the
application, but
not the circuit. We rely on TCP to decide when to give up
on a circuit, and
disconnect it.
O When the CAC library sees a beacon anomaly the search
rate receives a
boost, but this boost is only to once every 5 seconds. In
contrast, newly
created channels are initially searched for at a rate which
is closer to the
average measured round trip time. The CAC library sends
multiple search
requests per UDP frame, and depending on past success rates
also multiple
UDP frames per periodic search attempt (that part isn't
different between
R3.14 and R3.15). Nevertheless, if there many channels it
can take a little
while to connect all of them if the IOC (or GW in this
situation has been
off for a long time).
O In R3.14 the CAC library never stops searching for
disconnected channels,
but the maximum search rate is adjustable with an
environment variable.
I cannot see any search requests from that IOC.
That doesn't sound right, but bear in mind that if the IOC
is sending
unicasts, and not broadcasts, to the gateway on a switched
network then the
network sniffer may or may not see them if it isn't
listening at the source
or the destination cables. Directed broadcasts may also not
be seen by the
network sniffer if it isn't sniffing directly on the
directed broadcast's
destination subnet.
The output from dbcar when its interest level argument is
set very high (say
to 100) will show many things about the internals of the
CAC library. Also,
there is a timer thread created by the CAC library that is
scheduling the
search requests, and you might send a stack trace (if it
isn't simply
waiting on its time delay event flag).
Another possibility is that something is amiss in the GW.
Does restarting
the GW improve the situation?
Jeff ______________________________________________________
Jeffrey O. Hill Email [email protected]
LANL MS H820 Voice 505 665 1831
Los Alamos NM 87545 USA FAX 505 665 5107
Message content: TSPA
-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of Dirk Zimoch
Sent: Wednesday, August 05, 2009 8:09 AM
To: tech-talk
Subject: CA reconnection problem
Hi all (Jeff in particular)
I have seen a problem with some of our IOCs today. Several
beamline IOCs
running
3.14.8 have disconnected CA input links to a central IOC
(3.13.10) even
though
the channels are there. The beamline IOCs run behind ca
gateways. It may
be that
the gateways or the central IOC had been offline for more
than some hours
recently in the last shutdown.
Even though the channels are back and I can read them from
the beamline
network
with caget the IOCs don't reconnect.
I have started a softioc in the beamline network to
generate a beacon
anomaly
but even then the links do not reconnect. I cannot see any
search requests
from
that IOC.
Only 3.14.8 systems show this problem. 3.13. systems
don't. Is there a bug
in
the 3.14.8 implementation of CAC I should know of?
Dirk
--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182
--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182
--
Scanned by iCritical.
This e-mail and any attachments may contain confidential, copyright and or
privileged material, and are for the use of the intended addressee only.
If you are not the intended addressee or an authorised recipient of the
addressee please notify us of receipt by returning the e-mail and do not
use, copy, retain, distribute or disclose the information in or attached
to the e-mail.
Any opinions expressed within this e-mail are those of the individual and
not necessarily of Diamond Light Source Ltd.
Diamond Light Source Ltd. cannot guarantee that this e-mail or any
attachments are free from viruses and we cannot accept liability for any
damage which you may sustain as a result of software viruses which may be
transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in England
and Wales with its registered office at Diamond House, Harwell Science and
Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
--
Dr. Dirk Zimoch
Paul Scherrer Institut, WBGB/006
5232 Villigen PSI, Switzerland
Phone +41 56 310 5182
- References:
- RE: CA reconnection problem Jeff Hill
- RE: CA reconnection problem Shepherd, Emma (DLSLtd,RAL,DIA)
- RE: CA reconnection problem Jeff Hill
- Navigate by Date:
- Prev:
RE: CA reconnection problem Jeff Hill
- Next:
RE: uptodate version of modtcp? Mark Rivers
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
<2009>
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
- Navigate by Thread:
- Prev:
RE: CA reconnection problem Jeff Hill
- Next:
RE: CA reconnection problem Jeff Hill
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
<2009>
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
|