1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 <2009> 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 | Index | 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 <2009> 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 |
<== Date ==> | <== Thread ==> |
---|
Subject: | Re: CA reconnection problem |
From: | Dirk Zimoch <[email protected]> |
To: | Jeff Hill <[email protected]> |
Cc: | "'Ernest L. Williams Jr.'" <[email protected]>, "'Amedeo Perazzo'" <[email protected]>, "'tech-talk'" <[email protected]> |
Date: | Fri, 14 Aug 2009 14:20:14 +0200 |
I will test it next week. Jeff Hill wrote:
DB CA link does not connect through the gatewayI committed a PCAS change this evening to the R3.14 branch which is hopefully a fix for this issue. See mantis entry 360. This fix, and a number of other fixes to PCAS will be in R3.14.11 RC1. I also installed an upgrade into the gateway so that it can now use ca_put for put requests and ca_put_callback for put callback requests. PS: These changes impact the CA gateway, but not IOCs. Jeff ______________________________________________________ Jeffrey O. Hill Email [email protected] LANL MS H820 Voice 505 665 1831 Los Alamos NM 87545 USA FAX 505 665 5107 Message content: TSPA-----Original Message----- From: Shepherd, Emma (DLSLtd,RAL,DIA) [mailto:[email protected]] Sent: Tuesday, August 11, 2009 3:00 AM To: Jeff Hill; Rees, Nick (DLSLtd,RAL,DIA); Dirk Zimoch Cc: tech-talk Subject: RE: CA reconnection problem Hi all, I've gone back through my notes as well to see what I can remember about this. I think part of the problem is that we have had several different issues at Diamond with similar symptoms: 1) The CAC-TCP-recv task gets suspended - I don't know what causes it but things are well and truly broken when this happens and we have to reboot the IOC. I haven't seen this in quite a while though. 2) A timerQueue task responsible for scheduling UDP requests gets suspended. In one investigation the task trace showed that an exception had been thrown and not handled properly - Mantis entry 336 was created for this and I believe you committed a fix Jeff. 3) Some CA links not working but nothing obvious wrong, no suspended tasks. Like in Dirk's case, the link is connecting through a gateway and re-writing the record name to the link can fix that particular link. I hope that helps to explain our situation at the moment, let me know if there's any more investigation I can do here. Emma
-----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Jeff Hill Sent: 11 August 2009 02:02 To: Rees, Nick (DLSLtd,RAL,DIA); 'Dirk Zimoch' Cc: 'tech-talk' Subject: RE: CA reconnection problem I had another look at this issue this evening and suspect now that the timer queue entries with negative delays reflect only the time spent (on the order of 20 seconds) holding the lock while dumping "dbcar" at high intrest levels to the slow 9600 baud serial port. So perhaps its best to step back a bit and look at what Diamond and SLS have in common. It seems that both sites have a db ca link connecting through the gateway when this occurs. Dirk mentioned that R3.13 CA links connect fine at SLS, but R3.14 ones don't. That can probably be explained by the fact that the R3.13 ca client library disconnects both the circuit and the application when a circuit is unresponsive. In contrast, the R3.14 ca client library disconnects only the application, and not the circuit when the circuit is unresponsive relying instead on TCP to disconnect broken circuits. So what probably is occurring is an issue with the gateway where PCAS isn't properly scheduling itself to interact with the R3.14 ca client's circuit, but the circuit remains connected at TCP's level. In Dirk's situation I seem to recall that a power outage was the precipitating event. A precipitating event at Diamond maybe isn't identified? There are an assortment of PCAS, and also gateway improvements, coming soon in R3.14.11 RC1 but I'm not certain that this one has been fixed. I will see if I can reproduce it. Jeff ______________________________________________________ Jeffrey O. Hill Email [email protected] LANL MS H820 Voice 505 665 1831 Los Alamos NM 87545 USA FAX 505 665 5107 Message content: TSPA
-----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Jeff Hill Sent: Monday, August 10, 2009 11:05 AM To: 'Rees, NP (Nick)'; 'Dirk Zimoch' Cc: 'tech-talk' Subject: RE: CA reconnection problem Nick, Dirk,
I think this is possibly the same problem we described in the thread: "Inter-IOC link problems" http://www.aps.anl.gov/epics/tech-talk/2008/msg01103.phpI vaguely recall that there was a mantis entry for theabove, but havehad no success finding it. I did find 164 which is somewhatclose, butprobably not the same issue. Also 330 (which will probably soon be assigned not-a- bug status) might be of some interest if you are seeingunexpectedly longreconnect times with R3.14. Nevertheless, working only from my memory, it's not clear that Emma Shepherd's and Dirk's issues have the same symptoms. In Dirk's situation dbcar (at high interest levels) shows entries in the CA client library's timer queue with negative expiration which, as I recall, was not identified as a symptom with Emma Shepherd's situation? The tech-talk thread does indicate some closeexaminationof the dbcar output occurred, which probably would have immediately identified the negative expirationtimes on thetimer queue as a problem? In any case you now have a common symptom to look for withdbcar thenext time that it happens. Stack traces from _all_ of the dbca, CA client library, and timer threads involved will be the information I need todetermine a cause.When deciding on which threads traces to send, look at the threads operating at close to the same scheduling priority as dbca in the vxWorks "i" output. Emphasis on all because if there is a deadlock issue they can be trickyto identifyw/o clear understanding of the lock hierarchy interactionsbetween the two ormore threads involved. Of course, if we have a deadlocksituation, thenany threads that are continuously blocking on a mutex are of particular interest. Thanks for your help, Jeff ______________________________________________________ Jeffrey O. Hill Email [email protected] LANL MS H820 Voice 505 665 1831 Los Alamos NM 87545 USA FAX 505 665 5107 Message content: TSPA-----Original Message----- From: Rees, NP (Nick) [mailto:[email protected]] Sent: Thursday, August 06, 2009 6:29 AM To: Dirk Zimoch; Jeff Hill Cc: tech-talk Subject: RE: CA reconnection problem Hi Dirk, Jeff, I think this is possibly the same problem we described in the thread: "Inter-IOC link problems" http://www.aps.anl.gov/epics/tech-talk/2008/msg01103.php That thread seemed to peter out because we basically developed reboot workarounds that meant it didn't bother us as
much. However,I don't think it was ever solved. I don't know whether there is a Mantis entry for it. Cheers, Nick Rees Principal Software Engineer Phone: +44 (0)1235-778430 Diamond Light Source Fax: +44 (0)1235-446713-----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Dirk Zimoch Sent: 06 August 2009 10:22 To: Jeff Hill Cc: 'tech-talk' Subject: Re: CA reconnection problem Hi Jeff, Time advances normally on the IOC. Bug 299 is not related because I have no suspended
tasks. Bug 330is also not related because my channels never reconnect, even after days. The IOC connects to the gateway with normal broadcasts on the local network. I can see beacons from all IOCs andgateway and Ican see the beacon anomaly when I start a softioc. Still the IOC does not send any search requests to reconnect the links. I can see the search requests from 3.13 IOCs, 3.14.8 medms and the gateway after a beacon anomaly, but not from 3.14.8 IOCs. It is definitely not the case that the 3.14.8 IOC never stops searching. I do not want to restart the gateway at the momentbecause we haveuser operation. I also think it's not necessary becauseI can seethe channels through the gateway with caget. I doubt that it is a gateway problem at all. Rebooting the IOC helps. Writing the record name to thelink againhelps too, but only for this link. All other links onthe same IOCto the same target record stay disconnected. I do not understand this. UDP package size is not an issue because its only 8 broken CA links to 2 records. So it should easily fit into onepackage. Butas I said, I see no package at all (except when I rewnew the link). I would send a stack trace of the CAC timer thread if Icould findit. What's the name? Here is the full dbcar output, mybe it helps: X04SA-VME-ID2 > dbcar "",100 connected X04SA-VME-ID2 X04SA-ID-LMD:PMD.INPB X04SA-ID-LMD:SCALER.S3 connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-INJ.INPB X04SA-ID-LMD:PMD connected X04SA-VME-ID2 X04SA-ID-LMD:PMD-NOINJ.INPB X04SA-ID-LMD:PMD connected X04SA-VME-ID2 X04SA-ID-LMD:PMU.INPB X04SA-ID-LMD:SCALER.S2 connected X04SA-VME-ID2 X04SA-ID-LMU:PMD.INPB X04SA-ID-LMU:SCALER.S3 connected X04SA-VME-ID2 X04SA-ID-LMU:PMU.INPB X04SA-ID-LMU:SCALER.S2 connected 129.129.104.31:5064 X04SA-ID-FBPMD:ALARM.INPD X04SA-ID-GAP:ISOPEN not_connected X04SA-ID-FBPMD:X-COMP.INPE ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMD:X-COMP.INPI ARIDI-PCT:CURRENT not_connected X04SA-ID-FBPMD:Y-COMP.INPE ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMD:Y-COMP.INPI ARIDI-PCT:CURRENT connected 129.129.104.31:5064 X04SA-ID-FBPMU:ALARM.INPD X04SA-ID-GAP:ISOPEN not_connected X04SA-ID-FBPMU:X-COMP.INPE ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMU:X-COMP.INPI ARIDI-PCT:CURRENT not_connected X04SA-ID-FBPMU:Y-COMP.INPE ACOAU-ACCU:OP-MODE not_connected X04SA-ID-FBPMU:Y-COMP.INPI ARIDI-PCT:CURRENT connected 129.129.104.31:5064 X04SA-ID:FBPM-ILK.OUT X04SA-ID:ILK-BITS.B1 connected X04SA-VME-ID2 X04SA-VME-ID2:HBTSYNC.INPA X04SA-VME-ID2:HBT.VAL connected X04SA-VME-ID2 X04SA-VME-ID2:MEMALLOC.INPA X04SA-VME-ID2:MEMALLOCRA W connected X04SA-VME-ID2 X04SA-VME-ID2:MEMFREE.INPA X04SA-VME-ID2:MEMFREE-RAW connected X04SA-VME-ID2 X04SA-VME-ID2:MEMMAX.INPA X04SA-VME-ID2:MEMMAX-RAW ncalinks 21 not connected 8 no_read_access 0 no_write_access 0 nDisconnect 0 nNoWrite 0 ca_client_context at 0x7ca6c90 pndRecvCnt=0 ioSeqNo=0 dbContext at 0x7ca6a80 event call back cache location 0x7bac130, andits size 24dbContextReadNotifyCache dbContextReadNotifyCacheAlocator count 1 and size 88 epicsMutexId 0x7ca6c20 source ../../../src/libCom/osi/epicsMutex.cpp line 197 Semaphore Id : 0x7ca6c50 Semaphore Type : MUTEX Task Queuing : PRIORITY Pended Tasks : 3 Owner : 0x7a331e0 (tShell) Options : 0xd SEM_Q_PRIORITY SEM_DELETE_SAFE SEM_INVERSION_SAFE VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- timerQueue 7bf5d20 148 0 CAC-event 7bb5a90 148 0 CAC-TCP-recv 7a5e3b0 150 0 Channel Access Client Context at 0x7bf8cc0 for user ioc revision "@(#) EPICS 3.14.8.2, CA ClientLibrary Nov 292006" Hash table with 512 buckets and 2 items of type 6tcpiiu installed Virtual circuit to "129.129.104.31:5064" atversion V4.8 state 1current data cache pointer = 0x7b88c20 currentdata cachesize = 16384 contiguous receive message count=0, busydetect bool=0,flow control boo l=0 eceive thread is busy=0 virtual circuit socket identifier 20 send thread flush signal: Semaphore Id : 0x7b88bc0 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 1 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- CAC-TCP-send 7b8fba0 148 0 send thread: recv thread: echo pending bool = 0 IO identifier hash table: Connected channels Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, nativeelement count1, read access, write access network IO pointer = 0x7b96cd0 server identifier 392 search retry number=1 name length=20 Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, nativeelement count1, read access, write access network IO pointer = 0x7b96cd0 server identifier 393 search retry number=1 name length=20 Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native type DBF_CHAR, nativeelement count 1,read access, write access network IO pointer = 0x7b96cd0 server identifier 394 search retry number=1 name length=21 Virtual circuit to "129.129.104.14:5064" at versionV4.11 state 1current data cache pointer = 0x7af62b0 currentdata cachesize = 16384 contiguous receive message count=0, busydetect bool=0,flow control boo l=0 eceive thread is busy=0 virtual circuit socket identifier 22 send thread flush signal: Semaphore Id : 0x7a1f670 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 1 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- CAC-TCP-send 7a61590 148 0 send thread: recv thread: echo pending bool = 0 IO identifier hash table: Unresponsive circuit channels Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 1 search retry number=1 name length=18 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 2 search retry number=1 name length=18 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 3 search retry number=1 name length=18 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 4 search retry number=1 name length=18 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 5 search retry number=1 name length=19 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 6 search retry number=1 name length=19 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 7 search retry number=1 name length=19 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 8 search retry number=1 name length=19 Virtual circuit to "129.129.104.31:5064" at version V4.8 state 1 current data cache pointer = 0x7b88c20 currentdata cachesize = 16384 contiguous receive message count=0, busydetect bool=0,flow control boo l=0 eceive thread is busy=0 virtual circuit socket identifier 20 send thread flush signal: Semaphore Id : 0x7b88bc0 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 1 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- CAC-TCP-send 7b8fba0 148 0 send thread: recv thread: echo pending bool = 0 IO identifier hash table: Connected channels Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, nativeelement count1, read access, write access network IO pointer = 0x7b96cd0 server identifier 392 search retry number=1 name length=20 Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, nativeelement count1, read access, write access network IO pointer = 0x7b96cd0 server identifier 393 search retry number=1 name length=20 Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native type DBF_CHAR, nativeelement count 1,read access, write access network IO pointer = 0x7b96cd0 server identifier 394 search retry number=1 name length=21 Virtual circuit to "129.129.104.14:5064" at versionV4.11 state 1current data cache pointer = 0x7af62b0 currentdata cachesize = 16384 contiguous receive message count=0, busydetect bool=0,flow control boo l=0 eceive thread is busy=0 virtual circuit socket identifier 22 send thread flush signal: Semaphore Id : 0x7a1f670 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 1 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- CAC-TCP-send 7a61590 148 0 send thread: recv thread: echo pending bool = 0 IO identifier hash table: Unresponsive circuit channels Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 1 search retry number=1 name length=18 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 2 search retry number=1 name length=18 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 3 search retry number=1 name length=18 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 4 search retry number=1 name length=18 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 5 search retry number=1 name length=19 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 6 search retry number=1 name length=19 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 7 search retry number=1 name length=19 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 8 search retry number=1 name length=19 entries per bucket: mean = 0.003906 std dev = 0.062378 max = 1 connection time out watchdog period 30.000000Datagram IOcircuit (and disconnected channel repository) repeater port 5065 default server port 5064 Channel Access Address List 129.129.104.255:5064 socket identifier 18 bytes in xmit buffer 16 shut down command bool 0 recv thread exit signal: repeater subscribe timer: attempts=2 registered=1 once=0 disconnect governor timer: search timer delay 0.032000 0 channels with search request pending 0 channels with search response pending search timer delay 0.064000 0 channels with search request pending 0 channels with search response pending search timer delay 0.128000 0 channels with search request pending 0 channels with search response pending search timer delay 0.256000 0 channels with search request pending 0 channels with search response pending search timer delay 0.512000 0 channels with search request pending 0 channels with search response pending search timer delay 1.024000 0 channels with search request pending 0 channels with search response pending search timer delay 2.048000 0 channels with search request pending 0 channels with search response pending search timer delay 4.096000 0 channels with search request pending 0 channels with search response pending search timer delay 8.192000 0 channels with search request pending 0 channels with search response pending search timer delay 16.384000 0 channels with search request pending 0 channels with search response pending search timer delay 32.768000 0 channels with search request pending 0 channels with search response pending search timer delay 65.536000 0 channels with search request pending 0 channels with search response pending search timer delay 131.072000 0 channels with search request pending 0 channels with search response pending search timer delay 262.144000 0 channels with search request pending 0 channels with search response pending Program begin time: epicsTime: MON MAY 11 2009 16:53:29.047623310 epicsTime: revision "@(#) EPICS 3.14.8.2, Common Utilities Library Nov 29 2006" Channel identifier hash table: Hash table with 512 buckets and 11 items of type 4nciu installed Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, native element count 1, read access, write access network IO pointer = 0x7b96cd0 server identifier 392 search retry number=1 name length=20 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 5 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 1 search retry number=1 name length=18 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 6 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 2 search retry number=1 name length=18 Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, native element count 1, read access, write access network IO pointer = 0x7b96cd0 server identifier 393 search retry number=1 name length=20 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 7 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 3 search retry number=1 name length=18 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 8 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 4 search retry number=1 name length=18 Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native type DBF_CHAR, native element count 1, read access, write access network IO pointer = 0x7b96cd0 server identifier 394 search retry number=1 name length=21 Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, native element count 1, read access, write access network IO pointer = 0x7b96cd0 server identifier 392 search retry number=1 name length=20 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 5 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 1 search retry number=1 name length=18 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 6 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 2 search retry number=1 name length=18 Channel "X04SA-ID-GAP:ISOPEN", connected to server 129.129.104.31:5064, native t ype DBF_ENUM, native element count 1, read access, write access network IO pointer = 0x7b96cd0 server identifier 393 search retry number=1 name length=20 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 7 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 3 search retry number=1 name length=18 Channel "ACOAU-ACCU:OP-MODE" is disconnected network IO pointer = 0x7b96f58 server identifier 8 search retry number=1 name length=19 Channel "ARIDI-PCT:CURRENT" is disconnected network IO pointer = 0x7b96f58 server identifier 4 search retry number=1 name length=18 Channel "X04SA-ID:ILK-BITS.B1", connected to server 129.129.104.31:5064, native type DBF_CHAR, native element count 1, read access, write access network IO pointer = 0x7b96cd0 server identifier 394 search retry number=1 name length=21 entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1 IO identifier hash table: Hash table with 512 buckets and 10 items of type8baseNMIU installedevent subscription IO at 0x7ae2ef0, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2f1c, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2f48, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2f74, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2fa0, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2fcc, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2ff8, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae3024, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae3050, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae307c, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2ef0, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2f1c, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2f48, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2f74, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2fa0, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2fcc, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae2ff8, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae3024, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae3050, type DBF_invalid, element count 1, mask 5 event subscription IO at 0x7ae307c, type DBF_invalid, element count 1, mask 5 entries per bucket: mean = 0.019531 std dev = 0.138383 max = 1 Beacon source identifier hash table: Hash table with 512 buckets and 11 items of type 3bhe installed CA beacon hash entry for 129.129.104.13:32771 with period estimate 15.049622 beacon number 718291, on MON JUN 29 2009 13:59:03 CA beacon hash entry for 129.129.104.33:5064 with period estimate 14.849221 beacon number 0, on THU AUG 06 2009 10:56:40 CA beacon hash entry for 129.129.104.34:5064 with period estimate 14.891577 beacon number 0, on THU AUG 06 2009 10:56:30 CA beacon hash entry for 129.129.104.36:5064 with period estimate 14.936137 beacon number 0, on THU AUG 06 2009 10:56:36 CA beacon hash entry for 129.129.104.37:5064 with period estimate 14.915017 beacon number 26678, on THU AUG 06 2009 10:56:33 CA beacon hash entry for 129.129.104.39:5064 with period estimate 14.838267 beacon number 0, on THU AUG 06 2009 10:56:39 CA beacon hash entry for 129.129.104.13:5064 with period estimate 20.991435 beacon number 51, on THU AUG 06 2009 10:56:33 CA beacon hash entry for 129.129.104.14:5064 with period estimate 14.935088 beacon number 218331, on THU AUG 06 2009 10:56:37 CA beacon hash entry for 129.129.104.15:5064 with period estimate 14.881638 beacon number 218058, on THU AUG 06 2009 10:56:29 CA beacon hash entry for 129.129.104.30:5064 with period estimate 14.953183 beacon number 0, on THU AUG 06 2009 10:56:39 CA beacon hash entry for 129.129.104.31:5064 with period estimate 14.925085 beacon number 0, on THU AUG 06 2009 10:56:34 CA beacon hash entry for 129.129.104.13:32771 with period estimate 15.049622 beacon number 718291, on MON JUN 29 2009 13:59:03 CA beacon hash entry for 129.129.104.33:5064 with period estimate 14.849221 beacon number 0, on THU AUG 06 2009 10:56:40 CA beacon hash entry for 129.129.104.34:5064 with period estimate 14.891577 beacon number 0, on THU AUG 06 2009 10:56:30 CA beacon hash entry for 129.129.104.36:5064 with period estimate 14.936137 beacon number 0, on THU AUG 06 2009 10:56:36 CA beacon hash entry for 129.129.104.37:5064 with period estimate 14.915017 beacon number 26678, on THU AUG 06 2009 10:56:33 CA beacon hash entry for 129.129.104.39:5064 with period estimate 14.838267 beacon number 0, on THU AUG 06 2009 10:56:39 CA beacon hash entry for 129.129.104.13:5064 with period estimate 20.991435 beacon number 51, on THU AUG 06 2009 10:56:33 CA beacon hash entry for 129.129.104.14:5064 with period estimate 14.935088 beacon number 218331, on THU AUG 06 2009 10:56:37 CA beacon hash entry for 129.129.104.15:5064 with period estimate 14.881638 beacon number 218058, on THU AUG 06 2009 10:56:29 CA beacon hash entry for 129.129.104.30:5064 with period estimate 14.953183 beacon number 0, on THU AUG 06 2009 10:56:39 CA beacon hash entry for 129.129.104.31:5064 with period estimate 14.925085 beacon number 0, on THU AUG 06 2009 10:56:34 entries per bucket: mean = 0.021484 std dev = 0.144992 max = 1 Timer queue: EPICS threaded timer queue at 0x7bf6240 epicsTimerQueue with 15 items pending 11searchTimer, state = pending, delay = -19.894333 search timer delay 0.064000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -20.000999 search timer delay 1.024000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -20.163666 search timer delay 0.128000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -20.235666 search timer delay 0.256000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -20.296333 search timer delay 0.512000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -18.926999 search timer delay 2.048000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -17.095666 search timer delay 4.096000 0 channels with search request pending 0 channels with search response pending 23disconnectGovernorTimer, state = pending, delay = -16.191666 disconnect governor timer: 11searchTimer, state = pending, delay = -13.316332 search timer delay 8.192000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -5.424332 search timer delay 16.384000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = -5.590332 search timer delay 32.768000 0 channels with search request pending 0 channels with search response pending 15tcpRecvWatchdog, state = pending, delay = 1.191668 Receive virtual circuit watchdog at 0x7b96df0, period 30.000000 11searchTimer, state = pending, delay = 6.361002 search timer delay 65.536000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = 51.530337 search timer delay 131.072000 0 channels with search request pending 0 channels with search response pending 11searchTimer, state = pending, delay = 236.852334 search timer delay 262.144000 0 channels with search request pending 0 channels with search response pending reschedule event Semaphore Id : 0x7bf6070 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 0 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A exit event Semaphore Id : 0x7bf6030 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 0 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A exitFlag = F, terminateFlag = F IP address to name conversion engine: ipAddrToAsciiEngine at 0x7bf86c0 with 0 requests pending mutex: epicsMutexId 0x7bf8650 source ../../../src/libCom/osi/epicsMutex.cpp line 197 Semaphore Id : 0x7bf8680 Semaphore Type : MUTEX Task Queuing : PRIORITY Pended Tasks : 0 Owner : 0x7a331e0 (tShell) Options : 0xd SEM_Q_PRIORITY SEM_DELETE_SAFE SEM_INVERSION_SAFE VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A laborEvent: Semaphore Id : 0x7bf8610 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 1 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- ipToAsciiProxy 7bf82c0 189 0 exitFlag boolean = 0 exit event: Default mutex: epicsMutexId 0x7ca6c20 source ../../../src/libCom/osi/epicsMutex.cpp line 197 Semaphore Id : 0x7ca6c50 Semaphore Type : MUTEX Task Queuing : PRIORITY Pended Tasks : 5 Owner : 0x7a331e0 (tShell) Options : 0xd SEM_Q_PRIORITY SEM_DELETE_SAFE SEM_INVERSION_SAFE VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- scan10 7c2d530 136 0 CAC-UDP 7be23d0 147 0 timerQueue 7bf5d20 148 0 CAC-event 7bb5a90 148 0 CAC-TCP-recv 7a5e3b0 150 0 mutex: epicsMutexId 0x7ca6c20 source ../../../src/libCom/osi/epicsMutex.cpp line 197 Semaphore Id : 0x7ca6c50 Semaphore Type : MUTEX Task Queuing : PRIORITY Pended Tasks : 5 Owner : 0x7a331e0 (tShell) Options : 0xd SEM_Q_PRIORITY SEM_DELETE_SAFE SEM_INVERSION_SAFE VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Pended Tasks ------------ NAME TID PRI TIMEOUT ---------- -------- --- ------- scan10 7c2d530 136 0 CAC-UDP 7be23d0 147 0 timerQueue 7bf5d20 148 0 CAC-event 7bb5a90 148 0 CAC-TCP-recv 7a5e3b0 150 0 preemptive callback is enabled there are 0 unsatisfied IO operations blockingca_pend_io()the current io sequence number is 0 IO done event: Semaphore Id : 0x7ca6b70 Semaphore Type : BINARY Task Queuing : FIFO Pended Tasks : 0 State : EMPTY Options : 0x0 SEM_Q_FIFO VxWorks Events -------------- Registered Task : NONE Event(s) to Send : N/A Options : N/A Synchronous group identifier hash table: Hash table with 0 buckets and 0 items of type 4CASG installed value = 0 = 0x0 X04SA-VME-ID2 > Jeff Hill wrote:Dirk, The R3.14 design is more conservative from a perspective of avoiding positive congestion feedback. Here are the primary differences. O When a circuit times out the CAC library disconnects theapplication, butnot the circuit. We rely on TCP to decide when to give upon a circuit, anddisconnect it. O When the CAC library sees a beacon anomaly the searchrate receives aboost, but this boost is only to once every 5 seconds. Incontrast, newlycreated channels are initially searched for at a rate whichis closer to theaverage measured round trip time. The CAC library sendsmultiple searchrequests per UDP frame, and depending on past success ratesalso multipleUDP frames per periodic search attempt (that part isn'tdifferent betweenR3.14 and R3.15). Nevertheless, if there many channels itcan take a littlewhile to connect all of them if the IOC (or GW in thissituation has beenoff for a long time). O In R3.14 the CAC library never stops searching fordisconnected channels,but the maximum search rate is adjustable with anenvironment variable.I cannot see any search requests from that IOC.That doesn't sound right, but bear in mind that if the IOCis sendingunicasts, and not broadcasts, to the gateway on a switchednetwork then thenetwork sniffer may or may not see them if it isn'tlistening at the sourceor the destination cables. Directed broadcasts may also notbe seen by thenetwork sniffer if it isn't sniffing directly on thedirected broadcast'sdestination subnet. The output from dbcar when its interest level argument isset very high (sayto 100) will show many things about the internals of theCAC library. Also,there is a timer thread created by the CAC library that isscheduling thesearch requests, and you might send a stack trace (if itisn't simplywaiting on its time delay event flag). Another possibility is that something is amiss in the GW.Does restartingthe GW improve the situation? Jeff ______________________________________________________ Jeffrey O. Hill Email [email protected] LANL MS H820 Voice 505 665 1831 Los Alamos NM 87545 USA FAX 505 665 5107 Message content: TSPA-----Original Message----- From: [email protected][mailto:[email protected]]
On Behalf Of Dirk Zimoch Sent: Wednesday, August 05, 2009 8:09 AM To: tech-talk Subject: CA reconnection problem Hi all (Jeff in particular) I have seen a problem with some of our IOCs today. Severalbeamline IOCsrunning 3.14.8 have disconnected CA input links to a central IOC(3.13.10) eventhough the channels are there. The beamline IOCs run behind cagateways. It maybe that the gateways or the central IOC had been offline for morethan some hoursrecently in the last shutdown. Even though the channels are back and I can read them fromthe beamlinenetwork with caget the IOCs don't reconnect. I have started a softioc in the beamline network togenerate a beaconanomaly but even then the links do not reconnect. I cannot see anysearch requestsfrom that IOC. Only 3.14.8 systems show this problem. 3.13. systemsdon't. Is there a bugin the 3.14.8 implementation of CAC I should know of? Dirk -- Dr. Dirk Zimoch Paul Scherrer Institut, WBGB/006 5232 Villigen PSI, Switzerland Phone +41 56 310 5182-- Dr. Dirk Zimoch Paul Scherrer Institut, WBGB/006 5232 Villigen PSI, Switzerland Phone +41 56 310 5182-- Scanned by iCritical.This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
-- Dr. Dirk Zimoch Paul Scherrer Institut, WBGB/006 5232 Villigen PSI, Switzerland Phone +41 56 310 5182