Subject: Re: caget randomly returns Channel connect timed out
From: Matt Rippa <[email protected]>
To: Ignacio Arriagada <[email protected]>
Cc: Talk EPICS Tech <[email protected]>
Date: Fri, 13 Jul 2018 08:59:28 -1000
Hello everyone,

Thanks for the responses.

Just to add more info, we have another system (with IP 172.17.102.130) that's reading tcs:ak:astCtx with a caget. Whenever we see the "not found" message in our test, we see these messages on the TCS IOC console:

Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: request from 172.17.102.130:57113 => bad resource ID
Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: Request from 172.17.102.130:57113 => cmmd=12 cid=0xffffffff type=0 count=0 postsize=0
Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: Request from 172.17.102.130:57113 => available=0x1 N=0 paddr=0x0
Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: forcing disconnect from 172.17.102.130:57113

The other system uses EPICS 3.14.8.2.

Thanks again,
Regards,
Ignacio A.

On Thu, Jul 12, 2018 at 4:47 PM Michael Davidsaver <[email protected]> wrote:

On 07/12/2018 01:12 PM, Andrew Johnson wrote:
> Hi Matt,
>
> On 07/12/2018 02:54 PM, Matt Rippa wrote:
>> We ran a simple caget (once per second) from a linux host to an upgraded
>> IOC. With the new rtems/3.14.12.7 IOC, it often
>> experiences timeouts as shown below. With our legacy vxworks/3.13.9 IOC
>> there are no timeouts logged. Relevant info is shown below. The
>> caRepeater is running on the linux host.
>
> If your IOC is busy, the caget default search timeout of 1 second may be
> too short; have you tried, say, 'caget -w 5 tcs:ak:astCtx'?
@Matt, if this doesn't have an effect I'd suggest taking a packet capture
to see whether there is a search reply coming back.
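For example, a capture along these lines should show both the UDP searches and any replies (the interface name is an assumption; 5064 is the default CA server port and 5065 the repeater/beacon port):

  tcpdump -n -i eth0 'port 5064 or port 5065'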
As it happens, I had a symptom similar to this while playing around
with an mvme3100 emulator. The issue turned out to be dropped RX
interrupts due to a bug in the emulator. The effect was due to
a hard-coded 1 second timeout in CA during TCP connect().
Have you tried the simple test of letting 'ping' run for a while?
If the reported latencies aren't stable, it may be a sign of a
lower-level issue.
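For instance (substituting the IOC's actual hostname or IP for the placeholder):

  ping -c 600 <ioc-host>

A steady min/avg/max in the final summary suggests the network path is fine; large swings or dropped packets point lower down the stack.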
>> I'm thinking that setting the linux host EPICS_CA_AUTO_ADDR_LIST=YES may
>> help. I'm running that test now.
>
> I have never seen an EPICS_CA_ADDR_LIST as long as yours, but given that
> you seem to be searching multiple subnets maybe you do need that.
Agreed. For this test I'd suggest setting EPICS_CA_ADDR_LIST to only
the IP of the IOC you're testing, along with EPICS_CA_AUTO_ADDR_LIST=NO.
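A minimal sketch of that setup (the IOC address is a placeholder):

  export EPICS_CA_AUTO_ADDR_LIST=NO
  export EPICS_CA_ADDR_LIST=<ioc-ip>
  caget -w 5 tcs:ak:astCtx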
> I don't think setting EPICS_CA_AUTO_ADDR_LIST to YES will make any
> difference to this behaviour (your epicsPrtEnvParams showed it as YES
> already, so unless you have explicitly changed it for the client you're
> probably already searching your local subnet as well).
Also, if both the IOC and the client tools are 3.14.12.x, you can try
EPICS_CA_AUTO_ADDR_LIST=NO EPICS_CA_ADDR_LIST= EPICS_CA_NAME_SERVERS=<IP>
This will skip the UDP search altogether and do a TCP search instead.
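That is, something along these lines (substitute the IOC's actual IP for the placeholder):

  export EPICS_CA_AUTO_ADDR_LIST=NO
  export EPICS_CA_ADDR_LIST=
  export EPICS_CA_NAME_SERVERS=<IP>
  caget tcs:ak:astCtx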
> HTH,
>
> - Andrew
>
>
>> EPICS_CA_ADDR_LIST=172.17.2.255 172.17.3.255 172.17.102.130
>> 172.17.105.20 172.16.102.130 aom-vme sbfmcao01.cl.gemini.edu
>> mcao-stealth.cl.gemini.edu 172.17.106.111 172.17.105.37
>> 172.17.107.50 172.17.55.101 172.17.101.101 172.17.65.255
>> 172.17.102.139 172.17.102.138
>>
>> linux> caget tcs:ak:astCtx
>> tcs:ak:astCtx 0
>> linux> caget tcs:ak:astCtx
>> Channel connect timed out: 'tcs:ak:astCtx' not found.
>> linux> caget tcs:ak:astCtx
>> Channel connect timed out: 'tcs:ak:astCtx' not found.
>> linux> caget tcs:ak:astCtx
>> tcs:ak:astCtx 0
>