On 07/12/2018 02:03 PM, Ignacio wrote:
> Hello everyone,
> Thanks for the responses.
> Just to add more info, we have another system (with IP 172.17.102.130) that's reading tcs:ak:astCtx with a caget. Whenever we see the "not found" message on our test, we see this message on the TCS ioc console,
>
> Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: request from 172.17.102.130:57113 <http://172.17.102.130:57113> => bad resource ID
> Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: Request from 172.17.102.130:57113 <http://172.17.102.130:57113> => cmmd=12 cid=0xffffffff type=0 count=0 postsize=0
> Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: Request from 172.17.102.130:57113 <http://172.17.102.130:57113> => available=0x1 N=0 paddr=0x0
> Jul 12 12:40:59 E) PORT: tcs_vme, MSG: CAS: forcing disconnect from 172.17.102.130:57113 <http://172.17.102.130:57113>
What you're seeing is the IOC (aka. RSRV) flagging a protocol violation, and forcing the TCP connection closed.
"cmmd=12" is a CA_PROTO_CLEAR_CHANNEL message. And the the other fields look valid, except that "cid=0xffffffff"...
https://epics.anl.gov/base/R3-16/0-docs/CAproto/index.html
The labeling of cid= and available= are not specific to this message. The header is:
000c 0000 0000 0000
ffff ffff 0000 0001
Despite the label "cid=", in this message the 0xffffffff is the Server provided SID.
So the CID is 1, which makes sense as the client is 'caget' requesting a single PV.
RSRV and libca pick CID/SID numbers sequentially starting with 1.
Have a look at https://github.com/mdavidsaver/cashark and run
> wireshark -X lua_script:ca.lua test/catest.pcap
The first exchange (a caget) has CID=1 and SID=2, so it's obvious which is which.
Anyway, this does appear to be a bad SID sent by the client. And 0xffffffff is quite suggestive.
Like '-1' suggestive. And in fact the SID is initialized to -1 when a new channel structure (class nciu)
is allocated, and returned to -1 in nciu::setServerAddressUnknown().
https://github.com/epics-base/epics-base/search?q=sid+%3D+UINT_MAX&unscoped_q=sid+%3D+UINT_MAX
Given the circumstances of a CA_PROTO_CLEAR_CHANNEL message, I suspect that the a message
is being sent after the 'sid' member has been cleared.
Would it be possible for me to see a capture of the TCP traffic resulting from a 'caget' which triggers this error?
>
> The other system uses EPICS3.14.8.2
>
> Thanks again,
> Saludos,
> Ignacio A.
>
>
>
> On Thu, Jul 12, 2018 at 4:47 PM Michael Davidsaver <[email protected] <mailto:[email protected]>> wrote:
>
> On 07/12/2018 01:12 PM, Andrew Johnson wrote:
> > Hi Matt,
> >
> > On 07/12/2018 02:54 PM, Matt Rippa wrote:
> >> We ran a simple caget (once per second) from a linux host to an upgraded
> >> IOC. With the new rtems/3.14.12.7 <http://3.14.12.7> <http://3.14.12.7> IOC, it often
> >> experiences timeouts as shown below. With our legacy vxworks/3.13.9 IOC
> >> there are no timeouts logged. Relevant info is shown below. The
> >> caRepeater is running on the linux host.
> >
> > If your IOC is busy the caget default search timeout of 1 second may be
> > too short, have you tried using 'caget -w 5 tcs:ak:astCtx' say?
>
> @Matt, If this doesn't have an effect I'd suggest taking a packet capture
> to see whether these is a search reply coming back.
>
> As it happens, I had an symptom similar to this while playing around
> with an mvme31000 emulator. The issue turned out to be dropped RX
> interrupts due to a bug in the emulator. The effect was due to
> a hard coded 1 second timeout in CA during TCP connect().
>
> Have you tried the simple test of letting 'ping' run for a while?
> If the reported latencies aren't stable, it may be a sign of a
> lower level issue.
>
> >> I'm thinking that setting the linux host EPICS_CA_ADDR_LIST=YES may
> >> help. I'm running that test now.
> >
> > I have never seen an EPICS_CA_ADDR_LIST as long as yours, but given that
> > you seem to be searching multiple subnets maybe you do need that.
>
> Agreed. For this test I'd suggest setting EPICS_CA_ADDR_LIST with only
> the IP of the IOC you're testing. Along with EPICS_CA_AUTO_ADDR_LIST=NO
>
> > I don't think setting EPICS_CA_AUTO_ADDR_LIST to YES will make any
> > difference to this behaviour (your epicsPrtEnvParams showed it as YES
> > already, so unless you have explicitly changed it for the client you're
> > probably already searching your local subnet as well).
>
> Also if both IOC and client tools are 3.14.12.x then you can try
>
> EPICS_CA_AUTO_ADDR_LIST=NO EPICS_CA_ADDR_LIST= EPICS_CA_NAME_SERVERS=<IP>
>
> This will skip the UDP search altogether and do a TCP search.
>
>
> > HTH,
> >
> > - Andrew
> >
> >
> >> EPICS_CA_ADDR_LIST=172.17.2.255 172.17.3.255 172.17.102.130
> >> 172.17.105.20 172.16.102.130 aom-vme sbfmcao01.cl.gemini.edu <http://sbfmcao01.cl.gemini.edu>
> >> mcao-stealth.cl.gemini.edu <http://mcao-stealth.cl.gemini.edu> 172.17.106.111 172.17.105.37
> >> 172.17.107.50 172.17.55.101 172.17.101.101 172.17.65.255
> >> 172.17.102.139 172.17.102.138
> >>
> >> linux> caget tcs:ak:astCtx
> >> tcs:ak:astCtx 0
> >> linux> caget tcs:ak:astCtx
> >> Channel connect timed out: 'tcs:ak:astCtx' not found.
> >> linux> caget tcs:ak:astCtx
> >> Channel connect timed out: 'tcs:ak:astCtx' not found.
> >> linux> caget tcs:ak:astCtx
> >> tcs:ak:astCtx 0
> >
>
- References:
- caget randomly returns Channel connect timed out Matt Rippa
- Re: caget randomly returns Channel connect timed out Andrew Johnson
- Re: caget randomly returns Channel connect timed out Michael Davidsaver
- Re: caget randomly returns Channel connect timed out Ignacio
- Navigate by Date:
- Prev:
Re: caget randomly returns Channel connect timed out Matt Rippa
- Next:
GeSys+Cexp on RTEMS 4.10+ Anton Derbenev
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
<2018>
2019
2020
2021
2022
2023
2024
- Navigate by Thread:
- Prev:
Re: caget randomly returns Channel connect timed out Matt Rippa
- Next:
Re: caget randomly returns Channel connect timed out Matt Rippa
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
<2018>
2019
2020
2021
2022
2023
2024
|