Thanks for the ideas. It's intriguing that this 1 second timeout is not
long enough.
Our change from VxWorks to RTEMS did not change the fundamental
application logic; the changes were confined to the OSI
(operating-system-independent) layer. Further, the CPU loads are only
~10-20% viewing iocStats. Is that the best assessment of a system being
"busy"? Perhaps iocStats doesn't give enough resolution.
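(As a rough check, one could watch the iocStats load PV from the linux
host with camonitor; the PV name below is only a guess based on common
iocStats database macros and will differ per site:

linux> camonitor tcs:LOAD

A load figure averaged over a second or so can easily hide short bursts
of activity, so a steady ~10-20% doesn't rule out the IOC being
momentarily too busy to answer a search.)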
We're collecting data from new tests at both our sites and should know
more tomorrow. I ran a quick ping test and, interestingly, the VxWorks
IOC shows a higher rtt mdev:
RTEMS:
53 packets transmitted, 53 received, 0% packet loss, time 52045ms
rtt min/avg/max/mdev = 1.126/1.153/1.204/0.038 ms
VxWorks:
53 packets transmitted, 53 received, 0% packet loss, time 52059ms
rtt min/avg/max/mdev = 0.951/1.052/1.902/0.184 ms
-Matt
On Thu, Jul 12, 2018 at 10:47 AM Michael Davidsaver
<[email protected]> wrote:
On 07/12/2018 01:12 PM, Andrew Johnson wrote:
> Hi Matt,
>
> On 07/12/2018 02:54 PM, Matt Rippa wrote:
>> We ran a simple caget (once per second) from a linux host to an upgraded
>> IOC. With the new rtems/3.14.12.7 IOC, it often
>> experiences timeouts as shown below. With our legacy vxworks/3.13.9 IOC
>> there are no timeouts logged. Relevant info is shown below. The
>> caRepeater is running on the linux host.
>
> If your IOC is busy the caget default search timeout of 1 second may be
> too short; have you tried using 'caget -w 5 tcs:ak:astCtx', say?
@Matt, if this doesn't have an effect I'd suggest taking a packet
capture to see whether there is a search reply coming back.
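Something along these lines should work, assuming the default CA server
port of 5064 and that the capture interface is eth0 (adjust both as
needed):

linux> tcpdump -i eth0 -w ca-search.pcap 'udp port 5064 or tcp port 5064'

If a search reply does show up in the capture while caget still times
out, the problem is on the client side of the exchange.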
As it happens, I had a symptom similar to this while playing around
with an mvme3100 emulator. The issue turned out to be dropped RX
interrupts due to a bug in the emulator. The effect was due to
a hard-coded 1 second timeout in CA during TCP connect().
Have you tried the simple test of letting 'ping' run for a while?
If the reported latencies aren't stable, it may be a sign of a
lower-level issue.
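For example (the count and interval here are arbitrary; anything that
runs for a few minutes will do):

linux> ping -i 0.2 -c 1000 <ioc-ip>

then compare the mdev figures between the two IOCs.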
>> I'm thinking that setting EPICS_CA_AUTO_ADDR_LIST=YES on the linux host
>> may help. I'm running that test now.
>
> I have never seen an EPICS_CA_ADDR_LIST as long as yours, but given that
> you seem to be searching multiple subnets maybe you do need that.
Agreed. For this test I'd suggest setting EPICS_CA_ADDR_LIST to only
the IP of the IOC you're testing, along with EPICS_CA_AUTO_ADDR_LIST=NO.
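Concretely, with <ioc-ip> standing in for the address of the IOC under
test:

linux> export EPICS_CA_AUTO_ADDR_LIST=NO
linux> export EPICS_CA_ADDR_LIST=<ioc-ip>
linux> caget -w 5 tcs:ak:astCtx

That way every search goes straight to the one IOC and the other
subnets drop out of the picture.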
> I don't think setting EPICS_CA_AUTO_ADDR_LIST to YES will make any
> difference to this behaviour (your epicsPrtEnvParams showed it as YES
> already, so unless you have explicitly changed it for the client you're
> probably already searching your local subnet as well).
Also, if both the IOC and client tools are 3.14.12.x then you can try
EPICS_CA_AUTO_ADDR_LIST=NO EPICS_CA_ADDR_LIST=
EPICS_CA_NAME_SERVERS=<IP>
This will skip the UDP search altogether and do a TCP search.
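For example, again with <ioc-ip> standing in for the IOC's address:

linux> export EPICS_CA_AUTO_ADDR_LIST=NO
linux> export EPICS_CA_ADDR_LIST=
linux> export EPICS_CA_NAME_SERVERS=<ioc-ip>
linux> caget tcs:ak:astCtx

If this connects reliably while the UDP search does not, that points
at the UDP side (lost datagrams, or the 1 second search timeout) rather
than the TCP circuit.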
> HTH,
>
> - Andrew
>
>
>> EPICS_CA_ADDR_LIST=172.17.2.255 172.17.3.255 172.17.102.130
>> 172.17.105.20 172.16.102.130 aom-vme sbfmcao01.cl.gemini.edu
>> mcao-stealth.cl.gemini.edu 172.17.106.111 172.17.105.37
>> 172.17.107.50 172.17.55.101 172.17.101.101 172.17.65.255
>> 172.17.102.139 172.17.102.138
>>
>> linux> caget tcs:ak:astCtx
>> tcs:ak:astCtx 0
>> linux> caget tcs:ak:astCtx
>> Channel connect timed out: 'tcs:ak:astCtx' not found.
>> linux> caget tcs:ak:astCtx
>> Channel connect timed out: 'tcs:ak:astCtx' not found.
>> linux> caget tcs:ak:astCtx
>> tcs:ak:astCtx 0
>
--