On 07/02/2018 04:09 PM, Ricardo Cardenes wrote:
>
> Hi,
>
> We've been experiencing some problems on some of our systems (based on MVME 2700/3100; EPICS 3.14.12.7, RTEMS 4.10.2) related to the RPC subsystem, like these:
>
> *
>
> 2018-06-04-tcs.log:Jun 4 12:30:06 E) PORT: tcs_vme, MSG: RPCIO: server '172.17.71.11' not responding - still trying
> [...]
>
> 2018-06-07-tcs.log:Jun 7 13:11:37 E) PORT: tcs_vme, MSG: RPCIO: server '172.17.2.10' not responding - still trying
> [...]
>
> 2018-06-26-tcs.log:Jun 26 16:43:14 E) PORT: tcs_vme, MSG: RPCIO - statistics: already 17000 retries to server 172.17.2.10
> 2018-06-26-tcs.log:Jun 26 20:57:43 E) PORT: tcs_vme, MSG: RPCIO - statistics: already 18000 retries to server 172.17.2.10
>
> And it seems to me that we're getting too many CA disconnects, too. We hadn't noticed until recently, when we commissioned a system that seems to be the only one systematically affected (but it is also the one that orchestrates most of the others...)
>
> We suspect some hardware or networking problem, possibly outside of our boards, given that the interface stats look quite clean (no corrupt Ethernet frames, no retries, ...), and we've actually seen at least one corrupt UDP header, plus several thousand occurrences of timed-out IP fragments (over a period of a few weeks).
What are you using for an NFS server?
> In any case, while we investigate the possible hardware issue, has anyone else experienced a similar problem?
I've seen messages similar to what was reported in 2012 by Bruce Hill [1],
as well as the "not responding" message which you see, while at BNL. I never
associated this with any undesirable behavior. These IOCs served
some PVs that were both archived and watched by an alarm monitor. There were no extraneous
disconnects. However, for these applications, latency up to the CA timeout
interval wouldn't have been noticed unless it happened fairly often.
This was with RTEMS 4.9.6 and various EPICS 3.14.12.x series on mvme3100 boards.
The network traffic was light, the only NFS activity after boot was autosave,
with at least a 5 second holdoff on rewrites.
The NFS server in this case was initially Linux (Debian 6 and then 7, as I recall).
I think it was later replaced by a proprietary solution, though this
didn't have a noticeable effect on these messages.
[1] https://epics.anl.gov/tech-talk/2012/msg00590.php