EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: RTEMS NFS/RPIOC problems?
From: Michael Davidsaver <[email protected]>
To: Ricardo Cardenes <[email protected]>
Cc: Talk EPICS Tech <[email protected]>
Date: Mon, 2 Jul 2018 20:03:24 -0700

On 07/02/2018 04:09 PM, Ricardo Cardenes wrote:
> 
> Hi,
> 
> We've been experiencing some problems on some of our systems (based on MVME 2700/3100; EPICS 3.14.12.7, RTEMS 4.10.2) related to the RPC subsystem, like these:
> 
> *
> 
> 2018-06-04-tcs.log:Jun  4 12:30:06 E) PORT: tcs_vme, MSG: RPCIO: server '172.17.71.11' not responding - still trying
> [...]
> 
> 2018-06-07-tcs.log:Jun  7 13:11:37 E) PORT: tcs_vme, MSG: RPCIO: server '172.17.2.10' not responding - still trying
> [...]
> 
> 2018-06-26-tcs.log:Jun 26 16:43:14  E) PORT: tcs_vme, MSG: RPCIO - statistics: already 17000 retries to server 172.17.2.10
> 2018-06-26-tcs.log:Jun 26 20:57:43  E) PORT: tcs_vme, MSG: RPCIO - statistics: already 18000 retries to server 172.17.2.10
> 
> *
> And it seems to me that we're getting too many CA disconnects, too. We hadn't noticed until recently, when we commissioned a system that seems the only one being systematically affected (but it is the one that orchestrates most of the others, too...)
> 
> We suspect some hardware or networking problem, possibly outside of our boards, given that the interface stats look quite clear (no corrupt Ethernet frames, no retries, ...), and we've actually seen at least one corrupt UDP header, plus several thousand occurrences of timed out IP fragments (over a period of a few weeks).

What are you using for an NFS server?

> In any case, while we investigate the possible hardware issue, has anyone else experienced a similar problem?

While at BNL, I saw messages similar to those reported in 2012 by Bruce Hill [1],
as well as the "not responding" message which you see.  I never
associated them with any undesirable behavior.  These IOCs served
some PVs which were both archived and watched by an alarm monitor.  There were
no extraneous disconnects.  However, for these applications, latency up to the
CA timeout interval wouldn't have been noticed unless it happened fairly often.

This was with RTEMS 4.9.6 and various EPICS 3.14.12.x releases on mvme3100 boards.
The network traffic was light.  The only NFS activity after boot was autosave,
with at least a 5 second holdoff on rewrites.

The NFS server in this case was initially Linux (Debian 6 and then 7 as I recall).
I think it was later replaced with a proprietary solution, but this
didn't have a noticeable effect on these messages.

[1] https://epics.anl.gov/tech-talk/2012/msg00590.php
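
To judge whether the retries happen "fairly often", the two "RPCIO - statistics" lines quoted above can be turned into a rough retry rate. The following is only a sketch: the function names (parse_stats_line, retries_per_hour) are made up for illustration, the year is assumed to be 2018 from the log file name since the timestamps don't carry one, and it assumes the counter was not reset (no reboot) between the two samples.

```python
import re
from datetime import datetime

# The two statistics lines quoted in the original report, copied verbatim.
LOG_LINES = [
    "2018-06-26-tcs.log:Jun 26 16:43:14  E) PORT: tcs_vme, MSG: RPCIO - statistics: already 17000 retries to server 172.17.2.10",
    "2018-06-26-tcs.log:Jun 26 20:57:43  E) PORT: tcs_vme, MSG: RPCIO - statistics: already 18000 retries to server 172.17.2.10",
]

# Month lookup avoids depending on the process locale.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}

STATS_RE = re.compile(
    r"(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})"
    r".*already (?P<count>\d+) retries to server (?P<server>\S+)"
)


def parse_stats_line(line, year=2018):
    """Return (timestamp, retry_count, server), or None for other lines.

    The year is an assumption: the syslog-style timestamp has none.
    """
    match = STATS_RE.search(line)
    if match is None:
        return None
    when = datetime(year, MONTHS[match["mon"]], int(match["day"]),
                    int(match["h"]), int(match["m"]), int(match["s"]))
    return when, int(match["count"]), match["server"]


def retries_per_hour(lines):
    """Average retry rate between the first and last statistics line."""
    samples = [s for s in map(parse_stats_line, lines) if s is not None]
    if len(samples) < 2:
        raise ValueError("need at least two statistics lines")
    (t0, c0, _), (t1, c1, _) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600.0
    return (c1 - c0) / hours


if __name__ == "__main__":
    print(f"{retries_per_hour(LOG_LINES):.1f} retries/hour")
```

For the two quoted lines this works out to 1000 retries in 4 h 14 min 29 s, or about 236 retries per hour to 172.17.2.10, averaged over that afternoon.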

Replies:
Re: RTEMS NFS/RPIOC problems? Ricardo Cardenes
References:
RTEMS NFS/RPIOC problems? Ricardo Cardenes

ANJ, 03 Jul 2018