Experimental Physics and
Industrial Control System


Subject:	Re: RTEMS NFS/RPIOC problems?
From:	Michael Davidsaver <[email protected]>
To:	Ricardo Cardenes <[email protected]>
Cc:	Talk EPICS Tech <[email protected]>
Date:	Mon, 2 Jul 2018 20:03:24 -0700

On 07/02/2018 04:09 PM, Ricardo Cardenes wrote:
> 
> Hi,
> 
> We've been experiencing some problems on some of our systems (based on MVME 2700/3100; EPICS 3.14.12.7, RTEMS 4.10.2) related to the RPC subsystem, like these:
> 
> 2018-06-04-tcs.log:Jun  4 12:30:06 E) PORT: tcs_vme, MSG: RPCIO: server '172.17.71.11' not responding - still trying
> [...]
> 
> 2018-06-07-tcs.log:Jun  7 13:11:37 E) PORT: tcs_vme, MSG: RPCIO: server '172.17.2.10' not responding - still trying
> [...]
> 
> 2018-06-26-tcs.log:Jun 26 16:43:14  E) PORT: tcs_vme, MSG: RPCIO - statistics: already 17000 retries to server 172.17.2.10
> 2018-06-26-tcs.log:Jun 26 20:57:43  E) PORT: tcs_vme, MSG: RPCIO - statistics: already 18000 retries to server 172.17.2.10
> 
> It also seems to me that we're getting too many CA disconnects. We hadn't noticed until recently, when we commissioned a system that seems to be the only one systematically affected (though it is also the one that orchestrates most of the others...)
> 
> We suspect a hardware or networking problem, possibly outside of our boards, given that the interface stats look quite clean (no corrupt Ethernet frames, no retries, ...). We have, however, seen at least one corrupt UDP header, plus several thousand timed-out IP fragments (over a period of a few weeks).

What are you using for an NFS server?

> In any case, while we investigate the possible hardware issue, has anyone else experienced a similar problem?

While at BNL, I saw messages similar to those reported in 2012 by Bruce Hill [1],
as well as the "not responding" message you are seeing. I never
associated them with any undesirable behavior. These IOCs served
some PVs that were both archived and included in an alarm monitor, and there
were no extraneous disconnects. However, for these applications, latency up to
the CA timeout interval wouldn't have been noticed unless it happened fairly often.
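For scale, the two cumulative RPCIO statistics lines quoted above (17000 retries at 16:43:14 and 18000 at 20:57:43 on Jun 26, both to 172.17.2.10) can be turned into an average retry rate. This is a minimal Python sketch, assuming both counters refer to the same server, that the counter was not reset in between, and that the messages are printed at round thousands, so the result is only approximate:

```python
from datetime import datetime

# Cumulative retry counts copied from the two quoted RPCIO statistics lines (Jun 26).
# Assumption: same server, no counter reset between the two log entries.
samples = [
    ("16:43:14", 17000),
    ("20:57:43", 18000),
]


def retry_rate(samples):
    """Return (elapsed_seconds, retries_per_hour, seconds_per_retry) between first and last sample."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    retries = n1 - n0
    return elapsed, retries * 3600 / elapsed, elapsed / retries


elapsed, per_hour, interval = retry_rate(samples)
print(f"{elapsed:.0f} s elapsed, {per_hour:.0f} retries/hour, one retry every {interval:.1f} s")
```

This works out to roughly one retry every 15 s on average over that window. An average says nothing about whether the retries were bursty, which is what would matter when comparing against the CA timeout.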

This was with RTEMS 4.9.6 and various EPICS 3.14.12.x releases on MVME3100 boards.
The network traffic was light; the only NFS activity after boot was autosave,
with at least a 5 second holdoff on rewrites.

The NFS server in this case was initially Linux (Debian 6 and then 7, as I recall).
I think it was later replaced by a proprietary solution, though this
didn't have a noticeable effect on these messages.

[1] https://epics.anl.gov/tech-talk/2012/msg00590.php


ANJ, 03 Jul 2018
