What are you using for an NFS server?
In the earlier listing, both machines run CentOS 6.5 (Linux 2.6.32 / rpcbind 0.2):
- [...].71.11 is serving the boot images for all our production EPICS/RTEMS machines
- [...].2.10 is an additional mount with data files. The machine displaying the problem changes directory into this one at the end of the startup script.
We've got a twin site with a similar layout. The only difference is that there the CentOS versions are 6.9 and 6.7 respectively, with the same kernel and rpcbind versions.
While at BNL, I saw messages similar to what Bruce Hill reported in 2012 [1],
as well as the "not responding" message which you see. I never
associated this with any undesirable behavior. These IOCs served
some PVs that were both archived and watched by an alarm monitor. There were no extraneous
disconnects. However, for these applications, latency up to the CA timeout
interval wouldn't have been noticed unless it happened fairly often.
This was with RTEMS 4.9.6 and various EPICS 3.14.12.x series on mvme3100 boards.
The network traffic was light; the only NFS activity after boot was autosave,
with at least a 5 second holdoff on rewrites.
The NFS server in this case was initially Linux (Debian 6 and then 7, as I recall).
I think it was later swapped out for a proprietary solution, but this
didn't have a noticeable effect on these messages.
Sure. The rpcbind version for CentOS is the same as for Debian 7. In any case, what worries me is that we never saw these messages on the several other systems we've been commissioning over the past year, which use the same combination of EPICS, OS, and hardware. Sure, this one is the most complex, but still...
Also, I know that the messages are normal for NFS. But I would expect them to be the exception, not the norm! Seeing them so consistently (right now the count is over 44000 for the latter server) is extremely suspicious.
In any case, thanks for your input! I'll keep the list updated in case we find something.
Cheers,
Ricardo