Subject: EPICS Reliability
From: Carl Dickey <[email protected]>
Date: Thu, 16 Feb 1995 19:26:35 -0500

All-
Bob has asked me to relate an experience that we have gone through here at DFELL.
Several weeks ago, our linac IOC locked up during injection into our storage ring. Despite
generally excellent reliability, this is always a bit of an embarrassment, so we started
recollecting the recent history of similar happenings. We recalled five such events during
the past year or so. Two occurred on the linac IOC, one on our storage ring IOCs, and
two on our Mark III IOC. As usual, the software enthusiasts began to investigate the
hardware layers and the hardware types started analysing the software. An investigation
of the syslogs for workstations involved with EPICS revealed occasional NFS errors.
Furthermore, at the time of the most recent linac IOC failure, we found an RPC error and
a corresponding timeout. We began to suspect that the NFS errors were being caused by
noise coupled into our ethernet. Fred Carter had been having a bad feeling about our
COW (computer on wheels) that we keep near the center of our storage ring for quick
troubleshooting. Sure enough, we discovered that in our haste to commission the system,
we had employed an unshielded AUI drop cable. This cable worked ok until we began to
fire up our pulsed power systems and our ceramic gaps became illuminated. Apparently,
about once a month or so, the occasional packet bursts that were evidenced by the NFS
errors would occur in coincidence with the reception of data by the IOC. It seems that
this causes the IOC to die. Thus we see an RPC error and a timeout. Reboot of the
IOC clears the locked condition. Since replacing the bad drop cable, we have seen no
further NFS errors, and we have had no further IOC dropouts.
Bob tells me that Los Alamos has had a very similar experience in the past with
their Heurikon HKV2-based IOCs. Given that so many labs are in the commissioning phase,
it might be good to keep this in mind in case you encounter something similar. Bob can
provide more detailed pathological information concerning this type of failure.
Best wishes,
Carl
PS- Our commissioning is continuing to go well. Beam lifetime in our storage ring is on
the order of hours. We have successfully achieved single bunch injection and synchronous
stacking. We have ramped to over 1 GeV, our machine's design energy. Our next goals include
increasing the current and reducing our injection pulse from the present length of three
buckets (about 15 ns) to one bucket (about 5 ns).