-----Original Message-----
From: Ron Sluiter [mailto:[email protected]]
Sent: Monday, November 13, 2006 12:12 PM
To: EPICS
Subject: Motion control failure at APS
On Tuesday, July 25, a serious motion control failure
occurred in the 34ID-A hutch here at the APS.
All eight stepper motors controlled by a single Oregon
Micro Systems (OMS) VME58 board ran past their limit
switches and into their mechanical stops.
The eight stepper motors involved in the accident are
used to position two white beam slits and a mirror.
Bellows upstream from the first white-beam slit were
punctured by the x-ray beam. Vacuum was lost.
Additional damage included two broken limit switches
and an ion pump.
Three days of beamtime were lost.
Like all APS beamlines using the OMS motor controllers,
this IOC relies on the OMS controller to stop motion
when a motor is moving in the direction of an activated
limit switch. In this case, not only did the OMS
controller fail to stop motion when the limit switches
were activated, the controller entered a strange failure
mode that drove the motors into their mechanical stops.
The VME crate in question is located inside the 34ID-A
white-beam hutch. It is possible that radiation damage
contributed to the failure of the OMS VME58 controller.
There are hundreds of OMS boards at APS that have been in
operation for many years, and no report of this type of
failure has ever been made. OMS has also never had
a report of this type of failure.
Kurt Goetze, OMS, and I have determined
that the failure resulted from two characteristics of
the OMS VME58 board that together conspired to cause
it. First, we have known for some time that
the OMS VME58 boards output a step pulse on every axis
after a VMEbus reset. Second (unknown to us), the
on-board VME58 watchdog can cause the board to go into
a continuous reset cycle if there is a board failure.
These two characteristics, together, caused the failed
VME58 board at 34ID-A to step all eight of its motors at
approximately 4 steps per second until power was shut off.
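To give a feel for what that rate means, here is a rough arithmetic
sketch. Only the 4 steps per second figure comes from the incident;
the elapsed times are illustrative, the function name is made up for
this example, and it assumes each reset yields exactly one step pulse
per axis.

```python
# Rough estimate of unintended motion from the reset cycle described above.
# Assumption: each reset produces exactly one step pulse per axis, so the
# step rate equals the reset rate (about 4 per second in this incident).

def unintended_steps(steps_per_second, seconds):
    """Return the steps accumulated on one axis over the given time."""
    return steps_per_second * seconds


if __name__ == "__main__":
    # Elapsed times are illustrative; the actual time before power was
    # shut off is not stated here.
    for label, seconds in [("1 minute", 60), ("1 hour", 3600), ("8 hours", 8 * 3600)]:
        print(f"{label}: {unintended_steps(4, seconds)} steps per axis")
```

How far that moves a given stage depends on its steps per revolution
and gearing, which differ from axis to axis.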
Since this accident, OMS has resolved the "step pulse on
reset" problem. By fixing the "step pulse on reset" problem,
OMS prevents the VME58 from accidentally driving its motors
if a board fails and exhibits the "continuous reset" behavior.
The "fix" requires the replacement of nine socketed
components on the VME58 controller board.
OMS is providing two ways to retrofit VME58 boards with
this fix.
Option #1 - Send your VME58 boards back to OMS. They will
replace the nine components and retest the board. To do
this, first contact OMS for a Return Materials Authorization
(RMA) number and refer to Engineering Waiver #112 (EW112)
as the reason for the RMA. Cost is $357.00 per board.
Option #2 - OMS is providing a retrofit kit that includes
the nine parts (5 of which are pre-programmed), labeling,
instructions and shipment for $107 per kit.
I don't have the specifics on how to order the kit. I will
post that information in a follow-up message.
If you are ordering new OMS VME58 boards, be aware that OMS
will not be incorporating the upgrade chip set unless you
specify Engineering Waiver no. 112 (EW112) in your order.
In addition, we recommend the MAXv rather than the VME58
for new orders.
For those who choose to upgrade existing VME58 boards,
I strongly advise option #1. Option #2 opens you to the
risk of taking a perfectly functioning VME58 board and
making it less reliable, e.g., by bending a pin while inserting
a part in a socket or through static discharge damage. Option
#1 provides full testing and quality assurance from OMS.
Closing thoughts:
- Thanks to OMS for addressing and fixing this problem.
The VME58 product family is approaching its end of life.
The only incentive for OMS to devote resources to resolving
this problem is their commitment to quality.
- The "step on reset" problem does not occur in OMS's
latest controllers, the MAXv and MAXp.
- OMS VME58 users should review what the consequences of
having this failure occur at their facilities would be.
The decision to have your boards retrofitted should strike
a balance between the probability of board failure (higher
if exposed to radiation) and the consequences of that
failure.
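One simple way to frame that balance is an expected-cost comparison.
The sketch below is only an illustration: the $357 figure is the
Option #1 price quoted above, while the failure probabilities and
consequence costs are placeholders you would have to estimate for
your own beamline. It also assumes, as a simplification, that the
retrofit fully removes the risk of the motors being driven after a
board failure. The function names are made up for this example.

```python
# Sketch of the retrofit trade-off: compare the retrofit price with the
# expected cost of a failure. The probability and consequence numbers
# used below are placeholders, NOT measured values; only the $357 price
# comes from the OMS offer described in this message.

RETROFIT_COST_PER_BOARD = 357.00  # Option #1 price, in dollars


def expected_failure_cost(p_failure, consequence_cost):
    """Expected loss: probability of the failure times its cost."""
    return p_failure * consequence_cost


def retrofit_is_worthwhile(p_failure, consequence_cost,
                           retrofit_cost=RETROFIT_COST_PER_BOARD):
    """True when the expected loss exceeds the cost of the retrofit."""
    return expected_failure_cost(p_failure, consequence_cost) > retrofit_cost


if __name__ == "__main__":
    # Hypothetical board in a radiation area driving critical hardware.
    print(retrofit_is_worthwhile(p_failure=0.001, consequence_cost=1_000_000))
    # Hypothetical board driving hardware that is cheap to repair.
    print(retrofit_is_worthwhile(p_failure=0.001, consequence_cost=100_000))
```

Lost beamtime and safety consequences are hard to price, so treat the
result as a prompt for discussion rather than a decision rule.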
Sorry for the length; hope I didn't bore you,
Ron