Hi,
Thanks, Michael and Andrew, for your answers. I also have hardware error as the top suspect, but we're trying other ideas first because:
- This has happened on two different boards (one rather old, the other a more recent purchase, both MVME2700s) within 4 days. This is an RTEMS system replacing a VxWorks one, also running on a 2700.
- To account for that, I have speculated that an external factor (another board in the same crate, or maybe a failing power supply) is causing this, but the VxWorks-based system doesn't seem to be experiencing the problem at all.
To make things worse, the system is at a remote site, and the only way to test it in real time is to send an engineer to spend the night at 13,800 ft, waiting for a failure that may or may not happen. It's quite a frustrating situation, because the problem seems site-dependent: the same boards have been running for weeks in our lab without a glitch! :-(
Andrew: using a watchdog to trigger the abort looks like a great idea. I'll look into your interrupt latency suggestion, too. Our problem now is convincing the users to put a known-unstable system into operation just to watch it fail :-)
Regards,
Ricardo