EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: IOC Crash with No Exception Generated
From: Ricardo Cardenes via Tech-talk <[email protected]>
To: [email protected]
Cc: Talk EPICS Tech <[email protected]>
Date: Thu, 26 Jul 2018 09:16:06 -1000
Hi,

Thanks to both Michael and Andrew for your answers. I also have a hardware error as the top suspected cause, but we're trying other ideas first because:
  1. This has happened on two different boards (one rather old, the other a more recent purchase, both MVME2700s) within 4 days. This is an RTEMS system replacing a VxWorks one, also on a 2700.
  2. To account for that, I have speculated that an external factor (another board in the same crate, or maybe a failing power supply) is making this happen, but the VxWorks-based system doesn't seem to be experiencing the problem at all.
To make things worse, the system is at a separate location and we can only test it in real time by sending an engineer to spend the night at 13800' just waiting for a failure that may or may not happen. It's quite a frustrating situation, because it seems site-dependent: the same boards have been running for weeks in our lab without a glitch! :-(

Andrew: the watchdog to trigger the abort looks like a great idea. I'll look into your interrupt latency suggestion, too. Our problem now would be to convince the users to put a known unstable system into operation just to see it fail :-)

Regards,
Ricardo

On Thu, Jul 26, 2018 at 5:38 AM Andrew Johnson <[email protected]> wrote:
Hi Matt,

On 07/25/2018 08:52 PM, Matt Rippa via Tech-talk wrote:
> Is there a way to force an exception (or stack trace), for example with
> watchdog?
>
> RTEMS 4.10.2/EPICS 3.14.12.7 MVME2307 BSP

Back in 2000 we were having this kind of issue on some of our VxWorks
systems, and I wrote some code for a couple of different CPU boards
(MVME167 and 172) which connected an interrupt handler to the Abort
button interrupt. When the button was pressed this routine dumped the
status of a selected set of tasks into an area of memory that was
configured to survive a reboot, allowing us to show that status after
bringing the system back up.
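
The approach described above (dump task status into memory that survives a reboot, then read it back afterwards) could be sketched roughly as follows. This is not Andrew's original code, which is not shown in the thread. Every name here (crash_record, task_snapshot, crash_record_save, crash_record_valid, CRASH_MAGIC) is hypothetical, and preserved_area is an ordinary variable standing in for a board-specific region that the boot code leaves untouched. A magic number written last plus a simple checksum let the post-reboot code tell a real record from leftover garbage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRASH_MAGIC 0x43524153u  /* arbitrary marker chosen for this sketch */
#define MAX_TASKS   8

/* Hypothetical per-task snapshot; a real handler would copy whatever
   the OS exposes for each task of interest. */
typedef struct {
    char     name[16];
    uint32_t pc;
    uint32_t sp;
} task_snapshot;

/* The checksum covers raw bytes, so the layout must have no padding. */
static_assert(sizeof(task_snapshot) == 16 + 2 * sizeof(uint32_t),
              "task_snapshot must have no padding");

typedef struct {
    uint32_t      magic;     /* written last, so a partial dump is not trusted */
    uint32_t      count;     /* number of valid entries in tasks[] */
    task_snapshot tasks[MAX_TASKS];
    uint32_t      checksum;  /* additive checksum over count and tasks */
} crash_record;

/* Stand-in for memory excluded from the boot-time clear (assumption:
   on real hardware this would live at a fixed, board-specific address). */
crash_record preserved_area;

static uint32_t sum_bytes(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}

static uint32_t record_checksum(const crash_record *r)
{
    return sum_bytes(&r->count, sizeof r->count)
         + sum_bytes(r->tasks, sizeof r->tasks);
}

/* Intended to be called from the abort-interrupt handler. */
void crash_record_save(crash_record *r, const task_snapshot *tasks,
                       uint32_t count)
{
    if (count > MAX_TASKS)
        count = MAX_TASKS;
    memset(r, 0, sizeof *r);                 /* invalidate any older record */
    memcpy(r->tasks, tasks, count * sizeof *tasks);
    r->count = count;
    r->checksum = record_checksum(r);
    r->magic = CRASH_MAGIC;                  /* mark the record complete */
}

/* Intended to be called after reboot, before the area is reused. */
int crash_record_valid(const crash_record *r)
{
    return r->magic == CRASH_MAGIC
        && r->count <= MAX_TASKS
        && r->checksum == record_checksum(r);
}
```

After a restart, startup code would call crash_record_valid(&preserved_area) and print the saved entries only if it returns nonzero. How to reserve such a region, and how to hook the abort interrupt, depends on the board and BSP and is not covered here.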

My code would be no use to you on a different CPU board and OS, but the
idea of connecting something up to an Abort button interrupt if your
boards have one might help. You'll need the hardware manual for the CPU
board to work out how to enable and connect the abort interrupt, but
most Motorola/Emerson/Whoever boards do have such a button.


I will add that the PowerPC CPUs seem to be a bit more prone to the CPU
completely hanging up than the 68Ks were, which I think tends to happen
if they get a Bus Error (PCIbus Target Abort) from code running inside
an interrupt handler/ISR. If you have any ISRs that do VMEbus I/O you
might want to look at whether they can be converted into high priority
threads that sit waiting on a semaphore, and have the ISR do nothing but
trigger that semaphore. This will increase the interrupt latency and
jitter, but would prevent hangups if something goes wrong with the VME I/O.
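
As a rough illustration of the ISR-to-thread conversion suggested above, here is a minimal sketch using POSIX threads and semaphores as a stand-in for the RTEMS or EPICS OSI equivalents. All names (deferred_irq, deferred_irq_isr, deferred_irq_start, deferred_irq_stop, do_io) are hypothetical. The ISR only records the event and posts a semaphore; the bus access happens in the worker thread, where a bus error can be handled as an ordinary task-level fault. On a real IOC the worker would be created at high priority, which this sketch leaves out.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    sem_t       wake;          /* posted by the ISR, waited on by the worker */
    atomic_int  pending;       /* interrupts signalled but not yet serviced */
    atomic_int  stop;          /* set to ask the worker to exit */
    atomic_int  handled;       /* interrupts serviced so far */
    void      (*do_io)(void);  /* hypothetical routine that touches the bus */
    pthread_t   thread;
} deferred_irq;

/* ISR body: no bus access here, only bookkeeping and a wake-up. */
void deferred_irq_isr(deferred_irq *d)
{
    atomic_fetch_add(&d->pending, 1);
    sem_post(&d->wake);
}

/* Worker: performs the I/O that used to live in the ISR. */
static void *deferred_irq_worker(void *arg)
{
    deferred_irq *d = arg;
    for (;;) {
        if (sem_wait(&d->wake) != 0)
            continue;                       /* interrupted wait, retry */
        if (atomic_load(&d->pending) > 0) {
            atomic_fetch_sub(&d->pending, 1);
            if (d->do_io)
                d->do_io();
            atomic_fetch_add(&d->handled, 1);
        } else if (atomic_load(&d->stop)) {
            break;                          /* nothing pending, asked to stop */
        }
    }
    return NULL;
}

int deferred_irq_start(deferred_irq *d, void (*do_io)(void))
{
    d->do_io = do_io;
    atomic_init(&d->pending, 0);
    atomic_init(&d->stop, 0);
    atomic_init(&d->handled, 0);
    if (sem_init(&d->wake, 0, 0) != 0)
        return -1;
    if (pthread_create(&d->thread, NULL, deferred_irq_worker, d) != 0) {
        sem_destroy(&d->wake);
        return -1;
    }
    return 0;
}

/* Services everything already signalled, then shuts the worker down. */
void deferred_irq_stop(deferred_irq *d)
{
    atomic_store(&d->stop, 1);
    sem_post(&d->wake);
    pthread_join(d->thread, NULL);
    sem_destroy(&d->wake);
}
```

The cost is the one noted above: each interrupt now pays for a context switch before the I/O runs, so latency and jitter go up. On a POSIX host this builds with the usual thread support enabled (for example -pthread with GCC).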

HTH,

- Andrew

--
Arguing for surveillance because you have nothing to hide is no
different than making the claim, "I don't care about freedom of
speech because I have nothing to say." -- Edward Snowden

