EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
From: Mark Rivers via Tech-talk <tech-talk at aps.anl.gov>
To: Michael Davidsaver <mdavidsaver at gmail.com>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Mon, 23 May 2022 13:31:27 +0000

I have done more testing.  The following is a summary of the problem and the results.

 

- On recent versions of EPICS base, VME crates give A16 and A32 bus errors when CA clients are monitoring PVs associated with VME drivers.

- The problem occurs on base 7.0.6 and 7.0.6.1.

- The problem does not occur on base 7.0.4 or 7.0.5.

- The problem only occurs if there are CA clients monitoring the PVs coming from the VME driver.  In my test crate these are specifically the ADC values from an IP330 Industry Pack ADC.

  o The IP330 is generating interrupts at 2 kHz and doing callbacks to the asynInt32Average device support at this rate.

  o The ai records process periodically at a 0.1 or 1.0 second scan rate.

- If the CA clients are running when the IOC starts, it typically fails in a few seconds, and never runs for more than 90 seconds before the bus error.

- Last night I stopped the CA clients and started the IOC with base 7.0.6.1.

  o The IOC ran for 10.5 hours with no errors.

  o This morning I started the medm client and the A32 bus error happened within just a few seconds.  This is the error:

 

[Sun May 22 20:47:23 2022] Done executing startup script '/home/epics/devel/CARS/iocBoot/ioc13lab2/st.cmd'.
[Sun May 22 20:47:23 2022]
[Sun May 22 20:47:23 2022] ioc13lab2>
[Mon May 23 07:15:01 2022] VME Bus Error accessing A32: 0xbfa2b610
[Mon May 23 07:15:01 2022] machine check
[Mon May 23 07:15:01 2022] Exception next instruction address: 0x0368cec0
[Mon May 23 07:15:01 2022] Machine Status Register: 0x0008b032
[Mon May 23 07:15:01 2022] Condition Register: 0x48000884
[Mon May 23 07:15:01 2022] Task: 0x57efe30 "CAS-event"
[Mon May 23 07:15:01 2022] 0x57efe30 (CAS-event): task 0x57efe30 has had a failure and has been stopped.
[Mon May 23 07:15:01 2022] 0x57efe30 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
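
As a cross-check of the "10.5 hours" figure, the uptime can be computed from the two console timestamps in the log above. This is only a small sketch; the variable names are mine and the timestamp format string is an assumption based on how the console log lines look.

```python
from datetime import datetime

# Timestamp layout assumed from the console log lines, e.g. "Sun May 22 20:47:23 2022"
FMT = "%a %b %d %H:%M:%S %Y"

# Copied from the log: end of st.cmd and the A32 bus error
started = datetime.strptime("Sun May 22 20:47:23 2022", FMT)
failed = datetime.strptime("Mon May 23 07:15:01 2022", FMT)

uptime = failed - started
uptime_seconds = int(uptime.total_seconds())
uptime_hours = uptime.total_seconds() / 3600

print(f"{uptime} = {uptime_seconds} s = {uptime_hours:.2f} h")
```

This prints `10:27:38 = 37658 s = 10.46 h`, which matches the roughly 10.5 hours quoted above.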

 

- My IOC application builds 8 SNL programs.  The application support dbd file contains these lines:

 

# DRIVER SUPPORT
################
registrar(EnergyRegistrar)
registrar(Energy_CCRegistrar)
registrar(BM13_EnergyRegistrar)
registrar(GSE_MonoEnergyRegistrar)
registrar(IDD_LVP_DetectorRegistrar)
registrar(BMD_LVP_DetectorRegistrar)
registrar(newport_tableRegistrar)
registrar(tomoCollectRegistrar)

 

My IOC startup script does not start any of those SNL programs, but they are built into the application.

 

If these registrar lines are not commented out, the IOC fails within a few seconds whenever the CA clients monitoring the IP330 ADC ai PVs are running.  It typically fails with an A16 bus error at an address in the IP330's address space.  In this case it failed 2 seconds after the st.cmd file finished loading, because medm was already running.

 

[Sun May 22 18:14:09 2022] Done executing startup script '/home/epics/devel/CARS/iocBoot/ioc13lab2/st.cmd'.
[Sun May 22 18:14:09 2022]
[Sun May 22 18:14:09 2022] ioc13lab2>
[Sun May 22 18:14:11 2022] VME Bus Error accessing A16: 0x325e
[Sun May 22 18:14:11 2022] machine check
[Sun May 22 18:14:11 2022] Exception next instruction address: 0x0368ce90
[Sun May 22 18:14:11 2022] Machine Status Register: 0x0008b032
[Sun May 22 18:14:11 2022] Condition Register: 0x48000884
[Sun May 22 18:14:11 2022] Task: 0x230bcf0 "CAS-event"
[Sun May 22 18:14:11 2022] 0x230bcf0 (CAS-event): task 0x230bcf0 has had a failure and has been stopped.
[Sun May 22 18:14:11 2022] 0x230bcf0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
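
To compare the A32 and A16 failures side by side, the relevant fields can be pulled out of the two reports with a few regular expressions. This is a rough sketch: `parse_report` and the dictionary keys are names I made up, and the sample text is just the three relevant lines from each report in this message.

```python
import re

# The three relevant lines from each bus error report in this message
A32_REPORT = """\
[Mon May 23 07:15:01 2022] VME Bus Error accessing A32: 0xbfa2b610
[Mon May 23 07:15:01 2022] Exception next instruction address: 0x0368cec0
[Mon May 23 07:15:01 2022] Task: 0x57efe30 "CAS-event"
"""
A16_REPORT = """\
[Sun May 22 18:14:11 2022] VME Bus Error accessing A16: 0x325e
[Sun May 22 18:14:11 2022] Exception next instruction address: 0x0368ce90
[Sun May 22 18:14:11 2022] Task: 0x230bcf0 "CAS-event"
"""


def parse_report(text):
    """Extract address space, faulting address, next PC and task name (hypothetical helper)."""
    space, addr = re.search(
        r"VME Bus Error accessing (A\d+): (0x[0-9a-fA-F]+)", text
    ).groups()
    pc = re.search(r"next instruction address: (0x[0-9a-fA-F]+)", text).group(1)
    task = re.search(r'Task: 0x[0-9a-fA-F]+ "([^"]+)"', text).group(1)
    return {
        "space": space,
        "address": int(addr, 16),
        "next_pc": int(pc, 16),
        "task": task,
    }


a32 = parse_report(A32_REPORT)
a16 = parse_report(A16_REPORT)
pc_delta = a32["next_pc"] - a16["next_pc"]

print(a32["task"], a16["task"], hex(pc_delta))
```

Both reports name the CAS-event task, and the two "next instruction" addresses are 0x30 bytes apart. That alone does not say which function is involved; it would need to be checked against the symbol table of the running image.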

 

If I comment out the last 4 registrar lines in the dbd file, the time to failure is much longer.  In the first test it ran for 4700 seconds with no failure before I stopped it.  In the second test it ran for 6400 seconds before it failed with an A16 bus error.  I don’t understand why commenting out those lines should matter, but it definitely does.
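
For anyone who wants to repeat this experiment, the edit to the dbd file can be scripted. The following is a minimal sketch, assuming the registrar lines appear exactly as listed earlier in this message and that a leading `#` is enough to comment a line out in a dbd file; `comment_out_last_registrars` is a hypothetical helper, not part of EPICS.

```python
DBD_TEXT = """\
# DRIVER SUPPORT
################
registrar(EnergyRegistrar)
registrar(Energy_CCRegistrar)
registrar(BM13_EnergyRegistrar)
registrar(GSE_MonoEnergyRegistrar)
registrar(IDD_LVP_DetectorRegistrar)
registrar(BMD_LVP_DetectorRegistrar)
registrar(newport_tableRegistrar)
registrar(tomoCollectRegistrar)
"""


def comment_out_last_registrars(dbd_text, count):
    """Return dbd_text with the last `count` registrar() lines prefixed by '#'."""
    lines = dbd_text.splitlines()
    # Indices of lines that are active (not already commented) registrar entries
    registrar_idx = [
        i for i, line in enumerate(lines) if line.strip().startswith("registrar(")
    ]
    if count > 0:
        for i in registrar_idx[-count:]:
            lines[i] = "#" + lines[i]
    return "\n".join(lines) + "\n"


edited = comment_out_last_registrars(DBD_TEXT, 4)
print(edited)
```

With `count` set to 4, the first four registrar lines are left active and the last four are commented out, which is the configuration used in the two longer runs described above.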

 

Mark

 

 

From: Mark Rivers
Sent: Saturday, May 21, 2022 1:05 PM
To: Michael Davidsaver <mdavidsaver at gmail.com>
Cc: tech-talk at aps.anl.gov
Subject: RE: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules

 

Hi Michael

> What specific board is involved?  (eg. mvme3100?)

The test crate is an MVME5100.  But the production crates that were also failing include several MVME2700 boards as well as some MVME5100.

> Although you haven't mentioned the presence of an EVR card or timing driver.

There is no EVR card or timing driver.  The following cards seem to work fine:

- SIS3801 and SIS3820 VME cards.  These are scalers/multichannel scalers.  I have tested them a lot and they do not crash.

- IpUnidig digital I/O Industry Pack module with interrupts on inputs.  I cannot get any failures flipping these bits.  It is on the same carrier card as the IP330.

> I still wonder if this isn't somehow resulting from an incomplete rebuild.

I don’t think so.  I did a “make clean” at the top of the synApps/support tree, which includes my specific application directory.  I then did a “find” for .o files and there were none leftover.

> Since the error seems to come from a CAS-event thread, I would be interested to know which CA client is associated.  Running "casr 5" will show this (and more).  eg.

I have attached a file which shows the following:

- Ran casr 5 right after the IOC finished startup.  It shows a client which is connected to a few records from the iocStats module.

- Opened the medm screen for the IP330 and got a VME bus error.

- Ran casr 5 again, showing the connections for that medm screen.

Thanks,

Mark

