Subject: RE: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
From: Mark Rivers via Tech-talk <tech-talk at aps.anl.gov>
To: Michael Davidsaver <mdavidsaver at gmail.com>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Fri, 27 May 2022 17:33:10 +0000
I am now using the unmodified 7.0.6 release. My IOC only contains these records:

ioc13lab2> dbl
13LAB:DAC1_1
13LAB:DAC1_2
13LAB:DAC1_3
13LAB:DAC1_4
13LAB:DAC1_5
13LAB:DAC1_6
13LAB:DAC1_7
13LAB:DAC1_8
13LAB:IP330_1Gain
13LAB:IP330_2Gain
13LAB:IP330_3Gain
13LAB:IP330_4Gain
13LAB:IP330_5Gain
13LAB:IP330_6Gain
13LAB:IP330_7Gain
13LAB:IP330_8Gain
13LAB:IP330_9Gain
13LAB:IP330_10Gain
13LAB:IP330_11Gain
13LAB:IP330_12Gain
13LAB:IP330_13Gain
13LAB:IP330_14Gain
13LAB:IP330_15Gain
13LAB:IP330_16Gain
13LAB:IP330_1
13LAB:IP330_2
13LAB:IP330_3
13LAB:IP330_4
13LAB:IP330_5
13LAB:IP330_6
13LAB:IP330_7
13LAB:IP330_8
13LAB:IP330_9
13LAB:IP330_10
13LAB:IP330_11
13LAB:IP330_12
13LAB:IP330_13
13LAB:IP330_14
13LAB:IP330_15
13LAB:IP330_16
value = 0 = 0x0

These are all records from EPICS base: "ai", "ao", and "longout".

ioc13lab2> dbl "ai"
13LAB:IP330_1
13LAB:IP330_2
13LAB:IP330_3
13LAB:IP330_4
13LAB:IP330_5
13LAB:IP330_6
13LAB:IP330_7
13LAB:IP330_8
13LAB:IP330_9
13LAB:IP330_10
13LAB:IP330_11
13LAB:IP330_12
13LAB:IP330_13
13LAB:IP330_14
13LAB:IP330_15
13LAB:IP330_16
value = 0 = 0x0
ioc13lab2> dbl "ao"
13LAB:DAC1_1
13LAB:DAC1_2
13LAB:DAC1_3
13LAB:DAC1_4
13LAB:DAC1_5
13LAB:DAC1_6
13LAB:DAC1_7
13LAB:DAC1_8
value = 0 = 0x0
ioc13lab2> dbl "longout"
13LAB:IP330_1Gain
13LAB:IP330_2Gain
13LAB:IP330_3Gain
13LAB:IP330_4Gain
13LAB:IP330_5Gain
13LAB:IP330_6Gain
13LAB:IP330_7Gain
13LAB:IP330_8Gain
13LAB:IP330_9Gain
13LAB:IP330_10Gain
13LAB:IP330_11Gain
13LAB:IP330_12Gain
13LAB:IP330_13Gain
13LAB:IP330_14Gain
13LAB:IP330_15Gain
13LAB:IP330_16Gain
value = 0 = 0x0

coreRelease shows I am building with base 7.0.6:

ioc13lab2> coreRelease
############################################################################
## EPICS R7.0.6
## Rev. R7.0.6-dirty
############################################################################
value = 0 = 0x0

dbpr shows that the "AMSG" field is present, which shows I am building with the version of base with those fields.
ioc13lab2> dbpr "13LAB:IP330_1",5
ACKS: NO_ALARM
ACKT: YES
ADEL: 0
AFTC: 0
AFVL: 0
ALST: 0.37796597238117
AMSG:
AOFF: 0
ASG :
ASLO: 1
ASP : PTR 0x0
BKLNK: ELL 0 [0x0 .. 0x0]
BKPT: 00
DESC:
DISA: 0
DISP: 0
DISS: NO_ALARM
DISV: 1
DPVT: PTR 0x44bc2d0
DSET: PTR 0x42e7d2c
DTYP: asynInt32Average
EGU :
EGUF: 10
EGUL: -10
EOFF: -10
ESLO: 3.05180437933928e-04
EVNT:
FLNK: CONSTANT
HHSV: NO_ALARM
HIGH: 0
HIHI: 0
HOPR: 10
HSV : NO_ALARM
HYST: 0
INIT: 0
INP : INST_IO @asyn(Ip330_1 0)DATA
LALM: 0.37796597238117
LBRK: 0
LCNT: 0
LINR: LINEAR
LLSV: NO_ALARM
LOLO: 0
LOPR: -10
LOW : 0
LSET: PTR 0x44aa0a0
LSV : NO_ALARM
MDEL: 0
MLIS: ELL 12 [0x26f6e38 .. 0x26f58d0]
MLOK: 04 4a cb 90
MLST: 0.37796597238117
NAME: 13LAB:IP330_1
NAMSG:
NSEV: NO_ALARM
NSTA: NO_ALARM
OLDSIMM: NO
ORAW: 30778
PACT: 1
PBRK: PTR 0x0
PHAS: 0
PINI: NO
PPN : PTR 0x0
PPNR: PTR 0x0
PREC: 4
PRIO: LOW
PROC: 0
PUTF: 0
RDES: PTR 0x32bbbd0
ROFF: 0
RPRO: 0
RSET: PTR 0x42e86cc
RVAL: 34006
SCAN: .1 second
SDIS: CONSTANT
SDLY: -1
SEVR: NO_ALARM
SIML: CONSTANT
SIMM: NO
SIMPVT: PTR 0x0
SIMS: NO_ALARM
SIOL: CONSTANT
SMOO: 0
SPVT: PTR 0x26cb230
SSCN: <nil>
STAT: NO_ALARM
SVAL: 0
TIME: 2022-05-27 17:03:43.752404587
TPRO: 0
TSE : 0
TSEL: CONSTANT
UDF : 0
UDFS: INVALID
UTAG: 0
VAL : 0.37796597238117
value = 0 = 0x0
ioc13lab2>

The device support for these records is standard asyn device support. asyn is built with base 7.0.6:

corvette:~/devel>grep EPICS_BASE asyn-4-42/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6

The drivers for these devices are ipac, ip330, and dac128V. These are all built with base 7.0.6.
corvette:~/devel>grep EPICS_BASE ipac-2-16/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
corvette:~/devel>grep EPICS_BASE ip330-2-10/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
corvette:~/devel>grep EPICS_BASE dac128V-2-10-1/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6

The application is called CARS, and its configure/RELEASE contains these lines:

SUPPORT=/corvette/home/epics/devel
ASYN=$(SUPPORT)/asyn-4-42
CARS=$(SUPPORT)/CARS
DAC128V=$(SUPPORT)/dac128V-2-10-1
IPAC=$(SUPPORT)/ipac-2-16
IP330=$(SUPPORT)/ip330-2-10
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6

/corvette/home/epics/devel is the path to asyn, ipac, dac128V, ip330, and CARS. Before I build in that directory I run:

make clean
make realuninstall

It thus appears to me that all records in the IOC are built with base 7.0.6, the records are all part of base, and the device support and drivers are built with base 7.0.6.

However, I am getting the same errors:

ioc13lab2> VME Bus Error accessing A32: 0xbfe37a50
machine check
Exception next instruction address: 0x0368d24c
Machine Status Register: 0x0008b032
Condition Register: 0x42000884
Task: 0x48ec280 "CAS-event"
0x48ec280 (CAS-event): task 0x48ec280 has had a failure and has been stopped.
0x48ec280 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
VME Bus Error accessing A16: 0x347e
machine check
Exception next instruction address: 0x035ef230
Machine Status Register: 0x0008b032
Condition Register: 0x44004824
Task: 0x48e90c0 "CAS-client"
0x48e90c0 (CAS-client): task 0x48e90c0 has had a failure and has been stopped.
0x48e90c0 (CAS-client): The task has been terminated because it triggered an exception that raised the signal 10.

If I comment out those 2 new fields in base, the errors don’t occur. What could possibly cause this?

Mark

-----Original Message-----
On 5/27/22 06:17, Mark Rivers wrote:
> Hi Michael,
>
> I commented out all lines in recGbl.c that reference the new amsg and namsg fields. That did not fix the problem.
>
> I then commented out the AMSG and NAMSG fields in dbCommon.dbd.pod. That did fix the problem.
>
> It is starting to seem like something in synApps is not working with these new fields? I have done the following before each build at the top level of synApps:
>
> make clean
> make realuninstall
>
> So it should have rebuilt everything.

Maybe what you are building isn't what is being loaded?

> I don't think my IOC uses any of the records that synApps adds except the asynRecord in asyn. I will strip down the application to only the modules actually needed, doing it one at a time, and see if I can figure out what the offending module is.
>
> Thanks,
> Mark
>
> -----Original Message-----
> From: Michael Davidsaver <mdavidsaver at gmail.com>
> Sent: Friday, May 27, 2022 2:29 AM
> To: Mark Rivers <rivers at cars.uchicago.edu>
> Cc: tech-talk at aps.anl.gov
> Subject: Re: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
>
> On 5/26/22 20:47, Mark Rivers wrote:
>> Hi Michael,
>>
>> Thanks for the instructions on git bisect. It seems to have worked. I only had to do 1 skip because base gave an error building on vxWorks; all of the rest built OK and clearly failed (in just a few seconds) or clearly worked (ran for 10+ minutes).
>>
>> Here is the final result:
>>
>> corvette:local/epics-devel/base-7.0.6>git bisect bad
>> 892a361de7002bf2b3f375f24bc5bf61858de2e5 is the first bad commit
>> commit 892a361de7002bf2b3f375f24bc5bf61858de2e5
>> Author: Michael Davidsaver <mdavidsaver at gmail.com>
>> Date:   Mon Mar 30 13:56:13 2020 -0700
>>
>>     add alarm message field
>>
>> :040000 040000 e339a59018c9e2f28c3b4efd555b675a803e84f7 a7fd036f53c20f6e7fd673d7191ffb3b28770a79 M      modules
>
> https://github.com/epics-base/epics-base/commit/892a361de7002bf2b3f375f24bc5bf61858de2e5
>
> Well, that points the finger squarely at me...
>
> From the earlier references to dbEvent.c, the obvious place to start is to remove the db_post_events() call added by this commit.
>
>> diff --git a/modules/database/src/ioc/db/recGbl.c b/modules/database/src/ioc/db/recGbl.c
>> index 95387f5de..971716fa5 100644
>> --- a/modules/database/src/ioc/db/recGbl.c
>> +++ b/modules/database/src/ioc/db/recGbl.c
>> @@ -204,7 +204,7 @@ unsigned short recGblResetAlarms(void *precord)
>>      }
>>      if (stat_mask) {
>>          db_post_events(pdbc, &pdbc->stat, stat_mask);
>> -        db_post_events(pdbc, &pdbc->amsg, stat_mask);
>> +        //db_post_events(pdbc, &pdbc->amsg, stat_mask);
>>          val_mask = DBE_ALARM;
>>
>>          if (!pdbc->ackt || new_sevr >= pdbc->acks) {
>
>
>
>> Mark
>>
>>
>> -----Original Message-----
>> From: Michael Davidsaver <mdavidsaver at gmail.com>
>> Sent: Thursday, May 26, 2022 5:23 PM
>> To: Mark Rivers <rivers at cars.uchicago.edu>
>> Cc: tech-talk at aps.anl.gov
>> Subject: Re: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
>>
>> On 5/26/22 14:48, Mark Rivers wrote:
>>> Other ideas of how to debug it?
>>
>> If we accept that the stack trace is correct, then memory corruption at some earlier point seems likely. It would be interesting to inspect the stack of the faulting thread. However, I don't know if this is possible with your setup, and wouldn't be able to talk you through it anyway.
>>
>> At this point, maybe bite the bullet and go for git-bisect? This will likely take some time to work through, although perhaps less than what has already been spent.
>>
>> According to git 2.30.2 on my laptop, bisecting 7.0.5 through 7.0.6.1 would take 7 iterations assuming no skips (e.g. build failures).
>>
>>
>> Begin with:
>>
>>> git bisect start R7.0.6.1 R7.0.5
>>
>> This will update your checkout to a revision for testing. Build and run. If it doesn't build, or the result is otherwise inconclusive, then 'skip':
>>
>>> git bisect skip
>>
>> If the problem appears:
>>
>>> git bisect bad
>>
>> If the problem does not appear:
>>
>>> git bisect good
>>
>>
>> Each time the checkout will be updated to another revision. Repeat until complete. Ideally this will narrow down to a single revision.
>> If there are skips, it will give a range of revisions (hopefully not including a 3.15 merge commit...).
>