I am now using the unmodified 7.0.6 release.
My IOC only contains these records:
ioc13lab2> dbl
13LAB:DAC1_1
13LAB:DAC1_2
13LAB:DAC1_3
13LAB:DAC1_4
13LAB:DAC1_5
13LAB:DAC1_6
13LAB:DAC1_7
13LAB:DAC1_8
13LAB:IP330_1Gain
13LAB:IP330_2Gain
13LAB:IP330_3Gain
13LAB:IP330_4Gain
13LAB:IP330_5Gain
13LAB:IP330_6Gain
13LAB:IP330_7Gain
13LAB:IP330_8Gain
13LAB:IP330_9Gain
13LAB:IP330_10Gain
13LAB:IP330_11Gain
13LAB:IP330_12Gain
13LAB:IP330_13Gain
13LAB:IP330_14Gain
13LAB:IP330_15Gain
13LAB:IP330_16Gain
13LAB:IP330_1
13LAB:IP330_2
13LAB:IP330_3
13LAB:IP330_4
13LAB:IP330_5
13LAB:IP330_6
13LAB:IP330_7
13LAB:IP330_8
13LAB:IP330_9
13LAB:IP330_10
13LAB:IP330_11
13LAB:IP330_12
13LAB:IP330_13
13LAB:IP330_14
13LAB:IP330_15
13LAB:IP330_16
value = 0 = 0x0
These are all record types from EPICS base: "ai", "ao", and "longout".
ioc13lab2> dbl "ai"
13LAB:IP330_1
13LAB:IP330_2
13LAB:IP330_3
13LAB:IP330_4
13LAB:IP330_5
13LAB:IP330_6
13LAB:IP330_7
13LAB:IP330_8
13LAB:IP330_9
13LAB:IP330_10
13LAB:IP330_11
13LAB:IP330_12
13LAB:IP330_13
13LAB:IP330_14
13LAB:IP330_15
13LAB:IP330_16
value = 0 = 0x0
ioc13lab2> dbl "ao"
13LAB:DAC1_1
13LAB:DAC1_2
13LAB:DAC1_3
13LAB:DAC1_4
13LAB:DAC1_5
13LAB:DAC1_6
13LAB:DAC1_7
13LAB:DAC1_8
value = 0 = 0x0
ioc13lab2> dbl "longout"
13LAB:IP330_1Gain
13LAB:IP330_2Gain
13LAB:IP330_3Gain
13LAB:IP330_4Gain
13LAB:IP330_5Gain
13LAB:IP330_6Gain
13LAB:IP330_7Gain
13LAB:IP330_8Gain
13LAB:IP330_9Gain
13LAB:IP330_10Gain
13LAB:IP330_11Gain
13LAB:IP330_12Gain
13LAB:IP330_13Gain
13LAB:IP330_14Gain
13LAB:IP330_15Gain
13LAB:IP330_16Gain
value = 0 = 0x0
coreRelease shows I am building with base 7.0.6:
ioc13lab2> coreRelease
############################################################################
## EPICS R7.0.6
## Rev. R7.0.6-dirty
############################################################################
value = 0 = 0x0
dbpr shows that the "AMSG" and "NAMSG" fields are present, which confirms that I am building with a version of base that includes those fields.
ioc13lab2> dbpr "13LAB:IP330_1",5
ACKS: NO_ALARM ACKT: YES ADEL: 0 AFTC: 0
AFVL: 0 ALST: 0.37796597238117 AMSG:
AOFF: 0 ASG : ASLO: 1 ASP : PTR 0x0
BKLNK: ELL 0 [0x0 .. 0x0] BKPT: 00 DESC:
DISA: 0 DISP: 0 DISS: NO_ALARM DISV: 1
DPVT: PTR 0x44bc2d0 DSET: PTR 0x42e7d2c DTYP: asynInt32Average
EGU : EGUF: 10 EGUL: -10 EOFF: -10
ESLO: 3.05180437933928e-04 EVNT: FLNK: CONSTANT
HHSV: NO_ALARM HIGH: 0 HIHI: 0 HOPR: 10
HSV : NO_ALARM HYST: 0 INIT: 0
INP : INST_IO @asyn(Ip330_1 0)DATA LALM: 0.37796597238117
LBRK: 0 LCNT: 0 LINR: LINEAR LLSV: NO_ALARM
LOLO: 0 LOPR: -10 LOW : 0 LSET: PTR 0x44aa0a0
LSV : NO_ALARM MDEL: 0 MLIS: ELL 12 [0x26f6e38 .. 0x26f58d0]
MLOK: 04 4a cb 90 MLST: 0.37796597238117 NAME: 13LAB:IP330_1
NAMSG: NSEV: NO_ALARM NSTA: NO_ALARM OLDSIMM: NO
ORAW: 30778 PACT: 1 PBRK: PTR 0x0 PHAS: 0
PINI: NO PPN : PTR 0x0 PPNR: PTR 0x0 PREC: 4
PRIO: LOW PROC: 0 PUTF: 0 RDES: PTR 0x32bbbd0
ROFF: 0 RPRO: 0 RSET: PTR 0x42e86cc RVAL: 34006
SCAN: .1 second SDIS: CONSTANT SDLY: -1 SEVR: NO_ALARM
SIML: CONSTANT SIMM: NO SIMPVT: PTR 0x0 SIMS: NO_ALARM
SIOL: CONSTANT SMOO: 0 SPVT: PTR 0x26cb230 SSCN: <nil>
STAT: NO_ALARM SVAL: 0 TIME: 2022-05-27 17:03:43.752404587
TPRO: 0 TSE : 0 TSEL: CONSTANT UDF : 0
UDFS: INVALID UTAG: 0 VAL : 0.37796597238117
value = 0 = 0x0
ioc13lab2>
The device support for these records is standard asyn device support. asyn is built with base 7.0.6.
corvette:~/devel>grep EPICS_BASE asyn-4-42/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
The drivers for these devices are ipac, ip330, and dac128V. These are all built with base 7.0.6.
corvette:~/devel>grep EPICS_BASE ipac-2-16/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
corvette:~/devel>grep EPICS_BASE ip330-2-10/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
corvette:~/devel>grep EPICS_BASE dac128V-2-10-1/configure/RELEASE
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
The application is called CARS, and its configure/RELEASE contains these lines:
SUPPORT=/corvette/home/epics/devel
ASYN=$(SUPPORT)/asyn-4-42
CARS=$(SUPPORT)/CARS
DAC128V=$(SUPPORT)/dac128V-2-10-1
IPAC=$(SUPPORT)/ipac-2-16
IP330=$(SUPPORT)/ip330-2-10
EPICS_BASE=/corvette/usr/local/epics-devel/base-7.0.6
/corvette/home/epics/devel is the path to asyn, ipac, dac128V, ip330, and CARS.
Before I build in that directory I run
make clean
make realuninstall
It thus appears to me that all records in the IOC are built with base 7.0.6, the records are all part of base, and the device support and drivers are built with base 7.0.6.
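The per-module grep checks above can also be done in one pass. This is only a rough sketch: check_epics_base is a made-up helper name (not part of EPICS), it assumes every module sits directly under one support directory, and it only reads configure/RELEASE itself, so a value set through an included RELEASE.local or similar file would not be seen.

```shell
#!/bin/sh
# Sketch: print the EPICS_BASE setting found in each module's
# configure/RELEASE and flag any that differ from the expected path.
# check_epics_base is a hypothetical helper, not an EPICS tool.
check_epics_base() {
    support_dir=$1
    expected=$2
    rc=0
    for release in "$support_dir"/*/configure/RELEASE; do
        # Skip the unexpanded glob if no module matches.
        [ -f "$release" ] || continue
        # Use the last uncommented EPICS_BASE assignment in the file.
        base=$(sed -n 's/^[[:space:]]*EPICS_BASE[[:space:]]*=[[:space:]]*//p' "$release" | tail -n 1)
        if [ "$base" = "$expected" ]; then
            echo "OK       $release"
        else
            echo "MISMATCH $release: EPICS_BASE=$base"
            rc=1
        fi
    done
    return $rc
}
```

With the paths from this message it would be called as
check_epics_base /corvette/home/epics/devel /corvette/usr/local/epics-devel/base-7.0.6
and it returns non-zero if any module points somewhere else.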
However, I am getting the same errors:
ioc13lab2>
VME Bus Error accessing A32: 0xbfe37a50
machine check
Exception next instruction address: 0x0368d24c
Machine Status Register: 0x0008b032
Condition Register: 0x42000884
Task: 0x48ec280 "CAS-event"
0x48ec280 (CAS-event): task 0x48ec280 has had a failure and has been stopped.
0x48ec280 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
VME Bus Error accessing A16: 0x347e
machine check
Exception next instruction address: 0x035ef230
Machine Status Register: 0x0008b032
Condition Register: 0x44004824
Task: 0x48e90c0 "CAS-client"
0x48e90c0 (CAS-client): task 0x48e90c0 has had a failure and has been stopped.
0x48e90c0 (CAS-client): The task has been terminated because it triggered an exception that raised the signal 10.
If I comment out those 2 new fields in base the errors don’t occur.
What could possibly cause this?
Mark
-----Original Message-----
From: Michael Davidsaver <mdavidsaver at gmail.com>
Sent: Friday, May 27, 2022 8:37 AM
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: tech-talk at aps.anl.gov
Subject: Re: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
On 5/27/22 06:17, Mark Rivers wrote:
> Hi Michael,
>
> I commented out all lines in recGbl.c that reference the new amsg and namsg fields. That did not fix the problem.
>
> I then commented out the AMSG and NAMSG fields in dbCommon.dbd.pod. That did fix the problem.
>
> It is starting to seem like something in synApps is not working with
> these new fields? I have done the following before each build at the
> top-level of synApps
>
> make clean
> make realuninstall
>
> So it should have rebuilt everything.
Maybe what you are building isn't what is being loaded?
> I don't think my IOC uses any of the records that synApps adds except the asynRecord in asyn. I will strip down the application to only the modules actually needed, doing it one at a time, and see if I can figure out what the offending module is.
>
> Thanks,
> Mark
>
> -----Original Message-----
> From: Michael Davidsaver <mdavidsaver at gmail.com>
> Sent: Friday, May 27, 2022 2:29 AM
> To: Mark Rivers <rivers at cars.uchicago.edu>
> Cc: tech-talk at aps.anl.gov
> Subject: Re: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
>
> On 5/26/22 20:47, Mark Rivers wrote:
>> Hi Michael,
>>
>> Thanks for the instructions on git bisect. It seems to have worked. I only had to do 1 skip because base gave an error building on vxWorks; all of the rest built OK and clearly failed (in just a few seconds) or clearly worked (ran for 10+ minutes).
>>
>> Here is the final result:
>>
>> corvette:local/epics-devel/base-7.0.6>git bisect bad
>> 892a361de7002bf2b3f375f24bc5bf61858de2e5 is the first bad commit
>> commit 892a361de7002bf2b3f375f24bc5bf61858de2e5
>> Author: Michael Davidsaver <mdavidsaver at gmail.com>
>> Date: Mon Mar 30 13:56:13 2020 -0700
>>
>> add alarm message field
>>
>> :040000 040000 e339a59018c9e2f28c3b4efd555b675a803e84f7 a7fd036f53c20f6e7fd673d7191ffb3b28770a79 M modules
>
>
> https://github.com/epics-base/epics-base/commit/892a361de7002bf2b3f375f24bc5bf61858de2e5
>
> Well, that points the finger squarely at me...
>
> From the earlier references to dbEvent.c, the obvious place to start is to see what happens if you remove the db_post_events() call added by this commit.
>
>> diff --git a/modules/database/src/ioc/db/recGbl.c
>> b/modules/database/src/ioc/db/recGbl.c
>> index 95387f5de..971716fa5 100644
>> --- a/modules/database/src/ioc/db/recGbl.c
>> +++ b/modules/database/src/ioc/db/recGbl.c
>> @@ -204,7 +204,7 @@ unsigned short recGblResetAlarms(void *precord)
>> }
>> if (stat_mask) {
>> db_post_events(pdbc, &pdbc->stat, stat_mask);
>> - db_post_events(pdbc, &pdbc->amsg, stat_mask);
>> + //db_post_events(pdbc, &pdbc->amsg, stat_mask);
>> val_mask = DBE_ALARM;
>>
>> if (!pdbc->ackt || new_sevr >= pdbc->acks) {
>
>
>
>> Mark
>>
>>
>> -----Original Message-----
>> From: Michael Davidsaver <mdavidsaver at gmail.com>
>> Sent: Thursday, May 26, 2022 5:23 PM
>> To: Mark Rivers <rivers at cars.uchicago.edu>
>> Cc: tech-talk at aps.anl.gov
>> Subject: Re: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
>>
>> On 5/26/22 14:48, Mark Rivers wrote:
>>> Other ideas of how to debug it?
>>
>> If we accept that the stack trace is correct, then memory corruption at some earlier point seems likely. It would be interesting to inspect the stack of the faulting thread. However, I don't know if this is possible with your setup, and wouldn't be able to talk you through it anyway.
>>
>> At this point, maybe bite the bullet and go for git-bisect? This will likely take some time to work through, although perhaps less than what has already been spent.
>>
>> According to git 2.30.2 on my laptop, bisecting 7.0.5 through 7.0.6.1 would take 7 iterations assuming no skips (e.g. build failures).
>>
>>
>> Begin with:
>>
>>> git bisect start R7.0.6.1 R7.0.5
>>
>> This will update your checkout to a revision for testing. Build and run. If it doesn't build, or the result is otherwise inconclusive, then 'skip':
>>
>>> git bisect skip
>>
>> If the problem appears:
>>
>>> git bisect bad
>>
>> If the problem does not appear:
>>
>>> git bisect good
>>
>>
>> Each time the checkout will be updated to another revision. Repeat until complete. Ideally this will narrow down to a single revision.
>> If there are skips, it will give a range of revisions (hopefully not including a 3.15 merge commit...).
>
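On the iteration count quoted above: a bisect is a binary search, so the number of build-and-test rounds grows roughly with log2 of the number of candidate commits. The sketch below is only that arithmetic. bisect_steps is a made-up helper name, and git's own estimate can differ from it because the history contains merges and because skips add rounds.

```shell
#!/bin/sh
# Sketch: number of halvings needed to narrow N candidate commits to one.
# bisect_steps is a hypothetical helper, not a git command.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        # Each test round keeps about half of the remaining candidates.
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# In an epics-base checkout that has both tags, the candidate count
# can be taken from git itself:
#   bisect_steps "$(git rev-list --count R7.0.5..R7.0.6.1)"
```

For example, bisect_steps 128 prints 7.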