The failure is very sensitive to minor changes to the source code. If I add the following simple check of the value of pfl->u.r.dtor, there are no bus errors in over 2500 seconds:
diff --git a/modules/database/src/ioc/db/dbEvent.c b/modules/database/src/ioc/db/dbEvent.c
index 9fcc1ee..7b9970d 100644
--- a/modules/database/src/ioc/db/dbEvent.c
+++ b/modules/database/src/ioc/db/dbEvent.c
@@ -1199,7 +1199,12 @@ void db_delete_field_log (db_field_log *pfl)
 {
     if (pfl) {
         /* Free field if reference type field log and dtor is set */
-        if (pfl->type == dbfl_type_ref && pfl->u.r.dtor) pfl->u.r.dtor(pfl);
+        if (pfl->type == dbfl_type_ref && pfl->u.r.dtor) {
+            if ((pfl->u.r.dtor >= (dbfl_freeFunc *)0x80000000) && (pfl->u.r.dtor <= (dbfl_freeFunc *)0xBFFFFFFF)) {
+                printf("db_delete_field_log pfl->u.r.dtor is in A32 address space %p\n", pfl->u.r.dtor);
+            }
+            pfl->u.r.dtor(pfl);
+        }
         /* Free the field log chunk */
         freeListFree(dbevFieldLogFreeList, pfl);
     }
If I then comment out those lines, as shown here, I get a bus error in less than 20 seconds:
corvette:local/epics-devel/base-7.0.6>git diff modules
diff --git a/modules/database/src/ioc/db/dbEvent.c b/modules/database/src/ioc/db/dbEvent.c
index 9fcc1ee..6addadc 100644
--- a/modules/database/src/ioc/db/dbEvent.c
+++ b/modules/database/src/ioc/db/dbEvent.c
@@ -1199,7 +1199,12 @@ void db_delete_field_log (db_field_log *pfl)
 {
     if (pfl) {
         /* Free field if reference type field log and dtor is set */
-        if (pfl->type == dbfl_type_ref && pfl->u.r.dtor) pfl->u.r.dtor(pfl);
+        if (pfl->type == dbfl_type_ref && pfl->u.r.dtor) {
+//            if ((pfl->u.r.dtor >= (dbfl_freeFunc *)0x80000000) && (pfl->u.r.dtor <= (dbfl_freeFunc *)0xBFFFFFFF)) {
+//                printf("db_delete_field_log pfl->u.r.dtor is in A32 address space %p\n", pfl->u.r.dtor);
+//            }
+            pfl->u.r.dtor(pfl);
+        }
         /* Free the field log chunk */
         freeListFree(dbevFieldLogFreeList, pfl);
     }
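The range test in the diffs above compares function pointers directly against integer constants cast to pointers. As far as I know, ISO C defines the relational operators only for pointers to object types, so that form relies on a compiler extension. A minimal sketch of the same test done on integers follows. The helper name addr_in_a32_window and the two macros are hypothetical, and the window bounds are simply the ones used in the diff, not something taken from the board configuration.

```c
#include <stdint.h>

/* Assumed A32 window: these bounds are copied from the check in the diff,
 * not read from the BSP or the VME bridge configuration. */
#define A32_WINDOW_START 0x80000000u
#define A32_WINDOW_END   0xBFFFFFFFu

/* Hypothetical helper: returns 1 if addr lies inside the assumed A32
 * window (both bounds inclusive), 0 otherwise. */
static int addr_in_a32_window(uintptr_t addr)
{
    return addr >= A32_WINDOW_START && addr <= A32_WINDOW_END;
}
```

Inside db_delete_field_log this could be called as addr_in_a32_window((uintptr_t)pfl->u.r.dtor). Converting a function pointer to uintptr_t is implementation-defined, but it is the usual way to inspect such a value as an address.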
Mark
From: Hartman, Steven <hartmansm at ornl.gov>
Sent: Thursday, June 9, 2022 12:56 PM
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: Till Straumann <till.straumann at psi.ch>; tech-talk at aps.anl.gov
Subject: Re: [EXTERNAL] Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
On Jun 9, 2022, at 12:05 PM, Mark Rivers via Tech-talk <tech-talk at aps.anl.gov> wrote:
I am now convinced that the A32 addresses which generate VME bus errors are in fact the double values that are written to the ao records.
And this number shows up in the floating point register for the terminated task as FR0 . . .
VME Bus Error accessing A32: 0xbfe26ed0
...
fr0 = -0.576028
0xbfe26ed000000000 = -5.7602691650390620e-1
VME Bus Error accessing A32: 0xbffffffc
...
fr0 = -2
0xbffffffc00000000 = -1.9999961853027344
The big question remains where this value is being treated as an address.
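The correspondence between the faulting addresses and the register values can be checked with a few lines of C. This is only a sketch, assuming a 64-bit IEEE-754 double. The function names high_word_of_double and double_from_high_word are hypothetical and are not part of EPICS base.

```c
#include <stdint.h>
#include <string.h>

/* Return the upper 32 bits of the bit pattern of d.
 * Assumes double is a 64-bit IEEE-754 value. */
static uint32_t high_word_of_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* copy the bytes, no aliasing tricks */
    return (uint32_t)(bits >> 32);
}

/* Build the double whose upper 32 bits are `high` and whose lower
 * 32 bits are zero, e.g. 0xbfe26ed0 -> 0xbfe26ed000000000. */
static double double_from_high_word(uint32_t high)
{
    uint64_t bits = (uint64_t)high << 32;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Passing 0xbfe26ed0 and 0xbffffffc to double_from_high_word gives the two doubles listed above. The high word holds the sign bit and the 11-bit exponent, so every negative double with magnitude below 2 has a high word between 0x80000000 and 0xBFFFFFFF, which is exactly the range tested in the db_delete_field_log check. The fr0 = -2 shown for the second error is presumably a rounded display of -1.9999961853027344, since exactly -2.0 would have the high word 0xC0000000.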