EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: [EXTERNAL] Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
From: Mark Rivers via Tech-talk <tech-talk at aps.anl.gov>
To: "Hartman, Steven" <hartmansm at ornl.gov>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Wed, 8 Jun 2022 18:57:06 +0000

Yesterday I worked to strip the tests down to the minimum required to generate the failure.

 

- I removed the TVME200 Industry Pack carrier card from the VME crate.  The only hardware in the crate is the MVME5100 CPU card.

- I tested with base 7.0.6.1 so that it is easy for others to reproduce.

- I used only the ao and ai records from EPICS base with soft device support.

 

When I built the example application with asyn, ipac, ip330, and dac128v, it failed in 3 tests after [71, 97, 55] seconds.  I got this error:

 

VME Bus Error accessing A32: 0xbffffffc

machine check

Exception next instruction address: 0x0329b200

Machine Status Register: 0x0008b032

Condition Register: 0x48000844

Task: 0x34d0d90 "CAC-event"

0x34d0d90 (CAC-event): task 0x34d0d90 has had a failure and has been stopped.

0x34d0d90 (CAC-event): The task has been terminated because it triggered an exception that raised the signal 10.

 

This is the stack trace:

 

iocexample> tt 0x34d0d90

0x0012489c vxTaskEntry  +0x48 : epicsThreadEntry ()

0x0333d704 epicsThreadEntry+0x80 : 0x0329ba28 ()

0x0329bd00 db_start_events+0x458: db_delete_field_log ()

value = 0 = 0x0

 

This is the register dump:

 

iocexample> ti  0x34d0d90

 

  NAME         ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY

----------  ------------ -------- --- ---------- -------- -------- ------- -----

CAC-event   epicsThread>  34d0d90 148 STOP        329b200  34d0ca0       0     0

 

full task name : CAC-event

task entry     : epicsThreadEntry

process        : kernel

options        : 0x1009001

VX_SUPERVISOR_MODE  VX_DEALLOC_TCB      VX_FP_TASK          VX_DEALLOC_EXC_STACK

 

STACK      BASE     END       SP      SIZE    HIGH   MARGIN

--------- -------- -------- -------- ------- ------- -------

execution  34d0d90  34cdeb0  34d0ca0   12000    2784    9216

exception  26f8230  26f73c0             3696    1344    2352

 

VxWorks Events

--------------

Events Pended on    : Not Pended

Received Events     : 0x0

Options             : N/A

 

r0         = 0x40050000   sp         = 0x034d0ca0   r2         = 0x002f1ac0

r3         = 0x0346af70   r4         = 0x03506360   r5         = 0x00000001

r6         = 0x00000001   r7         = 0x0343f460   r8         = 0x000a0008

r9         = 0x033b0000   r10        = 0x00000001   r11        = 0x034d0c10

r12        = 0x28000844   r13        = 0x00321e60   r14        = 0x0333caa0

r15        = 0x0336d31c   r16        = 0x032834c8   r17        = 0x0336d2d4

r18        = 0x00000000   r19        = 0x0333ab44   r20        = 0x0336d294

r21        = 0x0329b224   r22        = 0x00000001   r23        = 0x0329b1bc

r24        = 0x033b0b0c   r25        = 0x03334040   r26        = 0x033342a0

r27        = 0x034cd9d8   r28        = 0x032ac2c0   r29        = 0x034d6868

r30        = 0x03506360   r31        = 0x03506360   msr        = 0x0008b032

lr         = 0x0329bd00   ctr        = 0x0329b1bc   pc         = 0x0329b200

cr         = 0x48000844   xer        = 0x00000000   pgTblPtr   = 0x02310000

scSrTblPtr = 0x0230fe74   srTblPtr   = 0x0230fe74

 

fpcsr  =        0

fr0    =       -2   fr1    =      NaN   fr2    =      NaN   fr3    =      NaN

fr4    =      NaN   fr5    =      NaN   fr6    =      NaN   fr7    =      NaN

fr8    =      NaN   fr9    =      NaN   fr10   =      NaN   fr11   =      NaN

fr12   =      NaN   fr13   =      NaN   fr14   =      NaN   fr15   =      NaN

fr16   =      NaN   fr17   =      NaN   fr18   =      NaN   fr19   =      NaN

fr20   =      NaN   fr21   =      NaN   fr22   =      NaN   fr23   =      NaN

fr24   =      NaN   fr25   =      NaN   fr26   =      NaN   fr27   =      NaN

fr28   =      NaN   fr29   =      NaN   fr30   =      NaN   fr31   =      NaN

 

machine check

Exception next instruction address: 0x0329b200

Machine Status Register: 0x0008b032

Condition Register: 0x48000844

value = 0 = 0x0

 

This is the disassembly around the exception next instruction address (0x329b200, shown in red in the original message):

 

                        db_delete_field_log:

0x329b1bc  9421fff0    stwu        r1,-16(r1)

0x329b1c0  7c0802a6    mfspr       r0,LR

0x329b1c4  93e1000c    stw         r31,12(r1)

0x329b1c8  7c7f1b79    or.         r31,r3,r3

0x329b1cc  90010014    stw         r0,20(r1)

0x329b1d0  41820040    bc          0xc,2, 0x329b210 # 0x0329b210

0x329b1d4  801f0000    lwz         r0,0(r31)

0x329b1d8  2f800000    cmpi        crf7,0,r0,0x0 # 0

0x329b1dc  409c0018    bc          0x4,28, 0x329b1f4 # 0x0329b1f4

0x329b1e0  801f0050    lwz         r0,80(r31)

0x329b1e4  2f800000    cmpi        crf7,0,r0,0x0 # 0

0x329b1e8  419e000c    bc          0xc,30, 0x329b1f4 # 0x0329b1f4

0x329b1ec  7c0903a6    mtspr       CTR,r0

0x329b1f0  4e800421    bcctrl      0x14,0

0x329b1f4  3d20033b    lis         r9,0x33b # 827

0x329b1f8  7fe4fb78    or          r4,r31,r31

0x329b1fc  80690b08    lwz         r3,2824(r9)

0x329b200  3d200332    lis         r9,0x332 # 818

0x329b204  392914c0    addi        r9,r9,0x14c0 # 5312

0x329b208  7d2903a6    mtspr       CTR,r9

0x329b20c  4e800421    bcctrl      0x14,0

0x329b210  80010014    lwz         r0,20(r1)

0x329b214  83e1000c    lwz         r31,12(r1)

0x329b218  38210010    addi        r1,r1,0x10 # 16

0x329b21c  7c0803a6    mtspr       LR,r0

0x329b220  4e800020    blr
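
As a reading aid for the listing above: PowerPC builds a 32-bit address from two 16-bit immediates, with `lis` setting the upper half and the displacement of the following `addi` or `lwz` being sign-extended and added. The sketch below is Python run on a workstation, not IOC code, and the helper names are made up for illustration. It works out the two addresses this function forms around the reported PC.

```python
def sign_extend16(value):
    """Interpret a 16-bit immediate as a signed integer, as addi/lwz do."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def lis_plus(upper, lower):
    """Address formed by 'lis rX,upper' followed by a signed 16-bit offset."""
    return ((upper << 16) + sign_extend16(lower)) & 0xFFFFFFFF

# 0x329b1f4 lis r9,0x33b ; 0x329b1fc lwz r3,2824(r9)
load_addr = lis_plus(0x33B, 2824)
# 0x329b200 lis r9,0x332 ; 0x329b204 addi r9,r9,0x14c0
call_target = lis_plus(0x332, 0x14C0)

print(f"lwz source address: 0x{load_addr:08x}")
print(f"bcctrl target:      0x{call_target:08x}")
```

By this arithmetic the load at 0x329b1fc reads from 0x033b0b08 and the call at 0x329b20c goes to 0x033214c0. The register dump shows r9 = 0x033b0000, which is consistent with the `lis r9,0x332` at the reported PC not having executed yet.
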

 

I then changed the application configure/RELEASE to comment out asyn, ipac, ip330, and dac128v, so the application is built without those.  The vxWorks link command is the following, showing that it is built only with EPICS base.

 

/usr/local/vw/vxWorks-6.9/gnu/4.3.3-vxworks-6.9/x86-linux2/bin/ldppc -r -o example  -L/corvette/usr/local/epics-devel/base-7.0.6/lib/vxWorks-ppc32/                   example_registerRecordDeviceDriver.o    -ldbRecStd -ldbCore -lca -lCom

 

When built that way it failed in 3 tests after [9300, 1200, 12400] seconds, so the MTBF is about 100 times longer than when it is built with asyn, ipac, etc.
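
The factor of about 100 can be checked from the six failure times quoted in this message. This is a minimal sketch that takes the MTBF to be the simple mean of the three runs in each configuration (an assumption, since the message does not say how it was estimated); the variable names are illustrative.

```python
from statistics import mean

# Seconds until failure in the three runs of each build (from the message)
with_modules = [71, 97, 55]        # built with asyn, ipac, ip330, dac128v
base_only = [9300, 1200, 12400]    # built with EPICS base only

mtbf_with_modules = mean(with_modules)
mtbf_base_only = mean(base_only)
ratio = mtbf_base_only / mtbf_with_modules

print(f"MTBF with modules: {mtbf_with_modules:.1f} s")
print(f"MTBF base only:    {mtbf_base_only:.1f} s")
print(f"ratio:             {ratio:.0f}x")
```

With only three samples per build and a spread from 1200 to 12400 seconds, this is a rough estimate rather than a precise figure.
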

 

I then went back to testing commit g85822f305.  I also changed to testing using the Ipac hardware, because that makes it fail more quickly.

 

This is the first failure, after 1070 seconds:

 

VME Bus Error accessing A16: 0x347e

machine check

Exception next instruction address: 0x03326a90

Machine Status Register: 0x0008b032

Condition Register: 0x42000884

Task: 0x3593ed0 "CAS-event"

0x3593ed0 (CAS-event): task 0x3593ed0 has had a failure and has been stopped.

0x3593ed0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.

 

Stack trace:

 

iocexample> tt

0x0012489c vxTaskEntry  +0x48 : epicsThreadEntry ()

0x0334501c epicsThreadEntry+0x80 : 0x032a17f0 ()

0x032a1acc db_start_events+0x450: db_delete_field_log ()

0x032a0fe4 db_delete_field_log+0x54 : freeListFree ()

value = 0 = 0x0

 

Register dump:

 

iocexample> ti

 

  NAME         ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY

----------  ------------ -------- --- ---------- -------- -------- ------- -----

CAS-event   epicsThread>  3593ed0 133 STOP+I      3326a90  3593de0       0     0

 

full task name : CAS-event

task entry     : epicsThreadEntry

process        : kernel

options        : 0x1009001

VX_SUPERVISOR_MODE  VX_DEALLOC_TCB      VX_FP_TASK          VX_DEALLOC_EXC_STACK

 

STACK      BASE     END       SP      SIZE    HIGH   MARGIN

--------- -------- -------- -------- ------- ------- -------

execution  3593ed0  3590ff0  3593de0   12000    2800    9200

exception  3595020  35941b0             3696    1344    2352

 

VxWorks Events

--------------

Events Pended on    : Not Pended

Received Events     : 0x0

Options             : N/A

 

r0         = 0x40000000   sp         = 0x03593de0   r2         = 0x002f1ac0

r3         = 0x03492c80   r4         = 0x0357f318   r5         = 0x00000001

r6         = 0x00000001   r7         = 0x00000000   r8         = 0x00140001

r9         = 0x03326a90   r10        = 0x00000000   r11        = 0x03593d50

r12        = 0x20000884   r13        = 0x00321e60   r14        = 0x033b6098

r15        = 0x03373c64   r16        = 0x03289478   r17        = 0x033424f0

r18        = 0x03373bdc   r19        = 0x03373c1c   r20        = 0x032a0ff8

r21        = 0x032a0f90   r22        = 0x00000000   r23        = 0x00000001

r24        = 0x033b609c   r25        = 0x03339340   r26        = 0x033390e0

r27        = 0x03443170   r28        = 0x03443170   r29        = 0x03569e18

r30        = 0x032dad44   r31        = 0x0357f318   msr        = 0x0008b032

lr         = 0x032a0fe4   ctr        = 0x03326a90   pc         = 0x03326a90

cr         = 0x42000884   xer        = 0x00000000   pgTblPtr   = 0x02310000

scSrTblPtr = 0x0230fe74   srTblPtr   = 0x0230fe74

 

fpcsr  =        0

fr0    =  -1.1635   fr1    =      NaN   fr2    =      NaN   fr3    =      NaN

fr4    =      NaN   fr5    =      NaN   fr6    =      NaN   fr7    =      NaN

fr8    =      NaN   fr9    =      NaN   fr10   =      NaN   fr11   =      NaN

fr12   =      NaN   fr13   =      NaN   fr14   =      NaN   fr15   =      NaN

fr16   =      NaN   fr17   =      NaN   fr18   =      NaN   fr19   =      NaN

fr20   =      NaN   fr21   =      NaN   fr22   =      NaN   fr23   =      NaN

fr24   =      NaN   fr25   =      NaN   fr26   =      NaN   fr27   =      NaN

fr28   =      NaN   fr29   =      NaN   fr30   =      NaN   fr31   =      NaN

 

machine check

Exception next instruction address: 0x03326a90

Machine Status Register: 0x0008b032

Condition Register: 0x42000884

value = 0 = 0x0

 

Memory dump of R31 address:

 

iocexample> d  0x0357f318

NOTE: memory values are displayed in hexadecimal.

0x0357f310:                      4000 0000 3d02 496c  *        @ ..=.Il*

0x0357f320:  05bd 0a67 0000 0000 000a 0008 0000 0001  *...g............*

0x0357f330:  bff2 9db2 9db2 9db3 0000 0000 0000 0000  *................*

0x0357f340:  0357 f2f0 3d02 496c 05bd 0a67 0000 0000  *.W..=.Il...g....*

0x0357f350:  000a 0008 0000 0001 bff2 9ef2 9ef2 9ef3  *................*

0x0357f360:  0000 0000 0000 0000 0357 eff8 3d02 496c  *.........W..=.Il*

0x0357f370:  05bd 0a67 0000 0000 000a 0008 0000 0001  *...g............*

0x0357f380:  bff2 94f2 94f2 94f3 0000 0000 0000 0000  *................*

0x0357f390:  0357 f2c8 3d02 496c 05bd 0a67 0000 0000  *.W..=.Il...g....*

0x0357f3a0:  000a 0008 0000 0001 bff2 9ef2 9ef2 9ef3  *................*

0x0357f3b0:  0000 0000 0000 0000 0357 f6b0 3d02 496c  *.........W..=.Il*

0x0357f3c0:  05bd 0a67 0000 0000 000a 0008 0000 0001  *...g............*

0x0357f3d0:  bff2 b572 b572 b573 0000 0000 0000 0000  *...r.r.s........*

0x0357f3e0:  4000 0000 3d02 496c 05bd 0a67 0000 0000  *@...=.Il...g....*

0x0357f3f0:  000a 0008 0000 0001 bff2 9ef2 9ef2 9ef3  *................*

0x0357f400:  0000 0000 0000 0000 0357 f250 3d02 496c  *.........W.P=.Il*

0x0357f410:  05bd 0a67 0000 0000                      *...g............*

value = 0 = 0x0

 

Memory dump of R9 address:

 

iocexample> d 0x03326a90

NOTE: memory values are displayed in hexadecimal.

0x03326a90:  9421 fff0 7c08 02a6 3d20 0334 3929 90e0  *.!..|...= .49)..*

0x03326aa0:  7d29 03a6 9001 0014 93c1 0008 7c9e 2378  *})..........|.#x*

0x03326ab0:  93e1 000c 7c7f 1b78 8063 0014 4e80 0421  *....|..x.c..N..!*

0x03326ac0:  2f83 0000 41be 002c 3d20 0334 3c60 0338  */...A..,= .4<`.8*

0x03326ad0:  3ca0 0338 3929 24f0 3863 35a4 38a5 35c0  *<..89)$.8c5.8.5.*

0x03326ae0:  7d29 03a6 3880 008f 38c0 0000 4e80 0421  *})..8...8...N..!*

0x03326af0:  813f 0010 801f 0008 3929 0001 807f 0014  *.?......9)......*

0x03326b00:  913f 0010 3d20 0334 3929 9340 901e 0000  *.?..= .49).@....*

0x03326b10:  7d29 03a6 93df 0008 4e80 0421 8001 0014  *})......N..!....*

0x03326b20:  83c1 0008 7c08 03a6 83e1 000c 3821 0010  *....|.......8!..*

0x03326b30:  4e80 0020 9421 ffe0 7c08 02a6 9001 0024  *N.. .!..|......$*

0x03326b40:  93e1 001c 9381 0010 7c7c 1b78 83e3 000c  *........||.x....*

0x03326b50:  93a1 0014 2f9f 0000 93c1 0018 419e 0074  *..../.......A..t*

0x03326b60:  3d20 001e 3bc9 02fc 807f 0004 7fc9 03a6  *= ..;...........*

0x03326b70:  83bf 0000 4e80 0421 7fe3 fb78 7fc9 03a6  *....N..!...x....*

0x03326b80:  7fbf eb78 4e80 0421 2f9d 0000 409e ffdc  *...xN..!/...@...*

value = 0 = 0x0

 

Disassembly of R9 address:

 

iocexample> l 0x03326a90

                        freeListFree:

0x3326a90  9421fff0    stwu        r1,-16(r1)

0x3326a94  7c0802a6    mfspr       r0,LR

0x3326a98  3d200334    lis         r9,0x334 # 820

0x3326a9c  392990e0    addi        r9,r9,0x90e0 # -28448

0x3326aa0  7d2903a6    mtspr       CTR,r9

0x3326aa4  90010014    stw         r0,20(r1)

0x3326aa8  93c10008    stw         r30,8(r1)

0x3326aac  7c9e2378    or          r30,r4,r4

0x3326ab0  93e1000c    stw         r31,12(r1)

0x3326ab4  7c7f1b78    or          r31,r3,r3

value = 0 = 0x0

iocexample>
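
One way to confirm that the memory dump and the disassembly of the r9 address agree is to pair the 16-bit groups of the `d` output into 32-bit big-endian words and compare them with the opcode column of the `l` listing. A minimal Python sketch, run on a workstation rather than on the IOC; the function name is illustrative:

```python
# First line of the 'd 0x03326a90' output, without the ASCII column
dump_line = "0x03326a90:  9421 fff0 7c08 02a6 3d20 0334 3929 90e0"

# Opcode words the 'l 0x03326a90' listing shows for the same addresses
listed = {
    0x3326A90: 0x9421FFF0,  # stwu  r1,-16(r1)
    0x3326A94: 0x7C0802A6,  # mfspr r0,LR
    0x3326A98: 0x3D200334,  # lis   r9,0x334
    0x3326A9C: 0x392990E0,  # addi  r9,r9,0x90e0
}

def parse_dump_line(line):
    """Return {address: 32-bit word} for one line of 16-bit hex groups."""
    addr_text, _, rest = line.partition(":")
    base = int(addr_text, 16)
    halves = rest.split()
    words = {}
    # Each group is 2 bytes, so a pair starting at index i sits at base + 2*i
    for i in range(0, len(halves) - 1, 2):
        words[base + 2 * i] = int(halves[i] + halves[i + 1], 16)
    return words

dumped = parse_dump_line(dump_line)
mismatches = {a: (w, dumped.get(a)) for a, w in listed.items() if dumped.get(a) != w}
print("mismatches:", mismatches)
```

For these first four words the two views match, so the start of freeListFree looked like intact code at the time of the dump.
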

 

Are there other things I should check before rebooting?

 

Thanks,

Mark

 

From: Hartman, Steven <hartmansm at ornl.gov>
Sent: Wednesday, June 8, 2022 7:44 AM
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: Till Straumann <till.straumann at psi.ch>; tech-talk at aps.anl.gov
Subject: Re: [EXTERNAL] Bus errors accessing VME with base 7.0.6.1 and latest synApps modules

 

 



On Jun 7, 2022, at 4:39 PM, Mark Rivers via Tech-talk <tech-talk at aps.anl.gov> wrote:

 

Another time I tested the same version, it failed with an A32 address error in freeListFree, called from db_delete_field_log.

 

 

Hi Mark—

 

Next time, can you also include a dump of the memory of the address in r9 (0x03326a90 in these examples)? In both cases, it made it far enough along to put freeListFree into r9, even in the case where the stack trace implicated access happening in db_delete_field_log(). 

 

(Great analysis, Till!) 

 

-- 
Steven Hartman
hartmansm at ornl.gov

 

 

 

