Folks,
I updated my VME crates from EPICS base 7.0.4 to 7.0.6.1, and also updated to the latest versions of the synApps modules. 5 of the 7 crates are now giving VME bus errors within a few minutes of booting. Since it is happening on 5 crates, it is highly unlikely to be a hardware problem.
The errors are all for a small set of addresses:
A16: 0x300e, 0x3136, 0x317e
These are in the address space of the first Industry Pack module on a VIPC616_01 or TVME200 IP carrier card.
A32: 0xbf57c014, 0xbf140014
I am not sure what device these are. The Joerger scalers are configured to start at address 0xB0000000, but I don’t think they use that much memory space.
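To make the "does not use that much memory space" argument concrete, the faulting addresses can be checked against an address map and their offset from the scaler base computed. This is a minimal Python sketch; the IP carrier base of 0x3000 and both window sizes are hypothetical placeholders, not values taken from the IOC configuration, and should be replaced with the real ones from the startup script and hardware manuals.

```python
# Classify faulting VME addresses against a HYPOTHETICAL address map.
# The carrier base (0x3000) and both window sizes are assumptions, not real config.

# (address space, label, base, size in bytes)
ADDRESS_MAP = [
    ("A16", "IP carrier (assumed base 0x3000, 0x400-byte window)", 0x3000, 0x400),
    ("A32", "Joerger scalers (assumed 64 KiB span)", 0xB0000000, 0x10000),
]

# Faulting addresses copied from the "VME Bus Error accessing ..." messages
FAULTS = [
    ("A16", 0x300E), ("A16", 0x3136), ("A16", 0x317E),
    ("A32", 0xBF57C014), ("A32", 0xBF140014),
]

SCALER_BASE = 0xB0000000  # configured scaler start address from the post


def classify(space: str, addr: int) -> str:
    """Return the label of the window that contains addr, or 'unmapped'."""
    for map_space, label, base, size in ADDRESS_MAP:
        if map_space == space and base <= addr < base + size:
            return label
    return "unmapped"


if __name__ == "__main__":
    for space, addr in FAULTS:
        line = f"{space}: {addr:#x} -> {classify(space, addr)}"
        if space == "A32":
            offset = addr - SCALER_BASE
            line += f" (offset {offset:#x}, about {offset / 2**20:.1f} MiB past the scaler base)"
        print(line)
```

Both A32 faults sit roughly 240 MiB or more above 0xB0000000, so unless some board is configured with a very large window there, they are unlikely to be real scaler registers.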
The task generating the error is always “CAS-event”.
These are the errors:
13IDA crate:
[Wed May 18 20:23:06 2022] VME Bus Error accessing A16: 0x300e
[Wed May 18 20:23:06 2022] machine check
[Wed May 18 20:23:06 2022] Exception next instruction address: 0x03604600
[Wed May 18 20:23:06 2022] Machine Status Register: 0x0008b032
[Wed May 18 20:23:06 2022] Condition Register: 0x42000884
[Wed May 18 20:23:06 2022] Task: 0x55d8aa0 "CAS-event"
[Wed May 18 20:23:06 2022] 0x55d8aa0 (CAS-event): task 0x55d8aa0 has had a failure and has been stopped.
[Wed May 18 20:23:06 2022] 0x55d8aa0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
[Wed May 18 20:48:19 2022] 0x4d50400 (timerQueue): callbackRequest: cbLow ring buffer full
13BMA crate:
[Wed May 18 18:16:04 2022] VME Bus Error accessing A16: 0x3136
[Wed May 18 18:16:04 2022] machine check
[Wed May 18 18:16:04 2022] Exception next instruction address: 0x0141b5b8
[Wed May 18 18:16:04 2022] Machine Status Register: 0x0008b032
[Wed May 18 18:16:04 2022] Condition Register: 0x42000884
[Wed May 18 18:16:04 2022] Task: 0x3545520 "CAS-event"
[Wed May 18 18:16:04 2022] 0x3545520 (CAS-event): task 0x3545520 has had a failure and has been stopped.
[Wed May 18 18:16:04 2022] 0x3545520 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
13IDC crate:
[Wed May 18 18:11:58 2022] VME Bus Error accessing A32: 0xbf57c014
[Wed May 18 18:11:58 2022] machine check
[Wed May 18 18:11:58 2022] Exception next instruction address: 0x013962a0
[Wed May 18 18:11:58 2022] Machine Status Register: 0x0008b032
[Wed May 18 18:11:58 2022] Condition Register: 0x42000884
[Wed May 18 18:11:58 2022] Task: 0x395e4b0 "CAS-event"
[Wed May 18 18:11:58 2022] 0x395e4b0 (CAS-event): task 0x395e4b0 has had a failure and has been stopped.
[Wed May 18 18:11:58 2022] 0x395e4b0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
13BMD crate:
[Wed May 18 18:12:43 2022] VME Bus Error accessing A16: 0x317e
[Wed May 18 18:12:43 2022] machine check
[Wed May 18 18:12:43 2022] Exception next instruction address: 0x01395e40
[Wed May 18 18:12:43 2022] Machine Status Register: 0x0008b032
[Wed May 18 18:12:44 2022] Condition Register: 0x48000884
[Wed May 18 18:12:44 2022] Task: 0x3c73ae0 "CAS-event"
[Wed May 18 18:12:44 2022] 0x3c73ae0 (CAS-event): task 0x3c73ae0 has had a failure and has been stopped.
[Wed May 18 18:12:44 2022] 0x3c73ae0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
13IDE crate:
[Wed May 18 16:07:12 2022] VME Bus Error accessing A16: 0x317e
[Wed May 18 16:07:12 2022] machine check
[Wed May 18 16:07:12 2022] Exception next instruction address: 0x01395c20
[Wed May 18 16:07:12 2022] Machine Status Register: 0x0008b032
[Wed May 18 16:07:12 2022] Condition Register: 0x48000884
[Wed May 18 16:07:12 2022] Task: 0x3a13c60 "CAS-event"
[Wed May 18 16:07:12 2022] 0x3a13c60 (CAS-event): task 0x3a13c60 has had a failure and has been stopped.
[Wed May 18 16:07:12 2022] 0x3a13c60 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
[Wed May 18 18:20:22 2022] VME Bus Error accessing A32: 0xbf140014
[Wed May 18 18:20:22 2022] machine check
[Wed May 18 18:20:22 2022] Exception next instruction address: 0x013959c0
[Wed May 18 18:20:22 2022] Machine Status Register: 0x0008b032
[Wed May 18 18:20:22 2022] Condition Register: 0x48000884
[Wed May 18 18:20:22 2022] Task: 0x3a2d780 "CAS-event"
[Wed May 18 18:20:22 2022] 0x3a2d780 (CAS-event): task 0x3a2d780 has had a failure and has been stopped.
[Wed May 18 18:20:23 2022] 0x3a2d780 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
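With several crates and several crashes per crate, it can help to tally the faults mechanically rather than by eye. The following Python sketch pairs each "VME Bus Error accessing ..." line with the task named in the following "Task:" line, relying only on the message formats shown in the logs above; the `summarize` helper and `SAMPLE` text are illustrative, not part of EPICS or VxWorks.

```python
import re
from collections import Counter

# e.g. "[Wed May 18 20:23:06 2022] VME Bus Error accessing A16: 0x300e"
BUS_ERROR = re.compile(r"VME Bus Error accessing (A\d+): (0x[0-9a-fA-F]+)")
# e.g. '[Wed May 18 20:23:06 2022] Task: 0x55d8aa0 "CAS-event"'
TASK = re.compile(r'Task: (0x[0-9a-fA-F]+) "([^"]+)"')

SAMPLE = """\
[Wed May 18 20:23:06 2022] VME Bus Error accessing A16: 0x300e
[Wed May 18 20:23:06 2022] machine check
[Wed May 18 20:23:06 2022] Task: 0x55d8aa0 "CAS-event"
[Wed May 18 18:11:58 2022] VME Bus Error accessing A32: 0xbf57c014
[Wed May 18 18:11:58 2022] machine check
[Wed May 18 18:11:58 2022] Task: 0x395e4b0 "CAS-event"
"""


def summarize(log_text: str):
    """Return (space, address, task name) for each bus error in the log."""
    events = []
    pending = None  # bus error seen, waiting for its "Task:" line
    for line in log_text.splitlines():
        m = BUS_ERROR.search(line)
        if m:
            pending = (m.group(1), int(m.group(2), 16))
            continue
        m = TASK.search(line)
        if m and pending is not None:
            events.append((*pending, m.group(2)))
            pending = None
    return events


if __name__ == "__main__":
    events = summarize(SAMPLE)
    print(Counter(task for _, _, task in events))
    print(Counter(f"{space}:{addr:#x}" for space, addr, _ in events))
```

Feeding it the complete console logs from all five crates gives per-task and per-address counts in one step.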
This is the result of “tt” and “ti” on a couple of the crashed tasks:
ioc13ida> tt 0x55d8aa0
0x0012489c vxTaskEntry +0x48 : epicsThreadEntry ()
0x036a6afc epicsThreadEntry+0x80 : 0x03604e20 ()
0x036050f8 db_start_events+0x458: db_delete_field_log ()
value = 0 = 0x0
ioc13ida> ti 0x55d8aa0
NAME       ENTRY        TID      PRI STATUS     PC       SP       ERRNO   DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
CAS-event  epicsThread> 55d8aa0  139 STOP+I     3604600  55d89b0  0       0
full task name : CAS-event
task entry : epicsThreadEntry
process : kernel
options : 0x1009001
VX_SUPERVISOR_MODE VX_DEALLOC_TCB VX_FP_TASK VX_DEALLOC_EXC_STACK
STACK     BASE     END      SP       SIZE    HIGH    MARGIN
--------- -------- -------- -------- ------- ------- -------
execution 55d8aa0  55d5bc0  55d89b0  12000   3568    8432
exception 55d9bf0  55d8d80           3696    1344    2352
VxWorks Events
--------------
Events Pended on : Not Pended
Received Events : 0x0
Options : N/A
r0 = 0x40050000 sp = 0x055d89b0 r2 = 0x002f1ac0
r3 = 0x04e56c70 r4 = 0x05293e20 r5 = 0x00000001
r6 = 0x00000001 r7 = 0x00000000 r8 = 0x00000000
r9 = 0x0368a8b8 r10 = 0x00000000 r11 = 0x055d8920
r12 = 0x22000884 r13 = 0x00321e60 r14 = 0x036a5e98
r15 = 0x03fb9a44 r16 = 0x035ec8c0 r17 = 0x03fb99fc
r18 = 0x00000000 r19 = 0x036a3f3c r20 = 0x03fb99bc
r21 = 0x0360461c r22 = 0x00000001 r23 = 0x036045b4
r24 = 0x04341544 r25 = 0x0369d438 r26 = 0x0369d698
r27 = 0x05156b28 r28 = 0x03639dc8 r29 = 0x05b15c98
r30 = 0x05293e20 r31 = 0x05293e20 msr = 0x0008b032
lr = 0x036050f8 ctr = 0x036045b4 pc = 0x03604600
cr = 0x42000884 xer = 0x00000000 pgTblPtr = 0x02310000
scSrTblPtr = 0x0230fe74 srTblPtr = 0x0230fe74
fpcsr = 0
fr0 = -1 fr1 = NaN fr2 = NaN fr3 = NaN
fr4 = NaN fr5 = NaN fr6 = NaN fr7 = NaN
fr8 = NaN fr9 = NaN fr10 = NaN fr11 = NaN
fr12 = NaN fr13 = NaN fr14 = NaN fr15 = NaN
fr16 = NaN fr17 = NaN fr18 = NaN fr19 = NaN
fr20 = NaN fr21 = NaN fr22 = NaN fr23 = NaN
fr24 = NaN fr25 = NaN fr26 = NaN fr27 = NaN
fr28 = NaN fr29 = NaN fr30 = NaN fr31 = NaN
machine check
Exception next instruction address: 0x03604600
Machine Status Register: 0x0008b032
Condition Register: 0x42000884
value = 0 = 0x0
ioc13ide> tt 0x3a2d780
0x00121e7c vxTaskEntry +0x48 : epicsThreadEntry ()
0x01437ebc epicsThreadEntry+0x80 : 0x013961e0 ()
0x013964b8 db_start_events+0x458: db_delete_field_log ()
value = 0 = 0x0
ioc13ide> ti 0x3a2d780
NAME       ENTRY        TID      PRI STATUS     PC       SP       ERRNO   DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
CAS-event  epicsThread> 3a2d780  133 STOP+I     13959c0  3a2d690  0       0
full task name : CAS-event
task entry : epicsThreadEntry
process : kernel
options : 0x1009001
VX_SUPERVISOR_MODE VX_DEALLOC_TCB VX_FP_TASK VX_DEALLOC_EXC_STACK
STACK     BASE     END      SP       SIZE    HIGH    MARGIN
--------- -------- -------- -------- ------- ------- -------
execution 3a2d780  3a2a8a0  3a2d690  12000   2800    9200
exception 3a2e8d0  3a2da60           3696    1344    2352
VxWorks Events
--------------
Events Pended on : Not Pended
Received Events : 0x0
Options : N/A
r0 = 0x40050000 sp = 0x03a2d690 r2 = 0x002e5010
r3 = 0x02dc8530 r4 = 0x0366ac00 r5 = 0x00000001
r6 = 0x00000001 r7 = 0x00000000 r8 = 0x00000002
r9 = 0x0141bc78 r10 = 0x00000000 r11 = 0x03a2d600
r12 = 0x28000884 r13 = 0x00313850 r14 = 0x01437258
r15 = 0x01d4ae04 r16 = 0x0137dc80 r17 = 0x01d4adbc
r18 = 0x00000000 r19 = 0x014352fc r20 = 0x01d4ad7c
r21 = 0x013959dc r22 = 0x00000001 r23 = 0x01395974
r24 = 0x020d2904 r25 = 0x0142e7f8 r26 = 0x0142ea58
r27 = 0x0351b4c8 r28 = 0x013cb188 r29 = 0x03a98e70
r30 = 0x0366ac00 r31 = 0x0366ac00 msr = 0x0008b032
lr = 0x013964b8 ctr = 0x01395974 pc = 0x013959c0
cr = 0x48000884 xer = 0x00000000 pgTblPtr = 0x00af7000
scSrTblPtr = 0x00af65c4 srTblPtr = 0x00af65c4
fpcsr = 0
fr0 = -7.62951e-05 fr1 = NaN fr2 = NaN fr3 = NaN
fr4 = NaN fr5 = NaN fr6 = NaN fr7 = NaN
fr8 = NaN fr9 = NaN fr10 = NaN fr11 = NaN
fr12 = NaN fr13 = NaN fr14 = NaN fr15 = NaN
fr16 = NaN fr17 = NaN fr18 = NaN fr19 = NaN
fr20 = NaN fr21 = NaN fr22 = NaN fr23 = NaN
fr24 = NaN fr25 = NaN fr26 = NaN fr27 = NaN
fr28 = NaN fr29 = NaN fr30 = NaN fr31 = NaN
machine check
Exception next instruction address: 0x013959c0
Machine Status Register: 0x0008b032
Condition Register: 0x48000884
value = 0 = 0x0
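One quick consistency check on the two `ti` dumps is to compare the faulting PC with CTR. On PowerPC an indirect call is normally made by loading the target into CTR and executing `bctrl`, so, assuming CTR still holds the entry point of the function that faulted, `pc - ctr` is the offset of the faulting instruction inside that function. A minimal Python sketch using only the register values shown above:

```python
# Register values copied from the two "ti" dumps above.
DUMPS = {
    "ioc13ida": {"pc": 0x03604600, "ctr": 0x036045B4, "lr": 0x036050F8},
    "ioc13ide": {"pc": 0x013959C0, "ctr": 0x01395974, "lr": 0x013964B8},
}


def fault_offset(regs: dict) -> int:
    """Offset of the faulting PC from CTR.

    Assumption: CTR still holds the entry address of the indirectly called
    function (mtctr/bctrl), so this is an offset within that function.
    """
    return regs["pc"] - regs["ctr"]


if __name__ == "__main__":
    for ioc, regs in DUMPS.items():
        print(f"{ioc}: pc - ctr = {fault_offset(regs):#x}, lr = {regs['lr']:#010x}")
```

Under that assumption both crashes are at the same offset (0x4c) into the same indirectly called function, which fits the identical `db_start_events+0x458` frames in the two `tt` backtraces.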
Any ideas?
Thanks,
Mark