1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 <2022> 2023 2024 2025 | Index | 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 <2022> 2023 2024 2025 |
<== Date ==> | <== Thread ==> |
---|
Subject: | Bus errors accessing VME with base 7.0.6.1 and latest synApps modules |
From: | Mark Rivers via Tech-talk <tech-talk at aps.anl.gov> |
To: | "Johnson, Andrew N." <anj at anl.gov> |
Cc: | "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov> |
Date: | Thu, 19 May 2022 13:33:37 +0000 |
Folks, I updated my VME crates from base 7.0.4 to 7.0.6.1, and also updated to the latest version of synApps modules. 5 of the 7 crates are now giving VME bus errors within a few
minutes of booting. Since it is happening on 5 crates it is highly unlikely to be a hardware problem. The errors are all for a small set of addresses: A16: 0x300e, 0x3136, 0x3172 These are in the address space of the first Industry Pack module on a VIPC616_01 or TVME200 IP carrier card A32: 0xbf57c014 I am not sure what device this is. The Joerger scalars are configured to start at address 0xB0000000, but I don’t think they use that much memory space. The task generating the error is always “CAS-event”. These are the errors: 13IDA crate: [Wed May 18 20:23:06 2022] VME Bus Error accessing A16: 0x300e [Wed May 18 20:23:06 2022] machine check [Wed May 18 20:23:06 2022] Exception next instruction address: 0x03604600 [Wed May 18 20:23:06 2022] Machine Status Register: 0x0008b032 [Wed May 18 20:23:06 2022] Condition Register: 0x42000884 [Wed May 18 20:23:06 2022] Task: 0x55d8aa0 "CAS-event" [Wed May 18 20:23:06 2022] 0x55d8aa0 (CAS-event): task 0x55d8aa0 has had a failure and has been stopped. [Wed May 18 20:23:06 2022] 0x55d8aa0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10. [Wed May 18 20:48:19 2022] 0x4d50400 (timerQueue): callbackRequest: cbLow ring buffer full 13BMA crate [Wed May 18 18:16:04 2022] VME Bus Error accessing A16: 0x3136 [Wed May 18 18:16:04 2022] machine check [Wed May 18 18:16:04 2022] Exception next instruction address: 0x0141b5b8 [Wed May 18 18:16:04 2022] Machine Status Register: 0x0008b032 [Wed May 18 18:16:04 2022] Condition Register: 0x42000884 [Wed May 18 18:16:04 2022] Task: 0x3545520 "CAS-event" [Wed May 18 18:16:04 2022] 0x3545520 (CAS-event): task 0x3545520 has had a failure and has been stopped. [Wed May 18 18:16:04 2022] 0x3545520 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10. 13IDC crate: [Wed May 18 18:11:58 2022] VME Bus Error accessing A32: 0xbf57c014 [Wed May 18 18:11:58 2022] machine check [Wed May 18 18:11:58 2022] Exception next instruction address: 0x013962a0 [Wed May 18 18:11:58 2022] Machine Status Register: 0x0008b032 [Wed May 18 18:11:58 2022] Condition Register: 0x42000884 [Wed May 18 18:11:58 2022] Task: 0x395e4b0 "CAS-event" [Wed May 18 18:11:58 2022] 0x395e4b0 (CAS-event): task 0x395e4b0 has had a failure and has been stopped. [Wed May 18 18:11:58 2022] 0x395e4b0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10. 13BMD crate: [Wed May 18 18:12:43 2022] VME Bus Error accessing A16: 0x317e [Wed May 18 18:12:43 2022] machine check [Wed May 18 18:12:43 2022] Exception next instruction address: 0x01395e40 [Wed May 18 18:12:43 2022] Machine Status Register: 0x0008b032 [Wed May 18 18:12:44 2022] Condition Register: 0x48000884 [Wed May 18 18:12:44 2022] Task: 0x3c73ae0 "CAS-event" [Wed May 18 18:12:44 2022] 0x3c73ae0 (CAS-event): task 0x3c73ae0 has had a failure and has been stopped. [Wed May 18 18:12:44 2022] 0x3c73ae0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10. 13IDE crate: [Wed May 18 16:07:12 2022] VME Bus Error accessing A16: 0x317e [Wed May 18 16:07:12 2022] machine check [Wed May 18 16:07:12 2022] Exception next instruction address: 0x01395c20 [Wed May 18 16:07:12 2022] Machine Status Register: 0x0008b032 [Wed May 18 16:07:12 2022] Condition Register: 0x48000884 [Wed May 18 16:07:12 2022] Task: 0x3a13c60 "CAS-event" [Wed May 18 16:07:12 2022] 0x3a13c60 (CAS-event): task 0x3a13c60 has had a failure and has been stopped. [Wed May 18 16:07:12 2022] 0x3a13c60 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
[Wed May 18 18:20:22 2022] VME Bus Error accessing A32: 0xbf140014 [Wed May 18 18:20:22 2022] machine check [Wed May 18 18:20:22 2022] Exception next instruction address: 0x013959c0 [Wed May 18 18:20:22 2022] Machine Status Register: 0x0008b032 [Wed May 18 18:20:22 2022] Condition Register: 0x48000884 [Wed May 18 18:20:22 2022] Task: 0x3a2d780 "CAS-event" [Wed May 18 18:20:22 2022] 0x3a2d780 (CAS-event): task 0x3a2d780 has had a failure and has been stopped. [Wed May 18 18:20:23 2022] 0x3a2d780 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10. This is the result of “tt” and “ti” on a couple of the crashed tasks: ioc13ida> tt 0x55d8aa0 0x0012489c vxTaskEntry +0x48 : epicsThreadEntry () 0x036a6afc epicsThreadEntry+0x80 : 0x03604e20 () 0x036050f8 db_start_events+0x458: db_delete_field_log () value = 0 = 0x0 ioc13ida> ti 0x55d8aa0 NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- CAS-event epicsThread> 55d8aa0 139 STOP+I 3604600 55d89b0 0 0 full task name : CAS-event task entry : epicsThreadEntry process : kernel options : 0x1009001 VX_SUPERVISOR_MODE VX_DEALLOC_TCB VX_FP_TASK VX_DEALLOC_EXC_STACK STACK BASE END SP SIZE HIGH MARGIN --------- -------- -------- -------- ------- ------- ------- execution 55d8aa0 55d5bc0 55d89b0 12000 3568 8432 exception 55d9bf0 55d8d80 3696 1344 2352 VxWorks Events -------------- Events Pended on : Not Pended Received Events : 0x0 Options : N/A r0 = 0x40050000 sp = 0x055d89b0 r2 = 0x002f1ac0 r3 = 0x04e56c70 r4 = 0x05293e20 r5 = 0x00000001 r6 = 0x00000001 r7 = 0x00000000 r8 = 0x00000000 r9 = 0x0368a8b8 r10 = 0x00000000 r11 = 0x055d8920 r12 = 0x22000884 r13 = 0x00321e60 r14 = 0x036a5e98 r15 = 0x03fb9a44 r16 = 0x035ec8c0 r17 = 0x03fb99fc r18 = 0x00000000 r19 = 0x036a3f3c r20 = 0x03fb99bc r21 = 0x0360461c r22 = 0x00000001 r23 = 0x036045b4 r24 = 0x04341544 r25 = 0x0369d438 r26 = 0x0369d698 r27 = 0x05156b28 r28 = 0x03639dc8 r29 = 0x05b15c98 r30 = 0x05293e20 r31 = 0x05293e20 msr = 0x0008b032 lr = 0x036050f8 ctr = 0x036045b4 pc = 0x03604600 cr = 0x42000884 xer = 0x00000000 pgTblPtr = 0x02310000 scSrTblPtr = 0x0230fe74 srTblPtr = 0x0230fe74 fpcsr = 0 fr0 = -1 fr1 = NaN fr2 = NaN fr3 = NaN fr4 = NaN fr5 = NaN fr6 = NaN fr7 = NaN fr8 = NaN fr9 = NaN fr10 = NaN fr11 = NaN fr12 = NaN fr13 = NaN fr14 = NaN fr15 = NaN fr16 = NaN fr17 = NaN fr18 = NaN fr19 = NaN fr20 = NaN fr21 = NaN fr22 = NaN fr23 = NaN fr24 = NaN fr25 = NaN fr26 = NaN fr27 = NaN fr28 = NaN fr29 = NaN fr30 = NaN fr31 = NaN machine check Exception next instruction address: 0xva0lue = 3604600 Machine Status Register: 0x00008b032 Condition Register: 0x = 420008840x ioc13ide> tt 0x3a2d780 0x00121e7c vxTaskEntry +0x48 : epicsThreadEntry () 0x01437ebc epicsThreadEntry+0x80 : 0x013961e0 () 0x013964b8 db_start_events+0x458: db_delete_field_log () value = 0 = 0x0 ioc13ide> ti 0x3a2d780 NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY ---------- ------------ -------- --- ---------- -------- -------- ------- ----- CAS-event epicsThread> 3a2d780 133 STOP+I 13959c0 3a2d690 0 0 full task name : CAS-event task entry : epicsThreadEntry process : kernel options : 0x1009001 VX_SUPERVISOR_MODE VX_DEALLOC_TCB VX_FP_TASK VX_DEALLOC_EXC_STACK STACK BASE END SP SIZE HIGH MARGIN --------- -------- -------- -------- ------- ------- ------- execution 3a2d780 3a2a8a0 3a2d690 12000 2800 9200 exception 3a2e8d0 3a2da60 3696 1344 2352 VxWorks Events -------------- Events Pended on : Not Pended Received Events : 0x0 Options : N/A r0 = 0x40050000 sp = 0x03a2d690 r2 = 0x002e5010 r3 = 0x02dc8530 r4 = 0x0366ac00 r5 = 0x00000001 r6 = 0x00000001 r7 = 0x00000000 r8 = 0x00000002 r9 = 0x0141bc78 r10 = 0x00000000 r11 = 0x03a2d600 r12 = 0x28000884 r13 = 0x00313850 r14 = 0x01437258 r15 = 0x01d4ae04 r16 = 0x0137dc80 r17 = 0x01d4adbc r18 = 0x00000000 r19 = 0x014352fc r20 = 0x01d4ad7c r21 = 0x013959dc r22 = 0x00000001 r23 = 0x01395974 r24 = 0x020d2904 r25 = 0x0142e7f8 r26 = 0x0142ea58 r27 = 0x0351b4c8 r28 = 0x013cb188 r29 = 0x03a98e70 r30 = 0x0366ac00 r31 = 0x0366ac00 msr = 0x0008b032 lr = 0x013964b8 ctr = 0x01395974 pc = 0x013959c0 cr = 0x48000884 xer = 0x00000000 pgTblPtr = 0x00af7000 scSrTblPtr = 0x00af65c4 srTblPtr = 0x00af65c4 fpcsr = 0 fr0 = -7.62951e-05 fr1 = NaN fr2 = NaN fr3 = NaN fr4 = NaN fr5 = NaN fr6 = NaN fr7 = NaN fr8 = NaN fr9 = NaN fr10 = NaN fr11 = NaN fr12 = NaN fr13 = NaN fr14 = NaN fr15 = NaN fr16 = NaN fr17 = NaN fr18 = NaN fr19 = NaN fr20 = NaN fr21 = NaN fr22 = NaN fr23 = NaN fr24 = NaN fr25 = NaN fr26 = NaN fr27 = NaN fr28 = NaN fr29 = NaN fr30 = NaN fr31 = NaN machine checkvalue = Exception next instruction address: 0x0013959c0 Machine Status Register: 0x = 0008b032 Condition Register: 0x048000884x Any ideas? Thanks, Mark |