How is that done? What did you actually have to do so that the mapping is now correct, that wasn’t happening before.
What will everyone need to do if and when they move on to base-7.0.6.1 . . . .
That mapping is set in the vxWorks BSP. Andrew Johnson made a new version for the MVME5100 today, mv5100-dev7. It is on the oxygen server at the APS. Now that we know it works, he will also make new versions for the 2100, 2700, and 6100 CPUs. You just need to load that new version of the BSP using the vxWorks “bootChange” command.
Mark
From: Engbretson, Mark S. <engbretson at anl.gov>
Sent: Monday, June 13, 2022 2:40 PM
To: Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov
Subject: RE: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
How is that done? What did you actually have to do so that the mapping is now correct, that wasn’t happening before.
What will everyone need to do if and when they move on to base-7.0.6.1 . . . .
Hi Mark,
And how/where is that actually now being done correctly? In Epics base?
No, in the PowerPC processor itself.
If the mapping is wrong it gets a “machine check” error from which it cannot recover when executing the speculative branch.
If the mapping is correct it gets an “instruction access” error from which it can recover when executing the speculative branch.
Mark
RE: “By setting the mapping in the BSP correctly the PowerPC receives a different exception when that happens, which it can correctly handle.”
And how/where is that actually now being done correctly? In Epics base?
Folks,
I’m happy to report that after 3.5 weeks this problem is finally solved!
The bottom line is that it was an issue with the vxWorks Board Support Package (BSP) we are using at the APS. The problem was that the mapping was not correctly configured to set the VME memory space as “guarded”. That guarding prevents certain memory operations like speculative branch execution from happening in the VME memory space.
This BSP problem has existed for a long time, but did not manifest itself until base 7.0.6, where some changes were made to the db_field_log structure. That structure contains a union which can either hold a record field value or a pointer to a destructor, depending on a flag bit in db_field_log. It appears that the PowerPC processor was doing a speculative branch which called the destructor. That was the wrong branch, because the flag said the union held a field value, not a pointer to a destructor. If the value was a double in the range -2.0 to -0.5 then its high-order 32 bits correspond to a VME A32 address, and the speculative branch execution tried to execute the destructor at that VME address. It then got a “machine check” with an A32 VME bus error.
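As a rough illustration of the bit-pattern coincidence described above, the following C sketch extracts the high-order 32 bits of a double. It assumes a 64-bit IEEE 754 double. The function name double_high_word is made up for this example and is not part of EPICS base. Whether a given value actually lands in a mapped A32 window depends on the BSP's address map, which is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the high-order 32 bits of the IEEE 754
 * bit pattern of a double (sign, exponent, and top 20 mantissa bits). */
uint32_t double_high_word(double value)
{
    uint64_t bits;

    /* memcpy is the well-defined way to reinterpret the bytes of a double. */
    memcpy(&bits, &value, sizeof bits);
    return (uint32_t)(bits >> 32);
}
```

For example, -0.5, -1.0 and -2.0 give 0xBFE00000, 0xBFF00000 and 0xC0000000 respectively, so doubles in that range have high-order words between 0xBFE00000 and 0xC0000000. On a 32-bit big-endian CPU, a 4-byte pointer overlaying the start of the same storage would be read from exactly those bits, which is consistent with the explanation above.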
By setting the mapping in the BSP correctly the PowerPC receives a different exception when that happens, which it can correctly handle.
Thanks very much to Till Straumann, Michael Davidsaver, Andrew Johnson, and Steven Hartman for their help in tracking this down.
Base R7.0.6.1 now runs fine on my VME IOCs that were having this problem.
Mark
Hi Till,
I tried your test program, which I expanded to 36 NOPs.
void address_test() {
*(volatile uint16_t*)0xfbff345e = 0;
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
asm volatile( " addi 3, 3, 0"); /* I hope I remember this correctly; this is just a NOP */
/* copy/paste ~30 of these NOPs here */
}
Much to my surprise the program does not generate a bus error. The two A16 addresses that it most frequently failed with were 0x345e and 0x347e. Your program tests the first one.
iocexample> address_test
value = 0 = 0x0
I then tried writing to those A16 addresses manually from the vxWorks shell. I can write to both of them with no bus error. So I was wrong: those addresses are both readable and writable.
iocexample> *0xfbff345e = 0
address_test + 0xf8a6281e = 0xfbff345e: value = 0 = 0x0
iocexample> *0xfbff347e = 0
address_test + 0xf8a6283e = 0xfbff347e: value = 65353 = 0xff49 = 'I'
However, if I try the next address after 0x347e I do get a bus error, because that is beyond the range of the IP330 module.
iocexample> *0xfbff3480 = 0
VME Bus Error accessing A16: 0x3480
machine check
Exception next instruction address: 0x001fe4f4
Machine Status Register: 0x0008b032
Condition Register: 0x88000244
0x0012489c vxTaskEntry +0x48 : 0x0020c554 ()
0x0020c554 shellTask +0x4a8: shellExec ()
0x0020c020 shellExec +0x11c: 0x00201a78 ()
0x00201ca0 shellInterpCInit+0x640: shellInterpCparse ()
0x00201384 shellInterpCparse+0xa3c: 0x001fe9f4 ()
0x001fea24 shellInterpClex+0x1d6c: 0x001fe6dc ()
0x001fe7a0 shellInterpClex+0x1ae8: 0x001fe490 ()
Shell task 'tShell0' restarted...
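The transcript above suggests that CPU address 0xfbff3480 corresponds to A16 address 0x3480, i.e. that the A16 window starts at local address 0xfbff0000 on this board. Assuming that base (it is inferred from the shell output, not taken from the BSP source), a small hypothetical helper for converting a local address to its A16 offset could look like this:

```c
#include <stdint.h>

/* Assumed local (CPU) base address of the VME A16 window, inferred from the
 * shell transcript; the real value is defined by the BSP. */
#define A16_LOCAL_BASE 0xfbff0000u
#define A16_SIZE       0x10000u     /* A16 space is 64 KiB */

/* Hypothetical helper: return 1 and store the A16 offset if local_addr lies
 * in the assumed A16 window, otherwise return 0 and leave *a16_offset alone. */
int local_to_a16(uint32_t local_addr, uint16_t *a16_offset)
{
    if (local_addr < A16_LOCAL_BASE ||
        local_addr - A16_LOCAL_BASE >= A16_SIZE) {
        return 0;
    }
    *a16_offset = (uint16_t)(local_addr - A16_LOCAL_BASE);
    return 1;
}
```

With that assumed base, 0xfbff345e and 0xfbff347e map to A16 offsets 0x345e and 0x347e, and 0xfbff3480 maps to 0x3480, matching the address reported in the bus error message.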
So this leaves the question of why we get those A16 bus errors at all!
Mark