Hm.
So, in theory, the two versions are identical; we can agree on that.
There is a possible difference when using enums.
What happens when the field holds garbage for some reason?
dbfl_type_val is probably 0, and
dbfl_type_ref is 1.
Using "if (type == 1)" is slightly different from "if (type != 0)" when we find an
illegal value here.
It could be interesting to see the assembler code for both versions,
and maybe to add an assert, just to be sure that there is only 0 or 1 in the
memory area. Probably the processor uses a full 32-bit word here?
Do we need a volatile here?
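
To make the point about illegal values concrete, here is a minimal sketch. It does not use the real EPICS headers: the enum is a simplified stand-in, the values 0 and 1 are the guess from above, and the helper names (is_ref_by_equality, is_ref_by_exclusion, type_is_legal, check_type) are hypothetical.

```c
#include <assert.h>

/* Simplified stand-in for the two-state enum discussed above.
 * The values 0 and 1 are an assumption, not taken from db_field_log.h. */
typedef enum { dbfl_type_val = 0, dbfl_type_ref = 1 } dbfl_type;

/* Hypothetical helper: "if (type == 1)" */
int is_ref_by_equality(int raw_type)
{
    return raw_type == dbfl_type_ref;
}

/* Hypothetical helper: "if (type != 0)" */
int is_ref_by_exclusion(int raw_type)
{
    return raw_type != dbfl_type_val;
}

/* Hypothetical sanity check: only the two legal values are accepted. */
int type_is_legal(int raw_type)
{
    return raw_type == dbfl_type_val || raw_type == dbfl_type_ref;
}

/* Hypothetical wrapper in the spirit of the suggested assert:
 * aborts (in a build without NDEBUG) if the memory holds anything else. */
void check_type(int raw_type)
{
    assert(type_is_legal(raw_type));
}
```

For the legal values 0 and 1 the two helpers agree. For a corrupted value such as 0x5a, the equality test says "not a reference" while the exclusion test says "reference", so the two code versions would take different branches, and check_type() would trip.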
Michael, it seems that you have checked 56f05d722dee4b8ca2968b8bface2737a3a9b185:
"I still think this is a correct case to take."
On the other hand, the "git bisect" session from Mark points to
exactly 56f05d722dee4b8ca2968b8bface2737a3a9b185 as "problematic" (my words).
So what is going on?
Is there a memory corruption that had never been observed before
because we had been lucky,
and which is now hitting us?
HTH
/Torsten
On 2022-06-01, 17:16, "Michael Davidsaver" <mdavidsaver at gmail.com> wrote:
On 6/1/22 01:59, Torsten Bögershausen wrote:
> There is something that I do not understand.
> Commit 85822f3051d2236144bb46dc2c24b7e38143e531 states
> "add macro".
>
> However, the change does not only introduce a macro, it changes the code
> as well:
> Old: type==dbfl_type_ref
> New: type==dbfl_type_val
Note the negation: !dbfl_has_copy(pfl)
See below.
> Is this a contradiction ?
> Strictly speaking, either the commit message is incomplete,
> (A macro is introduced and code is changed)
> or the code change is introducing an unintended change.
>
> Trying to be helpful.... Any thoughts ?
>
>
>
>
>
> ========================
> commit 85822f3051d2236144bb46dc2c24b7e38143e531
> Author: Ben Franksen <benjamin.franksen at helmholtz-berlin.de>
> Date: Wed Apr 1 10:42:22 2020 +0200
>
> add macro dbfl_has_copy to db_field_log.h and use it in dbAccess.c
>
> []
> - if ((!pfl || (pfl->type==dbfl_type_ref && !pfl->u.r.dtor)) &&
> + if (!dbfl_has_copy(pfl) &&
The replacement line expands to
> !((pfl) && ((pfl)->type==dbfl_type_val || (pfl)->u.r.dtor || (pfl)->no_elements==0))
Applying the identity !(A && B) == !A || !B
> !pfl || !((pfl)->type==dbfl_type_val || (pfl)->u.r.dtor || (pfl)->no_elements==0)
Applying !(A || B) == !A && !B
> !pfl || (!((pfl)->type==dbfl_type_val) && !(pfl)->u.r.dtor && !((pfl)->no_elements==0))
Applying !(A==B) == (A!=B)
> !pfl || ((pfl)->type!=dbfl_type_val && !(pfl)->u.r.dtor && (pfl)->no_elements!=0)
Ben's change removes 'dbfl_type_rec', leaving only two states, _val and _ref, so
!=_val is the same as ==_ref:
> !pfl || ((pfl)->type==dbfl_type_ref && !(pfl)->u.r.dtor && (pfl)->no_elements!=0)
So there is a change with the addition of no_elements!=0, which I don't think is consequential.
In this particular conditional, I think that offset=0 is the right answer for no_elements==0.
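
The derivation above can be checked mechanically. The following is a minimal sketch, not the real EPICS code: field_log is a simplified stand-in holding only the members the macro reads (the real db_field_log has more fields), the enum values are assumed, and old_cond, new_cond, count_mismatches and self_check are hypothetical helpers. Only the dbfl_has_copy macro body is copied from the quoted commit.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed two-state enum; the values are a guess. */
typedef enum { dbfl_type_val = 0, dbfl_type_ref = 1 } dbfl_type;

/* Simplified stand-in: only the members the macro touches. */
typedef struct {
    dbfl_type type;
    long no_elements;
    struct { struct { void (*dtor)(void); } r; } u;
} field_log;

/* Macro body as quoted from commit 85822f30. */
#define dbfl_has_copy(p)\
 ((p) && ((p)->type==dbfl_type_val || (p)->u.r.dtor || (p)->no_elements==0))

static void dummy_dtor(void) {}

/* Condition before the commit. */
int old_cond(const field_log *pfl)
{
    return !pfl || (pfl->type == dbfl_type_ref && !pfl->u.r.dtor);
}

/* Condition after the commit. */
int new_cond(const field_log *pfl)
{
    return !dbfl_has_copy(pfl);
}

/* Count states where old and new disagree, looking only at logs with
 * no_elements == 0 (want_empty != 0) or no_elements != 0 (want_empty == 0). */
int count_mismatches(int want_empty)
{
    int mismatches = 0;
    for (int t = dbfl_type_val; t <= dbfl_type_ref; t++) {
        for (int d = 0; d < 2; d++) {
            for (long n = 0; n <= 2; n++) {
                field_log fl;
                if ((n == 0) != (want_empty != 0))
                    continue;
                fl.type = (dbfl_type)t;
                fl.no_elements = n;
                fl.u.r.dtor = d ? dummy_dtor : NULL;
                if (old_cond(&fl) != new_cond(&fl))
                    mismatches++;
            }
        }
    }
    return mismatches;
}

void self_check(void)
{
    assert(old_cond(NULL) == new_cond(NULL));
    assert(count_mismatches(0) == 0); /* identical when no_elements != 0 */
    assert(count_mismatches(1) == 1); /* differs only for _ref without dtor */
}
```

Under these assumptions the old and new conditions agree for every state with no_elements != 0 and for a NULL pfl. The single disagreement is an empty (no_elements == 0) _ref log without a dtor, which is the case identified above.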
> []
> +#define dbfl_has_copy(p)\
> + ((p) && ((p)->type==dbfl_type_val || (p)->u.r.dtor || (p)->no_elements==0))
> +
>
> =======================
> commit 56f05d722dee4b8ca2968b8bface2737a3a9b185
> Author: Ben Franksen <benjamin.franksen at helmholtz-berlin.de>
> Date: Thu Jan 14 17:38:58 2021 +0100
>
> fix in dbGet: decide use of db_field_log based on whether it has copy or not
>
> - if (!pfl) {
> + if (!dbfl_has_copy(pfl)) {
In this case, there is an earlier test that no_elements==1.
Although it is not clear in email form, there are ~50 lines between these two changes.
> - } else if (!pfl) {
> + } else if (!dbfl_has_copy(pfl)) {
This is on the branch where offset!=0 || no_elements!=0
I still think this is a correct case to take.
> On 6/1/22 05:17, Mark Rivers wrote:
>> The git bisect pinpointed commit 56f05d722dee4b8ca2968b8bface2737a3a9b185 as the first one that caused the problem with soft device
>>
>> This is the difference between that commit and the previous one:
>>
>> corvette:local/epics-devel/base-7.0.6>git diff 85822f3051d2236144bb46dc2c24b7e38143e531 56f05d722dee4b8ca2968b8bface2737a3a9b185
>>
>> diff --git a/modules/database/src/ioc/db/dbAccess.c b/modules/database/src/ioc/db/dbAccess.c
>> index 3f7554a..d50b256 100644
>> --- a/modules/database/src/ioc/db/dbAccess.c
>> +++ b/modules/database/src/ioc/db/dbAccess.c
>> @@ -952,7 +952,7 @@ long dbGet(DBADDR *paddr, short dbrType,
>> goto done;
>> }
>> - if (!pfl) {
>> + if (!dbfl_has_copy(pfl)) {
>> status = dbFastGetConvertRoutine[field_type][dbrType]
>> (paddr->pfield, pbuf, paddr);
>> } else {
>> @@ -1000,7 +1000,7 @@ long dbGet(DBADDR *paddr, short dbrType,
>> /* convert data into the caller's buffer */
>> if (n <= 0) {
>> ; /*do nothing */
>> - } else if (!pfl) {
>> + } else if (!dbfl_has_copy(pfl)) {
>> status = convert(paddr, pbuf, n, capacity, offset);
>> } else {
>> DBADDR localAddr = *paddr; /* Structure copy */
>>
>> That commit was not included in base R7.0.5 but it was included in R7.0.6.
>>
>> I ran 3 tests with the above commit with the soft device records, and it failed in [3, 6, 3] seconds.
>>
>> I then tested with the previous commit (85822f3051d2236144bb46dc2c24b7e38143e531) and it ran for 5700 seconds without failing before I stopped it.
>>
>> When running the tests with the soft records 100% of the VME bus errors were A32 address space, and the failing task was always CAC-event.
>>
>> Now that I am quite sure that commit 85822f3051d2236144bb46dc2c24b7e38143e531 fixes the problems with soft device support, I tested that same commit with the Ip330 and dac128V hardware devices. That commit is before Michael’s commit that added the AMSG and NAMSG fields to dbCommon, so they cannot be an issue.
>>
>> In 3 tests it failed after [60, 1227, 844] seconds
>>
>> The test with hardware first fails with an A16 error, not A32.
>>
>> VME Bus Error accessing A16: 0x347e
>>
>> machine check
>>
>> Exception next instruction address: 0x032a1a40
>>
>> Machine Status Register: 0x0008b032
>>
>> Condition Register: 0x28000884
>>
>> Task: 0x26fccd0 "CAS-event"
>>
>> 0x26fccd0 (CAS-event): task 0x26fccd0 has had a failure and has been stopped.
>>
>> 0x26fccd0 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
>>
>> The task trace is not very informative, but note that it is not in db_delete_field_log(), which all previous failures have been, both soft and hard device support.
>>
>> iocexample> tt 0x26fccd0
>>
>> 0x0012489c vxTaskEntry +0x48 : epicsThreadEntry ()
>>
>> 0x0334501c epicsThreadEntry+0x80 : 0x032a17f0 ()
>>
>> value = 0 = 0x0
>>
>> This was the second failure and tt
>>
>> VME Bus Error accessing A32: 0xbffb3330
>>
>> machine check
>>
>> Exception next instruction address: 0x032e1654
>>
>> Machine Status Register: 0x0008b032
>>
>> Condition Register: 0x48002882
>>
>> Task: 0x26fcd40 "CAS-event"
>>
>> 0x26fcd40 (CAS-event): task 0x26fcd40 has had a failure and has been stopped.
>>
>> 0x26fcd40 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
>>
>> iocexample> tt 0x26fcd40
>>
>> 0x0012489c vxTaskEntry +0x48 : epicsThreadEntry ()
>>
>> 0x0334501c epicsThreadEntry+0x80 : 0x032a17f0 ()
>>
>> 0x032a1a8c db_start_events+0x410: 0x032dad44 ()
>>
>> 0x032dae54 rsrvFreePutNotify+0xb44: cas_copy_in_header ()
>>
>> value = 0 = 0x0
>>
>> This was the third failure and tt
>>
>> VME Bus Error accessing A16: 0x347e
>>
>> machine check
>>
>> Exception next instruction address: 0x032a0fe0
>>
>> Machine Status Register: 0x0008b032
>>
>> Condition Register: 0x48000884
>>
>> Task: 0x230d720 "CAS-event"
>>
>> 0x230d720 (CAS-event): task 0x230d720 has had a failure and has been stopped.
>>
>> 0x230d720 (CAS-event): The task has been terminated because it triggered an exception that raised the signal 10.
>>
>> iocexample> tt 0x230d720
>>
>> 0x0012489c vxTaskEntry +0x48 : epicsThreadEntry ()
>>
>> 0x0334501c epicsThreadEntry+0x80 : 0x032a17f0 ()
>>
>> 0x032a1acc db_start_events+0x450: db_delete_field_log ()
>>
>> value = 0 = 0x0
>>
>> So this is interesting. Commit 85822f3051d2236144bb46dc2c24b7e38143e531 appears to fix the problem with the CAC-event errors with soft device support. But that commit does not fix the issue with hardware device support, so there must be an earlier commit that is causing these problems.
>>
>> Mark
>>
>> *From:* Mark Rivers
>> *Sent:* Tuesday, May 31, 2022 3:58 PM
>> *To:* Torsten Bögershausen <Torsten.Bogershausen at ess.eu>; Michael Davidsaver <mdavidsaver at gmail.com>
>> *Cc:* tech-talk at aps.anl.gov
>> *Subject:* RE: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
>>
>> I did a test with the latest 3.15 base, 3.15.9. It works fine.
>>
>> Torsten suggested:
>>
>> > So R7.0.5 is good, and f9ea6a5bff695c5f88bb95dce38a3fd349738907 is bad ?
>>
>> > There are some "real" commits, and merges:
>>
>> > git log R7.0.5..f9ea6a5bff695c5f88bb95dce38a3fd349738907
>>
>> > Then it could make sense, to bisect between those 2?
>>
>> That was a good idea. I did:
>>
>> git bisect start f9ea6a5bff695c5f88bb95dce38a3fd349738907 R7.0.5
>>
>> The results were [bad, good, bad, good, bad, bad]. This is the final output:
>>
>> corvette:local/epics-devel/base-7.0.6>git bisect bad
>>
>> 56f05d722dee4b8ca2968b8bface2737a3a9b185 is the first bad commit
>>
>> commit 56f05d722dee4b8ca2968b8bface2737a3a9b185
>>
>> Author: Ben Franksen <benjamin.franksen at helmholtz-berlin.de>
>>
>> Date: Thu Jan 14 17:38:58 2021 +0100
>>
>> fix in dbGet: decide use of db_field_log based on whether it has copy or not
>>
>> :040000 040000 c097595692ed4936a3c90b5583fe78d635635e07 62d947b9bb7f1bc2c21375dfaf13917f4500d59f M modules
>>
>> Since every single failure, whether in CAS-event or in CAC-event is failing in db_delete_field_log this commit does seem very suspicious. I will run long tests on the commit before this one, and on this commit.
>>
>> Torsten asked:
>>
>> > Could it be that a stack has become too short under vxWorks ?
>>
>> I checked that with checkStack after it failed running 7.0.6. None of the tasks have run out of stack space.
>>
>> iocexample> checkStack
>>
>> NAME ENTRY TID SIZE CUR HIGH MARGIN
>>
>> ------------ ------------ ---------- ----- ----- ----- ------
>>
>> tJobTask 0x1c6b00 0x23b8fa0 8000 224 608 7392
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> tExcTask 0x1c61f8 0x33a5d0 8192 256 1248 6944
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> tLogTask logTask 0x23bc340 5008 368 1232 3776
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> tShell0 shellTask 0x351a600 65536 864 7328 58208
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> tWdbTask 0x261f48 0x26e0a30 8192 272 320 7872
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> ipcom_tickd 0x285b88 0x26c6ea0 6144 256 576 5568
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> tVxdbgTask 0x13185c 0x26da910 8192 208 256 7936
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> tNet0 ipcomNetTask 0x23c1060 10000 240 1904 8096
>>
>> (Exception Stack) 4000 0 192 3808
>>
>> ipcom_syslog 0x16bf48 0x268d220 6144 480 800 5344
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> tNetConf 0x1b74b8 0x26b9a50 6144 640 1488 4656
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> ipcom_telnet ipcom_telnet 0x26cdcc0 6144 512 1152 4992
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> ipsntps 0x1ba330 0x26d1450 6144 416 1056 5088
>>
>> (Exception Stack) 4000 0 96 3904
>>
>> tPortmapd portmapd 0x26d5570 10000 640 1072 8928
>>
>> (Exception Stack) 4000 0 192 3808
>>
>> cbHigh epicsThreadE 0x3464310 22000 288 576 21424
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> timerQueue epicsThreadE 0x3449ff0 12000 448 704 11296
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> scanOnce epicsThreadE 0x3479e60 22000 320 864 21136
>>
>> (Exception Stack) 3696 0 352 3344
>>
>> scan-0.1 epicsThreadE 0x34a8cb0 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 192 3504
>>
>> scan-0.2 epicsThreadE 0x34a2180 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> cbMedium epicsThreadE 0x345b7b0 22000 288 576 21424
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> scan-0.5 epicsThreadE 0x349b650 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 192 3504
>>
>> scan-1 epicsThreadE 0x3494b20 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> scan-2 epicsThreadE 0x348dff0 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> scan-5 epicsThreadE 0x34874c0 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> scan-10 epicsThreadE 0x3480990 22000 432 688 21312
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> cbLow epicsThreadE 0x3452b00 22000 288 576 21424
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> CAC-event epicsThreadE 0x34d4ae0 12000 240 2944 9056
>>
>> (Exception Stack) 3696 656 1344 2352
>>
>> dbCaLink epicsThreadE 0x346b0c0 22000 336 2976 19024
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> poolPoll epicsThreadE 0x34b9e30 8000 272 560 7440
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> CAS-client epicsThreadE 0x3560ef0 22000 784 1056 20944
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> CAS-client epicsThreadE 0x357c910 22000 992 1232 20768
>>
>> (Exception Stack) 3696 0 352 3344
>>
>> CAS-client epicsThreadE 0x3586b10 22000 784 1056 20944
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> CAS-event epicsThreadE 0x230c4b0 12000 320 2784 9216
>>
>> (Exception Stack) 3696 0 176 3520
>>
>> CAS-event epicsThreadE 0x3577040 12000 320 608 11392
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> CAS-event epicsThreadE 0x357ff60 12000 320 2784 9216
>>
>> (Exception Stack) 3696 0 192 3504
>>
>> CAS-TCP epicsThreadE 0x34b2670 12000 768 1632 10368
>>
>> (Exception Stack) 3696 0 144 3552
>>
>> CAS-beacon epicsThreadE 0x34ce050 8000 464 928 7072
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> CAS-UDP epicsThreadE 0x34b69e0 12000 864 1648 10352
>>
>> (Exception Stack) 3696 0 144 3552
>>
>> errlog epicsThreadE 0x343b740 8000 320 592 7408
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> taskwd epicsThreadE 0x3445470 8000 416 560 7440
>>
>> (Exception Stack) 3696 0 96 3600
>>
>> INTERRUPT 5008 0 928 4080
>>
>> value = 63 = 0x3f = '?'
>>
>> Thanks,
>>
>> Mark
>>
>> -----Original Message-----
>> From: Torsten Bögershausen <Torsten.Bogershausen at ess.eu>
>> Sent: Tuesday, May 31, 2022 9:01 AM
>> To: Mark Rivers <rivers at cars.uchicago.edu>; Michael Davidsaver <mdavidsaver at gmail.com>
>> Cc: tech-talk at aps.anl.gov
>> Subject: Re: Bus errors accessing VME with base 7.0.6.1 and latest synApps modules
>>
>> Hej Mark,
>>
>> So R7.0.5 is good, and f9ea6a5bff695c5f88bb95dce38a3fd349738907 is bad ?
>>
>> There are some "real" commits, and merges:
>>
>> git log R7.0.5..f9ea6a5bff695c5f88bb95dce38a3fd349738907
>>
>> Then it could make sense, to bisect between those 2?
>>
>> Another question:
>>
>> Could it make sense to run the SW (maybe even more stripped down) under Linux with valgrind instead ?
>>
>> A 3rd one:
>>
>> Could it be that a stack has become too short under vxWorks ?
>>