EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Appveyor build failures from pvAccessCPP
From: Michael Davidsaver via Core-talk <core-talk at aps.anl.gov>
To: Andrew Johnson <anj at anl.gov>
Cc: core-talk at aps.anl.gov
Date: Wed, 9 Feb 2022 11:47:55 -0800
On 2/9/22 10:53, Andrew Johnson via Core-talk wrote:
My latest Appveyor job for epics-base showed a couple of failures, in the pvAccess testChannelAccess tests (which are unrelated to the CA provider or my commits that triggered this build),

https://github.com/epics-base/pvAccessCPP/issues/98

Not a new issue.  testChannelAccess, which in fact tests PVA only, has a number
of synchronization issues.  The test code is itself quite complex, to the point
that I've never been motivated to dig in to the depth required to straighten
it out.  Frankly, it would mean a rewrite.

https://github.com/epics-base/pvAccessCPP/blob/master/testApp/remote/channelAccessIFTest.cpp

I have mixed feelings about keeping this test.  It would probably have value
in validating future changes to pvAccessCPP.  Until, or unless, that happens
it's just noise.

e.g. maybe someone can come up with a recipe to run this test only during PR
builds in the pvAccessCPP repository?
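
One possible starting point for such a recipe is a small helper that decides, from the CI environment, whether the current job is a pull-request build of the standalone pvAccessCPP repository. This is only a sketch: the function name `is_pr_build_of` is hypothetical, the repository slug is taken from the GitHub URL above, and the environment variables are the ones AppVeyor and GitHub Actions document for their jobs. How the result would be wired into the EPICS test rules is left open.

```python
import os

# Assumption: slug of the standalone repository, taken from the GitHub URL above.
TARGET_REPO = "epics-base/pvAccessCPP"


def is_pr_build_of(repo, env):
    """Return True if env describes a pull-request CI build of repo.

    env is any mapping of environment variables, e.g. os.environ.
    """
    # AppVeyor sets APPVEYOR for every job; the pull request number
    # is only set when the job was triggered by a pull request.
    if env.get("APPVEYOR"):
        return (env.get("APPVEYOR_REPO_NAME") == repo
                and bool(env.get("APPVEYOR_PULL_REQUEST_NUMBER")))
    # GitHub Actions reports the triggering event by name.
    if env.get("GITHUB_ACTIONS") == "true":
        return (env.get("GITHUB_REPOSITORY") == repo
                and env.get("GITHUB_EVENT_NAME") == "pull_request")
    # Unknown CI system or a local build: do not run the noisy test.
    return False


if __name__ == "__main__":
    print(is_pr_build_of(TARGET_REPO, os.environ))
```

With this check, a build of epics-base that pulls in pvAccess as a submodule would skip the test, because the repository name reported by the CI service would not match.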


and an unexplained core-dump analysis of the testCaProvider tests after a silent access violation. I'm just writing this to record what I've found out about it; I'm not expecting anyone else to delve further given that it's a very rare thing.

I see this core dump fairly frequently, although not on every run.

The MSVC debug builds show "pNode = 0xdd", which suggests a use-after-free.
Maybe a double call of 'dbChannelDelete()'?

https://stackoverflow.com/questions/370195/when-and-why-will-a-compiler-initialise-memory-to-0xcd-0xdd-etc-on-malloc-fre
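
To illustrate why a 0xdd value points at freed memory, here is a minimal sketch that checks whether a pointer value consists entirely of one of the fill bytes the MSVC debug runtime writes. The helper name `classify_pointer` is hypothetical, and the sketch assumes that the "0xdd" in the log abbreviates a pointer whose bytes are all 0xDD.

```python
# Fill bytes written by the MSVC debug runtime (see the Stack Overflow link above).
MSVC_FILL_BYTES = {
    0xCD: "heap memory allocated but not yet initialized",
    0xDD: "heap memory that has been freed",
    0xFD: "guard bytes around a heap allocation",
    0xCC: "uninitialized stack memory",
}


def classify_pointer(value, width=8):
    """Describe value if every one of its bytes is the same known fill byte.

    width is the pointer size in bytes (8 on a 64-bit build).
    Returns None when the value does not look like a fill pattern.
    """
    raw = value.to_bytes(width, "little")
    first = raw[0]
    if all(b == first for b in raw) and first in MSVC_FILL_BYTES:
        return MSVC_FILL_BYTES[first]
    return None


if __name__ == "__main__":
    # A 64-bit pointer read out of a freed debug-heap block.
    print(classify_pointer(0xDDDDDDDDDDDDDDDD))
```

A pointer matching the 0xDD pattern was most likely read from an object after it had been freed, which is consistent with the use-after-free suspicion, although it does not by itself say which call freed it.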



On 2/9/22 2:06 AM, AppVeyor via Core-talk wrote:


  Build epics-base base-7.0-48 failed <https://ci.appveyor.com/project/anjohnson/epics-base/builds/42500290>

Commit 2fbaa7f926 <https://github.com/anjohnson/epics-base/commit/2fbaa7f926> by Andrew Johnson <mailto:anj at anl.gov> on 2/8/2022 9:29 PM:
Improve POD documentation of the TSE and TSEL fields

C:/projects/epics-base/modules/pvAccess/testApp/O.windows-x64-debug
testAtomicBoolean.tap ..... ok
testHexDump.tap ........... ok
testInetAddressUtils.tap .. ok
configurationTest.tap ..... ok
testFairQueue.tap ......... ok
testWildcard.tap .......... ok
testChannelAccess.tap .....
not ok 25 - void __cdecl ChannelAccessIFTest::test_channel(void): a destroy event was caught for the testing channel that was destroyed twice
Failed 5/152 subtests
(less 12 skipped subtests: 135 okay)
testCodec.tap ............. ok
testRPC.tap ............... ok
testServerContext.tap ..... ok
testmonitorfifo.tap ....... ok
testsharedstate.tap ....... ok

Test Summary Report
-------------------
testChannelAccess.tap (Wstat: 0 Tests: 148 Failed: 1)
Failed test: 25
Parse errors: Tests out of sequence. Found (23) but expected (21)
Tests out of sequence. Found (24) but expected (22)
Tests out of sequence. Found (25) but expected (23)
Tests out of sequence. Found (26) but expected (24)
Tests out of sequence. Found (27) but expected (25)
Displayed the first 5 of 129 TAP syntax errors.
Re-run prove with the -p option to see them all.
Files=12, Tests=6191, 0 wallclock secs ( 0.33 usr + 0.00 sys = 0.33 CPU)
Result: FAIL
-------------------

The failing part of the testChannelAccess.tap file looks like this:

ok 20 - void __cdecl ChannelAccessIFTest::test_channel(void): channel connection state connected
# void __cdecl ChannelAccessIFTest::test_channel(void): destroying the channel
#SyncChannelRequesterImpl.channelStateChange:2
#ok 21 - SyncChannelRequesterImplvoid __cdecl ChannelAccessIFTest::test_channel(void): channel created count should be the same on the destroyed channel.
channelStateChange:3not ok 22 -
void __cdecl ChannelAccessIFTest::test_channel(void): channel state change count should increase on the destroyed channel
ok 23 - void __cdecl ChannelAccessIFTest::test_channel(void): channel should not be connected
ok 24 - void __cdecl ChannelAccessIFTest::test_channel(void): channel connection state DESTROYED
# void __cdecl ChannelAccessIFTest::test_channel(void): destroying the channel yet again
not ok 25 - void __cdecl ChannelAccessIFTest::test_channel(void): a destroy event was caught for the testing channel that was destroyed twice
# BEGIN TEST void __cdecl ChannelAccessIFTest::test_channelGetWithInvalidChannelAndRequester(void):
#SyncChannelRequesterImpl.channelCreated(Status [type=OK])
#SyncChannelRequesterImpl.channelStateChange:1
ok 26 # SKIP  creating a channel get with a null channel

Unfortunately the test code is emitting other text to stdout, which is messing up the TAP output. The two test results above that were magenta-colored in the original message (21 and 22) aren't being seen or counted properly by the test harness, resulting in it reporting tests out of sequence. There are still 2 failing tests above though, #22 and #25, but only on this VS-2019 dynamic-debug build.
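
The effect can be reproduced with a minimal sketch of a TAP reader. It assumes, as the "Tests out of sequence" messages suggest, that the harness only recognizes a result when "ok" or "not ok" starts a line and that it expects each result number to be one higher than the previous result seen. The function name `find_out_of_sequence` is hypothetical, and the excerpt lines are abbreviated from the output above.

```python
import re

# A TAP result is only recognized when it begins the line.
RESULT = re.compile(r"^(not ok|ok)\s+(\d+)")


def find_out_of_sequence(lines, start=1):
    """Return (found, expected) pairs for results whose number is unexpected."""
    expected = start
    errors = []
    for line in lines:
        match = RESULT.match(line)
        if match is None:
            continue  # comment, diagnostic text, or a result glued to other output
        found = int(match.group(2))
        if found != expected:
            errors.append((found, expected))
        expected += 1
    return errors


# Abbreviated from the failing part of testChannelAccess.tap shown above.
excerpt = [
    "ok 20 - test_channel: channel connection state connected",
    "# test_channel: destroying the channel",
    "#SyncChannelRequesterImpl.channelStateChange:2",
    "#ok 21 - SyncChannelRequesterImpl test_channel: channel created count ...",
    "channelStateChange:3not ok 22 -",
    "test_channel: channel state change count should increase ...",
    "ok 23 - test_channel: channel should not be connected",
    "ok 24 - test_channel: channel connection state DESTROYED",
    "not ok 25 - test_channel: a destroy event was caught ...",
    "ok 26 # SKIP  creating a channel get with a null channel",
]

errors = find_out_of_sequence(excerpt, start=20)
for found, expected in errors:
    print(f"Tests out of sequence. Found ({found}) but expected ({expected})")
```

Results 21 and 22 are skipped because other output was written in front of them on the same line, so every later result is reported two numbers ahead of what the reader expects.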


Then at the end of the build log there is a core-dump and exception analysis <https://ci.appveyor.com/project/anjohnson/epics-base/builds/42500290/job/bcpdlxynwmfx616b#L15729> of the testCaProvider.exe test program, which didn't show up as failing any tests or dying when it was run but does seem to have silently dumped a core file. This shows a destruction problem during atexit cleanups. Whether it's related to an issue in the caProvider itself or the combination of running a local CA client and an IOC in the same process isn't easy to tell though. I probably won't look at this any further unless it starts happening elsewhere.

- Andrew


--
Complexity comes for free, Simplicity you have to work for.



ANJ, 14 Sep 2022