Hi Torsten,
On Oct 28, 2021, at 3:56 AM, Torsten Bögershausen via Core-talk
<core-talk at aps.anl.gov> wrote:
Digging into the last "Build failed" messages
(thanks to Karl for reminding me)
It seems as if there is a problem with this test case:
not ok 54 - dbGetField("li2", 5) -> 3 == 0
You saw several “Build failed” messages yesterday because I was clicking
Appveyor’s “Re-run Incomplete” button on the base-7.0-419 build to see
if I could get all the failing test configurations to pass. There were 3
that failed 4 times yesterday, at which point I gave up.
We have a Discussion on GitHub
<https://github.com/epics-base/epics-base/discussions/162> where we’ve
been tracking the tests in Base that fail occasionally. Please see the
comments about this particular failure
<https://github.com/epics-base/epics-base/discussions/162#discussioncomment-1460745> for
some more background on it.
And sometimes problems with packages.chocolatey.org, see further down.
Unfortunately those aren’t something that we know how to fix. Apparently,
when they happen, our Appveyor VM can’t access the servers that we
install necessary packages from. If there’s a way to tell Appveyor
“please re-run this build configuration later or on a different VM” that
might be something that could be included in our CI scripts which do the
setup. Ideas welcome, but I’m not hopeful we could fix this ourselves.
Are there any good ideas, what could be done about the
dbGetField("li2", 5) -> 3 == 0
failure ?
Disable it under Windows ?
Retry, if it failed ?
Add a sleep?
sleep & retry if it failed ?
Any other good ideas ?
I think both Michael and I prefer that we find a way to fix the test
code so it waits for the event properly. Adding sleeps just causes test
runs to take longer on systems where the delay isn’t generally
required, and you never really know how long you need to wait.
There’s something about the Appveyor VM which causes it to schedule
threads in unusual ways, and that’s probably a good thing for our tests
in the long run — it makes us get them right.
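As a rough illustration of what “waiting for the event properly” means, here is a minimal sketch in plain POSIX threads rather than the EPICS test API. The update_event type, its functions, and run_demo() are hypothetical names made up for this example, and the value 3 is just a stand-in for whatever the update delivers. The consumer blocks until the producer posts, so it is correct however the threads happen to be scheduled and never waits longer than necessary.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot event carrying a value; not an EPICS type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signalled;
    int             value;
} update_event;

static void update_event_init(update_event *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signalled = false;
    ev->value = 0;
}

/* Producer side: publish the value and wake the waiter. */
static void update_event_post(update_event *ev, int value)
{
    pthread_mutex_lock(&ev->lock);
    ev->value = value;
    ev->signalled = true;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Consumer side: block until a value has been posted.
 * The loop guards against spurious wakeups, and the flag covers
 * the case where the post happens before the wait starts. */
static int update_event_wait(update_event *ev)
{
    int value;

    pthread_mutex_lock(&ev->lock);
    while (!ev->signalled)
        pthread_cond_wait(&ev->cond, &ev->lock);
    value = ev->value;
    ev->signalled = false;
    pthread_mutex_unlock(&ev->lock);
    return value;
}

static void *producer(void *arg)
{
    update_event_post((update_event *)arg, 3); /* stand-in value */
    return NULL;
}

/* Start the producer, wait for its update, and return the value
 * (or -1 if the thread could not be created). No sleep is used. */
int run_demo(void)
{
    update_event ev;
    pthread_t tid;
    int value;

    update_event_init(&ev);
    if (pthread_create(&tid, NULL, producer, &ev) != 0)
        return -1;
    value = update_event_wait(&ev);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.lock);
    return value;
}
```

Calling run_demo() from a test returns the posted value once the producer has run. A sleep-based version would have to guess a delay and could still read the value too early on a slow or oddly scheduled VM.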
I just took another look at the failure and I think I have a fix:
index 387ee7299..7bb8df7f4 100644
--- a/modules/database/test/std/rec/regressLinkSevr.db
+++ b/modules/database/test/std/rec/regressLinkSevr.db
@@ -6,11 +6,10 @@ record(stringin, "si1") {
 }
 record(longin, "li1") {
     field(INP, "ai.SEVR")
-    field(FLNK, "si2")
 }
 record(stringin, "si2") {
-    field(INP, "ai.SEVR CA")
+    field(INP, "ai.SEVR CP")
     field(FLNK, "li2")
 }
 record(longin, "li2") {
The test is already waiting for the cnt record to process before it
checks the ENUM values. The above change ensures that the two records
which read the SEVR field over CA won’t actually process until the
update has arrived, so the return from testMonitorWait() will be delayed
appropriately. This also means the call to dbCaSync() is no longer required:
index 95217043d..7580a3402 100644
--- a/modules/database/test/std/rec/regressTest.c
+++ b/modules/database/test/std/rec/regressTest.c
@@ -197,7 +197,6 @@ void testLinkSevr(void)
     testdbPutFieldOk("si1.PROC", DBF_LONG, 1);
     testMonitorWait(mon);
-    dbCaSync(); /* wait for update */
     testdbGetFieldEqual("si1", DBF_STRING, "INVALID");
     testdbGetFieldEqual("li1", DBF_LONG, INVALID_ALARM);
I will commit and push these changes now and we’ll see if that solves it.
- Andrew
--
Complexity comes for free, simplicity you have to work for.