EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Deadlock on VxWorks 6.7
From: "Zimoch Dirk (PSI) via Core-talk" <core-talk at aps.anl.gov>
To: "benjamin.franksen at helmholtz-berlin.de" <benjamin.franksen at helmholtz-berlin.de>
Cc: "core-talk at aps.anl.gov" <core-talk at aps.anl.gov>
Date: Tue, 8 Jun 2021 14:55:38 +0000
The problem that looked like memory corruption is caused by accessing uninitialized memory.

When epicsAtomicCmpAndSwapIntT in callbackInit fails for some reason (in my case, it returned 1), the callback queues
never get initialized and the callback threads never get started. Nevertheless, callbackRequest happily uses these
uninitialized queues without checking:

    mySet = &callbackQueue[priority];
    if (mySet->queueOverflow) return S_db_bufFull;
    pushOK = epicsRingPointerPush(mySet->queue, pcallback);

It could be fixed like this:

    mySet = &callbackQueue[priority];
    if (!mySet->queue) {
        epicsInterruptContextMessage("callbackRequest: Callbacks not initialized\n");
        return S_db_notInit;
    }
    if (mySet->queueOverflow) return S_db_bufFull;
    pushOK = epicsRingPointerPush(mySet->queue, pcallback);
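
To illustrate the guard outside of EPICS, here is a minimal host-side sketch. All names in it (demoCbQueueSet, demoCallbackRequest, the DEMO_* codes, the fixed-size array standing in for the ring buffer) are hypothetical stand-ins, not the definitions from callback.c, and fputs replaces epicsInterruptContextMessage. It also assumes the queue set starts out zeroed, as an object with static storage would, so that a NULL queue pointer reliably means "not initialized".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes; the real code returns S_db_* values. */
enum { DEMO_OK = 0, DEMO_NOT_INIT = 1, DEMO_BUF_FULL = 2 };
#define DEMO_QUEUE_SIZE 4

/* Hypothetical stand-in for one callback queue set. */
typedef struct {
    void **queue;       /* NULL until the set has been initialized */
    size_t count;       /* number of entries currently queued */
    int queueOverflow;  /* set once a push has failed */
} demoCbQueueSet;

static int demoCallbackRequest(demoCbQueueSet *mySet, void *pcallback)
{
    /* Refuse to touch a queue that was never created. */
    if (!mySet->queue) {
        fputs("demoCallbackRequest: callbacks not initialized\n", stderr);
        return DEMO_NOT_INIT;
    }
    if (mySet->queueOverflow) return DEMO_BUF_FULL;
    if (mySet->count == DEMO_QUEUE_SIZE) {
        mySet->queueOverflow = 1;
        return DEMO_BUF_FULL;
    }
    mySet->queue[mySet->count++] = pcallback;
    return DEMO_OK;
}
```

With a zeroed demoCbQueueSet, a request returns DEMO_NOT_INIT instead of dereferencing garbage; once queue points at valid storage, the same call succeeds.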

But then callbackCleanup should not only destroy the queue but also set the pointer to NULL:

        epicsRingPointerDelete(mySet->queue);
        mySet->queue = NULL;

The same probably applies to the event semaphore.
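
The reason for resetting the pointers can be sketched as follows: a NULL check only detects "not initialized" if cleanup restores the NULL state, otherwise a request after cleanup would see a dangling, non-NULL pointer. The types and functions below (demoQueueSet, demoSetInit, demoSetCleanup, demoSetIsReady) are hypothetical, and malloc/free merely stand in for the ring buffer and event semaphore create/destroy calls.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a queue set holding two owned handles. */
typedef struct {
    int *queue;      /* stand-in for the ring buffer handle */
    int *semWakeUp;  /* stand-in for the event semaphore handle */
} demoQueueSet;

/* Destroy both handles and return the set to its "not initialized" state. */
static void demoSetCleanup(demoQueueSet *s)
{
    free(s->semWakeUp);
    s->semWakeUp = NULL;
    free(s->queue);
    s->queue = NULL;
}

/* Create both handles; on failure leave the set fully uninitialized. */
static int demoSetInit(demoQueueSet *s)
{
    s->semWakeUp = malloc(sizeof *s->semWakeUp);
    s->queue = malloc(sizeof *s->queue);
    if (!s->semWakeUp || !s->queue) {
        demoSetCleanup(s);
        return -1;
    }
    return 0;
}

/* The check a request function would make before using the queue. */
static int demoSetIsReady(const demoQueueSet *s)
{
    return s->queue != NULL;
}
```

Because demoSetCleanup leaves both members NULL, calling it twice is harmless in this sketch, and demoSetIsReady gives the same answer before init and after cleanup.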

Also, the error message in callbackInit is misleading: "Warning: callbackInit called again before callbackCleanup".
The mere fact that epicsAtomicCmpAndSwapIntT failed does not mean that callbackInit was called *again*.
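
One way to make the message less misleading would be to report the value the compare-and-swap actually returned. The sketch below uses C11 <stdatomic.h> on a host system purely for illustration; demoCmpAndSwap, demoCallbackInit and the demo* state names are hypothetical, and the only assumption carried over is the contract the check in callbackInit relies on, namely that the compare-and-swap returns the previous value of the target.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical state values mirroring cbInit = 0 and a "running" state. */
enum { demoInitState = 0, demoRunState = 1 };

/* Compare-and-swap that returns the previous value of *target. */
static int demoCmpAndSwap(atomic_int *target, int oldVal, int newVal)
{
    int expected = oldVal;
    /* On failure, 'expected' is overwritten with the current value;
     * on success it still holds oldVal. Either way it is the prior value. */
    atomic_compare_exchange_strong(target, &expected, newVal);
    return expected;
}

/* Returns the state seen by the CAS; demoInitState means init may proceed. */
static int demoCallbackInit(atomic_int *state)
{
    int seen = demoCmpAndSwap(state, demoInitState, demoRunState);
    if (seen != demoInitState) {
        /* Report what was observed rather than guessing at the cause. */
        fprintf(stderr,
                "demoCallbackInit: state was %d, expected %d; not initializing\n",
                seen, demoInitState);
    }
    return seen;
}
```

Printing the observed value would have shown immediately whether the state really was "running" or whether the compare-and-swap itself returned something unexpected.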



On Tue, 2021-06-08 at 10:17 +0000, Zimoch Dirk (PSI) via Core-talk wrote:
> It seems that the problem is caused by the Atomic implementation.
> 
> Adding debug output to callbackInit() showed that it runs only once (as it should) and cbState is 0 (= cbInit). 
> Nevertheless the Atomic CAS fails:
> 
>     if (epicsAtomicCmpAndSwapIntT(&cbState, cbInit, cbRun)!=cbInit) {
>         fprintf(stderr, "Warning: callbackInit called again before callbackCleanup\n");
>         return;
>     }
> 
> Changing the version limit in vxWorks/epicsAtomicOSD.h from
> #if _WRS_VXWORKS_MAJOR * 100 + _WRS_VXWORKS_MINOR >= 606
> to
> #if _WRS_VXWORKS_MAJOR * 100 + _WRS_VXWORKS_MINOR >= 609
> 
> makes it work on my VxWorks 6.7 MV5100 system. There is no callbackInit warning any longer and no crash in scanIoRequest.
> 
> 
> 
> On Tue, 2021-06-08 at 11:30 +0200, Zimoch Dirk wrote:
> > Hi Ben,
> > 
> > 
> > I have applied your patch. Unfortunately it does not help with my problem. The out of bounds access you fixed is
> > only a read access, unlikely to cause memory corruption.
> > 
> > After adding debug output, I see that the devIocStats function scan_time() runs several times without problems but
> > then suddenly crashes in scanIoRequest.
> > 
> > I tested other drivers using I/O Intr scanning and got the same problem. So it seems devIocStats was innocent.
> > 
> > But I see another strange thing. Just after the iocInit message, I noticed "Warning: callbackInit called again
> > before callbackCleanup". It looks like something may be messed up with the callback threads used by I/O Intr.
> > Investigating that now...
> > 
> > Dirk
> > 
> > 
> > On Tue, 2021-06-08 at 10:43 +0200, Ben Franksen via Core-talk wrote:
> > > Am 07.06.21 um 14:53 schrieb Zimoch Dirk (PSI) via Core-talk:
> > > > Further investigation showed that the bug is somehow related to devIocStats (not loading it does not trigger the
> > > > problem).
> > > > The problem seems to be caused by memory corruption. When printing spin->locked (which should be 0 or 1), I get
> > > > 1919513701 = 0x72697465.
> > > > Maybe one of our modifications to devIocStats was buggy.
> > > 
> > > The fix I have just now made a PR for
> > > (https://github.com/epics-modules/iocStats/pull/40) may be related to this.
> > > 
> > > Cheers
> > > Ben
diff --git a/modules/database/src/ioc/db/callback.c b/modules/database/src/ioc/db/callback.c
index 3fa2493..4a26e27 100644
--- a/modules/database/src/ioc/db/callback.c
+++ b/modules/database/src/ioc/db/callback.c
@@ -263,7 +263,9 @@ void callbackCleanup(void)
 
         assert(epicsAtomicGetIntT(&mySet->threadsRunning)==0);
         epicsEventDestroy(mySet->semWakeUp);
+        mySet->semWakeUp = NULL;
         epicsRingPointerDelete(mySet->queue);
+        mySet->queue = NULL;
     }
 
     epicsTimerQueueRelease(timerQueue);
@@ -333,6 +335,10 @@ int callbackRequest(epicsCallback *pcallback)
         return S_db_badChoice;
     }
     mySet = &callbackQueue[priority];
+    if (!mySet->queue) {
+        epicsInterruptContextMessage("callbackRequest: Callbacks not initialized\n");
+        return S_db_notInit;
+    }
     if (mySet->queueOverflow) return S_db_bufFull;
 
     pushOK = epicsRingPointerPush(mySet->queue, pcallback);
