Subject: Re: Excessive scan times for periodic scans
From: Michael Davidsaver via Core-talk <[email protected]>
To: "Konrad, Martin" <[email protected]>, "Johnson, Andrew N." <[email protected]>, EPICS Core Talk <[email protected]>
Date: Tue, 14 May 2019 20:38:12 -0700
Since I've been playing with systemtap recently, I did a quick test.

The libca searchTimer does not work as I (naively) expected.
It has a 0.032000 sec. periodic timer (also 0.064000, 0.128000, etc.)
running regardless of whether there is a search in progress for that
interval.

So the shortest timer is expiring, and being restarted, most
frequently, and each restart gives it an expiration sooner than anything
already queued.  The worst case of the insertion sort is therefore the
most common one...

Thankfully libca does not share a timer queue.  So the
worst case is only 15 steps.  Though this does illustrate
how easily the assumption behind insertion sort can be violated.
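
As an illustration of that cost, a sketch in standard C++ (not the libCom
code; startTimer and timerList are invented names that only mirror the
tail-first scan in timer::privateStart()):

// Insert into a list kept sorted by expiration time, scanning from the
// tail the way timer::privateStart() does.  A timer that expires sooner
// than everything already queued pays the full O(n) walk.
#include <cstdio>
#include <iterator>
#include <list>

static std::list<double> timerList;   // expiration times, ascending

// Returns the number of steps taken, mirroring the N counted by the patch.
static int startTimer(double expire)
{
    int steps = 0;
    std::list<double>::iterator it = timerList.end();
    for (;;) {
        ++steps;
        if (it == timerList.begin()) {
            timerList.push_front(expire);   // sooner than everything queued
            break;
        }
        std::list<double>::iterator prev = std::prev(it);
        if (*prev <= expire) {
            timerList.insert(it, expire);   // found its slot part-way in
            break;
        }
        it = prev;
    }
    return steps;
}

int main()
{
    // A doubling ladder of 15 intervals as a stand-in for the searchTimers,
    // queued longest-first so the shortest ends up at the head of the list.
    for (int k = 14; k >= 0; k--) {
        double t = 0.032 * (1 << k);
        int steps = startTimer(t);
        std::printf("insert %8.3f sec: %2d steps, %2zu timers queued\n",
                    t, steps, timerList.size());
    }
    return 0;
}

Inserting the shortest interval once the other 14 are already queued takes
the full 15 steps, comparable to the start14/15 lines in the trace output
below.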

I also notice that the reschedule() call (an epicsEvent::signal()) happens
while the mutex is held.  Depending on thread priorities, this may bounce.
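
To illustrate the bounce with standard primitives (std::mutex and
std::condition_variable rather than epicsMutex/epicsEvent; all names here
are invented): signalling while the lock is still held lets a
higher-priority waiter wake up only to block immediately on the same mutex,
while signalling after the guard is released avoids that extra round trip.

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

static std::mutex queueLock;
static std::condition_variable wakeup;
static bool rescheduleNeeded = false;

// Stand-in for the timer queue thread waiting to be rescheduled.
static void timerThread()
{
    std::unique_lock<std::mutex> guard(queueLock);
    wakeup.wait(guard, []{ return rescheduleNeeded; });
    std::printf("timer thread rescheduled\n");
}

// Signal while still holding the lock: a higher-priority waiter can wake,
// immediately block on queueLock, and force an extra context switch.
static void signalUnderLock()
{
    std::lock_guard<std::mutex> guard(queueLock);
    rescheduleNeeded = true;
    wakeup.notify_one();
}

// Set the flag under the lock, but signal after the guard is released, so
// the woken thread can take the lock without bouncing.
static void signalAfterUnlock()
{
    {
        std::lock_guard<std::mutex> guard(queueLock);
        rescheduleNeeded = true;
    }
    wakeup.notify_one();
}

int main()
{
    std::thread waiter(timerThread);
    signalAfterUnlock();    // or signalUnderLock(); both wake the waiter
    waiter.join();
    return 0;
}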

The attached patch adds a trace point to timer.cpp in libCom that fires
when a timer is started.  It reports how many steps the insertion
sort had to make, and the total number of timers in the queue.
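
For anyone who has not used <sys/sdt.h> probes before: they compile to
essentially a nop plus an ELF note describing the arguments, and cost
nothing until a tracer attaches.  A stand-alone example (the provider and
probe names "demo"/"loopIteration" are made up, unlike the patch's
epicsBase/timerStart):

// Stand-alone SDT (USDT) probe point, built on Linux with g++ and the
// <sys/sdt.h> header from systemtap-sdt-dev.
#include <sys/sdt.h>
#include <unistd.h>

int main()
{
    for (unsigned i = 0; ; i++) {
        // Expands to a nop plus an ELF note; free until a tracer such as
        // systemtap attaches to the mark.
        DTRACE_PROBE1(demo, loopIteration, i);
        sleep(1);
    }
}

stap attaches to it with a probe like
process("./a.out").provider("demo").mark("loopIteration"), the same pattern
the attached timerQueue.stp uses for the patch's marks.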

Userspace tracing with systemtap is painful to set up; cf. the absolute path
in the attached timerQueue.stp, and the large size of the kernel debug
symbols.  Once it's working, though, it's quite powerful.

Run it like:

> sudo stap -x `pgrep softIoc` timerQueue.stp

The output below is from tracing a softIoc with a single CA link to a non-existent PV.

...
> start14/15
> # cancel 0
> value |-------------------------------------------------- count
>     1 |                                                     0
>     2 |                                                     0
>     4 |                                                     1
>     8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   336
>    16 |@                                                   11
>    32 |                                                     0
>    64 |                                                     0
> 
> start13/15
> start14/15
...


I have Debian 9 with a local build of systemtap 4.0 to go along
with the Debian backport kernel I run.

> Systemtap translator/driver (version 4.0/0.168, Debian version 4.0-1.md1 (UNRELEASED))

> $ aptitude search systemtap|grep ^i
> i  systemtap - instrumentation system for Linux
> i  systemtap-common - instrumentation system for Linux (common component)
> i  systemtap-doc - documentation and examples for SystemTap
> i  systemtap-runtime - instrumentation system for Linux (runtime component)
> i  systemtap-sdt-dev - statically defined probes development files

Make sure to install kernel debug symbols.

> $ aptitude search linux-image|grep ^i
> i A linux-image-4.19.0-0.bpo.4-amd64 - Linux 4.19 for 64-bit PCs (signed)
> i  linux-image-4.19.0-0.bpo.4-amd64-dbg - Debug symbols for linux-image-4.19.0-0.bpo.4-amd64



On 5/13/19 5:46 PM, Konrad, Martin via Core-talk wrote:
> Hi Andrew,
> Sorry about the delay. I got sidetracked...
> 
>> What other resources do your devices share that might limit how fast the
>> IOC can communicate with them?
> From looking at the Asyn code I first thought that there were no shared
> resources (Asyn usually allocates all resources per port). So I
> moved the IOC to a physical machine and fired up the profiler to
> generate some flame graphs (open them in a web browser).
> 
> [Explanation: Flame graphs visualize the stack traces sampled by the
> profiler (statistical profiling). The stack traces are plotted from the
> bottom to the top. The x axis shows the different samples, ordered
> alphabetically by the stack trace, merging neighboring samples if they
> have the same backtrace. This means sharp spikes correspond to
> backtraces that occurred only rarely, whereas wide plateaus represent
> backtraces that were found frequently.]
> 
> The first flame graph [1] shows CPU time spent by the IOC process. The
> interesting part is the "scan-1" thread on the right (the thread that is
> running slow). Zooming into this thread shows that queueRequest ->
> timer::start() -> epicsMutex::lock() dominates (89% of the time spent in
> queueRequest() is spent in epicsMutex::lock()). A quick look at the code
> reveals that starting a timer involves acquiring the lock on the timer
> queue [2]. As it turns out, the relevant timer queue is indeed shared
> between Asyn ports.
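
As an illustration of that contention pattern (standard C++ with invented
names and sizes, not Asyn or libCom code): every port thread serializes on
the same lock, and each one holds it for the full O(n) scan of the shared
list.

#include <cstdio>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

static std::mutex sharedLock;               // one lock for every "port"
static std::list<double> sharedTimerList;   // expirations, ascending

static void portThread(int id, int inserts)
{
    for (int i = 0; i < inserts; i++) {
        std::lock_guard<std::mutex> guard(sharedLock);
        double expire = 0.1 * ((id + i) % 50);
        // The O(n) scan happens under the lock: the longer the list, the
        // longer the lock is held, and the longer every other port waits.
        std::list<double>::iterator pos = sharedTimerList.begin();
        while (pos != sharedTimerList.end() && *pos < expire)
            ++pos;
        sharedTimerList.insert(pos, expire);
        sharedTimerList.erase(sharedTimerList.begin());  // keep size bounded
    }
}

int main()
{
    // Pre-populate so every scan has something to walk.
    for (int i = 0; i < 1000; i++)
        sharedTimerList.push_back(0.002 * i);

    std::vector<std::thread> ports;
    for (int id = 0; id < 8; id++)
        ports.emplace_back(portThread, id, 20000);
    for (auto &t : ports)
        t.join();
    std::printf("done: %zu timers still queued\n", sharedTimerList.size());
    return 0;
}
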
> 
> Note that you have to take the CPU time flame graph with a grain of salt
> because it doesn't include time the machine spent running other
> processes or sitting idle (remember that was part of the problem: each
> core was >=30% idle). So I also profiled off-CPU time [3]. This again
> revealed that 86% of the off-CPU time in the queueRequest function is
> related to the epicsMutex::lock() call in timer::start().
> 
> This data suggests that the problem is either in the implementation of
> the timer queue or in the way Asyn uses it (or both). Looking at the
> timer code, I suspect that the implementation of the timer queue
> might not be ideal: the data structure used for holding the timers is a
> linked list, which might result in a significant number of cache misses
> while searching the list in timer::start() for the right place to insert
> the new timer. I suspect that, thanks to pre-fetching, a priority queue
> based on a vector might perform better.
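
A sketch of that alternative (standard C++, purely illustrative, not libCom
code): a binary heap on a contiguous vector keeps each insert at O(log n)
over cache-friendly memory, and the next expiration is always at top().

// Timer queue as a binary heap on a vector (std::priority_queue) instead
// of a sorted doubly linked list.
#include <cstdio>
#include <queue>
#include <vector>

struct PendingTimer {
    double expire;      // absolute expiration time, seconds
    int    id;          // stand-in for the notify callback
};

// Order by expiration so top() is the timer that expires next.
struct ExpireLater {
    bool operator()(const PendingTimer &a, const PendingTimer &b) const
    { return a.expire > b.expire; }
};

using TimerHeap =
    std::priority_queue<PendingTimer, std::vector<PendingTimer>, ExpireLater>;

int main()
{
    TimerHeap heap;
    // Start a batch of timers in arbitrary order; each push is O(log n).
    for (int i = 0; i < 16; i++)
        heap.push({0.032 * ((i * 7) % 16 + 1), i});

    // Expire them in order: top() is always the soonest deadline.
    while (!heap.empty()) {
        std::printf("expire t=%.3f id=%d\n", heap.top().expire, heap.top().id);
        heap.pop();
    }
    return 0;
}

One trade-off worth noting: cancelling an arbitrary timer is no longer O(1)
given a node pointer, as it is with the intrusive linked list; a plain
binary heap needs lazy deletion or an indexed heap for that.
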
> 
> Any thoughts?
> 
> -Martin
> 
> [1] https://www.martin-konrad.net/nextcloud/index.php/s/bT2PooiYzS5xeeb
> [2]
> https://git.launchpad.net/epics-base/tree/src/libCom/timer/timer.cpp?h=3.15#n62
> [3] https://www.martin-konrad.net/nextcloud/index.php/s/dZpTtZfSqbNcm7f
> 

diff --git a/modules/libcom/src/timer/timer.cpp b/modules/libcom/src/timer/timer.cpp
index 6f0c7ea4a..282c383d3 100644
--- a/modules/libcom/src/timer/timer.cpp
+++ b/modules/libcom/src/timer/timer.cpp
@@ -19,6 +19,14 @@
 #include <stdexcept>
 #include <stdio.h>
 
+#ifdef __linux__
+#include <sys/sdt.h>
+#else
+#define DTRACE_PROBE(A,B) do{}while(0)
+#define DTRACE_PROBE1(A,B,C) do{(void)(C);}while(0)
+#define DTRACE_PROBE2(A,B,C,D) do{(void)(C);(void)(D);}while(0)
+#endif
+
 #define epicsExportSharedSymbols
 #include "epicsGuard.h"
 #include "timerPrivate.h"
@@ -72,6 +80,7 @@ void timer::privateStart ( epicsTimerNotify & notify, const epicsTime & expire )
     if ( this->curState == stateActive ) {
         // above expire time and notify will override any restart parameters
         // that may be returned from the timer expire callback
+        DTRACE_PROBE1(epicsBase, timerStart, 0);
         return;
     }
     else if ( this->curState == statePending ) {
@@ -94,7 +103,9 @@ void timer::privateStart ( epicsTimerNotify & notify, const epicsTime & expire )
     // **** this should use a binary tree ????
     //
     tsDLIter < timer > pTmr = this->queue.timerList.lastIter ();
+    size_t N = 0;
     while ( true ) {
+        N++;
         if ( ! pTmr.valid () ) {
             //
             // add to the beginning of the list
@@ -116,6 +127,9 @@ void timer::privateStart ( epicsTimerNotify & notify, const epicsTime & expire )
         --pTmr;
     }
 
+    size_t total = this->queue.timerList.count();
+    DTRACE_PROBE2(epicsBase, timerStart, N, total);
+
     this->curState = timer::statePending;
 
     if ( reschedualNeeded ) {
@@ -137,6 +151,7 @@ void timer::cancel ()
 {
     bool reschedual = false;
     bool wakeupCancelBlockingThreads = false;
+    DTRACE_PROBE1(epicsBase, timerCancel, 0);
     {
         epicsGuard < epicsMutex > locker ( this->queue.mutex );
         this->pNotify = 0;
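
# timerQueue.stp (the attached systemtap script referenced above)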
probe begin {
    printf("Start with target=%d...\n", target())
}

private global nstart
private global ncancel

probe process("/home/mdavidsaver/work/epics/base-git/lib/linux-x86_64-debug/libCom.so").provider("epicsBase").mark("timerStart")
{
    if(pid()!=target()) next
    println("start ", $N, "/", $total)
    nstart <<< $N
}

probe process("/home/mdavidsaver/work/epics/base-git/lib/linux-x86_64-debug/libCom.so").provider("epicsBase").mark("timerCancel")
{
    if(pid()!=target()) next
    ncancel++
}

probe timer.ms(5000)
{
    try {
        printf("# cancel %d\n", ncancel)
        print(@hist_log(nstart))
        delete nstart
        ncancel = 0
    } catch(msg) {
        println("Error:", msg)
    }
}
