Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: EDM X/Y Plot segfaults
From: "Baily, Scott A" <sbaily@lanl.gov>
To: Bruce Hill <bhill@slac.stanford.edu>, "Sinclair, John William" <sinclairjw@ornl.gov>, Christoph Schroeder <christoph.schroeder@helmholtz-berlin.de>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Fri, 31 Aug 2018 20:48:40 +0000

I never actually saw crashes in the unmodified edm. But when I first added a change to enable 0-length subscriptions, I had some issues related to traceSize and traceSizeLimit being used interchangeably in the original code. I then found more issues related to the need flags being uninitialized. For example, needUpdate is set before any data has been returned. I also found several other class members that were being used uninitialized (valgrind was very helpful).
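
To illustrate the kind of fix described above, here is a minimal sketch that gives every flag and size field a defined value at declaration and keeps the current point count separate from the allocated limit. The struct and helper functions are hypothetical and are not taken from the edm source; only the names needUpdate, traceSize and traceSizeLimit come from this message, and the meanings assigned to the two sizes are an assumption.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for per-trace state; not the real xyGraphClass layout.
struct TraceState {
  // Default member initializers guarantee defined values before any
  // callback or redraw reads them.
  bool needUpdate = false;         // raised only after data has arrived
  std::size_t traceSize = 0;       // assumed meaning: points currently valid
  std::size_t traceSizeLimit = 0;  // assumed meaning: allocated capacity
};

// Record that `count` points arrived, never exceeding the capacity.
inline void onData(TraceState &t, std::size_t count) {
  t.traceSize = std::min(count, t.traceSizeLimit);
  t.needUpdate = true;
}

// Number of points a redraw may safely read.
inline std::size_t pointsToDraw(const TraceState &t) {
  return t.needUpdate ? t.traceSize : 0;
}
```

With this layout a redraw that runs before the first value update sees needUpdate == false and draws nothing, instead of acting on whatever happened to be in memory.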

 

From: Bruce Hill <bhill@slac.stanford.edu>
Sent: Tuesday, August 21, 2018 3:16 AM
To: Sinclair, John William <sinclairjw@ornl.gov>; Christoph Schroeder <christoph.schroeder@helmholtz-berlin.de>; tech-talk@aps.anl.gov; Baily, Scott A <sbaily@lanl.gov>
Subject: Re: EDM X/Y Plot segfaults

 

Hi John,
I don't think anyone here has seen a crash related to the related display and mux widgets.
The times we've seen crashes have nearly all been w/ xygraph widgets.

Re threads, I think the problem relates to the threads introduced by the EPICS CA lib.
gdb shows that CA adds 5 threads.  When CA executes Pv callbacks, they happen on CA threads.

The screens that crashed most often had resetPv's, traceCtlPv's and usually several traces where yPv was an array.
For each of these xygraph has callback functions when the connection goes up and down and
for when the value updates, and these callbacks write to xyGraphClass values.

I've gone through the code many times and haven't found a smoking gun, but it looks to me
like the normal execution of xyGraphClass::executeDeferred() should never call genChronoVector()
with a NULL ptr in yPvData[i] for an active trace, yet the diagnostics I added show that happens
occasionally, particularly when multiple xygraph widgets are active, each w/ many yPv callbacks at 120 Hz.
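
If the callbacks really do run concurrently with the display loop (which is the open question in this thread), one conventional way to rule out this class of race is to guard each trace's buffer with a mutex and let the drawing code work from a snapshot. The sketch below is only an illustration of that pattern: the class SharedTrace and its methods are hypothetical, and nothing here reflects how edm or the CA library actually manage their buffers.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical holder for one trace's data, shared between a callback
// thread (writer) and the display loop (reader).
class SharedTrace {
 public:
  // Called from the callback thread when a new array value arrives.
  void publish(const double *values, std::size_t count) {
    std::lock_guard<std::mutex> lock(mutex_);
    data_.assign(values, values + count);
    ready_ = true;
  }

  // Called from the callback thread when the connection drops.
  void invalidate() {
    std::lock_guard<std::mutex> lock(mutex_);
    data_.clear();
    ready_ = false;
  }

  // Called from the display loop: copies the data out under the lock so
  // drawing never sees a half-updated or released buffer.
  bool snapshot(std::vector<double> &out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!ready_) return false;
    out = data_;
    return true;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<double> data_;
  bool ready_ = false;
};
```

Copying every array on each redraw has a cost at 120 Hz with large waveforms, so a real fix might swap buffers instead of copying; the point of the sketch is only that the reader and the writer never touch the same memory without holding the lock.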

I think Scott's crash also was related to callback load, but in his case it appeared
to be related to arrays > 10000 elements.    I'm not sure what the update rate was on his PVs.
He added an additional test of yPvData[i] in the for_each_trace loop in  genChronoVector()
after my test for NULL yPvData[i] at the top of the function, and apparently that helped
reduce his crash rate even further.

Scott, could you add more details re your test case?

Regards,
- Bruce

On 08/16/2018 08:17 AM, Sinclair, John William wrote:

Edm is single threaded. There is code that takes and releases locks but that was for a particular pv type that was supported long ago. These could be removed and should probably at least be turned into macros.

 

I am investigating a crash observed in the related display widget that happens when the widget contains macro symbols in its properties and resides on a page with a multiplexor that makes use of a controlling pv. You don’t happen to have a similar configuration?

 

As to converting the cvs repo to git, give me a bit of time. Work demands are high at the moment.

 

Thanks,

John Sinclair

 

From: Bruce Hill <bhill@slac.stanford.edu>
Sent: Wednesday, August 15, 2018 7:06 PM
To: Sinclair, John William <sinclairjw@ornl.gov>; Christoph Schroeder <christoph.schroeder@helmholtz-berlin.de>; tech-talk@aps.anl.gov
Subject: Re: EDM X/Y Plot segfaults

 

Hi John,
I spent most of a day trying to reduce our test case to something portable,
but can only get it to fail with multiple live 120 Hz arrays, and at least 2 screens w/
multiple xygraph widgets.

As best I can tell the crashes are due to race conditions between threads.   The patches
that Scott and I came up with greatly reduce the crash rate to the point where I have
not seen any additional crashes, but I wouldn't say these patches are a complete fix.

Would you be amenable to converting your edm repo to git and pushing it to github?
This would be a big aid for collaboration, allowing us to post proposed patches as
pull requests w/ all github's nice web based commenting and notifications.

I've had a lot of experience in doing this w/ svn and CVS based repos and could help
with advice if needed.

Best regards,
- Bruce Hill

On 08/13/2018 10:13 AM, Sinclair, John William wrote:

I am finally getting back to edm. Sorry for the delay.

 

I see that the author of the epics independent code left out the 0 element subscription so this can be fixed. As to the issue of the widget causing crashes, does someone have a recipe to induce a predictable crash in the latest version? If so, let me know at your convenience.

 

Thanks,

John

 

From: tech-talk-bounces@aps.anl.gov <tech-talk-bounces@aps.anl.gov> On Behalf Of Christoph Schroeder
Sent: Friday, May 25, 2018 5:40 AM
To: tech-talk@aps.anl.gov
Subject: Re: EDM X/Y Plot segfaults

 

Hi all,

if any Debian package maintainer is interested, I prepared the package for build and use on Debian Stretch/Jessie at HZB, based on epicsdeb:
https://github.com/chrschroeder/edm
I already added the last patch sent by Bruce.
The package was built against an updated version (3.15.5) of the epics base package from epicsdeb:
https://github.com/chrschroeder/epics-base
They are still considered "testing". Feel free to use it.

Since managing a patch queue is somewhat of a hassle, I would also appreciate it if Bruce's patches could be merged back into an official edm release.

Best regards,
Christoph

On 05/25/2018 12:37 AM, Bruce Hill wrote:

Sorry, typo in my github repo.  Should be:
https://github.com/bhill-slac/edm.git

On 05/24/2018 02:32 PM, Bruce Hill wrote:

Hi Scott,
We're basing our work off https://github.com/epicsdeb/edm.git and
have added edm releases up to 1-12-105B on the upstream branch
which I've pushed here:
https://github.com:bhill-slac/edm.git

Both of the above are based on release tarballs.

Ideally we'd prefer to work from John Sinclair's full version history
if someone could convert it to git and post on github.

Regards,
- Bruce

On 05/24/2018 08:43 AM, Baily, Scott A wrote:

I also have some patches for the XYplot.  I’ve added the feature of supporting 0 element subscriptions to edm, and modified the plot widget so it won’t show trailing 0’s past the end of the waveform.  Also, I fixed a number of crashes with the XYplot.  Most had to do with uninitialized values being used.  I sent these to John Sinclair early this year, but there haven’t been any releases of EDM in over a year. Is anyone maintaining EDM? I saw some code on github, but it had not been updated with the latest release of EDM.

--

Correspondence

--

Scott Baily

AOT-IC, MS H820

Los Alamos National Laboratory

Los Alamos, NM 87545

ph: (505) 606-2260

 

From: tech-talk-bounces@aps.anl.gov <tech-talk-bounces@aps.anl.gov> On Behalf Of Bruce Hill
Sent: Wednesday, May 23, 2018 7:16 PM
To: Eric Norum <wenorum@lbl.gov>
Cc: Gregory Portmann <gregportmann@hotmail.com>; Michael Chin <MJChin@lbl.gov>; EPICS Tech-Talk <tech-talk@aps.anl.gov>
Subject: Re: EDM X/Y Plot segfaults

 

Hi Eric,
We use xygraph for many displays at SLAC and generally it works well without crashing.

However, recently one of our developers found a repeatable crash related to xygraph that she suggested may be related to your crash.

The crash scenario she found involves one screen running an xygraph w/ several traces, each w/ 8k x and y arrays.   If we then try to open or execute another screen w/ 6 xygraph's, each w/ 8 similar traces, it would crash every time.   gdb stack trace wasn't exactly the same as yours and would vary as to where it crashed, but it always involved a NULL yPvData[i].

Cutting back on the number of xygraphs in the 2nd screen or the number of traces appears to
make the crash less likely, and if I remove enough of them, it can succeed.

My hypothesis re the root cause is that the xygraph is redrawing before all the PV's have reconnected.    To address this, I added tests for NULL xPvData[i] and NULL yPvData[i] at
the top of genXyVector() and genChronoVector().    If EDMDEBUGMODE is set to a non-zero
value it prints an error msg each time xygraph would have crashed.

With this patch we can run the full test screens and see 4 or 5 of the NULL yPvData error msgs.
The trace isn't drawn if xPvData[i] or yPvData[i] is NULL, but it doesn't crash and subsequent
trace updates work ok.
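
For readers who do not have the attachment, the guard described above might look roughly like the following. This is a sketch, not the attached patch: the helper names are hypothetical, and it assumes EDMDEBUGMODE is read from the environment.

```cpp
#include <cstdio>
#include <cstdlib>

// Returns true when the EDMDEBUGMODE environment variable holds a
// non-zero integer (assumption: the flag is read from the environment).
inline bool debugEnabled() {
  const char *v = std::getenv("EDMDEBUGMODE");
  return v != nullptr && std::atoi(v) != 0;
}

// Hypothetical guard: returns true when the buffer for trace i is usable.
// When it is not, optionally reports the event instead of crashing.
inline bool bufferReady(const void *buf, const char *name, int i) {
  if (buf != nullptr) return true;
  if (debugEnabled()) {
    std::fprintf(stderr, "xygraph: NULL %s[%d], trace not drawn\n", name, i);
  }
  return false;
}
```

At the top of a vector-generating function the call would be something like `if ( !bufferReady( yPvData[i], "yPvData", i ) ) return;`, so the trace is skipped for this update and drawn normally once the data pointer is valid.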

I don't know if this patch will fix your issue, but the patch is attached if you want to give it a try.

Cheers,
- Bruce





On 05/09/2018 11:38 AM, Eric Norum wrote:

Hmm…sorry to keep following up my own posting, but shortly after I sent the message containing what I thought was a work-around I got another segfault.  It looks like there are *lots* of places that can have null pointer dereferences in the methods invoked from xyGraphClass::executeDeferred.  I hope there are still some edm maintainers out there who can help with this, since my quick hack fix clearly isn’t going to work.

 

I found that hacking in the following changes seems to stop the segfaults and result in waveform display.  I suspect that there’s a better fix that involves not invoking this method when the pointers in question are invalid, but I don’t have any idea what that would entail.

 

diff -u baselib/xygraph.cc.orig baselib/xygraph.cc
--- baselib/xygraph.cc.orig     2018-05-09 10:49:02.055680000 -0700
+++ baselib/xygraph.cc  2018-05-09 10:52:04.804485000 -0700
@@ -6298,6 +6298,7 @@
   arrayNumPoints[i] = 0;
 
   for ( ii=0; ii<yPvCount[i]; ii++ ) {
+    if (!yPvData[i]) dyValue = 0; else
 
     // There are two views of pv types, Type and specificType; this uses
     // specificType
@@ -6413,7 +6414,7 @@
 
 #endif
 
-    dxValue = ( (double *) xPvData[i] )[ii];
+    if (xPvData[i]) dxValue = ( (double *) xPvData[i] )[ii]; else dxValue = 0;
 
     if ( xAxisStyle == XYGC_K_AXIS_STYLE_LOG10 ) {
       dxValue = loc_log10(  dxValue  );
 






On May 9, 2018, at 10:28 AM, Eric Norum <wenorum@lbl.gov> wrote:

 

Displaying X/Y data in EDM on OS X and Linux often results in a segmentation fault.  Here’s where the fault occurs:

 

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff1f16f3c in xyGraphClass::genChronoVector (this=0x96c6f0, i=0, 
    rescale=0x7fffffffc2ec) at ../xygraph.cc:6344
6344          dyValue = ( (double *) yPvData[i] )[ii];
(gdb) where
#0  0x00007ffff1f16f3c in xyGraphClass::genChronoVector (this=0x96c6f0, i=0, 
    rescale=0x7fffffffc2ec) at ../xygraph.cc:6344
#1  0x00007ffff1f1c186 in xyGraphClass::executeDeferred (this=0x96c6f0)
    at ../xygraph.cc:9599
#2  0x00007ffff7afe07d in activeWindowClass::processObjects (this=0x927880)
    at ../act_win.cc:22289
#3  0x00007ffff7af5858 in appContextClass::applicationLoop (this=0x636d20)
    at ../app_pkg.cc:6592
#4  0x0000000000405562 in main (argc=<value optimized out>, 
    argv=<value optimized out>) at ../main.cc:2806
(gdb) list
6339          else {
6340            dyValue = (double) ( (unsigned short *) yPvData[i] )[ii];
6341          }
6342          break;
6343        default:
6344          dyValue = ( (double *) yPvData[i] )[ii];
6345          break;
6346        }
6347   
6348        if ( y1AxisStyle[yi] == XYGC_K_AXIS_STYLE_LOG10 ) {
(gdb) print yPvData
$3 = {0x0 <repeats 20 times>}
(gdb) print &yPvData
$4 = (void *(*)[20]) 0x96f240
(gdb) print i
$5 = 0
(gdb) print yPvData[0]
$6 = (void *) 0x0
(gdb) 






For some reason yPvData is full of null pointers which results in a segfault when dereferenced by the [ii] subscript.
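
The listing casts an untyped pointer to the element type and indexes it with no check, so one null entry is enough to fault. A minimal sketch of the same type dispatch with the check done once, before any cast, could look like this. The enum and function are hypothetical; the real code switches on the PV's specificType, whose values are not reproduced here.

```cpp
#include <cstddef>

// Hypothetical element-type tag standing in for the PV's specificType.
enum class ElemKind { Short, UShort, Double };

// Reads element `index` of an untyped buffer as a double.
// Returns false, leaving `out` untouched, when the buffer is null.
inline bool readElement(const void *buf, ElemKind kind, std::size_t index,
                        double &out) {
  if (buf == nullptr) return false;
  switch (kind) {
    case ElemKind::Short:
      out = static_cast<double>(static_cast<const short *>(buf)[index]);
      break;
    case ElemKind::UShort:
      out = static_cast<double>(
          static_cast<const unsigned short *>(buf)[index]);
      break;
    case ElemKind::Double:
      out = static_cast<const double *>(buf)[index];
      break;
  }
  return true;
}
```

A caller can then skip the point, or the whole trace, when the function returns false rather than dereferencing a null pointer.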

Any ideas why this happens sometimes?   I can get a good display maybe two or three times in a row and then get the segfault.

 

— 

Eric Norum

 






-- 
Bruce Hill
Member Technical Staff
SLAC National Accelerator Lab
2575 Sand Hill Road M/S 10
Menlo Park, CA  94025










-- 
(bb|[^b]{2})

 







