When I say that edm is single threaded, I mean there should never be contention for resources among multiple threads.
Edm calls ca_context_create( ca_disable_preemptive_callback );
I believe that forces callbacks to always occur in the thread in which the callback was requested.
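
A minimal sketch of that non-preemptive model, not edm or CA code. Going by the CA reference manual, with ca_disable_preemptive_callback the callbacks are delivered only while the thread that created the context is inside a CA call such as ca_pend_event() or ca_poll(). The class name DeferredCallbackQueue and its post()/poll() methods are hypothetical stand-ins: other threads may queue work at any time, but handlers run only on the thread that polls.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a non-preemptive callback context: producer
// threads may post events at any time, but handlers run only inside poll(),
// on whichever thread calls poll().
class DeferredCallbackQueue {
public:
    // Safe to call from any thread (for example a network receive thread).
    void post(std::function<void()> handler) {
        assert(static_cast<bool>(handler));  // reject empty handlers early
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(handler));
    }

    // Runs every queued handler on the calling thread; returns how many ran.
    int poll() {
        std::queue<std::function<void()>> batch;
        {
            // Hold the lock only long enough to take the pending handlers.
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);
        }
        int ran = 0;
        while (!batch.empty()) {
            batch.front()();
            batch.pop();
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};
```

Under this model the widget state is touched only from the polling thread, so handlers need no locking of their own. Whether edm's callbacks really all arrive this way is the open question in this thread.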
I’ll keep looking for a scenario that crashes more predictably.
Thanks,
John
Hi John,
I don't think anyone here has seen a crash related to the related display and mux widgets.
The times we've seen crashes have nearly all been w/ xygraph widgets.
Re threads, I think the problem relates to the threads introduced by the EPICS CA lib.
gdb shows that CA adds 5 threads. When CA executes Pv callbacks, they happen on CA threads.
The screens that crashed most often had resetPv's, traceCtlPv's and usually several traces where yPv was an array.
For each of these xygraph has callback functions when the connection goes up and down and
for when the value updates, and these callbacks write to xyGraphClass values.
I've gone through the code many times and haven't found a smoking gun, but it looks to me
like the normal execution of xyGraphClass::executeDeferred() should never call genChronoVector()
with a NULL ptr in yPvData[i] for an active trace, yet the diagnostics I added show that happens
occasionally, particularly when multiple xygraph widgets are active, each w/ many yPv callbacks at 120hz.
I think Scott's crash also was related to callback load, but in his case it appeared
to be related to arrays > 10000 elements. I'm not sure what the update rate was on his PVs.
He added an additional test of yPvData[i] in the for_each_trace loop in genChronoVector()
after my test for NULL yPvData[i] at the top of the function, and apparently that helped
reduce his crash rate even further.
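
The two tests described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: TraceBuffer and copyTrace are hypothetical names, and the atomic pointer is used only so the sketch is well-defined if another thread clears the pointer while the loop runs.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one trace's y data. In edm the real member is
// yPvData[i]; the atomic pointer is an assumption made for this sketch.
struct TraceBuffer {
    std::atomic<const double*> data{nullptr};
    std::size_t count = 0;
};

// Copies up to trace.count samples into out. Returns false if the buffer is
// missing at entry or disappears part way through; in the latter case out
// holds only the samples copied so far.
bool copyTrace(const TraceBuffer& trace, std::vector<double>& out) {
    out.clear();
    if (trace.data.load() == nullptr) return false;  // test at the top
    for (std::size_t ii = 0; ii < trace.count; ++ii) {
        const double* p = trace.data.load();          // re-test per sample
        if (p == nullptr) return false;
        out.push_back(p[ii]);
    }
    return true;
}
```

Re-testing inside the loop only narrows the window. If another thread can free the buffer between the load and the read, the pointer is non-NULL but dangling, which is consistent with these patches reducing the crash rate without being a complete fix.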
Scott, could you add more details re your test case?
Regards,
- Bruce
On 08/16/2018 08:17 AM, Sinclair, John William wrote:
Edm is single threaded. There is code that takes and releases locks but that was for a particular pv type that was supported long ago. These could be removed
and should probably at least be turned into macros.
I am investigating a crash observed in the related display widget that happens when it contains macro symbols in its properties and also requires that it reside
on a page with a multiplexor that makes use of a controlling pv. You don’t happen to have a similar configuration?
As to converting the cvs repo to git, give me a bit of time. Work demands are high at the moment.
Thanks,
John Sinclair
Hi John,
I spent most of a day trying to reduce our test case to something portable,
but can only get it to fail with multiple live 120hz arrays, and at least 2 screens w/
multiple xygraph widgets.
As best I can tell the crashes are due to race conditions between threads. The patches
that Scott and I came up with greatly reduce the crash rate to the point where I have
not seen any additional crashes, but I wouldn't say these patches are a complete fix.
Would you be amenable to converting your edm repo to git and pushing it to github?
This would be a big aid for collaboration, allowing us to post proposed patches as
pull requests w/ all github's nice web based commenting and notifications.
I've had a lot of experience in doing this w/ svn and CVS based repos and could help
with advice if needed.
Best regards,
- Bruce Hill
On 08/13/2018 10:13 AM, Sinclair, John William wrote:
I am finally getting back to edm. Sorry for the delay.
I see that the author of the epics independent code left out the 0 element subscription so this can be fixed. As to the issue of the widget causing crashes,
does someone have a recipe to induce a predictable crash in the latest version? If so, let me know at your convenience.
Thanks,
John
Hi all,
if any Debian package maintainer is interested, I prepared the package for build and use on Debian Stretch/Jessie at HZB, based on epicsdeb:
https://github.com/chrschroeder/edm
I already added the last patch sent by Bruce.
The package was built against an updated version (3.15.5) of the epics base package from epicsdeb:
https://github.com/chrschroeder/epics-base
They are still considered "testing". Feel free to use it.
Since managing a patch queue is somewhat of a hassle, I would also appreciate it if Bruce's patches could be merged back into an official edm release.
Best regards,
Christoph
On 05/25/2018 12:37 AM, Bruce Hill wrote:
Sorry, typo in my github repo. Should be:
https://github.com/bhill-slac/edm.git
On 05/24/2018 02:32 PM, Bruce Hill wrote:
Hi Scott,
We're basing our work off https://github.com/epicsdeb/edm.git and
have added edm releases up to 1-12-105B on the upstream branch
which I've pushed here:
https://github.com:bhill-slac/edm.git
Both of the above are based on release tarballs.
Ideally we'd prefer to work from John Sinclair's full version history
if someone could convert it to git and post on github.
Regards,
- Bruce
On 05/24/2018 08:43 AM, Baily, Scott A wrote:
I also have some patches for the XYplot. I’ve added the feature of supporting 0 element subscriptions to edm, and modified the plot widget so it won’t show trailing
0’s past the end of the waveform. Also, I fixed a number of crashes with the XYplot. Most had to do with uninitialized values being used. I sent these to John Sinclair early this year, but there haven’t been any releases of EDM in over a year. Is anyone
maintaining EDM? I saw some code on github, but it had not been updated with the latest release of EDM.
--
Correspondence
--
Scott Baily
AOT-IC, MS H820
Los Alamos National Laboratory
Los Alamos, NM 87545
ph: (505) 606-2260
Hi Eric,
We use xygraph for many displays at SLAC and generally it works well without crashing.
However, recently one of our developers found a repeatable crash related to xygraph that she suggested may be related to your crash.
The crash scenario she found involves one screen running an xygraph w/ several traces, each w/ 8k x and y arrays. If we then try to open or execute another screen w/ 6 xygraph's, each w/ 8 similar traces, it would crash every time. gdb stack trace wasn't
exactly the same as yours and would vary as to where it crashed, but it always involved a NULL yPvData[i].
Cutting back on the number of xygraphs in the 2nd screen or the number of traces appears to
make the crash less likely, and if I remove enough of them, it can succeed.
My hypothesis re the root cause is that the xygraph is redrawing before all the PV's have reconnected. To address this, I added tests for NULL xPvData[i] and NULL yPvData[i] at
the top of genXyVector() and genChronoVector(). If EDMDEBUGMODE is set to a non-zero
value it prints an error msg each time xygraph would have crashed.
With this patch we can run the full test screens and see 4 or 5 of the NULL yPvData error msgs.
The trace isn't drawn if xPvData[i] or yPvData[i] is NULL, but it doesn't crash and subsequent
trace updates work ok.
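
A rough sketch of the kind of guard described above, for the XY case where both buffers are required. It is not the attached patch: the function names and the message text are hypothetical, and reading EDMDEBUGMODE with std::getenv and std::atoi is an assumption about how the variable is interpreted.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Returns true when the EDMDEBUGMODE environment variable is set to a
// non-zero integer. How edm itself reads the variable is not shown in this
// thread; std::getenv plus std::atoi is an assumption.
bool debugModeEnabled() {
    const char* value = std::getenv("EDMDEBUGMODE");
    return value != nullptr && std::atoi(value) != 0;
}

// Hypothetical guard for the top of a vector-generation routine: returns
// true only if both buffers for the trace are present. When one is missing
// the caller should skip drawing the trace instead of dereferencing it.
bool traceBuffersReady(const void* xData, const void* yData, int trace) {
    assert(trace >= 0);
    if (xData != nullptr && yData != nullptr) return true;
    if (debugModeEnabled()) {
        std::fprintf(stderr, "xygraph: trace %d has NULL %s data, skipping\n",
                     trace, xData == nullptr ? "x" : "y");
    }
    return false;
}
```

A caller would return early when traceBuffersReady() is false, leaving the trace undrawn until a later update arrives with valid buffers.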
I don't know if this patch will fix your issue, but the patch is attached if you want to give it a try.
Cheers,
- Bruce
On 05/09/2018 11:38 AM, Eric Norum wrote:
Hmm…sorry to keep following up my own posting, but shortly after I sent the message containing what I thought was a work-around I got another segfault. It looks like there are *lots* of places that can have null pointer dereferences
in the methods invoked from xyGraphClass::executeDeferred. I hope there are still some edm maintainers out there who can help with this, since my quick hack fix clearly isn’t going to work.
I found that hacking in the following changes seems to stop the segfaults and result in waveform display. I suspect that there’s a better fix that involves not invoking this method when the pointers in question are invalid, but I don’t
have any idea what that would entail.
diff -u baselib/xygraph.cc.orig baselib/xygraph.cc
--- baselib/xygraph.cc.orig	2018-05-09 10:49:02.055680000 -0700
+++ baselib/xygraph.cc	2018-05-09 10:52:04.804485000 -0700
for ( ii=0; ii<yPvCount[i]; ii++ ) {
+ if (!yPvData[i]) dyValue = 0; else
// There are two views of pv types, Type and specificType; this uses
- dxValue = ( (double *) xPvData[i] )[ii];
+ if (xPvData[i]) dxValue = ( (double *) xPvData[i] )[ii]; else dxValue = 0;
if ( xAxisStyle == XYGC_K_AXIS_STYLE_LOG10 ) {
dxValue = loc_log10( dxValue );
Displaying X/Y data in EDM on OS X and Linux often results in a segmentation fault. Here’s where the fault occurs:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff1f16f3c in xyGraphClass::genChronoVector (this=0x96c6f0, i=0,
6344        dyValue = ( (double *) yPvData[i] )[ii];
#0 0x00007ffff1f16f3c in xyGraphClass::genChronoVector (this=0x96c6f0, i=0,
#1 0x00007ffff1f1c186 in xyGraphClass::executeDeferred (this=0x96c6f0)
#2 0x00007ffff7afe07d in activeWindowClass::processObjects (this=0x927880)
#3 0x00007ffff7af5858 in appContextClass::applicationLoop (this=0x636d20)
#4 0x0000000000405562 in main (argc=<value optimized out>,
6340        dyValue = (double) ( (unsigned short *) yPvData[i] )[ii];
6344        dyValue = ( (double *) yPvData[i] )[ii];
6348        if ( y1AxisStyle[yi] == XYGC_K_AXIS_STYLE_LOG10 ) {
$3 = {0x0 <repeats 20 times>}
$4 = (void *(*)[20]) 0x96f240
For some reason yPvData is full of null pointers which results in a segfault when dereferenced by the [ii] subscript.
Any ideas why this happens sometimes? I can get a good display maybe two or three times in a row and then get the segfault.
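
To make the failure mode concrete, here is a small sketch of reading one sample from a type-erased buffer, mirroring the two casts in the listing above. The enum ElementType and the function readAsDouble are hypothetical; edm's own type codes and control flow are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element-type tags; edm's own type codes are not reproduced.
enum class ElementType { UnsignedShort, Double };

// Reads element `index` of a type-erased buffer as a double. Returns false
// without touching `value` when the buffer pointer is null, which is the
// case that faulted in the backtrace.
bool readAsDouble(const void* buffer, ElementType type, std::size_t index,
                  double& value) {
    if (buffer == nullptr) return false;
    switch (type) {
    case ElementType::UnsignedShort:
        value = static_cast<double>(
            static_cast<const unsigned short*>(buffer)[index]);
        return true;
    case ElementType::Double:
        value = static_cast<const double*>(buffer)[index];
        return true;
    }
    return false;  // unreachable for valid tags
}
```

Without the null check, indexing through a null buffer is undefined behaviour, and on Linux and OS X it typically shows up as the SIGSEGV reported here.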
--
Bruce Hill
Member Technical Staff
SLAC National Accelerator Lab
2575 Sand Hill Road M/S 10
Menlo Park, CA 94025
--
(bb|[^b]{2})
Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.
Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher
Geschäftsführung: Prof. Dr. Bernd Rech (kommissarisch), Thomas Frederking
Sitz Berlin, AG Charlottenburg, 89 HRB 5583
Postadresse:
Hahn-Meitner-Platz 1
D-14109 Berlin
https://www.helmholtz-berlin.de