EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: CAS-client thread issues in areaDetector IOC
From: Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov>
To: "Wlodek, Jakub" <jwlodek at bnl.gov>
Cc: "Wlodek, Jakub via Tech-talk" <tech-talk at aps.anl.gov>
Date: Wed, 5 Feb 2020 09:33:34 -0800
On 2/5/20 7:35 AM, Wlodek, Jakub wrote:
> Hi all,
> 
> After some testing, I think I am able to reproduce the problem. Essentially, I had a CS-Studio client opened for both the simDetector IOC and the ADUVC IOC, with a misconfigured
> CA_MAX_ARRAY_BYTES (it had too small a value). I hadn't noticed because I use ImageJ for testing the camera IOCs.
> 
> Closing the CSS window and running the IOC with a simple python script to set values etc. I did not see the error despite running overnight.
> 
> I tried running again after fixing the CSS issue, and now I don't see the problem. I wonder how the client is able to have such an effect on the IOC process, and why I only saw the issue running with
> base 7.0.3.1?

This sounds increasingly like a Base bug.  I'm also surprised to see this appear with 7.0.3.1.
However, memory corruption errors can be difficult to find, in part because the
effects need not follow the usual execution flow.

> I also tried the following to make sure the issue was tied to this: I started the IOC without the misconfigured CSS open, and ran the script to control it (this gave no errors). Then, while the IOC was running,
> I opened the CSS window, and promptly saw the error message display in the IOC shell several times, followed by a Segmentation Fault, as before.

Can you capture the specifics of how to reproduce this crash reliably?
Preferably with just a waveformRecord and caget/camonitor, though simDetector
and cs-studio should be enough.  For example, the specific values for array sizes and CA_MAX_ARRAY_BYTES.
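
As a starting point for such a reproduction, here is a minimal Python sketch that starts a camonitor client with a chosen array-size limit and then disconnects it. It assumes the limit is passed through the EPICS_CA_MAX_ARRAY_BYTES environment variable (the full name of the setting referred to in this thread as CA_MAX_ARRAY_BYTES) and that camonitor is on the PATH. The helper names and the PV name in the usage note are illustrative assumptions, not values taken from this thread.

```python
import os
import subprocess


def client_env(max_array_bytes, base_env=None):
    """Return a copy of the environment with the CA array-size limit set."""
    env = dict(os.environ if base_env is None else base_env)
    env["EPICS_CA_MAX_ARRAY_BYTES"] = str(max_array_bytes)
    return env


def monitor_command(pv_name):
    """Command line for a camonitor client subscribed to a single PV."""
    return ["camonitor", pv_name]


def run_client(pv_name, max_array_bytes, seconds=10.0):
    """Run camonitor with the given limit for a while, then disconnect it."""
    proc = subprocess.Popen(monitor_command(pv_name),
                            env=client_env(max_array_bytes))
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Stopping the client forces the IOC to clean up that connection,
        # which is where the reported stack traces end up.
        proc.terminate()
        proc.wait()
    return proc.returncode
```

For example, `run_client("13SIM1:image1:ArrayData", 16384)` would subscribe to an image waveform with a small limit; both the PV name and the value are placeholders to be replaced with the ones that actually trigger the crash.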

This would help if someone at the codeathon meeting next week tries to look into this.

https://epics.anl.gov/meetings/codeathon-20/index.php

Of course, you are welcome to continue looking into this yourself :)
Being able to trigger this crash on demand should help immensely in isolating the problem.

The first thing I would try is running the IOC in valgrind.  For certain errors valgrind
can clearly identify the offending write.
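
A minimal sketch of how the IOC might be launched under valgrind, written as a small Python helper that only builds the command line. The binary path, startup script name, and log file name are assumptions for illustration; the valgrind options shown (--track-origins, --num-callers, --log-file) are standard memcheck options.

```python
import shlex


def valgrind_command(ioc_binary, startup_script, log_file="valgrind-ioc.log"):
    """Build a valgrind (memcheck) command line for running an IOC."""
    return [
        "valgrind",
        "--track-origins=yes",    # report where uninitialised values came from
        "--num-callers=30",       # deeper stack traces in each report
        f"--log-file={log_file}", # keep reports out of the IOC shell output
        ioc_binary,
        startup_script,
    ]


# Hypothetical paths; substitute the real IOC binary and startup script.
cmd = valgrind_command("../../bin/linux-x86_64/simDetectorApp", "st.cmd")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell from the IOC boot directory. Expect the IOC to run much more slowly than usual under valgrind.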

If this doesn't give an answer, then a fairly reliable strategy is to use a GDB watchpoint.
This is much more complicated to set up, as it requires first breaking during client
connection, then adding a watchpoint on a memory address which will be corrupted,
then continuing to find where this address is (erroneously) written.

https://sourceware.org/gdb/current/onlinedocs/gdb/Set-Watchpoints.html#Set-Watchpoints

Specifically "watch -l <expr>" to watch an address regardless of current scope.
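
The sequence described above (break, add a watchpoint, continue) could be captured in a GDB command file. The Python sketch below only generates that file's text. The breakpoint location and the watched expression are placeholders: camsgtask is picked because it appears in the reported stack traces, and the expression must be replaced with whatever object is found to be corrupted.

```python
def gdb_watch_script(break_at, watch_expr):
    """Return the text of a GDB command file that sets a location watchpoint."""
    lines = [
        "set pagination off",
        f"break {break_at}",       # stop once a client connection is being handled
        "run",
        f"watch -l {watch_expr}",  # watch the address itself, regardless of scope
        "continue",                # runs until the watched memory is written
        "backtrace",               # show who performed the write
    ]
    return "\n".join(lines) + "\n"


# Both arguments are hypothetical; adjust them to the actual corrupted object.
script = gdb_watch_script("camsgtask", "*pointer_to_corrupted_object")
print(script)
```

Saved as, say, watch.gdb, the output could be used with `gdb -x watch.gdb --args <ioc binary> st.cmd`. The watched expression has to be valid at the point where the breakpoint stops, so in practice it may be necessary to step forward or break somewhere later before adding the watchpoint.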


> Thanks for the help with this problem,
> Jakub

> *From:* Mark Rivers <rivers at cars.uchicago.edu>
> *Sent:* Friday, January 31, 2020 7:22 PM
> *To:* Wlodek, Jakub <jwlodek at bnl.gov>; Wlodek, Jakub via Tech-talk <tech-talk at aps.anl.gov>; Michael Davidsaver <mdavidsaver at gmail.com>
> *Subject:* RE: CAS-client thread issues in areaDetector IOC
>  
> 
> Hi Jakub,
> 
>  
> 
> The error seems to be consistently in the CAS-client.  This suggests perhaps there is some CA client that is causing it.  Can you try shutting down all CA clients that are accessing this IOC?  You can use “casr 1” to determine what clients are connected.
> 
>  
> 
> Mark
> 
>  
> 
>  
> 
> *From:*Wlodek, Jakub <jwlodek at bnl.gov>
> *Sent:* Friday, January 31, 2020 8:22 AM
> *To:* Wlodek, Jakub via Tech-talk <tech-talk at aps.anl.gov>; Mark Rivers <rivers at cars.uchicago.edu>; Michael Davidsaver <mdavidsaver at gmail.com>
> *Subject:* Re: CAS-client thread issues in areaDetector IOC
> 
>  
> 
> Hi all,
> 
>  
> 
> After updating my machine and rebooting it, then recompiling all of EPICS + modules, I ran a test by starting a simDetector IOC and letting it sit overnight.
> 
> I didn't activate any plugins or press any buttons, and when I checked on the IOC this morning, the error message was displayed several times, along with
> 
> a Segmentation Fault crash:
> 
>  
> 
> epics>
> 
> epics> epicsEventTrigger: pthread_mutex_lock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7fd6e8042940) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x56549c7567e3]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(epicsStackTrace+0x73)
> 
> [    0x56549c747215]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(cantProceed+0xc5)
> 
> [    0x56549c6d0403]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(db_close_events+0x33)
> 
> [    0x56549c6f7d2f]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(destroy_tcp_client+0x8f)
> 
> [    0x56549c6f894d]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(camsgtask+0x13d)
> 
> [    0x56549c751c18]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(start_routine+0xf8)
> 
> [    0x7fd82d7016db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7fd82c49688f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7fd6e8042940) suspended
> 
> epicsEventTrigger: pthread_mutex_lock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7fd6e8042bf0) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x56549c7567e3]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(epicsStackTrace+0x73)
> 
> [    0x56549c747215]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(cantProceed+0xc5)
> 
> [    0x56549c6d0403]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(db_close_events+0x33)
> 
> [    0x56549c6f7d2f]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(destroy_tcp_client+0x8f)
> 
> [    0x56549c6f894d]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(camsgtask+0x13d)
> 
> [    0x56549c751c18]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(start_routine+0xf8)
> 
> [    0x7fd82d7016db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7fd82c49688f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7fd6e8042bf0) suspended
> 
> epicsEventTrigger: pthread_mutex_unlock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7fd6e8043490) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x56549c7567e3]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(epicsStackTrace+0x73)
> 
> [    0x56549c747215]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(cantProceed+0xc5)
> 
> [    0x56549c6d0403]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(db_close_events+0x33)
> 
> [    0x56549c6f7d2f]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(destroy_tcp_client+0x8f)
> 
> [    0x56549c6f894d]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(camsgtask+0x13d)
> 
> [    0x56549c751c18]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(start_routine+0xf8)
> 
> [    0x7fd82d7016db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7fd82c49688f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7fd6e8043490) suspended
> 
> epicsEventTrigger: pthread_mutex_unlock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7fd6e8043da0) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x56549c7567e3]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(epicsStackTrace+0x73)
> 
> [    0x56549c747215]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(cantProceed+0xc5)
> 
> [    0x56549c6d0403]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(db_close_events+0x33)
> 
> [    0x56549c6f7d2f]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(destroy_tcp_client+0x8f)
> 
> [    0x56549c6f894d]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(camsgtask+0x13d)
> 
> [    0x56549c751c18]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(start_routine+0xf8)
> 
> [    0x7fd82d7016db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7fd82c49688f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7fd6e8043da0) suspended
> 
> epicsEventTrigger: pthread_mutex_unlock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7fd6e80446b0) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x56549c7567e3]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(epicsStackTrace+0x73)
> 
> [    0x56549c747215]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(cantProceed+0xc5)
> 
> [    0x56549c6d0403]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(db_close_events+0x33)
> 
> [    0x56549c6f7d2f]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(destroy_tcp_client+0x8f)
> 
> [    0x56549c6f894d]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(camsgtask+0x13d)
> 
> [    0x56549c751c18]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(start_routine+0xf8)
> 
> [    0x7fd82d7016db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7fd82c49688f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> epicsEventTrigger: pthread_mutex_unlock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7fd6e8044fc0) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x56549c7567e3]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(epicsStackTrace+0x73)
> 
> [    0x56549c747215]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(cantProceed+0xc5)
> 
> [    0x56549c6d0403]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(db_close_events+0x33)
> 
> [    0x56549c6f7d2f]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(destroy_tcp_client+0x8f)
> 
> [    0x56549c6f894d]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(camsgtask+0x13d)
> 
> [    0x56549c751c18]: /epics/src/support/areaDetector/ADSimDetector/iocs/simDetectorIOC/bin/linux-x86_64/simDetectorApp(start_routine+0xf8)
> 
> [    0x7fd82d7016db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7fd82c49688f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7fd6e80446b0) suspended
> 
> Thread CAS-client (0x7fd6e8044fc0) suspended
> 
> Segmentation fault (core dumped)
> 
>  
> 
> I still haven't been able to reproduce this on any other machines, so I may just assume that something on my test machine is broken, and I'll just move development to a
> 
> different machine until I can sort this issue out.
> 
>  
> 
> Jakub
> 
>  
> 

> 
> *From:* Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of Wlodek, Jakub via Tech-talk <tech-talk at aps.anl.gov>
> *Sent:* Wednesday, January 29, 2020 3:13 PM
> *To:* Mark Rivers <rivers at cars.uchicago.edu>; Michael Davidsaver <mdavidsaver at gmail.com>
> *Cc:* tech-talk at aps.anl.gov
> *Subject:* Re: CAS-client thread issues in areaDetector IOC
> 
>  
> 
> Hi all,
> 
>  
> 
> After testing on another Ubuntu 18 machine, I couldn't reproduce the issue. It seems that it is limited to my original specific machine. I will try updating all of the packages on it,
> 
> recompiling, and then I will use a python script to test, as suggested.
> 
>  
> 
> If I find the cause of the issue I will further update this thread.
> 
>  
> 
> Thanks,
> 
> Jakub
> 

> 
> *From:* Mark Rivers <rivers at cars.uchicago.edu>
> *Sent:* Monday, January 27, 2020 1:57 PM
> *To:* Wlodek, Jakub <jwlodek at bnl.gov>; Michael Davidsaver <mdavidsaver at gmail.com>
> *Cc:* tech-talk at aps.anl.gov
> *Subject:* RE: CAS-client thread issues in areaDetector IOC
> 
>  
> 
> I suggest using the simDetector and a Python script for testing.  Start the simDetector with no plugins enabled, and have Python just press start and stop in a loop.  If that does not fail, then enable plugins one at a time until you find a minimal configuration that will generate the error.  Then others can try to reproduce.
> 
>  
> 
> I just did a lot of clicking on the simDetector with Ubuntu 18.04 and base 7.0.3.1 and cannot make it fail.
> 
>  
> 
> Mark
> 
>  
> 
>  
> 
>  
> 
> *From:* Wlodek, Jakub <jwlodek at bnl.gov>
> *Sent:* Monday, January 27, 2020 11:35 AM
> *To:* Michael Davidsaver <mdavidsaver at gmail.com>
> *Cc:* Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov
> *Subject:* Re: CAS-client thread issues in areaDetector IOC
> 
>  
> 
> Hi Mark
> 
>  
> 
>     Is this error consistent or intermittent?
> 
>  
> 
> I observe it every time I start an IOC, though not immediately after startup; it shows up after some time of normal operation (acquire start/stop, enable/disable plugins, etc.).
> 
> I can't seem to nail down a specific series of steps that leads to the error, but it seems consistent. I also re-cloned and rebuilt everything from scratch in a different location
> 
> to make sure something wasn't messed up in that specific set of sources, and I saw the same issue.
> 
>  
> 
> Regards,
> 
> Jakub
> 

> 
> *From:* Michael Davidsaver <mdavidsaver at gmail.com>
> *Sent:* Monday, January 27, 2020 12:28 PM
> *To:* Wlodek, Jakub <jwlodek at bnl.gov>
> *Cc:* Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov
> *Subject:* Re: CAS-client thread issues in areaDetector IOC
> 
>  
> 
> On 1/27/20 7:45 AM, Wlodek, Jakub wrote:
>> ...
>> In addition, the following tests showed some error message, but were seemingly re-run successfully:
> 
> We have some tests which intentionally provoke error to ensure that
> they can be handled correctly.  Unless you see any "fail" instead of "ok",
> then the tests are passing and we'll have to look elsewhere.
> 
> 
> 
>> iocshTest.t ................... 1/19 Command no_such_command not found.
>> iocsh Error: Break
>> Can't open no_such_file.cmd: No such file or directory
>> .
>> .
>> .
>> iocshTest.t ................... ok 
>> 
>> chfPluginTest.t ............ 1/1433 chfConfigParseStart: plugin pvt alloc failed
>> chfPluginTest.t ............ ok       
>> 
>> netget.t ................ 1/3 **** The executable "caRepeater" couldn't be located
>> **** because of errno = "No such file or directory".
>> **** You may need to modify your PATH environment variable.
>> **** Unable to start "CA Repeater" process.
>> netget.t ................ ok  
>> 
>> There were also several tests that listed json parsing errors.
>> 
>> Regards,
>> Jakub
>> 

>> *From:* Michael Davidsaver <mdavidsaver at gmail.com>
>> *Sent:* Friday, January 24, 2020 4:52 PM
>> *To:* Wlodek, Jakub <jwlodek at bnl.gov>
>> *Cc:* Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov
>> *Subject:* Re: CAS-client thread issues in areaDetector IOC
>>  
>> On 1/24/20 10:15 AM, Mark Rivers wrote:
>>> > Could it be that there is a problem with something in the newest release of epics-base?
>> ...
>>> You are running 7.0.3 now.  If you go back to 7.0.3.1 does the problem re-appear?
>> 
>> If the assert() failure does re-appear, could you run the base unittests and report any which don't pass?
>> 
>>> make runtests
> 

