EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: CAS-client thread issues in areaDetector IOC
From: Michael Davidsaver via Tech-talk <[email protected]>
To: "Wlodek, Jakub" <[email protected]>
Cc: "[email protected]" <[email protected]>
Date: Thu, 23 Jan 2020 11:43:49 -0800
The threads in question are per-connection CA workers, which start and stop
as clients (dis)connect.  In this case, the fault is on disconnect.

The second error, "Access not within mapped region at address 0x8",
is a straight NULL dereference, which should never happen.
The ELLNODE is non-zero after successful thread creation and is never re-zeroed,
which suggests that some corruption happened while this client was connected.
(I don't suppose this client is 'caput'?)
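
To illustrate why a NULL dereference would fault at address 0x8 rather than 0, here is a minimal sketch. It assumes a 64-bit build and a list node made of two pointers, "next" followed by "previous". The names demo_node and faulting_address_for_null_next are hypothetical stand-ins for illustration, not EPICS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a doubly linked list node: two pointers,
 * "next" first and "previous" second. */
struct demo_node {
    struct demo_node *next;
    struct demo_node *previous;
};

/* Address that an unlink step of the form "node->next->previous = ..."
 * would write to if node->next were NULL: zero plus the offset of the
 * "previous" member. Nothing is dereferenced here; only the offset is
 * computed. */
uintptr_t faulting_address_for_null_next(void)
{
    return (uintptr_t)offsetof(struct demo_node, previous);
}
```

On a 64-bit target the returned value is 8, which would be consistent with a zeroed node being unlinked, although the sketch by itself does not prove that this is what happened in the IOC.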

The internal structure in question (struct epicsThreadOSD) is allocated
with calloc() directly (no free list).  Given that malloc() from glibc
is known to reuse recently free()'d allocations quickly, and assuming
that there are no other access violations being flagged, my guess
is that this is a use-after-free type error.
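
As a rough way to see the allocator behaviour this guess relies on, the sketch below frees a block and immediately requests another of the same size, then reports whether the same address came back. The function name allocator_reused_block is hypothetical, and the outcome depends on the allocator and its state, so no particular result is guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if a fresh calloc() of the same size landed on the address
 * that was just free()'d, 0 if it did not, and -1 if an allocation
 * failed. */
int allocator_reused_block(size_t size)
{
    void *first = calloc(1, size);
    if (first == NULL)
        return -1;
    /* Record the address as an integer before freeing, so the freed
     * pointer itself is never inspected afterwards. */
    uintptr_t first_addr = (uintptr_t)first;
    free(first);

    void *second = calloc(1, size);
    if (second == NULL)
        return -1;
    int reused = ((uintptr_t)second == first_addr);
    free(second);
    return reused;
}
```

Calling allocator_reused_block(170) uses the block size from the valgrind report. When quick reuse does occur, a stale pointer into the old block silently aliases the new owner's data. Valgrind's memcheck holds freed blocks back from reuse for a while, which is how it can flag the stale write instead.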




On 1/23/20 9:32 AM, Wlodek, Jakub via Tech-talk wrote:
> Hi Mark,
> 
> As far as I know the cameras on the beamline are running on EPICS base versions ranging from 7.0.1 to 7.0.3, and the camera I am testing is running R7.0.3.1.
> 
> I just recompiled everything to make sure it has all been built against the correct base, and reran with valgrind, and it does seem that the issue is memory access:
> 
> First, I see this invalid write error message repeatedly:
> 
> ==32571== Invalid write of size 8
> ==32571==    at 0x10FAD50: ellDelete (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x1110645: free_threadInfo.part.0 (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x111087F: start_routine (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x681C6DA: start_thread (pthread_create.c:463)
> ==32571==    by 0x7AE988E: clone (clone.S:95)
> ==32571==  Address 0x321915f8 is 8 bytes inside a block of size 170 free'd
> ==32571==    at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32571==    by 0x111087F: start_routine (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x681C6DA: start_thread (pthread_create.c:463)
> ==32571==    by 0x7AE988E: clone (clone.S:95)
> ==32571==  Block was alloc'd at
> ==32571==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==32571==    by 0x11103B6: init_threadInfo (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x1111357: epicsThreadCreateOpt (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x1109E56: epicsThreadCreate (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x10B72D3: req_server (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x11107C7: start_routine (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x681C6DA: start_thread (pthread_create.c:463)
> ==32571==    by 0x7AE988E: clone (clone.S:95)
> ==32571==
> 
> Then finally come the pthread error and the segfault, with valgrind listing "access not within mapped region" as the cause:
> 
> pthread_join error No such process
> pthread_join error No such process
> ==32571==
> ==32571== Process terminating with default action of signal 11 (SIGSEGV)
> ==32571==  Access not within mapped region at address 0x8
> ==32571==    at 0x10FAD50: ellDelete (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x1110645: free_threadInfo.part.0 (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x1111D8B: epicsThreadMustJoin (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x10B68DE: destroy_tcp_client (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x10B74FC: camsgtask (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x11107C7: start_routine (in /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp)
> ==32571==    by 0x681C6DA: start_thread (pthread_create.c:463)
> ==32571==    by 0x7AE988E: clone (clone.S:95)
> ==32571==  If you believe this happened as a result of a stack
> ==32571==  overflow in your program's main thread (unlikely but
> ==32571==  possible), you can try to increase the size of the
> ==32571==  main thread stack using the --main-stacksize= flag.
> ==32571==  The main thread stack size used in this run was 8388608.
> ==32571==
> ==32571== HEAP SUMMARY:
> ==32571==     in use at exit: 95,935,018 bytes in 345,749 blocks
> ==32571==   total heap usage: 762,443 allocs, 128,925 frees, 8,380,446,680 bytes allocated
> ==32571==
> ==32571== LEAK SUMMARY:
> ==32571==    definitely lost: 532,602 bytes in 8,403 blocks
> ==32571==    indirectly lost: 0 bytes in 0 blocks
> ==32571==      possibly lost: 26,580,722 bytes in 37,063 blocks
> ==32571==    still reachable: 68,112,742 bytes in 315,085 blocks
> ==32571==                       of which reachable via heuristic:
> ==32571==                         newarray           : 1,920 bytes in 18 blocks
> ==32571==         suppressed: 0 bytes in 0 blocks
> ==32571== Rerun with --leak-check=full to see details of leaked memory
> ==32571==
> ==32571== For counts of detected and suppressed errors, rerun with: -v
> ==32571== Use --track-origins=yes to see where uninitialised values come from
> ==32571== ERROR SUMMARY: 2057 errors from 33 contexts (suppressed: 0 from 0)
> Segmentation fault (core dumped)
> 
> Now the question becomes what have I done to cause this improper memory access - I ran an IOC from the same sources with ADSimDetector and did not see this issue, at least not in my limited test time.
> 
> Thanks,
> Jakub
> ----------------------------------------
> *From:* Mark Rivers <[email protected]>
> *Sent:* Thursday, January 23, 2020 12:06 PM
> *To:* Wlodek, Jakub <[email protected]>
> *Cc:* [email protected] <[email protected]>
> *Subject:* RE: CAS-client thread issues in areaDetector IOC
>  
> 
> Hi Jakub,
> 
>  
> 
> The errors are in the Channel Access server from EPICS base.  Questions:
> 
> - Are your working cameras using the same version of base as the camera you are testing?
> - Are you sure you have rebuilt everything that your camera IOC uses with this version of base?
> 
>  
> 
> This looks to me like it could be a memory corruption problem, e.g. array out of bounds.  Those errors can remain hidden until you make small changes to your code.
> 
>  
> 
> I suggest running the IOC under valgrind to see if it detects memory allocation issues.
> 
>  
> 
> Mark
> 
>  
> 
>  
> 
> *From:*Tech-talk <[email protected]> *On Behalf Of *Wlodek, Jakub via Tech-talk
> *Sent:* Thursday, January 23, 2020 9:44 AM
> *To:* [email protected]
> *Subject:* CAS-client thread issues in areaDetector IOC
> 
>  
> 
> Hi all,
> 
>  
> 
> I am currently ironing out a few bugs in an areaDetector driver I wrote a while ago for USB webcams, and doing some code cleanup (removing redundant code etc.), and I have encountered an error I have not seen before, and I'm not sure what I have done to introduce it. I have ~15 beamline cameras working with this driver for time frames ranging from several months to a year, so I am fairly certain this is a new issue. Essentially, after the IOC runs normally for some time, I get the following stack trace dumped to the EPICS shell:
> 
>  
> 
> epicsEventTrigger: pthread_mutex_lock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7f4e4c042f00) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x55bf0e032393]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(epicsStackTrace+0x73)
> 
> [    0x55bf0e022dc5]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(cantProceed+0xc5)
> 
> [    0x55bf0dfabfb3]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(db_close_events+0x33)
> 
> [    0x55bf0dfd38df]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(destroy_tcp_client+0x8f)
> 
> [    0x55bf0dfd44fd]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(camsgtask+0x13d)
> 
> [    0x55bf0e02d7c8]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(start_routine+0xf8)
> 
> [    0x7f4f922c46db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7f4f9105988f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7f4e4c042f00) suspended
> 
> 2020/01/23 10:28:32.930 ADUVC::acquireStop Stopping aquisition
> 
> 2020/01/23 10:28:32.930 ADUVC::writeInt32 function=8 value=0
> 
> epicsEventTrigger: pthread_mutex_lock failed: Invalid argument
> 
> epicsEventMustTriggerThread CAS-client (0x7f4e4c0433d0) can't proceed, suspending.
> 
> Dumping a stack trace of thread 'CAS-client':
> 
> [    0x55bf0e032393]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(epicsStackTrace+0x73)
> 
> [    0x55bf0e022dc5]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(cantProceed+0xc5)
> 
> [    0x55bf0dfabfb3]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(db_close_events+0x33)
> 
> [    0x55bf0dfd38df]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(destroy_tcp_client+0x8f)
> 
> [    0x55bf0dfd44fd]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(camsgtask+0x13d)
> 
> [    0x55bf0e02d7c8]: /epics/src/support/areaDetector/ADUVC/iocs/adUVCIOC/bin/linux-x86_64/adUVCApp(start_routine+0xf8)
> 
> [    0x7f4f922c46db]: /lib/x86_64-linux-gnu/libpthread.so.0(start_thread+0xdb)
> 
> [    0x7f4f9105988f]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)
> 
> Thread CAS-client (0x7f4e4c0433d0) suspended
> 
> Segmentation fault (core dumped)
> 
>  
> 
> Note how it does not crash after the first stack dump, and the number of times this gets printed varies, usually up to 10 times. I am running the IOC with all plugins except for NDStdArrays disabled, and capturing 640x480 images @ 30 fps. The currently updated code is available here: https://github.com/jwlodek/ADUVC, with the four most recent commits being the code cleanup changes since the last known-working version. Could anyone help shed some light on what might be the issue here?
> 
> 
> 
> Thanks,
> 
> Jakub Wlodek
> 
>  
> 

