Experimental Physics and
Industrial Control System

2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 <2021> 2022 2023 2024 2025	Index	2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 <2021> 2022 2023 2024 2025
<== Date ==>		<== Thread ==>

Subject:	Re: valgrind errors in base 7.0.5
From:	"Johnson, Andrew N. via Core-talk" <core-talk at aps.anl.gov>
To:	Michael Davidsaver <mdavidsaver at gmail.com>
Cc:	EPICS core-talk <core-talk at aps.anl.gov>
Date:	Mon, 5 Apr 2021 15:55:43 +0000

On Apr 2, 2021, at 11:44 PM, Michael Davidsaver <mdavidsaver at gmail.com> wrote:
> 
> On 4/2/21 8:25 PM, Mark Rivers wrote:
> 
>> but I do see this CAS-event problem.
> 
> I think this is a separate, and also I'm not certain how to go about troubleshoot it.
> Somehow an uninitialized value has ended up in a CA subscription update.

Since Mark is using 7.0.5 this is our first release with the zero-length array support. Could it be that a get/monitor of a zero-length array (e.g. a waveform record that hasn’t been initialized) now isn’t going to copy anything into the server’s data field since we now support zero elements?

- Andrew


> Some context might help.  Do you see this when monitoring any PV, or only specific one(s)?
> 
> 
>> ==6123== Thread 21 CAS-event:
>> ==6123== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
>> ==6123==    at 0x5909F96: send (in /usr/lib64/libpthread-2.28.so)
>> ==6123==    by 0xDFE970: cas_send_bs_msg (caserverio.c:68)
>> ==6123==    by 0xDFEF59: cas_copy_in_header (caserverio.c:287)
>> ==6123==    by 0xDF77C9: read_reply (camessage.c:513)
>> ==6123==    by 0xDD0FC8: event_read (dbEvent.c:959)
>> ==6123==    by 0xDD0FC8: event_task (dbEvent.c:1038)
>> ==6123==    by 0xE44682: start_routine (osdThread.c:412)
>> ==6123==    by 0x59002DD: start_thread (in /usr/lib64/libpthread-2.28.so)
>> ==6123==    by 0x6B9F132: clone (in /usr/lib64/libc-2.28.so)
>> ==6123==  Address 0x1fadb914 is 2,260 bytes inside a block of size 135,168 alloc'd
>> ==6123==    at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
>> ==6123==    by 0xDF5FB8: casExpandBuffer (caservertask.c:1341)
>> ==6123==    by 0xDF5FB8: casExpandSendBuffer (caservertask.c:1392)
>> ==6123==    by 0xDFEE96: cas_copy_in_header (caserverio.c:275)
>> ==6123==    by 0xDF77C9: read_reply (camessage.c:513)
>> ==6123==    by 0xDD0FC8: event_read (dbEvent.c:959)
>> ==6123==    by 0xDD0FC8: event_task (dbEvent.c:1038)
>> ==6123==    by 0xE44682: start_routine (osdThread.c:412)
>> ==6123==    by 0x59002DD: start_thread (in /usr/lib64/libpthread-2.28.so)
>> ==6123==    by 0x6B9F132: clone (in /usr/lib64/libc-2.28.so)
>> ==6123==
>> 
>> Mark
>> 
>> 
>> -----Original Message-----
>> From: Michael Davidsaver <mdavidsaver at gmail.com> 
>> Sent: Friday, April 2, 2021 7:44 PM
>> To: Mark Rivers <rivers at cars.uchicago.edu>
>> Cc: EPICS core-talk <core-talk at aps.anl.gov>; Johnson, Andrew N. <anj at anl.gov>
>> Subject: Re: valgrind errors in base 7.0.5
>> 
>> Mark,
>> 
>> On 4/2/21 4:28 PM, Johnson, Andrew N. via Core-talk wrote:
>>> Hi Mark,
>>> 
>>> On Apr 2, 2021, at 4:33 PM, Mark Rivers via Core-talk <core-talk at aps.anl.gov> wrote:
>>>> 
>>>> I was tracking down some errors in an IOC with valgrind.  I found and fixed the errors in my drivers.  But I am seeing some errors in CAS-client and CAS-event.
>>> 
>>> The valgrind tools are very useful, but they can throw up a number of false positives inside EPICS, and the event you posted below is likely to be one of those since it’s talking about an area of memory that was allocated by the EPICS freeList module which manages memory allocations that are all the same size. This technique was essential with older versions of the VxWorks memory allocator which fragmented the free memory pool and thus prevented IOCs from running for long periods of time.
>>> 
>>> Michael Davidsaver may have some additional advice, but unless you’re seeing crashes associated with these I would not be too concerned about them. He has annotated many of the internal routines to teach Valgrind about the routines in Base that allocate and deallocate memory, but an “Invalid read” might be related to thread ownership and I don’t know whether there are annotations for that kind of event.
>> 
>> I think this is a false positive, but not one I've seen before.  So thanks for reporting it.
>> 
>> I don't suppose you can replicate this with softIocPVA?
>> 
>> 
>> details...
>> 
>> "Invalid read" is an access to memory which valgrind considers unallocated.  This is often followed by "Address ... is not stack'd, malloc'd or (recently) free'd", and then a SIGSEGV.
>> 
>> "... inside a recently re-allocated block ..." is not a message I've seen before, and I don't find many references to it.  I also don't see it myself, at least with softIocPVA and some camonitor/pvmonitor.
>> 
>> My initial guess is that this is something to do with how the annotations in freeListLib are being (mis)handled [1].  Possibly related to valgrind version.  I currently have:
>> 
>>> $ valgrind --version
>>> valgrind-3.14.0
>> 
>> 
>> 
>> [1] https://github.com/epics-base/epics-base/blob/c829ca39d873d59947d851a7b22d74d14748e336/modules/libcom/src/freeList/freeListLib.c#L88-L94
>> 
>> Also see the VALGRIND_MEMPOOL_*() manipulations further down.
>> 
>> 
>> 
>>> - Andrew
>>> 
>>> 
>>> 
>>>> 
>>>> ==146728== Thread 25 CAS-client:
>>>> ==146728== Invalid read of size 8
>>>> ==146728==    at 0xE37113: db_close_events (dbEvent.c:378)
>>>> ==146728==    by 0xE5DB92: destroy_tcp_client (caservertask.c:1217)
>>>> ==146728==    by 0xE5E77D: camsgtask (camsgtask.c:161)
>>>> ==146728==    by 0xEB131B: start_routine (osdThread.c:412)
>>>> ==146728==    by 0x5EC7E24: start_thread (in /usr/lib64/libpthread-2.17.so)
>>>> ==146728==    by 0x704BBAC: clone (in /usr/lib64/libc-2.17.so)
>>>> ==146728==  Address 0x1638ee68 is 12,120 bytes inside a recently re-allocated block of size 19,464 alloc'd
>>>> ==146728==    at 0x4C29C23: malloc (vg_replace_malloc.c:299)
>>>> ==146728==    by 0xE9FBA1: freeListMalloc (freeListLib.c:95)
>>>> ==146728==    by 0xE9FCBD: freeListCalloc (freeListLib.c:68)
>>>> ==146728==    by 0xE36F71: db_init_events (dbEvent.c:297)
>>>> ==146728==    by 0x64E0CB: PDBProvider::PDBProvider(std::tr1::shared_ptr<epics::pvAccess::Configuration const> const&) (pdb.cpp:542)
>>>> ==146728==    by 0x64665D: epics::pvAccess::SingletonChannelProviderFactory<PDBProvider>::sharedInstance() (pvAccess.h:1412)
>>>> ==146728==    by 0x7073A5: epics::pvAccess::ChannelProviderRegistry::getProvider(std::string const&) (ChannelAccessFactory.cpp:48)
>>>> ==146728==    by 0x7495AE: epics::pvAccess::ServerContextImpl::loadConfiguration() (serverContext.cpp:162)
>>>> ==146728==    by 0x74A3FA: epics::pvAccess::ServerContext::create(epics::pvAccess::ServerContext::Config const&) (serverContext.cpp:597)
>>>> ==146728==    by 0x6E27B9: (anonymous namespace)::startitup() (PVAServerRegister.cpp:46)
>>>> ==146728==    by 0x6E31F4: startPVAServer (PVAServerRegister.cpp:71)
>>>> ==146728==    by 0x6E31F4: void epics::detail::call1<char const*, &(anonymous namespace)::startPVAServer>(iocshArgBuf const*) (iocshelper.h:83)
>>>> ==146728==    by 0xEA1C1F: iocshBody (iocsh.cpp:938)
>>>> ==146728==
>>>> 
>>>> ==146728== Thread 21 CAS-event:
>>>> ==146728== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
>>>> ==146728==    at 0x5ECECFB: send (in /usr/lib64/libpthread-2.17.so)
>>>> ==146728==    by 0xE67479: cas_send_bs_msg (caserverio.c:68)
>>>> ==146728==    by 0xE67A98: cas_copy_in_header (caserverio.c:287)
>>>> ==146728==    by 0xE5F74B: read_reply (camessage.c:513)
>>>> ==146728==    by 0xE38084: event_read (dbEvent.c:959)
>>>> ==146728==    by 0xE38084: event_task (dbEvent.c:1038)
>>>> ==146728==    by 0xEB131B: start_routine (osdThread.c:412)
>>>> ==146728==    by 0x5EC7E24: start_thread (in /usr/lib64/libpthread-2.17.so)
>>>> ==146728==    by 0x704BBAC: clone (in /usr/lib64/libc-2.17.so)
>>>> ==146728==  Address 0x1d768584 is 36 bytes inside a block of size 135,168 alloc'd
>>>> ==146728==    at 0x4C29C23: malloc (vg_replace_malloc.c:299)
>>>> ==146728==    by 0xE5E029: casExpandBuffer (caservertask.c:1341)
>>>> ==146728==    by 0xE5E029: casExpandSendBuffer (caservertask.c:1392)
>>>> ==146728==    by 0xE67A4A: cas_copy_in_header (caserverio.c:275)
>>>> ==146728==    by 0xE5F74B: read_reply (camessage.c:513)
>>>> ==146728==    by 0xE38084: event_read (dbEvent.c:959)
>>>> ==146728==    by 0xE38084: event_task (dbEvent.c:1038)
>>>> ==146728==    by 0xEB131B: start_routine (osdThread.c:412)
>>>> ==146728==    by 0x5EC7E24: start_thread (in /usr/lib64/libpthread-2.17.so)
>>>> ==146728==    by 0x704BBAC: clone (in /usr/lib64/libc-2.17.so)
>>>> ==146728==
>>>> 
>>>> Base 7.0.5
>>>> Centos 7
>>>> gcc 4.8.5
>>>> 
>>>> Mark
>>> 
>> 
> 

-- 
Complexity comes for free, simplicity you have to work for.

References:: valgrind errors in base 7.0.5 Mark Rivers via Core-talk; Re: valgrind errors in base 7.0.5 Johnson, Andrew N. via Core-talk; Re: valgrind errors in base 7.0.5 Michael Davidsaver via Core-talk; RE: valgrind errors in base 7.0.5 Mark Rivers via Core-talk; Re: valgrind errors in base 7.0.5 Michael Davidsaver via Core-talk

Navigate by Date:: Prev: Build failed: EPICS Base 7 base-7.0-291 AppVeyor via Core-talk; Next: Build completed: epics-base base-7.0-690 AppVeyor via Core-talk; Index: 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 <2021> 2022 2023 2024 2025
Navigate by Thread:: Prev: Re: valgrind errors in base 7.0.5 Michael Davidsaver via Core-talk; Next: Build failed: EPICS Base 7 base-7.0-283 AppVeyor via Core-talk; Index: 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 <2021> 2022 2023 2024 2025

ANJ, 05 Apr 2021

· Home · News · About · Base · Modules · Extensions · Distributions ·
· Download · Search · IRMIS · Talk · Documents · Links · Licensing ·

Experimental Physics and Industrial Control System

Experimental Physics and
Industrial Control System