Argonne National Laboratory

Experimental Physics and
Industrial Control System

1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  <20192020  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  <20192020 
<== Date ==> <== Thread ==>

Subject: RE: Streamdevice segmentation fault
From: "Hill, Bruce via Tech-talk" <tech-talk@aps.anl.gov>
To: "Brown, Garth" <gwbrown@slac.stanford.edu>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Thu, 7 Nov 2019 09:07:23 +0000
Ernest asked me to take a look at this.
I was able to duplicate the memory overrun and confirm that Garth's fix works.

The problem is that the expectedLength variable is set to -1 for certain conditions
that manifest in the Hybricon protocol that Garth was testing.    Thus, expectedLength should
really be ssize_t instead of size_t.  This causes the test for expectedLength > 0 in 
AsynDriverInterface::readHandler() to fail and bufferSize, a size_t, is set to -1.
The invalid bufferSize in turn leads to a buffer overrun.

We'll submit a pull request to the PSI github site soon.

Cheers,
Bruce

-----Original Message-----
From: Brown, Garth <gwbrown@slac.stanford.edu> 
Sent: Wednesday, November 6, 2019 2:47 PM
To: tech-talk@aps.anl.gov; Hill, Bruce <bhill@slac.stanford.edu>
Subject: RE: Streamdevice segmentation fault

I've found the problem, and made a fix that works (for me, so far, disclaimers of that sort). See attached patch.
It was introduced when variables representing sizes were changed to size_t. Sometimes AsynDriverInterface::expectedLength had been used as a signed number, assigned to -1 to indicate an error. The line "if (expectedLength > 0)" changed its behavior when it became unsigned.
The conditions causing a problem here are that maxInput is unset (defaults to 0) and there was input on the line after the InTerminator. Under those circumstances, ssize_t StreamCore::readCallback returns -1. This is called in StreamCore::evalIn()
	expectedInput = readCallback(lastInputStatus, NULL, 0); and the -1 is assigned to AsynDriverInterface::expectedLength a few lines later through
	return busReadRequest(pollPeriod, readTimeout, expectedInput, true); Then in AsynDriverInterface::readHandler(), that if statement is true, so the buffer length is set to 0xffffffffffffffff.
I don't have a deep enough understanding of the code overall to know if changing what ssize_t StreamCore::readCallback returns or how StreamCore::evalIn() handles it would be a better fix, or if StreamBuffer::reserve(0xffffffffffffffff) should be handling it differently. So I opted for a concise and narrow fix, going back to the old behavior in the if statement by casting to a signed type: "if ((ssize_t) expectedLength > 0)".

Garth

-----Original Message-----
From: Brown, Garth
Sent: Thursday, October 24, 2019 2:42 PM
To: 'tech-talk@aps.anl.gov' <tech-talk@aps.anl.gov>
Subject: RE: Streamdevice segmentation fault

Ok, the mail system is too clever. I've put the would-be attachment at:
https://drive.google.com/file/d/1YR4CySnWt9eCAVKZ9eXcy4Tk3RAZzS1l/view?usp=sharing

-----Original Message-----
From: Tech-talk <tech-talk-bounces@aps.anl.gov> On Behalf Of Brown, Garth via Tech-talk
Sent: Thursday, October 24, 2019 2:16 PM
To: 'tech-talk@aps.anl.gov' <tech-talk@aps.anl.gov>
Subject: RE: Streamdevice segmentation fault

For testing, I've created a stripped down version of the IOC in the attached tarball. You may have to delete "_" from the extension to extract, and replace configure/* with your own standard. The python script to simulate the target device is in IOCManagerApp/src/HybriconSim.py, which starts a server on localhost port 23456. The IOC can be started from iocBoot/sioc-li24-nw03 by running st.cmd.

Garth

-----Original Message-----
From: Tech-talk <tech-talk-bounces@aps.anl.gov> On Behalf Of Brown, Garth via Tech-talk
Sent: Monday, October 21, 2019 5:35 PM
To: Mark Rivers <rivers@cars.uchicago.edu>
Cc: tech-talk@aps.anl.gov
Subject: RE: Streamdevice segmentation fault

Hi Mark,
Sure, attached. I've tested my IOC with it, and it reliably crashes. I've inserted sleeps in the middle of the messages so it would more accurately behave like the crate. I also found that if I tack that last short message onto the previous message, the IOC doesn't crash.

Garth

-----Original Message-----
From: Mark Rivers <rivers@cars.uchicago.edu>
Sent: Sunday, October 20, 2019 9:17 AM
To: Brown, Garth <gwbrown@slac.stanford.edu>
Cc: tech-talk@aps.anl.gov
Subject: Re: Streamdevice segmentation fault

Hi Garth,

Could you write a simple sever in Python or C that sends that string to the asyn/Stream client? If you do that does it crash the same way? If so, then we can reproduce the problem and track it down.

Mark


Sent from my iPhone

On Oct 15, 2019, at 6:06 PM, Brown, Garth via Tech-talk <tech-talk@aps.anl.gov<mailto:tech-talk@aps.anl.gov>> wrote:

After updating an IOC from:
streamdevice 2.5, asyn 4.21, base 3.14.12
to:
streamdevice 2.8.9 (also tested 2.7.7), asyn 4.32 (also tested 4.31 and 4.35), base 7.0.2 I'm seeing 100% repeatable segfaults after processing the first response from a Hybricon VME crate.
I've attached the db and protocol files I'm using.

The text it is processing when it crashes is:
S/N Z834ÿÿÿÿ   U/N      Q:SN=Z834    ,UN=     ,IP=172.027.005.094,V3=3308,V5=5005,V+12=12009,V-12=12415,V24=00000,T1=29,T2=27,T3=28,T4=0,F1=07230,F2=07380,F3=07410,F4=07290,F5=07290,F6=07290,F7=00000,F8=00000,F9=00000,OT=0,OV=0,UV=000000,FF=0,PS1=1,PS2=1,MSG=0,SW=1,PROT=00,BS=0,HV=1,SD=1,WP=1,OS=000,CODE=10156F8F,ENET=D6.10,PSFAIL=00,POH=34848.0,MAXTMP=61,MINTMP=06
;EV0001000000000000ET00000000EF000000000OT0OV0UV000000FF0PS11MSG0SW1BS0HV1SD1WP1OS000

You can get a better idea of the raw ASCII in the attached console log with ASYN debug output, or in the streamdevice debug log (too huge to attach) at: https://gist.github.com/SLACer-garth/6505fd173dd71d7cfa2620d09e325276
One quirk that stands out to me is that the line terminator is \r\000\n. It would not surprise me if that null character in the middle causes parsing problems. I have tried changing the protocol file to use just \r or just \n, and it still crashed.
The call stack  at the point where it segfaults is:
Thread #7 252221 [core: 0] (Suspended : Signal : SIGSEGV:Segmentation fault)
                AsynDriverInterface::startTimer() at AsynDriverInterface.cc:228<http://asyndriverinterface.cc:228> 0x49f9b9
                AsynDriverInterface::readRequest() at AsynDriverInterface.cc:845<http://asyndriverinterface.cc:845> 0x49c769
                StreamBusInterface::Client::busReadRequest() at StreamBusInterface.h:82 0x4c0180
                StreamCore::evalIn() at StreamCore.cc:921<http://streamcore.cc:921> 0x4ba493
                StreamCore::readCallback() at StreamCore.cc<http://StreamCore.cc>:1,092 0x4bb48b
                StreamBusInterface::readCallback() at StreamBusInterface.h:118 0x49f5a8
                AsynDriverInterface::readHandler() at AsynDriverInterface.cc<http://AsynDriverInterface.cc>:1,064 0x49d793
                AsynDriverInterface::handleRequest() at AsynDriverInterface.cc<http://AsynDriverInterface.cc>:1,510 0x49ee87
                AsynDriverInterface::handleRequest() at AsynDriverInterface.cc:243<http://asyndriverinterface.cc:243> 0x49fa4b
                portThread() at asynManager.c:879 0x492439
                start_routine() at osdThread.c:403 0x5e1f9c
                start_thread() at 0x7ffff79a4aa1
                clone() at 0x7ffff6902c4d

The pointer "timer" on line 226 of AsynDriverInterface.c (this is code in the streamdevice module) has a value which, when decoded as ASCII, contains part of the message from the crate, "SW1BS0HV1SD1WP1O". It looks likely that inputBuffer.local was written beyond its limit. Right after inputBuffer.local in memory is inputBuffer.len. If you look at that memory location, decoded as ASCII, it's that last line from the crate data, "EV00010000000000". The "timer" pointer is a few memory addresses past there. But I haven't been able to identify a root cause, the point where a buffer overflows or whatever it is. Has anyone seen this before?

Garth

<crat_hybricon_8slot.template>
<crat_hybricon_8slot.proto.db>
<HybriconStreamFaultConsole.txt>

Note: The original attachment was automatically removed by Stanford's email system because it was identified as a file type that is commonly associated with malicious software.  In order to transmit this type of file, please use an alternate mechanism such as Stanford's Box service (https://itservices.stanford.edu/service/box).

References:
Streamdevice segmentation fault Brown, Garth via Tech-talk
Re: Streamdevice segmentation fault Mark Rivers via Tech-talk
RE: Streamdevice segmentation fault Brown, Garth via Tech-talk
RE: Streamdevice segmentation fault Brown, Garth via Tech-talk
RE: Streamdevice segmentation fault Brown, Garth via Tech-talk
RE: Streamdevice segmentation fault Brown, Garth via Tech-talk

Navigate by Date:
Prev: RE: Streamdevice segmentation fault Brown, Garth via Tech-talk
Next: rmt CCExecInit problem Tasaddaq Khan via Tech-talk
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  <20192020 
Navigate by Thread:
Prev: RE: Streamdevice segmentation fault Brown, Garth via Tech-talk
Next: areaDetector module ADAravis Jörn Dreyer via Tech-talk
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  <20192020 
ANJ, 07 Nov 2019 Valid HTML 4.01! · Home · News · About · Base · Modules · Extensions · Distributions · Download ·
· Search · EPICS V4 · IRMIS · Talk · Bugs · Documents · Links · Licensing ·