EPICS Home

Experimental Physics and Industrial Control System


 
1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024  Index 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
<== Date ==> <== Thread ==>

Subject: RE: asynPortDriver
From: "Szalata, Zenon M." <[email protected]>
To: Mark Rivers <[email protected]>
Cc: "[email protected]" <[email protected]>
Date: Wed, 25 Jul 2012 12:37:54 -0700
Hi Mark,
Instead, I edited the patched devAsynXXXArray.h file, where I commented out the two lines that refer to timestamp.
With this change, my IOC has processed nearly 1 million triggers without an error.  Perhaps it is too early to declare victory but it smells like this is it.
Thanks for your help,
Zen

> -----Original Message-----
> From: Mark Rivers [mailto:[email protected]]
> Sent: Wednesday, July 25, 2012 7:02 AM
> To: Szalata, Zenon M.
> Subject: RE: asynPortDriver
>
> Hi Zen,
>
> I realized you probably cannot just modify the patch, since the line numbers
> won't match.
>
> What you want to do is just to bracket the two statements that access the
> waveform record with pdf to be bracketed by a dbScanLock like this:
>
>     dbScanLock((dbCommon *)pwf); \
>     if (len > pwf->nelm) len = pwf->nelm; \
>     for (i=0; i<(int)len; i++) pData[i] = value[i]; \
>     dbScanUnlock((dbCommon *)pwf); \
>
> Mark
>
> -----Original Message-----
> From: Szalata, Zenon M. [mailto:[email protected]]
> Sent: Tuesday, July 24, 2012 10:41 PM
> To: Mark Rivers
> Subject: RE: asynPortDriver
>
> Hi Mark,
> Asyn does not build after I patch the file.  I get:
> ../../asyn/devEpics/devAsynInt8Array.c: In function 'callbackWf':
> ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no member
> named 'timestamp'
> ../../asyn/devEpics/devAsynInt8Array.c: In function 'interruptCallbackInput':
> ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no member
> named 'timestamp'
> make[2]: *** [devAsynInt8Array.o] Error 1
>
> Zen
>
>
> > -----Original Message-----
> > From: Mark Rivers [mailto:[email protected]]
> > Sent: Tuesday, July 24, 2012 3:35 PM
> > To: Szalata, Zenon M.
> > Cc: [email protected]; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Zen,
> >
> > I think we were on the right track with the gotValue check, but the
> > timing problem is a little different.
> >
> > Here is the problem I thought might be happening:
> >
> > - Your driver calls doCallbacksInt32Array.
> > - This results in a call to interruptCallbackInput in
> > asyn/devEpics/devAsynXXXArray.h
> > - interruptCallbackInput copies the data to the waveform record VAL
> > field and then calls scanIoRequest
> > - Another interrupt occurs before the record processes and calls
> > devAsynXXXArray::processCommon
> > - gotValue will be 1 because the processCommon has not yet reset
> > gotValue=0
> >
> > You showed that this problem can happen, but only at higher callback rates.
> >
> > Here is the problem I now think is happening:
> > - Your driver calls doCallbacksInt32Array.
> > - This results in a call to interruptCallbackInput in
> > asyn/devEpics/devAsynXXXArray.h
> > - interruptCallbackInput copies the data to the waveform record VAL
> > field and then calls scanIoRequest
> > - scanIoRequest calls processCommon, which sets gotValue=0
> > - Another interrupt occurs before the record has finished posting
> > monitors.  It overwrites the data in the record.  It calls scanIoRequest again.
> > - The waveform record then processes twice with the data from the
> > second callback.
> >
> > This appears to be what you are seeing.
> >
> > I just realized that the problem is that there no mutex preventing the
> > callback from overwriting the waveform data at any time when the
> > waveform record is processing, which is bad.  Note that is problem is
> > confined to I/O Intr scanned waveform records, it does not happen on
> > other records that use a ring buffer.
> >
> > I think the solution is to call dbScanLock during the time the array
> > data is being modified in the callback.
> >
> > Here is a patch against asyn R4-19.  Can you try it?  It also sets the
> > timestamp from the callback.
> >
> > corvette:asyn/asyn/devEpics>svn diff -r1995 devAsynXXXArray.h
> > Index: devAsynXXXArray.h
> >
> =============================================================
> ======
> > --- devAsynXXXArray.h   (revision 1995)
> > +++ devAsynXXXArray.h   (working copy)
> > @@ -254,6 +254,7 @@
> >          pPvt->pasynUser, pwf->bptr, pwf->nelm, &nread); \
> >      asynPrint(pasynUser, ASYN_TRACEIO_DEVICE, \
> >                "%s %s::callbackWf\n", pwf->name, driverName); \
> > +    pwf->time = pasynUser->timestamp; \
> >      if (status == asynSuccess) { \
> >          pwf->udf=0; \
> >          pwf->nord = nread; \
> > @@ -280,8 +281,11 @@
> >          (char *)value, len*sizeof(EPICS_TYPE), \
> >          "%s %s::interruptCallbackInput\n", \
> >          pwf->name, driverName); \
> > +    dbScanLock((dbCommon *)pwf); \
> >      if (len > pwf->nelm) len = pwf->nelm; \
> >      for (i=0; i<(int)len; i++) pData[i] = value[i]; \
> > +    pwf->time = pasynUser->timestamp; \
> > +    dbScanUnlock((dbCommon *)pwf); \
> >      pPvt->gotValue = 1; \
> >      pPvt->nord = len; \
> >      if (pPvt->status == asynSuccess) pPvt->status =
> > pasynUser->auxStatus; \
> >
> >
> >
> > Mark
> >
> >
> > -----Original Message-----
> > From: Szalata, Zenon M. [mailto:[email protected]]
> > Sent: Tuesday, July 24, 2012 4:47 PM
> > To: Mark Rivers
> > Cc: [email protected]; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Mark,
> > I did couple of things to help understand what is happening.
> > 1. I put a printf statement in interruptCallbackInput to print a
> > message when gotValue not equals 0.
> >     This message does not get printed when the error condition occurs.
> > I tested that the message gets
> >     printed by increasing the trigger rate until the IOC "fall apart"
> > and those messages then come out abundantly.
> >
> > 2. I have simplified my diagnostic mechanism to remove as much
> > ambiguity as possible to the point
> >     that I think the problem happens between posting the data in the
> > device driver and the waveform
> >     record processing (I/O Intr).
> >
> > My diagnostic mechanism consists of two sets of EPICS records.
> > The data are passed from the device driver as an array of 17 integers.
> > The first is a gate counter, which is incremented just before it is put in the
> array.
> > This counter is followed by 16 ADC values.  The array is received in a
> > waveform record.  The waveform record triggers processing of an aSub
> > record and a chain of sub array records where the ADC values are
> > extracted and put into ai records.
> > The aSub record compares the value of the gate counter with the value
> > of gate counter from a previous event.  If this value is not greater
> > by 1 than the previous value, it prints a message, which shows this and
> previous values.
> >
> > The second set of diagnostic records is processed when the gate
> > counter is extracted from the waveform record and put into longin
> > record.  there is a sub record which does similar checking, but on
> > discrete data and prints a message if the counter value did not change
> > by 1.  Printed outputs from both sets indicate the same problem.
> >
> > Here is a typical output:
> >
> > asubMiss: Something wrong: n=53385,lastn=53383
> > asubMiss: Something wrong: n=53385,lastn=53385
> > subASMissing: got a missing ++++++++++++++++
> >
> a=53385.000,b=0.000,c=4294967295.000,d=1.000,e=53383.000,f=1.000,g=1.
> 0
> > 00
> > k=2.000,l=1.000,val=2.000
> > subASMissing: trigger number this=53385.0, prev=53385.0
> >
> > The first two messages come from the aSub record, the next three come
> > from the sub record.
> > I am not sure how to interpret the presence of the second of the two
> > messages from the aSub record and the third from the sub record.  It
> > looks as if the waveform record processed twice when the problem
> occurred.
> >
> > The sub record message prints the values of the fields a through l,
> > where a is this counter and e is previous counter.
> >
> > Thank you Mark for looking at this,
> > Zen
> >
> >
> > > -----Original Message-----
> > > From: Mark Rivers [mailto:[email protected]]
> > > Sent: Thursday, July 19, 2012 6:10 PM
> > > To: Szalata, Zenon M.
> > > Cc: [email protected]; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Zen,
> > >
> > > What is the diagnostic mechanism you are using to determine that the
> > > counter has incremented by 2?  Is there a possibility that the
> > > waveform record is actually processing every callback OK, but that
> > > arrays are being dropped between the waveform record and whatever
> > > diagnostic client you are running?
> > >
> > > Mark
> > >
> > > ________________________________________
> > > From: Szalata, Zenon M. [[email protected]]
> > > Sent: Thursday, July 19, 2012 5:23 PM
> > > To: Mark Rivers
> > > Cc: [email protected]; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Mark,
> > > I modified my version 2 asynPortDriver based device driver so that now it:
> > > 1. in ISR (at interrupt level) it read data from all 16 channels and
> > > sends the data by message queue.
> > > 2. In a high priority thread the data is received from the message
> > > queue, unpacked, and posted as an array.
> > >     The unpacked data are put into an array where the first element
> > > is a counter, which is incremented in the
> > >     data unpacking thread, and elements 1 through 16 are the ADC values.
> > > In the db layer I still see missing data.  Once in a while the
> > > counter value, which is the first element in the data array, is
> > > larger by 2 than the previous counter value.  The test is done at 100Hz.
> > >
> > > The version 3 of the device driver, which is not using asyn, does
> > > not exhibit this behavior.  In fact I can run the system with a
> > > trigger rate of up to 400Hz before it starts falling apart.  Since I
> > > have a device driver that works (version 3) I could just use it and move on.
> > > But at this point I am still reluctant to just abandon the
> > > asynPortDriver version.  Can you think of some tests that could help
> > understand what is causing this problem?
> > > Thanks,
> > > Zen
> > >
> > > > -----Original Message-----
> > > > From: Mark Rivers [mailto:[email protected]]
> > > > Sent: Sunday, July 15, 2012 6:59 PM
> > > > To: Szalata, Zenon M.
> > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Zen,
> > > >
> > > > > In the first version of the device driver, I saw the effect that
> > > > > Kevin Tsubota described in his reply to my email. This is what he said:
> > > >
> > > > > "At Keck we ran into a similar problem where we weren't seeing
> > > > > all the events we expected with a different driver and it turned
> > > > > out to be that ASYN
> > > > only posts on change."
> > > >
> > > > OK, there are 3 parts to my reply about that:
> > > >
> > > > 1) This is not an issue with ASYN in general, it is specific to
> > asynPortDriver.
> > > >
> > > > 2) A future version of asynPortDriver should have a variant of the
> > > > setXXXParam that takes a "force" flag, i.e.
> > > > setIntegerParam(addr, myParam, value, forceCallback) which will
> > > > force a callback the next time callParamCallbacks(addr) is called,
> > > > even if the value has not changed.
> > > >
> > > > 3) Meanwhile it is trivial to force such behavior in the current
> > > > version by doing setIntegerParam(addr, myParam, value+1);
> > > > setIntegerParam(addr, myParam, value);
> > > >
> > > > This will fool the parameter library into thinking that the value
> > > > has changed so it will do the callback on "value".  The overhead
> > > > of doing the additional call to setIntegerParam is insignificant.
> > > >
> > > > > In all three version of the device driver The data is read out
> > > > > at interrupt level, which is needed to clear the interrupt.
> > > > > Then I schedule a callback to continue processing the data.
> > > > > This consists of
> > > > unpacking the data  words into various components.
> > > >
> > > > I would have to see your code, but what you are doing sounds dangerous.
> > > > What if a second interrupt occurs before the callback occurs to
> > > > process the first?  If your data is not put in a queue, then you
> > > > risk missing data.  In my Ip330 and ipUnidig drivers I send the
> > > > data from the interrupt service routine to the thread that will do
> > > > callbacks via an epicsMessageQueue.  Then even if the thread that
> > > > does callbacks momentarily falls behind I will not lose data
> > > > unless the message queue
> > > overflows, and I have a flag to signal if that occurs.
> > > >
> > > > Mark
> > > >
> > > > ________________________________________
> > > > From: Szalata, Zenon M. [[email protected]]
> > > > Sent: Sunday, July 15, 2012 8:20 PM
> > > > To: Mark Rivers
> > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Mark,
> > > > In the first version of the device driver, I saw the effect that
> > > > Kevin Tsubota described in his reply to my email.  This is what he said:
> > > >
> > > > "At Keck we ran into a similar problem where we weren't seeing all
> > > > the events we expected with a different driver and it turned out
> > > > to be that ASYN only posts on change."
> > > >
> > > > In all three version of the device driver The data is read out at
> > > > interrupt level, which is needed to clear the interrupt.  Then I
> > > > schedule a callback to continue processing the data.  This
> > > > consists of unpacking the data words into various components.
> > > > In the case of the first version of the device driver, the
> > > > callback requests to process the records are done in a for loop.
> > > > Perhaps I have abandoned the first version too quickly.
> > > > Anyway it was clear to me that it would be better to bundle the
> > > > trigger number and data from all ADC channels and pass that to the
> > > > db
> > layer.
> > > >
> > > > I will modify the C subroutine of the subroutine record to also
> > > > print a message when the current gate number is less than the
> > > > previous
> > value.
> > > >
> > > > Thanks,
> > > > Zen
> > > >
> > > > > -----Original Message-----
> > > > > From: Mark Rivers [mailto:[email protected]]
> > > > > Sent: Sunday, July 15, 2012 3:19 PM
> > > > > To: Szalata, Zenon M.
> > > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > > Subject: RE: asynPortDriver
> > > > >
> > > > > How are you passing the data from your interrupt service routine
> > > > > to the thread that is calling callParamCallbacks()?  Are you
> > > > > using a mechanism with a queue, like an epicsMessageQueue?
> > > > >
> > > > > I'd like to understand why "the order of record processing is
> > > > > somewhat random" in method 1.  I would think that it would be
> > > > > deterministic because there is only 1 callback thread that is
> > > > > processing all the records in the order in which the callback
> > > > > requests were issued, which I assume your driver does in a loop like:
> > > > >
> > > > > for (i=0; i<16; i++) {
> > > > >   callParamCallbacks(i);
> > > > > }
> > > > >
> > > > > Mark
> > > > >
> > > > > ________________________________
> > > > > From: Szalata, Zenon M. [[email protected]]
> > > > > Sent: Sunday, July 15, 2012 12:15 PM
> > > > > To: Mark Rivers
> > > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > > Subject: asynPortDriver
> > > > >
> > > > > Hi Mark,
> > > > > I am trying to understand why my device driver for a CAEN VME
> > > > > gated ADC module, which is based on asynPortDriver class works
> > incorrectly.
> > > > > The module has 16 ADC channels and is setup to generate an
> > > > > interrupt for each gate signal.  The interrupt triggers data
> > > > > readout and record processing.  For testing, the trigger/gate
> > > > > rate is 100
> > Hz.
> > > > > The device driver maintains a gate counter, which gets
> > > > > incremented each time the data
> > > > is processed.
> > > > > So far I have three versions of the device driver.
> > > > >
> > > > > 1. In this version I have 16 I/O Intr scanned ai records.  The
> > > > > gate counter is pushed to a longin record.
> > > > >     This version has the problem that the order of record
> > > > > processing is somewhat random and the IOC
> > > > >     using this VME module requires the processing to be deterministic.
> > > > > To overcome this difficulty,
> > > > >    I wrote the second version.
> > > > >
> > > > > 2. In this version I pack the gate counter and all 16 ADC
> > > > > channel values into an array and this array gets
> > > > >     pushed to a waveform record.  Then, the data gets
> > > > > distributed to ai records using  sub array records.
> > > > >    This part works fine.  I have subroutine record which is
> > > > > processed each time the data waveform record
> > > > >    is processed.  It checks for missing triggers, by expecting
> > > > > that the gate counter value for this data array
> > > > >    is larger by 1 than the previous one.  This is where the
> > > > > device driver code seems to fail.
> > > > >    I let the IOC run for a bit over 35 hours.  More precisely
> > > > > the time corresponds to 12700898 triggers.
> > > > >    During this time the subroutine record reported 42 missing
> > > > > triggers.  They occurred spread out
> > > > >    throughout the run nearly but not equally spaced in time. I
> > > > > have a simple print statement
> > > > >    in the C routine which prints a few numbers.  These numbers
> > > > > tell me that the trigger incremented by 2.
> > > > >    I just looked at the logic in the C routine and I see that I
> > > > > am not printing messages when the new gate
> > > > >    counter is less than the previous one.  So it is possible
> > > > > that from time to time the waveform record gets
> > > > >    the data out of order.  It is also possible, but I think
> > > > > unlikely, that from time to time the data is lost.
> > > > >
> > > > > 3. This version of the device driver is non-asyn.  It consists
> > > > > of two parts, device support and record support.
> > > > >     I wrote it, by taking the second version of the device
> > > > > driver and replaced the asynPortDriver part with
> > > > >    device support.  Also, I created a new version of the IOC, by
> > > > > taking the IOC which used the second version of
> > > > >    the device driver and modifying the db files and the st.cmd
> > > > > file as needed.  I ran this IOC version
> > > > >   overnight, so far it has processed 4 and half million triggers
> > > > > and not a single missed trigger.
> > > > >
> > > > > I wonder if the behavior observed using the second version of
> > > > > the device driver is to be expected?
> > > > > Does it mean, that for some applications the asynPortDriver
> > > > > approach to device driver implementation might be not appropriate?
> > > > > Zen



Replies:
RE: asynPortDriver Mark Rivers
References:
asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers
RE: asynPortDriver Szalata, Zenon M.
RE: asynPortDriver Mark Rivers

Navigate by Date:
Prev: Re: asynPortDriver Andrew Johnson
Next: RE: asynPortDriver Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024 
Navigate by Thread:
Prev: Re: asynPortDriver Andrew Johnson
Next: RE: asynPortDriver Mark Rivers
Index: 1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  <20122013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  2024