Experimental Physics and Industrial Control System
Hi Mark,
Instead, I edited the patched devAsynXXXArray.h file, where I commented out the two lines that refer to timestamp.
With this change, my IOC has processed nearly 1 million triggers without an error. Perhaps it is too early to declare victory but it smells like this is it.
Thanks for your help,
Zen
> -----Original Message-----
> From: Mark Rivers [mailto:[email protected]]
> Sent: Wednesday, July 25, 2012 7:02 AM
> To: Szalata, Zenon M.
> Subject: RE: asynPortDriver
>
> Hi Zen,
>
> I realized you probably cannot just modify the patch, since the line numbers
> won't match.
>
> What you want to do is just to bracket the two statements that access the
> waveform record with pdf to be bracketed by a dbScanLock like this:
>
> dbScanLock((dbCommon *)pwf); \
> if (len > pwf->nelm) len = pwf->nelm; \
> for (i=0; i<(int)len; i++) pData[i] = value[i]; \
> dbScanUnlock((dbCommon *)pwf); \
>
> Mark
>
> -----Original Message-----
> From: Szalata, Zenon M. [mailto:[email protected]]
> Sent: Tuesday, July 24, 2012 10:41 PM
> To: Mark Rivers
> Subject: RE: asynPortDriver
>
> Hi Mark,
> Asyn does not build after I patch the file. I get:
> ../../asyn/devEpics/devAsynInt8Array.c: In function 'callbackWf':
> ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no member
> named 'timestamp'
> ../../asyn/devEpics/devAsynInt8Array.c: In function 'interruptCallbackInput':
> ../../asyn/devEpics/devAsynInt8Array.c:46: error: 'asynUser' has no member
> named 'timestamp'
> make[2]: *** [devAsynInt8Array.o] Error 1
>
> Zen
>
>
> > -----Original Message-----
> > From: Mark Rivers [mailto:[email protected]]
> > Sent: Tuesday, July 24, 2012 3:35 PM
> > To: Szalata, Zenon M.
> > Cc: [email protected]; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Zen,
> >
> > I think we were on the right track with the gotValue check, but the
> > timing problem is a little different.
> >
> > Here is the problem I thought might be happening:
> >
> > - Your driver calls doCallbacksInt32Array.
> > - This results in a call to interruptCallbackInput in
> > asyn/devEpics/devAsynXXXArray.h
> > - interruptCallbackInput copies the data to the waveform record VAL
> > field and then calls scanIoRequest
> > - Another interrupt occurs before the record processes and calls
> > devAsynXXXArray::processCommon
> > - gotValue will be 1 because the processCommon has not yet reset
> > gotValue=0
> >
> > You showed that this problem can happen, but only at higher callback rates.
> >
> > Here is the problem I now think is happening:
> > - Your driver calls doCallbacksInt32Array.
> > - This results in a call to interruptCallbackInput in
> > asyn/devEpics/devAsynXXXArray.h
> > - interruptCallbackInput copies the data to the waveform record VAL
> > field and then calls scanIoRequest
> > - scanIoRequest calls processCommon, which sets gotValue=0
> > - Another interrupt occurs before the record has finished posting
> > monitors. It overwrites the data in the record. It calls scanIoRequest again.
> > - The waveform record then processes twice with the data from the
> > second callback.
> >
> > This appears to be what you are seeing.
> >
> > I just realized that the problem is that there no mutex preventing the
> > callback from overwriting the waveform data at any time when the
> > waveform record is processing, which is bad. Note that is problem is
> > confined to I/O Intr scanned waveform records, it does not happen on
> > other records that use a ring buffer.
> >
> > I think the solution is to call dbScanLock during the time the array
> > data is being modified in the callback.
> >
> > Here is a patch against asyn R4-19. Can you try it? It also sets the
> > timestamp from the callback.
> >
> > corvette:asyn/asyn/devEpics>svn diff -r1995 devAsynXXXArray.h
> > Index: devAsynXXXArray.h
> >
> =============================================================
> ======
> > --- devAsynXXXArray.h (revision 1995)
> > +++ devAsynXXXArray.h (working copy)
> > @@ -254,6 +254,7 @@
> > pPvt->pasynUser, pwf->bptr, pwf->nelm, &nread); \
> > asynPrint(pasynUser, ASYN_TRACEIO_DEVICE, \
> > "%s %s::callbackWf\n", pwf->name, driverName); \
> > + pwf->time = pasynUser->timestamp; \
> > if (status == asynSuccess) { \
> > pwf->udf=0; \
> > pwf->nord = nread; \
> > @@ -280,8 +281,11 @@
> > (char *)value, len*sizeof(EPICS_TYPE), \
> > "%s %s::interruptCallbackInput\n", \
> > pwf->name, driverName); \
> > + dbScanLock((dbCommon *)pwf); \
> > if (len > pwf->nelm) len = pwf->nelm; \
> > for (i=0; i<(int)len; i++) pData[i] = value[i]; \
> > + pwf->time = pasynUser->timestamp; \
> > + dbScanUnlock((dbCommon *)pwf); \
> > pPvt->gotValue = 1; \
> > pPvt->nord = len; \
> > if (pPvt->status == asynSuccess) pPvt->status =
> > pasynUser->auxStatus; \
> >
> >
> >
> > Mark
> >
> >
> > -----Original Message-----
> > From: Szalata, Zenon M. [mailto:[email protected]]
> > Sent: Tuesday, July 24, 2012 4:47 PM
> > To: Mark Rivers
> > Cc: [email protected]; Williams Jr., Ernest L.
> > Subject: RE: asynPortDriver
> >
> > Hi Mark,
> > I did couple of things to help understand what is happening.
> > 1. I put a printf statement in interruptCallbackInput to print a
> > message when gotValue not equals 0.
> > This message does not get printed when the error condition occurs.
> > I tested that the message gets
> > printed by increasing the trigger rate until the IOC "fall apart"
> > and those messages then come out abundantly.
> >
> > 2. I have simplified my diagnostic mechanism to remove as much
> > ambiguity as possible to the point
> > that I think the problem happens between posting the data in the
> > device driver and the waveform
> > record processing (I/O Intr).
> >
> > My diagnostic mechanism consists of two sets of EPICS records.
> > The data are passed from the device driver as an array of 17 integers.
> > The first is a gate counter, which is incremented just before it is put in the
> array.
> > This counter is followed by 16 ADC values. The array is received in a
> > waveform record. The waveform record triggers processing of an aSub
> > record and a chain of sub array records where the ADC values are
> > extracted and put into ai records.
> > The aSub record compares the value of the gate counter with the value
> > of gate counter from a previous event. If this value is not greater
> > by 1 than the previous value, it prints a message, which shows this and
> previous values.
> >
> > The second set of diagnostic records is processed when the gate
> > counter is extracted from the waveform record and put into longin
> > record. there is a sub record which does similar checking, but on
> > discrete data and prints a message if the counter value did not change
> > by 1. Printed outputs from both sets indicate the same problem.
> >
> > Here is a typical output:
> >
> > asubMiss: Something wrong: n=53385,lastn=53383
> > asubMiss: Something wrong: n=53385,lastn=53385
> > subASMissing: got a missing ++++++++++++++++
> >
> a=53385.000,b=0.000,c=4294967295.000,d=1.000,e=53383.000,f=1.000,g=1.
> 0
> > 00
> > k=2.000,l=1.000,val=2.000
> > subASMissing: trigger number this=53385.0, prev=53385.0
> >
> > The first two messages come from the aSub record, the next three come
> > from the sub record.
> > I am not sure how to interpret the presence of the second of the two
> > messages from the aSub record and the third from the sub record. It
> > looks as if the waveform record processed twice when the problem
> occurred.
> >
> > The sub record message prints the values of the fields a through l,
> > where a is this counter and e is previous counter.
> >
> > Thank you Mark for looking at this,
> > Zen
> >
> >
> > > -----Original Message-----
> > > From: Mark Rivers [mailto:[email protected]]
> > > Sent: Thursday, July 19, 2012 6:10 PM
> > > To: Szalata, Zenon M.
> > > Cc: [email protected]; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Zen,
> > >
> > > What is the diagnostic mechanism you are using to determine that the
> > > counter has incremented by 2? Is there a possibility that the
> > > waveform record is actually processing every callback OK, but that
> > > arrays are being dropped between the waveform record and whatever
> > > diagnostic client you are running?
> > >
> > > Mark
> > >
> > > ________________________________________
> > > From: Szalata, Zenon M. [[email protected]]
> > > Sent: Thursday, July 19, 2012 5:23 PM
> > > To: Mark Rivers
> > > Cc: [email protected]; Williams Jr., Ernest L.
> > > Subject: RE: asynPortDriver
> > >
> > > Hi Mark,
> > > I modified my version 2 asynPortDriver based device driver so that now it:
> > > 1. in ISR (at interrupt level) it read data from all 16 channels and
> > > sends the data by message queue.
> > > 2. In a high priority thread the data is received from the message
> > > queue, unpacked, and posted as an array.
> > > The unpacked data are put into an array where the first element
> > > is a counter, which is incremented in the
> > > data unpacking thread, and elements 1 through 16 are the ADC values.
> > > In the db layer I still see missing data. Once in a while the
> > > counter value, which is the first element in the data array, is
> > > larger by 2 than the previous counter value. The test is done at 100Hz.
> > >
> > > The version 3 of the device driver, which is not using asyn, does
> > > not exhibit this behavior. In fact I can run the system with a
> > > trigger rate of up to 400Hz before it starts falling apart. Since I
> > > have a device driver that works (version 3) I could just use it and move on.
> > > But at this point I am still reluctant to just abandon the
> > > asynPortDriver version. Can you think of some tests that could help
> > understand what is causing this problem?
> > > Thanks,
> > > Zen
> > >
> > > > -----Original Message-----
> > > > From: Mark Rivers [mailto:[email protected]]
> > > > Sent: Sunday, July 15, 2012 6:59 PM
> > > > To: Szalata, Zenon M.
> > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Zen,
> > > >
> > > > > In the first version of the device driver, I saw the effect that
> > > > > Kevin Tsubota described in his reply to my email. This is what he said:
> > > >
> > > > > "At Keck we ran into a similar problem where we weren't seeing
> > > > > all the events we expected with a different driver and it turned
> > > > > out to be that ASYN
> > > > only posts on change."
> > > >
> > > > OK, there are 3 parts to my reply about that:
> > > >
> > > > 1) This is not an issue with ASYN in general, it is specific to
> > asynPortDriver.
> > > >
> > > > 2) A future version of asynPortDriver should have a variant of the
> > > > setXXXParam that takes a "force" flag, i.e.
> > > > setIntegerParam(addr, myParam, value, forceCallback) which will
> > > > force a callback the next time callParamCallbacks(addr) is called,
> > > > even if the value has not changed.
> > > >
> > > > 3) Meanwhile it is trivial to force such behavior in the current
> > > > version by doing setIntegerParam(addr, myParam, value+1);
> > > > setIntegerParam(addr, myParam, value);
> > > >
> > > > This will fool the parameter library into thinking that the value
> > > > has changed so it will do the callback on "value". The overhead
> > > > of doing the additional call to setIntegerParam is insignificant.
> > > >
> > > > > In all three version of the device driver The data is read out
> > > > > at interrupt level, which is needed to clear the interrupt.
> > > > > Then I schedule a callback to continue processing the data.
> > > > > This consists of
> > > > unpacking the data words into various components.
> > > >
> > > > I would have to see your code, but what you are doing sounds dangerous.
> > > > What if a second interrupt occurs before the callback occurs to
> > > > process the first? If your data is not put in a queue, then you
> > > > risk missing data. In my Ip330 and ipUnidig drivers I send the
> > > > data from the interrupt service routine to the thread that will do
> > > > callbacks via an epicsMessageQueue. Then even if the thread that
> > > > does callbacks momentarily falls behind I will not lose data
> > > > unless the message queue
> > > overflows, and I have a flag to signal if that occurs.
> > > >
> > > > Mark
> > > >
> > > > ________________________________________
> > > > From: Szalata, Zenon M. [[email protected]]
> > > > Sent: Sunday, July 15, 2012 8:20 PM
> > > > To: Mark Rivers
> > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > Subject: RE: asynPortDriver
> > > >
> > > > Hi Mark,
> > > > In the first version of the device driver, I saw the effect that
> > > > Kevin Tsubota described in his reply to my email. This is what he said:
> > > >
> > > > "At Keck we ran into a similar problem where we weren't seeing all
> > > > the events we expected with a different driver and it turned out
> > > > to be that ASYN only posts on change."
> > > >
> > > > In all three version of the device driver The data is read out at
> > > > interrupt level, which is needed to clear the interrupt. Then I
> > > > schedule a callback to continue processing the data. This
> > > > consists of unpacking the data words into various components.
> > > > In the case of the first version of the device driver, the
> > > > callback requests to process the records are done in a for loop.
> > > > Perhaps I have abandoned the first version too quickly.
> > > > Anyway it was clear to me that it would be better to bundle the
> > > > trigger number and data from all ADC channels and pass that to the
> > > > db
> > layer.
> > > >
> > > > I will modify the C subroutine of the subroutine record to also
> > > > print a message when the current gate number is less than the
> > > > previous
> > value.
> > > >
> > > > Thanks,
> > > > Zen
> > > >
> > > > > -----Original Message-----
> > > > > From: Mark Rivers [mailto:[email protected]]
> > > > > Sent: Sunday, July 15, 2012 3:19 PM
> > > > > To: Szalata, Zenon M.
> > > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > > Subject: RE: asynPortDriver
> > > > >
> > > > > How are you passing the data from your interrupt service routine
> > > > > to the thread that is calling callParamCallbacks()? Are you
> > > > > using a mechanism with a queue, like an epicsMessageQueue?
> > > > >
> > > > > I'd like to understand why "the order of record processing is
> > > > > somewhat random" in method 1. I would think that it would be
> > > > > deterministic because there is only 1 callback thread that is
> > > > > processing all the records in the order in which the callback
> > > > > requests were issued, which I assume your driver does in a loop like:
> > > > >
> > > > > for (i=0; i<16; i++) {
> > > > > callParamCallbacks(i);
> > > > > }
> > > > >
> > > > > Mark
> > > > >
> > > > > ________________________________
> > > > > From: Szalata, Zenon M. [[email protected]]
> > > > > Sent: Sunday, July 15, 2012 12:15 PM
> > > > > To: Mark Rivers
> > > > > Cc: [email protected]; Williams Jr., Ernest L.
> > > > > Subject: asynPortDriver
> > > > >
> > > > > Hi Mark,
> > > > > I am trying to understand why my device driver for a CAEN VME
> > > > > gated ADC module, which is based on asynPortDriver class works
> > incorrectly.
> > > > > The module has 16 ADC channels and is setup to generate an
> > > > > interrupt for each gate signal. The interrupt triggers data
> > > > > readout and record processing. For testing, the trigger/gate
> > > > > rate is 100
> > Hz.
> > > > > The device driver maintains a gate counter, which gets
> > > > > incremented each time the data
> > > > is processed.
> > > > > So far I have three versions of the device driver.
> > > > >
> > > > > 1. In this version I have 16 I/O Intr scanned ai records. The
> > > > > gate counter is pushed to a longin record.
> > > > > This version has the problem that the order of record
> > > > > processing is somewhat random and the IOC
> > > > > using this VME module requires the processing to be deterministic.
> > > > > To overcome this difficulty,
> > > > > I wrote the second version.
> > > > >
> > > > > 2. In this version I pack the gate counter and all 16 ADC
> > > > > channel values into an array and this array gets
> > > > > pushed to a waveform record. Then, the data gets
> > > > > distributed to ai records using sub array records.
> > > > > This part works fine. I have subroutine record which is
> > > > > processed each time the data waveform record
> > > > > is processed. It checks for missing triggers, by expecting
> > > > > that the gate counter value for this data array
> > > > > is larger by 1 than the previous one. This is where the
> > > > > device driver code seems to fail.
> > > > > I let the IOC run for a bit over 35 hours. More precisely
> > > > > the time corresponds to 12700898 triggers.
> > > > > During this time the subroutine record reported 42 missing
> > > > > triggers. They occurred spread out
> > > > > throughout the run nearly but not equally spaced in time. I
> > > > > have a simple print statement
> > > > > in the C routine which prints a few numbers. These numbers
> > > > > tell me that the trigger incremented by 2.
> > > > > I just looked at the logic in the C routine and I see that I
> > > > > am not printing messages when the new gate
> > > > > counter is less than the previous one. So it is possible
> > > > > that from time to time the waveform record gets
> > > > > the data out of order. It is also possible, but I think
> > > > > unlikely, that from time to time the data is lost.
> > > > >
> > > > > 3. This version of the device driver is non-asyn. It consists
> > > > > of two parts, device support and record support.
> > > > > I wrote it, by taking the second version of the device
> > > > > driver and replaced the asynPortDriver part with
> > > > > device support. Also, I created a new version of the IOC, by
> > > > > taking the IOC which used the second version of
> > > > > the device driver and modifying the db files and the st.cmd
> > > > > file as needed. I ran this IOC version
> > > > > overnight, so far it has processed 4 and half million triggers
> > > > > and not a single missed trigger.
> > > > >
> > > > > I wonder if the behavior observed using the second version of
> > > > > the device driver is to be expected?
> > > > > Does it mean, that for some applications the asynPortDriver
> > > > > approach to device driver implementation might be not appropriate?
> > > > > Zen
- Replies:
- RE: asynPortDriver Mark Rivers
- References:
- asynPortDriver Szalata, Zenon M.
- RE: asynPortDriver Mark Rivers
- RE: asynPortDriver Szalata, Zenon M.
- RE: asynPortDriver Mark Rivers
- RE: asynPortDriver Szalata, Zenon M.
- RE: asynPortDriver Mark Rivers
- RE: asynPortDriver Szalata, Zenon M.
- RE: asynPortDriver Mark Rivers
- Navigate by Date:
- Prev:
Re: asynPortDriver Andrew Johnson
- Next:
RE: asynPortDriver Mark Rivers
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
<2012>
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
- Navigate by Thread:
- Prev:
Re: asynPortDriver Andrew Johnson
- Next:
RE: asynPortDriver Mark Rivers
- Index:
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
<2012>
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024