EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Monitor timeouts with asyn/StreamDevice, and issues in DRTO with UDP
From: Érico Nogueira Rolim via Tech-talk <tech-talk at aps.anl.gov>
To: Zimoch Dirk <dirk.zimoch at psi.ch>, "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Fri, 22 Sep 2023 11:28:01 +0000
Hi, Dirk

That's a good point on records containing the last known state :)

Counting the errors would give us even more information than the latch, 
so I appreciate the idea.
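
To illustrate why a counter tells us more than a latch, here is a minimal Python sketch (not EPICS database code). It assumes a timeout shows up as INVALID severity on the record. The function names and the sample sequence are hypothetical.

```python
from typing import Iterable


def latch(severities: Iterable[str]) -> bool:
    """Return True if any sample was INVALID (what a latch would remember)."""
    return any(s == "INVALID" for s in severities)


def count_error_entries(severities: Iterable[str]) -> int:
    """Count transitions into INVALID, i.e. how many separate error episodes occurred."""
    count = 0
    previous = "NO_ALARM"
    for s in severities:
        # Only count the moment the record enters INVALID, not every sample spent there.
        if s == "INVALID" and previous != "INVALID":
            count += 1
        previous = s
    return count


# Hypothetical sequence of severities seen on successive processing cycles.
samples = ["NO_ALARM", "INVALID", "NO_ALARM", "INVALID", "INVALID", "NO_ALARM"]
latched = latch(samples)
episodes = count_error_entries(samples)
```

The latch only says that something went wrong at some point; the counter, once archived, also shows how often and whether the errors came in bursts.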

We have semi-permanent logging (our IOCs run as docker services), but a 
proper log server for StreamDevice errors sounds great, as it would help 
in filtering and might allow us to more easily correlate errors between 
devices. Thanks for the suggestion!
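
As a rough idea of the filtering and correlation I have in mind, here is a minimal Python sketch that summarizes a file written with 'streamSetLogfile'. It assumes every error line has the same "date time port record: message" layout as the iocsh messages quoted further down in this thread. The regular expression and the function names are my own and not part of StreamDevice.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: "YYYY/MM/DD HH:MM:SS.ffffff PORT RECORD: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<port>\S+) (?P<record>\S+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, port, record, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S.%f")
    return ts, m["port"], m["record"], m["msg"]


def summarize(lines):
    """Count messages per record and per (minute, port) to spot bursts across devices."""
    per_record = Counter()
    per_minute = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a timestamped error line
        ts, port, record, _msg = parsed
        per_record[record] += 1
        per_minute[(ts.replace(second=0, microsecond=0), port)] += 1
    return per_record, per_minute


example = [
    "2023/09/18 15:30:55.702487 TIPORT DE-23RaBPM:TI-EVE:readAndUpdate: device TIPORT 0 disconnected",
    "** reconnected cable **",
    "2023/09/18 15:30:55.800291 TIPORT DE-23RaBPM:TI-EVE:FrmVersionA-Cte: "
    "StreamCore::lockCallback(StreamIoSuccess) called unexpectedly",
]
per_record, per_minute = summarize(example)
```

Grouping by minute and port is just one possible choice; with several IOCs logging to the same place, the same key could include a host name to line up errors between devices.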

I'm not sure what the best way of providing sample data would be. How do 
you usually do it?

IOC: https://github.com/lnls-dig/sinap-timing-epics-ioc

Startup file: 
https://github.com/lnls-dig/sinap-timing-epics-ioc/blob/master/iocBoot/ioctiming/stEVE.cmd

Proto: 
https://github.com/lnls-dig/sinap-timing-epics-ioc/blob/master/timingApp/Db/timing.proto

Db: 
https://github.com/lnls-dig/sinap-timing-epics-ioc/blob/master/timingApp/Db/fw_version.db 
(there are a few other databases, but this is the one with the record 
that did the unexpected locking)

To debug this issue further, we are looking into increasing 
ReplyTimeout. We also tried switching the Lantronix device responsible 
for serial-to-network conversion from UDP (which is how it was 
configured by our supplier) to TCP, but so far that has exposed other 
issues: we hit ReadTimeout, the input gets garbled, and finally the IOC 
segfaults, which was quite unexpected. I'm going to look into 
increasing ReadTimeout and try to get a backtrace for the segfault.

Cheers,
Érico

On 19/09/2023 07:02, Zimoch Dirk wrote:
> Hi Érico,
>
> Usually EPICS records contain the last known state, not a history. Thus, it is
> to be expected that the status goes back to "all is fine" as soon as the
> connection is re-established.
>
> A few things you can do:
>
> 1. Count the errors (using a calc record). With this method, you can at least
> see the counter increasing. If you like, you can archive this counter and see
> later if there had been bursts, etc.
> 2. Log the error messages to a log server. In addition to seeing the problem on
> the ioc shell, this allows you to look up events later in a log file or a
> database (e.g. logstash). By default, StreamDevice logs to stderr. But you can
> use 'streamSetLogfile filename' to copy error messages to a file. (Call with no
> file name to stop logging to the file.)
>
> That the connection is not re-established is unexpected (as the message says):
>> 2023/09/18 15:30:55.800291 TIPORT DE-23RaBPM:TI-EVE:FrmVersionA-Cte: StreamCore::lockCallback(StreamIoSuccess) called unexpectedly
>
> I need to analyze this. Can you send me your records, protocols and the port
> configuration from the startup script? Also some sample data would be helpful.
>
>
> Dirk
>
> On Mon, 2023-09-18 at 15:49 +0000, Érico Nogueira Rolim via Tech-talk wrote:
>> Hi!
>>
>> I'm using StreamDevice with a UDP device, and we have been observing some communication timeouts caused by packet drops/high CPU load (per our testing), which we only noticed due to actually opening the IOC shell and seeing multiple "No reply within 1000 ms to ..." messages. However, I'd like to be able to observe these (instantaneous) timeouts from a PV. Checking the alarm of the PVs isn't enough, because it's cleared as soon as the next communication attempt works, and having to check our archiver data for the information doesn't scale well for operation.
>>
>> I instantiated the asynRecord for our port, and tried setting .DRTO to "Yes", in order to observe disconnections (which I simulated by fully disconnecting the ethernet cable from the device). The .CNCT field does go to "Disconnect", but it goes back to "Connect" for every record that is processed, and displaying the timestamp of the PV in an interface wouldn't be enough, because it is "<undefined>" (per camonitor output). Furthermore, in what seems to be a bug, once I reconnect the ethernet cable, the iocsh prints the following messages:
>> 2023/09/18 15:30:55.702487 TIPORT DE-23RaBPM:TI-EVE:readAndUpdate: device TIPORT 0 disconnected
>> ** reconnected cable **
>> 2023/09/18 15:30:55.800291 TIPORT DE-23RaBPM:TI-EVE:FrmVersionA-Cte: StreamCore::lockCallback(StreamIoSuccess) called unexpectedly
>> and from this point on, it simply doesn't reestablish a connection. Pulling the cable again doesn't cause anything to be printed in iocsh anymore, and the records have stopped being updated (though at least they do have alarm information). I can provide more information, if necessary :)
>>
>> Is there some other way of monitoring timeouts and disconnections when using asyn/StreamDevice which I'm missing?
>>
>> Cheers,
>> Érico



