Hi Dirk,
That's a good point on records containing the last known state :)
Counting the errors would give us even more information than the latch,
so I appreciate the idea.
We have semi-permanent logging (our IOCs run as docker services), but a
proper log server for StreamDevice errors sounds great, as it would help
in filtering and might allow us to more easily correlate errors between
devices. Thanks for the suggestion!
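A minimal sketch of that kind of filtering, assuming the log lines keep
the "timestamp port record: message" layout seen in the iocsh output
quoted further down. The regular expression is only a guess at that
format, and the function names (parse_line, count_errors,
ports_per_second) are hypothetical, not part of StreamDevice:

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed layout: "YYYY/MM/DD HH:MM:SS.ffffff PORT RECORD: message"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\S+) (\S+): (.*)$"
)


def parse_line(line):
    """Return (timestamp, port, record, message), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp, port, record, message = m.groups()
    when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f")
    return when, port, record, message


def count_errors(lines):
    """Count logged messages per (port, record) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, port, record, _ = parsed
            counts[(port, record)] += 1
    return counts


def ports_per_second(lines):
    """Group ports by the second in which they logged a message.

    Seconds with more than one port hint at a shared cause.
    """
    buckets = defaultdict(set)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            when, port, _, _ = parsed
            buckets[when.replace(microsecond=0)].add(port)
    return dict(buckets)
```

Feeding it the lines of a file written via 'streamSetLogfile' would give
per-record counts and a rough view of which ports fail together; lines
that do not match the assumed layout are skipped.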
I'm not sure what the best way of providing sample data would be. How do
you usually do it?
IOC: https://github.com/lnls-dig/sinap-timing-epics-ioc
Startup file:
https://github.com/lnls-dig/sinap-timing-epics-ioc/blob/master/iocBoot/ioctiming/stEVE.cmd
Proto:
https://github.com/lnls-dig/sinap-timing-epics-ioc/blob/master/timingApp/Db/timing.proto
Db:
https://github.com/lnls-dig/sinap-timing-epics-ioc/blob/master/timingApp/Db/fw_version.db
(there are a few other databases, but this is the one with the record
that did the unexpected locking)
To debug this issue further, we are looking into increasing
ReplyTimeout. We also tried switching the Lantronix device that handles
serial-to-network conversion from UDP (which is how our supplier
configured it) to TCP, but so far that has exposed other issues: we hit
ReadTimeout, the input gets garbled, and finally the IOC segfaults,
which was quite unexpected. I'm going to look into increasing
ReadTimeout and try to get a backtrace for the segfault.
Cheers,
Érico
On 19/09/2023 07:02, Zimoch Dirk wrote:
> Hi Érico,
>
> Usually EPICS records contain the last known state, not a history. Thus, it is
> to be expected that the status goes back to "all is fine" as soon as the
> connection is re-established.
>
> A few things you can do:
>
> 1. Count the errors (using a calc record). With this method, you can at least
> see the counter increasing. If you like, you can archive this counter and see
> later if there had been bursts, etc.
> 2. Log the error messages to a log server. In addition to seeing the problem on
> the ioc shell, this allows you to look up events later in a log file or a
> database (e.g. logstash). By default, StreamDevice logs to stderr. But you can
> use 'streamSetLogfile filename' to copy error messages to a file. (Call with no
> file name to stop logging to the file.)
>
> That the connection is not re-established is unexpected (as the message says):
>> 2023/09/18 15:30:55.800291 TIPORT DE-23RaBPM:TI-EVE:FrmVersionA-Cte: StreamCore::lockCallback(StreamIoSuccess) called unexpectedly
>
> I need to analyze this. Can you send me your records, protocols and the port
> configuration from the startup script? Also some sample data would be helpful.
>
>
> Dirk
>
> On Mon, 2023-09-18 at 15:49 +0000, Érico Nogueira Rolim via Tech-talk wrote:
>> Hi!
>>
>> I'm using StreamDevice with a UDP device, and we have been observing some communication timeouts caused by packet drops/high CPU load (per our testing), which we only noticed due to actually opening the IOC shell and seeing multiple "No reply within 1000 ms to ..." messages. However, I'd like to be able to observe these (instantaneous) timeouts from a PV. Checking the alarm of the PVs isn't enough, because it's cleared as soon as the next communication attempt works, and having to check our archiver data for the information doesn't scale well for operation.
>>
>> I instantiated the asynRecord for our port, and tried setting .DRTO to "Yes", in order to observe disconnections (which I simulated by fully disconnecting the ethernet cable from the device). The .CNCT field does go to "Disconnect", but it goes back to "Connect" for every record that is processed, and displaying the timestamp of the PV in an interface wouldn't be enough, because it is "<undefined>" (per camonitor output). Furthermore, in what seems to be a bug, once I reconnect the ethernet cable, the iocsh prints the following messages:
>> 2023/09/18 15:30:55.702487 TIPORT DE-23RaBPM:TI-EVE:readAndUpdate: device TIPORT 0 disconnected
>> ** reconnected cable **
>> 2023/09/18 15:30:55.800291 TIPORT DE-23RaBPM:TI-EVE:FrmVersionA-Cte: StreamCore::lockCallback(StreamIoSuccess) called unexpectedly
>> and from this point on, it simply doesn't reestablish a connection. Pulling the cable again doesn't cause anything to be printed in iocsh anymore, and the records have stopped being updated (though at least they do have alarm information). I can provide more information, if necessary :)
>>
>> Is there some other way of monitoring timeouts and disconnections when using asyn/StreamDevice which I'm missing?
>>
>> Cheers,
>> Érico