Experimental Physics and Industrial Control System
Hi Andrew,
Since you see a complete loss of communication with the IOC, I suspect that
something is going seriously wrong on the IOC itself. A normal timeout should
not cause this. On the other hand, a network communication problem can, of
course, cause the timeout.
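One rough way to separate a network problem from a stuck IOC is to check from the host whether a TCP connection to the IOC's Channel Access server port can still be opened while the timeouts occur. This is a minimal sketch assuming Python 3 and the default CA server port 5064 (the port may differ if EPICS_CA_SERVER_PORT is set or if Docker maps it elsewhere); the function name tcp_port_reachable is hypothetical. A successful connect only shows that the network path and port mapping work: the kernel can complete the TCP handshake even when the IOC's CA server threads are blocked.

```python
import socket

# Default Channel Access server (TCP) port; an IOC may be configured to use
# a different one via EPICS_CA_SERVER_PORT.
DEFAULT_CA_SERVER_PORT = 5064


def tcp_port_reachable(host: str, port: int = DEFAULT_CA_SERVER_PORT,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`.

    Success means the network path and port mapping work and something is
    listening; it does not prove the IOC's CA server threads are responsive.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and socket.timeout
        # (a subclass of OSError since Python 3.3).
        return False
```

For example, tcp_port_reachable("172.17.0.2") would probe a container at that address on the default port; the address is only an illustration of a Docker bridge IP.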
Do you have any error messages from the IOC shell?
Have you checked whether some thread has crashed (possibly while holding a
semaphore)?
Is the IOC actually still running after such a fault or has it crashed
completely?
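On Linux, the last question can be answered without the IOC shell by reading /proc. This is a minimal sketch with hypothetical helpers (parse_stat and process_status are not part of EPICS): it returns None when the process no longer exists, and otherwise counts its threads by the state letter documented in proc(5) (R running, S sleeping, D uninterruptible, Z zombie, T stopped). A thread blocked on a semaphore shows up as S, just like an idle one, so this cannot detect a deadlock; the IOC shell's thread listing (e.g. epicsThreadShowAll) is more useful for that. In Docker, the PID inside the container differs from the one on the host, so the PID passed in must belong to the PID namespace the script runs in.

```python
import os
from collections import Counter
from typing import Optional


def parse_stat(text: str) -> tuple:
    """Split a /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat line.

    Returns (pid, comm, fields), where fields[0] is the state letter
    (field 3 in proc(5) numbering). comm is taken between the first '(' and
    the last ')' because thread names may contain spaces or parentheses.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    fields = text[close_paren + 1:].split()
    return pid, comm, fields


def process_status(pid: int) -> Optional[dict]:
    """Return None if the process is gone, else its state and thread states."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            _, comm, fields = parse_stat(f.read())
        tids = os.listdir(f"/proc/{pid}/task")
    except FileNotFoundError:
        return None  # no such process: it has exited or crashed
    states = Counter()
    for tid in tids:
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                states[parse_stat(f.read())[2][0]] += 1
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
    return {"comm": comm, "state": fields[0],
            "threads_by_state": dict(states)}
```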
Or maybe one thread is using up all CPU cycles in some tight loop? (How many
cores does your container use? Only one?) Have you checked the CPU usage with
top/htop?
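`top -H -p <pid>` shows per-thread CPU usage interactively. As a scripted alternative, this minimal sketch (Linux and Python 3 assumed; all function names are hypothetical) samples utime + stime for each thread from /proc/<pid>/task/<tid>/stat twice and reports the busiest threads as a percentage of one core, so a thread spinning in a tight loop would show up near 100%. usable_cpus estimates how many cores the container may actually use by combining the CPU affinity mask with a cgroup CPU quota, if one is set. The cgroup v2 file cpu.max and the v1 files under /sys/fs/cgroup/cpu/ are the usual locations, but the mount points are an assumption and may differ on a given host.

```python
import os
import time


def thread_cpu_ticks(pid: int) -> dict:
    """Map tid -> (thread name, utime + stime in clock ticks)."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                text = f.read()
        except FileNotFoundError:
            continue  # thread exited meanwhile
        close_paren = text.rindex(")")
        comm = text[text.index("(") + 1:close_paren]
        fields = text[close_paren + 1:].split()
        # fields[0] is proc(5) field 3 (state); utime/stime are fields 14/15.
        result[int(tid)] = (comm, int(fields[11]) + int(fields[12]))
    return result


def busiest_threads(pid: int, interval: float = 1.0, top: int = 5) -> list:
    """Sample twice; return [(percent_of_one_core, tid, name), ...]."""
    hz = os.sysconf("SC_CLK_TCK")
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    usage = []
    for tid, (name, ticks) in after.items():
        if tid in before:  # skip threads created during the interval
            delta = ticks - before[tid][1]
            usage.append((100.0 * delta / hz / interval, tid, name))
    usage.sort(reverse=True)
    return usage[:top]


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def usable_cpus() -> float:
    """CPUs in the affinity mask, reduced by a cgroup CPU quota if present."""
    cpus = float(len(os.sched_getaffinity(0)))
    try:
        quota, period = _read("/sys/fs/cgroup/cpu.max").split()  # cgroup v2
        if quota != "max":
            cpus = min(cpus, int(quota) / int(period))
        return cpus
    except (OSError, ValueError):
        pass  # not cgroup v2 (or unreadable); try cgroup v1
    try:
        quota_us = int(_read("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))
        period_us = int(_read("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))
        if quota_us > 0:  # -1 means no quota
            cpus = min(cpus, quota_us / period_us)
    except (OSError, ValueError):
        pass  # no quota information: report the affinity mask only
    return cpus
```

Running busiest_threads(pid) inside the container (for example via docker exec) while the IOC misbehaves would show whether a single thread is saturating a core; comparing that with usable_cpus() shows whether the container only has one core to spare.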
I have never used containers to run IOCs, so I don't know the peculiarities
of such a setup. We run several dozen IOCs on one Linux system, which is itself
a VMware instance on a server cluster alongside several dozen others. We have
never had any problems (i.e. bugs) that I could not also reproduce on a single
physical host.
Dirk
On Fri, 2023-05-12 at 18:06 +0000, Wang, Andrew via Tech-talk wrote:
> Hi all,
>
> I have created multiple IOCs for the project in which I am involved. They all run in their own Docker containers on a host computer running Ubuntu 20.04. Each Docker container uses the following EPICS and support module versions.
>
> EPICS: 7.0.4
> StreamDevice: 2.8.15
> Asyn: 4.41
>
> In one of the IOCs, I have an sseq record that is used to push a scalar value to multiple records that set four parameters on the target instrument. In one instance, StreamDevice was unable to push the value to the second parameter, causing the protocol to abort. A few minutes later, my colleagues and I observed that no records from the IOC in question could be accessed through Channel Access. This is the error message that we received:
>
> Read operation timed out: some PV data was not read.
> <RECORD_NAME> 0
> CA.Client.Exception...............................................
> Warning: "Virtual circuit disconnect"
> Context: "op=0, channel=<RECORD_NAME>, type=DBR_TIME_DOUBLE, count=1, ctx="<IP ADDRESS:PORT>"
> Source File: ../getCopy.cpp line 91
> Current Time: <TIME>
>
> This also meant that I was unable to check the STAT field to see what caused the abort.
>
> Thank you and I look forward to hearing back from everyone.
>
> Andy
>
- References:
- When I use an IOC in a container, streamDevice occasionally reports that protocol has been aborted, which causes the records in the IOC to become inaccessible from the host computer. Wang, Andrew via Tech-talk