Hi Florian,
FYI. I find that it is perfectly OK to run multiple separate containers with one IOC each on the same machine (multiple IOCs in each container would also work).
This works as long as you are using network=host: because they are all listening on the same port on the same address, the CA search UDP broadcasts reach them all.
They can even start the caRepeater for you; a second attempt to start caRepeater will exit immediately because the CA_REPEATER_PORT is in use. (BUT: this relies on the first IOC container continuing to run. If it exits, it will take down caRepeater until
the next IOC is started. Hence, I recommend you start caRepeater in a dedicated container.)
We are running in Kubernetes and using its container runtime. But the Docker (or Podman) runtime would exhibit the same behaviour as long as you also use network=host.
TLDR:
running multiple IOCs in containers with network=host is really similar to running them natively on the same host and should just work.
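As a rough sketch of the setup described above (one container per IOC, all on the host network, plus a dedicated caRepeater container), the commands could be assembled like this. The image and container names are hypothetical placeholders, and it assumes the base image has `caRepeater` on its PATH; `--network host`, `--detach` and `--name` are standard `docker run` options.

```python
import shlex


def docker_run_command(name, image, command=None):
    """Build a `docker run` argument list that puts the container on the host network."""
    args = ["docker", "run", "--detach", "--name", name, "--network", "host", image]
    if command:
        args.extend(command)
    return args


def plan(ioc_images, repeater_image):
    """Return the commands to run: the dedicated caRepeater first, then one container per IOC."""
    # Assumption: `caRepeater` is on the PATH inside the (hypothetical) base image.
    commands = [docker_run_command("ca-repeater", repeater_image, ["caRepeater"])]
    for name, image in ioc_images.items():
        commands.append(docker_run_command(name, image))
    return commands


if __name__ == "__main__":
    # Hypothetical image names, for illustration only.
    iocs = {
        "ioc-motion": "example/ioc-motion:latest",
        "ioc-camera": "example/ioc-camera:latest",
    }
    for cmd in plan(iocs, "example/epics-base:latest"):
        print(shlex.join(cmd))
```

This only prints the commands, so they can be reviewed before anything is started. Starting caRepeater in its own container keeps it alive regardless of which IOC container exits first.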
From: Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of Florian Feldbauer via Tech-talk <tech-talk at aps.anl.gov>
Sent: 15 May 2023 08:49
To: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: Re: When I use an IOC in a container, streamDevice occasionally reports that protocol has been aborted, which causes the records in the IOC to become inaccessible from the host computer.
Hey all,
we are also working with IOCs running inside Docker Containers, using Modbus, StreamDevice (via serial and TCP ports), and a lot more. We have not seen any issues so far. All our IOCs are running stable.
Regarding our experience with networking: if `net = host` is not used for the container, the UDP broadcast does not seem to be forwarded, and thus IOCs on different hosts (or in separate Docker networks) are not seen by CA/PVAccess.
But usually we only have a single IOC per host, so we simply use `net = host` for our containers.
Another possibility we tested is to have multiple IOCs in individual containers using a dedicated Docker network and run a Gateway + reverse gateway (either directly on the host or inside a container with `net = host`) to establish CA communication between the
IOCs on one host and other hosts.
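To illustrate the broadcast observation above: a container on a bridged Docker network sits in a different IP subnet from the host's LAN interface, so a broadcast sent on one subnet is not addressed to the other. The sketch below only does the address arithmetic. The host address is a hypothetical example, and 172.17.0.0/16 is assumed as the bridge subnet (a common Docker default, but check your own setup).

```python
import ipaddress


def broadcast_address(interface_cidr):
    """Return the broadcast address of the subnet that an interface belongs to."""
    return ipaddress.ip_interface(interface_cidr).network.broadcast_address


def shares_subnet(a_cidr, b_cidr):
    """True if both interfaces are in the same IP subnet (a simplification of 'same broadcast domain')."""
    return ipaddress.ip_interface(a_cidr).network == ipaddress.ip_interface(b_cidr).network


if __name__ == "__main__":
    host_lan = "192.168.1.10/24"   # hypothetical host LAN address
    bridged_ioc = "172.17.0.2/16"  # assumed address on a Docker bridge network
    print(broadcast_address(host_lan))
    print(broadcast_address(bridged_ioc))
    print(shares_subnet(host_lan, bridged_ioc))
```

With `net = host` the IOC uses the host's own interfaces, so this mismatch does not arise; with a dedicated Docker network, something like the gateway pair has to bridge the two subnets.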
Cheers,
Florian
On 5/15/23 09:38, Knap, Giles (DLSLtd,RAL,LSCI) via Tech-talk wrote:
I'd just like to add that we have several IOCs running in containers including AreaDetector (Aravis) and motion controllers (turbo pmac) and have not seen any issue like this.
However, I have yet to try a Stream Device, so I'm interested to see your results, Andrew.
There are challenges with CA discovery and container networking. Our solution is to use the host network in our IOC containers. How have you solved this, and are you able to verify that network routing to/between containers is working?
Hi Mark and Dirk,
Thank you for your responses. I will do my best to identify if it is a container or EPICS issue by separating out the IOC from the container as soon as possible. It won’t be until later next week unfortunately.
Timeouts in the protocol should be recoverable. From what I understand, once this happens all CA access to the IOC is lost. If so, this is a more serious issue.
Mark
From: Tech-talk <tech-talk-bounces at aps.anl.gov>
On Behalf Of Zimoch Dirk via Tech-talk
Sent: Friday, May 12, 2023 2:16 PM
To: Wang, Andrew <wang126 at llnl.gov>
Cc: EPICS tech-talk <tech-talk at aps.anl.gov>
Subject: Re: When I use an IOC in a container, streamDevice occasionally reports that protocol has been aborted, which causes the records in the IOC to become inaccessible from the host computer.
Hi Andy,
I suspect your container stalls from time to time, causing timeouts in the protocol.
I do not think that containers have been designed with real-time performance in mind. Thus I am not really surprised that it does not behave like a pure host. Maybe you can tune how the host schedules the
containers? Or reduce the number of containers per host? It may simply be overburdened. Do you have any figures on the system load?
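One simple way to get such load figures is to sample the load averages on the host over time. This is a minimal sketch using only the Python standard library (`os.getloadavg` is available on Unix-like systems such as Ubuntu); the sample count and interval are arbitrary choices.

```python
import os
import time


def load_per_cpu():
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    cpus = os.cpu_count() or 1
    return tuple(avg / cpus for avg in os.getloadavg())


def sample(count, interval_s):
    """Collect `count` timestamped samples, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        samples.append((time.time(), load_per_cpu()))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    for stamp, (one, five, fifteen) in sample(count=5, interval_s=2.0):
        print(f"{time.ctime(stamp)}  1m={one:.2f}  5m={five:.2f}  15m={fifteen:.2f}")
```

Values that stay well above 1.0 per CPU would suggest the host is oversubscribed. Logging these alongside the times at which the protocol aborts would show whether the two coincide.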
Hi all,
I have created multiple IOCs for the project in which I am involved. They are all running in their own Docker container on a host computer running Ubuntu 20.04. In each Docker container, the following EPICS and support module versions
are used.
- EPICS: 7.0.4
- StreamDevice: 2.8.15
- Asyn: 4.41
In one of the IOCs, I have an SSEQ record that is used to push a scalar value to multiple records that set four parameters for the target instrument. Sometimes streamDevice is unable to push the value to the second parameter,
causing the protocol to abort. Then, a few minutes later, my colleagues and I have observed that no records from the IOC in question can be accessed through Channel Access. This is the error message that we receive.
Read operation timed out: some PV data was not read.
<RECORD_NAME> 0
CA.Client.Exception……………………………………………………..
Warning: "Virtual circuit disconnect"
Context: "op=0, channel=<RECORD_NAME>, type=DBR_TIME_DOUBLE, count=1, ctx="<IP ADDRESS:PORT>"
Source File: ../getCopy.cpp line 91
Current Time: <TIME>
This also meant that I was unable to check the STAT field to see what caused the abort.
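A possible way to capture the STAT field before CA access is lost is to poll it periodically and keep a timestamped history. The sketch below shells out to `caget` (using its `-t` terse and `-w` wait-time options) and assumes `caget` is on the PATH. The record name is a hypothetical placeholder, and treating any non-zero exit status or any stderr output as a failed read is an assumption made here for safety, not documented caget behaviour.

```python
import subprocess
import time


def read_pv(pv, wait_s=2.0, run=subprocess.run):
    """Return the terse caget output for `pv`, or None if the read fails."""
    try:
        result = run(
            ["caget", "-t", "-w", str(wait_s), pv],
            capture_output=True,
            text=True,
            timeout=wait_s + 5,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    # Assumption: a non-zero exit status or anything on stderr means the read failed.
    if result.returncode != 0 or result.stderr.strip():
        return None
    return result.stdout.strip()


def watch(pv, cycles, interval_s=10.0, reader=read_pv, sleep=time.sleep):
    """Poll <pv>.STAT and return (timestamp, value) pairs; value is None on failure."""
    history = []
    for i in range(cycles):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        history.append((stamp, reader(pv + ".STAT")))
        if i < cycles - 1:
            sleep(interval_s)
    return history


if __name__ == "__main__":
    # "MY:RECORD" is a hypothetical record name.
    for stamp, stat in watch("MY:RECORD", cycles=6, interval_s=10.0):
        print(stamp, stat if stat is not None else "<no CA access>")
```

The last successful STAT value and the time of the first failed read could then be compared with the IOC console log to see how the abort and the loss of CA access relate.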
Thank you and I look forward to hearing back from everyone.
Andy
--
This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify
us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd.
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
--
Ruhr-Universität Bochum
AG der Experimentalphysik I
Dr. Florian Feldbauer
NB 2/131 / Fach 125
Universitätsstr. 150
D-44801 Bochum
Office: NB 2/134
Phone: (+49)234 / 32-23563
Fax: (+49)234 / 32-14170
https://paluma.ruhr-uni-bochum.de