EPICS Home

Experimental Physics and Industrial Control System


 

Subject: Re: Generating metrics for IOC applications
From: TZVETKOV Stephane via Tech-talk <tech-talk at aps.anl.gov>
To: "ralph.lange at gmx.de" <ralph.lange at gmx.de>, "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Thu, 29 Feb 2024 16:26:02 +0000
Hi Ralph,

 From my point of view, the situation you describe is a monitoring problem. In our experience, we encounter similar issues:

- When to schedule the replacement of our hardware, e.g. hard disks before they fail beyond repair?
- How can we be warned when our hardware is failing, e.g. when a fan fails on one of our machines?
- How can we detect if a machine's OS is too old?
- How can we track certain "common" metrics on our machines (RAM, CPU, disk space used, network activity, read/write operations, ...)?
- How can we spot and identify security vulnerabilities in our programs deployed on those machines?
- etc

Your question is more focused on obsolescence, but I think our answer covers this concern.

We mostly use four tools to address this kind of situation:

1. NetBox (https://docs.netbox.dev/en/stable/). This is a documentation tool that enables us to make an inventory of all our equipment.
There's a bit of a learning curve, but you get used to it, and administering your own instance isn't that hard.
Initially, the tool was designed for data centers, but in practice, it can be used for any installation.
What's really interesting about NetBox is that you can integrate plugins (https://docs.netbox.dev/en/stable/plugins/development/):
in your case, it's easy to imagine a plugin that would regularly scan the inventory to detect obsolete equipment (depending on your criteria).
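
The kind of check such a plugin might run can be sketched in plain Python. This is only an illustration under stated assumptions: the record fields (name, kind, installed), the device names, and the age limits are all hypothetical, and the code does not use NetBox's actual data model or plugin API.

```python
from datetime import date

# Hypothetical inventory records; the field names are illustrative only
# and are not NetBox's actual data model.
INVENTORY = [
    {"name": "ioc-vac-01", "kind": "server", "installed": date(2012, 5, 1)},
    {"name": "ioc-rf-02", "kind": "server", "installed": date(2021, 9, 15)},
    {"name": "sw-hall-03", "kind": "switch", "installed": date(2015, 1, 10)},
]

# Assumed obsolescence criterion: maximum service life in years per kind.
MAX_AGE_YEARS = {"server": 8, "switch": 10}


def find_obsolete(inventory, max_age_years, today):
    """Return names of devices older than the allowed age for their kind."""
    obsolete = []
    for device in inventory:
        limit = max_age_years.get(device["kind"])
        if limit is None:
            continue  # no criterion defined for this kind of device
        age_years = (today - device["installed"]).days / 365.25
        if age_years > limit:
            obsolete.append(device["name"])
    return obsolete
```

Passing the reference date as a parameter keeps the check repeatable, so the same scan can be re-run for a planned future date to see what will become obsolete by then.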

2. Grafana (https://grafana.com/docs/grafana/latest/introduction/) + Prometheus (https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/).
These tools help us monitor most of our systems. You just have to install a "data exporter" on the system you want to monitor,
and then you have access to a lot of different metrics. For example, you can check the Wikimedia ones (which are open): https://grafana.wikimedia.org/d/000000305/maps-performances?orgId=1
You can even configure alarms on the metrics you want.
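
To give an idea of what a "data exporter" publishes, here is a minimal sketch that renders one gauge in the Prometheus text exposition format using only the Python standard library. The metric name, help text, and host label are made up for illustration, and a real exporter would normally rely on an existing exporter or client library rather than hand-written formatting.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to a numeric reading.
    Label values are assumed to contain no quotes, backslashes, or
    newlines, so no escaping is performed in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: fraction of disk space in use on one machine.
text = render_gauge(
    "ioc_disk_used_ratio",
    "Fraction of disk space in use",
    {(("host", "ioc-vac-01"),): 0.82},
)
```

Prometheus periodically fetches text like this over HTTP from each monitored system, and Grafana then queries Prometheus to draw the dashboards and evaluate the alarms.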

3. SSH Monitor (an internal project, not open-sourced yet, but we're getting there; you can contact me if you want more details about it).
This is an EPICS Top that will send small SSH commands to target machines in order to extract specific information.
This might sound "insecure", but thanks to SSH (and restricted Bash) this can be done in a very safe way. It allows you to monitor any data from one or multiple target machines,
as long as the target machine hosts an SSH daemon. In fact, it is made in such a way that it allows you to monitor any data, if you can come up with a system command-line that can retrieve it.
It is used in addition to Grafana + Prometheus for the following reasons:
- Grafana + Prometheus cannot be connected to EPICS (as far as I know), so if you want EPICS PVs for the metrics you monitor
(e.g. for "EPICS alarms" with phoebus-alarms, for "EPICS archives" with Archiver Appliance, ...), then SSH Monitor is more appropriate.
- SSH Monitor can also be more appropriate when looking for very specific information, e.g. parsing very particular expressions (specific to your project) in some log files...
- Grafana + Prometheus need a data-exporter to be installed on the target machine, which might be considered "intrusive" and a "NO-GO" for a lot of target machines.
SSH Monitor allows very little to no intervention on the target machine.
- Data-exporters might not support the target machine CPU architecture. In that case, if the target machine hosts an SSH daemon, then you won't have to care much about its CPU architecture with SSH Monitor.
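
The general idea of polling a value over SSH can be sketched as follows. This is not the SSH Monitor implementation (which is an EPICS application and is not public); it is a minimal Python illustration that assumes key-based SSH access to the target and a POSIX `df` on it. The function names are hypothetical, and publishing the result as an EPICS PV is left out.

```python
import subprocess


def run_over_ssh(host, command, timeout=10):
    """Run one command on `host` with the system ssh client and return stdout."""
    result = subprocess.run(
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout


def parse_disk_usage(df_output):
    """Extract the use percentage from `df -P <path>` output."""
    lines = df_output.strip().splitlines()
    fields = lines[-1].split()  # last line describes the file system
    return int(fields[4].rstrip("%"))  # fifth column is "Capacity", e.g. "50%"


def poll_disk_usage(host, path="/", runner=run_over_ssh):
    """Return the disk usage percentage of `path` on `host`."""
    return parse_disk_usage(runner(host, f"df -P {path}"))
```

Keeping the parsing separate from the transport means any command line whose output can be parsed fits the same pattern, and the parser can be tested without a reachable target by passing a stand-in for `runner`.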

4. Nix (https://nixos.org/manual/nix/stable/). This is just a package manager we use to package our programs (mostly EPICS stuff) before deploying them.
If you remember me, then you very probably remember my colleague Rémi NICOLE, who is quite enthusiastic about Nix and NixOS.
Since you mentioned "bitrot", Rémi suggested that I mention the bit-for-bit build reproducibility of the Nix package manager (which is a great guarantee).
Nix can be installed on any Linux distro (even very old ones, such as old CentOS machines, most of the time), without conflicting with the local package manager (yum, dnf, apt, etc).
We also use it to check the full dependency tree of the programs we package with it: this way, we can track our dependencies, and Nix can even warn us about security issues in our dependencies!
There are a lot of other advantages (rollbacks, garbage collection, distributed builds and cache, the declarative nature of Nix, etc), but this is starting to be off-topic.


I hope this answer will help you! Do not hesitate to contact me if you have questions.

Cheers,

Stéphane

CEA / DRF / IRFU / DIS / LDISC 


On Mon, 2024-02-26 at 11:23 +0100, Ralph Lange via Tech-talk wrote:
> 
> Dear all,
> 
> For large control system installations consisting of many different  
> applications (largely independent units covering subsystems), we  
> would like to have something that analyzes the application and  
> generates some metrics, to give us a better grip on what we're  
> dealing with.
> 
> The context is "obsolescence management". More and more of ITER's  
> expected ~170 subsystems are being delivered, integrated and start to
> bitrot.
> 
> With our limited manpower, we need to make informed decisions about  
> where to start updating and migrating. It would help to have a few  
> comparable numbers describing the size, complexity, regularity, ...  
> of the candidates.
> 
> Did anyone ever start efforts in that direction?
> Any formulas that generated meaningful and reliable data?
> 
> Thanks a lot for your help and ideas,
> ~Ralph
>





Replies:
Re: Generating metrics for IOC applications Ralph Lange via Tech-talk
References:
Generating metrics for IOC applications Ralph Lange via Tech-talk
