Experimental Physics and
At EPIC, we log different types of errors to log files using procServ. We capture four types of logs: Error logs, Console logs, CaPut logs and Server logs. We push the logs to a central server over an NFS share.

For analytics, we use Grafana dashboards. We use Elasticsearch to process the logs ingested through Logstash. We also use "tags" in the Logstash pipeline to differentiate between errors (db_error, connection_error, etc.).

To capture server status, we use Prometheus with Node_exporter and Blackbox_exporter, which report the health (UP/DOWN status) of the servers. We also show the number of IOCs up/down using "node_systemd_unit_state" of Node_exporter (as each IOC runs as a systemd service).

Our dashboard is divided into three panels:
- the first panel contains the IOC count (server1, server2, etc.),
- the second panel contains Hardware errors,
- the third panel has IOC errors.

The IOC errors are captured using keywords like ERROR, etc. and using "tags".

Best Regards,
Sai

--
Saisrikiran Mudigonda
Project Software Engineer
Extreme Photonics Innovation Centre
TIFR, Hyderabad, India

On Fri, Nov 8, 2024 at 2:33 AM Dale Cox via Tech-talk <tech-talk at aps.anl.gov> wrote:
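
As an illustration of the keyword-and-tag approach described in the message above, here is a minimal Python sketch of how log lines could be classified before they reach a dashboard. It is not the EPIC Logstash pipeline: the tag names db_error and connection_error come from the message, but the regular expressions, the fallback tag "error", and the function names tag_line and count_by_tag are assumptions made for this example.

```python
import re
from collections import Counter

# Tag names are taken from the message; the patterns are hypothetical.
TAG_RULES = [
    ("db_error", re.compile(r"\b(db|database)\b", re.IGNORECASE)),
    ("connection_error", re.compile(r"\b(connection|disconnected|timed out)\b", re.IGNORECASE)),
]

# Only lines containing the keyword ERROR are treated as errors.
ERROR_KEYWORD = re.compile(r"\bERROR\b")


def tag_line(line):
    """Return the tags for one log line, or an empty list if it is not an error."""
    if not ERROR_KEYWORD.search(line):
        return []
    tags = [tag for tag, pattern in TAG_RULES if pattern.search(line)]
    # Hypothetical fallback tag for errors that match no specific rule.
    return tags or ["error"]


def count_by_tag(lines):
    """Count how many times each tag occurs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(tag_line(line))
    return counts
```

A call such as count_by_tag(open(path)) on a procServ log file (path being whatever file is of interest) would give per-tag totals similar to what a dashboard panel might plot.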
ANJ, 11 Nov 2024 |