Subject: Re: Controls Network Monitoring Tools
From: Ralph Lange <[email protected]>
To: Martin Pieck <[email protected]>
Cc: [email protected]
Date: Thu, 28 Jul 2011 12:22:10 +0200
Hi Martin,

for our accelerators (BESSY II, MLS, test setups) we decided to take different paths for monitoring the machine and monitoring the controls infrastructure, because we found that the target audiences were quite different: machine alarms go primarily to the operators and machine physicists, while infrastructure alarms are aimed at the controls and IT/network people. In many cases neither group would even be able to assess the other group's alarm messages and correctly anticipate their effects. Of course, there is an intersection that shows up in both systems.
The machine is connected through EPICS, and the common EPICS tools (alh with SMS notification, etc.) are used.
For the controls infrastructure we use a Nagios-based system to run checks and to collect and organize the results. A browser plugin and email are the main notification mechanisms. Nagios originates from the IT sector, so network components and services are extremely well supported with little extra effort. For EPICS-integrated components, active (caget) and passive (camonitor daemon) check plugins are available.
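To illustrate the active approach, here is a minimal Python sketch of a Nagios-style check that shells out to EPICS `caget`. It assumes `caget` from EPICS base is on the PATH (`-t` prints only the value, `-w` sets the Channel Access wait time) and follows the usual Nagios plugin conventions: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN and a single status line with optional performance data after `|`. The function names, the "higher is worse" threshold logic and the PV name `SR:I` are hypothetical; this is not the plugin used at BESSY II.

```python
import subprocess

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def read_pv(pv: str, timeout: float = 2.0) -> str:
    """Return the PV value as printed by `caget -t` (terse: value only)."""
    result = subprocess.run(
        ["caget", "-t", "-w", str(timeout), pv],
        capture_output=True, text=True, timeout=timeout + 5,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "caget failed")
    return result.stdout.strip()


def evaluate(pv: str, value: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a numeric reading onto a Nagios state (assumption: higher is worse)."""
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is shown to people; after it is performance data.
    return code, f"{LABELS[code]} - {pv} = {value} | {pv}={value};{warn};{crit}"


def check_pv(pv, warn, crit, timeout=2.0, reader=read_pv):
    """Run one active check and return (exit_code, status_line).

    `reader` is injectable so the logic can be exercised without a live IOC.
    """
    try:
        value = float(reader(pv, timeout))
    except (RuntimeError, ValueError, OSError, subprocess.SubprocessError) as exc:
        # Disconnected PV, non-numeric value, timeout, or missing caget binary
        return UNKNOWN, f"UNKNOWN - {pv}: {exc}"
    return evaluate(pv, value, warn, crit)
```

Installed as a plugin, a thin wrapper would print the status line and exit with the code, e.g. `code, line = check_pv("SR:I", 5.0, 10.0); print(line); sys.exit(code)`. Reporting UNKNOWN rather than CRITICAL for an unreadable PV keeps "the check itself failed" separate from "the value is bad", which is the distinction Nagios draws.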
While the SNMP device support allows monitoring network component statistics through Channel Access, it is very hard under EPICS (and very easy under Nagios) to monitor the health of web- and network-based services: Is the "Operator Help" wiki running? Is the web server for the panels up? Does the trouble ticketing system work? Is the mail server up and accepting mail for the local domain? Is a certain machine accessible via SSH? Are the archive engines working and connected to their channels? These things are hard to convey through EPICS, but very important for successful operation.
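As a sketch of why such service checks are easy on the Nagios side, the following Python snippet connects over TCP and checks the greeting a server sends on connect: SSH servers announce themselves with a line starting `SSH-`, SMTP servers with reply code `220`. The function names, the `EXPECTED_PREFIX` table and the 1-second warning threshold are hypothetical. It only shows that the service answers; it does not test SSH login or actual mail delivery for the local domain. In practice the standard Nagios plugins package already ships checks such as `check_ssh`, `check_smtp` and `check_http`.

```python
import socket
import time

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical table: greeting prefix each protocol sends on connect
EXPECTED_PREFIX = {"ssh": b"SSH-", "smtp": b"220"}


def read_banner(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Connect over TCP and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            return stream.readline(512)


def classify(service: str, banner: bytes, elapsed: float,
             warn_seconds: float = 1.0) -> tuple[int, str]:
    """Turn a greeting and its response time into a Nagios state."""
    if not banner.startswith(EXPECTED_PREFIX[service]):
        return CRITICAL, f"CRITICAL - unexpected {service} greeting: {banner[:40]!r}"
    if elapsed > warn_seconds:
        return WARNING, f"WARNING - {service} answered slowly ({elapsed:.2f} s)"
    return OK, f"OK - {service} answered in {elapsed:.2f} s"


def check_service(host: str, port: int, service: str,
                  timeout: float = 5.0) -> tuple[int, str]:
    """Active check: is the service up and greeting as expected?"""
    start = time.monotonic()
    try:
        banner = read_banner(host, port, timeout)
    except OSError as exc:  # refused, unreachable, or timed out
        return CRITICAL, f"CRITICAL - {service} on {host}:{port}: {exc}"
    return classify(service, banner, time.monotonic() - start)
```

For example, `check_service("mail.example.org", 25, "smtp")` would return `(0, "OK - smtp answered in ...")` for a healthy server; the host name here is a placeholder.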
Just my 2 cents...
~Ralph

On 27.07.2011 19:43, Martin Pieck wrote:
Hello,

We are in the process of installing new single-mode fiber-optic cable in our LANSCE User Facility and have the opportunity to invest in some network monitoring software and hardware. So I was wondering what people are using to monitor their fiber and twisted-pair controls networks, from the switch and router level all the way down to the device level. What are the pros and cons of that system? Any feedback is appreciated!

Thanks in advance,
Martin