Experimental Physics and Industrial Control System
Subject: Re: Stalled CA connection (IOC to CS-Studio archiver)
From: Ralph Lange <[email protected]>
To: EPICS Core Talk <[email protected]>
Date: Fri, 25 Aug 2017 21:52:28 +0200
Update: We have been able to reproduce the issue on a set of wiresharked boxes.
After staring at wireshark captures for a good while, here's our current most probable explanation:
- The caj Java client (BEAUTY archive engine) is happily connected to the IOC.
- The connection is very busy, at more than 40K updates (double+status+timestamp) per second. We see congestion mode (events_off / events_on message pairs), i.e. the client is the bottleneck, while the IOC is still sort of relaxed, sending out beacons like clockwork.
- At some point, the client gets so busy that it stops decoding the IOC's beacons. The reason for this moment of saturation is not clear. Garbage collection?
- When the next beacon gets through (after a few minutes!), the client decides the IOC is unresponsive, and issues a ping (echo request) on the TCP circuit.
- The TCP circuit is so busy that the echo doesn't return within the 5 second timeout period. The client declares the IOC dead (while continuing to receive lots of updates from it).
- We see the client doing name resolution broadcasts, the IOC answers.
- The client - still receiving lots of updates - issues event_add subscription requests for all channels.
- Trying to send the initial value update responses for the new subscriptions, within a very short time the IOC fills its send buffers and blocks.
- At the same time, the client fills its send buffers with event_add messages and blocks.
- Both ends continuously fail to send (send buffers full), and never receive (receive buffers not empty).
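The last three steps can be illustrated with a toy model. This is only a sketch, not CA or caj code: the `Peer` class, the `run` function and the buffer size are hypothetical, and it assumes each end refuses to read from the circuit while it still has messages it could not send. Under that assumption both ends stall as soon as both buffers are full.

```python
from collections import deque


class Peer:
    """One end of a circuit that only reads once all its own sends are done.

    Hypothetical toy model; `inbox` stands in for the combined socket
    buffers between the other end's send call and this end's receive call.
    """

    def __init__(self, name, pending, bufsize):
        self.name = name
        self.pending = pending   # messages this end still wants to send
        self.bufsize = bufsize   # capacity of the buffer feeding this end
        self.inbox = deque()

    def step(self, other):
        """Try to make progress once; return True if anything moved."""
        if self.pending > 0:
            if len(other.inbox) < other.bufsize:
                other.inbox.append(self.name)
                self.pending -= 1
                return True
            # Blocked in send: the read below is never reached.
            return False
        if self.inbox:
            self.inbox.popleft()
            return True
        return False


def run(client_msgs, ioc_msgs, bufsize, max_steps=10_000):
    """Alternate both peers until nothing moves; report the outcome."""
    client = Peer("client", client_msgs, bufsize)
    ioc = Peer("ioc", ioc_msgs, bufsize)
    for _ in range(max_steps):
        moved_client = client.step(ioc)
        moved_ioc = ioc.step(client)
        if not (moved_client or moved_ioc):
            break
    drained = (client.pending == 0 and ioc.pending == 0
               and not client.inbox and not ioc.inbox)
    return "drained" if drained else "deadlock"


if __name__ == "__main__":
    print(run(client_msgs=5, ioc_msgs=5, bufsize=8))
    print(run(client_msgs=100, ioc_msgs=100, bufsize=8))
```

With small bursts both sides finish sending and then drain their input. Once each side has more to send than the other side's buffer can hold, neither ever gets back to reading, which matches what we see in the captures.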
Bottom line: an IOC may be unresponsive but not dead at all. Getting lots of updates should count as a sign of life.
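To make the suggestion concrete, here is a minimal sketch of a liveness check in which any inbound traffic on the circuit counts as a sign of life. It is not how caj implements its checks; the class name, method names and the 30 second quiet period are assumptions for illustration, and only the 5 second echo timeout comes from the observation above.

```python
class CircuitLiveness:
    """Hypothetical per-circuit liveness tracker (illustration only)."""

    def __init__(self, quiet_timeout=30.0, echo_timeout=5.0):
        self.quiet_timeout = quiet_timeout  # assumed value, not from caj
        self.echo_timeout = echo_timeout    # 5 s as seen in the captures
        self.last_rx = 0.0
        self.echo_sent_at = None

    def on_receive(self, now):
        """Call for every inbound message: beacon, update or echo reply."""
        self.last_rx = now
        # Any traffic proves the peer is alive, so cancel a pending echo.
        self.echo_sent_at = None

    def poll(self, now):
        """Return 'ok', 'send_echo', 'waiting' or 'dead'."""
        if self.echo_sent_at is not None:
            if now - self.echo_sent_at >= self.echo_timeout:
                return "dead"
            return "waiting"
        if now - self.last_rx >= self.quiet_timeout:
            self.echo_sent_at = now
            return "send_echo"
        return "ok"


if __name__ == "__main__":
    lv = CircuitLiveness()
    lv.on_receive(0.0)
    print(lv.poll(31.0))   # quiet for too long: ask for an echo
    lv.on_receive(33.0)    # a value update arrives before the echo reply
    print(lv.poll(37.0))   # the update alone keeps the circuit alive
```

The point of the design is that the echo reply is not the only thing that clears the pending check: a value update queued ahead of the reply on a busy circuit is just as good.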
Does that sound realistic? We can put the original captures up for download in an accessible place if someone is interested.
Thanks for your help, ~Ralph