Hi Oleg,
I'm not sure whether this matters, but I can tell you that the IP address
in the original message belongs to a Keithley device named bl1keithley. This
might help to narrow down which record(s) are involved.
Marty
On 04/10/2018 04:44 AM, Ralph Lange wrote:
Hi Oleg,
Remote diagnosis of an unknown system is always more of a guessing game than
anything else.
So, first and most important suggestion: refer to a local expert.
Nevertheless, some thoughts:
Statistically, many if not most weird errors on the IOC are caused by memory
corruption.
In your case, the thread suspensions happen when the CA server on the IOC calls
db_event_enable (line 477) or db_event_disable (line 493), and the attempt to
acquire the monitor lock fails with an error.
The routines db_event_enable / db_event_disable are called from within the CA
server when access rights change for a record or when a client sets up / cancels
a monitor.
Were there access rights changes happening on the IOC at 07-Apr 19:59:10 and
08-Apr 08:11:10 (at the "line 493" events)?
Some "line 477" thread suspensions happen at intervals of a few minutes. That
could match a client repeatedly getting ungracefully disconnected (because of
the server-side thread being suspended) and then reconnecting, which provokes
another attempt to lock an invalid monitor lock and another disconnect.
The semaphore locking code is used everywhere, all the time, all over EPICS
Base. Not an obvious candidate for a bug.
So ... I think what you see may be consistent with a memory corruption that
affects at least one record (i.e. the pointer to its monitor lock semaphore) or
the memory area where the semaphore structures have been allocated.
Too bad that the error messages don't show the record involved. That would give
valuable information.
Memory corruption issues (if this is one) are not easy to track down;
strategies and tools depend on the operating system. Which closes the loop to my
first and most important suggestion: refer to a local expert.
Cheers,
~Ralph
On Mon, Apr 9, 2018 at 10:40 PM, Oleg A. Makarov <[email protected]> wrote:
Ralph,
could you please suggest how to diagnose what is causing the suspension of
CAS-client threads?
Thank you,
Oleg