Subject: RE: Status Report
From: "Jeff Hill" <[email protected]>
To: "'Matej Sekoranja'" <[email protected]>
Cc: "'Thomas Pelaia II'" <[email protected]>, "'Hiroyuki Sako'" <[email protected]>, "'John Galambos'" <[email protected]>, "'Christopher K. Allen'" <[email protected]>, "'Ernest Williams'" <[email protected]>, "'Kay Kasemir'" <[email protected]>, "'NP Rees'" <[email protected]>, "'EPICS core-talk'" <[email protected]>
Date: Fri, 22 Dec 2006 15:32:17 -0700

Hi Matej,

> While testing I've also found a (dead)lock in CA library.
> Simple lines as:
>
> for (p = 0; p < 1000; p++) {
>     ca_create_channel (pvs[n].name,
>                        pCB,
>                        &pvs[n],
>                        0,
>                        &pvs[n].chid);
>     /*epicsThreadSleep(0.1);*/
> }
>
> can produce a (dead)lock.
> Lock happens randomly and can not be determined by a number of issued requests.
> If I uncomment thread sleep, the lock disappears.
> I can provide a stack trace.

Tests of this type are a normal part of “catime” (the CA performance test) and “acctst” (the CA regression test). For example, the code shown below is in catime.

So I am not sure what is different in your situation. Do you see this problem only when access security files are not installed? At the moment I am testing with the example server. I will also need to run the tests again against an ordinary IOC.

/*
 * test_search ()
 */
LOCAL void test_search (
    ti *pItems,
    unsigned iterations,
    unsigned *pInlineIter )
{
    unsigned i;
    int status;

    for ( i = 0u; i < iterations; i++ ) {
        status = ca_search ( pItems[i].name, &pItems[i].chix );
        SEVCHK (status, NULL);
    }
    status = ca_pend_io ( 0.0 );
    SEVCHK ( status, NULL );

    *pInlineIter = 1;
}

From: Matej Sekoranja [mailto:[email protected]]

Hi,
I've been investigating this issue of CAJ deadlocking an IOC. I've done lots of testing with the latest CAJ and an IOC, but I have never managed to deadlock the IOC. Based on the stack traces of an IOC deadlock (seen once, but never reproducible later) and Mantis bug 268 mentioned by Jeff, I've concentrated on the access rights callback.
I've created a simple record counter:
record(calc, record2) {
    field(VAL, "1")
    field(SCAN, ".1 second")
    field(CALC, "A+1")
    field(INPA, "record2")
}
... and changing access rights policy (so that IOC will generate ACL callbacks to the client).
ASG(DEFAULT) {
    INPA(record2)
    RULE(0, READ)
    RULE(0, WRITE) {
        CALC("A%2")
    }
}
Then I wrote a C client with a preemptive context which blocks (sleeps) in a connection callback. The client blocked at the first connection callback and the server reported 18 created channels. Looking at the "casr 10" output, I noticed that the server's send queue size for this particular blocking client increases up to its limit of 16k. When this limit is reached, the IOC is unable to create any new channel for other clients (e.g. using the caget utility). So any badly written client (C or Java) can block (its TCP buffers become full), but this should not affect the server's other clients. This was tested with 3.14.8.2.
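
To make the send-queue behaviour above concrete, here is a minimal toy model in C. It is only a sketch and not the CA server's actual implementation: the names `client_queue`, `queue_update`, `client_drain` and `updates_until_full` are hypothetical, and the 16-byte update size is an assumption chosen for illustration. Only the 16k limit comes from the "casr 10" observation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: NOT the CA server's real data structures. */
#define SEND_QUEUE_LIMIT 16384u /* the 16k limit seen in "casr 10" */
#define UPDATE_SIZE 16u         /* assumed size of one queued update, in bytes */

typedef struct {
    size_t queued; /* bytes waiting to be sent to this client */
} client_queue;

/* Try to queue one update; returns 1 on success, 0 if the queue is full. */
static int queue_update(client_queue *q)
{
    assert(q != NULL);
    if (q->queued + UPDATE_SIZE > SEND_QUEUE_LIMIT) {
        return 0; /* a real server must now block, drop, or disconnect */
    }
    q->queued += UPDATE_SIZE;
    return 1;
}

/* The client reads up to n bytes from its socket, freeing queue space. */
static void client_drain(client_queue *q, size_t n)
{
    assert(q != NULL);
    q->queued -= (n < q->queued) ? n : q->queued;
}

/* Count how many updates fit while the client never reads. */
static unsigned updates_until_full(client_queue *q)
{
    unsigned n = 0;
    while (queue_update(q)) {
        n++;
    }
    return n;
}
```

Calling `updates_until_full` on an empty queue models the client sleeping in its callback: the queue fills and every later `queue_update` fails until `client_drain` is called. What the server does at that point is the interesting part; in the test above it evidently stalled in a way that affected other clients.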
I tested this with 3.14.9-pre2 and the IOC does not block anymore. Jeff, your patch seems to work.
I guess the deadlocking issue was present when CAJ did not implement flow control (in its first versions). Now CAJ has flow control implemented (and the search algorithm as well), so I guess the IOC deadlock cannot happen anymore.
While testing I've also found a (dead)lock in CA library. Simple lines as:
for (p = 0; p < 1000; p++) {
    ca_create_channel (pvs[n].name,
                       pCB,
                       &pvs[n],
                       0,
                       &pvs[n].chid);
    /*epicsThreadSleep(0.1);*/
}
can produce a (dead)lock. Lock happens randomly and can not be determined by a number of issued requests. If I uncomment thread sleep, the lock disappears. I can provide a stack trace.
Cheers, Matej