Subject: RE: cac_select_io Segmentation fault
From: "Al Honey" <[email protected]>
To: "Jeff Hill" <[email protected]>
Cc: [email protected]
Date: Wed, 7 Apr 2010 09:25:30 -1000
No negativity noticed :-) We have deployed r3.14 caRepeaters, and we have some random issues. Sometimes one client gets 'get' failure responses indicating that channels are disconnected or nonexistent, whilst other clients are happy; and we see responses from multiple IOCs for a given channel when only one of those IOCs has that channel. I have not had time to delve into those. The errors seem to be related to the real-time IOC board and/or possibly to vxWorks, as there is one newer processor (not PPC, as most of the others are) running a newer version of vxWorks from which I never see those types of errors.

I think our big issue with respect to forging ahead with multi-threaded clients (i.e. using r3.14 for all clients) is that major modifications would need to be made to the layer we have between CA and our clients. That layer hides CA, because CA is not the only inter-process/inter-processor communications mechanism in place; for instance, we have numerous RPC systems and other socket-based systems. Most of our operational clients do not interface directly to CA, hence that 'layer' is critical. It was the application interface provided to all our sister institutions (which create non-EPICS/CA instruments and systems) back in the early '90s. I have been studying that 'layer' in great detail in my attempt to solve the multi-threaded issue (r3.13.10), and it may be that I am now sufficiently less ignorant that I can make those modifications. If that is the case, then I will no doubt have more questions.

So, thanks for pointing me to pertinent documents that will make the transition from r3.13.10 to r3.14 possible. Does someone have a simple multi-threaded example using r3.14, so I can compare the CA library calls with what we are currently using in r3.13.10?

Cheers,
Al

From: Jeff Hill [mailto:[email protected]]

Aloha again Allan,

Sorry, after rereading my message, the tone sounds a bit negative, which wasn't my intent. I should have said, "please also read the section in the reference manual entitled 'Thread Safety and Preemptive Callback to User Code'". When designing this type of application, one must decide whether CA callbacks should occur only while the program is periodically executing in a CA client library function such as ca_poll(), or whether they should occur asynchronously, as soon as the network messages are processed by the auxiliary threads in the library. Either approach can be used in a multi-threaded program.

Jeff
From: [email protected] [mailto:[email protected]] On Behalf Of Jeff Hill

Aloha Allan,

> Does the seg fault occur because r3.13.10 is NOT thread safe?

The R3.13 CA client library is definitely __not__ thread safe, and I can easily imagine that this might be the cause of your seg fault.

> Does anyone have an example of a multi-threaded app using r3.13.10 on UNIX?

The R3.14 CA client library _is_ thread safe, and it should also interoperate fine with R3.13 IOCs. We routinely operate LANSCE with that configuration in our production system: our control room runs R3.14, but many of our IOCs still run R3.13. You should read the section in the reference manual entitled "Thread Safety and Preemptive Callback to User Code".
Jeff

From: [email protected] [mailto:[email protected]] On Behalf Of Al Honey

Aloha,

I am trying to get a multi-threaded application working on SunOS 5.10 with connections to two UNIX IOCs. I get a seg fault in ellDelete(), two statements from the end of cac_select_io() (epics/r3.13.10/base/src/ca/bsd_depen.c). The seg fault does not occur immediately but within a couple of minutes. The connections are to two IOCs running on UNIX, with events from two long records on each IOC; one record on each system is updated at 1 Hz and the other at 10 Hz.

Does the seg fault occur because r3.13.10 is NOT thread safe? Does anyone have an example of a multi-threaded app using r3.13.10 on UNIX?

Thanks,
Allan