Experimental Physics and Industrial Control System
> We recently ran into a very puzzling problem here using the EPICS casr
> (channel access save restore) tool. The problem showed up in one of two
> ways after you push the casrSave or casrRestore buttons.
On UNIX systems the caRepeater process is spawned by calling fork to
create the new process and then exec to make that process run the
caRepeater executable.
The fork call duplicates every open file descriptor into the new process,
and those descriptors stay open across exec unless something closes them.
To avoid problems, EPICS base does the following.
o In R3.13 the CA client library closes all open files except stdin,
  stdout, and stderr.
o In almost all versions of R3.14, instead of closing open files, the
  "close on exec" flag is set for all sockets created by a special socket
  creation function in EPICS base. This is a less intrusive approach.
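
As a rough illustration of what the "close on exec" flag does, here is a minimal sketch in Python (used only for brevity; EPICS base itself is C, and this is not its code). The names CHILD, fd_state_after_exec, and compare_inheritance are made up for this example. It starts a child process with a descriptor left inheritable, then again with the flag set, and asks the child each time whether the descriptor survived the exec.

```python
import os
import subprocess
import sys

# Child program: report whether the descriptor number passed in argv is open.
CHILD = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "    print('open')\n"
    "except OSError:\n"
    "    print('closed')\n"
)


def fd_state_after_exec(fd):
    """Start a fresh interpreter and ask it whether `fd` survived the exec."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(fd)],
        close_fds=False,  # do not let subprocess close descriptors for us
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode().strip()


def compare_inheritance():
    """Return the child's view of one descriptor without and with FD_CLOEXEC."""
    read_end, write_end = os.pipe()
    try:
        # Emulate the C default: a plain descriptor is inherited across exec.
        os.set_inheritable(write_end, True)
        without_flag = fd_state_after_exec(write_end)
        # Setting the descriptor non-inheritable sets FD_CLOEXEC on POSIX.
        os.set_inheritable(write_end, False)
        with_flag = fd_state_after_exec(write_end)
    finally:
        os.close(read_end)
        os.close(write_end)
    return without_flag, with_flag


if __name__ == "__main__":
    print(compare_inheritance())
```

The flag is set per descriptor, so stdin, stdout, and stderr are untouched by this approach unless someone sets it on them explicitly.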
Jeff
> -----Original Message-----
> From: Dennis Nicklaus [mailto:[email protected]]
> Sent: Wednesday, January 03, 2007 4:10 PM
> To: [email protected]
> Subject: caRepeater must run before casr
>
> We recently ran into a very puzzling problem here using the EPICS casr
> (channel access save restore) tool. The problem showed up in one of two
> ways after you push the casrSave or casrRestore buttons.
>
> Sometimes the Tcl/Tk casr interface would give an error dialog saying,
> "error waiting for process to exit: child process lost (is SIGCHLD
> ignored or trapped?)"
> and other times it would just hang forever after you push
> casrSave/casrRestore
> without the error dialog (though the save/restore would be processed).
>
> The short solution is that you must have caRepeater running before
> running casr.
>
> A brief summary of the gory details: when one presses the Tk casrSave
> button, that causes tcl to
> exec the casave program. casave in turn starts carepeater if carepeater
> isn't already there.
> carepeater, in trying to be a nice forked process, closes all its file
> descriptors except
> stdin, stdout, and stderr. This is part of where the problem starts
> because the pipe open between
> the top-level wish (tcl) shell and the casave program gets dup-ed to
> stdout of casave,
> then when casave clones/forks off carepeater, the same stdout remains
> open in carepeater.
> Then when casave finishes, it's dead, but the higher level tcl is still
> trying to read() on the pipe,
> which is being held open by carepeater. This wouldn't be a problem if
> the high level tcl shell
> were getting a SIGCHLD from the casave process, but by sifting through
> trace output,
> we saw that the casave process was being started with the clone() system
> call without
> specifying SIGCHLD in the flags, and, as the clone() man page says, "If
> no signal is specified, then the parent process is not signaled when
> the child terminates." We don't know if this is a mistake in the
> version of tcl we have or something with the version of linux and TLS we
> happen to be running,
> though it happens on multiple linux kernel versions we have.
>
> YMMV widely depending on your versions of unix and tcl.
>
> I'm not suggesting anything necessarily needs to change in casr or
> caRepeater, just trying to point out a bizarre problem someone else may
> bump into along the way.
>
> Many thanks to Ron Rechenmacher who spent many hours puzzling over this
> one.
>
> Dennis
>
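
The pipe behaviour described in the quoted message can be reproduced in miniature. The sketch below is only an assumption-laden model, not casr or caRepeater code: WORKER_TEMPLATE stands in for casave, a sleeping helper stands in for caRepeater, and seconds_until_eof is a made-up name. It does not model the Tcl SIGCHLD handling at all. It only shows that a read on the pipe does not see end-of-file while a background helper still holds the inherited stdout.

```python
import subprocess
import sys
import time

# Stand-in for casave: start a background helper, print a result, exit at once.
# The helper (stand-in for caRepeater) just sleeps. With an empty {redirect}
# it inherits the worker's stdout, which is the pipe back to the caller.
WORKER_TEMPLATE = (
    "import subprocess, sys\n"
    "subprocess.Popen([sys.executable, '-c',"
    " 'import time; time.sleep({seconds})']{redirect})\n"
    "print('saved')\n"
)


def seconds_until_eof(helper_detaches_stdout, helper_seconds=1.5):
    """Run the worker and time how long a read() on its stdout pipe takes."""
    redirect = ", stdout=subprocess.DEVNULL" if helper_detaches_stdout else ""
    worker = WORKER_TEMPLATE.format(seconds=helper_seconds, redirect=redirect)
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", worker],
                            stdout=subprocess.PIPE)
    # read() returns only at end-of-file, that is, once every process
    # holding the write end of the pipe has closed it or exited.
    output = proc.stdout.read()
    proc.wait()
    return output.decode().strip(), time.monotonic() - start


if __name__ == "__main__":
    print("helper keeps stdout:    ", seconds_until_eof(False))
    print("helper detaches stdout: ", seconds_until_eof(True))
```

In the first case the worker has already exited, yet the read waits for the whole lifetime of the helper. A helper that never exits would hold the reader forever, which matches the hang reported above.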
- Replies:
- Re: caRepeater must run before casr Eric Norum
- References:
- caRepeater must run before casr Dennis Nicklaus