Subject: Re: ADAndor IOC Restart Issue: Camera Handle Already Claimed After Ctrl+X Termination
From: "Henrique F. Simoes via Tech-talk" <tech-talk at aps.anl.gov>
To: Tech-Talk EPICS <tech-talk at aps.anl.gov>
Date: Thu, 5 Mar 2026 21:25:38 +0000

On Thu Mar 5, 2026 at 1:15 PM -03, Hu, Yong via Tech-talk wrote:
> In procServ, Ctrl+X is the “kill child” command by default (--killcmd,
> default ^X). When procServ receives it, it kills the child process
> using --killsig, whose default is signal 9 (SIGKILL). SIGKILL cannot
> be handled by the IOC, so no cleanup runs.
>
> If you really want to use Ctrl + X, here is ChatGPT's suggestion: one
> way is to modify your procServ command, passing --killsig=2
ChatGPT trusts the documentation too much. Unfortunately, a few bugs
prevent this setup from actually working.
One of them was reported by Michael Davidsaver a while back [1] (with a
proposed patch): child processes inherit the parent's signal block
mask. This prevents you from using SIGTERM (15), SIGHUP (1) and
SIGPIPE (13) for the --killsig argument.
[1] https://github.com/ralphlange/procServ/pull/50
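To make the mask inheritance concrete, here is a minimal Python sketch
of the general POSIX behavior. It is not procServ's code; the helper
name child_has_sigterm_blocked is made up for this illustration. A
signal blocked in the parent stays blocked in the child across fork and
exec, so the child never has it delivered unless it unblocks the signal
itself.

```python
import signal
import subprocess
import sys

# The child reports whether SIGTERM is in its own blocked-signal mask.
# Calling pthread_sigmask(SIG_BLOCK, []) changes nothing and returns the
# current mask.
CHILD = (
    "import signal; "
    "mask = signal.pthread_sigmask(signal.SIG_BLOCK, []); "
    "print(int(signal.SIGTERM in mask))"
)


def child_has_sigterm_blocked(block_in_parent: bool) -> bool:
    """Spawn a child and return True if it starts with SIGTERM blocked."""
    how = signal.SIG_BLOCK if block_in_parent else signal.SIG_UNBLOCK
    old_mask = signal.pthread_sigmask(how, [signal.SIGTERM])
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True, text=True, check=True,
        )
    finally:
        # Put the parent's mask back exactly as it was.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print("blocked in parent  ->", child_has_sigterm_blocked(True))
    print("unblocked in parent ->", child_has_sigterm_blocked(False))
```

A parent that blocks signals for its own bookkeeping would need to
reset the mask in the child before exec to avoid this.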
Another is the fact that procServ will send a SIGKILL to the child
process right after sending your requested signal. For instance,
consider the following execution:
$ strace -fo /tmp/strace.log ./procServ --killsig=2 unix:/tmp/tail.sock tail -f /dev/null
If we send ^X to tail.sock, we can see the following excerpt in the
strace(1) log file:
```
1634835 write(5, "\r\n@@@ Got a kill command\r\n", 26) = 26
1634835 kill(-1634836, SIGINT) = 0
1634835 write(4, "\n", 1 <unfinished ...>
1634836 <... clock_nanosleep resumed>{tv_sec=0, tv_nsec=393047112}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
1634835 <... write resumed>) = 1
1634836 --- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=1634835, si_uid=1812038345} ---
1634835 wait4(-1, 0x7fff6304581c, WNOHANG, NULL) = 0
1634835 pselect6(6, [3 4 5], NULL, NULL, {tv_sec=0, tv_nsec=500000000}, {sigmask=[], sigsetsize=8}) = 1 (in [4], left {tv_sec=0, tv_nsec=499993481})
1634835 read(4, <unfinished ...>
1634836 +++ killed by SIGINT +++
1634835 <... read resumed>"\r\n", 1599) = 2
1634835 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=1634836, si_uid=1812038345, si_status=SIGINT, si_utime=0, si_stime=0} ---
1634835 write(5, "\r\n", 2) = 2
1634835 wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGINT}], WNOHANG, NULL) = 1634836
1634835 write(5, "\r\n", 2) = 2
1634835 write(5, "@@@ @@@ @@@ @@@ @@@\r\n", 21) = 21
1634835 write(5, "@@@ Received a sigChild for proc"..., 81) = 81
1634835 write(5, "@@@ Current time: Thu Mar 5 17:"..., 44) = 44
1634835 write(5, "@@@ Child process is shutting do"..., 73) = 73
1634835 write(5, "@@@ ^R or ^X restarts the child,"..., 54) = 54
1634835 kill(-1634836, SIGKILL) = -1 ESRCH (No such process)
1634835 close(4) = 0
1634835 close(4) = -1 EBADF (Bad file descriptor)
```
So procServ first correctly sends the requested signal, SIGINT (2) in
this case, which happens to kill the process right away in my example.
However, it also sends a SIGKILL a few syscalls later; here that call
fails with ESRCH only because the child is already gone. If your
process needs to do a little work to deal with SIGINT, it will likely
get forcefully killed before that work finishes.
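The race can be reproduced outside procServ with a small Python sketch.
Everything here is an assumption for illustration: the child script,
the 0.3 s "cleanup" and the helper name cleanup_survives are made up,
and the parent only mimics the signal sequence seen in the strace log
above.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# Child: on SIGINT, spend 0.3 s on simulated cleanup, then create a
# marker file to prove the cleanup ran to completion.
CHILD = r"""
import signal, sys, time
marker = sys.argv[1]
def on_sigint(signum, frame):
    time.sleep(0.3)
    open(marker, "w").close()
    sys.exit(0)
signal.signal(signal.SIGINT, on_sigint)
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def cleanup_survives(delay_before_sigkill: float) -> bool:
    """Send SIGINT, then SIGKILL after a delay; report if cleanup finished."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "cleaned-up")
        with subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE, text=True,
        ) as child:
            child.stdout.readline()   # wait until the handler is installed
            child.send_signal(signal.SIGINT)
            time.sleep(delay_before_sigkill)
            child.kill()              # SIGKILL; harmless if already exited
            child.wait()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("SIGKILL right after SIGINT:", cleanup_survives(0.0))
    print("SIGKILL 1.5 s after SIGINT:", cleanup_survives(1.5))
```

With no delay the marker file is never written, which is the situation
an IOC with a slow exit handler would be in.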
The termination mechanism in procServ is quite messy, IMHO. I guess the
SIGKILL culprit here is the processClass destructor, which SIGKILLs
whenever it has the PID stored [2].
[2] https://github.com/ralphlange/procServ/blob/e106eb80f7ffb7d82de44963f46ae74f96758129/processFactory.cc#L117
Again, Michael proposed a kind of fix for this last year [3], which is
to add an option to specify the amount of time to wait before destroying
the child.
[3] https://github.com/ralphlange/procServ/pull/70
IMO, a more elegant solution would be to make Michael's new
--grace-period argument a timeout instead, and use waitpid(2) to
actually wait for the child to finish. I started a draft patch series
for that some time ago [4], but I never got around to putting it into
good shape.
[4] https://github.com/henriquesimoes/procServ/pull/1
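The "timeout plus wait" idea can be sketched in a few lines of Python.
This is only an illustration of the approach, not the draft patch in
[4]: the names stop_child, spawn, POLITE and STUBBORN are hypothetical,
and Popen.wait stands in for a direct waitpid(2) call. The requested
signal is sent first, the parent waits up to a timeout for the child to
exit, and SIGKILL is used only if the timeout expires.

```python
import signal
import subprocess
import sys

# Child that cleans up for 0.3 s on SIGINT and then exits normally.
POLITE = r"""
import signal, sys, time
def on_sigint(signum, frame):
    time.sleep(0.3)
    sys.exit(0)
signal.signal(signal.SIGINT, on_sigint)
print("ready", flush=True)
while True:
    time.sleep(1)
"""

# Child that ignores SIGINT entirely.
STUBBORN = r"""
import signal, time
signal.signal(signal.SIGINT, signal.SIG_IGN)
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def spawn(script: str) -> subprocess.Popen:
    """Start a child and wait until it has set up its SIGINT disposition."""
    child = subprocess.Popen(
        [sys.executable, "-c", script], stdout=subprocess.PIPE, text=True
    )
    child.stdout.readline()
    return child


def stop_child(child: subprocess.Popen, sig: int = signal.SIGINT,
               timeout: float = 5.0) -> int:
    """Send sig, wait up to timeout seconds, fall back to SIGKILL."""
    child.send_signal(sig)
    try:
        return child.wait(timeout=timeout)   # returns as soon as it exits
    except subprocess.TimeoutExpired:
        child.kill()                         # last resort: SIGKILL
        return child.wait()
    finally:
        child.stdout.close()


if __name__ == "__main__":
    print("polite child exit status:  ", stop_child(spawn(POLITE)))
    print("stubborn child exit status:", stop_child(spawn(STUBBORN), timeout=0.5))
```

Compared with a fixed grace period, a well-behaved child is reaped as
soon as it finishes, and only a child that overruns the timeout is
killed.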
Long story short: what you describe should ideally work, but it won't
with either the latest procServ stable version (2.8.0) or the latest
development version (e106eb8), and probably not with older versions
either.
--
Henrique F. Simões
Control Software Group (SwC)
Brazilian Synchrotron Light Laboratory (LNLS)
Brazilian Center for Research in Energy and Materials (CNPEM)