EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Kernel showing segfaults in motorPoller.
From: "Marco A. Barra Montevechi Filho via Tech-talk" <tech-talk at aps.anl.gov>
To: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Cc: SWC <swc at lnls.br>
Date: Fri, 27 Jan 2023 19:47:15 +0000
Solved for the Pmac controllers. Apparently some .cmd files were using pmacCreateCSAxes() to create more axes than needed. Replacing those calls with pmacCreateCSAxis(), once for each axis actually needed, solved it.
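To audit the other IOCs for the same problem, a rough Python sketch can list every pmacCreateCSAxes( call in the .cmd files under a directory, so each one can be checked against the number of coordinate-system axes actually needed. It is plain text matching only: the call's arguments are not parsed, and the directory to scan is an assumption (the current directory by default).

```python
import re
import sys
from pathlib import Path

# A pmacCreateCSAxes( call at the start of a line, allowing leading spaces/tabs.
# Lines commented out with '#' do not match, and pmacCreateCSAxis( is not matched.
CS_AXES_CALL = re.compile(r"^[ \t]*pmacCreateCSAxes\s*\(", re.MULTILINE)


def find_cs_axes_calls(text):
    """Return the 1-based line numbers of pmacCreateCSAxes( calls in a script."""
    return [text.count("\n", 0, m.start()) + 1 for m in CS_AXES_CALL.finditer(text)]


def scan_cmd_files(root):
    """Map each *.cmd file under root to the lines calling pmacCreateCSAxes."""
    hits = {}
    for path in sorted(Path(root).rglob("*.cmd")):
        lines = find_cs_axes_calls(path.read_text(errors="replace"))
        if lines:
            hits[str(path)] = lines
    return hits


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lines in scan_cmd_files(root).items():
        print(f"{path}: pmacCreateCSAxes on line(s) {lines}")
```

Each hit still has to be reviewed by hand; the script only narrows down where to look.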

Also, I was wrong about not having core dumps: the machine was configured to limit the maximum core dump file size to 0, so nothing was generated.

Changing /etc/security/limits.conf solved that. The prlimit command was also very useful.
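The limit that prlimit reports for a running process can also be read from /proc/<pid>/limits. A minimal Python sketch that reads the "Max core file size" row, assuming the standard Linux /proc layout; the PID of the IOC process is whatever procServ started:

```python
import os
import sys

LABEL = "Max core file size"


def _limit_value(field):
    """Convert a /proc limits field to an int, with None meaning 'unlimited'."""
    return None if field == "unlimited" else int(field)


def core_limit_from_text(limits_text):
    """Return (soft, hard) core-size limits from a /proc/<pid>/limits dump."""
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            soft, hard = line[len(LABEL):].split()[:2]
            return _limit_value(soft), _limit_value(hard)
    raise ValueError(f"no '{LABEL}' row found")


def core_limit_of(pid):
    """Read the core-size limits of a running process, e.g. an IOC under procServ."""
    with open(f"/proc/{pid}/limits") as f:
        return core_limit_from_text(f.read())


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = core_limit_of(pid)
    if soft == 0:
        print(f"PID {pid}: core dumps disabled (soft limit 0, hard limit {hard})")
    else:
        print(f"PID {pid}: soft={soft} hard={hard} (None = unlimited)")
```

On Linux, Python's resource.prlimit(pid, resource.RLIMIT_CORE) returns the same pair (with resource.RLIM_INFINITY for "unlimited") and can also change it, as the prlimit command does.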

Thanks,

Marco

From: Marco A. Barra Montevechi Filho
Sent: 26 January 2023 15:01
To: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Cc: SWC <swc at lnls.br>
Subject: Kernel showing segfaults in motorPoller.
 
Hello all, good evening. 

We are experiencing several segfaults with motor IOCs here. The behavior is unpredictable, but I suspect a common cause. Thanks in advance to anyone who has tips on how to debug this.

Host machine: Linux, Debian 9.11, kernel version 4.9.0-11-amd64.

It runs a variable number of EPICS IOCs under procServ, between 20 and 30 depending on the day. The segfaults were noticed in dmesg after some clients (camonitor, or PV objects instantiated in pyepics) reported virtual circuit disconnects.

On this machine, dmesg -T shows:

[qui jan 26 11:43:12 2023] motorPoller[2714]: segfault at 55d4ff3a2e68 ip 00007f38dac4f77b sp 00007f38d61e6e80 error 4 in libpmacAsynMotorPort.so[7f38dac30000+6a000]
[qui jan 26 11:45:21 2023] motorPoller[2814]: segfault at 55a6b6964d58 ip 00007f37ead6177b sp 00007f37e62f8e80 error 4 in libpmacAsynMotorPort.so[7f37ead42000+6a000]
[qui jan 26 11:54:03 2023] motorPoller[3245]: segfault at 55cde92e2df8 ip 00007f015eae877b sp 00007f015a07fe80 error 4 in libpmacAsynMotorPort.so[7f015eac9000+6a000]
[qui jan 26 12:03:40 2023] motorPoller[5289]: segfault at 560a9c276d98 ip 00007fabc71b377b sp 00007fabc274ae80 error 4 in libpmacAsynMotorPort.so[7fabc7194000+6a000]
[qui jan 26 12:22:06 2023] motorPoller[5494]: segfault at 55eaaf933da8 ip 00007ff3a6cbe77b sp 00007ff3a2255e80 error 4 in libpmacAsynMotorPort.so[7ff3a6c9f000+6a000]

dmesg version is 2.29.2.
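Each of these lines can be reduced to an offset inside the shared library: subtracting the load address in brackets from the ip value gives the position of the faulting instruction in libpmacAsynMotorPort.so. For all five lines above that offset is 0x1f77b, which suggests a single crash site. The "error 4" field is the x86 page-fault error code (bit 0 set: protection fault on a present page; bit 1: write access; bit 2: user mode), so 4 means a user-mode read of an unmapped address. A Python sketch of the decoding, assuming the mapping shown in brackets is the library's first segment (usual for toolchains of that era; otherwise that segment's file offset must be added):

```python
import re
import sys

# Format of the kernel's "segfault at ..." message as printed by dmesg.
SEGFAULT = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r" ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r" in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode(line):
    """Decode one dmesg segfault line; return None if the line does not match."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # x86 page-fault error code (printed in hex)
    return {
        "thread": m["comm"],
        "pid": int(m["pid"]),
        "lib": m["lib"],
        # Faulting instruction relative to the mapping's start address; unlike
        # the raw ip, this does not change between runs with ASLR.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "fault_addr": int(m["addr"], 16),
        "page_present": bool(err & 1),  # bit 0: fault on a present page
        "write": bool(err & 2),         # bit 1: write access (else read)
        "user_mode": bool(err & 4),     # bit 2: fault raised in user mode
    }


if __name__ == "__main__":
    # Usage: dmesg | python decode_segfault.py
    for line in sys.stdin:
        info = decode(line)
        if info:
            print(f"{info['thread']}[{info['pid']}] {info['lib']} +{info['offset']:#x} "
                  f"write={info['write']} present={info['page_present']}")
```

The offset can then be mapped to a function and source line with binutils, e.g. addr2line -f -C -e /path/to/libpmacAsynMotorPort.so 0x1f77b, provided the library file is the same build that crashed and was compiled with debug information.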

I have no core dump, possibly because the segfault happens in a thread and not in the main IOC process. The main IOC process can take up to 15 seconds after the kernel segfault message to actually terminate and be restarted automatically by procServ.

The segfaults are quite frequent. The frequency varies, possibly being higher when clients are using the IOC. I have seen whole days without a single segfault and days in which one happens every 10 minutes.

The problematic IOCs use the Pmac module (pulled from https://github.com/dls-controls/pmac), but I have observed similar behavior in IOCs using the NPT NewFocus module.

I tried running

gdb --args /path/to/ioc/binary/executable /path/to/st.cmd/file

with procServ so that when IOC sigfaulted i could telnet into the gdb console and debug, but the only IOC in which i could do that didnt segfault until now.
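A variant of the gdb approach, sketched in Python under the assumption that procServ is given the printed string as its command: gdb starts the IOC immediately and, once the inferior stops on a signal such as SIGSEGV, prints a backtrace of every thread and stays at its prompt. The paths are the same placeholders as in the command above.

```python
import shlex


def gdb_wrapped_command(ioc_binary, st_cmd):
    """Return an argv that runs the IOC under gdb and dumps all thread stacks on a crash."""
    return [
        "gdb",
        "-ex", "set pagination off",   # don't pause long multi-thread output
        "-ex", "run",                  # start the IOC right away
        "-ex", "thread apply all bt",  # runs once the IOC stops, e.g. on SIGSEGV
        "--args", ioc_binary, st_cmd,
    ]


if __name__ == "__main__":
    argv = gdb_wrapped_command("/path/to/ioc/binary/executable", "/path/to/st.cmd/file")
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
```

Because gdb itself is the child process, procServ will not restart the IOC while gdb sits at its prompt, which keeps the crashed state available for inspection.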
In the Pmac IOCs, all I have observed so far are several "Failed to parse" messages from the axis status functions, but I also see these in non-problematic IOCs:

2023/01/25 17:20:51.692 getAxisStatus: Failed to parse position. Key: #33P  Value:  $0000400000000000
2023/01/25 17:20:51.693 pmacCSAxis::GetAxisStatus: Failed to parse position. Key: &1Q89  Value:  $0000f80000000000
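Since these messages also appear in the healthy IOCs, counting them per key in each IOC's log can show whether the crashing IOCs actually produce more of them. The values look like "$"-prefixed hex replies from the controller, which would explain why a numeric parse fails, but that reading is an assumption. A small Python counter over procServ log files (file names are hypothetical):

```python
import re
import sys
from collections import Counter

# e.g. "getAxisStatus: Failed to parse position. Key: #33P  Value:  $0000400000000000"
PARSE_FAIL = re.compile(r"Failed to parse (\w+)\. Key: (\S+)\s+Value:\s+(\S+)")


def count_parse_failures(lines):
    """Count 'Failed to parse' messages per (field, controller key)."""
    counts = Counter()
    for line in lines:
        m = PARSE_FAIL.search(line)
        if m:
            field, key, _value = m.groups()
            counts[(field, key)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_parse_failures.py ioc1.log [ioc2.log ...]
    for log_path in sys.argv[1:]:
        with open(log_path, errors="replace") as f:
            counts = count_parse_failures(f)
        print(log_path, sum(counts.values()), "failures")
        for (field, key), n in counts.most_common(5):
            print(f"  {field} {key}: {n}")
```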

I'm probably going to get more information on that, but for now it's all I have.
Any tips on how to debug this?


Disclaimer: This email and its attachments may contain confidential and/or privileged information. Observe its content carefully and consider possible querying to the sender before copying, disclosing or distributing it. If you have received this email by mistake, please notify the sender and delete it immediately.


Replies:
Re: Kernel showing segfaults in motorPoller. Ralph Lange via Tech-talk
References:
Kernel showing segfaults in motorPoller. Marco A. Barra Montevechi Filho via Tech-talk
