Experimental Physics and Industrial Control System


 

Subject: Re: Suspended thread
From: Andrew Johnson via Tech-talk <tech-talk at aps.anl.gov>
To: tech-talk at aps.anl.gov
Date: Wed, 12 Jul 2023 16:15:43 -0500

Hi Mark,

On 7/12/23 3:38 PM, Mark Rivers via Tech-talk wrote:

I have a Linux IOC (base 7.0.7) that is throwing the following error a few seconds after iocInit:

 

Thread % (0x7f711c021600) suspended

 

I would like to figure out what thread that is and why it is being suspended.  This is the output of epicsThreadShowAll:

<snip>
    CAC-TCP-recv 0x7f711000a770    71172     49       0       OK
    CAC-TCP-send 0x7f711000ab40    71173     51       0       OK
    CAC-TCP-recv 0x7f7110016f60    71174     49       0       OK
    CAC-TCP-send 0x7f7110017360    71175     51       0       OK
    CAC-TCP-recv 0x7f7110022b60    71176     19       0       OK
    CAC-TCP-send 0x7f7110022f60    71177     21       0       OK
    CAC-TCP-recv 0x7f711002f380    71178     49       0       OK
    CAC-TCP-send 0x7f711002f780    71179     51       0       OK
    CAC-TCP-recv 0x7f7110033d00    71180     49       0       OK
    CAC-TCP-send 0x7f7110034100    71181     51       0       OK
      timerQueue 0x7f717400fae0    71182     44       0       OK
         CAC-UDP 0x7f71740206f0    71183     45       0       OK
    CAC-TCP-recv 0x7f71480085f0    71184     42       0       OK
    CAC-TCP-send 0x7f71480089f0    71185     44       0       OK
    CAC-TCP-recv 0x7f7148014fa0    71186     42       0       OK
    CAC-TCP-send 0x7f71480153a0    71187     44       0       OK
       CAS-event 0x7f7144001460    71188     19       0       OK
      CAS-client 0x7f7144001720    71189     20       0       OK
       CAS-event 0x7f7144001de0    71190     19       0       OK
      CAS-client 0x7f71440020a0    71191     20       0       OK
       CAS-event 0x7f7144002760    71192     19       0       OK
      CAS-client 0x7f7144002a20    71193     20       0       OK
       CAC-event 0x7f711c3421a0    71194     21       0       OK
       CAS-event 0x7f71440030e0    71195     35       0       OK
      CAS-client 0x7f71440033a0    71196     36       0       OK
       CAS-event 0x7f71440086d0    71197     19       0       OK
      CAS-client 0x7f7144008990    71198     20       0       OK
       CAS-event 0x7f7144009050    71199     35       0       OK
      CAS-client 0x7f7144009310    71200     36       0       OK
       CAS-event 0x7f71440099d0    71201     19       0       OK
      CAS-client 0x7f7144009c90    71202     20       0       OK
       CAS-event 0x7f714400ac80    71203     35       0       OK
      CAS-client 0x7f714400af40    71204     36       0       OK
       CAS-event 0x7f714400b620    71205     35       0       OK
      CAS-client 0x7f714400b8e0    71206     36       0       OK
       CAS-event 0x7f714400bfa0    71207     19       0       OK
      CAS-client 0x7f714400c260    71208     20       0       OK

 

The thread that was suspended is not in the threads shown with epicsThreadShowAll.

If you could add an epicsThreadShowAll call to your st.cmd file immediately after the iocInit call, you might catch the matching EPICS ID. I suspect it's a CAS-client thread, given the similar existing IDs shown above, but that means it could be a thread that isn't started until a short time after the IOC is running. If you want to keep trying, you could add an epicsThreadSleep() before the epicsThreadShowAll call and change the delay until you manage to catch it, but the timing will probably be random.
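
A minimal sketch of what I mean for the end of st.cmd is shown here. The 2-second delay is just an assumed starting value, not something taken from your IOC; leave the sleep out entirely for the first attempt and only add or adjust it if the thread doesn't show up.

```
iocInit

# Optional: wait a little so threads created shortly after iocInit exist.
# The 2 seconds here is an assumption; tune it until the thread is caught.
epicsThreadSleep 2

# List all threads with their EPICS IDs so the suspended one can be matched
epicsThreadShowAll
```

Compare the IDs in that output against the address printed in the "suspended" message.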

As to how you track it further, that could be harder. IIRC a CAS-client thread is one that is created for a specific CA client program; it processes the incoming CA messages and calls the IOC code to do whatever the client asks for, so it could be processing records and device support.

The message itself comes from the task watchdog, which notices suspended threads; if the thread has registered a notification routine, the watchdog calls it, allowing that subsystem (my guess is RSRV in this case) to clean up and delete the thread. That's why you aren't finding it: the watchdog runs every 6 seconds, so that's the maximum length of time this thread could be in the suspended state.

Do you have any clients connected to this IOC that are reporting CA channels being disconnected? That might be the easiest way to work out what could be causing the problem.
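
From the IOC side, one way to see which CA clients are connected at a given moment is the casr command in the IOC shell. This is only a sketch of how I would look: the level value below is an assumed choice, and higher levels print progressively more detail about each client.

```
# Report on the CA server (RSRV); level 1 is an assumed starting point
casr 1
```

Running it both before and after the "suspended" message appears might show which client has gone away.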

HTH,

- Andrew

-- 
Complexity comes for free, Simplicity you have to work for.

References:
Suspended thread Mark Rivers via Tech-talk
