EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: sequencer problem
From: Mark Rivers via Tech-talk <tech-talk at aps.anl.gov>
To: Michael Davidsaver <mdavidsaver at gmail.com>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Thu, 11 May 2023 00:24:48 +0000

Hi Michael,

Thanks.  But as I said the motors (e.g. 13BMA:m17.RBV) are running in a VxWorks IOC.  I don't have a debugger for VxWorks.

Mark


-----Original Message-----
From: Michael Davidsaver <mdavidsaver at gmail.com> 
Sent: Wednesday, May 10, 2023 7:20 PM
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: tech-talk at aps.anl.gov
Subject: Re: sequencer problem

On 5/10/23 16:23, Mark Rivers via Tech-talk wrote:
> I have found that a simple caget also causes strange behavior:

My first suspicion is that some code is blocking for an extended time with a record lock held, either directly or indirectly, e.g. by holding some other lock that a device support blocks on.

If this is such a deadlock, then one potentially easy way to get more information would be to, effectively, catch the blockers in the act.
Run your IOC in a debugger, issue the caget, then immediately switch and issue a manual break (Ctrl+C) in the debugger and finally run "thread apply all backtrace".

If I am right in my guess, this should show a client TCP thread blocking on dbScanLock(), and hopefully give some idea of which other thread is holding that lock.
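
To illustrate the suspicion above, here is a toy Python model of a reader timing out because another thread holds a lock for too long. It is only a sketch: `record_lock`, `stuck_device_support` and `client_read` are hypothetical names, and nothing here is EPICS code or the actual dbScanLock() implementation.

```python
import threading
import time

# Stands in for a per-record lock. Hypothetical; not the EPICS API.
record_lock = threading.Lock()


def stuck_device_support(hold_seconds, started):
    """Hold the lock while 'waiting on hardware', as blocked code might."""
    with record_lock:
        started.set()          # tell the demo that the lock is now held
        time.sleep(hold_seconds)


def client_read(timeout):
    """Model a client read that gives up if the lock is not free in time."""
    if not record_lock.acquire(timeout=timeout):
        return "timed out"
    try:
        return "ok"
    finally:
        record_lock.release()


def demo(hold_seconds=0.5, timeout=0.1):
    started = threading.Event()
    holder = threading.Thread(
        target=stuck_device_support, args=(hold_seconds, started)
    )
    holder.start()
    started.wait()                      # make sure the holder owns the lock
    while_held = client_read(timeout)   # lock is busy, so this read gives up
    holder.join()                       # holder releases the lock on exit
    after_release = client_read(timeout)
    return while_held, after_release


if __name__ == "__main__":
    print(demo())  # ('timed out', 'ok')
```

In this model the read fails only while the other thread sits on the lock, which is the situation a backtrace of all threads would be expected to expose.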


> caget causes a CA disconnect:
> 
> corvette:CARS/CARSApp/src>caget 13BMA:m17.RBV
> 
> Read operation timed out: some PV data was not read.
> 
> 13BMA:m17.RBV                  0
> 
> CA.Client.Exception...............................................
> 
>      Warning: "Virtual circuit disconnect"
> 
>      Context: "op=0, channel=13BMA:m17.RBV, type=DBR_TIME_DOUBLE, count=1, ctx="ioc13bma.cars.aps.anl.gov:5064""
> 
>      Source File: ../getCopy.cpp line 91
> 
>      Current Time: Wed May 10 2023 18:16:47.411698930
> 
> ..................................................................
> 
> The record shows this with dbpr:
> 
> ioc13bma> dbpr "13BMA:m17",2
> 
> ACCL: 0.2           ACKS: NO_ALARM      ACKT: YES           ADEL: 0
> AMSG:               ASG :               ATHM: 0             BACC: 0.2
> BDST: 0             BKPT: 00            BVEL: 1             CDIR: 0
> CNEN: Disable       DCOF: 0             DESC: Monochromator DHLM: 42.9453
> DIFF: 0             DINP: CONSTANT      DIR : Pos           DISA: 0
> DISP: 0             DISS: NO_ALARM      DISV: 1             DLLM: -50
> DLY : 0.25          DMOV: 0             DOL : CONSTANT      DRBV: 0
> DTYP: Mclennan PM304                    DVAL: 12.336010114751
> EGU : degrees       ERES: 0             EVNT:               FLNK: CONSTANT
> FOF : 0             FOFF: Frozen        FRAC: 1             HHSV: NO_ALARM
> HIGH: 0             HIHI: 0             HLM : 0             HLS : 0
> HLSV: NO_ALARM      HOMF: 0             HOMR: 0             HOPR: 0
> HSV : NO_ALARM      HVEL: 0             ICOF: 0             IGSET: 0
> INIT:               JAR : 5             JOGF: 0             JOGR: 0
> JVEL: 1             LCNT: 0             LDVL: 12.336010114751
> LLM : 0             LLS : 0             LLSV: NO_ALARM      LOCK: NO
> LOLO: 0             LOPR: 0             LOW : 0             LRLV: 0
> LRVL: 246720        LSPG: Go            LSV : NO_ALARM      LVAL: 3.77871710270
> LVIO: 1             MDEL: 0             MISS: 0             MOVN: 0
> MRES: 5.0e-05       NAME: 13BMA:m17     NAMSG:              NSEV: NO_ALARM
> NSTA: NO_ALARM      NTM : YES           NTMF: 2
> OFF : -8.5572930120512                  OMSL: supervisory
> OUT : VME_IO #C0 S0 @                   PACT: 1             PCOF: 0
> PHAS: 0             PINI: NO            POST:               PP  : 0
> PREC: 4             PREM:               PRIO: LOW           PUTF: 1
> RBV : 0             RCNT: 0             RDBD: 5.0e-05       RDBL: CONSTANT
> RDIF: 0             REP : 0             RHLS: 0             RINP: CONSTANT
> RLLS: 0             RLNK: CONSTANT      RLV : 0             RMOD: Default
> RMP : 0             RPRO: 0             RRBV: 0             RRES: 0
> RSTM: NearZero      RTRY: 0             RVAL: 246720        RVEL: 0
> S   : 1             SBAK: 1             SBAS: 0             SCAN: Passive
> SDIS: CONSTANT      SET : Use           SEVR: INVALID       SMAX: 1
> SPDB: 0             SPMG: Go            SREV: 20000         SSET: 0
> STAT: UDF           STOO: CONSTANT      STOP: 0             SUSE: 0
> SYNC: 0             TDIR: 0             TIME: <undefined>   TPRO: 0
> TSE : 0             TSEL: CONSTANT      TWF : 0             TWR : 0
> TWV : 1             UDF : 0             UDFS: INVALID       UEIP: No
> UREV: 1             URIP: No            VAL : 3.77871710270 VBAS: 0
> VELO: 1             VERS: 7.2           VMAX: 1             VOF : 0
> 
> value = 0 = 0x0
> 
> The record has PACT=1, which was probably set in init_record when it could not find the controller.  But should that be causing the CA errors with caget and the sequencer?
> 
> Mark
> 
> *From:* Mark Rivers
> *Sent:* Wednesday, May 10, 2023 5:07 PM
> *To:* tech-talk at aps.anl.gov
> *Subject:* sequencer problem
> 
> Folks,
> 
> I am seeing behavior I don’t understand with the sequencer.  I am using base 7.0.6.1 and sequencer 2.2.9 (latest).
> 
> I have a VxWorks IOC running a number of motors.  All of the motor records are loaded, but currently one of the motor controllers is not available, so that motor record is not functional.
> 
> I am running an SNL program on a Linux IOC, and it connects to the motors on the VxWorks system, including the currently non-functional motor.  The SNL program puts monitors on several of the motor record fields.
> 
> I see the following messages after iocInit when the SNL program starts:
> 
> seq BM13_Energy, "E=13BMA:E, MONO=13BMA:m17, EXPTAB_Z=13BMD:m22, YXTAL=13BMA:MON:, ZXTAL=13BMA:m14"
> 
> sevr=info Sequencer release 2.2.9, compiled Wed May 10 16:25:47 2023
> 
> sevr=info Spawning sequencer program "BM13_Energy", thread 0x2c304a0: "BM13_Energy"
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21
> 
> The SNL program has monitors on 3 record fields for the non-functional motor.  I think that is why there are 24 channels monitored, but only 21 monitors have been received?
> 
> The above seems like it might be normal.  However, after about 30 seconds I get this error on the Linux IOC:
> 
> CA.Client.Exception...............................................
> 
>      Warning: "Virtual circuit unresponsive"
> 
>      Context: "ioc13bma.cars.aps.anl.gov:5064"
> 
>      Source File: ../tcpiiu.cpp line 926
> 
>      Current Time: Wed May 10 2023 16:53:37.844955871
> 
> At the same time I get this error on the VxWorks IOC:
> 
> DB CA Link Exception: "Virtual circuit disconnect", context "corvette:38399"
> 
> After the exceptions I then get the following messages on the Linux IOC.   Note that it spontaneously dropped 24 of the 26 connections, and “got monitor” changed to -1.
> 
> ..................................................................
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=2, monitored=24, got monitor=-1
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=2, monitored=24, got monitor=-1
> 
> sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=2, monitored=24, got monitor=-1
> 
> Is this behavior expected?
> 
> Thanks,
> 
> Mark
> 
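
The sequencer status lines quoted above carry four counters, and the question about 24 monitored channels versus 21 received monitors is simple arithmetic on them. A minimal Python sketch for pulling the counters out of such a line follows. The helper names `parse_status` and `missing_monitors` are hypothetical, and the pattern is inferred only from the lines pasted in this thread, not from the sequencer source.

```python
import re

# Pattern inferred from the status lines quoted in this thread (assumption).
# "\s*" after the comma tolerates the mail client's line wrap.
STATUS_RE = re.compile(
    r"assigned=(-?\d+), connected=(-?\d+),\s*"
    r"monitored=(-?\d+), got monitor=(-?\d+)"
)


def parse_status(line):
    """Return the four counters as a dict, or None if the line does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    assigned, connected, monitored, got_monitor = map(int, match.groups())
    return {
        "assigned": assigned,
        "connected": connected,
        "monitored": monitored,
        "got_monitor": got_monitor,
    }


def missing_monitors(status):
    """Monitored channels that have not yet delivered a first monitor."""
    return status["monitored"] - status["got_monitor"]


if __name__ == "__main__":
    line = (
        "sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, "
        "monitored=24, got monitor=21"
    )
    print(missing_monitors(parse_status(line)))  # 3
```

For the first set of messages this gives 3, which matches the three monitored fields on the non-functional motor. The difference is not meaningful for the later lines where "got monitor" is reported as -1.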


