EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: sequencer problem
From: Michael Davidsaver via Tech-talk <tech-talk at aps.anl.gov>
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Wed, 10 May 2023 17:27:46 -0700
On 5/10/23 17:24, Mark Rivers wrote:
Hi Michael,

Thanks.  But as I said the motors (e.g. 13BMA:m17.RBV) are running in a VxWorks IOC.  I don't have a debugger for VxWorks.

Ah, well.  So sorry to hear that :(


Mark


-----Original Message-----
From: Michael Davidsaver <mdavidsaver at gmail.com>
Sent: Wednesday, May 10, 2023 7:20 PM
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: tech-talk at aps.anl.gov
Subject: Re: sequencer problem

On 5/10/23 16:23, Mark Rivers via Tech-talk wrote:
I have found that a simple caget also causes strange behavior:

My first suspicion is that some code is blocking for an extended time with a record lock held, either directly or indirectly, e.g. by holding some other lock which a device support blocks on.

If this is such a deadlock, then one potentially easy way to get more information would be to, effectively, catch the blockers in the act.
Run your IOC in a debugger, issue the caget, then immediately switch and issue a manual break (Ctrl+C) in the debugger and finally run "thread apply all backtrace".

If I am right in my guess, this should show a client TCP thread blocked in dbScanLock(), and hopefully give some idea of which other thread is holding that lock.
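For an IOC that runs as an ordinary Linux process, the same "break and dump all threads" step can be scripted so the backtraces are captured right after the caget is issued. The following is only a sketch under stated assumptions: gdb is installed, the IOC's process ID is known, and the helper names (gdb_backtrace_command, capture_backtraces, threads_waiting_on) are hypothetical. It does not apply to a VxWorks target.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb command line that attaches to pid, dumps all thread stacks, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all backtrace"]


def capture_backtraces(pid: int, timeout: float = 60.0) -> str:
    """Run gdb in batch mode against a live process and return its text output."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def threads_waiting_on(backtraces: str, symbol: str = "dbScanLock") -> List[str]:
    """Return the 'Thread N ...' header of each thread whose stack mentions symbol."""
    hits: List[str] = []
    current = None
    for line in backtraces.splitlines():
        if line.startswith("Thread "):
            # gdb prints one "Thread N (...):" header before each stack.
            current = line.strip()
        elif symbol in line and current is not None and current not in hits:
            hits.append(current)
    return hits
```

Calling threads_waiting_on(capture_backtraces(pid)) lists the threads with dbScanLock somewhere in their stack. Finding the thread that actually holds the lock still means reading the full backtrace by hand.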


caget causes a CA disconnect:

corvette:CARS/CARSApp/src>caget 13BMA:m17.RBV

Read operation timed out: some PV data was not read.

13BMA:m17.RBV                  0

CA.Client.Exception...............................................

      Warning: "Virtual circuit disconnect"

      Context: "op=0, channel=13BMA:m17.RBV, type=DBR_TIME_DOUBLE, count=1, ctx="ioc13bma.cars.aps.anl.gov:5064""

      Source File: ../getCopy.cpp line 91

      Current Time: Wed May 10 2023 18:16:47.411698930

..................................................................

The record shows this with dbpr:

ioc13bma> dbpr "13BMA:m17",2

ACCL: 0.2           ACKS: NO_ALARM      ACKT: YES           ADEL: 0
AMSG:               ASG :               ATHM: 0             BACC: 0.2
BDST: 0             BKPT: 00            BVEL: 1             CDIR: 0
CNEN: Disable       DCOF: 0             DESC: Monochromator DHLM: 42.9453
DIFF: 0             DINP: CONSTANT      DIR : Pos           DISA: 0
DISP: 0             DISS: NO_ALARM      DISV: 1             DLLM: -50
DLY : 0.25          DMOV: 0             DOL : CONSTANT      DRBV: 0
DTYP: Mclennan PM304                    DVAL: 12.336010114751
EGU : degrees       ERES: 0             EVNT:               FLNK: CONSTANT
FOF : 0             FOFF: Frozen        FRAC: 1             HHSV: NO_ALARM
HIGH: 0             HIHI: 0             HLM : 0             HLS : 0
HLSV: NO_ALARM      HOMF: 0             HOMR: 0             HOPR: 0
HSV : NO_ALARM      HVEL: 0             ICOF: 0             IGSET: 0
INIT:               JAR : 5             JOGF: 0             JOGR: 0
JVEL: 1             LCNT: 0             LDVL: 12.336010114751
LLM : 0             LLS : 0             LLSV: NO_ALARM      LOCK: NO
LOLO: 0             LOPR: 0             LOW : 0             LRLV: 0
LRVL: 246720        LSPG: Go            LSV : NO_ALARM      LVAL: 3.77871710270
LVIO: 1             MDEL: 0             MISS: 0             MOVN: 0
MRES: 5.0e-05       NAME: 13BMA:m17     NAMSG:              NSEV: NO_ALARM
NSTA: NO_ALARM      NTM : YES           NTMF: 2
OFF : -8.5572930120512                  OMSL: supervisory
OUT : VME_IO #C0 S0 @                   PACT: 1             PCOF: 0
PHAS: 0             PINI: NO            POST:               PP  : 0
PREC: 4             PREM:               PRIO: LOW           PUTF: 1
RBV : 0             RCNT: 0             RDBD: 5.0e-05       RDBL: CONSTANT
RDIF: 0             REP : 0             RHLS: 0             RINP: CONSTANT
RLLS: 0             RLNK: CONSTANT      RLV : 0             RMOD: Default
RMP : 0             RPRO: 0             RRBV: 0             RRES: 0
RSTM: NearZero      RTRY: 0             RVAL: 246720        RVEL: 0
S   : 1             SBAK: 1             SBAS: 0             SCAN: Passive
SDIS: CONSTANT      SET : Use           SEVR: INVALID       SMAX: 1
SPDB: 0             SPMG: Go            SREV: 20000         SSET: 0
STAT: UDF           STOO: CONSTANT      STOP: 0             SUSE: 0
SYNC: 0             TDIR: 0             TIME: <undefined>   TPRO: 0
TSE : 0             TSEL: CONSTANT      TWF : 0             TWR : 0
TWV : 1             UDF : 0             UDFS: INVALID       UEIP: No
UREV: 1             URIP: No            VAL : 3.77871710270 VBAS: 0
VELO: 1             VERS: 7.2           VMAX: 1             VOF : 0

value = 0 = 0x0

The record has PACT=1, probably set in init_record when it could not find the controller.  But should that be causing the CA errors with caget and the sequencer?
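When comparing several records, it can help to pull the interesting fields out of a dbpr dump automatically. This is a rough sketch, not part of EPICS: the function names parse_dbpr and stuck_record_hints are hypothetical, the parser assumes it is given only the "NAME: value" lines (not the trailing "value = 0 = 0x0" shell line), and the three checks simply mirror what stands out in the dump above (PACT=1, SEVR=INVALID, STAT=UDF).

```python
import re
from typing import Dict, List

# A field label is 1-5 upper-case characters, optionally padded, then a colon,
# preceded by whitespace (or start of text) and followed by whitespace (or end).
FIELD = re.compile(r"(?<!\S)([A-Z][A-Z0-9]{0,4}) *:(?=\s|$)")


def parse_dbpr(text: str) -> Dict[str, str]:
    """Map each field name to its value; values wrapped onto the next line are rejoined."""
    matches = list(FIELD.finditer(text))
    fields: Dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse column padding and line wraps into single spaces.
        fields[m.group(1)] = " ".join(text[m.end():end].split())
    return fields


def stuck_record_hints(fields: Dict[str, str]) -> List[str]:
    """Return human-readable notes for field values worth a closer look (heuristic only)."""
    hints = []
    if fields.get("PACT") == "1":
        hints.append("PACT=1: record is marked as processing active")
    if fields.get("SEVR") == "INVALID":
        hints.append("SEVR=INVALID")
    if fields.get("STAT") == "UDF":
        hints.append("STAT=UDF")
    return hints
```

Because values are taken as everything between one label and the next, multi-word values such as "Mclennan PM304" and names containing a colon such as "13BMA:m17" are kept whole.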

Mark

*From:* Mark Rivers
*Sent:* Wednesday, May 10, 2023 5:07 PM
*To:* tech-talk at aps.anl.gov
*Subject:* sequencer problem

Folks,

I am seeing behavior I don’t understand with the sequencer.  I am using base 7.0.6.1 and sequencer 2.2.9 (latest).

I have a VxWorks IOC running a number of motors.  All of the motor records are loaded, but currently one of the motor controllers is not available, so that motor record is not functional.

I am running an SNL program on a Linux IOC, and it connects to the motors on the VxWorks system, including the currently non-functional motor.  The SNL program puts monitors on several of the motor record fields.

I see the following messages after iocInit when the SNL program starts:

seq BM13_Energy, "E=13BMA:E, MONO=13BMA:m17, EXPTAB_Z=13BMD:m22, YXTAL=13BMA:MON:, ZXTAL=13BMA:m14"

sevr=info Sequencer release 2.2.9, compiled Wed May 10 16:25:47 2023

sevr=info Spawning sequencer program "BM13_Energy", thread 0x2c304a0: "BM13_Energy"

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=26, monitored=24, got monitor=21

The SNL program has monitors on 3 record fields for the non-functional motor.  I think that is why there are 24 channels monitored, but only 21 monitors have been received?
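The arithmetic behind that guess (24 monitored minus 21 received leaves 3 outstanding) can be checked mechanically against the sequencer's status lines. The sketch below only parses the message format shown above; the function names are hypothetical, and treating a negative "got monitor" count as "unknown" is an assumption, not documented sequencer behaviour.

```python
import re
from typing import Dict, Optional

STATUS = re.compile(
    r"assigned=(-?\d+),\s*connected=(-?\d+),\s*monitored=(-?\d+),\s*got monitor=(-?\d+)"
)


def parse_status(line: str) -> Optional[Dict[str, int]]:
    """Extract the four channel counters from one sequencer status message."""
    m = STATUS.search(line)
    if m is None:
        return None
    keys = ("assigned", "connected", "monitored", "got_monitor")
    return dict(zip(keys, (int(v) for v in m.groups())))


def missing_monitors(status: Dict[str, int]) -> Optional[int]:
    """Monitors requested but not yet delivered; None if the count is negative (assumed unknown)."""
    if status["got_monitor"] < 0:
        return None
    return status["monitored"] - status["got_monitor"]


def disconnected_channels(status: Dict[str, int]) -> int:
    """Assigned channels that are not currently connected."""
    return status["assigned"] - status["connected"]


line = ("sevr=minor BM13_Energy[0](after 0 sec): "
        "assigned=26, connected=26, monitored=24, got monitor=21")
status = parse_status(line)
print(missing_monitors(status), disconnected_channels(status))
```

For the line shown, this reports 3 missing monitors and 0 disconnected channels, which matches the three monitored fields on the non-functional motor.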

The above seems like it might be normal.  However, after about 30 seconds I get this error on the Linux IOC:

CA.Client.Exception...............................................

      Warning: "Virtual circuit unresponsive"

      Context: "ioc13bma.cars.aps.anl.gov:5064"

      Source File: ../tcpiiu.cpp line 926

      Current Time: Wed May 10 2023 16:53:37.844955871

At the same time I get this error on the VxWorks IOC:

DB CA Link Exception: "Virtual circuit disconnect", context "corvette:38399"

After the exceptions I then get the following messages on the Linux IOC.   Note that it spontaneously dropped 24 of the 26 connections, and “got monitor” changed to -1.

..................................................................

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=2, monitored=24, got monitor=-1

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=2, monitored=24, got monitor=-1

sevr=minor BM13_Energy[0](after 0 sec): assigned=26, connected=2, monitored=24, got monitor=-1

Is this behavior expected?

Thanks,

Mark



