EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: 'assert (pca->pgetNative)' failed in ../dbCa.c
From: "[email protected]" <[email protected]>
To: "tech-talk ([email protected])" <[email protected]>
Date: Wed, 27 Jun 2018 10:15:21 +0000
Just a little reminder: we've just seen bug #541221 again here at DLS, and the impact was ... disconcerting; it's taken a little while to figure out exactly what's going on.

Here's the text of our event this time:

[Tue Jun 26 15:01:38 2018]DB CA Link Exception: "Virtual circuit disconnect", context "172.23.194.103:5064"
[Tue Jun 26 15:04:15 2018]
[Tue Jun 26 15:04:15 2018]
[Tue Jun 26 15:04:15 2018]
[Tue Jun 26 15:04:15 2018]A call to 'assert(pca->pgetNative)'
[Tue Jun 26 15:04:15 2018] by thread 'CAC-TCP-recv' failed in ../dbCa.c line 683.
[Tue Jun 26 15:04:15 2018]EPICS Release EPICS R3.14.12.3 $Date: Mon 2012-12-17 14:11:47 -0600$.
[Tue Jun 26 15:04:15 2018]Local time is 2018-06-26 14:04:15.383540000 UTC
[Tue Jun 26 15:04:15 2018]Please E-mail this message to the author or to [email protected]
[Tue Jun 26 15:04:15 2018]Calling epicsThreadSuspendSelf()

This happened simultaneously on about thirty IOCs (all running the same template) when an IOC serving a particular PV was rebooted, and as a result all affected IOCs needed to be rebooted.

I see that I last reported this issue just under 14 years ago (wow!) and that it is present and live in the bug tracker.

Now of course you'll see that we're running a pretty elderly version of EPICS, but given the state and conversation around this bug, I have to ask -- has this bug actually been fixed in a more recent version of EPICS?


Looking at the code I see an interesting disjunction between the code that creates a subscription callback and the callback itself.  Let's see if I can create a suitable caricature of the offending code (obviously this is 3.14.12.3 code, newer versions may be very different):


dbCaTask:
    if (link_action & CA_MONITOR_NATIVE) {
        pca->pgetNative = dbCalloc(pca->nelements, element_size);
        status = ca_add_array_event(
            ca_field_type(pca->chid)+DBR_TIME_STRING,
            ca_element_count(pca->chid),
            pca->chid, eventCallback, pca, 0.0, 0.0, 0.0, 0);
    }
    if (link_action & CA_MONITOR_STRING) {
        pca->pgetString = dbCalloc(1, MAX_STRING_SIZE);
        status = ca_add_array_event(DBR_TIME_STRING, 1,
            pca->chid, eventCallback, pca, 0.0, 0.0, 0.0, 0);
    }


eventCallback:
    if (arg.type == DBR_TIME_STRING &&
        ca_field_type(pca->chid) == DBR_ENUM) {
        // use pca->pgetString
    } else {
        // use pca->pgetNative
    }


Notice that the calling code decides whether to allocate pgetNative and pgetString based on the transient link_action variable, whereas the callback decides which buffer to use based on arg.type and ca_field_type().  Now, I'm not familiar with the low level of CA, but I can well imagine that either arg.type or ca_field_type() may depend on external data.  If so, the callback can end up selecting a buffer that was never allocated, which would trip the assert shown in the log above, making this a fairly obvious bug!
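
To make the suspected mismatch concrete, here is a minimal toy model in plain C. It is a sketch only: it does not use the real EPICS headers, and every name in it (toy_link, toy_subscribe, toy_select_buffer, the TOY_* constants) is a hypothetical stand-in rather than anything in dbCa.c. It also assumes, without having confirmed it, that the field type seen by the callback can differ from the one in effect when the buffers were allocated, for example after the serving IOC comes back up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy constants: hypothetical stand-ins, not the real db_access.h values. */
enum toy_type   { TOY_ENUM, TOY_DOUBLE, TOY_TIME_STRING };
enum toy_action { TOY_MONITOR_NATIVE = 1, TOY_MONITOR_STRING = 2 };

/* Toy link record: field_type models what ca_field_type(pca->chid) returns. */
typedef struct {
    enum toy_type field_type;
    void *pgetNative;
    void *pgetString;
} toy_link;

/* Subscription side: buffers are allocated from the transient link_action. */
static void toy_subscribe(toy_link *pca, int link_action)
{
    assert(pca != NULL);
    if (link_action & TOY_MONITOR_NATIVE)
        pca->pgetNative = calloc(1, sizeof(double));
    if (link_action & TOY_MONITOR_STRING)
        pca->pgetString = calloc(1, 40);   /* 40 is an arbitrary toy size */
}

/* Callback side: the buffer is chosen from the event type and field type.
 * A NULL return marks the point where the real callback would assert. */
static void *toy_select_buffer(const toy_link *pca, enum toy_type arg_type)
{
    if (arg_type == TOY_TIME_STRING && pca->field_type == TOY_ENUM)
        return pca->pgetString;
    return pca->pgetNative;
}

/* Returns 1 if the assumed scenario reproduces the mismatch, else 0. */
static int toy_reproduces_mismatch(void)
{
    toy_link link = { TOY_ENUM, NULL, NULL };
    int ok_before, broken_after;

    toy_subscribe(&link, TOY_MONITOR_STRING);   /* only pgetString exists */
    ok_before = toy_select_buffer(&link, TOY_TIME_STRING) != NULL;

    link.field_type = TOY_DOUBLE;   /* assumption: type differs after reboot */
    broken_after = toy_select_buffer(&link, TOY_TIME_STRING) == NULL;

    free(link.pgetString);
    return ok_before && broken_after;
}
```

Calling toy_reproduces_mismatch() from a small main() shows the point: the allocation and the selection are keyed on different inputs, so nothing guarantees that the selected buffer exists. Keying both sides on the same stored state, or checking the pointer and dropping the event instead of asserting, would be two possible ways to close the gap; whether either matches what newer EPICS versions do is not something this sketch claims.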



Replies:
Re: 'assert (pca->pgetNative)' failed in ../dbCa.c Andrew Johnson
