Experimental Physics and Industrial Control System
Just a little reminder: we've just seen bug #541221 again here at DLS, and the impact was ... disconcerting; it's taken a little while to figure out exactly what's going on.
Here's the text of our event this time:
[Tue Jun 26 15:01:38 2018]DB CA Link Exception: "Virtual circuit disconnect", context "172.23.194.103:5064"
[Tue Jun 26 15:04:15 2018]
[Tue Jun 26 15:04:15 2018]
[Tue Jun 26 15:04:15 2018]
[Tue Jun 26 15:04:15 2018]A call to 'assert(pca->pgetNative)'
[Tue Jun 26 15:04:15 2018] by thread 'CAC-TCP-recv' failed in ../dbCa.c line 683.
[Tue Jun 26 15:04:15 2018]EPICS Release EPICS R3.14.12.3 $Date: Mon 2012-12-17 14:11:47 -0600$.
[Tue Jun 26 15:04:15 2018]Local time is 2018-06-26 14:04:15.383540000 UTC
[Tue Jun 26 15:04:15 2018]Please E-mail this message to the author or to [email protected]
[Tue Jun 26 15:04:15 2018]Calling epicsThreadSuspendSelf()
This happened simultaneously on about thirty IOCs (all running the same template) when an IOC serving a particular PV was rebooted, and as a result all affected IOCs needed to be rebooted.
I see that I last reported this issue just under 14 years ago (wow!) and that it is present and live in the bug tracker.
Now of course you'll see that we're running a pretty elderly version of EPICS, but given the state and conversation around this bug, I have to ask -- has this bug actually been fixed in a more recent version of EPICS?
Looking at the code I see an interesting disjunction between the code that creates a subscription callback and the callback itself. Let's see if I can create a suitable caricature of the offending code (obviously this is 3.14.12.3 code, newer versions may be very different):
dbCaTask:
    if (link_action & CA_MONITOR_NATIVE) {
        pca->pgetNative = dbCalloc(pca->nelements, element_size);
        status = ca_add_array_event(
            ca_field_type(pca->chid)+DBR_TIME_STRING,
            ca_element_count(pca->chid),
            pca->chid, eventCallback, pca, 0.0, 0.0, 0.0, 0);
    }
    if (link_action & CA_MONITOR_STRING) {
        pca->pgetString = dbCalloc(1, MAX_STRING_SIZE);
        status = ca_add_array_event(DBR_TIME_STRING, 1,
            pca->chid, eventCallback, pca, 0.0, 0.0, 0.0, 0);
    }

eventCallback:
    if (arg.type == DBR_TIME_STRING &&
        ca_field_type(pca->chid) == DBR_ENUM) {
        // use pca->pgetString
    } else {
        // use pca->pgetNative
    }
Notice that in the calling code we choose whether to create pgetNative and pgetString based on the transient link_action variable, but in the callback code we choose which to use based on arg.type and ca_field_type(). Now, I'm not familiar with the low-level workings of CA, but I can well imagine that either arg.type or ca_field_type() may depend on external data, making this a fairly obvious bug!