Thank you very much for this patch, Andrew. We have been testing it in the lab here at DLS and it is looking good; our plan is to roll it out more generally quite soon.
In our case it turns out that the trigger was (accidentally) changing a particular status PV from mbbi to stringin.
> -----Original Message-----
> From: Andrew Johnson [mailto:[email protected]]
> Sent: 29 June 2018 17:34
> To: Abbott, Michael (DLSLtd,RAL,TEC); EPICS tech-talk
> Subject: Re: 'assert (pca->pgetNative)' failed in ../dbCa.c
>
> Hi Michael,
>
> On 06/29/2018 01:43 AM, [email protected] wrote:
> > From: [email protected]
> >> So by this point you're probably hoping that the attached patch
> >> fixes the issue; well congratulations for reading this far, I tried
> >> out my suspicion above and the attached patch does seem to work for
> >> me on the 3.14 branch version, which should be close enough to
> >> yours to be able to apply one way or the other. Please test and let
> >> me know so I can apply it to the Base-3.14 branch and merge up.
> >
> > I've just noticed that the patch doesn't address the data type
> > mismatch directly, only through the separate connection callback. Is
> > this going to be enough to avoid hitting those asserts even in the
> > presence of an IOC server breaking the rules?
>
> Without the patch I can trigger this assertion using the attached pair
> of databases. Start by booting both in separate IOCs, then switch the
> anj:val alias to point to a different record type and reboot that IOC.
> This causes the other IOC to die with:
>
> > epics> DB CA Link Exception: "Virtual circuit disconnect", context "tux.aps.anl.gov:44710"
> >
> >
> >
> > A call to 'assert(pca->pgetNative)'
> > by thread 'CAC-TCP-recv' failed in ../dbCa.c line 686.
> > EPICS Release EPICS R3.14.12.7-DEV.
> > Local time is 2018-06-27 14:45:03.260025965 CDT
> > Please E-mail this message to the author or to [email protected]
> > Calling epicsThreadSuspendSelf()
>
> After applying that patch I can see all the INP links reconnecting and
> getting new values from the alias without any problems. It takes a
> couple of scan periods for everything to reconnect and sync up in this
> example, but there were no more crashes in my testing.
>
> > After all, an assert is a confident statement that the invariant in
> > question is never going to be broken, because all of the elements of
> > the invariant are under *our* control; but in this case aren't we
> > still in the situation where the ca_field_type() is not as expected?
> > Or are you saying (implicitly) that here connectionCallback() is
> > *guaranteed* to be called before any change of ca_field_type()?
>
> Au contraire, ca_field_type() should have been updated before the call
> to connectionCallback(). However the assertion that fails is in
> eventCallback() which isn't called until after connectionCallback(). By
> clearing the original monitor subscription inside connectionCallback()
> we are stopping the calls to eventCallback() with the old native data
> type.
>
> > I have a feeling that in our case when we saw the failure, it wasn't
> > so much that the restarting server changed its record type, but that
> > there was something rather more bumpy about its restart (my colleague
> > working on it was trying to migrate between different EPICS and Linux
> > versions with some unexpected failures). However the only concrete
> > evidence we have is the error message and a large coincidence.
>
> If you can find another way to trigger this assertion failure with the
> patch applied please try to replicate it so someone else can trigger it.
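
For anyone following the thread, the ordering Andrew describes above (the connection callback runs first and clears the old subscription, so the event callback never sees data for the old native type) can be sketched with a toy model. This is only an illustration under stated assumptions: it is not the code in dbCa.c and not the patch itself, and every name in it (toy_link, toy_connection_callback, toy_event_callback and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from EPICS Base. */
typedef enum { TYPE_ENUM, TYPE_STRING } field_type;

typedef struct {
    field_type native_type;  /* type recorded when the link last connected */
    bool subscribed;         /* is a monitor subscription currently active? */
    void *native_buf;        /* buffer sized for native_type, or NULL */
} toy_link;

/* Allocate a buffer for the current native type and mark the link subscribed. */
static void subscribe(toy_link *lk)
{
    size_t size = (lk->native_type == TYPE_STRING) ? 40 : sizeof(short);
    lk->native_buf = malloc(size);
    lk->subscribed = true;
}

/* Drop the subscription and release its buffer. */
static void clear_subscription(toy_link *lk)
{
    free(lk->native_buf);
    lk->native_buf = NULL;
    lk->subscribed = false;
}

/* Called on (re)connect, once the server's field type is known. */
static void toy_connection_callback(toy_link *lk, field_type new_type)
{
    if (lk->subscribed && new_type != lk->native_type) {
        /* The server came back with a different type: discard the stale
         * subscription so no further events arrive for the old type. */
        clear_subscription(lk);
    }
    lk->native_type = new_type;
    if (!lk->subscribed)
        subscribe(lk);
}

/* Called per monitor update; returns true if the event was accepted. */
static bool toy_event_callback(toy_link *lk, field_type event_type)
{
    if (!lk->subscribed || event_type != lk->native_type)
        return false;        /* stale event for an old type is ignored */
    assert(lk->native_buf);  /* holds because the subscription matches */
    return true;
}
```

In this model, a link first connected as TYPE_ENUM and then reconnected as TYPE_STRING ends up with a fresh subscription and buffer for the new type, so the assertion in toy_event_callback() holds for every event it accepts.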