I'm working with Base-3.16 but this bug appears in 3.15 as well so the
fix will have to be applied there.
valgrind is reporting invalid reads when I access a subarray of a
waveform record with either caget -c or camonitor. My PV with filter is
'wf100:i32.[0:4]' where wf100:i32 has NELM=100 and currently contains 10
elements. The current array size reported by the client is also 10, when
it should be 5 with that filter.
I temporarily instrumented some of the routines to work out what's
happening:
> dbChannel_get_count(buffer_type=19, to=0x578b2a8, pfl=0x57e7548, len=100)
> channel name: wf100:i32.[0:4]
> field_type=DBF_LONG (4 bytes), dbr_type=DBF_LONG, 100 elements, 1 filter (0 pre eventq, 1 post eventq)
> plugin arr, start=0, incr=1, end=4
> final field_type=DBF_LONG (4B), 5 elements
> dbChannelGet(type=5, to=0xdfecbf0, pfl=0x57e7548, len=0)
> channel name: wf100:i32.[0:4]
> field_type=DBF_LONG (4 bytes), dbr_type=DBF_LONG, 100 elements, 1 filter (0 pre eventq, 1 post eventq)
> final field_type=DBF_LONG (4B), 5 elements
> dbChannelGet(type=5, to=0x578b2b4, pfl=0x57e7548, len=100)
> channel name: wf100:i32.[0:4]
> field_type=DBF_LONG (4 bytes), dbr_type=DBF_LONG, 100 elements, 1 filter (0 pre eventq, 1 post eventq)
> final field_type=DBF_LONG (4B), 5 elements
> getLongLong(pfield=0x57e7648, to=0x578b2b4, req=10, nelms=10, off=0)
> ==16723== Thread 20:
> ==16723== Invalid read of size 2
> ==16723== at 0x4A09BC0: memmove (mc_replace_strmem.c:1026)
> ==16723== by 0x4E78592: getLongLong (dbConvert.c:60)
> ==16723== by 0x4E6C9CD: dbGet (dbAccess.c:920)
> ==16723== by 0x4E703A8: dbChannelGetField (dbChannel.c:680)
> ==16723== by 0x4E8386F: dbChannel_get_count (db_access.c:398)
> ==16723== by 0x4EA533E: read_reply (camessage.c:586)
> ==16723== by 0x4E7F9F5: event_task (dbEvent.c:942)
> ==16723== by 0x535084B: start_routine (osdThread.c:414)
> ==16723== by 0x373F807A50: start_thread (in /lib64/libpthread-2.12.so)
> ==16723== by 0x373F0E893C: clone (in /lib64/libc-2.12.so)
> ==16723== Address 0x57e7660 is 0 bytes after a block of size 24 client-defined
> ==16723== at 0x533F099: freeListMalloc (freeListLib.c:129)
> ==16723== by 0x533F225: freeListCalloc (freeListLib.c:71)
> ==16723== by 0x4C45638: filter (arr.c:125)
> ==16723== by 0x4E6FA55: dbChannelRunPostChain (dbChannel.c:590)
> ==16723== by 0x4E7FAE7: event_task (dbEvent.c:938)
> ==16723== by 0x535084B: start_routine (osdThread.c:414)
> ==16723== by 0x373F807A50: start_thread (in /lib64/libpthread-2.12.so)
> ==16723== by 0x373F0E893C: clone (in /lib64/libc-2.12.so)
> ==16723==
> ==16723== Invalid read of size 2
> ==16723== at 0x4A09BA8: memmove (mc_replace_strmem.c:1026)
> ==16723== by 0x4E78592: getLongLong (dbConvert.c:60)
> ==16723== by 0x4E6C9CD: dbGet (dbAccess.c:920)
> ==16723== by 0x4E703A8: dbChannelGetField (dbChannel.c:680)
> ==16723== by 0x4E8386F: dbChannel_get_count (db_access.c:398)
> ==16723== by 0x4EA533E: read_reply (camessage.c:586)
> ==16723== by 0x4E7F9F5: event_task (dbEvent.c:942)
> ==16723== by 0x535084B: start_routine (osdThread.c:414)
> ==16723== by 0x373F807A50: start_thread (in /lib64/libpthread-2.12.so)
> ==16723== by 0x373F0E893C: clone (in /lib64/libc-2.12.so)
> ==16723== Address 0x57e7662 is 2 bytes after a block of size 24 client-defined
> ==16723== at 0x533F099: freeListMalloc (freeListLib.c:129)
> ==16723== by 0x533F225: freeListCalloc (freeListLib.c:71)
> ==16723== by 0x4C45638: filter (arr.c:125)
> ==16723== by 0x4E6FA55: dbChannelRunPostChain (dbChannel.c:590)
> ==16723== by 0x4E7FAE7: event_task (dbEvent.c:938)
> ==16723== by 0x535084B: start_routine (osdThread.c:414)
> ==16723== by 0x373F807A50: start_thread (in /lib64/libpthread-2.12.so)
> ==16723== by 0x373F0E893C: clone (in /lib64/libc-2.12.so)
This shows that dbConvert.c::getLongLong() is being asked to fetch 10
elements from the arr filter's buffer that only has 5 elements in it.
This happens because dbAccess.c::dbGet() first sets field_type and
no_elements from either paddr or pfl, but later on it calls
prset->get_array_info() so the record can update no_elements, thereby
overwriting the value that came from the filter (i.e. from pfl).
The fix is to move the call to prset->get_array_info() inside the first
if (!pfl) block higher up [or to suppress it using the same conditions,
but I'd rather move it] — the attached patch works.
I would like to create a test to demonstrate the bug before applying my
fix, but I'm not sure where the right place would be for it:
1. std/filters/test/arrTest.cpp doesn't call dbChannelGetField() at all,
it just uses dbChannelRunPostChain() and examines the data in the
db_field_log, and this doesn't feel quite the right place anyway.
2. ioc/db/test/dbChannelTest.c is too low-level and doesn't have any
suitable filters.
3. dbCaLinkTest.c might not have the array filter registered, and this
isn't really a test of the dbCa subsystem anyway.
Ideas/suggestions?
- Andrew
--
There are only two hard problems in distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
-- Mathias Verraes
=== modified file 'src/ioc/db/dbAccess.c'
--- src/ioc/db/dbAccess.c 2015-09-07 04:15:21 +0000
+++ src/ioc/db/dbAccess.c 2016-02-12 18:58:45 +0000
@@ -813,7 +813,7 @@
void *pbuffer, long *options, long *nRequest, void *pflin)
{
char *pbuf = pbuffer;
- void *pfieldsave;
+ void *pfieldsave = paddr->pfield;
db_field_log *pfl = (db_field_log *)pflin;
short field_type;
long no_elements;
@@ -829,9 +829,18 @@
if (!pfl || pfl->type == dbfl_type_rec) {
field_type = paddr->field_type;
no_elements = paddr->no_elements;
+
+ /* Update field info from record */
+ if (paddr->pfldDes->special == SPC_DBADDR &&
+ (prset = dbGetRset(paddr)) &&
+ prset->get_array_info) {
+ status = prset->get_array_info(paddr, &no_elements, &offset);
+ } else
+ offset = 0;
} else {
field_type = pfl->field_type;
no_elements = pfl->no_elements;
+ offset = 0;
}
if (field_type >= DBF_INLINK && field_type <= DBF_FWDLINK)
@@ -849,21 +858,6 @@
return S_db_badDbrtype;
}
- /* For SPC_DBADDR fields, the rset function
- * get_array_info() is allowed to modify
- * paddr->pfield. So we store the original
- * value and restore it later.
- */
- pfieldsave = paddr->pfield;
-
- /* Update field info */
- if (paddr->pfldDes->special == SPC_DBADDR &&
- (prset = dbGetRset(paddr)) &&
- prset->get_array_info) {
- status = prset->get_array_info(paddr, &no_elements, &offset);
- } else
- offset = 0;
-
if (offset == 0 && (!nRequest || no_elements == 1)) {
if (nRequest) *nRequest = 1;
if (!pfl || pfl->type == dbfl_type_rec) {
- Replies:
- Re: Bug in dbGet() with server-side plugins Michael Davidsaver
- Navigate by Date:
- Prev:
Re: Build failed in Jenkins: epics-base-3.16-linux32-test #13 Andrew Johnson
- Next:
Re: Bug in dbGet() with server-side plugins Michael Davidsaver
- Index:
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
<2016>
2017
2018
2019
2020
2021
2022
2023
2024
2025
- Navigate by Thread:
- Prev:
Re: Build failed in Jenkins: epics-base-3.16-linux32-test #13 Michael Davidsaver
- Next:
Re: Bug in dbGet() with server-side plugins Michael Davidsaver
- Index:
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
<2016>
2017
2018
2019
2020
2021
2022
2023
2024
2025
|