Experimental Physics and Industrial Control System
At SNS, we ran into a bug in memPartInfoGet() while using VxWorks
kernel 5.4.2 and VxStats. memPartInfoGet() can apparently corrupt
the heap if another task preempts it and performs a heap operation
(e.g. malloc/free). There is an SPR covering this bug, which seems
to imply that the bug also exists in kernel version 5.5. It is not
clear which earlier versions of VxWorks might also be affected, but
the SPR is dated more than 3 years ago.
The tell-tale signature is a console message which says:
" invalid block at <some hex address> deleted"
If you ever see that message, then your heap has been
corrupted by the bug.
The symptoms of the heap corruption can be quite severe. In
our case, the system acted as if memory was completely exhausted,
requiring a system reset. In one case, the IOC would not run reliably
for more than a day without heap corruption.
We implemented a work-around in VxStats by re-writing
memPartInfoGet(). With this work-around, the unstable system
has been running the better part of a week now without failure.
I suggest that anyone who has had unexplained heap corruption
on IOCs which use VxStats either log all console messages and
look for the signature, try running without VxStats, or try our
work-around.
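For anyone capturing console output to a file, the check for the signature can be automated. The following is only a minimal sketch in portable C meant to run on a host against a saved log, not on the IOC. The function names heap_signature_match() and count_signatures() are hypothetical, and it assumes the console line contains the quoted fragment with a hex address between "invalid block at" and "deleted"; any text surrounding that fragment is not known here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `line` contains "invalid block at <hex address> deleted",
 * else 0. If found and `addr` is non-NULL, the reported address is
 * stored in *addr. Only the quoted fragment is matched; the rest of the
 * console line is not assumed to have any particular form.
 */
int heap_signature_match(const char *line, unsigned long *addr)
{
    static const char prefix[] = "invalid block at ";
    const char *p = strstr(line, prefix);
    unsigned long value = 0;
    int consumed = 0;

    if (p == NULL)
        return 0;
    p += sizeof(prefix) - 1;            /* skip past the prefix */

    /* %lx accepts the address with or without a leading 0x */
    if (sscanf(p, "%lx%n", &value, &consumed) != 1)
        return 0;
    if (strstr(p + consumed, "deleted") == NULL)
        return 0;

    if (addr != NULL)
        *addr = value;
    return 1;
}

/* Count how many lines of an already opened log contain the signature. */
int count_signatures(FILE *log)
{
    char line[512];
    int hits = 0;

    while (fgets(line, sizeof line, log) != NULL)
        hits += heap_signature_match(line, NULL);
    return hits;
}
```

A non-zero result from count_signatures() on a captured console log would indicate that the heap was corrupted at least once during that run.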
Attached to this message is the WRS SPR information, and
the patch to VxStats.
HTH -- Larry
/* Added by LTH because memPartInfoGet() has a bug when "walking" the
   list */
#include "semLib.h"
#include "dllLib.h"
#include "smObjLib.h"
#include "private/memPartLibP.h"

static STATUS memInfoGet(
    MEM_PART_STATS * ppartStats /* partition stats structure */
    )
{
    FAST PART_ID partId = memSysPartId;
    BLOCK_HDR * pHdr;
    DL_NODE * pNode;

    ppartStats->numBytesFree = 0;
    ppartStats->numBlocksFree = 0;
    ppartStats->numBytesAlloc = 0;
    ppartStats->numBlocksAlloc = 0;
    ppartStats->maxBlockSizeFree = 0;

    if (ID_IS_SHARED (partId)) /* partition is shared? */
    {
        /* shared partitions not supported yet */
        return (ERROR);
    }

    /* partition is local */
    if (OBJ_VERIFY (partId, memPartClassId) != OK)
        return (ERROR);

    /* take and keep semaphore until done */
    semTake (&partId->sem, WAIT_FOREVER);

    for (pNode = DLL_FIRST (&partId->freeList);
         pNode != NULL;
         pNode = DLL_NEXT (pNode))
    {
        pHdr = NODE_TO_HDR (pNode);
        {
            ppartStats->numBlocksFree ++ ;
            ppartStats->numBytesFree += 2 * pHdr->nWords;
            if (2 * pHdr->nWords > ppartStats->maxBlockSizeFree)
                ppartStats->maxBlockSizeFree = 2 * pHdr->nWords;
        }
    }

    ppartStats->numBytesAlloc = 2 * partId->curWordsAllocated;
    ppartStats->numBlocksAlloc = partId->curBlocksAllocated;

    semGive (&partId->sem);
    return (OK);
}
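The key point of the patch is that the partition semaphore is taken once and held for the whole walk of the free list, so no other task can change the list part-way through. For readers without a VxWorks target, the same pattern can be sketched in portable C. Everything below is a hypothetical stand-in rather than VxWorks code: struct partition, struct free_block, and partition_free_stats() are made-up names, and a pthread mutex plays the role of partId->sem. Block sizes are counted in 2-byte words, as in the patch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VxWorks partition structures. */
struct free_block {
    struct free_block *next;
    size_t n_words;              /* block size in 2-byte words */
};

struct partition {
    pthread_mutex_t lock;        /* plays the role of partId->sem */
    struct free_block *free_list;
};

struct free_stats {
    size_t num_blocks_free;
    size_t num_bytes_free;
    size_t max_block_size_free;
};

/*
 * Gather free-list statistics with the lock held for the entire
 * traversal. Returns 0 on success, -1 on bad arguments or lock failure.
 */
int partition_free_stats(struct partition *part, struct free_stats *out)
{
    struct free_block *blk;

    if (part == NULL || out == NULL)
        return -1;

    out->num_blocks_free = 0;
    out->num_bytes_free = 0;
    out->max_block_size_free = 0;

    /* take and keep the lock until the walk is finished */
    if (pthread_mutex_lock(&part->lock) != 0)
        return -1;

    for (blk = part->free_list; blk != NULL; blk = blk->next) {
        size_t bytes = 2 * blk->n_words;

        out->num_blocks_free++;
        out->num_bytes_free += bytes;
        if (bytes > out->max_block_size_free)
            out->max_block_size_free = bytes;
    }

    pthread_mutex_unlock(&part->lock);
    return 0;
}
```

The sketch only protects the list if every allocator path that modifies it takes the same lock.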
SPR
TITLE: memPartInfoGet() and memPartAvailable() can cause memory corruption
SPR #: 30316
STATUS: Assigned
IDE: Tornado 2.2
RTOS: VxWorks 5.5
PRODUCT NAME: VxWorks 5.5
RELEASE STATUS: FCS
PRODUCTS AFFECTED: VxWorks
HOST: All
HOST OS: All
ARCH FAMILY: All
PROCESSOR FAMILY: N/A
PROCESSOR: All
BSP:
DESCRIPTION:
Under some conditions it may be possible that
calling the routine memPartBlockIsValid() will induce memory corruption.
This routine is called by both memPartInfoGet() and
memPartAvailable(). Also, memPartAvailable() is called by memPartShow()
and memShow().
PATCHES:
DATE CREATED: Feb 15 2000