Hi,
So it looks like the last line being executed in drvAsynIPPort.c is this:
thisWrite = send(tty->fd, (char *)data, (int)numchars, 0);
That simply calls the vxWorks function send(), which ultimately calls the internal vxWorks function ipcom_sendmsg(), which in turn calls free(). It looks to me like the memory pointer being passed to free() is corrupted. However, free() is not being called on the buffer "data" that was passed from drvAsynIPPort.c; it must be some internal buffer in the vxWorks network routines.
I suspect some other software in your IOC has a buffer overflow problem or uninitialized pointer and is corrupting memory elsewhere in the system.
These are hard problems to track down, particularly when the failure can take days. The typical approach is to remove software modules from the IOC one at a time until the problem goes away and then zero in on the last thing you removed. But that can take a long time when the failure rate is low. And there is no guarantee that will work, since the memory corruption may just move to an area where it never shows up.
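If one module is a prime suspect, a cheaper experiment than removing it is to wrap its allocations in guard bytes, so that an overrun is reported when that buffer is checked rather than days later inside an unrelated free(). Here is a minimal sketch in portable C. The names guardedAlloc, guardedCheck and guardedFree are hypothetical (they are not vxWorks, EPICS or asyn functions), and HEADER_SIZE is assumed to be a multiple of the strictest alignment on the target:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE  8
#define HEADER_SIZE 32  /* assumption: multiple of the strictest alignment */

/* Hypothetical guard pattern; any unlikely byte sequence would do. */
static const unsigned char GUARD_PATTERN[GUARD_SIZE] =
    {0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF};

/* Block layout: [size ... head guard][user data][tail guard] */
void *guardedAlloc(size_t nbytes)
{
    unsigned char *raw;

    if (nbytes > SIZE_MAX - HEADER_SIZE - GUARD_SIZE)
        return NULL;
    raw = malloc(HEADER_SIZE + nbytes + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    /* Remember the user size so the tail guard can be located later. */
    memcpy(raw, &nbytes, sizeof nbytes);
    /* Head guard sits directly before the user data, tail guard directly after. */
    memcpy(raw + HEADER_SIZE - GUARD_SIZE, GUARD_PATTERN, GUARD_SIZE);
    memcpy(raw + HEADER_SIZE + nbytes, GUARD_PATTERN, GUARD_SIZE);
    return raw + HEADER_SIZE;
}

/* Returns 0 if both guards are intact, -1 on underrun, +1 on overrun. */
int guardedCheck(const void *user)
{
    const unsigned char *raw;
    size_t nbytes;

    assert(user != NULL);
    raw = (const unsigned char *)user - HEADER_SIZE;
    memcpy(&nbytes, raw, sizeof nbytes);
    if (memcmp(raw + HEADER_SIZE - GUARD_SIZE, GUARD_PATTERN, GUARD_SIZE) != 0)
        return -1;
    if (memcmp(raw + HEADER_SIZE + nbytes, GUARD_PATTERN, GUARD_SIZE) != 0)
        return 1;
    return 0;
}

/* Releases a block obtained from guardedAlloc(); NULL is ignored. */
void guardedFree(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - HEADER_SIZE);
}
```

A module would call guardedCheck() on its buffers periodically, or just before guardedFree(); a nonzero result points at the code that last wrote to that buffer. This only catches contiguous overruns of buffers allocated this way. A wild pointer can still corrupt memory elsewhere without touching a guard.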
Mark
________________________________
From: haquin [[email protected]]
Sent: Wednesday, September 23, 2015 7:13 AM
To: Mark Rivers; tech-talk
Subject: Re: Asyn ModbusTCP communication KO without error messages
Hello Mark,
The problem just happened again; it's always the same task being suspended, without any error message.
Here is the output of the "tt" command:
iocVMELB1 > tt 0x17d1fb0
0x0012bd34 vxTaskEntry +0x48 : 0x00a4c4cc ()
0x00a4c538 epicsThreadCreate+0x1c0: 0x008ffc84 ()
0x008ffc84 modbusInterposeConfig+0xb88c: 0x00917bf8 ()
0x00917c60 asynInterposeFlushConfig+0x79d4: 0x008f2898 ()
0x008f2a44 drvModbusAsynConfigure+0x828: 0x008f157c ()
0x008f1934 lfSetErrLogSev+0x3584: 0x009081e4 ()
0x00908344 drvAsynIPServerPortConfigure+0x2904: 0x008f4b04 ()
0x008f4bf4 modbusInterposeConfig+0x7fc: 0x009070fc ()
0x0090711c drvAsynIPServerPortConfigure+0x16dc: 0x009049a4 ()
0x00904c0c drvAsynIPPortConfigure+0xc94: send ()
0x002b0a48 send +0x7c : 0x002e1938 ()
0x002e194c ipcom_spinlock_delete+0x97c: ipcom_send ()
0x001d6430 ipcom_sendto +0x48 : ipcom_sendmsg ()
0x001d639c ipcom_sendmsg+0x718: free ()
0x002077b4 free +0x3c : 0x00207148 ()
0x00207238 memPartBlockIsValid+0x180: taskSuspend ()
value = 0 = 0x0
iocVMELB1 >
... What should I make of this "memPartBlockIsValid"? It is consistent with the task error code: Errno 0x110003 => S_memLib_BLOCK_ERROR
Are some memory addresses corrupted?
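For reference, the error code can be decoded by hand. vxWorks status values carry the module number in the upper 16 bits and the module-specific error number in the lower 16 bits, so 0x110003 is error 3 of module 0x11 (memLib). A small sketch in plain C; vxStatusModule, vxStatusCode and vxStatusFormat are hypothetical helper names, not vxWorks functions:

```c
#include <assert.h>
#include <stdio.h>

/* Upper 16 bits of a vxWorks status value: the module number. */
unsigned vxStatusModule(unsigned long status)
{
    return (unsigned)((status >> 16) & 0xFFFFu);
}

/* Lower 16 bits of a vxWorks status value: the error number within the module. */
unsigned vxStatusCode(unsigned long status)
{
    return (unsigned)(status & 0xFFFFu);
}

/* Writes a short description into out; returns the snprintf() result. */
int vxStatusFormat(unsigned long status, char *out, size_t outLen)
{
    assert(out != NULL && outLen > 0);
    return snprintf(out, outLen, "module 0x%x, error %u",
                    vxStatusModule(status), vxStatusCode(status));
}
```

Calling vxStatusModule(0x110003) gives 0x11 and vxStatusCode(0x110003) gives 3, which matches the S_memLib_BLOCK_ERROR shown for the suspended task.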
Thanks in advance
Best regards
On 09/09/2015 18:01, Mark Rivers wrote:
I suspect your problem is indeed that suspended task. Are you sure there was no error message when the task got suspended?
You can learn something about the task status by running the “tt” command on it:
tt 0x16963d0
Mark
From: haquin [mailto:[email protected]]
Sent: Wednesday, September 09, 2015 6:44 AM
To: Mark Rivers; tech-talk
Subject: Re: Asyn ModbusTCP communication KO without error messages
Mark,
The problem happened again yesterday. I've attached files with the output of the various commands.
What I've noticed is that there is a PLC task that is suspended with an error code:
Errno 0x110003 => S_memLib_BLOCK_ERROR
NAME        ENTRY        TID      PRI STATUS     PC       SP       ERRNO   DELAY
----------- ------------ -------- --- ---------- -------- -------- ------- -----
LBE1-PLCV   a4c48c       167fb20  149 PEND       2c9038   1682b70  0       0
LBE1-PLCV-> a4c48c       1686000  149 PEND       2c7154   1689080  0       0
LBE1-PLCV-> a4c48c       168a9e0  149 PEND       2c7154   168c990  46      0
LBE1-PLCV-> a4c48c       168f510  149 PEND       2c7154   1692420  0       0
LBE1-PLCV-> a4c48c       16963d0  149 SUSPEND    2ce5c4   1699110  110003  0
LBE1-PLCV-> a4c48c       169c1b0  149 PEND       2c7154   169f0c0  0       0
LBE1-PLCV-> a4c48c       16a29c0  149 PEND       2c7154   16a58d0  0       0
LBE1-PLCV-> a4c48c       16a87c0  149 PEND       2c7154   16ab6d0  0       0
LBE1-PLCV-> a4c48c       16ae5c0  149 PEND       2c7154   16b14d0  0       0
LBE1-PLCV-> a4c48c       16b4280  149 PEND       2c7154   16b7190  0       0
LBE1-PLCV-> a4c48c       16ba620  149 PEND       2c7154   16bd6a0  0       0
LBE1-PLCV-> a4c48c       16c0420  149 PEND       2c7154   16c34a0  0       0
LBE1-PLCV-> a4c48c       16c6230  149 PEND       2c7154   16c92b0  0       0
LBE1-PLCV-> a4c48c       16cc030  149 PEND       2c7154   16cef40  0       0
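Since the failure takes days to appear, it may help to capture the "i" output periodically and scan it automatically. A rough sketch in C that checks one line of the listing above for a suspended task; parseSuspendedTask is a hypothetical name, the column order (NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY) is assumed to match the table above, and only the literal status text SUSPEND seen here is matched:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parses one line of the task listing.
 * Returns 1 and fills *tid and *errnoVal if the STATUS column contains
 * "SUSPEND"; returns 0 for any other line (headers, separators, other
 * states) and leaves the outputs untouched.
 */
int parseSuspendedTask(const char *line, unsigned long *tid,
                       unsigned long *errnoVal)
{
    char name[32], entry[32], status[16];
    unsigned long taskId, pc, sp, err, delay;
    int pri;

    assert(line != NULL && tid != NULL && errnoVal != NULL);

    /* TID, PC, SP and ERRNO are printed in hexadecimal without a 0x prefix. */
    if (sscanf(line, "%31s %31s %lx %d %15s %lx %lx %lx %lu",
               name, entry, &taskId, &pri, status,
               &pc, &sp, &err, &delay) != 9)
        return 0;

    if (strstr(status, "SUSPEND") == NULL)
        return 0;

    *tid = taskId;
    *errnoVal = err;
    return 1;
}
```

Fed the fifth task line above, this returns 1 with tid 0x16963d0 and errno 0x110003; the header, separator and PEND lines return 0.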
- I don't think there is a deadlock, but I don't know what to expect from the output of the command in this case.
- There is no stack overflow, in particular for the task that is suspended.
- The output of asynReport is attached.
I've added your asyn record, and the following errors appeared when I tried to connect to the READ port or turn a trace ON or OFF:
iocVMELB1 > 2015/09/08 11:29:24.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:1509] [CAS-client,0x8f70500,20] LBE1-PLCV-READ addr 0 queueRequest priority 3 not lockHolder
2015/09/08 11:29:24.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:1520] [CAS-client,0x8f70500,20] LBE1-PLCV-READ schedule queueRequest timeout
2015/09/08 11:29:27.925 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:637] [timerQueue,0xc04f20,60] LBE1-PLCV-READ asynManager:queueTimeoutCallback
2015/09/08 11:29:27.925 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:2003] [timerQueue,0xc04f20,60] LBE1-PLCV-asyn: special queueRequest timeout
2015/09/08 11:29:34.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynDriver/asynManager.c:637] [timerQueue,0xc04f20,60] LBE1-PLCV-READ asynManager:queueTimeoutCallback
2015/09/08 11:29:34.509 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:2003] [timerQueue,0xc04f20,60] LBE1-PLCV-asyn: special queueRequest timeout
iocVMELB1 > 2015/09/08 11:31:44.577 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 3
2015/09/08 11:31:51.910 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 4
2015/09/08 11:31:54.527 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 4
2015/09/08 11:31:57.244 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 3
2015/09/08 11:32:00.077 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] LBE1-PLCV-asyn: exception 5
2015/09/08 11:32:03.277 [LBE1-PLCV-READ,0,0] [../../asyn/asynRecord/asynRecord.c:899] [CAS-client,0x8f70500,20] LBE1-PLCV-asyn: exception 5
Thanks for your help
best regards
On 02/09/2015 16:46, Mark Rivers wrote:
Hi Christophe,
Here are some things to look for:
- On vxWorks perhaps a task has been suspended. Issue the "i" command to look at the status of all of the tasks.
- Perhaps there is a deadlock. Issue this command several times in a row to see if there is a mutex that is always locked:
epicsMutexShowAll 1
- Perhaps there was a stack overflow. Issue this command and look for tasks with a margin of 0
checkStack
If you don't find anything there then send us the output of "asynReport 10" on the Read port and Write port.
Mark
-----Original Message-----
From: [email protected] On Behalf Of haquin
Sent: Wednesday, September 02, 2015 9:00 AM
To: tech-talk
Subject: Asyn ModbusTCP communication KO without error messages
Hi all,
I have a VxWorks IOC (with an MVME CPU using both Ethernet interfaces) communicating with a Siemens S7 PLC via asyn/ModbusTCP.
EPICS release 3.14.12.4, asyn v4.22
After a while (1 or 2 days), the communication stops working ... but I get no error messages (no timeout, no disconnection ...).
On the IOC side I have a "Read Multiple Register" function reading the whole Modbus table (109 registers) every second
and a "Write Multiple Register" function writing the value of a counter that is incremented every second at the record level.
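For readers less familiar with the protocol, this is roughly what such a read request looks like on the wire. The sketch below builds a Modbus/TCP frame for function code 0x03 (Read Holding Registers); the function code, start address and unit identifier are assumptions for illustration, since the real values depend on the port configuration, and buildReadRequest is a hypothetical name, not code from the modbus module:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODBUS_READ_HOLDING_REGISTERS 0x03
#define MODBUS_MAX_READ_REGISTERS     125  /* protocol limit per read request */
#define MODBUS_TCP_READ_REQUEST_SIZE  12   /* 7-byte MBAP header + 5-byte PDU */

/*
 * Fills frame with a Modbus/TCP read request.
 * Returns the number of bytes written, or 0 if the buffer is too small
 * or the register count is outside 1..125.
 */
size_t buildReadRequest(uint8_t *frame, size_t frameLen,
                        uint16_t transactionId, uint8_t unitId,
                        uint16_t startAddress, uint16_t quantity)
{
    assert(frame != NULL);
    if (frameLen < MODBUS_TCP_READ_REQUEST_SIZE ||
        quantity < 1 || quantity > MODBUS_MAX_READ_REGISTERS)
        return 0;

    /* MBAP header, all fields big-endian */
    frame[0] = (uint8_t)(transactionId >> 8);
    frame[1] = (uint8_t)(transactionId & 0xFF);
    frame[2] = 0;                 /* protocol identifier: 0 = Modbus */
    frame[3] = 0;
    frame[4] = 0;                 /* length of what follows: unit id + PDU */
    frame[5] = 6;
    frame[6] = unitId;

    /* PDU: function code, starting address, register count */
    frame[7]  = MODBUS_READ_HOLDING_REGISTERS;
    frame[8]  = (uint8_t)(startAddress >> 8);
    frame[9]  = (uint8_t)(startAddress & 0xFF);
    frame[10] = (uint8_t)(quantity >> 8);
    frame[11] = (uint8_t)(quantity & 0xFF);

    return MODBUS_TCP_READ_REQUEST_SIZE;
}
```

With quantity set to 109 the request is 12 bytes long, and a normal response would carry 218 data bytes (2 per register) after a 9-byte header.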
When I activate asynTrace on the IP port or on the Read or Write ports, there are no messages ...
asynReport on Read port indicates only 1 Read OK
asynReport on Write port indicates 0 Write OK
I can read the PLC register via "modpoll" tool from a Linux PC
I can start a Linux IOC connected to the same PLC
The netstat command in the IOC shell shows that the TCP connection is established, but the Recv-Q is not equal to 0 (12, for example)
What can explain this behavior?
thanks in advance !
--
Christophe Haquin
Control and Real Time systems Engineer
+33 231454661 office
+33 231454728 fax
SdA/GIM
GANIL
Bd Henri Becquerel BP 55027
14076 CAEN CEDEX5