Ø
On this URL [1] a version 3-12 is mentioned. I'm not sure if the github repo is kept up to date with the tarballs in [1].
Lewis Muir created an issue on Github last December addressing that:
https://github.com/epics-modules/tpmac/issues/5
Oops. Here's the corrected one.
On this URL [1] a version 3-12 is mentioned. I'm not sure if the github repo is kept up to date with the tarballs in [1].
[1] https://www.gmca.aps.anl.gov/TPMAC2/
On Thu, Feb 15, 2018 at 3:55 PM, Mark Rivers <[email protected]> wrote:
Minor point: It looks like your patch is in the wrong direction, your additions have – rather than +.
I like #2. I think this patch ought to do it.
It seems that the upstream tpmac project is not on Github, who should I submit this to?
On Thu, Feb 15, 2018 at 12:37 PM, Mark Rivers <[email protected]> wrote:
Ø
So while just removing the "if (!commsError)" check seems to be enough for the connection to be re-established, it seems that telling the asyn to
Ø
close the connection would be the way to go, as you mention. It would be nice if Mark's suggestion of using "disconnectOnReadTimeout" option worked for PMAC, though.
I think the reason it is not working is that a read timeout is probably expected during “normal” operation of the driver in some circumstances. That messes up the logic for "disconnectOnReadTimeout"
because it will then disconnect when you don’t want it to.
There are a couple of possible solutions:
1)
In the code before the expected read timeout temporarily disable "disconnectOnReadTimeout", then re-enable it after the expected read timeout. You do this with the asynOption interface to drvAsynIPPort.
2)
Implement the disconnectOnReadTimeout logic inside the PMAC driver itself. You call pasynCommon->disconnect on unexpected read timeouts, but don’t call it on expected read timeouts.
Mark
FWIW, what I noticed is that, after the controller is powered off, netstat (netstat -oc | grep 1025) reports that TCP retries are happening. If I power the controller back on while retries
are still ongoing, after the current retry timer expires the connection is re-established.
So while just removing the "if (!commsError)" check seems to be enough for the connection to be re-established, it seems that telling the asyn to close the connection would be the
way to go, as you mention. It would be nice if Mark's suggestion of using "disconnectOnReadTimeout" option worked for PMAC, though.
On Thu, Feb 15, 2018 at 12:13 PM, Torsten Bögershausen <[email protected]> wrote:
Hej Bruno and all,
I think you are hunting a very issue (if someone asks me)
I can have a look the next days, I remember that I was digging
into the connection handling (especially the disconnection handling)
for some time ago.
Unless someone has a simulator for the pmac, I will dig in my
old source codes and try to get the one working, which I
once had been running.
One problem which I see here:
if (!commsError) {
status = pasynOctetSyncIO->writeRead(lowLevelPortUser_ ,
command, strlen(command),
response, PMAC_MAXBUF_,
PMAC_TIMEOUT_,
&nwrite, &nread, &eomReason );
if (status) {
asynPrint(lowLevelPortUser_, ASYN_TRACE_ERROR, "%s: Error from pasynOctetSyncIO->writeRead. command: %s\n", functionName, command);
is that we don't tell asyn that we "feel" disconnected.
The good handling would be to close() the underlying socket.
The next step would be to tell asyn "we are disconnected"
This is done here asyn/drvAsynSerial/drvAsynIPPort.c like this:
/* If recv() returns 0 on a SOCK_STREAM (TCP) socket, the connection has closed */
if ((thisRead == 0) && (tty->socketType == SOCK_STREAM)) {
epicsSnprintf(pasynUser->errorMessage,pasynUser->errorMessageSize,
"%s connection closed",
tty->IPDeviceName);
closeConnection(pasynUser,tty,"Read from broken connection");
reason |= ASYN_EOM_END;
}
Then we -should- see the reconnect attempts from asyn, shouldn't we ?
(And handle them)
On 15/02/18 18:03, Bruno Martins wrote:
So, I think I found the culprit:
https://github.com/epicsdeb/pmacasyn/blob/master/pmacApp/pmacAsynMotorPortSrc/pmacController.cpp#L325
Inside lowLevelWriteRead, if a communications error was previously detected, writeRead is *not* called, which I think prevents asyn from executing its reconnection logic. If I comment out this check everything seems to work correctly.
However, since I am not too familiar with this driver, I don't know if removing this check will have unintended consequences elsewhere.
Bruno
On Thu, Feb 15, 2018 at 10:44 AM, Bruno Martins <[email protected] <mailto:[email protected]>> wrote:
Hi Torsten,
Just for the sake of completeness I rebuilt my whole environment from
source (git):
Base R3.15.5
Motor R6-10
Asyn R4-33
pmacasyn from here (with all patches in debian/patches applied):
https://github.com/epicsdeb/pmacasyn <https://github.com/epicsdeb/pmacasyn>
Here's a minimal IOC that reproduces the issue
$ cat st.cmd
#!../../bin/linux-x86_64/tpmac
< envPaths
epicsEnvSet("STREAM_PROTOCOL_PATH","${PMACUTIL}/pmacUtilApp/protocol")
epicsEnvSet("PREFIX","LABTEST:STPC01")
dbLoadDatabase("${TOP}/dbd/tpmac.dbd",0,0)
tpmac_registerRecordDeviceDriver(pdbbase)
pmacAsynIPConfigure("P0","<ip redacted>:1025")
#asynSetOption("P0",0,"disconnectOnReadTimeout","Y")
pmacCreateController("motor","P0",0,2,100,1000)
pmacCreateAxis("motor", 1)
pmacCreateAxis("motor", 2)
dbLoadTemplate("$(TOP)/db/labTest.substitutions")
iocInit()
$ cat labTest.substitutions
file "$(TOP)/db/motor.db"
{
# ADDR ---> axis #
# NOTE ---> 2 asyn ports: 1) lowel comms ("P0"), and 2) Motor Record ("motor")
pattern
{ P, M, MOTOR, PORT, ADDR, DESC, DTYP, EGU, MRES,PREC }
{ "LABTEST:", "STPC01:M1", "motor", "P0", 1, "Axis 1",
asynMotor, "mm", 0.00079375, 2 }
{ "LABTEST:", "STPC01:M2", "motor", "P0", 2, "Axis 2",
asynMotor, "mm", 0.00079375, 2 }
}
$ cat db/motor.db
record(motor,"$(P)$(M)")
{
field(DESC,"$(DESC)")
field(DTYP,"$(DTYP)")
field(OUT,"@asyn($(MOTOR),$(ADDR))")
field(RTRY,"0")
# Use of MRES is deliberate on the following line.
# This sets the retry deadband to the smallest possible size.
field(RDBD,$(MRES="1"))
field(MRES,$(MRES="1"))
field(EGU,$(EGU="None"))
field(PREC,$(PREC="0"))
}
epics> dbl
LABTEST:STPC01:M1
LABTEST:STPC01:M2
As soon as I turn off the controller, these messages appear:
epics> 2018/02/15 10:12:19.085 P0 read pmacFlush failed - thisRead=0,
eomReason=0, status=1
2018/02/15 10:12:29.095 P0 read pmacReadReady failed
status=1,retval=02018/02/15 10:12:29.095 pmacController::lowLevelWriteRead:
Error from pasynOctetSyncIO->writeRead. command: ???
2018/02/15 10:12:29.095 pmacController::getGlobalStatus: Error reading ???.
2018/02/15 10:12:29.095 pmacController::poll: Error reading or setting params.
2018/02/15 10:12:29.095 drvPmacAxisGetStatus: not all status values
returned. Status: 0
Command :#1 ? F P
Response:2018/02/15 10:12:29.095 drvPmacAxisGetStatus: not all status
values returned. Status: 0
Command :#2 ? F P
Response:2018/02/15 10:12:30.096 pmacController::getGlobalStatus: Error
reading ???. nvals: -1, response:
2018/02/15 10:12:30.096 pmacController::poll: Error reading or setting params.
2018/02/15 10:12:30.096 drvPmacAxisGetStatus: not all status values
returned. Status: 0
Command :#1 ? F P
Response:2018/02/15 10:12:30.096 drvPmacAxisGetStatus: not all status
values returned. Status: 0
Command :#2 ? F P
Response:2018/02/15 10:12:31.096 pmacController::getGlobalStatus: Error
reading ???. nvals: -1, response:
2018/02/15 10:12:31.096 pmacController::poll: Error reading or setting params.
2018/02/15 10:12:31.096 drvPmacAxisGetStatus: not all status values
returned. Status: 0
Command :#1 ? F P
Response:2018/02/15 10:12:31.096 drvPmacAxisGetStatus: not all status
values returned. Status: 0
Command :#2 ? F P
Response:
After 20 min (see below), I powered the controller back on, but the
messages continued.
As Mark mentioned, it seems that Linux takes some time to shutdown the
socket. My previous tests (using pasynManager->isConnected to probe for
connection status) seemed to show that it takes ~15min on this particular
system for isConnected to go from 1 to 0. It also seems that by turning the
controller back on isConnected goes from 0 to 1. So my current suspicion is
in agreement with Mark's comments: something might be off in the interpose
code.
I will continue investigating.
Thanks!
Bruno
On Thu, Feb 15, 2018 at 2:15 AM, Torsten Bögershausen
<[email protected] <mailto:[email protected]>>
wrote:
On 12/02/18 20:45, Bruno Martins wrote:
Hi Mark,
We're using tpmac-asyn 3.10.12, asyn R4-32.
Hej Bruno,
what is tpmac-asyn 3.10.12 ?
The printouts you are pointing out ("command: #2 ? F P") seem to come from
.../epicsdeb/pmacasyn/pmacApp/pmacAsynMotorPortSrc/pmacAxis.cpp
Which seems to be a classic model 3 driver (?).
I am unsure which SW you are running.
Could you send us the whole st.cmd ?
And may be a receipe how to get the modules you are using,
tpmac-asyn, which motor. Are you using git clone ? Or wget ?
Regarding pmacAsynIPConfigure, it immediately calls
drvAsynIPPortConfigure:
https://github.com/epicsdeb/pmacasyn/blob/master/pmacApp/pmacAsynIPPortSrc/pmacAsynIPPort.c#L184
<https://github.com/epicsdeb/pmacasyn/blob/master/pmacApp/pmacAsynIPPortSrc/pmacAsynIPPort.c#L184>
<https://github.com/epicsdeb/pmacasyn/blob/master/pmacApp/pmacAsynIPPortSrc/pmacAsynIPPort.c#L184
<https://github.com/epicsdeb/pmacasyn/blob/master/pmacApp/pmacAsynIPPortSrc/pmacAsynIPPort.c#L184>>
You're right that the return code is asynError (from queueLockPort
[asynManager.c:1832]) but queueRequest returns asynDisconnected
[asynManager.c:1505]. I added more debugging.
I noticed two things:
1. If the IOC starts with the controller powered off, the above
messages are printed. asyn/tpmac correctly knows the device is
disconnected, but there's no attempt at a reconnection, so powering
it on doesn't do anything. Here's the report in this case:
asynManager::queueLockPort queueRequest failed: port P0 not connected
2018/02/12 14:05:56.201 pmacController::lowLevelWriteRead: Error
from pasynOctetSyncIO->writeRead. command: #2 ? F P
2018/02/12 14:05:56.201 drvPmacAxisGetStatus: not all status values
returned. Status: 3
epics> asynReport 5 P0
P0 multiDevice:No canBlock:Yes autoConnect:Yes
enabled:Yes connected:No numberConnects 0
nDevices 0 nQueued 18 blocked:No
asynManagerLock:No synchronousLock:Yes
exceptionActive:No exceptionUsers 11 exceptionNotifys 0
traceMask:0x1 traceIOMask:0x0 traceInfoMask:0x1
interposeInterfaceList
asynOctet pinterface 0x7fcf1ca2c900 drvPvt 0xf5c060
interfaceList
asynCommon pinterface 0x7fcf1ca29650 drvPvt 0xf579e0
asynOption pinterface 0x7fcf1ca29670 drvPvt 0xf579e0
asynOctet pinterface 0xf57b00 drvPvt 0xf579e0
Port 10.41.104.10:1025 <http://10.41.104.10:1025>
<http://10.41.104.10:1025>: Disconnected
fd: -1
Characters written: 0
Characters read: 0
2. If the IOC starts with the controller powered on, everything
works as expected:
epics> asynReport 5 P0
P0 multiDevice:No canBlock:Yes autoConnect:Yes
enabled:Yes connected:Yes numberConnects 1
nDevices 0 nQueued 0 blocked:No
asynManagerLock:No synchronousLock:No
exceptionActive:No exceptionUsers 11 exceptionNotifys 0
traceMask:0x1 traceIOMask:0x0 traceInfoMask:0x1
interposeInterfaceList
asynOctet pinterface 0x7fafe57dc900 drvPvt 0x1aa6090
interfaceList
asynCommon pinterface 0x7fafe57d9650 drvPvt 0x1aa1a10
asynOption pinterface 0x7fafe57d9670 drvPvt 0x1aa1a10
asynOctet pinterface 0x1aa1b30 drvPvt 0x1aa1a10
Port 10.41.104.10:1025 <http://10.41.104.10:1025>
<http://10.41.104.10:1025>: Connected
fd: 7
Characters written: 9787
Characters read: 5345
Then when we yank the power from the controller:
asynManager::queueLockPort queueRequest timed out
2018/02/12 14:33:03.069 pmacController::lowLevelWriteRead: Error
from pasynOctetSyncIO->writeRead. command: #1 ? F P
2018/02/12 14:33:03.069 drvPmacAxisGetStatus: not all status values
returned. Status: 3
epics> asynReport 5 P0
P0 multiDevice:No canBlock:Yes autoConnect:Yes
enabled:Yes connected:Yes numberConnects 1
nDevices 0 nQueued 18 blocked:No
asynManagerLock:No synchronousLock:Yes
exceptionActive:No exceptionUsers 11 exceptionNotifys 0
traceMask:0x1 traceIOMask:0x0 traceInfoMask:0x1
interposeInterfaceList
asynOctet pinterface 0x7fafe57dc900 drvPvt 0x1aa6090
interfaceList
asynCommon pinterface 0x7fafe57d9650 drvPvt 0x1aa1a10
asynOption pinterface 0x7fafe57d9670 drvPvt 0x1aa1a10
asynOctet pinterface 0x1aa1b30 drvPvt 0x1aa1a10
Port 10.41.104.10:1025 <http://10.41.104.10:1025>
<http://10.41.104.10:1025>: Connected
fd: 7
Characters written: 26671
Characters read: 11150
So it still thinks it is connected, even after quite some time.
I tried enabling the disconnectOnReadTimeout option, it yielded a
weird behavior: with the motor controller powered ON, the IOC seems
to start fine but then seems to disconnect from the controller by
itself:
asynManager::queueLockPort queueRequest failed: port P0 not connected
2018/02/12 14:40:38.972 pmacController::lowLevelWriteRead: Error
from pasynOctetSyncIO->writeRead. command: #1 ? F P
2018/02/12 14:40:38.972 drvPmacAxisGetStatus: not all status values
returned. Status: 3
epics> asynReport 5 P0
P0 multiDevice:No canBlock:Yes autoConnect:Yes
enabled:Yes connected:No numberConnects 1
nDevices 0 nQueued 21 blocked:No
asynManagerLock:No synchronousLock:Yes
exceptionActive:No exceptionUsers 11 exceptionNotifys 0
traceMask:0x1 traceIOMask:0x0 traceInfoMask:0x1
interposeInterfaceList
asynOctet pinterface 0x7f5c7df24900 drvPvt 0x8f21a0
interfaceList
asynCommon pinterface 0x7f5c7df21650 drvPvt 0x8edb20
asynOption pinterface 0x7f5c7df21670 drvPvt 0x8edb20
asynOctet pinterface 0x8edc40 drvPvt 0x8edb20
Port 10.41.104.10:1025 <http://10.41.104.10:1025>
<http://10.41.104.10:1025>: Disconnected
fd: -1
Characters written: 848
Characters read: 470
Bruno
On Mon, Feb 12, 2018 at 1:32 PM, Mark Rivers
<[email protected] <mailto:[email protected]>
<mailto:[email protected]
[mailto:[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>] *On Behalf Of *Mark Rivers
*Sent:* Monday, February 12, 2018 12:10 PM
*To:* 'Bruno Martins' <[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>>;
[email protected]
<mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
*Cc:* [email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
*Subject:* RE: PMAC asyn autoconnect issue____
__ __
Hi Bruno,____
__ __
What version of asyn are you using?____
__ __
What version of tpmac are you using?____
__ __
__Ø__2018/02/12 12:00:14.826
pmacController::lowLevelWriteRead: Error from
pasynOctetSyncIO->writeRead. command: #2 ? F P
2018/02/12 12:00:14.826 drvPmacAxisGetStatus: not all status
values
returned. Status: 3
Command :#2 ? F P____
__ __
Note that Status: 3 is asynError, not asynDisconnected.____
__ __
Are you sure that asyn really thinks the port is disconnected
at this
time? You should run asynReport on that port to see whether
asyn thinks it
is disconnected or not.____
__ __
One problem is that when you power-off an IP device on Linux
it takes quite
a while before the OS decides the socket is disconnected. Before that time
it returns an error like “Resource temporarily unavailable”. During this
time asyn will not try to reconnect, because it thinks it is still
connected.____
__ __
This behavior is one reason that the drvAsynIPPort driver was
enhanced in
R4-29 to support the disconnectOnReadTimeout option. If this
option is
enabled then the port will automatically disconnect whenever
there is a
read timeout.____
__ __
Mark____
__ __
__ __
*From:*[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>
[mailto:[email protected]
<mailto:[email protected]>
<mailto:[email protected]
<mailto:[email protected]>>] *On Behalf Of *Bruno Martins
*Sent:* Monday, February 12, 2018 11:34 AM
*To:* [email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
*Cc:* [email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>
|