Hi John,
I think I understand what is happening. For “slow” devices (ASYN_CANBLOCK=1), asynManager queues a request for the I/O operation. When the request gets to the head of the queue, asynManager causes the actual I/O operation to be run in the port driver thread.
If the device is not available there are 2 ways a read operation can fail.
1) The request gets to the head of the queue and the actual I/O operation times out.
2) The request spends too long in the queue, and the queue request itself times out.
The timeout for 1 is set in your startup script, in the modbusInterposeConfig command. You set it to 2000 ms = 2.0 seconds.
The timeout for 2 is set by default in asynManager.c to 2.0 seconds as well. It can be changed with the following iocsh command:
asynSetQueueLockPortTimeout(portName, timeout).
Failure mode 2 will happen when several requests are queued, and they each take timeout 1 to fail with failure mode 1. Eventually some of the requests in the queue will exceed timeout 2, and they will fail with failure mode 2.
The status returned for these 2 failure modes is different. Failure mode 1) returns a “read” error, while failure mode 2) returns a “timeout” error. For a particular record it can fail either way, and thus each record can be switching its alarm status.
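To make the interaction between the two timeouts concrete, here is a rough Python sketch of the queue. It is a simplified model, not asynManager's actual logic: it assumes all requests are queued at the same instant, are serviced one at a time, the device never answers, and a request that has already waited at least the queue timeout fails without attempting any I/O. The function name and the READ/TIMEOUT labels are only for illustration.

```python
def simulate_queue(n_requests, io_timeout_ms, queue_timeout_ms):
    """Return the failure mode of each queued request, in queue order.

    Simplified model (an assumption, not taken from asynManager.c):
    every request is queued at t=0 and the device never responds.
    """
    results = []
    now_ms = 0  # time at which the port thread is next free
    for _ in range(n_requests):
        waited_ms = now_ms  # time this request has spent in the queue
        if waited_ms >= queue_timeout_ms:
            # Failure mode 2: the request expired while still queued.
            results.append("TIMEOUT")
        else:
            # Failure mode 1: the I/O runs and takes the full I/O timeout to fail.
            results.append("READ")
            now_ms += io_timeout_ms
    return results


if __name__ == "__main__":
    # 11 read blocks, as in John's IOC, with both timeouts at 2000 ms.
    print(simulate_queue(11, 2000, 2000))
    # Same IOC with the I/O timeout reduced to 200 ms.
    print(simulate_queue(11, 200, 2000))
```

In this model, with both timeouts at 2000 ms only the first request fails as READ and the rest fail as TIMEOUT. Which record ends up at the head of the queue can change from one poll cycle to the next, which would be consistent with a single record alternating between the two alarm states.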
There are several ways this could be fixed.
1) Change your startup script to have the read timeout be 200 ms rather than 2000. That is probably plenty because the device should normally respond faster than that. That would allow 10 requests to time out before a queue request timeout occurred. If that is not enough you could use the asynSetQueueLockPortTimeout command to increase timeout 2, until all requests fail with mode 1.
2) You could add the following command to your startup script:
asynSetOption("MISC", 0, "disconnectOnReadTimeout", "Y")
That command causes asyn to disconnect the port if it times out. That will put the port in the disconnected state. If autoConnect is true then asynManager will keep trying to reconnect the device. When it becomes available it will reconnect.
3) I could change the Modbus driver so that if failure 1 is indeed a timeout (as opposed to some other error) it sets the alarm status to “timeout” rather than “read”. Then both failure modes have the same alarm status.
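For method 1, the sizing arithmetic can be sketched as follows. This is a back-of-the-envelope estimate under the same simplifying assumption as above (requests serviced strictly one after another, each taking the full I/O timeout to fail), and the helper names are hypothetical.

```python
def worst_case_wait_ms(n_requests, io_timeout_ms):
    """Longest time any request waits in the queue if every earlier
    request takes the full I/O timeout to fail (simplified estimate)."""
    return max(n_requests - 1, 0) * io_timeout_ms


def all_fail_with_mode_1(n_requests, io_timeout_ms, queue_timeout_ms):
    """True if no request waits long enough to hit the queue timeout."""
    return worst_case_wait_ms(n_requests, io_timeout_ms) < queue_timeout_ms


if __name__ == "__main__":
    n_blocks = 11  # number of Modbus read blocks in this IOC
    for io_ms, queue_ms in [(2000, 2000), (200, 2000), (200, 3000)]:
        print(io_ms, queue_ms,
              worst_case_wait_ms(n_blocks, io_ms),
              all_fail_with_mode_1(n_blocks, io_ms, queue_ms))
```

With 11 blocks and a 200 ms read timeout, the estimated worst-case wait is 2000 ms, which is right at the default queue timeout, so some margin on timeout 2 may still be needed.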
I would suggest first trying method 2 and seeing what happens to the alarm status in that case.
If that does not work then try method 1.
If that does not work I will look at modifying the driver (method 3).
Mark
From: John Dobbins <john.dobbins at cornell.edu>
Sent: Sunday, July 12, 2020 7:19 PM
To: Mark Rivers <rivers at cars.uchicago.edu>
Subject: Re: Modbus alarms question
First, note that this is not an urgent matter.
EPICS_HOST_ARCH linux-x86_64
This IOC connects to a single PLC (from Automation Direct) via Modbus/TCP.
It reads from 11 blocks of coils; lengths vary from 1 to 20 coils. It also reads some input registers, but I have commented these out for the purpose of the test.
All the records are bi and the scan is I/O Intr for each record.
The template used by the records is
# bi record template for register inputs
field(DTYP,"asynUInt32Digital")
field(INP,"@asynMask($(PORT) $(OFFSET) 0x1)")
}
Note that I am not the author of the IOC and I haven't looked carefully at it.
I first tried to pare the IOC down to a single coil read from a single block but that did not reproduce the error. I tried two records, then two blocks without reproducing the error. I then went back to the original configuration.
The attached file camonitor.txt shows the output for camonitor attached to one of the records "MSD_S1F_FLD_CR"
19:54:04 camonitor starts and shows STATE MAJOR (which is correct)
19:54:16 I unplug the network cable to the PLC
19:54:18 camonitor shows TIMEOUT INVALID (note: sometimes the first error is READ INVALID)
19:54:25 camonitor shows READ INVALID
19:54:40 camonitor shows TIMEOUT INVALID
19:54:47 camonitor shows READ INVALID
19:55:02 camonitor shows TIMEOUT INVALID
19:55:04 camonitor shows READ INVALID
19:55:12 camonitor shows TIMEOUT INVALID
19:55:14 camonitor shows READ INVALID
19:55:18 disconnected I exited the IOC
Also attached is the IOC console output as modbusTest_console.txt. There are messages in the console output which match in time with the state changes of the PV.
I don't know that I have the best asyn trace settings:
asynSetTraceIOMask("MISC",0,9)
asynSetTraceMask("MISC",0,2)
asynSetTraceIOMask("MI_IN_3",0,4)
asynSetTraceMask("MI_IN_3",0,255)
asynSetTraceIOTruncateSize("MI_IN_3",0,512)
MI_IN_3 is the ModbusAsyn block for the signal to which I attached camonitor.
As always, thanks for your time and insight.
Your suggestion worked! I was able to reproduce the issue with the latest version of Modbus support. I will gather up all the details.
From: John Dobbins
Sent: Saturday, July 11, 2020 8:52 PM
To: Mark Rivers <rivers at cars.uchicago.edu>
Subject: Re: Modbus alarms question
Good point. I'll figure out a way to do that.
On Jul 11, 2020, at 7:09 PM, Mark Rivers <rivers at cars.uchicago.edu> wrote:
Hi John,
If you can run the simulator on a non-critical machine, you could then try unplugging its Ethernet cable, or powering it off.
Mark
On Jul 11, 2020, at 2:54 PM, John Dobbins <john.dobbins at cornell.edu> wrote:
Mark,
I tried to reproduce this problem by running a simulated PLC, connecting from the IOC, and then closing the simulated PLC. However the records behaved as hoped, i.e. they went straight to INVALID/READ.
It may be that this depends on the details of the PLC shutdown. The simulated PLC may have closed the socket on exit. The real PLC may have just disappeared.
It will take longer before I can try to reproduce with physical hardware.
John
From: John Dobbins <john.dobbins at cornell.edu>
Sent: Wednesday, July 8, 2020 2:58 PM
To: Mark Rivers <rivers at cars.uchicago.edu>
Cc: EPICS tech-talk <tech-talk at aps.anl.gov>
Subject: Re: Modbus alarms question
Mark
Thanks. I'll start by upgrading to latest version and do a new test. After the test I'll send all the details.
John
From: Mark Rivers <rivers at cars.uchicago.edu>
Sent: Wednesday, July 8, 2020 1:38 PM
To: John Dobbins <john.dobbins at cornell.edu>
Cc: EPICS tech-talk <tech-talk at aps.anl.gov>
Subject: Re: Modbus alarms question
Hi John,
Some questions:
- What version of asyn are you using?
- Are you using relative or absolute Modbus addressing?
- Is this Modbus TCP over Ethernet?
- For the records that generate the alarm, what is:
- The record type
- The Modbus function code
- The SCAN rate
- The polling rate of the modbus driver
- The time between alarm status changes
- Is it possible for you to update to modbus R3-0 and see if the problem persists? It will make it easier for me to fix if you are running the latest release.
The alarm status is set according to the status returned by the function doModbusIO(). The driver should only do callbacks with a new alarm status when that status return changes.
It would probably be a good idea to turn on asynTrace of the underlying asynIPPort driver and see if its status is changing.
Mark
From: Tech-talk <tech-talk-bounces at aps.anl.gov> on behalf of John Dobbins via Tech-talk <tech-talk at aps.anl.gov>
Sent: Wednesday, July 8, 2020 11:54 AM
To: tech-talk at aps.anl.gov
Subject: Modbus alarms question
All,
We have a PLC from which data is read by an IOC via Modbus (R2-11). The PLC was turned off and while the PLC was off the IOC generated a stream of alarm messages from the Alarm Server (BEAST) while the PVs toggled between
SEVERITY: INVALID STATUS: READ_ALARM
and
SEVERITY: INVALID STATUS: TIMEOUT_ALARM
Aside from configuring these alarms to be latching in the Alarm Server configuration, is there some other way to make it so this state (loss of Modbus communication) doesn't generate a stream of alarms?
Regards,
John Dobbins
Research Support Specialist
Cornell High Energy Synchrotron Source
Cornell University
www.chess.cornell.edu