Problem solved, Mark!
I read the StreamDevice documentation again to better understand your explanations. It became clear that, as you said, the lack of a terminator in the replies coming from port 60000 was causing the timeouts. To verify that, I used "var streamError 1" right after iocInit, and indeed it reported several timeouts whenever I tried to trigger certain records.
I talked to the current developer of the protocol (Allan Borgato, who is now on copy) and together we reimplemented it with terminators. We also don't know why the terminators were configured in the protocol file. After adding the terminators to the protocol, we noticed several improvements, not only in the record we wanted but also in some other records that worked but used to take too long to take effect. In particular, "var streamError 1" right after iocInit stopped reporting the timeouts.
To sum up the problem and its solution, if I understand the situation correctly:
- Weird terminators were configured in the .proto file without reason;
- StreamDevice waited for the terminators which never came, thus generating timeouts;
- The read functions returned too late, delaying the whole process and preventing the records from receiving the expected values.
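A minimal Python sketch of the mechanism summarized above, assuming an LF terminator and a 0.1 s read timeout (both illustrative). This is not StreamDevice's implementation, and `read_reply` is a hypothetical helper: it only shows that a reply ending in the terminator returns at once, while a reply without one returns only after the read timeout expires.

```python
import socket
import time


def read_reply(sock, terminator=b"\n", read_timeout=0.1):
    """Read one reply, stopping at `terminator` or after `read_timeout` s of silence.

    Returns (payload, reason, elapsed_seconds). Hypothetical helper that only
    mimics the terminator-or-timeout idea; it is not StreamDevice code.
    """
    sock.settimeout(read_timeout)
    start = time.monotonic()
    buf = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            # No terminator arrived: return only after the full read timeout.
            return buf, "timeout", time.monotonic() - start
        if not chunk:
            # Peer closed the connection.
            return buf, "closed", time.monotonic() - start
        buf += chunk
        if terminator in buf:
            # Terminator seen: the reply is complete. Anything after it is
            # discarded in this simplified sketch.
            return buf.split(terminator, 1)[0], "terminator", time.monotonic() - start


if __name__ == "__main__":
    client, server = socket.socketpair()
    reply = b'{"id":"1", "result":"Selected sensor 0", "jsonrpc":"2.0"}'

    server.sendall(reply + b"\n")  # server appends LF: returns almost immediately
    print(read_reply(client))

    server.sendall(reply)          # no terminator: returns after ~0.1 s
    print(read_reply(client))

    client.close()
    server.close()
```

Each reply without a terminator costs at least `read_timeout` seconds, so records that talk to the server one after another accumulate that delay.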
Thanks for your help. Allan also sends his thanks.
Cheers,
Marco
From: Mark Rivers <rivers at cars.uchicago.edu>
Sent: 07 October 2022 12:24
To: Marco A. Barra Montevechi Filho <marco.filho at lnls.br>; Matthew Newville <newville at cars.uchicago.edu>
Cc: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
Subject: RE: Exploring EPICS performance/processing limits
> I don't know why the previous developer defined these terminators in the .proto file, but if this is the problem we are close to fixing it.
It is actually OK to define the terminator in the protocol file. I prefer to do it with the commands in iocsh, because then any software using the port (e.g. asynRecord) will use the correct terminator, not just stream. The problem is that the protocol file DOES define the terminator, yet it does not seem to be used when it communicates. You need to figure out why. You also need to find out whether your server on port 60000 expects to receive a terminator on input and whether it appends a terminator on output.
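The last question can be checked from outside the IOC. The sketch below is a hypothetical Python probe (`probe_reply` is not part of asyn or StreamDevice; the iocsh commands for setting terminators on the port itself are asyn's `asynOctetSetInputEos` and `asynOctetSetOutputEos`). It sends one request, optionally followed by an output terminator, and reports whether the reply ends with the expected input terminator. The LF terminator and 1 s timeout are assumptions.

```python
import socket


def probe_reply(host, port, request, out_eos=b"\n", in_eos=b"\n", timeout=1.0):
    """Send `request` + `out_eos`; return (reply, reply_ends_with_in_eos).

    Hypothetical diagnostic helper. If the server never sends `in_eos`, the
    loop ends on the socket timeout (or when the server closes the
    connection), which is the situation that makes a terminator-based reader wait.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request + out_eos)
        reply = b""
        try:
            while not reply.endswith(in_eos):
                chunk = sock.recv(4096)
                if not chunk:  # server closed the connection
                    break
                reply += chunk
        except socket.timeout:
            pass  # server stopped sending without a terminator
    return reply, reply.endswith(in_eos)
```

For example, calling `probe_reply("127.0.0.1", 60000, request, out_eos=b"")` and then `probe_reply("127.0.0.1", 60000, request, out_eos=b"\n")` with one of the JSON-RPC requests from this thread shows whether the server needs a terminator on input and whether it appends one on output.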
Mark
From: Marco A. Barra Montevechi Filho <marco.filho at lnls.br>
Sent: Friday, October 7, 2022 10:19 AM
To: Mark Rivers <rivers at cars.uchicago.edu>; Matthew Newville <newville at cars.uchicago.edu>
Cc: tech-talk at aps.anl.gov
Subject: Re: Exploring EPICS performance/processing limits
Hi!
The problem is getting much clearer thanks to your help.
The service running on port 60000 is a custom protocol developed in-house. I don't have immediate access to the protocol documentation, but it's not hard to get.
I'm going to look for it; I think we are close to a solution. I don't know why the previous developer defined these terminators in the .proto file, but if this is the problem we are close to fixing it.
Will be back soon with updates. Many thanks for the help 🙂
Marco
Hi Marco,
Your example is pretty complex; it is best to simplify as much as possible for tests like this.
I have some questions.
- Your drvAsynIPPort driver is connecting to 127.0.0.1:60000. What server are you running on port 60000? I was using the standard Echo Protocol service on port 7.
- Your protocol file has Terminator = LF; however, the asynTrace output shows that no LF terminator is actually being sent or received. Do you understand this?
2022/10/07 11:15:41.810 127.0.0.1:60000 write 92
{\"id\":\"1\", \"method\":\"HS_ImgChipNumberID_Command\", \"params\": [[\"1\", \"0\"]], \"jsonr
2022/10/07 11:15:41.815 127.0.0.1:60000 read 57
{\"id\":\"1\", \"result\":\"Selected sensor 0\", \"jsonrpc\":\"2.0\"}
2022/10/07 11:15:41.917 127.0.0.1:60000 write 87
- The asynTrace output shows almost exactly 0.1 seconds between the 2 write commands. This is the same thing I was seeing, and I believe it is due to a read timeout because you are not using a terminator.
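The gap can be measured directly from the trace timestamps. A small Python sketch, assuming the asynTrace header format shown above (date, time, port, direction, byte count); the function names are hypothetical. The roughly 0.1 s read-to-write gap is consistent with a read that ends on a timeout rather than on a terminator; StreamDevice's default ReadTimeout is 100 ms, although a protocol file can override it.

```python
from datetime import datetime

# asynTrace header lines copied from the trace above (payload lines omitted).
TRACE = """\
2022/10/07 11:15:41.810 127.0.0.1:60000 write 92
2022/10/07 11:15:41.815 127.0.0.1:60000 read 57
2022/10/07 11:15:41.917 127.0.0.1:60000 write 87
"""


def trace_events(text):
    """Yield (timestamp, direction, nbytes) for each asynTrace read/write header."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[3] in ("read", "write"):
            ts = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y/%m/%d %H:%M:%S.%f")
            yield ts, parts[3], int(parts[4])


def gaps(text):
    """(previous direction, next direction, seconds) between consecutive events."""
    events = list(trace_events(text))
    return [
        (prev[1], cur[1], round((cur[0] - prev[0]).total_seconds(), 3))
        for prev, cur in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for gap in gaps(TRACE):
        print(gap)  # ('write', 'read', 0.005) then ('read', 'write', 0.102)
```

The reply arrives 5 ms after the write, but the next write only happens about 102 ms after that, so nearly all of the cycle time is spent waiting after the data has already arrived.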
Mark
Hi, Mark!
I just saw your most recent email and I'm going to read it carefully. As promised, I'm sending the requested files as attachments.
The record ImgChipNumberID is defined as an alias on line 119 of the mobibackend.template file. One of my attempts at changing it sequentially with a transform record is in the mobipix_sensors_initialization.template file.
The write_2param function is defined on line 55 of the protocol file.
The asynTrace output was captured while running my Python script with long delays (greater than 0.5 s).
Thanks for the help; I'm now going to read the email about how you reproduced the behavior.
Best regards,
Marco
Hi Marco,
Please post:
- The complete output when your IOC starts
- The database containing MOBICDTE:Backend:ImgChipNumberID
- The protocol file you are using
- The output of asynTrace, as I suggested in my message this morning
Mark
Hi, Matt!
Sorry, I tried to make my question succinct but ended up hiding several details of the problem.
So, first of all: the whole problem showed up when I tried to create a transform record with several OUT fields, some of them pointing to the same record and changing its value. Here is part of the transform record:
record(transform, "$(P)$(R)SensorsInitChip"){
field(OUTA,"${P}${BE}:ImgChipNumberID PP")
field(OUTB,"${P}${BE}:DAC_GND PP")
field(OUTC,"${P}${BE}:DAC_CAS PP")
field(OUTD,"${P}${BE}:DAC_FBK PP")
field(OUTE,"${P}${BE}:DAC_DiscL PP")
field(OUTF,"$(P)$(R)Tmr1 PP")
field(OUTG,"${P}${BE}:ImgChipNumberID PP")
field(OUTH,"${P}${BE}:DAC_GND PP")
}
So it changes ImgChipNumberID's value, does several other things, and then changes it again.
To test whether the problem was with the transform record (spoiler: it seems it is not), I then wrote the following script:
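import epics, time, sys
def changer(delay):  # reconstructed loop: write chip IDs 0..3, pausing `delay` seconds between puts
    for i in range(4):
        epics.caput("MOBICDTE:Backend:ImgChipNumberID", i)
        time.sleep(delay)
changer(float(sys.argv[1]))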
and ran my_test.py while monitoring the IOC traffic with 'tcpflow -c -i lo port 60000 | grep "HS_ImgChipNumberID"' (HS_ImgChipNumberID is the JSON command sent by the record) and monitoring the record with camonitor, as you suggested.
Here are the results:
FOR 5 SECONDS OF SLEEP TIME:
TCPFLOW:
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "1"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "2"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "3"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
(The "params": [["1", "N"]] command is the one that causes trouble later, but it works fine here.)
CAMONITOR:
camonitor MOBICDTE:Backend:ImgChipNumberID
CA.Client.Exception...............................................
Warning: "Identical process variable names on multiple servers"
Context: "Channel: "MOBICDTE:Backend:ImgChipNumberID", Connecting to: 192.168.55.1:5064, Ignored: s-mgn-mob01-l.abtlus.org.br:5064"
Source File: ../cac.cpp line 1320
Current Time: Thu Oct 06 2022 14:56:42.275342454
..................................................................
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 14:31:09.112389 3 READ INVALID
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 14:57:59.690554 0 READ INVALID
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 14:58:04.690451 1 READ INVALID
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 14:58:09.692468 2 READ INVALID
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 14:58:14.696379 3 READ INVALID
FOR 0.1 SECONDS OF SLEEP TIME:
TCPFLOW:
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "1"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "3"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
(So we can see that the command "params": [["1", "2"]] was skipped.)
CAMONITOR:
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 15:05:22.936035 1 READ INVALID
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 15:05:23.141153 3 READ INVALID
So we can see that even though the command "params": [["1", "0"]] was sent, the PV was never published with value 0.
So, to answer your questions: the weird behavior also appears when monitoring with camonitor, and VALUE is indeed different for each iteration.
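The comparison above can be made mechanical with a short Python sketch (the helper names are hypothetical). It parses the tcpflow and camonitor text exactly as pasted for the 0.1 s run, assumes the script writes chip IDs 0 to 3, and lists which IDs were never sent and which values were never published. It also keeps the alarm fields: every update in this run carries a READ alarm with INVALID severity, which is worth keeping in mind alongside the read timeouts discussed in this thread.

```python
import json

# tcpflow and camonitor output copied from the 0.1 s run above.
TCPFLOW = """\
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "1"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["1", "3"]], "jsonrpc": "2.0"}
127.000.000.001.55780-127.000.000.001.60000: {"id":"1", "method":"HS_ImgChipNumberID_Command", "params": [["0"]], "jsonrpc": "2.0"}
"""

CAMONITOR = """\
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 15:05:22.936035 1 READ INVALID
MOBICDTE:Backend:ImgChipNumberID 2022-10-06 15:05:23.141153 3 READ INVALID
"""

PV = "MOBICDTE:Backend:ImgChipNumberID"


def chip_ids_sent(tcpflow_text):
    """Chip IDs carried by 'params': [["1", "N"]] requests, in the order sent."""
    ids = []
    for line in tcpflow_text.splitlines():
        # Drop the "src-dst: " address prefix that tcpflow -c adds.
        _, sep, payload = line.partition(": ")
        if not sep:
            continue
        params = json.loads(payload)["params"][0]
        if len(params) == 2 and params[0] == "1":
            ids.append(int(params[1]))
    return ids


def camonitor_updates(camonitor_text, pv=PV):
    """(value, alarm status, severity) for each camonitor update line of `pv`."""
    updates = []
    for line in camonitor_text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] != pv:
            continue  # skip CA warnings and other non-update lines
        status = parts[4] if len(parts) > 4 else ""
        severity = parts[5] if len(parts) > 5 else ""
        updates.append((int(parts[3]), status, severity))
    return updates


if __name__ == "__main__":
    expected = range(4)  # assumption: the script writes chip IDs 0..3
    sent = chip_ids_sent(TCPFLOW)
    published = [value for value, _, _ in camonitor_updates(CAMONITOR)]
    print("never sent:     ", [i for i in expected if i not in sent])       # [2]
    print("never published:", [i for i in expected if i not in published])  # [0, 2]
```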
Thanks for the help,
Marco
I don't understand that behavior, and would ask whether that observation appears only when using tcpflow or also when using camonitor.
To be clear, you should not need `sleep()` there, and certainly not as long as 0.5 seconds. Yes, the IOC and Channel Access should be capable of dealing with time intervals much smaller than that.
Since "VALUE" is left to the imagination of the reader, is it obvious that this should be a new value (i.e., different from the current value) for all PVs?