EPICS Home

Experimental Physics and Industrial Control System


 

Subject: Re: Comparison between EPICS and TANGO.
From: Anders Lindh Olsson via Tech-talk <[email protected]>
To: "[email protected]" <[email protected]>
Date: Wed, 13 Feb 2019 20:55:38 +0000

Hi Azra,


There's a technical report from 2014 by Alejandro Vázquez-Otero called "Tango vs EPICS technical comparison ELI beamlines" which may help. You can find it on ResearchGate.



Anders




From: [email protected] <[email protected]> on behalf of [email protected] <[email protected]>
Sent: 11 February 2019 15:37
To: [email protected]
Subject: Tech-talk Digest, Vol 13, Issue 74
 

Today's Topics:

   1. Comparison between EPICS and TANGO. (Azra Jabeen)
   2. Re: trying to find information on mvme5100 battery (Dirk Zimoch)
   3. Re: trying to find information on mvme5100 battery (Dirk Zimoch)
   4. Re: trying to find information on mvme5100 battery (Dirk Zimoch)
   5. Re: Weird stream device behavior when using the IOC shell's
      exit      function (Abdalla  Ahmad)
   6. RE: Weird stream device behavior when using the IOC shell's
      exit      function (Mark Rivers)


----------------------------------------------------------------------

Message: 1
Date: Mon, 11 Feb 2019 01:09:58 +0500
From: Azra Jabeen <[email protected]>
To: [email protected]
Subject: Comparison between EPICS and TANGO.
Message-ID:
        <CAHMUQcGdYhEJU1Ay4iybWZcqUYjLbPcQigEPGUqJh2xpQPwF6A@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Is there any published report on the differences between TANGO-based and
EPICS-based DCS? Which is better for a DCS design with respect to cyber
security, EPICS or TANGO?
Any help please.

Azra
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.aps.anl.gov/pipermail/tech-talk/attachments/20190211/e16d3eef/attachment.html>

------------------------------

Message: 2
Date: Mon, 11 Feb 2019 10:43:39 +0100
From: Dirk Zimoch <[email protected]>
To: Maren Purves <[email protected]>, <[email protected]>
Subject: Re: trying to find information on mvme5100 battery
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Hi Maren,

Attached find my driver for the M48T37 Timekeeper battery status. It
works with any EPICS version on MVME51xx as well as on MVME23xx, maybe
on more boards using this chip.

I have some more code to read and set the RTC and use the alarm function
of the M48T37 in case you need it.

The code is vxWorks only.

Be aware that the M48T37 reads the battery status only once at power-on.
So there is no point in processing the record more often than PINI=YES.

Dirk


On 09.02.19 02:06, Maren Purves wrote:
> Happy to, thanks Dirk!
>
> On 2/8/19 05:35, Dirk Zimoch via Tech-talk wrote:
>> I am just now writing a driver for it. Wait a few days please...
>>
>> On 08.02.19 03:01, Maren Purves via Tech-talk wrote:
>>> Hi all,
>>>
>>> I'm working on some monitoring stuff for a new receiver
>>> and as the new receiver isn't here yet I just got a spare
>>> mvme5100 board. Trying to boot it I found that it had
>>> 'forgotten' its boot parameters. It is supposed to have
>>> a replaceable battery and we found one part that looks like
>>> it may be a battery but with no part number or anything on
>>> it.
>>> None of the manuals I found on-line has a board layout
>>> with labels on it that would tell me where the battery
>>> is (assuming that the battery is the problem), or had
>>> any specifications for it.
>>> Pulling out another spare board, it behaved the same way.
>>> If we have to put any of these spare boards into use that
>>> would mean that they would have to have their boot parameters
>>> reinstalled each time we have a power glitch. That's OK
>>> for us here in the office/lab but not OK for people
>>> operating telescopes at night.
>>>
>>> Does anybody know and can either point me somewhere where
>>> to find this information - or have the information and
>>> pass it on?
>>>
>>> Thanks in advance,
>>> Maren Purves
>>> East Asian Observatory
>>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: devMT48T37.c
Type: text/x-csrc
Size: 666 bytes
Desc: not available
URL: <http://mailman.aps.anl.gov/pipermail/tech-talk/attachments/20190211/1dfb3d73/attachment.bin>
-------------- next part --------------
device(bi,         INST_IO, MT48T37BatteryLow,  "MT48T37BatteryLow")
-------------- next part --------------
record (bi, "$(IOC):BATTERY")
{
    field(DTYP, "MT48T37BatteryLow")
    field(PINI, "YES") # battery status updates at power-on only
    field(ZNAM, "OK")
    field(ONAM, "LOW")
    field(OSV,  "MINOR")
}

------------------------------

Message: 3
Date: Mon, 11 Feb 2019 10:53:39 +0100
From: Dirk Zimoch <[email protected]>
To: <[email protected]>
Subject: Re: trying to find information on mvme5100 battery
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Ehm... I screwed up the file names. The chip name is "M48T37" and
the file and function names should have matched it. Sorry. Here are the
fixed files.

Dirk

On 11.02.19 10:43, Dirk Zimoch via Tech-talk wrote:
> Hi Maren,
>
> Attached find my driver for the M48T37 Timekeeper battery status. It
> works with any EPICS version on MVME51xx as well as on MVME23xx, maybe
> on more boards using this chip.
>
> I have some more code to read and set the RTC and use the alarm function
> of the M48T37 in case you need it.
>
> The code is vxWorks only.
>
> Be aware that the M48T37 reads the battery status only once at power-on.
> So there is no point in processing the record more often than PINI=YES.
>
> Dirk
>
>
> On 09.02.19 02:06, Maren Purves wrote:
>> Happy to, thanks Dirk!
>>
>> On 2/8/19 05:35, Dirk Zimoch via Tech-talk wrote:
>>> I am just now writing a driver for it. Wait a few days please...
>>>
>>> On 08.02.19 03:01, Maren Purves via Tech-talk wrote:
>>>> Hi all,
>>>>
>>>> I'm working on some monitoring stuff for a new receiver
>>>> and as the new receiver isn't here yet I just got a spare
>>>> mvme5100 board. Trying to boot it I found that it had
>>>> 'forgotten' its boot parameters. It is supposed to have
>>>> a replaceable battery and we found one part that looks like
>>>> it may be a battery but with no part number or anything on
>>>> it.
>>>> None of the manuals I found on-line has a board layout
>>>> with labels on it that would tell me where the battery
>>>> is (assuming that the battery is the problem), or had
>>>> any specifications for it.
>>>> Pulling out another spare board, it behaved the same way.
>>>> If we have to put any of these spare boards into use that
>>>> would mean that they would have to have their boot parameters
>>>> reinstalled each time we have a power glitch. That's OK
>>>> for us here in the office/lab but not OK for people
>>>> operating telescopes at night.
>>>>
>>>> Does anybody know and can either point me somewhere where
>>>> to find this information - or have the information and
>>>> pass it on?
>>>>
>>>> Thanks in advance,
>>>> Maren Purves
>>>> East Asian Observatory
>>>>
>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: devM48T37.c
Type: text/x-csrc
Size: 662 bytes
Desc: not available
URL: <http://mailman.aps.anl.gov/pipermail/tech-talk/attachments/20190211/f231ac68/attachment.bin>
-------------- next part --------------
device(bi,         INST_IO, M48T37BatteryLow,  "M48T37BatteryLow")
-------------- next part --------------
record (bi, "$(IOC):BATTERY")
{
    field(DTYP, "M48T37BatteryLow")
    field(PINI, "YES") # battery status updates at power-on only
    field(ZNAM, "OK")
    field(ONAM, "LOW")
    field(OSV,  "MINOR")
}

------------------------------

Message: 4
Date: Mon, 11 Feb 2019 11:36:04 +0100
From: Dirk Zimoch <[email protected]>
To: <[email protected]>
Subject: Re: trying to find information on mvme5100 battery
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"; format=flowed

One more remark: According to my tests, a battery with low voltage is
correctly detected on MVME51xx and MVME23xx. But a completely missing
battery is not reported as "low". Strange.



------------------------------

Message: 5
Date: Mon, 11 Feb 2019 13:51:59 +0000
From: "Abdalla  Ahmad" <[email protected]>
To: Mark Rivers <[email protected]>, 'Dirk Zimoch'
        <[email protected]>
Cc: "'[email protected]'" <[email protected]>
Subject: Re: Weird stream device behavior when using the IOC shell's
        exit    function
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

Hello Mark

Your assumption is correct! I applied the new database and the IOC exits correctly. If possible, could you please explain in more detail how the new design solved the problem and why it is preferred?

Before applying the new database I investigated the controller's response and found that, for some reason, it sends "CR CR LF" as a terminator. The controller's latest manual confirms that "CR CR LF" is appended to responses over Ethernet. Stranger still, this is how the device responds in a telnet session:

>cmd 61 1
00 OK YES
>

But on EPICS and even a test socket program, this is the response:

Send: cmd 61 1
Receive: >
Receive: 00 OK YES

Receive: >

The extra blank line reflects the terminator sequence mentioned above. The two responses differ because each of my protocols starts with "out", whereas it should start with "in" to receive the initial ">"; a test program confirmed that. Skipping 7 characters instead of 6 helped skip the first ">", and the second one should have been ignored by "ExtraInput = Ignore". I first tried to read the initial ">" but got I/O error messages. I did not investigate further, so I went back to skipping 7 characters.
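As a rough illustration of the framing described above, the following Python sketch strips an optional leading ">" prompt and cuts the payload at the "CR CR LF" terminator, ignoring anything that follows (similar in spirit to "ExtraInput = Ignore"). The function name extract_reply is hypothetical, and the byte layout is an assumption based on the responses shown in this message, not on the controller's manual.

```python
PROMPT = b">"
TERMINATOR = b"\r\r\n"   # "CR CR LF", as reported for the Ethernet interface


def extract_reply(raw: bytes) -> str:
    """Return the payload of one raw response as text.

    A leading prompt is dropped, the payload ends at the first terminator,
    and any trailing bytes (for example the next prompt) are ignored.
    """
    body = raw
    if body.startswith(PROMPT):
        body = body[len(PROMPT):]
    end = body.find(TERMINATOR)
    if end == -1:
        raise ValueError("terminator not found in response")
    return body[:end].decode("ascii")


# Bytes as they might arrive on a raw socket: prompt, payload, terminator, prompt.
print(extract_reply(b">00 OK YES\r\r\n>"))
```

A real StreamDevice protocol would express the same idea with its terminator settings rather than with code like this.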

But I can confirm that this alone did not solve it; the problem seems to be a matter of database design and the protocol file. The IOC is now running with base 3.15.6, asyn 4.33 and stream 2.7.7. I still get the asyn write error messages but eventually it exits. I will try upgrading to stream 2.8.8 soon to avoid these messages.

Thank you very much Mark!
Abdalla.
________________________________________
From: Mark Rivers <[email protected]>
Sent: Thursday, February 7, 2019 5:27 PM
To: Abdalla  Ahmad; 'Dirk Zimoch'
Cc: '[email protected]'
Subject: RE: Weird stream device behavior when using the IOC shell's exit function

Hi Abdalla,

Something seems wrong in the output you sent.  Each record is set to scan at 1 second.  However, the records are all processing more frequently than that.  The isEnabled record (cmd 61) is processing 3 times per second, separated by 0.12, 0.12, and 0.76 seconds.  The getCurrent record is processing twice a second, separated by 0.25 and 0.75 seconds.

I have attached a modified version of your template file.  Only the first record is periodically scanned, and the remaining records are processed with FLNK.  No records have PINI set, because this is not needed if they are periodically scanned.  This is a preferred design when all records should process at the same rate, since they will execute in a fixed order as fast as possible.  You can control the scan rate of all of them by changing only the SCAN field of getCurrent.
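For readers without the attachment, here is a minimal Python sketch that generates a database with the layout described above: only the first record has SCAN set and each record forward-links to the next. Only getCurrent and isEnabled are names from this thread; the helper build_chain, the record type, and the remaining record names are hypothetical, and device-specific fields (DTYP, INP) are deliberately left out.

```python
def build_chain(names, scan="1 second", rtype="ai"):
    """Return EPICS database text in which only the first record is scanned
    and every record forward-links (FLNK) to the next one."""
    blocks = []
    for i, name in enumerate(names):
        fields = []
        if i == 0:
            # Only the head of the chain is periodically scanned.
            fields.append(f'    field(SCAN, "{scan}")')
        if i + 1 < len(names):
            # Processing continues with the next record, in a fixed order.
            fields.append(f'    field(FLNK, "{names[i + 1]}")')
        blocks.append(f'record({rtype}, "{name}")\n{{\n' + "\n".join(fields) + "\n}")
    return "\n\n".join(blocks)


# getCurrent and isEnabled appear in the thread; the others are placeholders.
names = ["getCurrent", "isEnabled", "recordC", "recordD", "recordE"]
print(build_chain(names))
```

Changing the scan argument changes the rate of the whole chain, which mirrors the point about editing only the SCAN field of the first record.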

My prediction is that there should be about 400 ms delay between the scan loops if SCAN=1 second, because each record takes about 120 ms and there are 5 records = 600 ms.  Please see if this fixes the problem.  What you have observed indicates a potential problem with stream/asyn, but this may let you work around the problem.
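The arithmetic behind that estimate, written out as a small Python sketch. The function name is hypothetical; the numbers (5 records, about 120 ms each, 1 second scan period) are the ones quoted above.

```python
def idle_time_ms(n_records, per_record_ms, scan_period_ms):
    """Time left in each scan period after a FLNK chain has finished."""
    busy_ms = n_records * per_record_ms   # records run back to back
    return scan_period_ms - busy_ms


busy = 5 * 120                     # time spent processing the chain, in ms
idle = idle_time_ms(5, 120, 1000)  # remaining gap before the next scan, in ms
print(busy, idle)
```

A negative result would mean the chain cannot finish within one scan period.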

Mark

-----Original Message-----
From: Abdalla Ahmad <[email protected]>
Sent: Thursday, February 7, 2019 7:37 AM
To: Mark Rivers <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function

Hello Mark

Yes I noticed that and next week I will write a test program to read some parameters and see exactly how the device is actually communicating, because on a telnet session you get first a ">" prompt, when you type a command you get the output on a new line then followed by a ">" on a new line.

If you go back to the backtrace, you can see that the poll is stuck with data ">", so I started to suspect that something is wrong with the In/Out terminators. I should be able to verify the device terminator character(s?) through the test program I will write. Regarding "cmd 61", which corresponds to the isEnabled PV: I noticed that if I reduce the number of substitutions, there is a point at which the IOC exits if the isEnabled PV is commented out of the DB. Nothing strange shows up in a telnet session for any command.

Thank you for your time and efforts!
Abdalla.

-----Original Message-----
From: Mark Rivers [mailto:[email protected]]
Sent: Thursday, February 07, 2019 3:30 PM
To: Abdalla Ahmad <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: Re: Weird stream device behavior when using the IOC shell's exit function

Hi Abdalla,

I see some curious things in your output.

2019/02/07 10:29:03.257 SR-IPC1:23 write 9 cmd 61 1\r
2019/02/07 10:29:03.276 SR-IPC1:23 read 1
>
2019/02/07 10:29:03.376 SR-IPC1:23 read 12 OK 00 YES\r\r\n
2019/02/07 10:29:03.377 SR-IPC1:23 write 9 cmd 61 2\r
2019/02/07 10:29:03.396 SR-IPC1:23 read 1
>
2019/02/07 10:29:03.496 SR-IPC1:23 read 12 OK 00 YES\r\r\n
2019/02/07 10:29:03.497 SR-IPC1:23 write 9 cmd 61 3\r
2019/02/07 10:29:03.516 SR-IPC1:23 read 1
>
2019/02/07 10:29:03.616 SR-IPC1:23 read 12 OK 00 YES\r\r\n

- Each write is followed by 2 reads.  The first read occurs 20 ms after the write. The second read occurs 100 ms after the first read.  Is that just how the device works, or does your protocol file use 2 read operations?

- Sometimes the same command is sent several times in rapid succession as above.  Do you understand why the "cmd 61" command is being sent 3 times in a row?  Are your records each individually periodically scanned, or do you use FLNK to process them sequentially?
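The intervals mentioned in the first point can be recomputed from the trace timestamps. This is a small Python sketch; the timestamp format string is an assumption derived from the trace lines shown above, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta

# First four events of the trace above: (timestamp, operation).
TRACE = [
    ("2019/02/07 10:29:03.257", "write"),
    ("2019/02/07 10:29:03.276", "read"),
    ("2019/02/07 10:29:03.376", "read"),
    ("2019/02/07 10:29:03.377", "write"),
]


def deltas_ms(trace):
    """Return the gaps between consecutive events in whole milliseconds."""
    times = [datetime.strptime(ts, "%Y/%m/%d %H:%M:%S.%f") for ts, _ in trace]
    return [(b - a) // timedelta(milliseconds=1) for a, b in zip(times, times[1:])]


print(deltas_ms(TRACE))  # [19, 100, 1]
```

The first two gaps correspond to the roughly 20 ms and 100 ms delays noted above; the third shows the next write following almost immediately.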

Can you send your database and protocol file?

> A question came up to my mind, could the standard telnet port be related to this deadlock?

I don't think that should matter.  On the IOC side it does not care what port it connected to, the Linux client is using a random high number port on its end.

Mark




________________________________
From: Abdalla Ahmad <[email protected]>
Sent: Thursday, February 7, 2019 2:52 AM
To: Mark Rivers; 'Dirk Zimoch'
Cc: '[email protected]'
Subject: RE: Weird stream device behavior when using the IOC shell's exit function


Hello Mark



Attached is the output of asyn trace mask on one of the controllers. The IOC is working fine even after the exit function so I had to press Ctrl-C at some point.



For your suggestion regarding different configuration: asyn 4.18 can't be built with base 3.15.6 because there is a line in the asyn Gpib module that sets an event record value (a char array) to an integer (which is what was in 3.14.12.3). And stream 2.5.1 with base 3.15.6 complains about two headers it can't find: wdlib.h and streamReferences. According to streamDevice/src/Makefile, StreamReferences should have been generated from CONFIG_STREAM but it's not.



I think for now I will stick with the old setup: base 3.14.12.3, asyn 4.18 and stream 2.5.1.



Since the same setup worked with the Agilent controllers (serially interfaced and connected via terminal servers), a question came up to my mind: could the standard telnet port be related to this deadlock?



Best Regards,

Abdalla.



From: Mark Rivers [mailto:[email protected]]
Sent: Wednesday, February 06, 2019 8:09 PM
To: Abdalla Ahmad <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



The first test I would like you to run is with the recent code, ie. base 3.15.6, asyn 4.33 and stream 2.8.8. Set asynTraceIOMask=2 and asynTraceMask=9 for the TCP port.  Then we will see all communication with the device.  When that is running type "exit".  Send the complete output, for a few seconds before you type exit, and a few seconds after you type exit.
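For reference, the two mask values above can be decoded into their individual bits. The bit assignments in this Python sketch are copied by hand from asyn's asynDriver.h as best I know them; treat them as an assumption and check the header of the asyn release in use. The helper decode is hypothetical.

```python
# Bit values transcribed from asynDriver.h (assumption: verify for your release).
TRACE_MASK = {
    0x01: "ASYN_TRACE_ERROR",
    0x02: "ASYN_TRACEIO_DEVICE",
    0x04: "ASYN_TRACEIO_FILTER",
    0x08: "ASYN_TRACEIO_DRIVER",
    0x10: "ASYN_TRACE_FLOW",
    0x20: "ASYN_TRACE_WARNING",
}
TRACE_IO_MASK = {
    0x01: "ASYN_TRACEIO_ASCII",
    0x02: "ASYN_TRACEIO_ESCAPE",
    0x04: "ASYN_TRACEIO_HEX",
}


def decode(mask, table):
    """List the names of all bits that are set in mask."""
    return [name for bit, name in sorted(table.items()) if mask & bit]


print(decode(9, TRACE_MASK))     # error messages plus driver-level I/O
print(decode(2, TRACE_IO_MASK))  # print I/O with escaped control characters
```

With these settings, control characters such as \r and \n appear escaped in the trace, which is what makes the terminators visible in the output.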



Mark





From: Mark Rivers
Sent: Wednesday, February 6, 2019 8:32 AM
To: 'Abdalla Ahmad' <[email protected]<mailto:[email protected]>>; Dirk Zimoch <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



> I thought the while loop is causing the block so I removed the while loop and put the poll function just like what is done in version 4.18 but I got the same behavior.
> So I think that poll here never returns.
> As the documentation says, poll will block if a negative timeout is provided.
> So I think somehow stream device is passing a negative timeout but I could not verify the timeout value received in the readIt function.



You could edit that code to print the value of readPollmsec just before the while() statement.  That will tell you if stream device is passing a negative timeout.  The purpose of the loop is to retry the poll in case it terminates early with errno=EINTR.  But it will only loop for readPollmsec ms at most.
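The behaviour under discussion can be sketched in Python with select.poll, which follows the same convention as the C call: a negative timeout blocks until an event arrives. This is not the asyn code; wait_readable is a hypothetical helper that rejects negative timeouts and retries only until a fixed deadline. Note that Python 3.5 and later already retry on EINTR internally, so the except branch is rarely reached.

```python
import select
import socket
import time


def wait_readable(sock, timeout_ms):
    """Return True if sock becomes readable within timeout_ms, else False."""
    if timeout_ms < 0:
        # poll() treats a negative timeout as "wait forever"; refuse it here.
        raise ValueError("negative timeout would block indefinitely")
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        remaining_ms = max(0, int((deadline - time.monotonic()) * 1000))
        try:
            return bool(poller.poll(remaining_ms))
        except InterruptedError:
            # Retry after an interrupted call, but never past the deadline.
            if time.monotonic() >= deadline:
                return False


a, b = socket.socketpair()
print(wait_readable(a, 50))   # no data yet, so this times out
b.sendall(b">")
print(wait_readable(a, 50))   # one byte is pending
a.close()
b.close()
```

Printing the timeout before the call, as suggested above, is the equivalent of the explicit negative-value check in this sketch.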



I suggest you also try independently changing the versions of asyn and Stream to see which has changed to cause the problem.  I suggest using these 2 configurations:



-          Base 3.15.6, asyn 4.16 Stream 2.8.8

-          Base 3.15.6, asyn 4.33, Stream 2.5.1



I have a simple StreamDevice IOC that I will test today to see if I can reproduce the problem.



Mark





From: Abdalla Ahmad <[email protected]<mailto:[email protected]>>
Sent: Wednesday, February 6, 2019 6:45 AM
To: Mark Rivers <[email protected]<mailto:[email protected]>>; Dirk Zimoch <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



Hello Mark



Just would like to share some findings:



I cloned the latest stream device 2.8.8 and I still get the same behavior. So I investigated the backtrace a little, specifically the thread you mentioned as causing the deadlock.



In asyn 4.33 file asyn/drvAsynSerial/drvAsynIPPort.c function "readIt", line 725 is the poll function call causing the deadlock. I thought the while loop is causing the block so I removed the while loop and put the poll function just like what is done in version 4.18 but I got the same behavior. So I think that poll here never returns.



As the documentation says, poll will block if a negative timeout is provided. So I think somehow stream device is passing a negative timeout but I could not verify the timeout value received in the readIt function.



Best Regards,

Abdalla.



From: Abdalla Ahmad
Sent: Wednesday, February 06, 2019 9:12 AM
To: 'Mark Rivers' <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



Hello Mark



This is just to confirm that I re-built the old setup on CentOS 7 x64; base 3.14.12.3, asyn 4.18 and stream 2.5.1 and everything works fine. I don't even get the asynWrite error messages.



Best Regards,

Abdalla.



From: Mark Rivers [mailto:[email protected]]
Sent: Tuesday, February 05, 2019 5:47 PM
To: Abdalla Ahmad <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



You can now use asynSetTraceIOMask and asynSetTraceMask to monitor the communication.  If you enable that before you type "exit" we can see what is happening.



I view the problem exiting the IOC as an annoyance but not something that has to be fixed immediately.  You can always just type CTRL-C.



Mark





From: Abdalla Ahmad <[email protected]<mailto:[email protected]>>
Sent: Tuesday, February 5, 2019 8:17 AM
To: Mark Rivers <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: Re: Weird stream device behavior when using the IOC shell's exit function



Well I never thought of checking the client, I will test it and come back.

I forgot to mention that the device is being controlled through standard telnet port 23. Does it make a difference for stream device to work with a well known port number?






On Tue, Feb 5, 2019 at 4:14 PM +0200, "Mark Rivers" <[email protected]<mailto:[email protected]>> wrote:

Is your device working correctly, or is it timing out? I wonder if the exit problem is because Stream is polling constantly and never releasing the lock so the epicsExit thread can never run?
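A toy model of that hypothesis, in Python: one thread holds a lock while it waits, and a second party tries to take the same lock with a limited wait. This only illustrates the general mechanism; it does not reproduce asyn's port locking, and all names are hypothetical.

```python
import threading


def exit_can_lock(read_blocks_s, exit_waits_s):
    """Return True if the 'exit' side obtains the port lock within
    exit_waits_s while a 'port thread' holds it for read_blocks_s."""
    port_lock = threading.Lock()
    holding = threading.Event()

    def port_thread():
        with port_lock:
            holding.set()
            # Stand-in for a read that blocks while the lock is held.
            threading.Event().wait(read_blocks_s)

    worker = threading.Thread(target=port_thread)
    worker.start()
    holding.wait()                       # make sure the worker owns the lock
    acquired = port_lock.acquire(timeout=exit_waits_s)
    if acquired:
        port_lock.release()
    worker.join()
    return acquired


print(exit_can_lock(0.3, 0.05))   # the read outlasts the exit side's patience
print(exit_can_lock(0.01, 0.5))   # the read finishes and the lock is released
```

If the holder never releases the lock, a waiter without a timeout blocks forever, which would look like the hang at exit.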



Mark








> On Feb 5, 2019, at 7:23 AM, Abdalla Ahmad  wrote:

>

> Hello Mark

>

> Including the asyn.dbd file did the trick. Looking forward to your thoughts on the exit issue.

>

> Thank you for your time.

> Abdalla.

>

> -----Original Message-----

> From: Mark Rivers [mailto:[email protected]]

> Sent: Tuesday, February 05, 2019 3:19 PM

> To: Abdalla Ahmad

> Cc: [email protected]<mailto:[email protected]>

> Subject: Re: Weird stream device behavior when using the IOC shell's
> exit function

>

> OK, I understand the problem with the missing asyn commands.  You do have these commands in your help output:

>

>

> drvAsynIPPortConfigure          drvAsynIPServerPortConfigure

> drvAsynSerialPortConfigure

>

>

> That tells me that your application dbd file is including drvAsynIPPort.dbd and drvAsynSerialPort.dbd, but it is not including asyn.dbd.  It also explains why it used to work, and now it does not.  Previously drvAsynIPPort.dbd and drvAsynSerialPort.dbd themselves included asyn.dbd.  However, recent EPICS base releases no longer allow a dbd file to be loaded more than once, so asyn.dbd was removed from drvAsynIPPort.dbd and drvAsynSerialPort.dbd.  Now your application must explicitly include asyn.dbd.

>

>

> I think this is independent of the IOC exiting issue, but please include asyn.dbd and see if you still have that problem.

>

>

> Mark

>

>

>

> ________________________________

> From: Abdalla Ahmad

> Sent: Tuesday, February 5, 2019 6:34 AM

> To: Mark Rivers

> Cc: [email protected]<mailto:[email protected]>

> Subject: RE: Weird stream device behavior when using the IOC shell's
> exit function

>

> Hi Mark

>

> I attached the help output in help.txt. For the database nothing is I/O scanned, standard 1 second rate as in the attached database. I also checked the protocol file but nothing seems strange.

>

> While writing the email I thought of increasing the scan period to 5 seconds and it works. I still get the same asynError in write but the IOC eventually exits. Even if it exits gracefully now, it is still a problem because we need the scan rate to match the IMG ones. As I told you before, it used to work on the old setup without any problems. Is there anything we can investigate?

>

> Best Regards,

> Abdalla.

>

> -----Original Message-----

> From: Mark Rivers [mailto:[email protected]]

> Sent: Tuesday, February 05, 2019 2:20 PM

> To: Abdalla Ahmad

> Cc: [email protected]<mailto:[email protected]>

> Subject: Re: Weird stream device behavior when using the IOC shell's
> exit function

>

> Hi Abdalla,

>

>

> Your IOC is clearly built OK with asyn because you have threads that are in asyn functions.

>

>

> Thread 13 (Thread 0x7fffed943700 (LWP 29173)):

> #0  0x00007ffff6034f0d in poll () from /lib64/libc.so.6

> #1  0x00007ffff7926c14 in readIt (drvPvt=0x70eb30, pasynUser=0x79e4a8,
> data="" ">", maxchars=2048, nbytesTransfered=0x7fffed942ba0,
> gotEom=0x7fffed942b90)

>    at ../../asyn/drvAsynSerial/drvAsynIPPort.c:726

> #2  0x00007ffff7930886 in readIt (drvPvt=0x70fbb0, pasynUser=0x79e4a8,
> data="" ">", maxchars=, nbytesTransfered=0x7fffed942ba0,
> eomReason=0x7fffed942b90)

>    at ../../asyn/interfaces/asynOctetBase.c:233

> #3  0x00007ffff793a563 in readIt (ppvt=0x70fcb0, pasynUser=0x79e4a8,
> data="" ">", maxchars=63, nbytesTransfered=0x7fffed942ca0,
> eomReason=0x7fffed942c70)

>    at ../../asyn/miscellaneous/asynInterposeEos.c:231

> #4  0x00007ffff7bab5f5 in AsynDriverInterface::readHandler
> (this=this@entry=0x79e2e0) at ../AsynDriverInterface.cc:960

> #5  0x00007ffff7bacd08 in handleRequest (pasynUser=) at
> ../AsynDriverInterface.cc:1503

> #6  0x00007ffff791c9f3 in portThread (pport=0x70ece0) at
> ../../asyn/asynDriver/asynManager.c:902

> #7  0x00007ffff6b6f7fc in start_routine (arg=0x70f290) at
> ../../../src/libCom/osi/os/posix/osdThread.c:403

> #8  0x00007ffff5d2ce25 in start_thread () from /lib64/libpthread.so.0

> #9  0x00007ffff603fbad in clone () from /lib64/libc.so.6

>

>

>

> Thread 12 (Thread 0x7fffeda44700 (LWP 29172)):

> #0  0x00007ffff5d30995 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0

> #1  0x00007ffff6b71f3b in epicsEventWait (pevent=0x70d520) at
> ../../../src/libCom/osi/os/posix/osdEvent.c:103

> #2  0x00007ffff6b6b029 in epicsEventMustWait (id=) at
> ../../../src/libCom/osi/epicsEvent.cpp:125

> #3  0x00007ffff791c40c in portThread (pport=0x70d050) at
> ../../asyn/asynDriver/asynManager.c:788

> #4  0x00007ffff6b6f7fc in start_routine (arg=0x70d590) at
> ../../../src/libCom/osi/os/posix/osdThread.c:403

> #5  0x00007ffff5d2ce25 in start_thread () from /lib64/libpthread.so.0

> #6  0x00007ffff603fbad in clone () from /lib64/libc.so.6

>

> I don't understand why the asyn iocsh commands would not be available.

>

>

> Please send the complete output of the iocsh "help" command.

>

>

> The hang at shutdown appears to be caused by this thread:

>

>

> Thread 1 (Thread 0x7ffff7fda740 (LWP 29161)):

> #0  0x00007ffff5d3351d in __lll_lock_wait () from
> /lib64/libpthread.so.0

> #1  0x00007ffff5d2ee36 in _L_lock_870 () from /lib64/libpthread.so.0

> #2  0x00007ffff5d2ed2f in pthread_mutex_lock () from
> /lib64/libpthread.so.0

> #3  0x00007ffff6b71a76 in mutexLock (id=0x70ef60) at
> ../../../src/libCom/osi/os/posix/osdMutex.c:46

> #4  epicsMutexOsdLock (pmutex=0x70ef60) at
> ../../../src/libCom/osi/os/posix/osdMutex.c:130

> #5  0x00007ffff7917e9b in lockPort (pasynUser=0x710668) at
> ../../asyn/asynDriver/asynManager.c:1741

> #6  0x00007ffff7925876 in cleanup (arg=0x70eb30) at
> ../../asyn/drvAsynSerial/drvAsynIPPort.c:246

> #7  0x00007ffff6b65ce3 in epicsExitCallAtExitsPvt (pep=) at
> ../../../src/libCom/misc/epicsExit.c:95

> #8  epicsExitCallAtExits () at
> ../../../src/libCom/misc/epicsExit.c:113

> #9  0x00007ffff6b66088 in epicsExit (status=0) at
> ../../../src/libCom/misc/epicsExit.c:181

> #10 0x000000000040544d in main (argc=, argv=) at ../iocMain.cpp:21

>

>

> So the epicsExit function is hung up in the drvAsynIPPort::cleanup function.  It has called lockPort, but cannot get that mutex.

>

>

> No other thread is waiting for a mutex, so it is not a traditional deadlock.  But it seems that some other thread does have that mutex and is blocking the epicsExit thread from continuing.

>

>

> It is probably this thread, which is in StreamDevice.

>

>

> Thread 13 (Thread 0x7fffed943700 (LWP 29173)):

> #0  0x00007ffff6034f0d in poll () from /lib64/libc.so.6

> #1  0x00007ffff7926c14 in readIt (drvPvt=0x70eb30, pasynUser=0x79e4a8,
> data="" ">", maxchars=2048, nbytesTransfered=0x7fffed942ba0,
> gotEom=0x7fffed942b90)

>    at ../../asyn/drvAsynSerial/drvAsynIPPort.c:726

> #2  0x00007ffff7930886 in readIt (drvPvt=0x70fbb0, pasynUser=0x79e4a8,
> data="" ">", maxchars=, nbytesTransfered=0x7fffed942ba0,
> eomReason=0x7fffed942b90)

>    at ../../asyn/interfaces/asynOctetBase.c:233

> #3  0x00007ffff793a563 in readIt (ppvt=0x70fcb0, pasynUser=0x79e4a8,
> data="" ">", maxchars=63, nbytesTransfered=0x7fffed942ca0,
> eomReason=0x7fffed942c70)

>    at ../../asyn/miscellaneous/asynInterposeEos.c:231

> #4  0x00007ffff7bab5f5 in AsynDriverInterface::readHandler
> (this=this@entry=0x79e2e0) at ../AsynDriverInterface.cc:960

> #5  0x00007ffff7bacd08 in handleRequest (pasynUser=) at
> ../AsynDriverInterface.cc:1503

> #6  0x00007ffff791c9f3 in portThread (pport=0x70ece0) at
> ../../asyn/asynDriver/asynManager.c:902

> #7  0x00007ffff6b6f7fc in start_routine (arg=0x70f290) at
> ../../../src/libCom/osi/os/posix/osdThread.c:403

> #8  0x00007ffff5d2ce25 in start_thread () from /lib64/libpthread.so.0

> #9  0x00007ffff603fbad in clone () from /lib64/libc.so.6

>

>

> Do you have StreamDevice records that are I/O Intr scanned?

>

>

> Mark

>

>

>

>

> ________________________________

> From: Abdalla Ahmad

> Sent: Tuesday, February 5, 2019 5:53 AM

> To: Mark Rivers

> Cc: [email protected]<mailto:[email protected]>

> Subject: RE: Weird stream device behavior when using the IOC shell's
> exit function

>

>

> Hi Mark

>

>

>

> No command available which starts with asyn. I cloned the latest asyn from github with the same behavior.

>

> For the gdb part, attached is the stack trace from gdb for all pending threads.

>

>

>

> Best Regards,

>

> Abdalla.

>

>

>

> From: Mark Rivers [mailto:[email protected]]

> Sent: Monday, February 04, 2019 4:52 PM

> To: Abdalla Ahmad

> Cc: [email protected]<mailto:[email protected]>

> Subject: RE: Weird stream device behavior when using the IOC shell's
> exit function

>

>

>

> At the iocsh prompt when the IOC is still running type the command

>

>

>

> help

>

>

>

> It should show a complete list of commands that the iocsh understands.  See which ones start with "asyn".

>

>

>

> Mark

>

>

>

>

>

> From: Abdalla Ahmad
> <[email protected]<mailto:[email protected]>>

> Sent: Monday, February 4, 2019 8:44 AM

> To: Mark Rivers
> <[email protected]<mailto:[email protected]>>

> Cc: [email protected]<mailto:[email protected]>

> Subject: Re: Weird stream device behavior when using the IOC shell's
> exit function

>

>

>

> Hello Mark

>

> I will apply the gdb tip and get back to you. For the asyn commands, I don't see any asyn command when I type exit.

>

> Get Outlook for Android

>

>

>

>

>

> On Mon, Feb 4, 2019 at 4:33 PM +0200, "Mark Rivers" <[email protected]<mailto:[email protected]>> wrote:
>
> Hi Abdalla,
>
>> asynError in write. Asyn driver says: device:port disconnected.
>
> During IOC shutdown asyn does close all TCP ports.  However, record processing should have already been shut down, so I don't understand why you are getting that message from Stream.
>
>> Another problem we are facing with this new setup is that I can't find some asyn IOC shell function like asynTraceMask for example. The IOC is configured properly in RELEASE and src/Makefile. Is there anything we miss in the new setup?
>
>> But eventually the IOC exits. For the gamma controllers we get something really strange.
>> There is a point in the database where the IOC never exits, the exit command just freezes and Ctrl-C is the only way to shut down the IOC.
>
> I don't think I have seen that with Stream on any version of Stream/asyn/base that I have used.  That includes base 3.14.12, 3.15.5, 7.0.2.
>
> If you run the IOC with gdb then when you type exit and it hangs do the following:
>
> - Type Ctrl-C
> - Enter the gdb command
>
> thread apply all bt
>
> That will show you the current stack trace for all threads.  You can then see what is blocking the threads.
>
>> Another problem we are facing with this new setup is that I can't find some asyn IOC shell function like asynTraceMask for example.
>> The IOC is configured properly in RELEASE and src/Makefile. Is there anything we miss in the new setup?
>
> The command is not "asynTraceMask" it is "asynSetTraceMask" or "asynSetTraceIOMask".
>
> What asyn commands do you see if you type "help" at the iocsh prompt?
>
> Mark
>

> ________________________________________
> From: [email protected]<mailto:[email protected]> on behalf of Abdalla Ahmad via Tech-talk
> Sent: Monday, February 4, 2019 1:41 AM
> To: [email protected]<mailto:[email protected]>
> Subject: Weird stream device behavior when using the IOC shell's exit function
>
> Hi
>
> We are using the following setup to test control of the Agilent XGS gauge controllers and Gamma ion pump controllers:
>
> 1. EPICS Base 3.15.6
> 2. Asyn R4-33
> 3. Stream R2-7-7c
>
> For the Agilent controllers we get the following error:
>
> asynError in write. Asyn driver says: device:port disconnected.
>
> But eventually the IOC exits. For the Gamma controllers we get something really strange. There is a point in the database where the IOC never exits; the exit command just freezes and Ctrl-C is the only way to shut down the IOC. For now I can see that this behavior occurs because more DB substitutions are configured, which means more PVs and more controllers. But that was not the case when we had:
>
> 1. EPICS Base 3.14.12.3
> 2. Asyn R4-18
> 3. Stream R2-5-1
>
> Where the IOC exits with no errors or freezing. Should we upgrade our support modules or change the EPICS base?
>
> Another problem we are facing with this new setup is that I can't find some asyn IOC shell function like asynTraceMask for example. The IOC is configured properly in RELEASE and src/Makefile. Is there anything we miss in the new setup?
>
> Best Regards,
>
> Abdalla Ahmad
> Control Engineer
> SESAME
> Allan, Jordan.
> Tel: (+962-5) 3511348 , ext. 265
> Fax: (+962-5) 3511423
> Mob: (+962-7)88183296
> http://www.sesame.org.jo/
>


------------------------------

Message: 6
Date: Mon, 11 Feb 2019 14:37:14 +0000
From: Mark Rivers <[email protected]>
To: "'Abdalla  Ahmad'" <[email protected]>, 'Dirk Zimoch'
        <[email protected]>
Cc: "'[email protected]'" <[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"

> Your assumption is correct! I applied the new database and the IOC exits correctly.
> If possible, can you please explain in more details how the new design could have solved the problem and why it is a preferred one?

I like the solution of only 1 periodically scanned record, with all others scanned with FLNK because it is then easy to change the scan rate at run time with only a single PV.

I don't really understand what was happening with your previous database.  Each of your records had SCAN=1 second, but for some reason they were actually processing faster than this.  Some records were processing 2 times a second, some 3 times a second.  This does not make sense.

In the new design all records should process in about 600 ms (120 ms each times 5 records).  That should allow 400 ms of idle time each second.  I think that during that idle time the asyn exit handler gets to run.
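
As a rough check of the arithmetic above, here is a minimal Python sketch. The 120 ms per record, the 5 records, and the 1 second scan period come from the message; the function and variable names are made up for illustration.

```python
# Numbers quoted in the message above; all names here are illustrative only.
RECORD_TIME_MS = 120      # approximate write/read cycle of one record
NUM_RECORDS = 5           # records processed one after another via FLNK
SCAN_PERIOD_MS = 1000     # SCAN = "1 second" on the first record


def chain_timing(record_ms, n_records, period_ms):
    """Return (busy_ms, idle_ms) for one scan period of a sequential chain."""
    busy = record_ms * n_records
    idle = max(period_ms - busy, 0)   # never report negative idle time
    return busy, idle


busy_ms, idle_ms = chain_timing(RECORD_TIME_MS, NUM_RECORDS, SCAN_PERIOD_MS)
print(f"busy {busy_ms} ms, idle {idle_ms} ms per scan period")
```

If the chain took longer than the scan period, the idle time would drop to zero, which is the situation in which a shutdown task waiting for the port lock might never get a turn.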

I don't understand why the records kept processing for you and were blocking access to the asyn manager lock for the shutdown task.  Perhaps I should change the shutdown task to use queueLockPort() rather than lockPort(), or perhaps Dirk needs to add some shutdown logic to StreamDevice.

> I got the controller's latest manual, and it mentions that the Ethernet response is terminated by "CR CR LF".

It is best to have your protocol file completely consume the input that the device sends for each command, and not wait until the next command to consume the input from the previous command.  Unfortunately asyn does not support terminator strings longer than 2 characters, so you will need to set the input terminator to "\r\r" and consume the \n in your protocol file.
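
To illustrate why a two-character input terminator leaves one byte for the protocol file to consume, here is a small Python sketch. It is not how asyn implements end-of-string matching; `split_on_terminator` is a hypothetical helper, and the sample bytes are assumed from the asyn trace quoted further down in this thread.

```python
def split_on_terminator(stream: bytes, terminator: bytes):
    """Split a byte stream into complete messages.

    Returns (messages, leftover), where leftover is whatever trails the
    last terminator and has not been consumed yet.
    """
    messages = []
    while True:
        index = stream.find(terminator)
        if index < 0:
            return messages, stream
        messages.append(stream[:index])
        stream = stream[index + len(terminator):]


# Prompt, reply and next prompt, as assumed from the trace in this thread.
raw = b">OK 00 YES\r\r\n>"

messages, leftover = split_on_terminator(raw, b"\r\r")
print(messages)   # the reply, still carrying the leading ">" prompt
print(leftover)   # the "\n" and the next prompt remain unconsumed
```

With the terminator limited to "\r\r", the "\n" stays in the input and has to be matched explicitly by the protocol, otherwise it shows up at the start of the next reply.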

Mark




-----Original Message-----
From: Abdalla Ahmad <[email protected]>
Sent: Monday, February 11, 2019 7:52 AM
To: Mark Rivers <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: Re: Weird stream device behavior when using the IOC shell's exit function

Hello Mark

Your assumption is correct! I applied the new database and the IOC exits correctly. If possible, can you please explain in more details how the new design could have solved the problem and why it is a preferred one?

Before applying the new database I was investigating the controller's response, and for some reason it was sending "CR CR LF" as a terminator. I got the controller's latest manual, and it mentions that the Ethernet response is terminated by "CR CR LF". What is even stranger is that on a telnet session the device responds like this:

>cmd 61 1
00 OK YES
>

But on EPICS and even a test socket program, this is the response:

Send: cmd 61 1
Receive: >
Receive: 00 OK YES

Receive: >

The extra new line indicates the aforementioned terminator sequence. The two responses differ because each protocol starts with "out", whereas it should start with "in" to receive the initial ">"; a test program confirmed that. Skipping 7 characters instead of 6 skipped the first ">", and the second one should have been ignored by "ExtraInput = Ignore". I first tried to read the initial ">" explicitly, but I got I/O error messages. I did not investigate further, so I changed back to skipping 7 characters.

But I can confirm that even this alone did not solve it; it seems the problem is a matter of database design and protocol file. Now the IOC is running with base 3.15.6, asyn 4.33 and stream 2.7.7. I still get the asyn write error messages but eventually it exits. I will try upgrading to stream 2.8.8 soon to avoid these messages.

Thank you very much Mark!
Abdalla.
________________________________________
From: Mark Rivers <[email protected]>
Sent: Thursday, February 7, 2019 5:27 PM
To: Abdalla  Ahmad; 'Dirk Zimoch'
Cc: '[email protected]'
Subject: RE: Weird stream device behavior when using the IOC shell's exit function

Hi Abdalla,

Something seems wrong in the output you sent.  Each record is set to scan at 1 second.  However, the records are all processing more frequently than that.  The isEnabled record (cmd 61) is processing 3 times per second, separated by 0.12, 0.12, and 0.76 seconds.  The getCurrent record is processing twice a second, separated by 0.25 and 0.75 seconds.

I have attached a modified version of your template file.  Only the first record is periodically scanned, and the remaining records are processed with FLNK.  No records have PINI set, because this is not needed if they are periodically scanned.  This is a preferred design when all records should process at the same rate, since they will execute in a fixed order as fast as possible.  You can control the scan rate of all of them by changing only the SCAN field of getCurrent.

My prediction is that there should be about 400 ms delay between the scan loops if SCAN=1 second, because each record takes about 120 ms and there are 5 records = 600 ms.  Please see if this fixes the problem.  What you have observed indicates a potential problem with stream/asyn, but this may let you work around the problem.

Mark

-----Original Message-----
From: Abdalla Ahmad <[email protected]>
Sent: Thursday, February 7, 2019 7:37 AM
To: Mark Rivers <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function

Hello Mark

Yes, I noticed that. Next week I will write a test program to read some parameters and see exactly how the device communicates, because on a telnet session you first get a ">" prompt; when you type a command you get the output on a new line, followed by a ">" on a new line.

If you go back to the backtrace, you can see that the poll is stuck with data ">", so I started to suspect that something is wrong with the In/Out terminators. I should be able to verify the device terminator character(s?) through the test program I will write. Regarding "cmd 61", which corresponds to the isEnabled PV, I noticed that if I reduce the number of substitutions, there is a point at which the IOC exits if the isEnabled PV is commented out of the DB. Nothing strange shows up on a telnet session for any command.

Thank you for your time and efforts!
Abdalla.

-----Original Message-----
From: Mark Rivers [mailto:[email protected]]
Sent: Thursday, February 07, 2019 3:30 PM
To: Abdalla Ahmad <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: Re: Weird stream device behavior when using the IOC shell's exit function

Hi Abdalla,

I see some curious things in your output.

2019/02/07 10:29:03.257 SR-IPC1:23 write 9 cmd 61 1\r
2019/02/07 10:29:03.276 SR-IPC1:23 read 1
>
2019/02/07 10:29:03.376 SR-IPC1:23 read 12 OK 00 YES\r\r\n
2019/02/07 10:29:03.377 SR-IPC1:23 write 9 cmd 61 2\r
2019/02/07 10:29:03.396 SR-IPC1:23 read 1
>
2019/02/07 10:29:03.496 SR-IPC1:23 read 12 OK 00 YES\r\r\n
2019/02/07 10:29:03.497 SR-IPC1:23 write 9 cmd 61 3\r
2019/02/07 10:29:03.516 SR-IPC1:23 read 1
>
2019/02/07 10:29:03.616 SR-IPC1:23 read 12 OK 00 YES\r\r\n

- Each write is followed by 2 reads.  The first read occurs 20 ms after the write. The second read occurs 100 ms after the first read.  Is that just how the device works, or does your protocol file use 2 read operations?

- Sometimes the same command is sent several times in rapid succession as above.  Do you understand why the "cmd 61" command is being sent 3 times in a row?  Are your records each individually periodically scanned, or do you use FLNK to process them sequentially?
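
The intervals mentioned in the first point can be recomputed from the trace timestamps. The following is a minimal Python sketch; the excerpt is an abbreviated copy of the first lines of the trace above (payload omitted), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Abbreviated copy of the first trace lines above (payload omitted).
TRACE = """\
2019/02/07 10:29:03.257 SR-IPC1:23 write 9
2019/02/07 10:29:03.276 SR-IPC1:23 read 1
2019/02/07 10:29:03.376 SR-IPC1:23 read 12
2019/02/07 10:29:03.377 SR-IPC1:23 write 9
"""


def parse_trace(text):
    """Return a list of (timestamp, operation, byte_count) tuples."""
    events = []
    for line in text.splitlines():
        date, clock, _port, operation, count = line.split()[:5]
        stamp = datetime.strptime(f"{date} {clock}", "%Y/%m/%d %H:%M:%S.%f")
        events.append((stamp, operation, int(count)))
    return events


def gaps_ms(events):
    """Whole milliseconds between each pair of consecutive events."""
    one_ms = timedelta(milliseconds=1)
    return [(later[0] - earlier[0]) // one_ms
            for earlier, later in zip(events, events[1:])]


events = parse_trace(TRACE)
for (_, operation, _), gap in zip(events[1:], gaps_ms(events)):
    print(f"{operation:5s} +{gap} ms after the previous event")
```

The same helpers applied to a longer capture would also show how often a given command repeats within one second.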

Can you send your database and protocol file?

> A question came up to my mind, could the standard telnet port be related to this deadlock?

I don't think that should matter.  The IOC side does not care which port it connects to, and the Linux client uses a random high-numbered port on its end.

Mark




________________________________
From: Abdalla Ahmad <[email protected]>
Sent: Thursday, February 7, 2019 2:52 AM
To: Mark Rivers; 'Dirk Zimoch'
Cc: '[email protected]'
Subject: RE: Weird stream device behavior when using the IOC shell's exit function


Hello Mark



Attached is the output of asyn trace mask on one of the controllers. The IOC is working fine even after the exit function so I had to press Ctrl-C at some point.



For your suggestion regarding different configuration: asyn 4.18 can't be built with base 3.15.6 because there is a line in the asyn Gpib module that sets an event record value (a char array) to an integer (which is what was in 3.14.12.3). And stream 2.5.1 with base 3.15.6 complains about two headers it can't find: wdlib.h and streamReferences. According to streamDevice/src/Makefile, StreamReferences should have been generated from CONFIG_STREAM but it's not.



I think for now I will stick with the old setup: base 3.14.12.3, asyn 4.18 and stream 2.5.1.



Since the same setup worked with the Agilent controllers (serially interfaced and connected via terminal servers), a question came to mind: could the standard telnet port be related to this deadlock?



Best Regards,

Abdalla.



From: Mark Rivers [mailto:[email protected]]
Sent: Wednesday, February 06, 2019 8:09 PM
To: Abdalla Ahmad <[email protected]>; 'Dirk Zimoch' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



The first test I would like you to run is with the recent code, i.e. base 3.15.6, asyn 4.33 and stream 2.8.8. Set asynTraceIOMask=2 and asynTraceMask=9 for the TCP port.  Then we will see all communication with the device.  When that is running type "exit".  Send the complete output, for a few seconds before you type exit, and a few seconds after you type exit.
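
For readers unfamiliar with the mask values, the sketch below decodes them into flag names. The bit names and values are written from memory of asyn's asynDriver.h and should be treated as assumptions to check against the installed asyn version; `decode_mask` is a hypothetical helper.

```python
# Assumed bit assignments (verify against asynDriver.h in your asyn release).
TRACE_MASK_BITS = {
    0x01: "ASYN_TRACE_ERROR",
    0x02: "ASYN_TRACEIO_DEVICE",
    0x04: "ASYN_TRACEIO_FILTER",
    0x08: "ASYN_TRACEIO_DRIVER",
    0x10: "ASYN_TRACE_FLOW",
    0x20: "ASYN_TRACE_WARNING",
}

TRACE_IO_MASK_BITS = {
    0x01: "ASYN_TRACEIO_ASCII",
    0x02: "ASYN_TRACEIO_ESCAPE",
    0x04: "ASYN_TRACEIO_HEX",
}


def decode_mask(mask, bit_names):
    """List the flag names whose bits are set in mask, lowest bit first."""
    return [name for bit, name in sorted(bit_names.items()) if mask & bit]


print("asynTraceMask=9   ->", decode_mask(9, TRACE_MASK_BITS))
print("asynTraceIOMask=2 ->", decode_mask(2, TRACE_IO_MASK_BITS))
```

Under these assumptions, a trace mask of 9 enables error messages plus driver-level I/O tracing, and an I/O mask of 2 prints the data with escape sequences, which matches the "\r" and "\n" visible in the trace output in this thread.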



Mark





From: Mark Rivers
Sent: Wednesday, February 6, 2019 8:32 AM
To: 'Abdalla Ahmad' <[email protected]<mailto:[email protected]>>; Dirk Zimoch <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



> I thought the while loop is causing the block so I removed the while loop and put the poll function just like what is done in version 4.18 but I got the same behavior.
> So I think that poll here never returns.
> As the documentation says, poll will block if a negative timeout is provided.
> So I think somehow stream device is passing a negative timeout but I could not verify the timeout value received in the readIt function.



You could edit that code to print the value of readPollmsec just before the while() statement.  That will tell you if stream device is passing a negative timeout.  The purpose of the loop is to retry the poll in case it terminates early with errno=EINTR.  But it will only loop for readPollmsec ms at longest.
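
The retry pattern described above can be sketched in Python as follows. This is not the asyn source: `poll_with_retry` and `poll_once` are hypothetical names, an interrupted call is modelled with `InterruptedError`, and a negative timeout is assumed to mean "wait indefinitely", as it does for poll().

```python
import time


def poll_with_retry(poll_once, timeout_ms, clock=time.monotonic):
    """Retry an interrupted poll until it succeeds or the timeout expires.

    poll_once(remaining_ms) is a hypothetical callable that returns the
    number of ready descriptors or raises InterruptedError (EINTR).
    """
    # Diagnostic in the spirit of the suggestion above: show the timeout.
    print(f"poll timeout = {timeout_ms} ms")
    infinite = timeout_ms < 0
    deadline = None if infinite else clock() + timeout_ms / 1000.0
    while True:
        if infinite:
            remaining_ms = -1          # block until something arrives
        else:
            remaining_ms = max(int((deadline - clock()) * 1000), 0)
        try:
            return poll_once(remaining_ms)
        except InterruptedError:
            # Interrupted by a signal: retry, but never past the deadline.
            if not infinite and clock() >= deadline:
                return 0               # report a timeout
```

With a non-negative timeout the loop is bounded by the deadline; with a negative one it can only end when the poll itself returns, which is why the printed value is worth checking.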



I suggest you also try independently changing the versions of asyn and Stream to see which has changed to cause the problem.  I suggest using these 2 configurations:



- Base 3.15.6, asyn 4.16, Stream 2.8.8

- Base 3.15.6, asyn 4.33, Stream 2.5.1



I have a simple StreamDevice IOC that I will test today to see if I can reproduce the problem.



Mark





From: Abdalla Ahmad <[email protected]<mailto:[email protected]>>
Sent: Wednesday, February 6, 2019 6:45 AM
To: Mark Rivers <[email protected]<mailto:[email protected]>>; Dirk Zimoch <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



Hello Mark



Just would like to share some findings:



I cloned the latest stream device 2.8.8 and I still get the same behavior. So I investigated the backtrace a little, specifically the thread you mentioned as causing the deadlock.



In asyn 4.33 file asyn/drvAsynSerial/drvAsynIPPort.c function "readIt", line 725 is the poll function call causing the deadlock. I thought the while loop is causing the block so I removed the while loop and put the poll function just like what is done in version 4.18 but I got the same behavior. So I think that poll here never returns.



As the documentation says, poll will block if a negative timeout is provided. So I think somehow stream device is passing a negative timeout but I could not verify the timeout value received in the readIt function.



Best Regards,

Abdalla.



From: Abdalla Ahmad
Sent: Wednesday, February 06, 2019 9:12 AM
To: 'Mark Rivers' <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



Hello Mark



This is just to confirm that I re-built the old setup on CentOS 7 x64; base 3.14.12.3, asyn 4.18 and stream 2.5.1 and everything works fine. I don't even get the asynWrite error messages.



Best Regards,

Abdalla.



From: Mark Rivers [mailto:[email protected]]
Sent: Tuesday, February 05, 2019 5:47 PM
To: Abdalla Ahmad <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: RE: Weird stream device behavior when using the IOC shell's exit function



You can now use asynSetTraceIOMask and asynSetTraceMask to monitor the communication.  If you enable that before you type "exit" we can see what is happening.



I view the problem exiting the IOC as an annoyance but not something that has to be fixed immediately.  You can always just type CTRL-C.



Mark





From: Abdalla Ahmad <[email protected]<mailto:[email protected]>>
Sent: Tuesday, February 5, 2019 8:17 AM
To: Mark Rivers <[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: Re: Weird stream device behavior when using the IOC shell's exit function



Well I never thought of checking the client, I will test it and come back.

I forgot to mention that the device is being controlled through standard telnet port 23. Does it make a difference for stream device to work with a well known port number?






On Tue, Feb 5, 2019 at 4:14 PM +0200, "Mark Rivers" <[email protected]<mailto:[email protected]>> wrote:

Is your device working correctly, or is it timing out? I wonder if the exit problem is because Stream is polling constantly and never releasing the lock so the epicsExit thread can never run?



Mark








> On Feb 5, 2019, at 7:23 AM, Abdalla Ahmad  wrote:
>
> Hello Mark
>
> Including the asyn.dbd file did the trick. Looking forward to your thoughts on the exit issue.
>
> Thank you for your time.
> Abdalla.
>

> -----Original Message-----
> From: Mark Rivers [mailto:[email protected]]
> Sent: Tuesday, February 05, 2019 3:19 PM
> To: Abdalla Ahmad
> Cc: [email protected]<mailto:[email protected]>
> Subject: Re: Weird stream device behavior when using the IOC shell's exit function
>
> OK, I understand the problem with the missing asyn commands.  You do have these commands in your help output:
>
> drvAsynIPPortConfigure          drvAsynIPServerPortConfigure
> drvAsynSerialPortConfigure
>
> That tells me that your application dbd file is including drvAsynIPPort.dbd and drvAsynSerialPort.dbd, but it is not including asyn.dbd.  It also explains why it used to work, and now it does not.  Previously drvAsynIPPort.dbd and drvAsynSerialPort.dbd themselves included asyn.dbd.  However, recent EPICS base releases no longer allow a dbd file to be loaded more than once, so asyn.dbd was removed from drvAsynIPPort.dbd and drvAsynSerialPort.dbd.  Now your application must explicitly include asyn.dbd.
>
> I think this is independent of the IOC exiting issue, but please include asyn.dbd and see if you still have that problem.
>
> Mark
>

> ________________________________
> From: Abdalla Ahmad
> Sent: Tuesday, February 5, 2019 6:34 AM
> To: Mark Rivers
> Cc: [email protected]<mailto:[email protected]>
> Subject: RE: Weird stream device behavior when using the IOC shell's exit function
>
> Hi Mark
>
> I attached the help output in help.txt. For the database nothing is I/O scanned, standard 1 second rate as in the attached database. I also checked the protocol file but nothing seems strange.
>
> While writing the email I thought of increasing the scan rate to 5 seconds and it works. I still get the same asynError in write but the IOC eventually exits. Even if it exits gracefully now, it is still a problem because we need the scan rate to match the IMG ones. As I told you before it used to work on the old setup without any problems. Is there anything we can investigate?
>
> Best Regards,
> Abdalla.
>

> -----Original Message-----
> From: Mark Rivers [mailto:[email protected]]
> Sent: Tuesday, February 05, 2019 2:20 PM
> To: Abdalla Ahmad
> Cc: [email protected]<mailto:[email protected]>
> Subject: Re: Weird stream device behavior when using the IOC shell's exit function
>
> Hi Abdalla,
>
> Your IOC is clearly built OK with asyn because you have threads that are in asyn functions.
>
> Thread 13 (Thread 0x7fffed943700 (LWP 29173)):
> #0  0x00007ffff6034f0d in poll () from /lib64/libc.so.6
> #1  0x00007ffff7926c14 in readIt (drvPvt=0x70eb30, pasynUser=0x79e4a8, data="" ">", maxchars=2048, nbytesTransfered=0x7fffed942ba0, gotEom=0x7fffed942b90)
>    at ../../asyn/drvAsynSerial/drvAsynIPPort.c:726
> #2  0x00007ffff7930886 in readIt (drvPvt=0x70fbb0, pasynUser=0x79e4a8, data="" ">", maxchars=, nbytesTransfered=0x7fffed942ba0, eomReason=0x7fffed942b90)
>    at ../../asyn/interfaces/asynOctetBase.c:233
> #3  0x00007ffff793a563 in readIt (ppvt=0x70fcb0, pasynUser=0x79e4a8, data="" ">", maxchars=63, nbytesTransfered=0x7fffed942ca0, eomReason=0x7fffed942c70)
>    at ../../asyn/miscellaneous/asynInterposeEos.c:231
> #4  0x00007ffff7bab5f5 in AsynDriverInterface::readHandler (this=this@entry=0x79e2e0) at ../AsynDriverInterface.cc:960
> #5  0x00007ffff7bacd08 in handleRequest (pasynUser=) at ../AsynDriverInterface.cc:1503
> #6  0x00007ffff791c9f3 in portThread (pport=0x70ece0) at ../../asyn/asynDriver/asynManager.c:902
> #7  0x00007ffff6b6f7fc in start_routine (arg=0x70f290) at ../../../src/libCom/osi/os/posix/osdThread.c:403
> #8  0x00007ffff5d2ce25 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007ffff603fbad in clone () from /lib64/libc.so.6
>

> Thread 12 (Thread 0x7fffeda44700 (LWP 29172)):
> #0  0x00007ffff5d30995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00007ffff6b71f3b in epicsEventWait (pevent=0x70d520) at ../../../src/libCom/osi/os/posix/osdEvent.c:103
> #2  0x00007ffff6b6b029 in epicsEventMustWait (id=) at ../../../src/libCom/osi/epicsEvent.cpp:125
> #3  0x00007ffff791c40c in portThread (pport=0x70d050) at ../../asyn/asynDriver/asynManager.c:788
> #4  0x00007ffff6b6f7fc in start_routine (arg=0x70d590) at ../../../src/libCom/osi/os/posix/osdThread.c:403
> #5  0x00007ffff5d2ce25 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007ffff603fbad in clone () from /lib64/libc.so.6
>
> I don't understand why the asyn iocsh commands would not be available.
>
> Please send the complete output of the iocsh "help" command.
>
> The hang at shutdown appears to be caused by this thread:
>

> Thread 1 (Thread 0x7ffff7fda740 (LWP 29161)):
> #0  0x00007ffff5d3351d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007ffff5d2ee36 in _L_lock_870 () from /lib64/libpthread.so.0
> #2  0x00007ffff5d2ed2f in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007ffff6b71a76 in mutexLock (id=0x70ef60) at ../../../src/libCom/osi/os/posix/osdMutex.c:46
> #4  epicsMutexOsdLock (pmutex=0x70ef60) at ../../../src/libCom/osi/os/posix/osdMutex.c:130
> #5  0x00007ffff7917e9b in lockPort (pasynUser=0x710668) at ../../asyn/asynDriver/asynManager.c:1741
> #6  0x00007ffff7925876 in cleanup (arg=0x70eb30) at ../../asyn/drvAsynSerial/drvAsynIPPort.c:246
> #7  0x00007ffff6b65ce3 in epicsExitCallAtExitsPvt (pep=) at ../../../src/libCom/misc/epicsExit.c:95
> #8  epicsExitCallAtExits () at ../../../src/libCom/misc/epicsExit.c:113
> #9  0x00007ffff6b66088 in epicsExit (status=0) at ../../../src/libCom/misc/epicsExit.c:181
> #10 0x000000000040544d in main (argc=, argv=) at ../iocMain.cpp:21
>
> So the epicsExit function is hung up in the drvAsynIPPort::cleanup function.  It has called lockPort, but cannot get that mutex.
>
> No other thread is waiting for a mutex, so it is not a traditional deadlock.  But it seems that some other thread does have that mutex and is blocking the epicsExit thread from continuing.
>
> It is probably this thread, which is in StreamDevice.
>

> Thread 13 (Thread 0x7fffed943700 (LWP 29173)):
> #0  0x00007ffff6034f0d in poll () from /lib64/libc.so.6
> #1  0x00007ffff7926c14 in readIt (drvPvt=0x70eb30, pasynUser=0x79e4a8, data="" ">", maxchars=2048, nbytesTransfered=0x7fffed942ba0, gotEom=0x7fffed942b90)
>    at ../../asyn/drvAsynSerial/drvAsynIPPort.c:726
> #2  0x00007ffff7930886 in readIt (drvPvt=0x70fbb0, pasynUser=0x79e4a8, data="" ">", maxchars=, nbytesTransfered=0x7fffed942ba0, eomReason=0x7fffed942b90)
>    at ../../asyn/interfaces/asynOctetBase.c:233
> #3  0x00007ffff793a563 in readIt (ppvt=0x70fcb0, pasynUser=0x79e4a8, data="" ">", maxchars=63, nbytesTransfered=0x7fffed942ca0, eomReason=0x7fffed942c70)
>    at ../../asyn/miscellaneous/asynInterposeEos.c:231
> #4  0x00007ffff7bab5f5 in AsynDriverInterface::readHandler (this=this@entry=0x79e2e0) at ../AsynDriverInterface.cc:960
> #5  0x00007ffff7bacd08 in handleRequest (pasynUser=) at ../AsynDriverInterface.cc:1503
> #6  0x00007ffff791c9f3 in portThread (pport=0x70ece0) at ../../asyn/asynDriver/asynManager.c:902
> #7  0x00007ffff6b6f7fc in start_routine (arg=0x70f290) at ../../../src/libCom/osi/os/posix/osdThread.c:403
> #8  0x00007ffff5d2ce25 in start_thread () from /lib64/libpthread.so.0
> #9  0x00007ffff603fbad in clone () from /lib64/libc.so.6
>
> Do you have StreamDevice records that are I/O Intr scanned?
>
> Mark
>

> ________________________________
> From: Abdalla Ahmad
> Sent: Tuesday, February 5, 2019 5:53 AM
> To: Mark Rivers
> Cc: [email protected]<mailto:[email protected]>
> Subject: RE: Weird stream device behavior when using the IOC shell's exit function
>
> Hi Mark
>
> No command available which starts with asyn. I cloned the latest asyn from github with the same behavior.
>
> For the gdb part, attached is the stack trace from gdb for all pending threads.
>
> Best Regards,
> Abdalla.



------------------------------

_______________________________________________
Tech-talk mailing list [email protected]
https://mailman.aps.anl.gov/mailman/listinfo/tech-talk


End of Tech-talk Digest, Vol 13, Issue 74
*****************************************
