Hi,
Small update on this issue:
I updated to the Andor SDK3.12 release, but found it's not compatible with RHEL6. Initially it seemed OK, but after a few hours the machine locks up and we see messages like this in /var/log/messages:
Oct 17 16:01:03 cg1d-dassrv2 kernel: ERR(0): GetOneSG: cannot get 2328 blocks at 0
Oct 17 16:01:03 cg1d-dassrv2 kernel: ERR(0): bitflow_scatter_gather: g1sg failed on frame 0 of 5
Oct 17 16:03:09 cg1d-dassrv2 automount[14608]: do_umount_autofs_direct: attempt to umount busy direct mount /home/controls
Oct 17 16:04:41 cg1d-dassrv2 kernel: INFO: task events/0:35 blocked for more than 120 seconds.
Oct 17 16:04:41 cg1d-dassrv2 kernel: Tainted: P -- ------------ 2.6.32-642.1.1.el6.x86_64 #1
Oct 17 16:04:41 cg1d-dassrv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 16:04:41 cg1d-dassrv2 kernel: events/0 D 0000000000000000 0 35 2 0x00000000
Oct 17 16:04:41 cg1d-dassrv2 kernel: ffff880418953d50 0000000000000046 ffff880418953d18 ffff880418953d14
Oct 17 16:04:41 cg1d-dassrv2 kernel: ffff880418953ce0 ffff88041ec24100 000006e46193d5a4 ffff880028316ec0
Oct 17 16:04:41 cg1d-dassrv2 kernel: 0000000000000400 00000001006f2807 ffff88041894fad8 ffff880418953fd8
Oct 17 16:04:41 cg1d-dassrv2 kernel: Call Trace:
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff81548f66>] __mutex_lock_slowpath+0x96/0x210
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e4710>] ? bflki_work+0x0/0x10 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff81548a8b>] mutex_lock+0x2b/0x50
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e429e>] bflki_mutex_lock+0xe/0x10 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e5bda>] bflki_private_work+0x4a/0x2d0 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e471e>] ? bflki_work+0xe/0x10 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8109fdc0>] ? worker_thread+0x170/0x2a0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8109fc50>] ? worker_thread+0x0/0x2a0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff810a662e>] ? kthread+0x9e/0xc0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
I've passed this information on to Andor, but they don't have a history of quickly resolving issues like this. Also, this may be a problem with the BitFlow driver rather than the user-space SDK.
Is anybody else using Andor's SDK3.12 with Linux?
The update also didn't fix the original issue: I've seen it recur before the machine locks up. The only way to recover from the original issue seems to be an application restart. The uncaught exception was raised from two different calls into the Andor SDK, so I'm not sure we can recover from the EPICS driver side.
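A minimal C sketch of the detection idea from the quoted message below (flag the condition instead of spinning on AT_WaitBuffer) might look like this. It is hypothetical and not taken from the Andor3 driver. The SKETCH_ constants mirror the SDK return codes (11 = AT_ERR_NODATA as reported; 0 for success is an assumption) so the example compiles without atcore.h. The names wait_for_frame, wait_fn and scripted_wait, and the threshold of 5, are made up for illustration. The ADStatus update is left as a comment.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed return codes: 11 is AT_ERR_NODATA as reported in this thread;
 * 0 for success is an assumption. Local names avoid needing atcore.h. */
#define SKETCH_AT_SUCCESS     0
#define SKETCH_AT_ERR_NODATA 11

/* Hypothetical threshold: with an AT_INFINITE timeout NODATA should never
 * happen, so several consecutive NODATA returns suggest the SDK is stuck. */
#define MAX_CONSECUTIVE_NODATA 5

/* Hypothetical hook standing in for one AT_WaitBuffer(..., AT_INFINITE)
 * call; returns the SDK status code. */
typedef int (*wait_fn)(void *ctx);

typedef enum { WAIT_GOT_FRAME, WAIT_OTHER_ERROR, WAIT_SDK_WEDGED } wait_result;

/* Retries on NODATA, logs only the first occurrence, and gives up after
 * MAX_CONSECUTIVE_NODATA tries instead of spinning and filling the log. */
wait_result wait_for_frame(wait_fn wait_once, void *ctx, int *nodata_count)
{
    assert(wait_once != NULL && nodata_count != NULL);
    *nodata_count = 0;
    for (;;) {
        int status = wait_once(ctx);
        if (status == SKETCH_AT_SUCCESS)
            return WAIT_GOT_FRAME;
        if (status != SKETCH_AT_ERR_NODATA)
            return WAIT_OTHER_ERROR;
        if (++*nodata_count == 1)
            fprintf(stderr, "AT_WaitBuffer returned NODATA despite AT_INFINITE\n");
        if (*nodata_count >= MAX_CONSECUTIVE_NODATA)
            return WAIT_SDK_WEDGED; /* a driver would set ADStatus to error here */
    }
}

/* Test double: replays a scripted list of status codes, then NODATA forever. */
typedef struct { const int *codes; int n; int next; } scripted_codes;

int scripted_wait(void *ctx)
{
    scripted_codes *s = (scripted_codes *)ctx;
    return s->next < s->n ? s->codes[s->next++] : SKETCH_AT_ERR_NODATA;
}
```

In a real driver, wait_once would wrap the AT_WaitBuffer call and WAIT_SDK_WEDGED would set the error status and stop acquisition. This only detects the failure. As noted above, it does not recover from it.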
Cheers,
Matt
Data Acquisition and Control Engineer
Spallation Neutron Source
Oak Ridge National Lab
> On Sep 23, 2016, at 9:47 AM, Pearson, Matthew R. <[email protected]> wrote:
>
>>
>> If you restart the IOC, does it start working, or do you need to power-cycle the camera?
>
> Hi Mark,
>
> Yes, an IOC restart fixes it (don’t need to power cycle the camera).
>
> Cheers,
> Matt
>>
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Pearson, Matthew R. [[email protected]]
>> Sent: Friday, September 23, 2016 7:54 AM
>> To: [email protected] list
>> Subject: Andor SDK3 issue - AT_WaitBuffer fails
>>
>> Hi,
>>
>> We have an Andor sCMOS Zyla camera and we use the Andor3 support in areaDetector. This uses the Andor SDK3 API to control the camera and read out data. Sometimes the Andor3 driver gets into a bad state where the
>>
>> status = AT_WaitBuffer(handle_, &image, &size, AT_INFINITE);
>>
>> function always returns immediately with error code 11 (AT_ERR_NODATA). The Andor3 driver then calls AT_WaitBuffer repeatedly in a tight loop, producing giant log files, until we exit the IOC.
>>
>> The function should block until there is a data frame to read from the SDK, but it doesn’t, which is a problem. I’ve contacted Andor about this (and they asked me to update my SDK version from 3.9 to the latest version).
>>
>> Has anyone else also seen this problem?
>>
>> When I get a chance I'll try to reproduce it, and also update the SDK. I'm not sure we can easily recover from this state without restarting the IOC, but perhaps the Andor3 driver could detect the problem and set the ADStatus error flag. I'll experiment with that if I can reproduce the problem.
>>
>> Somewhat suspiciously, we also see an uncaught exception when we exit the IOC:
>>
>> 2016/09/22 19:46:04.402 andor3:imageTask: AT_WaitBuffer, error=11
>> 2016/09/22 19:46:04.402 andor3:imageTask: AT_WaitBuffer, error=11
>> 2016[Thu Sep 22 19:46:07 2016] /09/22 19:46:04.402 andor3:imageTask: AT_WaitBuffer, error=11
>> terminate called after throwing an instance of '[Thu Sep 22 19:46:07 2016] TSDK3Exception'
>> [Thu Sep 22 19:46:07 2016] what(): TDualCLLogicalControl: Error - Sensor halves are not running in unison
>>
>> Cheers,
>> Matt
>>
>>
>