If it was previously working OK and you did not move the card, then it is unlikely to be a PCIe bandwidth issue. It seems more likely to be a hardware problem.
When my card was in an x4 slot, it showed horizontal black stripes that were mainly visible when the field of view was changing. When the image was static, they were not seen.
Mark
From: Wlodek, Jakub <jwlodek at bnl.gov>
Sent: Wednesday, November 29, 2023 1:44 PM
To: Mark Rivers <rivers at cars.uchicago.edu>; tech-talk at aps.anl.gov
Subject: Re: Andor3 IOC detects and connects to Marana, can get frames, but frames appear noisy/corrupted
Is there a way to confirm that the card is dropping packets?
According to the lspci -vvv output below, the card is on an x8-wide bus. I am using a PCI riser card to fit it into the server. Could that be causing issues?
37:00.0 Unassigned class [ff04]: BitFlow Inc Device 7002 (rev 01)
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 30
    Region 0: Memory at e4000000 (32-bit, non-prefetchable) [size=1M]
    Region 1: Memory at e3000000 (32-bit, non-prefetchable) [size=16M]
    Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [78] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [80] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
        DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s (ok), Width x8 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
            EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Kernel driver in use: bitflow
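A quick way to check the link side of the question is to compare what the device advertises (LnkCap) with what was actually negotiated (LnkSta). The following is a minimal Python sketch that does this from lspci -vvv text. The function name link_report and its return layout are illustrative choices and are not part of any BitFlow or Andor tool. The sketch only checks negotiated speed and width, so a clean result does not by itself prove that no packets are being dropped.

```python
import re

# Advertised link, e.g. "LnkCap: Port #1, Speed 5GT/s, Width x8, ..."
_CAP_RE = re.compile(r"LnkCap:.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")
# Negotiated link, e.g. "LnkSta: Speed 5GT/s (ok), Width x8 (ok)".
# The literal "LnkSta:" (with colon) does not match the "LnkSta2:" line.
_STA_RE = re.compile(r"LnkSta:.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def link_report(lspci_text):
    """Compare advertised (LnkCap) and negotiated (LnkSta) PCIe speed/width."""
    # Collapse all whitespace so a LnkSta entry wrapped over two lines still matches.
    flat = " ".join(lspci_text.split())
    cap = _CAP_RE.search(flat)
    sta = _STA_RE.search(flat)
    if cap is None or sta is None:
        raise ValueError("no LnkCap/LnkSta found; run lspci -vvv as root")
    cap_speed, cap_width = float(cap.group(1)), int(cap.group(2))
    sta_speed, sta_width = float(sta.group(1)), int(sta.group(2))
    return {
        "cap": (cap_speed, cap_width),
        "sta": (sta_speed, sta_width),
        # True if the link trained slower or narrower than the device supports.
        "downgraded": sta_speed < cap_speed or sta_width < cap_width,
    }
```

For the card above, the input would come from running lspci -vvv -s 37:00.0 as root. Since both LnkCap and LnkSta show 5GT/s and x8, the result should report no downgrade.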
Which BitFlow card is this? I have seen stripes on their Claxon CoaXPress card when it was plugged into an x4 rather than an x8 PCIe slot. They were due to dropped packets.
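Simple arithmetic shows why slot width matters. PCIe 1.x/2.x links use 8b/10b line coding and PCIe 3.0 to 5.0 use 128b/130b, so one 5 GT/s (Gen2) lane carries at most 500 MB/s in each direction. The following minimal sketch compares that ceiling with a camera's raw pixel rate. The function names and the 2048 x 2048, 16-bit, 48 fps values are hypothetical placeholders, not measured Marana figures. Protocol overhead makes real DMA throughput noticeably lower than these numbers.

```python
def encoding_efficiency(gt_per_s):
    """Fraction of raw bits left after PCIe line coding."""
    if gt_per_s <= 5.0:
        return 8 / 10      # PCIe 1.x (2.5 GT/s) and 2.x (5 GT/s): 8b/10b
    if gt_per_s <= 32.0:
        return 128 / 130   # PCIe 3.0-5.0 (8-32 GT/s): 128b/130b
    raise ValueError("PCIe 6.0+ uses FLIT mode; not modeled in this sketch")


def pcie_ceiling_mb_s(gt_per_s, lanes):
    """Theoretical one-direction link rate in MB/s after line coding only.

    TLP/DLLP headers and flow control are ignored, so sustained DMA
    throughput will be lower than this figure.
    """
    return gt_per_s * 1e9 * encoding_efficiency(gt_per_s) * lanes / 8 / 1e6


def camera_mb_s(width, height, bytes_per_pixel, fps):
    """Raw pixel data rate in MB/s, ignoring frame headers or metadata."""
    return width * height * bytes_per_pixel * fps / 1e6


# Hypothetical example values, not measured Marana figures.
print(f"Gen2 x8: {pcie_ceiling_mb_s(5.0, 8):.0f} MB/s")
print(f"Gen2 x4: {pcie_ceiling_mb_s(5.0, 4):.0f} MB/s")
print(f"camera : {camera_mb_s(2048, 2048, 2, 48):.0f} MB/s")
```

For a 5GT/s x8 link like the one in the lspci output above, this gives a 4000 MB/s ceiling, which drops to 2000 MB/s in an x4 slot. A real check should use the camera's actual mode and frame rate and leave margin for overhead.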
Mark
Recently I configured an Andor Marana detector using the Andor3 areaDetector driver and the BitFlow kernel driver included with the Andor3 SDK. This setup has been working fine and collecting data for about 6 months. At one point, after the detector had been running for a long time, the beamline reported that the IOC was displaying images that appeared corrupted or made no sense. Stopping and restarting acquisition restored the expected readout, and they ran OK for several more months.
Now, however, the same corrupted-image issue has re-emerged, and this time a stop/start does not fix it. In fact, none of the following appears to have solved the problem: a camera power cycle, an IOC reboot, rebuilding and reloading the kernel driver, and rebooting the IOC server. The IOC can detect and connect to the camera, all firmware and serial number information is correct, and it receives images of the correct size, but they do not appear to be valid images. The different PreAmp Gain modes produce different issues. Please take a look at the linked GitHub gist for screenshots:
The 12-bit mode exhibits vertical stripe artifacts, while the 16-bit HDR mode exhibits horizontal lines that cross the image. With no beam, the overall average pixel value in 16-bit mode used to be ~100 counts. It is now ~35 counts, and it spikes to 200 when the artifacts appear. I don't see any error messages in the IOC shell that would explain this, and during acquisition the detector outputs the correct frame rate and image dimensions. The image also does not appear to respond to outside stimuli (e.g., a flashlight or the beam).
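One rough way to quantify the vertical stripes (12-bit mode) and horizontal lines (16-bit HDR mode), rather than judging screenshots by eye, is to average each row and each column of a dark frame and flag those that stand out. The following minimal sketch assumes the frame has already been pulled out of the IOC as a list of rows of pixel values. The names stripe_report and _outliers, the median/MAD outlier rule, and the default threshold k=5 are illustrative choices and are not part of areaDetector.

```python
from statistics import median


def _outliers(values, k):
    """Indices whose value deviates from the median by more than k * MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Perfectly flat profile: anything that differs from the median stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) > k * mad]


def stripe_report(frame, k=5.0):
    """Return (rows, cols) whose mean intensity is anomalous.

    frame: 2-D image given as a list of equal-length rows of pixel values.
    Horizontal lines show up as outlier rows; vertical stripes as outlier columns.
    """
    row_means = [sum(row) / len(row) for row in frame]
    col_means = [sum(col) / len(col) for col in zip(*frame)]
    return _outliers(row_means, k), _outliers(col_means, k)
```

If a dark frame saved while the system was healthy exists, running the same check on it gives a baseline to compare the current frames against.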
Has anyone seen this on an Andor Marana, or on Andor cameras in general? If so, is there a software fix, or is this a hardware issue? My next troubleshooting step will probably be to pull out the PCI card and test the setup on a Windows machine with a shorter cable. The BitFlow software on Windows lets you collect images outside of the Andor SDK. However, if this is a configuration issue I'm missing, I'd rather ask first before I pull the card out of the server in the rack.
Thanks in advance for the advice!