We recently observed a strange bug with one of our pizzabox encoders at NSLS2 that cropped up during the migration of its IOC to RHEL 8. As part of the migration, we modernized the EPICS modules that it depends on, including EPICS base. The IOC uses the
pscdrv module as a base, with two additional *App directories. When first moving to RHEL 8, I had to make some minor changes, which can be found in the following two commit diffs:
These changes allowed the IOC to build on RHEL 8 without any problems. However, when running the IOC and collecting some data from the encoder, we encountered a strange issue. Essentially, from the box, we collect a count of nanoseconds and seconds. The
nanoseconds count up to one second and then reset to 0, while the seconds counter gets a reading after every second. This data is then combined to give a straight line representing time to nanosecond accuracy. Unfortunately, the first batch of data written
to the datafile was corrupted, which then caused the nanosecond and second counters to fall out of sync, making the graph jagged. (See the linked images for a visualization. Note, in the one with the three-way split, how the initial data is corrupted compared
with the normal data):
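To make the symptom concrete, here is a minimal Python sketch (not the IOC's actual code) of how the two counters combine into one timeline and why a shift between them produces a jagged plot. The function names and the sample values are made up for illustration, and it assumes the seconds and nanoseconds arrays are aligned sample by sample.

```python
NS_PER_S = 1_000_000_000

def combine(seconds, nanoseconds):
    """Merge parallel second and nanosecond counts into integer nanosecond timestamps."""
    return [s * NS_PER_S + ns for s, ns in zip(seconds, nanoseconds)]

def backward_steps(timestamps):
    """Return the indices where time appears to run backwards (the jagged points)."""
    return [i for i in range(1, len(timestamps)) if timestamps[i] < timestamps[i - 1]]

# In sync: the nanosecond count wraps exactly when the second count increments.
sec = [10, 10, 11, 11]
ns = [400_000_000, 900_000_000, 400_000_000, 900_000_000]
good = combine(sec, ns)

# Out of sync: the nanosecond stream is shifted by one sample relative to the seconds.
ns_shifted = [900_000_000, 400_000_000, 900_000_000, 400_000_000]
bad = combine(sec, ns_shifted)

print(backward_steps(good))  # no backward steps when the counters line up
print(backward_steps(bad))   # backward steps appear once they are offset
```

With aligned counters the combined timestamps only ever increase; once one stream is offset, every nanosecond wrap lands on the wrong second and the line turns into a sawtooth.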
We did not see any error or warning messages in the IOC shell. After some trial and error, we tried an older version of base (3.14.*) with modules from circa 2017, which resolved the problem. Curiously, the initial corrupted data was always
1024 elements long, which matches both the value of NELM in the waveform records corresponding to the PVs for this data and the hard-coded size of the buffer that is written to memory (using an fprintf call in an aSubRecord callback). Aside from that initial
invalid data, the remaining data seems to be consistent and valid, though it cannot be used because it is out of sync.
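One way to confirm that the bad block is exactly one buffer long is to measure it directly in a saved data file. The sketch below is only an illustration: `corrupt_prefix_length` is a hypothetical helper, the sample data is synthetic, and it assumes that corrupted nanosecond values fall outside the valid range of 0 to 999,999,999, which may not hold for the real data.

```python
NELM = 1024                  # waveform size quoted above
NS_PER_S = 1_000_000_000

def corrupt_prefix_length(ns_values):
    """Length of the leading block that ends at the last out-of-range nanosecond value."""
    last_bad = -1
    for i, ns in enumerate(ns_values):
        if not 0 <= ns < NS_PER_S:
            last_bad = i
    return last_bad + 1

# Synthetic stand-in for a data file: one buffer of junk followed by valid samples.
samples = [-1] * NELM + [k * 1_000_000 for k in range(500)]
prefix = corrupt_prefix_length(samples)
print(prefix, prefix == NELM)
```

If the measured prefix always equals NELM, that would point at the first buffer being written before it is filled rather than at random corruption.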
Does anyone have an idea what could possibly cause this kind of behavior? I can send more data/source code snippets if that helps as well. It would be good to figure out a fix to make this IOC work with the version of EPICS base we have now standardized
on.