On 5/25/21 7:25 AM, Heinz Junkes via Tech-talk wrote:
Hello Jakub,
I had the same thoughts as Joel ;-)
A few weeks ago I tried to debug an RTEMS POSIX example with printf statements.
I came to the conclusion that the time was "broken". The fields in the timespec (tv_sec and tv_nsec) looked somehow swapped.
I suspected a problem with powerpc vs. intel (MSB vs. LSB).
Then Gedare Bloom pointed out to me that it was probably "just" a conversion problem:
"What you observe is an artifact of 32-/64-bit integer conversions in a 32-bit big-endian architecture, I suspect."
I then replaced my printf statement with Gedare's suggestion.
my instruction:
clock_gettime( CLOCK_REALTIME, &now );
printf("now tv_sec = %d, tv_nsec = %d\n", now.tv_sec, now.tv_nsec);
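A minimal sketch of the effect Gedare describes, assuming tv_sec is 64 bits wide on this target (his remark suggests so, but that is an assumption here). Passing a 64-bit value where %d expects an int is undefined behavior, so this does not reproduce the printf call itself. It only shows how a 64-bit value is laid out as two 32-bit words on a big-endian machine, which is roughly what two consecutive %d conversions might end up reading. The helper name split_big_endian_words is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: split a 64-bit value into two 32-bit words in the
 * order a big-endian machine stores them (most significant word first). */
static void split_big_endian_words(int64_t value, uint32_t words[2])
{
    uint64_t bits = (uint64_t)value;

    words[0] = (uint32_t)(bits >> 32);         /* high word: 0 for current dates */
    words[1] = (uint32_t)(bits & 0xFFFFFFFFu); /* low word: the actual seconds   */
}
```

For a present-day tv_sec the high word is 0 and the low word holds the seconds, so a mismatched format string could plausibly print 0 for "tv_sec" and the seconds for "tv_nsec", which would look like swapped fields.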
Didn't your compiler spit out a warning here?
Which compiler is this?
gcc?
Gedare's suggestion:
#include <inttypes.h>
[...]
printf("now tv_sec = %" PRIuLEAST64 ", tv_nsec = %d\n", now.tv_sec,
now.tv_nsec);
Hm, that is not ideal either.
We still pass variables of undefined/unknown size onto the stack.
Let's see:
A more correct version of this
>printf("now tv_sec = %d, tv_nsec = %d\n", now.tv_sec, now.tv_nsec);
Would be
printf("now tv_sec = %d, tv_nsec = %d\n",
(int)now.tv_sec, (int)now.tv_nsec);
We tell printf() that we are printing two ints, and casting the arguments
to int makes sure that we don't pass a 64-bit value where a 32-bit value
is expected.
The %d is still a poor fit for tv_sec.
In the distant past it was a signed 32-bit integer, which overflows in the year 2038.
On some systems it is now an unsigned 32-bit integer, which postpones the wrap to 2106.
See:
https://en.wikipedia.org/wiki/Year_2038_problem
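The wrap dates follow from simple arithmetic on the counter range. A small sketch, with a hypothetical helper name, that computes how many whole days after 1970-01-01 a seconds counter with a given maximum runs out:

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400

/* Hypothetical helper: whole days from the epoch until a seconds counter
 * with the given maximum value can no longer be incremented. */
static int64_t days_until_wrap(int64_t max_seconds)
{
    return max_seconds / SECONDS_PER_DAY;
}
```

With INT32_MAX this gives 24855 days (about 68 years, i.e. January 2038), and with UINT32_MAX it gives 49710 days (about 136 years, i.e. February 2106).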
In that sense
printf("now tv_sec = %u, tv_nsec = %d\n",
(unsigned int)now.tv_sec, (int)now.tv_nsec);
is a slightly better version.
Note that the nsec field is always <= 999999999, so using %d
and casting nsec to an int is fine.
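If you want the compiler to confirm that assumption, something like the following could be used. It is only a sketch: it needs a C11 compiler for _Static_assert, and the helper name nsec_in_range is made up for illustration.

```c
#include <limits.h>
#include <time.h>

/* Fails at compile time if int cannot hold the largest valid tv_nsec. */
_Static_assert(INT_MAX >= 999999999, "int is too narrow for tv_nsec");

/* Hypothetical helper: returns 1 if tv_nsec is within the valid range
 * 0..999999999, 0 otherwise. */
static int nsec_in_range(const struct timespec *ts)
{
    return ts->tv_nsec >= 0 && ts->tv_nsec <= 999999999L;
}
```

The run-time check is useful when the timespec was assembled by hand (for example from hardware counters) rather than returned by clock_gettime().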
If we look at more modern systems, tv_sec is a 64-bit integer.
A signed integer is used, both to overcome the Y2038 problem and to be able
to represent times before the epoch in 1970.
Depending on your compiler, casting into a "long long" may be a good choice:
printf("now tv_sec = %lld, tv_nsec = %d\n",
(long long)now.tv_sec, (int)now.tv_nsec);
(That may not work on Windows with older C runtimes.)
Or, more modern,
printf("now tv_sec = %" PRId64 ", tv_nsec = %d\n",
(int64_t)now.tv_sec, (int)now.tv_nsec);
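The same idea can be wrapped in a small helper so that the casts live in one place. This is a sketch, not an existing API: format_timespec is a hypothetical name, and the "seconds.nanoseconds" layout assumes a non-negative tv_sec.

```c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write "<seconds>.<nanoseconds, 9 digits>" into buf.
 * Both fields are cast to types whose width matches the format string,
 * so the call is correct whatever the platform's time_t width is.
 * Returns whatever snprintf() returns. */
static int format_timespec(char *buf, size_t len, const struct timespec *ts)
{
    return snprintf(buf, len, "%" PRId64 ".%09ld",
                    (int64_t)ts->tv_sec, (long)ts->tv_nsec);
}
```

Called with a 32-byte buffer and a timespec of 1621927500 s and 5 ns, this should produce the string 1621927500.000000005.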
HTH (and I hope I didn't mess things up)
/Torsten
And then I was finally in a position to look for the "real" problem ;-)
Maybe I am wrong here too. I don't know your application well enough. I keep my fingers crossed for you.
Heinz
On 25. May 2021, at 03:14, Wlodek, Jakub via Tech-talk <tech-talk at aps.anl.gov> wrote:
Hi all,
we have recently observed a bizarre bug with one of our pizzabox encoders at NSLS2 that cropped up during the migration of its IOC to RHEL 8. As part of the migration, we modernized the EPICS modules that it depended on, including EPICS base. The IOC uses the pscdrv module as a base, with two additional *App directories. When first moving to RHEL 8, I had to make some minor changes, which can be found in the following two commit diffs:
• https://github.com/mdavidsaver/pscdrv/commit/7329aab921d6e5dad82e3c1e014f6298077478dd
• https://github.com/mdavidsaver/pscdrv/commit/b6dcdf7deecee95499dfb9f03786f5297e123307
These changes allowed the IOC to build on RHEL 8 without any problems. However, when running the IOC and collecting some data from the encoder, we encountered a strange issue. Essentially, we collect a count of nanoseconds and a count of seconds from the box. The nanoseconds count up to one second and then reset to 0, while the seconds counter gets a new reading every second. This data is then combined to give a straight line representing time to nanosecond accuracy. Unfortunately, the first batch of data written to the data file was corrupted, which then caused the nanosecond and second counters to fall out of sync, making the graph jagged. (See the linked images for a visualization. Note, in the one with the three-way split, how the initial data is corrupted compared with the normal data):
Pizzabox encoder graphs
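To make the combination step concrete, here is a minimal sketch of how such a pair of counters could be merged into one timestamp. The function name and the 32-bit field widths are assumptions for illustration, not taken from the pscdrv code.

```c
#include <stdint.h>

/* Hypothetical helper: merge a seconds count and a nanoseconds count
 * (0..999999999) into a single nanosecond timestamp. The 32-bit field
 * widths are an assumption, not the actual record layout. */
static int64_t combine_ns(uint32_t seconds, uint32_t nanoseconds)
{
    return (int64_t)seconds * INT64_C(1000000000) + (int64_t)nanoseconds;
}
```

If the two streams are in sync, the result increases monotonically across the rollover (10 s + 999999999 ns is followed by 11 s + 0 ns). If the seconds value lags behind, the rollover sample is combined with the old second and the timestamp jumps backwards by almost a full second, which would show up as a jagged line.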
We did not see any error or warning messages in the IOC shell. After some trial and error, we tried an older version of base (3.14.*) with modules from circa 2017, and this resolved the problem. Curiously, the initial corrupted data was always 1024 elements long, which is both the value of NELM in the waveform records corresponding to the PVs for this data and the hard-coded size of the buffer that is written out (using an fprintf call in an aSubRecord callback). Aside from that initial invalid data, the remaining data seems to be consistent and valid, though it cannot be used because it is out of sync.
Does anyone have an idea what could possibly cause this kind of behavior? I can send more data/source code snippets if that helps as well. It would be good to figure out a fix to make this IOC work with the version of EPICS base we have now standardized on.
Thanks!
Jakub Wlodek
- References:
- Issue with FPGA based encoder box when running with EPICS 7.0.5 Wlodek, Jakub via Tech-talk
- Re: Issue with FPGA based encoder box when running with EPICS 7.0.5 Heinz Junkes via Tech-talk