Hi, Torsten! I did some tests:
First, I just added epicsThreadSleep 4 to check the concurrency situation. No *10 factor was observed again. I went down with sleep values as low as 0.1 and still observed no *10 factor even reinitializing the IOC several times. Removing epicsThreadSleep completely
made the *10 factor come back again, so i think we have the right hypotheses.
I then used asynReport right before iocInit and when epicsThreadSleep was present, I found this message in the report:
Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value=4.91725e+06, status=0
while when I removed epicsThreadSleep and got the *10 factor, the message was in this form in the report:
Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value=4.91725e+07, status=0
Bingo.
Now, the results for asynReport right before and right after pmacSetCoordStepsPerUnit:
before: "Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value=4.91725e+07, status=0"
after: sometimes i got "Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value=4.91725e+06, status=0" and sometimes i got:
"Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value=4.91725e+07, status=0"
But even when i got e+07 i didnt observe the *10 factor in the PV anymore. So im guessing, as you suggested, that asynReport itself is taking enough time that when the second report is finished and the thread proceeds to print my dbgf's that are in the end
of the .cmd, the driver has already finished initializing and there is no *10 anymore.
So i think that settles it? As important as understanding the causes of the problem itself, I think the debug tool we were looking for was also found.
Thanks and best regards,
Marco 🙂
From: Torsten Bögershausen <torsten.bogershausen at ess.eu>
Sent: 01 October 2022 03:31
To: Marco A. Barra Montevechi Filho <marco.filho at lnls.br>
Cc: tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>; Augusto Yoshio Horita <augusto.horita at lnls.br>; SOL <sol at lnls.br>
Subject: Re: Suspicious behavior of Pmac 2-5-16 IOC. .RBV and .VAL showing wrong values.
Hej again,
while thinking about it:
It could be that we have another race condition here:
The call to
pmacSetCoordStepsPerUnit("$(PPMAC_CS1_PORT)", 1, 1000)
(Sometimes with problems)
vs
pmacSetCoordStepsPerUnit("$(PPMAC_CS1_PORT)", 1, 10000)
(Solved the problem)
In this case the call
pmacSetCoordStepsPerUnit()
needs to make it's way into the PMAC, and back into the IOC,
where the poller is picking it up.
Then it could be interesting to see what asynReport says before and
after pmacSetCoordStepsPerUnit()
In this case I would not expect undedined here:
Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value is undefined
Parameter 9 type=asynFloat64, name=MOTOR_ENCODER_POSITION, value is
... but rather a value which is off by factor 10, as you reported.
The good news:
The epicsThreadSleep() just before the iocInit() should help
here as well.
On 2022-09-30 17:35, Marco A. Barra Montevechi Filho wrote:
> Thanks, we will test with epicsThreadSleep and asynReport, then.
>
> It makes sense that its a concurrency situation since its the first time
> that we observe the situation in this kind of IOC and this IOC's .cmd
> file in particular seems to execute much less functions between the call
> to pmacSetCoordStepsPerUnit, dbLoadRecords and iocInit than other cmd's
> that use this same Pmac version we have here. So maybe this concurrency
> was being hidden until now.
>
> Will report test results in this same thread later.
>
> Cheers,
>
> Marco
> ------------------------------------------------------------------------
> *From:* Torsten Bögershausen <torsten.bogershausen at ess.eu>
> *Sent:* 30 September 2022 10:10
> *To:* Marco A. Barra Montevechi Filho <marco.filho at lnls.br>
> *Cc:* tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>; Augusto Yoshio
> Horita <augusto.horita at lnls.br>; SOL <sol at lnls.br>
> *Subject:* Re: Suspicious behavior of Pmac 2-5-16 IOC. .RBV and .VAL
> showing wrong values.
> Hej Marco,
>
> This is a tricky thing.
> I have heard about those “weird values” at startup before,
> and we had them on a PMAC once.
> After a lot of digging, I found that there is a possible race condition.
> And did changes to overcome them, mainly in motorRecord.cc
>
> If my theory is right, it may help, if you put a
> epicsThreadSleep 4
> just before iocInit, and then the driver has more time to talk to the
> controller and read the motor position, before the record needs it.
>
> It could even be the case, the the asynReport delays execution
> similar to the epicsThreadSleep, so that the problem is worked around.
>
> Or it could be a complete different problem.
> Without having a hardware to play with, it is hard to tell,
> and easy to speculate.
>
> The “sometimes” in your report indicates that there is something
> shaky, so lets hope that we can figure out, what it is.
> BR
> /Torsten
>
>
>
>
>> On 30 Sep 2022, at 14:43, Marco A. Barra Montevechi Filho <marco.filho at lnls.br> wrote:
>>
>> Hy, Torsten! Thanks for the answer!
>>
>> Although I have time to try this, the beamline that uses these particular motors does not, at least not today. We have never observed this behavior in any motor before, so i believe it would be unproductive to test this elsewhere.
>> I believe next week i might have some free beamline time to test this. Let me see if i understood what you meant:
>>
>> If the race condition messes with us and causes this weird behavior, then putting asynReport 10 <PORTNAME> will result in messages such as
>> "MOTOR_POSITION, value is undefined" when i get wrong values (e.g.: multiplied by 10, in my case), and will not result in these messages when i get my values right.
>>
>> Is that it? If yes, i can make this test next week and report here.
>>
>> Thanks,
>>
>> Marco
>> From: Torsten Bögershausen <torsten.bogershausen at ess.eu>
>> Sent: 30 September 2022 02:56
>> To: Marco A. Barra Montevechi Filho <marco.filho at lnls.br>; tech-talk at aps.anl.gov <tech-talk at aps.anl.gov>
>> Cc: Augusto Yoshio Horita <augusto.horita at lnls.br>
>> Subject: Re: Suspicious behavior of Pmac 2-5-16 IOC. .RBV and .VAL showing wrong values.
>>
>> Hej Marco,
>>
>> There is a possible race condition, when the IOC starts:
>> The record tries to read the position of the motor
>> from the driver, but the driver has not yet been able
>> to fully talk to the controller.
>>
>> If I run
>> asynReport 10 MCU1
>>
>> just before iocInit in st.cmd, I see things like this:
>>
>> Parameter 8 type=asynFloat64, name=MOTOR_POSITION, value is undefined
>> Parameter 9 type=asynFloat64, name=MOTOR_ENCODER_POSITION, value is
>> undefined
>> (and may be have a look at
>> MOTOR_REC_RESOLUTION as well)
>>
>> (You asynReport needs probably another name)
>>
>> Before we dig into a possible solution:
>> Do you have time to try this ?
>>
>> BR
>> /Torsten
>>
>>
>>
>>
>> On 2022-09-29 19:43, Marco A. Barra Montevechi Filho via Tech-talk wrote:
>> > Hello all. We were having problems with Pmac 2-5-16 here at LNLS. The
>> > problem was solved by Augusto Horita (in copy), but the behavior was
>> > weird and we though maybe someone here knows why it is like that or
>> > maybe reporting it in tech-talk would be useful for other people.
>> >
>> > The IOC was compiled with EPICS base 3.15.6.
>> >
>> > The particular instance that we were having trouble with had a set of
>> > axis in the PVs such as:
>> >
>> > BL:H:PB02:CS1:m1
>> >
>> > the axis was configured with
>> >
>> > pmacSetCoordStepsPerUnit("$(PPMAC_CS1_PORT)", 1, 1000)
>> >
>> > and its MRES was set to 10E-3.
>> >
>> > When initialized, the IOC sometimes showed the right values for .RBV and
>> > .VAL fields, but sometimes showed values 10 times bigger than what
>> > should be. Sometimes the *10 factor was only in .RBV, sometimes only in
>> > .VAL and sometimes in both.
>> >
>> > Augusto suspected this was due to failure in pmacSetCoordStepsPerUnit
>> > since the default scale factor is 10000 (ten times bigger than what we
>> > were using). So we suspected that the function was failing for whatever
>> > reason and defaulting the scale factor to 10000 while MRES remained in
>> > 10E-3. Indeed, changing MRES to 10E-4 and
>> >
>> > pmacSetCoordStepsPerUnit("$(PPMAC_CS1_PORT)", 1, 1000)
>> > to
>> > pmacSetCoordStepsPerUnit("$(PPMAC_CS1_PORT)", 1, 10000)
>> >
>> > Solved the problem.
>> >
>> > In the middle of the debug process, we struggled to find any error
>> > message about the suspected failure in pmacSetCoordStepsPerUnit.
>> > Is this absence of error message intentional? Is there a particular
>> > debug method we should have used?
>> > Has this been reported before?
>> >
>> > Thanks for the attention. Best regards,
>> >
>> > Marco
>> >
Aviso Legal: Esta mensagem e seus anexos podem conter informações confidenciais e/ou de uso restrito. Observe atentamente seu conteúdo e considere eventual consulta
ao remetente antes de copiá-la, divulgá-la ou distribuí-la. Se você recebeu esta mensagem por engano, por favor avise o remetente e apague-a imediatamente.
Disclaimer: This email and its attachments may contain confidential and/or privileged information. Observe its content carefully and consider possible querying
to the sender before copying, disclosing or distributing it. If you have received this email by mistake, please notify the sender and delete it immediately.
|