Subject: Re: pasynOctetSyncIO->read() weirdly slow performance
From: Marco Filho via Tech-talk <tech-talk at aps.anl.gov>
To: Mark Rivers <rivers at cars.uchicago.edu>, "tech-talk at aps.anl.gov" <tech-talk at aps.anl.gov>
Date: Fri, 1 Nov 2024 10:26:03 +0000
Good morning!

    std::string detectorIPcmd(detectorIP);
    detectorIPcmd.append(" TCP");
    drvAsynIPPortConfigure(detectorPortName, detectorIPcmd.c_str(), 0, 1, 0);

    /* Assign that port to the asynUser */
    pasynCommonSyncIO->connect(detectorPortName, 0, &pasynUserCommon, "");
    pasynOctetSyncIO->connect(detectorPortName, 0, &pasynUserOctet, "");

That was the cause of the segfault. After fixing this and putting the suggested code outside the loop (appendix A), I tested it again. No significant performance improvement was found, but the compiler might have optimized the previous code.

> Your result implies that the overhead of epicsEventTryWait is about 47 microseconds. That is a bit surprising, and is probably worth making a tiny test program to confirm.

A simple test program that exercises epicsEventTryWait (appendix B) yielded:

    No event finished in: 0.000 s
    with event finished in: 0.502 s

So for 10000 iterations it is about 5.02e-05 s, or roughly 50 microseconds, per function call.

################## Appendix A ##################

void ngem::acqTask(){
    /* EPICS/asyn stuff. Parameters, status, etc... */
    int acquire = 0, imageMode, arrayCallbacks;
    double timeout;
    int eomReason, status = 0;
    size_t nIn;

    /* Ngem data */
    GenericEvent genEvent;
    unsigned char tcp_frame[TCP_FRAME_SIZE];
    EventType eventType;

    struct octetPvt {
        asynCommon *pasynCommon;
        void *pcommonPvt;
        asynOctet *pasynOctet;
        void *octetPvt;
        asynDrvUser *pasynDrvUser;
        void *drvUserPvt;
    };
    octetPvt *pPvt = (octetPvt *) pasynUserOctet->userPvt;

    while (1) {
        if (!acquire) {
            asynPrint(this->pasynUserSelf, ASYN_TRACE_FLOW,
                      "%s:%s: waiting for acquire to start\n", driverName, __FUNCTION__);
            epicsEventWait(startEventId_);

            /* Set needed parameters for acquisition */
            acquire = 1;
            this->lock();
            setIntegerParam(ADAcquire, acquire);
            setStringParam(ADStatusMessage, "Acquiring data");
            setIntegerParam(ADStatus, ADStatusAcquire);
            setIntegerParam(ADNumImagesCounter, 0);
            callParamCallbacks();
            this->unlock();

            /* Get needed parameters for acquisition */
            getIntegerParam(ADImageMode, &imageMode);
            getDoubleParam(readTimeout, &timeout);
            getIntegerParam(NDArrayCallbacks, &arrayCallbacks);

            /* Connect to ngem to receive images */
            pasynUserOctet->timeout = timeout;
            pasynCommonSyncIO->connectDevice(pasynUserCommon);
        }

        /* Read NGEM data */
        //this->readData(eventType, arrayCallbacks, timeout);
        pasynManager->lockPort(pasynUserOctet);
        status = pPvt->pasynOctet->read(pPvt->octetPvt, pasynUserOctet,
                                        (char*)tcp_frame, (size_t)TCP_FRAME_SIZE,
                                        &nIn, &eomReason);
        pasynManager->unlockPort(pasynUserOctet);

        // if(epicsEventTryWait(stopEventId_) == epicsEventWaitOK) {
        //     pasynCommonSyncIO->disconnectDevice(pasynUserCommon);
        //     this->lock();
        //     if (imageMode == ADImageContinuous) {
        //         setIntegerParam(ADStatus, ADStatusIdle);
        //     } else {
        //         setIntegerParam(ADStatus, ADStatusAborted);
        //     }
        //     setIntegerParam(ADAcquire, 0);
        //     this->unlock();
        //     callParamCallbacks();
        //     memset(this->image, 0, sizeof(this->image));
        //     acquire = 0;
        // }
    }
}

################## Appendix B ##################

#include <ctime>
#include <chrono>
#include <epicsEvent.h>
#include <stdio.h>

int main(){
    epicsEventId happySignal = epicsEventCreate(epicsEventEmpty);
    int x = 0;

    /* Baseline: the same loop without the event call */
    auto no_event_start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000; i++){
        x = i;
    }
    auto no_event_end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> no_event_duration = no_event_end - no_event_start;
    printf("No event finished in: %.3f s\n", no_event_duration.count());

    /* Same loop, calling epicsEventTryWait() on every iteration */
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000; i++){
        x = i;
        epicsEventTryWait(happySignal);
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    printf("with event finished in: %.3f s\n", duration.count());
    return 0;
}
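One caveat on the appendix B numbers: with optimization enabled, a compiler may remove the empty baseline loop entirely, which would make the baseline look artificially cheap. A minimal variation that keeps the loop body alive, assuming the same 10000 iterations (the volatile sink variable is illustrative and not in the original program); the same trick applies to the baseline loop so the two measurements stay comparable:

    #include <chrono>
    #include <epicsEvent.h>
    #include <stdio.h>

    int main(){
        epicsEventId happySignal = epicsEventCreate(epicsEventEmpty);
        volatile int sink = 0;  /* volatile store cannot be optimized away */

        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 10000; i++){
            sink = i;
            epicsEventTryWait(happySignal);
        }
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> duration = end - start;
        printf("with event finished in: %.3f s (%.1f us per call)\n",
               duration.count(), duration.count() * 1e6 / 10000);
        return 0;
    }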
From: Mark Rivers <rivers at cars.uchicago.edu>
Sent: 31 October 2024 19:22:56
To: Marco Filho; tech-talk at aps.anl.gov
Subject: RE: pasynOctetSyncIO->read() weirdly slow performance

Hi Marco,

A few comments:

- Rather than “No explicit thread locking” I would say “using pasynOctetSyncIO”.
- Rather than “explicit thread locking” I would say “using pasynOctet->read(pPvt->octetPvt, ...) with pasynManager->lockPort()”.
- In appendix E these lines should be before the while(1) loop for efficiency (see the sketch after this message):

      /* Connect to ngem to receive images */

- Your result implies that the overhead of epicsEventTryWait is about 47 microseconds. That is a bit surprising, and is probably worth making a tiny test program to confirm.

Mark
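The lines referred to are presumably the timeout/connect setup shown in appendix A above; a rough, untested sketch of the structure being suggested, with the one-time setup hoisted out of the acquisition loop (variable names taken from appendix A, everything else illustrative):

    /* One-time setup, before entering the acquisition loop */
    pasynUserOctet->timeout = timeout;
    pasynCommonSyncIO->connectDevice(pasynUserCommon);
    octetPvt *pPvt = (octetPvt *) pasynUserOctet->userPvt;

    while (1) {
        /* Only the per-frame work stays inside the loop */
        pasynManager->lockPort(pasynUserOctet);
        status = pPvt->pasynOctet->read(pPvt->octetPvt, pasynUserOctet,
                                        (char*)tcp_frame, (size_t)TCP_FRAME_SIZE,
                                        &nIn, &eomReason);
        pasynManager->unlockPort(pasynUserOctet);
    }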
From: Marco Filho <marco.filho at ess.eu>

Hi! I had some interesting results! It turns out that asyn's read function was not the main cause of the performance drag - although it does have some considerable overhead - but rather the epicsEventTryWait(stopEventId_) at the end of the loop.

From: Tech-talk <tech-talk-bounces at aps.anl.gov>
on behalf of Marco Filho via Tech-talk <tech-talk at aps.anl.gov>

Hi, Mark!

From: Mark Rivers <rivers at cars.uchicago.edu>

Hi Marco,

If I understand correctly, your acquisition thread is calling pasynOctetSyncIO->read() 2,232,143 times in 24 seconds. That is 10.1 microseconds per call. If that is correct, then I am not surprised. pasynOctetSyncIO->read calls pasynManager->queueLockPort() to gain exclusive access to the asyn port. That call involves a context switch to a waiting thread.

A little history here. Prior to asyn R4-14 in 2010 all the asynXXXSyncIO functions (including pasynOctetSyncIO->read) called pasynManager->lockPort() and pasynManager->unlockPort(). This is fast, but it causes problems, as explained in the R4-14 release notes, which say:

The problem with the previous implementation was that pasynManager->lockPort() immediately took the port mutex when it was available, rather than queueing a request to take the mutex. This could lead to one thread effectively getting exclusive access to the port, even if other threads had queued requests or tried to do SyncIO calls themselves. For example, if a device could send unsolicited input, then one might create a thread that simply called pasynOctetSyncIO->read() with a short timeout in a tight loop. The problem with this was that as soon as that thread released the port mutex when the read timed out, it would take the mutex again, blocking other threads that were trying to access the port. Previously the only solution to this problem was to add a short epicsThreadSleep() in the read loop.

The solution was to create a queueLockPort() function which queues a request to lock the port, allowing multiple threads to get access. The penalty is that it is less efficient.
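To make that scenario concrete, here is a minimal sketch (not taken from the release notes) of the kind of polling reader thread being described, with the pre-R4-14 workaround of sleeping briefly so other threads can take the port; the port name, buffer size, and timeouts are illustrative:

    #include <asynOctetSyncIO.h>
    #include <epicsThread.h>

    static void unsolicitedReaderTask(void *arg)
    {
        asynUser *pasynUser;
        char buf[256];
        size_t nIn;
        int eomReason;

        pasynOctetSyncIO->connect("PORT1", 0, &pasynUser, NULL);
        while (1) {
            /* Short timeout: returns quickly when no unsolicited input arrives */
            pasynOctetSyncIO->read(pasynUser, buf, sizeof(buf), 0.1, &nIn, &eomReason);
            /* Pre-R4-14 workaround: yield so other threads could grab the port mutex */
            epicsThreadSleep(0.01);
        }
    }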
In your case where performance matters you really should not be calling pasynOctetSyncIO->read(). I suggest you change your acquisition thread to directly call pasynManager->lockPort(), pasynOctet->read(), and pasynManager->unlockPort().

To call pasynOctet->read() you need access to pasynOctet and to the driver private handle. You can do that by directly calling the pasynManager functions to find the interface, or you can cheat and take advantage of the fact that the pasynUser you are using already has that information, which was stored in pasynUser->userPvt when pasynOctetSyncIO->connect() was called. It is cheating because the structure definition is not publicly exposed, but perhaps it should be. Here is some untested code:

    struct octetPvt {
        asynCommon *pasynCommon;
        void *pcommonPvt;
        asynOctet *pasynOctet;
        void *octetPvt;
        asynDrvUser *pasynDrvUser;
        void *drvUserPvt;
    };
    octetPvt *pPvt = (octetPvt *) pasynUserOctet->userPvt;

    pasynUserOctet->timeout = timeout;
    pasynManager->lockPort(pasynUserOctet);
    status = pPvt->pasynOctet->read(pPvt->octetPvt, pasynUserOctet,
                                    (char*)tcp_frame, (size_t)TCP_FRAME_SIZE,
                                    &nIn, &eomReason);
    pasynManager->unlockPort(pasynUserOctet);

Let me know if this works, and if it improves the performance.

Mark
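For completeness, the non-cheating route mentioned above (using the pasynManager functions to find the interface) might look roughly as follows; this is an untested sketch against the public asynManager/asynOctet API, and the variable names are illustrative:

    #include <asynDriver.h>
    #include <asynOctet.h>

    /* One-time lookup, e.g. in the driver constructor */
    asynUser *pasynUserOctet = pasynManager->createAsynUser(0, 0);
    asynStatus status = pasynManager->connectDevice(pasynUserOctet, detectorPortName, 0);
    asynInterface *octetInterface =
        pasynManager->findInterface(pasynUserOctet, asynOctetType, 1);
    if (!octetInterface) {
        /* the port does not implement asynOctet */
    }
    asynOctet *pasynOctet = (asynOctet *) octetInterface->pinterface;
    void *octetPvt = octetInterface->drvPvt;

    /* Per-read call, equivalent to the "cheat" version above */
    pasynUserOctet->timeout = timeout;
    pasynManager->lockPort(pasynUserOctet);
    status = pasynOctet->read(octetPvt, pasynUserOctet,
                              (char*)tcp_frame, (size_t)TCP_FRAME_SIZE,
                              &nIn, &eomReason);
    pasynManager->unlockPort(pasynUserOctet);

This avoids relying on the private octetPvt structure layout, at the cost of slightly more setup code.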
From: Tech-talk <tech-talk-bounces at aps.anl.gov> On Behalf Of Marco Filho via Tech-talk

Hi all, we are having a performance issue with asyn and I am not sure if I am misusing the tool...

    // /* Assign that port to the asynUser */
    epicsThreadOpts acqTaskThreadOpts;
    unsigned char tcp_frame[TCP_FRAME_SIZE];

I also tried improving the thread priority from medium to maximum, but to no avail.
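Since the original post mentions raising the acquisition thread priority through epicsThreadOpts, here is a minimal sketch of how that is typically done with epicsThreadCreateOpt; the thread name, stack size, and the acqTaskC entry point are illustrative, not from the original driver:

    #include <epicsThread.h>

    static void acqTaskC(void *drvPvt)
    {
        /* would normally call back into the driver's acqTask() */
    }

    void startAcqThread(void *drvPvt)
    {
        epicsThreadOpts acqTaskThreadOpts = EPICS_THREAD_OPTS_INIT;
        acqTaskThreadOpts.priority  = epicsThreadPriorityMax;  /* was epicsThreadPriorityMedium */
        acqTaskThreadOpts.stackSize = epicsThreadGetStackSize(epicsThreadStackMedium);
        acqTaskThreadOpts.joinable  = 0;
        epicsThreadCreateOpt("ngemAcqTask", acqTaskC, drvPvt, &acqTaskThreadOpts);
    }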