Experimental Physics and
Hi folks,

We have a relatively new EPICS 7 + PVAccess application on Linux (details below). I use MCoreUtils to manage 16 cores: I pin CPUs 3-15 and leave CPUs 0-2 to the Linux kernel and unspecified threads for housekeeping.

I may have a flaky RJ-45 connection. Randomly throughout the day yesterday I saw this in the logs:

Feb 24 17:18:14 hbfm2ts-lp2.hi.gemini.edu kernel: i40e 0000:1a:00.0 ens82f0: NIC Link is Down
Feb 24 17:18:17 hbfm2ts-lp2.hi.gemini.edu NetworkManager[1036]: <info> [1740453497.6423] device (ens82f0): carrier: link connected
Feb 24 17:18:17 hbfm2ts-lp2.hi.gemini.edu kernel: i40e 0000:1a:00.0 ens82f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: TX

The symptom on my IOC was an uncontrolled proliferation of TCP-tx and TCP-rx threads. After just a few hours of IOC uptime I counted 4800+ threads. These rogue TCP- threads are able to ignore /etc/rtrules (attached below) and show up like this:

TCP-rx 0x7f6df0f8d9d0 1795781 20 0 OK ? 0-2
TCP-tx 0x7f6df0fa8460 1795789 20 0 OK ? 0-2

Notice the "?" in the Policy column. When the issue shows up, the PVAccess updates slow way down but don't completely stop.

The 2 healthy TCP-rx and TCP-tx threads are running properly from what I can see; they are set to FIFO and assigned to cores 7 and 6:

TCP-tx 0x7f79e40220f0 2132240 20 20 OK FIFO 6
TCP-rx 0x7f79e4021cc0 2132239 20 20 OK FIFO 7

The full mcoreThreadShowAll is below. Anything we should pay attention to?

Thanks!
-Matt

4.18.0-513.18.1.rt7.320.el8_9.x86_64 #1 SMP PREEMPT_RT

[matt.rippa@hbfm2ts-lp2 epics-base]$ git submodule
+dead44c3cbe70b134766af13a56b571dbb40a278 .ci (v3.4.1-3-gdead44c)
 7a2d264f2cb107bfd10adb23bc2b73d8323a79e4 modules/normativeTypes (6.0.1-3-g7a2d264)
+f1268adb8ecbacbd74bb66c172d02d9d427bedfd modules/pvAccess (7.1.7-2-gf1268adb)
+144f0228cc412d2dc1eaad7e09e310697d18532f modules/pvData (8.0.6-2-g144f022)
+f207e512d67addab79e33a00b712e3444228ba7c modules/pvDatabase (4.7.1-4-gf207e51)
 949b3f63c2387bb92c1c22ca2f80f8d320805117 modules/pva2pva (1.4.1-1-g949b3f6)
 8ed07fef96e41d35d47ab61276e29eb1a81e7fec modules/pvaClient (4.8.0-1-g8ed07fe)

[matt.rippa@hbfm2ts-lp2 epics-base]$ git lg -5
* 5dfc6caf3 - (1 year, 1 month ago) Accept should return SOCKET rather than int - Freddie Akeroyd (HEAD -> upstream-7, upstream/7.0)
* cb49bd013 - (12 months ago) Update ci-scripts to 3.4.1 - Ralph Lange
* 4720b61c1 - (1 year, 1 month ago) Move call to setThreadName() - Freddie Akeroyd
* 4383cf291 - (1 year ago) allow macros with defaults in dbLoadRecords without substitutions - Dirk Zimoch
* a6977ae73 - (1 year ago) Fix issue where VSCode makefile extension can delete files - Simon Rose

#!../../bin/linux-x86_64/m2ts
< envPaths
epicsEnvSet("IOC","iocm2ts")
epicsEnvSet("TOP","/home/matt.rippa/work/m2-cem-project/m2ts2")
epicsEnvSet("EPICS_BASE","/home/matt.rippa/work/vendor/epics-base")
epicsEnvSet("MCUtils","/home/matt.rippa/work/m2-cem-project/m2ts2/MCoreUtils")
epicsEnvSet("GCBRec","/home/matt.rippa/work/m2-cem-project/m2ts2/gcbCommandRecord")
cd "/home/matt.rippa/work/m2-cem-project/m2ts2"
epicsEnvSet("IOCSH_PS1","m2ts2> ")
epicsEnvSet("EPICS_PVA_ADDR_LIST","10.1.2.173")
epicsEnvSet("EPICS_PVA_SERVER_PORT","41366")
## Register all support components
dbLoadDatabase "dbd/m2ts.dbd"
m2ts_registerRecordDeviceDriver pdbbase
MCoreUtils version 1.2.3-SNAPSHOT
MCoreUtils: Read 26 thread rule(s) from /etc/rtrules
MCoreUtils: Read 0 thread rule(s) from /root/.rtrules
## Load record instances
#dbLoadTemplate "db/user.substitutions"
#dbLoadRecords "db/m2tsVersion.db", "user=mrippa"
dbLoadRecords "db/AP323.db", "m2top="
#var mySubDebug 1
traceIocInit
iocInit will be traced
M2TSStartup
cd "/home/matt.rippa/work/m2-cem-project/m2ts2/iocBoot/iocm2ts"
iocInit
iocInit: Reached initHookAtIocBuild
Starting iocInit
iocInit: Reached initHookAtBeginning

#!../../bin/linux-x86_64/m2ts
< envPaths
epicsEnvSet("IOC","iocm2ts")
epicsEnvSet("TOP","/home/matt.rippa/work/m2-cem-project/m2ts2")
epicsEnvSet("EPICS_BASE","/home/matt.rippa/work/vendor/epics-base")
epicsEnvSet("MCUtils","/home/matt.rippa/work/m2-cem-project/m2ts2/MCoreUtils")
epicsEnvSet("GCBRec","/home/matt.rippa/work/m2-cem-project/m2ts2/gcbCommandRecord")
cd "/home/matt.rippa/work/m2-cem-project/m2ts2"
epicsEnvSet("IOCSH_PS1","m2ts2> ")
epicsEnvSet("IOCSH_PS1","m2ts2> ")
epicsEnvSet("EPICS_PVA_ADDR_LIST","10.1.2.173")
epicsEnvSet("EPICS_PVA_SERVER_PORT","41366")
## Register all support components
dbLoadDatabase "dbd/m2ts.dbd"
m2ts_registerRecordDeviceDriver pdbbase
MCoreUtils version 1.2.3-SNAPSHOT
MCoreUtils: Read 26 thread rule(s) from /etc/rtrules
MCoreUtils: Read 0 thread rule(s) from /root/.rtrules
## Load record instances
#dbLoadTemplate "db/user.substitutions"
#dbLoadRecords "db/m2tsVersion.db", "user=mrippa"
dbLoadRecords "db/AP323.db", "m2top="
#var mySubDebug 1
traceIocInit
iocInit will be traced
M2TSStartup
iocInit
iocInit: Reached initHookAtIocBuild
Starting iocInit
iocInit: Reached initHookAtBeginning
############################################################################
## EPICS R7.0.8.1-DEV
## Rev. R7.0.8-16-g5dfc6caf3c898b213c84-dirty
## Rev. Date Git: 2024-03-06 09:48:26 -0600
############################################################################
iocInit: Reached initHookAfterCallbackInit
iocInit: Reached initHookAfterCaLinkInit
iocInit: Reached initHookAfterInitDrvSup
iocInit: Reached initHookAfterInitRecSup
iocInit: Reached initHookAfterInitDevSup
iocInit: Reached initHookAfterInitDatabase
iocInit: Reached initHookAfterFinishDevSup
initPeriodic: Scan rate '.015 second' is not achievable.
iocInit: Reached initHookAfterScanInit
iocInit: Reached initHookAfterInitialProcess
iocInit: Reached initHookAfterCaServerInit
iocInit: Reached initHookAfterIocBuilt
iocInit: Reached initHookAtIocRun
iocInit: Reached initHookAfterDatabaseRunning
iocInit: Reached initHookAfterInterruptAccept
iocInit: Reached initHookAfterCaServerRunning
iocInit: Reached initHookAtEnd
iocRun: All initialization complete
iocInit: Reached initHookAfterIocRunning
m2ts2> mcoreThreadShowAll
NAME              EPICS ID        LWP ID   OSIPRI  OSSPRI  STATE  POLICY  CPUSET
_main_            0x75b1d0        2989179  0       0       OK     ?       ?
errlog            0x775510        2989181  10      10      OK     FIFO    0-2
TMpage1Writer     0x836ae0        2989182  87      86      OK     FIFO    5
TMwaveformWriter  0x831040        2989183  86      85      OK     FIFO    5
TMTemps           0x8311e0        2989184  81      80      OK     FIFO    5
TMTempEnable      0x831430        2989185  70      69      OK     FIFO    5
AP323_ISR_1       0x8316f0        2989186  91      90      OK     FIFO    11
AP323_ISR_2       0x831950        2989187  91      90      OK     FIFO    12
AP323_ISR_0       0x831c00        2989188  91      90      OK     FIFO    10
M2MirrorControlT  0x831f80        2989189  90      89      OK     FIFO    13
HSDataThread      0x832460        2989190  70      69      OK     FIFO    5
M2VibrationContr  0x832790        2989191  90      89      OK     FIFO    14
SafetyShutdown    0x832c80        2989192  90      89      OK     FIFO    8
StatusManager     0x832f50        2989193  89      88      OK     FIFO    5
m2Init            0x833290        2989194  80      79      OK     FIFO    5
taskwd            0x8488b0        2989195  10      10      OK     FIFO    0-2
timerQueue        0x833df0        2989196  70      69      OK     FIFO    0-2
cbLow             0x8440a0        2989197  59      58      OK     FIFO    0-2
cbMedium          0x8336f0        2989198  64      63      OK     FIFO    0-2
cbHigh            0x844fc0        2989199  71      70      OK     FIFO    0-2
dbCaLink          0x8484d0        2989200  50      50      OK     FIFO    0-2
PVAL              0x848580        2989201  50      50      OK     FIFO    0-2
PDB-event         0x834f60        2989202  19      19      OK     FIFO    0-2
pvAccess-client   0x835ac0        2989203  35      35      OK     FIFO    3
UDP-rx 0.0.0.0:0  0x8432d0        2989204  50      50      OK     FIFO    6
UDP-rx 10.26.70.  0x843de0        2989205  50      50      OK     FIFO    6
UDP-rx 10.26.70.  0x83f2b0        2989206  50      50      OK     FIFO    6
UDP-rx 224.0.0.1  0x83fa20        2989207  50      50      OK     FIFO    6
scanOnce          0x8407b0        2989208  68      67      OK     FIFO    0-2
scan-10           0x840af0        2989209  65      64      OK     FIFO    5
scan-5            0x840d40        2989210  66      65      OK     FIFO    5
scan-2            0x840f90        2989211  67      66      OK     FIFO    5
scan-1            0x8411e0        2989212  68      67      OK     FIFO    5
scan-0.5          0x841430        2989213  69      68      OK     FIFO    5
scan-0.2          0x841680        2989214  70      69      OK     FIFO    5
scan-0.1          0x9c2500        2989215  71      70      OK     FIFO    5
scan-0.015        0x9c2750        2989216  72      71      OK     FIFO    5
CAS-TCP           0x9cb050        2989217  16      16      OK     FIFO    7
CAS-UDP           0x9cb2a0        2989218  12      12      OK     FIFO    7
CAS-beacon        0x9cb4f0        2989219  14      14      OK     FIFO    7
ipToAsciiProxy    0x7fb8c400f7d0  2989220  10      10      OK     FIFO    0-2
PVAS timers       0x9cbb20        2989221  25      25      OK     FIFO    0-2
timerQueue        0x7fb8c400ff30  2989222  52      51      OK     FIFO    0-2
TCP-acceptor      0x9cc6a0        2989223  50      50      OK     FIFO    6
CAC-UDP           0x7fb8c40110b0  2989224  54      53      OK     FIFO    6
UDP-rx 0.0.0.0:0  0x9ed1f0        2989225  50      50      OK     FIFO    6
UDP-rx 10.26.70.  0xa2dca0        2989226  50      50      OK     FIFO    6
UDP-rx 10.26.70.  0xa2e0b0        2989227  50      50      OK     FIFO    6
UDP-rx 224.0.0.1  0xa4e7a0        2989228  50      50      OK     FIFO    6
m2tsStateTelemTh  0xa82520        2989229  90      89      OK     FIFO    3
gcbCommandM       0xa8a890        2989230  81      80      OK     FIFO    9
fastguider        0xaa7d90        2989231  89      88      OK     FIFO    13
GCBProcessor      0xaa7ff0        2989233  84      83      OK     FIFO    3
TCP-tx            0x7fb8680220f0  2989269  20      20      OK     FIFO    6
TCP-rx            0x7fb868021cc0  2989268  20      20      OK     FIFO    7
UDP-rx 224.0.0.1  0x7fb848058a30  2990167  50      50      OK     FIFO    6
UDP-rx 10.26.70.  0x7fb848027c50  2990166  50      50      OK     FIFO    6
UDP-rx 10.26.70.  0x7fb848006a60  2990165  50      50      OK     FIFO    6
UDP-rx 0.0.0.0:0  0x7fb848068700  2990164  50      50      OK     FIFO    6
pvAccess-client   0x7fb8480062f0  2990163  35      35      OK     FIFO    3
...
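
A thread count like the 4800+ figure mentioned in the message can also be cross-checked from outside the IOC shell. The following is only a minimal sketch, not part of the original setup: the function name `count_threads_by_name` and its `proc_root` parameter are hypothetical, and it assumes the IOC's thread names are visible in `/proc/<pid>/task/<tid>/comm` on Linux (where the kernel truncates names to 15 characters).

```python
from collections import Counter
from pathlib import Path


def count_threads_by_name(pid, proc_root="/proc"):
    """Return a Counter mapping thread name -> number of threads of one process.

    Assumption: thread names are exposed in <proc_root>/<pid>/task/<tid>/comm,
    as on Linux. `proc_root` exists only so the function can be pointed at a
    test directory; it is not part of any EPICS or MCoreUtils interface.
    """
    counts = Counter()
    task_dir = Path(proc_root) / str(pid) / "task"
    for comm in task_dir.glob("*/comm"):
        try:
            counts[comm.read_text().strip()] += 1
        except OSError:
            # The thread exited between listing the directory and reading it.
            continue
    return counts
```

Called periodically with the IOC's process ID, the counts for `TCP-tx` and `TCP-rx` would show whether the number of threads keeps growing after each link drop.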
/etc/rtrules:

# Format of each line: name:policy:priority:affinity:pattern
#
# name      distinguishing tag
# policy    scheduling policy (first letter suffices, case independent, * = don't change)
# priority  scheduling priority (OSI units, + or - defines a relative change, * = don't change)
# affinity  CPU set (use , and - to specify ranges, * = don't change)
# pattern   regular expression to match thread names against
#
# Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
errlog:f:10:4:errlog
m2Init:f:80:5:m2Init
HSDataT:f:70:5:HSDataThread
TMTempE:f:70:5:TMTempEnable
TMTemps:f:81:5:TMTemps
TMwfW:f:86:5:TMwaveformWriter
TMp1W:f:87:5:TMpage1Writer
StatMan:f:89:5:StatusManager
TCP-acc:f:*:6:TCP-acceptor.*
TCP-tx:f:*:6:TCP-tx.*
TCP-rx:f:*:7:TCP-rx.*
SSDTask:f:90:8:SafetyShutdown
gcbCommandM:f:81:9:gcbCommandM*
AP323_ISR_0:f:91:10:AP323_ISR_0
AP323_ISR_1:f:91:11:AP323_ISR_1
AP323_ISR_2:f:91:12:AP323_ISR_2
fastguider:f:89:13:fastguider
MCT1:f:90:13:M2MirrorControlT1
VCT1:f:90:14:M2VibrationControlT1
# The gcbCommandQueue is the GCBProcessor
gcbProc:f:84:3:GCBProcessor
# The Gui Status Record has is m2tsStateTelem
m2tsStateTelemTh:f:90:3:m2tsStateTelemTh
# pvaccess-client
pva-c:f:*:3:pvAccess-client
# set CAS threads to SCHED_RR on CPU 7
CAS-all:f:*:7:CAS-.*
# set CAC threads to SCHED_RR on CPU 6
CAC-all:f:*:6:CAC-.*
UDP-all:f:*:6:UDP-.*
# increase priority of all scan tasks by 5
scan:*:+5:5:scan-.*
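
The five-field rule format described in the header comments of the rules file can be illustrated with a short parser, which may help when checking which rules a given thread name would hit. This is a sketch only, not MCoreUtils' own implementation: `Rule`, `parse_rules` and `matching_rules` are hypothetical names, Python's `re` syntax is not identical to the regex library MCoreUtils uses, and unanchored (search) matching is an assumption.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str      # distinguishing tag
    policy: str    # scheduling policy, "*" = don't change
    priority: str  # absolute or relative (+/-) priority, "*" = don't change
    affinity: str  # CPU set, "*" = don't change
    pattern: str   # regular expression matched against thread names


def parse_rules(text: str) -> List[Rule]:
    """Parse rule lines of the form name:policy:priority:affinity:pattern."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # maxsplit=4 keeps any ':' inside the pattern field intact
        rules.append(Rule(*line.split(":", 4)))
    return rules


def matching_rules(rules: List[Rule], thread_name: str) -> List[Rule]:
    """Return every rule whose pattern is found in thread_name.

    Assumption: unanchored search; the real matcher may behave differently.
    """
    return [r for r in rules if re.search(r.pattern, thread_name)]
```

Under these assumptions, a thread named `TCP-tx` would match only the `TCP-tx` rule from the file above, so a thread with that name showing "?" for its policy was presumably never passed through the rule matching at all.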
ANJ, 26 Feb 2025 |