Hi folks,
We have a relatively new EPICS 7 + PVAccess application on Linux (details below).
I use MCoreUtils to manage 16 cores. I pin threads to CPUs 3-15 and leave CPUs 0-2 to the Linux kernel and unspecified threads for housekeeping.
I may have a flaky RJ-45 connection. At random times throughout the day yesterday I saw this in the logs:
The symptom on my IOC was an uncontrolled proliferation of TCP-tx and TCP-rx threads. After just a few hours of IOC uptime I counted 4800+ threads. These rogue TCP-* threads ignore the rules in /etc/rtrules (attached below) and show up like this:
TCP-rx 0x7f6df0f8d9d0 1795781 20 0 OK ? 0-2
TCP-tx 0x7f6df0fa8460 1795789 20 0 OK ? 0-2
Notice the "?" in the Policy column.
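For anyone who wants to reproduce the thread count outside the IOC shell, here is a minimal Python sketch. It assumes a Linux /proc filesystem and that the thread names MCoreUtils prints also appear in /proc/<pid>/task/<tid>/comm (the kernel truncates those names to 15 characters). The function name count_threads_by_name is hypothetical and not part of EPICS or MCoreUtils.

```python
import collections
import os


def count_threads_by_name(pid, proc_root="/proc"):
    """Return a Counter mapping thread name -> number of threads in one process.

    Assumption: thread names are read from <proc_root>/<pid>/task/<tid>/comm,
    which is how Linux exposes per-thread names.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    counts = collections.Counter()
    for tid in os.listdir(task_dir):
        comm_path = os.path.join(task_dir, tid, "comm")
        try:
            with open(comm_path) as f:
                name = f.read().rstrip("\n")
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
        counts[name] += 1
    return counts
```

Calling count_threads_by_name(pid) with the IOC's process ID and looking at the entries for TCP-rx and TCP-tx should show whether those two names account for the growth.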
When the issue shows up, PVAccess updates slow way down but don't stop completely. From what I can see, the two healthy TCP-rx and TCP-tx threads are running properly, set to FIFO and assigned to cores 7 and 6 respectively:
TCP-tx 0x7f79e40220f0 2132240 20 20 OK FIFO 6
TCP-rx 0x7f79e4021cc0 2132239 20 20 OK FIFO 7
The full mcoreThreadShowAll output is below. Anything we should pay attention to?
Thanks!
-Matt
4.18.0-513.18.1.rt7.320.el8_9.x86_64 #1 SMP PREEMPT_RT
[matt.rippa@hbfm2ts-lp2 epics-base]$ git submodule
+dead44c3cbe70b134766af13a56b571dbb40a278 .ci (v3.4.1-3-gdead44c)
7a2d264f2cb107bfd10adb23bc2b73d8323a79e4 modules/normativeTypes (6.0.1-3-g7a2d264)
+f1268adb8ecbacbd74bb66c172d02d9d427bedfd modules/pvAccess (7.1.7-2-gf1268adb)
+144f0228cc412d2dc1eaad7e09e310697d18532f modules/pvData (8.0.6-2-g144f022)
+f207e512d67addab79e33a00b712e3444228ba7c modules/pvDatabase (4.7.1-4-gf207e51)
949b3f63c2387bb92c1c22ca2f80f8d320805117 modules/pva2pva (1.4.1-1-g949b3f6)
8ed07fef96e41d35d47ab61276e29eb1a81e7fec modules/pvaClient (4.8.0-1-g8ed07fe)
[matt.rippa@hbfm2ts-lp2 epics-base]$ git lg -5
* 5dfc6caf3 - (1 year, 1 month ago) Accept should return SOCKET rather than int - Freddie Akeroyd (HEAD -> upstream-7, upstream/7.0)
* cb49bd013 - (12 months ago) Update ci-scripts to 3.4.1 - Ralph Lange
* 4720b61c1 - (1 year, 1 month ago) Move call to setThreadName() - Freddie Akeroyd
* 4383cf291 - (1 year ago) allow macros with defaults in dbLoadRecords without substitutions - Dirk Zimoch
* a6977ae73 - (1 year ago) Fix issue where VSCode makefile extension can delete files - Simon Rose
#!../../bin/linux-x86_64/m2ts
< envPaths
epicsEnvSet("IOC","iocm2ts")
epicsEnvSet("TOP","/home/matt.rippa/work/m2-cem-project/m2ts2")
epicsEnvSet("EPICS_BASE","/home/matt.rippa/work/vendor/epics-base")
epicsEnvSet("MCUtils","/home/matt.rippa/work/m2-cem-project/m2ts2/MCoreUtils")
epicsEnvSet("GCBRec","/home/matt.rippa/work/m2-cem-project/m2ts2/gcbCommandRecord")
cd "/home/matt.rippa/work/m2-cem-project/m2ts2"
epicsEnvSet("IOCSH_PS1","m2ts2> ")
epicsEnvSet("EPICS_PVA_ADDR_LIST","10.1.2.173")
epicsEnvSet("EPICS_PVA_SERVER_PORT","41366")
## Register all support components
dbLoadDatabase "dbd/m2ts.dbd"
m2ts_registerRecordDeviceDriver pdbbase
MCoreUtils version 1.2.3-SNAPSHOT
MCoreUtils: Read 26 thread rule(s) from /etc/rtrules
MCoreUtils: Read 0 thread rule(s) from /root/.rtrules
## Load record instances
#dbLoadTemplate "db/user.substitutions"
#dbLoadRecords "db/m2tsVersion.db", "user=mrippa"
dbLoadRecords "db/AP323.db", "m2top="
#var mySubDebug 1
traceIocInit
iocInit will be traced
M2TSStartup
cd "/home/matt.rippa/work/m2-cem-project/m2ts2/iocBoot/iocm2ts"
iocInit
iocInit: Reached initHookAtIocBuild
Starting iocInit
iocInit: Reached initHookAtBeginning
############################################################################
## EPICS R7.0.8.1-DEV
## Rev. R7.0.8-16-g5dfc6caf3c898b213c84-dirty
## Rev. Date Git: 2024-03-06 09:48:26 -0600
############################################################################
iocInit: Reached initHookAfterCallbackInit
iocInit: Reached initHookAfterCaLinkInit
iocInit: Reached initHookAfterInitDrvSup
iocInit: Reached initHookAfterInitRecSup
iocInit: Reached initHookAfterInitDevSup
iocInit: Reached initHookAfterInitDatabase
iocInit: Reached initHookAfterFinishDevSup
initPeriodic: Scan rate '.015 second' is not achievable.
iocInit: Reached initHookAfterScanInit
iocInit: Reached initHookAfterInitialProcess
iocInit: Reached initHookAfterCaServerInit
iocInit: Reached initHookAfterIocBuilt
iocInit: Reached initHookAtIocRun
iocInit: Reached initHookAfterDatabaseRunning
iocInit: Reached initHookAfterInterruptAccept
iocInit: Reached initHookAfterCaServerRunning
iocInit: Reached initHookAtEnd
iocRun: All initialization complete
iocInit: Reached initHookAfterIocRunning
m2ts2> mcoreThreadShowAll
NAME EPICS ID LWP ID OSIPRI OSSPRI STATE POLICY CPUSET
_main_ 0x75b1d0 2989179 0 0 OK ? ?
errlog 0x775510 2989181 10 10 OK FIFO 0-2
TMpage1Writer 0x836ae0 2989182 87 86 OK FIFO 5
TMwaveformWriter 0x831040 2989183 86 85 OK FIFO 5
TMTemps 0x8311e0 2989184 81 80 OK FIFO 5
TMTempEnable 0x831430 2989185 70 69 OK FIFO 5
AP323_ISR_1 0x8316f0 2989186 91 90 OK FIFO 11
AP323_ISR_2 0x831950 2989187 91 90 OK FIFO 12
AP323_ISR_0 0x831c00 2989188 91 90 OK FIFO 10
M2MirrorControlT 0x831f80 2989189 90 89 OK FIFO 13
HSDataThread 0x832460 2989190 70 69 OK FIFO 5
M2VibrationContr 0x832790 2989191 90 89 OK FIFO 14
SafetyShutdown 0x832c80 2989192 90 89 OK FIFO 8
StatusManager 0x832f50 2989193 89 88 OK FIFO 5
m2Init 0x833290 2989194 80 79 OK FIFO 5
taskwd 0x8488b0 2989195 10 10 OK FIFO 0-2
timerQueue 0x833df0 2989196 70 69 OK FIFO 0-2
cbLow 0x8440a0 2989197 59 58 OK FIFO 0-2
cbMedium 0x8336f0 2989198 64 63 OK FIFO 0-2
cbHigh 0x844fc0 2989199 71 70 OK FIFO 0-2
dbCaLink 0x8484d0 2989200 50 50 OK FIFO 0-2
PVAL 0x848580 2989201 50 50 OK FIFO 0-2
PDB-event 0x834f60 2989202 19 19 OK FIFO 0-2
pvAccess-client 0x835ac0 2989203 35 35 OK FIFO 3
UDP-rx 0.0.0.0:0 0x8432d0 2989204 50 50 OK FIFO 6
UDP-rx 10.26.70. 0x843de0 2989205 50 50 OK FIFO 6
UDP-rx 10.26.70. 0x83f2b0 2989206 50 50 OK FIFO 6
UDP-rx 224.0.0.1 0x83fa20 2989207 50 50 OK FIFO 6
scanOnce 0x8407b0 2989208 68 67 OK FIFO 0-2
scan-10 0x840af0 2989209 65 64 OK FIFO 5
scan-5 0x840d40 2989210 66 65 OK FIFO 5
scan-2 0x840f90 2989211 67 66 OK FIFO 5
scan-1 0x8411e0 2989212 68 67 OK FIFO 5
scan-0.5 0x841430 2989213 69 68 OK FIFO 5
scan-0.2 0x841680 2989214 70 69 OK FIFO 5
scan-0.1 0x9c2500 2989215 71 70 OK FIFO 5
scan-0.015 0x9c2750 2989216 72 71 OK FIFO 5
CAS-TCP 0x9cb050 2989217 16 16 OK FIFO 7
CAS-UDP 0x9cb2a0 2989218 12 12 OK FIFO 7
CAS-beacon 0x9cb4f0 2989219 14 14 OK FIFO 7
ipToAsciiProxy 0x7fb8c400f7d0 2989220 10 10 OK FIFO 0-2
PVAS timers 0x9cbb20 2989221 25 25 OK FIFO 0-2
timerQueue 0x7fb8c400ff30 2989222 52 51 OK FIFO 0-2
TCP-acceptor 0x9cc6a0 2989223 50 50 OK FIFO 6
CAC-UDP 0x7fb8c40110b0 2989224 54 53 OK FIFO 6
UDP-rx 0.0.0.0:0 0x9ed1f0 2989225 50 50 OK FIFO 6
UDP-rx 10.26.70. 0xa2dca0 2989226 50 50 OK FIFO 6
UDP-rx 10.26.70. 0xa2e0b0 2989227 50 50 OK FIFO 6
UDP-rx 224.0.0.1 0xa4e7a0 2989228 50 50 OK FIFO 6
m2tsStateTelemTh 0xa82520 2989229 90 89 OK FIFO 3
gcbCommandM 0xa8a890 2989230 81 80 OK FIFO 9
fastguider 0xaa7d90 2989231 89 88 OK FIFO 13
GCBProcessor 0xaa7ff0 2989233 84 83 OK FIFO 3
TCP-tx 0x7fb8680220f0 2989269 20 20 OK FIFO 6
TCP-rx 0x7fb868021cc0 2989268 20 20 OK FIFO 7
UDP-rx 224.0.0.1 0x7fb848058a30 2990167 50 50 OK FIFO 6
UDP-rx 10.26.70. 0x7fb848027c50 2990166 50 50 OK FIFO 6
UDP-rx 10.26.70. 0x7fb848006a60 2990165 50 50 OK FIFO 6
UDP-rx 0.0.0.0:0 0x7fb848068700 2990164 50 50 OK FIFO 6
pvAccess-client 0x7fb8480062f0 2990163 35 35 OK FIFO 3
...
/etc/rtrules:
# Format of each line: name:policy:priority:affinity:pattern
#
# name distinguishing tag
# policy scheduling policy (first letter suffices, case independent, * = don't change)
# priority scheduling priority (OSI units, + or - defines a relative change, * = don't change)
# affinity CPU set (use , and - to specify ranges, * = don't change)
# pattern regular expression to match thread names against
#
# Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
errlog:f:10:4:errlog
m2Init:f:80:5:m2Init
HSDataT:f:70:5:HSDataThread
TMTempE:f:70:5:TMTempEnable
TMTemps:f:81:5:TMTemps
TMwfW:f:86:5:TMwaveformWriter
TMp1W:f:87:5:TMpage1Writer
StatMan:f:89:5:StatusManager
TCP-acc:f:*:6:TCP-acceptor.*
TCP-tx:f:*:6:TCP-tx.*
TCP-rx:f:*:7:TCP-rx.*
SSDTask:f:90:8:SafetyShutdown
gcbCommandM:f:81:9:gcbCommandM*
AP323_ISR_0:f:91:10:AP323_ISR_0
AP323_ISR_1:f:91:11:AP323_ISR_1
AP323_ISR_2:f:91:12:AP323_ISR_2
fastguider:f:89:13:fastguider
MCT1:f:90:13:M2MirrorControlT1
VCT1:f:90:14:M2VibrationControlT1
# The gcbCommandQueue is the GCBProcessor
gcbProc:f:84:3:GCBProcessor
# The Gui Status Record has is m2tsStateTelem
m2tsStateTelemTh:f:90:3:m2tsStateTelemTh
# pvaccess-client
pva-c:f:*:3:pvAccess-client
# set CAS threads to SCHED_RR on CPU 7
CAS-all:f:*:7:CAS-.*
# set CAC threads to SCHED_RR on CPU 6
CAC-all:f:*:6:CAC-.*
UDP-all:f:*:6:UDP-.*
# increase priority of all scan tasks by 5
scan:*:+5:5:scan-.*
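To check offline which rule a given thread name should hit, the five-field format described in the header comments can be parsed with a few lines of Python. This is only a sketch under stated assumptions. It uses Python's re module with an unanchored search, which may not match MCoreUtils' own regex dialect or anchoring, and it returns every matching rule in file order rather than guessing how MCoreUtils resolves multiple matches. Rule, parse_rules and matching_rules are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str
    policy: str
    priority: str
    affinity: str
    pattern: str


def parse_rules(text: str) -> List[Rule]:
    """Parse name:policy:priority:affinity:pattern lines, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at most four times so a ':' inside the pattern is preserved.
        fields = line.split(":", 4)
        if len(fields) != 5:
            continue  # malformed line; ignored in this sketch
        rules.append(Rule(*fields))
    return rules


def matching_rules(rules: List[Rule], thread_name: str) -> List[Rule]:
    """Return all rules whose pattern is found in thread_name.

    Assumption: unanchored search semantics (re.search).
    """
    return [r for r in rules if re.search(r.pattern, thread_name)]
```

Under these assumptions, the name "TCP-rx" matches the TCP-rx rule above. If that holds in MCoreUtils too, the rogue threads are probably not escaping the rules because of a pattern mismatch.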