Experimental Physics and Industrial Control System
I would like to start a discussion about possible improvements to ioc core.
These changes should not be hard to implement, and will improve
behavior. I am certainly not as expert in EPICS internals
as some of you, so feel free to correct any mistakes in my description of
the problem and suggested improvements, but I'd invite you to discuss
the spirit of the change rather than the exact technical implementation
details (which can be left to the real gurus like Marty Kraimer). I
have had preliminary discussions with Marty, so I know I'm not completely
off base.
Problem Statement
=================
1) Communication between 2 ioc's is unreliable and non-deterministic.
There are 2 ways to transfer data between ioc's: push (ca_put style)
and pull (monitor style). Suppose the source signal transitions from
0 to 1 and back again. In the case of push, if the remote/sink ioc
is temporarily busy, the 2nd put may occur before the record is
processed, and so the 0->1 transition is never seen. In the case
of pull, if the source ioc is loaded, the monitor may be "dropped"
(i.e. the channel access buffer is full and the data is discarded).
Summary: for database to database communication, there is
NO RELIABLE, GUARANTEED DELIVERY of data.
(I know that there is a proposal to support process passive over
channel access, which also addresses this problem.)
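Both failure modes can be sketched in a few lines of Python. This is
only an illustration, not ioc or channel access code: SinkRecord,
MonitorQueue, and the "drop the new event when full" policy are
assumptions chosen to match the description above.

```python
from collections import deque


class SinkRecord:
    """Hypothetical sink record: a put overwrites the pending value."""

    def __init__(self):
        self.pending = 0
        self.seen = []          # values the record actually processed

    def put(self, value):
        self.pending = value    # no queue: a later put replaces an earlier one

    def process(self):
        self.seen.append(self.pending)


def push_demo():
    """Source goes 0 -> 1 -> 0 while the sink ioc is too busy to process."""
    sink = SinkRecord()
    sink.put(1)
    sink.put(0)                 # arrives before the record is processed
    sink.process()
    return sink.seen            # the 1 never shows up


class MonitorQueue:
    """Hypothetical bounded event buffer on the source ioc."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def post(self, value):
        if len(self.events) >= self.capacity:
            self.dropped += 1   # assumed policy: a full buffer discards the event
            return False
        self.events.append(value)
        return True


def pull_demo():
    """Buffer is already full of other updates when the signal pulses."""
    queue = MonitorQueue(capacity=2)
    queue.post("other-a")
    queue.post("other-b")
    queue.post(1)               # 0 -> 1, discarded
    queue.post(0)               # 1 -> 0, discarded
    return list(queue.events), queue.dropped
```

In both cases the sink side has no way to tell that a transition
happened at all.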
2) Channel access server consumes large amounts of memory.
The design of epics core says that the channel access client
connection tasks run at a priority lower than all of the scan tasks.
The rationale (as I understand it) is that epics is a real time
system, and keeping operator displays up to date is lower priority
than executing the control algorithm on the ioc.
[ This rationale fails on 2 points: (1) control algorithms may span
ioc's, and (2) if an operator can't reach the ioc, he must assume
that the system is no longer under control and must crash it. This
is the only reasonable operational approach. The impact of not
crashing our RF system is that we may trip the cold compressor and
eventually release tens of thousands of dollars of helium. ]
Because of its low priority, the channel access server reserves
considerable storage space to hold any outbound data occurring as
a result of database processing (at higher priorities). This
pre-allocation of memory is done to improve determinism in the
record processing. Nevertheless, under load the server discards
data (see above).
3) Insufficient flexibility in setting priorities in algorithm design
Database record processing is prioritized according to frequency:
10 Hz pre-empts 5 Hz, etc. In a system which may experience
temporary loading, either due to IOEVENT scanning or
non-EPICS task processing, less frequently processed records cease
to be processed at all (VxWorks scheduling is strictly priority based,
so a lower priority task gets no CPU while a higher one is ready).
So, if I have a record which absolutely must be processed at least
once every 10 seconds, and the bulk of my database is processed
once a second, then I must process the critical record at
2 Hz to guarantee it is not starved out.
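The starvation argument can be checked with a toy strict-priority
simulation. This is a sketch under made-up numbers: one tick stands
for 100 ms, the 1 Hz bulk scan is assumed to need a full second of
CPU while the load lasts, and simulate() is a hypothetical helper,
not part of EPICS or VxWorks.

```python
def simulate(tasks, ticks):
    """Run a strict-priority schedule for a number of ticks.

    tasks: list of (name, period_ticks, cost_ticks), highest priority first.
    Returns how many scans each task completed.
    """
    remaining = {name: 0 for name, _, _ in tasks}
    completed = {name: 0 for name, _, _ in tasks}
    for t in range(ticks):
        # Release a new scan at each period boundary; a scan that is
        # still running is not released again (the overrun is skipped).
        for name, period, cost in tasks:
            if t % period == 0 and remaining[name] == 0:
                remaining[name] = cost
        # Only the highest-priority task with work left gets this tick.
        for name, _, _ in tasks:
            if remaining[name] > 0:
                remaining[name] -= 1
                if remaining[name] == 0:
                    completed[name] += 1
                break
    return completed


# Critical record left on a slow scan: it sits below the bulk scan.
slow_scan = simulate([("1 Hz bulk", 10, 10), ("0.1 Hz critical", 100, 1)], 100)

# Critical record moved to 2 Hz: it now sits above the bulk scan.
fast_scan = simulate([("2 Hz critical", 5, 1), ("1 Hz bulk", 10, 10)], 100)
```

Over the 10 simulated seconds the critical record completes 0 scans
in the first case and 20 in the second, while the bulk scan drops
from 10 completed passes to 5.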
Proposed Solution
=================
1) Re-schedule database processing
Replace all periodic scan tasks with 3 new periodic scan tasks: HIGH,
MEDIUM, and LOW. Each of these tasks maintains multiple scan lists,
one per scan frequency. Scheduling is done at the granularity
of a single record. So, while processing the 5 Hz list, at
the end of each record processing step I check to see if it is
time to process the 10 Hz list. If so, I process that entire list
and then resume the 5 Hz list. (Generalization to multiple lists
is left as an exercise for the user).
The PRIO field in each record determines to which of the 3 sets of
scan lists the record is assigned.
Impact 1: high priority records are never starved for CPU cycles
(unless you say everything is high priority :)
Impact 2: remove 4 tasks, and support more periods without incurring
a task overhead -- in fact could dynamically configure periods
instead of statically configuring them.
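One way the per-record check might look, sketched in Python with a
simulated clock. ScanList, ScanTask, the unit cost per record, and
the choice to make every list due at time 0 are all assumptions for
illustration; one such task would exist for each of HIGH, MEDIUM,
and LOW.

```python
class ScanList:
    """Records that share one scan period (hypothetical structure)."""

    def __init__(self, period, records):
        self.period = period
        self.records = records
        self.next_due = 0       # assumption: every list is due at time 0


class ScanTask:
    """A single task serving several scan lists, fastest list first."""

    def __init__(self, lists, cost=1):
        self.lists = sorted(lists, key=lambda scan: scan.period)
        self.now = 0            # simulated clock, in arbitrary ticks
        self.cost = cost        # assumed fixed cost of processing one record
        self.log = []           # (time, record) pairs, in processing order

    def _process(self, record):
        self.log.append((self.now, record))
        self.now += self.cost

    def _run_list(self, index):
        scan = self.lists[index]
        scan.next_due += scan.period
        for record in scan.records:
            self._process(record)
            # After every record, let any faster list that has come due
            # run to completion before resuming this list.
            self._run_due(limit=index)

    def _run_due(self, limit):
        for i in range(limit):
            if self.lists[i].next_due <= self.now:
                self._run_list(i)

    def run_until(self, end):
        while self.now < end:
            for i, scan in enumerate(self.lists):
                if scan.next_due <= self.now:
                    self._run_list(i)
                    break
            else:
                # Nothing is due: idle until the next list comes due.
                self.now = min(scan.next_due for scan in self.lists)
```

With a 3-tick list holding "f1" and a 10-tick list holding "s1" to
"s4", the slow list is interrupted between records so that "f1"
still runs at ticks 0, 3, and 6. New periods only add a list, not
a task.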
2) Re-prioritize channel access handling
a) When an event is posted to a client (either as a result of a
monitor firing, or an OUTLINK pushing data), and the buffer is
full, then IMMEDIATELY start the network operation and allocate
a new buffer from a freelist. The old buffer is marked as in the
state SENDING. When the network operation completes, the buffer is
put back on the freelist. Under this solution, channel access
outbound network operations are processed in the context
of the scan task.
b) Replace all the channel access client tasks with a single task
to flush partially full buffers that are more than so many
milliseconds old (programmable, of course). If channel access
priority is implemented, replace this with 3 tasks (high, medium,
and low).
Impact 1: some determinism in record processing is lost. This can
be minimized by using zero-copy options in VxWorks tcp/ip. The
amount of lost determinism is small (after all, all scan tasks
can already be interrupted by higher priority scan tasks, so we
don't truly have a deterministic system except for IOEVENT and/or
EVENT scanning, or whichever task you set to the highest priority).
Impact 2: memory usage is greatly reduced, because only 1 buffer
per client is needed, plus some number on the freelist.
Impact 3: monitors are NEVER dropped, which means a pull architecture
becomes one with reliable data delivery.
Impact 4: (almost) all channel access client tasks are deleted,
again resulting in huge memory savings, and significantly lowering
the cost of supporting many clients.
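The buffer life cycle in 2a and the age-based flush in 2b could be
sketched as follows. This is Python with a fake network standing in
for the real send path; Buffer, Client, FakeNetwork, and all method
names are hypothetical, and the sketch does not say what should
happen if the freelist runs dry.

```python
FREE, FILLING, SENDING = "FREE", "FILLING", "SENDING"


class Buffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []
        self.state = FREE
        self.opened_at = None   # time of the first event in this buffer


class FakeNetwork:
    """Stand-in for the network layer; completion is triggered by hand."""

    def __init__(self, freelist):
        self.freelist = freelist
        self.in_flight = []
        self.delivered = []

    def start_send(self, buf):
        self.in_flight.append(buf)

    def complete_one(self):
        buf = self.in_flight.pop(0)
        self.delivered.extend(buf.events)
        buf.events, buf.state, buf.opened_at = [], FREE, None
        self.freelist.append(buf)   # buffer goes back on the freelist


class Client:
    """One buffer being filled per client, others drawn from a shared freelist."""

    def __init__(self, freelist, network):
        self.freelist = freelist
        self.network = network
        self.current = self._take()

    def _take(self):
        buf = self.freelist.pop()   # sketch only: an empty freelist is not handled
        buf.state = FILLING
        return buf

    def _send_current(self):
        self.current.state = SENDING
        self.network.start_send(self.current)
        self.current = self._take()

    def post(self, event, now):
        """Called in the context of the scan task that produced the event."""
        if len(self.current.events) >= self.current.capacity:
            self._send_current()    # full: start the send immediately
        if not self.current.events:
            self.current.opened_at = now
        self.current.events.append(event)

    def flush_if_older_than(self, max_age, now):
        """Called by the single flush task for partially full buffers."""
        if self.current.events and now - self.current.opened_at >= max_age:
            self._send_current()
```

Nothing in post() discards an event: a full buffer is handed to the
network and a fresh one takes its place, which is what makes the
number of buffers on the freelist the quantity to size.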
Regards,
Chip
-----------------------------------------------------------------------------
Chip Watson
Internet: [email protected] Thomas Jefferson National Accelerator Facility *
Tel: (757) 269-7101 12000 Jefferson Avenue, MS 12A2
FAX: (757) 269-5024 Newport News, VA 23606
WWW: http://www.jlab.org/~watson/
* (formerly CEBAF, the Continuous Electron Beam Accelerator Facility)