EPICS Data Access Class Library Tutorial


EPICS BASE Versions 3.13.7 and higher are distributed subject to a Software License Agreement found in the file LICENSE that is included with this distribution.

Scope

This document is a tutorial introduction to a generic C++ programming interface for introspecting proprietary data.

Who to Blame

Jeff Hill (LANL SNS Division) and Ralph Lange (BESSY) are responsible for the design of this interface and the contents of this document.

Introduction

Data Access is a generic interface for introspecting proprietary data with complex structure. A program may choose to export its proprietary data using this interface. Once it does, any program that knows the interface may examine and/or modify the data. The user is not required to store his data in a particular format, but the structure of the data can nevertheless be determined at compile time, so the data can be accessed efficiently. A support library is supplied for copying and comparing between properly interfaced property sets.

Why We Need This

Currently EPICS has a fixed set of meta-data, but this needs to expand because EPICS base developers can't anticipate all possible meta-data, and all possible meta-data permutations. Expansion of the toolset will hopefully accelerate if application developers can define new meta-data. Pivotal to a tool based approach is proper decoupling of tools from each other so that changes in one tool do not cause another tool to break. Data Access is about expanding the meta-data set while keeping the tools properly decoupled. If the meta-data set is expanded in a data source we must not require that all of the clients of that source be rewritten.

In multi-agent systems synchronization is a recurring theme. Currently, EPICS synchronizes a single parameter with a fixed set of meta-data. Data Access facilitates synchronization of an arbitrary, application-defined set of meta-data with a time stamp, an arbitrary (application-defined) event, a client’s read or write request, or a synchronized multi-channel read/write.

EPICS needs an application extensible event set. For example, a server might post an “arcDown” event with an application specific data capsule. A client might subscribe for event “arcDown” and specify a subset of meta-data to be acquired from the capsule posted with the event. It is essential in this scenario for the client and server data spaces to be decoupled. The Data Access support libraries efficiently copy between dissimilar, decoupled types.

Intelligent instruments are becoming the norm, and they require message passing. Devices communicate using arbitrary request / response capsules, and Data Access interfaces arbitrary data capsules.

Properties

The data access interface requires that all data be assigned a property name. A property name might be weight, units, maximum, or potentially any meta-data name and purpose that a group of programs mutually agree upon. A set of data with unique property names may be stored in a container, and this container is also assigned a property name.

Properties are organized in hierarchies. For example, the weight property might have subordinate high display limit and low display limit properties that belong to it. Likewise, the height property might need to have the same properties assigned to it. If the weight property and the height property both exist in the same container then the appropriate subordinate properties may need to have independent values depending on whether they are used with the weight property or the height property. Therefore, the weight and height properties both have optional subordinate properties. This results in a tree structured property hierarchy.

It is anticipated that the facilities in this library will not be generally useful unless users develop standards early on for the names and purposes of the properties shared by cooperating programs.

Interfaces

Programs using the data access interface can be roughly categorized by the role they play: property catalog, data viewer, or data manipulator.

C++ Name Space

All of the interfaces described in this document are in the C++ namespace da. The example code that follows assumes that a using namespace da directive has been seen by the C++ compiler.

using namespace da;

Primitive Data Types

Interfaced data may be stored in any of the C++ primitive types, or in specialized types for strings and time stamps. Additional specialized types may need to be supported in the future as convenience or efficiency dictate.

Property Identifiers

All interfaced data must be assigned a property name.

A property name can be converted to a property identifier as follows.

#include "daPropertyId.h"
static const propertyId propertyIdWeight ( "weight" );
static const propertyId propertyIdHeight ( "height" );
static const propertyId propertyIdValue ( "value" );
static const propertyId propertyIdHighLimit ( "high limit" );
static const propertyId propertyIdLowLimit ( "low limit" );

Built-in precompiled property identifiers will be supplied for some of the commonly used property names. They will be listed here once the standard property set has been defined.

Property Catalog

A property catalog derives from the interface class propertyCatalog. The property catalog provides virtual functions to be called from generic programs which don't have direct access to the data, but know how to introspect data using the propertyCatalog interface.

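#include "dataAccess.h"
class myGirth : public propertyCatalog {
public:
    // read-only traversal, defined below
    void traverse ( dataViewer & viewer ) const;
    // modifying traversal, defined below
    traverseModifyStatus traverse ( dataManipulator & manipulator );
private:
    double height;
    double weight;
};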

Property Catalog Traversal

Frequently it is necessary to traverse through all of the published properties. For example, a utility program could be used to archive many different types of data to and from disk storage — as long as each type of data provides an implementation of a generic property traversal interface. To this end the data interfacing class provides a traverse function that in turn calls a reveal function in the dataViewer interface for each of its participating properties. This reveal function is passed a propertyId and a C++ const reference to the property value. There are many overloaded reveal functions in the dataViewer allowing any of the C++ native storage types to be used.

void myGirth::traverse ( dataViewer & viewer ) const
{
    viewer.reveal ( propertyIdHeight, this->height );
    viewer.reveal ( propertyIdWeight, this->weight );
}
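To make the double dispatch concrete, here is a minimal, self-contained sketch. The classes in namespace sketch are simplified stand-ins written for this illustration, not the real declarations from dataAccess.h: property identifiers are plain strings, and the viewer has only double and int reveal overloads. textDumper is a hypothetical generic tool that formats any catalog without knowing its concrete type.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-ins for the da interfaces (assumption: the real classes
// in dataAccess.h and daPropertyId.h have many more reveal overloads).
namespace sketch {

class propertyId {
public:
    explicit propertyId ( const char * name ) : name_ ( name ) {}
    const std::string & name () const { return name_; }
private:
    std::string name_;
};

// A viewer receives each property as (identifier, const value).
class dataViewer {
public:
    virtual ~dataViewer () {}
    virtual void reveal ( const propertyId &, const double & ) = 0;
    virtual void reveal ( const propertyId &, const int & ) = 0;
};

class propertyCatalog {
public:
    virtual ~propertyCatalog () {}
    virtual void traverse ( dataViewer & ) const = 0;
};

static const propertyId propertyIdHeight ( "height" );
static const propertyId propertyIdWeight ( "weight" );

class myGirth : public propertyCatalog {
public:
    myGirth ( double h, double w ) : height ( h ), weight ( w ) {}
    void traverse ( dataViewer & viewer ) const override
    {
        // overload resolution picks the double reveal at compile time
        viewer.reveal ( propertyIdHeight, this->height );
        viewer.reveal ( propertyIdWeight, this->weight );
    }
private:
    double height;
    double weight;
};

// Hypothetical generic tool: writes "name=value" lines for any catalog.
class textDumper : public dataViewer {
public:
    void reveal ( const propertyId & id, const double & v ) override
    {
        out << id.name () << '=' << v << '\n';
    }
    void reveal ( const propertyId & id, const int & v ) override
    {
        out << id.name () << '=' << v << '\n';
    }
    std::ostringstream out;
};

} // namespace sketch
```

Calling traverse on a myGirth constructed with height 1.5 and weight 70 leaves "height=1.5" and "weight=70" on two lines in textDumper::out; the dumper never names myGirth.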

A similarly structured non-const traverse function must also be provided for traversing the data in situations where it might be modified. In this situation we might choose to provide access only to the subset of data members that are allowed to be modified. There are also many overloaded reveal functions in the dataManipulator interface. The reference to the property value passed to the reveal functions in the dataManipulator is modifiable (is not const).

traverseModifyStatus myGirth::traverse ( dataManipulator & manipulator )
{
    // only the weight may be modified through this interface
    manipulator.reveal ( propertyIdWeight, this->weight );
    return tmsSuccess;
}

When a data interfacing class must verify that a value modified by the reveal function is within application defined limits for a particular property it will need to pass a reference to a temporary variable, initialized to the current value of the property, to the reveal function. If, after reveal returns, this temporary variable is out of range, then failure status is returned to indicate that the request was invalid, and therefore the property was not modified.

traverseModifyStatus myGirth::traverse ( dataManipulator & manipulator )
{
    double tmpWeight = this->weight;
    manipulator.reveal ( propertyIdWeight, tmpWeight );
    if ( tmpWeight < 0.0 )
        return tmsOutOfRangeLow;
    if ( tmpWeight > 10.0 )
        return tmsOutOfRangeHigh;
    this->weight = tmpWeight;
    return tmsSuccess;
}

When a robustly interfaced property set calls reveal functions in the dataManipulator interface multiple times for multiple properties from within its traverse function, special care should be taken to postpone all property updates until all of the calls to the reveal functions complete, so that the container is left in a consistent state if any one of the modified parameters is out of range. Here is an example.

traverseModifyStatus myGirth::traverse ( dataManipulator & manipulator )
{
    // Validate every candidate value before changing any member.
    double tmpHeight = this->height;
    manipulator.reveal ( propertyIdHeight, tmpHeight );
    if ( tmpHeight < 0.0 )
        return tmsOutOfRangeLow;
    if ( tmpHeight > 10.0 )
        return tmsOutOfRangeHigh;
    double tmpWeight = this->weight;
    manipulator.reveal ( propertyIdWeight, tmpWeight );
    if ( tmpWeight < 0.0 )
        return tmsOutOfRangeLow;
    if ( tmpWeight > 10.0 )
        return tmsOutOfRangeHigh;
    // All values are in range: commit them together.
    this->height = tmpHeight;
    this->weight = tmpWeight;
    return tmsSuccess;
}
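The following sketch exercises the deferred-update pattern with a hypothetical mapWriter manipulator that overwrites only the properties it has values for. All names are stand-ins in namespace sketch, and the status enumeration lists only the three codes used in this tutorial. Because validation happens before either member is assigned, a request whose weight is out of range also leaves the height unchanged.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Stand-in status codes (assumption: only the three used in the tutorial).
enum traverseModifyStatus { tmsSuccess, tmsOutOfRangeLow, tmsOutOfRangeHigh };

struct propertyId {
    std::string name;
};

class dataManipulator {
public:
    virtual ~dataManipulator () {}
    virtual void reveal ( const propertyId &, double & ) = 0;
};

static const propertyId propertyIdHeight { "height" };
static const propertyId propertyIdWeight { "weight" };

class myGirth {
public:
    double height = 1.0;
    double weight = 5.0;

    // Same validate-then-commit structure as the tutorial's example.
    traverseModifyStatus traverse ( dataManipulator & manipulator )
    {
        double tmpHeight = this->height;
        manipulator.reveal ( propertyIdHeight, tmpHeight );
        if ( tmpHeight < 0.0 ) return tmsOutOfRangeLow;
        if ( tmpHeight > 10.0 ) return tmsOutOfRangeHigh;
        double tmpWeight = this->weight;
        manipulator.reveal ( propertyIdWeight, tmpWeight );
        if ( tmpWeight < 0.0 ) return tmsOutOfRangeLow;
        if ( tmpWeight > 10.0 ) return tmsOutOfRangeHigh;
        this->height = tmpHeight;
        this->weight = tmpWeight;
        return tmsSuccess;
    }
};

// Hypothetical manipulator: writes new values for the properties it knows
// about and leaves every other revealed value untouched.
class mapWriter : public dataManipulator {
public:
    explicit mapWriter ( std::map < std::string, double > v ) : values ( v ) {}
    void reveal ( const propertyId & id, double & value ) override
    {
        auto it = values.find ( id.name );
        if ( it != values.end () ) value = it->second;
    }
private:
    std::map < std::string, double > values;
};

} // namespace sketch
```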

Subordinate Container Traversal

Suppose that we have limit properties and would like to publish them using the data access interfaces. As expected, we will need to create an interfacing class for the limits with traverse functions for the limit properties.

Next, we need to bind the limits subordinate properties to the height and weight properties. This is accomplished in the traverse function for the myGirth class by passing an additional propertyCatalog reference parameter, referring to the limits, when calling reveal for the height and weight properties.
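The binding can be sketched as follows, assuming a hypothetical dataViewer::reveal overload that takes an extra const propertyCatalog & for the subordinate properties. Property identifiers are simplified to std::string, and pathCollector is a hypothetical viewer that descends into subordinate catalogs and records dotted paths. One limits class is reused, with independent values, for both height and weight.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

class dataViewer;

class propertyCatalog {
public:
    virtual ~propertyCatalog () {}
    virtual void traverse ( dataViewer & ) const = 0;
};

class dataViewer {
public:
    virtual ~dataViewer () {}
    // leaf property
    virtual void reveal ( const std::string & id, const double & value ) = 0;
    // property with subordinate properties (assumed overload)
    virtual void reveal ( const std::string & id, const double & value,
                          const propertyCatalog & subordinates ) = 0;
};

class limits : public propertyCatalog {
public:
    limits ( double lo, double hi ) : low ( lo ), high ( hi ) {}
    void traverse ( dataViewer & viewer ) const override
    {
        viewer.reveal ( "low limit", this->low );
        viewer.reveal ( "high limit", this->high );
    }
private:
    double low, high;
};

class myGirth : public propertyCatalog {
public:
    void traverse ( dataViewer & viewer ) const override
    {
        // each parent property gets its own instance of the limits
        viewer.reveal ( "height", this->height, this->heightLimits );
        viewer.reveal ( "weight", this->weight, this->weightLimits );
    }
private:
    double height = 1.8;
    double weight = 70.0;
    limits heightLimits { 0.0, 3.0 };
    limits weightLimits { 0.0, 300.0 };
};

// Hypothetical viewer: records the full path of every property,
// recursing into subordinate catalogs.
class pathCollector : public dataViewer {
public:
    void reveal ( const std::string & id, const double & ) override
    {
        paths.push_back ( prefix + id );
    }
    void reveal ( const std::string & id, const double & v,
                  const propertyCatalog & subordinates ) override
    {
        reveal ( id, v );
        std::string saved = prefix;
        prefix += id + ".";
        subordinates.traverse ( *this );
        prefix = saved;
    }
    std::vector < std::string > paths;
private:
    std::string prefix;
};

} // namespace sketch
```

Traversing a myGirth with pathCollector records the tree in order: height, height.low limit, height.high limit, weight, weight.low limit, weight.high limit.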

Property Catalog Indexed by Property Identifier

A program might choose to extract from a data container only the specific properties that it needs, and would therefore need to locate a particular property in an unknown container indexed only by its property identifier. For example, we might choose to move data between dissimilar container types. Compared to the traversal mechanism above, this introduces an additional degree of flexibility, required by certain applications, at the expense of some loss of runtime efficiency. Data that is interfaced for this type of access must provide a find function.

The find interface allows the programmer complete flexibility when implementing the indexing mechanism. A prototype indexing locator class that has shown good performance during testing is available, but its use is not required; for containers with a limited number of properties, a cascaded if statement may prove to be the simplest and probably also the best performing approach.

Recognition that indexing mechanisms can be greatly simplified when containers have a limited number of properties may prove to be an impetus to design property hierarchies with a limited number of properties on each level.
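A cascaded-if find for the two-property myGirth container might look like the sketch below. The find signature (a property name plus a dataViewer, returning whether the property exists) is an assumption for illustration, and property identifiers are reduced to C strings compared with strcmp; lastValue is a hypothetical viewer that remembers the last value revealed.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

class dataViewer {
public:
    virtual ~dataViewer () {}
    virtual void reveal ( const char * id, const double & value ) = 0;
};

class myGirth {
public:
    // Assumed signature: reveal only the requested property and report
    // whether this container has it.
    bool find ( const char * id, dataViewer & viewer ) const
    {
        if ( std::strcmp ( id, "height" ) == 0 ) {
            viewer.reveal ( id, this->height );
            return true;
        }
        if ( std::strcmp ( id, "weight" ) == 0 ) {
            viewer.reveal ( id, this->weight );
            return true;
        }
        return false;   // property not present in this container
    }
private:
    double height = 1.8;
    double weight = 70.0;
};

// Hypothetical viewer that keeps the most recently revealed value.
class lastValue : public dataViewer {
public:
    void reveal ( const char *, const double & v ) override { value = v; }
    double value = 0.0;
};

} // namespace sketch
```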

To use this locator class we must add some class-specific (static), rather than object-specific, data members to the data interfacing class.

Next, "binding" member functions are added for each property with indexed access.

Finally, these "binding" member functions must be installed into the locator. This operation would be typically performed only once during initialization.
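The three locator steps (class-specific data, binding member functions, one-time installation) can be sketched with a function-local static table. This is not the prototype locator class: a std::map keyed by property name stands in for its index, and the find signature is hypothetical. The table is built once, on first use, and shared by every myGirth object.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {

class dataViewer {
public:
    virtual ~dataViewer () {}
    virtual void reveal ( const std::string & id, const double & value ) = 0;
};

class myGirth {
public:
    bool find ( const std::string & id, dataViewer & viewer ) const
    {
        const table_t & t = locator ();
        auto it = t.find ( id );
        if ( it == t.end () ) return false;
        ( this->*( it->second ) ) ( viewer );   // call the bound member
        return true;
    }
private:
    double height = 1.8;
    double weight = 70.0;

    // "Binding" member functions, one per property with indexed access.
    void revealHeight ( dataViewer & v ) const { v.reveal ( "height", height ); }
    void revealWeight ( dataViewer & v ) const { v.reveal ( "weight", weight ); }

    // Class-specific (not per-object) data: the bindings are installed
    // once, the first time locator() runs.
    typedef void ( myGirth::*binding_t ) ( dataViewer & ) const;
    typedef std::map < std::string, binding_t > table_t;
    static const table_t & locator ()
    {
        static const table_t table = {
            { "height", &myGirth::revealHeight },
            { "weight", &myGirth::revealWeight },
        };
        return table;
    }
};

// Hypothetical viewer that keeps the most recently revealed value.
class lastValue : public dataViewer {
public:
    void reveal ( const std::string &, const double & v ) override { value = v; }
    double value = 0.0;
};

} // namespace sketch
```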

Note that the central aspect of the find function is that it provides random access to the properties in a container; therefore the implementor of the dataCatalog controls neither the order nor the completeness of access to its set of properties. For this reason a non-const version of the find interface employing a dataManipulator is not provided: the implementor of a dataCatalog must be in complete control of the consistency and completeness of modifications made to a property set (see the property catalog traversal discussion).

Operators for Properly Interfaced Data

The data access library provides support for assignment and equivalence operations. Users may perform assignment and equivalence comparison between two dataCatalog interfaced operands. If the assignment or equivalence comparison fails, then a failure status is returned from the function implementing the operation. For example, during assignment, if a property in the left-hand operand is missing in the right-hand operand, then asUndefinedProperty is returned. The library does not implement operator = and operator == for class propertyCatalog, but all of the necessary infrastructure is provided should a user choose to do so within another class. For example, functions are provided to convert a failure status into a C++ exception. Functions are also provided for assignment and equivalence operations where one of the operands is a dataCatalog and the other is a stringSegment (see also Strings).

enum assignStatus { 
    asSuccess = 0, 
    asOutOfRangeLow = 1, 
    asOutOfRangeHigh = 2, 
    asInvalidState = 3,
    asIncompatibleTypes = 4, 
    asElementIndexOverflow = 5,
    asUndefinedElements = 6,
    asUndefinedProperty = 7,
    asUnableToExtend = 8,
    asUnexpected = 9
};

epicsShareFunc assignStatus assign ( 
    propertyCatalog & lhs, const propertyCatalog & rhs );

epicsShareFunc assignStatus assign ( 
    propertyCatalog & lhs, 
    const stringSegment & rhs, const propertyCatalog & rhsMeta );

epicsShareFunc assignStatus assign ( 
    stringSegment & lhs, const propertyCatalog & lhsMeta, 
    const propertyCatalog & rhs ) ;

epicsShareFunc assignStatus assign ( 
    arraySegment & lhs, const propertyCatalog & lhsMeta, 
    const arraySegment & rhs, const propertyCatalog & rhsMeta );

epicsShareFunc void throwExceptionIfUnsuccessful ( assignStatus );

enum equivStatus { 
    esEqual = 0, 
    esNotEqual = 1, 
    esIncompatible = 2, 
    esElementIndexOverflow = 3,
    esUndefinedElements = 4, 
    esUndefinedProperty = 5
};

epicsShareFunc equivStatus equiv ( 
    const propertyCatalog & lhs, const propertyCatalog & rhs );

epicsShareFunc equivStatus equiv ( 
    const propertyCatalog & lhs, 
    const stringSegment & rhs, const propertyCatalog & rhsMeta );

epicsShareFunc equivStatus equiv ( 
    const arraySegment & lhs, const propertyCatalog & lhsMeta, 
    const arraySegment & rhs, const propertyCatalog & rhsMeta );

bool equivStatusToBool ( equivStatus stat );
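The library's implementations of throwExceptionIfUnsuccessful and equivStatusToBool are not shown in this tutorial. The sketch below is one plausible way such helpers could behave, written against copies of the two enumerations; the choice of standard exception types, and treating every status other than esEqual as "not equal", are assumptions made here, not the library's documented behavior.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {

enum assignStatus {
    asSuccess = 0, asOutOfRangeLow = 1, asOutOfRangeHigh = 2,
    asInvalidState = 3, asIncompatibleTypes = 4, asElementIndexOverflow = 5,
    asUndefinedElements = 6, asUndefinedProperty = 7, asUnableToExtend = 8,
    asUnexpected = 9
};

enum equivStatus {
    esEqual = 0, esNotEqual = 1, esIncompatible = 2,
    esElementIndexOverflow = 3, esUndefinedElements = 4, esUndefinedProperty = 5
};

// One plausible mapping of assignment failures onto standard exceptions.
inline void throwExceptionIfUnsuccessful ( assignStatus stat )
{
    switch ( stat ) {
    case asSuccess:
        return;
    case asOutOfRangeLow:
    case asOutOfRangeHigh:
    case asElementIndexOverflow:
        throw std::out_of_range ( "data access assignment: value out of range" );
    case asIncompatibleTypes:
    case asUndefinedProperty:
    case asUndefinedElements:
        throw std::invalid_argument ( "data access assignment: operands do not match" );
    default:
        throw std::runtime_error ( "data access assignment failed" );
    }
}

// Only a definite esEqual counts as equal in this sketch.
inline bool equivStatusToBool ( equivStatus stat )
{
    return stat == esEqual;
}

} // namespace sketch
```

A user-written wrapper class could then implement operator = as assign followed by throwExceptionIfUnsuccessful, and operator == as equiv followed by equivStatusToBool.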

Specialized Data Types

Arrays

As with scalar properties, arrays are interfaced by calling a reveal function in the dataViewer or dataManipulator from the dataCatalog's traverse or find function. The only difference is that, instead of a scalar value, an arraySegment interface is passed to the reveal function. Arrays of all C++ primitive types are revealed using a single reveal function in the dataViewer or dataManipulator that takes an arraySegment C++ reference.

Array Bounds

Information about the bounds of an array is provided using the arrayBounds interface. The arrayBounds interface is a subset of the arraySegment interface.

Array Traversal

Arrays may be stored in non-contiguous blocks of memory. This allows for improved memory management (less fragmentation). The arraySegment interface can be used to traverse all of the non-contiguous blocks that form an array. An arrayViewer reference is passed to the array interfacing traverse function, which calls the array reveal function (overloaded for each of the supported primitive types) in the arrayViewer interface to publish each array segment. Here is an example where the array is stored in two non-contiguous blocks.

When traverse calls reveal more than once, each new block revealed is considered to be a logical extension of the previous block. That is, if the arrayViewer::reveal function pushed each successive segment onto a stack growing in the direction that C array pointers advance, then after the traverse function completes the entire array could be indexed directly on this stack using a C array pointer initialized to the root of the stack.
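An array kept in two separately allocated blocks, and a viewer that gathers the revealed segments into one contiguous buffer, can be sketched as follows. The arrayViewer class here is a stand-in with a single double overload (the real interface has one per primitive type), and the reveal signature (pointer plus element count) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Stand-in arrayViewer (assumed pointer-plus-count signature).
class arrayViewer {
public:
    virtual ~arrayViewer () {}
    virtual void reveal ( const double * pElements, std::size_t count ) = 0;
};

// An array of five elements held in two separately allocated blocks.
class twoBlockArray {
public:
    void traverse ( arrayViewer & viewer ) const
    {
        // each call logically extends the previously revealed block
        viewer.reveal ( first.data (), first.size () );
        viewer.reveal ( second.data (), second.size () );
    }
private:
    std::vector < double > first { 1.0, 2.0, 3.0 };
    std::vector < double > second { 4.0, 5.0 };
};

// Appends every segment, so after traverse the array is contiguous and
// can be indexed directly from stack.data().
class gatherViewer : public arrayViewer {
public:
    void reveal ( const double * pElements, std::size_t count ) override
    {
        stack.insert ( stack.end (), pElements, pElements + count );
    }
    std::vector < double > stack;
};

} // namespace sketch
```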

If the array is multi-dimensional, then array elements are revealed in the natural order for multi-dimensional arrays in the C language. That is, elements are revealed in row-major order, where the rightmost subscript in the C language declaration int matrix [10][10] varies most rapidly.

Array Slice Traversal

A multi-dimensional slice is specified by a user defined class deriving from interface arrayBounds (see example above). The slice defines the first element index and element count for each dimension of a multi-dimensional array. An arrayBounds reference specifying the bounds of the slice is passed to the array slice interfacing traverse function.

A slice sequence index argument is also passed to the array slice interfacing traverse function. Consider the traversed slice as a continuous sequence of array elements in row-major order, from the beginning of the slice (the lowest row-major order element) to its end (the highest row-major order element). The slice sequence index specifies how many contiguous elements to extract from this sequence, starting at a specified element position in the sequence. It is passed to the slice traverse function using an arrayBounds::bound structure, which contains a first sequence element index field and a sequence element count field. If the first field is zero and the count field equals the product of the count fields of all dimensions in the slice, then traversal of the entire slice has been requested.
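The mapping from a slice sequence index to elements of the full array is plain row-major arithmetic. The sketch below computes, for a row-major array with extents dims, the flat offset of every element selected by a sequence index seq within slice. The bound struct mirrors the first and count fields described above, but the function name sliceSequenceOffsets and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Mirrors the first/count fields of arrayBounds::bound (stand-in).
struct bound {
    std::size_t first;
    std::size_t count;
};

// Returns the row-major flat offsets (into the full array) of the elements
// selected by sequence index `seq` within the multi-dimensional `slice`.
inline std::vector < std::size_t > sliceSequenceOffsets (
    const std::vector < std::size_t > & dims,   // full array extents
    const std::vector < bound > & slice,        // per-dimension slice bounds
    bound seq )                                 // portion of the slice sequence
{
    const std::size_t nDim = dims.size ();
    assert ( slice.size () == nDim );
    std::size_t total = 1;                      // elements in the whole slice
    for ( const bound & b : slice ) total *= b.count;
    assert ( seq.first + seq.count <= total );

    std::vector < std::size_t > offsets;
    for ( std::size_t s = seq.first; s < seq.first + seq.count; s++ ) {
        // Decompose the sequence position into per-dimension indices,
        // rightmost subscript varying fastest (row-major).
        std::size_t rem = s;
        std::size_t offset = 0;
        std::size_t stride = 1;                 // product of dims to the right
        for ( std::size_t d = nDim; d-- > 0; ) {
            std::size_t idx = slice[d].first + rem % slice[d].count;
            rem /= slice[d].count;
            offset += idx * stride;
            stride *= dims[d];
        }
        offsets.push_back ( offset );
    }
    return offsets;
}

} // namespace sketch
```

For int matrix [10][10] with the slice covering rows 2 to 4 and columns 4 to 5, the whole slice (first 0, count 6) yields offsets 24, 25, 34, 35, 44, 45, and the sequence index with first 1 and count 3 yields 25, 34, 35.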

Similar to ordinary array traverse, the array slice traverse is also passed an arrayViewer interface with overloaded reveal functions for the purpose of incrementally publishing snippets of the array slice sequence using blocks of elements composed from any of the primitive types.

Here is an example implementation of an array slice traverse function. It is anticipated that library routines will free casual users from the burden of writing this type of code.

When Native Array Storage Isn't Matching C's Native Row-Major Order

In this situation the user will need to call the reveal functions, less efficiently, one element at a time. A similar situation is likely to arise, to a greater or lesser degree, with certain multidimensional slices.
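For example, a matrix stored column-major (as Fortran does) can still be published in the C row-major order the interface expects by revealing one element per call, as in this sketch. The arrayViewer and gatherViewer classes are simplified stand-ins with an assumed pointer-plus-count reveal signature, and columnMajorMatrix is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

class arrayViewer {
public:
    virtual ~arrayViewer () {}
    virtual void reveal ( const double * pElements, std::size_t count ) = 0;
};

// A rows x cols matrix stored column-major (first subscript varies fastest).
class columnMajorMatrix {
public:
    columnMajorMatrix ( std::size_t rows, std::size_t cols )
        : nRows ( rows ), nCols ( cols ), data ( rows * cols ) {}
    double & at ( std::size_t r, std::size_t c ) { return data[c * nRows + r]; }

    // Storage order does not match C row-major order, so each element is
    // revealed individually, in the order the interface expects.
    void traverse ( arrayViewer & viewer ) const
    {
        for ( std::size_t r = 0; r < nRows; r++ )
            for ( std::size_t c = 0; c < nCols; c++ )
                viewer.reveal ( &data[c * nRows + r], 1 );
    }
private:
    std::size_t nRows, nCols;
    std::vector < double > data;
};

// Appends every revealed segment to a contiguous buffer.
class gatherViewer : public arrayViewer {
public:
    void reveal ( const double * pElements, std::size_t count ) override
    {
        stack.insert ( stack.end (), pElements, pElements + count );
    }
    std::vector < double > stack;
};

} // namespace sketch
```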

Strings

All strings are interfaced through the stringSegment interface so that a wide range of native string storage formats are supported. In particular, storage of strings in fixed sized non-contiguous blocks is permitted by the interface. There are reveal functions in the dataViewer and dataManipulator interfaces for type stringSegment. It is expected that this interface will be implemented for all of the most commonly used string data types and therefore casual users will not need to be familiar with it. The interface is described in the daString.h and daStream.h header files.

At the highest level a string is considered to be a linear sequence of tokens convertible to C type unsigned. This approach allows for wide character types (regional character sets). A string also has a concept of a current stream position that can be directly manipulated using the streamPosition interface. A string also has streamRead and streamWrite interfaces, allowing the string to be converted to and from numeric types while allowing for regional number format variations.

Implementations of the putChar() function set the token at the current stream position and advance the stream position by one token. Implementations of the getChar() function return the token at the current stream position and advance the stream position by one token. Implementations of the stringDiff() function compare the two specified strings and return a constant indicating the relative sort order of the two strings. The stringDiff() function is passed a stringSegment reference, and implementations are expected to attempt a dynamic cast of this reference to their derived type in the interest of optimization and/or implementation of local sort order variations.

The stream position interface allows the total length of the string, its current position, and its number of tokens available in memory at the current position to be queried. Likewise, the current position can be set, all tokens from the current position to the end of the stream can be removed (pruned), and any tokens in a cache can be flushed. The first token in the string has position zero, the second token has position one, and all subsequent tokens are sequentially numbered. Strings are initialized with the first element in the string being the current token.

The length function returns the total number of elements in the string. The position function returns the position of the current token. The movePosition function sets the position of the current token returning false only if the request is not possible. The viewable function returns the number of immediately viewable tokens at the current position. The prune function removes all tokens from the current position to the end of the string. Finally, the flush function flushes any cached tokens.
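A toy string held in fixed-size, separately allocated blocks shows how these stream position operations fit together. blockString is a hypothetical class written for this sketch; it does not derive from the real stringSegment or streamPosition interfaces in daString.h and daStream.h, and its flush is a no-op because it caches nothing. Here viewable reports only the tokens left in the current block, which is one reasonable reading of "immediately viewable".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical string stored in fixed-size blocks of unsigned tokens.
class blockString {
public:
    static constexpr std::size_t blockSize = 4;

    std::size_t length () const { return len; }
    std::size_t position () const { return pos; }
    bool movePosition ( std::size_t newPos )
    {
        if ( newPos > len ) return false;   // cannot move past the end
        pos = newPos;
        return true;
    }
    // Tokens readable at the current position without crossing a block edge.
    std::size_t viewable () const
    {
        if ( pos >= len ) return 0;
        std::size_t inBlock = blockSize - pos % blockSize;
        std::size_t remaining = len - pos;
        return inBlock < remaining ? inBlock : remaining;
    }
    // Remove every token from the current position to the end.
    void prune ()
    {
        len = pos;
        blocks.resize ( ( len + blockSize - 1 ) / blockSize );
    }
    void flush () {}    // nothing is cached in this sketch

    // Set the token at the current position, then advance by one.
    void putChar ( unsigned token )
    {
        if ( pos / blockSize >= blocks.size () )
            blocks.emplace_back ();
        blocks[pos / blockSize][pos % blockSize] = token;
        pos++;
        if ( pos > len ) len = pos;
    }
    // Return the token at the current position, then advance by one.
    unsigned getChar ()
    {
        assert ( pos < len );
        unsigned token = blocks[pos / blockSize][pos % blockSize];
        pos++;
        return token;
    }
private:
    std::vector < std::array < unsigned, blockSize > > blocks;
    std::size_t len = 0;
    std::size_t pos = 0;
};

} // namespace sketch
```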

The streamWrite interface facilitates writing a string or numeric type at the current position in the string and advancing the current position to just after the end of what was written. When writing a numeric type, the implementation must first check whether there is a subordinate property with property id propertyIdEnumeration (this name might change) and, if it exists, use that interface to convert to and from a numeric type. Next the implementation converts the number to a string, allowing for regional variations in string numeric formats and/or using subordinate properties controlling string numeric formats such as the ioprecision (number of significant digits).

The streamRead interface facilitates reading a string or numeric type at the current position in the string and advancing the current position to just after what was read. When reading a numeric type, the implementation must first check whether there is a subordinate property with property id propertyIdEnumeration (this name might change) and, if it exists, use that interface to convert to the numeric type. Next the implementation converts the string to a number, allowing for regional variations in string numeric formats and/or using subordinate properties controlling string numeric formats such as the precision (number of significant digits).

Enumerated (Limited Set of Labeled States) Data

The data access interface allows enumerated (limited set of labeled states) data to be stored in any of the primitive types as long as all of the state set values are convertible to C type int. Be advised, however, that it is the responsibility of the implementor of the traverse and/or find function(s) to range check any modifications made by a propertyManipulator. A simple example of this type of range checking follows.
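A range check for enumerated data might look like this sketch, in which a hypothetical three-state valve stores its state in an unsigned char but reveals it through an int temporary. The manipulator and status types are simplified stand-ins, property identifiers are plain strings, and intWriter is a hypothetical manipulator that always writes one fixed value.

```cpp
#include <cassert>

namespace sketch {

enum traverseModifyStatus { tmsSuccess, tmsOutOfRangeLow, tmsOutOfRangeHigh };

class dataManipulator {
public:
    virtual ~dataManipulator () {}
    virtual void reveal ( const char * id, int & value ) = 0;
};

// Hypothetical device with three labeled states stored in an unsigned char.
class valve {
public:
    enum state { closed = 0, moving = 1, open = 2, nStates = 3 };

    traverseModifyStatus traverse ( dataManipulator & manipulator )
    {
        int tmp = this->current;             // widen to int for the reveal
        manipulator.reveal ( "value", tmp );
        if ( tmp < 0 ) return tmsOutOfRangeLow;
        if ( tmp >= nStates ) return tmsOutOfRangeHigh;
        this->current = static_cast < unsigned char > ( tmp );
        return tmsSuccess;
    }
    unsigned char current = closed;
};

// Hypothetical manipulator that writes one fixed value.
class intWriter : public dataManipulator {
public:
    explicit intWriter ( int v ) : newValue ( v ) {}
    void reveal ( const char *, int & value ) override { value = newValue; }
private:
    int newValue;
};

} // namespace sketch
```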

There is also an interface allowing applications to view and manipulate the supported states. This is accomplished by providing a subordinate property of primitive type enumStateSet and property id propertyIdEnumeration (this name might change). The enumStateSet interface is described in the daEnum.h header file. Casual users will probably not implement this interface relying instead on libraries to instantiate enumerated state sets.

Design Goals

The overall design goal was to present a concise interface to users and, whenever possible, to shift programming labor from users to the library implementors, where there is maximum benefit and minimal code duplication.

Data Access isn't being designed for office computing. For a control system we need to adhere to some basic principles.

The user should not be required to store his data in a particular format. Nevertheless, knowledge of the structure of the data must be permitted to be determined at compile time so that access to the data can be efficient.

The interface must not preclude user data stored in multiple non-contiguous blocks. Memory management based on fixed-size non-contiguous blocks allows predictable, free-list-based memory allocation, which implies low and predictable latency and no memory fragmentation.

The interface must not require C-RTL general purpose memory management, AKA malloc. When passing data via Data Access in high-throughput situations, efficiency gets noticed. When the Data Access interface to application data exists only for the duration of a function call, malloc is a very high-overhead call.

Object Code Size

There has been significant preoccupation with the object code size of Data Access, so it's necessary to clarify this issue.

Data Access is in essence only an interface. When looking at code size we are comparing the sizes of the accompanying support library components. Currently the support library provides equivalence and assignment between properly interfaced data containers. In an IOC it is unlikely that the equivalence functionality will be needed, and so we should consider only the size of the assignment component, which on linux-x86 currently amounts to about 43 kB when compiled by gcc 3.2.3. There will also be another 10 kB for property id hashing. These numbers could likely be reduced, but this is not a significant overhead (even for legacy embedded systems), and so reducing it is perhaps not a rational way to spend time.

When comparing the sizes of object code between C++ and C, one must use the UNIX size utility to judge the true size of the object as it will be used within a properly built executable, and avoid counting irrelevant space in the object code file. For example, C++ object code files use space for symbols that need not occupy space in a production executable. A related point is that in future versions of EPICS we should endeavor to use a modern vxWorks configuration that doesn't require a target-resident symbol table.

Frequently Asked Questions

Whoa, this thing is called Data Access!

Don’t O.O. systems use messages and remote procedure calls? That’s the new technology!

Data Access was invented for the purpose of passing messages - to specify the parameters of the messages.

Why not use a data description compiler like XDR, CORBA IDL, or EPICS DBD?

This is certainly worthy of consideration, but proper decoupling of sender and receiver data spaces appears to be important for a tool-based approach. Conventional data description compiler based systems require the interfaces of the sender and receiver to be identical parameter-for-parameter, field-for-field, and bit-for-bit, and the sender and receiver must share the same unique data structure identifier. If not, no communication is possible. However, consider, for example, the requirements for future implementations of EPICS. In these systems, events posted to the server may have many associated subsystem-unique properties. Clients will rarely need all of them, and many different subsets will be requested by a range of different clients. Likewise, clients written in the past should continue to function if new properties are added to an event.

Furthermore, schemes like XDR require that the data be stored in, or converted to, a particular format as produced by the XDR compiler. This works fine for simple scalar datums, but becomes cumbersome for variable-length datums such as strings and arrays. When complex data is stored in a proprietary format, significant overhead can arise. For example, in limited-memory systems it is important to allow scattered, non-contiguous storage of datums. It must not be required that randomly sized, dynamically allocated blocks of memory exist only for the short duration that a complex datum is passed between two different layers in the system. The XDR approach also tends to be inflexible when it comes to interfacing with multi-dimensional arrays.

In contrast, Data Access does not enforce a native storage format and therefore does not suffer from the above limitations, and our perception is that this is a more flexible, less intrusive, and better performing approach.

Where are the size locked types?

Past versions of CA were based on size-locked types, i.e. "typedef epicsInt16 dbr_int_t;". However, in my experience users seldom bothered to use dbr_xxx_t, and so my perspective has evolved: size-locked types should not appear in interfaces used by users. Because Data Access uses overloaded reveal functions, the interface automatically binds to the user's selected primitive data type instead of requiring that users recode to yet another size-locked data type system. When Data Access is used with a network protocol, there will of course be a private propertyViewer implementation that is aware of the size-locked types that must be used in network protocols. But this is an internal matter that should not be exposed to users. IMHO, if users must be aware of protocol and architecture dependent matters, then a proliferation of mistakes is almost guaranteed. Bugs are an inescapable result of human nature, but it is always preferable to confine them to the smallest possible source code real estate. It is better to deal with protocol and architecture dependent issues once in a library component than multiple times in each client-side tool.

Misconceptions

This is a C++ template based interface.

In fact, this is a pure virtual base class based interface. Templates are used only in the implementation of the support library and need not be seen by users.

This is a data object.

In fact, this is a universal interface to non-uniform data. Proprietary data storage formats need not change. We are not creating an object that defines a storage format, allocates space, and stores data. This isn't another GDD or cdevData.

It's best to code everything in C because C can be called from C++, Java, Python, etc.

It is of course possible to call back and forth between any of these languages and, IMHO, none of them would be successful if that were not the case. The C++ interfaces described herein do not contain templates, so wrappers can easily be provided allowing them to be called from C, Java, Python, etc. A C++ plug-in for the propertyCatalog interface can also quite easily provide access to data maintained by C programs.

Compared to C, the code and interfaces described herein can be more efficiently maintained and more efficiently executed at runtime when written in C++. This maintenance efficiency stems from C++'s template capabilities. In C we must either write and maintain a program to create the conversion matrix or else write and maintain many functions which could be supplied in C++ with one template. The runtime efficiency derives from C++'s overloaded function and template capabilities. In C, the user would see externally an interface with a void pointer and an additional parameter specifying a data type code. Internally a C implementation would call the conversion matrix using a data type code indexed jump table. With C++ this additional step can be eliminated through use of overloaded functions.

The C++ approach with overloaded functions is also less error prone for the user. Contrast C++ overloaded functions with the typical C interface to this type of functionality requiring a type code and a void pointer. The C interface is without doubt more error prone leaving problems related to improperly specified type codes to be discovered (and debugged) at run time. With C++ overloaded functions the compiler enforces the type system at compile time.
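The difference can be seen in a few lines. In the sketch below, describeC is a hypothetical C-style entry point taking a type code and a void pointer, while the describe overloads let the compiler select the conversion from the argument's static type. Neither function is part of the Data Access library; they only illustrate the two dispatch styles.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// C style: a type code plus an untyped pointer. Nothing stops the caller
// from passing a code that disagrees with the pointed-to object.
enum typeCode { tcInt, tcDouble };

inline std::string describeC ( typeCode code, const void * pValue )
{
    switch ( code ) {
    case tcInt:
        return "int " + std::to_string ( *static_cast < const int * > ( pValue ) );
    case tcDouble:
        return "double " + std::to_string ( *static_cast < const double * > ( pValue ) );
    }
    return "unknown";
}

// C++ style: the compiler picks the overload from the argument's type,
// so there is no type code to get wrong and no run-time dispatch table.
inline std::string describe ( const int & value )
{
    return "int " + std::to_string ( value );
}

inline std::string describe ( const double & value )
{
    return "double " + std::to_string ( value );
}

} // namespace sketch
```

Calling describeC ( tcDouble, &i ) with an int i compiles and then reads the wrong type at run time; with the overloads, describe ( i ) cannot disagree with its argument.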

This interface isn't compatible with pure Java.

A pure Java implementation could also be written. Java does not have templates, so implementing a conversion matrix for the copy and assignment operators would probably require writing a program that generates the code, as would also be the case if this functionality were developed in pure C. IMHO, this is mostly not a technical issue, but instead a resource limitation and/or administrative issue. With a small amount of up-front planning we could probably write and maintain every component of EPICS in both C++ and Java as our needs and budgets allow.

Outstanding Issues

Virtual Base Class for Time Stamps?

Currently, time stamps are interfaced through class epicsTime. Should time stamps be interfaced using a pure virtual base class, as has been the standard approach for all other complex data types such as strings and enumerated state set descriptions?

Subject:	Data Access Class Library Tutorial
From:	Jeff Hill <[email protected]>
To:	[email protected]
Date:	Tue, 22 Feb 2005 15:22:20 -0700

Experimental Physics and Industrial Control System
