Points for Discussion about the BOS-C++ Interface
The BOS-C++ interface presented here is only a first attempt. Quite a
few questions should be discussed before a "production-strength"
implementation based on these ideas is undertaken.
These are the points that come to my mind. Mail me (Benno List,
blist@mail.desy.de) your opinions or further issues that need discussion!
Style Guide
When many people work on software, a style guide is always a good thing to
have, and some steps have already been taken towards a FORTRAN style guide
for H1 (e.g. the H1 software note 54 and Stephan Egli's example code in
his DST guide).
For a complicated and new language like C++, a style guide is much more
important. It should help people to avoid pitfalls, avoid writing
non-portable code, and produce reliable and easy-to-read code.
BaBar has already taken steps in that direction: look at their BABAR
Programming Guidelines!
There also exists a much larger style guide from the Ellemtel corporation,
though some of its points are debatable because of its strictness.
However, it does not cover some issues connected with more modern features
in C++ such as exceptions and namespaces.
Here are just a few topics that came to my mind and should be addressed.
Wrapper files: C or C++?
Thanks mainly to Martine Charlet, a large collection of C wrapper files
for H1 software FORTRAN routines already exists. Surely this is a very
valuable thing.
Nevertheless it should be considered to provide special C++ header files,
which can then use function overloading and default arguments to provide
sensible shortcuts.
C and C++ wrappers could possibly be combined in the same header files by
using the predefined __cplusplus macro.
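A minimal sketch of how one header could serve both languages. The macro predefined by C++ compilers is spelled `__cplusplus`. The wrapper name `h1_nlink` and the shortcut `nlink` are hypothetical, and the wrapper is replaced by a stub here so that the example is self-contained; in practice the stub would be the existing C wrapper in its own source file.

```cpp
#include <cassert>
#include <cstring>

/* ---- part seen by both C and C++ compilers ---- */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical C wrapper around a FORTRAN routine: returns the index of
   bank `name` with number `nr`, or 0 if the bank does not exist. */
int h1_nlink (const char* name, int nr);

#ifdef __cplusplus
}  /* extern "C" */

/* ---- part seen by C++ compilers only ---- */
// Shortcut with a default argument: most users want bank number 0.
inline int nlink (const char* name, int nr = 0) {
  assert (name != 0);
  return h1_nlink (name, nr);
}
#endif

/* Stub standing in for the real wrapper: pretends that only the bank
   "DTRA" with number 0 exists, at index 100. */
int h1_nlink (const char* name, int nr) {
  return (std::strcmp (name, "DTRA") == 0 && nr == 0) ? 100 : 0;
}
```

A C compiler sees only the plain declaration, while C++ code can write `nlink ("DTRA")` and still link against the same C wrapper.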
Using the STL?
The ANSI C++ standard includes the Standard Template Library (STL). I
think it's just too powerful to ignore.
Exceptions
- Should they be used at all? (I think: yes)
- What should be done until compilers that support exception handling
are available on all machines?
- A suitable hierarchy of exceptions must be defined.
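To make the last point concrete, here is one possible shape of such a hierarchy. It is only a sketch: all class names (`BosError`, `BankNotFound`, `BankFormatError`) are hypothetical, and deriving from `std::runtime_error` is my assumption, not something the current implementation does.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Common base class, so that user code can catch "anything from BOS".
class BosError: public std::runtime_error {
public:
  explicit BosError (const std::string& what): std::runtime_error (what) {}
};

// The requested bank does not exist.
class BankNotFound: public BosError {
public:
  explicit BankNotFound (const std::string& bank)
    : BosError ("bank not found: " + bank) {}
};

// The bank exists, but its format is not the expected one.
class BankFormatError: public BosError {
public:
  explicit BankFormatError (const std::string& bank)
    : BosError ("unexpected bank format: " + bank) {}
};

// Stand-in for code that opens a bank; in this sketch it always fails.
void open_bank (const std::string& name) {
  throw BankNotFound (name);
}

// Returns 1 if the specific handler ran, 2 if only the generic one did.
int classify (const std::string& name) {
  try {
    open_bank (name);
  } catch (const BankNotFound&) {
    return 1;
  } catch (const BosError&) {
    return 2;
  }
  return 0;
}
```

With a common base class, a user can handle a missing bank specifically and still have one handler for everything else.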
Locking mechanism/associated structures/parallel banks
A widespread problem in HEP analyses is that we are offered a large
collection of reconstructed objects which are just hypotheses (e.g.
track hypotheses, electron candidates).
A kind of flagging and locking mechanism is therefore often needed.
In H1 software this is often done by using banks which are parallel to
one another.
- Should there be a special construct for treating parallel banks,
e.g. for the calorimeter banks?
- Could one have a special locking class template which maintains a
structure parallel to a bank which can be used to log or flag tracks?
What should a user interface for such a structure look like? Should it
be a special class, say "LockableTable", with
predefined access functions like dtra[i].flag(1)? Or maybe one could
have a sort of flag class, say
class flags {
public:
  int goodtrack;
  int good_forward_track;
};
and associate it with the table:
void f () {
  LockableTable<dtrarow, flags> dtra ("DTRA", 0);
  for (int i = 1; i <= dtra.rows(); i++) {
    if (dtra[i].lock()->goodtrack) {
      // do something
    }
  }
}
The "lock()" function would provide the pointer to the appropriate
element of the "flags" class. But the syntax is still somewhat clumsy.
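The following sketch shows how such a template could keep the flags parallel to the rows. It is not the proposed implementation: a `std::vector` stands in for the BOS bank, `dtrarow` is cut down to one column, and the names `Entry` and `flag_good_tracks` are made up for the illustration.

```cpp
#include <cassert>
#include <vector>

struct dtrarow { float ptinv_tr; };   // one column only, for brevity

struct flags {
  int goodtrack;
  int good_forward_track;
  flags (): goodtrack (0), good_forward_track (0) {}
};

template <class row, class flag>
class LockableTable {
public:
  // Proxy returned by operator[]: gives access to a row and to its flags.
  class Entry {
  public:
    Entry (row& r, flag& f): r_ (&r), f_ (&f) {}
    row*  operator-> () const { return r_; }
    flag* lock () const       { return f_; }
  private:
    row*  r_;
    flag* f_;
  };

  // One default-constructed flag object is created per row.
  explicit LockableTable (const std::vector<row>& data)
    : rows_ (data), flags_ (data.size ()) {}

  int rows () const { return static_cast<int> (rows_.size ()); }

  // Rows are numbered from 1, as in the loop above.
  Entry operator[] (int i) {
    assert (i >= 1 && i <= rows ());
    return Entry (rows_[i - 1], flags_[i - 1]);
  }

private:
  std::vector<row>  rows_;    // stand-in for the bank contents
  std::vector<flag> flags_;   // parallel structure
};

// Flags all tracks with |1/pt| below a cut; returns how many were flagged.
int flag_good_tracks (LockableTable<dtrarow, flags>& dtra, float max_ptinv) {
  int n = 0;
  for (int i = 1; i <= dtra.rows (); i++) {
    float q = dtra[i]->ptinv_tr;
    if (q < max_ptinv && q > -max_ptinv) {
      dtra[i].lock ()->goodtrack = 1;
      ++n;
    }
  }
  return n;
}
```

The proxy keeps row access (`dtra[i]->...`) and flag access (`dtra[i].lock()->...`) apart, at the price of the somewhat heavy `lock()->` syntax.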
Syntax of StrBank
In my proposal, a structured bank's elements can be accessed via the
"dot" operator, as if the StrBank object was the bank itself:
struct qtrarow {
  // Conditions for good central track:
  float th_min_c; // min. theta [degrees]
  float th_max_c; // max. theta [degrees]
  float rs_max_c; // max. start radius [cm]
  float re_min_c; // min. end radius [cm]
  float rl_min_c; // min. radial length [cm]
  float pt_min_c; // min. p_T [GeV]
  // and much more...
};
void f () {
  StrBank<qtrarow> qtra ("QTRA", 0);
  float thet_min_cen = qtra.th_min_c; // qtra acts like structure
  qtrarow cuts = *qtra; // qtra acts like pointer
}
So, on the one hand, the object qtra is treated as if it actually
contained the members th_min_c, th_max_c, and so on; on the other hand
it acts like a pointer. This is somewhat inconsistent.
The syntax might be more consistent if qtra were always viewed as a sort
of pointer:
void f () {
  StrBank<qtrarow> qtra ("QTRA", 0);
  float thet_min_cen = qtra->th_min_c; // qtra acts like pointer
  //                       ^^ not "." anymore!
  qtrarow cuts = *qtra; // stays
}
This would also be more consistent with the treatment of tables:
A Table object is syntactically similar to an array, so it's very
close to a pointer, and indexing an array is somewhat equivalent to
dereferencing a pointer.
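The pointer-like variant needs nothing more than `operator->` and `operator*`. Here is a sketch under simplifying assumptions: the "bank" is just a row object in memory, passed to the constructor, whereas a real StrBank would locate it in the BOS common; `theta_range` is a made-up user function.

```cpp
#include <cassert>

struct qtrarow {
  float th_min_c;   // min. theta [degrees]
  float th_max_c;   // max. theta [degrees]
};

template <class row>
class StrBank {
public:
  explicit StrBank (row* data): data_ (data) {}
  bool is_open () const    { return data_ != 0; }
  // Both access paths go through the same pointer.
  row* operator-> () const { assert (is_open ()); return data_; }
  row& operator*  () const { assert (is_open ()); return *data_; }
private:
  row* data_;
};

float theta_range (const StrBank<qtrarow>& qtra) {
  qtrarow cuts = *qtra;                    // copy the whole row
  return qtra->th_max_c - cuts.th_min_c;   // member access through "->"
}
```

With only these two operators, `qtra.th_min_c` no longer compiles, so the object behaves like a pointer throughout.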
Should bank name and number be kept in the "Bank" object?
Currently, only the index of a bank is stored in the Bank object, not
the name or number of the bank the user wanted to open.
Therefore, if a bank is not found in the BOS common, its name and number
are no longer available, e.g. for use in error messages. Also, code like
this would not be possible:
void f () {
  StrBank<qtrarow> qtra ("QTRA", 0); // try to open bank
  if (!qtra.is_open()) {
    // do something to get bank, e.g. fetch it from database
    qtra.reopen(); // perform again nlink ("QTRA", 0)
  }
}
So the answer is probably "yes". But then questions arise: Should a method
name() return the stored name, or take the name from the BOS common?
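A sketch of a Bank that keeps the requested name and number, so that reopen() and meaningful error messages become possible. The BOS common is replaced by a `std::map` (`FakeBos`, `fake_nlink`) purely for illustration; in this sketch name() returns the stored name, which is only one of the two options just mentioned.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Stand-in for the BOS common: maps (name, number) to a bank index.
typedef std::map<std::pair<std::string, int>, int> FakeBos;

// Stand-in for nlink: returns 0 if the bank does not exist.
inline int fake_nlink (const FakeBos& bos, const std::string& name, int nr) {
  FakeBos::const_iterator it = bos.find (std::make_pair (name, nr));
  return it == bos.end () ? 0 : it->second;
}

class Bank {
public:
  Bank (const FakeBos& bos, const std::string& name, int nr)
    : bos_ (&bos), name_ (name), number_ (nr), index_ (0) { reopen (); }

  bool is_open () const { return index_ != 0; }

  // Repeat the lookup with the stored name and number.
  bool reopen () {
    index_ = fake_nlink (*bos_, name_, number_);
    return is_open ();
  }

  // Name and number stay available even if the bank is missing.
  const std::string& name () const { return name_; }
  int number () const { return number_; }
  int index () const  { return index_; }

private:
  const FakeBos* bos_;
  std::string    name_;
  int            number_;
  int            index_;
};
```

The cost is one string and one integer per Bank object, which seems small compared with the gain in diagnostics.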
Should the "row" class contain a default name and number of a bank?
Currently, the user must always provide the name of the bank he wants to
open, although using a predefined structure generally implies a certain
bank name.
Code like this might be nice:
void f () {
  Table<dtrarow> dtra; // open "DTRA" bank with number 0
}
For this one would need a default bank name (and possibly number) in the
class "dtrarow". This could be done the following way:
// file dstbanks.h
class dtrarow {
public:
  // Declared in each row class: static members of a common base class
  // would be shared by all row types. They do not change the row layout.
  static const char name [5];
  static const int number;
  float ptinv_tr;
  // and all the rest...
};
This would be used in the Table template:
// file banks.h
template <class row>
class Table: public Bank {
public:
  Table (): Bank (row::name, row::number) {
    // check miniheader;
  }
  // and all the rest...
};
Then one would have, in a special file, initializers for name and number:
// file dstbanks.C
#include "banks.h"
#include "dstbanks.h"
const char dtrarow::name [5] = "DTRA";
const int dtrarow::number = 0;
This is somewhat inconvenient, but at the moment it has to be done anyway
as soon as one wants to use TablePointers, and I see no way to avoid it
there.
Also, the above initializations could be automated in exactly the same way
as the transition from DDL to structures/classes.
Banks with wrong format
One common case where one has banks with a "wrong" format is that new
columns have been added to a bank, and one now runs over old data.
In this particular case it might be useful to have a default treatment of
banks with too few columns:
If the constructor of a Table observes that columns are missing, it might
create a new Table with the right number of columns, use the default
constructor of the "row" type to initialize all elements, and then copy
the elements present in the table which is too small.
This would not work for pathological cases like DMIS, where the sequence
of columns has been changed, but it would for many other cases.
The alternative would be to leave it to the user to catch the exception
thrown in such a case, perform the same thing, and continue. This might
also be OK, and makes the user aware of the problem, and the fact that the
default solution may not work.
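The default treatment could look roughly like this. It is a sketch under strong assumptions: the row type consists only of 32-bit columns in bank order, new columns were appended at the end, and the old bank is available as a plain array of words. The names `trkrow` and `widen` are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical row type whose last column was added later.
struct trkrow {
  float pt;
  float theta;
  float chi2;   // new column, absent in old data
  trkrow (): pt (0), theta (0), chi2 (-1) {}   // defaults for missing columns
};

// Copies a bank stored with `ncols` words per row into full-size rows.
template <class row>
std::vector<row> widen (const float* words, int nrows, int ncols) {
  const int full = static_cast<int> (sizeof (row) / sizeof (float));
  assert (nrows >= 0 && ncols >= 0 && ncols <= full);
  std::vector<row> out (nrows);   // rows start with their default values
  for (int i = 0; i < nrows; i++) {
    // Overwrite only the leading columns that are present in the old bank.
    std::memcpy (static_cast<void*> (&out[i]), words + i * ncols,
                 ncols * sizeof (float));
  }
  return out;
}
```

Columns missing in the old data keep whatever the default constructor of the row type assigns, here `chi2 = -1`.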
"Free format" list-like banks
Especially in H1SIM, banks are often used that have a list-like structure,
i.e. they consist of variable-length rows which start with the number of
elements of the row, followed by a sequence of elements which typically
depends on the second element of the row.
The structure of an individual row is thus similar to a union (or a
variant record in Pascal) with a selection field.
It might make sense to try to define templates which can take such unions
as row descriptors, and do not use miniheaders, but provide singly-linked
list type iterators for accessing the bank.
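An iterator of this kind might look as follows. This is a sketch with assumptions of my own: the bank is an array of 32-bit words, word 0 of each row is the row length in words including the length word itself, and word 1 is the selector. The names `ListIterator` and `count_rows` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Forward iterator over a list-like bank.
class ListIterator {
public:
  ListIterator (const std::int32_t* pos, const std::int32_t* end)
    : pos_ (pos), end_ (end) {}

  // A row is valid if length and selector exist and the row fits the bank.
  bool valid () const {
    return pos_ < end_ && pos_[0] >= 2 && end_ - pos_ >= pos_[0];
  }
  std::int32_t length () const   { assert (valid ()); return pos_[0]; }
  std::int32_t selector () const { assert (valid ()); return pos_[1]; }
  const std::int32_t* payload () const { assert (valid ()); return pos_ + 2; }

  // Step to the next row, as in a singly-linked list.
  ListIterator& operator++ () {
    assert (valid ());
    pos_ += pos_[0];
    return *this;
  }

private:
  const std::int32_t* pos_;
  const std::int32_t* end_;
};

// Counts the rows with a given selector value.
int count_rows (const std::int32_t* bank, int nwords, std::int32_t selector) {
  int n = 0;
  for (ListIterator it (bank, bank + nwords); it.valid (); ++it) {
    if (it.selector () == selector) ++n;
  }
  return n;
}
```

A template version would additionally map the payload onto the union member chosen by the selector.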
Use of int32 and float32
My implementation of Banks generally assumes that int and float are 32-bit
numbers. It would probably be better to define
typedef int int32;
typedef float float32;
and use these types wherever one relies on this assumption. Otherwise the
transition to a 64-bit compiler could mean lots of code rewriting!
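With a compiler that offers `<cstdint>` and `static_assert`, the assumption can also be checked at compile time. This is a sketch only; `examplerow` is a hypothetical row type, and basing `int32` on `std::int32_t` instead of `int` is my choice here.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

typedef std::int32_t int32;     // exactly 32 bits where this type exists
typedef float        float32;   // assumed to be a 32-bit float

// Compilation fails if the assumption does not hold on this platform.
static_assert (sizeof (int32)   * CHAR_BIT == 32, "int32 must have 32 bits");
static_assert (sizeof (float32) * CHAR_BIT == 32, "float32 must have 32 bits");

// A row then states its layout independently of the compiler's int.
struct examplerow {
  int32   id;
  float32 value;
};
```

Only the two typedefs would have to change when moving to a platform with different basic types.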
Garbage collection
When FORTRAN modules are called from the C++ program, garbage collection
cannot be controlled or detected automatically by the C++ program. After
a garbage collection, all bank indices potentially point to the wrong
location.
A similar situation arises from dropping a bank.
Several "solutions" to this problem may be considered:
- The FORTRAN "solution":
Simply demand that no bank indices are reused after garbage
collection. In the C++ case this means that no Bank object may be in
scope during garbage collection.
This solution is not as silly as it seems:
Practically no user ever calls the garbage collector herself. This is
normally done only by FSEQR and MODULF. Since the only allowed
communication between modules is by banks, not by variables (and bank
indices are variables), no problem arises.
- Perform some check at every bank access, e.g. check that the stored
bank name is equal to the contents of iw[base-4].
This is certainly too slow and (I think) ugly.
- Don't store the bank index at all, but store name and number of the
desired bank, and always perform an "nlink" to access the bank.
This option is probably the safest, but surely too slow for practical
purposes.
- Fast version of the last solution:
Probably the slowest step in "nlink" is the lookup of the name, i.e.
evaluation of IW(NAMIND("BANK")). But the result of NAMIND("BANK") is
guaranteed not to change during a program, even after garbage
collection, so it can be safely stored.
Then an undocumented entry into NLINK, namely NLINC, can apparently be
used to get the actual link to the desired bank. Since in most cases
the bank with the lowest number is wanted by the user, i.e. the bank
given by IW(NAMIND("BANK")), this is bound to be very fast.
Actually, one could provide means to "allow" the class to use
IW(NAMIND("BANK")) instead of NLINC(NAMIND("BANK"), 0) in cases where
this is thought to be safe (e.g. DST banks).
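The idea behind the last two options can be shown with a toy model. This is explicitly NOT the real BOS layout and does not call NAMIND, NLINK or NLINC: `ToyBos` only mimics the property that the name index stays fixed while bank indices move during garbage collection. All names in the sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class ToyBos {
public:
  // Slow step: string lookup. The returned slot never changes afterwards.
  int namind (const std::string& name) {
    std::map<std::string, int>::iterator it = slots_.find (name);
    if (it != slots_.end ()) return it->second;
    chains_.push_back (std::map<int, int> ());
    int slot = static_cast<int> (chains_.size ()) - 1;
    slots_[name] = slot;
    return slot;
  }
  // Fast step: find bank number `nr` via the name slot; 0 if absent.
  int link (int slot, int nr) const {
    assert (slot >= 0 && slot < static_cast<int> (chains_.size ()));
    std::map<int, int>::const_iterator it = chains_[slot].find (nr);
    return it == chains_[slot].end () ? 0 : it->second;
  }
  void create (const std::string& name, int nr, int index) {
    chains_[namind (name)][nr] = index;
  }
  // "Garbage collection": all bank indices move, name slots stay.
  void collect (int shift) {
    for (std::size_t s = 0; s < chains_.size (); s++) {
      std::map<int, int>::iterator it;
      for (it = chains_[s].begin (); it != chains_[s].end (); ++it)
        it->second -= shift;
    }
  }
private:
  std::map<std::string, int>       slots_;
  std::vector<std::map<int, int> > chains_;
};

// Handle that stores the name slot and number, never the bank index.
class Bank {
public:
  Bank (ToyBos& bos, const std::string& name, int nr = 0)
    : bos_ (&bos), slot_ (bos.namind (name)), number_ (nr) {}
  int index () const { return bos_->link (slot_, number_); }  // always current
private:
  ToyBos* bos_;
  int     slot_;
  int     number_;
};
```

The string lookup happens once per Bank object; every later access only pays for the cheap link step and stays valid across garbage collection.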
Work banks
Currently, work banks are not used (and probably not usable) in the proposed
scheme. An important hurdle is that BOS takes the address of a workbank
index and changes its value during garbage collection. Therefore workbank
indices must be stored in common blocks.
The C++ solution would be to have a workbank class that keeps and manages a
static array of workbank indices. It is probably not much overhead to have
something like this, but it requires quite some thought how to do it. The
STL standard container classes are probably very useful for that.
Probably the best would be to have a base class "WorkObject" which
could be inherited by "WorkBank", "WorkStrBank", and "WorkTable".
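A possible starting point is sketched below. It assumes a fixed pool of 64 slots and leaves out every call into BOS; the member names (`index_address`, `slots_in_use`) are hypothetical. The essential property is only that the slot holding an index never moves, so that its address could be handed to BOS.

```cpp
#include <cassert>
#include <stdexcept>

class WorkObject {
public:
  WorkObject (): slot_ (acquire ()) {}
  ~WorkObject () {
    assert (used_[slot_]);
    pool_[slot_] = 0;
    used_[slot_] = false;
  }

  int* index_address () const { return &pool_[slot_]; }  // stable address
  int  index () const         { return pool_[slot_]; }

  static int slots_in_use () {
    int n = 0;
    for (int i = 0; i < kSlots; i++) if (used_[i]) ++n;
    return n;
  }

private:
  WorkObject (const WorkObject&);              // not copyable: each slot
  WorkObject& operator= (const WorkObject&);   // has exactly one owner

  // Reserve the first free slot of the static pool.
  static int acquire () {
    for (int i = 0; i < kSlots; i++) {
      if (!used_[i]) { used_[i] = true; return i; }
    }
    throw std::runtime_error ("no free work bank slot");
  }

  static const int kSlots = 64;
  static int  pool_[kSlots];   // the work bank indices themselves
  static bool used_[kSlots];
  int slot_;
};

int  WorkObject::pool_[WorkObject::kSlots] = { 0 };
bool WorkObject::used_[WorkObject::kSlots] = { false };

// Derived types would only add their own view of the data.
class WorkBank: public WorkObject {};
```

A container-based pool would lift the fixed limit, but only if it never relocates its elements while BOS still holds their addresses.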
Database banks
Wouldn't it be nice to have a "database bank" class which issues a "UGTBNK"
for you every time you want to open the bank?
It would be nice to have some sort of base class "DatabaseBased" which can be
inherited by "DatabaseBank", "DatabaseStrBank", and "DatabaseTable".