Points for Discussion about the BOS-C++ Interface

The BOS-C++ interface presented here is only a first trial. There are quite a few questions that should be discussed before a "production strength" implementation of something based on these ideas should be undertaken.

These are the points that come to my mind. Mail me (Benno List, blist@mail.desy.de) your opinions or further questionable issues!

Style Guide

When many people work on the same software, a style guide is always a good thing to have, and some steps have been taken towards a FORTRAN style guide for H1 (e.g. H1 software note 54 and Stephan Egli's example code in his DST guide).
For a complicated and new language like C++, a style guide is even more important. It should help people to avoid pitfalls and non-portable constructs, and to produce reliable and easy-to-read code.
BaBar has already taken steps in that direction: Look at their BABAR Programming Guidelines!
There is also a much larger style guide from the Ellemtel corporation, although some of its points are debatable because of their strictness. However, it does not cover some issues connected with more modern C++ features such as exceptions and namespaces.
Here are just a few topics that came to my mind and should be addressed.

The for-scope problem
Between older C++ definitions and the ANSI draft standard the rules for the scope of variables in for loops have changed. Consider this example:
```
      #include <iostream.h>
      int i=1;                      // global variable
      int main () {
        for (int i=0; i < 5; i++) { // local i hides global i in for-loop
          if (i == 3) break;
        }
        cout << i;                  // local or global i?
        return 0;
      }
```
This code is accepted under both the old and the new rules, so a compiler has no reason to warn you about it. Under the old rules the i defined in the for-loop header remains in scope after the loop, so the program should print "3" (I did not try it). According to the ANSI draft standard, the scope of the i defined in the loop header is only the loop itself, so the program should print "1".
This problem is bound to stay with us for a long time and should be avoided. Therefore definitions in the header of a for-loop should generally be forbidden.
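A minimal sketch of the style this rule leads to: the loop counter is declared before the loop, so its scope is the same under both sets of rules. The function name find_stop_index is only an illustration.

```cpp
// Sketch only: find_stop_index is a hypothetical name used for illustration.
// The loop counter is declared before the for statement, so its scope does
// not depend on which for-scope rule the compiler implements.
int find_stop_index (int stop, int n) {
  int i;                          // visible after the loop under both rules
  for (i = 0; i < n; i++) {
    if (i == stop) break;
  }
  return i;                       // index where the loop stopped, or n
}
```

Calling find_stop_index (3, 5) gives 3 with either kind of compiler, because there is only one i in the function.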
Use of exceptions
Should exceptions be used? (I think yes!) If so, how? Are derived exception classes allowed? (I think they should be, but compiler support is still poor.)
What should be done as long as we have to live with compilers that compile code using exceptions but ignore them? Often, an exception is the way out of a situation that would otherwise lead to a segmentation violation, most notably when a function would be forced to return a 0 pointer. Therefore printing an error message after the (ignored) throw statement, possibly "enhanced" by calling H1STOP, may at least give the user a hint why his program crashed.
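The idea of a message placed after the throw statement could look roughly like this. It is only a sketch: require_index is a hypothetical helper, and the convention that index 0 means "bank not found" is an assumption made for the example.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical helper: turns an index of 0 (assumed to mean "bank not found")
// into an exception, and passes any other index through unchanged.
int require_index (int index, const char* bankname) {
  if (index == 0) {
    throw std::runtime_error ("bank not found");
    // Reached only if the compiler ignores the throw statement above:
    std::fprintf (stderr, "Bank %s not found\n", bankname);
    // A call to H1STOP could follow here.
  }
  return index;
}
```

With a compiler that supports exceptions the message is never printed and the caller can catch the exception; with a compiler that ignores throw, the message is the only trace left before the crash.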
Naming conventions
Some sort of naming convention should be introduced. Look at the BaBar style guide for good recommendations.
Namespaces?
The ANSI standard defines namespaces. I think they are a good tool to make it transparent where a routine comes from. One would then write e.g. h1rec::cjcrec() instead of just cjcrec(), or h1util::jrdata(name, iret).
If that is too clumsy, all names in a namespace can be made globally available with a using directive. This would be a good idea for h1util, look, or bos, but maybe not for the occasional use of a single h1rec routine.
To keep old C++ code working, the header files could contain such a directive, switched on by a preprocessor symbol, which makes all names available globally.
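A small sketch of how a header could offer both styles. The function name_length and the preprocessor symbol H1_GLOBAL_NAMES are invented for this example; they are not part of any existing H1 header.

```cpp
// Sketch only: the function is a stand-in, not a real H1 routine.
namespace h1util {
  // Hypothetical stand-in; returns the number of characters in name.
  inline int name_length (const char* name) {
    int n = 0;
    while (name[n] != '\0') n++;
    return n;
  }
}

// Hypothetical switch for old code: define H1_GLOBAL_NAMES before including
// the header to make all names of the namespace available globally.
#ifdef H1_GLOBAL_NAMES
using namespace h1util;
#endif
```

New code would write h1util::name_length ("DTRA"), while old code compiled with H1_GLOBAL_NAMES defined could keep calling name_length ("DTRA") directly.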

Wrapper files: C or C++?

Thanks mainly to Martine Charlet, a large collection of C wrapper files for H1 software FORTRAN routines already exists. This is surely very valuable.
Nevertheless, one should consider providing special C++ header files, which can use function overloading and default arguments to provide sensible shortcuts.
Eventually, C and C++ wrappers could be combined in the same header files by using the predefined __cplusplus macro.
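A combined header might be organised as in the following sketch. The names nlink_c and nlink, and the dummy function body, are assumptions for illustration only; the existing wrapper files may look different.

```cpp
// Sketch only: nlink_c stands for an existing C wrapper; here it is a dummy.
#ifdef __cplusplus
extern "C" {
#endif

int nlink_c (const char* name, int number);   /* plain C interface */

#ifdef __cplusplus
}  // extern "C"

// C++-only shortcut: the bank number defaults to 0.
inline int nlink (const char* name, int number = 0) {
  return nlink_c (name, number);
}
#endif

// Dummy definition so that the sketch links; a real wrapper would call FORTRAN.
extern "C" int nlink_c (const char* name, int number) {
  return (name[0] != '\0') ? 100 + number : 0;
}
```

A C program sees only the declaration of nlink_c, while a C++ program can additionally write nlink ("DTRA") and get bank number 0 by default.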

Using the STL?

The ANSI standard defines a standard template library. I think it's just too powerful to ignore.

Exceptions

Should they be used at all? (I think: yes)
What should be done until compilers that support exception handling exist on all machines?
A suitable hierarchy of exceptions must be defined.
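To make the question more concrete, here is one possible shape of such a hierarchy. All class names are hypothetical, and deriving from std::runtime_error is just one option among several.

```cpp
#include <stdexcept>
#include <string>

// All class names are hypothetical. A common base class lets the user catch
// every BOS-related problem with a single handler.
class BosError : public std::runtime_error {
public:
  explicit BosError (const std::string& what) : std::runtime_error (what) {}
};

// Thrown when a requested bank does not exist.
class BankNotFound : public BosError {
public:
  explicit BankNotFound (const std::string& name)
    : BosError ("bank not found: " + name) {}
};

// Thrown when a bank exists but does not have the expected format.
class BankFormatError : public BosError {
public:
  explicit BankFormatError (const std::string& name)
    : BosError ("unexpected bank format: " + name) {}
};
```

A handler written as catch (const BosError& e) then also receives BankNotFound and BankFormatError, which only works on compilers that support derived exception classes properly.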

Locking mechanism/associated structures/parallel banks

A widespread problem in HEP analyses is that we are offered a large collection of reconstructed objects which are just hypotheses (e.g. track hypotheses, electron candidates).
A kind of flagging and locking mechanism is therefore often needed. In H1 software this is often done by using banks which are parallel to one another.

Should there be a special construct for treating parallel banks, e.g. for the calorimeter banks?
Could one have a special locking class template which maintains a structure parallel to a bank and can be used to lock or flag tracks? What should the user interface for such a structure look like? Should it be a special class, say "LockableTable", with predefined access functions like dtra[i].flag(1)? Or maybe one could have a sort of flag class, say
```
  class flags {
    public:
      int goodtrack;
      int good_forward_track;
  }; 
```
and associate it with the table:
```
  void f () {
    LockableTable dtra ("DTRA", 0);
    int i;
    for (i = 1; i <= dtra.rows(); i++) {
      if (dtra[i].lock()->goodtrack) {
        // do something
      }
    }
  } 
```
The "lock()" function would return a pointer to the corresponding element of the "flags" class. But the syntax is still not really nice.
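One conceivable way to keep the flags parallel to the rows is sketched below. It is not a worked-out proposal: the template parameters, the flag() accessor and the use of std::vector instead of a real BOS bank are all assumptions made to keep the example self-contained.

```cpp
#include <vector>

struct dtrarow { float ptinv_tr; };          // shortened stand-in row type
struct flags   { int goodtrack; int good_forward_track; };

// Hypothetical sketch: the rows live in one vector, the flags in a parallel
// vector of the same length.
template <class Row, class Flags>
class LockableTable {
public:
  explicit LockableTable (const std::vector<Row>& rows)
    : rows_ (rows), flags_ (rows.size ()) {}   // all flags start at 0
  int rows () const { return static_cast<int> (rows_.size ()); }
  Row&   operator[] (int i) { return rows_[i - 1]; }   // 1-based, as in BOS
  Flags& flag (int i)       { return flags_[i - 1]; }  // flags of row i
private:
  std::vector<Row>   rows_;
  std::vector<Flags> flags_;
};
```

With this variant the test in the loop would read dtra.flag (i).goodtrack, which avoids the pointer returned by lock() but separates the flags from the row syntax.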

Syntax of StrBank

In my proposal, a structured bank's elements can be accessed via the "dot" operator, as if the StrBank object was the bank itself:

  struct qtrarow {
  // Conditions for good central track:
    float th_min_c;  // min. theta [degrees] 
    float th_max_c;  // max. theta [degrees] 
    float rs_max_c;  // max. start radius [cm] 
    float re_min_c;  // min. end radius [cm] 
    float rl_min_c;  // min. radial length [cm] 
    float pt_min_c;  // min. p_T [GeV] 
  // and much more... 
  };
  
  void f () {
    StrBank<qtrarow> qtra ("QTRA", 0);
  
    float thet_min_cen = qtra.th_min_c;  // qtra acts like structure
  
    qtrarow cuts = *qtra;                // qtra acts like pointer
  
  }

So on the one hand the object qtra is treated as if it actually contained the members th_min_c, th_max_c, and so on; on the other hand it acts like a pointer. This is somewhat inconsistent.
The syntax might be more consistent if qtra was always viewed as a sort of pointer:

  void f () {
    StrBank<qtrarow> qtra ("QTRA", 0);
  
    float thet_min_cen = qtra->th_min_c;  // qtra acts like pointer
    //                    ^^ not "." anymore!
  
    qtrarow cuts = *qtra; // stays
  }

This would also be more consistent with the treatment of tables:
A Table object is syntactically similar to an array, so it's very close to a pointer, and indexing an array is somewhat equivalent to dereferencing a pointer.
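The pointer-like variant needs very little machinery, as this sketch suggests. It assumes that StrBank is a template over the row type, and, to stay self-contained, the constructor takes a pointer to the row data instead of a bank name and number.

```cpp
struct qtrarow { float th_min_c; float th_max_c; };   // shortened stand-in

// Hypothetical sketch: StrBank only wraps a pointer to the row data.
// A real implementation would obtain the pointer from the BOS common.
template <class Row>
class StrBank {
public:
  explicit StrBank (Row* data) : data_ (data) {}
  Row* operator-> () const { return data_; }    // allows qtra->th_min_c
  Row& operator*  () const { return *data_; }   // allows *qtra
  bool is_open () const { return data_ != 0; }
private:
  Row* data_;
};
```

Both qtra->th_min_c and *qtra then follow from the two operators, and no member of the row type has to be repeated in StrBank itself.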

Should bank name and number be kept in the "Bank" object?

Currently, only the index of a bank is stored in the Bank object, not the name or number of the bank the user wanted to open.
Therefore, when a bank is not found in the BOS common, its name and number are no longer available, e.g. for use in error messages. Code like this would not be possible either:

  void f () {
    StrBank<qtrarow> qtra ("QTRA", 0); // try to open bank
  
    if (!qtra.is_open()) {
       // do something to get bank, e.g. fetch it from database
       qtra.reopen();                  // perform again nlink ("QTRA", 0)
    }
  }

So the answer is probably "yes". But then new questions arise: Should a method name() return the stored name, or take the name from the BOS common?
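A sketch of a Bank class that keeps the requested name and number could look as follows. The BOS common is replaced by a std::map, and fake_nlink, FakeBos and the member names are invented for the example; here name() returns the stored name, which is only one of the two possible answers to the question above.

```cpp
#include <map>
#include <string>
#include <utility>

// Stand-in for the BOS common: maps (name, number) to a bank index.
typedef std::map<std::pair<std::string, int>, int> FakeBos;

// Stand-in for nlink: returns the bank index, or 0 if the bank is missing.
inline int fake_nlink (const FakeBos& bos, const std::string& name, int nr) {
  FakeBos::const_iterator it = bos.find (std::make_pair (name, nr));
  return it == bos.end () ? 0 : it->second;
}

class Bank {
public:
  Bank (const FakeBos& bos, const std::string& name, int nr)
    : bos_ (&bos), name_ (name), nr_ (nr),
      index_ (fake_nlink (bos, name, nr)) {}
  bool is_open () const { return index_ != 0; }
  void reopen () { index_ = fake_nlink (*bos_, name_, nr_); }  // look up again
  const std::string& name () const { return name_; }   // the requested name
  int number () const { return nr_; }                  // the requested number
private:
  const FakeBos* bos_;
  std::string name_;
  int nr_;
  int index_;
};
```

Because name and number are stored, an error message can mention them even when the bank was never found, and reopen() needs no arguments.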

Should the "row" class contain a default name and number of a bank?

Currently, the user always has to provide the name of the bank he wants to open, although using a predefined structure generally implies a certain bank name.
Code like this might be nice:

  void f () {
    Table<dtrarow> dtra; // open "DTRA" bank with number 0 (no parentheses: "dtra ()" would declare a function)
  }

For this one would need a default bank name (and possibly number) in the class "dtrarow". This could be done the following way:

  // file dstbanks.h
  class dtrarow {
  public:
    static char name [5];   // default bank name
    static int number;      // default bank number
    float ptinv_tr;
    // and all the rest...
  };

Note that the static members have to be declared in each row class itself: static members of a common base class would be shared by all row classes. This would be used in the Table template:

  // file banks.h
  template <class row>
  class Table: public Bank {
  public:
    Table (): Bank (row::name, row::number) {
      // check miniheader;
    }
  // and all the rest...
  };

Then one would have, in a special file, initializers for name and number:

  // file dstbanks.C
  #include "banks.h"
  #include "dstbanks.h"
  
  char dtrarow::name [5] = "DTRA";
  int  dtrarow::number   = 0;

This is somewhat inconvenient, but at the moment it has to be done anyway as soon as one wants to use TablePointers, and I see no way to avoid it there.
Also, the above initializations could be automated in exactly the same way as the translation from DDL to structures/classes.

Banks with wrong format

A common case of a bank with a "wrong" format is that new columns have been added to a bank, and one now runs over old data.
In this particular case it might be useful to have a default treatment of banks with too few columns:
If the constructor of a Table notices that columns are missing, it could create a new Table with the right number of columns, use the default constructor of the "row" type to initialize all elements, and then copy the elements present in the table which is too small.
This would not work for pathological cases like DMIS, where the sequence of columns has been changed, but it would for many other cases.
The alternative would be to leave it to the user to catch the exception thrown in such a case, do the same thing, and continue. This might also be OK; it makes the user aware of the problem and of the fact that the default solution may not work.
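The default treatment described above amounts to a copy with padding, roughly as in this sketch. The function widen_rows is hypothetical, rows are represented as plain floats in a std::vector, and it is assumed that new columns were only appended at the end of a row.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper. old_data holds complete rows of old_cols words each;
// defaults holds one complete row in the new format.
// Assumption: new columns were only appended (old_cols <= defaults.size ()).
std::vector<float> widen_rows (const std::vector<float>& old_data,
                               std::size_t old_cols,
                               const std::vector<float>& defaults) {
  const std::size_t new_cols = defaults.size ();
  const std::size_t nrows = old_cols ? old_data.size () / old_cols : 0;
  std::vector<float> result;
  result.reserve (nrows * new_cols);
  std::size_t r, c;
  for (r = 0; r < nrows; r++) {
    for (c = 0; c < new_cols; c++) {
      // Take the stored value where it exists, the default otherwise.
      result.push_back (c < old_cols ? old_data[r * old_cols + c]
                                     : defaults[c]);
    }
  }
  return result;
}
```

A case like DMIS, where columns were reordered, cannot be repaired this way, because the copy relies on old and new columns having the same positions.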

"Free format" list-like banks

Especially in H1SIM, banks are often used that have a list-like structure, i.e. they consist of variable-length rows, each of which starts with the number of elements of the row, followed by a sequence of elements whose layout typically depends on the second element of the row.
The structure of an individual row is thus similar to a union (or a variant record in Pascal) with a selection field.
It might make sense to try to define templates which can take such unions as row descriptors, and do not use miniheaders, but provide singly-linked list type iterators for access to the bank.
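Stepping through such a bank could be done with a small forward iterator like the one sketched here. The class name is invented, and the sketch assumes that the length word counts only the words following it; if the real banks count the length word as well, next() has to be adjusted.

```cpp
#include <cstddef>

// Hypothetical forward iterator over variable-length rows.
// Assumption: the first word of each row is the number of words that follow.
class ListRowIterator {
public:
  ListRowIterator (const int* data, std::size_t nwords)
    : pos_ (data), end_ (data + nwords) {}
  bool at_end () const { return pos_ >= end_; }
  int  length () const { return pos_[0]; }          // words after the length
  const int* elements () const { return pos_ + 1; } // first element of the row
  void next () { pos_ += 1 + pos_[0]; }             // skip to the next row
private:
  const int* pos_;   // start of the current row
  const int* end_;   // one past the last word of the bank
};
```

A union-based row descriptor could then be laid over the words returned by elements (), with the selection field deciding which member of the union is valid.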

Use of int32 and float32

My implementation of Banks generally assumes that int and float are 32-bit numbers. It would probably be better to define

  typedef int int32;
  typedef float float32;

and use these types wherever one relies on this assumption. Otherwise the transition to a 64-bit compiler could mean lots of code rewriting!
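With a newer compiler the assumption can also be checked at compile time. The sketch below uses static_assert, which requires C++11 and is therefore not available on the compilers discussed in this note; it is shown only to illustrate the idea.

```cpp
#include <climits>

typedef int   int32;     // assumed to be 32 bits wide on this platform
typedef float float32;   // assumed to be 32 bits wide on this platform

// Compile-time checks (C++11): compilation stops if the assumption is wrong,
// instead of the program silently misreading bank contents.
static_assert (sizeof (int32)   * CHAR_BIT == 32, "int32 must have 32 bits");
static_assert (sizeof (float32) * CHAR_BIT == 32, "float32 must have 32 bits");
```

On a platform where int is not 32 bits wide, only the typedef lines would have to change, and the checks would confirm that the new choice is right.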

Garbage collection

After a garbage collection, all bank indices potentially point to the wrong location. If FORTRAN modules are called from the C++ program, the C++ program can neither control nor automatically detect when a garbage collection takes place.
A similar situation arises from dropping a bank.
Several "solutions" to this problem may be considered:

1. The FORTRAN "solution":
   Simply demand that no bank indices are reused after garbage collection. In the C++ case this means that no Bank object may be in scope during garbage collection.
   This solution is not as silly as it seems: Practically no user ever calls the garbage collector herself. This is normally done only by FSEQR and MODULF. Since the only allowed communication between modules is by banks, not by variables (and bank indices are variables), no problem arises.
2. Perform some check at every bank access, e.g. check that the stored bank name is equal to the contents of iw[base-4].
   This is certainly too slow and (I think) ugly.
3. Don't store the bank index at all, but store name and number of the desired bank, and always perform an "nlink" to access the bank.
   This option is probably the safest, but surely too slow for practical purposes.
4. Fast version of the last solution:
   Probably the slowest step in "nlink" is the lookup of the name, i.e. the evaluation of IW(NAMIND("BANK")). But the result of NAMIND("BANK") is guaranteed not to change during a program, even after garbage collection, so it can be safely stored.
   Then an undocumented entry into NLINK, namely NLINC, can apparently be used to get the actual link to the desired bank. Since in most cases the user wants the bank with the lowest number, i.e. the bank given by IW(NAMIND("BANK")), this is bound to be very fast.
   Actually, one could provide means to "allow" the class to use IW(NAMIND("BANK")) instead of NLINC(NAMIND("BANK"), 0) in cases where this is thought to be safe (e.g. DST banks).
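The fast version can be illustrated with a stand-in for the BOS common. In this sketch FakeLinks plays the role of the IW array indexed by the name index, and BankHandle is a hypothetical class; the real NAMIND and NLINC routines are not called.

```cpp
#include <vector>

// Stand-in for BOS: links[namind] holds the current index of the bank with
// the lowest number for that name. Garbage collection may change the values,
// but (as assumed here) never the position namind itself.
struct FakeLinks {
  std::vector<int> links;
};

// Hypothetical handle: stores only the name index and looks up the bank
// index again at every access, so it never holds a stale bank index.
class BankHandle {
public:
  BankHandle (const FakeLinks& bos, int namind)
    : bos_ (&bos), namind_ (namind) {}
  int index () const { return bos_->links[namind_]; }   // always current
private:
  const FakeLinks* bos_;
  int namind_;
};
```

The cost per access is one array lookup, and a bank that has been moved or dropped in the meantime is reflected by the next call to index ().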

Work banks

Currently, work banks are not used (and probably not usable) in the proposed scheme. An important hurdle is that BOS takes the address of a workbank pointer and changes its value during garbage collection. Therefore workbank indices must be stored in common blocks.
The C++ solution would be to have a workbank class that keeps and manages a static array of workbank indices. This is probably not much overhead, but it requires quite some thought about how to do it. The STL standard container classes are probably very useful for that.
Probably the best would be to have a base class "WorkObject" which could be inherited by "WorkBank", "WorkStrBank", and "WorkTable".
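A static array of workbank indices might be managed as in the following sketch. The class WorkIndexPool and its capacity are invented for the example. A plain array is used rather than a growing std::vector, because a vector may move its elements when it grows, and the addresses handed to BOS must stay valid.

```cpp
// Hypothetical sketch: a fixed pool of workbank indices with stable
// addresses, so that BOS could keep pointers to them and update them in place.
class WorkIndexPool {
public:
  enum { capacity = 64 };
  // Returns the address of a free slot (set to 0), or 0 if the pool is full.
  static int* acquire () {
    int i;
    for (i = 0; i < capacity; i++) {
      if (!used_[i]) { used_[i] = true; index_[i] = 0; return &index_[i]; }
    }
    return 0;
  }
  // Makes a slot obtained from acquire () available again.
  static void release (int* slot) { used_[slot - index_] = false; }
private:
  static int  index_[capacity];   // the workbank indices themselves
  static bool used_[capacity];    // which slots are handed out
};

int  WorkIndexPool::index_[WorkIndexPool::capacity];
bool WorkIndexPool::used_[WorkIndexPool::capacity];
```

A "WorkObject" base class could call acquire () in its constructor and release () in its destructor, so that the lifetime of the slot follows the lifetime of the object.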

Database banks

Wouldn't it be nice to have a "database bank" class which issues a "UGTBNK" for you every time you want to open the bank?
It would be nice to have some sort of base class "DatabaseBased" which can be inherited by "DatabaseBank", "DatabaseStrBank", and "DatabaseTable".