From: ernest@pundit.cithep.caltech.edu (Ernest Prabhakar)
Newsgroups: hepnet.lang.c++
Subject: A Radical Perspective on CLHEP
Date: 9 Dec 1993 22:58:27 GMT
Organization: California Institute of Technology, Pasadena
Distribution: world

Suggestions for Reorganizing CLHEP

Ernie Prabhakar, 12/8/93

These are my comments on how best to organize and implement a C++ base
library for High Energy Physics.  They are inspired by discussions that
took place at the Dec 6th BFast analysis workshop, and many of the  
ideas came from participants there.  However, this document reflects my  
own memory and prejudices, and should not be taken as representative of  
anything except my own opinions.

My primary recommendation is that we should view CLHEP as a whole the  
same way we do (or rather, should) view our basic OO programming  
problems.  In particular, we should break our analysis down into the  
following four parts.

	I.	The Problem
	II.	The Hierarchy
	III.	The Interfaces
	IV.	The Implementation
 
I.  The Problem

The problem CLHEP is trying to solve is that the high energy physics  
community is not experiencing the benefits of object technology.  While  
many of us have adopted C++ for our own projects, we are not able to  
reuse code  and share objects with other groups within HEP.  Duplicated  
code, incompatible interfaces, and inefficient implementations still  
dog our footsteps.

The current CLHEP distribution, while a noble first step, does not  
solve this more general problem.  There is no guidance for how to fit  
other large packages (like GISMO) into this framework.  There still
exist multiple, incompatible solutions to certain problems (like  
persistence, where CLHEP and Cheetah sharply part company).   And even  
for something as basic as vectors, there are disagreements about how  
(or even whether) extensions should be made (arbitrary-sized vectors,  
garbage collection, matrices, etc.).

There is also a non-technical problem with the current CLHEP effort: it
is not going anywhere.  Right now, there is very little innovation  
going on with the CLHEP classes.  Discussions on the newsgroup are  
anemic at best, and tend to result in minor feature additions and name  
changes.  Much C++ work going on at other places either ignores CLHEP  
or is not folded back into it.

The solution, I believe, is to treat CLHEP as sort of a giant  
meta-object library.  Rather than a single 'monolithic' standard, we
should treat it as a collection of standards, with well-defined  
dependencies, interfaces, and relationships.  These 'component'  
standards can be farmed out to different individuals, who can modify or  
add implementations as long as they preserve the basic interface.   
CLHEP itself will become sort of a clearinghouse and referee point for  
work done by other people, rather than the prime locale where work is  
done.

While this may seem impossibly idealistic, I believe it is the only  
functional model for distributed development.  It mirrors (to the  
extent of my understanding) the mechanisms used by GNU projects, which
are easily the most successful net-development programs in existence.  In
the implementation section, I will discuss some ideas for how to do  
this in a way that does not require extensive overhead or manpower.

The benefit of this is that we will be able to share in the fruits of
each other's development.  Even better, we will be able to
reimplement portions of each other's work (where needed by our own
particular optimizations) but still make use of everything else.

II.  The Hierarchy

As with any OOP problem, you need to set up a good hierarchy before you
even think about writing the actual objects.  The basic idea is that
CLHEP should be organized as a set of hierarchies (sort of a
meta-hierarchy of libraries, rather than objects): lists, math,
particle, etc.  Each hierarchy can inherit from or otherwise be
dependent on other hierarchies, but only in a well-defined way.

Several hierarchies would be grouped into a 'layer'.  The idea is that  
each layer would only depend on itself and previous layers.  This would  
make it easy for users to use a well-defined subset of CLHEP, without  
having to worry about dependencies and ordering.  Layers would be
numbered starting from zero, indicating their dependency.  Independent
libraries on the same layer could be differentiated by letter.  A
possible breakdown would be as follows.

Layer 0 - The Generic Layer:  non-physics-specific classes
	These are the sort of things one finds in commercial base
classes, such as libg++.  It could also include pure mathematical
classes like Helix and FourVector (but these would not know about
magnetic fields or masses or such).
	- Collection: List, Iterator, AssociativeList
	- Math: Vector, Matrix (inversion, rotation), Helix
	- Combinatorics?

Layer 1 - Data Management Layer:  our data-base type functionality
	These are the sort of things we could ignore completely if we
used a commercial OODB.  Since that is highly unlikely to happen in
HEP, we need to make do as best we can.  However, we should allow for
sites to 'slip in' a commercial DB in place of this layer.  Issues
include (see the sketch after this breakdown):
	- Storage
	- Translating from disk-storage to 'live' storage, if needed
	- Self-description of classes
	- Data Retrieval via queries  

Layer 2 - Particle Representations:  physics constructs
	These are the base classes we use for representing actual  
physics quantities.  Many of these may be subclasses of the  
mathematical constructs above.  Classes would include:
	- FourMomenta
	- Measured Data
	- Identified Data
	- a Particle Properties database

Layer 3 - Support Packages:  
	These are relatively low-level packages that could be used by
many different programs.  They cover both physics and computer-science
issues, and are best illustrated by example.  Some of these, I
suppose, are sufficiently basic that they could be put in Layer 1
instead of Layer 3.
	- Particle decayers
	- Tuple/Histogramming packages
	- Kinematic Fitters
	- Graphics packages
	- User-interface routines
	- Object-Oriented parsers

Layer 4 - End-user packages:  
	Here live relatively high-level packages that would tend to be  
single applications.  They are characterized by having 'frameworks' as  
well as normal objects.
	- Detector Simulations (e.g., GISMO)
	- Full Monte Carlo
	- Analysis Kits (e.g., SUSHI, CABS)
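
To make the Layer 1 issues of self-description and disk-to-'live'
translation concrete, here is a minimal sketch of what such a protocol
might look like.  The class and member names are my own invention for
illustration only, not an existing CLHEP or Cheetah interface:

// Hypothetical Layer 1 protocol: a persistent object describes itself
// and knows how to stream its own state.  A data manager (or a
// commercial OODB slipped in behind this layer) would work purely in
// terms of this interface.
#include <iostream>

class PersistentObject {
public:
    virtual ~PersistentObject() {}

    // Self-description: class name and schema version, so the storage
    // layer can identify what it is holding.
    virtual const char* className() const = 0;
    virtual int         schemaVersion() const = 0;

    // Translation between the disk representation and 'live' storage.
    virtual void writeTo(std::ostream& out) const = 0;
    virtual void readFrom(std::istream& in) = 0;
};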

The idea is that each layer is built upon abstractions exported from  
previous layers.  We have tried to follow this philosophy in SUSHI,  
where we have
	Layer 0: BaseLib and VMP (list and vector classes)
	Layer 1: DDM and Cheetah (Data management)
	Layer 2: PETS (data structures) 
 	Layer 3: HippoPlus
	Layer 4: SUSHI Schemas

In fact, the current challenge I am facing is trying to clean up the  
relationship between the DDM and PETS structures.  There may be some  
more work necessary on the relationship between data storage and data  
structures.
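
To make the layering itself concrete, here is a minimal sketch of the
Layer 0 / Layer 2 split described above.  The particular member
functions are assumptions on my part; the point is only that the
physics knowledge (energy, mass) lives in Layer 2, while Layer 0 stays
pure mathematics.

#include <cmath>

// Layer 0 (Math): a pure four-vector.  It knows Lorentz algebra, but
// nothing about particles, masses, or magnetic fields.
class FourVector {
public:
    FourVector(double t = 0, double x = 0, double y = 0, double z = 0)
        : t_(t), x_(x), y_(y), z_(z) {}

    double t() const { return t_; }
    double x() const { return x_; }
    double y() const { return y_; }
    double z() const { return z_; }

    // Minkowski invariant with a (+,-,-,-) metric.
    double invariant() const { return t_*t_ - x_*x_ - y_*y_ - z_*z_; }

private:
    double t_, x_, y_, z_;
};

// Layer 2 (Particle Representations): a four-momentum is a four-vector
// that also knows its components are (E, px, py, pz).
class FourMomentum : public FourVector {
public:
    FourMomentum(double e, double px, double py, double pz)
        : FourVector(e, px, py, pz) {}

    double energy() const { return t(); }
    // Invariant mass; assumes a physical (timelike) four-momentum.
    double mass()   const { return std::sqrt(invariant()); }
};

A FourMomentum can be passed to any Layer 0 routine expecting a
FourVector, but nothing in Layer 0 ever needs to know what a mass is.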

This is not intended to be a definitive list, but rather a 'first cut'
to get people thinking.  The important part is to figure out the
overall structure before getting bogged down in implementation details.

III.  The Interfaces

For maximum flexibility, I think each sublibrary should have a clearly  
defined interface.  That would include the libraries it is dependent  
on, as well as the interface which it exports.  It should also have a
simple name which can be referred to by other libraries.

I think it can be useful to define a 'minimal' interface, even if the  
library exports much more functionality than that.  For example, most  
users of a list really only need size, appending, random access, and  
iteration.  However, a good list class should include much more than  
that.  By declaring two different interfaces (protocols, I should
say), one a superset of the other, users can decide whether they want
portability or power.  Similarly, a developer can create an optimized
list for a particular case that supports the minimal interface but
adds its own extensions.
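
Here is a rough sketch of what the two list protocols might look like.
The member names, and the use of a generic Item pointer rather than a
template, are assumptions on my part made just to keep the sketch
short:

class Item;   // placeholder element type for the sketch

// Minimal list protocol: what most client code actually needs.
// Iteration can be done with size() plus at().
class BasicList {
public:
    virtual ~BasicList() {}
    virtual int   size() const = 0;
    virtual void  append(Item* item) = 0;
    virtual Item* at(int index) const = 0;   // random access
};

// Full protocol: a strict superset of the minimal one.  Code written
// against BasicList runs unchanged on any FullList implementation.
class FullList : public BasicList {
public:
    virtual void  insert(int index, Item* item) = 0;
    virtual void  remove(int index) = 0;
    virtual int   indexOf(const Item* item) const = 0;
    virtual void  sort() = 0;
};

A user who only codes to BasicList gets portability across every list
in the library; one who codes to FullList gets the power but accepts a
narrower choice of implementations.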

For example, the list libraries could be classified as "CLHEP_0L" -
layer '0', hierarchy 'L'.  More precisely, this is the list protocol
exported by CLHEP.  The 'basic' implementation could be something like
0L.1, which includes extensions beyond the base protocol.  Dependent
libraries could refer to '0L' or '0L.1' depending on how specific (and
implementation-dependent) their usage of the list class was.

I think the best way to define a protocol is simply to use a pure  
virtual class.  Note that this is primarily for notational convenience,  
though.  We can (and should) allow implementations which are  
non-virtual, but maintain the same member names to allow compatibility.   
It does introduce some risk, but that is an omnipresent danger with
C++ anyway.  There are some HEP applications where the overhead of a  
virtual table pointer would be prohibitive, and we don't want to lock  
them out.
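
As an illustration of a non-virtual but name-compatible implementation
(again with invented names): client code written as a template against
the protocol's member names compiles equally well against a virtual
BasicList implementation or against a lightweight class that carries
no vtable pointer at all.

class Item;   // same placeholder element type as in the list sketch

// A non-virtual list with the same member names as the BasicList
// protocol sketched earlier.  Fixed capacity, no virtual table, so it
// is usable in inner loops where that overhead would be prohibitive.
class FastList {
public:
    FastList() : count_(0) {}
    int   size() const { return count_; }
    void  append(Item* item) { if (count_ < MaxSize) items_[count_++] = item; }
    Item* at(int index) const { return items_[index]; }
private:
    enum { MaxSize = 64 };
    Item* items_[MaxSize];
    int   count_;
};

// Client code written against the protocol's *names*, not against a
// base class.  It works with FastList and with any BasicList alike.
template <class List>
int countNonNull(const List& list)
{
    int n = 0;
    for (int i = 0; i < list.size(); ++i)
        if (list.at(i) != 0)
            ++n;
    return n;
}

The cost is that such compatibility is checked only by convention (and
by the compiler when the template is instantiated), which is the risk
mentioned above.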

IV. The Implementation

While I will be the first to admit that the answers I have given are
probably half-baked, I do believe these are the relevant questions we
need to ask.  If the high energy physics community does not adopt
something like this, I am afraid CLHEP will become just another  
standard that people pay lip-service to but nobody actually uses.

I honestly think it will not take a lot of extra effort or overhead to  
implement a scheme like this.  What it will take is a willingness to  
put effort into organizing code rather than just writing it, and a  
change in our political mindset.  Some of the steps I see as
necessary are:

1.  Obtaining commitments from all the major players

There are already a half-dozen different HEP groups using C++, and I  
think new ones spring up every few months.  We need to have a Council  
with a representative from each of the big labs/experiments - SLAC,  
FermiLab, and especially CERN - plus the major projects (MC++, GISMO,  
Cheetah)   in order to have sufficient credibility [Presumably this  
would be some sub-committee of CHEP].  And then we need to make some  
sort of binding pact to support the protocols agreed upon by the group  
[which means defining some sort of voting process, formal or informal].   
If the existing protocols are insufficient, we should agree to submit  
modifications or extensions, rather than merely ignoring them.

2.	Defining the goals and constraints

There are many nuts and bolts issues that need to be defined before we  
can seriously talk about developing a standard.  Some of these CLHEP  
has already answered (code conventions, compiler/OS support, etc.).   
Others I have tried to introduce here (interface vs. implementation,  
layering, naming scheme, etc.).  We also need to start considering new  
ANSI C++ extensions (String class, dynamic down-casting, name-space  
tools) and their relationship to our efforts.  These decisions should
be made and spelled out up front; otherwise, discussions further down
the road will erupt into bickering over these basics.
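
As one concrete point of contact, the proposed dynamic down-casting
facility bears directly on how a Layer 1 data manager would hand
objects back to user code.  A minimal sketch, with names invented for
illustration (TrackHit is hypothetical, and the base class is a
pared-down version of the self-description protocol sketched in
Section II):

// Pared-down persistence base class (see the Layer 1 sketch).
class PersistentObject {
public:
    virtual ~PersistentObject() {}   // polymorphic, so dynamic_cast works
};

// Hypothetical Layer 2 class stored through that protocol.
class TrackHit : public PersistentObject {
public:
    TrackHit(double t) : time_(t) {}
    double time() const { return time_; }
private:
    double time_;
};

// A query against the data store can only promise a PersistentObject;
// dynamic_cast recovers the concrete class safely at run time,
// returning 0 if the object is really something else.  Without
// language support, every library ends up inventing its own isA() or
// type-code scheme, which is exactly the duplication we want to avoid.
TrackHit* asTrackHit(PersistentObject* obj)
{
    return dynamic_cast<TrackHit*>(obj);
}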

It would also be extremely helpful if we settled on a common notational  
scheme for class descriptions (such as was proposed by Gary, based on  
their experiences at the SSC).  While we certainly would not want to
require the use of a commercial design tool, we should set up a format
such that those with access to such tools could easily take advantage
of them for CLHEP work.

3.	Breaking the problem down into manageable pieces

Next (and this is the crucial part), we would need to break our library
down into its constituent parts AND assign a specific person to be in
charge of each one.  That person has the responsibility for overseeing
development of that protocol, and of implementations thereof.  He is
also responsible for verifying that donated implementations conform to
the stated protocol.  Before any new library name, protocol, or
implementation (e.g., CLHEP_0L.1) can be added to CLHEP, though, it
must be approved by a vote of the `Council'.

The problem this is trying to avoid is that often somebody will say, "I
want to use <your class>, but it doesn't do <what I need>."  The person
in charge tends to respond, "No, that's not what I wrote this class to
do" or "I don't have time to consider that now," so the person goes off
and implements something incompatible.  What would be nice is if they
could say, "You can use this source as a base; just make sure you stay
within this protocol."  Or, if they really don't have time, "Okay, you
can take over this subsection, but remember you must get protocol
changes approved by the Council."

4.	Gathering existing code together and synthesizing a common ground

It is important that whatever classes we come up with answer the
(often diverse) needs of the different projects involved.  That
involves
	- surveying the functionality of existing classes
	- identifying the common behavior for a protocol
	- recognizing useful but incompatible optimizations/extensions
		that would need to be used in different implementations

This could be handled at the subsection level.  None of us would want
to do this for an entire library.  However, I think most of us would be
willing to send our code out for someone else to review, and to take
code from other people related to our current focus and synthesize it
into something general, especially if we knew somebody else was doing
the same with other code that we need.  The hard part is making sure
people would be willing to fold the resulting classes back into their
projects.

5.	Setting up a reliable distribution mechanism

It should be easy for people to pick up the latest copy of CLHEP from  
more than one site.  It should also be straightforward for somebody  
with a single module to find out where the other modules are that he  
needs in order to use it.  This also implies a good versioning  
mechanism (rCVS?).

SUMMARY

Okay, perhaps this is massive overkill for a simple problem.  However,  
I really feel the current CLHEP effort is significant underkill for
the real problem of trying to build a universal HEP class library.  If
we really are serious about trying to do that, I encourage people to
come up with a system (hopefully better than what I have sketched out
here) that has a reasonable chance of succeeding.  Let's pour our programming
and organizational resources into getting a good foundation laid now,  
rather than frittering them away over the next decade in a plethora of  
incompatible systems.

Sincerely,

- Ernie N. Prabhakar
Speaking for myself
---
Ernest N. Prabhakar                  Caltech High Energy Physics
Member, League for Programming Freedom (league@prep.ai.mit.edu)