From: ernest@pundit.cithep.caltech.edu (Ernest Prabhakar)
Newsgroups: hepnet.lang.c++
Subject: A Radical Perspective on CLHEP
Date: 9 Dec 1993 22:58:27 GMT
Organization: California Institute of Technology, Pasadena
Distribution: world
Suggestions for Reorganizing CLHEP
Ernie Prabhakar, 12/8/93
These are my comments for how best to organize and implement a C++ base
library for High Energy Physics. They are inspired by discussions that
took place at the Dec 6th BFast analysis workshop, and many of the
ideas came from participants there. However, this document reflects my
own memory and prejudices, and should not be taken as representative of
anything except my own opinions.
My primary recommendation is that we should view CLHEP as a whole the
same way we do (or rather, should) view our basic OO programming
problems. In particular, we should break our analysis down into the
following four parts.
I. The Problem
II. The Hierarchy
III. The Interface
IV. The Implementation
I. The Problem
The problem CLHEP is trying to solve is that the high energy physics
community is not experiencing the benefits of object technology. While
many of us have adopted C++ for our own projects, we are not able to
reuse code and share objects with other groups within HEP. Duplicated
code, incompatible interfaces, and inefficient implementations still
dog our footsteps.
The current CLHEP distribution, while a noble first step, does not
solve this more general problem. There is no guidance for how to fit
other large packages (like GISMO) into this framework. There still
exist multiple, incompatible solutions to certain problems (like
persistence, where CLHEP and Cheetah sharply part company). And even
for something as basic as vectors, there are disagreements about how
(or even whether) extensions should be made (arbitrary-sized vectors,
garbage collection, matrices, etc.).
There is also a non-technical problem with the current CLHEP effort: it
is not going anywhere. Right now, there is very little innovation
going on with the CLHEP classes. Discussions on the newsgroup are
anemic at best, and tend to result in minor feature additions and name
changes. Much C++ work going on at other places either ignores CLHEP
or is not folded back into it.
The solution, I believe, is to treat CLHEP as sort of a giant
meta-object library. Rather than a single 'monolithic' standard, we
should treat it as a collection of standards, with well-defined
dependencies, interfaces, and relationships. These 'component'
standards can be farmed out to different individuals, who can modify or
add implementations as long as they preserve the basic interface.
CLHEP itself will become sort of a clearinghouse and referee point for
work done by other people, rather than the prime locale where work is
done.
While this may seem impossibly idealistic, I believe it is the only
functional model for distributed development. It mirrors (to the
extent of my understanding) the mechanisms used by GNU projects,
easily the most successful net-development effort in existence. In
the implementation section, I will discuss some ideas for how to do
this in a way that does not require extensive overhead or manpower.
The benefit of this is that we will be able to share in the fruits of
each other's development. Even better, we will be able to
reimplement portions of each other's work (where our own particular
optimizations require it) but still make use of everything else.
II. The Hierarchy
As with any OOP problem, you need to set up a good hierarchy before you
even think about writing the actual objects. The basic idea is that
CLHEP should be organized as a set of hierarchies (sort of a
meta-hierarchy of libraries, rather than objects): lists, math,
particle, etc. Each hierarchy can inherit from or otherwise be
dependent on other hierarchies, but only in a well-defined way.
Several hierarchies would be grouped into a 'layer'. The idea is that
each layer would only depend on itself and previous layers. This would
make it easy for users to use a well-defined subset of CLHEP, without
having to worry about dependencies and ordering. Layers would be
numbered starting from zero, indicating their dependency. Independent
libraries on the same layer could be differentiated by letter. A
possible breakdown would be as follows.
Layer 0 - The Generic Layer: non-physics-specific classes
These are the sort of things one finds in commercial base classes,
such as libg++. It could also include pure mathematical classes like
Helix and FourVector (but would not know about magnetic field or mass
or such).
- Collection: List, Iterator, AssociativeList
- Math: Vector, Matrix (inversion, rotation), Helix
- Combinatorics?
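As a sketch of the kind of Layer 0 class I have in mind (the class and member names here are my own invention, not anything in the current CLHEP), a four-vector should be pure Minkowski arithmetic with no knowledge of particles or fields:

```cpp
#include <cmath>

// Hypothetical Layer 0 four-vector: pure math, with no knowledge of
// particles, masses, or magnetic fields. Physics meaning comes later,
// in a higher layer.
class FourVector {
public:
    FourVector(double t, double x, double y, double z)
        : t_(t), x_(x), y_(y), z_(z) {}

    double t() const { return t_; }
    double x() const { return x_; }
    double y() const { return y_; }
    double z() const { return z_; }

    // Minkowski inner product with metric (+,-,-,-)
    double dot(const FourVector& v) const {
        return t_ * v.t_ - x_ * v.x_ - y_ * v.y_ - z_ * v.z_;
    }

    // Invariant interval; a Layer 2 class may interpret this as mass^2,
    // but this class itself makes no such claim.
    double invariant() const { return dot(*this); }

    FourVector operator+(const FourVector& v) const {
        return FourVector(t_ + v.t_, x_ + v.x_, y_ + v.y_, z_ + v.z_);
    }

private:
    double t_, x_, y_, z_;
};
```

Note that nothing above mentions GeV, charge, or detectors; that separation is exactly what makes the class reusable by every experiment.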
Layer 1 - Data Management Layer: our data-base type functionality
These are the sort of things we could ignore completely if we
used a commercial OODB. Since that is highly unlikely to happen in
HEP, we need to do the best we can. However, we should allow for
sites to 'slip in' a commercial DB in place of this layer. Issues
include:
- Storage
- Translating from disk-storage to 'live' storage, if needed
- Self-description of classes
- Data Retrieval via queries
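A minimal sketch of what I mean by self-description and disk-to-live translation (all names here are hypothetical, not existing CLHEP or Cheetah code): a pure virtual protocol that any storable class conforms to, behind which a site could in principle slip a commercial DB.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical Layer 1 protocol: a storable object must be able to
// name itself and stream itself to/from a flat (disk) form.
class Storable {
public:
    virtual ~Storable() {}
    virtual std::string className() const = 0;         // self-description
    virtual void writeTo(std::ostream& os) const = 0;  // to disk form
    virtual void readFrom(std::istream& is) = 0;       // back to 'live' form
};

// An example conforming class.
class Point : public Storable {
public:
    Point(double x = 0, double y = 0) : x_(x), y_(y) {}
    std::string className() const { return "Point"; }
    void writeTo(std::ostream& os) const { os << x_ << ' ' << y_; }
    void readFrom(std::istream& is) { is >> x_ >> y_; }
    double x() const { return x_; }
    double y() const { return y_; }
private:
    double x_, y_;
};
```

A query engine or storage manager then needs to know only about Storable, never about Point itself.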
Layer 2 - Particle Representations: physics constructs
These are the base classes we use for representing actual
physics quantities. Many of these may be subclasses of the
mathematical constructs above. Classes would include:
- FourMomenta
- Measured Data
- Identified Data
- a Particle Properties database
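To illustrate the particle-properties entry (again, a sketch under my own naming, with approximate PDG masses used purely as sample data), a Layer 2 table might map particle names to their static properties:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical Layer 2 particle-properties record. Masses are
// approximate PDG values in GeV/c^2; charges in units of e.
struct ParticleProperties {
    double mass;
    double charge;
};

class ParticleTable {
public:
    ParticleTable() {
        table_["e-"]  = ParticleProperties{0.000511, -1.0};
        table_["mu-"] = ParticleProperties{0.1057,  -1.0};
        table_["pi+"] = ParticleProperties{0.1396,  +1.0};
    }

    // Look up a particle by name; unknown names are an error.
    const ParticleProperties& lookup(const std::string& name) const {
        std::map<std::string, ParticleProperties>::const_iterator it =
            table_.find(name);
        if (it == table_.end())
            throw std::runtime_error("unknown particle: " + name);
        return it->second;
    }

private:
    std::map<std::string, ParticleProperties> table_;
};
```

The point is that every experiment keeps one shared interface to this data, even if sites load it from different underlying files or databases.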
Layer 3 - Support Packages:
These are relatively low-level packages that could be used by
many different programs. They cover both physics and computer-science
issues and are best illustrated by example. Some of these, I
suppose, are sufficiently basic that they could be put in Layer 1
instead of Layer 3.
- Particle decayers
- Tuple/Histogramming packages
- Kinematic Fitters
- Graphics packages
- User-interface routines
- Object-Oriented parsers
Layer 4 - End-user packages:
Here live relatively high-level packages that would tend to be
single applications. They are characterized by having 'frameworks' as
well as normal objects.
- Detector Simulations (i.e., GISMO)
- Full Monte Carlo
- Analysis Kits (e.g., SUSHI, CABS)
The idea is that each layer is built upon abstractions exported from
previous layers. We have tried to follow this philosophy in SUSHI,
where we have
Layer 0: BaseLib and VMP (list and vector classes)
Layer 1: DDM and Cheetah (Data management)
Layer 2: PETS (data structures)
Layer 3: HippoPlus
Layer 4: SUSHI Schemas
In fact, the current challenge I am facing is trying to clean up the
relationship between the DDM and PETS structures. There may be some
more work necessary on the relationship between data storage and data
structures.
This is not intended as a definitive list, but as a 'first cut' to
get people thinking. The important part is to figure out the overall
structure before getting bogged down on implementation details.
III. The Interfaces
For maximum flexibility, I think each sublibrary should have a clearly
defined interface. That would include the libraries it is dependent
on, as well as the interface which it exports. It should also have a
simple name which can be referred to by other libraries.
I think it can be useful to define a 'minimal' interface, even if the
library exports much more functionality than that. For example, most
users of a list really only need size, appending, random access, and
iteration. However, a good list class should include much more than
that. By declaring two different interfaces (protocols, I should
say), one a superset of the other, we let users decide whether they
want portability or power. Similarly, a developer can create an
optimized list for a particular case that implements the minimal
interface but may add its own extensions.
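As a concrete sketch of the two-level idea (class and member names are my own invention, not the current CLHEP list classes): a minimal protocol, a superset protocol, and one implementation of the richer one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal list protocol: size, appending, random access.
// This is all most portable user code needs to ask for.
class BasicIntList {
public:
    virtual ~BasicIntList() {}
    virtual std::size_t size() const = 0;
    virtual void append(int item) = 0;
    virtual int at(std::size_t index) const = 0;
};

// A richer protocol, a strict superset of the minimal one. Code
// written against BasicIntList keeps working; code that wants power
// rather than portability asks for this instead.
class FullIntList : public BasicIntList {
public:
    virtual void removeLast() = 0;
    virtual bool contains(int item) const = 0;
};

// One concrete implementation of the full protocol.
class VectorIntList : public FullIntList {
public:
    std::size_t size() const { return data_.size(); }
    void append(int item) { data_.push_back(item); }
    int at(std::size_t i) const { return data_[i]; }
    void removeLast() { data_.pop_back(); }
    bool contains(int item) const {
        for (std::size_t i = 0; i < data_.size(); ++i)
            if (data_[i] == item) return true;
        return false;
    }
private:
    std::vector<int> data_;
};
```

A user who only ever touches the object through a BasicIntList reference is guaranteed to be portable across every implementation anyone donates.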
For example, the list libraries can be classified as "CLHEP_0L" - level
'0', hierarchy 'L'. More precisely, this is the list protocol exported
by CLHEP. The 'basic' implementation could be something like 0L.1,
which includes extensions beyond the base protocol. Dependent
libraries could refer to '0L' or '0L.1' depending on how specific (and
implementation dependent) their usage of the list class was.
I think the best way to define a protocol is simply to use a pure
virtual class. Note that this is primarily for notational convenience,
though. We can (and should) allow implementations which are
non-virtual, but maintain the same member names to allow compatibility.
It does introduce some risk, but that is an omnipresent danger with
C++ anyway. There are some HEP applications where the overhead of a
virtual table pointer would be prohibitive, and we don't want to lock
them out.
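Here is a rough sketch of that non-virtual escape hatch (names purely illustrative): a class that keeps the same member names as the virtual protocol but carries no vtable pointer, used through a template so that conformance is checked by the compiler rather than at run time.

```cpp
#include <cstddef>

// A non-virtual list that keeps the protocol's member names (size,
// append, at) but has no vtable pointer -- the kind of lean class a
// performance-critical HEP application might insist on.
class LeanIntList {
public:
    LeanIntList() : size_(0) {}
    std::size_t size() const { return size_; }
    void append(int item) { if (size_ < 16) data_[size_++] = item; }
    int at(std::size_t i) const { return data_[i]; }
private:
    int data_[16];      // fixed capacity: an optimization that a
    std::size_t size_;  // purely virtual protocol would make awkward
};

// Generic code works with ANY type exporting the minimal list member
// names, virtual or not; the 'protocol' is enforced at compile time.
template <class ListT>
int sum(const ListT& l) {
    int total = 0;
    for (std::size_t i = 0; i < l.size(); ++i)
        total += l.at(i);
    return total;
}
```

The risk, of course, is that nothing but convention and the compiler's name matching ties LeanIntList to the protocol; that is the trade-off accepted in exchange for avoiding the virtual-call overhead.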
IV. The Implementation
While I will be the first to admit these answers I have given are
probably half-baked, I do believe these are the relevant questions we
need to ask. If the high energy physics community does not adopt
something like this, I am afraid CLHEP will become just another
standard that people pay lip-service to but nobody actually uses.
I honestly think it will not take a lot of extra effort or overhead to
implement a scheme like this. What it will take is a willingness to
put effort into organizing code rather than just writing it, and a
change in our political mindset. Some of the steps I would see as
necessary in that are:
1. Obtaining commitments from all the major players
There are already a half-dozen different HEP groups using C++, and I
think new ones spring up every few months. We need to have a Council
with a representative from each of the big labs/experiments - SLAC,
FermiLab, and especially CERN - plus the major projects (MC++, GISMO,
Cheetah) in order to have sufficient credibility [Presumably this
would be some sub-committee of CHEP]. And then we need to make some
sort of binding pact to support the protocols agreed upon by the group
[which means defining some sort of voting process, formal or informal].
If the existing protocols are insufficient, we should agree to submit
modifications or extensions, rather than merely ignoring them.
2. Defining the goals and constraints
There are many nuts and bolts issues that need to be defined before we
can seriously talk about developing a standard. Some of these CLHEP
has already answered (code conventions, compiler/OS support, etc.).
Others I have tried to introduce here (interface vs. implementation,
layering, naming scheme, etc.). We also need to start considering new
ANSI C++ extensions (String class, dynamic down-casting, name-space
tools) and their relationship to our efforts. These decisions should
be made and spelled out up front; otherwise, decisions further down
erupt into bickering over these basics.
It would also be extremely helpful if we settled on a common notational
scheme for class descriptions (such as was proposed by Gary, based on
their experiences at SSC). While we certainly would not want to
require using a commercial design tool, we should set up a format such
that those with access to such tools could easily take advantage of
them for CLHEP work.
3. Breaking the problem down into manageable pieces
Next (and this is the crucial part), we would need to break our library
down into its constituent parts AND assign a specific person to be in
charge of each one. That person has the responsibility for overseeing
development of that protocol, and of implementations thereof. He is
also responsible for verifying that donated implementations conform to
the stated protocol. Before any new library name, protocol, or
implementation (e.g., CLHEP_0L.1) can be added to CLHEP, though, it
must be approved by a vote of the 'Council'.
The problem this is trying to avoid is that often somebody will say,
"I want to use [your class], but it doesn't do [what I need]." The
person in charge tends to respond, "No, that's not what I wrote this
class to do" or "I don't have time to consider that now," so the
person goes off and implements something incompatible. What would be
nice is if they could say, "You can use this source as a base; just
make sure you stay within this protocol." Or, if they really don't
have time, "Okay, you can take over this subsection, but remember you
must get protocol changes approved by the council."
4. Gather existing code together and synthesize a common ground
It is important that whatever classes we come up with answer the
(often diverse) needs of the different projects involved. That
involves:
- surveying the functionality of existing classes
- identifying the common behavior for a protocol
- recognizing useful but incompatible optimizations/extensions
that would need to be used in different implementations
This could be handled at the subsection level. None of us would want
to do this for an entire library. However, I think most of us would
be willing to send our code out for someone else to review, and to
take in code from other people related to our current focus and
synthesize it into something general, especially if we knew somebody
else was doing the same with code that we need. The hard part is
making sure people would be willing to fold the resulting classes
back into their projects.
5. Set up a reliable distribution mechanism
It should be easy for people to pick up the latest copy of CLHEP from
more than one site. It should also be straightforward for somebody
with a single module to find out where the other modules are that he
needs in order to use it. This also implies a good versioning
mechanism (rCVS?).
SUMMARY
Okay, perhaps this is massive overkill for a simple problem. However,
I really feel the current CLHEP effort is significant underkill for
the real problem of trying to build a universal HEP class library. If
we really are serious about trying to do that, I encourage people to
come up with a system (hopefully better than what I sketched out here)
that has a reasonable chance of doing that. Let's pour our programming
and organizational resources into getting a good foundation laid now,
rather than frittering them away over the next decade in a plethora of
incompatible systems.
Sincerely,
- Ernie N. Prabhakar
Speaking for myself
---
Ernest N. Prabhakar Caltech High Energy Physics
Member, League for Programming Freedom (league@prep.ai.mit.edu)