
Basic Abstractions used in SFH

If one looks at the graphical class hierarchy of this package, one notices a number of base classes that form the roots of the class hierarchy. These base abstractions generally have very simple interfaces and are discussed in the following.

Class SFO: Self-Filling Object

Class SFO is the heart of the self-filling histograms. Any class derived from SFO must implement a method Fill(), by which the corresponding object is filled, without the need to pass abscissa values or weights to the object.

The implementation of this Fill() method will generally use objects of classes BaseCut, FloatFun and possibly FillIterator and BinningFun to know what the abscissa value and weights are, and into which histogram to fill a value.

Particular subclasses of SFO are SFH1F, SFH2F, and SFHProf, which correspond to the RooT classes TH1F, TH2F, and TProfile (and are actually derived from them).

Collections of histograms can also be self-filling and are therefore derived from SFO as well. One such collection is SFROList, which is simply a collection of self-filling objects. Its Fill() method loops over all objects in its collection and calls their respective Fill() methods.

By making SFROList an SFO itself, we allow the creation of trees of such object lists. This is an application of the "Composite" pattern of Gamma, Helm, Johnson and Vlissides (the famous "Gang of Four", "GoF" in short).
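The Composite structure described above can be sketched in a few lines. The classes below (CountingSFO, SimpleSFROList) are hypothetical, stripped-down stand-ins, not the SFH headers; they only show that a list which is itself an SFO can be nested inside another list and forwards Fill() to all its members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SFO: anything that can fill itself.
class SFO {
  public:
    virtual void Fill() = 0;     // no abscissa values or weights are passed
    virtual ~SFO() {}
};

// A leaf: a "histogram" that only counts how often it was filled.
class CountingSFO : public SFO {
  public:
    CountingSFO() : nfill(0) {}
    virtual void Fill() { ++nfill; }
    int nfill;
};

// The composite: a list of SFOs that is itself an SFO.
class SimpleSFROList : public SFO {
  public:
    void add(SFO* obj) { entries.push_back(obj); }
    virtual void Fill() {
      // Forward Fill() to every member; members may themselves be lists.
      for (std::size_t i = 0; i < entries.size(); ++i) entries[i]->Fill();
    }
  private:
    std::vector<SFO*> entries;
};
```

Because a list is an SFO, a single Fill() call on the outermost list reaches every leaf, however deeply the lists are nested.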

A particular SFROList subclass is EventLoop, which is the base class for the classes in which the user books and outputs her histograms.

Other types of self-filling histogram collections are SFSetOfHistograms and SFMatrixOfHistograms. These collections also have a Fill() method and administer several histograms. However, unlike SFROList, their Fill() method fills only one histogram at a time: BinningFun objects determine into which histogram an entry should be made, and only that histogram gets the entry. These collections are used when we have a number of similar histograms (say, mass histograms) that should be filled under mutually exclusive conditions. These conditions can be the file type (histogram 0 for data, 1 for MC type 1, 2 for MC type 2, etc.), or some kinematic quantity, if we want to make a measurement differential in t, W, pt, or some other variable.
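The "fill only one histogram" behaviour can be illustrated with a minimal sketch. SetOfCounters, Selector and FileTypeSelector are hypothetical names: the selector plays the role of a BinningFun and returns the index of the histogram to fill, or -1 for "no histogram"; each "histogram" is reduced to an entry counter.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a BinningFun: returns a histogram index, or -1 for none.
struct Selector {
  virtual int operator()() const = 0;
  virtual ~Selector() {}
};

// Hypothetical sketch of the SFSetOfHistograms idea (not the SFH class).
class SetOfCounters {
  public:
    SetOfCounters(int n, const Selector& sel_) : counts(n, 0), sel(sel_) {}
    void Fill() {
      int i = sel();                       // ask which histogram to fill
      if (i >= 0 && i < static_cast<int>(counts.size())) ++counts[i];
    }
    std::vector<int> counts;               // entries per "histogram"
  private:
    const Selector& sel;
};

// Example selector: a file-type flag (0 = data, 1 = MC type 1, ...).
struct FileTypeSelector : Selector {
  explicit FileTypeSelector(const int& flag_) : flag(flag_) {}
  virtual int operator()() const { return flag; }
  const int& flag;                         // refers to the current event's flag
};
```

Since the selector returns exactly one index per event, the conditions are mutually exclusive by construction.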

Class RegO: Registered Object

A registered object is an object that registers itself in a list (of class ROList) when it is created. Consequently, a RegO object can only be created when it is given a pointer to a ROList. (However, the pointer may be NULL.)
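A minimal sketch of self-registration, assuming a list that simply stores pointers. RegObj and RegList are hypothetical stand-ins for RegO and ROList, not the SFH code: the registration happens in the constructor, and a NULL list pointer is allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class RegList;

// Hypothetical stand-in for RegO.
class RegObj {
  public:
    explicit RegObj(RegList* list);       // list may be NULL
    virtual ~RegObj() {}
};

// Hypothetical stand-in for ROList.
class RegList {
  public:
    void registerObject(RegObj* obj) { entries.push_back(obj); }
    std::size_t getEntries() const { return entries.size(); }
  private:
    std::vector<RegObj*> entries;
};

// The constructor does the registration, so every object created with a
// non-NULL list ends up in that list automatically.
RegObj::RegObj(RegList* list) {
  if (list) list->registerObject(this);
}
```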

Initially, the only registered objects in our framework were histograms (at that time we called the class RegH, for "registered histogram"). The main purpose was to build lists, or collections, of self-filling histograms.

Since then we have extended the use of RegO. Now cached objects (class CachedO) are also derived from RegO, and we use ROList objects of cached objects to invalidate the caches of all cached objects.

In future releases, we will probably use RegO to implement a reference-counting mechanism, so that purists will not have to complain about memory leaks anymore.

Class ROList: Registered Object List

As already discussed, RegO objects register themselves in ROList objects.

Class ROList is again derived from RegO, another case of the "Composite" pattern of GoF.

Class FloatFun: float-valued Function objects

Our framework makes heavy use of function objects.

A function object is simply an object that overloads operator(). One can think of it as a class that has mainly one purpose, and therefore one main method. This method could be called "doit()", or "foo()", or "doWhatThisClassIsMeantToDo()", or simply have "no name", namely be "operator()".

The nice (though at first puzzling) thing about function objects is that if you have declared such an object, let's say

  FloatFun& pt = *new PtFun (ntuple);
you can write
  float thePT = pt();
which looks like a function call (in fact, it is a function call: it calls PtFun::operator()!) and gives the impression that pt itself is simply a function, declared as
  float pt() {
    return something;
  }
rather than the function object that it really is.

It is probably best to think of a function object as a function with a memory. The object's state is the "memory": it can hold values (e.g. the normalization, mean and sigma of a Gaussian, or a pointer to an ntuple variable), and operator() is the main way to use the object.
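As an illustration of a "function with a memory", here is a Gaussian whose normalization, mean and sigma are the object's state. The class is hypothetical and, unlike a FloatFun, its operator() takes an argument; it only shows the general idea.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function object: the parameters are the "memory",
// operator() evaluates the Gaussian at x.
class Gaussian {
  public:
    Gaussian(double norm_, double mean_, double sigma_)
      : norm(norm_), mean(mean_), sigma(sigma_) {}
    double operator()(double x) const {
      double z = (x - mean) / sigma;
      return norm * std::exp(-0.5 * z * z);
    }
  private:
    double norm, mean, sigma;
};
```

Usage: `Gaussian g(2.0, 1.0, 0.5); double y = g(1.0);` reads like a plain function call and yields the peak value 2.0.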

Now, a FloatFun in our framework is a function object where operator() takes no arguments and returns a float, leaving the object itself unaltered:

class FloatFun {
  public:
    virtual float operator() () const = 0; 
    virtual void destroy() { delete this; } 
    virtual const FillIterator *getIterator() const { return 0; }
  protected:  
    virtual ~FloatFun() {}; 
};

The idea is that for any variable that we want to plot, we have a function object of some class derived from FloatFun that will return the value of the variable via operator().

The self-filling objects will then get pointers to FloatFun objects when they are created, and thereby "know" what to plot when their Fill() method is called.

A nice perk of function objects is that we can easily define classes that hold pointers to other function objects and thereby implement arithmetic operations like products or sums, or represent functions of function objects like the sine or the square root.
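A minimal sketch of such composite function objects, assuming a FloatFun-like base called Fun. SumFun, SqrtFun and ConstFun are hypothetical names; SFH's own classes for these operations may be named and structured differently. The operator+ overload returns a heap-allocated object, in line with the "create with new" rule discussed below.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical FloatFun-like base.
class Fun {
  public:
    virtual float operator()() const = 0;
    virtual ~Fun() {}
};

class ConstFun : public Fun {
  public:
    explicit ConstFun(float v_) : v(v_) {}
    virtual float operator()() const { return v; }
  private:
    float v;
};

// Sum of two function objects, evaluated anew on each call.
class SumFun : public Fun {
  public:
    SumFun(const Fun& a_, const Fun& b_) : a(&a_), b(&b_) {}
    virtual float operator()() const { return (*a)() + (*b)(); }
  private:
    const Fun* a;
    const Fun* b;
};

// Square root of a function object.
class SqrtFun : public Fun {
  public:
    explicit SqrtFun(const Fun& f_) : f(&f_) {}
    virtual float operator()() const { return std::sqrt((*f)()); }
  private:
    const Fun* f;
};

// Lets users write "a + b" for two function objects.
Fun& operator+(const Fun& a, const Fun& b) { return *new SumFun(a, b); }
```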

In our framework, FloatFun objects are used to define abscissa values and weights for self-filling histograms.

We can also express cuts by writing things like

FloatFun& pt = *new PtFun(ntuple);     // returns pt of an event
BaseCut& ptcut1 = (pt > 3);            // A cut on pt
BaseCut& ptcut2 = (2 <= pt < 3);       // Another ptcut, equivalent to 
                                       // (2 <= pt) && (pt < 3);
SFH1F *h = new SFH1F ("pt",            // Histo name
                      "Event pt > 3",  // Histo title
                      ptbinning,       // The binning
                      pt,              // What is plotted: pt
                      pt > 3,          // A cut
                      sqrt (pt));      // A weight
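How can "pt > 3" produce a cut object? One plausible mechanism, sketched here with hypothetical classes (Fun, Cut, GreaterCut, VarFun), is an overloaded comparison operator that returns a heap-allocated cut holding a reference to the function object. SFH's actual overloads may be implemented differently.

```cpp
#include <cassert>

// Hypothetical FloatFun-like and BaseCut-like bases.
class Fun {
  public:
    virtual float operator()() const = 0;
    virtual ~Fun() {}
};

class Cut {
  public:
    virtual bool operator()() const = 0;
    virtual ~Cut() {}
};

// Cut comparing the current value of a function object with a threshold.
class GreaterCut : public Cut {
  public:
    GreaterCut(const Fun& f_, float x_) : f(f_), x(x_) {}
    virtual bool operator()() const { return f() > x; }
  private:
    const Fun& f;
    float x;
};

// "pt > 3" builds the cut once; the comparison itself is only made when
// the cut is evaluated, i.e. once per event.
Cut& operator>(const Fun& f, float x) { return *new GreaterCut(f, x); }

// Stand-in for PtFun: reads the pt of the current event from a variable.
class VarFun : public Fun {
  public:
    explicit VarFun(const float& var_) : var(var_) {}
    virtual float operator()() const { return var; }
  private:
    const float& var;
};
```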

One last note: why is the destructor of FloatFun protected? For FloatFun itself this is actually a bit pointless, because FloatFun is a pure abstract class, so no FloatFun objects can be instantiated anyway. However, it should remind you, the user, to make the destructor of any subclass of FloatFun protected as well, or even private (in which case no further subclasses can be derived from that class!).

The protected destructor makes code such as this illegal:

// constructor of DataLoop (derived from EventLoop): 
// Book all histograms
DataLoop::DataLoop (Ntuple& ntuple) {
  PtFun pt(ntuple);                // pt is now an instance of PtFun
  // h shall plot the event pt; it will hold a pointer to the
  // pt object!
  h = new SFH1F ("pt", "Event pt", ptbinning, pt);

  // pt goes out of scope, ~PtFun() is called
  // => object pt no longer exists, and the pointer held by 
  // the self-filling histogram h is now invalid!
}
If PtFun has a protected destructor, this code will not compile.

Since FloatFun objects are referenced through pointers held by self-filling objects like SFH1F objects, and since the pointers are used in the SFH1F::Fill() method, the FloatFun objects must live at least until the last Fill() call has been made, i.e. basically until the end of the program. By forcing the user to write

// constructor of DataLoop (derived from EventLoop): 
// Book all histograms
DataLoop::DataLoop (Ntuple& ntuple) {
  PtFun& pt = *new PtFun(ntuple);  // pt now refers to a heap-allocated PtFun
  // h shall plot the event pt; it will hold a pointer to the
  // pt object!
  h = new SFH1F ("pt", "Event pt", ptbinning, pt);

  // the reference pt goes out of scope, but the (anonymous) object
  // it refers to lives on! => All is well that ends well...
}
we can ensure that the FloatFun objects live as long as necessary. See section "Why Do I Have to Create FloatFun, BaseCut, and FillIterator Objects with 'new'?" for further discussion of this subject.
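The protected-destructor idiom can be checked in isolation. In this hypothetical sketch (Counted, Derived), declaring `Counted c;` or `Derived d;` on the stack would not compile, so objects have to be created with new and released through destroy(), as FloatFun does. The static counter only exists to make the object lifetime visible.

```cpp
#include <cassert>

// Hypothetical class using the same idiom as FloatFun.
class Counted {
  public:
    Counted() { ++alive; }
    virtual void destroy() { delete this; }  // the public way to delete
    static int alive;                        // number of live objects
  protected:
    virtual ~Counted() { --alive; }          // forbids stack instances
};
int Counted::alive = 0;

class Derived : public Counted {
  protected:
    ~Derived() {}                            // keep it protected in subclasses, too
};
```

Usage: `Counted& c = *new Derived; ... c.destroy();` compiles and works, while a local `Derived d;` is rejected by the compiler because its destructor is inaccessible.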

Class BaseCut: Base Class for Cuts

Just as a FloatFun returns a float, a BaseCut returns a bool, so class BaseCut could also have been named BoolFun, which might have been more logical. In any case, an object of some subtype of BaseCut has to return a bool when its operator() is called:
class BaseCut {
  public:
    virtual bool operator() () const = 0; 
    virtual void destroy() { delete this; } 
    virtual const FillIterator *getIterator() const { return 0; }
  protected:
    virtual ~BaseCut() {}; 
};

So, a BaseCut object looks at an event (or some part of an event, such as a jet, if a FillIterator is used), decides whether the event passes a cut or not, and returns this decision as the result of its operator().

Class IntFun: Integer-valued Function objects

Class IntFun is the integer counterpart of FloatFun: it returns an int when operator() is called:
class IntFun {
  public:
    virtual int operator() () const = 0; 
    inline virtual FloatIntFun& Float () const {
      return *new FloatIntFun(*this);
    } 
    virtual void destroy() { delete this; } 
    virtual const FillIterator *getIterator() const { return 0; }
  protected:  
    virtual ~IntFun() {}; 
};

An IntFun is very useful to express cuts on integer values, such as the number of tracks:

  IntFun&  ntrack = *new NTIntFun<Ntuple> (ntuple, "ntracks");
  BaseCut& trackcut = (ntrack == 2);
If, however, we want to plot an integer value, like the number of tracks in an event, we need a FloatFun, not an IntFun.

For this, we have method Float(), which returns an object of type FloatIntFun, which is derived from FloatFun and stores a reference to an IntFun. It calls operator() of the IntFun, converts the result to a float and returns it as result of its own operator():

class FloatIntFun: public FloatFun {
  public:
    FloatIntFun (const IntFun& intFun_) 
      : intFun (intFun_) {}
    virtual float operator() () const { 
      return intFun();
    } 
  protected:  
    virtual ~FloatIntFun() {}; 
    const IntFun& intFun;  
};

Class FillIterator: Filling Iterators

The FillIterator idea came relatively late into our framework, which is why FillIterator objects sometimes appear at strange places in argument lists. Anyway, what is a FillIterator for?

A FillIterator helps to address the question "how can we plot more than one value per event (= ntuple row)?". Typical applications occur when we want to plot the pt (transverse momentum) of all tracks in an event, or the distance of all secondary vertices, or the masses of all D* candidates.

The interface of this class is a bit more complicated than the interfaces we saw so far:

class FillIterator: public IntFun {
  public:
    virtual int operator() () const=0;   
    virtual bool next()=0; 
    virtual bool reset()=0;              
    virtual const FillIterator *getIterator() const { return this; }
  protected:
    virtual ~FillIterator() {};
};

Typically, a FillIterator will access some integer value in an ntuple that contains the number of tracks, jets, D* candidates, or similar, and have an internal variable (like "index") that holds the current index.

To facilitate the writing of such a class, we have defined a subclass SimpleFillIterator, where the user just has to provide (in a subclass, of course) an implementation of method getRange(), and the rest is already in place.
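A minimal sketch of what a SimpleFillIterator-like base could look like. FillIter, SimpleFillIter and TrackIter are hypothetical names, and the signature `int getRange() const` is an assumption (the real SFH signature may differ); the interface otherwise follows the FillIterator declaration above, with the IntFun base omitted for brevity.

```cpp
#include <cassert>

// FillIterator-like interface (IntFun base omitted).
class FillIter {
  public:
    virtual int operator()() const = 0;  // index of the current object
    virtual bool next() = 0;             // advance; false when exhausted
    virtual bool reset() = 0;            // go to the first object; false if none
    virtual ~FillIter() {}
};

// Hypothetical SimpleFillIterator: subclasses only implement getRange().
class SimpleFillIter : public FillIter {
  public:
    SimpleFillIter() : index(-1) {}
    virtual int operator()() const { return index; }
    virtual bool reset() {
      index = (getRange() > 0) ? 0 : -1;   // first object, or none
      return index == 0;
    }
    virtual bool next() {
      if (index >= 0 && index + 1 < getRange()) { ++index; return true; }
      index = -1;                          // past the last object
      return false;
    }
  protected:
    virtual int getRange() const = 0;      // number of objects in this event
  private:
    int index;
};

// Example: iterate over the tracks of the current event.
class TrackIter : public SimpleFillIter {
  public:
    explicit TrackIter(const int& ntracks_) : ntracks(ntracks_) {}
  protected:
    virtual int getRange() const { return ntracks; }
  private:
    const int& ntracks;                    // e.g. an ntuple variable
};
```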

A FillIterator object (if there is one) is used by the Fill() method of a self-filling histogram to step through all objects (jets, tracks, D* candidates) that are to be plotted. For each of these objects, the FloatFun objects are then called and asked for their values.

Consequently, such FloatFun objects have to know about the iterator, i.e. they have to hold a pointer to the iterator and in their operator() method have to ask the iterator about the number of the track, jet or whatever whose pt or other value should be plotted.

Such a FloatFun might look like this:

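class PtFun : public FloatFun {
  public: 
    PtFun(const Ntuple& nt_, const FillIterator& iter_) 
      : nt(nt_), iter(iter_) 
      {}
    virtual float operator() () const { 
      int itrack = iter();           // index of the current track
      assert (itrack >= 0);          // needs <cassert>
      return nt.IdPt[itrack];
    }
    // Tell the self-filling histogram which iterator we depend on
    virtual const FillIterator *getIterator() const { return &iter; }
  protected: 
    virtual ~PtFun() {}
    const Ntuple& nt;
    const FillIterator& iter;
};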
Common to all FloatFun objects that collaborate with a FillIterator is that they have to store a pointer (or reference) to a FillIterator object, because this object has to tell them the object number whose properties they should return.

It is of paramount importance that the FloatFun object gets the same FillIterator object that is also passed to the self-filling histogram, because the histogram's Fill() method is responsible for incrementing the FillIterator object, which should be reflected in the result of the FloatFun.
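The collaboration between the histogram's Fill() loop and an iterator-aware function object can be sketched as follows. Iter, TrackPt and PtHist are hypothetical stand-ins (not SFH classes): the histogram advances the iterator, and TrackPt reads the current index from the very same iterator object.

```cpp
#include <cassert>
#include <vector>

// Minimal iterator over the tracks of one event.
class Iter {
  public:
    explicit Iter(const std::vector<float>& v_) : v(v_), i(-1) {}
    int operator()() const { return i; }
    bool reset() { i = v.empty() ? -1 : 0; return i == 0; }
    bool next() {
      if (i >= 0 && i + 1 < static_cast<int>(v.size())) { ++i; return true; }
      i = -1;
      return false;
    }
  private:
    const std::vector<float>& v;
    int i;
};

// Function object returning the pt of the *current* track.
class TrackPt {
  public:
    TrackPt(const std::vector<float>& pts_, const Iter& iter_)
      : pts(pts_), iter(iter_) {}
    float operator()() const { return pts[iter()]; }
  private:
    const std::vector<float>& pts;
    const Iter& iter;                  // must be the histogram's iterator
};

// Stand-in for a self-filling histogram: one entry per track.
class PtHist {
  public:
    PtHist(Iter& iter_, const TrackPt& fun_) : iter(iter_), fun(fun_) {}
    void Fill() {
      for (bool ok = iter.reset(); ok; ok = iter.next())
        entries.push_back(fun());      // fun() sees the index set by iter
    }
    std::vector<float> entries;
  private:
    Iter& iter;
    const TrackPt& fun;
};
```

If TrackPt were given a different Iter object than PtHist, it would keep reading index -1 and the loop would no longer be reflected in its results, which is exactly the mistake the paragraph warns about.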

Relation between FillIterator and IntFun

As can be seen from the definition of FillIterator, a FillIterator is also an IntFun. This is reasonable, because an IntFun basically only has to provide an operator() that returns an int, which is what a FillIterator does.

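A FillIterator thus has two faces: as an iterator, it is stepped through the objects of an event with reset() and next() by the Fill() method of a self-filling histogram; as an IntFun, its operator() returns the index of the current object.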

An advantage of a FillIterator being an IntFun is that we can also use a FillIterator as the value for an axis, e.g. when we want to plot how often a subtrigger has fired.

The Purpose of Method getIterator()

We have so far not discussed the purpose of method getIterator(), which is a method defined for classes FloatFun, IntFun, and BaseCut.

If any object of these classes uses a FillIterator object, it should return the pointer to this object in method getIterator(). This is used by the self-filling histograms to ensure that all function objects depend on the same iterator, and furthermore allows the self-filling histograms to deduce which iterator to use, so that the FillIterator need not be given explicitly.
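A sketch of how a self-filling histogram might use getIterator() to deduce and check the common iterator. The helper deduceIterator and the stand-in classes are hypothetical; the real SFH logic may be organized differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: only getIterator() matters here.
class FillIterator { public: virtual ~FillIterator() {} };

class FunBase {
  public:
    explicit FunBase(const FillIterator* it_ = 0) : it(it_) {}
    virtual const FillIterator* getIterator() const { return it; }
    virtual ~FunBase() {}
  private:
    const FillIterator* it;
};

// Returns the iterator shared by all function objects (NULL if none uses
// one) and sets ok = false if two of them depend on different iterators.
const FillIterator* deduceIterator(const FunBase* const funs[], std::size_t n,
                                   bool& ok) {
  const FillIterator* result = 0;
  ok = true;
  for (std::size_t k = 0; k < n; ++k) {
    const FillIterator* it = funs[k]->getIterator();
    if (!it) continue;                 // this object needs no iterator
    if (!result) result = it;          // first iterator found
    else if (it != result) ok = false; // conflicting iterators
  }
  return result;
}
```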

Class CachedO: Cached Objects

Sometimes the calculation of event properties can be rather time-consuming, for instance when a jet finder has to be run over a set of tracks and clusters before jet properties like pt are plotted.

In such cases it can be prohibitively slow to recompute these properties every time operator() of a FloatFun is called. This problem is addressed by cached objects.

The interface of the abstract base class CachedO is again very simple:

class CachedO: public RegO {
  public: 
    CachedO (const ROListPoR& rol);
    virtual ~CachedO ();
    virtual void invalidateCache() = 0;
  private:
    CachedO (const CachedO& rhs);
    CachedO& operator= (const CachedO& rhs);
};

The main property of a CachedO is a method invalidateCache(), which tells the object that its cache has become invalid, which will be the case after a new row of an ntuple has been read.

To be able to invalidate all caches efficiently, we made CachedO a registered object (derived from RegO). In class EventLoop, we have a ROList named cachedObjects, which is used to collect all cached objects and call invalidateCache() for all of them in the loop() method.
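A minimal sketch of the registration-plus-invalidation mechanism. CacheList, Cached and processRow are hypothetical names standing in for ROList, CachedO and the per-row part of EventLoop::loop(); the real classes carry more machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Cached;

// Stand-in for the ROList named cachedObjects.
class CacheList {
  public:
    void add(Cached* c) { objs.push_back(c); }
    void invalidateAll();
  private:
    std::vector<Cached*> objs;
};

// Stand-in for CachedO: registers itself on construction.
class Cached {
  public:
    explicit Cached(CacheList& l) : valid(false) { l.add(this); }
    virtual ~Cached() {}
    virtual void invalidateCache() { valid = false; }
    bool valid;
};

void CacheList::invalidateAll() {
  for (std::size_t i = 0; i < objs.size(); ++i) objs[i]->invalidateCache();
}

// Per-row step of an event loop.
void processRow(CacheList& cachedObjects) {
  cachedObjects.invalidateAll();  // a new row was read: all caches are stale
  // ... then call Fill() on the booked histograms ...
}
```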

To facilitate the use of cached objects, we have defined four derived classes that should serve as base classes for cached FloatFun and BaseCut classes, in two versions each: one using iterators and one without iterators.

A Word on const and mutable

An important caveat in using CachedO subclasses concerns the use of const. The classes FloatFun and BaseCut have declared their respective operator() methods "const", i.e. these methods must not change any data members of these classes.

However, if we want to recalculate the value of a FloatFun object only when operator() is actually called, we face the problem that we need to store (i.e., cache) the result in a data member, which is not allowed in a "const" method. Now what?

C++ supports the notion of "logical constness", i.e. it allows data members that can be changed even in const objects. This is signalled by the keyword "mutable". Look at our implementation of SimpleCachedFloatFun:

class SimpleCachedFloatFun: public FloatFun, public CachedO {
  public:
    SimpleCachedFloatFun (const ROListPoR& rol)
      : CachedO (rol), cacheValid (false), cachedValue (0)
      {}

    virtual float operator() () const {
      if (!cacheValid) {
        recalculate();
        cacheValid = true;
      }
      return cachedValue;
    } 
      
    virtual void invalidateCache() {
      cacheValid = false;
    }
    
    virtual void recalculate() const = 0;
    
  protected:
    mutable bool cacheValid;   
    mutable float cachedValue; 
};

Here, cacheValid and cachedValue are declared "mutable" and hence may be altered for const objects, i.e. in member functions that are marked "const", such as operator(). Observe also that recalculate() has been declared "const", because otherwise it could not be called by operator().

Of course, another way out would be to recalculate the values immediately in the invalidateCache() method, which is not "const":

class SimpleCachedFloatFun: public FloatFun, public CachedO {
  public:
    SimpleCachedFloatFun (const ROListPoR& rol)
      : CachedO (rol), cachedValue (0)
      {}
    virtual float operator() () const {
      return cachedValue;
    } 
    virtual void invalidateCache() = 0;
  protected:
    float cachedValue; 
};

We have decided against this model, because presumably the recalculation is a costly process (otherwise we wouldn't go through all the trouble), and it is quite possible that for many events the recalculation is unnecessary, because cuts are made that reject most events (consider, for example, an application where we run a jet finder on an LHC event, but only for events which have at least two muon candidates). In such a case, a calculation of the FloatFun result for every event could degrade performance more than not using the cache mechanism at all.
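The performance argument can be made concrete with a small count. LazyCached follows the mutable-cache scheme of SimpleCachedFloatFun shown above; EagerCached recomputes in invalidateCache(). Both classes and the constant 42.0f are hypothetical; `calls` counts how often the "expensive" computation runs when only every tenth event passes a cut.

```cpp
#include <cassert>

// Lazy: recompute only when the value is requested.
class LazyCached {
  public:
    LazyCached() : calls(0), cacheValid(false), cachedValue(0) {}
    float operator()() const {
      if (!cacheValid) { cachedValue = expensive(); cacheValid = true; }
      return cachedValue;
    }
    void invalidateCache() { cacheValid = false; }
    mutable int calls;
  private:
    float expensive() const { ++calls; return 42.0f; }  // e.g. a jet finder
    mutable bool cacheValid;
    mutable float cachedValue;
};

// Eager: recompute on every invalidation, whether needed or not.
class EagerCached {
  public:
    EagerCached() : calls(0), cachedValue(0) {}
    float operator()() const { return cachedValue; }
    void invalidateCache() { ++calls; cachedValue = 42.0f; }
    int calls;
  private:
    float cachedValue;
};
```

With 100 events of which 10 pass the cut, the lazy version runs the computation 10 times (even if the value is read twice per event), the eager one 100 times.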

How to apply cached objects

We see basically two situations where cached objects are beneficial:

HVisitor: Histogram Visitor

In handling collections of histograms we often encounter the situation that we want to apply a certain operation to all histograms in the collection. Such operations can be writing histograms to a file, drawing histograms on a canvas, fitting them with some fit function, or setting attributes like line colors.

Now we have two possibilities:

We could write (ugly) code like this:

void setLineColors (const ROList& rol) {
  for (unsigned int i = 0; i < rol.getEntries(); i++) {
    RegO *ro = rol.getEntry(i);
    // Make sure our registered object is really a RooT histogram:
    if (TH1 *h = dynamic_cast<TH1 *>(ro)) {
      // Set line color to red
      h->SetLineColor (2);   
    }
  }
}

Then we apply this function to the various collections (e.g., all SFSetOfHistograms and SFMatrixOfHistograms objects).

Somewhat better would be to derive our own collections:

class MySetOfHistograms: public SetOfHistograms {
  public:
    void setLineColors () {
      for (unsigned int i = 0; i < this->getEntries(); i++) {
        RegO *ro = this->getEntry(i);
        // Make sure our registered object is really a RooT histogram:
        if (TH1 *h = dynamic_cast<TH1 *>(ro)) {
          // Set line color to red
          h->SetLineColor (2);   
        }
      }
    }
};
But then we have to provide a lot of constructors, and we have to repeat the exercise for SFSetOfHistograms, MatrixOfHistograms, SFMatrixOfHistograms, and possibly other classes. Not nice.

Or is there a third possibility? The Visitor pattern from GoF comes to the rescue:

Write a class like this:

class LineColorSetter: public HVisitor {
  public: 
    virtual void visit (RegO& ro) {
      // Only objects with line attributes (e.g. histograms) are affected
      if (TAttLine *h = dynamic_cast<TAttLine *>(&ro)) {
        h->SetLineColor (2);   // red
      }
    }
};

Now, all we need is some code in class ROList that calls the visit() method for every object in its list:

ROList& ROList::visit (HVisitor& v) {
  for (unsigned int i = 0; i < entries; i++) {
    if (theList[i]) v.visit((*theList[i]));
  }
  return *this;
}
(the true code is a bit more involved, but that's not the point here).
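The whole Visitor round trip can be tried out without ROOT. In this self-contained sketch, LineAtt plays the role of TAttLine, Hist the role of a histogram that is both a RegO and a line-attribute object (as SFH1F is), and RegList/Visitor stand in for ROList/HVisitor; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class RegO { public: virtual ~RegO() {} };

class LineAtt {                              // plays the role of TAttLine
  public:
    LineAtt() : color(1) {}
    virtual ~LineAtt() {}
    void SetLineColor(int c) { color = c; }
    int color;
};

class Hist : public RegO, public LineAtt {}; // "histogram"
class NotAHist : public RegO {};             // e.g. a cached object

class Visitor {
  public:
    virtual void visit(RegO& ro) = 0;
    virtual ~Visitor() {}
};

class RegList {
  public:
    void add(RegO* ro) { list.push_back(ro); }
    RegList& visit(Visitor& v) {
      for (std::size_t i = 0; i < list.size(); ++i)
        if (list[i]) v.visit(*list[i]);
      return *this;
    }
  private:
    std::vector<RegO*> list;
};

// Cross-cast from RegO to the attribute class; other objects are skipped.
class LineColorSetter : public Visitor {
  public:
    explicit LineColorSetter(int c) : color(c) {}
    virtual void visit(RegO& ro) {
      if (LineAtt* h = dynamic_cast<LineAtt*>(&ro)) h->SetLineColor(color);
    }
  private:
    int color;
};
```

The dynamic_cast works here because the objects in the list are polymorphic and Hist derives from both RegO and LineAtt; objects without line attributes simply yield a null pointer.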

Now in our user code we can write code like this:

  LineColorSetter lcs; // a LineColorSetter object
  // sfset is a SFSetOfHistograms
  sfset.visit(lcs);    // Set all histograms' line colors to red

In file HVisitors.h (plural!), we have defined a number of handy HVisitor subclasses that perform common tasks:

We have also predefined some objects that can be used directly to set some attributes, with (as we think) self-explanatory names:

static AttLineSetter blackline (1);
static AttLineSetter redline (2);
static AttLineSetter greenline (3);
static AttLineSetter blueline (4);
static AttLineSetter yellowline (5);
static AttLineSetter magentaline (6);
static AttLineSetter cyanline (7);
static AttFillSetter blackfill (1);
static AttFillSetter redfill (2);
static AttFillSetter greenfill (3);
static AttFillSetter bluefill (4);
static AttFillSetter yellowfill (5);
static AttFillSetter magentafill (6);
static AttFillSetter cyanfill (7);
static AttFillSetter hollowfill (-1, 0);
static AttFillSetter solidfill (-1, 1001);
static AttMarkerSetter blackmarker (1);
static AttMarkerSetter redmarker (2);
static AttMarkerSetter greenmarker (3);
static AttMarkerSetter bluemarker (4);
static AttMarkerSetter yellowmarker (5);
static AttMarkerSetter magentamarker (6);
static AttMarkerSetter cyanmarker (7);

A note on the class name: HVisitor should really be called RegOVisitor, but for historical reasons its name is what it is.

How to use HVisitor and ROList efficiently

As we have seen, HVisitor objects are an efficient way to perform operations on collections of histograms.

Often we'll want to perform some operation, let's say a mass fit, only on some subset of all histograms. In such cases it makes sense to define our own ROList objects in our DataLoop class:

class DataLoop: public EventLoop {
public:
  // the usual stuff here...
private:
  ROList masshistos;
  ROList otherhistos;
};

// constructor: book histos
DataLoop::DataLoop() {
  // Define binnings, FloatFuns etc here
  
  // book a mass histo, put it into list masshistos:
  new SFH1F ("mass", "Some Mass", massbinning,
             masshistos, massfun);
}

// Plot histograms
void DataLoop::output (const char* rootfile, const char* psfile) {
  // MassFitter is an HVisitor that fits a mass histogram
  MassFitter theMassFitter;
  // Fit only mass histograms, not the others
  masshistos.visit (theMassFitter);
  // Continue here with plotting etc...
}

Another possibility would be an HVisitor that checks, e.g., the name of a histogram and fits only histograms whose names contain the string "mass".

Binning: Histogram Binning

Class Binning is our way to have objects that represent the binning of a histogram axis. We think that RooT itself should have such a class.

Often we have to book histograms which all have the same binning. Now, instead of

  h1 = new TH1F ("h1", "hist 1", 100, 0., 1.);
  h2 = new TH1F ("h2", "hist 2", 100, 0., 1.);
  h3 = new TH1F ("h3", "hist 3", 100, 0., 1.);
  h4 = new TH1F ("h4", "hist 4", 100, 0., 1.);
  // and so on, ad infinitum...
we would rather be able to write
  Binning hbinning (100, 0., 1.);
  h1 = new RegH1F ("h1", "hist 1", hbinning, this);
  h2 = new RegH1F ("h2", "hist 2", hbinning, this);
  h3 = new RegH1F ("h3", "hist 3", hbinning, this);
  h4 = new RegH1F ("h4", "hist 4", hbinning, this);
  // and so on, ad infinitum...
which allows us to change the binning of all histograms in one single place.

Binning objects allow us to do exactly that. We have constructors for Binning objects that exactly mimic the corresponding part in the RooT histogram constructors:

class Binning {
  public:
    Binning ();
    Binning (int nbins_, double xlow, double xhigh);
    Binning (int nbins_, const float binedges_[]);
    Binning (int nbins_, const double binedges_[]);
    Binning (const Binning& rhs);  
    virtual ~Binning(); 
    
    virtual int getBin (double x) const; 
    virtual int getNBins() const;
    virtual double getLowerBinEdge(int i) const;
    virtual double getUpperBinEdge(int i) const;
    virtual double getLowerEdge() const;
    virtual double getUpperEdge() const;
    virtual const double *getEdges() const;
    virtual bool isEquidistant() const;
    
  protected:  
    int nbins;        
    double *binedges; 
    bool equidistant; 
  
  private:
    Binning& operator= (const Binning&);
};

We have defined constructors for all our histogram classes that take Binning objects instead of the parameters nbins, xlow, xhigh.
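To illustrate what getBin() has to do for both kinds of constructors, here is a sketch of a Binning-like class. SimpleBinning is hypothetical, not the SFH implementation, and the bin numbering is an assumption borrowed from the ROOT convention: bins 1..nbins, 0 for underflow, nbins+1 for overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class SimpleBinning {
  public:
    // Equidistant bins, as in Binning (nbins, xlow, xhigh)
    SimpleBinning(int nbins_, double xlow, double xhigh)
      : nbins(nbins_), edges(nbins_ + 1), equidistant(true) {
      for (int i = 0; i <= nbins; ++i)
        edges[i] = xlow + (xhigh - xlow) * i / nbins;
    }
    // Variable bins, as in Binning (nbins, binedges)
    SimpleBinning(int nbins_, const double binedges_[])
      : nbins(nbins_), edges(binedges_, binedges_ + nbins_ + 1),
        equidistant(false) {}

    int getBin(double x) const {
      if (x < edges.front()) return 0;           // underflow
      if (x >= edges.back()) return nbins + 1;   // overflow
      if (equidistant) {                         // direct computation
        int i = 1 + static_cast<int>(nbins * (x - edges.front()) /
                                     (edges.back() - edges.front()));
        return std::min(i, nbins);               // guard against rounding
      }
      // Binary search: position of the first edge strictly above x
      return static_cast<int>(std::upper_bound(edges.begin(), edges.end(), x) -
                              edges.begin());
    }
    double getLowerBinEdge(int i) const { return edges[i - 1]; }
    double getUpperBinEdge(int i) const { return edges[i]; }
  private:
    int nbins;
    std::vector<double> edges;
    bool equidistant;
};
```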

BinningFun: Histogram Binning Function objects

Derived from Binning is class BinningFun.

Compared to a Binning, it has the same constructors and only a few additional methods that have to be implemented by a derived class:

class BinningFun: public Binning, public IntFun {
  public:
    BinningFun ();
    BinningFun (int nbins_, float xlow, float xhigh);
    BinningFun (int nbins_, const float binedges_[]);
    BinningFun (int nbins_, const double binedges_[]);
    BinningFun (const Binning& binning_);

    virtual int operator() () const = 0;
    virtual const char *getBinName(int i) const = 0;
    virtual const char *getBinTitle(int i) const = 0;
  protected:
    virtual ~BinningFun() {};
};

A BinningFun returns a bin number (-1 for "no bin") in operator(); in this respect it is also an IntFun. BinningFun objects are used by classes SFSetOfHistograms and SFMatrixOfHistograms to decide into which histogram a certain entry should be made.

For instance, if we have a flag "filetype" in our Ntuple, where 0 means "data", 1 means "Signal Monte Carlo", and 2 means "Background Monte Carlo", we could have a BinningFun that returns 0, 1, or 2, and then the entry goes into the corresponding histogram in a SFSetOfHistograms.

In addition to operator(), a BinningFun has two other methods that have to be implemented by a derived class, namely getBinName(int i) and getBinTitle (int i). These methods should return strings (allocated with operator new[]) with a name or a title for a given bin i. The name should be short (like "0", "1", "2" or "data", "sigMC") and contain no spaces, the title could be nicer like "Data", "Signal Monte Carlo" or "0.1 < t < 0.2". These methods are used by SetOfHistograms and MatrixOfHistograms and their subclasses to generate histogram names and titles during the histogram booking stage.

To make life easier, we have defined a class FloatFunBinning, which takes a FloatFun that defines a variable according to which the binning is done, a Binning, and a string that should be the variable name (like "t" or "pt" or "W"), from which bin names like "007" and bin titles like "0.1 < t < 0.2" are generated.
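The name and title generation could look roughly like this. makeBinName and makeBinTitle are hypothetical helpers, and the formats ("%03d" for names, "low < var < high" for titles) are assumptions inferred from the examples "007" and "0.1 < t < 0.2"; SFH's FloatFunBinning may format differently. As required for getBinName() and getBinTitle(), both return strings allocated with new[], which the caller releases with delete[].

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Zero-padded bin name, e.g. 7 -> "007".
char* makeBinName(int i) {
  char* name = new char[16];
  std::snprintf(name, 16, "%03d", i);
  return name;
}

// Bin title such as "0.1 < t < 0.2"; first measure, then allocate.
char* makeBinTitle(double low, double high, const char* var) {
  int len = std::snprintf(0, 0, "%g < %s < %g", low, var, high);
  char* title = new char[len + 1];
  std::snprintf(title, len + 1, "%g < %s < %g", low, var, high);
  return title;
}
```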


Generated on Thu Oct 26 12:55:27 2006 for SFH by doxygen 1.3.2