Alpos Documentation

Alpos is an object-oriented data-to-theory comparison and fitting tool.

The project homepage is found at http://www.desy.de/~britzger/alpos/

Introduction

Alpos is an object-oriented data-to-theory comparison tool. The program is ideally suited for fits of theory parameters, for statistical analyses of theory predictions and for data combinations. The modular object-oriented architecture of the code allows for an easy implementation of new data sets and theory predictions, as well as of new analysis tools or extensions of existing ones. The concept of Alpos involves a modular and transparent implementation of theory predictions, which also enables a high level of consistency between the different components of the predictions, while still providing a very user-friendly interface.

The interface for new contributions is clearly defined in an object-oriented manner through inheritance. New contributions to Alpos may be so-called functions, tasks and datasets.

New theory functions are theoretical predictions, which take single parameters and/or other functions as input. Typically, these are predictions for particular measurements, but they may also be, for instance, PDF or alpha-s evolution codes. Functions calculate their output values from all current input values in the virtual function Update().

Tasks are classes which provide an Execute() routine and may perform any kind of operation on the theory predictions or datasets. Tasks are allowed to access all present theory values and also the datasets. Typical tasks are fitting routines or statistical analyses; other tasks may be, for instance, print-out routines, plotting tools or the write-out of results to disk in a particular format. A minimal sketch of a task is shown below.
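
A minimal sketch of what such a task may look like; the base-class name ATask and the exact method signatures are assumptions here, not verbatim Alpos code:

// sketch only: ATask and the signatures are assumed, not taken from the Alpos sources
#include <iostream>

class ATask {                        // assumed task interface
 public:
   virtual bool Init() = 0;
   virtual bool Execute() = 0;
   virtual ~ATask() {}
};

class APrintHello : public ATask {   // hypothetical example task
 public:
   bool Init() override { return true; }
   bool Execute() override {
      // a real task would access theory values and datasets here,
      // e.g. through the TheoryHandler (see the developers documentation)
      std::cout << "Hello from APrintHello" << std::endl;
      return true;
   }
};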

Within Alpos, datasets are input data files which represent a measurement with all its uncertainties. The data card specifies how the uncertainties are treated in the tasks, and different phase-space regions may easily be selected for detailed analyses.

More details are found on the main homepage http://www.desy.de/~britzger/alpos/ . All developers may mirror the project homepage to their webspace, which may partially provide more up-to-date documentation.

Contributors

Contributors, in random order:
  • Georg Sieber
  • Daniel Savoiu
  • Daniel Reichelt
  • Klaus Rabbertz
  • Kristian Bjoerke
  • Daniel Britzger
  • Valerio Bertone
  • Anterpret Kaur
  • ...
  • Add yourself if you made some contributions...

Download, installation and first steps

The source code can be checked out from a public svn repository. Please ask for the repository URL by mail: daniel.britzger@desy.de

Requirements

Required packages:
Optional:
  • sphinx
  • pdflatex
  • minted (for code highlighting in the .pdf-manual)

Download and installation

Check out the package from svn using svn co <package-url>. For the installation, all packages mentioned above have to be installed first.

QCDNUM v17 needs to be compiled with option -fPIC, i.e. add this option to the four makelib files like: gfortran -c -Wall -O2 -fPIC -Iinc src/*.f.

For installation, please use cmake:

$ cmake .

Add further arguments for your local installation:

+ -DCMAKE_INSTALL_PREFIX=<your-dir>
+ -DQCDNUM_PREFIX=<your-dir>
+ -DLHAPDF_PREFIX=<your-dir>
+ -DFNLO_PREFIX=<your-dir>
+ -DAPFEL_PREFIX=<your-dir>

or, if everything is installed in the same “prefix” directory:

+ -DPREFIX=<your-dir>

It may be helpful to define these paths in a dedicated simple shell script, as sketched below.
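
A minimal sketch of such a script; the installation prefix is a placeholder for your local setup:

#!/bin/bash
# hypothetical installation prefix -- adapt to your local setup
PREFIX=$HOME/local

cmake -DCMAKE_INSTALL_PREFIX=$PREFIX \
      -DQCDNUM_PREFIX=$PREFIX \
      -DLHAPDF_PREFIX=$PREFIX \
      -DFNLO_PREFIX=$PREFIX \
      -DAPFEL_PREFIX=$PREFIX \
      .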

For Eigen, either copy ‘Eigen’ into the directory apccpp or create a suitable symbolic link therein. Alternatively, you can also define the path to your Eigen directory in CMakeLists.txt, like:

include_directories(${EIGEN_INCLUDE_DIR})

or for example:

include_directories(/cvmfs/atlas.cern.ch/repo/sw/software/x86_64-slc6-gcc48-opt/20.1.2/AtlasCore/20.1.2/External/AtlasEigen/x86_64-slc6-gcc48-opt/pkg-build-install-eigen)

Now continue with the compilation:

$ make
$ make install

There are known compilation problems on Macs, which are not yet fixed.

It may happen that cmake picks up undesired default compilers. In that case, set the variables CC and CXX, e.g. like:

$ export CC=`which gcc`
$ export CXX=`which g++`

or directly specify their locations.

This produces the executable ./src/alpos. To run Alpos, stay in the trunk directory and type:

$ ./src/alpos tutorial/1.welcome.str

This executes Alpos with a very simple steering card. This first example prints the welcome message and executes the task list of the welcome steering. A few info and warning statements can be ignored.

The steering file

A typical Alpos run is based on one main steering file, which is the input to the Alpos class. In the main steering file, further steering files (.dat) are specified for the datasets. Using the option ‘>>’ from read_steer, the main steering file may also be subdivided into multiple files. Furthermore, steering parameters for the Alpos theory have to be specified in the steering.

The main steering file typically consists of five different parts:
  • Some global alpos settings

  • The specification of the datafiles with their respective theory predictions (DataTheorySets)

  • The tasks to be executed (Tasks)

  • The parameters for the tasks (mind: these are not (Alpos) theory parameters)

  • The alpos theory settings (AlposTheory), consisting of:
    • The functions to be initialized (InitFunctions) (the predictions for the datasets, which are also functions, are given together with the datasets)

    • (Theory) parameters:
      • simple parameters
      • input parameters to the functions
      • default parameters for the functions

The schematic layout of a main steering file is sketched below; well-documented example steering files are available in the directory ./tutorial.
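
A schematic sketch of the five parts; the block names follow the list above, while all concrete syntax and values are placeholders (please refer to ./tutorial for working examples):

# schematic sketch only -- see ./tutorial for working steering files

# (1) some global alpos settings
[...]

# (2) DataTheorySets: the data files with their respective theory predictions
[...]

# (3) Tasks: the tasks to be executed, in order
[...]

# (4) the parameters for the tasks
[...]

# (5) AlposTheory: InitFunctions and the (theory) parameters
[...]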

Alpos - ‘Datasets’

A brief summary and a collection of comments on the available data files.

Define a new dataset

An Alpos dataset consists of several different parts and is typically stored in a single file with the extension .dat. The constituents are:

  • Some general description
  • Alpos parameters for the Alpos theory function(s)
  • The Data table with values and errors
  • The specification of the Errors
  • Optionally also Subsets, Cuts, TheoryFactors and correlation matrices may be defined

How to specify errors

Within Alpos, each error is handled as an instance of the AError class with a unique name. This name is composed of the ErrorSet and the ErrorName (as specified in the table ‘Errors’) as <ErrorSet>_<ErrorName>. Errors with an identical (full) name are assumed to be correlated among different datasets, taking their specified correlation coefficients into account.

All errors have to be specified in the table Errors. Therein, the columns ErrorName, Column and Type must be specified. An example could look like:

ErrorSet              "ATLAS Run-I"
ErrorUnit             "Percent"
Errors {{
   ErrorName      Column               Type         Nature
     "Stat"     "stat.(%)"              SC            P
     "JES"     "JES(up):JES(dn)"        0.5           M
      RCES     "EHFS(up):EHFS(dn)"      ""            M
      Lumi           2.5                 1            M
      Trig           1.2                EY1           M
}}

In Errors, each row specifies an error source. The value of ErrorName can be chosen freely (use quotation marks if you want to include empty spaces).

The column Column may specify one or two (separated by :) columns of the Data table which give the size of the errors. If only one column is given, the error is assumed to be symmetric; otherwise, the first column specifies the up and the second column the down uncertainty. Alternatively, the value of this error source may be specified directly (e.g. 2.5 for a 2.5% luminosity uncertainty, if ErrorUnit is set to Percent).

The units in the Data-table are specified for all error sources with the key ErrorUnit (options are: Percent, Relative, Absolute).

The column Type specifies the type of the uncertainty. Four distinctions are made:
  • Statistical (S) or systematic uncertainty (Y)
  • Experimental (E) or theoretical (T) nature
  • Define the correlation coefficient (0, 1, 0.5, ...) or indicate that a correlation matrix is provided (C or Matrix)
  • Multiplicative (M) or additive (A) treatment

If no type specification is given (i.e. if the table entry is "" or some specifier is missing), the default error type EYM1 is used, which means it is an “experimental multiplicative systematic uncertainty with a correlation coefficient of 1, where no correlation matrix is specified”. This means that essentially only the letters S, T, C and A and correlation coefficients smaller than 1 have to be given explicitly.
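
For illustration, a few possible Type entries and their reading, assuming the letters combine as described above (the composed forms are sketches, not verbatim examples):

  ""       ->  EYM1: the default (experimental, systematic, multiplicative, coefficient 1)
  "S"      ->  statistical instead of systematic, all other defaults kept
  "TA0.5"  ->  theoretical, additive, correlation coefficient 0.5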

The column Nature is free for user extensions and anything may be specified therein.

Important

For backward compatibility, the column Type may also be named Correlation.

Important

If the first error source contains the substring Stat, it is assumed to be a statistical uncertainty (unless Y is explicitly specified in Type).

Specify a correlation matrix

If the type-specifier of an error contains the key-letter C, then a covariance or correlation matrix has to be specified. In that case, the three keys <ErrorName>_Matrix_Format, <ErrorName>_Matrix_Type and <ErrorName>_Matrix have to be given, where <ErrorName> denotes the name of the error as given in the Errors table.

For instance, the correlation coefficients of the statistical uncertainty Stat may be given like:

Stat_Matrix_Format           "SingleValues" # "Matrix" or "SingleValues" or specify single value only
Stat_Matrix_Type             "Correlation" # 'Covariance', 'Correlation' or 'CorrelationPercent'

Stat_Matrix {{
  q2min  ptmin        q2min   ptmin       values
    150      7          150      11        -0.224
  [...]
}}

The header of the table Stat_Matrix must denote sufficient columns to uniquely identify a row, and these columns are given twice (once for each of the two data points being correlated). The values in each row must match the values in Data. The column values is mandatory and denotes the correlation coefficients or the covariances.

Alternatively, the matrix may be specified directly in either full or half-matrix notation. Mind that the first row of the table Stat_Matrix needs to be left empty. The values may specify either the correlation coefficients or the covariances. For example:

Stat_Matrix_Format           "Matrix" # "Matrix" or "SingleValues" or specify single value only
Stat_Matrix_Type             "Correlation" # 'Covariance', 'Correlation' or 'CorrelationPercent'

Stat_Matrix {{
       # here an empty line is important !
   1
  -0.5   1
   0.1  -0.5   1
  [...]
}}

Optional keys in the data file

Todo

Specify theory factors:

TheoryFactors { }

Specify subsets:

Subsets { }

Specify cuts:

Cuts { }

Developers documentation

The ‘core’ package of Alpos consists of only a few ingredients. These are:
  • Dataset steering files to specify the input data.
  • Theory parameters or functions.
  • The TheoryHandler as a global singleton which gives access to individual datasets, predictions and theory components.
  • Tasks, which are executed one after another.
  • The Alpos-class which reads the steering file and brings everything together.

The main() function instantiates a single Alpos object, which takes the steering file as its input parameter. In the Alpos constructor, the TheoryHandler is initialized. Afterwards, Alpos simply executes all tasks as specified in the steering by calling their Init() and Execute() functions one after the other. A sketch of this program flow is shown below.
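
A minimal sketch of this flow; the constructor argument follows the description above, while the header path and the name of the task-execution method are assumptions:

// sketch only: the header path and DoTasks() are assumptions, not verbatim Alpos code
#include "alpos/Alpos.h"

int main(int argc, char** argv) {
   if (argc < 2) return 1;    // a steering file is required
   Alpos alpos(argv[1]);      // reads the steering and initializes the TheoryHandler
   alpos.DoTasks();           // calls Init() and Execute() of each task in turn
   return 0;
}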

The Alpos class

The Alpos class reads in the steering, and prepares and executes the initialization of the TheoryHandler. Then, the tasks are executed by Alpos one after the other.

The TheoryHandler

The TheoryHandler is a global singleton class which provides access to the instances and the values of functions and parameters. The TheoryHandler can be accessed by:

static TheoryHandler* TheoryHandler::Handler();

In order to allow simplified access to parameters and functions, preprocessor macros are defined in ATheory.h. These serve two purposes: access to functions/parameters within functions, and within tasks.

Mind the difference in the call of these macros within functions and within tasks:
  • Within functions, the macros access the input parameters of the particular function instance. The parameter names are given explicitly and without quotation marks.
  • Within tasks, any parameter may be accessed and the full parameter name has to be passed. The argument is passed by value, i.e. either as a string variable or as a literal in quotation marks.

These macros are:

Macro                  Return value     Usage
Use within functions
PAR(parname)           double           Access the value of an input parameter (which is a requirement of the function); identical to VALUES(parname)[0]
PAR_S(parname)         std::string      Access the value of a (string-)parameter which is a requirement of the function
VALUES(parname)        vector<double>   Access the values of a parameter which is a requirement of the function
UPDATE(parname)        void             Like PAR(parname), but without return value; used to update a parameter, e.g. to later use QUICK within the Update() function
CHECK(parname)         bool             Check if a parameter has changed and thus this function obtained the IsOutdated flag from that parameter
CONST(parname)         void             Set a parameter to constant, i.e. the parameter is not allowed to change any longer
SET(parname,V,E)       void             Set the value of any input parameter
SET_S(parname,V,E)     void             Set the value of any input (string-)parameter
QUICK(X,A)             vector<double>   Quick access to the values of an input function
QUICK_VAR(par,n,...)   vector<double>   Quick access to the values of an input function
QUICK_VEC(X,Y)         vector<double>   Quick access to the values of an input function

Use within tasks
PAR_ANY("X")           double           Access any parameter/function value. The argument X is passed by value (i.e. may require quotation marks).
PAR_ANY_S("X")         string           Access any string parameter. The argument X is passed by value (i.e. may require quotation marks).
VALUES_ANY("X")        vector<double>   Access any parameter/function values. The argument X is passed by value (i.e. may require quotation marks).
QUICK_ANY("X",val)     vector<double>   Quick access to any parameter/function value. The argument X is passed by value (i.e. may require quotation marks).
SET_ANY("X",V,E)       void             Set the value of any parameter. The argument X is passed by value (i.e. may require quotation marks).
SET_ANY_S("X",V,E)     void             Set the value of any string parameter. The argument X is passed by value (i.e. may require quotation marks).

Full access to the parameters is available through TheoryHandler::GetParmD(string) or TheoryHandler::GetFuncD(string).
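
As an illustration, a hedged sketch of how task code might use these macros; the header path and all parameter and function names here are hypothetical:

// sketch only: header path and all parameter/function names are hypothetical
#include <vector>
#include "alpos/ATheory.h"   // assumed to provide the access macros

void ExampleTaskCode() {
   // access a single parameter value by its full name (argument passed by value)
   double asmz = PAR_ANY("AlphasMz");

   // access all values of a theory function
   std::vector<double> pred = VALUES_ANY("MyPrediction");

   // set value V and error E of a parameter
   SET_ANY("AlphasMz", 0.118, 0.);

   (void)asmz; (void)pred;   // silence unused-variable warnings in this sketch
}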

Functions

Todo: Describe the
  • Function name
  • Init() and Update() functions
  • Requirements and input parameters/functions
  • Quick access
  • Access to steering parameters
  • Special cases: Singletons for Fortran wrapping
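
Until these items are documented in detail, a rough skeleton of a theory function may serve as orientation. The base-class name AFuncD appears elsewhere in this documentation, but its exact interface as shown here is an assumption:

// rough sketch only: the interface of AFuncD is assumed, not verbatim Alpos code
#include <vector>

class AFuncD {                                  // assumed function interface
 public:
   virtual bool Init() = 0;                     // called once at initialization
   virtual bool Update() = 0;                   // recompute the output from the current inputs
   virtual std::vector<double> GetValues() const = 0;
   virtual ~AFuncD() {}
};

class MyPrediction : public AFuncD {            // hypothetical prediction
 public:
   bool Init() override { fValues.assign(1, 0.); return true; }
   bool Update() override {
      // a real function would read its input parameters here, e.g. via PAR(...)
      fValues[0] = 42.;                         // placeholder calculation
      return true;
   }
   std::vector<double> GetValues() const override { return fValues; }
 private:
   std::vector<double> fValues;
};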

Init()

Important

No access to the values of other functions is possible in Init().

Other member-functions

Define your own member functions for easily readable and maintainable code, and call them from Init() or Update().

Singleton functions

Use singleton functions to wrap Fortran routines.
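
A hedged sketch of this pattern; the Fortran routine and all names here are hypothetical:

// sketch only: the Fortran symbol and all names are hypothetical
extern "C" void my_fortran_evolve_(double* q2, double* result);  // Fortran 77 routine

// Fortran code keeps global state (COMMON blocks), so only a single
// instance of the wrapper may exist: a singleton
class AFortranWrapper {
 public:
   static AFortranWrapper& Instance() {
      static AFortranWrapper instance;    // created once, on first use
      return instance;
   }
   double Evolve(double q2) {
      double result = 0.;
      my_fortran_evolve_(&q2, &result);   // call into the Fortran routine
      return result;
   }
 private:
   AFortranWrapper() {}                   // no public construction
   AFortranWrapper(const AFortranWrapper&) = delete;
};

The singleton ensures that the Fortran common blocks are initialized and modified from one place only, regardless of how many Alpos functions use the wrapper.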

read_steer

read_steer is a tiny tool to read steering values from one or more steering files. The class reads in values which are stored in a text file. New variables can be declared, read in and used without recompilation.

Features
  • Following types are supported:
    • Single values: bool, int, double, string (with empty spaces), char
    • Arrays: int, double, string (with empty spaces)
    • Tables/matrices: int, double, string
  • Multiple files can be read in and handled individually or together.

  • Namespaces can be defined (e.g. the same variable name in different namespaces).

  • Variables within a steering file can be defined similar to shell scripts.

  • Command line arguments can be parsed and can supersede values from files.

  • Easy access via preprocessor macros

  • Other files can be included into a steering file

The full documentation and examples are found in fastnlotk/read_steer.h. A small sketch of the syntax is shown below.
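
A small sketch of the syntax; the values are hypothetical, the {{ }} table notation is the one used in the data-file examples above, and the single-brace array notation is an assumption (see fastnlotk/read_steer.h for the authoritative documentation):

# sketch only -- see fastnlotk/read_steer.h for the full syntax
Verbosity        2                      # single int value
Title            "my analysis"          # string values may contain spaces

Bins {                                  # array of doubles (brace syntax assumed)
   10.  20.  50.  100.
}

Points {{                               # table with a header row
   x      y
   1.0    2.0
   2.0    4.0
}}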

Misc

This Documentation

This HTML documentation is built with sphinx using rst (reStructuredText) input. Its source files are stored in the Alpos root directory at docs/sphinx and docs/sphinx/source.

To (re-)build this documentation, you need sphinx to be installed; then type:

cd docs/sphinx
make html

You find the index page in html/index.html.

Details about sphinx can be found at http://www.sphinx-doc.org . Details about the rst markup syntax can be found at http://docutils.sourceforge.net/rst.html . Details about the alabaster theme are found at https://pypi.python.org/pypi/alabaster .

The most useful summary page of commands is at rstdemo.html .

Revision history

  • ongoing developments
    Contributors: D. Britzger, D. Reichelt, D. Savoiu, K. Rabbertz, A. Kaur, G. Sieber
    • New sphinx Docu
    • Function for APPLGrid (not committed)
    • fastNLO interface more flexible
    • Full APFEL functionality
    • Switches for ‘threshold-corrections’ for fastNLO
    • Lots of minor bugfixes
    • Analytic calculation of nuisance parameters
    • Updated interface for errors with column ‘type’ and specifiers like ‘EYN1’ or ‘TSM0.5’
    • ...
    • ...
    • ...
    • ...
  • v0.4, 25. Sep 2015, contact: daniel.britzger@desy.de
    Contributors: D. Britzger, D. Reichelt, K. Bjoerke, K. Rabbertz
    • Enable PDF fits: Tested HERAPDF1.0 and HERAPDF2.0 against HERAFitter
    • New chisq definitions with analytic calculation of nuisance parameters
    • Write out PDF root-files for plotting (SavePDFTGraph)
    • 2D contour scans (Contour)
    • 1D chisq scans (Chi2Scan)
    • Apply cuts on data
    • Enable to exclude datapoints
    • access PDF uncertainties from LHAPDF6
    • Specify uncertainties directly as numerical value
    • Pass steering parameters in command line
    • Enable to pass error-‘nature’ through code
    • bugfix for covariance matrix of subsets
    • Clearer getters for errors (uncorr, stat, matrix-type)
    • Dummies for APC, Bi-log PDF parameterization
    • Full interface to EPRC (EPRC)
    • Print error summary (PrintErrors)
  • v0.3, 24. Jul 2015, contact: daniel.britzger@desy.de
    • version for summer students
    • Tested and verified inclusive jet fits
    • ...
  • v0.2, 15. Feb 2015, contact: daniel.britzger@desy.de
    Update with relevant feature to exclude datapoints (to come) and study ranges of data points
    • Verbosity steerable
    • error averaging steerable
    • Subsets of datapoints for each datatheory sets
    • AStatAnalysis: (chisq, pulls, p-value) also for ‘subsets’
    • Improved printout and verbosity-level
    • One minor bugfix in ARegisterRequirements() (default values no longer needed)
    • Calculation of Pull values
    • New Chisq’s for uncor and stat uncertainties only
    • Simpler access to dataTheorySets from the TheoryHandler: i.e. return map<name,pair<AData*,AFuncD*>> // for 'full datasets' and return map<name,map<name,pair<AData*,AFuncD*>>> // for subsets
    • Init subsets in TheoryHandler
  • v0.1, 11. Feb 2015, contact: daniel.britzger@desy.de
    First version available as tarball
    • Tasks, Functions and datasets available
    • Simple fitting Task AFitter
    • several chisq functions
    • Various treatments of uncertainties: corr, uncorr, stat, error-averaging, etc...
    • Many inclusive jet data tables (HERA,LHC,Tevatron)
    • SuperData and SuperTheory
    • fastNLO interface
    • QCDNUM init and alpha_s evolution
    • CRunDec alpha_s code
    • LHAPDF interface (PDFs and alpha_s)
    • TheoryHandler and all that parameter/function stuff
    • Alpha_s fit doable for 8 inclusive jet measurements
    • Short pdf-manual available

Todo’s

Apparent todo’s
  • Tasks cannot yet access the results from previous tasks, although these are collected and stored
  • Re-implement the datasets (with full error-split up as provided in the publications)
  • Maybe the ‘datatheory’ steering should be somewhat separated, such that different theory functions can be chosen directly, without requiring a separate steering file with duplicated data
  • Plotting tools, etc...
  • GetRequirements() should return ‘const string&’
  • Chisq’s could be implemented as ‘AFunctions’ as well
