Solution methods
Local fit
Parallelization
Compressed sparse matrix
Gzipped C binary files
Transformation from FORTRAN77 to Fortran90
Memory management
Read buffer size
Number of binary files

Major changes with respect to the draft manual.

Solution methods

The following methods to obtain the solution $\Vek{x}$ from a linear equation system $\Vek{A}\cdot\Vek{x}=\Vek{b}$ are implemented:

Inversion

The solution and the covariance matrix $\Vek{A}^{-1}$ are obtained by inversion of $\Vek{A}$ . Available are the value, error and global correlation for all global parameters. The matrix inversion routine has been parallelized and can be used for up to several 10000 parameters.

Diagonalization

The solution and the covariance matrix $\Vek{A}^{-1}$ are obtained by diagonalization of $\Vek{A}$ . Available are the value, error, global correlation and eigenvalue (and eigenvector) for all global parameters.

Minimal Residual Method (MINRES)

The solution is obtained by minimizing $\Vert\Vek{A}\cdot\Vek{x}-\Vek{b}\Vert_2$ iteratively. MINRES is a special case of the generalized minimal residual method (GMRES) for symmetric matrices. Preconditioning with a band matrix of zero or finite bandwidth is possible. Individual columns $\Vek{c_i}$ of the covariance matrix can be calculated by solving $\Vek{A}\cdot\Vek{c}_i=\Vek{1}_i$ where $\Vek{1}_i$ is the i-th column on the unit matrix. The most time consuming part (product matrix times vector per iteration) has been parallelized. Available are the value for all (and optionally error, global correlation for few) global parameters.

Local fit

In case the local fit is a track fit with proper description of multiple scattering in the detector material additional local parameters have to be introduced for each scatterer and solution by inversion can get time consuming (~ $n_{lp}^3$ for $n_{lp}$ local parameters). For trajectories based on broken lines [ref 4,6] the corresponding matrix $\Vek{\Gamma}$ has a bordered band structure ( $\Gamma_{ij}=0$ for $\min(i,j)>b$ (border size) and $|i-j|>m$ (bandwidth)). With root-free Cholesky decomposition the time for the solution is linear and for the calculation of $\Gamma^{-1}$ (needed for the construction of the global matrix) quadratic in $n_{lp}$ . For each local fit the structure of $\Vek{\Gamma}$ is checked and the faster solution method selected automatically.

Parallelization

The code has been largely parallelized using OpenMP™. This includes the reading of binary files, the local fits, the construction of the sparsity structure and filling of the global matrix and the global fit (except by diagonalization). The number of threads is set by the command threads.

Caching. The records are read in blocks into a read cache and processed from there in parallel, each record by a single thread. For the filling of the global matrix the (zero-compressed) update matrices ( $\Vek{\D C}_1+\Vek{\D C}_2$ from equations (15), (16)) produced by each local fit are collected in a write cache. After processing the block of records this is used to update the global matrix in parallel, each row by a single thread. The total cache size can be changed by the command cache.

Compressed sparse matrix

In sparse storage mode for each row the list of column indices (and values) for the non-zero elements are stored. With compression regions of continous column indices are represented by the first index and their number (packed into a single 32bit integer). Compression is selected by the command compress. In addition rare elements can be neglected (,histogrammed) or stored in single instead of double precision according to the pairentries command.

Gzipped C binary files

The zlib can be used to directly read gzipped C binary files. In this case reading with multiple threads (each file by single thread) can speed up the decompression.

Transformation from FORTRAN77 to Fortran90

The Millepede source code has been formally transformed from fixed form FORTRAN77 to free form Fortran90 (using TO_F90 by Alan Miller) and (most parts) modernized:

IMPLICIT NONE everywhere. Unused variables removed.
COMMON blocks replaced by MODULEs.
Backward GOTOs replaced by proper DO loops.
INTENT (input/output) of arguments described.
Code documented with doxygen.

Unused parts of the code (like the interactive mode) have been removed. The reference compiler for the Fortran90 version is gcc-4.6.2 (gcc-4.4.4 works too).

Memory management

The memory management for dynamic data structures (matrices, vectors, ..) has been changed from a subdivided static COMMON block to dynamic (ALLOCATABLE) Fortran90 arrays. One Pede executable is now sufficient for all application sizes.

Read buffer size

In the first loop over all binary files a preset read buffer size is used. Too large records are skipped, but the maximal record length is still being updated. If any records had to be skipped the read buffer size is afterwards adjusted according to the maximal record length and the first loop is repeated.

Number of binary files

The number of binary files has no hard-coded limit anymore, but is calculated from the steering file and resources (file names, descriptors, ..) are allocated dynamically.

Table of Contents