Stefan Schmitt, DESY
Hamburg, Notkestr. 85, 22607 Hamburg, GERMANY

Member of the H1 collaboration and of the Statistics tools group of the Helmholtz Alliance for Physics at the Terascale


**Note: this documentation was written for TUnfold version 16. Please consider using the latest version of TUnfold.**

**Number of bins:** you have to use a finer binning in the measured variable than in the truth variable. As a rule of thumb, use at least twice as many bins for the measured variable as for the truth variable.

Example: measure 20 bins and unfold 10 bins. The data histogram has 20 bins. The 2-dimensional histogram has 20x10 bins.

*Note: unfolding with regularisation (τ>0) may still technically work even if the number of measured bins is equal to or less than the number of bins in the "true" variable. However, TUnfold's standard method to estimate the proper value of τ (automated L-curve scan) requires one unfolding step with τ=0, and this step is likely to fail if the number of measured bins is too small.*

**How to unfold Monte Carlo with itself:** it is in general a good idea to test the unfolding with Monte Carlo events alone before looking at data. However, to get meaningful results, the distribution which is unfolded must be statistically independent of the 2-dimensional migration histogram.

Example: split the Monte Carlo into two samples: 90% of the events are used to fill the 2-dimensional histogram, the remaining 10% are used to produce the "data" distribution.
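The 90%/10% split can be sketched as follows. This is a minimal, ROOT-free illustration and not part of TUnfold: the names `McSample`, `assignSample` and `countPseudoData` are hypothetical, and the split by event index assumes that the order of events in the Monte Carlo sample is unrelated to the physics. Each event goes to exactly one sample, which is what keeps the two distributions statistically independent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, not part of TUnfold.
enum class McSample { Migration, PseudoData };

// Assign an event to one of two disjoint samples by its running index.
// With the default, 10 out of every 100 events become pseudo-data and
// the remaining 90 are used for the migration histogram.
// Assumption: the event order is uncorrelated with the observables.
McSample assignSample(std::uint64_t eventIndex, unsigned pseudoDataPercent = 10) {
    assert(pseudoDataPercent > 0 && pseudoDataPercent < 100);
    return (eventIndex % 100 < pseudoDataPercent) ? McSample::PseudoData
                                                  : McSample::Migration;
}

// Count how many of the first nEvents events end up in the pseudo-data sample.
std::uint64_t countPseudoData(std::uint64_t nEvents, unsigned pseudoDataPercent = 10) {
    std::uint64_t n = 0;
    for (std::uint64_t i = 0; i < nEvents; ++i) {
        if (assignSample(i, pseudoDataPercent) == McSample::PseudoData) ++n;
    }
    return n;
}
```

In an event loop, events with `assignSample(i) == McSample::Migration` would fill the 2-dimensional histogram and all other events would fill the "data" histogram.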

**Use of the underflow/overflow bins:** the underflow and overflow bins of the reconstructed variable have a special meaning. *Make sure that the underflow/overflow bins are not filled, unless you understand what is going on.* The meaning of underflow/overflow bins is indicated in the picture below.

For the 2-dimensional histogram of migrations, the overflow or underflow bins of the measured variable are used to count the Monte Carlo events which are not reconstructed. The "true" distributions are unfolded using the bins inside the data histograms, but the events in the overflow or underflow bins are used to extrapolate to the full phase-space. This can be understood as an efficiency correction. The underflow and overflow bins of the data histogram are discarded.
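The efficiency interpretation can be made concrete with a small, ROOT-free sketch. It is an illustration only and does not reproduce TUnfold's internal code. The function name `truthBinEfficiency` and the matrix layout are assumptions: row 0 plays the role of the underflow bin of the measured variable (events which are not reconstructed), and rows 1 and higher are the regular measured bins.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TUnfold.
// migration[i][j]: number of Monte Carlo events in measured bin i and truth bin j.
// Assumed layout: i = 0 holds the non-reconstructed events (underflow bin of
// the measured variable), i >= 1 are the regular measured bins.
std::vector<double> truthBinEfficiency(const std::vector<std::vector<double>>& migration) {
    assert(!migration.empty());
    const std::size_t nTruth = migration[0].size();
    std::vector<double> efficiency(nTruth, 0.0);
    for (std::size_t j = 0; j < nTruth; ++j) {
        double reconstructed = 0.0;
        double total = 0.0;
        for (std::size_t i = 0; i < migration.size(); ++i) {
            assert(migration[i].size() == nTruth);
            total += migration[i][j];
            if (i > 0) reconstructed += migration[i][j];
        }
        // Truth bins without any Monte Carlo events keep efficiency 0.
        if (total > 0.0) efficiency[j] = reconstructed / total;
    }
    return efficiency;
}
```

For a truth bin with 20 non-reconstructed and 80 reconstructed events the function returns 0.8, so the unfolded content of that bin corresponds to the reconstructed yield scaled up by 1/0.8.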

There are also underflow and overflow bins of the "true" variable. These are treated in the same way as the other bins in the "true" variable. This means that the contents of the underflow and overflow bins of the "true" variable are unfolded from the data.

Example: imagine there is some observable `P` which is non-negative but has no upper bound. However, events with `P≥100` are rare.

- Data histogram filling code:

// book histogram with 20+1 bins.

TH1D *data=new TH1D("data","Pdata",21,0.,105.);

...

// fill histogram in event loop

data->Fill(TMath::Min(Pdata,104.9));

The data events with `Pdata≥100` are all filled into bin number 21.

- Migration histogram filling code:

// book histogram with 20+1 times 10 bins

TH2D *migration = new TH2D("migration","xdata vs xgen",21,0.,105.,10,0.,100.);

...

// fill histogram in event loop

if(isReconstructed) migration->Fill(TMath::Min(Pdata,104.9),Pgen);

else migration->Fill(-1.,Pgen);

The events with `Pdata≥100` are filled into bin number 21.

The events with no reconstruction are filled into the underflow bin.

For the generated variable, the events with `Pgen≥100` end up in the overflow bin of the y-axis.

The unfolding then determines the distribution of the "true" variable (`Pgen`), using the 21 data bins in `Pdata`. The unfolded distribution in `Pgen` is corrected for efficiency losses, using the non-reconstructed events as stored in the underflow bin.
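The bin assignments of this example can be checked without ROOT using the sketch below. It emulates the lookup for an axis with equal-width bins, where bin 0 is the underflow bin, bins 1 to nbins are the regular bins, and bin nbins+1 is the overflow bin. The function names `findBin`, `measuredBin` and `truthBin` are hypothetical and chosen for illustration; the sketch assumes the binning of the example (21 bins on 0 to 105 for `Pdata`, 10 bins on 0 to 100 for `Pgen`).

```cpp
#include <algorithm>
#include <cassert>

// Emulation of a bin lookup on an axis with equal-width bins:
// returns 0 for underflow, 1..nbins for regular bins, nbins+1 for overflow.
int findBin(int nbins, double xmin, double xmax, double x) {
    assert(nbins > 0 && xmin < xmax);
    if (x < xmin) return 0;
    if (x >= xmax) return nbins + 1;
    return 1 + static_cast<int>(nbins * (x - xmin) / (xmax - xmin));
}

// Bin on the measured axis of the example: values are clamped to 104.9 so
// that Pdata>=100 lands in bin 21; non-reconstructed events are filled at
// -1 and therefore land in the underflow bin 0.
int measuredBin(bool isReconstructed, double pData) {
    const double x = isReconstructed ? std::min(pData, 104.9) : -1.0;
    return findBin(21, 0.0, 105.0, x);
}

// Bin on the truth axis of the example: Pgen>=100 lands in the overflow bin 11.
int truthBin(double pGen) {
    return findBin(10, 0.0, 100.0, pGen);
}
```

Clamping to 104.9 rather than relying on the axis overflow keeps large values of `Pdata` inside the histogram, where they are used as an ordinary measured bin instead of being counted as not reconstructed.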

Only those errors appearing in the most recent TUnfold version are documented below.

Level | Source | Message | Comment |
---|---|---|---|
Warning/Error | TUnfold | too few (ny=%d) input bins for nx=%d output bins | Reliable unfolding requires a finer binning in the reconstructed variable than in the "truth" variable. |
Fatal | InvertMSparse | InvertMConditioned(full matrix) failed | One of the matrices in your problem has a rank deficit. The most common mistake is to have too few bins in the reconstructed variable and τ=0. Another (rare) possibility is that the covariance matrix for the measured variable cannot be inverted. |
Fatal | InvertMSparse | InvertMConditioned failed (part of matrix) | (as above) |
Fatal | InvertMSparse | InvertMConditioned failed (full matrix) | (as above) |
Error | InvertMSparse | inversion failed (diagonal matrix) nerror=%d | (as above) |
Error | InvertMSparse | inversion failed (diagonal part) nerror=%d | (as above) |
Warning/Info | TUnfold | %d regularisation conditions have been skipped | Some truth bins of the migration matrix are not populated. Maybe this happened on purpose (overflow/underflow bins). |
Warning | SetInput | %d input bins have zero error, 1/error set to %lf. | Sometimes there are data bins with error=0 (for example, if there is no data event). These bins cannot be used, unless 1/error>0 is specified. |
Warning | SetInput | %d input bins have zero error, and are ignored. | (as above) |
Info/Warning | TUnfold | the following output bins are not connected to the input side %s | Sometimes there are "truth" bins not connected to any of the data bins (before or after removing data bins with error=0). Such "truth" bins are automatically excluded from the unfolding. In many cases the bins are left empty on purpose (underflow/overflow bin). |
Info/Warning | SetInput | %d output bins are not constrained by any data. | (as above) |
Info/Warning | SetInput | output bin %d depends on ignored input bins %d ... %d | (as above) |
Error | AddSysError | Source %s given twice, ignoring 2nd call. | Systematic error and background sources must have unique names. |
Error | SubtractBackground | Source %s given twice, ignoring 2nd call. | (as above) |
Error | AddSysError | source %s has no influence and has not been added. | Did you provide an empty histogram? |
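The treatment of input bins with zero error, as described for the `SetInput` messages above, can be illustrated with the following ROOT-free sketch. It does not call TUnfold and is not its implementation. The names `InputWeights` and `inverseErrors` are hypothetical, and the convention that a weight of 0 means "bin ignored" is an assumption made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type, not part of TUnfold.
struct InputWeights {
    std::vector<double> oneOverError;  // 1/error per bin; 0 means the bin is ignored
    int nZeroError = 0;                // number of bins with error=0
};

// Build the 1/error weights for the input bins. Bins with error=0 receive
// the user-specified value oneOverZeroError; with the default of 0 they
// carry no weight and are effectively ignored.
InputWeights inverseErrors(const std::vector<double>& errors,
                           double oneOverZeroError = 0.0) {
    assert(oneOverZeroError >= 0.0);
    InputWeights w;
    w.oneOverError.reserve(errors.size());
    for (double e : errors) {
        if (e > 0.0) {
            w.oneOverError.push_back(1.0 / e);
        } else {
            w.oneOverError.push_back(oneOverZeroError);
            ++w.nZeroError;
        }
    }
    return w;
}
```

With errors {2, 0, 4} and the default setting, the second bin is counted in `nZeroError` and gets weight 0, which corresponds to the "are ignored" message; passing a positive `oneOverZeroError` corresponds to the "1/error set to" message.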