IRCExplorer

Clone on GitLab

Latest version: v8312.01 | Compatible with: Pythia 8.312+ | Authors: MLhad team; Ilten, Philip philten@cern.ch; Szewc, Manuel szewcml@ucmail.uc.edu


Documentation

Package Description

This package is designed to compute and store observables at different scales in the evolution of an event in Pythia. This allows the user to study the evolution of an observable with respect to the parton shower and, in particular, compute its IRC-safety in terms of the gradient of its entropy with respect to the evolution scale. Additionally, hadronization effects are also considered, allowing the user to compute the hadronization sensitivty of an observable in terms of the induced Fisher Information matrix over the hadronization parameters.

More details regarding the package are found below.

Package Authors

MLhad team; Ilten, Philip philten@cern.ch; Szewc, Manuel szewcml@ucmail.uc.edu

Package Dependencies

FASTJET (fastjet.fr, > 3.4.2)

Package Documentation

Below you'll find details about how to install and run the package

Package configuration and installation

Following the plugin rules, one first needs to configure IRCExplorer and link it to Pythia and FastJet.

./configure --with-pythia8-config=$PATH-TO-pythia8-config --with-fastjet3-config=$PATH-TO-fastjet-config

Then, one can simply install the package with make.

In order for Pythia to find the library, the user needs to run

export PYTHIA8CONTRIB=$PWD/share

After the library is compiled, it can be called in any Pythia script with the command

Init:plugins = {libIRCExplorer.so::IRCExplorer}

The package computes a multitude of observables for a given event at a given scale (discussed below) and stores their distribution for each scale as a Pythia Histogram. The observable names need to stored in obsNames as found in IRCExplorer.h and their definition can be found in the function used to compute the observables for a given event, IRCExplorer::observableGetter at IRCexplorer.cc.

For a given observable, the user only needs to define the bins which will be used for the stored Histograms, defined as the IRCExplorer:obsName:bins settings. The bins are defined {Number of bins, lower limit, upper limit}.

Currently, the defined observables are

Event-level

Ensemble-level

Event-level, jet-based

If the user wishes to define new observables, they need to modify the observableGetter function, which is a map of vectors. Each observable is defined as a vector with key obsName, and the user also needs to add a vector of weights (which can be one) with an associated key wObsName. The latter is necessary to compute distribution-based observables which are only defined at the histogram level.

Output histograms

Two maps of vectors of histograms are defined and filled as events are generated, HistObsScales and HistObsWeights. The key to each map is the obsName, but the vector index refers to different things. For HistObsScales, the index refers to the scale which is crossed while for HistObsWeights it refers to the set of hadronization parameters used to hadronize the event. The program runs the events and stores the HistObsScales histograms as pyplottables, as defined by Pythia. The stored named for each histogram is obsName_Scale.dat. It also computes the entropy and entropy gradient at zero as defined below.

Two examples of how to plot histograms are given in share/IRCExplorer/examples/.

Accessing an observable at a given scale for a given event

If the user wants to access the event-by-event observables at a given scale, they can do so by printing the setting IRCExplorer:obsName:scaleIndex, which is defined implicitly by the program during initialization and updated as events are generated and scales are crossed. The scale Index is defined below.

The package computes all defined observables at a multitude of parton shower evolution scales and before / after hadronization. The stored observables are those of the event before the scale is crossed by an emission. If a single emission crosses multiple scales, then the observable is stored for all crossed scales. A scale can only be crossed once per event.

The User can define all scales to be crossed by using the setting

IRCExplorer:scales = {}

These scales are defined in addition to the scale = 0 case, which is always considered and thus does not need to be included. The scaleIndex referred to above in the context of observables can be obtained by adding the 0.0 scale, keeping unique scales and sorting in ascending order. The value of the scale can be verified by storing the scales setting vector<double> scales = pythia.settings.pvec("IRCExplorer:scales") and printing the required scales[scaleIndex].

To check if and how a scale has been crossed for a given event, the user can look at vector<double> crossing = pythia.settings.pvec("IRCExplorer:crossing:"+ to_string(scaleIndex)); which stores the scale of the event before the scales[scaleIndex] scale was crossed. If empty, the scale hasn't been crossed.

As currently defined, the 0 scale is defined as after hadronization and hadron decays. Thus, it is always filled. However, to better disentangle IRC safety from hadronization sensitivity, one could think of defining three different zero-scales: * After parton shower before hadronization, which could be stored at HistObsScales using scale 0. * After hadronization but before hadron decays, which could be stored at HistObsScales using scale -1. * After hadronization and hadron decays, which should be stored at HistObsWeights as it is computed for a variety of hadronization parameters to gauge hadronization sensitivity.

Note that the scale definition is not purely about storage, since the values are important when computing the entropy gradient that gauges IRC-safety. Currently, the IRC-safety studies include non-perturbative effects through hadronization + decays, which may not be the final call.

Entropy and IRC safety

After all events are generated, the entropy is computed for each observable at each scale, by looking at the normalized histogram with the function IRCExplorer::entropy(Hist hist), which for a set of bin frequencies $p_k > 0$ computes the entropy as

$S(\mathrm{obsName},\mathrm{scale})=-\sum_k p_k \ln p_k$

The program outputs using the onStat method the entropy values for each observable at each scale and computes, to gauge IRC-safety, the log gradient at the smallest scale evolution:

$\frac{\partial \ln S(\mathrm{obsName},\mathrm{scale})}{\partial \ln \mathrm{scale}}|_{\text{end of evolution}}\approx \frac{\ln S(\mathrm{obsName},\text{next-to-last scale})-\ln S(\mathrm{obsName},0)}{\ln \text{next-to-last scale} - \ln 0.5}$

where the end scale is currently defined as $0.5$ GeV as done by default in Pythia's SimpleShower. This should be improved once a protocol for arbitrary final scales is defined, as discussed above.

One can asssess the hadronization sensitivty of each observable obsName for a given choice of hadronization parameters by computing the Fisher Information Matrix at that point in parameter space. To do so, small parameter variations are considered using the reweighting framework. These parameter variations allow us to numerically estimate the gradient of an observable with respect to parameter changes by treating the problem as a Linear Regression. Since we have a linear problem, estimating the gradients is equivalent to solving the normal equations through matrix inversion.

The number of weights used to estimate the gradients can be set with IRCExplorer:neweights, while the size of the hypersphere is set with IRCExplorer:weighthypersphere (it should be small enough so that the linear term is the only important contribution while big enough that stochastic noise is not an issue). All weight variations are stored via HistObsWeights. The parameter values, observable names and histograms are stored as .dat files. The name of the first two files are set with IRCExplorer:mapParamsFile and IRCExplorer:obsNamesFile, and all histograms are stored in a folder whose name is set with IRCExplorer:weightedHistogramsFolder.

In the example program share/IRCExplorer/examples/fisher_information_matrix_computation_with_multinomial_expectation_values.py we show how these files can be accessed and used to compute the Fisher Information Matrix at a given parameter point via the gradients of the rates per bin as a function of the hadronization parameters.

Running the Observable Ranker

First, we need to clone the repository.

git clone git@gitlab.com:pythia8-contrib/packages/ircexplorer.git
cd ircexplorer

We need an environment where we have the following. 1. fastjet - must include the n-subjetiness contrib plugin. 2. pythia - should be version 8.316 or later. 3. rivet - not strictly necessary, but helps with plotting and observables as needed.

We can use the Docker container docker://pythia8/dev:sherpa where all this is available. An environment can be created with singularity as follows.

singularity shell docker://pythia8/dev:sherpa

Alternatively, an enviroment can be created directly with docker as follows.

docker run -i -t -v "$PWD:$PWD" -w $PWD -u `id -u` --cap-add=SYS_PTRACE --rm pythia8/dev:sherpa bash --norc

Next, we need to set up our Python environment, this should work with both singularity and docker.

pip install --target ./python scikit-learn
export PYTHONPATH=$PYTHONPATH:$PWD/python

Then, we need do the following.

  1. Configure the plugin.
  2. Build the plugin.
  3. Run the plugin.
  4. Rank the observables.

These steps are given below. Be careful, the default run is on 60 threads so make sure that your machine can handle this. On sneezy, this run should take on the order of 5 minutes.

# Configure plugin with the installed Pythia version.
./configure --with-pythia8=/hep/pythia

# Build the plugin.
make -j7

# Change the examples directory.
cd share/IRCExplorer/examples/

# Build the plugins.
make testPlugins

# Run the IRC explorer to generate the histograms for the observable ranking. 
./testPlugins ircExplorer.cmnd

# Run the observable ranking.
# Note, configuration help is available as `python rank.py --help`.
python rank.py

Archiving Output

Once you have run IRCExplorer to produce histogram output, it may be useful to archive this so that other people can use it. This can be done as follows, starting in the top-level of the project.

cd share/IRCExplorer/examples/
./archive_data.sh

This will produce a file of the form <YYMMDD>.<commit hash>.tgz. You can then upload this file to the data folder in our CERNbox storage for this project: https://cernbox.cern.ch/s/cP296PdcZOuhVSr. The password is mlhad.

The output of ranker.py should also be logged. This of course depends not only on the version of ranker.py being used, but also the version of the data being used. Logs should be uploaded to the logs folder of our CERNbox storage. Additional notes should be kept in share/IRCExplorer/examples/README.md to document the ranking algorithms and results.

cd share/IRCExplorer/examples/
./archive_rank.sh <data directory>

Implementing New Observables

The observables have been factorized so that implementing a new observable should be relatively straight forward. First, declare a new observable class in include/IRCExplorer/IRCExplorer.h. This class should inherit from the Observable base class, and should exist in the Observables namespace. Look for the following code in the header:

//==========================================================================

// Actual observables are defined here.

// Parton/hadron multiplicity.
class nAll : public Observable {
  bool analyze(const Event& event) override;
};

The observable should be declared after the already declared observables. For example, the following declares the new observable newObservable. Here, we use the convention of starting with a lower case, since these classes act more like functors than classes.

// New observable.
class newObservable : public Observable {
  bool analyze(const Event& event) override;
};

The only method that needs to be defined is analyze. This should be done in src/IRCExporer.cc. Look for the following code in the source.

//==========================================================================

// Derived observable classes.

namespace Observables {

//--------------------------------------------------------------------------

The declaration can be as follows. Again, make sure this is within the Observables namespace.

//--------------------------------------------------------------------------

// Parton/hadron multiplicity.

bool newObservable::analyze(const Event& event) {
  // Set the values.
  values  = {0};
  // Set the weights.
  weights = {1};
  // Loop over the particles in the vent if needed.
  for (const Particle& prt : event) {
    // Do something useful here.
  }

  // Return `true` if successful.
  return true;
}

Finally, you need to insert the observable into the active observables. Looks for the following in src/IRCExporer.cc.

  // Define the observables.
  observables["nAll"]      = make_shared<Observables::nAll>();
  observables["nCharge"]   = make_shared<Observables::nCharge>();
  observables["nBaryon"]   = make_shared<Observables::nBaryon>();
  observables["nStrange"]  = make_shared<Observables::nStrange>();
  observables["nJet"]      = make_shared<Observables::nJet>();
  observables["nSubjet1"]  = make_shared<Observables::nSubjet>(1);
  observables["nSubjet2"]  = make_shared<Observables::nSubjet>(2);
  observables["nSubjet3"]  = make_shared<Observables::nSubjet>(3);
  observables["thrust"]    = make_shared<Observables::thrust>();
  observables["eec"]       = make_shared<Observables::eec>();
  observables["nnc"]       = make_shared<Observables::nnc>();

Add your new observable to the end of this list. Use the same key name as the class name, and try to maintain the Pythia-style coding guidelines throughout your code.

  observables["newObservable"] = make_shared<Observables::newObservable>();

Running the Scale Explorer

Warning, this is not up-to-date. To illustrate the use of the IRCExplorer package, a simple example is found in share/IRCExplorer/examples/testPlugins.cc.

The example can be compiled with make testPlugins

and run as

./testPlugins IRCExplorer_with_generated_events.cmnd

The example shows how: * To print individual event crossings and observables * Generate histograms and print entropy values.

One histogram can be plotted by running

python3 plotter.py obsName_scaleIndex.dat

while multiple scales can be compared by running

python3 plotter_multiple_scales.py obsName