Hdsense

Clone on GitLab

Latest version: v8316.00 | Compatible with: Pythia 8.316+ | Authors: MLhad team; Ilten, Philip philten@cern.ch; Szewc, Manuel mszewc@unsam.edu.ar


Documentation

Package Description

This package is designed to compute the hadronization sensitivty of an observable in terms of the induced Fisher Information matrix over the hadronization parameters.

More details regarding the package are found below.

Package Authors

MLhad team; Ilten, Philip philten@cern.ch; Szewc, Manuel mszewc@unsam.edu.ar

Package Dependencies

FASTJET (fastjet.fr, > 3.4.2)

Package Documentation

Below you'll find details about how to install and run the package

Package configuration and installation

Following the plugin rules, one first needs to configure HDSense and link it to Pythia and FastJet.

./configure --with-pythia8-config=$PATH-TO-pythia8-config --with-fastjet3-config=$PATH-TO-fastjet-config

Then, one can simply install the package with make.

In order for Pythia to find the library, the user needs to run

export PYTHIA8CONTRIB=$PWD/share

After the library is compiled, it can be called in any Pythia script with the command

Init:plugins = {libHDSense.so::MySelector}

The package computes a multitude of observables for a given event at a given scale (discussed below) and stores their distribution for each scale as a Pythia Histogram. The observable names need to stored in obsNames as found in MySelector.h and their definition can be found in the function used to compute the observables for a given event, MySelector::observableGetter at MySelector.cc.

For a given observable, the user only needs to define the bins which will be used for the stored Histograms, defined as the HDSense:obsName:bins settings. The bins are defined {Number of bins, lower limit, upper limit}.

Currently, the defined observables are

Event-level

Ensemble-level

Event-level, jet-based

If the user wishes to define new observables, they need to modify the observableGetter function, which is a map of vectors. Each observable is defined as a vector with key obsName, and the user also needs to add a vector of weights (which can be one) with an associated key wObsName. The latter is necessary to compute distribution-based observables which are only defined at the histogram level.

Output histograms

A default vector of histograms is defined and filled as events are generated, HistObsWeights. The key to each map is the obsName, with the vector index refering to the set of hadronization parameters used to hadronize the event. The program runs the events and stores the HistObsWeights histograms as pyplottables, as defined by Pythia. The stored named for each histogram is obsName_nWeight_$nWeight.dat.

One can asssess the hadronization sensitivty of each observable obsName for a given choice of hadronization parameters by computing the Fisher Information Matrix at that point in parameter space. To do so, small parameter variations are considered using the reweighting framework. These parameter variations allow us to numerically estimate the gradient of an observable with respect to parameter changes by treating the problem as a Linear Regression. Since we have a linear problem, estimating the gradients is equivalent to solving the normal equations through matrix inversion.

The number of weights used to estimate the gradients can be set with HDSense:nweights, while the size of the hypersphere is set with HDSense:weighthypersphere (it should be small enough so that the linear term is the only important contribution while big enough that stochastic noise is not an issue). All weight variations are stored via HistObsWeights. The parameter values, observable names and histograms are stored as .dat files. The name of the first two files are set with HDSense:mapParamsFile and HDSense:obsNamesFile, and all histograms are stored in a folder whose name is set with HDSense:weightedHistogramsFolder.

In the example program share/HDSense/examples/rank.py we show how these files can be accessed and used to compute the Fisher Information Matrix at a given parameter point via the gradients of the rates per bin as a function of the hadronization parameters.

Running the Observable Ranker

First, we need to clone the repository.

git clone git@gitlab.com:pythia8-contrib/packages/hdsense.git
cd HDSense

We need an environment where we have the following. 1. fastjet - must include the n-subjetiness contrib plugin. 2. pythia - should be version 8.316 or later. 3. rivet - not strictly necessary, but helps with plotting and observables as needed.

We can use the Docker container docker://pythia8/dev:sherpa where all this is available. An environment can be created with singularity as follows.

singularity shell docker://pythia8/dev:sherpa

Alternatively, an enviroment can be created directly with docker as follows.

docker run -i -t -v "$PWD:$PWD" -w $PWD -u `id -u` --cap-add=SYS_PTRACE --rm pythia8/dev:sherpa bash --norc

Next, we need to set up our Python environment, this should work with both singularity and docker.

pip install --target ./python scikit-learn
export PYTHONPATH=$PYTHONPATH:$PWD/python

Then, we need do the following.

  1. Configure the plugin.
  2. Build the plugin.
  3. Run the plugin.
  4. Rank the observables.

These steps are given below. Be careful, the default run is on 60 threads so make sure that your machine can handle this. On sneezy, this run should take on the order of 5 minutes.

# Configure plugin with the installed Pythia version.
./configure --with-pythia8=/hep/pythia

# Build the plugin.
make -j7

# Change the examples directory.
cd share/HDSense/examples/

# Build the plugins.
make testFullEvents

# Run the selector to generate the full events for the observable ranking. 
./testFullEvents MySelectorFullEvents.cmnd

# Rebin the full events using quantile binning
# Note, configuration help is available as `python rebinning_and_saving.py --help`.
python rebinning_and_saving.py

# Run the observable ranking.
# Note, configuration help is available as `python rank.py --help`.
python rank.py

# Once all K choices are run, create the figure
# Note, configuration help is available as `python rank_log_to_tex.py --help`.
# Note, there are two other choices depending on whether there are bootstrap samples
# Or if experiments need to be combined
python rank_log_to_tex.py


# Run the optimal comparison using the full events
# Note, configuration help is available as `python optimal_comparison.py --help`.
python optimal_comparison.py

Useful bash scripts to run multiple values of K are run_multiple_rank.sh and run_multiple_optimal_comparison.sh

Implementing New Observables

The observables have been factorized so that implementing a new observable should be relatively straight forward. First, declare a new observable class in include/HDSense/MySelector.h. This class should inherit from the Observable base class, and should exist in the Observables namespace. Look for the following code in the header:

//==========================================================================

// Actual observables are defined here.

// Parton/hadron multiplicity.
class nAll : public Observable {
  bool analyze(const Event& event) override;
};

The observable should be declared after the already declared observables. For example, the following declares the new observable newObservable. Here, we use the convention of starting with a lower case, since these classes act more like functors than classes.

// New observable.
class newObservable : public Observable {
  bool analyze(const Event& event) override;
};

The only method that needs to be defined is analyze. This should be done in src/MySelector.cc. Look for the following code in the source.

//==========================================================================

// Derived observable classes.

namespace Observables {

//--------------------------------------------------------------------------

The declaration can be as follows. Again, make sure this is within the Observables namespace.

//--------------------------------------------------------------------------

// Parton/hadron multiplicity.

bool newObservable::analyze(const Event& event) {
  // Set the values.
  values  = {0};
  // Set the weights.
  weights = {1};
  // Loop over the particles in the vent if needed.
  for (const Particle& prt : event) {
    // Do something useful here.
  }

  // Return `true` if successful.
  return true;
}

Finally, you need to insert the observable into the active observables. Looks for the following in src/MySelector.cc.

  // Define the observables.
  observables["nAll"]      = make_shared<Observables::nAll>();
  observables["nCharge"]   = make_shared<Observables::nCharge>();
  observables["nBaryon"]   = make_shared<Observables::nBaryon>();
  observables["nStrange"]  = make_shared<Observables::nStrange>();
  observables["nJet"]      = make_shared<Observables::nJet>();
  observables["nSubjet1"]  = make_shared<Observables::nSubjet>(1);
  observables["nSubjet2"]  = make_shared<Observables::nSubjet>(2);
  observables["nSubjet3"]  = make_shared<Observables::nSubjet>(3);
  observables["thrust"]    = make_shared<Observables::thrust>();
  observables["eec"]       = make_shared<Observables::eec>();
  observables["nnc"]       = make_shared<Observables::nnc>();

Add your new observable to the end of this list. Use the same key name as the class name, and try to maintain the Pythia-style coding guidelines throughout your code.

  observables["newObservable"] = make_shared<Observables::newObservable>();