ExampleML

Introduction

In this example PYTHIA-CONTRIB package, a class derived from the Pythia UserHooks class is provided that loads an ONNX model and uses the ONNX Runtime (ORT) library to perform inference with this model. The model produces fragmentation weights, which are in turn used in a rejection sampling algorithm that modifies the Pythia fragmentation function. The final result is a set of unweighted samples that follow a new fragmentation function determined by the ONNX model.

ONNX, the Open Neural Network Exchange, is intended to provide open standards for machine-learning development. Specifically, models can be built using ONNX and then executed with a platform-specific runtime. From the ONNX documentation:

ONNX provides a definition of an extensible computation graph model, as well as definitions of built-in operators and standard data types.

Each computation dataflow graph is structured as a list of nodes that form an acyclic graph. Nodes have one or more inputs and one or more outputs. Each node is a call to an operator. The graph also has metadata to help document its purpose, author, etc.

Operators are implemented externally to the graph, but the set of built-in operators are portable across frameworks. Every framework supporting ONNX will provide implementations of these operators on the applicable data types.

It is important to note that ONNX is intended for the development of machine-learned models, but not for their execution. Instead, a runtime environment is needed. One such environment is ONNX Runtime (ORT). From the ORT documentation:

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms.

ONNX Runtime training can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts.

Docker Container for ONNX Runtime

In many cases, it may be simplest to work with a Docker container where ORT is already available. The pythia8/dev:test container provides such an environment, along with a number of other standard HEP packages.

docker run -i -t -v "$PWD:$PWD" -w "$PWD" -u "$(id -u)" --cap-add=SYS_PTRACE --rm pythia8/dev:test bash --norc

Compiling ONNX Runtime

On older or non-standard systems, compiling and installing ORT can be challenging, so a short guide to the ORT build system is given here. The most recent version of ORT can be found on GitHub. However, the repository is large because of its full history, so downloading only the specific version needed may be advisable.

git clone --depth 1 -b v1.17.1 https://github.com/microsoft/onnxruntime.git

Here, the --depth 1 option downloads only the branch or tag specified with the -b option, without any history, which reduces the download size by roughly a factor of three. The version in this example is v1.17.1; a full list of versions can be found by browsing the tags of the repository.

The build system for ORT is somewhat specialized and uses the following chain.
1. The build.sh script in the top-level directory is called with the relevant options provided by the user.
2. This script then uses python3 to call the script tools/ci_build/build.py, which requires Python 3.7 or later. If a specific Python version needs to be used, build.py can be called directly with it:

    PYTHON tools/ci_build/build.py --build_dir BUILDDIR ...

where `PYTHON` is the Python executable to use and `BUILDDIR` is the directory where ORT should be built.

3. The build.py script then creates and runs a cmake command. CMake 3.26 or later is required. The CMakeLists.txt is located in the top-level cmake directory.
4. After cmake has run, make is called, which builds ORT.
5. The build.sh script does not call make install. To install ORT to a user-specified location, the standard CMake variable CMAKE_INSTALL_PREFIX=INSTALLDIR must be passed via the --cmake_extra_defines option of build.sh, as detailed below. Then change to BUILDDIR and call make install.

It is possible to bypass the build.sh and build.py system and call CMake directly. In this case, it may be useful to browse the source of build.py to determine all the ORT-specific flags that can be passed to CMake. These flags all begin with onnxruntime and are collected in the cmake_args variable. All options accepted by the build script can be listed with --help:

./build.sh --help

There are a number of CMake flags which are overwritten by build.py.
* Python_EXECUTABLE and PYTHON_EXECUTABLE are set using sys.executable and cannot be changed by the user.
* CMAKE_PREFIX_PATH is explicitly set based on the build directory.
* A number of compiler options, e.g. CMAKE_*_COMPILER, are overwritten if onnxruntime_BUILD_CACHE is set to ON.
* Some Xcode, Android, and GDK flags are explicitly set.

The following build.sh call provides some options which may be needed, and are detailed below.

./build.sh --cmake_path CMAKEEXE --config Release --parallel --allow_running_as_root --build_shared_lib --cmake_extra_defines CMAKE_INSTALL_PREFIX=INSTALLDIR --cmake_extra_defines CMAKE_C_COMPILER=GCC --cmake_extra_defines CMAKE_CXX_COMPILER=GXX

--cmake_path: provides the executable for CMake, useful if not system default.
--config: sets the build type. The default is a debug build, whereas Release provides an optimized build.
--parallel: uses multiple cores via the -j flag when calling make. Note that there is no way to specify the number of threads to use, so this can significantly slow down a system.
--allow_running_as_root: allows the build to be made by the root user.
--build_shared_lib: by default, the shared library for ORT is not built, and so this option must be passed to be able to link against the shared library for ORT.
--cmake_extra_defines: allows additional flags to be passed to CMake.
CMAKE_INSTALL_PREFIX: the directory where make install installs ORT.
CMAKE_C_COMPILER: specifies the C compiler to use.
CMAKE_CXX_COMPILER: specifies the C++ compiler to use.

Given these details, the following example demonstrates how ORT might be built on a non-standard system.

# Clone the repository and enter it.
git clone --depth 1 -b v1.17.1 https://github.com/microsoft/onnxruntime.git
cd onnxruntime

# Run the build script.
./build.sh --cmake_path CMAKEEXE --config Release --parallel --allow_running_as_root --build_shared_lib --cmake_extra_defines CMAKE_INSTALL_PREFIX=INSTALLDIR --cmake_extra_defines CMAKE_C_COMPILER=GCC --cmake_extra_defines CMAKE_CXX_COMPILER=GXX

# Install the build from the default Linux Release build directory.
cd build/Linux/Release
make install

Documentation

The UserHooks class itself is named OnnxUser and has the following settings.

OnnxUser:canChangeFragPar: This boolean setting decides whether the OnnxUser hook is called. It can take the values "on" or "off". (default: "off")
OnnxUser:hadronizationNN: This is a character-string setting which can take any string value without blanks. It is the path to the ONNX model used to compute the weights. (default: "none")
OnnxUser:maxWeight: This is a double-precision setting which can take real-number values larger than 0.01. It is the maximum value of the weights, needed to perform rejection sampling. If any weight is larger than the chosen value, a warning is issued, since the unweighted samples then become biased. Larger values make the sampling less efficient. If a maxWeight value is found in the ONNX model metadata, it overrides this setting. (default: 1.0)

The basic logic behind OnnxUser is as follows.
* When OnnxUser is constructed, an ORT session, session, is created. This session loads the model from OnnxUser:hadronizationNN and extracts the relevant dimensionalities. The ORT tensors inVals and outVals, which are updated during inference, are also initialized.
* If OnnxUser:canChangeFragPar is set to on, OnnxUser::doVetoFragmentation is called each time Pythia produces a fragmentation following the nominal Lund fragmentation function, except in the finalTwo case.
* If OnnxUser:hadronizationNNFlag is set to on, OnnxUser::doVetoFragmentation calls OnnxUser::FragmentationWeight to run inference with the ORT session and obtain a weight, which is then divided by maxWeight.
* OnnxUser::FragmentationWeight updates inVals with the necessary information from the StringEnd, runs inference using the session->Run method, and returns the updated value of outVals, exponentiated because our model produces the logarithm of the weight. This function is model dependent: it should be modified to account for the inputs required by the ONNX model and the specifics of the output it returns. In this example, the model takes as inputs the following variables from the StringEnd, $z, p_{x,\text{new}}, p_{y,\text{new}}, m_{\text{Had}}, \text{fromPos}, p_{x,\text{old}}, p_{y,\text{old}}$, and returns the logarithm of the weight relating the desired data distribution (in our case aLund=0.30) to the baseline choice of aLund=0.68.
* The normalized weight is used to perform rejection sampling: a uniform random number eff is drawn, and the sample is accepted only if eff < weight/maxWeight. This ensures that the accepted fragmentations follow the desired fragmentation function, provided that no weight exceeds maxWeight.

A working example that implements OnnxUser can be found in share/ExampleML/examples/main01.cc, with an ONNX model provided at share/ExampleML/models/full_HOMER.onnx. The Pythia settings are chosen so that the generated events follow the same distributions as those used during training: $e^{+}e^{-}$ collisions at $\sqrt{s} = 91.2$ GeV producing only charged and neutral pions. By simply running

cd share/ExampleML/examples/
make main01
./main01
python3 main01plot.py

we obtain the figure main01plot.pdf, which compares the baseline choice of aLund=0.68 with the desired aLund=0.30, obtained with variations, and with the approximate Neural Network (NN) weights, using 100k events.

Example of a figure generated by main01 for 10k events