MergeMaster (a PyRoot script for merging histograms)
Last updated: November 24th, 2006
download: source(python)
Description:
MergeMaster is a set of two scripts, developed for CMS production, that merge root histograms. Production sometimes runs jobs from the same request on multiple distributed compute nodes, whose results (root files) each contain part of the final histogram; these parts need to be merged. The assumption is that the histograms have equal bin sizes.
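To illustrate why the equal bin size assumption matters, here is a minimal sketch of a bin-by-bin merge in plain Python. It is not the MergeMaster source: the dict layout ("edges", "counts") and the function name merge_histograms are hypothetical and chosen only for this example. In PyRoot the corresponding operation on real histograms is TH1.Add.

```python
def merge_histograms(parts):
    """Add equally binned histograms bin by bin.

    Each part is a dict with 'edges' (bin boundaries) and 'counts'
    (one entry per bin). This layout is an assumption made for the
    sketch, not the format MergeMaster uses internally.
    """
    if not parts:
        raise ValueError("nothing to merge")
    edges = list(parts[0]["edges"])
    total = [0.0] * (len(edges) - 1)
    for part in parts:
        # Bin-wise addition is only meaningful when the binning matches.
        if list(part["edges"]) != edges:
            raise ValueError("histograms must share the same binning")
        if len(part["counts"]) != len(total):
            raise ValueError("counts do not match the number of bins")
        for i, count in enumerate(part["counts"]):
            total[i] += count
    return {"edges": edges, "counts": total}
```

Two partial histograms with the same three bins merge into one histogram whose bin contents are the sums; a part with different bin edges is rejected instead of being silently added.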
Merging happens in two steps: (1) Merge resulting root files on the same compute node (to prevent sending many small files). (2) Merge the results per compute node into a final root file containing all the histograms.
The different root files generated on a node represent different tasks and packages. When merging at the compute node level, you therefore specify the task and package that each root file represents, and this hierarchy (<package>/<task>) is included in the compute node level result root file. The assumption here is that all tasks/packages generated on a compute node are different, so at this level we basically only merge files.
Once a compute node is finished, it can send the resulting root file to a master, which merges all compute node level files. At this level it merges the histograms and scatter plots in the task/package hierarchy of the different compute node level result files.
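The two levels described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the actual scripts: root files are stood in for by dicts, histograms by lists of bin counts, and the function names merge_files_on_node and merge_histos_on_master are hypothetical. Scatter plots are left out of the sketch.

```python
def merge_files_on_node(files):
    """Step 1: collect the files of one compute node under <package>/<task>.

    `files` is a list of (package, task, histograms) tuples, where
    `histograms` maps a histogram name to its list of bin counts.
    """
    node_result = {}
    for package, task, histograms in files:
        key = package + "/" + task
        # On a node every package/task pair is assumed to be unique,
        # so nothing is added together here; entries are only collected.
        if key in node_result:
            raise ValueError("duplicate package/task on node: " + key)
        node_result[key] = {name: list(c) for name, c in histograms.items()}
    return node_result


def merge_histos_on_master(node_results):
    """Step 2: add histograms with the same package/task and name."""
    total = {}
    for node_result in node_results:
        for key, histograms in node_result.items():
            merged = total.setdefault(key, {})
            for name, counts in histograms.items():
                if name not in merged:
                    merged[name] = list(counts)
                    continue
                if len(merged[name]) != len(counts):
                    raise ValueError("bin count mismatch in " + key + "/" + name)
                merged[name] = [x + y for x, y in zip(merged[name], counts)]
    return total
```

For example, if two nodes both produced a histogram under package1/task1, the master result holds a single package1/task1 entry with the bin contents added, while a task that ran on only one node is carried over unchanged.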
Prereqs:
-Python installed
-Root installed (and PyRoot enabled)
Installation:
-Untar the source file
-Set the environment (setup.sh) and the root environment
Usage:
-See the *.sh test files in the test directory. On a compute node level (file merge) the command looks like this:
merge-files --input=../data/workflow0/input1.root --package=package1 --output=workflow0MergedFile1.root --task=task1
On a master level (merging histograms) the command looks like this:
merge-histos --input=workflow0MergedFile1.root --output=workflow0MergedTotal.root
-You can view the root files (the actual histograms that is) by starting a root session and entering the command:
new TBrowser;
-The source contains several root files for testing and two test scripts in the test directory.
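If the two commands from the Usage section need to be driven from Python, a small wrapper along the following lines could work. It is a sketch only: the helper names are hypothetical, it uses just the options shown in the commands above, and it assumes merge-files and merge-histos are on the PATH after the environment has been set.

```python
import subprocess


def merge_files_command(input_file, package, task, output_file):
    """Build the compute node level (file merge) command line."""
    return [
        "merge-files",
        "--input=" + input_file,
        "--package=" + package,
        "--output=" + output_file,
        "--task=" + task,
    ]


def merge_histos_command(input_file, output_file):
    """Build the master level (histogram merge) command line."""
    return [
        "merge-histos",
        "--input=" + input_file,
        "--output=" + output_file,
    ]


def run(command):
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.check_call(command)
```

Calling run(merge_files_command("../data/workflow0/input1.root", "package1", "task1", "workflow0MergedFile1.root")) would then execute the same file merge as the first command in the Usage section.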

Histogram produced on a compute node

Merged histogram from multiple compute nodes