Madgraph output files - CompHEP output files - Using the KUMAC files - Plot_events - Stupid PAW tricks

How to process MadGraph output files:

The final product of MadGraph is a file named scaled.dat, which contains a weighted set of events. This file is generated by the combine_events routine: for each subprocess, a scaled.dat file is created in the subprocess subdirectory by scale_events and is then appended to the main scaled.dat file in the top-level directory. Each event takes the form:

 5 1  3.03380238E-14
     -1      2    -13     14     22    
      0      0      1      1      1    
      0      0      2      2      2    
      0    501      0      0      0    
    501      0      0      0      0    
  1 0.128997637100E+03 0.000000000000E+00 0.000000000000E+00 0.128997637100E+03
  2 0.414032469710E+02 0.000000000000E+00 0.000000000000E+00-0.414032469710E+02
  3 0.784368427541E+02 0.531487211437E+02-0.923761186607E+01 0.569404800563E+02
  4 0.387629175823E+02-0.224533444115E+02-0.295766226778E+02 0.111191049643E+02
  5 0.532011237341E+02-0.306953767322E+02 0.388142345439E+02 0.195348051080E+02

The first line gives the number of external particles, the index of the event (note that this index applies only within the events file for the subprocess, not within the mixed file), and the weight of the event. The next N_external lines describe the particles according to:
  line 1:  particle type (according to the PDG code in particles.dat)
  lines 2 & 3:  mother particle information 
  lines 4 & 5:  daughter particle information 
Finally, the last N_external lines give the 4-momenta of the particles (E, P_x, P_y, P_z)
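If you ever need to read this layout outside of PAW, the block above can be parsed directly. Here is a minimal Python sketch (the function name and returned layout are my own; note that adjacent numbers can run together when a value is negative, so a regex is safer than a plain split on the momentum lines):

```python
import re

# Scientific-notation numbers in the momentum lines may abut when a value
# is negative (e.g. "...E+00-0.414..."), so split() alone is not enough.
_FLOAT = re.compile(r'[-+]?\d+\.\d+E[+-]\d+')

def parse_madgraph_event(block):
    """Parse one scaled.dat event block (a list of text lines) into a dict.

    Layout assumed from the example above: a header line with
    (N_external, index, weight); five lines of per-particle integers
    (PDG codes, mother info, daughter info, color info); then N_external
    momentum lines of (particle index, E, px, py, pz).
    """
    n_ext, index, weight = block[0].split()
    n_ext = int(n_ext)
    # One description line per attribute, one column per particle.
    ids = [[int(x) for x in line.split()] for line in block[1:6]]
    momenta = [tuple(float(x) for x in _FLOAT.findall(line))
               for line in block[6:6 + n_ext]]
    return {"n": n_ext, "index": int(index), "weight": float(weight),
            "pdg": ids[0], "ids": ids, "momenta": momenta}
```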


For MadGraph files, the PAW command nt/read works in such a way that we don't have to worry about trying to describe the special formatting (in terms of 1X,F16.5,2X, etc.) or the fact that the event is spread over multiple lines. For once, PAW is smart enough (or just lucky that the layout is straightforward enough) to figure out everything on its own. We simply create an ntuple with 3 + 5*(N_external) + 5*(N_external) variables (53 for the five-particle example above) and read the data in:

nt/create (ntuple id) (ntuple name) 53 //LUN1 ! nextern i wgt p1id1 p2id1 p3id1 p4id1 p5id1 
  p1id2 p2id2 p3id2 p4id2 p5id2 p1id3 p2id3 p3id3 p4id3 p5id3 p1id4 p2id4 p3id4 p4id4 p5id4 
  p1id5 p2id5 p3id5 p4id5 p5id5 p1i p1e p1x p1y p1z p2i p2e p2x p2y p2z p3i p3e p3x p3y p3z 
  p4i p4e p4x p4y p4z p5i p5e p5x p5y p5z
nt/read (ntuple id) (input file)
where:
  nextern = number of external particles
  i       = index (not very useful in this context)
  wgt     = weight of the event
  pXidY   = the information for particle X described by line Y above
  pXi     = index (again not very useful since pXi always equals X)
  pXe, pXx, pXy, pXz
          = energy and 3-momenta of particle X


How to process CompHEP output files:

CompHEP output follows a slightly different format. For an example, look at the area /cdf/data22d/chadd/comphep/Zgamma-all/results-tightcuts-events/. In this directory, you will find files corresponding to four subprocesses:

  events_5.txt   events_6.txt   events_7.txt   events_8.txt
These files are then mixed to produce a sample of unweighted events (Mixed.PEV) for the process as a whole. The mixing is performed using a utility called mixPEV, available in /cdf/data23a/frisch/cpyth62/. When mixPEV is used to mix events, it also produces a file, Prt.PEV, describing which subprocess files were used and which of those files was the limiting factor (too few events) in the production of the mixed sample.
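mixPEV's internals aren't described here, but conceptually, combining unweighted subprocess samples into one unweighted sample means drawing each output event from subprocess i with probability sigma_i/sigma_total. A toy Python sketch of that idea (illustrative only, not mixPEV's actual algorithm; the real utility draws each stored event at most once, which is why one input file can become the limiting factor):

```python
import random

def mix_events(subprocesses, n_events, rng=random.Random(1)):
    """Toy mixing: subprocesses is a list of (cross_section, events)
    pairs, each `events` an unweighted list.  Each output event is drawn
    from subprocess i with probability sigma_i / sigma_total, so the
    mixture is unweighted for the combined process.  This toy samples
    with replacement, unlike a real event mixer."""
    total = sum(sigma for sigma, _ in subprocesses)
    mixed = []
    for _ in range(n_events):
        r = rng.uniform(0.0, total)
        for sigma, events in subprocesses:
            if r < sigma:
                mixed.append(events[rng.randrange(len(events))])
                break
            r -= sigma
    return mixed
```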


The mixed sample, Mixed.PEV, begins with a header which is simply a combination of the headers from the subprocess files:

#PEVLIB_v.1.0 =============================================
#CompHEP version 41.10    
#PROCESS   u#   U#  -> A    e2   E2  
#Initial_state
  SQRT(S) 1.960000E+03
  Rapidity(c.m.s) 0.000000E+00
  StrFun1: PDF:CTEQ5L(proton)
  StrFun2: PDF:CTEQ5L(anti-proton)
#MASSES  0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 1.0570000000E-01 1.0570000000E-01
#Cross_section(Width) 1.840382E+01
#Number_of_events     100000
#----------------------------------------------------------
#CompHEP version 41.10    
#PROCESS   U#   u#  -> A    e2   E2  
#Initial_state
  SQRT(S) 1.960000E+03
  Rapidity(c.m.s) 0.000000E+00
  StrFun1: PDF:CTEQ5L(proton)
  StrFun2: PDF:CTEQ5L(anti-proton)
#MASSES  0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 1.0570000000E-01 1.0570000000E-01
#Cross_section(Width) 1.675956E+00
#Number_of_events     100000
#----------------------------------------------------------
#CompHEP version 41.10    
#PROCESS   d#   D#  -> A    e2   E2  
#Initial_state
  SQRT(S) 1.960000E+03
  Rapidity(c.m.s) 0.000000E+00
  StrFun1: PDF:CTEQ5L(proton)
  StrFun2: PDF:CTEQ5L(anti-proton)
#MASSES  0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 1.0570000000E-01 1.0570000000E-01
#Cross_section(Width) 6.916155E+00
#Number_of_events     100000
#----------------------------------------------------------
#CompHEP version 41.10    
#PROCESS   D#   d#  -> A    e2   E2  
#Initial_state
  SQRT(S) 1.960000E+03
  Rapidity(c.m.s) 0.000000E+00
  StrFun1: PDF:CTEQ5L(proton)
  StrFun2: PDF:CTEQ5L(anti-proton)
#MASSES  0.0000000000E+00 0.0000000000E+00 0.0000000000E+00 1.0570000000E-01 1.0570000000E-01
#Cross_section(Width) 1.798046E+00
#Number_of_events     100000
#----------------------------------------------------------
#Number_of_subprocesses = 4
#Total_cross_section_(pb) = 2.879398E+01
#Events_mixed_and_randomized = 156360
#Nproc ================== Events ==========================
The key lines are those that detail each subprocess in terms of the external particles (#PROCESS ...) and those containing the masses (#MASSES ...).
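If you are pulling these values out with a script instead of PAW, a one-pass scan of the header is enough. A Python sketch (function and variable names are mine; the end-of-header marker is taken from the #Nproc line in the example above):

```python
def parse_mixed_header(lines):
    """Collect the #MASSES and #Cross_section entries from a Mixed.PEV
    header, in subprocess order.  masses[Y-1][X-1] is then the mass of
    particle X in subprocess Y."""
    masses, xsecs = [], []
    for line in lines:
        if line.startswith("#MASSES"):
            masses.append([float(x) for x in line.split()[1:]])
        elif line.startswith("#Cross_section"):
            xsecs.append(float(line.split()[1]))
        elif line.startswith("#Nproc"):   # header ends here; events follow
            break
    return masses, xsecs
```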


Next come the events, which are arranged one per line. Here I have added a line describing the format of each event. This line is found in each of the individual subprocess files but is strangely missing from Mixed.PEV.

#Events   P1_3 [Gev]        P2_3 [Gev]        P3_1 [Gev]        P3_2 [Gev]        P3_3 [Gev]        P4_1 [Gev]        P4_2 [Gev]        P4_3 [Gev]        P5_1 [Gev]        P5_2 [Gev]        P5_3 [Gev]     QCD SCALE    Color chains
 3     1.3346018118E+01 -1.4651427386E+02 -2.6775285148E+00 -3.6853018394E+00 -1.1610410211E+01 -2.3076294649E+01 -2.9689582073E+01 -7.8962405647E+01  2.5753823164E+01  3.3374883912E+01 -4.2595439879E+01  9.119E+01   (1 2)
 1     2.3028958573E+01 -1.6721855066E+02 -1.3893870160E+00 -1.9123271694E+00  9.3861047666E+00  2.7892668822E+01  2.3256759858E+01 -1.3879449642E+02 -2.6503281806E+01 -2.1344432689E+01 -1.4781200436E+01  9.119E+01   (1 2)
 4     4.3924036868E+01 -4.8190210284E+01 -1.9385005837E+01 -2.6681171562E+01 -3.1673888763E+00  2.6027373774E+01  3.7646668771E+01  1.9312992137E+00 -6.6423679375E+00 -1.0965497209E+01 -3.0300837531E+00  9.119E+01   (2 1)
 2     8.6197847531E+01 -2.4019786381E+01 -1.4392712571E+01 -1.9809869369E+01 -3.2326745978E+00  2.3010951187E+00 -6.3317876094E+00 -5.6815044653E+00  1.2091617452E+01  2.6141656979E+01  7.1092240214E+01  9.119E+01   (1 2)
The value in the first column describes which subprocess in the header created the event. This is important because only the 3-momenta are listed for final-state particles, so any particle energies must be calculated using the appropriate MASSES entry in the header. Also note that the x- and y-momenta for particles 1 and 2 are not included in the output, as they are assumed to be zero -- the initial-state particles have momenta only along the z-axis.
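Putting those two rules together, reconstructing the full four-vectors from one event line might look like this in Python (a sketch; variable names are mine, and the mass table is assumed to come from the header as described above):

```python
import math

def four_vectors(event, masses):
    """Rebuild (E, px, py, pz) for every particle of a CompHEP event.

    event  = (subprocess, p1z, p2z, p3x, p3y, p3z, ...), one event line
             with the QCD scale and color chain dropped;
    masses = masses[Y-1][X-1] = mass of particle X in subprocess Y.
    Initial-state particles carry only z-momentum.
    """
    m = masses[int(event[0]) - 1]
    vecs = []
    # particles 1 and 2: px = py = 0, only pz is listed
    for i, pz in enumerate(event[1:3]):
        vecs.append((math.sqrt(pz * pz + m[i] * m[i]), 0.0, 0.0, pz))
    # remaining particles: full 3-momentum listed, energy from the mass
    rest = event[3:]
    for j in range(len(rest) // 3):
        px, py, pz = rest[3 * j:3 * j + 3]
        e = math.sqrt(px * px + py * py + pz * pz + m[j + 2] ** 2)
        vecs.append((e, px, py, pz))
    return vecs
```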


Reading these events into PAW presents more of a problem due to the formatting of the data. Because not every line in the header is commented out with a leading # sign, we cannot simply filter the header lines out. In addition, the parentheses around the color flow information are just enough to confuse PAW and force the use of the FORMAT field in nt/read. Rather than deal with this, we opted to create a backup file where the header has been filtered out and the parentheses have been removed.

Because the particle masses are contained in the header, not in the events, it is necessary to make one pass through the file and dump the masses into vectors mass1, mass2, mass3, .... The mass of particle X in subprocess Y is then accessible via massX(Y).

The particle information and energies are also not stored on an event-by-event basis, so the ntuple is significantly smaller. Also, we can neglect the color flow information altogether by not providing enough variables in the ntuple to cover the number of entries in each event record; nt/read simply ignores any remaining values not assigned to a variable, up to the end of the line:

  nt/create MYUNIT MYTITLE 13 //LUN1 ! process p1z p2z p3x p3y p3z p4x p4y p4z p5x p5y p5z qcdscale
  nt/read MYUNIT comphepbackup.txt


Because the particle energies are no longer contained in the ntuple, we must calculate them based on the appropriate masses. The information contained in the mass vectors is accessible at the ntuple level via the following construct:

alias/create p1e (SQRT((p1z)*(p1z)+mass1(int(process))*mass1(int(process))))
The ntuple contains a reference to which subprocess created the event (process), but before it can be used as a vector index, it must be converted to type int. Now the command:
nt/plot 1.p1e
works exactly the same as if the variable p1e were contained in the ntuple.


Using the getmadgraph and getcomphep kumac files:

In order to facilitate the analysis of information from MadGraph and CompHEP, I have created kumacs for use with PAW. The files are named getmadgraph.kumac and getcomphep.kumac and are located in /cdf/data22a/chadd/Madgraph. If you decide to use getcomphep.kumac, you will also need comphepbackup.csh. (For some reason, I can't get the commands embedded in this script to work correctly in PAW using shell (command) when the input file is large.)

When either of these is executed in PAW, the user is queried for the following:

name of input file:            This can be a file in the local directory or the full path to a file.

number of external particles:  It's much easier to just ask than have PAW figure it out.

unit number for the ntuple:    This allows for multiple ntuples to be created, and since they
                               are disk-based, there are no memory issues.

ntuple name:                   This sets the TITLE attribute in nt/create.
At this time, these routines are capable of handling data for processes containing up to 9 external particles. This can be changed, if necessary.


Once the ntuple has been filled, the kumac proceeds to create aliases for kinematic variables which might be of interest. In the case of getcomphep, additional aliases are created for any missing variables which are not contained in the file (i.e., the x- and y-momenta of particles 1 & 2, and all particle energies).

At this point, the following variables for individual particles and pairs of particles are available for study:

event weight:         wgt  (valid only for MadGraph files; CompHEP events are unweighted)
energy:               p1e, p2e, p3e, ...  (contained in the ntuple for MadGraph events,
                                           defined via aliases in CompHEP events)

3-momenta:            p1x, p1y, p1z, p2x, p2y, p2z, ...
transverse energy:    et1, et2, et3, ...
transverse momenta:   pt1, pt2, pt3, ...
eta:                  eta1, eta2, eta3, ...
delta phi:            dphi12, dphi13, dphi23, ...
delta R:              deltar12, deltar13, deltar23, ...
invariant mass:       invmass12, invmass13, invmass23, ...
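For reference, here are the textbook definitions behind these aliases, sketched in Python for (E, px, py, pz) four-vectors. The exact ET convention used in the kumacs is not spelled out above, so the E*sin(theta) form is an assumption; the clamp in dphi mirrors the MIN/MAX guard in the dphi16 alias shown below.

```python
import math

def pt(p):                         # transverse momentum
    _, px, py, _ = p
    return math.hypot(px, py)

def et(p):                         # transverse energy: E * sin(theta) (assumed convention)
    e, px, py, pz = p
    return e * math.hypot(px, py) / math.sqrt(px * px + py * py + pz * pz)

def eta(p):                        # pseudorapidity, from the 3-momentum
    _, px, py, pz = p
    mom = math.sqrt(px * px + py * py + pz * pz)
    return 0.5 * math.log((mom + pz) / (mom - pz))

def dphi(p, q):                    # azimuthal separation in [0, pi]
    c = (p[1] * q[1] + p[2] * q[2]) / (pt(p) * pt(q))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding

def delta_r(p, q):                 # separation in the eta-phi plane
    return math.hypot(eta(p) - eta(q), dphi(p, q))

def invmass(p, q):                 # invariant mass of the pair
    e = p[0] + q[0]
    px, py, pz = p[1] + q[1], p[2] + q[2], p[3] + q[3]
    return math.sqrt(max(0.0, e * e - px * px - py * py - pz * pz))
```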


Additional aliases may be built using the existing entries in the ntuple or the aliases already defined. As an example, here is my definition of dphi16:

alias/create dphi16 (ACOS(MIN((MAX((((p1x)*(p6x)+(p1y)*(p6y))/SQRT(((p1x)*(p1x)+(p1y)*(p1y))*((p6x)*(p6x)+(p6y)*(p6y)))),-0.99999999)),0.99999999)))
A couple things to note are:
  1. It's generally a good idea to enclose the alias definition in parentheses. This protects you from any unforeseen precedence issues when the alias value is substituted into an expression being evaluated. (For example, if ALIAS = a + b, then 3 * ALIAS could be evaluated as 3 * a + b = 3a + b instead of 3 * (a + b), when it's the latter value that you probably want to be calculated.) It's better to be safe, and it only costs 2 extra keystrokes.

  2. When using existing aliases in the definition of a new alias, it is necessary to surround the current aliases in parentheses (like (p1x) above). An alias name needs to be cleanly separated from the surrounding text, and unfortunately the mathematical operators (+,-,*,/) are not sufficient to do this. If you define an alias using
    alias/create etsum et3+et4+et5
    and try to plot etsum using nt/plot
    nt/plot 1.etsum
    then PAW will complain that et3 is not defined because no substitution occurs and et3 is not a valid variable name in the ntuple. The same problem will also occur if you try to combine aliases when using nt/plot:
    nt/plot 1.et3+et4+et5
    
    will give the error
    /NTUPLE/PLOT: Name 'et3' undefined
    /NTUPLE/PLOT: Name 'et4' undefined
    
    while
    nt/plot 1.(et3)+(et4)+(et5)
    
    will give you a nice plot.


createoutput.kumac - an alternative to MadGraph's plot_events routine:

Once we have all of the event information in the ntuple and several kinematic variables defined using aliases, we would like to make plots. Most of that functionality already exists using nt/plot. However, MadGraph offers a couple options that we would like to emulate.

MadGraph includes a routine called plot_events which allows the user to book various histograms (such as pt of particle #5). The really useful thing that plot_events does is calculate the statistical error for each bin of the histogram and a weighted sum of events per bin. The output is written to a text file which can then be read into PAW as vectors. This easily lends itself to tasks like summing the weighted number of events above a certain threshold. Henry wrote a nice routine to do this, so we would like to incorporate that into what we have done here.

This new routine, named createoutput.kumac, is designed to substitute for plot_events. The user is asked to provide the name of the variable to be plotted, a filter (if any) to be applied when selecting events, the number of histogram bins, and the minimum and maximum values to be binned.


From these values, the kinematic variable and the event weight are dumped from the ntuple to a text file using

nt/dump 1.(variable or alias)%wgt (cut value) ! ! x-dumpedinfo.txt ',' ','
and the values are read back into vectors, while any empty entries (for events which fail the cut) are ignored.

The following vectors are then created:

xindex(nbins) =    center of each histogram bin
data(nbins)   =    sum of the event weights for each bin
xerror(nbins) =    statistical error for each bin
xnpoints(nbins) =  (have to think about this one for a second)
plus the integration of data and errors above the threshold for each bin:
xintdata(nbins)  =  summation of data for all bins including and above the current bin
xinterror(nbins) =  summation of xerror for all bins including and above the current bin
The routine also returns the total integrated cross section based on the cuts provided, which corresponds to the lowest bin of xintdata.
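The same bookkeeping is easy to sketch in Python: for a weighted histogram the per-bin statistical error is sqrt(sum of w^2), the integrated vectors are reverse cumulative sums, and the integrated errors combine in quadrature. (Names are mine, and I am guessing that xnpoints is the raw entry count per bin.)

```python
import math

def bin_weighted(values, weights, nbins, lo, hi):
    """Weighted histogram plus the derived vectors described above:
    bin centers, per-bin weight sums, statistical errors sqrt(sum w^2),
    raw counts, and the from-each-bin-upward integrals of data/errors."""
    width = (hi - lo) / nbins
    data, w2, npoints = [0.0] * nbins, [0.0] * nbins, [0] * nbins
    for v, w in zip(values, weights):
        if lo <= v < hi:
            b = int((v - lo) / width)
            data[b] += w
            w2[b] += w * w
            npoints[b] += 1
    xindex = [lo + (b + 0.5) * width for b in range(nbins)]
    xerror = [math.sqrt(s) for s in w2]
    # integral over bin b and everything above it; errors in quadrature
    xintdata = [sum(data[b:]) for b in range(nbins)]
    xinterror = [math.sqrt(sum(w2[b:])) for b in range(nbins)]
    return xindex, data, xerror, npoints, xintdata, xinterror
```

As in createoutput, the total integrated result for the given cuts is the lowest bin of xintdata.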

Finally, the user is given the option to write xindex, data, xerror, xnpoints to a file, in the same format as that used by hcurve.f in MadGraph's plot_events routine.


At this point, the user may plot the distribution of events according to the variable chosen in createoutput, either by using the vector plot functions:

v/plot data%xindex
v/plot xintdata%xindex
or by putting the vectors into a histogram:
1d 1001 '' BINSVAL MINVAL MAXVAL
h/put_vect/contents 1001 xintdata
h/put_vect/errors 1001 xinterror
set/min 1001 .000001
set/max 1001 100
h/plot 1001
selnt 1
text 10 17 'Insert text here' .3


Stupid PAW tricks:

  1. How to open a file whose name contains uppercase letters (from PAW FAQ #66):
    filecase keep
    
  2. For large ntuples (approximately 40 variables * 30-40k events), it is not possible to load the entire ntuple into memory. Instead, a disk-resident ntuple must be used:
    h/file 1 (hbook filename) ! N
    nt/create (ntuple id) (ntuple name) (number of variables) //lun1 ! a1 a2 ...
    nt/read (ntuple id) (input file) ...
    
  3. How to get a list of previously executed commands (similar to the shell command history):
    last
    
  4. How to write text independent of the histogram axes (if zone X X not active):
    selnt 1
    text (x-coord) (y-coord) 'Insert text here' (font size)
    
    The units are in centimeters, and the graphics box is 20 x 20. For example, try text 10 17 'Hello' 0.3.

  5. How to use user-defined titles:
    opt utit on
    title 'New title'     (global title is plotted at top of histogram)
    title 'New title' U   (user title is plotted at bottom of histogram)
    
    Use SET GSIZ and SET YGTI to change the size and Y position of the global title, or use SET TSIZ and SET YHTI to change the size and Y position of the user title.

  6. How to define the maximum and minimum for a plot, then overlay with v/plot:
    1d 101 'Title' 500 0.0 500.0
    set/min 101 0.00001
    set/max 101 100
    h/plot 101
    v/plot y%x 101 s
    
  7. How to print a copy of the current graphic:
    opt zfl1  (this only needs to be executed once)
    h/plot ...
    pic/print x.ps
    
  8. How to get fancy formatting and characters in user-defined titles:
    subscripts:        title 'E?HAD!/E?EM!' U
    superscripts:      title 'x^2! y^2!' U
    Greek characters:  title 'W[gg] cross section' U
    absolute value:    title '"b#x"b#' U
    
    If you want to define an alias and use that alias to set a title, there is one area where you can get into trouble. If the alias contains no spaces, then things are fairly straightforward:
      alias/create ATI3 'E?HAD!'    
      atitle ATI3
    
    If you want the alias to have spaces, then the value must be encapsulated in an extra layer of double single quotes:
      alias/create ATI6 '''z-vertex match (cm)'''
      atitle ATI6
    
  9. How to turn off the annoying bounding box that is drawn around plots that are saved using pic/print:
    opt nbox
    
  10. How to display statistics:
    opt stat         (displays statistics for ID, Entries, Mean, RMS)
    opt nstat        (turns off statistics display)
    set stat 111111  (displays ID, Entries, Mean, RMS, Underflow, Overflow)
    set stat 110110  (displays     Entries, Mean,      Underflow, Overflow;
                      rightmost bit corresponds to ID, leftmost to Overflow)
    set stat 1111    (displays ID, Entries, Mean, RMS;
                      missing bits are assumed to be on left: 1111 = 001111)
    

Questions and things to do:

  1. Need to incorporate correct event weights for CompHEP data. Events are equally weighted (?), but how to get cross section included?
  2. What shell does PAW use? Will my "shell (command)" lines cause problems if a user is able to select a different shell?


A few complaints:

  1. In output files of CompHEP (/cdf/data23a/frisch/V_41.10/USER/Zmumu+gamma/EVENTS/events_1_uU.txt, for example), why is it that some lines in the header are not commented out? (Specifically, the Initial_state info)

  2. Then, in the mixed and randomized CompHEP output file, why reduce the number of leading spaces on event lines from 7 to 1? This, combined with the lack of a leading # in all header lines, makes it impossible to use MATCH in PAW to filter out non-event lines.

  3. Why are the output formats of MadGraph and CompHEP so different? Should there be a standard? Does the Les Houches accord dictate a specific output format?