fast_carpenter package

Top-level package for fast-carpenter.

class fast_carpenter.Define(name, out_dir, variables)[source]

Bases: object

Creates new variables using a string-based expression.

There are two types of expressions:

  • Simple formulae, and
  • Reducing formulae.

The essential difference, unfortunately, is an internal one: simple expressions are passed more or less directly to numexpr, whereas reducing expressions add a layer on top.

From a user's perspective, however, simple expressions are those that preserve the dimensionality of the input. If one of the input variables represents a list of values for each event (whose length might vary), then the output will contain an equal-length list of values for each event.

If, however, a reducing expression is used, then there will be one less dimension on the resulting variable. In this case, if an input variable has a list of values for each event, the result of the expression will only contain a single value per event.
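As a rough illustration of the difference (not fast-carpenter's actual implementation, which evaluates string expressions over the data-space), a simple expression keeps the per-event structure while a reducing expression collapses it:

```python
import numpy as np

# Hypothetical per-event data: each event holds a variable-length list of muons.
muon_px = [np.array([10.0, 20.0]), np.array([5.0])]
muon_py = [np.array([3.0, 4.0]), np.array([12.0])]

# A "simple" expression preserves the dimensionality: a list in, an
# equal-length list out for every event.
muon_pt = [np.sqrt(px ** 2 + py ** 2) for px, py in zip(muon_px, muon_py)]

# A "reducing" expression (here, reduce: count_nonzero) drops one dimension:
# a single value per event.
n_good = [int(np.count_nonzero(pt > 10)) for pt in muon_pt]
```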

Parameters:

variables (list[dictionary]) – A list of single-length dictionaries whose key is the name of the resulting variable, and whose value is the expression to create it.

Other Parameters:
 
  • name (str) – The name of this stage (handled automatically by fast-flow)
  • out_dir (str) – Where to put the summary table (handled automatically by fast-flow)

Example

variables:
  - Muon_pt: "sqrt(Muon_px**2 + Muon_py**2)"
  - Muon_is_good: "(Muon_iso > 0.3) & (Muon_pt > 10)"
  - NGoodMuons: {reduce: count_nonzero, formula: Muon_is_good}
  - First_Muon_pt: {reduce: 0, formula: Muon_pt}

event(chunk)[source]
class fast_carpenter.SystematicWeights(name, out_dir, weights, out_format='weight_{}', extra_variations=[])[source]

Bases: object

Combines multiple weights and variations to produce a single event weight.

To study the impact of systematic uncertainties it is common to re-weight events using a variation of the weights representing, for example, a 1-sigma increase or decrease in the weights. Once there are multiple weight schemes involved, writing out each possible combination of these weights becomes tedious and potentially error-prone; this stage makes it easier.

It forms the nominal weight for each event by multiplying all nominal weights together; each specific variation is then formed by replacing one of the nominal weights with its corresponding “up” or “down” variation.

Each variation of a weight should just be a string giving an expression to use for that variation. This stage then combines these into a single expression by joining each set of variations with “*”, i.e. multiplying them together. The final expressions are then evaluated using an internal Define stage.

Parameters:
  • weights (dictionary[str, dictionary]) – A dictionary of weight variations to combine. The keys in this dictionary determine how each variation is called in the output variable. The values should be either a single string – the name of the input variable to use as the “nominal” variation – or a dictionary containing any of the keys nominal, up, or down, each mapping to the expression to use for that variation.
  • out_format (str) – The format string to use to build the name of the output variations. Defaults to “weight_{}”. Should contain a pair of empty braces which will be replaced with the name for the current variation, e.g. “nominal” or “PileUp_up”.
  • extra_variations (list[str]) – A list of additional variations to allow
Other Parameters:
 
  • name (str) – The name of this stage (handled automatically by fast-flow)
  • out_dir (str) – Where to put the summary table (handled automatically by fast-flow)

Example

syst_weights:
  energy_scale: {nominal: WeightEnergyScale, up: WeightEnergyScaleUp, down: WeightEnergyScaleDown}
  trigger: TriggerEfficiency
  recon: {nominal: ReconEfficiency, up: ReconEfficiency_up}

which will create 4 new variables:

weight_nominal = WeightEnergyScale * TriggerEfficiency * ReconEfficiency
weight_energy_scale_up = WeightEnergyScaleUp * TriggerEfficiency * ReconEfficiency
weight_energy_scale_down = WeightEnergyScaleDown * TriggerEfficiency * ReconEfficiency
weight_recon_up = WeightEnergyScale * TriggerEfficiency * ReconEfficiency_up
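The wiring described above can be sketched in a few lines. This is an illustrative reimplementation of the assumed behaviour, not the stage's actual code:

```python
# Hypothetical sketch: build one expression string per weight variation,
# using the configuration from the example above.
weights = {
    "energy_scale": {"nominal": "WeightEnergyScale",
                     "up": "WeightEnergyScaleUp",
                     "down": "WeightEnergyScaleDown"},
    "trigger": "TriggerEfficiency",
    "recon": {"nominal": "ReconEfficiency", "up": "ReconEfficiency_up"},
}

def normalise(spec):
    # A bare string is shorthand for {"nominal": <string>}.
    return {"nominal": spec} if isinstance(spec, str) else spec

def build_variations(weights, out_format="weight_{}"):
    specs = {name: normalise(spec) for name, spec in weights.items()}
    nominal = {name: spec["nominal"] for name, spec in specs.items()}
    # The nominal weight multiplies every nominal expression together.
    exprs = {out_format.format("nominal"): " * ".join(nominal.values())}
    # Each variation swaps exactly one nominal term for its up/down counterpart.
    for name, spec in specs.items():
        for direction in ("up", "down"):
            if direction not in spec:
                continue
            varied = dict(nominal, **{name: spec[direction]})
            exprs[out_format.format(name + "_" + direction)] = " * ".join(varied.values())
    return exprs
```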
event(chunk)[source]
class fast_carpenter.CutFlow(name, out_dir, selection_file=None, keep_unique_id=False, selection=None, counter=True, weights=None)[source]

Bases: object

Prevents subsequent stages from seeing certain events.

The two most important parameters to understand are the selection and weights parameters.

Parameters:
  • selection (str or dict) – The criteria for selecting events, formed by a nested set of “cuts”. Each cut must be either a valid expression or a single-length dictionary, with one of Any or All as the key and a list of cuts as the value.
  • weights (str or list[str] or dict[str, str]) – How to weight events in the output summary table. Must be either a single variable, a list of variables, or a dictionary whose values are variables in the data and whose keys are the column names to use for these weights in the output tables.

Example

Mask events using a single cut based on the nJet variable being greater than 2 and weight events in the summary table by the EventWeight variable:

cut_flow_1:
    selection:
        nJet > 2
    weights: EventWeight

Mask events by requiring both the nMuon variable being greater than 2 and the first Muon_energy value in each event being above 20. Don’t weight events in the summary table:

cut_flow_2:
    selection:
        All:
          - nMuon > 2
          - {reduce: 0, formula: Muon_energy > 20}

Mask events by requiring the nMuon variable be greater than 2 and either the first Muon_energy value in each event is above 20 or the total_energy is greater than 100. The summary table will weight events by both the EventWeight variable (called weight_nominal in the table) and the SystUp variable (called weight_syst_up in the summary):

cut_flow_3:
    selection:
        All:
          - nMuon > 2
          - Any:
            - {reduce: 0, formula: Muon_energy > 20}
            - total_energy > 100
    weights: {weight_nominal: EventWeight, weight_syst_up: SystUp}
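The nested All/Any logic of cut_flow_3 can be sketched as a recursive evaluation producing one boolean per event. This is a hedged illustration only: real cuts are expression strings handled by numexpr, so leaf cuts are shown here as plain callables to keep the example self-contained:

```python
import numpy as np

def evaluate(cut, data):
    # A single-length {"All": [...]} or {"Any": [...]} dictionary combines
    # the masks of its sub-cuts; anything else is a leaf cut.
    if isinstance(cut, dict):
        (combine, subcuts), = cut.items()
        masks = [evaluate(sub, data) for sub in subcuts]
        return np.all(masks, axis=0) if combine == "All" else np.any(masks, axis=0)
    return cut(data)

# Hypothetical three-event data-space.
data = {"nMuon": np.array([3, 1, 4]), "total_energy": np.array([50, 200, 150])}
selection = {"All": [lambda d: d["nMuon"] > 2,
                     {"Any": [lambda d: d["total_energy"] > 100]}]}
mask = evaluate(selection, data)  # one boolean per event
```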
Other Parameters:
 
  • name (str) – The name of this stage (handled automatically by fast-flow)
  • out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
  • selection_file (str) – Deprecated
  • keep_unique_id (bool) – If True, the summary table will contain a column that gives each cut a unique id. This is used internally to maintain the cut order, and often will not be useful in subsequent manipulation of the output table, so by default this is removed.
  • counter (bool) – Currently unused
Raises:

BadCutflowConfig – If neither or both of selection and selection_file are provided, or if a bad selection or weight configuration is given.

See also

SelectPhaseSpace: Adds the resulting event-mask as a new variable to the data.

selection.filters.build_selection(): Handles the actual creation of the event selection, based on the configuration.

numexpr: which is used for the internal expression handling.

collector()[source]
event(chunk)[source]
merge(rhs)[source]
class fast_carpenter.SelectPhaseSpace(name, out_dir, region_name, **kwargs)[source]

Bases: fast_carpenter.selection.stage.CutFlow

Creates an event-mask and adds it to the data-space.

This is identical to the CutFlow class, except that the resulting mask is added to the list of variables in the data-space, rather than being used directly to remove events. This allows multiple “regions” to be defined using different CutFlows in a single configuration.

Parameters: region_name – The name given to the resulting mask when it is added back to the data-space.

See also

CutFlow: for a description of the other parameters.

event(chunk)[source]
class fast_carpenter.BinnedDataframe(name, out_dir, binning, weights=None, dataset_col=True, pad_missing=False, file_format=None)[source]

Bases: object

Produces a binned dataframe (a multi-dimensional histogram).


Parameters:
  • binning (list[dict]) –

    A list of dictionaries describing the variables to bin on, and how they should be binned. Each of these dictionaries can contain the following keys:

      • in – The name of the attribute on the event to use.
      • out (default: same as in) – The name of the column to be filled in the output dataframe.
      • bins (default: None) – Must be either None or a dictionary. If a dictionary, it must contain one of the following sets of key-value pairs:
        1. nbins, low, high: used to produce a list of bin edges equivalent to numpy.linspace(low, high, nbins + 1), or
        2. edges: treated directly as the list of bin edges.
        If set to None, then the input variable is assumed to already be categorical (i.e. binned or discrete).
  • weights (str or list[str] or dict[str, str]) – How to weight events in the output table. Must be either a single variable, a list of variables, or a dictionary whose values are variables in the data and whose keys are the column names to use for these weights in the output tables.
  • file_format (str or list[str] or dict[str, str]) – determines the file format to use to save the binned dataframe to disk. Should be either a) a string giving the file format, b) a dictionary containing the keyword extension to give the file format, with all other keyword-argument pairs passed on to the corresponding pandas function, or c) a list of values matching a) or b).
  • dataset_col (bool) – adds an extra binning column with the name for each dataset.
  • pad_missing (bool) – If False, any bins that don’t contain data are excluded from the stored dataframe. Leaving this False can save some disk-space and improve processing time, particularly if the bins are only very sparsely filled.
Other Parameters:
 
  • name (str) – The name of this stage (handled automatically by fast-flow)
  • out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
Raises:

BadBinnedDataframeConfig – If there is an issue with the binning description.
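To make the output concrete, here is a small pandas sketch (with hypothetical variable names) of what a one-dimensional binned dataframe amounts to, including the behaviour of pad_missing=False, where empty bins are dropped from the result:

```python
import numpy as np
import pandas as pd

# Hypothetical binning: {in: energy, bins: {nbins: 4, low: 0, high: 100}}
edges = np.linspace(0, 100, 4 + 1)  # nbins + 1 bin edges

energy = np.array([5.0, 15.0, 30.0, 80.0, 95.0])
weight = np.array([1.0, 1.0, 2.0, 1.0, 0.5])

df = pd.DataFrame({"energy": pd.cut(energy, edges), "n": 1, "EventWeight": weight})
# observed=True mimics pad_missing=False: bins with no entries
# (here 50-75) are simply absent from the stored dataframe.
binned = df.groupby("energy", observed=True).sum()
```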

collector()[source]
event(chunk)[source]
merge(rhs)[source]
class fast_carpenter.BuildAghast(name, out_dir, binning, weights=None, dataset_col=True)[source]

Bases: object

Builds an aghast histogram.

Can be parametrized in the same way as fast_carpenter.BinnedDataframe (and actually uses that stage behind the scenes) but additionally writes out an aghast histogram, which can be reloaded with other aghast-compatible packages.

collector()[source]
contents
event(chunk)[source]
merge(rhs)[source]