fast_carpenter package
Top-level package for fast-carpenter.

class fast_carpenter.Define(name, out_dir, variables)
Bases: object
Creates new variables using a string-based expression.
There are two types of expressions:
- Simple formulae, and
- Reducing formulae.
The essential difference, unfortunately, is an internal one: simple expressions are passed almost directly to numexpr, whereas reducing expressions add a layer on top.
From a user's perspective, however, simple expressions are those that preserve the dimensionality of the input. If one of the input variables holds a list of values for each event (whose length might vary), then the output will contain an equal-length list of values for each event.
If, however, a reducing expression is used, then the resulting variable has one fewer dimension. In this case, if an input variable has a list of values for each event, the result of the expression will contain only a single value per event.
Parameters:
- variables (list[dictionary]) – A list of single-length dictionaries whose key is the name of the resulting variable, and whose value is the expression to create it.
Other Parameters:
- name (str) – The name of this stage (handled automatically by fast-flow)
- out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
Example
    variables:
      - Muon_pt: "sqrt(Muon_px**2 + Muon_py**2)"
      - Muon_is_good: (Muon_iso > 0.3) & (Muon_pt > 10)
      - NGoodMuons: {reduce: count_nonzero, formula: Muon_is_good}
      - First_Muon_pt: {reduce: 0, formula: Muon_pt}
See also
- fast_carpenter.define.reductions: for how reductions are handled and exactly what is valid.
- numexpr: which is used for the internal expression handling.
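The difference between simple and reducing formulae can be sketched in plain Python. This is only an illustration of the dimensionality rule described above, not how fast-carpenter evaluates expressions (it uses numexpr on arrays). The helper names `simple_formula` and `reducing_formula`, the sample values, and the `missing` placeholder used when an event has too few entries are all assumptions made for this sketch.

```python
import math

# Hypothetical jagged inputs: one list of muon values per event (3 events).
muon_px = [[3.0, 6.0], [], [5.0]]
muon_py = [[4.0, 8.0], [], [12.0]]
muon_iso = [[0.5, 0.1], [], [0.4]]


def simple_formula(func, *jagged_inputs):
    """Apply func element-wise; each event keeps its original list length."""
    return [
        [func(*values) for values in zip(*event_lists)]
        for event_lists in zip(*jagged_inputs)
    ]


def reducing_formula(reduce, jagged_input, missing=None):
    """Collapse each event's list to one value.

    `reduce` is either a non-negative integer index or a callable.
    `missing` (an assumption of this sketch) fills events with too few entries.
    """
    if isinstance(reduce, int):
        return [ev[reduce] if len(ev) > reduce else missing for ev in jagged_input]
    return [reduce(ev) for ev in jagged_input]


# Simple formulae: the output has the same shape as the input.
muon_pt = simple_formula(lambda px, py: math.sqrt(px**2 + py**2), muon_px, muon_py)
muon_is_good = simple_formula(lambda iso, pt: iso > 0.3 and pt > 10, muon_iso, muon_pt)

# Reducing formulae: one value per event.
n_good_muons = reducing_formula(lambda ev: sum(1 for v in ev if v), muon_is_good)
first_muon_pt = reducing_formula(0, muon_pt)
```

Here `muon_pt` and `muon_is_good` still hold one list per event, whereas `n_good_muons` and `first_muon_pt` hold a single value per event, mirroring the four variables in the example configuration.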

class fast_carpenter.SystematicWeights(name, out_dir, weights, out_format='weight_{}', extra_variations=[])
Bases: object
Combines multiple weights and variations to produce a single event weight.
To study the impact of systematic uncertainties, it is common to re-weight events using a variation of the weights representing, for example, a 1-sigma increase or decrease in the weights. Once multiple weight schemes are involved, writing out each possible combination of these weights becomes tedious and potentially error-prone; this stage makes it easier.
It forms the nominal weight for each event by multiplying all nominal weights together, then forms each specific variation by replacing a given nominal weight with its corresponding “up” or “down” variation.
Each variation of a weight should just be a string giving an expression to use for that variation. This stage then combines these into a single expression by joining each set of variations with “*”, i.e. multiplying them together. The final results then use an internal Define stage to do the calculation.
Parameters:
- weights (dictionary[str, dictionary]) – A dictionary of weight variations to combine. The keys in this dictionary determine what each variation is called in the output variable. Each value should be either a single string (the name of the input variable to use for the “nominal” variation) or a dictionary containing any of the keys nominal, up, or down. Each of these should then have a value providing the expression to use for that variation.
- out_format (str) – The format string to use to build the name of the output variations. Defaults to “weight_{}”. Should contain a pair of empty braces which will be replaced with the name for the current variation, e.g. “nominal” or “PileUp_up”.
- extra_variations (list[str]) – A list of additional variations to allow
Other Parameters:
- name (str) – The name of this stage (handled automatically by fast-flow)
- out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
Example
    syst_weights:
      energy_scale: {nominal: WeightEnergyScale, up: WeightEnergyScaleUp, down: WeightEnergyScaleDown}
      trigger: TriggerEfficiency
      recon: {nominal: ReconEfficiency, up: ReconEfficiency_up}
which will create 4 new variables:
    weight_nominal = WeightEnergyScale * TriggerEfficiency * ReconEfficiency
    weight_energy_scale_up = WeightEnergyScaleUp * TriggerEfficiency * ReconEfficiency
    weight_energy_scale_down = WeightEnergyScaleDown * TriggerEfficiency * ReconEfficiency
    weight_recon_up = WeightEnergyScale * TriggerEfficiency * ReconEfficiency_up
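The way the four expressions above arise from the configuration can be sketched as follows. This is a minimal illustration of the combination rule described in the text, not the stage's actual code: the function name `build_weight_expressions` is hypothetical, `extra_variations` is ignored, and the sketch assumes every weight provides a nominal entry.

```python
def build_weight_expressions(weights, out_format="weight_{}"):
    """Return {output variable name: expression string} for each variation."""
    # A bare string is shorthand for a dictionary with only a nominal entry.
    normalised = {
        name: {"nominal": spec} if isinstance(spec, str) else dict(spec)
        for name, spec in weights.items()
    }
    # Assumption of this sketch: every weight has a "nominal" expression.
    nominal = {name: spec["nominal"] for name, spec in normalised.items()}

    expressions = {out_format.format("nominal"): " * ".join(nominal.values())}
    for name, spec in normalised.items():
        for direction in ("up", "down"):
            if direction not in spec:
                continue
            # Swap one nominal weight for its varied expression, keep the rest.
            varied = dict(nominal, **{name: spec[direction]})
            key = out_format.format(f"{name}_{direction}")
            expressions[key] = " * ".join(varied.values())
    return expressions


syst_weights = {
    "energy_scale": {
        "nominal": "WeightEnergyScale",
        "up": "WeightEnergyScaleUp",
        "down": "WeightEnergyScaleDown",
    },
    "trigger": "TriggerEfficiency",
    "recon": {"nominal": "ReconEfficiency", "up": "ReconEfficiency_up"},
}
expressions = build_weight_expressions(syst_weights)
```

Only the variations that are actually configured produce an output, which is why `trigger` contributes no varied weight and `recon` contributes only an “up” variation.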

class fast_carpenter.CutFlow(name, out_dir, selection_file=None, keep_unique_id=False, selection=None, counter=True, weights=None)
Bases: object
Prevents subsequent stages seeing certain events.
The two most important parameters to understand are the selection and weights parameters.
Parameters:
- selection (str or dict) – The criteria for selecting events, formed by a nested set of “cuts”. Each cut must be either a valid expression or a single-length dictionary with one of Any or All as the key and a list of cuts as the value.
- weights (str, list[str], or dict[str, str]) – How to weight events in the output summary table. Must be either a single variable, a list of variables, or a dictionary where the values are variables in the data and the keys are the column names to use for these weights in the output tables.
Example
Mask events using a single cut based on the nJet variable being greater than 2 and weight events in the summary table by the EventWeight variable:
    cut_flow_1:
      selection: nJet > 2
      weights: EventWeight
Mask events by requiring both the nMuon variable to be greater than 2 and the first Muon_energy value in each event to be above 20. Don’t weight events in the summary table:
    cut_flow_2:
      selection:
        All:
          - nMuon > 2
          - {reduce: 0, formula: Muon_energy > 20}
Mask events by requiring the nMuon variable to be greater than 2 and either the first Muon_energy value in each event to be above 20 or the total_energy to be greater than 100. The summary table will weight events by both the EventWeight variable (called weight_nominal in the table) and the SystUp variable (called weight_syst_up in the summary):
    cut_flow_3:
      selection:
        All:
          - nMuon > 2
          - Any:
            - {reduce: 0, formula: Muon_energy > 20}
            - total_energy > 100
      weights: {weight_nominal: EventWeight, weight_syst_up: SystUp}
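The nesting of Any and All in the selection can be read as a small recursive rule. The sketch below evaluates such a selection for one event at a time in plain Python, purely to illustrate the logic; fast-carpenter itself evaluates whole arrays of events through numexpr. The function `passes`, the per-event dictionaries, and the handling of integer `reduce` values (pick that entry from each list the formula uses, fail the cut if it does not exist) are assumptions of this sketch.

```python
def passes(cut, event):
    """Return True if `event` (a dict of variable values) passes `cut`."""
    if isinstance(cut, str):
        # A plain expression: evaluate it with the event's variables as names.
        return bool(eval(cut, {"__builtins__": {}}, dict(event)))

    if isinstance(cut, dict) and set(cut) == {"reduce", "formula"}:
        # Reducing cut with an integer index (the only form handled here).
        index, formula = cut["reduce"], cut["formula"]
        namespace = dict(event)
        for name in compile(formula, "<cut>", "eval").co_names:
            value = event.get(name)
            if isinstance(value, list):
                if index >= len(value):
                    return False  # sketch assumption: missing entry fails
                namespace[name] = value[index]
        return bool(eval(formula, {"__builtins__": {}}, namespace))

    if isinstance(cut, dict) and len(cut) == 1:
        ((key, sub_cuts),) = cut.items()
        if key == "All":
            return all(passes(c, event) for c in sub_cuts)
        if key == "Any":
            return any(passes(c, event) for c in sub_cuts)

    raise ValueError(f"Unrecognised cut: {cut!r}")


# The selection from cut_flow_3, as it would look once the YAML is loaded.
selection = {
    "All": [
        "nMuon > 2",
        {"Any": [{"reduce": 0, "formula": "Muon_energy > 20"}, "total_energy > 100"]},
    ]
}

events = [
    {"nMuon": 3, "Muon_energy": [25.0, 5.0, 1.0], "total_energy": 50.0},
    {"nMuon": 3, "Muon_energy": [10.0, 5.0, 1.0], "total_energy": 150.0},
    {"nMuon": 2, "Muon_energy": [25.0, 5.0], "total_energy": 150.0},
    {"nMuon": 3, "Muon_energy": [10.0, 5.0, 1.0], "total_energy": 50.0},
]
mask = [passes(selection, event) for event in events]
```

The resulting `mask` has one boolean per event: the first two events pass through different branches of the Any, the third fails the nMuon requirement, and the fourth fails both branches of the Any.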
Other Parameters:
- name (str) – The name of this stage (handled automatically by fast-flow)
- out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
- selection_file (str) – Deprecated
- keep_unique_id (bool) – If True, the summary table will contain a column that gives each cut a unique id. This is used internally to maintain the cut order, and often will not be useful in subsequent manipulation of the output table, so by default this is removed.
- counter (bool) – Currently unused
Raises: BadCutflowConfig – If neither or both of selection and selection_file are provided, or if a bad selection or weight configuration is given.
See also
- SelectPhaseSpace: Adds the resulting event-mask as a new variable to the data.
- selection.filters.build_selection(): Handles the actual creation of the event selection, based on the configuration.
- numexpr: which is used for the internal expression handling.
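The three accepted forms of the weights parameter can all be viewed as a mapping from output column name to input variable. The sketch below shows one way to normalise them; the function name `normalise_weights` is hypothetical, and the choice to reuse a variable's own name as its column name for the string and list forms is an assumption that the description above does not state.

```python
def normalise_weights(weights):
    """Turn any accepted weights form into {column name: variable name}."""
    if weights is None:
        return {}  # no weighting requested
    if isinstance(weights, str):
        # Assumption: a single variable is reported under its own name.
        return {weights: weights}
    if isinstance(weights, (list, tuple)):
        # Assumption: each listed variable is reported under its own name.
        return {name: name for name in weights}
    if isinstance(weights, dict):
        return dict(weights)
    raise TypeError(f"Unsupported weights configuration: {weights!r}")


single = normalise_weights("EventWeight")
several = normalise_weights(["EventWeight", "SystUp"])
renamed = normalise_weights({"weight_nominal": "EventWeight", "weight_syst_up": "SystUp"})
```

Treating all three forms as one dictionary keeps the downstream table-building code independent of how the user wrote the configuration.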

class fast_carpenter.SelectPhaseSpace(name, out_dir, region_name, **kwargs)
Bases: fast_carpenter.selection.stage.CutFlow
Creates an event-mask and adds it to the data-space.
This is identical to the CutFlow class, except that the resulting mask is added to the list of variables in the data-space, rather than being used directly to remove events. This allows multiple “regions” to be defined using different CutFlows in a single configuration.
Parameters:
- region_name – The name given to the resulting mask when added back to the data-space.
See also
- CutFlow: for a description of the other parameters.

class fast_carpenter.BinnedDataframe(name, out_dir, binning, weights=None, dataset_col=True, pad_missing=False, file_format=None)
Bases: object
Produces a binned dataframe (a multi-dimensional histogram).
Parameters:
- binning (list[dict]) – A list of dictionaries describing the variables to bin on, and how they should be binned. Each of these dictionaries can contain the following:

    Parameter | Default    | Description
    in        |            | The name of the attribute on the event to use.
    out       | same as in | The name of the column to be filled in the output dataframe.
    bins      | None       | Must be either None or a dictionary. If a dictionary, it must contain one of the following sets of key-value pairs: (1) nbins, low, high: which are used to produce a list of bin edges equivalent to numpy.linspace(low, high, nbins + 1); (2) edges: which is treated as the list of bin edges directly. If set to None, then the input variable is assumed to already be categorical (i.e. binned or discrete).

- weights (str, list[str], or dict[str, str]) – How to weight events in the output table. Must be either a single variable, a list of variables, or a dictionary where the values are variables in the data and the keys are the column names to use for these weights in the output tables.
- file_format (str, list[str], or dict[str, str]) – Determines the file format to use to save the binned dataframe to disk. Should be either a) a string with the file format, b) a dict containing the keyword extension to give the file format, with all other keyword-argument pairs passed on to the corresponding pandas function, or c) a list of values matching a) or b).
- dataset_col (bool) – Adds an extra binning column with the name for each dataset.
- pad_missing (bool) – If False, any bins that don’t contain data are excluded from the stored dataframe. Leaving this False can save some disk-space and improve processing time, particularly if the bins are only very sparsely filled.
Other Parameters:
- name (str) – The name of this stage (handled automatically by fast-flow)
- out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
Raises: BadBinnedDataframeConfig – If there is an issue with the binning description.
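The three forms of the bins option described for the binning parameter can be sketched as a small helper. This is an illustration only: the function name `bin_edges` is hypothetical, and the evenly spaced case is written in plain Python to mirror numpy.linspace(low, high, nbins + 1) (up to floating-point rounding) rather than calling numpy.

```python
def bin_edges(bins):
    """Return a list of bin edges, or None for an already-categorical variable."""
    if bins is None:
        # The variable is assumed to be binned or discrete already.
        return None
    if "edges" in bins:
        # Explicit edges are used as given.
        return list(bins["edges"])
    # Evenly spaced edges: nbins bins need nbins + 1 edges.
    nbins, low, high = bins["nbins"], bins["low"], bins["high"]
    width = (high - low) / nbins
    return [low + i * width for i in range(nbins + 1)]


even = bin_edges({"nbins": 4, "low": 0, "high": 2})
explicit = bin_edges({"edges": [0, 10, 50, 200]})
categorical = bin_edges(None)
```

Note that `nbins` counts bins, not edges, so four bins between 0 and 2 give five edges.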

class fast_carpenter.BuildAghast(name, out_dir, binning, weights=None, dataset_col=True)
Bases: object
Builds an aghast histogram.
Can be parametrized in the same way as fast_carpenter.BinnedDataframe (and actually uses that stage behind the scenes) but additionally writes out a Ghast which can be reloaded with other ghast packages.
See also
- fast_carpenter.BinnedDataframe: for a version which only produces binned pandas dataframes.
- The aghast main page: https://github.com/scikit-hep/aghast.