fast_carpenter.summary package

class fast_carpenter.summary.BuildAghast(name, out_dir, binning, weights=None, dataset_col=True)[source]

Bases: object

Builds an aghast histogram.

Can be parametrized in the same way as fast_carpenter.BinnedDataframe (and actually uses that stage behind the scenes), but additionally writes out an aghast histogram which can be reloaded with other aghast-compatible packages.
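As an illustrative sketch, a stage of this type might be declared in a fast-flow YAML configuration as below; the stage name, variable names, and bin settings are hypothetical, and the binning block follows the same description as for BinnedDataframe:

```yaml
stages:
  - my_ghasts: fast_carpenter.summary.BuildAghast

my_ghasts:
  binning:
    - {in: nJet, out: njet, bins: {nbins: 10, low: 0, high: 10}}
  weights: {nominal: EventWeight}
```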

See also: fast_carpenter.summary.BinnedDataframe

collector()[source]
contents
event(chunk)[source]
merge(rhs)[source]
class fast_carpenter.summary.BinnedDataframe(name, out_dir, binning, weights=None, dataset_col=True, pad_missing=False, file_format=None, observed=False, weight_data=False)[source]

Bases: object

Produces a binned dataframe (a multi-dimensional histogram).

Parameters:
  • binning (list[dict]) –

    A list of dictionaries describing the variables to bin on, and how they should be binned. Each of these dictionaries can contain the following:

    in (no default)
        The name of the attribute on the event to use.
    out (default: same as in)
        The name of the column to be filled in the output dataframe.
    bins (default: None)
        Must be either None or a dictionary. If a dictionary, it must contain one of
        the following sets of key-value pairs:
        1. nbins, low, high: used to produce a list of bin edges equivalent to
           numpy.linspace(low, high, nbins + 1)
        2. edges: treated as the list of bin edges directly.
        If set to None, the input variable is assumed to already be categorical
        (i.e. binned or discrete).
  • weights (str or list[str] or dict[str, str]) – How to weight events in the output table. Must be either a single variable, a list of variables, or a dictionary whose values are variables in the data and whose keys are the column names these weights should be given in the output tables.
  • file_format (str or list[str] or dict[str, str]) – Determines the file format used to save the binned dataframe to disk. Should be either a) a string naming the file format, b) a dict containing the keyword extension to give the file format, with all other keyword-argument pairs passed on to the corresponding pandas function, or c) a list of values matching a) or b).
  • dataset_col (bool) – Adds an extra binning column containing the name of each dataset.
  • pad_missing (bool) – If False, any bins that don’t contain data are excluded from the stored dataframe. Leaving this False can save some disk space and improve processing time, particularly if the bins are only very sparsely filled.
  • observed (bool) – If False, bins in the dataframe will only be filled if there are datapoints contained within them. Otherwise, depending on the binning specification for each dimension, all bins for that dimension will be present. Use pad_missing: true to force all bins to be present.
Other Parameters:
 
  • name (str) – The name of this stage (handled automatically by fast-flow)
  • out_dir (str) – Where to put the summary table (handled automatically by fast-flow)
Raises:

BadBinnedDataframeConfig – If there is an issue with the binning description.
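The nbins/low/high form of the bins dictionary described above is equivalent to calling numpy.linspace directly; a minimal sketch with illustrative numbers:

```python
import numpy as np

# A bins specification of {nbins: 4, low: 0, high: 200}
# produces nbins + 1 evenly spaced bin edges:
nbins, low, high = 4, 0.0, 200.0
edges = np.linspace(low, high, nbins + 1)
print(edges.tolist())  # [0.0, 50.0, 100.0, 150.0, 200.0]
```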

collector()[source]
event(chunk)[source]
merge(rhs)[source]
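Putting the parameters above together, a hedged sketch of a BinnedDataframe stage in a fast-flow YAML configuration (the stage name and tree variable names are hypothetical):

```yaml
stages:
  - histogram_jets: fast_carpenter.summary.BinnedDataframe

histogram_jets:
  binning:
    - {in: nJet}   # already discrete, so no bins dictionary needed
    - {in: Jet_Pt, out: jet_pt, bins: {nbins: 50, low: 0, high: 200}}
    - {in: Jet_Eta, bins: {edges: [-3, -1.5, 0, 1.5, 3]}}
  weights: {nominal: EventWeight}
  file_format: csv
```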
class fast_carpenter.summary.EventByEventDataframe(name, out_dir, collections, mask=None, flatten=True)[source]

Bases: object

Writes out a pandas dataframe with event-level values.
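A minimal configuration sketch for this stage in a fast-flow YAML file; the stage and variable names are hypothetical, and the form of the mask expression is an assumption based on the signature above:

```yaml
stages:
  - event_dump: fast_carpenter.summary.EventByEventDataframe

event_dump:
  collections: [Muon_Pt, Muon_Eta]
  mask: nMuon > 0
  flatten: true
```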

collector()[source]
event(chunk)[source]
merge(rhs)[source]