DataBinder

This page documents the classes used to describe a DataBinder request.

pydantic model servicex.databinder_models.ServiceXSpec

ServiceX submission specification. Pass an instance of this model to the ServiceX deliver function.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

The self parameter is explicitly positional-only, so that self can also be used as a field name.

field General: General = General(Codegen=None, OutputFormat=OutputFormatEnum.root_ttree, Delivery=DeliveryEnum.LocalCache, OutputDirectory=None, OutFilesetName='servicex_fileset')

General settings for the transform request.

field Sample: List[Sample] [Required]

List of samples to be transformed.

field Definition: List | None = None

Reusable definitions needed by the transform request, if any.
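
To show how the three fields nest, the sketch below writes a submission as plain Python dictionaries using the field names documented above. All values are placeholders, and find_problems is a hypothetical helper that only mimics two documented rules (Sample is required on ServiceXSpec, Name is required on each Sample); in real code you would construct ServiceXSpec itself and let pydantic do the validation.

```python
from typing import Any

# Nested shape of a submission, using the documented field names.
# All values are placeholders.
spec: dict[str, Any] = {
    "General": {
        "OutputFormat": "root-ttree",  # value of OutputFormatEnum.root_ttree
        "Delivery": "LocalCache",      # value of DeliveryEnum.LocalCache
    },
    "Sample": [
        {
            "Name": "my_sample",
            # In real code this is a DataSetIdentifier object, not a string.
            "Dataset": "<dataset identifier>",
            "NFiles": 1,
            "Query": "<query string>",
        }
    ],
    "Definition": None,  # optional reusable definitions
}


def find_problems(spec: dict[str, Any]) -> list[str]:
    """Hypothetical helper mirroring two documented rules:
    Sample is required, and every sample needs a Name."""
    problems = []
    if "Sample" not in spec:
        problems.append("Sample is required")
    for i, sample in enumerate(spec.get("Sample", [])):
        if "Name" not in sample:
            problems.append(f"Sample[{i}] is missing Name")
    return problems


print(find_problems(spec))  # []
```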

pydantic model servicex.databinder_models.General

Settings shared by the group of samples that are transformed together in one submission.

class OutputFormatEnum

Specifies the output format for the transform request.

parquet = 'parquet'

Save the output as a Parquet file (https://parquet.apache.org/).

root_ttree = 'root-ttree'

Save the output as a ROOT TTree (https://root.cern.ch/doc/master/classTTree.html).
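
The member names use an underscore (root_ttree) while the values use a hyphen ('root-ttree'). Pydantic matches a plain string against an enum field by value, so the hyphenated form is the one to use in a configuration. The enum below is a hypothetical stand-in copying the two documented members, not an import from servicex; it shows lookup by value versus lookup by name.

```python
from enum import Enum


class OutputFormatEnum(Enum):
    """Stand-in mirroring the documented members; not imported from servicex."""
    parquet = "parquet"
    root_ttree = "root-ttree"


# Lookup by value: the string form used in a configuration.
fmt = OutputFormatEnum("root-ttree")
print(fmt is OutputFormatEnum.root_ttree)  # True

# Lookup by name uses the underscore spelling instead.
print(OutputFormatEnum["root_ttree"].value)  # root-ttree

# The hyphenated string is not a valid member name.
print("root-ttree" in OutputFormatEnum.__members__)  # False
```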

class DeliveryEnum

Specifies the delivery method for the output files.

LocalCache = 'LocalCache'

Download the output files to the local computer and store them in the cache. Transform requests return paths to these cached files.

URLs = 'URLs'

Return URLs to the output files stored in the ServiceX object store.
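
Code that has to cope with either delivery mode can tell a cached local path from an object-store URL by its scheme. The is_remote helper below is a hypothetical sketch using only the standard library; it assumes the returned URLs use http or https, and the example strings are placeholders.

```python
from urllib.parse import urlparse


def is_remote(entry: str) -> bool:
    """Hypothetical helper: True for http(s) URLs (as with DeliveryEnum.URLs),
    False for local filesystem paths (as with DeliveryEnum.LocalCache)."""
    # Restricting to known schemes keeps Windows drive letters such as
    # "C:" (parsed as scheme "c") classified as local paths.
    return urlparse(entry).scheme in ("http", "https")


print(is_remote("https://objectstore.example.org/bucket/file.root"))  # True
print(is_remote("/home/user/cache/file.root"))  # False
```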

field Codegen: str | None = None

Name of the code generator to apply to all samples, if applicable. Users generally don’t need to specify this, because it is implied by the query class.

field OutputFormat: OutputFormatEnum = OutputFormatEnum.root_ttree

Output format for the transform request.

field Delivery: DeliveryEnum = DeliveryEnum.LocalCache

Specifies the delivery method for the output files.

field OutputDirectory: str | None = None

Directory in which to write a YAML file describing the output files.

field OutFilesetName: str = 'servicex_fileset'

Name of the YAML file that is created in the output directory.
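
To make the defaults concrete, the sketch below mirrors General as a standard-library dataclass with the same field names and default values listed above. It is a hypothetical stand-in rather than the pydantic model (enum fields are shown by their string values); the point is that a submission only needs to override the settings that differ from these defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneralSketch:
    """Hypothetical stand-in for General; enum fields are shown by value."""
    Codegen: Optional[str] = None
    OutputFormat: str = "root-ttree"      # OutputFormatEnum.root_ttree
    Delivery: str = "LocalCache"          # DeliveryEnum.LocalCache
    OutputDirectory: Optional[str] = None
    OutFilesetName: str = "servicex_fileset"


defaults = GeneralSketch()
print(defaults.OutputFormat, defaults.Delivery)  # root-ttree LocalCache

# Override only what differs, e.g. Parquet output delivered as URLs.
custom = GeneralSketch(OutputFormat="parquet", Delivery="URLs")
print(custom.OutFilesetName)  # servicex_fileset
```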

pydantic model servicex.databinder_models.Sample

Represents a single transform request within a larger submission.

field Name: str [Required]

The name of the sample. This makes it easier to identify the sample in the output.

field Dataset: DataSetIdentifier | None = None

Dataset identifier for the sample.

field NFiles: int | None = None

Limit the number of files used in the sample. The DID finder guarantees that the same files are returned on each invocation. Set to None to use all files.
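
The NFiles guarantee means that repeated requests with the same limit see the same subset of files. One simple way to obtain that property is to put the file list in a fixed order before truncating it. The limit_files function below is a hypothetical illustration of the idea, assuming the full file list is available as strings; it is not the DID finder's actual selection logic.

```python
from typing import Optional


def limit_files(files: list[str], n_files: Optional[int]) -> list[str]:
    """Hypothetical deterministic selection; None means use all files."""
    ordered = sorted(files)  # a fixed order makes the selection repeatable
    return ordered if n_files is None else ordered[:n_files]


files = ["f3.root", "f1.root", "f2.root"]
print(limit_files(files, 2))     # ['f1.root', 'f2.root']
print(limit_files(files, None))  # ['f1.root', 'f2.root', 'f3.root']
```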

field Query: str | QueryStringGenerator | None = None

Query string or query generator for the sample.

field IgnoreLocalCache: bool = False

If True, ignore the local cache for this sample.

field Codegen: str | None = None

Code generator name for this sample, if applicable. Users generally don’t need to specify this, because it is implied by the query class.
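
Both General and Sample accept a Codegen, and both defer to the query class when it is left unset. A plausible resolution order is a per-sample value first, then the submission-wide value, then the code generator implied by the query. That precedence is an assumption for illustration, as are the resolve_codegen helper and the placeholder generator names below.

```python
from typing import Optional


def resolve_codegen(
    sample_codegen: Optional[str],
    general_codegen: Optional[str],
    query_default: Optional[str],
) -> Optional[str]:
    """Hypothetical precedence: Sample.Codegen, then General.Codegen,
    then the code generator implied by the query class."""
    for candidate in (sample_codegen, general_codegen, query_default):
        if candidate is not None:
            return candidate
    return None


print(resolve_codegen(None, None, "query-default"))             # query-default
print(resolve_codegen("per-sample", "general", "query-default"))  # per-sample
```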

field RucioDID: str | None = None

Rucio dataset identifier, if applicable.

Deprecated: Use ‘Dataset’ instead.

field XRootDFiles: str | List[str] | None = None

XRootD file(s) associated with the sample.

Deprecated: Use ‘Dataset’ instead.

property dataset_identifier: DataSetIdentifier

Access the dataset identifier for the sample.
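
Because RucioDID and XRootDFiles are deprecated in favor of Dataset, the dataset_identifier property has to reconcile the three fields. The sketch below is an assumption about that logic, not the library's code: it prefers Dataset, falls back to the deprecated fields, raises if none is set, and returns plain (kind, value) tuples in place of real DataSetIdentifier objects.

```python
from typing import Optional, Union


def resolve_dataset(
    dataset: Optional[object],
    rucio_did: Optional[str],
    xrootd_files: Union[str, list[str], None],
) -> tuple[str, object]:
    """Hypothetical resolution order; (kind, value) tuples stand in
    for DataSetIdentifier objects."""
    if dataset is not None:
        return ("Dataset", dataset)
    if rucio_did is not None:
        return ("RucioDID", rucio_did)  # deprecated field
    if xrootd_files is not None:
        # Normalize a single file path to a one-element list.
        if isinstance(xrootd_files, str):
            files = [xrootd_files]
        else:
            files = list(xrootd_files)
        return ("XRootDFiles", files)  # deprecated field
    raise ValueError("no dataset specified for this sample")


print(resolve_dataset(None, None, "root://host//path/file.root"))
# ('XRootDFiles', ['root://host//path/file.root'])
```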