servicex package

Subpackages

Submodules

servicex.configuration module

pydantic model servicex.configuration.Configuration[source]

Bases: BaseModel

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Validators:
field api_endpoints: List[Endpoint] [Required]
field default_endpoint: str | None = None (alias 'default-endpoint')
field cache_path: str | None = None
field shortened_downloaded_filename: bool | None = False
validator expand_cache_path  »  all fields[source]

Expand the cache path to a full path and create it if it doesn't exist. Expands ${USER} to the user name on the system; works on Windows, too.

endpoint_dict() Dict[str, Endpoint][source]
classmethod read(config_path: str | None = None)[source]

Read configuration from a .servicex or servicex.yaml file.

Parameters:

config_path – If provided, use this as the path to the .servicex file. Otherwise, search for the file starting from the current working directory and moving up through enclosing directories.

Returns:

Populated configuration object

pydantic model servicex.configuration.Endpoint[source]

Bases: BaseModel

field endpoint: str [Required]
field name: str [Required]
field token: str | None = ''
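
The two models above map directly onto the keys of the .servicex YAML file. As a sketch, the snippet below builds a Configuration in Python instead of reading a file; the endpoint name and URL are placeholders, and the default-endpoint value is passed through its field alias since the key is not a valid Python identifier.

    from servicex.configuration import Configuration, Endpoint

    # A minimal sketch; the endpoint name and URL do not refer to a real deployment.
    config = Configuration(
        api_endpoints=[
            Endpoint(
                name="my-servicex",
                endpoint="https://servicex.example.org",
                token="",
            )
        ],
        **{"default-endpoint": "my-servicex"},  # populated via the field alias
    )

    # endpoint_dict() maps endpoint names to Endpoint objects
    print(config.endpoint_dict()["my-servicex"].endpoint)
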

servicex.databinder_models module

pydantic model servicex.databinder_models.General[source]

Bases: BaseModel

Represents a group of samples to be transformed together.

class OutputFormatEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Specifies the output format for the transform request.

parquet = 'parquet'

Save the output as a parquet file https://parquet.apache.org/

root_ttree = 'root-ttree'

Save the output as a ROOT TTree https://root.cern.ch/doc/master/classTTree.html

to_ResultFormat() ResultFormat[source]

Convert this OutputFormatEnum value to the ResultFormat enum, which is what the TransformRequest actually uses. Keeping two enum classes allows them to use different string values while maintaining backend compatibility.

class DeliveryEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

LocalCache = 'LocalCache'

Download the files to the local computer and store them in the cache. Transform requests will return paths to these files in the cache

URLs = 'URLs'

Return URLs to the files stored in the ServiceX object store

field Codegen: str | None = None

Code generator name to be applied across all of the samples, if applicable. Generally users don't need to specify this; it is implied by the query class.

field OutputFormat: OutputFormatEnum = OutputFormatEnum.root_ttree

Output format for the transform request.

field Delivery: DeliveryEnum = DeliveryEnum.LocalCache

Specifies the delivery method for the output files.

field OutputDirectory: str | None = None

Directory to output a yaml file describing the output files.

field OutFilesetName: str = 'servicex_fileset'

Name of the yaml file that will be created in the output directory.

pydantic model servicex.databinder_models.Sample[source]

Bases: BaseModel

Represents a single transform request within a larger submission.

Validators:
field Name: str [Required]

The name of the sample. This makes it easier to identify the sample in the output.

field Dataset: DataSetIdentifier | None = None

Dataset identifier for the sample

field NFiles: int | None = None

Limit the number of files to be used in the sample. The DID finder will guarantee that the same files are returned on each invocation. Set to None to use all files.

field Query: str | QueryStringGenerator | None = None

Query string or query generator for the sample.

field IgnoreLocalCache: bool = False

Flag to ignore local cache.

field Codegen: str | None = None

Code generator name, if applicable. Generally users don't need to specify this; it is implied by the query class.

field RucioDID: str | None = None
Rucio Dataset Identifier, if applicable.

Deprecated: Use ‘Dataset’ instead.

field XRootDFiles: str | List[str] | None = None
XRootD file(s) associated with the sample.

Deprecated: Use ‘Dataset’ instead.

property dataset_identifier: DataSetIdentifier

Access the dataset identifier for the sample.

validator validate_did_xor_file  »  all fields[source]

Ensure that only one of Dataset, XRootDFiles, or RucioDID is specified.

validator truncate_long_sample_name  »  Name[source]

Truncate the sample name to 128 characters if it exceeds that length, and print a warning message.

pydantic model servicex.databinder_models.ServiceXSpec[source]

Bases: BaseModel

ServiceX Submission Specification - pass this into the ServiceX deliver function

field General: General = General(Codegen=None, OutputFormat=<OutputFormatEnum.root_ttree: 'root-ttree'>, Delivery=<DeliveryEnum.LocalCache: 'LocalCache'>, OutputDirectory=None, OutFilesetName='servicex_fileset')

General settings for the transform request

field Sample: List[Sample] [Required]

List of samples to be transformed

field Definition: List | None = None

Any reusable definitions that are needed for the transform request
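
Putting the three models together, here is a minimal sketch of a specification built in Python and handed to deliver; the file URI and query string are placeholders rather than a working transform.

    from servicex import General, Sample, ServiceXSpec, deliver
    from servicex.dataset_identifier import FileListDataset

    spec = ServiceXSpec(
        General=General(
            OutputFormat=General.OutputFormatEnum.root_ttree,
            Delivery=General.DeliveryEnum.LocalCache,
        ),
        Sample=[
            Sample(
                Name="my_sample",
                Dataset=FileListDataset("root://host.example//path/file.root"),
                Query="...",  # a query string or QueryStringGenerator
            )
        ],
    )

    results = deliver(spec)  # see servicex.servicex_client.deliver below
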

servicex.dataset_group module

class servicex.dataset_group.DatasetGroup(datasets: List[Query])[source]

Bases: object

A group of datasets that are to be transformed together. This is a convenience class to allow you to submit multiple datasets to a ServiceX instance and then wait for all of them to complete.

Parameters:

datasets – List of transform requests as dataset instances

as_files(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException]
async as_files_async(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException][source]
as_signed_urls(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException]
async as_signed_urls_async(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException][source]
set_result_format(result_format: ResultFormat)[source]

Set the result format for all the datasets in the group.

Parameters:

result_format – ResultFormat instance
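
A sketch of grouping two queries, assuming a backend entry named 'servicex' in the local .servicex file; the queries come from ServiceXClient.generic_query (documented under servicex.servicex_client below) and the file URIs are placeholders.

    from servicex.dataset_group import DatasetGroup
    from servicex.dataset_identifier import FileListDataset
    from servicex.models import ResultFormat
    from servicex.servicex_client import ServiceXClient

    sx = ServiceXClient(backend="servicex")  # assumes an entry in .servicex
    queries = [
        sx.generic_query(
            dataset_identifier=FileListDataset(f"root://host.example//path/file{i}.root"),
            query="...",  # placeholder selection string
            codegen="uproot",
        )
        for i in (1, 2)
    ]

    group = DatasetGroup(queries)
    group.set_result_format(ResultFormat.parquet)
    results = group.as_files()  # blocks until every dataset in the group completes
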

servicex.dataset_identifier module

class servicex.dataset_identifier.CERNOpenDataDatasetIdentifier(dataset: int, num_files: int | None = None)[source]

Bases: DataSetIdentifier

CERN Open Data Dataset - this will be looked up using the CERN Open Data DID finder.

Parameters:
  • dataset – The dataset ID - this is an integer.

  • num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.

classmethod from_yaml(_, node)[source]
yaml_tag = '!CERNOpenData'
class servicex.dataset_identifier.DataSetIdentifier(scheme: str, dataset: str, num_files: int | None = None)[source]

Bases: object

Base class for specifying the dataset to transform. This can be either a list of XRootD URIs or a Rucio DID

property did
populate_transform_request(transform_request: TransformRequest) None[source]
class servicex.dataset_identifier.FileListDataset(files: List[str] | str)[source]

Bases: DataSetIdentifier

Dataset specified as a list of XRootD URIs.

Parameters:

files – Either a list of URIs or a single URI string

property did
files: List[str]
classmethod from_yaml(constructor, node)[source]
populate_transform_request(transform_request: TransformRequest) None[source]
yaml_tag = '!FileList'
class servicex.dataset_identifier.RucioDatasetIdentifier(dataset: str, num_files: int | None = None)[source]

Bases: DataSetIdentifier

Rucio Dataset - this will be looked up using the Rucio data management service.

Parameters:
  • dataset – The rucio DID - this can be a dataset or a container of datasets.

  • num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.

classmethod from_yaml(_, node)[source]
yaml_tag = '!Rucio'
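
The three identifier flavours side by side; the DIDs, URIs, and record number are illustrative placeholders.

    from servicex.dataset_identifier import (
        CERNOpenDataDatasetIdentifier,
        FileListDataset,
        RucioDatasetIdentifier,
    )

    rucio_ds = RucioDatasetIdentifier("user.me:user.me.my_dataset", num_files=10)
    file_ds = FileListDataset([
        "root://host.example//path/a.root",
        "root://host.example//path/b.root",
    ])
    open_ds = CERNOpenDataDatasetIdentifier(1507, num_files=1)
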

servicex.expandable_progress module

class servicex.expandable_progress.ExpandableProgress(display_progress: bool = True, provided_progress: Progress | ExpandableProgress | None = None, overall_progress: bool = False)[source]

Bases: object

We want to be able to use rich progress bars in the async code, but there are some situations where the user doesn't want them. Also, we might be running several simultaneous progress bars, and we want to be able to control that.

We still want to keep the context manager interface, so this class implements the context manager, but if display_progress is False it does nothing. If provided_progress is set, then we just use that; otherwise we create a new progress bar.

Parameters:
  • display_progress

  • provided_progress

add_task(param, start, total)[source]
advance(task_id, task_type)[source]
start_task(task_id, task_type)[source]
update(task_id, task_type, total=None, completed=None, **fields)[source]
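
A sketch of the context-manager usage described above; the task description, totals, and task_type strings are assumptions for illustration, as is the task id returned by add_task.

    from servicex.expandable_progress import ExpandableProgress

    with ExpandableProgress(display_progress=True) as progress:
        # add_task follows the signature shown above
        task_id = progress.add_task("Transform", start=False, total=100)
        progress.start_task(task_id=task_id, task_type="Transform")
        progress.update(task_id=task_id, task_type="Transform", completed=50)
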
class servicex.expandable_progress.ProgressCounts(description: str, task_id: TaskID, start: int | None = None, total: int | None = None, completed: int | None = None)[source]

Bases: object

class servicex.expandable_progress.TranformStatusProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)[source]

Bases: Progress

get_renderables()[source]

Get a number of renderables for the progress display.

servicex.minio_adapter module

class servicex.minio_adapter.MinioAdapter(endpoint_host: str, secure: bool, access_key: str, secret_key: str, bucket: str)[source]

Bases: object

MAX_PATH_LEN = 60
async download_file(object_name: str, local_dir: str, shorten_filename: bool = False) Path[source]
classmethod for_transform(transform: TransformStatus)[source]
async get_signed_url(object_name: str) str[source]
classmethod hash_path(file_name)[source]

Make the path safe for the object store or POSIX by keeping its length under MAX_PATH_LEN. The leading (less interesting) characters are replaced with a forty-character hash.

Parameters:

file_name – Input filename

Returns:

Safe path string

async list_bucket() List[ResultFile][source]
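
A sketch of the download flow, assuming a completed TransformStatus whose minio_* fields are populated (see servicex.models below):

    import asyncio
    from servicex.minio_adapter import MinioAdapter
    from servicex.models import TransformStatus

    async def fetch_all(status: TransformStatus) -> None:
        minio = MinioAdapter.for_transform(status)
        for result_file in await minio.list_bucket():
            local_path = await minio.download_file(
                result_file.filename, "/tmp/servicex", shorten_filename=True
            )
            print(local_path)

    # asyncio.run(fetch_all(status))  # given a completed TransformStatus
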

servicex.models module

class servicex.models.ResultDestination(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Direct the output to the object store or a POSIX volume

object_store = 'object-store'
volume = 'volume'
pydantic model servicex.models.ResultFile[source]

Bases: BaseModel

Record reporting the properties of a transformed file result

field filename: str [Required]
field size: int [Required]
field extension: str [Required]
class servicex.models.ResultFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Specify the file format for the generated output

parquet = 'parquet'
root_ttree = 'root-file'
class servicex.models.Status(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Status of a submitted transform

canceled = 'Canceled'
complete = 'Complete'
fatal = 'Fatal'
looking = 'Lookup'
pending = 'Pending Lookup'
running = 'Running'
submitted = 'Submitted'
pydantic model servicex.models.TransformRequest[source]

Bases: BaseModel

Transform request sent to ServiceX

field title: str | None = None
field did: str | None = None
field file_list: List[str] | None = None (alias 'file-list')
field selection: str [Required]
field image: str | None = None
field codegen: str [Required]
field tree_name: str | None = None (alias 'tree-name')
field result_destination: ResultDestination [Required]
field result_format: ResultFormat [Required]
compute_hash()[source]

Compute a hash for this submission. Only properties that impact the result are included, so we have maximal ability to reuse transforms.

Returns:

SHA256 hash of request
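
A sketch of building a request and computing its cache hash; the DID and selection are placeholders, and exactly which fields feed the hash is left to compute_hash.

    from servicex.models import ResultDestination, ResultFormat, TransformRequest

    request = TransformRequest(
        did="rucio://user.me:user.me.my_dataset",  # placeholder DID
        selection="...",                           # placeholder selection string
        codegen="uproot",
        result_destination=ResultDestination.object_store,
        result_format=ResultFormat.root_ttree,
    )
    print(request.compute_hash())  # SHA256 hash of the result-affecting fields
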

pydantic model servicex.models.TransformStatus[source]

Bases: BaseModel

Status object returned by servicex

Validators:
field request_id: str [Required]
field did: str [Required]
field title: str | None = None
field selection: str [Required]
field tree_name: str | None [Required]
field image: str [Required]
field result_destination: ResultDestination [Required]
field result_format: ResultFormat [Required]
field generated_code_cm: str [Required]
field status: Status [Required]
field app_version: str [Required]
field files: int [Required]
field files_completed: int [Required]
field files_failed: int [Required]
field files_remaining: int | None = 0
field submit_time: datetime = None
field finish_time: datetime | None = None
field minio_endpoint: str | None = None
field minio_secured: bool | None = None
field minio_access_key: str | None = None
field minio_secret_key: str | None = None
field log_url: str | None = None
validator parse_finish_time  »  finish_time[source]
pydantic model servicex.models.TransformedResults[source]

Bases: BaseModel

Returned for a submission. Gives you everything you need to know about a completed transform.

field hash: str [Required]
field title: str [Required]
field codegen: str [Required]
field request_id: str [Required]
field submit_time: datetime [Required]
field data_dir: str [Required]
field file_list: List[str] [Required]
field signed_url_list: List[str] [Required]
field files: int [Required]
field result_format: ResultFormat [Required]
field log_url: str | None = None

servicex.python_dataset module

class servicex.python_dataset.PythonFunction(python_function: str | Callable | None = None)[source]

Bases: QueryStringGenerator

default_codegen: str | None = 'python'
classmethod from_yaml(_, node)[source]
generate_selection_string() str[source]

Override to return the selection string to send to ServiceX.

with_uproot_function(f: str | Callable) Self[source]
yaml_tag = '!PythonFunction'
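
A sketch of wrapping a Python function as a query; the function body is a stub, its signature is an assumed convention, and the encoding returned by generate_selection_string is an implementation detail.

    from servicex.python_dataset import PythonFunction

    def run_query(input_filenames=None):
        # Stub: open the input files with uproot and return arrays
        ...

    query = PythonFunction().with_uproot_function(run_query)
    selection = query.generate_selection_string()  # encoded function source
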

servicex.query module

servicex.query_cache module

exception servicex.query_cache.CacheException[source]

Bases: Exception

class servicex.query_cache.QueryCache(config: Configuration)[source]

Bases: object

cache_path_for_transform(transform_status: TransformStatus) Path[source]
cache_transform(record: TransformedResults)[source]
cached_queries() List[TransformedResults][source]
close()[source]
delete_codegen_by_backend(backend: str)[source]
delete_record_by_request_id(request_id: str)[source]
get_codegen_by_backend(backend: str) dict | None[source]
get_transform_by_hash(hash: str) TransformedResults | None[source]
get_transform_by_request_id(request_id: str) TransformedResults | None[source]
transformed_results(transform: TransformRequest, completed_status: TransformStatus, data_dir: str, file_list: List[str], signed_urls) TransformedResults[source]
update_codegen_by_backend(backend: str, codegen_list: list)[source]
update_record(record: TransformedResults)[source]
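
A sketch of inspecting and pruning the local cache, reusing Configuration.read from servicex.configuration above:

    from servicex.configuration import Configuration
    from servicex.query_cache import QueryCache

    cache = QueryCache(Configuration.read())
    for record in cache.cached_queries():
        print(record.request_id, record.title, record.files)
    # cache.delete_record_by_request_id("<request-id>")  # prune a single entry
    cache.close()
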

servicex.servicex_adapter module

exception servicex.servicex_adapter.AuthorizationError[source]

Bases: BaseException

class servicex.servicex_adapter.ServiceXAdapter(url: str, refresh_token: str | None = None)[source]

Bases: object

get_code_generators()[source]
async get_transform_status(request_id: str) TransformStatus[source]
async get_transforms() List[TransformStatus][source]
async submit_transform(transform_request: TransformRequest)[source]
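
A sketch of polling the adapter directly; the deployment URL and request id are placeholders.

    import asyncio
    from servicex.servicex_adapter import ServiceXAdapter

    async def main() -> None:
        adapter = ServiceXAdapter("https://servicex.example.org")
        status = await adapter.get_transform_status("<request-id>")
        print(status.status, f"{status.files_completed}/{status.files}")

    asyncio.run(main())
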

servicex.servicex_client module

class servicex.servicex_client.GuardList(data: Sequence | Exception)[source]

Bases: Sequence

valid() bool[source]
exception servicex.servicex_client.ReturnValueException(exc)[source]

Bases: Exception

An exception occurred at some point while obtaining this result from ServiceX

class servicex.servicex_client.ServiceXClient(backend=None, url=None, config_path=None)[source]

Bases: object

Connection to a ServiceX deployment. Instances of this class can retrieve deployment data from the service and also interact with previously run transformations. Instances of this class are factories for Dataset objects.

If both backend and url are unspecified, it will attempt to pick up the default backend from .servicex

Parameters:
  • backend – Name of a deployment from the .servicex file

  • url – Direct URL of a ServiceX deployment instead of using .servicex. This only works with hosts without auth, or when the token is found in a file pointed to by the environment variable BEARER_TOKEN_FILE

  • config_path – Optional path to the .servicex file. If not specified, will search in the local directory and up through enclosing directories

generic_query(dataset_identifier: DataSetIdentifier | FileListDataset, query: str | QueryStringGenerator, codegen: str | None = None, title: str = 'ServiceX Client', result_format: ResultFormat = ResultFormat.parquet, ignore_cache: bool = False) Query[source]

Generate a Query object for a generic codegen specification

Parameters:
  • dataset_identifier – The dataset identifier or filelist to be the source of files

  • title – Title to be applied to the transform. This is also useful for relating transform results.

  • codegen – Name of the code generator to use with this transform

  • result_format – Do you want Parquet or ROOT? This can be set later with the set_result_format method

  • ignore_cache – Ignore the query cache and always run the query

Returns:

A Query object
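
A sketch of creating a query through the client; the backend name, codegen, and selection string are placeholders, and the final as_files call assumes the returned Query exposes the same blocking helpers as DatasetGroup.

    from servicex.dataset_identifier import FileListDataset
    from servicex.models import ResultFormat
    from servicex.servicex_client import ServiceXClient

    sx = ServiceXClient(backend="servicex")
    query = sx.generic_query(
        dataset_identifier=FileListDataset("root://host.example//path/file.root"),
        query="...",  # placeholder selection understood by the chosen codegen
        codegen="uproot",
        title="docs example",
        result_format=ResultFormat.root_ttree,
    )
    files = query.as_files()  # assumption: blocking download helper on Query
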

get_code_generators(backend=None)[source]

Retrieve the code generators deployed with the ServiceX instance.

Returns:

The list of code generators as a JSON dictionary

get_transform_status(transform_id) TransformStatus

Get the status of a given transform.

Parameters:

transform_id – The UUID of the transform

Returns:

The current status for the transform

async get_transform_status_async(transform_id) TransformStatus[source]

Get the status of a given transform.

Parameters:

transform_id – The UUID of the transform

Returns:

The current status for the transform

get_transforms() List[TransformStatus]

Retrieve all transforms you have run on the server.

Returns:

List of transform status objects

async get_transforms_async() List[TransformStatus][source]

Retrieve all transforms you have run on the server.

Returns:

List of transform status objects

servicex.servicex_client.deliver(config: ServiceXSpec | Mapping[str, Any] | str | Path, config_path: str | None = None, servicex_name: str | None = None, return_exceptions: bool = True)[source]
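
A sketch of the top-level entry point with a placeholder spec file; this assumes deliver returns a mapping from sample name to a GuardList of output paths, so valid() (see GuardList above) separates per-sample failures when return_exceptions is True.

    from servicex import deliver

    results = deliver("spec.yaml", servicex_name="servicex")  # placeholder spec path
    for name, file_list in results.items():
        if not file_list.valid():  # this sample raised an exception
            print(f"{name}: transform failed")
            continue
        print(name, list(file_list))
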

servicex.types module

Module contents

The package re-exports its most commonly used names at the top level: the General, Sample, and ServiceXSpec models from servicex.databinder_models; the ResultDestination and ResultFormat enums from servicex.models; and the deliver function from servicex.servicex_client. Their documentation is identical to the corresponding module-level entries above.