servicex package¶
Subpackages¶
- servicex.app package
- servicex.databinder_models
- servicex.func_adl package
Submodules¶
servicex.configuration module¶
- pydantic model servicex.configuration.Configuration[source]¶
Bases:
BaseModel
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Validators:
expand_cache_path
»all fields
- field default_endpoint: str | None = None (alias 'default-endpoint')¶
- field cache_path: str | None = None¶
- field shortened_downloaded_filename: bool | None = False¶
- validator expand_cache_path » all fields[source]¶
Expand the cache path to a full path, creating it if it doesn’t exist. Expands ${USER} to the user name on the system; works on Windows, too.
- classmethod read(config_path: str | None = None)[source]¶
Read configuration from a .servicex or servicex.yaml file.
- Parameters:
config_path – If provided, use this as the path to the .servicex file. Otherwise, search starting from the current working directory and look in enclosing directories.
- Returns:
Populated configuration object
- pydantic model servicex.configuration.Endpoint[source]¶
Bases:
BaseModel
- field endpoint: str [Required]¶
- field name: str [Required]¶
- field token: str | None = ''¶
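Taken together, Configuration and Endpoint describe the shape of a .servicex file. A hypothetical example is sketched below; the field names come from the models above, but the api_endpoints list key and all values are assumptions for illustration:

```yaml
# Hypothetical .servicex configuration file.
# default-endpoint, cache_path, and shortened_downloaded_filename come from
# the Configuration model; name, endpoint, and token from the Endpoint model.
# The api_endpoints key is an assumption not shown in the fields above.
default-endpoint: my-servicex
cache_path: /tmp/servicex-cache
shortened_downloaded_filename: true
api_endpoints:
  - name: my-servicex
    endpoint: https://servicex.example.org
    token: xxxx
```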
servicex.databinder_models module¶
- pydantic model servicex.databinder_models.General[source]¶
Bases:
BaseModel
Represents a group of samples to be transformed together.
- class OutputFormatEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Specifies the output format for the transform request.
- parquet = 'parquet'¶
Save the output as a parquet file https://parquet.apache.org/
- root_ttree = 'root-ttree'¶
Save the output as a ROOT TTree https://root.cern.ch/doc/master/classTTree.html
- to_ResultFormat() ResultFormat [source]¶
Converts the OutputFormatEnum enum to the ResultFormat enum, which is what is actually used for the TransformRequest. This allows the two enum classes to use different string values while maintaining backend compatibility.
- class DeliveryEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
- LocalCache = 'LocalCache'¶
Download the files to the local computer and store them in the cache. Transform requests will return paths to these files in the cache
- URLs = 'URLs'¶
Return URLs to the files stored in the ServiceX object store
- field Codegen: str | None = None¶
Code generator name to be applied across all of the samples, if applicable. Generally users don’t need to specify this. It is implied by the query class
- field OutputFormat: OutputFormatEnum = OutputFormatEnum.root_ttree¶
Output format for the transform request.
- field Delivery: DeliveryEnum = DeliveryEnum.LocalCache¶
Specifies the delivery method for the output files.
- field OutputDirectory: str | None = None¶
Directory to output a yaml file describing the output files.
- field OutFilesetName: str = 'servicex_fileset'¶
Name of the yaml file that will be created in the output directory.
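The two output-format enums intentionally use different strings for the same member (OutputFormatEnum.root_ttree is 'root-ttree', while ResultFormat.root_ttree below is 'root-file'). A self-contained sketch of how a conversion like to_ResultFormat can bridge them by member name; the standalone function here is illustrative, not the library's actual implementation:

```python
from enum import Enum


class OutputFormatEnum(str, Enum):
    parquet = "parquet"
    root_ttree = "root-ttree"


class ResultFormat(str, Enum):
    parquet = "parquet"
    root_ttree = "root-file"  # different backend string, same member name


def to_result_format(fmt: OutputFormatEnum) -> ResultFormat:
    # Look up by member *name*, so the two enums are free to
    # disagree on their string *values* for backend compatibility.
    return ResultFormat[fmt.name]
```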
- pydantic model servicex.databinder_models.Sample[source]¶
Bases:
BaseModel
Represents a single transform request within a larger submission.
- Validators:
validate_did_xor_file
»all fields
- field Name: str [Required]¶
The name of the sample. This makes it easier to identify the sample in the output.
- field Dataset: DataSetIdentifier | None = None¶
Dataset identifier for the sample
- field NFiles: int | None = None¶
Limit the number of files to be used in the sample. The DID Finder will guarantee that the same files are returned between each invocation. Set to None to use all files.
- field Query: str | QueryStringGenerator | None = None¶
Query string or query generator for the sample.
- field IgnoreLocalCache: bool = False¶
Flag to ignore local cache.
- field Codegen: str | None = None¶
Code generator name, if applicable. Generally users don’t need to specify this. It is implied by the query class
- field RucioDID: str | None = None¶
- Rucio Dataset Identifier, if applicable.
Deprecated: Use ‘Dataset’ instead.
- field XRootDFiles: str | List[str] | None = None¶
- XRootD file(s) associated with the sample.
Deprecated: Use ‘Dataset’ instead.
- property dataset_identifier: DataSetIdentifier¶
Access the dataset identifier for the sample.
- pydantic model servicex.databinder_models.ServiceXSpec[source]¶
Bases:
BaseModel
ServiceX Submission Specification - pass this into the ServiceX deliver function
- field General: General = General(Codegen=None, OutputFormat=<OutputFormatEnum.root_ttree: 'root-ttree'>, Delivery=<DeliveryEnum.LocalCache: 'LocalCache'>, OutputDirectory=None, OutFilesetName='servicex_fileset')¶
General settings for the transform request
- field Definition: List | None = None¶
Any reusable definitions that are needed for the transform request
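A hypothetical YAML spec that could be parsed into a ServiceXSpec. The General and Sample fields are documented above, but the top-level Sample list key and the concrete values are assumptions for illustration:

```yaml
# Hypothetical ServiceX submission spec.
# General fields and Sample fields are from the models documented above;
# the top-level Sample list key is an assumption.
General:
  OutputFormat: root-ttree
  Delivery: LocalCache
Sample:
  - Name: my_sample
    Dataset: !Rucio user.me:my.dataset
    Query: "..."
    NFiles: 10
```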
servicex.dataset_group module¶
- class servicex.dataset_group.DatasetGroup(datasets: List[Query])[source]¶
Bases:
object
A group of datasets that are to be transformed together. This is a convenience class to allow you to submit multiple datasets to a ServiceX instance and then wait for all of them to complete.
- Parameters:
datasets – List of transform request as dataset instances
- as_files(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException] ¶
- async as_files_async(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException] [source]¶
- as_signed_urls(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException] ¶
- async as_signed_urls_async(display_progress: bool = True, provided_progress: Progress | None = None, return_exceptions: bool = False) List[TransformedResults | BaseException] [source]¶
- set_result_format(result_format: ResultFormat)[source]¶
Set the result format for all the datasets in the group.
- Parameters:
result_format – ResultFormat instance
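The waiting behavior DatasetGroup provides can be sketched with plain asyncio: submit every dataset's work concurrently and, when return_exceptions is True, collect failures in the result list instead of raising on the first one. This is an illustrative pattern under stated assumptions, not the class's actual implementation:

```python
import asyncio
from typing import Awaitable, Callable, List


async def as_files_async(
    datasets: List[Callable[[], Awaitable]],
    return_exceptions: bool = False,
) -> list:
    # Run every dataset's coroutine concurrently; with
    # return_exceptions=True, exceptions appear in the result list
    # (mirroring the List[TransformedResults | BaseException] return type).
    return await asyncio.gather(
        *(d() for d in datasets), return_exceptions=return_exceptions
    )
```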
servicex.dataset_identifier module¶
- class servicex.dataset_identifier.CERNOpenDataDatasetIdentifier(dataset: int, num_files: int | None = None)[source]¶
Bases:
DataSetIdentifier
CERN Open Data Dataset - this will be looked up using the CERN Open Data DID finder.
- Parameters:
dataset – The dataset ID - this is an integer.
num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.
- yaml_tag = '!CERNOpenData'¶
- class servicex.dataset_identifier.DataSetIdentifier(scheme: str, dataset: str, num_files: int | None = None)[source]¶
Bases:
object
Base class for specifying the dataset to transform. This can either be a list of XRootD URIs or a Rucio DID.
- property did¶
- populate_transform_request(transform_request: TransformRequest) None [source]¶
- class servicex.dataset_identifier.FileListDataset(files: List[str] | str)[source]¶
Bases:
DataSetIdentifier
Dataset specified as a list of XRootD URIs.
- Parameters:
files – Either a list of URIs or a single URI string
- property did¶
- files: List[str]¶
- populate_transform_request(transform_request: TransformRequest) None [source]¶
- yaml_tag = '!FileList'¶
- class servicex.dataset_identifier.RucioDatasetIdentifier(dataset: str, num_files: int | None = None)[source]¶
Bases:
DataSetIdentifier
Rucio Dataset - this will be looked up using the Rucio data management service.
- Parameters:
dataset – The rucio DID - this can be a dataset or a container of datasets.
num_files – Maximum number of files to return. This is useful during development to perform quick runs. ServiceX is careful to make sure it always returns the same subset of files.
- yaml_tag = '!Rucio'¶
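A minimal sketch of the identifier hierarchy, assuming the DID string is built as "scheme://dataset". That format and the simplified constructors are assumptions for illustration; the real classes also populate the TransformRequest:

```python
from typing import Optional


class DataSetIdentifier:
    """Base class pairing a DID-finder scheme with a dataset name."""

    def __init__(self, scheme: str, dataset: str,
                 num_files: Optional[int] = None):
        self.scheme = scheme
        self.dataset = dataset
        self.num_files = num_files  # optional cap on files returned

    @property
    def did(self) -> str:
        # Assumed "<scheme>://<dataset>" format.
        return f"{self.scheme}://{self.dataset}"


class RucioDatasetIdentifier(DataSetIdentifier):
    def __init__(self, dataset: str, num_files: Optional[int] = None):
        super().__init__("rucio", dataset, num_files)
```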
servicex.expandable_progress module¶
- class servicex.expandable_progress.ExpandableProgress(display_progress: bool = True, provided_progress: Progress | ExpandableProgress | None = None, overall_progress: bool = False)[source]¶
Bases:
object
We want to be able to use rich progress bars in the async code, but there are some situations where the user doesn’t want them. Also, we might be running several simultaneous progress bars, and we want to be able to control that.
We still want to keep the context manager interface, so this class implements the context manager, but if display_progress is False it does nothing. If provided_progress is set, we just use that; otherwise we create a new progress bar.
- Parameters:
display_progress
provided_progress
- class servicex.expandable_progress.ProgressCounts(description: str, task_id: TaskID, start: int | None = None, total: int | None = None, completed: int | None = None)[source]¶
Bases:
object
- class servicex.expandable_progress.TranformStatusProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)[source]¶
Bases:
Progress
servicex.minio_adapter module¶
- class servicex.minio_adapter.MinioAdapter(endpoint_host: str, secure: bool, access_key: str, secret_key: str, bucket: str)[source]¶
Bases:
object
- MAX_PATH_LEN = 60¶
- async download_file(object_name: str, local_dir: str, shorten_filename: bool = False) Path [source]¶
- classmethod for_transform(transform: TransformStatus)[source]¶
- classmethod hash_path(file_name)[source]¶
Make the path safe for an object store or POSIX by keeping the length under MAX_PATH_LEN. Replaces the leading (less interesting) characters with a forty-character hash.
- Parameters:
file_name – Input filename
- Returns:
Safe path string
- async list_bucket() List[ResultFile] [source]¶
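The hash_path idea above can be sketched as follows: when a name exceeds MAX_PATH_LEN, replace its leading characters with a forty-character digest so the (more interesting) tail survives. The choice of SHA-1 here is an assumption for illustration; only the documented length behavior is taken from the description above:

```python
import hashlib

MAX_PATH_LEN = 60


def hash_path(file_name: str) -> str:
    """Keep file names at or under MAX_PATH_LEN characters.

    Long names get a 40-character hex digest (SHA-1 is assumed here)
    in place of their leading characters, preserving the tail, which
    usually carries the extension and the distinguishing suffix.
    """
    if len(file_name) <= MAX_PATH_LEN:
        return file_name
    keep = MAX_PATH_LEN - 40  # room left after the 40-char digest
    digest = hashlib.sha1(file_name.encode()).hexdigest()  # 40 hex chars
    return digest + file_name[-keep:]
```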
servicex.models module¶
- class servicex.models.ResultDestination(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Direct the output to an object store or a POSIX volume
- object_store = 'object-store'¶
- volume = 'volume'¶
- pydantic model servicex.models.ResultFile[source]¶
Bases:
BaseModel
Record reporting the properties of a transformed file result
- field filename: str [Required]¶
- field size: int [Required]¶
- field extension: str [Required]¶
- class servicex.models.ResultFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Specify the file format for the generated output
- parquet = 'parquet'¶
- root_ttree = 'root-file'¶
- class servicex.models.Status(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Status of a submitted transform
- canceled = 'Canceled'¶
- complete = 'Complete'¶
- fatal = 'Fatal'¶
- looking = 'Lookup'¶
- pending = 'Pending Lookup'¶
- running = 'Running'¶
- submitted = 'Submitted'¶
- pydantic model servicex.models.TransformRequest[source]¶
Bases:
BaseModel
Transform request sent to ServiceX
- field title: str | None = None¶
- field did: str | None = None¶
- field file_list: List[str] | None = None (alias 'file-list')¶
- field selection: str [Required]¶
- field image: str | None = None¶
- field codegen: str [Required]¶
- field tree_name: str | None = None (alias 'tree-name')¶
- field result_destination: ResultDestination [Required]¶
- field result_format: ResultFormat [Required]¶
- pydantic model servicex.models.TransformStatus[source]¶
Bases:
BaseModel
Status object returned by servicex
- Validators:
parse_finish_time
»finish_time
- field request_id: str [Required]¶
- field did: str [Required]¶
- field title: str | None = None¶
- field selection: str [Required]¶
- field tree_name: str | None [Required]¶
- field image: str [Required]¶
- field result_destination: ResultDestination [Required]¶
- field result_format: ResultFormat [Required]¶
- field generated_code_cm: str [Required]¶
- field app_version: str [Required]¶
- field files: int [Required]¶
- field files_completed: int [Required]¶
- field files_failed: int [Required]¶
- field files_remaining: int | None = 0¶
- field submit_time: datetime = None¶
- field finish_time: datetime | None = None¶
- field minio_endpoint: str | None = None¶
- field minio_secured: bool | None = None¶
- field minio_access_key: str | None = None¶
- field minio_secret_key: str | None = None¶
- field log_url: str | None = None¶
- validator parse_finish_time » finish_time[source]¶
- pydantic model servicex.models.TransformedResults[source]¶
Bases:
BaseModel
Returned for a submission. Gives you everything you need to know about a completed transform.
- field hash: str [Required]¶
- field title: str [Required]¶
- field codegen: str [Required]¶
- field request_id: str [Required]¶
- field submit_time: datetime [Required]¶
- field data_dir: str [Required]¶
- field file_list: List[str] [Required]¶
- field signed_url_list: List[str] [Required]¶
- field files: int [Required]¶
- field result_format: ResultFormat [Required]¶
- field log_url: str | None = None¶
servicex.python_dataset module¶
servicex.query module¶
servicex.query_cache module¶
- class servicex.query_cache.QueryCache(config: Configuration)[source]¶
Bases:
object
- cache_path_for_transform(transform_status: TransformStatus) Path [source]¶
- cache_transform(record: TransformedResults)[source]¶
- cached_queries() List[TransformedResults] [source]¶
- get_transform_by_hash(hash: str) TransformedResults | None [source]¶
- get_transform_by_request_id(request_id: str) TransformedResults | None [source]¶
- transformed_results(transform: TransformRequest, completed_status: TransformStatus, data_dir: str, file_list: List[str], signed_urls) TransformedResults [source]¶
- update_record(record: TransformedResults)[source]¶
servicex.servicex_adapter module¶
- class servicex.servicex_adapter.ServiceXAdapter(url: str, refresh_token: str | None = None)[source]¶
Bases:
object
- async get_transform_status(request_id: str) TransformStatus [source]¶
- async get_transforms() List[TransformStatus] [source]¶
- async submit_transform(transform_request: TransformRequest)[source]¶
servicex.servicex_client module¶
- exception servicex.servicex_client.ReturnValueException(exc)[source]¶
Bases:
Exception
An exception occurred at some point while obtaining this result from ServiceX
- class servicex.servicex_client.ServiceXClient(backend=None, url=None, config_path=None)[source]¶
Bases:
object
Connection to a ServiceX deployment. Instances of this class can retrieve data about the deployment from the service and also interact with previously run transformations. Instances of this class are factories for Datasets.
If both backend and url are unspecified then it will attempt to pick up the default backend from .servicex
- Parameters:
backend – Name of a deployment from the .servicex file
url – Direct URL of a ServiceX deployment instead of using .servicex. Only works with hosts without auth, or when the token is found in a file pointed to by the environment variable BEARER_TOKEN_FILE
config_path – Optional path to the .servicex file. If not specified, will search in the local directory and up in enclosing directories
- generic_query(dataset_identifier: DataSetIdentifier | FileListDataset, query: str | QueryStringGenerator, codegen: str | None = None, title: str = 'ServiceX Client', result_format: ResultFormat = ResultFormat.parquet, ignore_cache: bool = False) Query [source]¶
Generate a Query object for a generic codegen specification
- Parameters:
dataset_identifier – The dataset identifier or filelist to be the source of files
title – Title to be applied to the transform. This is also useful for relating transform results.
codegen – Name of the code generator to use with this transform
result_format – Do you want Parquet or ROOT? This can be set later with the set_result_format method
ignore_cache – Ignore the query cache and always run the query
- Returns:
A Query object
- get_code_generators(backend=None)[source]¶
Retrieve the code generators deployed with the ServiceX instance.
- Returns:
The list of code generators as a JSON dictionary
- get_transform_status(transform_id) TransformStatus ¶
Get the status of a given transform.
- Parameters:
transform_id – The uuid of the transform
- Returns:
The current status for the transform
- async get_transform_status_async(transform_id) TransformStatus [source]¶
Get the status of a given transform.
- Parameters:
transform_id – The uuid of the transform
- Returns:
The current status for the transform
- get_transforms() List[TransformStatus] ¶
Retrieve all transforms you have run on the server.
- Returns:
List of TransformStatus objects
- async get_transforms_async() List[TransformStatus] [source]¶
Retrieve all transforms you have run on the server.
- Returns:
List of TransformStatus objects
- servicex.servicex_client.deliver(config: ServiceXSpec | Mapping[str, Any] | str | Path, config_path: str | None = None, servicex_name: str | None = None, return_exceptions: bool = True)[source]¶
servicex.types module¶
Module contents¶
- pydantic model servicex.General[source]¶
Bases:
BaseModel
Represents a group of samples to be transformed together.
- class OutputFormatEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Specifies the output format for the transform request.
- parquet = 'parquet'¶
Save the output as a parquet file https://parquet.apache.org/
- root_ttree = 'root-ttree'¶
Save the output as a ROOT TTree https://root.cern.ch/doc/master/classTTree.html
- to_ResultFormat() ResultFormat [source]¶
Converts the OutputFormatEnum enum to the ResultFormat enum, which is what is actually used for the TransformRequest. This allows the two enum classes to use different string values while maintaining backend compatibility.
- class DeliveryEnum(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
- LocalCache = 'LocalCache'¶
Download the files to the local computer and store them in the cache. Transform requests will return paths to these files in the cache
- URLs = 'URLs'¶
Return URLs to the files stored in the ServiceX object store
- field Codegen: str | None = None¶
Code generator name to be applied across all of the samples, if applicable. Generally users don’t need to specify this. It is implied by the query class
- field OutputFormat: OutputFormatEnum = OutputFormatEnum.root_ttree¶
Output format for the transform request.
- field Delivery: DeliveryEnum = DeliveryEnum.LocalCache¶
Specifies the delivery method for the output files.
- field OutputDirectory: str | None = None¶
Directory to output a yaml file describing the output files.
- field OutFilesetName: str = 'servicex_fileset'¶
Name of the yaml file that will be created in the output directory.
- class servicex.ResultDestination(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Direct the output to an object store or a POSIX volume
- object_store = 'object-store'¶
- volume = 'volume'¶
- class servicex.ResultFormat(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
str
,Enum
Specify the file format for the generated output
- parquet = 'parquet'¶
- root_ttree = 'root-file'¶
- pydantic model servicex.Sample[source]¶
Bases:
BaseModel
Represents a single transform request within a larger submission.
- Validators:
validate_did_xor_file
»all fields
- field Name: str [Required]¶
The name of the sample. This makes it easier to identify the sample in the output.
- field Dataset: DataSetIdentifier | None = None¶
Dataset identifier for the sample
- field NFiles: int | None = None¶
Limit the number of files to be used in the sample. The DID Finder will guarantee that the same files are returned between each invocation. Set to None to use all files.
- field Query: str | QueryStringGenerator | None = None¶
Query string or query generator for the sample.
- field IgnoreLocalCache: bool = False¶
Flag to ignore local cache.
- field Codegen: str | None = None¶
Code generator name, if applicable. Generally users don’t need to specify this. It is implied by the query class
- field RucioDID: str | None = None¶
- Rucio Dataset Identifier, if applicable.
Deprecated: Use ‘Dataset’ instead.
- field XRootDFiles: str | List[str] | None = None¶
- XRootD file(s) associated with the sample.
Deprecated: Use ‘Dataset’ instead.
- property dataset_identifier: DataSetIdentifier¶
Access the dataset identifier for the sample.
- pydantic model servicex.ServiceXSpec[source]¶
Bases:
BaseModel
ServiceX Submission Specification - pass this into the ServiceX deliver function
- field General: General = General(Codegen=None, OutputFormat=<OutputFormatEnum.root_ttree: 'root-ttree'>, Delivery=<DeliveryEnum.LocalCache: 'LocalCache'>, OutputDirectory=None, OutFilesetName='servicex_fileset')¶
General settings for the transform request
- field Definition: List | None = None¶
Any reusable definitions that are needed for the transform request
- servicex.deliver(config: ServiceXSpec | Mapping[str, Any] | str | Path, config_path: str | None = None, servicex_name: str | None = None, return_exceptions: bool = True)[source]¶