Code documentation

Note

This section provides API documentation for the esgprep Python modules.

esgdrs

Main module

platform:

Unix

synopsis:

Toolbox to prepare ESGF data for publication.

esgprep.esgdrs.get_args()[source]

Returns parsed command-line arguments.

esgprep.esgdrs.main()[source]

Runs the main program.

Submodules

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

esgprep.drs.run(args)[source]

Main process.

platform:

Unix

synopsis:

Constants used in this module.

platform:

Unix

synopsis:

Processing context used in this module.

class esgprep.drs.context.ProcessingContext(args)[source]

Processing context class to drive main process.

check_commands_file()[source]

Checks commands file behavior.

platform:

Unix

synopsis:

Manages the filesystem tree according to the project's Data Reference Syntax and versioning.

class esgprep.drs.make.Process(ctx)[source]

Child process.

platform:

Unix

synopsis:

Manages the filesystem tree according to the project's Data Reference Syntax and versioning.

class esgprep.drs.remove.Process(ctx)[source]

Child process.

platform:

Unix

synopsis:

Manages the filesystem tree according to the project's Data Reference Syntax and versioning.

class esgprep.drs.latest.Process(ctx)[source]

Child process.

esgmapfile

Main module

platform:

Unix

synopsis:

Toolbox to prepare ESGF data for publication.

esgprep.esgmapfile.get_args()[source]

Returns parsed command-line arguments.

esgprep.esgmapfile.main()[source]

Runs the main program.

Submodules

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

platform:

Unix

synopsis:

Constants used in this module.

platform:

Unix

synopsis:

Processing context used in this module.

class esgprep.mapfile.context.ProcessingContext(args)[source]

Processing context class to drive main process.

clean()[source]

Cleans the directory of incomplete mapfiles. Incomplete mapfiles from a previous run are silently removed.

platform:

Unix

synopsis:

Generates ESGF mapfiles, with or without a local ESGF node.

class esgprep.mapfile.make.Process(ctx)[source]

Child process.

platform:

Unix

synopsis:

Shows the name of the mapfile to be generated.

class esgprep.mapfile.show.Process(ctx)[source]

Child process.

Utilities

Collectors

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._collectors.Collector(sources)[source]

Base collector class to yield input sources.

class esgprep._collectors.FilterCollection[source]

Evaluates a string against a dictionary of regular expressions. Each dictionary value is a 2-tuple holding the regular expression as a string and a boolean that indicates whether the string must match (i.e., include) or must not match (i.e., exclude) that expression.

FILTER_TYPES = (<class 'str'>, typing.Pattern)
add(name=None, regex='*', inclusive=True)[source]
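
The matching rule described for FilterCollection can be sketched as follows. This is a minimal illustration, assuming a string passes only when every filter is satisfied; the passes helper and the filter names are hypothetical and are not part of esgprep.

```python
import re


def passes(string, filters):
    """Return True if `string` satisfies every (regex, inclusive) filter."""
    for regex, inclusive in filters.values():
        matched = re.search(regex, string) is not None
        # An inclusive filter must match; an exclusive filter must not.
        if matched != inclusive:
            return False
    return True


# Hypothetical filters: name -> (regular expression, inclusive flag)
filters = {
    "netcdf_only": (r"\.nc$", True),   # include names ending in .nc
    "no_hidden": (r"^\.", False),      # exclude names starting with a dot
}

kept = passes("tas_Amon.nc", filters)
hidden = passes(".tas_Amon.nc", filters)
wrong_ext = passes("tas_Amon.txt", filters)
print(kept, hidden, wrong_ext)
```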

Contexts

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._contexts.BaseContext(args)[source]

Base class for processing context manager.

set(key, default=False)[source]

Exceptions

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

exception esgprep._exceptions.KeyNotFound(key, keys=None)[source]

Raised when a class key is not found.

exception esgprep._exceptions.InvalidChecksumType(client)[source]

Raised when the checksum type is unknown.

exception esgprep._exceptions.ChecksumFail(path, checksum_type=None)[source]

Raised when a checksum fails.

exception esgprep._exceptions.NoFileFound(paths)[source]

Raised when no file is found.

exception esgprep._exceptions.DuplicatedDataset(path, version)[source]

Raised if a dataset already exists with submitted version.

exception esgprep._exceptions.OlderUpgrade(version, latest)[source]

Raised if the submitted version is older than the latest existing version.

exception esgprep._exceptions.DuplicatedFile(latest, upgrade)[source]

Raised if a NetCDF file already exists in the submitted dataset version.

exception esgprep._exceptions.UnchangedTrackingID(latest, latest_id, upgrade, upgrade_id)[source]

Raised if a NetCDF file already has the tracking ID of submitted file to upgrade.

exception esgprep._exceptions.NoVersionPattern(regex, patterns)[source]

Raised if no version facet found in the destination format.

exception esgprep._exceptions.InconsistentDRSPath(project, path)[source]

Raised when DRS path doesn’t start with the project ID.

exception esgprep._exceptions.NoProjectCodeFound(val)[source]

Raised when no project code can be found in, or extracted from, the DRS path or file.

exception esgprep._exceptions.MissingCVdata(authority, project)[source]

Raised when CV data is missing for an authority/project.

Handlers

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

Utils

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

esgprep._utils.checksum.multihash(data: bytes, algo: str) → bytes[source]

Generate a multihash for the given data using the specified algorithm.

Args:

data: The data to hash
algo: The multihash algorithm name (e.g., “sha2-256”)

Returns:

The multihash as bytes (code + length + digest)

Raises:

ValueError: If the algorithm is not supported
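
The "code + length + digest" layout can be illustrated with hashlib alone. The sketch below is not the esgprep implementation; multihash_sketch and MULTIHASH_CODES are hypothetical names, and only two algorithms from the multiformats table are included.

```python
import hashlib

# Function codes from the multiformats multihash table.
MULTIHASH_CODES = {
    "sha2-256": (0x12, hashlib.sha256),
    "sha2-512": (0x13, hashlib.sha512),
}


def multihash_sketch(data: bytes, algo: str) -> bytes:
    """Return <code><digest length><digest> for a supported algorithm."""
    try:
        code, hasher = MULTIHASH_CODES[algo]
    except KeyError:
        raise ValueError(f"Unsupported multihash algorithm: {algo}")
    digest = hasher(data).digest()
    # Code and length are both below 128 here, so each fits in one varint byte.
    return bytes([code, len(digest)]) + digest


mh = multihash_sketch(b"hello", "sha2-256")
print(mh.hex()[:4])  # prefix "1220": code 0x12, length 0x20 (32 bytes)
```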

esgprep._utils.checksum.multihash_hex(data: bytes, algo: str) → str[source]

Generate a multihash for the given data and return it as a hex string.

Args:

data: The data to hash
algo: The multihash algorithm name (e.g., “sha2-256”)

Returns:

The multihash as a hexadecimal string

esgprep._utils.checksum.detect_multihash_algo(hash_hex: str) → str[source]

Detect the multihash algorithm from a multihash hex string.

Args:

hash_hex: The multihash as a hexadecimal string

Returns:

The algorithm name (e.g., “sha2-256”) or None if not a valid multihash
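
One way such a detection could work is to read the two-byte prefix and check it against known codes. The sketch below assumes only sha2-256 and sha2-512 are recognised; detect_algo_sketch and KNOWN_PREFIXES are hypothetical names, not esgprep identifiers.

```python
from typing import Optional

# (function code, digest length in bytes) -> algorithm name
KNOWN_PREFIXES = {
    (0x12, 32): "sha2-256",
    (0x13, 64): "sha2-512",
}


def detect_algo_sketch(hash_hex: str) -> Optional[str]:
    """Return the algorithm name, or None if the string is not a known multihash."""
    try:
        raw = bytes.fromhex(hash_hex)
    except ValueError:
        return None  # not valid hexadecimal
    if len(raw) < 2:
        return None
    code, length = raw[0], raw[1]
    # The declared length must equal the number of digest bytes that follow.
    if len(raw) - 2 != length:
        return None
    return KNOWN_PREFIXES.get((code, length))


print(detect_algo_sketch("1220" + "00" * 32))
```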

esgprep._utils.checksum.is_multihash_algo(checksum_type: str) → bool[source]

Check if a checksum type is a multihash algorithm.

Args:

checksum_type: The checksum type to check

Returns:

True if it’s a multihash algorithm, False otherwise

esgprep._utils.checksum.checksum(ffp, checksum_type, include_filename=False, human_readable=True)[source]

Computes a file checksum. Supports both standard hashlib algorithms and multihash algorithms.
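
For the standard hashlib case, a file checksum is usually computed in chunks so that large files are never loaded into memory at once. The sketch below shows that pattern under the assumption that checksum_type is a name hashlib.new accepts; file_checksum_sketch and its chunk_size parameter are hypothetical.

```python
import hashlib
import os
import tempfile


def file_checksum_sketch(path, checksum_type="sha256", chunk_size=1 << 20):
    """Hash a file chunk by chunk and return the hexadecimal digest."""
    hasher = hashlib.new(checksum_type)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
digest = file_checksum_sketch(tmp.name)
os.unlink(tmp.name)
print(digest)
```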

esgprep._utils.checksum.get_checksum_pattern(checksum_type)[source]

Builds a regular expression describing a checksum pattern.

esgprep._utils.checksum.get_checksum(ffp, checksum_type='sha256', checksums=None)[source]

Global method to get a file checksum, either:

1. By computing the checksum directly.
2. By looking it up in a dictionary of precomputed checksums, {file: checksum}.
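
The two routes can be sketched as a lookup with a computed fallback. This is an assumption about the order of precedence (precomputed value first), and get_checksum_sketch is a hypothetical name.

```python
import hashlib
import os
import tempfile


def get_checksum_sketch(path, checksum_type="sha256", checksums=None):
    """Return a precomputed checksum if available, else compute it."""
    # Assumption: a supplied {file: checksum} entry takes precedence.
    if checksums and path in checksums:
        return checksums[path]
    with open(path, "rb") as handle:
        return hashlib.new(checksum_type, handle.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.nc")
    with open(sample, "wb") as handle:
        handle.write(b"hello")
    computed = get_checksum_sketch(sample)
    looked_up = get_checksum_sketch(sample, checksums={sample: "abc123"})
print(computed, looked_up)
```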

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>


class esgprep._utils.ncfile.ncopen(path: str, mode: str = 'r')[source]

Opens a netCDF file.

esgprep._utils.ncfile.get_ncattrs(path: str) → dict[source]

Loads netCDF global attributes from a pathlib.Path as a dictionary. Ignores attributes that contain only whitespace.

esgprep._utils.ncfile.get_tracking_id(attrs: dict) → str[source]

Get tracking_id/PID string from netCDF global attributes.

esgprep._utils.ncfile.is_valid(identifier: str, project: str) → bool[source]

Validates a tracking_id/PID string.

esgprep._utils.ncfile.is_uuid(uuid_string, version=4)[source]

Validates a UUID.
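
A common way to validate a UUID of a given version with the standard library is to parse it while forcing the version bits, then compare the result with the input. The sketch below shows that idiom; is_uuid_sketch is a hypothetical name and the actual esgprep check may differ.

```python
import uuid


def is_uuid_sketch(uuid_string, version=4):
    """Return True if `uuid_string` is a canonical UUID of the given version."""
    try:
        # uuid.UUID overwrites the version bits when `version` is given, so a
        # string of another version no longer round-trips to itself.
        parsed = uuid.UUID(uuid_string, version=version)
    except (ValueError, AttributeError, TypeError):
        return False
    return str(parsed) == uuid_string.lower()


print(is_uuid_sketch(str(uuid.uuid4())))
print(is_uuid_sketch("not-a-uuid"))
```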

esgprep._utils.ncfile.get_project(attrs: str | dict) → str | None[source]

Extract project code from the file attributes.

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._utils.parser.CustomArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)[source]

Custom argument parser class.

error(message: string)[source]

Prints a usage message incorporating the message to stderr and exits.

If you override this in a subclass, it should not return – it should either exit or raise an exception.

class esgprep._utils.parser.MultilineFormatter(prog, default_columns=120)[source]

Custom formatter class.

add_arguments(actions)[source]

Sort optional arguments alphabetically while keeping positional arguments first.

class esgprep._utils.parser.DirectoryChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Action class to check a directory.

static directory_checker(path)[source]

Verify a directory exists.
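
An argparse action that checks a directory can be sketched as below. The class DirectoryCheckerSketch, its check method, and the --root option are hypothetical; the example only illustrates the general pattern of a custom action with a static validation helper.

```python
import argparse
import os
import tempfile


class DirectoryCheckerSketch(argparse.Action):
    """Store the absolute path of an existing directory."""

    def __call__(self, parser, namespace, values, option_string=None):
        try:
            checked = self.check(values)
        except NotADirectoryError as exc:
            # parser.error prints the usage message and exits with status 2.
            parser.error(str(exc))
        setattr(namespace, self.dest, checked)

    @staticmethod
    def check(path):
        path = os.path.abspath(os.path.normpath(path))
        if not os.path.isdir(path):
            raise NotADirectoryError(f"No such directory: {path}")
        return path


parser = argparse.ArgumentParser()
parser.add_argument("--root", action=DirectoryCheckerSketch)
args = parser.parse_args(["--root", tempfile.gettempdir()])
print(args.root)
```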

class esgprep._utils.parser.ConfigFileLoader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Configuration file action class.

static load(path)[source]

Loads configuration file parser.

class esgprep._utils.parser.ChecksumsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Action class to read a checksum file similar to any checksum client output. Returns a dictionary whose (key: value) pairs are the file path and its checksum.

static read(path)[source]

Reads checksum list.
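
As an illustration of the {file path: checksum} mapping, the sketch below parses lines in the "<checksum>  <path>" layout printed by tools such as sha256sum. The exact file format esgprep accepts is not stated here, so the layout is an assumption and read_checksums_sketch is a hypothetical name.

```python
def read_checksums_sketch(lines):
    """Map each file path to its checksum from '<checksum>  <path>' lines."""
    checksums = {}
    for line in lines:
        # Split on the first run of whitespace only, so paths may contain spaces.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        checksum, path = parts
        checksums[path] = checksum
    return checksums


sample_lines = [
    "2cf24dba5fb0a30e  /data/tas.nc\n",
    "\n",
    "486ea46224d1bb4f  /data/my file.nc\n",
]
table = read_checksums_sketch(sample_lines)
print(table)
```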

class esgprep._utils.parser.DatasetsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Action class to read a dataset identifier list from a simple text file. Returns a list of identifiers.

static read(path)[source]

Reads the dataset identifier list.

class esgprep._utils.parser.VersionChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Custom action class.

static version_checker(version)[source]

Validates version number.

esgprep._utils.parser.keyval_converter(pair)[source]

Validates (key = value) argument format.
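
A converter of this kind is typically passed to argparse as a type callable. The sketch below assumes the pair is returned as a (key, value) tuple; keyval_sketch and the --set option are hypothetical.

```python
import argparse
import re


def keyval_sketch(pair):
    """Split 'key=value' into a (key, value) tuple or reject the argument."""
    match = re.fullmatch(r"\s*([^=\s]+)\s*=\s*(.+?)\s*", pair)
    if match is None:
        # argparse turns ArgumentTypeError from a type callable into a usage error.
        raise argparse.ArgumentTypeError(
            f"Bad argument syntax: {pair!r}, expected key=value"
        )
    return match.group(1), match.group(2)


parser = argparse.ArgumentParser()
parser.add_argument("--set", type=keyval_sketch, action="append")
args = parser.parse_args(["--set", "project=CMIP6", "--set", "frequency = mon"])
print(args.set)
```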

esgprep._utils.parser.regex_validator(string)[source]

Validates a regular expression syntax.
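
Regular expression syntax can be validated by attempting to compile the pattern. The sketch below shows that approach as an argparse type callable; regex_sketch is a hypothetical name.

```python
import argparse
import re


def regex_sketch(string):
    """Return the pattern unchanged if it compiles, else reject the argument."""
    try:
        re.compile(string)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"Bad regex syntax: {string!r} ({exc})")
    return string


valid = regex_sketch(r"^tas_.*\.nc$")
print(valid)
```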

esgprep._utils.parser.processes_validator(value)[source]

Validates the maximum number of processes.

esgprep._utils.path.extract_version(path: Path) → str[source]

Extracts the version string (vXXXXXXXX) from the given path. Raises a ValueError if no valid version is found.
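
The behaviour can be sketched with a regular expression applied to each path part. The pattern below assumes that vXXXXXXXX means the letter "v" followed by exactly eight digits; extract_version_sketch and VERSION_RE are hypothetical names, and the example path is invented.

```python
import re
from pathlib import PurePosixPath

# Assumption: a version part is "v" followed by eight digits, e.g. v20190101.
VERSION_RE = re.compile(r"v\d{8}")


def extract_version_sketch(path):
    """Return the first path part that looks like a version."""
    for part in path.parts:
        if VERSION_RE.fullmatch(part):
            return part
    raise ValueError(f"No version found in {path}")


version = extract_version_sketch(PurePosixPath("/data/CMIP6/tas/v20190101/tas.nc"))
print(version)
```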

esgprep._utils.path.get_version_index(path: Path) → int[source]

Returns the index position of the version part (vXXXXXXXX) in the path parts.

esgprep._utils.path.get_version_and_subpath(path: Path) → list[str][source]

Returns a list of path parts from the version part to the end of the path.

esgprep._utils.path.get_path_to_version(path: Path) → list[str][source]

Returns a list of path parts from the start part to the version of the path.
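
The index and the two slices of path parts relate to each other as in the sketch below. It assumes the version part is "v" plus eight digits and that both slices include the version part itself; the function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

VERSION_RE = re.compile(r"v\d{8}")  # assumption: "v" followed by eight digits


def version_index_sketch(path):
    """Return the position of the version part within path.parts."""
    for index, part in enumerate(path.parts):
        if VERSION_RE.fullmatch(part):
            return index
    raise ValueError(f"No version found in {path}")


def split_at_version_sketch(path):
    """Return (parts up to the version, parts from the version onwards)."""
    index = version_index_sketch(path)
    parts = list(path.parts)
    # Assumption: the version part belongs to both slices.
    return parts[: index + 1], parts[index:]


example = PurePosixPath("/data/CMIP6/tas/v20190101/tas.nc")
index = version_index_sketch(example)
head, tail = split_at_version_sketch(example)
print(index, head, tail)
```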

esgprep._utils.path.get_ordered_version_paths(base_path: Path) → list[Path][source]

Returns a list of all “version directory” paths in the base_path directory, ordered by version, excluding the ‘latest’ symlink.
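
Listing version directories in order while skipping the 'latest' symlink can be sketched as follows. The example assumes eight-digit versions, which sort chronologically as plain strings; ordered_versions_sketch is a hypothetical name and the directory layout is created only for the demonstration.

```python
import re
import tempfile
from pathlib import Path

VERSION_RE = re.compile(r"v\d{8}")  # assumption: "v" followed by eight digits


def ordered_versions_sketch(base_path):
    """Return version directories under base_path, oldest first."""
    versions = [
        child
        for child in base_path.iterdir()
        if child.is_dir()
        and not child.is_symlink()          # skips the 'latest' symlink
        and VERSION_RE.fullmatch(child.name)
    ]
    # Fixed-width vYYYYMMDD names sort correctly as strings.
    return sorted(versions, key=lambda child: child.name)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ("v20200101", "v20190101", "files"):
        (base / name).mkdir()
    (base / "latest").symlink_to("v20200101")
    names = [p.name for p in ordered_versions_sketch(base)]
print(names)
```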

esgprep._utils.path.get_ordered_file_version_paths(base_path: Path, file_name: str)[source]
esgprep._utils.path.get_versions(path: Path) → list[Path][source]

Returns a list of all version directory paths for the given path, ordered by version. This is used to find all existing versions of a dataset.

esgprep._utils.path.get_drs(path: Path) → Path[source]

Returns the DRS (Data Reference Syntax) part of the path. This returns the path up to but not including the version.

Check if the path contains ‘latest’ and is a symlink.

esgprep._utils.path.with_latest_target(path: Path) → Path[source]

If path is a ‘latest’ symlink, return the target path. Otherwise return the original path.
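
Resolving a 'latest' symlink to its target can be sketched with pathlib and os.readlink. The example assumes the symlink is the final path component and is named exactly "latest"; with_latest_target_sketch is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def with_latest_target_sketch(path):
    """Return the symlink target for a 'latest' symlink, else the path itself."""
    if path.name == "latest" and path.is_symlink():
        # A relative target is interpreted relative to the symlink's directory.
        return path.parent / os.readlink(path)
    return path


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "v20200101").mkdir()
    (base / "latest").symlink_to("v20200101")
    resolved_name = with_latest_target_sketch(base / "latest").name
    unchanged_name = with_latest_target_sketch(base / "v20200101").name
print(resolved_name, unchanged_name)
```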

esgprep._utils.path.get_project(path) → str | None[source]

Extract project code from a pathlib.Path object.

esgprep._utils.path.get_terms(path: Path) → dict[source]

Extract DRS terms from NetCDF file global attributes. Returns a dictionary of DRS terms for the given path.

esgprep._utils.path.dataset_id(path: Path) → str | None[source]

Build dataset identifier from DRS path structure using esgvoc DrsGenerator. Returns the dataset identifier string for the given path.

Extracts terms from the directory path parts (between DRS root and version) and uses the DrsGenerator to build a valid dataset ID.

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._utils.print.COLOR(color=None)[source]

Defines a color object for print statements. The default is no color (i.e., restore the original color).

PALETTE = {'blue': 34, 'cyan': 36, 'gray': 37, 'green': 32, 'light blue': 94, 'light cyan': 96, 'light gray': 97, 'light green': 92, 'light magenta': 95, 'light red': 91, 'light yellow': 93, 'magenta': 35, 'red': 31, 'yellow': 33}
RESTORE = '\x1b[0m'
COLORS = False
bold(msg=None)[source]
italic(msg=None)[source]
underline(msg=None)[source]
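
The PALETTE values are ANSI SGR color codes and RESTORE is the ANSI reset sequence. The sketch below shows how such codes wrap a message; colorize is a hypothetical helper, not a method of the COLOR class, and only a subset of the palette is copied.

```python
# Subset of the documented PALETTE (ANSI SGR foreground codes).
PALETTE = {"red": 31, "green": 32, "yellow": 33}
RESTORE = "\x1b[0m"


def colorize(msg, color=None, bold=False):
    """Wrap msg in ANSI escape sequences; return it unchanged if unstyled."""
    codes = []
    if bold:
        codes.append("1")  # SGR code 1 selects bold
    if color is not None:
        codes.append(str(PALETTE[color]))
    if not codes:
        return msg
    return f"\x1b[{';'.join(codes)}m{msg}{RESTORE}"


print(colorize("success", "green"))
print(colorize("failure", "red", bold=True))
```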
class esgprep._utils.print.COLORS[source]

Preset color statements depending on the status.

static OKBLUE(msg)[source]
static HEADER(msg)[source]
static SUCCESS(msg)[source]
static FAIL(msg)[source]
static INFO(msg)[source]
static WARNING(msg)[source]
static ERROR(msg)[source]
static DEBUG(msg)[source]
class esgprep._utils.print.Print[source]

Class to manage and dispatch print statements depending on the log and debug modes.

LOG: str | None = None
DEBUG: bool = False
CMD: str | None = None
LOG_TO_STDOUT: bool = False
LOGFILE: str | None = None
CARRIAGE_RETURNED: bool = True
BUFFER = <Synchronized wrapper for c_wchar_p(140057509584672)>
static init(log, debug, cmd)[source]
static check_carriage_return(msg)[source]
static print_to_stdout(msg)[source]
static print_to_logfile(msg)[source]
static progress(msg)[source]
static command(msg=None)[source]
static log(msg=None)[source]
static summary(msg)[source]
static info(msg)[source]
static debug(msg)[source]
static warning(msg)[source]
static error(msg, buffer=False)[source]
static success(msg, buffer=False)[source]
static result(msg, buffer=False)[source]
static exception(msg, buffer=False)[source]
static flush()[source]
static enable_colors()[source]
static disable_colors()[source]

Module author: Levavasseur Guillaume (CNRS/IPSL) <glipsl@ipsl.fr>