Code documentation

Note

This section provides API documentation for the esgprep Python modules.

esgdrs

Main module

platform:

Unix

synopsis:

Toolbox to prepare ESGF data for publication.

esgprep.esgdrs.get_args()

Returns parsed command-line arguments.

esgprep.esgdrs.main()

Runs the main program.

Submodules

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

esgprep.drs.run(args)

Main process.

platform:

Unix

synopsis:

Constants used in this module.

platform:

Unix

synopsis:

Processing context used in this module.

class esgprep.drs.context.ProcessingContext(args)

Processing context class to drive main process.

check_commands_file()

Checks the behavior of the commands file.

platform:

Unix

synopsis:

Manages the filesystem tree according to the project's Data Reference Syntax (DRS) and versioning.

class esgprep.drs.make.Process(ctx)

Child process.

platform:

Unix

synopsis:

Manages the filesystem tree according to the project's Data Reference Syntax (DRS) and versioning.

class esgprep.drs.remove.Process(ctx)

Child process.

platform:

Unix

synopsis:

Manages the filesystem tree according to the project's Data Reference Syntax (DRS) and versioning.

class esgprep.drs.latest.Process(ctx)

Child process.

esgmapfile

Main module

platform:

Unix

synopsis:

Toolbox to prepare ESGF data for publication.

esgprep.esgmapfile.get_args()

Returns parsed command-line arguments.

esgprep.esgmapfile.main()

Runs the main program.

Submodules

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

platform:

Unix

synopsis:

Constants used in this module.

platform:

Unix

synopsis:

Processing context used in this module.

class esgprep.mapfile.context.ProcessingContext(args)

Processing context class to drive main process.

clean()

Cleans the directory of incomplete mapfiles. Incomplete mapfiles left over from a previous run are silently removed.

platform:

Unix

synopsis:

Generates ESGF mapfiles, with or without a local ESGF node.

class esgprep.mapfile.make.Process(ctx)

Child process.

platform:

Unix

synopsis:

Shows the name of the mapfile to be generated.

class esgprep.mapfile.show.Process(ctx)

Child process.

Utilities

Collectors

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._collectors.Collector(sources)

Base collector class to yield input sources.

class esgprep._collectors.FilterCollection

Evaluates a string against a dictionary of regular expressions. Each dictionary value is a 2-tuple holding the regular expression as a string and a boolean indicating whether the string must match (i.e., include) or must not match (i.e., exclude) that expression.

FILTER_TYPES = (<class 'str'>, typing.Pattern)
add(name=None, regex='*', inclusive=True)
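The include/exclude semantics described above can be illustrated with a short sketch. SimpleFilterCollection and collect() are hypothetical stand-ins, not esgprep's implementation; using re.search on file names and requiring every filter to agree are assumptions.

```python
import re
from pathlib import Path


class SimpleFilterCollection:
    """Hypothetical stand-in for FilterCollection: name -> (pattern, inclusive)."""

    def __init__(self):
        self.filters = {}

    def add(self, name, regex, inclusive=True):
        # inclusive=True means the string must match the pattern;
        # inclusive=False means the string must NOT match it.
        self.filters[name] = (re.compile(regex), inclusive)

    def match(self, string):
        # The string passes only if every registered filter agrees with it.
        return all(
            bool(pattern.search(string)) == inclusive
            for pattern, inclusive in self.filters.values()
        )


def collect(sources, filters):
    """Hypothetical collector: yield files under each source that pass the filters."""
    for source in sources:
        for path in sorted(Path(source).rglob("*")):
            if path.is_file() and filters.match(path.name):
                yield path


filters = SimpleFilterCollection()
filters.add("netcdf", r"\.nc$")                  # keep only .nc files
filters.add("hidden", r"^\.", inclusive=False)   # drop hidden files
```

With these two filters, "tas.nc" passes, while ".tas.nc" and "tas.txt" are rejected.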

Contexts

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._contexts.BaseContext(args)

Base class for processing context managers.

set(key, default=False)

Exceptions

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

exception esgprep._exceptions.KeyNotFound(key, keys=None)

Raised when a class key is not found.

exception esgprep._exceptions.InvalidChecksumType(client)

Raised when the checksum type is unknown.

exception esgprep._exceptions.ChecksumFail(path, checksum_type=None)

Raised when a checksum fails.

exception esgprep._exceptions.NoFileFound(paths)

Raised when no file is found.

exception esgprep._exceptions.DuplicatedDataset(path, version)

Raised if a dataset already exists with the submitted version.

exception esgprep._exceptions.OlderUpgrade(version, latest)

Raised if the submitted version is older than the latest existing version.

exception esgprep._exceptions.DuplicatedFile(latest, upgrade)

Raised if a NetCDF file already exists in the submitted dataset version.

exception esgprep._exceptions.UnchangedTrackingID(latest, latest_id, upgrade, upgrade_id)

Raised if a NetCDF file already has the tracking ID of the submitted file to upgrade.

exception esgprep._exceptions.NoVersionPattern(regex, patterns)

Raised if no version facet is found in the destination format.

exception esgprep._exceptions.InconsistentDRSPath(project, path)

Raised when the DRS path doesn’t start with the project ID.

exception esgprep._exceptions.NoProjectCodeFound(val)

Raised when no project code can be found in, or extracted from, the DRS path or file.

exception esgprep._exceptions.MissingCVdata(authority, project)

Raised when CV data is missing for an authority/project.

Handlers

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

Utils

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

esgprep._utils.checksum.multihash(data: bytes, algo: str) → bytes

Generate a multihash for the given data using the specified algorithm.

Args:

data: The data to hash
algo: The multihash algorithm name (e.g., “sha2-256”)

Returns:

The multihash as bytes (code + length + digest)

Raises:

ValueError: If the algorithm is not supported

esgprep._utils.checksum.multihash_hex(data: bytes, algo: str) → str

Generate a multihash for the given data and return it as a hex string.

Args:

data: The data to hash
algo: The multihash algorithm name (e.g., “sha2-256”)

Returns:

The multihash as a hexadecimal string
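The byte layout given for multihash() (code + length + digest) can be sketched with hashlib. This is a minimal sketch, not esgprep's code: the MULTIHASH_CODES table is an assumed subset of the multicodec table (sha1 = 0x11, sha2-256 = 0x12, sha2-512 = 0x13), and the code and length are written as unsigned varints, as the multihash format specifies.

```python
import hashlib

# Assumed subset of the multicodec table: multihash name -> (code, hashlib name).
# The real esgprep module may support a different set of algorithms.
MULTIHASH_CODES = {
    "sha1": (0x11, "sha1"),
    "sha2-256": (0x12, "sha256"),
    "sha2-512": (0x13, "sha512"),
}


def encode_varint(value: int) -> bytes:
    """Unsigned LEB128 varint, used by multihash for both code and length."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash(data: bytes, algo: str) -> bytes:
    if algo not in MULTIHASH_CODES:
        raise ValueError(f"Unsupported multihash algorithm: {algo}")
    code, hashlib_name = MULTIHASH_CODES[algo]
    digest = hashlib.new(hashlib_name, data).digest()
    # Layout: <varint code> <varint digest length> <digest>
    return encode_varint(code) + encode_varint(len(digest)) + digest


def multihash_hex(data: bytes, algo: str) -> str:
    return multihash(data, algo).hex()
```

For example, hashing empty input with sha2-256 yields 34 bytes: 0x12, then 0x20 (32), then the 32-byte SHA-256 digest.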

esgprep._utils.checksum.detect_multihash_algo(hash_hex: str) → str

Detect the multihash algorithm from a multihash hex string.

Args:

hash_hex: The multihash as a hexadecimal string

Returns:

The algorithm name (e.g., “sha2-256”) or None if not a valid multihash

esgprep._utils.checksum.is_multihash_algo(checksum_type: str) → bool

Check if a checksum type is a multihash algorithm.

Args:

checksum_type: The checksum type to check

Returns:

True if it’s a multihash algorithm, False otherwise
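Detection works in the opposite direction: decode the varint code and length, then check that the remaining bytes match the declared length. This is a minimal sketch under an assumed code table; decode_varint and CODE_TO_NAME are hypothetical helpers, not part of esgprep.

```python
# Assumed subset of the multicodec table (code -> name); not esgprep's table.
CODE_TO_NAME = {0x11: "sha1", 0x12: "sha2-256", 0x13: "sha2-512"}


def decode_varint(buf: bytes, pos: int = 0):
    """Decode an unsigned LEB128 varint; return (value, next_position)."""
    value = shift = 0
    while pos < len(buf):
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7
    raise ValueError("truncated varint")


def detect_multihash_algo(hash_hex: str):
    try:
        raw = bytes.fromhex(hash_hex)  # ValueError on non-hex input
        code, pos = decode_varint(raw)
        length, pos = decode_varint(raw, pos)
    except ValueError:
        return None
    # Valid only if the code is known and exactly `length` digest bytes remain.
    if code in CODE_TO_NAME and len(raw) - pos == length:
        return CODE_TO_NAME[code]
    return None


def is_multihash_algo(checksum_type: str) -> bool:
    return checksum_type in CODE_TO_NAME.values()
```

Note that multihash names such as "sha2-256" differ from hashlib names such as "sha256", so is_multihash_algo("sha256") is False in this sketch.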

esgprep._utils.checksum.checksum(ffp, checksum_type, include_filename=False, human_readable=True)

Computes a file checksum. Supports both standard hashlib algorithms and multihash algorithms.

esgprep._utils.checksum.get_checksum_pattern(checksum_type)

Builds a regular expression describing a checksum pattern.

esgprep._utils.checksum.get_checksum(ffp, checksum_type='sha256', checksums=None)

Global method to get a file checksum, either:

1. By computing the checksum directly.
2. Through a list of checksums provided as a dictionary ({file: checksum}).
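A sketch of this checksum workflow using only hashlib and re. The names file_checksum, checksum_pattern and lookup_or_compute are hypothetical; checking the dictionary before computing, and validating dictionary entries against the pattern, are assumptions rather than documented behavior of get_checksum(). Reading in chunks keeps memory use constant even for large NetCDF files.

```python
import hashlib
import re


def file_checksum(path, checksum_type="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so it is never fully loaded in memory."""
    h = hashlib.new(checksum_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_pattern(checksum_type):
    """Regex matching a lowercase hex digest of the right length."""
    n_hex = hashlib.new(checksum_type).digest_size * 2
    return re.compile(rf"^[0-9a-f]{{{n_hex}}}$")


def lookup_or_compute(path, checksum_type="sha256", checksums=None):
    """Prefer a precomputed {file: checksum} entry, otherwise hash the file."""
    if checksums and path in checksums:
        value = checksums[path]
        if not checksum_pattern(checksum_type).match(value):
            raise ValueError(f"Invalid {checksum_type} checksum for {path}: {value}")
        return value
    return file_checksum(path, checksum_type)
```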

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._utils.ncfile.ncopen(path: str, mode: str = 'r')

Opens a netCDF file.

esgprep._utils.ncfile.get_ncattrs(path: str) → dict

Loads netCDF global attributes from a pathlib.Path as a dictionary. Ignores attributes containing only whitespace.
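The whitespace rule can be shown on attributes that have already been read into a dictionary (for example from a netCDF4.Dataset via ncattrs() and getncattr()). drop_blank_attrs is a hypothetical helper, and treating empty strings as blank is an assumption.

```python
def drop_blank_attrs(attrs: dict) -> dict:
    """Keep every attribute except strings made only of whitespace (or empty)."""
    return {
        key: value
        for key, value in attrs.items()
        # Non-string values (numbers, arrays) are always kept.
        if not (isinstance(value, str) and not value.strip())
    }
```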

esgprep._utils.ncfile.get_tracking_id(attrs: dict) → str

Get tracking_id/PID string from netCDF global attributes.

esgprep._utils.ncfile.is_valid(identifier: str, project: str) → bool

Validates a tracking_id/PID string.

esgprep._utils.ncfile.is_uuid(uuid_string, version=4)

Validates a UUID.
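A possible is_uuid() built on the standard uuid module. This is a sketch, not esgprep's implementation: it assumes the canonical hyphenated form is required and checks the version field explicitly, because uuid.UUID(..., version=4) would overwrite the version bits instead of validating them.

```python
import uuid


def is_uuid(uuid_string, version=4):
    try:
        parsed = uuid.UUID(uuid_string)
    except (ValueError, AttributeError, TypeError):
        # Malformed strings raise ValueError; non-strings raise the others.
        return False
    # parsed.version is None unless the variant is RFC 4122.
    return parsed.version == version and str(parsed) == uuid_string.lower()
```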

esgprep._utils.ncfile.get_project(attrs: str | dict) → str | None

Extract project code from the file attributes.

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._utils.parser.CustomArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)

Custom argument parser class.

error(message: string)

Prints a usage message incorporating the message to stderr and exits.

If you override this in a subclass, it should not return – it should either exit or raise an exception.

class esgprep._utils.parser.MultilineFormatter(prog, default_columns=120)

Custom formatter class.

add_arguments(actions)

Sort optional arguments alphabetically while keeping positional arguments first.
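The ordering rule of add_arguments() can be reproduced by overriding argparse.HelpFormatter.add_arguments(), which argparse calls once per argument group when building the help text. SortedHelpFormatter is a hypothetical name, and sorting by the first option string with leading dashes removed is an assumption.

```python
import argparse


class SortedHelpFormatter(argparse.HelpFormatter):
    """Hypothetical formatter: positionals keep their order, optionals are sorted."""

    def add_arguments(self, actions):
        # Positional actions have no option strings: they sort first and,
        # because sorted() is stable, keep their declaration order.
        # Optional actions sort by their first flag without leading dashes.
        def sort_key(action):
            if not action.option_strings:
                return (0, "")
            return (1, action.option_strings[0].lstrip("-").lower())

        super().add_arguments(sorted(actions, key=sort_key))


parser = argparse.ArgumentParser(prog="demo", formatter_class=SortedHelpFormatter)
parser.add_argument("path")
parser.add_argument("--zeta")
parser.add_argument("--alpha")
```

Only the help sections are reordered; the usage line still lists arguments in declaration order.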

class esgprep._utils.parser.DirectoryChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Action class to check a directory.

static directory_checker(path)

Verifies that a directory exists.
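A directory-checking action can be written as an argparse.Action subclass. This is a hypothetical sketch (ExistingDirectory is not an esgprep name); storing the absolute path is an assumption.

```python
import argparse
import os


class ExistingDirectory(argparse.Action):
    """Hypothetical action: store the absolute path if it is an existing directory."""

    def __call__(self, parser, namespace, values, option_string=None):
        path = os.path.abspath(os.path.expanduser(values))
        if not os.path.isdir(path):
            # parser.error() prints the usage message and exits with status 2.
            parser.error(f"No such directory: {values}")
        setattr(namespace, self.dest, path)
```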

class esgprep._utils.parser.ConfigFileLoader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Configuration file action class.

static load(path)

Loads the configuration file parser.

class esgprep._utils.parser.ChecksumsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Action class to read a checksum file formatted like the output of a checksum client. Returns a dictionary mapping each file path to its checksum.

static read(path)

Reads a checksum list.
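A sketch of reading checksum-client output, assuming the GNU coreutils layout (digest, whitespace, file name, with an optional '*' marking binary mode). read_checksums is a hypothetical helper, not the ChecksumsReader code.

```python
def read_checksums(path):
    """Parse lines like '<hex digest>  <file path>' into {file path: digest}."""
    checksums = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # maxsplit=1 keeps file names containing spaces intact.
            digest, file_path = line.split(maxsplit=1)
            # GNU tools prefix the name with '*' when hashing in binary mode.
            checksums[file_path.lstrip("*")] = digest
    return checksums
```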

class esgprep._utils.parser.DatasetsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Action class to read a list of dataset identifiers from a plain text file. Returns a list of identifiers.

static read(path)

Reads a dataset identifier list.

class esgprep._utils.parser.VersionChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)

Custom action class.

static version_checker(version)

Validates a version number.

esgprep._utils.parser.keyval_converter(pair)

Validates the (key = value) argument format.

esgprep._utils.parser.regex_validator(string)

Validates regular expression syntax.

esgprep._utils.parser.processes_validator(value)

Validates the maximum number of processes.
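Validators like these are usually passed to argparse through type=, raising argparse.ArgumentTypeError on bad input so that argparse reports a clean error. The three functions below are hypothetical sketches; the returned (key, value) tuple and the "at least 1" rule for processes are assumptions about behavior this reference does not spell out.

```python
import argparse
import re


def keyval_type(pair: str):
    """Parse 'key=value' (spaces around '=' allowed) into a (key, value) tuple."""
    key, sep, value = pair.partition("=")
    if not sep or not key.strip() or not value.strip():
        raise argparse.ArgumentTypeError(f"Expected key=value, got {pair!r}")
    return key.strip(), value.strip()


def regex_type(string: str) -> str:
    """Reject strings that are not valid Python regular expressions."""
    try:
        re.compile(string)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"Invalid regex {string!r}: {exc}") from exc
    return string


def processes_type(value: str) -> int:
    """Accept a positive integer number of processes."""
    try:
        n = int(value)
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"Not an integer: {value!r}") from exc
    if n < 1:
        raise argparse.ArgumentTypeError("The number of processes must be >= 1")
    return n
```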

esgprep._utils.path.extract_version(path: Path) → str

Extracts the version string (vXXXXXXXX) from the given path. Raises a ValueError if no valid version is found.

esgprep._utils.path.get_version_index(path: Path) → int

Returns the index position of the version part (vXXXXXXXX) in the path parts.

esgprep._utils.path.get_version_and_subpath(path: Path) → list[str]

Returns a list of path parts from the version part to the end of the path.

esgprep._utils.path.get_path_to_version(path: Path) → list[str]

Returns a list of path parts from the start of the path up to the version part.
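The version-index helpers can be sketched with pathlib, assuming a version component is "v" followed by eight digits (the vXXXXXXXX form above). These are illustrative re-implementations under that assumption, not esgprep's code.

```python
import re
from pathlib import Path

VERSION = re.compile(r"^v\d{8}$")  # assumed form: 'v' followed by 8 digits


def get_version_index(path: Path) -> int:
    """Index of the first version component in path.parts."""
    for i, part in enumerate(path.parts):
        if VERSION.match(part):
            return i
    raise ValueError(f"No version found in {path}")


def extract_version(path: Path) -> str:
    return path.parts[get_version_index(path)]


def get_version_and_subpath(path: Path) -> list[str]:
    """Parts from the version component (included) to the end of the path."""
    return list(path.parts[get_version_index(path):])
```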

esgprep._utils.path.get_ordered_version_paths(base_path: Path) → list[Path]

Returns a list of all “version directory” paths in the base_path directory, ordered by version, excluding the ‘latest’ symlink.
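Ordering version directories numerically while skipping the latest symlink might look like this. ordered_version_paths is a hypothetical name, and accepting any "v" + digits directory name is an assumption.

```python
import re
from pathlib import Path

VERSION_DIR = re.compile(r"^v(\d+)$")  # assumed naming: 'v' followed by digits


def ordered_version_paths(base_path: Path) -> list[Path]:
    """List version subdirectories oldest first, skipping symlinks such as 'latest'."""
    versions = []
    for child in base_path.iterdir():
        match = VERSION_DIR.match(child.name)
        if match and child.is_dir() and not child.is_symlink():
            # Sort on the numeric value rather than the raw string.
            versions.append((int(match.group(1)), child))
    return [path for _, path in sorted(versions)]
```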

esgprep._utils.path.get_ordered_file_version_paths(base_path: Path, file_name: str)

esgprep._utils.path.get_versions(path: Path) → list[Path]

Returns a list of all version directory paths for the given path, ordered by version. This is used to find all existing versions of a dataset.

esgprep._utils.path.get_drs(path: Path) → Path

Returns the DRS (Data Reference Syntax) part of the path. This returns the path up to but not including the version.

Check if the path contains ‘latest’ and is a symlink.

esgprep._utils.path.with_latest_target(path: Path) → Path

If path is a ‘latest’ symlink, return the target path. Otherwise return the original path.
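Resolving the latest symlink and cutting a path at its version can be sketched as follows. resolve_latest and drs_part are hypothetical helpers, and the version form is assumed to be "v" plus eight digits. Note that this sketch only handles paths whose last component is the latest link.

```python
import re
from pathlib import Path

VERSION = re.compile(r"^v\d{8}$")  # assumed version form


def resolve_latest(path: Path) -> Path:
    """Return the symlink target when the last component is a 'latest' link."""
    if path.name == "latest" and path.is_symlink():
        return path.resolve()
    return path


def drs_part(path: Path) -> Path:
    """Return the path up to, but not including, the first version component."""
    for i, part in enumerate(path.parts):
        if VERSION.match(part):
            return Path(*path.parts[:i])
    raise ValueError(f"No version found in {path}")
```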

esgprep._utils.path.get_project(path) → str | None

Extract the project code from a pathlib.Path object.

esgprep._utils.path.get_terms(path: Path) → dict

Extract DRS terms from NetCDF file global attributes. Returns a dictionary of DRS terms for the given path.

esgprep._utils.path.dataset_id(path: Path) → str | None

Build the dataset identifier from a NetCDF file using the esgvoc DrsGenerator. Returns the dataset identifier string for the given path.

Module author: Guillaume Levavasseur <glipsl@ipsl.fr>

class esgprep._utils.print.COLOR(color=None)

Defines a color object for print statements. The default is no color (i.e., restore the original color).

PALETTE = {'blue': 34, 'cyan': 36, 'gray': 37, 'green': 32, 'light blue': 94, 'light cyan': 96, 'light gray': 97, 'light green': 92, 'light magenta': 95, 'light red': 91, 'light yellow': 93, 'magenta': 35, 'red': 31, 'yellow': 33}
RESTORE = '\x1b[0m'
COLORS = False
bold(msg=None)
italic(msg=None)
underline(msg=None)
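The escape sequences behind COLOR can be illustrated with the palette values listed above. colorize() is a hypothetical helper; the bold code (ESC[1m) is the standard ANSI SGR value and only an assumption about what bold() emits, and the sketch ignores the COLORS switch.

```python
# Subset of COLOR.PALETTE and COLOR.RESTORE as documented above.
PALETTE = {"red": 31, "green": 32, "yellow": 33, "blue": 34}
RESTORE = "\x1b[0m"
BOLD = "\x1b[1m"  # standard ANSI bold; assumed, not taken from esgprep


def colorize(msg: str, color=None, bold: bool = False) -> str:
    """Wrap msg in ANSI escape codes, then restore the default terminal style."""
    prefix = BOLD if bold else ""
    if color is not None:
        prefix += f"\x1b[{PALETTE[color]}m"
    # With no styling requested, return the message unchanged.
    return f"{prefix}{msg}{RESTORE}" if prefix else msg
```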
class esgprep._utils.print.COLORS

Preset color statements depending on the status.

static OKBLUE(msg)
static HEADER(msg)
static SUCCESS(msg)
static FAIL(msg)
static INFO(msg)
static WARNING(msg)
static ERROR(msg)
static DEBUG(msg)
class esgprep._utils.print.Print

Class to manage and dispatch print statements depending on the log and debug modes.

LOG: str | None = None
DEBUG: bool = False
CMD: str | None = None
LOG_TO_STDOUT: bool = False
LOGFILE: str | None = None
CARRIAGE_RETURNED: bool = True
BUFFER = <Synchronized wrapper for c_wchar_p(140443376115456)>
static init(log, debug, cmd)
static check_carriage_return(msg)
static print_to_stdout(msg)
static print_to_logfile(msg)
static progress(msg)
static command(msg=None)
static log(msg=None)
static summary(msg)
static info(msg)
static debug(msg)
static warning(msg)
static error(msg, buffer=False)
static success(msg, buffer=False)
static result(msg, buffer=False)
static exception(msg, buffer=False)
static flush()
static enable_colors()
static disable_colors()

Module author: Levavasseur Guillaume (CNRS/IPSL) <glipsl@ipsl.fr>