Code documentation
Note
This section provides API documentation for the esgprep Python modules.
esgdrs
Main module
- platform:
Unix
- synopsis:
Toolbox to prepare ESGF data for publication.
Submodules
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- esgprep.drs.run(args)
Main process.
- platform:
Unix
- synopsis:
Constants used in this module.
- platform:
Unix
- synopsis:
Processing context used in this module.
- class esgprep.drs.context.ProcessingContext(args)
Processing context class to drive main process.
- platform:
Unix
- synopsis:
Manages the filesystem tree according to the project's Data Reference Syntax (DRS) and versioning.
- platform:
Unix
- synopsis:
Manages the filesystem tree according to the project's Data Reference Syntax (DRS) and versioning.
- platform:
Unix
- synopsis:
Manages the filesystem tree according to the project's Data Reference Syntax (DRS) and versioning.
esgmapfile
Main module
- platform:
Unix
- synopsis:
Toolbox to prepare ESGF data for publication.
Submodules
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- platform:
Unix
- synopsis:
Constants used in this module.
- platform:
Unix
- synopsis:
Processing context used in this module.
- class esgprep.mapfile.context.ProcessingContext(args)
Processing context class to drive main process.
- platform:
Unix
- synopsis:
Generates ESGF mapfiles, with or without a local ESGF node.
- platform:
Unix
- synopsis:
Shows the name of the mapfile to be generated.
Utilities
Collectors
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- class esgprep._collectors.FilterCollection
Evaluates a string against a dictionary of several regular expressions. The dictionary includes 2-tuples with the regular expression as a string and a boolean indicating to match (i.e., include) or non-match (i.e., exclude) the corresponding expression.
- FILTER_TYPES = (<class 'str'>, typing.Pattern)
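The include/exclude semantics described for FilterCollection can be illustrated with a minimal standalone sketch. It does not use the FilterCollection API itself: the function name passes_filters, the filter keys and the sample patterns are hypothetical, and the real class may combine its rules differently.

```python
import re


def passes_filters(text, filters):
    """Return True if `text` satisfies every (pattern, should_match) rule."""
    for pattern, should_match in filters.values():
        matched = re.search(pattern, text) is not None
        # A rule fails when the match result differs from what it asks for.
        if matched != should_match:
            return False
    return True


# Hypothetical filters: each value is a 2-tuple (regular expression, boolean).
filters = {
    "netcdf_only": (r"\.nc$", True),   # include: the name must end with .nc
    "no_hidden": (r"^\.", False),      # exclude: the name must not start with a dot
}

print(passes_filters("tas_day.nc", filters))
print(passes_filters(".tas_day.nc", filters))
```

In this sketch a string is accepted only when every rule agrees, which is one plausible reading of the description, not a statement about the actual implementation.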
Contexts
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
Exceptions
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- exception esgprep._exceptions.KeyNotFound(key, keys=None)
Raised when a class key is not found.
- exception esgprep._exceptions.InvalidChecksumType(client)
Raised when the checksum type is unknown.
- exception esgprep._exceptions.ChecksumFail(path, checksum_type=None)
Raised when a checksum fails.
- exception esgprep._exceptions.DuplicatedDataset(path, version)
Raised if a dataset already exists with the submitted version.
- exception esgprep._exceptions.OlderUpgrade(version, latest)
Raised if the submitted version is older than the latest existing version of the dataset.
- exception esgprep._exceptions.DuplicatedFile(latest, upgrade)
Raised if a NetCDF file already exists in the submitted dataset version.
- exception esgprep._exceptions.UnchangedTrackingID(latest, latest_id, upgrade, upgrade_id)
Raised if a NetCDF file already has the tracking ID of the submitted file to upgrade.
- exception esgprep._exceptions.NoVersionPattern(regex, patterns)
Raised if no version facet is found in the destination format.
- exception esgprep._exceptions.InconsistentDRSPath(project, path)
Raised when the DRS path doesn’t start with the project ID.
Handlers
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
Utils
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- esgprep._utils.checksum.multihash(data: bytes, algo: str) -> bytes
Generate a multihash for the given data using the specified algorithm.
- Args:
data: The data to hash
algo: The multihash algorithm name (e.g., “sha2-256”)
- Returns:
The multihash as bytes (code + length + digest)
- Raises:
ValueError: If the algorithm is not supported
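As an illustration of the code + length + digest layout, here is a minimal sketch built on hashlib. It is not the esgprep implementation: the helper name multihash_sketch and the lookup table are assumptions. The codes 0x12 and 0x13 are the values the public multihash table assigns to sha2-256 and sha2-512.

```python
import hashlib

# Assumed lookup table: multihash name -> (function code, digest length, hashlib name).
MULTIHASH_TABLE = {
    "sha2-256": (0x12, 32, "sha256"),
    "sha2-512": (0x13, 64, "sha512"),
}


def multihash_sketch(data: bytes, algo: str) -> bytes:
    """Return code byte + length byte + raw digest for `data`."""
    try:
        code, length, hashlib_name = MULTIHASH_TABLE[algo]
    except KeyError:
        raise ValueError(f"unsupported multihash algorithm: {algo}") from None
    digest = hashlib.new(hashlib_name, data).digest()
    # Code and length are varints in the multihash format; both values used
    # here are below 128, so each one fits in a single byte.
    return bytes([code, length]) + digest


print(multihash_sketch(b"hello", "sha2-256").hex())
```

The hexadecimal form is simply the `.hex()` of the returned bytes, so a sha2-256 multihash always starts with `1220`.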
- esgprep._utils.checksum.multihash_hex(data: bytes, algo: str) -> str
Generate a multihash for the given data and return it as a hex string.
- Args:
data: The data to hash
algo: The multihash algorithm name (e.g., “sha2-256”)
- Returns:
The multihash as a hexadecimal string
- esgprep._utils.checksum.detect_multihash_algo(hash_hex: str) -> str
Detect the multihash algorithm from a multihash hex string.
- Args:
hash_hex: The multihash as a hexadecimal string
- Returns:
The algorithm name (e.g., “sha2-256”) or None if not a valid multihash
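Detection can be sketched by reading the two-byte prefix of the decoded string and checking that the remaining length agrees with it. The helper name detect_algo_sketch and the prefix table are assumptions made for illustration; the checks esgprep actually performs may be stricter or looser.

```python
# Assumed prefix table: function code -> (multihash name, digest length in bytes).
MULTIHASH_PREFIXES = {
    0x12: ("sha2-256", 32),
    0x13: ("sha2-512", 64),
}


def detect_algo_sketch(hash_hex):
    """Return the algorithm name for a multihash hex string, or None."""
    try:
        raw = bytes.fromhex(hash_hex)
    except ValueError:
        return None  # not valid hexadecimal
    if len(raw) < 2:
        return None
    entry = MULTIHASH_PREFIXES.get(raw[0])
    if entry is None:
        return None  # unknown function code
    name, length = entry
    # The declared length must match both the table and the actual payload.
    if raw[1] != length or len(raw) != 2 + length:
        return None
    return name


print(detect_algo_sketch("1220" + "ab" * 32))
print(detect_algo_sketch("ab" * 32))
```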
- esgprep._utils.checksum.is_multihash_algo(checksum_type: str) -> bool
Check if a checksum type is a multihash algorithm.
- Args:
checksum_type: The checksum type to check
- Returns:
True if it’s a multihash algorithm, False otherwise
- esgprep._utils.checksum.checksum(ffp, checksum_type, include_filename=False, human_readable=True)
Computes a file checksum. Supports both standard hashlib algorithms and multihash algorithms.
- esgprep._utils.checksum.get_checksum_pattern(checksum_type)
Builds a regular expression describing a checksum pattern.
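One way to build such a pattern for standard hashlib algorithms is to derive the expected number of hexadecimal characters from the digest size. This is a sketch under that assumption: the helper name checksum_pattern_sketch is hypothetical, and the pattern esgprep builds (in particular for multihash types) may differ.

```python
import hashlib
import re


def checksum_pattern_sketch(checksum_type):
    """Compile a regex matching a lowercase hex digest of the given hashlib type."""
    # Each digest byte is written as two hexadecimal characters.
    hex_length = hashlib.new(checksum_type).digest_size * 2
    return re.compile(rf"^[0-9a-f]{{{hex_length}}}$")


pattern = checksum_pattern_sketch("sha256")
print(pattern.pattern)
print(bool(pattern.match(hashlib.sha256(b"data").hexdigest())))
```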
- esgprep._utils.checksum.get_checksum(ffp, checksum_type='sha256', checksums=None)
Gets a file checksum in one of two ways: (1) by computing the checksum directly, or (2) by looking it up in a dictionary of precomputed checksums of the form {file: checksum}.
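The two ways of obtaining a checksum can be sketched as a lookup followed by a fallback computation. The helper name get_checksum_sketch, the chunk size and the demo file are assumptions; how esgprep orders or validates the two paths is not stated in this page.

```python
import hashlib
import os
import tempfile


def get_checksum_sketch(ffp, checksum_type="sha256", checksums=None, chunk_size=1 << 20):
    """Return a precomputed checksum if one is listed, else compute it."""
    # Lookup: reuse a precomputed value when the file is in the dictionary.
    if checksums and ffp in checksums:
        return checksums[ffp]
    # Computation: stream the file in chunks to keep memory use bounded.
    digest = hashlib.new(checksum_type)
    with open(ffp, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
computed = get_checksum_sketch(demo_path)
looked_up = get_checksum_sketch(demo_path, checksums={demo_path: "precomputed"})
os.remove(demo_path)
print(computed)
print(looked_up)
```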
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- esgprep._utils.ncfile.get_ncattrs(path: str) -> dict
Loads netCDF global attributes from a pathlib.Path as a dictionary. Ignores attributes that contain only whitespace.
- esgprep._utils.ncfile.get_tracking_id(attrs: dict) -> str
Get tracking_id/PID string from netCDF global attributes.
- esgprep._utils.ncfile.is_valid(identifier: str, project: str) -> bool
Validates a tracking_id/PID string.
- esgprep._utils.ncfile.get_project(attrs: str | dict) -> str | None
Extract project code from the file attributes.
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- class esgprep._utils.parser.CustomArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)
Custom argument parser class.
- class esgprep._utils.parser.MultilineFormatter(prog, default_columns=120)
Custom formatter class.
- class esgprep._utils.parser.DirectoryChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Action class to check a directory.
- class esgprep._utils.parser.ConfigFileLoader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Configuration file action class.
- class esgprep._utils.parser.ChecksumsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Action class to read a checksum file similar to any checksum client output. Returns a dictionary where (key: value) pairs respectively are the file path and its checksum.
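The kind of parsing ChecksumsReader describes can be sketched as a plain function over lines in the style of sha256sum output ("<checksum>  <path>"). This sketch leaves out the argparse.Action wiring; the function name read_checksums_sketch and the handling of the binary-mode "*" marker are assumptions.

```python
def read_checksums_sketch(lines):
    """Map file paths to checksums from '<checksum>  <path>' lines."""
    checksums = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        checksum, path = line.split(maxsplit=1)
        # sha256sum-style tools prefix the path with '*' in binary mode.
        if path.startswith("*"):
            path = path[1:]
        checksums[path] = checksum
    return checksums


sample = [
    "abc123  /data/a.nc",
    "",
    "def456 */data/b.nc",
]
print(read_checksums_sketch(sample))
```

A dictionary keyed by file path matches the (file path: checksum) pairs the description mentions, which is also the shape get_checksum accepts through its checksums argument.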
- class esgprep._utils.parser.DatasetsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Action class to read a dataset identifier list from a simple text file. Returns a list of identifiers.
- class esgprep._utils.parser.VersionChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Custom action class.
- esgprep._utils.parser.processes_validator(value)
Validates the maximum number of processes.
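A validator of this kind is typically passed to argparse through the type argument. The sketch below assumes a simple rule (a positive integer); the function name processes_validator_sketch and the --max-processes flag are hypothetical, and the values esgprep actually accepts are not documented here.

```python
import argparse


def processes_validator_sketch(value):
    """Convert `value` to a positive int or raise an argparse error."""
    try:
        processes = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if processes < 1:
        raise argparse.ArgumentTypeError("the number of processes must be at least 1")
    return processes


parser = argparse.ArgumentParser()
# Hypothetical flag name, used only for this illustration.
parser.add_argument("--max-processes", type=processes_validator_sketch, default=1)
args = parser.parse_args(["--max-processes", "4"])
print(args.max_processes)
```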
- esgprep._utils.path.extract_version(path: Path) -> str
Extracts the version string (vXXXXXXXX) from the given path. Raises a ValueError if no valid version is found.
- esgprep._utils.path.get_version_index(path: Path) -> int
Returns the index position of the version part (vXXXXXXXX) in the path parts.
- esgprep._utils.path.get_version_and_subpath(path: Path) -> list[str]
Returns a list of path parts from the version part to the end of the path.
- esgprep._utils.path.get_path_to_version(path: Path) -> list[str]
Returns a list of path parts from the start of the path to the version part.
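The version helpers can be pictured as operations on pathlib's parts tuple. This sketch assumes that "vXXXXXXXX" means the letter v followed by eight digits; the regular expression, the function name version_index_sketch and the sample path are illustrative only.

```python
import re
from pathlib import PurePosixPath

# Assumption: a version part is "v" followed by exactly eight digits.
VERSION_RE = re.compile(r"^v\d{8}$")


def version_index_sketch(path):
    """Return the index of the first version-like part in `path.parts`."""
    for index, part in enumerate(path.parts):
        if VERSION_RE.match(part):
            return index
    raise ValueError(f"no version part found in {path}")


sample_path = PurePosixPath("/data/CMIP6/tas/v20200101/tas.nc")
index = version_index_sketch(sample_path)
version = sample_path.parts[index]             # the version string itself
version_and_subpath = list(sample_path.parts[index:])  # version part to the end
print(index, version, version_and_subpath)
```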
- esgprep._utils.path.get_ordered_version_paths(base_path: Path) -> list[Path]
Returns a list of all “version directory” paths in the base_path directory, ordered by version, excluding the ‘latest’ symlink.
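Listing version directories in order while skipping the 'latest' symlink might look like the sketch below. It assumes eight-digit version names, so sorting by name equals sorting by version; the function name ordered_version_dirs_sketch and the temporary layout are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a version directory is named "v" followed by exactly eight digits.
VERSION_RE = re.compile(r"^v\d{8}$")


def ordered_version_dirs_sketch(base_path):
    """Return real version directories under `base_path`, oldest first."""
    candidates = (
        entry for entry in base_path.iterdir()
        if entry.is_dir() and not entry.is_symlink() and VERSION_RE.match(entry.name)
    )
    # Fixed-width names sort chronologically when compared as strings.
    return sorted(candidates, key=lambda entry: entry.name)


# Demo on a throwaway directory tree with a 'latest' symlink.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "v20200101").mkdir()
    (base / "v20190101").mkdir()
    (base / "latest").symlink_to(base / "v20200101", target_is_directory=True)
    ordered_names = [entry.name for entry in ordered_version_dirs_sketch(base)]
print(ordered_names)
```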
- esgprep._utils.path.get_versions(path: Path) -> list[Path]
Returns a list of all version directory paths for the given path, ordered by version. This is used to find all existing versions of a dataset.
- esgprep._utils.path.get_drs(path: Path) -> Path
Returns the DRS (Data Reference Syntax) part of the path. This returns the path up to but not including the version.
- esgprep._utils.path.is_latest_symlink(path: Path) -> bool
Check if the path contains ‘latest’ and is a symlink.
- esgprep._utils.path.with_latest_target(path: Path) -> Path
If path is a ‘latest’ symlink, return the target path. Otherwise return the original path.
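Resolving a 'latest' symlink to its target can be sketched with pathlib alone. The sketch assumes the symlink is the final path component and is named exactly "latest"; the function name with_latest_target_sketch is hypothetical, and the real check ("contains 'latest'") may be broader.

```python
import tempfile
from pathlib import Path


def with_latest_target_sketch(path):
    """Return the symlink target for a 'latest' symlink, else `path` unchanged."""
    if path.name == "latest" and path.is_symlink():
        return path.resolve()
    return path


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    version_dir = base / "v20200101"
    version_dir.mkdir()
    latest = base / "latest"
    latest.symlink_to(version_dir, target_is_directory=True)
    same_target = with_latest_target_sketch(latest) == version_dir
    unchanged = with_latest_target_sketch(version_dir) == version_dir
print(same_target, unchanged)
```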
- esgprep._utils.path.get_project(path) -> str | None
Extract project code from a pathlib.Path object.
- esgprep._utils.path.get_terms(path: Path) -> dict
Extract DRS terms from NetCDF file global attributes. Returns a dictionary of DRS terms for the given path.
- esgprep._utils.path.dataset_id(path: Path) -> str | None
Build dataset identifier from DRS path structure using esgvoc DrsGenerator. Returns the dataset identifier string for the given path.
Extracts terms from the directory path parts (between DRS root and version) and uses the DrsGenerator to build a valid dataset ID.
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- class esgprep._utils.print.COLOR(color=None)
Defines a color object for print statements. The default is no color (i.e., restore the original color).
- PALETTE = {'blue': 34, 'cyan': 36, 'gray': 37, 'green': 32, 'light blue': 94, 'light cyan': 96, 'light gray': 97, 'light green': 92, 'light magenta': 95, 'light red': 91, 'light yellow': 93, 'magenta': 35, 'red': 31, 'yellow': 33}
- RESTORE = '\x1b[0m'
- COLORS = False
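The PALETTE values are ANSI SGR color codes, and RESTORE is the ANSI reset sequence. A minimal sketch of how such codes wrap a string is shown below; the function name colorize_sketch is hypothetical and only a subset of the palette is copied, so this is not how the COLOR class itself is used.

```python
# Subset of the PALETTE values listed for the COLOR class.
PALETTE = {"red": 31, "green": 32, "yellow": 33, "blue": 34}
RESTORE = "\x1b[0m"


def colorize_sketch(text, color=None):
    """Wrap `text` in ANSI escape codes; with no color, return it unchanged."""
    if color is None:
        return text
    # ESC [ <code> m switches the foreground color; RESTORE resets it.
    return f"\x1b[{PALETTE[color]}m{text}{RESTORE}"


print(colorize_sketch("done", "green"))
print(colorize_sketch("plain"))
```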
- class esgprep._utils.print.Print
Class to manage and dispatch print statements depending on the log and debug modes.
- LOG: str | None = None
- DEBUG: bool = False
- CMD: str | None = None
- LOG_TO_STDOUT: bool = False
- LOGFILE: str | None = None
- CARRIAGE_RETURNED: bool = True
- BUFFER = <Synchronized wrapper for c_wchar_p(140057509584672)>
Module author: Levavasseur Guillaume (CNRS/IPSL) <glipsl@ipsl.fr>