Code documentation
Note
This section provides API documentation for the esgprep Python modules.
esgdrs
Main module
- platform:
Unix
- synopsis:
Toolbox to prepare ESGF data for publication.
Submodules
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- esgprep.drs.run(args)
Main process.
- platform:
Unix
- synopsis:
Constants used in this module.
- platform:
Unix
- synopsis:
Processing context used in this module.
- class esgprep.drs.context.ProcessingContext(args)
Processing context class to drive the main process.
- platform:
Unix
- synopsis:
Manages the filesystem tree according to the project's Data Reference Syntax and versioning.
- platform:
Unix
- synopsis:
Manages the filesystem tree according to the project's Data Reference Syntax and versioning.
- platform:
Unix
- synopsis:
Manages the filesystem tree according to the project's Data Reference Syntax and versioning.
esgmapfile
Main module
- platform:
Unix
- synopsis:
Toolbox to prepare ESGF data for publication.
Submodules
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- platform:
Unix
- synopsis:
Constants used in this module.
- platform:
Unix
- synopsis:
Processing context used in this module.
- class esgprep.mapfile.context.ProcessingContext(args)
Processing context class to drive the main process.
- platform:
Unix
- synopsis:
Generates ESGF mapfiles, with or without a local ESGF node.
- platform:
Unix
- synopsis:
Shows the name of the mapfile to be generated.
Utilities
Collectors
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- class esgprep._collectors.FilterCollection
Evaluates a string against a dictionary of several regular expressions. The dictionary holds 2-tuples pairing the regular expression (as a string) with a boolean indicating whether the string must match (i.e., include) or must not match (i.e., exclude) the corresponding expression.
- FILTER_TYPES = (<class 'str'>, typing.Pattern)
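The include/exclude evaluation described above can be sketched as follows. This is a minimal illustration, assuming that every registered filter must agree for the string to pass and that `re.search` semantics apply; the class `SimpleFilterCollection` and its `add`/`match` methods are hypothetical and are not the esgprep API.

```python
from __future__ import annotations

import re


class SimpleFilterCollection:
    """Hypothetical stand-in for FilterCollection: name -> (pattern, inclusive)."""

    def __init__(self) -> None:
        self.filters: dict[str, tuple[re.Pattern, bool]] = {}

    def add(self, name: str, regex: str | re.Pattern, inclusive: bool = True) -> None:
        # Accept a raw string or a precompiled pattern, like FILTER_TYPES.
        # re.compile returns an already compiled pattern unchanged.
        self.filters[name] = (re.compile(regex), inclusive)

    def match(self, string: str) -> bool:
        # Assumption: inclusive filters must match, exclusive filters must not,
        # and the string passes only if all filters agree.
        return all(
            (pattern.search(string) is not None) == inclusive
            for pattern, inclusive in self.filters.values()
        )
```

For example, after `add("netcdf", r"\.nc$")` and `add("hidden", r"^\.", inclusive=False)`, the string `tas_day.nc` passes while `.tas_day.nc` and `tas_day.txt` are rejected.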
Contexts
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
Exceptions
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- exception esgprep._exceptions.KeyNotFound(key, keys=None)
Raised when a class key is not found.
- exception esgprep._exceptions.InvalidChecksumType(client)
Raised when the checksum type is unknown.
- exception esgprep._exceptions.ChecksumFail(path, checksum_type=None)
Raised when a checksum fails.
- exception esgprep._exceptions.DuplicatedDataset(path, version)
Raised if a dataset already exists with the submitted version.
- exception esgprep._exceptions.OlderUpgrade(version, latest)
Raised if the submitted version is older than the latest existing version.
- exception esgprep._exceptions.DuplicatedFile(latest, upgrade)
Raised if a NetCDF file already exists in the submitted dataset version.
- exception esgprep._exceptions.UnchangedTrackingID(latest, latest_id, upgrade, upgrade_id)
Raised if the submitted NetCDF file to upgrade has the same tracking ID as the latest file.
- exception esgprep._exceptions.NoVersionPattern(regex, patterns)
Raised if no version facet is found in the destination format.
- exception esgprep._exceptions.InconsistentDRSPath(project, path)
Raised when the DRS path doesn’t start with the project ID.
Handlers
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
Utils
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- esgprep._utils.checksum.multihash(data: bytes, algo: str) -> bytes
Generate a multihash for the given data using the specified algorithm.
- Args:
data: The data to hash.
algo: The multihash algorithm name (e.g., “sha2-256”).
- Returns:
The multihash as bytes (code + length + digest)
- Raises:
ValueError: If the algorithm is not supported
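The "code + length + digest" layout corresponds to the multihash format: an unsigned-varint hash function code, an unsigned-varint digest length, then the raw digest. The stdlib-only sketch below assumes just two entries of the multicodec table (`0x12` for sha2-256, `0x13` for sha2-512); `encode_varint` and `multihash_sketch` are hypothetical helpers, not the esgprep implementation.

```python
import hashlib

# Assumed subset of the multicodec table: name -> (code, hashlib constructor).
MULTIHASH_CODES = {
    "sha2-256": (0x12, hashlib.sha256),
    "sha2-512": (0x13, hashlib.sha512),
}


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash_sketch(data: bytes, algo: str) -> bytes:
    """Return <varint code><varint digest length><digest> for data."""
    if algo not in MULTIHASH_CODES:
        raise ValueError(f"Unsupported multihash algorithm: {algo}")
    code, constructor = MULTIHASH_CODES[algo]
    digest = constructor(data).digest()
    return encode_varint(code) + encode_varint(len(digest)) + digest
```

With these assumptions, `multihash_sketch(b"", "sha2-256").hex()` starts with `1220` (code 0x12, length 32) followed by the usual SHA-256 hex digest; calling `.hex()` on the result gives a string of the kind `multihash_hex` is documented to return.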
- esgprep._utils.checksum.multihash_hex(data: bytes, algo: str) -> str
Generate a multihash for the given data and return it as a hex string.
- Args:
data: The data to hash.
algo: The multihash algorithm name (e.g., “sha2-256”).
- Returns:
The multihash as a hexadecimal string
- esgprep._utils.checksum.detect_multihash_algo(hash_hex: str) -> str
Detect the multihash algorithm from a multihash hex string.
- Args:
hash_hex: The multihash as a hexadecimal string
- Returns:
The algorithm name (e.g., “sha2-256”) or None if not a valid multihash
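Detection can work by decoding the leading varint code and length, then checking that they agree with a known algorithm and with the number of remaining bytes. This is a sketch under the same assumed two-entry multicodec table (sha2-256 and sha2-512); `decode_varint` and `detect_algo_sketch` are hypothetical names, not the esgprep code.

```python
from __future__ import annotations

# Assumed subset of the multicodec table: code -> (name, digest length in bytes).
KNOWN_CODES = {
    0x12: ("sha2-256", 32),
    0x13: ("sha2-512", 64),
}


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    value = shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def detect_algo_sketch(hash_hex: str) -> str | None:
    """Return the algorithm name if hash_hex is a well-formed multihash, else None."""
    try:
        raw = bytes.fromhex(hash_hex)  # ValueError on non-hex input
        code, pos = decode_varint(raw)
        length, pos = decode_varint(raw, pos)
    except ValueError:
        return None
    entry = KNOWN_CODES.get(code)
    if entry is None:
        return None
    name, expected_length = entry
    # The declared length must match both the algorithm and the remaining bytes.
    if length != expected_length or len(raw) - pos != length:
        return None
    return name
```

Checking the declared length against the actual payload keeps a plain hex digest that happens to start with `12` from being misreported as a multihash in most cases.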
- esgprep._utils.checksum.is_multihash_algo(checksum_type: str) -> bool
Check if a checksum type is a multihash algorithm.
- Args:
checksum_type: The checksum type to check
- Returns:
True if it’s a multihash algorithm, False otherwise
- esgprep._utils.checksum.checksum(ffp, checksum_type, include_filename=False, human_readable=True)
Computes a file checksum. Supports both standard hashlib algorithms and multihash algorithms.
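For the standard hashlib case, a file checksum is usually computed by streaming the file in fixed-size chunks so large NetCDF files are never loaded into memory at once. The sketch below shows only that pattern; the name `file_checksum_sketch` and the `chunk_size` parameter are hypothetical, and it does not reproduce the multihash, `include_filename` or `human_readable` behavior of the documented function.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_checksum_sketch(path: str | Path, algo: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file chunk by chunk and return the hex digest."""
    hasher = hashlib.new(algo)  # raises ValueError for unknown algorithm names
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```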
- esgprep._utils.checksum.get_checksum_pattern(checksum_type)
Builds a regular expression describing a checksum pattern.
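One plausible way to build such a pattern for a hashlib algorithm is to derive the expected hex length from the digest size. This sketch assumes lowercase hexadecimal digests of fixed length; `checksum_pattern_sketch` is a hypothetical name, and multihash checksums (which carry extra prefix bytes) would need a longer pattern.

```python
import hashlib
import re


def checksum_pattern_sketch(algo: str) -> re.Pattern:
    """Regex matching a lowercase hex digest produced by the given hashlib algorithm."""
    hex_length = hashlib.new(algo).digest_size * 2  # two hex characters per byte
    # Doubled braces emit literal { } around the interpolated length.
    return re.compile(rf"^[a-f0-9]{{{hex_length}}}$")
```

For instance, `checksum_pattern_sketch("sha256").pattern` is `^[a-f0-9]{64}$`.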
- esgprep._utils.checksum.get_checksum(ffp, checksum_type='sha256', checksums=None)
Global method to get a file checksum, either:
1. by computing the checksum directly, or
2. by looking it up in a dictionary of checksums of the form {file: checksum}.
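The two retrieval routes can be combined as a lookup with a fallback. In this sketch the precedence (dictionary first, then direct computation) and the use of `str(path)` as the dictionary key are assumptions, not documented esgprep behavior; `get_checksum_sketch` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def get_checksum_sketch(path, algo="sha256", checksums=None):
    """Return a precomputed checksum if one is listed for path, else hash the file."""
    if checksums:
        key = str(path)  # assumption: dictionary keys are plain path strings
        if key in checksums:
            return checksums[key]
    # Fallback: compute the checksum directly from the file contents.
    return hashlib.new(algo, Path(path).read_bytes()).hexdigest()
```

Looking up precomputed values first avoids re-reading large files whose checksums were already produced by an external checksum client.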
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- esgprep._utils.ncfile.get_ncattrs(path: str) -> dict
Loads netCDF global attributes from a pathlib.Path as a dictionary. Ignores attributes containing only whitespace.
- esgprep._utils.ncfile.get_tracking_id(attrs: dict) -> str
Gets the tracking_id/PID string from the netCDF global attributes.
- esgprep._utils.ncfile.is_valid(identifier: str, project: str) -> bool
Validates a tracking_id/PID string.
- esgprep._utils.ncfile.get_project(attrs: str | dict) -> str | None
Extracts the project code from the file attributes.
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- class esgprep._utils.parser.CustomArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=<class 'argparse.HelpFormatter'>, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)
Custom argument parser class.
- class esgprep._utils.parser.MultilineFormatter(prog, default_columns=120)
Custom formatter class.
- class esgprep._utils.parser.DirectoryChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Action class to check a directory.
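Classes like this one subclass `argparse.Action` and do their checking in `__call__`, which argparse invokes with the parsed values. The sketch below shows that pattern for directory arguments; the class name `DirectoryCheckerSketch`, the use of `parser.error`, and storing absolute paths are assumptions for illustration, not esgprep's actual behavior.

```python
import argparse
import os


class DirectoryCheckerSketch(argparse.Action):
    """Hypothetical action: reject arguments that are not existing directories."""

    def __call__(self, parser, namespace, values, option_string=None):
        # values is a list when nargs is set (e.g. "+"), a single string otherwise.
        paths = values if isinstance(values, list) else [values]
        for path in paths:
            if not os.path.isdir(path):
                parser.error(f"No such directory: {path}")  # exits with status 2
        checked = [os.path.abspath(path) for path in paths]
        setattr(namespace, self.dest, checked if isinstance(values, list) else checked[0])
```

It would be attached with, for example, `parser.add_argument("directory", nargs="+", action=DirectoryCheckerSketch)`.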
- class esgprep._utils.parser.ConfigFileLoader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Configuration file action class.
- class esgprep._utils.parser.ChecksumsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Action class to read a checksum file similar to any checksum client output. Returns a dictionary where (key: value) pairs respectively are the file path and its checksum.
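Checksum clients such as GNU `sha256sum` write one `<checksum>  <path>` line per file, with a `*` before the path in binary mode. A sketch of turning such output into the {path: checksum} dictionary, assuming that line format; `read_checksums_sketch` and `ChecksumsReaderSketch` are hypothetical names, not esgprep's implementation.

```python
import argparse


def read_checksums_sketch(lines):
    """Parse '<checksum>  <path>' lines into {path: checksum}."""
    checksums = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1 keeps paths containing spaces intact.
        checksum, path = line.split(maxsplit=1)
        if path.startswith("*"):
            path = path[1:]  # GNU coreutils binary-mode marker
        checksums[path] = checksum
    return checksums


class ChecksumsReaderSketch(argparse.Action):
    """Hypothetical action: replace a checksum file path by its parsed mapping."""

    def __call__(self, parser, namespace, values, option_string=None):
        with open(values, encoding="utf-8") as handle:
            setattr(namespace, self.dest, read_checksums_sketch(handle))
```

A line without two whitespace-separated fields raises `ValueError` during unpacking, which makes malformed checksum files fail loudly.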
- class esgprep._utils.parser.DatasetsReader(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Action class to read a dataset identifier list from a simple text file. Returns a list of identifiers.
- class esgprep._utils.parser.VersionChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)
Custom action class to check a version argument.
- esgprep._utils.parser.processes_validator(value)
Validates the maximum number of processes.
- esgprep._utils.path.extract_version(path: Path) -> str
Extracts the version string (vXXXXXXXX) from the given path. Raises a ValueError if no valid version is found.
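A sketch of version extraction over the path parts, assuming a version part is `v` followed by exactly eight ASCII digits and that the first such part wins; `extract_version_sketch` is a hypothetical name, not the esgprep function.

```python
from __future__ import annotations

import re
from pathlib import PurePath

# Assumption: a version part is 'v' followed by exactly eight digits (vYYYYMMDD-like).
VERSION_PATTERN = re.compile(r"v[0-9]{8}")


def extract_version_sketch(path: str | PurePath) -> str:
    """Return the first path part matching vXXXXXXXX, or raise ValueError."""
    for part in PurePath(path).parts:
        # fullmatch rejects parts such as 'v2019' or 'v20190101_old'.
        if VERSION_PATTERN.fullmatch(part):
            return part
    raise ValueError(f"No valid version found in {path}")
```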
- esgprep._utils.path.get_version_index(path: Path) -> int
Returns the index position of the version part (vXXXXXXXX) in the path parts.
- esgprep._utils.path.get_version_and_subpath(path: Path) -> list[str]
Returns a list of path parts from the version part to the end of the path.
- esgprep._utils.path.get_path_to_version(path: Path) -> list[str]
Returns a list of path parts from the start of the path up to the version part.
- esgprep._utils.path.get_ordered_version_paths(base_path: Path) -> list[Path]
Returns a list of all “version directory” paths in the base_path directory, ordered by version, excluding the ‘latest’ symlink.
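Listing and ordering version directories can be sketched with `pathlib`, assuming version names follow a fixed-width `vXXXXXXXX` layout so that lexical order matches chronological order; `ordered_version_dirs` is a hypothetical name. The `latest` symlink is skipped simply because its name does not match the version pattern.

```python
from __future__ import annotations

import re
from pathlib import Path

VERSION_PATTERN = re.compile(r"v[0-9]{8}")  # assumed vXXXXXXXX layout


def ordered_version_dirs(base_path: Path) -> list[Path]:
    """List vXXXXXXXX subdirectories of base_path, oldest first."""
    versions = [
        entry
        for entry in Path(base_path).iterdir()
        # 'latest' never matches the pattern, so the symlink is excluded here.
        if VERSION_PATTERN.fullmatch(entry.name) and entry.is_dir()
    ]
    # Fixed-width digits make lexical order equal chronological order.
    return sorted(versions, key=lambda entry: entry.name)
```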
- esgprep._utils.path.get_versions(path: Path) -> list[Path]
Returns a list of all version directory paths for the given path, ordered by version. This is used to find all existing versions of a dataset.
- esgprep._utils.path.get_drs(path: Path) -> Path
Returns the DRS (Data Reference Syntax) part of the path. This returns the path up to but not including the version.
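Locating the version part and splitting around it covers the idea behind `get_version_index`, `get_version_and_subpath` and `get_drs`. The sketch below assumes the `vXXXXXXXX` layout and that the DRS part excludes the version, as stated for `get_drs`; `version_index` and `split_at_version` are hypothetical names.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

VERSION_PATTERN = re.compile(r"v[0-9]{8}")  # assumed vXXXXXXXX layout


def version_index(path: PurePosixPath) -> int:
    """Index of the first vXXXXXXXX part in path.parts."""
    for index, part in enumerate(path.parts):
        if VERSION_PATTERN.fullmatch(part):
            return index
    raise ValueError(f"No version part in {path}")


def split_at_version(path: PurePosixPath) -> tuple[PurePosixPath, list[str]]:
    """Split into (DRS path without the version, [version, *subpath])."""
    index = version_index(path)
    drs = PurePosixPath(*path.parts[:index])
    return drs, list(path.parts[index:])
```

For `/root/CMIP6/CMIP/IPSL/v20190101/tas/f.nc`, the version sits at index 5 (the root `/` counts as part 0), the DRS part is `/root/CMIP6/CMIP/IPSL`, and the tail is `["v20190101", "tas", "f.nc"]`.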
- esgprep._utils.path.is_latest_symlink(path: Path) -> bool
Checks whether the path contains ‘latest’ and is a symlink.
- esgprep._utils.path.with_latest_target(path: Path) -> Path
If path is a ‘latest’ symlink, return the target path. Otherwise return the original path.
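A sketch of the `latest` symlink check and resolution, assuming "contains ‘latest’" means one of the path parts is named `latest`, and that relative link targets are resolved against the link's directory; `is_latest_link` and `resolve_latest` are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path


def is_latest_link(path: Path) -> bool:
    """True if a part of path is named 'latest' and path itself is a symlink."""
    path = Path(path)
    return "latest" in path.parts and path.is_symlink()


def resolve_latest(path: Path) -> Path:
    """Follow a 'latest' symlink one level; return other paths unchanged."""
    path = Path(path)
    if is_latest_link(path):
        target = Path(os.readlink(path))
        # Relative link targets are interpreted from the link's own directory.
        return target if target.is_absolute() else path.parent / target
    return path
```

Using `os.readlink` rather than `Path.resolve()` follows only the `latest` link itself, so other symlinks higher up in the tree are left untouched.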
- esgprep._utils.path.get_project(path) -> str | None
Extracts the project code from a pathlib.Path object.
- esgprep._utils.path.get_terms(path: Path) -> dict
Extracts DRS terms from the NetCDF file global attributes. Returns a dictionary of DRS terms for the given path.
- esgprep._utils.path.dataset_id(path: Path) -> str | None
Builds the dataset identifier from a NetCDF file using the esgvoc DrsGenerator. Returns the dataset identifier string for the given path.
Module author: Guillaume Levavasseur <glipsl@ipsl.fr>
- class esgprep._utils.print.COLOR(color=None)
Defines a color object for print statements. The default is no color (i.e., the original color is restored).
- PALETTE = {'blue': 34, 'cyan': 36, 'gray': 37, 'green': 32, 'light blue': 94, 'light cyan': 96, 'light gray': 97, 'light green': 92, 'light magenta': 95, 'light red': 91, 'light yellow': 93, 'magenta': 35, 'red': 31, 'yellow': 33}
- RESTORE = '\x1b[0m'
- COLORS = False
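The PALETTE values are ANSI SGR color codes and RESTORE (`\x1b[0m`) resets the terminal to its default color. The sketch below shows how such codes are typically applied to a string; `colorize` and its `enabled` flag (loosely mirroring COLORS) are hypothetical and do not describe the COLOR class interface.

```python
from __future__ import annotations

# Subset of the PALETTE above: color name -> ANSI SGR code.
PALETTE = {"red": 31, "green": 32, "yellow": 33, "blue": 34}
RESTORE = "\x1b[0m"


def colorize(text: str, color: str | None = None, enabled: bool = True) -> str:
    """Wrap text in an ANSI color escape and restore the original color afterwards."""
    if not enabled or color is None:
        return text  # no color: leave the text untouched
    return f"\x1b[{PALETTE[color]}m{text}{RESTORE}"
```

For example, `colorize("ok", "green")` returns `"\x1b[32mok\x1b[0m"`.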
- class esgprep._utils.print.Print
Class to manage and dispatch print statements depending on the log and debug modes.
- LOG: str | None = None
- DEBUG: bool = False
- CMD: str | None = None
- LOG_TO_STDOUT: bool = False
- LOGFILE: str | None = None
- CARRIAGE_RETURNED: bool = True
- BUFFER = <Synchronized wrapper for c_wchar_p>
Module author: Levavasseur Guillaume (CNRS/IPSL) <glipsl@ipsl.fr>