Code documentation

esgfetchini

platform:Unix
synopsis:Toolbox to prepare ESGF data for publication.
esgprep.esgfetchini.get_args(args=None)[source]

Returns parsed command-line arguments.

Returns:The argument parser
Return type:argparse.Namespace
esgprep.esgfetchini.main(args=None)[source]

Run main program

platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Processing context used in this module.
class esgprep.fetchini.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – The command-line arguments parser
Returns:The processing context
Return type:ProcessingContext
platform:Unix
synopsis:Fetches ESGF configuration files from GitHub repository.
esgprep.fetchini.main.make_outdir(root)[source]

Build the output directory as follows:

Parameters:root (str) – The root directory
esgprep.fetchini.main.run(args)[source]

Main process that:

  • Decides whether to fetch depending on file presence/absence and command-line arguments,
  • Gets the GitHub file content from full API URL,
  • Backs up the old file if desired,
  • Writes response into INI file.
Parameters:args (ArgumentParser) – Parsed command-line arguments

esgfetchtables

platform:Unix
synopsis:Toolbox to prepare ESGF data for publication.
esgprep.esgfetchtables.get_args()[source]

Returns parsed command-line arguments.

Returns:The argument parser
Return type:argparse.Namespace
esgprep.esgfetchtables.main()[source]

Run main program

platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Processing context used in this module.
class esgprep.fetchtables.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – The command-line arguments parser
Returns:The processing context
Return type:ProcessingContext
platform:Unix
synopsis:Fetches ESGF configuration files from GitHub repository.
esgprep.fetchtables.main.make_outdir(tables_dir, repository, reference=None)[source]

Build the output directory.

Parameters:
  • tables_dir (str) – The CMOR tables directory submitted
  • repository (str) – The GitHub repository name
  • reference (str) – The GitHub reference name (tag or branch)
esgprep.fetchtables.main.get_special_case(f, url, repo, ref, auth)[source]

Gets a dictionary of (filename -> file_info) pairs to be used for named files in place of the file info from the general API call done for the directory. file_info should contain at least the elements ‘sha’ and ‘download_url’.

esgprep.fetchtables.main.fetch_gh_ref(url, outdir, auth, keep, overwrite, backup_mode, filter, special_cases=None)[source]

Fetch all files for a single reference (e.g. tag or branch) of a GitHub repository

esgprep.fetchtables.main.run(args)[source]

Main process that:

  • Decides whether to fetch depending on file presence/absence and command-line arguments,
  • Gets the GitHub file content from full API URL,
  • Backs up the old file if desired,
  • Writes response into table file.
Parameters:args (ArgumentParser) – Parsed command-line arguments

esgdrs

platform:Unix
synopsis:Toolbox to prepare ESGF data for publication.
esgprep.esgdrs.get_args()[source]

Returns parsed command-line arguments.

Returns:The argument parser
Return type:argparse.Namespace
esgprep.esgdrs.main()[source]

Run main program

platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Custom exceptions used in this module.
exception esgprep.drs.custom_exceptions.DuplicatedDataset(path, version)[source]

Raised if a dataset already exists with submitted version.

exception esgprep.drs.custom_exceptions.OlderUpgrade(version, latest)[source]

Raised if the submitted upgrade version is older than the latest existing version.

exception esgprep.drs.custom_exceptions.DuplicatedFile(latest, upgrade)[source]

Raised if a NetCDF file already exists into submitted dataset version.

exception esgprep.drs.custom_exceptions.UnchangedTrackingID(latest, latest_id, upgrade, upgrade_id)[source]

Raised if a NetCDF file already has the tracking ID of submitted file to upgrade.

exception esgprep.drs.custom_exceptions.NoVersionPattern(regex, patterns)[source]

Raised if no version facet found in the destination format.

exception esgprep.drs.custom_exceptions.ReadAccessDenied(user, path)[source]

Raised when user has no read access.

exception esgprep.drs.custom_exceptions.WriteAccessDenied(user, path)[source]

Raised when user has no write access.

exception esgprep.drs.custom_exceptions.CrossMigrationDenied(src, dst, mode)[source]

Raised when migration fails for cross-device link.

exception esgprep.drs.custom_exceptions.MigrationDenied(src, dst, mode, reason)[source]

Raised when migration fails in another case.

exception esgprep.drs.custom_exceptions.InconsistentDRSPath(project, path)[source]

Raised when DRS path doesn’t start with the project ID.

platform:Unix
synopsis:Processing context used in this module.
class esgprep.drs.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – Parsed command-line arguments
Returns:The processing context
Return type:ProcessingContext
check_existing_commands_file()[source]

Check for existing commands file, and depending on --overwrite-commands-file setting, either delete it or throw a fatal error.

platform:Unix
synopsis:Class to handle dataset directory for DRS management.
class esgprep.drs.handler.File(ffp)[source]

Handler providing methods to deal with file processing.

get(key)[source]

Returns the attribute value corresponding to the key. The submitted key can refer to File.key or File.attributes[key].

Parameters:key (str) – The key
Returns:The corresponding value
Return type:str or list or dict depending on the key
Raises:Error – If unknown key
load_attributes(root, pattern, set_values)[source]

Loads DRS attributes captured from a regular expression match. The root facet is added by default. The dataset version is initially set to None. Values can be overwritten by “set_values” pairs if submitted.

Parameters:
  • root (str) – The DRS tree root
  • pattern (str) – The regular expression to match
  • set_values (dict) – Key/value pairs of facet to set for the run
Raises:
  • Error – If regular expression matching fails
  • Error – If invalid NetCDF file.
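As a rough illustration of the regular-expression step described above, the sketch below captures facets with named groups, adds the root facet, leaves the version unset and then applies the optional overrides. The helper name `load_attributes_sketch`, the pattern and the filename are hypothetical, and the NetCDF validation performed by the real method is left out.

```python
import re


def load_attributes_sketch(root, pattern, path, set_values=None):
    """Hypothetical helper: capture DRS facets from a path with named groups."""
    found = re.match(pattern, path)
    if found is None:
        raise ValueError('No match for {}'.format(path))
    attributes = found.groupdict()
    attributes['root'] = root            # root facet added by default
    attributes['version'] = None         # version is initially unset
    attributes.update(set_values or {})  # optional overrides for the run
    return attributes


# Assumed pattern and filename, for illustration only.
PATTERN = (r'(?P<variable>[A-Za-z0-9]+)_(?P<table>[A-Za-z0-9]+)'
           r'_(?P<model>[A-Za-z0-9-]+)\.nc$')
attrs = load_attributes_sketch('/data', PATTERN, 'tas_Amon_IPSL-CM5A-LR.nc',
                               set_values={'project': 'CMIP5'})
```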
check_facets(facets, config, set_keys)[source]

Checks each facet against the controlled vocabulary. If a DRS attribute is missing from the list of facets, the DRS attributes are completed from the configuration file maptables. For a non-standard attribute, the most similar key among the netCDF attribute names is used. Attributes can be mapped directly with “set_keys” pairs if submitted.

Parameters:
  • facets (list) – The list of facet to check
  • config (ESGConfigParser.SectionParser) – The configuration parser
  • set_keys (dict) – Key/Attribute pairs to map for the run
Raises:

Error – If one facet checkup fails

get_drs_parts(facets)[source]

Gets the DRS pairs required to build the DRS path. The DRS parts are returned as an OrderedDict, e.g. {‘project’: ‘CMIP5’, ‘product’: ‘output1’, …}

Parameters:facets (list) – The list of facet to check
Returns:The ordered DRS parts
Return type:OrderedDict
class esgprep.drs.handler.DRSPath(parts)[source]

Handler providing methods to deal with paths.

get(key)[source]

Returns the attribute value corresponding to the key. The submitted key can refer to the DRS dataset parts or the DRS file parts.

Parameters:key (str) – The key
Returns:The value
Return type:str or list or dict depending on the key
Raises:Error – If unknown key
items(d_part=True, f_part=True, version=True, file_folder=False, latest=False, root=False)[source]

Itemizes the facet values along the DRS path. Flags can be combined to obtain different behaviors.

Parameters:
  • d_part (boolean) – True to append the dataset facets
  • f_part (boolean) – True to append the file facets
  • version (boolean) – True to append the version facet
  • file_folder (boolean) – True to append the folder for physical files
  • latest (boolean) – True to switch from upgrade to latest version
  • root (boolean) – True to prepend the DRS root directory
Returns:

The corresponding facet values

Return type:

list

path(**kwargs)[source]

Convert a list of facet values into path. The arguments are the same as esgprep.drs.handler.DRSPath.items()

Returns:The path
Return type:str
get_latest_version()[source]

Gets the current latest dataset version if it exists.

Returns:The latest dataset version properly formatted
Return type:str
Raises:Error – If latest version exists and is the same as upgrade version
class esgprep.drs.handler.DRSLeaf(src, dst, mode, origin)[source]

Handler providing methods to deal with DRS file.

upgrade(todo_only=True, commands_file=None)[source]

Upgrade the DRS tree.

Parameters:
  • commands_file (str) – The file to write command-lines statement if submitted
  • todo_only (boolean) – True to only print Unix command-lines to apply (i.e., as dry-run)
has_permissions(root)[source]

Checks permissions for DRS leaf migration. Discards relative paths.

Parameters:root (str) – The DRS tree root
Raises:Error – If missing user privileges
migration_granted(root)[source]

Checks if the migration mode is allowed by the filesystem. Basically, copy or move will always succeed. Only hardlinks could fail depending on the filesystem partition.

Parameters:root (str) – The DRS tree root
Raises:Error – If migration is disallowed by filesystem configuration
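Hard links cannot span filesystems, which is why only that mode can be refused. One way to test this ahead of time is to compare device identifiers, as in the minimal sketch below. The function names and the mode string `'link'` are assumptions made for the example, not the actual implementation.

```python
import os
import tempfile


def same_device(src, dst_dir):
    """Return True if src and dst_dir live on the same filesystem device."""
    return os.stat(src).st_dev == os.stat(dst_dir).st_dev


def link_allowed(src, dst_dir, mode):
    """Hypothetical check: only hard links depend on the device."""
    if mode != 'link':
        return True
    return same_device(src, dst_dir)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'file.nc')
    open(src, 'w').close()
    # A file and its own directory always share a device.
    granted = link_allowed(src, tmp, 'link')
```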
class esgprep.drs.handler.DRSTree(root=None, version=None, mode=None, outfile=None)[source]

Handler providing methods to deal with DRS tree.

get_display_lengths()[source]

Gets the string lengths for comfort display.

create_leaf(nodes, leaf, label, src, mode, origin=None, force=False)[source]

Creates all upstream nodes to a DRS leaf. The esgprep.drs.handler.DRSLeaf() class is added to data leaf nodes.

Parameters:
  • nodes (list) – The list of node tags to the leaf
  • leaf (str) – The leaf name
  • label (str) – The leaf label
  • src (str) – The source of the leaf
  • mode (str) – The migration mode (e.g., ‘copy’, ‘move’, etc.)
  • origin (str) – The original file full path used for the DRSLeaf source
  • force (boolean) – Overwrite node creation if True and node exists
leaves(root=None)[source]

Yields leaves of the whole DRS tree or of a subtree.

check_uniqueness()[source]

Checks tree upgrade uniqueness. Each data version to upgrade has to be strictly different from the latest version if it exists.

list()[source]

Lists and summarizes upgrade information at the publication level.

tree()[source]

Prints the whole DRS tree in a visual way.

todo()[source]

Dry run of esgprep.drs.handler.DRSTree.upgrade() that only prints the command lines to execute.

upgrade(todo_only=False)[source]

Upgrades the whole DRS tree.

Parameters:todo_only (boolean) – Only print Unix command-line to do
esgprep.drs.handler.print_cmd(line, commands_file, todo_only, mode='a')[source]

Prints a Unix command line depending on the chosen output and DRS action.

Parameters:
  • line (str) – The command-line to write.
  • commands_file (str) – The output file to write command-lines, None if not.
  • todo_only (boolean) – True to only print Unix command-lines to apply (i.e., as dry-run)
  • mode (str) – File open() mode
platform:Unix
synopsis:Manages the filesystem tree according to the project's Data Reference Syntax and versioning.
esgprep.drs.main.process(collector_input)[source]

File process that:

  • Handles files,
  • Deduces facet key/value pairs from file attributes,
  • Checks facet values against CV,
  • Applies the versioning,
  • Populates the DRS tree, creating the appropriate leaves,
  • Stores dataset statistics.
Parameters:source (str) – The file full path to process
esgprep.drs.main.tree_builder(fh)[source]

Builds the DRS tree according to a source.

Parameters:fh (esgprep.drs.handler.File) – The file handler object
esgprep.drs.main.initializer(keys, values)[source]

Initialize process context by setting particular variables as global variables.

Parameters:
  • keys (list) – Argument name
  • values (list) – Argument value
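The idea of setting variables as globals for worker processes can be sketched as follows. This is a simplified stand-in with assumed key names, not the module's code.

```python
def initializer(keys, values):
    """Expose each (key, value) pair as a module-level global."""
    if len(keys) != len(values):
        raise ValueError('keys and values must have the same length')
    for key, value in zip(keys, values):
        globals()[key] = value


# Hypothetical argument names, for illustration only.
initializer(['project', 'verbose'], ['cmip6', True])
```

A function of this shape can be handed to `multiprocessing.Pool` through its `initializer` and `initargs` parameters, so that each worker runs it once at start-up.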
esgprep.drs.main.do_scanning(ctx)[source]

Returns True if file scanning is necessary regarding command-line arguments

Parameters:ctx (esgprep.drs.context.ProcessingContext) – New processing context to evaluate
Returns:True if file scanning is necessary
Return type:boolean
esgprep.drs.main.run(args)[source]

Main process that:

  • Instantiates processing context,
  • Loads previous program instance,
  • Parallelizes file processing with thread pools,
  • Applies command-line action to the whole DRS tree,
  • Evaluates exit status.
Parameters:args (ArgumentParser) – The command-line arguments parser

esgcheckvocab

platform:Unix
synopsis:Toolbox to prepare ESGF data for publication.
esgprep.esgcheckvocab.get_args()[source]

Returns parsed command-line arguments.

Returns:The argument parser
Return type:argparse.Namespace
esgprep.esgcheckvocab.main()[source]

Run main program

platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Processing context used in this module.
class esgprep.checkvocab.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – The command-line arguments parser
Returns:The processing context
Return type:ProcessingContext
platform:Unix
synopsis:Custom exceptions used in this module.
platform:Unix
synopsis:Checks DRS vocabulary against configuration files.
esgprep.checkvocab.main.process(collector_input)[source]

Data process that:

  • Retrieve facet key, values pairs from file or directory attributes
Parameters:source (str) – The file full path to process or the dataset ID
esgprep.checkvocab.main.initializer(keys, values)[source]

Initialize process context by setting particular variables as global variables.

Parameters:
  • keys (list) – Argument name
  • values (list) – Argument value
esgprep.checkvocab.main.run(args)[source]

Main process that:

  • Instantiates processing context,
  • Parses the configuration files options and values,
  • Deduces facets and values from directories or dataset lists,
  • Compares the values of each facet between both,
  • Prints or logs the check results.
Parameters:args (ArgumentParser) – The command-line arguments parser

esgmapfile

platform:Unix
synopsis:Toolbox to prepare ESGF data for publication.
esgprep.esgmapfile.get_args()[source]

Returns parsed command-line arguments.

Returns:The argument parser
Return type:argparse.Namespace
esgprep.esgmapfile.main()[source]

Run main program

platform:Unix
synopsis:Constants used in this module.
platform:Unix
synopsis:Processing context used in this module.
class esgprep.mapfile.context.ProcessingContext(args)[source]

Encapsulates the processing context/information for main process.

Parameters:args (ArgumentParser) – The command-line arguments parser
Returns:The processing context
Return type:ProcessingContext
clean()[source]

Clean directory from incomplete mapfiles. Incomplete mapfiles from a previous run are silently removed.

platform:Unix
synopsis:Custom exceptions used in this module.
exception esgprep.mapfile.custom_exceptions.InconsistentDatasetID(project, dset_id)[source]

Raised when dataset ID doesn’t start with the project ID.

platform:Unix
synopsis:Class to handle files for mapfile generation.
class esgprep.mapfile.handler.Source(source)[source]

Handler providing methods to deal with file processing.

get(key)[source]

Returns the attribute value corresponding to the key. The submitted key can refer to File.key or File.attributes[key].

Parameters:key (str) – The key
Returns:The corresponding value
Return type:str or list or dict depending on the key
Raises:Error – If unknown key
load_attributes(pattern)[source]

Loads DRS attributes captured from a regular expression match. The project facet is always added, in lower case.

Parameters:pattern (str) – The regular expression to match
Raises:Error – If regular expression matching fails
check_facets(facets, config)[source]

Checks each facet against the controlled vocabulary. If a DRS attribute is missing from the list of facets, the DRS attributes are completed from the configuration file maptables.

Parameters:
  • facets (list) – The list of facet to check
  • config (ESGConfigParser.SectionParser) – The configuration parser
Raises:

Error – If one facet checkup fails

get_dataset_id(dataset_format)[source]

Builds the dataset identifier from the dataset template interpolation.

Parameters:dataset_format (str) – The dataset template pattern
Returns:The resulting dataset identifier
Return type:str
Raises:Error – If a facet key is missing
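Template interpolation of this kind can be sketched with Python's `%`-style mapping substitution. The example assumes the dataset template uses `%(facet)s` placeholders, and the extra `attributes` argument is added here only to keep the sketch self-contained.

```python
def get_dataset_id(dataset_format, attributes):
    """Interpolate facet values into the template (assumed %(facet)s syntax)."""
    try:
        return dataset_format % attributes
    except KeyError as error:
        # A placeholder in the template has no corresponding facet.
        raise KeyError('Missing facet: {}'.format(error.args[0]))


# Assumed template and facet values, for illustration only.
DATASET_FORMAT = '%(project)s.%(model)s.%(experiment)s'
dataset_id = get_dataset_id(DATASET_FORMAT, {'project': 'cmip5',
                                             'model': 'IPSL-CM5A-LR',
                                             'experiment': 'historical'})
```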
get_dataset_version(no_version=False)[source]

Retrieve the dataset version. If the version facet cannot be deduced from full path, it follows the symlink to complete the DRS attributes.

Parameters:no_version (boolean) – True to not append version to the dataset ID
Returns:The dataset version
Return type:str
class esgprep.mapfile.handler.Dataset(*args, **kwargs)[source]

Dataset handler class

class esgprep.mapfile.handler.File(*args, **kwargs)[source]

File handler class

platform:Unix
synopsis:Generates ESGF mapfiles upon a local ESGF node or not.
esgprep.mapfile.main.get_output_mapfile(outdir, attributes, mapfile_name, dataset_id, dataset_version, mapfile_drs=None, basename=False)[source]

Builds the mapfile full path depending on:

  • the --mapfile name using tokens,
  • an optional mapfile tree declared in configuration file with mapfile_drs,
  • the --outdir output directory.
Parameters:
  • outdir (str) – The output directory (default is current working directory)
  • attributes (dict) – The facet values deduced from the file full path
  • mapfile_name (str) – An optional mapfile name from the command-line
  • dataset_id (str) – The dataset id
  • dataset_version (str) – The dataset version
  • mapfile_drs (str) – The optional mapfile tree
  • basename (boolean) – True to only get mapfile name without root directory
Returns:

The mapfile full path

Return type:

str

esgprep.mapfile.main.mapfile_entry(dataset_id, dataset_version, ffp, size, optional_attrs)[source]

Builds the mapfile entry corresponding to a processed file.

Parameters:
  • dataset_id (str) – The dataset id
  • dataset_version (str) – The dataset version
  • ffp (str) – The file full path
  • size (str) – The file size
  • optional_attrs (dict) – Optional attributes to append to mapfile lines
Returns:

The mapfile line/entry

Return type:

str
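To make the shape of an entry concrete, here is a minimal sketch that joins the fields with `" | "` and appends optional attributes as `key=value` pairs. This layout is an assumption based on the usual ESGF mapfile convention and may differ in detail from what the function produces.

```python
def mapfile_entry(dataset_id, dataset_version, ffp, size, optional_attrs):
    """Build one mapfile line (assumed ' | ' separated layout)."""
    identifier = dataset_id
    if dataset_version:
        identifier = '{}#{}'.format(dataset_id, dataset_version)
    fields = [identifier, ffp, str(size)]
    for key, value in optional_attrs.items():
        if value:  # skip empty optional attributes
            fields.append('{}={}'.format(key, value))
    return ' | '.join(fields) + '\n'


# Hypothetical values, for illustration only.
entry = mapfile_entry('cmip5.x', '20190101', '/data/f.nc', 10,
                      {'checksum': 'abc', 'checksum_type': 'SHA256'})
```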

esgprep.mapfile.main.write(outfile, entry)[source]

Inserts a mapfile entry. A lockfile prevents several threads from writing to the same file at the same time: a LockFile is acquired before writing and released afterwards. Acquisition times out if the file is locked by another thread. Each process adds one line to the appropriate mapfile.

Parameters:
  • outfile (str) – The output mapfile full path
  • entry (str) – The mapfile entry to write
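The acquire, write, release pattern can be sketched with the standard library alone. Note that this example swaps in `fcntl.flock` (Unix only) for the LockFile object mentioned above and does not implement the timeout, so it illustrates the pattern rather than the actual code.

```python
import fcntl
import os
import tempfile


def write(outfile, entry):
    """Append one entry while holding an exclusive advisory lock."""
    with open(outfile, 'a') as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            handle.write(entry)
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as tmp:
    mapfile = os.path.join(tmp, 'demo.map')
    write(mapfile, 'first line\n')
    write(mapfile, 'second line\n')
    with open(mapfile) as handle:
        lines = handle.readlines()
```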
esgprep.mapfile.main.process(source)[source]

File process that:

  • Handles file,
  • Harvests directory attributes,
  • Checks DRS attributes against CV,
  • Builds dataset ID,
  • Retrieves file size,
  • Does checksums,
  • Deduces mapfile name,
  • Writes the corresponding mapfile entry.

Any error leads to skip the file. It does not stop the process.

Parameters:source (str) – The source to process could be a path or a dataset ID
Returns:The output mapfile full path
Return type:str
esgprep.mapfile.main.initializer(keys, values)[source]

Initialize process context by setting particular variables as global variables.

Parameters:
  • keys (list) – Argument name list
  • values (list) – Argument value list
esgprep.mapfile.main.run(args)[source]

Main process that:

  • Instantiates processing context,
  • Parallelizes file processing with thread pools,
  • Copies mapfile(s) to the output directory,
  • Evaluates exit status.
Parameters:args (ArgumentParser) – Command-line arguments parser

utils

platform:Unix
synopsis:Useful functions to collect data from directories.
class esgprep.utils.collectors.Collecting(spinner)[source]

Spinner pending data collection.

next()[source]

Print collector spinner

class esgprep.utils.collectors.Collector(sources, spinner=True)[source]

Base collector class to yield regular NetCDF files.

Parameters:sources (list) – The list of sources to parse
Returns:The data collector
Return type:iter
class esgprep.utils.collectors.PathCollector(*args, **kwargs)[source]

Collector class to yield files from a list of directories to parse.

Parameters:dir_filter (str) – A regular expression to exclude directories from the collection
class esgprep.utils.collectors.VersionedPathCollector(project, dir_format, *args, **kwargs)[source]

Collector class to yield files from a list of versioned directories to parse.

Parameters:dir_format (str) – The regular expression of the directory format
version_finder(directory)[source]

Returns the version number found in a DRS path.

Parameters:directory (str) – The directory to parse
Returns:The version
Return type:str
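A minimal sketch of such a lookup is given below. It assumes that a version is a path component made of the letter "v" followed by digits, which may be narrower or wider than the pattern actually used.

```python
import re


def version_finder(directory):
    """Return the first path component that looks like a version, else None."""
    for part in directory.split('/'):
        if re.fullmatch(r'v\d+', part):  # assumed version pattern
            return part
    return None


# Hypothetical DRS path, for illustration only.
found = version_finder('/data/cmip5/output1/IPSL/v20190101/tas')
```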

class esgprep.utils.collectors.DatasetCollector(versioned=True, *args, **kwargs)[source]

Collector class to yield datasets from a list of files to read.

class esgprep.utils.collectors.FilterCollection[source]

Regex dictionary with a call method to evaluate a string against several regular expressions. The dictionary values are 2-tuples with the regular expression as a string and a boolean indicating to match (i.e., include) or non-match (i.e., exclude) the corresponding expression.

add(name=None, regex='*', inclusive=True)[source]

Add new filter
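The structure described above (a dictionary of 2-tuples plus a call method) could look like the sketch below. The use of `re.search`, the automatic filter names and the `'.*'` default are assumptions of this example; `'*'` on its own is not a valid Python regular expression, so the sketch does not reuse the documented default.

```python
import re


class FilterCollection(dict):
    """Dictionary of name -> (regex, inclusive) pairs, callable on a string."""

    def add(self, name=None, regex='.*', inclusive=True):
        if name is None:
            name = 'filter_{}'.format(len(self))  # hypothetical naming scheme
        self[name] = (regex, inclusive)

    def __call__(self, string):
        # Every filter must agree: inclusive ones match, exclusive ones do not.
        return all(
            (re.search(regex, string) is not None) == inclusive
            for regex, inclusive in self.values()
        )


filters = FilterCollection()
filters.add(name='netcdf', regex=r'\.nc$')
filters.add(name='hidden', regex=r'^\.', inclusive=False)
```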

platform:Unix
synopsis:Constants used in this package.
platform:Unix
synopsis:Useful functions to use with this package.
class esgprep.utils.context.BaseContext(args)[source]

Base class for processing context manager.

class esgprep.utils.context.GitHubBaseContext(args)[source]

Base manager class for esgfetch* modules.

authenticate()[source]

Builds GitHub HTTP authenticator

Returns:The HTTP authenticator
Return type:requests.auth.HTTPBasicAuth
class esgprep.utils.context.MultiprocessingContext(args)[source]

Base manager class for esgmapfile, esgdrs and esgcheckvocab modules.

get_checksum_type()[source]

Gets the checksum type to use. Be careful to Exception constants by reading two different sections.

Returns:The checksum type
Return type:str
platform:Unix
synopsis:Custom exceptions used in this package.
exception esgprep.utils.custom_exceptions.InvalidNetCDFFile(path)[source]

Raised when invalid or corrupted NetCDF file.

exception esgprep.utils.custom_exceptions.NoNetCDFAttribute(attribute, path, variable=None)[source]

Raised when a NetCDF attribute is missing.

exception esgprep.utils.custom_exceptions.KeyNotFound(key, keys=None)[source]

Raised when a class key is not found.

exception esgprep.utils.custom_exceptions.InvalidChecksumType(client)[source]

Raised when checksum type is unknown.

exception esgprep.utils.custom_exceptions.ChecksumFail(path, checksum_type=None)[source]

Raised when a checksum fails.

exception esgprep.utils.custom_exceptions.NoFileFound(paths)[source]

Raised when no file is found.

exception esgprep.utils.custom_exceptions.GitHubException(msg)[source]

Basic exception for GitHub errors.

exception esgprep.utils.custom_exceptions.GitHubUnauthorized[source]

Raised when no read access on GitHub repo.

exception esgprep.utils.custom_exceptions.GitHubAPIRateLimit(reset_time)[source]

Raised when GitHub API rate limit exceeded.

exception esgprep.utils.custom_exceptions.GitHubFileNotFound[source]

Raised when no file found on GitHub repo.

exception esgprep.utils.custom_exceptions.GitHubConnectionError[source]

Raised when the GitHub request fails.

exception esgprep.utils.custom_exceptions.GitHubReferenceNotFound(ref, refs)[source]

Raised when invalid GitHub reference requested.

platform:Unix
synopsis:Useful functions to use with this package.
class esgprep.utils.custom_print.COLOR(color=None)[source]

Defines a color object for print statements. Default is no color (i.e., restore original color).

class esgprep.utils.custom_print.COLORS[source]

String colors for print statements

class esgprep.utils.custom_print._TAGS[source]

Tag strings for print statements. These are evaluated as properties, in order to defer evaluation until after enable_colors or disable_colors has been called during initialisation.

class esgprep.utils.custom_print.Print[source]

Class to manage and dispatch print statement depending on log and debug mode.

platform:Unix
synopsis:Useful functions to use with this package.
esgprep.utils.github.gh_request_content(url, auth=None)[source]

Gets the GitHub content of a file or a directory.

Parameters:
  • url (str) – The GitHub url to request
  • auth (requests.auth.HTTPBasicAuth) – The authenticator object
Returns:

The GitHub request content

Return type:

requests.models.Response

Raises:
  • Error – If user not authorized to read GitHub repository
  • Error – If user exceeds the GitHub API rate limit
  • Error – If the queried content does not exist
  • Error – If the GitHub request fails for other reasons
esgprep.utils.github.backup(f, mode=None)[source]

Backup a local file following different modes:

  • “one_version” renames the existing file in its source directory adding a “.bkp” extension to the filename.
  • “keep_versions” moves the existing file into a child directory called “bkp” and adds a timestamp to the filename.
Parameters:
  • f (str) – The file to backup
  • mode (str) – The backup mode to follow
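The two backup modes can be sketched as follows. The timestamp format and its position in the filename are assumptions, since the description only says that a timestamp is added.

```python
import os
import tempfile
from datetime import datetime


def backup(f, mode=None):
    """Back up an existing file; returns the backup path or None."""
    if mode is None or not os.path.isfile(f):
        return None
    if mode == 'one_version':
        dst = f + '.bkp'
    elif mode == 'keep_versions':
        bkpdir = os.path.join(os.path.dirname(f), 'bkp')
        os.makedirs(bkpdir, exist_ok=True)
        stamp = datetime.now().strftime('%Y%m%d-%H%M%S')  # assumed format
        dst = os.path.join(bkpdir, '{}.{}'.format(stamp, os.path.basename(f)))
    else:
        raise ValueError('Unknown backup mode: {}'.format(mode))
    os.rename(f, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    ini = os.path.join(tmp, 'esg.ini')
    open(ini, 'w').close()
    first = backup(ini, mode='one_version')
    first_ok = os.path.isfile(first) and not os.path.exists(ini)
    open(ini, 'w').close()
    second = backup(ini, mode='keep_versions')
    second_ok = os.path.isfile(second)
```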
esgprep.utils.github.write_content(outfile, content)[source]

Write GitHub content into a file.

Parameters:
  • outfile (str) – The output file
  • content (str) – The file content to write
esgprep.utils.github.do_fetching(f, remote_checksum, keep, overwrite)[source]

Returns True or False depending on decision schema

Parameters:
  • f (str) – The file to test
  • remote_checksum (str) – The remote file checksum
  • overwrite (boolean) – True if overwrite existing files
  • keep (boolean) – True if keep existing files
Returns:

True depending on the conditions

Return type:

boolean

esgprep.utils.github.githash(outfile)[source]

Makes Git checksum (as called by “git hash-object”) of a file

Parameters:outfile
Returns:The SHA1 sum
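For reference, `git hash-object` computes the SHA-1 of a `blob <size>\0` header followed by the file content. The sketch below reproduces that calculation. The helper `githash_bytes` is introduced only for this example, and reading the whole file into memory is a simplification.

```python
import hashlib


def githash_bytes(data):
    """SHA-1 of a Git blob: header 'blob <size>\\0' followed by the content."""
    header = 'blob {}\0'.format(len(data)).encode('ascii')
    return hashlib.sha1(header + data).hexdigest()


def githash(outfile):
    """Git checksum of a file, read in binary mode."""
    with open(outfile, 'rb') as handle:
        return githash_bytes(handle.read())


# The empty blob has a well-known Git hash.
empty_hash = githash_bytes(b'')
```

Comparing this value with the ‘sha’ reported by the GitHub API is one way to tell whether a local file differs from the remote one.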
platform:Unix
synopsis:Useful functions to use with this package.
class esgprep.utils.misc.ProcessContext(args)[source]

Encapsulates the processing context/information for child process.

Parameters:args (dict) – Dictionary of argument to pass to child process
Returns:The processing context
Return type:ProcessContext
class esgprep.utils.misc.ncopen(path, mode='r')[source]

Properly opens a netCDF file

Parameters:path (str) – The netCDF file full path
Returns:The netCDF dataset object
Return type:netCDF4.Dataset
esgprep.utils.misc.remove(pattern, string)[source]

Removes a substring matched by a regular expression.

Parameters:
  • pattern (str) – The regular expression to match
  • string (str) – The string to test
Returns:

The string without the matched substring

Return type:

str

esgprep.utils.misc.match(pattern, string, inclusive=True)[source]

Validates a string against a regular expression. Only match at the beginning of the string. Default is to match inclusive regex.

Parameters:
  • pattern (str) – The regular expression to match
  • string (str) – The string to test
  • inclusive (boolean) – False if negative matching (i.e., exclude the regex)
Returns:

True if it matches

Return type:

boolean
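The inclusive/exclusive behavior can be illustrated with `re.match`, which only matches at the beginning of the string. This is a minimal sketch consistent with the description, not necessarily the exact implementation.

```python
import re


def match(pattern, string, inclusive=True):
    """Match at the beginning of the string; invert the result if exclusive."""
    matched = re.match(pattern, string) is not None
    return matched if inclusive else not matched


starts_with_tas = match(r'tas', 'tas_Amon')
amon_not_at_start = match(r'Amon', 'tas_Amon')
excluded = match(r'Amon', 'tas_Amon', inclusive=False)
```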

esgprep.utils.misc.load(path)[source]

Loads data from Pickle file.

Parameters:path (str) – The Pickle file path
Returns:The Pickle file content
Return type:object
esgprep.utils.misc.store(path, data)[source]

Stores data into a Pickle file.

Parameters:
  • path (str) – The Pickle file path
  • data (list) – A list of data objects to store
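A store/load round trip with the standard `pickle` module might look like the sketch below. It assumes a single object per file, which may not match how the package lays out its Pickle files.

```python
import os
import pickle
import tempfile


def store(path, data):
    """Serialize data into a Pickle file."""
    with open(path, 'wb') as handle:
        pickle.dump(data, handle)


def load(path):
    """Read back the object stored in a Pickle file."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'state.pkl')
    store(path, ['a', {'b': 1}])
    restored = load(path)
```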
esgprep.utils.misc.evaluate(results)[source]

Evaluates a list depending on absence/presence of None values.

Parameters:results (list) – The list to evaluate
Returns:True if no blocking errors
Return type:boolean
esgprep.utils.misc.checksum(ffp, checksum_type, include_filename=False, human_readable=True)[source]

Computes the checksum through the shell, avoiding Python memory limits.

Parameters:
  • ffp (str) – The file full path
  • checksum_type (str) – Checksum type
  • human_readable (boolean) – True to return a human readable digested message
  • include_filename (boolean) – True to include filename in hash calculation
Returns:

The checksum

Return type:

str

Raises:

Error – If the checksum fails

esgprep.utils.misc.get_checksum_pattern(checksum_type)[source]

Build the checksum pattern depending on the checksum type.

Parameters:checksum_type (str) – The checksum type
Returns:The checksum pattern
Return type:re.Object
esgprep.utils.misc.get_tracking_id(ffp, project)[source]

Get and validate tracking_id/PID string from netCDF global attributes of file

Parameters:
  • ffp (str) – The file full path
  • project (str) – The project name
Returns:

The tracking_id string

esgprep.utils.misc.is_uuid(uuid_string, version=4)[source]

Returns True if the validated string is a UUID.

Parameters:
  • uuid_string (str) – The string to validate
  • version (int) – The UUID version to use, default is 4
Returns:

True if uuid_string is a valid uuid

Return type:

boolean
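One common way to validate a UUID string with the standard library is to parse it while forcing the version bits, then check that the result is unchanged. The sketch below follows that approach and is an assumption about how such a check can be written, not a copy of the function.

```python
import uuid


def is_uuid(uuid_string, version=4):
    """Return True if uuid_string is a canonical UUID of the given version."""
    try:
        parsed = uuid.UUID(uuid_string, version=version)
    except ValueError:
        return False
    # Forcing the version rewrites the version bits, so a string of another
    # version no longer round-trips to itself.
    return str(parsed) == uuid_string.lower()


random_ok = is_uuid(str(uuid.uuid4()))
malformed = is_uuid('not-a-uuid')
wrong_version = is_uuid('6ba7b810-9dad-11d1-80b4-00c04fd430c8', version=4)
```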

esgprep.utils.misc.load_checksums(checksum_file)[source]

Converts a checksums file into a dictionary whose (key: value) pairs are the file path and its checksum.

Parameters:checksum_file (FileObject) – The submitted checksum file
Returns:The loaded checksums
Return type:dict
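As an illustration, the sketch below parses lines of the form `<checksum>  <path>`, the layout written by tools such as `sha256sum`. That input format is an assumption; the file format expected by the package is not specified here.

```python
import io


def load_checksums(checksum_file):
    """Map each file path to its checksum (assumed '<checksum>  <path>' lines)."""
    checksums = {}
    for line in checksum_file:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        checksum, path = line.split(None, 1)
        checksums[path] = checksum
    return checksums


# In-memory stand-in for an opened checksum file.
demo = io.StringIO('abc123  /data/a.nc\n\ndef456  /data/b.nc\n')
loaded = load_checksums(demo)
```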
esgprep.utils.misc.get_checksum(ffp, checksum_type='sha256', checksums_from_file=None)[source]

Gets the file checksum. A dictionary of precomputed checksums ({file: checksum}) can be submitted, as used by the --checksums-from flag.

Parameters:
  • checksum_type (str) – Checksum type
  • checksums_from_file (dict) – Checksums from file
Returns:

The checksum

Return type:

str

Raises:

Error – If the checksum fails

platform:Unix
synopsis:Class and methods used to parse command-line arguments.
class esgprep.utils.parser.MultilineFormatter(prog, default_columns=120)[source]

Custom formatter class for argument parser to use with the Python argparse module.

class esgprep.utils.parser.DirectoryChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Custom action class for argument parser to use with the Python argparse module.

static directory_checker(path)[source]

Checks if the supplied directory exists. The path is normalized without trailing slash.

Parameters:path (str) – The path list to check
Returns:The same path list
Return type:str
Raises:Error – If one of the directories does not exist
class esgprep.utils.parser.VersionChecker(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Custom action class for argument parser to use with the Python argparse module.

static version_checker(version)[source]

Checks the version format from command-line.

Returns:The version if allowed
Return type:str
Raises:Error – If invalid version format
esgprep.utils.parser.keyval_converter(pair)[source]

Checks the key value syntax.

Parameters:pair (str) – The key/value pair to check
Returns:The key/value pair
Return type:list
Raises:Error – If invalid pair syntax
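A converter of this kind is typically passed to argparse as a `type` callable. The sketch below assumes `key=value` syntax with `=` as the separator, which is not stated in the description.

```python
import argparse


def keyval_converter(pair):
    """Split 'key=value' into [key, value] (assumed '=' separator)."""
    parts = pair.split('=', 1)
    if len(parts) != 2 or not parts[0] or not parts[1]:
        raise argparse.ArgumentTypeError(
            'Bad syntax, expected "key=value": {}'.format(pair))
    return parts


converted = keyval_converter('project=cmip5')
```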
esgprep.utils.parser.regex_validator(string)[source]

Validates a Python regular expression syntax.

Parameters:string (str) – The string to check
Returns:The Python regex
Return type:re.compile
Raises:Error – If invalid regular expression
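Validating a regular expression usually amounts to compiling it and converting the failure into an argparse error, as in this minimal sketch. The error type and message are assumptions of the example.

```python
import argparse
import re


def regex_validator(string):
    """Compile the expression, or raise an argparse-friendly error."""
    try:
        return re.compile(string)
    except re.error as error:
        raise argparse.ArgumentTypeError(
            'Bad regex syntax: {} ({})'.format(string, error))


version_regex = regex_validator(r'^v\d+$')
```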
esgprep.utils.parser.processes_validator(value)[source]

Validates the max processes number.

Parameters:value (str) – The max processes number submitted
Returns:

Module author: Levavasseur Guillaume (CNRS/IPSL) <glipsl@ipsl.fr>