Generic usage

Important

REQUIRED FIRST STEP: Before using any esgdrs or esgmapfile commands, you must install the controlled vocabularies:

esgvoc install
# OR if using uv:
uv run esgvoc install

Without this step, all commands will fail with: RuntimeError: universe connection is not initialized

Keep vocabularies updated: Run esgvoc install regularly to update your local controlled vocabularies with the latest changes from ESGF projects. Outdated vocabularies may cause validation failures.

See Installation for more details, or visit the esgvoc documentation for advanced vocabulary management.

All the following arguments can be safely combined and add to each of the esgprep COMMAND:
  • esgdrs

  • esgmapfile

Check the help

$> COMMAND [SUBCOMMAND] {-h,--help}

Check the version

$> COMMAND {-v,--version}

Note

The program version will be the same for all the esgprep tools.

Debug mode

Some progress bars informs you about the processing status of the different subcommands. You can switch to a more verbose mode displaying each step with useful additional information.

$> COMMAND [SUBCOMMAND] {-d,--debug}

Warning

This debug/verbose mode is silently activated in the case of a logfile (i.e., no progress bars).

Specify the project

The --project argument is used to parse the corresponding project configuration. It is always required. This argument is case-sensitive and has to correspond to a valid project name in the esgvoc controlled vocabularies.

$> COMMAND [SUBCOMMAND] {-p,--project} PROJECT_ID

Configuration management

Configuration and controlled vocabularies are now managed by the esgvoc library. The esgvoc library automatically handles vocabulary fetching and caching. By default, vocabularies are cached locally to avoid repeated downloads.

For more information on esgvoc configuration, see the Configuration section.

For detailed information about controlled vocabularies management and the esgvoc library, see the esgvoc documentation.

Use a logfile

Outputs can be logged into a file named PROG-YYYYMMDD-HHMMSS-PID.log. If not, the standard output is used following the verbose mode. By default, the logfiles are stored in a logs folder created in your current working directory (if not exists). It can be changed by adding a optional logfile directory to the flag.

$> COMMAND [SUBCOMMAND] -l [/PATH/TO/LOGDIR/]

Use filters

esgdrs and esgmapfile commands will scan your local archive to achieve proper data management. In such a scan, you can filter the file discovery by using a Python regular expression (see re Python library).

The default is to walk through your local filesystem ignoring the files and latest version levels and any hidden folders by using the following regular expression: ^.*/(files|latest|\.[\w]*).*$. It can be change with:

$> COMMAND [SUBCOMMAND] --ignore-dir PYTHON_REGEX

esgprep only considers unhidden NetCDF files by default excuding the regular expression ^\..*$ and including the following one .*\.nc$. It can be independently change with:

$> COMMAND [SUBCOMMAND] --include-file PYTHON_REGEX --exclude-file PYTHON_REGEX

Keep in mind that --ignore-dir and --exclude-file specifie a directory pattern NOT to be matched, while --include-file specifies a filename pattern TO BE matched.

Warning

esgdrs only works with unhidden NetCDF files.

Use multiprocessing

esgprep uses a multiprocessing. This is useful to process a large amount of data, especially in the case of drs and mapfile subcommands with file checksum computation. Set the number of maximal processes to simultaneously treat several files. One process seems sequential processing. Set it to -1 to use all available CPU processes (as returned by multiprocessing.cpu_count()). Default is set to 4 processes.

$> COMMAND [SUBCOMMAND] --max-processes INTEGER

Warning

The number of maximal processes is limited to the maximum CPU count in any case.

Toggle color prompt

esgprep commands prompot you different results in a color fashion to emphasise useful information. Those colors can lead to undesired characters into the logs or machine-readable processes. Default is to enable colors when priting to a terminal. You can switch on/off colors print by using:

$> COMMAND [SUBCOMMAND] {--color, --no-color}

Exit status

  • Status = -1

    Argument parsing error.