Frequently Asked Questions
This page answers common questions about esgprep. For detailed troubleshooting,
see Troubleshooting.
General
What is esgprep and why should I use it?
esgprep is a tool suite for preparing climate data for publication on the
Earth System Grid Federation (ESGF). It handles two main tasks:
DRS organization (esgdrs): Organizes your NetCDF files into the standardized Data Reference Syntax directory structure required by ESGF.
Mapfile generation (esgmapfile): Creates mapfiles containing file metadata (paths, sizes, checksums) needed by the ESGF publication system.
You should use it if you’re publishing climate model output or observational data to ESGF and need to comply with project data standards (CMIP6, CMIP7, CORDEX, etc.).
Do I need to be on an ESGF node to use esgprep?
No. esgprep can run on any Linux/Unix system with Python 3.12+. You can
prepare your data locally and then transfer the DRS structure and mapfiles
to your ESGF node for publication.
What data formats are supported?
esgprep works with NetCDF files (.nc). The files should be:
CMOR-compliant (following CF conventions)
Named according to project conventions
Containing required global attributes
Other formats (HDF5, GRIB, etc.) are not supported.
Can I use esgprep for non-ESGF projects?
esgprep is designed specifically for ESGF projects with controlled
vocabularies managed by esgvoc. It validates facet values against
these vocabularies.
For custom projects, you would need to:
Define your project in esgvoc (see esgvoc documentation)
Or use alternative tools for non-ESGF data organization
Installation
What Python version do I need?
Python 3.12 or higher is required. Check your version:
$ python3 --version
Python 3.12.7
If you have an older version, consider using pyenv or conda to
install a newer Python.
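If you want a script to fail early on an unsupported interpreter, the version check above can also be done from Python. This is a minimal sketch using only the standard library; the helper name python_is_supported is made up for illustration and is not part of esgprep.

```python
import sys

# Minimum version stated in this FAQ.
MIN_PYTHON = (3, 12)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old, 3.12+ is required"
    print(f"Python {found}: {status}")
```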
How do I upgrade from version 2.x?
Version 3.0 has significant changes from 2.x:
Install the new version:
$ pip install --upgrade esgprep
Install controlled vocabularies:
$ esgvoc install
Update your scripts:
Remove esgfetchini calls (no longer needed)
Configuration is now handled by esgvoc
Command syntax is mostly compatible
See Changelog from esgprep 2.x to 3.0 for detailed migration instructions.
What if pip install fails?
Common solutions:
Upgrade pip first:
$ pip install --upgrade pip
Use a virtual environment:
$ python3 -m venv esgprep-env
$ source esgprep-env/bin/activate
$ pip install esgprep
Check for conflicting packages:
$ pip check
Install build dependencies (if compilation fails):
# Debian/Ubuntu
$ sudo apt-get install python3-dev libnetcdf-dev
# RHEL/CentOS
$ sudo yum install python3-devel netcdf-devel
Usage
How do I know which project to specify?
The --project argument must match a project defined in the esgvoc
controlled vocabularies. Common values:
cmip6 - CMIP6 data
cmip7 - CMIP7 data
cordex - CORDEX regional projections
cordex-cmip6 - CORDEX driven by CMIP6 models
input4mips - Input datasets for MIPs
obs4mips - Observational datasets for MIPs
To list available projects:
$ esgvoc list-projects
Note
Project names are case-sensitive. Use cmip6, not CMIP6.
What if my project isn’t supported?
If your project isn’t in the vocabulary:
Check for updates:
$ pip install --upgrade esgvoc
$ esgvoc install
Verify the project name (check spelling, case)
Contact your project administrators - new projects need to be added to the official ESGF vocabularies
For testing purposes, you may be able to use a similar project’s vocabulary, but this is not recommended for production.
Can I test without modifying my files?
Yes, several approaches:
Use dry-run commands:
# Preview datasets
$ esgdrs make list --project cmip6 /data/incoming/
# Preview structure
$ esgdrs make tree --project cmip6 /data/incoming/
# See planned operations
$ esgdrs make todo --project cmip6 /data/incoming/ --root /tmp/test
Use a temporary output directory:
$ esgdrs make upgrade --project cmip6 /data/incoming/ \
    --root /tmp/test-drs --link
Use hard links (--link) to avoid copying files; the original files remain untouched.
How do I undo an esgdrs upgrade?
There’s no automatic “undo” command. However:
If you used --link (hard links):
Your original files are untouched. Simply delete the DRS structure:
$ rm -rf /path/to/drs/PROJECT/
If you used --symlink:
Original files are untouched. Delete the DRS structure.
If you used default mode (move) or --copy:
Files were moved/copied. You’ll need to move them back manually or restore from backup.
Tip
Always use --link when testing to preserve your original files.
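To see why deleting a hard-linked DRS tree leaves the originals intact, the following sketch reproduces the situation with the standard library only. It does not call esgdrs; the file names are hypothetical, and it assumes both paths live on the same filesystem, which hard links require.

```python
import os
import tempfile
from pathlib import Path


def demo_hard_link(workdir):
    """Create a file, hard-link it, delete the link, and report what survives."""
    workdir = Path(workdir)
    original = workdir / "incoming.nc"      # hypothetical source file
    linked = workdir / "drs" / "linked.nc"  # hypothetical DRS location
    original.write_bytes(b"fake netcdf payload")
    linked.parent.mkdir(parents=True)

    os.link(original, linked)  # second directory entry for the same data
    links_while_linked = original.stat().st_nlink
    same_file = os.path.samefile(original, linked)

    linked.unlink()  # "undo": remove only the DRS-side name
    return {
        "links_while_linked": links_while_linked,
        "same_file": same_file,
        "original_survives": original.exists(),
        "content": original.read_bytes(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_hard_link(tmp))
```

Both names point at the same data on disk, so removing the DRS-side name only drops one reference and uses no extra space in the meantime.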
Checksums
Which checksum algorithm should I use?
Recommended: SHA256 (default)
$ esgmapfile make --project cmip6 --directory /data/
For new ESGF infrastructure, you can use multihash format:
$ esgmapfile make --project cmip6 --directory /data/ \
--checksum-type sha2-256
Comparison:
| Algorithm | Use Case | Speed | ESGF Support |
|---|---|---|---|
| sha256 | General use (default) | Fast | Full |
| sha2-256 | Multihash format | Fast | Modern nodes |
| sha2-512 | Higher security | Slower | Modern nodes |
| md5 | Legacy only | Fastest | Deprecated |
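The difference between a plain sha256 digest and the sha2-256 multihash form can be illustrated with hashlib. In the multihash convention, the digest is prefixed with a function code (0x12 for sha2-256, 0x13 for sha2-512) and the digest length in bytes. The sketch below is an illustration only: the helper names are invented, and whether esgmapfile writes the multihash as a hex string exactly like this is an assumption.

```python
import hashlib
import os
import tempfile

# Multihash prefix per checksum type: (function code, digest length in bytes).
# How esgmapfile serializes the result is an assumption; hex is used here.
MULTIHASH_PREFIX = {"sha2-256": (0x12, 32), "sha2-512": (0x13, 64)}
HASHLIB_NAME = {"sha2-256": "sha256", "sha2-512": "sha512"}


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large NetCDF files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def multihash_hex(path, checksum_type="sha2-256"):
    """Return the hex string <code><length><digest> for the given file."""
    code, length = MULTIHASH_PREFIX[checksum_type]
    raw = file_digest(path, HASHLIB_NAME[checksum_type])
    return bytes([code, length]).hex() + raw.hex()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.nc")  # hypothetical file
        with open(sample, "wb") as handle:
            handle.write(b"abc")
        print("sha256  :", file_digest(sample).hex())
        print("sha2-256:", multihash_hex(sample))
```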
Can I skip checksums?
Yes, for testing only:
$ esgmapfile make --project cmip6 --directory /data/ --no-checksum
Warning
Never skip checksums for production data. Checksums are required for ESGF publication and data integrity verification.
How do I provide pre-calculated checksums?
For large datasets, pre-calculate checksums to save time:
Generate checksums:
$ find /data -name "*.nc" -exec sha256sum {} \; > checksums.txt
Use them with esgmapfile:
$ esgmapfile make --project cmip6 --directory /data/ \
    --checksums-from checksums.txt
The file format should be standard sha256sum output:
abc123def456... /path/to/file1.nc
789xyz012abc... /path/to/file2.nc
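Before handing a checksum file to esgmapfile, it can help to confirm that every line really is sha256sum output. The sketch below parses such lines into a path-to-digest mapping and rejects anything that is not a 64-character hex digest. The function name is hypothetical, and this is not how esgmapfile itself reads the file.

```python
HEX_DIGITS = set("0123456789abcdef")


def parse_checksum_lines(lines):
    """Map file path -> lowercase hex digest from sha256sum-style lines."""
    checksums = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        digest, _, path = line.partition(" ")
        # sha256sum puts two characters between digest and path: a space,
        # then either a space (text mode) or "*" (binary mode).
        if path[:1] in (" ", "*"):
            path = path[1:]
        digest = digest.lower()
        if len(digest) != 64 or not set(digest) <= HEX_DIGITS:
            raise ValueError(f"not a SHA-256 line: {line!r}")
        checksums[path] = digest
    return checksums


if __name__ == "__main__":
    sample = ["a" * 64 + "  /path/to/file1.nc\n"]
    print(parse_checksum_lines(sample))
```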
Troubleshooting
Why is my project not recognized?
ValueError: Project 'cmip6' not found in esgvoc
Solutions:
Initialize vocabularies (required after installation):
$ esgvoc install
Check spelling (case-sensitive):
# Correct
$ esgdrs make list --project cmip6 /data/
# Wrong
$ esgdrs make list --project CMIP6 /data/
Update esgvoc:
$ pip install --upgrade esgvoc
$ esgvoc install
List available projects:
$ esgvoc list-projects
What if I get facet validation errors?
ERROR: Invalid value 'my_experiment' for facet 'experiment'
This means your data contains values not in the controlled vocabulary.
Solutions:
Check your NetCDF attributes:
$ ncdump -h your_file.nc | grep experiment
See valid values:
$ esgvoc get cmip6:experiment:
If the value should be valid, update esgvoc:
$ pip install --upgrade esgvoc
$ esgvoc install
Override temporarily (use with caution):
$ esgdrs make list --project cmip6 \
    --set-value experiment=historical \
    /data/
How do I handle duplicate files?
If you get warnings about duplicate datasets:
Check if files are truly duplicates:
$ md5sum /path/to/file1.nc /path/to/file2.nc
If publishing an update, use versioning:
$ esgdrs make upgrade --project cmip6 /data/incoming/ \
    --upgrade-from-latest
If replacing existing data, remove the old version first:
$ rm -rf /data/drs/CMIP6/.../v20240101/
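When there are more than a couple of candidate files, comparing them pairwise with md5sum gets tedious. As a rough sketch, the first step above can be automated by grouping files by content hash. The function name and the *.nc pattern are illustrative choices, and the code is independent of esgdrs.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(root, pattern="*.nc"):
    """Return lists of paths under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks to keep memory use flat on large files.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.nc").write_bytes(b"same")
        (Path(tmp) / "b.nc").write_bytes(b"same")
        for group in find_duplicate_files(tmp):
            print([p.name for p in group])
```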
Migration from v2.x
What changed from version 2.x?
Major changes in v3.0:
esgfetchini removed - Configuration now handled by esgvoc
esgvoc required - Must run esgvoc install before first use
Python 3.12+ required - Older Python versions not supported
Command structure - Some subcommands reorganized
Mapfile format - Version separator changed from # to .v
See Changelog from esgprep 2.x to 3.0 for complete details.
Where did esgfetchini go?
esgfetchini is no longer needed. In v2.x, it downloaded INI
configuration files. In v3.0, configuration is handled by the esgvoc
library:
# Old way (v2.x) - NO LONGER NEEDED
$ esgfetchini
# New way (v3.0)
$ esgvoc install
The esgvoc library manages controlled vocabularies and project
definitions automatically.
Do my old scripts still work?
Most scripts will work with minor modifications:
Remove esgfetchini calls:
# Remove this line
esgfetchini
# Add this once (or in setup)
esgvoc install
Check command syntax:
Most commands are compatible, but verify with --help:
$ esgdrs --help
$ esgmapfile --help
Update Python version if needed (3.12+ required)
Test with sample data before running on production
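If you maintain many publication scripts, a quick scan can locate leftover esgfetchini calls before you edit anything. The following is a rough sketch with an invented helper name. Its comment handling is deliberately crude: everything after the first # on a line is ignored, so a # inside a quoted string would be misread.

```python
import re

# Match esgfetchini as a standalone command word, not inside a longer name.
ESGFETCHINI = re.compile(r"(?<![\w-])esgfetchini(?![\w-])")


def find_esgfetchini_calls(script_text):
    """Return (line_number, stripped_line) for lines that invoke esgfetchini."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop shell comments (crude)
        if ESGFETCHINI.search(code):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    example = "#!/bin/bash\nesgfetchini\nesgdrs make list --project cmip6 /data/\n"
    for number, line in find_esgfetchini_calls(example):
        print(f"line {number}: {line}")
```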
See Also
Getting Started - Step-by-step tutorial
Troubleshooting - Detailed error solutions
Concepts & Terminology - ESGF terminology explained
Changelog from esgprep 2.x to 3.0 - Upgrading from v2.x