Getting Started
This guide will walk you through your first use of esgprep to prepare climate data for ESGF publication.
By the end, you’ll understand the complete workflow from raw NetCDF files to publication-ready mapfiles.
Note
This guide assumes you have already installed esgprep. If not, see Installation.
Prerequisites
Before starting, ensure you have:
- System Requirements:
  Python 3.12 or higher
  Linux/Unix environment
  Access to the filesystem containing your data
  Sufficient disk space (at least 2x your data size for DRS structure)
- Data Requirements:
  CMOR-compliant NetCDF files
  Files from a supported ESGF project (CMIP6, CMIP5, CORDEX, etc.)
  Files with proper global attributes and naming conventions
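The 2x disk-space guideline can be checked before you start. The sketch below is only an illustration and is not part of esgprep: the helper names `needed_bytes` and `has_enough_space` are hypothetical, and it assumes your incoming files sit directly in one directory and end in `.nc`.

```python
import shutil
import tempfile
from pathlib import Path


def needed_bytes(incoming_dir, factor=2):
    """Return `factor` times the total size of the .nc files in a directory."""
    sizes = (p.stat().st_size for p in Path(incoming_dir).glob("*.nc"))
    return factor * sum(sizes)


def has_enough_space(incoming_dir, target_dir, factor=2):
    """Check that the filesystem holding target_dir has room for the reserve."""
    free = shutil.disk_usage(target_dir).free
    return free >= needed_bytes(incoming_dir, factor)


# Demonstration on a throwaway directory; point the helpers at your own
# incoming directory and DRS root instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.nc").write_bytes(b"\0" * 1024)
    print(needed_bytes(tmp))  # 2048
    print(has_enough_space(tmp, tmp))
```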
Verify Your Environment:
# Check Python version
$ python3 --version
Python 3.12.7
# Check esgprep is installed
$ esgdrs --version
esgdrs (from esgprep v3.0.0)
$ esgmapfile --version
esgmapfile (from esgprep v3.0.0)
Important
Initialize Controlled Vocabularies
Before using esgprep for the first time, you must initialize the controlled vocabularies:
$ esgvoc install
This downloads ESGF project vocabularies and builds local databases. The installation may take a few minutes.
What happens if you skip this? You’ll see an error like:
RuntimeError: universe connection is not initialized
Keep your vocabularies updated: Run esgvoc install periodically to get the latest controlled
vocabulary updates from ESGF projects. This ensures you can work with newly added experiments, models,
or updated facet values.
For more details about controlled vocabularies, see the esgvoc documentation.
Understanding the Workflow
The esgprep workflow has two main stages:
┌─────────────────┐
│  NetCDF Files   │  Your incoming CMOR-compliant standardized data
│ (any location)  │  (following project norms: CMIP7, CMIP6, etc.)
└────────┬────────┘
         │
         │ esgdrs make list    (preview datasets)
         │ esgdrs make tree    (preview structure)
         │ esgdrs make upgrade (organize files)
         ↓
┌─────────────────┐
│  DRS Structure  │  Files organized following the project DRS
│   (versioned)   │
└────────┬────────┘
         │
         │ esgmapfile make (generate publication metadata)
         ↓
┌─────────────────┐
│    Mapfiles     │  Ready for ESGF publication
└─────────────────┘
Key Concepts:
DRS (Data Reference Syntax): A standardized directory structure for organizing climate data
Facets: Metadata attributes (like project, model, experiment) that define your data
Dataset ID: A unique identifier constructed from facets, like CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical...
Mapfiles: Text files listing your data files with checksums, required for ESGF publication
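To make the relationship between facets and a dataset ID concrete, here is a minimal sketch that joins facet values with dots. The facet key names follow the usual CMIP6 attribute names and are an assumption here; the values come from the example dataset used later in this guide. esgprep derives all of this for you, so the snippet is only an illustration.

```python
# Facet values for the example dataset used in this guide.
# The key names are assumed CMIP6-style attribute names.
facets = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "IPSL",
    "source_id": "IPSL-CM6A-LR",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
    "table_id": "Amon",
    "variable_id": "tas",
    "grid_label": "gr",
}


def dataset_id(facets):
    """Join facet values, in insertion order, into a dot-separated ID."""
    return ".".join(facets.values())


print(dataset_id(facets))
# CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr
```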
Example Scenario
Let’s prepare CMIP6 data from the IPSL-CM6A-LR model for publication.
Starting Point:
Note
Your files must be CMOR-compliant standardized NetCDF files that follow your project’s conventions (CMIP7, CMIP6, CORDEX, etc.). Files produced by CMOR or following the same standards will work correctly.
You have standardized NetCDF files in an incoming directory:
/data/incoming/
├── tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
├── tas_Amon_IPSL-CM6A-LR_historical_r2i1p1f1_gr_185001-201412.nc
└── pr_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
Goal:
Create a DRS-compliant structure and generate mapfiles for ESGF publication.
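The file names above already carry several facets. As an illustration only (esgprep does its own parsing and validation against the controlled vocabularies), the sketch below splits a name of this seven-part form into its components. The function name `parse_filename` is hypothetical, and files without a time range are not handled.

```python
from pathlib import Path

# Order of the underscore-separated parts in the example file names.
FILENAME_FACETS = (
    "variable_id",
    "table_id",
    "source_id",
    "experiment_id",
    "member_id",
    "grid_label",
    "time_range",
)


def parse_filename(path):
    """Split a seven-part NetCDF file name into a facet dictionary."""
    parts = Path(path).stem.split("_")
    if len(parts) != len(FILENAME_FACETS):
        raise ValueError(f"unexpected file name layout: {path}")
    return dict(zip(FILENAME_FACETS, parts))


name = "tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc"
print(parse_filename(name)["member_id"])  # r1i1p1f1
```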
Step 1: List Datasets
First, let’s see what datasets esgprep detects from your files:
$ esgdrs make list --project cmip6 /data/incoming/
Expected Output:
DRS tree generation [----<-] ...
DRS tree generation [<<<<<<] Completed
Number of success(es): 3
Number of error(s): 0
===================================================================================================================================
Publication level                                             Latest version  ->  Upgrade version  Files to upgrade  Total size
-----------------------------------------------------------------------------------------------------------------------------------
CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr  Initial         ->  v20250125        1                 1.2G
CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r2i1p1f1/Amon/tas/gr  Initial         ->  v20250125        1                 1.2G
CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/pr/gr   Initial         ->  v20250125        1                 2.1G
===================================================================================================================================
What This Shows:
Progress spinner during file scanning: [----<-] → [<<<<<<]
Success/error counts for processed files
Table showing:
Publication level: Dataset path in DRS structure
Latest version: “Initial” for new datasets, or existing version number
Upgrade version: New version to be created (vYYYYMMDD format, typically today’s date)
Files to upgrade: Count of files in this dataset
Total size: Human-readable size of the dataset
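The vYYYYMMDD label in the "Upgrade version" column is just a date with a `v` prefix. A minimal sketch of producing one, assuming nothing about how esgprep computes it internally (the name `version_label` is hypothetical):

```python
from datetime import date


def version_label(day=None):
    """Format a date as a vYYYYMMDD version label (today if omitted)."""
    day = day or date.today()
    return day.strftime("v%Y%m%d")


print(version_label(date(2025, 1, 25)))  # v20250125
```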
Tip
If you see errors here about invalid facets or unrecognized project, check that:
Your files are CMOR-compliant
The project name is spelled correctly (case-sensitive: cmip6 not CMIP6)
Files have proper global attributes
Step 2: Preview DRS Structure
Before making changes, preview what the DRS structure will look like:
$ esgdrs make tree --project cmip6 /data/incoming/ --root /data/esgf-data
Expected Output:
DRS tree generation [----<-] ...
DRS tree generation [<<<<<<] Completed
Number of success(es): 3
Number of error(s): 0
===================================================================================================================================
Upgrade DRS Tree
-----------------------------------------------------------------------------------------------------------------------------------
/data/esgf-data
CMIP6
└── CMIP
    └── IPSL
        └── IPSL-CM6A-LR
            └── historical
                ├── r1i1p1f1
                │   └── Amon
                │       ├── tas
                │       │   └── gr
                │       │       ├── files
                │       │       │   └── d20250125
                │       │       │       └── tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
                │       │       ├── latest -> v20250125
                │       │       └── v20250125
                │       │           └── tas_Amon_*.nc -> ../files/d20250125/tas_Amon_*.nc
                │       └── pr
                │           └── gr
                │               ├── files
                │               │   └── d20250125
                │               ├── latest -> v20250125
                │               └── v20250125
                └── r2i1p1f1
                    └── Amon
                        └── tas
                            └── gr
                                ├── files
                                │   └── d20250125
                                ├── latest -> v20250125
                                └── v20250125
===================================================================================================================================
What This Shows:
Complete directory hierarchy following project DRS
Files organized by facets: activity, institution, model, experiment, variant, table (e.g. Amon), variable, grid
files/dYYYYMMDD/ directories containing actual data files
vYYYYMMDD/ directories with symlinks pointing to files
latest symlinks pointing to newest version directory
Arrow notation (->) showing symlink targets
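The layout described above can also be written as plain path arithmetic. This is a minimal sketch rather than esgprep's implementation: the function name `drs_paths` is hypothetical, and the facet order is simply read off the example tree.

```python
from pathlib import PurePosixPath


def drs_paths(root, facet_values, version):
    """Return the three locations that make up one versioned dataset."""
    dataset_dir = PurePosixPath(root).joinpath(*facet_values)
    return {
        "files": dataset_dir / "files" / f"d{version}",  # real data files
        "version": dataset_dir / f"v{version}",          # symlinks to files
        "latest": dataset_dir / "latest",                # symlink to version
    }


example = drs_paths(
    "/data/esgf-data",
    ["CMIP6", "CMIP", "IPSL", "IPSL-CM6A-LR", "historical",
     "r1i1p1f1", "Amon", "tas", "gr"],
    "20250125",
)
print(example["version"])
# /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125
```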
Note
The --root option specifies where to create the DRS structure. If omitted, it uses your current directory.
Step 3: See Planned Operations
For more detail on what operations will be performed:
$ esgdrs make todo --project cmip6 /data/incoming/ --root /data/esgf-data --link
Expected Output:
DRS tree generation [----<-] ...
DRS tree generation [<<<<<<] Completed
Number of success(es): 3
Number of error(s): 0
===================================================================================================================================
Unix command-lines (DRY-RUN)
-----------------------------------------------------------------------------------------------------------------------------------
mkdir -p /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125
ln -s ../files/d20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
mkdir -p /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr
ln -s v20250125 /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/latest
mkdir -p /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/files/d20250125
ln /data/incoming/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/files/d20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
[... similar commands for other files ...]
===================================================================================================================================
What This Shows:
Header: “Unix command-lines (DRY-RUN)” indicates no actual changes
Exact Unix commands that will be executed:
mkdir -p - Create version directories
ln -s - Create symlinks from version dir to files/
ln -s - Create latest symlink pointing to version
mkdir -p - Create files/dYYYYMMDD directory
ln - Create hard link (with --link flag) or mv - Move file (default)
Sequence shows complete operation for each dataset
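If it helps to see the same sequence outside the shell, the sketch below reproduces the mkdir / ln / ln -s steps for a single file with the Python standard library. It is an illustration under stated assumptions, not esgprep code: `place_file` is a hypothetical name, it only covers a first-time placement (an existing `latest` link would raise an error), and the hard link requires source and target to be on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def place_file(src, dataset_dir, version):
    """Hard-link one file into files/dVERSION and add the two symlinks."""
    src = Path(src)
    dataset_dir = Path(dataset_dir)
    files_dir = dataset_dir / "files" / f"d{version}"
    version_dir = dataset_dir / f"v{version}"
    version_dir.mkdir(parents=True, exist_ok=True)  # mkdir -p .../vVERSION
    files_dir.mkdir(parents=True, exist_ok=True)    # mkdir -p .../files/dVERSION
    os.link(src, files_dir / src.name)              # ln (hard link)
    relative_target = Path("..") / "files" / f"d{version}" / src.name
    os.symlink(relative_target, version_dir / src.name)  # ln -s ../files/...
    os.symlink(f"v{version}", dataset_dir / "latest")    # ln -s vVERSION latest
    return version_dir / src.name


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp) / "incoming"
    incoming.mkdir()
    sample = incoming / "tas_example.nc"
    sample.write_bytes(b"data")
    link = place_file(sample, Path(tmp) / "drs" / "gr", "20250125")
    print(os.readlink(link))  # ../files/d20250125/tas_example.nc
```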
Tip
Use --copy instead of --link if you want to preserve the original files separately.
Use --symlink for symbolic links (use with caution - broken if source moves).
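The practical difference between the three modes comes down to ordinary filesystem behaviour, which the following sketch demonstrates with the standard library alone. It does not call esgprep; the function name `compare_link_modes` and the file names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def compare_link_modes(workdir):
    """Create a hard link, a copy and a symlink, then move the source."""
    workdir = Path(workdir).resolve()
    source = workdir / "source.nc"
    source.write_bytes(b"example")
    hard = workdir / "hard.nc"
    copy = workdir / "copy.nc"
    soft = workdir / "soft.nc"
    os.link(source, hard)        # like --link
    shutil.copy2(source, copy)   # like --copy
    os.symlink(source, soft)     # like --symlink
    report = {
        "hard_link_shares_inode": source.stat().st_ino == hard.stat().st_ino,
        "copy_shares_inode": source.stat().st_ino == copy.stat().st_ino,
    }
    source.rename(workdir / "moved.nc")  # simulate the source being moved
    report["hard_link_readable_after_move"] = hard.exists()
    report["copy_readable_after_move"] = copy.exists()
    report["symlink_readable_after_move"] = soft.exists()
    return report


with tempfile.TemporaryDirectory() as tmp:
    for key, value in compare_link_modes(tmp).items():
        print(f"{key}: {value}")
```

The hard link and the copy stay readable after the move, while the symbolic link dangles, which is the caution the tip refers to.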
Step 4: Apply DRS Structure
Now let’s actually create the DRS structure:
$ esgdrs make upgrade --project cmip6 /data/incoming/ --root /data/esgf-data --link
Expected Output:
DRS tree generation [----<-] ...
DRS tree generation [<<<<<<] Completed
Number of success(es): 3
Number of error(s): 0
===================================================================================================================================
Unix command-lines
-----------------------------------------------------------------------------------------------------------------------------------
mkdir -p /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125
ln -s ../files/d20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
ln -s v20250125 /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/latest
mkdir -p /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/files/d20250125
ln /data/incoming/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/files/d20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
[... similar commands for other datasets ...]
===================================================================================================================================
What Just Happened:
Commands shown are actually being executed (no longer “DRY-RUN”)
DRS directory structure created under /data/esgf-data/CMIP6/
Files hard-linked (with --link) to their DRS locations
latest symlinks created pointing to v20250125
Success count confirms all operations completed
Verify the Result:
$ ls -lh /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/
drwxr-xr-x 2 user group 4.0K Nov 25 10:30 v20250125
lrwxrwxrwx 1 user group 10 Nov 25 10:30 latest -> v20250125
$ ls /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125/
tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc
Perfect! Your data is now organized according to the project DRS.
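For a larger tree, checking every link by hand is tedious. One possible approach, sketched here with the standard library only, is to walk the DRS root and report symlinks whose targets do not exist. The function name `broken_symlinks` is hypothetical and this is not an esgprep feature.

```python
import os
import tempfile
from pathlib import Path


def broken_symlinks(root):
    """Return the sorted paths of symlinks under root that do not resolve."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories are listed in dirnames, others in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


# Demonstration: one healthy link and one dangling link.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "v20250125").mkdir()
    os.symlink("v20250125", root / "latest")
    os.symlink("missing.nc", root / "v20250125" / "dangling.nc")
    print(len(broken_symlinks(root)))  # 1
```

An empty result for your DRS root (for example /data/esgf-data/CMIP6/) means every version and latest link resolves.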
Step 5: Generate Mapfiles
Now generate the mapfiles needed for ESGF publication:
$ esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/ --outdir /data/mapfiles
Expected Output:
Mapfiles generation [----<-] ...
Mapfiles generation [<<<<<<] Completed
Mapfile(s) generated: 3 (in /data/mapfiles)
Number of success(es): 3
Number of error(s): 0
What Was Created:
Mapfiles are text files with one line per data file:
$ cat /data/mapfiles/CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr.v20250125.map
Content:
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr#20250125 | /data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/v20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc | 1234567890 | mod_time=1732531200.0 | checksum=a3d5e6f7890b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
- Format:
  dataset_id#version | file_path | size_bytes | mod_time=TIMESTAMP | checksum=HEXDIGEST
  Note: Version uses # separator (not .v) in the dataset ID within mapfiles
  Checksum type defaults to SHA256 unless specified with --checksum-type
These mapfiles are ready for ESGF publication!
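Because each mapfile line is pipe-separated, it is easy to read back for spot checks. The sketch below parses one line of the form shown above; `parse_mapfile_line` is a hypothetical helper and handles only the fields that appear in this guide.

```python
def parse_mapfile_line(line):
    """Split one mapfile line into its dataset, path, size and key=value parts."""
    fields = [field.strip() for field in line.split("|")]
    dataset, path, size = fields[:3]
    dataset_id, _, version = dataset.partition("#")
    extras = dict(field.split("=", 1) for field in fields[3:])
    return {
        "dataset_id": dataset_id,
        "version": version,
        "path": path,
        "size": int(size),
        **extras,
    }


line = (
    "CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr#20250125 | "
    "/data/esgf-data/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/tas/gr/"
    "v20250125/tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc | "
    "1234567890 | mod_time=1732531200.0 | "
    "checksum=a3d5e6f7890b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d"
)
entry = parse_mapfile_line(line)
print(entry["version"], entry["size"])  # 20250125 1234567890
```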
Step 6: Verify Everything
Let’s verify the complete workflow succeeded:
# Check DRS structure exists
$ tree /data/esgf-data/CMIP6/ | head -20
# or use: find /data/esgf-data/CMIP6/ -type f -o -type l
# Check mapfiles were created
$ ls -lh /data/mapfiles/
-rw-r--r-- 1 user group 350 Jan 25 10:30 CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr.v20250125.map
-rw-r--r-- 1 user group 350 Jan 25 10:30 CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r2i1p1f1.Amon.tas.gr.v20250125.map
-rw-r--r-- 1 user group 350 Jan 25 10:30 CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.pr.gr.v20250125.map
# View mapfile content
$ cat /data/mapfiles/*.map | head -1
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.historical.r1i1p1f1.Amon.tas.gr#20250125 | /data/esgf-data/CMIP6/... | 1234567890 | ...
Common Options
During your workflow, you may want to use these common options:
Performance:
# Use multiple processors for faster processing
$ esgdrs make upgrade --project cmip6 /data/incoming/ --max-processes 8
# Skip checksums for faster testing (not for production!)
$ esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/ --no-checksum
Checksums:
# Use multihash format (recommended for new data)
$ esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/ --checksum-type sha2-256
# Provide pre-calculated checksums
$ esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/ --checksums-from checksums.txt
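The name sha2-256 comes from the multihash convention, in which a digest is prefixed by a hash-function code and the digest length. The sketch below builds that byte layout for illustration. How esgprep encodes the value when writing a mapfile is not described in this guide, so the hex rendering printed here is an assumption.

```python
import hashlib

SHA2_256_CODE = 0x12  # multihash function code for sha2-256


def multihash_sha2_256(data):
    """Return <code><length><digest> bytes for the SHA-256 of data."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256_CODE, len(digest)]) + digest


value = multihash_sha2_256(b"example")
print(value.hex()[:4])  # 1220 (code 0x12, length 0x20 = 32 bytes)
```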
Versioning:
# Specify a custom version
$ esgdrs make upgrade --project cmip6 /data/incoming/ --version 20241201
# Process all versions (not just latest)
$ esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/ --all-versions
Logging:
# Save output to logfile
$ esgdrs make upgrade --project cmip6 /data/incoming/ --log /var/log/esgprep/
# Enable debug mode for troubleshooting
$ esgdrs make upgrade --project cmip6 /data/incoming/ --debug
What’s Next?
Congratulations! You’ve successfully:
✓ Organized your data into project DRS structure
✓ Generated mapfiles for ESGF publication
✓ Computed checksums for data integrity
Next Steps:
Publish to ESGF:
Use the ESGF publisher tools with your mapfiles:
esgpublish --map /data/mapfiles/*.map --service fileservice
Learn More:
Manage local data through the DRS - Detailed DRS command reference
Generate mapfile for ESGF publication - Advanced mapfile options
Configuration - Checksum and vocabulary configuration
Changelog from esgprep 2.x to 3.0 - Migrating from version 2.x
Handle Updates:
When you need to publish a new version:
# New files in incoming directory
$ esgdrs make list --project cmip6 /data/incoming/
# Will show version increment: v20250126
$ esgdrs make upgrade --project cmip6 /data/incoming/ --upgrade-from-latest
# Only processes changed files, symlinks unchanged ones
$ esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/
# Generates new mapfiles for updated version
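After an update, a dataset directory holds several vYYYYMMDD entries. Since the labels are fixed-width dates, the newest one is simply the largest string. A minimal sketch of that selection, with the hypothetical helper name `latest_version`:

```python
import re

VERSION_PATTERN = re.compile(r"^v\d{8}$")


def latest_version(names):
    """Pick the newest vYYYYMMDD entry from a directory listing, if any."""
    versions = [name for name in names if VERSION_PATTERN.match(name)]
    return max(versions) if versions else None


print(latest_version(["files", "latest", "v20250125", "v20250126"]))  # v20250126
```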
Common Issues
Problem: “Project ‘cmip6’ not found in vocabulary”
ValueError: Project 'cmip6' not found in esgvoc
Solution: Check project name spelling (case-sensitive). Update esgvoc:
pip install --upgrade esgvoc
—
Problem: “Invalid facet value” errors
Solution: Your NetCDF files may not be CMOR-compliant. Verify:
# Check file attributes
ncdump -h your_file.nc | grep ":"
Ensure global attributes match project requirements.
—
Problem: Checksumming is very slow
Solution: For large files, pre-calculate checksums:
# Generate checksums separately
sha256sum /data/incoming/*.nc > checksums.txt
# Use them in esgmapfile
esgmapfile make --project cmip6 --directory /data/esgf-data/CMIP6/ --checksums-from checksums.txt
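If `sha256sum` is not available, the same digests can be computed in Python. The sketch below reads each file in chunks so large NetCDF files are not loaded into memory at once, and yields lines in the `<hexdigest>  <path>` layout that `sha256sum` prints. The helper names are hypothetical, and the sketch assumes `--checksums-from` accepts that layout, as the example above implies.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(paths):
    """Yield one '<hexdigest>  <path>' line per file (two spaces between)."""
    for path in paths:
        yield f"{sha256_file(path)}  {path}"


# Demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.nc"
    sample.write_bytes(b"example data")
    for line in checksum_lines([sample]):
        print(line)
```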
—
Problem: Need to test without modifying data
Solution: Use a test directory:
# Create test directory
mkdir -p /tmp/test
# Test in /tmp
esgdrs make upgrade --project cmip6 /data/incoming/ --root /tmp/test --link
# Review output with tree or find
tree /tmp/test/CMIP6/
# If satisfied, run on production location
Getting Help
If you encounter issues:
Check the logs: Use --log and --debug options
Read the FAQ: See Frequently Asked Questions for common questions
Consult detailed docs: Generic usage, Manage local data through the DRS, Generate mapfile for ESGF publication
Report bugs: https://github.com/ESGF/esgf-prepare/issues
Useful Commands for Debugging:
# Check esgprep version
esgdrs --version
esgmapfile --version
# Check available projects (if esgvoc accessible)
python -c "import esgvoc; print(esgvoc.list_projects())"
# Verify NetCDF file structure
ncdump -h your_file.nc
# Test with single file
esgdrs make list --project cmip6 /path/to/single/file.nc
Quick Reference
Essential Commands:
# Preview what datasets will be created
esgdrs make list --project PROJECT /path/to/incoming/
# Preview DRS directory structure
esgdrs make tree --project PROJECT /path/to/incoming/ --root /output/path
# See planned operations (dry-run)
esgdrs make todo --project PROJECT /path/to/incoming/ --root /output/path --link
# Apply DRS structure
esgdrs make upgrade --project PROJECT /path/to/incoming/ --root /output/path --link
# Generate mapfiles
esgmapfile make --project PROJECT --directory /path/to/drs/ --outdir /path/to/mapfiles/
# Show mapfile paths (dry-run)
esgmapfile show --project PROJECT --directory /path/to/drs/
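If you drive these steps from a script, one option is to assemble the argument lists in Python and hand them to `subprocess`. The sketch below only uses the commands and options shown in this guide; the function names `workflow_commands` and `run_all` are hypothetical, and nothing is executed until you call `run_all` yourself.

```python
import subprocess


def workflow_commands(project, incoming, root, drs_dir, mapfile_dir):
    """Build the preview, apply and mapfile commands as argument lists."""
    target = ["--project", project, incoming, "--root", root]
    return [
        ["esgdrs", "make", "list", "--project", project, incoming],
        ["esgdrs", "make", "tree"] + target,
        ["esgdrs", "make", "todo"] + target + ["--link"],
        ["esgdrs", "make", "upgrade"] + target + ["--link"],
        ["esgmapfile", "make", "--project", project,
         "--directory", drs_dir, "--outdir", mapfile_dir],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        runner(command, check=True)


commands = workflow_commands(
    "cmip6", "/data/incoming/", "/data/esgf-data",
    "/data/esgf-data/CMIP6/", "/data/mapfiles",
)
for command in commands:
    print(" ".join(command))
```

Passing lists rather than a single shell string avoids quoting problems with unusual paths.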
- Common Projects:
  cmip6 - CMIP6
  cmip5 - CMIP5
  cordex - CORDEX
  input4mips - input4MIPs
  obs4mips - obs4MIPs
For a complete list, check the esgvoc library documentation.
—
Ready to dive deeper? Continue to Generic usage for comprehensive command-line options, or Manage local data through the DRS for detailed DRS management features.