Real-World Examples
This section provides complete, copy-paste-ready examples for common ESGF projects. Each example shows the full workflow from raw data to publication-ready mapfiles.
CMIP7 Example
CMIP7 (Coupled Model Intercomparison Project Phase 7) is the latest generation of coordinated climate model experiments. This example demonstrates preparing CMIP7 data for ESGF publication.
Scenario
You have CMIP7 model output from IPSL-CM7A-LR for the historical experiment:
/incoming/
├── tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc
├── tas_mon_IPSL-CM7A-LR_historical_r2i1p1f1_glb_g1_185001-201412.nc
└── pr_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc
Step 1: List datasets
First, check what datasets will be created:
$> esgdrs list --project cmip7 /incoming/
Expected output:
Dataset: CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1
Files: 1, Size: 2.3 GB, Version: v20250120
Dataset: CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r2i1p1f1.glb.mon.tas.g1
Files: 1, Size: 2.3 GB, Version: v20250120
Dataset: CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.pr.g1
Files: 1, Size: 1.8 GB, Version: v20250120
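The dataset IDs above are built from facets parsed out of each filename. As a rough sketch of that mapping (the facet order, and the institution/activity values that are not present in the filename, are assumptions taken from this example, not esgprep's actual parsing logic):

```python
# Illustrative sketch: map a CMIP7-style filename to the dataset ID format
# shown above. Facet order and the hard-coded institution/activity/mip_era
# values are assumptions based on this example only.
def dataset_id(filename):
    stem = filename.rsplit(".", 1)[0]
    (variable, frequency, source, experiment,
     variant, region, grid, _time_range) = stem.split("_")
    # In practice these come from project configuration tables;
    # hard-coded here for the example.
    mip_era, activity, institution = "CMIP7", "CMIP", "IPSL"
    return ".".join([mip_era, activity, institution, source, experiment,
                     variant, region, frequency, variable, grid])

print(dataset_id("tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc"))
# CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1
```

Note that the dataset ID reorders the facets relative to the filename (e.g. frequency moves after the region).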
Step 2: Preview the DRS structure
See how files will be organized:
$> esgdrs tree --project cmip7 /incoming/
Expected output:
/data/MIP-DRS7/
└── CMIP7/
└── CMIP/
└── IPSL/
└── IPSL-CM7A-LR/
└── historical/
├── r1i1p1f1/
│ └── glb/
│ └── mon/
│ ├── tas/
│ │ └── none/
│ │ └── g1/
│ │ ├── v20250120/
│ │ └── latest -> v20250120/
│ └── pr/
│ └── none/
│ └── g1/
│ ├── v20250120/
│ └── latest -> v20250120/
└── r2i1p1f1/
└── ...
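The tree above is just the dataset facets joined into a directory path. A minimal sketch, assuming the facet order shown in this example (the literal `none` level and the helper name are illustrative, not part of esgprep):

```python
import os

# Illustrative sketch: join DRS facets into the directory path shown above.
# The facet order and the "none" placeholder level follow this example's
# tree; they are assumptions, not the tool's own path-building code.
def drs_path(root, facets, version):
    parts = [facets[key] for key in ("mip_era", "activity", "institution",
                                     "source", "experiment", "variant",
                                     "region", "frequency", "variable")]
    return os.path.join(root, *parts, "none", facets["grid"], version)

facets = {"mip_era": "CMIP7", "activity": "CMIP", "institution": "IPSL",
          "source": "IPSL-CM7A-LR", "experiment": "historical",
          "variant": "r1i1p1f1", "region": "glb", "frequency": "mon",
          "variable": "tas", "grid": "g1"}
print(drs_path("/data/MIP-DRS7", facets, "v20250120"))
# /data/MIP-DRS7/CMIP7/CMIP/IPSL/IPSL-CM7A-LR/historical/r1i1p1f1/glb/mon/tas/none/g1/v20250120
```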
Step 3: Apply the DRS structure
Create the directory structure using symlinks (recommended to save disk space):
$> esgdrs upgrade --project cmip7 --root /data --link /incoming/
Note
Use --link to create symlinks instead of copying files. This saves disk space
and allows the original files to remain in place.
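Conceptually, `--link` places a symlink in the DRS tree instead of a copy, so the original file under `/incoming` is untouched and no bytes are duplicated. A minimal sketch (the helper name and paths are illustrative, not esgprep code):

```python
import os, tempfile

# Conceptual sketch of what --link does: the DRS version directory receives
# a symlink, so the source file stays in place and no data is copied.
# Function name and paths are illustrative only.
def link_into_drs(source_path, drs_version_dir):
    os.makedirs(drs_version_dir, exist_ok=True)
    link_path = os.path.join(drs_version_dir, os.path.basename(source_path))
    os.symlink(source_path, link_path)  # no copy: near-zero extra disk usage
    return link_path

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "tas.nc")
open(source, "w").close()
link = link_into_drs(source, os.path.join(workdir, "v20250120"))
print(os.path.islink(link), os.path.exists(source))  # True True
```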
Step 4: Generate mapfiles
Create mapfiles for ESGF publication:
$> esgmapfile make --project cmip7 /data/MIP-DRS7/
Output files:
CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1.v20250120.map
CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r2i1p1f1.glb.mon.tas.g1.v20250120.map
CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.pr.g1.v20250120.map
Example mapfile content:
CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1.v20250120 | /data/MIP-DRS7/CMIP7/CMIP/IPSL/IPSL-CM7A-LR/historical/r1i1p1f1/glb/mon/tas/none/g1/v20250120/tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc | 2469134567 | mod_time=1737331200.0 | checksum=a1b2c3d4... | checksum_type=SHA256
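Each mapfile line is pipe-delimited: dataset ID (with version suffix), file path, size in bytes, then `key=value` pairs. A small parser sketch based on the line format shown above (field names are this example's, with an abbreviated path for readability):

```python
# Sketch: split a pipe-delimited mapfile line, as shown above, into fields.
# The first three positions are fixed; the rest are key=value pairs.
def parse_mapfile_line(line):
    parts = [part.strip() for part in line.split(" | ")]
    entry = {"dataset_id": parts[0], "path": parts[1], "size": int(parts[2])}
    for pair in parts[3:]:
        key, value = pair.split("=", 1)
        entry[key] = value
    return entry

line = ("CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1.v20250120"
        " | /data/tas.nc | 2469134567 | mod_time=1737331200.0"
        " | checksum=a1b2c3d4 | checksum_type=SHA256")
entry = parse_mapfile_line(line)
print(entry["size"], entry["checksum_type"])  # 2469134567 SHA256
```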
CORDEX-CMIP6 Example
CORDEX (Coordinated Regional Climate Downscaling Experiment) provides high-resolution regional climate projections. This example shows preparing CORDEX-CMIP6 data.
Scenario
You have CORDEX data for the European domain from the ALARO1-SFX model:
/incoming/
├── tas_EUR-12_MPI-ESM1-2-HR_historical_r1i1p1f1_RMIB-UGent_ALARO1-SFX_v1_day_19500101-19501231.nc
└── pr_EUR-12_MPI-ESM1-2-HR_historical_r1i1p1f1_RMIB-UGent_ALARO1-SFX_v1_day_19500101-19501231.nc
Step 1: List datasets
$> esgdrs list --project cordex-cmip6 /incoming/
Expected output:
Dataset: CORDEX.DD.EUR-12.RMIB-UGent.MPI-ESM1-2-HR.historical.r1i1p1f1.ALARO1-SFX.v1.day.tas
Files: 1, Size: 450 MB, Version: v20250120
Dataset: CORDEX.DD.EUR-12.RMIB-UGent.MPI-ESM1-2-HR.historical.r1i1p1f1.ALARO1-SFX.v1.day.pr
Files: 1, Size: 380 MB, Version: v20250120
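CORDEX filenames carry more facets than CMIP7 ones (domain, driving model, institution, model version). As an illustrative sketch of how the dataset ID above relates to the filename (the facet order and the fixed `CORDEX.DD` prefix are assumptions taken from this example):

```python
# Illustrative sketch: map a CORDEX-CMIP6-style filename to the dataset ID
# shown above. The token order and the fixed "CORDEX.DD" prefix are
# assumptions from this example, not esgprep's actual logic.
def cordex_dataset_id(filename):
    stem = filename.rsplit(".", 1)[0]
    (variable, domain, driving_source, experiment, variant,
     institution, source, source_version, frequency, _time) = stem.split("_")
    return ".".join(["CORDEX", "DD", domain, institution, driving_source,
                     experiment, variant, source, source_version,
                     frequency, variable])

print(cordex_dataset_id(
    "tas_EUR-12_MPI-ESM1-2-HR_historical_r1i1p1f1_RMIB-UGent_"
    "ALARO1-SFX_v1_day_19500101-19501231.nc"))
# CORDEX.DD.EUR-12.RMIB-UGent.MPI-ESM1-2-HR.historical.r1i1p1f1.ALARO1-SFX.v1.day.tas
```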
Step 2: Preview and apply DRS
# Preview structure
$> esgdrs tree --project cordex-cmip6 /incoming/
# Apply structure
$> esgdrs upgrade --project cordex-cmip6 --root /data --link /incoming/
Step 3: Generate mapfiles
$> esgmapfile make --project cordex-cmip6 /data/CORDEX/
Key Differences Between Projects
| Aspect | CMIP7 | CORDEX-CMIP6 |
|---|---|---|
| Scope | Global climate models | Regional downscaling |
| Resolution | ~100-250 km | 12-50 km |
| Key facets | source, experiment, frequency | domain_id, driving_source_id, source_id |
| DRS root | MIP-DRS7/CMIP7/ | CORDEX/ |
Dataset Updates (New Versions)
When you need to publish an updated version of existing data:
Scenario
You have corrections to existing CMIP7 data:
/incoming_v2/
└── tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc (corrected)
Using --upgrade-from-latest
This option reuses unchanged files from the previous version:
$> esgdrs upgrade --project cmip7 \
--root /data \
--link \
--upgrade-from-latest \
/incoming_v2/
Result:
/data/MIP-DRS7/CMIP7/.../tas/none/g1/
├── files/
│ ├── d20250101/
│ │ └── tas_*.nc (original)
│ └── d20250120/
│ └── tas_*.nc (corrected)
│
├── v20250101/
│ └── tas_*.nc -> ../files/d20250101/tas_*.nc
│
├── v20250120/
│ └── tas_*.nc -> ../files/d20250120/tas_*.nc (new version)
│
└── latest -> v20250120/ (updated)
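Downstream scripts can find the current version by reading the `latest` symlink, which (as in the tree above) is a relative link to the version directory. A minimal sketch:

```python
import os, tempfile

# Sketch: resolve the "latest" symlink to find the current version
# directory, mirroring the dataset layout shown above.
dataset_dir = tempfile.mkdtemp()
for version in ("v20250101", "v20250120"):
    os.makedirs(os.path.join(dataset_dir, version))
os.symlink("v20250120", os.path.join(dataset_dir, "latest"))  # relative link
current = os.readlink(os.path.join(dataset_dir, "latest"))
print(current)  # v20250120
```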
Generate mapfiles for the new version:
$> esgmapfile make --project cmip7 --latest /data/MIP-DRS7/
Testing Before Production
Always test your workflow before modifying production data:
# Create a test directory
mkdir -p /tmp/esgprep_test
# Copy a few sample files
cp /incoming/tas_*.nc /tmp/esgprep_test/
# Test the workflow
$> esgdrs list --project cmip7 /tmp/esgprep_test/
$> esgdrs tree --project cmip7 /tmp/esgprep_test/
$> esgdrs upgrade --project cmip7 --root /tmp/test_output --link /tmp/esgprep_test/
# Verify structure
tree /tmp/test_output/
# Generate test mapfiles
$> esgmapfile make --project cmip7 /tmp/test_output/
# Review mapfiles
cat *.map
# Clean up when satisfied
rm -rf /tmp/esgprep_test /tmp/test_output
Large Dataset Processing
For datasets with many files (>10,000), optimize processing:
Pre-calculate checksums
For large files, pre-calculating checksums can save time:
# Create checksum file
find /incoming -name "*.nc" -exec sha256sum {} \; > checksums.txt
# Use pre-calculated checksums
$> esgmapfile make --project cmip7 \
--checksums-from checksums.txt \
/data/MIP-DRS7/
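If you compute checksums yourself (e.g. on a machine closer to the storage), reading the file in fixed-size chunks keeps memory flat even for multi-GB NetCDF files. A sketch producing `sha256sum`-style `<hex>  <path>` lines like those in `checksums.txt`:

```python
import hashlib, os, tempfile

# Sketch: chunked SHA256 so large files never load fully into memory;
# output mimics sha256sum's "<hex>  <path>" line format.
def sha256_line(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path}"

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".nc") as tmp:
    tmp.write(b"netcdf-bytes")
line = sha256_line(tmp.name)
os.unlink(tmp.name)
print(line)
```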
Optimize multiprocessing
Adjust the number of processes based on your system:
# Use 8 processes (default is 4)
$> esgmapfile make --project cmip7 \
--max-processes 8 \
/data/MIP-DRS7/
Tip
For I/O-bound operations (reading many files), more processes may not help. For CPU-bound operations (checksum calculation), use as many cores as available.
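The tip above can be turned into a simple heuristic for picking a `--max-processes` value. This is a sketch of one reasonable policy, not esgprep's own logic (the function name and thresholds are assumptions):

```python
import os

# Illustrative heuristic for choosing a --max-processes value, per the tip
# above: CPU-bound work (checksums) scales to all cores; I/O-bound scans
# gain little beyond a few processes. Thresholds are assumptions.
def suggest_processes(cpu_bound, requested=None):
    cores = os.cpu_count() or 1
    if requested is not None:
        return max(1, min(requested, cores))  # never exceed physical cores
    return cores if cpu_bound else min(4, cores)

print(suggest_processes(cpu_bound=True))
```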
See Also
Concepts & Terminology - Understanding ESGF terminology
Manage local data through the DRS - Detailed esgdrs command reference
Generate mapfile for ESGF publication - Detailed esgmapfile command reference
Troubleshooting - Common issues and solutions