Real-World Examples

This section provides complete, copy-paste-ready examples for common ESGF projects. Each example shows the full workflow from raw data to publication-ready mapfiles.

CMIP7 Example 

CMIP7 (Coupled Model Intercomparison Project Phase 7) is the latest generation of coordinated climate model experiments. This example demonstrates preparing CMIP7 data for ESGF publication.

Scenario 

You have CMIP7 model output from IPSL-CM7A-LR for the historical experiment:

/incoming/
├── tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc
├── tas_mon_IPSL-CM7A-LR_historical_r2i1p1f1_glb_g1_185001-201412.nc
└── pr_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc

Step 1: List datasets 

First, check what datasets will be created:

$> esgdrs list --project cmip7 /incoming/

Expected output:

Dataset: CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1
Files: 1, Size: 2.3 GB, Version: v20250120

Dataset: CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r2i1p1f1.glb.mon.tas.g1
Files: 1, Size: 2.3 GB, Version: v20250120

Dataset: CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.pr.g1
Files: 1, Size: 1.8 GB, Version: v20250120
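Each dataset ID above reuses most of the fields of its filename. The sketch below makes that correspondence explicit by splitting a filename on underscores with awk. It assumes the field order of the example files (variable, frequency, source, experiment, variant, region, grid, time range). The activity (CMIP) and institution (IPSL) do not appear in the filename, so they are passed as optional arguments that default to the example values; esgdrs presumably takes them from file metadata or controlled vocabularies. `cmip7_id_from_filename` is a hypothetical name, not part of esgprep.

```shell
# Hypothetical helper: rebuild a CMIP7-style dataset ID from a filename.
# Assumed filename layout (taken from the example files):
#   variable_frequency_source_experiment_variant_region_grid_timerange.nc
# Usage: cmip7_id_from_filename FILE [ACTIVITY] [INSTITUTION]
cmip7_id_from_filename() {
    basename "$1" .nc | awk -F_ -v activity="${2:-CMIP}" -v institution="${3:-IPSL}" '{
        # $1=variable $2=frequency $3=source $4=experiment
        # $5=variant  $6=region    $7=grid   $8=time range
        printf "CMIP7.%s.%s.%s.%s.%s.%s.%s.%s.%s\n",
               activity, institution, $3, $4, $5, $6, $2, $1, $7
    }'
}

cmip7_id_from_filename /incoming/tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc
# prints CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1
```

Note that the dataset ID orders the fields differently from the filename (frequency and variable move towards the end), which is why the printf reorders them.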

Step 2: Preview the DRS structure 

See how files will be organized:

$> esgdrs tree --project cmip7 /incoming/

Expected output:

/data/MIP-DRS7/
└── CMIP7/
    └── CMIP/
        └── IPSL/
            └── IPSL-CM7A-LR/
                └── historical/
                    ├── r1i1p1f1/
                    │   └── glb/
                    │       └── mon/
                    │           ├── tas/
                    │           │   └── none/
                    │           │       └── g1/
                    │           │           ├── v20250120/
                    │           │           └── latest -> v20250120/
                    │           └── pr/
                    │               └── none/
                    │                   └── g1/
                    │                       ├── v20250120/
                    │                       └── latest -> v20250120/
                    └── r2i1p1f1/
                        └── ...

Step 3: Apply the DRS structure 

Create the directory structure using symlinks (recommended to save disk space):

$> esgdrs upgrade --project cmip7 --root /data --link /incoming/

Note

Use --link to create symlinks instead of copying files. This saves disk space and allows the original files to remain in place.
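To see what linking means on disk, the sketch below reproduces the idea with plain `ln -s` in a scratch directory: the versioned DRS directory holds only a symbolic link, and the data stay in the incoming directory. The paths and file name are illustrative, and this is not a description of how esgdrs is implemented internally.

```shell
# Scratch area standing in for /incoming and /data (illustrative paths only).
work=$(mktemp -d)
mkdir -p "$work/incoming" "$work/data/v20250120"
printf 'fake netCDF payload\n' > "$work/incoming/tas_example.nc"

# A symlink in the versioned DRS directory pointing back to the original file.
ln -s "$work/incoming/tas_example.nc" "$work/data/v20250120/tas_example.nc"

# The link resolves to the original file; no second copy of the data exists.
readlink "$work/data/v20250120/tas_example.nc"
cat "$work/data/v20250120/tas_example.nc"

# Clean up when done:
# rm -rf "$work"
```

Because the DRS tree only points back to the incoming directory, the original files must stay in place: deleting or moving them leaves dangling links.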

Step 4: Generate mapfiles 

Create mapfiles for ESGF publication:

$> esgmapfile make --project cmip7 /data/MIP-DRS7/

Output files:

CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1.v20250120.map
CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r2i1p1f1.glb.mon.tas.g1.v20250120.map
CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.pr.g1.v20250120.map

Example mapfile content:

CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1.v20250120 | /data/MIP-DRS7/CMIP7/CMIP/IPSL/IPSL-CM7A-LR/historical/r1i1p1f1/glb/mon/tas/none/g1/v20250120/tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc | 2469134567 | mod_time=1737331200.0 | checksum=a1b2c3d4... | checksum_type=SHA256
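Each mapfile line is a set of fields separated by `|`: the versioned dataset ID, the file path, the size in bytes, then `key=value` pairs. The sketch below checks every entry of a mapfile against the file on disk, comparing the size with `stat` and the SHA256 checksum with `sha256sum`. It assumes GNU coreutils and the field order of the example line above; `verify_mapfile` is a hypothetical helper, not an esgprep command, and it does not check `mod_time`.

```shell
# Hypothetical checker for mapfiles in the format shown above.
# Assumes GNU coreutils (stat -c, sha256sum) and fields in the order
#   dataset_id | path | size | mod_time=... | checksum=... | checksum_type=...
verify_mapfile() {
    status=0
    while IFS='|' read -r dataset path size rest; do
        [ -n "$dataset" ] || continue                  # skip blank lines
        path=$(printf '%s' "$path" | sed 's/^ *//; s/ *$//')
        size=$(printf '%s' "$size" | tr -d ' ')
        # Extract the value of the checksum=... field from the remaining fields.
        expected=$(printf '%s\n' "$rest" | tr '|' '\n' | sed -n 's/^ *checksum=\([^ ]*\) *$/\1/p')
        actual_size=$(stat -c %s "$path" 2>/dev/null)
        actual_sum=$(sha256sum "$path" 2>/dev/null | cut -d ' ' -f 1)
        if [ "$size" != "$actual_size" ] || [ "$expected" != "$actual_sum" ]; then
            echo "MISMATCH: $path" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}
```

The function prints one `MISMATCH` line per problem entry and returns a non-zero status if any entry fails, so it can be used in scripts, for example `verify_mapfile CMIP7.CMIP.IPSL.IPSL-CM7A-LR.historical.r1i1p1f1.glb.mon.tas.g1.v20250120.map`.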

CORDEX-CMIP6 Example 

CORDEX (Coordinated Regional Climate Downscaling Experiment) provides high-resolution regional climate projections. This example shows preparing CORDEX-CMIP6 data.

Scenario 

You have CORDEX-CMIP6 data for the European domain (EUR-12) from the ALARO1-SFX regional model, driven by MPI-ESM1-2-HR:

/incoming/
├── tas_EUR-12_MPI-ESM1-2-HR_historical_r1i1p1f1_RMIB-UGent_ALARO1-SFX_v1_day_19500101-19501231.nc
└── pr_EUR-12_MPI-ESM1-2-HR_historical_r1i1p1f1_RMIB-UGent_ALARO1-SFX_v1_day_19500101-19501231.nc

Step 1: List datasets 

$> esgdrs list --project cordex-cmip6 /incoming/

Expected output:

Dataset: CORDEX.DD.EUR-12.RMIB-UGent.MPI-ESM1-2-HR.historical.r1i1p1f1.ALARO1-SFX.v1.day.tas
Files: 1, Size: 450 MB, Version: v20250120

Dataset: CORDEX.DD.EUR-12.RMIB-UGent.MPI-ESM1-2-HR.historical.r1i1p1f1.ALARO1-SFX.v1.day.pr
Files: 1, Size: 380 MB, Version: v20250120

Step 2: Preview and apply DRS 

# Preview structure
$> esgdrs tree --project cordex-cmip6 /incoming/

# Apply structure
$> esgdrs upgrade --project cordex-cmip6 --root /data --link /incoming/

Step 3: Generate mapfiles 

$> esgmapfile make --project cordex-cmip6 /data/CORDEX/

Key Differences Between Projects 

Aspect              CMIP7                                CORDEX-CMIP6
------------------  -----------------------------------  ---------------------------------------
Scope               Global climate models                Regional downscaling
Typical resolution  ~100-250 km                          12-50 km
Key facets          source_id, experiment_id, frequency  domain_id, driving_source_id, source_id
DRS root            MIP-DRS7/CMIP7/                      CORDEX/

Dataset Updates (New Versions)

When you need to publish an updated version of existing data:

Scenario 

You have corrections to existing CMIP7 data:

/incoming_v2/
└── tas_mon_IPSL-CM7A-LR_historical_r1i1p1f1_glb_g1_185001-201412.nc  (corrected)

Using --upgrade-from-latest

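This option creates the new version from the latest existing one: unchanged files are carried over from the previous version, and only the files in /incoming_v2/ are added or replaced: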

$> esgdrs upgrade --project cmip7 \
                  --root /data \
                  --link \
                  --upgrade-from-latest \
                  /incoming_v2/

Result:

/data/MIP-DRS7/CMIP7/.../tas/none/g1/
├── files/
│   ├── d20250101/
│   │   └── tas_*.nc          (original)
│   └── d20250120/
│       └── tas_*.nc          (corrected)
│
├── v20250101/
│   └── tas_*.nc -> ../files/d20250101/tas_*.nc
│
├── v20250120/
│   └── tas_*.nc -> ../files/d20250120/tas_*.nc  (new version)
│
└── latest -> v20250120/      (updated)
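The layout above can be rebuilt by hand to see how the pieces relate: each `dYYYYMMDD` directory holds the real files, each `vYYYYMMDD` directory holds relative symlinks into `files/`, and `latest` points at the newest version. The sketch below builds that structure in a scratch directory with a hypothetical file name; it illustrates the result only, not the internals of esgdrs.

```shell
# Rebuild the versioned layout from the example in a scratch directory.
root=$(mktemp -d)
mkdir -p "$root/files/d20250101" "$root/files/d20250120" \
         "$root/v20250101" "$root/v20250120"
printf 'original\n'  > "$root/files/d20250101/tas_example.nc"
printf 'corrected\n' > "$root/files/d20250120/tas_example.nc"

# Version directories contain relative links into files/.
ln -s ../files/d20250101/tas_example.nc "$root/v20250101/tas_example.nc"
ln -s ../files/d20250120/tas_example.nc "$root/v20250120/tas_example.nc"

# "latest" is a relative symlink to the newest version directory.
ln -s v20250120 "$root/latest"

cat "$root/latest/tas_example.nc"      # prints: corrected
cat "$root/v20250101/tas_example.nc"   # prints: original
```

Because each version directory keeps its own links, v20250101 stays readable after the update; only `latest` moves to the new version.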

Generate mapfiles for the new version:

$> esgmapfile make --project cmip7 --latest /data/MIP-DRS7/

Testing Before Production 

Always test your workflow before modifying production data:

# Create a test directory
$> mkdir -p /tmp/esgprep_test

# Copy a few sample files
$> cp /incoming/tas_*.nc /tmp/esgprep_test/

# Test the workflow
$> esgdrs list --project cmip7 /tmp/esgprep_test/
$> esgdrs tree --project cmip7 /tmp/esgprep_test/
$> esgdrs upgrade --project cmip7 --root /tmp/test_output --link /tmp/esgprep_test/

# Verify structure
$> tree /tmp/test_output/

# Generate test mapfiles
$> esgmapfile make --project cmip7 /tmp/test_output/

# Review mapfiles
$> cat *.map

# Clean up when satisfied
$> rm -rf /tmp/esgprep_test /tmp/test_output
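Because --link produces symlinks, one extra check worth running on the test output before cleaning up is that every link still resolves. The sketch below lists dangling symlinks with POSIX `find`; `find_broken_links` is a hypothetical helper name.

```shell
# List symlinks under a directory whose targets no longer exist.
find_broken_links() {
    # -type l selects symlinks; "test -e" follows the link, so it fails
    # when the target is missing, and "!" turns that failure into a match.
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Run it as `find_broken_links /tmp/test_output/`; no output means every link in the test tree resolves.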

Large Dataset Processing 

For datasets with many files (more than about 10,000), the following options can speed up processing:

Pre-calculate checksums 

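Checksum calculation reads every byte of every file, so for large files it often accounts for most of the run time. If you compute checksums once in advance, or already have them from an earlier step, you can pass them to esgmapfile instead of recalculating them: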

# Create checksum file
$> find /incoming -name "*.nc" -exec sha256sum {} + > checksums.txt

# Use pre-calculated checksums
$> esgmapfile make --project cmip7 \
                   --checksums-from checksums.txt \
                   /data/MIP-DRS7/
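The find command above hashes the files one after another. With GNU findutils and coreutils, `xargs -P` can run several `sha256sum` processes at once, which may help on storage that handles parallel reads well. The output format is the same, but the line order is not deterministic, so the sketch sorts by file path. `parallel_checksums` and its default of 4 jobs are illustrative assumptions, not esgprep settings.

```shell
# Compute SHA256 checksums for every .nc file under DIR, several files at a time.
# Assumes GNU findutils and coreutils (xargs -0/-r/-P, sha256sum).
# Usage: parallel_checksums DIR [NJOBS] > checksums.txt
parallel_checksums() {
    dir=$1
    njobs=${2:-4}
    # -print0/-0 keep file names with spaces intact. -n 1 gives each
    # sha256sum process a single file, so each process writes one short
    # line and output from concurrent processes does not interleave.
    find "$dir" -type f -name '*.nc' -print0 \
        | xargs -0 -r -n 1 -P "$njobs" sha256sum \
        | sort -k 2
}
```

For example, `parallel_checksums /incoming 8 > checksums.txt`. Whether this is faster than the sequential version depends on whether the work is limited by I/O or by CPU, as discussed in the tip on multiprocessing in this section.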

Optimize multiprocessing 

Adjust the number of processes based on your system:

# Use 8 processes (default is 4)
$> esgmapfile make --project cmip7 \
                   --max-processes 8 \
                   /data/MIP-DRS7/

Tip

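If processing is limited by I/O (for example, reading many files from a shared or network file system), adding processes may not help and can even slow things down. If it is limited by CPU (for example, checksum calculation on fast local storage), you can use up to as many processes as there are available cores.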
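A reasonable starting value for --max-processes is the number of available cores, which `nproc` (GNU coreutils) reports. The hypothetical helper below subtracts one core so the machine stays responsive while esgmapfile runs; that margin is an arbitrary choice for illustration, not an esgprep default.

```shell
# Hypothetical helper: all available cores minus one, but never fewer than 1.
pick_processes() {
    cores=$(nproc)
    if [ "$cores" -gt 1 ]; then
        echo $((cores - 1))
    else
        echo 1
    fi
}
```

For example:

$> esgmapfile make --project cmip7 --max-processes "$(pick_processes)" /data/MIP-DRS7/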