ESGF Core Metadata

ESGF “core” metadata includes mandatory and optional fields that are needed to search and download data throughout the ESGF system. These fields are not specific to any scientific discipline; rather, they are common to any discipline that intends to leverage the ESGF infrastructure. Core metadata fields are defined in the ESGF meta-schema file esgf.xml, which is always used for validation whenever metadata records are published.

When a client invokes the ESGF “Pull” Publishing Services, valid metadata records are automatically created on the server by harvesting pre-existing THREDDS catalogs. Conversely, when a client invokes the ESGF “Push” Publishing Services, the client itself is responsible for creating valid metadata records before sending them to the server for ingestion.

Required Fields

id (string): record identifier, which must be globally unique. It is specific to a version and replica of that object.
- Note: when parsing THREDDS catalogs, it is built as “instance_id|data_node”
- Note: could be any string, including a UUID
- Example (for Dataset): id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|pcmdi9.llnl.gov
- Example (for File): id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323.huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc|pcmdi9.llnl.gov
title (string): record title, displayed as main text in search results
- Example: title = “project=CMIP5 / IPCC Fifth Assessment Report, model=Institute for Numerical Mathematics, experiment=1 percent per year CO2, time_frequency=day, modeling realm=atmos, ensemble=r1i1p1, version=20110323”
type (string chosen from Controlled Vocabulary): record type, used to enable searching on different targets
- Note: currently valid values are: “Dataset”, “File”, “Aggregation”
- Note: records of different types are mapped to separate Solr cores; it is not possible to search across cores
- Example: type=Dataset
project (string): provides a scientific context for these data - for Datasets only
- Example: project = CMIP5
dataset_id (string): the enclosing dataset identifier, for Files and Aggregations only
- Note: allows searching for datasets first, files later
- Example: dataset_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|pcmdi9.llnl.gov
index_node (string): host indexing the data
- Note: allows a file search to target a specific index node, which greatly improves performance
- Note: not needed if publishing Datasets that have no files, only “service” level endpoints
- Example: index_node = pcmdi9.llnl.gov
data_node (string): host serving the data
- Note: not needed when publishing Datasets that have no files
- Example: data_node = pcmdi11.llnl.gov
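The rules above can be sketched as a small pre-publication check. This is only an illustration, not ESGF code: the function names, the dict-based record, and the `has_files` flag are hypothetical, and the authoritative validation remains the one done against esgf.xml on the server.

```python
VALID_TYPES = {"Dataset", "File", "Aggregation"}


def make_record_id(instance_id, data_node):
    """Join the two parts as "instance_id|data_node", as done for THREDDS harvesting."""
    return f"{instance_id}|{data_node}"


def missing_required(record, has_files=True):
    """Return the names of required core fields that are absent or invalid.

    `record` is a plain dict (an assumption made for this sketch).
    `has_files` models the note that index_node and data_node are not
    needed for Datasets that have no files.
    """
    required = ["id", "title", "type"]
    record_type = record.get("type")
    if record_type == "Dataset":
        required.append("project")
        if has_files:
            required += ["index_node", "data_node"]
    elif record_type in ("File", "Aggregation"):
        required += ["dataset_id", "index_node", "data_node"]
    missing = [name for name in required if not record.get(name)]
    # A type outside the controlled vocabulary is reported as a problem too.
    if record_type and record_type not in VALID_TYPES:
        missing.append("type")
    return missing


dataset_id = make_record_id(
    "cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323",
    "pcmdi9.llnl.gov",
)
record = {"id": dataset_id, "title": "example", "type": "Dataset"}
print(missing_required(record))
```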

Optional Fields needed for Versioning and Replication

version (integer, default=0): version of a Dataset, File or Aggregation; if provided, it must be an integer
master_id (string): globally unique identifier for a “logical” Dataset or File: it is the same across all versions and replicas of that object
- Note: allows searching for all versions and copies of the same logical record
- Example (for Dataset): master_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1
- Example (for File): master_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc
instance_id (string): globally unique identifier for a specific version of a Dataset or File; it is the same for all replicas of that object, but differs between versions
- Note: allows searching for all replicas of the same record of a given version
- Example (for Dataset): instance_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323
- Example (for File): instance_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323.huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc
replica (boolean, default=false): enables support for logical copies of the same record
latest (boolean, default=true): enables support for searching for the latest version of each record
shard (string): enables publishing to a local shard to avoid replicating these data across the federation
- Example: shard=localhost:8982
checksum: file checksum - Files only
- Example: checksum = 7fcd959a4bb57e4079c8e65a7a5d0499
checksum_type (string from CV): the algorithm used to compute the checksum
- Example: checksum_type = SHA256
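As a rough sketch of how these fields relate, the snippet below derives instance_id values from a master_id and a version following the pattern visible in the CMIP5 examples, and computes a file checksum with Python's standard hashlib. The function names are hypothetical, the ".v<version>" pattern is only inferred from the examples (identifiers may be any string), and upper-casing the algorithm name to obtain checksum_type is an assumption based on the "SHA256" example.

```python
import hashlib


def dataset_instance_id(master_id, version):
    """Dataset pattern from the examples: '<master_id>.v<version>'."""
    return f"{master_id}.v{version}"


def file_instance_id(dataset_master_id, version, filename):
    """File pattern from the examples: the version sits before the file name."""
    return f"{dataset_master_id}.v{version}.{filename}"


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return (checksum, checksum_type) for a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Assumption: the controlled vocabulary spells the algorithm in upper case.
    return digest.hexdigest(), algorithm.upper()


master = "cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1"
print(dataset_instance_id(master, 20110323))
print(file_instance_id(master, 20110323, "huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc"))
```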

Optional Generic Fields

description (string): additional text displayed in search results
- Example: description = inmcm4 model output prepared for CMIP5 1 percent per year CO2
url (string, zero, one or more): record access point, encoded as 3-tuple of the form (URL|mime-type|service name)
- Example: url = http://pcmdi9.llnl.gov/thredds/esgcet/1/cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323.xml#cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|application/xml+thredds|Catalog
- Example: url = http://pcmdi9.llnl.gov/las/getUI.do?catid=D9C519D5A310E197819B7197215FD574_ns_cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|application/las|LAS
access (string, zero, one or more): name of the service through which the data can be accessed
- Example: access = THREDDS, access = LAS
timestamp (date): date the document was created or last modified - if such information is found
- Note: if provided, it must be in the form: yyyy-MM-dd'T'HH:mm:ss'Z'
- Example: timestamp = 2012-01-13T01:34:15Z
schema (string, must be chosen from Controlled Vocabulary): used to enable server-side validation
- Note: must be encoded as attribute of the top-level root XML element
- Example: schema = “cmip5”
format (string): the data format
- Example: format = NetCDF
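Two of the fields above have a fixed encoding: url is a 3-tuple joined by "|", and timestamp follows yyyy-MM-dd'T'HH:mm:ss'Z'. The following minimal sketch shows one way a client might handle both. `AccessPoint`, `parse_url_field` and `format_timestamp` are hypothetical names, and splitting from the right (so that a "|" inside the URL itself would be tolerated) is a design assumption, not documented ESGF behavior.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class AccessPoint(NamedTuple):
    url: str
    mime_type: str
    service: str


def parse_url_field(value):
    """Split a 'URL|mime-type|service name' value into its three parts.

    Raises ValueError if the value does not contain at least two separators.
    """
    url, mime_type, service = value.rsplit("|", 2)
    return AccessPoint(url, mime_type, service)


def format_timestamp(moment):
    """Render a timezone-aware datetime as UTC in the required form."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


point = parse_url_field(
    "http://pcmdi9.llnl.gov/las/getUI.do?catid=D9C519D5A310E197819B7197215FD574_ns_"
    "cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|application/las|LAS"
)
print(point.service, point.mime_type)
print(format_timestamp(datetime(2012, 1, 13, 1, 34, 15, tzinfo=timezone.utc)))
```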

Optional Fields needed for Files Download

size (long): total size in bytes of the record content (i.e. size of a single file, or global size of all files in a dataset)
number_of_files (integer): for Datasets only
number_of_aggregations (integer): for Datasets only
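For a Dataset record, these values can be derived from its File records. The sketch below assumes each File record is a dict carrying a size in bytes; the function name and the record representation are hypothetical.

```python
def dataset_download_fields(file_records):
    """Compute the Dataset-level size and number_of_files from File records."""
    return {
        "size": sum(int(record["size"]) for record in file_records),
        "number_of_files": len(file_records),
    }


files = [{"size": 1024}, {"size": "2048"}]  # sizes may arrive as strings
print(dataset_download_fields(files))
```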

Examples

esgf_dataset.xml : example Dataset metadata record complying with the ESGF core and Earth Science schemas
esgf_file.xml : example File metadata record complying with the ESGF core and Earth Science schemas