ESGF Core Metadata

ESGF “core” metadata includes mandatory and optional fields that are needed to search and download data throughout the ESGF system. These fields are not specific to any scientific discipline; rather, they are common to any discipline that intends to leverage the ESGF infrastructure. Core metadata fields are defined in the ESGF meta-schema file esgf.xml, which is always used for validation whenever metadata records are published.

When a client invokes the ESGF “Pull” Publishing Services, valid metadata records are automatically created on the server by harvesting pre-existing THREDDS catalogs. Conversely, when a client invokes the ESGF “Push” Publishing Services, the client itself is responsible for creating valid metadata records before sending them to the server for ingestion.

Required Fields

  • id (string): record identifier, must be globally unique. It is specific to one version and one replica of the object

    • Note: when parsing THREDDS catalogs, it is built as “instance_id|data_node”

    • Note: could be any string, including a UUID

    • Example (for Dataset): id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|pcmdi9.llnl.gov

    • Example (for File): id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323.huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc|pcmdi9.llnl.gov

  • title (string): record title, displayed as main text in search results

    • Example: title = “project=CMIP5 / IPCC Fifth Assessment Report, model=Institute for Numerical Mathematics, experiment=1 percent per year CO2, time_frequency=day, modeling realm=atmos, ensemble=r1i1p1, version=20110323”

  • type (string chosen from Controlled Vocabulary): record type, used to enable searching on different targets

    • Note: currently valid values are: “Dataset”, “File”, “Aggregation”

    • Note: records of different types are mapped to separate Solr cores; a single search cannot span cores

    • Example: type=Dataset

  • project (string): provides a scientific context for these data - for Datasets only

    • Example: project = CMIP5

  • dataset_id (string) - the enclosing dataset identifier, for files and aggregations only

    • Note: allows searching for datasets first, then for their files

    • Example: dataset_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323|pcmdi9.llnl.gov

  • index_node (string): host indexing the data

    • Note: allows targeting the file search at a specific index node, which greatly improves performance

    • Note: not needed if publishing Datasets that have no files, only “service” level endpoints

    • Example: index_node = pcmdi9.llnl.gov

  • data_node (string): host serving the data

    • Note: not needed when publishing Datasets that have no files

    • Example: data_node = pcmdi11.llnl.gov
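
The required-field rules above can be sketched as a small pre-publication check, for example on the client side before a “Push” request. This is only an illustration: the function names and the has_files flag are hypothetical, and the authoritative validation remains the esgf.xml meta-schema applied by the server.

```python
# Illustrative sketch: function names and the has_files flag are hypothetical;
# the field names and rules are taken from the "Required Fields" list.
VALID_TYPES = {"Dataset", "File", "Aggregation"}


def build_id(instance_id, data_node):
    """Compose an id as instance_id|data_node, as done when parsing THREDDS catalogs."""
    return f"{instance_id}|{data_node}"


def has_valid_type(record):
    """Check the type field against the currently valid values."""
    return record.get("type") in VALID_TYPES


def missing_required_fields(record, has_files=True):
    """Return the sorted names of required core fields that the record lacks."""
    required = {"id", "title", "type"}
    record_type = record.get("type")
    if record_type == "Dataset":
        required.add("project")  # project applies to Datasets only
    elif record_type in ("File", "Aggregation"):
        required.add("dataset_id")  # enclosing dataset, Files and Aggregations only
    # index_node and data_node are not needed for Datasets that have no files
    if not (record_type == "Dataset" and not has_files):
        required.update({"index_node", "data_node"})
    return sorted(name for name in required if not record.get(name))


dataset = {
    "id": build_id(
        "cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323",
        "pcmdi9.llnl.gov",
    ),
    "title": "project=CMIP5 / IPCC Fifth Assessment Report",
    "type": "Dataset",
    "project": "CMIP5",
    "index_node": "pcmdi9.llnl.gov",
    "data_node": "pcmdi11.llnl.gov",
}
print(has_valid_type(dataset))
print(missing_required_fields(dataset))
```

A record that passes this check is not guaranteed to be accepted by the server; the check only catches the omissions described in this section.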

Optional Fields needed for Versioning and Replication

  • version (integer, default=0): version of a Dataset, File or Aggregation; if provided, it must be an integer

  • master_id (string): globally unique identifier for a “logical” Dataset or File: it is the same across all versions and replicas of that object

    • Note: allows searching for all versions and copies of the same logical record

    • Example (for Dataset): master_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1

    • Example (for File): master_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc

  • instance_id (string): globally unique identifier for a specific version of a Dataset or File; it is the same for all replicas of that object, but differs between versions

    • Note: allows searching for all replicas of the same record of a given version

    • Example (for Dataset): instance_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323

    • Example (for File): instance_id = cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1.v20110323.huss_day_inmcm4_1pctCO2_r1i1p1_20900101-20991231.nc

  • replica (boolean, default=false): enables support for logical copies of the same record

  • latest (boolean, default=true): enables support for searching on latest version of each record

  • shard (string): enables publishing to a local shard, so that these data are not replicated across the federation

    • Example: shard=localhost:8982

  • checksum: file checksum - Files only

    • Example: checksum = 7fcd959a4bb57e4079c8e65a7a5d0499

  • checksum_type (string from CV): the algorithm used to compute the checksum

    • Example: checksum_type = SHA256
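
The relationship between master_id, instance_id and id can be sketched as follows. The naming pattern is inferred from the examples in this section (a “.v<version>” segment, then “|data_node”) and is an assumption, since an id may in principle be any globally unique string. The function names are hypothetical, and the checksum helper simply uses Python's hashlib with a lowercased checksum_type.

```python
import hashlib


def dataset_identifiers(dataset_master_id, version, data_node):
    """Derive the identifiers of one version and replica of a Dataset."""
    instance_id = f"{dataset_master_id}.v{version}"  # same for all replicas
    return {
        "master_id": dataset_master_id,  # same for all versions and replicas
        "instance_id": instance_id,
        "id": f"{instance_id}|{data_node}",  # specific to this replica
        "version": version,
    }


def file_identifiers(dataset_master_id, version, filename, data_node):
    """Derive the identifiers of one version and replica of a File."""
    instance_id = f"{dataset_master_id}.v{version}.{filename}"
    return {
        "master_id": f"{dataset_master_id}.{filename}",
        "instance_id": instance_id,
        "id": f"{instance_id}|{data_node}",
        "version": version,
    }


def checksum_fields(chunks, checksum_type="SHA256"):
    """Compute the checksum and checksum_type fields from an iterable of byte chunks."""
    digest = hashlib.new(checksum_type.lower())
    for chunk in chunks:
        digest.update(chunk)
    return {"checksum": digest.hexdigest(), "checksum_type": checksum_type}


ids = dataset_identifiers(
    "cmip5.output1.INM.inmcm4.1pctCO2.day.atmos.day.r1i1p1",
    20110323,
    "pcmdi9.llnl.gov",
)
print(ids["instance_id"])
print(ids["id"])
```

For a real file, the chunks would typically come from reading the file in fixed-size blocks, so that large files need not be loaded into memory at once.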

Optional Generic Fields

Optional Fields needed for Files Download

  • size (long) - total size in bytes of record content (i.e. size of single file, or global size of all files in a dataset)

  • number_of_files (integer) - for datasets only

  • number_of_aggregations (integer) - for datasets only
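
As a minimal sketch of how the dataset-level values relate to the file-level ones, the size and number_of_files of a Dataset can be computed from its File records. The function name and the sample sizes are hypothetical; only the field names come from the list above.

```python
def dataset_download_fields(file_records):
    """Aggregate File records into the Dataset-level size and number_of_files."""
    return {
        # global size in bytes of all files in the dataset
        "size": sum(record["size"] for record in file_records),
        "number_of_files": len(file_records),
    }


# Hypothetical File records: only the size field matters here.
files = [{"size": 1024}, {"size": 2048}]
print(dataset_download_fields(files))
```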

Examples

  • esgf_dataset.xml : example Dataset metadata record complying with the ESGF core and Earth Science schemas

  • esgf_file.xml : example File metadata record complying with the ESGF core and Earth Science schemas