ESGVOC Library

ESGVOC is a Python library designed to simplify interaction with the controlled vocabularies (CVs) used in WCRP climate data projects. It supports querying, caching, and validating terms across CV repositories: the universe and project-specific repositories (e.g., CMIP6Plus, CMIP6).


Features

  • Query controlled vocabularies:

    • Retrieve terms, collections, or descriptors.

    • Perform cross-validation and search operations.

    • Support case-sensitive, wildcard, and approximate matching.

  • Caching:

    • Download CVs to a local database for offline use.

    • Keep the local cache up-to-date.

  • Validation:

    • Validate string values against CV terms (DRS).


Use cases

The ESGVOC library supports a wide range of use cases, including:

Caching

  • Usage without internet access.

  • Downloading CVs to a local archive or database.

  • Updating the local cache.

  • Performing consistency checks between the local cache and remote CV repositories.

Listing

  • All data descriptors from the universe.

  • All terms of one data descriptor from the universe.

  • All available projects.

  • All collections from a project.

  • All terms from a project.

  • All terms of a collection from a project.

Searching

  • Data descriptors in the universe.

  • Terms in the universe or data descriptors.

  • Collections in projects.

  • Terms in collections of projects.

Note

Searching is based on the term id, not on its regex or its DRS name. It can be case-sensitive or case-insensitive, and it supports wildcards (%) and regular expressions.
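The search behaviour described in the note can be sketched in plain Python. This is a minimal illustration and not the ESGVOC API: the function name find_terms_by_id, the TERMS list, and the sample ids are assumptions made up for the example. It only shows how a % wildcard and an optional case-sensitive flag apply to the term id rather than to the DRS name.

```python
import re

# Hypothetical sample terms (illustrative data, not taken from a real CV).
TERMS = [
    {"id": "ipsl", "drs_name": "IPSL"},
    {"id": "ipsl-cm6a-lr", "drs_name": "IPSL-CM6A-LR"},
    {"id": "cnrm-cerfacs", "drs_name": "CNRM-CERFACS"},
]


def find_terms_by_id(expression, case_sensitive=False):
    """Return the terms whose id matches `expression`.

    A '%' in `expression` matches any run of characters (possibly empty).
    Only the 'id' field is inspected, never 'drs_name'.
    """
    # Escape each literal chunk, then join the chunks with '.*' for every '%'.
    pattern = ".*".join(re.escape(chunk) for chunk in expression.split("%"))
    flags = 0 if case_sensitive else re.IGNORECASE
    return [term for term in TERMS if re.fullmatch(pattern, term["id"], flags)]


prefix_hits = [term["id"] for term in find_terms_by_id("ipsl%")]
strict_hits = find_terms_by_id("IPSL", case_sensitive=True)
```

Here prefix_hits contains both ids starting with "ipsl", while strict_hits is empty: "IPSL" is the DRS name of the first term, but its id is lowercase.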

DRS validation

  • Terms of a project.

  • Terms of a collection from a project.

  • Terms from all projects (cross-validation).

DRS validation checks a character string against a vocabulary term. It is based on the DRS semantics of the term, i.e.:

  • The drs_name field of a plain term

  • Or the regex field of a pattern term.

For composite terms, each part of the term is validated in turn, and each part falls under one of the two validation conditions described above.

Note

Don’t confuse the term identifier (id) with its DRS semantics.
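The three validation cases above can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: the classes PlainTerm, PatternTerm, and CompositeTerm, the matches function, and the sample terms are hypothetical, and the composite term is assumed to use a non-empty separator.

```python
import re
from dataclasses import dataclass


@dataclass
class PlainTerm:
    id: str
    drs_name: str  # value expected in a DRS expression


@dataclass
class PatternTerm:
    id: str
    regex: str  # regular expression the whole value must match


@dataclass
class CompositeTerm:
    id: str
    separator: str  # assumed non-empty in this sketch
    parts: list     # one list of candidate terms per part


def matches(term, value):
    """Return True if `value` satisfies the DRS semantics of `term`."""
    if isinstance(term, PlainTerm):
        return value == term.drs_name
    if isinstance(term, PatternTerm):
        return re.fullmatch(term.regex, value) is not None
    if isinstance(term, CompositeTerm):
        pieces = value.split(term.separator)
        if len(pieces) != len(term.parts):
            return False
        # Each piece must satisfy at least one candidate term of its part.
        return all(
            any(matches(candidate, piece) for candidate in candidates)
            for piece, candidates in zip(pieces, term.parts)
        )
    raise TypeError(f"unsupported term type: {type(term).__name__}")


# Hypothetical terms used only for illustration.
institution = PlainTerm(id="ipsl", drs_name="IPSL")
time_point = PatternTerm(id="time_point", regex=r"\d{6}")
time_range = CompositeTerm(id="time_range", separator="-",
                           parts=[[time_point], [time_point]])

plain_ok = matches(institution, "IPSL")     # DRS name matches
id_is_not_drs = matches(institution, "ipsl")  # the id is not the DRS name
composite_ok = matches(time_range, "185001-201412")
```

The second call returns False because the value equals the term id, not its drs_name, which is the confusion the note warns about.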

DRS applications

The DRS applications build on the functionalities described above. They provide a convenient way to check the DRS expressions of a project (directory, dataset id, and file name) and to generate expressions from mappings of collections and terms or from an unordered bag of terms.
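As a rough illustration of checking and generating DRS expressions, the sketch below works on a toy dataset id. Everything in it is an assumption made for the example: the three-collection specification, the "." separator, the sample terms, and the function names are hypothetical and do not reflect the ESGVOC API or any real project specification.

```python
# Hypothetical DRS specification: ordered collections of a toy dataset id.
DATASET_ID_SPEC = ["activity_id", "institution_id", "source_id"]

# Hypothetical DRS names allowed in each collection.
COLLECTIONS = {
    "activity_id": {"CMIP", "ScenarioMIP"},
    "institution_id": {"IPSL", "CNRM-CERFACS"},
    "source_id": {"IPSL-CM6A-LR", "CNRM-CM6-1"},
}


def check_dataset_id(expression, separator="."):
    """Return a list of error messages; an empty list means the id is valid."""
    tokens = expression.split(separator)
    errors = []
    if len(tokens) != len(DATASET_ID_SPEC):
        errors.append(
            f"expected {len(DATASET_ID_SPEC)} tokens, found {len(tokens)}"
        )
    for position, (collection, token) in enumerate(
        zip(DATASET_ID_SPEC, tokens), start=1
    ):
        if token not in COLLECTIONS[collection]:
            errors.append(
                f"token {position} ({token!r}) is not a term of {collection}"
            )
    return errors


def generate_from_mapping(mapping, separator="."):
    """Build a dataset id from a {collection: DRS name} mapping."""
    return separator.join(mapping[collection] for collection in DATASET_ID_SPEC)


def generate_from_bag(bag, separator="."):
    """Build a dataset id from an unordered bag of DRS names."""
    tokens = []
    for collection in DATASET_ID_SPEC:
        candidates = [name for name in bag if name in COLLECTIONS[collection]]
        if len(candidates) != 1:
            raise ValueError(
                f"expected exactly one term for {collection}, "
                f"found {len(candidates)}"
            )
        tokens.append(candidates[0])
    return separator.join(tokens)


valid_errors = check_dataset_id("CMIP.IPSL.IPSL-CM6A-LR")
invalid_errors = check_dataset_id("CMIP.ipsl.IPSL-CM6A-LR")
from_bag = generate_from_bag({"IPSL-CM6A-LR", "CMIP", "IPSL"})
```

In this sketch, generating from a bag only works when each collection claims exactly one of the supplied names; an ambiguous or missing term raises an error rather than guessing.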
