Skip to content

Data discovery

There are two ways to search for datasets with esgpull: facet and free-text search. Both are done through the search command.

Facet search is performed with all terms using the facet syntax <name>:<value>, for which both name and value are matched exactly.

Search CMIP6 datasets
$ esgpull search project:CMIP6
esgpull search

Multiple values for a facet

Multiples values can be used by separating each value with a , comma.

Single variable search
$ esgpull search variable_id:c2h2 -0
$ esgpull search variable_id:c2h6 -0
esgpull search esgpull search

Combine both variables in a single search
$ esgpull search variable_id:c2h2,c2h6 -0
esgpull search

Using spaces "disconnects" values from the facet name

esgpull search

If you don't know the facets' names

esgpull provides a handy way to print out facet names that can be used to refine a query:

esgpull search variable_id:c2h2,c2h6 --facets
esgpull search

Using a facet that is not in this list will always result in a query with 0 corresponding datasets.

Ignoring a facet

A facet can be ignored by prepending its name with !. In this case, it is not possible to specify other values for this facet.

Ignore all datasets from IPSL
$ esgpull search !institution_id:IPSL
esgpull search ignore

Free-text search is performed on any term not using the <name>:<value> syntax. Each term is sent to the ESGF search API, which uses Apache Solr to match results.

Solr syntax

Solr will always try to match any of the search terms.

esgpull search surface temperature # will match either `surface` or `temperature`

To make sure all the words are matched, terms must be separated with AND and written between quotes

esgpull search "surface AND temperature" # will match BOTH `surface` and `temperature`

Wildcard syntax

A wildcard * can be used in both facet (value only) and free-text search, allowing for more controllable search results.

Note that on most shells, the wildcard symbol should be inside " quotes, to escape it from being expanded by the shell before esgpull receives any input.

All initializations for areacella variable from piControl experiments
$ esgpull search "member_id:r1i*p1f1" table_id:fx variable_id:areacella experiment_id:piControl
esgpull search

Case sensitivity

For facet search, the case must match exactly, so project:cmip6 will give entirely different results than project:CMIP6.

In the case of free-text search, the opposite is true, case is entirely insensitive, and may lead to funky results.