clef.esgf

Functions for searching the ESGF and matching the results against the MAS database

exception clef.esgf.ESGFException

Error from the ESGF API

clef.esgf.esgf_query(query, fields, limit=5000, offset=0, distrib=True, replica=False, latest=None, **kwargs)

Search the ESGF

Searches the ESGF using its API. Keyword arguments not listed here are passed on to the API search; each can be either a single value or a list.

Parameters:
  • query (str) – Full text query
  • fields (list) – Fields to return
  • limit (int) – Maximum items to return
  • offset (int) – Starting offset of returned items (use with limit for paging)
  • distrib (bool) – Distribute the search across all nodes
  • replica (bool) – Return replicated datasets
  • latest (bool or None) – Return only latest (True), only not latest (False) or all versions (None)
  • **kwargs – See the ESGF API docs
Returns:

API response from ESGF, decoded from JSON into a Python dict
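A minimal sketch of paging with limit and offset. The search argument stands in for esgf_query() so the loop can be run offline. The response layout (response, docs, numFound) is an assumption based on the Solr-style JSON the ESGF search API returns, and iter_docs and fake_search are hypothetical names, not part of clef.

```python
def iter_docs(search, query, fields, page_size=100, **kwargs):
    """Yield every matching document, fetching one page at a time.

    `search` is any callable with the same signature as esgf_query().
    """
    offset = 0
    while True:
        response = search(query, fields, limit=page_size, offset=offset, **kwargs)
        body = response["response"]  # assumed Solr-style layout
        docs = body["docs"]
        yield from docs
        offset += len(docs)
        # Stop when the page is empty or everything has been returned.
        if not docs or offset >= body["numFound"]:
            break


def fake_search(query, fields, limit=5000, offset=0, **kwargs):
    """Offline stand-in for esgf_query() that serves five made-up records."""
    records = [{"id": f"file-{i}"} for i in range(5)]
    return {"response": {"numFound": len(records),
                         "docs": records[offset:offset + limit]}}


ids = [doc["id"] for doc in iter_docs(fake_search, "tas", ["id"], page_size=2)]
```

Passing the search function in as an argument keeps the paging logic independent of the network, so the same loop should work with esgf_query() if its response has the assumed layout.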

clef.esgf.find_checksum_id(query, **kwargs)

Get checksums and IDs of matching files from ESGF

Searches ESGF using esgf_query(), then converts the response into a SQLAlchemy selectable for further processing

Parameters:**kwargs – See esgf_query()
Returns:
Values table of matching File objects, containing
  • checksum
  • id
  • dataset_id
  • title
  • version

This table can be joined against the MAS database tables
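The five columns listed above could be pulled out of a decoded response roughly as follows. This is a sketch only: it assumes the Solr-style response/docs layout and that some fields (such as checksum) may arrive as one-element lists. docs_to_rows and first are hypothetical helpers, not part of clef, and the sample record is made up. Turning the rows into a SQLAlchemy selectable is not shown.

```python
COLUMNS = ("checksum", "id", "dataset_id", "title", "version")


def first(value):
    """Unwrap a list-valued field to its first element (None if empty)."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def docs_to_rows(response):
    """Return one tuple per file record, ordered as in COLUMNS."""
    docs = response["response"]["docs"]  # assumed Solr-style layout
    return [tuple(first(doc.get(name)) for name in COLUMNS) for doc in docs]


# Made-up record for illustration only.
sample = {"response": {"docs": [{
    "checksum": ["abc123"],
    "id": "f1|node",
    "dataset_id": "d1|node",
    "title": "tas_day.nc",
    "version": "1",
}]}}
rows = docs_to_rows(sample)
```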

clef.esgf.find_local_path(session, subq, oformat='file')

Find the filesystem paths of ESGF matches

Converts the results of match_query() to local filesystem paths, either to the file itself or to the containing dataset.

Parameters:
  • oformat ('file' or 'dataset') – Return the path to the file or the dataset directory
  • subq – Result of match_query()
Returns:

Iterable of strings with the paths to either files or datasets

clef.esgf.find_missing_id(session, subq, oformat='file')

Returns the ESGF id for each file in the ESGF query that doesn’t have a local match

Parameters:
  • oformat ('file' or 'dataset') – Return the ESGF id of the file or of the dataset
  • subq – Result of esgf_query()
Returns:

Iterable of strings with the ESGF file or dataset id
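The idea of reporting ESGF ids without a local match can be illustrated in plain Python. This is a sketch under stated assumptions, not clef's implementation (which works against the MAS database): matching is done by checksum, and missing_ids, remote and local are hypothetical names with made-up data.

```python
def missing_ids(remote_rows, local_checksums, oformat="file"):
    """Return ids of remote files (or their datasets) with no local copy.

    remote_rows: iterable of (checksum, file_id, dataset_id) tuples.
    local_checksums: set of checksums present on the local filesystem.
    """
    if oformat not in ("file", "dataset"):
        raise ValueError("oformat must be 'file' or 'dataset'")
    seen = set()
    result = []
    for checksum, file_id, dataset_id in remote_rows:
        if checksum in local_checksums:
            continue  # a local copy exists, so nothing is missing
        key = file_id if oformat == "file" else dataset_id
        if key not in seen:  # report each id once
            seen.add(key)
            result.append(key)
    return result


# Made-up data: only the file with checksum "aaa" exists locally.
remote = [("aaa", "f1", "d1"), ("bbb", "f2", "d1"), ("ccc", "f3", "d2")]
local = {"aaa"}
missing_files = missing_ids(remote, local, oformat="file")
missing_datasets = missing_ids(remote, local, oformat="dataset")
```

In this sketch a dataset is reported as soon as any one of its files is missing, which is why d1 appears even though f1 is available locally.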

Convert search terms to an ESGF search URL

Returns a link to the user-facing ESGF web search matching a particular query. This is helpful for error messages: users can follow the URL to see the matches as ESGF reports them.

Note that this link is to the ESGF user-facing search page, rather than the web API that esgf_query() uses.

Parameters:**kwargs – As esgf_query()
Returns:URL to the ESGF search website
Return type:str

clef.esgf.match_query(session, query, latest=None, **kwargs)

Match ESGF results against clef.model.Path

Matches the results of find_checksum_id() with the Path table. If latest is True, the checksums are matched; otherwise only the file name is used, so that outdated versions that have been removed from ESGF can still be spotted.

Parameters:
  • latest (bool or None) – Match the checksums (True) or only the file names (False or None)
  • **kwargs – See esgf_query()
Returns:

Joined result of clef.model.Path and find_checksum_id()
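The difference between checksum matching and file name matching can be sketched with an in-memory SQLite database. The two tables, their columns and their contents are hypothetical stand-ins for clef.model.Path and the find_checksum_id() values table, and the sketch assumes the ESGF title field holds the file name; it is not the MAS schema or clef's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE local_path (path TEXT, filename TEXT, checksum TEXT);
    CREATE TABLE esgf_file (id TEXT, title TEXT, checksum TEXT);
    -- tas.nc is held locally in an outdated version (checksum differs).
    INSERT INTO local_path VALUES
        ('/data/v1/tas.nc', 'tas.nc', 'old'),
        ('/data/v2/pr.nc',  'pr.nc',  'new');
    INSERT INTO esgf_file VALUES
        ('f-tas', 'tas.nc', 'changed'),
        ('f-pr',  'pr.nc',  'new');
""")


def match(conn, latest=None):
    """Join ESGF records to local paths by checksum or by file name."""
    if latest is True:
        condition = "l.checksum = e.checksum"
    else:
        condition = "l.filename = e.title"
    sql = ("SELECT l.path, e.id FROM local_path AS l "
           "JOIN esgf_file AS e ON " + condition + " ORDER BY l.path")
    return conn.execute(sql).fetchall()


by_checksum = match(conn, latest=True)   # only the up-to-date copy
by_name = match(conn, latest=False)      # also finds the outdated copy
```

Matching on the file name alone finds the outdated local copy of tas.nc, whereas the checksum join returns only the file whose contents agree with ESGF.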