clef.esgf
Functions for searching the ESGF and matching the results against the MAS database
- esgf_query() performs a query against the ESGF web API.
- match_query() performs an outer join of the esgf_query() results against the clef.model.Path table.
- find_local_path() and find_missing_id() use the results of match_query() to return the files that are replicated locally and missing from the replica respectively.
-
clef.esgf.esgf_query(query, fields, limit=5000, offset=0, distrib=True, replica=False, latest=None, **kwargs)
Search the ESGF
Searches the ESGF using its API. Keyword arguments not listed here are passed on to the API search; each can be either a single value or a list.
Parameters: - query (str) – Full text query
- fields (list) – Fields to return
- limit (int) – Maximum items to return
- offset (int) – Starting offset of returned items (use with limit for paging)
- distrib (bool) – Distribute the search across all nodes
- replica (bool) – Return replicated datasets
- latest (bool or None) – Return only latest (True), only not latest (False) or all versions (None)
- **kwargs – See the ESGF API docs
Returns: API response from ESGF, decoded from JSON into a Python dict
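The limit and offset parameters above can be combined to page through a large result set. The following is a minimal sketch, not part of clef: `iter_docs` and `fake_search` are hypothetical names, and the response layout (`response["response"]["docs"]` and `numFound`) is an assumption based on the Solr-style JSON that the ESGF search API commonly returns. `fake_search` stands in for esgf_query() so that the example runs offline.

```python
def iter_docs(search, query, fields, page_size=100, **kwargs):
    """Yield every matching document, requesting page_size items at a time.

    `search` is any callable accepting the same limit/offset keywords
    as esgf_query(). The response layout is an assumption (see above).
    """
    offset = 0
    while True:
        response = search(query, fields, limit=page_size, offset=offset, **kwargs)
        docs = response["response"]["docs"]
        if not docs:
            return
        yield from docs
        offset += len(docs)
        if offset >= response["response"]["numFound"]:
            return


def fake_search(query, fields, limit=5000, offset=0, **kwargs):
    """Hypothetical stand-in for esgf_query(), returning canned documents."""
    all_docs = [{"id": f"file-{i}"} for i in range(7)]
    return {
        "response": {
            "numFound": len(all_docs),
            "docs": all_docs[offset:offset + limit],
        }
    }


# Seven documents are fetched in pages of three, three and one.
ids = [doc["id"] for doc in iter_docs(fake_search, "tas", ["id"], page_size=3)]
```

To use this against the real service, one would pass esgf_query in place of `fake_search`, after checking that the response keys match the assumption.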
-
clef.esgf.find_checksum_id(query, **kwargs)
Get checksums and IDs of matching files from ESGF
Searches ESGF using esgf_query(), then converts the response into a SQLAlchemy selectable for further processing.
Parameters: **kwargs – See esgf_query()
Returns: Values table of matching File objects, containing:
- checksum
- id
- dataset_id
- title
- version
This table can be joined against the MAS database tables
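To make the shape of that table concrete, here is a rough sketch that flattens an ESGF-style response into rows with the five columns listed above. It uses plain named tuples rather than a SQLAlchemy selectable, so it only illustrates the data, not how find_checksum_id() is implemented. `FileRow`, `first` and `response_to_rows` are hypothetical names, and both the response layout and the possibility of list-valued fields are assumptions.

```python
from collections import namedtuple

# Column names taken from the list above.
FileRow = namedtuple("FileRow", ["checksum", "id", "dataset_id", "title", "version"])


def first(value):
    """Unwrap a list-valued field to its first element (None if empty)."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def response_to_rows(response):
    """Convert each returned document into one FileRow; missing fields become None."""
    return [
        FileRow(*(first(doc.get(name)) for name in FileRow._fields))
        for doc in response["response"]["docs"]
    ]


# Hypothetical sample response with a single file document.
sample = {
    "response": {
        "docs": [
            {
                "checksum": ["abc123"],
                "id": "f1|node",
                "dataset_id": "d1|node",
                "title": "tas_a.nc",
                "version": "20200101",
            }
        ]
    }
}
rows = response_to_rows(sample)
```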
-
clef.esgf.find_local_path(session, subq, oformat='file')
Find the filesystem paths of ESGF matches
Converts the results of match_query() to local filesystem paths, either to the file itself or to the containing dataset.
Parameters: - oformat ('file' or 'dataset') – Return the path to the file or the dataset directory
- subq – Result of esgf_query()
Returns: Iterable of strings with the paths to either files or datasets
-
clef.esgf.find_missing_id(session, subq, oformat='file')
Returns the ESGF id for each file in the ESGF query that doesn’t have a local match
Parameters: - oformat ('file' or 'dataset') – Return the id of the file or of the dataset
- subq – Result of esgf_query()
Returns: Iterable of strings with the ESGF file or dataset id
-
clef.esgf.link_to_esgf(query, **kwargs)
Convert search terms to an ESGF search URL
Returns a link to the user-facing ESGF web search matching a particular query. This is helpful for error messages: users can follow the URL to find the matches as ESGF sees them.
Note that this link is to the ESGF user-facing search page, rather than the web API that esgf_query() uses.
Parameters: **kwargs – As esgf_query()
Returns: URL to the ESGF search website
Return type: str
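As an illustration of turning search terms into a URL, the sketch below encodes a query and keyword arguments as a query string, repeating a key once per value when a list is given. This is not the implementation of link_to_esgf(): `build_search_url` is a hypothetical helper, the base URL is a placeholder, and the parameter names in the usage line are only examples.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query=None, **kwargs):
    """Encode search terms as a URL query string (illustrative only)."""
    params = []
    if query:
        params.append(("query", query))
    # Sort the keys so that the resulting URL is reproducible.
    for key, value in sorted(kwargs.items()):
        values = value if isinstance(value, (list, tuple)) else [value]
        params.extend((key, v) for v in values)
    return base_url + "?" + urlencode(params)


# Placeholder host; a real deployment would use an actual ESGF search page.
url = build_search_url(
    "https://esgf.example.org/search",
    query="tas",
    experiment_id=["historical", "ssp585"],
    variable_id="tas",
)
```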
-
clef.esgf.match_query(session, query, latest=None, **kwargs)
Match ESGF results against clef.model.Path
Matches the results of find_checksum_id() with the Path table. If latest is True the checksums will be matched, otherwise only the file name is used in order to spot outdated versions that have been removed from ESGF.
Parameters: - latest (bool) – Match the checksums (True) or filenames (False)
- **kwargs – See esgf_query()
Returns: Joined result of clef.model.Path and find_checksum_id()
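The difference between checksum matching and filename matching can be sketched with an outer join in an in-memory SQLite database. This is a simplified model, not clef's schema: the `esgf` and `local` tables, their columns and the sample rows are all hypothetical, and the filename match is approximated with a LIKE on the path suffix, which is an assumption about how such a match could be written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE esgf (id TEXT, checksum TEXT, title TEXT);   -- ESGF search results
    CREATE TABLE local (path TEXT, checksum TEXT);            -- locally replicated files
    INSERT INTO esgf VALUES
        ('a.v2|node', 'c1', 'tas_a.nc'),
        ('b.v2|node', 'c2', 'tas_b.nc');
    INSERT INTO local VALUES
        ('/data/v2/tas_a.nc', 'c1'),
        ('/data/v1/tas_b.nc', 'old');
""")

# Checksum match: only identical file contents pair up.
by_checksum = conn.execute("""
    SELECT esgf.id, local.path FROM esgf
    LEFT OUTER JOIN local ON esgf.checksum = local.checksum
    ORDER BY esgf.id
""").fetchall()

# Filename match: an outdated local copy with the same name still pairs up.
by_name = conn.execute("""
    SELECT esgf.id, local.path FROM esgf
    LEFT OUTER JOIN local ON local.path LIKE '%/' || esgf.title
    ORDER BY esgf.id
""").fetchall()

# Rows with a path correspond to local matches; rows without one are missing.
local_paths = [path for _, path in by_checksum if path is not None]
missing_ids = [esgf_id for esgf_id, path in by_checksum if path is None]
```

In this model the outer join keeps every ESGF row, so the rows with a path play the role of what find_local_path() reports and the rows without one play the role of what find_missing_id() reports.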