integrate_query

Functions related to querying posterior realizations in the INTEGRATE module.

Query posterior realizations based on geophysical constraints.

This module provides tools to compute probabilities that posterior realizations from Bayesian inversion satisfy user-defined constraints (e.g., thickness of lithology classes, resistivity thresholds).

integrate.integrate_query.get_prior_model_info(f_prior_h5, im)

Return metadata for prior model im.

Parameters:
  • f_prior_h5 (str) – Path to the prior HDF5 file.

  • im (int) – Model index.

Returns:

info – Keys: ‘name’, ‘is_discrete’, ‘z’, ‘class_id’, ‘class_name’.

Return type:

dict

integrate.integrate_query.load_query(path)

Load a query dict from a JSON file.

Parameters:

path (str) – Input JSON file path.

Returns:

query – Query definition dictionary.

Return type:

dict
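The load_query/save_query pair presumably amounts to a thin JSON wrapper around the query definition. The sketch below reproduces that round trip with the standard library only, using the constraint schema shown in the query() example; it is an illustration of the file format, not the package's implementation.

```python
import json
import os
import tempfile

# A minimal query definition matching the constraint schema
# used by query() elsewhere in this module.
query_def = {
    "constraints": [{
        "im": 2, "classes": [2],
        "thickness_mode": "cumulative",
        "thickness_comparison": ">",
        "thickness_threshold": 10.0,
        "depth_min": 0.0, "depth_max": 30.0, "negate": False
    }]
}

# Save, then load: the effect of save_query(query_def, path)
# followed by load_query(path), assuming plain JSON on disk.
path = os.path.join(tempfile.mkdtemp(), "query.json")
with open(path, "w") as f:
    json.dump(query_def, f, indent=2)
with open(path) as f:
    loaded = json.load(f)
```

Note that JSON has no tuple or ndarray types, so anything stored this way comes back as plain lists, numbers, strings, and booleans.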

integrate.integrate_query.query(f_post_h5, query_dict)

Compute per-data-point probability that posterior realizations satisfy a query.

Parameters:
  • f_post_h5 (str) – Path to the posterior HDF5 file (output of integrate_rejection).

  • query_dict (str or dict) – Path to a JSON file, or a dict defining the query.

Returns:

  • P (ndarray (N_data,)) – Probability [0, 1] for each data location.

  • meta (dict) – Metadata with keys:

    • ‘X’, ‘Y’ : coordinate arrays (or None)

    • ‘N_data’, ‘N_post’ : data location count and samples per location

    • ‘i_use’ : ndarray (N_data, N_post), all posterior indices from f_post_h5

    • ‘i_use_query’ : list of N_data arrays, indices that match the query for each data location

Notes

Constraints are applied sequentially (logical AND). A realization is counted as satisfying the query only if it passes every constraint. The probability at each data location is the fraction of its posterior realizations that pass all constraints.

The prior file path is read from the ‘f5_prior’ attribute of the posterior file. Coordinates (UTMX, UTMY) are read from the posterior file if present, otherwise from the data file (‘f5_data’ attribute).
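The acceptance logic described above can be sketched in a few lines. This is a stand-in illustration, not the package's code: the boolean masks below are random placeholders, whereas in practice each mask would be derived from a constraint evaluated on the posterior realizations at one data location.

```python
import numpy as np

rng = np.random.default_rng(0)
N_post = 200  # posterior samples at one data location

# Stand-in constraint masks over the N_post realizations
# (real masks come from e.g. thickness or resistivity tests).
mask_thickness = rng.random(N_post) < 0.5
mask_resistivity = rng.random(N_post) < 0.7

# Constraints combine with logical AND; P is the surviving fraction.
passed = mask_thickness & mask_resistivity
P_loc = passed.mean()  # probability in [0, 1] for this location
```

Repeating this per data location yields the P array of shape (N_data,) returned by query().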

Examples

>>> query_def = {
...     "constraints": [{
...         "im": 2, "classes": [2],
...         "thickness_mode": "cumulative",
...         "thickness_comparison": ">",
...         "thickness_threshold": 10.0,
...         "depth_min": 0.0, "depth_max": 30.0, "negate": False
...     }]
... }
>>> P, meta = query('f_post.h5', query_def)
integrate.integrate_query.query_from_text(text, f_prior_h5, model='claude-sonnet-4-6', api_key=None, verbose=False)

Translate a natural-language query into a query dict using an LLM.

Uses the Anthropic API to interpret the user’s text query in the context of the available prior models and the integrate query schema, returning a query dict and a plain-English interpretation of what the LLM understood.

Parameters:
  • text (str) – Natural language description of the query, e.g. “What is the probability that cumulative clay thickness exceeds 10 m?”.

  • f_prior_h5 (str) – Path to the prior HDF5 file. Model metadata (class names, depth ranges, discrete/continuous type) is read automatically and included in the LLM prompt so the model knows what constraints are valid.

  • model (str, optional) – Anthropic model ID to use (default: ‘claude-sonnet-4-6’).

  • api_key (str, optional) – Anthropic API key. If None, the ANTHROPIC_API_KEY environment variable is used.

  • verbose (bool, optional) – If True, print the system prompt and LLM response for inspection.

Returns:

  • query_dict (dict) – Query dict ready to pass to ig.query(f_post_h5, query_dict).

  • interpretation (str) – Plain English confirmation of what the LLM understood the query to mean. Check this before running ig.query() to catch misunderstandings cheaply.

Raises:
  • ImportError – If the anthropic package is not installed.

  • ValueError – If the LLM reports the query is unsupported, or if the response cannot be parsed as valid JSON.

Notes

Requires either the api_key parameter or the ANTHROPIC_API_KEY environment variable to be set. Install the dependency with: pip install anthropic
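The ValueError path for unparseable responses can be guarded against in caller code as well. The snippet below is a hypothetical sanity check on an LLM-produced response, using a hard-coded stand-in for the model output; the key names follow the constraint schema shown in the query() example.

```python
import json

# Stand-in for the raw text returned by the LLM.
raw = (
    '{"constraints": [{"im": 2, "classes": [2],'
    ' "thickness_mode": "cumulative",'
    ' "thickness_comparison": ">", "thickness_threshold": 10.0,'
    ' "depth_min": 0.0, "depth_max": 30.0, "negate": false}]}'
)

# Parse and do a minimal structural check before passing to ig.query().
try:
    query_dict = json.loads(raw)
except json.JSONDecodeError as e:
    raise ValueError(f"LLM response is not valid JSON: {e}")

if not query_dict.get("constraints"):
    raise ValueError("LLM response has no 'constraints' list")
```

Reading the returned interpretation string remains the cheapest check of all: it catches cases where the JSON is valid but the LLM misunderstood the question.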

Examples

>>> import integrate as ig
>>> query_dict, interpretation = ig.query_from_text(
...     "Probability that cumulative clay thickness > 10 m within 0-30 m",
...     f_prior_h5='prior.h5',
...     api_key='sk-ant-...',
... )
>>> print(interpretation)
>>> P, meta = ig.query('posterior.h5', query_dict)
>>> ig.query_plot(P, meta)
integrate.integrate_query.query_plot(P, meta, ip=None, query_dict=None, f_prior_h5=None, f_post_h5=None, title=None, query_text=None, interpretation=None)

Plot query results and optionally detailed model visualization for a data point.

Always displays a probability map showing P(x, y). If ip is provided along with query_dict and (f_prior_h5 or f_post_h5), also displays a detailed visualization of the posterior models and query-matching models for that specific data location.

Parameters:
  • P (ndarray (N_data,)) – Probability array from query().

  • meta (dict) – Metadata dict from query() containing ‘X’, ‘Y’, ‘i_use’, ‘i_use_query’.

  • ip (int, optional) – Data point index to visualize in detail. If None, only the probability map is shown.

  • query_dict (dict, optional) – Query dict used in query(). Required for detailed visualization.

  • f_prior_h5 (str, optional) – Path to prior HDF5 file. If not provided, will be extracted from f_post_h5.

  • f_post_h5 (str, optional) – Path to posterior HDF5 file. Used to automatically extract prior file path if f_prior_h5 is not provided.

  • title (str, optional) – Custom title for the probability map. If None, a title is built from query_text and interpretation (if provided), or ‘Query Probability Map’.

  • query_text (str, optional) – The original natural-language query string. Shown in the figure title.

  • interpretation (str, optional) – The LLM interpretation string returned by query_from_text(). Shown as a second line in the figure title so the user can verify the query before inspecting results.

Examples

>>> P, meta = query(f_post_h5, query_def)
>>> query_plot(P, meta)  # Just probability map
>>> query_plot(P, meta, title='Custom Query Title')  # Custom title
>>> query_plot(P, meta, ip=1000, query_dict=query_def, f_post_h5='posterior.h5')
>>> query_plot(P, meta, ip=1000, query_dict=query_def, f_prior_h5='prior.h5')
>>> # With LLM query text and interpretation:
>>> query_dict, interp = ig.query_from_text(text, f_prior_h5)
>>> P, meta = ig.query(f_post_h5, query_dict)
>>> ig.query_plot(P, meta, query_text=text, interpretation=interp)
integrate.integrate_query.save_query(query, path)

Save a query dict to a JSON file.

Parameters:
  • query (dict) – Query definition dictionary.

  • path (str) – Output JSON file path.