integrate

Core functions in the INTEGRATE module.

INTEGRATE Core Module - Probabilistic Geophysical Data Integration

This module implements rejection sampling algorithms for Bayesian inversion and probabilistic data integration in geophysics, with particular focus on electromagnetic (EM) data analysis. The module provides comprehensive tools for prior model generation, forward modeling, likelihood computation, and posterior sampling.

Key Features:
  • Rejection sampling for Bayesian inversion

  • Parallel processing with shared memory optimization

  • Temperature annealing for improved sampling efficiency

  • Support for multiple data types (TDEM, multinomial, etc.)

  • Integration with GA-AEM electromagnetic forward modeling

  • Automatic temperature estimation and adaptive sampling

Main Functions:
  • integrate_rejection(): Main rejection sampling workflow (now in integrate_rejection module)

  • prior_data(): Integration of forward modeling with prior structure

  • forward_gaaem(): Electromagnetic forward modeling interface

  • likelihood_*(): Various likelihood calculation functions (now in integrate_rejection module)

  • posterior_*(): Posterior analysis and statistics

Author: Thomas Mejer Hansen (tmeha@geo.au.dk)

integrate.integrate.allocate_large_page()

Allocate a 2MB large page if running on Windows.

Returns:

Pointer to allocated memory on success, None on failure or non-Windows systems.

Return type:

int or None

Notes

Large pages can improve performance but require specific Windows privileges.

integrate.integrate.class_id_to_idx(D, class_id=None)

Convert class identifiers to indices.

This function takes an array of class identifiers and converts them to corresponding indices. If no class identifiers are provided, it will automatically determine the unique class identifiers from the input array.

Parameters:
  • D (numpy.ndarray) – Array containing class identifiers.

  • class_id (numpy.ndarray, optional) – Array of unique class identifiers. If None, unique class identifiers will be determined from the input array D. Default is None.

Returns:

A tuple containing the following elements:

D_idx : numpy.ndarray

Array with class identifiers converted to indices.

class_id : numpy.ndarray

Array of unique class identifiers.

class_id_out : numpy.ndarray

Array of unique output class identifiers.

Return type:

tuple
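The identifier-to-index mapping can be sketched with plain NumPy; `class_id_to_idx_sketch` below is an illustrative stand-in, not the module function:

```python
import numpy as np

def class_id_to_idx_sketch(D, class_id=None):
    """Map class identifiers to consecutive indices (illustrative sketch)."""
    if class_id is None:
        class_id = np.unique(D)            # sorted unique identifiers from D
    # np.searchsorted maps each identifier to its position in the (sorted)
    # class_id array; this assumes class_id is sorted, as np.unique guarantees
    D_idx = np.searchsorted(class_id, D)
    return D_idx, class_id

D = np.array([10, 30, 10, 20])
D_idx, class_id = class_id_to_idx_sketch(D)
print(D_idx)     # [0 2 0 1]
print(class_id)  # [10 20 30]
```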

integrate.integrate.comb_cprob(pA, pAgB, pAgC, tau=1.0)

Combine conditional probabilities based on permanence of updating ratios.

This function implements the probability combination method described in Journel’s “Combining Knowledge from Diverse Sources: An Alternative to Traditional Data Independence Hypotheses” (Mathematical Geology, 2002).

Parameters:
  • pA (array_like) – Probability of event A.

  • pAgB (array_like) – Conditional probability of A given B.

  • pAgC (array_like) – Conditional probability of A given C.

  • tau (float, optional) – Combination parameter controlling the ratio permanence. Default is 1.0.

Returns:

Combined conditional probability Prob(A|B,C).

Return type:

numpy.ndarray

References:

Journel, A. G., Combining Knowledge from Diverse Sources: An Alternative to Traditional Data Independence Hypotheses, Mathematical Geology, 34(5), 2002.
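The permanence-of-ratios combination can be sketched as follows. The exact tau weighting used by the module is not documented here, so this is a minimal illustration assuming the tau-model form x = b·(c/a)^τ, which reduces to Journel’s classic x = b·c/a at τ = 1:

```python
import numpy as np

def comb_cprob_sketch(pA, pAgB, pAgC, tau=1.0):
    """Tau-model combination P(A|B,C) via permanence of ratios (sketch).

    Distances to certainty: a = (1-pA)/pA, b = (1-pAgB)/pAgB,
    c = (1-pAgC)/pAgC. Combined distance (assumed form):
    x = b * (c/a)**tau, then P(A|B,C) = 1 / (1 + x).
    """
    a = (1.0 - np.asarray(pA)) / np.asarray(pA)
    b = (1.0 - np.asarray(pAgB)) / np.asarray(pAgB)
    c = (1.0 - np.asarray(pAgC)) / np.asarray(pAgC)
    x = b * (c / a) ** tau
    return 1.0 / (1.0 + x)

# With an uninformative datum C (pAgC == pA), the result reduces to pAgB
print(comb_cprob_sketch(0.5, 0.8, 0.5))  # 0.8
```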

integrate.integrate.compute_P_obs_from_log(depth_top, depth_bottom, lithology_obs, z, class_id, P_single=0.8, P_prior=None)

Compute discrete observation probability matrix from depth intervals and lithology observations.

This function creates a probability matrix where each depth point is assigned probabilities based on observed lithology classes within specified depth intervals.

Parameters:
  • depth_top (array-like) – Array of top depths for each observation interval.

  • depth_bottom (array-like) – Array of bottom depths for each observation interval.

  • lithology_obs (array-like) – Array of observed lithology class IDs for each interval.

  • z (array-like) – Array of depth/position values where probabilities are computed.

  • class_id (array-like) – Array of unique class identifiers (e.g., [0, 1, 2] for 3 lithology types).

  • P_single (float, optional) – Probability assigned to the observed class. Default is 0.8.

  • P_prior (ndarray, optional) – Prior probability matrix of shape (nclass, nm). If None, uses uniform distribution for depths not covered by observations. Default is None.

Returns:

P_obs – Probability matrix of shape (nclass, nm) where nclass is the number of classes and nm is the number of depth points. For each depth point covered by observations, the observed class gets probability P_single and other classes share (1-P_single). Depths not covered by any observation contain NaN or prior probabilities if provided.

Return type:

ndarray

Examples

>>> depth_top = [0, 10, 20]
>>> depth_bottom = [10, 20, 30]
>>> lithology_obs = [1, 2, 1]  # clay, sand, clay
>>> z = np.arange(30)
>>> class_id = [0, 1, 2]  # gravel, clay, sand
>>> P_obs = compute_P_obs_from_log(depth_top, depth_bottom, lithology_obs, z, class_id)
>>> print(P_obs.shape)  # (3, 30)
integrate.integrate.entropy(P, base=None)

Calculate the entropy of a discrete probability distribution.

The entropy is calculated using the formula: H(P) = -sum(P_i * log_b(P_i))

Parameters:
  • P (numpy.ndarray) – Probability distribution. Can be a 1D or 2D array. If 2D, each row represents a different distribution.

  • base (int, optional) – The logarithm base to use. If None, uses the number of elements in the probability distribution (P.shape[1]). Default is None.

Returns:

The entropy value(s). If input P is 2D, returns an array with entropy for each row distribution.

Return type:

numpy.ndarray

Notes

  • Input probabilities are assumed to be normalized (sum to 1)

  • Zero probabilities are handled by numpy’s log function

  • For 2D input, entropy is calculated row-wise

Examples

>>> P = np.array([0.5, 0.5])
>>> entropy(P)
1.0
>>> P = np.array([[0.5, 0.5], [0.1, 0.9]])
>>> entropy(P)
array([1.0, 0.469])
integrate.integrate.forward_gaaem(C=array([], dtype=float64), thickness=array([], dtype=float64), stmfiles=None, tx_height=array([], dtype=float64), txrx_dx=-13, txrx_dy=0, txrx_dz=0.1, GEX={}, file_gex=None, showtime=False, **kwargs)

Perform forward modeling using the GA-AEM method.

Parameters:
  • C (numpy.ndarray, optional) – Conductivity array. Default is np.array(()).

  • thickness (numpy.ndarray, optional) – Thickness array. Default is np.array(()).

  • stmfiles (list, optional) – List of STM files. Default is None.

  • tx_height (numpy.ndarray, optional) – Transmitter height array. Default is np.array(()).

  • txrx_dx (float, optional) – X-distance between transmitter and receiver. Default is -13.

  • txrx_dy (float, optional) – Y-distance between transmitter and receiver. Default is 0.

  • txrx_dz (float, optional) – Z-distance between transmitter and receiver. Default is 0.1.

  • GEX (dict, optional) – GEX dictionary. Default is {}.

  • file_gex (str, optional) – Path to GEX file. Default is None.

  • showtime (bool, optional) – Flag to display execution time. Default is False.

  • showInfo (int, optional) – Level of verbosity for output.

  • doCompress (bool, optional) – Flag to enable layer compression. Default is True.

Returns:

Forward modeled data array.

Return type:

numpy.ndarray

integrate.integrate.forward_gaaem_chunk(C_chunk, tx_height_chunk, thickness, stmfiles, file_gex, Nhank, Nfreq, **kwargs)

Perform forward modeling using the GA-AEM method on a chunk of data.

Parameters:
  • C_chunk (numpy.ndarray) – The chunk of data to be processed.

  • tx_height_chunk (numpy.ndarray) – The transmitter heights for this chunk.

  • thickness (float) – The thickness of the model.

  • stmfiles (list) – A list of STM files.

  • file_gex (str) – The path to the GEX file.

  • Nhank (int) – The number of Hankel functions.

  • Nfreq (int) – The number of frequencies.

  • **kwargs (dict) – Additional keyword arguments.

Returns:

The result of the forward modeling.

Return type:

numpy.ndarray

integrate.integrate.get_hypothesis_probability(f_post_h5_arr, T=1)

Calculate hypothesis probabilities and related statistics from posterior files.

This function processes an array of HDF5 file paths containing posterior evidences to compute normalized probabilities for each hypothesis, along with evidence values, mode hypotheses, and entropy measures.

Parameters:
  • f_post_h5_arr (list of str) – Array of file paths to HDF5 files containing posterior evidence values. Each file should have an ‘/EV’ dataset.

  • T (float, optional) – Temperature parameter that applies annealing. Higher temperatures create more uniform distributions. Useful for smoothing distributions from smaller lookup tables. Default is 1.

Returns:

A tuple containing the following elements:

P : numpy.ndarray

Normalized probabilities for each hypothesis (shape: n_hypothesis, n_samples).

EV_all : numpy.ndarray

Evidence values for each hypothesis and sample (shape: n_hypothesis, n_samples).

MODE_hypothesis : numpy.ndarray

Index of most probable hypothesis per sample (shape: n_samples).

ENT_hypothesis : numpy.ndarray

Entropy of hypothesis distribution per sample, normalized by number of hypotheses (shape: n_samples).

Return type:

tuple

Notes

The probability normalization uses the log-sum-exp trick to avoid numerical underflow issues when working with evidence values.
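The log-sum-exp trick can be sketched in isolation; `normalize_evidence_logsumexp` below is a hypothetical stand-alone helper, not the module function:

```python
import numpy as np

def normalize_evidence_logsumexp(logEV, T=1.0):
    """Normalize log-evidence values to probabilities per sample (sketch).

    logEV has shape (n_hypothesis, n_samples); temperature T > 1 flattens
    the resulting distribution. Subtracting the per-column maximum before
    exponentiating avoids numerical underflow (the log-sum-exp trick).
    """
    logEV = np.asarray(logEV, dtype=float) / T
    logEV = logEV - logEV.max(axis=0, keepdims=True)  # shift so max is 0
    P = np.exp(logEV)
    return P / P.sum(axis=0, keepdims=True)

logEV = np.array([[-1000.0, -1.0],
                  [-1001.0, -1.0]])
P = normalize_evidence_logsumexp(logEV)
print(P[:, 0])  # naive exp(-1000) would underflow to 0/0; here ~[0.73, 0.27]
```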

integrate.integrate.get_process_handle_count()

Return the number of handles used by the current process (Windows only).

Returns:

The number of handles used by the current process.

Return type:

int

integrate.integrate.integrate_posterior_stats(f_post_h5='POST.h5', ip_range=None, **kwargs)

Compute posterior statistics for all model parameters in a POST HDF5 file.

Reads posterior sample indices (i_use) and the corresponding prior model realizations, then computes per-location statistics and writes them back into the same POST.h5 file.

Parameters:
  • f_post_h5 (str, optional) – Path to the POST HDF5 file. Default is ‘POST.h5’.

  • ip_range (array-like or None, optional) – Indices of data locations to process. If None, all locations are processed. Locations not in ip_range receive NaN values. Default is None.

  • **kwargs (dict) –

    showInfo : int, optional

    Verbosity level. 0 = silent, 1 = progress bars. Default is 0.

    usePrior : bool, optional

    If True, use randomly drawn prior samples instead of i_use indices (useful for computing prior statistics as a baseline). Default is False.

    updateGeometryFromData : bool, optional

    Copy UTMX, UTMY, LINE, ELEVATION from the DATA.h5 file into POST.h5 if not already present. Default is True.

    computeKL : bool, optional

    Shorthand to enable KL divergence computation for all parameter types. If True, both computeKL_continuous and computeKL_discrete are set to True. Default is False.

    computeKL_continuous : bool, optional

    Compute KL divergence D_KL(posterior || prior) for continuous model parameters using log10-space histograms (50 bins). Result is in bits (log base 2). Default is False.

    computeKL_discrete : bool, optional

    Compute KL divergence D_KL(posterior || prior) for discrete model parameters. Normalised to [0, 1] using log_base = number of classes (0 = posterior equals prior, 1 = completely certain). Default is False.

Writes to POST.h5
-----------------

/N_UNIQUE [Np]

Number of unique prior realizations used per location.

/UTMX, /UTMY, /LINE, /ELEVATION [Np]

Geometry copied from DATA.h5 (if updateGeometryFromData=True).

For each continuous model parameter /Mx:

[Np, Nm] Median of posterior realizations.

[Np, Nm] Geometric mean (exp of mean of log values) of posterior realizations.

[Np, Nm] Standard deviation of log10(posterior).

/Mx/KL [Np, Nm] KL divergence in bits. Only written when computeKL_continuous=True.

For each discrete model parameter /Mx:

[Np, Nm] Most probable class at each location/depth.

[Np, Nm] Shannon entropy normalised by log(n_classes).

[Np, Nclass, Nm] Posterior probability of each class.

/Mx/KL [Np, Nm] KL divergence normalised to [0, 1] (0 = posterior equals prior, 1 = completely certain). Only written when computeKL_discrete=True.

Return type:

None

integrate.integrate.integrate_update_prior_attributes(f_prior_h5, **kwargs)

Update the ‘is_discrete’ attribute of datasets in an HDF5 file.

This function iterates over all datasets in the provided HDF5 file. For each dataset whose name starts with ‘M’ and that lacks an ‘is_discrete’ attribute, the function samples the first 1000 elements and counts the unique values: fewer than 20 unique values sets ‘is_discrete’ to 1, otherwise 0. The attribute is then added to the dataset.

Parameters:
  • f_prior_h5 (str) – The path to the HDF5 file to process.

  • showInfo (int, optional) – Level of verbosity for output (default is 0).
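The discreteness heuristic described above can be sketched as follows; `guess_is_discrete` and the sample arrays are illustrative, not part of the module:

```python
import numpy as np

def guess_is_discrete(values, n_sample=1000, max_unique=20):
    """Heuristic described above: look at the first n_sample values and
    flag the parameter as discrete if fewer than max_unique distinct values."""
    sample = np.asarray(values).ravel()[:n_sample]
    return 1 if np.unique(sample).size < max_unique else 0

rng = np.random.default_rng(0)
M_discrete = rng.choice([1, 2, 3], size=(500, 10))   # e.g. lithology classes
M_continuous = rng.random((500, 10))                 # e.g. resistivity values
print(guess_is_discrete(M_discrete), guess_is_discrete(M_continuous))  # 1 0
```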

integrate.integrate.is_notebook()

Check if the code is running in a Jupyter notebook or IPython shell.

Returns:

True if running in a Jupyter notebook or IPython shell, False otherwise.

Return type:

bool

integrate.integrate.kl_divergence(prior_sample, posterior_sample, is_discrete=False, log_base=None)

Compute KL divergence D_KL(posterior || prior) for one or multiple parameters.

Parameters:
  • prior_sample (array, shape (n_realizations,) or (n_realizations, n_params))

  • posterior_sample (array, shape (n_realizations,) or (n_realizations, n_params)) – If 2D, KL is computed per column and a 1D array of size n_params is returned.

  • is_discrete (bool, optional) – If True, treat the parameter as discrete. Default is False.

  • log_base (int, float, or None) – Base of the logarithm. None uses natural log (nats). For discrete parameters, passing log_base=N (number of classes) normalizes the result to [0, 1], where 1 means complete certainty.

Returns:

KL divergence value(s).

Return type:

float or numpy.ndarray
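For the discrete case, the computation can be sketched from sample frequencies; `kl_discrete_sketch` is an illustrative stand-in and the module may use a different estimator:

```python
import numpy as np

def kl_discrete_sketch(prior_sample, posterior_sample, log_base=None):
    """D_KL(posterior || prior) for one discrete parameter (sketch).

    Class probabilities are estimated from sample frequencies; passing
    log_base=N (number of classes) normalizes the result to [0, 1].
    """
    classes = np.union1d(prior_sample, posterior_sample)
    p = np.array([(posterior_sample == c).mean() for c in classes])  # posterior
    q = np.array([(prior_sample == c).mean() for c in classes])      # prior
    mask = p > 0                           # 0*log(0) terms contribute nothing
    kl = np.sum(p[mask] * np.log(p[mask] / q[mask]))
    if log_base is not None:
        kl /= np.log(log_base)
    return kl

prior = np.array([0, 1, 0, 1])             # uniform over two classes
posterior = np.array([0, 0, 0, 0])         # completely certain
print(kl_discrete_sketch(prior, posterior, log_base=2))  # 1.0
```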

integrate.integrate.logl_T_est(logL, N_above=10, P_acc_lev=0.2)

Estimate an annealing temperature (T_est) from an array of log-likelihood values (logL), based on the N_above best models and a target acceptance level (P_acc_lev).

Parameters:
  • logL (numpy.ndarray) – An array of logarithmic likelihoods.

  • N_above (int, optional) – The number of elements above which to consider in the sorted logL array. Default is 10.

  • P_acc_lev (float, optional) – The acceptance level for the calculation. Default is 0.2.

Returns:

The estimated temperature. It’s either a positive number or infinity.

Return type:

float

Notes

The function normalizes logL by subtracting its maximum value, sorts the result in ascending order, and removes any NaN values. If the sorted array is non-empty, T_est is computed from the (N_above+1)-th last element and the natural logarithm of P_acc_lev; if the array is empty, T_est is set to infinity.
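Under the assumption that acceptance follows exp(logL/T), the estimate described above can be sketched as follows; `logl_T_est_sketch` is illustrative, and its guard for short arrays is an assumption:

```python
import numpy as np

def logl_T_est_sketch(logL, N_above=10, P_acc_lev=0.2):
    """Estimate an annealing temperature from log-likelihoods (sketch).

    Normalize by the maximum, sort ascending, drop NaNs, then choose T so
    that the (N_above+1)-th last model would be accepted with probability
    P_acc_lev, i.e. exp(logL/T) = P_acc_lev (assumed acceptance form).
    """
    logL = np.asarray(logL, dtype=float)
    logL = logL[~np.isnan(logL)]
    if logL.size <= N_above:               # guard (assumption): too few models
        return np.inf
    logL = np.sort(logL - logL.max())
    return logL[-(N_above + 1)] / np.log(P_acc_lev)

logL = -np.arange(100.0)                   # best model has logL = 0
T = logl_T_est_sketch(logL, N_above=10, P_acc_lev=0.2)
print(round(T, 3))  # 10 / ln(5) ~ 6.213
```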

integrate.integrate.lu_post_sample_logl(logL, ns=1, T=1)

Perform LU post-sampling log-likelihood calculation.

Parameters:
  • logL (array-like) – Array of log-likelihood values.

  • ns (int, optional) – Number of samples to generate. Defaults to 1.

  • T (float, optional) – Temperature parameter. Defaults to 1.

Returns:

A tuple containing the generated samples and the acceptance probabilities.

i_use_all : numpy.ndarray

Array of indices of the selected samples.

P_acc : numpy.ndarray

Array of acceptance probabilities.

Return type:

tuple
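A minimal sketch of temperature-annealed sampling from log-likelihoods, assuming acceptance probabilities of the form exp(logL/T); the helper below is illustrative, not the module function:

```python
import numpy as np

def sample_posterior_indices(logL, ns=1, T=1.0, rng=None):
    """Draw ns sample indices with acceptance probability exp(logL/T) (sketch).

    Normalizing by the maximum keeps exp() well behaved; T > 1 flattens the
    acceptance probabilities (temperature annealing).
    """
    rng = np.random.default_rng() if rng is None else rng
    logL = np.asarray(logL, dtype=float)
    P_acc = np.exp((logL - np.nanmax(logL)) / T)  # acceptance probabilities
    i_use = rng.choice(logL.size, size=ns, p=P_acc / P_acc.sum())
    return i_use, P_acc

logL = np.array([0.0, -1.0, -50.0])
i_use, P_acc = sample_posterior_indices(logL, ns=5,
                                        rng=np.random.default_rng(1))
print(P_acc)  # best model has P_acc = 1.0; the worst is effectively never drawn
```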

integrate.integrate.posterior_cumulative_thickness(f_post_h5, im=2, icat=[0], usePrior=False, **kwargs)

Calculate the posterior cumulative thickness based on the given inputs.

Parameters:
  • f_post_h5 (str) – Path to the input h5 file.

  • im (int, optional) – Index of model parameter number, M[im]. Default is 2.

  • icat (list, optional) – List of category indices. Default is [0].

  • usePrior (bool, optional) – Flag indicating whether to use prior. Default is False.

  • **kwargs (dict) – Additional keyword arguments.

Returns:

A tuple containing the following elements:

thick_mean : numpy.ndarray

Array of mean cumulative thickness.

thick_median : numpy.ndarray

Array of median cumulative thickness.

thick_std : numpy.ndarray

Array of standard deviation of cumulative thickness.

class_out : list

List of class names.

X : numpy.ndarray

Array of X values.

Y : numpy.ndarray

Array of Y values.

Return type:

tuple

integrate.integrate.prior_data(f_prior_in_h5, f_forward_h5, id=1, im=1, doMakePriorCopy=0, parallel=True)

Update prior structure with forward modeled data.

This function integrates forward modeling results into the prior data structure, supporting different data types including TDEM (time-domain electromagnetic) data with GA-AEM forward modeling and identity transforms.

Parameters:
  • f_prior_in_h5 (str) – Path to input prior HDF5 file containing prior models.

  • f_forward_h5 (str) – Path to forward modeling results HDF5 file.

  • id (int, optional) – Data identifier for the prior structure. Default is 1.

  • im (int, optional) – Model identifier for the prior structure. Default is 1.

  • doMakePriorCopy (int, optional) – Flag to create a copy of the prior file (0=no copy, 1=copy). Default is 0.

  • parallel (bool, optional) – Enable parallel processing for forward modeling. Default is True.

Returns:

Path to the updated prior HDF5 file containing integrated data.

Return type:

str

Notes

The function automatically detects the data type from the forward modeling file and calls appropriate integration methods (GA-AEM for TDEM, identity for direct data). Prints error messages for unsupported data types or methods.

integrate.integrate.prior_data_gaaem(f_prior_h5, file_gex=None, stmfiles=None, N=0, doMakePriorCopy=True, im=1, id=1, im_height=0, Nhank=280, Nfreq=12, is_log=False, parallel=True, **kwargs)

Generate prior data for the GA-AEM method.

Parameters:
  • f_prior_h5 (str) – Path to the prior data file in HDF5 format.

  • file_gex (str, optional) – Path to the file containing geophysical exploration data (.gex format).

  • stmfiles (list of str, optional) – List of STM files for system configuration. If not provided, will be generated from file_gex.

  • N (int, optional) – Number of soundings to consider. Default is 0 (use all).

  • doMakePriorCopy (bool, optional) – Flag indicating whether to make a copy of the prior file. Default is True.

  • im (int, optional) – Index of the model. Default is 1.

  • id (int, optional) – Index of the data. Default is 1.

  • im_height (int, optional) – Index of the model for height. Default is 0.

  • Nhank (int, optional) – Number of Hankel transform quadrature points. Default is 280.

  • Nfreq (int, optional) – Number of frequencies. Default is 12.

  • is_log (bool, optional) – Flag to apply logarithmic scaling to data. Default is False.

  • parallel (bool, optional) – Flag indicating whether multiprocessing is used. Default is True. When True, forward modeling is parallelized across available CPUs.

  • **kwargs (dict) –

    Additional keyword arguments:

    Ncpu : int, optional

    Number of CPUs to use for parallel processing. Default is 0, which uses all available CPUs. Only used when parallel=True.

    showInfo : int, optional

    Level of verbosity for output (0=silent, 1=normal, 2=verbose).

Returns:

Filename of the HDF5 file containing the updated prior data.

Return type:

str

Notes

This function computes forward-modeled electromagnetic responses for prior model realizations using the GA-AEM forward modeling code. The forward modeling can be parallelized for faster computation on multi-core systems.

Examples

>>> # Basic usage with all CPUs
>>> f_prior_data = prior_data_gaaem(f_prior_h5, file_gex)
>>> # Use specific number of CPUs
>>> f_prior_data = prior_data_gaaem(f_prior_h5, file_gex, Ncpu=4)
>>> # Sequential processing (no parallelization)
>>> f_prior_data = prior_data_gaaem(f_prior_h5, file_gex, parallel=False)
integrate.integrate.prior_data_identity(f_prior_h5, id=0, im=1, N=0, doMakePriorCopy=False, **kwargs)

Generate data D{id} from model M{im} in the prior file f_prior_h5 as an identity of M{im}.

Parameters:
  • f_prior_h5 (str) – Path to the prior data file in HDF5 format.

  • id (int, optional) – Index of the data. If id=0, the next available data id is used. Default is 0.

  • im (int, optional) – Index of the model. Default is 1.

  • N (int, optional) – Number of soundings to consider. Default is 0 (use all).

  • doMakePriorCopy (bool, optional) – Flag indicating whether to make a copy of the prior file. Default is False.

  • showInfo (int, optional) – Level of verbosity for output.

  • forceDeleteExisting (bool, optional) – Flag to force deletion of existing data. Default is True.

Returns:

Path to the HDF5 file containing the updated prior data.

Return type:

str

integrate.integrate.prior_model_layered(lay_dist='uniform', dz=1, z_max=90, NLAY_min=3, NLAY_max=6, NLAY_deg=6, RHO_dist='log-uniform', RHO_min=0.1, RHO_max=5000, RHO_mean=100, RHO_std=80, N=100000, save_sparse=True, RHO_threshold=0.001, **kwargs)

Generate a prior model with layered structure.

This optimized implementation uses vectorized NumPy operations for improved performance, providing ~2x speedup for large N compared to the original loop-based implementation.

Parameters:
  • lay_dist (str, optional) – Distribution of the number of layers. Options are ‘chi2’ and ‘uniform’. Default is ‘uniform’.

  • dz (float, optional) – Depth discretization step. Default is 1.

  • z_max (float, optional) – Maximum depth in m. Default is 90.

  • NLAY_min (int, optional) – Minimum number of layers. Default is 3.

  • NLAY_max (int, optional) – Maximum number of layers. Default is 6.

  • NLAY_deg (int, optional) – Degrees of freedom for chi-square distribution. Only applicable if lay_dist is ‘chi2’. Default is 6.

  • RHO_dist (str, optional) – Distribution of resistivity within each layer. Options are ‘log-uniform’, ‘uniform’, ‘normal’, and ‘lognormal’. Default is ‘log-uniform’.

  • RHO_min (float, optional) – Minimum resistivity value. Default is 0.1.

  • RHO_max (float, optional) – Maximum resistivity value. Default is 5000.

  • RHO_mean (float, optional) – Mean resistivity value. Only applicable if RHO_dist is ‘normal’ or ‘lognormal’. Default is 100.

  • RHO_std (float, optional) – Standard deviation of resistivity value. Only applicable if RHO_dist is ‘normal’ or ‘lognormal’. Default is 80.

  • N (int, optional) – Number of prior models to generate. Default is 100000.

  • save_sparse (bool, optional) – Whether to save the sparse representation (M2: depth-resistivity pairs) to the HDF5 file. Setting to False can reduce file size and processing time for large priors. Default is True.

  • RHO_threshold (float, optional) – Minimum physical resistivity threshold in Ohm·m. Any generated resistivity values below this threshold (including zero or negative values from ‘normal’ distribution) will be clamped to this value. Ensures physically realistic positive resistivity values. Default is 0.001 Ohm·m.

  • f_prior_h5 (str, optional) – Path to the prior model file in HDF5 format. Default is ‘’.

  • showInfo (int, optional) – Level of verbosity for output.

Returns:

Filepath of the saved prior model.

Return type:

str

Notes

This implementation pre-generates all random values using vectorized NumPy operations, significantly improving performance for large N (e.g., N=50000).
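The generation strategy can be sketched as follows. This illustrative version draws layer counts, interface depths, and log-uniform resistivities per model, then expands each model onto a regular depth grid; it is a simplified stand-in, not the module's vectorized implementation:

```python
import numpy as np

def layered_prior_sketch(N=1000, dz=1.0, z_max=90.0, NLAY_min=3, NLAY_max=6,
                         RHO_min=0.1, RHO_max=5000.0, rng=None):
    """Generate N layered models on a regular depth grid (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.arange(0.0, z_max, dz)                        # regular depth grid
    nlay = rng.integers(NLAY_min, NLAY_max + 1, size=N)  # layers per model
    M = np.empty((N, z.size))
    for i in range(N):
        z_top = np.sort(rng.uniform(0, z_max, nlay[i] - 1))  # interface depths
        rho = 10 ** rng.uniform(np.log10(RHO_min), np.log10(RHO_max), nlay[i])
        layer = np.searchsorted(z_top, z, side='right')  # layer index per depth
        M[i] = rho[layer]
    return z, M

z, M = layered_prior_sketch(N=100, rng=np.random.default_rng(0))
print(M.shape)  # (100, 90)
```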

integrate.integrate.prior_model_workbench(N=100000, p=2, z1=0, z_max=100, dz=1, lay_dist='uniform', nlayers=0, NLAY_min=3, NLAY_max=6, NLAY_deg=5, RHO_dist='log-uniform', RHO_min=1, RHO_max=300, RHO_mean=180, RHO_std=80, chi2_deg=100, RHO_threshold=0.001, **kwargs)

Generate a prior model with increasingly thick layers.

Parameters:
  • N (int, optional) – Number of prior models to generate. Default is 100000.

  • p (int, optional) – Power parameter for thickness increase. Default is 2.

  • z1 (float, optional) – Minimum depth value. Default is 0.

  • z_max (float, optional) – Maximum depth value. Default is 100.

  • dz (float, optional) – Depth discretization step. Default is 1.

  • lay_dist (str, optional) – Distribution of the number of layers. Options are ‘chi2’ and ‘uniform’. Default is ‘uniform’.

  • nlayers (int, optional) – Number of layers. If greater than 0, sets both NLAY_min and NLAY_max to this value. Default is 0.

  • NLAY_min (int, optional) – Minimum number of layers. Default is 3.

  • NLAY_max (int, optional) – Maximum number of layers. Default is 6.

  • NLAY_deg (int, optional) – Degrees of freedom for chi-square distribution. Only applicable if lay_dist is ‘chi2’. Default is 5.

  • RHO_dist (str, optional) – Distribution of resistivity within each layer. Options are ‘log-uniform’, ‘uniform’, ‘normal’, ‘lognormal’, and ‘chi2’. Default is ‘log-uniform’.

  • RHO_min (float, optional) – Minimum resistivity value. Default is 1.

  • RHO_max (float, optional) – Maximum resistivity value. Default is 300.

  • RHO_mean (float, optional) – Mean resistivity value. Only applicable if RHO_dist is ‘normal’ or ‘lognormal’. Default is 180.

  • RHO_std (float, optional) – Standard deviation of resistivity value. Only applicable if RHO_dist is ‘normal’ or ‘lognormal’. Default is 80.

  • chi2_deg (int, optional) – Degrees of freedom for chi2 distribution. Only applicable if RHO_dist is ‘chi2’. Default is 100.

  • RHO_threshold (float, optional) – Minimum physical resistivity threshold in Ohm·m. Any generated resistivity values below this threshold (including zero or negative values from ‘normal’ distribution) will be clamped to this value. Ensures physically realistic positive resistivity values. Default is 0.001 Ohm·m.

  • f_prior_h5 (str, optional) – Path to the prior model file in HDF5 format. Default is ‘’.

  • showInfo (int, optional) – Level of verbosity for output.

Returns:

Filepath of the saved prior model.

Return type:

str

integrate.integrate.prior_model_workbench_direct(N=100000, RHO_dist='log-uniform', z1=0, z_max=100, nlayers=0, p=2, NLAY_min=3, NLAY_max=6, RHO_min=1, RHO_max=300, RHO_mean=180, RHO_std=80, chi2_deg=100, RHO_threshold=0.001, **kwargs)

Generate a prior model with increasingly thick layers.

All models have the same number of layers! See also: prior_model_workbench.

Parameters:
  • N (int, optional) – Number of prior models to generate. Default is 100000.

  • RHO_dist (str, optional) – Distribution of resistivity within each layer. Options are ‘log-uniform’, ‘uniform’, ‘normal’, ‘lognormal’, and ‘chi2’. Default is ‘log-uniform’.

  • z1 (float, optional) – Minimum depth value. Default is 0.

  • z_max (float, optional) – Maximum depth value. Default is 100.

  • nlayers (int, optional) – Number of layers. Default is 0 (uses 30 if less than 1).

  • p (int, optional) – Power parameter for thickness increase. Default is 2.

  • NLAY_min (int, optional) – Minimum number of layers. Default is 3.

  • NLAY_max (int, optional) – Maximum number of layers. Default is 6.

  • RHO_min (float, optional) – Minimum resistivity value. Default is 1.

  • RHO_max (float, optional) – Maximum resistivity value. Default is 300.

  • RHO_mean (float, optional) – Mean resistivity value. Only applicable if RHO_dist is ‘normal’ or ‘lognormal’. Default is 180.

  • RHO_std (float, optional) – Standard deviation of resistivity value. Only applicable if RHO_dist is ‘normal’ or ‘lognormal’. Default is 80.

  • chi2_deg (int, optional) – Degrees of freedom for chi2 distribution. Only applicable if RHO_dist is ‘chi2’. Default is 100.

  • RHO_threshold (float, optional) – Minimum physical resistivity threshold in Ohm·m. Any generated resistivity values below this threshold (including zero or negative values from ‘normal’ distribution) will be clamped to this value. Ensures physically realistic positive resistivity values. Default is 0.001 Ohm·m.

  • f_prior_h5 (str, optional) – Path to the prior model file in HDF5 format. Default is ‘’.

  • showInfo (int, optional) – Level of verbosity for output.

Returns:

Filepath of the saved prior model.

Return type:

str

integrate.integrate.rescale_P_obs_temperature(P_obs, T=1.0)

Rescale discrete observation probabilities by temperature and renormalize.

This function applies temperature annealing to probability distributions by raising each probability to the power (1/T), then renormalizing each column (depth point) so that probabilities sum to 1. Higher temperatures (T > 1) flatten the distribution, while lower temperatures (T < 1) sharpen it.

Parameters:
  • P_obs (ndarray) – Probability matrix of shape (nclass, nm) where nclass is the number of classes and nm is the number of model parameters (e.g., depth points). Each column should represent a probability distribution over classes.

  • T (float, optional) – Temperature parameter for annealing. Default is 1.0 (no scaling).
    - T = 1.0: no change (original probabilities)
    - T > 1.0: flattens distribution (less certain)
    - T < 1.0: sharpens distribution (more certain)
    - T → ∞: approaches uniform distribution
    - T → 0: approaches one-hot distribution

Returns:

P_obs_scaled – Temperature-scaled and renormalized probability matrix of shape (nclass, nm). Each column sums to 1.0. NaN values in input are preserved in output.

Return type:

ndarray

Examples

>>> P_obs = np.array([[0.8, 0.6, 0.5],
...                   [0.1, 0.2, 0.3],
...                   [0.1, 0.2, 0.2]])
>>> P_scaled = rescale_P_obs_temperature(P_obs, T=2.0)
>>> print(P_scaled)  # More uniform distribution
>>> P_scaled = rescale_P_obs_temperature(P_obs, T=0.5)
>>> print(P_scaled)  # Sharper distribution

Notes

The temperature scaling follows the Boltzmann distribution:

P_new(c) ∝ P_old(c)^(1/T)

After scaling, each column (depth point) is renormalized:

P_new(c) = P_new(c) / sum_c(P_new(c))

This is commonly used in simulated annealing and rejection sampling to control the strength of discrete observations during Bayesian inference.

integrate.integrate.sample_from_posterior(is_, d_sim, f_data_h5='tTEM-Djursland.h5', N_use=1000000, autoT=1, ns=400)

Sample from the posterior distribution.

Parameters:
  • is_ (int) – Index of the data in f_data_h5.

  • d_sim (ndarray) – Simulated data.

  • f_data_h5 (str, optional) – Filepath of the data file. Default is ‘tTEM-Djursland.h5’.

  • N_use (int, optional) – Number of samples to use. Default is 1000000.

  • autoT (int, optional) – Flag indicating whether to estimate temperature. Default is 1.

  • ns (int, optional) – Number of samples to draw from the posterior. Default is 400.

Returns:

A tuple containing the following elements:

i_use : numpy.ndarray

Indices of the samples used.

T : float

Temperature.

EV : float

Evidence value.

is_index : int

Index of the posterior sample.

Return type:

tuple

integrate.integrate.sample_posterior_multiple_hypotheses(f_post_h5_arr, P_hypothesis=None)

Sample posterior models from multiple hypotheses.

This function samples posterior models from multiple hypotheses stored in HDF5 files, according to the given hypothesis probabilities.

Parameters:
  • f_post_h5_arr (list of str) – List of paths to HDF5 files containing posterior models for different hypotheses.

  • P_hypothesis (numpy.ndarray, optional) – Array of shape (n_hypotheses, n_soundings) containing probability of each hypothesis for each sounding. If None, uniform probabilities are used. Default is None.

Returns:

List of posterior model arrays. Each array has shape (n_soundings, n_samples, n_parameters), where n_samples is determined by the first hypothesis’s number of samples.

Return type:

list of numpy.ndarray

Notes

The function combines posterior samples from different hypotheses in proportion to their probabilities and ensures the total number of samples equals the first hypothesis’s sample count.
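Combining samples “in proportion to their probabilities” while hitting an exact total requires integer rounding; one common approach (an assumption here, not necessarily the module's) is largest-remainder allocation:

```python
import numpy as np

def allocate_samples(P_hyp, n_samples):
    """Split n_samples among hypotheses in proportion to P_hyp, ensuring the
    counts sum exactly to n_samples (largest-remainder rounding; sketch)."""
    counts = np.floor(P_hyp * n_samples).astype(int)
    remainder = P_hyp * n_samples - counts
    # hand out the leftover samples to the largest fractional remainders
    for i in np.argsort(remainder)[::-1][: n_samples - counts.sum()]:
        counts[i] += 1
    return counts

P_hyp = np.array([0.5, 0.3, 0.2])       # per-sounding hypothesis probabilities
print(allocate_samples(P_hyp, 400))     # [200 120  80]
```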

integrate.integrate.synthetic_case(case='Wedge', **kwargs)

Generate synthetic geological models for different cases.

This function creates synthetic 2D geological models for testing and validation purposes. Supports ‘Wedge’ and ‘3Layer’ model types with customizable parameters.

Parameters:
  • case (str, optional) – The type of synthetic case to generate. Options are ‘Wedge’ and ‘3Layer’. Default is ‘Wedge’.

  • showInfo (int, optional) – If greater than 0, print information about the generated case. Default is 0.

  • rho_1 (list or array_like, optional) – Resistivity values for layer 1 along the profile. If a single value [v] is given, resistivity is constant at v. If multiple values [v1, v2, …, vN] are given, resistivity varies along the profile from v1 (left) through the intermediate values to vN (right) using interpolation. Only used when rho_1, rho_2, and rho_3 are all provided.

  • rho_2 (list or array_like, optional) – Resistivity values for layer 2 along the profile. Same format as rho_1.

  • rho_3 (list or array_like, optional) – Resistivity values for layer 3 along the profile. Same format as rho_1.

  • x_max (int, optional) – Maximum x-dimension size. Default is 1000 for ‘Wedge’, 100 for ‘3Layer’.

  • dx (float, optional) – Step size in the x-dimension.

  • z_max (int, optional) – Maximum z-dimension size. Default is 90 for ‘Wedge’, 60 for ‘3Layer’.

  • dz (float, optional) – Step size in the z-dimension. Default is 1.

  • z1 (float, optional) – Depth at which the wedge starts (‘Wedge’: default z_max/10) or first layer ends (‘3Layer’: default z_max/3).

  • rho (list, optional) – Resistivity values for the different layers (‘Wedge’ case). Default is [100, 200, 120]. Overridden by rho_1, rho_2, rho_3 if all three are provided.

  • wedge_angle (float, optional) – Angle of the wedge in degrees (‘Wedge’ case). Default is 1.

  • x_range (float, optional) – Range in the x-dimension for the cosine function (‘3Layer’ case). Default is x_max/4.

  • z_thick (float, optional) – Thickness of the second layer (‘3Layer’ case). Default is z_max/2.

  • rho1_1 (float, optional) – Resistivity at the start of the first layer (‘3Layer’). Default is 120.

  • rho1_2 (float, optional) – Resistivity at the end of the first layer (‘3Layer’). Default is 10.

  • rho2_1 (float, optional) – Resistivity at the start of the second layer (‘3Layer’). Default is rho1_2.

  • rho2_2 (float, optional) – Resistivity at the end of the second layer (‘3Layer’). Default is rho1_1.

  • rho3 (float, optional) – Resistivity of the third layer (‘3Layer’). Default is 120.

Returns:

  • M (ndarray) – The generated synthetic resistivity model of shape (nx, nz).

  • x (ndarray) – X-coordinates of the model.

  • z (ndarray) – Z-coordinates (depth) of the model.

  • M_ref_lith (ndarray) – Lithology/layer number for each pixel, same shape as M. Values are 1, 2, 3 corresponding to the layer number.

  • layer_depths (ndarray) – Depth to the top of layers 1, 2, and 3 for each trace, shape (nx, 3). Column 0: depth to top of layer 1 (always 0) Column 1: depth to top of layer 2 Column 2: depth to top of layer 3

Examples

>>> # Constant resistivity in each layer
>>> M, x, z, M_lith, depths = ig.synthetic_case(case='3Layer', rho_1=[10], rho_2=[80], rho_3=[10])
>>>
>>> # Linear variation from left to right
>>> M, x, z, M_lith, depths = ig.synthetic_case(case='3Layer', rho_1=[10, 80], rho_2=[80, 10], rho_3=[10, 10])
>>>
>>> # Three-point variation (left, middle, right)
>>> M, x, z, M_lith, depths = ig.synthetic_case(case='3Layer', rho_1=[10, 80, 10], rho_2=[50, 100, 50], rho_3=[10, 10, 10])

integrate.integrate.timing_compute(N_arr=[], Nproc_arr=[])

Execute timing benchmark for INTEGRATE workflow components.

This function benchmarks the performance of the complete INTEGRATE workflow including prior model generation, forward modeling, rejection sampling, and posterior statistics computation across different dataset sizes and processor counts.

Parameters:
  • N_arr (array_like, optional) – Array of dataset sizes (number of prior models) to test. Default is [100, 500, 1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000].

  • Nproc_arr (array_like, optional) – Array of processor counts to test. Default is powers of 2 up to available CPUs.

Returns:

Filename of the NPZ file containing timing results.

Return type:

str

Notes

The benchmark tests four main components:

1. Prior model generation (layered geological models)

2. Forward modeling using GA-AEM electromagnetic simulation

3. Rejection sampling for Bayesian inversion

4. Posterior statistics computation

Results are saved to an NPZ file with timing arrays and system information. The function automatically uses appropriate test data and handles parallel processing configuration based on system capabilities.

integrate.integrate.timing_plot(f_timing='')

Generate comprehensive timing analysis plots from benchmark results.

This function creates multiple plots analyzing the performance characteristics of the INTEGRATE workflow across different dataset sizes and processor counts.

Parameters:

f_timing (str) – Path to NPZ file containing timing benchmark results from timing_compute().

Returns:

Saves multiple PNG files with timing analysis plots.

Return type:

None

Notes

Generated plots include:

  • Total execution time vs processors and dataset size

  • Forward modeling performance and speedup analysis

  • Rejection sampling performance and scaling

  • Posterior statistics computation performance

  • Cumulative time breakdowns for different processor counts

  • Comparisons with traditional least squares and MCMC methods

The function handles missing data gracefully and includes reference lines for linear scaling to assess parallel efficiency.

integrate.integrate.use_parallel(**kwargs)

Determine if parallel processing can be used based on the environment.

Parallel processing is supported on all platforms. The module handles platform differences internally: Linux uses fork for performance, while Windows and macOS use spawn for correctness. No if __name__ == "__main__": guard is required in user scripts on any platform.

Parameters:

showInfo (int, optional) – If greater than 0, prints information about the environment and parallel processing status. Default is 0.

Returns:

True if parallel processing is safe, False otherwise.

Return type:

bool
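The platform-dependent start-method policy described above can be sketched as follows. This is illustrative only; the actual module may select the start method differently:

```python
import multiprocessing as mp
import sys

def pick_start_method():
    """Return the multiprocessing start method matching the policy above:
    fork on Linux (fast), spawn on Windows/macOS (safe)."""
    return "fork" if sys.platform.startswith("linux") else "spawn"

# Build a context with the chosen start method; pools and processes
# created from this context use it regardless of the global default.
ctx = mp.get_context(pick_start_method())
print(pick_start_method())
```

Using mp.get_context rather than mp.set_start_method keeps the choice local to the library, so user scripts that configure multiprocessing themselves are unaffected.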