integrate_io¶
Functions related to IO in the INTEGRATE module.
INTEGRATE I/O Module - Data Input/Output and File Management
This module provides comprehensive input/output functionality for the INTEGRATE geophysical data integration package. It handles reading and writing of HDF5 files, data format conversions, and management of prior/posterior data structures.
- Key Features:
HDF5 file I/O for prior models, data, and posterior results
Support for multiple geophysical data formats (GEX, STM, USF)
Automatic data validation and format checking
File conversion utilities between different formats
Data merging and aggregation functions
Checksum verification and file integrity checks
- Main Functions:
load_*(): Functions for loading prior models, data, and results
save_*(): Functions for saving prior models and data arrays
read_*(): File format readers (GEX, USF, etc.)
write_*(): File format writers and converters
merge_*(): Data and posterior merging utilities
- File Format Support:
HDF5: Primary data storage format
GEX: Geometry and survey configuration files
STM: System transfer function files
USF: Field measurement files
CSV: Export format for GIS integration
Author: Thomas Mejer Hansen
Email: tmeha@geo.au.dk
- integrate.integrate_io.check_data(f_data_h5='data.h5', **kwargs)¶
Validate and complete INTEGRATE data file structure.
Ensures HDF5 data files contain required geometry datasets (UTMX, UTMY, LINE, ELEVATION) for electromagnetic surveys. Creates missing datasets using provided values or sensible defaults based on existing data dimensions.
- Parameters:
f_data_h5 (str, optional) – Path to the HDF5 data file to validate and update (default is ‘data.h5’).
**kwargs (dict) – Dataset values and configuration options:
- UTMX : array-like, UTM X coordinates
- UTMY : array-like, UTM Y coordinates
- LINE : array-like, survey line identifiers
- ELEVATION : array-like, ground elevation values
- showInfo : int, verbosity level (0=silent, >0=verbose)
- Returns:
Function modifies the HDF5 file in place, adding missing datasets.
- Return type:
None
- Raises:
KeyError – If ‘D1/d_obs’ dataset is missing and geometry dimensions cannot be determined.
FileNotFoundError – If the specified HDF5 file does not exist.
Notes
The function ensures INTEGRATE data files have complete geometry information:
- UTMX, UTMY: Spatial coordinates (required for mapping and modeling)
- LINE: Survey line identifiers (required for data organization)
- ELEVATION: Ground surface elevation (required for depth calculations)
Behavior:
If coordinate parameters are provided (UTMX, UTMY, LINE, ELEVATION):
- Updates existing datasets with new values
- Creates datasets if they don’t exist
If coordinate parameters are not provided:
- Leaves existing datasets unchanged
- Creates missing datasets with default values
Default value generation when datasets are missing and no values provided:
- UTMX: Sequential values 0, 1, 2, … (placeholder coordinates)
- UTMY: Zeros array with same length as UTMX
- LINE: All values set to 1 (single survey line)
- ELEVATION: All values set to 0 (sea level reference)
Dataset dimensions are inferred from existing ‘D1/d_obs’ observations when no coordinate data is provided.
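The default-value generation described above can be sketched as follows. This is a hypothetical helper, not the actual check_data internals; only the default values themselves come from the documentation.

```python
import numpy as np

def default_geometry(n_soundings):
    """Generate placeholder geometry datasets, mirroring the documented
    defaults (sketch only; not the actual check_data implementation)."""
    return {
        "UTMX": np.arange(n_soundings, dtype=float),   # 0, 1, 2, ... placeholders
        "UTMY": np.zeros(n_soundings),                 # zeros, same length as UTMX
        "LINE": np.ones(n_soundings, dtype=int),       # single survey line
        "ELEVATION": np.zeros(n_soundings),            # sea-level reference
    }
```

Here n_soundings would be inferred from the first dimension of ‘D1/d_obs’, as the note above describes.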
- integrate.integrate_io.copy_hdf5_file(input_filename, output_filename, N=None, loadToMemory=True, compress=True, **kwargs)¶
Copy the contents of an HDF5 file to another HDF5 file.
- Parameters:
input_filename (str) – The path to the input HDF5 file.
output_filename (str) – The path to the output HDF5 file.
N (int, optional) – The number of elements to copy from each dataset. If not specified, all elements will be copied.
loadToMemory (bool, optional) – Whether to load the entire dataset to memory before slicing. Default is True.
compress (bool, optional) – Whether to compress the output dataset. Default is True.
- Returns:
The path to the output HDF5 file (output_filename).
- integrate.integrate_io.copy_prior(input_filename, output_filename, idx=None, N_use=None, loadtomem=False, **kwargs)¶
Copy a PRIOR file, optionally subsetting the data.
This function copies an HDF5 PRIOR file, which may contain model parameters (M1, M2, …) and forward-modeled data (D1, D2, …). It allows for copying only a specific subset of samples using either an index array (idx) or a specified number of random samples (N_use).
- Parameters:
input_filename (str) – Path to the input PRIOR HDF5 file.
output_filename (str) – Path to the output PRIOR HDF5 file.
idx (array-like, optional) – An array of indices to copy. If provided, only the data corresponding to these indices will be included in the new file. This takes precedence over N_use. Default is None (copy all data).
N_use (int, optional) – The number of random samples to select and copy. This is ignored if idx is provided. Default is None.
loadtomem (bool, optional) – If True, datasets are loaded entirely into memory before slicing. This can significantly speed up copying large subsets of data but increases memory consumption. Default is False.
**kwargs (dict) – Additional keyword arguments (e.g., showInfo, compress).
- Returns:
The path to the output HDF5 file (output_filename).
- Return type:
str
- Raises:
ValueError – If N_use is greater than the total number of samples in the file, or if no datasets are found to determine the size for random sampling.
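The idx-over-N_use precedence and the N_use bounds check can be sketched as follows (hypothetical helper name select_indices; assuming numpy is available):

```python
import numpy as np

def select_indices(n_total, idx=None, N_use=None, seed=None):
    """Resolve which samples to copy, following the documented precedence:
    explicit idx wins over N_use; neither means copy everything.
    (Illustrative sketch of copy_prior's selection logic.)"""
    if idx is not None:
        return np.asarray(idx)                 # idx takes precedence
    if N_use is not None:
        if N_use > n_total:
            raise ValueError(f"N_use={N_use} exceeds {n_total} samples")
        rng = np.random.default_rng(seed)
        return rng.choice(n_total, size=N_use, replace=False)
    return np.arange(n_total)                  # default: all samples
```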
- integrate.integrate_io.download_file(url, download_dir, use_checksum=False, **kwargs)¶
Download a file from a URL to a specified directory.
- Parameters:
url (str) – The URL of the file to download.
download_dir (str) – The directory to save the downloaded file.
use_checksum (bool) – Whether to verify the file checksum after download.
kwargs – Additional keyword arguments.
- Returns:
None
- integrate.integrate_io.download_file_old(url, download_dir, **kwargs)¶
Download a file from a URL to a specified directory (old version).
- Parameters:
url (str) – The URL of the file to download.
download_dir (str) – The directory to save the downloaded file.
kwargs – Additional keyword arguments.
- Returns:
None
- integrate.integrate_io.extract_feature_at_elevation(f_post_h5, elevation, im=1, key='', iz=None, ic=None, iclass=None)¶
Extract model parameter feature values at a specific elevation for all data points.
This function extracts values from a posterior model parameter at a specified elevation (e.g., 40m above sea level) across all data points. The function performs linear interpolation between model layers to obtain values at the exact requested elevation. For each data point, it uses the ELEVATION from the data file and the depth discretization from the prior model to compute the interpolated value.
- Parameters:
f_post_h5 (str) – Path to the HDF5 file containing posterior sampling results.
elevation (float) – Elevation in meters at which to extract the feature values. This is an absolute elevation value (e.g., 40 means 40m above sea level).
im (int, optional) – Model index to extract from (e.g., 1 for M1, 2 for M2, default is 1).
key (str, optional) –
Dataset key within the model group to extract. If empty string, automatically selects appropriate statistic based on parameter type:
- Continuous parameters: ‘Mean’, ‘Median’, ‘Std’. Default: ‘Median’.
- Discrete parameters: ‘Mode’, ‘Entropy’, ‘P’ (probability). Default: ‘Mode’. For ‘P’, the ic/iclass parameter is required to specify which class.
iz (int or None, optional) – Specific layer/feature index to extract. If None, attempts to find the appropriate depth layer automatically based on the elevation and model discretization (default is None). This parameter is primarily for advanced use when you want to extract a specific indexed feature rather than interpolating at an elevation.
ic (int or None, optional) – Class index for probability extraction when key=’P’. Specifies which class probability to extract. If None and key=’P’, defaults to 0 (first class). Alias for iclass parameter (default is None).
iclass (int or None, optional) – Alternative name for ic parameter. Class index for probability extraction when key=’P’ (default is None).
- Returns:
Array of feature values at the specified elevation for all data points. Shape is (N_points,) where N_points is the number of data locations. Values are interpolated from the model layers surrounding the requested elevation. Returns NaN for data points where the requested elevation is outside the model domain (above surface or below maximum depth).
- Return type:
numpy.ndarray
- Raises:
FileNotFoundError – If the specified HDF5 file does not exist.
KeyError – If the requested model index (im) or key is not found in the file.
ValueError – If the elevation is invalid or cannot be interpolated from the model.
Notes
Elevation and Depth Calculation:
The function uses the following coordinate system:
- ELEVATION: Ground surface elevation for each data point (from data file)
- z: Depth below surface from the prior model (e.g., 0, 1, 2, … meters)
- Absolute elevation = ELEVATION - z
For example, if a data point has ELEVATION=50m and the model has z=[0,10,20,30]:
- At z=0: absolute elevation = 50m (surface)
- At z=10: absolute elevation = 40m (10m below surface)
- At z=20: absolute elevation = 30m (20m below surface)
To extract a value at elevation=40m, the function:
1. Computes depth below surface: depth = ELEVATION - elevation = 50 - 40 = 10m
2. Interpolates the feature value at depth=10m from the model
Interpolation:
Linear interpolation is used between model layers. If the requested elevation falls exactly on a model layer boundary, that layer’s value is returned. If the elevation is between two layers, values are linearly interpolated.
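The interpolation described above can be sketched for a single sounding as follows. This is a minimal sketch with hypothetical names, assuming z increases with depth; it is not the library code.

```python
import numpy as np

def value_at_elevation(elevation_query, surface_elev, z, values):
    """Linearly interpolate one sounding's model values at an absolute
    elevation: depth = ELEVATION - elevation, then interpolate over z.

    z      : depths below surface, increasing (e.g. [0, 10, 20, 30])
    values : model values at those depths
    """
    depth = surface_elev - elevation_query      # e.g. 50 - 40 = 10 m
    if depth < z[0] or depth > z[-1]:
        return np.nan                           # outside the model domain
    return np.interp(depth, z, values)
```

With surface_elev=50 and z=[0,10,20,30], querying elevation 40 returns the value at 10 m depth, and querying elevation 60 (above the surface) returns NaN, matching the behavior described above.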
Automatic Key Selection:
When key=’’, the function automatically selects an appropriate statistic:
- Discrete parameters: defaults to ‘Mode’ (most probable class)
- Continuous parameters: defaults to ‘Median’ (robust central estimate)
Valid Keys by Parameter Type:
Continuous parameters:
- ‘Mean’: Average value
- ‘Median’: Median value (default)
- ‘Std’: Standard deviation
Discrete parameters:
- ‘Mode’: Most probable class (default)
- ‘Entropy’: Uncertainty measure
- ‘P’: Probability for a specific class (requires ic/iclass parameter)
Probability Extraction:
When extracting probabilities (key=’P’), the function requires a class index specified by ic or iclass. The P array has shape (nd, n_classes, nz) where:
- nd = number of data points
- n_classes = number of discrete classes
- nz = number of depth layers
The ic/iclass parameter selects which class probability to extract.
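The class selection on the (nd, n_classes, nz) array amounts to a single slice along the class axis. The sketch below uses dummy uniform probabilities; only the array layout comes from the documentation.

```python
import numpy as np

# P array layout described above: (nd, n_classes, nz)
nd, n_classes, nz = 4, 3, 5
P = np.full((nd, n_classes, nz), 1.0 / n_classes)  # dummy uniform probabilities

ic = 1                    # class index (the ic/iclass parameter)
P_class = P[:, ic, :]     # class-1 probability for every point and depth layer
print(P_class.shape)      # (4, 5)
```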
Examples
Extract median resistivity at 40m elevation (continuous):
>>> values = extract_feature_at_elevation('post.h5', elevation=40, im=1, key='Median')
>>> print(values.shape)  # (N_points,)
Extract mean and standard deviation (continuous):
>>> mean_vals = extract_feature_at_elevation('post.h5', elevation=40, im=1, key='Mean')
>>> std_vals = extract_feature_at_elevation('post.h5', elevation=40, im=1, key='Std')
Extract mode (most probable class) at 25m elevation (discrete):
>>> classes = extract_feature_at_elevation('post.h5', elevation=25, im=2, key='Mode')
Extract entropy (uncertainty) for discrete parameter:
>>> entropy = extract_feature_at_elevation('post.h5', elevation=25, im=2, key='Entropy')
Extract probability for first class (discrete):
>>> prob_class0 = extract_feature_at_elevation('post.h5', elevation=25, im=2, key='P', ic=0)
Extract probability for second class using iclass parameter:
>>> prob_class1 = extract_feature_at_elevation('post.h5', elevation=25, im=2, key='P', iclass=1)
Use automatic key selection (Mode for discrete, Median for continuous):
>>> values = extract_feature_at_elevation('post.h5', elevation=30, im=1)
Extract mean values at sea level (elevation=0):
>>> values = extract_feature_at_elevation('post.h5', elevation=0, im=1, key='Mean')
- integrate.integrate_io.file_checksum(file_path)¶
Calculate the MD5 checksum of a file.
- Parameters:
file_path (str) – The path to the file.
- Returns:
The MD5 checksum of the file.
- Return type:
str
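The chunked-read pattern behind file_checksum can be sketched as follows. The exact chunk size used by the package is an assumption; only the MD5-of-a-file behavior comes from the documentation.

```python
import hashlib

def md5_checksum(file_path, chunk_size=65536):
    """Compute the MD5 checksum of a file in fixed-size chunks, so large
    files are never loaded fully into memory (illustrative sketch)."""
    h = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```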
- integrate.integrate_io.get_case_data(case='DAUGAARD', loadAll=False, loadType='', filelist=None, **kwargs)¶
Get case data for a specific case.
- Parameters:
case (str) – The case name. Default is ‘DAUGAARD’. Options are ‘DAUGAARD’, ‘GRUSGRAV’, ‘FANGEL’, ‘HALD’, ‘ESBJERG’, and ‘OERUM’.
loadAll (bool) – Whether to load all files for the case. Default is False.
loadType (str) – The type of files to load. Options are ‘’, ‘prior’, ‘prior_data’, ‘post’, and ‘inout’.
filelist (list or None) – A list of files to load. Default is None (creates new empty list).
kwargs – Additional keyword arguments.
- Returns:
A list of file names for the case.
- Return type:
list
- integrate.integrate_io.get_discrete_classes(f_h5, im=1)¶
Get class IDs and class names for a discrete model parameter.
Retrieves the classification information (class IDs and class names) for a discrete model parameter from either a prior or posterior HDF5 file. This function is useful for understanding the categorical classes used in discrete parameter inversion (e.g., geological units, lithology types).
- Parameters:
f_h5 (str) – Path to the HDF5 file. Can be a prior file (reads classes directly) or a posterior file (extracts the prior file reference first, then reads classes from the prior).
im (int, optional) – Model index to get classes for (e.g., 1 for M1, 2 for M2, default is 1).
- Returns:
class_id (numpy.ndarray or list) – Array of class IDs. Empty list if the model parameter is not discrete or if class_id attribute is not set.
class_name (numpy.ndarray or list) – Array of class names corresponding to the class IDs. Empty list if the model parameter is not discrete or if class_name attribute is not set.
Examples
Get classes from a prior file:
>>> class_id, class_name = get_discrete_classes('PRIOR.h5', im=2)
>>> for cid, cname in zip(class_id, class_name):
...     print(f"Class {cid}: {cname}")
Get classes from a posterior file (automatically finds prior):
>>> class_id, class_name = get_discrete_classes('POST.h5', im=2)
>>> if len(class_id) > 0:
...     print(f"Found {len(class_id)} classes")
Check if parameter is discrete:
>>> class_id, class_name = get_discrete_classes('POST.h5', im=1)
>>> if len(class_id) == 0:
...     print("Model parameter M1 is continuous")
... else:
...     print(f"Model parameter M1 is discrete with {len(class_id)} classes")
Notes
The function automatically determines whether the input file is a prior or posterior file. For posterior files, it extracts the prior file reference from the file attributes and reads the class information from the prior.
Class information is stored in the prior file attributes:
- ‘class_id’: Numeric identifiers for each class (e.g., [0, 1, 2, 3])
- ‘class_name’: Text labels for each class (e.g., [‘Clay’, ‘Sand’, ‘Gravel’, ‘Bedrock’])
- ‘is_discrete’: Boolean flag indicating if the parameter is discrete
If the model parameter is continuous (is_discrete=False) or if the class attributes are not set, the function returns empty lists.
- integrate.integrate_io.get_geometry(f_data_h5)¶
Extract survey geometry data from HDF5 file.
Retrieves spatial coordinates, survey line identifiers, and elevation data from an INTEGRATE data file. Automatically handles both direct data files and posterior files that reference data files.
- Parameters:
f_data_h5 (str) – Path to the HDF5 file containing geometry data. Can be either a data file or posterior file (function automatically detects and uses correct file).
- Returns:
X (numpy.ndarray) – UTM X coordinates in meters, shape (N_points,).
Y (numpy.ndarray) – UTM Y coordinates in meters, shape (N_points,).
LINE (numpy.ndarray) – Survey line identifiers, shape (N_points,).
ELEVATION (numpy.ndarray) – Ground surface elevation in meters, shape (N_points,).
- Raises:
IOError – If the HDF5 file cannot be opened or required datasets are missing.
Examples
>>> X, Y, LINE, ELEVATION = get_geometry('data.h5')
>>> print(f"Survey covers {X.max()-X.min():.0f}m x {Y.max()-Y.min():.0f}m")
Notes
The function expects geometry data to be stored in standard INTEGRATE format:
- ‘/UTMX’: UTM X coordinates
- ‘/UTMY’: UTM Y coordinates
- ‘/LINE’: Survey line numbers
- ‘/ELEVATION’: Ground elevation
When passed a posterior file, automatically extracts the reference to the original data file from the ‘f5_data’ attribute.
- integrate.integrate_io.get_gex_file_from_data(f_data_h5, id=1)¶
Retrieves the ‘gex’ attribute from the specified HDF5 file.
- Parameters:
f_data_h5 (str) – The path to the HDF5 file.
id (int) – The ID of the dataset within the HDF5 file. Defaults to 1.
- Returns:
The value of the ‘gex’ attribute if found, otherwise an empty string.
- Return type:
str
- integrate.integrate_io.get_number_of_data(f_data_h5, id=None, count_nan=False)¶
Get the number of data per location for datasets in an INTEGRATE data HDF5 file.
Returns a 2D numpy array of size (Ndataset, Ndatapoints) containing the number of valid (non-NaN) or total data values at each measurement location for each dataset.
- Parameters:
f_data_h5 (str) – Path to the HDF5 file containing INTEGRATE data with dataset groups.
id (int or list of int, optional) – Dataset identifier(s) to query (e.g., 1 for D1, [1,2] for D1 and D2). If None, returns data for all datasets found in the file.
count_nan (bool, optional) – If False (default), counts only non-NaN values at each location. If True, counts total number of data channels regardless of NaN values.
- Returns:
2D array of shape (Ndataset, Ndatapoints) where:
- Ndataset: number of datasets
- Ndatapoints: maximum number of data locations across all datasets
- Values: number of valid data channels per location (or total if count_nan=True)
- Return type:
numpy.ndarray
- Raises:
FileNotFoundError – If the specified HDF5 file does not exist.
IOError – If the HDF5 file cannot be opened or read.
KeyError – If the specified dataset ID does not exist in the file.
Examples
>>> # Get non-NaN data counts for all datasets
>>> data_counts = get_number_of_data('data.h5')
>>> print(f"Shape: {data_counts.shape}")  # 3 datasets, 4000 locations
Shape: (3, 4000)
>>> # Get total data counts (including NaN) for a specific dataset
>>> counts_d1 = get_number_of_data('data.h5', id=1, count_nan=True)
>>> print(f"Shape: {counts_d1.shape}")  # 1 dataset, 4000 locations
Shape: (1, 4000)
Notes
This function analyzes d_obs arrays in each dataset:
- d_obs shape: (N_locations, N_data_per_location)
- By default, counts non-NaN values: np.sum(~np.isnan(d_obs), axis=1)
- With count_nan=True, returns total data channels: d_obs.shape[1] for each location
The returned 2D array allows easy comparison across datasets and locations. Missing datasets are filled with zeros in the output array.
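The counting rules above can be sketched for one dataset’s d_obs array (the values here are dummy data, not from any real survey):

```python
import numpy as np

# d_obs: (N_locations, N_data_per_location), NaN marks missing channels
d_obs = np.array([[1.0,    2.0,    np.nan],
                  [np.nan, np.nan, np.nan],
                  [4.0,    5.0,    6.0]])

counts = np.sum(~np.isnan(d_obs), axis=1)          # count_nan=False behavior
totals = np.full(d_obs.shape[0], d_obs.shape[1])   # count_nan=True behavior
print(counts.tolist())  # [2, 0, 3]
print(totals.tolist())  # [3, 3, 3]
```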
- integrate.integrate_io.get_number_of_datasets(f_data_h5, return_ids=False)¶
Get the number of datasets (D1, D2, D3, etc.) in an INTEGRATE data HDF5 file.
Counts the number of dataset groups with names following the pattern ‘D1’, ‘D2’, ‘D3’, etc. in an INTEGRATE HDF5 data file. This function is useful for determining how many different data types or measurement systems are stored in a single file.
- Parameters:
f_data_h5 (str) – Path to the HDF5 file containing INTEGRATE data with dataset groups.
return_ids (bool, optional) – If True, returns the list of dataset IDs instead of just the count (default is False).
- Returns:
If return_ids=False: Number of datasets found in the file. Returns 0 if no datasets are found. If return_ids=True: List of dataset IDs (e.g., [1, 2, 3] for D1, D2, D3). Returns empty list if none found.
- Return type:
int or list
- Raises:
FileNotFoundError – If the specified HDF5 file does not exist.
IOError – If the HDF5 file cannot be opened or read.
Examples
>>> # Get number of datasets
>>> n_datasets = get_number_of_datasets('data.h5')
>>> print(f"File contains {n_datasets} datasets")
File contains 3 datasets
>>> # Get dataset IDs
>>> dataset_ids = get_number_of_datasets('data.h5', return_ids=True)
>>> print(f"Dataset IDs: {dataset_ids}")
Dataset IDs: [1, 2, 3]
Notes
This function looks for HDF5 groups with names starting with ‘D’ followed by digits. The typical INTEGRATE data file structure includes:
- ‘/D1/’: First dataset (e.g., high moment data)
- ‘/D2/’: Second dataset (e.g., low moment data)
- ‘/D3/’: Third dataset (e.g., processed data)
- And so on…
The function only counts top-level groups that match the ‘D{number}’ pattern, ignoring other groups like geometry data (UTMX, UTMY, etc.).
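The ‘D{number}’ matching can be sketched as follows (hypothetical helper, assuming the top-level group names are available as strings):

```python
import re

def dataset_ids(group_names):
    """Return sorted dataset IDs for names matching the 'D{number}'
    pattern, ignoring geometry groups like UTMX (illustrative sketch)."""
    pattern = re.compile(r"^D(\d+)$")
    ids = [int(m.group(1)) for name in group_names
           if (m := pattern.match(name))]
    return sorted(ids)
```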
- integrate.integrate_io.gex_to_stm(file_gex, **kwargs)¶
Convert GEX system configuration to STM files for electromagnetic modeling.
Convenience function that combines GEX file reading and STM file generation into a single operation. Handles both file paths and pre-loaded GEX dictionaries to create system transfer matrix files required for GA-AEM forward modeling.
- Parameters:
file_gex (str or dict) – GEX system configuration. Pass a file path (str) to read and process a GEX file, or pass a pre-loaded GEX dictionary from a previous read_gex() call.
Nhank (int, optional) – Number of Hankel transform coefficients.
Nfreq (int, optional) – Number of frequencies for transform.
showInfo (int, optional) – Verbosity level.
- Returns:
stm_files (list of str) – Paths to the generated STM files.
GEX (dict) – Processed GEX dictionary used for STM generation.
- Raises:
TypeError – If file_gex is neither a string nor a dictionary.
FileNotFoundError – If file_gex is a string pointing to a non-existent file.
Notes
This function provides a streamlined workflow for electromagnetic system setup by automating the GEX→STM conversion process. The generated STM files contain system transfer functions needed for accurate forward modeling with GA-AEM.
When file_gex is a string, the function first tries read_gex() for legacy format compatibility. If that fails (e.g., Workbench format), it automatically falls back to read_gex_workbench(). When file_gex is a dictionary, it is assumed to be a valid GEX structure from a previous read_gex() or read_gex_workbench() call. The write_stm_files() function handles the actual STM file generation with the provided or default parameters.
Examples
>>> # Direct file path (automatically detects format)
>>> stm_files, GEX = gex_to_stm('TX08_20201112.gex')
>>> # Pre-loaded GEX dictionary
>>> GEX = read_gex_workbench('TX08_20201112.gex')
>>> stm_files, _ = gex_to_stm(GEX)
- integrate.integrate_io.hdf5_info(f_h5, verbose=True, load_data=False)¶
Get and print comprehensive information about an HDF5 file.
This function reads an HDF5 file (DATA, PRIOR, POST, or FORWARD) and prints detailed information about its contents, including datasets, dimensions, attributes, and file-type-specific metadata.
By default, only metadata (shapes, dtypes, attributes) is read for fast analysis. Set load_data=True to also compute data ranges and statistics.
- Parameters:
f_h5 (str) – Path to the HDF5 file to analyze.
verbose (bool, optional) – If True, prints detailed information. If False, returns dictionary only (default is True).
load_data (bool, optional) – If True, loads actual data to compute ranges and statistics. If False, only reads metadata (much faster, default is False).
- Returns:
info – Dictionary containing file information with keys:
- ‘file_type’: Detected file type (‘DATA’, ‘PRIOR’, ‘POST’, ‘FORWARD’, or ‘UNKNOWN’)
- ‘datasets’: List of dataset paths
- ‘attributes’: Dictionary of root-level attributes
- ‘structure’: Nested dictionary of file structure
- Return type:
dict
Examples
>>> hdf5_info('PRIOR.h5')
>>> info = hdf5_info('DATA.h5', verbose=False)
>>> info = hdf5_info('POST.h5', load_data=True)  # Include data ranges
Notes
The function determines file type based on the presence of characteristic datasets:
- DATA files: contain /UTMX, /UTMY, /ELEVATION, /LINE and /D1/, /D2/, etc.
- PRIOR files: contain /M1, /M2, /D1, /D2 arrays
- POST files: contain /i_use, /T, /EV attributes
- FORWARD files: contain /method attribute
Performance:
- With load_data=False (default): Very fast, only reads file metadata
- With load_data=True: Slower, reads all data to compute ranges/statistics
See also
load_prior – Load prior model and data
load_data – Load observational data
load_posterior – Load posterior results
- integrate.integrate_io.hdf5_scan(file_path)¶
Scans an HDF5 file and prints information about datasets (including their size) and attributes.
- Parameters:
file_path (str) – The path to the HDF5 file.
- integrate.integrate_io.load_data(f_data_h5, id_arr=[], ii=None, **kwargs)¶
Load observational electromagnetic data from HDF5 file.
Loads observed electromagnetic measurements, uncertainties, covariance matrices, and associated metadata from structured HDF5 files. Handles multiple data types and noise models with automatic fallback for missing data components.
- Parameters:
f_data_h5 (str) – Path to the HDF5 file containing observational electromagnetic data.
id_arr (list of int, optional) – Dataset identifiers to load (e.g., [1, 2] for D1 and D2). Each ID corresponds to a different measurement system or processing stage (default is [1]).
ii (array-like, optional) – Array of indices specifying which data points to load from each dataset. If provided, only len(ii) data points will be loaded from each dataset using these indices (default is None).
**kwargs (dict) – Additional arguments:
- showInfo : int, verbosity level (0=silent, 1=normal, >1=verbose)
- Returns:
Dictionary containing loaded observational data with keys:
- ‘noise_model’ : list of str
Noise model type for each dataset (‘gaussian’, ‘multinomial’, etc.)
- ‘d_obs’ : list of numpy.ndarray
Observed data measurements, shape (N_stations, N_channels) per dataset
- ‘d_std’ : list of numpy.ndarray or None
Standard deviations of observations, same shape as d_obs
- ‘Cd’ : list of numpy.ndarray or None
Full covariance matrices for each dataset
- ‘id_arr’ : list of int
Dataset identifiers that were successfully loaded. If id_arr is passed as an empty list, all data types will be loaded.
- ‘i_use’ : list of numpy.ndarray
Data point usage indicators (1=use, 0=ignore)
- ‘id_prior’ : list of int or numpy.ndarray
Index of the prior data type to compare against, used for cross-referencing. If ‘id_prior’ is not present in the file, it defaults to the dataset ID.
- Return type:
dict
Notes
The function gracefully handles missing data components:
- Missing ‘id_prior’ defaults to sequential dataset IDs (1, 2, 3, …)
- Missing ‘i_use’ defaults to a ones array (use all data points)
- Missing ‘d_std’ and ‘Cd’ remain as None (diagonal noise assumed)
Data structure follows the INTEGRATE standard format:
- ‘/D{id}/d_obs’: observed measurements
- ‘/D{id}/d_std’: measurement uncertainties
- ‘/D{id}/Cd’: full covariance matrix (optional)
- ‘/D{id}/i_use’: data usage flags (optional)
- ‘/D{id}/id_prior’: prior dataset cross-reference IDs (optional)
Each dataset can have a different noise model specified in the ‘noise_model’ attribute, enabling mixed data types in the same file.
- integrate.integrate_io.load_prior(f_prior_h5, N_use=0, idx=[], Randomize=False, ii=None)¶
Load prior model parameters and data from HDF5 file.
Loads both model parameters and forward-modeled data from a prior HDF5 file, with options for sample selection, indexing, and randomization. This is a convenience function that combines model and data loading operations.
- Parameters:
f_prior_h5 (str) – Path to the HDF5 file containing prior model realizations and data.
N_use (int, optional) – Number of samples to load. If 0, loads all available samples (default is 0).
idx (list, optional) – Specific indices to load. If empty, uses N_use or loads all samples (default is []).
Randomize (bool, optional) – Whether to randomize the order of loaded samples (default is False).
ii (array-like, optional) – Array of indices specifying which models and data to load. If provided, only len(ii) models and data will be loaded from ‘M1’, ‘M2’, … and ‘D1’, ‘D2’, … datasets using these indices (default is None).
- Returns:
D (dict) – Dictionary containing forward-modeled data arrays, with keys corresponding to data types (e.g., ‘D1’, ‘D2’).
M (dict) – Dictionary containing model parameter arrays, with keys corresponding to model types (e.g., ‘M1’, ‘M2’).
idx (numpy.ndarray) – Array of indices corresponding to the loaded samples.
Notes
This function internally calls load_prior_data() and load_prior_model() with consistent indexing to ensure data and model correspondence. Sample selection priority: ii > explicit idx > N_use > all samples.
- integrate.integrate_io.load_prior_data(f_prior_h5, id_use=[], idx=[], N_use=0, Randomize=False, **kwargs)¶
Load forward-modeled data arrays from prior HDF5 file.
Loads electromagnetic or other geophysical data predictions from forward modeling runs stored in the prior file. Supports selective loading by data type, sample indices, and randomization for sampling purposes.
- Parameters:
f_prior_h5 (str) – Path to the HDF5 file containing forward-modeled data arrays.
id_use (list of int, optional) – Data type identifiers to load (e.g., [1, 2] for D1 and D2). If empty, loads all available data types (default is []).
idx (list or array-like, optional) – Specific sample indices to load. If empty, uses N_use and Randomize to determine samples (default is []).
N_use (int, optional) – Number of samples to load. If 0, loads all available samples. Automatically limited to available data size (default is 0).
Randomize (bool, optional) – Whether to randomly select samples when idx is empty. If False, uses sequential selection (default is False).
- Returns:
D (list of numpy.ndarray) – List of forward-modeled data arrays, one for each requested data type. Each array has shape (N_samples, N_data_points).
idx (numpy.ndarray) – Array of sample indices that were loaded, useful for consistent indexing with corresponding model parameters.
Notes
Data arrays are stored as HDF5 datasets with keys ‘/D1’, ‘/D2’, etc., representing different data types (e.g., different measurement systems, frequencies, or processing stages). The function automatically detects available data types and loads the requested subset.
Sample selection follows the same priority as load_prior_model(): explicit idx > N_use random/sequential > all samples.
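The selection priority (explicit idx > N_use random/sequential > all samples) can be sketched as follows. The helper name is hypothetical; note that N_use is automatically limited to the available data size here, as documented above, rather than raising an error.

```python
import numpy as np

def resolve_sample_idx(n_total, idx=(), N_use=0, Randomize=False, seed=None):
    """Sketch of the documented sample-selection priority:
    explicit idx > N_use (random or sequential) > all samples."""
    if len(idx) > 0:
        return np.asarray(idx)                 # explicit idx wins
    if N_use > 0:
        N_use = min(N_use, n_total)            # limited to available size
        if Randomize:
            rng = np.random.default_rng(seed)
            return rng.choice(n_total, size=N_use, replace=False)
        return np.arange(N_use)                # sequential selection
    return np.arange(n_total)                  # N_use=0: all samples
```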
- integrate.integrate_io.load_prior_model(f_prior_h5, im_use=[], idx=[], N_use=0, Randomize=False)¶
Load model parameter arrays from prior HDF5 file.
Loads model parameter arrays (e.g., resistivity, layer thickness, geological units) from a prior HDF5 file with flexible model selection and sample indexing options. Supports loading specific model types and sample subsets.
- Parameters:
f_prior_h5 (str) – Path to the HDF5 file containing prior model parameter realizations.
im_use (list of int, optional) – Model parameter indices to load (e.g., [1, 2] for M1 and M2). If empty, loads all available model parameters (default is []).
idx (list or array-like, optional) – Specific sample indices to load. If empty, uses N_use and Randomize to determine samples (default is []).
N_use (int, optional) – Number of samples to load. If 0, loads all available samples. Ignored if idx is provided (default is 0).
Randomize (bool, optional) – Whether to randomly select samples when idx is empty. If False, uses sequential selection (default is False).
- Returns:
M (list of numpy.ndarray) – List of model parameter arrays, one for each requested model type. Each array has shape (N_samples, N_model_parameters).
idx (numpy.ndarray) – Array of sample indices that were loaded, useful for consistent indexing across related datasets.
Notes
The function automatically detects available model parameters (M1, M2, …) and loads the requested subset. Sample selection priority follows: explicit idx > N_use random/sequential > all samples.
When idx length differs from N_use, the function uses len(idx) and issues a warning message.
- integrate.integrate_io.merge_data(f_data, f_gex='', delta_line=0, f_data_merged_h5='', **kwargs)¶
Merge multiple data files into a single HDF5 file.
- Parameters:
f_data (list) – List of input data files to merge.
f_gex (str, optional) – Path to geometry exchange file, by default ‘’.
delta_line (int, optional) – Line number increment for each merged file, by default 0.
f_data_merged_h5 (str, optional) – Output merged HDF5 file path, by default derived from f_gex.
kwargs – Additional keyword arguments.
- Returns:
Filename of the merged HDF5 file.
- Return type:
str
- Raises:
ValueError – If f_data is not a list.
- integrate.integrate_io.merge_posterior(f_post_h5_files, f_data_h5_files, f_post_merged_h5='', showInfo=0)¶
Merge multiple posterior sampling results into unified datasets.
Combines posterior results from separate electromagnetic survey areas or time periods into single merged files for comprehensive regional analysis. Handles both model parameter statistics and observational data consolidation.
- Parameters:
f_post_h5_files (list of str) – List of paths to posterior HDF5 files containing sampling results from different survey areas or processing runs.
f_data_h5_files (list of str) – List of paths to corresponding observational data HDF5 files. Must have same length as f_post_h5_files with matching order.
f_post_merged_h5 (str, optional) – Output path for merged posterior file. If empty, generates default name based on input files (default is ‘’).
- Returns:
Tuple containing (merged_posterior_path, merged_data_path) where:
- merged_posterior_path : str, path to merged posterior HDF5 file
- merged_data_path : str, path to merged observational data HDF5 file
- Return type:
tuple
- Raises:
ValueError – If f_data_h5_files and f_post_h5_files have different lengths.
FileNotFoundError – If any input files do not exist or cannot be accessed.
Notes
The merging process combines:
- Model parameter statistics (Mean, Median, Mode, Std, Entropy)
- Temperature and evidence fields from sampling
- Geometry and observational data from all survey areas
- Metadata and file references for traceability
Spatial coordinates are preserved to maintain geographic relationships between different survey areas. The merged files retain full compatibility with INTEGRATE analysis and visualization functions.
If f_post_merged_h5 is not provided, the output uses the format 'POST_merged_N{N}.h5' and the data file uses 'DATA_merged_N{N}.h5'. Posterior files must have compatible structure for merging.
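The default naming rule can be sketched as below, assuming {N} is the number of input posterior files (the helper name is illustrative, not the package API):

```python
# Hypothetical helper illustrating the documented default file names,
# assuming N is the number of input posterior files.
def default_merged_names(n_files):
    return f"POST_merged_N{n_files}.h5", f"DATA_merged_N{n_files}.h5"
```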
- integrate.integrate_io.merge_prior(f_prior_h5_files, f_prior_merged_h5='', shuffle=True, showInfo=0)¶
Merge multiple prior model files into a single combined HDF5 file.
Combines prior model parameters and forward-modeled data from multiple HDF5 files into a unified dataset. Creates a new model parameter (MX where X is the next available number) that tracks the source file index for each sample, enabling traceability of merged data origins.
- Parameters:
f_prior_h5_files (list of str) – List of paths to prior HDF5 files to merge. Each file must contain compatible model parameters (M1, M2, M3, …) and data arrays (D1, D2, …).
f_prior_merged_h5 (str, optional) – Output path for the merged prior file. If empty, generates default name ‘PRIOR_merged_N{number_of_files}.h5’ (default is ‘’).
shuffle (bool, optional) – If True (default), randomly shuffle the order of realizations in the merged output. The same permutation is applied to all datasets (M1, M2, D1, D2, etc.) to maintain consistency. This is useful for ensuring realizations from different source files are well-mixed. If False, realizations are concatenated in order.
showInfo (int, optional) – Verbosity level for progress information. Higher values provide more detailed output (default is 0).
- Returns:
Path to the merged prior HDF5 file.
- Return type:
str
- Raises:
ValueError – If f_prior_h5_files is not a list or is empty.
FileNotFoundError – If any input files do not exist or cannot be accessed.
Notes
The merging process:
- Concatenates all model parameters (M1, M2, M3, …) across files
- Concatenates all data arrays (D1, D2, D3, …) across files
- Creates a new MX parameter (where X is the next available number) containing source file indices (1-based)
- Optionally shuffles realizations using a consistent permutation across all arrays
- Preserves HDF5 attributes that are identical across all input files
- Updates metadata to reflect merged status
Shuffling Behavior: When shuffle=True (default), a random permutation is applied to all realizations:
- A single permutation is generated and applied to ALL datasets (M1, M2, D1, D2, etc.)
- This ensures realizations remain synchronized across all parameters
- Uses a fixed random seed (42) for reproducibility
- Useful for mixing realizations from different source files
- The source file tracking parameter (MX) is also shuffled to maintain traceability
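The synchronized shuffle can be sketched as follows; the helper is illustrative, not the package API (the fixed seed 42 mirrors the note above):

```python
import numpy as np

# One permutation is generated and applied to every array (M1, M2, D1, ...),
# so row i of each shuffled array still refers to the same realization.
def shuffle_together(arrays, seed=42):
    n = arrays[0].shape[0]
    perm = np.random.default_rng(seed).permutation(n)
    return [a[perm] for a in arrays]
```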
Attribute Preservation: The function copies dataset attributes from input files to the merged file:
- Only attributes that are identical across all input files are copied
- This includes important attributes like class_name, class_id, is_discrete, clim, cmap, etc.
- Attributes for data arrays (D1, D2, …) like method, type, Nfreq, etc. are preserved
- Special handling for x and z attributes to match potentially padded dimensions
Source File Tracking: The new MX parameter is a DISCRETE integer array with shape (Ntotal, 1) where each value indicates which input file the corresponding sample originated from:
- 1: samples from the first file in f_prior_h5_files
- 2: samples from the second file in f_prior_h5_files
- etc.
The MX parameter is marked with:
- is_discrete = 1 (discrete parameter type)
- shape = (Ntotal, 1) (consistent with other model parameters)
- class_name = meaningful names derived from filenames
- class_id = [1, 2, 3, …] (class identifiers)
File Compatibility: Input files can have different model parameter dimensions (e.g., different numbers of layers). Arrays with fewer parameters will be padded with NaN values to match the maximum dimensions. Data arrays should ideally have the same dimensions, but padding is applied if they differ.
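The NaN padding described above can be sketched like this (illustrative helper, not the package implementation):

```python
import numpy as np

# Pad each array with NaN columns up to the widest parameter count,
# then stack samples from all files (mirrors the padding rule above).
def pad_and_stack(arrays):
    n_max = max(a.shape[1] for a in arrays)
    padded = [np.pad(a.astype(float),
                     ((0, 0), (0, n_max - a.shape[1])),
                     constant_values=np.nan)
              for a in arrays]
    return np.vstack(padded)
```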
Examples
>>> # Default: merge with shuffling
>>> f_files = ['prior1.h5', 'prior2.h5', 'prior3.h5']
>>> merged_file = merge_prior(f_files, 'combined_prior.h5')
>>> print(f"Merged {len(f_files)} files into {merged_file}")
>>> # Merge without shuffling (preserves original order)
>>> merged_file = merge_prior(f_files, 'combined_prior.h5', shuffle=False)
>>> # Merge with verbose output
>>> merged_file = merge_prior(f_files, 'combined_prior.h5', showInfo=1)
- integrate.integrate_io.post_to_csv(f_post_h5='', Mstr='/M1')¶
Export posterior results to CSV format for GIS integration.
Converts posterior sampling results to CSV files containing spatial coordinates and model parameter statistics. Creates files suitable for import into GIS software or other analysis tools.
- Parameters:
f_post_h5 (str, optional) – Path to the HDF5 file containing posterior results. If empty string, uses a default example file (default is ‘’).
Mstr (str, optional) – Model parameter dataset path within the HDF5 file (e.g., ‘/M1’, ‘/M2’). Specifies which model parameter to export (default is ‘/M1’).
- Returns:
Path to the generated CSV file.
- Return type:
str
- Raises:
KeyError – If the specified model parameter dataset does not exist in the HDF5 file.
FileNotFoundError – If the specified HDF5 file does not exist or cannot be accessed.
Notes
The exported CSV file contains:
- X, Y: UTM coordinates
- ELEVATION: ground surface elevation
- Model statistics: Mean, Median, Mode, Standard deviation
- For discrete models: probability distributions across classes
- For continuous models: quantile values and uncertainty measures
The function automatically handles both discrete and continuous model types based on the ‘is_discrete’ attribute in the prior file. Output format is optimized for GIS applications with appropriate coordinate reference systems.
TODO: Future enhancements planned for LINE number export and separate functions for grid vs. point data export.
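The row layout of the exported CSV might look like the following sketch for a continuous model; the column names follow the list above, and the helper is illustrative rather than the exact output of post_to_csv():

```python
import csv
import io

# Write coordinate columns plus per-location model statistics; the
# header names mirror the columns listed above and are illustrative.
def rows_to_csv(rows, header=("X", "Y", "ELEVATION", "Mean", "Median", "Std")):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```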
- integrate.integrate_io.read_borehole(filename, **kwargs)¶
Read one or more borehole dictionaries from a JSON file.
The file must have been written by write_borehole(). If the file contains a single borehole (JSON object), a single dict is returned; if it contains multiple boreholes (JSON array), a list of dicts is returned. The returned dict(s) preserve all fields that were written, including the optional elevation key (ground-surface elevation in m a.s.l.) used by plot_boreholes() for elevation-axis plotting.
- Parameters:
filename (str) – JSON file path to read.
**kwargs –
- showInfo : int, optional
Verbosity level. Default is 0.
- Returns:
Single borehole dict, or list of borehole dicts.
- Return type:
dict or list of dict
Examples
>>> W = ig.read_borehole('borehole.json')
>>> print(W['name'], W['depth_top'])
>>> WELLS = ig.read_borehole('all_boreholes.json')
>>> for W in WELLS:
...     print(W['name'])
- integrate.integrate_io.read_gex(file_gex, **kwargs)¶
Parse GEX (Geometry Exchange) file into structured dictionary.
Reads and parses electromagnetic system configuration files in GEX format, which contain survey geometry, system parameters, waveforms, and timing information required for electromagnetic forward modeling.
- Parameters:
file_gex (str) – Path to the GEX file containing electromagnetic system configuration.
Nhank (int, optional) – Number of Hankel transform abscissae for both frequency windows.
Nfreq (int, optional) – Number of frequencies per decade for both frequency windows.
Ndig (int, optional) – Number of digits for waveform digitizing frequency.
showInfo (int, optional) – Verbosity level (0=silent, >0=verbose, default 0).
- Returns:
Dictionary containing parsed GEX file contents. Keys include 'filename' (str), 'General' (dict), section-specific parameter dicts, 'WaveformLM' (ndarray), 'WaveformHM' (ndarray), and 'GateArray' (ndarray).
- Return type:
dict
- Raises:
FileNotFoundError – If the specified GEX file does not exist or cannot be accessed.
Notes
GEX files use a section-based format with key=value pairs:
- [Section] headers define parameter groups
- Numeric values are automatically converted to numpy arrays
- String values are preserved as text
- Waveform and gate timing data are consolidated into arrays
The parser automatically handles:
- Multi-point waveform definitions (WaveformLMPoint*, WaveformHMPoint*)
- Gate timing arrays (GateTime*)
- Numeric array conversion with space-separated values
- Comments and formatting variations
Output dictionary structure matches INTEGRATE conventions for electromagnetic system configuration and GA-AEM compatibility.
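A minimal parser for the section-based key=value layout described above might look like this (illustrative only; the real reader also consolidates waveform and gate entries into arrays):

```python
import numpy as np

# Parse [Section] headers and key=value pairs; numeric values become
# numpy arrays, everything else stays as text (illustrative sketch).
def parse_gex_text(text):
    gex, section = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            section = line[1:-1]
            gex[section] = {}
        elif '=' in line and section is not None:
            key, val = (s.strip() for s in line.split('=', 1))
            try:
                gex[section][key] = np.array([float(v) for v in val.split()])
            except ValueError:
                gex[section][key] = val   # keep non-numeric values as text
    return gex
```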
- integrate.integrate_io.read_gex_workbench(file_gex, **kwargs)¶
Parse Seequent Workbench GEX file into structured dictionary.
Reads and parses electromagnetic system configuration files in the newer Seequent Workbench GEX format, which supports both dual-moment (LM/HM) and single-moment configurations. This function handles:
- Dual-moment systems with GateTimeLM## and GateTimeHM## entries
- Single-moment systems with GateTime## entries (e.g., Diamond SkyTEM)
- WaveformLMPoint## and WaveformHMPoint## entries
- Parameters:
file_gex (str) – Path to the GEX file containing electromagnetic system configuration.
**kwargs (dict) – Additional parsing parameters: - showInfo : int, verbosity level (0=silent, >0=verbose, default 0)
- Returns:
Dictionary containing parsed GEX file contents with structure:
- ‘filename’ : str, original file path
- ‘General’ : dict, system description and general parameters
- ‘WaveformLM’ : numpy.ndarray, low-moment waveform points
- ‘WaveformHM’ : numpy.ndarray, high-moment waveform points (if present)
- ‘GateArrayLM’ : numpy.ndarray, low-moment gate timing (if dual-moment)
- ‘GateArrayHM’ : numpy.ndarray, high-moment gate timing (if dual-moment)
- ‘GateArray’ : numpy.ndarray, gate timing (if single-moment)
- Return type:
dict
- Raises:
FileNotFoundError – If the specified GEX file does not exist or cannot be accessed.
Notes
This function supersedes read_gex() for newer Workbench-exported files.
Format detection:
- Dual-moment: keys contain ‘GateTimeLM’ or ‘GateTimeHM’ suffixes
- Single-moment: keys are ‘GateTime##’ without a moment identifier
Examples
>>> GEX = read_gex_workbench('TX08_20201112.gex')  # Dual-moment
>>> print(GEX['General']['GateArrayLM'].shape)
(30, 3)
>>> print(GEX['General']['GateArrayHM'].shape)
(30, 3)
>>> GEX = read_gex_workbench('diamond_system.gex')  # Single-moment
>>> print(GEX['General']['GateArray'].shape)
(25, 3)
- integrate.integrate_io.read_usf(file_path: str) → Dict[str, Any]¶
Parse Universal Sounding Format (USF) electromagnetic data file.
Reads and parses USF files containing electromagnetic survey data including measurement sweeps, timing information, and system parameters. USF is a standard format for time-domain electromagnetic data exchange.
- Parameters:
file_path (str) – Path to the USF file to be parsed.
- Returns:
Dictionary containing parsed USF file contents with keys:
- ‘sweeps’ : list of dict, measurement sweep data with timing and values
- ‘header’ : dict, file header information and metadata
- ‘parameters’ : dict, system and acquisition parameters
- ‘dummy_value’ : float, placeholder value for missing data
- Additional keys for file-specific parameters and settings
- Return type:
Dict[str, Any]
Notes
USF files contain structured electromagnetic data with sections for:
- Header information (file version, date, system type)
- Acquisition parameters (timing, frequencies, coordinates)
- Measurement sweeps with data points and uncertainties
- System configuration and processing parameters
The parser handles various USF format variations and automatically converts numeric data while preserving text metadata. Sweep data includes timing gates, measured values, and quality indicators.
This function is compatible with USF files from various electromagnetic systems and processing software, following standard format specifications for time-domain electromagnetic data exchange.
- integrate.integrate_io.read_usf_mul(directory: str = '.', ext: str = '.usf') → List[Dict[str, Any]]¶
Read all USF files in a specified directory and return a list of USF data structures.
- Parameters:
directory – Path to the directory containing USF files (default: current directory)
ext – File extension to look for (default: “.usf”)
- Returns:
np.ndarray: Array of observed data (d_obs) from all USF files
np.ndarray: Array of relative errors (d_rel_err) from all USF files
List[Dict[str, Any]]: List of USF data structures, each representing a single USF file
- Return type:
tuple containing
- integrate.integrate_io.save_data_gaussian(D_obs, D_std=[], d_std=[], Cd=[], id=1, id_prior=None, i_use=None, is_log=0, f_data_h5='data.h5', UTMX=None, UTMY=None, LINE=None, ELEVATION=None, delete_if_exist=False, name=None, compression=None, compression_opts=None, **kwargs)¶
Save observational data with Gaussian noise model to HDF5 file.
Creates HDF5 datasets for electromagnetic or other geophysical measurements assuming Gaussian-distributed uncertainties. Handles both diagonal and full covariance representations of measurement errors.
- Parameters:
D_obs (numpy.ndarray) – Observed data measurements with shape (N_stations, N_channels). Each row represents a measurement location, each column a data channel.
D_std (list, optional) – Standard deviations of observed data, same shape as D_obs. If empty, computed from d_std parameter (default is []).
d_std (list, optional) – Default standard deviation values or multipliers for uncertainty calculation when D_std is not provided (default is []).
Cd (list, optional) – Full covariance matrices for measurement uncertainties. If provided, takes precedence over D_std (default is []).
id (int, optional) – Dataset identifier for HDF5 group naming (‘/D{id}’, default is 1).
id_prior (int, optional) – Prior dataset identifier to compare against during inversion. If specified, observed data in /D{id} will be compared with prior data in /D{id_prior}. If None, defaults to same ID (D1 compares with D1, D2 with D2, etc.) (default is None).
i_use (numpy.ndarray, optional) – Binary mask indicating which data points to use in inversion, shape (N_stations,) or (N_stations,1). Values of 1 indicate data should be used, 0 indicates data should be excluded. If None, creates array of ones (all data used by default, default is None).
is_log (int, optional) – Flag indicating logarithmic data scaling (0=linear, 1=log, default is 0).
f_data_h5 (str, optional) – Path to output HDF5 file (default is ‘data.h5’).
UTMX (numpy.ndarray, optional) – UTM X coordinates in meters, shape (N_stations,) or (N_stations,1). If None, creates sequential integers (default is None).
UTMY (numpy.ndarray, optional) – UTM Y coordinates in meters, shape (N_stations,) or (N_stations,1). If None, creates zeros array (default is None).
LINE (numpy.ndarray, optional) – Survey line identifiers, shape (N_stations,) or (N_stations,1). If None, creates array filled with 1s (default is None).
ELEVATION (numpy.ndarray, optional) – Ground surface elevation in meters, shape (N_stations,) or (N_stations,1). If None, creates zeros array (default is None).
delete_if_exist (bool, optional) – Whether to delete the entire HDF5 file if it exists before creating new data. Use with caution as this removes all existing data (default is False).
name (str, optional) – Optional name attribute to be written to the data group. If provided, this string will be stored as an attribute alongside ‘noise_model’ (default is None).
compression (str or None, optional) – Compression filter to use. Options: ‘gzip’, ‘lzf’, or None. If None (default), uses global DEFAULT_COMPRESSION setting. Set to False to explicitly disable compression.
compression_opts (int, optional) – Compression level (0-9 for gzip). If None (default), uses global DEFAULT_COMPRESSION_OPTS setting. Level 1 provides 78% faster writes than level 9 with only 2% larger files.
**kwargs (dict) – Additional metadata parameters: - showInfo : int, verbosity level - Other dataset attributes for electromagnetic processing
- Returns:
Path to the HDF5 file where data was written.
- Return type:
str
Notes
The function creates HDF5 structure following INTEGRATE conventions:
- ‘/D{id}/d_obs’: observed measurements
- ‘/D{id}/d_std’: measurement standard deviations (if available)
- ‘/D{id}/Cd’: full covariance matrix (if provided)
- Dataset attributes include ‘noise_model’=’gaussian’
Uncertainty handling priority: Cd > D_std > computed from d_std. The Gaussian noise model assumes independent, normally distributed measurement errors with specified standard deviations or covariances.
Compression settings default to module-wide DEFAULT_COMPRESSION and DEFAULT_COMPRESSION_OPTS values (gzip level 1 by default), providing 3.5x file size reduction with good performance.
Note
Additional Parameters (kwargs):
showInfo (int): Level of verbosity for printing information. Default is 0.
f_gex (str): Name of the GEX file associated with the data. Default is empty string.
Behavior:
If D_std is not provided, it is calculated as d_std * D_obs
If coordinate parameters (UTMX, UTMY, LINE, ELEVATION) are provided, uses check_data() to create/update geometry datasets
If coordinate parameters are not provided, creates default geometry datasets if they don’t exist
If a group with name ‘D{id}’ exists, it is removed before adding new data
Writes attributes ‘noise_model’ and ‘is_log’ to the dataset group
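The defaulting rules above can be sketched as follows (hypothetical helper, not the internal implementation):

```python
import numpy as np

# If D_std is absent it is computed as d_std * D_obs (relative error
# times data); i_use defaults to all ones (use every sounding).
def default_std_and_mask(D_obs, d_std=0.05, D_std=None, i_use=None):
    if D_std is None:
        D_std = d_std * D_obs
    if i_use is None:
        i_use = np.ones(D_obs.shape[0], dtype=int)
    return D_std, i_use
```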
- integrate.integrate_io.save_data_multinomial(D_obs, i_use=None, id=[], id_prior=None, f_data_h5='data.h5', compression=None, compression_opts=None, **kwargs)¶
Save observed data to an HDF5 file in a specified group with a multinomial noise model.
- Parameters:
D_obs (numpy.ndarray) – The observed data array to be written to the file.
id (list, optional) – The ID of the group to write the data to. If not provided, the function will find the next available ID.
id_prior (int, optional) – The ID of PRIOR data to compare against this data. If not set, id_prior=id
f_data_h5 (str, optional) – The path to the HDF5 file where the data will be written. Default is ‘data.h5’.
kwargs – Additional keyword arguments.
- Returns:
The path to the HDF5 file where the data was written.
- Return type:
str
- integrate.integrate_io.save_prior_data(f_prior_h5, D_new, id=None, force_delete=False, compression='gzip', compression_opts=1, **kwargs)¶
Save forward-modeled data arrays to prior HDF5 file.
Saves electromagnetic or other geophysical data predictions from forward modeling to an HDF5 file with automatic data identifier assignment and data type optimization. Supports overwriting existing data arrays.
- Parameters:
f_prior_h5 (str) – Path to the HDF5 file where forward-modeled data will be saved.
D_new (numpy.ndarray) – Forward-modeled data array to save. Should have shape (N_samples, N_data_points) for consistency.
id (int, optional) – Data identifier for the dataset key (creates ‘/D{id}’). If None, automatically assigns the next available ID (default is None).
force_delete (bool, optional) – Whether to delete existing data with the same identifier before saving. If False, raises error when key exists (default is False).
compression (str or None, optional) – Compression filter to use. Options: ‘gzip’, ‘lzf’, or None. Default is ‘gzip’ for good compression with reasonable speed. Set to None to disable compression (fastest I/O, largest files).
compression_opts (int, optional) – Compression level (0-9 for gzip). Default is 1 (optimal balance). Level 1 provides 78% faster writes than level 9 with only 2% larger files. Only used when compression=’gzip’. Ignored if compression is None.
**kwargs (dict) – Additional arguments: - showInfo : int, verbosity level (0=silent, >0=verbose)
- Returns:
id – The data identifier used for saving the data.
- Return type:
int
Notes
Forward-modeled data is stored as HDF5 datasets with keys ‘/D1’, ‘/D2’, etc., representing different data types (e.g., electromagnetic frequencies, measurement systems, or processing variants).
Data type optimization is performed automatically: - Floating-point arrays are converted to float32 for memory efficiency - Integer arrays are preserved as appropriate integer types
Compression settings (default: gzip level 1):
- Provides 3.5x file size reduction vs no compression
- 78% faster writes than gzip level 9 (the old default)
- Only 2% larger files than maximum compression
The function ensures 2D array format with shape (N_samples, N_data_points).
- integrate.integrate_io.save_prior_model(f_prior_h5, M_new, im=None, force_replace=False, delete_if_exist=False, compression='gzip', compression_opts=1, **kwargs)¶
Save model parameter arrays to prior HDF5 file.
Saves model parameter realizations (e.g., resistivity, layer thickness) to an HDF5 file with automatic model identifier assignment and data type optimization. Supports overwriting existing models and file management options.
- Parameters:
f_prior_h5 (str) – Path to the HDF5 file where model data will be saved.
M_new (numpy.ndarray) – Model parameter array to save. Can be 1D or 2D; 1D arrays are automatically converted to column vectors.
im (int, optional) – Model identifier for the dataset key (creates ‘/M{im}’). If None, automatically assigns the next available ID (default is None).
force_replace (bool, optional) – Whether to overwrite existing model data with the same identifier. If False, raises error when key exists (default is False).
delete_if_exist (bool, optional) – Whether to delete the entire HDF5 file before saving. Use with caution as this removes all existing data (default is False).
compression (str or None, optional) – Compression filter to use. Options: ‘gzip’, ‘lzf’, or None. Default is ‘gzip’.
- ‘gzip’: good compression ratio, moderate speed (default)
- ‘lzf’: faster but lower compression ratio
- None: no compression, fastest read/write
Set to None for temporary files or fast iteration.
compression_opts (int, optional) – Compression level for gzip (1-9). Higher = better compression but slower. Only used when compression=’gzip’. Default is 1.
- 1: fast compression, excellent balance (new default, changed from 9)
- 4: good compression, moderate speed
- 9: maximum compression, very slow (old default)
**kwargs (dict) – Additional arguments: - showInfo : int, verbosity level (0=silent, >0=verbose)
- Returns:
im – The model identifier used for saving the data.
- Return type:
int
Notes
Model data is stored as HDF5 datasets with keys ‘/M1’, ‘/M2’, etc. Data type optimization is performed automatically:
- Floating-point arrays are converted to float32 for memory efficiency
- Integer arrays are preserved as appropriate integer types
Compression settings (based on performance tests with N=50000):
- compression=None: fastest (baseline), but 3.6x larger files
- compression=’gzip’, compression_opts=1: optimal; 78% faster than level 9, only 2% larger (new default)
- compression=’gzip’, compression_opts=4: 71% faster than level 9, only 0.5% larger
- compression=’gzip’, compression_opts=9: maximum compression, very slow (diminishing returns)
Recommendation: The new default (gzip level 1) provides the best balance:
- 3.5x file size reduction vs no compression
- 78% faster writes than the old default (level 9)
- Only 2% larger files than maximum compression
For temporary files or rapid iteration, use compression=None. For maximum compression (archival), use compression_opts=9.
The function ensures 2D array format with shape (N_samples, N_parameters) where 1D arrays are converted to column vectors.
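The array normalization described in the notes can be sketched as follows (illustrative helper, not the package API):

```python
import numpy as np

# 1D input becomes an (N, 1) column vector; floating-point arrays are
# downcast to float32, integer arrays keep their type (see notes above).
def normalize_model_array(M_new):
    M = np.asarray(M_new)
    if M.ndim == 1:
        M = M.reshape(-1, 1)
    if np.issubdtype(M.dtype, np.floating):
        M = M.astype(np.float32)
    return M
```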
- integrate.integrate_io.test_read_usf(file_path: str) → None¶
Test function to read a USF file and print some key values.
- Parameters:
file_path – Path to the USF file
- integrate.integrate_io.write_borehole(W, filename, **kwargs)¶
Write one or more borehole dictionaries to a JSON file.
- Parameters:
W (dict or list of dict) –
A single borehole dict, or a list of borehole dicts. Each dict may contain any combination of the standard borehole fields:
name (str) – identifier
X, Y (float) – UTM coordinates
depth_top, depth_bottom (list of float) – interval boundaries (m)
class_obs (list of int) – observed lithology class per interval
class_prob (list of float) – confidence per interval (0–1)
method (str, optional) – likelihood method (default 'mode_probability')
elevation (float, optional) – ground-surface elevation (m a.s.l.). Used only by plot_boreholes() to place the well on a shared elevation axis. Has no effect on inversion.
NumPy arrays and scalars are automatically converted to plain Python lists/numbers so the file is human-readable JSON.
filename (str) – Output JSON file path (e.g. 'borehole.json').
**kwargs –
- showInfo : int, optional
Verbosity level. Default is 0.
- Returns:
The filename written.
- Return type:
str
Examples
>>> W = {'name': 'BH1', 'X': 498832.5, 'Y': 6250843.1,
...      'depth_top': [0, 10], 'depth_bottom': [10, 20],
...      'class_obs': [1, 2], 'class_prob': [0.9, 0.9]}
>>> ig.write_borehole(W, 'borehole.json')
'borehole.json'
>>> WELLS = [W1, W2, W3]
>>> ig.write_borehole(WELLS, 'all_boreholes.json')
'all_boreholes.json'
- integrate.integrate_io.write_data_gaussian(*args, **kwargs)¶
[DEPRECATED] Use save_data_gaussian() instead.
This function has been renamed to save_data_gaussian() to maintain consistency with the HDF5 I/O naming convention (load_* / save_* for HDF5 operations).
The write_data_gaussian() function will be removed in a future version. Please update your code to use save_data_gaussian() instead.
See also
save_data_gaussianThe new function name for this functionality
- integrate.integrate_io.write_data_multinomial(*args, **kwargs)¶
[DEPRECATED] Use save_data_multinomial() instead.
This function has been renamed to save_data_multinomial() to maintain consistency with the HDF5 I/O naming convention (load_* / save_* for HDF5 operations).
The write_data_multinomial() function will be removed in a future version. Please update your code to use save_data_multinomial() instead.
See also
save_data_multinomialThe new function name for this functionality
- integrate.integrate_io.write_stm_files(GEX, **kwargs)¶
Generate STM (System Transfer Matrix) files from GEX system configuration.
Creates system transfer matrix files required for electromagnetic forward modeling using GA-AEM. Processes both high-moment (HM) and low-moment (LM) configurations with customizable frequency content and Hankel transform parameters.
- Parameters:
GEX (dict) – Dictionary containing GEX system configuration data with keys: - ‘General’: System description and waveform information - Waveform and timing parameters for electromagnetic modeling
**kwargs (dict) – Additional configuration parameters:
- Nhank : int, number of Hankel transform coefficients (default 280)
- Nfreq : int, number of frequencies for transform (default 12)
- Ndig : int, number of digital filters (default 7)
- showInfo : int, verbosity level (0=silent, >0=verbose)
- WindowWeightingScheme : str, weighting scheme (‘AreaUnderCurve’, ‘BoxCar’)
- NumAbsHM : int, number of abscissae for high moment (default Nhank)
- NumAbsLM : int, number of abscissae for low moment (default Nhank)
- NumFreqHM : int, number of frequencies for high moment (default Nfreq)
- NumFreqLM : int, number of frequencies for low moment (default Nfreq)
- Returns:
List of file paths for the generated STM files (typically HM and LM variants).
- Return type:
list of str
Notes
STM files contain system transfer functions that describe the electromagnetic system response characteristics needed for accurate forward modeling. The function generates separate files for high-moment and low-moment configurations when applicable.
The generated STM files follow GA-AEM format specifications and include:
- Frequency domain transfer functions
- Hankel transform coefficients
- Digital filter parameters
- System timing and waveform information
File naming convention follows: {system_description}_{moment_type}.stm
- integrate.integrate_io.xyz_to_h5(file_xyz, file_gex, f_data_h5=None, i_lm_skip=None, i_hm_skip=None, nan_value=None, showInfo=0, disregardFullNan=True)¶
Convert Aarhus Workbench XYZ export file(s) to an INTEGRATE HDF5 data file.
Reads one or more tTEM/SkyTEM XYZ files exported from Aarhus Workbench and writes a Gaussian-noise HDF5 data file suitable for use with integrate_rejection(). The GEX file is used to determine which initial gates to skip per channel (RemoveInitialGates) and the total gate count (NoGates).
- Parameters:
file_xyz (str or list of str) – Path(s) to Aarhus Workbench XYZ export file(s). Multiple files are concatenated in order (e.g. several flight days sharing one GEX).
file_gex (str) – Path to the GEX file describing the EM system configuration. Gate selection and channel count are read from this file.
f_data_h5 (str, optional) – Output HDF5 file path. If None, derived by joining the XYZ basename(s) with '_' and appending '.h5'.
i_lm_skip (list of int, optional) – Workbench LM gate numbers to exclude from inversion (1-indexed, same numbering as in the XYZ file header, e.g. DBDT_Ch1GT3 = gate 3). Gates already removed by RemoveInitialGates in the GEX are silently ignored. Excluded gates have their d_obs set to NaN and d_std set to 100.
i_hm_skip (list of int, optional) – Same as i_lm_skip but for HM (channel 2) Workbench gate numbers.
nan_value (float or None, optional) – Value used as a missing-data sentinel in the XYZ file. If None (default), the value is read from the XYZ file header (/DUMMY field, via model_info['dummy']); falls back to 9999 if the header field is absent. Pass an explicit value to override the header (e.g. nan_value=9999).
showInfo (int, optional) – Verbosity level (default 0). -1: suppress all output. 0: minimal (print only the output file summary, “Adding group …”). >=1: verbose (also print each XYZ file as it is read).
disregardFullNan (bool, optional) – If True (default), soundings where all gates are NaN are excluded from the output HDF5 file.
- Returns:
Path to the written HDF5 file.
- Return type:
str
Notes
d_std in the XYZ file is a relative (fractional) uncertainty; absolute d_std is computed as relative_std * d_obs.
Geometry (UTMX, UTMY, LINE, ELEVATION) is taken from channel-1 rows.
The /D1/i_lm and /D1/i_hm datasets store the 0-indexed gate arrays that were used (mirrors the MATLAB output).
The GEX file path is stored as the gex attribute on /D1.
Requires libaarhusxyz (pip install libaarhusxyz).
Examples
Single file:
>>> f = xyz_to_h5('tTEM_20230727_AVG_export.xyz',
...               'TX07_20230731_2x4_RC20-33.gex')
Multiple files merged into one HDF5:
>>> f = xyz_to_h5(
...     ['tTEM_20230727_AVG_export.xyz', 'tTEM_20230814_AVG_export.xyz'],
...     'TX07_20230731_2x4_RC20-33.gex'
... )
Skip Workbench LM gates 2, 3 and HM gates 27-30:
>>> f = xyz_to_h5('data.xyz', 'system.gex',
...               i_lm_skip=[2, 3], i_hm_skip=[27, 28, 29, 30])