Borehole Data Format
Overview
INTEGRATE supports integration of borehole (well log) data with geophysical surveys such as electromagnetic (EM) data. Boreholes provide direct observations of subsurface lithology at discrete depth intervals, which can be combined with spatially extensive geophysical data to improve characterization across the entire survey area.
This document describes:
How borehole data is structured and stored
The workflow for incorporating boreholes into probabilistic inversion
Distance-based weighting for spatial extrapolation
File formats and data structures
Core Functionality
Borehole handling is implemented in the integrate_borehole module with the following key functions:
prior_data_borehole(): Compute and save prior borehole data (dispatcher; the recommended entry point for prior data)
prior_data_borehole_class_mode(): Prior data using mode class extraction from prior realizations
prior_data_borehole_class_layer(): Prior data using the direct layer-probability approach
save_borehole_data(): One-call wrapper that computes and saves both prior and observed data for a borehole
compute_P_obs_discrete(): Low-level: compute observation probability from discrete lithology intervals
rescale_P_obs_temperature(): Apply temperature annealing for distance-based weighting
Pobs_to_datagrid(): Extrapolate point observations to the survey grid with distance weighting
get_weight_from_position(): Calculate spatial and data-similarity weights
Borehole Data Structure
Borehole Dictionary Format
Boreholes are represented as Python dictionaries containing lithology observations and spatial coordinates:
BH = {
'depth_top': [0, 8, 12, 16, 34], # Top depths (m)
'depth_bottom': [8, 12, 16, 28, 36], # Bottom depths (m)
'class_obs': [1, 2, 1, 5, 4], # Lithology class IDs
'class_prob': [0.9, 0.9, 0.9, 0.9, 0.9], # Confidence (0-1)
'X': 498832.5, # UTM Easting
'Y': 6250843.1, # UTM Northing
'name': '65.795', # Borehole identifier
'method': 'mode_probability', # Integration method
'elevation': 42.5 # Ground-surface elevation (m a.s.l.) — optional
}
Field Descriptions:
depth_top: Array of top depths for each lithology interval (meters below surface). Must be strictly increasing.
depth_bottom: Array of bottom depths for each lithology interval (meters below surface). Must have the same length as depth_top.
class_obs: Array of observed lithology class IDs, one per depth interval. Values must match the class_id defined in the prior model.
class_prob: Confidence level (0-1) for each observation. Can be:
  Scalar: applies the same confidence to all intervals
  Array: per-interval confidence specification
  Default: 0.9 (high confidence)
X, Y: Spatial coordinates in UTM projection (meters). Used for distance-based weighting.
name: Optional string identifier for the borehole.
method: Integration method. Options: 'mode_probability' (recommended), 'layer_probability'.
elevation: Optional. Ground-surface elevation of the borehole in meters above sea level (m a.s.l.). When set to a non-zero value, plot_boreholes() switches to elevation mode: the shared Y-axis shows absolute elevation instead of depth, and each borehole is vertically positioned at its surface elevation. Boreholes without this key (or with elevation=0) are placed at elevation 0 m a.s.l. in the plot. This field has no effect on inversion.
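The rules above can be checked before a borehole enters the workflow. The sketch below is a hypothetical helper, `check_borehole`, that is not part of INTEGRATE; it assumes `class_prob` falls back to 0.9 when the key is missing, and it adds one sanity check (positive interval thickness) that the field list does not state explicitly.

```python
import numpy as np


def check_borehole(BH, valid_class_ids=None):
    """Hypothetical validator for a borehole dictionary (not part of INTEGRATE).

    Applies the field rules listed above and returns a copy in which the
    interval fields are NumPy arrays and class_prob has one value per interval.
    """
    top = np.asarray(BH['depth_top'], dtype=float)
    bottom = np.asarray(BH['depth_bottom'], dtype=float)
    cls = np.asarray(BH['class_obs'], dtype=int)

    if not (top.shape == bottom.shape == cls.shape):
        raise ValueError('depth_top, depth_bottom and class_obs must have equal length')
    if np.any(np.diff(top) <= 0):
        raise ValueError('depth_top must be strictly increasing')
    # Extra sanity check (assumption): every interval must have positive thickness
    if np.any(bottom <= top):
        raise ValueError('each depth_bottom must be greater than its depth_top')
    if valid_class_ids is not None and not np.isin(cls, valid_class_ids).all():
        raise ValueError('class_obs contains IDs not in the prior class_id')

    # A scalar confidence applies to every interval; 0.9 is used if the key is absent
    prob = np.asarray(BH.get('class_prob', 0.9), dtype=float)
    prob = np.broadcast_to(prob, top.shape).copy()
    if np.any((prob < 0) | (prob > 1)):
        raise ValueError('class_prob must lie in [0, 1]')

    out = dict(BH)
    out.update(depth_top=top, depth_bottom=bottom, class_obs=cls, class_prob=prob)
    return out


BH = {'depth_top': [0, 8, 12, 16, 34], 'depth_bottom': [8, 12, 16, 28, 36],
      'class_obs': [1, 2, 1, 5, 4], 'class_prob': 0.9,
      'X': 498832.5, 'Y': 6250843.1, 'name': '65.795'}
checked = check_borehole(BH, valid_class_ids=[1, 2, 3, 4, 5])
print(checked['class_prob'])
```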
JSON File Format
Borehole dictionaries can be saved to and loaded from human-readable JSON files
using ig.write_borehole() and ig.read_borehole(). A JSON file may contain
either a single borehole (JSON object) or many boreholes (JSON array).
Single borehole file (borehole_65.795.json):
{
"name": "65.795",
"X": 498832.5,
"Y": 6250843.1,
"depth_top": [0, 8, 12, 16, 34],
"depth_bottom": [8, 12, 16, 28, 36],
"class_obs": [1, 2, 1, 5, 4],
"class_prob": [0.9, 0.9, 0.9, 0.9, 0.9],
"method": "mode_probability",
"elevation": 42.5
}
Multi-borehole file (all_boreholes.json):
[
{
"name": "65.795",
"X": 498832.5,
"Y": 6250843.1,
"depth_top": [0, 8, 12, 16, 34],
"depth_bottom": [8, 12, 16, 28, 36],
"class_obs": [1, 2, 1, 5, 4],
"class_prob": [0.9, 0.9, 0.9, 0.9, 0.9],
"method": "mode_probability",
"elevation": 42.5
},
{
"name": "65.732",
"X": 499100.0,
"Y": 6251200.0,
"depth_top": [0, 5, 15, 25],
"depth_bottom": [5, 15, 25, 40],
"class_obs": [2, 1, 3, 2],
"class_prob": [0.8, 0.8, 0.8, 0.8],
"method": "mode_probability",
"elevation": 38.1
}
]
ig.read_borehole() automatically detects the format: it returns a single dict
for a single-borehole file, or a list of dicts for a multi-borehole file.
NumPy arrays stored in the dict are automatically converted to plain Python lists when writing, so the file is fully human-readable and can be edited in any text editor.
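The NumPy-to-list conversion described above can be reproduced with Python's standard `json` module. This is a minimal sketch of the idea, not the code behind `ig.write_borehole()`; the helper `to_jsonable` is hypothetical.

```python
import json

import numpy as np


def to_jsonable(obj):
    """Recursively replace NumPy arrays and scalars with plain Python types."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):  # e.g. np.float64, np.int64
        return obj.item()
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj


BH = {'name': '65.795', 'X': np.float64(498832.5), 'Y': 6250843.1,
      'depth_top': np.array([0, 8, 12, 16, 34]),
      'class_prob': np.full(5, 0.9)}

single = json.dumps(to_jsonable(BH), indent=2)         # one borehole -> JSON object
several = json.dumps(to_jsonable([BH, BH]), indent=2)  # many boreholes -> JSON array

# Decoding gives a dict for an object and a list for an array
print(type(json.loads(single)).__name__, type(json.loads(several)).__name__)
```

The dict-versus-list result of `json.loads` is what makes the single/multi-borehole detection described for `ig.read_borehole()` straightforward.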
Writing and reading boreholes:
import integrate as ig
# Write a single borehole
ig.write_borehole(BH, 'borehole_65.795.json', showInfo=1)
# Write a list of boreholes (all in one file)
ig.write_borehole([BH1, BH2, BH3], 'all_boreholes.json', showInfo=1)
# Read back a single borehole
BH = ig.read_borehole('borehole_65.795.json', showInfo=1)
# Read back all boreholes
BHOLES = ig.read_borehole('all_boreholes.json', showInfo=1)
for BH in BHOLES:
print(BH['name'], BH['X'], BH['Y'])
Visualising boreholes:
# Plot lithology sticks (one subplot per borehole) — depth mode (default)
ig.plot_boreholes(BHOLES)
# With class names and colours from the prior HDF5 file
ig.plot_boreholes(BHOLES, f_prior_h5='PRIOR.h5')
# Load directly from a JSON file
ig.plot_boreholes('all_boreholes.json', f_prior_h5='PRIOR.h5')
# Elevation mode: set BH['elevation'] on any borehole to switch the shared
# Y-axis from depth to absolute elevation (m a.s.l.)
BHOLES[0]['elevation'] = 42.5 # ground-surface elevation of first borehole
BHOLES[1]['elevation'] = 38.1 # ground-surface elevation of second borehole
ig.plot_boreholes(BHOLES) # Y-axis now shows elevation; wells positioned correctly
HDF5 File Structure
Prior File (f_prior_h5)
The prior file contains lithology models and their spatial discretization:
f_prior_h5
├── /M1 # Dense parameters (e.g., resistivity)
├── /M2 # Lithology models (discrete)
│ ├── shape: (N_realizations, N_depth_points)
│ └── attributes:
│ ├── 'x' → depth array [m]
│ ├── 'class_id' → [0, 1, 2, ...] lithology identifiers
│ ├── 'class_name' → ['sand', 'clay', 'gravel', ...]
│ └── 'cmap' → colormap for visualization
├── /D1, /D2, ... # Forward modeled data (e.g., tTEM)
└── /D3, /D4, ... # Borehole lithology prior data
├── shape: (N_realizations, N_intervals)
└── stores mode lithology for each borehole interval
Key Points:
/M2 contains the lithology models sampled from the prior distribution.
M2.attrs['x'] provides the depth discretization (uniform spacing).
M2.attrs['class_id'] defines the valid lithology class identifiers.
/D3, /D4, ... store the extracted lithology mode for each borehole.
Data File (f_data_h5)
The data file contains observed borehole data and survey geometry:
f_data_h5
├── /D1 # tTEM observed data
│ ├── d_obs # Observed data values
│ ├── d_std # Data uncertainties
│ └── id_prior # Reference to /D1 in prior file
├── /D2, /D3, ... # Borehole observations
│ ├── d_obs # Probability matrix (nd × nclass × nm)
│ ├── i_use # Binary mask (nd × 1)
│ ├── id_prior # Reference to /D{id} in prior file
│ └── noise_model # 'multinomial'
├── /UTMX # Survey Easting coordinates
├── /UTMY # Survey Northing coordinates
├── /LINE # Survey line identifiers
└── /ELEVATION # Ground surface elevation
Data Array Dimensions:
d_obs: shape (nd, nclass, nm), where:
  nd = number of survey data points
  nclass = number of lithology classes
  nm = number of depth intervals per borehole
i_use: shape (nd, 1), binary mask (1 = use, 0 = ignore)
id_prior: scalar or array, references which prior data to compare against
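To make the `(nd, nclass, nm)` layout concrete, the sketch below builds one borehole's probability matrix of shape `(nclass, nm)` and repeats it for every survey point. It is a simplified stand-in for what `compute_P_obs_discrete()` and `Pobs_to_datagrid()` produce, under an assumption not stated in this document: the observed class receives `class_prob` and the remainder is shared evenly among the other classes. The distance weighting that the real workflow applies per survey point is omitted, and `interval_probabilities` is a hypothetical name.

```python
import numpy as np


def interval_probabilities(class_obs, class_prob, class_id):
    """Hypothetical P_obs of shape (nclass, nm).

    Column j is a probability vector over classes for depth interval j:
    the observed class gets class_prob, the rest is shared evenly.
    """
    class_id = np.asarray(class_id)
    nclass, nm = len(class_id), len(class_obs)
    prob = np.broadcast_to(np.asarray(class_prob, dtype=float), (nm,))
    P = np.empty((nclass, nm))
    for j, (c, p) in enumerate(zip(class_obs, prob)):
        if c not in class_id:
            raise ValueError(f'class {c} not in class_id')
        P[:, j] = (1.0 - p) / (nclass - 1)
        P[class_id == c, j] = p
    return P


class_id = [1, 2, 3, 4, 5]
P_obs = interval_probabilities([1, 2, 1, 5, 4], 0.9, class_id)  # (nclass, nm)

nd = 1000  # number of survey data points (illustrative)
# Same matrix at every survey point; the real workflow varies it with distance
d_obs = np.broadcast_to(P_obs, (nd,) + P_obs.shape).copy()  # (nd, nclass, nm)
i_use = np.ones((nd, 1), dtype=int)                          # 1 = use, 0 = ignore
print(d_obs.shape, i_use.shape)  # (1000, 5, 5) (1000, 1)
```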
Workflow
Complete Workflow Example
The recommended workflow uses ig.save_borehole_data() to handle all borehole
processing in a single call per borehole:
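import integrate as ig
# Assumes f_prior_h5, f_prior_data_h5 and f_data_h5 were created in earlier
# workflow steps (prior model, prior with forward-modeled data, observed data).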
# 1. Define boreholes
BH1 = {
'depth_top': [0, 8, 12, 16, 34],
'depth_bottom': [8, 12, 16, 28, 36],
'class_obs': [1, 2, 1, 5, 4],
'class_prob': 0.9,
'X': 498832.5,
'Y': 6250843.1,
'name': 'BH_1',
'method': 'mode_probability'
}
BH2 = {
'depth_top': [0, 5, 15, 25],
'depth_bottom': [5, 15, 25, 40],
'class_obs': [2, 1, 3, 2],
'class_prob': 0.8,
'X': 499100.0,
'Y': 6251200.0,
'name': 'BH_2',
'method': 'mode_probability'
}
BHOLES = [BH1, BH2]
# 2. Process all boreholes — one call per borehole
im_prior = 2 # index of lithology model parameter (M2)
r_data = 2 # full-strength radius (m)
r_dis = 300 # fade-out radius (m)
id_borehole_list = []
for BH in BHOLES:
id_prior, id_out = ig.save_borehole_data(
f_prior_h5, f_data_h5, BH,
im_prior=im_prior, r_data=r_data, r_dis=r_dis,
parallel=False, showInfo=1)
id_borehole_list.append(id_out)
# 3. Run joint inversion (tTEM + boreholes)
f_post_h5 = ig.integrate_rejection(
f_prior_data_h5,
f_data_h5,
id_use=[1] + id_borehole_list, # e.g. [1, 2, 3]
parallel=True
)
save_borehole_data internally performs three steps:
1. Compute prior borehole data: calls prior_data_borehole(), which dispatches on BH['method'] (for 'mode_probability', it extracts the mode lithology from the prior realizations) and saves the result to f_prior_h5.
2. Extrapolate to the survey grid: calls Pobs_to_datagrid() to spatially distribute the point observation with distance-based weighting.
3. Save observed data: calls save_data_multinomial() to write the gridded observations to f_data_h5.
save_borehole_data Parameters
id_prior, id_out = ig.save_borehole_data(
f_prior_h5, # Path to prior HDF5 file
f_data_h5, # Path to observed-data HDF5 file
BH, # Borehole dictionary
im_prior=2, # Model parameter index (e.g. 2 → /M2)
parallel=False, # Parallel mode extraction
r_data=2, # Full-strength radius (m)
r_dis=300, # Fade-out radius (m)
doPlot=False, # Plot distance-weight maps
showInfo=1 # Verbosity (0=silent, 1=summary line)
)
Returns:
id_prior: Dataset index of the new /D entry added to f_prior_h5.
id_out: Dataset index of the new /D entry added to f_data_h5. Append it to id_borehole_list for use in id_use during inversion.
Integration Methods
BH['method'] controls how lithology observations are converted to probabilities:
'mode_probability' (recommended): Extracts the mode lithology from prior realizations within each observed depth interval, providing realistic probability distributions that account for prior model variability. Uses prior_data_borehole_class_mode() internally.
'layer_probability': Converts depth-interval observations directly to a probability matrix without querying the prior ensemble. Faster, but less prior-aware. Uses prior_data_borehole_class_layer() internally.
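The idea behind 'mode_probability' can be illustrated with plain NumPy: for each prior realization, take the most frequent class among the depth points that fall inside each observed interval. This is a sketch under stated assumptions (intervals are treated as half-open `[depth_top, depth_bottom)`, ties go to the class listed first in `class_id`, and intervals containing no depth point return -1); it is not the implementation of `prior_data_borehole_class_mode()`, and `mode_class_per_interval` is a hypothetical name.

```python
import numpy as np


def mode_class_per_interval(M2, x, depth_top, depth_bottom, class_id):
    """Most frequent class per realization inside each depth interval.

    M2: (N_realizations, N_depth_points) class IDs; x: depth of each column.
    Returns (N_realizations, N_intervals); -1 where no depth point falls inside.
    """
    M2, x, class_id = np.asarray(M2), np.asarray(x), np.asarray(class_id)
    out = np.full((M2.shape[0], len(depth_top)), -1, dtype=int)
    for j, (top, bottom) in enumerate(zip(depth_top, depth_bottom)):
        inside = (x >= top) & (x < bottom)  # half-open interval [top, bottom)
        if not inside.any():
            continue
        seg = M2[:, inside]
        # counts[r, k]: how often class_id[k] occurs in realization r
        counts = (seg[:, :, None] == class_id[None, None, :]).sum(axis=1)
        # argmax picks the first maximum, so ties go to the earlier class_id
        out[:, j] = class_id[np.argmax(counts, axis=1)]
    return out


rng = np.random.default_rng(0)
x = np.arange(0.0, 40.0, 1.0)                   # depth discretization (cf. M2.attrs['x'])
class_id = np.array([1, 2, 3, 4, 5])
M2 = rng.choice(class_id, size=(1000, x.size))  # toy prior ensemble
D = mode_class_per_interval(M2, x, [0, 8, 12, 16, 34], [8, 12, 16, 28, 36], class_id)
print(D.shape)  # (1000, 5)
```

The result has the `(N_realizations, N_intervals)` shape listed for the borehole prior data in the prior file.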
Distance-Based Weighting
Spatial Weighting
Borehole observations influence survey points based on distance:
Weight Calculation:
w_combined, w_dis, w_data, i_ref = ig.get_weight_from_position(
f_data_h5,
x_well=BH['X'],
y_well=BH['Y'],
r_dis=300,
r_data=2,
doPlot=True
)
Weight Components:
w_dis: Distance-based weight (spatial proximity)
w_data: Data-similarity weight (optional, requires reference data)
w_combined: Combined weight, w_dis × w_data
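The exact formula behind `w_dis` is not reproduced in this document. As an illustration only, the sketch below assumes a taper that keeps full weight within `r_data` and then decays as a Gaussian in the distance beyond `r_data`, reaching 0.01 at `r_dis`; the helper `distance_weight` and that taper shape are hypothetical. Only the product `w_combined = w_dis × w_data` follows the definition above, with `w_data` set to 1 because no reference data is used here.

```python
import numpy as np


def distance_weight(x, y, x_well, y_well, r_data=2.0, r_dis=300.0):
    """Hypothetical Gaussian-like taper (not INTEGRATE's formula).

    Weight is 1 within r_data of the well and decays as a Gaussian in the
    distance beyond r_data, falling to 0.01 at r_dis.
    """
    d = np.hypot(np.asarray(x, dtype=float) - x_well,
                 np.asarray(y, dtype=float) - y_well)
    sigma = (r_dis - r_data) / np.sqrt(2.0 * np.log(100.0))
    excess = np.clip(d - r_data, 0.0, None)  # distance beyond the full-strength radius
    return np.exp(-0.5 * (excess / sigma) ** 2)


# Toy survey grid (10 m spacing) around the well from the borehole example
x_well, y_well = 498832.5, 6250843.1
gx, gy = np.meshgrid(np.linspace(x_well - 500, x_well + 500, 101),
                     np.linspace(y_well - 500, y_well + 500, 101))
w_dis = distance_weight(gx.ravel(), gy.ravel(), x_well, y_well)
w_data = np.ones_like(w_dis)  # placeholder: no data-similarity information used
w_combined = w_dis * w_data   # as defined above
```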
Temperature Annealing:
Distance is converted to a temperature T, which controls how strongly the observation probabilities are scaled:
P_obs_scaled = ig.rescale_P_obs_temperature(P_obs, T=temperature)
T = 1.0: No scaling (full confidence)
T > 1.0: Flattens distribution (less confident)
T >> 1.0: Approaches uniform distribution (observation ignored)
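Power tempering is one standard way to obtain exactly the behaviour listed above: raise each class probability to `1/T` and renormalise. The sketch assumes that form, with classes along axis 0 as in the `(nclass, nm)` layout; it is not confirmed to be the formula inside `ig.rescale_P_obs_temperature()`, and `temper` is a hypothetical name.

```python
import numpy as np


def temper(P, T):
    """Power tempering: P**(1/T), renormalised over classes (axis 0).

    T = 1 returns P unchanged; larger T flattens each column towards uniform.
    """
    Q = np.asarray(P, dtype=float) ** (1.0 / T)
    return Q / Q.sum(axis=0, keepdims=True)


P = np.array([[0.90], [0.05], [0.05]])  # one depth interval, three classes
for T in (1.0, 2.0, 10.0, 1000.0):
    print(T, temper(P, T).ravel().round(3))
```

With this form, a class whose probability is exactly 0 remains 0 for any finite T, so the flattening towards uniform only redistributes probability among classes that start above zero.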
Behavior by Distance:
d < r_data: T ≈ 1, full observation strength
r_data < d < r_dis: T increases gradually
d > r_dis: T >> 1, observation effectively ignored
Distance Weighting Function:
The weight decreases with distance following a Gaussian-like function, and is then converted to a temperature for probability scaling: a weight close to 1 gives T ≈ 1, and a weight close to 0 gives T >> 1.
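Because the two formulas themselves are not reproduced here, the sketch below pairs a hypothetical mapping `T = 1 / w` (with `w` clipped away from zero) with power tempering. It is one choice consistent with the distance behaviour listed above, not INTEGRATE's implementation; `weight_to_temperature` and `tempered` are illustrative names.

```python
import numpy as np


def weight_to_temperature(w, w_min=1e-6):
    """Hypothetical mapping T = 1 / w, with w clipped to [w_min, 1]."""
    return 1.0 / np.clip(w, w_min, 1.0)


def tempered(p, T):
    """Power-tempered probability vector p**(1/T), renormalised."""
    q = np.asarray(p, dtype=float) ** (1.0 / T)
    return q / q.sum()


p_obs = np.array([0.9, 0.05, 0.05])  # class probabilities for one interval
for w in (1.0, 0.5, 0.01, 0.0):      # from near the well to far away
    T = weight_to_temperature(w)
    print(f"w={w:<5} T={T:>10.1f} P={tempered(p_obs, T).round(3)}")
```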
Visualization
Visualize weight distribution:
w, _, _, _ = ig.get_weight_from_position(
f_data_h5,
x_well=BH['X'],
y_well=BH['Y'],
r_dis=300,
doPlot=True
)
This creates a map showing how borehole influence decreases with distance across the survey area.
Multiple Boreholes
Processing Multiple Boreholes
Use save_borehole_data() in a loop — one call per borehole:
BHOLES = [BH1, BH2, BH3]
id_borehole_list = []
for BH in BHOLES:
id_prior, id_out = ig.save_borehole_data(
f_prior_h5, f_data_h5, BH,
im_prior=2, r_data=2, r_dis=300,
showInfo=1)
id_borehole_list.append(id_out)
# Joint inversion: tTEM (D1) + all boreholes
f_post_h5 = ig.integrate_rejection(
f_prior_data_h5,
f_data_h5,
id_use=[1] + id_borehole_list,
parallel=True
)
Overlapping Influence Zones
When multiple boreholes have overlapping influence zones (r_dis), the inversion automatically handles this through the multinomial noise model. Each borehole observation is treated as an independent constraint, and the posterior distribution reflects the combined information from all boreholes and geophysical data.
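Independence has a simple consequence: at a survey point reached by several boreholes, their log-likelihoods add (the likelihoods multiply). The sketch below illustrates this under an assumed likelihood in which each borehole contributes `log P_obs[class, interval]` evaluated at a prior realization's mode class for that interval; it is not the multinomial noise model as implemented in INTEGRATE, and `borehole_log_likelihood` is hypothetical.

```python
import numpy as np


def borehole_log_likelihood(P_obs, mode_classes, class_id):
    """Hypothetical per-realization log-likelihood for one borehole.

    P_obs:        (nclass, nm) class probabilities at one survey point
    mode_classes: (N_realizations, nm) prior mode class in each interval
    class_id:     sorted class IDs labelling the rows of P_obs
    """
    rows = np.searchsorted(np.asarray(class_id), mode_classes)  # class ID -> row
    cols = np.arange(P_obs.shape[1])
    return np.log(P_obs[rows, cols]).sum(axis=1)


class_id = np.array([1, 2, 3])
# Two boreholes whose influence zones overlap at the same survey point
P1 = np.array([[0.8, 0.1], [0.1, 0.8], [0.1, 0.1]])  # favours class 1, then class 2
P2 = np.array([[0.6], [0.2], [0.2]])                 # single interval, favours class 1
mode1 = np.array([[1, 2], [3, 3]])  # two realizations, borehole-1 intervals
mode2 = np.array([[1], [2]])        # same realizations, borehole-2 interval

# Independent constraints: per-borehole log-likelihoods are summed
logL = (borehole_log_likelihood(P1, mode1, class_id)
        + borehole_log_likelihood(P2, mode2, class_id))
print(logL)
```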
Performance Considerations
Parallel Processing
For large prior ensembles (N > 100,000), enable parallel processing in save_borehole_data:
id_prior, id_out = ig.save_borehole_data(
f_prior_h5, f_data_h5, BH,
parallel=True # Parallel mode class extraction
)
Speedup (mode_probability method):
1 core: baseline
4 cores: ~3-4× faster
8 cores: ~6-8× faster
Memory usage scales with number of processes.
Optimization Tips
Adjust Distance Parameters:
Smaller r_dis reduces the area influenced by each borehole:
Faster processing (fewer survey points affected)
Less memory required
More localized borehole influence
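For survey points on a regular grid, the number of points inside `r_dis` grows roughly as π·r_dis² divided by the area per point, so halving `r_dis` cuts the affected points by about a factor of four. The synthetic grid below (2 km × 2 km, 10 m spacing) is an illustrative assumption, not an INTEGRATE data set.

```python
import numpy as np

# Synthetic 2 km x 2 km survey grid with 10 m point spacing (illustrative only)
gx, gy = np.meshgrid(np.arange(0.0, 2000.0, 10.0), np.arange(0.0, 2000.0, 10.0))
d = np.hypot(gx - 1000.0, gy - 1000.0)  # distance to a borehole at the grid centre

counts = {r_dis: int((d < r_dis).sum()) for r_dis in (100, 300, 600)}
for r_dis, n in counts.items():
    print(f"r_dis = {r_dis:>3} m: {n} survey points inside the influence zone")
```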
Examples
Complete Example: Workflow Script
See the complete working example in examples/integrate_workflow.py, which demonstrates:
Defining borehole dictionaries
Processing boreholes with ig.save_borehole_data()
Running joint inversion (tTEM + boreholes)
Visualizing results
Key code section:
im_prior = 2 # lithology model index (M2)
r_data = 2 # full-strength radius (m)
r_dis = 300 # fade-out radius (m)
id_borehole_list = []
for BH in BHOLES:
id_prior, id_out = ig.save_borehole_data(
f_prior_h5, f_data_h5, BH,
im_prior=im_prior, r_data=r_data, r_dis=r_dis,
parallel=parallel, showInfo=1)
id_borehole_list.append(id_out)
f_post_h5 = ig.integrate_rejection(
f_prior_data_h5, f_data_h5,
id_use=[1] + id_borehole_list,
parallel=True)
API Reference
Quick Reference
High-level Functions (recommended):
from integrate import (
save_borehole_data, # One-call: prior + observed data for a borehole
prior_data_borehole, # Dispatcher: compute and save prior borehole data
prior_data_borehole_class_mode, # Prior data via mode extraction from ensemble
prior_data_borehole_class_layer, # Prior data via direct layer probability
)
Low-level / utility functions:
from integrate import (
compute_P_obs_discrete, # Direct probability from observations
rescale_P_obs_temperature, # Temperature-based scaling
Pobs_to_datagrid, # Extrapolate to survey grid
get_weight_from_position, # Calculate spatial weights
save_prior_data, # Save data to prior file
save_data_multinomial, # Save discrete observations to data file
write_borehole, # Write borehole dict(s) to JSON
read_borehole, # Read borehole dict(s) from JSON
plot_boreholes, # Plot lithology sticks
)
See Also
Data format - General data format specifications
Workflows - Complete inversion workflow
Notebooks - Jupyter notebook examples