Don't know where your data is from? Bayesian modeling for unknown coordinates | Christopher Krapu %F0%9F%8C%81"> An especially strong motivating case for the usage of spatial probability models comes from the mining industry. During exploration for mineral resources, prospectors will take geologic samples by drilling holes and examining the resulting material for presence or concentration of valuable ores. These data typically show strong spatial correlation, but constructing a fully-detailed geophysical model is at times infeasible as we are able to observe very little of the underground conditions, though the advent of remote sensing techniques like ground-penetrating radar and gravimetry has dramatically improved our ability to characterize Earth’s subsurface. To address this challenge, we would like to construct a probability model which uses nearby data to predict a variable of interest at a new location.<br>To illustrate the problem better, we will use a dataset of uranium and vanadium point-referenced concentration measurements from Walker Lake. The data originate from Isaaks and Srivastava’s An Introduction to Applied Geostatistics and are distributed with the R package gstat.<br>Up to this point, you may have seen lots of examples of how Gaussian process models are use in robotics, spatial statistics, neuroscience, etc. Now, we work through a more exotic example which modifies the Gaussian process model to accommodate the case in which the actual location of our data points is not known precisely, and is only observed with substantial measurement noise. Spatial location error changes the covariance and prediction problem itself, a point emphasized in geostatistical work on location error and GP regression work on noisy spatial inputs by Cressie and Kornak and Cervone and Pillai.<br>This may seem like an unusual example at first glance, but we include it to give an example of how Bayesian modeling with appropriate priors lets us modify and change nearly any part of the model, given that we have some idea of how to represent our assumptions as part of the model process. Then, we use Monte Carlo methods to turn the inference crank and obtain reliable parameter estimates.<br>Introducing some more notation, let \(\tilde{\mathbf{s}}_i\) denote the recorded coordinate and \(\mathbf{s}_i\) the latent coordinate where the measurement actually occurred. We use \(\mathbf{s}_i = \tilde{\mathbf{s}}_i + \Delta_i\), with \(\Delta_i \sim \operatorname{Normal}(\mathbf{0}, \sigma_s^2 I_2)\), and evaluate the Gaussian process at \(\mathbf{s}_i\) rather than at \(\tilde{\mathbf{s}}_i\). Our choice of coordinate system here is somewhat arbitrary; we could also choose to work with polar coordinates and place priors over the magnitude and the angle of the location error. The scale \(\sigma_s\) is treated as known in this example so that the model represents different assumed levels of coordinate error.<br>\[\begin{aligned} \Delta_i &\sim \mathrm{Normal}(\mathbf{0}, \sigma_s^2\mathbf{I}_2) \\ \mathbf{s}_i &= \tilde{\mathbf{s}}_i + \Delta_i \\ \mu &\sim \mathrm{Normal}(2, 2) \\ \sigma &\sim \mathrm{HalfNormal}(1) \\ \ell &\sim \mathrm{HalfNormal}(100) \\ \sigma_0 &\sim \mathrm{HalfNormal}(0.5) \\ f(\cdot) \mid \sigma,\ell &\sim \mathcal{GP}(0, \sigma^2 c(\cdot,\cdot;\ell)) \\ Y_i \mid f,\mu,\sigma_0,\mathbf{s}_i &\sim \mathrm{Normal}(\mu + f(\mathbf{s}_i), \sigma_0) \end{aligned}\] This model is computationally harder than the fixed-location GP because the covariance matrix changes whenever the latent coordinates change. We use pm.gp.Marginal so that the latent GP values at the observations are integrated out.<br>We will construct datasets with increasing noise and examine how the model’s parameter estimates change. To begin, we perturb the original coordinates with increasing noise:<br>from pathlib import Path
import arviz as az<br>import matplotlib as mpl<br>import matplotlib.pyplot as plt<br>import numpy as np<br>import pandas as pd<br>import pymc as pm<br>import seaborn as sns<br>from matplotlib import colors<br>from matplotlib.patches import Circle
from stsp import use_sepia_gunmetal
RANDOM_SEED = 8927<br>np.random.seed(RANDOM_SEED)
theme, sepia_cmap = use_sepia_gunmetal()
plot_bg_color = "#1c1c1d"<br>plot_text_color = "#e8e8e8"
plt.style.use("dark_background")<br>mpl.rcParams.update(<br>"figure.facecolor": plot_bg_color,<br>"axes.facecolor": plot_bg_color,<br>"savefig.facecolor": plot_bg_color,<br>"savefig.edgecolor": plot_bg_color,<br>"text.color": plot_text_color,<br>"axes.labelcolor": plot_text_color,<br>"xtick.color": plot_text_color,<br>"ytick.color": plot_text_color,<br>"axes.edgecolor": "#828282",
theme["paper"] = plot_bg_color
df = pd.read_csv(Path("../../data/walker/cleaned.csv"))
X_walker = df[["x_m", "y_m"]].to_numpy()<br>y_walker = df["u_log10_p1"].to_numpy()
n_prediction_grid = 70<br>buffer_fraction = 0.1<br>x_range = df["x_m"].max() - df["x_m"].min()<br>y_range = df["y_m"].max() - df["y_m"].min()<br>x_limits = (df["x_m"].min() - buffer_fraction * x_range, df["x_m"].max() +...