Researcher profile

Earl Lawrence

Earl Lawrence contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
5 topics
4 close collaborators

Published work

6 published items

Preprint | 2026 | arXiv

Generalization vs. Memorization in Autoregressive Deep Learning: Or, Examining Temporal Decay of Gradient Coherence

Foundation models trained as autoregressive PDE surrogates hold significant promise for accelerating scientific discovery through their capacity to both extrapolate beyond training regimes and efficiently adapt to downstream tasks despite a paucity of examples for fine-tuning. However, reliably achieving genuine generalization - a necessary capability for producing novel scientific insights and robustly performing during deployment - remains a critical challenge. Establishing whether or not these requirements are met demands evaluation metrics capable of clearly distinguishing genuine model generalization from mere memorization. We apply the influence function formalism to systematically characterize how autoregressive PDE surrogates assimilate and propagate information derived from diverse physical scenarios, revealing fundamental limitations of standard models and training routines in addition to providing actionable insights regarding the design of improved surrogates.

Preprint | 2026 | arXiv

In-context learning enables continental-scale subsurface temperature prediction from sparse local observations

Continental-scale knowledge of subsurface temperature is limited by the cost and sparsity of borehole measurements, but such information is essential for geothermal resource assessment and for understanding heat transport in the shallow crust. The thermal field reflects the interaction between lithology, crustal structure, radiogenic heat production, and advective fluid flow, sometimes producing sharp anomalies that are smoothed by conventional interpolation or difficult to capture with physical models. Here we introduce In-Context Earth, a transformer-based model that uses sparse local borehole observations as geological context to predict continuous temperature-at-depth fields with calibrated uncertainty. In the contiguous United States, the model achieves a mean absolute error of 4.7 °C, outperforming the physics-informed Stanford Thermal Model, a model based on AlphaEarth embeddings, the multimodal Transparent Earth model, and universal kriging, while resolving sharper thermal gradients in geothermal provinces. Its uncertainty estimates are well calibrated, with a Kolmogorov-Smirnov statistic of 2.5%. Without finetuning, the model adapts to Alberta, Australia, and the United Kingdom (UK) using only 20 local observations at inference time, maintaining high accuracy in geologically distinct test regions with a mean absolute error of 2.2 °C in Alberta, 6.2 °C in Australia, and 5.4 °C in the UK. Interpretability analyses show that the model learns internal representations of subsurface properties it never observes during training, including seismic velocities, geochemistry, and crustal structure, and uses these representations in physically consistent ways. More broadly, this work shows that in-context learning can use sparse borehole observations for continental-scale subsurface characterization, without requiring dense measurements or region-specific retraining.

Preprint | 2022 | arXiv

Fast emulation of density functional theory simulations using approximate Gaussian processes

Fitting a theoretical model to experimental data in a Bayesian manner using Markov chain Monte Carlo typically requires one to evaluate the model thousands (or millions) of times. When the model is a slow-to-compute physics simulation, Bayesian model fitting becomes infeasible. To remedy this, a second statistical model that predicts the simulation output -- an "emulator" -- can be used in lieu of the full simulation during model fitting. A typical emulator of choice is the Gaussian process (GP), a flexible, non-linear model that provides both a predictive mean and variance at each input point. Gaussian process regression works well for small amounts of training data ($n < 10^3$), but becomes slow to train and use for prediction when the data set size becomes large. Various methods can be used to speed up the Gaussian process in the medium-to-large data set regime ($n > 10^5$), trading away predictive accuracy for drastically reduced runtime. This work examines the accuracy-runtime trade-off of several approximate Gaussian process models -- the sparse variational GP, stochastic variational GP, and deep kernel learned GP -- when emulating the predictions of density functional theory (DFT) models. Additionally, we use the emulators to calibrate, in a Bayesian manner, the DFT model parameters using observed data, resolving the computational barrier imposed by the data set size, and compare calibration results to previous work. The utility of these calibrated DFT models is to make predictions, based on observed data, about the properties of experimentally unobserved nuclides of interest e.g. super-heavy nuclei.

Preprint | 2020 | arXiv

An Initial Exploration of Bayesian Model Calibration for Estimating the Composition of Rocks and Soils on Mars

The Mars Curiosity rover carries an instrument, ChemCam, designed to measure the composition of surface rocks and soil using laser-induced breakdown spectroscopy (LIBS). The measured spectra from this instrument must be analyzed to identify the component elements in the target sample, as well as their relative proportions. This process, which we call disaggregation, is complicated by so-called matrix effects, which describe nonlinear changes in the relative heights of emission lines as an unknown function of composition due to atomic interactions within the LIBS plasma. In this work we explore the use of the plasma physics code ATOMIC, developed at Los Alamos National Laboratory, for the disaggregation task. ATOMIC has recently been used to model LIBS spectra and can robustly reproduce matrix effects from first principles. The ability of ATOMIC to predict LIBS spectra presents an exciting opportunity to perform disaggregation in a manner not yet tried in the LIBS community, namely via Bayesian model calibration. However, using it directly to solve our inverse problem is computationally intractable due to the large parameter space and the computation time required to produce a single output. Therefore we also explore the use of emulators as a fast solution for this analysis. We discuss a proof of concept Gaussian process emulator for disaggregating two-element compounds of sodium and copper. The training and test datasets were simulated with ATOMIC using a Latin hypercube design. After testing the performance of the emulator, we successfully recover the composition of 25 test spectra with Bayesian model calibration.

Preprint | 2020 | arXiv

Estimating Scale Discrepancy in Bayesian Model Calibration for ChemCam on the Mars Curiosity Rover

The Mars rover Curiosity carries an instrument called ChemCam to determine the composition of the soil and rocks. ChemCam uses laser-induced breakdown spectroscopy (LIBS) for this purpose. Los Alamos National Laboratory has developed a simulation capability that can predict spectra from ChemCam, but there are major scale differences between the prediction and observation. This presents a challenge when using Bayesian model calibration to determine the unknown physical parameters that describe the LIBS observations. We present an analysis of LIBS data to support ChemCam based on including a structured discrepancy model in a Bayesian model calibration scheme. This is both a novel application of Bayesian model calibration and a general purpose approach to accounting for such systematic differences between theory and observation in this setting.

Preprint | 2020 | arXiv

The Mira-Titan Universe. III. Emulation of the Halo Mass Function

We construct an emulator for the halo mass function over group and cluster mass scales for a range of cosmologies, including the effects of dynamical dark energy and massive neutrinos. The emulator is based on the recently completed Mira-Titan Universe suite of cosmological $N$-body simulations. The main set of simulations spans 111 cosmological models with 2.1 Gpc boxes. We extract halo catalogs in the redshift range $z=[0.0, 2.0]$ and for masses $M_{200\mathrm{c}}\geq 10^{13}M_\odot/h$. The emulator covers an 8-dimensional hypercube spanned by {$\Omega_\mathrm{m}h^2$, $\Omega_\mathrm{b}h^2$, $\Omega_\nu h^2$, $\sigma_8$, $h$, $n_s$, $w_0$, $w_a$}; spatial flatness is assumed. We obtain smooth halo mass functions by fitting piecewise second-order polynomials to the halo catalogs and employ Gaussian process regression to construct the emulator while keeping track of the statistical noise in the input halo catalogs and uncertainties in the regression process. For redshifts $z\lesssim1$, the typical emulator precision is better than $2\%$ for $10^{13}-10^{14} M_\odot/h$ and $<10\%$ for $M\simeq 10^{15}M_\odot/h$. For comparison, fitting functions using the traditional universal form for the halo mass function can be biased at up to 30% at $M\simeq 10^{14}M_\odot/h$ for $z=0$. Our emulator is publicly available at https://github.com/SebastianBocquet/MiraTitanHMFemulator.