Source author record

Earl Lawrence

Earl Lawrence appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


astro-ph.CO, Machine Learning, Applications, astro-ph.IM, Methodology, astro-ph.HE, Computation, hep-ph, physics.comp-ph, physics.geo-ph

Catalog footprint

What is connected

13 works

10 topics

4 close collaborators


Preprint, 2026, arXiv

Generalization vs. Memorization in Autoregressive Deep Learning: Or, Examining Temporal Decay of Gradient Coherence

Foundation models trained as autoregressive PDE surrogates hold significant promise for accelerating scientific discovery through their capacity to both extrapolate beyond training regimes and efficiently adapt to downstream tasks despite a paucity of examples for fine-tuning. However, reliably achieving genuine generalization - a necessary capability for producing novel scientific insights and robustly performing during deployment - remains a critical challenge. Establishing whether or not these requirements are met demands evaluation metrics capable of clearly distinguishing genuine model generalization from mere memorization. We apply the influence function formalism to systematically characterize how autoregressive PDE surrogates assimilate and propagate information derived from diverse physical scenarios, revealing fundamental limitations of standard models and training routines in addition to providing actionable insights regarding the design of improved surrogates.

Preprint, 2026, arXiv

In-context learning enables continental-scale subsurface temperature prediction from sparse local observations

Continental-scale knowledge of subsurface temperature is limited by the cost and sparsity of borehole measurements, but such information is essential for geothermal resource assessment and for understanding heat transport in the shallow crust. The thermal field reflects the interaction between lithology, crustal structure, radiogenic heat production, and advective fluid flow, sometimes producing sharp anomalies that are smoothed by conventional interpolation or difficult to capture with physical models. Here we introduce In-Context Earth, a transformer-based model that uses sparse local borehole observations as geological context to predict continuous temperature-at-depth fields with calibrated uncertainty. In the contiguous United States, the model achieves a mean absolute error of 4.7 °C, outperforming the physics-informed Stanford Thermal Model, a model based on AlphaEarth embeddings, the multimodal Transparent Earth model, and universal kriging, while resolving sharper thermal gradients in geothermal provinces. Its uncertainty estimates are well calibrated, with a Kolmogorov-Smirnov statistic of 2.5%. Without finetuning, the model adapts to Alberta, Australia, and the United Kingdom (UK) using only 20 local observations at inference time, maintaining high accuracy in geologically distinct test regions with a mean absolute error of 2.2 °C in Alberta, 6.2 °C in Australia, and 5.4 °C in the UK. Interpretability analyses show that the model learns internal representations of subsurface properties it never observes during training, including seismic velocities, geochemistry, and crustal structure, and uses these representations in physically consistent ways. More broadly, this work shows that in-context learning can use sparse borehole observations for continental-scale subsurface characterization, without requiring dense measurements or region-specific retraining.

Preprint, 2022, arXiv

Fast emulation of density functional theory simulations using approximate Gaussian processes

Fitting a theoretical model to experimental data in a Bayesian manner using Markov chain Monte Carlo typically requires one to evaluate the model thousands (or millions) of times. When the model is a slow-to-compute physics simulation, Bayesian model fitting becomes infeasible. To remedy this, a second statistical model that predicts the simulation output -- an "emulator" -- can be used in lieu of the full simulation during model fitting. A typical emulator of choice is the Gaussian process (GP), a flexible, non-linear model that provides both a predictive mean and variance at each input point. Gaussian process regression works well for small amounts of training data ($n < 10^3$), but becomes slow to train and use for prediction when the data set size becomes large. Various methods can be used to speed up the Gaussian process in the medium-to-large data set regime ($n > 10^5$), trading away predictive accuracy for drastically reduced runtime. This work examines the accuracy-runtime trade-off of several approximate Gaussian process models -- the sparse variational GP, stochastic variational GP, and deep kernel learned GP -- when emulating the predictions of density functional theory (DFT) models. Additionally, we use the emulators to calibrate, in a Bayesian manner, the DFT model parameters using observed data, resolving the computational barrier imposed by the data set size, and compare calibration results to previous work. The utility of these calibrated DFT models is to make predictions, based on observed data, about the properties of experimentally unobserved nuclides of interest e.g. super-heavy nuclei.
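To make the emulator idea concrete, here is a minimal sketch of an exact Gaussian process emulator in plain Python. It is not one of the approximate variants the abstract compares (sparse variational, stochastic variational, deep kernel), and the toy `math.sin` "simulation", the unit RBF kernel, the jitter value, and all function names are assumptions chosen for illustration. It only shows the property the abstract relies on: a predictive mean and variance at each input point.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-6):
    """Factor the kernel matrix (plus jitter) and precompute K^{-1} y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))
    return L, alpha

def gp_predict(x_new, xs, L, alpha):
    """Predictive mean and variance at a new input."""
    k = [rbf(x_new, a) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

# Toy stand-in for an expensive simulation, evaluated at six design points.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [math.sin(x) for x in train_x]
L, alpha = gp_fit(train_x, train_y)

mean_at_design, var_at_design = gp_predict(2.0, train_x, L, alpha)
mean_far, var_far = gp_predict(50.0, train_x, L, alpha)
```

At a design point the variance collapses toward the jitter, while far from the data it returns to the prior variance. The Cholesky step is the part that scales cubically with the number of training points, which is the cost the approximate methods in the abstract are meant to reduce.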

Preprint, 2020, arXiv

An Initial Exploration of Bayesian Model Calibration for Estimating the Composition of Rocks and Soils on Mars

The Mars Curiosity rover carries an instrument, ChemCam, designed to measure the composition of surface rocks and soil using laser-induced breakdown spectroscopy (LIBS). The measured spectra from this instrument must be analyzed to identify the component elements in the target sample, as well as their relative proportions. This process, which we call disaggregation, is complicated by so-called matrix effects, which describe nonlinear changes in the relative heights of emission lines as an unknown function of composition due to atomic interactions within the LIBS plasma. In this work we explore the use of the plasma physics code ATOMIC, developed at Los Alamos National Laboratory, for the disaggregation task. ATOMIC has recently been used to model LIBS spectra and can robustly reproduce matrix effects from first principles. The ability of ATOMIC to predict LIBS spectra presents an exciting opportunity to perform disaggregation in a manner not yet tried in the LIBS community, namely via Bayesian model calibration. However, using it directly to solve our inverse problem is computationally intractable due to the large parameter space and the computation time required to produce a single output. Therefore we also explore the use of emulators as a fast solution for this analysis. We discuss a proof of concept Gaussian process emulator for disaggregating two-element compounds of sodium and copper. The training and test datasets were simulated with ATOMIC using a Latin hypercube design. After testing the performance of the emulator, we successfully recover the composition of 25 test spectra with Bayesian model calibration.

Preprint, 2020, arXiv

Estimating Scale Discrepancy in Bayesian Model Calibration for ChemCam on the Mars Curiosity Rover

The Mars rover Curiosity carries an instrument called ChemCam to determine the composition of the soil and rocks. ChemCam uses laser-induced breakdown spectroscopy (LIBS) for this purpose. Los Alamos National Laboratory has developed a simulation capability that can predict spectra from ChemCam, but there are major scale differences between the prediction and observation. This presents a challenge when using Bayesian model calibration to determine the unknown physical parameters that describe the LIBS observations. We present an analysis of LIBS data to support ChemCam based on including a structured discrepancy model in a Bayesian model calibration scheme. This is both a novel application of Bayesian model calibration and a general purpose approach to accounting for such systematic differences between theory and observation in this setting.

Preprint, 2020, arXiv

The Mira-Titan Universe. III. Emulation of the Halo Mass Function

We construct an emulator for the halo mass function over group and cluster mass scales for a range of cosmologies, including the effects of dynamical dark energy and massive neutrinos. The emulator is based on the recently completed Mira-Titan Universe suite of cosmological $N$-body simulations. The main set of simulations spans 111 cosmological models with 2.1 Gpc boxes. We extract halo catalogs in the redshift range $z=[0.0, 2.0]$ and for masses $M_{200\mathrm{c}}\geq 10^{13}M_\odot/h$. The emulator covers an 8-dimensional hypercube spanned by {$\Omega_\mathrm{m}h^2$, $\Omega_\mathrm{b}h^2$, $\Omega_\nu h^2$, $\sigma_8$, $h$, $n_s$, $w_0$, $w_a$}; spatial flatness is assumed. We obtain smooth halo mass functions by fitting piecewise second-order polynomials to the halo catalogs and employ Gaussian process regression to construct the emulator while keeping track of the statistical noise in the input halo catalogs and uncertainties in the regression process. For redshifts $z\lesssim1$, the typical emulator precision is better than $2\%$ for $10^{13}-10^{14} M_\odot/h$ and $<10\%$ for $M\simeq 10^{15}M_\odot/h$. For comparison, fitting functions using the traditional universal form for the halo mass function can be biased by up to $30\%$ at $M\simeq 10^{14}M_\odot/h$ for $z=0$. Our emulator is publicly available at https://github.com/SebastianBocquet/MiraTitanHMFemulator.

Preprint, 2016, arXiv

Rare Event Statistics Applied to Fast Radio Bursts

Statistical interpretation of sparsely sampled event rates has become vital for new transient surveys, particularly those aimed at detecting fast radio bursts (FRBs). We provide an accessible reference for a number of simple, but critical, statistical questions relevant for current transient and FRB research, utilizing the negative binomial model for counts in which the count rate parameter is uncertain or randomly biased from one study to the next. We apply these methods to re-assess and update results from previous FRB surveys, finding as follows. 1) Thirteen FRBs detected across five high-Galactic-latitude (> 30$^\circ$) surveys are highly significant $(p = 5\times 10^{-5})$ evidence of a higher rate relative to the single FRB detected across four low-latitude (< 5$^\circ$) surveys, even after accounting for effects that dampen Galactic plane sensitivity. High- vs. mid-latitude (5 to 15$^\circ$) is marginally significant $(p = 0.03)$. 2) A meta-analysis of twelve heterogeneous surveys gives an FRB rate of 2866 sky$^{-1}$ day$^{-1}$ above 1 Jy at high Galactic latitude (95% confidence 1121 to 7328) and 285 sky$^{-1}$ day$^{-1}$ at low/mid latitudes (95% from 48 to 1701). 3) Using the Parkes HTRU high-latitude setup requires 193 observing hours to achieve 50% probability of detecting an FRB and 937 hours to achieve 95% probability, based on the ten detections of Champion et al. (2016) and appropriately accounting for uncertainty in the unknown Poisson rate. 4) Two quick detections at Parkes from a small number of high-latitude fields (Ravi et al. 2015; Petroff et al. 2015) tentatively favor a "look long" survey style relative to the "scan wide" HTRU survey, but only at $p = 0.07$ significance.
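The link between an uncertain Poisson rate and the negative binomial model can be sketched as follows. This is an illustration under stated assumptions, not the paper's calculation: it assumes a gamma posterior on the rate with shape equal to the number of past detections and rate equal to the past exposure (the result of an improper 1/rate prior), and the 1000-hour exposure is a hypothetical placeholder, not a survey value. Under those assumptions the predictive count is negative binomial, and the chance of zero events in `t` further hours is `(T / (T + t)) ** n`.

```python
import math

def prob_at_least_one(t_hours, n_detected, exposure_hours):
    """P(>= 1 event in t_hours) with a Gamma(n_detected, exposure_hours) posterior on the rate."""
    return 1.0 - (exposure_hours / (exposure_hours + t_hours)) ** n_detected

def hours_for_probability(p, n_detected, exposure_hours):
    """Observing time needed to reach detection probability p (inverse of the above)."""
    return exposure_hours * ((1.0 - p) ** (-1.0 / n_detected) - 1.0)

def hours_if_rate_known(p, rate_per_hour):
    """Same question when the Poisson rate is treated as exactly known."""
    return -math.log(1.0 - p) / rate_per_hour

# Hypothetical inputs: ten past detections in an assumed 1000 hours of exposure.
n_detected, exposure = 10, 1000.0
t50 = hours_for_probability(0.50, n_detected, exposure)
t95 = hours_for_probability(0.95, n_detected, exposure)
t95_known = hours_if_rate_known(0.95, n_detected / exposure)
```

Allowing for rate uncertainty always lengthens the required time relative to plugging in the point estimate, and the gap widens at high target probabilities, which is why the 95% figure in the abstract is so much larger than the 50% figure.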

Preprint, 2015, arXiv

Partitioning a Large Simulation as It Runs

As computer simulations continue to grow in size and complexity, they present a particularly challenging class of big data problems. Many application areas are moving toward exascale computing systems, systems that perform $10^{18}$ FLOPS (FLoating-point Operations Per Second) --- a billion billion calculations per second. Simulations at this scale can generate output that exceeds both the storage capacity and the bandwidth available for transfer to storage, making post-processing and analysis challenging. One approach is to embed some analyses in the simulation while the simulation is running --- a strategy often called in situ analysis --- to reduce the need for transfer to storage. Another strategy is to save only a reduced set of time steps rather than the full simulation. Typically the selected time steps are evenly spaced, where the spacing can be defined by the budget for storage and transfer. This paper combines both of these ideas to introduce an online in situ method for identifying a reduced set of time steps of the simulation to save. Our approach significantly reduces the data transfer and storage requirements, and it provides improved fidelity to the simulation to facilitate post-processing and reconstruction. We illustrate the method using a computer simulation that supported NASA's 2009 Lunar Crater Observation and Sensing Satellite mission.

Preprint, 2015, arXiv

The Mira-Titan Universe: Precision Predictions for Dark Energy Surveys

Ground and space-based sky surveys enable powerful cosmological probes based on measurements of galaxy properties and the distribution of galaxies in the Universe. These probes include weak lensing, baryon acoustic oscillations, abundance of galaxy clusters, and redshift space distortions; they are essential to improving our knowledge of the nature of dark energy. On the theory and modeling front, large-scale simulations of cosmic structure formation play an important role in interpreting the observations and in the challenging task of extracting cosmological physics at the needed precision. These simulations must cover a parameter range beyond the standard six cosmological parameters and need to be run at high mass and force resolution. One key simulation-based task is the generation of accurate theoretical predictions for observables, via the method of emulation. Using a new sampling technique, we explore an 8-dimensional parameter space including massive neutrinos and a variable dark energy equation of state. We construct trial emulators using two surrogate models (the linear power spectrum and an approximate halo mass function). The new sampling method allows us to build precision emulators from just 26 cosmological models and to increase the emulator accuracy by adding new sets of simulations in a prescribed way. This allows emulator fidelity to be systematically improved as new observational data becomes available and higher accuracy is required. Finally, using one LCDM cosmology as an example, we study the demands imposed on a simulation campaign to achieve the required statistics and accuracy when building emulators for dark energy investigations.

Preprint, 2014, arXiv

A Millisecond Interferometric Search for Fast Radio Bursts with the Very Large Array

We report on the first millisecond timescale radio interferometric search for the new class of transient known as fast radio bursts (FRBs). We used the Very Large Array (VLA) for a 166-hour, millisecond imaging campaign to detect and precisely localize an FRB. We observed at 1.4 GHz and produced visibilities with 5 ms time resolution over 256 MHz of bandwidth. Dedispersed images were searched for transients with dispersion measures from 0 to 3000 pc/cm^3. No transients were detected in observations of high Galactic latitude fields taken from September 2013 through October 2014. Observations of a known pulsar show that images typically had a thermal-noise limited sensitivity of 120 mJy/beam (8 sigma; Stokes I) in 5 ms and could detect and localize transients over a wide field of view. Our nondetection limits the FRB rate to less than 7e4/sky/day (95% confidence) above a fluence limit of 1.2 Jy-ms. Assuming a Euclidean flux distribution, the VLA rate limit is inconsistent with the published rate of Thornton et al. We recalculate previously published rates with a homogeneous consideration of the effects of primary beam attenuation, dispersion, pulse width, and sky brightness. This revises the FRB rate downward and shows that the VLA observations had a roughly 60% chance of detecting a typical FRB and that a 95% confidence constraint would require roughly 500 hours of similar VLA observing. Our survey also limits the repetition rate of an FRB to 2 times less than any known repeating millisecond radio transient.
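The relation between the roughly 60% detection chance in 166 hours and the roughly 500 hours needed for a 95% constraint can be checked with simple Poisson arithmetic. The sketch below assumes a constant event rate and unchanged sensitivity, so that the probability of at least one detection in `t` hours is `1 - exp(-rate * t)`. The function name is hypothetical, and the paper's own estimate may account for effects this ignores.

```python
import math

def hours_for_detection(p_target, p_known, hours_known):
    """Scale observing time from a known detection probability to a target one,
    assuming Poisson arrivals at a constant rate."""
    expected_events_per_hour = -math.log(1.0 - p_known) / hours_known
    return -math.log(1.0 - p_target) / expected_events_per_hour

needed = hours_for_detection(0.95, 0.60, 166.0)
print(round(needed))  # 543
```

Because the 60% input is itself rounded, a result near 540 hours is consistent with the abstract's "roughly 500 hours".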

Preprint, 2013, arXiv

The Coyote Universe Extended: Precision Emulation of the Matter Power Spectrum

Modern sky surveys are returning precision measurements of cosmological statistics such as weak lensing shear correlations, the distribution of galaxies, and cluster abundance. To fully exploit these observations, theorists must provide predictions that are at least as accurate as the measurements, as well as robust estimates of systematic errors that are inherent to the modeling process. In the nonlinear regime of structure formation, this challenge can only be overcome by developing a large-scale, multi-physics simulation capability covering a range of cosmological models and astrophysical processes. As a first step to achieving this goal, we have recently developed a prediction scheme for the matter power spectrum (a so-called emulator), accurate at the 1% level out to k~1/Mpc and z=1 for wCDM cosmologies based on a set of high-accuracy N-body simulations. It is highly desirable to increase the range in both redshift and wavenumber and to extend the reach in cosmological parameter space. To make progress in this direction, while minimizing computational cost, we present a strategy that maximally re-uses the original simulations. We demonstrate improvement over the original spatial dynamic range by an order of magnitude, reaching k~10 h/Mpc, a four-fold increase in redshift coverage, to z=4, and now include the Hubble parameter as a new independent variable. To further the range in k and z, a new set of nested simulations run at modest cost is added to the original set. The extension in h is performed by including perturbation theory results within a multi-scale procedure for building the emulator. This economical methodology still gives excellent error control, ~5% near the edges of the domain of applicability of the emulator. A public domain code for the new emulator is released as part of the work presented in this paper.

Preprint, 2013, arXiv

The Coyote Universe III: Simulation Suite and Precision Emulator for the Nonlinear Matter Power Spectrum

Many of the most exciting questions in astrophysics and cosmology, including the majority of observational probes of dark energy, rely on an understanding of the nonlinear regime of structure formation. In order to fully exploit the information available from this regime and to extract cosmological constraints, accurate theoretical predictions are needed. Currently such predictions can only be obtained from costly, precision numerical simulations. This paper is the third in a series aimed at constructing an accurate calibration of the nonlinear mass power spectrum on Mpc scales for a wide range of currently viable cosmological models, including dark energy. The first two papers addressed the numerical challenges, and the scheme by which an interpolator was built from a carefully chosen set of cosmological models. In this paper we introduce the "Coyote Universe" simulation suite which comprises nearly 1,000 N-body simulations at different force and mass resolutions, spanning 38 wCDM cosmologies. This large simulation suite enables us to construct a prediction scheme, or emulator, for the nonlinear matter power spectrum accurate at the percent level out to k~1 h/Mpc. We describe the construction of the emulator, explain the tests performed to ensure its accuracy, and discuss how the central ideas may be extended to a wider range of cosmological models and applications. A power spectrum emulator code is released publicly as part of this paper.

Preprint, 2012, arXiv

Computer Model Calibration using the Ensemble Kalman Filter

The ensemble Kalman filter (EnKF) (Evensen, 2009) has proven effective in quantifying uncertainty in a number of challenging dynamic, state estimation, or data assimilation, problems such as weather forecasting and ocean modeling. In these problems a high-dimensional state parameter is successively updated based on recurring physical observations, with the aid of a computationally demanding forward model that propagates the state from one time step to the next. More recently, the EnKF has proven effective in history matching in the petroleum engineering community (Evensen, 2009; Oliver and Chen, 2010). Such applications typically involve estimating large numbers of parameters, describing an oil reservoir, using data from production history that accumulate over time. Such history matching problems are especially challenging examples of computer model calibration since they involve a large number of model parameters as well as a computationally demanding forward model. More generally, computer model calibration combines physical observations with a computational model - a computer model - to estimate unknown parameters in the computer model. This paper explores how the EnKF can be used in computer model calibration problems, comparing it to other more common approaches, considering applications in climate and cosmology.
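As a rough illustration of how an EnKF update can be used for calibration, the sketch below performs one perturbed-observation analysis step on a single scalar parameter. Everything specific here is an assumption made for the example: the linear `toy_model`, the standard normal prior, the observation value and noise level, and the function names. It is a generic textbook form of the update, not the procedure from the paper.

```python
import random

def enkf_calibrate(forward, prior_samples, observation, obs_sd, rng):
    """One EnKF analysis step for a scalar parameter and a scalar observation."""
    n = len(prior_samples)
    preds = [forward(theta) for theta in prior_samples]
    mean_t = sum(prior_samples) / n
    mean_y = sum(preds) / n
    # Sample cross-covariance of parameter and prediction, and prediction variance.
    cov_ty = sum((t - mean_t) * (y - mean_y)
                 for t, y in zip(prior_samples, preds)) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in preds) / (n - 1)
    gain = cov_ty / (var_y + obs_sd ** 2)
    # Shift each member toward its own noisy copy of the observation.
    return [t + gain * (observation + rng.gauss(0.0, obs_sd) - y)
            for t, y in zip(prior_samples, preds)]

def toy_model(theta):
    """Hypothetical cheap stand-in for an expensive computer model."""
    return 2.0 * theta

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(2000)]
posterior = enkf_calibrate(toy_model, prior, observation=2.0, obs_sd=0.5, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

For this linear-Gaussian toy problem the exact posterior mean is 16/17 (about 0.94), so the ensemble mean should land close to that value and the ensemble spread should shrink relative to the prior. The forward model is run only once per ensemble member, which is the feature that makes the approach attractive when the model is expensive.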
