Source author record

Michael Habeck

Michael Habeck appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, Applications, Computation, astro-ph.IM, Methodology, Artificial Intelligence, astro-ph.CO, Computation and Language, cond-mat.mtrl-sci, cond-mat.stat-mech, hep-ph, math.NA, math.PR, math.ST, Numerical Analysis, physics.comp-ph, physics.data-an, Statistics Theory

Catalog footprint

What is connected

7 works

18 topics

4 close collaborators

preprint · 2026 · arXiv

Beyond the Training Distribution: Mapping Generalization Boundaries in Neural Program Synthesis

Large-scale transformers achieve impressive results on program synthesis benchmarks, yet their true generalization capabilities remain obscured by data contamination and opaque training corpora. To rigorously assess whether models are truly generalizing or merely retrieving memorized templates, we introduce a strictly controlled program synthesis environment based on a domain-specific arithmetic grammar. By systematically enumerating and evaluating millions of unique programs, we construct interpretable syntactic and semantic metric spaces. This allows us to precisely map data distributions and sample train and test splits that isolate specific distributional shifts. Our experiments demonstrate that optimizing density generalization -- through diverse sampling over both semantic and syntactic spaces -- induces robust out-of-distribution generalization. Conversely, evaluating support generalization reveals that transformers severely struggle with extrapolation, experiencing a performance drop of over 30% when forced to generate syntactically novel programs. While steadily scaling up compute improves generalization, the gains follow a strictly log-linear relationship. We conclude that robust generalization requires maximizing training diversity across multiple manifolds, and our findings indicate the necessity for novel search-based approaches to break through current log-linear scaling bottlenecks.

preprint · 2022 · arXiv

Nested sampling for physical scientists

We review Skilling's nested sampling (NS) algorithm for Bayesian inference and more broadly multi-dimensional integration. After recapitulating the principles of NS, we survey developments in implementing efficient NS algorithms in practice in high dimensions, including methods for sampling from the so-called constrained prior. We outline the ways in which NS may be applied and describe the application of NS in three scientific fields in which the algorithm has proved to be useful: cosmology, gravitational-wave astronomy, and materials science. We close by making recommendations for best practice when using NS and by summarizing potential limitations and optimizations of NS.
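As a rough illustration of the nested sampling idea summarized in this abstract, the following minimal sketch estimates the evidence of a toy problem. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional unnormalized Gaussian likelihood, the uniform prior on [-5, 5], the deterministic volume shrinkage exp(-i / n_live), and the naive rejection sampler for the constrained prior, which practical implementations replace with more efficient schemes.

```python
import math
import random


def log_likelihood(x):
    # Toy likelihood (assumption): unnormalized standard Gaussian.
    return -0.5 * x * x


def nested_sampling(n_live=100, n_iter=600, lo=-5.0, hi=5.0, seed=0):
    """Estimate Z = integral of L(x) * prior(x) dx for a uniform prior on [lo, hi]."""
    rng = random.Random(seed)
    live = [rng.uniform(lo, hi) for _ in range(n_live)]
    live_logl = [log_likelihood(x) for x in live]

    z = 0.0
    x_prev = 1.0  # remaining prior volume, starts at the full prior
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_star = live_logl[worst]

        # Expected prior volume after i shrinkage steps.
        x_curr = math.exp(-i / n_live)
        z += math.exp(logl_star) * (x_prev - x_curr)
        x_prev = x_curr

        # Draw a replacement from the constrained prior {x : L(x) > L*}
        # by naive rejection sampling (only viable in toy problems).
        while True:
            candidate = rng.uniform(lo, hi)
            if log_likelihood(candidate) > logl_star:
                break
        live[worst] = candidate
        live_logl[worst] = log_likelihood(candidate)

    # Add the contribution of the remaining live points.
    z += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return z


if __name__ == "__main__":
    exact = math.sqrt(2.0 * math.pi) * math.erf(5.0 / math.sqrt(2.0)) / 10.0
    print("nested sampling estimate:", nested_sampling())
    print("exact evidence:          ", exact)
```

The rejection step becomes more expensive as the constrained region shrinks, which is why the sketch stops after a modest number of iterations and adds the remaining live-point mass at the end.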

preprint · 2020 · arXiv

Stability of doubly-intractable distributions

Doubly-intractable distributions appear naturally as posterior distributions in Bayesian inference frameworks whenever the likelihood contains a normalizing function $Z$. Having two such functions $Z$ and $\widetilde Z$ we provide estimates of the total variation and Wasserstein distance of the resulting posterior probability measures. As a consequence this leads to local Lipschitz continuity w.r.t. $Z$. In the more general framework of a random function $\widetilde Z$ we derive bounds on the expected total variation and expected Wasserstein distance. The applicability of the estimates is illustrated within the setting of two representative Monte Carlo recovery scenarios.

preprint · 2015 · arXiv

Bayesian Evidence and Model Selection

In this paper we review the concepts of Bayesian evidence and Bayes factors, also known as log odds ratios, and their application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Specific attention is paid to the Laplace approximation, variational Bayes, importance sampling, thermodynamic integration, and nested sampling and its recent variants. Analogies to statistical physics, from which many of these techniques originate, are discussed in order to provide readers with deeper insights that may lead to new techniques. The utility of Bayesian model testing in the domain sciences is demonstrated by presenting four specific practical examples considered within the context of signal processing in the areas of signal detection, sensor characterization, scientific model selection and molecular force characterization.
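For orientation, the quantities named in this abstract can be written in standard textbook form. The notation ($D$ for data, $M_k$ for models, $\theta$ for parameters, $\beta$ for the inverse temperature) is an assumption chosen here and may differ from the paper's.

```latex
% Evidence (marginal likelihood) of model M_k
Z_k = p(D \mid M_k) = \int p(D \mid \theta, M_k)\, p(\theta \mid M_k)\, \mathrm{d}\theta

% Bayes factor and posterior odds between two models
B_{12} = \frac{Z_1}{Z_2},
\qquad
\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = B_{12}\, \frac{p(M_1)}{p(M_2)}

% Thermodynamic integration over the tempered posterior
% p_\beta(\theta) \propto p(D \mid \theta, M_k)^{\beta}\, p(\theta \mid M_k)
\log Z_k = \int_0^1 \mathbb{E}_{p_\beta}\!\left[ \log p(D \mid \theta, M_k) \right] \mathrm{d}\beta
```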

preprint · 2015 · arXiv

Ensemble annealing of complex physical systems

Algorithms for simulating complex physical systems or solving difficult optimization problems often resort to an annealing process. Rather than simulating the system at the temperature of interest, an annealing algorithm starts at a temperature that is high enough to ensure ergodicity and gradually decreases it until the destination temperature is reached. This idea is used in popular algorithms such as parallel tempering and simulated annealing. A general problem with annealing methods is that they require a temperature schedule. Choosing well-balanced temperature schedules can be tedious and time-consuming. Imbalanced schedules can have a negative impact on the convergence, runtime and success of annealing algorithms. This article outlines a unifying framework, ensemble annealing, that combines ideas from simulated annealing, histogram reweighting and nested sampling with concepts in thermodynamic control. Ensemble annealing simultaneously simulates a physical system and estimates its density of states. The temperatures are lowered not according to a prefixed schedule but adaptively so as to maintain a constant relative entropy between successive ensembles. After each step on the temperature ladder an estimate of the density of states is updated and a new temperature is chosen. Ensemble annealing is highly practical and broadly applicable. This is illustrated for various systems including Ising, Potts, and protein models.
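The adaptive temperature choice described in this abstract can be sketched in a simplified form. The sketch below is not the authors' algorithm: it skips the density-of-states estimate and instead reweights energy samples from the current ensemble directly, and it assumes the relative entropy is measured from the new ensemble to the old one. The function names, the target value of 0.1, and the exponential toy energies are all hypothetical.

```python
import math
import random


def relative_entropy(energies, delta_beta):
    """KL divergence between the reweighted ensemble at beta + delta_beta
    and the current ensemble, estimated from samples of the current one."""
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-delta_beta * (e - e_min)) for e in energies]
    total = sum(weights)
    n = len(energies)
    return sum((w / total) * math.log(n * w / total) for w in weights if w > 0.0)


def next_beta(energies, beta, target=0.1, max_step=1e3, tol=1e-10):
    """Pick the next inverse temperature so that the estimated relative
    entropy to the current ensemble matches the target (by bisection)."""
    lo, hi = 0.0, 1.0
    # Grow the bracket until the target is exceeded; the relative entropy
    # increases monotonically with delta_beta >= 0.
    while relative_entropy(energies, hi) < target and hi < max_step:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if relative_entropy(energies, mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return beta + 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(1)
    beta = 0.5
    # Toy energies (assumption): exponentially distributed with mean 1 / beta.
    energies = [rng.expovariate(beta) for _ in range(500)]
    print("next inverse temperature:", next_beta(energies, beta))
```

The target must stay below log(n) for n samples, since that is the largest relative entropy a reweighted sample set can represent.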

preprint · 2013 · arXiv

Adaptive nonparametric detection in cryo-electron microscopy

Cryo-electron microscopy (cryo-EM) is an emerging experimental method to characterize the structure of large biomolecular assemblies. Single particle cryo-EM records 2D images (so-called micrographs) of projections of the three-dimensional particle, which need to be processed to obtain the three-dimensional reconstruction. A crucial step in the reconstruction process is particle picking which involves detection of particles in noisy 2D micrographs with low signal-to-noise ratios of typically 1:10 or even lower. Typically, each picture contains a large number of particles, and particles have unknown irregular and nonconvex shapes.

preprint · 2013 · arXiv

Spatial statistics, image analysis and percolation theory

We develop a novel method for detection of signals and reconstruction of images in the presence of random noise. The method uses results from percolation theory. We specifically address the problem of detection of multiple objects of unknown shapes in the case of nonparametric noise. The noise density is unknown and can be heavy-tailed. The objects of interest have unknown varying intensities. No boundary shape constraints are imposed on the objects, only a set of weak bulk conditions is required. We view the object detection problem as a multiple hypothesis testing problem for discrete statistical inverse problems. We present an algorithm that detects greyscale objects of various shapes in noisy images. We prove results on consistency and algorithmic complexity of our procedures. Applications to cryo-electron microscopy are presented.
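One common way to use percolation ideas for detection is to threshold the image and test whether the surviving pixels form an unusually large connected cluster. The sketch below illustrates that general idea only. It is not the paper's procedure: the 4-neighbour connectivity, the fixed threshold, the minimum cluster size, and the function names are all assumptions made for illustration.

```python
from collections import deque


def largest_cluster(image, threshold):
    """Size of the largest 4-connected cluster of pixels above the threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first search over the cluster containing (r, c).
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def detect_object(image, threshold, min_cluster_size):
    """Report a detection when a sufficiently large cluster survives thresholding."""
    return largest_cluster(image, threshold) >= min_cluster_size


if __name__ == "__main__":
    # Toy image (assumption): a 3x3 bright block plus one isolated bright pixel.
    image = [[0.0] * 6 for _ in range(6)]
    for r in range(1, 4):
        for c in range(1, 4):
            image[r][c] = 1.0
    image[5][5] = 1.0
    print("largest cluster:", largest_cluster(image, 0.5))
    print("object detected:", detect_object(image, 0.5, min_cluster_size=5))
```

The intuition is that isolated noise pixels above the threshold rarely join into large clusters, while pixels belonging to a real object do, so cluster size separates the two cases without any assumption about the object's boundary shape.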
