Researcher profile

Kristjan Greenewald

Kristjan Greenewald contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional optimal transport for calibrating PRMs, modifying conditional OT (CondOT) map learning \cite{bunne2022supervised} to estimate a monotonic conditional quantile function over success probabilities estimated by the PRM, conditioned on PRM hidden states. This yields structurally valid quantile estimates and enables efficient extraction of confidence bounds at arbitrary levels, which we integrate into the instance-adaptive scaling (IAS) framework of \cite{park2025know}. We evaluate on mathematical reasoning benchmarks spanning moderate-difficulty problems (MATH-500) and harder out-of-distribution problems (AIME). For PRMs with reliable ranking signals, our method substantially improves calibration over both uncalibrated PRMs and quantile regression. On downstream Best-of-N IAS performance, our method generally improves over uncalibrated PRMs. These results establish conditional optimal transport as another principled and practical approach to PRM calibration, offering structural guarantees and flexible uncertainty estimation.

preprint2022arXiv

Improving Approximate Optimal Transport Distances using Quantization

Optimal transport (OT) is a popular tool in machine learning to compare probability measures geometrically, but it comes with substantial computational burden. Linear programming algorithms for computing OT distances scale cubically in the size of the input, making OT impractical in the large-sample regime. We introduce a practical algorithm, which relies on a quantization step, to estimate OT distances between measures given cheap sample access. We also provide a variant of our algorithm to improve the performance of approximate solvers, focusing on those for entropy-regularized transport. We give theoretical guarantees on the benefits of this quantization step and display experiments showing that it behaves well in practice, providing a practical approximation algorithm that can be used as a drop-in replacement for existing OT estimators.

preprint2022arXiv

Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets

The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.

preprint2022arXiv

The Computational Limits of Deep Learning

Deep learning's recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image classification, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article catalogs the extent of this dependency, showing that progress across a wide variety of applications is strongly reliant on increases in computing power. Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.

preprint2021arXiv

Entropic Causal Inference: Identifiability and Finite Sample Results

Entropic causal inference is a framework for inferring the causal direction between two categorical variables from observational data. The central assumption is that the amount of unobserved randomness in the system is not too large. This unobserved randomness is measured by the entropy of the exogenous variable in the underlying structural causal model, which governs the causal relation between the observed variables. Kocaoglu et al. conjectured that the causal direction is identifiable when the entropy of the exogenous variable is not too large. In this paper, we prove a variant of their conjecture. Namely, we show that for almost all causal models where the exogenous variable has entropy that does not scale with the number of states of the observed variables, the causal direction is identifiable from observational data. We also consider the minimum entropy coupling-based algorithmic approach presented by Kocaoglu et al., and for the first time demonstrate algorithmic identifiability guarantees using a finite number of samples. We conduct extensive experiments to evaluate the robustness of the method to relaxing some of the assumptions in our theory and demonstrate that both the constant-entropy exogenous variable and the no latent confounder assumptions can be relaxed in practice. We also empirically characterize the number of observational samples needed for causal identification. Finally, we apply the algorithm on Tuebingen cause-effect pairs dataset.

preprint2020arXiv

Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_σ$, for $\mathcal{N}_σ\triangleq\mathcal{N}(0,σ^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_σ$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $χ^2$-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-\frac{1}{2}}$ in remarkable contrast to a typical $n^{-\frac{1}{d}}$ rate for unsmoothed $\mathsf{W}_1$ (and $d\ge 3$). For the KL divergence, squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and $χ^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $χ^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $ω(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $χ^2$-divergence becomes infinite - a curious dichotomy. As a main application we consider estimating the differential entropy $h(P\ast\mathcal{N}_σ)$ in the high-dimensional regime. The distribution $P$ is unknown but $n$ i.i.d samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_σ)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-\frac{1}{2}}$, thus establishing the minimax rate-optimality of the plug-in. Numerical results that demonstrate a significant empirical superiority of the plug-in approach to general-purpose differential entropy estimators are provided.

preprint2020arXiv

Gaussian-Smooth Optimal Transport: Metric Structure and Statistical Efficiency

Optimal transport (OT), and in particular the Wasserstein distance, has seen a surge of interest and applications in machine learning. However, empirical approximation under Wasserstein distances suffers from a severe curse of dimensionality, rendering them impractical in high dimensions. As a result, entropically regularized OT has become a popular workaround. However, while it enjoys fast algorithms and better statistical properties, it looses the metric structure that Wasserstein distances enjoy. This work proposes a novel Gaussian-smoothed OT (GOT) framework, that achieves the best of both worlds: preserving the 1-Wasserstein metric structure while alleviating the empirical approximation curse of dimensionality. Furthermore, as the Gaussian-smoothing parameter shrinks to zero, GOT $Γ$-converges towards classic OT (with convergence of optimizers), thus serving as a natural extension. An empirical study that supports the theoretical results is provided, promoting Gaussian-smoothed OT as a powerful alternative to entropic OT.