Source author record

Kristjan Greenewald

Kristjan Greenewald appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning Methodology Applications Artificial Intelligence Information Theory math.IT math.ST Statistics Theory Computer Vision

Catalog footprint

What is connected

16 works

9 topics

4 close collaborators

preprint · 2026 · arXiv

Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport

Inference-time scaling methods rely on Process Reward Models (PRMs), which are often poorly calibrated and overestimate success probabilities. We propose, to our knowledge, the first use of conditional optimal transport for calibrating PRMs, modifying conditional OT (CondOT) map learning (Bunne et al., 2022) to estimate a monotonic conditional quantile function over the success probabilities estimated by the PRM, conditioned on PRM hidden states. This yields structurally valid quantile estimates and enables efficient extraction of confidence bounds at arbitrary levels, which we integrate into the instance-adaptive scaling (IAS) framework of Park et al. (2025). We evaluate on mathematical reasoning benchmarks spanning moderate-difficulty problems (MATH-500) and harder out-of-distribution problems (AIME). For PRMs with reliable ranking signals, our method substantially improves calibration over both uncalibrated PRMs and quantile regression. On downstream Best-of-N IAS performance, our method generally improves over uncalibrated PRMs. These results establish conditional optimal transport as another principled and practical approach to PRM calibration, offering structural guarantees and flexible uncertainty estimation.
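The abstract's key structural point is that a monotone quantile function makes confidence bounds at any level cheap to extract. The sketch below does not implement CondOT map learning; it only illustrates that property under an assumed parameterization: a hypothetical model emits one base logit plus unconstrained increments for a fixed grid of quantile levels, cumulative softplus increments followed by a sigmoid guarantee non-crossing quantiles in (0, 1), and any level alpha is then one interpolation away. `monotone_quantiles` and `quantile_at` are illustrative names, not the paper's API.

```python
from __future__ import annotations

import math
from bisect import bisect_left


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always strictly positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def monotone_quantiles(base: float, raw_increments: list[float]) -> list[float]:
    """Map unconstrained outputs to increasing quantiles in (0, 1).

    Hypothetical parameterization: a base logit plus positive softplus
    increments, then a sigmoid, which preserves the ordering.
    """
    logits = [base]
    for r in raw_increments:
        logits.append(logits[-1] + softplus(r))
    return [sigmoid(z) for z in logits]


def quantile_at(levels: list[float], quantiles: list[float], alpha: float) -> float:
    """Read the quantile function at any level alpha by linear interpolation."""
    if alpha <= levels[0]:
        return quantiles[0]
    if alpha >= levels[-1]:
        return quantiles[-1]
    i = bisect_left(levels, alpha)  # levels[i - 1] < alpha <= levels[i]
    lo, hi = levels[i - 1], levels[i]
    w = (alpha - lo) / (hi - lo)
    return (1.0 - w) * quantiles[i - 1] + w * quantiles[i]


# Example: quantiles predicted for one (hypothetical) PRM hidden state.
levels = [0.1, 0.25, 0.5, 0.75, 0.9]
q = monotone_quantiles(-1.0, [0.3, -0.5, 0.2, 1.0])
lower, upper = quantile_at(levels, q, 0.1), quantile_at(levels, q, 0.9)
```

Because monotonicity is built into the parameterization, no post-hoc sorting or crossing repair is needed before reading off a bound.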

preprint · 2022 · arXiv

Improving Approximate Optimal Transport Distances using Quantization

Optimal transport (OT) is a popular tool in machine learning to compare probability measures geometrically, but it comes with substantial computational burden. Linear programming algorithms for computing OT distances scale cubically in the size of the input, making OT impractical in the large-sample regime. We introduce a practical algorithm, which relies on a quantization step, to estimate OT distances between measures given cheap sample access. We also provide a variant of our algorithm to improve the performance of approximate solvers, focusing on those for entropy-regularized transport. We give theoretical guarantees on the benefits of this quantization step and display experiments showing that it behaves well in practice, providing a practical approximation algorithm that can be used as a drop-in replacement for existing OT estimators.
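The abstract does not spell out the quantization step or the downstream solver, so the following is only a one-dimensional sketch of the general idea: compress each sample set into k weighted atoms, then solve OT exactly on the small discrete measures. On the real line, exact W1 between weighted atoms reduces to integrating the gap between the two CDFs, which keeps the example self-contained. The equal-mass chunking used by `quantize_1d` is a hypothetical stand-in for a k-means-style quantizer; in higher dimensions one would instead pass the k × k cost matrix to an LP or entropic (Sinkhorn) solver.

```python
from __future__ import annotations


def quantize_1d(samples: list[float], k: int) -> tuple[list[float], list[float]]:
    """Compress samples into at most k weighted atoms (hypothetical scheme).

    Sorted samples are cut into k contiguous chunks of (nearly) equal size;
    each chunk becomes one atom at its mean, weighted by its share of points.
    """
    xs = sorted(samples)
    n = len(xs)
    k = max(1, min(k, n))
    atoms, weights = [], []
    for j in range(k):
        chunk = xs[j * n // k:(j + 1) * n // k]
        atoms.append(sum(chunk) / len(chunk))
        weights.append(len(chunk) / n)
    return atoms, weights


def w1_discrete(xa, wa, xb, wb) -> float:
    """Exact 1-Wasserstein distance between two weighted point sets on the real line.

    In 1-D, W1 equals the integral of |F_a - F_b|, where F_a, F_b are the CDFs.
    """
    events = sorted([(x, w, 0.0) for x, w in zip(xa, wa)] +
                    [(x, 0.0, w) for x, w in zip(xb, wb)])
    total = fa = fb = 0.0
    for (x, da, db), (x_next, _, _) in zip(events, events[1:]):
        fa += da
        fb += db
        total += abs(fa - fb) * (x_next - x)
    return total


def quantized_w1(samples_a, samples_b, k: int) -> float:
    """Quantize both sample sets to k atoms, then solve the small OT problem."""
    return w1_discrete(*quantize_1d(samples_a, k), *quantize_1d(samples_b, k))
```

The cost of the OT step now depends on k rather than on the number of samples, which is the trade-off the abstract's guarantees are about.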

preprint · 2022 · arXiv

Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets

The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.
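The abstract describes a distance built from a lower bound of the log-Euclidean metric ||log A − log B||_F between SPD operators. One standard bound with the alignment-free property, offered here as background rather than as the exact LES construction, follows from the Hoffman-Wielandt inequality: for equal-size symmetric matrices, the Euclidean distance between the sorted log-eigenvalue spectra is at most ||log A − log B||_F, and it does not depend on how the rows and columns are ordered. The pure-Python sketch below computes eigenvalues with cyclic Jacobi rotations; truncating to the top-k spectrum when sizes differ is a hypothetical choice, and all names are illustrative.

```python
from __future__ import annotations

import math


def sym_eigvals(a: list[list[float]], max_sweeps: int = 100, tol: float = 1e-20) -> list[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations, descending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def log_spectrum(spd: list[list[float]], k: int) -> list[float]:
    """Top-k log-eigenvalues of an SPD matrix (the eigenvalues of its matrix log)."""
    eig = sym_eigvals(spd)
    if eig[-1] <= 0.0:
        raise ValueError("matrix must be symmetric positive definite")
    return [math.log(v) for v in eig[:k]]


def spectral_log_euclidean_distance(a, b) -> float:
    """Alignment-free lower bound on ||log A - log B||_F for equal sizes.

    Hypothetical choice: when sizes differ, compare only the top-k spectra,
    with k = min(size of A, size of B).
    """
    k = min(len(a), len(b))
    sa, sb = log_spectrum(a, k), log_spectrum(b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))
```

For example, `[[2, 1], [1, 2]]` and `diag(3, 1)` share the spectrum {3, 1}, so their distance is zero even though the matrices are not entrywise aligned.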

preprint · 2022 · arXiv

The Computational Limits of Deep Learning

Deep learning's recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image classification, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article catalogs the extent of this dependency, showing that progress across a wide variety of applications is strongly reliant on increases in computing power. Extrapolating this reliance forward reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally efficient methods, which will have to come either from changes to deep learning or from moving to other machine learning methods.

preprint · 2021 · arXiv

Entropic Causal Inference: Identifiability and Finite Sample Results

Entropic causal inference is a framework for inferring the causal direction between two categorical variables from observational data. The central assumption is that the amount of unobserved randomness in the system is not too large. This unobserved randomness is measured by the entropy of the exogenous variable in the underlying structural causal model, which governs the causal relation between the observed variables. Kocaoglu et al. conjectured that the causal direction is identifiable when the entropy of the exogenous variable is not too large. In this paper, we prove a variant of their conjecture. Namely, we show that for almost all causal models where the exogenous variable has entropy that does not scale with the number of states of the observed variables, the causal direction is identifiable from observational data. We also consider the minimum entropy coupling-based algorithmic approach presented by Kocaoglu et al., and for the first time demonstrate algorithmic identifiability guarantees using a finite number of samples. We conduct extensive experiments to evaluate the robustness of the method to relaxing some of the assumptions in our theory and demonstrate that both the constant-entropy exogenous variable and the no latent confounder assumptions can be relaxed in practice. We also empirically characterize the number of observational samples needed for causal identification. Finally, we apply the algorithm to the Tübingen cause-effect pairs dataset.
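The abstract refers to the minimum entropy coupling-based approach of Kocaoglu et al. As a rough sketch under stated assumptions: the conditionals P(Y | X = x) are treated as marginals of the exogenous variable, a greedy heuristic couples them (repeatedly remove the smallest of the current largest masses), and the direction with the lower total entropy H(cause) + H(exogenous) is preferred, which is one common form of the entropic criterion. This is an illustration only; the finite-sample thresholds analyzed in the paper are omitted, and the function names are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits, ignoring zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def greedy_min_entropy_coupling(marginals):
    """Greedy low-entropy coupling; returns the masses of the joint support."""
    rem = [sorted(m, reverse=True) for m in marginals]
    masses = []
    while True:
        mass = min(r[0] for r in rem)  # smallest of the current largest masses
        if mass <= 1e-12:
            break
        masses.append(mass)
        for r in rem:
            r[0] -= mass  # at least one entry becomes exactly zero per round
            r.sort(reverse=True)
    return masses


def direction_scores(joint):
    """joint[x][y] = P(X=x, Y=y). Returns (score X->Y, score Y->X) in bits; lower wins."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    y_given_x = [[v / px[i] for v in row] for i, row in enumerate(joint) if px[i] > 0]
    x_given_y = [[joint[i][j] / py[j] for i in range(len(joint))]
                 for j in range(len(py)) if py[j] > 0]
    s_xy = entropy_bits(px) + entropy_bits(greedy_min_entropy_coupling(y_given_x))
    s_yx = entropy_bits(py) + entropy_bits(greedy_min_entropy_coupling(x_given_y))
    return s_xy, s_yx
```

A symmetric joint table gives equal scores, so no direction is preferred; identifiability needs the asymmetry the paper's assumptions provide.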

preprint · 2020 · arXiv

Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

This paper studies convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating $P\ast\mathcal{N}_\sigma$, for $\mathcal{N}_\sigma\triangleq\mathcal{N}(0,\sigma^2 \mathrm{I}_d)$, by $\hat{P}_n\ast\mathcal{N}_\sigma$, where $\hat{P}_n$ is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV), Kullback-Leibler (KL) divergence, and $\chi^2$-divergence. We show that the approximation error under the TV distance and the 1-Wasserstein distance ($\mathsf{W}_1$) converges at rate $e^{O(d)}n^{-1/2}$, in remarkable contrast to the typical $n^{-1/d}$ rate for unsmoothed $\mathsf{W}_1$ when $d\ge 3$. For the KL divergence, the squared 2-Wasserstein distance ($\mathsf{W}_2^2$), and the $\chi^2$-divergence, the convergence rate is $e^{O(d)}n^{-1}$, but only if $P$ achieves finite input-output $\chi^2$ mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to $\omega(n^{-1})$ for the KL divergence and $\mathsf{W}_2^2$, while the $\chi^2$-divergence becomes infinite: a curious dichotomy. As a main application, we consider estimating the differential entropy $h(P\ast\mathcal{N}_\sigma)$ in the high-dimensional regime. The distribution $P$ is unknown, but $n$ i.i.d. samples from it are available. We first show that any good estimator of $h(P\ast\mathcal{N}_\sigma)$ must have sample complexity that is exponential in $d$. Using the empirical approximation results, we then show that the absolute-error risk of the plug-in estimator converges at the parametric rate $e^{O(d)}n^{-1/2}$, thus establishing the minimax rate-optimality of the plug-in estimator. Numerical results are provided that demonstrate the significant empirical superiority of the plug-in approach over general-purpose differential entropy estimators.
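The plug-in estimator in the abstract is $h(\hat{P}_n\ast\mathcal{N}_\sigma)$: the differential entropy of an equal-weight Gaussian mixture with one component per sample. That entropy has no closed form, so the sketch below evaluates it by Monte Carlo, which is an implementation assumption rather than something the abstract specifies. Samples are tuples of length d, the result is in nats, and `plugin_entropy` is a hypothetical name.

```python
import math
import random


def log_mixture_density(z, centers, sigma):
    """log of (P_hat_n * N_sigma)(z): an equal-weight Gaussian mixture at the samples."""
    d, n = len(z), len(centers)
    exps = [-sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / (2.0 * sigma ** 2) for c in centers]
    top = max(exps)  # log-sum-exp for numerical stability
    lse = top + math.log(sum(math.exp(e - top) for e in exps))
    return lse - math.log(n) - 0.5 * d * math.log(2.0 * math.pi * sigma ** 2)


def plugin_entropy(samples, sigma, num_mc=20000, seed=0):
    """Monte Carlo estimate (in nats) of h(P_hat_n * N_sigma).

    Draws Z = X_i + sigma * noise with X_i picked uniformly from the samples,
    then averages -log of the mixture density at Z.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_mc):
        center = rng.choice(samples)
        z = [ci + sigma * rng.gauss(0.0, 1.0) for ci in center]
        total -= log_mixture_density(z, samples, sigma)
    return total / num_mc
```

With a single sample at the origin the mixture is just $\mathcal{N}(0,\sigma^2\mathrm{I}_d)$, whose entropy $\tfrac{d}{2}\log(2\pi e\sigma^2)$ gives a convenient sanity check.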

preprint · 2020 · arXiv

Gaussian-Smooth Optimal Transport: Metric Structure and Statistical Efficiency

Optimal transport (OT), and in particular the Wasserstein distance, has seen a surge of interest and applications in machine learning. However, empirical approximation under Wasserstein distances suffers from a severe curse of dimensionality, rendering these distances impractical in high dimensions. As a result, entropically regularized OT has become a popular workaround. However, while it enjoys fast algorithms and better statistical properties, it loses the metric structure that Wasserstein distances enjoy. This work proposes a novel Gaussian-smoothed OT (GOT) framework that achieves the best of both worlds: it preserves the 1-Wasserstein metric structure while alleviating the curse of dimensionality in empirical approximation. Furthermore, as the Gaussian-smoothing parameter shrinks to zero, GOT $\Gamma$-converges towards classic OT (with convergence of optimizers), thus serving as a natural extension. An empirical study that supports the theoretical results is provided, promoting Gaussian-smoothed OT as a powerful alternative to entropic OT.
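Gaussian-smoothed W1 compares $P\ast\mathcal{N}_\sigma$ with $Q\ast\mathcal{N}_\sigma$ instead of $P$ with $Q$. A minimal one-dimensional sketch, assuming only sample access to $P$ and $Q$: draw from each smoothed empirical measure (a random sample plus Gaussian noise), then use the fact that on the real line W1 between two equal-size empirical measures is the mean gap between sorted samples. Reusing one seed on both sides (common random numbers) is a variance-reduction choice made here, not something taken from the paper; the function names are hypothetical.

```python
import random


def w1_empirical_1d(xs, ys):
    """W1 between two equal-size empirical measures on R: mean gap of sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("expected equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def smoothed_draws(samples, sigma, m, seed):
    """m draws from P_hat * N_sigma: a random sample plus Gaussian noise."""
    rng = random.Random(seed)
    return [rng.choice(samples) + sigma * rng.gauss(0.0, 1.0) for _ in range(m)]


def gaussian_smoothed_w1(p_samples, q_samples, sigma, m=5000, seed=0):
    """Monte Carlo estimate of W1(P_hat * N_sigma, Q_hat * N_sigma) on the real line.

    Both sides reuse the same seed (common random numbers) to reduce variance.
    """
    xs = smoothed_draws(p_samples, sigma, m, seed)
    ys = smoothed_draws(q_samples, sigma, m, seed)
    return w1_empirical_1d(xs, ys)
```

Shifting every sample of Q by a constant c shifts both smoothed measures' draws by c, so the estimate recovers c, consistent with GOT keeping the metric behavior of W1.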

preprint · 2016 · arXiv

Dynamic Metric Learning from Pairwise Comparisons

Recent work in distance metric learning has focused on learning transformations of data that best align with specified pairwise similarity and dissimilarity constraints, often supplied by a human observer. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we address the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or the feature subspaces in which the class structure is apparent. We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD), a general adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We apply the OCELAD framework to an ensemble of online learners. Specifically, we create a retro-initialized composite objective mirror descent (COMID) ensemble (RICE) consisting of a set of parallel COMID learners with different learning rates. We demonstrate RICE-OCELAD on both real and synthetic data sets and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms.

preprint · 2016 · arXiv

Kronecker STAP and SAR GMTI

As a high resolution radar imaging modality, SAR detects and localizes non-moving targets accurately, giving it an advantage over lower resolution GMTI radars. Moving target detection is more challenging due to target smearing and masking by clutter. Space-time adaptive processing (STAP) is often used on multiantenna SAR to remove the stationary clutter and enhance the moving targets. In (Greenewald et al., 2016) it was shown that the performance of STAP can be improved by modeling the clutter covariance as a space vs. time Kronecker product with low rank factors, providing robustness and reducing the number of training samples required. In this work, we present a massively parallel algorithm for implementing Kronecker product STAP, enabling application to very large SAR datasets (such as the 2006 Gotcha data collection) using GPUs. Finally, we develop an extension of Kronecker STAP that uses information from multiple passes to improve moving target detection.
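The abstract does not describe how the GPU implementation is organized. One standard reason Kronecker-structured operators parallelize well, offered as background rather than as the authors' design, is the identity $(A\otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^\top)$ for row-major vectorization: applying a space vs. time Kronecker operator to a data vector becomes two small matrix products, which map naturally onto batched GPU kernels. This pure-Python sketch uses hypothetical names; the explicit `kron` exists only to check `kron_matvec`.

```python
def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Explicit Kronecker product (for checking only; its size is (m*p) x (n*q))."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def kron_matvec(a, b, x):
    """(A kron B) x computed as vec(A X B^T), where X is x reshaped row-major to n x q."""
    n, q = len(a[0]), len(b[0])
    X = [x[i * q:(i + 1) * q] for i in range(n)]
    Y = matmul(matmul(a, X), transpose(b))
    return [v for row in Y for v in row]  # flatten row-major
```

The factored form never materializes the (mp) x (nq) matrix, so memory and arithmetic scale with the factor sizes instead of their product.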

preprint · 2016 · arXiv

Nonstationary Distance Metric Learning

Recent work in distance metric learning has focused on learning transformations of data that best align with provided sets of pairwise similarity and dissimilarity constraints. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we introduce the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or changes to the feature subspaces in which the class structure is apparent. We propose and evaluate COMID-SADL, an adaptive, online approach for learning and tracking optimal metrics as they change over time that is highly robust to a variety of nonstationary behaviors in the changing metric. We demonstrate COMID-SADL on both real and synthetic data sets and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms.
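COMID (composite objective mirror descent) takes a gradient step on the loss and handles the regularizer in closed form. The sketch below shows a single COMID learner under assumptions that are not taken from the abstract: a diagonal Mahalanobis metric `w`, a hinge loss on each labeled pair with a hypothetical threshold `b`, the Euclidean mirror map, and an l1 penalty plus nonnegativity, for which the composite step reduces to "shift, then clip at zero". The adaptive layer that COMID-SADL builds on top of such learners to track a changing metric is not shown.

```python
def comid_step(w, x, z, y, eta, lam, b=1.0):
    """One COMID update for a diagonal Mahalanobis metric (hypothetical loss).

    y = +1 for a similar pair, -1 for a dissimilar pair. The loss is
    max(0, 1 - y * (b - d_w(x, z))) with d_w(x, z) = sum_i w_i (x_i - z_i)^2,
    so similar pairs are pushed toward d <= b - 1 and dissimilar ones toward d >= b + 1.
    """
    sq = [(xi - zi) ** 2 for xi, zi in zip(x, z)]
    d = sum(wi * s for wi, s in zip(w, sq))
    if 1.0 - y * (b - d) <= 0.0:
        grad = [0.0] * len(w)  # margin satisfied: no loss gradient
    else:
        grad = [y * s for s in sq]
    # Euclidean mirror map + lam * ||w||_1 + (w >= 0): closed-form shift-and-clip.
    return [max(0.0, wi - eta * gi - eta * lam) for wi, gi in zip(w, grad)]
```

Feeding a stream of labeled pairs through repeated calls tracks the metric online; the learning rate `eta` sets how quickly old constraints are forgotten, which is the knob the ensemble-based variants vary.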

preprint · 2016 · arXiv

Robust SAR STAP via Kronecker Decomposition

This paper proposes a spatio-temporal decomposition for the detection of moving targets in multiantenna SAR. As a high resolution radar imaging modality, SAR detects and localizes non-moving targets accurately, giving it an advantage over lower resolution GMTI radars. Moving target detection is more challenging due to target smearing and masking by clutter. Space-time adaptive processing (STAP) is often used to remove the stationary clutter and enhance the moving targets. In this work, it is shown that the performance of STAP can be improved by modeling the clutter covariance as a space vs. time Kronecker product with low rank factors. Based on this model, a low-rank Kronecker product covariance estimation algorithm is proposed, and a novel separable clutter cancelation filter based on the Kronecker covariance estimate is introduced. The proposed method provides orders of magnitude reduction in the required number of training samples, as well as improved robustness to corruption of the training data. Simulation results and experiments using the Gotcha SAR GMTI challenge dataset are presented that confirm the advantages of our approach relative to existing techniques.

preprint · 2015 · arXiv

Kronecker PCA Based Robust SAR STAP

In this work, the detection of moving targets in multiantenna SAR is considered. As a high resolution radar imaging modality, SAR detects and identifies stationary targets very well, giving it an advantage over classical GMTI radars. Moving target detection is more challenging due to the "burying" of moving targets in the clutter and is often achieved using space-time adaptive processing (STAP) (based on learning filters from the spatio-temporal clutter covariance) to remove the stationary clutter and enhance the moving targets. In this work, it is observed that, in addition to the frequently noted low-rank structure, the clutter covariance also naturally takes the form of a space vs. time Kronecker product with low-rank factors. A low-rank KronPCA covariance estimation algorithm is proposed to exploit this structure, and a separable clutter cancelation filter based on the Kronecker covariance estimate is proposed. Together, these provide orders of magnitude reduction in the number of training samples required, as well as improved robustness to corruption of the training data, e.g. due to outliers and moving targets. Theoretical properties of the proposed estimation algorithm are derived, and the significant reductions in training complexity are established under the spherically invariant random vector (SIRV) model. Finally, an extension of this approach incorporating multipass data (change detection) is presented. Simulation results and experiments using the real Gotcha SAR GMTI challenge dataset are presented that confirm the advantages of our approach relative to existing techniques.

preprint · 2015 · arXiv

Robust Kronecker Product PCA for Spatio-Temporal Covariance Estimation

Kronecker PCA involves the use of a space vs. time Kronecker product decomposition to estimate spatio-temporal covariances. In this work the addition of a sparse correction factor is considered, which corresponds to a model of the covariance as a sum of Kronecker products of low (separation) rank and a sparse matrix. This sparse correction extends the diagonally corrected Kronecker PCA of [Greenewald et al 2013, 2014] to allow for sparse unstructured "outliers" anywhere in the covariance matrix, e.g. arising from variables or correlations that do not fit the Kronecker model well, or from sources such as sensor noise or sensor failure. We introduce a robust PCA-based algorithm to estimate the covariance under this model, extending the rearranged nuclear norm penalized LS Kronecker PCA approaches of [Greenewald et al 2014, Tsiligkaridis et al 2013]. An extension to Toeplitz temporal factors is also provided, producing a parameter reduction for temporally stationary measurement modeling. High dimensional MSE performance bounds are given for these extensions. Finally, the proposed extension of KronPCA is evaluated on both simulated and real data coming from yeast cell cycle experiments. This establishes the practical utility of robust Kronecker PCA in biological and other applications.

preprint · 2014 · arXiv

Detection of Anomalous Crowd Behavior Using Spatio-Temporal Multiresolution Model and Kronecker Sum Decompositions

In this work we consider the problem of detecting anomalous spatio-temporal behavior in videos. Our approach is to learn the normative multiframe pixel joint distribution and detect deviations from it using a likelihood based approach. Due to the extreme lack of available training samples relative to the dimension of the distribution, we use a mean and covariance approach and consider methods of learning the spatio-temporal covariance in the low-sample regime. Our approach is to estimate the covariance using parameter reduction and sparse models. The first method considered is the representation of the covariance as a sum of Kronecker products as in (Greenewald et al 2013), which is found to be an accurate approximation in this setting. We propose learning algorithms relevant to our problem. We then consider the sparse multiresolution model of (Choi et al 2010) and apply the Kronecker product methods to it for further parameter reduction, as well as introducing modifications for enhanced efficiency and greater applicability to spatio-temporal covariance matrices. We apply our methods to the detection of crowd behavior anomalies in the University of Minnesota crowd anomaly dataset, and achieve competitive results.
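The abstract scores frames by likelihood under a learned mean and spatio-temporal covariance. Assuming a zero-mean Gaussian whose covariance is a single Kronecker product $A\otimes B$ (the paper uses sums of Kronecker products, which do not factor this simply), the negative log-likelihood can be computed from the small factors alone, using $(A\otimes B)^{-1}=A^{-1}\otimes B^{-1}$ and $\log\det(A\otimes B)=q\log\det A+p\log\det B$ for $A$ of size $p\times p$ and $B$ of size $q\times q$. Higher values flag more anomalous frames. All names are hypothetical, and `dense_gaussian_nll` exists only to check the shortcut.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def chol_solve(low, rhs):
    """Solve (L L^T) x = rhs by forward then backward substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def logdet(low):
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))


def kron_gaussian_nll(x, a, b):
    """-log N(x; 0, A kron B); x has length p*q and is reshaped row-major to X (p x q)."""
    p, q = len(a), len(b)
    la, lb = cholesky(a), cholesky(b)
    X = [x[i * q:(i + 1) * q] for i in range(p)]
    z_cols = [chol_solve(la, [X[i][j] for i in range(p)]) for j in range(q)]  # Z = A^-1 X
    Y = [chol_solve(lb, [z_cols[j][i] for j in range(q)]) for i in range(p)]  # Y = Z B^-1
    quad = sum(X[i][j] * Y[i][j] for i in range(p) for j in range(q))
    return 0.5 * (quad + q * logdet(la) + p * logdet(lb) + p * q * math.log(2.0 * math.pi))


def kron(a, b):
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def dense_gaussian_nll(x, cov):
    """Reference: the same NLL computed from the full covariance."""
    low = cholesky(cov)
    quad = sum(xi * si for xi, si in zip(x, chol_solve(low, x)))
    return 0.5 * (quad + logdet(low) + len(x) * math.log(2.0 * math.pi))
```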

preprint · 2014 · arXiv

Regularized Block Toeplitz Covariance Matrix Estimation via Kronecker Product Expansions

In this work we consider the estimation of spatio-temporal covariance matrices in the low sample non-Gaussian regime. We impose covariance structure in the form of a sum of Kronecker products decomposition (Tsiligkaridis et al. 2013, Greenewald et al. 2013) with diagonal correction (Greenewald et al.), which we refer to as DC-KronPCA, in the estimation of multiframe covariance matrices. This paper extends the approaches of (Tsiligkaridis et al.) in two directions. First, we modify the diagonally corrected method of (Greenewald et al.) to include a block Toeplitz constraint imposing temporal stationarity structure. Second, we improve the conditioning of the estimate in the very low sample regime by using Ledoit-Wolf type shrinkage regularization similar to (Chen, Hero et al. 2010). For improved robustness to heavy tailed distributions, we modify the KronPCA to incorporate robust shrinkage estimation (Chen, Hero et al. 2011). Results of numerical simulations establish benefits in terms of estimation MSE when compared to previous methods. Finally, we apply our methods to a real-world network spatio-temporal anomaly detection problem and achieve superior results.
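Two ingredients named in the abstract can be sketched in isolation: projecting a matrix onto Toeplitz structure (the Frobenius-nearest Toeplitz matrix averages each diagonal) and shrinking a sample covariance toward a scaled identity. The shrinkage weight below follows the oracle approximating shrinkage (OAS) form of Chen et al. (2010); whether the paper uses exactly this weight, and how it is combined with the Kronecker factors and the block Toeplitz constraint, is not specified here, so treat this as an assumption-laden illustration with hypothetical function names.

```python
def toeplitz_project(m):
    """Frobenius-nearest Toeplitz matrix: replace each diagonal by its average."""
    n = len(m)
    avg = {}
    for k in range(-(n - 1), n):
        vals = [m[i][i + k] for i in range(n) if 0 <= i + k < n]
        avg[k] = sum(vals) / len(vals)
    return [[avg[j - i] for j in range(n)] for i in range(n)]


def oas_shrink(s, n):
    """Shrink sample covariance s (from n samples) toward (tr(s)/p) * I.

    Returns (shrunk matrix, rho). The target has the same trace as s, so the
    trace is preserved for any rho in [0, 1].
    """
    p = len(s)
    tr = sum(s[i][i] for i in range(p))
    tr_sq = sum(s[i][j] * s[j][i] for i in range(p) for j in range(p))  # tr(s^2)
    num = (1.0 - 2.0 / p) * tr_sq + tr ** 2
    den = (n + 1.0 - 2.0 / p) * (tr_sq - tr ** 2 / p)
    rho = 1.0 if den <= 0.0 else min(1.0, num / den)
    mu = tr / p
    shrunk = [[(1.0 - rho) * s[i][j] + (rho * mu if i == j else 0.0) for j in range(p)]
              for i in range(p)]
    return shrunk, rho
```

Shrinkage mainly helps the very low sample regime mentioned in the abstract: when n is small relative to p the weight moves toward 1 and the estimate stays well conditioned.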

preprint · 2013 · arXiv

Kronecker Sum Decompositions of Space-Time Data

In this paper we consider the use of the space vs. time Kronecker product decomposition in the estimation of covariance matrices for spatio-temporal data. This decomposition imposes lower dimensional structure on the estimated covariance matrix, thus reducing the number of samples required for estimation. To allow a smooth tradeoff between the reduction in the number of parameters (to reduce estimation variance) and the accuracy of the covariance approximation (affecting estimation bias), we introduce a diagonally loaded modification of the sum of Kronecker products representation [1]. We derive a Cramér-Rao bound (CRB) on the minimum attainable mean squared predictor coefficient estimation error for unbiased estimators of Kronecker structured covariance matrices. We illustrate the accuracy of the diagonally loaded Kronecker sum decomposition by applying it to video data of human activity.
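The sum-of-Kronecker-products representation rests on a rearrangement trick (Van Loan and Pitsianis): reorder a $pq\times pq$ covariance into a $p^2\times q^2$ matrix $R$ whose rows are the vectorized $q\times q$ blocks. Then $\Sigma=A\otimes B$ exactly when $R=\mathrm{vec}(A)\,\mathrm{vec}(B)^\top$, the Frobenius-nearest single Kronecker product comes from the top singular pair of $R$, and keeping r singular pairs gives an r-term sum. The sketch below computes one term with power iteration; the diagonal loading introduced in the paper is not shown, and the function names are hypothetical.

```python
import math


def kron(a, b):
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]


def rearrange(sigma, p, q):
    """Row (i, j) of R holds block (i, j) of sigma, flattened row-major."""
    return [[sigma[i * q + k][j * q + l] for k in range(q) for l in range(q)]
            for i in range(p) for j in range(p)]


def nearest_kronecker(sigma, p, q, iters=200):
    """Factors A (p x p), B (q x q) minimizing ||sigma - A kron B||_F.

    Power iteration on R^T R for the top right singular vector; the all-ones
    start vector is assumed not to be orthogonal to it.
    """
    r = rearrange(sigma, p, q)
    v = [1.0] * (q * q)
    for _ in range(iters):
        u = [sum(rc * vc for rc, vc in zip(row, v)) for row in r]          # u = R v
        v = [sum(r[i][c] * u[i] for i in range(len(r))) for c in range(q * q)]  # v = R^T u
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
    u = [sum(rc * vc for rc, vc in zip(row, v)) for row in r]  # R v = sigma_1 * u_1
    a = [[u[i * p + j] for j in range(p)] for i in range(p)]
    b = [[v[k * q + l] for l in range(q)] for k in range(q)]
    return a, b
```

The split of scale between the returned factors is arbitrary; only their Kronecker product is identified.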
