Source author record

Jeff Calder

Jeff Calder appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, math.AP, math.NA, Computer Vision, Numerical Analysis, math.PR, math.OC, Computer Science and Game Theory, eess.IV, math.ST, Statistics Theory, Artificial Intelligence, Databases, Information Retrieval, math-ph, math.CO, math.MP, math.SP, Quantitative Methods

Catalog footprint

What is connected

22 works

19 topics

4 close collaborators

preprint, 2025, arXiv

Dynamical feedback control with operator learning for the Vlasov-Poisson system

To meet the demands of instantaneous control of instabilities over long time horizons in plasma fusion, we design a dynamic feedback control strategy for the Vlasov-Poisson system by constructing an operator that maps state perturbations to an external control field. In the first part of the paper, we propose learning such an operator using a neural network. Inspired by optimal control theory for linearized dynamics, we introduce a low-rank neural operator architecture and train it via the adjoint state method. The resulting controller is effective at suppressing instabilities well beyond the training time horizon. To generalize control across varying initial data, we further introduce a novel cancellation-based control strategy that removes the destabilizing component of the electric field. This approach naturally defines an operator without requiring any training, ensures perturbation decay over infinite time, and demonstrates strong robustness under noisy feedback. Numerical experiments confirm the effectiveness of the method in both one- and multidimensional settings.

preprint, 2022, arXiv

Analysis and algorithms for $\ell_p$-based semi-supervised learning on graphs

This paper addresses theory and applications of $\ell_p$-based Laplacian regularization in semi-supervised learning. The graph $p$-Laplacian for $p>2$ has been proposed recently as a replacement for the standard ($p=2$) graph Laplacian in semi-supervised learning problems with very few labels, where Laplacian learning is degenerate. In the first part of the paper we prove new discrete to continuum convergence results for $p$-Laplace problems on $k$-nearest neighbor ($k$-NN) graphs, which are more commonly used in practice than random geometric graphs. Our analysis shows that, on $k$-NN graphs, the $p$-Laplacian retains information about the data distribution as $p\to \infty$ and Lipschitz learning ($p=\infty$) is sensitive to the data distribution. This situation can be contrasted with random geometric graphs, where the $p$-Laplacian forgets the data distribution as $p\to \infty$. We also present a general framework for proving discrete to continuum convergence results in graph-based learning that only requires pointwise consistency and monotonicity. In the second part of the paper, we develop fast algorithms for solving the variational and game-theoretic $p$-Laplace equations on weighted graphs for $p>2$. We present several efficient and scalable algorithms for both formulations, and present numerical results on synthetic data indicating their convergence properties. Finally, we conduct extensive numerical experiments on the MNIST, FashionMNIST and EMNIST datasets that illustrate the effectiveness of the $p$-Laplacian formulation for semi-supervised learning with few labels. In particular, we find that Lipschitz learning ($p=\infty$) performs well with very few labels on $k$-NN graphs, which experimentally validates our theoretical findings that Lipschitz learning retains information about the data distribution (the unlabeled data) on $k$-NN graphs.

preprint, 2022, arXiv

Boundary Estimation from Point Clouds: Algorithms, Guarantees and Applications

We investigate identifying the boundary of a domain from sample points in the domain. We introduce new estimators for the normal vector to the boundary, distance of a point to the boundary, and a test for whether a point lies within a boundary strip. The estimators can be efficiently computed and are more accurate than the ones present in the literature. We provide rigorous error estimates for the estimators. Furthermore we use the detected boundary points to solve boundary-value problems for PDE on point clouds. We prove error estimates for the Laplace and eikonal equations on point clouds. Finally we provide a range of numerical experiments illustrating the performance of our boundary estimators, applications to PDE on point clouds, and tests on image data sets.

preprint, 2022, arXiv

Graph-based Active Learning for Semi-supervised Classification of SAR Data

We present a novel method for classification of Synthetic Aperture Radar (SAR) data by combining ideas from graph-based learning and neural network methods within an active learning framework. Graph-based methods in machine learning are based on a similarity graph constructed from the data. When the data consists of raw images composed of scenes, extraneous information can make the classification task more difficult. In recent years, neural network methods have been shown to provide a promising framework for extracting patterns from SAR images. These methods, however, require ample training data to avoid overfitting. At the same time, such training data are often unavailable for applications of interest, such as automatic target recognition (ATR) and SAR data. We use a Convolutional Neural Network Variational Autoencoder (CNNVAE) to embed SAR data into a feature space, and then construct a similarity graph from the embedded data and apply graph-based semi-supervised learning techniques. The CNNVAE feature embedding and graph construction require no labeled data, which reduces overfitting and improves the generalization performance of graph learning at low label rates. Furthermore, the method easily incorporates a human-in-the-loop for active learning in the data-labeling process. We present promising results and compare them to other standard machine learning methods on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset for ATR with small amounts of labeled data.

preprint, 2022, arXiv

Hamilton-Jacobi equations on graphs with applications to semi-supervised learning and data depth

Shortest path graph distances are widely used in data science and machine learning, since they can approximate the underlying geodesic distance on the data manifold. However, the shortest path distance is highly sensitive to the addition of corrupted edges in the graph, either through noise or an adversarial perturbation. In this paper we study a family of Hamilton-Jacobi equations on graphs that we call the $p$-eikonal equation. We show that the $p$-eikonal equation with $p=1$ is a provably robust distance-type function on a graph, and the $p\to \infty$ limit recovers shortest path distances. While the $p$-eikonal equation does not correspond to a shortest-path graph distance, we nonetheless show that the continuum limit of the $p$-eikonal equation on a random geometric graph recovers a geodesic density weighted distance in the continuum. We consider applications of the $p$-eikonal equation to data depth and semi-supervised learning, and use the continuum limit to prove asymptotic consistency results for both applications. Finally, we show the results of experiments with data depth and semi-supervised learning on real image datasets, including MNIST, FashionMNIST and CIFAR-10, which show that the $p$-eikonal equation offers significantly better results compared to shortest path distances.

preprint, 2022, arXiv

Rates of convergence for the continuum limit of nondominated sorting

Nondominated sorting is a discrete process that sorts points in Euclidean space according to the coordinatewise partial order, and is used to rank feasible solutions to multiobjective optimization problems. It was previously shown that nondominated sorting of random points has a Hamilton-Jacobi equation continuum limit. We prove quantitative error estimates for the convergence of nondominated sorting to its continuum limit Hamilton-Jacobi equation. Our proof uses the maximum principle and viscosity solution machinery, along with new semiconvexity estimates for domains with corner singularities.

preprint, 2022, arXiv

Use and Misuse of Machine Learning in Anthropology

Machine learning (ML), being now widely accessible to the research community at large, has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we will focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms and lack of expertise on their proper use among the anthropological research community has led to foundational misapplications that have appeared throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but produce potentially faulty understandings about our human evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; as well as an absence of transparency due to the lack of data/code sharing, and the subsequent limitations imposed on independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.

preprint, 2022, arXiv

Using machine learning on new feature sets extracted from 3D models of broken animal bones to classify fragments according to break agent

Distinguishing agents of bone modification at paleoanthropological sites is at the root of much of the research directed at understanding early hominin exploitation of large animal resources and the effects those subsistence behaviors had on early hominin evolution. However, current methods, particularly in the area of fracture pattern analysis as a signal of marrow exploitation, have failed to overcome equifinality. Furthermore, researchers debate the replicability and validity of current and emerging methods for analyzing bone modifications. Here we present a new approach to fracture pattern analysis aimed at distinguishing bone fragments resulting from hominin bone breakage and those produced by carnivores. This new method uses 3D models of fragmentary bone to extract a much richer dataset that is more transparent and replicable than feature sets previously used in fracture pattern analysis. Supervised machine learning algorithms are properly used to classify bone fragments according to agent of breakage with average mean accuracy of 77% across tests.

preprint, 2021, arXiv

A continuum limit for the PageRank algorithm

Semi-supervised and unsupervised machine learning methods often rely on graphs to model data, prompting research on how theoretical properties of operators on graphs are leveraged in learning problems. While most of the existing literature focuses on undirected graphs, directed graphs are very important in practice, giving models for physical, biological, or transportation networks, among many other applications. In this paper, we propose a new framework for rigorously studying continuum limits of learning algorithms on directed graphs. We use the new framework to study the PageRank algorithm, and show how it can be interpreted as a numerical scheme on a directed graph involving a type of normalized graph Laplacian. We show that the corresponding continuum limit problem, which is taken as the number of webpages grows to infinity, is a second-order, possibly degenerate, elliptic equation that contains reaction, diffusion, and advection terms. We prove that the numerical scheme is consistent and stable and compute explicit rates of convergence of the discrete solution to the solution of the continuum limit PDE. We give applications to proving stability and asymptotic regularity of the PageRank vector. Finally, we illustrate our results with numerical experiments and explore an application to data depth.
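As a point of reference for the abstract above, here is a minimal sketch of the classical PageRank power iteration on a tiny directed graph. It is the textbook formulation, not the numerical-scheme interpretation developed in the paper. The damping factor `alpha = 0.85`, the uniform treatment of dangling pages, and the four-page example graph are all assumptions chosen for illustration.

```python
def pagerank(links, alpha=0.85, iters=100):
    """Power iteration for PageRank.

    links: dict mapping each page to the list of pages it links to.
    alpha: damping factor (assumed value, not taken from the abstract).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Teleportation term: every page receives (1 - alpha) / n.
        new = {p: (1.0 - alpha) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # Split the damped rank of p evenly among its out-links.
                share = alpha * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its damped rank uniformly.
                for q in pages:
                    new[q] += alpha * rank[p] / n
        rank = new
    return rank


# Hypothetical example: a directed 3-cycle a -> b -> c -> a, plus a page d
# that links to a but receives no links itself.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
rank = pagerank(links)
```

In this example the ranks sum to one, page `d` keeps only the teleportation mass, and page `a` ranks highest because it collects links from both `c` and `d`.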

preprint, 2020, arXiv

Asymptotically optimal strategies for online prediction with history-dependent experts

We establish sharp asymptotically optimal strategies for the problem of online prediction with history dependent experts. The prediction problem is played (in part) over a discrete graph called the $d$ dimensional de Bruijn graph, where $d$ is the number of days of history used by the experts. Previous work [11] established $O(\varepsilon)$ optimal strategies for $n=2$ experts and $d\leq 4$ days of history, while [10] established $O(\varepsilon^{1/3})$ optimal strategies for all $n\geq 2$ and all $d\geq 1$, where the game is played for $N$ steps and $\varepsilon=N^{-1/2}$. In this paper, we show that the optimality conditions over the de Bruijn graph correspond to a graph Poisson equation, and we establish $O(\varepsilon)$ optimal strategies for all values of $n$ and $d$.

preprint, 2020, arXiv

Improved spectral convergence rates for graph Laplacians on epsilon-graphs and k-NN graphs

In this paper we improve the spectral convergence rates for graph-based approximations of Laplace-Beltrami operators constructed from random data. We utilize regularity of the continuum eigenfunctions and strong pointwise consistency results to prove that spectral convergence rates are the same as the pointwise consistency rates for graph Laplacians. In particular, for an optimal choice of the graph connectivity $\varepsilon$, our results show that the eigenvalues and eigenvectors of the graph Laplacian converge to those of the Laplace-Beltrami operator at a rate of $O(n^{-1/(m+4)})$, up to log factors, where $m$ is the manifold dimension and $n$ is the number of vertices in the graph. Our approach is general and allows us to analyze a large variety of graph constructions that include $\varepsilon$-graphs and $k$-NN graphs.

preprint, 2020, arXiv

Online Prediction With History-Dependent Experts: The General Case

We study the problem of prediction of binary sequences with expert advice in the online setting, which is a classic example of online machine learning. We interpret the binary sequence as the price history of a stock, and view the predictor as an investor, which converts the problem into a stock prediction problem. In this framework, an investor, who predicts the daily movements of a stock, and an adversarial market, who controls the stock, play against each other over $N$ turns. The investor combines the predictions of $n\geq 2$ experts in order to make a decision about how much to invest at each turn, and aims to minimize their regret with respect to the best-performing expert at the end of the game. We consider the problem with history-dependent experts, in which each expert uses the previous $d$ days of history of the market in making their predictions. We prove that the value function for this game, rescaled appropriately, converges as $N\to \infty$ at a rate of $O(N^{-1/6})$ to the viscosity solution of a nonlinear degenerate elliptic PDE, which can be understood as the Hamilton-Jacobi-Isaacs equation for the two-person game. As a result, we are able to deduce asymptotically optimal strategies for the investor. Our results extend those established by the first author and R. V. Kohn [13] for $n=2$ experts and $d\leq 4$ days of history. To appear in Communications on Pure and Applied Mathematics.

preprint, 2020, arXiv

Poisson Learning: Graph Based Semi-Supervised Learning At Very Low Label Rates

We propose a new framework, called Poisson learning, for graph based semi-supervised learning at very low label rates. Poisson learning is motivated by the need to address the degeneracy of Laplacian semi-supervised learning in this regime. The method replaces the assignment of label values at training points with the placement of sources and sinks, and solves the resulting Poisson equation on the graph. The outcomes are provably more stable and informative than those of Laplacian learning. Poisson learning is efficient and simple to implement, and we present numerical experiments showing the method is superior to other recent approaches to semi-supervised learning at low label rates on MNIST, FashionMNIST, and CIFAR-10. We also propose a graph-cut enhancement of Poisson learning, called Poisson MBO, that gives higher accuracy and can incorporate prior knowledge of relative class sizes.

preprint, 2020, arXiv

Rates of Convergence for Laplacian Semi-Supervised Learning with Low Labeling Rates

We study graph-based Laplacian semi-supervised learning at low labeling rates. Laplacian learning uses harmonic extension on a graph to propagate labels. At very low label rates, Laplacian learning becomes degenerate and the solution is roughly constant with spikes at each labeled data point. Previous work has shown that this degeneracy occurs when the number of labeled data points is finite while the number of unlabeled data points tends to infinity. In this work we allow the number of labeled data points to grow to infinity with the number of unlabeled data points. Our results show that for a random geometric graph with length scale $\varepsilon>0$ and labeling rate $\beta>0$, if $\beta\ll\varepsilon^2$ then the solution becomes degenerate and spikes form, and if $\beta\gg \varepsilon^2$ then Laplacian learning is well-posed and consistent with a continuum Laplace equation. Furthermore, in the well-posed setting we prove quantitative error estimates of $O(\varepsilon\beta^{-1/2})$ for the difference between the solutions of the discrete problem and continuum PDE, up to logarithmic factors. We also study $p$-Laplacian regularization and show the same degeneracy result when $\beta\ll \varepsilon^p$. The proofs of our well-posedness results use the random walk interpretation of Laplacian learning and PDE arguments, while the proofs of the ill-posedness results use $\Gamma$-convergence tools from the calculus of variations. We also present numerical results on synthetic and real data to illustrate our results.
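To make "harmonic extension on a graph" concrete, the following is a minimal sketch of Laplacian learning on a small weighted graph. Labeled nodes keep their values and every unlabeled node is repeatedly set to the weighted average of its neighbors. The Gauss-Seidel solver, the function name `laplace_learning`, and the five-node path graph are illustrative assumptions and are not taken from the paper.

```python
def laplace_learning(weights, labels, iters=200):
    """Harmonic extension of label values on a weighted graph.

    weights: dict node -> dict neighbor -> positive weight (symmetric).
    labels:  dict node -> label value, held fixed during the sweeps.
    """
    # Start from the labels where given and from zero elsewhere.
    u = {v: labels.get(v, 0.0) for v in weights}
    for _ in range(iters):
        for v, nbrs in weights.items():
            if v in labels:
                continue  # labeled nodes act as boundary conditions
            # Harmonic condition: u(v) is the weighted mean of its neighbors.
            u[v] = sum(w * u[x] for x, w in nbrs.items()) / sum(nbrs.values())
    return u


# Hypothetical example: path graph 0-1-2-3-4 with unit weights,
# labeled 0.0 at node 0 and 1.0 at node 4.
path = {i: {j: 1.0 for j in (i - 1, i + 1) if 0 <= j <= 4} for i in range(5)}
u = laplace_learning(path, {0: 0.0, 4: 1.0})
```

On this path graph the harmonic extension is linear interpolation between the two labeled endpoints, so the interior values approach 0.25, 0.5 and 0.75.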

preprint, 2015, arXiv

A direct verification argument for the Hamilton-Jacobi equation continuum limit of nondominated sorting

Nondominated sorting is a combinatorial algorithm that sorts points in Euclidean space into layers according to a partial order. It was recently shown that nondominated sorting of random points has a Hamilton-Jacobi equation continuum limit. The original proof relies on a continuum variational problem. In this paper, we give a new proof using a direct verification argument that completely avoids the variational interpretation. We believe this proof is new in the homogenization literature, and may be generalized to apply to other stochastic homogenization problems for which there is no obvious underlying variational principle.

preprint, 2015, arXiv

Multi-criteria Similarity-based Anomaly Detection using Pareto Depth Analysis

We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g. as measured by nearest neighbor Euclidean distances between a test sample and the training samples. In many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing using a non-negative linear combination of them. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets.

preprint, 2015, arXiv

Numerical schemes and rates of convergence for the Hamilton-Jacobi equation continuum limit of nondominated sorting

Nondominated sorting arranges a set of points in Euclidean space into layers by repeatedly removing the coordinatewise minimal elements. It was recently shown that nondominated sorting of random points has a Hamilton-Jacobi equation continuum limit. The obvious numerical scheme for this PDE has a slow convergence rate of $O(h^{1/n})$ for a grid of spacing $h>0$ in dimension $n$. In this paper, we introduce two new numerical schemes that have formal rates of $O(h)$ and we prove the usual $O(h^{1/2})$ theoretical rates. We also present the results of numerical simulations illustrating the difference between the formal and theoretical rates.
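The peeling procedure described in the first sentence of this abstract can be sketched directly. The code below is a naive quadratic-time illustration of the discrete sorting process only. It is not one of the numerical schemes for the continuum PDE, and the function name and sample points are hypothetical.

```python
def nondominated_sort(points):
    """Peel a finite set of points into layers of coordinatewise minimal elements.

    points: iterable of equal-length tuples of numbers.
    Returns a list of layers; layer 0 is the first set of minimal elements.
    """
    def dominates(a, b):
        # a dominates b if a <= b in every coordinate and a != b.
        return a != b and all(x <= y for x, y in zip(a, b))

    remaining = list(dict.fromkeys(points))  # drop duplicates, keep order
    layers = []
    while remaining:
        # Minimal elements: points not dominated by any remaining point.
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(front)
        remaining = [p for p in remaining if p not in front]
    return layers


# Hypothetical example in the plane.
points = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4), (2, 5)]
layers = nondominated_sort(points)
# layers == [[(1, 4), (2, 2), (4, 1)], [(3, 3), (2, 5)], [(4, 4)]]
```

Each pass removes the current set of minimal elements, so the index of the layer containing a point is its rank in the sort.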

preprint, 2014, arXiv

Directed last passage percolation with discontinuous weights

We prove that a directed last passage percolation model with discontinuous macroscopic (non-random) inhomogeneities has a continuum limit that corresponds to solving a Hamilton-Jacobi equation in the viscosity sense. This Hamilton-Jacobi equation is closely related to the conservation law for the hydrodynamic limit of the totally asymmetric simple exclusion process. We also prove convergence of a numerical scheme for the Hamilton-Jacobi equation and present an algorithm based on dynamic programming for finding the asymptotic shapes of maximal directed paths.

preprint, 2014, arXiv

Pareto-depth for Multiple-query Image Retrieval

Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method (PFM) with efficient manifold ranking (EMR). We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.

preprint, 2013, arXiv

A Hamilton-Jacobi equation for the continuum limit of non-dominated sorting

We show that non-dominated sorting of a sequence of i.i.d. random variables in Euclidean space has a continuum limit that corresponds to solving a Hamilton-Jacobi equation involving the probability density function of the random variables. Non-dominated sorting is a fundamental problem in multi-objective optimization, and is equivalent to finding the canonical antichain partition and to problems involving the longest chain among Euclidean points. As an application of this result, we show that non-dominated sorting is asymptotically stable under random perturbations in the data. We give a numerical scheme for computing the viscosity solution of this Hamilton-Jacobi equation and present some numerical simulations for various density functions.

preprint, 2013, arXiv

A PDE-based approach to non-dominated sorting

Non-dominated sorting is a fundamental combinatorial problem in multiobjective optimization, and is equivalent to the longest chain problem in combinatorics and random growth models for crystals in materials science. In a previous work, we showed that non-dominated sorting has a continuum limit that corresponds to solving a Hamilton-Jacobi equation. In this work we present and analyze a fast numerical scheme for this Hamilton-Jacobi equation, and show how it can be used to design a fast algorithm for approximate non-dominated sorting.

preprint, 2013, arXiv

Multi-criteria Anomaly Detection using Pareto Depth Analysis

We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.
