Source author record

Jason Xu

Jason Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology, Machine Learning, Populations and Evolution, Computation, math.OC, math.PR, math.ST, physics.soc-ph, Statistics Theory

Catalog footprint

What is connected

9 works

9 topics

4 close collaborators


Preprint · 2022 · arXiv

A unified analysis of convex and non-convex lp-ball projection problems

The task of projecting onto $\ell_p$ norm balls is ubiquitous in statistics and machine learning, yet the availability of actionable algorithms for doing so is largely limited to the special cases of $p \in \left\{ 0, 1, 2, \infty \right\}$. In this paper, we introduce novel, scalable methods for projecting onto the $\ell_p$ ball for general $p > 0$. For $p \geq 1$, we solve the univariate Lagrangian dual via a dual Newton method. We then carefully design a bisection approach for $p < 1$, presenting theoretical and empirical evidence of zero or a small duality gap in the non-convex case. The success of our contributions is thoroughly assessed empirically, and applied to large-scale regularized multi-task learning and compressed sensing.
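As a rough illustration of the convex case only, the sketch below projects a vector onto an $\ell_p$ ball for $p > 1$ by searching over a single Lagrange multiplier. It is not the dual Newton method of the paper: plain bisection is swapped in for both the outer multiplier search and the inner coordinate-wise solve, and the names `project_lp_ball` and `_shrink` are hypothetical.

```python
import math

def _shrink(a, lam, p, iters=60):
    """Solve u + lam * p * u**(p - 1) = a for u in [0, a] by bisection (a >= 0, p > 1)."""
    lo, hi = 0.0, a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + lam * p * mid ** (p - 1) > a:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def project_lp_ball(y, p, radius=1.0, iters=60):
    """Euclidean projection of y onto {x : sum |x_i|**p <= radius**p}, assuming p > 1."""
    target = radius ** p
    if sum(abs(v) ** p for v in y) <= target:
        return list(y)  # already inside the ball

    def norm_pow(lam):
        # p-th power of the l_p norm of the candidate solution for multiplier lam
        return sum(_shrink(abs(v), lam, p) ** p for v in y)

    lam_lo, lam_hi = 0.0, 1.0
    while norm_pow(lam_hi) > target:  # grow the bracket until the constraint holds
        lam_hi *= 2.0
    for _ in range(iters):            # bisection on the multiplier
        lam = 0.5 * (lam_lo + lam_hi)
        if norm_pow(lam) > target:
            lam_lo = lam
        else:
            lam_hi = lam
    lam = 0.5 * (lam_lo + lam_hi)
    return [math.copysign(_shrink(abs(v), lam, p), v) for v in y]

x = project_lp_ball([3.0, 4.0], p=2)  # for p = 2 this should match rescaling y to unit norm
```

For $p = 2$ the result can be checked against the closed form $y \cdot r / \|y\|_2$; for other $p > 1$ the sketch only guarantees that the returned point sits on the ball's boundary up to bisection tolerance.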

Preprint · 2022 · arXiv

Bregman Power k-Means for Clustering Exponential Family Data

Recent progress in center-based clustering algorithms combats poor local minima by implicit annealing, using a family of generalized means. These methods are variations of Lloyd's celebrated $k$-means algorithm, and are most appropriate for spherical clusters such as those arising from Gaussian data. In this paper, we bridge these algorithmic advances to classical work on hard clustering under Bregman divergences, which enjoy a bijection to exponential family distributions and are thus well-suited for clustering objects arising from a breadth of data generating mechanisms. The elegant properties of Bregman divergences allow us to maintain closed form updates in a simple and transparent algorithm, and moreover lead to new theoretical arguments for establishing finite sample bounds that relax the bounded support assumption made in the existing state of the art. Additionally, we consider thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings.
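The classical Bregman hard clustering that the abstract builds on can be sketched in a few lines. This is the Lloyd-style baseline, not the proposed Bregman power k-means: points are assigned by a Bregman divergence, and the center update remains the arithmetic mean, which is the closed-form property the abstract alludes to. The one-dimensional setting, the choice of the generalized KL divergence (paired with the Poisson family), and all function names are assumptions made for illustration.

```python
import math

def poisson_divergence(x, mu):
    """Generalized KL divergence d(x, mu) = x*log(x/mu) - x + mu, for x >= 0 and mu > 0."""
    if x == 0:
        return mu
    return x * math.log(x / mu) - x + mu

def bregman_hard_clustering(data, centers, divergence, iters=20):
    """Lloyd-style iterations: assign by divergence, then update each center to its cluster mean."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in data:
            j = min(range(len(centers)), key=lambda c: divergence(x, centers[c]))
            clusters[j].append(x)
        # For any Bregman divergence, the optimal center of a cluster is its arithmetic mean.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers

counts = [1, 2, 1, 2, 20, 22, 21, 19]
centers = bregman_hard_clustering(counts, [1.0, 10.0], poisson_divergence)
# centers ends at [1.5, 20.5]
```

Swapping `poisson_divergence` for squared Euclidean distance recovers ordinary k-means, which is one way to see why the Bregman view covers a wider range of data-generating mechanisms.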

Preprint · 2022 · arXiv

Quasi-Newton acceleration of EM and MM algorithms via Broyden's method

The principle of majorization-minimization (MM) provides a general framework for eliciting effective algorithms to solve optimization problems. However, MM algorithms often suffer from slow convergence, especially in large-scale and high-dimensional data settings. This has drawn attention to acceleration schemes designed exclusively for MM algorithms, but many existing designs are either problem-specific or rely on approximations and heuristics loosely inspired by the optimization literature. We propose a novel, rigorous quasi-Newton method for accelerating any valid MM algorithm, cast as seeking a fixed point of the MM \textit{algorithm map}. The method does not require specific information or computation from the objective function or its gradient and enjoys a limited-memory variant amenable to efficient computation in high-dimensional settings. By connecting our approach to Broyden's classical root-finding methods, we establish convergence guarantees and identify conditions for linear and super-linear convergence. These results are validated numerically and compared to peer methods in a thorough empirical study, showing that the method achieves state-of-the-art performance across a diverse range of problems.
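The fixed-point framing can be sketched as follows: treat the algorithm map $F$ as a black box and apply Broyden's classical root-finding update to $G(x) = F(x) - x$. This is a minimal dense-matrix sketch under stated assumptions, not the authors' scheme or its limited-memory variant; the function name, the initial inverse-Jacobian guess $-I$, and the toy linear map standing in for an MM update are all hypothetical.

```python
def broyden_fixed_point(F, x0, tol=1e-10, max_iter=100):
    """Seek x with F(x) = x by Broyden's method applied to G(x) = F(x) - x."""
    n = len(x0)
    x = list(x0)
    # H approximates the inverse Jacobian of G. Starting from -I makes the
    # first step identical to one plain application of the map F.
    H = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [fi - xi for fi, xi in zip(F(x), x)]
    for _ in range(max_iter):
        if max(abs(v) for v in g) < tol:
            break
        step = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        x_new = [xi + si for xi, si in zip(x, step)]
        g_new = [fi - xi for fi, xi in zip(F(x_new), x_new)]
        dg = [a - b for a, b in zip(g_new, g)]
        H_dg = [sum(H[i][j] * dg[j] for j in range(n)) for i in range(n)]
        sT_H = [sum(step[i] * H[i][j] for i in range(n)) for j in range(n)]
        denom = sum(step[i] * H_dg[i] for i in range(n))
        if abs(denom) > 1e-14:
            # Rank-one (Sherman-Morrison) update of the inverse Jacobian estimate.
            for i in range(n):
                coef = (step[i] - H_dg[i]) / denom
                for j in range(n):
                    H[i][j] += coef * sT_H[j]
        x, g = x_new, g_new
    return x

# Toy contraction standing in for an MM update; its fixed point is (15, 10).
slow_map = lambda x: [0.9 * x[0] + 0.05 * x[1] + 1.0, 0.8 * x[1] + 2.0]
x_star = broyden_fixed_point(slow_map, [0.0, 0.0])
```

Only evaluations of `F` are needed, which mirrors the abstract's point that no objective or gradient information is required. A practical version would add safeguards (for example, falling back to a plain `F` step when the accelerated step fails to improve the objective), which this sketch omits.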

Preprint · 2020 · arXiv

Entropy Regularized Power k-Means Clustering

Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the \textit{power $k$-means} algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justification and fails in high dimensions when many features are irrelevant. This paper addresses these issues by introducing \textit{entropy regularization} to learn feature relevance while annealing. We prove consistency of the proposed approach and derive a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees. In particular, our method retains the same computational complexity as $k$-means and power $k$-means, but yields significant improvements over both. Its merits are thoroughly assessed on a suite of real and synthetic data experiments.
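The annealing idea behind power $k$-means can be illustrated with the power mean alone. The sketch assumes the objective replaces the minimum over per-center distances with a power mean $M_s$ and drives $s \to -\infty$; the entropy-regularized feature weights introduced in this paper are not shown, and the function name is hypothetical.

```python
def power_mean(values, s):
    """Power mean M_s(y) = (mean of y_i**s) ** (1/s), for positive y_i and s != 0."""
    k = len(values)
    return (sum(v ** s for v in values) / k) ** (1.0 / s)

# Distances from one point to each of three centers (made-up numbers).
distances = [1.0, 4.0, 9.0]

# As s decreases, M_s moves from the arithmetic mean (s = 1) toward min(distances),
# so the smoothed objective approaches the usual k-means objective.
schedule = [1, -1, -3, -9, -27]
smoothed = [power_mean(distances, s) for s in schedule]
```

Each value in `smoothed` is no larger than the one before it and none falls below `min(distances)`, which is the sense in which the objective surfaces are annealed toward the $k$-means objective.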

Preprint · 2020 · arXiv

Likelihood-based Inference for Partially Observed Epidemics on Dynamic Networks

We propose a generative model and an inference scheme for epidemic processes on dynamic, adaptive contact networks. Network evolution is formulated as a link-Markovian process, which is then coupled to an individual-level stochastic SIR model, in order to describe the interplay between epidemic dynamics on a network and network link changes. A Markov chain Monte Carlo framework is developed for likelihood-based inference from partial epidemic observations, with a novel data augmentation algorithm specifically designed to deal with missing individual recovery times under the dynamic network setting. Through a series of simulation experiments, we demonstrate the validity and flexibility of the model as well as the efficacy and efficiency of the data augmentation inference scheme. The model is also applied to a recent real-world dataset on influenza-like-illness transmission with high-resolution social contact tracking records.
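The generative side of such a model can be sketched with a Gillespie-style simulation in which link changes and epidemic events compete in continuous time. This is a simplified sketch, not the paper's model: it assumes constant link activation and deletion rates that do not depend on disease status, starts from an empty network with one infective, and omits the inference scheme entirely. All names and parameters are hypothetical.

```python
import random

def simulate_sir_on_dynamic_network(n, beta, gamma, link_on, link_off, t_max, seed=0):
    """Simulate a stochastic SIR epidemic coupled to a Markovian link process."""
    rng = random.Random(seed)
    status = ["I"] + ["S"] * (n - 1)  # assumption: one initial infective
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    active = set()                     # currently active links
    t, events = 0.0, []
    while True:
        infected = [i for i in range(n) if status[i] == "I"]
        si_links = sorted(e for e in active
                          if {status[e[0]], status[e[1]]} == {"S", "I"})
        inactive = [p for p in pairs if p not in active]
        rates = [beta * len(si_links), gamma * len(infected),
                 link_on * len(inactive), link_off * len(active)]
        total = sum(rates)
        if total == 0.0:
            break
        t += rng.expovariate(total)    # waiting time to the next event
        if t > t_max:
            break
        kind = rng.choices(["infect", "recover", "link_on", "link_off"], weights=rates)[0]
        if kind == "infect":           # transmission along an active S-I link
            i, j = rng.choice(si_links)
            status[i if status[i] == "S" else j] = "I"
        elif kind == "recover":
            status[rng.choice(infected)] = "R"
        elif kind == "link_on":
            active.add(rng.choice(inactive))
        else:
            active.remove(rng.choice(sorted(active)))
        events.append((t, kind))
    return status, events

final_status, event_log = simulate_sir_on_dynamic_network(
    n=10, beta=0.5, gamma=0.2, link_on=0.1, link_off=0.1, t_max=5.0, seed=1)
```

In the partially observed setting the abstract describes, some of the `"recover"` times in `event_log` would be missing, which is what the data augmentation step is designed to impute.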

Preprint · 2015 · arXiv

Efficient Transition Probability Computation for Continuous-Time Branching Processes via Compressed Sensing

Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or countably infinite. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring a large integration step to impute over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multitype branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for hematopoiesis and transposable element evolution.
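The baseline generating function technique can be sketched on the simplest branching process. The example assumes a single-type Yule (pure-birth) process started from one individual, whose probability generating function is known in closed form, and recovers transition probabilities by evaluating the PGF at all $N$ roots of unity and inverting with a discrete Fourier transform. The compressed sensing step of the paper, which would use far fewer evaluations by exploiting sparsity, is not shown; function names are hypothetical.

```python
import cmath
import math

def yule_pgf(s, rate, t):
    """PGF of the population size at time t for a Yule process started from one individual."""
    p = math.exp(-rate * t)
    return p * s / (1.0 - (1.0 - p) * s)

def transition_probs_from_pgf(pgf, n_points):
    """Recover p_0, ..., p_{N-1} from PGF values on the unit circle via an inverse DFT."""
    values = [pgf(cmath.exp(2j * math.pi * m / n_points)) for m in range(n_points)]
    probs = []
    for k in range(n_points):
        coeff = sum(v * cmath.exp(-2j * math.pi * m * k / n_points)
                    for m, v in enumerate(values)) / n_points
        probs.append(coeff.real)
    return probs

# Probabilities of moving from 1 individual to k individuals by time t = 0.5.
probs = transition_probs_from_pgf(lambda s: yule_pgf(s, rate=1.0, t=0.5), n_points=64)
```

Here `probs[k]` can be compared with the geometric form $e^{-\lambda t}(1 - e^{-\lambda t})^{k-1}$ for $k \geq 1$. The cost grows with the number of evaluation points $N$, which is the bottleneck the abstract says becomes prohibitive for large populations.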

Preprint · 2014 · arXiv

Likelihood-Based Inference for Discretely Observed Birth-Death-Shift Processes, with Applications to Evolution of Mobile Genetic Elements

Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization (EM) algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply more broadly to multi-type branching processes where rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of the IS6110 transposable element, an element frequently used when estimating epidemiological clusters of Mycobacterium tuberculosis infections.

Preprint · 2014 · arXiv

Stochastic Variational Inference for Hidden Markov Models

Variational inference algorithms have proven successful for Bayesian analysis in large data settings, with recent advances using stochastic variational inference (SVI). However, such methods have largely been studied in independent or exchangeable data settings. We develop an SVI algorithm to learn the parameters of hidden Markov models (HMMs) in a time-dependent data setting. The challenge in applying stochastic optimization in this setting arises from dependencies in the chain, which must be broken to consider minibatches of observations. We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects. We demonstrate the effectiveness of our algorithm on synthetic experiments and a large genomics dataset where a batch algorithm is computationally infeasible.

Preprint · 2010 · arXiv

Bounds on the artificial phase transition for perfect simulation of repulsive point processes

Repulsive point processes arise in models where competition forces entities to be more spread apart than if placed independently. Simulation of these types of processes can be accomplished using dominated coupling from the past with a running time that varies as the intensity of the number of points. These algorithms usually exhibit what is called an artificial phase transition, where below a critical intensity the algorithm runs in finite expected time, but above the critical intensity the expected number of steps is infinite. Here the artificial phase transition is examined. In particular, an earlier lower bound on this artificial phase transition is improved by including a new type of term in the analysis. In addition, the results of computer experiments to locate the transition are presented.
