Source author record

Yao Xie

Yao Xie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; math.ST; Statistics Theory; Information Theory; math.IT; Methodology; Applications; math.AT; stat.OT; Data Structures and Algorithms; eess.SP; eess.SY; Human-Computer Interaction; math.CO; math.OC; Numerical Analysis; Social and Information Networks; Systems and Control

Catalog footprint

What is connected

47 works

18 topics

4 close collaborators

preprint 2026 arXiv

Online Kernel CUSUM for Change-Point Detection

We present a computationally efficient online kernel Cumulative Sum (CUSUM) method for change-point detection that utilizes the maximum over a set of kernel statistics to account for the unknown change-point location. Our approach exhibits increased sensitivity to small changes compared to existing kernel-based change-point detection methods, including the Scan-B statistic, which corresponds to a non-parametric Shewhart chart-type procedure. We provide accurate analytic approximations for two key performance metrics: the Average Run Length (ARL) and Expected Detection Delay (EDD), which enable us to establish that the optimal window length is on the order of the logarithm of the ARL to ensure minimal power loss relative to an oracle procedure with infinite memory. Moreover, we introduce a recursive calculation procedure for detection statistics to ensure constant computational and memory complexity, which is essential for online implementation. Through extensive experiments on both simulated and real data, we demonstrate the competitive performance of our method and validate our theoretical results.
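
The constant-memory recursion that CUSUM-type procedures rely on is easiest to see in the classical parametric CUSUM. The sketch below assumes a Gaussian mean shift with known pre- and post-change means; it is not the kernel statistic of the paper, and the function name `cusum_alarm_time`, the parameters, and the threshold are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def cusum_alarm_time(
    xs: Sequence[float],
    mu0: float,
    mu1: float,
    sigma: float,
    threshold: float,
) -> Optional[int]:
    """Return the first index at which the CUSUM statistic exceeds threshold.

    Classical CUSUM for a shift in the mean of N(mu, sigma^2) data from
    mu0 to mu1. Only the running statistic is stored, so memory is O(1).
    Returns None if no alarm is raised.
    """
    stat = 0.0
    for t, x in enumerate(xs):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma**2
        # Recursive update: reset to zero whenever the evidence turns negative.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return t
    return None
```

In practice the threshold would be calibrated so that the average run length under no change meets a target; here it is simply passed in.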

preprint 2026 arXiv

Point processes with event time uncertainty

Point processes are widely used statistical models for continuous-time discrete event data, such as medical records, crime reports, and social network interactions, to capture the influence of historical events on future occurrences. In many applications, however, event times are not observed exactly, motivating the need to incorporate time uncertainty into point process modeling. In this work, we introduce a framework for modeling time-uncertain self-exciting point processes, known as Hawkes processes, possibly defined over a network. We begin by formulating the model in continuous time under assumptions motivated by real-world scenarios. By imposing a time grid, we obtain a discrete-time model that facilitates inference and enables computation via first-order optimization methods such as gradient descent and variational inequality (VI). We establish a parameter recovery guarantee for VI inference with an $O(1/k)$ convergence rate using $k$ steps. Our framework accommodates non-stationary processes by representing the influence kernel as a matrix (or tensor on a network), while also encompassing stationary processes, such as the classical Hawkes process, as a special case. Empirically, we demonstrate that the proposed approach outperforms existing baselines on both simulated and real-world datasets, including the sepsis-associated derangement prediction challenge and the Atlanta Police Crime Dataset.
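
For orientation, the stationary special case mentioned above (the classical Hawkes process) has a conditional intensity that can be sketched in a few lines. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are assumptions made for this example; the sketch uses exactly observed event times and does not model the time uncertainty that the paper addresses.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    event_times: Sequence[float],
    mu: float,
    alpha: float,
    beta: float,
) -> float:
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = mu + sum over past events s < t of alpha * exp(-beta * (t - s)),
    where mu is the baseline rate and each past event adds a decaying boost.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )
    return mu + excitation
```

With no past events the intensity equals the baseline `mu`; each earlier event raises it by `alpha` immediately afterwards, decaying at rate `beta`.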

preprint 2024 arXiv

Transfer Learning for Causal Effect Estimation

We present a Transfer Causal Learning (TCL) framework when target and source domains share the same covariate/feature spaces, aiming to improve causal effect estimation accuracy in limited data. Limited data is very common in medical applications, where some rare medical conditions, such as sepsis, are of interest. Our proposed method, named \texttt{$\ell_1$-TCL}, incorporates $\ell_1$ regularized TL for nuisance models (e.g., propensity score model); the TL estimator of the nuisance parameters is plugged into downstream average causal/treatment effect estimators (e.g., inverse probability weighted estimator). We establish non-asymptotic recovery guarantees for the \texttt{$\ell_1$-TCL} with generalized linear model (GLM) under the sparsity assumption in the high-dimensional setting, and demonstrate the empirical benefits of \texttt{$\ell_1$-TCL} through extensive numerical simulation for GLM and recent neural network nuisance models. Our method is subsequently extended to real data and generates meaningful insights consistent with medical literature, a case where all baseline methods fail.
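
The plug-in step described above can be illustrated with the inverse probability weighted (IPW) estimator of the average treatment effect. This is a minimal sketch that takes already-estimated propensity scores as input; how those scores are fitted (including the $\ell_1$-regularized transfer step) is outside its scope, and the function name `ipw_ate` is hypothetical.

```python
from typing import Sequence


def ipw_ate(
    outcomes: Sequence[float],
    treatments: Sequence[int],
    propensities: Sequence[float],
) -> float:
    """Inverse probability weighted estimate of the average treatment effect.

    propensities[i] is an estimate of P(treatment = 1 | covariates of unit i)
    and must lie strictly between 0 and 1. treatments[i] is 0 or 1.
    """
    n = len(outcomes)
    if n == 0 or not (n == len(treatments) == len(propensities)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, a, e in zip(outcomes, treatments, propensities):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie in (0, 1)")
        # Treated units are weighted by 1/e, control units by 1/(1 - e).
        total += a * y / e - (1 - a) * y / (1.0 - e)
    return total / n
```

Propensity scores close to 0 or 1 make the weights very large, which is one reason the accuracy of the nuisance model matters when data are limited.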

preprint 2022 arXiv

A Data-Driven Approach to Robust Hypothesis Testing Using Sinkhorn Uncertainty Sets

Hypothesis testing for small-sample scenarios is a practically important problem. In this paper, we investigate the robust hypothesis testing problem in a data-driven manner, where we seek the worst-case detector over distributional uncertainty sets centered around the empirical distribution from samples using Sinkhorn distance. Compared with the Wasserstein robust test, the corresponding least favorable distributions are supported beyond the training samples, which provides a more flexible detector. Various numerical experiments are conducted on both synthetic and real datasets to validate the competitive performances of our proposed method.

preprint 2022 arXiv

Bayesian Uncertainty Quantification for Low-Rank Matrix Completion

We consider the problem of uncertainty quantification for an unknown low-rank matrix $\mathbf{X}$, given a partial and noisy observation of its entries. This quantification of uncertainty is essential for many real-world problems, including image processing, satellite imaging, and seismology, providing a principled framework for validating scientific conclusions and guiding decision-making. However, existing literature has mainly focused on the completion (i.e., point estimation) of the matrix $\mathbf{X}$, with little work on investigating its uncertainty. To this end, we propose in this work a new Bayesian modeling framework, called BayeSMG, which parametrizes the unknown $\mathbf{X}$ via its underlying row and column subspaces. This Bayesian subspace parametrization enables efficient posterior inference on matrix subspaces, which represents interpretable phenomena in many applications. This can then be leveraged for improved matrix recovery. We demonstrate the effectiveness of BayeSMG over existing Bayesian matrix recovery methods in numerical experiments, image inpainting, and a seismic sensor network application.

preprint 2022 arXiv

Conformal prediction set for time-series

When building either prediction intervals for regression (with real-valued response) or prediction sets for classification (with categorical responses), uncertainty quantification is essential to studying complex machine learning methods. In this paper, we develop Ensemble Regularized Adaptive Prediction Set (ERAPS) to construct prediction sets for time-series (with categorical responses), based on the prior work of [Xu and Xie, 2021]. In particular, we allow unknown dependencies to exist within features and responses that arrive in sequence. Method-wise, ERAPS is a distribution-free and ensemble-based framework that is applicable for arbitrary classifiers. Theoretically, we bound the coverage gap without assuming data exchangeability and show asymptotic set convergence. Empirically, we demonstrate valid marginal and conditional coverage by ERAPS, which also tends to yield smaller prediction sets than competing methods.
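
As background for what a prediction set for classification looks like, here is a minimal sketch of plain split conformal prediction with the score 1 - p(label). It is not ERAPS: it assumes exchangeable calibration data and uses neither ensembles nor regularized adaptive scores, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Each calibration score is 1 - p(true label) on a held-out example.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every label.
        return math.inf
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs: Dict[str, float], threshold: float) -> List[str]:
    """Labels whose nonconformity score 1 - p(label) is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]
```

A smaller `alpha` raises the threshold and therefore enlarges the prediction sets, trading set size for coverage.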

preprint 2022 arXiv

Distributionally Robust Weighted $k$-Nearest Neighbors

Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has been focused on developing $k$-nearest neighbor ($k$-NN) based algorithms combined with metric learning that captures similarities between samples. When the samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors, which aims to find the optimal weighted $k$-NN classifiers that hedge against feature uncertainties. We develop an algorithm, \texttt{Dr.k-NN}, that efficiently solves this functional optimization problem and features in assigning minimax optimal weights to training samples when performing classification. These weights are class-dependent, and are determined by the similarities of sample features under the least favorable scenarios. When the size of the uncertainty set is properly tuned, the robust classifier has a smaller Lipschitz norm than the vanilla $k$-NN, and thus improves the generalization capability. We also couple our framework with neural-network-based feature embedding. We demonstrate the competitive performance of our algorithm compared to the state-of-the-art in the few-training-sample setting with various real-data experiments.

preprint 2022 arXiv

Learning Sinkhorn divergences for supervised change point detection

Many modern applications require detecting change points in complex sequential data. Most existing methods for change point detection are unsupervised and, as a consequence, lack any information regarding what kind of changes we want to detect or if some kinds of changes are safe to ignore. This often results in poor change detection performance. We present a novel change point detection framework that uses true change point instances as supervision for learning a ground metric such that Sinkhorn divergences can be then used in two-sample tests on sliding windows to detect change points in an online manner. Our method can be used to learn a sparse metric which can be useful for both feature selection and interpretation in high-dimensional change point detection settings. Experiments on simulated as well as real world sequences show that our proposed method can substantially improve change point detection performance over existing unsupervised change point detection methods using only few labeled change point instances.

preprint 2022 arXiv

Neural Spectral Marked Point Processes

Self- and mutually-exciting point processes are popular models in machine learning and statistics for dependent discrete event data. To date, most existing models assume stationary kernels (including the classical Hawkes processes) and simple parametric models. Modern applications with complex event data require more general point process models that can incorporate contextual information of the events, called marks, besides the temporal and location information. Moreover, such applications often require non-stationary models to capture more complex spatio-temporal dependence. To tackle these challenges, a key question is to devise a versatile influence kernel in the point process model. In this paper, we introduce a novel and general neural network-based non-stationary influence kernel with high expressiveness for handling complex discrete events data while providing theoretical performance guarantees. We demonstrate the superior performance of our proposed method compared with the state-of-the-art on synthetic and real data.

preprint 2022 arXiv

PERCEPT: a new online change-point detection method using topological data analysis

Topological data analysis (TDA) provides a set of data analysis tools for extracting embedded topological structures from complex high-dimensional datasets. In recent years, TDA has been a rapidly growing field which has found success in a wide range of applications, including signal processing, neuroscience and network analysis. In these applications, the online detection of changes is of crucial importance, but this can be highly challenging since such changes often occur in a low-dimensional embedding within high-dimensional data streams. We thus propose a new method, called PERsistence diagram-based ChangE-PoinT detection (PERCEPT), which leverages the learned topological structure from TDA to sequentially detect changes. PERCEPT follows two key steps: it first learns the embedded topology as a point cloud via persistence diagrams, then applies a non-parametric monitoring approach for detecting changes in the resulting point cloud distributions. This yields a non-parametric, topology-aware framework which can efficiently detect online changes from high-dimensional data streams. We investigate the effectiveness of PERCEPT over existing methods in a suite of numerical experiments where the data streams have an embedded topological structure. We then demonstrate the usefulness of PERCEPT in two applications in solar flare monitoring and human gesture detection.

preprint 2022 arXiv

Sequential change-point detection for mutually exciting point processes over networks

We present a new CUSUM procedure for sequentially detecting a change-point in self- and mutually-exciting processes, a.k.a. Hawkes networks, using discrete event data. Hawkes networks have become a popular model for statistics and machine learning due to their capability in modeling irregularly observed data where the timing between events carries a lot of information. The problem of detecting abrupt changes in Hawkes networks arises from various applications, including neuronal imaging, sensor networks, and social network monitoring. Despite this, there has not been a computationally and memory-efficient online algorithm for detecting such changes from sequential data. We present an efficient online recursive implementation of the CUSUM statistic for Hawkes processes, both decentralized and memory-efficient, and establish the theoretical properties of this new CUSUM procedure. We then show that the proposed CUSUM method achieves better performance than existing methods, including the Shewhart procedure based on count data, the generalized likelihood ratio (GLR) in the existing literature, and the standard score statistic. We demonstrate this via a simulated example and an application to population code change-detection in neuronal networks.

preprint 2022 arXiv

Solar Radiation Ramping Events Modeling Using Spatio-temporal Point Processes

Modeling and predicting solar events, particularly the solar ramping event, is critical for improving situational awareness for solar power generation systems. It has been acknowledged that weather conditions such as temperature, humidity, and cloud density can significantly impact the emergence and position of solar ramping events. As a result, modeling these events with complex spatio-temporal correlations is highly challenging. To tackle the question, we adopt a novel spatio-temporal categorical point process model, which intuitively and effectively addresses correlation and interaction among ramping events. We demonstrate the interpretability and predictive power of our model on extensive real-data experiments.

preprint 2022 arXiv

Two-sample Test with Kernel Projected Wasserstein Distance

We develop a kernel projected Wasserstein distance for the two-sample test, an essential building block in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. This method operates by finding the nonlinear mapping in the data space which maximizes the distance between projected distributions. In contrast to existing works about projected Wasserstein distance, the proposed method circumvents the curse of dimensionality more efficiently. We present practical algorithms for computing this distance function together with the non-asymptotic uncertainty quantification of empirical estimates. Numerical examples validate our theoretical results and demonstrate good performance of the proposed method.

preprint 2021 arXiv

Balanced Districting on Grid Graphs with Provable Compactness and Contiguity

Given a graph $G = (V,E)$ with vertex weights $w(v)$ and a desired number of parts $k$, the goal in graph partitioning problems is to partition the vertex set V into parts $V_1,\ldots,V_k$. Metrics for compactness, contiguity, and balance of the parts $V_i$ are frequent objectives, with much existing literature focusing on compactness and balance. Revisiting an old method known as striping, we give the first polynomial-time algorithms with guaranteed contiguity and provable bicriteria approximations for compactness and balance for planar grid graphs. We consider several types of graph partitioning, including when vertex weights vary smoothly or are stochastic, reflecting concerns in various real-world instances. We show significant improvements in experiments for balancing workloads for the fire department and reducing over-policing using 911 call data from South Fulton, GA.

preprint 2021 arXiv

Deep Fourier Kernel for Self-Attentive Point Processes

We present a novel attention-based model for discrete event data to capture complex non-linear temporal dependence structures. We borrow the idea from the attention mechanism and incorporate it into the point processes' conditional intensity function. We further introduce a novel score function using Fourier kernel embedding, whose spectrum is represented using neural networks, which drastically differs from the traditional dot-product kernel and can capture a more complex similarity structure. We establish our approach's theoretical properties and demonstrate our approach's competitive performance compared to the state-of-the-art for synthetic and real data.

preprint 2021 arXiv

Early Detection of COVID-19 Hotspots Using Spatio-Temporal Data

Recently, the Centers for Disease Control and Prevention (CDC) has worked with other federal agencies to identify counties with increasing coronavirus disease 2019 (COVID-19) incidence (hotspots) and offers support to local health departments to limit the spread of the disease. Understanding the spatio-temporal dynamics of hotspot events is of great importance to support policy decisions and prevent large-scale outbreaks. This paper presents a spatio-temporal Bayesian framework for early detection of COVID-19 hotspots (at the county level) in the United States. We assume both the observed number of cases and hotspots depend on a class of latent random variables, which encode the underlying spatio-temporal dynamics of the transmission of COVID-19. Such latent variables follow a zero-mean Gaussian process, whose covariance is specified by a non-stationary kernel function. The most salient feature of our kernel function is that deep neural networks are introduced to enhance the model's representative power while still enjoying the interpretability of the kernel. We derive a sparse model and fit the model using a variational learning strategy to circumvent the computational intractability for large data sets. Our model demonstrates better interpretability and superior hotspot-detection performance compared to other baseline methods.

preprint 2021 arXiv

Goodness-of-Fit Test for Mismatched Self-Exciting Processes

Recently there have been many research efforts in developing generative models for self-exciting point processes, partly due to their broad applicability for real-world applications. However, rarely can we quantify how well the generative model captures the nature or ground-truth since it is usually unknown. The challenge lies in the fact that the generative models typically provide, at most, good approximations to the ground-truth (e.g., through the rich representative power of neural networks), but they cannot be precisely the ground-truth. We thus cannot use the classic goodness-of-fit (GOF) test framework to evaluate their performance. In this paper, we develop a GOF test for generative models of self-exciting processes by making a new connection to this problem with the classical statistical theory of Quasi-maximum-likelihood estimator (QMLE). We present a non-parametric self-normalizing statistic for the GOF test: the Generalized Score (GS) statistic, and explicitly capture the model misspecification when establishing the asymptotic distribution of the GS statistic. Numerical simulation and real-data experiments validate our theory and demonstrate the proposed GS test's good performance.

preprint 2021 arXiv

Imitation Learning of Neural Spatio-Temporal Point Processes

We present a novel Neural Embedding Spatio-Temporal (NEST) point process model for spatio-temporal discrete event data and develop an efficient imitation learning (a type of reinforcement learning) based approach for model fitting. Despite the rapid development of one-dimensional temporal point processes for discrete event data, the study of spatial-temporal aspects of such data is relatively scarce. Our model captures complex spatio-temporal dependence between discrete events by carefully designing a mixture of heterogeneous Gaussian diffusion kernels, whose parameters are parameterized by neural networks. This new kernel is the key that allows our model to capture intricate spatial dependence patterns and yet still lead to interpretable results as we examine maps of Gaussian diffusion kernel parameters. The imitation learning model fitting for the NEST is more robust than the maximum likelihood estimate. It directly measures the divergence between the empirical distributions of the training data and the model-generated data. Moreover, our imitation learning-based approach enjoys computational efficiency due to the explicit characterization of the reward function related to the likelihood function; furthermore, the likelihood function under our model enjoys tractable expression due to Gaussian kernel parameterization. Experiments based on real data show our method's good performance relative to the state-of-the-art and the good interpretability of NEST's result.

preprint 2021 arXiv

Inferring serial correlation with dynamic backgrounds

Sequential data with serial correlation and an unknown, unstructured, and dynamic background is ubiquitous in neuroscience, psychology, and econometrics. Inferring serial correlation for such data is a fundamental challenge in statistics. We propose a total variation constrained least square estimator coupled with hypothesis tests to infer the serial correlation in the presence of unknown and unstructured dynamic background. The total variation constraint on the dynamic background encourages a piece-wise constant structure, which can approximate a wide range of dynamic backgrounds. The tuning parameter is selected via the Ljung-Box test to control the bias-variance trade-off. We establish a non-asymptotic upper bound for the estimation error through variational inequalities. We also derive a lower error bound via Fano's method and show the proposed method is near-optimal. Numerical simulation and a real study in psychology demonstrate the excellent performance of our proposed method compared with the state-of-the-art.

preprint 2021 arXiv

Online detection of cascading change-points

We propose an online detection procedure for cascading failures in a network from sequential data, which can be modeled as multiple correlated change-points happening during a short period. We consider a temporal diffusion network model to capture the temporal dynamic structure of multiple change-points and develop a sequential Shewhart procedure based on generalized likelihood ratio statistics under the diffusion network model, assuming unknown post-change distribution parameters. We also tackle the computational complexity posed by the unknown propagation. Numerical experiments demonstrate good performance in detecting cascading failures.

preprint 2021 arXiv

Online High-Dimensional Change-Point Detection using Topological Data Analysis

Topological Data Analysis (TDA) is a rapidly growing field, which studies methods for learning underlying topological structures present in complex data representations. TDA methods have found recent success in extracting useful geometric structures for a wide range of applications, including protein classification, neuroscience, and time-series analysis. However, in many such applications, one is also interested in sequentially detecting changes in this topological structure. We propose a new method called Persistence Diagram based Change-Point (PD-CP), which tackles this problem by integrating the widely-used persistence diagrams in TDA with recent developments in nonparametric change-point detection. The key novelty in PD-CP is that it leverages the distribution of points on persistence diagrams for online detection of topological changes. We demonstrate the effectiveness of PD-CP in an application to solar flare monitoring.

preprint 2021 arXiv

Optimality of Graph Scanning Statistic for Online Community Detection

Sequential change-point detection for graphs is a fundamental problem for streaming network data types and has wide applications in social networks and power systems. Given fixed vertices and a sequence of random graphs, the objective is to detect the change-point where the underlying distribution of the random graph changes. In particular, we focus on the local change that only affects a subgraph. We adopt the classical Erdős-Rényi model and revisit the generalized likelihood ratio (GLR) detection procedure. The scan statistic is computed by sequentially estimating the most-likely subgraph where the change happens. We provide theoretical analysis for the asymptotic optimality of the proposed procedure based on the GLR framework. We demonstrate the efficiency of our detection algorithm using simulations.

preprint 2021 arXiv

Sequential Change Detection by Optimal Weighted $\ell_2$ Divergence

We present a new non-parametric statistic, called the weighted $\ell_2$ divergence, based on empirical distributions for sequential change detection. We start by constructing the weighted $\ell_2$ divergence as a fundamental building block for two-sample tests and change detection. The proposed statistic is proved to attain the optimal sample complexity in the offline setting. We then study the sequential change detection using the weighted $\ell_2$ divergence and characterize the fundamental performance metrics, including the average run length (ARL) and the expected detection delay (EDD). We also present practical algorithms to find the optimal projection to handle high-dimensional data and the optimal weights, which is critical to quick detection since, in such settings, there are not many post-change samples. Simulation results and real data examples are provided to validate the good performance of the proposed method.
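
To make the basic quantity concrete, the sketch below computes a plug-in weighted $\ell_2$ distance between the empirical distributions of two samples over a discrete support. It is an illustration only: uniform weights are assumed by default, the paper's optimal weights, projections, and any bias correction are not reproduced, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def weighted_l2_divergence(
    xs: Sequence[Hashable],
    ys: Sequence[Hashable],
    weights: Optional[Dict[Hashable, float]] = None,
) -> float:
    """Plug-in weighted squared l2 distance between two empirical distributions.

    Computes the sum over symbols s of w_s * (p_hat(s) - q_hat(s))^2, where
    p_hat and q_hat are the empirical frequencies in xs and ys. Symbols
    missing from `weights` get weight 1.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    counts_x, counts_y = Counter(xs), Counter(ys)
    total = 0.0
    for s in set(counts_x) | set(counts_y):
        w = 1.0 if weights is None else weights.get(s, 1.0)
        diff = counts_x[s] / len(xs) - counts_y[s] / len(ys)
        total += w * diff * diff
    return total
```

In a sliding-window detector, `xs` would be a reference window and `ys` the most recent window, with an alarm raised when the value exceeds a calibrated threshold.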

preprint 2020 arXiv

CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis

The recent development of data-driven AI promises to automate medical diagnosis; however, most AI functions as 'black boxes' to physicians with limited computational knowledge. Using medical imaging as a point of departure, we conducted three iterations of design activities to formulate CheXplain---a system that enables physicians to explore and understand AI-enabled chest X-ray analysis: (1) a paired survey between referring physicians and radiologists reveals whether, when, and what kinds of explanations are needed; (2) a low-fidelity prototype co-designed with three physicians formulates eight key features; and (3) a high-fidelity prototype evaluated by another six physicians provides detailed summative insights on how each feature enables the exploration and understanding of AI. We summarize by discussing recommendations for future work to design and implement explainable medical AI systems that encompass four recurring themes: motivation, constraint, explanation, and justification.

preprint 2016 arXiv

Data-Driven Threshold Machine: Scan Statistics, Change-Point Detection, and Extreme Bandits

We present a novel distribution-free approach, the data-driven threshold machine (DTM), for a fundamental problem at the core of many learning tasks: choose a threshold for a given pre-specified level that bounds the tail probability of the maximum of a (possibly dependent but stationary) random sequence. We do not assume a data distribution, but rather rely on the asymptotic distribution of extremal values, and reduce the problem to estimating three parameters of the extreme value distributions and the extremal index. We take special care of data dependence by estimating the extremal index, since in many settings, such as scan statistics, change-point detection, and extreme bandits, dependence in the sequence of statistics can be significant. Key features of our DTM also include robustness and computational efficiency, and it only requires one sample path to form a reliable estimate of the threshold, in contrast to the Monte Carlo sampling approach, which requires drawing a large number of sample paths. We demonstrate the good performance of DTM via numerical examples in various dependent settings.

preprint 2016 arXiv

Detecting weak changes in dynamic events over networks

Large volumes of networked streaming event data are becoming increasingly available in a wide variety of applications, such as social network analysis, Internet traffic monitoring and healthcare analytics. Streaming event data are discrete observations occurring in continuous time, and the precise time interval between two events carries a great deal of information about the dynamics of the underlying systems. How can we promptly detect changes in these dynamic systems using streaming event data? In this paper, we propose a novel change-point detection framework for multi-dimensional event data over networks. We cast the problem as a sequential hypothesis test, and derive the likelihood ratios for point processes, which are computed efficiently via an EM-like algorithm that is parameter-free and can be computed in a distributed fashion. We derive a highly accurate theoretical characterization of the false-alarm-rate, and show that it can achieve weak signal detection by aggregating local statistics over time and networks. Finally, we demonstrate the good performance of our algorithm on numerical examples and real-world datasets from Twitter and Memetracker.

preprint 2016 arXiv

Dynamic change-point detection using similarity networks

From a sequence of similarity networks, with edges representing certain similarity measures between nodes, we are interested in detecting a change-point which changes the statistical property of the networks. After the change, a subset of anomalous nodes compares dissimilarly with the normal nodes. We study a simple sequential change detection procedure based on node-wise average similarity measures, and study its theoretical properties. Simulation and real-data examples demonstrate that such a simple stopping procedure has reasonably good performance. We further discuss the faulty sensor isolation (estimating anomalous nodes) using community detection.

preprint 2016 arXiv

Multi-Sensor Slope Change Detection

We develop a mixture procedure for multi-sensor systems to monitor data streams for a change-point that causes a gradual degradation to a subset of the streams. Observations are assumed to be initially normal random variables with known constant means and variances. After the change-point, observations in the subset will have increasing or decreasing means. The subset and the rates of change are unknown. Our procedure uses a mixture statistic, which assumes that each sensor is affected by the change-point with probability $p_0$. Analytic expressions are obtained for the average run length (ARL) and the expected detection delay (EDD) of the mixture procedure, which are demonstrated to be quite accurate numerically. We establish the asymptotic optimality of the mixture procedure. Numerical examples demonstrate the good performance of the proposed procedure. We also discuss an adaptive mixture procedure using empirical Bayes. This paper extends our earlier work on detecting an abrupt change-point that causes a mean-shift, by tackling the challenges posed by the non-stationarity of the slope-change problem.

preprint2016arXiv

Sequential Low-Rank Change Detection

Detecting the emergence of a low-rank signal from high-dimensional data is an important problem arising in many applications, such as camera surveillance and swarm monitoring using sensors. We consider a procedure based on the largest eigenvalue of the sample covariance matrix over a sliding window to detect the change. To achieve dimensionality reduction, we present a sketching-based approach for rank change detection using low-dimensional linear sketches of the original high-dimensional observations. The premise is that when the sketching matrix is a random Gaussian matrix, and the dimension of the sketching vector is sufficiently large, the rank of the sample covariance matrix for these sketches equals the rank of the original sample covariance matrix with high probability. Hence, we may be able to detect the low-rank change using sample covariance matrices of the sketches without having to recover the original covariance matrix. We characterize the performance of the largest eigenvalue statistic in terms of the false-alarm rate and the expected detection delay, and present an efficient online implementation via subspace tracking.
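The sliding-window statistic described in this abstract can be illustrated with a minimal sketch. It assumes zero-mean observations before the change (so no centering is applied) and a user-chosen threshold; the function names `largest_eigenvalue_stat` and `first_alarm` are hypothetical and are not taken from the paper. The same functions could be applied to sketched observations instead of the raw data.

```python
import numpy as np

def largest_eigenvalue_stat(window: np.ndarray) -> float:
    """Largest eigenvalue of the sample covariance of `window`.

    Rows of `window` are observations. Assumption: the data are zero-mean
    before the change, so the sample covariance is formed without centering.
    """
    cov = window.T @ window / window.shape[0]
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(cov)[-1])

def first_alarm(data: np.ndarray, w: int, threshold: float):
    """First time t at which the statistic over data[t-w:t] exceeds threshold.

    Returns None if the threshold is never crossed.
    """
    for t in range(w, data.shape[0] + 1):
        if largest_eigenvalue_stat(data[t - w:t]) > threshold:
            return t
    return None
```

In practice the threshold would be calibrated against a target false-alarm rate; here it is left as a free parameter.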

preprint2015arXiv

Categorical Matrix Completion

We consider the problem of completing a matrix with categorical-valued entries from partial observations. This is achieved by extending the formulation and theory of one-bit matrix completion. We recover a low-rank matrix $X$ by maximizing the likelihood ratio with a constraint on the nuclear norm of $X$, and the observations are mapped from entries of $X$ through multiple link functions. We establish theoretical upper and lower bounds on the recovery error, which meet up to a constant factor $\mathcal{O}(K^{3/2})$ where $K$ is the fixed number of categories. The upper bound in our case depends on the number of categories implicitly through a maximization of terms that involve the smoothness of the link functions. In contrast to one-bit matrix completion, our bounds for categorical matrix completion are optimal up to a factor on the order of the square root of the number of categories, which is consistent with an intuition that the problem becomes harder when the number of categories increases. By comparing the performance of our method with the conventional matrix completion method on the MovieLens dataset, we demonstrate the advantage of our method.

preprint2015arXiv

Online Supervised Subspace Tracking

We present a framework for supervised subspace tracking, where there are two time series $x_t$ and $y_t$, one being the high-dimensional predictors and the other being the response variables, and the subspace tracking needs to take both sequences into consideration. It extends the classic online subspace tracking work, which can be viewed as tracking of $x_t$ only. Our online sufficient dimensionality reduction (OSDR) is a meta-algorithm that can be applied to various cases including linear regression, logistic regression, multiple linear regression, multinomial logistic regression, support vector machine, the random dot product model and the multi-scale union-of-subspace model. OSDR reduces data dimensionality on-the-fly with low computational complexity, and it can also handle missing data and dynamic data. OSDR uses an alternating minimization scheme and updates the subspace via gradient descent on the Grassmannian manifold. The subspace update can be performed efficiently utilizing the fact that the Grassmannian gradient with respect to the subspace in many settings is rank-one (or low-rank in certain cases). The optimization problem for OSDR is non-convex and hard to analyze in general; we provide convergence analysis of OSDR in a simple linear regression setting. The good performance of OSDR compared with conventional unsupervised subspace tracking is demonstrated via numerical examples on simulated and real data.

preprint2015arXiv

Poisson Matrix Completion

We extend the theory of matrix completion to the case where we make Poisson observations for a subset of entries of a low-rank matrix. We consider the (now) usual matrix recovery formulation through maximum likelihood with proper constraints on the matrix $M$, and establish theoretical upper and lower bounds on the recovery error. Our bounds are nearly optimal up to a factor on the order of $\mathcal{O}(\log(d_1 d_2))$. These bounds are obtained by adapting the arguments used for one-bit matrix completion \cite{davenport20121} (although these two problems are different in nature) and the adaptation requires new techniques exploiting properties of the Poisson likelihood function and tackling the difficulties posed by the locally sub-Gaussian characteristic of the Poisson distribution. Our results highlight a few important distinctions of Poisson matrix completion compared to the prior work in matrix completion including having to impose a minimum signal-to-noise requirement on each observed entry. We also develop an efficient iterative algorithm and demonstrate its good performance in recovering solar flare images.

preprint2015arXiv

Poisson Matrix Recovery and Completion

We extend the theory of low-rank matrix recovery and completion to the case when Poisson observations for a linear combination or a subset of the entries of a matrix are available, which arises in various applications with count data. We consider the usual matrix recovery formulation through maximum likelihood with proper constraints on the matrix $M$ of size $d_1$-by-$d_2$, and establish theoretical upper and lower bounds on the recovery error. Our bounds for matrix completion are nearly optimal up to a factor on the order of $\mathcal{O}(\log(d_1 d_2))$. These bounds are obtained by combining techniques for compressed sensing for sparse vectors with Poisson noise and for analyzing low-rank matrices, as well as adapting the arguments used for one-bit matrix completion \cite{davenport20121} (although these two problems are different in nature). The adaptation requires new techniques exploiting properties of the Poisson likelihood function and tackling the difficulties posed by the locally sub-Gaussian characteristic of the Poisson distribution. Our results highlight a few important distinctions of the Poisson case compared to the prior work, including having to impose a minimum signal-to-noise requirement on each observed entry and a gap in the upper and lower bounds. We also develop a set of efficient iterative algorithms and demonstrate their good performance on synthetic examples and real data.

preprint2015arXiv

Sequential Information Guided Sensing

We study the value of information in sequential compressed sensing by characterizing the performance of sequential information guided sensing in practical scenarios when information is inaccurate. In particular, we assume the signal distribution is parameterized through Gaussian or Gaussian mixtures with estimated mean and covariance matrices, and we can measure compressively through a noisy linear projection or using one-sparse vectors, i.e., observing one entry of the signal each time. We establish a set of performance bounds for the bias and variance of the signal estimator via posterior mean, by capturing the conditional entropy (which is also related to the size of the uncertainty), and the additional power required due to inaccurate information to reach a desired precision. Based on this, we further study how to estimate covariance based on direct samples or covariance sketching. Numerical examples also demonstrate the superior performance of Info-Greedy Sensing algorithms compared with their random and non-adaptive counterparts.

preprint2015arXiv

Sequential Sensing with Model Mismatch

We characterize the performance of sequential information guided sensing, Info-Greedy Sensing, when there is a mismatch between the true signal model and the assumed model, which may be a sample estimate. In particular, we consider a setup where the signal is low-rank Gaussian and the measurements are taken in the directions of eigenvectors of the covariance matrix in a decreasing order of eigenvalues. We establish a set of performance bounds when a mismatched covariance matrix is used, in terms of the gap of signal posterior entropy, as well as the additional amount of power required to achieve the same signal recovery precision. Based on this, we further study how to choose an initialization for Info-Greedy Sensing using the sample covariance matrix, or using an efficient covariance sketching scheme.
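As a rough illustration of measuring along the leading eigenvector of the covariance, the sketch below performs one greedy step for a Gaussian signal. It assumes a unit-norm measurement vector, a scalar measurement $y = a^\top x + w$ with $w \sim \mathcal{N}(0, \sigma^2)$, and the standard Gaussian conditioning formula for the posterior covariance; the function name `greedy_measurement_step` and the parameter `noise_var` are hypothetical and not from the paper.

```python
import numpy as np

def greedy_measurement_step(Sigma: np.ndarray, noise_var: float):
    """One greedy sensing step for a Gaussian signal with covariance Sigma.

    Assumption: the measurement is y = a^T x + w with w ~ N(0, noise_var),
    and a is the unit-norm eigenvector of Sigma with the largest eigenvalue.
    Returns the measurement direction and the posterior covariance.
    """
    # eigh returns eigenvalues in ascending order, so the last column
    # of the eigenvector matrix corresponds to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(Sigma)
    a = eigvecs[:, -1]
    Sa = Sigma @ a
    # Standard Gaussian conditioning (rank-one downdate of the covariance).
    Sigma_post = Sigma - np.outer(Sa, Sa) / (a @ Sa + noise_var)
    return a, Sigma_post
```

Repeating the step on `Sigma_post` visits the eigen-directions in decreasing order of remaining uncertainty. If `Sigma` is only an estimate of the true covariance, the returned posterior is mismatched, which is the situation the abstract analyzes.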

preprint2014arXiv

Fast Algorithm for Low-rank matrix recovery in Poisson noise

This paper describes a fast algorithm for recovering low-rank matrices from their linear measurements contaminated with Poisson noise: the Poisson noise Maximum Likelihood Singular Value thresholding (PMLSV) algorithm. We propose a convex optimization formulation with a cost function consisting of the sum of a likelihood function and a regularization function, which is the nuclear norm of the matrix. Instead of solving the optimization problem directly as a semi-definite program (SDP), we derive an iterative singular value thresholding algorithm by expanding the likelihood function. We demonstrate the good performance of the proposed algorithm on recovery of solar flare images with Poisson noise: the algorithm is more efficient than solving the SDP using the interior-point algorithm, and it generates a good approximate solution compared to that obtained from the SDP.
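The generic building block behind iterative singular value thresholding schemes is the soft-thresholding of singular values, which is the proximal operator of the nuclear norm. The sketch below shows only that operator; it is not the PMLSV algorithm itself, and the function name `singular_value_threshold` and the threshold parameter `tau` are illustrative assumptions.

```python
import numpy as np

def singular_value_threshold(M: np.ndarray, tau: float) -> np.ndarray:
    """Soft-threshold the singular values of M by tau.

    This is the proximal operator of tau * (nuclear norm): singular values
    below tau are set to zero and the rest are shrunk by tau.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunk singular values, then recombine.
    return (U * s_shrunk) @ Vt
```

An iterative scheme of this family would alternate a gradient-type step on the likelihood term with a call to this operator; the step sizes and the exact likelihood expansion are specific to the paper and are not reproduced here.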

preprint2014arXiv

On block coherence of frames

Block coherence of matrices plays an important role in analyzing the performance of block compressed sensing recovery algorithms (Bajwa and Mixon, 2012). In this paper, we characterize two block coherence metrics: worst-case and average block coherence. First, we present lower bounds on worst-case block coherence, in both the general case and also when the matrix is constrained to be a union of orthobases. We then present deterministic matrix constructions based upon Kronecker products which obtain these lower bounds. We also characterize the worst-case block coherence of random subspaces. Finally, we present a flipping algorithm that can improve the average block coherence of a matrix, while maintaining the worst-case block coherence of the original matrix. We provide numerical examples which demonstrate that our proposed deterministic matrix construction performs well in block compressed sensing.
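For a concrete feel of the worst-case metric, the sketch below computes the largest spectral norm among the cross-Gram matrices of distinct blocks. It assumes this is the definition in use and that each block has orthonormal columns; the paper's exact normalization may differ, and the function name `worst_case_block_coherence` is hypothetical.

```python
import numpy as np

def worst_case_block_coherence(blocks) -> float:
    """Maximum spectral norm of A_i^H A_j over all pairs of distinct blocks.

    Assumption: `blocks` is a list of 2-D arrays with the same number of
    rows, each with orthonormal columns.
    """
    mu = 0.0
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            cross_gram = blocks[i].conj().T @ blocks[j]
            # For a 2-D array, ord=2 gives the largest singular value.
            mu = max(mu, float(np.linalg.norm(cross_gram, 2)))
    return mu
```

Mutually orthogonal blocks give a value of 0, and two blocks sharing a common direction give a value of 1.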

preprint2014arXiv

PMU based Detection of Imbalance in Three-Phase Power Systems

The problem of imbalance detection in a three-phase power system using a phasor measurement unit (PMU) is considered. A general model for the zero, positive, and negative sequences from a PMU measurement at off-nominal frequencies is presented and a hypothesis testing framework is formulated. The new formulation takes into account the fact that a minor degree of imbalance in the system is acceptable and does not indicate subsequent interruptions, failures, or degradation of physical components. A generalized likelihood ratio test (GLRT) is developed and shown to be a function of the negative-sequence phasor estimator and the acceptable level of imbalance for nominal system operations. As a by-product of the proposed detection method, a constrained estimation of the positive and negative phasors and the frequency deviation is obtained for both balanced and unbalanced situations. The theoretical and numerical performance analyses show improved performance over benchmark techniques and robustness to the presence of additional harmonics.
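The zero, positive, and negative sequences mentioned in this abstract come from the standard symmetrical-components (Fortescue) transform of the three phase phasors. The sketch below shows only that textbook transform at nominal frequency; it does not model the off-nominal-frequency effects or the GLRT of the paper, and the function name `symmetrical_components` is an illustrative choice.

```python
import numpy as np

def symmetrical_components(va: complex, vb: complex, vc: complex):
    """Return (zero, positive, negative) sequence phasors of a three-phase set.

    Uses the standard Fortescue transform with a = exp(j*2*pi/3).
    """
    a = np.exp(2j * np.pi / 3)
    F = np.array([[1, 1, 1],
                  [1, a, a**2],
                  [1, a**2, a]]) / 3.0
    v0, v1, v2 = F @ np.array([va, vb, vc], dtype=complex)
    return v0, v1, v2
```

For a perfectly balanced set the zero and negative sequences vanish, so the magnitude of the negative-sequence phasor relative to the positive one is a natural imbalance indicator, consistent with the abstract's statement that the test depends on the negative-sequence phasor estimate.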

preprint2014arXiv

Sequential Changepoint Approach for Online Community Detection

We present new algorithms for detecting the emergence of a community in large networks from sequential observations. The networks are modeled using Erdős-Rényi random graphs with edges forming between nodes in the community with higher probability. Based on statistical changepoint detection methodology, we develop three algorithms: the Exhaustive Search (ES), the mixture, and the Hierarchical Mixture (H-Mix) methods. Performance of these methods is evaluated by the average run length (ARL), which captures the frequency of false alarms, and the detection delay. Numerical comparisons show that the ES method performs the best; however, it is exponentially complex. The mixture method is polynomially complex by exploiting the fact that the size of the community is typically small in a large network. However, it may react to a group of active edges that do not form a community. This issue is resolved by the H-Mix method, which is based on a dendrogram decomposition of the network. We present an asymptotic analytical expression for ARL of the mixture method when the threshold is large. Numerical simulation verifies that our approximation is accurate even in the non-asymptotic regime. Hence, it can be used to determine a desired threshold efficiently. Finally, numerical examples show that the mixture and the H-Mix methods can both detect a community quickly with a lower complexity than the ES method.

preprint2013arXiv

Compressive Demodulation of Mutually Interfering Signals

Multi-User Detection (MUD) is fundamental not only to cellular wireless communication but also to Radio-Frequency Identification (RFID) technology that supports supply chain management. The challenge of MUD is that of demodulating mutually interfering signals, and the two biggest impediments are the asynchronous character of random access and the lack of channel state information. Given that at any time instant the number of active users is typically small, the promise of Compressive Sensing (CS) is the demodulation of sparse superpositions of signature waveforms from very few measurements. This paper begins by unifying two front-end architectures proposed for MUD by showing that both lead to the same discrete signal model. Algorithms are presented for coherent and noncoherent detection that are based on iterative matching pursuit. Noncoherent detection is all that is needed in the application to RFID technology, where only the identity of the active users is required. The coherent detector is also able to recover the transmitted symbols. It is shown that compressive demodulation requires $\mathcal{O}(K\log N(\tau+1))$ samples to recover $K$ active users, whereas standard MUD requires $N(\tau+1)$ samples to process $N$ total users with a maximal delay $\tau$. Performance guarantees are derived for both coherent and noncoherent detection that are identical in the way they scale with the number of active users. The power profile of the active users is shown to be less important than the SNR of the weakest user. Gabor frames and Kerdock codes are proposed as signature waveforms, and numerical examples demonstrate the superior performance of Kerdock codes: the same probability of error with less than half the samples.

preprint2013arXiv

Finding Zeros: Greedy Detection of Holes

In this paper, motivated by the setting of white-space detection [1], we present theoretical and empirical results for detection of the zero-support $E$ of $x \in \mathbb{C}^p$ ($x_i = 0$ for $i \in E$) with reduced-dimension linear measurements. We propose two low-complexity algorithms based on one-step thresholding [2] for this purpose. The second algorithm is a variant of the first that further assumes the presence of group structure in the target signal $x$ [3]. Performance guarantees for both algorithms, based on the worst-case and average coherence (group coherence) of the measurement matrix, are presented along with the empirical performance of the algorithms.

preprint2013arXiv

Reduced-Dimension Multiuser Detection

We present a reduced-dimension multiuser detector (RD-MUD) structure for synchronous systems that significantly decreases the number of required correlation branches at the receiver front-end, while still achieving performance similar to that of the conventional matched-filter (MF) bank. RD-MUD exploits the fact that, in some wireless systems, the number of active users may be small relative to the total number of users in the system. Hence, the ideas of analog compressed sensing may be used to reduce the number of correlators. The correlating signals used by each correlator are chosen as an appropriate linear combination of the users' spreading waveforms. We derive the probability-of-symbol-error when using two methods for recovery of active users and their transmitted symbols: the reduced-dimension decorrelating (RDD) detector, which combines subspace projection and thresholding to determine active users and sign detection for data recovery, and the reduced-dimension decision-feedback (RDDF) detector, which combines decision-feedback matching pursuit for active user detection and sign detection for data recovery. We derive probability of error bounds for both detectors, and show that the number of correlators needed to achieve a small probability-of-symbol-error is on the order of the logarithm of the number of users in the system. The theoretical performance results are validated via numerical simulations.

preprint2013arXiv

Sequential multi-sensor change-point detection

We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one another. Observations are assumed initially to be independent standard normal random variables. After a change-point, the observations in a subset of the streams of data have nonzero mean values. The subset and the post-change means are unknown. The procedure we study uses stream-specific generalized likelihood ratio statistics, which are combined to form an overall detection statistic in a mixture model that hypothesizes an assumed fraction $p_0$ of affected data streams. An analytic expression is obtained for the average run length (ARL) when there is no change and is shown by simulations to be very accurate. Similarly, an approximation for the expected detection delay (EDD) after a change-point is also obtained. Numerical examples are given to compare the suggested procedure to other procedures for unstructured problems and in one case where the problem is assumed to have a well-defined geometric structure. Finally, we discuss the sensitivity of the procedure to the assumed value of $p_0$ and suggest a generalization.
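One plausible form of such a mixture statistic can be sketched as follows. It assumes the stream-specific statistic is the standardized partial sum $U_{n,k,t} = (S_{n,t} - S_{n,k})/\sqrt{t-k}$, that only its positive part is used, and that streams are combined as $\sum_n \log\bigl(1 - p_0 + p_0 \exp((U_{n,k,t}^{+})^2/2)\bigr)$, maximized over candidate change-points $k$ in a window. These choices and the function name `mixture_statistic` are assumptions for illustration; the abstract does not spell out the formula.

```python
import numpy as np

def mixture_statistic(x: np.ndarray, p0: float, window: int) -> float:
    """Mixture detection statistic evaluated at the last time step of x.

    x has shape (T, N): rows are time steps, columns are data streams,
    assumed i.i.d. standard normal before the change. Candidate
    change-points k range over the last `window` time steps.
    """
    T, N = x.shape
    # S[t] holds the sum of the first t rows, with S[0] = 0.
    S = np.vstack([np.zeros(N), np.cumsum(x, axis=0)])
    stats = []
    for k in range(max(0, T - window), T):
        U = (S[T] - S[k]) / np.sqrt(T - k)
        U_plus = np.maximum(U, 0.0)  # assumed one-sided (positive) shifts
        stats.append(np.sum(np.log(1.0 - p0 + p0 * np.exp(U_plus**2 / 2.0))))
    return float(max(stats))
```

An alarm would be raised the first time this statistic exceeds a threshold chosen to meet a target ARL. A small `p0` damps the contribution of streams with weak evidence, which is the role of the assumed fraction of affected streams.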

preprint2012arXiv

Changepoint detection for high-dimensional time series with missing data

This paper describes a novel approach to change-point detection when the observed high-dimensional data may have missing elements. The performance of classical methods for change-point detection typically scales poorly with the dimensionality of the data, so that a large number of observations are collected after the true change-point before it can be reliably detected. Furthermore, missing components in the observed data handicap conventional approaches. The proposed method addresses these challenges by modeling the dynamic distribution underlying the data as lying close to a time-varying low-dimensional submanifold embedded within the ambient observation space. Specifically, streaming data is used to track a submanifold approximation, measure deviations from this approximation, and calculate a series of statistics of the deviations for detecting when the underlying manifold has changed in a sharp or unexpected manner. The approach described in this paper leverages several recent results in the field of high-dimensional data analysis, including subspace tracking with missing data, multiscale analysis techniques for point clouds, online optimization, and change-point detection performance analysis. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.

preprint2011arXiv

Reduced-dimension multiuser detection: detectors and performance guarantees

We explore several reduced-dimension multiuser detection (RD-MUD) structures that significantly decrease the number of required correlation branches at the receiver front-end, while still achieving performance similar to that of the conventional matched-filter (MF) bank. RD-MUD exploits the fact that the number of active users is typically small relative to the total number of users in the system and relies on ideas of analog compressed sensing to reduce the number of correlators. We first develop a general framework for both linear and nonlinear RD-MUD detectors. We then present theoretical performance analysis for two specific detectors: the linear reduced-dimension decorrelating (RDD) detector, which combines subspace projection and thresholding to determine active users and sign detection for data recovery, and the nonlinear reduced-dimension decision-feedback (RDDF) detector, which combines decision-feedback orthogonal matching pursuit for active user detection and sign detection for data recovery. The theoretical performance results for both detectors are validated via numerical simulations.

preprint2011arXiv

The Diversity-Multiplexing-Delay Tradeoff in MIMO Multihop Networks with ARQ

We study the tradeoff between reliability, data rate, and delay for half-duplex MIMO multihop networks that utilize the automatic-retransmission-request (ARQ) protocol both in the asymptotic high signal-to-noise ratio (SNR) regime and in the finite SNR regime. We propose novel ARQ protocol designs that optimize these tradeoffs. We first derive the diversity-multiplexing-delay tradeoff (DMDT) in the high SNR regime, where the delay is caused only by retransmissions. This asymptotic DMDT shows that the performance of an N node network is limited by the weakest three-node sub-network, and the performance of a three-node sub-network is determined by its weakest link, and, hence, the optimal ARQ protocol needs to equalize the performance on each link by allocating ARQ window sizes optimally. This equalization is captured through a novel Variable Block-Length (VBL) ARQ protocol that we propose, which achieves the optimal DMDT. We then consider the DMDT in the finite SNR regime, where the delay is caused by both the ARQ retransmissions and queueing. We characterize the finite SNR DMDT of the fixed ARQ protocol, when an end-to-end delay constraint is imposed, by deriving the probability of message error using an approach that couples the information outage analysis with the queueing network analysis. The exponent of the probability of deadline violation demonstrates that the system performance is again limited by the weakest three-node sub-network. The queueing delay changes the consideration for optimal ARQ design: more retransmissions reduce decoding error by lowering the information outage probability, but may also increase message drop rate due to delay deadline violations. Hence, the optimal ARQ should balance link performance while avoiding significant delay.

preprint2010arXiv

Diversity-Multiplexing-Delay Tradeoffs in MIMO Multihop Networks with ARQ

The tradeoff among diversity, multiplexing, and delay in multihop MIMO relay networks with ARQ is studied, where the random delay is caused by queueing and ARQ retransmission. This leads to an optimal ARQ allocation problem with a per-hop delay or end-to-end delay constraint. The optimal ARQ allocation has to trade off between the ARQ error, in which the receiver fails to decode within the allocated maximum number of ARQ rounds, and the packet loss due to queueing delay. These two error probabilities are characterized using the diversity-multiplexing-delay tradeoff (DMDT) (without queueing) and the tail probability of the random delay derived using large deviation techniques, respectively. The optimal ARQ allocation problem can then be formulated as a convex optimization problem. We show that the optimal ARQ allocation should balance the performance of each link as well as avoid significant queueing delay, which is also demonstrated by numerical examples.

47 published item(s)

Online Kernel CUSUM for Change-Point Detection

Point processes with event time uncertainty

Transfer Learning for Causal Effect Estimation

A Data-Driven Approach to Robust Hypothesis Testing Using Sinkhorn Uncertainty Sets

Bayesian Uncertainty Quantification for Low-Rank Matrix Completion

Conformal prediction set for time-series

Distributionally Robust Weighted $k$-Nearest Neighbors

Learning Sinkhorn divergences for supervised change point detection

Neural Spectral Marked Point Processes

PERCEPT: a new online change-point detection method using topological data analysis

Sequential change-point detection for mutually exciting point processes over networks

Solar Radiation Ramping Events Modeling Using Spatio-temporal Point Processes

Two-sample Test with Kernel Projected Wasserstein Distance

Balanced Districting on Grid Graphs with Provable Compactness and Contiguity

Deep Fourier Kernel for Self-Attentive Point Processes

Early Detection of COVID-19 Hotspots Using Spatio-Temporal Data

Goodness-of-Fit Test for Mismatched Self-Exciting Processes

Imitation Learning of Neural Spatio-Temporal Point Processes

Inferring serial correlation with dynamic backgrounds

Online detection of cascading change-points

Online High-Dimensional Change-Point Detection using Topological Data Analysis

Optimality of Graph Scanning Statistic for Online Community Detection

Sequential Change Detection by Optimal Weighted $\ell_2$ Divergence

CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis

Data-Driven Threshold Machine: Scan Statistics, Change-Point Detection, and Extreme Bandits

Detecting weak changes in dynamic events over networks

Dynamic change-point detection using similarity networks

Multi-Sensor Slope Change Detection

Sequential Low-Rank Change Detection

Categorical Matrix Completion

Online Supervised Subspace Tracking

Poisson Matrix Completion

Poisson Matrix Recovery and Completion

Sequential Information Guided Sensing

Sequential Sensing with Model Mismatch

Fast Algorithm for Low-rank matrix recovery in Poisson noise

On block coherence of frames

PMU based Detection of Imbalance in Three-Phase Power Systems

Sequential Changepoint Approach for Online Community Detection

Compressive Demodulation of Mutually Interfering Signals

Finding Zeros: Greedy Detection of Holes

Reduced-Dimension Multiuser Detection

Sequential multi-sensor change-point detection

Changepoint detection for high-dimensional time series with missing data

Reduced-dimension multiuser detection: detectors and performance guarantees

The Diversity-Multiplexing-Delay Tradeoff in MIMO Multihop Networks with ARQ

Diversity-Multiplexing-Delay Tradeoffs in MIMO Multihop Networks with ARQ