Source author record

Yajun Mei

Yajun Mei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Methodology, Applications, math.ST, Statistics Theory, eess.SP

Catalog footprint

What is connected

14 works

6 topics

4 close collaborators


preprint 2022 arXiv

Active Learning-Based Multistage Sequential Decision-Making Model with Application on Common Bile Duct Stone Evaluation

Multistage sequential decision-making scenarios are commonly seen in the healthcare diagnosis process. In this paper, an active learning-based method is developed to actively collect only the necessary patient data in a sequential manner. There are two novelties in the proposed method. First, unlike the existing ordinal logistic regression model, which models only a single stage, we estimate the parameters for all stages together. Second, it is assumed that the coefficients for common features in different stages are kept consistent. The effectiveness of the proposed method is validated in both a simulation study and a real case study. Compared with the baseline method, where the data for each stage are modeled individually and independently, the proposed method improves the estimation efficiency by 62%-1838%. For both simulation and testing cohorts, the proposed method is more effective, stable, interpretable, and computationally efficient in parameter estimation. The proposed method can be easily extended to a variety of scenarios where decision-making can be done sequentially with only necessary information.

preprint 2022 arXiv

Adaptive Partially-Observed Sequential Change Detection and Isolation

High-dimensional data has become popular due to the easy accessibility of sensors in modern industrial applications. However, one specific challenge is that it is often not easy to obtain complete measurements due to limited sensing powers and resource constraints. Furthermore, distinct failure patterns may exist in the systems, and it is necessary to identify the true failure pattern. This work focuses on the online adaptive monitoring of high-dimensional data in resource-constrained environments with multiple potential failure modes. To achieve this, we propose to apply the Shiryaev-Roberts procedure on the failure mode level and utilize the multi-arm bandit to balance the exploration and exploitation. We further discuss the theoretical property of the proposed algorithm to show that the proposed method can correctly isolate the failure mode. Finally, extensive simulations and two case studies demonstrate that the change point detection performance and the failure mode isolation accuracy can be greatly improved.
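For readers unfamiliar with the Shiryaev-Roberts procedure named in this abstract, the classical single-stream recursion can be sketched as follows. This is only a minimal illustration, assuming fully observed unit-variance Gaussian data whose mean shifts from 0 to a known value `mu`; the function name, parameters, and threshold are hypothetical, and the paper's failure-mode-level statistic and bandit-based sampling are not reproduced here.

```python
import math


def shiryaev_roberts_alarm(xs, mu=1.0, threshold=100.0):
    """Return the first 1-based time at which the SR statistic crosses
    `threshold`, or None if it never does.

    Assumes N(0, 1) observations before the change and N(mu, 1) after it.
    """
    r = 0.0  # SR statistic, started at zero
    for n, x in enumerate(xs, start=1):
        # Likelihood ratio of N(mu, 1) against N(0, 1) for one observation.
        lr = math.exp(mu * x - 0.5 * mu * mu)
        # Classical recursion: R_n = (1 + R_{n-1}) * LR_n.
        r = (1.0 + r) * lr
        if r >= threshold:
            return n
    return None


# Toy stream: 20 in-control values followed by 20 shifted values.
stream = [0.0] * 20 + [2.0] * 20
alarm_time = shiryaev_roberts_alarm(stream)
```

In this toy stream the statistic stays small while the data are in control and grows geometrically once the shifted values arrive, so the alarm is raised shortly after time 20.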

preprint 2022 arXiv

Adaptive Resources Allocation CUSUM for Binomial Count Data Monitoring with Application to COVID-19 Hotspot Detection

In this paper, we present an efficient statistical method (denoted as "Adaptive Resources Allocation CUSUM") to robustly and efficiently detect the hotspot with limited sampling resources. Our main idea is to combine the multi-arm bandit (MAB) and change-point detection methods to balance the exploration and exploitation of resource allocation for hotspot detection. Further, a Bayesian weighted update is used to update the posterior distribution of the infection rate. Then, the upper confidence bound (UCB) is used for resource allocation and planning. Finally, CUSUM monitoring statistics are used to detect the change point as well as the change location. For performance evaluation, we compare the performance of the proposed method with several benchmark methods in the literature and show that the proposed algorithm is able to achieve a lower detection delay and higher detection precision. Finally, this method is applied to hotspot detection in a real case study of county-level daily positive COVID-19 cases in Washington State (WA) and demonstrates its effectiveness with very limited distributed samples.
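The CUSUM step mentioned in this abstract can be illustrated for a single location with binomial counts. The sketch below is a textbook one-sided CUSUM under assumed known pre-change and post-change rates `p0` and `p1`; all names and default values are hypothetical, and the Bayesian update and UCB-based allocation described in the abstract are deliberately left out.

```python
import math


def binomial_llr(x, n, p0, p1):
    """Log-likelihood ratio of Binomial(n, p1) against Binomial(n, p0)
    for x positives out of n tests (the binomial coefficient cancels)."""
    return x * math.log(p1 / p0) + (n - x) * math.log((1 - p1) / (1 - p0))


def cusum_alarm(counts, n, p0=0.05, p1=0.15, threshold=5.0):
    """Return the first 1-based time at which the CUSUM statistic
    reaches `threshold`, or None if no alarm is raised."""
    w = 0.0
    for t, x in enumerate(counts, start=1):
        # CUSUM recursion: accumulate evidence, but never drop below zero.
        w = max(0.0, w + binomial_llr(x, n, p0, p1))
        if w >= threshold:
            return t
    return None


# Toy example: 100 tests per day, positives jump from 5 to 15 on day 11.
daily_positives = [5] * 10 + [15] * 10
alarm_day = cusum_alarm(daily_positives, n=100)
```

Resetting at zero is what lets the statistic forget long in-control stretches, so the detection delay after the jump does not depend on how long monitoring has been running.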

preprint 2022 arXiv

Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control

In many real-world problems of real-time monitoring of high-dimensional streaming data, one wants to detect an undesired event or change quickly once it occurs, but under a sampling control constraint, in the sense that one might be able to observe or use data from only selected components for decision-making per time step in resource-constrained environments. In this paper, we propose to incorporate multi-armed bandit approaches into sequential change-point detection to develop an efficient bandit change-point detection algorithm based on the limiting Bayesian approach to incorporate prior knowledge of potential changes. Our proposed algorithm, termed Thompson-Sampling-Shiryaev-Roberts-Pollak (TSSRP), consists of two policies per time step: the adaptive sampling policy applies the Thompson Sampling algorithm to balance between exploration for acquiring long-term knowledge and exploitation for immediate reward gain, and the statistical decision policy fuses the local Shiryaev-Roberts-Pollak statistics to determine whether to raise a global alarm by sum shrinkage techniques. Extensive numerical simulations and case studies demonstrate the statistical and computational efficiency of our proposed TSSRP algorithm.

preprint 2022 arXiv

Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent

In machine learning and statistical data analysis, we often run into an objective function that is a summation: the number of terms in the summation is possibly equal to the sample size, which can be enormous. In such a setting, the stochastic mirror descent (SMD) algorithm is a numerically efficient method, with each iteration involving only a very small subset of the data. The variance reduction version of SMD (VRSMD) can further improve SMD by inducing faster convergence. On the other hand, algorithms such as gradient descent and stochastic gradient descent have the implicit regularization property that leads to better performance in terms of the generalization errors. Little is known about whether such a property holds for VRSMD. We prove here that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression. This establishes the implicit regularization property for VRSMD. As an application of the above result, we derive a model estimation accuracy result in the setting where the true model is sparse. We use numerical examples to illustrate the empirical power of VRSMD.

preprint 2022 arXiv

Private Sequential Hypothesis Testing for Statisticians: Privacy, Error Rates, and Sample Size

The sequential hypothesis testing problem is a class of statistical analyses where the sample size is not fixed in advance. Instead, the decision process takes in new observations sequentially to make real-time decisions for testing an alternative hypothesis against a null hypothesis until some stopping criterion is satisfied. In many common applications of sequential hypothesis testing, the data can be highly sensitive and may require privacy protection; for example, sequential hypothesis testing is used in clinical trials, where doctors sequentially collect data from patients and must determine when to stop recruiting patients and whether the treatment is effective. The field of differential privacy has been developed to offer data analysis tools with strong privacy guarantees, and has been commonly applied to machine learning and statistical tasks. In this work, we study the sequential hypothesis testing problem under a slight variant of differential privacy, known as Rényi differential privacy. We present a new private algorithm based on Wald's Sequential Probability Ratio Test (SPRT) that also gives strong theoretical privacy guarantees. We provide theoretical analysis of statistical performance measured by Type I and Type II error as well as the expected sample size. We also empirically validate our theoretical results on several synthetic databases, showing that our algorithms also perform well in practice. Unlike previous work in private hypothesis testing that focused only on the classical fixed sample setting, our results in the sequential setting allow a conclusion to be reached much earlier, thus saving the cost of collecting additional samples.
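As background for this abstract, the classical (non-private) Wald SPRT that the private algorithm builds on can be sketched for Bernoulli data. This is an illustration only: the function name, the Bernoulli model, and the default values are assumptions, the stopping boundaries use Wald's standard approximations, and the noise-adding privacy mechanism of the paper is not shown.

```python
import math


def sprt_bernoulli(xs, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Classical SPRT of H0: p = p0 against H1: p = p1 on 0/1 data.

    Returns (decision, number_of_samples_used).
    """
    # Wald's approximate boundaries for target Type I error alpha
    # and Type II error beta.
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    llr = 0.0  # cumulative log-likelihood ratio
    for n, x in enumerate(xs, start=1):
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    # Data ran out before either boundary was crossed.
    return "undecided", len(xs)


decision, samples_used = sprt_bernoulli([1] * 30)
```

The sample size is a random stopping time rather than a fixed number, which is the property that lets a sequential test reach a conclusion early when the evidence is strong.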

preprint 2022 arXiv

The Directional Bias Helps Stochastic Gradient Descent to Generalize in Kernel Regression Models

We study the Stochastic Gradient Descent (SGD) algorithm in nonparametric statistics: kernel regression in particular. The directional bias property of SGD, which is known in the linear regression setting, is generalized to kernel regression. More specifically, we prove that SGD with a moderate and annealing step-size converges along the direction of the eigenvector that corresponds to the largest eigenvalue of the Gram matrix. In addition, Gradient Descent (GD) with a moderate or small step-size converges along the direction that corresponds to the smallest eigenvalue. These facts are referred to as the directional bias properties; they may explain how an SGD-computed estimator has a potentially smaller generalization error than a GD-computed estimator. The application of our theory is demonstrated by simulation studies and a case study that is based on the FashionMNIST dataset.

preprint 2020 arXiv

Rapid Detection of Hot-spot by Tensor Decomposition with Application to Weekly Gonorrhea Data

In many bio-surveillance and healthcare applications, data sources are measured from many spatial locations repeatedly over time, say, daily/weekly/monthly. In these applications, we are typically interested in detecting hot-spots, which are defined as some structured outliers that are sparse over the spatial domain but persistent over time. In this paper, we propose a tensor decomposition method to detect when and where the hot-spots occur. Our proposed methods represent the observed raw data as a three-dimensional tensor including a circular time dimension for daily/weekly/monthly patterns, and then decompose the tensor into three components: smooth global trend, local hot-spots, and residuals. A combination of LASSO and fused LASSO is used to estimate the model parameters, and a CUSUM procedure is applied to detect when and where the hot-spots might occur. The usefulness of our proposed methodology is validated through numerical simulation and a real-world dataset of the weekly number of gonorrhea cases from 2006 to 2018 for 50 states in the United States.

preprint 2020 arXiv

Rapid Detection of Hot-spots via Tensor Decomposition with applications to Crime Rate Data

We propose an efficient statistical method (denoted as SSR-Tensor) to robustly and quickly detect hot-spots that are sparse and temporally consistent in a spatial-temporal dataset through tensor decomposition. Our main idea is first to build an SSR model to decompose the tensor data into a Smooth global trend mean, Sparse local hot-spots, and Residuals. Next, tensor decomposition is utilized as follows: bases are introduced to describe within-dimension correlation, and tensor products are used for between-dimension interaction. Then, a combination of LASSO and fused LASSO is used to estimate the model parameters. An efficient recursive estimation procedure is developed based on large-scale convex optimization: we first transform the general LASSO optimization into a regular LASSO optimization and then apply FISTA to solve it with the fastest convergence rate. Finally, a CUSUM procedure is applied to detect when and where the hot-spot event occurs. We compare the performance of the proposed method in a numerical simulation study and a real-world case study, which contains a dataset including a collection of three types of crime rates for U.S. mainland states during the years 1965-2014. In both cases, the proposed SSR-Tensor is able to achieve fast detection and accurate localization of the hot-spots.

preprint 2016 arXiv

Large-Scale Multi-Stream Quickest Change Detection via Shrinkage Post-Change Estimation

The quickest change detection problem is considered in the context of monitoring large-scale independent normally distributed data streams with possible changes in some of the means. It is assumed that for each individual local data stream, either there are no local changes, or there is a "big" local change that is larger than a pre-specified lower bound. Two different kinds of scenarios are studied: one is the sparse post-change case when the unknown number of affected data streams is much smaller than the total number of data streams, and the other is when all local data streams are affected simultaneously although not necessarily identically. We propose a systematic approach to develop efficient global monitoring schemes for quickest change detection by combining hard thresholding with linear shrinkage estimators to estimate all post-change parameters simultaneously. Our theoretical analysis demonstrates that the shrinkage estimation can balance the tradeoff between the first-order and second-order terms of the asymptotic expression on the detection delays, and our numerical simulation studies illustrate the usefulness of shrinkage estimation and the challenge of Monte Carlo simulation of the average run length to false alarm in the context of online monitoring of large-scale data streams.

preprint 2016 arXiv

Scalable SUM-Shrinkage Schemes for Distributed Monitoring Large-Scale Data Streams

In this article, motivated by biosurveillance and censoring sensor networks, we investigate the problem of distributed monitoring of large-scale data streams where an undesired event may occur at some unknown time and affect only a few unknown data streams. We propose to develop scalable global monitoring schemes by running local detection procedures in parallel and by combining these local procedures to make a global decision based on SUM-shrinkage techniques. Our approach is illustrated in two concrete examples: one is the nonhomogeneous case when the pre-change and post-change local distributions are given, and the other is the homogeneous case of monitoring a large number of independent $N(0,1)$ data streams where the means of some data streams might shift to unknown positive or negative values. Numerical simulation studies demonstrate the usefulness of the proposed schemes.
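The idea of running local procedures in parallel and summing shrunken local statistics can be sketched as below. This is one plausible reading of the abstract, not the authors' scheme: it assumes one-sided local CUSUM statistics for N(0, 1) streams with a known post-change mean `mu`, and a soft-thresholding shrinkage `max(W - b, 0)`; every name and constant is hypothetical.

```python
def sum_shrinkage_alarm(data, mu=1.0, b=1.0, threshold=10.0):
    """Return the first 1-based time at which the global statistic
    reaches `threshold`, or None.

    `data` is a list of time steps; each time step is a list holding
    one observation per data stream.
    """
    if not data:
        return None
    w = [0.0] * len(data[0])  # one local CUSUM statistic per stream
    for t, row in enumerate(data, start=1):
        # Local step: each stream updates its own CUSUM independently.
        w = [max(0.0, wi + mu * x - 0.5 * mu * mu) for wi, x in zip(w, row)]
        # Global step: shrink each local statistic by b, then sum.
        # Streams with weak evidence contribute exactly zero.
        g = sum(max(wi - b, 0.0) for wi in w)
        if g >= threshold:
            return t
    return None


# Toy example: 5 streams, only stream 0 shifts after time 10.
quiet = [[0.0] * 5 for _ in range(10)]
shifted = [[3.0, 0.0, 0.0, 0.0, 0.0] for _ in range(10)]
alarm_time = sum_shrinkage_alarm(quiet + shifted)
```

The shrinkage filters out the many unaffected streams before summing, so their noise does not swamp the few streams that actually changed.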

preprint 2016 arXiv

Thresholded Multivariate Principal Component Analysis for Multi-channel Profile Monitoring

Monitoring multichannel profiles has important applications in manufacturing systems improvement, but it is non-trivial to develop efficient statistical methods due to two main challenges. First, profiles are high-dimensional functional data with intrinsic inner- and inter-channel correlations, and one needs to develop a dimension reduction method that can deal with such intricate correlations for the purpose of effective monitoring. The second, and probably more fundamental, challenge is that the functional structure of multi-channel profiles might change over time, and thus the dimension reduction method should be able to automatically take into account the potential unknown change. To tackle these two challenges, we propose a novel thresholded multivariate principal component analysis (PCA) method for multi-channel profile monitoring. Our proposed method consists of two steps of dimension reduction: it first applies functional PCA to extract a reasonably large number of features under the normal operational (in-control) state, and then uses soft-thresholding techniques to further select significant features capturing profile information in the out-of-control state. The choice of tuning parameter for soft-thresholding is provided based on asymptotic analysis, and extensive simulation studies are conducted to illustrate the efficacy of our proposed thresholded PCA methodology.
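The soft-thresholding operation used in the second step can be written in a few lines. The sketch below shows only the generic operator applied to a hypothetical list of feature scores; how the scores are computed from functional PCA and how the tuning parameter `lam` is chosen are specific to the paper and are not reproduced.

```python
def soft_threshold(values, lam):
    """Shrink each value toward zero by `lam`; values whose magnitude
    is at most `lam` become exactly zero."""
    shrunk = []
    for v in values:
        if abs(v) > lam:
            # Keep the sign, reduce the magnitude by lam.
            shrunk.append((abs(v) - lam) * (1.0 if v > 0 else -1.0))
        else:
            shrunk.append(0.0)
    return shrunk


# Hypothetical feature scores; only the two large ones survive.
scores = [3.0, -0.5, -2.0, 0.2]
selected = soft_threshold(scores, lam=1.0)
```

Because small scores are set to exactly zero, the operator performs feature selection and shrinkage in a single step.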

preprint 2010 arXiv

Asymptotic Optimality Theory For Decentralized Sequential Multihypothesis Testing Problems

The Bayesian formulation of sequentially testing $M \ge 3$ hypotheses is studied in the context of a decentralized sensor network system. In such a system, local sensors observe raw observations and send quantized sensor messages to a fusion center, which makes a final decision once it stops taking observations. Asymptotically optimal decentralized sequential tests are developed from a class of "two-stage" tests that allows the sensor network system to make a preliminary decision in the first stage and then optimize each local sensor quantizer accordingly in the second stage. It is shown that the optimal local quantizer at each local sensor in the second stage can be defined as a maximin quantizer, which turns out to be a randomization of at most $M-1$ unambiguous likelihood quantizers (ULQ). We first present in detail our results for the system with a single sensor and binary sensor messages, and then extend them to more general cases involving any finite alphabet sensor messages, multiple sensors, or composite hypotheses.

preprint 2010 arXiv

Decentralized Multihypothesis Sequential Detection

This article is concerned with decentralized sequential testing of multiple hypotheses. In a sensor network system with limited local memory, raw observations are observed at the local sensors, and quantized into binary sensor messages that are sent to a fusion center, which makes a final decision. It is assumed that the raw sensor observations are distributed according to a set of $M \ge 2$ specified distributions, and the fusion center has to utilize quantized sensor messages to decide which one is the true distribution. Asymptotically Bayes tests are offered for decentralized multihypothesis sequential detection by combining three existing methodologies: tandem quantizers, unambiguous likelihood quantizers, and randomized quantizers.
