Source author record

Yun Wei

Yun Wei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.ST, Statistics Theory, Information Theory, math.IT, Systems and Control, eess.SP, Machine Learning, physics.soc-ph

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators

Preprint, 2026, arXiv

Optimal transport based theory for latent structured models

This article is an exposition on some recent theoretical advances in learning latent structured models, with a primary focus on the fundamental roles that optimal transport distances play in the statistical theory. We aim at what may be the most critical and novel ingredient in this theory: the motivation, formulation, derivation and ramification of inverse bounds, a rich collection of structural inequalities for latent structured models which connect the space of distributions of unobserved structures of interest to the space of distributions for observed data. This theory is illustrated on classical mixture models, as well as the more modern hierarchical models that have been developed in Bayesian statistics, machine learning and related fields.

Preprint, 2022, arXiv

A unified framework for correlation mining in ultra-high dimension

Many applications benefit from theory relevant to the identification of variables having large correlations or partial correlations in high dimension. Recently there has been progress in the ultra-high dimensional setting when the sample size $n$ is fixed and the dimension $p$ tends to infinity. Despite these advances, the correlation screening framework suffers from practical, methodological and theoretical deficiencies. For instance, previous correlation screening theory requires that the population covariance matrix be sparse and block diagonal. This block sparsity assumption is, however, restrictive in practical applications. As a second example, correlation and partial correlation screening requires the estimation of dependence measures, which can be computationally prohibitive. In this paper, we propose a unifying approach to correlation and partial correlation mining that is not restricted to block diagonal correlation structure, thus yielding a methodology that is suitable for modern applications. By making connections to random geometric graphs, the number of highly correlated or partially correlated variables is shown to have compound Poisson finite-sample characterizations, which hold for both the finite $p$ case and when $p$ tends to infinity. The unifying framework also demonstrates a duality between correlation and partial correlation screening with theoretical and practical consequences.
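As a rough illustration of the quantity this abstract refers to, the sketch below counts variable pairs whose sample correlation exceeds a threshold in magnitude, together with the variables involved. The names `pearson` and `count_screened`, the threshold `rho` and the toy data are assumptions made for illustration. This is a brute-force count, not the paper's method, and it does not reproduce the compound Poisson characterization.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_screened(columns, rho):
    """Return (pairs, variables): the pairs (i, j) with |r_ij| >= rho and
    the set of variables that appear in at least one such pair."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= rho:
                pairs.append((i, j))
    variables = {v for pair in pairs for v in pair}
    return pairs, variables

# Toy data (hypothetical): n = 4 samples of p = 3 variables.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.0],    # nearly proportional to the first variable
    [1.0, -1.0, 1.0, -1.0],  # only weakly correlated with the others
]
pairs, variables = count_screened(columns, rho=0.9)
```

With this toy data only the first two variables pass the threshold. The pairwise loop costs on the order of p squared correlations, which is the kind of computational burden the abstract mentions for large p.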

Preprint, 2022, arXiv

Asymptotics for Outlier Hypothesis Testing

We revisit the outlier hypothesis testing framework of Li et al. (TIT 2014) and derive fundamental limits for the optimal test. In outlier hypothesis testing, one is given multiple observed sequences, where most sequences are generated i.i.d. from a nominal distribution. The task is to discern the set of outlying sequences that are generated according to anomalous distributions. The nominal and anomalous distributions are unknown. We consider the case of multiple outliers where the number of outliers is unknown and each outlier can follow a different anomalous distribution. Under this setting, we study the tradeoff among the probabilities of misclassification error, false alarm and false reject. Specifically, we propose a threshold-based test that ensures exponential decay of misclassification error and false alarm probabilities. We study two constraints on the false reject probability, with one constraint being that it is a non-vanishing constant and the other being that it has an exponential decay rate. For both cases, we characterize bounds on the false reject probability, as a function of the threshold, for each tuple of nominal and anomalous distributions. Finally, we demonstrate the asymptotic optimality of our test under the generalized Neyman-Pearson criterion.
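To make the idea of a threshold-based test on sequences with unknown distributions concrete, here is a minimal sketch. It is not the test proposed in the paper: as a stand-in, it scores each sequence by the Jensen-Shannon divergence between its empirical distribution and the pooled empirical distribution of the remaining sequences, and flags scores above a threshold. The function names, the scoring rule, the threshold value and the toy sequences are all assumptions.

```python
import math
from collections import Counter

def empirical(seq, alphabet):
    """Empirical distribution (type) of a sequence over a finite alphabet."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in alphabet]

def kl(p, q):
    """KL divergence in nats; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence, finite for any pair of distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def flag_outliers(sequences, alphabet, threshold):
    """Flag sequence i when its type is far from the pooled type of the rest.
    An empty result corresponds to declaring that no outlier is present."""
    flagged = set()
    for i, seq in enumerate(sequences):
        rest = [x for j, s in enumerate(sequences) if j != i for x in s]
        score = js_divergence(empirical(seq, alphabet), empirical(rest, alphabet))
        if score > threshold:
            flagged.add(i)
    return flagged

# Toy example (hypothetical): three nominal sequences and one outlier.
sequences = [[0] * 10, [0] * 10, [0] * 10, [1] * 10]
flagged = flag_outliers(sequences, alphabet=[0, 1], threshold=0.4)
```

The number of outliers is not supplied to the function, which mirrors the unknown-count setting in the abstract. Raising the threshold trades fewer false alarms for more missed outliers.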

Preprint, 2022, arXiv

Second-Order Asymptotically Optimal Outlier Hypothesis Testing

We revisit the outlier hypothesis testing framework of Li et al. (TIT 2014) and derive fundamental limits for the optimal test under the generalized Neyman-Pearson criterion. In outlier hypothesis testing, one is given multiple observed sequences, where most sequences are generated i.i.d. from a nominal distribution. The task is to discern the set of outlying sequences that are generated from anomalous distributions. The nominal and anomalous distributions are unknown. We study the tradeoff among the probabilities of misclassification error, false alarm and false reject for tests that satisfy weak conditions on the rate of decrease of these error probabilities as a function of sequence length. Specifically, we propose a threshold-based test that ensures exponential decay of misclassification error and false alarm probabilities. We study two constraints on the false reject probability, with one constraint being that it is a non-vanishing constant and the other being that it has an exponential decay rate. For both cases, we characterize bounds on the false reject probability, as a function of the threshold, for each pair of nominal and anomalous distributions and demonstrate the optimality of our test under the generalized Neyman-Pearson criterion. We first consider the case of at most one outlying sequence and then generalize our results to the case of multiple outlying sequences where the number of outlying sequences is unknown and each outlying sequence can follow a different anomalous distribution.

Preprint, 2013, arXiv

Learning Geo-Temporal Non-Stationary Failure and Recovery of Power Distribution

The smart energy grid is an emerging area for new applications of machine learning in a non-stationary environment. Such a non-stationary environment emerges when large-scale failures occur at power distribution networks due to external disturbances such as hurricanes and severe storms. Power distribution networks lie at the edge of the grid, and are especially vulnerable to external disruptions. Quantifiable approaches are lacking and needed to learn non-stationary behaviors of large-scale failure and recovery of power distribution. This work studies such non-stationary behaviors in three aspects. First, a novel formulation is derived for an entire life cycle of large-scale failure and recovery of power distribution. Second, spatial-temporal models of failure and recovery of power distribution are developed as geo-location based multivariate non-stationary GI(t)/G(t)/∞ queues. Third, the non-stationary spatial-temporal models identify a small number of parameters to be learned. Learning is applied to two real-life examples of large-scale disruptions. One is from Hurricane Ike, where data from an operational network is exact on failures and recoveries. The other is from Hurricane Sandy, where aggregated data is used for inferring failure and recovery processes at one of the impacted areas. Model parameters are learned using real data. Two findings emerge as results of learning: (a) Failure rates behave similarly at the two different provider networks for two different hurricanes but differently at the geographical regions. (b) Both rapid and slow recovery are present for Hurricane Ike but only slow recovery is shown for a regional distribution network from Hurricane Sandy.
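The infinite-server queue view in this abstract can be illustrated with a small sketch: each failure starts its own recovery immediately, so the number of outstanding failures at time t is the number of failures that have occurred by t and have not yet been repaired. The function name `outstanding_failures` and the sample times are hypothetical, and the sketch leaves out the geo-location and parameter-learning parts of the paper.

```python
def outstanding_failures(failure_times, recovery_durations, t):
    """Count failures that have occurred by time t but are not yet recovered.

    In an infinite-server queue no failure waits for a repair crew, so a
    failure starting at time s with recovery duration d is outstanding
    exactly when s <= t < s + d.
    """
    return sum(
        1
        for start, duration in zip(failure_times, recovery_durations)
        if start <= t < start + duration
    )

# Hypothetical failure start times and recovery durations, in hours.
failure_times = [0.0, 1.0, 1.5, 4.0]
recovery_durations = [2.0, 5.0, 0.5, 1.0]

# Outstanding failures at a few observation times.
profile = [
    outstanding_failures(failure_times, recovery_durations, t)
    for t in (0.5, 1.75, 3.0, 4.5, 7.0)
]
```

The short 0.5-hour and long 5-hour durations in the toy data loosely echo the rapid and slow recovery the abstract reports; the values themselves are made up.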

Preprint, 2012, arXiv

Non-Stationary Random Process for Large-Scale Failure and Recovery of Power Distributions

A key objective of the smart grid is to improve reliability of utility services to end users. This requires strengthening resilience of distribution networks that lie at the edge of the grid. However, distribution networks are exposed to external disturbances such as hurricanes and snow storms where electricity service to customers is disrupted repeatedly. External disturbances cause large-scale power failures that are neither well-understood, nor formulated rigorously, nor studied systematically. This work studies resilience of power distribution networks to large-scale disturbances in three aspects. First, a non-stationary random process is derived to characterize an entire life cycle of large-scale failure and recovery. Second, resilience is defined based on the non-stationary random process. Closed-form analytical expressions are derived under specific large-scale failure scenarios. Third, the non-stationary model and the resilience metric are applied to a real-life example of large-scale disruptions due to Hurricane Ike. Real data on large-scale failures from an operational network is used to learn time-varying model parameters and resilience metrics.
