Researcher profile

Vincent Y. F. Tan

Vincent Y. F. Tan contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
39 works
0 followers
19 topics
4 close collaborators

Published work

39 published item(s)

preprint · 2022 · arXiv

A Survey of Risk-Aware Multi-Armed Bandits

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest, and comment on their properties. Next, we review existing concentration inequalities for various risk measures. Then, we proceed to defining risk-aware bandit problems. We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests, as well as the best-arm identification setting, which is a pure exploration problem -- both in the context of risk-sensitive measures. We conclude by commenting on persisting challenges and fertile areas for future research.
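
To make the contrast between expected value and a risk-aware measure concrete, here is a minimal sketch of an empirical Conditional Value at Risk (CVaR) estimate. The function name `empirical_cvar`, the reward samples, and the convention used (the average of the worst `alpha` fraction of rewards) are illustrative assumptions, not taken from the survey; other conventions for CVaR exist.

```python
import math


def empirical_cvar(rewards, alpha):
    """Average of the worst (lowest) ceil(alpha * n) rewards.

    Assumed convention: higher reward is better, so the "tail" is the low end.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not rewards:
        raise ValueError("rewards must be non-empty")
    ordered = sorted(rewards)
    k = math.ceil(alpha * len(ordered))
    return sum(ordered[:k]) / k


# Hypothetical reward samples from two arms.
arm_a = [1.0, 1.0, 1.0, 1.0]    # steady arm
arm_b = [4.0, 4.0, 4.0, -6.0]   # higher mean, occasional large loss

mean_a = sum(arm_a) / len(arm_a)
mean_b = sum(arm_b) / len(arm_b)
cvar_a = empirical_cvar(arm_a, 0.25)
cvar_b = empirical_cvar(arm_b, 0.25)
```

Under these made-up samples, arm B has the higher mean but the lower CVaR, so a mean-based and a CVaR-based learner would rank the two arms differently.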

preprint · 2022 · arXiv

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem for a class of risk functionals $ρ$ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belong to this class. Using our newly developed analytical toolkits, we analyse the algorithm $ρ$-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds of risk-averse algorithms under CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of $ρ$-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs); this includes the well-known mean-variance. Numerical simulations show that the regret bounds incurred by our algorithms are reasonably tight vis-à-vis algorithm-independent lower bounds.

preprint · 2022 · arXiv

Asymptotic Nash Equilibrium for the $M$-ary Sequential Adversarial Hypothesis Testing Game

In this paper, we consider a novel $M$-ary sequential hypothesis testing problem in which an adversary is present and perturbs the distributions of the samples before the decision maker observes them. This problem is formulated as a sequential adversarial hypothesis testing game played between the decision maker and the adversary. This game is a zero-sum and strategic one. We assume the adversary is active under all hypotheses and knows the underlying distribution of observed samples. We adopt this framework as it is the worst-case scenario from the perspective of the decision maker. The goal of the decision maker is to minimize the expectation of the stopping time to ensure that the test is as efficient as possible; the adversary's goal is, instead, to maximize the stopping time. We derive a pair of strategies under which the asymptotic Nash equilibrium of the game is attained. We also consider the case in which the adversary is not aware of the underlying hypothesis and hence is constrained to apply the same strategy regardless of which hypothesis is in effect. Numerical results corroborate our theoretical findings.

preprint · 2022 · arXiv

Asymptotics of Sequential Composite Hypothesis Testing under Probabilistic Constraints

We consider the sequential composite binary hypothesis testing problem in which one of the hypotheses is governed by a single distribution while the other is governed by a family of distributions whose parameters belong to a known set $Γ$. We would like to design a test to decide which hypothesis is in effect. Under the constraints that the probabilities that the length of the test, a stopping time, exceeds $n$ are bounded by a certain threshold $ε$, we obtain certain fundamental limits on the asymptotic behavior of the sequential test as $n$ tends to infinity. Assuming that $Γ$ is a convex and compact set, we obtain the set of all first-order error exponents for the problem. We also prove a strong converse. Additionally, we obtain the set of second-order error exponents under the assumption that $\mathcal{X}$ is a finite alphabet. In the proof of second-order asymptotics, a main technical contribution is the derivation of a central limit-type result for a maximum of an uncountable set of log-likelihood ratios under suitable conditions. This result may be of independent interest. We also show that some important statistical models satisfy the conditions.

preprint · 2022 · arXiv

Best Arm Identification in Restless Markov Multi-Armed Bandits

We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution on each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of arm TPMs but not the exact mapping of the TPMs to the arms, wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability. The decision entity selects one arm at a time sequentially, and all the unselected arms continue to undergo state evolution (restless arms). For this problem, we derive the first-known problem instance-dependent asymptotic lower bound on the growth rate of the expected time required to find the index of the best arm, where the asymptotics is as the error probability vanishes. Further, we propose a sequential policy that, for an input parameter $R$, forcibly selects an arm that has not been selected for $R$ consecutive time instants. We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R\to\infty$. The question of whether, in general, the limiting value of the upper bound as $R\to\infty$ matches with the lower bound, remains open. We identify a special case in which the upper and the lower bounds match. Prior works on best arm identification have dealt with (a) independent and identically distributed observations from the arms, and (b) rested Markov arms, whereas our work deals with the more difficult setting of restless Markov arms.

preprint · 2022 · arXiv

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortunately, SAM's computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers, while test accuracies are preserved or even improved.
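
The remark that SAM costs roughly twice as much as a base optimizer can be illustrated with a small sketch of a plain SAM update next to an SGD update. This is only the baseline SAM step on a toy quadratic loss, not ESAM: neither Stochastic Weight Perturbation nor Sharpness-Sensitive Data Selection is implemented. The names `sam_step`, `sgd_step` and `CountingGrad`, the loss, and the values of `lr` and `rho` are assumptions chosen for illustration.

```python
import numpy as np


class CountingGrad:
    """Gradient of the toy loss 0.5 * w^T A w, counting how often it is called."""

    def __init__(self, A):
        self.A = A
        self.calls = 0

    def __call__(self, w):
        self.calls += 1
        return self.A @ w


def sgd_step(w, grad_fn, lr=0.05):
    # One gradient evaluation per update.
    return w - lr * grad_fn(w)


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    g = grad_fn(w)                     # first gradient pass
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return w
    w_adv = w + rho * g / norm         # move to an approximate worst-case neighbour
    g_sharp = grad_fn(w_adv)           # second gradient pass at the perturbed point
    return w - lr * g_sharp            # descend from the original weights


A = np.diag([1.0, 10.0])
w0 = np.array([1.0, 1.0])

sgd_grad = CountingGrad(A)
w_sgd = sgd_step(w0, sgd_grad)

sam_grad = CountingGrad(A)
w_sam = sam_step(w0, sam_grad)

loss_before = 0.5 * w0 @ A @ w0
loss_after_sam = 0.5 * w_sam @ A @ w_sam
```

In this sketch the second gradient pass is the source of the extra cost: `sam_grad.calls` ends at 2 while `sgd_grad.calls` ends at 1.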

preprint · 2022 · arXiv

Exact Recovery in the General Hypergraph Stochastic Block Model

This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM), wherein n nodes are partitioned into k disjoint communities with relative sizes (p1,..., pk). Each subset of nodes with cardinality d is generated independently as an order-d hyperedge with a certain probability that depends on the ground-truth communities that the d nodes belong to. The goal is to exactly recover the k hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and d-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.

preprint · 2022 · arXiv

On Robustness of Neural Ordinary Differential Equations

Neural ordinary differential equations (ODEs) have been attracting increasing attention in various research domains recently. There have been some works studying optimization issues and approximation capabilities of neural ODEs, but their robustness is still unclear. In this work, we fill this important gap by exploring robustness properties of neural ODEs both empirically and theoretically. We first present an empirical study on the robustness of the neural ODE-based networks (ODENets) by exposing them to inputs with various types of perturbations and subsequently investigating the changes of the corresponding outputs. In contrast to conventional convolutional neural networks (CNNs), we find that the ODENets are more robust against both random Gaussian perturbations and adversarial attack examples. We then provide an insightful understanding of this phenomenon by exploiting a certain desirable property of the flow of a continuous-time ODE, namely that integral curves are non-intersecting. Our work suggests that, due to their intrinsic robustness, it is promising to use neural ODEs as a basic block for building robust deep network models. To further enhance the robustness of vanilla neural ODEs, we propose the time-invariant steady neural ODE (TisODE), which regularizes the flow on perturbed data via the time-invariant property and the imposition of a steady-state constraint. We show that the TisODE method outperforms vanilla neural ODEs and also can work in conjunction with other state-of-the-art architectural methods to build more robust deep networks.

preprint · 2022 · arXiv

The Informativeness of $k$-Means for Learning Mixture Models

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the correct target clustering of the samples according to which component distribution they were generated from. For a clustering problem, practitioners often choose to use the simple $k$-means algorithm. $k$-means attempts to find an optimal clustering that minimizes the sum-of-squares distance between each point and its cluster center. In this paper, we consider fundamental (i.e., information-theoretic) limits of the solutions (clusterings) obtained by optimizing the sum-of-squares distance. In particular, we provide sufficient conditions for the closeness of any optimal clustering and the correct target clustering assuming that the data samples are generated from a mixture of spherical Gaussian distributions. We also generalize our results to log-concave distributions. Moreover, we show that under similar or even weaker conditions on the mixture model, any optimal clustering for the samples with reduced dimensionality is also close to the correct target clustering. These results provide intuition for the informativeness of $k$-means (with and without dimensionality reduction) as an algorithm for learning mixture models.
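
The sum-of-squares objective that $k$-means minimizes can be written out in a few lines. The sketch below only evaluates that objective for a given labelling; it does not search for the optimum, and the function name `sum_of_squares` and the four sample points are hypothetical.

```python
import numpy as np


def sum_of_squares(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = points[labels == c]
        center = members.mean(axis=0)          # cluster center = mean of members
        total += ((members - center) ** 2).sum()
    return total


# Two well-separated groups of points (made-up data).
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
target = [0, 0, 1, 1]   # groups nearby points together
mixed = [0, 1, 0, 1]    # pairs each point with a distant one

cost_target = sum_of_squares(points, target)
cost_mixed = sum_of_squares(points, mixed)
```

On this toy data the labelling that matches the two groups has a far smaller objective than the mixed labelling, which is the sense in which an optimal clustering can be close to the target clustering when the components are well separated.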

preprint · 2022 · arXiv

Tight Regret Bounds for Noisy Optimization of a Brownian Motion

We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise. We show that the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $Ω(σ\sqrt{T / \log (T)}) \cap \mathcal{O}(σ\sqrt{T} \cdot \log T)$ and $Ω(σ/ \sqrt{T \log (T)}) \cap \mathcal{O}(σ\log T / \sqrt{T})$ respectively, where $σ^2$ is the noise variance. Thus, our upper and lower bounds are tight up to a factor of $\mathcal{O}( (\log T)^{1.5} )$. The upper bound uses an algorithm based on confidence bounds and the Markov property of Brownian motion (among other useful properties), and the lower bound is based on a reduction to binary hypothesis testing.
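
The stated gap of $(\log T)^{1.5}$ follows directly from dividing each upper bound by the matching lower bound, ignoring constants. As a quick check using only the rates quoted in the abstract:

$$\frac{σ\sqrt{T}\,\log T}{σ\sqrt{T/\log T}} = \log T \cdot \sqrt{\log T} = (\log T)^{3/2}, \qquad \frac{σ\log T/\sqrt{T}}{σ/\sqrt{T\log T}} = \log T \cdot \sqrt{\log T} = (\log T)^{3/2}.$$

The same factor appears for both the cumulative and the simple regret, and the noise level $σ$ cancels in each ratio.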

preprint · 2022 · arXiv

Towards Adversarially Robust Deep Image Denoising

This work systematically investigates the adversarial robustness of deep image denoisers (DIDs), i.e., how well DIDs can recover the ground truth from noisy observations degraded by adversarial perturbations. Firstly, to evaluate DIDs' robustness, we propose a novel adversarial attack, namely Observation-based Zero-mean Attack (ObsAtk), to craft adversarial zero-mean perturbations on given noisy images. We find that existing DIDs are vulnerable to the adversarial noise generated by ObsAtk. Secondly, to robustify DIDs, we propose an adversarial training strategy, hybrid adversarial training (HAT), that jointly trains DIDs with adversarial and non-adversarial noisy data to ensure that the reconstruction quality is high and the denoisers around non-adversarial data are locally smooth. The resultant DIDs can effectively remove various types of synthetic and adversarial noise. We also uncover that the robustness of DIDs benefits their generalization capability on unseen real-world noise. Indeed, HAT-trained DIDs can recover high-quality clean images from real-world noise even without training on real noisy data. Extensive experiments on benchmark datasets, including Set68, PolyU, and SIDD, corroborate the effectiveness of ObsAtk and HAT.

preprint · 2021 · arXiv

An Interpretable Intensive Care Unit Mortality Risk Calculator

Mortality risk is a major concern to patients who have just been discharged from the intensive care unit (ICU). Many studies have been directed to construct machine learning models to predict such risk. Although these models are highly accurate, they are less amenable to interpretation and clinicians are typically unable to gain further insights into the patients' health conditions and the underlying factors that influence their mortality risk. In this paper, we use patients' profiles extracted from the MIMIC-III clinical database to construct risk calculators based on different machine learning techniques such as logistic regression, decision trees, random forests and multilayer perceptrons. We perform an extensive benchmarking study that compares the most salient features as predicted by various methods. We observe a high degree of agreement across the considered machine learning methods; in particular, the cardiac surgery recovery unit, age, and blood urea nitrogen levels are commonly predicted to be the most salient features for determining patients' mortality risks. Our work has the potential to help clinicians interpret risk predictions.

preprint · 2021 · arXiv

Community Detection and Matrix Completion with Social and Item Similarity Graphs

We consider the problem of recovering a binary rating matrix as well as clusters of users and items based on a partially observed matrix together with side-information in the form of social and item similarity graphs. These two graphs are both generated according to the celebrated stochastic block model (SBM). We develop lower and upper bounds on sample complexity that match for various scenarios. Our information-theoretic results quantify the benefits of the availability of the social and item similarity graphs. Further analysis reveals that under certain scenarios, the social and item similarity graphs produce an interesting synergistic effect. This means that observing two graphs is strictly better than observing just one in terms of reducing the sample complexity.

preprint · 2021 · arXiv

Distributionally Robust and Multi-Objective Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for analyzing nonnegative data. A key aspect of NMF is the choice of the objective function that depends on the noise model (or statistics of the noise) assumed on the data. In many applications, the noise model is unknown and difficult to estimate. In this paper, we define a multi-objective NMF (MO-NMF) problem, where several objectives are combined within the same NMF model. We propose to use Lagrange duality to judiciously optimize for a set of weights to be used within the framework of the weighted-sum approach, that is, we minimize a single objective function which is a weighted sum of all the objective functions. We design a simple algorithm based on multiplicative updates to minimize this weighted sum. We show how this can be used to find distributionally robust NMF (DR-NMF) solutions, that is, solutions that minimize the largest error among all objectives, using a dual approach solved via a heuristic inspired by the Frank-Wolfe algorithm. We illustrate the effectiveness of this approach on synthetic, document and audio data sets. The results show that DR-NMF is robust to our incognizance of the noise model of the NMF problem.
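
For readers unfamiliar with multiplicative updates, the sketch below shows the classical Lee and Seung updates for a single squared-Frobenius objective. It is not the weighted-sum MO-NMF or DR-NMF algorithm described in the abstract; the function name `nmf_multiplicative`, the rank, the iteration count and the random test matrix are assumptions made for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, iters=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V by W @ H using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(iters):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so W and H stay nonnegative without any projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H) ** 2)
    return W, H, errors


# Made-up nonnegative data matrix.
V = np.random.default_rng(1).random((6, 5))
W, H, errors = nmf_multiplicative(V, rank=2)
```

The appeal of this family of updates is that nonnegativity is preserved automatically, which is presumably why a multiplicative scheme is also a natural fit for a weighted sum of several NMF objectives.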

preprint · 2021 · arXiv

On Non-Interactive Simulation of Binary Random Variables

We leverage proof techniques from Fourier analysis and an existing result in coding theory to derive new bounds for the problem of non-interactive simulation of binary random variables. Previous bounds in the literature were derived by applying data processing inequalities concerning maximal correlation or hypercontractivity. We show that our bounds are sharp in some regimes. For a specific instance of problem parameters, our main result answers an open problem posed by E. Mossel in 2017. As by-products of our analyses, various new properties of the average distance and distance enumerator of binary block codes are established.

preprint · 2021 · arXiv

Risk-Constrained Thompson Sampling for CVaR Bandits

The multi-armed bandit (MAB) problem is a ubiquitous decision-making problem that exemplifies the exploration-exploitation tradeoff. Standard formulations exclude risk in decision making. Risk notably complicates the basic reward-maximising objective, in part because there is no universally agreed definition of it. In this paper, we consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR). We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure. We provide comprehensive comparisons between our regret bounds and those of state-of-the-art L/UCB-based algorithms in comparable settings and demonstrate their clear improvement in performance. We also include numerical simulations to empirically verify that CVaR-TS outperforms other L/UCB-based algorithms.

preprint · 2021 · arXiv

Sequential Classification with Empirically Observed Statistics

Motivated by real-world machine learning applications, we consider a statistical classification task in a sequential setting where test samples arrive sequentially. In addition, the generating distributions are unknown and only a set of empirically sampled sequences are available to a decision maker. The decision maker is tasked to classify a test sequence which is known to be generated according to either one of the distributions. In particular, for the binary case, the decision maker wishes to perform the classification task with the minimum number of test samples, so, at each step, she declares that either hypothesis 1 is true, hypothesis 2 is true, or she requests an additional test sample. We propose a classifier and analyze the type-I and type-II error probabilities. We demonstrate the significant advantage of our sequential scheme compared to an existing non-sequential classifier proposed by Gutman. Finally, we extend our setup and results to the multi-class classification scenario and again demonstrate that the variable-length nature of the problem affords significant advantages as one can achieve the same set of exponents as Gutman's fixed-length setting but without having the rejection option.

preprint · 2021 · arXiv

SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples

We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise with unknown statistics. Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure; that is, a structure belonging to the equivalence class containing the true tree. This paper presents a systematic improvement of Katiyar et al. (2020). First, we present a novel impossibility result by deriving a bound on the necessary number of samples for partial recovery. Second, we derive a significantly improved sample complexity result in which the dependence on the minimum correlation $ρ_{\min}$ is $ρ_{\min}^{-8}$ instead of $ρ_{\min}^{-24}$. Finally, we propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery. We provide error exponent analyses and extensive numerical results on a variety of trees to show that the sample complexity of SGA is significantly better than the algorithm of Katiyar et al. (2020). SGA can be readily extended to Gaussian models and is shown via numerical experiments to be similarly superior.

preprint · 2020 · arXiv

Asymptotic Expansions of Smooth Rényi Entropies and Their Applications

This study considers the unconditional smooth Rényi entropy, the smooth conditional Rényi entropy proposed by Kuzuoka [IEEE Trans. Inf. Theory, vol. 66, no. 3, pp. 1674-1690, 2020], and a new quantity which we term the conditional smooth Rényi entropy. In particular, we examine asymptotic expansions of these entropies when the underlying source with its side-information is stationary and memoryless. Using these smooth Rényi entropies, we establish one-shot coding theorems of several information-theoretic problems: Campbell's source coding, guessing problems, and task encoding problems, all allowing errors. In each problem, we consider two error formalisms: the average and maximum error criteria, where the averaging and maximization are taken with respect to the side-information of the source. Applying our asymptotic expansions to the derived one-shot coding theorems, we derive various asymptotic fundamental limits for these problems when their error probabilities are allowed to be non-vanishing. We show that, in non-degenerate settings, the first-order fundamental limits differ under the average and maximum error criteria. This is in contrast to a different but related setting considered by the present authors (for variable-length conditional source coding allowing errors) in which the first-order terms are identical but the second-order terms are different under these criteria.

preprint · 2020 · arXiv

Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting

We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items, also called an arm, within the framework of cascading bandits. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge, namely, that of probabilistically estimating the amount of available feedback at each step. To do so, we define a new class of random variables (r.v.'s) which we term as left-sided sub-Gaussian r.v.'s; these are r.v.'s whose cumulant generating functions (CGFs) can be bounded by a quadratic only for non-positive arguments of the CGFs. This enables the application of a sufficiently tight Bernstein-type concentration inequality. We show, through the derivation of a lower bound on the time complexity, that the performance of CascadeBAI is optimal in some practical regimes. Finally, extensive numerical simulations corroborate the efficacy of CascadeBAI as well as the tightness of our upper bound on its time complexity.

preprint · 2020 · arXiv

Corrections to "Wyner's Common Information under Rényi Divergence Measures"

In this correspondence, we correct an erroneous result on the achievability part of the Rényi common information with order $1+s\in(1,2]$ in [1]. The new achievability result (upper bound) of the Rényi common information no longer coincides with Wyner's common information. We also provide a new converse result (lower bound) in this correspondence for the Rényi common information with order $1+s\in(1,\infty]$. Numerical results show that for doubly symmetric binary sources, the new upper and lower bounds coincide for the order $1+s\in(1,2]$ and they are both strictly larger than Wyner's common information for this case.

preprint · 2020 · arXiv

Distributed Detection with Empirically Observed Statistics

Consider a distributed detection problem in which the underlying distributions of the observations are unknown; instead of these distributions, noisy versions of empirically observed statistics are available to the fusion center. These empirically observed statistics, together with source (test) sequences, are transmitted through different channels to the fusion center. The fusion center decides which distribution the source sequence is sampled from based on these data. For the binary case, we derive the optimal type-II error exponent given that the type-I error decays exponentially fast. The type-II error exponent is maximized over the proportions of channels for both source and training sequences. We conclude that as the ratio of the lengths of training to test sequences $α$ tends to infinity, using only one channel is optimal. By calculating the derived exponents numerically, we conjecture that the same is true when $α$ is finite. We relate our results to the classical distributed detection problem studied by Tsitsiklis, in which the underlying distributions are known. Finally, our results are extended to the case of $m$-ary distributed detection with a rejection option.

preprint · 2020 · arXiv

Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality

Granger causality is a widely-used criterion for analyzing interactions in large-scale networks. As most physical interactions are inherently nonlinear, we consider the problem of inferring the existence of pairwise Granger causality between nonlinearly interacting stochastic processes from their time series measurements. Our proposed approach relies on modeling the embedded nonlinearities in the measurements using a component-wise time series prediction model based on Statistical Recurrent Units (SRUs). We make a case that the network topology of Granger causal relations is directly inferrable from a structured sparse estimate of the internal parameters of the SRU networks trained to predict the processes' time series measurements. We propose a variant of SRU, called economy-SRU, which, by design, has considerably fewer trainable parameters and is therefore less prone to overfitting. The economy-SRU computes a low-dimensional sketch of its high-dimensional hidden state in the form of random projections to generate the feedback for its recurrent processing. Additionally, the internal weight parameters of the economy-SRU are strategically regularized in a group-wise manner to facilitate the proposed network in extracting meaningful predictive features that are highly time-localized to mimic real-world causal events. Extensive experiments are carried out to demonstrate that the proposed economy-SRU based time series prediction model outperforms the MLP, LSTM and attention-gated CNN-based time series models considered previously for inferring Granger causality.

preprint · 2020 · arXiv

Exact Asymptotics for Learning Tree-Structured Graphical Models with Side Information: Noiseless and Noisy Samples

Given side information that an Ising tree-structured graphical model is homogeneous and has no external field, we derive the exact asymptotics of learning its structure from independently drawn samples. Our results, which leverage the use of probabilistic tools from the theory of strong large deviations, refine the large deviation (error exponents) results of Tan, Anandkumar, Tong, and Willsky [IEEE Trans. on Inform. Th., 57(3):1714-1735, 2011] and strictly improve those of Bresler and Karzand [Ann. Statist., 2020]. In addition, we extend our results to the scenario in which the samples are observed in random noise. In this case, we show that they strictly improve on the recent results of Nikolakakis, Kalogerias, and Sarwate [Proc. AISTATS, 1771-1782, 2019]. Our theoretical results demonstrate keen agreement with experimental results for sample sizes as small as a few hundred.

preprint · 2020 · arXiv

On Exact and $\infty$-Rényi Common Informations

Recently, two extensions of Wyner's common information, namely the exact and Rényi common informations, were introduced respectively by Kumar, Li, and El Gamal (KLE), and the present authors. The class of common information problems involves determining the minimum rate of the common input to two independent processors needed to exactly or approximately generate a target joint distribution. For the exact common information problem, exact generation of the target distribution is required, while for Wyner's and $α$-Rényi common informations, the relative entropy and Rényi divergence with order $α$ were respectively used to quantify the discrepancy between the synthesized and target distributions. The exact common information is larger than or equal to Wyner's common information. However, it was hitherto unknown whether the former is strictly larger than the latter for some joint distributions. In this paper, we first establish the equivalence between the exact and $\infty$-Rényi common informations, and then provide single-letter upper and lower bounds for these two quantities. For doubly symmetric binary sources, we show that the upper and lower bounds coincide, which implies that for such sources, the exact and $\infty$-Rényi common informations are completely characterized. Interestingly, we observe that for such sources, these two common informations are strictly larger than Wyner's. This answers an open problem posed by KLE. Furthermore, we extend Wyner's, $\infty$-Rényi, and exact common informations to sources with countably infinite or continuous alphabets, including Gaussian sources.

preprint2020arXiv

On the Error Exponent of Approximate Sufficient Statistics for M-ary Hypothesis Testing

Consider the problem of detecting one of M i.i.d. Gaussian signals corrupted by white Gaussian noise. Conventionally, matched filters are used for detection. We first show that the outputs of the matched filter form a set of asymptotically optimal sufficient statistics in the sense of maximizing the error exponent of detecting the true signal. In practice, however, M may be large, which motivates the design and analysis of a reduced set of N statistics which we term approximate sufficient statistics. Our construction of these statistics is based on a small set of filters that project the outputs of the matched filters onto a lower-dimensional vector using a sensing matrix. We consider a sequence of sensing matrices that has the desiderata of row orthonormality and low coherence. We analyze the performance of the resulting maximum likelihood (ML) detector, which leads to an achievable bound on the error exponent based on the approximate sufficient statistics; this bound recovers the original error exponent when N = M. We compare this to a bound that we obtain by analyzing a modified form of the Reduced Dimensionality Detector (RDD) proposed by Xie, Eldar, and Goldsmith [IEEE Trans. on Inform. Th., 59(6):3858-3874, 2013]. We show that by setting the sensing matrices to be column-normalized group Hadamard matrices, the exponents derived are ensemble-tight, i.e., our analysis is tight on the exponential scale given the sensing matrices and the decoding rule. Finally, we derive some properties of the exponents, showing, in particular, that they increase linearly in the compression ratio N/M.

preprint2020arXiv

Second- and Third-Order Asymptotics of the Continuous-Time Poisson Channel

The paper derives the optimal second-order coding rate for the continuous-time Poisson channel. We also obtain bounds on the third-order coding rate. This is the first instance of a second-order result for a continuous-time channel. The converse proof hinges on a novel construction of an output distribution induced by Wyner's discretized channel and the construction of an appropriate $ε$-net of the input probability simplex. While the achievability proof follows the general program to prove the third-order term for non-singular discrete memoryless channels put forth by Polyanskiy, several non-standard techniques -- such as new definitions and bounds on the probabilities of typical sets using logarithmic Sobolev inequalities -- are employed to handle the continuous nature of the channel.

preprint2020arXiv

Second-Order Asymptotics of Sequential Hypothesis Testing

We consider the classical sequential binary hypothesis testing problem in which there are two hypotheses governed respectively by distributions $P_0$ and $P_1$ and we would like to decide which hypothesis is true using a sequential test. It is known from the work of Wald and Wolfowitz that as the expectation of the length of the test grows, the optimal type-I and type-II error exponents approach the relative entropies $D(P_1\|P_0)$ and $D(P_0\|P_1)$. We refine this result by considering the optimal backoff---or second-order asymptotics---from the corner point of the achievable exponent region $(D(P_1\|P_0),D(P_0\|P_1))$ under two different constraints on the length of the test (or the sample size). First, we consider a probabilistic constraint in which the probability that the length of test exceeds a prescribed integer $n$ is less than a certain threshold $0<\varepsilon <1$. Second, the expectation of the sample size is bounded by $n$. In both cases, and under mild conditions, the second-order asymptotics is characterized exactly. Numerical examples are provided to illustrate our results.
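As a small illustration of the corner point $(D(P_1\|P_0),D(P_0\|P_1))$ mentioned above, the following sketch computes the two relative entropies for a pair of finite-alphabet distributions. The function name `relative_entropy` and the example distributions `P0` and `P1` are assumptions chosen for illustration and do not come from the paper; the second-order terms are not computed here.

```python
import math


def relative_entropy(p, q):
    """Return D(p || q) in nats for two distributions on the same finite alphabet."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                # p puts mass where q has none, so the divergence is infinite.
                return math.inf
            total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical example distributions on a binary alphabet.
P0 = [0.5, 0.5]
P1 = [0.9, 0.1]

# First-order corner point of the achievable exponent region.
corner_point = (relative_entropy(P1, P0), relative_entropy(P0, P1))
```

Natural logarithms are used, so the exponents are in nats per sample; replacing `math.log` with `math.log2` would give bits instead.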

preprint2020arXiv

Thompson Sampling Algorithms for Mean-Variance Bandits

The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff. However, standard formulations do not take into account risk. In online decision making systems, risk is a primary concern. In this regard, the mean-variance risk measure is one of the most common objective functions. Existing algorithms for mean-variance optimization in the context of MAB problems have unrealistic assumptions on the reward distributions. We develop Thompson Sampling-style algorithms for mean-variance MAB and provide comprehensive regret analyses for Gaussian and Bernoulli bandits with fewer assumptions. Our algorithms achieve the best known regret bounds for mean-variance MABs and also attain the information-theoretic bounds in some parameter regimes. Empirical simulations show that our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.

preprint2020arXiv

Unsupervised Image Noise Modeling with Self-Consistent GAN

Noise modeling lies at the heart of many image processing tasks. However, existing deep learning methods for noise modeling generally require clean and noisy image pairs for model training; these image pairs are difficult to obtain in many realistic scenarios. To ameliorate this problem, we propose a self-consistent GAN (SCGAN) that can directly extract noise maps from noisy images, thus enabling unsupervised noise modeling. In particular, the SCGAN introduces three novel self-consistent constraints that are complementary to one another, viz.: the noise model should produce a zero response over a clean input; the noise model should return the same output when fed with a specific pure noise input; and the noise model also should re-extract a pure noise map if the map is added to a clean image. These three constraints are simple yet effective. They jointly facilitate unsupervised learning of a noise model for various noise types. To demonstrate its wide applicability, we deploy the SCGAN on three image processing tasks including blind image denoising, rain streak removal, and noisy image super-resolution. The results demonstrate the effectiveness and superiority of our method over the state-of-the-art methods on a variety of benchmark datasets, even though the noise types vary significantly and paired clean images are not available.

preprint2020arXiv

Variable-Length Source Dispersions Differ under Maximum and Average Error Criteria

Variable-length compression without prefix-free constraints and with side-information available at both encoder and decoder is considered. Instead of requiring the code to be error-free, we allow for it to have a non-vanishing error probability. We derive one-shot bounds on the optimal average codeword length by proposing two new information quantities; namely, the conditional and unconditional $\varepsilon$-cutoff entropies. Using these one-shot bounds, we obtain the second-order asymptotics of the problem under two different formalisms---the average and maximum probabilities of error over the realization of the side-information. While the first-order terms in the asymptotic expansions for both formalisms are identical, we find that the source dispersion under the average error formalism is, in most cases, strictly smaller than its maximum error counterpart. Applications to a certain class of guessing problems, previously studied by Kuzuoka [IEEE Trans. Inf. Theory, vol. 66, no. 3, pp. 1674--1690, 2020], are also discussed.

preprint2019arXiv

Fundamental Limits of Communication Over State-Dependent Channels With Feedback

The fundamental limits of communication over state-dependent discrete memoryless channels with noiseless feedback are studied, under the assumption that the communicating parties are allowed to use variable-length coding schemes. Various cases are analyzed, with the employed coding schemes having either bounded or unbounded codeword lengths, and with state information revealed to the encoder and/or decoder in a strictly causal, causal, or non-causal manner. In each of these settings, necessary and sufficient conditions for positivity of the zero-error capacity are obtained and it is shown that, whenever the zero-error capacity is positive, it equals the conventional vanishing-error capacity. Moreover, it is shown that the vanishing-error capacity of state-dependent channels is not increased by the use of feedback and variable-length coding. Both these kinds of capacities of state-dependent channels with feedback are thus fully characterized.

preprint2018arXiv

Asymptotically Optimal Codes Correcting Fixed-Length Duplication Errors in DNA Storage Systems

A (tandem) duplication of length $ k $ is an insertion of an exact copy of a substring of length $ k $ next to its original position. This and related types of impairments are of relevance in modeling communication in the presence of synchronization errors, as well as in several information storage applications. We demonstrate that Levenshtein's construction of binary codes correcting insertions of zeros is, with minor modifications, applicable also to channels with arbitrary alphabets and with duplication errors of arbitrary (but fixed) length $ k $. Furthermore, we derive bounds on the cardinality of optimal $ q $-ary codes correcting up to $ t $ duplications of length $ k $, and establish the following corollaries in the asymptotic regime of growing block-length: 1.) the presented family of codes is optimal for every $ q, t, k $, in the sense of the asymptotic scaling of code redundancy; 2.) the upper bound, when specialized to $ q = 2 $, $ k = 1 $, improves upon Levenshtein's bound for every $ t \geq 3 $; 3.) the bounds coincide for $ t = 1 $, thus yielding the exact asymptotic behavior of the size of optimal single-duplication-correcting codes.
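The definition of a tandem duplication given above can be made concrete with a minimal sketch. The helper name `tandem_duplicate` and its argument names are assumptions made for illustration; the code only models the error event and does not implement the code construction discussed in the abstract.

```python
def tandem_duplicate(word, start, k):
    """Insert an exact copy of word[start:start + k] right after that substring."""
    if k <= 0 or start < 0 or start + k > len(word):
        raise ValueError("the duplicated block must lie inside the word")
    block = word[start:start + k]
    # The copy is placed immediately next to the original block.
    return word[:start + k] + block + word[start + k:]


# Duplicating the length-2 block "bc" of "abcde" gives "abcbcde".
example = tandem_duplicate("abcde", 1, 2)
```

Each duplication lengthens the word by exactly $ k $ symbols, which is why a fixed $ k $ keeps the number of duplications recoverable from the received length.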

preprint2018arXiv

Codes in the Space of Multisets---Coding for Permutation Channels with Impairments

Motivated by communication channels in which the transmitted sequences are subject to random permutations, as well as by certain DNA storage systems, we study the error control problem in settings where the information is stored/transmitted in the form of multisets of symbols from a given finite alphabet. A general channel model is assumed in which the transmitted multisets are potentially impaired by insertions, deletions, substitutions, and erasures of symbols. Several constructions of error-correcting codes for this channel are described, and bounds on the size of optimal codes correcting any given number of errors are derived. The construction based on the notion of Sidon sets in finite Abelian groups is shown to be optimal, in the sense of the asymptotic scaling of code redundancy, for any "error radius" and any alphabet size. It is also shown to be optimal in the stronger sense of maximal code cardinality in various cases.

preprint2018arXiv

Strong Converse for Hypothesis Testing Against Independence over a Two-Hop Network

By proving a strong converse, we strengthen the weak converse result by Salehkalaibar, Wigger and Wang (2017) concerning hypothesis testing against independence over a two-hop network with communication constraints. Our proof follows by judiciously combining two recently proposed techniques for proving strong converse theorems, namely the strong converse technique via reverse hypercontractivity by Liu, van Handel, and Verdú (2017) and the strong converse technique by Tyagi and Watanabe (2018), in which the authors used a change-of-measure technique and replaced hard Markov constraints with soft information costs. The techniques used in our paper can also be applied to prove strong converse theorems for other multiterminal hypothesis testing against independence problems.

preprint2017arXiv

Improved Bounds on Sidon Sets via Lattice Packings of Simplices

A $ B_h $ set (or Sidon set of order $ h $) in an Abelian group $ G $ is any subset $ \{b_0, b_1, \ldots,b_{n}\} $ of $ G $ with the property that all the sums $ b_{i_1} + \cdots + b_{i_h} $ are different up to the order of the summands. Let $ ϕ(h,n) $ denote the order of the smallest Abelian group containing a $ B_h $ set of cardinality $ n + 1 $. It is shown that \[ \lim_{h \to \infty} \frac{ ϕ(h,n) }{ h^n } = \frac{1}{n! δ_L(\triangle^n)} , \] where $ δ_L(\triangle^n) $ is the lattice packing density of an $ n $-simplex in Euclidean space. This determines the asymptotics exactly in cases where this density is known ($ n \leq 3 $) and gives improved bounds on $ ϕ(h,n) $ in the remaining cases. The corresponding geometric characterization of bases of order $ h $ in finite Abelian groups in terms of lattice coverings by simplices is also given.
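The $ B_h $ property defined above can be checked by brute force in a cyclic group. The sketch below assumes the group is $ \mathbb{Z}_m $ and that the input elements are distinct residues; the function name `is_bh_set` and the example sets are illustrative choices, not taken from the paper.

```python
from itertools import combinations_with_replacement


def is_bh_set(elements, h, modulus):
    """Check whether `elements` is a B_h set in the cyclic group Z_modulus.

    Every multiset of h elements (sums taken up to the order of the
    summands) must produce a different sum modulo `modulus`.
    """
    seen = set()
    for combo in combinations_with_replacement(elements, h):
        total = sum(combo) % modulus
        if total in seen:
            return False
        seen.add(total)
    return True


# {0, 1, 3} is a B_2 (Sidon) set in Z_7: its six pairwise sums are all distinct.
sidon_example = is_bh_set([0, 1, 3], 2, 7)
# {0, 1, 2} is not, because 0 + 2 and 1 + 1 coincide.
non_example = is_bh_set([0, 1, 2], 2, 7)
```

The search enumerates all multisets of size $ h $, so it is only practical for small sets and small $ h $.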

preprint2017arXiv

Zero-Error Capacity of $P$-ary Shift Channels and FIFO Queues

The objects of study of this paper are communication channels in which the dominant type of noise is symbol shifts, the main motivating examples being timing and bit-shift channels. Two channel models are introduced and their zero-error capacities and zero-error-detection capacities determined by explicit constructions of optimal codes. Model A can be informally described as follows: 1) The information is stored in an $ n $-cell register, where each cell is either empty or contains a particle of one of $ P $ possible types, and 2) due to the imperfections of the device each of the particles may be shifted several cells away from its original position over time. Model B is an abstraction of a single-server queue: 1) The transmitter sends packets from a $ P $-ary alphabet through a queuing system with an infinite buffer and a First-In-First-Out (FIFO) service procedure, and 2) each packet is processed by the server for a random number of time slots. More general models including additional types of noise that the particles/packets can experience are also studied, as are the continuous-time versions of these problems.

preprint2010arXiv

Learning Gaussian Tree Models: Analysis of Error Exponents and Extremal Structures

The problem of learning tree-structured Gaussian graphical models from independent and identically distributed (i.i.d.) samples is considered. The influence of the tree structure and the parameters of the Gaussian distribution on the learning rate as the number of samples increases is discussed. Specifically, the error exponent corresponding to the event that the estimated tree structure differs from the actual unknown tree structure of the distribution is analyzed. Finding the error exponent reduces to a least-squares problem in the very noisy learning regime. In this regime, it is shown that the extremal tree structure that minimizes the error exponent is the star for any fixed set of correlation coefficients on the edges of the tree. If the magnitudes of all the correlation coefficients are less than 0.63, it is also shown that the tree structure that maximizes the error exponent is the Markov chain. In other words, the star and the chain graphs represent the hardest and the easiest structures to learn in the class of tree-structured Gaussian graphical models. This result can also be intuitively explained by correlation decay: pairs of nodes which are far apart, in terms of graph distance, are unlikely to be mistaken as edges by the maximum-likelihood estimator in the asymptotic regime.

preprint2010arXiv

Learning Latent Tree Graphical Models

We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset.