Source author record

Jonathan Scarlett

Jonathan Scarlett appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory · math.IT · Machine Learning · math.ST · Statistics Theory · eess.SP · math.PR · Social and Information Networks · Artificial Intelligence · Cryptography and Security · Data Structures and Algorithms · Discrete Mathematics · math.OC · Neural and Evolutionary Computing

Catalog footprint

What is connected

38 works

14 topics

4 close collaborators

preprint · 2022 · arXiv

A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

We consider the sequential optimization of an unknown, continuous, and expensive to evaluate reward function, from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $γ_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C γ_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T γ_T})$ for several commonly-considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
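
As a rough check on when the new corruption-dependent term is the smaller of the two quoted above, the bounds can be compared directly. This is a short worked inequality using only the two expressions in the abstract (constants and logarithmic factors ignored), not a statement taken from the paper.

```latex
\[
C\,\gamma_T^{3/2} \le C\sqrt{T\,\gamma_T}
\;\iff\; \gamma_T^{3} \le T\,\gamma_T
\;\iff\; \gamma_T \le \sqrt{T}.
\]
```

So the improvement applies to kernels whose maximal information gain grows more slowly than $\sqrt{T}$, for example the squared exponential kernel, for which $\gamma_T$ is polylogarithmic in $T$.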

preprint · 2022 · arXiv

Adversarial Attacks on Gaussian Process Bandits

Gaussian processes (GP) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. This paper studies this problem from the attacker's perspective, proposing various adversarial attack methods with differing assumptions on the attacker's strength and prior information. Our goal is to understand adversarial attacks on GP bandits from theoretical and practical perspectives. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some target region $\mathcal{R}_{\rm target}$. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and Clipping attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget, and we test our attacks' effectiveness on a diverse range of objective functions.

preprint · 2022 · arXiv

Generative Principal Component Analysis

In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate the performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis.

preprint · 2022 · arXiv

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization. It is well known that a major downside for kernel-based models is the high computational cost; given a dataset of $n$ samples, the cost grows as $\mathcal{O}(n^3)$. Existing sparse approximation methods can yield a significant reduction in the computational cost, effectively reducing the actual cost down to as low as $\mathcal{O}(n)$ in certain cases. Despite this remarkable empirical success, significant gaps remain in the existing results for the analytical bounds on the error due to approximation. In this work, we provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, which we establish using novel interpretations of the approximate (surrogate) posterior variance of the models. Our confidence intervals lead to improved performance bounds in both regression and optimization problems.

preprint · 2022 · arXiv

Max-Min Grouped Bandits

In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find the group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems and resource allocation, and is also closely related to widely-studied robust optimization problems. We present two algorithms based on successive elimination and robust optimization, and derive upper bounds on the number of samples to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.
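
The max-min objective itself is easy to state in code. The following is a minimal sketch that assumes the arm means are already known, so it shows only the target a bandit algorithm is trying to identify, not either of the paper's algorithms; the function name, arm means, and group layout are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def max_min_group(means: Sequence[float], groups: Dict[str, List[int]]) -> str:
    """Return the group whose worst (lowest-mean) arm has the highest mean."""
    return max(groups, key=lambda g: min(means[a] for a in groups[g]))


# Hypothetical instance: five arms, three groups; arm 2 belongs to two groups.
means = [0.9, 0.2, 0.6, 0.5, 0.7]
groups = {"A": [0, 1], "B": [2, 3], "C": [2, 4]}

# Worst arms: A -> 0.2, B -> 0.5, C -> 0.6, so group C is max-min optimal.
best = max_min_group(means, groups)
```

Group A contains the single best arm (mean 0.9) yet loses because of its weak second arm, which is what separates this objective from ordinary best-arm identification.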

preprint · 2022 · arXiv

Model-Based and Graph-Based Priors for Group Testing

The goal of the group testing problem is to identify a set of defective items within a larger set of items, using suitably-designed tests whose outcomes indicate whether any defective item is present. In this paper, we study how the number of tests can be significantly decreased by leveraging the structural dependencies between the items, i.e., by incorporating prior information. To do so, we pursue two different perspectives: (i) As a generalization of the uniform combinatorial prior, we consider the case that the defective set is uniform over a subset of all possible sets of a given size, and study how this impacts the information-theoretic limits on the number of tests for approximate recovery; (ii) As a generalization of the i.i.d. prior, we introduce a new class of priors based on the Ising model, where the associated graph represents interactions between items. We show that this naturally leads to an Integer Quadratic Program decoder, which can be converted to an Integer Linear Program and/or relaxed to a non-integer variant for improved computational complexity, while maintaining strong empirical recovery performance.

preprint · 2022 · arXiv

Tight Regret Bounds for Noisy Optimization of a Brownian Motion

We consider the problem of Bayesian optimization of a one-dimensional Brownian motion in which the $T$ adaptively chosen observations are corrupted by Gaussian noise. We show that the smallest possible expected cumulative regret and the smallest possible expected simple regret scale as $Ω(σ\sqrt{T / \log (T)}) \cap \mathcal{O}(σ\sqrt{T} \cdot \log T)$ and $Ω(σ/ \sqrt{T \log (T)}) \cap \mathcal{O}(σ\log T / \sqrt{T})$ respectively, where $σ^2$ is the noise variance. Thus, our upper and lower bounds are tight up to a factor of $\mathcal{O}( (\log T)^{1.5} )$. The upper bound uses an algorithm based on confidence bounds and the Markov property of Brownian motion (among other useful properties), and the lower bound is based on a reduction to binary hypothesis testing.
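
The stated $(\log T)^{1.5}$ gap follows by dividing each upper bound by the corresponding lower bound. The arithmetic is spelled out here as a check on the quoted rates, with constants ignored.

```latex
\[
\frac{\sigma\sqrt{T}\,\log T}{\sigma\sqrt{T/\log T}}
  = \log T \cdot \sqrt{\log T} = (\log T)^{3/2},
\qquad
\frac{\sigma\log T/\sqrt{T}}{\sigma/\sqrt{T\log T}}
  = \log T \cdot \frac{\sqrt{T\log T}}{\sqrt{T}} = (\log T)^{3/2}.
\]
```

The first ratio is for cumulative regret and the second for simple regret; both give the same gap.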

preprint · 2022 · arXiv

Universal 1-Bit Compressive Sensing for Bounded Dynamic Range Signals

A universal 1-bit compressive sensing (CS) scheme consists of a measurement matrix $A$ such that all signals $x$ belonging to a particular class can be approximately recovered from $\textrm{sign}(Ax)$. 1-bit CS models extreme quantization effects where only one bit of information is revealed per measurement. We focus on universal support recovery for 1-bit CS in the case of sparse signals with bounded dynamic range. Specifically, a vector $x \in \mathbb{R}^n$ is said to have sparsity $k$ if it has at most $k$ nonzero entries, and dynamic range $R$ if the ratio between its largest and smallest nonzero entries is at most $R$ in magnitude. Our main result shows that if the entries of the measurement matrix $A$ are i.i.d. Gaussians, then under mild assumptions on the scaling of $k$ and $R$, the number of measurements needs to be $\tilde{\Omega}(Rk^{3/2})$ to recover the support of $k$-sparse signals with dynamic range $R$ using $1$-bit CS. In addition, we show that a near-matching $O(R k^{3/2} \log n)$ upper bound follows as a simple corollary of known results. The $k^{3/2}$ scaling contrasts with the known lower bound of $\tilde{\Omega}(k^2 \log n)$ for the number of measurements to recover the support of arbitrary $k$-sparse signals.
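
The definitions of sparsity, dynamic range, and 1-bit measurements can be made concrete with a small NumPy sketch. The helper names, the example vector, and the dimensions below are illustrative assumptions, and no recovery algorithm is attempted.

```python
import numpy as np


def sparsity(x: np.ndarray) -> int:
    """Number of nonzero entries of x."""
    return int(np.count_nonzero(x))


def dynamic_range(x: np.ndarray) -> float:
    """Ratio of largest to smallest nonzero magnitude (assumes x is not all zero)."""
    mags = np.abs(x[np.nonzero(x)])
    return float(mags.max() / mags.min())


rng = np.random.default_rng(0)
n, m = 8, 20                       # signal length and number of measurements
x = np.zeros(n)
x[[1, 4, 6]] = [2.0, -0.5, 1.0]    # a 3-sparse signal with dynamic range 2.0 / 0.5

A = rng.standard_normal((m, n))    # i.i.d. Gaussian measurement matrix
y = np.sign(A @ x)                 # each measurement reveals a single sign bit
```

Because sign(Ax) is unchanged when x is multiplied by a positive constant, the measurements carry no information about the overall scale of x, which is one reason support recovery is the natural target.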

preprint · 2020 · arXiv

A Characteristic Function Approach to Deep Implicit Generative Modeling

Implicit Generative Models (IGMs) such as GANs have emerged as effective data-driven models for generating samples, particularly images. In this paper, we formulate the problem of learning an IGM as minimizing the expected distance between characteristic functions. Specifically, we minimize the distance between characteristic functions of the real and generated data distributions under a suitably-chosen weighting distribution. This distance metric, which we term the characteristic function distance (CFD), can be (approximately) computed with linear time-complexity in the number of samples, in contrast with the quadratic-time Maximum Mean Discrepancy (MMD). By replacing the discrepancy measure in the critic of a GAN with the CFD, we obtain a model that is simple to implement and stable to train. The proposed metric enjoys desirable theoretical properties including continuity and differentiability with respect to generator parameters, and continuity in the weak topology. We further propose a variation of the CFD in which the weighting distribution parameters are also optimized during training; this obviates the need for manual tuning, and leads to an improvement in test power relative to CFD. We demonstrate experimentally that our proposed method outperforms WGAN and MMD-GAN variants on a variety of unsupervised image generation benchmarks.
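
A Monte Carlo estimate of a squared characteristic function distance can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the weighting distribution is taken to be a standard Gaussian over frequencies, the function name and sample sizes are made up, and none of it is the authors' implementation or their GAN critic.

```python
import numpy as np


def empirical_cfd_sq(x: np.ndarray, y: np.ndarray, freqs: np.ndarray) -> float:
    """Average over sampled frequencies of |phi_x(w) - phi_y(w)|^2.

    x: (n, d) samples, y: (m, d) samples, freqs: (k, d) frequencies drawn
    from the weighting distribution. Cost is O((n + m) * k * d), i.e. linear
    in the number of samples.
    """
    phi_x = np.exp(1j * (x @ freqs.T)).mean(axis=0)  # empirical char. function, shape (k,)
    phi_y = np.exp(1j * (y @ freqs.T)).mean(axis=0)
    return float(np.mean(np.abs(phi_x - phi_y) ** 2))


rng = np.random.default_rng(0)
d, n, k = 2, 1000, 256
freqs = rng.standard_normal((k, d))          # assumed Gaussian weighting distribution
real = rng.standard_normal((n, d))
close = rng.standard_normal((n, d))          # same distribution as `real`
far = rng.standard_normal((n, d)) + 3.0      # mean-shifted distribution

cfd_close = empirical_cfd_sq(real, close, freqs)
cfd_far = empirical_cfd_sq(real, far, freqs)
```

Each sample is touched once per frequency, so no pairwise kernel matrix is formed; this is the source of the linear-versus-quadratic contrast with MMD mentioned in the abstract.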

preprint · 2020 · arXiv

A Fast Binary Splitting Approach to Non-Adaptive Group Testing

In this paper, we consider the problem of noiseless non-adaptive group testing under the for-each recovery guarantee, also known as probabilistic group testing. In the case of $n$ items and $k$ defectives, we provide an algorithm attaining high-probability recovery with $O(k \log n)$ scaling in both the number of tests and runtime, improving on the best known $O(k^2 \log k \cdot \log n)$ runtime previously available for any algorithm that only uses $O(k \log n)$ tests. Our algorithm bears resemblance to Hwang's adaptive generalized binary splitting algorithm (Hwang, 1972); we recursively work with groups of items of geometrically vanishing sizes, while maintaining a list of "possibly defective" groups and circumventing the need for adaptivity. While the most basic form of our algorithm requires $Ω(n)$ storage, we also provide a low-storage variant based on hashing, with similar recovery guarantees.

preprint · 2020 · arXiv

Corruption-Tolerant Gaussian Process Bandit Optimization

We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), based on noisy bandit feedback. We consider a novel variant of this problem in which the point evaluations are not only corrupted by random noise, but also adversarial corruptions. We introduce an algorithm Fast-Slow GP-UCB based on Gaussian process methods, randomized selection between two instances labeled "fast" (but non-robust) and "slow" (but robust), enlarged confidence bounds, and the principle of optimism under uncertainty. We present a novel theoretical analysis upper bounding the cumulative regret in terms of the corruption level, the time horizon, and the underlying kernel, and we argue that certain dependencies cannot be improved. We observe that distinct algorithmic ideas are required depending on whether one is required to perform well in both the corrupted and non-corrupted settings, and whether the corruption level is known or not.

preprint · 2020 · arXiv

High-Dimensional Bayesian Optimization via Tree-Structured Additive Models

Bayesian Optimization (BO) has shown significant success in tackling expensive low-dimensional black-box optimization problems. Many optimization problems of interest are high-dimensional, and scaling BO to such settings remains an important challenge. In this paper, we consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to model a high-dimensional target function. Our goal is to lower the computational resources required and facilitate faster model learning by reducing the model complexity while retaining the sample-efficiency of existing methods. Specifically, we constrain the underlying dependency graphs to tree structures in order to facilitate both the structure learning and optimization of the acquisition function. For the former, we propose a hybrid graph learning algorithm based on Gibbs sampling and mutation. In addition, we propose a novel zooming-based algorithm that permits generalized additive models to be employed more efficiently in the case of continuous domains. We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets.

preprint · 2020 · arXiv

Information-Theoretic Lower Bounds for Compressive Sensing with Generative Models

It has recently been shown that for compressive sensing, significantly fewer measurements may be required if the sparsity assumption is replaced by the assumption that the unknown vector lies near the range of a suitably-chosen generative model. In particular, in (Bora et al., 2017) it was shown that roughly $O(k\log L)$ random Gaussian measurements suffice for accurate recovery when the generative model is an $L$-Lipschitz function with bounded $k$-dimensional inputs, and $O(kd \log w)$ measurements suffice when the generative model is a $k$-input ReLU network with depth $d$ and width $w$. In this paper, we establish corresponding algorithm-independent lower bounds on the sample complexity using tools from minimax statistical analysis. In accordance with the above upper bounds, our results are summarized as follows: (i) We construct an $L$-Lipschitz generative model capable of generating group-sparse signals, and show that the resulting necessary number of measurements is $Ω(k \log L)$; (ii) Using similar ideas, we construct ReLU networks with high depth and/or high width for which the necessary number of measurements scales as $Ω\big( kd \frac{\log w}{\log n}\big)$ (with output dimension $n$), and in some cases $Ω(kd \log w)$. As a result, we establish that the scaling laws derived in (Bora et al., 2017) are optimal or near-optimal in the absence of further assumptions.

preprint · 2020 · arXiv

Learning Erdős-Rényi Random Graphs via Edge Detecting Queries

In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $Ω( \min\{ k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k}\log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$.

preprint · 2020 · arXiv

Learning Gaussian Graphical Models via Multiplicative Weights

Graphical model selection in Markov random fields is a fundamental problem in statistics and machine learning. Two particularly prominent models, the Ising model and Gaussian model, have largely developed in parallel using different (though often related) techniques, and several practical algorithms with rigorous sample complexity bounds have been established for each. In this paper, we adapt a recently proposed algorithm of Klivans and Meka (FOCS, 2017), based on the method of multiplicative weight updates, from the Ising model to the Gaussian model, via non-trivial modifications to both the algorithm and its analysis. The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature, has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.

preprint · 2020 · arXiv

Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors

The goal of standard 1-bit compressive sensing is to accurately recover an unknown sparse vector from binary-valued measurements, each indicating the sign of a linear function of the vector. Motivated by recent advances in compressive sensing with generative models, where a generative modeling assumption replaces the usual sparsity assumption, we study the problem of 1-bit compressive sensing with generative models. We first consider noiseless 1-bit measurements, and provide sample complexity bounds for approximate recovery under i.i.d. Gaussian measurements and a Lipschitz continuous generative prior, as well as a near-matching algorithm-independent lower bound. Moreover, we demonstrate that the Binary $ε$-Stable Embedding property, which characterizes the robustness of the reconstruction to measurement errors and noise, also holds for 1-bit compressive sensing with Lipschitz continuous generative models with sufficiently many Gaussian measurements. In addition, we apply our results to neural network generative models, and provide a proof-of-concept numerical experiment demonstrating significant improvements over sparsity-based approaches.

preprint · 2020 · arXiv

Sublinear-Time Non-Adaptive Group Testing with $O(k \log n)$ Tests via Bit-Mixing Coding

The group testing problem consists of determining a small set of defective items from a larger set of items based on tests on groups of items, and is relevant in applications such as medical testing, communication protocols, pattern matching, and many more. While rigorous group testing algorithms have long been known with runtime at least linear in the number of items, a recent line of works has sought to reduce the runtime to ${\rm poly}(k \log n)$, where $n$ is the number of items and $k$ is the number of defectives. In this paper, we present such an algorithm for non-adaptive probabilistic group testing termed bit mixing coding (BMC), which builds on techniques that encode item indices in the test matrix, while incorporating novel ideas based on erasure-correction coding. We show that BMC achieves asymptotically vanishing error probability with $O(k \log n)$ tests and $O(k^2 \cdot \log k \cdot \log n)$ runtime, in the limit as $n \to \infty$ (with $k$ having an arbitrary dependence on $n$). This closes a recently-proposed open problem of simultaneously achieving ${\rm poly}(k \log n)$ decoding time using $O(k \log n)$ tests without any assumptions on $k$. In addition, we show that the same scaling laws can be attained in a commonly-considered noisy setting, in which each test outcome is flipped with constant probability.

preprint · 2016 · arXiv

Converse Bounds for Noisy Group Testing with Arbitrary Measurement Matrices

We consider the group testing problem, in which one seeks to identify a subset of defective items within a larger set of items based on a number of noisy tests. While matching achievability and converse bounds are known in several cases of interest for i.i.d. measurement matrices, less is known regarding converse bounds for arbitrary measurement matrices. We address this by presenting two converse bounds for arbitrary matrices and general noise models. First, we provide a strong converse bound ($\mathbb{P}[\mathrm{error}] \to 1$) that matches existing achievability bounds in several cases of interest. Second, we provide a weak converse bound ($\mathbb{P}[\mathrm{error}] \not\to 0$) that matches existing achievability bounds in greater generality.

preprint · 2016 · arXiv

Improved group testing rates with constant column weight designs

We consider nonadaptive group testing where each item is placed in a constant number of tests. The tests are chosen uniformly at random with replacement, so the testing matrix has (almost) constant column weights. We show that performance is improved compared to Bernoulli designs, where each item is placed in each test independently with a fixed probability. In particular, we show that the rate of the practical COMP detection algorithm is increased by 31% in all sparsity regimes. In dense cases, this beats the best possible algorithm with Bernoulli tests, and in sparse cases is the best proven performance of any practical algorithm. We also give an algorithm-independent upper bound for the constant column weight case; for dense cases this is again a 31% increase over the analogous Bernoulli result.
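
The design and the COMP decoder described above can be sketched as follows in the noiseless setting. The problem sizes, the column weight, and the function names are arbitrary illustrative assumptions rather than the tuned values analyzed in the paper.

```python
import numpy as np


def constant_column_weight_design(n_items, n_tests, weight, rng):
    """Place each item in `weight` tests drawn uniformly at random with replacement.

    Repeated draws can coincide, so column weights are at most (not exactly) `weight`.
    """
    X = np.zeros((n_tests, n_items), dtype=bool)
    for item in range(n_items):
        X[rng.integers(0, n_tests, size=weight), item] = True
    return X


def comp_decode(X, outcomes):
    """COMP: any item appearing in a negative test is non-defective; keep the rest."""
    in_negative_test = X[~outcomes].any(axis=0)
    return np.flatnonzero(~in_negative_test)


rng = np.random.default_rng(1)
n_items, n_tests, weight = 200, 60, 6
defective = np.array([3, 57, 120])           # hypothetical ground truth

X = constant_column_weight_design(n_items, n_tests, weight, rng)
outcomes = X[:, defective].any(axis=1)       # noiseless: positive iff a defective is present
estimate = comp_decode(X, outcomes)
```

With noiseless tests COMP never misses a defective item, because a defective item cannot appear in a negative test; its only errors are false positives, and the choice of design affects how many of those occur.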

preprint · 2016 · arXiv

Learning-based Compressive Subsampling

The problem of recovering a structured signal $\mathbf{x} \in \mathbb{C}^p$ from a set of dimensionality-reduced linear measurements $\mathbf{b} = \mathbf{A}\mathbf{x}$ arises in a variety of applications, such as medical imaging, spectroscopy, Fourier optics, and computerized tomography. Due to computational and storage complexity or physical constraints imposed by the problem, the measurement matrix $\mathbf{A} \in \mathbb{C}^{n \times p}$ is often of the form $\mathbf{A} = \mathbf{P}_{\Omega}\boldsymbol{\Psi}$ for some orthonormal basis matrix $\boldsymbol{\Psi} \in \mathbb{C}^{p \times p}$ and subsampling operator $\mathbf{P}_{\Omega}: \mathbb{C}^{p} \rightarrow \mathbb{C}^{n}$ that selects the rows indexed by $\Omega$. This raises the fundamental question of how best to choose the index set $\Omega$ in order to optimize the recovery performance. Previous approaches to addressing this question rely on non-uniform random subsampling using application-specific knowledge of the structure of $\mathbf{x}$. In this paper, we instead take a principled learning-based approach in which a fixed index set is chosen based on a set of training signals $\mathbf{x}_1,\dotsc,\mathbf{x}_m$. We formulate combinatorial optimization problems seeking to maximize the energy captured in these signals in an average-case or worst-case sense, and we show that these can be efficiently solved either exactly or approximately via the identification of modularity and submodularity structures. We provide both deterministic and statistical theoretical guarantees showing how the resulting measurement matrices perform on signals differing from the training signals, and we provide numerical examples showing our approach to be effective on a variety of data sets.

preprint · 2016 · arXiv

Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and arises in a diverse range of settings such as compressive sensing, subset selection in regression, and group testing. In this paper, we take a unified approach to support recovery problems, considering general probabilistic models relating a sparse data vector to an observation vector. We study the information-theoretic limits of both exact and partial support recovery, taking a novel approach motivated by thresholding techniques in channel coding. We provide general achievability and converse bounds characterizing the trade-off between the error probability and number of measurements, and we specialize these to the linear, 1-bit, and group testing models. In several cases, our bounds not only provide matching scaling laws in the necessary and sufficient number of measurements, but also sharp thresholds with matching constant factors. Our approach has several advantages over previous approaches: For the achievability part, we obtain sharp thresholds under broader scalings of the sparsity level and other parameters (e.g., signal-to-noise ratio) compared to several previous works, and for the converse part, we not only provide conditions under which the error probability fails to vanish, but also conditions under which it tends to one.

preprint · 2016 · arXiv

Multiuser Random Coding Techniques for Mismatched Decoding

This paper studies multiuser random coding techniques for channel coding with a given (possibly suboptimal) decoding rule. For the mismatched discrete memoryless multiple-access channel, an error exponent is obtained that is tight with respect to the ensemble average, and positive within the interior of Lapidoth's achievable rate region. This exponent proves the ensemble tightness of the exponent of Liu and Hughes in the case of maximum-likelihood decoding. An equivalent dual form of Lapidoth's achievable rate region is given, and the latter is shown to extend immediately to channels with infinite and continuous alphabets. In the setting of single-user mismatched decoding, similar analysis techniques are applied to a refined version of superposition coding, which is shown to achieve rates at least as high as standard superposition coding for any set of random-coding parameters.

preprint · 2016 · arXiv

On the Difficulty of Selecting Ising Models with Approximate Recovery

In this paper, we consider the problem of estimating the underlying graph associated with an Ising model given a number of independent and identically distributed samples. We adopt an approximate recovery criterion that allows for a number of missed edges or incorrectly-included edges, in contrast with the widely-studied exact recovery problem. Our main results provide information-theoretic lower bounds on the sample complexity for graph classes imposing constraints on the number of edges, maximal degree, and other properties. We identify a broad range of scenarios where, either up to constant factors or logarithmic factors, our lower bounds match the best known lower bounds for the exact recovery criterion, several of which are known to be tight or near-tight. Hence, in these cases, approximate recovery has a similar difficulty to exact recovery in the minimax sense. Our bounds are obtained via a modification of Fano's inequality for handling the approximate recovery criterion, along with suitably-designed ensembles of graphs that can broadly be classed into two categories: (i) Those containing graphs that contain several isolated edges or cliques and are thus difficult to distinguish from the empty graph; (ii) Those containing graphs for which certain groups of nodes are highly correlated, thus making it difficult to determine precisely which edges connect them. We support our theoretical results on these ensembles with numerical experiments.

preprint · 2016 · arXiv

Partial Recovery Bounds for the Sparse Stochastic Block Model

In this paper, we study the information-theoretic limits of community detection in the symmetric two-community stochastic block model, with intra-community and inter-community edge probabilities $\frac{a}{n}$ and $\frac{b}{n}$ respectively. We consider the sparse setting, in which $a$ and $b$ do not scale with $n$, and provide upper and lower bounds on the proportion of community labels recovered on average. We provide a numerical example for which the bounds are near-matching for moderate values of $a - b$, and matching in the limit as $a-b$ grows large.

preprint · 2016 · arXiv

The Dispersion of Nearest-Neighbor Decoding for Additive Non-Gaussian Channels

We study the second-order asymptotics of information transmission using random Gaussian codebooks and nearest neighbor (NN) decoding over a power-limited stationary memoryless additive non-Gaussian noise channel. We show that the dispersion term depends on the non-Gaussian noise only through its second and fourth moments, thus complementing the capacity result (Lapidoth, 1996), which depends only on the second moment. Furthermore, we characterize the second-order asymptotics of point-to-point codes over $K$-sender interference networks with non-Gaussian additive noise. Specifically, we assume that each user's codebook is Gaussian and that NN decoding is employed, i.e., that interference from the $K-1$ unintended users (Gaussian interfering signals) is treated as noise at each decoder. We show that while the first-order term in the asymptotic expansion of the maximum number of messages depends on the power of the interfering codewords only through their sum, this does not hold for the second-order term.

preprint · 2016 · arXiv

Time-Varying Gaussian Process Bandit Optimization

We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally.

preprint · 2016 · arXiv

Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation

We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.

preprint 2015 arXiv

A Counter-Example to the Mismatched Decoding Converse for Binary-Input Discrete Memoryless Channels

This paper studies the mismatched decoding problem for binary-input discrete memoryless channels. An example is provided for which an achievable rate based on superposition coding exceeds the LM rate (Hui, 1983; Csiszár-Körner, 1981), thus providing a counter-example to a previously reported converse result (Balakirsky, 1995). Both numerical evaluations and theoretical results are used in establishing this claim.

preprint 2015 arXiv

Second-Order Asymptotics for the Discrete Memoryless MAC with Degraded Message Sets

This paper studies the second-order asymptotics of the discrete memoryless multiple-access channel with degraded message sets. For a fixed average error probability $ε\in(0,1)$ and an arbitrary point on the boundary of the capacity region, we characterize the speed of convergence of rate pairs that converge to that point for codes that have asymptotic error probability no larger than $ε$, thus complementing an analogous result given previously for the Gaussian setting.

preprint 2015 arXiv

Second-Order Asymptotics for the Gaussian MAC with Degraded Message Sets

This paper studies the second-order asymptotics of the Gaussian multiple-access channel with degraded message sets. For a fixed average error probability $\varepsilon \in (0,1)$ and an arbitrary point on the boundary of the capacity region, we characterize the speed of convergence of rate pairs that converge to that boundary point for codes that have asymptotic error probability no larger than $\varepsilon$. As a stepping stone to this local notion of second-order asymptotics, we study a global notion, and establish relationships between the two. We provide a numerical example to illustrate how the angle of approach to a boundary point affects the second-order coding rate. This is the first conclusive characterization of the second-order asymptotics of a network information theory problem in which the capacity region is not a polygon.

preprint 2014 arXiv

Expurgated Random-Coding Ensembles: Exponents, Refinements and Connections

This paper studies expurgated random-coding bounds and exponents for channel coding with a given (possibly suboptimal) decoding rule. Variations of Gallager's analysis are presented, yielding several asymptotic and non-asymptotic bounds on the error probability for an arbitrary codeword distribution. A simple non-asymptotic bound is shown to attain an exponent of Csiszár and Körner under constant-composition coding. Using Lagrange duality, this exponent is expressed in several forms, one of which is shown to permit a direct derivation via cost-constrained coding which extends to infinite and continuous alphabets. The method of type class enumeration is studied, and it is shown that this approach can yield improved exponents and better tightness guarantees for some codeword distributions. A generalization of this approach is shown to provide a multi-letter exponent which extends immediately to channels with memory. Finally, a refined analysis of expurgated i.i.d. random coding is shown to yield an $O\big(\frac{1}{\sqrt{n}}\big)$ prefactor, thus improving on the standard $O(1)$ prefactor. Moreover, the implied constant is explicitly characterized.

preprint 2014 arXiv

Mismatched Decoding: Error Exponents, Second-Order Rates and Saddlepoint Approximations

This paper considers the problem of channel coding with a given (possibly suboptimal) maximum-metric decoding rule. A cost-constrained random-coding ensemble with multiple auxiliary costs is introduced, and is shown to achieve error exponents and second-order coding rates matching those of constant-composition random coding, while being directly applicable to channels with infinite or continuous alphabets. The number of auxiliary costs required to match the error exponents and second-order rates of constant-composition coding is studied, and is shown to be at most two. For i.i.d. random coding, asymptotic estimates of two well-known non-asymptotic bounds are given using saddlepoint approximations. Each expression is shown to characterize the asymptotic behavior of the corresponding random-coding bound at both fixed and varying rates, thus unifying the regimes characterized by error exponents, second-order rates and moderate deviations. For fixed rates, novel exact asymptotics expressions are obtained to within a multiplicative $1+o(1)$ term. Using numerical examples, it is shown that the saddlepoint approximations are highly accurate even at short block lengths.

preprint 2014 arXiv

Second-Order Rate Region of Constant-Composition Codes for the Multiple-Access Channel

This paper studies the second-order asymptotics of coding rates for the discrete memoryless multiple-access channel with a fixed target error probability. Using constant-composition random coding, coded time-sharing, and a variant of Hoeffding's combinatorial central limit theorem, an inner bound on the set of locally achievable second-order coding rates is given for each point on the boundary of the capacity region. It is shown that the inner bound for constant-composition random coding includes that recovered by i.i.d. random coding, and that the inclusion may be strict. The inner bound is extended to the Gaussian multiple-access channel via an increasingly fine quantization of the inputs.

preprint 2014 arXiv

Sparsistency of $\ell_1$-Regularized $M$-Estimators

We consider the model selection consistency or sparsistency of a broad set of $\ell_1$-regularized $M$-estimators for linear and non-linear statistical models in a unified fashion. For this purpose, we propose the local structured smoothness condition (LSSC) on the loss function. We provide a general result giving deterministic sufficient conditions for sparsistency in terms of the regularization parameter, ambient dimension, sparsity level, and number of measurements. We show that several important statistical models have $M$-estimators that indeed satisfy the LSSC, and as a result, the sparsistency guarantees for the corresponding $\ell_1$-regularized $M$-estimators can be derived as simple applications of our main theorem.
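
As a concrete instance of the sparsistency question, the sketch below fits the Lasso (squared loss with an $\ell_1$ penalty, one member of the $M$-estimator family) and reads off the estimated support. It is an illustration only: the iterative soft-thresholding solver, the toy identity design, and the names `soft_threshold`, `lasso_ista` and `support` are assumptions made here, and the paper's LSSC condition is not checked.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrink each entry toward zero by tau.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=200):
    # Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by iterative soft-thresholding.
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / largest eigenvalue of X^T X
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def support(beta, tol=1e-8):
    # Indices of coefficients treated as non-zero.
    return set(np.flatnonzero(np.abs(beta) > tol).tolist())

# Toy check: with an orthonormal design the estimate is a soft-thresholded X^T y.
X = np.eye(4)
beta_true = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ beta_true + np.array([0.1, -0.1, 0.05, 0.2])
beta_hat = lasso_ista(X, y, lam=0.5)
recovered = support(beta_hat) == support(beta_true)
```

In this toy case the regularization parameter exceeds the noise on the zero coordinates and stays below the smallest non-zero signal, so the estimated support matches the true one. That balance is the kind of condition on the regularization parameter that sparsistency results make precise.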

preprint 2014 arXiv

The Saddlepoint Approximation: Unified Random Coding Asymptotics for Fixed and Varying Rates

This paper presents a saddlepoint approximation of the random-coding union bound of Polyanskiy et al. for i.i.d. random coding over discrete memoryless channels. The approximation is single-letter, and can thus be computed efficiently. Moreover, it is shown to be asymptotically tight for both fixed and varying rates, unifying existing achievability results in the regimes of error exponents, second-order coding rates, and moderate deviations. For fixed rates, novel exact-asymptotics expressions are specified to within a multiplicative $1+o(1)$ term. A numerical example is provided for which the approximation is remarkably accurate even at short block lengths.

preprint 2013 arXiv

A Derivation of the Asymptotic Random-Coding Prefactor

This paper studies the subexponential prefactor to the random-coding bound for a given rate. Using a refinement of Gallager's bounding techniques, an alternative proof of a recent result by Altuğ and Wagner is given, and the result is extended to the setting of mismatched decoding.
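
For context, the random-coding bound has the form $\exp(-n E_r(R))$ multiplied by a subexponential prefactor, where $E_r(R) = \max_{\rho \in [0,1]} E_0(\rho) - \rho R$ is Gallager's random-coding exponent. The sketch below evaluates only the exponent, not the prefactor studied in the paper. The binary symmetric channel with uniform inputs, the grid search over $\rho$, and the function names are illustrative assumptions.

```python
import math

def gallager_e0_bsc(rho, p):
    # Gallager's E_0 function (in nats) for a BSC with crossover p and uniform inputs.
    s = 1.0 / (1.0 + rho)
    return rho * math.log(2.0) - (1.0 + rho) * math.log(p ** s + (1.0 - p) ** s)

def random_coding_exponent_bsc(rate_nats, p, grid=1001):
    # E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R, via a simple grid search.
    rhos = (k / (grid - 1) for k in range(grid))
    return max(gallager_e0_bsc(rho, p) - rho * rate_nats for rho in rhos)

def bsc_capacity_nats(p):
    # Capacity of the BSC in nats: ln 2 minus the binary entropy of p.
    return math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
```

For a crossover probability of 0.1, the exponent is positive at rates below `bsc_capacity_nats(0.1)` and falls to zero at capacity, which is the regime boundary where second-order and prefactor refinements become the relevant description.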

preprint 2013 arXiv

On the Dispersions of the Gel'fand-Pinsker Channel and Dirty Paper Coding

This paper studies second-order coding rates for memoryless channels with a state sequence known non-causally at the encoder. In the case of finite alphabets, an achievability result is obtained using constant-composition random coding, and by using a small fraction of the block to transmit the type of the state sequence. For error probabilities less than 1/2, it is shown that the second-order rate improves on an existing one based on i.i.d. random coding. In the Gaussian case (dirty paper coding) with an almost-sure power constraint, an achievability result is obtained using random coding over the surface of a sphere, and using a small fraction of the block to transmit a quantized description of the state power. It is shown that the second-order asymptotics are identical to the single-user Gaussian channel of the same input power without a state.
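
The single-user Gaussian benchmark mentioned at the end of this abstract can be evaluated numerically. The sketch below uses the standard capacity and dispersion of the real AWGN channel at signal-to-noise ratio $P$, namely $C = \frac{1}{2}\log_2(1+P)$ and $V = \frac{P(P+2)}{2(P+1)^2}\log_2^2 e$, in the normal approximation $R \approx C - \sqrt{V/n}\,Q^{-1}(\varepsilon)$. These formulas are the well-known single-user results and are not quoted from this paper. The function names are hypothetical and the lower-order $O(\log n / n)$ term is dropped.

```python
import math
from statistics import NormalDist

def awgn_capacity_bits(snr):
    # Capacity of the real AWGN channel in bits per channel use.
    return 0.5 * math.log2(1.0 + snr)

def awgn_dispersion_bits2(snr):
    # Channel dispersion of the real AWGN channel in bits^2 per channel use.
    return snr * (snr + 2.0) / (2.0 * (snr + 1.0) ** 2) * math.log2(math.e) ** 2

def normal_approx_rate(snr, n, eps):
    # Second-order (normal) approximation of the best rate at block length n
    # and error probability eps; Q^{-1}(eps) is the standard normal quantile at 1 - eps.
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    return awgn_capacity_bits(snr) - math.sqrt(awgn_dispersion_bits2(snr) / n) * q_inv
```

Calling `normal_approx_rate(1.0, 1000, 1e-3)` gives the approximate rate at unit signal-to-noise ratio. The backoff from capacity shrinks as the block length grows and vanishes when the error probability equals 1/2.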

preprint 2011 arXiv

On the Tradeoff Between Multiuser Diversity and Training Overhead in Multiple Access Channels

We consider a single-antenna narrowband multiple access channel in which users send training sequences to the base station and scheduling is performed based on minimum mean square error (MMSE) channel estimates. In such a system, there is an inherent tradeoff between training overhead and the amount of multiuser diversity achieved. We analyze a block fading channel with independent Rayleigh distributed channel gains, where the parameters to be optimized are the number of users considered for transmission in each block and the corresponding time and power spent on training by each user. We derive closed-form expressions for the optimal parameters in terms of K and L, where K is the number of users considered for transmission in each block and L is the block length in symbols. Considering the behavior of the system as L grows large, we optimize K with respect to an approximate expression for the achievable rate, and obtain second-order expressions for the resulting parameters in terms of L.

38 published item(s)

A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

Adversarial Attacks on Gaussian Process Bandits

Generative Principal Component Analysis

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Max-Min Grouped Bandits

Model-Based and Graph-Based Priors for Group Testing

Tight Regret Bounds for Noisy Optimization of a Brownian Motion

Universal 1-Bit Compressive Sensing for Bounded Dynamic Range Signals

A Characteristic Function Approach to Deep Implicit Generative Modeling

A Fast Binary Splitting Approach to Non-Adaptive Group Testing

Corruption-Tolerant Gaussian Process Bandit Optimization

High-Dimensional Bayesian Optimization via Tree-Structured Additive Models

Information-Theoretic Lower Bounds for Compressive Sensing with Generative Models

Learning Erdős-Rényi Random Graphs via Edge Detecting Queries

Learning Gaussian Graphical Models via Multiplicative Weights

Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors

Sublinear-Time Non-Adaptive Group Testing with $O(k \log n)$ Tests via Bit-Mixing Coding

Converse Bounds for Noisy Group Testing with Arbitrary Measurement Matrices

Improved group testing rates with constant column weight designs

Learning-based Compressive Subsampling

Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

Multiuser Random Coding Techniques for Mismatched Decoding

On the Difficulty of Selecting Ising Models with Approximate Recovery

Partial Recovery Bounds for the Sparse Stochastic Block Model

The Dispersion of Nearest-Neighbor Decoding for Additive Non-Gaussian Channels

Time-Varying Gaussian Process Bandit Optimization

Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation

A Counter-Example to the Mismatched Decoding Converse for Binary-Input Discrete Memoryless Channels

Second-Order Asymptotics for the Discrete Memoryless MAC with Degraded Message Sets

Second-Order Asymptotics for the Gaussian MAC with Degraded Message Sets

Expurgated Random-Coding Ensembles: Exponents, Refinements and Connections

Mismatched Decoding: Error Exponents, Second-Order Rates and Saddlepoint Approximations

Second-Order Rate Region of Constant-Composition Codes for the Multiple-Access Channel

Sparsistency of $\ell_1$-Regularized $M$-Estimators

The Saddlepoint Approximation: Unified Random Coding Asymptotics for Fixed and Varying Rates

A Derivation of the Asymptotic Random-Coding Prefactor

On the Dispersions of the Gel'fand-Pinsker Channel and Dirty Paper Coding

On the Tradeoff Between Multiuser Diversity and Training Overhead in Multiple Access Channels