Source author record

Alex Dytso

Alex Dytso appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Theory; math.IT; Machine Learning; math.ST; Statistics Theory; eess.SP; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Methodology; Neural and Evolutionary Computing

Catalog footprint

What is connected

17 works

10 topics

4 close collaborators


preprint, 2026, arXiv

Sub-Gaussian Concentration and Entropic Normality of the Maximum Likelihood Estimator

It is well known that, under standard regularity conditions, the maximum likelihood estimator (MLE) satisfies a central limit theorem and converges in distribution to a Gaussian random variable as the sample size grows. This paper strengthens this classical result by developing several stronger forms of asymptotic normality for the normalized MLE. With additional assumptions on the score, we first establish sub-Gaussian tail bounds and convergence of all moments for the normalized estimation error. We then prove an entropic central limit theorem for a smoothed version of the estimator, showing convergence in relative entropy to the limiting Gaussian law. When the Fisher information of the normalized estimate is bounded, or its density has bounded first derivative, we further show that the smoothing can be removed, yielding entropic normality of the MLE itself. The proofs develop auxiliary tools that may be of independent interest, including exponential consistency bounds, high-moment estimates, and entropy-control arguments for the estimator.

preprint, 2022, arXiv

A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence

A common way of characterizing minimax estimators in point estimation is by moving the problem into the Bayesian estimation domain and finding a least favorable prior distribution. The Bayesian estimator induced by a least favorable prior, under mild conditions, is then known to be minimax. However, finding least favorable distributions can be challenging due to inherent optimization over the space of probability distributions, which is infinite-dimensional. This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension. The benefit of this dimensionality reduction is that it permits the use of popular algorithms such as projected gradient ascent to find least favorable priors. Throughout the paper, in order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences.

preprint, 2022, arXiv

An MMSE Lower Bound via Poincaré Inequality

This paper studies the minimum mean squared error (MMSE) of estimating $\mathbf{X} \in \mathbb{R}^d$ from the noisy observation $\mathbf{Y} \in \mathbb{R}^k$, under the assumption that the noise (i.e., $\mathbf{Y}|\mathbf{X}$) is a member of the exponential family. The paper provides a new lower bound on the MMSE. Towards this end, an alternative representation of the MMSE is first presented, which is argued to be useful in deriving closed-form expressions for the MMSE. This new representation is then used together with the Poincaré inequality to provide a new lower bound on the MMSE. Unlike, for example, the Cramér-Rao bound, the new bound holds for all possible distributions on the input $\mathbf{X}$. Moreover, the lower bound is shown to be tight in the high-noise regime for the Gaussian noise setting under the assumption that $\mathbf{X}$ is sub-Gaussian. Finally, several numerical examples are shown which demonstrate that the bound performs well in all noise regimes.

preprint, 2022, arXiv

Entropic CLT for Order Statistics

It is well known that central order statistics exhibit a central limit behavior and converge to a Gaussian distribution as the sample size grows. This paper strengthens this known result by establishing an entropic version of the CLT that ensures a stronger mode of convergence using the relative entropy. In particular, an order $O(1/\sqrt{n})$ rate of convergence is established under mild conditions on the parent distribution of the sample generating the order statistics. To prove this result, ancillary results on order statistics are derived, which might be of independent interest.

preprint, 2022, arXiv

On the Capacity Achieving Input of Amplitude Constrained Vector Gaussian Wiretap Channel

This paper studies the secrecy-capacity of an $n$-dimensional Gaussian wiretap channel under a peak-power constraint. This work determines the largest peak-power constraint $\bar{\mathsf{R}}_n$ such that an input distribution uniformly distributed on a single sphere is optimal; this regime is termed the small-amplitude regime. The asymptotic behavior of $\bar{\mathsf{R}}_n$ as $n$ goes to infinity is completely characterized as a function of the noise variance at both receivers. Moreover, the secrecy-capacity is also characterized in a form amenable to computation. Furthermore, several numerical examples are provided, such as the example of the secrecy-capacity achieving distribution outside of the small-amplitude regime.

preprint, 2021, arXiv

The Most Informative Order Statistic and its Application to Image Denoising

We consider the problem of finding the subset of order statistics that contains the most information about a sample of random variables drawn independently from some known parametric distribution. We leverage information-theoretic quantities, such as entropy and mutual information, to quantify the level of informativeness and rigorously characterize the amount of information contained in any subset of the complete collection of order statistics. As an example, we show how these informativeness metrics can be evaluated for a sample of discrete Bernoulli and continuous Uniform random variables. Finally, we unveil how our most informative order statistics framework can be applied to image processing applications. Specifically, we investigate how the proposed measures can be used to choose the coefficients of the L-estimator filter to denoise an image corrupted by random noise. We show that both for discrete (e.g., salt-pepper noise) and continuous (e.g., mixed Gaussian noise) noise distributions, the proposed method is competitive with off-the-shelf filters, such as the median and the total variation filters, as well as with wavelet-based denoising methods.
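An L-estimator filter, as mentioned in the abstract above, outputs a weighted sum of the sorted values in a sliding window. The following is only a minimal sketch of that filter structure: the function names `l_estimator` and `l_filter_1d`, the edge replication, and the 1-D signal are illustrative assumptions, and the coefficients shown are the median's (all weight on the middle order statistic), not coefficients chosen by the paper's informativeness measures.

```python
def l_estimator(window, coeffs):
    """Weighted sum of the order statistics (sorted values) of `window`."""
    if len(window) != len(coeffs):
        raise ValueError("window and coeffs must have the same length")
    ordered = sorted(window)
    return sum(c * x for c, x in zip(coeffs, ordered))


def l_filter_1d(signal, coeffs):
    """Slide an odd-length L-estimator over a 1-D signal.

    Edges are handled by replicating the first and last sample
    (an assumption made for this sketch).
    """
    k = len(coeffs)
    if k % 2 == 0:
        raise ValueError("use an odd window length")
    half = k // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [l_estimator(padded[i:i + k], coeffs) for i in range(len(signal))]


# Placeholder coefficients: all weight on the middle order statistic,
# which makes the L-estimator a length-3 median filter.
median_coeffs = [0.0, 1.0, 0.0]

# A constant signal with two impulse ("salt" and "pepper") outliers.
noisy = [10, 10, 255, 10, 10, 0, 10]
denoised = l_filter_1d(noisy, median_coeffs)
```

Other coefficient vectors give other familiar filters; for instance, equal weights `[1/3, 1/3, 1/3]` give a moving average.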

preprint, 2020, arXiv

A Cramér-Rao Type Bound for Bayesian Risk with Bregman Loss

A general class of Bayesian lower bounds when the underlying loss function is a Bregman divergence is demonstrated. This class can be considered as an extension of the Weinstein--Weiss family of bounds for the mean squared error and relies on finding a variational characterization of Bayesian risk. The approach allows for the derivation of a version of the Cramér--Rao bound that is specific to a given Bregman divergence. The new generalization of the Cramér--Rao bound reduces to the classical one when the loss function is taken to be the Euclidean norm. The effectiveness of the new bound is evaluated in the Poisson noise setting and the Binomial noise setting.

preprint, 2020, arXiv

Estimation in Poisson Noise: Properties of the Conditional Mean Estimator

This paper considers estimation of a random variable in Poisson noise with signal scaling coefficient and dark current as explicit parameters of the noise model. Specifically, the paper focuses on properties of the conditional mean estimator as a function of the scaling coefficient, the dark current parameter, the distribution of the input random variable and channel realizations. With respect to the scaling coefficient and the dark current, several identities in terms of derivatives are established. For example, it is shown that the gradient of the conditional mean estimator with respect to the scaling coefficient and dark current parameter is proportional to the conditional variance. Moreover, a score function is proposed and a Tweedie-like formula for the conditional expectation is recovered. With respect to the distribution, several regularity conditions are shown. For instance, it is shown that the conditional mean estimator uniquely determines the input distribution. Moreover, it is shown that if the conditional expectation is close to a linear function in terms of mean squared error, then the input distribution is approximately gamma in the Lévy distance. Furthermore, sufficient and necessary conditions for linearity are found. Interestingly, it is shown that the conditional mean estimator cannot be linear when the dark current parameter of the Poisson noise is non-zero.

preprint, 2020, arXiv

Information-Theoretic Bounds on the Generalization Error and Privacy Leakage in Federated Learning

Machine learning algorithms operating on mobile networks can be grouped into three different categories. First is the classical situation in which the end-user devices send their data to a central server where this data is used to train a model. Second is the distributed setting in which each device trains its own model and sends its model parameters to a central server where these model parameters are aggregated to create one final model. Third is the federated learning setting in which, at any given time $t$, a certain number of active end users train with their own local data along with feedback provided by the central server and then send their newly estimated model parameters to the central server. The server then aggregates these new parameters, updates its own model, and feeds the updated parameters back to all the end users, continuing this process until it converges. The main objective of this work is to provide an information-theoretic framework for all of the aforementioned learning paradigms. Moreover, using the provided framework, we develop upper and lower bounds on the generalization error together with bounds on the privacy leakage in the classical, distributed and federated learning settings. Keywords: Federated Learning, Distributed Learning, Machine Learning, Model Aggregation.
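The federated loop described in the abstract above (active users train locally from the server's parameters, the server aggregates and broadcasts) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the scalar model $y \approx w x$, the single gradient step per round, the uniform averaging rule, the user data, and the schedule of active users.

```python
def local_update(w, data, lr=0.1):
    """One gradient step on a user's local squared loss for y ~ w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_round(w_server, users, active):
    """Active users start from the server parameter and train locally;
    the server aggregates by uniform averaging (an assumed rule)."""
    updates = [local_update(w_server, users[i]) for i in active]
    return sum(updates) / len(updates)


# Hypothetical local datasets, all generated by the same model y = 2 * x.
users = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0)],
    [(2.0, 4.0)],
]

w = 0.0  # initial server parameter
for t in range(50):
    # Hypothetical schedule: two of the three users are active at time t.
    active = [t % 3, (t + 1) % 3]
    w = federated_round(w, users, active)
```

In this toy setup the raw data never leaves the users; only the scalar parameter is exchanged, which is the quantity whose leakage and generalization behavior such a framework would bound.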

preprint, 2020, arXiv

Nonparametric Estimation of the Fisher Information and Its Applications

This paper considers the problem of estimation of the Fisher information for location from a random sample of size $n$. First, an estimator proposed by Bhattacharya is revisited and improved convergence rates are derived. Second, a new estimator, termed a clipped estimator, is proposed. Superior upper bounds on the rates of convergence can be shown for the new estimator compared to the Bhattacharya estimator, albeit with different regularity conditions. Third, both of the estimators are evaluated for the practically relevant case of a random variable contaminated by Gaussian noise. Moreover, using Brown's identity, which relates the Fisher information and the minimum mean squared error (MMSE) in Gaussian noise, two corresponding consistent estimators for the MMSE are proposed. Simulation examples for the Bhattacharya estimator and the clipped estimator as well as the MMSE estimators are presented. The examples demonstrate that the clipped estimator can significantly reduce the required sample size to guarantee a specific confidence interval compared to the Bhattacharya estimator.
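As a small illustration of how Brown's identity turns a Fisher information value into an MMSE value, the sketch below uses the identity in its commonly stated scalar form: for $Y = X + \sigma Z$ with standard Gaussian $Z$ independent of $X$, $\mathrm{mmse}(X|Y) = \sigma^2 - \sigma^4 J(Y)$, where $J(Y)$ is the Fisher information for location of $Y$. The function name `mmse_from_fisher` is hypothetical, and the Fisher information is computed in closed form for a Gaussian input rather than estimated from samples, so this does not reproduce the paper's estimators.

```python
def mmse_from_fisher(fisher_y, noise_var):
    """Plug a Fisher information value J(Y) into Brown's identity.

    Assumes Y = X + sqrt(noise_var) * Z with Z standard Gaussian and
    independent of X, so that mmse = noise_var - noise_var**2 * J(Y).
    """
    return noise_var - noise_var ** 2 * fisher_y


# Sanity check with a Gaussian input, where everything is known in closed form.
signal_var, noise_var = 4.0, 1.0

# Y is Gaussian with variance signal_var + noise_var, so J(Y) = 1 / Var(Y).
fisher_y = 1.0 / (signal_var + noise_var)

plug_in = mmse_from_fisher(fisher_y, noise_var)
closed_form = signal_var * noise_var / (signal_var + noise_var)
```

Replacing `fisher_y` with a consistent estimate computed from samples of $Y$ is the general plug-in idea; the accuracy of the result then depends on the convergence rate of that estimate.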

preprint, 2020, arXiv

Tight Bounds on the Weighted Sum of MMSEs with Applications in Distributed Estimation

In this paper, tight upper and lower bounds are derived on the weighted sum of minimum mean-squared errors for additive Gaussian noise channels. The bounds are obtained by constraining the input distribution to be close to a Gaussian reference distribution in terms of the Kullback--Leibler divergence. The distributions that attain these bounds are shown to be Gaussian whose covariance matrices are defined implicitly via systems of matrix equations. Furthermore, the estimators that attain the upper bound are shown to be minimax robust against deviations from the assumed input distribution. The lower bound provides a potentially tighter alternative to well-known inequalities such as the Cramér--Rao lower bound. Numerical examples are provided to verify the theoretical findings of the paper. The results derived in this paper can be used to obtain performance bounds, robustness guarantees, and engineering guidelines for the design of local estimators for distributed estimation problems which commonly arise in wireless communication systems and sensor networks.

preprint, 2016, arXiv

On Communication through a Gaussian Channel with an MMSE Disturbance Constraint

This paper considers a Gaussian channel with one transmitter and two receivers. The goal is to maximize the communication rate at the intended/primary receiver subject to a disturbance constraint at the unintended/secondary receiver. The disturbance is measured in terms of minimum mean square error (MMSE) of the interference that the transmission to the primary receiver inflicts on the secondary receiver. The paper presents a new upper bound for the problem of maximizing the mutual information subject to an MMSE constraint. The new bound holds for vector inputs of any length and recovers a previously known limiting (when the length of vector input tends to infinity) expression from the work of Bustin $\textit{et al.}$ The key technical novelty is a new upper bound on the MMSE. This bound allows one to bound the MMSE for all signal-to-noise ratio (SNR) values $\textit{below}$ a certain SNR at which the MMSE is known (which corresponds to the disturbance constraint). This bound complements the `single-crossing point property' of the MMSE that upper bounds the MMSE for all SNR values $\textit{above}$ a certain value at which the MMSE value is known. The MMSE upper bound provides a refined characterization of the phase-transition phenomenon which manifests, in the limit as the length of the vector input goes to infinity, as a discontinuity of the MMSE for the problem at hand. For vector inputs of size $n=1$, a matching lower bound, to within an additive gap of order $O \left( \log \log \frac{1}{\sf MMSE} \right)$ (where ${\sf MMSE}$ is the disturbance constraint), is shown by means of the mixed inputs technique recently introduced by Dytso $\textit{et al.}$

preprint, 2016, arXiv

On the Minimum Mean $p$-th Error in Gaussian Noise Channels and its Applications

The problem of estimating an arbitrary random vector from its observation corrupted by additive white Gaussian noise, where the cost function is taken to be the Minimum Mean $p$-th Error (MMPE), is considered. The classical Minimum Mean Square Error (MMSE) is a special case of the MMPE. Several bounds, properties and applications of the MMPE are derived and discussed. The optimal MMPE estimator is found for Gaussian and binary input distributions. Properties of the MMPE as a function of the input distribution, SNR and order $p$ are derived. In particular, it is shown that the MMPE is a continuous function of $p$ and SNR. These results are possible in view of interpolation and change of measure bounds on the MMPE. The `Single-Crossing-Point Property' (SCPP) that bounds the MMSE for all SNR values $\textit{above}$ a certain value, at which the MMSE is known, together with the I-MMSE relationship is a powerful tool in deriving converse proofs in information theory. By studying the notion of conditional MMPE, a unifying proof (i.e., for any $p$) of the SCPP is shown. A complementary bound to the SCPP is then shown, which bounds the MMPE for all SNR values $\textit{below}$ a certain value, at which the MMPE is known. As a first application of the MMPE, a bound on the conditional differential entropy in terms of the MMPE is provided, which then yields a generalization of the Ozarow-Wyner lower bound on the mutual information achieved by a discrete input on a Gaussian noise channel. As a second application, the MMPE is shown to improve on previous characterizations of the phase transition phenomenon that manifests, in the limit as the length of the capacity achieving code goes to infinity, as a discontinuity of the MMSE as a function of SNR. As a final application, the MMPE is used to show bounds on the second derivative of mutual information, that tighten previously known bounds.

preprint, 2015, arXiv

Interference as Noise: Friend or Foe?

This paper shows that for the two-user Gaussian Interference Channel (G-IC) Treating Interference as Noise without Time Sharing (TINnoTS) achieves the closure of the capacity region to within either a constant gap, or to within a gap of the order O(log ln(min(S,I))) where S is the largest Signal to Noise Ratio (SNR) on the direct links and I is the largest Interference to Noise Ratio (INR) on the cross links. As a consequence, TINnoTS is optimal from a generalized Degrees of Freedom (gDoF) perspective for all channel gains except for a subset of zero measure. TINnoTS with Gaussian inputs is known to be optimal to within 1/2 bit for a subset of the weak interference regime. Surprisingly, this paper shows that TINnoTS is gDoF optimal in all parameter regimes, even in the strong and very strong interference regimes where joint decoding of Gaussian inputs is optimal. For approximate optimality of TINnoTS in all parameter regimes it is critical to use non-Gaussian inputs. This work thus proposes to use mixed inputs as channel inputs where a mixed input is the sum of a discrete and a Gaussian random variable. Interestingly, compared to the Han-Kobayashi inner bound, the discrete part of a mixed input is shown to effectively act as a common message in the sense that, although treated as noise, its effect on the achievable rate region is as if it were jointly decoded together with the desired messages at a non-intended receiver. The practical implication is that a discrete interfering input is a 'friend', while a Gaussian interfering input is in general a 'foe'. Since TINnoTS requires neither joint decoding nor time sharing, the results of this paper are applicable to a variety of oblivious or asynchronous channels, such as the block asynchronous G-IC (which is not information stable) and the G-IC with partial codebook knowledge at one or more receivers.
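A mixed input, described in the abstract above as the sum of a discrete and a Gaussian random variable, can be sampled as in the following sketch. The specific choices are assumptions made for illustration and are not taken from the paper: the discrete part is an equally spaced, unit-power constellation with `n >= 2` points, and a hypothetical parameter `delta` in [0, 1] splits unit total power between the two parts.

```python
import math
import random


def pam_points(n):
    """n >= 2 equally spaced, zero-mean points scaled to unit average power."""
    raw = [2 * i - (n - 1) for i in range(n)]
    power = sum(p * p for p in raw) / n
    return [p / math.sqrt(power) for p in raw]


def sample_mixed_input(n, delta, rng):
    """Draw sqrt(1 - delta) * X_D + sqrt(delta) * X_G.

    X_D is uniform on pam_points(n) and X_G is standard Gaussian,
    so the sum has unit average power for any delta in [0, 1].
    """
    x_d = rng.choice(pam_points(n))
    x_g = rng.gauss(0.0, 1.0)
    return math.sqrt(1 - delta) * x_d + math.sqrt(delta) * x_g


rng = random.Random(0)  # fixed seed so the draw is reproducible
points = pam_points(4)
samples = [sample_mixed_input(4, 0.1, rng) for _ in range(5)]
```

Setting `delta = 1` recovers a purely Gaussian input and `delta = 0` a purely discrete one, which makes it easy to compare the two extremes in a simulation.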

preprint, 2014, arXiv

On Discrete Alphabets for the Two-user Gaussian Interference Channel with One Receiver Lacking Knowledge of the Interfering Codebook

In multi-user information theory it is often assumed that every node in the network possesses all codebooks used in the network. This assumption is however impractical in distributed ad-hoc and cognitive networks. This work considers the two-user Gaussian Interference Channel with one Oblivious Receiver (G-IC-OR), i.e., one receiver lacks knowledge of the interfering codebook while the other receiver knows both codebooks. We ask whether, and if so how much, the channel capacity of the G-IC-OR is reduced compared to that of the classical G-IC where both receivers know all codebooks. Intuitively, the oblivious receiver should not be able to jointly decode its intended message along with the unintended interfering message whose codebook is unavailable. We demonstrate that in strong and very strong interference, where joint decoding is capacity achieving for the classical G-IC, lack of codebook knowledge does not reduce performance in terms of generalized degrees of freedom (gDoF). Moreover, we show that the sum-capacity of the symmetric G-IC-OR is to within O(log(log(SNR))) of that of the classical G-IC. The key novelty of the proposed achievable scheme is the use of a discrete input alphabet for the non-oblivious transmitter, whose cardinality is appropriately chosen as a function of SNR.

preprint, 2014, arXiv

On the Capacity Region of the Two-user Interference Channel with a Cognitive Relay

This paper considers a variation of the classical two-user interference channel where the communication of two interfering source-destination pairs is aided by an additional node that has a priori knowledge of the messages to be transmitted, which is referred to as the cognitive relay. For this Interference Channel with a Cognitive Relay (ICCR), and in particular for the class of injective semi-deterministic ICCRs, a sum-rate upper bound is derived for the general memoryless ICCR and further tightened for the Linear Deterministic Approximation (LDA) of the Gaussian noise channel at high SNR, which disregards the noise and focuses on the interaction among the users' signals. The capacity region of the symmetric LDA is completely characterized except for the regime of moderately weak interference and weak links from the CR to the destinations. The insights gained from the analysis of the LDA are then translated back to the symmetric Gaussian noise channel (GICCR). For the symmetric GICCR, an approximate characterization (to within a constant gap) of the capacity region is provided for a parameter regime where capacity was previously unknown. The approximately optimal scheme suggests that message cognition at a relay is beneficial for interference management as it enables simultaneous over-the-air neutralization of the interference at both destinations.

preprint, 2014, arXiv

On the Two-user Interference Channel with Lack of Knowledge of the Interference Codebook at one Receiver

In multi-user information theory it is often assumed that every node in the network possesses all codebooks used in the network. This assumption may be impractical in distributed ad-hoc, cognitive or heterogeneous networks. This work considers the two-user Interference Channel with one Oblivious Receiver (IC-OR), i.e., one receiver lacks knowledge of the interfering codebook while the other receiver knows both codebooks. The paper asks whether, and if so how much, the channel capacity of the IC-OR is reduced compared to that of the classical IC where both receivers know all codebooks. A novel outer bound is derived and shown to be achievable to within a gap for the class of injective semi-deterministic IC-ORs; the gap is shown to be zero for injective fully deterministic IC-ORs. For the linear deterministic IC-OR that models the Gaussian noise channel at high SNR, non-i.i.d. Bernoulli(1/2) input bits are shown to achieve points not achievable by i.i.d. Bernoulli(1/2) input bits used in the same achievability scheme. For the real-valued Gaussian IC-OR the gap is shown to be at most 1/2 bit per channel use, even though the set of optimal input distributions for the derived outer bound could not be determined. Towards understanding the Gaussian IC-OR, an achievability strategy is evaluated in which the input alphabets at the non-oblivious transmitter are a mixture of discrete and Gaussian random variables, where the cardinality of the discrete part is appropriately chosen as a function of the channel parameters. Surprisingly, as the oblivious receiver intuitively should not be able to 'jointly decode' the intended and interfering messages (whose codebook is unavailable), it is shown that with this choice of input, the capacity region of the symmetric Gaussian IC-OR is to within 3.34 bits (per channel use per user) of an outer bound for the classical Gaussian IC with full codebook knowledge.
