Researcher profile

Alex Dytso

Alex Dytso contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Sub-Gaussian Concentration and Entropic Normality of the Maximum Likelihood Estimator

It is well known that, under standard regularity conditions, the maximum likelihood estimator (MLE) satisfies a central limit theorem and converges in distribution to a Gaussian random variable as the sample size grows. This paper strengthens this classical result by developing several stronger forms of asymptotic normality for the normalized MLE. With additional assumptions on the score, we first establish sub-Gaussian tail bounds and convergence of all moments for the normalized estimation error. We then prove an entropic central limit theorem for a smoothed version of the estimator, showing convergence in relative entropy to the limiting Gaussian law. When the Fisher information of the normalized estimate is bounded, or its density has bounded first derivative, we further show that the smoothing can be removed, yielding entropic normality of the MLE itself. The proofs develop auxiliary tools that may be of independent interest, including exponential consistency bounds, high-moment estimates, and entropy-control arguments for the estimator.

preprint2022arXiv

A Dimensionality Reduction Method for Finding Least Favorable Priors with a Focus on Bregman Divergence

A common way of characterizing minimax estimators in point estimation is by moving the problem into the Bayesian estimation domain and finding a least favorable prior distribution. The Bayesian estimator induced by a least favorable prior, under mild conditions, is then known to be minimax. However, finding least favorable distributions can be challenging due to inherent optimization over the space of probability distributions, which is infinite-dimensional. This paper develops a dimensionality reduction method that allows us to move the optimization to a finite-dimensional setting with an explicit bound on the dimension. The benefit of this dimensionality reduction is that it permits the use of popular algorithms such as projected gradient ascent to find least favorable priors. Throughout the paper, in order to make progress on the problem, we restrict ourselves to Bayesian risks induced by a relatively large class of loss functions, namely Bregman divergences.

preprint2022arXiv

An MMSE Lower Bound via Poincaré Inequality

This paper studies the minimum mean squared error (MMSE) of estimating $\mathbf{X} \in \mathbb{R}^d$ from the noisy observation $\mathbf{Y} \in \mathbb{R}^k$, under the assumption that the noise (i.e., $\mathbf{Y}|\mathbf{X}$) is a member of the exponential family. The paper provides a new lower bound on the MMSE. Towards this end, an alternative representation of the MMSE is first presented, which is argued to be useful in deriving closed-form expressions for the MMSE. This new representation is then used together with the Poincaré inequality to provide a new lower bound on the MMSE. Unlike, for example, the Cramér-Rao bound, the new bound holds for all possible distributions on the input $\mathbf{X}$. Moreover, the lower bound is shown to be tight in the high-noise regime for the Gaussian noise setting under the assumption that $\mathbf{X}$ is sub-Gaussian. Finally, several numerical examples are shown which demonstrate that the bound performs well in all noise regimes.

preprint2022arXiv

Entropic CLT for Order Statistics

It is well known that central order statistics exhibit a central limit behavior and converge to a Gaussian distribution as the sample size grows. This paper strengthens this known result by establishing an entropic version of the CLT that ensures a stronger mode of convergence using the relative entropy. In particular, an order $O(1/\sqrt{n})$ rate of convergence is established under mild conditions on the parent distribution of the sample generating the order statistics. To prove this result, ancillary results on order statistics are derived, which might be of independent interest.

preprint2022arXiv

On the Capacity Achieving Input of Amplitude Constrained Vector Gaussian Wiretap Channel

This paper studies secrecy-capacity of an $n$-dimensional Gaussian wiretap channel under the peak-power constraint. This work determines the largest peak-power constraint $\bar{\mathsf{R}}_n$ such that an input distribution uniformly distributed on a single sphere is optimal; this regime is termed the small-amplitude regime. The asymptotic of $\bar{\mathsf{R}}_n$ as $n$ goes to infinity is completely characterized as a function of noise variance at both receivers. Moreover, the secrecy-capacity is also characterized in a form amenable for computation. Furthermore, several numerical examples are provided, such as the example of the secrecy-capacity achieving distribution outside of the small amplitude regime.

preprint2021arXiv

The Most Informative Order Statistic and its Application to Image Denoising

We consider the problem of finding the subset of order statistics that contains the most information about a sample of random variables drawn independently from some known parametric distribution. We leverage information-theoretic quantities, such as entropy and mutual information, to quantify the level of informativeness and rigorously characterize the amount of information contained in any subset of the complete collection of order statistics. As an example, we show how these informativeness metrics can be evaluated for a sample of discrete Bernoulli and continuous Uniform random variables. Finally, we unveil how our most informative order statistics framework can be applied to image processing applications. Specifically, we investigate how the proposed measures can be used to choose the coefficients of the L-estimator filter to denoise an image corrupted by random noise. We show that both for discrete (e.g., salt-pepper noise) and continuous (e.g., mixed Gaussian noise) noise distributions, the proposed method is competitive with off-the-shelf filters, such as the median and the total variation filters, as well as with wavelet-based denoising methods.

preprint2020arXiv

A Cramér-Rao Type Bound for Bayesian Risk with Bregman Loss

A general class of Bayesian lower bounds when the underlying loss function is a Bregman divergence is demonstrated. This class can be considered as an extension of the Weinstein--Weiss family of bounds for the mean squared error and relies on finding a variational characterization of Bayesian risk. The approach allows for the derivation of a version of the Cramér--Rao bound that is specific to a given Bregman divergence. The new generalization of the Cramér--Rao bound reduces to the classical one when the loss function is taken to be the Euclidean norm. The effectiveness of the new bound is evaluated in the Poisson noise setting and the Binomial noise setting.

preprint2020arXiv

Estimation in Poisson Noise: Properties of the Conditional Mean Estimator

This paper considers estimation of a random variable in Poisson noise with signal scaling coefficient and dark current as explicit parameters of the noise model. Specifically, the paper focuses on properties of the conditional mean estimator as a function of the scaling coefficient, the dark current parameter, the distribution of the input random variable and channel realizations. With respect to the scaling coefficient and the dark current, several identities in terms of derivatives are established. For example, it is shown that the gradient of the conditional mean estimator with respect to the scaling coefficient and dark current parameter is proportional to the conditional variance. Moreover, a score function is proposed and a Tweedie-like formula for the conditional expectation is recovered. With respect to the distribution, several regularity conditions are shown. For instance, it is shown that the conditional mean estimator uniquely determines the input distribution. Moreover, it is shown that if the conditional expectation is close to a linear function in terms of mean squared error, then the input distribution is approximately gamma in the Lévy distance. Furthermore, sufficient and necessary conditions for linearity are found. Interestingly, it is shown that the conditional mean estimator cannot be linear when the dark current parameter of the Poisson noise is non-zero.

preprint2020arXiv

Information-Theoretic Bounds on the Generalization Error and Privacy Leakage in Federated Learning

Machine learning algorithms operating on mobile networks can be characterized into three different categories. First is the classical situation in which the end-user devices send their data to a central server where this data is used to train a model. Second is the distributed setting in which each device trains its own model and send its model parameters to a central server where these model parameters are aggregated to create one final model. Third is the federated learning setting in which, at any given time $t$, a certain number of active end users train with their own local data along with feedback provided by the central server and then send their newly estimated model parameters to the central server. The server, then, aggregates these new parameters, updates its own model, and feeds the updated parameters back to all the end users, continuing this process until it converges. The main objective of this work is to provide an information-theoretic framework for all of the aforementioned learning paradigms. Moreover, using the provided framework, we develop upper and lower bounds on the generalization error together with bounds on the privacy leakage in the classical, distributed and federated learning settings. Keywords: Federated Learning, Distributed Learning, Machine Learning, Model Aggregation.

preprint2020arXiv

Nonparametric Estimation of the Fisher Information and Its Applications

This paper considers the problem of estimation of the Fisher information for location from a random sample of size $n$. First, an estimator proposed by Bhattacharya is revisited and improved convergence rates are derived. Second, a new estimator, termed a clipped estimator, is proposed. Superior upper bounds on the rates of convergence can be shown for the new estimator compared to the Bhattacharya estimator, albeit with different regularity conditions. Third, both of the estimators are evaluated for the practically relevant case of a random variable contaminated by Gaussian noise. Moreover, using Brown's identity, which relates the Fisher information and the minimum mean squared error (MMSE) in Gaussian noise, two corresponding consistent estimators for the MMSE are proposed. Simulation examples for the Bhattacharya estimator and the clipped estimator as well as the MMSE estimators are presented. The examples demonstrate that the clipped estimator can significantly reduce the required sample size to guarantee a specific confidence interval compared to the Bhattacharya estimator.

preprint2020arXiv

Tight Bounds on the Weighted Sum of MMSEs with Applications in Distributed Estimation

In this paper, tight upper and lower bounds are derived on the weighted sum of minimum mean-squared errors for additive Gaussian noise channels. The bounds are obtained by constraining the input distribution to be close to a Gaussian reference distribution in terms of the Kullback--Leibler divergence. The distributions that attain these bounds are shown to be Gaussian whose covariance matrices are defined implicitly via systems of matrix equations. Furthermore, the estimators that attain the upper bound are shown to be minimax robust against deviations from the assumed input distribution. The lower bound provides a potentially tighter alternative to well-known inequalities such as the Cramér--Rao lower bound. Numerical examples are provided to verify the theoretical findings of the paper. The results derived in this paper can be used to obtain performance bounds, robustness guarantees, and engineering guidelines for the design of local estimators for distributed estimation problems which commonly arise in wireless communication systems and sensor networks.