Researcher profile

Mario Diaz

Mario Diaz contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
6 works
0 followers
7 topics
4 close collaborators

Published work

6 published items

Preprint · 2022 · arXiv

Fluctuations for matrix-valued Gaussian processes

We consider a symmetric matrix-valued Gaussian process $Y^{(n)}=(Y^{(n)}(t);t\ge0)$ and its empirical spectral measure process $\mu^{(n)}=(\mu_{t}^{(n)};t\ge0)$. Under some mild conditions on the covariance function of $Y^{(n)}$, we find an explicit expression for the limit distribution of $$Z_F^{(n)} := \left( \big(Z_{f_1}^{(n)}(t),\ldots,Z_{f_r}^{(n)}(t)\big) ; t\ge0\right),$$ where $F=(f_1,\dots, f_r)$, for $r\ge 1$, with each component belonging to a large class of test functions, and $$ Z_{f}^{(n)}(t) := n\int_{\mathbb{R}}f(x)\mu_{t}^{(n)}(\text{d} x)-n\mathbb{E}\left[\int_{\mathbb{R}}f(x)\mu_{t}^{(n)}(\text{d} x)\right].$$ More precisely, we establish the stable convergence of $Z_F^{(n)}$ and determine its limiting distribution. An upper bound for the total variation distance between the law of $Z_{f}^{(n)}(t)$ and its limiting distribution, for a fixed test function $f$ and fixed $t\geq0$, is also given.
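As a rough numerical illustration of why $Z_f^{(n)}(t)$ needs no extra normalization beyond the factor $n$, the sketch below samples the centered linear statistic at one fixed time. It assumes, hypothetically, that $Y^{(n)}(t)$ is a GOE matrix scaled by $1/\sqrt{n}$ (one simple instance; we do not claim it matches the preprint's exact covariance conditions), replaces $\mathbb{E}$ by a Monte Carlo average, and uses a function name of our own. Since $\mu_t^{(n)}$ puts mass $1/n$ on each eigenvalue, $n\int f\,\text{d}\mu_t^{(n)}=\sum_i f(\lambda_i)$. For $f(x)=x^2$ a direct computation gives $\operatorname{Var}\big(\operatorname{Tr}(Y^2)\big)=4+4/n$ under this model, so the fluctuations stay of order one as $n$ grows.

```python
import numpy as np

def centered_linear_statistic(f, n, reps=2000, seed=0):
    """Monte Carlo samples of Z_f = sum_i f(lambda_i) - E[sum_i f(lambda_i)].

    Hypothetical model: Y^{(n)}(t) at a fixed t is a GOE matrix scaled by 1/sqrt(n)
    (off-diagonal variance 1/n, diagonal variance 2/n). `f` must be vectorized.
    The expectation is replaced by the sample mean over `reps` draws.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((reps, n, n))
    # Symmetrize: N(0, 1) off the diagonal, N(0, 2) on the diagonal (GOE).
    w = (a + np.swapaxes(a, 1, 2)) / np.sqrt(2.0)
    # Batched eigenvalues of the scaled matrices, shape (reps, n).
    eigs = np.linalg.eigvalsh(w / np.sqrt(n))
    # n * integral of f against the empirical spectral measure = sum of f(lambda_i).
    stats = f(eigs).sum(axis=1)
    return stats - stats.mean()
```

For example, `centered_linear_statistic(np.square, n=40).var()` should land near $4 + 4/40 = 4.1$, up to Monte Carlo error.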

Preprint · 2022 · arXiv

Lower Bounds for the MMSE via Neural Network Estimation and Their Applications to Privacy

The minimum mean-square error (MMSE) achievable by optimal estimation of a random variable $Y\in\mathbb{R}$ given another random variable $X\in\mathbb{R}^{d}$ is of much interest in a variety of statistical settings. In the context of estimation-theoretic privacy, the MMSE has been proposed as an information leakage measure that captures the ability of an adversary to estimate $Y$ upon observing $X$. In this paper we establish provable lower bounds for the MMSE based on a two-layer neural network estimator of the MMSE and the Barron constant of an appropriate function of the conditional expectation of $Y$ given $X$. Furthermore, we derive a general upper bound for the Barron constant that, when $X\in\mathbb{R}$ is post-processed by the additive Gaussian mechanism and $Y$ is binary, produces order-optimal estimates in the large noise regime. In order to obtain numerical lower bounds for the MMSE in some concrete applications, we introduce an efficient optimization process that approximates the value of the proposed neural network estimator. Overall, we provide effective machinery to obtain provable lower bounds for the MMSE.
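In the simplest binary case the quantity being bounded can be computed exactly, which gives a reference value that any provable lower bound must not exceed. The sketch below assumes a toy setting of our choosing: $Y$ uniform on $\{-1,+1\}$ observed as $X = Y + \sigma N$ with $N\sim\mathcal{N}(0,1)$. Then $\mathbb{E}[Y\mid X=x]=\tanh(x/\sigma^2)$ and $\mathrm{mmse}(Y\mid X)=1-\mathbb{E}[\tanh^2(X/\sigma^2)]$, evaluated here by trapezoidal quadrature. This is not the neural-network estimator from the preprint, and the function name is hypothetical.

```python
import math

def mmse_binary_gaussian(sigma, num_points=20001):
    """Exact MMSE of Y uniform on {-1, +1} given X = Y + sigma * N, N ~ N(0, 1).

    Uses E[Y | X = x] = tanh(x / sigma**2), so MMSE = 1 - E[tanh(X / sigma**2)**2].
    tanh**2 is even and the mixture of N(+1, sigma^2) and N(-1, sigma^2) is symmetric,
    so the expectation can be taken under X | Y = +1 ~ N(1, sigma^2) alone.
    """
    lo, hi = 1.0 - 12.0 * sigma, 1.0 + 12.0 * sigma   # truncate at +/- 12 std devs
    h = (hi - lo) / (num_points - 1)
    total = 0.0
    mass = 0.0
    for k in range(num_points):
        x = lo + k * h
        w = 0.5 if k in (0, num_points - 1) else 1.0   # trapezoid weights
        pdf = math.exp(-0.5 * ((x - 1.0) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += w * pdf * math.tanh(x / sigma ** 2) ** 2
        mass += w * pdf
    # Dividing by the integrated density absorbs truncation and quadrature error.
    return 1.0 - total / mass
```

As $\sigma$ grows the value approaches $\operatorname{Var}(Y)=1$, which is the large-noise regime mentioned in the abstract.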

Preprint · 2022 · arXiv

On the Analytic Structure of Second-Order Non-Commutative Probability Spaces and Functions of Bounded Fréchet Variation

In this paper we propose a new approach to the central limit theorem (CLT) for the continuously differentiable linear statistics of random matrix ensembles, based on functions of bounded Fréchet variation. The approach relies on: a weaker form of a large deviation principle for the operator norm; a Poincaré-type inequality for the linear statistics; and the existence of a second-order limit distribution. This approach brings many known random matrix ensembles into a single setting and, as a consequence, classical central limit theorems for linear statistics are recovered and new ones are established, e.g., the CLT for the continuously differentiable linear statistics of block Gaussian matrices. In addition, our main results contribute to the understanding of the analytic structure of second-order non-commutative probability spaces. On the one hand, they pinpoint the source of the unbounded nature of the bilinear functional associated with these spaces; on the other hand, they lead to a general archetype for the integral representation of the second-order Cauchy transform, $G_2$. Furthermore, we establish that the covariance of resolvents converges to this transform and that the limiting covariance of analytic linear statistics can be expressed as a contour integral in $G_2$.

Preprint · 2022 · arXiv

To Split or Not to Split: The Impact of Disparate Treatment in Classification

Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting to quantify the performance improvement obtained by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions under which a group-blind classifier will always suffer a non-trivial performance gap relative to the split classifiers. In the finite sample regime, splitting is not necessarily beneficial, and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
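On a finite feature alphabet with a known joint distribution and 0-1 loss, the optimization over classifiers becomes trivial, which makes the idea behind the benefit-of-splitting easy to see: the split Bayes classifier picks the majority label for each pair $(x, s)$, while the group-blind one must pick a single label for each $x$. The sketch below measures the benefit as the resulting accuracy gap. The toy setting, the additive-gap form, and the function name are assumptions for illustration, not the preprint's general definition over infinite-dimensional classifier spaces.

```python
from collections import defaultdict
from fractions import Fraction

def benefit_of_splitting_01(joint):
    """Accuracy of the split Bayes classifier minus that of the group-blind one (0-1 loss).

    `joint` maps (x, s, y) -> probability, with x a finite feature value, s the group
    (sensitive attribute), and y the class label. Hypothetical toy setting.
    """
    by_xs = defaultdict(lambda: defaultdict(Fraction))  # (x, s) -> y -> probability mass
    by_x = defaultdict(lambda: defaultdict(Fraction))   # x -> y -> mass, with s marginalized
    for (x, s, y), p in joint.items():
        by_xs[(x, s)][y] += p
        by_x[x][y] += p
    split_acc = sum(max(ys.values()) for ys in by_xs.values())  # best label per (x, s)
    blind_acc = sum(max(ys.values()) for ys in by_x.values())   # best label per x only
    return split_acc - blind_acc
```

Because $\sum_s \max_y p(x,s,y) \ge \max_y \sum_s p(x,s,y)$, this population-level gap is never negative; the finite-sample caveat in the abstract concerns classifiers trained from data, which this sketch does not model.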

Preprint · 2020 · arXiv

On the Robustness of Information-Theoretic Privacy Measures and Mechanisms

Consider a data publishing setting for a dataset composed of both private and non-private features. The publisher uses an empirical distribution, estimated from $n$ i.i.d. samples, to design a privacy mechanism which is subsequently applied to fresh samples. In this paper, we study the discrepancy between the privacy-utility guarantees for the empirical distribution, used to design the privacy mechanism, and those for the true distribution, experienced by the privacy mechanism in practice. We first show that, for any privacy mechanism, these discrepancies vanish at rate $O(1/\sqrt{n})$ with high probability. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. Then we prove that the optimal privacy mechanisms for the empirical distribution approach the corresponding mechanisms for the true distribution as the sample size $n$ increases, thereby establishing the statistical consistency of the optimal privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms which, by construction, provide privacy to all the distributions within a neighborhood of the estimated distribution and thereby guarantee privacy for the true distribution with high probability.
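To see the $O(1/\sqrt{n})$ effect numerically, the sketch below fixes one mechanism and measures how far a leakage measure computed from the empirical distribution is from its true value. For illustration only, it assumes a binary private feature $S$, a binary symmetric channel that flips $S$ with probability `eps` as the mechanism, and mutual information $I(S;Y)$ in nats as the leakage measure; the function names are hypothetical, and the preprint covers more general measures and mechanisms.

```python
import math
import random

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

def leakage(p_s, eps):
    """I(S; Y) for binary S ~ p_s released through a flip-with-probability-eps channel."""
    p1 = p_s[1]
    p_y1 = p1 * (1 - eps) + (1 - p1) * eps          # P(Y = 1)
    return entropy([1 - p_y1, p_y1]) - entropy([1 - eps, eps])  # H(Y) - H(Y | S)

def leakage_gap(p_true, eps, n, seed=0):
    """|leakage under the empirical distribution of n i.i.d. samples - true leakage|."""
    rng = random.Random(seed)
    ones = sum(rng.random() < p_true[1] for _ in range(n))
    p_hat = [1 - ones / n, ones / n]
    return abs(leakage(p_hat, eps) - leakage(p_true, eps))
```

Increasing `n` by a factor of 100 should shrink the gap by roughly a factor of 10, in line with the $1/\sqrt{n}$ rate, although any single draw is noisy.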

Preprint · 2020 · arXiv

Privacy Amplification of Iterative Algorithms via Contraction Coefficients

We investigate the framework of privacy amplification by iteration, recently proposed by Feldman et al., through an information-theoretic lens. We demonstrate that differential privacy guarantees of iterative mappings can be determined by a direct application of contraction coefficients derived from strong data processing inequalities for $f$-divergences. In particular, by generalizing Dobrushin's contraction coefficient for total variation distance to an $f$-divergence known as $E_\gamma$-divergence, we derive tighter bounds on the differential privacy parameters of the projected noisy stochastic gradient descent algorithm with hidden intermediate updates.
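The two objects named in this abstract are easy to compute on finite alphabets. The $E_\gamma$-divergence (hockey-stick divergence) is $E_\gamma(P\|Q)=\sum_x \max\big(P(x)-\gamma Q(x),0\big)$, with $E_1$ equal to total variation, and a mechanism $M$ is $(\varepsilon,\delta)$-differentially private exactly when $E_{e^\varepsilon}\big(M(D)\|M(D')\big)\le\delta$ for all neighboring datasets. Dobrushin's coefficient of a channel $K$ is the largest total variation distance between two of its rows, and it satisfies $\mathrm{TV}(PK,QK)\le\eta(K)\,\mathrm{TV}(P,Q)$. The sketch below covers only these definitions and the $\gamma=1$ case; it does not compute the preprint's $E_\gamma$ contraction coefficient, and the function names are hypothetical.

```python
import math

def hockey_stick(p, q, gamma):
    """E_gamma(P || Q) = sum_x max(P(x) - gamma * Q(x), 0) for finite distributions."""
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))

def push_forward(p, kernel):
    """Output distribution PK of a channel given as a row-stochastic matrix (list of rows)."""
    return [sum(p[x] * kernel[x][y] for x in range(len(p))) for y in range(len(kernel[0]))]

def dobrushin(kernel):
    """Dobrushin's coefficient: max total variation (E_1) between any two rows of the kernel."""
    return max(hockey_stick(r1, r2, 1.0) for r1 in kernel for r2 in kernel)

# Randomized response with parameter eps is eps-DP, so its delta at gamma = e^eps is 0.
eps = 1.0
e = math.exp(eps)
rr = [[e / (1 + e), 1 / (1 + e)], [1 / (1 + e), e / (1 + e)]]
delta = hockey_stick(rr[0], rr[1], e)
```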