Source author record

Farzan Farnia

Farzan Farnia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Information Theory · math.IT · Artificial Intelligence · math.OC · Computer Science and Game Theory · Computer Vision

Catalog footprint

What is connected

11 works

7 topics

4 close collaborators

preprint · 2026 · arXiv

Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance

Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose MMD Guidance, a training-free mechanism that augments the reverse diffusion process with gradients of the Maximum Mean Discrepancy (MMD) between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in practice, and is efficiently differentiable, which makes it particularly well-suited for the guidance task. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Also, it can be applied with computational efficiency in latent diffusion models (LDMs), since guidance is applied in the latent space of the LDM. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity.
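The guidance signal described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a Gaussian (RBF) kernel with a fixed bandwidth and the biased V-statistic estimate of squared MMD, and it leaves out the diffusion model entirely, so `guidance_step` shows only the MMD-gradient correction that would be combined with a reverse-diffusion update. The function names, the bandwidth, and the step size are hypothetical.

```python
import math

def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2(gen, ref, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""
    n, m = len(gen), len(ref)
    k_gg = sum(rbf(a, b, bandwidth) for a in gen for b in gen) / (n * n)
    k_rr = sum(rbf(a, b, bandwidth) for a in ref for b in ref) / (m * m)
    k_gr = sum(rbf(a, b, bandwidth) for a in gen for b in ref) / (n * m)
    return k_gg + k_rr - 2.0 * k_gr

def mmd2_grad(gen, ref, bandwidth=1.0):
    """Analytic gradient of mmd2 with respect to each generated sample."""
    n, m = len(gen), len(ref)
    h2 = bandwidth ** 2
    grads = []
    for x in gen:
        g = [0.0] * len(x)
        # Generated-vs-generated term: pushes generated samples apart.
        for other in gen:
            w = 2.0 * rbf(x, other, bandwidth) / (n * n * h2)
            for d in range(len(x)):
                g[d] -= w * (x[d] - other[d])
        # Generated-vs-reference term: pulls samples toward the references.
        for r in ref:
            w = 2.0 * rbf(x, r, bandwidth) / (n * m * h2)
            for d in range(len(x)):
                g[d] += w * (x[d] - r[d])
        grads.append(g)
    return grads

def guidance_step(gen, ref, step=0.5, bandwidth=1.0):
    """One gradient-descent move on mmd2 (hypothetical stand-in for guidance)."""
    grads = mmd2_grad(gen, ref, bandwidth)
    return [[xd - step * gd for xd, gd in zip(x, g)] for x, g in zip(gen, grads)]

# Usage: a small batch of 1-D "generated" samples drifts toward the references.
generated = [[0.0], [0.5], [1.0]]
reference = [[2.0], [2.5], [3.0]]
for _ in range(20):
    generated = guidance_step(generated, reference)
```

In an actual sampler the correction would be applied to the noisy iterate (or a denoised estimate of it) at each reverse step. How the two updates are weighted is a design choice that this sketch does not address.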

preprint · 2022 · arXiv

An Optimal Transport Approach to Personalized Federated Learning

Federated learning is a distributed machine learning paradigm that aims to train a model using the local data of many distributed clients. A key challenge in federated learning is that the data samples across the clients may not be identically distributed. To address this challenge, personalized federated learning has been proposed, with the goal of tailoring the learned model to the data distribution of every individual client. In this paper, we focus on this problem and propose a novel personalized Federated Learning scheme based on Optimal Transport (FedOT), a learning algorithm that learns the optimal transport maps for transferring data points to a common distribution as well as the prediction model under the applied transport map. To formulate the FedOT problem, we extend the standard optimal transport task between two probability distributions to multi-marginal optimal transport problems with the goal of transporting samples from multiple distributions to a common probability domain. We then leverage the results on multi-marginal optimal transport problems to formulate FedOT as a min-max optimization problem and analyze its generalization and optimization properties. We discuss the results of several numerical experiments to evaluate the performance of FedOT under heterogeneous data distributions in federated learning problems.

preprint · 2022 · arXiv

On Convergence of Gradient Descent Ascent: A Tight Local Analysis

Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs). Convergence properties of GDA have drawn significant interest in the recent literature. Specifically, for $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x};\mathbf{y})$ where $f$ is strongly concave in $\mathbf{y}$ and possibly nonconvex in $\mathbf{x}$, Lin et al. (2020) proved the convergence of GDA with a stepsize ratio $\eta_{\mathbf{y}}/\eta_{\mathbf{x}}=\Theta(\kappa^2)$ where $\eta_{\mathbf{x}}$ and $\eta_{\mathbf{y}}$ are the stepsizes for $\mathbf{x}$ and $\mathbf{y}$ and $\kappa$ is the condition number for $\mathbf{y}$. While this stepsize ratio suggests a slow training of the min player, practical GAN algorithms typically adopt similar stepsizes for both variables, indicating a wide gap between theoretical and empirical results. In this paper, we aim to bridge this gap by analyzing the \emph{local convergence} of general \emph{nonconvex-nonconcave} minimax problems. We demonstrate that a stepsize ratio of $\Theta(\kappa)$ is necessary and sufficient for local convergence of GDA to a Stackelberg Equilibrium, where $\kappa$ is the local condition number for $\mathbf{y}$. We prove a nearly tight convergence rate with a matching lower bound. We further extend the convergence guarantees to stochastic GDA and extra-gradient methods (EG). Finally, we conduct several numerical experiments to support our theoretical findings.
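A toy example can illustrate why the stepsize ratio matters for convergence to a Stackelberg equilibrium. The sketch below is not taken from the paper. It assumes a hypothetical scalar objective $f(x,y) = -\tfrac{a}{2}x^2 + bxy - \tfrac{\mu}{2}y^2$ with $a=1$, $b=1$, $\mu=0.5$. Here $f$ is strongly concave in $y$ and nonconvex in $x$, while $\max_y f(x,y) = \tfrac{1}{2}(b^2/\mu - a)x^2$ is convex, so the origin is a Stackelberg equilibrium but not a Nash equilibrium. In the small-stepsize limit, the linearized dynamics for this example are stable only when $\eta_y/\eta_x$ exceeds roughly $a/\mu = 2$.

```python
def gda(eta_x, ratio, steps=5000, a=1.0, b=1.0, mu=0.5, x0=1.0, y0=1.0):
    """Simultaneous GDA on f(x, y) = -(a/2) x^2 + b x y - (mu/2) y^2.

    `ratio` is the stepsize ratio eta_y / eta_x. The objective and all
    default constants are illustrative assumptions, not from the paper.
    """
    eta_y = ratio * eta_x
    x, y = x0, y0
    for _ in range(steps):
        grad_x = -a * x + b * y   # partial derivative of f in x
        grad_y = b * x - mu * y   # partial derivative of f in y
        # Both players move from the same iterate: descent in x, ascent in y.
        x, y = x - eta_x * grad_x, y + eta_y * grad_y
    return x, y

# Usage: equal stepsizes spiral away from the origin, a larger ratio converges.
x_equal, y_equal = gda(eta_x=0.01, ratio=1.0)
x_fast_y, y_fast_y = gda(eta_x=0.01, ratio=4.0)
print("ratio 1:", abs(x_equal) + abs(y_equal))
print("ratio 4:", abs(x_fast_y) + abs(y_fast_y))
```

The example only shows that some timescale separation can be required. It does not reproduce the $\Theta(\kappa)$ rate analysis described in the abstract.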

preprint · 2022 · arXiv

On the Role of Generalization in Transferability of Adversarial Examples

Black-box adversarial attacks, which design adversarial examples for unseen neural networks (NNs), have received great attention over the past years. While several successful black-box attack schemes have been proposed in the literature, the underlying factors driving the transferability of black-box adversarial examples are still not thoroughly understood. In this paper, we aim to demonstrate the role of the generalization properties of the substitute classifier used for generating adversarial examples in the transferability of the attack scheme to unobserved NN classifiers. To do this, we apply the max-min adversarial example game framework and show the importance of the generalization properties of the substitute NN in the success of the black-box attack scheme in application to different NN classifiers. We prove theoretical generalization bounds on the difference between the attack transferability rates on training and test samples. Our bounds suggest that a substitute NN with better generalization behavior could result in more transferable adversarial examples. In addition, we show that standard operator norm-based regularization methods could improve the transferability of the designed adversarial examples. We support our theoretical results by performing several numerical experiments showing the role of the substitute network's generalization in generating transferable adversarial examples. Our empirical results indicate the power of Lipschitz regularization methods in improving the transferability of adversarial examples.

preprint · 2020 · arXiv

GANs May Have No Nash Equilibria

Generative adversarial networks (GANs) represent a zero-sum game between two machine players, a generator and a discriminator, designed to learn the distribution of data. While GANs have achieved state-of-the-art performance in several benchmark learning tasks, GAN minimax optimization still poses great theoretical and empirical challenges. GANs trained using first-order optimization methods commonly fail to converge to a stable solution where the players cannot improve their objective, i.e., the Nash equilibrium of the underlying game. Such issues raise the question of the existence of Nash equilibrium solutions in the GAN zero-sum game. In this work, we show through several theoretical and numerical results that indeed GAN zero-sum games may not have any local Nash equilibria. To characterize an equilibrium notion applicable to GANs, we consider the equilibrium of a new zero-sum game with an objective function given by a proximal operator applied to the original objective, a solution we call the proximal equilibrium. Unlike the Nash equilibrium, the proximal equilibrium captures the sequential nature of GANs, in which the generator moves first followed by the discriminator. We prove that the optimal generative model in Wasserstein GAN problems provides a proximal equilibrium. Inspired by these results, we propose a new approach, which we call proximal training, for solving GAN problems. We discuss several numerical experiments demonstrating the existence of proximal equilibrium solutions in GAN minimax problems.

preprint · 2020 · arXiv

GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models

Generative adversarial networks (GANs) learn the distribution of observed samples through a zero-sum game between two machine players, a generator and a discriminator. While GANs achieve great success in learning the complex distribution of image, sound, and text data, they perform suboptimally on multi-modal distribution-learning benchmarks, including Gaussian mixture models (GMMs). In this paper, we propose Generative Adversarial Training for Gaussian Mixture Models (GAT-GMM), a minimax GAN framework for learning GMMs. Motivated by optimal transport theory, we design the zero-sum game in GAT-GMM using a random linear generator and a softmax-based quadratic discriminator architecture, which leads to a nonconvex-concave minimax optimization problem. We show that a Gradient Descent Ascent (GDA) method converges to an approximate stationary minimax point of the GAT-GMM optimization problem. In the benchmark case of a mixture of two symmetric, well-separated Gaussians, we further show this stationary point recovers the true parameters of the underlying GMM. We numerically support our theoretical findings by performing several experiments, which demonstrate that GAT-GMM can perform as well as the expectation-maximization algorithm in learning mixtures of two Gaussians.

preprint · 2020 · arXiv

Robust Federated Learning: The Case of Affine Distribution Shifts

Federated learning is a distributed paradigm that aims at training models using samples distributed across multiple users in a network while keeping the samples on users' devices, with the aim of improving efficiency and protecting users' privacy. In such settings, the training data is often statistically heterogeneous and manifests various distribution shifts across users, which degrades the performance of the learnt model. The primary goal of this paper is to develop a robust federated learning algorithm that achieves satisfactory performance against distribution shifts in users' samples. To achieve this goal, we first consider a structured affine distribution shift in users' data that captures the device-dependent data heterogeneity in federated settings. This perturbation model is applicable to various federated learning problems such as image classification where the images undergo device-dependent imperfections, e.g., different intensity, contrast, and brightness. To address affine distribution shifts across users, we propose a Federated Learning framework Robust to Affine distribution shifts (FLRA) that is provably robust against affine Wasserstein shifts to the distribution of observed samples. To solve FLRA's distributed minimax problem, we propose a fast and efficient optimization method and provide convergence guarantees via a Gradient Descent Ascent (GDA) method. We further prove generalization error bounds for the learnt classifier to show proper generalization from the empirical distribution of samples to the true underlying distribution. We perform several numerical experiments to empirically support FLRA. We show that an affine distribution shift indeed suffices to significantly decrease the performance of the learnt classifier in a new test user, and our proposed algorithm achieves a significant gain in comparison to standard federated learning and adversarial training methods.

preprint · 2015 · arXiv

Discrete Rényi Classifiers

Consider the binary classification problem of predicting a target variable $Y$ from a discrete feature vector $X = (X_1,...,X_d)$. When the probability distribution $\mathbb{P}(X,Y)$ is known, the optimal classifier, leading to the minimum misclassification rate, is given by the Maximum A-posteriori Probability decision rule. However, estimating the complete joint distribution $\mathbb{P}(X,Y)$ is computationally and statistically impossible for large values of $d$. An alternative approach is to first estimate some low order marginals of $\mathbb{P}(X,Y)$ and then design the classifier based on the estimated low order marginals. This approach is also helpful when the complete training data instances are not available due to privacy concerns. In this work, we consider the problem of finding the optimum classifier based on some estimated low order marginals of $(X,Y)$. We prove that for a given set of marginals, the minimum Hirschfeld-Gebelein-Rényi (HGR) correlation principle introduced in [1] leads to a randomized classification rule which is shown to have a misclassification rate no larger than twice the misclassification rate of the optimal classifier. Then, under a separability condition, we show that the proposed algorithm is equivalent to a randomized linear regression approach. In addition, this method naturally results in a robust feature selection method selecting a subset of features having the maximum worst case HGR correlation with the target variable. Our theoretical upper bound is similar to that of the recent Discrete Chebyshev Classifier (DCC) approach [2], while the proposed algorithm has significant computational advantages since it only requires solving a least squares optimization problem. Finally, we numerically compare our proposed algorithm with the DCC classifier and show that the proposed algorithm achieves a lower misclassification rate over various datasets.

preprint · 2015 · arXiv

Minimum HGR Correlation Principle: From Marginals to Joint Distribution

Given low order moment information over the random variables $\mathbf{X} = (X_1,X_2,\ldots,X_p)$ and $Y$, what distribution minimizes the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation coefficient between $\mathbf{X}$ and $Y$, while remaining faithful to the given moments? The answer to this question is important especially in order to fit models over $(\mathbf{X},Y)$ with minimum dependence among the random variables $\mathbf{X}$ and $Y$. In this paper, we investigate this question first in the continuous setting by showing that the jointly Gaussian distribution achieves the minimum HGR correlation coefficient among distributions with the given first and second order moments. Then, we pose a similar question in the discrete scenario by fixing the pairwise marginals of the random variables $\mathbf{X}$ and $Y$. To answer this question in the discrete setting, we first derive a lower bound for the HGR correlation coefficient over the class of distributions with fixed pairwise marginals. Then we show that this lower bound is tight if there exists a distribution with certain \emph{additive} structure satisfying the given pairwise marginals. Moreover, the distribution with the additive structure achieves the minimum HGR correlation coefficient. Finally, we conclude by showing that the event of obtaining pairwise marginals containing an additive structured distribution has a positive Lebesgue measure over the probability simplex.

preprint · 2015 · arXiv

Near Optimal Energy Control and Approximate Capacity of Energy Harvesting Communication

We consider an energy-harvesting communication system where a transmitter powered by an exogenous energy arrival process and equipped with a finite battery of size $B_{max}$ communicates over a discrete-time AWGN channel. We first concentrate on a simple Bernoulli energy arrival process where at each time step, either an energy packet of size $E$ is harvested with probability $p$, or no energy is harvested at all, independent of the other time steps. We provide a near optimal energy control policy and a simple approximation to the information-theoretic capacity of this channel. Our approximations for both problems are universal in all the system parameters involved ($p$, $E$ and $B_{max}$), i.e. we bound the approximation gaps by a constant independent of the parameter values. Our results suggest that a battery size $B_{max}\geq E$ is (approximately) sufficient to extract the infinite battery capacity of this channel. We then extend our results to general i.i.d. energy arrival processes. Our approximate capacity characterizations provide important insights for the optimal design of energy harvesting communication systems in the regime where both the battery size and the average energy arrival rate are large.

preprint · 2014 · arXiv

On feedback in Gaussian multi-hop networks

The study of feedback has been mostly limited to single-hop communication settings. In this paper, we consider Gaussian networks where sources and destinations can communicate with the help of intermediate relays over multiple hops. We assume that links in the network can be bidirected, providing opportunities for feedback. We ask the following question: can the information transfer in both directions of a link be critical to maximizing the end-to-end communication rates in the network? Equivalently, could one of the directions in each bidirected link (and more generally at least one of the links forming a cycle) be shut down and the capacity of the network still be approximately maintained? We show that in any arbitrary Gaussian network with bidirected edges and cycles and unicast traffic, we can always identify a directed acyclic subnetwork that approximately maintains the capacity of the original network. For Gaussian networks with multiple-access and broadcast traffic, an acyclic subnetwork is sufficient to achieve every rate point in the capacity region of the original network; however, there may not be a single acyclic subnetwork that maintains the whole capacity region. For networks with multicast and multiple unicast traffic, on the other hand, bidirected information flow across certain links can be critically needed to maximize the end-to-end capacity region. These results can be regarded as generalizations of the conclusions regarding the usefulness of feedback in various single-hop Gaussian settings and can provide opportunities for simplifying operation in Gaussian multi-hop networks.
