Source author record

Farzin Haddadpour

Farzin Haddadpour appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Information Theory; math.IT; Data Structures and Algorithms; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; math.OC

Catalog footprint

What is connected

8 works

7 topics

4 close collaborators


preprint, 2022, arXiv

Learning Distributionally Robust Models at Scale via Composite Optimization

Distributionally robust optimization (DRO) has proven very effective for training machine learning models that are robust to distribution shifts in the data. However, existing approaches to learning a distributionally robust model either require solving complex optimization problems such as semidefinite programs or rely on first-order methods whose convergence scales linearly with the number of data samples, which hinders their scalability to large datasets. In this paper, we show how different variants of DRO are simply instances of a finite-sum composite optimization problem, for which we provide scalable methods. We also provide empirical results that demonstrate the effectiveness of our proposed algorithm relative to the prior art in learning robust models from very large datasets.
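As an illustration of how a DRO objective can take a finite-sum composite form, consider the KL-regularized variant. This is a standard textbook example and is not necessarily the formulation used in the paper; the symbols $\ell_i$ (per-sample loss), $\lambda > 0$ (regularization weight) and $\Delta_n$ (probability simplex) are assumptions introduced here. Maximizing over the sample weights in closed form gives:

```latex
\min_{w} \max_{p \in \Delta_n}
  \sum_{i=1}^{n} p_i\,\ell_i(w)
  - \lambda\,\mathrm{KL}\!\left(p \,\Big\|\, \tfrac{1}{n}\mathbf{1}\right)
\;=\;
\min_{w}\; \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n} \exp\!\big(\ell_i(w)/\lambda\big) \right)
```

The right-hand side has the composite shape $f\big(\tfrac{1}{n}\sum_{i} g_i(w)\big)$ with inner functions $g_i(w) = \exp(\ell_i(w)/\lambda)$ and outer function $f(u) = \lambda \log u$, so the inner average is a finite sum over the data.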

preprint, 2020, arXiv

Efficient Fair Principal Component Analysis

It has been shown that dimension reduction methods such as PCA may be inherently prone to unfairness and may treat data from different sensitive groups (defined by race, color, sex, etc.) unfairly. In pursuit of fairness-enhancing dimensionality reduction, using the notion of Pareto optimality, we propose an adaptive first-order algorithm to learn a subspace that preserves fairness while slightly compromising the reconstruction loss. Theoretically, we provide sufficient conditions under which the solution of the proposed algorithm belongs to the Pareto frontier for all sensitive groups; thereby, the optimal trade-off between overall reconstruction loss and fairness constraints is guaranteed. We also provide a convergence analysis of our algorithm and show its efficacy through empirical studies on different datasets, which demonstrate superior performance in comparison with state-of-the-art algorithms. The proposed fairness-aware PCA algorithm can be efficiently generalized to multiple group sensitive features and can effectively reduce unfair decisions in downstream tasks such as classification.

preprint, 2020, arXiv

FedSKETCH: Communication-Efficient and Private Federated Learning via Sketching

Communication complexity and privacy are the two key challenges in Federated Learning, where the goal is to perform distributed learning across a large number of devices. In this work, we introduce the FedSKETCH and FedSKETCHGATE algorithms to address both challenges jointly; these algorithms are intended for homogeneous and heterogeneous data distribution settings, respectively. The key idea is to compress the accumulation of local gradients using count sketch; the server therefore does not have access to the gradients themselves, which provides privacy. Furthermore, due to the lower dimension of the sketches, our method is communication-efficient as well. We provide sharp convergence guarantees for the aforementioned schemes. Finally, we back up our theory with a varied set of experiments.
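The count-sketch compression mentioned in the abstract can be illustrated with a minimal, standard-library Python sketch. This is a generic count sketch and not the authors' implementation; the class name `CountSketch`, the helper `add_tables`, and all parameters are hypothetical names chosen for this example. It shows two properties that make sketching attractive here: the receiver sees only a small table instead of the full vector, and sketches built with the same hash functions are linear, so they can be summed across devices.

```python
import random


class CountSketch:
    """Generic count sketch of a length-`dim` vector (illustrative only)."""

    def __init__(self, dim, rows, cols, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # Per row: a random bucket and a random sign for every coordinate.
        self.bucket = [[rng.randrange(cols) for _ in range(dim)]
                       for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)]
                     for _ in range(rows)]

    def compress(self, vec):
        """Map a vector to a rows x cols table of signed bucket sums."""
        table = [[0.0] * self.cols for _ in range(self.rows)]
        for j, v in enumerate(vec):
            for r in range(self.rows):
                table[r][self.bucket[r][j]] += self.sign[r][j] * v
        return table

    def estimate(self, table, j):
        """Estimate coordinate j as the median of its per-row readings."""
        vals = sorted(self.sign[r][j] * table[r][self.bucket[r][j]]
                      for r in range(self.rows))
        return vals[len(vals) // 2]  # exact median when `rows` is odd


def add_tables(a, b):
    """Sum two sketches entry by entry (valid when hashes are shared)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Usage: a 50-dimensional "gradient" compressed into a 5 x 16 table.
cs = CountSketch(dim=50, rows=5, cols=16, seed=1)
grad = [0.0] * 50
grad[7] = 5.0
table = cs.compress(grad)
recovered = cs.estimate(table, 7)
```

With a single nonzero coordinate there are no collisions to distort the reading, so `recovered` equals the original value; for dense vectors the estimate is only approximate and its accuracy depends on `rows` and `cols`.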

preprint, 2020, arXiv

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Communication overhead is one of the key challenges that hinders the scalability of distributed optimization algorithms. In this paper, we study local distributed SGD, where data is partitioned among computation nodes, and the computation nodes perform local updates while periodically exchanging the model among the workers to perform averaging. While local SGD is empirically shown to provide promising results, a theoretical understanding of its performance remains open. We strengthen the convergence analysis for local SGD, and show that local SGD can be far less expensive and applied far more generally than current theory suggests. Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speed up, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker. This is in contrast with previous work, which required a higher number of communication rounds and was limited to strongly convex loss functions, for a similar asymptotic performance. We also develop an adaptive synchronization scheme that provides a general condition for linear speed up. Finally, we validate the theory with experimental results, running over AWS EC2 clouds and an internal GPU cluster.
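The local-update-then-average pattern described in the abstract can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not the paper's algorithm or experimental setup: each worker holds a one-dimensional quadratic loss, the step size is constant, and the function name `local_sgd` and its parameters are hypothetical.

```python
import random


def local_sgd(centers, steps, sync_every, lr=0.1, noise=0.01, seed=0):
    """Local SGD with periodic averaging on a toy problem.

    Assumption: worker i holds f_i(w) = 0.5 * (w - centers[i]) ** 2, so the
    minimizer of the average loss is the mean of `centers`.
    Returns the final averaged model and the number of communication rounds.
    """
    rng = random.Random(seed)
    models = [0.0 for _ in centers]  # one local model per worker
    rounds = 0
    for t in range(1, steps + 1):
        # Each worker takes one noisy gradient step on its own data.
        for i, c in enumerate(centers):
            grad = (models[i] - c) + rng.gauss(0.0, noise)
            models[i] -= lr * grad
        # Every `sync_every` steps the workers exchange and average models.
        if t % sync_every == 0:
            avg = sum(models) / len(models)
            models = [avg] * len(models)
            rounds += 1
    return sum(models) / len(models), rounds


# Usage: 4 workers, 200 local updates each, averaging every 10 updates.
w, rounds = local_sgd([1.0, 2.0, 3.0, 6.0], steps=200, sync_every=10)
```

Here the workers communicate only `steps // sync_every` times rather than after every update, which is the communication saving that local SGD targets. The fixed `sync_every` is a simplification; the adaptive synchronization scheme mentioned in the abstract is not reproduced here.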

preprint, 2016, arXiv

Low-Complexity Stochastic Generalized Belief Propagation

The generalized belief propagation (GBP) algorithm, introduced by Yedidia et al., is an extension of the belief propagation (BP) algorithm, which is widely used in problems that involve calculating exact or approximate marginals of probability distributions. In many problems, it has been observed that the accuracy of GBP considerably outperforms that of BP. However, because the computational complexity of GBP is in general higher than that of BP, its application is limited in practice. In this paper, we introduce a stochastic version of GBP called stochastic generalized belief propagation (SGBP) that can be considered an extension of the stochastic BP (SBP) algorithm introduced by Noorshams et al. They have shown that SBP reduces the complexity per iteration of BP by an order of magnitude in alphabet size. In contrast to SBP, SGBP can reduce the computational complexity only if certain topological conditions are met by the region graph associated with a graphical model. However, this reduction can be larger than one order of magnitude in alphabet size. In this paper, we characterize these conditions and the computational gain that we can obtain by using SGBP. Finally, using proof techniques similar to those employed by Noorshams et al., for general graphical models that satisfy contraction conditions, we prove the asymptotic convergence of SGBP to the unique GBP fixed point, and we provide non-asymptotic upper bounds on the mean square error and on the high probability error.

preprint, 2016, arXiv

Simulation of a Channel with Another Channel

In this paper, we study the problem of simulating a DMC channel from another DMC channel under an average-case and an exact model. We present several achievability and infeasibility results, with tight characterizations in special cases. In particular for the exact model, we fully characterize when a BSC channel can be simulated from a BEC channel when there is no shared randomness. We also provide infeasibility and achievability results for simulation of a binary channel from another binary channel in the case of no shared randomness. To do this, we use properties of Rényi capacity of a given order. We also introduce a notion of "channel diameter" which is shown to be additive and satisfy a data processing inequality.

preprint, 2013, arXiv

On AVCs with Quadratic Constraints

In this work we study an Arbitrarily Varying Channel (AVC) with quadratic power constraints on the transmitter and a so-called "oblivious" jammer (along with additional AWGN) under a maximum probability of error criterion, and no private randomness between the transmitter and the receiver. This is in contrast to similar AVC models under the average probability of error criterion considered in [1], and models wherein common randomness is allowed [2]; these distinctions are important in some communication scenarios outlined below. We consider the regime where the jammer's power constraint is smaller than the transmitter's power constraint (in the other regime it is known that no positive rate is possible). For this regime we show the existence of stochastic codes (with no common randomness between the transmitter and receiver) that enable reliable communication at the same rate as when the jammer is replaced with AWGN with the same power constraint. This matches known information-theoretic outer bounds. In addition to being a stronger result than that in [1] (enabling recovery of the results therein), our proof techniques are also somewhat more direct, and hence may be of independent interest.

preprint, 2012, arXiv

Coordination via a relay

In this paper, we study the problem of coordinating two nodes which can only exchange information via a relay at limited rates. The nodes are allowed to carry out a two-round interactive two-way communication with the relay, after which they should be able to generate i.i.d. copies of two random variables with a given joint distribution within a vanishing total variation distance. We prove inner and outer bounds on the coordination capacity region for this problem. Our inner bound is proved using the technique of "output statistics of random binning" that has recently been developed by Yassaee et al.
