Source author record

Deepesh Data

Deepesh Data appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Machine Learning Information Theory math.IT Distributed, Parallel, and Cluster Computing math.OC

Catalog footprint

What is connected

12works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Generative Framework for Personalized Learning and Estimation: Theory, Algorithms, and Privacy

A distinguishing characteristic of federated learning is that the (local) client data could have statistical heterogeneity. This heterogeneity has motivated the design of personalized learning, where individual (personalized) models are trained, through collaboration. There have been various personalization methods proposed in literature, with seemingly very different forms and methods ranging from use of a single global model for local regularization and model interpolation, to use of multiple global models for personalized clustering, etc. In this work, we begin with a generative framework that could potentially unify several different algorithms as well as suggest new algorithms. We apply our generative framework to personalized estimation, and connect it to the classical empirical Bayes' methodology. We develop private personalized estimation under this framework. We then use our generative framework for learning, which unifies several known personalized FL algorithms and also suggests new ones; we propose and study a new algorithm AdaPeD based on a Knowledge Distillation, which numerically outperforms several known algorithms. We also develop privacy for personalized learning methods with guarantees for user-level privacy and composition. We numerically evaluate the performance as well as the privacy for both the estimation and learning problems, demonstrating the advantages of our proposed methods.

preprint2022arXiv

QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning

Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with {\em diverse resources}. In this work, we introduce a \textit{quantized} and \textit{personalized} FL algorithm QuPeD that facilitates collective (personalized model compression) training via \textit{knowledge distillation} (KD) among clients who have access to heterogeneous data and resources. For personalization, we allow clients to learn \textit{compressed personalized models} with different quantization parameters and model dimensions/structures. Towards this, first we propose an algorithm for learning quantized models through a relaxed optimization problem, where quantization values are also optimized over. When each client participating in the (federated) learning process has different requirements for the compressed model (both in model dimension and precision), we formulate a compressed personalization framework by introducing knowledge distillation loss for local client objectives collaborating through a global model. We develop an alternating proximal gradient update for solving this compressed personalization problem, and analyze its convergence properties. Numerically, we validate that QuPeD outperforms competing personalized FL methods, FedAvg, and local training of clients in various heterogeneous settings.

preprint2021arXiv

QuPeL: Quantized Personalization with Applications to Federated Learning

Traditionally, federated learning (FL) aims to train a single global model while collaboratively using multiple clients and a server. Two natural challenges that FL algorithms face are heterogeneity in data across clients and collaboration of clients with {\em diverse resources}. In this work, we introduce a \textit{quantized} and \textit{personalized} FL algorithm QuPeL that facilitates collective training with heterogeneous clients while respecting resource diversity. For personalization, we allow clients to learn \textit{compressed personalized models} with different quantization parameters depending on their resources. Towards this, first we propose an algorithm for learning quantized models through a relaxed optimization problem, where quantization values are also optimized over. When each client participating in the (federated) learning process has different requirements of the quantized model (both in value and precision), we formulate a quantized personalization framework by introducing a penalty term for local client objectives against a globally trained model to encourage collaboration. We develop an alternating proximal gradient update for solving this quantized personalization problem, and we analyze its convergence properties. Numerically, we show that optimizing over the quantization levels increases the performance and we validate that QuPeL outperforms both FedAvg and local training of clients in a heterogeneous setting.

preprint2020arXiv

Byzantine-Resilient High-Dimensional Federated Learning

We study stochastic gradient descent (SGD) with local iterations in the presence of malicious/Byzantine clients, motivated by the federated learning. The clients, instead of communicating with the central server in every iteration, maintain their local models, which they update by taking several SGD iterations based on their own datasets and then communicate the net update with the server, thereby achieving communication-efficiency. Furthermore, only a subset of clients communicate with the server, and this subset may be different at different synchronization times. The Byzantine clients may collaborate and send arbitrary vectors to the server to disrupt the learning process. To combat the adversary, we employ an efficient high-dimensional robust mean estimation algorithm from Steinhardt et al.~\cite[ITCS 2018]{Resilience_SCV18} at the server to filter-out corrupt vectors; and to analyze the outlier-filtering procedure, we develop a novel matrix concentration result that may be of independent interest. We provide convergence analyses for strongly-convex and non-convex smooth objectives in the heterogeneous data setting, where different clients may have different local datasets, and we do not make any probabilistic assumptions on data generation. We believe that ours is the first Byzantine-resilient algorithm and analysis with local iterations. We derive our convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets). We also extend our results to the case when clients compute full-batch gradients.

preprint2020arXiv

Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter-out corrupt gradients. In order to be able to apply their filtering procedure in our {\em heterogeneous} data setting where workers compute {\em stochastic} gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly-convex and non-convex objectives. We derive our results under the bounded variance assumption on local stochastic gradients and a {\em deterministic} condition on datasets, namely, gradient dissimilarity; and for both these quantities, we provide concrete bounds in the statistical heterogeneous data model. We give a trade-off between the mini-batch size for stochastic gradients and the approximation error. Our algorithm can tolerate up to $\frac{1}{4}$ fraction Byzantine workers. It can find approximate optimal parameters in the strongly-convex setting exponentially fast and reach to an approximate stationary point in the non-convex setting with a linear speed, thus, matching the convergence rates of vanilla SGD in the Byzantine-free setting. We also propose and analyze a Byzantine-resilient SGD algorithm with gradient compression, where workers send $k$ random coordinates of their gradients. Under mild conditions, we show a $\frac{d}{k}$-factor saving in communication bits as well as decoding complexity over our compression-free algorithm without affecting its convergence rate (order-wise) and the approximation error.

preprint2020arXiv

Interactive Secure Function Computation

We consider interactive computation of randomized functions between two users with the following privacy requirement: the interaction should not reveal to either user any extra information about the other user's input and output other than what can be inferred from the user's own input and output. We also consider the case where privacy is required against only one of the users. For both cases, we give single-letter expressions for feasibility and optimal rates of communication. Then we discuss the role of common randomness and interaction in both privacy settings. We also study perfectly secure non-interactive computation when only one of the users computes a randomized function based on a single transmission from the other user. We characterize randomized functions which can be perfectly securely computed in this model and obtain tight bounds on the optimal message lengths in all the privacy settings.

preprint2020arXiv

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization

In this paper, we propose and analyze SPARQ-SGD, which is an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models. Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent. In SPARQ-SGD each node takes at least a fixed number ($H$) of local gradient steps and then checks if the model parameters have significantly changed compared to its last update; it communicates further compressed model parameters only when there is a significant change, as specified by a (design) criterion. We prove that the SPARQ-SGD converges as $O(\frac{1}{nT})$ and $O(\frac{1}{\sqrt{nT}})$ in the strongly-convex and non-convex settings, respectively, demonstrating that such aggressive compression, including event-triggered communication, model sparsification and quantization does not affect the overall convergence rate as compared to uncompressed decentralized training; thereby theoretically yielding communication efficiency for "free". We evaluate SPARQ-SGD over real datasets to demonstrate significant amount of savings in communication over the state-of-the-art.

preprint2020arXiv

Successive Refinement of Privacy

This work examines a novel question: how much randomness is needed to achieve local differential privacy (LDP)? A motivating scenario is providing {\em multiple levels of privacy} to multiple analysts, either for distribution or for heavy-hitter estimation, using the \emph{same} (randomized) output. We call this setting \emph{successive refinement of privacy}, as it provides hierarchical access to the raw data with different privacy levels. For example, the same randomized output could enable one analyst to reconstruct the input, while another can only estimate the distribution subject to LDP requirements. This extends the classical Shannon (wiretap) security setting to local differential privacy. We provide (order-wise) tight characterizations of privacy-utility-randomness trade-offs in several cases for distribution estimation, including the standard LDP setting under a randomness constraint. We also provide a non-trivial privacy mechanism for multi-level privacy. Furthermore, we show that we cannot reuse random keys over time while preserving privacy of each user.

preprint2016arXiv

Communication and Randomness Lower Bounds for Secure Computation

In secure multiparty computation (MPC), mutually distrusting users collaborate to compute a function of their private data without revealing any additional information about their data to other users. While it is known that information theoretically secure MPC is possible among $n$ users (connected by secure and noiseless links and have access to private randomness) against the collusion of less than $n/2$ users in the honest-but-curious model, relatively less is known about the communication and randomness complexity of secure computation. In this work, we employ information theoretic techniques to obtain lower bounds on the amount of communication and randomness required for secure MPC. We restrict ourselves to a concrete interactive setting involving 3 users under which all functions are securely computable against corruption of a single user in the honest-but-curious model. We derive lower bounds for both the perfect security case (i.e., zero-error and no leakage of information) and asymptotic security (where the probability of error and information leakage vanish as block-length goes to $\infty$). Our techniques include the use of a data processing inequality for residual information (i.e., the gap between mutual information and Gács-Körner common information), a new information inequality for 3-user protocols, and the idea of distribution switching. Our lower bounds are shown to be tight for various functions of interest. In particular, we show concrete functions which have "communication-ideal" protocols, i.e., which achieve the minimum communication simultaneously on all links in the network, and also use minimum amount of randomness. Also, we obtain the first explicit example of a function that incurs a higher communication cost than the input length in the secure computation model of "Feige, Kilian, and Naor [STOC, 1994]", who had shown that such functions exist.

preprint2016arXiv

Secure Computation of Randomized Functions

Two user secure computation of randomized functions is considered, where only one user computes the output. Both the users are semi-honest; and computation is such that no user learns any additional information about the other user's input and output other than what cannot be inferred from its own input and output. First we consider a scenario, where privacy conditions are against both the users. In perfect security setting Kilian [STOC 2000] gave a characterization of securely computable randomized functions, and we provide rate-optimal protocols for such functions. We prove that the same characterization holds in asymptotic security setting as well and give a rate-optimal protocol. In another scenario, where privacy condition is only against the user who is not computing the function, we provide rate-optimal protocols. For perfect security in both the scenarios, our results are in terms of chromatic entropies of different graphs. In asymptotic security setting, we get single-letter expressions of rates in both the scenarios.

preprint2014arXiv

How to Securely Compute the Modulo-Two Sum of Binary Sources

In secure multiparty computation, mutually distrusting users in a network want to collaborate to compute functions of data which is distributed among the users. The users should not learn any additional information about the data of others than what they may infer from their own data and the functions they are computing. Previous works have mostly considered the worst case context (i.e., without assuming any distribution for the data); Lee and Abbe (2014) is a notable exception. Here, we study the average case (i.e., we work with a distribution on the data) where correctness and privacy is only desired asymptotically. For concreteness and simplicity, we consider a secure version of the function computation problem of Körner and Marton (1979) where two users observe a doubly symmetric binary source with parameter p and the third user wants to compute the XOR. We show that the amount of communication and randomness resources required depends on the level of correctness desired. When zero-error and perfect privacy are required, the results of Data et al. (2014) show that it can be achieved if and only if a total rate of 1 bit is communicated between every pair of users and private randomness at the rate of 1 is used up. In contrast, we show here that, if we only want the probability of error to vanish asymptotically in block length, it can be achieved by a lower rate (binary entropy of p) for all the links and for private randomness; this also guarantees perfect privacy. We also show that no smaller rates are possible even if privacy is only required asymptotically.

preprint2014arXiv

On the Communication Complexity of Secure Computation

Information theoretically secure multi-party computation (MPC) is a central primitive of modern cryptography. However, relatively little is known about the communication complexity of this primitive. In this work, we develop powerful information theoretic tools to prove lower bounds on the communication complexity of MPC. We restrict ourselves to a 3-party setting in order to bring out the power of these tools without introducing too many complications. Our techniques include the use of a data processing inequality for residual information - i.e., the gap between mutual information and Gács-Körner common information, a new information inequality for 3-party protocols, and the idea of distribution switching by which lower bounds computed under certain worst-case scenarios can be shown to apply for the general case. Using these techniques we obtain tight bounds on communication complexity by MPC protocols for various interesting functions. In particular, we show concrete functions that have "communication-ideal" protocols, which achieve the minimum communication simultaneously on all links in the network. Also, we obtain the first explicit example of a function that incurs a higher communication cost than the input length in the secure computation model of Feige, Kilian and Naor (1994), who had shown that such functions exist. We also show that our communication bounds imply tight lower bounds on the amount of randomness required by MPC protocols for many interesting functions.

Deepesh Data

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

A Generative Framework for Personalized Learning and Estimation: Theory, Algorithms, and Privacy

QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning

QuPeL: Quantized Personalization with Applications to Federated Learning

Byzantine-Resilient High-Dimensional Federated Learning

Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data

Interactive Secure Function Computation

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization

Successive Refinement of Privacy

Communication and Randomness Lower Bounds for Secure Computation

Secure Computation of Randomized Functions

How to Securely Compute the Modulo-Two Sum of Binary Sources

On the Communication Complexity of Secure Computation