Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
22works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

22 published item(s)

preprint2022arXiv

An Optimal Transport Approach to Personalized Federated Learning

Federated learning is a distributed machine learning paradigm, which aims to train a model using the local data of many distributed clients. A key challenge in federated learning is that the data samples across the clients may not be identically distributed. To address this challenge, personalized federated learning with the goal of tailoring the learned model to the data distribution of every individual client has been proposed. In this paper, we focus on this problem and propose a novel personalized Federated Learning scheme based on Optimal Transport (FedOT) as a learning algorithm that learns the optimal transport maps for transferring data points to a common distribution as well as the prediction model under the applied transport map. To formulate the FedOT problem, we extend the standard optimal transport task between two probability distributions to multi-marginal optimal transport problems with the goal of transporting samples from multiple distributions to a common probability domain. We then leverage the results on multi-marginal optimal transport problems to formulate FedOT as a min-max optimization problem and analyze its generalization and optimization properties. We discuss the results of several numerical experiments to evaluate the performance of FedOT under heterogeneous data distributions in federated learning problems.

preprint2022arXiv

Binary Classification Under $\ell_0$ Attacks for General Noise Distribution

Adversarial examples have recently drawn considerable attention in the field of machine learning due to the fact that small perturbations in the data can result in major performance degradation. This phenomenon is usually modeled by a malicious adversary that can apply perturbations to the data in a constrained fashion, such as being bounded in a certain norm. In this paper, we study this problem when the adversary is constrained by the $\ell_0$ norm; i.e., it can perturb a certain number of coordinates in the input, but has no limit on how much it can perturb those coordinates. Due to the combinatorial nature of this setting, we need to go beyond the standard techniques in robust machine learning to address this problem. We consider a binary classification scenario where $d$ noisy data samples of the true label are provided to us after adversarial perturbations. We introduce a classification method which employs a nonlinear component called truncation, and show in an asymptotic scenario, as long as the adversary is restricted to perturb no more than $\sqrt{d}$ data samples, we can almost achieve the optimal classification error in the absence of the adversary, i.e. we can completely neutralize adversary's effect. Surprisingly, we observe a phase transition in the sense that using a converse argument, we show that if the adversary can perturb more than $\sqrt{d}$ coordinates, no classifier can do better than a random guess.

preprint2022arXiv

Congestion-aware Bi-modal Delivery Systems Utilizing Drones

Bi-modal delivery systems are a promising solution to the challenges posed by the increasing demand of e-commerce. Due to the potential benefit drones can have on logistics networks such as delivery systems, some countries have taken steps towards integrating drones into their airspace. In this paper we aim to quantify this potential by developing a mathematical model for a Bi-modal delivery system composed of trucks and drones. We propose an optimization formulation that can be efficiently solved in order to design socially-optimal routing and allocation policies. We incorporate both societal cost in terms of road congestion and parcel delivery latency in our formulation. Our model is able to quantify the effect drones have on mitigating road congestion, and can solve for the path routing needed to minimize the chosen objective. To accurately capture the effect of stopping trucks on road latency, we model it using SUMO by simulating roads shared by trucks and cars. Based on this, we show that the proposed framework is computationally feasible to scale due to its reliance on convex quadratic optimization techniques.

preprint2022arXiv

Efficient and Robust Classification for Sparse Attacks

In the past two decades we have seen the popularity of neural networks increase in conjunction with their classification accuracy. Parallel to this, we have also witnessed how fragile the very same prediction models are: tiny perturbations to the inputs can cause misclassification errors throughout entire datasets. In this paper, we consider perturbations bounded by the $\ell_0$--norm, which have been shown as effective attacks in the domains of image-recognition, natural language processing, and malware-detection. To this end, we propose a novel defense method that consists of "truncation" and "adversarial training". We then theoretically study the Gaussian mixture setting and prove the asymptotic optimality of our proposed classifier. Motivated by the insights we obtain, we extend these components to neural network classifiers. We conduct numerical experiments in the domain of computer vision using the MNIST and CIFAR datasets, demonstrating significant improvement for the robust classification error of neural networks.

preprint2022arXiv

Imitation Learning by Estimating Expertise of Demonstrators

Many existing imitation learning datasets are collected from multiple demonstrators, each with different expertise at different parts of the environment. Yet, standard imitation learning algorithms typically treat all demonstrators as homogeneous, regardless of their expertise, absorbing the weaknesses of any suboptimal demonstrators. In this work, we show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms. We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators. This enables our model to learn from the optimal behavior and filter out the suboptimal behavior of each demonstrator. Our model learns a single policy that can outperform even the best demonstrator, and can be used to estimate the expertise of any demonstrator at any state. We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess, out-performing competing methods in $21$ out of $23$ settings, with an average of $7\%$ and up to $60\%$ improvement in terms of the final reward.

preprint2022arXiv

Social Coordination and Altruism in Autonomous Driving

Despite the advances in the autonomous driving domain, autonomous vehicles (AVs) are still inefficient and limited in terms of cooperating with each other or coordinating with vehicles operated by humans. A group of autonomous and human-driven vehicles (HVs) which work together to optimize an altruistic social utility -- as opposed to the egoistic individual utility -- can co-exist seamlessly and assure safety and efficiency on the road. Achieving this mission without explicit coordination among agents is challenging, mainly due to the difficulty of predicting the behavior of humans with heterogeneous preferences in mixed-autonomy environments. Formally, we model an AV's maneuver planning in mixed-autonomy traffic as a partially-observable stochastic game and attempt to derive optimal policies that lead to socially-desirable outcomes using a multi-agent reinforcement learning framework. We introduce a quantitative representation of the AVs' social preferences and design a distributed reward structure that induces altruism into their decision making process. Our altruistic AVs are able to form alliances, guide the traffic, and affect the behavior of the HVs to handle competitive driving scenarios. As a case study, we compare egoistic AVs to our altruistic autonomous agents in a highway merging setting and demonstrate the emerging behaviors that lead to a noticeable improvement in the number of successful merges as well as the overall traffic flow and safety.

preprint2022arXiv

Straggler-Resilient Personalized Federated Learning

Federated Learning is an emerging learning paradigm that allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. Despite its success, federated learning faces several challenges related to its decentralized nature. In this work, we develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles, namely (i) data heterogeneity, i.e., data distributions can vary substantially across clients, and (ii) system heterogeneity, i.e., the computational power of the clients could differ significantly. Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client. Furthermore, our method mitigates the effects of stragglers by adaptively selecting clients based on their computational characteristics and statistical significance, thus achieving, for the first time, near optimal sample complexity and provable logarithmic speedup. Experimental results support our theoretical findings showing the superiority of our method over alternative personalized federated schemes in system and data heterogeneous environments.

preprint2022arXiv

Towards Learning Generalizable Driving Policies from Restricted Latent Representations

Training intelligent agents that can drive autonomously in various urban and highway scenarios has been a hot topic in the robotics society within the last decades. However, the diversity of driving environments in terms of road topology and positioning of the neighboring vehicles makes this problem very challenging. It goes without saying that although scenario-specific driving policies for autonomous driving are promising and can improve transportation safety and efficiency, they are clearly not a universal scalable solution. Instead, we seek decision-making schemes and driving policies that can generalize to novel and unseen environments. In this work, we capitalize on the key idea that human drivers learn abstract representations of their surroundings that are fairly similar among various driving scenarios and environments. Through these representations, human drivers are able to quickly adapt to novel environments and drive in unseen conditions. Formally, through imposing an information bottleneck, we extract a latent representation that minimizes the \textit{distance} -- a quantification that we introduce to gauge the similarity among different driving configurations -- between driving scenarios. This latent space is then employed as the input to a Q-learning module to learn generalizable driving policies. Our experiments revealed that, using this latent representation can reduce the number of crashes to about half.

preprint2021arXiv

Edge Computing in the Dark: Leveraging Contextual-Combinatorial Bandit and Coded Computing

With recent advancements in edge computing capabilities, there has been a significant increase in utilizing the edge cloud for event-driven and time-sensitive computations. However, large-scale edge computing networks can suffer substantially from unpredictable and unreliable computing resources which can result in high variability of service quality. Thus, it is crucial to design efficient task scheduling policies that guarantee quality of service and the timeliness of computation queries. In this paper, we study the problem of computation offloading over unknown edge cloud networks with a sequence of timely computation jobs. Motivated by the MapReduce computation paradigm, we assume each computation job can be partitioned to smaller Map functions that are processed at the edge, and the Reduce function is computed at the user after the Map results are collected from the edge nodes. We model the service quality (success probability of returning result back to the user within deadline) of each edge device as function of context (collection of factors that affect edge devices). The user decides the computations to offload to each device with the goal of receiving a recoverable set of computation results in the given deadline. Our goal is to design an efficient edge computing policy in the dark without the knowledge of the context or computation capabilities of each device. By leveraging the \emph{coded computing} framework in order to tackle failures or stragglers in computation, we formulate this problem using contextual-combinatorial multi-armed bandits (CC-MAB), and aim to maximize the cumulative expected reward. We propose an online learning policy called \emph{online coded edge computing policy}, which provably achieves asymptotically-optimal performance in terms of regret loss compared with the optimal offline policy for the proposed CC-MAB problem.

preprint2021arXiv

Generalized Likelihood Ratio Test for Adversarially Robust Hypothesis Testing

Machine learning models are known to be susceptible to adversarial attacks which can cause misclassification by introducing small but well designed perturbations. In this paper, we consider a classical hypothesis testing problem in order to develop fundamental insight into defending against such adversarial perturbations. We interpret an adversarial perturbation as a nuisance parameter, and propose a defense based on applying the generalized likelihood ratio test (GLRT) to the resulting composite hypothesis testing problem, jointly estimating the class of interest and the adversarial perturbation. While the GLRT approach is applicable to general multi-class hypothesis testing, we first evaluate it for binary hypothesis testing in white Gaussian noise under $\ell_{\infty}$ norm-bounded adversarial perturbations, for which a known minimax defense optimizing for the worst-case attack provides a benchmark. We derive the worst-case attack for the GLRT defense, and show that its asymptotic performance (as the dimension of the data increases) approaches that of the minimax defense. For non-asymptotic regimes, we show via simulations that the GLRT defense is competitive with the minimax approach under the worst-case attack, while yielding a better robustness-accuracy tradeoff under weaker attacks. We also illustrate the GLRT approach for a multi-class hypothesis testing problem, for which a minimax strategy is not known, evaluating its performance under both noise-agnostic and noise-aware adversarial settings, by providing a method to find optimal noise-aware attacks, and heuristics to find noise-agnostic attacks that are close to optimal in the high SNR regime.

preprint2021arXiv

Incentivizing Routing Choices for Safe and Efficient Transportation in the Face of the COVID-19 Pandemic

The COVID-19 pandemic has severely affected many aspects of people's daily lives. While many countries are in a re-opening stage, some effects of the pandemic on people's behaviors are expected to last much longer, including how they choose between different transport options. Experts predict considerably delayed recovery of the public transport options, as people try to avoid crowded places. In turn, significant increases in traffic congestion are expected, since people are likely to prefer using their own vehicles or taxis as opposed to riskier and more crowded options such as the railway. In this paper, we propose to use financial incentives to set the tradeoff between risk of infection and congestion to achieve safe and efficient transportation networks. To this end, we formulate a network optimization problem to optimize taxi fares. For our framework to be useful in various cities and times of the day without much designer effort, we also propose a data-driven approach to learn human preferences about transport options, which is then used in our taxi fare optimization. Our user studies and simulation experiments show our framework is able to minimize congestion and risk of infection.

preprint2020arXiv

Coded Computing for Distributed Graph Analytics

Performance of distributed graph processing systems significantly suffers from 'communication bottleneck' as a large number of messages are exchanged among servers at each step of the computation. Motivated by graph based MapReduce, we propose a coded computing framework that leverages computation redundancy to alleviate the communication bottleneck in distributed graph processing. We develop a novel 'coding' scheme that systematically injects structured redundancy in computation phase to enable 'coded' multicasting opportunities during message exchange between servers, reducing communication load substantially in large-scale graph processing. For theoretical analysis, we consider random graph models, and prove that our proposed scheme enables an (asymptotically) inverse-linear trade-off between 'computation load' and 'average communication load' for two popular random graph models -- Erdos-Renyi model, and power law model. Particularly, for a given computation load r, (i.e. when each graph vertex is carefully stored at r servers), the proposed scheme slashes the average communication load by (nearly) a multiplicative factor of r. For the Erdos-Renyi model, our proposed scheme is optimal asymptotically as the graph size increases by providing an information-theoretic converse. To illustrate the benefits of our scheme in practice, we implement PageRank over Amazon EC2, using artificial as well as real-world datasets, demonstrating significant gains over conventional PageRank. We also specialize our scheme and extend our theoretical results to two other random graph models -- random bi-partite model, and stochastic block model. They asymptotically enable inverse-linear trade-offs between computation and communication loads in distributed graph processing for these popular random graph models as well. We complement the achievability results with converse bounds for both of these models.

preprint2020arXiv

FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization

Federated learning is a distributed framework according to which a model is trained over a set of devices, while keeping data localized. This framework faces several systems-oriented challenges which include (i) communication bottleneck since a large number of devices upload their local updates to a parameter server, and (ii) scalability as the federated network consists of millions of devices. Due to these systems challenges as well as issues related to statistical heterogeneity of data and privacy concerns, designing a provably efficient federated learning method is of significant importance yet it remains challenging. In this paper, we present FedPAQ, a communication-efficient Federated Learning method with Periodic Averaging and Quantization. FedPAQ relies on three key features: (1) periodic averaging where models are updated locally at devices and only periodically averaged at the server; (2) partial device participation where only a fraction of devices participate in each round of the training; and (3) quantized message-passing where the edge nodes quantize their updates before uploading to the parameter server. These features address the communications and scalability challenges in federated learning. We also show that FedPAQ achieves near-optimal theoretical guarantees for strongly convex and non-convex loss functions and empirically demonstrate the communication-computation tradeoff provided by our method.

preprint2020arXiv

Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes where both the number of measurements and the number of unknown parameters is large is only recently emerging. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows to precisely quantify the sub-optimality of popular heuristic choices: for instance, we show that optimally-tuned least-squares is (perhaps surprisingly) approximately optimal for standard logistic data, but the sub-optimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge-regularization as a function of the over-parameterization ratio. Notably, our bounds are expressed in terms of the Fisher Information of random variables that are simple functions of the data distribution, thus making ties to corresponding bounds in classical statistics.

preprint2020arXiv

Mixed Autonomy in Ride-Sharing Networks

We consider ride-sharing networks served by human-driven vehicles (HVs) and autonomous vehicles (AVs). We propose a model for ride-sharing in this mixed autonomy setting for a multi-location network in which a ride-sharing platform sets prices for riders, compensations for drivers of HVs, and operates AVs for a fixed price with the goal of maximizing profits. When there are more vehicles than riders at a location, we consider three vehicle-to-rider assignment possibilities: rides are assigned to HVs first; rides are assigned to AVs first; rides are assigned in proportion to the number of available HVs and AVs. Next, for each of these priority possibilities, we establish a nonconvex optimization problem characterizing the optimal profits for a network operating at a steady-state equilibrium. We then provide a convex problem which we show to have the same optimal profits, allowing for efficient computation of equilibria, and we show that all three priority possibilities result in the same maximum profits for the platform. Next, we show that, in some cases, there is a regime for which the platform will choose to mix HVs and AVs in order to maximize its profit, while in other cases, the platform will use only HVs or only AVs, depending on the relative cost of AVs. For a specific class of networks, we fully characterize these thresholds analytically and demonstrate our results on an example.

preprint2020arXiv

Optimal Tolling for Multitype Mixed Autonomous Traffic Networks

When selfish users share a road network and minimize their individual travel costs, the equilibrium they reach can be worse than the socially optimal routing. Tolls are often used to mitigate this effect in traditional congestion games, where all vehicle contribute identically to congestion. However, with the proliferation of autonomous vehicles and driver-assistance technology, vehicles become heterogeneous in how they contribute to road latency. This magnifies the potential inefficiencies due to selfish routing and invalidates traditional tolling methods. To address this, we consider a network of parallel roads where the latency on each road is an affine function of the quantity of flow of each vehicle type. We provide tolls (which differentiate between vehicle types) which are guaranteed to minimize social cost at equilibrium. The tolls are a function of a calculated optimal routing; to enable this tolling, we prove that some element in the set of optimal routings has a lack of cycles in a graph representing the way vehicles types share roads. We then show that unless a planner can differentiate between vehicle types in the tolls given, the resulting equilibrium can be unboundedly worse than the optimal routing, and that marginal cost tolling fails in our setting.

preprint2020arXiv

Polarizing Front Ends for Robust CNNs

The vulnerability of deep neural networks to small, adversarially designed perturbations can be attributed to their "excessive linearity." In this paper, we propose a bottom-up strategy for attenuating adversarial perturbations using a nonlinear front end which polarizes and quantizes the data. We observe that ideal polarization can be utilized to completely eliminate perturbations, develop algorithms to learn approximately polarizing bases for data, and investigate the effectiveness of the proposed strategy on the MNIST and Fashion MNIST datasets.

preprint2020arXiv

Robust Federated Learning: The Case of Affine Distribution Shifts

Federated learning is a distributed paradigm that aims at training models using samples distributed across multiple users in a network while keeping the samples on users' devices with the aim of efficiency and protecting users privacy. In such settings, the training data is often statistically heterogeneous and manifests various distribution shifts across users, which degrades the performance of the learnt model. The primary goal of this paper is to develop a robust federated learning algorithm that achieves satisfactory performance against distribution shifts in users' samples. To achieve this goal, we first consider a structured affine distribution shift in users' data that captures the device-dependent data heterogeneity in federated settings. This perturbation model is applicable to various federated learning problems such as image classification where the images undergo device-dependent imperfections, e.g. different intensity, contrast, and brightness. To address affine distribution shifts across users, we propose a Federated Learning framework Robust to Affine distribution shifts (FLRA) that is provably robust against affine Wasserstein shifts to the distribution of observed samples. To solve the FLRA's distributed minimax problem, we propose a fast and efficient optimization method and provide convergence guarantees via a gradient Descent Ascent (GDA) method. We further prove generalization error bounds for the learnt classifier to show proper generalization from empirical distribution of samples to the true underlying distribution. We perform several numerical experiments to empirically support FLRA. We show that an affine distribution shift indeed suffices to significantly decrease the performance of the learnt classifier in a new test user, and our proposed algorithm achieves a significant gain in comparison to standard federated learning and adversarial training methods.

preprint2020arXiv

Sharp Asymptotics and Optimal Performance for Inference in Binary Models

We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit), by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

preprint2020arXiv

Sharp Guarantees for Solving Random Equations with One-Bit Information

We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high-dimensions. Our general result predicts sharply the performance of such estimators in the linear asymptotic regime when the measurement vectors have entries IID Gaussian. This includes, as a special case, the previously studied least-squares estimator and various novel results for other popular estimators such as least-absolute deviations, hinge-loss and logistic-loss. Importantly, we exploit the fact that our analysis holds for generic convex loss functions to prove a bound on the best achievable performance across the entire class of estimators. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

preprint2020arXiv

Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity

Federated Learning is a novel paradigm that involves learning from data samples distributed across a large network of clients while the data remains local. It is, however, known that federated learning is prone to multiple system challenges including system heterogeneity where clients have different computation and communication capabilities. Such heterogeneity in clients' computation speeds has a negative effect on the scalability of federated learning algorithms and causes significant slow-down in their runtime due to the existence of stragglers. In this paper, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure. The key idea of our algorithm is to start the training procedure with faster nodes and gradually involve the slower nodes in the model training once the statistical accuracy of the data corresponding to the current participating nodes is reached. The proposed approach reduces the overall runtime required to achieve the statistical accuracy of data of all nodes, as the solution for each stage is close to the solution of the subsequent stage with more samples and can be used as a warm-start. Our theoretical results characterize the speedup gain in comparison to standard federated benchmarks for strongly convex objectives, and our numerical experiments also demonstrate significant speedups in wall-clock time of our straggler-resilient method compared to federated learning benchmarks.

preprint2019arXiv

The Green Choice: Learning and Influencing Human Decisions on Shared Roads

Autonomous vehicles have the potential to increase the capacity of roads via platooning, even when human drivers and autonomous vehicles share roads. However, when users of a road network choose their routes selfishly, the resulting traffic configuration may be very inefficient. Because of this, we consider how to influence human decisions so as to decrease congestion on these roads. We consider a network of parallel roads with two modes of transportation: (i) human drivers who will choose the quickest route available to them, and (ii) ride hailing service which provides an array of autonomous vehicle ride options, each with different prices, to users. In this work, we seek to design these prices so that when autonomous service users choose from these options and human drivers selfishly choose their resulting routes, road usage is maximized and transit delay is minimized. To do so, we formalize a model of how autonomous service users make choices between routes with different price/delay values. Developing a preference-based algorithm to learn the preferences of the users, and using a vehicle flow model related to the Fundamental Diagram of Traffic, we formulate a planning optimization to maximize a social objective and demonstrate the benefit of the proposed routing and learning scheme.