Source author record

Dragana Bajovic

Dragana Bajovic appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, Machine Learning, math.OC, Applications, math.PR, Multiagent Systems, Networking and Internet Architecture, Social and Information Networks

Catalog footprint

What is connected

17 works

9 topics

4 close collaborators

preprint · 2022 · arXiv

Gradient Based Clustering

We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions. The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions, satisfying some mild assumptions. The main advantage of the proposed approach is a simple and computationally cheap update rule. Unlike previous methods that specialize to a specific formulation of the clustering problem, our approach is applicable to a wide range of costs, including non-Bregman clustering methods based on the Huber loss. We analyze the convergence of the proposed algorithm, and show that it converges to the set of appropriately defined fixed points, under arbitrary center initialization. In the special case of Bregman cost functions, the algorithm converges to the set of centroidal Voronoi partitions, which is consistent with prior works. Numerical experiments on real data demonstrate the effectiveness of the proposed method.
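The two-step procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the Huber threshold `delta`, the constant step size `step`, the nearest-center assignment rule, and all function names are hypothetical choices made for the example.

```python
import math

def huber_grad(center, point, delta):
    """Gradient, with respect to the center, of the Huber loss of ||center - point||."""
    diff = [c - p for c, p in zip(center, point)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= delta:
        return diff                              # quadratic region of the Huber loss
    return [delta * d / norm for d in diff]      # linear region of the Huber loss

def assign(points, centers):
    """Step 1: give each point the index of its nearest center."""
    def sqdist(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return [min(range(len(centers)), key=lambda k: sqdist(x, centers[k]))
            for x in points]

def gradient_clustering(points, centers, step=0.05, delta=1.0, iters=200):
    """Alternate between cluster assignment and one gradient step per center."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        labels = assign(points, centers)
        for k, center in enumerate(centers):
            grad = [0.0] * len(center)
            for x, label in zip(points, labels):
                if label == k:
                    g = huber_grad(center, x, delta)
                    grad = [a + b for a, b in zip(grad, g)]
            # Step 2: move the center along the negative gradient of its cluster cost.
            centers[k] = [c - step * g for c, g in zip(center, grad)]
    return centers, assign(points, centers)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = gradient_clustering(points, [(2, 2), (8, 8)])
```

The center update costs one gradient evaluation per assigned point, which is the kind of cheap update rule the abstract refers to; no per-cluster minimization problem is solved.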

preprint · 2022 · arXiv

Inaccuracy rates for distributed inference over random networks with applications to social learning

This paper studies probabilistic rates of convergence for consensus+innovations type of algorithms in random, generic networks. For each node, we find a lower and also a family of upper bounds on the large deviations rate function, thus enabling the computation of the exponential convergence rates for the events of interest on the iterates. Relevant applications include error exponents in distributed hypothesis testing, rates of convergence of beliefs in social learning, and inaccuracy rates in distributed estimation. The bounds on the rate function have a very particular form at each node: they are constructed as the convex envelope between the rate function of the hypothetical fusion center and the rate function corresponding to a certain topological mode of the node's presence. We further show tightness of the discovered bounds for several cases, such as pendant nodes and regular networks, thus establishing the first proof of the large deviations principle for consensus+innovations and social learning in random networks.

preprint · 2022 · arXiv

Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

We introduce a general framework for nonlinear stochastic gradient descent (SGD) for scenarios in which the gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, such as clipped, normalized, signed or quantized gradients, but we also consider novel nonlinearity choices. For the considered class of methods, we establish strong convergence guarantees assuming a strongly convex cost function with Lipschitz continuous gradients, under very general assumptions on the gradient noise. Most notably, we show that, for a nonlinearity with bounded outputs and for gradient noise that may not have finite moments of order greater than one, the nonlinear SGD's mean squared error (MSE), or equivalently, the expected cost function's optimality gap, converges to zero at rate $O(1/t^\zeta)$, $\zeta \in (0,1)$. In contrast, for the same noise setting, the linear SGD generates a sequence with unbounded variances. Furthermore, for nonlinearities that can be decoupled component-wise, such as the sign gradient or component-wise clipping, we show that the nonlinear SGD asymptotically (locally) achieves an $O(1/t)$ rate in the weak convergence sense, and we explicitly quantify the corresponding asymptotic variance. Experiments show that, while our framework is more general than existing studies of SGD under heavy-tail noise, several easy-to-implement nonlinearities from our framework are competitive with state-of-the-art alternatives on real data sets with heavy-tailed noise.
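As a rough illustration of the idea, the sketch below applies two bounded nonlinearities (clipping and sign) to noisy gradients of a one-dimensional quadratic. Everything here is an assumption made for the example rather than the paper's setup: the cost `f(x) = x**2 / 2`, the step size `1 / (t + 1)`, the clipping bound, and the symmetrized shifted-Pareto noise, which has zero mean and infinite variance for tail index 1.5.

```python
import random

def clip(g, bound=1.0):
    """Clipping nonlinearity: output is confined to [-bound, bound]."""
    return max(-bound, min(bound, g))

def sign(g):
    """Sign nonlinearity: returns -1, 0 or 1."""
    return (g > 0) - (g < 0)

def heavy_tail_noise(rng, alpha=1.5):
    """Symmetric noise with zero mean and infinite variance (hypothetical choice)."""
    magnitude = rng.paretovariate(alpha) - 1.0
    return magnitude if rng.random() < 0.5 else -magnitude

def nonlinear_sgd(nonlinearity, x0=5.0, steps=2000, noise=None, seed=0):
    """Run x <- x - a_t * nonlinearity(noisy gradient) on f(x) = x**2 / 2."""
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        grad = x + (noise(rng) if noise else 0.0)   # the gradient of f at x is x
        x -= nonlinearity(grad) / (t + 1)           # step size a_t = 1 / (t + 1)
    return x

x_clip = nonlinear_sgd(clip, noise=heavy_tail_noise)
x_sign = nonlinear_sgd(sign, noise=heavy_tail_noise)
```

Because both nonlinearities have bounded outputs, a single extreme noise sample can move the iterate by at most one step size, which is the intuition behind applying them under heavy-tailed noise.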

preprint · 2022 · arXiv

Personalized Federated Learning via Convex Clustering

We propose a parametric family of algorithms for personalized federated learning with locally convex user costs. The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized via a sum-of-norms penalty, weighted by a penalty parameter $\lambda$. The proposed approach enables "automatic" model clustering, without prior knowledge of the hidden cluster structure or the number of clusters. We provide analytical bounds on the weight parameter that lead to simultaneous personalization, generalization and automatic model clustering. The solution to the formulated problem enables personalization, by providing different models across different clusters, and generalization, by providing models different from the per-user models computed in isolation. We then provide an efficient algorithm based on the Parallel Direction Method of Multipliers (PDMM) to solve the proposed formulation in a federated server-users setting. Numerical experiments corroborate our findings. As an interesting byproduct, our results provide several generalizations to convex clustering.
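A sum-of-norms formulation of the kind described here can be written as below. The notation is an assumption made for illustration: $N$ users, local convex cost $f_i$ and model $x_i$ for user $i$, an unweighted penalty over all pairs $i<j$, and an unspecified norm. The paper's exact formulation may differ, for example in which pairs are penalized.

```latex
\min_{x_1,\dots,x_N}\; \sum_{i=1}^{N} f_i(x_i) \;+\; \lambda \sum_{i<j} \left\| x_i - x_j \right\|
```

With $\lambda = 0$ the problem decouples into the per-user problems solved in isolation, while the penalty term pulls the models of different users toward each other as $\lambda$ grows.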

preprint · 2020 · arXiv

Primal-dual methods for large-scale and distributed convex optimization and data analytics

The augmented Lagrangian method (ALM) is a classical optimization tool that solves a given "difficult" (constrained) problem by finding solutions of a sequence of "easier" (often unconstrained) sub-problems with respect to the original (primal) variable, wherein constraint satisfaction is controlled via the so-called dual variables. ALM is highly flexible with respect to how primal sub-problems can be solved, giving rise to a plethora of different primal-dual methods. The powerful ALM mechanism has recently proved to be very successful in various large scale and distributed applications. In addition, several significant advances have appeared, primarily on precise complexity results with respect to computational and communication costs in the presence of inexact updates and design and analysis of novel optimal methods for distributed consensus optimization. We provide a tutorial-style introduction to ALM and its variants for solving convex optimization problems in large scale and distributed settings. We describe control-theoretic tools for the algorithms' analysis and design, survey recent results, and provide novel insights in the context of two emerging applications: federated learning and distributed energy trading.
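The primal and dual steps of ALM can be illustrated on a toy equality-constrained problem. The sketch below is an assumption-laden example, not code from the paper: the problem (minimize a two-variable quadratic subject to `x1 + x2 = 1`), the penalty `rho`, the inner gradient-descent solver and its step size are all hypothetical choices.

```python
def alm(a=(1.0, 2.0), rho=1.0, outer=50, inner=100, step=0.2):
    """Minimize (x1 - a1)^2 + (x2 - a2)^2 subject to x1 + x2 = 1."""
    x = [0.0, 0.0]
    mu = 0.0                                   # dual variable for the constraint
    for _ in range(outer):
        # Primal step: approximately minimize the augmented Lagrangian
        # by gradient descent, warm-started at the previous x.
        for _ in range(inner):
            residual = x[0] + x[1] - 1.0
            grad = [2.0 * (xi - ai) + mu + rho * residual
                    for xi, ai in zip(x, a)]
            x = [xi - step * gi for xi, gi in zip(x, grad)]
        # Dual step: update the multiplier with the remaining constraint violation.
        mu += rho * (x[0] + x[1] - 1.0)
    return x, mu

x, mu = alm()
```

The inner loop could be replaced by any other solver for the unconstrained sub-problem, which is the flexibility with respect to primal sub-problems that the abstract mentions.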

preprint · 2016 · arXiv

CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT

In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense, which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing "atomic" functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.

preprint · 2016 · arXiv

Distributed Gradient Methods with Variable Number of Working Nodes

We consider distributed optimization where $N$ nodes in a connected network minimize the sum of their local costs subject to a common constraint set. We propose a distributed projected gradient method where each node, at each iteration $k$, performs an update (is active) with probability $p_k$, and stays idle (is inactive) with probability $1-p_k$. Whenever active, each node performs an update by weight-averaging its solution estimate with the estimates of its active neighbors, taking a negative gradient step with respect to its local cost, and performing a projection onto the constraint set; inactive nodes perform no updates. Assuming that nodes' local costs are strongly convex, with Lipschitz continuous gradients, we show that, as long as activation probability $p_k$ grows to one asymptotically, our algorithm converges in the mean square sense (MSS) to the same solution as the standard distributed gradient method, i.e., as if all the nodes were active at all iterations. Moreover, when $p_k$ grows to one linearly, with an appropriately set convergence factor, the algorithm has a linear MSS convergence, with practically the same factor as the standard distributed gradient method. Simulations on both synthetic and real world data sets demonstrate that, when compared with the standard distributed gradient method, the proposed algorithm significantly reduces the overall number of per-node communications and per-node gradient evaluations (computational cost) for the same required accuracy.
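A small simulation of the update described in this abstract is sketched below. It is a simplified illustration under assumptions not taken from the paper: scalar quadratic local costs, an interval as the constraint set, uniform averaging weights over a node and its active neighbors, a constant step size `alpha`, and the activation schedule `p_k = 1 - sigma ** (k + 1)`.

```python
import random

def distributed_projected_gradient(b, neighbors, lo, hi, alpha=0.05,
                                   sigma=0.9, iters=500, seed=0):
    """Local costs f_i(x) = (x - b[i])**2 / 2, common constraint set [lo, hi]."""
    rng = random.Random(seed)
    x = [lo] * len(b)
    for k in range(iters):
        p_k = 1.0 - sigma ** (k + 1)              # activation probability, grows to one
        active = [rng.random() < p_k for _ in b]
        new_x = list(x)                           # idle nodes keep their estimates
        for i in range(len(b)):
            if not active[i]:
                continue
            group = [i] + [j for j in neighbors[i] if active[j]]
            avg = sum(x[j] for j in group) / len(group)   # average with active neighbors
            moved = avg - alpha * (x[i] - b[i])           # negative gradient step on f_i
            new_x[i] = min(hi, max(lo, moved))            # projection onto [lo, hi]
        x = new_x
    return x

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = distributed_projected_gradient([1.0, 2.0, 3.0, 4.0], ring, lo=0.0, hi=5.0)
```

Setting `sigma=0.0` makes every node active at every iteration, which recovers the standard distributed gradient method as a baseline. With a constant step size, both variants settle in a neighborhood of the minimizer of the summed cost (2.5 in this example) rather than exactly at it.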

preprint · 2015 · arXiv

Cooperative Slotted Aloha for Multi-Base Station Systems

We introduce a framework to study slotted Aloha with cooperative base stations. Assuming a geographic-proximity communication model, we propose several decoding algorithms with different degrees of base stations' cooperation (non-cooperative, spatial, temporal, and spatio-temporal). With spatial cooperation, neighboring base stations inform each other whenever they collect a user within their coverage overlap; temporal cooperation corresponds to (temporal) successive interference cancellation done locally at each station. We analyze the four decoding algorithms and establish several fundamental results. With all algorithms, the peak throughput (average number of decoded users per slot, across all base stations) increases linearly with the number of base stations. Further, temporal and spatio-temporal cooperations exhibit a threshold behavior with respect to the normalized load (number of users per station, per slot). There exists a positive load $G^\star$, such that, below $G^\star$, the decoding probability is asymptotically the maximal possible, equal to the probability that a user is heard by at least one base station; with non-cooperative decoding and spatial cooperation, we show that $G^\star$ is zero. Finally, with spatio-temporal cooperation, we optimize the degree distribution according to which users transmit their packet replicas; the optimum is in general very different from the corresponding optimal distribution of the single-base station system.

preprint · 2014 · arXiv

Distributed Storage Allocations for Neighborhood-based Data Access

We introduce a neighborhood-based data access model for distributed coded storage allocation. Storage nodes are connected in a generic network and data is accessed locally: a user accesses a randomly chosen storage node, which subsequently queries its neighborhood to recover the data object. We aim at finding an optimal allocation that minimizes the overall storage budget while ensuring recovery with probability one. We show that the problem reduces to finding the fractional dominating set of the underlying network. Furthermore, we develop a fully distributed algorithm where each storage node communicates only with its neighborhood in order to find its optimal storage allocation. The proposed algorithm is based upon the recently proposed proximal center method, an efficient dual decomposition based on an accelerated dual gradient method. We show that our algorithm achieves a $(1+\varepsilon)$-approximation ratio in $O(d_{\mathrm{max}}^{3/2}/\varepsilon)$ iterations and per-node communications, where $d_{\mathrm{max}}$ is the maximal degree across nodes. Simulations demonstrate the effectiveness of the algorithm.
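For reference, the fractional dominating set problem is commonly stated as the linear program below. The notation is an assumption made here: $x_i$ is the amount stored at node $i$ (as a fraction of the data object) and $\bar{\mathcal{N}}_i$ is the closed neighborhood of node $i$, that is, the node together with its neighbors. Whether the paper uses exactly this normalization is not stated in the abstract.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^N}\quad & \sum_{i=1}^{N} x_i \\
\text{subject to}\quad & \sum_{j \in \bar{\mathcal{N}}_i} x_j \ge 1, \quad i = 1,\dots,N, \\
& x_i \ge 0, \quad i = 1,\dots,N.
\end{aligned}
```

Under this reading, the objective is the overall storage budget and each constraint says that the neighborhood queried from node $i$ holds enough to recover the data object.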

preprint · 2014 · arXiv

Slotted Aloha for Networked Base Stations

We study multiple base station, multi-access systems in which the user-base station adjacency is induced by geographical proximity. At each slot, each user transmits (is active) with a certain probability, independently of other users, and is heard by all base stations within the distance $r$. Both the users and base stations are placed uniformly at random over the (unit) area. We first consider a non-cooperative decoding where base stations work in isolation, but a user is decoded as soon as one of its nearby base stations reads a clean signal from it. We find the decoding probability and quantify the gains introduced by multiple base stations. Specifically, the peak throughput increases linearly with the number of base stations $m$ and is roughly $m/4$ larger than the throughput of a single-base station that uses standard slotted Aloha. Next, we propose a cooperative decoding, where the mutually close base stations inform each other whenever they decode a user inside their coverage overlap. At each base station, the messages received from the nearby stations help resolve collisions by the interference cancellation mechanism. Building from our exact formulas for the non-cooperative case, we provide a heuristic formula for the cooperative decoding probability that closely reflects the actual performance. Finally, we demonstrate by simulation significant gains of cooperation with respect to the non-cooperative decoding.
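The non-cooperative decoding rule described in this abstract lends itself to a short Monte Carlo sketch. The code below is an illustrative assumption, not the paper's simulator: a "clean signal" is modeled as exactly one active user within distance `r` of a base station, and the parameter values and function names are hypothetical.

```python
import math
import random

def decoded_users(users, active, stations, r):
    """Return the indices of users decoded under non-cooperative decoding."""
    decoded = set()
    for s in stations:
        heard = [i for i, u in enumerate(users)
                 if active[i] and math.hypot(u[0] - s[0], u[1] - s[1]) <= r]
        if len(heard) == 1:            # clean signal: exactly one active user in range
            decoded.add(heard[0])
    return decoded

def simulate_slot(n, m, r, p, rng):
    """Place n users and m stations uniformly on the unit square and decode one slot."""
    users = [(rng.random(), rng.random()) for _ in range(n)]
    stations = [(rng.random(), rng.random()) for _ in range(m)]
    active = [rng.random() < p for _ in range(n)]
    return len(decoded_users(users, active, stations, r))

rng = random.Random(0)
slots = 500
throughput = sum(simulate_slot(200, 20, 0.1, 0.05, rng) for _ in range(slots)) / slots
```

Here `throughput` estimates the average number of decoded users per slot across all base stations. Sweeping the activation probability `p` would trace out the throughput curve for a given number of stations.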

preprint · 2014 · arXiv

Slotted Aloha for Networked Base Stations with Spatial and Temporal Diversity

We consider framed slotted Aloha where $m$ base stations cooperate to decode messages from $n$ users. Users and base stations are placed uniformly at random over an area. At each frame, each user sends multiple replicas of its packet according to a prescribed distribution, and it is heard by all base stations within the communication radius $r$. Base stations employ a decoding algorithm that utilizes the successive interference cancellation mechanism, both in space (across neighboring base stations) and in time (across different slots, locally at each base station). We show that there exists a threshold on the normalized load $G=n/(\tau m)$, where $\tau$ is the number of slots per frame, below which the decoding probability converges asymptotically (as $n,m,\tau\rightarrow \infty$, $r\rightarrow 0$) to the maximal possible value, namely the probability that a user is heard by at least one base station, and we find a lower bound on the threshold. Further, we give a heuristic evaluation of the decoding probability based on the and-or-tree analysis. Finally, we show that the peak throughput increases linearly in the number of base stations.

preprint · 2012 · arXiv

Consensus and Products of Random Stochastic Matrices: Exact Rate for Convergence in Probability

Distributed consensus and other linear systems with system stochastic matrices $W_k$ emerge in various settings, like opinion formation in social networks, rendezvous of robots, and distributed inference in sensor networks. The matrices $W_k$ are often random, due to, e.g., random packet dropouts in wireless sensor networks. Key in analyzing the performance of such systems is studying convergence of matrix products $W_k W_{k-1} \cdots W_1$. In this paper, we find the exact exponential rate $I$ for the convergence in probability of the product of such matrices when time $k$ grows large, under the assumption that the $W_k$'s are symmetric and independent identically distributed in time. Further, for commonly used random models such as gossip and link failure, we show that the rate $I$ is found by solving a min-cut problem and, hence, is easily computable. Finally, we apply our results to optimally allocate the sensors' transmission power in consensus+innovations distributed detection.
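The gossip model mentioned in this abstract gives a simple instance of such matrix products: each $W_k$ averages the two endpoints of one randomly chosen link, so it is symmetric, stochastic, and independent across time. The sketch below applies these matrices to a state vector instead of forming the product explicitly. The path graph, the uniform link selection and the function name are assumptions made for illustration.

```python
import random

def gossip_consensus(x, edges, rounds=500, seed=0):
    """Apply random pairwise-averaging matrices W_k to the state vector x."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(rounds):
        i, j = rng.choice(edges)               # one random link is active in this round
        x[i] = x[j] = (x[i] + x[j]) / 2.0      # W_k averages the two endpoints
    return x

path = [(0, 1), (1, 2), (2, 3)]
state = gossip_consensus([4.0, 0.0, 8.0, 0.0], path)
```

Each averaging step preserves the sum of the entries, so when the states agree they agree on the initial average. How quickly the disagreement vanishes in probability is the quantity that the rate $I$ characterizes.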

preprint · 2012 · arXiv

Large Deviations Performance of Consensus+Innovations Distributed Detection with Non-Gaussian Observations

We establish the large deviations asymptotic performance (error exponent) of consensus+innovations distributed detection over random networks with generic (non-Gaussian) sensor observations. At each time instant, sensors 1) combine their decision variables with those of their neighbors (consensus) and 2) assimilate their new observations (innovations). This paper shows for general non-Gaussian distributions that consensus+innovations distributed detection exhibits a phase transition behavior with respect to the network degree of connectivity. Above a threshold, distributed is as good as centralized, with the same optimal asymptotic detection performance, but, below the threshold, distributed detection is suboptimal with respect to centralized detection. We determine this threshold and quantify the performance loss below threshold. Finally, we show the dependence of the threshold and performance on the distribution of the observations: distributed detectors over the same random network, but with different observations' distributions, for example, Gaussian, Laplace, or quantized, may have different asymptotic performance, even when the corresponding centralized detectors have the same asymptotic performance.

preprint · 2010 · arXiv

Distributed Detection over Random Networks: Large Deviations Analysis

We show by large deviations theory that the performance of running consensus is asymptotically equivalent to the performance of the (asymptotically) optimal centralized detector. Running consensus is a recently proposed stochastic approximation type algorithm for distributed detection in sensor networks. At each time step, the state at each sensor is updated by a local averaging of its own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations, and we allow for the underlying network to be randomly varying. This paper shows through large deviations that the Bayes probability of detection error, for the distributed detector, decays at the best achievable rate, namely, the Chernoff information rate. Numerical examples illustrate the behavior of the distributed detector for a finite number of observations.

preprint · 2010 · arXiv

Distributed Detection over Random Networks: Large Deviations Performance Analysis

We study the large deviations performance, i.e., the exponential decay rate of the error probability, of distributed detection algorithms over random networks. At each time step $k$ each sensor: 1) averages its decision variable with the neighbors' decision variables; and 2) accounts on-the-fly for its new observation. We show that distributed detection exhibits a "phase change" behavior. When the rate of network information flow (the speed of averaging) is above a threshold, then distributed detection is asymptotically equivalent to the optimal centralized detection, i.e., the exponential decay rate of the error probability for distributed detection equals the Chernoff information. When the rate of information flow is below a threshold, distributed detection achieves only a fraction of the Chernoff information rate; we quantify this achievable rate as a function of the network rate of information flow. Simulation examples demonstrate our theoretical findings on the behavior of distributed detection over random networks.
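The two-part update in this abstract (neighbor averaging plus on-the-fly assimilation of the new observation) can be sketched as follows. The specific recursion, the `1/k` weighting, the fixed weight matrix `W` and the zero decision threshold are assumptions made for the example; the paper considers random networks, where the weights would change from step to step.

```python
def distributed_detection(llrs, W):
    """Run an averaging-plus-innovation recursion on per-sensor decision variables.

    llrs[k][j]: log-likelihood ratio of sensor j's observation at step k + 1.
    W: row-stochastic averaging weights, with W[i][j] > 0 only when j is i or a
       neighbor of i (kept fixed here for simplicity).
    """
    n = len(W)
    x = [0.0] * n
    for k, llr in enumerate(llrs, start=1):
        # 2) account for the new observation with a 1/k running-average weight
        mixed = [(k - 1) / k * x[j] + llr[j] / k for j in range(n)]
        # 1) average with the neighbors' decision variables
        x = [sum(W[i][j] * mixed[j] for j in range(n)) for i in range(n)]
    return x   # sensor i would decide H1 when x[i] > 0 (threshold is an assumption)

W_full = [[1.0 / 3.0] * 3 for _ in range(3)]
decision_vars = distributed_detection([[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]], W_full)
```

With `W_full`, every sensor averages over all sensors, so each decision variable equals the running average of all log-likelihood ratios, which is the statistic a centralized detector would use. Sparser or randomly failing networks slow down this averaging, which corresponds to the regime in which the abstract's threshold on the rate of information flow matters.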

preprint · 2010 · arXiv

Distributed Detection over Time Varying Networks: Large Deviations Analysis

We apply large deviations theory to study the asymptotic performance of running consensus distributed detection in sensor networks. Running consensus is a recently proposed stochastic approximation type algorithm. At each time step $k$, the state at each sensor is updated by a local averaging of the sensor's own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations. We allow the underlying network to be time varying, provided that the graph that collects the union of links that are online at least once over a finite time window is connected. This paper shows through large deviations that, under stated assumptions on the network connectivity and sensors' observations, the running consensus detection asymptotically approaches in performance the optimal centralized detection. That is, the Bayes probability of detection error (with the running consensus detector) decays exponentially to zero as $k$ goes to infinity at the Chernoff information rate, the best achievable rate of the asymptotically optimal centralized detector.

preprint · 2010 · arXiv

Sensor Selection for Event Detection in Wireless Sensor Networks

We consider the problem of sensor selection for event detection in wireless sensor networks (WSNs). We want to choose a subset of $p$ out of $n$ sensors that yields the best detection performance. As the sensor selection optimality criteria, we propose the Kullback-Leibler and Chernoff distances between the distributions of the selected measurements under the two hypotheses. We formulate the maxmin robust sensor selection problem to cope with the uncertainties in distribution means. We prove that the sensor selection problem is NP-hard, for both Kullback-Leibler and Chernoff criteria. To (sub)optimally solve the sensor selection problem, we propose an algorithm of affordable complexity. Extensive numerical simulations on moderate size problem instances (when the optimum by exhaustive search is feasible to compute) demonstrate the algorithm's near optimality in a very large portion of problem instances. For larger problems, extensive simulations demonstrate that our algorithm outperforms random searches, once an upper bound on computational time is set. We corroborate numerically the validity of the Kullback-Leibler and Chernoff sensor selection criteria, by showing that they lead to sensor selections nearly optimal both in the Neyman-Pearson and Bayes sense.
