Source author record

Usman A. Khan

Usman A. Khan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: math.OC; Multiagent Systems; Information Theory; math.IT; Machine Learning; Systems and Control; Distributed, Parallel, and Cluster Computing; eess.SY; Social and Information Networks; Robotics

Catalog footprint: 22 works, 10 topics, 4 close collaborators

preprint, 2026, arXiv

Optimal and Scalable MAPF via Multi-Marginal Optimal Transport and Schrödinger Bridges

We consider anonymous multi-agent path finding (MAPF) where a set of robots is tasked to travel to a set of targets on a finite, connected graph. We show that MAPF can be cast as a special class of multi-marginal optimal transport (MMOT) problems with an underlying Markovian structure, under which the exponentially large MMOT collapses to a linear program (LP) polynomial in size. Focusing on the anonymous setting, we establish conditions under which the corresponding LP is feasible, totally unimodular, and consequently, yields min-cost, integral $(\{0,1\})$ transports that do not overlap in both space and time. To adapt the approach to large-scale problems, we cast the MAPF-MMOT in a probabilistic framework via Schrödinger bridges. Under standard assumptions, we show that the Schrödinger bridge formulation reduces to an entropic regularization of the corresponding MMOT that admits an iterative Sinkhorn-type solution. The Schrödinger bridge, being a probabilistic framework, provides a shadow (fractional) transport that we use as a template to solve a reduced LP and demonstrate that it results in near-optimal, integral transports at a significant reduction in complexity. Extensive experiments highlight the optimality and scalability of the proposed approaches.
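
The Sinkhorn-type iteration mentioned in the abstract can be illustrated on the classical two-marginal case. The sketch below is not the paper's multi-marginal MAPF formulation; it is the standard entropic optimal transport scaling loop, and the toy cost matrix, the marginals `mu` and `nu`, and the regularization weight `eps` are all assumptions chosen for illustration.

```python
import math

def sinkhorn(cost, mu, nu, eps=0.1, iters=500):
    """Two-marginal entropic optimal transport via Sinkhorn scaling."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps); smaller eps means less smoothing.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy instance: three start cells and three target cells on a line,
# with cost equal to the distance between cells.
cost = [[abs(i - j) for j in range(3)] for i in range(3)]
mu = [1 / 3] * 3   # mass at the start cells
nu = [1 / 3] * 3   # mass at the target cells
plan = sinkhorn(cost, mu, nu)
for row in plan:
    print([round(p, 4) for p in row])
```

The resulting plan is fractional: most of the mass sits on the zero-cost diagonal, with a small amount spread to neighbouring cells by the entropic term. In the abstract's terms, a fractional plan of this kind is what serves as a template for the reduced LP.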

preprint, 2022, arXiv

Distributed Constraint-Coupled Optimization over Lossy Networks

This paper considers distributed resource allocation and sum-preserving constrained optimization over lossy networks, where the links are unreliable and subject to packet drops. We define the conditions to ensure convergence under packet drops and link removal by focusing on two main properties of our allocation algorithm: (i) The weight-stochastic condition in typical consensus schemes is reduced to balanced weights, with no need for readjusting the weights to satisfy stochasticity. (ii) The algorithm does not require all-time connectivity but instead uniform connectivity over some non-overlapping finite time intervals. First, we prove that our algorithm provides primal-feasible allocation at every iteration step and converges under the conditions (i)-(ii) and some other mild conditions on the nonlinear iterative dynamics. These nonlinearities address possible practical constraints in real applications due to, for example, saturation or quantization among others. Then, using (i)-(ii) and the notion of bond-percolation theory, we relate the packet drop rate and the network percolation threshold to the (finite) number of iterations ensuring uniform connectivity and, thus, convergence towards the optimum value.
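
The sum-preserving property in (i) can be seen in a small simulation. The sketch below is not the paper's algorithm (which also covers nonlinear dynamics such as saturation and quantization); it is a plain gradient-exchange update with symmetric unit weights, assuming that a dropped link is lost in both directions in a given round. The quadratic costs, the ring topology, the step size `eta`, and the drop rate `drop` are illustrative assumptions.

```python
import random

def allocate(a, c, x0, edges, eta=0.05, drop=0.3, iters=20000, seed=0):
    """Sum-preserving allocation for costs f_i(x) = a_i/2 * (x - c_i)^2."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        grad = [a[i] * (x[i] - c[i]) for i in range(len(x))]
        step = [0.0] * len(x)
        for i, j in edges:
            if rng.random() < drop:
                continue  # assumption: the link is lost in both directions
            # Whatever node i gives up, node j receives, so sum(x) is unchanged.
            flow = eta * (grad[i] - grad[j])
            step[i] -= flow
            step[j] += flow
        x = [xi + si for xi, si in zip(x, step)]
    return x

# Hypothetical instance: four nodes on a ring sharing a total resource of 5.
a = [1.0, 2.0, 3.0, 4.0]
c = [0.0, 1.0, 2.0, 3.0]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
x0 = [5.0, 0.0, 0.0, 0.0]
x = allocate(a, c, x0, ring)
marginal_costs = [a[i] * (x[i] - c[i]) for i in range(4)]
print(round(sum(x), 9), [round(g, 6) for g in marginal_costs])
```

Because each exchange is antisymmetric across a link, the allocation stays feasible at every iteration whether or not links drop. At the optimum all marginal costs agree, which for this instance works out to -0.48 at every node.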

preprint, 2022, arXiv

Distributed saddle point problems for strongly concave-convex functions

In this paper, we propose GT-GDA, a distributed optimization method to solve saddle point problems of the form: $\min_{\mathbf{x}} \max_{\mathbf{y}} \{F(\mathbf{x},\mathbf{y}) :=G(\mathbf{x}) + \langle \mathbf{y}, \overline{P} \mathbf{x} \rangle - H(\mathbf{y})\}$, where the functions $G(\cdot)$, $H(\cdot)$, and the coupling matrix $\overline{P}$ are distributed over a strongly connected network of nodes. GT-GDA is a first-order method that uses gradient tracking to eliminate the dissimilarity caused by heterogeneous data distribution among the nodes. In the most general form, GT-GDA includes a consensus over the local coupling matrices to achieve the optimal (unique) saddle point, however, at the expense of increased communication. To avoid this, we propose a more efficient variant GT-GDA-Lite that does not incur the additional communication and analyze its convergence in various scenarios. We show that GT-GDA converges linearly to the unique saddle point solution when $G(\cdot)$ is smooth and convex, $H(\cdot)$ is smooth and strongly convex, and the global coupling matrix $\overline{P}$ has full column rank. We further characterize the regime under which GT-GDA exhibits a network topology-independent convergence behavior. We next show the linear convergence of GT-GDA to an error around the unique saddle point, which goes to zero when the coupling cost ${\langle \mathbf y, \overline{P} \mathbf x \rangle}$ is common to all nodes, or when $G(\cdot)$ and $H(\cdot)$ are quadratic. Numerical experiments illustrate the convergence properties and importance of GT-GDA and GT-GDA-Lite for several applications.

preprint, 2022, arXiv

Variance reduced stochastic optimization over directed graphs with row and column stochastic weights

This paper proposes AB-SAGA, a first-order distributed stochastic optimization method to minimize a finite sum of smooth and strongly convex functions distributed over an arbitrary directed graph. AB-SAGA removes the uncertainty caused by the stochastic gradients using a node-level variance reduction and subsequently employs network-level gradient tracking to address the data dissimilarity across the nodes. Unlike existing methods that use the nonlinear push-sum correction to cancel the imbalance caused by the directed communication, the consensus updates in AB-SAGA are linear and use both row- and column-stochastic weights. We show that for a constant step-size, AB-SAGA converges linearly to the global optimum. We quantify the directed nature of the underlying graph using an explicit directivity constant and characterize the regimes in which AB-SAGA achieves a linear speed-up over its centralized counterpart. Numerical experiments illustrate the convergence of AB-SAGA for strongly convex and nonconvex problems.

preprint, 2020, arXiv

A general framework for decentralized optimization with first-order methods

Decentralized optimization to minimize a finite sum of functions over a network of nodes has been a significant focus within control and signal processing research due to its natural relevance to optimal control and signal estimation problems. More recently, the emergence of sophisticated computing and large-scale data science needs have led to a resurgence of activity in this area. In this article, we discuss decentralized first-order gradient methods, which have found tremendous success in control, signal processing, and machine learning problems, where such methods, due to their simplicity, serve as the first method of choice for many complex inference and training tasks. In particular, we provide a general framework of decentralized first-order methods that is applicable to undirected and directed communication networks alike, and show that much of the existing work on optimization and consensus can be related explicitly to this framework. We further extend the discussion to decentralized stochastic first-order methods that rely on stochastic gradients at each node and describe how local variance reduction schemes, previously shown to have promise in the centralized settings, are able to improve the performance of decentralized methods when combined with what is known as gradient tracking. We motivate and demonstrate the effectiveness of the corresponding methods in the context of machine learning and signal processing problems that arise in decentralized environments.

preprint, 2020, arXiv

Gradient tracking and variance reduction for decentralized optimization and machine learning

Decentralized methods to solve finite-sum minimization problems are important in many signal processing and machine learning tasks where the data is distributed over a network of nodes and raw data sharing is not permitted due to privacy and/or resource constraints. In this article, we review decentralized stochastic first-order methods and provide a unified algorithmic framework that combines variance-reduction with gradient tracking to achieve both robust performance and fast convergence. We provide explicit theoretical guarantees of the corresponding methods when the objective functions are smooth and strongly-convex, and show their applicability to non-convex problems via numerical experiments. Throughout the article, we provide intuitive illustrations of the main technical ideas by casting appropriate tradeoffs and comparisons among the methods of interest and by highlighting applications to decentralized training of machine learning models.
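
Gradient tracking, the building block named in the abstract, can be sketched in its basic deterministic form. The example below omits variance reduction and stochastic gradients, so it is not the unified framework of the article; the scalar quadratic costs, the four-node ring with doubly-stochastic weights `W`, and the step size `alpha` are assumptions made for illustration.

```python
def gradient_tracking(a, b, W, alpha=0.02, iters=5000):
    """Minimize sum_i a_i/2 * (x - b_i)^2 with each term held by one node."""
    n = len(a)

    def grad(i, xi):
        return a[i] * (xi - b[i])

    x = [0.0] * n
    y = [grad(i, x[i]) for i in range(n)]  # trackers start at the local gradients
    for _ in range(iters):
        # Consensus step on the estimates, then descend along the tracked gradient.
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i]
                 for i in range(n)]
        # The tracker mixes with neighbours and adds the change in local gradient,
        # so its network average always equals the average of the local gradients.
        y = [sum(W[i][j] * y[j] for j in range(n))
             + grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)]
        x = x_new
    return x

# Hypothetical instance: four nodes on an undirected ring.
W = [[0.50, 0.25, 0.00, 0.25],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.25, 0.00, 0.25, 0.50]]
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 4.0]
x = gradient_tracking(a, b, W)
print([round(v, 6) for v in x])
```

The global minimizer here is sum(a_i * b_i) / sum(a_i) = 3, and every node reaches it with a constant step size even though no node's local minimizer equals 3. That is the data-dissimilarity effect the tracking variable is meant to remove.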

preprint, 2020, arXiv

S-ADDOPT: Decentralized stochastic first-order optimization over directed graphs

In this report, we study decentralized stochastic optimization to minimize a sum of smooth and strongly convex cost functions when the functions are distributed over a directed network of nodes. In contrast to the existing work, we use gradient tracking to improve certain aspects of the resulting algorithm. In particular, we propose the S-ADDOPT algorithm that assumes a stochastic first-order oracle at each node and show that for a constant step-size $\alpha$, each node converges linearly inside an error ball around the optimal solution, the size of which is controlled by $\alpha$. For decaying step-sizes $\mathcal{O}(1/k)$, we show that S-ADDOPT reaches the exact solution sublinearly at $\mathcal{O}(1/k)$ and its convergence is asymptotically network-independent. Thus the asymptotic behavior of S-ADDOPT is comparable to the centralized stochastic gradient descent. Numerical experiments over both strongly convex and non-convex problems illustrate the convergence behavior and the performance comparison of the proposed algorithm.

preprint, 2019, arXiv

Cyber-Social Systems: Modeling, Inference, and Optimal Design

This paper models the cyber-social system as a cyber-network of agents monitoring states of individuals in a social network. The state of each individual is represented by a social node and the interactions among individuals are represented by social links. In the cyber-network each node represents an agent and the links represent information sharing among agents. Agents make observations of social states and perform distributed inference. In this direction, the contribution of this work is threefold: (i) A novel distributed inference protocol is proposed that makes no assumption on the rank of the underlying social system. This is significant as most protocols in the literature only work on full-rank systems. (ii) A novel agent classification is developed, where it is shown that the connectivity requirement on the cyber-network differs for each type. This is particularly important in finding the minimal number of observations and minimal connectivity of the cyber-network as the next contribution. (iii) The cost-optimal design of the cyber-network constrained by distributed observability is addressed. This problem is subdivided into sensing cost optimization and networking cost optimization where both are claimed to be NP-hard. We solve both problems for certain types of social networks and find polynomial-order solutions.

preprint, 2016, arXiv

Distributed Subgradient Projection Algorithm over Directed Graphs

We propose a distributed algorithm, termed the Directed-Distributed Projected Subgradient (D-DPS), to solve a constrained optimization problem over a multi-agent network, where the goal of agents is to collectively minimize the sum of locally known convex functions. Each agent in the network owns only its local objective function, constrained to a commonly known convex set. We focus on the circumstance when communications between agents are described by a directed network. The D-DPS augments an additional variable for each agent, to overcome the asymmetry caused by the directed communication network. The convergence analysis shows that D-DPS converges at a rate of $O(\frac{\ln k}{\sqrt{k}})$, where k is the number of iterations.

preprint, 2016, arXiv

On the Distributed Optimization over Directed Networks

In this paper, we propose a distributed algorithm, called Directed-Distributed Gradient Descent (D-DGD), to solve multi-agent optimization problems over directed graphs. Existing algorithms mostly deal with similar problems under the assumption of undirected networks, i.e., requiring the weight matrices to be doubly-stochastic. The row-stochasticity of the weight matrix guarantees that all agents reach consensus, while the column-stochasticity ensures that each agent's local gradient contributes equally to the global objective. In a directed graph, however, it may not be possible to construct a doubly-stochastic weight matrix in a distributed manner. We overcome this difficulty by augmenting an additional variable for each agent to record the change in the state evolution. In each iteration, the algorithm simultaneously constructs a row-stochastic matrix and a column-stochastic matrix instead of only a doubly-stochastic matrix. The convergence of the new weight matrix, depending on the row-stochastic and column-stochastic matrices, ensures that the agents reach both consensus and optimality. The analysis shows that the proposed algorithm converges at a rate of $O(\frac{\ln k}{\sqrt{k}})$, where $k$ is the number of iterations.
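
The separate roles of row- and column-stochastic weights can be checked numerically on a small digraph. The sketch below is not the D-DGD construction; it only iterates the two kinds of matrices on their own, and then applies the well-known push-sum ratio as one way to recover the average. The three-node graph (edges 0→1, 1→2, 2→0, 0→2) and its uniform weights are assumptions made for illustration.

```python
def iterate(M, x, steps=200):
    """Apply x <- M x repeatedly."""
    for _ in range(steps):
        x = [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x

# Row-stochastic A: each node averages itself and its in-neighbours.
A = [[1/2, 0, 1/2],
     [1/2, 1/2, 0],
     [1/3, 1/3, 1/3]]
# Column-stochastic B: each node splits its value over itself and its out-neighbours.
B = [[1/3, 0, 1/2],
     [1/3, 1/2, 0],
     [1/3, 1/2, 1/2]]

x0 = [9.0, 0.0, 0.0]                   # true average is 3
row_limit = iterate(A, x0)             # consensus, but on a weighted average
col_limit = iterate(B, x0)             # sum preserved, but no consensus
scale = iterate(B, [1.0, 1.0, 1.0])    # push-sum normalization run
ratio = [z / w for z, w in zip(col_limit, scale)]

print([round(v, 6) for v in row_limit])
print([round(v, 6) for v in col_limit])
print([round(v, 6) for v in ratio])
```

With these weights the row-stochastic run agrees on 4 rather than the average 3, the column-stochastic run settles at (3, 2, 4), which keeps the sum 9 but is not a consensus, and the ratio of the two column-stochastic runs equals 3 at every node.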

preprint, 2016, arXiv

On the Linear Convergence of Distributed Optimization over Directed Graphs

This paper develops a fast distributed algorithm, termed DEXTRA, to solve the optimization problem when $n$ agents reach agreement and collaboratively minimize the sum of their local objective functions over the network, where the communication between the agents is described by a directed graph. Existing algorithms solve the problem restricted to directed graphs with convergence rates of $O(\ln k/\sqrt{k})$ for general convex objective functions and $O(\ln k/k)$ when the objective functions are strongly-convex, where $k$ is the number of iterations. We show that, with the appropriate step-size, DEXTRA converges at a linear rate $O(\tau^{k})$ for $0<\tau<1$, given that the objective functions are restricted strongly-convex. The implementation of DEXTRA requires each agent to know its local out-degree. Simulation examples further illustrate our findings.

preprint, 2015, arXiv

Distributed Mirror Descent over Directed Graphs

In this paper, we propose the Distributed Mirror Descent (DMD) algorithm for constrained convex optimization problems on a (strongly-)connected multi-agent network. We assume that each agent has a private objective function and a constraint set. The proposed DMD algorithm employs a locally designed Bregman distance function at each agent, and thus can be viewed as a generalization of the well-known Distributed Projected Subgradient (DPS) methods, which use identical Euclidean distances at the agents. At each iteration of the DMD, each agent optimizes its own objective adjusted with the Bregman distance function while exchanging state information with its neighbors. To further generalize DMD, we consider the case where the agent communication follows a directed graph and it may not be possible to design doubly-stochastic weight matrices. In other words, we restrict the corresponding weight matrices to be row-stochastic instead of doubly-stochastic. We study the convergence of DMD in two cases: (i) when the constraint sets at the agents are the same; and (ii) when the constraint sets at the agents are different. By partially following the spirit of our proof, it can be shown that a class of consensus-based distributed optimization algorithms, restricted to doubly-stochastic matrices, remain convergent with stochastic matrices.

preprint, 2014, arXiv

Asymptotic stability of stochastic LTV systems with applications to distributed dynamic fusion

In this paper, we investigate asymptotic stability of linear time-varying systems with (sub-) stochastic system matrices. Motivated by distributed dynamic fusion over networks of mobile agents, we impose some mild regularity conditions on the elements of time-varying system matrices. We provide sufficient conditions under which the asymptotic stability of the LTV system can be guaranteed. By introducing the notion of slices, as non-overlapping partitions of the sequence of system matrices, we obtain stability conditions in terms of the slice lengths and some network parameters. In addition, we apply the LTV stability results to the distributed leader-follower algorithm, and show the corresponding convergence and steady-state. An illustrative example is also included to validate the effectiveness of our approach.

preprint, 2014, arXiv

Graph-theoretic distributed inference in social networks

We consider distributed inference in social networks where a phenomenon of interest evolves over a given social interaction graph, referred to as the social digraph. For inference, we assume that a network of agents monitors certain nodes in the social digraph and no agent may be able to perform inference within its neighborhood; the agents must rely on inter-agent communication. The key contributions of this paper include: (i) a novel construction of the distributed estimator and distributed observability from the first principles; (ii) a graph-theoretic agent classification that establishes the importance and role of each agent towards inference; (iii) characterizing the necessary conditions, based on the classification in (ii), on the agent network to achieve distributed observability. Our results are based on structured systems theory and are applicable to any parameter choice of the underlying system matrix as long as the social digraph remains fixed. In other words, any social phenomenon that evolves (linearly) over a structure-invariant social digraph may be considered; we refer to such systems as Linear Structure-Invariant (LSI). The aforementioned contributions, (i)-(iii), thus only require the knowledge of the social digraph (topology) and are independent of the social phenomena. We show the applicability of the results to several real-world social networks, i.e., social influence among monks, networks of political blogs and books, and a co-authorship graph.

preprint, 2014, arXiv

Measurement partitioning and observational equivalence in state estimation

This letter studies measurement partitioning and equivalence in state estimation based on graph-theoretic principles. We show that a set of critical measurements (required to ensure LTI state-space observability) can be further partitioned into two types: $\alpha$ and $\beta$. This partitioning is driven by different graphical (or algebraic) methods used to define the corresponding measurements. Subsequently, we describe observational equivalence, i.e., given an $\alpha$ (or $\beta$) measurement, say $y_i$, what is the set of measurements equivalent to $y_i$, such that only one measurement in this set is required to ensure observability? Since $\alpha$ and $\beta$ measurements are cast using different algebraic and graphical characteristics, their equivalence sets are also derived using different algebraic and graph-theoretic principles. We illustrate the related concepts on an appropriate system digraph.

preprint, 2013, arXiv

Consensus in the presence of interference

This paper studies distributed strategies for average-consensus of arbitrary vectors in the presence of network interference. We assume that the underlying communication on any link suffers from additive interference caused by the communication of other agents following their own consensus protocol. Additionally, no agent knows how many or which agents are interfering with its communication. Clearly, the standard consensus protocol does not remain applicable in such scenarios. In this paper, we cast an algebraic structure over the interference and show that the standard protocol can be modified such that the average is reachable in a subspace whose dimension is complementary to the maximal dimension of the interference subspaces (over all of the communication links). To develop the results, we use information alignment to align the intended transmission (over each link) to the null-space of the interference (on that link). We show that this alignment is indeed invertible, i.e., the intended transmission can be recovered, over which, subsequently, the consensus protocol is implemented. That local protocols exist even when the collection of the interference subspaces spans the entire vector space is somewhat surprising.

preprint, 2012, arXiv

On the genericity properties in networked estimation: Topology design and sensor placement

In this paper, we consider networked estimation of linear, discrete-time dynamical systems monitored by a network of agents. In order to minimize the power requirement at the (possibly battery-operated) agents, we require that the agents can exchange information with their neighbors only once per dynamical system time-step, in contrast to consensus-based estimation where the agents exchange information until they reach a consensus. It can be verified that with this restriction on information exchange, measurement fusion alone results in an unbounded estimation error at every such agent that does not have an observable set of measurements in its neighborhood. To overcome this challenge, state-estimate fusion has been proposed to recover the system observability. However, we show that adding state-estimate fusion may not recover observability when the system matrix is structured-rank ($S$-rank) deficient. In this context, we characterize the state-estimate fusion and measurement fusion under both full $S$-rank and $S$-rank deficient system matrices.

preprint, 2011, arXiv

Networked estimation under information constraints

In this paper, we study estimation of potentially unstable linear dynamical systems when the observations are distributed over a network. We are interested in scenarios when the information exchange among the agents is restricted. In particular, we consider that each agent can exchange information with its neighbors only once per dynamical system evolution-step. Existing work with similar information constraints is restricted to static parameter estimation, whereas the work on dynamical systems assumes a large number of information exchange iterations between every two consecutive system evolution steps. We show that when the agent communication network is sparsely connected, the sparsity of the network plays a key role in the stability and performance of the underlying estimation algorithm. To this end, we introduce the notion of Network Tracing Capacity (NTC), which is defined as the largest two-norm of the system matrix that can be estimated with bounded error. Extending this to fully-connected networks or infinite information exchanges (per dynamical system evolution-step), we note that the NTC is infinite, i.e., any dynamical system can be estimated with bounded error. In short, the NTC characterizes the estimation capability of a sparse network by relating it to the evolution of the underlying dynamical system.

preprint, 2009, arXiv

DILAND: An Algorithm for Distributed Sensor Localization with Noisy Distance Measurements

In this correspondence, we present an algorithm for distributed sensor localization with noisy distance measurements (DILAND) that extends the DLRE and makes it more robust. DLRE is a distributed sensor localization algorithm in $\mathbb{R}^m$ $(m\geq1)$ introduced in \cite{usman_loctsp:08}. DILAND operates when (i) the communication among the sensors is noisy; (ii) the communication links in the network may fail with a non-zero probability; and (iii) the measurements performed to compute distances among the sensors are corrupted with noise. The sensors (which do not know their locations) lie in the convex hull of at least $m+1$ anchors (nodes that know their own locations). Under minimal assumptions on the connectivity and triangulation of each sensor in the network, this correspondence shows that, under the broad random phenomena described above, DILAND converges almost surely (a.s.) to the exact sensor locations.

preprint, 2009, arXiv

Higher Dimensional Consensus: Learning in Large-Scale Networks

The paper presents higher dimensional consensus (HDC) for large-scale networks. HDC generalizes the well-known average-consensus algorithm. It divides the nodes of the large-scale network into anchors and sensors. Anchors are nodes whose states are fixed over the HDC iterations, whereas sensors are nodes that update their states as a linear combination of the neighboring states. Under appropriate conditions, we show that the sensor states converge to a linear combination of the anchor states. Through the concept of anchors, HDC captures in a unified framework several interesting network tasks, including distributed sensor localization, leader-follower, distributed Jacobi to solve linear systems of algebraic equations, and, of course, average-consensus. In many network applications, it is of interest to learn the weights of the distributed linear algorithm so that the sensors converge to a desired state. We term this inverse problem the HDC learning problem. We pose learning in HDC as a constrained non-convex optimization problem, which we cast in the framework of multi-objective optimization (MOP) and to which we apply Pareto optimality. We prove analytically relevant properties of the MOP solutions and of the Pareto front from which we derive the solution to learning in HDC. Finally, the paper shows how the MOP approach resolves interesting tradeoffs (speed of convergence versus quality of the final state) arising in learning in HDC in resource constrained networks.
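
The anchor and sensor roles can be illustrated with a minimal forward iteration. The sketch below covers only the update rule, not the HDC learning problem; the path graph, the node names, and the equal averaging weights are assumptions chosen so that the limit is easy to verify by hand.

```python
def hdc(anchors, sensors, weights, iters=200):
    """Anchors keep their state; sensors take a weighted sum of neighbour states."""
    state = dict(anchors)
    state.update(sensors)
    for _ in range(iters):
        # Compute every sensor update from the previous states, then apply together.
        updated = {s: sum(w * state[n] for n, w in weights[s].items())
                   for s in sensors}
        state.update(updated)  # anchor entries are never overwritten
    return {s: state[s] for s in sensors}

# Hypothetical path graph: a_left - s1 - s2 - s3 - a_right.
anchors = {"a_left": 0.0, "a_right": 4.0}
sensors = {"s1": 10.0, "s2": -3.0, "s3": 7.0}  # arbitrary initial states
weights = {
    "s1": {"a_left": 0.5, "s2": 0.5},
    "s2": {"s1": 0.5, "s3": 0.5},
    "s3": {"s2": 0.5, "a_right": 0.5},
}
final = hdc(anchors, sensors, weights)
print({s: round(v, 6) for s, v in final.items()})
```

Whatever the initial sensor states, the sensors settle at 1, 2 and 3, which are fixed linear combinations of the two anchor states (0 and 4). The same loop is a Jacobi iteration for the linear system that defines the fixed point.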

preprint, 2008, arXiv

Distributed Sensor Localization in Random Environments using Minimal Number of Anchor Nodes

The paper develops DILOC, a distributive, iterative algorithm that locates $M$ sensors in $\mathbb{R}^m$, $m\geq 1$, with respect to a minimal number of $m+1$ anchors with known locations. The sensors exchange data with their neighbors only; no centralized data processing or communication occurs, nor is there centralized knowledge about the sensors' locations. DILOC uses the barycentric coordinates of a sensor with respect to its neighbors that are computed using the Cayley-Menger determinants. These are the determinants of matrices of inter-sensor distances. We show convergence of DILOC by associating with it an absorbing Markov chain whose absorbing states are the anchors. We introduce a stochastic approximation version extending DILOC to random environments when the knowledge about the intercommunications among sensors and the inter-sensor distances are noisy, and the communication links among neighbors fail at random times. We show a.s. convergence of the modified DILOC and characterize the error between the final estimates and the true values of the sensors' locations. Numerical studies illustrate DILOC under a variety of deterministic and random operating conditions.
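
The barycentric update described above can be sketched in the plane ($m=2$) with noiseless distances. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the node layout `true_pos`, the choice of neighbour triangles in `triangulation`, and the helper names are hypothetical, and the true sensor positions are used only to simulate the distance measurements.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-15:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def area(d01, d02, d12):
    """Triangle area from its side lengths via the Cayley-Menger determinant."""
    cm = [[0, 1, 1, 1],
          [1, 0, d01**2, d02**2],
          [1, d01**2, 0, d12**2],
          [1, d02**2, d12**2, 0]]
    return math.sqrt(max(0.0, -det(cm) / 16.0))

def barycentric(dist, s, nbrs):
    """Weights of sensor s w.r.t. neighbours (i, j, k), using distances only."""
    i, j, k = nbrs
    total = area(dist(i, j), dist(i, k), dist(j, k))
    return [area(dist(s, j), dist(s, k), dist(j, k)) / total,   # weight of i
            area(dist(s, i), dist(s, k), dist(i, k)) / total,   # weight of j
            area(dist(s, i), dist(s, j), dist(i, j)) / total]   # weight of k

# Hypothetical layout: three anchors (a*) and two sensors (s*).
true_pos = {"a0": (0.0, 0.0), "a1": (4.0, 0.0), "a2": (0.0, 4.0),
            "s1": (1.0, 1.0), "s2": (2.0, 1.0)}

def dist(p, q):
    return math.dist(true_pos[p], true_pos[q])  # simulated measurement

# Each sensor lies inside the triangle formed by its three chosen neighbours.
triangulation = {"s1": ("a0", "a2", "s2"), "s2": ("s1", "a1", "a2")}
weights = {s: barycentric(dist, s, nb) for s, nb in triangulation.items()}

est = {n: true_pos[n] for n in ("a0", "a1", "a2")}   # anchors know their location
est.update({"s1": (0.0, 0.0), "s2": (0.0, 0.0)})     # arbitrary initial guesses
for _ in range(100):
    new = dict(est)
    for s, nb in triangulation.items():
        new[s] = tuple(sum(w * est[n][c] for w, n in zip(weights[s], nb))
                       for c in range(2))
    est = new
print({s: tuple(round(v, 6) for v in est[s]) for s in triangulation})
```

In this layout sensor s1 gets weights (0.375, 0.125, 0.5) on (a0, a2, s2), and both estimates converge to the true positions (1, 1) and (2, 1). The anchors act as the absorbing states of the underlying Markov chain, in line with the convergence argument in the abstract.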

preprint, 2008, arXiv

Distributing the Kalman Filter for Large-Scale Systems

This paper derives a distributed Kalman filter to estimate a sparsely connected, large-scale, $n$-dimensional dynamical system monitored by a network of $N$ sensors. Local Kalman filters are implemented on the $n_l$-dimensional sub-systems (where $n_l \ll n$) that are obtained by spatially decomposing the large-scale system. The resulting sub-systems overlap; this overlap, together with an assimilation procedure on the local Kalman filters, preserves an $L$th-order Gauss-Markovian structure of the centralized error processes. The information loss due to the $L$th-order Gauss-Markovian approximation is controllable, as it can be characterized by a divergence that decreases as $L$ increases. The order of the approximation, $L$, leads to a lower bound on the dimension of the sub-systems, hence providing a criterion for sub-system selection. The assimilation procedure is carried out on the local error covariances with a distributed iterate collapse inversion (DICI) algorithm that we introduce. The DICI algorithm computes the (approximated) centralized Riccati and Lyapunov equations iteratively with only local communication and low-order computation. We fuse the observations that are common among the local Kalman filters using bipartite fusion graphs and consensus averaging algorithms. The proposed algorithm achieves full distribution of the Kalman filter that is coherent with the centralized Kalman filter with an $L$th-order Gauss-Markovian structure on the centralized error processes. Nowhere is storage, communication, or computation of $n$-dimensional vectors and matrices needed; only $n_l \ll n$ dimensional vectors and matrices are communicated or used in the computation at the sensors.
