Researcher profile

Soummya Kar

Soummya Kar contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
15 works
0 followers
9 topics
4 close collaborators

Published work

15 published items

preprint, 2022, arXiv

Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima

In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points and converges to local minima in nonconvex problems. However, similar guarantees are lacking for distributed first-order algorithms. The paper studies distributed stochastic gradient descent (D-SGD), a simple network-based implementation of SGD. Conditions under which D-SGD avoids saddle points and converges to local minima are studied. First, we consider the problem of computing critical points. Assuming loss functions are nonconvex and possibly nonsmooth, it is shown that, for each fixed initialization, D-SGD converges to critical points of the loss with probability one. Next, we consider the problem of avoiding saddle points. In this case, we again assume that loss functions may be nonconvex and nonsmooth, but are smooth in a neighborhood of a saddle point. It is shown that, for any fixed initialization, D-SGD avoids such saddle points with probability one. Results are proved by studying the underlying (distributed) gradient flow, using the ordinary differential equation (ODE) method of stochastic approximation, and extending classical techniques from dynamical systems theory such as stable manifolds. Results are proved in the general context of subspace-constrained optimization, of which D-SGD is a special case.

preprint, 2022, arXiv

Gradient Based Clustering

We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality with respect to cluster assignments and cluster center positions. The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions, satisfying some mild assumptions. The main advantage of the proposed approach is a simple and computationally cheap update rule. Unlike previous methods that specialize to a specific formulation of the clustering problem, our approach is applicable to a wide range of costs, including non-Bregman clustering methods based on the Huber loss. We analyze the convergence of the proposed algorithm, and show that it converges to the set of appropriately defined fixed points, under arbitrary center initialization. In the special case of Bregman cost functions, the algorithm converges to the set of centroidal Voronoi partitions, which is consistent with prior works. Numerical experiments on real data demonstrate the effectiveness of the proposed method.
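The alternating procedure described in this abstract (assign points to clusters, then move each center along the gradient of the cost) can be illustrated with a minimal sketch. It is not the authors' implementation: it is restricted to one-dimensional data, and the function names, the fixed step size, and the iteration count are all assumptions made for illustration. The step size is assumed to be small relative to the cluster sizes.

```python
from typing import Callable, List, Sequence, Tuple


def squared_grad(center: float, point: float) -> float:
    """Gradient with respect to the center of 0.5 * (center - point)**2."""
    return center - point


def make_huber_grad(delta: float) -> Callable[[float, float], float]:
    """Gradient with respect to the center of the Huber loss of (center - point)."""
    def grad(center: float, point: float) -> float:
        d = center - point
        if abs(d) <= delta:
            return d                      # quadratic region
        return delta if d > 0 else -delta  # linear region
    return grad


def gradient_clustering(
    points: Sequence[float],
    centers: Sequence[float],
    grad_loss: Callable[[float, float], float] = squared_grad,
    step: float = 0.1,    # hypothetical fixed step size
    iters: int = 200,     # hypothetical iteration budget
) -> Tuple[List[float], List[int]]:
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iters):
        # Step 1: assign every point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            for p in points
        ]
        # Step 2: one gradient step per center over its assigned points.
        for i in range(len(centers)):
            g = sum(
                grad_loss(centers[i], p)
                for p, lab in zip(points, labels)
                if lab == i
            )
            centers[i] -= step * g
    # Labels are the assignments computed before the final center update.
    return centers, labels
```

With the squared Euclidean gradient, each center in this sketch settles at the mean of its assigned points; passing `make_huber_grad(delta)` swaps in a Huber cost without changing the update rule.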

preprint, 2022, arXiv

Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise

We introduce a general framework for nonlinear stochastic gradient descent (SGD) for scenarios in which gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, such as clipped, normalized, signed, or quantized gradients, and we also consider novel nonlinearity choices. For the considered class of methods, we establish strong convergence guarantees, assuming a strongly convex cost function with Lipschitz continuous gradients, under very general assumptions on the gradient noise. Most notably, we show that, for a nonlinearity with bounded outputs and for gradient noise that may not have finite moments of order greater than one, the nonlinear SGD's mean squared error (MSE), or equivalently, the expected cost function's optimality gap, converges to zero at rate $O(1/t^{\zeta})$, $\zeta \in (0,1)$. In contrast, for the same noise setting, the linear SGD generates a sequence with unbounded variances. Furthermore, for nonlinearities that can be decoupled component-wise, such as sign gradient or component-wise clipping, we show that the nonlinear SGD asymptotically (locally) achieves an $O(1/t)$ rate in the weak convergence sense and explicitly quantify the corresponding asymptotic variance. Experiments show that, while our framework is more general than existing studies of SGD under heavy-tail noise, several easy-to-implement nonlinearities from our framework are competitive with state-of-the-art alternatives on real data sets with heavy-tailed noise.
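To make the idea of passing the noisy gradient through a bounded nonlinearity concrete, here is a minimal scalar sketch. It is an illustration under stated assumptions, not the framework from the paper: the step size 1/(t+1), the symmetric Pareto-type noise model, and all function names are hypothetical choices.

```python
import random
from typing import Callable, Optional


def sign_nl(g: float) -> float:
    """Sign nonlinearity: output is always in {-1, 0, 1}."""
    return float((g > 0) - (g < 0))


def make_clip(bound: float) -> Callable[[float], float]:
    """Clipping nonlinearity: output is limited to [-bound, bound]."""
    def clip(g: float) -> float:
        return max(-bound, min(bound, g))
    return clip


def heavy_tail_noise(rng: random.Random, alpha: float = 1.5) -> float:
    """Symmetric Pareto-type noise (assumed model).

    For 1 < alpha < 2 the mean is finite (zero by symmetry) and the
    variance is infinite.
    """
    magnitude = rng.paretovariate(alpha) - 1.0
    return magnitude if rng.random() < 0.5 else -magnitude


def nonlinear_sgd(
    grad: Callable[[float], float],
    x0: float,
    nonlinearity: Callable[[float], float],
    steps: int,
    noise: Optional[Callable[[], float]] = None,
) -> float:
    """Run x <- x - a_t * nonlinearity(grad(x) + noise), with a_t = 1/(t+1)."""
    x = x0
    for t in range(steps):
        g = grad(x) + (noise() if noise is not None else 0.0)
        x -= nonlinearity(g) / (t + 1)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    # Strongly convex toy cost 0.5 * (x - 3)**2, minimized at x = 3.
    estimate = nonlinear_sgd(
        lambda x: x - 3.0, 0.0, make_clip(1.0), 20000,
        noise=lambda: heavy_tail_noise(rng),
    )
    print(estimate)
```

Because the nonlinearity bounds every update, a single extreme noise sample cannot move the iterate by more than the current step size, which is the intuition behind the contrast with linear SGD drawn in the abstract.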

preprint, 2022, arXiv

Personalized Federated Learning via Convex Clustering

We propose a parametric family of algorithms for personalized federated learning with locally convex user costs. The proposed framework is based on a generalization of convex clustering in which the differences between different users' models are penalized via a sum-of-norms penalty, weighted by a penalty parameter $\lambda$. The proposed approach enables "automatic" model clustering, without prior knowledge of the hidden cluster structure or the number of clusters. Analytical bounds on the weight parameter that lead to simultaneous personalization, generalization, and automatic model clustering are provided. The solution to the formulated problem enables personalization, by providing different models across different clusters, and generalization, by providing models different from the per-user models computed in isolation. We then provide an efficient algorithm based on the Parallel Direction Method of Multipliers (PDMM) to solve the proposed formulation in a federated server-users setting. Numerical experiments corroborate our findings. As an interesting byproduct, our results provide several generalizations to convex clustering.

preprint, 2022, arXiv

Variance reduced stochastic optimization over directed graphs with row and column stochastic weights

This paper proposes AB-SAGA, a first-order distributed stochastic optimization method to minimize a finite sum of smooth and strongly convex functions distributed over an arbitrary directed graph. AB-SAGA removes the uncertainty caused by the stochastic gradients using a node-level variance reduction and subsequently employs network-level gradient tracking to address the data dissimilarity across the nodes. Unlike existing methods that use the nonlinear push-sum correction to cancel the imbalance caused by the directed communication, the consensus updates in AB-SAGA are linear and use both row and column stochastic weights. We show that for a constant step-size, AB-SAGA converges linearly to the global optimum. We quantify the directed nature of the underlying graph using an explicit directivity constant and characterize the regimes in which AB-SAGA achieves a linear speed-up over its centralized counterpart. Numerical experiments illustrate the convergence of AB-SAGA for strongly convex and nonconvex problems.
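The linear consensus updates with row and column stochastic weights mentioned in this abstract can be sketched as follows. This is a simplified illustration, not AB-SAGA itself: it uses exact gradients and omits the SAGA variance reduction. The scalar quadratic local costs, the three-node directed ring, the step size, and all names are assumptions. The row stochastic matrix `A` mixes the estimates and the column stochastic matrix `B` mixes the gradient trackers.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def directed_ring_weights(n: int) -> Matrix:
    """Weights for a directed ring with self-loops (edge i-1 -> i).

    With uniform weights on this particular graph, the row stochastic and
    column stochastic matrices coincide; on a general directed graph they
    would be built separately from in-neighbors and out-neighbors.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n] = 0.5
    return W


def ab_gradient_tracking(
    A: Matrix,                 # row stochastic: each row sums to one
    B: Matrix,                 # column stochastic: each column sums to one
    targets: Sequence[float],  # node i holds f_i(x) = 0.5 * (x - targets[i])**2
    step: float = 0.05,        # hypothetical constant step size
    iters: int = 500,
) -> List[float]:
    def grad(x: Sequence[float]) -> List[float]:
        return [xi - bi for xi, bi in zip(x, targets)]

    x = [0.0] * len(targets)
    y = grad(x)  # trackers start at the local gradients
    for _ in range(iters):
        # Estimate update: mix with A, then step along the tracked gradient.
        x_new = [ax - step * yi for ax, yi in zip(matvec(A, x), y)]
        # Tracker update: mix with B, then add the local gradient change.
        g_old, g_new = grad(x), grad(x_new)
        y = [by + gn - go for by, gn, go in zip(matvec(B, y), g_new, g_old)]
        x = x_new
    return x
```

For these quadratic costs the minimizer of the sum is the average of `targets`, so every node's estimate should approach that average when the step size is small enough.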

preprint, 2020, arXiv

A Circuit-Theoretic Approach to State Estimation

Traditional state estimation (SE) methods that are based on nonlinear minimization of the sum of localized measurement error functionals are known to suffer from non-convergence and large residual errors. In this paper, we propose an equivalent circuit formulation (ECF)-based SE approach that inherently considers the complete network topology and associated physical constraints. We analyze the mathematical differences between the two approaches and show that our approach produces a linear state estimator that is mathematically a quadratic programming (QP) problem with a closed-form solution. Furthermore, this formulation imposes additional topology-based constraints that provably shrink the feasible region and promote convergence to a more physically meaningful solution. From a probabilistic viewpoint, we show that our method incorporates prior knowledge into the estimate, thus converging to a more physics-based estimate than the traditional observation-driven maximum likelihood estimator (MLE). Importantly, incorporation of the entire system topology and underlying physics, while being linear, makes ECF-based SE advantageous for large-scale systems.

preprint, 2020, arXiv

Distributed Gradient Flow: Nonsmoothness, Nonconvexity, and Saddle Point Evasion

The paper considers distributed gradient flow (DGF) for multi-agent nonconvex optimization. DGF is a continuous-time approximation of distributed gradient descent that is often easier to study than its discrete-time counterpart. The paper has two main contributions. First, the paper considers optimization of nonsmooth, nonconvex objective functions. It is shown that DGF converges to critical points in this setting. The paper then considers the problem of avoiding saddle points. It is shown that if agents' objective functions are assumed to be smooth and nonconvex, then DGF can only converge to a saddle point from a zero-measure set of initial conditions. To establish this result, the paper proves a stable manifold theorem for DGF, which is a fundamental contribution of independent interest. In a companion paper, analogous results are derived for discrete-time algorithms.

preprint, 2020, arXiv

Distributed Gradient Methods for Nonconvex Optimization: Local and Global Convergence Guarantees

The article discusses distributed gradient-descent algorithms for computing local and global minima in nonconvex optimization. For local optimization, we focus on distributed stochastic gradient descent (D-SGD), a simple network-based variant of classical SGD. We discuss local minima convergence guarantees and explore the simple but critical role of the stable-manifold theorem in analyzing saddle-point avoidance. For global optimization, we discuss annealing-based methods in which slowly decaying noise is added to D-SGD. Conditions are discussed under which convergence to global minima is guaranteed. Numerical examples illustrate the key concepts in the paper.

preprint, 2020, arXiv

Gradient tracking and variance reduction for decentralized optimization and machine learning

Decentralized methods to solve finite-sum minimization problems are important in many signal processing and machine learning tasks where the data is distributed over a network of nodes and raw data sharing is not permitted due to privacy and/or resource constraints. In this article, we review decentralized stochastic first-order methods and provide a unified algorithmic framework that combines variance-reduction with gradient tracking to achieve both robust performance and fast convergence. We provide explicit theoretical guarantees of the corresponding methods when the objective functions are smooth and strongly-convex, and show their applicability to non-convex problems via numerical experiments. Throughout the article, we provide intuitive illustrations of the main technical ideas by casting appropriate tradeoffs and comparisons among the methods of interest and by highlighting applications to decentralized training of machine learning models.

preprint, 2020, arXiv

Power System Dispatch with Marginal Degradation Cost of Battery Storage

Battery storage is essential for the future smart grid. The inevitable cell degradation renders the battery lifetime volatile and highly dependent on battery dispatch, and thus incurs opportunity cost. This paper rigorously derives the marginal degradation cost of the battery for power system dispatch. The derived optimal marginal degradation cost is time-variant, reflecting the time value of money and the functionality fade of the battery, and takes the form of a constant value divided by a discount factor plus a term related to battery state of health. In case studies, we demonstrate the evolution of the optimal marginal costs of degradation that corresponds to the optimal long-term dispatch outcome. We also show that the optimal marginal cost of degradation depends on the marginal cost of generation in the grid.

preprint, 2020, arXiv

Resilient Distributed Field Estimation

We study resilient distributed field estimation under measurement attacks. A network of agents or devices measures a large, spatially distributed physical field parameter. An adversary arbitrarily manipulates the measurements of some of the agents. Each agent's goal is to process its measurements and information received from its neighbors to estimate only a few specific components of the field. We present $\mathbf{SAFE}$, the Saturating Adaptive Field Estimator, a consensus+innovations distributed field estimator that is resilient to measurement attacks. Under sufficient conditions on the compromised measurement streams, the physical coupling between the field and the agents' measurements, and the connectivity of the cyber communication network, $\mathbf{SAFE}$ guarantees that each agent's estimate converges almost surely to the true value of the components of the parameter in which the agent is interested. Finally, we illustrate the performance of $\mathbf{SAFE}$ through numerical examples.

preprint, 2020, arXiv

S-ADDOPT: Decentralized stochastic first-order optimization over directed graphs

In this report, we study decentralized stochastic optimization to minimize a sum of smooth and strongly convex cost functions when the functions are distributed over a directed network of nodes. In contrast to the existing work, we use gradient tracking to improve certain aspects of the resulting algorithm. In particular, we propose the S-ADDOPT algorithm that assumes a stochastic first-order oracle at each node and show that for a constant step-size $\alpha$, each node converges linearly inside an error ball around the optimal solution, the size of which is controlled by $\alpha$. For decaying step-sizes $\mathcal{O}(1/k)$, we show that S-ADDOPT reaches the exact solution sublinearly at $\mathcal{O}(1/k)$ and its convergence is asymptotically network-independent. Thus the asymptotic behavior of S-ADDOPT is comparable to the centralized stochastic gradient descent. Numerical experiments over both strongly convex and non-convex problems illustrate the convergence behavior and the performance comparison of the proposed algorithm.

preprint, 2020, arXiv

The economics of utility-scale portable energy storage systems in a high-renewable grid

Battery storage is expected to play a crucial role in the low-carbon transformation of energy systems. The deployment of battery storage in the power grid, however, is currently severely limited by its low economic viability, which results from not only high capital costs but also the lack of flexible and efficient utilization schemes and business models. Making utility-scale battery storage portable through trucking unlocks its capability to provide various on-demand services. We introduce the potential applications of utility-scale portable energy storage and investigate its economics in California using a spatiotemporal decision model that determines the optimal operation and transportation schedules of portable storage. We show that mobilizing energy storage can increase its life-cycle revenues by 70% in some areas and improve renewable energy integration by relieving local transmission congestion. The life-cycle revenue of spatiotemporal arbitrage can fully compensate for the costs of a portable energy storage system in several regions in California, including San Diego and the San Francisco Bay Area.

preprint, 2019, arXiv

The Economic End of Life of Electrochemical Energy Storage

The useful life of electrochemical energy storage (EES) is a critical factor to EES planning, operation, and economic assessment. Today, systems commonly assume a physical end-of-life criterion, retiring EES when the remaining capacity reaches a threshold below which the EES is of little use because of functionality degradation. Here, we propose an economic end of life criterion, where EES is retired when it cannot earn positive net economic benefit in its intended application. This criterion depends on the use case and degradation characteristics of the EES, but is independent of initial capital cost. Using an intertemporal operational framework to consider functionality and profitability degradation, our case study shows that the economic end of life could occur significantly faster than the physical end of life. We argue that both criteria should be applied in EES system planning and assessment. We also analyze how R&D efforts should consider cycling capability and calendar degradation rate when considering the economic end-of-life of EES.

preprint, 2018, arXiv

Spatiotemporal Arbitrage of Large-Scale Portable Energy Storage for Grid Congestion Relief

Energy storage has great potential in grid congestion relief. By making large-scale energy storage portable through trucking, its capability to address grid congestion can be greatly enhanced. This paper explores a business model of large-scale portable energy storage for spatiotemporal arbitrage over nodes with congestion. We propose a spatiotemporal arbitrage model to determine the optimal operation and transportation schedules of portable storage. To validate the business model, we simulate the schedules of a Tesla Semi loaded with Tesla Powerpacks performing arbitrage over two nodes in California with local transmission congestion. The results indicate that the contributions of portable storage to congestion relief are much greater than those of stationary storage, and that trucking storage can yield a net profit in energy arbitrage applications.