Source author record

Ana Bušić

Ana Bušić appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Systems and Control, math.OC, math.PR, Discrete Mathematics, eess.SY, Machine Learning, Applications, Computer Science and Game Theory, math.ST, Performance, Statistics Theory

Catalog footprint

What is connected

20 works

11 topics

4 close collaborators


preprint · 2026 · arXiv

Structural Equivalence and Learning Dynamics in Delayed MARL

We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets, and their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). This formally generalizes existing infinite-horizon single-agent results to any-horizon partially observable cooperative multi-agent problems with decentralized policy execution, and allows any mixed-delay configuration to be reduced to a pure OD system. We further prove that in Transition-Independent MDPs (TI-MDPs), the observation-action history reduces to a tractable minimal local augmented state. However, we show through numerical experiments that although the optimal solution spaces are structurally isomorphic, the practical learning dynamics are fundamentally different. First, using the minimal local augmented state, the equivalence no longer holds when transitions are not independent. Second, operational constraints and causal credit-assignment errors in Temporal Difference (TD) algorithms induce different learning behaviors across regimes. Finally, leveraging this structural equivalence to bypass these learning challenges, we demonstrate successful multi-agent zero-shot policy transfer from OD to AD, paving the way for unified, efficient solution methods in complex delayed systems.

preprint · 2020 · arXiv

Energy storage applications for low voltage consumers in Uruguay

Energy storage can be used for many applications in the Smart Grid, such as energy arbitrage, peak demand shaving, power factor correction, and energy backup, to name a few, and can play a major role in increasing the capacity of power networks to host renewable energy sources. Often, storage control algorithms will need to be tailored to the power network's billing structure, reliability restrictions, and other local power network norms. In this paper we explore residential energy storage applications in Uruguay, one of the global leaders in renewable energies, where new low-voltage consumer contracts were recently introduced. Based on these billing mechanisms, we focus on energy arbitrage and reactive energy compensation with the aim of minimizing the cost of consumption of an end-user. Given that in the new contracts the buying and selling prices of electricity are equal and that reactive power compensation is primarily governed by the installed converter, the storage operation is not sensitive to parameter uncertainties and, therefore, no lookahead is required for decision making. A threshold-based hierarchical controller is proposed which decides on the optimal active energy for arbitrage and uses the remaining converter capacity for reactive power compensation, which is shown to increase end-user profit. Numerical results indicate that storage could be profitable, even considering battery degradation, under some but not all of the studied contracts. For the cases in which it is not, we propose the best-suited contract. Results presented here can be naturally applied whenever the tariff structure satisfies the hypothesis considered in this work.

preprint · 2020 · arXiv

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
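As a rough illustration of the $O(1/n)$ rate, the sketch below treats the Monte-Carlo average of a two-state Markov chain as a stochastic approximation recursion with step size $1/n$ and estimates its mean-square error empirically. The chain, the parameter `p_stay`, and the function names are assumptions made for this example; none of them is taken from the paper.

```python
import random


def markov_average(n, p_stay=0.9, seed=0):
    """Run theta_k = theta_{k-1} + (X_k - theta_{k-1}) / k for a chain on {0, 1}.

    The chain (hypothetical example) keeps its state with probability p_stay
    and flips otherwise, so its stationary mean is 0.5.
    """
    rng = random.Random(seed)
    x = 0
    theta = 0.0
    for k in range(1, n + 1):
        if rng.random() > p_stay:
            x = 1 - x
        theta += (x - theta) / k  # step size 1/k gives the running average
    return theta


def empirical_mse(n, runs=300, p_stay=0.9):
    """Average squared error of the estimate over independent runs."""
    target = 0.5  # stationary mean of the symmetric chain
    errors = [(markov_average(n, p_stay, seed=s) - target) ** 2 for s in range(runs)]
    return sum(errors) / runs


if __name__ == "__main__":
    for n in (500, 2000):
        print(n, n * empirical_mse(n))
```

If the error decays at rate $O(1/n)$, the product `n * empirical_mse(n)` should stay roughly constant as `n` grows. For this particular chain it should settle near the asymptotic variance, approximately 2.25.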

preprint · 2020 · arXiv

Flexibility can hurt dynamic matching system performance

We study the performance of general dynamic matching models. Such a model is defined by a connected graph, where nodes represent the classes of items and edges the compatibilities between items. Items of different classes arrive one by one to the system according to a given probability distribution. Upon arrival, an item is matched with a compatible item, if any, according to the First Come First Served discipline, and both leave the system immediately; otherwise it is enqueued with other items of the same class. We show that such a model may exhibit non-intuitive behavior: increasing the service ability by adding new edges to the matching graph may lead to a larger average population. This is similar to a Braess paradox. We first consider a quasicomplete graph with four nodes and we provide values of the probability distribution of the arrivals such that when we add an edge the mean number of items is larger. Then, we consider an arbitrary matching graph and we show sufficient conditions for the existence or non-existence of this paradox. We conclude that the analog of the Braess paradox in matching models occurs when specific independent sets are in saturation, i.e., the system is close to the stability condition.
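The model described in this abstract can be simulated in a few lines. The following is a minimal sketch, assuming i.i.d. arrivals and a compatibility graph given as a dictionary of neighbor sets. The function names, the triangle graph, and the arrival probabilities are illustrative assumptions, not the instances studied in the paper.

```python
import random


def fcfs_arrival(buffer, item, compat):
    """Process one arrival under First Come First Served.

    buffer lists the classes of waiting items, oldest first.
    """
    for i, waiting in enumerate(buffer):
        if waiting in compat[item]:
            del buffer[i]  # match with the oldest compatible item; both leave
            return
    buffer.append(item)  # nothing compatible is waiting: the item is enqueued


def mean_population(compat, probs, steps=50_000, seed=0):
    """Time-averaged number of waiting items, observed after each arrival."""
    rng = random.Random(seed)
    classes = list(probs)
    weights = [probs[c] for c in classes]
    buffer, total = [], 0
    for _ in range(steps):
        item = rng.choices(classes, weights)[0]
        fcfs_arrival(buffer, item, compat)
        total += len(buffer)
    return total / steps


if __name__ == "__main__":
    # Hypothetical example: a triangle graph with uniform arrivals.
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
    print(mean_population(triangle, uniform))
```

One way to probe the effect numerically is to call `mean_population` on a graph before and after adding an edge to `compat`, keeping `probs` and `seed` fixed, and compare the two averages.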

preprint · 2020 · arXiv

Optimal Control of Dynamic Bipartite Matching Models

A dynamic bipartite matching model is given by a bipartite matching graph which determines the possible matchings between the various types of supply and demand items. Both supply and demand items arrive to the system according to a stochastic process. Matched pairs leave the system and the others wait in the queues, which induces a holding cost. We model this problem as a Markov Decision Process and study the discounted cost and the average cost problems. We fully characterize the optimal matching policy for complete matching graphs and for the N-shaped matching graph. In the former case, the optimal policy consists of matching everything and, in the latter case, it prioritizes the matchings in the extreme edges and is of threshold type for the diagonal edge. In addition, for the average cost problem, we compute the optimal threshold value. For more general graphs, we need to consider some assumptions on the cost of the nodes. For complete graphs minus one edge, we provide conditions on the cost of the nodes such that the optimal policy of the N-shaped matching graph extends to this case. For acyclic graphs, we show that, when the cost of the extreme edges is large, the optimal matching policy prioritizes the matchings in the extreme edges. We also study the W-shaped matching graph and, using simulations, we show that there are cases where it is not optimal to prioritize the matchings in the extreme edges.

preprint · 2020 · arXiv

Storage Optimal Control under Net Metering Policies

Electricity prices and the end user net load vary with time. Electricity consumers equipped with energy storage devices can perform energy arbitrage, i.e., buy when energy is cheap or when there is a deficit of energy, and sell it when it is expensive or in excess, taking into account future variations in price and net load. Net metering policies indicate that many of the utilities apply a customer selling rate lower than or equal to the retail customer buying rate in order to compensate excess energy generated by end users. In this paper, we formulate the optimal control problem for an end user energy storage device in the presence of net metering. We propose a computationally efficient algorithm, with worst-case run-time complexity quadratic in the number of samples in the lookahead horizon, that computes the optimal energy ramping rates in a time horizon. The proposed algorithm exploits the problem's piecewise linear structure and convexity properties for the discretization of optimal Lagrange multipliers. The solution has a threshold-based structure in which optimal control decisions are independent of past or future price as well as of net load values beyond a certain time horizon, defined as a sub-horizon. Numerical results show the effectiveness of the proposed model and algorithm. Furthermore, we investigate the impact of forecasting errors on the proposed technique. We consider an Auto-Regressive Moving Average (ARMA) based forecasting of net load together with Model Predictive Control (MPC). We numerically show that adaptive forecasting and MPC significantly mitigate the effects of forecast error on energy arbitrage gains.

preprint · 2020 · arXiv

Towards Phase Balancing using Energy Storage

Ad-hoc growth of single-phase-connected distributed energy resources, such as solar generation and electric vehicles, can lead to network unbalance with negative consequences on the quality and efficiency of electricity supply. Case studies are presented for a substation in Madeira, Portugal and an EV charging facility in Pasadena, California. These case studies show that phase imbalance can happen due to a large amount of distributed generation (DG) and electric vehicle (EV) integration. We conducted stylized load-flow analysis on a radial distribution network using an OpenDSS-based simulator to understand the negative effects of phase imbalance on neutral and phase conductor losses, and on voltage drop/rise. We evaluate the integration of storage in the distribution network as a possible solution for mitigating effects caused by imbalance. We present control architectures of storage operation for phase balancing. Numerically we show that relatively small-sized storage (compared to unbalance magnitude) can significantly reduce network imbalance. We identify the end node of the feeder as the best location to install storage.

preprint · 2020 · arXiv

Zap Q-Learning With Nonlinear Function Approximation

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.

preprint · 2016 · arXiv

Approximate optimality with bounded regret in dynamic matching models

We consider a discrete-time bipartite matching model with random arrivals of units of supply and demand that can wait in queues located at the nodes in the network. A control policy determines which are matched at each time. The focus is on the infinite-horizon average-cost optimal control problem. A relaxation of the stochastic control problem is proposed, which is found to be a special case of an inventory model, as treated in the classical theory of Clark and Scarf. The optimal policy for the relaxation admits a closed-form expression. Based on the policy for this relaxation, a new matching policy is proposed. For a parameterized family of models in which the network load approaches capacity, this policy is shown to be approximately optimal, with bounded regret, even though the average cost grows without bound.

preprint · 2016 · arXiv

Distributed Randomized Control for Demand Dispatch

The paper concerns design of control systems for Demand Dispatch to obtain ancillary services to the power grid by harnessing inherent flexibility in many loads. The role of "local intelligence" at the load has been advocated in prior work; randomized local controllers that manifest this intelligence are convenient for loads with a finite number of states. The present work introduces two new design techniques for these randomized controllers: (i) The Individual Perspective Design (IPD) is based on the solution to a one-dimensional family of Markov Decision Processes, whose objective function is formulated from the point of view of a single load. The family of dynamic programming equations appears complex, but it is shown that its solution is obtained through the solution of a single ordinary differential equation. (ii) The System Perspective Design (SPD) is motivated by a single objective of the grid operator: Passivity of any linearization of the aggregate input-output model. A solution is obtained that can again be computed through the solution of a single ordinary differential equation. Numerical results complement these theoretical results.

preprint · 2016 · arXiv

Estimation and Control of Quality of Service in Demand Dispatch

It is now well known that flexibility of energy consumption can be harnessed for the purposes of grid-level ancillary services. In particular, through distributed control of a collection of loads, a balancing authority regulation signal can be tracked accurately, while ensuring that the quality of service (QoS) for each load is acceptable on average. In this paper it is argued that a histogram of QoS is approximately Gaussian, and consequently each load will eventually receive poor service. Statistical techniques are developed to estimate the mean and variance of QoS as a function of the power spectral density of the regulation signal. It is also shown that additional local control can eliminate risk: The histogram of QoS is truncated through this local control, so that strict bounds on service quality are guaranteed. While there is a tradeoff between the grid-level tracking performance (capacity and accuracy) and the bounds imposed on QoS, it is found that the loss of capacity is minor in typical cases.

preprint · 2016 · arXiv

State Estimation for the Individual and the Population in Mean Field Control with Application to Demand Dispatch

This paper concerns state estimation problems in a mean field control setting. In a finite population model, the goal is to estimate the joint distribution of the population state and the state of a typical individual. The observation equations are a noisy measurement of the population. The general results are applied to demand dispatch for regulation of the power grid, based on randomized local control algorithms. In prior work by the authors it has been shown that local control can be carefully designed so that the aggregate of loads behaves as a controllable resource with accuracy matching or exceeding traditional sources of frequency regulation. The operational cost is nearly zero in many cases. The information exchange between grid and load is minimal, but it is assumed in the overall control architecture that the aggregate power consumption of loads is available to the grid operator. It is shown that the Kalman filter can be constructed to reduce these communication requirements,

preprint · 2015 · arXiv

Smart Fridge / Dumb Grid? Demand Dispatch for the Power Grid of 2020

In discussions at the 2015 HICSS meeting, it was argued that loads can provide most of the ancillary services required today and in the future. Through load-level and grid-level control design, high-quality ancillary service for the grid is obtained without impacting quality of service delivered to the consumer. This approach to grid regulation is called demand dispatch: loads are providing service continuously and automatically, without consumer interference. In this paper we ask, what intelligence is required at the grid-level? In particular, does the grid-operator require more than one-way communication to the loads? Our main conclusion: risk is not great in lower frequency ranges, e.g., PJM's RegA or BPA's balancing reserves. In particular, ancillary services from refrigerators and pool-pumps can be obtained successfully with only one-way communication. This requires intelligence at the loads, and much less intelligence at the grid level.

preprint · 2015 · arXiv

Speeding up Glauber Dynamics for Random Generation of Independent Sets

The maximum independent set (MIS) problem is a well-studied combinatorial optimization problem that naturally arises in many applications, such as wireless communication, information theory and statistical mechanics. The MIS problem is NP-hard, thus many results in the literature focus on fast generation of maximal independent sets of high cardinality. One possibility is to combine Gibbs sampling with coupling-from-the-past arguments to detect convergence to the stationary regime. This results in a sampling procedure with time complexity that depends on the mixing time of the Glauber dynamics Markov chain. We propose an adaptive method for random event generation in the Glauber dynamics that considers only the events that are effective in the coupling-from-the-past scheme, accelerating the convergence time of the Gibbs sampling algorithm.
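For readers unfamiliar with the baseline, the sketch below shows plain heat-bath Glauber dynamics for independent sets (the hard-core model with fugacity `lam`). It is not the adaptive method proposed in the paper, and it runs for a fixed number of steps instead of detecting convergence by coupling from the past. The function names, the adjacency-list format, and the parameter values are assumptions for this example.

```python
import random


def glauber_step(state, adj, lam, rng):
    """One heat-bath update: resample the membership of a uniformly chosen vertex."""
    v = rng.randrange(len(adj))
    if rng.random() < lam / (1.0 + lam):
        # Try to add v; allowed only if no neighbor is already in the set.
        if not any(state[u] for u in adj[v]):
            state[v] = True
    else:
        state[v] = False


def sample_independent_set(adj, lam=1.0, steps=10_000, seed=0):
    """Run the chain from the empty set and return the vertices in the final state."""
    rng = random.Random(seed)
    state = [False] * len(adj)
    for _ in range(steps):
        glauber_step(state, adj, lam, rng)
    return {v for v, inside in enumerate(state) if inside}


if __name__ == "__main__":
    # Hypothetical example: a cycle on 6 vertices.
    cycle = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
    print(sample_independent_set(cycle, lam=2.0))
```

Every state visited is an independent set by construction. Larger values of `lam` bias the stationary distribution toward sets of higher cardinality, and the output is only approximately stationary because the number of steps is fixed in advance.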

preprint · 2014 · arXiv

Ancillary Service to the Grid Using Intelligent Deferrable Loads

Renewable energy sources such as wind and solar power have a high degree of unpredictability and time-variation, which makes balancing demand and supply challenging. One possible way to address this challenge is to harness the inherent flexibility in demand of many types of loads. Introduced in this paper is a technique for decentralized control for automated demand response that can be used by grid operators as ancillary service for maintaining demand-supply balance. A Markovian Decision Process (MDP) model is introduced for an individual load. A randomized control architecture is proposed, motivated by the need for decentralized decision making, and the need to avoid synchronization that can lead to large and detrimental spikes in demand. An aggregate model for a large number of loads is then developed by examining the mean field limit. A key innovation is an LTI-system approximation of the aggregate nonlinear model, with a scalar signal as the input and a measure of the aggregate demand as the output. This makes the approximation particularly convenient for control design at the grid level. The second half of the paper contains a detailed application of these results to a network of residential pools. Simulations are provided to illustrate the accuracy of the approximations and effectiveness of the proposed control approach.

preprint · 2014 · arXiv

Exact Simulation for Assemble-To-Order Systems

We develop exact simulation (also known as perfect sampling) algorithms for a family of assemble-to-order systems. Due to the finite capacity, and coupling in demands and replenishments, known solving techniques are inefficient for larger problem instances. We first consider the case with individual replenishments of items, and derive an event-based representation of the Markov chain that allows applying existing exact simulation techniques, using the monotonicity properties or bounding chains. In the case of joint replenishments, the state space becomes intractable for the existing methods. We propose new exact simulation algorithms, based on aggregation and bounding chains, that allow a significant reduction of the state space of the Markov chain. We also discuss the coupling times of the considered models and provide sufficient conditions for linear (in the single server replenishment case) or quadratic (many server case) complexity of our algorithms in terms of the total capacity in the system.

preprint · 2014 · arXiv

Individual risk in mean-field control models for decentralized control, with application to automated demand response

Flexibility of energy consumption can be harnessed for the purposes of ancillary services in a large power grid. In prior work by the authors a randomized control architecture is introduced for individual loads for this purpose. In examples it is shown that the control architecture can be designed so that control of the loads is easy at the grid level: Tracking of a balancing authority reference signal is possible, while ensuring that the quality of service (QoS) for each load is acceptable on average. The analysis was based on a mean field limit (as the number of loads approaches infinity), combined with an LTI-system approximation of the aggregate nonlinear model. This paper examines in depth the issue of individual risk in these systems. The main contributions of the paper are of two kinds: Risk is modeled and quantified: (i) The average performance is not an adequate measure of success. It is found empirically that a histogram of QoS is approximately Gaussian, and consequently each load will eventually receive poor service. (ii) The variance can be estimated from a refinement of the LTI model that includes a white-noise disturbance; variance is a function of the randomized policy, as well as the power spectral density of the reference signal. Additional local control can eliminate risk: (iii) The histogram of QoS is truncated through this local control, so that strict bounds on service quality are guaranteed. (iv) This has insignificant impact on the grid-level performance, beyond a modest reduction in capacity of ancillary service.

preprint · 2014 · arXiv

Passive Dynamics in Mean Field Control

Mean-field models are a popular tool in a variety of fields. They provide an understanding of the impact of interactions among a large number of particles or people or other "self-interested agents", and are an increasingly popular tool in distributed control. This paper considers a particular randomized distributed control architecture introduced in our own recent work. In numerical results it was found that the associated mean-field model had attractive properties for purposes of control. In particular, when viewed as an input-output system, its linearization was found to be minimum phase. In this paper we take a closer look at the control model. The results are summarized as follows: (i) The Markov Decision Process framework of Todorov is extended to continuous time models, in which the "control cost" is based on relative entropy. This is the basis of the construction of a family of controlled Markovian generators. (ii) A decentralized control architecture is proposed in which each agent evolves as a controlled Markov process. A central authority broadcasts a common control signal to each agent. The central authority chooses this signal based on an aggregate scalar output of the Markovian agents. (iii) Provided the control-free system is a reversible Markov process, the following identity holds for the linearization, \[ \operatorname{Re}\, G(j\omega) = \text{PSD}_Y(\omega) \ge 0, \quad \omega \in \mathbb{R}, \] where the right hand side denotes the power spectral density for the output of any one of the individual (control-free) Markov processes.

preprint · 2011 · arXiv

Perfect Sampling of Markov Chains with Piecewise Homogeneous Events

Perfect sampling is a technique that uses coupling arguments to provide a sample from the stationary distribution of a Markov chain in a finite time without ever computing the distribution. This technique is very efficient if all the events in the system have the monotonicity property. However, in the general (non-monotone) case, this technique needs to consider the whole state space, which limits its application to chains with a state space of small cardinality. We propose here a new approach for the general case that only needs to consider two trajectories. Instead of the original chain, we use two bounding processes (envelopes) and we show that, whenever they couple, one obtains a sample under the stationary distribution of the original chain. We show that this new approach is particularly effective when the state space can be partitioned into pieces where envelopes can be easily computed. We further show that most Markovian queueing networks have this property and we propose efficient algorithms for some of them.
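The monotone case mentioned in this abstract can be sketched with classical coupling from the past on a finite-capacity queue, where only the trajectories started from the smallest and largest states need to be tracked. This is the standard monotone scheme, not the envelope method of the paper. The queue, the parameters `cap` and `p_arrival`, and the function names are assumptions for this example.

```python
import random


def update(x, u, cap, p_arrival):
    """Monotone event: an arrival if u < p_arrival, else a departure, clipped to [0, cap]."""
    if u < p_arrival:
        return min(x + 1, cap)
    return max(x - 1, 0)


def monotone_cftp(cap=10, p_arrival=0.4, seed=0):
    """Return one exact sample from the stationary distribution of the queue length."""
    rng = random.Random(seed)
    us = []  # us[k] drives the transition from time -(k + 1) to time -k
    horizon = 1
    while True:
        while len(us) < horizon:
            us.append(rng.random())  # go further into the past, reusing old randomness
        lo, hi = 0, cap
        for k in range(horizon - 1, -1, -1):  # replay from time -horizon up to time 0
            lo = update(lo, us[k], cap, p_arrival)
            hi = update(hi, us[k], cap, p_arrival)
        if lo == hi:
            return lo  # all starting states have coalesced at time 0
        horizon *= 2


if __name__ == "__main__":
    print([monotone_cftp(seed=s) for s in range(10)])
```

Because `update` preserves the order of states for every value of `u`, coalescence of the two extreme trajectories implies coalescence of all of them. Without monotonicity, every state would have to be tracked, which is the limitation the abstract describes.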

preprint · 2010 · arXiv

Stability of the bipartite matching model

We consider the bipartite matching model of customers and servers introduced by Caldentey, Kaplan, and Weiss (Adv. Appl. Probab., 2009). Customers and servers play symmetrical roles. There is a finite set C (resp. S) of customer (resp. server) classes. Time is discrete and at each time step, one customer and one server arrive in the system according to a joint probability measure on C × S, independently of the past. Also, at each time step, pairs of matched customer and server, if they exist, depart from the system. Authorized matchings are given by a fixed bipartite graph. A matching policy is chosen, which decides how to match when there are several possibilities. Customers/servers that cannot be matched are stored in a buffer. The evolution of the model can be described by a discrete time Markov chain. We study its stability under various admissible matching policies including: ML (Match the Longest), MS (Match the Shortest), FIFO (match the oldest), priorities. There exist natural necessary conditions for stability (independent of the matching policy) defining the maximal possible stability region. For some bipartite graphs, we prove that the stability region is indeed maximal for any admissible matching policy. For the ML policy, we prove that the stability region is maximal for any bipartite graph. For the MS and priority policies, we exhibit a bipartite graph with a non-maximal stability region.
