Researcher profile

Ana Bušić

Ana Bušić contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust: 21 (Emerging)
Verification: L1
Status: Unclaimed author
Works: 8
Followers: 0
Topics: 9
Close collaborators: 4

Published work

8 published items

Preprint · 2026 · arXiv

Structural Equivalence and Learning Dynamics in Delayed MARL

We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets, and their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). This formally generalizes existing infinite-horizon single-agent results to any-horizon partially observable cooperative multi-agent problems with decentralized policy execution, and allows any mixed-delay configuration to be reduced to a pure OD system. We further prove that in Transition-Independent MDPs (TI-MDPs), the observation-action history reduces to a tractable minimal local augmented state. However, we show through numerical experiments that although the optimal solution spaces are structurally isomorphic, the practical learning dynamics are fundamentally different. First, using the minimal local augmented state, the equivalence no longer holds when transitions are not independent. Second, operational constraints and causal credit-assignment errors in Temporal Difference (TD) algorithms induce different learning behaviors across regimes. Finally, leveraging this structural equivalence to bypass these learning challenges, we demonstrate successful multi-agent zero-shot policy transfer from OD to AD, paving the way for unified, efficient solution methods in complex delayed systems.
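The sketch below gives a concrete picture of the two delay regimes for a single agent with a constant delay d. It assumes the classical augmented-state construction: the most recent usable observation, plus the d actions whose effect that observation does not yet reflect. The function names and the list-based trajectory format are hypothetical. The paper itself works with full observation-action histories in Dec-POMDPs, which this toy does not model.

```python
from typing import Sequence, Tuple, TypeVar

O = TypeVar("O")
A = TypeVar("A")


def od_augmented_state(obs: Sequence[O], actions: Sequence[A], t: int, d: int) -> Tuple[O, Tuple[A, ...]]:
    """Observation delay (hypothetical helper): at time t the agent only has o_{t-d},
    plus the d actions a_{t-d}, ..., a_{t-1} it already applied, whose effects
    o_{t-d} does not show yet."""
    if t < d:
        raise ValueError("need t >= d so that o_{t-d} exists")
    return obs[t - d], tuple(actions[t - d:t])


def ad_augmented_state(obs: Sequence[O], actions: Sequence[A], t: int, d: int) -> Tuple[O, Tuple[A, ...]]:
    """Action delay (hypothetical helper): at time t the agent sees o_t, but the d
    actions it chose at t-d, ..., t-1 are still queued and execute at t, ..., t+d-1."""
    if t < d:
        raise ValueError("need t >= d so that the action queue is full")
    return obs[t], tuple(actions[t - d:t])


# Hypothetical toy trajectory for one agent; actions[k] is the action chosen at step k.
obs = ["o0", "o1", "o2", "o3", "o4"]
actions = ["a0", "a1", "a2", "a3"]
print(od_augmented_state(obs, actions, t=4, d=2))  # ('o2', ('a2', 'a3'))
print(ad_augmented_state(obs, actions, t=4, d=2))  # ('o4', ('a2', 'a3'))
```

Both views have the same shape: one observation plus d actions. That gives some intuition for reducing mixed-delay configurations to a pure OD system. The formal equivalence in the abstract is stated over histories and trajectory distributions, and this toy does not check it.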

Preprint · 2020 · arXiv

Energy storage applications for low voltage consumers in Uruguay

Energy storage can be used for many applications in the Smart Grid, such as energy arbitrage, peak demand shaving, power factor correction and energy backup, to name a few, and can play a major role in increasing the capacity of power networks to host renewable energy sources. Often, storage control algorithms need to be tailored to the power network's billing structure, reliability restrictions and other local norms. In this paper we explore residential energy storage applications in Uruguay, one of the global leaders in renewable energy, where new low-voltage consumer contracts were recently introduced. Based on these billing mechanisms, we focus on energy arbitrage and reactive energy compensation, with the aim of minimizing an end user's cost of consumption. In the new contracts the buying and selling prices of electricity are equal, and reactive power compensation is primarily governed by the installed converter. As a result, the storage operation is not sensitive to parameter uncertainties, and no lookahead is required for decision making. A threshold-based hierarchical controller is proposed that decides on the optimal active energy for arbitrage and uses the remaining converter capacity for reactive power compensation, which is shown to increase end-user profit. Numerical results indicate that storage could be profitable, even when battery degradation is considered, under some but not all of the studied contracts. For the cases in which it is not, we propose the best-suited contract. The results presented here apply naturally whenever the tariff structure satisfies the hypotheses considered in this work.
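The hierarchical idea (active power for arbitrage first, leftover converter capacity for reactive compensation) can be sketched as follows. The sketch makes three assumptions:

- The arbitrage setpoint is taken as an input; the paper's threshold rule is not reproduced.
- The converter is limited by its apparent-power rating, S^2 = P^2 + Q^2.
- The units (kW, kvar, kVA) and the function name are hypothetical.

```python
import math


def hierarchical_setpoints(p_arbitrage_kw: float, q_needed_kvar: float, converter_kva: float) -> tuple[float, float]:
    """Two-level split of converter capacity (illustrative sketch, hypothetical helper).

    Level 1: the active-power setpoint for arbitrage is taken as given and clipped
    to the converter rating.
    Level 2: reactive compensation gets whatever is left on the converter's
    apparent-power circle, S^2 = P^2 + Q^2.
    """
    p = max(-converter_kva, min(converter_kva, p_arbitrage_kw))
    q_capacity = math.sqrt(converter_kva ** 2 - p ** 2)
    # Keep the sign of the requested compensation, cap its magnitude.
    q = math.copysign(min(abs(q_needed_kvar), q_capacity), q_needed_kvar)
    return p, q


print(hierarchical_setpoints(3.0, 6.0, 5.0))   # (3.0, 4.0): only 4 kvar left
print(hierarchical_setpoints(-3.0, -1.0, 5.0))  # (-3.0, -1.0): request fits
```

Giving active power priority matches the ordering described in the abstract. Reactive support is then a by-product of unused converter headroom.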

Preprint · 2020 · arXiv

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This motivates the focus on mean-square error bounds for parameter estimates. It is shown that the mean-square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.
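The simplest instance of the setting is Monte Carlo estimation of a stationary mean from Markovian samples. It can be written as the SA recursion theta_n = theta_{n-1} + a_n (x_n - theta_{n-1}); with step size a_n = 1/n, this is exactly the running sample mean. The sketch below assumes a hypothetical symmetric two-state chain, not one of the paper's examples, and does not reproduce the paper's error constants.

```python
import random


def sa_estimate(samples, step=lambda n: 1.0 / n):
    """Stochastic approximation theta_n = theta_{n-1} + a_n * (x_n - theta_{n-1}).

    With a_n = 1/n this reproduces the running Monte Carlo sample mean.
    """
    theta = 0.0
    for n, x in enumerate(samples, start=1):
        theta += step(n) * (x - theta)
    return theta


def two_state_chain(n, p_stay, seed=0):
    """Symmetric two-state Markov chain on {0, 1} that stays put with probability p_stay."""
    rng = random.Random(seed)
    x, path = 0, []
    for _ in range(n):
        path.append(x)
        if rng.random() >= p_stay:  # switch state with probability 1 - p_stay
            x = 1 - x
    return path


# Correlated (Markovian) samples whose stationary mean is 0.5.
xs = two_state_chain(200_000, p_stay=0.9, seed=1)
print(sa_estimate(xs))  # should be close to the stationary mean 0.5
```

Repeating this over many seeds and plotting n times the empirical mean-square error against n is one way to see the $O(1/n)$ behaviour numerically. Strong correlation (p_stay near 1) inflates the constant.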

Preprint · 2020 · arXiv

Flexibility can hurt dynamic matching system performance

We study the performance of general dynamic matching models. Such a model is defined by a connected graph, where nodes represent item classes and edges represent compatibilities between items. Items of different classes arrive at the system one by one according to a given probability distribution. Upon arrival, an item is matched with a compatible waiting item, if there is one, according to the First Come First Served (FCFS) discipline, and both leave the system immediately; otherwise, the item is enqueued with other items of its class. We show that such a model may exhibit non-intuitive behavior: increasing the service capability by adding new edges to the matching graph may lead to a larger average population. This is similar to the Braess paradox. We first consider a quasi-complete graph with four nodes and provide values of the arrival probability distribution such that adding an edge increases the mean number of items. Then, we consider an arbitrary matching graph and give sufficient conditions for the existence or non-existence of this paradox. We conclude that the analog of the Braess paradox in matching models arises when specific independent sets are in saturation, i.e., when the system is close to the stability condition.
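The FCFS dynamics described above can be simulated in a few lines. This sketch replaces random arrivals with an explicit, hypothetical arrival sequence so that the behavior is easy to trace. Feeding it long random sequences, and averaging the queue length, is one way to estimate the mean number of items with and without an extra edge.

```python
def simulate_fcfs_matching(edges, arrivals):
    """Simulate a dynamic matching model under First Come First Served (FCFS).

    edges:    iterable of (u, v) pairs; item classes u and v are compatible.
    arrivals: sequence of item classes, in arrival order.
    Returns (matches, waiting): matches as (arrival index of the waiting item,
    arrival index of the new item) pairs, and the classes still waiting at the end.
    """
    compatible = {}
    for u, v in edges:
        compatible.setdefault(u, set()).add(v)
        compatible.setdefault(v, set()).add(u)

    waiting = []  # (arrival index, class), oldest first
    matches = []
    for t, cls in enumerate(arrivals):
        # FCFS: scan from the oldest waiting item for the first compatible one.
        partner = next(
            (i for i, (_, c) in enumerate(waiting) if c in compatible.get(cls, set())),
            None,
        )
        if partner is None:
            waiting.append((t, cls))         # no compatible item: enqueue
        else:
            t_old, _ = waiting.pop(partner)  # both items leave the system
            matches.append((t_old, t))
    return matches, [c for _, c in waiting]


# Path graph 1 - 2 - 3 with a hypothetical arrival sequence.
print(simulate_fcfs_matching([(1, 2), (2, 3)], [1, 1, 3, 2, 2]))
# ([(0, 3), (1, 4)], [3])
```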

Preprint · 2020 · arXiv

Optimal Control of Dynamic Bipartite Matching Models

A dynamic bipartite matching model is given by a bipartite matching graph that determines the possible matchings between the various types of supply and demand items. Both supply and demand items arrive at the system according to a stochastic process. Matched pairs leave the system, and the other items wait in queues, which incurs a holding cost. We model this problem as a Markov Decision Process and study both the discounted-cost and the average-cost problems. We fully characterize the optimal matching policy for complete matching graphs and for the N-shaped matching graph. In the former case, the optimal policy consists of matching everything; in the latter case, it prioritizes matchings in the extreme edges and is of threshold type for the diagonal edge. In addition, for the average-cost problem, we compute the optimal threshold value. For more general graphs, we need additional assumptions on the costs of the nodes. For complete graphs minus one edge, we provide conditions on the node costs under which the optimal policy of the N-shaped matching graph extends to this case. For acyclic graphs, we show that, when the cost of the extreme edges is large, the optimal matching policy prioritizes matchings in the extreme edges. We also study the W-shaped matching graph and, using simulations, show that there are cases where it is not optimal to prioritize matchings in the extreme edges.

Preprint · 2020 · arXiv

Storage Optimal Control under Net Metering Policies

Electricity prices and the end user's net load vary with time. Electricity consumers equipped with energy storage devices can perform energy arbitrage, i.e., buy energy when it is cheap or when there is a deficit, and sell it when it is expensive or in excess, taking into account future variations in price and net load. Under net metering policies, many utilities compensate excess energy generated by end users at a customer selling rate lower than or equal to the retail customer buying rate. In this paper, we formulate the optimal control problem for an end-user energy storage device in the presence of net metering. We propose a computationally efficient algorithm, with worst-case run-time complexity quadratic in the number of samples in the lookahead horizon, that computes the optimal energy ramping rates over a time horizon. The proposed algorithm exploits the problem's piecewise linear structure and convexity properties for the discretization of the optimal Lagrange multipliers. The solution has a threshold-based structure in which optimal control decisions are independent of past or future prices, as well as of net load values beyond a certain time horizon, defined as a sub-horizon. Numerical results show the effectiveness of the proposed model and algorithm. Furthermore, we investigate the impact of forecasting errors on the proposed technique. We consider Auto-Regressive Moving Average (ARMA) based forecasting of net load together with Model Predictive Control (MPC). We numerically show that adaptive forecasting and MPC significantly mitigate the effects of forecast error on energy arbitrage gains.
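The per-interval bill under net metering explains where the piecewise linear, convex structure comes from. Net consumption z is bought at the buying rate, and excess energy (z < 0) is credited at the selling rate. When selling ≤ buying, the cost equals max(buying · z, selling · z), which is a convex function of z.

The sketch below rests on several hypothetical simplifications:

- the prices are invented for illustration;
- storage losses and degradation are ignored;
- battery capacity and ramp constraints are not modeled;
- the function names are illustrative.

```python
def net_metering_cost(net_load_kwh, buy_price, sell_price):
    """Energy bill for one interval under net metering (hypothetical helper).

    Positive net load is bought at buy_price; negative net load (excess energy)
    is credited at sell_price, with sell_price <= buy_price.
    """
    if sell_price > buy_price:
        raise ValueError("net metering assumes sell_price <= buy_price")
    return buy_price * max(net_load_kwh, 0.0) - sell_price * max(-net_load_kwh, 0.0)


def horizon_cost(net_load, storage_kwh, buy, sell):
    """Total bill when storage shifts the net load (charging > 0, discharging < 0)."""
    return sum(
        net_metering_cost(load + store, b_price, s_price)
        for load, store, b_price, s_price in zip(net_load, storage_kwh, buy, sell)
    )


# Hypothetical two-interval example: cheap first interval, expensive second.
load, buy, sell = [1.0, 2.0], [0.1, 0.3], [0.05, 0.1]
print(horizon_cost(load, [0.0, 0.0], buy, sell))   # no storage: 0.7 up to float rounding
print(horizon_cost(load, [1.0, -1.0], buy, sell))  # charge early, discharge late: 0.5
```

The arbitrage gain in the example comes entirely from the price spread between the two intervals. The paper's algorithm instead searches over ramping rates subject to storage constraints.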

Preprint · 2020 · arXiv

Towards Phase Balancing using Energy Storage

Ad hoc growth of single-phase-connected distributed energy resources, such as solar generation and electric vehicles, can lead to network unbalance, with negative consequences for the quality and efficiency of electricity supply. Case studies are presented for a substation in Madeira, Portugal, and an EV charging facility in Pasadena, California. These case studies show that phase imbalance can result from large amounts of distributed generation (DG) and electric vehicle (EV) integration. We conducted a stylized load-flow analysis of a radial distribution network using an OpenDSS-based simulator to understand the negative effects of phase imbalance on neutral and phase conductor losses and on voltage drop/rise. We evaluate the integration of storage in the distribution network as a possible solution for mitigating the effects of imbalance. We present control architectures for storage operation aimed at phase balancing. We show numerically that relatively small storage (compared to the unbalance magnitude) can significantly reduce network imbalance. We identify the end node of the feeder as the best location to install storage.
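In a three-phase four-wire feeder, the neutral carries the phasor sum of the phase currents. This explains why single-phase loads and generation raise neutral losses. The worked illustration below assumes balanced phase voltages and unity power factor, so only current magnitudes are needed. It is a textbook calculation, not the paper's OpenDSS load-flow model, and the currents are hypothetical.

```python
import cmath
import math


def neutral_current(i_a, i_b, i_c):
    """Magnitude of the neutral current in a three-phase four-wire feeder.

    Assumes balanced phase voltages and unity power factor, so each phase current
    is in phase with its voltage: phasors at 0, -120 and +120 degrees.
    """
    a = cmath.rect(1.0, -2.0 * math.pi / 3.0)  # unit phasor at -120 degrees
    return abs(i_a + i_b * a + i_c * a ** 2)


print(neutral_current(10.0, 10.0, 10.0))  # ~0: balanced phases cancel
print(neutral_current(20.0, 10.0, 10.0))  # ~10 A: the extra 10 A on phase a returns via the neutral
```

In this toy case, a storage unit on phase a that supplies the extra 10 A locally brings the neutral current back to about zero. That is the intuition behind using storage for phase balancing.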

Preprint · 2020 · arXiv

Zap Q-Learning With Nonlinear Function Approximation

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.
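As background (not taken from this abstract), the generic Zap stochastic approximation recursion pairs two updates:

- a Newton-Raphson-like parameter step, $\theta_{n+1} = \theta_n - \alpha_{n+1}\hat A_{n+1}^{-1} f(\theta_n, X_{n+1})$;
- a faster-time-scale estimate of the mean Jacobian, $\hat A_{n+1} = \hat A_n + \gamma_{n+1}(A_{n+1} - \hat A_n)$.

The step sizes satisfy $\gamma_n \gg \alpha_n$. The sketch below is a scalar, linear toy instance of that recursion with hypothetical noise distributions. It is not Zap Q-learning, and it involves no neural network function approximation.

```python
import random


def zap_sa_scalar(n_iter, seed=0, rho=0.85):
    """Toy Zap stochastic approximation for a scalar linear root-finding problem.

    Goal: find theta* with E[f(theta*, X)] = 0, where f(theta, X) = A(X) * theta - b(X).
    Here A(X) ~ -(1 + U[0, 1)) and b(X) ~ -3 + N(0, 1), so theta* = E[b] / E[A] = 2.
    """
    rng = random.Random(seed)
    theta = 0.0
    a_hat = -1.0  # initial Jacobian estimate; overwritten at n = 1 since gamma_1 = 1
    for n in range(1, n_iter + 1):
        a_n = -(1.0 + rng.random())     # Jacobian of f w.r.t. theta for this sample
        b_n = -3.0 + rng.gauss(0.0, 1.0)
        f_n = a_n * theta - b_n         # noisy evaluation of the mean field
        gamma = n ** -rho               # larger (faster) step for the Jacobian estimate
        alpha = 1.0 / n                 # smaller (slower) step for theta
        a_hat += gamma * (a_n - a_hat)  # matrix-gain update (a scalar here); stays in [-2, -1]
        theta -= alpha * f_n / a_hat    # Newton-Raphson-like step: -alpha * A_hat^{-1} * f
    return theta


print(zap_sa_scalar(20_000, seed=1))  # should be close to theta* = 2
```

Because a_hat tracks the Jacobian, the effective gain on theta is close to the inverse of the mean Jacobian. This is the mechanism credited with accelerating convergence, although in this scalar toy it only rescales the step.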