Source author record

Amber Srivastava

Amber Srivastava appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC, Artificial Intelligence, eess.SY, Machine Learning, Systems and Control

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

preprint, 2022, arXiv

Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework

We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modelled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is to quantify exploration in terms of the Shannon entropy of the trajectories under the MDP and determine the stochastic policy that maximizes it while guaranteeing a low value of the expected cost along a trajectory. The resulting policy enhances the quality of exploration early in the learning process, and consequently allows faster convergence rates and robust solutions even in the presence of noisy data, as demonstrated in our comparisons to popular algorithms such as Q-learning, Double Q-learning and entropy regularized Soft Q-learning. The framework extends to the class of parameterized MDP and RL problems, where states and actions are parameter dependent, and the objective is to determine the optimal parameters along with the corresponding optimal policy. Here, the associated cost function may be non-convex with multiple poor local minima. Simulations on a 5G small cell network problem demonstrate successful determination of the communication routes and the small cell locations. We also obtain sensitivity measures with respect to problem parameters and demonstrate robustness to noisy environment data.
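The trade-off between entropy and expected cost described above can be illustrated with a generic entropy-regularized ("soft") value iteration. This is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not give the update equations, and the paper's objective is the entropy of whole trajectories rather than the per-step regularizer used here. The function name `soft_value_iteration`, the array layouts `P[a, s, s']` and `C[s, a]`, and the parameters `beta` and `gamma` are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_value_iteration(P, C, beta=5.0, gamma=0.9, iters=500):
    """Generic soft value iteration for a cost-minimizing MDP (illustrative only).

    P[a, s, s2] : transition probabilities (assumed layout)
    C[s, a]     : immediate costs (assumed layout)
    beta        : inverse temperature; small beta favors entropy, large beta favors low cost
    Returns the soft value function V[s] and a Gibbs policy[s, a].
    """
    n_states, _ = C.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = C[s, a] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = C + gamma * np.einsum("asn,n->sa", P, V)
        # Soft-min over actions, shifted by the row minimum for numerical stability:
        # V[s] = -(1/beta) * log sum_a exp(-beta * Q[s, a])
        m = Q.min(axis=1, keepdims=True)
        V = (m - np.log(np.exp(-beta * (Q - m)).sum(axis=1, keepdims=True)) / beta).ravel()
    Q = C + gamma * np.einsum("asn,n->sa", P, V)
    # Gibbs (softmax) policy: lower-cost actions get exponentially more weight.
    weights = np.exp(-beta * (Q - Q.min(axis=1, keepdims=True)))
    policy = weights / weights.sum(axis=1, keepdims=True)
    return V, policy

# Toy MDP (made up for illustration): action 0 moves to state 0 at cost 1,
# action 1 moves to state 1 at cost 0.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
C = np.array([[1.0, 0.0],
              [1.0, 0.0]])
V, policy = soft_value_iteration(P, C, beta=5.0)
```

In this sketch a small `beta` yields a nearly uniform, exploratory policy, while a large `beta` concentrates the policy on the cheaper action.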

preprint, 2022, arXiv

Time-Varying Parameters in Sequential Decision Making Problems

In this paper we address the class of Sequential Decision Making (SDM) problems that are characterized by time-varying parameters. These parameter dynamics are either pre-specified or manipulable. At any given time instant, the decision policy that governs the sequential decisions, together with all the parameter values, determines the cumulative cost incurred by the underlying SDM. Thus, the objective is to determine the manipulable parameter dynamics as well as the time-varying decision policy such that the associated cost is minimized at each time instant. To this end we develop a control-theoretic framework to design the unknown parameter dynamics such that they locate and track the optimal values of the parameters, and simultaneously determine the time-varying optimal sequential decision policy. Our methodology builds upon a Maximum Entropy Principle (MEP) based framework that addresses static parameterized SDMs. More precisely, we use the smooth approximation of the cumulative cost that results from that framework as a control Lyapunov function. We show that under the resulting control law the parameters asymptotically track a local optimum, that the control law is Lipschitz continuous and bounded, and that the decision policy of the SDM is optimal for a given set of parameter values. The simulations demonstrate the efficacy of our proposed methodology.

preprint, 2021, arXiv

On Determining and Qualifying the Number of Superstates in Aggregation of Markov Chains

Many studies involving large Markov chains require determining a smaller representative (aggregated) chain. Each superstate in the representative chain represents a group of related states in the original Markov chain. Typically, the choice of the number of superstates in the aggregated chain is ambiguous and based on limited prior know-how. In this paper we present a structured methodology for determining the best candidate for the number of superstates. We achieve this by comparing aggregated chains of different sizes. To facilitate this comparison we develop and quantify a notion of marginal return. Our notion captures the decrease in the heterogeneity within the group of related states (i.e., states represented by the same superstate) upon a unit increase in the number of superstates in the aggregated chain. We use the Maximum Entropy Principle to justify the notion of marginal return, as well as our quantification of heterogeneity. Through simulations on synthetic Markov chains, where the number of superstates is known a priori, we show that the aggregated chain with the largest marginal return identifies this number. In the case of Markov chains that model real-life scenarios, we show that the aggregated model with the largest marginal return identifies an inherent structure unique to the scenario being modelled, thus substantiating the efficacy of our proposed methodology.

preprint, 2020, arXiv

Inequality Constraints in Facility Location and Other Similar Optimization Problems: An Entropy Based Approach

In this paper we propose an annealing based framework to incorporate inequality constraints in optimization problems such as facility location, simultaneous facility location with path optimization, and the last mile delivery problem. These inequality constraints are used to model several application specific size and capacity limitations on the corresponding facilities, transportation paths and service vehicles. We design our algorithms so that they are allowed to (possibly) violate the constraints during the initial stages, which facilitates a thorough exploration of the solution space; as the algorithm proceeds, this violation (controlled through the annealing parameter) is gradually lowered until the solution converges in the feasible region of the optimization problem. We present simulations on various datasets that demonstrate the efficacy of our algorithm.
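For orientation, the annealing idea can be sketched on the unconstrained facility location problem using standard deterministic annealing. This is a hedged baseline only: the inequality-constraint handling that the abstract describes is not reproduced here, because the abstract does not specify it. The function name `anneal_facilities`, the squared Euclidean cost, and the schedule parameters `beta0`, `beta_max`, `growth` and `inner` are assumptions made for illustration.

```python
import numpy as np

def anneal_facilities(X, k, beta0=1e-3, beta_max=1e3, growth=1.5, inner=50, seed=0):
    """Deterministic annealing for unconstrained facility location (illustrative only).

    X : (n, d) array of demand points
    k : number of facilities
    Returns a (k, d) array of facility locations.
    """
    rng = np.random.default_rng(seed)
    # Start every facility near the data centroid with a small random offset.
    Y = X.mean(axis=0) + 1e-3 * rng.standard_normal((k, X.shape[1]))
    beta = beta0
    while beta < beta_max:
        for _ in range(inner):
            # Squared distances between demand points and facilities, shape (n, k).
            D = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            # Soft (Gibbs) associations; the row minimum is subtracted for stability.
            W = np.exp(-beta * (D - D.min(axis=1, keepdims=True)))
            W /= W.sum(axis=1, keepdims=True)
            # Each facility moves to the weighted centroid of its associated points.
            Y = (W.T @ X) / (W.sum(axis=0)[:, None] + 1e-12)
        # A tiny perturbation lets coincident facilities separate as beta grows.
        Y = Y + 1e-6 * rng.standard_normal(Y.shape)
        beta *= growth
    return Y
```

At small `beta` the associations are nearly uniform and all facilities sit at the centroid; as `beta` grows the associations harden and the facilities separate. A constrained variant would additionally need a mechanism that tightens constraint violation along the same schedule.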