Source author record

Romain Hollanders

Romain Hollanders appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Discrete Mathematics · Computational Complexity · Computer Science and Game Theory · math.CO

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


preprint · 2015 · arXiv

A complexity analysis of Policy Iteration through combinatorial matrices arising from Unique Sink Orientations

Unique Sink Orientations (USOs) are an appealing abstraction of several major optimization problems of applied mathematics, such as Linear Programming (LP), Markov Decision Processes (MDPs) and 2-player Turn-Based Stochastic Games (2TBSGs). A polynomial-time algorithm to find the sink of a USO would translate into a strongly polynomial-time algorithm for all of these problems, a major quest in all three cases. Moreover, MDPs and 2TBSGs can be translated into the problem of finding the sink of an acyclic USO of a cube, which can be done with the well-known Policy Iteration algorithm (PI). The complexity of PI is the object of this work. Despite its exponential worst-case complexity, the principle of PI is a powerful source of inspiration for other methods. As our first contribution, we disprove Hansen and Zwick's conjecture that the number of steps of PI follows the Fibonacci sequence in the worst case. Our analysis relies on a new combinatorial formulation of the problem, the so-called Order-Regularity formulation (OR). As our second contribution, we (exponentially) improve Schurr and Szabó's $\Omega(1.4142^n)$ lower bound on the number of steps of PI in the case of the OR formulation and obtain an $\Omega(1.4269^n)$ bound.
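To make the cube setting concrete, here is a minimal sketch of one common PI variant on a USO of the $n$-cube, assuming vertices are encoded as $n$-bit integers and the orientation is given by an outmap (the bitmask of dimensions whose edges leave a vertex). It uses the "switch-all" rule, jumping from $v$ to $v \oplus \mathrm{out}(v)$; this is an illustrative choice, not necessarily the exact PI formulation or the OR formulation analyzed in the paper. The example cube `km_outmap` (a hypothetical Klee-Minty-style combed construction) and all function names are assumptions. `is_uso` checks the characterization of Szabó and Welzl: for distinct vertices $u, v$, $(u \oplus v) \wedge (\mathrm{out}(u) \oplus \mathrm{out}(v)) \neq 0$.

```python
from typing import Callable, Tuple


def km_outmap(v: int, n: int) -> int:
    """Hypothetical example acyclic USO on the n-cube (Klee-Minty-style combed cube).

    Returns the bitmask of dimensions whose edges point away from vertex v.
    Lower facet (top bit 0): a copy of the (n-1)-cube; its top-dimension edge points up.
    Upper facet (top bit 1): the reversed copy; its top-dimension edge points in.
    """
    if n == 0:
        return 0
    top = 1 << (n - 1)
    low_mask = top - 1
    sub = km_outmap(v & low_mask, n - 1)
    if v & top:
        return low_mask ^ sub  # reversing a facet flips every lower-dimension edge
    return sub | top


def is_uso(outmap: Callable[[int], int], n: int) -> bool:
    """Szabo-Welzl test: the outmap defines a USO iff, for all u != v,
    (u XOR v) AND (out(u) XOR out(v)) is nonzero."""
    size = 1 << n
    s = [outmap(v) for v in range(size)]
    return all((u ^ v) & (s[u] ^ s[v])
               for u in range(size) for v in range(u + 1, size))


def policy_iteration_switch_all(outmap: Callable[[int], int], start: int,
                                max_steps: int) -> Tuple[int, int]:
    """Switch-all PI: flip every outgoing dimension at once until a sink is reached.

    Returns (sink, number_of_jumps).
    """
    v, steps = start, 0
    while (d := outmap(v)) != 0:
        if steps >= max_steps:
            raise RuntimeError("step cap reached; is the orientation acyclic?")
        v ^= d
        steps += 1
    return v, steps


if __name__ == "__main__":
    for n in range(1, 8):
        out = lambda v, n=n: km_outmap(v, n)
        assert is_uso(out, n)
        sink, steps = policy_iteration_switch_all(out, 0, max_steps=1 << n)
        print(f"n={n}: sink={sink:0{n}b}, switch-all steps from 0: {steps}")
```

On an acyclic USO the loop always terminates: $v$ is the unique source of the face spanned by its outgoing edges, so every vertex of that face, including the antipode $v \oplus \mathrm{out}(v)$, lies strictly later in a topological order. The `max_steps` cap only guards against inputs that are not acyclic.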

preprint · 2014 · arXiv

Improved bound on the worst case complexity of Policy Iteration

Solving Markov Decision Processes (MDPs) is a recurrent task in engineering. Even though it is known that solutions minimizing the infinite-horizon expected reward can be found in polynomial time using Linear Programming techniques, iterative methods such as the Policy Iteration algorithm (PI) usually remain the most efficient in practice. PI is guaranteed to converge in a finite number of steps. Unfortunately, it is known that it may require a number of steps exponential in the size of the problem to converge. At the same time, many open questions remain regarding its actual worst-case complexity. In this work, we provide the first improvement over the fifteen-year-old upper bound of Mansour & Singh (1999) by showing that PI requires at most $\frac{k}{k-1}\cdot\frac{k^n}{n} + o\left(\frac{k^n}{n}\right)$ iterations to converge, where $n$ is the number of states of the MDP and $k$ is the maximum number of actions per state. Perhaps more importantly, we also show that this bound is optimal for an important relaxation of the problem.
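For a rough sense of scale, the sketch below evaluates only the leading term $\frac{k}{k-1}\cdot\frac{k^n}{n}$ of this bound (the $o(k^n/n)$ correction is dropped, so the numbers are not exact iteration counts) and compares it with the trivial bound of at most $k^n$ policies, which holds because each PI step strictly improves the policy and no policy is visited twice. The function names are illustrative.

```python
from fractions import Fraction


def leading_term(k: int, n: int) -> Fraction:
    """Leading term k/(k-1) * k**n / n of the upper bound quoted above.

    The o(k**n / n) correction is ignored, so this is not an exact count.
    """
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 actions per state and n >= 1 states")
    return Fraction(k, k - 1) * Fraction(k ** n, n)


def trivial_bound(k: int, n: int) -> int:
    """Number of deterministic policies when every state has k actions."""
    return k ** n


if __name__ == "__main__":
    for k, n in [(2, 10), (2, 20), (3, 10)]:
        lead = leading_term(k, n)
        ratio = lead / trivial_bound(k, n)
        print(f"k={k}, n={n}: leading term ~ {float(lead):.1f}, "
              f"fraction of k^n = {ratio} ({float(ratio):.3f})")
```

For $k = 2$ and $n = 10$ the leading term is $1024/5 = 204.8$, one fifth of the $2^{10} = 1024$ policies. In general the ratio to $k^n$ is $\frac{k}{(k-1)n}$, so the gain over the trivial bound grows linearly with $n$.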

preprint · 2011 · arXiv

Policy Iteration is well suited to optimize PageRank

Whether the Policy Iteration algorithm (PI) for solving Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity is a question that has attracted much attention over the last 50 years. Recently, Fearnley proposed an example on which PI needs an exponential number of iterations to converge. However, it has been observed that Fearnley's example leaves open the possibility that PI behaves well in many particular cases, such as problems with a fixed discount factor or problems restricted to deterministic actions. In this paper, we analyze a large class of MDPs and argue that PI is efficient on this class. These problems arise when optimizing the PageRank of a particular node in the Markov chain, and they are motivated by several practical applications. We show that adding natural constraints to this PageRank Optimization problem (PRO) makes it equivalent to the problem of optimizing the length of a stochastic path, a widely studied family of MDPs. Finally, we conjecture that PI runs in a polynomial number of iterations when applied to PRO. We give numerical evidence as well as a proof of our conjecture in a number of particular cases of practical importance.
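For readers unfamiliar with PI, here is a minimal sketch of the standard (Howard-style) algorithm on a small discounted MDP, assuming NumPy, a reward-maximization objective and a fixed discount factor `gamma`, one of the particular cases mentioned above. It is not the PageRank Optimization formulation of the paper; the array layout `P[a, s, t]`, `R[a, s]` and the toy instance are assumptions. Each iteration evaluates the current policy exactly by solving a linear system, then switches every state whose best action strictly improves on its current one.

```python
import numpy as np


def policy_iteration(P: np.ndarray, R: np.ndarray, gamma: float,
                     max_iter: int = 1000, tol: float = 1e-12):
    """Howard-style policy iteration for a discounted MDP (maximization).

    P[a, s, t]: probability of moving from state s to t under action a.
    R[a, s]:    expected immediate reward of action a in state s.
    Returns (policy, values, number_of_policy_evaluations).
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy
    for evaluations in range(1, max_iter + 1):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, states, :]
        r_pi = R[policy, states]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: one-step lookahead values Q[a, s].
        Q = R + gamma * (P @ v)
        best = Q.argmax(axis=0)
        improves = Q[best, states] > Q[policy, states] + tol
        if not improves.any():
            return policy, v, evaluations  # no strictly improving switch: optimal
        policy = np.where(improves, best, policy)  # switch all improving states
    raise RuntimeError("policy iteration did not converge within max_iter")


if __name__ == "__main__":
    # Hypothetical toy MDP: staying in state 1 pays 1 per step;
    # from state 0 one must first move to state 1.
    P = np.zeros((2, 2, 2))
    P[0, 0, 0] = 1.0  # action 0 in state 0: stay
    P[1, 0, 1] = 1.0  # action 1 in state 0: go to state 1
    P[0, 1, 1] = 1.0  # action 0 in state 1: stay
    P[1, 1, 0] = 1.0  # action 1 in state 1: go to state 0
    R = np.array([[0.0, 1.0],   # rewards of action 0 in states 0, 1
                  [0.0, 0.0]])  # rewards of action 1 in states 0, 1
    policy, v, evals = policy_iteration(P, R, gamma=0.9)
    print(policy, v, evals)
```

On this toy instance, PI starting from "always stay" reaches the policy [1, 0] (move to state 1, then stay) after two evaluations, with values 9 and 10. Switching all improving states at once is the classical Howard rule; other switching rules change the iteration count but not the optimum reached.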