On Good-for-MDPs Automata
Nondeterministic good-for-MDPs (GFM) automata are for MDP model checking and reinforcement learning what good-for-games (GFG) automata are for reactive synthesis: a more compact alternative to deterministic automata that displays nondeterminism, but only so much that it can be resolved locally, such that a syntactic product can be analysed. GFM has recently been introduced as a property for reinforcement learning, where the simpler Büchi acceptance conditions it allows to use is key. However, while there are classic and novel techniques to obtain automata that are GFM, there has not been a decision procedure for checking whether or not an automaton is GFM. We show that GFM-ness is decidable and provide an EXPTIME decision procedure as well as a PSPACE-hardness proof. We also compare the succinctness of GFM automata with other types of automata with restricted nondeterminism. The first natural comparison point are GFG automata. Deterministic automata are GFG, and GFG automata are GFM, but not vice versa. This raises the question of how these classes relate in terms of succinctness. GFG automata are known to be exponentially more succinct than deterministic automata, but the gap between GFM and GFG automata as well as the gap between ordinary nondeterministic automata and those that are GFM have been open. We establish that these gaps are exponential, and sharpen this result by showing that the latter gap remains exponential when restricting the nondeterministic automata to separating safety or unambiguous reachability automata.