Mikaël Monet

Topics: Data Structures and Algorithms; Databases; Computational Complexity; Discrete Mathematics; Formal Languages and Automata Theory

Catalog footprint: 4 works, 5 topics, 4 close collaborators

Preprint, 2023, arXiv

Enumerating Regular Languages with Bounded Delay

We study the task, for a given language $L$, of enumerating the (generally infinite) sequence of its words, without repetitions, while bounding the delay between two consecutive words. To allow for delay bounds that do not depend on the current word length, we assume a model where we produce each word by editing the preceding word with a small edit script, rather than writing out the word from scratch. In particular, this witnesses that the language is orderable, i.e., we can write its words as an infinite sequence such that the Levenshtein edit distance between any two consecutive words is bounded by a value that depends only on the language. For instance, $(a+b)^*$ is orderable (with a variant of the Gray code), but $a^* + b^*$ is not. We characterize which regular languages are enumerable in this sense, and show that this can be decided in PTIME in an input deterministic finite automaton (DFA) for the language. In fact, we show that, given a DFA $A$, we can compute in PTIME automata $A_1, \ldots, A_t$ such that $L(A)$ is partitioned as $L(A_1) \sqcup \ldots \sqcup L(A_t)$ and every $L(A_i)$ is orderable in this sense. Further, we show that the value of $t$ obtained is optimal, i.e., we cannot partition $L(A)$ into fewer than $t$ orderable languages. In the case where $L(A)$ is orderable (i.e., $t=1$), we show that the ordering can be produced by a bounded-delay algorithm: specifically, the algorithm runs in a suitable pointer machine model, and produces a sequence of bounded-length edit scripts to visit the words of $L(A)$ without repetitions, with bounded delay (exponential in $|A|$) between consecutive scripts. In fact, we show that we can achieve this while only allowing the edit operations push and pop at the beginning and end of the word, which implies that the word can in fact be maintained in a double-ended queue.
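The claim that $(a+b)^*$ is orderable can be illustrated with a small sketch. It is not the paper's algorithm: it uses substitutions inside the word instead of push and pop at the ends, and the function names (`order_all_words`, `to_word`, `levenshtein`) are hypothetical. The idea, under these assumptions, is to walk a reflected Gray code over all words of length $n$, then append one letter to the last word and walk a shifted Gray code over length $n+1$, so that consecutive words are always at edit distance 1.

```python
from itertools import islice


def levenshtein(u: str, v: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        cur = [i]
        for j, cv in enumerate(v, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cu != cv)))
        prev = cur
    return prev[-1]


def to_word(bits: int, n: int) -> str:
    """Render the n low-order bits as a word over {a, b} (0 -> a, 1 -> b)."""
    return format(bits, f"0{n}b").replace("0", "a").replace("1", "b")


def order_all_words():
    """Yield every word of (a+b)^* exactly once, by increasing length.

    Illustrative assumption: within one length we follow a Gray code
    (one substitution per step); between lengths we append one 'a'.
    """
    yield ""
    current = 0  # bit encoding of the last word produced
    n = 0        # its length
    while True:
        n += 1
        start = current << 1  # append 'a' to the last word: one insertion
        for i in range(1 << n):
            # i ^ (i >> 1) is the reflected Gray code; XOR with `start`
            # shifts it so that the walk begins at `start`.
            current = start ^ (i ^ (i >> 1))
            yield to_word(current, n)


if __name__ == "__main__":
    print(list(islice(order_all_words(), 7)))
```

Because the Gray code is cyclic, it can start from any word of a given length, which is what makes the transition between lengths cost a single insertion. By contrast, no such ordering exists for $a^* + b^*$, since moving from a long word of $a^*$ to any word of $b^*$ requires a number of edits that grows with the word length.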

Preprint, 2023, arXiv

Weighted Counting of Matchings in Unbounded-Treewidth Graph Families

We consider a weighted counting problem on matchings, denoted $\textrm{PrMatching}(\mathcal{G})$, on an arbitrary fixed graph family $\mathcal{G}$. The input consists of a graph $G\in \mathcal{G}$ and of rational probabilities of existence on every edge of $G$, assuming independence. The output is the probability of obtaining a matching of $G$ in the resulting distribution, i.e., a set of edges that are pairwise disjoint. It is known that, if $\mathcal{G}$ has bounded treewidth, then $\textrm{PrMatching}(\mathcal{G})$ can be solved in polynomial time. In this paper we show that, under some assumptions, bounded treewidth in fact characterizes the tractable graph families for this problem. More precisely, we show intractability for all graph families $\mathcal{G}$ satisfying the following treewidth-constructibility requirement: given an integer $k$ in unary, we can construct in polynomial time a graph $G \in \mathcal{G}$ with treewidth at least $k$. Our hardness result is then the following: for any treewidth-constructible graph family $\mathcal{G}$, the problem $\textrm{PrMatching}(\mathcal{G})$ is intractable. This generalizes known hardness results for weighted matching counting under some restrictions that do not bound treewidth, e.g., being planar, 3-regular, or bipartite; it also answers a question left open in Amarilli, Bourhis and Senellart (PODS'16). We also obtain a similar lower bound for the weighted counting of edge covers.
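To make the definition of $\textrm{PrMatching}$ concrete, here is a brute-force sketch that follows the problem statement directly: it enumerates every subset of edges, keeps those that form a matching, and sums their probabilities under independence. It runs in exponential time and is only meant to illustrate what is being computed, not the polynomial-time bounded-treewidth algorithm mentioned above. The names `pr_matching` and `is_matching`, and the dictionary input format, are assumptions made for this example; the graph is assumed to be simple.

```python
from fractions import Fraction
from itertools import combinations


def is_matching(edges):
    """Return True if no two edges in `edges` share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def pr_matching(edge_probs):
    """Probability that the randomly kept edges form a matching.

    edge_probs maps each edge (u, v) to its independent probability of
    existence, given as a Fraction.
    """
    edges = list(edge_probs)
    total = Fraction(0)
    for k in range(len(edges) + 1):
        for kept in combinations(edges, k):
            if not is_matching(kept):
                continue
            # Probability of this exact outcome: kept edges exist,
            # all other edges do not.
            weight = Fraction(1)
            for e in edges:
                weight *= edge_probs[e] if e in kept else 1 - edge_probs[e]
            total += weight
    return total


if __name__ == "__main__":
    half = Fraction(1, 2)
    path = {("a", "b"): half, ("b", "c"): half}
    print(pr_matching(path))  # 3/4: only the outcome keeping both edges fails
```

On the two-edge path, three of the four equally likely outcomes are matchings, which gives 3/4.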

Preprint, 2022, arXiv

Computing the Shapley Value of Facts in Query Answering

The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. We present in this paper two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley values that adopts tools from probabilistic query evaluation. In particular, we capture the dependence of query answers on input database facts using Boolean expressions (data provenance), and then transform it, via Knowledge Compilation, into a particular circuit form for which we devise an algorithm for computing the Shapley values. Our second practical solution is a faster yet inexact approach that transforms the provenance to a Conjunctive Normal Form and uses a heuristic to compute the Shapley values. Our experiments on TPC-H and IMDB demonstrate the practical effectiveness of our solutions.
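As a point of reference for what is being computed, the following sketch evaluates Shapley values of facts straight from the textbook definition, by summing weighted marginal contributions over all subsets. This is the exponential baseline, not either of the paper's two solutions. It assumes a Boolean query given as a Python predicate over a set of facts and treats every fact as a player; the names `shapley_values` and `query`, and the encoding of facts as `(relation, value)` tuples, are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(facts, query):
    """Shapley value of each fact for a Boolean query, by definition.

    For a fact f and n facts in total, sum over subsets S of the other
    facts the weight |S|! (n - |S| - 1)! / n! times the marginal
    contribution query(S + f) - query(S).
    """
    n = len(facts)
    values = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = Fraction(0)
        for k in range(n):
            coeff = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += coeff * (int(query(s | {f})) - int(query(s)))
        values[f] = total
    return values


def query(db):
    """Example Boolean query: there is an x with R(x) and S(x)."""
    return any(("R", x) in db and ("S", x) in db for _, x in db)


if __name__ == "__main__":
    facts = [("R", 1), ("S", 1), ("S", 2)]
    for fact, value in shapley_values(facts, query).items():
        print(fact, value)
```

In this toy database, R(1) and S(1) are jointly needed for the query to hold, so each receives 1/2, while S(2) never changes the answer and receives 0.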

Preprint, 2019, arXiv

Connecting Knowledge Compilation Classes and Width Parameters

The field of knowledge compilation establishes the tractability of many tasks by studying how to compile them to Boolean circuit classes obeying some requirements such as structuredness, decomposability, and determinism. However, in other settings such as intensional query evaluation on databases, we obtain Boolean circuits that satisfy some width bounds, e.g., they have bounded treewidth or pathwidth. In this work, we give a systematic picture of many circuit classes considered in knowledge compilation and show how they can be systematically connected to width measures, through upper and lower bounds. Our upper bounds show that bounded-treewidth circuits can be constructively converted to d-SDNNFs, in time linear in the circuit size and singly exponential in the treewidth; and that bounded-pathwidth circuits can similarly be converted to uOBDDs. We show matching lower bounds on the compilation of monotone DNF or CNF formulas to structured targets, assuming a constant bound on the arity (size of clauses) and degree (number of occurrences of each variable): any d-SDNNF (resp., SDNNF) for such a DNF (resp., CNF) must be of exponential size in its treewidth, and the same holds for uOBDDs (resp., n-OBDDs) when considering pathwidth. Unlike most previous work, our bounds apply to any formula of this class, not just a well-chosen family. Hence, we show that pathwidth and treewidth respectively characterize the efficiency of compiling monotone DNFs to uOBDDs and d-SDNNFs with compilation being singly exponential in the corresponding width parameter. We also show that our lower bounds on CNFs extend to unstructured compilation targets, with an exponential lower bound in the treewidth (resp., pathwidth) when compiling monotone CNFs of constant arity and degree to DNNFs (resp., nFBDDs).
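One way to see why decomposability and determinism make tasks tractable is probability computation. In the sketch below, a decomposable AND has children over disjoint variables, so their probabilities multiply under independence; a deterministic OR has mutually exclusive children, so their probabilities add. The tuple encoding of circuits and the function name `probability` are assumptions made for this illustration, negation is allowed only on variables, and the code does not check that the input circuit actually satisfies the two properties.

```python
from fractions import Fraction


def probability(node, p):
    """Probability that a d-DNNF circuit evaluates to true.

    node is a nested tuple: ("var", x), ("not", x), ("and", c1, c2, ...)
    or ("or", c1, c2, ...). p maps each variable to its independent
    probability of being true.
    """
    kind = node[0]
    if kind == "var":
        return p[node[1]]
    if kind == "not":
        return 1 - p[node[1]]
    if kind == "and":
        # Decomposable AND: children share no variables, so multiply.
        result = Fraction(1)
        for child in node[1:]:
            result *= probability(child, p)
        return result
    if kind == "or":
        # Deterministic OR: children are mutually exclusive, so add.
        return sum((probability(child, p) for child in node[1:]), Fraction(0))
    raise ValueError(f"unknown node kind: {kind!r}")


if __name__ == "__main__":
    # x XOR y written as (x AND NOT y) OR (NOT x AND y).
    xor = ("or",
           ("and", ("var", "x"), ("not", "y")),
           ("and", ("not", "x"), ("var", "y")))
    print(probability(xor, {"x": Fraction(1, 2), "y": Fraction(1, 3)}))  # 1/2
```

This sketch represents the circuit as a tree; on a circuit with shared gates, caching the value of each gate would keep the evaluation linear in the circuit size.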