Source author record

Florian Ingels

Florian Ingels appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms, Discrete Mathematics, Machine Learning, math.CO

Catalog footprint

What is connected

4 works

4 topics

1 close collaborator

Preprint, 2023, arXiv

Detection of Common Subtrees with Identical Label Distribution

Frequent pattern mining is a relevant method for analysing structured data such as sequences, trees or graphs. It consists of identifying characteristic substructures of a dataset. This paper deals with a new type of pattern for tree data: common subtrees with identical label distribution. Their detection is far from obvious, since the underlying isomorphism problem is graph-isomorphism complete. An elaborate search algorithm is developed and analysed from both theoretical and numerical perspectives. Based on this, the enumeration of patterns is performed through a new lossless compression scheme for trees, called DAG-RW, whose complexity is investigated as well. The method shows very good properties, both in terms of computation times and in the analysis of real datasets from the literature. Compared to other substructures, such as topological subtrees and labelled subtrees, for which the isomorphism problem is linear, the patterns found provide a more parsimonious representation of the data.

Preprint, 2022, arXiv

Enumeration of Irredundant Forests

Reverse search is a convenient method for enumerating structured objects that can be used both to address theoretical issues and to solve data mining problems. This method has already been successfully developed to handle unordered trees. While the literature proposes solutions for enumerating single trees, we study in this article a more general problem: the enumeration of sets of trees, i.e., forests. Specifically, we mainly study irredundant forests, i.e., forests in which no tree is a subtree of another. By compressing each such forest into a Directed Acyclic Graph (DAG), we develop a reverse-search-like method to enumerate DAGs compressing irredundant forests. Remarkably, we prove that these DAGs are in bijection with the row-Fishburn matrices, a well-studied class of combinatorial objects. In a second step, we adapt our irredundant forest enumeration to provide algorithms for tackling related problems: (i) the enumeration of forests in their classical sense (where redundancy is allowed); (ii) the enumeration of "subforests" of a forest; and (iii) the frequent "subforest" mining problem. All the methods presented in this article enumerate each item uniquely, up to isomorphism.
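To make the compression step concrete, here is a minimal Python sketch of how a forest of unordered, unlabelled trees could be compressed into a DAG and tested for irredundancy. It is not the authors' algorithm. The nested-tuple tree encoding, the function names `compress` and `is_irredundant`, the reading of "subtree" as a node together with all of its descendants, and the choice to treat duplicate trees as redundant are all assumptions made for illustration.

```python
def compress(forest):
    """Share identical unordered subtrees; return (dag, roots).

    A tree is a nested tuple of its children; a leaf is the empty tuple ().
    dag maps a node id to the sorted tuple of its children's ids.
    """
    ids = {}  # sorted tuple of child ids -> node id

    def visit(node):
        key = tuple(sorted(visit(child) for child in node))
        # A new distinct subtree receives the next free id.
        return ids.setdefault(key, len(ids))

    roots = [visit(tree) for tree in forest]
    dag = {node_id: key for key, node_id in ids.items()}
    return dag, roots


def is_irredundant(forest):
    """True if no tree of the forest is a subtree of (or equal to) another."""
    dag, roots = compress(forest)
    if len(set(roots)) != len(roots):
        return False  # assumption: duplicate trees count as redundant

    def descendants(node_id):
        seen, stack = set(), list(dag[node_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(dag[current])
        return seen

    return not any(r in descendants(s) for s in roots for r in roots if r != s)


leaf = ()
cherry = (leaf, leaf)   # a root with two leaf children
chain = (leaf,)         # a root with one leaf child
print(is_irredundant([cherry, chain]))           # neither tree contains the other
print(is_irredundant([cherry, (cherry, leaf)]))  # cherry occurs inside the second tree
```

Because identical subtrees collapse to a single DAG node, the irredundancy test reduces to checking that no root of the DAG is reachable from another root.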

Preprint, 2021, arXiv

Isomorphic unordered labeled trees up to substitution ciphering

Given two messages written as linear sequences of letters, it is immediate to determine whether one can be transformed into the other by a simple substitution cipher of the letters. On the other hand, if the letters are carried as labels on nodes of topologically isomorphic unordered trees, determining whether a substitution exists is referred to as the marked tree isomorphism problem in the literature and has been shown to be as hard as graph isomorphism. While the left-to-right direction provides the cipher of letters in the case of linear messages, if the messages are carried by unordered trees, the cipher is given by a tree isomorphism. The number of isomorphisms between two trees is roughly exponential in the size of the trees, which makes the problem of finding a cipher difficult by exhaustive search. This paper presents a method that aims to break the combinatorics of the isomorphism search space. We show that in linear time (in the size of the trees), we reduce the cardinality of this space by an exponential factor on average.
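The "immediate" linear case mentioned in the abstract can be illustrated with a short Python sketch. It covers only messages given as plain strings, not the tree case the paper addresses. The function name `substitution_cipher` is hypothetical, and the sketch assumes that a cipher must be a one-to-one mapping between letters.

```python
def substitution_cipher(a: str, b: str):
    """Return the letter mapping turning a into b, or None if none exists.

    The left-to-right order pairs the letters, so one pass is enough.
    """
    if len(a) != len(b):
        return None
    forward, backward = {}, {}
    for x, y in zip(a, b):
        # Each letter must keep one image, and each image one preimage.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return None
    return forward


print(substitution_cipher("hello", "uryyb"))  # a consistent mapping exists
print(substitution_cipher("ab", "cc"))        # two letters collide on 'c'
```

For unordered trees no such canonical pairing of positions is available, which is where the search over tree isomorphisms described in the abstract comes in.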

Preprint, 2020, arXiv

The Weight Function in the Subtree Kernel is Decisive

Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data rather than fixed by the user, as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees that is particularly suitable for tuning parameters. We show through eight real data classification problems the great efficiency of our approach, in particular for small datasets, which also demonstrates the high importance of the weight function. Finally, a visualization tool of the significant features is derived.
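As a rough illustration of the role of the weight function, the following Python sketch computes a subtree-kernel-style similarity between two unordered, unlabelled trees. It is a simplified assumption-laden sketch, not the framework of the paper: trees are encoded as nested tuples, a "subtree" is taken to be a node with all of its descendants, the kernel is assumed to be the weighted sum of products of subtree counts, and the weight is assumed to depend only on subtree height. All function names are hypothetical.

```python
from collections import Counter


def height(canonical):
    """Height of a canonical subtree; a leaf () has height 0."""
    return 0 if not canonical else 1 + max(height(c) for c in canonical)


def subtree_counts(tree):
    """Count, for each distinct unordered subtree, how many nodes root it."""
    counts = Counter()

    def visit(node):
        # Sorting the children's canonical forms makes child order irrelevant.
        key = tuple(sorted(visit(child) for child in node))
        counts[key] += 1
        return key

    visit(tree)
    return counts


def subtree_kernel(t1, t2, weight):
    """Sum over shared subtrees s of weight(height(s)) * N_s(t1) * N_s(t2)."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(weight(height(s)) * c1[s] * c2[s] for s in c1.keys() & c2.keys())


leaf = ()
cherry = (leaf, leaf)
t1 = (leaf, cherry)
t2 = (cherry, cherry)

print(subtree_kernel(t1, t2, lambda h: 1))                   # leaves dominate the sum
print(subtree_kernel(t1, t2, lambda h: 0 if h == 0 else 1))  # leaf weight set to zero
```

In this toy example the leaves contribute most of the uniform-weight value, although every tree contains leaves and they carry little discriminating information. Setting their weight to zero leaves only the shared internal structure, which is the intuition behind the abstract's claim about vanishing leaf weights.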