Source author record

Danièle Gardy

Danièle Gardy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.CO, math.PR, Data Structures and Algorithms, Discrete Mathematics, Machine Learning, math.LO

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators


Preprint, 2022, arXiv

A Computational Model for Logical Analysis of Data

Initially introduced by Peter Hammer, Logical Analysis of Data (LAD) is a methodology that aims at computing a logical justification for dividing a group of data into two groups of observations, usually called the positive and negative groups. Consider this partition into positive and negative groups as the description of a partially defined Boolean function; the data is then processed to identify a subset of attributes whose values may be used to characterize the observations of the positive group against those of the negative group. LAD constitutes an interesting rule-based learning alternative to classic statistical learning techniques and has many practical applications. Nevertheless, the computation of group characterization may be costly, depending on the properties of the data instances. A major aim of our work is to provide effective tools for speeding up the computations, by computing some a priori probability that a given set of attributes does characterize the positive and negative groups. To this effect, we propose several models for representing the data set of observations, according to the information we have on it. These models, and the probabilities they allow us to compute, are also helpful for quickly assessing some properties of the real data at hand; furthermore they may help us to better analyze and understand the computational difficulties encountered by solving methods. Once our models have been established, the mathematical tools for computing probabilities come from Analytic Combinatorics. They allow us to express the desired probabilities as ratios of generating function coefficients, which then provide a quick computation of their numerical values. A further, long-range goal of this paper is to show that the methods of Analytic Combinatorics can help in analyzing the performance of various algorithms in LAD and related fields.
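As a rough illustration of what it could mean for a set of attributes to characterize the two groups, the sketch below checks that no positive observation and no negative observation agree on every chosen attribute. This reading of "characterize", the function name `separates`, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple

Observation = Tuple[int, ...]  # one Boolean value (0 or 1) per attribute


def separates(
    positives: Sequence[Observation],
    negatives: Sequence[Observation],
    attributes: Sequence[int],
) -> bool:
    """Return True if no positive and negative observation coincide
    on all of the chosen attribute indices (hypothetical helper)."""

    def project(observation: Observation) -> Tuple[int, ...]:
        # Keep only the values of the chosen attributes.
        return tuple(observation[i] for i in attributes)

    positive_patterns = {project(o) for o in positives}
    return all(project(o) not in positive_patterns for o in negatives)


# Toy data (assumed): three Boolean attributes per observation.
positives = [(1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 1), (0, 1, 0)]

print(separates(positives, negatives, [0]))  # attribute 0 alone
print(separates(positives, negatives, [2]))  # attribute 2 alone
```

A check of this kind costs time linear in the number of observations for one candidate set, but the number of candidate sets grows exponentially with the number of attributes, which is one reason an a priori probability estimate can be useful.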

Preprint, 2015, arXiv

2-Xor revisited: satisfiability and probabilities of functions

The problem 2-Xor-Sat asks for the probability that a random expression, built as a conjunction of clauses $x \oplus y$, is satisfiable. We revisit this classical problem by giving an alternative, explicit expression of this probability. We then consider a refinement of it, namely the probability that a random expression computes a specific Boolean function. The answers to both problems involve a description of 2-Xor expressions as multigraphs and use classical methods of analytic combinatorics by expressing probabilities through coefficients of generating functions.
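The graph view mentioned in the abstract can be sketched in a few lines. A clause $x \oplus y$ is true exactly when $x$ and $y$ take different values, so a conjunction of such clauses is satisfiable when the (multi)graph with one edge per clause can be 2-colored, that is, when it has no odd cycle and no self-loop. The code below is a minimal sketch of that deterministic check only; the function name `is_2xor_satisfiable` and the input encoding are assumptions, and the paper's probabilistic analysis through generating functions is not reproduced here.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def is_2xor_satisfiable(num_vars: int, clauses: Sequence[Tuple[int, int]]) -> bool:
    """Decide satisfiability of a conjunction of clauses (x XOR y).

    Variables are numbered 0..num_vars-1; each clause is a pair (x, y).
    """
    adjacency: List[List[int]] = [[] for _ in range(num_vars)]
    for x, y in clauses:
        if x == y:
            return False  # x XOR x is always false
        adjacency[x].append(y)
        adjacency[y].append(x)

    value: List[Optional[bool]] = [None] * num_vars
    for start in range(num_vars):
        if value[start] is not None:
            continue
        # Breadth-first 2-coloring of the connected component of `start`.
        value[start] = False
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in adjacency[x]:
                if value[y] is None:
                    value[y] = not value[x]
                    queue.append(y)
                elif value[y] == value[x]:
                    return False  # odd cycle: the clauses contradict each other
    return True


print(is_2xor_satisfiable(3, [(0, 1), (1, 2)]))          # a path
print(is_2xor_satisfiable(3, [(0, 1), (1, 2), (0, 2)]))  # a triangle
```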

Preprint, 2015, arXiv

B-urns

The fringe of a B-tree with parameter $m$ is considered as a particular Pólya urn with $m$ colors. More precisely, the asymptotic behaviour of this fringe, when the number of stored keys tends to infinity, is studied through the composition vector of the fringe nodes. We establish its typical behaviour together with the fluctuations around it. The well known phase transition in Pólya urns has the following effect on B-trees: for $m\leq 59$, the fluctuations are asymptotically Gaussian, whereas for $m\geq 60$, the composition vector is oscillating; after scaling, the fluctuations of such an urn strongly converge to a random variable $W$. This limit is $\mathbb C$-valued and it does not seem to follow any classical law. Several properties of $W$ are shown: existence of exponential moments, characterization of its distribution as the solution of a smoothing equation, existence of a density relative to the Lebesgue measure on $\mathbb C$, support of $W$. Moreover, a few representations of the composition vector for various values of $m$ illustrate the different kinds of convergence.

Preprint, 2015, arXiv

On the number of unary-binary tree-like structures with restrictions on the unary height

We consider various classes of Motzkin trees as well as lambda-terms for which we derive asymptotic enumeration results. These classes are defined through various restrictions concerning the unary nodes or abstractions, respectively: We either bound their number or the allowed levels of nesting. The enumeration is done by means of a generating function approach and singularity analysis. The generating functions are composed of nested square roots and exhibit unexpected phenomena in some of the cases. Furthermore, we present some observations obtained from generating such terms randomly and explain why usually powerful tools for random generation, such as Boltzmann samplers, face serious difficulties in generating lambda-terms.

Preprint, 2013, arXiv

Enumeration of generalized $BCI$ lambda-terms

We investigate the asymptotic number of elements of size $n$ in a particular class of closed lambda-terms (so-called $BCI(p)$-terms) which are related to axiom systems of combinatory logic. By deriving a differential equation for the generating function of the counting sequence we obtain a recurrence relation which can be solved asymptotically. We derive differential equations for the generating functions of the counting sequences of other more general classes of terms as well: the class of $BCK(p)$-terms and that of closed lambda-terms. Using elementary arguments we obtain upper and lower estimates for the number of closed lambda-terms of size $n$. Moreover, a recurrence relation is derived which allows an efficient computation of the counting sequence. $BCK(p)$-terms are discussed briefly.

Preprint, 2012, arXiv

The weighted words collector

Motivated by applications in bioinformatics, we consider the word collector problem, i.e. the expected number of calls to a random weighted generator of words of length $n$ before the full collection is obtained. The originality of this instance of the non-uniform coupon collector lies in the potentially large multiplicity of the words/coupons of a given probability/composition. We obtain a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application to a given language essentially necessitates some knowledge on the number of words of a given composition/probability. We illustrate the application of our theorem, in a step-by-step fashion, on three exemplary languages, revealing asymptotic regimes in $\Theta(\mu(n)\cdot n)$ and $\Theta(\mu(n)\cdot \log n)$, where $\mu(n)$ is the sum of weights over words of length $n$.
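For small collections, the expected waiting time of the non-uniform coupon collector can be computed exactly with the classical inclusion-exclusion formula $E[T] = \sum_{\emptyset \neq S} (-1)^{|S|+1} / \sum_{i \in S} p_i$. The sketch below uses that textbook formula, not the asymptotic theorem of the paper; the function name `expected_collection_time` is an assumption, and the enumeration of all subsets makes it practical only for a handful of coupons.

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterable


def expected_collection_time(probabilities: Iterable[Fraction]) -> Fraction:
    """Exact expected number of draws needed to see every coupon at least once.

    `probabilities` are the draw probabilities of the coupons and should sum to 1.
    The running time is exponential in the number of coupons.
    """
    probs = [Fraction(p) for p in probabilities]
    total = Fraction(0)
    for size in range(1, len(probs) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(probs, size):
            # Each nonempty subset S contributes (+/-) 1 / P(S).
            total += sign / sum(subset)
    return total


# Uniform case with 3 coupons: 3 * (1 + 1/2 + 1/3) = 11/2.
print(expected_collection_time([Fraction(1, 3)] * 3))
```

The subset enumeration is exactly what becomes infeasible when many words share the same probability, which is the situation the abstract describes as high multiplicity.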

Preprint, 2010, arXiv

Weighted random generation of context-free languages: Analysis of collisions in random urn occupancy models

The present work analyzes the redundancy of sets of combinatorial objects produced by a weighted random generation algorithm proposed by Denise et al. This scheme associates weights to the terminal symbols of a weighted context-free grammar, extends this weight definition multiplicatively on words, and draws words of length $n$ with probability proportional to their weight. We investigate the level of redundancy within a sample of $k$ words, the proportion of the total probability covered by $k$ words (coverage), the time (number of generations) of the first collision, and the time of the full collection. For these four questions, we use an analytic urn analogy to derive asymptotic estimates and/or polynomially computable exact forms. We illustrate these tools by an analysis of an RNA secondary structure statistical sampling algorithm introduced by Ding et al.
