Researcher profile

Yann Ponty

Yann Ponty contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2013arXiv

Flexible RNA design under structure and sequence constraints using formal languages

The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, a given structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. To this purpose, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability demonstrated of the method on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of \emph{in vitro} experiments.

preprint2013arXiv

Using structural and evolutionary information to detect and correct pyrosequencing errors in non-coding RNAs

Analysis of the sequence-structure relationship in RNA molecules are essential to evolutionary studies but also to concrete applications such as error-correction methodologies in sequencing technologies. The prohibitive sizes of the mutational and conformational landscapes combined with the volume of data to proceed require efficient algorithms to compute sequence-structure properties. More specifically, here we aim to calculate which mutations increase the most the likelihood of a sequence to a given structure and RNA family. In this paper, we introduce RNApyro, an efficient linear-time and space inside-outside algorithm that computes exact mutational probabilities under secondary structure and evolutionary constraints given as a multiple sequence alignment with a consensus structure. We develop a scoring scheme combining classical stacking base pair energies to novel isostericity scales, and apply our techniques to correct point-wise errors in 5s and 16s rRNA sequences. Our results suggest that RNApyro is a promising algorithm to complement existing tools in the NGS error-correction pipeline.

preprint2012arXiv

Non-redundant random generation algorithms for weighted context-free languages

We address the non-redundant random generation of $k$ words of length $n$ in a context-free language. Additionally, we want to avoid a predefined set of words. We study a rejection-based approach, whose worst-case time complexity is shown to grow exponentially with $k$ for some specifications and in the limit case of a coupon collector. We propose two algorithms respectively based on the recursive method and on an unranking approach. We show how careful implementations of these algorithms allow for a non-redundant generation of $k$ words of length $n$ in $\mathcal{O}(k\cdot n\cdot \log{n})$ arithmetic operations, after a precomputation of $Θ(n)$ numbers. The overall complexity is therefore dominated by the generation of $k$ words, and the non-redundancy comes at a negligible cost.

preprint2012arXiv

Rule-weighted and terminal-weighted context-free grammars have identical expressivity

Two formalisms, both based on context-free grammars, have recently been proposed as a basis for a non-uniform random generation of combinatorial objects. The former, introduced by Denise et al, associates weights with letters, while the latter, recently explored by Weinberg et al in the context of random generation, associates weights to transitions. In this short note, we use a simple modification of the Greibach Normal Form transformation algorithm, due to Blum and Koch, to show the equivalent expressivities, in term of their induced distributions, of these two formalisms.

preprint2012arXiv

The weighted words collector

Motivated by applications in bioinformatics, we consider the word collector problem, i.e. the expected number of calls to a random weighted generator of words of length $n$ before the full collection is obtained. The originality of this instance of the non-uniform coupon collector lies in the, potentially large, multiplicity of the words/coupons of a given probability/composition. We obtain a general theorem that gives an asymptotic equivalent for the expected waiting time of a general version of the Coupon Collector. This theorem is especially well-suited for classes of coupons featuring high multiplicities. Its application to a given language essentially necessitates some knowledge on the number of words of a given composition/probability. We illustrate the application of our theorem, in a step-by-step fashion, on three exemplary languages, revealing asymptotic regimes in $Θ(μ(n)\cdot n)$ and $Θ(μ(n)\cdot \log n)$, where $μ(n)$ is the sum of weights over words of length $n$.

preprint2012arXiv

Tree decomposition and parameterized algorithms for RNA structure-sequence alignment including tertiary interactions and pseudoknots

We present a general setting for structure-sequence comparison in a large class of RNA structures that unifies and generalizes a number of recent works on specific families on structures. Our approach is based on tree decomposition of structures and gives rises to a general parameterized algorithm, where the exponential part of the complexity depends on the family of structures. For each of the previously studied families, our algorithm has the same complexity as the specific algorithm that had been given before.

preprint2011arXiv

A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms

We extend an hypergraph representation, introduced by Finkelstein and Roytberg, to unify dynamic programming algorithms in the context of RNA folding with pseudoknots. Classic applications of RNA dynamic programming energy minimization, partition function, base-pair probabilities...) are reformulated within this framework, giving rise to very simple algorithms. This reformulation allows one to conceptually detach the conformation space/energy model -- captured by the hypergraph model -- from the specific application, assuming unambiguity of the decomposition. To ensure the latter property, we propose a new combinatorial methodology based on generating functions. We extend the set of generic applications by proposing an exact algorithm for extracting generalized moments in weighted distribution, generalizing a prior contribution by Miklos and al. Finally, we illustrate our full-fledged programme on three exemplary conformation spaces (secondary structures, Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets of algorithms that are either novel or have complexity comparable to classic implementations for minimization and Boltzmann ensemble applications of dynamic programming.

preprint2010arXiv

Controlled non uniform random generation of decomposable structures

Consider a class of decomposable combinatorial structures, using different types of atoms $\Atoms = \{\At_1,\ldots ,\At_{|{\Atoms}|}\}$. We address the random generation of such structures with respect to a size $n$ and a targeted distribution in $k$ of its \emph{distinguished} atoms. We consider two variations on this problem. In the first alternative, the targeted distribution is given by $k$ real numbers $\TargFreq_1, \ldots, \TargFreq_k$ such that $0 < \TargFreq_i < 1$ for all $i$ and $\TargFreq_1+\cdots+\TargFreq_k \leq 1$. We aim to generate random structures among the whole set of structures of a given size $n$, in such a way that the {\em expected} frequency of any distinguished atom $\At_i$ equals $\TargFreq_i$. We address this problem by weighting the atoms with a $k$-tuple $\Weights$ of real-valued weights, inducing a weighted distribution over the set of structures of size $n$. We first adapt the classical recursive random generation scheme into an algorithm taking $\bigO{n^{1+o(1)}+mn\log{n}}$ arithmetic operations to draw $m$ structures from the $\Weights$-weighted distribution. Secondly, we address the analytical computation of weights such that the targeted frequencies are achieved asymptotically, i. e. for large values of $n$. We derive systems of functional equations whose resolution gives an explicit relationship between $\Weights$ and $\TargFreq_1, \ldots, \TargFreq_k$. Lastly, we give an algorithm in $\bigO{k n^4}$ for the inverse problem, {\it i.e.} computing the frequencies associated with a given $k$-tuple $\Weights$ of weights, and an optimized version in $\bigO{k n^2}$ in the case of context-free languages. This allows for a heuristic resolution of the weights/frequencies relationship suitable for complex specifications. In the second alternative, the targeted distribution is given by a $k$ natural numbers $n_1, \ldots, n_k$ such that $n_1+\cdots+n_k+r=n$ where $r \geq 0$ is the number of undistinguished atoms. The structures must be generated uniformly among the set of structures of size $n$ that contain {\em exactly} $n_i$ atoms $\At_i$ ($1 \leq i \leq k$). We give a $\bigO{r^2\prod_{i=1}^k n_i^2 +m n k \log n}$ algorithm for generating $m$ structures, which simplifies into a $\bigO{r\prod_{i=1}^k n_i +m n}$ for regular specifications.

preprint2010arXiv

Multi-dimensional Boltzmann Sampling of Languages

This paper addresses the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact frequency in $\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance intervals of width in $Ω(\sqrt{n})$ for the number of occurrences of each letters, our samplers perform an approximate-size generation of words in expected $\mathcal{O}(n)$ time. We illustrate these techniques on the generation of Tetris tessellations with uniform statistics in the different types of tetraminoes.

preprint2010arXiv

Non-redundant random generation from weighted context-free languages

We address the non-redundant random generation of k words of length n from a context-free language. Additionally, we want to avoid a predefined set of words. We study the limits of a rejection-based approach, whose time complexity is shown to grow exponentially in k in some cases. We propose an alternative recursive algorithm, whose careful implementation allows for a non-redundant generation of k words of size n in O(kn log n) arithmetic operations after the precomputation of O(n) numbers. The overall complexity is therefore dominated by the generation of k words, and the non-redundancy comes at a negligible cost.

preprint2010arXiv

Weighted random generation of context-free languages: Analysis of collisions in random urn occupancy models

The present work analyzes the redundancy of sets of combinatorial objects produced by a weighted random generation algorithm proposed by Denise et al. This scheme associates weights to the terminals symbols of a weighted context-free grammar, extends this weight definition multiplicatively on words, and draws words of length $n$ with probability proportional their weight. We investigate the level of redundancy within a sample of $k$ word, the proportion of the total probability covered by $k$ words (coverage), the time (number of generations) of the first collision, and the time of the full collection. For these four questions, we use an analytic urn analogy to derive asymptotic estimates and/or polynomially computable exact forms. We illustrate these tools by an analysis of an RNA secondary structure statistical sampling algorithm introduced by Ding et al.

preprint2006arXiv

Estimating seed sensitivity on homogeneous alignments

We address the problem of estimating the sensitivity of seed-based similarity search algorithms. In contrast to approaches based on Markov models [18, 6, 3, 4, 10], we study the estimation based on homogeneous alignments. We describe an algorithm for counting and random generation of those alignments and an algorithm for exact computation of the sensitivity for a broad class of seed strategies. We provide experimental results demonstrating a bias introduced by ignoring the homogeneousness condition.