Source author record

Christian M. Reidys

Christian M. Reidys appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

36works
19topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

36 published item(s)

preprint2022arXiv

A computational framework for weighted simplicial homology

We provide a bottom up construction of torsion generators for weighted homology of a weighted complex over a discrete valuation ring $R=\mathbb{F}[[π]]$. This is achieved by starting from a basis for classical homology of the $n$-th skeleton for the underlying complex with coefficients in the residue field $\mathbb{F}$ and then lifting it to a basis for the weighted homology with coefficients in the ring $R$. Using the latter, a bijection is established between $n+1$ and $n$ dimensional simplices whose weight ratios provide the exponents of the $π$-monomials that generate each torsion summand in the structure theorem of the weighted homology modules over $R$. We present algorithms that subsume the torsion computation by reducing it to normalization over the residue field of $R$, and describe a Python package we implemented that takes advantage of this reduction and performs the computation efficiently.

preprint2022arXiv

On Weighted Simplicial Homology

We develop a framework for computing the homology of weighted simplicial complexes with coefficients in a discrete valuation ring. A weighted simplicial complex, $(X,v)$, introduced by Dawson [Cah. Topol. Géom. Différ. Catég. 31 (1990), pp. 229--243], is a simplicial complex, $X$, together with an integer-valued function, $v$, assigning weights to simplices, such that the weight of any of faces are monotonously increasing. In addition, weighted homology, $H_n^v(X)$, features a new boundary operator, $\partial_n^v$. In difference to Dawson, our approach is centered at a natural homomorphism $θ$ of weighted chain complexes. The key object is $H^v_{n}(X/θ)$, the weighted homology of a quotient of chain complexes induced by $θ$, appearing in a long exact sequence linking weighted homologies with different weights. We shall construct bases for the kernel and image of the weighted boundary map, identifying $n$-simplices as either $κ_n$- or $μ_n$-vertices. Long exact sequences of weighted homology groups and the bases, allow us to prove a structure theorem for the weighted simplicial homology with coefficients in a ring of formal power series $R=\mathbb{F}[[π]]$, where $\mathbb{F}$ is a field. Relative to simplicial homology new torsion arises and we shall show that the torsion modules are connected to a pairing between distinguished $κ_n$ and $μ_{n+1}$ simplices.

preprint2016arXiv

Plane permutations and applications to a result of Zagier-Stanley and distances of permutations

In this paper, we introduce plane permutations, i.e. pairs $\mathfrak{p}=(s,π)$ where $s$ is an $n$-cycle and $π$ is an arbitrary permutation, represented as a two-row array. Accordingly a plane permutation gives rise to three distinct permutations: the permutation induced by the upper horizontal ($s$), the vertical $π$) and the diagonal ($D_{\mathfrak{p}}$) of the array. The latter can also be viewed as the three permutations of a hypermap. In particular, a map corresponds to a plane permutation, in which the diagonal is a fixed point-free involution. We study the transposition action on plane permutations obtained by permuting their diagonal-blocks. We establish basic properties of plane permutations and study transpositions and exceedances and derive various enumerative results. In particular, we prove a recurrence for the number of plane permutations having a fixed diagonal and $k$ cycles in the vertical, generalizing Chapuy's recursion for maps filtered by the genus. As applications of this framework, we present a combinatorial proof of a result of Zagier and Stanley, on the number of $n$-cycles $ω$, for which the product $ω(1~2~\cdots ~n)$ has exactly $k$ cycles. Furthermore, we integrate studies on the transposition and block-interchange distance of permutations as well as the reversal distance of signed permutations. Plane permutations allow us to generalize and recover various lower bounds for transposition and block-interchange distances and to connect reversals with block-interchanges.

preprint2016arXiv

RNA secondary structures having a compatible sequence of certain nucleotide ratios

Given a random RNA secondary structure, $S$, we study RNA sequences having fixed ratios of nuclotides that are compatible with $S$. We perform this analysis for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of the nucleotide ratios there exists a convex region in which, in the limit of long sequences, a random structure a.a.s.~has compatible sequence with these ratios and outside of which a.a.s.~a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. In particular, for {\bf GC}-sequences having a ratio of {\bf G} nucleotides smaller than $1/3$, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s.~no such compatible sequence. For sequences having a ratio of {\bf G} nucleotides larger than $1/3$, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.

preprint2016arXiv

Sequence-structure relations of biopolymers

Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded "patterns" in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence-structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments.

preprint2016arXiv

Statistics of topological RNA structures

In this paper we study properties of topological RNA structures, i.e.~RNA contact structures with cross-serial interactions that are filtered by their topological genus. RNA secondary structures within this framework are topological structures having genus zero. We derive a new bivariate generating function whose singular expansion allows us to analyze the distributions of arcs, stacks, hairpin- , interior- and multi-loops. We then extend this analysis to H-type pseudoknots, kissing hairpins as well as $3$-knots and compute their respective expectation values. Finally we discuss our results and put them into context with data obtained by uniform sampling structures of fixed genus.

preprint2016arXiv

Topological language for RNA

In this paper we introduce a novel, context-free grammar, {\it RNAFeatures$^*$}, capable of generating any RNA structure including pseudoknot structures (pk-structure). We represent pk-structures as orientable fatgraphs, which naturally leads to a filtration by their topological genus. Within this framework, RNA secondary structures correspond to pk-structures of genus zero. {\it RNAFeatures$^*$} acts on formal, arc-labeled RNA secondary structures, called $λ$-structures. $λ$-structures correspond one-to-one to pk-structures together with some additional information. This information consists of the specific rearrangement of the backbone, by which a pk-structure can be made cross-free. {\it RNAFeatures$^*$} is an extension of the grammar for secondary structures and employs an enhancement by labelings of the symbols as well as the production rules. We discuss how to use {\it RNAFeatures$^*$} to obtain a stochastic context-free grammar for pk-structures, using data of RNA sequences and structures. The induced grammar facilitates fast Boltzmann sampling and statistical analysis. As a first application, we present an $O(n log(n))$ runtime algorithm which samples pk-structures based on ninety tRNA sequences and structures from the Nucleic Acid Database (NDB).

preprint2015arXiv

A simple framework on sorting permutations

In this paper we present a simple framework to study various distance problems of permutations, including the transposition and block-interchange distance of permutations as well as the reversal distance of signed permutations. These problems are very important in the study of the evolution of genomes. We give a general formulation for lower bounds of the transposition and block-interchange distance from which the existing lower bounds obtained by Bafna and Pevzner, and Christie can be easily derived. As to the reversal distance of signed permutations, we translate it into a block-interchange distance problem of permutations so that we obtain a new lower bound. Furthermore, studying distance problems via our framework motivates several interesting combinatorial problems related to product of permutations, some of which are studied in this paper as well.

preprint2015arXiv

Narayana polynomials and some generalizations

In this note, by counting some colored plane trees we obtain several binomial identities. These identities can be viewed as specific evaluations of certain generalizations of the Narayana polynomials. As consequences, it provides combinatorial proofs for a bijective problem in Stanley's collection "Bijective Proof Problems", a new formula for the Narayana polynomials as well as a new expression for the Harer-Zagier formula enumerating unicellular maps, in a unified way. Furthermore, we identify a class of plane trees, whose enumeration is closely connected to the Schröder numbers. Many other binomial identities are presented as well.

preprint2015arXiv

On plane permutations

In this paper we generalize permutations to plane permutations. We employ this framework to derive a combinatorial proof of a result of Zagier and Stanley, that enumerates the number of $n$-cycles $ω$, for which $ω(12\cdots n)$ has exactly $k$ cycles. This quantity is $0$, if $n-k$ is odd and $\frac{2C(n+1,k)}{n(n+1)}$, otherwise, where $C(n,k)$ is the unsigned Stirling number of the first kind. The proof is facilitated by a natural transposition action on plane permutations which gives rise to various recurrences. Furthermore we study several distance problems of permutations. It turns out that plane permutations allow to study transposition and block-interchange distance of permutations as well as the reversal distance of signed permutations. Novel connections between these different distance problems are established via plane permutations.

preprint2015arXiv

Variation of the local topological structure of graph embeddings

The $2$-cell embeddings of graphs on closed surfaces have been widely studied. It is well known that ($2$-cell) embedding a given graph $G$ on a closed orientable surface is equivalent to cyclically ordering the edges incident to each vertex of $G$. In this paper, we study the following problem: given a genus $g$ embedding $\mathbb{E}$ of the graph $G$, if we randomly rearrange the edges around a vertex, i.e., re-embedding, what is the probability of the resulting embedding $\mathbb{E}'$ having genus $g+Δg$? We give a formula to compute this probability. Meanwhile, some other known and unknown results are also obtained. For example, we show that the probability of preserving the genus is at least $\frac{2}{deg(v)+2}$ for re-embedding any vertex $v$ of degree $deg(v)$ in a one-face embedding; and we obtain a necessary condition for a given embedding of $G$ to be an embedding with the minimum genus.

preprint2014arXiv

A combinatorial interpretation of the $κ^{\star}_{g}(n)$ coefficients

Studying the virtual Euler characteristic of the moduli space of curves, Harer and Zagier compute the generating function $C_g(z)$ of unicellular maps of genus $g$. They furthermore identify coefficients, $κ^{\star}_{g}(n)$, which fully determine the series $C_g(z)$. The main result of this paper is a combinatorial interpretation of $κ^{\star}_{g}(n)$. We show that these enumerate a class of unicellular maps, which correspond $1$-to-$2^{2g}$ to a specific type of trees, referred to as O-trees. O-trees are a variant of the C-decorated trees introduced by Chapuy, Féray and Fusy. We exhaustively enumerate the number $s_{g}(n)$ of shapes of genus $g$ with $n$ edges, which is a specific class of unicellular maps with vertex degree at least three. Furthermore we give combinatorial proofs for expressing the generating functions $C_g(z)$ and $S_g(z)$ for unicellular maps and shapes in terms of $κ^{\star}_{g}(n)$, respectively. We then prove a two term recursion for $κ^{\star}_{g}(n)$ and that for any fixed $g$, the sequence $\{κ_{g,t}\}_{t=0}^g$ is log-concave, where $κ^{\star}_{g}(n)= κ_{g,t}$, for $n=2g+t-1$.

preprint2014arXiv

A topological framework for signed permutations

In this paper we present a topological framework for studying signed permutations and their reversal distance. As a result we can give an alternative approach and interpretation of the Hannenhalli-Pevzner formula for the reversal distance of signed permutations. Our approach utlizes the Poincaré dual, upon which reversals act in a particular way and obsoletes the notion of "padding" of the signed permutations. To this end we construct a bijection between signed permutations and an equivalence class of particular fatgraphs, called $π$-maps, and analyze the action of reversals on the latter. We show that reversals act via either slicing, gluing or half-flipping of external vertices, which implies that any reversal changes the topological genus by at most one. Finally we revisit the Hannenhalli-Pevzner formula employing orientable and non-orientable, irreducible, $π$-maps.

preprint2014arXiv

Shapes of interacting RNA complexes

Shapes of interacting RNA complexes are studied using a filtration via their topological genus. A shape of an RNA complex is obtained by (iteratively) collapsing stacks and eliminating hairpin loops. This shape-projection preserves the topological core of the RNA complex and for fixed topological genus there are only finitely many such shapes.Our main result is a new bijection that relates the shapes of RNA complexes with shapes of RNA structures.This allows to compute the shape polynomial of RNA complexes via the shape polynomial of RNA structures. We furthermore present a linear time uniform sampling algorithm for shapes of RNA complexes of fixed topological genus.

preprint2014arXiv

Shapes of topological RNA structures

A topological RNA structure is derived from a diagram and its shape is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one. Shapes contain key topological, information and for fixed topological genus there exist only finitely many such shapes. We shall express topological RNA structures as unicellular maps, i.e. graphs together with a cyclic ordering of their half-edges. In this paper we prove a bijection of shapes of topological RNA structures. We furthermore derive a linear time algorithm generating shapes of fixed topological genus. We derive explicit expressions for the coefficients of the generating polynomial of these shapes and the generating function of RNA structures of genus $g$. Furthermore we outline how shapes can be used in order to extract essential information of RNA structure databases.

preprint2014arXiv

Uniform generation of RNA-RNA interaction structures of fixed topological genus

Interacting RNA complexes are studied via bicellular maps using a filtration via their topological genus. Our main result is a new bijection for RNA-RNA interaction structures and linear time uniform sampling algorithm for RNA complexes of fixed topological genus. The bijection allows to either reduce the topological genus of a bicellular map directly, or to lose connectivity by decomposing the complex into a pair of single stranded RNA structures. Our main result is proved bijectively. It provides an explicit algorithm of how to rewire the corresponding complexes and an unambiguous decomposition grammar. Using the concept of genus induction, we construct bicellular maps of fixed topological genus $g$ uniformly in linear time. We present various statistics on these topological RNA complexes and compare our findings with biological complexes. Furthermore we show how to construct loop-energy based complexes using our decomposition grammar.

preprint2013arXiv

A bijection between unicellular and bicellular maps

In this paper we present a combinatorial proof of a relation between the generating functions of unicellular and bicellular maps. This relation is a consequence of the Schwinger-Dyson equation of matrix theory. Alternatively it can be proved using representation theory of the symmetric group. Here we give a bijective proof by rewiring unicellular maps of topological genus $(g+1)$ into bicellular maps of genus $g$ and pairs of unicellular maps of lower topological genera. Our result has immediate consequences for the folding of RNA interaction structures, since the time complexity of folding the transformed structure is $O((n+m)^5)$, where $n,m$ are the lengths of the respective backbones, while the folding of the original structure has $O(n^6)$ time complexity, where $n$ is the length of the longer sequence.

preprint2013arXiv

Combinatorics of $γ$-structures

In this paper we study canonical $γ$-structures, a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A $γ$-structure is composed by specific building blocks, that have topological genus less than or equal to $γ$, where composition means concatenation and nesting of such blocks. Our main result is the derivation of the generating function of $γ$-structures via symbolic enumeration using so called irreducible shadows. We furthermore recursively compute the generating polynomials of irreducible shadows of genus $\le γ$. $γ$-structures are constructed via $γ$-matchings. For $1\le γ\le 10$, we compute Puiseux-expansions at the unique, dominant singularities, allowing us to derive simple asymptotic formulas for the number of $γ$-structures.

preprint2013arXiv

On the genus filtration of diagrams over two backbones

In this paper we compute the bivariate generating function of $γ$-matchings over two backbones, filtered by the number of arcs and the topological genus. $γ$-matchings over two backbones are chord-diagrams, obtained via concatenation and nesting of irreducible shapes of topological genus $\le γ$. We show that the key information is contained in the polynomials counting these shapes and provide recursions that allow to compute the latter. In particular we give a bijection between such irreducible shapes over one and two backbones. We present two applications of our results. The first is concerned with RNA-RNA interaction structures, obtained from the $γ$-matchings via symbolic methods. We secondly show that, using analytic-combinatorial methods, the topological genus satisfies a central limit theorem.

preprint2013arXiv

Uniform generation of RNA pseudoknot structures with genus filtration

In this paper we present a sampling framework for RNA structures of fixed topological genus. We introduce a novel, linear time, uniform sampling algorithm for RNA structures of fixed topological genus $g$, for arbitrary $g>0$. Furthermore we develop a linear time sampling algorithm for RNA structures of fixed topological genus $g$ that are weighted by a simplified, loop-based energy functional. For this process the partition function of the energy functional has to be computed once, which has $O(n^2)$ time complexity.

preprint2012arXiv

A phase transition in energy-filtered RNA secondary structures

In this paper we study the effect of energy parameters on minimum free energy (mfe) RNA secondary structures. Employing a simplified combinatorial energy model, that is only dependent on the diagram representation and that is not sequence specific, we prove the following dichotomy result. Mfe structures derived via the Turner energy parameters contain only finitely many complex irreducible substructures and just minor parameter changes produce a class of mfe-structures that contain a large number of small irreducibles. We localize the exact point where the distribution of irreducibles experiences this phase transition from a discrete limit to a central limit distribution and subsequently put our result into the context of quantifying the effect of sparsification of the folding of these respective mfe-structures. We show that the sparsification of realistic mfe-structures leads to a constant time and space reduction and that the sparsifcation of the folding of structures with modified parameters leads to a linear time and space reduction. We furthermore identify the limit distribution at the phase transition as a Rayleigh distribution.

preprint2012arXiv

On the combinatorics of sparsification

Background: We study the sparsification of dynamic programming folding algorithms of RNA structures. Sparsification applies to the mfe-folding of RNA structures and can lead to a significant reduction of time complexity. Results: We analyze the sparsification of a particular decomposition rule, $Λ^*$, that splits an interval for RNA secondary and pseudoknot structures of fixed topological genus. Essential for quantifying the sparsification is the size of its so called candidate set. We present a combinatorial framework which allows by means of probabilities of irreducible substructures to obtain the expected size of the set of $Λ^*$-candidates. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) for RNA secondary structures as well as RNA pseudoknot structures. For RNA secondary structures we also consider a simplified loop-energy model. This combinatorial analysis is then compared to the expected number of $Λ^*$-candidates obtained from folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop energy model our results imply that sparsification provides a reduction of time complexity by a constant factor of 91% (theory) versus a 96% reduction (experiment). For the "full" loop-energy model there is a reduction of 98% (experiment).

preprint2012arXiv

The topological filtration of $γ$-structures

In this paper we study $γ$-structures filtered by topological genus. $γ$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A $γ$-structure is composed by specific building blocks, that have topological genus less than or equal to $γ$, where composition means concatenation and nesting of such blocks. Our main results are the derivation of a new bivariate generating function for $γ$-structures via symbolic methods, the singularity analysis of the solutions and a central limit theorem for the distribution of topological genus in $γ$-structures of given length. In our derivation specific bivariate polynomials play a central role. Their coefficients count particular motifs of fixed topological genus and they are of relevance in the context of genus recursion and novel folding algorithms.

preprint2011arXiv

Random induced subgraphs of Cayley graphs induced by transpositions

In this paper we study random induced subgraphs of Cayley graphs of the symmetric group induced by an arbitrary minimal generating set of transpositions. A random induced subgraph of this Cayley graph is obtained by selecting permutations with independent probability, $λ_n$. Our main result is that for any minimal generating set of transpositions, for probabilities $λ_n=\frac{1+ε_n}{n-1}$ where $n^{-{1/3}+δ}\le ε_n<1$ and $δ>0$, a random induced subgraph has a.s. a unique largest component of size $\wp(ε_n)\frac{1+ε_n}{n-1}n!$, where $\wp(ε_n)$ is the survival probability of a specific branching process.

preprint2011arXiv

The 5'-3' distance of RNA secondary structures

Recently Yoffe {\it et al.} observed that the average distances between $5'-3'$ ends of RNA molecules are very small and largely independent of sequence length. This observation is based on numerical computations as well as theoretical arguments maximizing certain entropy functionals. In this paper we compute the exact distribution of $5'-3'$ distances of RNA secondary structures for any finite $n$. We furthermore compute the limit distribution and show that already for $n=30$ the exact distribution and the limit distribution are very close. Our results show that the distances of random RNA secondary structures are distinctively lower than those of minimum free energy structures of random RNA sequences.

preprint2011arXiv

Topology of RNA-RNA interaction structures

The topological filtration of interacting RNA complexes is studied and the role is analyzed of certain diagrams called irreducible shadows, which form suitable building blocks for more general structures. We prove that for two interacting RNAs, called interaction structures, there exist for fixed genus only finitely many irreducible shadows. This implies that for fixed genus there are only finitely many classes of interaction structures. In particular the simplest case of genus zero already provides the formalism for certain types of structures that occur in nature and are not covered by other filtrations. This case of genus zero interaction structures is already of practical interest, is studied here in detail and found to be expressed by a multiple context-free grammar extending the usual one for RNA secondary structures. We show that in $O(n^6)$ time and $O(n^4)$ space complexity, this grammar for genus zero interaction structures provides not only minimum free energy solutions but also the complete partition function and base pairing probabilities.

preprint2010arXiv

Combinatorial analysis of interacting RNA molecules

Recently several minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Their folding targets are interaction structures, that can be represented as diagrams with two backbones drawn horizontally on top of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) there is no "zig-zag" configuration. This paper studies joint structures with arc-length at least four in which both, interior and exterior stack-lengths are at least two (no isolated arcs). The key idea in this paper is to consider a new type of shape, based on which joint structures can be derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number of joint structures with surprisingly small exponential growth rates. They are of interest in the context of designing prediction algorithms for RNA-RNA interactions.

preprint2010arXiv

Combinatorics of RNA-RNA interaction

RNA-RNA binding is an important phenomenon observed for many classes of non-coding RNAs and plays a crucial role in a number of regulatory processes. Recently several MFE folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Here joint structure means that in a diagram representation the intramolecular bonds of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, and that there is no so-called ``zig-zag'' configuration. This paper presents the combinatorics of RNA interaction structures including their generating function, singularity analysis as well as explicit recurrence relations. In particular, our results imply simple asymptotic formulas for the number of joint structures.

preprint2010arXiv

Inverse folding of RNA pseudoknot structures

Background: RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and \pairGU-base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, {\tt RNAinverse}, {\tt RNA-SSD} as well as {\tt INFO-RNA} are limited to RNA secondary structures, we present in this paper the inverse folding algorithm {\tt Inv} which can deal with 3-noncrossing, canonical pseudoknot structures. Results: In this paper we present the inverse folding algorithm {\tt Inv}. We give a detailed analysis of {\tt Inv}, including pseudocodes. We show that {\tt Inv} allows to design in particular 3-noncrossing nonplanar RNA pseudoknot 3-noncrossing RNA structures-a class which is difficult to construct via dynamic programming routines. {\tt Inv} is freely available at \url{http://www.combinatorics.cn/cbpc/inv.html}. Conclusions: The algorithm {\tt Inv} extends inverse folding capabilities to RNA pseudoknot structures. In comparison with {\tt RNAinverse} it uses new ideas, for instance by considering sets of competing structures. As a result, {\tt Inv} is not only able to find novel sequences even for RNA secondary structures, it does so in the context of competing structures that potentially exhibit cross-serial interactions.

preprint2010arXiv

Inverse Folding of RNA Pseudoknot Structures

Background: RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and \pairGU-base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, {\tt RNAinverse}, {\tt RNA-SSD} as well as {\tt INFO-RNA} are limited to RNA secondary structures, we present in this paper the inverse folding algorithm {\tt Inv} which can deal with 3-noncrossing, canonical pseudoknot structures. Results: In this paper we present the inverse folding algorithm {\tt Inv}. We give a detailed analysis of {\tt Inv}, including pseudocodes. We show that {\tt Inv} allows to design in particular 3-noncrossing nonplanar RNA pseudoknot 3-noncrossing RNA structures--a class which is difficult to construct via dynamic programming routines. {\tt Inv} is freely available at \url{http://www.combinatorics.cn/cbpc/inv.html}. Conclusions: The algorithm {\tt Inv} extends inverse folding capabilities to RNA pseudoknot structures. In comparison with {\tt RNAinverse} it uses new ideas, for instance by considering sets of competing structures. As a result, {\tt Inv} is not only able to find novel sequences even for RNA secondary structures, it does so in the context of competing structures that potentially exhibit cross-serial interactions.

preprint2010arXiv

Linear chord diagrams on two intervals

Consider all possible ways of attaching disjoint chords to two ordered and oriented disjoint intervals so as to produce a connected graph. Taking the intervals to lie in the real axis with the induced orientation and the chords to lie in the upper half plane canonically determines a corresponding fatgraph which has some associated genus $g\geq 0$, and we consider the natural generating function ${\bf C}_g^{[2]}(z)=\sum_{n\geq 0} {\bf c}^{[2]}_g(n)z^n$ for the number ${\bf c}^{[2]}_g(n)$ of distinct such chord diagrams of fixed genus $g\geq 0$ with a given number $n\geq 0$ of chords. We prove here the surprising fact that ${\bf C}^{[2]}_g(z)=z^{2g+1} R_g^{[2]}(z)/(1-4z)^{3g+2} $ is a rational function, for $g\geq 0$, where the polynomial $R^{[2]}_g(z)$ with degree at most $g$ has integer coefficients and satisfies $R_g^{[2]}({1\over 4})\neq 0$. Earlier work had already determined that the analogous generating function ${\bf C}_g(z)=z^{2g}R_g(z)/(1-4z)^{3g-{1\over 2}}$ for chords attached to a single interval is algebraic, for $g\geq 1$, where the polynomial $R_g(z)$ with degree at most $g-1$ has integer coefficients and satisfies $R_g(1/4)\neq 0$ in analogy to the generating function ${\bf C}_0(z)$ for the Catalan numbers. The new results here on ${\bf C}_g^{[2]}(z)$ rely on this earlier work, and indeed, we find that $R_g^{[2]}(z)=R_{g+1}(z) -z\sum_{g_1=1}^g R_{g_1}(z) R_{g+1-g_1}(z)$, for $g\geq 1$.

preprint2010arXiv

On the uniform generation of modular diagrams

In this paper we present an algorithm that generates $k$-noncrossing, $σ$-modular diagrams with uniform probability. A diagram is a labeled graph of degree $\le 1$ over $n$ vertices drawn in a horizontal line with arcs $(i,j)$ in the upper half-plane. A $k$-crossing in a diagram is a set of $k$ distinct arcs $(i_1, j_1), (i_2, j_2),\ldots,(i_k, j_k)$ with the property $i_1 < i_2 < \ldots < i_k < j_1 < j_2 < \ldots< j_k$. A diagram without any $k$-crossings is called a $k$-noncrossing diagram and a stack of length $σ$ is a maximal sequence $((i,j),(i+1,j-1),\dots,(i+(σ-1),j-(σ-1)))$. A diagram is $σ$-modular if any arc is contained in a stack of length at least $σ$. Our algorithm generates after $O(n^k)$ preprocessing time, $k$-noncrossing, $σ$-modular diagrams in $O(n)$ time and space complexity.

preprint2010arXiv

RNA-RNA interaction prediction based on multiple sequence alignments

Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate the knowledge concerning related sequences, thus relevant evolutionary information is often neglected from the structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both thermodynamic stability and sequence covariation. We present the \emph{a priori} folding algorithm \texttt{ripalign}, whose input consists of two (given) multiple sequence alignments (MSA). \texttt{ripalign} outputs (1) the partition function, (2) base-pairing probabilities, (3) hybrid probabilities and (4) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible to the alignments. Compared to the single sequence-pair folding algorithm \texttt{rip}, \texttt{ripalign} requires negligible additional memory resource. Furthermore, we incorporate possible structure constraints as input parameters into our algorithm. The algorithm described here is implemented in C as part of the \texttt{rip} package. The supplemental material, source code and input/output files can freely be downloaded from \url{http://www.combinatorics.cn/cbpc/ripalign.html}. \section{Contact} Christian Reidys \texttt{duck@santafe.edu}

preprint2010arXiv

The evolution of random reversal graph

The random reversal graph offers new perspectives, allowing to study the connectivity of genomes as well as their most likely distance as a function of the reversal rate. Our main result shows that the structure of the random reversal graph changes dramatically at $λ_n=1/\binom{n+1}{2}$. For $λ_n=(1-ε)/\binom{n+1}{2}$, the random graph consists of components of size at most $O(n\ln(n))$ a.s. and for $(1+ε)/\binom{n+1}{2}$, there emerges a unique largest component of size $\sim \wp(ε) \cdot 2^n\cdot n$!$ a.s.. This "giant" component is furthermore dense in the reversal graph.

preprint2009arXiv

Random $k$-noncrossing RNA Structures

In this paper we derive polynomial time algorithms that generate random $k$-noncrossing matchings and $k$-noncrossing RNA structures with uniform probability. Our approach employs the bijection between $k$-noncrossing matchings and oscillating tableaux and the $P$-recursiveness of the cardinalities of $k$-noncrossing matchings. The main idea is to consider the tableaux sequences as paths of stochastic processes over shapes and to derive their transition probabilities.

preprint2007arXiv

Crossings and nesting in tangled-diagrams

A tangled-diagram over $[n]=\{1,...,n\}$ is a graph of degree less than two whose vertices $1,...,n$ are arranged in a horizontal line and whose arcs are drawn in the upper halfplane with a particular notion of crossings and nestings. Generalizing the construction of Chen {\it et.al.} we prove a bijection between generalized vacillating tableaux with less than $k$ rows and $k$-noncrossing tangled-diagrams and study their crossings and nestings. We show that the number of $k$-noncrossing and $k$-nonnesting tangled-diagrams are equal and enumerate tangled-diagrams.