Source author record

Cedric Chauve

Cedric Chauve appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms; Genomics; Quantitative Methods; Discrete Mathematics; math.CO; Computational Engineering, Finance, and Science; math.RT; Populations and Evolution

Catalog footprint

What is connected

14 works

8 topics

4 close collaborators


preprint, 2016, arXiv

Counting, generating and sampling tree alignments

Pairwise ordered tree alignments are combinatorial objects that appear in RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e. two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative, and detrimental to any probabilistic analysis. In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by means of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees $S$ and $T$. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments, and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms, and opens the door to a probabilistic analysis of the space of suboptimal RNA secondary structure alignments.

preprint, 2016, arXiv

The gene family-free median of three

The gene family-free framework for comparative genomics aims at developing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity multipartite graph. We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We show that the corresponding computational problem is MAX SNP-hard and we present a 0-1 linear program for its exact solution. The result of our FF-median program is a median genome with median genes associated to extant genes, in which median adjacencies are assumed to define positional orthologs. We demonstrate through simulations and comparison with the OMA orthology database that the method presented here is able to compute accurate medians and positional orthologs for genomes comparable in size to bacterial genomes.

preprint, 2016, arXiv

The SCJ small parsimony problem for weighted gene adjacencies (Extended version)

Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in extant genomes in the phylogeny to define potential ancestral gene adjacencies, and either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion, or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming at reducing the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We consider that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can be obtained, for example, from ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or through probabilistic analysis of gene adjacency evolution. We show the NP-hardness of our problem variant and propose a Fixed-Parameter Tractable algorithm based on the Sankoff-Rousseau dynamic programming algorithm that also allows sampling co-optimal solutions. We apply our approach to mammalian and bacterial data providing different degrees of complexity. We show that including adjacency weights in the objective has a significant impact on reducing the fragmentation of the reconstructed ancestral gene orders.
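To make the shape of such an objective concrete, here is a minimal sketch, not the paper's algorithm. It relies on the standard fact that, for two genomes over the same gene set represented as sets of adjacencies, the SCJ distance is the size of the symmetric difference of those sets. Everything else is an assumption made for illustration: the names `scj_distance` and `combined_objective`, the data layout, and in particular the way weights enter the score (each retained ancestral adjacency `a` is penalized by `1 - w(a)`). The abstract does not give the exact formula.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical layout: an adjacency is a pair of gene extremities,
# a genome is a set of adjacencies.
Adjacency = FrozenSet[str]
Genome = Set[Adjacency]


def scj_distance(g1: Genome, g2: Genome) -> int:
    """SCJ distance between two genomes on the same genes:
    one cut per adjacency only in g1, one join per adjacency only in g2."""
    return len(g1 ^ g2)


def combined_objective(
    alpha: float,
    edges: Iterable[Tuple[str, str]],
    genomes: Dict[str, Genome],
    weights: Dict[str, Dict[Adjacency, float]],
) -> float:
    """Convex combination of total SCJ cost over tree edges and a
    weight-based penalty at internal nodes (penalty form is an assumption)."""
    scj_cost = sum(scj_distance(genomes[u], genomes[v]) for u, v in edges)
    weight_penalty = sum(
        1.0 - node_weights.get(adj, 0.0)
        for node, node_weights in weights.items()
        for adj in genomes[node]
    )
    return alpha * scj_cost + (1.0 - alpha) * weight_penalty
```

With `alpha = 1` the score reduces to the plain SCJ parsimony cost, and with `alpha = 0` only the adjacency weights matter, which is the trade-off a convex combination is meant to expose.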

preprint, 2015, arXiv

Assessing the robustness of parsimonious predictions for gene neighborhoods from reconciled phylogenies

The availability of a large number of assembled genomes opens the way to studying the evolution of syntenic characters within a phylogenetic context. The DeCo algorithm, recently introduced by Bérard et al., allows the computation of parsimonious evolutionary scenarios for gene adjacencies from pairs of reconciled gene trees. Following the approach pioneered by Sturmfels and Pachter, we describe how to modify the DeCo dynamic programming algorithm to identify classes of cost schemes that generate similar parsimonious evolutionary scenarios for gene adjacencies, and to assess the robustness of the presence or absence of specific ancestral gene adjacencies to changes in the cost scheme of evolutionary events. We apply our method to six thousand mammalian gene families, and show that computing the robustness to changes to cost schemes provides new and interesting insights on the evolution of gene adjacencies and the DeCo model.

preprint, 2015, arXiv

Chaining fragments in sequences: to sweep or not

Computing an optimal chain of fragments is a classical problem in string algorithms, with important applications in computational biology. There exist two efficient dynamic programming algorithms solving this problem, based on different principles. In the present note, we show how it is possible to combine the principles of these two algorithms in order to design a hybrid dynamic programming algorithm that combines the advantages of both.
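For readers unfamiliar with the chaining problem itself, the sketch below solves a simple version with a plain quadratic dynamic program. It is neither of the two efficient algorithms mentioned in the abstract nor the hybrid one. The fragment layout `(start1, end1, start2, end2, weight)`, the rule that one fragment may precede another only if it ends strictly before the other starts in both sequences, and the absence of gap penalties are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical fragment layout: (start1, end1, start2, end2, weight),
# with start <= end on each sequence.
Fragment = Tuple[int, int, int, int, float]


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return the maximum total weight of a chain and one optimal chain.

    Quadratic-time baseline: score[j] is the best weight of a chain
    ending with fragment j.
    """
    if not fragments:
        return 0.0, []
    # Any valid predecessor starts earlier on sequence 1, so it sorts first.
    frags = sorted(fragments, key=lambda f: (f[0], f[2]))
    n = len(frags)
    score = [f[4] for f in frags]
    prev = [-1] * n
    for j in range(n):
        for i in range(j):
            fi, fj = frags[i], frags[j]
            can_precede = fi[1] < fj[0] and fi[3] < fj[2]
            if can_precede and score[i] + fj[4] > score[j]:
                score[j] = score[i] + fj[4]
                prev[j] = i
    # Trace back from the best-scoring end fragment.
    end = max(range(n), key=score.__getitem__)
    best = score[end]
    chain: List[Fragment] = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    chain.reverse()
    return best, chain
```

The efficient algorithms the note refers to avoid the inner loop over all earlier fragments; this baseline is only meant to pin down what an optimal chain is.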

preprint, 2015, arXiv

Joint Inference of Genome Structure and Content in Heterogeneous Tumour Samples

For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic content and structure. We propose a method to unmix tumour and contaminating normal signals and jointly predict genomic structure and content of each tumour clone. We use genome graphs to represent tumour clones, and model the likelihood of the observed reads given clones and mixing proportions. Our use of haplotype blocks allows us to accurately measure allele-specific read counts, and infer allele-specific copy number for each clone. The proposed method is a heuristic local search based on applying incremental, locally optimal modifications of the genome graphs. Using simulated data, we show that our method predicts copy counts and gene adjacencies with reasonable accuracy.

preprint, 2013, arXiv

Hypergraph covering problems motivated by genome assembly questions

The Consecutive-Ones Property (C1P) is a classical concept in discrete mathematics that has been used in several genomics applications, from physical mapping of contemporary genomes to the assembly of ancient genomes. A common issue in genome assembly concerns repeats, genomic sequences that appear in several locations of a genome. Handling repeats leads to a variant of the C1P, the C1P with multiplicity (mC1P), that can also be seen as the problem of covering edges of hypergraphs by linear and circular walks. In the present work, we describe variants of the mC1P that address specific issues of genome assembly, and polynomial time or fixed-parameter algorithms to solve them.

preprint, 2013, arXiv

The genome of the medieval Black Death agent (extended abstract)

The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most Yersinia pestis extant species, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, aiming at reconstructing the organization of ancestral genomes from the comparison of extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to an exceptional genome plasticity and increase in rearrangement rate.

preprint, 2012, arXiv

Average-case analysis of perfect sorting by reversals (Journal Version)

Perfect sorting by reversals, a problem originating in computational genomics, is the process of sorting a signed permutation to either the identity or to the reversed identity permutation, by a sequence of reversals that do not break any common interval. Bérard et al. (2007) make use of strong interval trees to describe an algorithm for sorting signed permutations by reversals. Combinatorial properties of this family of trees are essential to the algorithm analysis. Here, we use the expected value of certain tree parameters to prove that the average run-time of the algorithm is at worst polynomial, and additionally that, for sufficiently long permutations, the sorting algorithm runs in polynomial time with probability one. Furthermore, our analysis of the subclass of commuting scenarios yields precise results on the average length of a reversal, and the average number of reversals.

preprint, 2012, arXiv

Efficient Algorithms for Finding Tucker Patterns

The Consecutive Ones Property is an important notion for binary matrices, both from a theoretical and applied point of view. Tucker gave in 1972 a characterization of matrices that do not satisfy the Consecutive Ones Property in terms of forbidden submatrices, the Tucker patterns. We describe here a linear time algorithm to find a Tucker pattern in a non-C1P binary matrix, which allows extracting in linear time a certificate for the non-C1P. We also describe an output-sensitive algorithm to enumerate all Tucker patterns of a non-C1P binary matrix. This paper has been withdrawn due to some missing cases in Algorithms 2 and 3.

preprint, 2011, arXiv

A tight bound on the length of odd cycles in the incompatibility graph of a non-C1P matrix

A binary matrix has the consecutive ones property (C1P) if it is possible to order the columns so that all 1s are consecutive in every row. In [McConnell, SODA 2004 768-777] the notion of incompatibility graph of a binary matrix was introduced and it was shown that odd cycles of this graph provide a certificate that a matrix does not have the consecutive ones property. A bound of (k+2) was claimed for the smallest odd cycle of a non-C1P matrix with k columns. In this note we show that this result can be obtained simply and directly via Tucker patterns, and that the correct bound is (k+2) when k is even, but (k+3) when k is odd.
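The definition of the C1P in this abstract can be checked directly on small matrices. The sketch below is a brute-force illustration that tries every column order, so it is exponential in the number of columns and unrelated to the certificate-based methods discussed in the note. The function names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def ones_are_consecutive(row: Sequence[int]) -> bool:
    """True if the 1s of the row form a single contiguous block (or there are none)."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def c1p_ordering(matrix: List[List[int]]) -> Optional[Tuple[int, ...]]:
    """Return a column order making all rows consecutive, or None if the
    matrix does not have the C1P. Brute force over all column orders."""
    if not matrix:
        return ()
    k = len(matrix[0])
    for order in permutations(range(k)):
        if all(ones_are_consecutive([row[c] for c in order]) for row in matrix):
            return order
    return None
```

For example, `c1p_ordering([[1, 0, 1], [0, 1, 1]])` finds an order that places column 2 between columns 0 and 1, while the 3-by-3 matrix whose rows are `[1, 1, 0]`, `[0, 1, 1]`, `[1, 0, 1]` has no valid order, since each pair of columns would have to be adjacent.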

preprint, 2011, arXiv

Tractability results for the Double-Cut-and-Join circular median problem

The circular median problem in the Double-Cut-and-Join (DCJ) distance asks to find, for three given genomes, a fourth circular genome that minimizes the sum of the mutual distances with the three other ones. This problem has been shown to be NP-complete. We show here that, if the number of vertices of degree 3 in the breakpoint graph of the three input genomes is fixed, then the problem is tractable.

preprint, 2009, arXiv

Minimal Conflicting Sets for the Consecutive Ones Property in ancestral genome reconstruction

A binary matrix has the Consecutive Ones Property (C1P) if its columns can be ordered in such a way that all 1's on each row are consecutive. A Minimal Conflicting Set is a set of rows that does not have the C1P, but every proper subset has the C1P. Such submatrices have been considered in comparative genomics applications, but very little is known about their combinatorial structure and efficient algorithms to compute them. We first describe an algorithm that detects rows that belong to Minimal Conflicting Sets. This algorithm has a polynomial time complexity when the number of 1's in each row of the considered matrix is bounded by a constant. Next, we show that the problem of computing all Minimal Conflicting Sets can be reduced to the joint generation of all minimal true clauses and maximal false clauses for some monotone boolean function. We use these methods on simulated data related to ancestral genome reconstruction to show that computing Minimal Conflicting Sets is useful in discriminating between true positive and false positive ancestral syntenies. We also study a dataset of yeast genomes and address the reliability of an ancestral genome proposal of the Saccharomycetaceae yeasts.
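The definition of a Minimal Conflicting Set can be illustrated on a toy matrix with exhaustive search. This sketch is exponential in both rows and columns and is not the algorithm described in the abstract. The function names are hypothetical. It uses the fact that any subset of a set of rows with the C1P also has the C1P, so minimality only needs to be checked by removing one row at a time.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple


def has_c1p(rows: Sequence[Sequence[int]]) -> bool:
    """Brute-force C1P test: try every column order."""
    if not rows:
        return True
    k = len(rows[0])

    def consecutive(row: Sequence[int], order: Tuple[int, ...]) -> bool:
        pos = [p for p, c in enumerate(order) if row[c] == 1]
        return not pos or pos[-1] - pos[0] + 1 == len(pos)

    return any(
        all(consecutive(row, order) for row in rows)
        for order in permutations(range(k))
    )


def is_minimal_conflicting_set(rows: List[Sequence[int]]) -> bool:
    """Rows are not C1P, but dropping any single row restores the C1P."""
    if has_c1p(rows):
        return False
    return all(has_c1p(rows[:i] + rows[i + 1:]) for i in range(len(rows)))


def minimal_conflicting_sets(matrix: List[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Enumerate all Minimal Conflicting Sets as tuples of row indices."""
    found = []
    for size in range(1, len(matrix) + 1):
        for idx in combinations(range(len(matrix)), size):
            if is_minimal_conflicting_set([matrix[i] for i in idx]):
                found.append(idx)
    return found
```

On the matrix with rows `[1,1,0,0]`, `[0,1,1,0]`, `[1,0,1,0]`, `[0,0,1,1]`, this returns the row sets `(0, 1, 2)` and `(1, 2, 3)`: the first forces three columns to be pairwise adjacent, the second forces column 2 to have three distinct neighbours.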

preprint, 2005, arXiv

Combinatorial operators for Kronecker powers of representations of $\mathfrak{S}_n$

We present combinatorial operators for the expansion of the Kronecker product of irreducible representations of the symmetric group. These combinatorial operators are defined in the ring of symmetric functions and act on the Schur functions basis. This leads to a combinatorial description of the Kronecker powers of the irreducible representations indexed with the partition (n-1,1) which specializes the concept of oscillating tableaux in Young's lattice previously defined by S. Sundaram. We call our specialization Kronecker tableaux. Their combinatorial analysis leads to enumerative results for the multiplicity of any irreducible representation in the Kronecker powers of the form ${\chi^{(n-1,1)}}^{\otimes k}$.
