Source author record

Jeremy Sumner

Jeremy Sumner appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Populations and Evolution · math.ST · Statistics Theory · math.CO · math.GR · math.PR · math.RA · Quantitative Methods

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators


Preprint, 2023, arXiv

Rearrangement Events on Circular Genomes

Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention was given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony -- both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods; the goal here is to use model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and translocations. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.
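To make the objects concrete, here is a minimal sketch that does not follow the paper's formalism. It assumes a signed circular genome is stored as a list of signed region labels (a hypothetical encoding). It applies one inversion and counts breakpoints against a reference. An adjacency read on the opposite strand, (a, b) versus (-b, -a), is treated as the same junction. As a result, rotations and the reverse complement of the reference score zero breakpoints.

```python
def invert(genome, i, j):
    """Return a copy of `genome` with positions i..j (inclusive, no wrap-around)
    reversed and sign-flipped, i.e. a single inversion of that segment."""
    segment = [-x for x in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def adjacencies(genome):
    """All adjacencies of a circular genome, recorded on both strands."""
    n = len(genome)
    adj = set()
    for k in range(n):
        a, b = genome[k], genome[(k + 1) % n]  # wrap around: the genome is circular
        adj.add((a, b))
        adj.add((-b, -a))  # the same junction read on the opposite strand
    return adj


def breakpoints(genome, reference):
    """Number of adjacencies of `genome` that are absent from `reference`."""
    ref = adjacencies(reference)
    n = len(genome)
    return sum(1 for k in range(n) if (genome[k], genome[(k + 1) % n]) not in ref)


reference = [1, 2, 3, 4, 5, 6]
inverted = invert(reference, 1, 3)
print(inverted, breakpoints(inverted, reference))  # [1, -4, -3, -2, 5, 6] 2
```

Inverting an internal segment cuts the genome in two places, and in this example it creates two breakpoints. Inverting the whole genome yields the reverse complement, which has zero breakpoints. The abstract compares cuts with breakpoints in general, but this sketch only counts breakpoints.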

Preprint, 2022, arXiv

A new algebraic approach to genome rearrangement models

We present a unified framework for modelling genomes and their rearrangements in a genome algebra, as elements that simultaneously incorporate all physical symmetries. Building on previous work utilising the group algebra of the symmetric group, we explicitly construct the genome algebra for the case of unsigned circular genomes with dihedral symmetry and show that the maximum likelihood estimate (MLE) of genome rearrangement distance can be validly and more efficiently performed in this setting. We then construct the genome algebra for a more general case, that is, for genomes that may be represented by elements of an arbitrary group and symmetry group, and show that the MLE computations can be performed entirely within this framework. There is no prescribed model in this framework; that is, it allows any choice of rearrangements that preserve the set of regions, along with arbitrary weights. Further, since the likelihood function is built from path probabilities -- a generalisation of path counts -- the framework may be utilised for any distance measure that is based on path probabilities.
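The idea of path probabilities under dihedral symmetry can be sketched by brute force for a tiny unsigned circular genome. In this sketch, dictionaries stand in for group-algebra elements, and the model is a weighted set of position permutations. The specific model (every reversal of 2 to n - 1 consecutive positions, with uniform weights) is a hypothetical choice, since the framework itself prescribes none. This is an illustration of the concept, not the construction in the paper.

```python
from collections import defaultdict


def rearrange(r, a):
    """Rearrange arrangement `a` by the position permutation `r`: new[p] = a[r[p]]."""
    return tuple(a[r[p]] for p in range(len(a)))


def dihedral(n):
    """Rotations p -> p + k and reflections p -> k - p of n circular positions."""
    rotations = [tuple((p + k) % n for p in range(n)) for k in range(n)]
    reflections = [tuple((k - p) % n for p in range(n)) for k in range(n)]
    return rotations + reflections


def cyclic_reversal(n, start, length):
    """Position permutation reversing `length` consecutive positions from `start`,
    wrapping around the circle."""
    idx = [(start + t) % n for t in range(length)]
    r = list(range(n))
    for src, dst in zip(idx, reversed(idx)):
        r[src] = dst
    return tuple(r)


def step(dist, model):
    """One rearrangement step: combine weighted arrangements with the weighted model."""
    out = defaultdict(float)
    for a, w in dist.items():
        for r, p in model.items():
            out[rearrange(r, a)] += w * p
    return dict(out)


def path_probability(target, model, k):
    """Probability of reaching any dihedral image of `target` from the identity in k steps."""
    n = len(target)
    dist = {tuple(range(n)): 1.0}
    for _ in range(k):
        dist = step(dist, model)
    images = {rearrange(d, target) for d in dihedral(n)}  # all representatives of the genome
    return sum(dist.get(x, 0.0) for x in images)


n = 5
rearrangements = [cyclic_reversal(n, s, L) for s in range(n) for L in range(2, n)]
model = defaultdict(float)
for r in rearrangements:
    model[r] += 1.0 / len(rearrangements)  # hypothetical uniform weights
print(path_probability(tuple(range(n)), model, 1))  # approximately 0.3333
```

The non-zero one-step probability of staying at the identity genome is a symmetry effect. For n = 5, each reversal of n - 1 positions is a reflection, so 5 of the 15 rearrangements leave the circular genome unchanged. Summing over dihedral images accounts for exactly this kind of effect.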

Preprint, 2022, arXiv

Evaluation of the relative performance of the subflattenings method for phylogenetic inference

The algebraic properties of flattenings and subflattenings provide direct methods for identifying edges in the true phylogeny -- and by extension the complete tree -- using pattern counts from a sequence alignment. The relatively small number of possible internal edges among a set of taxa (compared to the number of binary trees) makes these methods attractive; however, more could be done to evaluate their effectiveness for inferring phylogenetic trees. This is particularly the case for subflattenings, and our work makes progress in this area. We introduce software for constructing and evaluating subflattenings for splits, utilising a number of methods to make computing subflattenings more tractable. We then present the results of simulations we have performed to compare the effectiveness of subflattenings with that of flattenings in terms of split score distributions and susceptibility to possible biases. We find that subflattenings perform similarly to flattenings in terms of the distribution of split scores on the trees we examined, but may be less affected by bias arising from both split size/balance and long branch attraction. These insights are useful for developing effective algorithms that use these tools to infer phylogenetic trees.
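Below is a minimal sketch of a flattening-based split score for 4 taxa and 4 states (DNA). It relies on the standard fact that, under a general Markov model, the flattening for a split present in the tree has rank at most 4. The score used here, the share of squared singular values beyond the fourth, is one simple assumed choice and is not necessarily the score used in the paper. Subflattenings are not implemented. The data are synthetic, built to have low rank across one split, rather than simulated on a tree. Taxa are numbered 0 to 3.

```python
import numpy as np


def flattening(P, split):
    """Reshape a 4-taxon pattern array P[x0, x1, x2, x3] into the matrix whose rows
    are indexed by the states of taxa in A and columns by those in B (split A|B)."""
    A, B = split
    F = np.transpose(P, axes=list(A) + list(B))
    return F.reshape(4 ** len(A), 4 ** len(B))


def split_score(P, split, rank=4):
    """Fraction of squared singular-value mass beyond the first `rank` values;
    close to 0 when the flattening has (nearly) rank <= `rank`."""
    s = np.linalg.svd(flattening(P, split), compute_uv=False)
    return float(np.sum(s[rank:] ** 2) / np.sum(s ** 2))


# Synthetic pattern array with rank <= 4 across the split {0,1}|{2,3}.
rng = np.random.default_rng(0)
a = rng.random((4, 4, 4))  # a[s, x0, x1]
b = rng.random((4, 4, 4))  # b[s, x2, x3]
P = np.einsum("sij,skl->ijkl", a, b)
P /= P.sum()  # normalise to pattern frequencies

print(split_score(P, ((0, 1), (2, 3))))  # close to 0 for the built-in split
print(split_score(P, ((0, 2), (1, 3))))  # clearly positive for an incompatible split
```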

Preprint, 2020, arXiv

Notes on Markov embedding

The representation problem of finite-dimensional Markov matrices in Markov semigroups is revisited, with emphasis on concrete criteria for matrix subclasses of theoretical or practical relevance, such as equal-input, circulant, symmetric or doubly stochastic matrices. Here, we pay special attention to various algebraic properties of the embedding problem, and discuss the connection with the centraliser of a Markov matrix.
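One quick numerical check related to the embedding problem is to test whether the principal logarithm of a Markov matrix is itself a rate matrix. The sketch below assumes M is diagonalisable and uses hypothetical helper names. It examines only the principal branch, so a False result does not rule out embeddability through another branch of the logarithm.

```python
import numpy as np


def principal_log(M, tol=1e-12):
    """Principal logarithm of a diagonalisable matrix via its eigendecomposition.
    Raises ValueError if an eigenvalue is zero or negative real (no principal log)."""
    w, V = np.linalg.eig(M)
    if np.any(np.abs(w) < tol) or np.any((np.abs(w.imag) < tol) & (w.real < 0)):
        raise ValueError("M has no principal logarithm")
    return V @ np.diag(np.log(w.astype(complex))) @ np.linalg.inv(V)


def is_rate_matrix(Q, tol=1e-9):
    """Real, with non-negative off-diagonal entries and rows summing to zero."""
    if np.max(np.abs(np.imag(Q))) > tol:
        return False
    Q = np.real(Q)
    off_diagonal = Q[~np.eye(Q.shape[0], dtype=bool)]
    return bool(off_diagonal.min() >= -tol and np.allclose(Q.sum(axis=1), 0.0, atol=tol))


def principal_log_is_generator(M):
    """Sufficient (not necessary) embeddability test using the principal branch only."""
    try:
        return is_rate_matrix(principal_log(M))
    except ValueError:
        return False


# For a 2-state rate matrix Q with Q @ Q = -0.8 * Q, exp(Q) = I + (1 - e^-0.8) / 0.8 * Q.
Q = np.array([[-0.3, 0.3], [0.5, -0.5]])
M = np.eye(2) + (1 - np.exp(-0.8)) / 0.8 * Q
print(principal_log_is_generator(M))                                   # True
print(principal_log_is_generator(np.array([[0.2, 0.8], [0.8, 0.2]])))  # False
```

The second matrix has eigenvalue -0.6, so it has no principal logarithm and the check reports False.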

Preprint, 2011, arXiv

Is the general time-reversible model bad for molecular phylogenetics?

The general time-reversible model (GTR) is presently the most popular model used in phylogenetic studies. However, GTR has an undesirable mathematical property that is potentially of significant concern. The purpose of this article is to give examples that demonstrate why this deficit may pose a problem for phylogenetic analysis and interpretation.

Preprint, 2011, arXiv

Lie Markov Models

Recent work has discussed the importance of multiplicative closure for the Markov models used in phylogenetics. For continuous-time Markov chains, a sufficient condition for multiplicative closure of a model class is ensured by demanding that the set of rate-matrices belonging to the model class form a Lie algebra. It is the case that some well-known Markov models do form Lie algebras and we refer to such models as "Lie Markov models". However it is also the case that some other well-known Markov models unequivocally do not form Lie algebras. In this paper, we will discuss how to generate Lie Markov models by demanding that the models have certain symmetries under nucleotide permutations. We show that the Lie Markov models include, and hence provide a unifying concept for, "group-based" and "equivariant" models. For each of two, three and four character states, the full list of Lie Markov models with maximal symmetry is presented and shown to include interesting examples that are neither group-based nor equivariant. We also argue that our scheme is pleasing in the context of applied phylogenetics, as, for a given symmetry of nucleotide substitution, it provides a natural hierarchy of models with increasing number of parameters.
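When a model is described as the linear span of some basis rate matrices, the Lie-algebra condition can be checked numerically. Compute the commutator [X, Y] = XY - YX for every pair of basis elements and test whether it lies back in the span. The sketch below assumes this linear-span description and uses a least-squares membership test; both are choices made here, not the paper's method. It finds that all 4-by-4 rate matrices (zero row sums) are closed, while symmetric rate matrices are not, because the commutator of two symmetric matrices is antisymmetric.

```python
import itertools

import numpy as np


def elementary_rate(n, i, j):
    """Elementary rate matrix: rate 1 from state i to state j (rows sum to zero)."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    M[i, i] = -1.0
    return M


def general_basis(n):
    """Basis of all n-by-n rate matrices."""
    return [elementary_rate(n, i, j) for i in range(n) for j in range(n) if i != j]


def symmetric_basis(n):
    """Basis of symmetric rate matrices (equal rates i -> j and j -> i)."""
    return [elementary_rate(n, i, j) + elementary_rate(n, j, i)
            for i in range(n) for j in range(i + 1, n)]


def in_span(M, basis, tol=1e-9):
    """Least-squares test for whether M is a linear combination of `basis`."""
    A = np.stack([B.ravel() for B in basis], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, M.ravel(), rcond=None)
    return bool(np.linalg.norm(A @ coeffs - M.ravel()) < tol)


def is_lie_closed(basis, tol=1e-9):
    """True if every commutator XY - YX of basis elements stays in their span."""
    return all(in_span(X @ Y - Y @ X, basis, tol)
               for X, Y in itertools.combinations(basis, 2))


print(is_lie_closed(general_basis(4)))    # True
print(is_lie_closed(symmetric_basis(4)))  # False
```

This check concerns only the linear span. It does not enforce non-negative off-diagonal rates, which a usable substitution model also requires.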
