Researcher profile

David Schaller

David Schaller contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2022arXiv

Clustering Systems of Phylogenetic Networks

Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set $X$ of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization, that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets $\mathtt{C}(v)$ of descendant taxa of a vertex $v$. The clustering system $\mathscr{C}_N$ comprising the clusters of a network $N$ conveys key information on $N$ itself. In the special case of rooted phylogenetic trees, $T$ is uniquely determined by its clustering system $\mathscr{C}_T$. Although this is no longer true for networks in general, it is of interest to relate properties of $N$ and $\mathscr{C}_N$. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering system of the following form: If $N$ is a network of type $\mathbb{X}$, then $\mathcal{C}_N$ satisfies $\mathbb{Y}$, and conversely if $\mathscr{C}$ is a clustering system satisfying $\mathbb{Y}$ then there is network $N$ of type $\mathbb{X}$ such that $\mathscr{C}\subseteq\mathscr{C}_N$.This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in much detail.

preprint2021arXiv

Best Match Graphs with Binary Trees

Best match graphs (BMG) are a key intermediate in graph-based orthology detection and contain a large amount of information on the gene tree. We provide a near-cubic algorithm to determine whether a BMG is binary-explainable, i.e., whether it can be explained by a fully resolved gene tree and, if so, to construct such a tree. Moreover, we show that all such binary trees are refinements of the unique binary-resolvable tree (BRT), which in general is a substantial refinement of the also unique least resolved tree of a BMG. Finally, we show that the problem of editing an arbitrary vertex-colored graph to a binary-explainable BMG is NP-complete and provide an integer linear program formulation for this task.

preprint2021arXiv

Least resolved trees for two-colored best match graphs

2-colored best match graphs (2-BMGs) form a subclass of sink-free bi-transitive graphs that appears in phylogenetic combinatorics. There, 2-BMGs describe evolutionarily most closely related genes between a pair of species. They are explained by a unique least resolved tree (LRT). Introducing the concept of support vertices we derive an $O(|V|+|E|\log^2|V|)$-time algorithm to recognize 2-BMGs and to construct its LRT. The approach can be extended to also recognize binary-explainable 2-BMGs with the same complexity. An empirical comparison emphasizes the efficiency of the new algorithm.

preprint2020arXiv

From Best Hits to Best Matches

Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionary most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. If additive distances between genes are known, then evolutionary most closely related pairs can be identified by considering certain quartets of genes provided that in each quartet the outgroup relative to the remaining three genes is known. \emph{A priori} knowledge of underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic since the correct outgroup cannot be determined reliably in all cases, simulations with lineage specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distances data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations.