Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
17 works
0 followers
14 topics
4 close collaborators

Published work

17 published items

preprint · 2022 · arXiv

Clustering Systems of Phylogenetic Networks

Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set $X$ of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization, that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets $\mathtt{C}(v)$ of descendant taxa of a vertex $v$. The clustering system $\mathscr{C}_N$ comprising the clusters of a network $N$ conveys key information on $N$ itself. In the special case of rooted phylogenetic trees, $T$ is uniquely determined by its clustering system $\mathscr{C}_T$. Although this is no longer true for networks in general, it is of interest to relate properties of $N$ and $\mathscr{C}_N$. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering systems of the following form: If $N$ is a network of type $\mathbb{X}$, then $\mathscr{C}_N$ satisfies $\mathbb{Y}$, and conversely if $\mathscr{C}$ is a clustering system satisfying $\mathbb{Y}$ then there is a network $N$ of type $\mathbb{X}$ such that $\mathscr{C}\subseteq\mathscr{C}_N$. This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in much detail.
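To make the notion of a cluster $\mathtt{C}(v)$ concrete, the following is a minimal Python sketch that collects the descendant-leaf sets of every vertex of a rooted acyclic graph. The function name `clustering_system`, the dictionary-of-children encoding, and the toy network are illustrative assumptions, not taken from the paper.

```python
from functools import lru_cache


def clustering_system(children, leaves):
    """Return the set of clusters C(v), as frozensets, of a rooted DAG.

    children: dict mapping a vertex to the list of its child vertices.
    leaves:   set of leaf vertices (the taxa X).
    """
    @lru_cache(maxsize=None)
    def cluster(v):
        # A leaf is its own cluster; an inner vertex collects its children's clusters.
        if v in leaves:
            return frozenset([v])
        result = frozenset()
        for c in children.get(v, []):
            result |= cluster(c)
        return result

    vertices = set(children) | {c for cs in children.values() for c in cs}
    return {cluster(v) for v in vertices}


# Hypothetical toy network: the hybrid vertex "h" has two parents, "u" and "w".
children = {
    "root": ["u", "w"],
    "u": ["a", "h"],
    "w": ["h", "c"],
    "h": ["b"],
}
clusters = clustering_system(children, leaves={"a", "b", "c"})
```

In this toy example the clusters {a, b} and {b, c} overlap without one containing the other, which cannot happen in the clustering system of a tree.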

preprint · 2022 · arXiv

What makes a reaction network "chemical"?

Reaction networks (RNs) comprise a set $X$ of species and a set $\mathscr{R}$ of reactions $Y\to Y'$, each converting a multiset of educts $Y\subseteq X$ into a multiset $Y'\subseteq X$ of products. RNs are equivalent to directed hypergraphs. However, not all RNs necessarily admit a chemical interpretation. Instead, they might contradict fundamental principles of physics such as the conservation of energy and mass or the reversibility of chemical reactions. The consequences of these necessary conditions for the stoichiometric matrix $\mathbf{S} \in \mathbb{R}^{X\times\mathscr{R}}$ have been discussed extensively in the literature. Here, we provide sufficient conditions for $\mathbf{S}$ that guarantee the interpretation of RNs in terms of balanced sum formulas and structural formulas, respectively. Chemically plausible RNs allow neither a perpetuum mobile, i.e., a "futile cycle" of reactions with non-vanishing energy production, nor the creation or annihilation of mass. Such RNs are said to be thermodynamically sound and conservative. For finite RNs, both conditions can be expressed equivalently as properties of $\mathbf{S}$. The first condition is vacuous for reversible networks, but it excludes irreversible futile cycles and - in a stricter sense - futile cycles that even contain an irreversible reaction. The second condition is equivalent to the existence of a strictly positive reaction invariant. Furthermore, it is sufficient for the existence of a realization in terms of sum formulas, obeying conservation of "atoms". In particular, these realizations can be chosen such that any two species have distinct sum formulas, unless $\mathbf{S}$ implies that they are "obligatory isomers". In terms of structural formulas, every compound is a labeled multigraph, in essence a Lewis formula, and reactions comprise only a rearrangement of bonds such that the total bond order is preserved.

preprint · 2021 · arXiv

Best Match Graphs with Binary Trees

Best match graphs (BMG) are a key intermediate in graph-based orthology detection and contain a large amount of information on the gene tree. We provide a near-cubic algorithm to determine whether a BMG is binary-explainable, i.e., whether it can be explained by a fully resolved gene tree and, if so, to construct such a tree. Moreover, we show that all such binary trees are refinements of the unique binary-resolvable tree (BRT), which in general is a substantial refinement of the also unique least resolved tree of a BMG. Finally, we show that the problem of editing an arbitrary vertex-colored graph to a binary-explainable BMG is NP-complete and provide an integer linear program formulation for this task.

preprint · 2021 · arXiv

Least resolved trees for two-colored best match graphs

2-colored best match graphs (2-BMGs) form a subclass of sink-free bi-transitive graphs that appears in phylogenetic combinatorics. There, 2-BMGs describe evolutionarily most closely related genes between a pair of species. They are explained by a unique least resolved tree (LRT). Introducing the concept of support vertices, we derive an $O(|V|+|E|\log^2|V|)$-time algorithm to recognize a 2-BMG and to construct its LRT. The approach can be extended to also recognize binary-explainable 2-BMGs with the same complexity. An empirical comparison emphasizes the efficiency of the new algorithm.

preprint · 2020 · arXiv

Best Match Graphs

THIS IS A CORRECTED VERSION INCLUDING AN APPENDED CORRIGENDUM. Best match graphs arise naturally as the first processing intermediate in algorithms for orthology detection. Let $T$ be a phylogenetic (gene) tree and $σ$ an assignment of leaves of $T$ to species. The best match graph $(G,σ)$ is a digraph that contains an arc from $x$ to $y$ if the genes $x$ and $y$ reside in different species and $y$ is one of possibly many (evolutionarily) closest relatives of $x$ compared to all other genes contained in the species $σ(y)$. Here, we characterize best match graphs and show that it can be decided in cubic time and quadratic space whether $(G,σ)$ is derived from a tree in this manner. If the answer is affirmative, there is a unique least resolved tree that explains $(G,σ)$, which can also be constructed in cubic time.
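The definition above can be illustrated with a small Python sketch that derives the arcs of a best match graph from a rooted tree and a species assignment. It reads "closest relative" as "deepest last common ancestor with $x$", which is an interpretation of the abstract. The function name `best_match_graph`, the parent-map encoding, and the toy tree are hypothetical. This brute-force construction is not the recognition algorithm of the paper.

```python
def best_match_graph(parent, species):
    """Return the arcs (x, y) of the best match graph.

    parent:  dict mapping every vertex to its parent (the root maps to None).
    species: dict mapping every leaf (gene) to its species.
    """
    def ancestors(v):
        # Path from v up to the root, starting with v itself.
        path = []
        while v is not None:
            path.append(v)
            v = parent[v]
        return path

    def lca_depth(x, y):
        # Depth (distance from the root) of the last common ancestor of x and y.
        anc_y = set(ancestors(y))
        for v in ancestors(x):
            if v in anc_y:
                return len(ancestors(v)) - 1

    arcs = set()
    for x in species:
        for s in set(species.values()) - {species[x]}:
            candidates = [y for y in species if species[y] == s]
            # Best matches of x in species s: genes whose LCA with x is deepest.
            best = max(lca_depth(x, y) for y in candidates)
            arcs |= {(x, y) for y in candidates if lca_depth(x, y) == best}
    return arcs


# Hypothetical toy gene tree: root -> (u, b2), u -> (a1, b1).
parent = {"root": None, "u": "root", "b2": "root", "a1": "u", "b1": "u"}
species = {"a1": "A", "b1": "B", "b2": "B"}
arcs = best_match_graph(parent, species)
```

In the toy tree, `b2` has `a1` as its best match in species A, but `a1` prefers `b1`, so the relation is not symmetric.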

preprint · 2020 · arXiv

Complete Edge-Colored Permutation Graphs

We introduce the concept of complete edge-colored permutation graphs as complete graphs that are the edge-disjoint union of "classical" permutation graphs. We show that a graph $G=(V,E)$ is a complete edge-colored permutation graph if and only if each monochromatic subgraph of $G$ is a "classical" permutation graph and $G$ does not contain a triangle with $3$ different colors. Using the modular decomposition as a framework, we demonstrate that complete edge-colored permutation graphs are characterized in terms of their strong prime modules, which also induce complete edge-colored permutation graphs. This leads to an $\mathcal{O}(|V|^2)$-time recognition algorithm. We show, moreover, that complete edge-colored permutation graphs form a superclass of so-called symbolic ultrametrics and that the coloring of such graphs is always a Gallai coloring.
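One half of the characterization above, the absence of a triangle with three different colors, is easy to check by brute force. The Python sketch below does only that; it does not test whether each monochromatic subgraph is a permutation graph, and it is not the $\mathcal{O}(|V|^2)$ algorithm of the paper. The function name `has_rainbow_triangle` and the example colorings are hypothetical.

```python
from itertools import combinations


def has_rainbow_triangle(vertices, color):
    """Check whether a complete edge-colored graph has a 3-colored triangle.

    color: dict mapping frozenset({u, v}) to the color of the edge uv.
    """
    for a, b, c in combinations(vertices, 3):
        colors = {
            color[frozenset((a, b))],
            color[frozenset((a, c))],
            color[frozenset((b, c))],
        }
        if len(colors) == 3:
            return True
    return False


def coloring(pairs):
    # Helper: build the edge-color dict from ((u, v), color) pairs.
    return {frozenset(edge): col for edge, col in pairs}


vertices = [1, 2, 3, 4]
gallai = coloring([((1, 2), "red"), ((1, 3), "red"), ((2, 3), "blue"),
                   ((1, 4), "green"), ((2, 4), "green"), ((3, 4), "green")])
# Recoloring edge {1, 3} makes the triangle 1-2-3 use three colors.
rainbow = dict(gallai)
rainbow[frozenset((1, 3))] = "green"
```

The check enumerates all vertex triples, so it runs in cubic time; it is meant only to illustrate the forbidden configuration.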

preprint · 2020 · arXiv

Convexity deficit of benzenoids

In 2012, a family of benzenoids was introduced by Cruz, Gutman, and Rada, which they called convex benzenoids. In this paper we introduce the convexity deficit, a new topological index intended for benzenoids and, more generally, fusenes. This index measures by how much a given fusene departs from convexity. It is defined in terms of the boundary-edges code. In particular, convex benzenoids are exactly the benzenoids having convexity deficit equal to 0. Quasi-convex benzenoids form the family of non-convex benzenoids that are closest to convex, i.e., they have convexity deficit equal to 1. Finally, we investigate convexity deficit of several important families of benzenoids.

preprint · 2020 · arXiv

Exact-$2$-Relation Graphs

Pairwise compatibility graphs (PCGs) with non-negative integer edge weights recently have been used to describe rare evolutionary events and scenarios with horizontal gene transfer. Here we consider the case that vertices are separated by exactly two discrete events: Given a tree $T$ with leaf set $L$ and edge-weights $λ: E(T)\to\mathbb{N}_0$, the non-negative integer pairwise compatibility graph $\textrm{nniPCG}(T,λ,2,2)$ has vertex set $L$ and $xy$ is an edge whenever the sum of the non-negative integer weights along the unique path from $x$ to $y$ in $T$ equals $2$. A graph $G$ has a representation as $\textrm{nniPCG}(T,λ,2,2)$ if and only if its point-determining quotient $G/\!\rthin$ is a block graph, where two vertices are in relation $\rthin$ if they have the same neighborhood in $G$. If $G$ is of this type, a labeled tree $(T,λ)$ explaining $G$ can be constructed efficiently. In addition, we consider an oriented version of this class of graphs.
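As a rough illustration of the definition of $\textrm{nniPCG}(T,λ,2,2)$, the Python sketch below builds the graph on the leaf set in which two leaves are adjacent exactly when the weights along their tree path sum to $2$. The function name `exact_two_graph`, the edge-dictionary encoding of $(T,λ)$, and the toy star tree are assumptions made for this example. It covers only the forward construction, not the block-graph characterization.

```python
from itertools import combinations


def exact_two_graph(tree_edges, leaves):
    """Return the edges {x, y} between leaves at path weight exactly 2.

    tree_edges: dict mapping (u, v) to a non-negative integer weight.
    leaves:     iterable of leaf vertices of the tree.
    """
    adj = {}
    for (u, v), w in tree_edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def distances_from(src):
        # Paths in a tree are unique, so one traversal gives all path weights.
        dist = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        return dist

    dist = {x: distances_from(x) for x in leaves}
    return {frozenset((x, y))
            for x, y in combinations(sorted(leaves), 2)
            if dist[x][y] == 2}


# Hypothetical toy tree: a star with center "c" and four weighted leaves.
tree_edges = {("c", "a"): 1, ("c", "b"): 1, ("c", "d"): 0, ("c", "e"): 2}
edges = exact_two_graph(tree_edges, ["a", "b", "d", "e"])
```

Here `a` and `b` are adjacent (1 + 1), as are `d` and `e` (0 + 2), while all other leaf pairs have path weight 1 or 3.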

preprint · 2020 · arXiv

From Best Hits to Best Matches

Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distance data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations.

preprint · 2020 · arXiv

Hierarchical and Modularly-Minimal Vertex Colorings

Cographs are exactly the hereditarily well-colored graphs, i.e., the graphs for which a greedy vertex coloring of every induced subgraph uses only the minimally necessary number of colors $χ(G)$. We show that greedy colorings are a special case of the more general hierarchical vertex colorings, which recently were introduced in phylogenetic combinatorics. Replacing cotrees by modular decomposition trees generalizes the concept of hierarchical colorings to arbitrary graphs. We show that every graph has a modularly-minimal coloring $σ$ satisfying $|σ(M)|=χ(M)$ for every strong module $M$ of $G$. This, in particular, shows that modularly-minimal colorings provide a useful device to design efficient coloring algorithms for certain hereditary graph classes. For cographs, the hierarchical colorings coincide with the modularly-minimal coloring. As a by-product, we obtain a simple linear-time algorithm to compute a modularly-minimal coloring of $P_4$-sparse graphs.

preprint · 2020 · arXiv

Superbubbles as an Empirical Characteristic of Directed Networks

Superbubbles are acyclic induced subgraphs of a digraph with single entrance and exit that naturally arise in the context of genome assembly and the analysis of genome alignments in computational biology. These structures can be computed in linear time and are confined to non-symmetric digraphs. We demonstrate empirically that graph parameters derived from superbubbles provide a convenient means of distinguishing different classes of real-world graphical models, while being largely unrelated to simple, commonly used parameters.

preprint · 2013 · arXiv

Distribution of graph-distances in Boltzmann ensembles of RNA secondary structures

Large RNA molecules often carry multiple functional domains whose spatial arrangement is an important determinant of their function. Pre-mRNA splicing, furthermore, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium therefore provides useful information on the overall shape of the molecule and can provide insights into the interplay of its functional domains. Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between arbitrary nucleotides can be computed in polynomial time by means of dynamic programming. A naive implementation would yield recursions with a very high time complexity of $O(n^{11})$. Although we were able to reduce this to $O(n^6)$ for many practical applications, a further reduction seems difficult. We conclude, therefore, that sampling approaches, which are much easier to implement, are also theoretically favorable for most real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules.

preprint · 2013 · arXiv

On the Complexity of Reconstructing Chemical Reaction Networks

The analysis of the structure of chemical reaction networks is crucial for a better understanding of chemical processes. Such networks are well described as hypergraphs. However, due to the available methods, analyses regarding network properties are typically made on standard graphs derived from the full hypergraph description, e.g., on the so-called species and reaction graphs. However, a reconstruction of the underlying hypergraph from these graphs is not necessarily unique. In this paper, we address the problem of reconstructing a hypergraph from its species and reaction graph and show NP-completeness of the problem in its Boolean formulation. Furthermore, we study the problem empirically on random and real-world instances in order to investigate its computational limits in practice.

preprint · 2012 · arXiv

Inferring Chemical Reaction Patterns Using Rule Composition in Graph Grammars

Modeling molecules as undirected graphs and chemical reactions as graph rewriting operations is a natural and convenient approach to modeling chemistry. Graph grammar rules are most naturally employed to model elementary reactions like merging, splitting, and isomerisation of molecules. It is often convenient, in particular in the analysis of larger systems, to summarize several subsequent reactions into a single composite chemical reaction. We use a generic approach for composing graph grammar rules to define chemically useful rule compositions. We iteratively apply these rule compositions to elementary transformations in order to automatically infer complex transformation patterns. This is useful for instance to understand the net effect of complex catalytic cycles such as the Formose reaction. The automatically inferred graph grammar rule is a generic representative that also covers the overall reaction pattern of the Formose cycle, namely two carbonyl groups that can react with a bound glycolaldehyde to a second glycolaldehyde. Rule composition also can be used to study polymerization reactions as well as more complicated iterative reaction schemes. Terpenes and the polyketides, for instance, form two naturally occurring classes of compounds of utmost pharmaceutical interest that can be understood as "generalized polymers" consisting of five-carbon (isoprene) and two-carbon units, respectively.

preprint · 2012 · arXiv

Landscape encodings enhance optimization

Hard combinatorial optimization problems deal with the search for the minimum cost solutions (ground states) of discrete systems under strong constraints. A transformation of state variables may enhance computational tractability. It has been argued that these state encodings are to be chosen invertible to retain the original size of the state space. Here we show how redundant non-invertible encodings enhance optimization by enriching the density of low-energy states. In addition, smooth landscapes may be established on encoded state spaces to guide local search dynamics towards the ground state.

preprint · 2012 · arXiv

Relations Between Graphs

Given two graphs G and H, we ask under which conditions there is a relation R that generates the edges of H given the structure of graph G. This construction can be seen as a form of multihomomorphism. It generalizes surjective homomorphisms of graphs and naturally leads to notions of R-retractions, R-cores, and R-cocores of graphs. Both R-cores and R-cocores of graphs are unique up to isomorphism and can be computed in polynomial time.

preprint · 2011 · arXiv

Maximizing Output and Recognizing Autocatalysis in Chemical Reaction Networks is NP-Complete

Background: A classical problem in metabolic design is to maximize the production of a desired compound in a given chemical reaction network by appropriately directing the mass flow through the network. Computationally, this problem is addressed as a linear optimization problem over the "flux cone". The prior construction of the flux cone is computationally expensive and no polynomial-time algorithms are known. Results: Here we show that the output maximization problem in chemical reaction networks is NP-complete. This statement remains true even if all reactions are monomolecular or bimolecular and if only a single molecular species is used as influx. As a corollary we show, furthermore, that the detection of autocatalytic species, i.e., types that can only be produced from the influx material when they are present in the initial reaction mixture, is an NP-complete computational problem. Conclusions: Hardness results on combinatorial problems and optimization problems are important to guide the development of computational tools for the analysis of metabolic networks in particular and chemical reaction networks in general. Our results indicate that efficient heuristics and approximate algorithms need to be employed for the analysis of large chemical networks since even conceptually simple flow problems are provably intractable.