Researcher profile

Gregory J. Puleo

Gregory J. Puleo contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
16works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

16 published item(s)

preprint2016arXiv

A new correlation clustering method for cancer mutation analysis

Cancer genomes exhibit a large number of different alterations that affect many genes in a diverse manner. It is widely believed that these alterations follow combinatorial patterns that have a strong connection with the underlying molecular interaction networks and functional pathways. A better understanding of the generative mechanisms behind the mutation rules and their influence on gene communities is of great importance for the process of driver mutations discovery and for identification of network modules related to cancer development and progression. We developed a new method for cancer mutation pattern analysis based on a constrained form of correlation clustering. Correlation clustering is an agnostic learning method that can be used for general community detection problems in which the number of communities or their structure is not known beforehand. The resulting algorithm, named $C^3$, leverages mutual exclusivity of mutations, patient coverage, and driver network concentration principles; it accepts as its input a user determined combination of heterogeneous patient data, such as that available from TCGA (including mutation, copy number, and gene expression information), and creates a large number of clusters containing mutually exclusive mutated genes in a particular type of cancer. The cluster sizes may be required to obey some useful soft size constraints, without impacting the computational complexity of the algorithm. To test $C^3$, we performed a detailed analysis on TCGA breast cancer and glioblastoma data and showed that our algorithm outperforms the state-of-the-art CoMEt method in terms of discovering mutually exclusive gene modules and identifying driver genes. Our $C^3$ method represents a unique tool for efficient and reliable identification of mutation patterns and driver pathways in large-scale cancer genomics studies.

preprint2016arXiv

Correlation Clustering and Biclustering with Locally Bounded Errors

We consider a generalized version of the correlation clustering problem, defined as follows. Given a complete graph $G$ whose edges are labeled with $+$ or $-$, we wish to partition the graph into clusters while trying to avoid errors: $+$ edges between clusters or $-$ edges within clusters. Classically, one seeks to minimize the total number of such errors. We introduce a new framework that allows the objective to be a more general function of the number of errors at each vertex (for example, we may wish to minimize the number of errors at the worst vertex) and provide a rounding algorithm which converts "fractional clusterings" into discrete clusterings while causing only a constant-factor blowup in the number of errors at each vertex. This rounding algorithm yields constant-factor approximation algorithms for the discrete problem under a wide variety of objective functions.

preprint2016arXiv

On an Edge Precoloring Conjecture

Edwards, van den Heuvel, Kang, and Sereni conjectured the following strengthening of Vizing's Theorem: let $G$ be a simple graph, and let $K = Δ(G) + 1$. For any matching $M$ in $G$ and any precoloring of the edges in $M$ using the colors $\{1, \ldots, K\}$, there is some proper $K$-edge-coloring of $G$ extending the given precoloring. We give an infinite family of counterexamples to this conjecture, and prove a weaker version of the conjecture proposed in the same work.

preprint2015arXiv

Codes for DNA Sequence Profiles

We consider the problem of storing and retrieving information from synthetic DNA media. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on their collection of substrings observed through a noisy channel. This problem of reconstructing sequences from traces was first investigated in the noiseless setting under the name of "Markov type" analysis. Here, we explain the connection between the reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.

preprint2015arXiv

Codes for DNA Storage Channels

We consider the problem of assembling a sequence based on a collection of its substrings observed through a noisy channel. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on a collection of their substrings observed through a noisy channel. We explain the connection between the sequence reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.

preprint2015arXiv

Complexity of a Disjoint Matching Problem on Bipartite Graphs

We consider the following question: given an $(X,Y)$-bigraph $G$ and a set $S \subset X$, does $G$ contain two disjoint matchings $M_1$ and $M_2$ such that $M_1$ saturates $X$ and $M_2$ saturates $S$? When $|S|\geq |X|-1$, this question is solvable by finding an appropriate factor of the graph. In contrast, we show that when $S$ is allowed to be an arbitrary subset of $X$, the problem is NP-hard.

preprint2015arXiv

Correlation Clustering with Constrained Cluster Sizes and Extended Weights Bounds

We consider the problem of correlation clustering on graphs with constraints on both the cluster sizes and the positive and negative weights of edges. Our contributions are twofold: First, we introduce the problem of correlation clustering with bounded cluster sizes. Second, we extend the regime of weight values for which the clustering may be performed with constant approximation guarantees in polynomial time and apply the results to the bounded cluster size problem.

preprint2015arXiv

Extremal Aspects of the Erdős--Gallai--Tuza Conjecture

Erdős, Gallai, and Tuza posed the following problem: given an $n$-vertex graph $G$, let $τ_1(G)$ denote the smallest size of a set of edges whose deletion makes $G$ triangle-free, and let $α_1(G)$ denote the largest size of a set of edges containing at most one edge from each triangle of $G$. Is it always the case that $α_1(G) + τ_1(G) \leq n^2/4$? We also consider a variant on this conjecture: if $τ_B(G)$ is the smallest size of an edge set whose deletion makes $G$ bipartite, does the stronger inequality $α_1(G) + τ_B(G) \leq n^2/4$ always hold? By considering the structure of a minimal counterexample to each version of the conjecture, we obtain two main results. Our first result states that any minimum counterexample to the original Erdős--Gallai--Tuza Conjecture has "dense edge cuts", and in particular has minimum degree greater than $n/2$. This implies that the conjecture holds for all graphs if and only if it holds for all triangular graphs (graphs where every edge lies in a triangle). Our second result states that $α_1(G) + τ_B(G) \leq n^2/4$ whenever $G$ has no induced subgraph isomorphic to $K_4^-$, the graph obtained from the complete graph $K_4$ by deleting an edge. Thus, the original conjecture also holds for such graphs.

preprint2015arXiv

Favaron's Theorem, k-dependence, and Tuza's Conjecture

A vertex set $D$ in a graph $G$ is $k$-dependent if $G[D]$ has maximum degree at most $k-1$, and $k$-dominating if every vertex outside $D$ has at least $k$ neighbors in $D$. Favaron proved that if $D$ is a $k$-dependent set maximizing the quantity $k|D| - |E(G[D])|$, then $D$ is $k$-dominating. We extend this result, showing that such sets satisfy a stronger structural property, and we find a surprising connection between Favaron's theorem and a conjecture of Tuza regarding packing and covering of triangles.

preprint2015arXiv

Tuza's Conjecture for Graphs of Maximum Average Degree Less Than 7

Tuza's Conjecture states that if a graph $G$ does not contain more than $k$ edge-disjoint triangles, then some set of at most $2k$ edges meets all triangles of $G$. We prove Tuza's Conjecture for all graphs $G$ having no subgraph with average degree at least $7$. As a key tool in the proof, we introduce a notion of reducible sets for Tuza's Conjecture; these are substructures which cannot occur in a minimal counterexample to Tuza's Conjecture. We also introduce weak König--Egerváry graphs, a generalization of the well-studied König--Egerváry graphs.

preprint2014arXiv

Choosability with Separation in Complete Multipartite Graphs

We show that there is a constant $k$ such that when $r \geq 2$ and $m \geq r^k$, the complete $r$-partite graph $K_{m*r}$ has a non-colorable list assignment $L$ such that $|L(v)| \geq \frac{7}{750}r\ln m$ for all $v$ and such that $|L(u) \cap L(v)| \leq \left\lfloor \frac{2r}{r-1} \right\rfloor$ whenever $u \neq v$. This roughly extends a result of Alon to the context of "choosability with separation", introduced by Kratochvíl, Tuza, and Voigt.

preprint2014arXiv

Computing Similarity Distances Between Rankings

We address the problem of computing distances between rankings that take into account similarities between candidates. The need for evaluating such distances is governed by applications as diverse as rank aggregation, bioinformatics, social sciences and data storage. The problem may be summarized as follows: Given two rankings and a positive cost function on transpositions that depends on the similarity of the candidates involved, find a smallest cost sequence of transpositions that converts one ranking into another. Our focus is on costs that may be described via special metric-tree structures and on complete rankings modeled as permutations. The presented results include a quadratic-time algorithm for finding a minimum cost decomposition for simple cycles, and a quadratic-time, $4/3$-approximation algorithm for permutations that contain multiple cycles. The proposed methods rely on investigating a newly introduced balancing property of cycles embedded in trees, cycle-merging methods, and shortest path optimization techniques.

preprint2014arXiv

On a Conjecture of Erdős, Gallai, and Tuza

Erdős, Gallai, and Tuza posed the following problem: given an $n$-vertex graph $G$, let $τ_1(G)$ denote the smallest size of a set of edges whose deletion makes $G$ triangle-free, and let $α_1(G)$ denote the largest size of a set of edges containing at most one edge from each triangle of $G$. Is it always the case that $α_1(G) + τ_1(G) \leq n^2/4$? We have two main results. We first obtain the upper bound $α_1(G) + τ_1(G) \leq 5n^2/16$, as a partial result towards the Erdős--Gallai--Tuza conjecture. We also show that always $α_1(G) \leq n^2/2 - m$, where $m$ is the number of edges in $G$; this bound is sharp in several notable cases.

preprint2013arXiv

Environmental Evolutionary Graph Theory

Understanding the influence of an environment on the evolution of its resident population is a major challenge in evolutionary biology. Great progress has been made in homogeneous population structures while heterogeneous structures have received relatively less attention. Here we present a structured population model where different individuals are best suited to different regions of their environment. The underlying structure is a graph: individuals occupy vertices, which are connected by edges. If an individual is suited for their vertex, they receive an increase in fecundity. This framework allows attention to be restricted to the spatial arrangement of suitable habitat. We prove some basic properties of this model and find some counter-intuitive results. Notably, 1) the arrangement of suitable sites is as important as their proportion, and, 2) decreasing the proportion of suitable sites may result in a decrease in the fixation time of an allele.

preprint2012arXiv

Revolutionaries and spies: Spy-good and spy-bad graphs

We study a game on a graph $G$ played by $r$ {\it revolutionaries} and $s$ {\it spies}. Initially, revolutionaries and then spies occupy vertices. In each subsequent round, each revolutionary may move to a neighboring vertex or not move, and then each spy has the same option. The revolutionaries win if $m$ of them meet at some vertex having no spy (at the end of a round); the spies win if they can avoid this forever. Let $σ(G,m,r)$ denote the minimum number of spies needed to win. To avoid degenerate cases, assume $|V(G)|\ge r-m+1\ge\floor{r/m}\ge 1$. The easy bounds are then $\floor{r/m}\le σ(G,m,r)\le r-m+1$. We prove that the lower bound is sharp when $G$ has a rooted spanning tree $T$ such that every edge of $G$ not in $T$ joins two vertices having the same parent in $T$. As a consequence, $σ(G,m,r)\leγ(G)\floor{r/m}$, where $γ(G)$ is the domination number; this bound is nearly sharp when $γ(G)\le m$. For the random graph with constant edge-probability $p$, we obtain constants $c$ and $c&#39;$ (depending on $m$ and $p$) such that $σ(G,m,r)$ is near the trivial upper bound when $r<c\ln n$ and at most $c&#39;$ times the trivial lower bound when $r>c&#39;\ln n$. For the hypercube $Q_d$ with $d\ge r$, we have $σ(G,m,r)=r-m+1$ when $m=2$, and for $m\ge 3$ at least $r-39m$ spies are needed. For complete $k$-partite graphs with partite sets of size at least $2r$, the leading term in $σ(G,m,r)$ is approximately $\frac{k}{k-1}\frac{r}{m}$ when $k\ge m$. For $k=2$, we have $σ(G,2,r)=\bigl\lceil{\frac{\floor{7r/2}-3}5}\bigr\rceil$ and $σ(G,3,r)=\floor{r/2}$, and in general $\frac{3r}{2m}-3\le σ(G,m,r)\le\frac{(1+1/\sqrt3)r}{m}$.

preprint2011arXiv

Chain-making games in grid-like posets

We study the Maker-Breaker game on the hypergraph of chains of fixed size in a poset. In a product of chains, the maximum size of a chain that Maker can guarantee building is $k-\lfloor r/2\rfloor$, where $k$ is the maximum size of a chain in the product, and $r$ is the maximum size of a factor chain. We also study a variant in which Maker must follow the chain in order, called the {\it Walker-Blocker game}. In the poset consisting of the bottom $k$ levels of the product of $d$ arbitrarily long chains, Walker can guarantee a chain that hits all levels if $d\ge14$; this result uses a solution to Conway&#39;s Angel-Devil game. When d=2, the maximum that Walker can guarantee is only 2/3 of the levels, and 2/3 is asymptotically achievable in the product of two equal chains.