Source author record

Yusu Wang

Yusu Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computational Geometry; Machine Learning; math.AT; math.CO; Artificial Intelligence; Data Structures and Algorithms; math.MG; Computational Complexity; Computer Vision; math.PR; math.ST; Neural and Evolutionary Computing; physics.chem-ph; physics.comp-ph; Statistics Theory

Catalog footprint

What is connected

26 works

15 topics

4 close collaborators


preprint · 2022 · arXiv

A simple yet effective baseline for non-attributed graph classification

Graphs are complex objects that do not lend themselves easily to typical learning tasks. Recently, a range of approaches based on graph kernels or graph neural networks have been developed for graph classification and for representation learning on graphs in general. As the developed methodologies become more sophisticated, it is important to understand which components of the increasingly complex methods are necessary or most effective. As a first step, we develop a simple yet meaningful graph representation, and explore its effectiveness in graph classification. We test our baseline representation for the graph classification task on a range of graph datasets. Interestingly, this simple representation achieves similar performance as the state-of-the-art graph kernels and graph neural networks for non-attributed graph classification. Its performance on classifying attributed graphs is slightly weaker as it does not incorporate attributes. However, given its simplicity and efficiency, we believe that it still serves as an effective baseline for attributed graph classification. Our graph representation is efficient (linear-time) to compute. We also provide a simple connection with the graph neural networks. Note that these observations are only for the task of graph classification while existing methods are often designed for a broader scope including node embedding and link prediction. The results are also likely biased due to the limited amount of benchmark datasets available. Nevertheless, the good performance of our simple baseline calls for the development of new, more comprehensive benchmark datasets so as to better evaluate and analyze different graph learning methods. Furthermore, given the computational efficiency of our graph summary, we believe that it is a good candidate as a baseline method for future graph classification (or even other graph learning) studies.

preprint · 2022 · arXiv

Generative Coarse-Graining of Molecular Conformations

Coarse-graining (CG) of molecular simulations simplifies the particle representation by grouping selected atoms into pseudo-beads and drastically accelerates simulation. However, such a CG procedure induces information loss, which makes accurate backmapping, i.e., restoring fine-grained (FG) coordinates from CG coordinates, a long-standing challenge. Inspired by the recent progress in generative models and equivariant networks, we propose a novel model that rigorously embeds the vital probabilistic nature and geometric consistency requirements of the backmapping transformation. Our model encodes the FG uncertainties into an invariant latent space and decodes them back to FG geometries via equivariant convolutions. To standardize the evaluation of this domain, we provide three comprehensive benchmarks based on molecular dynamics trajectories. Experiments show that our approach always recovers more realistic structures and outperforms existing data-driven methods by a significant margin.

preprint · 2022 · arXiv

Intrinsic Interleaving Distance for Merge Trees

Merge trees are a type of graph-based topological summary that tracks the evolution of connected components in the sublevel sets of scalar functions. They enjoy widespread applications in data analysis and scientific visualization. In this paper, we consider the problem of comparing two merge trees via the notion of interleaving distance in the metric space setting. We investigate various theoretical properties of such a metric. In particular, we show that the interleaving distance is intrinsic on the space of labeled merge trees and provide an algorithm to construct metric 1-centers for collections of labeled merge trees. We further prove that the intrinsic property of the interleaving distance also holds for the space of unlabeled merge trees. Our results are a first step toward performing statistics on graph-based topological summaries.

preprint · 2022 · arXiv

On the clique number of noisy random geometric graphs

Let $G_n$ be a random geometric graph. For $q,p \in [0,1)$ we construct a "$(q,p)$-perturbed noisy random geometric graph" $G_n^{q,p}$, in which each existing edge in $G_n$ is removed with probability $q$ and each non-existent edge in $G_n$ is inserted with probability $p$. We give asymptotically tight bounds on the clique number $ω\left(G_n^{q,p}\right)$ for several parameter regimes.
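The perturbation described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: points are drawn uniformly in the unit square, two points are joined when they lie within a radius `r`, and the names `perturbed_rgg` and `clique_number` are hypothetical helpers, not code from the paper. The clique search is brute force and only meant for very small graphs.

```python
import itertools
import random


def perturbed_rgg(n, r, q, p, seed=0):
    """Sample a random geometric graph, then flip edges independently.

    Assumption: points are uniform in the unit square and joined when
    their Euclidean distance is at most r.
    Returns (base_edges, noisy_edges) as sets of pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    base, noisy = set(), set()
    for i, j in itertools.combinations(range(n), 2):
        dx = pts[i][0] - pts[j][0]
        dy = pts[i][1] - pts[j][1]
        if dx * dx + dy * dy <= r * r:
            base.add((i, j))
            # An existing edge is removed with probability q.
            if rng.random() >= q:
                noisy.add((i, j))
        elif rng.random() < p:
            # A non-existent edge is inserted with probability p.
            noisy.add((i, j))
    return base, noisy


def clique_number(n, edges):
    """Brute-force clique number; exponential time, small n only."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    best = 0

    def extend(size, candidates):
        nonlocal best
        best = max(best, size)
        for v in sorted(candidates):
            # Only grow cliques in increasing vertex order.
            extend(size + 1, {u for u in candidates & adj[v] if u > v})

    extend(0, set(range(n)))
    return best
```

Calling `perturbed_rgg(12, 0.4, 0.1, 0.05)` and passing either edge set to `clique_number(12, ...)` gives a quick way to compare the clique number before and after perturbation on toy instances.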

preprint · 2022 · arXiv

Persistent Laplacians: properties, algorithms and implications

We present a thorough study of the theoretical properties and devise efficient algorithms for the \emph{persistent Laplacian}, an extension of the standard combinatorial Laplacian to the setting of pairs (or, in more generality, sequences) of simplicial complexes $K \hookrightarrow L$, which was independently introduced by Lieutier et al. and by Wang et al. In particular, in analogy with the non-persistent case, we first prove that the nullity of the $q$-th persistent Laplacian $Δ_q^{K,L}$ equals the $q$-th persistent Betti number of the inclusion $(K \hookrightarrow L)$. We then present an initial algorithm for finding a matrix representation of $Δ_q^{K,L}$, which itself helps interpret the persistent Laplacian. We exhibit a novel relationship between the persistent Laplacian and the notion of Schur complement of a matrix which has several important implications. In the graph case, it both uncovers a link with the notion of effective resistance and leads to a persistent version of the Cheeger inequality. This relationship also yields an additional, very simple algorithm for finding (a matrix representation of) the $q$-th persistent Laplacian which in turn leads to a novel and fundamentally different algorithm for computing the $q$-th persistent Betti number for a pair $(K,L)$ which can be significantly more efficient than standard algorithms. Finally, we study persistent Laplacians for simplicial filtrations and present novel stability results for their eigenvalues. Our work brings methods from spectral graph theory, circuit theory, and persistent homology together with a topological view of the combinatorial Laplacian on simplicial complexes.
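A toy illustration of the Schur complement relationship in the graph case ($q = 0$) may help. The sketch below assumes that, for graphs $K \hookrightarrow L$, the persistent Laplacian can be obtained by taking the Laplacian of $L$ and eliminating the vertices that are not in $K$; that reading of the abstract, and all function names, are assumptions rather than the paper's implementation. Exact rational arithmetic is used so the kernel dimension can be computed without tolerances.

```python
from fractions import Fraction


def graph_laplacian(n, edges):
    """Combinatorial Laplacian of an unweighted graph on vertices 0..n-1."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    return lap


def schur_onto(lap, keep):
    """Schur complement onto `keep`, eliminating one vertex at a time.

    For a symmetric positive semidefinite matrix, a zero pivot means the
    whole row is zero, so that vertex can simply be dropped.
    """
    m = [row[:] for row in lap]
    alive = list(range(len(m)))
    for v in range(len(m)):
        if v in keep:
            continue
        alive.remove(v)
        pivot = m[v][v]
        if pivot == 0:
            continue
        for i in alive:
            for j in alive:
                m[i][j] -= m[i][v] * m[v][j] / pivot
    return [[m[i][j] for j in alive] for i in alive]


def nullity(mat):
    """Dimension of the kernel, via exact Gaussian elimination."""
    m = [row[:] for row in mat]
    n = len(m)
    rank = 0
    for col in range(n):
        piv = next((r for r in range(rank, n) if m[r][col] != 0), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        for r in range(rank + 1, n):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return n - rank
```

For the path 0-1-2 with $K$ consisting of the two end vertices, `schur_onto(graph_laplacian(3, [(0, 1), (1, 2)]), {0, 2})` is the Laplacian of a single edge of weight 1/2, matching an effective resistance of 2 between the endpoints, and its nullity is 1 because the two vertices of $K$ lie in one component of $L$.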

preprint · 2022 · arXiv

Weisfeiler-Lehman meets Gromov-Wasserstein

The Weisfeiler-Lehman (WL) test is a classical procedure for graph isomorphism testing. The WL test has also been widely used both for designing graph kernels and for analyzing graph neural networks. In this paper, we propose the Weisfeiler-Lehman (WL) distance, a notion of distance between labeled measure Markov chains (LMMCs), of which labeled graphs are special cases. The WL distance is polynomial time computable and is also compatible with the WL test in the sense that the former is positive if and only if the WL test can distinguish the two involved graphs. The WL distance captures and compares subtle structures of the underlying LMMCs and, as a consequence of this, it is more discriminating than the distance between graphs used for defining the state-of-the-art Wasserstein Weisfeiler-Lehman graph kernel. Inspired by the structure of the WL distance we identify a neural network architecture on LMMCs which turns out to be universal w.r.t. continuous functions defined on the space of all LMMCs (which includes all graphs) endowed with the WL distance. Finally, the WL distance turns out to be stable w.r.t. a natural variant of the Gromov-Wasserstein (GW) distance for comparing metric Markov chains that we identify. Hence, the WL distance can also be construed as a polynomial time lower bound for the GW distance which is in general NP-hard to compute.
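For readers unfamiliar with the classical WL test mentioned at the start of this abstract, here is a minimal sketch of 1-dimensional color refinement. It illustrates the test only, not the WL distance proposed in the paper; the function names are hypothetical, and colors are stored as nested tuples, which grow quickly and are suitable only for small graphs.

```python
from collections import Counter


def wl_colors(adj, labels, rounds):
    """Run color refinement and return the histogram of final colors.

    adj[v] lists the neighbors of vertex v; labels[v] is its initial label.
    Each round replaces a color by (old color, sorted neighbor colors).
    """
    colors = list(labels)
    for _ in range(rounds):
        colors = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(len(adj))
        ]
    return Counter(colors)


def wl_distinguishes(adj_a, adj_b, rounds=None):
    """True if color refinement separates the two unlabeled graphs."""
    if len(adj_a) != len(adj_b):
        return True
    if rounds is None:
        rounds = len(adj_a)
    hist_a = wl_colors(adj_a, [0] * len(adj_a), rounds)
    hist_b = wl_colors(adj_b, [0] * len(adj_b), rounds)
    return hist_a != hist_b
```

A well-known limitation shows up immediately: two disjoint triangles and a 6-cycle are both 2-regular, so `wl_distinguishes` returns `False` for that pair even though the graphs are not isomorphic.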

preprint · 2021 · arXiv

Graph Coarsening with Neural Networks

As large-scale graphs become increasingly prevalent, processing, extracting and analyzing large graph data poses significant computational challenges. Graph coarsening is one popular technique to reduce the size of a graph while maintaining essential properties. Despite rich graph coarsening literature, there is only limited exploration of data-driven methods in the field. In this work, we leverage the recent progress of deep learning on graphs for graph coarsening. We first propose a framework for measuring the quality of a coarsening algorithm and show that, depending on the goal, we need to carefully choose the Laplace operator on the coarse graph and associated projection/lift operators. Motivated by the observation that the current choice of edge weight for the coarse graph may be sub-optimal, we parametrize the weight assignment map with graph neural networks and train it to improve the coarsening quality in an unsupervised way. Through extensive experiments on both synthetic and real networks, we demonstrate that our method significantly improves common graph coarsening methods under various metrics, reduction ratios, graph sizes, and graph types. It generalizes to graphs of larger size ($25\times$ of training graphs), is adaptive to different losses (differentiable and non-differentiable), and scales to much larger graphs than previous work.

preprint · 2021 · arXiv

Ordinally Consensus Subset over Multiple Metrics

In this paper, we propose to study the following maximum ordinal consensus problem: Suppose we are given a metric system (M, X), which contains k metrics M = {ρ_1,..., ρ_k} defined on the same point set X. We aim to find a maximum subset X' of X such that all metrics in M are "consistent" when restricted to the subset X'. In particular, our definition of consistency will rely only on the ordering between pairwise distances, and thus we call a "consistent" subset an ordinal consensus of X w.r.t. M. We will introduce two concepts of "consistency" in the ordinal sense: a strong one and a weak one. Specifically, a subset X' being strongly consistent means that the ordering of its pairwise distances is the same under each input metric ρ_i from M. Weak consistency, on the other hand, relaxes this exact ordering condition, and intuitively allows us to take the plurality of ordering relation between two pairwise distances. We show in this paper that the maximum consensus problems over both the strong and the weak consistency notions are NP-complete, even when there are only 2 or 3 simple metrics, such as line metrics and ultrametrics. We also develop constant-factor approximation algorithms for the dual version, the minimum inconsistent subset problem of a metric system (M, P); note that optimizing these two dual problems is equivalent.

preprint · 2020 · arXiv

A Note on Over-Smoothing for Graph Neural Networks

Graph Neural Networks (GNNs) have achieved a lot of success on graph-structured data. However, it is observed that the performance of graph neural networks does not improve as the number of layers increases. This effect, known as over-smoothing, has been analyzed mostly in linear cases. In this paper, we build upon previous results \cite{oono2019graph} to further analyze the over-smoothing effect in the general graph neural network architecture. We show that when the weight matrix satisfies the conditions determined by the spectrum of the augmented normalized Laplacian, the Dirichlet energy of embeddings will converge to zero, resulting in the loss of discriminative power. Using Dirichlet energy to measure the "expressiveness" of embeddings is conceptually clean; it leads to simpler proofs than \cite{oono2019graph} and can handle more non-linearities.
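The quantity at the center of this abstract can be made concrete with a small sketch. It assumes the Dirichlet energy is $\mathrm{tr}(X^\top \tilde{\Delta} X)$ for the augmented normalized Laplacian $\tilde{\Delta} = I - \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, and it only exercises the simplest linear case (identity weights, no non-linearity), not the general setting the paper analyzes. The function names are hypothetical.

```python
import math


def propagate(adj, feats):
    """One step X <- P X with P = D~^{-1/2} (A + I) D~^{-1/2}."""
    n = len(adj)
    deg = [len(adj[v]) + 1 for v in range(n)]  # degree plus self-loop
    out = []
    for i in range(n):
        row = [0.0] * len(feats[i])
        for j in list(adj[i]) + [i]:
            w = 1.0 / math.sqrt(deg[i] * deg[j])
            for k, x in enumerate(feats[j]):
                row[k] += w * x
        out.append(row)
    return out


def dirichlet_energy(adj, feats):
    """Sum over edges of ||x_i / sqrt(1 + d_i) - x_j / sqrt(1 + d_j)||^2.

    This equals tr(X^T (I - P) X); the self-loop terms cancel.
    """
    n = len(adj)
    deg = [len(adj[v]) + 1 for v in range(n)]
    total = 0.0
    for i in range(n):
        for j in adj[i]:
            if i < j:
                for a, b in zip(feats[i], feats[j]):
                    diff = a / math.sqrt(deg[i]) - b / math.sqrt(deg[j])
                    total += diff * diff
    return total
```

Repeatedly applying `propagate` to any feature matrix on a connected graph and recording `dirichlet_energy` after each step shows the energy shrinking toward zero, which is the over-smoothing behavior in its most basic form.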

preprint · 2020 · arXiv

An efficient algorithm for $1$-dimensional (persistent) path homology

This paper focuses on developing an efficient algorithm for analyzing a directed network (graph) from a topological viewpoint. A prevalent technique for such topological analysis involves computation of homology groups and their persistence. These concepts are well suited for spaces that are not directed. As a result, one needs a concept of homology that accommodates orientations in input space. Path-homology developed for directed graphs by Grigor'yan, Lin, Muranov and Yau has been effectively adapted for this purpose recently by Chowdhury and Mémoli. They also give an algorithm to compute this path-homology. Our main contribution in this paper is an algorithm that computes this path-homology and its persistence more efficiently for the $1$-dimensional ($H_1$) case. In developing such an algorithm, we discover various structures and their efficient computations that aid computing the $1$-dimensional path-homology. We implement our algorithm and present some preliminary experimental results.

preprint · 2020 · arXiv

Detection and skeletonization of single neurons and tracer injections using topological methods

Neuroscientific data analysis has traditionally relied on linear algebra and stochastic process theory. However, the tree-like shapes of neurons cannot be described easily as points in a vector space (the subtraction of two neuronal shapes is not a meaningful operation), and methods from computational topology are better suited to their analysis. Here we introduce methods from Discrete Morse (DM) Theory to extract the tree-skeletons of individual neurons from volumetric brain image data, and to summarize collections of neurons labelled by tracer injections. Since individual neurons are topologically trees, it is sensible to summarize the collection of neurons using a consensus tree-shape that provides a richer information summary than the traditional regional 'connectivity matrix' approach. The conceptually elegant DM approach lacks hand-tuned parameters and captures global properties of the data as opposed to previous approaches which are inherently local. For individual skeletonization of sparsely labelled neurons we obtain substantial performance gains over state-of-the-art non-topological methods (over 10% improvements in precision and faster proofreading). The consensus-tree summary of tracer injections incorporates the regional connectivity matrix information, but in addition captures the collective collateral branching patterns of the set of neurons connected to the injection site, and provides a bridge between single-neuron morphology and tracer-injection data.

preprint · 2020 · arXiv

Understanding the Power of Persistence Pairing via Permutation Test

Recently many efforts have been made to incorporate persistence diagrams, one of the major tools in topological data analysis (TDA), into machine learning pipelines. To better understand the power and limitation of persistence diagrams, we carry out a range of experiments on both graph data and shape data, aiming to decouple and inspect the effects of different factors involved. To this end, we also propose the so-called \emph{permutation test} for persistence diagrams to delineate critical values and pairings of critical values. For graph classification tasks, we note that while persistence pairing yields consistent improvement over various benchmark datasets, it appears that for various filtration functions tested, most discriminative power comes from critical values. For shape segmentation and classification, however, we note that persistence pairing shows significant power on most of the benchmark datasets, and improves over both summaries based on merely critical values, and those based on permutation tests. Our results help provide insights on when persistence diagram based summaries could be more suitable.

preprint · 2016 · arXiv

Measuring Distance between Reeb Graphs

One of the prevailing ideas in geometric and topological data analysis is to provide descriptors that encode useful information about hidden objects from observed data. The Reeb graph is one such descriptor for a given scalar function. The Reeb graph provides a simple yet meaningful abstraction of the input domain, and can also be computed efficiently. Given the popularity of the Reeb graph in applications, it is important to understand its stability and robustness with respect to changes in the input function, as well as to be able to compare the Reeb graphs resulting from different functions. In this paper, we propose a metric for Reeb graphs, called the functional distortion distance. Under this distance measure, the Reeb graph is stable against small changes of input functions. At the same time, it remains discriminative in differentiating input functions. In particular, the main result is that the functional distortion distance between two Reeb graphs is bounded from below by (and thus more discriminative than) the bottleneck distance between both the ordinary and extended persistence diagrams for appropriate dimensions. As an application of our results, we analyze a natural simplification scheme for Reeb graphs, and show that persistent features in the Reeb graph remain persistent under simplification. Understanding the stability of important features of the Reeb graph under simplification is an interesting problem in its own right, and critical to the practical usage of Reeb graphs.

preprint · 2016 · arXiv

Multiscale Mapper: A Framework for Topological Summarization of Data and Maps

Summarizing topological information from datasets and maps defined on them is a central theme in topological data analysis. \textsf{Mapper}, a tool for such summarization, takes as input both a possibly high dimensional dataset and a map defined on the data, and produces a summary of the data by using a cover of the codomain of the map. This cover, via a pullback operation to the domain, produces a simplicial complex connecting the data points. The resulting view of the data through a cover of the codomain offers flexibility in analyzing the data. However, it offers only a view at a fixed scale at which the cover is constructed. Inspired by the concept, we explore a notion of a tower of covers which induces a tower of simplicial complexes connected by simplicial maps, which we call {\em multiscale mapper}. We study the resulting structure, its stability, and design practical algorithms to compute its associated persistence diagrams efficiently. Specifically, when the domain is a simplicial complex and the map is a real-valued piecewise-linear function, the algorithm can compute the exact persistence diagram only from the 1-skeleton of the input complex. For general maps, we present a combinatorial version of the algorithm that acts only on \emph{vertex sets} connected by the 1-skeleton graph, and this algorithm approximates the exact persistence diagram thanks to a stability result that we show to hold. We also relate the multiscale mapper with the Čech complexes arising from a natural pullback pseudometric defined on the input domain.

preprint · 2016 · arXiv

SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch-collapse

In topological data analysis, point cloud data $P$ extracted from a metric space is often analyzed by computing the persistence diagram or barcodes of a sequence of Rips complexes built on $P$ indexed by a scale parameter. Unfortunately, even for input of moderate size, the size of the Rips complex may become prohibitively large as the scale parameter increases. Starting with the Sparse Rips filtration introduced by Sheehy, some existing methods aim to reduce the size of the complex so as to improve the time efficiency as well. However, as we demonstrate, existing approaches still fall short of scaling well, especially for high dimensional data. In this paper, we investigate the advantages and limitations of existing approaches. Based on insights gained from the experiments, we propose an efficient new algorithm, called SimBa, for approximating the persistent homology of Rips filtrations with quality guarantees. Our new algorithm leverages a batch collapse strategy as well as a new sparse Rips-like filtration. We experiment on a variety of low and high dimensional data sets. We show that our strategy presents a significant size reduction, and our algorithm for approximating Rips filtration persistence is an order of magnitude faster than existing methods in practice.

preprint · 2015 · arXiv

Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering

Hierarchical clustering is a popular method for analyzing data which associates a tree to a dataset. Hartigan consistency has been used extensively as a framework to analyze such clustering algorithms from a statistical point of view. Still, as we show in the paper, a tree which is Hartigan consistent with a given density can look very different than the correct limit tree. Specifically, Hartigan consistency permits two types of undesirable configurations which we term over-segmentation and improper nesting. Moreover, Hartigan consistency is a limit property and does not directly quantify difference between trees. In this paper we identify two limit properties, separation and minimality, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency. We proceed to introduce a merge distortion metric between hierarchical clusterings and show that convergence in our distance implies both separation and minimality. We also prove that uniform separation and minimality imply convergence in the merge distortion metric. Furthermore, we show that our merge distortion metric is stable under perturbations of the density. Finally, we demonstrate applicability of these concepts by proving convergence results for two clustering algorithms. First, we show convergence (and hence separation and minimality) of the recent robust single linkage algorithm of Chaudhuri and Dasgupta (2010). Second, we provide convergence results on manifolds for topological split tree clustering.

preprint · 2015 · arXiv

Fréchet Distance for Curves, Revisited

We revisit the problem of computing Fréchet distance between polygonal curves under $L_1$, $L_2$, and $L_\infty$ norms, focusing on discrete Fréchet distance, where only distance between vertices is considered. We develop efficient algorithms for two natural classes of curves. In particular, given two polygonal curves of $n$ vertices each, an $\varepsilon$-approximation of their discrete Fréchet distance can be computed in roughly $O(nκ^3\log n/\varepsilon^3)$ time in three dimensions, if one of the curves is \emph{$κ$-bounded}. Previously, only a $κ$-approximation algorithm was known. If both curves are the so-called \emph{backbone curves}, which are widely used to model protein backbones in molecular biology, we can $\varepsilon$-approximate their Fréchet distance in near linear time in two dimensions, and in roughly $O(n^{4/3}\log nm)$ time in three dimensions. In the second part, we propose a pseudo-output-sensitive algorithm for computing Fréchet distance exactly. The complexity of the algorithm is a function of a quantity we call the \emph{\bwnumber{}}, which is quadratic in the worst case, but tends to be much smaller in practice.
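As a reference point for the discrete Fréchet distance discussed here, the following sketch implements the standard quadratic-time dynamic program under the $L_2$ norm. It is the textbook baseline, not the approximation or output-sensitive algorithms of the paper, and the function name is an illustrative choice.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between vertex sequences p and q.

    dp[i][j] is the smallest achievable maximum vertex-to-vertex distance
    over all monotone couplings of p[:i+1] with q[:j+1].
    """
    n, m = len(p), len(q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]
```

For two parallel horizontal polylines one unit apart, such as `[(0, 0), (1, 0), (2, 0)]` and `[(0, 1), (1, 1), (2, 1)]`, the function returns the vertical offset between them.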

preprint · 2015 · arXiv

Metric embedding with outliers

We initiate the study of metric embeddings with \emph{outliers}. Given some metric space $(X,ρ)$ we wish to find a small set of outlier points $K \subset X$ and either an isometric or a low-distortion embedding of $(X\setminus K,ρ)$ into some target metric space. This is a natural problem that captures scenarios where a small fraction of points in the input corresponds to noise. For the case of isometric embeddings we derive polynomial-time approximation algorithms for minimizing the number of outliers when the target space is an ultrametric, a tree metric, or constant-dimensional Euclidean space. The approximation factors are 3, 4 and 2, respectively. For the case of embedding into an ultrametric or tree metric, we further improve the running time to $O(n^2)$ for an $n$-point input metric space, which is optimal. We complement these upper bounds by showing that outlier embedding into ultrametrics, trees, and $d$-dimensional Euclidean space for any $d\geq 2$ are all NP-hard, as well as NP-hard to approximate within a factor better than 2 assuming the Unique Games Conjecture. For the case of non-isometries we consider embeddings with small $\ell_{\infty}$ distortion. We present polynomial-time \emph{bi-criteria} approximation algorithms. Specifically, given some $ε> 0$, let $k_ε$ denote the minimum number of outliers required to obtain an embedding with distortion $ε$. For the case of embedding into ultrametrics we obtain a polynomial-time algorithm which computes a set of at most $3k_ε$ outliers and an embedding of the remaining points into an ultrametric with distortion $O(ε\log n)$. For embedding a metric of unit diameter into constant-dimensional Euclidean space we present a polynomial-time algorithm which computes a set of at most $2k_ε$ outliers and an embedding of the remaining points with distortion $O(\sqrt{ε})$.

preprint · 2015 · arXiv

Parameter-free Topology Inference and Sparsification for Data on Manifolds

In topology inference from data, current approaches face two major problems. One concerns the selection of a correct parameter to build an appropriate complex on top of the data points; the other involves the typically 'large' size of this complex. We address these two issues in the context of inferring homology from sample points of a smooth manifold of known dimension sitting in a Euclidean space $\mathbb{R}^k$. We show that, for a sample size of $n$ points, we can identify a set of $O(n^2)$ points (as opposed to $O(n^{\lceil \frac{k}{2}\rceil})$ Voronoi vertices) approximating a subset of the medial axis that suffices to compute a distance sandwiched between the well known local feature size and the local weak feature size (in fact, the approximating set can be further reduced in size to $O(n)$). This distance, called the lean feature size, helps prune the input set at least to the level of local feature size while making the data locally uniform. The local uniformity in turn helps in building a complex for homology inference on top of the sparsified data without requiring any user-supplied distance threshold. Unlike most topology inference results, ours does not require that the input is dense relative to a {\em global} feature such as {\em reach} or {\em weak feature size}; instead it can be adaptive with respect to the local feature size. We present some empirical evidence in support of our theoretical claims.

preprint · 2015 · arXiv

Topological analysis of scalar fields with outliers

Given a real-valued function $f$ defined over a manifold $M$ embedded in $\mathbb{R}^d$, we are interested in recovering structural information about $f$ from the sole information of its values on a finite sample $P$. Existing methods provide approximation to the persistence diagram of $f$ when geometric noise and functional noise are bounded. However, they fail in the presence of aberrant values, also called outliers, both in theory and practice. We propose a new algorithm that deals with outliers. We handle aberrant functional values with a method inspired from the k-nearest neighbors regression and the local median filtering, while the geometric outliers are handled using the distance to a measure. Combined with topological results on nested filtrations, our algorithm performs robust topological analysis of scalar fields in a wider range of noise models than handled by current methods. We provide theoretical guarantees and experimental results on the quality of our approximation of the sampled scalar field.

preprint · 2014 · arXiv

Computing Topological Persistence for Simplicial Maps

Algorithms for persistent homology and zigzag persistent homology are well-studied for persistence modules where homomorphisms are induced by inclusion maps. In this paper, we propose a practical algorithm for computing persistence under $\mathbb{Z}_2$ coefficients for a sequence of general simplicial maps and show how these maps arise naturally in some applications of topological data analysis. First, we observe that it is not hard to simulate simplicial maps by inclusion maps but not necessarily in a monotone direction. This, combined with the known algorithms for zigzag persistence, provides an algorithm for computing the persistence induced by simplicial maps. Our main result is that the above simple minded approach can be improved for a sequence of simplicial maps given in a monotone direction. A simplicial map can be decomposed into a set of elementary inclusions and vertex collapses--two atomic operations that can be supported efficiently with the notion of simplex annotations for computing persistent homology. A consistent annotation through these atomic operations implies the maintenance of a consistent cohomology basis, hence a homology basis by duality. While the idea of maintaining a cohomology basis through an inclusion is not new, maintaining them through a vertex collapse is new, which constitutes an important atomic operation for simulating simplicial maps. Annotations support the vertex collapse in addition to the usual inclusion quite naturally. Finally, we exhibit an application of this new tool in which we approximate the persistence diagram of a filtration of Rips complexes where vertex collapses are used to tame the blow-up in size.

preprint · 2014 · arXiv

Dimension Detection with Local Homology

Detecting the dimension of a hidden manifold from a point sample has become an important problem in the current data-driven era. Indeed, estimating the shape dimension is often the first step in studying the processes or phenomena associated to the data. Among the many dimension detection algorithms proposed in various fields, a few can provide theoretical guarantee on the correctness of the estimated dimension. However, the correctness usually requires certain regularity of the input: the input points are either uniformly randomly sampled in a statistical setting, or they form the so-called $(\varepsilon,δ)$-sample which can be neither too dense nor too sparse. Here, we propose a purely topological technique to detect dimensions. Our algorithm is provably correct and works under a more relaxed sampling condition: we do not require uniformity, and we also allow Hausdorff noise. Our approach detects dimension by determining local homology. The computation of this topological structure is much less sensitive to the local distribution of points, which leads to the relaxation of the sampling conditions. Furthermore, by leveraging various developments in computational topology, we show that this local homology at a point $z$ can be computed \emph{exactly} for manifolds using Vietoris-Rips complexes whose vertices are confined within a local neighborhood of $z$. We implement our algorithm and demonstrate the accuracy and robustness of our method using both synthetic and real data sets.

preprint · 2014 · arXiv

Strong Equivalence of the Interleaving and Functional Distortion Metrics for Reeb Graphs

The Reeb graph is a construction that studies a topological space through the lens of a real-valued function. It has been widely used in applications; however, its use on real data means that it is desirable and increasingly necessary to have methods for comparing Reeb graphs. Recently, several methods to define metrics on the space of Reeb graphs have been presented. In this paper, we focus on two: the functional distortion distance and the interleaving distance. The former is based on the Gromov--Hausdorff distance, while the latter utilizes the equivalence between Reeb graphs and a particular class of cosheaves. However, both are defined by constructing a near-isomorphism between the two graphs of study. In this paper, we show that the two metrics are strongly equivalent on the space of Reeb graphs. In particular, this gives an immediate proof of bottleneck stability for persistence diagrams in terms of the Reeb graph interleaving distance.

preprint 2013 arXiv

Measuring Similarity Between Curves on 2-Manifolds via Homotopy Area

Measuring the similarity of curves is a fundamental problem arising in many application fields. There has been considerable interest in several such measures, both in Euclidean space and in more general settings such as curves on Riemannian surfaces or curves in the plane minus a set of obstacles. However, so far, efficiently computable similarity measures for curves on general surfaces remain elusive. This paper aims at developing a natural curve similarity measure that can be easily extended and computed for curves on general orientable 2-manifolds. Specifically, we measure similarity between homotopic curves based on how hard it is to deform one curve into the other continuously, and define this "hardness" as the minimum possible surface area swept by a homotopy between the curves. We consider cases where curves are embedded in the plane or on a triangulated orientable surface with genus $g$, and we present efficient algorithms (which are either quadratic or near-linear time, depending on the setting) for both cases.
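To make the "area swept" idea concrete, here is a minimal sketch for the easiest planar situation only: two polylines that share both endpoints and whose concatenation is a simple (non-self-intersecting) closed polygon. In that special case the minimum swept area equals the area of the enclosed region, which the shoelace formula computes. The function name `enclosed_area` is hypothetical, and this is not the paper's algorithm, which handles intersecting curves and surfaces of higher genus.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def enclosed_area(curve_a: List[Point], curve_b: List[Point]) -> float:
    """Area between two polylines with common endpoints.

    Assumption (not checked here): walking along curve_a and back along
    curve_b traces a simple closed polygon.
    """
    if curve_a[0] != curve_b[0] or curve_a[-1] != curve_b[-1]:
        raise ValueError("curves must share both endpoints")

    # Go forward along curve_a, then backward along the interior
    # vertices of curve_b (its endpoints are already in curve_a).
    polygon = curve_a + curve_b[-2:0:-1]

    # Shoelace formula: sum of cross products of consecutive vertices.
    twice_signed_area = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        twice_signed_area += x0 * y1 - x1 * y0
    return abs(twice_signed_area) / 2.0


# Two paths from (0, 0) to (2, 0), one above and one below the x-axis.
upper = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
lower = [(0.0, 0.0), (1.0, -1.0), (2.0, 0.0)]
area = enclosed_area(upper, lower)
```

When the two curves cross each other, the enclosed-area shortcut no longer applies, which is where the quadratic and near-linear algorithms mentioned in the abstract come in.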

preprint 2012 arXiv

Annotating Simplices with a Homology Basis and Its Applications

Let $K$ be a simplicial complex and $g$ the rank of its $p$-th homology group $H_p(K)$ defined with $\mathbb{Z}_2$ coefficients. We show that we can compute a basis $H$ of $H_p(K)$ and annotate each $p$-simplex of $K$ with a binary vector of length $g$ with the following property: the annotations, summed over all $p$-simplices in any $p$-cycle $z$, provide the coordinate vector of the homology class $[z]$ in the basis $H$. The basis and the annotations for all simplices can be computed in $O(n^\omega)$ time, where $n$ is the size of $K$ and $\omega<2.376$ is a quantity so that two $n\times n$ matrices can be multiplied in $O(n^\omega)$ time. The pre-computation of annotations permits answering queries about the independence or the triviality of $p$-cycles efficiently. Using annotations of edges in 2-complexes, we derive better algorithms for computing an optimal basis and optimal homologous cycles in 1-dimensional homology. Specifically, for computing an optimal basis of $H_1(K)$, we improve the time complexity known for the problem from $O(n^4)$ to $O(n^\omega+n^2g^{\omega-1})$. Here $n$ denotes the size of the 2-skeleton of $K$ and $g$ the rank of $H_1(K)$. Computing an optimal cycle homologous to a given 1-cycle is NP-hard even for surfaces, and an algorithm taking $2^{O(g)}n\log n$ time is known for surfaces. We extend this algorithm to work with arbitrary 2-complexes in $O(n^\omega)+2^{O(g)}n^2\log n$ time using annotations.
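The annotation property can be illustrated in the simplest setting, a graph with no 2-simplices, where $p = 1$ and every cycle is its own homology class. The sketch below is an illustration under that assumption and not the paper's matrix-based algorithm: spanning-forest edges get the zero vector, and each remaining edge gets its own unit vector, stored as a bit in an integer. The names `annotate_edges` and `cycle_coordinates` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def annotate_edges(num_vertices: int, edges: List[Edge]) -> Tuple[int, Dict[Edge, int]]:
    """Return (g, annotations) for a graph, working mod 2.

    Each annotation is a bitmask of length g. Forest edges get 0; the
    i-th non-forest edge gets the bit 1 << i.
    """
    parent = list(range(num_vertices))

    def find(v: int) -> int:
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    annotations: Dict[Edge, int] = {}
    rank = 0  # number of independent cycles found so far
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v      # edge joins two components
            annotations[(u, v)] = 0
        else:
            annotations[(u, v)] = 1 << rank  # edge closes a new cycle
            rank += 1
    return rank, annotations


def cycle_coordinates(cycle: List[Edge], annotations: Dict[Edge, int]) -> int:
    """Sum (XOR) the annotations of the edges in a cycle."""
    coords = 0
    for edge in cycle:
        coords ^= annotations[edge]
    return coords


# A square 0-1-2-3 with the diagonal (0, 2).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]
g, ann = annotate_edges(4, edges)
left = cycle_coordinates([(0, 1), (1, 2), (0, 2)], ann)
right = cycle_coordinates([(0, 2), (2, 3), (0, 3)], ann)
outer = cycle_coordinates([(0, 1), (1, 2), (2, 3), (0, 3)], ann)
```

Here the outer cycle is the mod-2 sum of the two triangles, and its coordinate vector is the XOR of theirs. Once annotations are precomputed, a query about a cycle's class costs one pass over its edges, which is the point the abstract makes about answering independence and triviality queries efficiently.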

preprint 2012 arXiv

Graph Laplacians on Singular Manifolds: Toward understanding complex spaces: graph Laplacians on manifolds with singularities and boundaries

Recently, much of the existing work in manifold learning has been done under the assumption that the data is sampled from a manifold without boundaries and singularities or that the functions of interest are evaluated away from such points. At the same time, it can be argued that singularities and boundaries are an important aspect of the geometry of realistic data. In this paper we consider the behavior of graph Laplacians at points at or near boundaries and two main types of other singularities: intersections, where different manifolds come together, and sharp "edges", where a manifold sharply changes direction. We show that the behavior of the graph Laplacian near these singularities is quite different from that in the interior of the manifolds. In fact, a phenomenon somewhat reminiscent of the Gibbs effect in the analysis of Fourier series can be observed in the behavior of the graph Laplacian near such points. Unlike in the interior of the domain, where the graph Laplacian converges to the Laplace-Beltrami operator, near singularities the graph Laplacian tends to a first-order differential operator, which exhibits different scaling behavior as a function of the kernel width. One important implication is that while points near the singularities occupy only a small part of the total volume, the difference in scaling results in a disproportionately large contribution to the total behavior. Another significant finding is that while the scaling behavior of the operator is the same near different types of singularities, they are very distinct at a more refined level of analysis. We believe that a comprehensive understanding of these structures, in addition to the standard case of a smooth manifold, can take us a long way toward better methods for analysis of complex non-linear data and can lead to significant progress in algorithm design.
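The contrast between interior and boundary behavior can be seen numerically in one dimension. The sketch below evaluates a Gaussian-kernel graph Laplacian of $f(x) = x$ on a uniform grid over $[0, 1]$. The normalization $1/(t\sqrt{4\pi t})$ is an assumed, commonly used convention and may differ from the paper's; the function name `graph_laplacian_at` and the grid size are hypothetical choices.

```python
import math
from typing import Callable, List


def graph_laplacian_at(
    x: float, samples: List[float], f: Callable[[float], float], t: float
) -> float:
    """Gaussian-kernel graph Laplacian of f at x with kernel width t.

    Assumed normalization: 1 / (t * sqrt(4*pi*t) * number_of_samples).
    """
    norm = t * math.sqrt(4.0 * math.pi * t) * len(samples)
    total = 0.0
    for y in samples:
        weight = math.exp(-((y - x) ** 2) / (4.0 * t))
        total += weight * (f(y) - f(x))
    return total / norm


def identity(x: float) -> float:
    return x


n = 2001
samples = [i / (n - 1) for i in range(n)]  # uniform grid on [0, 1]

t = 1e-3
interior = graph_laplacian_at(0.5, samples, identity, t)        # interior point
boundary = graph_laplacian_at(0.0, samples, identity, t)        # boundary point
boundary_small_t = graph_laplacian_at(0.0, samples, identity, t / 4.0)
```

In the interior the kernel is symmetric around the point, so the linear function cancels and the value is near zero, consistent with a second-order limit operator ($f'' = 0$). At the boundary the kernel sees only one side, and under this normalization the value is approximately $f'(0)/\sqrt{\pi t}$, so dividing $t$ by 4 roughly doubles it. That growth as $t \to 0$ is the different scaling behavior the abstract refers to.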

26 published item(s)

A simple yet effective baseline for non-attributed graph classification

Generative Coarse-Graining of Molecular Conformations

Intrinsic Interleaving Distance for Merge Trees

On the clique number of noisy random geometric graphs

Persistent Laplacians: properties, algorithms and implications

Weisfeiler-Lehman meets Gromov-Wasserstein

Graph Coarsening with Neural Networks

Ordinally Consensus Subset over Multiple Metrics

A Note on Over-Smoothing for Graph Neural Networks

An efficient algorithm for $1$-dimensional (persistent) path homology

Detection and skeletonization of single neurons and tracer injections using topological methods

Understanding the Power of Persistence Pairing via Permutation Test

Measuring Distance between Reeb Graphs

Multiscale Mapper: A Framework for Topological Summarization of Data and Maps

SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch-collapse

Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering

Fréchet Distance for Curves, Revisited

Metric embedding with outliers

Parameter-free Topology Inference and Sparsification for Data on Manifolds

Topological analysis of scalar fields with outliers

Computing Topological Persistence for Simplicial Maps

Dimension Detection with Local Homology

Strong Equivalence of the Interleaving and Functional Distortion Metrics for Reeb Graphs

Measuring Similarity Between Curves on 2-Manifolds via Homotopy Area

Annotating Simplices with a Homology Basis and Its Applications

Graph Laplacians on Singular Manifolds: Toward understanding complex spaces: graph Laplacians on manifolds with singularities and boundaries