Source author record

Yury Elkin

Yury Elkin appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

5 works
3 topics
2 close collaborators

Published work

5 published items

Preprint, 2023, arXiv

A new compressed cover tree for k-nearest neighbour search and the stable-under-noise mergegram of a point cloud

This thesis consists of two topics in computational geometry and one topic in topological data analysis (TDA), which combines computational geometry and algebraic topology to analyze data. The first part studies the classical problem of finding the k nearest neighbors of m query points in a larger set of n reference points in any metric space. The second part is about the construction of a Minimum Spanning Tree (MST) on any finite metric space. The third part extends the key concept of persistence within Topological Data Analysis in a new direction.
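
The MST part of the thesis works on an arbitrary finite metric space. For reference only, the sketch below shows the textbook O(n^2) Prim's algorithm driven by a generic distance callback. It is not the thesis's algorithm, and the name `prim_mst` and the `dist` parameter are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def prim_mst(points: Sequence[T], dist: Callable[[T, T], float]) -> List[Tuple[int, int, float]]:
    """Minimum spanning tree of the complete graph on `points`, where each edge
    weight is dist(p, q). Returns (i, j, length) edges using O(n^2) distance calls."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge from the current tree to each vertex
    parent = [-1] * n          # tree endpoint of that cheapest edge
    edges: List[Tuple[int, int, float]] = []
    if n == 0:
        return edges
    best[0] = 0.0              # grow the tree from vertex 0
    for _ in range(n):
        # add the outside vertex that is closest to the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # update the cheapest connection of every outside vertex through u
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

For example, `prim_mst([0.0, 1.0, 3.0, 7.0], lambda a, b: abs(a - b))` returns `[(0, 1, 1.0), (1, 2, 2.0), (2, 3, 4.0)]`. Only the callback touches the metric, so the same code runs on any finite metric space, at the cost of quadratic distance evaluations.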

Preprint, 2022, arXiv

Counterexamples expose gaps in the proof of time complexity for cover trees introduced in 2006

This paper is motivated by the k-nearest neighbors search: given an arbitrary metric space and two finite subsets (a reference set R and a query set Q), design a fast algorithm to find all k-nearest neighbors in R for every point q in Q. In 2006, Beygelzimer, Kakade, and Langford introduced cover trees to justify a near-linear time complexity for the neighbor search in the sizes of Q and R. Section 5.3 of Curtin's PhD thesis (2015) pointed out that the proof of this result was wrong. The key step in the original proof attempted to show that the number of iterations can be estimated by multiplying the length of the longest root-to-leaf path in a cover tree by a constant factor. However, this estimate can miss many potential nodes in several branches of a cover tree that should be considered during the neighbor search. Unfortunately, the same argument was repeated in several subsequent papers that used the cover trees from 2006. This paper explicitly constructs challenging datasets that provide counterexamples to the past proofs of time complexity for the cover tree construction, the k-nearest neighbor search presented at ICML 2006, and the dual-tree search algorithm published in NIPS 2009. The corrected near-linear time complexities with extra parameters are proved in another forthcoming paper by using a new compressed cover tree that simplifies the original tree structure.
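
The problem statement above can be made concrete with the brute-force baseline that cover trees aim to beat: every query is compared with every reference point, so the cost is |Q|·|R| distance evaluations. This is a minimal sketch for illustration, not the paper's method; the name `all_knn_bruteforce` is hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def all_knn_bruteforce(Q: Sequence[T], R: Sequence[T], k: int,
                       dist: Callable[[T, T], float]) -> List[List[int]]:
    """For each query q in Q, return the indices of its k nearest points in R,
    closest first (ties broken by the smaller index). Uses |Q| * |R| distance calls."""
    if not 1 <= k <= len(R):
        raise ValueError("k must satisfy 1 <= k <= len(R)")
    answers = []
    for q in Q:
        # keep only the k smallest (distance, index) pairs for this query
        nearest = heapq.nsmallest(k, ((dist(q, r), i) for i, r in enumerate(R)))
        answers.append([i for _, i in nearest])
    return answers
```

For example, `all_knn_bruteforce([1.9, 8.0], [0.0, 2.0, 5.0, 9.0], 2, lambda a, b: abs(a - b))` returns `[[1, 0], [3, 2]]`. Because only `dist` is used, this works in any metric space; tree-based methods try to avoid most of these distance calls.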

Preprint, 2022, arXiv

Paired compressed cover trees guarantee a near linear parametrized complexity for all $k$-nearest neighbors search in an arbitrary metric space

This paper studies the important problem of finding all $k$-nearest neighbors of points in a query set $Q$ among points of another reference set $R$ within any metric space. Our previous work defined compressed cover trees and corrected the key arguments in several past papers for challenging datasets. In 2009, Ram, Lee, March, and Gray attempted to improve the time complexity by using pairs of cover trees on the query and reference sets. In 2015, Curtin, together with the above co-authors, used extra parameters to finally prove a time complexity for $k=1$. The current work fills all previous gaps and improves the nearest neighbor search based on pairs of new compressed cover trees. The novel imbalance parameter of paired trees allowed us to prove a better time complexity for any number of neighbors $k\geq 1$.

Preprint, 2020, arXiv

A fast approximate skeleton with guarantees for any cloud of points in a Euclidean space

The tree reconstruction problem is to find an embedded straight-line tree that approximates a given cloud of unorganized points in $\mathbb{R}^m$ up to a certain error. A practical solution to this problem will accelerate the discovery of new colloidal products with desired physical properties such as viscosity. We define the Approximate Skeleton of any finite point cloud $C$ in a Euclidean space with theoretical guarantees. The Approximate Skeleton ASk$(C)$ always belongs to a given offset of $C$, i.e. the maximum distance from $C$ to ASk$(C)$ is at most a given maximum error. The number of vertices in the Approximate Skeleton is within a factor of 2 of the minimum number of vertices in an optimal tree. The new Approximate Skeleton of any unorganized point cloud $C$ is computed in near-linear time in the number of points in $C$. Finally, the Approximate Skeleton outperforms past skeletonization algorithms in the size and accuracy of reconstruction on a large dataset of real micelles and random clouds.
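
The error guarantee above can be checked on any candidate tree. The sketch below computes the maximum distance from a cloud C to a straight-line tree in R^m. It assumes a hypothetical representation in which the tree is given by vertex coordinates plus index-pair edges, and it does not construct ASk(C) itself; the function names are also hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]

def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance in R^m from point p to the closed segment [a, b]."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ap = [pj - aj for aj, pj in zip(a, p)]
    denom = sum(x * x for x in ab)
    # projection parameter of p onto the line through a and b, clamped to the segment
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [aj + t * x for aj, x in zip(a, ab)]
    return math.dist(p, closest)

def max_cloud_to_tree_distance(cloud: Sequence[Point], vertices: Sequence[Point],
                               edges: Sequence[Tuple[int, int]]) -> float:
    """Largest distance from a cloud point to the straight-line tree whose edges are
    index pairs into `vertices`; a tree without edges is treated as its vertices."""
    segments = edges or [(i, i) for i in range(len(vertices))]
    return max(min(point_segment_distance(p, vertices[i], vertices[j]) for i, j in segments)
               for p in cloud)
```

For the cloud `[(0.0, 1.0), (2.0, -0.5), (4.0, 0.0)]` and a single edge from `(0.0, 0.0)` to `(4.0, 0.0)`, the function returns `1.0`, so this tree meets any maximum error of at least 1. The check needs one point-to-segment distance per (point, edge) pair; the cloud must be non-empty.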

Preprint, 2020, arXiv

The mergegram of a dendrogram and its stability

This paper extends the key concept of persistence within Topological Data Analysis (TDA) in a new direction. TDA quantifies topological shapes hidden in unorganized data such as clouds of unordered points. In the 0-dimensional case, the distance-based persistence is determined by a single-linkage (SL) clustering of a finite set in a metric space. Equivalently, the 0D persistence captures only the edge lengths of a Minimum Spanning Tree (MST). Both the SL dendrogram and the MST are unstable under perturbations of points. We define the new stable-under-noise mergegram, which outperforms previous isometry invariants in the classification of point clouds by PersLay.
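
The link between SL clustering, MST edge lengths and the mergegram can be illustrated with Kruskal's algorithm. This minimal sketch assumes the mergegram collects one (birth, death) pair for every SL cluster that later merges into a larger cluster. Under this assumption, singletons are born at 0, and the final cluster containing all points is omitted, although conventions for it may differ. The name `single_linkage_mergegram` is hypothetical, and this is not the paper's implementation.

```python
import itertools
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def single_linkage_mergegram(points: Sequence[T], dist: Callable[[T, T], float]
                             ) -> Tuple[List[float], List[Tuple[float, float]]]:
    """Return (merge_heights, mergegram) of the single-linkage dendrogram.

    merge_heights: heights at which clusters merge, i.e. the edge lengths of an MST.
    mergegram: sorted (birth, death) pairs of clusters that merge into larger ones.
    """
    n = len(points)
    parent = list(range(n))      # union-find forest over point indices
    birth = [0.0] * n            # birth height of the cluster whose root is each index

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Kruskal's algorithm: process all pairwise distances in increasing order
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(n), 2))
    merge_heights: List[float] = []
    pairs: List[Tuple[float, float]] = []
    for h, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue                        # edge inside one cluster: not an MST edge
        for r in (ri, rj):
            # both clusters die at height h; a cluster born at h itself only arose
            # from an earlier merge at the same (tied) height, so it is skipped
            if birth[r] < h:
                pairs.append((birth[r], h))
        parent[rj] = ri                     # the union is a new cluster born at h
        birth[ri] = h
        merge_heights.append(h)
    return merge_heights, sorted(pairs)
```

On the points 0, 1, 3, 7 of the real line with `dist(a, b) = |a - b|`, this gives merge heights `[1.0, 2.0, 4.0]` (the MST edge lengths, i.e. the 0D persistence) and mergegram `[(0.0, 1.0), (0.0, 1.0), (0.0, 2.0), (0.0, 4.0), (1.0, 2.0), (2.0, 4.0)]`. The pairs (1.0, 2.0) and (2.0, 4.0) record non-singleton clusters, which is information that the merge heights alone do not keep.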