Source author record

R. W. R. Darling

R. W. R. Darling appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Machine Learning, math.PR, Computational Geometry, math.AT, math.CO, math.NT

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

Preprint, 2022, arXiv

Empirical complexity of comparator-based nearest neighbor descent

A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, and a Function<V, Comparator<V>>, which enables any $x \in S$ to decide which of $y, z \in S\setminus\{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree $K$ digraph on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in higher dimensions, as theory would predict.
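
The comparator-factory input described in this abstract can be illustrated with a short Java sketch. The class name ComparatorKnnSketch, the direction of the Kullback-Leibler divergence (base point as first argument), and the nearest() helper are assumptions made for illustration. The helper is an exhaustive baseline for checking approximate results, not the nearest neighbor descent algorithm of the paper.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ComparatorKnnSketch {

    /** Kullback-Leibler divergence D(p || q) for strictly positive probability vectors. */
    static double klDivergence(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += p[i] * Math.log(p[i] / q[i]);
        }
        return sum;
    }

    /**
     * For a base point x, returns a comparator that ranks y before z
     * when D(x || y) < D(x || z). The argument order is an assumption.
     */
    static final Function<double[], Comparator<double[]>> KL_COMPARATOR =
            x -> Comparator.comparingDouble((double[] y) -> klDivergence(x, y));

    /** Exhaustive K nearest neighbors of x in s, excluding x itself (compared by reference). */
    static <V> List<V> nearest(V x, List<V> s, int k, Function<V, Comparator<V>> ranker) {
        return s.stream()
                .filter(y -> y != x)
                .sorted(ranker.apply(x))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] a = {0.5, 0.5};
        double[] b = {0.6, 0.4};
        double[] c = {0.9, 0.1};
        List<double[]> s = List.of(a, b, c);
        // The single nearest neighbor of a under the KL comparator.
        double[] nn = nearest(a, s, 1, KL_COMPARATOR).get(0);
        System.out.println("nearest to a: [" + nn[0] + ", " + nn[1] + "]");
    }
}
```

Because the helper only ever asks the comparator which of two candidates is closer to x, it never needs a numeric distance, which is the interface the abstract describes. Its cost is quadratic in n when applied to every point, so it is suitable only for validating an approximate method on small inputs.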

Preprint, 2022, arXiv

Proceedings of TDA: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Workshop at SDM 2022

Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of complex high-dimensional data. Research in this area has grown significantly over the last several years, bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in the community but do not represent a majority. This speaks to the primary aim of this workshop, which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across government, academia, and industry, we hope to create new synergies that can only come through building a mutual comprehensive awareness of the problem and solution spaces.

Preprint, 2012, arXiv

Rank deficiency in sparse random GF[2] matrices

Let $M$ be a random $m \times n$ matrix with binary entries and i.i.d. rows. The weight (i.e., number of ones) of a row has a specified probability distribution, with the row chosen uniformly at random given its weight. Let $N(n,m)$ denote the number of left null vectors in $\{0,1\}^m$ for $M$ (including the zero vector), where addition is mod 2. We take $n, m \to \infty$, with $m/n \to α > 0$, while the weight distribution may vary with $n$ but converges weakly to a limiting distribution on $\{3, 4, 5, \ldots\}$; let $W$ denote a variable with this limiting distribution. Identifying $M$ with a hypergraph on $n$ vertices, we define the 2-core of $M$ as the terminal state of an iterative algorithm that deletes every row incident to a column of degree 1. We identify two thresholds $α^*$ and $\underline{α}$, and describe them analytically in terms of the distribution of $W$. Threshold $α^*$ marks the infimum of values of $α$ at which $n^{-1} \log \mathbb{E}[N(n,m)]$ converges to a positive limit, while $\underline{α}$ marks the infimum of values of $α$ at which there is a 2-core of non-negligible size compared to $n$ having more rows than non-empty columns. We have $1/2 \leq α^* \leq \underline{α} \leq 1$, and typically these inequalities are strict; for example when $W = 3$ almost surely, numerics give $α^* = 0.88949\ldots$ and $\underline{α} = 0.91793\ldots$ (previous work on this model has mainly been concerned with such cases where $W$ is non-random). The threshold of values of $α$ for which $N(n,m) \geq 2$ in probability lies in $[α^*, \underline{α}]$ and is conjectured to equal $\underline{α}$. The random row weight setting gives rise to interesting new phenomena not present in the non-random case that has been the focus of previous work.
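
The quantity $N(n,m)$ in this abstract equals $2^{m - \mathrm{rank}(M)}$, since the left null vectors form a subspace of dimension $m$ minus the rank over GF(2). A minimal Java sketch of that computation follows. The class name Gf2RankSketch, the encoding of rows as long bitmasks (which limits $n$ to 63), the fixed row weight 3, and the sample values of $α$ are assumptions made for illustration; the sketch does not compute the 2-core or the thresholds from the paper.

```java
import java.util.Random;

final class Gf2RankSketch {

    /** Samples a row of length n (n <= 63) with exactly `weight` ones, uniform among such rows. */
    static long randomRow(int n, int weight, Random rng) {
        long row = 0L;
        while (Long.bitCount(row) < weight) {
            row |= 1L << rng.nextInt(n); // repeated positions leave the row unchanged
        }
        return row;
    }

    /** Rank over GF(2) of the matrix whose rows are the given bitmasks (XOR elimination). */
    static int rank(long[] rows) {
        long[] basis = new long[64]; // basis[b] is 0 or a vector whose highest set bit is b
        int rank = 0;
        for (long r : rows) {
            long v = r;
            while (v != 0) {
                int b = 63 - Long.numberOfLeadingZeros(v);
                if (basis[b] == 0) {
                    basis[b] = v;    // v is independent of the rows seen so far
                    rank++;
                    break;
                }
                v ^= basis[b];       // cancel the leading bit and keep reducing
            }
        }
        return rank;
    }

    /** log2 of the number of left null vectors, zero vector included: m - rank. */
    static int log2NullVectors(long[] rows) {
        return rows.length - rank(rows);
    }

    public static void main(String[] args) {
        Random rng = new Random(1L);
        int n = 60; // number of columns; at most 63 with long bitmasks
        for (double alpha : new double[] {0.8, 0.9, 1.0}) {
            int m = (int) Math.round(alpha * n);
            long[] rows = new long[m];
            for (int i = 0; i < m; i++) {
                rows[i] = randomRow(n, 3, rng);
            }
            System.out.println("alpha=" + alpha + "  m-rank=" + log2NullVectors(rows));
        }
    }
}
```

At $n = 60$ the asymptotic thresholds are not expected to be sharply visible; larger $n$ would need a BitSet or packed-array row representation in place of a single long.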

Preprint, 2009, arXiv

Maximum GCD Among Pairs of Random Integers

Fix $α>0$, and sample $N$ integers uniformly at random from $\{1,2,\ldots ,\lfloor e^{αN}\rfloor \}$. Given $η>0$, the probability that the maximum of the pairwise GCDs lies between $N^{2-η}$ and $N^{2+η}$ converges to 1 as $N\to \infty $. More precise estimates are obtained. This is a Birthday Problem: two of the random integers are likely to share some prime factor of order $N^2/\log [N]$. The proof generalizes to any arithmetical semigroup where a suitable form of the Prime Number Theorem is valid.
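
The sampling experiment in this abstract can be tried numerically with a short Java sketch. The class name MaxGcdSketch, the values $α = 0.1$ and $N = 200$, and the use of double arithmetic to approximate $\lfloor e^{αN}\rfloor$ are assumptions made for illustration; the result concerns the limit $N \to \infty$, so a single small run is only suggestive.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Random;

final class MaxGcdSketch {

    /** Uniform draw from {1, ..., upper} by rejection sampling on the bit length of upper. */
    static BigInteger uniform(BigInteger upper, Random rng) {
        BigInteger draw;
        do {
            draw = new BigInteger(upper.bitLength(), rng); // uniform on [0, 2^bits - 1]
        } while (draw.compareTo(upper) >= 0);              // accept only [0, upper - 1]
        return draw.add(BigInteger.ONE);
    }

    /** Largest gcd over all unordered pairs, using O(N^2) gcd computations. */
    static BigInteger maxPairwiseGcd(BigInteger[] xs) {
        BigInteger best = BigInteger.ZERO;
        for (int i = 0; i < xs.length; i++) {
            for (int j = i + 1; j < xs.length; j++) {
                best = best.max(xs[i].gcd(xs[j]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double alpha = 0.1; // illustrative value
        int n = 200;        // illustrative sample size
        // floor(e^{alpha * n}) via double arithmetic: only approximate once the value exceeds 2^53.
        BigInteger upper = new BigDecimal(Math.exp(alpha * n)).toBigInteger();
        Random rng = new Random(42L);
        BigInteger[] xs = new BigInteger[n];
        for (int i = 0; i < n; i++) {
            xs[i] = uniform(upper, rng);
        }
        System.out.println("max pairwise gcd = " + maxPairwiseGcd(xs) + ", N^2 = " + (long) n * n);
    }
}
```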