Source author record

Jonathan Halcrow

Jonathan Halcrow appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, physics.flu-dyn, Data Structures and Algorithms, math.DS, Social and Information Networks

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators

preprint, 2023, arXiv

Stars: Tera-Scale Graph Building for Clustering and Graph Learning

A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present Stars: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning-based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars for multiple data sets allowing for graph building at the Tera-Scale, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10- to 1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2- to 10-fold improvement in running time without quality loss.
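The two-hop property described in this abstract can be illustrated with a small sketch. It is not taken from the paper: the grouping of points into buckets, the choice of the first bucket member as a "leader", and the function names `star_edges`, `clique_edges`, and `within_two_hops` are all assumptions made for illustration. The sketch only shows why connecting each group to a single hub keeps group members within two hops of each other while using fewer edges than connecting every pair.

```python
from itertools import combinations


def star_edges(buckets):
    """Connect every member of a bucket to the bucket's first element.

    Assumption: the buckets already group similar points together, and the
    first element is used as the hub. How groups and hubs are really chosen
    is not stated in the abstract.
    """
    edges = set()
    for members in buckets:
        leader = members[0]
        for member in members[1:]:
            edges.add((leader, member))
    return edges


def clique_edges(buckets):
    """Baseline for comparison: connect every pair inside each bucket."""
    edges = set()
    for members in buckets:
        edges.update(combinations(members, 2))
    return edges


def within_two_hops(edges, a, b):
    """Return True if a and b are adjacent or share a common neighbor."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    neighbors_a = adjacency.get(a, set())
    return b in neighbors_a or bool(neighbors_a & adjacency.get(b, set()))


# Hypothetical grouping of seven points into two buckets.
buckets = [[0, 1, 2, 3], [4, 5, 6]]
stars = star_edges(buckets)
cliques = clique_edges(buckets)
```

In this toy input the star construction uses 5 edges against 9 for the per-bucket cliques, and points 1 and 3 remain reachable from each other through their shared hub 0.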

preprint, 2020, arXiv

Grale: Designing Networks for Graph Learning

How can we find the right graph for semi-supervised learning? In real-world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static. In this work, we present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of (potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. Grale is designed for running on large datasets. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score. By employing locality sensitive hashing techniques, we greatly reduce the number of pairs that need to be scored, allowing us to learn a task-specific model and build the associated nearest neighbor graph for such datasets in hours, rather than the days or even weeks that might be required otherwise. We illustrate this through a case study where we examine the application of Grale to an abuse classification problem on YouTube with hundreds of millions of items. In this application, we find that Grale detects a large number of malicious actors on top of hard-coded rules and content classifiers, increasing the total recall by 89% over those approaches alone.
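The pair-reduction idea in this abstract can be sketched as follows. The abstract does not say which hash family Grale uses or how its scoring model is trained, so this example swaps in a simple sign-of-projection hash with fixed, hand-picked hyperplanes. The names `signature` and `candidate_pairs` and the sample data are hypothetical. Only points that fall into the same hash bucket become candidate pairs for scoring.

```python
from collections import defaultdict
from itertools import combinations


def signature(vec, hyperplanes):
    """Hash a vector to a tuple of booleans, one per hyperplane.

    Each entry records which side of the hyperplane the vector lies on.
    """
    return tuple(
        sum(x * h for x, h in zip(vec, plane)) >= 0 for plane in hyperplanes
    )


def candidate_pairs(points, hyperplanes):
    """Return index pairs whose points share a hash bucket."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(points):
        buckets[signature(vec, hyperplanes)].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


# Illustrative data: two tight groups plus one outlier.
points = [(1.0, 0.9), (0.9, 1.0), (-1.0, -0.8), (-0.9, -1.0), (1.0, -1.0)]
# Assumption: fixed axis-aligned hyperplanes, chosen so the result is repeatable.
hyperplanes = [(1.0, 0.0), (0.0, 1.0)]

pairs = candidate_pairs(points, hyperplanes)
all_pairs = len(points) * (len(points) - 1) // 2
```

With this toy input, only 2 of the 10 possible pairs survive bucketing, so a downstream similarity model would score 2 pairs instead of 10. A practical system would typically draw the hyperplanes at random and repeat the hashing several times to limit missed neighbors.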

preprint, 2009, arXiv

Equilibrium and traveling-wave solutions of plane Couette flow

We present ten new equilibrium solutions to plane Couette flow in small periodic cells at low Reynolds number (Re) and two new traveling-wave solutions. The solutions are continued under changes of Re and spanwise period. We provide a partial classification of the isotropy groups of plane Couette flow and show which kinds of solutions are allowed by each isotropy group. We find two complementary visualizations particularly revealing. Suitably chosen sections of their 3D-physical space velocity fields are helpful in developing physical intuition about coherent structures observed in low Re turbulence. Projections of these solutions and their unstable manifolds from their infinite-dimensional state space onto suitably chosen 2- or 3-dimensional subspaces reveal their interrelations and the role they play in organizing turbulence in wall-bounded shear flows.

preprint, 2008, arXiv

Visualizing the geometry of state space in plane Couette flow

Motivated by recent experimental and numerical studies of coherent structures in wall-bounded shear flows, we initiate a systematic exploration of the hierarchy of unstable invariant solutions of the Navier-Stokes equations. We construct a dynamical, 10^5-dimensional state-space representation of plane Couette flow at Re = 400 in a small, periodic cell and offer a new method of visualizing invariant manifolds embedded in such high dimensions. We compute a new equilibrium solution of plane Couette flow and the leading eigenvalues and eigenfunctions of known equilibria at this Reynolds number and cell size. What emerges from global continuations of their unstable manifolds is a surprisingly elegant dynamical-systems visualization of moderate-Reynolds turbulence. The invariant manifolds tessellate the region of state space explored by transiently turbulent dynamics with a rigid web of continuous and discrete symmetry-induced heteroclinic connections.