Source author record

Tal Wagner

Tal Wagner appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Computational Geometry

Catalog footprint

What is connected

8works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

New Bounds for Kernel Sums via Fast Spherical Embeddings

We study query time bounds for the fundamental problem of estimating the kernel mean $\frac1{|X|}\sum_{x\in X}\mathbf{k}(x,y)$ of a query $y$ in a finite dataset $X\subset\mathbb{R}^d$ up to a prescribed additive error $\varepsilon$. The best known bounds for the Gaussian kernel are $O(d/\varepsilon^2)$, $\widetilde O(d+1/\varepsilon^4)$, and $\widetilde O(d+Δ^2/\varepsilon^2)$, where $Δ$ is the diameter of a region containing the points. We prove the new bound $\tilde O(d+\varepsilonΔ^2+1/\varepsilon^3)$, which improves over the previous ones in regimes with small error $\varepsilon$ and intermediate diameter $Δ$. At the center of our proof is a new fast spherical embedding theorem in the sense introduced by Bartal, Recht and Schulman (2011), which limits the embedded data diameter while preserving local Euclidean distances and avoiding ``distance collapse'' at larger scales. This fast embedding theorem may be of independent interest.

preprint2026arXiv

Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases

Positional encoding in transformers is commonly implemented through positional embeddings, attention masks, or bias terms, but formal connections between these mechanisms remain limited. We study attention with positional bias through the lens of locality-sensitive hashing (LSH), focusing on Attention with Linear Biases (ALiBi). We show that the ALiBi bias matrix is the expectation of contiguous block-diagonal binary masks induced by a ``positional LSH'' scheme. The empirical mean of masks sampled from this scheme yields spectral norm and max-norm approximation guarantees with bounded block sizes with high probability. This structural theorem implies a uniform approximation theorem for ALiBi-biased attention: with high probability over the sampled masks, the approximate attention output is accurate simultaneously for all query-key-value inputs and can be computed in near-linear time in the context length, reducing long-context ALiBi to a collection of randomized short-context regular (positionally unbiased) attention operations. Conceptually, this connects positional bias, masks, and positional embeddings in a single formal framework and suggests an approach to efficient ALiBi-biased attention. Experiments on large language models validate our theoretical findings.

preprint2022arXiv

Generalization Bounds for Data-Driven Numerical Linear Algebra

Data-driven algorithms can adapt their internal structure or parameters to inputs from unknown application-specific distributions, by learning from a training sample of inputs. Several recent works have applied this approach to problems in numerical linear algebra, obtaining significant empirical gains in performance. However, no theoretical explanation for their success was known. In this work we prove generalization bounds for those algorithms, within the PAC-learning framework for data-driven algorithm selection proposed by Gupta and Roughgarden (SICOMP 2017). Our main results are closely matching upper and lower bounds on the fat shattering dimension of the learning-based low rank approximation algorithm of Indyk et al.~(NeurIPS 2019). Our techniques are general, and provide generalization bounds for many other recently proposed data-driven algorithms in numerical linear algebra, covering both sketching-based and multigrid-based methods. This considerably broadens the class of data-driven algorithms for which a PAC-learning analysis is available.

preprint2022arXiv

Streaming Algorithms for Support-Aware Histograms

Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured using the $L_p$ norm, which sums the differences between the two functions over all items in the domain. Although useful in many applications, the drawback of this error measure is that it treats approximation errors of all items in the same way, irrespective of whether the mass of an item is important for the downstream application that uses the approximation. As a result, even relatively simple distributions cannot be approximated by succinct histograms without incurring large error. In this paper, we address this issue by adapting the definition of approximation so that only the errors of the items that belong to the support of the distribution are considered. Under this definition, we develop efficient 1-pass and 2-pass streaming algorithms that compute near-optimal histograms in sub-linear space. We also present lower bounds on the space complexity of this problem. Surprisingly, under this notion of error, there is an exponential gap in the space complexity of 1-pass and 2-pass streaming algorithms. Finally, we demonstrate the utility of our algorithms on a collection of real and synthetic data sets.

preprint2022arXiv

Triangle and Four Cycle Counting with Predictions in Graph Streams

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.

preprint2016arXiv

Cheeger-type approximation for sparsest $st$-cut

We introduce the $st$-cut version the Sparsest-Cut problem, where the goal is to find a cut of minimum sparsity among those separating two distinguished vertices $s,t\in V$. Clearly, this problem is at least as hard as the usual (non-$st$) version. Our main result is a polynomial-time algorithm for the product-demands setting, that produces a cut of sparsity $O(\sqrt{\OPT})$, where $\OPT$ denotes the optimum, and the total edge capacity and the total demand are assumed (by normalization) to be $1$. Our result generalizes the recent work of Trevisan [arXiv, 2013] for the non-$st$ version of the same problem (Sparsest-Cut with product demands), which in turn generalizes the bound achieved by the discrete Cheeger inequality, a cornerstone of Spectral Graph Theory that has numerous applications. Indeed, Cheeger's inequality handles graph conductance, the special case of product demands that are proportional to the vertex (capacitated) degrees. Along the way, we obtain an $O(\log n)$-approximation, where $n=\card{V}$, for the general-demands setting of Sparsest $st$-Cut.

preprint2016arXiv

Near-Optimal (Euclidean) Metric Compression

The metric sketching problem is defined as follows. Given a metric on $n$ points, and $ε>0$, we wish to produce a small size data structure (sketch) that, given any pair of point indices, recovers the distance between the points up to a $1+ε$ distortion. In this paper we consider metrics induced by $\ell_2$ and $\ell_1$ norms whose spread (the ratio of the diameter to the closest pair distance) is bounded by $Φ>0$. A well-known dimensionality reduction theorem due to Johnson and Lindenstrauss yields a sketch of size $O(ε^{-2} \log (Φn) n\log n)$, i.e., $O(ε^{-2} \log (Φn) \log n)$ bits per point. We show that this bound is not optimal, and can be substantially improved to $O(ε^{-2}\log(1/ε) \cdot \log n + \log\log Φ)$ bits per point. Furthermore, we show that our bound is tight up to a factor of $\log(1/ε)$. We also consider sketching of general metrics and provide a sketch of size $O(n\log(1/ε)+ \log\log Φ)$ bits per point, which we show is optimal.

preprint2015arXiv

Towards Resistance Sparsifiers

We study resistance sparsification of graphs, in which the goal is to find a sparse subgraph (with reweighted edges) that approximately preserves the effective resistances between every pair of nodes. We show that every dense regular expander admits a $(1+ε)$-resistance sparsifier of size $\tilde O(n/ε)$, and conjecture this bound holds for all graphs on $n$ nodes. In comparison, spectral sparsification is a strictly stronger notion and requires $Ω(n/ε^2)$ edges even on the complete graph. Our approach leads to the following structural question on graphs: Does every dense regular expander contain a sparse regular expander as a subgraph? Our main technical contribution, which may of independent interest, is a positive answer to this question in a certain setting of parameters. Combining this with a recent result of von Luxburg, Radl, and Hein~(JMLR, 2014) leads to the aforementioned resistance sparsifiers.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint