Source author record

Thomas Dybdahl Ahle

Thomas Dybdahl Ahle appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Computational Geometry Databases Hardware Architecture Machine Learning

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Autoformalizing Memory Specifications with Agents

The primary goal of Design Verification (DV) is to ensure that a proposed chip design implementation (either in code, or physical form) exactly matches its specification and is free of functional errors in order to avoid costly re-designs. Achieving this often demands extensive manual interpretation, translating the specification document into a formal, testable representation. While AI has made progress in DV, current approaches typically focus on narrow, isolated tasks rather than full end-to-end specification compliance of modern chip designs, failing to capture the complexity of real-world verification. Our method automatically formalizes natural language memory chip specifications, for industry relevant Dynamic Random Access Memory (DRAM) standards, into a formal representation called DRAMPyML that can be used for downstream DV tasks like the generation of SystemVerilog assertions, stimulus, and functional coverage. We also release our benchmarking dataset, DRAMBench, which can be used to evaluate the evolution of model capabilities (and new approaches) at hardware autoformalization.

preprint2020arXiv

On the Problem of $p_1^{-1}$ in Locality-Sensitive Hashing

A Locality-Sensitive Hash (LSH) function is called $(r,cr,p_1,p_2)$-sensitive, if two data-points with a distance less than $r$ collide with probability at least $p_1$ while data points with a distance greater than $cr$ collide with probability at most $p_2$. These functions form the basis of the successful Indyk-Motwani algorithm (STOC 1998) for nearest neighbour problems. In particular one may build a $c$-approximate nearest neighbour data structure with query time $\tilde O(n^ρ/p_1)$ where $ρ=\frac{\log1/p_1}{\log1/p_2}\in(0,1)$. That is, sub-linear time, as long as $p_1$ is not too small. This is significant since most high dimensional nearest neighbour problems suffer from the curse of dimensionality, and can't be solved exact, faster than a brute force linear-time scan of the database. Unfortunately, the best LSH functions tend to have very low collision probabilities, $p_1$ and $p_2$. Including the best functions for Cosine and Jaccard Similarity. This means that the $n^ρ/p_1$ query time of LSH is often not sub-linear after all, even for approximate nearest neighbours! In this paper, we improve the general Indyk-Motwani algorithm to reduce the query time of LSH to $\tilde O(n^ρ/p_1^{1-ρ})$ (and the space usage correspondingly.) Since $n^ρp_1^{ρ-1} < n \Leftrightarrow p_1 > n^{-1}$, our algorithm always obtains sublinear query time, for any collision probabilities at least $1/n$. For $p_1$ and $p_2$ small enough, our improvement over all previous methods can be \emph{up to a factor $n$} in both query time and space. The improvement comes from a simple change to the Indyk-Motwani algorithm, which can easily be implemented in existing software packages.

preprint2020arXiv

Subsets and Supermajorities: Optimal Hashing-based Set Similarity Search

We formulate and optimally solve a new generalized Set Similarity Search problem, which assumes the size of the database and query sets are known in advance. By creating polylog copies of our data-structure, we optimally solve any symmetric Approximate Set Similarity Search problem, including approximate versions of Subset Search, Maximum Inner Product Search (MIPS), Jaccard Similarity Search and Partial Match. Our algorithm can be seen as a natural generalization of previous work on Set as well as Euclidean Similarity Search, but conceptually it differs by optimally exploiting the information present in the sets as well as their complements, and doing so asymmetrically between queries and stored sets. Doing so we improve upon the best previous work: MinHash [J. Discrete Algorithms 1998], SimHash [STOC 2002], Spherical LSF [SODA 2016, 2017] and Chosen Path [STOC 2017] by as much as a factor $n^{0.14}$ in both time and space; or in the near-constant time regime, in space, by an arbitrarily large polynomial factor. Turning the geometric concept, based on Boolean supermajority functions, into a practical algorithm requires ideas from branching random walks on $\mathbb Z^2$, for which we give the first non-asymptotic near tight analysis. Our lower bounds follow from new hypercontractive arguments, which can be seen as characterizing the exact family of similarity search problems for which supermajorities are optimal. The optimality holds for among all hashing based data structures in the random setting, and by reductions, for 1 cell and 2 cell probe data structures. As a side effect, we obtain new hypercontractive bounds on the directed noise operator $T^{p_1 \to p_2}_ρ$.

Thomas Dybdahl Ahle

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Autoformalizing Memory Specifications with Agents

On the Problem of $p_1^{-1}$ in Locality-Sensitive Hashing

Subsets and Supermajorities: Optimal Hashing-based Set Similarity Search