Source author record

Matti Karppa

Matti Karppa appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning

Catalog footprint

What is connected

4works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Engineering Compressed Matrix Multiplication with the Fast Walsh-Hadamard Transform

We present an implementation of Pagh's compressed matrix multiplication algorithm, a randomized algorithm that constructs sketches of matrices to compute an unbiased estimate of their product. By leveraging fast polynomial multiplication via the FFT, the algorithm achieves high performance when the product matrix is sparse or contains only a small number of entries with magnitudes significantly larger than the rest. We show empirically that the algorithm is practical and can outperform state-of-the-art DGEMM implementations when the product matrix has few nonzero entries or is otherwise dominated by a small subset of elements with large magnitude. As a minor theoretical contribution, we replace the FFT with the Fast Walsh-Hadamard Transform (FWHT) in sketched multiplication, preserving all correctness and variance guarantees of the original algorithm. Experiments with our carefully engineered multithreaded CPU implementation for dense double-precision matrices on 64-core CPU nodes across a range of synthetic benchmarks, exhibiting variable sparsity patterns, show that the FWHT variant is up to 4 times faster than the FFT-based version. Under favorable sparsity and magnitude patterns in the product matrix, our FWHT-based implementation achieves a speedup of up to 40 over DGEMM from Intel MKL, with low probability of error in the estimates. Our implementation is released as free software and comes with NumPy-compatible Python bindings.

preprint2022arXiv

DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search

Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in algorithms for nearest neighbor algorithms. We present an algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN) where we apply Approximate Nearest Neighbor (ANN) algorithms as a black box subroutine to compute an unbiased KDE. The idea is to find points that have a large contribution to the KDE using ANN, compute their contribution exactly, and approximate the remainder with Random Sampling (RS). We present a theoretical argument that supports the idea that an ANN subroutine can speed up the evaluation. Furthermore, we provide a C++ implementation with a Python interface that can make use of an arbitrary ANN implementation as a subroutine for kernel density estimation. We show empirically that our implementation outperforms state of the art implementations in all high dimensional datasets we considered, and matches the performance of RS in cases where the ANN yield no gains in performance.

preprint2022arXiv

HyperLogLogLog: Cardinality Estimation With One Log More

We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that compresses the sketch from $O(m\log\log n)$ bits down to $m \log_2\log_2\log_2 m + O(m+\log\log n)$ bits for estimating the number of distinct elements~$n$ using $m$~registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch, it is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch maintains mergeability in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming $n$ is sufficiently larger than $m$. We provide a C++ implementation of the sketch, and show by experimental evaluation against well-known implementations by Google and Apache that our implementation provides small sketches while maintaining competitive update and merge times. Concretely, we observed approximately a 40% reduction in the sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to $m\log_2\log_2\log_2\log_2 m+O(m\log\log\log m/\log\log m+\log\log n)$ bits.

preprint2016arXiv

Explicit correlation amplifiers for finding outlier correlations in deterministic subquadratic time

We derandomize G. Valiant's [J. ACM 62 (2015) Art. 13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant's randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math. 155 (2002) 157--187]. We say that a function $f:\{-1,1\}^d\rightarrow\{-1,1\}^D$ is a correlation amplifier with threshold $0\leqτ\leq 1$, error $γ\geq 1$, and strength $p$ an even positive integer if for all pairs of vectors $x,y\in\{-1,1\}^d$ it holds that (i) $|\langle x,y\rangle|<τd$ implies $|\langle f(x),f(y)\rangle|\leq(τγ)^pD$; and (ii) $|\langle x,y\rangle|\geqτd$ implies $\bigl(\frac{\langle x,y\rangle}{γd}\bigr)^pD \leq\langle f(x),f(y)\rangle\leq \bigl(\frac{γ\langle x,y\rangle}{d}\bigr)^pD$.