Researcher profile

Lorenz Hübschle-Schneider

Lorenz Hübschle-Schneider contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2022arXiv

Fast Succinct Retrieval and Approximate Membership using Ribbon

A retrieval data structure for a static function $f:S\rightarrow \{0,1\}^r$ supports queries that return $f(x)$ for any $x \in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theoretical constructions using $(1+o(1))r|S|$ bits were known, these could not achieve very small overheads in practice because they have an unfavorable space--time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below 1\,\% while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead $\geq 44\,\%$). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.

preprint2022arXiv

Weighted Random Sampling on GPUs

An alias table is a data structure that allows for efficiently drawing weighted random samples in constant time and can be constructed in linear time. The PSA algorithm by Hübschle-Schneider and Sanders is able to construct alias tables in parallel on the CPU. In this report, we transfer the PSA algorithm to the GPU. Our construction algorithm achieves a speedup of 17 on a consumer GPU in comparison to the PSA method on a 16-core high-end desktop CPU. For sampling, we achieve an up to 24 times higher throughput. Both operations also require several times less energy than on the CPU. Adaptations helping to achieve this include changing memory access patterns to do coalesced access. Where this is not possible, we first copy data to the faster shared memory using coalesced access. We also enhance a generalization of binary search enabling to search for a range of items in parallel. Besides naive sampling, we also give improved batched sampling algorithms.

preprint2020arXiv

Communication-Efficient (Weighted) Reservoir Sampling from Fully Distributed Data Streams

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our goal is to showcase its usefulness. We present and analyze fully distributed, communication-efficient algorithms for both versions of the problem. An experimental evaluation of weighted reservoir sampling on up to 256 nodes (5120 processors) shows good speedups, while theoretical analysis promises further scaling to much larger machines.