Researcher profile

Mathias B. T. Knudsen

Mathias B. T. Knudsen contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 11 - Unverified
Verification L1
Unclaimed author
1 work
0 followers
1 topic
4 close collaborators

Published work

1 published item

Preprint, 2020, arXiv

Fast hashing with Strong Concentration Bounds

Previous work on tabulation hashing by Patrascu and Thorup, from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation, offered Chernoff-style concentration bounds on hash-based sums, e.g., the number of balls/keys hashing to a given bin, but under quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of $c=O(1)$ characters, e.g., a 64-bit key as $c=8$ characters of 8 bits each. The character domain $Σ$ should be small enough that character tables of size $|Σ|$ fit in fast cache. The schemes then use $O(1)$ tables of this size, so the space of tabulation hashing is $O(|Σ|)$. However, the concentration bounds by Patrascu and Thorup only apply if the expected sums are $\ll |Σ|$. To see the problem, consider the very simple case where we use tabulation hashing to throw $n$ balls into $m$ bins and want to analyse the number of balls in a given bin. With their concentration bounds, we are fine if $n=m$, for then the expected value is $1$. However, if $m=2$, as when tossing $n$ unbiased coins, the expected value $n/2$ is $\gg |Σ|$ for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing, which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.
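
To make the setting concrete, here is a minimal Python sketch of simple tabulation hashing as the abstract describes it: a 64-bit key is split into $c=8$ characters of 8 bits, each character indexes its own table of $|Σ|=256$ random values, and the looked-up values are XORed together. It then throws $n$ keys into $m=2$ bins, the coin-tossing case mentioned above. The function names, the seed, and the use of Python's `random` module to fill the tables are assumptions made for illustration only. The sketch does not implement tabulation-permutation hashing, and it is not the authors' implementation.

```python
import random

C = 8                    # characters per key (c = 8 for a 64-bit key)
CHAR_BITS = 8            # bits per character
SIGMA = 1 << CHAR_BITS   # character domain size |Σ| = 256


def make_tables(seed, out_bits=64):
    """Build c tables, each with |Σ| random out_bits-bit entries.

    Hypothetical helper: a seeded PRNG stands in for truly random tables.
    """
    rng = random.Random(seed)
    return [[rng.getrandbits(out_bits) for _ in range(SIGMA)] for _ in range(C)]


def simple_tabulation(key, tables):
    """Hash a 64-bit key by XORing one table entry per 8-bit character."""
    h = 0
    for i in range(C):
        char = (key >> (CHAR_BITS * i)) & (SIGMA - 1)  # extract i-th character
        h ^= tables[i][char]
    return h


def bin_counts(keys, m, tables):
    """Throw each key into one of m bins and count the balls per bin."""
    counts = [0] * m
    for k in keys:
        counts[simple_tabulation(k, tables) % m] += 1
    return counts


tables = make_tables(seed=1)
n = 10_000
counts = bin_counts(range(n), 2, tables)
print(counts)  # two counts that sum to n; each has expected value n / 2
```

With $m=2$ each bin has expected load $n/2 = 5000$, far above $|Σ| = 256$, which is exactly the regime where, according to the abstract, the earlier concentration bounds no longer apply.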