Researcher profile

Jakob Bæk Tejs Houen

Jakob Bæk Tejs Houen contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

Bias Reduction for Sum Estimation

In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution $P$, values are sampled from a distribution $Q$ that is pointwise close to $P$. For every positive integer $k$ we define an estimator $\zeta_k$ for $\mu = \sum_i x_i$ whose bias is proportional to $\gamma^k$ (where our $\zeta_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $\gamma$-close to uniform and all $x_i \in \{0, 1\}$, then for any $\varepsilon > 0$, we can estimate $\mu$ to within additive error $\varepsilon N$ using $m = \Theta(N^{1-\frac{1}{k}} / \varepsilon^{2/k})$ samples, where $k = \left\lceil (\log \varepsilon)/(\log \gamma)\right\rceil$. We show that this sample complexity is essentially optimal. Our bounds show that the sample complexity need not vary uniformly with the desired error parameter $\varepsilon$: for some values of $\varepsilon$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
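To make the setting concrete, here is a minimal Python sketch of the classical Hansen-Hurwitz estimator only (the abstract's $\zeta_1$ case). It does not reproduce the paper's $\zeta_k$ for $k > 1$, whose definition is not given in the abstract. The toy values, the distributions `p` and `q`, the sample size and the seed are all hypothetical choices for illustration. The sketch shows that the estimator is unbiased when samples come from $P$, and that a bias appears when they actually come from a perturbed $Q$ while the weights still use $P$.

```python
import random

def hansen_hurwitz(values, p, sample_indices):
    """Hansen-Hurwitz estimate of sum(values): the mean of x_i / P(i) over the sample."""
    return sum(values[i] / p[i] for i in sample_indices) / len(sample_indices)

def expected_estimate(values, p, q):
    """Exact expectation of the estimate when indices are drawn from q but weighted by 1 / p."""
    return sum(q[i] * values[i] / p[i] for i in range(len(values)))

# Hypothetical toy instance (not from the paper): four 0/1 values, so mu = 3.
values = [1, 0, 1, 1]
p = [0.25, 0.25, 0.25, 0.25]   # assumed sampling distribution P (uniform)
q = [0.30, 0.20, 0.25, 0.25]   # actual, slightly perturbed distribution Q

mu = sum(values)
bias_under_p = expected_estimate(values, p, p) - mu   # zero: unbiased when Q = P
bias_under_q = expected_estimate(values, p, q) - mu   # nonzero: the perturbation leaks in

# Monte Carlo check with samples drawn from Q; the seed is arbitrary.
rng = random.Random(0)
sample = rng.choices(range(len(values)), weights=q, k=10_000)
estimate = hansen_hurwitz(values, p, sample)
```

In this toy instance the empirical `estimate` concentrates around `mu + bias_under_q` rather than around `mu`, which is the kind of bias the estimators in the abstract are designed to reduce.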

Preprint, 2022, arXiv

Understanding the Moments of Tabulation Hashing via Chaoses

Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: Each key is viewed as $c$ characters from some alphabet $\Sigma$, we have $c$ fully random hash functions $h_0, \ldots, h_{c - 1} \colon \Sigma \to \{0, \ldots, 2^l - 1\}$, and a key $x = (x_0, \ldots, x_{c - 1})$ is hashed to $h(x) = h_0(x_0) \oplus \ldots \oplus h_{c - 1}(x_{c - 1})$ where $\oplus$ is the bitwise XOR operation. The previous results on tabulation hashing by Pătrașcu and Thorup [J.ACM'11] and by Aamand et al. [STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number of keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Chaoses are random variables of the form $\sum a_{i_0, \ldots, i_{c - 1}} X_{i_0} \cdot \ldots \cdot X_{i_{c - 1}}$ where $X_i$ are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analysis has been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when the independent random variables have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing. In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on $c$. In contrast with the previous attempts, our approach will mostly be analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al. [FOCS'15].
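The definition of simple tabulation hashing above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not code from the paper: keys are taken to be non-negative integers split into `c` characters of `char_bits` bits each, the "fully random" tables are stood in for by a seeded pseudo-random generator, and the names `make_simple_tabulation` and `bin_load` as well as the modulo-based binning are hypothetical.

```python
import random

def make_simple_tabulation(c, char_bits=8, out_bits=32, seed=0):
    """Build h(x) = h_0(x_0) XOR ... XOR h_{c-1}(x_{c-1}) from c lookup tables."""
    rng = random.Random(seed)  # assumption: PRNG stands in for fully random tables
    size = 1 << char_bits      # alphabet size |Sigma|
    tables = [[rng.getrandbits(out_bits) for _ in range(size)] for _ in range(c)]
    mask = size - 1

    def h(key):
        result = 0
        for i in range(c):
            char = (key >> (i * char_bits)) & mask  # i-th character x_i of the key
            result ^= tables[i][char]               # XOR in h_i(x_i)
        return result

    return h

def bin_load(h, keys, num_bins, target_bin):
    """A hash-based sum: the number of keys whose hash lands in target_bin."""
    return sum(1 for x in keys if h(x) % num_bins == target_bin)

# Hypothetical usage: 16-bit keys viewed as c = 2 characters of 8 bits each.
h = make_simple_tabulation(c=2)
keys = list(range(1000))
loads = [bin_load(h, keys, 16, b) for b in range(16)]
```

One property worth noting: for the four keys `0x0101`, `0x0102`, `0x0201` and `0x0202`, every table entry is used exactly twice, so the XOR of their four hash values is always zero. The hash values are therefore not fully independent, which is part of why analyzing hash-based sums such as `bin_load` requires care.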