Source author record

Mathias Bæk Tejs Knudsen

Mathias Bæk Tejs Knudsen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

10works
3topics
4close collaborators

Actions

Connect this record

Log in to claim

Research graph

See the researcher in context

Open full explorer

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2016arXiv

Hashing for statistics over k-partitions

In this paper we analyze a hash function for $k$-partitioning a set into bins, obtaining strong concentration bounds for standard algorithms combining statistics from each bin. This generic method was originally introduced by Flajolet and Martin~[FOCS'83] in order to save a factor $Ω(k)$ of time per element over $k$ independent samples when estimating the number of distinct elements in a data stream. It was also used in the widely used HyperLogLog algorithm of Flajolet et al.~[AOFA'97] and in large-scale machine learning by Li et al.~[NIPS'12] for minwise estimation of set similarity. The main issue of $k$-partition, is that the contents of different bins may be highly correlated when using popular hash functions. This means that methods of analyzing the marginal distribution for a single bin do not apply. Here we show that a tabulation based hash function, mixed tabulation, does yield strong concentration bounds on the most popular applications of $k$-partitioning similar to those we would get using a truly random hash function. The analysis is very involved and implies several new results of independent interest for both simple and double tabulation, e.g. a simple and efficient construction for invertible bloom filters and uniform hashing on a given set.

preprint2016arXiv

Near-Optimal Induced Universal Graphs for Bounded Degree Graphs

A graph $U$ is an induced universal graph for a family $F$ of graphs if every graph in $F$ is a vertex-induced subgraph of $U$. For the family of all undirected graphs on $n$ vertices Alstrup, Kaplan, Thorup, and Zwick [STOC 2015] give an induced universal graph with $O\!\left(2^{n/2}\right)$ vertices, matching a lower bound by Moon [Proc. Glasgow Math. Assoc. 1965]. Let $k= \lceil D/2 \rceil$. Improving asymptotically on previous results by Butler [Graphs and Combinatorics 2009] and Esperet, Arnaud and Ochem [IPL 2008], we give an induced universal graph with $O\!\left(\frac{k2^k}{k!}n^k \right)$ vertices for the family of graphs with $n$ vertices of maximum degree $D$. For constant $D$, Butler gives a lower bound of $Ω\!\left(n^{D/2}\right)$. For an odd constant $D\geq 3$, Esperet et al. and Alon and Capalbo [SODA 2008] give a graph with $O\!\left(n^{k-\frac{1}{D}}\right)$ vertices. Using their techniques for any (including constant) even values of $D$ gives asymptotically worse bounds than we present. For large $D$, i.e. when $D = Ω\left(\log^3 n\right)$, the previous best upper bound was ${n\choose\lceil D/2\rceil} n^{O(1)}$ due to Adjiashvili and Rotbart [ICALP 2014]. We give upper and lower bounds showing that the size is ${\lfloor n/2\rfloor\choose\lfloor D/2 \rfloor}2^{\pm\tilde{O}\left(\sqrt{D}\right)}$. Hence the optimal size is $2^{\tilde{O}(D)}$ and our construction is within a factor of $2^{\tilde{O}\left(\sqrt{D}\right)}$ from this. The previous results were larger by at least a factor of $2^{Ω(D)}$. As a part of the above, proving a conjecture by Esperet et al., we construct an induced universal graph with $2n-1$ vertices for the family of graphs with max degree $2$. In addition, we give results for acyclic graphs with max degree $2$ and cycle graphs. Our results imply the first labeling schemes that for any $D$ are at most $o(n)$ bits from optimal.

preprint2016arXiv

Optimal induced universal graphs and adjacency labeling for trees

We show that there exists a graph $G$ with $O(n)$ nodes, where any forest of $n$ nodes is a node-induced subgraph of $G$. Furthermore, for constant arboricity $k$, the result implies the existence of a graph with $O(n^k)$ nodes that contains all $n$-node graphs as node-induced subgraphs, matching a $Ω(n^k)$ lower bound. The lower bound and previously best upper bounds were presented in Alstrup and Rauhe (FOCS'02). Our upper bounds are obtained through a $\log_2 n +O(1)$ labeling scheme for adjacency queries in forests. We hereby solve an open problem being raised repeatedly over decades, e.g. in Kannan, Naor, Rudich (STOC 1988), Chung (J. of Graph Theory 1990), Fraigniaud and Korman (SODA 2010).

preprint2016arXiv

Sublinear Distance Labeling

A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D$ from each other. In this paper we consider distance labeling schemes for the classical case of unweighted graphs with both directed and undirected edges. We present a $O(\frac{n}{D}\log^2 D)$ bit $D$-preserving distance labeling scheme, improving the previous bound by Bollobás et. al. [SIAM J. Discrete Math. 2005]. We also give an almost matching lower bound of $Ω(\frac{n}{D})$. With our $D$-preserving distance labeling scheme as a building block, we additionally achieve the following results: 1. We present the first distance labeling scheme of size $o(n)$ for sparse graphs (and hence bounded degree graphs). This addresses an open problem by Gavoille et. al. [J. Algo. 2004], hereby separating the complexity from distance labeling in general graphs which require $Ω(n)$ bits, Moon [Proc. of Glasgow Math. Association 1965]. 2. For approximate $r$-additive labeling schemes, that return distances within an additive error of $r$ we show a scheme of size $O\left ( \frac{n}{r} \cdot\frac{\operatorname{polylog} (r\log n)}{\log n} \right )$ for $r \ge 2$. This improves on the current best bound of $O\left(\frac{n}{r}\right)$ by Alstrup et. al. [SODA 2016] for sub-polynomial $r$, and is a generalization of a result by Gawrychowski et al. [arXiv preprint 2015] who showed this for $r=2$.

preprint2016arXiv

The Power of Two Choices with Simple Tabulation

The power of two choices is a classic paradigm for load balancing when assigning $m$ balls to $n$ bins. When placing a ball, we pick two bins according to two hash functions $h_0$ and $h_1$, and place the ball in the least loaded bin. Assuming fully random hash functions, when $m=O(n)$, Azar et al.~[STOC'94] proved that the maximum load is $\lg \lg n + O(1)$ with high probability. In this paper, we investigate the power of two choices when the hash functions $h_0$ and $h_1$ are implemented with simple tabulation, which is a very efficient hash function evaluated in constant time. Following their analysis of Cuckoo hashing [J.ACM'12], Pǎtraşcu and Thorup claimed that the expected maximum load with simple tabulation is $O(\lg\lg n)$. This did not include any high probability guarantee, so the load balancing was not yet to be trusted. Here, we show that with simple tabulation, the maximum load is $O(\lg\lg n)$ with high probability, giving the first constant time hash function with this guarantee. We also give a concrete example where, unlike with fully random hashing, the maximum load is not bounded by $\lg \lg n + O(1)$, or even $(1+o(1))\lg \lg n$ with high probability. Finally, we show that the expected maximum load is $\lg \lg n + O(1)$, just like with fully random hashing.

preprint2015arXiv

A simple and optimal ancestry labeling scheme for trees

We present a $\lg n + 2 \lg \lg n+3$ ancestry labeling scheme for trees. The problem was first presented by Kannan et al. [STOC 88'] along with a simple $2 \lg n$ solution. Motivated by applications to XML files, the label size was improved incrementally over the course of more than 20 years by a series of papers. The last, due to Fraigniaud and Korman [STOC 10'], presented an asymptotically optimal $\lg n + 4 \lg \lg n+O(1)$ labeling scheme using non-trivial tree-decomposition techniques. By providing a framework generalizing interval based labeling schemes, we obtain a simple, yet asymptotically optimal solution to the problem. Furthermore, our labeling scheme is attained by a small modification of the original $2 \lg n$ solution.

preprint2015arXiv

Longest Common Extensions in Sublinear Space

The longest common extension problem (LCE problem) is to construct a data structure for an input string $T$ of length $n$ that supports LCE$(i,j)$ queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions $i$ and $j$ in $T$. This classic problem has a well-known solution that uses $O(n)$ space and $O(1)$ query time. In this paper we show that for any trade-off parameter $1 \leq τ\leq n$, the problem can be solved in $O(\frac{n}τ)$ space and $O(τ)$ query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound.

preprint2015arXiv

Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how "random" a hash function or a random number generator is, is its independence: a sequence of random variables is said to be $k$-independent if every variable is uniform and every size $k$ subset is independent. In this paper we consider three classic algorithms under limited independence. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows. -Randomized quicksort. When pivot elements are computed using a $5$-independent hash function, Karloff and Raghavan, J.ACM'93 showed $O ( n \log n)$ expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only $4$-independence. -Min-wise hashing. For a set $A$, consider the probability of a particular element being mapped to the smallest hash value. It is known that $5$-independence implies the optimal probability $O (1 /n)$. Broder et al., STOC'98 showed that $2$-independence implies it is $O(1 / \sqrt{|A|})$. We show a matching lower bound as well as new tight bounds for $3$- and $4$-independent hash functions. -Largest bucket. We consider the case where $n$ balls are distributed to $n$ buckets using a $k$-independent hash function and analyze the largest bucket size. Alon et. al, STOC'97 showed that there exists a $2$-independent hash function implying a bucket of size $Ω( n^{1/2})$. We generalize the bound, providing a $k$-independent family of functions that imply size $Ω( n^{1/k})$.

preprint2014arXiv

Additive Spanners: A Simple Construction

We consider additive spanners of unweighted undirected graphs. Let $G$ be a graph and $H$ a subgraph of $G$. The most naïve way to construct an additive $k$-spanner of $G$ is the following: As long as $H$ is not an additive $k$-spanner repeat: Find a pair $(u,v) \in H$ that violates the spanner-condition and a shortest path from $u$ to $v$ in $G$. Add the edges of this path to $H$. We show that, with a very simple initial graph $H$, this naïve method gives additive $6$- and $2$-spanners of sizes matching the best known upper bounds. For additive $2$-spanners we start with $H=\emptyset$ and end with $O(n^{3/2})$ edges in the spanner. For additive $6$-spanners we start with $H$ containing $\lfloor n^{1/3} \rfloor$ arbitrary edges incident to each node and end with a spanner of size $O(n^{4/3})$.

preprint2014arXiv

Dynamic and Multi-functional Labeling Schemes

We investigate labeling schemes supporting adjacency, ancestry, sibling, and connectivity queries in forests. In the course of more than 20 years, the existence of $\log n + O(\log \log)$ labeling schemes supporting each of these functions was proven, with the most recent being ancestry [Fraigniaud and Korman, STOC '10]. Several multi-functional labeling schemes also enjoy lower or upper bounds of $\log n + Ω(\log \log n)$ or $\log n + O(\log \log n)$ respectively. Notably an upper bound of $\log n + 5\log \log n$ for adjacency+siblings and a lower bound of $\log n + \log \log n$ for each of the functions siblings, ancestry, and connectivity [Alstrup et al., SODA '03]. We improve the constants hidden in the $O$-notation. In particular we show a $\log n + 2\log \log n$ lower bound for connectivity+ancestry and connectivity+siblings, as well as an upper bound of $\log n + 3\log \log n + O(\log \log \log n)$ for connectivity+adjacency+siblings by altering existing methods. In the context of dynamic labeling schemes it is known that ancestry requires $Ω(n)$ bits [Cohen, et al. PODS '02]. In contrast, we show upper and lower bounds on the label size for adjacency, siblings, and connectivity of $2\log n$ bits, and $3 \log n$ to support all three functions. There exist efficient adjacency labeling schemes for planar, bounded treewidth, bounded arboricity and interval graphs. In a dynamic setting, we show a lower bound of $Ω(n)$ for each of those families.