Source author record

Stephen Alstrup

Stephen Alstrup appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms, Information Retrieval, math.CO, Computation and Language, Discrete Mathematics, Human-Computer Interaction

Catalog footprint

What is connected

11 works

6 topics

4 close collaborators


preprint, 2022, arXiv

Constructing Light Spanners Deterministically in Near-Linear Time

Graph spanners are well-studied and widely used both in theory and practice. In a recent breakthrough, Chechik and Wulff-Nilsen [CW18] improved the state-of-the-art for light spanners by constructing a $(2k-1)(1+ε)$-spanner with $O(n^{1+1/k})$ edges and $O_ε(n^{1/k})$ lightness. Soon after, Filtser and Solomon [FS19] showed that the classic greedy spanner construction achieves the same bounds. The major drawback of the greedy spanner is its running time of $O(mn^{1+1/k})$ (which is faster than [CW18]). This makes the construction impractical even for graphs of moderate size. Much faster spanner constructions do exist, but they only achieve lightness $Ω_ε(kn^{1/k})$, even when randomization is used. The contribution of this paper is deterministic spanner constructions that are fast and achieve bounds similar to those of the state-of-the-art slower constructions. Our first result is an $O_ε(n^{2+1/k+ε'})$ time spanner construction which achieves the state-of-the-art bounds. Our second result is an $O_ε(m + n\log n)$ time construction of a spanner with $(2k-1)(1+ε)$ stretch, $O(\log k\cdot n^{1+1/k})$ edges and $O_ε(\log k\cdot n^{1/k})$ lightness. This is an exponential improvement in the dependence on $k$ compared to the previous result with such running time. Finally, for the important special case where $k=\log n$, for every constant $ε>0$, we provide an $O(m+n^{1+ε})$ time construction that produces an $O(\log n)$-spanner with $O(n)$ edges and $O(1)$ lightness, which is asymptotically optimal. This is the first known sub-quadratic construction of such a spanner for any $k = ω(1)$. To achieve our constructions, we show a novel deterministic incremental approximate distance oracle, which may be of independent interest.
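For orientation, the classic greedy spanner that the abstract uses as its slow baseline can be sketched as follows. This is a minimal illustration of the textbook greedy rule (scan edges by increasing weight and keep an edge only if the spanner built so far does not already connect its endpoints within stretch $t$), not the paper's near-linear-time construction. The function names and the edge-list input format are assumptions made for the example.

```python
import heapq

def bounded_distance(adj, source, target, cutoff):
    """Dijkstra from source that ignores paths longer than cutoff.

    Returns the distance to target, or infinity if it exceeds cutoff.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= cutoff and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(n, edges, t):
    """Return the edges kept by the greedy t-spanner of a weighted graph.

    n: number of vertices, labelled 0..n-1 (assumed input format).
    edges: list of (u, v, weight) tuples with positive weights.
    t: stretch, e.g. t = 2k - 1.
    """
    adj = {v: [] for v in range(n)}
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the current spanner stretches it by more than t.
        if bounded_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

Each of the $m$ edges triggers a shortest-path query on the partial spanner, which is why this baseline becomes impractical on larger graphs.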

preprint, 2020, arXiv

Content-aware Neural Hashing for Cold-start Recommendation

Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user-item relevance. NeuHash-CF is modelled as an autoencoder architecture, consisting of two joint hashing components for generating user and item hash codes. Inspired by semantic hashing, the item hashing component generates a hash code directly from an item's content information (i.e., it generates cold-start and seen item hash codes in the same manner). This contrasts with existing state-of-the-art models, which treat the two item cases separately. The user hash codes are generated directly based on user id, through learning a user embedding matrix. We show experimentally that NeuHash-CF significantly outperforms state-of-the-art baselines by up to 12% NDCG and 13% MRR in cold-start recommendation settings, and up to 4% in both NDCG and MRR in standard settings where all items are present while training. Our approach uses 2-4x shorter hash codes, while obtaining the same or better performance compared to the state of the art, thus also enabling a notable storage reduction.

preprint, 2020, arXiv

Factuality Checking in News Headlines with Eye Tracking

We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that predicts news headline factuality using only eye-tracking measurements. Our model yields a mean AUC of 0.688 and is better at detecting false than true headlines. Through a model analysis, we find that eye-tracking 25 users when reading 3-6 headlines is sufficient for our ensemble learner.

preprint, 2020, arXiv

Unsupervised Semantic Hashing with Pairwise Reconstruction

Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
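The retrieval step that semantic hashing relies on (ranking documents by Hamming distance between short binary codes) can be illustrated with a small sketch. It assumes hash codes are already available as Python integers and does not model PairRec or any learned encoder. The function names and the dictionary input are hypothetical.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(code_a ^ code_b).count("1")

def nearest_documents(query_code, doc_codes, top_k):
    """Rank documents by Hamming distance to the query code.

    doc_codes: dict mapping a document id to its integer hash code
    (an assumed storage format). Ties are broken by document id.
    """
    ranked = sorted(
        doc_codes.items(),
        key=lambda item: (hamming_distance(query_code, item[1]), item[0]),
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

The XOR followed by a bit count is what makes the comparison cheap: it needs no floating-point arithmetic and works directly on the stored codes.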

preprint, 2016, arXiv

Near-Optimal Induced Universal Graphs for Bounded Degree Graphs

A graph $U$ is an induced universal graph for a family $F$ of graphs if every graph in $F$ is a vertex-induced subgraph of $U$. For the family of all undirected graphs on $n$ vertices Alstrup, Kaplan, Thorup, and Zwick [STOC 2015] give an induced universal graph with $O\!\left(2^{n/2}\right)$ vertices, matching a lower bound by Moon [Proc. Glasgow Math. Assoc. 1965]. Let $k= \lceil D/2 \rceil$. Improving asymptotically on previous results by Butler [Graphs and Combinatorics 2009] and Esperet, Arnaud and Ochem [IPL 2008], we give an induced universal graph with $O\!\left(\frac{k2^k}{k!}n^k \right)$ vertices for the family of graphs with $n$ vertices of maximum degree $D$. For constant $D$, Butler gives a lower bound of $Ω\!\left(n^{D/2}\right)$. For an odd constant $D\geq 3$, Esperet et al. and Alon and Capalbo [SODA 2008] give a graph with $O\!\left(n^{k-\frac{1}{D}}\right)$ vertices. Using their techniques for any (including constant) even values of $D$ gives asymptotically worse bounds than we present. For large $D$, i.e. when $D = Ω\left(\log^3 n\right)$, the previous best upper bound was ${n\choose\lceil D/2\rceil} n^{O(1)}$ due to Adjiashvili and Rotbart [ICALP 2014]. We give upper and lower bounds showing that the size is ${\lfloor n/2\rfloor\choose\lfloor D/2 \rfloor}2^{\pm\tilde{O}\left(\sqrt{D}\right)}$. Hence the optimal size is $2^{\tilde{O}(D)}$ and our construction is within a factor of $2^{\tilde{O}\left(\sqrt{D}\right)}$ from this. The previous results were larger by at least a factor of $2^{Ω(D)}$. As a part of the above, proving a conjecture by Esperet et al., we construct an induced universal graph with $2n-1$ vertices for the family of graphs with max degree $2$. In addition, we give results for acyclic graphs with max degree $2$ and cycle graphs. Our results imply the first labeling schemes that for any $D$ are at most $o(n)$ bits from optimal.

preprint, 2016, arXiv

Optimal induced universal graphs and adjacency labeling for trees

We show that there exists a graph $G$ with $O(n)$ nodes, where any forest of $n$ nodes is a node-induced subgraph of $G$. Furthermore, for constant arboricity $k$, the result implies the existence of a graph with $O(n^k)$ nodes that contains all $n$-node graphs of arboricity $k$ as node-induced subgraphs, matching an $Ω(n^k)$ lower bound. The lower bound and previously best upper bounds were presented in Alstrup and Rauhe (FOCS'02). Our upper bounds are obtained through a $\log_2 n +O(1)$ labeling scheme for adjacency queries in forests. We thereby solve an open problem that has been raised repeatedly over decades, e.g., in Kannan, Naor, Rudich (STOC 1988), Chung (J. of Graph Theory 1990), Fraigniaud and Korman (SODA 2010).
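To make the notion of an adjacency labeling scheme for forests concrete, here is the simple baseline in which each node stores its own identifier and its parent's identifier, for roughly $2\log_2 n$ bits per label. It is a sketch of that baseline only, not the $\log_2 n + O(1)$ scheme from the paper. The parent-array input and the function names are assumptions.

```python
def forest_labels(parent):
    """Label every node of a rooted forest with (own id, parent id).

    parent: list where parent[v] is the parent of node v, or None for a root
    (an assumed input format). A root stores its own id in the parent slot.
    """
    return [(v, v if p is None else p) for v, p in enumerate(parent)]

def forest_adjacent(label_a, label_b):
    """Decide adjacency from the two labels alone."""
    id_a, parent_a = label_a
    id_b, parent_b = label_b
    # Two distinct nodes are adjacent exactly when one is the other's parent.
    return id_a != id_b and (parent_a == id_b or parent_b == id_a)
```

The decoder never consults the forest itself, which is the defining property of a labeling scheme. The paper's contribution is to cut the label length roughly in half.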

preprint, 2016, arXiv

Sublinear Distance Labeling

A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D$ from each other. In this paper we consider distance labeling schemes for the classical case of unweighted graphs with both directed and undirected edges. We present an $O(\frac{n}{D}\log^2 D)$ bit $D$-preserving distance labeling scheme, improving the previous bound by Bollobás et al. [SIAM J. Discrete Math. 2005]. We also give an almost matching lower bound of $Ω(\frac{n}{D})$. With our $D$-preserving distance labeling scheme as a building block, we additionally achieve the following results: 1. We present the first distance labeling scheme of size $o(n)$ for sparse graphs (and hence bounded degree graphs). This addresses an open problem by Gavoille et al. [J. Algo. 2004], thereby separating the complexity from distance labeling in general graphs, which requires $Ω(n)$ bits, Moon [Proc. of Glasgow Math. Association 1965]. 2. For approximate $r$-additive labeling schemes, which return distances within an additive error of $r$, we show a scheme of size $O\left ( \frac{n}{r} \cdot\frac{\operatorname{polylog} (r\log n)}{\log n} \right )$ for $r \ge 2$. This improves on the current best bound of $O\left(\frac{n}{r}\right)$ by Alstrup et al. [SODA 2016] for sub-polynomial $r$, and is a generalization of a result by Gawrychowski et al. [arXiv preprint 2015] who showed this for $r=2$.

preprint, 2015, arXiv

Distance labeling schemes for trees

We consider distance labeling schemes for trees: given a tree with $n$ nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes. A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theory 2000) establish that labels must use $Θ(\log^2 n)$ bits (throughout, $\log$ denotes $\log_2$). Gavoille et al. (ESA 2001) show that for very small approximate stretch, labels use $Θ(\log n \log \log n)$ bits. Several other papers investigate variants such as small distances in trees (Alstrup et al., SODA'03). We improve the known upper and lower bounds of exact distance labeling by showing that $\frac{1}{4} \log^2 n$ bits are needed and that $\frac{1}{2} \log^2 n$ bits are sufficient. We also give ($1+ε$)-stretch labeling schemes using $Θ(\log n)$ bits for constant $ε>0$. ($1+ε$)-stretch labeling schemes with polylogarithmic label size have previously been established for doubling dimension graphs by Talwar (STOC 2004). In addition, we present matching upper and lower bounds for distance labeling for caterpillars, showing that labels must have size $2\log n - Θ(\log\log n)$. For simple paths with $k$ nodes and edge weights in $[1,n]$, we show that labels must have size $\frac{k-1}{k}\log n+Θ(\log k)$.
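One well-known way to obtain $O(\log^2 n)$-bit exact distance labels for an unweighted tree is centroid decomposition: each node records its distance to every centroid whose component contained it. The sketch below illustrates that textbook idea under stated assumptions (an adjacency-dict input, hypothetical function names). It is not the $\frac{1}{2}\log^2 n$ scheme from the paper.

```python
from collections import deque

def centroid_distance_labels(adj):
    """Return {node: {centroid: distance}} for a non-empty unweighted tree.

    adj: dict mapping each node to a list of its neighbours (assumed format).
    Each label holds O(log n) entries because components at least halve.
    """
    labels = {v: {} for v in adj}
    removed = set()

    def find_centroid(start):
        # Iterative DFS over the remaining component, recording parents.
        order, parent, stack = [], {start: None}, [start]
        while stack:
            u = stack.pop()
            order.append(u)
            for w in adj[u]:
                if w not in removed and w != parent[u]:
                    parent[w] = u
                    stack.append(w)
        # Subtree sizes: children always appear after their parent in order.
        size = {u: 1 for u in order}
        for u in reversed(order):
            if parent[u] is not None:
                size[parent[u]] += size[u]
        total = len(order)
        for u in order:
            heaviest = total - size[u]
            for w in adj[u]:
                if w not in removed and w != parent[u]:
                    heaviest = max(heaviest, size[w])
            if 2 * heaviest <= total:
                return u

    pending = [next(iter(adj))]
    while pending:
        c = find_centroid(pending.pop())
        # BFS inside the component to record distances to the centroid.
        dist, queue = {c: 0}, deque([c])
        while queue:
            u = queue.popleft()
            labels[u][c] = dist[u]
            for w in adj[u]:
                if w not in removed and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        removed.add(c)
        pending.extend(w for w in adj[c] if w not in removed)
    return labels

def tree_distance(label_u, label_v):
    """Decode the distance from two labels only."""
    shared = label_u.keys() & label_v.keys()
    return min(label_u[c] + label_v[c] for c in shared)
```

Decoding is correct because the first centroid that separates two nodes lies on the path between them, and every other shared centroid can only give a larger sum.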

preprint, 2015, arXiv

Simpler, faster and shorter labels for distances in graphs

We consider how to assign labels to any undirected graph with n nodes such that, given the labels of two nodes and no other information regarding the graph, it is possible to determine the distance between the two nodes. The challenge in such a distance labeling scheme is primarily to minimize the maximum label length and secondarily to minimize the time needed to answer distance queries (decoding). Previous schemes have offered different trade-offs between label lengths and query time. This paper presents a simple algorithm with shorter labels and shorter query time than any previous solution, thereby improving the state-of-the-art with respect to both label length and query time in one single algorithm. Our solution addresses several open problems concerning label length and decoding time and is the first improvement of label length for more than three decades. More specifically, we present a distance labeling scheme with label size (log 3)n/2 + o(n) (logarithms are in base 2) and O(1) decoding time. This outperforms all existing results with respect to both size and decoding time, including Winkler's (Combinatorica 1983) decades-old result, which uses labels of size (log 3)n and O(n/log n) decoding time, and Gavoille et al. (SODA'01), which uses labels of size 11n + o(n) and O(loglog n) decoding time. In addition, our algorithm is simpler than the previous ones. In the case of integral edge weights of size at most W, we present almost matching upper and lower bounds for label sizes. For r-additive approximation schemes, where distances can be off by an additive constant r, we give both upper and lower bounds. In particular, we present an upper bound for 1-additive approximation schemes which, in the unweighted case, has the same size (ignoring second order terms) as an adjacency scheme: n/2. We also give results for bipartite graphs and for exact and 1-additive distance oracles.

preprint, 2014, arXiv

Adjacency labeling schemes and induced-universal graphs

We describe a way of assigning labels to the vertices of any undirected graph on up to $n$ vertices, each composed of $n/2+O(1)$ bits, such that given the labels of two vertices, and no other information regarding the graph, it is possible to decide whether or not the vertices are adjacent in the graph. This is optimal, up to an additive constant, and constitutes the first improvement in almost 50 years of an $n/2+O(\log n)$ bound of Moon. As a consequence, we obtain an induced-universal graph for $n$-vertex graphs containing only $O(2^{n/2})$ vertices, which is optimal up to a multiplicative constant, solving an open problem of Vizing from 1968. We obtain similar tight results for directed graphs, tournaments and bipartite graphs.
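A folklore scheme with labels of about $n/2 + O(\log n)$ bits, the kind of bound the abstract attributes to Moon, gives a feel for what is being improved. Each vertex stores its identifier plus one adjacency bit for each of the next $\lfloor n/2 \rfloor$ vertices in cyclic order. The sketch below is an illustration under assumptions (vertices numbered $0..n-1$, hypothetical function names, $n$ stored in the label), not the paper's $n/2 + O(1)$ construction.

```python
def half_row_labels(n, edges):
    """Label vertices 0..n-1 of an undirected graph.

    Each label is (n, v, bits), where bits[d - 1] says whether v is adjacent
    to vertex (v + d) mod n, for d = 1 .. n // 2.
    """
    edge_set = {frozenset(e) for e in edges}
    half = n // 2
    labels = []
    for v in range(n):
        bits = [frozenset((v, (v + d) % n)) in edge_set
                for d in range(1, half + 1)]
        labels.append((n, v, bits))
    return labels

def graph_adjacent(label_u, label_v):
    """Decide adjacency from the two labels alone."""
    n, u, bits_u = label_u
    _, v, bits_v = label_v
    if u == v:
        return False
    forward = (v - u) % n
    if forward <= n // 2:
        # v falls inside the window stored by u.
        return bits_u[forward - 1]
    # Otherwise u falls inside the window stored by v.
    return bits_v[(u - v) % n - 1]
```

Every pair of distinct vertices is covered by at least one of the two windows, because the two cyclic offsets sum to $n$ and so one of them is at most $\lfloor n/2 \rfloor$.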

preprint, 2013, arXiv

Near-optimal labeling schemes for nearest common ancestors

We consider NCA labeling schemes: given a rooted tree $T$, label the nodes of $T$ with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the label of their nearest common ancestor. For trees with $n$ nodes we present upper and lower bounds establishing that labels of size $(2\pm ε)\log n$, $ε<1$ are both sufficient and necessary. (All logarithms in this paper are in base 2.) Alstrup, Bille, and Rauhe (SIDMA'05) showed that ancestor and NCA labeling schemes have labels of size $\log n +Ω(\log \log n)$. Our lower bound increases this to $\log n + Ω(\log n)$ for NCA labeling schemes. Since Fraigniaud and Korman (STOC'10) established that labels in ancestor labeling schemes have size $\log n +Θ(\log \log n)$, our new lower bound separates ancestor and NCA labeling schemes. Our upper bound improves the $10 \log n$ upper bound by Alstrup, Gavoille, Kaplan and Rauhe (TOCS'04), and our theoretical result even outperforms some recent experimental studies by Fischer (ESA'09) where variants of the same NCA labeling scheme are shown to all have labels of size approximately $8 \log n$.
