Source author record

Sebastiano Vigna

Sebastiano Vigna appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Social and Information Networks physics.soc-ph Information Retrieval Mathematical Software Cryptography and Security cs.CY Discrete Mathematics Logic in Computer Science Numerical Analysis Programming Languages

Catalog footprint

What is connected

28works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A New Test for Hamming-Weight Dependencies

We describe a new statistical test for pseudorandom number generators (PRNGs). Our test can find bias induced by dependencies among the Hamming weights of the outputs of a PRNG, even for PRNGs that pass state-of-the-art tests of the same kind from the literature, and in particular for generators based on $\mathbf F_2$-linear transformations such as the dSFMT, xoroshiro1024+, and WELL512.

preprint2022arXiv

Monotonicity in Undirected Networks

Is it always beneficial to create a new relationship (have a new follower/friend) in a social network? This question can be formally stated as a property of the centrality measure that defines the importance of the actors of the network. Score monotonicity means that adding an arc increases the centrality score of the target of the arc; rank monotonicity means that adding an arc improves the importance of the target of the arc relatively to the remaining nodes. It is known that most centralities are both score and rank monotone on directed, strongly connected graphs. In this paper, we study the problem of score and rank monotonicity for classical centrality measures in the case of undirected networks: in this case, we require that score, or relative importance, improve at both endpoints of the new edge. We show that, surprisingly, the situation in the undirected case is very different, and in particular that closeness, harmonic centrality, betweenness, eigenvector centrality, Seeley's index, Katz's index, and PageRank are not rank monotone; betweenness and PageRank are not even score monotone. In other words, while it is always a good thing to get a new follower, it is not always beneficial to get a new friend.

preprint2022arXiv

Scrambled Linear Pseudorandom Number Generators

$\mathbf F_2$-linear pseudorandom number generators are very popular due to their high speed, to the ease with which generators with a sizable state space can be created, and to their provable theoretical properties. However, they suffer from linear artifacts that show as failures in linearity-related statistical tests such as the binary-rank and the linear-complexity test. In this paper, we give two new contributions. First, we introduce two new $\mathbf F_2$-linear transformations that have been handcrafted to have good statistical properties and at the same time to be programmable very efficiently on superscalar processors, or even directly in hardware. Then, we describe some scramblers, that is, nonlinear functions applied to the state array that reduce or delete the linear artifacts, and propose combinations of linear transformations and scramblers that give extremely fast pseudorandom number generators of high quality. A novelty in our approach is that we use ideas from the theory of filtered linear-feedback shift registers to prove some properties of our scramblers, rather than relying purely on heuristics. In the end, we provide simple, extremely fast generators that use a few hundred bits of memory, have provable properties, and pass strong statistical tests.

preprint2022arXiv

Spectral Rank Monotonicity on Undirected Networks

We study the problem of score and rank monotonicity for spectral ranking methods, such as eigenvector centrality and PageRank, in the case of undirected networks. Score monotonicity means that adding an edge increases the score at both ends of the edge. Rank monotonicity means that adding an edge improves the relative position of both ends of the edge with respect to the remaining nodes. It is known that common spectral rankings are both score and rank monotone on directed, strongly connected graphs. We show that, surprisingly, the situation is very different for undirected graphs, and in particular that PageRank is neither score nor rank monotone.

preprint2016arXiv

An experimental exploration of Marsaglia's xorshift generators, scrambled

Marsaglia proposed recently xorshift generators as a class of very fast, good-quality pseudorandom number generators. Subsequent analysis by Panneton and L'Ecuyer has lowered the expectations raised by Marsaglia's paper, showing several weaknesses of such generators, verified experimentally using the TestU01 suite. Nonetheless, many of the weaknesses of xorshift generators fade away if their result is scrambled by a non-linear operation (as originally suggested by Marsaglia). In this paper we explore the space of possible generators obtained by multiplying the result of a xorshift generator by a suitable constant. We sample generators at 100 equispaced points of their state space and obtain detailed statistics that lead us to choices of parameters that improve on the current ones. We then explore for the first time the space of high-dimensional xorshift generators, following another suggestion in Marsaglia's paper, finding choices of parameters providing periods of length $2^{1024} - 1$ and $2^{4096} - 1$. The resulting generators are of extremely high quality, faster than current similar alternatives, and generate long-period sequences passing strong statistical tests using only eight logical operations, one addition and one multiplication by a constant.

preprint2016arXiv

BUbiNG: Massive Crawling for the Masses

Although web crawlers have been around for twenty years by now, there is virtually no freely available, opensource crawling software that guarantees high throughput, overcomes the limits of single-machine systems and at the same time scales linearly with the amount of resources available. This paper aims at filling this gap, through the description of BUbiNG, our next-generation web crawler built upon the authors' experience with UbiCrawler [Boldi et al. 2004] and on the last ten years of research on the topic. BUbiNG is an opensource Java fully distributed crawler; a single BUbiNG agent, using sizeable hardware, can crawl several thousands pages per second respecting strict politeness constraints, both host- and IP-based. Unlike existing open-source distributed crawlers that rely on batch techniques (like MapReduce), BUbiNG job distribution is based on modern high-speed protocols so to achieve very high throughput.

preprint2016arXiv

Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics

Minimal-interval semantics associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms. In all cases, space is linear in the number of operands. More importantly, we define a formal notion of optimal laziness, and either prove it, or prove its impossibility, for each algorithm. We cast our results in a general framework of antichains of intervals on total orders, making our algorithms directly applicable to other domains.

preprint2016arXiv

Fast Scalable Construction of (Minimal Perfect Hash) Functions

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing techniques. The main obstruction for any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: despite they can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.

preprint2016arXiv

Further scramblings of Marsaglia's xorshift generators

xorshift* generators are a variant of Marsaglia's xorshift generators that eliminate linear artifacts typical of generators based on $\mathbf Z/2\mathbf Z$-linear operations using multiplication by a suitable constant. Shortly after high-dimensional xorshift* generators were introduced, Saito and Matsumoto suggested a different way to eliminate linear artifacts based on addition in $\mathbf Z/2^{32}\mathbf Z$, leading to the XSadd generator. Starting from the observation that the lower bits of XSadd are very weak, as its reverse fails systematically several statistical tests, we explore xorshift+, a variant of XSadd using 64-bit operations, which leads, in small dimension, to extremely fast high-quality generators.

preprint2015arXiv

Liquid FM: Recommending Music through Viscous Democracy

Most modern recommendation systems use the approach of collaborative filtering: users that are believed to behave alike are used to produce recommendations. In this work we describe an application (Liquid FM) taking a completely different approach. Liquid FM is a music recommendation system that makes the user responsible for the recommended items. Suggestions are the result of a voting scheme, employing the idea of viscous democracy. Liquid FM can also be thought of as the first testbed for this voting system. In this paper we outline the design and architecture of the application, both from the theoretical and from the implementation viewpoints.

preprint2014arXiv

A Weighted Correlation Index for Rankings with Ties

Understanding the correlation between two different scores for the same set of items is a common problem in information retrieval, and the most commonly used statistics that quantifies this correlation is Kendall's $τ$. However, the standard definition fails to capture that discordances between items with high rank are more important than those between items with low rank. Recently, a new measure of correlation based on average precision has been proposed to solve this problem, but like many alternative proposals in the literature it assumes that there are no ties in the scores. This is a major deficiency in a number of contexts, and in particular while comparing centrality scores on large graphs, as the obvious baseline, indegree, has a very large number of ties in web and social graphs. We propose to extend Kendall's definition in a natural way to take into account weights in the presence of ties. We prove a number of interesting mathematical properties of our generalization and describe an $O(n\log n)$ algorithm for its computation. We also validate the usefulness of our weighted measure of correlation using experimental data.

preprint2014arXiv

Fibonacci Binning

This note argues that when dot-plotting distributions typically found in papers about web and social networks (degree distributions, component-size distributions, etc.), and more generally distributions that have high variability in their tail, an exponentially binned version should always be plotted, too, and suggests Fibonacci binning as a visually appealing, easy-to-use and practical choice.

preprint2014arXiv

Interactions of cultures and top people of Wikipedia from ranking of 24 language editions

Wikipedia is a huge global repository of human knowledge, that can be leveraged to investigate interwinements between cultures. With this aim, we apply methods of Markov chains and Google matrix, for the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people names, we obtain the top 100 historical figures, for each edition and for each algorithm. We investigate their spatial, temporal, and gender distributions in dependence of their cultural origins. Our study demonstrates not only the existence of skewness with local figures, mainly recognized only in their own cultures, but also the existence of global historical figures appearing in a large number of editions. By determining the birth time and place of these persons, we perform an analysis of the evolution of such figures through 35 centuries of human history for each language, thus recovering interactions and entanglement of cultures over time. We also obtain the distributions of historical figures over world countries, highlighting geographical aspects of cross-cultural links. Considering historical figures who appear in multiple editions as interactions between cultures, we construct a network of cultures and identify the most influential cultures according to this network.

preprint2014arXiv

Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices

Step-asynchronous successive overrelaxation updates the values contained in a single vector using the usual Gauß-Seidel-like weighted rule, but arbitrarily mixing old and new values, the only constraint being temporal coherence: you cannot use a value before it has been computed. We show that given a nonnegative real matrix $A$, a $σ\geqρ(A)$ and a vector $\boldsymbol w>0$ such that $A\boldsymbol w\leqσ\boldsymbol w$, every iteration of step-asynchronous successive overrelaxation for the problem $(sI- A)\boldsymbol x=\boldsymbol b$, with $s >σ$, reduces geometrically the $\boldsymbol w$-norm of the current error by a factor that we can compute explicitly. Then, we show that given a $σ>ρ(A)$ it is in principle always possible to compute such a $\boldsymbol w$. This property makes it possible to estimate the supremum norm of the absolute error at each iteration without any additional hypothesis on $A$, even when $A$ is so large that computing the product $A\boldsymbol x$ is feasible, but estimating the supremum norm of $(sI-A)^{-1}$ is not.

preprint2013arXiv

Axioms for Centrality

Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures (a.k.a. centrality indices, or rankings) were proposed to account for the importance of the nodes of a network. In this paper, we try to provide a mathematically sound survey of the most important classic centrality measures known from the literature and propose an axiomatic approach to establish whether they are actually doing what they have been designed for. Our axioms suggest some simple, basic properties that a centrality measure should exhibit. Surprisingly, only a new simple measure based on distances, harmonic centrality, turns out to satisfy all axioms; essentially, harmonic centrality is a correction to Bavelas's classic closeness centrality designed to take unreachable nodes into account in a natural way. As a sanity check, we examine in turn each measure under the lens of information retrieval, leveraging state-of-the-art knowledge in the discipline to measure the effectiveness of the various indices in locating web pages that are relevant to a query. While there are some examples of this comparisons in the literature, here for the first time we take into consideration centrality measures based on distances, such as closeness, in an information-retrieval setting. The results match closely the data we gathered using our axiomatic approach. Our results suggest that centrality measures based on distances, which have been neglected in information retrieval in favour of spectral centrality measures in the last years, are actually of very high quality; moreover, harmonic centrality pops up as an excellent general-purpose centrality index for arbitrary directed graphs.

preprint2013arXiv

Broadword Implementation of Parenthesis Queries

We continue the line of research started in "Broadword Implementation of Rank/Select Queries" proposing broadword (a.k.a. SWAR, "SIMD Within A Register") algorithms for finding matching closed parentheses and the k-th far closed parenthesis. Our algorithms work in time O(log w) on a word of w bits, and contain no branch and no test instruction. On 64-bit (and wider) architectures, these algorithms make it possible to avoid costly tabulations, while providing a very significant speedup with respect to for-loop implementations.

preprint2013arXiv

Cache-Oblivious Peeling of Random Hypergraphs

The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires $O(\mathrm{sort}(n))$ I/Os and $O(n \log n)$ time to peel a random hypergraph with $n$ edges. We experimentally evaluate the performance of our implementation of this algorithm in a real-world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of $7.6$ billion keys in less than $21$ hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.

preprint2013arXiv

In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond

Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures (a.k.a. centrality indices, or rankings) were proposed to account for the importance of the nodes of a network. In this paper, we approach the problem of computing geometric centralities, such as closeness and harmonic centrality, on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximated computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We rather assume that the graph is accessed in a semi-streaming fashion, that is, that adjacency lists are scanned almost sequentially, and that a very small amount of memory (in the order of a dozen bytes) per node is available in core memory. We leverage the newly discovered algorithms based on HyperLogLog counters, making it possible to approximate a number of geometric centralities at a very high speed and with high accuracy. While the application of similar algorithms for the approximation of closeness was attempted in the MapReduce framework, our exploitation of HyperLogLog counters reduces exponentially the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2TiB of RAM. Moreover, the computations we describe are inherently parallelizable, and scale linearly with the number of available cores.

preprint2012arXiv

Four Degrees of Separation

Frigyes Karinthy, in his 1929 short story "Láancszemek" ("Chains") suggested that any two persons are distanced by at most six friendship links. (The exact wording of the story is slightly ambiguous: "He bet us that, using no more than five individuals, one of whom is a personal acquaintance, he could contact the selected individual [...]". It is not completely clear whether the selected individual is part of the five, so this could actually allude to distance five or six in the language of graph theory, but the "six degrees of separation" phrase stuck after John Guare's 1990 eponymous play. Following Milgram's definition and Guare's interpretation, we will assume that "degrees of separation" is the same as "distance minus one", where "distance" is the usual path length-the number of arcs in the path.) Stanley Milgram in his famous experiment challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. The average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen. We report the results of the first world-scale social-network graph-distance computations, using the entire Facebook network of active users (\approx721 million users, \approx69 billion friendship links). The average distance we observe is 4.74, corresponding to 3.74 intermediaries or "degrees of separation", showing that the world is even smaller than we expected, and prompting the title of this paper. More generally, we study the distance distribution of Facebook and of some interesting geographic subgraphs, looking also at their evolution over time. The networks we are able to explore are almost two orders of magnitude larger than those analysed in the previous literature. We report detailed statistical metadata showing that our measurements (which rely on probabilistic algorithms) are very accurate.

preprint2012arXiv

Four Degrees of Separation, Really

We recently measured the average distance of users in the Facebook graph, spurring comments in the scientific community as well as in the general press ("Four Degrees of Separation"). A number of interesting criticisms have been made about the meaningfulness, methods and consequences of the experiment we performed. In this paper we want to discuss some methodological aspects that we deem important to underline in the form of answers to the questions we have read in newspapers, magazines, blogs, or heard from colleagues. We indulge in some reflections on the actual meaning of "average distance" and make a number of side observations showing that, yes, 3.74 "degrees of separation" are really few.

preprint2012arXiv

Predecessor search with distance-sensitive query time

A predecessor (successor) search finds the largest element $x^-$ smaller than the input string $x$ (the smallest element $x^+$ larger than or equal to $x$, respectively) out of a given set $S$; in this paper, we consider the static case (i.e., $S$ is fixed and does not change over time) and assume that the $n$ elements of $S$ are available for inspection. We present a number of algorithms that, with a small additional index (usually of O(n log w) bits, where $w$ is the string length), can answer predecessor/successor queries quickly and with time bounds that depend on different kinds of distance, improving significantly several results that appeared in the recent literature. Intuitively, our first result has a running time that depends on the distance between $x$ and $x^\pm$: it is especially efficient when the input $x$ is either very close to or very far from $x^-$ or $x^+$; our second result depends on some global notion of distance in the set $S$, and is fast when the elements of $S$ are more or less equally spaced in the universe; finally, for our third result we rely on a finger (i.e., an element of $S$) to improve upon the first one; its running time depends on the distance between the input and the finger.

preprint2012arXiv

Quasi-Succinct Indices

Compressed inverted indices in use today are based on the idea of gap compression: documents pointers are stored in increasing order, and the gaps between successive document pointers are stored using suitable codes which represent smaller gaps using less bits. Additional data such as counts and positions is stored using similar techniques. A large body of research has been built in the last 30 years around gap compression, including theoretical modeling of the gap distribution, specialized instantaneous codes suitable for gap encoding, and ad hoc document reorderings which increase the efficiency of instantaneous codes. This paper proposes to represent an index using a different architecture based on quasi-succinct representation of monotone sequences. We show that, besides being theoretically elegant and simple, the new index provides expected constant-time operations and, in practice, significant performance improvements on conjunctive, phrasal and proximity queries.

preprint2011arXiv

HyperANF: Approximating the Neighbourhood Function of Very Large Graphs on a Budget

The neighbourhood function N(t) of a graph G gives, for each t, the number of pairs of nodes <x, y> such that y is reachable from x in less that t hops. The neighbourhood function provides a wealth of information about the graph (e.g., it easily allows one to compute its diameter), but it is very expensive to compute it exactly. Recently, the ANF algorithm (approximate neighbourhood function) has been proposed with the purpose of approximating NG(t) on large graphs. We describe a breakthrough improvement over ANF in terms of speed and scalability. Our algorithm, called HyperANF, uses the new HyperLogLog counters and combines them efficiently through broadword programming; our implementation uses overdecomposition to exploit multi-core parallelism. With HyperANF, for the first time we can compute in a few hours the neighbourhood function of graphs with billions of nodes with a small error and good confidence using a standard workstation. Then, we turn to the study of the distribution of the shortest paths between reachable nodes (that can be efficiently approximated by means of HyperANF), and discover the surprising fact that its index of dispersion provides a clear-cut characterisation of proper social networks vs. web graphs. We thus propose the spid (Shortest-Paths Index of Dispersion) of a graph as a new, informative statistics that is able to discriminate between the above two types of graphs. We believe this is the first proposal of a significant new non-local structural index for complex networks whose computation is highly scalable.

preprint2011arXiv

Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks

We continue the line of research on graph compression started with WebGraph, but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have been used for a long time to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses overdecomposition to perform aggressively on multi-core architecture, making it possible to reorder graphs of more than 600 millions nodes in a few hours. Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs.

preprint2011arXiv

Robustness of Social Networks: Comparative Results Based on Distance Distributions

Given a social network, which of its nodes have a stronger impact in determining its structure? More formally: which node-removal order has the greatest impact on the network structure? We approach this well-known problem for the first time in a setting that combines both web graphs and social networks, using datasets that are orders of magnitude larger than those appearing in the previous literature, thanks to some recently developed algorithms and software tools that make it possible to approximate accurately the number of reachable pairs and the distribution of distances in a graph. Our experiments highlight deep differences in the structure of social networks and web graphs, show significant limitations of previous experimental results, and at the same time reveal clustering by label propagation as a new and very effective way of locating nodes that are important from a structural viewpoint.

preprint2011arXiv

The Push Algorithm for Spectral Ranking

The push algorithm was proposed first by Jeh and Widom in the context of personalized PageRank computations (albeit the name "push algorithm" was actually used by Andersen, Chung and Lang in a subsequent paper). In this note we describe the algorithm at a level of generality that make the computation of the spectral ranking of any nonnegative matrix possible. Actually, the main contribution of this note is that the description is very simple (almost trivial), and it requires only a few elementary linear-algebra computations. Along the way, we give new precise ways of estimating the convergence of the algorithm, and describe some of the contribution of the existing literature, which again turn out to be immediate when recast in our framework.

preprint2010arXiv

E = I + T: The internal extent formula for compacted tries

It is well known that in a binary tree the external path length minus the internal path length is exactly 2n-2, where n is the number of external nodes. We show that a generalization of the formula holds for compacted tries, replacing the role of paths with the notion of extent, and the value 2n-2 with the trie measure, an estimation of the number of bits that are necessary to describe the trie.

preprint2003arXiv

Distributive Computability

This thesis presents a series of theoretical results and practical realisations about the theory of computation in distributive categories. Distributive categories have been proposed as a foundational tool for Computer Science in the last years, starting from the papers of R.F.C. Walters. We shall focus on two major topics: distributive computability, i.e., a generalized theory of computability based on distributive categories, and the Imp(G) language, which is a language based on the syntax of distributive categories. The link between the former and the latter is that the functions computed by Imp(G) programs are exactly the distributively computable functions.

Sebastiano Vigna

What is connected

Connect this record

See the researcher in context

Building this map preview

28 published item(s)

A New Test for Hamming-Weight Dependencies

Monotonicity in Undirected Networks

Scrambled Linear Pseudorandom Number Generators

Spectral Rank Monotonicity on Undirected Networks

An experimental exploration of Marsaglia's xorshift generators, scrambled

BUbiNG: Massive Crawling for the Masses

Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics

Fast Scalable Construction of (Minimal Perfect Hash) Functions

Further scramblings of Marsaglia's xorshift generators

Liquid FM: Recommending Music through Viscous Democracy

A Weighted Correlation Index for Rankings with Ties

Fibonacci Binning

Interactions of cultures and top people of Wikipedia from ranking of 24 language editions

Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices

Axioms for Centrality

Broadword Implementation of Parenthesis Queries

Cache-Oblivious Peeling of Random Hypergraphs

In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond

Four Degrees of Separation

Four Degrees of Separation, Really

Predecessor search with distance-sensitive query time

Quasi-Succinct Indices

HyperANF: Approximating the Neighbourhood Function of Very Large Graphs on a Budget

Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks

Robustness of Social Networks: Comparative Results Based on Distance Distributions

The Push Algorithm for Spectral Ranking

E = I + T: The internal extent formula for compacted tries

Distributive Computability