Topic overview

Data Structures and Algorithms

3564 works6309 researchers

Open map Browse papers

Map preview

Start with the graph, then narrow the list

3564works

6309researchers

Next steps

Use the topic as a working map

Open the full map for clusters, then return here to scan ranked papers and people.

Inspect nearby papers, researchers, institutions and communities without opening a separate graph page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

Simulated Quantum Annealing Can Be Exponentially Faster than Classical Simulated Annealing

Simulated Quantum Annealing (SQA) is a Markov Chain Monte-Carlo algorithm that samples the equilibrium thermal state of a Quantum Annealing (QA) Hamiltonian. In addition to simulating quantum systems, SQA has also been proposed as another physics-inspired classical algorithm for combinatorial optimization, alongside classical simulated annealing. However, in many cases it remains an open challenge to determine the performance of both QA and SQA. One piece of evidence for the strength of QA over classical simulated annealing comes from an example by Farhi, Goldstone and Gutmann . There a bit-symmetric cost function with a thin, high energy barrier was designed to show an exponential seperation between classical simulated annealing, for which thermal fluctuations take exponential time to climb the barrier, and quantum annealing which passes through the barrier and reaches the global minimum in poly time, arguably by taking advantage of quantum tunneling. In this work we apply a comparison method to rigorously show that the Markov chain underlying SQA efficiently samples the target distribution and finds the global minimum of this spike cost function in polynomial time. Our work provi

preprint2016arXiv

Set membership with non-adaptive bit probes

We consider the non-adaptive bit-probe complexity of the set membership problem, where a set S of size at most n from a universe of size m is to be represented as a short bit vector in order to answer membership queries of the form "Is x in S?" by non-adaptively probing the bit vector at t places. Let s_N(m,n,t) be the minimum number of bits of storage needed for such a scheme. In this work, we show existence of non-adaptive and adaptive schemes for a range of t that improves an upper bound of Buhrman, Miltersen, Radhakrishnan and Srinivasan (2002) on s_N(m,n,t). For three non-adaptive probes, we improve the previous best lower bound on s_N(m,n,3) by Alon and Feige (2009).

preprint2016arXiv

Unit Interval Editing is Fixed-Parameter Tractable

Given a graph~$G$ and integers $k_1$, $k_2$, and~$k_3$, the unit interval editing problem asks whether $G$ can be transformed into a unit interval graph by at most $k_1$ vertex deletions, $k_2$ edge deletions, and $k_3$ edge additions. We give an algorithm solving this problem in time $2^{O(k\log k)}\cdot (n+m)$, where $k := k_1 + k_2 + k_3$, and $n, m$ denote respectively the numbers of vertices and edges of $G$. Therefore, it is fixed-parameter tractable parameterized by the total number of allowed operations. Our algorithm implies the fixed-parameter tractability of the unit interval edge deletion problem, for which we also present a more efficient algorithm running in time $O(4^k \cdot (n + m))$. Another result is an $O(6^k \cdot (n + m))$-time algorithm for the unit interval vertex deletion problem, significantly improving the algorithm of van 't Hof and Villanger, which runs in time $O(6^k \cdot n^6)$.

preprint2016arXiv

Clique-Width and Directed Width Measures for Answer-Set Programming

Disjunctive Answer Set Programming (ASP) is a powerful declarative programming paradigm whose main decision problems are located on the second level of the polynomial hierarchy. Identifying tractable fragments and developing efficient algorithms for such fragments are thus important objectives in order to complement the sophisticated ASP systems available to date. Hard problems can become tractable if some problem parameter is bounded by a fixed constant; such problems are then called fixed-parameter tractable (FPT). While several FPT results for ASP exist, parameters that relate to directed or signed graphs representing the program at hand have been neglected so far. In this paper, we first give some negative observations showing that directed width measures on the dependency graph of a program do not lead to FPT results. We then consider the graph parameter of signed clique-width and present a novel dynamic programming algorithm that is FPT w.r.t. this parameter. Clique-width is more general than the well-known treewidth, and, to the best of our knowledge, ours is the first FPT algorithm for bounded clique-width for reasoning problems beyond SAT.

preprint2015arXiv

Fast Computation of Abelian Runs

Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $σ$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(σ+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomized algorithm for computing all the abelian runs of $w$. Its deterministic counterpart runs in $O(n^2\logσ)$ time.

preprint2017arXiv

An Agglomeration Law for Sorting Networks and its Application in Functional Programming

In this paper we will present a general agglomeration law for sorting networks. Agglomeration is a common technique when designing parallel programmes to control the granularity of the computation thereby finding a better fit between the algorithm and the machine on which the algorithm runs. Usually this is done by grouping smaller tasks and computing them en bloc within one parallel process. In the case of sorting networks this could be done by computing bigger parts of the network with one process. The agglomeration law in this paper pursues a different strategy: The input data is grouped and the algorithm is generalized to work on the agglomerated input while the original structure of the algorithm remains. This will result in a new access opportunity to sorting networks well-suited for efficient parallelization on modern multicore computers, computer networks or GPGPU programming. Additionally this enables us to use sorting networks as (parallel or distributed) merging stages for arbitrary sorting algorithms, thereby creating new hybrid sorting algorithms with ease. The expressiveness of functional programming languages helps us to apply this law to systematically constructed s

preprint2015arXiv

Algebraic Conditions for Generating Accurate Adjacency Arrays

Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. Associative arrays unify and simplify these different approaches into a common two-dimensional view of data. Graph construction, a fundamental operation in the data processing pipeline, is typically done by multiplying the incidence array representations of a graph, $\mathbf{E}_\mathrm{in}$ and $\mathbf{E}_\mathrm{out}$, to produce an adjacency matrix of the graph that can be processed with a variety of machine learning clustering techniques. This work focuses on establishing the mathematical criteria to ensure that the matrix product $\mathbf{E}_\mathrm{out}^\intercal\mathbf{E}_\mathrm{in}$ is the adjacency array of the graph. It will then be shown that these criteria are also necessary and sufficient for the remaining nonzero product of incidence arrays, $\mathbf{E}_\mathrm{in}^\intercal\mathbf{E}_\mathrm{out}$ to be the adjacency matrices of the reversed graph. Algebraic structures that comply with the criteria will be identified and discussed.

preprint2016arXiv

Parallel Algorithms for Core Maintenance in Dynamic Graphs

This paper initiates the studies of parallel algorithms for core maintenance in dynamic graphs. The core number is a fundamental index reflecting the cohesiveness of a graph, which are widely used in large-scale graph analytics. The core maintenance problem requires to update the core numbers of vertices after a set of edges and vertices are inserted into or deleted from the graph. We investigate the parallelism in the core update process when multiple edges and vertices are inserted or deleted. Specifically, we discover a structure called superior edge set, the insertion or deletion of edges in which can be processed in parallel. Based on the structure of superior edge set, efficient parallel algorithms are then devised for incremental and decremental core maintenance respectively. To the best of our knowledge, the proposed algorithms are the first parallel ones for the fundamental core maintenance problem. The algorithms show a significant speedup in the processing time compared with previous results that sequentially handle edge and vertex insertions/deletions. Finally, extensive experiments are conducted on different types of real-world and synthetic datasets, and the results i

preprint2016arXiv

Connecting a Set of Circles with Minimum Sum of Radii

We consider the problem of assigning radii to a given set of points in the plane, such that the resulting set of circles is connected, and the sum of radii is minimized. We show that the problem is polynomially solvable if a connectivity tree is given. If the connectivity tree is unknown, the problem is NP-hard if there are upper bounds on the radii and open otherwise. We give approximation guarantees for a variety of polynomial-time algorithms, describe upper and lower bounds (which are matching in some of the cases), provide polynomial-time approximation schemes, and conclude with experimental results and open problems.

preprint2017arXiv

Fast and Powerful Hashing using Tabulation

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters and we have precomputed character tables $h_1,...,h_c$ mapping characters to random hash values. A key $x=(x_1,...,x_c)$ is hashed to $h_1[x_1] \oplus h_2[x_2].....\oplus h_c[x_c]$. This schemes is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algori

preprint2016arXiv

New Geometric Algorithms for Fully Connected Staged Self-Assembly

We consider staged self-assembly systems, in which square-shaped tiles can be added to bins in several stages. Within these bins, the tiles may connect to each other, depending on the glue types of their edges. Previous work by Demaine et al. showed that a relatively small number of tile types suffices to produce arbitrary shapes in this model. However, these constructions were only based on a spanning tree of the geometric shape, so they did not produce full connectivity of the underlying grid graph in the case of shapes with holes; designing fully connected assemblies with a polylogarithmic number of stages was left as a major open problem. We resolve this challenge by presenting new systems for staged assembly that produce fully connected polyominoes in O(log^2 n) stages, for various scale factors and temperature τ = 2 as well as τ = 1. Our constructions work even for shapes with holes and uses only a constant number of glues and tiles. Moreover, the underlying approach is more geometric in nature, implying that it promised to be more feasible for shapes with compact geometric description.

preprint2015arXiv

A Note on Easy and Efficient Computation of Full Abelian Periods of a Word

Constantinescu and Ilie (Bulletin of the EATCS 89, 167-170, 2006) introduced the idea of an Abelian period with head and tail of a finite word. An Abelian period is called full if both the head and the tail are empty. We present a simple and easy-to-implement $O(n\log\log n)$-time algorithm for computing all the full Abelian periods of a word of length $n$ over a constant-size alphabet. Experiments show that our algorithm significantly outperforms the $O(n)$ algorithm proposed by Kociumaka et al. (Proc. of STACS, 245-256, 2013) for the same problem.

preprint2016arXiv

The Competition Complexity of Auctions: A Bulow-Klemperer Result for Multi-Dimensional Bidders

A seminal result of Bulow and Klemperer [1989] demonstrates the power of competition for extracting revenue: when selling a single item to $n$ bidders whose values are drawn i.i.d. from a regular distribution, the simple welfare-maximizing VCG mechanism (in this case, a second price-auction) with one additional bidder extracts at least as much revenue in expectation as the optimal mechanism. The beauty of this theorem stems from the fact that VCG is a {\em prior-independent} mechanism, where the seller possesses no information about the distribution, and yet, by recruiting one additional bidder it performs better than any prior-dependent mechanism tailored exactly to the distribution at hand (without the additional bidder). In this work, we establish the first {\em full Bulow-Klemperer} results in {\em multi-dimensional} environments, proving that by recruiting additional bidders, the revenue of the VCG mechanism surpasses that of the optimal (possibly randomized, Bayesian incentive compatible) mechanism. For a given environment with i.i.d. bidders, we term the number of additional bidders needed to achieve this guarantee the environment's {\em competition complexity}. Using th

preprint2016arXiv

Coalescing random walks and voting on connected graphs

In a coalescing random walk, a set of particles make independent random walks on a graph. Whenever one or more particles meet at a vertex, they unite to form a single particle, which then continues the random walk through the graph. Coalescing random walks can be used to achieve consensus in distributed networks, and is the basis of the self-stabilizing mutual exclusion algorithm of Israeli and Jalfon. Let G=(V,E), be an undirected, connected n vertex graph with m edges. Let C(n) be the expected time for all particles to coalesce, when initially one particle is located at each vertex of an n vertex graph. We study the problem of bounding the coalescence time C(n) for general classes of graphs. Our main result is that C(n)= O(1/(1-lambda_2))*((log n)^4 +n/A)), where lambda_2 is the absolute value of the second largest eigenvalue of the transition matrix of the random walk, A= (sum d^2(v))/(d^2 n), d(v) is the degree of vertex v, and d is the average node degree. The parameter A is an indicator of the variability of node degrees. Thus 1 <= A =O(n), with A=1 for regular graphs.

preprint2016arXiv

Online Algorithms for Multi-Level Aggregation

In the Multi-Level Aggregation Problem (MLAP), requests arrive at the nodes of an edge-weighted tree T, and have to be served eventually. A service is defined as a subtree X of T that contains its root. This subtree X serves all requests that are pending in the nodes of X, and the cost of this service is equal to the total weight of X. Each request also incurs waiting cost between its arrival and service times. The objective is to minimize the total waiting cost of all requests plus the total cost of all service subtrees. MLAP is a generalization of some well-studied optimization problems; for example, for trees of depth 1, MLAP is equivalent to the TCP Acknowledgment Problem, while for trees of depth 2, it is equivalent to the Joint Replenishment Problem. Aggregation problem for trees of arbitrary depth arise in multicasting, sensor networks, communication in organization hierarchies, and in supply-chain management. The instances of MLAP associated with these applications are naturally online, in the sense that aggregation decisions need to be made without information about future requests. Constant-competitive online algorithms are known for MLAP with one or two levels. However,

preprint2016arXiv

Graph Minors for Preserving Terminal Distances Approximately - Lower and Upper Bounds

Given a graph where vertices are partitioned into $k$ terminals and non-terminals, the goal is to compress the graph (i.e., reduce the number of non-terminals) using minor operations while preserving terminal distances approximately.The distortion of a compressed graph is the maximum multiplicative blow-up of distances between all pairs of terminals. We study the trade-off between the number of non-terminals and the distortion. This problem generalizes the Steiner Point Removal (SPR) problem, in which all non-terminals must be removed. We introduce a novel black-box reduction to convert any lower bound on distortion for the SPR problem into a super-linear lower bound on the number of non-terminals, with the same distortion, for our problem. This allows us to show that there exist graphs such that every minor with distortion less than $2~/~2.5~/~3$ must have $Ω(k^2)~/~Ω(k^{5/4})~/~Ω(k^{6/5})$ non-terminals, plus more trade-offs in between. The black-box reduction has an interesting consequence: if the tight lower bound on distortion for the SPR problem is super-constant, then allowing any $O(k)$ non-terminals will not help improving the lower bound to a constant. We also build on th

preprint2016arXiv

Monte Carlo Sort for unreliable human comparisons

Algorithms which sort lists of real numbers into ascending order have been studied for decades. They are typically based on a series of pairwise comparisons and run entirely on chip. However people routinely sort lists which depend on subjective or complex judgements that cannot be automated. Examples include marketing research; where surveys are used to learn about customer preferences for products, the recruiting process; where interviewers attempt to rank potential employees, and sporting tournaments; where we infer team rankings from a series of one on one matches. We develop a novel sorting algorithm, where each pairwise comparison reflects a subjective human judgement about which element is bigger or better. We introduce a finite and large error rate to each judgement, and we take the cost of each comparison to significantly exceed the cost of other computational steps. The algorithm must request the most informative sequence of comparisons from the user; in order to identify the correct sorted list with minimum human input. Our Discrete Adiabatic Monte Carlo approach exploits the gradual acquisition of information by tracking a set of plausible hypotheses which are updated a

preprint2016arXiv

On-line approach to off-line coloring problems on graphs with geometric representations

The main goal of this paper is to formalize and explore a connection between chromatic properties of graphs with geometric representations and competitive analysis of on-line algorithms, which became apparent after the recent construction of triangle-free geometric intersection graphs with arbitrarily large chromatic number due to Pawlik et al. We show that on-line graph coloring problems give rise to classes of game graphs with a natural geometric interpretation. We use this concept to estimate the chromatic number of graphs with geometric representations by finding, for appropriate simpler graphs, on-line coloring algorithms using few colors or proving that no such algorithms exist. We derive upper and lower bounds on the maximum chromatic number that rectangle overlap graphs, subtree overlap graphs, and interval filament graphs (all of which generalize interval overlap graphs) can have when their clique number is bounded. The bounds are absolute for interval filament graphs and asymptotic of the form $(\log\log n)^{f(ω)}$ for rectangle and subtree overlap graphs, where $f(ω)$ is a polynomial function of the clique number and $n$ is the number of vertices. In particular, we provi

preprint2016arXiv

A Unified Approach to Analyzing Asynchronous Coordinate Descent and Tatonnement

This paper concerns asynchrony in iterative processes, focusing on gradient descent and tatonnement, a fundamental price dynamic. Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically, gradient descent has been a sequential and synchronous process, although distributed and asynchronous variants have been studied since the 1980s. Coordinate descent is a commonly studied version of gradient descent. In this paper, we focus on asynchronous coordinate descent on convex functions $F:\mathbb{R}^n\rightarrow\mathbb{R}$ of the form $F(x) = f(x) + \sum_{k=1}^n Ψ_k(x_k)$, where $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is a smooth convex function, and each $Ψ_k:\mathbb{R}\rightarrow\mathbb{R}$ is a univariate and possibly non-smooth convex function. Such functions occur in many data analysis and machine learning problems. We give new analyses of cyclic coordinate descent, a parallel asynchronous stochastic coordinate descent, and a rather general worst-case parallel asynchronous coordinate descent. For all of these, we either obtain sharply improved bounds, or provide the first analyses. Our analyses all use a common amortized framework. The

preprint2016arXiv

Provable learning of Noisy-or Networks

Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables. Finding parameters with the maximum likelihood is NP-hard even in very simple settings. In recent years, provably efficient algorithms were nevertheless developed for models with linear structures: topic models, mixture models, hidden markov models, etc. These algorithms use matrix or tensor decomposition, and make some reasonable assumptions about the parameters of the underlying model. But matrix or tensor decomposition seems of little use when the latent variable model has nonlinearities. The current paper shows how to make progress: tensor decomposition is applied for learning the single-layer {\em noisy or} network, which is a textbook example of a Bayes net, and used for example in the classic QMR-DT software for diagnosing which disease(s) a patient may have by observing the symptoms he/she exhibits. The technical novelty here, which should be useful in other settings in future, is analysis of tensor decomposition in presence of systematic error (i.e., w

preprint2016arXiv

A Constant Optimization of the Binary Indexed Tree Query Operation

There are several data structures which can calculate the prefix sums of an array efficiently, while handling point updates on the array, such as Segment Trees and Binary Indexed Trees (BIT). Both these data structures can handle the these two operations (query and update) in $O(\log{n})$ time. In this paper, we present a data structure similar to the BIT, but with an even smaller constant. To do this, we use Zeckendorf's Theorem, a property of the Fibonacci sequence of numbers. The new data structure achieves the same complexity of $O(\log{n})$, but requires about $\log_{ϕ^{2}} n$ computations for the Query Operation as opposed to the $\log_{2} n$ computations required for a BIT Query Operation in the worst case.

preprint2016arXiv

A Non-generative Framework and Convex Relaxations for Unsupervised Learning

We give a novel formal theoretical framework for unsupervised learning with two distinctive characteristics. First, it does not assume any generative model and based on a worst-case performance metric. Second, it is comparative, namely performance is measured with respect to a given hypothesis class. This allows to avoid known computational hardness results and improper algorithms based on convex relaxations. We show how several families of unsupervised learning models, which were previously only analyzed under probabilistic assumptions and are otherwise provably intractable, can be efficiently learned in our framework by convex optimization.

preprint2016arXiv

Joins via Geometric Resolutions: Worst-case and Beyond

We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework allows us to prove results for data stored in Btrees, multidimensional data structures, and even multiple indices per table. A key idea in our framework is formalizing the inference one does with an index as a type of geometric resolution; transforming the algorithmic problem of computing joins to a geometric problem. Our notion of geometric resolution can be viewed as a geometric analog of logical resolution. In addition to the geometry and logic connections, our algorithm can also be thought of as backtracking search with memoization.

preprint2016arXiv

Spectral algorithms for tensor completion

In the tensor completion problem, one seeks to estimate a low-rank tensor based on a random sample of revealed entries. In terms of the required sample size, earlier work revealed a large gap between estimation with unbounded computational resources (using, for instance, tensor nuclear norm minimization) and polynomial-time algorithms. Among the latter, the best statistical guarantees have been proved, for third-order tensors, using the sixth level of the sum-of-squares (SOS) semidefinite programming hierarchy (Barak and Moitra, 2014). However, the SOS approach does not scale well to large problem instances. By contrast, spectral methods --- based on unfolding or matricizing the tensor --- are attractive for their low complexity, but have been believed to require a much larger sample size. This paper presents two main contributions. First, we propose a new unfolding-based method, which outperforms naive ones for symmetric $k$-th order tensors of rank $r$. For this result we make a study of singular space estimation for partially revealed matrices of large aspect ratio, which may be of independent interest. For third-order tensors, our algorithm matches the SOS method in terms of sa

535 works