Source author record

Henning Meyerhenke

Henning Meyerhenke appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms; Social and Information Networks; Distributed, Parallel, and Cluster Computing; physics.soc-ph; Machine Learning; Computational Engineering, Finance, and Science; Computational Geometry; Digital Libraries; Discrete Mathematics; Human-Computer Interaction; math.CO; Neural and Evolutionary Computing; physics.ao-ph

Catalog footprint

What is connected

31 works

13 topics

4 close collaborators

preprint, 2022, arXiv

Fast Dynamic Updates and Dynamic SpGEMM on MPI-Distributed Graphs

Sparse matrix multiplication (SpGEMM) is a fundamental kernel used in many diverse application areas, both numerical and discrete. For example, many algebraic graph algorithms rely on SpGEMM in the tropical semiring to compute shortest paths in graphs. Recently, SpGEMM has received growing attention regarding implementations for specific (parallel) architectures. Yet, this concerns only the static problem, where both input matrices do not change. In many applications, however, matrices (or their corresponding graphs) change over time. Although recomputing from scratch is very expensive, we are not aware of any dynamic SpGEMM algorithms in the literature. In this paper, we thus propose a batch-dynamic algorithm for MPI-based parallel computing. Building on top of a distributed graph/matrix data structure that allows for fast updates, our dynamic SpGEMM reduces the communication volume significantly. It does so by exploiting that updates change far fewer matrix entries than there are non-zeros in the input operands. Our experiments with popular benchmark graphs show that our approach pays off. For batches of insertions or removals of matrix entries, our dynamic SpGEMM is substantially faster than the static algorithms in the state-of-the-art competitors CombBLAS, CTF and PETSc.
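The idea that a batch of updates touches far fewer entries than the operands contain can be illustrated with a small single-process sketch. It is not the paper's MPI algorithm: the dict-of-dicts storage, the function names `spgemm` and `add_into`, and the restriction to additive insertions over the ordinary (+, ×) semiring are all assumptions made for illustration. It relies only on distributivity, (A + ΔA)·B = A·B + ΔA·B.

```python
from collections import defaultdict


def spgemm(A, B):
    """Multiply sparse matrices stored as {row: {col: value}} over (+, *)."""
    C = defaultdict(dict)
    for i, row in A.items():
        for k, a_ik in row.items():
            # Only rows of B that match a non-zero column of A contribute.
            for j, b_kj in B.get(k, {}).items():
                C[i][j] = C[i].get(j, 0) + a_ik * b_kj
    return dict(C)


def add_into(C, D):
    """Accumulate sparse matrix D into C in place and return C."""
    for i, row in D.items():
        target = C.setdefault(i, {})
        for j, v in row.items():
            target[j] = target.get(j, 0) + v
    return C


def copy_sparse(M):
    """Copy a dict-of-dicts matrix so that updates do not alias the original."""
    return {i: dict(row) for i, row in M.items()}


A = {0: {0: 1, 1: 2}, 1: {1: 3}}
B = {0: {0: 4}, 1: {0: 5, 1: 6}}
C = spgemm(A, B)

# A batch of insertions into A, expressed as a sparse delta matrix.
delta_A = {1: {0: 7}}

# Incremental update: only the non-zeros of delta_A are multiplied with B.
C_updated = add_into(copy_sparse(C), spgemm(delta_A, B))

# Reference result: recompute the full product from scratch.
C_recomputed = spgemm(add_into(copy_sparse(A), delta_A), B)
```

In this toy setting the incremental product does work proportional to the non-zeros of `delta_A` and the matching rows of `B`. Removals and non-invertible semirings such as the tropical one need extra care that this sketch does not attempt.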

preprint, 2022, arXiv

Interactive Visualization of Protein RINs using NetworKit in the Cloud

Network analysis has been applied in diverse application domains. In this paper, we consider an example from protein dynamics, specifically residue interaction networks (RINs). In this context, we use NetworKit -- an established package for network analysis -- to build a cloud-based environment that enables domain scientists to run their visualization and analysis workflows on large compute servers, without requiring extensive programming and/or system administration knowledge. To demonstrate the versatility of this approach, we use it to build a custom Jupyter-based widget for RIN visualization. In contrast to existing RIN visualization approaches, our widget can easily be customized through simple modifications of Python code, while both supporting a good feature set and providing near real-time speed. It is also easily integrated into analysis pipelines (e.g., that use Python to feed RIN data into downstream machine learning tasks).

preprint, 2022, arXiv

More Recent Advances in (Hyper)Graph Partitioning

In recent years, significant advances have been made in the design and evaluation of balanced (hyper)graph partitioning algorithms. We survey trends of the last decade in practical algorithms for balanced (hyper)graph partitioning together with future research directions. Our work serves as an update to a previous survey on the topic. In particular, the survey extends the previous survey by also covering hypergraph partitioning and streaming algorithms, and has an additional focus on parallel algorithms.

preprint, 2022, arXiv

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

Scientific workflow management systems like Nextflow support large-scale data analysis by abstracting away the details of scientific workflows. In these systems, workflows consist of several abstract tasks, of which instances are run in parallel and transform input partitions into output partitions. Resource managers like Kubernetes execute such workflow tasks on cluster infrastructures. However, these resource managers only consider the number of CPUs and the amount of available memory when assigning tasks to resources; they do not consider hardware differences beyond these numbers, while computational speed and memory access rates can differ significantly. We propose Tarema, a system for allocating task instances to heterogeneous cluster resources during the execution of scalable scientific workflows. First, Tarema profiles the available infrastructure with a set of benchmark programs and groups cluster nodes with similar performance. Second, Tarema uses online monitoring data of tasks, assigning labels to tasks depending on their resource usage. Third, Tarema uses the node groups and task labels to dynamically assign task instances evenly to resources based on resource demand. Our evaluation of a prototype implementation for Kubernetes, using five real-world Nextflow workflows from the popular nf-core framework and two 15-node clusters consisting of different virtual machines, shows a mean reduction of isolated job runtimes by 19.8% compared to popular schedulers in widely-used resource managers and 4.54% compared to the heuristic SJFN, while providing a better cluster usage. Moreover, executing two long-running workflows in parallel and on restricted resources shows that Tarema is able to reduce the runtimes even more while providing a fair cluster usage.

preprint, 2021, arXiv

Approximation of the Diagonal of a Laplacian's Pseudoinverse for Complex Network Analysis

The ubiquity of massive graph data sets in numerous applications requires fast algorithms for extracting knowledge from these data. We are motivated here by three electrical measures for the analysis of large small-world graphs $G = (V, E)$ -- i.e., graphs with diameter in $O(\log |V|)$, which are abundant in complex network analysis. From a computational point of view, the three measures have in common that their crucial component is the diagonal of the graph Laplacian's pseudoinverse, $L^\dagger$. Computing $\operatorname{diag}(L^\dagger)$ exactly by pseudoinversion, however, is as expensive as dense matrix multiplication -- and the standard tools in practice even require cubic time. Moreover, the pseudoinverse requires quadratic space -- hardly feasible for large graphs. Resorting to approximation by, e.g., using the Johnson-Lindenstrauss transform, requires the solution of $O(\log |V| / \epsilon^2)$ Laplacian linear systems to guarantee a relative error, which is still very expensive for large inputs. In this paper, we present a novel approximation algorithm that requires the solution of only one Laplacian linear system. The remaining parts are purely combinatorial -- mainly sampling uniform spanning trees, which we relate to $\operatorname{diag}(L^\dagger)$ via effective resistances. For small-world networks, our algorithm obtains a $\pm \epsilon$-approximation with high probability, in a time that is nearly-linear in $|E|$ and quadratic in $1 / \epsilon$. Another positive aspect of our algorithm is its parallel nature due to independent sampling. We thus provide two parallel implementations of our algorithm: one using OpenMP, one MPI + OpenMP. In our experiments against the state of the art, our algorithm (i) yields more accurate results, (ii) is much faster and more memory-efficient, and (iii) obtains good parallel speedups, in particular in the distributed setting.
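For orientation, two standard identities underlie the link between $L^\dagger$, effective resistances and uniform spanning trees. They are textbook facts stated here for an unweighted, connected graph, and the symbols $r(u,v)$ and $T$ are notation introduced for this note, not taken from the abstract; how the paper combines them into its estimator is not shown.

```latex
% Effective resistance between u and v in terms of the Laplacian pseudoinverse
r(u,v) \;=\; L^\dagger[u,u] + L^\dagger[v,v] - 2\,L^\dagger[u,v]

% Kirchhoff: for an edge {u,v} and a uniform spanning tree T of G
\Pr\bigl[\{u,v\} \in T\bigr] \;=\; r(u,v)
```

The second identity is what makes sampling attractive: the fraction of sampled spanning trees that contain an edge is an unbiased estimate of that edge's effective resistance.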

preprint, 2021, arXiv

New Approximation Algorithms for Forest Closeness Centrality -- for Individual Vertices and Vertex Groups

The emergence of massive graph data sets requires fast mining algorithms. Centrality measures to identify important vertices belong to the most popular analysis methods in graph mining. A measure that is gaining attention is forest closeness centrality; it is closely related to electrical measures using current flow but can also handle disconnected graphs. Recently, [Jin et al., ICDM'19] proposed an algorithm to approximate this measure probabilistically. Their algorithm processes small inputs quickly, but does not scale well beyond hundreds of thousands of vertices. In this paper, we first propose a different approximation algorithm; it is up to two orders of magnitude faster and more accurate in practice. Our method exploits the strong connection between uniform spanning trees and forest distances by adapting and extending recent approximation algorithms for related single-vertex problems. This results in a nearly-linear time algorithm with an absolute probabilistic error guarantee. In addition, we are the first to consider the problem of finding an optimal group of vertices w.r.t. forest closeness. We prove that this latter problem is NP-hard; to approximate it, we adapt a greedy algorithm by [Li et al., WWW'19], which is based on (partial) matrix inversion. Moreover, our experiments show that on disconnected graphs, group forest closeness outperforms existing centrality measures in the context of semi-supervised vertex classification.

preprint, 2021, arXiv

The climatic interdependence of extreme-rainfall events around the globe

The identification of regions of similar climatological behavior can be utilized for the discovery of spatial relationships over long-range scales, including teleconnections. In this regard, the global picture of the interdependence patterns of extreme rainfall events (EREs) still needs to be further explored. To this end, we propose a top-down complex-network-based clustering workflow, with the combination of consensus clustering and mutual correspondences. Consensus clustering provides a reliable community structure under each dataset, while mutual correspondences build a matching relationship between different community structures obtained from different datasets. This approach ensures the robustness of the identified structures when multiple datasets are available. By applying it simultaneously to two satellite-derived precipitation datasets, we identify consistent synchronized structures of EREs around the globe, during boreal summer. Two of them show independent spatiotemporal characteristics, uncovering the primary compositions of different monsoon systems. They explicitly manifest the primary intraseasonal variability in the context of the global monsoon, in particular the 'monsoon jump' over both East Asia and West Africa and the mid-summer drought over Central America and southern Mexico. Through a case study related to the Asian summer monsoon (ASM), we verify that the intraseasonal changes of upper-level atmospheric conditions are preserved by significant connections within the global synchronization structure. Our work advances network-based clustering methodology for (i) decoding the spatiotemporal configuration of interdependence patterns of natural variability and for (ii) the intercomparison of these patterns, especially regarding their spatial distributions over different datasets.

preprint, 2020, arXiv

Combined Centrality Measures for an Improved Characterization of Influence Spread in Social Networks

Influence Maximization (IM) aims at finding the most influential users in a social network, i.e., users who maximize the spread of an opinion within a certain propagation model. Previous work investigated the correlation between influence spread and nodal centrality measures to bypass more expensive IM simulations. The results were promising but incomplete, since these studies investigated the performance (i.e., the ability to identify influential users) of centrality measures only in restricted settings, e.g., in undirected/unweighted networks and/or within a propagation model less common for IM. In this paper, we first show that good results within the Susceptible-Infected-Removed (SIR) propagation model for unweighted and undirected networks do not necessarily transfer to directed or weighted networks under the popular Independent Cascade (IC) propagation model. Then, we identify a set of centrality measures with good performance for weighted and directed networks within the IC model. Our main contribution is a new way to combine the centrality measures in a closed formula to yield even better results. Additionally, we also extend gravitational centrality (GC) with the proposed combined centrality measures. Our experiments on 50 real-world data sets show that our proposed centrality measures outperform well-known centrality measures and the state-of-the-art GC measure significantly. Keywords: social networks, influence maximization, centrality measures, IC propagation model, influential spreaders
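The "more expensive IM simulations" that centrality measures are meant to bypass can be pictured with a minimal Monte Carlo sketch of the Independent Cascade model. The adjacency format, the function names `independent_cascade` and `estimate_spread`, and the default number of runs are assumptions made for illustration; the paper's experimental setup is not reproduced here.

```python
import random


def independent_cascade(out_edges, seeds, rng):
    """Run one Independent Cascade simulation and return the activated nodes.

    out_edges maps a node to a list of (neighbor, activation_probability).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                # A newly activated node gets exactly one attempt per out-edge.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(out_edges, seeds, runs=1000, seed=0):
    """Average number of activated nodes over repeated simulations."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(out_edges, seeds, rng)) for _ in range(runs))
    return total / runs
```

Each spread estimate needs many simulation runs per candidate seed set, which is why a cheap centrality score that correlates with the estimated spread is appealing.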

preprint, 2020, arXiv

High-Quality Hierarchical Process Mapping

Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When the topology of a distributed system is known, an important task is then to map the blocks of the partition onto the processors such that the overall communication cost is reduced. We present novel multilevel algorithms that integrate graph partitioning and process mapping. Important ingredients of our algorithm include fast label propagation, more localized local search, initial partitioning, as well as a compressed data structure to compute processor distances without storing a distance matrix. Experiments indicate that our algorithms speed up the overall mapping process and, due to the integrated multilevel approach, also find much better solutions in practice. For example, one configuration of our algorithm yields better solutions than the previous state-of-the-art in terms of mapping quality while being a factor of 62 faster. Compared to the currently fastest iterated multilevel mapping algorithm Scotch, we obtain 16% better solutions while investing slightly more running time.

preprint, 2016, arXiv

An Empirical Comparison of Big Graph Frameworks in the Context of Network Analysis

Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks - GraphLab, Apache Giraph, Giraph++ and Apache Flink - are used to implement algorithms for the representative problems Connected Components, Community Detection, PageRank and Clustering Coefficients. The implementations are executed on a computer cluster to evaluate the frameworks' suitability in practice and to compare their performance to that of the single-machine, shared-memory parallel network analysis package NetworKit. Out of the distributed frameworks, GraphLab and Apache Giraph generally show the best performance. In our experiments a cluster of eight computers running Apache Giraph enables the analysis of a network with about 2 billion edges, which is too large for a single machine of the same type. However, for networks that fit into memory of one machine, the performance of the shared-memory parallel implementation is far better than the distributed ones. The study provides experimental evidence for selecting the appropriate framework depending on the task and data volume.

preprint, 2016, arXiv

Generating massive complex networks with hyperbolic geometry faster in practice

Generative network models play an important role in algorithm development, scaling studies, network analysis, and realistic system benchmarks for graph data sets. The commonly used graph-based benchmark model R-MAT has some drawbacks concerning realism and the scaling behavior of network properties. A complex network model gaining considerable popularity builds random hyperbolic graphs, generated by distributing points within a disk in the hyperbolic plane and then adding edges between points whose hyperbolic distance is below a threshold. We present in this paper a fast generation algorithm for such graphs. Our experiments show that our new generator achieves speedup factors of 3-60 over the best previous implementation. One billion edges can now be generated in under one minute on a shared-memory workstation. Furthermore, we present a dynamic extension to model gradual network change, while preserving at each step the point position probabilities.
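The model described above (points in a hyperbolic disk, edges below a distance threshold) can be written down as a naive quadratic-time baseline. This is a sketch of the threshold model only, not the fast generator from the paper. The radial density, the parameter names `R` and `alpha`, and the function names are assumptions based on the commonly used formulation of random hyperbolic graphs.

```python
import math
import random


def hyperbolic_distance(r1, phi1, r2, phi2):
    """Distance between two points given in native polar coordinates."""
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(phi1 - phi2))
    return math.acosh(max(1.0, x))  # clamp guards against rounding below 1


def random_hyperbolic_graph(n, R, alpha=1.0, seed=0):
    """Sample n points in a disk of radius R and connect pairs at distance <= R."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        # Inverse-CDF sampling of the radial density
        # alpha * sinh(alpha * r) / (cosh(alpha * R) - 1).
        u = rng.random()
        r = math.acosh(1.0 + u * (math.cosh(alpha * R) - 1.0)) / alpha
        points.append((r, phi))
    # Quadratic pairwise probing: the cost that faster generators avoid.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if hyperbolic_distance(*points[i], *points[j]) <= R]
    return points, edges
```

The all-pairs loop makes this usable only for small `n`; it serves as a reference against which a faster generator could be checked.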

preprint, 2016, arXiv

Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently

In this paper we define the notion of a probabilistic neighborhood in spatial data: Let a set $P$ of $n$ points in $\mathbb{R}^d$, a query point $q \in \mathbb{R}^d$, a distance metric $\operatorname{dist}$, and a monotonically decreasing function $f : \mathbb{R}^+ \rightarrow [0,1]$ be given. Then a point $p \in P$ belongs to the probabilistic neighborhood $N(q, f)$ of $q$ with respect to $f$ with probability $f(\operatorname{dist}(p,q))$. We envision applications in facility location, sensor networks, and other scenarios where a connection between two entities becomes less likely with increasing distance. A straightforward query algorithm would determine a probabilistic neighborhood in $\Theta(n \cdot d)$ time by probing each point in $P$. To answer the query in sublinear time for the planar case, we augment a quadtree suitably and design a corresponding query algorithm. Our theoretical analysis shows that -- for certain distributions of planar $P$ -- our algorithm answers a query in $O((|N(q,f)| + \sqrt{n})\log n)$ time with high probability (whp). This matches up to a logarithmic factor the cost induced by quadtree-based algorithms for deterministic queries and is asymptotically faster than the straightforward approach whenever $|N(q,f)| \in o(n / \log n)$. As practical proofs of concept we use two applications, one in the Euclidean and one in the hyperbolic plane. In particular, our results yield the first generator for random hyperbolic graphs with arbitrary temperatures in subquadratic time. Moreover, our experimental data show the usefulness of our algorithm even if the point distribution is unknown or not uniform: The running time savings over the pairwise probing approach constitute at least one order of magnitude already for a modest number of points and queries.
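The straightforward query that the quadtree approach is compared against can be sketched in a few lines. The function name `probabilistic_neighborhood` and the choice of Euclidean distance are assumptions for illustration; the quadtree-based algorithm itself is not shown.

```python
import math
import random


def probabilistic_neighborhood(points, q, f, rng):
    """Probe every point once and include p with probability f(dist(p, q)).

    This is the naive baseline: one distance computation in d dimensions
    per point, hence time linear in n * d.
    """
    return [p for p in points if rng.random() < f(math.dist(p, q))]


# Example decay function: membership becomes less likely with distance.
def exponential_decay(distance):
    return math.exp(-distance)


sample = probabilistic_neighborhood(
    [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0)], (0.0, 0.0),
    exponential_decay, random.Random(0))
```

Because every point is probed regardless of how unlikely it is to be included, the cost does not shrink when the expected neighborhood is small, which is the gap a sublinear query algorithm targets.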

preprint, 2016, arXiv

Structure-Preserving Sparsification Methods for Social Networks

Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally or locally by these scores. We show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. In order to assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure.
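A rough sketch of what "preserving edges leading to local hub nodes" could look like follows. The abstract does not give the exact rule, so the budget of floor(deg(v)^alpha) highest-degree neighbors per node, the parameter name `alpha`, and the function name are assumptions rather than the NetworKit implementation.

```python
import math


def local_degree_sparsify(adj, alpha=0.5):
    """Keep, for every node, the edges to its highest-degree neighbors.

    adj maps each node to a list of neighbors (undirected, both directions
    present). Each node keeps floor(deg^alpha) edges; an edge survives if
    at least one endpoint keeps it.
    """
    kept = set()
    for v, neighbors in adj.items():
        budget = math.floor(len(neighbors) ** alpha)
        # Rank neighbors by degree (descending), ties broken by node id.
        ranked = sorted(neighbors, key=lambda u: (-len(adj[u]), u))
        for u in ranked[:budget]:
            kept.add((min(u, v), max(u, v)))
    return kept
```

On a star with one extra edge between two leaves, this rule keeps all spokes to the hub and drops the leaf-to-leaf edge, which matches the intuition of a filter that is local to each node's neighborhood.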

preprint, 2015, arXiv

Algorithms for Mapping Parallel Processes onto Grid and Torus Architectures

Static mapping is the assignment of parallel processes to the processing elements (PEs) of a parallel system, where the assignment does not change during the application's lifetime. In our scenario we model an application's computations and their dependencies by an application graph. This graph is first partitioned into (nearly) equally sized blocks. These blocks need to communicate at block boundaries. To assign the processes to PEs, our goal is to compute a communication-efficient bijective mapping between the blocks and the PEs. This approach of partitioning followed by bijective mapping has many degrees of freedom. Thus, users and developers of parallel applications need to know more about which choices work for which application graphs and which parallel architectures. To this end, we not only develop new mapping algorithms (derived from known greedy methods) but also perform extensive experiments involving different classes of application graphs (meshes and complex networks), architectures of parallel computers (grids and tori), as well as different partitioners and mapping algorithms. Surprisingly, the quality of the partitions, unless very poor, has little influence on the quality of the mapping. More importantly, one of our new mapping algorithms always yields the best results in terms of the quality measure maximum congestion when the application graphs are complex networks. In case of meshes as application graphs, this mapping algorithm always leads in terms of both maximum congestion and maximum dilation, another common quality measure.
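To make the notion of mapping quality concrete, the following sketch evaluates a given block-to-PE mapping on a 2D grid. It assumes that dilation is measured as the hop distance between the PEs of two communicating blocks and adds a total weighted hop count as a second figure; both definitions and all names are illustrative assumptions, and congestion (which depends on routing) is left out.

```python
def grid_distance(pe_a, pe_b):
    """Hop distance between two PEs addressed by (row, col) on a 2D grid."""
    return abs(pe_a[0] - pe_b[0]) + abs(pe_a[1] - pe_b[1])


def mapping_quality(block_edges, mapping):
    """Return (maximum dilation, total weighted hop count) of a mapping.

    block_edges maps a block pair (a, b) to its communication volume;
    mapping assigns each block to a distinct PE coordinate.
    """
    dilations = {e: grid_distance(mapping[e[0]], mapping[e[1]]) for e in block_edges}
    max_dilation = max(dilations.values(), default=0)
    total_cost = sum(w * dilations[e] for e, w in block_edges.items())
    return max_dilation, total_cost


edges = {(0, 1): 10, (1, 2): 1, (0, 3): 5}
good = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
poor = {0: (0, 0), 1: (1, 1), 2: (0, 1), 3: (1, 0)}
good_quality = mapping_quality(edges, good)
poor_quality = mapping_quality(edges, poor)
```

Both mappings use the same partition; only the placement differs, which is the degree of freedom a mapping algorithm optimizes.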

preprint, 2015, arXiv

Approximating Betweenness Centrality in Fully-dynamic Networks

Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in networks that change over time. In this paper we propose the first betweenness centrality approximation algorithms with a provable guarantee on the maximum approximation error for dynamic networks. Several new intermediate algorithmic results contribute to the respective approximation algorithms: (i) new upper bounds on the vertex diameter, (ii) the first fully-dynamic algorithm for updating an approximation of the vertex diameter in undirected graphs, and (iii) an algorithm with lower time complexity for updating single-source shortest paths in unweighted graphs after a batch of edge actions. Using approximation, our algorithms are the first to make in-memory computation of betweenness in dynamic networks with millions of edges feasible. Our experiments show that our algorithms can achieve substantial speedups compared to recomputation, up to several orders of magnitude. Moreover, the approximation accuracy is usually significantly better than the theoretical guarantee in terms of absolute error. More importantly, for reasonably small approximation error thresholds, the rank of nodes is well preserved, in particular for nodes with high betweenness.
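One common way to approximate betweenness is to sample node pairs, pick a uniformly random shortest path between them, and credit its inner nodes. The sketch below shows that static sampling step for unweighted graphs. Whether the paper's estimator has exactly this form is an assumption, and the dynamic update machinery, the sample-size bound based on the vertex diameter, and all function names are outside what the abstract states.

```python
import random
from collections import deque


def sample_shortest_path(adj, s, t, rng):
    """Return a uniformly random shortest s-t path, or None if t is unreachable."""
    dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # number of shortest paths reaching v
                preds[v].append(u)
    if t not in dist:
        return None
    path, node = [t], t
    while node != s:
        # Choose a predecessor proportionally to its number of shortest paths.
        node = rng.choices(preds[node], weights=[sigma[p] for p in preds[node]])[0]
        path.append(node)
    return path[::-1]


def approx_betweenness(adj, samples, seed=0):
    """Estimate normalized betweenness from sampled shortest paths."""
    rng = random.Random(seed)
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for _ in range(samples):
        s, t = rng.sample(nodes, 2)
        path = sample_shortest_path(adj, s, t, rng)
        if path:
            for v in path[1:-1]:  # endpoints are not credited
                score[v] += 1.0 / samples
    return score
```

A dynamic variant would keep the sampled paths and only repair those affected by an edge update instead of resampling everything, which is where the speedups over recomputation would come from.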

preprint, 2015, arXiv

Drawing Large Graphs by Multilevel Maxent-Stress Optimization

Drawing large graphs appropriately is an important step for the visual analysis of data from real-world networks. Here we present a novel multilevel algorithm to compute a graph layout with respect to a recently proposed metric that combines layout stress and entropy. As opposed to previous work, we do not solve the linear systems of the maxent-stress metric with a typical numerical solver. Instead we use a simple local iterative scheme within a multilevel approach. To accelerate local optimization, we approximate long-range forces and use shared-memory parallelism. Our experiments validate the high potential of our approach, which is particularly appealing for dynamic graphs. In comparison to the previously best maxent-stress optimizer, which is sequential, our parallel implementation is on average 30 times faster already for static graphs (and still faster if executed on one thread) while producing a comparable solution quality.

preprint, 2015, arXiv

Engineering Parallel Algorithms for Community Detection in Massive Networks

The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state-of-the-art competitors. The processing rate of our fastest algorithm often reaches 50M edges/second. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast. Our methods are suitable for massive data sets with billions of edges.
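The label propagation heuristic mentioned above can be sketched sequentially as follows. This is a simplified, deterministic variant for illustration: the fixed node order, the tie-breaking rule and the function name are assumptions, and the parallel scheme from the paper is not reproduced.

```python
from collections import Counter


def label_propagation(adj, max_rounds=100):
    """Sequential label propagation; returns a dict node -> community label."""
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Keep the current label on ties to avoid oscillation.
            if labels[v] not in candidates:
                labels[v] = min(candidates)
                changed = True
        if not changed:
            break
    return labels
```

Each round only reads neighbor labels, so the per-node updates are largely independent, which is what makes the scheme a natural candidate for shared-memory parallelization.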

preprint, 2015, arXiv

Fast generation of complex networks with underlying hyperbolic geometry

Complex networks have become increasingly popular for modeling various real-world phenomena. Realistic generative network models are important in this context as they avoid privacy concerns of real data and simplify complex network research regarding data sharing, reproducibility, and scalability studies. Random hyperbolic graphs are a well-analyzed family of geometric graphs. Previous work provided empirical and theoretical evidence that this generative graph model creates networks with non-vanishing clustering and other realistic features. However, the investigated networks in previous applied work were small, possibly due to the quadratic running time of a previous generator. In this work we provide the first generation algorithm for these networks with subquadratic running time. We prove a time complexity of $O((n^{3/2}+m) \log n)$ with high probability for the generation process. This running time is confirmed by experimental data with our implementation. The acceleration stems primarily from the reduction of pairwise distance computations through a polar quadtree, which we adapt to hyperbolic space for this purpose. In practice we improve the running time of a previous implementation by at least two orders of magnitude this way. Networks with billions of edges can now be generated in a few minutes. Finally, we evaluate the largest networks of this model published so far. Our empirical analysis shows that important features are retained over different graph densities and degree distributions.

preprint, 2015, arXiv

Fully-dynamic Approximation of Betweenness Centrality

Betweenness is a well-known centrality measure that ranks the nodes of a network according to their participation in shortest paths. Since an exact computation is prohibitive in large networks, several approximation algorithms have been proposed. Besides that, recent years have seen the publication of dynamic algorithms for efficient recomputation of betweenness in evolving networks. In previous work we proposed the first semi-dynamic algorithms that recompute an approximation of betweenness in connected graphs after batches of edge insertions. In this paper we propose the first fully-dynamic approximation algorithms (for weighted and unweighted undirected graphs that need not be connected) with a provable guarantee on the maximum approximation error. The transfer to fully-dynamic and disconnected graphs implies additional algorithmic problems that could be of independent interest. In particular, we propose a new upper bound on the vertex diameter for weighted undirected graphs. For both weighted and unweighted graphs, we also propose the first fully-dynamic algorithms that keep track of such an upper bound. In addition, we extend our former algorithm for semi-dynamic BFS to batches of both edge insertions and deletions. Using approximation, our algorithms are the first to make in-memory computation of betweenness in fully-dynamic networks with millions of edges feasible. Our experiments show that they can achieve substantial speedups compared to recomputation, up to several orders of magnitude.

preprint, 2015, arXiv

Is Nearly-linear the same in Theory and Practice? A Case Study with a Combinatorial Laplacian Solver

Linear system solving is one of the main workhorses in applied mathematics. Recently, theoretical computer scientists have contributed sophisticated algorithms for solving linear systems with symmetric diagonally dominant matrices (a class to which Laplacian matrices belong) in provably nearly-linear time. While these algorithms are highly interesting from a theoretical perspective, there are no published results on how they perform in practice. With this paper we address this gap. We provide the first implementation of the combinatorial solver by [Kelner et al., STOC 2013], which is particularly appealing for implementation due to its conceptual simplicity. The algorithm exploits that a Laplacian matrix corresponds to a graph; solving Laplacian linear systems amounts to finding an electrical flow in this graph with the help of cycles induced by a spanning tree with the low-stretch property. The results of our comprehensive experimental study are ambivalent. They confirm a nearly-linear running time, but for reasonable inputs the constant factors make the solver much slower than methods with higher asymptotic complexity. One other aspect predicted by theory is confirmed by our findings, though: Spanning trees with lower stretch indeed reduce the solver's running time. Yet, simple spanning tree algorithms perform better in practice than those with a guaranteed low stretch.

preprint, 2015, arXiv

k-way Hypergraph Partitioning via n-Level Recursive Bisection

We develop a multilevel algorithm for hypergraph partitioning that contracts the vertices one at a time. Using several caching and lazy-evaluation techniques during coarsening and refinement, we reduce the running time by up to two orders of magnitude compared to a naive $n$-level algorithm that would be adequate for ordinary graph partitioning. The overall performance is even better than the widely used hMetis hypergraph partitioner that uses a classical multilevel algorithm with few levels. Aided by a portfolio-based approach to initial partitioning and adaptive budgeting of imbalance within recursive bipartitioning, we achieve very high quality. We assembled a large benchmark set with 310 hypergraphs stemming from application areas such as VLSI, SAT solving, social networks, and scientific computing. We achieve significantly smaller cuts than hMetis and PaToH, while being faster than hMetis. Considerably larger improvements are observed for some instance classes like social networks, for bipartitioning, and for partitions with an allowed imbalance of 10%. The algorithm presented in this work forms the basis of our hypergraph partitioning framework KaHyPar (Karlsruhe Hypergraph Partitioning).
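What "smaller cuts" means for a hypergraph can be made concrete with two standard objective functions: the number of cut hyperedges and the connectivity metric, which sums (number of blocks spanned minus one) over all hyperedges. Which objective the paper optimizes is not stated in the abstract, so the sketch below simply computes both for unit hyperedge weights; the function name and data layout are assumptions.

```python
def cut_metrics(hyperedges, part):
    """Return (cut-net, connectivity) for a partition of a hypergraph.

    hyperedges is a list of non-empty vertex lists; part maps each vertex
    to its block id. All hyperedges are assumed to have unit weight.
    """
    cut_net = 0
    connectivity = 0
    for pins in hyperedges:
        blocks = {part[v] for v in pins}
        if len(blocks) > 1:
            cut_net += 1  # hyperedge spans more than one block
        connectivity += len(blocks) - 1
    return cut_net, connectivity


hyperedges = [[0, 1, 2], [2, 3], [0, 3, 4]]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2}
metrics = cut_metrics(hyperedges, partition)
```

For bipartitions the two values coincide; they differ only when a hyperedge touches three or more blocks, as the third hyperedge does here.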

preprint2015arXiv

n-Level Hypergraph Partitioning

We develop a multilevel algorithm for hypergraph partitioning that contracts the vertices one at a time and thus allows very high quality. This includes a rating function that avoids nonuniform vertex weights, an efficient "semi-dynamic" hypergraph data structure, a very fast coarsening algorithm, and two new local search algorithms. One is a $k$-way hypergraph adaptation of Fiduccia-Mattheyses local search and gives high quality at reasonable cost. The other is an adaptation of size-constrained label propagation to hypergraphs. Comparisons with hMetis and PaToH indicate that the new algorithm yields better quality over several benchmark sets and has a running time that is comparable to hMetis. Using label propagation local search is several times faster than hMetis and gives better quality than PaToH for a VLSI benchmark set.

preprint2015arXiv

NetworKit: A Tool Suite for Large-scale Complex Network Analysis

We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid combining the kernels written in C++ with a Python front end, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.

preprint2015arXiv

Parallel Graph Partitioning for Complex Networks

Processing large complex networks like social networks or web graphs has recently attracted considerable interest. In order to do this in parallel, we need to partition them into pieces of about equal size. Unfortunately, previous parallel graph partitioners originally developed for more regular mesh-like networks do not work well for these networks. This paper addresses this problem by parallelizing and adapting the label propagation technique originally developed for graph clustering. By introducing size constraints, label propagation becomes applicable for both the coarsening and the refinement phase of multilevel graph partitioning. We obtain very high quality by applying a highly parallel evolutionary algorithm to the coarsened graph. The resulting system is both more scalable and achieves higher quality than state-of-the-art systems like ParMetis or PT-Scotch. For large complex networks the performance differences are very big. For example, our algorithm can partition a web graph with 3.3 billion edges in less than sixteen seconds using 512 cores of a high performance cluster while producing a high quality partition -- none of the competing systems can handle this graph on our system.

preprint2015arXiv

Recent Advances in Graph Partitioning

We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions.

preprint2015arXiv

Structure-Preserving Sparsification of Social Networks

Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally by these scores. In addition, we propose a new sparsification method (Local Degree) which preserves edges leading to local hub nodes. All methods are evaluated on a set of 100 Facebook social networks with respect to network properties including diameter, connected components, community structure, and multiple node centrality measures. Experiments with our implementations of the sparsification methods (using the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20% of the original set of edges. Furthermore, the experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. Our Local Degree method is fast enough for large-scale networks and performs well across a wider range of properties than previously proposed methods.
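The idea of keeping edges that lead to local hubs can be sketched as follows. The abstract does not state the exact selection rule, so this sketch assumes one: each node keeps the edges to its floor(deg^alpha) highest-degree neighbours, with alpha a hypothetical tuning parameter. It is an illustration of the idea, not the NetworKit implementation.

```python
import math


def local_degree_sparsify(adj, alpha=0.5):
    """adj: dict node -> set of neighbours. Returns the set of kept edges.

    Assumed rule: every node keeps the edges to its floor(deg ** alpha)
    neighbours of highest degree, i.e. to its local hubs.
    """
    kept = set()
    for v, neighbours in adj.items():
        if not neighbours:
            continue
        budget = math.floor(len(neighbours) ** alpha)
        # Rank neighbours by their own degree, highest first (ties by node id).
        ranked = sorted(neighbours, key=lambda u: (-len(adj[u]), u))
        for u in ranked[:budget]:
            kept.add((min(u, v), max(u, v)))
    return kept


# Hub 0 with four neighbours, plus one edge between two of them.
graph = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
print(sorted(local_degree_sparsify(graph)))
# [(0, 1), (0, 2), (0, 3), (0, 4)]
```

In this toy graph every edge to the hub survives, while the edge (1, 2) between two low-degree nodes is dropped.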

preprint2014arXiv

Approximating Betweenness Centrality in Large Evolving Networks

Betweenness centrality ranks the importance of nodes by their participation in all shortest paths of the network. Therefore computing exact betweenness values is impractical in large networks. For static networks, approximation based on randomly sampled paths has been shown to be significantly faster in practice. However, for dynamic networks, no approximation algorithm for betweenness centrality is known that improves on static recomputation. We address this deficit by proposing two incremental approximation algorithms (for weighted and unweighted connected graphs) which provide a provable guarantee on the absolute approximation error. Processing batches of edge insertions, our algorithms yield significant speedups up to a factor of $10^4$ compared to restarting the approximation. This is enabled by investing memory to store and efficiently update shortest paths. As a building block, we also propose an asymptotically faster algorithm for updating the SSSP problem in unweighted graphs. Our experimental study shows that our algorithms are the first to make in-memory computation of a betweenness ranking practical for million-edge semi-dynamic networks. Moreover, our results show that the accuracy is even better than the theoretical guarantees in terms of absolute errors and that the rank of nodes is well preserved, in particular for those with high betweenness.
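The static approximation by randomly sampled paths mentioned above can be sketched for unweighted graphs. This is a minimal illustration of the sampling idea only; it does not include the incremental update machinery of the paper, and the function names, the seed handling and the normalisation by the number of samples are assumptions of this sketch.

```python
import random
from collections import deque


def sample_shortest_path(adj, s, t, rng):
    """Return one shortest s-t path chosen uniformly at random, or None."""
    dist = {s: 0}
    sigma = {s: 1}          # number of shortest paths from s to each node
    preds = {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                preds[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
                preds[v].append(u)
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        cur = path[-1]
        # Pick a predecessor with probability proportional to its path count.
        weights = [sigma[p] for p in preds[cur]]
        path.append(rng.choices(preds[cur], weights=weights)[0])
    return path[::-1]


def approx_betweenness(adj, samples, seed=0):
    """Estimate, per node, the fraction of sampled shortest paths it lies inside."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    score = {v: 0.0 for v in nodes}
    for _ in range(samples):
        s, t = rng.sample(nodes, 2)
        path = sample_shortest_path(adj, s, t, rng)
        if path is None:
            continue
        for v in path[1:-1]:        # endpoints do not count
            score[v] += 1.0 / samples
    return score


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(approx_betweenness(star, samples=200))
```

On the star graph only the centre can be an inner node of a shortest path, so the leaves always score 0.0 and the centre's score approaches 0.5 as the number of samples grows.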

preprint2014arXiv

Finding all Convex Cuts of a Plane Graph in Polynomial Time

Convexity is a notion that has been defined for subsets of $\mathbb{R}^n$ and for subsets of general graphs. A convex cut of a graph $G=(V, E)$ is a $2$-partition $V_1 \dot{\cup} V_2=V$ such that both $V_1$ and $V_2$ are convex, i.e., shortest paths between vertices in $V_i$ never leave $V_i$, $i \in \{1, 2\}$. Finding convex cuts is $\mathcal{NP}$-hard for general graphs. To characterize convex cuts, we employ the Djokovic relation, a reflexive and symmetric relation on the edges of a graph that is based on shortest paths between the edges' end vertices. It has long been known that, if $G$ is bipartite and the Djokovic relation is transitive on $G$, i.e., $G$ is a partial cube, then the cut-sets of $G$'s convex cuts are precisely the equivalence classes of the Djokovic relation. In particular, any edge of $G$ is contained in the cut-set of exactly one convex cut. We first characterize a class of plane graphs that we call well-arranged. These graphs are not necessarily partial cubes, but any edge of a well-arranged graph is contained in the cut-set(s) of at least one convex cut. We also present an algorithm that uses the Djokovic relation for computing all convex cuts of a (not necessarily plane) bipartite graph in $\mathcal{O}(|E|^3)$ time. Specifically, a cut-set is the cut-set of a convex cut if and only if the Djokovic relation holds for any pair of edges in the cut-set. We then characterize the cut-sets of the convex cuts of a general graph $H$ using two binary relations on edges: (i) the Djokovic relation on the edges of a subdivision of $H$, where any edge of $H$ is subdivided into exactly two edges and (ii) a relation on the edges of $H$ itself that is not the Djokovic relation. Finally, we use this characterization to present the first algorithm for finding all convex cuts of a plane graph in polynomial time.
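A small sketch may help make the Djokovic relation concrete. It assumes the standard definition for connected, unweighted graphs: an edge f is related to an edge e = (x, y) if one end vertex of f is strictly closer to x than to y and the other is strictly closer to y than to x. The function names are hypothetical and the code is not taken from the paper.

```python
from collections import deque


def bfs_dist(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def djokovic_related(adj, e, f):
    """True if edge f joins a vertex closer to e[0] with one closer to e[1].

    Assumes a connected graph, so all distances are defined.
    """
    x, y = e
    dx, dy = bfs_dist(adj, x), bfs_dist(adj, y)
    a, b = f
    a_near_x, a_near_y = dx[a] < dy[a], dy[a] < dx[a]
    b_near_x, b_near_y = dx[b] < dy[b], dy[b] < dx[b]
    return (a_near_x and b_near_y) or (a_near_y and b_near_x)


# In a 4-cycle (a partial cube), opposite edges are related, adjacent ones are not.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(djokovic_related(c4, (0, 1), (2, 3)))  # True
print(djokovic_related(c4, (0, 1), (1, 2)))  # False
```

The two equivalence classes found this way, {(0, 1), (2, 3)} and {(1, 2), (3, 0)}, are exactly the cut-sets of the two convex cuts of the 4-cycle.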

preprint2014arXiv

Partitioning Complex Networks via Size-constrained Clustering

The most commonly used method to tackle the graph partitioning problem in practice is the multilevel approach. During a coarsening phase, a multilevel graph partitioning algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small enough to be partitioned by some other algorithm. A partition of the input graph is then constructed by successively transferring the solution to the next finer graph and applying a local search algorithm to improve the current solution. In this paper, we describe a novel approach to partition graphs effectively, especially if the networks have a highly irregular structure. More precisely, our algorithm provides graph coarsening by iteratively contracting size-constrained clusterings that are computed using a label propagation algorithm. The same algorithm that provides the size-constrained clusterings can also be used during uncoarsening as a fast and simple local search algorithm. Depending on the algorithm's configuration, we are able to compute partitions of very high quality outperforming all competitors, or partitions that are comparable to the best competitor in terms of quality, hMetis, while being nearly an order of magnitude faster on average. The fastest configuration partitions the largest graph available to us with 3.3 billion edges using a single machine in about ten minutes while cutting fewer than half as many edges as the fastest competitor, kMetis.
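Size-constrained label propagation, as described above, can be sketched in a few lines. This is a simplified, sequential illustration for unweighted graphs; the fixed node order, the tie-breaking rule and the parameter names are assumptions made here for determinism and are not taken from the paper.

```python
from collections import defaultdict


def size_constrained_label_propagation(adj, max_size, rounds=10):
    """adj: dict node -> list of neighbours (unweighted, undirected).

    Every node starts in its own cluster. In each round, nodes are visited in
    sorted order and move to the neighbouring cluster with the most connecting
    edges, provided the target cluster stays within max_size nodes.
    """
    label = {v: v for v in adj}
    size = {v: 1 for v in adj}
    for _ in range(rounds):
        moved = False
        for v in sorted(adj):
            counts = defaultdict(int)
            for u in adj[v]:
                counts[label[u]] += 1
            # Start from the current cluster; only strictly better clusters
            # that still have room for one more node can replace it.
            best, best_count = label[v], counts.get(label[v], 0)
            for c in sorted(counts):
                if c != label[v] and size[c] < max_size and counts[c] > best_count:
                    best, best_count = c, counts[c]
            if best != label[v]:
                size[label[v]] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:
            break
    return label


# Two triangles joined by the edge (2, 3); clusters may hold at most 3 nodes.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(size_constrained_label_propagation(two_triangles, max_size=3))
# {0: 1, 1: 1, 2: 1, 3: 4, 4: 4, 5: 4}
```

Contracting each resulting cluster into a single node would give the next coarser graph. The size bound is what prevents one cluster from absorbing the whole network.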

preprint2014arXiv

Tree-based Coarsening and Partitioning of Complex Networks

Many applications produce massive complex networks whose analysis would benefit from parallel processing. Parallel algorithms, in turn, often require a suitable network partition. For solving optimization tasks such as graph partitioning on large networks, multilevel methods are preferred in practice. Yet, complex networks pose challenges to established multilevel algorithms, in particular to their coarsening phase. One way to specify a (recursive) coarsening of a graph is to rate its edges and then contract the edges as prioritized by the rating. In this paper we (i) define weights for the edges of a network that express the edges' importance for connectivity, (ii) compute a minimum weight spanning tree $T^m$ with respect to these weights, and (iii) rate the network edges based on the conductance values of $T^m$'s fundamental cuts. To this end, we also (iv) develop the first optimal linear-time algorithm to compute the conductance values of all fundamental cuts of a given spanning tree. We integrate the new edge rating into a leading multilevel graph partitioner and equip the latter with a new greedy postprocessing for optimizing the maximum communication volume (MCV). Experiments on bipartitioning frequently used benchmark networks show that the postprocessing already reduces MCV by 11.3%. Our new edge rating further reduces MCV by 10.3% compared to the previously best rating with the postprocessing in place for both ratings. In total, with a modest increase in running time, our new approach reduces the MCV of complex network partitions by 20.4%.
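To make the notion of a fundamental cut's conductance concrete, here is a deliberately naive sketch. It recomputes each cut from scratch, so it is not the linear-time algorithm the abstract refers to. It assumes the common definition of conductance as cut weight divided by the smaller of the two side volumes; all names are hypothetical.

```python
def fundamental_cut_conductances(n, graph_edges, tree_edges):
    """Map each tree edge to the conductance of the cut it induces.

    graph_edges: list of (u, v, weight); tree_edges: list of (u, v).
    Removing a tree edge splits the tree into two sides; the fundamental cut
    consists of all graph edges running between those sides.
    """
    degree = [0.0] * n                      # weighted degree of each node
    for u, v, w in graph_edges:
        degree[u] += w
        degree[v] += w
    total_volume = sum(degree)

    result = {}
    for removed in tree_edges:
        # Collect the tree component containing removed[0] once the edge is cut.
        adj = [[] for _ in range(n)]
        for u, v in tree_edges:
            if (u, v) != removed:
                adj[u].append(v)
                adj[v].append(u)
        side = {removed[0]}
        stack = [removed[0]]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in side:
                    side.add(v)
                    stack.append(v)
        cut = sum(w for u, v, w in graph_edges if (u in side) != (v in side))
        volume = sum(degree[v] for v in side)
        result[removed] = cut / min(volume, total_volume - volume)
    return result


# Unit-weight 4-cycle with the path 0-1-2-3 as spanning tree.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
tree = [(0, 1), (1, 2), (2, 3)]
print(fundamental_cut_conductances(4, edges, tree))
# {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 1.0}
```

The middle tree edge splits the cycle into two balanced halves and therefore has the lowest conductance of the three cuts.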

preprint2013arXiv

Static and Dynamic Aspects of Scientific Collaboration Networks

Collaboration networks arise when we map the connections between scientists which are formed through joint publications. These networks thus display the social structure of academia, and also allow conclusions about the structure of scientific knowledge. Using the computer science publication database DBLP, we compile relations between authors and publications as graphs and proceed with examining and quantifying collaborative relations with graph-based methods. We review standard properties of the network and rank authors and publications by centrality. Additionally, we detect communities with modularity-based clustering and compare the resulting clusters to a ground truth based on conferences and thus topical similarity. In a second part, we are the first to combine DBLP network data with data from the Dagstuhl Seminars: we investigate whether seminars of this kind, as social and academic events designed to connect researchers, leave a visible trace in the structure of the collaboration network. Our results suggest that such single events are not influential enough to change the network structure significantly. However, the network structure seems to influence a participant's decision to accept or decline an invitation.

31 published item(s)

Fast Dynamic Updates and Dynamic SpGEMM on MPI-Distributed Graphs

Interactive Visualization of Protein RINs using NetworKit in the Cloud

More Recent Advances in (Hyper)Graph Partitioning

Tarema: Adaptive Resource Allocation for Scalable Scientific Workflows in Heterogeneous Clusters

Approximation of the Diagonal of a Laplacian's Pseudoinverse for Complex Network Analysis

New Approximation Algorithms for Forest Closeness Centrality -- for Individual Vertices and Vertex Groups

The climatic interdependence of extreme-rainfall events around the globe

Combined Centrality Measures for an Improved Characterization of Influence Spread in Social Networks

High-Quality Hierarchical Process Mapping

An Empirical Comparison of Big Graph Frameworks in the Context of Network Analysis

Generating massive complex networks with hyperbolic geometry faster in practice

Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently

Structure-Preserving Sparsification Methods for Social Networks

Algorithms for Mapping Parallel Processes onto Grid and Torus Architectures

Approximating Betweenness Centrality in Fully-dynamic Networks

Drawing Large Graphs by Multilevel Maxent-Stress Optimization

Engineering Parallel Algorithms for Community Detection in Massive Networks

Fast generation of complex networks with underlying hyperbolic geometry

Fully-dynamic Approximation of Betweenness Centrality

Is Nearly-linear the same in Theory and Practice? A Case Study with a Combinatorial Laplacian Solver

k-way Hypergraph Partitioning via n-Level Recursive Bisection

n-Level Hypergraph Partitioning

NetworKit: A Tool Suite for Large-scale Complex Network Analysis

Parallel Graph Partitioning for Complex Networks

Recent Advances in Graph Partitioning

Structure-Preserving Sparsification of Social Networks

Approximating Betweenness Centrality in Large Evolving Networks

Finding all Convex Cuts of a Plane Graph in Polynomial Time

Partitioning Complex Networks via Size-constrained Clustering

Tree-based Coarsening and Partitioning of Complex Networks

Static and Dynamic Aspects of Scientific Collaboration Networks