Researcher profile

Kamil Orujzade

Kamil Orujzade contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
1 topic
2 close collaborators

Published work

2 published items

Preprint, 2022, arXiv

Out-of-Core Edge Partitioning at Linear Run-Time

Graph edge partitioning is an important preprocessing step to optimize distributed computing jobs on graph-structured data. The edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Out-of-core edge partitioning algorithms are able to tackle the problem with low memory overhead. Existing out-of-core algorithms mainly work in a streaming manner and can be grouped into two types. Stateless streaming edge partitioning is fast but yields low partitioning quality. Stateful streaming edge partitioning yields better quality but is expensive, as it requires a scoring function to be evaluated for every edge on every partition, leading to a time complexity of $\mathcal{O}(|E| \cdot k)$. In this paper, we propose 2PS-L, a novel out-of-core edge partitioning algorithm that builds upon the stateful streaming model, but achieves linear run-time (i.e., $\mathcal{O}(|E|)$). 2PS-L consists of two phases. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and the vertex clustering from the first phase is exploited to reduce the search space of graph partitioning to only two target partitions for every edge. Our evaluations show that 2PS-L can achieve better partitioning quality than existing stateful streaming edge partitioners while having a much lower run-time. As a consequence, the total run-time of partitioning and subsequent distributed graph processing can be significantly reduced.
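
The two-phase idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the clustering rule, the `max_volume` cap, the cluster-to-partition mapping (`i % k`), the scoring tuple, and the fallback for full partitions are all simplifying assumptions chosen for illustration. The point it shows is that, once each vertex has a cluster, only the (at most two) partitions of the endpoints' clusters need to be scored per edge, instead of all $k$.

```python
from collections import defaultdict


def cluster_stream(edges, max_volume):
    """Phase 1 (simplified stand-in): one pass that groups vertices into clusters.

    A cluster's volume is the sum of the observed degrees of its vertices.
    """
    cluster = {}               # vertex -> cluster id (cluster ids are vertex ids)
    degree = defaultdict(int)  # vertex -> degree observed so far
    volume = defaultdict(int)  # cluster id -> volume
    for u, v in edges:
        for x in (u, v):
            if x not in cluster:
                cluster[x] = x  # every vertex starts in its own cluster
            degree[x] += 1
            volume[cluster[x]] += 1
        cu, cv = cluster[u], cluster[v]
        if cu == cv:
            continue
        # Move the endpoint from the lower-volume cluster into the other one,
        # unless that would exceed the (assumed) volume cap.
        mover, src, dst = (u, cu, cv) if volume[cu] <= volume[cv] else (v, cv, cu)
        if volume[dst] + degree[mover] <= max_volume:
            volume[src] -= degree[mover]
            volume[dst] += degree[mover]
            cluster[mover] = dst
    return cluster


def partition_two_candidates(edges, cluster, k, capacity):
    """Phase 2 (simplified): re-stream edges, scoring at most two partitions each."""
    # Hypothetical mapping of clusters to partitions: round-robin by sorted id.
    part_of = {c: i % k for i, c in enumerate(sorted(set(cluster.values())))}
    loads = [0] * k
    replicas = defaultdict(set)  # vertex -> partitions holding a copy of it
    assignment = []
    for u, v in edges:
        candidates = sorted({part_of[cluster[u]], part_of[cluster[v]]})
        open_parts = [p for p in candidates if loads[p] < capacity]
        if open_parts:
            # Prefer the candidate already holding more endpoints, then lower load.
            best = max(open_parts,
                       key=lambda p: ((p in replicas[u]) + (p in replicas[v]), -loads[p]))
        else:
            # Simplification: scan for the least-loaded partition (costs O(k)).
            best = min(range(k), key=loads.__getitem__)
        loads[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)
        assignment.append(best)
    return assignment
```

In this sketch the per-edge work in phase 2 is constant whenever a candidate partition still has room. The fallback scan is the one place where the sketch departs from a strictly linear run-time.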

Preprint, 2020, arXiv

2PS: High-Quality Edge Partitioning with Two-Phase Streaming

Graph partitioning is an important preprocessing step for distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: when early edges in the stream are partitioned, no information is available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparatively high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavy-weight random access partitioners while inducing orders of magnitude lower memory overhead.
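
As a rough illustration of the single-pass streaming model described in this abstract, the Python sketch below assigns each edge immediately to one of $k$ partitions using a scoring function, then computes the replication factor (the average number of partitions that hold a copy of each vertex). This is a generic greedy baseline, not 2PS itself. The function names, the scoring tuple, and the hard capacity of $\lceil |E| / k \rceil$ edges per partition are assumptions made for the example, and the edge count is assumed to be known up front.

```python
from collections import defaultdict


def greedy_stream_partition(edges, k):
    """Assign each edge to one of k partitions in a single pass (generic baseline)."""
    edges = list(edges)
    capacity = -(-len(edges) // k)  # ceil(|E| / k): assumed hard balance cap
    loads = [0] * k
    replicas = defaultdict(set)     # vertex -> partitions holding a copy of it
    assignment = []
    for u, v in edges:
        best, best_score = None, None
        for p in range(k):          # every partition is scored for every edge
            if loads[p] >= capacity:
                continue
            # Prefer partitions already holding u and/or v, then lower load.
            score = ((p in replicas[u]) + (p in replicas[v]), -loads[p])
            if best_score is None or score > best_score:
                best, best_score = p, score
        loads[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)
        assignment.append(best)     # the decision is never revisited
    return assignment, replicas


def replication_factor(replicas):
    """Average number of partitions per vertex (1.0 means no vertex is replicated)."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)
```

Because the first edges arrive when no partition holds any vertex yet, their scores tie and only the load term decides, which is the uninformed assignment problem the abstract refers to.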