Source author record

Minas Gjoka

Minas Gjoka appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Social and Information Networks physics.data-an physics.soc-ph Networking and Internet Architecture Methodology Data Structures and Algorithms Information Theory math.IT

Catalog footprint

What is connected

7works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2015arXiv

Estimating Subgraph Frequencies with or without Attributes from Egocentrically Sampled Data

In this paper we show how to efficiently produce unbiased estimates of subgraph frequencies from a probability sample of egocentric networks (i.e., focal nodes, their neighbors, and the induced subgraphs of ties among their neighbors). A key feature of our proposed method that differentiates it from prior methods is the use of egocentric data. Because of this, our method is suitable for estimation in large unknown graphs, is easily parallelizable, handles privacy sensitive network data (e.g. egonets with no neighbor labels), and supports counting of large subgraphs (e.g. maximal clique of size 205 in Section 6) by building on top of existing exact subgraph counting algorithms that may not support sampling. It gracefully handles a variety of sampling designs such as uniform or weighted independence or random walk sampling. Our method can be used for subgraphs that are: (i) undirected or directed; (ii) induced or non-induced; (iii) maximal or non-maximal; and (iv) potentially annotated with attributes. We compare our estimators on a variety of real-world graphs and sampling methods and provide suggestions for their use. Simulation shows that our method outperforms the state-of-the-art approach for relative subgraph frequencies by up to an order of magnitude for the same sample size. Finally, we apply our methodology to a rare sample of Facebook users across the social graph to estimate and interpret the clique size distribution and gender composition of cliques.

preprint2013arXiv

A Network Coding Approach to Loss Tomography

Network tomography aims at inferring internal network characteristics based on measurements at the edge of the network. In loss tomography, in particular, the characteristic of interest is the loss rate of individual links and multicast and/or unicast end-to-end probes are typically used. Independently, recent advances in network coding have shown that there are advantages from allowing intermediate nodes to process and combine, in addition to just forward, packets. In this paper, we study the problem of loss tomography in networks with network coding capabilities. We design a framework for estimating link loss rates, which leverages network coding capabilities, and we show that it improves several aspects of tomography including the identifiability of links, the trade-off between estimation accuracy and bandwidth efficiency, and the complexity of probe path selection. We discuss the cases of inferring link loss rates in a tree topology and in a general topology. In the latter case, the benefits of our approach are even more pronounced compared to standard techniques, but we also face novel challenges, such as dealing with cycles and multiple paths between sources and receivers. Overall, this work makes the connection between active network tomography and network coding.

preprint2013arXiv

Estimating Clique Composition and Size Distributions from Sampled Network Data

Cliques are defined as complete graphs or subgraphs; they are the strongest form of cohesive subgroup, and are of interest in both social science and engineering contexts. In this paper we show how to efficiently estimate the distribution of clique sizes from a probability sample of nodes obtained from a graph (e.g., by independence or link-trace sampling). We introduce two types of unbiased estimators, one of which exploits labeling of sampled nodes neighbors and one of which does not require this information. We compare the estimators on a variety of real-world graphs and provide suggestions for their use. We generalize our estimators to cases in which cliques are distinguished not only by size but also by node attributes, allowing us to estimate clique composition by size. Finally, we apply our methodology to a sample of Facebook users to estimate the clique size distribution by gender over the social graph.

preprint2012arXiv

2.5K-Graphs: from Sampling to Generation

Understanding network structure and having access to realistic graphs plays a central role in computer and social networks research. In this paper, we propose a complete, and practical methodology for generating graphs that resemble a real graph of interest. The metrics of the original topology we target to match are the joint degree distribution (JDD) and the degree-dependent average clustering coefficient ($\bar{c}(k)$). We start by developing efficient estimators for these two metrics based on a node sample collected via either independence sampling or random walks. Then, we process the output of the estimators to ensure that the target properties are realizable. Finally, we propose an efficient algorithm for generating topologies that have the exact target JDD and a $\bar{c}(k)$ close to the target. Extensive simulations using real-life graphs show that the graphs generated by our methodology are similar to the original graph with respect to, not only the two target metrics, but also a wide range of other topological metrics; furthermore, our generator is order of magnitudes faster than state-of-the-art techniques.

preprint2011arXiv

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

Our goal in this paper is to develop a practical framework for obtaining a uniform sample of users in an online social network (OSN) by crawling its social graph. Such a sample allows to estimate any user property and some topological properties as well. To this end, first, we consider and compare several candidate crawling techniques. Two approaches that can produce approximately uniform samples are the Metropolis-Hasting random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the "ground truth." In contrast, using Breadth-First-Search (BFS) or an unadjusted Random Walk (RW) leads to substantially biased results. Second, and in addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these diagnostics can be used to effectively determine when a random walk sample is of adequate size and quality. Third, as a case study, we apply the above methods to Facebook and we collect the first, to the best of our knowledge, representative sample of Facebook users. We make it publicly available and employ it to characterize several key properties of Facebook.

preprint2011arXiv

Coarse-Grained Topology Estimation via Graph Sampling

Many online networks are measured and studied via sampling techniques, which typically collect a relatively small fraction of nodes and their associated edges. Past work in this area has primarily focused on obtaining a representative sample of nodes and on efficient estimation of local graph properties (such as node degree distribution or any node attribute) based on that sample. However, less is known about estimating the global topology of the underlying graph. In this paper, we show how to efficiently estimate the coarse-grained topology of a graph from a probability sample of nodes. In particular, we consider that nodes are partitioned into categories (e.g., countries or work/study places in OSNs), which naturally defines a weighted category graph. We are interested in estimating (i) the size of categories and (ii) the probability that nodes from two different categories are connected. For each of the above, we develop a family of estimators for design-based inference under uniform or non-uniform sampling, employing either of two measurement strategies: induced subgraph sampling, which relies only on information about the sampled nodes; and star sampling, which also exploits category information about the neighbors of sampled nodes. We prove consistency of these estimators and evaluate their efficiency via simulation on fully known graphs. We also apply our methodology to a sample of Facebook users to obtain a number of category graphs, such as the college friendship graph and the country friendship graph; we share and visualize the resulting data at www.geosocialmap.com.

preprint2011arXiv

Multigraph Sampling of Online Social Networks

State-of-the-art techniques for probability sampling of users of online social networks (OSNs) are based on random walks on a single social relation (typically friendship). While powerful, these methods rely on the social graph being fully connected. Furthermore, the mixing time of the sampling process strongly depends on the characteristics of this graph. In this paper, we observe that there often exist other relations between OSN users, such as membership in the same group or participation in the same event. We propose to exploit the graphs these relations induce, by performing a random walk on their union multigraph. We design a computationally efficient way to perform multigraph sampling by randomly selecting the graph on which to walk at each iteration. We demonstrate the benefits of our approach through (i) simulation in synthetic graphs, and (ii) measurements of Last.fm - an Internet website for music with social networking features. More specifically, we show that multigraph sampling can obtain a representative sample and faster convergence, even when the individual graphs fail, i.e., are disconnected or highly clustered.

Minas Gjoka

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Estimating Subgraph Frequencies with or without Attributes from Egocentrically Sampled Data

A Network Coding Approach to Loss Tomography

Estimating Clique Composition and Size Distributions from Sampled Network Data

2.5K-Graphs: from Sampling to Generation

A Walk in Facebook: Uniform Sampling of Users in Online Social Networks

Coarse-Grained Topology Estimation via Graph Sampling

Multigraph Sampling of Online Social Networks