Source author record

Nelly Litvak

Nelly Litvak appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.PR · physics.soc-ph · Social and Information Networks · Data Structures and Algorithms · cond-mat.stat-mech · Digital Libraries · Distributed, Parallel, and Cluster Computing · math.NA · math.OC · Networking and Internet Architecture · Numerical Analysis

Catalog footprint

What is connected

18 works

11 topics

4 close collaborators


preprint · 2022 · arXiv

Red Light Green Light Method for Solving Large Markov Chains

Discrete-time discrete-state finite Markov chains are versatile mathematical models for a wide range of real-life stochastic processes. One of the most common tasks in studies of Markov chains is the computation of the stationary distribution. Without loss of generality, and drawing our motivation from applications to large networks, we interpret this problem as one of computing the stationary distribution of a random walk on a graph. We propose a new controlled, easily distributed algorithm for this task, briefly summarized as follows: at the beginning, each node receives a fixed amount of cash (positive or negative), and at each iteration, some nodes receive a 'green light' to distribute their wealth or debt proportionally to the transition probabilities of the Markov chain; the stationary probability of a node is computed as the ratio of the cash distributed by this node to the total cash distributed by all nodes together. Our method includes as special cases a wide range of known, very different, and previously disconnected methods, including power iterations, Gauss-Southwell, and online distributed algorithms. We prove exponential convergence of our method, demonstrate its high efficiency, and derive scheduling strategies for the green light that achieve a faster convergence rate than state-of-the-art algorithms.
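
The cash-distribution scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `green_light_stationary`, the dense row-stochastic matrix `P`, the nonnegative initial cash, and the greedy schedule (the green light goes to the node holding the most cash, in the spirit of Gauss-Southwell) are all assumptions made for the example.

```python
def green_light_stationary(P, cash, steps=10_000):
    """Estimate the stationary distribution of the chain with transition matrix P.

    P     -- row-stochastic matrix as a list of lists
    cash  -- initial cash per node (assumed nonnegative, not all zero, in this sketch)
    steps -- number of green-light rounds
    """
    n = len(P)
    cash = list(cash)
    distributed = [0.0] * n  # total cash each node has passed on so far
    for _ in range(steps):
        # Hypothetical schedule: give the green light to the node with the most cash.
        i = max(range(n), key=lambda k: abs(cash[k]))
        amount = cash[i]
        distributed[i] += amount
        cash[i] = 0.0
        # Spread the cash proportionally to row i of the transition matrix
        # (a self-loop P[i][i] sends part of it straight back to node i).
        for j, p in enumerate(P[i]):
            cash[j] += amount * p
    total = sum(distributed)
    # Stationary probability = cash distributed by a node / cash distributed by all.
    return [d / total for d in distributed]


P = [[0.5, 0.5], [0.2, 0.8]]
print(green_light_stationary(P, [0.5, 0.5]))  # roughly [0.2857, 0.7143], i.e. (2/7, 5/7)
```

Giving every node the green light in every round instead turns this loop into a (cumulative) power iteration, one of the special cases the abstract mentions.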

preprint · 2022 · arXiv

When Less Is More: Systematic Analysis of Cascade-Based Community Detection

Information diffusion, spreading of infectious diseases, and spreading of rumors are fundamental processes occurring in real-life networks. In many practical cases, one can observe when nodes become infected, but the underlying network, over which a contagion or information propagates, is hidden. Inferring properties of the underlying network is important since these properties can be used for constraining infections, forecasting, viral marketing, and so on. Moreover, for many applications, it is sufficient to recover only coarse high-level properties of this network rather than all its edges. This article conducts a systematic and extensive analysis of the following problem: Given only the infection times, find communities of highly interconnected nodes. This task significantly differs from the well-studied community detection problem since we do not observe a graph to be clustered. We carry out a thorough comparison between existing and new approaches on several large datasets and cover methodological challenges specific to this problem. One of the main conclusions is that the most stable performance and the most significant improvement on the current state-of-the-art are achieved by our proposed simple heuristic approaches agnostic to a particular graph structure and epidemic model. We also show that some well-known community detection algorithms can be enhanced by including edge weights based on the cascade data.
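
One simple way to turn infection-time data into input for a standard community detection algorithm is to link nodes that appear in the same cascade. The sketch below is an illustrative assumption, not necessarily one of the heuristics evaluated in the article: each cascade is a hypothetical `dict` mapping node to infection time, and every pair of nodes infected in the same cascade gets an edge whose weight counts the cascades they share.

```python
from collections import Counter
from itertools import combinations


def cascade_cooccurrence_graph(cascades):
    """Build a weighted, undirected graph from cascades.

    cascades -- list of dicts mapping node id -> infection time
    Returns {(u, v): weight} with u < v, where weight is the number of
    cascades in which both u and v were infected.
    """
    weights = Counter()
    for cascade in cascades:
        # Infection times are ignored here; only co-infection is counted.
        infected = sorted(cascade)
        for u, v in combinations(infected, 2):
            weights[(u, v)] += 1
    return dict(weights)


cascades = [{"a": 0.0, "b": 1.5, "c": 2.0}, {"a": 0.0, "b": 0.7}]
print(cascade_cooccurrence_graph(cascades))
# {('a', 'b'): 2, ('a', 'c'): 1, ('b', 'c'): 1}
```

The resulting weights could then be passed to any weighted community detection method; a time-aware variant might, for instance, only link nodes infected close together in time.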

preprint · 2015 · arXiv

Modelling of trends in Twitter using retweet graph dynamics

In this paper, we model user behaviour in Twitter to capture the emergence of trending topics. For this purpose, we first extensively analyse tweet datasets of several different events. In particular, for these datasets, we construct and investigate the retweet graphs. We find that the retweet graph for a trending topic has a relatively dense largest connected component (LCC). Next, based on the insights obtained from the analyses of the datasets, we design a mathematical model that describes the evolution of a retweet graph by three main parameters. We then quantify, analytically and by simulation, the influence of the model parameters on the basic characteristics of the retweet graph, such as the density of edges and the size and density of the LCC. Finally, we put the model into practice, estimate its parameters, and compare the resulting behaviour of the model to our datasets.
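
The LCC statistics mentioned above are straightforward to compute once retweets are stored as (retweeter, original poster) pairs. This sketch treats the retweet graph as undirected and defines density as edges per node; both choices, and the function name `lcc_stats`, are assumptions for illustration rather than the paper's definitions.

```python
from collections import defaultdict, deque


def lcc_stats(retweets):
    """Return (nodes, edges, density) of the largest connected component.

    retweets -- iterable of (retweeter, original_author) pairs
    Density is taken here to mean edges per node (an assumption).
    """
    adj = defaultdict(set)
    for u, v in retweets:
        if u != v:  # ignore self-retweets; repeated pairs collapse into one edge
            adj[u].add(v)
            adj[v].add(u)

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component

    # Each undirected edge appears in two adjacency sets.
    edges = sum(len(adj[n]) for n in best) // 2
    density = edges / len(best) if best else 0.0
    return len(best), edges, density


print(lcc_stats([(1, 2), (2, 3), (3, 1), (4, 5), (2, 1)]))  # (3, 3, 1.0)
```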

preprint · 2015 · arXiv

Phase transitions for scaling of structural correlations in directed networks

Analysis of degree-degree dependencies in complex networks, and of their impact on processes on networks, requires null models, i.e., models that generate uncorrelated scale-free networks. However, most models to date show structural negative dependencies caused by finite-size effects. We analyze the behavior of these structural negative degree-degree dependencies, using rank-based correlation measures, in the directed Erased Configuration Model. We obtain expressions for the scaling as a function of the exponents of the degree distributions. Moreover, we show that this scaling undergoes a phase transition, where one region exhibits scaling related to the natural cut-off of the network while another region has scaling similar to the structural cut-off for uncorrelated networks. By establishing the speed of convergence of these structural dependencies, we are able to assess the statistical significance of degree-degree dependencies in finite complex networks when compared to networks generated by the directed Erased Configuration Model.

preprint · 2015 · arXiv

Predicting the long-term citation impact of recent publications

A fundamental problem in citation analysis is the prediction of the long-term citation impact of recent publications. We propose a model to predict a probability distribution for the future number of citations of a publication. Two predictors are used: The impact factor of the journal in which a publication has appeared and the number of citations a publication has received one year after its appearance. The proposed model is based on quantile regression. We employ the model to predict the future number of citations of a large set of publications in the field of physics. Our analysis shows that both predictors (i.e., impact factor and early citations) contribute to the accurate prediction of long-term citation impact. We also analytically study the behavior of the quantile regression coefficients for high quantiles of the distribution of citations. This is done by linking the quantile regression approach to a quantile estimation technique from extreme value theory. Our work provides insight into the influence of the impact factor and early citations on the long-term citation impact of a publication, and it takes a step toward a methodology that can be used to assess research institutions based on their most recently published work.
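
Quantile regression fits a conditional quantile by minimizing the check (pinball) loss instead of squared error. The sketch below shows only that core idea on an intercept-only model, where the minimizer reduces to an empirical quantile; the predictors used in the paper (impact factor and early citations) would enter as additional regression terms, which this example does not attempt. The function names and the toy citation counts are hypothetical.

```python
def check_loss(residual, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(values, tau):
    """Constant q minimizing the summed check loss (intercept-only quantile regression).

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer is attained at one of them; scanning them suffices.
    """
    return min(values, key=lambda q: sum(check_loss(y - q, tau) for y in values))


citations = [1, 2, 3, 4, 100]
print(best_constant_quantile(citations, 0.5))  # 3, the median
print(best_constant_quantile(citations, 0.9))  # 100
```

The jump from 3 to 100 illustrates why high quantiles of heavy-tailed citation counts need special treatment, which is where the abstract's link to extreme value theory comes in.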

preprint · 2015 · arXiv

Upper bounds for number of removed edges in the Erased Configuration Model

Models for generating simple graphs are important in the study of real-world complex networks. A well-established example of such a model is the erased configuration model, where each node receives a number of half-edges that are paired with other half-edges uniformly at random, and then self-loops are removed and multiple edges are merged to make the graph simple. Although asymptotic results for many properties of this model, such as the limiting degree distribution, are known, the exact speed of convergence in terms of the graph size remains an open question. We provide a first answer by analyzing the size dependence of the average number of removed edges in the erased configuration model. By combining known upper bounds with a Tauberian theorem, we obtain upper bounds for the number of removed edges in terms of the size of the graph. Remarkably, when the degree distribution follows a power law, we observe three scaling regimes, depending on the power-law exponent. Our results provide a strong theoretical basis for evaluating finite-size effects in networks.
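
The erased configuration model described above can be simulated directly: pair half-edges uniformly at random, then drop self-loops and merge multiple edges. This sketch counts how many paired edges the erasure removes; the function name and the input format (a list of node degrees with an even sum) are assumptions for illustration.

```python
import random


def erased_configuration_model(degrees, seed=None):
    """Return (simple_edges, removed) for one erased configuration model sample.

    degrees      -- list of nonnegative ints whose sum must be even
    simple_edges -- set of (u, v) pairs with u < v
    removed      -- number of paired edges erased (self-loops + duplicates)
    """
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    # One entry per half-edge, labeled by the node it belongs to.
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    # Shuffling and pairing consecutive entries gives a uniform random matching.
    rng.shuffle(stubs)

    simple_edges, removed = set(), 0
    for k in range(0, len(stubs), 2):
        u, v = stubs[k], stubs[k + 1]
        edge = (min(u, v), max(u, v))
        if u == v or edge in simple_edges:
            removed += 1  # erase a self-loop or merge a multiple edge
        else:
            simple_edges.add(edge)
    return simple_edges, removed


edges, removed = erased_configuration_model([3, 3, 2, 2, 2], seed=1)
print(len(edges), removed)  # kept + removed always equals sum(degrees) / 2 = 6
```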

preprint · 2014 · arXiv

A survey on performance analysis of warehouse carousel systems

This paper gives an overview of recent research on the performance evaluation and design of carousel systems. We discuss picking strategies for problems involving one carousel, consider the throughput of the system for problems involving two carousels, give an overview of related problems in this area, and present an extensive literature review. Emphasis is placed on future research directions in this area.

preprint · 2014 · arXiv

Convergence of rank based degree-degree correlations in random directed networks

We introduce and analyze three measures for degree-degree dependencies, also called degree assortativity, in directed random graphs, based on Spearman's rho and Kendall's tau. We prove statistical consistency of these measures in general random graphs and show that the directed configuration model can serve as a null model for our degree-degree dependency measures. Based on these results, we argue that the measures we introduce should be preferred over Pearson's correlation coefficients when studying degree-degree dependencies, since the latter have several issues in the case of large networks with scale-free degree distributions.
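
As an illustration of a rank-based degree-degree measure for directed graphs, the sketch below computes Spearman's rho between the out-degree of each edge's source and the in-degree of its target, using average ranks for ties. This is only one of the possible source/target degree pairings and not necessarily the exact estimator analyzed in the paper; the function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined, raises, if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_out_in(edges):
    """Spearman's rho of (out-degree of source, in-degree of target) over all edges."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    x = [out_deg[u] for u, _ in edges]
    y = [in_deg[v] for _, v in edges]
    # Spearman's rho is Pearson's correlation applied to the ranks.
    return pearson(average_ranks(x), average_ranks(y))


edges = [(0, 1), (0, 2), (0, 3), (4, 1), (5, 1)]
print(spearman_out_in(edges))  # -0.666...
```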

preprint · 2014 · arXiv

Degree-degree dependencies in directed networks with heavy-tailed degrees

In network theory, Pearson's correlation coefficients are most commonly used to measure the degree assortativity of a network. We investigate the behavior of these coefficients in the setting of directed networks with heavy-tailed degree sequences. We prove that for graphs where the in- and out-degree sequences satisfy a power law with realistic parameters, Pearson's correlation coefficients converge to a non-negative number in the infinite network size limit. We propose alternative measures for degree-degree dependencies in directed networks based on Spearman's rho and Kendall's tau. Using examples and calculations on the Wikipedia graphs for nine different languages, we show why these rank correlation measures are more suited for measuring degree assortativity in directed graphs with heavy-tailed degrees.
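
Kendall's tau, the other rank measure named above, compares the ordering of every pair of observations. The sketch below is the textbook O(n²) tau-b with the usual tie correction, applied to two paired lists such as source and target degrees of edges; it is not taken from the paper, and the tie handling used there may differ.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for paired samples (naive O(n^2) version).

    Undefined (raises ZeroDivisionError) if either list is constant.
    """
    n = len(x)
    n0 = n * (n - 1) // 2  # total number of pairs
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # pairs tied in x (including those tied in both)
            if dy == 0:
                ties_y += 1  # pairs tied in y (including those tied in both)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))


print(kendall_tau_b([1, 2, 3], [1, 2, 3]))  # 1.0
print(kendall_tau_b([3, 3, 3, 1, 1], [3, 1, 1, 3, 3]))  # -0.666...
```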

preprint · 2014 · arXiv

PageRank in scale-free random graphs

We analyze the distribution of PageRank on a directed configuration model and show that as the size of the graph grows to infinity it can be closely approximated by the PageRank of the root node of an appropriately constructed tree. This tree approximation is in turn related to the solution of a linear stochastic fixed point equation that has been thoroughly studied in the recent literature.
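
For reference, the PageRank vector itself can be computed on any finite directed graph by power iteration. The sketch assumes the common damping factor 0.85 and spreads the mass of dangling nodes (no out-links) uniformly; these conventions and the function name are assumptions, and the paper studies the distribution of PageRank on random graphs rather than this computation.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.

    out_links -- dict node -> list of successors (every successor must be a key)
    Returns a dict node -> score; scores sum to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


print(pagerank({0: [1], 1: [2], 2: [0]}))  # all three scores close to 1/3
```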

preprint · 2014 · arXiv

Quick Detection of High-degree Entities in Large Directed Networks

In this paper, we address the problem of quick detection of high-degree entities in large online social networks. The practical importance of this problem is attested by the large number of companies that continuously collect and update statistics about popular entities, usually using the degree of an entity as an approximation of its popularity. We suggest a simple, efficient, and easy-to-implement two-stage randomized algorithm that provides highly accurate solutions for this problem. For instance, our algorithm needs only one thousand API requests to find the top-100 most followed users in Twitter, a network with approximately a billion registered users, with more than 90% precision. Our algorithm significantly outperforms existing methods and serves many different purposes, such as finding the most popular users or the most popular interest groups in social networks. An important contribution of this work is the analysis of the proposed algorithm using Extreme Value Theory, a branch of probability that studies extreme events and properties of the largest order statistics in random samples. Using this theory, we derive an accurate prediction for the algorithm's performance and show that the number of API requests for finding the top-k most popular entities is sublinear in the number of entities. Moreover, we formally show that the high variability among the entities, expressed through heavy-tailed distributions, is the reason for the algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical way.
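
A two-stage search of this kind can be sketched as follows, under explicit assumptions: the network is a hypothetical in-memory `dict` mapping each user to the users they follow, a precomputed follower count stands in for the API call that reports it, and the parameter names `n1` (random users inspected in stage one) and `n2` (candidates checked in stage two) are illustrative. Stage one favors high in-degree users because they show up in many followee lists; stage two looks up exact counts only for the most frequently seen candidates.

```python
import random
from collections import Counter


def top_k_followed(follows, k, n1, n2, seed=None):
    """Two-stage randomized search for the k most-followed users (a sketch).

    follows -- dict user -> list of users they follow
    n1      -- number of random users whose followee lists are inspected
    n2      -- number of top candidates whose exact follower count is checked
    """
    rng = random.Random(seed)
    users = list(follows)
    # Stand-in for an API request returning a user's follower count.
    follower_count = Counter(v for targets in follows.values() for v in targets)

    # Stage 1: inspect n1 random users and count how often each followee appears.
    seen = Counter()
    for u in rng.sample(users, min(n1, len(users))):
        seen.update(follows[u])
    candidates = [v for v, _ in seen.most_common(n2)]

    # Stage 2: rank the n2 candidates by their exact follower counts.
    candidates.sort(key=lambda v: follower_count[v], reverse=True)
    return candidates[:k]


follows = {0: [], 1: [0], 2: [0], 3: [0, 1], 4: [0, 1, 2]}
print(top_k_followed(follows, k=2, n1=5, n2=3, seed=0))  # [0, 1]
```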

preprint · 2014 · arXiv

Ranking algorithms on directed configuration networks

This paper studies the distribution of a family of rankings, which includes Google's PageRank, on a directed configuration model. In particular, it is shown that the rank of a randomly chosen node in the graph converges in distribution to a finite random variable $\mathcal{R}^*$ that can be written as a linear combination of i.i.d. copies of the endogenous solution to a stochastic fixed point equation of the form $$\mathcal{R} \stackrel{\mathcal{D}}{=} \sum_{i=1}^{\mathcal{N}} \mathcal{C}_i \mathcal{R}_i + \mathcal{Q},$$ where $(\mathcal{Q}, \mathcal{N}, \{ \mathcal{C}_i\})$ is a real-valued vector with $\mathcal{N} \in \{0,1,2,\dots\}$, $P(|\mathcal{Q}| > 0) > 0$, and the $\{\mathcal{R}_i\}$ are i.i.d. copies of $\mathcal{R}$, independent of $(\mathcal{Q}, \mathcal{N}, \{ \mathcal{C}_i\})$. Moreover, we provide precise asymptotics for the limit $\mathcal{R}^*$, which, when the in-degree distribution in the directed configuration model follows a power law, imply a power-law distribution for $\mathcal{R}^*$ with the same exponent.
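
The endogenous solution of a stochastic fixed point equation of this form can be approximated numerically by population dynamics: keep a pool of samples and repeatedly replace each by $\sum_{i=1}^{\mathcal{N}} \mathcal{C}_i \mathcal{R}_i + \mathcal{Q}$, with the $\mathcal{R}_i$ drawn from the current pool. This is a generic numerical sketch, not a method from the paper; the user-supplied sampler `draw` returning $(\mathcal{Q}, \{\mathcal{C}_i\})$ is hypothetical, and convergence is assumed rather than checked.

```python
import random


def sfpe_population(draw, pool_size=10_000, sweeps=50, seed=None):
    """Approximate samples of R solving R =_d sum_{i<=N} C_i R_i + Q.

    draw -- function(rng) -> (Q, C), with C the list of weights (so N = len(C))
    Returns the final pool, an approximate sample from the solution's law.
    """
    rng = random.Random(seed)
    # Starting from zero and iterating the map mirrors truncating the weighted
    # branching tree at growing depth, whose limit is the endogenous solution.
    pool = [0.0] * pool_size
    for _ in range(sweeps):
        new_pool = []
        for _ in range(pool_size):
            q, weights = draw(rng)
            # Each R_i is an independent draw from the current pool.
            new_pool.append(sum(c * rng.choice(pool) for c in weights) + q)
        pool = new_pool
    return pool


# Deterministic check: N = 1, C = 0.5, Q = 1 gives R = 0.5 R + 1, so R = 2.
pool = sfpe_population(lambda rng: (1.0, [0.5]), pool_size=100, sweeps=60, seed=0)
print(pool[0])  # very close to 2.0
```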

preprint · 2013 · arXiv

Alpha current flow betweenness centrality

A class of centrality measures called betweenness centralities reflects the degree of participation of edges or nodes in communication between different parts of the network. The original shortest-path betweenness centrality is based on counting the shortest paths that go through a node or an edge. One of the shortcomings of the shortest-path betweenness centrality is that it ignores paths that might be one or two steps longer than the shortest paths, while the edges on such paths can be important for communication processes in the network. To rectify this shortcoming, a current flow betweenness centrality has been proposed. Similarly to the shortest-path betweenness, it has prohibitive complexity for large networks. In the present work, we propose two regularizations of the current flow betweenness centrality, α-current flow betweenness and truncated α-current flow betweenness, which can be computed fast and correlate well with the original current flow betweenness.

preprint · 2013 · arXiv

Uncovering disassortativity in large scale-free networks

Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, and social and biological networks, are often characterized by degree-degree dependencies between neighbouring nodes. In this paper, we propose a new way of measuring degree-degree dependencies. One of the problems with the commonly used assortativity coefficient is that in disassortative networks its magnitude decreases with the network size. We mathematically explain this phenomenon and validate the results on synthetic graphs and real-world network data. As an alternative, we suggest using rank correlation measures such as Spearman's rho. Our experiments convincingly show that Spearman's rho produces consistent values in graphs of different sizes but similar structure, and that it is able to reveal strong (positive or negative) dependencies in large graphs. In particular, we discover much stronger negative degree-degree dependencies in Web graphs than was previously thought. Rank correlations allow us to compare the assortativity of networks of different sizes, which is impossible with the assortativity coefficient due to its inherent dependence on the network size. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns.
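
In an undirected graph, the assortativity coefficient is Pearson's correlation between the degrees at the two ends of an edge, with each edge counted in both directions; Spearman's rho applies the same correlation to the ranks of those degrees. The sketch below computes both for an edge list; it follows these standard definitions, and the function names are hypothetical.

```python
from collections import Counter


def _ranks(values):
    """1-based average ranks; ties share the mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def degree_correlations(edges):
    """Return (assortativity coefficient, Spearman's rho) of an undirected simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Each edge contributes both (deg u, deg v) and (deg v, deg u).
    x = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    y = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    return _pearson(x, y), _pearson(_ranks(x), _ranks(y))


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_correlations(star))  # (-1.0, -1.0): a star is perfectly disassortative
```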

preprint · 2012 · arXiv

A likelihood-based framework for the analysis of discussion threads

Online discussion threads are conversational cascades in the form of posted messages that can generally be found in social systems that comprise many-to-many interaction, such as blogs, news aggregators, or bulletin board systems. We propose a framework based on generative models of growing trees to analyse the structure and evolution of discussion threads. We consider the growth of a discussion to be determined by an interplay between popularity, novelty, and a trend (or bias) to reply to the thread originator. The relevance of these features is estimated using a full likelihood approach, which allows us to characterize the habits and communication patterns of a given platform and/or community.
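
A generative model of this kind assigns each new post a probability of replying to each existing post, and the likelihood of an observed thread is the product of these probabilities. The sketch below assumes one concrete attractiveness function combining the three features named above: popularity as `alpha` times the number of replies a post has received, a root bias `beta` added only for the thread originator, and novelty `tau ** age` decaying with the post's age. This functional form and all parameter names are assumptions for illustration, not necessarily the paper's exact specification.

```python
from math import log


def thread_log_likelihood(parents, alpha, beta, tau):
    """Log-likelihood of a thread under a hypothetical growing-tree model.

    parents -- parents[t - 1] is the post that post t replies to (post 0 is the root)
    alpha   -- weight of popularity (replies received so far)
    beta    -- extra attractiveness of the root post
    tau     -- novelty decay in (0, 1]; recent posts are more attractive
    """
    replies = [0]  # replies received so far by each existing post
    loglik = 0.0
    for t, parent in enumerate(parents, start=1):
        # Attractiveness of each existing post k = 0..t-1 for the new post t.
        scores = [
            alpha * replies[k] + (beta if k == 0 else 0.0) + tau ** (t - k)
            for k in range(t)
        ]
        loglik += log(scores[parent] / sum(scores))
        replies[parent] += 1
        replies.append(0)  # the new post can receive replies from now on
    return loglik


# Two replies to the root; the parameters could be fitted by maximizing this value.
print(thread_log_likelihood([0, 0], alpha=1.0, beta=1.0, tau=0.5))
```

A crude grid search over `(alpha, beta, tau)` that maximizes this log-likelihood across many threads would be the simplest way to estimate the parameters under these assumptions.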

preprint · 2012 · arXiv

A scaling analysis of a cat and mouse Markov chain

If $(C_n)$ is a Markov chain on a discrete state space ${\mathcal{S}}$, a Markov chain $(C_n,M_n)$ on the product space ${\mathcal{S}}\times{\mathcal{S}}$, the cat and mouse Markov chain, is constructed. The first coordinate of this Markov chain behaves like the original Markov chain, and the second component changes only when both coordinates are equal. The asymptotic properties of this Markov chain are investigated. In particular, a representation of its invariant measure is obtained. When the state space is infinite, it is shown that this Markov chain is in fact null recurrent if the initial Markov chain $(C_n)$ is positive recurrent and reversible. In this context, the scaling properties of the location of the second component, the mouse, are investigated in various situations: simple random walks in ${\mathbb{Z}}$ and ${\mathbb{Z}}^2$, a reflected simple random walk in ${\mathbb{N}}$, and also a continuous-time setting. For several of these processes, a time scaling with rapid growth gives an interesting asymptotic behavior related to limiting results for occupation times and rare events of Markov processes.
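
The cat and mouse construction is easy to simulate. In this sketch (an illustration, not code from the paper), the cat performs a simple random walk on ${\mathbb{Z}}$, and the mouse makes an independent step of the same walk only at times when the two coordinates coincide; the function name, the starting point $(0, 0)$, and the choice of walk are assumptions.

```python
import random


def cat_and_mouse(steps, seed=None):
    """Simulate the cat and mouse chain for a simple random walk on the integers.

    Returns the list of (cat, mouse) positions, starting from (0, 0).
    """
    rng = random.Random(seed)
    cat = mouse = 0
    path = [(cat, mouse)]
    for _ in range(steps):
        meet = cat == mouse
        cat += rng.choice((-1, 1))  # the cat always moves
        if meet:
            mouse += rng.choice((-1, 1))  # independent step, only when they meet
        path.append((cat, mouse))
    return path


path = cat_and_mouse(10_000, seed=1)
print(path[-1])  # final (cat, mouse) positions; the mouse moves far less often
```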

preprint · 2012 · arXiv

Quick Detection of Nodes with Large Degrees

Our goal is to quickly find top-$k$ lists of nodes with the largest degrees in large complex networks. If the adjacency list of the network is known (not often the case in complex networks), a deterministic algorithm to find a node with the largest degree requires an average complexity of $O(n)$, where $n$ is the number of nodes in the network. Even this modest complexity can be very high for large complex networks. We propose a random-walk-based method. We show theoretically and by numerical experiments that, for large networks, the random walk method finds good-quality top lists of nodes with high probability and with computational savings of orders of magnitude. We also propose stopping criteria for the random walk method that require very little knowledge about the structure of the network.
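
A random-walk search for high-degree nodes can be sketched as follows: walk on the graph, occasionally jumping to a uniformly random node, and keep the k highest-degree nodes seen. Because a simple random walk visits nodes roughly in proportion to their degree, hubs tend to be found quickly. The jump probability, the fixed walk length used in place of a stopping criterion, and the function name are assumptions; the stopping rules proposed in the paper are not reproduced here.

```python
import heapq
import random


def random_walk_top_k(adj, k, steps, jump=0.15, seed=None):
    """Return up to k visited nodes with the largest degrees.

    adj   -- dict node -> list of neighbours (undirected graph)
    steps -- walk length, used here instead of a stopping criterion
    jump  -- probability of restarting at a uniformly random node
    """
    rng = random.Random(seed)
    nodes = list(adj)
    current = rng.choice(nodes)
    visited = {current}
    for _ in range(steps):
        if rng.random() < jump or not adj[current]:
            current = rng.choice(nodes)  # uniform jump (also escapes isolated nodes)
        else:
            current = rng.choice(adj[current])  # move to a random neighbour
        visited.add(current)
    # Only the degrees of visited nodes are ever inspected.
    return heapq.nlargest(k, visited, key=lambda v: len(adj[v]))


star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
print(random_walk_top_k(star, 1, 200, seed=0))  # [0], the hub
```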

preprint · 2010 · arXiv

Monte Carlo Methods for Top-k Personalized PageRank Lists and Name Disambiguation

We study the problem of quick detection of top-k Personalized PageRank lists. This problem has a number of important applications, such as finding local cuts in large graphs, estimation of similarity distance, and name disambiguation. In particular, we apply our results to construct efficient algorithms for the person name disambiguation problem. We argue that when finding top-k Personalized PageRank lists, two observations are important. Firstly, it is crucial that we quickly detect the top-k most important neighbours of a node, while the exact order in the top-k list as well as the exact values of PageRank are far less crucial. Secondly, a small number of wrong elements in top-k lists does not really degrade the quality of the lists, but tolerating them can lead to significant computational savings. Based on these two key observations, we propose Monte Carlo methods for fast detection of top-k Personalized PageRank lists. We provide a performance evaluation of the proposed methods and supply stopping criteria. Then, we apply the methods to the person name disambiguation problem. The developed algorithm for the person name disambiguation problem achieved second place in the WePS 2010 competition.
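
The Monte Carlo idea can be illustrated with an end-point estimator: run many random walks from the seed node, stop each walk at every step with probability 1 - c, and use the fraction of walks ending at each node as its Personalized PageRank estimate. This is one standard Monte Carlo variant chosen for illustration; the parameter values, the rule that a dangling node ends the walk, and the function name are assumptions, and the paper's own estimators and stopping criteria may differ.

```python
import random
from collections import Counter


def mc_personalized_pagerank_top_k(out_links, seed_node, k, walks=20_000, c=0.85, seed=None):
    """Estimate the top-k Personalized PageRank nodes for seed_node by random walks.

    out_links -- dict node -> list of successors
    c         -- probability of continuing the walk at each step
    Returns a list of (node, estimated PPR) pairs, highest first.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(walks):
        node = seed_node
        # Continue with probability c; a dangling node ends the walk (an assumption).
        while rng.random() < c and out_links[node]:
            node = rng.choice(out_links[node])
        ends[node] += 1
    return [(v, count / walks) for v, count in ends.most_common(k)]


# On the 2-cycle 0 <-> 1, a walk ends at 0 with probability 1 / (1 + c), about 0.54.
print(mc_personalized_pagerank_top_k({0: [1], 1: [0]}, 0, k=2, seed=42))
```

Only the membership of the top-k list matters here, so far fewer walks are needed than for accurate PageRank values, which is the observation the abstract builds on.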
