Source author record

Egor Samosvat

Egor Samosvat appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.CO, math.PR, Information Retrieval, Social and Information Networks, Discrete Mathematics, physics.soc-ph

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint, 2016, arXiv

Factorization threshold models for scale-free networks generation

Many real networks such as the World Wide Web, financial, biological, citation and social networks have a power-law degree distribution. Networks with this feature are also called scale-free. Several models for producing scale-free networks have been proposed, and most of them are based on the preferential attachment approach. We offer a model with a different explanation of the scale-free property. The main idea is to approximate the network's adjacency matrix by the product of the matrices $V$ and $V^T$, where $V$ is the matrix of the vertices' latent features. This approach is called matrix factorization and is used successfully in the link prediction problem. To create a generative model of scale-free networks, we sample latent features $V$ from some probability distribution and try to generate a network's adjacency matrix. Entries in the generated matrix are dot products of latent features, which are real numbers. To obtain an adjacency matrix, we map these entries to the Boolean domain $\{0, 1\}$: we incorporate a threshold parameter $\theta$ into the model to discretize each dot product. We were influenced by the geographical threshold models, which were recently shown to perform well in generating scale-free networks. The overview of our results is as follows. First, we describe our model formally. Second, we tune the threshold $\theta$ in order to generate sparse growing networks. Finally, we show that our model produces scale-free networks with a fixed power-law exponent equal to two. To generate oriented networks with tunable power-law exponents and to obtain other model properties, we offer different modifications of our model. Some of our results are demonstrated using computer simulation.
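The generation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `threshold_graph`, the standard normal feature distribution, and the rule "edge when the dot product is at least the threshold" are all assumptions, since the abstract only says that features come from "some" distribution and that a threshold discretizes the dot product.

```python
import random


def threshold_graph(n, d, theta, seed=0):
    """Build a symmetric 0/1 adjacency matrix from thresholded dot products.

    Hypothetical sketch: latent features are i.i.d. standard normal
    (an assumption, not taken from the paper).
    """
    rng = random.Random(seed)
    # V has one row of d latent features per vertex.
    V = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Entry (i, j) of V V^T is the dot product of rows i and j.
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            # Assumed discretization: connect when the dot product reaches theta.
            if dot >= theta:
                A[i][j] = A[j][i] = 1
    return A


A = threshold_graph(n=50, d=3, theta=2.0)
degrees = [sum(row) for row in A]
print(sorted(degrees, reverse=True)[:5])
```

With the seed held fixed, raising `theta` can only remove edges, which is the lever the abstract refers to when it talks about tuning the threshold to keep the network sparse.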

preprint, 2016, arXiv

Generating maximally disassortative graphs with given degree distribution

In this paper we consider the optimization problem of generating graphs with a prescribed degree distribution, such that the correlation between the degrees of connected nodes, as measured by Spearman's rho, is minimal. We provide an algorithm for solving this problem and obtain a complete characterization of the joint degree distribution in these maximally disassortative graphs, in terms of the size-biased degree distribution. As a result we get a lower bound for Spearman's rho on graphs with an arbitrary given degree distribution. We use this lower bound to show that for any fixed tail exponent, there exist scale-free degree sequences with this exponent such that the minimum value of Spearman's rho for all graphs with such degree sequences is arbitrarily close to zero. This implies that specifying only the tail behavior of the degree distribution, as is often done in the analysis of complex networks, gives no guarantees for the minimum value of Spearman's rho.
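To make the quantity being minimized concrete, here is a small sketch of Spearman's rho computed over the degrees at the two ends of every edge. It is an illustration under stated assumptions, not the paper's algorithm: the helper names are hypothetical, the graph is assumed simple and undirected, each edge is counted in both directions, and ties are resolved with average ranks, which may differ from the convention used in the paper.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def degree_spearman(edges):
    """Spearman's rho between the degrees at the two ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # Count every undirected edge in both directions.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean = (len(rx) + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    # Regular graphs have no rank variance, so rho is undefined there.
    return cov / var if var else math.nan


star = [(0, 1), (0, 2), (0, 3)]
print(degree_spearman(star))
```

A star is the extreme disassortative case, since every edge joins the highest-degree vertex to a degree-one leaf, so the value for `star` sits at the lower end of the range.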

preprint, 2015, arXiv

Generalized preferential attachment: tunable power-law degree distribution and clustering coefficient

We propose a wide class of preferential attachment models of random graphs, generalizing previous approaches. Graphs described by these models obey a power-law degree distribution, with an exponent that can be controlled in the models. Moreover, the clustering coefficient of these graphs can also be controlled. We propose a concrete flexible model from our class and provide an efficient algorithm for generating graphs in this model. All our theoretical results are demonstrated in practice on examples of graphs obtained using this algorithm. Moreover, observations of generated graphs lead to future questions and hypotheses not yet justified by theory.

preprint, 2015, arXiv

Global clustering coefficient in scale-free networks

In this paper, we analyze the behavior of the global clustering coefficient in scale-free graphs. We are especially interested in the case of a degree distribution with an infinite variance, since such a degree distribution is usually observed in real-world networks of diverse nature. There are two common definitions of the clustering coefficient of a graph: global clustering and average local clustering. It is widely believed that in real networks both clustering coefficients tend to some positive constant as the networks grow. There are several models for which the average local clustering coefficient tends to a positive constant. On the other hand, there are no models of scale-free networks with an infinite variance of degree distribution and with a constant global clustering. In this paper we prove that if the degree distribution obeys the power law with an infinite variance, then the global clustering coefficient tends to zero with high probability as the size of a graph grows.
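The two definitions contrasted in this abstract can be computed side by side. The sketch below uses the standard formulas (global clustering as three times the number of triangles divided by the number of connected triples, and average local clustering as the mean over vertices of the fraction of neighbour pairs that are linked). The function name `clustering`, the dict-of-sets graph representation, and the convention that vertices of degree below two contribute zero are assumptions made for illustration.

```python
from itertools import combinations


def clustering(adj):
    """Return (global, average_local) clustering of an undirected simple graph.

    adj maps each vertex to the set of its neighbours.
    """
    closed = 0  # closed wedges; each triangle is counted once per vertex
    wedges = 0  # pairs of neighbours around each vertex
    local = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        pairs = k * (k - 1) // 2
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed += links
        wedges += pairs
        # Assumed convention: vertices with fewer than two neighbours count as 0.
        local.append(links / pairs if pairs else 0.0)
    global_c = closed / wedges if wedges else 0.0
    return global_c, sum(local) / len(local)


# A triangle 0-1-2 with one pendant vertex 3 attached to vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering(adj))
```

In this example the graph has one triangle and five connected triples, so the global value is 3/5, while the local values 1/3, 1, 1 and 0 average to 7/12. The two numbers already differ on four vertices, because the global coefficient weights each vertex by its number of neighbour pairs and is therefore dominated by high-degree vertices.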

preprint, 2015, arXiv

Recency-based preferential attachment models

Preferential attachment models were shown to be very effective in predicting such important properties of real-world networks as the power-law degree distribution, small diameter, etc. Many different models are based on the idea of preferential attachment: LCD, Buckley-Osthus, Holme-Kim, fitness, random Apollonian network, and many others. Although preferential attachment models reflect some important properties of real-world networks, they do not allow modeling of the so-called recency property. The recency property reflects the fact that in many real networks vertices tend to connect to other vertices of similar age. This fact motivated us to introduce a new class of models: recency-based models. This class is a generalization of fitness models, which were suggested by Bianconi and Barabasi. Bianconi and Barabasi extended preferential attachment models with the inherent quality of pages, or fitness of vertices. When a new vertex is added to the graph, it is joined to some already existing vertices that are chosen with probabilities proportional to the product of their fitness and incoming degree. We generalize fitness models by adding a recency factor to the attractiveness function. This means that pages gain incoming links according to their attractiveness, which is determined by the incoming degree of the page (current popularity), its inherent quality (some page-specific constant) and age (new pages gain new links more rapidly). We analyze different properties of recency-based models. In particular, we show that some distributions of inherent quality lead to the power-law degree distribution.
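The attachment rule described above can be sketched as a short simulation. This is only an illustration under several assumptions that the abstract does not fix: the recency factor is taken to be an exponential decay `exp(-age / tau)`, one is added to the in-degree so that pages with no links can still be chosen, qualities are drawn uniformly from [0.1, 1.0], and targets are sampled with replacement (so repeated links are possible). The names `attractiveness` and `grow` are hypothetical.

```python
import math
import random


def attractiveness(quality, indegree, age, tau=10.0):
    """Product of quality, popularity and recency (assumed functional form)."""
    # Assumptions: +1 offset on in-degree, exponential decay with time scale tau.
    return quality * (indegree + 1) * math.exp(-age / tau)


def grow(n, m=1, tau=10.0, seed=0):
    """Add n vertices one at a time; each new vertex links to m older ones."""
    rng = random.Random(seed)
    quality, indegree, edges = [], [], []
    for t in range(n):
        if t > 0:
            weights = [
                attractiveness(quality[i], indegree[i], t - i, tau)
                for i in range(t)
            ]
            # Sample targets with probability proportional to attractiveness.
            for target in rng.choices(range(t), weights=weights, k=m):
                indegree[target] += 1
                edges.append((t, target))
        # Assumed quality distribution: uniform on [0.1, 1.0].
        quality.append(rng.uniform(0.1, 1.0))
        indegree.append(0)
    return edges, indegree


edges, indegree = grow(n=200, m=2)
mean_age_gap = sum(src - dst for src, dst in edges) / len(edges)
print(len(edges), max(indegree), round(mean_age_gap, 2))
```

Making `tau` very large effectively removes the recency factor, so the sketch falls back to a fitness-style rule, which matches the abstract's description of recency-based models as a generalization of fitness models.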

preprint, 2013, arXiv

Evolution of the Media Web

We present a detailed study of the part of the Web related to media content, i.e., the Media Web. Using publicly available data, we analyze the evolution of incoming and outgoing links from and to media pages. Based on our observations, we propose a new class of models for the appearance of new media content on the Web where different attractiveness functions of nodes are possible, including ones taken from well-known preferential attachment and fitness models. We analyze these models theoretically and empirically and show which ones realistically predict both the incoming degree distribution and the so-called recency property of the Media Web, something that existing models did not do well. Finally, we compare these models by estimating the likelihood of the real-world link graph from our data set given each model, and find that the models we introduce are significantly more likely than previously proposed ones. One of the most surprising results is that in the Media Web the probability for a post to be cited is determined, most likely, by its quality rather than by its current popularity.

preprint, 2013, arXiv

Timely crawling of high-quality ephemeral new content

Nowadays, more and more people use the Web as their primary source of up-to-date information. In this context, fast crawling and indexing of newly created Web pages has become crucial for search engines, especially because user traffic to a significant fraction of these new pages (like news, blog and forum posts) grows very quickly right after they appear, but lasts only for several days. In this paper, we study the problem of timely finding and crawling of such ephemeral new pages (in terms of user interest). Traditional crawling policies do not give any particular priority to such pages and may thus not crawl them quickly enough, or may even crawl already obsolete content. We thus propose a new metric, designed for this task, which takes into account the decrease of user interest for ephemeral pages over time. We show that most ephemeral new pages can be found at a relatively small set of content sources and present a procedure for finding such a set. Our idea is to periodically recrawl content sources and crawl newly created pages linked from them, focusing on high-quality (in terms of user interest) content. One of the main difficulties here is to divide resources between these two activities in an efficient way. We find the adaptive balance between crawls and recrawls by maximizing the proposed metric. Further, we incorporate search engine click logs to give our crawler insight into current user demands. The efficiency of our approach is finally demonstrated experimentally on real-world data.
