Source author record

Vincent D. Blondel

Vincent D. Blondel appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


physics.soc-ph; Social and Information Networks; cs.CY; Computational Complexity; Information Theory; math.IT; physics.data-an; Computation; Data Structures and Algorithms; math.CO; math.DS; physics.med-ph; Populations and Evolution

Catalog footprint

What is connected

20 works

13 topics

4 close collaborators

preprint, 2016, arXiv

Clean up or mess up: the effect of sampling biases on measurements of degree distributions in mobile phone datasets

Mobile phone data have been extensively used in recent years to study social behavior. However, most of these studies are based on only partial data whose coverage is limited both in space and time. In this paper, we point out that the bias due to the limited coverage in time may have an important influence on the results of the analyses performed. In particular, we observe significant differences, both qualitative and quantitative, in the degree distribution of the network, depending on the way the dataset is pre-processed, and we present a possible explanation for the emergence of Double Pareto LogNormal (DPLN) degree distributions in temporal data.

preprint, 2016, arXiv

Modelling influence and opinion evolution in online collective behaviour

Opinion evolution and judgment revision are mediated through social influence. Based on a large crowdsourced in vitro experiment (n=861), it is shown how a consensus model can be used to predict opinion evolution in online collective behaviour. It is the first time the predictive power of a quantitative model of opinion dynamics is tested against a real dataset. Unlike previous research on the topic, the model was validated on data which did not serve to calibrate it. This avoids favoring more complex models over simpler ones and prevents overfitting. The model is parametrized by the influenceability of each individual, a factor representing to what extent individuals incorporate external judgments. The prediction accuracy depends on prior knowledge of the participants' past behaviour. Several situations reflecting data availability are compared. When data are scarce, the data from previous participants are used to predict how a new participant will behave. Judgment revision includes unpredictable variations which limit the potential for prediction. A first measure of unpredictability is proposed. The measure is based on a specific control experiment. More than two thirds of the prediction errors are found to be due to the unpredictability of the human judgment revision process rather than to model imperfection.
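The abstract does not give the model's equations, so the following is only a minimal sketch of one common consensus-style update, under the assumption that a participant moves a fraction of the way towards the mean of the judgments shown to them. The function names `revise` and `estimate_influenceability`, and the least-squares calibration, are hypothetical illustrations rather than the paper's method.

```python
def revise(own, others, influenceability):
    """Assumed update rule: move a fraction of the way towards the mean of others."""
    mean_others = sum(others) / len(others)
    return own + influenceability * (mean_others - own)


def estimate_influenceability(history):
    """Least-squares fit of the influenceability from past revisions.

    history: list of (own_before, mean_of_others, own_after) triples.
    Returns 0.0 when the participant never saw a differing judgment.
    """
    num = sum((m - x) * (y - x) for x, m, y in history)
    den = sum((m - x) ** 2 for x, m, _ in history)
    return num / den if den else 0.0


# Calibrate on earlier rounds, then predict a round that was not used for calibration.
alpha = estimate_influenceability([(10.0, 20.0, 15.0), (0.0, 10.0, 5.0)])
predicted = revise(10.0, [20.0, 30.0], alpha)
```

Keeping the calibration rounds separate from the predicted round mirrors the validation setup described in the abstract.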

preprint, 2015, arXiv

A survey of results on mobile phone datasets analysis

In this paper, we review some advances made recently in the study of mobile phone datasets. This area of research emerged a decade ago, with the increasing availability of large-scale anonymized datasets, and has grown into a stand-alone topic. We survey the contributions made so far on the social networks that can be constructed with such data, the study of personal mobility, geographical partitioning, urban planning, and help towards development, as well as security and privacy issues.

preprint, 2015, arXiv

On Primitivity of Sets of Matrices

A nonnegative matrix $A$ is called primitive if $A^k$ is positive for some integer $k>0$. A generalization of this concept to finite sets of matrices is as follows: a set of matrices $\mathcal M = \{A_1, A_2, \ldots, A_m \}$ is primitive if $A_{i_1} A_{i_2} \cdots A_{i_k}$ is positive for some indices $i_1, i_2, \ldots, i_k$. The concept of primitive sets of matrices comes up in a number of problems within the study of discrete-time switched systems. In this paper, we analyze the computational complexity of deciding if a given set of matrices is primitive and we derive bounds on the length of the shortest positive product. We show that while primitivity is algorithmically decidable, unless $P=NP$ it is not possible to decide primitivity of a matrix set in polynomial time. Moreover, we show that the length of the shortest positive sequence can be superpolynomial in the dimension of the matrices. On the other hand, defining ${\mathcal P}$ to be the set of matrices with no zero rows or columns, we give a simple combinatorial proof of a previously known characterization of primitivity for matrices in ${\mathcal P}$ which can be tested in polynomial time. This latter observation is related to the well-known 1964 conjecture of Černý on synchronizing automata; in fact, any bound on the minimal length of a synchronizing word for synchronizing automata immediately translates into a bound on the length of the shortest positive product of a primitive set of matrices in ${\mathcal P}$. In particular, any primitive set of $n \times n$ matrices in ${\mathcal P}$ has a positive product of length $O(n^3)$.
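The definition above can be checked directly on small examples. The sketch below is a brute-force illustration, not the paper's algorithm: it relies on the fact that whether a product of nonnegative matrices is positive depends only on the zero/nonzero patterns of the factors, and it explores those patterns breadth-first. The function names are hypothetical, and the search space can grow exponentially with the matrix dimension, which is consistent with the hardness result stated in the abstract.

```python
from collections import deque


def sign_pattern(matrix):
    """Zero/nonzero pattern of a nonnegative square matrix, as nested tuples of bools."""
    return tuple(tuple(entry > 0 for entry in row) for row in matrix)


def bool_product(p, q):
    """Boolean matrix product: entry (i, j) is True if p[i][k] and q[k][j] for some k."""
    n = len(p)
    return tuple(
        tuple(any(p[i][k] and q[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def shortest_positive_product(matrices):
    """Length of the shortest entrywise-positive product, or None if the set is not primitive."""
    patterns = [sign_pattern(m) for m in matrices]
    seen = set(patterns)
    queue = deque((p, 1) for p in seen)
    while queue:
        current, length = queue.popleft()
        if all(all(row) for row in current):
            return length
        for p in patterns:
            nxt = bool_product(current, p)
            if nxt not in seen:  # each pattern is expanded once, at its shortest length
                seen.add(nxt)
                queue.append((nxt, length + 1))
    return None


# A permutation matrix alone is not primitive, but pairing it with a triangular
# matrix yields a primitive set whose shortest positive product has length 3.
swap = [[0, 1], [1, 0]]
upper = [[1, 1], [0, 1]]
length = shortest_positive_product([swap, upper])
```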

preprint, 2015, arXiv

Sensitivity analysis of a branching process evolving on a network with application in epidemiology

We perform an analytical sensitivity analysis for a model of a continuous-time branching process evolving on a fixed network. This allows us to determine the relative importance of the model parameters to the growth of the population on the network. We then apply our results to the early stages of an influenza-like epidemic spreading among a set of cities connected by air routes in the United States. We also consider vaccination and analyze the sensitivity of the total size of the epidemic with respect to the fraction of vaccinated people. Our analysis shows that the epidemic growth is more sensitive with respect to transmission rates within cities than travel rates between cities. More generally, we highlight the fact that branching processes offer a powerful stochastic modeling tool with analytical formulas for sensitivity which are easy to use in practice.

preprint, 2014, arXiv

Career on the Move: Geography, Stratification, and Scientific Impact

Changing institutions is an integral part of academic life. Yet little is known about the mobility patterns of scientists at an institutional level and how these career choices affect scientific outcomes. Here, we examine over 420,000 papers to track the affiliation information of individual scientists, allowing us to reconstruct their career trajectories over decades. We find that career movements are not only temporally and spatially localized, but also characterized by a high degree of stratification in institutional ranking. When cross-group movement occurs, we find that while going from elite to lower-rank institutions is on average associated with a modest decrease in scientific performance, transitioning into elite institutions does not result in a subsequent performance gain. These results offer empirical evidence on institutional-level career choices and movements and have potential implications for science policy.

preprint, 2014, arXiv

D4D-Senegal: The Second Mobile Phone Data for Development Challenge

The D4D-Senegal challenge is an open innovation data challenge on anonymous call patterns of Orange's mobile phone users in Senegal. The goal of the challenge is to help address society development questions in novel ways by contributing to the socio-economic development and well-being of the Senegalese population. Participants in the challenge are given access to three mobile phone datasets. This paper describes the three datasets. The datasets are based on Call Detail Records (CDR) of phone calls and text exchanges between more than 9 million of Orange's customers in Senegal between January 1, 2013 and December 31, 2013. The datasets are: (1) antenna-to-antenna traffic for 1666 antennas on an hourly basis, (2) fine-grained mobility data on a rolling 2-week basis for a year with bandicoot behavioral indicators at individual level for about 300,000 randomly sampled users, (3) one year of coarse-grained mobility data at arrondissement level with bandicoot behavioral indicators at individual level for about 150,000 randomly sampled users.

preprint, 2014, arXiv

Estimating Food Consumption and Poverty Indices with Mobile Phone Data

Recent studies have shown the value of mobile phone data to tackle problems related to economic development and humanitarian action. In this research, we assess the suitability of indicators derived from mobile phone data as a proxy for food security indicators. We compare the measures extracted from call detail records and airtime credit purchases to the results of a nationwide household survey conducted at the same time. Results show high correlations (> 0.8) between indicators derived from mobile phone data and several relevant food security variables such as expenditure on food or vegetable consumption. This correspondence suggests that, in the future, proxies derived from mobile phone data could be used to provide valuable up-to-date operational information on food security throughout low- and middle-income countries.

preprint, 2013, arXiv

Cramér-Rao bounds for synchronization of rotations

Synchronization of rotations is the problem of estimating a set of rotations R_i in SO(n), i = 1, ..., N, based on noisy measurements of relative rotations R_i R_j^T. This fundamental problem has found many recent applications, most importantly in structural biology. We provide a framework to study synchronization as estimation on Riemannian manifolds for arbitrary n under a large family of noise models. The noise models we address encompass zero-mean isotropic noise, and we develop tools for Gaussian-like as well as heavy-tail types of noise in particular. As a main contribution, we derive the Cramér-Rao bounds of synchronization, that is, lower bounds on the variance of unbiased estimators. We find that these bounds are structured by the pseudoinverse of the measurement graph Laplacian, where edge weights are proportional to measurement quality. We leverage this to provide interpretation in terms of random walks and visualization tools for these bounds in both the anchored and anchor-free scenarios. Similar bounds previously established were limited to rotations in the plane and Gaussian-like noise.

preprint, 2013, arXiv

Data for Development: the D4D Challenge on Mobile Phone Data

The Orange "Data for Development" (D4D) challenge is an open data challenge on anonymous call patterns of Orange's mobile phone users in Ivory Coast. The goal of the challenge is to help address society development questions in novel ways by contributing to the socio-economic development and well-being of the Ivory Coast population. Participants in the challenge are given access to four mobile phone datasets and the purpose of this paper is to describe the four datasets. The website http://www.d4d.orange.com contains more information about the participation rules. The datasets are based on anonymized Call Detail Records (CDR) of phone calls and SMS exchanges between five million of Orange's customers in Ivory Coast between December 1, 2011 and April 28, 2012. The datasets are: (1) antenna-to-antenna traffic on an hourly basis, (2) individual trajectories for 50,000 customers for two-week time windows with antenna location information, (3) individual trajectories for 500,000 customers over the entire observation period with sub-prefecture location information, and (4) a sample of communication graphs for 5,000 customers.

preprint, 2013, arXiv

Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets

Reliable statistical information is important to make political decisions on a sound basis and to help measure the impact of policies. Unfortunately, statistics offices in developing countries have scarce resources and statistical censuses are therefore conducted sporadically. Based on mobile phone communications and history of airtime credit purchases, we estimate the relative income of individuals, the diversity and inequality of income, and an indicator for socioeconomic segregation for fine-grained regions of an African country. Our study shows how to use mobile phone datasets as a starting point to understand the socio-economic state of a country, which can be especially useful in countries with few resources to conduct large surveys.

preprint, 2012, arXiv

Exploring the Mobility of Mobile Phone Users

Mobile phone datasets allow for the analysis of human behavior on an unprecedented scale. The social network, temporal dynamics and mobile behavior of mobile phone users have often been analyzed independently from each other using mobile phone datasets. In this article, we explore the connections between various features of human behavior extracted from a large mobile phone dataset. Our observations are based on the analysis of communication data of 100,000 anonymized and randomly chosen individuals in a dataset of communications in Portugal. We show that clustering and principal component analysis allow for a significant dimension reduction with limited loss of information. The most important features are related to geographical location. In particular, we observe that most people spend most of their time at only a few locations. With the help of clustering methods, we then robustly identify home and office locations and compare the results with official census data. Finally, we analyze the geographic spread of users' frequent locations and show that commuting distances can be reasonably well explained by a gravity model.

preprint, 2012, arXiv

PageRank Optimization by Edge Selection

The importance of a node in a directed graph can be measured by its PageRank. The PageRank of a node is used in a number of application contexts - including ranking websites - and can be interpreted as the average portion of time spent at the node by an infinite random walk. We consider the problem of maximizing the PageRank of a node by selecting some of the edges from a set of edges that are under our control. By applying results from Markov decision theory, we show that an optimal solution to this problem can be found in polynomial time. Our core solution results in a linear programming formulation, but we also provide an alternative greedy algorithm, a variant of policy iteration, which runs in polynomial time, as well. Finally, we show that, under the slight modification for which we are given mutually exclusive pairs of edges, the problem of PageRank optimization becomes NP-hard.
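As a rough illustration of why the choice of edges matters, the sketch below computes PageRank by power iteration on a tiny graph and compares two edge selections. It does not implement the linear programming or policy iteration methods from the paper. The damping factor of 0.85, the uniform teleportation, the uniform handling of dangling nodes, and the function name `pagerank` are all assumptions made for this example.

```python
def pagerank(out_links, damping=0.85, iterations=200):
    """Power iteration for PageRank.

    out_links maps every node to the list of nodes it links to; every link
    target must itself be a key. Nodes without out-links jump uniformly.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # dangling node: spread rank uniformly
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Node "c" is under our control: it may link only to "a", or to both "a" and "b".
only_a = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
a_and_b = pagerank({"a": ["b"], "b": ["a"], "c": ["a", "b"]})
better_for_a = only_a["a"] > a_and_b["a"]
```

In this toy graph, activating the extra edge from "c" to "b" dilutes the rank passed to "a", so leaving it out is the better selection for "a".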

preprint, 2012, arXiv

Temporal Heterogeneities Increase the Prevalence of Epidemics on Evolving Networks

Empirical studies suggest that contact patterns follow heterogeneous inter-event times, meaning that intervals of high activity are followed by periods of inactivity. Combined with birth and death of individuals, these temporal constraints affect the spread of infections in a non-trivial way and are dependent on the particular contact dynamics. We propose a stochastic model to generate temporal networks where vertices make instantaneous contacts following heterogeneous inter-event times, and leave and enter the system at fixed rates. We study how these temporal properties affect the prevalence of an infection and estimate R0, the number of secondary infections, by modeling simulated infections (SIR, SI and SIS) co-evolving with the network structure. We find that heterogeneous contact patterns cause earlier and larger epidemics on the SIR model in comparison to homogeneous scenarios. In the case of SI and SIS, the epidemic is faster in the early stages (up to 90% of prevalence), followed by a slowdown in the asymptotic limit in the case of heterogeneous patterns. In the presence of birth and death, heterogeneous patterns always cause higher prevalence in comparison to homogeneous scenarios with the same average inter-event times. Our results suggest that R0 may be underestimated if temporal heterogeneities are not taken into account in the modeling of epidemics.

preprint, 2012, arXiv

Uncovering space-independent communities in spatial networks

Many complex systems are organized in the form of a network embedded in space. Important examples include the physical Internet infrastructure, road networks, flight connections, brain functional networks and social networks. The effect of space on network topology has recently come under the spotlight because of the emergence of pervasive technologies based on geo-localization, which constantly fill databases with people's movements and thus reveal their trajectories and spatial behaviour. Extracting patterns and regularities from the resulting massive amount of human mobility data requires the development of appropriate tools for uncovering information in spatially-embedded networks. In contrast with most works that tend to apply standard network metrics to any type of network, we argue in this paper for a careful treatment of the constraints imposed by space on network topology. In particular, we focus on the problem of community detection and propose a modularity function adapted to spatial networks. We show that it is possible to factor out the effect of space in order to reveal more clearly hidden structural similarities between the nodes. Methods are tested on a large mobile phone network and computer-generated benchmarks where the effect of space has been incorporated.

preprint, 2011, arXiv

An upper bound on community size in scalable community detection

It is well known that community detection methods based on modularity optimization often fail to discover small communities. Several objective functions used for community detection therefore involve a resolution parameter that allows the detection of communities at different scales. We provide an explicit upper bound on the community size of communities resulting from the optimization of several of these functions. We also show with a simple example that the use of the resolution parameter may artificially force the complete disaggregation of large and densely connected communities.
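The abstract does not name the objective functions it covers. As an assumed example, the sketch below uses one widely used form of modularity with a resolution parameter, Q = sum over communities of L_c/m - resolution * (d_c/(2m))^2, where L_c is the number of internal edges, d_c the total degree of the community and m the number of edges. The function name `modularity` and the toy graph are hypothetical.

```python
def modularity(edges, communities, resolution=1.0):
    """Modularity with a resolution parameter for an undirected simple graph.

    edges: list of (u, v) pairs without self-loops or duplicates.
    communities: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = {}      # number of edges inside each community
    total_degree = {}  # sum of node degrees in each community
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
triangles = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
singletons = {node: node for node in range(6)}

q_default = modularity(edges, triangles, resolution=1.0)
q_split = modularity(edges, singletons, resolution=3.0)
q_kept = modularity(edges, triangles, resolution=3.0)
```

At a resolution of 3, the partition into single nodes scores higher than the two triangles, a small instance of the disaggregation effect mentioned in the abstract.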

preprint, 2011, arXiv

Extracting spatial information from networks with low-order eigenvectors

We consider the problem of inferring meaningful spatial information in networks from incomplete information on the connection intensity between the nodes of the network. We consider two spatially distributed networks: a population migration flow network within the US, and a network of mobile phone calls between cities in Belgium. For both networks we use the eigenvectors of the Laplacian matrix constructed from the link intensities to obtain informative visualizations and capture natural geographical subdivisions. We observe that some low order eigenvectors localize very well and seem to reveal small geographically cohesive regions that match remarkably well with political and administrative boundaries. We discuss possible explanations for this observation by describing diffusion maps and localized eigenfunctions. In addition, we discuss a possible connection with the weighted graph cut problem, and provide numerical evidence supporting the idea that lower order eigenvectors point out local cuts in the network. However, we do not provide a formal and rigorous justification for our observations.

preprint, 2011, arXiv

Interplay between telecommunications and face-to-face interactions - a study using mobile phone data

In this study we analyze one year of anonymized telecommunications data for over one million customers from a large European cellphone operator, and we investigate the relationship between people's calls and their physical location. We discover that more than 90% of users who have called each other have also shared the same space (cell tower), even if they live far apart. Moreover, we find that close to 70% of users who call each other frequently (at least once per month on average) have shared the same space at the same time - an instance that we call co-location. Co-locations appear indicative of coordination calls, which occur just before face-to-face meetings. Their number is highly predictable based on the number of calls between two users and the distance between their home locations - suggesting a new way to quantify the interplay between telecommunications and face-to-face interactions.

preprint, 2008, arXiv

Dynamics of latent voters

We study the effect of latency on binary-choice opinion formation models. Latency is introduced into the models as an additional dynamic rule: after a voter changes its opinion, it enters a waiting period of stochastic length where no further changes take place. We first focus on the voter model and show that as a result of introducing latency, the average magnetization is not conserved, and the system is driven toward zero magnetization, independently of initial conditions. The model is studied analytically in the mean-field case and by simulations in one dimension. We also address the behavior of the Majority Rule model with added latency, and show that the competition between imitation and latency leads to a rich phenomenology.

preprint, 2006, arXiv

On the complexity of computing the capacity of codes that avoid forbidden difference patterns

We consider questions related to the computation of the capacity of codes that avoid forbidden difference patterns. The maximal number of $n$-bit sequences whose pairwise differences do not contain some given forbidden difference patterns increases exponentially with $n$. The exponent is the capacity of the forbidden patterns, which is given by the logarithm of the joint spectral radius of a set of matrices constructed from the forbidden difference patterns. We provide a new family of bounds that allows for the approximation, in exponential time, of the capacity with an arbitrarily high degree of accuracy. We also provide a polynomial time algorithm for the problem of determining if the capacity of a set is positive, but we prove that the same problem becomes NP-hard when the sets of forbidden patterns are defined over an extended set of symbols. Finally, we prove the existence of extremal norms for the sets of matrices arising in the capacity computation. This result makes it possible to apply a specific (even though non-polynomial) approximation algorithm. We illustrate this fact by computing exactly the capacity of codes that were only known approximately.
