Source author record

Vincent Labatut

Vincent Labatut appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Social and Information Networks, physics.soc-ph, Information Retrieval, Computation and Language, Multimedia, Machine Learning, Software Engineering, Computer Vision, math.ST, Robotics, Statistics Theory, Artificial Intelligence, cs.CY, math.HO, math.OC, Networking and Internet Architecture, physics.data-an

Catalog footprint: 37 works, 17 topics, 4 close collaborators

Preprint, 2022, arXiv

Extraction and Analysis of Fictional Character Networks: A Survey

A character network is a graph extracted from a narrative, in which vertices represent characters and edges correspond to interactions between them. A number of narrative-related problems can be addressed automatically through the analysis of character networks, such as summarization, classification, or role detection. Character networks are particularly relevant when considering works of fiction (e.g. novels, plays, movies, TV series), as their exploitation allows developing information retrieval and recommendation systems. However, works of fiction possess specific properties that make these tasks harder. This survey aims to present and organize the scientific literature related to the extraction of character networks from works of fiction, as well as their analysis. We first describe the extraction process in a generic way, and explain how its constituent steps are implemented in practice, depending on the medium of the narrative, the goal of the network analysis, and other factors. We then review the descriptive tools used to characterize character networks, with a focus on the way they are interpreted in this context. We illustrate the relevance of character networks by also providing a review of applications derived from their analysis. Finally, we identify the limitations of the existing approaches, and the most promising perspectives.
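As a rough illustration of the definition above, the following sketch builds a character network from scene-level co-occurrence. The scene list, the character names and both function names are hypothetical; co-occurrence is only one of the interaction types an extraction process might use, and the survey itself does not prescribe this one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: each scene lists the characters that appear in it.
scenes = [
    ["Alice", "Bob"],
    ["Alice", "Bob", "Carol"],
    ["Carol", "Dave"],
    ["Alice", "Carol"],
]

def cooccurrence_network(scenes):
    """Return weighted edges {(u, v): number of shared scenes}, with u < v."""
    edges = Counter()
    for scene in scenes:
        # set() ignores repeated mentions; sorted() gives a canonical edge key
        for u, v in combinations(sorted(set(scene)), 2):
            edges[(u, v)] += 1
    return edges

def weighted_degree(edges):
    """Sum of incident edge weights (strength) for each character."""
    strength = Counter()
    for (u, v), weight in edges.items():
        strength[u] += weight
        strength[v] += weight
    return strength

edges = cooccurrence_network(scenes)
strength = weighted_degree(edges)
print(dict(edges))
print(dict(strength))
```

Vertex strength is one simple descriptive tool; in this toy example it ranks characters by how often they share scenes with others.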

Preprint, 2021, arXiv

Characterizing and comparing external measures for the assessment of cluster analysis and community detection

In the context of cluster analysis and graph partitioning, many external evaluation measures have been proposed in the literature to compare two partitions of the same set. This makes the task of selecting the most appropriate measure for a given situation a challenge for the end user. However, this issue is overlooked in the literature. Researchers tend to follow tradition and use the standard measures of their field, although these often became standard only because previous researchers started using them consistently. In this work, we propose a new empirical evaluation framework to solve this issue, and help end users select an appropriate measure for their application. For a collection of candidate measures, it first describes their behavior by computing them on a generated dataset of partitions, obtained by applying a set of predefined parametric partition transformations. Second, our framework performs a regression analysis to characterize the measures in terms of how they are affected by these parameters and transformations. This allows both describing and comparing the measures. Our approach is not tied to any specific measure or application, so it can be applied to any situation. We illustrate its relevance by applying it to a selection of standard measures, and show how it can be put into practice through two concrete use cases.
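The first step described above (computing a measure over partitions produced by a parametric transformation) can be sketched as follows. The Rand index stands in for "a candidate measure", and `move_first_k` is a hypothetical transformation invented for this example; neither is claimed to be the set of measures or transformations used in the paper.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same elements")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # pair grouped the same way in both partitions
        total += 1
    return agree / total

def move_first_k(labels, k, target):
    """Hypothetical transformation: reassign the first k elements to cluster `target`."""
    return [target] * k + list(labels[k:])

reference = [0, 0, 0, 1, 1, 1]
# Measure the response of the Rand index to the transformation parameter k.
scores = [rand_index(reference, move_first_k(reference, k, 1)) for k in range(4)]
print(scores)
```

Plotting or regressing `scores` against `k` is the kind of characterization the framework performs, here reduced to a single measure and a single parameter.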

Preprint, 2021, arXiv

Graph embeddings for Abusive Language Detection

Abusive behaviors are common on online social networks. The increasing frequency of antisocial behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged content, and one relying on the structure and dynamics of the conversation. The latter has the advantage of being language-independent; however, it leverages a hand-crafted set of topological measures which are computationally expensive and not necessarily suitable to all situations. In the present paper, we propose to use recent graph embedding approaches to automatically learn representations of conversational graphs depicting message exchanges. We compare two categories: node vs. whole-graph embeddings. We experiment with a total of 8 approaches and apply them to a dataset of online messages. We also study more precisely which aspects of the graph structure are leveraged by each approach. Our study shows that the representation produced by certain embeddings captures the information conveyed by specific topological measures, but misses other aspects.

Preprint, 2020, arXiv

Multiple Partitioning of Multiplex Signed Networks: Application to European Parliament Votes

For more than a decade, graphs have been used to model the voting behavior taking place in parliaments. However, the methods described in the literature suffer from several limitations. The two main ones are that 1) they rely on some temporal integration of the raw data, which causes some information loss, and/or 2) they identify groups of antagonistic voters, but not the context associated with their occurrence. In this article, we propose a novel method taking advantage of multiplex signed graphs to solve both of these issues. It consists in first partitioning each layer separately, before grouping these partitions by similarity. We show the value of our approach by applying it to a European Parliament dataset.

Preprint, 2020, arXiv

Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs

In order to study real-world systems, many applied works model them through signed graphs, i.e. graphs whose edges are labeled as either positive or negative. Such a graph is considered as structurally balanced when it can be partitioned into a number of modules, such that positive (resp. negative) edges are located inside (resp. in-between) the modules. When it is not the case, authors look for the closest partition to such balance, a problem called Correlation Clustering (CC). Due to the complexity of the CC problem, the standard approach is to find a single optimal partition and stick to it, even if other optimal or high scoring solutions possibly exist. In this work, we study the space of optimal solutions of the CC problem, on a collection of synthetic complete graphs. We show empirically that under certain conditions, there can be many optimal partitions of a signed graph. Some of these are very different and thus provide distinct perspectives on the system, as illustrated on a small real-world graph. This is an important result, as it implies that one may have to find several, if not all, optimal solutions of the CC problem, in order to properly study the considered system.
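To make the notion of multiple optimal solutions concrete, here is a minimal brute-force sketch on a tiny complete signed graph. It assumes the unweighted form of the objective (count negative edges inside modules plus positive edges between modules); the example graph and all function names are hypothetical, and exhaustive enumeration is only feasible for very small graphs.

```python
def set_partitions(n):
    """Yield every partition of range(n) as a tuple of module labels."""
    def extend(labels, n_modules):
        if len(labels) == n:
            yield tuple(labels)
            return
        # The next vertex joins an existing module or opens a new one.
        for m in range(n_modules + 1):
            yield from extend(labels + [m], max(n_modules, m + 1))
    yield from extend([], 0)

def imbalance(sign, labels):
    """Count misplaced edges: negative inside a module, positive between modules."""
    cost = 0
    for (u, v), s in sign.items():
        same = labels[u] == labels[v]
        if (s < 0 and same) or (s > 0 and not same):
            cost += 1
    return cost

def optimal_partitions(sign, n):
    """Return the minimal imbalance and every partition that reaches it."""
    best, winners = None, []
    for labels in set_partitions(n):
        cost = imbalance(sign, labels)
        if best is None or cost < best:
            best, winners = cost, [labels]
        elif cost == best:
            winners.append(labels)
    return best, winners

# Hypothetical complete signed graph on 3 vertices: two positive edges, one negative.
sign = {(0, 1): +1, (1, 2): +1, (0, 2): -1}
best, winners = optimal_partitions(sign, 3)
print(best, winners)
```

Even this unbalanced triangle admits three distinct optimal partitions with imbalance 1, which illustrates why keeping a single optimum can hide alternative readings of the system.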

Preprint, 2020, arXiv

Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots

Modern popular TV series often develop complex storylines spanning several seasons, but are usually watched in quite a discontinuous way. As a result, the viewer generally needs a comprehensive summary of the previous season's plot before the new one starts. Generating such summaries first requires identifying and characterizing the dynamics of the series' subplots. One way of doing so is to study the underlying social network of interactions between the characters involved in the narrative. The standard tools used in the Social Network Analysis field to extract such a network rely on an integration of time, either over the whole considered period, or as a sequence of several time-slices. However, they turn out to be inappropriate in the case of TV series, because the scenes shown onscreen alternately focus on parallel storylines, and do not necessarily respect a traditional chronology. This makes existing extraction methods ill-suited to describing the dynamics of relationships between characters, or to getting a relevant instantaneous view of the current social state in the plot. This is especially true for characters shown as interacting with each other at some previous point in the plot but temporarily neglected by the narrative. In this article, we introduce narrative smoothing, a novel, still exploratory, network extraction method. It smooths the relationship dynamics based on the plot properties, aiming to solve some of the limitations of the standard approaches. In order to assess our method, we apply it to a new corpus of 3 popular TV series, and compare it to both standard approaches. Our results are promising, showing that narrative smoothing leads to more relevant observations when it comes to the characterization of the protagonists and their relationships. It could be used as a basis for further modeling the intertwined storylines constituting TV series plots.

Preprint, 2020, arXiv

Remembering Winter Was Coming: Character-Oriented Video Summaries of TV Series

Today's popular TV series tend to develop continuous, complex plots spanning several seasons, but are often viewed in controlled and discontinuous conditions. Consequently, most viewers need to be re-immersed in the story before watching a new season. Although discussions with friends and family can help, we observe that most viewers make extensive use of summaries to re-engage with the plot. Automatic generation of video summaries of TV series' complex stories requires, first, modeling the dynamics of the plot and, second, extracting relevant sequences. In this paper, we tackle plot modeling by considering the social network of interactions between the characters involved in the narrative: substantial, durable changes in a major character's social environment suggest a new development relevant for the summary. Once identified, these major stages in each character's storyline can be used as a basis for completing the summary with related sequences. Our algorithm combines such social network analysis with filmmaking grammar to automatically generate character-oriented video summaries of TV series from partially annotated data. We carry out evaluation with a user study in a real-world scenario: a large sample of viewers were asked to rank video summaries centered on five characters of the popular TV series Game of Thrones, a few weeks before the new, sixth season was released. Our results reveal the ability of character-oriented summaries to re-engage viewers in television series and confirm the contributions of modeling the plot content and exploiting stylistic patterns to identify salient sequences.

Preprint, 2020, arXiv

Serial Speakers: a Dataset of TV Series

For over a decade, TV series have been drawing increasing interest, both from the audience and from various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim at filling this gap by providing the multimedia/speech processing communities with Serial Speakers, an annotated dataset of 161 episodes from three popular American TV serials: Breaking Bad, Game of Thrones and House of Cards. Serial Speakers is suitable both for investigating multimedia retrieval in realistic use case scenarios, and for addressing lower level speech related tasks in especially challenging conditions. We publicly release annotations for every speech turn (boundaries, speaker) and scene boundary, along with annotations for shot boundaries, recurring shots, and interacting speakers in a subset of episodes. Because of copyright restrictions, the textual content of the speech turns is encrypted in the public version of the dataset, but we provide the users with a simple online tool to recover the plain text from their own subtitle files.

Preprint, 2020, arXiv

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection

With the spread of online social networks, it is more and more difficult to monitor all the user-generated content. Automating the moderation of inappropriate content exchanged on the Internet has thus become a priority task. Methods have been proposed for this purpose, but it can be challenging to find a suitable dataset to train and develop them. This issue is especially acute for approaches based on information derived from the structure and the dynamics of the conversation. In this work, we propose an original framework, based on the Wikipedia Comment corpus, with comment-level abuse annotations of different types. The major contribution concerns the reconstruction of conversations, in contrast to existing corpora, which focus only on isolated messages (i.e. taken out of their conversational context). This large corpus of more than 380k annotated messages opens perspectives for online abuse detection and especially for context-based approaches. We also propose, in addition to this corpus, a complete benchmarking platform to stimulate and fairly compare scientific works around the problem of content abuse detection, trying to avoid the recurring problem of result replication. Finally, we apply two classification methods to our dataset to demonstrate its potential.

Preprint, 2016, arXiv

A Review of Features for the Discrimination of Twitter Users: Application to the Prediction of Offline Influence

Many works related to Twitter aim at characterizing its users in some way: role on the service (spammers, bots, organizations, etc.), nature of the user (socio-professional category, age, etc.), topics of interest, and others. However, for a given user classification problem, it is very difficult to select a set of appropriate features, because the many features described in the literature are very heterogeneous, with name overlaps and collisions, and numerous very close variants. In this article, we review a wide range of such features. In order to present a clear state-of-the-art description, we unify their names, definitions and relationships, and we propose a new, neutral typology. We then illustrate the value of our review by applying a selection of these features to the offline influence detection problem. This task consists in identifying users who are influential in real life, based on their Twitter account and related data. We show that most features deemed efficient to predict online influence, such as the numbers of retweets and followers, are not relevant to this problem. However, we propose several content-based approaches to label Twitter users as Influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated on the CLEF RepLab 2014 dataset, and outperform state-of-the-art methods.

Preprint, 2016, arXiv

Generalized Measures for the Evaluation of Community Detection Methods

Community detection can be considered as a variant of cluster analysis applied to complex networks. For this reason, all existing studies have been using tools derived from this field when evaluating community detection algorithms. However, those are not completely relevant in the context of network analysis, because they ignore an essential part of the available information: the network structure. Therefore, they can lead to incorrect interpretations. In this article, we review these measures, and illustrate this limitation. We propose a modification to solve this problem, and apply it to the three most widespread measures: purity, Rand index and normalized mutual information (NMI). We then perform an experimental evaluation on artificially generated networks with realistic community structure. We assess the relevance of the modified measures by comparison with their traditional counterparts, and also relatively to the topological properties of the community structures. On these data, the modified NMI turns out to provide the most relevant results.
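For reference, the traditional (unmodified) forms of two of the measures named above can be sketched as below. This is not the modified version proposed in the paper, which additionally takes network structure into account. The NMI normalization used here, 2 I(X;Y) / (H(X) + H(Y)), is one common convention and is an assumption of this sketch.

```python
from collections import Counter
from math import log

def purity(estimated, reference):
    """Share of nodes lying in the majority reference community of their estimated community."""
    joint = Counter(zip(estimated, reference))
    best = Counter()
    for (e, _), count in joint.items():
        best[e] = max(best[e], count)
    return sum(best.values()) / len(estimated)

def entropy(labels):
    """Shannon entropy (natural log) of a community assignment."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def nmi(x, y):
    """Mutual information normalized by the mean entropy of both partitions."""
    n = len(x)
    cx, cy, joint = Counter(x), Counter(y), Counter(zip(x, y))
    mi = sum(c / n * log(c * n / (cx[a] * cy[b])) for (a, b), c in joint.items())
    denom = entropy(x) + entropy(y)
    return 1.0 if denom == 0 else 2 * mi / denom

reference = [0, 0, 0, 1, 1, 1]
estimated = [0, 0, 1, 1, 1, 1]  # one node misplaced
print(purity(estimated, reference), nmi(estimated, reference))
```

Both functions treat every misplaced node identically, whatever its position in the network, which is the limitation the abstract points out.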

Preprint, 2016, arXiv

Straightness of rectilinear vs. radio-concentric networks: modeling simulation and comparison

This paper proposes a comparison between rectilinear and radio-concentric networks. Such networks are often observed in urban areas, in several cities all over the world. One of the interesting properties of such networks is described by the straightness measure from graph theory, which assesses how much moving from one node to another along the network links departs from the network-independent straight-line path. We study this property in both rectilinear and radio-concentric networks, first by mathematically analyzing routes from the center to peripheral locations in a theoretical framework with perfect topology, then using simulations for multiple origin-destination paths. We show that in most cases, radio-concentric networks have a better straightness than rectilinear ones. How may this property be used in the future for urban networks?
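A minimal sketch of the straightness of a node pair, taken here as the ratio of the Euclidean distance to the shortest-path length along the network (so 1 means a perfectly straight route). The `grid` helper builds a hypothetical rectilinear network with unit-length streets; it is an illustration only and does not reproduce the paper's theoretical or simulation setup.

```python
import heapq
from math import dist

def shortest_path_length(adj, source, target):
    """Dijkstra over adj = {node: [(neighbor, length), ...]}."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(queue, (nd, v))
    return float("inf")

def straightness(pos, adj, u, v):
    """Euclidean distance divided by network distance between u and v."""
    return dist(pos[u], pos[v]) / shortest_path_length(adj, u, v)

def grid(n):
    """Hypothetical rectilinear network: n x n lattice with unit-length streets."""
    pos = {(x, y): (x, y) for x in range(n) for y in range(n)}
    adj = {p: [] for p in pos}
    for (x, y) in pos:
        for q in ((x + 1, y), (x, y + 1)):
            if q in pos:
                adj[(x, y)].append((q, 1.0))
                adj[q].append(((x, y), 1.0))
    return pos, adj

pos, adj = grid(3)
print(straightness(pos, adj, (0, 0), (2, 2)))  # diagonal trip on a grid
print(straightness(pos, adj, (0, 0), (2, 0)))  # trip along one street
```

On a grid, a diagonal trip has to follow the streets, so its straightness drops to the ratio of the diagonal to the sum of both sides, while a trip along a single street stays at 1.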

Preprint, 2015, arXiv

A community role approach to assess social capitalists visibility in the Twitter network

In the context of Twitter, social capitalists are specific users trying to increase their number of followers and interactions by any means. These users are not healthy for the service, because they are either spammers or real users flawing the notions of influence and visibility. Studying their behavior and understanding their position in Twitter is thus of important interest. It is also necessary to analyze how these methods effectively affect user visibility. Based on a recently proposed method allowing to identify social capitalists, we tackle both points by studying how they are organized, and how their links spread across the Twitter follower-followee network. To that aim, we consider their position in the network w.r.t. its community structure. We use the concept of community role of a node, which describes its position in a network depending on its connectivity at the community level. However, the topological measures originally defined to characterize these roles consider only certain aspects of the community-related connectivity, and rely on a set of empirically fixed thresholds. We first show the limitations of these measures, before extending and generalizing them. Moreover, we use an unsupervised approach to identify the roles, in order to provide more flexibility relatively to the studied system. We then apply our method to the case of social capitalists and show they are highly visible on Twitter, due to the specific roles they hold.

Preprint, 2015, arXiv

Étude de l'omniprésence des propriétés petit-monde et sans-échelle

The small-world and scale-free properties were identified in real-world complex networks at the end of the 1990s. Their analysis led to a better understanding of the dynamics and functioning of certain systems, and they were studied in many subsequent works. This might be the reason why one frequently finds, in the complex networks literature, assertions regarding their ubiquity, their validity for almost all complex networks. Yet, the mentioned seminal works were conducted on a very limited number of networks, and, to the best of our knowledge, no large-scale study has been conducted to answer this question. In this work, we take advantage, on the one hand, of the many datasets now available online, to constitute a large collection of networks, and on the other hand, of recent analysis tools, to check the validity of this hypothesis of ubiquity. It turns out that a large majority of the studied networks are indeed small-world; however, this is not the case for the scale-free property, since the degree distribution of a significant proportion of our networks does not follow a power law. Keywords: complex networks, topological properties, small-world, scale-free.

Preprint, 2015, arXiv

Interpreting communities based on the evolution of a dynamic attributed network

Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. From the modeling point of view, to be of some utility, the community structure must be characterized relatively to the properties of the studied system. However, most of the existing works focus on the detection of communities, and only very few try to tackle this interpretation problem. Moreover, the existing approaches are limited either by the type of data they handle, or by the nature of the results they output. In this work, we see the interpretation of communities as a problem independent from the detection process, consisting in identifying the most characteristic features of communities. We give a formal definition of this problem and propose a method to solve it. To this aim, we first define a sequence-based representation of networks, combining temporal information, community structure, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset, and use them to characterize the communities. We study the performance of our method on artificially generated dynamic attributed networks. We also empirically validate our framework on real-world systems: a DBLP network of scientific collaborations, and a LastFM network of social and musical interactions.

Preprint, 2015, arXiv

Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament

In this paper, we study the informative value of negative links in signed complex networks. For this purpose, we extract and analyze a collection of signed networks representing voting sessions of the European Parliament (EP). We first process data collected from the VoteWatch Europe website for the whole 7th term (2009-2014), by considering voting similarities between Members of the EP to define weighted signed links. We then apply a selection of community detection algorithms, designed to process only positive links, to these data. We also apply Parallel Iterative Local Search (Parallel ILS), an algorithm recently proposed to identify balanced partitions in signed networks. Our results show that, contrary to the conclusions of a previous study focusing on other data, the partitions detected by ignoring or considering the negative links are indeed remarkably different for these networks. The relevance of negative links for graph partitioning is therefore an open question which should be further explored.
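One simple way to turn voting records into weighted signed links is sketched below. The agreement score, the vote encoding and the member identifiers are all assumptions made for illustration; the abstract does not specify the similarity actually used.

```python
from itertools import combinations

# Hypothetical roll calls: +1 = for, -1 = against, 0 = abstained or absent.
votes = {
    "mep_a": [+1, +1, -1, +1],
    "mep_b": [+1, +1, -1, -1],
    "mep_c": [-1, -1, +1, 0],
}

def signed_similarity(x, y):
    """Mean agreement over roll calls where both members voted; ranges from -1 to +1."""
    scores = [a * b for a, b in zip(x, y) if a != 0 and b != 0]
    return sum(scores) / len(scores) if scores else 0.0

# One weighted signed link per pair of members.
links = {
    (u, v): signed_similarity(votes[u], votes[v])
    for u, v in combinations(sorted(votes), 2)
}
print(links)
```

A positive weight marks members who mostly vote alike, a negative weight marks opponents; dropping the negative links would erase the distinction between opposition and mere absence of agreement.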

Preprint, 2014, arXiv

A Method for Characterizing Communities in Dynamic Attributed Complex Networks

Many methods have been proposed to detect communities, not only in plain, but also in attributed, directed or even dynamic complex networks. In its simplest form, a community structure takes the form of a partition of the node set. From the modeling point of view, to be of some utility, this partition must then be characterized relatively to the properties of the studied system. However, while most of the existing works focus on defining methods for the detection of communities, only very few try to tackle this interpretation problem. Moreover, the existing approaches are limited either by the type of data they handle, or by the nature of the results they output. In this work, we propose a method to efficiently support such a characterization task. We first define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then describe how to identify the most emerging sequential patterns of this dataset, and use them to characterize the communities. We also show how to detect unusual behavior in a community, and highlight outliers. Finally, as an illustration, we apply our method to a network of scientific collaborations.

Preprint, 2014, arXiv

Classification of Complex Networks Based on Topological Properties

Complex networks are a powerful modeling tool, allowing the study of countless real-world systems. They have been used in very different domains such as computer science, biology, sociology, management, etc. Authors have been trying to characterize them using various measures such as degree distribution, transitivity or average distance. Their goal is to detect certain properties such as the small-world or scale-free properties. Previous works have shown some of these properties are present in many different systems, while others are characteristic of certain types of systems only. However, each one of these studies generally focuses on a very small number of topological measures and networks. In this work, we aim at using a more systematic approach. We first constitute a dataset of 152 publicly available networks, spanning 7 different domains. We then compute 14 different topological measures to characterize them as completely as possible. Finally, we apply standard data mining tools to analyze these data. A cluster analysis reveals it is possible to obtain two significantly distinct clusters of networks, corresponding roughly to a bisection of the domains modeled by the networks. On these data, the most discriminant measures are density, modularity, average degree and transitivity, and to a lesser extent, closeness and edge-betweenness centralities.
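Three of the measures reported as most discriminant (density, average degree and transitivity) can be computed from an edge list as in the sketch below. It assumes a simple undirected graph without self-loops; the function name and the example graph are hypothetical, and the remaining measures listed in the abstract are not covered.

```python
from itertools import combinations

def topological_summary(edges):
    """Density, average degree and transitivity of a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    closed = connected_triples = 0
    for neighbors in adj.values():
        k = len(neighbors)
        connected_triples += k * (k - 1) // 2
        # A triple centered on this node is closed if its two ends are linked.
        closed += sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return {
        "density": 2 * m / (n * (n - 1)),
        "average_degree": 2 * m / n,
        "transitivity": closed / connected_triples if connected_triples else 0.0,
    }

# Hypothetical example: a triangle with one pendant vertex.
summary = topological_summary([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(summary)
```

Computing such a vector for each network yields the feature table on which standard clustering tools can then be run.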

Preprint, 2014, arXiv

Identifying the Community Roles of Social Capitalists in the Twitter Network

In the context of Twitter, social capitalists are specific users trying to increase their number of followers and interactions by any means. These users are not healthy for the Twitter network since they distort notions of influence and visibility. Indeed, it has recently been observed that they are real and active users that can help malicious users such as spammers gain influence. Studying their behavior and understanding their position in Twitter is thus of great interest. A recent work provided an efficient way to detect social capitalists using two simple topological measures. Based on this detection method, we study how social capitalists are distributed over Twitter's friend-to-follower network. We are especially interested in analyzing how they are organized, and how their links spread across the network. Answering these questions makes it possible to know whether social capitalism methods increase actual visibility on the service. To that aim, we study the position of social capitalists on Twitter w.r.t. the community structure of the network. We base our work on the concept of community role of a node, which describes its position in a network depending on its connectivity at the community level. The topological measures originally defined to characterize these roles consider only some aspects of community-related connectivity and rely on a set of empirically fixed thresholds. We first show the limitations of such measures and then extend and generalize them by considering new aspects of the community-related connectivity. Moreover, we use an unsupervised approach to distinguish the roles, in order to provide more flexibility relative to the studied system. We then apply our method to the case of social capitalists and show that they are highly visible on Twitter, due to the specific roles they occupy.

Preprint, 2013, arXiv

A Comparison of Named Entity Recognition Tools Applied to Biographical Texts

Named entity recognition (NER) is a popular domain of natural language processing. For this reason, many tools exist to perform this task. Amongst other points, they differ in the processing method they rely upon, the entity types they can detect, the nature of the text they can handle, and their input/output formats. This makes it difficult for a user to select an appropriate NER tool for a specific situation. In this article, we try to address this problem in the context of biographical texts. To this end, we first constitute a new corpus by annotating Wikipedia articles. We then select publicly available, well-known and free-for-research NER tools for comparison: Stanford NER, Illinois NET, OpenCalais NER WS and Alias-i LingPipe. We apply them to our corpus, assess their performance and compare them. When considering overall performance, a clear hierarchy emerges: Stanford has the best results, followed by LingPipe, Illinois and OpenCalais. However, a more detailed evaluation performed relative to entity types and article categories highlights the fact that their performance is diversely influenced by those factors. This complementarity opens an interesting perspective regarding the combination of these individual tools in order to improve performance.

Preprint, 2013, arXiv

Benefits of Semantics on Web Service Composition from a Complex Network Perspective

The number of publicly available Web services (WS) is continuously growing, and in parallel, we are witnessing a rapid development in semantic-related web technologies. The intersection of the semantic web and WS allows the development of semantic WS. In this work, we adopt a complex network perspective to perform a comparative analysis of the syntactic and semantic approaches used to describe WS. From a collection of publicly available WS descriptions, we extract syntactic and semantic WS interaction networks. We take advantage of tools from the complex network field to analyze them and determine their properties. We show that WS interaction networks exhibit some of the typical characteristics observed in real-world networks, such as short average distance between nodes and community structure. By comparing syntactic and semantic networks through their properties, we show the introduction of semantics in WS descriptions should improve the composition process.

Preprint, 2013, arXiv

Identification de rôles communautaires dans des réseaux orientés appliquée à Twitter

The notion of community structure is particularly useful when analyzing complex networks, because it provides an intermediate level, compared to the more classic global (whole network) and local (node neighborhood) approaches. The concept of community role of a node was derived from this base, in order to describe the position of a node in a network depending on its connectivity at the community level. However, the existing approaches are restricted to undirected networks, use topological measures which do not consider all aspects of community-related connectivity, and their role identification methods are not generalizable to all networks. We tackle these limitations by generalizing and extending the measures, and using an unsupervised approach to determine the roles. We then illustrate the applicability of our method by analyzing a Twitter network. We show how our modifications reveal that some particular users, called social capitalists, occupy very specific roles in this system.

preprint2013arXiv

MATAWS: A Multimodal Approach for Automatic WS Semantic Annotation

Many recent works aim at developing methods and tools for the processing of semantic Web services. In order to be properly tested, these tools must be applied to an appropriate benchmark, taking the form of a collection of semantic WS descriptions. However, all of the existing publicly available collections are limited by their size or their realism (use of randomly generated or resampled descriptions). Larger and realistic syntactic (WSDL) collections exist, but their semantic annotation requires a certain level of automation, due to the number of operations to be processed. In this article, we propose a fully automatic method to semantically annotate such large WS collections. Our approach is multimodal, in the sense that it takes advantage of the latent semantics present not only in the parameter names, but also in the type names and structures. Concept-to-word association is performed using Sigma, a mapping of WordNet to the SUMO ontology. After describing our annotation method in detail, we apply it to the largest collection of real-world syntactic WS descriptions we could find, and assess its efficiency.

preprint2013arXiv

On Flexible Web Services Composition Networks

The semantic Web service community is working to bring semantics to Web service descriptions and to allow automatic discovery and composition. However, such descriptions have not been widely adopted yet, because semantically defining Web services is highly complicated and costly. As a result, production Web services still rely on syntactic descriptions, keyword-based discovery and predefined compositions. Hence, more advanced research on syntactic Web services is still ongoing. In this work, we build syntactic Web service composition networks with three well-known similarity metrics, namely Levenshtein, Jaro and Jaro-Winkler. We perform a comparative study of the metrics' performance by studying the topological properties of networks built from a test collection of real-world descriptions. It appears that Jaro-Winkler finds more appropriate similarities and can be used at higher thresholds. For lower thresholds, the Jaro metric would be preferable because it detects fewer irrelevant relationships.
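As a rough illustration of the thresholded similarity networks mentioned in this abstract, the sketch below links parameter names whose normalized Levenshtein similarity reaches a threshold. It is not the authors' procedure: the normalization by the longer string, the lowercasing, the function names and the sample parameter names are all assumptions made for this example, and Jaro and Jaro-Winkler are not covered.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; dividing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def similarity_edges(names, threshold):
    """Link every pair of names whose case-insensitive similarity reaches the threshold."""
    return [(a, b) for a, b in combinations(names, 2)
            if similarity(a.lower(), b.lower()) >= threshold]


# Hypothetical parameter names, for illustration only.
params = ["userName", "username", "userId", "zipCode", "postalCode"]
edges = similarity_edges(params, threshold=0.8)
```

Raising the threshold keeps only near-identical names, while lowering it adds weaker matches, which is the trade-off between thresholds and irrelevant relationships that the abstract discusses.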

preprint2013arXiv

Rôle communautaire des capitalistes sociaux dans Twitter

Social capitalists are users of social media such as Twitter who take advantage of various methods to maximize their visibility. This results in artificially important accounts, in the sense that this importance is not backed by any real content. The risk is that those accounts hide relevant content and therefore prevent other users from accessing it. In this work, we characterize social capitalists from a purely topological perspective, i.e. without considering the nature of the shared content. For this purpose, we use the notion of community role, which is based on the community structure of the studied network. We modify some measures previously designed for this purpose, and propose an objective method to determine roles. We then apply this method to the analysis of a Twitter network. Our results show that the roles identified through our measures are particularly consistent with Twitter's social capitalists, whose behavior is clearly identified.

preprint2013arXiv

Topological Properties of Web Services Similarity Networks

The number of publicly available Web services (WS) is continuously growing. To perform efficient WS discovery, it is desirable to organize the WS space. Works in this direction propose to group WS according to certain shared properties. Such groups, commonly called communities, are based either on similarity or on interaction between WS. In this paper, we focus on the former, and propose a new network-based approach to extract communities from a WS collection. This process has three steps: first, we define several similarity functions able to compare WS operations; second, we use them to build so-called similarity networks; and third, we identify communities in the form of specific structures in these networks. We apply our method to a collection of real-world WS and comment on the resulting communities. Finally, we provide an analysis and an interpretation of our similarity networks from a complex network perspective.

preprint2013arXiv

Towards realistic artificial benchmark for community detection algorithms evaluation

Assessing the partitioning performance of community detection algorithms is one of the most important issues in complex network analysis. Artificially generated networks are often used as benchmarks for this purpose. However, previous studies showed that their level of realism has a significant effect on the algorithms' performance. In this study, we adopt a thorough experimental approach to tackle this problem and investigate this effect. To assess the level of realism, we use consensual network topological properties. Based on the LFR method, the most realistic generative method to date, we propose two alternative random models to replace the Configuration Model originally used in this algorithm, in order to increase its realism. Experimental results show that both modifications allow generating collections of community-structured artificial networks whose topological properties are closer to those encountered in real-world networks. Moreover, the results obtained with eleven popular community identification algorithms on these benchmarks show that their performance decreases on more realistic networks.

preprint2013arXiv

Une méthode pour caractériser les communautés des réseaux dynamiques à attributs

Many complex systems are modeled through complex networks whose analysis reveals typical topological properties. Among those, the community structure is one of the most studied. Many methods have been proposed to detect communities, not only in plain networks, but also in attributed, directed or even dynamic ones. A community structure takes the form of a partition of the node set, which must then be characterized relative to the properties of the studied system. We propose a method to support such a characterization task. We define a sequence-based representation of networks, combining temporal information, topological measures, and nodal attributes. We then characterize each community using the most representative emerging sequential patterns of its nodes. This also allows detecting unusual behavior in a community. We describe an empirical study of a network of scientific collaborations.

preprint2013arXiv

Web Services Dependency Networks Analysis

Along with a continuously growing number of publicly available Web services (WS), we are witnessing a rapid development in semantic-related web technologies, which has led to the emergence of semantically described WS. In this work, we perform a comparative analysis of the syntactic and semantic approaches used to describe WS, from a complex network perspective. First, we extract syntactic and semantic WS dependency networks from a collection of publicly available WS descriptions. Then, we take advantage of tools from the complex network field to analyze them and determine their topological properties. We show that WS dependency networks exhibit some of the typical characteristics observed in real-world networks, such as the small-world and scale-free properties, as well as a community structure. By comparing syntactic and semantic networks through their topological properties, we show that the introduction of semantics in WS descriptions allows modeling the dependencies between parameters more accurately, which in turn could lead to improved composition mining methods.

preprint2012arXiv

Accuracy Measures for the Comparison of Classifiers

The selection of the best classification algorithm for a given dataset is a very widespread problem. It is also a complex one, in the sense that it requires making several important methodological choices. Among them, in this work we focus on the measure used to assess the classification performance and rank the algorithms. We present the most popular measures and discuss their properties. Despite the numerous measures proposed over the years, many of them turn out to be equivalent in this specific case, to have interpretation problems, or to be unsuitable for our purpose. Consequently, the classic overall success rate or marginal rates should be preferred for this specific task.
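The measures recommended at the end of this abstract can be sketched from a confusion matrix. The code below is only an illustration: reading "marginal rates" as per-class recall and precision is an assumption, as are the function names and the sample matrix.

```python
def overall_success_rate(confusion):
    """Share of correctly classified instances (diagonal sum over total)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def marginal_rates(confusion):
    """Per-class (recall, precision) pairs; rows are true classes, columns are predictions."""
    k = len(confusion)
    rates = []
    for i in range(k):
        row_total = sum(confusion[i])                       # instances truly in class i
        col_total = sum(confusion[r][i] for r in range(k))  # instances predicted as class i
        recall = confusion[i][i] / row_total if row_total else 0.0
        precision = confusion[i][i] / col_total if col_total else 0.0
        rates.append((recall, precision))
    return rates


# Hypothetical confusion matrix for a two-class problem.
matrix = [[40, 10],
          [5, 45]]
accuracy = overall_success_rate(matrix)
rates = marginal_rates(matrix)
```

With this matrix, 85 of the 100 instances lie on the diagonal, and the recall of the first class is 40 out of 50.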

preprint2012arXiv

An Empirical Study of the Relation Between Community Structure and Transitivity

One of the most prominent properties of real-world networks is the presence of a community structure, i.e. dense and loosely interconnected groups of nodes called communities. In an attempt to better understand this concept, we study the relationship between the strength of the community structure and the network transitivity (or clustering coefficient). Although intuitively appealing, this analysis has not been performed before. We adopt an approach based on random models to empirically study how one property varies depending on the other. It turns out that the transitivity increases with the community structure strength, and is also affected by the distribution of the community sizes. Furthermore, increasing the transitivity also results in a stronger community structure. More surprisingly, while a very weak community structure causes almost zero transitivity, the opposite is not true: a network with close-to-zero transitivity can still have a clearly defined community structure. Further analytical work is necessary to characterize the exact nature of the identified relationship.
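For readers unfamiliar with the transitivity discussed in this abstract, here is a minimal sketch using the common definition of the global clustering coefficient: the fraction of connected triples that are closed into triangles. The function name and the edge-list input format are assumptions made for this example, which treats the network as undirected and unweighted.

```python
from itertools import combinations


def transitivity(edges):
    """Global clustering coefficient: closed triples over connected triples."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    triples = 0  # paths of length two, counted once per centre node
    closed = 0   # such paths whose end nodes are also linked
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Two triangles joined by a single bridge edge (hypothetical toy network).
toy = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
```

In the toy network, the bridge adds open triples around nodes 3 and 4, so the transitivity falls below 1 even though both groups are fully connected internally.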

preprint2012arXiv

Comparative Evaluation of Community Detection Algorithms: A Topological Approach

Community detection is one of the most active fields in complex network analysis, due to its potential value in practical applications. Many works inspired by different paradigms are devoted to the development of algorithmic solutions that reveal how the network is structured into such cohesive subgroups. Comparative studies reported in the literature usually rely on a performance measure that considers the community structure as a partition (Rand Index, Normalized Mutual Information, etc.). However, this type of comparison neglects the topological properties of the communities. In this article, we present a comprehensive comparative study of a representative set of community detection methods, in which we adopt both types of evaluation. Community-oriented topological measures are used to qualify the communities and evaluate their deviation from the reference structure. In order to mimic real-world systems, we use artificially generated realistic networks. It turns out that the two approaches are not equivalent: a high performance does not necessarily correspond to correct topological properties, and vice versa. They can therefore be considered complementary, and we recommend applying both of them in order to perform a complete and accurate assessment.

preprint2012arXiv

Qualitative Comparison of Community Detection Algorithms

Community detection is a very active field in complex network analysis, consisting in identifying groups of nodes that are more densely interconnected than the rest of the network. The existing algorithms are usually tested and compared on real-world and artificial networks, their performance being assessed through some partition similarity measure. However, the realism of artificial networks can be questioned, and the appropriateness of those measures is not obvious. In this study, we take advantage of recent advances concerning the characterization of community structures to tackle these questions. We first generate networks using the most realistic model available to date. Their analysis reveals that they display only some of the properties observed in real-world community structures. We then apply five community detection algorithms to these networks and find that the performance assessed quantitatively does not necessarily agree with a qualitative analysis of the identified communities. It therefore seems that both approaches should be applied to perform a relevant comparison of the algorithms.

preprint2012arXiv

Topological measures for the analysis of wireless sensor networks

Energy dependence, random deployment, dynamic topological updates, self-organization and a large, varying number of nodes are among the many factors that make wireless sensor networks (WSNs) a type of complex system. However, when analyzing WSN properties using complex network tools, classical topological measures must be considered with care, as they might not be applicable in their original form. In this work, we focus on the topological measures frequently used in the related field of Internet topological analysis. We illustrate their applicability to the WSN domain through simulation experiments. In the cases where the classic metrics turn out to be incompatible, we propose some alternative measures and discuss them based on the WSN characteristics.

preprint2012arXiv

Une nouvelle mesure pour l'évaluation des méthodes de détection de communautés

Community detection can be considered a variant of cluster analysis applied to complex networks. For this reason, all existing studies have used tools derived from this field when evaluating community detection algorithms. However, those tools are not completely relevant in the context of network analysis, because they ignore a part of the available information, and can therefore lead to incorrect interpretations. In this article, we illustrate this limitation, and propose a solution by modifying an existing measure. We then apply it to realistic community-structured networks, in order to perform a first evaluation.

preprint2011arXiv

Evaluation of Performance Measures for Classifiers Comparison

The selection of the best classification algorithm for a given dataset is a very widespread problem, occurring each time one has to choose a classifier to solve a real-world problem. It is also a complex task with many important methodological decisions to make. Among those, one of the most crucial is the choice of an appropriate measure in order to properly assess the classification performance and rank the algorithms. In this article, we focus on this specific task. We present the most popular measures and compare their behavior through discrimination plots. We then discuss their properties from a more theoretical perspective. It turns out that several of them are equivalent for classifier comparison purposes. Furthermore, they can also lead to interpretation problems. Among the numerous measures proposed over the years, it appears that the classical overall success rate and marginal rates are the most suitable for the classifier comparison task.

preprint2011arXiv

On Accuracy of Community Structure Discovery Algorithms

Community structure discovery in complex networks is a challenging problem spanning many applications in various disciplines such as biology, social networks and physics. Numerous algorithms, emerging from various approaches, have been proposed to tackle this problem. Nevertheless, little attention has been devoted to comparing their efficiency on realistic simulated data. To better understand their relative performance, we systematically evaluate eleven algorithms covering the main approaches. The Normalized Mutual Information (NMI) measure is used to assess the quality of the community structure discovered in controlled artificial networks with realistic topological properties. Results show that, along with the network size, the average proportion of intra-community to inter-community links is the parameter with the most influence on performance. Overall, "Infomap" is the leading algorithm, followed by "Walktrap", "SpinGlass" and "Louvain", which also achieve good consistency.
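The NMI measure named in this abstract compares a detected partition with a reference one. The sketch below uses one common normalization (twice the mutual information divided by the sum of the two entropies). Whether the study used exactly this variant is an assumption, as are the function name and the label-list input format.

```python
from collections import Counter
from math import log


def nmi(part_a, part_b):
    """Normalized mutual information between two partitions given as label lists."""
    n = len(part_a)
    if n == 0 or n != len(part_b):
        raise ValueError("partitions must be non-empty and of equal length")
    count_a = Counter(part_a)
    count_b = Counter(part_b)
    joint = Counter(zip(part_a, part_b))
    # Entropy of each partition, from the community size distribution.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    # Mutual information, from the joint distribution of community labels.
    mutual = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual / (h_a + h_b)
```

Community labels only need to be consistent within each partition: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` describes the same grouping under different names and yields the maximal score.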

37 published item(s)

Extraction and Analysis of Fictional Character Networks: A Survey

Characterizing and comparing external measures for the assessment of cluster analysis and community detection

Graph embeddings for Abusive Language Detection

Multiple Partitioning of Multiplex Signed Networks: Application to European Parliament Votes

Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs

Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots

Remembering Winter Was Coming: Character-Oriented Video Summaries of TV Series

Serial Speakers: a Dataset of TV Series

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection

A Review of Features for the Discrimination of Twitter Users: Application to the Prediction of Offline Influence

Generalized Measures for the Evaluation of Community Detection Methods

Straightness of rectilinear vs. radio-concentric networks: modeling simulation and comparison

A community role approach to assess social capitalists visibility in the Twitter network

Étude de l'omniprésence des propriétés petit-monde et sans-échelle

Interpreting communities based on the evolution of a dynamic attributed network

Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament

A Method for Characterizing Communities in Dynamic Attributed Complex Networks

Classification of Complex Networks Based on Topological Properties

Identifying the Community Roles of Social Capitalists in the Twitter Network

A Comparison of Named Entity Recognition Tools Applied to Biographical Texts

Benefits of Semantics on Web Service Composition from a Complex Network Perspective

Identification de rôles communautaires dans des réseaux orientés appliquée à Twitter

MATAWS: A Multimodal Approach for Automatic WS Semantic Annotation

On Flexible Web Services Composition Networks

Rôle communautaire des capitalistes sociaux dans Twitter

Topological Properties of Web Services Similarity Networks

Towards realistic artificial benchmark for community detection algorithms evaluation

Une méthode pour caractériser les communautés des réseaux dynamiques à attributs

Web Services Dependency Networks Analysis

Accuracy Measures for the Comparison of Classifiers

An Empirical Study of the Relation Between Community Structure and Transitivity

Comparative Evaluation of Community Detection Algorithms: A Topological Approach

Qualitative Comparison of Community Detection Algorithms

Topological measures for the analysis of wireless sensor networks

Une nouvelle mesure pour l'évaluation des méthodes de détection de communautés

Evaluation of Performance Measures for Classifiers Comparison

On Accuracy of Community Structure Discovery Algorithms