Researcher profile

Camille Roth

Camille Roth contributes to research discovery and scholarly infrastructure.

Researcher - Affiliation not imported - Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
10 topics
4 close collaborators

Published work

10 published items

preprint - 2021 - arXiv

Semantic Hypergraphs

Approaches to natural language processing (NLP) may be classified along a double dichotomy open/opaque - strict/adaptive. The former axis relates to the possibility of inspecting the underlying processing rules, the latter to the use of fixed or adaptive rules. We argue that many techniques fall into either the open-strict or opaque-adaptive categories. Our contribution takes steps in the open-adaptive direction, which we suggest is likely to provide key instruments for interdisciplinary research. The central idea of our approach is the Semantic Hypergraph (SH), a novel knowledge representation model that is intrinsically recursive and accommodates the natural hierarchical richness of natural language. The SH model is hybrid in two senses. First, it attempts to combine the strengths of machine learning (ML) and symbolic approaches. Second, it is a formal language representation that reduces but tolerates ambiguity and structural variability. We will see that SH enables simple yet powerful methods of pattern detection, and offers a good compromise in intelligibility for both humans and machines. It also provides a semantically deep starting point (in terms of explicit meaning) for further algorithms to operate and collaborate on. We show how modern NLP ML-based building blocks can be used in combination with a random forest classifier and a simple search tree to parse natural language (NL) into SH, and that this parser can achieve high precision across a diverse range of text categories. We define a pattern language representable in SH itself, and a process to discover knowledge inference rules. We then illustrate the efficiency of the SH framework in a variety of tasks, including conjunction decomposition, open information extraction, concept taxonomy inference and co-reference resolution, and an applied example of claim and conflict analysis in a news corpus.

preprint - 2020 - arXiv

Tubes & Bubbles -- Topological confinement of YouTube recommendations

The role of recommendation algorithms in online user confinement is at the heart of a fast-growing literature. Recent empirical studies generally suggest that filter bubbles may principally be observed in the case of explicit recommendation (based on user-declared preferences) rather than implicit recommendation (based on user activity). We focus on YouTube, which has become a major online content provider but where confinement has until now been little studied in a systematic manner. Starting from a diverse set of seed videos, we first describe the properties of the sets of suggested videos in order to design a sound exploration protocol able to capture latent recommendation graphs recursively induced by these suggestions. These graphs form the background of potential user navigations along non-personalized recommendations. From there, be it in topological, topical or temporal terms, we show that the landscape of what we call mean-field YouTube recommendations is often prone to confinement dynamics. Moreover, the most confined recommendation graphs, i.e., potential bubbles, seem to be organized around sets of videos that garner the highest audience and thus plausibly viewing time.

preprint - 2019 - arXiv

Interactional and Informational Attention on Twitter

Twitter may be considered as a decentralized social information processing platform whose users constantly receive their followees' information feeds, which they may in turn dispatch to their followers. This decentralization is not devoid of hierarchy and heterogeneity, both in terms of activity and attention. In particular, we appraise the distribution of attention at the collective and individual level, which exhibits the existence of attentional constraints and focus effects. We observe that most users usually concentrate their attention on a limited core of peers and topics, and discuss the relationship between interactional and informational attention processes -- all of which, we suggest, may be useful to refine influence models by enabling the consideration of differential attention likelihood depending on users, their activity levels and peers' positions.

preprint - 2018 - arXiv

Large-scale diversity estimation through surname origin inference

The study of surnames as both linguistic and geographical markers of the past has proven valuable in several research fields spanning from biology and genetics to demography and social mobility. This article builds upon the existing literature to conceive and develop a surname origin classifier based on a data-driven typology. This enables us to explore a methodology to describe large-scale estimates of the relative diversity of social groups, especially when such data is scarcely available. We subsequently analyze the representativeness of surname origins for 15 socio-professional groups in France.

preprint - 2014 - arXiv

Symbolic regression of generative network models

Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world networks. We were able to find programs that are simple enough to lead to an actual understanding of the mechanisms proposed, namely for a simple brain and a social network.

preprint - 2012 - arXiv

A long-time limit of world subway networks

We study the temporal evolution of the structure of the world's largest subway networks in an exploratory manner. We show that, remarkably, all these networks converge to a shape which shares similar generic features despite their geographic and economic differences. This limiting shape is made of a core with branches radiating from it. For most of these networks, the average degree of a node (station) within the core has a value of order 2.5 and the proportion of k=2 nodes in the core is larger than 60%. The number of branches scales roughly as the square root of the number of stations, the current proportion of branches represents about half of the total number of stations, and the average diameter of branches is about twice the average radial extension of the core. Spatial measures such as the number of stations at a given distance to the barycenter display a first regime which grows as r^2 followed by another regime with different exponents, and eventually saturate. These results -- difficult to interpret in the framework of fractal geometry -- confirm, and find a natural explanation in, the geometric picture of this core and its branches: the first regime corresponds to a uniform core, while the second regime is controlled by the interstation spacing on branches. The apparent convergence towards a unique network shape in the temporal limit suggests the existence of dominant, universal mechanisms governing the evolution of these structures.
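The spatial measure mentioned in this abstract can be illustrated with a small sketch. It assumes that "the number of stations at a given distance to the barycenter" is read cumulatively (stations within radius r) and that stations are given as planar (x, y) coordinates; both are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math

def barycenter(points):
    """Mean position of a list of (x, y) station coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def stations_within(points, r):
    """Count stations whose distance to the barycenter is at most r.

    Assumption: planar coordinates and a cumulative reading of the measure.
    """
    cx, cy = barycenter(points)
    return sum(math.hypot(x - cx, y - cy) <= r for x, y in points)

# Hypothetical toy data: a uniform 7x7 grid standing in for a uniform core.
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
counts = [stations_within(grid, r) for r in (1, 2)]
```

For a core of uniform station density, a count of this kind grows with the area of the disc, which is consistent with the r^2 regime the abstract describes.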

preprint - 2012 - arXiv

Generating constrained random graphs using multiple edge switches

The generation of random graphs using edge swaps provides a reliable method to draw uniformly random samples of sets of graphs respecting some simple constraints, e.g. degree distributions. However, in general, it is not necessarily possible to access all graphs obeying some given constraints through a classical switching procedure calling on pairs of edges. We therefore propose to get round this issue by generalizing this classical approach through the use of higher-order edge switches. This method, which we denote by "k-edge switching", makes it possible to progressively improve the covered portion of a set of constrained graphs, thereby providing an increasing, asymptotically certain confidence on the statistical representativeness of the obtained sample.
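For orientation, the classical pairwise switching procedure that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the standard degree-preserving double edge swap on a simple undirected graph, not the paper's k-edge switching; the function names and the `max_tries` cap are assumptions, and the sketch makes no claim about the uniformity of the resulting sample.

```python
import random

def double_edge_swap(edges, n_swaps, seed=0, max_tries=10_000):
    """Rewire a simple undirected graph while preserving every node degree.

    edges: list of (u, v) pairs with u != v and no duplicate edges.
    Returns a new edge list after at most n_swaps successful swaps.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both rewiring orientations
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would create a multi-edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Degree of each node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Hypothetical toy graph: a 6-node ring with one chord.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
rewired = double_edge_swap(ring, n_swaps=10)
```

Each accepted swap replaces two edges with two others on the same four endpoints, so the degree of every node is unchanged; proposals that would create a self-loop or a duplicate edge are rejected.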

preprint - 2011 - arXiv

Intrinsically Dynamic Network Communities

Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches nonetheless achieve longitudinal rather than dynamic community detection. We assume that communities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phenomena - such as communities based on repeated inter-temporal interactions, nodes switching from one community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed datasets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. Finally, we introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection - taking into account the fact that various empirical contexts may call for distinct 'community' definitions and detection criteria.

preprint - 2010 - arXiv

Academic team formation as evolving hypergraphs

This paper quantitatively explores the social and socio-semantic patterns of constitution of academic collaboration teams. To this end, we broadly underline two critical features of social networks of knowledge-based collaboration: first, they essentially consist of group-level interactions which call for team-centered approaches. Formally, this induces the use of hypergraphs and n-adic interactions, rather than traditional dyadic frameworks of interaction such as graphs, binding only pairs of agents. Second, we advocate the joint consideration of structural and semantic features, as collaborations are allegedly constrained by both of them. Considering these provisions, we propose a framework which principally enables us to empirically test a series of hypotheses related to academic team formation patterns. In particular, we exhibit and characterize the influence of an implicit group structure driving recurrent team formation processes. On the whole, innovative production does not appear to be correlated with more original teams, while a polarization appears between groups composed of experts only or non-experts only, altogether corresponding to collectives with a high rate of repeated interactions.
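The contrast drawn in this abstract between hypergraphs and dyadic graphs can be made concrete with a short sketch. It assumes teams are represented as frozensets of hypothetical author names; the function name is illustrative and not taken from the paper.

```python
from itertools import combinations

def dyadic_projection(hyperedges):
    """Project team hyperedges onto the set of co-author pairs they imply."""
    pairs = set()
    for team in hyperedges:
        pairs.update(frozenset(p) for p in combinations(sorted(team), 2))
    return pairs

# One three-person team (a single n-adic interaction) ...
one_team = [frozenset({"ana", "bo", "cy"})]
# ... versus three separate two-person collaborations.
three_pairs = [
    frozenset({"ana", "bo"}),
    frozenset({"bo", "cy"}),
    frozenset({"ana", "cy"}),
]

same_graph = dyadic_projection(one_team) == dyadic_projection(three_pairs)
```

Both collaboration histories project onto the same triangle of pairs, so the dyadic graph cannot tell them apart, whereas the hyperedge lists keep the team-level structure.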

preprint - 2010 - arXiv

Precursors and Laggards: An Analysis of Semantic Temporal Relationships on a Blog Network

We explore the hypothesis that it is possible to obtain information about the dynamics of a blog network by analysing the temporal relationships between blogs at a semantic level, and that this type of analysis adds to the knowledge that can be extracted by studying the network only at the structural level of URL links. We present an algorithm to automatically detect fine-grained discussion topics, characterized by n-grams and time intervals. We then propose a probabilistic model to estimate the temporal relationships that blogs have with one another. We define the precursor score of blog A in relation to blog B as the probability that A enters a new topic before B, discounting the effect created by asymmetric posting rates. Network-level metrics of precursor and laggard behavior are derived from these dyadic precursor score estimations. This model is used to analyze a network of French political blogs. The scores are compared to traditional link degree metrics. We obtain insights into the dynamics of topic participation on this network, as well as the relationship between precursor/laggard and linking behaviors. We validate and analyze results with the help of an expert on the French blogosphere. Finally, we propose possible applications to the improvement of search engine ranking algorithms.
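A simplified version of the precursor idea described in this abstract can be sketched as below. It assumes each blog is summarized by the time at which it first entered each topic, and it computes only the raw fraction of shared topics that one blog entered first. The discounting of asymmetric posting rates is deliberately omitted, since the abstract does not specify it, and all names and data are hypothetical.

```python
def naive_precursor_score(first_seen, a, b):
    """Fraction of shared topics that blog `a` entered strictly before blog `b`.

    first_seen: dict mapping blog -> {topic: first entry time}.
    Returns None when the two blogs share no topic.
    Assumption: no correction for asymmetric posting rates is applied.
    """
    shared = first_seen[a].keys() & first_seen[b].keys()
    if not shared:
        return None
    earlier = sum(first_seen[a][t] < first_seen[b][t] for t in shared)
    return earlier / len(shared)

# Hypothetical first-entry times (arbitrary time units).
first_seen = {
    "A": {"t1": 1, "t2": 5, "t3": 2},
    "B": {"t1": 3, "t2": 4, "t3": 9, "t4": 1},
}
score_ab = naive_precursor_score(first_seen, "A", "B")
score_ba = naive_precursor_score(first_seen, "B", "A")
```

In this toy data, blog A enters two of the three shared topics first, so the raw score favors A; a blog that simply posts more often would tend to be favored in the same way, which is the bias the abstract's discount is meant to address.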