Source author record

Adrien Guille

Adrien Guille appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computation and Language; Social and Information Networks; Information Retrieval; Machine Learning; physics.soc-ph

Catalog footprint

What is connected

7 works

5 topics

4 close collaborators


Preprint, 2022, arXiv

Anchor Prediction: A Topic Modeling Approach

Networks of documents connected by hyperlinks, such as Wikipedia, are ubiquitous. Hyperlinks are inserted by the authors to enrich the text and facilitate navigation through the network. However, authors tend to insert only a fraction of the relevant hyperlinks, mainly because this is a time-consuming task. In this paper we address an annotation task, which we refer to as anchor prediction. Even though it is conceptually close to link prediction or entity linking, it is a different task that requires developing a specific method to solve it. Given a source document and a target document, this task consists in automatically identifying anchors in the source document, i.e., words or terms that should carry a hyperlink pointing towards the target document. We propose a contextualized relational topic model, CRTM, that models directed links between documents as a function of the local context of the anchor in the source document and the whole content of the target document. The model can be used to predict anchors in a source document, given the target document, without relying on a dictionary of previously seen mentions or titles, or on any external knowledge graph. Authors can benefit from CRTM by letting it automatically suggest hyperlinks, given a new document and the set of target documents to connect to. It can also benefit readers by dynamically inserting hyperlinks between the documents they are reading. Experiments conducted on several Wikipedia corpora (in English, Italian and German) highlight the practical usefulness of anchor prediction and demonstrate the relevance of our approach.

Preprint, 2020, arXiv

Document Network Projection in Pretrained Word Embedding Space

We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g., a citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities providing complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and we use the similarities to alter this average representation. The document representations can help to solve many information retrieval tasks, such as recommendation, classification and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identify relevant keywords to describe document classes.
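The two steps named in the abstract above (a word vector average per document, then an adjustment driven by pairwise similarities) can be illustrated with a minimal sketch. This is not the RLE formulation itself: the blending rule, the `mix` parameter and the function names are assumptions chosen for illustration, and the abstract does not specify how the similarities alter the average.

```python
from typing import Dict, List

Vector = List[float]


def average_word_vectors(tokens: List[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Average the pretrained vectors of the tokens that have one."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("document has no token with a pretrained vector")
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def adjust_with_similarities(
    doc_vectors: List[Vector],
    similarities: List[List[float]],
    mix: float = 0.5,
) -> List[Vector]:
    """Blend each document average with a similarity-weighted average of all documents.

    `mix` is a hypothetical weight: 0 keeps the plain average, 1 uses only
    the similarity-weighted neighbourhood (an assumption, not the paper's rule).
    """
    adjusted = []
    for i, own in enumerate(doc_vectors):
        row = similarities[i]
        total = sum(row)
        if total == 0:
            # No similar document: keep the plain average unchanged.
            adjusted.append(list(own))
            continue
        neighbourhood = [
            sum(row[j] * doc_vectors[j][d] for j in range(len(doc_vectors))) / total
            for d in range(len(own))
        ]
        adjusted.append(
            [(1 - mix) * own[d] + mix * neighbourhood[d] for d in range(len(own))]
        )
    return adjusted
```

With two toy documents whose only words are orthogonal and a similarity matrix that links them, each adjusted vector moves halfway towards the other, which is the intended effect of letting network proximity pull linked documents closer in the word embedding space.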

Preprint, 2020, arXiv

Inductive Document Network Embedding with Topic-Word Attention

Document network embedding aims at learning representations for a structured text corpus, i.e., when documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to the generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks, and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents.

Preprint, 2020, arXiv

New Datasets and a Benchmark of Document Network Embedding Methods for Scientific Expert Finding

The scientific literature is growing faster than ever. Finding an expert in a particular scientific domain has never been as hard as today because of the increasing number of publications and because of the ever-growing diversity of expertise fields. To tackle this challenge, automatic expert finding algorithms rely on the vast scientific heterogeneous network to match textual queries with potential expert candidates. In this direction, document network embedding methods seem to be an ideal choice for building representations of the scientific literature. Citation and authorship links contain major complementary information to the textual content of the publications. In this paper, we propose a benchmark for expert finding in document networks by leveraging data extracted from a scientific citation network and three scientific question & answer websites. We compare the performance of several algorithms on these different sources of data and further study the applicability of embedding methods to an expert finding task.

Preprint, 2015, arXiv

CommentWatcher: An Open Source Web-based platform for analyzing discussions on web forums

We present CommentWatcher, an open source tool aimed at analyzing discussions on web forums. Constructed as a web platform, CommentWatcher features automatic mass fetching of user posts from forums on multiple sites, extracting topics, visualizing the topics as an expression cloud and exploring their temporal evolution. The underlying social network of users is simultaneously constructed using the citation relations between users and visualized as a graph structure. Our platform addresses the issues of the diversity and dynamics of structures of webpages hosting the forums by implementing a parser architecture that is independent of the HTML structure of webpages. This allows easy on-the-fly adding of new websites. Two types of users are targeted: end users who seek to study the discussed topics and their temporal evolution, and researchers in need of establishing a forum benchmark dataset and comparing the performance of analysis tools.

Preprint, 2015, arXiv

Event detection, tracking, and visualization in Twitter: a mention-anomaly-based approach

The ever-growing number of people using Twitter makes it a valuable source of timely information. However, detecting events in Twitter is a difficult task, because tweets that report interesting events are overwhelmed by a large volume of tweets on unrelated topics. Existing methods focus on the textual content of tweets and ignore the social aspect of Twitter. In this paper we propose MABED (i.e., mention-anomaly-based event detection), a novel statistical method that relies solely on tweets and leverages the creation frequency of dynamic links (i.e., mentions) that users insert in tweets to detect significant events and estimate the magnitude of their impact over the crowd. MABED also differs from the literature in that it dynamically estimates the period of time during which each event is discussed, rather than assuming a predefined fixed duration for all events. The experiments we conducted on both English and French Twitter data show that the mention-anomaly-based approach leads to more accurate event detection and improved robustness in the presence of noisy Twitter content. Qualitatively speaking, we find that MABED helps with the interpretation of detected events by providing clear textual descriptions and precise temporal descriptions. We also show how MABED can help understand users' interests. Furthermore, we describe three visualizations designed to favor an efficient exploration of the detected events.
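The idea of scoring a word by how anomalous its mention frequency is, and of letting the data decide how long an event lasts, can be sketched as follows. This is an illustration under stated assumptions, not the MABED implementation: the expected count is assumed to be proportional to the size of each time slice, the event period is assumed to be the contiguous run of slices with the largest summed anomaly, and the function names are hypothetical.

```python
from typing import List, Tuple


def mention_anomaly(mention_counts: List[int], slice_sizes: List[int]) -> List[float]:
    """Observed minus expected count of tweets that contain the word and a mention.

    Assumption: the expected count in a slice is the slice size multiplied by
    the word's overall rate of mention-bearing tweets across the corpus.
    """
    total_tweets = sum(slice_sizes)
    if total_tweets == 0:
        raise ValueError("corpus is empty")
    rate = sum(mention_counts) / total_tweets
    return [obs - size * rate for obs, size in zip(mention_counts, slice_sizes)]


def best_interval(anomalies: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) of the contiguous slices with the largest anomaly sum."""
    if not anomalies:
        raise ValueError("no time slices")
    best_start, best_end, best_sum = 0, 0, anomalies[0]
    cur_start, cur_sum = 0, 0.0
    for i, value in enumerate(anomalies):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the interval here.
            cur_start, cur_sum = i, value
        else:
            cur_sum += value
        if cur_sum > best_sum:
            best_start, best_end, best_sum = cur_start, i, cur_sum
    return best_start, best_end, best_sum
```

In this sketch the summed anomaly over the selected interval plays the role of the magnitude of impact, and the interval bounds stand in for the dynamically estimated period of discussion, so no fixed event duration has to be chosen in advance.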

Preprint, 2013, arXiv

Predicting the Temporal Dynamics of Information Diffusion in Social Networks

Online social networks play a major role in the spread of information at a very large scale, and it becomes essential to provide means to analyse this phenomenon. In this paper we address the issue of predicting the temporal dynamics of the information diffusion process. We develop a graph-based approach built on the assumption that the macroscopic dynamics of the spreading process are explained by the topology of the network and the interactions that occur through it, between pairs of users, on the basis of properties at the microscopic level. We introduce a generic model, called T-BaSIC, and describe how to estimate its parameters from users' behaviours using machine learning techniques. Contrary to classical approaches where the parameters are fixed in advance, T-BaSIC's parameters are functions of time, which makes it possible to better approximate and adapt to the diffusion phenomenon observed in online social networks. Our proposal has been validated on real Twitter datasets. Experiments show that our approach is able to capture the particular patterns of diffusion depending on the studied sub-networks of users and topics. The results corroborate the "two-step" theory (1955), which states that information flows from media to a few "opinion leaders" who then transfer it to the mass population via social networks, and show that it applies in the online context. This work also highlights interesting recommendations for future investigations.
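The contrast drawn above between parameters fixed in advance and parameters that are functions of time can be illustrated with a generic cascade simulation. This is not T-BaSIC: the step-wise activation scheme, the `transmit_prob` callback and every name below are assumptions made for the sketch, and in the paper the parameters are estimated from users' behaviours rather than supplied by hand.

```python
import random
from typing import Callable, Dict, List, Optional


def simulate_cascade(
    graph: Dict[str, List[str]],
    seeds: List[str],
    transmit_prob: Callable[[str, str, int], float],
    max_steps: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, int]:
    """Spread information over a directed graph, one step at a time.

    `transmit_prob(sender, receiver, step)` is a hypothetical callback that
    returns the probability of transmission along an edge at a given step,
    so the diffusion parameters can vary with time instead of being constant.
    Returns the step at which each reached user became active.
    """
    rng = rng or random.Random(0)
    activated = {user: 0 for user in seeds}
    frontier = list(seeds)
    for step in range(max_steps):
        next_frontier = []
        for sender in frontier:
            for receiver in graph.get(sender, []):
                if receiver in activated:
                    continue
                if rng.random() < transmit_prob(sender, receiver, step):
                    activated[receiver] = step + 1
                    next_frontier.append(receiver)
        if not next_frontier:
            break
        frontier = next_frontier
    return activated
```

Passing a callback that decays with `step` makes the simulated volume of new activations rise and then fade, whereas a constant probability cannot express that temporal pattern; this is the design point the sketch is meant to show.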
