Source author record

Georges Linares

Georges Linares appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Social and Information Networks Computation and Language Information Retrieval Multimedia physics.soc-ph

Catalog footprint

What is connected

5works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Graph embeddings for Abusive Language Detection

Abusive behaviors are common on online social networks. The increasing frequency of antisocial behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged content, and one relying on the structure and dynamics of the conversation. It has the advantage of being languageindependent, however it leverages a hand-crafted set of topological measures which are computationally expensive and not necessarily suitable to all situations. In the present paper, we propose to use recent graph embedding approaches to automatically learn representations of conversational graphs depicting message exchanges. We compare two categories: node vs. whole-graph embeddings. We experiment with a total of 8 approaches and apply them to a dataset of online messages. We also study more precisely which aspects of the graph structure are leveraged by each approach. Our study shows that the representation produced by certain embeddings captures the information conveyed by specific topological measures, but misses out other aspects.

preprint2020arXiv

Serial Speakers: a Dataset of TV Series

For over a decade, TV series have been drawing increasing interest, both from the audience and from various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim at filling this gap by providing the multimedia/speech processing communities with Serial Speakers, an annotated dataset of 161 episodes from three popular American TV serials: Breaking Bad, Game of Thrones and House of Cards. Serial Speakers is suitable both for investigating multimedia retrieval in realistic use case scenarios, and for addressing lower level speech related tasks in especially challenging conditions. We publicly release annotations for every speech turn (boundaries, speaker) and scene boundary, along with annotations for shot boundaries, recurring shots, and interacting speakers in a subset of episodes. Because of copyright restrictions, the textual content of the speech turns is encrypted in the public version of the dataset, but we provide the users with a simple online tool to recover the plain text from their own subtitle files.

preprint2020arXiv

WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection

With the spread of online social networks, it is more and more difficult to monitor all the user-generated content. Automating the moderation process of the inappropriate exchange content on Internet has thus become a priority task. Methods have been proposed for this purpose, but it can be challenging to find a suitable dataset to train and develop them. This issue is especially true for approaches based on information derived from the structure and the dynamic of the conversation. In this work, we propose an original framework, based on the Wikipedia Comment corpus, with comment-level abuse annotations of different types. The major contribution concerns the reconstruction of conversations, by comparison to existing corpora, which focus only on isolated messages (i.e. taken out of their conversational context). This large corpus of more than 380k annotated messages opens perspectives for online abuse detection and especially for context-based approaches. We also propose, in addition to this corpus, a complete benchmarking platform to stimulate and fairly compare scientific works around the problem of content abuse detection, trying to avoid the recurring problem of result replication. Finally, we apply two classification methods to our dataset to demonstrate its potential.

preprint2015arXiv

Forecasting popularity of videos in YouTube

This paper proposes a new prediction process to explain and predict popularity evolution of YouTube videos. We exploit our recent study on the classification of YouTube videos in order to predict the evolution of videos' view-count. This classification allows to identify important factors of the observed popularity dynamics. Our experimental results show that our prediction process is able to reduce the average prediction errors compared to a state-of-the-art baseline model. We also evaluate the impact of adding popularity criteria in our classification.

preprint2014arXiv

Modelling View-count Dynamics in YouTube

The goal of this paper is to study the behaviour of view-count in YouTube. We first propose several bio-inspired models for the evolution of the view-count of YouTube videos. We show, using a large set of empirical data, that the view-count for 90% of videos in YouTube can indeed be associated to at least one of these models, with a Mean Error which does not exceed 5%. We derive automatic ways of classifying the view-count curve into one of these models and of extracting the most suitable parameters of the model. We study empirically the impact of videos' popularity and category on the evolution of its view-count. We finally use the above classification along with the automatic parameters extraction in order to predict the evolution of videos' view-count.