Source author record

Ashwin Geet D'Sa

Ashwin Geet D'Sa appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language · Machine Learning

Catalog footprint

What is connected

2 works

2 topics

4 close collaborators


Preprint · 2022 · arXiv

Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online

Even though hate speech (HS) online has been an important object of research in the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS, which limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus, a corpus of ~9k German and French user comments collected from migration-related news articles. It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech, ranging from critical comments to implicit and explicit expressions of hate. The annotations are performed by 4 native speakers per language and achieve high inter-annotator agreement (0.77 ≤ κ ≤ 1). Besides describing the corpus creation and presenting insights from a content, error and domain analysis, we explore its data characteristics by training several classification baselines.

Preprint · 2020 · arXiv

Towards non-toxic landscapes: Automatic toxic comment detection using DNN

The spectacular expansion of the Internet has given rise to a new research problem in natural language processing: automatic toxic comment detection, which matters because many countries prohibit hate speech in public media. There is no clear, formal definition of hate, offensive, toxic, or abusive speech. In this article, we group all of these terms under the umbrella of "toxic" speech. The contribution of this paper is the design of binary classification and regression-based approaches that predict whether a comment is toxic. We compare different unsupervised word representations and different DNN-based classifiers. Moreover, we study the robustness of the proposed approaches to adversarial attacks that add a single (healthy or toxic) word. We evaluate the proposed methodology on the English Wikipedia Detox corpus. Our experiments show that fine-tuning BERT outperforms feature-based BERT, Mikolov's, and fastText representations combined with different DNN classifiers.
