Source author record

AbdelRahim Elmadany

AbdelRahim Elmadany appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Computation and Language; Social and Information Networks; Artificial Intelligence; Machine Learning

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

Preprint, 2022, arXiv

AraT5: Text-to-Text Transformers for Arabic Language Generation

Transfer learning with a unified Transformer framework (T5) that converts all language problems into a text-to-text format was recently proposed as a simple and effective transfer learning approach. Although a multilingual version of the T5 model (mT5) was also introduced, it is not clear how well it can fare on non-English tasks involving diverse data. To investigate this question, we apply mT5 on a language with a wide variety of dialects--Arabic. For evaluation, we introduce a novel benchmark for ARabic language GENeration (ARGEN), covering seven important tasks. For model comparison, we pre-train three powerful Arabic T5-style models and evaluate them on ARGEN. Although pre-trained with ~49% less data, our new models perform significantly better than mT5 on all ARGEN tasks (in 52 out of 59 test sets) and set several new SOTAs. Our models also establish new SOTA on the recently-proposed, large Arabic language understanding evaluation benchmark ARLUE (Abdul-Mageed et al., 2021). Our new models are publicly available. We also link to ARGEN datasets through our repository: https://github.com/UBC-NLP/araT5.

Preprint, 2022, arXiv

TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation

We present TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA). TURJUMAN exploits the recently-introduced text-to-text Transformer AraT5 model, endowing it with a powerful ability to decode into Arabic. The toolkit offers the possibility of employing a number of diverse decoding methods, making it suited for acquiring paraphrases for the MSA translations as an added value. To train TURJUMAN, we sample from publicly available parallel data employing a simple semantic similarity method to ensure data quality. This allows us to prepare and release AraOPUS-20, a new machine translation benchmark. We publicly release our translation toolkit (TURJUMAN) as well as our benchmark dataset (AraOPUS-20).

Preprint, 2021, arXiv

Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19

We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covers 268 countries), longitudinal (goes as far back as 2007), multilingual (comes in 100+ languages), and has a significant number of location-tagged tweets (~169M tweets). We release tweet IDs from the dataset. We also develop and release two powerful models, one for identifying whether or not a tweet is related to the pandemic (best F1=97%) and another for detecting misinformation about COVID-19 (best F1=92%). A human annotation study reveals the utility of our models on a subset of Mega-COV. Our data and models can be useful for studying a wide host of phenomena related to the pandemic. Mega-COV and our models are publicly available.

Preprint, 2020, arXiv

Holy Tweets: Exploring the Sharing of Quran on Twitter

While social media offer users a platform for self-expression, identity exploration, and community management, among other functions, they also offer space for religious practice and expression. In this paper, we explore social media spaces as they subtend new forms of religious experiences and rituals. We present a mixed-method study to understand the practice of sharing Quran verses on Arabic Twitter in their cultural context by combining a quantitative analysis of the most shared Quran verses, the topics covered by these verses, and the modalities of sharing, with a qualitative study of users' goals. This analysis of a set of 2.6 million tweets containing Quran verses demonstrates that online religious expression in the form of sharing Quran verses both extends offline religious life and supports new forms of religious expression including goals such as doing good deeds, giving charity, holding memorials, and showing solidarity. By analysing the responses to a survey, we found that our Arab Muslim respondents conceptualize social media platforms as everlasting, at least beyond their lifetimes, where they consider them to be effective for certain religious practices, such as reciting Quran, supplication (dua), and ceaseless charity. Our quantitative analysis of the most shared verses of the Quran underlines this commitment to religious expression as an act of worship, highlighting topics such as the hereafter, God's mercy, and sharia law. We note that verses on topics such as jihad are shared much less often, contradicting some media representation of Muslim social media use and practice.

Preprint, 2020, arXiv

Leveraging Affective Bidirectional Transformers for Offensive Language Detection

Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech. In this work, we report our submission to the Offensive Language and hate-speech Detection shared task organized with the 4th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT4). We focus on developing purely deep learning systems, without a need for feature engineering. For that purpose, we develop an effective method for automatic data augmentation and show the utility of training both offensive and hate speech models off (i.e., by fine-tuning) previously trained affective models (i.e., sentiment and emotion). Our best models are significantly better than a vanilla BERT model, with 89.60% acc (82.31% macro F1) for hate speech and 95.20% acc (70.51% macro F1) on official TEST data.
