Source author record

Wissam Siblini

Wissam Siblini appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Machine Learning; Computation and Language

Catalog footprint

3 works
2 topics
4 close collaborators


Preprint, 2022, arXiv

The Importance of Future Information in Credit Card Fraud Detection

Fraud detection systems (FDS) mainly perform two tasks: (i) real-time detection while the payment is being processed and (ii) posterior detection, which blocks the card retrospectively to prevent further fraud. Since human verification is often necessary and payment processing time is limited, the second task handles the largest volume of transactions. The literature widely studies fraud detection challenges and algorithm performance, but the formulation of the problem itself is never questioned: it aims to predict whether a transaction is fraudulent based on its characteristics and the cardholder's past transactions. Yet in posterior detection, verification often takes days, so new payments on the card become available before a decision is taken. This motivates a new paradigm: posterior fraud detection with "future" information. We first provide evidence that subsequent transactions are available in time and can serve as extra context to improve detection. We then design a Bidirectional LSTM that makes use of these transactions. On a real-world dataset with over 30 million transactions, it outperforms a regular LSTM, the state-of-the-art fraud detection classifier, which uses only past context. We also introduce new metrics showing that the proposal catches more frauds and more compromised cards, and detects those cards from their earliest frauds. We believe that future work on this new paradigm will have a significant impact on the detection of compromised cards.
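The key idea above, that subsequent payments on the same card are often already visible when a posterior decision is made, can be illustrated with a small data-preparation sketch. This is not the authors' pipeline: `Txn`, `build_contexts`, the window sizes `k_past`/`k_future` and the 48-hour `delay` are hypothetical choices. It only shows how a "future" context window could be assembled as input for a bidirectional sequence model such as a BiLSTM.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    """Hypothetical transaction record: card id, timestamp (hours), amount."""
    card_id: str
    t: float
    amount: float


def build_contexts(txns, k_past=2, k_future=2, delay=48.0):
    """Pair each transaction with its past and "future" context on the same card.

    Past context: up to k_past earlier transactions on the card.
    Future context: up to k_future later transactions that occur within `delay`
    hours, i.e. that would already be visible when a posterior decision is made.
    Returns a list of (txn, past, future) tuples.
    """
    by_card = {}
    for tx in sorted(txns, key=lambda x: x.t):
        by_card.setdefault(tx.card_id, []).append(tx)

    contexts = []
    for seq in by_card.values():
        for i, tx in enumerate(seq):
            past = seq[max(0, i - k_past):i]
            # seq is time-sorted, so the transactions within the delay form a prefix
            future = [f for f in seq[i + 1:i + 1 + k_future] if f.t - tx.t <= delay]
            contexts.append((tx, past, future))
    return contexts


txns = [Txn("A", 0.0, 10.0), Txn("A", 5.0, 20.0), Txn("A", 30.0, 15.0),
        Txn("A", 100.0, 50.0), Txn("B", 1.0, 99.0)]
for tx, past, future in build_contexts(txns):
    # A bidirectional model would read past + [tx] + future as one sequence;
    # a past-only LSTM would read only past + [tx].
    print(tx.card_id, tx.t, [p.t for p in past], [f.t for f in future])
```

Only the future window depends on the verification delay: a transaction whose successor arrives after the decision deadline gets the same input as in the past-only formulation.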

Preprint, 2021, arXiv

Multilingual Question Answering from Formatted Text applied to Conversational Agents

Recent advances in language models (e.g., BERT, XLNet) have made it possible to surpass human performance on complex NLP tasks such as Reading Comprehension. However, labeled training datasets are available mostly in English, which makes it difficult to measure progress in other languages. Fortunately, models are now pre-trained on unlabeled data from hundreds of languages and exhibit interesting transfer abilities from one language to another. In this paper, we show that multilingual BERT is naturally capable of zero-shot transfer for an extractive Question Answering (eQA) task from English to other languages. More specifically, it outperforms the best previously known baseline for transfer to Japanese and French. Moreover, using a recently published large French eQA dataset, we further show that (1) zero-shot transfer gives results very close to direct training on the target language and (2) combining transfer with training on the target language is the best option overall. Finally, we present a practical application: a multilingual conversational agent called Kate that answers HR-related questions in several languages directly from the content of intranet pages.

Preprint, 2020, arXiv

Master your Metrics with Calibration

Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as F1-score or AUC-PR (Area Under the Precision-Recall Curve). Because such metrics depend heavily on the class prior, it is difficult to interpret how a model's performance varies across subpopulations or subperiods of a dataset. In this paper, we propose a way to calibrate these metrics so that they become invariant to the prior. We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of calibrated metrics and show that they improve interpretability and give better control over what is really measured. We describe specific real-world use cases where calibration is beneficial, such as model monitoring in production, reporting, and fairness evaluation.
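The abstract does not spell out the calibration formula. One way to make precision invariant to the prior, sketched here under that assumption, is to write it in terms of the true and false positive rates, precision = π·TPR / (π·TPR + (1 - π)·FPR), and then substitute a fixed reference prior π0 for the observed prior π. The function names and the default π0 = 0.5 are illustrative choices, and the paper's exact definition may differ in detail.

```python
def precision(tp, fp):
    """Ordinary precision; depends on the class prior of the evaluated data."""
    return tp / (tp + fp) if tp + fp else 0.0


def calibrated_precision(tp, fp, n_pos, n_neg, pi0=0.5):
    """Precision re-expressed at a reference positive-class prior pi0.

    TPR = tp / n_pos and FPR = fp / n_neg do not depend on the prior; they are
    recombined as if the positive rate were pi0. Assumes n_pos > 0 and n_neg > 0.
    """
    tpr = tp / n_pos
    fpr = fp / n_neg
    denom = pi0 * tpr + (1 - pi0) * fpr
    return pi0 * tpr / denom if denom else 0.0


def calibrated_f1(tp, fp, n_pos, n_neg, pi0=0.5):
    """F1-score built from calibrated precision and (prior-independent) recall."""
    p = calibrated_precision(tp, fp, n_pos, n_neg, pi0)
    r = tp / n_pos
    return 2 * p * r / (p + r) if p + r else 0.0


# Same classifier (TPR = 0.8, FPR = 0.1) evaluated on two subpopulations
# with different positive rates: 10% positives vs. 50% positives.
a = dict(tp=80, fp=90, n_pos=100, n_neg=900)
b = dict(tp=400, fp=50, n_pos=500, n_neg=500)
print(precision(a["tp"], a["fp"]), precision(b["tp"], b["fp"]))  # differ
print(calibrated_precision(**a), calibrated_precision(**b))      # equal
```

In this toy example the raw precision drops from about 0.89 to about 0.47 purely because positives are rarer in the first subpopulation, while the calibrated value stays the same, which is the kind of prior-driven variation the abstract describes.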