Source author record

Sidahmed Benabderrahmane

Sidahmed Benabderrahmane appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Databases Computation and Language Cryptography and Security cs.CY

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative systems improve. We instead adopt a Computational Social Science perspective and ask whether synthetic political discourse behaves like an observed online population. We construct a paired corpus of 1,789,406 posts across nine crisis events: COVID-19, the Jan. 6 Capitol attack, the 2020 and 2024 U.S. elections, Dobbs/Roe v. Wade, the 2020 BLM protests, U.S. midterms, the Utah shooting, and the U.S.-Iran war. For each event, we compare observed discourse from social platforms with synthetic discourse generated for the same context. We evaluate four dimensions: emotional intensity, structural regularity, lexical-ideological framing, and cross-event dependency, using mean gaps and dispersion evidence. Across events, synthetic discourse is fluent but population-level unrealistic. It is generally more negative and less dispersed in sentiment, structurally more regular, and lexically more abstract than observed discourse. Observed discourse instead shows broader emotional variation, longer-tailed structural distributions, and more context-specific, colloquial lexical markers. These differences are event-dependent: larger for fast-moving, decentralized crises and smaller for formal or institutionally mediated events. We summarize them with a simple event-level measure, the Caricature Gap. Our findings suggest that the main limitation of synthetic political discourse is not grammar or fluency, but reduced population realism. Population-level auditing complements traditional text-detection and provides a CSS framework for evaluating the social realism of generated discourse.

preprint2020arXiv

Categorical anomaly detection in heterogeneous data using minimum description length clustering

Fast and effective unsupervised anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets representing a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.

preprint2019arXiv

A baseline for unsupervised advanced persistent threat detection in system-level provenance

Advanced persistent threats (APT) are stealthy, sophisticated, and unpredictable cyberattacks that can steal intellectual property, damage critical infrastructure, or cause millions of dollars in damage. Detecting APTs by monitoring system-level activity is difficult because manually inspecting the high volume of normal system activity is overwhelming for security analysts. We evaluate the effectiveness of unsupervised batch and streaming anomaly detection algorithms over multiple gigabytes of provenance traces recorded on four different operating systems to determine whether they can detect realistic APT-like attacks reliably and efficiently. This report is the first detailed study of the effectiveness of generic unsupervised anomaly detection techniques in this setting.

preprint2016arXiv

Satellite Images Analysis with Symbolic Time Series: A Case Study of the Algerian Zone

Satellite Image Time Series (SITS) are an important source of information for studying land occupation and its evolution. Indeed, the very large volumes of digital data stored, usually are not ready to a direct analysis. In order to both reduce the dimensionality and information extraction, time series data mining generally gives rise to change of time series representation. In an objective of information intelligibility extracted from the representation change, we may use symbolic representations of time series. Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentiality allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. We present in this work, one of the main symbolic representation methods "SAX"(Symbolic Aggregate Approximation) and we experience this method to symbolize and reduce the dimensionality of a Satellite Image Times Series acquired over a period of 5 years by characterizing the evolution of a vegetation index (NDVI).