Source author record

Moncef Gabbouj

Moncef Gabbouj appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Catalog footprint

What is connected

45works

23topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism

Neural architecture search (NAS) has emerged as a powerful paradigm that enables researchers to automatically explore vast search spaces and discover efficient neural networks. However, NAS suffers from a critical bottleneck, i.e. the evaluation of numerous architectures during the search process demands substantial computing resources and time. In order to improve the efficiency of NAS, a series of methods have been proposed to reduce the evaluation time of neural architectures. However, they are not efficient enough and still only focus on the accuracy of architectures. Beyond classification accuracy, real-world applications increasingly demand more efficient and compact network architectures that balance multiple performance criteria. To address these challenges, we propose the SMEMNAS, a pairwise comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism. In the SMEMNAS, a surrogate model is constructed based on pairwise comparison relations to predict the accuracy ranking of architectures, rather than the absolute accuracy. Moreover, two populations cooperate with each other in the search process, i.e. a main population that guides the evolutionary process, while a vice population that enhances search diversity. Our method aims to discover high-performance models that simultaneously optimize multiple objectives. We conduct comprehensive experiments on CIFAR-10, CIFAR-100 and ImageNet datasets to validate the effectiveness of our approach. With only a single GPU searching for 0.17 days, competitive architectures can be found by SMEMNAS which achieves 78.91% accuracy with the MAdds of 570M on the ImageNet. This work makes a significant advancement in the field of NAS.

preprint2024arXiv

Class-wise Generalization Error: an Information-Theoretic Analysis

Existing generalization theories of supervised learning typically take a holistic approach and provide bounds for the expected generalization over the whole data distribution, which implicitly assumes that the model generalizes similarly for all the classes. In practice, however, there are significant variations in generalization performance among different classes, which cannot be captured by the existing generalization bounds. In this work, we tackle this problem by theoretically studying the class-generalization error, which quantifies the generalization performance of each individual class. We derive a novel information-theoretic bound for class-generalization error using the KL divergence, and we further obtain several tighter bounds using the conditional mutual information (CMI), which are significantly easier to estimate in practice. We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior. Moreover, we show that the theoretical tools developed in this paper can be applied in several applications beyond this context.

preprint2023arXiv

Blind Restoration of Real-World Audio by 1D Operational GANs

Objective: Despite numerous studies proposed for audio restoration in the literature, most of them focus on an isolated restoration problem such as denoising or dereverberation, ignoring other artifacts. Moreover, assuming a noisy or reverberant environment with limited number of fixed signal-to-distortion ratio (SDR) levels is a common practice. However, real-world audio is often corrupted by a blend of artifacts such as reverberation, sensor noise, and background audio mixture with varying types, severities, and duration. In this study, we propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs) with temporal and spectral objective metrics to enhance the quality of restored audio signal regardless of the type and severity of each artifact corrupting it. Methods: 1D Operational-GANs are used with generative neuron model optimized for blind restoration of any corrupted audio signal. Results: The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with a random blend of artifacts each with a random severity to mimic real-world audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods. Significance: This is a pioneer study in blind audio restoration with the unique capability of direct (time-domain) restoration of real-world audio whilst achieving an unprecedented level of performance for a wide SDR range and artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally effective real-world audio restoration with significantly improved performance. The source codes and the generated real-world audio datasets are shared publicly with the research community in a dedicated GitHub repository1.

preprint2023arXiv

WLD-Reg: A Data-dependent Within-layer Diversity Regularizer

Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization, where the errors are back-propagated from the last layer back to the first one. At each optimization step, neurons at a given layer receive feedback from neurons belonging to higher layers of the hierarchy. In this paper, we propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer. To this end, we measure the pairwise similarity between the outputs of the neurons and use it to model the layer's overall diversity. We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks. The code is publically available at \url{https://github.com/firasl/AAAI-23-WLD-Reg}

preprint2022arXiv

A Survey on Mobile Edge Computing for Video Streaming: Opportunities and Challenges

5G communication brings substantial improvements in the quality of service provided to various applications by achieving higher throughput and lower latency. However, interactive multimedia applications (e.g., ultra high definition video conferencing, 3D and multiview video streaming, crowd-sourced video streaming, cloud gaming, virtual and augmented reality) are becoming more ambitious with high volume and low latency video streams putting strict demands on the already congested networks. Mobile Edge Computing (MEC) is an emerging paradigm that extends cloud computing capabilities to the edge of the network i.e., at the base station level. To meet the latency requirements and avoid the end-to-end communication with remote cloud data centers, MEC allows to store and process video content (e.g., caching, transcoding, pre-processing) at the base stations. Both video on demand and live video streaming can utilize MEC to improve existing services and develop novel use cases, such as video analytics, and targeted advertisements. MEC is expected to reshape the future of video streaming by providing ultra-reliable and low latency streaming (e.g., in augmented reality, virtual reality, and autonomous vehicles), pervasive computing (e.g., in real-time video analytics), and blockchain-enabled architecture for secure live streaming. This paper presents a comprehensive survey of recent developments in MEC-enabled video streaming bringing unprecedented improvement to enable novel use cases. A detailed review of the state-of-the-art is presented covering novel caching schemes, optimal computation offloading, cooperative caching and offloading and the use of artificial intelligence (i.e., machine learning, deep learning, and reinforcement learning) in MEC-assisted video streaming services.

preprint2022arXiv

Blind ECG Restoration by Operational Cycle-GANs

Continuous long-term monitoring of electrocardiography (ECG) signals is crucial for the early detection of cardiac abnormalities such as arrhythmia. Non-clinical ECG recordings acquired by Holter and wearable ECG sensors often suffer from severe artifacts such as baseline wander, signal cuts, motion artifacts, variations on QRS amplitude, noise, and other interferences. Usually, a set of such artifacts occur on the same ECG signal with varying severity and duration, and this makes an accurate diagnosis by machines or medical doctors extremely difficult. Despite numerous studies that have attempted ECG denoising, they naturally fail to restore the actual ECG signal corrupted with such artifacts due to their simple and naive noise model. In this study, we propose a novel approach for blind ECG restoration using cycle-consistent generative adversarial networks (Cycle-GANs) where the quality of the signal can be improved to a clinical level ECG regardless of the type and severity of the artifacts corrupting the signal. To further boost the restoration performance, we propose 1D operational Cycle-GANs with the generative neuron model. The proposed approach has been evaluated extensively using one of the largest benchmark ECG datasets from the China Physiological Signal Challenge (CPSC-2020) with more than one million beats. Besides the quantitative and qualitative evaluations, a group of cardiologists performed medical evaluations to validate the quality and usability of the restored ECG, especially for an accurate arrhythmia diagnosis.

preprint2022arXiv

Global ECG Classification by Self-Operational Neural Networks with Feature Injection

Objective: Global (inter-patient) ECG classification for arrhythmia detection over Electrocardiogram (ECG) signal is a challenging task for both humans and machines. The main reason is the significant variations of both normal and arrhythmic ECG patterns among patients. Automating this process with utmost accuracy is, therefore, highly desirable due to the advent of wearable ECG sensors. However, even with numerous deep learning approaches proposed recently, there is still a notable gap in the performance of global and patient-specific ECG classification performances. This study proposes a novel approach to narrow this gap and propose a real-time solution with shallow and compact 1D Self-Organized Operational Neural Networks (Self-ONNs). Methods: In this study, we propose a novel approach for inter-patient ECG classification using a compact 1D Self-ONN by exploiting morphological and timing information in heart cycles. We used 1D Self-ONN layers to automatically learn morphological representations from ECG data, enabling us to capture the shape of the ECG waveform around the R peaks. We further inject temporal features based on RR interval for timing characterization. The classification layers can thus benefit from both temporal and learned features for the final arrhythmia classification. Results: Using the MIT-BIH arrhythmia benchmark database, the proposed method achieves the highest classification performance ever achieved, i.e., 99.21% precision, 99.10% recall, and 99.15% F1-score for normal (N) segments; 82.19% precision, 82.50% recall, and 82.34% F1-score for the supra-ventricular ectopic beat (SVEBs); and finally, 94.41% precision, 96.10% recall, and 95.2% F1-score for the ventricular-ectopic beats (VEBs).

preprint2022arXiv

Human experts vs. machines in taxa recognition

The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hierarchy in detail. We used support vector machine classifier as a benchmark. Our results revealed that human experts using actual specimens yield the lowest classification error ($\overline{CE}=6.1\%$). However, a much faster, automated approach using deep Convolutional Neural Nets comes close to human accuracy ($\overline{CE}=11.4\%$) when a typical flat classification approach is used. Contrary to previous findings in the literature, we find that for machines following a typical flat classification approach commonly used in machine learning performs better than forcing machines to adopt a hierarchical, local per parent node approach used by human taxonomic experts ($\overline{CE}=13.8\%$). Finally, we publicly share our unique dataset to serve as a public benchmark dataset in this field.

preprint2022arXiv

Non-Linear Spectral Dimensionality Reduction Under Uncertainty

In this paper, we consider the problem of non-linear dimensionality reduction under uncertainty, both from a theoretical and algorithmic perspectives. Since real-world data usually contain measurements with uncertainties and artifacts, the input space in the proposed framework consists of probability distributions to model the uncertainties associated with each sample. We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches, e.g., KPCA, MDA/KMFA, to receive as inputs the probability distributions instead of the original data. We show that the proposed NGEU formulation exhibits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework. Empirical results on different datasets show the effectiveness of the proposed framework.

preprint2022arXiv

Self-attention fusion for audiovisual emotion recognition with incomplete data

In this paper, we consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition. We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms. While most of the previous works consider the ideal scenario of presence of both modalities at all times during inference, we evaluate the robustness of the model in the unconstrained settings where one modality is absent or noisy, and propose a method to mitigate these limitations in a form of modality dropout. Most importantly, we find that following this approach not only improves performance drastically under the absence/noisy representations of one modality, but also improves the performance in a standard ideal setting, outperforming the competing methods.

preprint2022arXiv

Self-Attention Neural Bag-of-Features

In this work, we propose several attention formulations for multivariate sequence data. We build on top of the recently introduced 2D-Attention and reformulate the attention learning methodology by quantifying the relevance of feature/temporal dimensions through latent spaces based on self-attention rather than learning them directly. In addition, we propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information without treating feature and temporal representations independently. The proposed approaches can be used in various architectures and we specifically evaluate their application together with Neural Bag of Features feature extraction module. Experiments on several sequence data analysis tasks show the improved performance yielded by our approach compared to standard methods.

preprint2021arXiv

BM3D vs 2-Layer ONN

Despite their recent success on image denoising, the need for deep and complex architectures still hinders the practical usage of CNNs. Older but computationally more efficient methods such as BM3D remain a popular choice, especially in resource-constrained scenarios. In this study, we aim to find out whether compact neural networks can learn to produce competitive results as compared to BM3D for AWGN image denoising. To this end, we configure networks with only two hidden layers and employ different neuron models and layer widths for comparing the performance with BM3D across different AWGN noise levels. Our results conclusively show that the recently proposed self-organized variant of operational neural networks based on a generative neuron model (Self-ONNs) is not only a better choice as compared to CNNs, but also provide competitive results as compared to BM3D and even significantly surpass it for high noise levels.

preprint2021arXiv

Ensembling object detectors for image and video data analysis

In this paper, we propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data. We further extend it to video data by proposing a two-stage tracking-based scheme for detection refinement. The proposed method can be used as a standalone approach for improving object detection performance, or as a part of a framework for faster bounding box annotation in unseen datasets, assuming that the objects of interest are those present in some common public datasets.

preprint2021arXiv

Neural Architecture Search by Estimation of Network Structure Distributions

The influence of deep learning is continuously expanding across different domains, and its new applications are ubiquitous. The question of neural network design thus increases in importance, as traditional empirical approaches are reaching their limits. Manual design of network architectures from scratch relies heavily on trial and error, while using existing pretrained models can introduce redundancies or vulnerabilities. Automated neural architecture design is able to overcome these problems, but the most successful algorithms operate on significantly constrained design spaces, assuming the target network to consist of identical repeating blocks. While such approach allows for faster search, it does so at the cost of expressivity. We instead propose an alternative probabilistic representation of a whole neural network structure under the assumption of independence between layer types. Our matrix of probabilities is equivalent to the population of models, but allows for discovery of structural irregularities, while being simple to interpret and analyze. We construct an architecture search algorithm, inspired by the estimation of distribution algorithms, to take advantage of this representation. The probability matrix is tuned towards generating high-performance models by repeatedly sampling the architectures and evaluating the corresponding networks, while gradually increasing the model depth. Our algorithm is shown to discover non-regular models which cannot be expressed via blocks, but are competitive both in accuracy and computational cost, while not utilizing complex dataflows or advanced training techniques, as well as remaining conceptually simple and highly extensible.

preprint2021arXiv

Robust R-Peak Detection in Low-Quality Holter ECGs using 1D Convolutional Neural Network

Noise and low quality of ECG signals acquired from Holter or wearable devices deteriorate the accuracy and robustness of R-peak detection algorithms. This paper presents a generic and robust system for R-peak detection in Holter ECG signals. While many proposed algorithms have successfully addressed the problem of ECG R-peak detection, there is still a notable gap in the performance of these detectors on such low-quality ECG records. Therefore, in this study, a novel implementation of the 1D Convolutional Neural Network (CNN) is used integrated with a verification model to reduce the number of false alarms. This CNN architecture consists of an encoder block and a corresponding decoder block followed by a sample-wise classification layer to construct the 1D segmentation map of R- peaks from the input ECG signal. Once the proposed model has been trained, it can solely be used to detect R-peaks possibly in a single channel ECG data stream quickly and accurately, or alternatively, such a solution can be conveniently employed for real-time monitoring on a lightweight portable device. The model is tested on two open-access ECG databases: The China Physiological Signal Challenge (2020) database (CPSC-DB) with more than one million beats, and the commonly used MIT-BIH Arrhythmia Database (MIT-DB). Experimental results demonstrate that the proposed systematic approach achieves 99.30% F1-score, 99.69% recall, and 98.91% precision in CPSC-DB, which is the best R-peak detection performance ever achieved. Compared to all competing methods, the proposed approach can reduce the false-positives and false-negatives in Holter ECG signals by more than 54% and 82%, respectively. Results also demonstrate similar or better performance than most competing algorithms on MIT-DB with 99.83% F1-score, 99.85% recall, and 99.82% precision.

preprint2021arXiv

Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer

Automatic classification of speech commands has revolutionized human computer interactions in robotic applications. However, employed recognition models usually follow the methodology of deep learning with complicated networks which are memory and energy hungry. So, there is a need to either squeeze these complicated models or use more efficient light-weight models in order to be able to implement the resulting classifiers on embedded devices. In this paper, we pick the second approach and propose a network layer to enhance the speech command recognition capability of a lightweight network and demonstrate the result via experiments. The employed method borrows the ideas of Taylor expansion and quadratic forms to construct a better representation of features in both input and hidden layers. This richer representation results in recognition accuracy improvement as shown by extensive experiments on Google speech commands (GSC) and synthetic speech commands (SSC) datasets.

Moncef Gabbouj

What is connected

Connect this record

See the researcher in context

Building this map preview

45 published item(s)

A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism

Class-wise Generalization Error: an Information-Theoretic Analysis

Blind Restoration of Real-World Audio by 1D Operational GANs

WLD-Reg: A Data-dependent Within-layer Diversity Regularizer

A Survey on Mobile Edge Computing for Video Streaming: Opportunities and Challenges

Blind ECG Restoration by Operational Cycle-GANs

Global ECG Classification by Self-Operational Neural Networks with Feature Injection

Human experts vs. machines in taxa recognition

Non-Linear Spectral Dimensionality Reduction Under Uncertainty

Self-attention fusion for audiovisual emotion recognition with incomplete data

Self-Attention Neural Bag-of-Features

BM3D vs 2-Layer ONN

Ensembling object detectors for image and video data analysis

Neural Architecture Search by Estimation of Network Structure Distributions

Robust R-Peak Detection in Low-Quality Holter ECGs using 1D Convolutional Neural Network

Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer

A Review of Vibration-Based Damage Detection in Civil Structures: From Traditional Methods to Machine Learning and Deep Learning Applications

aColor: Mechatronics, Machine Learning, and Communications in an Unmanned Surface Vehicle

Attention-based Neural Bag-of-Features Learning for Sequence Data

Automatic image-based identification and biomass estimation of invertebrates

Benchmark Dataset for Mid-Price Forecasting of Limit Order Book Data with Machine Learning Methods

Collaborative Multi-Robot Systems for Search and Rescue: Coordination and Perception

Data Normalization for Bilinear Structures in High-Frequency Financial Time-series

Ellipsoidal Subspace Support Vector Data Description

Exploiting Heterogeneity in Operational Neural Networks by Synaptic Plasticity

FastONN -- Python based open-source GPU implementation for Operational Neural Networks

Incremental Fast Subclass Discriminant Analysis

Monte Carlo Dropout Ensembles for Robust Illumination Estimation

Multilinear Compressive Learning with Prior Knowledge

Multimodal Subspace Support Vector Data Description

Not all domains are equally complex: Adaptive Multi-Domain Learning

Operational vs Convolutional Neural Networks for Image Denoising

Performance Indicator in Multilinear Compressive Learning

Saliency-based Weighted Multi-label Linear Discriminant Analysis

Self-Organized Operational Neural Networks for Severe Image Restoration Problems

Self-Organized Operational Neural Networks with Generative Neurons

Subset Sampling For Progressive Neural Network Learning

Bag of Color Features For Color Constancy

Color Constancy Convolutional Autoencoder

Mid-price Prediction Based on Machine Learning Methods with Technical and Quantitative Indicators

Limited Random Walk Algorithm for Big Graph Data Clustering

Outlier Edge Detection Using Random Graph Generation Models and Applications

Video Ladder Networks

On the Dynamics of a Recurrent Hopfield Network

ShakeMe: Key Generation From Shared Motion