Source author record

Mohammad Akbari

Mohammad Akbari appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision · eess.IV · Machine Learning · eess.SP · Computation and Language · math.OC · Multimedia

Catalog footprint

What is connected

11 works

7 topics

4 close collaborators


Preprint · 2026 · arXiv

CPPO: Contrastive Perception for Vision Language Policy Optimization

We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language models, extending it to multimodal reasoning requires improving both perception and reasoning. Prior works tackle this challenge mainly with explicit perception rewards, but disentangling perception tokens from reasoning tokens is difficult: existing approaches require extra LLMs, ground-truth data, forced separation of perception from reasoning by the policy model, or rewards applied indiscriminately to all output tokens. CPPO addresses this problem by detecting perception tokens via entropy shifts in the model outputs under perturbed input images. CPPO then extends the RL objective function with a Contrastive Perception Loss (CPL) that enforces consistency under information-preserving perturbations and sensitivity under information-removing ones. Experiments show that CPPO surpasses previous perception-rewarding methods while avoiding extra models, making training more efficient and scalable.
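The perception-token detection step can be illustrated with a minimal sketch, assuming the per-token next-token distributions are already available for a clean image and a perturbed (information-removing) image. The function names, the use of the absolute entropy change, and the single threshold are hypothetical choices for illustration, not the paper's exact criterion; the Contrastive Perception Loss itself is not shown.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def perception_token_mask(
    clean_dists: List[Sequence[float]],
    perturbed_dists: List[Sequence[float]],
    threshold: float,
) -> List[bool]:
    """Flag tokens whose entropy shifts by more than `threshold` under image perturbation.

    Hypothetical rule: a token counts as perception-dependent if changing the
    visual input noticeably changes how certain the model is about it.
    """
    if len(clean_dists) != len(perturbed_dists):
        raise ValueError("both runs must score the same output tokens")
    return [
        abs(token_entropy(p) - token_entropy(c)) > threshold
        for c, p in zip(clean_dists, perturbed_dists)
    ]


clean = [[0.9, 0.05, 0.05], [0.6, 0.2, 0.2]]
perturbed = [[0.4, 0.3, 0.3], [0.6, 0.2, 0.2]]
print(perception_token_mask(clean, perturbed, threshold=0.3))  # → [True, False]
```

Only the first token becomes much less certain when the image is perturbed, so only it is flagged; a perception-aware loss would then be applied to the flagged positions rather than to every output token.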

Preprint · 2022 · arXiv

A Deep Learning-Based Approach for Cell Outage Compensation in NOMA Networks

Cell outage compensation enables a network to react quickly to a catastrophic cell failure and to keep serving users in the outage zone without interruption. Exploiting the benefits of non-orthogonal multiple access (NOMA) for improving the throughput of cell-edge users, we propose a new NOMA-based cell outage compensation scheme. In this scheme, compensation is formulated as a mixed-integer non-linear program (MINLP) in which outage-zone users are associated with neighboring cells and their power is allocated so as to maximize spectral efficiency, subject to maintaining the quality of service for the remaining users. Because cell outage must be handled immediately and the problem is computationally complex, we develop a low-complexity suboptimal solution in which user association is determined by a new heuristic algorithm and power allocation is set by an innovative deep neural network (DNN). The complexity of the proposed method is polynomial, much lower than the exponential complexity of finding an optimal solution. Simulation results demonstrate that the proposed method approaches the optimal solution. Moreover, the scheme greatly improves fairness and increases the number of served users.
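The power-allocation objective can be made concrete with a minimal sketch, assuming a single neighboring cell serving one cell-edge (weak) user and one strong user with standard two-user downlink NOMA rates under successive interference cancellation (SIC). The grid search below is a brute-force stand-in for the paper's DNN, and all function and parameter names are hypothetical.

```python
import math
from typing import Optional, Tuple


def noma_two_user_rates(p_total: float, alpha_weak: float, g_weak: float,
                        g_strong: float, noise: float = 1.0) -> Tuple[float, float]:
    """Achievable rates (bit/s/Hz) for two-user downlink NOMA with SIC.

    alpha_weak is the share of power given to the weak (cell-edge) user. The strong
    user cancels the weak user's signal first; the weak user treats the strong
    user's signal as interference.
    """
    if not 0.0 <= alpha_weak <= 1.0:
        raise ValueError("alpha_weak must lie in [0, 1]")
    p_weak = alpha_weak * p_total
    p_strong = p_total - p_weak
    r_weak = math.log2(1 + p_weak * g_weak / (p_strong * g_weak + noise))
    r_strong = math.log2(1 + p_strong * g_strong / noise)
    return r_weak, r_strong


def best_power_split(p_total: float, g_weak: float, g_strong: float, r_min_weak: float,
                     steps: int = 1000, noise: float = 1.0) -> Optional[Tuple[float, float]]:
    """Grid-search the split that maximizes the sum rate while the weak user meets r_min_weak.

    Returns (alpha_weak, sum_rate), or None if the QoS target is infeasible.
    """
    best = None
    for i in range(steps + 1):
        a = i / steps
        rw, rs = noma_two_user_rates(p_total, a, g_weak, g_strong, noise)
        if rw >= r_min_weak and (best is None or rw + rs > best[1]):
            best = (a, rw + rs)
    return best
```

The QoS constraint pushes power toward the weak user, while the sum-rate objective pulls it toward the strong one; a learned allocator replaces this exhaustive search with a single forward pass.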

Preprint · 2022 · arXiv

E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models

Building huge, highly capable language models has been a trend in recent years. Despite their strong performance, they incur a high computational cost. A common solution is to apply model compression or choose lightweight architectures, which often requires a separate fixed-size model for each computational budget and may lose performance under heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes inference between large, accurate Super models and lightweight Swift models. To this end, a decision-making module routes inputs to the Super or Swift model based on the energy characteristics of the representations in the latent space. The method is easy to adopt and architecture-agnostic. As such, it can be applied to black-box pre-trained models without architectural manipulation, reassembly of modules, or retraining. Unlike existing methods, which apply only to encoder-only backbones and classification tasks, our method also works for encoder-decoder structures and sequence-to-sequence tasks such as translation. E-LANG's performance is verified through experiments with T5 and BERT backbones on GLUE, SuperGLUE, and WMT. In particular, we outperform T5-11B with an average computation speed-up of 3.3× on GLUE and 2.9× on SuperGLUE. We also achieve BERT-based SOTA on GLUE with 3.2× less computation. Code and demo are available in the supplementary materials.
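The routing decision can be sketched as follows, assuming the energy is the standard free-energy score computed from the Swift model's output logits (as in energy-based out-of-distribution detection) and that a single fixed threshold decides the route. The paper's exact energy definition over latent representations and its threshold selection may differ; the names here are hypothetical.

```python
import math
from typing import Sequence


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(x) = -T * logsumexp(logits / T); lower energy means higher confidence."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse


def route(swift_logits: Sequence[float], threshold: float, temperature: float = 1.0) -> str:
    """Keep the Swift model's answer when its energy is low; otherwise defer to the Super model."""
    return "swift" if energy_score(swift_logits, temperature) <= threshold else "super"


print(route([10.0, 0.0], threshold=-5.0))  # confident Swift output → "swift"
print(route([0.1, 0.0], threshold=-5.0))   # uncertain Swift output → "super"
```

Because the score only needs the Swift model's outputs, the router treats both models as black boxes, which matches the abstract's claim that no architectural changes or retraining are required.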

Preprint · 2021 · arXiv

A Compact Deep Learning Model for Face Spoofing Detection

In recent years, the use of face biometric security systems has grown rapidly; as a result, presentation attack detection (PAD) has received significant attention from the research community and has become a major field of research. Researchers have tackled the problem with various methods, from conventional texture feature extraction such as LBP, BSIF, and LPQ to deep neural networks with different architectures. Despite the results each of these techniques has achieved for a particular attack scenario or dataset, most still fail to generalize to unseen conditions, as each is effective only for certain types of presentation attacks and instruments (PAI). In this paper, instead of relying entirely on hand-crafted texture features or only on deep neural networks, we address the problem by fusing wide and deep features in a unified neural architecture. The main idea is to exploit the strengths of both approaches to derive a well-generalized solution. In particular, we simultaneously learn a low-dimensional latent space from data-driven features learned by a convolutional neural network designed for spoofing detection (i.e., the deep channel) and leverage spoofing detection features already popular in the frequency and temporal dimensions (i.e., the wide channel). We evaluate the effectiveness of our method by comparing its results with each of the mentioned techniques separately on several spoofing datasets, including ROSE-Youtu, SiW, and NUAA Imposter.
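The wide side of a wide-and-deep design can be sketched with a basic 3x3 local binary pattern (LBP) histogram, one of the hand-crafted texture descriptors the abstract names, fused with a deep feature vector by concatenation. This is a minimal sketch under stated assumptions: the deep vector is assumed to come from a CNN that is not shown, and the paper fuses the channels inside a unified neural architecture, so plain concatenation is a simplification. Function names are hypothetical.

```python
from typing import List, Sequence


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalized 256-bin histogram of basic 3x3 LBP codes (border pixels skipped)."""
    rows, cols = len(img), len(img[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    # Neighbour offsets, clockwise from the top-left pixel; each sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist]


def fuse_features(wide: Sequence[float], deep: Sequence[float]) -> List[float]:
    """Late fusion by concatenation: hand-crafted ("wide") features, then CNN ("deep") features."""
    return list(wide) + list(deep)
```

A flat patch produces the all-ones code 255 everywhere, while printed or replayed faces tend to shift mass toward other texture codes; the classifier on top of the fused vector is left out.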

Preprint · 2021 · arXiv

An Iterative Riccati Algorithm for Online Linear Quadratic Control

We study an online policy learning problem for linear control systems. In this problem, the control system is known and linear, a sequence of quadratic cost functions is revealed to the controller in hindsight, and the controller updates its policy to achieve sublinear regret, as in online optimization. We introduce a modified online Riccati algorithm that, under a boundedness assumption, achieves a logarithmic regret bound. In particular, logarithmic regret in the scalar case is achieved without the boundedness assumption. While achieving a better regret bound, our algorithm also has lower complexity than earlier algorithms, which rely on solving semidefinite programs at each stage.
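The Riccati building block can be sketched for the scalar system x_{t+1} = a x_t + b u_t with stage cost q x² + r u², using the standard discrete-time Riccati recursion. The `online_gains` wrapper, which re-solves the equation with the running average of the cost weights revealed so far, is a hypothetical follow-the-leader illustration and not the paper's modified online Riccati algorithm.

```python
from typing import List, Sequence, Tuple


def scalar_riccati(a: float, b: float, q: float, r: float,
                   iters: int = 500, tol: float = 1e-12) -> Tuple[float, float]:
    """Iterate P <- q + a^2 P - (a b P)^2 / (r + b^2 P) to a fixed point; return (P, K).

    The optimal stationary feedback is u = -K x with K = a b P / (r + b^2 P).
    """
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    k = a * b * p / (r + b * b * p)
    return p, k


def online_gains(a: float, b: float, q_seq: Sequence[float], r_seq: Sequence[float]) -> List[float]:
    """Hypothetical online update: after each round, re-solve with the average revealed costs."""
    gains = []
    q_sum = r_sum = 0.0
    for t, (q_t, r_t) in enumerate(zip(q_seq, r_seq), start=1):
        q_sum += q_t
        r_sum += r_t
        _, k = scalar_riccati(a, b, q_sum / t, r_sum / t)
        gains.append(k)  # gain applied in round t + 1
    return gains
```

For a = b = q = r = 1 the fixed point satisfies P² - P - 1 = 0, so P is the golden ratio (about 1.618) and K = P / (1 + P) is about 0.618. Each call is a scalar recursion, which hints at why Riccati-based updates are cheaper than solving a semidefinite program per stage.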

Preprint · 2021 · arXiv

Deep Learning meets Liveness Detection: Recent Advancements and Challenges

Facial biometrics has recently received tremendous attention as a convenient replacement for traditional authentication systems. Consequently, detecting malicious attempts has become highly important, leading to extensive studies in face anti-spoofing (FAS), i.e., face presentation attack detection. Deep feature learning techniques, as opposed to hand-crafted features, have promised a dramatic increase in the accuracy of FAS systems, tackling key challenges in deploying such systems in real-world applications. Hence, a new research area dealing with the development of more generalized and accurate models is increasingly attracting the attention of the research community and industry. In this paper, we present a comprehensive survey of the literature on deep-feature-based FAS methods since 2017. To shed light on this topic, we present a semantic taxonomy based on various features and learning methodologies. Further, we cover the predominant public FAS datasets in chronological order, their evolution, and the evaluation criteria (both intra-dataset and inter-dataset). Finally, we discuss open research challenges and future directions.
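Evaluation criteria for FAS are commonly reported as APCER, BPCER, and their mean ACER, following ISO/IEC 30107-3. The sketch below is a simplified illustration that pools all attack samples; the standard computes APCER per attack instrument species and reports the worst case. It assumes higher scores mean "more likely an attack"; the function name and threshold convention are hypothetical.

```python
from typing import Sequence, Tuple


def pad_error_rates(scores: Sequence[float], is_attack: Sequence[bool],
                    threshold: float) -> Tuple[float, float, float]:
    """Return (APCER, BPCER, ACER) for a detector that flags score >= threshold as an attack.

    APCER: fraction of attack presentations accepted as bona fide.
    BPCER: fraction of bona fide presentations rejected as attacks.
    ACER:  mean of the two.
    """
    attacks = [s for s, a in zip(scores, is_attack) if a]
    bona_fide = [s for s, a in zip(scores, is_attack) if not a]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide sample")
    apcer = sum(s < threshold for s in attacks) / len(attacks)
    bpcer = sum(s >= threshold for s in bona_fide) / len(bona_fide)
    return apcer, bpcer, (apcer + bpcer) / 2


print(pad_error_rates([0.9, 0.2, 0.7, 0.4], [True, True, False, False], 0.5))  # → (0.5, 0.5, 0.5)
```

In an intra-dataset protocol the threshold and test scores come from the same dataset; in an inter-dataset protocol the threshold is fixed on one dataset and the rates are measured on another, which is where generalization gaps show up.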

Preprint · 2020 · arXiv

Deep Learning-based Image Compression with Trellis Coded Quantization

Recently, many works have attempted to develop image compression models based on deep learning architectures, in which a uniform scalar quantizer (SQ) is commonly applied to the feature maps between the encoder and decoder. In this paper, we propose to incorporate a trellis coded quantizer (TCQ) into a deep learning-based image compression framework. A soft-to-hard strategy is applied to allow backpropagation during training. We develop a simple image compression model consisting of three subnetworks (encoder, decoder, and entropy estimation) and optimize all components end to end. Experiments on two high-resolution image datasets show that our model achieves superior performance at low bit rates. We also compare TCQ and SQ on our proposed baseline model and demonstrate the advantage of TCQ.
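Trellis coded quantization replaces per-sample rounding with a Viterbi search over a trellis whose branches are labeled with subsets of a union codebook. A minimal sketch of the inference-time quantizer, assuming reconstruction points at integer multiples of `step` split into four subsets D0..D3 and a 4-state trellis of the kind used in the TCQ literature (the exact branch labeling here is an assumption). The paper's soft-to-hard training relaxation is not shown, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical 4-state trellis: TRELLIS[state] = [(subset, next_state), (subset, next_state)].
TRELLIS: Dict[int, List[Tuple[int, int]]] = {
    0: [(0, 0), (2, 1)],
    1: [(1, 2), (3, 3)],
    2: [(2, 0), (0, 1)],
    3: [(3, 2), (1, 3)],
}


def nearest_in_subset(x: float, subset: int, step: float) -> float:
    """Nearest point of D_subset = {(4m + subset) * step : m integer}."""
    m = round((x / step - subset) / 4)
    return (4 * m + subset) * step


def tcq_quantize(xs: List[float], step: float) -> List[float]:
    """Viterbi search for the minimum squared-error path through the trellis, starting in state 0."""
    inf = float("inf")
    cost = [0.0, inf, inf, inf]
    history: List[List[Optional[Tuple[int, float]]]] = []
    for x in xs:
        new_cost = [inf] * 4
        back: List[Optional[Tuple[int, float]]] = [None] * 4  # (previous state, reconstruction)
        for state in range(4):
            if cost[state] == inf:
                continue  # state not reachable yet
            for subset, nxt in TRELLIS[state]:
                y = nearest_in_subset(x, subset, step)
                c = cost[state] + (x - y) ** 2
                if c < new_cost[nxt]:
                    new_cost[nxt] = c
                    back[nxt] = (state, y)
        cost = new_cost
        history.append(back)
    # Trace back from the cheapest final state.
    state = min(range(4), key=lambda s: cost[s])
    out: List[float] = []
    for back in reversed(history):
        prev, y = back[state]
        out.append(y)
        state = prev
    return out[::-1]
```

Only one bit per sample selects the branch, yet the decoder can follow the trellis to know which subset each index refers to; this is how TCQ gets finer effective resolution than a scalar quantizer at the same rate.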

Preprint · 2020 · arXiv

DRL-Based QoS-Aware Resource Allocation Scheme for Coexistence of Licensed and Unlicensed Users in LTE and Beyond

In this paper, we employ deep reinforcement learning to develop a novel radio resource allocation and packet scheduling scheme for different Quality of Service (QoS) requirements, applicable to LTE-Advanced and 5G networks. In addition, given the scarcity of spectrum in sub-6 GHz bands, the proposed algorithm dynamically allocates resource blocks (RBs) to licensed users in a way that largely preserves the continuity of unallocated RBs. This improves communication efficiency among unlicensed entities by increasing the chance of uninterrupted communication and reducing coordination overhead. The optimization problem is formulated as a Markov decision process (MDP) that observes the entire queue of demands, where failing to meet QoS constraints penalizes the objective by a multiplicative factor. Furthermore, a notion of continuity for unallocated resources is included as an additive term in the objective function. Considering variations in both channel coefficients and user requests, we use a deep reinforcement learning algorithm as an online and numerically efficient approach to solving the MDP. Numerical results show that the proposed method achieves higher average spectral efficiency, while respecting the delay budget and packet loss ratio, than conventional greedy min-delay and max-throughput schemes, in which a fixed part of the spectrum is forced to remain vacant for unlicensed entities.
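The shape of the objective (a multiplicative QoS penalty plus an additive continuity term) can be sketched as a per-step reward. The specific forms are assumptions for illustration only: the penalty is `penalty ** violations`, and continuity is measured as the share of free RBs lying in the longest contiguous free run. Function names and default weights are hypothetical.

```python
from typing import List, Optional


def largest_free_run(allocation: List[Optional[int]]) -> int:
    """Length of the longest run of unallocated resource blocks (None marks a free RB)."""
    best = run = 0
    for rb in allocation:
        run = run + 1 if rb is None else 0
        best = max(best, run)
    return best


def reward(spectral_eff: float, qos_violations: int, allocation: List[Optional[int]],
           penalty: float = 0.5, weight: float = 1.0) -> float:
    """Hypothetical reward: spectral efficiency scaled by penalty**violations
    (multiplicative QoS term) plus a weighted continuity bonus (additive term)."""
    free = sum(rb is None for rb in allocation)
    # Assumption: with no free RBs there is nothing left for unlicensed users, so no bonus.
    continuity = largest_free_run(allocation) / free if free else 0.0
    return spectral_eff * penalty ** qos_violations + weight * continuity


print(reward(4.0, 0, [1, 2, None, None, None]))  # contiguous free block → 5.0
print(reward(4.0, 1, [1, None, None, 2, None]))  # one violation, fragmented free RBs
```

Packing licensed users at one end of the band leaves a single free block, which earns the full continuity bonus; a DRL agent trained on this reward learns to trade a little spectral efficiency for less fragmentation.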

Preprint · 2020 · arXiv

Generalized Octave Convolutions for Learned Multi-Frequency Image Compression

Learned image compression has recently shown the potential to outperform standard codecs. State-of-the-art rate-distortion (R-D) performance has been achieved by context-adaptive entropy coding approaches in which hyperprior and autoregressive models are jointly used to capture the spatial dependencies in the latent representations. However, in previous works all latents are feature maps of the same spatial resolution, which contain redundancies that affect R-D performance. In this paper, we propose the first learned multi-frequency image compression and entropy coding approach, based on the recently developed octave convolutions, which factorizes the latents into high- and low-frequency (resolution) components, with the low-frequency component represented at a lower resolution. This reduces its spatial redundancy and improves R-D performance. Novel generalized octave convolution and octave transposed-convolution architectures with internal activation layers are also proposed to preserve more of the spatial structure of the information. Experimental results show that the proposed scheme outperforms all existing learned methods as well as standard codecs, including the next-generation video coding standard VVC (4:2:0), on the Kodak dataset in both PSNR and MS-SSIM. We also show that the proposed generalized octave convolution can improve the performance of other autoencoder-based computer vision tasks such as semantic segmentation and image denoising.
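The resolution bookkeeping behind the high/low-frequency factorization can be sketched as follows. This assumes a fraction `alpha` of latent channels is treated as low frequency and stored at half resolution, with fixed 2x2 average pooling as a stand-in for the learned low-frequency path; the paper's generalized octave convolutions are learned layers and are not reproduced here. Names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def avg_pool2(ch: Matrix) -> Matrix:
    """2x2 average pooling with stride 2 (height and width assumed even)."""
    return [
        [(ch[i][j] + ch[i][j + 1] + ch[i + 1][j] + ch[i + 1][j + 1]) / 4
         for j in range(0, len(ch[0]), 2)]
        for i in range(0, len(ch), 2)
    ]


def octave_split(channels: List[Matrix], alpha: float) -> Tuple[List[Matrix], List[Matrix]]:
    """Keep (1 - alpha) of the channels at full resolution (high frequency) and
    store the remaining alpha share at half resolution (low frequency)."""
    n_low = round(alpha * len(channels))
    n_high = len(channels) - n_low
    high = channels[:n_high]
    low = [avg_pool2(ch) for ch in channels[n_high:]]
    return high, low


def latent_size(c: int, h: int, w: int, alpha: float) -> float:
    """Number of latent elements: (1 - alpha) C H W + alpha C (H/2)(W/2)."""
    return (1 - alpha) * c * h * w + alpha * c * (h / 2) * (w / 2)


print(latent_size(192, 16, 16, 0.0), latent_size(192, 16, 16, 0.5))  # → 49152.0 30720.0
```

With half the channels moved to the low-frequency branch, the latent shrinks by 37.5% before entropy coding, which is the spatial redundancy reduction the abstract refers to.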

Preprint · 2020 · arXiv

Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks

Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases implementation complexity. In this paper, we propose a new variable-rate image compression framework that employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv) with built-in generalized divisive normalization (GDN) and inverse GDN (IGDN) layers. Novel GoConv- and GoTConv-based residual blocks are also developed in the encoder and decoder networks. Our scheme also uses stochastic rounding-based scalar quantization. To further improve performance, we encode the residual between the input and the image reconstructed by the decoder network as an enhancement layer. To enable a single model to operate at different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework, trained with the variable-rate objective function, outperforms standard codecs such as H.265/HEVC-based BPG as well as state-of-the-art learning-based variable-rate methods.
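Stochastic rounding, named in the abstract as the basis of the scalar quantizer, rounds each value up with probability equal to its fractional part, so the quantized value is unbiased in expectation. A minimal sketch of that standard technique, assuming plain Python floats; how the paper combines it with deterministic rounding at inference time is not stated here, and the function name is hypothetical.

```python
import math
import random
from typing import List, Optional


def stochastic_round(xs: List[float], rng: Optional[random.Random] = None) -> List[int]:
    """Round each x up with probability x - floor(x), else down, so that E[q(x)] = x."""
    rng = rng or random.Random()
    out = []
    for x in xs:
        lo = math.floor(x)
        out.append(lo + (1 if rng.random() < x - lo else 0))
    return out


rng = random.Random(0)
samples = stochastic_round([2.3] * 10000, rng)
print(sum(samples) / len(samples))  # close to 2.3; every sample is 2 or 3
```

Unlike adding uniform noise, this keeps the training-time latents on the integer grid while still giving gradients-through-expectation behavior that is unbiased.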

Preprint · 2019 · arXiv

Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs

Recently, deep learning-based methods have been applied to image compression with many promising results. In this paper, we propose an improved hybrid layered image compression framework that combines deep learning with traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results on the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs, including BPG, in both PSNR and MS-SSIM across a wide range of bit rates when the images are coded in the RGB444 domain.
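The two-layer structure (base layer, coarse reconstruction, residual enhancement layer) can be sketched with trivial stand-ins. This assumes pixel subsampling in place of the base-layer CNN, nearest-neighbour upsampling in place of the reconstruction CNN, a lossless base layer (the role FLIF plays in the abstract), and uniform residual quantization in place of BPG. Only the layering follows the abstract; all function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample2(img: Image) -> Image:
    """Stand-in for the base-layer CNN: keep every second pixel in each direction."""
    return [row[::2] for row in img[::2]]


def upsample2(small: Image, h: int, w: int) -> Image:
    """Stand-in for the reconstruction CNN: nearest-neighbour upsampling back to h x w."""
    return [[small[i // 2][j // 2] for j in range(w)] for i in range(h)]


def encode_decode(img: Image, step: float) -> Image:
    """Lossless base layer plus a uniformly quantized residual enhancement layer."""
    h, w = len(img), len(img[0])
    base = downsample2(img)         # assumed to be stored losslessly
    coarse = upsample2(base, h, w)  # coarse reconstruction available at the decoder
    # Enhancement layer: quantize the residual (stand-in for the lossy residual codec).
    residual_q = [[round((img[i][j] - coarse[i][j]) / step) for j in range(w)] for i in range(h)]
    return [[coarse[i][j] + residual_q[i][j] * step for j in range(w)] for i in range(h)]
```

Because the base layer is lossless, the final error comes only from the residual quantizer, here at most `step / 2` per pixel; in the real scheme the enhancement codec's rate setting controls this trade-off.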
