Researcher profile

Guangwei Gao

Guangwei Gao contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
9 works
0 followers
4 topics
4 close collaborators

Published work

9 published items

Preprint · 2026 · arXiv

LiBrA-Net: Lie-Algebraic Bilateral Affine Fields for Real-Time 4K Video Dehazing

Currently, ultra-high-definition (UHD) video dehazing lacks a benchmark for evaluation. Furthermore, existing video dehazing methods cannot run on consumer-grade GPUs when processing continuous UHD sequences of 3–5 frames at a time. In this paper, we address both issues with a new benchmark and an efficient method. Our key observation is that atmospheric dehazing reduces to a per-pixel affine transform governed by the low-frequency depth field, which can be compactly encoded in bilateral grids whose prediction cost is decoupled from the output resolution. Building on this, we propose LiBrA-Net, which factorizes the spatiotemporal affine field into a spatial–color and a temporal bilateral sub-grid predicted at a fixed low resolution, fuses their coefficients in the $\mathfrak{gl}(3)$ Lie algebra under group-theoretic regularization, maps the result to invertible GL(3) transforms via a Cayley parameterization, and restores high-frequency detail through a lightweight input-guided branch. We further release UHV-4K, the first paired 4K video dehazing benchmark with depth, transmission, and optical-flow annotations on every frame. Across UHV-4K, REVIDE, and HazeWorld, LiBrA-Net sets a new state of the art among the compared video dehazing methods while running at native 4K resolution at 25 FPS on a single GPU with only 6.12M parameters. Code and data are available at https://anonymous.4open.science/r/LiBrA-Net-42B8.
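
The key observation above can be checked against the standard atmospheric scattering model, which the abstract does not spell out and which is assumed here: a hazy pixel is I = J·t + A·(1 − t), with clean radiance J, transmission t = exp(−β·d) for depth d, and airlight A. Solving for J gives J = (1/t)·I + A·(1 − 1/t), an affine map in I whose scale and offset depend only on t, and hence only on the smooth depth field. The sketch also shows one common convention for the Cayley map, C(X) = (Id − X/2)⁻¹(Id + X/2), which is defined when Id − X/2 is invertible and yields an invertible 3×3 matrix when Id + X/2 is invertible. All function names, the β value, and the Cayley scaling are illustrative assumptions; this is not LiBrA-Net's implementation.

```python
import math


def transmission_from_depth(depth, beta):
    # Beer-Lambert attenuation (assumed model): t = exp(-beta * depth)
    return math.exp(-beta * depth)


def dehaze_affine_coeffs(t, airlight):
    """Per-pixel (scale, offset) that invert I = J*t + A*(1 - t) for one channel."""
    scale = 1.0 / t
    offset = airlight * (1.0 - 1.0 / t)
    return scale, offset


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[adj[r][col] / det for col in range(3)] for r in range(3)]


def cayley(x):
    """Cayley map C(X) = (Id - X/2)^-1 (Id + X/2) for a 3x3 matrix X."""
    eye = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    minus = [[eye[r][c] - 0.5 * x[r][c] for c in range(3)] for r in range(3)]
    plus = [[eye[r][c] + 0.5 * x[r][c] for c in range(3)] for r in range(3)]
    return matmul3(inv3(minus), plus)


if __name__ == "__main__":
    J, A = 0.3, 0.9  # hypothetical clean radiance and airlight
    t = transmission_from_depth(depth=2.0, beta=0.5)
    hazy = J * t + A * (1.0 - t)
    scale, offset = dehaze_affine_coeffs(t, A)
    print(scale * hazy + offset)  # recovers J up to rounding
    print(cayley([[0.0, -0.2, 0.0], [0.2, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

Because (Id − X/2)⁻¹ and (Id + X/2) commute, the order of the product does not matter, and a skew-symmetric X maps to an orthogonal matrix, which is a quick sanity check on the parameterization.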

Preprint · 2026 · arXiv

RPBA-Net: An Interpretable Residual Pyramid Bilateral Affine Network for RAW-Domain ISP Enhancement

To address module fragmentation, uninterpretable mappings, and deployment constraints in RAW-domain demosaicing, color correction, and detail enhancement, this paper proposes RPBA-Net, an interpretable residual pyramid bilateral affine network for RAW-domain ISP enhancement. Given packed RAW data as input, the method performs residual affine base reconstruction by estimating a base RGB representation and learning identity-guided residual affine corrections, thereby unifying demosaicing and enhancement. It further builds pyramid bilateral affine grids and combines guide-driven autoregressive adaptive slicing with adaptive cross-layer fusion to hierarchically model global tone restoration and local texture enhancement. In addition, smoothness, cross-scale consistency, and magnitude regularization terms are introduced to improve model stability, controllability, and structural interpretability. Extensive experiments demonstrate that RPBA-Net surpasses representative RAW-to-sRGB methods and achieves state-of-the-art performance in reconstruction fidelity and perceptual quality, while maintaining low model complexity and strong deployment potential for mobile and embedded platforms.
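
The basic operation behind bilateral affine grids, as commonly used in bilateral-grid networks such as HDRNet, is slicing: affine coefficients stored on a coarse grid indexed by position and intensity are interpolated at each pixel's position and guide value, then applied to the full-resolution input. The sketch below reduces this to a 1D signal with a 2D grid (position × intensity) and bilinear interpolation. The pyramid levels, autoregressive adaptive slicing, and cross-layer fusion described for RPBA-Net are not reproduced, and the function name and grid layout are assumptions.

```python
def slice_bilateral_grid(grid, guide, signal):
    """Slice a (position x intensity) grid of affine coefficients and apply them.

    grid[i][k] = (scale, offset) at spatial cell i and intensity bin k.
    guide[x] in [0, 1] selects the intensity coordinate for sample x.
    """
    if len(guide) != len(signal):
        raise ValueError("guide and signal must have the same length")
    n = len(signal)
    gx, gz = len(grid), len(grid[0])
    out = []
    for x in range(n):
        # Continuous grid coordinates: spatial from x, intensity from the guide.
        u = x * (gx - 1) / max(n - 1, 1)
        v = min(max(guide[x], 0.0), 1.0) * (gz - 1)
        i0, k0 = int(u), int(v)
        i1, k1 = min(i0 + 1, gx - 1), min(k0 + 1, gz - 1)
        fu, fv = u - i0, v - k0
        coeffs = []
        for c in range(2):  # 0 = scale, 1 = offset
            near = (1 - fv) * grid[i0][k0][c] + fv * grid[i0][k1][c]
            far = (1 - fv) * grid[i1][k0][c] + fv * grid[i1][k1][c]
            coeffs.append((1 - fu) * near + fu * far)
        scale, offset = coeffs
        out.append(scale * signal[x] + offset)  # per-sample affine transform
    return out


if __name__ == "__main__":
    # Hypothetical grid: 3 spatial cells, 2 intensity bins; brighter samples get a larger gain.
    grid = [[(1.0, 0.0), (2.0, 0.0)] for _ in range(3)]
    signal = [0.0, 0.5, 1.0]
    print(slice_bilateral_grid(grid, guide=signal, signal=signal))
```

Because the coefficients vary with intensity as well as position, two neighboring samples with very different guide values receive different transforms, which is what makes the grid edge-aware while its size stays independent of the output resolution.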

Preprint · 2022 · arXiv

A $C^{0}$ finite element approximation of planar oblique derivative problems in non-divergence form

This paper proposes a $C^{0}$ (non-Lagrange) primal finite element approximation of linear elliptic equations in non-divergence form with oblique boundary conditions in planar, curved domains. As an extension of [Calcolo, 58 (2022), No. 9], the Miranda-Talenti estimate for oblique boundary conditions is established at the discrete level by enhancing the regularity at the vertices. Consequently, the coercivity constant of the proposed scheme is exactly the same as that from PDE theory. Quasi-optimal error estimates are established by carefully studying the approximation properties of the finite element spaces. Numerical experiments are provided to verify the convergence theory and to demonstrate the accuracy and efficiency of the proposed method.
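
For context, the classical Miranda-Talenti estimate that this work extends is stated below in its Dirichlet form; this is standard background, not taken from the abstract. On a bounded convex planar domain $\Omega$, the $L^2$ norm of the full Hessian is controlled by that of the Laplacian. In the standard analysis of non-divergence-form problems under a Cordes condition, this bound is what makes the problem coercive. The oblique-derivative, curved-domain, and discrete versions established in the paper are not reproduced here.

```latex
\|D^2 v\|_{L^2(\Omega)} \le \|\Delta v\|_{L^2(\Omega)}
\qquad \forall\, v \in H^2(\Omega) \cap H^1_0(\Omega),
\quad \Omega \subset \mathbb{R}^2 \text{ bounded and convex}.
```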

Preprint · 2022 · arXiv

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation

Real-time semantic segmentation, which can be understood as pixel-level classification of the input image, has broad application prospects, especially in the fast-developing fields of autonomous driving and drone navigation. However, heavy computational cost and redundant parameters remain obstacles to its development. In this paper, we propose a Fast Bilateral Symmetrical Network (FBSNet) to alleviate these challenges. Specifically, FBSNet employs a symmetrical encoder-decoder structure with two branches: a semantic information branch and a spatial detail branch. The Semantic Information Branch (SIB) is the main branch, designed to capture contextual information from the input image while providing a sufficiently large receptive field. The Spatial Detail Branch (SDB), by contrast, is a shallow and simple network that establishes local dependencies for each pixel to preserve details, which is essential for restoring the original resolution during decoding. In addition, a Feature Aggregation Module (FAM) is designed to effectively combine the outputs of these two branches. Experimental results on Cityscapes and CamVid show that FBSNet strikes a good balance between accuracy and efficiency. Specifically, with only 0.62 million parameters, it obtains 70.9% and 68.9% mIoU at inference speeds of 90 fps and 120 fps on these two test datasets, respectively, on a single RTX 2080Ti GPU. The code is available at https://github.com/IVIPLab/FBSNet.
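
The mIoU figures above follow the usual definition: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), read off a confusion matrix, then averaged over classes. A minimal sketch, assuming flattened per-pixel label lists and 255 as the ignore label (a common Cityscapes convention, not stated in the abstract). Benchmark tooling typically accumulates the confusion matrix over the whole evaluation set before computing IoU, rather than averaging per-image scores.

```python
def confusion_matrix(preds, labels, num_classes, ignore_index=255):
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN) over classes that appear."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # pixels of class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    preds = [0, 0, 1, 1, 1]
    labels = [0, 1, 1, 1, 255]  # last pixel is ignored
    cm = confusion_matrix(preds, labels, num_classes=2)
    print(cm, mean_iou(cm))
```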

Preprint · 2022 · arXiv

Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution

Convolutional neural network (CNN)-based single-image super-resolution (SISR) has made great progress in recent years. However, it is difficult to apply these methods to real-world scenarios because of their computational and memory costs. Meanwhile, making full use of intermediate features under limited parameter and computation budgets is also a major challenge. To alleviate these issues, we propose a lightweight yet efficient Feature Distillation Interaction Weighting Network (FDIWN). Specifically, FDIWN uses a series of specially designed Feature Shuffle Weighted Groups (FSWG) as its backbone, each composed of several novel mutual Wide-residual Distillation Interaction Blocks (WDIB). In addition, Wide Identical Residual Weighting (WIRW) units and Wide Convolutional Residual Weighting (WCRW) units are introduced into WDIB for better feature distillation. Moreover, a Wide-Residual Distillation Connection (WRDC) framework and a Self-Calibration Fusion (SCF) unit are proposed to make features at different scales interact more flexibly and efficiently. Extensive experiments show that FDIWN strikes a better balance between model performance and efficiency than other models. The code is available at https://github.com/IVIPLab/FDIWN.

Preprint · 2022 · arXiv

Learning Compact and Representative Features for Cross-Modality Person Re-Identification

This paper focuses on the cross-modality visible-infrared person re-identification (VI Re-ID) task, which aims to match pedestrian samples between the visible and infrared modalities. To reduce the modality discrepancy between samples from different cameras, most existing works use constraints based on the Euclidean metric. Because Euclidean distance cannot effectively measure the angles between embedded vectors, existing solutions cannot learn an angularly discriminative feature embedding. Since the most important factor in embedding-based classification is whether the feature space is angularly discriminative, we present a new loss function called the Enumerate Angular Triplet (EAT) loss. In addition, motivated by knowledge distillation, we present a novel Cross-Modality Knowledge Distillation (CMKD) loss to narrow the gap between features of different modalities before feature embedding. Benefiting from these two losses, the embedded features are discriminative enough to tackle the modality-discrepancy problem. Experimental results on the RegDB and SYSU-MM01 datasets demonstrate that the proposed method outperforms other state-of-the-art methods. Code is available at https://github.com/IVIPLab/LCCRF.
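
The argument above is that Euclidean distance depends on vector norms, whereas embedding-based classification depends on angles. The sketch below illustrates that point with a generic angle-based triplet hinge, which penalizes cases where the anchor-positive angle is not smaller than the anchor-negative angle by a margin. It is not the paper's EAT loss, whose enumeration scheme the abstract does not specify; the function names and the 0.3-radian margin are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def angle(u, v):
    """Angle in radians; the clamp guards against rounding just outside [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, cosine(u, v))))


def angular_triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on angles: zero once angle(a, p) + margin <= angle(a, n)."""
    return max(0.0, angle(anchor, positive) - angle(anchor, negative) + margin)


if __name__ == "__main__":
    a = [1.0, 0.0]
    # Same direction, different norm: Euclidean distance 4.0, angle 0.0.
    print(math.dist(a, [5.0, 0.0]), angle(a, [5.0, 0.0]))
    print(angular_triplet_loss(a, [2.0, 0.0], [0.0, 1.0]))  # constraint satisfied
    print(angular_triplet_loss(a, [0.0, 1.0], [1.0, 0.1]))  # constraint violated
```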

Preprint · 2022 · arXiv

Lightweight Bimodal Network for Single-Image Super-Resolution via Symmetric CNN and Recursive Transformer

Single-image super-resolution (SISR) has achieved significant breakthroughs with the development of deep learning. However, these methods are difficult to apply in real-world scenarios because their complex operations incur high computational and memory costs. To solve this issue, we propose a Lightweight Bimodal Network (LBNet) for SISR. Specifically, an effective Symmetric CNN is designed for local feature extraction and coarse image reconstruction. Meanwhile, we propose a Recursive Transformer that fully learns the long-range dependencies of the image, so that global information can be fully used to further refine texture details. Studies show that a hybrid of CNN and Transformer can yield a more efficient model. Extensive experiments show that LBNet achieves better performance than other state-of-the-art methods with relatively low computational cost and memory consumption. The code is available at https://github.com/IVIPLab/LBNet.

Preprint · 2022 · arXiv

Multiple Degradation and Reconstruction Network for Single Image Denoising via Knowledge Distillation

Single image denoising (SID) has achieved significant breakthroughs with the development of deep learning. However, existing methods often come with a large number of parameters, which greatly limits their application scenarios. Unlike previous works that blindly increase network depth, we explore the degradation mechanism of noisy images and propose a lightweight Multiple Degradation and Reconstruction Network (MDRN) to progressively remove noise. Meanwhile, we propose two novel Heterogeneous Knowledge Distillation Strategies (HMDS) that enable MDRN to learn richer and more accurate features from heterogeneous models, making it possible to reconstruct higher-quality denoised images under extreme conditions. Extensive experiments show that MDRN achieves favorable performance against other SID models with fewer parameters. In addition, ablation studies demonstrate that the introduced HMDS improves the performance of tiny models as well as of models at high noise levels, which is highly useful for related applications.
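
The abstract does not define its Heterogeneous Knowledge Distillation Strategies, so the sketch below shows only a generic output-level distillation loss for image restoration, assumed here for illustration: the student's output is pulled toward both the clean target and the average of several teacher outputs, weighted by a hypothetical parameter `alpha`. It is not the HMDS formulation, and the function names are hypothetical.

```python
def l1(a, b):
    """Mean absolute error between two equally sized, non-empty flat lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_outs, target, alpha=0.5):
    """(1 - alpha) * L1(student, target) + alpha * mean_k L1(student, teacher_k)."""
    supervised = l1(student_out, target)
    teacher_term = sum(l1(student_out, t) for t in teacher_outs) / len(teacher_outs)
    return (1.0 - alpha) * supervised + alpha * teacher_term


if __name__ == "__main__":
    student = [0.0, 0.0]
    teachers = [[1.0, 1.0], [0.0, 0.0]]  # two hypothetical teacher predictions
    clean = [0.0, 0.0]
    print(distillation_loss(student, teachers, clean))
```

Setting `alpha` to 0 recovers plain supervised training, which makes the contribution of the teacher term easy to ablate.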

Preprint · 2020 · arXiv

Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism

Learning to generate natural scenes has always been a daunting task in computer vision. It is even harder when generating images across very different views: the fields of view overlap little and objects may be occluded, making the task very challenging. In this paper, we propose to use Generative Adversarial Networks (GANs) based on a deformable convolution and attention mechanism to solve the problem of cross-view image synthesis (see Fig. 1). Because it is difficult to understand and transform scene appearance and semantic information from another view, we use deformable convolution in the U-Net to improve the network's ability to extract features of objects at different scales. Moreover, to better learn the correspondence between images from different views, we apply an attention mechanism to refine the intermediate feature maps, thereby generating more realistic images. Extensive experiments on images of different sizes from the Dayton dataset [1] show that our model produces better results than state-of-the-art methods.