Researcher profile

Bing Zeng

Bing Zeng contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

ExpoCM: Exposure-Aware One-Step Generative Single-Image HDR Reconstruction

Single-image HDR reconstruction aims to recover high dynamic range radiance from a single low dynamic range (LDR) input, but remains highly ill-posed due to detail saturation in over-exposed regions and noise amplification in under-exposed areas. While recent diffusion-based approaches offer powerful generative priors, they often overlook the exposure-dependent nature of the degradation and incur substantial computational costs from iterative sampling. To address these challenges, we propose ExpoCM, a novel one-step generative HDR reconstruction framework that reformulates HDR reconstruction as a Probability Flow ODE (PF-ODE) and constructs exposure-aware consistency trajectories via exposure-dependent perturbations. Specifically, a soft exposure mask is first constructed to separate the LDR image into over-, under-, and well-exposed regions. Based on this partition, region-conditioned consistency trajectories are designed to hallucinate saturated details, suppress noise in dark regions, and preserve reliable structures within a single, distillation-free inference step. To further enhance perceptual quality, we introduce an Exposure-guided Luminance-Chromaticity Loss in the CIE~$\text{L}^*\text{a}^*\text{b}^*$ space, which assigns exposure-aware weights to luminance and chromaticity components, effectively mitigating brightness bias and color drift. Extensive experiments on the HDR-REAL, HDR-EYE, and AIM2025 benchmarks demonstrate that ExpoCM achieves state-of-the-art fidelity and perceptual accuracy, while enabling over 400$\times$ and 20$\times$ faster inference compared to DDPM (1000 steps) and DDIM (50 steps), respectively.

preprint2026arXiv

ZeroIDIR: Zero-Reference Illumination Degradation Image Restoration with Perturbed Consistency Diffusion Models

In this paper, we propose a zero-reference diffusion-based framework, named ZeroIDIR, for illumination degradation image restoration, which decouples the restoration process into adaptive illumination correction and diffusion-based reconstruction while being trained solely on low-quality degraded images. Specifically, we design an adaptive gamma correction module that performs spatially varying exposure correction to generate illumination-corrected only representations to mitigate exposure bias and serve as reliable inputs for subsequent diffusion processes, where a histogram-guided illumination correction loss is introduced to regularize the corrected illumination distribution toward that of natural scenes. Subsequently, the illumination-corrected image is treated as an intermediate noisy state for the proposed perturbed consistency diffusion model to reconstruct details and suppress noise. Moreover, a perturbed diffusion consistency loss is proposed to constrain the forward diffusion trajectory of the final restored image to remain consistent with the perturbed state, thus improving restoration fidelity and stability in the absence of supervision. Extensive experiments on publicly available benchmarks show that the proposed method outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. Code is available at https://github.com/JianghaiSCU/ZeroIDIR.

preprint2022arXiv

A Two-stage Method for Non-extreme Value Salt-and-Pepper Noise Removal

There are several previous methods based on neural network can have great performance in denoising salt and pepper noise. However, those methods are based on a hypothesis that the value of salt and pepper noise is exactly 0 and 255. It is not true in the real world. The result of those methods deviate sharply when the value is different from 0 and 255. To overcome this weakness, our method aims at designing a convolutional neural network to detect the noise pixels in a wider range of value and then a filter is used to modify pixel value to 0, which is beneficial for further filtering. Additionally, another convolutional neural network is used to conduct the denoising and restoration work.

preprint2022arXiv

Cross-Block Difference Guided Fast CU Partition for VVC Intra Coding

In this paper, we propose a new fast CU partition algorithm for VVC intra coding based on cross-block difference. This difference is measured by the gradient and the content of sub-blocks obtained from partition and is employed to guide the skipping of unnecessary horizontal and vertical partition modes. With this guidance, a fast determination of block partitions is accordingly achieved. Compared with VVC, our proposed method can save 41.64% (on average) encoding time with only 0.97% (on average) increase of BD-rate.

preprint2022arXiv

FINet: Dual Branches Feature Interaction for Partial-to-Partial Point Cloud Registration

Data association is important in the point cloud registration. In this work, we propose to solve the partial-to-partial registration from a new perspective, by introducing multi-level feature interactions between the source and the reference clouds at the feature extraction stage, such that the registration can be realized without the attentions or explicit mask estimation for the overlapping detection as adopted previously. Specifically, we present FINet, a feature interaction-based structure with the capability to enable and strengthen the information associating between the inputs at multiple stages. To achieve this, we first split the features into two components, one for rotation and one for translation, based on the fact that they belong to different solution spaces, yielding a dual branches structure. Second, we insert several interaction modules at the feature extractor for the data association. Third, we propose a transformation sensitivity loss to obtain rotation-attentive and translation-attentive features. Experiments demonstrate that our method performs higher precision and robustness compared to the state-of-the-art traditional and learning-based methods. Code is available at https://github.com/megvii-research/FINet.

preprint2022arXiv

Ghost-free High Dynamic Range Imaging with Context-aware Transformer

High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are typically prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture, which can jointly capture both global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations to solve ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features and use the channel attention mechanism to select informative local details across the extracted features to complement the global branch. By incorporating the CA-ViT as basic components, we further build the HDR-Transformer, a hierarchical network to reconstruct high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively with considerably reduced computational budgets. Codes are available at https://github.com/megvii-research/HDR-Transformer

preprint2022arXiv

JigsawGAN: Auxiliary Learning for Solving Jigsaw Puzzles with Generative Adversarial Networks

The paper proposes a solution based on Generative Adversarial Network (GAN) for solving jigsaw puzzles. The problem assumes that an image is divided into equal square pieces, and asks to recover the image according to information provided by the pieces. Conventional jigsaw puzzle solvers often determine the relationships based on the boundaries of pieces, which ignore the important semantic information. In this paper, we propose JigsawGAN, a GAN-based auxiliary learning method for solving jigsaw puzzles with unpaired images (with no prior knowledge of the initial images). We design a multi-task pipeline that includes, (1) a classification branch to classify jigsaw permutations, and (2) a GAN branch to recover features to images in correct orders. The classification branch is constrained by the pseudo-labels generated according to the shuffled pieces. The GAN branch concentrates on the image semantic information, where the generator produces the natural images to fool the discriminator, while the discriminator distinguishes whether a given image belongs to the synthesized or the real target domain. These two branches are connected by a flow-based warp module that is applied to warp features to correct the order according to the classification results. The proposed method can solve jigsaw puzzles more efficiently by utilizing both semantic information and boundary information simultaneously. Qualitative and quantitative comparisons against several representative jigsaw puzzle solvers demonstrate the superiority of our method.

preprint2022arXiv

OL-DN: Online learning based dual-domain network for HEVC intra frame quality enhancement

Convolution neural network (CNN) based methods offer effective solutions for enhancing the quality of compressed image and video. However, these methods ignore using the raw data to enhance the quality. In this paper, we adopt the raw data in the quality enhancement for the HEVC intra-coded image by proposing an online learning-based method. When quality enhancement is demanded, we online train our proposed model at encoder side and then use the parameters to update the model of decoder side. This method not only improves model performance, but also makes one model adoptable to multiple coding scenarios. Besides, quantization error in discrete cosine transform (DCT) coefficients is the root cause of various HEVC compression artifacts. Thus, we combine frequency domain priors to assist image reconstruction. We design a DCT based convolution layer, to produce DCT coefficients that are suitable for CNN learning. Experimental results show that our proposed online learning based dual-domain network (OL-DN) has achieved superior performance, compared with the state-of-the-art methods.

preprint2022arXiv

UPHDR-GAN: Generative Adversarial Network for High Dynamic Range Imaging with Unpaired Data

The paper proposes a method to effectively fuse multi-exposure inputs and generate high-quality high dynamic range (HDR) images with unpaired datasets. Deep learning-based HDR image generation methods rely heavily on paired datasets. The ground truth images play a leading role in generating reasonable HDR images. Datasets without ground truth are hard to be applied to train deep neural networks. Recently, Generative Adversarial Networks (GAN) have demonstrated their potentials of translating images from source domain X to target domain Y in the absence of paired examples. In this paper, we propose a GAN-based network for solving such problems while generating enjoyable HDR results, named UPHDR-GAN. The proposed method relaxes the constraint of the paired dataset and learns the mapping from the LDR domain to the HDR domain. Although the pair data are missing, UPHDR-GAN can properly handle the ghosting artifacts caused by moving objects or misalignments with the help of the modified GAN loss, the improved discriminator network and the useful initialization phase. The proposed method preserves the details of important regions and improves the total image perceptual quality. Qualitative and quantitative comparisons against the representative methods demonstrate the superiority of the proposed UPHDR-GAN.

preprint2020arXiv

Neural Point Cloud Rendering via Multi-Plane Projection

We present a new deep point cloud rendering pipeline through multi-plane projections. The input to the network is the raw point cloud of a scene and the output are image or image sequences from a novel view or along a novel camera trajectory. Unlike previous approaches that directly project features from 3D points onto 2D image domain, we propose to project these features into a layered volume of camera frustum. In this way, the visibility of 3D points can be automatically learnt by the network, such that ghosting effects due to false visibility check as well as occlusions caused by noise interferences are both avoided successfully. Next, the 3D feature volume is fed into a 3D CNN to produce multiple layers of images w.r.t. the space division in the depth directions. The layered images are then blended based on learned weights to produce the final rendering results. Experiments show that our network produces more stable renderings compared to previous methods, especially near the object boundaries. Moreover, our pipeline is robust to noisy and relatively sparse point cloud for a variety of challenging scenes.

preprint2020arXiv

Rethinking Image Deraining via Rain Streaks and Vapors

Single image deraining regards an input image as a fusion of a background image, a transmission map, rain streaks, and atmosphere light. While advanced models are proposed for image restoration (i.e., background image generation), they regard rain streaks with the same properties as background rather than transmission medium. As vapors (i.e., rain streaks accumulation or fog-like rain) are conveyed in the transmission map to model the veiling effect, the fusion of rain streaks and vapors do not naturally reflect the rain image formation. In this work, we reformulate rain streaks as transmission medium together with vapors to model rain imaging. We propose an encoder-decoder CNN named as SNet to learn the transmission map of rain streaks. As rain streaks appear with various shapes and directions, we use ShuffleNet units within SNet to capture their anisotropic representations. As vapors are brought by rain streaks, we propose a VNet containing spatial pyramid pooling (SSP) to predict the transmission map of vapors in multi-scales based on that of rain streaks. Meanwhile, we use an encoder CNN named ANet to estimate atmosphere light. The SNet, VNet, and ANet are jointly trained to predict transmission maps and atmosphere light for rain image restoration. Extensive experiments on the benchmark datasets demonstrate the effectiveness of the proposed visual model to predict rain streaks and vapors. The proposed deraining method performs favorably against state-of-the-art deraining approaches.